> I can't see how human designers could achieve this, purely from a time and organisational perspective.
Indeed, no human could design a modern CPU transistor by transistor. But that's not how it's done: you make heavy use of abstraction, and separate the design into independent parts with clean interfaces.
For example, say you are designing a CPU that needs a cache subsystem. You design the inputs and outputs, define where on the chip they will be, and specify the protocol for using the cache -- down to "these wires are the address wires, and these are the data wires. When I pull the read control line up, you look up the address on the address lines, and if found, return the contents on the data lines." Then, the cache team can do their work independently, without having to bother the rest of the CPU team again. And the first thing the cache team would do is design another interface of their own, and split into a tag array team and a data array team.
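In software terms, the agreed-upon protocol is just an interface that both teams work against. Here's a minimal Python sketch of the idea; the class names and the direct-mapped lookup are made up for illustration, not taken from any real design:

```python
from abc import ABC, abstractmethod
from typing import Optional

class CacheInterface(ABC):
    """The contract the CPU team and the cache team agree on up front.
    In hardware this would be a pin-level spec (address wires, data
    wires, a read control line); here it's just a method signature."""

    @abstractmethod
    def read(self, address: int) -> Optional[int]:
        """Look up `address`; return its contents, or None on a miss."""

class DirectMappedCache(CacheInterface):
    """The cache team's side: free to rework the internals at any time,
    as long as read() keeps honouring the contract."""

    def __init__(self, num_lines: int = 1024) -> None:
        self.tags = [None] * num_lines  # the tag array team's half
        self.data = [0] * num_lines     # the data array team's half

    def read(self, address: int) -> Optional[int]:
        index = address % len(self.tags)  # which line to check
        tag = address // len(self.tags)   # what must be stored there
        if self.tags[index] == tag:       # hit
            return self.data[index]
        return None                       # miss
```

The rest of the CPU only ever sees `CacheInterface`, so the cache team can change everything behind it without anyone else noticing.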
Division of labour is fundamentally how everything that's too hard for individuals is made, whether we are talking about CPUs, jet planes, or pencils.
See Matt Ridley's awesome TED talk on the subject.
Then, when you can split no more, you still don't need to count every remaining transistor against your budget. If you are on the data array team, you design a single bank, and copy-paste that into the 32 banks on the chip. Inside the bank, you design a single cache row, and copy-paste that into the 1024 rows in there, and so on.
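Here's that replication idea as a toy Python sketch; the sizes match the numbers above, and the function names are made up:

```python
def make_row(bits_per_row: int = 64) -> list[int]:
    """Design a single cache row once."""
    return [0] * bits_per_row

def make_bank(rows_per_bank: int = 1024) -> list[list[int]]:
    """'Copy paste' the row design into every row of a bank."""
    return [make_row() for _ in range(rows_per_bank)]

def make_data_array(num_banks: int = 32) -> list[list[list[int]]]:
    """'Copy paste' the bank design into the full data array."""
    return [make_bank() for _ in range(num_banks)]

array = make_data_array()
# Three small designs expand into 32 * 1024 * 64 = 2,097,152 cells,
# none of which anyone had to draw individually.
print(sum(len(row) for bank in array for row in bank))  # 2097152
```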
> Then there's the actual manufacturing. Most videos show some kind of physical abrasive process being used at various stages during manufacture to grind down the top surface. I find this very difficult to believe, as it seems a very crude process given the scale of the chip components involved.
Today, we have the ability to grind with the precision of individual crystal planes, i.e. to an accuracy of the width of one atom. How did we get that far? Start with grinding to roughly visual accuracy, have strong economic incentives for more accuracy, and have a lot of really smart people working on it for a century, always going after whatever is currently stopping you from grinding just a little bit finer. That's how technological progress works.
As for the chip manufacture itself, we basically have only 5 tools:
1. Uniformly grind down to a plane.
2. Deposit a uniform layer of something on the surface.
3. Use a chemical to remove some specific coating from the surface.
4. Use a mask to do lithography -- basically, shine UV light on the die, with a mask between the light and the die, so that you can leave some structures unexposed. With fancy optics, you can make the mask a *lot* bigger than the die, which helps a lot in making the damn things.
5. Bombard the die with ions, for example to dope the silicon.
With these, it's possible to build almost anything, given enough steps. Let's say you want to build a layer of wires: the surface of the die is currently coated mostly with the interconnect dielectric, with the tops of the vias that lead to the metal layer below sticking out.
You start by coating the whole chip uniformly with photoresist. Then, use a lithography mask to expose the parts where the wires are going to be to UV. Then, use a chemical that attacks the parts of the photoresist that have been exposed to UV (but doesn't attack the parts that haven't) to open up the "channels". Then, uniformly coat the whole chip with copper. Plane the chip down to slightly below the top level of the photoresist. Use another chemical to remove the rest of the photoresist. Coat the entire chip with dielectric. Plane down so that the copper is visible. You get how this works?
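For clarity, here is that exact recipe written out as data in a short Python snippet; the step names are loose labels for the five tools above, not real fab terminology:

```python
# Each step is (tool, what it does), in the order described above.
recipe = [
    ("deposit",    "photoresist over the whole die"),
    ("lithograph", "expose the future wire channels to UV through a mask"),
    ("etch",       "dissolve the UV-exposed photoresist, opening channels"),
    ("deposit",    "copper over the whole die, filling the channels"),
    ("grind",      "plane down to slightly below the top of the resist"),
    ("etch",       "strip the remaining photoresist"),
    ("deposit",    "dielectric over the whole die"),
    ("grind",      "plane down until the copper wires are exposed"),
]

for i, (tool, action) in enumerate(recipe, start=1):
    print(f"step {i}: {tool:10s} {action}")
```

Note that only one of the eight steps needs a mask; everything else is uniform across the whole die.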
Most modern chips take dozens of individual mask steps, always with several "uniform" steps in between.
> I can't follow how it would be possible to reliably create a processor with so many billions of components, 100% functional.
Depends on your definition of reliable. We absolutely cannot reliably produce chips with billions of components where all of them are 100% functional. What we do is build in a lot of functional redundancy -- instead of putting in 16 cache banks, put in 17, plus the logic to swap in the spare if one of them doesn't pass the tests. This is again done at multiple levels -- for individual rows, for the banks, and in the end for the whole chip, where we "harvest" 2-core chips out of 4-core dies when they have an error in some logic that isn't easy to make redundant.
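Here's a toy sketch of that spare-bank scheme in Python; real chips program the remapping with fuses at test time, and everything here (the failure rate, the names) is made up for illustration:

```python
import random

NUM_USED, NUM_BUILT = 16, 17  # ship 16 banks, fabricate 17

def bank_passes_test(bank_id: int) -> bool:
    """Stand-in for the production test; assume ~5% of banks fail."""
    return random.random() > 0.05

good_banks = [b for b in range(NUM_BUILT) if bank_passes_test(b)]

if len(good_banks) >= NUM_USED:
    # Logical bank i is steered to physical bank remap[i].
    remap = good_banks[:NUM_USED]
    print("chip passes; remap table:", remap)
else:
    print("too many defects even with the spare: this die is a dud")
```

With a 5% per-bank failure rate, a chip with no spare would pass only about 0.95^16 ≈ 44% of the time; a single spare bank lifts that considerably.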
And even with all that redundancy, it's perfectly normal to ship a product when 80% of the chips that come off the line are complete duds. I believe the GTX 480 had <1% yields when it first shipped...