We very rarely needed to pull out an electron microscope. That's why the job wasn't very much fun. Electron microscopy is actually a lot of fun.
Chips usually fail for similar reasons. If you run a design at a given frequency and a given voltage, you will usually see that (for example) Transistor A in Unit B blows due to some design marginality. The first time you see a chip with this failure, it's a big hassle to figure out. You get out the electron microscope, you prowl around, you figure out what the problem is. But the total number of distinct problems is usually pretty small for a given design run under specific conditions.
Contrary to what you might think, chips almost never just "die," as in nothing happens when you plug them in and apply power. Usually just one or two pins are wrong. That makes the chip impossible to use in a computer, but you can put it into a $100+ million tester (from Schlumberger, Advantest, or Agilent) and the tester will tell you exactly which pins are failing, when, and how. The result shows up as a "pattern" on the tester's screen.
A pattern can nearly always be associated with a specific failure. So if, for example, Transistor A in Unit B fails, then when we do "xyz" to the CPU, we see pins #1, #2, and #3 go red on the tester. Then we throw the chip away and check off that we've found another instance of that specific category (Transistor A in Unit B).
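To make the idea concrete, here's a toy sketch of that lookup. It assumes a failure signature is just the test that was run plus the set of pins that went red; the names `FailSignature`, `KNOWN_SIGNATURES`, and `bin_part` are hypothetical, and a real tester's output is far richer than this. Anything that doesn't match a known signature gets flagged for manual failure analysis (the slow case, where the electron microscope comes out).

```python
from dataclasses import dataclass


# Hypothetical failure signature: the test applied and the pins that failed.
@dataclass(frozen=True)
class FailSignature:
    test_name: str
    failing_pins: frozenset


# Hypothetical library of previously diagnosed signatures -> failure category.
KNOWN_SIGNATURES = {
    FailSignature("xyz", frozenset({1, 2, 3})): "Transistor A in Unit B",
}


def bin_part(signature: FailSignature) -> str:
    """Return the known category, or flag the part for manual analysis."""
    return KNOWN_SIGNATURES.get(signature, "UNKNOWN: needs failure analysis")


if __name__ == "__main__":
    print(bin_part(FailSignature("xyz", frozenset({1, 2, 3}))))
    print(bin_part(FailSignature("xyz", frozenset({4}))))
```

The first call lands in the known "Transistor A in Unit B" bin; the second doesn't match anything and falls out for someone to investigate by hand.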
So that's how defect binning works. From the tester's perspective it's usually very quick. The problem is the parts that don't fall into predictable patterns; those take a long time. The overall process also isn't especially pressed for time, so the pipeline is usually pretty long (kinda like the Pentium 4). Because binning involves multiple steps, it can take a fair amount of time from when a dead part arrives until it is binned.
There's another thread in this forum about whether AMD does a similar process. I'm pretty certain that they do. In fact, every major manufacturer, from chips to cars to toys, probably does this. You get a return, you test it to find the problem, you bin the problem into a category, and you give the designers feedback on what the problem is and what percentage of returns show that problem. Then a priority list is developed and the problems are fixed in future designs (or steppings, in this case).
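The "percentage of returns per problem, then a priority list" step can be sketched roughly like this. It's only an illustration that assumes each binned return is represented by its category name; `priority_list` is a hypothetical helper, and real prioritization would presumably weigh more than raw return share (severity, cost to fix, and so on).

```python
from collections import Counter


def priority_list(binned_returns: list) -> list:
    """Rank failure categories by their share of all returns, largest first.

    Each entry in binned_returns is assumed to be one return's category name.
    Returns (category, percent_of_returns) pairs, sorted by count descending.
    """
    counts = Counter(binned_returns)
    total = len(binned_returns)
    # most_common() with no argument yields every category, highest count first;
    # an empty input yields an empty list, so there's no division by zero.
    return [(category, 100.0 * n / total) for category, n in counts.most_common()]


if __name__ == "__main__":
    returns = [
        "Transistor A in Unit B",
        "Transistor A in Unit B",
        "Unit C timing",
        "Transistor A in Unit B",
    ]
    print(priority_list(returns))
```

With that input, "Transistor A in Unit B" accounts for 75% of returns and tops the list, ahead of the hypothetical "Unit C timing" bin at 25%.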
I'm actually somewhat surprised that you guys seem to be under the impression that manufacturers don't do this. Do you really think major manufacturers only examine returns for visible exterior defects? Or that they just toss all the returns into the garbage? If you don't examine returns for defects, how can you improve the design?