A very well written article, but being a nitpicker, I must point out one little detail:
<< The NV-20 was originally scheduled to arrive this fall and provide a truly DirectX 8 compliant chip. Unfortunately the NV-20 weighs in at more than 50 million transistors! As a reference, the Intel Pentium 4 has 43 million transistors and takes up 210 mm squared per die on a .18 micron process! This would make the NV-20 absolutely huge on currently available processes! >>
One cannot directly compare die sizes based on the number of transistors alone. At 210mm^2, the forty-million-transistor P4 core is absolutely huge, but that is not only because it has buttloads of transistors. Die size also depends greatly on a variety of factors:
- chip layout. A chip can be laid out very tightly to minimize die size, or it can be laid out speculatively to accommodate future additions/fixes/steppings as easily as possible. My bet is that Intel laid out the P4 wastefully for just this latter reason - unlike the NV20 for nVidia, the P4 architecture is going to stay in Intel's business for years to come.
- logic vs. cache. Transistor density varies between different parts of the chip. At least for AMD's copper process, cache SRAM transistors are packed much more densely on the die than logic transistors. And the P4 has lots of logic.
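
To put rough numbers on the logic vs. cache point, here is a minimal back-of-the-envelope sketch in Python. The function name and both density figures are made-up placeholders chosen for easy arithmetic, not measured values for any real chip or process. It only illustrates that two chips with the same transistor count can end up with quite different die areas.

```python
def die_area_mm2(logic_transistors, cache_transistors,
                 logic_density, cache_density):
    """Estimate die area in mm^2 from transistor counts and densities.

    Densities are in transistors per mm^2. This ignores pads, routing
    overhead and spare area, so treat it as a toy model only.
    """
    return (logic_transistors / logic_density
            + cache_transistors / cache_density)


# Hypothetical densities (placeholders, NOT real process data):
LOGIC_DENSITY = 200_000   # transistors per mm^2 for logic
CACHE_DENSITY = 500_000   # transistors per mm^2 for cache SRAM

# Two imaginary chips, each with 40 million transistors in total.
logic_heavy = die_area_mm2(30_000_000, 10_000_000,
                           LOGIC_DENSITY, CACHE_DENSITY)
cache_heavy = die_area_mm2(10_000_000, 30_000_000,
                           LOGIC_DENSITY, CACHE_DENSITY)

print(f"logic-heavy chip: {logic_heavy:.0f} mm^2")
print(f"cache-heavy chip: {cache_heavy:.0f} mm^2")
```

With these placeholder densities, the logic-heavy chip comes out noticeably larger than the cache-heavy one despite the identical transistor count, which is the whole reason a raw transistor count is a poor predictor of die size.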