I also don't remember your exact verbiage, but a few days back, when replying to someone else about how a typical contract would be structured, you mentioned that if the yield is under x percent (maybe you used 20% in your example?) then the wafer would be free. I don't know how many dice Nvidia could get if they were able to get 100% usable silicon from a wafer, but assuming TSMC's process is as borked as it sounds right now, wouldn't they be able to get a good number of parts for free? And then, assuming TSMC gets the process fixed over the next few weeks, Nvidia would still get enough Fermi dice to sell at the margins they want?
Neither TSMC nor Nvidia are idiots, of course, so the yield thresholds between the tiered payouts are going to account for the reality that much larger die are inherently harder to yield. Otherwise everyone would shoot for 600mm^2 die so they could get that one functional chip per wafer for free.
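To put some rough numbers behind why die size has to factor into those tiers, here's a minimal sketch using the standard gross-die-per-wafer approximation and a simple Poisson yield model. The defect density, die areas, and 300mm wafer below are illustrative assumptions, not TSMC's actual figures:

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    # Common back-of-the-envelope gross die count: wafer area / die area,
    # minus an edge-loss correction term. Not any foundry's real number.
    radius = wafer_diameter_mm / 2.0
    return int(math.pi * radius ** 2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2))

def poisson_yield(die_area_mm2, defects_per_cm2):
    # Simple Poisson defect model: yield = exp(-area * D0), with area in cm^2.
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

# D0 = 0.5 defects/cm^2 is a made-up "struggling process" value, purely illustrative.
for area in (100, 334, 530):  # small die, Cypress-sized, GF100-sized (rumored 500+ mm^2)
    gross = dies_per_wafer(area)
    y = poisson_yield(area, defects_per_cm2=0.5)
    print(f"{area:4d} mm^2: {gross:4d} gross dice, {y:5.1%} yield, ~{gross * y:.0f} good dice/wafer")
```

With those invented inputs a small die still nets you hundreds of good dice per wafer while a GF100-sized die is down to a handful, which is exactly why the payout tiers can't be die-size-agnostic.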
The other issue here is that it's not just a matter of yields but also a matter of capacity. Even if Nvidia were looking at 10 free chips per wafer, if all they have access to is 1,000 wafers per month, they aren't about to have any kind of volume worth the cost of the infrastructure that must be set up to support releasing a SKU.
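Just to put a rough number on that: even an optimistic 10 free good dice per wafer across 1,000 wafers a month is only ~10,000 chips a month, which doesn't go far against the boards, drivers, validation, packaging, and channel support you have to stand up to launch a SKU.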
Also, there are additional risks (and risk equals money to characterize, quantify, and mitigate if necessary) involved with chips from low-yielding wafers, in that there is rather good precedent for low-yielding wafers to also have lifetime quality issues in the few good chips. Neither Nvidia nor AMD (nor anyone, for that matter) wants to seed the market with a handful of chips gathered off of questionable wafers which have a high(er) risk of dying early in the field.
For example, one type of lifetime reliability issue that is related to GOI (gate oxide integrity) is so sensitive that if a single chip on a wafer results in a positive for this issue the entire wafer is scrapped even if the remaining chips test out as being fine.
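Purely as a sketch of the shape of that disposition rule (not any fab's actual spec, and the screen name is just a stand-in):

```python
def disposition_wafer(die_results):
    # die_results maps die_id -> set of failing screens for that die.
    # Illustrative rule only: one GOI hit anywhere condemns the whole wafer,
    # including die whose own screens all passed.
    if any("GOI" in fails for fails in die_results.values()):
        return "SCRAP_WAFER"
    return [die for die, fails in die_results.items() if not fails]

# Die 7 trips the GOI screen, so die 1 and 9 are scrapped right along with it.
print(disposition_wafer({1: set(), 7: {"GOI"}, 9: set()}))  # -> SCRAP_WAFER
```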
It's a trade-off between deliberately elevating the false-negative rate (scrapping die that may well be good, to keep the risky ones out of the field) and the profit lost by not selling the few genuine true positives.
When buying a car, the last thing you'd want to find out is that the assembly line that produced your car is so bad that yours was one of only five cars out of the 100 assembled that day which actually passed QA... you'd probably spend the rest of your life wondering what's wrong with your car that QA missed, given how crappy the production standards are at a plant with a 95% reject rate.
It's the same with businesses: they don't like elevating their risk of customer returns by shipping the few seemingly good die off of wafers that had a lot of bad die, so they will just assume that the remaining handful of apparent true positives (die that pass the standard testing) are actually false positives (chips that would have tested out as bad had they undergone more rigorous/costly lifetime quality testing).
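To make the shape of that call concrete, here's a toy expected-value sketch; every number in it is invented for illustration, not real industry data:

```python
def shipping_ev(n_apparent_good, p_latent_bad, revenue_per_chip, field_failure_cost):
    # Expected value of shipping the apparently-good die from a wafer, where
    # p_latent_bad is the (assumed) chance each one dies early in the field.
    return n_apparent_good * ((1 - p_latent_bad) * revenue_per_chip
                              - p_latent_bad * field_failure_cost)

# On a healthy wafer the latent-defect risk is low and shipping wins easily;
# on a low-yield wafer the (assumed) higher risk can flip the sign entirely.
for p_bad in (0.02, 0.35):
    ev = shipping_ev(n_apparent_good=5, p_latent_bad=p_bad,
                     revenue_per_chip=300.0, field_failure_cost=1500.0)
    print(f"p(latent bad) = {p_bad:.0%}: shipping EV = {ev:,.0f} vs. scrapping EV = 0")
```

Once the assumed latent-failure risk (plus RMA handling, reputation damage, and so on) outweighs the revenue from the handful of sellable die, scrapping the whole wafer is the cheaper call, which is the behavior being described above.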
(note - I am intentionally glossing over a LOT of the details and fine print here for the sake of brevity)
The article on techreport (http://www.techreport.com/articles.x/17815/4) puts GF100 at a 650 MHz core clock, 1700 MHz for the shaders, and 4200 MHz for the memory (201.6 GB/s), with 512 CUDA cores.
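(For what it's worth, the 201.6 GB/s figure is just that 4200 MT/s effective rate over GF100's 384-bit GDDR5 bus: 4200 x 384 / 8 = 201,600 MB/s.)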
The shader clock seems quite aggressive considering the Tesla parts appear to be clocked at 1200 MHz.
With this raw data, no idea of what the architecture tweaks will do for GF100 vs. GT200, and considering both RV770 and RV870, it seems that 20-30% should be the top of the performance increase over the 5870. In the article they say something along the lines of GF100 being up to GTX 285 SLI speeds.
So, GF100 ending up ~20% faster than the 5870 seems reasonable. With that, it seems the 5970 will be faster than the GF100 by a decent margin.
If it's going to have 50% more transistors, it sure as heck better have a skosh better performance.
Quick question for the folks who are paying attention to the performance numbers of the Evergreen family: if we set aside the topic of RV770 vs. Juniper (and why shrinking RV770 seems to have resulted in a less efficient architecture at this time) and instead just focus on the performance scaling from a 166mm^2, 1.04B xtor Juniper to a 334mm^2, 2.15B xtor Cypress, where would we expect the performance of a hypothetical 3.0B xtor Evergreen-class GPU to fall: 10%, 30%, or 50% faster than Cypress?
Would the performance scale out to basically be the equivalent of a Juniper+Cypress, or is the scaling efficiency of the Evergreen architecture not that high?
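Just to frame the question with the raw transistor ratios (treating transistor count as a naive ceiling on scaling, which it never is in practice):

```python
# Transistor counts from the post above; the 3.0B part is hypothetical.
juniper, cypress, hypothetical = 1.04e9, 2.15e9, 3.00e9

print(f"Cypress vs. Juniper:      {cypress / juniper:.2f}x the transistors")
print(f"Hypothetical vs. Cypress: {hypothetical / cypress:.2f}x the transistors "
      f"(~{hypothetical / cypress - 1:.0%} more)")
```

So roughly 40% over Cypress would be the absolute ceiling if performance tracked transistor count 1:1; how far below that it actually lands is precisely the Juniper-to-Cypress scaling-efficiency question.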