... just that they had more time to make better drivers.
Specification is the number one thing when it comes to algorithms or coding; testing comes next. While they may have had a few working Fermi cards back then, that's really not enough for proper driver testing. A lot happened between then and now, and the TSMC shortage seriously slowed down the process. Catalyst 10.3 arrived 6 months after Cypress was released, and they probably had Cypress 6 months before release. Again, Unigine wasn't around a year ago, so the 5870 faced the bleeding edge of DX11, but it pulled through. Nvidia, on the other hand, could test their optimized code against Unigine, allowing them to build a better driver for DX11. Having said that, the 10.3 development had the better advantage, since they had both the hardware and the software tools. If we used Catalyst 10 for the comparison, the 480 would beat the 5870 by 30%.
I didn't say that the Heaven bench was biased. I said that it's a synthetic benchmark and does things in a way that no game is likely to do them... mainly tessellation. Basing your opinion of an architecture being superior on a synthetic benchmark doesn't make sense to me...
Yes, it is the fastest single GPU card, no one denies that. That doesn't make it a superior architecture. Nvidia has known how fast the 5870 is since at least 9/09. Nvidia knew how fast Fermi had to be to beat the 5870, and they did whatever they needed to do to achieve that. That doesn't make it a superior architecture. All it means in this case is that through brute force they pushed Fermi to make sure it was faster than the 5870... Nvidia wasn't going to release a slower part. So to get it faster they had to push the power envelope, creating a power-sucking part that puts out tremendous heat compared to the competition. Think about it: a GTX480 uses a bit more power than a 5970 while being slower... hardly the sign of a 'superior' architecture. Can Nvidia even create a dual-GPU Fermi-based part to compete with the 5970? I have my doubts, and this makes Fermi less dynamic than Cypress... again, hardly 'superior'.
Don't let Charlie's words get to you. If brute force worked, we wouldn't need new hardware. If you OC, then you know there are lots of boundaries beyond heat that you can't break. The card is designed to take that much juice; that's not brute force, because brute force won't work. Again, forcing 400 watts into your 5870 will just fry it, that is all.
The 5970, 4870 X2, 295, and 275 are more or less Crossfire/SLI on one card. Unless SLI doesn't work on the 480, making a dual-GPU part is not a problem. Now, 1000 CUDA cores in total would beat the 5970 but use a lot more electricity; what about 800 CUDA cores? 700? Say 768 CUDA cores beat the 5970, then each GPU only needs 384 CUDA cores. Well, if they can make 480 CUDA cores on one card, 384 is not a problem. In fact, it increases yield too. The optimal number of cores for their first dual-GPU card can't be determined in this type of debate; it has to be worked out by their engineers in the lab. I really don't know, but I'd say it is very doable.
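The core-count arithmetic above can be sketched out quickly. Note that 768 is the post's hypothetical "enough to beat a 5970" figure, not a real spec:

```python
# Back-of-the-envelope sketch of the dual-GPU core-count argument.
# All figures are the hypothetical numbers from the post, not real specs.

def cores_per_gpu(total_cores_needed: int, gpus: int = 2) -> int:
    """Split a target total shader count evenly across the GPUs on one board."""
    return total_cores_needed // gpus

# Suppose (hypothetically) 768 total CUDA cores are enough to beat a 5970:
target_total = 768
per_gpu = cores_per_gpu(target_total)

print(per_gpu)  # 384, well under the 480 cores already shipping on one die
```

The point being: each die in the hypothetical dual-GPU part would be smaller than the existing single-GPU die, which is where the yield argument comes from.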
I'm not being biased at all. AMD can afford to build the 5xxx parts from the 'old' architecture because they made changes along the way. AMD was the first to use GDDR5. AMD was way ahead of Nvidia in creating a 40nm part. AMD built GPUs that supported DX10.1 when Nvidia did not. AMD has built a tessellator into their hardware for years now. By doing all of these things along the way, AMD did not have to make the leap that Nvidia did. So what seems 'old' to you is the end product of making smaller changes more often. Nvidia is simply making a bigger change all at once. AMD did their homework.
Don't let the word "old" bother you. We care whether it works, not whether it is new. If ATI can keep recycling the old design by shrinking it, yet stay competitive down the road, then it is a good design, as they save lots of money on designing and testing a new design. In fact, the 4xxx design was engineered to support DirectX 10.1 where the 2xx wasn't, so shrinking it makes sense for ATI, and the result is not bad at all.
By "superior" I was comparing it to the 2xx, not Cypress. Now, we can argue until the end of time about the 2xx design vs. the 4xxx design, but ATI proved that there was room for a shrunk version of the 4xxx design. NV may have agreed with that and saved the 3xx series for a 2xx shrink, I don't know. I do know, however, that the Fermi design works about as well on DX9/10 as a shrunk 2xx design would, but wins by a lot on DX11 plus some CUDA applications. Other than tessellation, ATI can't be compared to NV at the moment, as they don't have the same common ground.
I have absolutely no idea what you are saying in the last paragraph. But something for you to consider: the GTX480 uses somewhere around 530mm2 of silicon on the 40nm process. The 5970 uses somewhere around 660mm2 of silicon on the 40nm process. Despite using more silicon, running at a bit higher clock speed, and having a separate pool of higher-clocked GDDR5 per GPU, the 5970 uses slightly less power. It uses less power and offers significantly more performance. Which architecture would appear to be 'superior'?
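To put a rough number on the silicon comparison above: the die areas are the approximate figures from the post, while the performance ratio is a placeholder assumption, since the post only says "significantly more" without quantifying it:

```python
# Rough perf-per-silicon sketch using the approximate die areas quoted above.

gtx480_area_mm2 = 530      # single GF100 die (approx., per the post)
hd5970_area_mm2 = 660      # two Cypress dies combined (approx., per the post)

area_ratio = hd5970_area_mm2 / gtx480_area_mm2
extra_silicon = area_ratio - 1
print(f"5970 uses ~{extra_silicon:.0%} more silicon")

# If the 5970 were, say, 1.4x faster (a hypothetical placeholder figure),
# its performance-per-mm^2 advantage would still be positive:
perf_ratio = 1.4
perf_per_mm2_ratio = perf_ratio / area_ratio
print(f"perf per mm^2 advantage: {perf_per_mm2_ratio:.2f}x")
```

So the argument only holds if the performance lead outpaces the ~25% extra silicon, which is exactly the claim being made.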
All I am saying is that brute force won't work on hardware. The 480 uses more electricity because it was designed to do so. Yes, you may argue that the 5970 uses less electricity, but the 480 beats the 5970 on tessellation. It is a pro-and-con situation. The 4870 handles most games maxed out, and you won't see a difference from the 5870, simply because the 4870 already delivers 50+ FPS. In those cases, the extra electricity used by the 5870 is wasted. Would you then call the 5870 inefficient compared to the 4870?