A cerebral discussion on the "Differences and Similarities in Microarchitecture and Design Philosophies of Fermi and Tahiti".
This is not about which is the better GPU, the better buy, etc. This is not about comparing products (like the 5870 or the GTX 580).
Rather, this is a discussion of architectural design decisions only, in an abstract, academic manner.
Some reading about Fermi and Tahiti architectures:
http://www.anandtech.com/show/2849
http://www.anandtech.com/show/4008/nvidias-geforce-gtx-580/2
http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute
http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review
Architecture:
The most obvious change in Tahiti is that it replaces VLIW with simple SIMD; according to AnandTech, that gives much more stable and predictable performance for compute. It is usually faster than VLIW, occasionally slower, but overall more compute-friendly.
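To make that concrete, here is a rough sketch of the kind of kernel I have in mind (purely my own illustration in CUDA, not something from the articles). The work each thread does is a chain of dependent operations, so a VLIW4 compiler has nothing independent to pack into its 4-wide bundles and most slots sit idle, while a plain SIMD/SIMT lane just runs the steps back to back and relies on other wavefronts/warps to hide latency:

    // Hypothetical CUDA kernel, purely for illustration (names made up).
    // Each thread walks a dependent chain, so there is no per-thread ILP
    // for a VLIW4 compiler to pack into 4-wide bundles; a scalar SIMD lane
    // simply executes the steps in order, and other warps/wavefronts in
    // flight hide the latency.
    __global__ void dependent_chain(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n)
            return;

        float x = data[i];
        x = x * 1.0001f + 0.5f;   // each step needs the previous result
        x = x * 1.0001f + 0.5f;
        x = x * 1.0001f + 0.5f;
        x = x * 1.0001f + 0.5f;
        data[i] = x;
    }

As I understand it, that is the core of the predictability argument: VLIW utilization depends on how much independent work the compiler can dig out of each thread, while plain SIMD mostly depends on having enough threads in flight.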
AMD groups its SIMDs into arrays of 16; each such array gets its own 64K cache.
A CU (Compute Unit) is made up of four such arrays.
NVIDIA's basic grouping is an array of 32 SIMD units, called an SM; each SM gets its own 64K cache.
Four of those SMs are grouped into a GPC.
So from that, it appears that each SIMD on Tahiti gets 2x the cache of Fermi. I suspect this is at least partially due to Tahiti being designed for a 28 nm process, giving it a much larger transistor budget to spend on cache.
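Just to spell out the arithmetic behind that "2x" (a back-of-the-envelope toy only, and it assumes the 64K figures above are actually comparable things; I may well be mixing up register files, L1, and local memory here):

    // Toy calculation only; the 64K figures are the ones quoted above
    // and may not be apples-to-apples.
    #include <stdio.h>

    int main(void)
    {
        const float tahiti_kb = 64.0f, tahiti_lanes = 16.0f;  // per SIMD-16 array
        const float fermi_kb  = 64.0f, fermi_lanes  = 32.0f;  // per 32-wide SM

        printf("Tahiti: %.1f KB per lane\n", tahiti_kb / tahiti_lanes);  // 4.0
        printf("Fermi:  %.1f KB per lane\n", fermi_kb / fermi_lanes);    // 2.0
        return 0;
    }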
Aside from AMD having double the cache per SIMD unit, the two look very similar overall. There is only so much information to be gleaned from staring at architecture overview pictures, so I was wondering if someone more familiar with the technical details could point out where they diverge and why.
Philosophy:
According to AnandTech's titles, Tahiti is "Architected For Compute" and Fermi is "Architected for Tesla". I agree that both place a heavy emphasis on compute.
It seems to me that both NVIDIA and AMD currently subscribe to the following two design philosophies:
1. GPGPU is the future
2. It is better to develop a single GPU that balances its performance in GPGPU and in video gaming rather than a dedicated part for each.
I am surprised by this convergence of philosophy between the two, since it seems an odd position to take. I would instinctively feel that it would be best to design two separate architectures for those two market segments. One of the biggest advantages of such segmentation is that you can charge an arm and a leg for compute parts from your corporate/science customers, since your compute cards (e.g., NVIDIA Tesla, AMD FirePro) do not compete with gaming devices.
The second is that you could eke out slightly more performance in each field by tossing out unneeded parts: a Tesla card can do without most of the fixed-function gaming hardware, and VLIW4 has done very well against SIMD arrays in gaming.
However, I don't have the actual cost numbers to back this up. How much does it actually cost to develop an architecture?
It occurs to me that another plausible explanation for the "one chip for both" choice is not saving money on design costs, but rather a belief that games are going to make heavy use of compute as well. If that happens, then GPU architectures designed with compute in mind will do very well on such games.
IIRC, the makers of the very popular Unreal Engine suggested as much in an interview. Are there other indications that games are going to be heavily compute-dependent in the future?
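For concreteness, when I say "games making heavy use of compute" I am picturing things like physics or particle updates written as compute kernels instead of being pushed through the graphics pipeline. A minimal, made-up sketch of that kind of workload (the names and constants are mine, not from the Unreal interview):

    // Hypothetical per-frame particle update as a compute kernel
    // (all names and constants are made up for illustration).
    struct Particle { float3 pos; float3 vel; };

    __global__ void update_particles(Particle *p, int count, float dt)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= count)
            return;

        // Simple gravity + Euler integration; a real engine would also do
        // collisions, sorting, culling, etc. on the GPU.
        p[i].vel.y -= 9.81f * dt;
        p[i].pos.x += p[i].vel.x * dt;
        p[i].pos.y += p[i].vel.y * dt;
        p[i].pos.z += p[i].vel.z * dt;
    }

If per-frame work like this becomes common, an architecture tuned for compute would pay off in games too, which would make the "one chip for both" decision look a lot less odd.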
