What's going on with the memory bandwidth available for every GFLOP of compute? Did memory bandwidth requirements go down significantly in recent games due to a different mix of workloads? It seems that pure compute is getting more important than texturing and other workloads specific to graphics rendering. Maybe on top of that there were some great advancements in memory compression technology, along with bigger caches?

I analyzed the memory bandwidth available per GFLOP of theoretical compute, and it turns out that it has been going down and the downward trend is accelerating. Utilization of all that computational prowess is also going up, which makes the bandwidth scarcity even more notable than the cold numbers would suggest. I did the analysis on NV hardware, but the trend is mostly the same on AMD's silicon. I'll start with the first modern unified shader architecture, which as everyone probably remembers is the G80 silicon in the GeForce 8800 GTX, one of the greatest leaps in graphics performance.
90nm G80 in 8800 GTX: 86.4 GB/s, 518 GFLOPs, 167 MB/s per GFLOP. That's the most in my comparison.
65nm G92 in 9800 GTX: 70.4 GB/s, 648 GFLOPs, 108 MB/s per GFLOP.
55nm G92b in 9800 GTX+: 70.4 GB/s, 705 GFLOPs, 99 MB/s per GFLOP.
65nm GT200 in GTX 280: 141 GB/s, 933 GFLOPs, 151 MB/s per GFLOP. This one was a massive increase in bandwidth and the return of high memory bandwidth per GFLOP of compute.
55nm GT200B in GTX 285: 159 GB/s, 1062 GFLOPs, 149 MB/s per GFLOP.
40nm GF100 in GTX 480: 177 GB/s, 1345 GFLOPs, 131 MB/s per GFLOP.
40nm GF110 in GTX 580: 192 GB/s, 1581 GFLOPs, 121 MB/s per GFLOP.
28nm GK104 in GTX 680: 192 GB/s, 3000 GFLOPs, 64 MB/s per GFLOP. That's the biggest reduction.
28nm GK110 in GTX 780: 288 GB/s, 4000 GFLOPs, 72 MB/s per GFLOP. It seems that after every big relative reduction there's an increase, and of course this die is cut down quite heavily on the compute side but not on the memory controllers.
28nm GK110 in GTX 780 Ti: 336 GB/s, 5046 GFLOPs, 66 MB/s per GFLOP.
28nm GM200 in TITAN X: 336 GB/s, 6144 GFLOPs, 54 MB/s per GFLOP.
16nm GP104 in GTX 1080: 320 GB/s, 8228 GFLOPs, 38 MB/s per GFLOP.
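For anyone who wants to double-check the list, here is a minimal Python sketch that recomputes the ratio straight from the bandwidth and GFLOP figures quoted for each card. The names `CARDS`, `mb_per_gflop` and `ratio_table` are made up for this example, 1 GB is taken as 1000 MB, and the 8800 GTX bandwidth is entered as 86.4 GB/s (384-bit bus at 1.8 GT/s), which is an assumption on my part.

```python
# (card, memory bandwidth in GB/s, theoretical single-precision GFLOPs)
# Figures are the ones quoted in the post; the 8800 GTX bandwidth is assumed
# to be 86.4 GB/s (384-bit bus x 1.8 GT/s / 8 bits per byte).
CARDS = [
    ("8800 GTX", 86.4, 518),
    ("9800 GTX", 70.4, 648),
    ("9800 GTX+", 70.4, 705),
    ("GTX 280", 141, 933),
    ("GTX 285", 159, 1062),
    ("GTX 480", 177, 1345),
    ("GTX 580", 192, 1581),
    ("GTX 680", 192, 3000),
    ("GTX 780", 288, 4000),
    ("GTX 780 Ti", 336, 5046),
    ("TITAN X", 336, 6144),
    ("GTX 1080", 320, 8228),
]


def mb_per_gflop(bandwidth_gbs, gflops):
    """Memory bandwidth per unit of compute, in MB/s per GFLOP (1 GB = 1000 MB)."""
    return bandwidth_gbs * 1000.0 / gflops


def ratio_table(cards):
    """Return a list of (card name, MB/s per GFLOP) pairs in the given order."""
    return [(name, mb_per_gflop(bw, gf)) for name, bw, gf in cards]


if __name__ == "__main__":
    for name, ratio in ratio_table(CARDS):
        print(f"{name:12s} {ratio:6.1f} MB/s per GFLOP")
```

The values come out unrounded, so they can differ by a unit or so from the whole numbers in the list depending on whether you round or truncate.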
GP104's memory bandwidth looks very poor relative to its compute. Did that card really not need a wider memory interface, or is the improved memory compression enough? How low can we go? Just for kicks, I added an IGP/APU to see how it compares.
A10-7870K (886 GFLOPs) with dual-channel DDR3-2133 (is that the fastest memory that can be run?): 34 GB/s, 38 MB/s per GFLOP.
That's the same as GP104. We all know that the APU is heavily memory-bottlenecked and that this bandwidth has to be shared with the CPU, but still, that's the same as the 1080. What's the 1080's trick to get away with such a relatively puny amount of bandwidth? Is memory compression really that effective? Anyway, this looks promising for the performance of upcoming APUs.
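The APU figure can be reproduced the same way. Below is a rough sketch of the arithmetic, assuming two 64-bit (8-byte) DDR3 channels at 2133 MT/s and theoretical peak bandwidth only; `ddr_bandwidth_gbs` is a hypothetical helper name, not something from a library.

```python
def ddr_bandwidth_gbs(transfers_mts, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s.

    transfers_mts: memory speed in mega-transfers per second (e.g. 2133)
    channels:      number of memory channels (assumed 2 for dual channel)
    bus_bytes:     bytes moved per transfer per channel (assumed 8 = 64-bit)
    """
    return transfers_mts * 1e6 * bus_bytes * channels / 1e9


# A10-7870K with dual-channel DDR3-2133, 886 GFLOPs as quoted in the post
apu_bandwidth = ddr_bandwidth_gbs(2133)
apu_ratio = apu_bandwidth * 1000.0 / 886  # MB/s per GFLOP

if __name__ == "__main__":
    print(f"{apu_bandwidth:.1f} GB/s, {apu_ratio:.1f} MB/s per GFLOP")
```

Keep in mind this is the theoretical peak and the CPU cores draw on the same bandwidth, so what the GPU part actually sees will be lower.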
It's very interesting that such a mature technology as memory compression can still be improved by such a huge amount in just one generation.