[VC]AMD Fiji XT spotted at Zauba


AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
lol at that article, how is 20nm broken for GPUs?

Don't all TSMC nodes have a low-performance and a high-performance version? Just because TSMC decided to call their 20nm high-performance node "16nm FinFET" when it's obviously a 20nm node based on size and performance, and you can't build a GPU on their 20nm low-performance node, doesn't mean 20nm is broken; it just means it's like it has always been.

16nm FF is not the high-performance version of 20nm, it is a different process. Different FEOL, different rules, different performance, etc.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
What if AMD's new architecture is very efficient?

How would one x-fire, 2, 3, 4 if each single GPU card has a similar set up like the 295x2?

From where would AMD pull off 2x the performance/watt on 28nm? You realize how architectures work, right? GCN was AMD's major redesign in 2011 and they have already stated on many occasions they are not changing this architecture. GCN will have gradual improvements in all areas but the core architecture will remain. If you look at AMD, they have a history of using the same fundamental base and growing the functional units with upgrades to memory bandwidth, geometry, and texture/color fill rate efficiencies. Their last VLIW architecture lasted a very long time.

Even though GCN 2.0 should have some improvements in performance/watt, AMD will gain a lot more from a shift to a less power-hungry HBM, a more efficient 20nm, and WC. AMD doesn't have the resources of NV to have 2-3 separate teams working on a next-gen architecture every 3-4 years. With NV, they said they had a team working on Kepler for 4 years. That means an entire team worked for a long time concurrently with the Fermi team. Same with Maxwell. AMD can't do this but you continue living in the clouds assuming AMD = NV or Intel in resources. They aren't in the same league, not to mention graphics is not AMD's majority business either. They have many product lines from CPUs to APUs to servers that require resources and R&D. NV has what, Tegra? They can focus 90% of their R&D and resources ONLY on graphics. That means even with identical R&D and employees, AMD has less to use towards desktop graphics. How do you still keep ignoring this?

Therefore, for AMD to keep up, they need to take a lot more risks in adopting new technologies to overcome their resource disadvantage. That's why AMD is way more likely to try more exotic options first like HBM, WC and 20nm. They don't have time or money to do major GPU architectural redesigns. We are going to get GCN 2.0, then 3.0, maybe 4.0 before AMD ditches GCN.

lol at that article, how is 20nm broken for GPUs?

I guess what they meant in that article is that yields are too low and the risk of defects is too high for the node to produce large-die high-performance GPUs in 2015. They are basically saying AMD and NV are going to skip 20nm entirely. This contradicts Lisa Su's statements that AMD will have graphics on the 20nm node. I can't imagine AMD building a 550mm2 die on 28nm, which is what they'd realistically need to compete. Imagine how hot and power hungry such a chip would run after the 438mm2 Hawaii? If they reduce transistor density like NV, they will have more difficulty scaling SPs, TMUs, etc.

What if AMD just chose the super secretive approach to 390X this time and as a result most sites can't confirm them using 20nm? WC+20nm+HBM would certainly be a big breakthrough from 290X and its reference cooler.
 
Last edited:

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
This contradicts Lisa Su's statements that AMD will have graphics on 20nm node.

I don't believe Lisa Su said anything about 20nm graphics; she said 20nm products.

Also, I'm estimating full Tonga at a 200W TDP with performance close to or better than the R9 290.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,476
136
From where would AMD pull off 2x the performance/watt on 28nm? You realize how architectures work, right? GCN was AMD's major redesign in 2011 and they have already stated on many occasions they are not changing this architecture. GCN will have gradual improvements in all areas but the core architecture will remain. If you look at AMD, they have a history of using the same fundamental base and growing the functional units with upgrades to memory bandwidth, geometry, and texture/color fill rate efficiencies. Their last VLIW architecture lasted a very long time.

RS, the 2x perf/watt at 28nm claim is not correct. Nvidia compares a 398 sq mm GM204 with a 294 sq mm GK104 and concludes that they got 2x the perf/watt. You have to normalize for die size and wattage to conclude that it's a perfect 2x perf/watt scaling, which is not the case. If you say 2x perf/watt then GM200 should double the perf of GK110. But given that GM200 is likely to be 600+ sq mm and 250W TDP and maybe 50% faster than the GTX 780 Ti, it's clear that 2x perf/watt is just marketing talk.

Nvidia did achieve a significant improvement in performance per shader and perf/sq mm. They did it by improving SM efficiency. Maxwell was a redesign of Kepler to improve SM efficiency and improve perf/watt and perf/sq mm. One Maxwell SM, or SMM, with 128 cc provides 90% of the performance of a Kepler SMX, which has 192 cc. Kepler improved on Fermi by going with a single clock domain and focusing on perf/watt and perf/sq mm. Maxwell refined it. The changes are significant but the core architectures are not radically different. You get 2 to 3 (max) ground-up new architectures in a decade. For Nvidia it was G80 (late 2006) and GF100 (mid 2010). Mostly the ground-up new architectures are designed keeping in mind the latest DX API they were targeting: DX10 for G80 and DX11 for GF100. For Nvidia, their next ground-up new architecture should be Pascal (DX12, HBM, NVLink).
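As a quick back-of-the-envelope check of those SM numbers (a sketch using only the figures quoted in this post - 128 cc at ~90% of a 192 cc SMX - not official NVIDIA data), the implied per-core gain is roughly 1.35x:

```python
# Per-core throughput implied by the post's figures:
# one 128-core Maxwell SMM ~ 90% of a 192-core Kepler SMX.
smx_cores, smx_perf = 192, 1.00   # Kepler SMX baseline
smm_cores, smm_perf = 128, 0.90   # Maxwell SMM, per the post

gain = (smm_perf / smm_cores) / (smx_perf / smx_cores)
print(f"Maxwell perf per CUDA core vs Kepler: {gain:.2f}x")  # ~1.35x
```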

Even though GCN 2.0 should have some improvements in performance/watt, AMD will gain a lot more from a shift to a less power-hungry HBM, a more efficient 20nm, and WC. AMD doesn't have the resources of NV to have 2-3 separate teams working on a next-gen architecture every 3-4 years.
Equating R&D size with ability to compete is not always correct. How is it that AMD executed so well with the HD 4870 and HD 5870 and provided some of the most intense competition and price wars ever seen by the industry (especially during the GTX 275/HD 4890 timeframe when both launched for USD 250)?

With NV they said they had a team working on Kepler for 4 years when Fermi launched. So obviously it was a different team. Same with Maxwell.
Do you believe Nvidia did not incorporate any of the learnings from the mistakes committed in the development of the initial Fermi GF100 chip into future chip development projects? These teams might work in parallel but the learnings are definitely what contribute to improvement and drive future product goals. With Fermi GF100, Nvidia must have known from mid-late 2009 that they had problems. They fixed a few of them with GF110 in late 2010. But the major focus on perf/watt and perf/sq mm was definitely driven by the mistakes and associated learnings from Fermi.

AMD can't do this but you continue living in the clouds assuming AMD = NV or Intel in resources. They aren't in the same league, not to mention graphics is not AMD's majority business either. They have many product lines from CPUs to APUs to servers that require resources and R&D. NV has what, Tegra? They can focus 90% of their R&D and resources ONLY on graphics. That means even with identical R&D and employees, AMD has less to use towards desktop graphics. How do you still keep ignoring this?
Today AMD's future depends even more on graphics than it did a few years back. What is saving the company from death is the semi-custom wins which were driven primarily by AMD's APU expertise and graphics IP. Today Nvidia and AMD are quite similar.
1.) Both have CPU cores. Nvidia has its custom 64-bit Denver core. AMD has two CPU cores in the works - Zen and K12. From 2016 AMD can easily use Zen to address the entire breadth of their product requirements. AMD is not interested in smartphones. They are interested in tablets, notebooks and desktops. So if a Broadwell core can be used in 4.5W Core M SOCs all the way up to 80W desktop SOCs and even 150W Xeon server chips, then Zen can be used to drive all of AMD's SOCs from 4.5W to 95W.
2.) Both AMD and Nvidia develop GPU cores and standalone discrete GPUs.
3.) Both AMD and Nvidia develop SOCs - Nvidia Tegra and AMD's x86-64 SOCs and ARMv8 SOCs.

I agree AMD has fewer resources, but that does not mean they cannot compete.

Therefore, for AMD to keep up, they need to take a lot more risks in adopting new technologies to overcome the lack of resources disadvantage.
Even though there are a lot of challenges, AMD has a lot of opportunity to grow their business thanks to x86. I am looking forward to a well-designed quad-core x86-64 Zen SOC with 1024 GCN 2.0 cores and 128 GB/s of HBM bandwidth. This will most likely be faster than an HD 7870, and CPU performance should be competitive with an Intel Haswell Core i5.

That's why AMD is way more likely to try more exotic options first like HBM, WC and 20nm. They don't have time or money to do major GPU architectural redesigns. We are going to get GCN 2.0, then 3.0, maybe 4.0 before AMD ditches GCN.
I would not presume to say what AMD will do in the future. But if history is any indication, GCN iterations will be here till 2017-2018. AMD has had 3 major architectures in the past 12 years: R300 in 2002 (9700, 9800, X1800, X1900 series), R600 in 2005 (Xbox 360, HD 2900XT, HD 4870, HD 5870, HD 6970) and GCN in 2012. That does not mean these iterations will not be significant improvements in architectural efficiency.

I guess what they meant in that article is that yields are too low and the risk of defects is too high for the node to produce large-die high-performance GPUs in 2015. They are basically saying AMD and NV are going to skip 20nm entirely. This contradicts Lisa Su's statements that AMD will have graphics on the 20nm node. I can't imagine AMD building a 550mm2 die on 28nm, which is what they'd realistically need to compete. Imagine how hot and power hungry such a chip would run after the 438mm2 Hawaii? If they reduce transistor density like NV, they will have more difficulty scaling SPs, TMUs, etc.
Lisa did not state that they are making GPUs specifically. She said they are designing in 20nm but stopped short of confirming which exact products are moving to 20nm. AMD did confirm the Skybridge 20nm x86-64 and ARMv8 SOCs, but we know those are low power and small die size. Btw, what makes you think AMD cannot improve perf/shader, perf/CU and efficiency per CU? What makes you think AMD cannot go to a 128 stream processor CU design and improve perf/sq mm and perf/watt?

Let me throw this in for speculation: 32-wide SIMD - 128 sp per SIMD, 4 SIMDs per CU, 2 CUs per shader engine, 4 shader engines. So the total CU count would remain the same as in the HD 7970. Shader engine, geometry engine and raster engine counts would remain the same as in Hawaii. But they could also sport performance and efficiency improvements.

http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review/3

So you are looking at much better perf/sq mm and perf/watt due to the improved SIMD and CU layout. Add architectural improvements to bring 10-20% in perf/shader. Combine it with the better memory efficiency from Tonga (1.4x) and 60% higher memory bandwidth (512 GB/s on R9 390X vs 320 GB/s on R9 290X) (1.6 x 1.4 = 2.24), and you have 2.24 / (4096 / 2816) = 1.54 times the memory bandwidth per shader compared to Hawaii. Improved ROP performance as seen in Tonga. Add to it tessellation improvements and other tweaks.
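Putting that estimate into a quick sketch (all figures are the post's own speculation - the 512 GB/s and 4096-shader numbers for a hypothetical R9 390X are not confirmed):

```python
# Back-of-the-envelope: effective memory bandwidth per shader vs Hawaii,
# using the speculative figures from the post above.
bw_390x, bw_290x = 512, 320            # GB/s (390X figure is speculation)
tonga_efficiency = 1.4                 # claimed effective-bandwidth gain from Tonga's compression
shaders_390x, shaders_290x = 4096, 2816

effective_bw_gain = (bw_390x / bw_290x) * tonga_efficiency   # 1.6 * 1.4 = 2.24
shader_ratio = shaders_390x / shaders_290x                   # ~1.45
print(f"Effective bandwidth per shader vs Hawaii: {effective_bw_gain / shader_ratio:.2f}x")  # ~1.54x
```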

Remember, if AMD achieves 50% better perf/watt and 50% better perf than the R9 290X, they would have an extremely competitive GPU against GM200. Not to mention they already have superior multi-GPU scaling wrt CF vs SLI. My gut says AMD has gone with GF 28SHP (which also brings reduced leakage, as seen in Beema/Mullins).

http://www.anandtech.com/show/7974/...hitecture-a10-micro-6700t-performance-preview

"AMD claims a 19% reduction in core leakage/static current for Puma+ compared to Jaguar at 1.2V, and a 38% reduction for the GPU. The drop in leakage directly contributes to a substantially lower power profile for Beema and Mullins."

AMD is making Kaveri, semi-custom game console chips and GPUs at GF 28SHP. Just to remind you, GF, Amkor and Hynix have been partnering with AMD since 2011 on 2.5D stacking.

http://sites.amd.com/la/Documents/TFE2011_001AMC.pdf
http://www.amd.com/Documents/TFE2011_006HYN.pdf
http://www.amkor.com/index.cfm?objectid=E6A2243B-0017-10F6-B680958B1E902E87
http://electroiq.com/blog/2013/12/amd-and-hynix-announce-joint-development-of-hbm-memory-stacks/
http://www.setphaserstostun.org/hc2...Bandwidth-Kim-Hynix-Hot Chips HBM 2014 v7.pdf

2.5D stacking brings a fundamental change to the way in which GPUs are built, and it is no longer just the foundry partner but the alliance of partners you have worked with for years that brings this solution to market.
 
Last edited:

III-V

Senior member
Oct 12, 2014
678
1
41
You have to normalize for die size? What kind of nonsense is that?

If chip 1 is twice as large as chip 2, but consumes half the power of chip 2, it has double the performance per watt. Period.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,476
136
You have to normalize for die size? What kind of nonsense is that?

Yes you have to normalize for die size. If you want to claim that an architecture has improved perf/watt by 2x then the comparisons have to be made on a normalized die size basis.

If chip 1 is twice as large as chip 2, but consumes half the power of chip 2, it has double the performance per watt. Period.
Didn't you know that power scales with the square of voltage? The voltage required is related to clock frequency (which drives performance). So hypothetically I could achieve 2x perf/watt by going for a much larger die size and lowering voltage and clocks significantly. The downside to that logic is that chip sizes cannot be scaled without limits. Technological limits restrict chip sizes to < 650 sq mm. This approach works well on GPUs, which are massively parallel computational devices, and graphics is one of the problems that maps well to them.
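To illustrate that trade-off with a toy model (purely illustrative numbers, assuming dynamic power scales roughly as units x f x V^2 and perfect scaling across units - not real silicon data):

```python
# Toy model of the "wider die, lower clock/voltage" argument.
def rel_perf(units, freq):
    return units * freq                  # assumes ideal parallel scaling

def rel_power(units, freq, volt):
    return units * freq * volt ** 2      # dynamic power ~ f * V^2 per unit

baseline = dict(units=1.0, freq=1.0, volt=1.0)
wide_die = dict(units=2.0, freq=0.6, volt=0.8)   # 2x the units, lower clock and voltage

for name, c in (("baseline", baseline), ("wide die", wide_die)):
    perf, power = rel_perf(c["units"], c["freq"]), rel_power(**c)
    print(f"{name}: perf={perf:.2f} power={power:.2f} perf/W={perf / power:.2f}")
# The wide die ends up ~1.2x the perf at ~0.77x the power, i.e. ~1.56x the perf/W.
```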
 

f1sherman

Platinum Member
Apr 5, 2011
2,243
1
0
@raghu78

perf/W is a pretty much constant figure for a particular NV arch. And when it isn't, it's due to binning, leftover dies, bandwidth AND/OR clocks.

For example, GM204 with 10% lower clocks would murder Kepler in perf/W, and still retain its perf/mm2 and perf edge.

[chart: performance per watt at 1920x1080]
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,476
136
@raghu78
perf/W is a pretty much constant figure for a particular NV arch. And when it isn't, it's due to binning, leftover dies, bandwidth AND/OR clocks.

For example, GM204 with 10% lower clocks would murder Kepler in perf/W, and still retain its perf/mm2 and perf edge.

When you make a statement that a certain architecture brings 2x the perf/watt of an older architecture, you need to compare chips of the same size. Otherwise there is no point in making the comparison.

By your logic, GM200 should provide 2x the perf of a GK110, since GM200 should have a similar TDP (250W) to GK110 and a slightly larger die size (600 sq mm). But we all know that's not going to happen. :D
 

III-V

Senior member
Oct 12, 2014
678
1
41
Yes you have to normalize for die size. If you want to claim that an architecture has improved perf/watt by 2x then the comparisons have to be made on a normalized die size basis.
This is asinine. There is plenty of area on GPUs that has no role in frame rendering, and it will vary between manufacturer and product family. And you can't look at die shots to determine what should be left out, because Nvidia fakes theirs, and AMD rarely publishes them, leaving it up to TechInsights or Chipworks to provide them (which they rarely do). So you're left with assuming that circuitry is constant to run any guesswork, which leads to terrible imprecision.
Didn't you know that power scales with the square of voltage? The voltage required is related to clock frequency (which drives performance). So hypothetically I could achieve 2x perf/watt by going for a much larger die size and lowering voltage and clocks significantly. The downside to that logic is that chip sizes cannot be scaled without limits. Technological limits restrict chip sizes to < 650 sq mm. This approach works well on GPUs, which are massively parallel computational devices, and graphics is one of the problems that maps well to them.
The die size limit may be that small for TSMC, but larger chips have been put into mass production (Nehalem-EX, Tukwila).

Frankly, die size is not relevant to enthusiasts in most cases, except where there is a huge discrepancy between the price of a product and its projected cost based on die size, wafer cost and yield. In these cases, enthusiasts should be aware that they're getting screwed, and refuse to buy the product.

In the end, all that matters is how much one is getting for their dollar. Perf/watt/mm2 might be an interesting academic metric (if you were able to account for unutilized circuitry as I pointed out above), but perf/watt/dollar is much more important. If I'm getting something that's twice as efficient as last year's model, or twice as fast, and costs just as much, that's the most important measurement of progress.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
When you make a statement that a certain architecture brings 2x the perf/watt of an older architecture, you need to compare chips of the same size. Otherwise there is no point in making the comparison.

By your logic, GM200 should provide 2x the perf of a GK110, since GM200 should have a similar TDP (250W) to GK110 and a slightly larger die size (600 sq mm). But we all know that's not going to happen. :D

That starts to be messy. Why stop at mm2?

You can then compare transistor to transistor. And you get some silly results, like an HD 7970 being less efficient than an HD 6970.

We can also turn it around and claim that GM204 is 2.5x more efficient than a GK110 in terms of performance/transistor/watt. Or 2.68x in terms of performance/mm2/watt.

We simply need to compare end product with end product.
 

f1sherman

Platinum Member
Apr 5, 2011
2,243
1
0
Chip size is not the determining factor when it comes to perf/W, as seen above.
So why would we want to normalize for die size?

OTOH it seems to me that you are forgetting to normalize for clocks.
Or maybe not, because this is a design choice made by AMD/Nvidia themselves.
But in order to isolate arch improvements, it would certainly make more sense than "normalizing for die size".

Maybe you are talking about normalizing for transistor density, or maybe normalizing for perf/mm2.
But the former makes sense only within the same arch, and with the latter Maxwell demolishes Kepler even more.
So you are giving even more credit to Nvidia than they claim themselves in their "2x perf/W of Kepler" promo slides.

GM200 cannot achieve 2x GK110 due to lack of bandwidth.
This is a hard limit which immediately stops any further speculation about 2x GK110.
 
Last edited:

f1sherman

Platinum Member
Apr 5, 2011
2,243
1
0
OTOH it seems to me that you are forgetting to normalize for clocks.
Or maybe not, because this is a design choice made by AMD/Nvidia themselves.
But in order to isolate arch improvements, it would certainly make more sense than "normalizing for die size".

No it wouldn't. I retract this.
I have no idea what you are talking about raghu78.

perf/mm2 is up versus Kepler, so any additional normalization of perf/W (why????) via die size would only boost NV's claims.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,476
136
That starts to be messy. Why stop at mm2?

Because ultimately you don't have limitless die size. GM200 is going to be limited by die size. It will have a roughly similar TDP to GK110 and it sure as hell will not be 2x the perf of GK110. :D

You can then compare transistor to transistor. And you get some silly results, like an HD 7970 being less efficient than an HD 6970.

Different processes will mess with making straight comparisons between the HD 6970 and HD 7970. I agree perf/transistor is a good metric if the process node is the same. GM204 does very well there: close to 50% more transistors and close to 60% more perf, so improved perf/transistor.

http://www.computerbase.de/2014-09/geforce-gtx-980-970-test-sli-nvidia/6/

We can also turn it around and claim that GM204 is 2.5x more efficient than a GK110 in terms of performance/transistor/watt. Or 2.68x in terms of performance/mm2/watt.

We simply need to compare end product with end product.

Frankly, GM204 should only be compared with GK104, as it does not have the Quadro/Tesla compute features and full double-precision functionality of GK110. GM200 would be a fair comparison for GK110. :thumbsup:
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,476
136
No it wouldn't. I retract this.
I have no idea what you are talking about raghu78.

perf/mm2 is up versus Kepler, so any additional normalization of perf/W (why????) via die size would only boost NV's claims.

Take for instance perf:
GM204 - 1.6 (60% higher perf according to reviews)
GK104 - 1.0

Watt:
GM204 - 0.8 (I am being generous here, as some reviews show the GTX 980 drawing the same power as the GTX 770, while others show less than a 10% power reduction)
GK104 - 1.00

http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_980/24.html
http://www.computerbase.de/2014-09/geforce-gtx-980-970-test-sli-nvidia/12/
http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/21

Area (sq mm):
GM204 - 1.35 (398 / 294 sq mm)
GK104 - 1.00

Perf/watt = 1.6 / 0.8 = 2.0
Perf/watt/sq mm = (perf/watt) / area = 2.0 / 1.35 = 1.48

So you see, when normalized, you get the actual improvement, which is close to 50%. I would say this is the actual Maxwell architectural efficiency improvement. This ~50% perf improvement is also what we are likely to see between GM200 and GK110. :thumbsup:
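The same normalization, expressed as a quick sketch (the ratios are the rough review-derived figures quoted in this post, relative to GK104 = 1.0):

```python
# The normalization above, in code form (ratios relative to GK104).
perf  = {"GK104": 1.00, "GM204": 1.60}
power = {"GK104": 1.00, "GM204": 0.80}
area  = {"GK104": 1.00, "GM204": 398 / 294}    # ~1.35

perf_per_watt = perf["GM204"] / power["GM204"]              # 2.0
perf_per_watt_per_mm2 = perf_per_watt / area["GM204"]       # ~1.48
print(f"perf/W = {perf_per_watt:.2f}, perf/W/mm2 = {perf_per_watt_per_mm2:.2f}")
```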
 
Last edited:

f1sherman

Platinum Member
Apr 5, 2011
2,243
1
0
So what are we to do with GM107 when it comes to the "actual performance" perf/W/mm2 metric?

GTX 750Ti (GM107)

Perf = 0.55

Watt = 0.3

Area = 0.5

Perf/W/mm2 = 0.55/0.3/0.5 = 3.666

Which makes it a 266% improvement over the 680.
In what world is GM204 a 100% downgrade compared to GM107?
Their perf/W is basically the same.

You are normalizing (dividing) twice for size - Watt and mm2.
Yet you are accounting for perf only once.

Maybe:
perf^2/(Watt x Area).
That would at least make some sense. Maybe... :)
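For what it's worth, here is the same arithmetic run across the three chips using the relative figures quoted in the thread (GK104 = 1.0), purely to show how the choice of metric changes the ranking:

```python
# Relative figures as quoted in the posts above (GK104 = 1.0).
chips = {
    "GK104": dict(perf=1.00, watt=1.00, area=1.00),
    "GM204": dict(perf=1.60, watt=0.80, area=1.35),
    "GM107": dict(perf=0.55, watt=0.30, area=0.50),
}

for name, c in chips.items():
    perf_w     = c["perf"] / c["watt"]                       # perf/W
    perf_w_mm2 = perf_w / c["area"]                          # the perf/W/mm2 metric above
    alt        = c["perf"] ** 2 / (c["watt"] * c["area"])    # suggested perf^2/(W x area)
    print(f"{name}: perf/W={perf_w:.2f}  perf/W/mm2={perf_w_mm2:.2f}  perf^2/(W*mm2)={alt:.2f}")
```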
 
Last edited:

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
No it wouldn't. I retract this.
I have no idea what you are talking about raghu78.

perf/mm2 is up versus Kepler, so any additional normalization of perf/W (why????) via die size would only boost NV's claims.

His point is actually very simple. It's really incorrect to say that Maxwell has 2x the perf/watt of the Kepler architecture. Why is that? Because the 980 on the same node isn't the same size as GK104. If you made a 561mm2 GM210, it wouldn't be 2x faster than a 780 Ti. Therefore, you would have just disproved the statement above. It's more correct to say that the 28nm GM204 has 2x the performance/watt of the 28nm GK104, but we can't say anything accurate about GM210 vs. GK110 from that.

But even the statement I made that GM204 is 2x more efficient is a lie. It's more like 60-75%.

[power consumption comparison charts]


The 980 is not 2x faster than a GTX 680/770 on average, and unless you buy a reference 980, after-market 970/980 cards use way more power than 165W.

Don't get me wrong, Maxwell is very impressive given that they fixed OpenCL performance, and perf/watt is a huge jump on the 28nm node, but it's nowhere near 2x in games. If the Maxwell architecture actually had 2x the perf/watt of Kepler, then NV could have easily made a 28nm 561mm2 Maxwell successor to the 780 Ti at the same power usage with 2x the gaming performance. Does anyone believe a 28nm GM210 will be 2x faster than the 780 Ti? Most people estimate 30-60%.
 
Last edited:

f1sherman

Platinum Member
Apr 5, 2011
2,243
1
0
His point is actually very simple. It's really incorrect to say that Maxwell has 2x the perf/watt over Kepler architecture. Why is that? Because 980 [...] isn't the same size as GK104.

How about GM107?

It's even smaller than GK104, yet retains the 2x advantage.

WTH....
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
About HBM power consumption and performance/watt:
[chart: HBM vs GDDR5 power consumption]


40% lower memory power consumption compared to GDDR5 <- does that apply to the memory chips only, or is the memory bus included?

The GDDR5 memory bus is very power hungry. Do we know how the HBM memory interface fares against it in power consumption? Would HBM's lower power consumption come not only from dropping the GDDR5 chips, but also from a more efficient bus design?

I guess those details are under NDA

Also, how long until we see HBM in mainstream products? Anyone want to guess?
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
^

--

Anyway, the topic of this thread is AMD's Fiji (390X), not Maxwell. Not sure why nearly every AMD thread is getting derailed. The possibilities for Fiji are a mid-size 20nm die + HBM, or a massive 28nm die + HBM with some kind of hybrid WC system or a major redesign of the air cooling system. The chip will take all the architectural geometry, color fill-rate and memory bandwidth efficiency improvements of Tonga and go bigger in terms of SPs, TMUs and geometry/ACE engines. Since no other data is available, there is hardly much to discuss about the 390X at this point. What can be inferred though is that GM210/390X should be short-lived. With NV planning to already have Volta in supercomputers in 2017 (!), that means Pascal in 2016, and that means GM210/390X will have a very short window of bragging rights (<2 years). I expect major GPU performance increases once we get down to 16nm FinFET. This generation is going to be meh, honestly. That's because 980 SLI/290X CF cannot provide smooth gameplay at 4K in modern games. Even if you increase performance 50% from the 980, it's still not enough for 4K next-gen gaming. We would need 80-100% over the 980/780 Ti in a single-GPU card.

Since my next big upgrade is a 4K monitor, I can honestly say deep down inside I don't care for 390X or 28nm Maxwell as these will be more than likely be too slow for the task. I'll probably coast on my 7970s at 1080P until Pascal as I have no confidence Maxwell/390X will change anything for my 4K upgrade.

[4K (3840x2160) benchmark charts from gamegpu.ru: The Evil Within, Ryse: Son of Rome, Assassin's Creed Unity, Middle-earth: Shadow of Mordor, Dead Rising 3, Sniper Elite 3, Battlefield 4: Dragon's Teeth]

^ These are existing games, but what about future games?

[4K benchmark charts from gamegpu.ru: Evolve alpha, Kingdom Come: Deliverance alpha]


Once we get Witcher 3, BF5 and so on, 4K will murder GM200/390, and even in CF/SLI they will be struggling unless GM200 is nearly 80-100% faster than the 780 Ti. We really need 1 GPU = 980 SLI, and then 2-3 of them in SLI/CF, for 4K next-gen gaming. That's why, as much as it pains me to bring down the excitement for GM200/390X, I will probably be skipping them too. I have little desire to game on a small 1440P monitor, and 30-32" 1440P (3440x1440) monitors aren't far off the $999 4K IPS monitors. It's just a shame that 4K monitor prices are dropping so much faster than the performance increase the 390X will bring.

I said before that Titan was more or less a worthless future-proofing gaming card if you weren't buying 2 or more, since it was too fast for 1080P and not fast enough for 1440/1600P on its own. I said that by the time the next wave of games came out to take advantage of Titan-level performance at 1080P, we'd have cards for $300-400 with similar or faster performance. Fast forward to today and that's exactly what happened with the 290X/970 at $300-350. Now we have a situation where 970/980 SLI is perfect for 1440P but meh for 4K, even in SLI form. With GPU makers most likely moving to 16nm FinFET by 2016 and 4K IPS monitors dropping in price rapidly, GM200 and 390X are just Titan repeats. If you want 1440P performance, you have 970/980 SLI now, so what's the point of waiting 6-12 months? For 1080P, excepting broken Unity, 7970 CF / 680 or 770 SLI is good enough since they're roughly on par with a 980. For 4K, you'll likely still need 3-4 GM200s/390Xs, which is a large chunk of $.

This author agrees with me that we are stuck in no-man's land so to speak when it comes to 4K gaming right now, unless you go 3-4 high-end GPUs or turn some settings down.
 
Last edited:

96Firebird

Diamond Member
Nov 8, 2010
5,748
345
126
How much power does GDDR5 actually consume? Considering those chips are mostly passively cooled without an actual heatsink, I'd assume the GPU itself consumes (and dissipates) most of the wattage required for a card.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
How much of the lower power consumption comes from the cut-back memory system and/or from the rearranged SMM config on Maxwell? Because the power difference isn't that great compared to a 780 Ti or 290X.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,818
1,553
136
Die size can obviously be a factor in perf/W. Let's create an example where you have two chips, one 250mm2 and one 500mm2. For the sake of simplicity let's assume that the larger chip is exactly twice the smaller one, and also assume linear scaling. At the same clockspeed the 500mm2 chip is twice as fast as the 250mm2 chip. Halve the clockspeed of the 500mm2 chip and it's at around the same performance as the 250mm2 one, but the 500mm2 chip has far better perf/W because the lower clockspeed allows it to run at a far lower voltage, which is the most important adjustable factor in determining power consumption.

Basically, if you have twice the transistors doing twice the work at the same clock speed and voltage you'll have more or less 2x performance (assuming good scaling) and 2x the power consumption. If you have twice the transistors doing a bit less than twice the work, at a lower clockspeed, and at a lower voltage you'll have less than 2x performance, but better perf/w.

That said, GM204 runs at a higher clockspeed than GK104, and isn't close to a 250/300W thermal limit. Die size probably isn't much of a factor as theoretical +/-100mm2 derivatives will likely run at the same (or at least at a similar) voltage.
 
Last edited:

Abwx

Lifer
Apr 2, 2011
12,000
4,954
136
But even the statement I made that GM204 is 2x more efficient is a lie. It's more like 60-75%.

If those graphs are accurate, the perf/watt improvement is the score difference with the 780 Ti in %, to which we can add a handful of watts of lower consumption; this shouldn't be more than 30% overall.