[Rumor, Tweaktown] AMD to launch next-gen Navi graphics cards at E3

Page 107 - AnandTech Forums
Status
Not open for further replies.

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Why are they even bothering to compare a 5500 to the RX 480 instead of an RX 570 or RX 580 for this particular test, performance/watt? What are they even thinking? I don't think ANYONE cares how this card performs against a card that is 3 years (and 3 generations) old.

They could have easily run an honest test against two-generation-old (instead of three-generation-old) GPUs by slapping an RX570 into the same system they tested the 5500 in. Or against a one-generation old GPU by running the performance/watt against a Vega.

Them including an RX480 just doesn't make any sense.
Two words: OEM Launch.
 

soresu

Diamond Member
Dec 19, 2014
4,244
3,748
136
I would not go that far. Based on what we know about the 7 nm process, it should yield 2x the density of the 14 nm process.
Yes, and Polaris 10 is 220-230 mm2, so a 7nm shrink should net 110-115 mm2, but let's say 130-135 mm2 for argument's sake.

Navi 14 is 158 mm2. That's not a great difference over 135, but a huge difference over 115, especially if you compare them at 14nm, where the difference would be closer to the area of the Zen 2 CCD.

At the very least you have to acknowledge that their perf/mm2 has been on a downward slope since they moved away from VLIW. I'm truly a bit baffled as to why, to be honest.

I know that things like the HBCC, primitive shaders and AI instructions have been added, but even so the area utilisation seems woeful for a 3+ year jump on a new process.
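The shrink estimate in the post above can be sanity-checked with some quick arithmetic. This is only a sketch using the figures quoted in this thread (220-230 mm2 for Polaris 10, a claimed 2x density uplift, 158 mm2 for Navi 14), not official die data:

```python
# Back-of-the-envelope check of the shrink estimate above,
# using only figures quoted in this thread (not official data).
p10_area_14nm = (220.0, 230.0)   # mm^2, Polaris 10 on 14nm (range quoted above)
density_gain = 2.0               # claimed 7nm-vs-14nm density uplift
navi14_area = 158.0              # mm^2, Navi 14

# A perfect 2x shrink would halve the area:
shrunk_low, shrunk_high = (a / density_gain for a in p10_area_14nm)
print(f"ideal 7nm Polaris 10: {shrunk_low:.0f}-{shrunk_high:.0f} mm^2")  # 110-115
print(f"Navi 14 vs upper estimate: {navi14_area / shrunk_high:.2f}x")    # 1.37x
```

Real shrinks rarely hit the full marketed density, which is exactly the point of contention in the posts that follow.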
 

soresu

Diamond Member
Dec 19, 2014
4,244
3,748
136
Them including an RX480 just doesn't make any sense.
That was my initial thought too.

It does when you bear in mind that their efficiency increases in the GPU arena are not remotely as promising as their CPU ones, I'm sad to say.

The rumors of Navi delays might point to problems with the uArch that were not fully addressed even after Vega 7nm filled in for 6 months. Perhaps they simply got so far in bug/errata fixing and then decided to kick the remaining problems down the road a generation (and a 7nm+ process uptick, presumably).

I'm holding out hope that this gen was a rush job to get RDNA v1 off the production line so that they can concentrate on RDNA v2. I guess we'll see if I'm right next year.

Another possibility is that the wheels were still spinning so fast on the CPU/Zen team that taking people off and dedicating them to GPU took too long, or was started too late to significantly benefit RDNA v1 given the schedule they had to keep. The Turing raytracing release may have forced their hand to get a 'new' uArch out ASAP, even if its first implementation was not fully up to scratch.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Yes, and Polaris 10 is 220-230 mm2, so a 7nm shrink should net 110-115 mm2, but let's say 130-135 mm2 for argument's sake.

Navi 14 is 158 mm2. That's not a great difference over 135, but a huge difference over 115, especially if you compare them at 14nm, where the difference would be closer to the area of the Zen 2 CCD.

At the very least you have to acknowledge that their perf/mm2 has been on a downward slope since they moved away from VLIW. I'm truly a bit baffled as to why, to be honest.

I know that things like the HBCC, primitive shaders and AI instructions have been added, but even so the area utilisation seems woeful for a 3+ year jump on a new process.
If you want to speculate about how big a Polaris die would be on the 7 nm process, why don't you just take the Polaris 10 transistor count and divide it by 40 million transistors/mm2? Your assumption would then have legs to stand on; right now you are speculating based on what you believe SHOULD be happening.

Secondly - there is no guarantee that Polaris would be more efficient. Navi is clocked to hell. What makes you believe that Navi at clocks similar to the Polaris GPUs would not be more efficient (hint: it is)?
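The transistor-count method proposed here works out as follows. A sketch using the 5.7 billion transistor count for Polaris 10 and the 40 million transistors/mm2 N7 density figure that comes up later in the thread (both thread figures, not official data):

```python
# Estimate a hypothetical 7nm Polaris 10 die size from transistor count,
# as proposed above. Figures are the ones quoted in this thread.
p10_transistors = 5.7e9   # Polaris 10 transistor count
n7_density = 40e6         # transistors per mm^2 assumed achievable on N7
die_mm2 = p10_transistors / n7_density
print(f"hypothetical 7nm Polaris 10: {die_mm2:.1f} mm^2")  # 142.5 mm^2
# Navi 14 at 158 mm^2 would then be only ~11% larger.
```

This is the 142.5 mm2 figure that reappears in the argument further down the page.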
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Meaning what exactly?

That a better, non-OEM product launch will come for the 5600?

It's still odd that they launched Navi 14 first (as in before Navi 12).
It simply means that AMD launched a product for OEMs first, and that the marketing blurb has little meaning.
 

soresu

Diamond Member
Dec 19, 2014
4,244
3,748
136
Secondly - there is no guarantee that Polaris would be more efficient. Navi is clocked to hell. What makes you believe that Navi at clocks similar to the Polaris GPUs would not be more efficient (hint: it is)?
P10 is a 36 CU part - Navi 14 is a 22 CU part and still bigger than a 7nm P10 would have been without clocking any higher than it did before. Navi 14 is clearly clocking well outside of its comfort zone; that much is not in question.

Great for AMD's bottom line in regards to die costs, but terrible for consumer power efficiency. At least the console makers are trying to get that from AMD hardware, even if AMD themselves aren't.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
P10 is a 36 CU part - Navi 14 is a 22 CU part and still bigger than a 7nm P10 would have been without clocking any higher than it did before. Navi 14 is clearly clocking well outside of its comfort zone; that much is not in question.

Great for AMD's bottom line in regards to die costs, but terrible for consumer power efficiency. At least the console makers are trying to get that from AMD hardware, even if AMD themselves aren't.
You realize that, first, you do not know how Polaris 10 would scale on the 7 nm process, and secondly, that Navi is doing more with less, which is why it is more efficient?

P.S. How does the efficiency of the 36 CU Polaris 10 compare to the efficiency of the 36 CU Navi 10 GPU? ;)
 

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,227
126
I just installed my RX 5700 reference (XFX). Started mining on it with newest version of NH. Miner came up, said 18 CUs detected.

You were just talking about 36 CU Navi 10. What is the 5700, and how many CUs is it supposed to have? If Navi 14 has 22, shouldn't Navi 10 have more?

Edit: I found this subject discussed on Reddit.


Apparently, ETH mining with Claymore is memory-bandwidth-intensive, but not core-intensive, so they actually only use half of the CUs.

Maybe they'll come out with a dual-miner with RavenCoin; that would be cool.
 
Last edited:
Mar 11, 2004
23,444
5,852
146
I just installed my RX 5700 reference (XFX). Started mining on it with newest version of NH. Miner came up, said 18 CUs detected.

You were just talking about 36 CU Navi 10. What is the 5700, and how many CUs is it supposed to have? If Navi 14 has 22, shouldn't Navi 10 have more?

It has 36.
 

soresu

Diamond Member
Dec 19, 2014
4,244
3,748
136
P.S. How does the efficiency of the 36 CU Polaris 10 compare to the efficiency of the 36 CU Navi 10 GPU? ;)
How is a FAR larger GPU for the same number of CUs doing more with less?

Your assertion of P10 not scaling with process is without any basis in fact compared to my actual math. You might as well say the same of any semiconductor design once you go down that road.

I'm not saying this because I'm hating on AMD; I'm just hella confused about where their priorities for die space lie, because area efficiency does not appear to be a consideration, based on their continually increasing size for a given number of CUs per generation since GCN began in 2012.

It might be more efficient for a given number of CUs in terms of AMD's bottom line, but not in terms of absolute power efficiency to the consumer, because they have to resort to efficiency-eroding clock scaling to make up the shortfall.

Since P10 they have repeatedly picked a GPU/CU size/number which means that competing in a specific market segment requires voltages/clocks that ruin whatever intrinsic watt/MHz efficiency the uArch has, so it effectively doesn't even matter what that intrinsic efficiency is, because AMD is basically ignoring it for raw performance - leaving only the APU segment for actual efficiency.

As I've mentioned before, that's fine for people who don't care about system noise or power consumption, but I have a mind to both - for me a quiet, reasonably low-power system would be ideal, but I still want a decent mid-range card too. Is it really too much to ask?
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
How is a FAR larger GPU for the same number of CUs doing more with less?

Your assertion of P10 not scaling with process is without any basis in fact compared to my actual math. You might as well say the same of any semiconductor design once you go down that road.

I'm not saying this because I'm hating on AMD; I'm just hella confused about where their priorities for die space lie, because area efficiency does not appear to be a consideration, based on their continually increasing size for a given number of CUs per generation since GCN began in 2012.

It might be more efficient for a given number of CUs in terms of AMD's bottom line, but not in terms of absolute power efficiency to the consumer, because they have to resort to efficiency-eroding clock scaling to make up the shortfall.

Since P10 they have repeatedly picked a GPU/CU size/number which means that competing in a specific market segment requires voltages/clocks that ruin whatever intrinsic watt/MHz efficiency the uArch has, so it effectively doesn't even matter what that intrinsic efficiency is, because AMD is basically ignoring it for raw performance - leaving only the APU segment for actual efficiency.

As I've mentioned before, that's fine for people who don't care about system noise or power consumption, but I have a mind to both - for me a quiet, reasonably low-power system would be ideal, but I still want a decent mid-range card too. Is it really too much to ask?
You really should read what TSMC hyped about their N7 process, and how it turned out in reality.

What if TSMC missed their performance/power/area targets by a huge margin, and that is the reason why Nvidia skipped the N7 process entirely and went straight to an EUV process?

The RX 590 uses 230W of power. What if N7 allows you to cut that by 30%? Do you still believe it would be more efficient than Navi 14, knowing that the cut-down Navi 14 die is going to be faster than the RX 590 by around 10-15%, while still using less power?
 

soresu

Diamond Member
Dec 19, 2014
4,244
3,748
136
I really hope that AMD's bet is somehow on low clocked, ultra high efficiency, multi chiplet GPU designs in the future, because their generational efficiency gains don't even remotely match up with their 25x20 goals, at least outside of the APU arena.

They don't even seem to line up with the process gains, which looks especially bad when you see how much of a gain they got going from Zen on 14nm to Zen 2 on 7nm.

I really hope that this is more the tail end of Raja's later hardware work than any indication of the future prospects from the Zen staff that were moved onto the GPU team. I say hardware because Raja's driver focus was good - gotta give him that.
 

soresu

Diamond Member
Dec 19, 2014
4,244
3,748
136
RX 590 uses 230W of power
The RX 590 is not running at the same clockspeed as the 480; you are moving the goalposts to fit your argument.

12nm gave P30 some further headroom - headroom it had already blown past with the RX 580 - and then they asked even more of that poor piece of silicon....
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
I really hope that AMD's bet is somehow on low clocked, ultra high efficiency, multi chiplet GPU designs in the future, because their generational efficiency gains don't even remotely match up with their 25x20 goals, at least outside of the APU arena.

They don't even seem to line up with the process gains, which looks especially bad when you see how much of a gain they got going from Zen on 14nm to Zen 2 on 7nm.

I really hope that this is more the tail end of Raja's later hardware work than any indication of the future prospects from the Zen staff that were moved onto the GPU team. I say hardware because Raja's driver focus was good - gotta give him that.
Navi is just a product of two things: hardware scheduling on GPUs, and TSMC's failure to deliver the promised performance uplifts.

Nvidia has software scheduling, which saves power, and has a very good silicon (physical) design team.

But don't expect miracles from next-gen Nvidia and Intel GPUs. With smaller processes and rising heat density from transistors, we are getting into the realm of diminishing returns for large dies, like GPU dies. We may never again see a sub-75W GPU category like the one the GTX 1050 Ti occupied.
The RX 590 is not running at the same clockspeed as the 480; you are moving the goalposts to fit your argument.

12nm gave P30 some further headroom - headroom it had already blown past with the RX 580 - and then they asked even more of that poor piece of silicon....
Redacted. You clearly do not understand the principles of power scaling with clock speed and GPU design?

I will ask it this way. What makes you believe that Navi 14 running at RX 480 clock speeds would not be more efficient than that RX 480, even if Polaris 10 were on the same process?

I told you that Navi is doing more with less: ergo, more work with fewer cores. What makes you believe that scaling Polaris 10 1:1 onto the N7 process would result in a more efficient GPU, when you factor in ALL OF THE ARCHITECTURE, PROCESS AND DESIGN FACTORS?!




No profanity is allowed in the tech forums. Even using asterisks is not allowed.


esquared
Anandtech Forum Director
 
Last edited by a moderator:

soresu

Diamond Member
Dec 19, 2014
4,244
3,748
136
Navi is just a product of two things: hardware scheduling on GPUs, and TSMC's failure to deliver the promised performance uplifts.
The performance of Zen 2 vs Zen 1 makes this assertion questionable at best, and I really do mean at best.
 
Mar 11, 2004
23,444
5,852
146
P10 is a 36 CU part - Navi 14 is a 22 CU part and still bigger than a 7nm P10 would have been without clocking any higher than it did before. Navi 14 is clearly clocking well outside of its comfort zone; that much is not in question.

Great for AMD's bottom line in regards to die costs, but terrible for consumer power efficiency. At least the console makers are trying to get that from AMD hardware, even if AMD themselves aren't.

Not sure how you can say that.

This simply boils down to the complexity and costs of newer processes (which rarely offer a doubling of transistors per mm2). Can we just skip this same argument over and over? Yes, it stinks, but it's not like there's anything you can really do about it, as no one is changing that.

The reason why we aren't getting Polaris shrunk to half the size on 7nm, is because it costs a lot of money to design and engineer a GPU for a new process. So they're spending the money on a new GPU architecture that will be the building block. They shrunk Vega because it had outsized compute capability that is valued in certain other markets (and that Navi was not going to best; plus Vega is going in the early 7nm APU so it needed to be shrunk as well). They were not going to spend the money to shrink Polaris, do Navi, and do "true Navi" or whatever the architecture for the next consoles is going to be. They'd rather spend the money doing more Navi designs. Now, you can argue if they would've been better off just shrinking Polaris (just like how people did with Vega), but at some point we're going to have to move the discussion on from lamenting that we're not getting the gains we used to.

Absolutely, but even they have needs outside of pure pixel-processing capability. And that's another point of contention: even games are wanting compute and other things. I think there might be something related to cache sizes at play here as well (I thought Navi doubled some of the cache sizes?), and caches don't tend to shrink as well. Also, the design of the base GPU was allegedly tailored towards the consoles, so in many ways they're the driving force behind what we're seeing with dGPUs (for sure AMD's, but it even impacts Nvidia's, as they have to support the features that the overall market is pushing for). With GCN, AMD often had other oddities that seemed to come from doing a singular GPU design for the pro and consumer markets as those markets were diverging in needs.

I would also guess that the video processing block is larger, which is partly why we're not seeing the shrink you'd expect. I wouldn't be surprised if they've moved to a pseudo-FPGA-like block there so that they could keep the same processing block (across generations on the same node) but update it via software, although I have nothing to base that on. If that's the case, those tend to be less optimal from a transistor-count standpoint.
 

soresu

Diamond Member
Dec 19, 2014
4,244
3,748
136
What makes you believe that Navi 14 running at RX 480 clock speeds would not be more efficient than that RX 480, even if Polaris 10 were on the same process?
It would still be significantly larger than P10 at 7nm, as I detailed before. You are dissembling, and for what reason I know not, considering I am not some nVidia lifer here.

I have run an AMD-only system for 10 years now, and I'm fine with that - I've just had it with their assertions of efficiency that never actually materialised in basically every segment but APUs.

Remember the 2.8x claims of Polaris? Or the 4x claim of Vega?

Sure, Raven Ridge's efficiency was great, but we sure as heck didn't see that efficiency in the discrete GPU segments - I know, because I waited long and hard for it.

Still waiting......
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
The performance of Zen 2 vs Zen 1 makes this assertion questionable at best, and I really do mean at best.
Uhhh, you do realize that Zen 2 gains performance over Zen 1 mostly from increased IPC? You do realize that TSMC promised a 30% clock frequency uplift at the same power, and we got at best a +10% uplift at the same power?

Compare the R7 1800X numbers with the R7 3700X. Both use the same amount of power, despite their TDP ratings, and they differ by 10% in maximum boost clock, which is rarely achieved.
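For reference, the ~10% figure checks out against the published maximum boost clocks of the two chips (4.0 GHz for the R7 1800X, 4.4 GHz for the R7 3700X):

```python
# Quick check of the frequency-uplift claim: Zen (14nm) vs Zen 2 (7nm)
# at comparable power, using published maximum boost clocks.
boost_1800x = 4.0   # GHz, Ryzen 7 1800X
boost_3700x = 4.4   # GHz, Ryzen 7 3700X
uplift = boost_3700x / boost_1800x - 1.0
print(f"maximum boost clock uplift: {uplift:.0%}")  # 10%
# Well short of the ~30% frequency gain TSMC projected for N7.
```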
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
It would still be significantly larger than P10 at 7nm, as I detailed before. You are dissembling, and for what reason I know not, considering I am not some nVidia lifer here.

I have run an AMD-only system for 10 years now, and I'm fine with that - I've just had it with their assertions of efficiency that never actually materialised in basically every segment but APUs.

Remember the 2.8x claims of Polaris? Or the 4x claim of Vega?

Sure, Raven Ridge's efficiency was great, but we sure as heck didn't see that efficiency in the discrete GPU segments - I know, because I waited long and hard for it.

Still waiting......
**** me...

I told you: take the 5.7 billion transistors of P10 and scale them 1:1 to what is achievable on N7 - 40 million transistors/mm2. You get a 142.5 mm2 die. How big is Navi 14? 158 mm2. Massively bigger?

Secondly, the RX 480 had a 167W power draw. Let's say its clocks stay exactly the same, so we get the 30% power cut from the mythical N7 process. That gives roughly 120W of power draw. What makes you believe that Navi 14 will consume more power than 120W?

Thirdly, let's take Overwatch numbers as an indication of its performance: 135 FPS at 1080p, Epic settings, for a 1408 ALU GPU (cut down from a 1536 ALU die), paired with an R7 3800X CPU.

The RX 480 achieves 110 FPS with Zen CPUs in this very game - 25 FPS less than Navi 14.

Fourth, the mobile version of the RX 5500 has an 85W TDP, with very high clocks on the same level as those of... the RX 590. Can the RX 590 be squeezed down to this low a TDP on the N7 process? If not, why do you believe it would be more efficient (higher performance in the same thermal envelope, or the same performance in a lower one), when we can actually see that it is not the case at all?
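The second and third points above can be put side by side numerically. This is a sketch using only the wattage and FPS figures quoted in the post; the 30% power cut is the hypothetical being argued, not a measurement:

```python
# Hypothetical 7nm Polaris vs Navi 14, using the figures quoted above.
rx480_power = 167.0   # W, RX 480 power draw
n7_power_cut = 0.30   # assumed N7 power reduction at unchanged clocks
shrunk_power = rx480_power * (1.0 - n7_power_cut)

navi14_fps = 135.0    # Overwatch, 1080p Epic (quoted above)
rx480_fps = 110.0     # Overwatch, same settings (quoted above)
fps_gain = navi14_fps / rx480_fps - 1.0

print(f"hypothetical 7nm RX 480: {shrunk_power:.0f} W")  # ~117 W
print(f"Navi 14 FPS advantage: {fps_gain:.0%}")          # ~23%
```

So the argument is that Navi 14 would need to draw more than roughly 117-120 W to lose this comparison, while also being ~23% faster in the quoted Overwatch numbers.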
 

soresu

Diamond Member
Dec 19, 2014
4,244
3,748
136
Uhhh, you do realize that Zen 2 gains performance over Zen 1 mostly from increased IPC? You do realize that TSMC promised a 30% clock frequency uplift at the same power, and we got at best a +10% uplift at the same power?
Uhhhh, doubled core counts? Doubled floating point SIMD?

Perhaps you might want to look into that before marking yourself as a fool.

The GAMING gains might be poor, but the gains in multi-threaded DCC areas like video encoding, decoding or RT/PT rendering are far beyond 10%, especially for the same mm2 as Zen.

Not everyone buys a computer to smash a keyboard and mouse for hours on end.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Uhhhh, doubled core counts? Doubled floating point SIMD?

Perhaps you might want to look into that before marking yourself as a fool.

The GAMING gains might be poor, but the gains in multi-threaded DCC areas like video encoding, decoding or RT/PT rendering are far beyond 10%, especially for the same mm2 as Zen.

Not everyone buys a computer to smash a keyboard and mouse for hours on end.
You tell me that I look like a fool, and then you respond with a post like this?

Do you even understand what is being talked about here?

You realize that I said that CLOCK FREQUENCY got a 10% uplift at the same power versus Zen 1, despite TSMC claiming it would be 30%?

Do you even realize what is being discussed?
 

soresu

Diamond Member
Dec 19, 2014
4,244
3,748
136
Let's take Overwatch numbers as an indication of its performance
It still confuses me why this is being used as a benchmark at all.

The pre-rendered stuff in Overwatch is truly fantastic, but the game itself is certainly nothing special in regards to gfx - benching a card on an NPR-gfx-focused game seems like a non sequitur in action.

Same as using Civilisation to bench a card - it's less about the card than about how bad certain developers are at coding engines and shaders, it seems to me.

I do love Diablo I might add.

...Shields eyes from various weapons thrown in my direction as I run away......
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
It still confuses me why this is being used as a benchmark at all.

The pre-rendered stuff in Overwatch is truly fantastic, but the game itself is certainly nothing special in regards to gfx - benching a card on an NPR-gfx-focused game seems like a non sequitur in action.

Same as using Civilisation to bench a card - it's less about the card than about how bad certain developers are at coding engines and shaders, it seems to me.

I do love Diablo I might add.

...Shields eyes from various weapons thrown in my direction as I run away......
Overwatch is one of the best benchmarks because it behaves completely predictably across hardware and different maps. It also shows how good some developers are at optimizing their engine.
 