AMD's next GPU uarch is called "Polaris"


raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
I think it'll probably be a 1536 shader part, maybe cut down to 1408 for demo/yield purposes. With GCN, AMD nearly always needs more shaders than the competition to compensate for the clock speed, amongst other things; I can't really see that changing with Polaris.

I think the 110 sq mm die is most likely a 1024 SP part, a true next-gen successor to the original HD 7770 (Cape Verde). AMD is trying to improve perf/SP, as that's the key to being competitive against Nvidia's Pascal. AMD realizes that there is a lot of room for shader efficiency improvement, and that's why they have tackled it in Polaris. If AMD's perf/SP can get as close as possible to Nvidia's perf per CUDA core, then we have a good contest. I am looking forward to the products from both camps. It will be an interesting year. But Nvidia has a massive advantage, as they have tremendous market share and mindshare. AMD's failure to compete during the Maxwell generation hurt its market share and brand badly. Pascal will be improving on the impressive Maxwell architecture, so AMD faces an uphill task.
 

Techhog

Platinum Member
Sep 11, 2013
2,834
2
26
I think the 110 sq mm die is most likely a 1024 SP part, a true next-gen successor to the original HD 7770 (Cape Verde). AMD is trying to improve perf/SP, as that's the key to being competitive against Nvidia's Pascal. AMD realizes that there is a lot of room for shader efficiency improvement, and that's why they have tackled it in Polaris. If AMD's perf/SP can get as close as possible to Nvidia's perf per CUDA core, then we have a good contest. I am looking forward to the products from both camps. It will be an interesting year. But Nvidia has a massive advantage, as they have tremendous market share and mindshare. AMD's failure to compete during the Maxwell generation hurt its market share and brand badly. Pascal will be improving on the impressive Maxwell architecture, so AMD faces an uphill task.

This isn't a major revision, though. I don't think that much will change in that area.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
This isn't a major revision, though. I don't think that much will change in that area.

Since AMD specifically mentions improved shader efficiency, we have to wait and see how much it has improved. If AMD wants Polaris to compete with Nvidia's Pascal, then the most important area of improvement is perf/SP, which will directly affect perf/watt and perf/sq mm. GCN has to improve efficiency in a big way to stand a chance of competing.

http://www.pcper.com/reviews/Graphi...hnologies-Group-Previews-Polaris-Architecture

"How is Polaris able to achieve these types of improvements? It comes from a combination of architectural changes and process technology changes. Even RTG staff were willing to admit that the move to 14nm FinFET process tech was the majority factor for the improvement we are seeing here, something on the order of a 70/30 split. "

So I would not conclude that Polaris is a minor revision until we see actual reviews of the product.
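
To put rough numbers on that 70/30 attribution (this is just my own back-of-the-envelope interpretation of the PCPer quote, and the ~2.5x perf/watt figure is the headline number AMD has been throwing around, not something from this article): if you split the total gain geometrically, the two contributions would look something like this:

Code:
# Purely illustrative: splitting a claimed ~2.5x perf/watt gain into
# process vs. architecture contributions, assuming the "70/30" figure
# describes a multiplicative (geometric) split.
total_gain = 2.5        # headline perf/watt improvement claimed for Polaris
process_share = 0.7     # "70/30 split" per the PCPer quote above
arch_share = 0.3

process_factor = total_gain ** process_share   # ~1.90x from 14nm FinFET
arch_factor = total_gain ** arch_share         # ~1.32x from the architecture
print(f"process ~{process_factor:.2f}x, architecture ~{arch_factor:.2f}x")
print(f"combined ~{process_factor * arch_factor:.2f}x")  # multiplies back to ~2.5x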
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Raghu, IMO, if we take all the info we currently have (test setup, power draws), I think a 1024 GCN core GPU is too small.

That bit about the 70/30 split in efficiency gains between the new process and the new architecture is pretty interesting.
http://www.samsung.com/semiconductor/foundry/about-samsung-foundry/
According to Samsung, 14nm LPE offers 40% higher performance, 60% lower power consumption, and a 50% smaller die size compared to 28nm LPP, and 14nm LPP adds a further 10% improvement in performance and power. The thing is, that comparison is against Samsung's own 28nm process, not TSMC's. If I remember correctly, that Samsung process was more efficient than TSMC's, and Apple bought the whole A7 production run from Samsung on it.

So the differences may be bigger than we think. Also, a ~60-70% reduction in power consumption at the same speed already works out to around 2.5x better efficiency on its own...
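
To make that arithmetic explicit (my own numbers, just plugging in the Samsung marketing figures above):

Code:
# "X% lower power at the same performance" expressed as a perf/watt multiplier.
def efficiency_multiplier(power_reduction):
    # power_reduction is a fraction, e.g. 0.60 for "60% lower power"
    return 1.0 / (1.0 - power_reduction)

print(efficiency_multiplier(0.60))            # 14LPE's claimed 60% -> 2.5x
print(efficiency_multiplier(1 - 0.40 * 0.90)) # LPE plus LPP's extra 10% -> ~2.8x
print(efficiency_multiplier(0.70))            # a 70% reduction -> ~3.3x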

I would bet that the small chip that we are discussing here has somewhere between 1792 and 2048 GCN cores.
 
Last edited:

MrTeal

Diamond Member
Dec 7, 2003
3,916
2,700
136
Raghu, IMO, if we take all the info we currently have (test setup, power draws), I think a 1024 GCN core GPU is too small.

That bit about the 70/30 split in efficiency gains between the new process and the new architecture is pretty interesting.
http://www.samsung.com/semiconductor/foundry/about-samsung-foundry/
According to Samsung, 14nm LPE offers 40% higher performance, 60% lower power consumption, and a 50% smaller die size compared to 28nm LPP, and 14nm LPP adds a further 10% improvement in performance and power. The thing is, that comparison is against Samsung's own 28nm process, not TSMC's. If I remember correctly, that Samsung process was more efficient than TSMC's, and Apple bought the whole A7 production run from Samsung on it.

So the differences may be bigger than we think. Also, a ~60-70% reduction in power consumption at the same speed already works out to around 2.5x better efficiency on its own...

I would bet that the small chip that we are discussing here has somewhere between 1792 and 2048 GCN cores.

It all depends on the actual die size, but that would be pretty massive scaling. 2048 is the same shader count as Tahiti, which was a ~350 mm² die. I'm expecting more along the lines of the claimed 2x scaling, and something like 1024 to 1280 shaders, depending on whether the die ends up closer to 100 or 120 mm².
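
For what it's worth, here's the rough area math behind that guess. The 28nm die sizes are approximate, and the flat 2x scaling factor is an assumption; analog and IO blocks won't shrink that well in reality:

Code:
# Ballpark check: which 28nm GCN die "fits" in ~110 mm^2 at a claimed ~2x
# area scaling. Die sizes are approximate; scaling is assumed uniform.
reference_dies_28nm = {
    "Cape Verde": (640, 123),   # (shader count, die area in mm^2)
    "Pitcairn":   (1280, 212),
    "Tahiti":     (2048, 352),
}
density_scaling = 2.0   # claimed area scaling from 28nm planar to FinFET
for name, (shaders, area) in reference_dies_28nm.items():
    print(f"{name} ({shaders} shaders): ~{area / density_scaling:.0f} mm^2 after shrink")
# Pitcairn-class lands right around ~106 mm^2; a Tahiti-class 2048-shader
# part would still need ~176 mm^2 -- hence the 1024-1280 estimate.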
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Given how AMD leveraged Pitcairn, I also think the chip that was demoed is basically a GCN4 version of Pitcairn at ~100-120 mm². 1280 shaders would even allow for some harvesting into 1024 and possibly even 768 shader SKUs.
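
The CU math works out cleanly too; GCN groups shaders into 64-wide compute units, so harvested SKUs step down in multiples of 64 (just my own illustration of the numbers above):

Code:
# A 20-CU die harvests cleanly into the SKUs mentioned above.
SHADERS_PER_CU = 64
full_die_cus = 20                  # 20 * 64 = 1280 shaders
for disabled in (0, 4, 8):         # full die plus two harvested tiers
    cus = full_die_cus - disabled
    print(f"{cus} CUs -> {cus * SHADERS_PER_CU} shaders")
# 20 CUs -> 1280, 16 CUs -> 1024, 12 CUs -> 768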
 

Techhog

Platinum Member
Sep 11, 2013
2,834
2
26
It all depends on the actual die size, but that would be pretty massive scaling. 2048 is the same shader count as Tahiti, which was a ~350 mm² die. I'm expecting more along the lines of the claimed 2x scaling, and something like 1024 to 1280 shaders, depending on whether the die ends up closer to 100 or 120 mm².

You should be looking at Tonga, rather than Tahiti. (Not trying to make a point or anything; just pointing that out.)
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
This isn't a major revision, though. I don't think that much will change in that area.

It certainly looks like it will be a major revision. The CUs, geometry processor, command processor, L2 cache, and memory controller are all said by AMD to be new. (Of course that doesn't mean completely redone from scratch, but there should be improvements.) That's in addition to the changes we already know about in the multimedia and display sections (HEVC Main10, HDMI 2.0, DP 1.3, etc.). The presentation touts "improved shader efficiency", so we can't necessarily assume that, for instance, 1280 Polaris shaders will only provide Pitcairn-level performance.

I think there were quite a few goodies we were supposed to get as part of the cancelled 20nm GPU line which ended up being held over for FinFET. If we use the existing unofficial revisions (1.0, 1.1, 1.2) for the current GCN cards, then this would be "GCN 2.0" - probably the most substantial revision GCN has had to date.
 

stuff_me_good

Senior member
Nov 2, 2013
206
35
91
I've always wondered why AMD doesn't go the same route as Nvidia when designing future GPUs. It seems that the fewer cores you have, the higher you can crank the clocks. In AMD's approach the chip has more, smaller cores than Nvidia's, and while the chip is fast and efficient up to a certain point, for some reason the GPU does not scale up very well when adding more shader cores, just like we have seen with Fury. And because of such a massive number of small cores, it's harder to crank the frequency up.

Nvidia's approach seems to be more balanced in terms of core size vs. clocks. They have managed to make fast cores that clock high even though each individual core is bigger than AMD's.
 

USER8000

Golden Member
Jun 23, 2012
1,542
780
136
I've always wondered why AMD doesn't go the same route as Nvidia when designing future GPUs. It seems that the fewer cores you have, the higher you can crank the clocks. In AMD's approach the chip has more, smaller cores than Nvidia's, and while the chip is fast and efficient up to a certain point, for some reason the GPU does not scale up very well when adding more shader cores, just like we have seen with Fury. And because of such a massive number of small cores, it's harder to crank the frequency up.

Nvidia's approach seems to be more balanced in terms of core size vs. clocks. They have managed to make fast cores that clock high even though each individual core is bigger than AMD's.

Part of the problem is that AMD is using hardware scheduling with GCN rather than the software scheduling Nvidia has used from Kepler onwards, so I expect the lower clock speeds might be down to the additional hardware they need for each shader group.
 

maddie

Diamond Member
Jul 18, 2010
5,147
5,523
136
I've always wondered why AMD doesn't go the same route as Nvidia when designing future GPUs. It seems that the fewer cores you have, the higher you can crank the clocks. In AMD's approach the chip has more, smaller cores than Nvidia's, and while the chip is fast and efficient up to a certain point, for some reason the GPU does not scale up very well when adding more shader cores, just like we have seen with Fury. And because of such a massive number of small cores, it's harder to crank the frequency up.

Nvidia's approach seems to be more balanced in terms of core size vs. clocks. They have managed to make fast cores that clock high even though each individual core is bigger than AMD's.
Which path do I choose? I consider this a good example of the consequences of high-level design choices.

You have to remember that Nvidia was in a similar situation a few generations ago. They had complex, double-clocked shaders and, starting with Kepler, jumped to the AMD path of many simpler shaders. They were facing the negative consequences of an earlier design decision.
 
Last edited:

zlejedi

Senior member
Mar 23, 2009
303
0
0
It's also a matter of how much you can afford. AMD needs one architecture that is good for APUs, consoles, and the professional market, and that stays relevant for years, whereas Nvidia can design a 600 mm² GPU almost completely devoid of FP64 capability just for the gaming market and a subsegment of the professional market, while still maintaining production of slightly tweaked Kepler for professional users.

So the whole Maxwell vs GCN era is a classic case of a specialised solution outperforming a more universal design.

Personally I prefer the more specialised approach, since I don't care about BOINC anymore, and future-proofing for X years ahead is pointless given my software buying habits of picking up GOTY versions in a Steam sale long after release.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
I've always wondered why AMD doesn't go the same route as Nvidia when designing future GPUs. It seems that the fewer cores you have, the higher you can crank the clocks.

The shader count isn't the problem. GTX 980 (2048 shaders) and GTX 980 Ti (2816 shaders) can get boost clocks much higher than the AMD cards with the same shader counts. It's the design of the shader cores that causes the issues AMD is having. AMD's current GCN cores run most efficiently at clock speeds between 800-900 MHz, while Nvidia Maxwell cores can easily do 1100-1200 MHz without hogging too much power.

This is the #1 reason why AMD's perf/watt looks so bad. If you compare a first-generation Pitcairn 7850 card even to Maxwell, it stacks up surprisingly well, since it runs at GCN's sweet spot. But AMD had to compete with newer Nvidia chips, and without the funds for new 28nm designs, the only option they had was to crank up the core and memory clocks. This killed efficiency. Hawaii is known for being hot and power-hungry, but if it were run at 800-900 MHz instead of 1000+, it would be far, far more efficient. The FirePro W8100, a professional Hawaii card, was tested at only 188W at full load. That's nearly 100W less than the corresponding consumer card (R9 290), and while part of this is due to binning, the biggest factor is that the W8100 runs at the lower clock speeds GCN was actually designed for.
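
A crude way to see why running past the sweet spot hurts so much (this is my own back-of-the-envelope model, not anything AMD has published, and the voltages are hypothetical): dynamic power scales roughly with frequency times voltage squared, and past the efficient part of the voltage/frequency curve the voltage has to climb steeply for every extra MHz.

Code:
# Very rough dynamic-power model: P ~ f * V^2. The voltages here are
# hypothetical, chosen only to show how a modest clock bump past the
# sweet spot can cost a disproportionate amount of power.
def relative_power(freq_mhz, vcore, base_freq=800.0, base_v=1.00):
    return (freq_mhz / base_freq) * (vcore / base_v) ** 2

print(f"{relative_power(800, 1.00):.2f}x")   # near GCN's efficiency sweet spot
print(f"{relative_power(947, 1.13):.2f}x")   # ~1.5x power for Hawaii-style clocks
print(f"{relative_power(1050, 1.20):.2f}x")  # ~1.9x power pushing even further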

In AMD's approach the chip has more, smaller cores than Nvidia's

AMD's cores are not "smaller". In many ways, they are actually more powerful than Nvidia's, especially when it comes to GPGPU. As others pointed out, they have hardware schedulers, whereas Nvidia has stuck with software scheduling for Kepler and Maxwell. (Fermi had a hardware scheduler, and was notorious for being hot, loud, and power-hungry. And Fermi, like GCN, was created as a compute-first design.)

and while the chip is fast and efficient up to a certain point, for some reason the GPU does not scale up very well when adding more shader cores, just like we have seen with Fury.

Fury has other problems. It's bottlenecked by either the limited ROP count or by its front end, or both. This is evident in the fact that the 3584-core salvage chip is only a tiny bit inferior to the 4096-core full-fat version, as well as the fact that it barely beats Hawaii in some gaming applications.
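
The scaling arithmetic makes the point (the few-percent gap is roughly what reviews showed, but treat the exact figure below as illustrative rather than a measurement from any specific review):

Code:
# If Fiji were shader-bound, disabling 512 of 4096 shaders should cost a
# proportional amount of performance. It doesn't, which points to the
# shared front end / 64 ROPs as the real limit.
full_shaders, salvage_shaders = 4096, 3584
expected_loss = 1 - salvage_shaders / full_shaders   # 12.5% if shader-bound
observed_loss = 0.03                                 # illustrative, not measured
print(f"expected loss if shader-bound: {expected_loss:.1%}")
print(f"rough observed loss:           {observed_loss:.1%}")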
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
AMD's cores are not "smaller". In many ways, they are actually more powerful than Nvidia's, especially when it comes to GPGPU. As others pointed out, they have hardware schedulers, whereas Nvidia has stuck with software scheduling for Kepler and Maxwell. (Fermi had a hardware scheduler, and was notorious for being hot, loud, and power-hungry. And Fermi, like GCN, was created as a compute-first design.)

You should go back and read the Anandtech article again:
With that said, in discussing Kepler with NVIDIA's Jonah Alben, one thing that was made clear is that NVIDIA does consider this the better way to go. They're pleased with the performance and efficiency they're getting out of software scheduling, going so far as to say that had they known what they know now about software versus hardware scheduling, they would have done Fermi differently. But whether this only applies to consumer GPUs or if it will apply to Big Kepler too remains to be seen.
http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/3

There are no real advantages to hardware over software scheduling.
And Maxwell is on par with AMD when it comes to compute.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
What would be really nice is if people in this thread who are speaking as authorities gave their credentials. If you don't have any, then please say that what you're giving us is an uneducated guess, not "This is why X is the way it is, period," when you honestly don't have a clue and are just making it up or regurgitating something someone with half a clue said. It's beyond annoying to have to wade through all the BS just to hope to find a bit of actual information.
 

rgallant

Golden Member
Apr 14, 2007
1,361
11
81
Did anyone see the difference?
https://www.youtube.com/watch?v=oNA6fll2DDQ

The left panel's colors are sharper and more saturated (GTX 950), whereas the right panel's colors are totally washed out, and that's the one equipped with AMD's next gen.
Should have used better drivers on that ES sample, maybe; they must have a few on file, lol.
Also, the power meters were different, so I guess in your mind it's all BS.
 

stuff_me_good

Senior member
Nov 2, 2013
206
35
91
First you say this...

The shader count isn't the problem.

And then this...
Fury has other problems. It's bottlenecked by either the limited ROP count or by its front end, or both. This is evident in the fact that the 3584-core salvage chip is only a tiny bit inferior to the 4096-core full-fat version, as well as the fact that it barely beats Hawaii in some gaming applications.
You were saying that what I said was wrong, but in the end you just confirmed it yourself.

So yeah, GCN's sweet spot is around 800 MHz, and in theory you could just cram more shaders onto the chip to compensate for the low clocks, but that didn't work out so well, as we saw with Fury.
 

Techhog

Platinum Member
Sep 11, 2013
2,834
2
26
First you say this...



And then this...

You were saying that what I said was wrong, but in the end you just confirmed it yourself.

So yeah, GCN's sweet spot is around 800 MHz, and in theory you could just cram more shaders onto the chip to compensate for the low clocks, but that didn't work out so well, as we saw with Fury.

No, you just misread what he was saying... unless you don't know what a ROP is? He's saying that there's another bottleneck at play.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
I think it's a good sign for AMD that they have actual cards based on at least two dies floating around, while Nvidia is using Maxwell chips as stand-ins for Pascal in their new car platform showcase. If they get little Polaris out soon enough, they'll likely get a lot of sales over the Nvidia 750/950 before even having to face off against any FinFET competition. Let little Polaris generate some buzz and goodwill for a month or two, then launch the next tier up.


Can't wait to see how AMD marketing messes this up, heh.