New Zen microarchitecture details


The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I'd also bet that if you manage to find a low leakage ASIC (<65%) you can get some serious power efficiency once you set the clocks and the voltages within the efficient range. I'd bet that you can get a 40-50% reduction in power by losing < 20% of the performance.
 
Last edited:

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Pretty hard to tell what is going on with 14nm LPP currently... Is the huge variation because of GlobalFoundries or the process itself? It would be interesting and helpful to see identical parts manufactured at both GlobalFoundries and Samsung fabs, but obviously that's much easier said than done.

Looking at the variation of the most recent 32nm SHP silicon version manufactured in July 2014 (80 samples), the absolute maximum variation in SIDD is 33.1%. When the couple of rare extremes at both ends are removed, the average variation is just 10.84%.

The more recent GPU-Z versions store the "ASIC Leakage" value in a database, so that the software can tell where your specimen stands relative to the average quality of the same type of ASIC. Ellesmere (P10) has been available for three weeks now, and we've already seen a huge variation in the static leakage. The range I managed to find in public was 65.7% - 89.3% (672 - 914 LeakageID). Since both of the screenshots indicated that neither of these was the absolute minimum or maximum figure seen, I asked the author (w1zzard @ TPU) what the absolute lows and highs recorded so far were. The answer was 62.4% - 94.7% (638 - 969). 51.9% variation, after three weeks in. D:
Are the absolute leakage scales of older and newer GPUs the same? E.g. 70% could mean 500 mA for Bonaire and 300 mA for Polaris (made up numbers).
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Are the absolute leakage scales of older and newer GPUs the same? E.g. 70% could mean 500 mA for Bonaire and 300 mA for Polaris (made up numbers).

The actual physical scale for Polaris is unknown; however, that doesn't change the fact that the variation is absolutely massive compared to any previous ASIC.

The physical ranges:

Bonaire XT = 6.0A (floor), 24.0A (ceil)
Tonga XT = 11.0A (floor), 28.0A (ceil)
Hawaii XT = 21.0A (floor), 34.0A (ceil)
Fiji XT = 26.0A (floor), 46.0A (ceil)

Ceil = ASIC Quality 100.0% (LeakageID 1023)

I used full parts because generally on the harvested parts the variation is somewhat higher.
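
For a sense of how the LeakageID, the "ASIC Quality" percentage, and these physical ranges could fit together, here is a minimal C sketch. Only the endpoints come from the post (ceiling = LeakageID 1023 = ASIC Quality 100.0%); the assumption that the floor sits at LeakageID 0 and that the mapping between them is linear is mine.

```c
/* A minimal sketch of how a LeakageID might map onto the physical
 * ranges above. Only the endpoints come from the post: the ceiling is
 * LeakageID 1023 (ASIC Quality 100.0%). Assuming the floor sits at
 * LeakageID 0 and that the mapping is linear is my own guess. */
#include <stdio.h>

struct asic_scale { const char *name; double floor_a, ceil_a; };

static double est_leakage_amps(struct asic_scale s, int leakage_id)
{
    return s.floor_a + (s.ceil_a - s.floor_a) * leakage_id / 1023.0;
}

int main(void)
{
    struct asic_scale hawaii = { "Hawaii XT", 21.0, 34.0 };
    /* e.g. LeakageID 672 (~65.7% "ASIC Quality") */
    printf("%s @ ID 672: ~%.1f A static leakage\n",
           hawaii.name, est_leakage_amps(hawaii, 672)); /* ~29.5 A */
    return 0;
}
```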
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
The actual physical scale for Polaris is unknown; however, that doesn't change the fact that the variation is absolutely massive compared to any previous ASIC.

The physical ranges:

Bonaire XT = 6.0A (floor), 24.0A (ceil)
Tonga XT = 11.0A (floor), 28.0A (ceil)
Hawaii XT = 21.0A (floor), 34.0A (ceil)
Fiji XT = 26.0A (floor), 46.0A (ceil)

Ceil = ASIC Quality 100.0% (LeakageID 1023)

I used full parts because generally on the harvested parts the variation is somewhat higher.
Thx. I see. But without knowing the physical range, the leakage range might even be small for Polaris and only look like a huge variation. As long as we don't know it, we can't rule out this possibility.

For the bigger dies, the ceiling is ~ die size * 0.0775. If that still holds true for Polaris, its ceiling would be 18A. But due to very different process characteristics, this value might be anywhere. A measurement with disturbances like power gating disabled could reveal the actual values of different cards.
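
For what it's worth, the heuristic checks out against The Stilt's ceilings for the three bigger dies; a quick C check (the approximate die areas are publicly quoted figures, not from this thread):

```c
/* Quick check of the "ceiling ~= die area (mm^2) * 0.0775 A" heuristic
 * against the ceilings listed earlier. Die areas are approximate
 * public figures, not from this thread. */
#include <stdio.h>

int main(void)
{
    const struct { const char *name; double area_mm2; } dies[] = {
        { "Tonga XT",   359.0 },  /* listed ceiling: 28 A */
        { "Hawaii XT",  438.0 },  /* listed ceiling: 34 A */
        { "Fiji XT",    596.0 },  /* listed ceiling: 46 A */
        { "Polaris 10", 232.0 },  /* heuristic estimate: ~18 A */
    };
    for (int i = 0; i < 4; i++)
        printf("%-10s ceiling ~= %4.1f A\n",
               dies[i].name, dies[i].area_mm2 * 0.0775);
    return 0;
}
```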
 
Last edited:

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
The more recent GPU-Z versions store the "ASIC Leakage" value in a database, so that the software can tell where your specimen stands relative to the average quality of the same type of ASIC. Ellesmere (P10) has been available for three weeks now, and we've already seen a huge variation in the static leakage. The range I managed to find in public was 65.7% - 89.3% (672 - 914 LeakageID). Since both of the screenshots indicated that neither of these was the absolute minimum or maximum figure seen, I asked the author (w1zzard @ TPU) what the absolute lows and highs recorded so far were. The answer was 62.4% - 94.7% (638 - 969). 51.9% variation, after three weeks in. D:

It's not surprising that the immature GloFo 14LPP process would have a lot of variation. Nor is it surprising that AMD would use a generous binning process for a low-margin, high-volume chip like Polaris 10. What does surprise me is that the touted power-calibration features don't seem to be working at all. Apparently every chip gets 1100-1150 mV by default, even though most of them work just fine at 1050 mV maximum (and even get a clock boost since they then don't hit the power limit as often). Wasn't all the new voltage regulation technology supposed to calibrate this on each boot? But it looks like we still have the same old "overvolt everything!" mentality.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
It's not surprising that the immature GloFo 14LPP process would have a lot of variation. Nor is it surprising that AMD would use a generous binning process for a low-margin, high-volume chip like Polaris 10. What does surprise me is that the touted power-calibration features don't seem to be working at all. Apparently every chip gets 1100-1150 mV by default, even though most of them work just fine at 1050 mV maximum (and even get a clock boost since they then don't hit the power limit as often). Wasn't all the new voltage regulation technology supposed to calibrate this on each boot? But it looks like we still have the same old "overvolt everything!" mentality.

1150mV is the absolute maximum default voltage for P10. No ASIC specimen will exceed it at stock. The lower limit is 800mV but I haven't seen any ASIC with less than 1075mV regardless of the leakage.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
1150mV is the absolute maximum default voltage for P10. No ASIC specimen will exceed it at stock. The lower limit is 800mV but I haven't seen any ASIC with less than 1075mV regardless of the leakage.

The reports from owners seem to suggest that the cards always use the 1150mV max voltage. At least from what I saw on the Reddit under-volting forum.
 

KTE

Senior member
May 26, 2016
478
130
76
I'd also bet that if you manage to find a low leakage ASIC (<65%) you can get some serious power efficiency once you set the clocks and the voltages within the efficient range. I'd bet that you can get a 40-50% reduction in power by losing < 20% of the performance.

IF you can get that routinely on any product, it's a low-power process being clocked beyond its efficiency range. That would worry me the most.

Sent from HTC 10
(Opinions are own)
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
The reports from owners seem to suggest that the cards always use the 1150mV max voltage. At least from what I saw on the Reddit under-volting forum.

If that's the case then I'd say something is broken, and pretty badly.

The leakage-based default voltage has been used at least since R600, I believe. 1150mV should be the default voltage only for the lowest leaking specimens, while the higher leaking ones should have a significantly lower default voltage.

For example on GCN2-3 ASICs (for the highest DPM state):

Bonaire XT (1100MHz) - HL = 1.10625V - LL = 1.22500V
Hawaii XT (1050MHz) - HL HC = 1.20625V - HL LC = 1.21875V - LL = 1.28125V
Tonga XT (970MHz) - HL HC = 1.06750V - HL LC = 1.16250V - LL = 1.30625V
Fiji XT (1050MHz) - HL HC = 1.13125V - LL = 1.25000V

LL = Low leakage
HL = High leakage

LC = Low capacitance
HC = High capacitance
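
As a rough illustration of that inverse relationship, here is a C sketch that interpolates a default VID between the Bonaire XT endpoints listed above. The linear interpolation is my assumption for illustration; AMD's actual fused voltage tables will of course differ.

```c
/* Toy model of leakage-based default voltage: the higher the static
 * leakage, the lower the fused default VID. Endpoints are the Bonaire
 * XT values above (LL = 1.22500 V, HL = 1.10625 V); the linear
 * interpolation between them is an assumption, not AMD's fusing logic. */
#include <stdio.h>

/* leakage_frac: 0.0 = lowest-leaking specimen, 1.0 = highest-leaking */
static double default_vid(double v_ll, double v_hl, double leakage_frac)
{
    return v_ll + (v_hl - v_ll) * leakage_frac;
}

int main(void)
{
    for (double f = 0.0; f <= 1.001; f += 0.25)
        printf("leakage %.2f -> %.5f V\n",
               f, default_vid(1.22500, 1.10625, f));
    return 0;
}
```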
 

krumme

Diamond Member
Oct 9, 2009
5,952
1,585
136
Production is the art of managing the right level of tolerances.

If one copies a process to a new location, variation is exactly what to expect.

What Intel have been good at is extremely tight tolerances, shown also in the fact that they are implementing EUV later.

I cannot imagine a better product than a P10 desktop GPU in the 150W range to start production of a new FinFET process.

That's also why we get AMD GPUs at the same time as NV's. GF is in no way competitive with TSMC. It's a different class.

IMO what we see is damn good business management with what AMD has at hand.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
Mostly correct; you can also hand-write code paths into programs to use XOP/FMA4... but no one really does it.

AMD has finally wised up to the fact that they have nowhere near enough pull anymore to drive adoption of new instruction sets, so they have stripped support for them from Zen to help clean the plate.
Did those instructions add much value (efficiency), regardless of the issue of AMD's sway?

wiki said:
XOP is a revised subset of what was originally intended as SSE5. It was changed to be similar to, but not overlapping with, AVX; parts that overlapped with AVX were removed or moved to separate standards such as FMA4 (floating-point vector multiply-accumulate) and CVT16 (half-precision floating-point conversion, implemented as F16C by Intel).

All SSE5 instructions that were equivalent or similar to instructions in the AVX and FMA4 instruction sets announced by Intel were changed to use the coding proposed by Intel. Integer instructions without equivalents in AVX were classified as the XOP extension. The XOP instructions have an opcode byte 8F (hexadecimal), but an otherwise almost identical coding scheme to AVX with the 3-byte VEX prefix.

Commentators have seen this as evidence that Intel has not allowed AMD to use any part of the large VEX coding space. AMD has been forced to use different codes in order to avoid using any code combination that Intel might possibly be using in its development pipeline for something else. The XOP coding scheme is as close to the VEX scheme as technically possible without risking that the AMD codes overlap with future Intel codes. This inference is speculative, since no public information is available about negotiations between the two companies on this issue.
That makes it sound not like AMD was trying to set a new standard (as it did with AMD64), but like it was trying to keep up with Intel's tech. Is that accurate?
 
Last edited:

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
Mostly correct; you can also hand-write code paths into programs to use XOP/FMA4... but no one really does it.

AMD has finally wised up to the fact that they have nowhere near enough pull anymore to drive adoption of new instruction sets, so they have stripped support for them from Zen to help clean the plate.

gcc can autovectorize with an xop/fma4 target, and I'm fairly certain Java has JVM optimization support for xop. So it wasn't really that hard to use, if you wanted to. The market penetration of AMD products was never big enough for it to be a big selling point for software.

That being said, IF Zen's support of AVX/AVX2 is weak (which seems likely), then there are some weird circumstances where XV (Excavator) could outperform Zen on a per-thread basis, possibly by a lot.
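
For instance, a loop like the following is the kind of thing gcc will autovectorize for an XOP/FMA4 target when built with -O3 -march=bdver1 (which enables -mxop and -mfma4 on Bulldozer-class parts); the function here is just an illustrative example.

```c
/* A trivially vectorizable loop; compiled with
 *   gcc -O3 -march=bdver1 -c saxpy.c
 * gcc can emit FMA4 code (e.g. vfmaddps) for the multiply-add. */
void saxpy(int n, float a, const float * restrict x, float * restrict y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```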
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
gcc can autovectorize

GCC autovectorization is non-existent compared to ICL. I tested with a Monte Carlo raytracer, and the performance difference between GCC 5.3 / 6.1 and ICL was around 45% in favor of ICL. GCC cannot auto-dispatch either, so you would have to write your own dispatcher for FMA4 & XOP supporting parts or use separate binaries for them. Dropping FMA4 and XOP was absolutely the right choice.
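
For reference, the kind of hand-written dispatcher being described can be quite small; here is a C sketch using the CPUID extended leaf (XOP = ECX bit 11, FMA4 = ECX bit 16 on leaf 0x80000001). The kernels are placeholders; in a real build the XOP/FMA4 path would live in a separate translation unit compiled with -mxop -mfma4.

```c
#include <cpuid.h>
#include <stdio.h>

/* Detect XOP and FMA4 support via CPUID leaf 0x80000001. */
static int has_xop_fma4(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & (1u << 11)) && (ecx & (1u << 16)); /* XOP && FMA4 */
}

/* Placeholder kernels: the XOP/FMA4 one would be compiled separately
 * with -mxop -mfma4 so the rest of the binary stays baseline. */
static void kernel_generic(void) { puts("baseline SSE2 path"); }
static void kernel_xop(void)     { puts("XOP/FMA4 path"); }

int main(void)
{
    void (*kernel)(void) = has_xop_fma4() ? kernel_xop : kernel_generic;
    kernel();
    return 0;
}
```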
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Did those instructions add much value (efficiency), regardless of the issue of AMD's sway?


That makes it sound, not like AMD was trying to set a new standard (like it did with AMD64), but that it was trying to keep up with Intel's tech. Is it accurate?

I don't know when AMD learned about Intel's AVX plans with 3-operand and FMA support. But long ago they had already proposed an extension called "Technical Floating Point" (TFP) with 32 registers and 3-operand instructions. That was back in Fred Weber's time or even before that.

Like adding registers to the 64-bit mode (knowing the advantage), they surely also knew that 3-operand instructions and more FP registers would give a performance boost to x86 processors.

To implement such an extension, or something like AVX, they had to start very early. IIRC, at the time they started to add their SSE5/XOP to the design, they might have known nothing about Intel's plans. I think this is somewhat similar to Intel's 64-bit extension to Prescott (Yamhill), which likely started at a time when not everything was known about AMD's implementation. At least it wasn't fully compatible with AMD64.

The existence of FMA4 is also a sign of AMD's own plans, as it needs full FPU support (3R+1W register file ports per FMA unit and appropriate instruction control modifications).

The changes necessary to support AVX after learning about it were to modify the decoders to create the internal uops (as planned for SSE5/XOP), add microcode for instructions not supported in HW, and add a bit or some other method to track the state of 256b regs comprised of two internal 128b regs. Internal 3-operand uops have been standard since P6 and K5 times.
 
Last edited:

looncraz

Senior member
Sep 12, 2011
722
1,651
136
If that's the case then I'd say something is broken, and pretty badly.

That certainly appears to be the case. I did find a couple of examples of sub-1150mV, but we're only talking 1137mV as the lowest I've found ATM.

However, this is only judging by Wattman numbers - it may be that the software pushes the voltage higher once you enable manual control.

The default voltages I've found in Wattman are all pretty much the following:

[Image: AMD-WattMan-App.png]

The only example of a non-1150mV default... and it's still a high 1137mV:

[Image: radeon-settings-manual-voltage.jpg]

It has been almost universally found that the GPU can run comfortably at 1050mV, and many have found 1025mV to be stable at stock settings. Under-volting to 1050mV increases performance as well, since the clocks average higher in-game.

LegitReviews reports that the default in-game voltage actually began at 1187mV - 37mV higher than the Wattman setting - and even peaked over 1.3V:

[Image: rotr-voltage.jpg]

So whatever AMD is doing by default is pretty screwed up.
 

yuri69

Senior member
Jul 16, 2013
373
573
136
So whatever AMD is doing by default is pretty screwed up.
It seems AMD has a long history of releasing products with unfinished/disabled/broken firmware. They don't have the time/money to finish stuff.

A sad example is Kaveri having a hardcoded(!) CPU frequency whenever any load runs on its GPU.
 

KTE

Senior member
May 26, 2016
478
130
76
I just hope... Pray... They don't rush out Zen. Like Phenom.

Architecture is usually OK. Just get the process/clocks right.

Sent from HTC 10
(Opinions are own)
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
It seems AMD has a long history of releasing products with unfinished/disabled/broken firmware. They don't have the time/money to finish stuff.

A sad example is Kaveri having a hardcoded(!) CPU frequency whenever any load runs on its GPU.

Indeed. It's really crazy that AMD has had the on-board hardware to implement dynamic voltage and frequency control in a much more accurate and dynamic manner for many years and has still not managed to make good use of it.

The other area where they really suffer is video playback power consumption - they ramp the RAM all the way to its maximum any time there is any load, which wastes 20~30W for no good reason. If they simply stepped the memory clocks (and possibly the voltage with them), they'd save an immense amount of power during DVD/Blu-ray/YouTube video playback.

This exact same problem is why their multi-monitor and fast-refresh-rate power usage is so high... they either run the RAM at 150MHz or at full blast.
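
The stepping being asked for could be as simple as the following toy C sketch; the DPM levels and thresholds are made up for illustration, and a real driver would key off activity counters rather than a single load figure.

```c
/* Toy illustration of stepped memory clocks instead of the observed
 * "150 MHz or full blast" behaviour. Levels/thresholds are invented. */
#include <stdio.h>

static int mem_clock_mhz(double load /* 0.0 .. 1.0, hypothetical metric */)
{
    if (load < 0.05) return 150;   /* idle desktop, single monitor */
    if (load < 0.40) return 625;   /* video playback / multi-monitor */
    return 1750;                   /* 3D load: full memory clock */
}

int main(void)
{
    const double loads[] = { 0.02, 0.25, 0.90 };
    for (int i = 0; i < 3; i++)
        printf("load %.2f -> %4d MHz memory clock\n",
               loads[i], mem_clock_mhz(loads[i]));
    return 0;
}
```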
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
That certainly appears to be the case. I did find a couple of examples of sub-1150mV, but we're only talking 1137mV as the lowest I've found ATM.

However, this is only judging by Wattman numbers - it may be that the software pushes the voltage higher once you enable manual control.

The default voltages I've found in Wattman are all pretty much the following:

[Image: AMD-WattMan-App.png]

The only example of a non-1150mV default... and it's still a high 1137mV:

[Image: radeon-settings-manual-voltage.jpg]

It has been almost universally found that the GPU can run comfortably at 1050mV, and many have found 1025mV to be stable at stock settings. Under-volting to 1050mV increases performance as well, since the clocks average higher in-game.

LegitReviews reports that the default in-game voltage actually began at 1187mV - 37mV higher than the Wattman setting - and even peaked over 1.3V:

[Image: rotr-voltage.jpg]

So whatever AMD is doing by default is pretty screwed up.

It might well be an issue caused by "Wattman". I wouldn't call it an issue until there is enough evidence to back it up. An AIDA64 GPU register dump can decode the default voltages calculated by the driver. A few of those dumps taken after a clean driver install (without Wattman enabled), combined with the GPU-Z "ASIC Quality" value, should give us an idea of whether anything is actually wrong.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
So, according to Lisa Su in the Q2 earnings call (07/21/2016), "the true volume availability of DT Zeppelin" = Q1/2017.
 

SK10H

Member
Jun 18, 2015
113
42
101
That certainly appears to be the case. I did find a couple of examples of sub-1150mV, but we're only talking 1137mV as the lowest I've found ATM.

The following is my Wattman voltage after a system crash, when it reverts back to the default setting. Not a clean install, and with a lot of crashes already. :twisted: