AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Status
Not open for further replies.

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
"Useful" range is undefined. But sure, hardware.fr article you cite (by the way, it does not have numbers you claim it has, but that's for dessert): on 8370E we have 3.3Ghz on 1.032Vcore, 4Ghz on 1.188Vcore, yet if you were right it would be 4.3Ghz on 1.188Vcore or 4Ghz on ~1.136Vcore. Considering that it is area right inside it's operating frequencies, it's pretty useful :)
.

So you are saying that reducing frequency by 1.36x reduces power by more than 1.36^2, lol, yet you said that with Zen it would be the other way around, that is, that power wouldn't scale by as much as 1.36^2. What happened suddenly?

http://www.hardware.fr/focus/99/amd-fx-8370e-fx-8-coeurs-95-watts-test.html
This article, correct? I don't see 100W for 8350 or 65W for 8370E anywhere in this article.
What i do see is 120W for 8350 and 72W for 8370E on unknown frequencies for latter :). Thanks though, this article provided all evidence required to disprove your point about voltage/frequency.

These are at stock frequencies, or are you asking others to do your homework?

And how were those 120W and 72W measured?

Perhaps on the CPU 12V rail?

And isn't there a DC-DC conversion in that path that is less than 100% efficient?

In the end you are just polluting the thread with either incompetence or willful spamming, so from now on just don't answer any of my posts and don't expect any more troll food. It is obvious that you are here to derail the thread while relentlessly posting what amounts to FUD.
 
  • Like
Reactions: F-Rex

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
So you are saying that reducing frequency by 1.36x reduces power by more than 1.36^2, lol, yet you said that with Zen it would be the other way around, that is, that power wouldn't scale by as much as 1.36^2. What happened suddenly?
Reducing frequency by 1.36x can reduce power by any amount, depending on the starting frequency, the f(V) curve and a bunch of other parameters; that's what I am talking about. The latter part of your post is mostly gibberish with no relation to my point at all.
These are at stock frequencies, or are you asking others to do your homework?

And how were those 120W and 72W measured?

Perhaps on the CPU 12V rail?

And isn't there a DC-DC conversion in that path that is less than 100% efficient?

I ask you again and straight this time. You are claiming the numbers in this article are 100 and 65. As such, the question is: where are those numbers?

Sorry man, but with those we can clearly see that the only one spreading FUD here is you, simply because none of your posts over the last few pages come with any evidence that you care to provide. In fact, until you provide a source for your claim of 100W for the 8350 in Fritz, we may as well call it made up.

I am not sorry i have wasted some time on it, but after this and Cinebench argument in another thread.. Yeah, you see.
 
  • Like
Reactions: Sweepr

Nothingness

Diamond Member
Jul 3, 2013
3,307
2,379
136
As I stated above, A9 has six decoders and no uop cache.
None of the A9 studies I have seen can rule out the possibility that the 6 instr/cycle observed are the output of a uop cache. It could very well be that the A9 in fact has 4 decoders feeding a uop cache that can output 6 uops per cycle.
 
  • Like
Reactions: Arachnotronic

bjt2

Senior member
Sep 11, 2016
784
180
86
None of the A9 studies I have seen can rule out the possibility that the 6 instr/cycle observed are the output of a uop cache. It could very well be that the A9 in fact has 4 decoders feeding a uop cache that can output 6 uops per cycle.
OK, but the difference between a 16-stage pipeline and a 19-stage pipeline can easily make up for that, in terms of power consumption.
And anyway I assumed 5W for the 2 cores alone, but 5W is the TDP of the WHOLE SoC (including GPU, NB and SB) and the CPUs run at 2.26GHz. So we can safely assume that 2 Zen cores draw 5W at 2GHz or more...
 

KTE

Senior member
May 26, 2016
478
130
76
Abwx:

You really need to grasp what you're replying to, and to distinguish the parts of the science that are applicable from those that are not.

As pointed out above, it is not the science of semiconductors that anyone disputes, but your beliefs/opinions, since you are not able to calmly explain or justify them.

The applicable link doesn't exist because this is a random scaling belief without any data whatsoever.

No 'law' predicts scaling. Shmoo is the operating word you are looking for, and this link explains the actual laws at work in good detail: http://www.realworldtech.com/overclocking-realities/2/

https://www.computer.org/csdl/proce..._cWO2bUScNinotLdQ&sig2=ucqpFZo6hf2xGERynZ2T0g

http://davefaq.com/Docs/1997-08.HP_...7lpwOI5DP1ibvOtZA&sig2=0a8IRGrnvH-3AO87Nz6LIw


 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I think Abwx meant that frequency vs. voltage scaling is generally linear between two points within the frequency range of a chip, rather than linear in the sense that a change in frequency at any point of the range results in a change in voltage of the same proportion in the same direction (increase or decrease). Based on my own experience the scaling can be close to linear, however that is extremely rare. For the scaling to be even remotely linear, the two reference points must be within the optimal frequency range of the process and of the design itself. As soon as you go outside the optimal range, or hit another restriction (usually the caches in the case of CPUs), the linearity is gone for good. You'll never achieve linearity over the full frequency range though, so this rarely applies fully in practice. However, if we place the two reference points at 2.0GHz and 3.0GHz on a design which operates between 1.0GHz and 4.0GHz, then you can predict with reasonable accuracy the voltage required at either point based on the voltage required at the other. This however is completely useless for a design (such as Zeppelin) which has not been characterised yet, as it is impossible to know what the optimal range will be.

Here is an averaged (multiple specimens) V/F chart, recorded on GF28A and 32SHP parts.
For both of them the scaling is very close to linear when the reference points are placed at 2900MHz and 3700MHz (around 4MHz/mV). However, if you place the reference points at 1600MHz and 4500/4800MHz you get very different results. That's because at 4500/4800MHz both parts are operating outside their optimal range, as defined by the design and the process.

Kzcbded.png
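
A minimal sketch of the two-point estimate described in this post, assuming a constant slope of about 4MHz/mV inside the 2900-3700MHz window. The function name and the default slope argument are illustrative and are not read off the chart; outside the optimal range the linear assumption no longer holds, as the post points out.

```python
def voltage_delta_mv(f_from_mhz: float, f_to_mhz: float,
                     slope_mhz_per_mv: float = 4.0) -> float:
    """Estimated voltage change in mV for a frequency change,
    assuming a constant V/F slope (only valid in the near-linear window)."""
    return (f_to_mhz - f_from_mhz) / slope_mhz_per_mv


# Going from 2900 MHz to 3700 MHz at ~4 MHz/mV
delta = voltage_delta_mv(2900, 3700)
print(f"2900 -> 3700 MHz needs roughly {delta:.0f} mV more")
```

With the quoted slope, the 800MHz window corresponds to roughly 200mV of extra voltage.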
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Linear scaling for the Vcore means cubic scaling for power, because P~V^2*f (P=V*I and I=k*V*f)
Maybe this is what he meant.

If V is linear in f (that is, proportional to it) between 2 and 3 GHz, then at 2GHz the CPU draws 1.5^3=3.375 times less than at 3GHz... Less than 1/3. Supposing that an 8c Zen draws 95W at 3GHz we have:

A 32c at 2GHz should draw 95/3.375*4=112W...

At 180W we can have 32c at 2*(180/112)^(1/3)=2.34GHz...
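
The arithmetic in this post can be reproduced with a short sketch. It assumes, as the post does, that power follows a pure power law in frequency with exponent 3 and that power scales linearly with core count; the helper names are made up for illustration, and uncore power is ignored.

```python
def scale_power(p_ref_w: float, f_ref_ghz: float, f_ghz: float,
                exponent: float) -> float:
    """Power at f_ghz, assuming P is proportional to f**exponent."""
    return p_ref_w * (f_ghz / f_ref_ghz) ** exponent


def freq_at_power(p_ref_w: float, f_ref_ghz: float, p_target_w: float,
                  exponent: float) -> float:
    """Invert the same power law: frequency that fits in p_target_w."""
    return f_ref_ghz * (p_target_w / p_ref_w) ** (1.0 / exponent)


EXPONENT = 3  # assumption from the post: Vcore proportional to f, so P ~ f^3

p_8c_2ghz = scale_power(95.0, 3.0, 2.0, EXPONENT)   # 8 cores, 3 GHz -> 2 GHz
p_32c_2ghz = 4 * p_8c_2ghz                          # four times the cores
f_32c_180w = freq_at_power(p_32c_2ghz, 2.0, 180.0, EXPONENT)

print(f"32c at 2 GHz: {p_32c_2ghz:.1f} W")
print(f"32c at 180 W: {f_32c_180w:.2f} GHz")
```

This gives about 112.6W for 32 cores at 2GHz and about 2.34GHz at 180W, matching the figures in the post.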
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
Linear scaling for the Vcore means cubic scaling for power, because P~V^2*f (P=V*I and I=k*V*f)
Maybe this is what he meant.

If V is linear in f (that is, proportional to it) between 2 and 3 GHz, then at 2GHz the CPU draws 1.5^3=3.375 times less than at 3GHz... Less than 1/3. Supposing that an 8c Zen draws 95W at 3GHz we have:

A 32c at 2GHz should draw 95/3.375*4=112W...

At 180W we can have 32c at 2*(180/112)^(1/3)=2.34GHz...

Not at all, I meant square scaling. Cubic scaling occurs at the higher frequencies, and generally CPUs are not operated in that part of the curve, or only for a single core under turbo.

I pointed out that if an 8C Zen manages 95W at 3GHz, then at 1.5GHz it will be possible to fit 32C in 95W, assuming that at 3GHz the scaling is still in the square part of the curve. That is a worst-case figure. Despite lots of explanations, the people who say that the GF process is not good do not realise that if the chip is already in the cubic part of the curve at 3GHz, then a 32C at 1.5GHz would be well below 95W.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Not at all, I meant square scaling. Cubic scaling occurs at the higher frequencies, and generally CPUs are not operated in that part of the curve, or only for a single core under turbo.

I pointed out that if an 8C Zen manages 95W at 3GHz, then at 1.5GHz it will be possible to fit 32C in 95W, assuming that at 3GHz the scaling is still in the square part of the curve. That is a worst-case figure. Despite lots of explanations, the people who say that the GF process is not good do not realise that if the chip is already in the cubic part of the curve at 3GHz, then a 32C at 1.5GHz would be well below 95W.

The estimate I made starting from the Apple A9X gave me at least 2GHz for 32c@95W.

With quadratic scaling this means 2.8GHz@180W for a 32c and 45W for 8c@2.8GHz. And 90W for 8c@4GHz... Consistent with the estimates I made in another forum starting from the power curves of a NEON FPU (330mW@2.41GHz, less than 1W@4.3GHz, estimated from power curves for various libraries and assuming Zen FPU=4xNEON FPU and Zen CPU=4x Zen FPU)...
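
A quick check of these figures, sketched under the post's own assumptions: 32 cores at 2GHz in 95W as the starting point, power proportional to f^2, and power linear in core count. The variable names are illustrative.

```python
EXPONENT = 2.0        # quadratic scaling, as assumed in the post
P_32C_2GHZ_W = 95.0   # starting estimate from the post: 32c at 2 GHz

# Frequency at which 32 cores fit in 180 W
f_32c_180w = 2.0 * (180.0 / P_32C_2GHZ_W) ** (1.0 / EXPONENT)

# 8 cores at that same frequency draw a quarter of the 32-core power
p_8c_same_freq = 180.0 / 4

# 8 cores at 4 GHz, scaled up from 8 cores at 2 GHz
p_8c_4ghz = (P_32C_2GHZ_W / 4) * (4.0 / 2.0) ** EXPONENT

print(f"32c at 180 W: {f_32c_180w:.2f} GHz")
print(f"8c at that frequency: {p_8c_same_freq:.0f} W")
print(f"8c at 4 GHz: {p_8c_4ghz:.0f} W")
```

Without intermediate rounding this comes out at about 2.75GHz and 95W; the 2.8GHz and 90W in the post are the same estimate with rounded intermediate values.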
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
The estimation I made starting from the apple A9X, gave me at least 2GHz 32c@95W

With quadratic scaling this means 2.8GHz@180W for a 32c and 45W for 8c@2.8GHz. And 90W for 8c@4GHz... Coherent with my estimations made in another forum starting from power curves of a neon FPU (330mW@2.1GHz, less than 1W@4.3GHz)...

Is that with soi 2.0 ?
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
The estimation I made starting from the apple A9X, gave me at least 2GHz 32c@95W

That's assuming that the throughput/Hz of the A9X and Zen are comparable, which I doubt is the case, particularly in FP.

Since they said that they'll release SKUs with frequencies higher than 3GHz, we can confidently assume that it will be at least 3.2GHz; with square scaling this suggests that the frequency for a 32C should be at most 1.6GHz.

That being said, if at 3.2GHz the scaling follows an exponent of 2.25 (on average over a large frequency range) then a 32C would clock at 1.73GHz, and for an exponent of 2.5 this would be 1.84GHz. I don't think the frequency could be higher than this for said 32C, as that would imply that the process has a relatively low frequency ceiling and would be unable in the first place to even provide 95W/3.2GHz for the 8C.
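
The three frequencies in this post follow from one formula, sketched here under the post's assumptions: the 32C part has the same total power budget as the 8C part at 3.2GHz, per-core power is proportional to f^n, and uncore power is ignored. The function name is made up for illustration.

```python
def freq_for_core_count(f_ref_ghz: float, cores_ref: int, cores_new: int,
                        exponent: float) -> float:
    """Frequency at which cores_new cores fit the same total power as
    cores_ref cores at f_ref_ghz, assuming per-core P ~ f**exponent."""
    return f_ref_ghz * (cores_ref / cores_new) ** (1.0 / exponent)


for n in (2.0, 2.25, 2.5):
    f = freq_for_core_count(3.2, 8, 32, n)
    print(f"exponent {n}: 32C at {f:.2f} GHz")
```

A higher exponent means power falls off faster as the clock drops, so the 32C part can keep a higher clock within the same budget.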
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Is that with soi 2.0 ?

I assumed an average exponent of two... We can also do the calculations with 1 or 3, but either of those two exponents means that AMD is superior to INTEL, either on low-clocked multicore monsters or on high-clocked, low-core-count CPUs...

We should stick with one exponent, not pick whichever one fits our expectations depending on the circumstances.

If we say that at 3GHz we are near the limits of 14nm and that Zen does not go over 3.2GHz, then we are at an exponent near 3, so a 32c Zen is feasible at 2GHz or more, and so will beat the 22c XEON.

If we say that the exponent is near 1, then Zen is well under the frequency limits and will scale over 4GHz, and so will beat the 8c INTEL HEDT CPUs...

You must decide. You can't say that under 3GHz the exponent is 1 and over 3GHz the exponent is suddenly 3, to get a 32c at 1.5GHz and an 8c at 3.2GHz, just to stay under the INTEL CPUs...

That's assuming that the throughput/Hz of the A9X and Zen are comparable, which I doubt is the case, particularly in FP.

Since they said that they'll release SKUs with frequencies higher than 3GHz, we can confidently assume that it will be at least 3.2GHz; with square scaling this suggests that the frequency for a 32C should be at most 1.6GHz.

That being said, if at 3.2GHz the scaling follows an exponent of 2.25 (on average over a large frequency range) then a 32C would clock at 1.73GHz, and for an exponent of 2.5 this would be 1.84GHz. I don't think the frequency could be higher than this for said 32C, as that would imply that the process has a relatively low frequency ceiling and would be unable in the first place to even provide 95W/3.2GHz for the 8C.

I made the following calculations:

The A9X is 5W TDP with 2 CPU cores at 2.26GHz plus GPU, NB and SB.
The A9X has 6 decoders, 4 ALU, 2 AGU and 3 FPU and 16 pipeline stages, so a FO4 similar to INTEL CPUs.
Zen has 4 decoders (+uop cache), 4 ALU, 2 AGU and 4 FPU and 19 pipeline stages, so a FO4 similar to Bulldozer.

Even if the throughput is inferior, Zen has a lower FO4, so at the same clock and throughput it draws less power, because it requires less Vcore.

Anyway, 2 CPU cores at 2.26GHz do not draw the full 5W, because there are also the GPU, NB and SB.

So, having said that, let's say that 2 Zen cores at 2.26GHz draw 5W. Then 32 Zen cores draw 80W. Add NB and SB and we are around 95W. To calculate for 180W let's say exponent 2: 3GHz...
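
The last step of this estimate, written out as a sketch. All inputs are the post's own assumptions (5W for 2 cores at 2.26GHz, about 95W for the whole 32-core package, exponent 2 applied to the package as a whole); the roughly 15W uncore figure is only implied by the post's "around 95W" and is not a measured value.

```python
CORES_REF, P_REF_W, F_REF_GHZ = 2, 5.0, 2.26   # assumed: 2 cores draw 5 W
CORES = 32
P_PACKAGE_W = 95.0   # post's figure: 80 W of cores plus NB and SB
EXPONENT = 2.0       # assumed power-law exponent

p_cores_w = P_REF_W * CORES / CORES_REF        # 32 cores at 2.26 GHz
p_uncore_w = P_PACKAGE_W - p_cores_w           # implied NB + SB share

# Scale the whole package from 95 W to 180 W, as the post does
f_180w_ghz = F_REF_GHZ * (180.0 / P_PACKAGE_W) ** (1.0 / EXPONENT)

print(f"cores: {p_cores_w:.0f} W, implied uncore: {p_uncore_w:.0f} W")
print(f"32c at 180 W: {f_180w_ghz:.2f} GHz")
```

The result is about 3.1GHz, which the post rounds to 3GHz.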
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I don't think AMD would overstate the power consumption of their CPUs, especially for the server segment. At least not by > 90%... The 32C SKU "1451" in the Geekbench leak has TDP of 180W and it allegedly can be even configured upwards (cTDP).
 
  • Like
Reactions: Dresdenboy

bjt2

Senior member
Sep 11, 2016
784
180
86
I don't think AMD would overstate the power consumption of their CPUs, especially for the server segment. At least not by > 90%... The 32C SKU "1451" in the Geekbench leak has TDP of 180W and it allegedly can be even configured upwards (cTDP).

So are you saying that Zen at 3GHz is in the linear zone and so it will scale above 3.2GHz?
Or that suddenly after 3GHz the exponent is 3 and it does not go over 3.2GHz (conveniently the same frequency as BWE)?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I expect the desktop variants of Zeppelin to be completely maxed out, right out of the box. I think we can all agree at this point, that Zeppelin is not going to reach the clocks of the previous AMD desktop / server architecture (Piledriver) or even more importantly the clocks of more recent Intel designs (>= Ivy Bridge). In order to minimize the penalty from the lower clocks, they basically have to clock it very close or even beyond the efficient operating range. Much like they have done in the past with Vishera (FX-9000 -series) and Polaris GPUs. If they have to push it as far beyond the optimal range as they pushed Polaris, then close to ^4 scaling in the highest boost frequencies is not impossible (^4 has already been seen on user clocked Polaris cards).
 

bjt2

Senior member
Sep 11, 2016
784
180
86
I expect the desktop variants of Zeppelin to be completely maxed out, right out of the box. I think we can all agree at this point, that Zeppelin is not going to reach the clocks of the previous AMD desktop / server architecture (Piledriver) or even more importantly the clocks of more recent Intel designs (>= Ivy Bridge). In order to minimize the penalty from the lower clocks, they basically have to clock it very close or even beyond the efficient operating range. Much like they have done in the past with Vishera (FX-9000 -series) and Polaris GPUs. If they have to push it as far beyond the optimal range as they pushed Polaris, then close to ^4 scaling in the highest boost frequencies is not impossible (^4 has already been seen on user clocked Polaris cards).

The alleged INT pipeline depth of Bulldozer is 20 stages; Zen's INT pipeline is 19 stages. Excavator on 28nm bulk reaches 3.8GHz base and 4.2GHz turbo. 14nm FF gave Polaris +20% clock speed at lower power. Why can't it be the same for Zen? Or at least the same frequency as Excavator...
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
Alleged INT pipeline depth of Bulldozer is 20 stages. Zen INT pipeline stages is 19. Excavator on the 28nm BULK reach 3.8GHz base and 4.2GHz turbo. 14nm FF gave to Polaris +20% clock speed with less power. Why for Zen can't be the same? Or at least same frequency as excavator...
I am fairly certain someone can get an EE degree writing about what determines clocking potential of any given design.

P. S. You still ignore the fact that A9X has to deal with a different ISA, though.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
14nm FF gave to Polaris +20% clock speed with less power.

20% clock speed, over what?
28nm Tahiti from 2012 was shipped at > 1150MHz by several vendors, and there are 28nm Bonaire cards (from 2013) which are clocked at 1200MHz out of the box. Not to mention that Bonaire also generally clocks better than Polaris cards. They got a small improvement in power, but basically no improvement in frequencies whatsoever. Considering where Polaris ASICs would ideally be clocked (based on the curve I've posted), I'd say they lost frequency rather than gained it.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
I am fairly certain someone can get an EE degree writing about what determines clocking potential of any given design.

P. S. You still ignore the fact that A9X has to deal with a different ISA, though.

MIPS are MIPS and FLOPS are FLOPS. If you are talking about decoders, Zen has a uop cache and the decoders will be gated 50-90% of the time...
Moreover, I underestimated all the positive factors...

- 5W given entirely to the CPUs
- Ignored the differences in FO4
- Assumed that Apple can do a custom design and that the A9X is not an ASIC, which is not certain...
ASICs have FO4>=30, custom designs have FO4<=25... At least 20% more clock at ISO power for Zen...
Even admitting the A9X is a custom design, it has 16 stages versus 19, so in any case Zen can clock 20% higher at the same Vcore. Moreover, a 19-stage INT pipeline is similar to Bulldozer's (unknown, but max 20). Bulldozer is 3.8-4.2GHz on 28nm bulk and MAGICALLY ZEN can't go over 3GHz on 14nm FF?
OK, fine... You are right. Zen can't go over 3GHz, and AMD engineers were fools not to port Excavator to 14nm, which could have given over-4GHz CPUs that could compensate for the higher IPC of Zen...
 

bjt2

Senior member
Sep 11, 2016
784
180
86
20% clock speed, over what?
28nm Tahiti from 2012 was shipped at > 1150MHz by several vendors and there are 28nm Bonaire cards (from 2013) which are clocked 1200MHz out of the box. Not to mention that Bonaire also generally clocks better than Polaris cards. They got a small improvement in power, but basically no improvement in frequencies what so ever. Considering where Polaris ASICs would ideally be clocked at (based on the curve I've posted), I'd say they rather lost frequency than gained it.

At what power? AMD clocked Polaris at 1050-1266MHz with a TDP of 110W. Tahiti: https://www.techpowerup.com/gpudb/2398/radeon-r9-280x
2048 shaders, 250W TDP, 850-1000 MHz...

And I am talking about reference designs...
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
The fact remains that Polaris shipped at higher clocks than 28nm products. I really don't understand why some of you believe clocks will decrease at 14nm FF. And really, I don't see why Zen's Fmax would top out at 3GHz.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,476
136
20% clock speed, over what?
28nm Tahiti from 2012 was shipped at > 1150MHz by several vendors and there are 28nm Bonaire cards (from 2013) which are clocked 1200MHz out of the box. Not to mention that Bonaire also generally clocks better than Polaris cards. They got a small improvement in power, but basically no improvement in frequencies what so ever. Considering where Polaris ASICs would ideally be clocked at (based on the curve I've posted), I'd say they rather lost frequency than gained it.

It's important that you make comparisons at the same point in each process's lifetime. So if you take Polaris on 14LPP at launch, then compare it with Tahiti at launch (Jan 2012). The fact is AMD launched Tahiti at 800-925 MHz. Then, once yields improved, AMD launched the HD 7970 GHz Edition in Aug 2012 at 1050 MHz and the HD 7950 Boost at 925 MHz. As process maturity and yields improve we are going to see improved Polaris chips, as hinted by the RX 400 series naming scheme

http://videocardz.com/61721/amd-radeon-rx-400-series-naming-scheme-explained

You might be looking at a significantly tweaked Polaris in H1 2017, as GF 14LPP seems to be pretty horrible at the moment in terms of process variation and yields. GF has always been bad at launch, but they make good improvements over the life of a process. The 32nm SOI process was a classic example. The Llano chip had yield issues and the Bulldozer chip launched with horrible clocks in late 2011. But once Piledriver launched the process was in much better shape. Eventually the 32nm SOI process was good for 5 GHz. So do not try to choose arbitrary points of comparison in a process's lifetime to fit your narrative.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
MIPS are MIPS and FLOPS are FLOPS. If you are talking about decoders, Zen has uop cache and decoders will be gated 50-90% of the time...
Moreover I underestimated all positive factors...

- 5W given all to the CPUs
- Ignored the differences in FO4
- Assumed that Apple can do a custom design and that A9X is not an ASIC, fact that is not sure...
ASICS has FO4>=30, custom designs have FO4<=25... At least 20% more clock at ISO power for Zen...
Even admitting A9X is a custom design, it has 16 stages versus 19, so in any case Zen can have clock 20% higher at the same Vcore. Moreover 19 stages int pipelines is similar to Bulldozer (unknown but max 20). Buldozer is 3.8-4.2GHz on 28nm BULK and MAGICALLY ZEN can't go over 3GHz on a 14nm FF?
Ok, fine... You are right. Zen can't go over 3GHz and AMD engineers were fool to not port excavator to 14nm that could have given over 4GHz CPUs that could compensate the higher IPC of Zen...
MIPS, FLOPS and the uop cache are irrelevant to all that. What matters is that plain decoder complexity on x86 will be way larger. How does that play into power consumption? Well, it has to, but how much is a question for someone who majors in CPU design, not myself. But that is enough to render power-consumption comparisons between the A9X and Zen entirely irrelevant.
However, plot twist: if we look at Xeon D instead, we find that the 2GHz base clock 8c16t SKU has a 45W TDP. And since it's a SoC with dual-channel memory and a bunch more, it's basically a perfect comparison that we can scale up in core count. And voilà, 4 D-1540s are exactly 180W TDP at a 2GHz base clock. Based on this and only this, I conclude that a 180W 32c64t Naples is entirely possible, and I even assume that it can have a much healthier base clock (or all-core turbo, at least) in the release version than 1.4GHz. The only question that remains is how inferior GloFo 14LPP is to Intel's 14nm in this regard.

Now, onto your faulty assumption about fmax. No, we can't assume fmax is 3GHz. It has to be at least 3.2GHz based on the ES name :p.

EDIT: Actually, AMD engineers were fools for not abandoning Bulldozer when it failed the first time on 45nm.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
At what power? AMD clocked it at 1050-1266MHz with a TDP of 110W. Tahiti: https://www.techpowerup.com/gpudb/2398/radeon-r9-280x
2048 shaders, 250W tdp, 850-1000 MHz...

And I am talking of default designs...
Erm, Tahiti at 925 MHz consumed 163W according to TPU numbers, and you forget that Tahiti with 6 GB of VRAM is in the FirePro D700 in the Mac Pro. That GPU has clocks of 5400 MHz on memory and 850 MHz on core, and a TDP of 129W.

On the other hand, the RX 470 consumes 125W while having a roughly 40% higher core clock (1206 MHz vs. 850 MHz). And most of the power is consumed on the memory side, rather than by the GPU.

The GPU die for the D700 consumes 70W at an 850 MHz core clock.
The GPU die for the RX 470 can consume around 85-90W at a 1206 MHz core clock.

This is VERY off-topic.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
MIPS, FLOPS and uop cache is irrelevant to all that. What matters is that plain decoder complexity on x86 will be way larger. How does that play into power consumption? Well, it has to, but how much is a question to someone who majors in CPU design, not myself. But that is enough to render comparisons between A9X and Zen power-consumption wise entirely irrelevant.
However, a plot twist, if we look at Xeon D instead, there we can find out that 2Ghz base clock 8c16t SKU has 45W TDP. And since it's a SoC with dual channel memory and a bunch, it's basically a perfect comparison we can scale up in core count. And viola, 4 D-1540s are exactly 180W TDP with 2Ghz base clock. Based on this and only this, i conclude that 180W 32c64t Naples is entirely possible and even assume that it can have much healthier base clock in release version than 1.4Ghz (or all-core turbo, at least). The only question that remains is how inferior is GloFo 14LPP to Intel's 14nm in this regard.

Now, onto your faulty assumption of fmax. No, we can't assume fmax is 3Ghz. It has to be at least 3.2Ghz based on ES name :p.

EDIT: Actually, AMD engineers were fools for not abandoning Bulldozer when it failed the first time on 45nm.

Even if the AMD decoders draw double the power of the A9X decoders, there is the uop cache that keeps them off 50-90% of the time. And decoder power is probably less than 1/3 of TDP... How much is the difference? 5-10% of TDP? And you want to throw out all my reasoning over 5-10%...
 