AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Page 73
Status
Not open for further replies.

inf64

Diamond Member
Mar 11, 2011
3,884
4,692
136
Did you even read what I wrote? How can SMT come on top if the top is already reached?
SMT is about throughput anyway (MT as you stated) and not about speed so it doesn't even matter for finding out how fast they will be.

So you are denying that a Zen core can execute MORE instructions per cycle per core in MT workloads than in ST workloads? Because that is what you are saying, and it is the total opposite of what AMD's Zen architect answered @ HC.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Did you even read what I wrote? How can SMT come on top if the top is already reached?
SMT is about throughput anyway (MT as you stated) and not about speed so it doesn't even matter for finding out how fast they will be.

There is no way you will saturate a wide core like ZEN without SMT.
 
Reactions: inf64

cytg111

Lifer
Mar 17, 2008
26,098
15,549
136
Did you even read what I wrote? How can SMT come on top if the top is already reached?
SMT is about throughput anyway (MT as you stated) and not about speed so it doesn't even matter for finding out how fast they will be.

So in your opinion, if a core was born with 100 threads, the correct way of measuring this core's IPC is with all 100 threads at full load?
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
So you are denying that a Zen core can execute MORE instructions per cycle per core in MT workloads than in ST workloads? Because that is what you are saying, and it is the total opposite of what AMD's Zen architect answered @ HC.
There is a fixed number of commands per core; if one thread can use ALL of them (100% IN EVERY cycle), then SMT can't do anything to improve on this 100% usage.
If one thread can only use, say, 80% of the available commands, then SMT can give you a 20% improvement.
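
A minimal sketch of the arithmetic behind this argument, assuming a toy model in which a core has a fixed pool of execution slots per cycle and a second thread can fill every slot the first one leaves idle. The function name `smt_headroom` and the model itself are illustrative assumptions, not anything AMD has published. It shows that 20% idle slots at 80% utilization corresponds to 20 percentage points of headroom, which is a 25% gain when measured relative to the single-thread throughput.

```python
def smt_headroom(single_thread_utilization: float) -> tuple[float, float]:
    """Toy model (an assumption): a second thread can fill every idle slot.

    Returns (idle share of the core, upper bound on relative throughput gain).
    """
    if not 0.0 < single_thread_utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    idle_share = 1.0 - single_thread_utilization
    # Gain relative to what the single thread already achieves.
    relative_gain = idle_share / single_thread_utilization
    return idle_share, relative_gain


for utilization in (1.0, 0.8, 0.5):
    idle, gain = smt_headroom(utilization)
    print(f"utilization {utilization:.0%}: idle {idle:.0%}, max SMT gain {gain:.0%}")
```

In this model the disagreement in the thread reduces to what utilization a single thread actually reaches on a wide core: the lower it is, the more SMT has to work with.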
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
There is a fixed number of commands per core; if one thread can use ALL of them (100% IN EVERY cycle), then SMT can't do anything to improve on this 100% usage.
If one thread can only use, say, 80% of the available commands, then SMT can give you a 20% improvement.

The problem is you will never get 100% from a single thread with a wide core like ZEN.

That's what I was saying from the beginning...

No you haven't.
 

inf64

Diamond Member
Mar 11, 2011
3,884
4,692
136
Wow, sorry to say this, but I have to stop discussing this matter with you, Elf, since you simply have no clue what you are talking about. AtenRa is a way more patient man than I am :/.
 

jpiniero

Lifer
Oct 1, 2010
16,810
7,254
136
On pricing, remember that the 9590 was $300 when it was first released to the DIY market (at least according to AT, it was $900 at first for boutique OEMs lol). It didn't last that long at $300, as they cut it to $220 a month or two later. I do expect AMD to have crazy pricing for Zen initially and just cut it later.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Zen SR will be competing against Kaby Lake 4C/4T, 4C/8T and Broadwell 8C/16T when it launches in Q1 2017. In H2 2017 Zen SR will have to contend with the monster Skylake HEDT. Even in best case scenarios with Broadwell IPC, the 4C/8T Zen will have to contend with the 10-15% higher IPC of the Kaby Lake Core i5 combined with roughly 20% higher OC headroom, even if Zen SR can overclock to 4-4.2 GHz, as Kaby Lake looks to be able to easily hit 5 GHz. So there is no way that AMD can price 4C/8T Zen even on par with the unlocked Kaby Lake Core i5, as the Kaby Core i5 will pretty much dominate in the majority of desktop workloads, which use up to 4 threads. The few apps which use 8 threads will still see the Kaby Lake Core i5 come out pretty much ahead given the 20% higher max OC and 10-15% higher IPC. Zen's SMT is unlikely to give above a 20% perf increase. So yeah, there is no way 4C/8T Zen can launch for any price close to USD 300.

Mean IPC in consumer code is around 1/cycle or lower. HPC FPU code can reach 2.5 (e.g. Spec FP). Almost only power viruses can go over 3.
AMD can do 4 INT PLUS 4 FP PLUS 2 MEM.
INTEL can do 4 among FP and INT (and FP are limited to 2 true FP + 2 vecint) and 3-4 MEM.
So in some, if not most, cases AMD's SMT can even beat INTEL's...
 
Reactions: Dresdenboy

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
No you haven't.
Sure I have.
It's not 40% faster, it's 40% more IPC; that's throughput, not speed. It will only be 40% faster if you actually find software that is able to use all 10 instructions the ZEN core has available per cycle.
Which will be pretty difficult, since there aren't many CPUs out there (if there are any) with 10 instructions per core. I guess that's why they went with Blender instead of some "traditional" benchmark.
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
40% higher IPC means 40% faster Single Thread Performance at the same clocks.
That is what you say it means; if you asked a court, they would come to the same conclusion they came to when people asked them what a core is.
 

cytg111

Lifer
Mar 17, 2008
26,098
15,549
136
That is what you say it means; if you asked a court, they would come to the same conclusion they came to when people asked them what a core is.

Dude... I see your point, however I guarantee that is not what everyone else thinks of as IPC, including AMD.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,036
136
It's not 40% faster, it's 40% more IPC; that's throughput, not speed. It will only be 40% faster if you actually find software that is able to use all 10 instructions the ZEN core has available per cycle.
Which will be pretty difficult, since there aren't many CPUs out there (if there are any) with 10 instructions per core. I guess that's why they went with Blender instead of some "traditional" benchmark.

First off, Zen can only sustain decode of four ISA ops per cycle, not ten. Obviously the uop cache will help there, some ops will crack to multiple uops, etc., but even measuring uops, it's a maximum of six per cycle (which is still not ten). The 40% here is realistically calculated from generic ST workloads, likely integer-heavy ones. The purpose of having so many functional units is to allow a favorable instruction mix within a given machine width, not to somehow force you to use all ten per cycle to hit maximum performance.

Second, on the assessment that 10 execution pipes (which seems to be what you're referring to) is unusually many -

Power8 is 8-wide at the frontend, 10-issue, and has 16 execution pipes. P9-SMT8 is wider.

Multiflow Trace went up to 28-wide, front to back, in a VLIW uarch.

Intel Poulson is 12-issue in back, backing up a 6-wide frontend.
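
The first point in this post can be sketched as a simple bottleneck model: sustained throughput is capped by the narrowest stage, so ten execution pipes behind a six-uop-wide delivery path cannot sustain ten per cycle. The stage names and the function `narrowest_stage` are hypothetical, and the widths are the figures quoted in the post rather than independently verified numbers.

```python
def narrowest_stage(stage_widths: dict[str, int]) -> tuple[str, int]:
    """Return the stage that caps sustained ops per cycle, and its width."""
    stage = min(stage_widths, key=stage_widths.get)
    return stage, stage_widths[stage]


# Widths in uops per cycle, as stated in the post (not verified here).
zen_as_described = {
    "uop_delivery": 6,      # maximum uops per cycle, per the post
    "execution_pipes": 10,  # 4 INT + 4 FP + 2 MEM, per the thread
}

stage, width = narrowest_stage(zen_as_described)
print(f"sustained limit: {width} uops/cycle, set by {stage}")
```

Under this model the extra pipes buy flexibility in the instruction mix (any six of the ten can be busy), which is the purpose the post attributes to them.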
 
Reactions: psolord

moonbogg

Lifer
Jan 8, 2011
10,731
3,440
136
Bogg, wanna try something interesting? Download Geekbench 4, downclock your CPU to 4.2GHz, run the test, and post a link to the results.

I want to see how SNB @ 4.2GHz does in this test and how it compares to the known XV @ 4.2GHz results that are out there.

3930k@4.2

https://browser.geekbench.com/v4/cpu/1089967

 
Mar 10, 2006
11,715
2,012
126

Thank you, sir Bogg.

My Broadwell-E 6950X @ 4.2GHz (2.8GHz cache) gets 4585, so that implies that BDW IPC is about 16.4% higher than your SNB, so that's a good first "sanity check" on those results.

According to GB4, the best AMD A12-9800 on record does 2749 single core (@4.2GHz single core). Multiply that by 1.4x and you get...3848.

So, yeah, I'm thinking Zen is an AMD flavored Sandy Bridge.

EDIT: For teh lulz, I OC'd my cache to 3.6GHz and re-ran the test. Got 4600 on single thread. L3$ speed doesn't seem to have much of an impact on GB4 single thread performance.
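
The back-of-the-envelope projection above can be reproduced as follows. This is only a sketch of the poster's reasoning: it assumes AMD's claimed 40% IPC uplift applies directly to the A12-9800's Geekbench 4 single-core score at the same 4.2GHz clock, and it derives the Sandy Bridge score from the stated 16.4% gap rather than reading it from the linked result. The constant names are illustrative.

```python
A12_9800_SCORE = 2749   # GB4 single-core @ 4.2GHz, from the post
ZEN_IPC_UPLIFT = 1.40   # AMD's claimed uplift, assumed to carry over to GB4
BDW_SCORE = 4585        # Broadwell-E 6950X @ 4.2GHz, from the post
BDW_OVER_SNB = 1.164    # "about 16.4% higher than your SNB", from the post

projected_zen = A12_9800_SCORE * ZEN_IPC_UPLIFT
implied_snb = BDW_SCORE / BDW_OVER_SNB  # derived, not read from the linked result

print(f"projected Zen @ 4.2GHz:      {projected_zen:.0f}")
print(f"implied Sandy Bridge score:  {implied_snb:.0f}")
print(f"projected Zen / Sandy Bridge: {projected_zen / implied_snb:.3f}")
```

On these inputs the projection lands within a few percent of the implied Sandy Bridge score, which is the basis for the "AMD flavored Sandy Bridge" remark.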
 

moonbogg

Lifer
Jan 8, 2011
10,731
3,440
136
Well... let's just hope we somehow borked this little experiment, because slower than Sandy with clocks around 4.2 is not going to be super exciting, but it's still a great deal at $300 for 8/16 cores/threads. I think a lot of people would jump on an 8/16 Sandy-like chip for around $300. Good enough for decently high refresh gaming and much better at multi-threaded stuff. I won't be buying it, but I can't wait for real benchies.
 

majord

Senior member
Jul 26, 2015
509
711
136
Thank you, sir Bogg.

My Broadwell-E 6950X @ 4.2GHz (2.8GHz cache) gets 4585, so that implies that BDW IPC is about 16.4% higher than your SNB, so that's a good first "sanity check" on those results.

According to GB4, the best AMD A12-9800 on record does 2749 single core (@4.2GHz single core). Multiply that by 1.4x and you get...3848.

So, yeah, I'm thinking Zen is an AMD flavored Sandy Bridge.

EDIT: For teh lulz, I OC'd my cache to 3.6GHz and re-ran the test. Got 4600 on single thread. L3$ speed doesn't seem to have much of an impact on GB4 single thread performance.


Did you look at the outliers which made up the score?

The most significant of which (from the first A12 set):

HTML5 DOM:

Bristol Ridge: 932
Haswell-E: 3908

= about 4.2x the score (roughly a 320% IPC advantage)

Then there's the inclusion of Memory performance in the test

I'm not saying all workloads should show a consistent performance delta between architectures, they never do, but you do have to be able to recognise an outlier that's so extreme it's capable of skewing an IPC comparison by 20-30%

This, and the inclusion of things like AES (which work back in Excavator's favor), are reasons why Geekbench is not very useful for architecture comparisons.
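
To illustrate how a single outlier can move a composite comparison, here is a rough sketch that assumes an equally weighted geometric mean over ten subtests. Both the subtest count and the 1.5x "typical" ratio are made-up placeholders, and Geekbench's real weighting is not reproduced here; only the 3908 vs 932 HTML5 DOM figures come from the post.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


TYPICAL_RATIO = 1.5            # hypothetical per-subtest advantage
N_SUBTESTS = 10                # hypothetical subtest count
OUTLIER_RATIO = 3908 / 932     # HTML5 DOM figures from the post

without_outlier = [TYPICAL_RATIO] * N_SUBTESTS
with_outlier = [TYPICAL_RATIO] * (N_SUBTESTS - 1) + [OUTLIER_RATIO]

skew = geomean(with_outlier) / geomean(without_outlier) - 1
print(f"composite ratio without outlier: {geomean(without_outlier):.2f}")
print(f"composite ratio with outlier:    {geomean(with_outlier):.2f}")
print(f"skew from one subtest:           {skew:.1%}")
```

With fewer subtests, or a heavier weight on the outlier, the skew grows accordingly, so the size of the effect depends heavily on the assumed composition.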
 

deasd

Senior member
Dec 31, 2013
603
1,033
136
Geekbench has so many memory tests and instruction-specific tests, like AIDA64, that I don't think an overall score can tell us anything.
I'd prefer something like Fritzchess or wPrime, which are pure arithmetic and branch prediction with no other tricks involving the compiler or long instructions. In any case, both Intel and AMD have shown very little improvement in these tests for several years, which fits the impression that both have struggled to improve performance lately.
 
Reactions: prtskg

jpiniero

Lifer
Oct 1, 2010
16,810
7,254
136
I'm not saying all workloads should show a consistent performance delta between architectures, they never do, but you do have to be able to recognise an outlier that's so extreme it's capable of skewing an IPC comparison by 20-30%

I thought we had this discussion already... it's likely due to the 9800's lack of L3. The 8350 for instance gets around 3400 on the HTML 5 DOM test. If anything, it's great because it highlights weaknesses of a chip that you might not obviously see with just one test.
 
Reactions: prtskg

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
I'd prefer something like Fritzchess or wPrime, which are pure arithmetic and branch prediction with no other tricks involving the compiler or long instructions.
Why? How much of the software the average user runs daily do you think works like this?
 

majord

Senior member
Jul 26, 2015
509
711
136
I thought we had this discussion already... it's likely due to the 9800's lack of L3. The 8350 for instance gets around 3400 on the HTML 5 DOM test. If anything, it's great because it highlights weaknesses of a chip that you might not obviously see with just one test.

Well, I'm sorry if it's been discussed before, I wasn't aware. Regardless, it may be great for highlighting corner case issues, but it is not representative of performance in general. More importantly for the purpose of this thread, it is no use at all for comparing architectures, since you want benchmarks that are predictable and not heavily influenced by cache or memory bandwidth.
 