Coffeelake thread, benchmarks, reviews, input, everything.


tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
Superior in what sense? Are you suggesting Intel's SMT implementation is inefficient? Or is this a case of a stronger, more efficient core leaving few resources for SMT?
Superior in the sense that when workloads are ideally suited to SMT in general, which means excluding low latency or FPU intensive scenarios, AMD's SMT comes out ahead of Intel's HT more often than not.
 

hnizdo

Member
Aug 11, 2017
33
16
41
Superior in the sense that when workloads are ideally suited to SMT in general, which means excluding low latency or FPU intensive scenarios, AMD's SMT comes out ahead of Intel's HT more often than not.

Do you have any examples to support this statement?
 

epsilon84

Golden Member
Aug 29, 2010
1,142
927
136
I think Cinebench MT is a 'best case' scenario for AMD, as in, it shows its SMT implementation in the best light. Look at upcoming reviews, you will see the 8700K trade blows with the 1800X in most multithreaded apps/benchmarks but the 1800X will be well ahead in CB 15 MT.

That being said, it has been shown that AMD's implementation of SMT *is* slightly superior to Intel's HT, though not by a huge margin; I think it was a few percent. IIRC, averaged out, Intel's HT adds ~25% to MT throughput whereas AMD's SMT adds ~28%. I don't remember where I saw this though, so don't ask me for a source; I just remember reading it during the launch of Ryzen.

So AMD does gain on Intel slightly in heavy MT loads, but it's not enough to bring it to performance parity, clock for clock. Otherwise a 1600X would come very close to an 8700/8700K at stock, and this is obviously not the case. The 8700K is closer to 1700X/1800X levels of MT performance.
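
A rough back-of-the-envelope sketch of the point above. It takes the recalled ~25% and ~28% uplifts at face value (they are unsourced, as noted) and assumes, purely for illustration, that both cores start from the same throughput without SMT. The function name is made up for this example.

```python
# Recalled (unsourced) average MT uplift from enabling HT / SMT.
INTEL_HT_UPLIFT = 0.25
AMD_SMT_UPLIFT = 0.28


def relative_smt_advantage(uplift_a: float, uplift_b: float) -> float:
    """How far ahead A ends up versus B from SMT alone,
    assuming both had equal per-core throughput without SMT."""
    return (1.0 + uplift_a) / (1.0 + uplift_b) - 1.0


advantage = relative_smt_advantage(AMD_SMT_UPLIFT, INTEL_HT_UPLIFT)
print(f"AMD gains {advantage:.1%} on Intel from SMT alone")
```

Under those assumptions the difference works out to a couple of percent, which is nowhere near enough to cover a clock-for-clock gap between the cores themselves.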
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,065
3,882
136
I think Cinebench MT is a 'best case' scenario for AMD, as in, it shows its SMT implementation in the best light. Look at upcoming reviews, you will see the 8700K trade blows with the 1800X in most multithreaded apps/benchmarks but the 1800X will be well ahead in CB 15 MT.

That being said, it has been shown that AMD's implementation of SMT *is* slightly superior to Intel's HT, though not by a huge margin; I think it was a few percent. IIRC, averaged out, Intel's HT adds ~25% to MT throughput whereas AMD's SMT adds ~28%. I don't remember where I saw this though, so don't ask me for a source; I just remember reading it during the launch of Ryzen.
It's mostly down to core execution resources. Both have the same number of load/store ops per cycle, so more often than not that is the limit. When it isn't the limit, AMD's uop throughput and its much more symmetrical pipelines (which make it more likely that an instruction can be scheduled sooner) would be the main reasons for the difference.

Now if either AMD or Intel channeled IBM and went to a 4x load/store setup, then the SMT dynamic would completely change. But in the world of small dies with on-package interconnects, SMT is just an optional thing: it doesn't improve perf/watt (in many cases you go backwards), but it does help with the benchmark wars...
 

Zucker2k

Golden Member
Feb 15, 2006
1,810
1,159
136
I think Cinebench MT is a 'best case' scenario for AMD, as in, it shows its SMT implementation in the best light. Look at upcoming reviews, you will see the 8700K trade blows with the 1800X in most multithreaded apps/benchmarks but the 1800X will be well ahead in CB 15 MT.

That being said, it has been shown that AMD's implementation of SMT *is* slightly superior to Intel's HT, though not by a huge margin; I think it was a few percent. IIRC, averaged out, Intel's HT adds ~25% to MT throughput whereas AMD's SMT adds ~28%. I don't remember where I saw this though, so don't ask me for a source; I just remember reading it during the launch of Ryzen.

So AMD does gain on Intel slightly in heavy MT loads, but it's not enough to bring it to performance parity, clock for clock. Otherwise a 1600X would come very close to an 8700/8700K at stock, and this is obviously not the case. The 8700K is closer to 1700X/1800X levels of MT performance.
25% of what base score? We can't look at these numbers in isolation. HT is extra/surplus processing. Remember, it's the same core doing the work. If we take a hypothetical SMT score of 25% for Ryzen 1700, 8x25=200, whereas, for Kabylake at 25%, 4x25=100, assuming linear core scaling as in CB. In the above example, this translates into a 50% HT deficit for kabylake.
[Attached charts: three images, not reproduced here]


https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/#post-38770111

Edited: Changed coffeelake to kabylake - in line with the charts.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
25% of what base score? We can't look at these numbers in isolation. HT is extra/surplus processing. Remember, it's the same core doing the work. If we take a hypothetical SMT score of 25% for Ryzen 1700, 8x25=200, whereas, for Coffeelake at 25%, 4x25=100%, assuming linear core scaling as in CB. In the above example, this translates into a 50% HT deficit for coffeelake.
That should be 6, and consequently the HT deficit, according to your calculation, should be 25%.
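
To make the arithmetic in this exchange concrete, here is a minimal sketch. It follows the quoted post's own assumptions: the same hypothetical 25% per-core SMT gain on every chip, equal per-core base scores, and linear core scaling as in CB. The function names are invented for illustration.

```python
def total_smt_gain(cores: int, gain_per_core_pct: float) -> float:
    """Sum of per-core SMT gains, in percent of one core's non-SMT score."""
    return cores * gain_per_core_pct


def deficit_vs(reference: float, other: float) -> float:
    """Fraction by which `other` falls short of `reference`."""
    return (reference - other) / reference


ryzen_1700 = total_smt_gain(8, 25)   # 8 x 25 = 200
kaby_lake = total_smt_gain(4, 25)    # 4 x 25 = 100
coffee_lake = total_smt_gain(6, 25)  # 6 x 25 = 150

print(f"4-core deficit vs 8-core: {deficit_vs(ryzen_1700, kaby_lake):.0%}")
print(f"6-core deficit vs 8-core: {deficit_vs(ryzen_1700, coffee_lake):.0%}")
```

With 4 cores the deficit is 50%, and with 6 cores it is 25%, so the two figures differ only in which core count is plugged in.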
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
To close your IPC debate:

There is Instructions Per Clock dispatched (by the scheduler), and Instructions Per Clock executed.

The first is 100% CPU dependent. It's hardware-level IPC.
The second is a mix of hardware and software performance.

In each and every IPC calculation, you assume everything on the CPU side depends 100% on hardware only. It doesn't. Otherwise we would not see gains with each software update that adds optimizations for said CPUs.

Know this difference before you start any IPC debate that derails every thread.


Can we get back to the COFFEE LAKE thread?
 

Zucker2k

Golden Member
Feb 15, 2006
1,810
1,159
136
That should be 6, and consequently the HT deficit, according to your calculation, should be 25%.
Should be Kabylake. See charts. So 50% at the same core clock, sku vs sku. See your link.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
Splitting hairs again, I see.
Maybe I should point out how dedicated AES hardware on Zen gives it superior performance in encryption, just like how dedicated FPUs on Skylake give it superior performance in LINPACK.
 

Zucker2k

Golden Member
Feb 15, 2006
1,810
1,159
136
Yes, without AVX2 assuming perfect scaling.
Thought the 12% was already 'AVX-corrected'? And Linpack is only 1 app out of 20+ tested. Yes, 25% clock scaling is best case, but that applies somewhat to AMD as well. These chips are running at 3.5GHz, ya know.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Does that mean that the 2nd CPU is worse for multithreaded performance? Absolutely not. If you believe otherwise then explain why the Platinum 8160 exists when it should in theory just barely beat the Gold 6154 in a Cinebench-like workload with 33% more cores(freq*core count = 67.2 for 8160, 66.6 for 6154)

Xeon Platinums have RAS and 8S capability and should not be compared with Gold stuff. You either need those features or you don't.

Gold is for 1-4S and priced according to perf/features. For example, both the Gold 6138 and Gold 6148 have 20 cores, but their turbo ratios differ enough (~60 vs 66) to justify purchasing the latter.

So obviously you can't compare different lineups, and even blind multiplication of turbo x core count does not account for differences in cache and/or wattage.
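
For reference, the "blind multiplication" being discussed can be sketched as below. The core counts and all-core clocks are assumptions chosen to reproduce the 67.2 and 66.6 figures from the quoted post, not values taken from a spec sheet, and the `Cpu` class is made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    cores: int
    all_core_ghz: float  # assumed all-core turbo clock, in GHz

    def naive_mt_score(self) -> float:
        """Clock x core count; ignores cache, wattage, and features."""
        return self.cores * self.all_core_ghz


platinum_8160 = Cpu("Platinum 8160", cores=24, all_core_ghz=2.8)
gold_6154 = Cpu("Gold 6154", cores=18, all_core_ghz=3.7)

for cpu in (platinum_8160, gold_6154):
    print(f"{cpu.name}: {cpu.naive_mt_score():.1f}")

extra_cores = platinum_8160.cores / gold_6154.cores - 1
print(f"Core count advantage: {extra_cores:.0%}")
```

The two products land within about 1% of each other despite the 33% core count gap, which is exactly why the number on its own says little about cache, power, or platform features.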

P.S. Intel's lineup is straight from marketing hell. We just went through a server purchase, and the decision was really muddy compared to the 2690s we kept buying before. Anandtech's non-AVX turbo ratio table is pure gold :)
 

NTMBK

Lifer
Nov 14, 2011
10,435
5,785
136
This talk all makes no sense to me. Why on earth would AVX2 boost IPC? Surely at best IPC should be the same vs. SSE4, and at worst go down slightly? Each AVX2 instruction can pull in more data, so the odds of a memory stall on each instruction are higher, meaning IPC is liable to go down.

Of course, throughput/clock will be up... but that's not IPC.
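
A toy model of that distinction, under loudly stated assumptions: a loop over floats is compiled either to 128-bit SSE (4 floats per instruction) or 256-bit AVX2 (8 floats per instruction), and the core retires exactly one vector instruction per cycle in both cases, with no memory stalls. None of these numbers come from a real chip.

```python
ELEMENTS = 1_000_000  # floats to process (arbitrary)


def run(floats_per_instruction: int) -> tuple[float, float]:
    """Return (IPC, floats per clock) for the toy core described above."""
    instructions = ELEMENTS // floats_per_instruction
    cycles = instructions  # assumption: one vector instruction per cycle
    return instructions / cycles, ELEMENTS / cycles


sse_ipc, sse_throughput = run(4)
avx2_ipc, avx2_throughput = run(8)

print(f"SSE:  IPC={sse_ipc:.1f}, floats/clock={sse_throughput:.1f}")
print(f"AVX2: IPC={avx2_ipc:.1f}, floats/clock={avx2_throughput:.1f}")
```

In this model IPC is identical for both builds while work done per clock doubles, because the AVX2 build simply has half as many instructions to retire.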
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
Thought the 12% was already 'AVX-corrected'? And Linpack is only 1 app out of 20+ tested. Yes, 25% clock scaling is best case, but that applies somewhat to AMD as well. These chips are running at 3.5GHz, ya know.
An additional data point was added to the charts to indicate performance with 256-bit workloads excluded (the ones which have actual gains from 256-bit code).
The excluded 256-bit workloads are: Blender, Bullet (IPC only), Embree, Euler3D, Himeno, Linpack, NBody & X265.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
This talk all makes no sense to me. Why on earth would AVX2 boost IPC?

If one CPU executes the same 256-bit vector workload in half the time, and the program's retired instruction count is constant, wouldn't the faster CPU have double the IPC?
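
A minimal sketch of this scenario, assuming both CPUs run the same binary at the same fixed clock (3.5 GHz is used here only because it is the figure mentioned earlier in the thread). The instruction count and run times are made-up illustrative values.

```python
CLOCK_HZ = 3_500_000_000      # assumed identical fixed clock on both CPUs
INSTRUCTIONS = 7_000_000_000  # same binary, so same retired instruction count


def ipc(instructions: int, seconds: float, clock_hz: int) -> float:
    """Retired instructions divided by elapsed core cycles."""
    cycles = seconds * clock_hz
    return instructions / cycles


slow_cpu_ipc = ipc(INSTRUCTIONS, seconds=2.0, clock_hz=CLOCK_HZ)
fast_cpu_ipc = ipc(INSTRUCTIONS, seconds=1.0, clock_hz=CLOCK_HZ)

print(f"slow CPU IPC: {slow_cpu_ipc:.1f}")
print(f"fast CPU IPC: {fast_cpu_ipc:.1f}")
print(f"ratio: {fast_cpu_ipc / slow_cpu_ipc:.1f}x")
```

So when the instruction stream is held constant and only the hardware differs, halving the run time does double measured IPC. That differs from comparing an SSE build against an AVX2 build, where the instruction count itself changes.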
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,730
136
Xeon Platinums have RAS and 8S capability and should not be compared with Gold stuff. You either need those features or you don't.
What if I want a 2P 56 core system? Or even a single 28C workstation? Surely Platinums being capable of 8p doesn't mean that that's the only configuration they're capable of running.
So obviously you can't compare different lineups, and even blind multiplication of turbo x core count does not account for differences in cache and/or wattage.
So what's happening with speculation around Coffee Lake Cinebench MT scores, and with posts with a similar line of argument?
 