Intel Skylake / Kaby Lake

Topweasel

Diamond Member
Oct 19, 2000
4,912
510
136
Yeah the bandwidth is fine, however why do you think Ryzen does well in AVX?
It only does OK with 128-bit ops.
The idea behind Ryzen is that it has more cores to make up for lost 256-bit performance. It's down on Coffee Lake this year, but if Ryzen 3k tops out at 12 or more cores, then it balances out. It ends up being better at 128-bit, ties on 256-bit, and then loses completely on 512-bit due to lack of support. I am guessing Zen 3 will probably move to 256-bit units and do 512 the way it's doing 256 right now.
 

IRobot23

Senior member
Jul 3, 2017
601
18
76
The idea behind Ryzen is that it has more cores to make up for lost 256-bit performance. It's down on Coffee Lake this year, but if Ryzen 3k tops out at 12 or more cores, then it balances out. It ends up being better at 128-bit, ties on 256-bit, and then loses completely on 512-bit due to lack of support. I am guessing Zen 3 will probably move to 256-bit units and do 512 the way it's doing 256 right now.
I don't think that is the idea.
- A core with AVX-512 will take around 20-40% more space than a core that can do 256-bit AVX.
- With 512-bit AVX you can double performance. So basically up to 40% more space, but double the throughput.

- Why would you need more than 8 cores on desktop?
- 12 cores vs 8 cores with 15% higher IPC and 20% higher clocks: which one would you pick?


I think the idea is maximizing efficiency across all three: power, die area, and performance.

I think AVX should be accelerated by the iGPU (GPU).
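
A rough back-of-the-envelope check of the two trade-offs above. This is only a sketch: it assumes throughput scales linearly with core count, IPC, and clock, which real workloads rarely do, and it takes the 40% area and 2x throughput figures from the post at face value. The function name is made up for illustration.

```python
# Back-of-the-envelope sketch; the percentages come from the post above,
# and perfect linear scaling is an assumption, not a measurement.

def relative_throughput(cores, ipc_gain=0.0, clock_gain=0.0):
    """Throughput relative to one baseline core, assuming perfect scaling."""
    return cores * (1.0 + ipc_gain) * (1.0 + clock_gain)

# 12 baseline cores vs 8 cores with +15% IPC and +20% clock.
twelve_core = relative_throughput(12)
eight_core = relative_throughput(8, ipc_gain=0.15, clock_gain=0.20)

# AVX-512 core: up to 40% more area for up to 2x vector throughput.
throughput_per_area = 2.0 / 1.40

print(f"12 cores: {twelve_core:.2f}, 8 faster cores: {eight_core:.2f}")
print(f"AVX-512 vector throughput per unit area: {throughput_per_area:.2f}x")
```

Under those assumptions the 12-core part only stays ahead when all 12 cores are kept busy; with 8 or fewer threads, each faster core does about 1.38x the work of a baseline core.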
 

Topweasel

Diamond Member
Oct 19, 2000
4,912
510
136
I don't think that is the idea.
- A core with AVX-512 will take around 20-40% more space than a core that can do 256-bit AVX.
- With 512-bit AVX you can double performance. So basically up to 40% more space, but double the throughput.
I think the idea is using AVX perf/power efficiently.
They are unlikely to go with a full-blown AVX-512 setup any time soon. It's not just the space requirements but the power requirements. Eventually, if AVX-512 is adopted widely enough, AMD might feel its hand is forced. I kind of doubt it, as its capability is split across nearly a dozen implementations, keeping it pretty much vendor specific. But if they feel they can update to 256-bit units in the core, which in theory gives them a performance lead as long as they have a core lead, they could still do 512-bit work with one of the more limited subsets.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
85
106
The idea behind Ryzen is that it has more cores to make up for lost 256-bit performance. It's down on Coffee Lake this year, but if Ryzen 3k tops out at 12 or more cores, then it balances out. It ends up being better at 128-bit, ties on 256-bit, and then loses completely on 512-bit due to lack of support. I am guessing Zen 3 will probably move to 256-bit units and do 512 the way it's doing 256 right now.
Not all workloads can be parallelized infinitely, if at all.
Modern video encoders such as VP9 and HEVC are a good example.
With HEVC you can only utilize ~10 threads efficiently at 1080 resolution. To efficiently utilize more threads than that, the resolution needs to go up.
That's why it is important to maximise the ILP, which is achieved using >= 256-bit code.

In my opinion, AMD HAS TO make Gen. 3 Zen wider.
Intel already went wide with their consumer cores in 2013, after all.
And it's not like the wide workloads are getting any less common.
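
To put rough numbers on the thread-limit argument, here is a toy model. It is a sketch under loud assumptions: one thread per core, no SMT, equal clocks, and an encoder that is fully vectorized, none of which is claimed in the post. The ~10-thread cap for 1080p HEVC is the figure quoted above; the function name is hypothetical.

```python
def effective_lanes(cores, simd_bits, usable_threads, element_bits=32):
    """32-bit operations in flight when the workload can only keep
    `usable_threads` threads busy (one thread per core, no SMT)."""
    busy_cores = min(cores, usable_threads)
    return busy_cores * (simd_bits // element_bits)

THREAD_CAP = 10  # ~10 threads for 1080p HEVC, per the post above

for cores in (8, 12, 16):
    narrow = effective_lanes(cores, 128, THREAD_CAP)
    wide = effective_lanes(cores, 256, THREAD_CAP)
    print(f"{cores:2d} cores: 128-bit -> {narrow:3d} lanes, 256-bit -> {wide:3d} lanes")
```

In this model, going from 12 to 16 cores adds nothing once the thread cap is hit, while doubling the SIMD width still doubles the lanes in flight.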
 

Topweasel

Diamond Member
Oct 19, 2000
4,912
510
136
Not all workloads can be parallelized infinitely, if at all.
Modern video encoders such as VP9 and HEVC are a good example.
With HEVC you can only utilize ~10 threads efficiently at 1080 resolution. To efficiently utilize more threads than that, the resolution needs to go up.
That's why it is important to maximise the ILP, which is achieved using >= 256-bit code.

In my opinion, AMD HAS TO make Gen. 3 Zen wider.
Intel already went wide with their consumer cores in 2013, after all.
And it's not like the wide workloads are getting any less common.
You can go wider without going straight to native AVX-512 support. There's no reason to spend the real estate and power on something that is largely useless just to say you have it.
 

PhonakV30

Senior member
Oct 26, 2009
954
26
136
Ryzen 7 1800X - 3600 MHz, 8 cores, 2-way SMT, 128-bit registers == 3600000 * 8 * 2 * 4 == 230400000 32-bit IOPS
Core i7-7700K - 4200 MHz, 4 cores, 2-way SMT, 256-bit registers == 4200000 * 4 * 2 * 8 == 268800000 32-bit IOPS
Core i7-7740X - 4300 MHz, 4 cores, 2-way SMT, 256-bit registers == 4300000 * 4 * 2 * 8 == 275200000 32-bit IOPS
Core i7-7800X - 3500 MHz, 6 cores, 2-way SMT, 256-bit registers == 3500000 * 6 * 2 * 8 == 336000000 32-bit IOPS
Core i7-5960X - 3000 MHz, 8 cores, 2-way SMT, 256-bit registers == 3000000 * 8 * 2 * 8 == 384000000 32-bit IOPS
Core i7-6900K - 3200 MHz, 8 cores, 2-way SMT, 256-bit registers == 3200000 * 8 * 2 * 8 == 409600000 32-bit IOPS
Core i7-7820X - 3600 MHz, 8 cores, 2-way SMT, 256-bit registers == 3600000 * 8 * 2 * 8 == 460800000 32-bit IOPS
8 threads * 8-way SIMD = 64 32-bit ops in parallel
16 threads * 4-way SIMD = 64 32-bit ops in parallel
32 threads * 4-way SIMD = 128 32-bit ops in parallel
24 threads * 8-way SIMD = 192 32-bit ops in parallel

That's why 256-bit registers are better than more cores. And leveraging AVX2 is trivial; the OpenCL compiler even does it for you.
Now look at this link: Ryzen therefore doesn't have dedicated AVX2 hardware, just AVX2 emulation. Just like The Stilt said, AMD really needs a full 256-bit register, not two 128-bit ones.
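
For anyone who wants to replay the arithmetic in the listing above, a small sketch follows. Note that the listing plugs the clock in as kHz (3600 MHz becomes 3600000), so its results are in thousands of operations per second; the sketch converts to Hz and prints giga-ops instead. It keeps the listing's convention of counting both SMT threads, which probably overstates real peak because the two threads of a core share its execution units. The function and variable names are illustrative only.

```python
# (name, clock in MHz, cores, SIMD register width in bits), from the listing.
CPUS = [
    ("Ryzen 7 1800X", 3600, 8, 128),
    ("Core i7-7700K", 4200, 4, 256),
    ("Core i7-7740X", 4300, 4, 256),
    ("Core i7-7800X", 3500, 6, 256),
    ("Core i7-5960X", 3000, 8, 256),
    ("Core i7-6900K", 3200, 8, 256),
    ("Core i7-7820X", 3600, 8, 256),
]

SMT_WAYS = 2  # the listing counts both hardware threads per core


def peak_ops_per_second(mhz, cores, simd_bits, smt=SMT_WAYS, element_bits=32):
    """Theoretical 32-bit ops/s using the listing's formula:
    clock * cores * SMT ways * SIMD lanes."""
    lanes = simd_bits // element_bits
    return mhz * 1_000_000 * cores * smt * lanes


for name, mhz, cores, bits in CPUS:
    gops = peak_ops_per_second(mhz, cores, bits) / 1e9
    print(f"{name:14s} {gops:6.1f} G 32-bit ops/s")
```

At the same clock and core count, the only difference between the 1800X and the 7820X rows is the lane count (4 vs 8), which is where the 2x gap in the listing comes from.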
 

Topweasel

Diamond Member
Oct 19, 2000
4,912
510
136
Now look at this link: Ryzen therefore doesn't have dedicated AVX2 hardware, just AVX2 emulation. Just like The Stilt said, AMD really needs a full 256-bit register, not two 128-bit ones.
Yeah, that's what I said, didn't I?

I am guessing Zen 3 will probably move to 256bit units and do 512 the way it's doing 256 right now.
 

raghu78

Diamond Member
Aug 23, 2012
4,000
34
136
I don't think that is idea.
- Core with AVX512 will take around 20-40% more space than a core that can do AVX256.
- With AVX 512-bit you can double performance. So basically 40% more space, but double throughput.

- Why would you need more than 8 cores on desktop?
- 12Core vs 8Core with 15% higher IPC and 20% higher clocks? Which one would you pick?


I think idea is maximizing efficiency on all three scenarios power/die/performance.

I think AVX should be accelerated by iGPU(GPU)
Not all workloads can be parallelized infinitely, if at all.
Modern video encoders such as VP9 and HEVC are a good example.
With HEVC you can only utilize ~10 threads efficiently at 1080 resolution. To efficiently utilize more threads than that, the resolution needs to go up.
That's why it is important to maximise the ILP, which is achieved using >= 256-bit code.

In my opinion, AMD HAS TO make Gen. 3 Zen wider.
Intel already went wide with their consumer cores in 2013, after all.
And it's not like the wide workloads are getting any less common.
512-bit FP units are a total waste of valuable die space and power, especially for a company like AMD, which has dedicated high-performance GPUs for accelerating FP workloads. AMD could decide to go for 256-bit FP units in Zen 2, as 7nm will allow them a big increase in transistor budget. Still, I would think AMD is better off sticking with 128-bit FP units and trying to pack as many cores as possible at 7nm. If AMD Rome brings 64C/128T on a single socket, that's a massive leap.

https://hothardware.com/news/amd-epyc-2-64-cores-128-threads-and-256mb-l3-cache

Intel's rumoured 38C ICL-SP will have a tough time against 64C Rome in the vast majority of server workloads except AVX-based ones, and those are better off being accelerated by GPUs, which will run circles around the CPU-based SKUs.

https://www.heise.de/newsticker/meldung/Xeon-Phi-ist-tot-es-lebe-der-Xeon-H-3891026.html

Here is an excellent illustration of how the negative AVX frequency offset hurts server workloads:

https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/
 

The Stilt

Golden Member
Dec 5, 2015
1,709
85
106
I didn't say anything about going 512-bit wide, but wider.
256-bit is definitely the way to go, at least for the next few years.
 

krumme

Diamond Member
Oct 9, 2009
5,786
172
136
I think we all agree here guys.
I think using a Zen 2 derivative for the consoles is the priority, and that, IMO, due to mm² constraints, will probably mean we have to wait until Zen 3 for a wider FPU. What's your guess here?
 

NostaSeronx

Platinum Member
Sep 18, 2011
2,444
197
126
What's your guess here?
In January 2017, -redacted- rejoined AMD with this: "...microarchitectural designs of floating point register caching, per-lane predication, scatter/gather support and wide 512-bit datapath."

If anyone has to guess, Zen 2 will be at most SMT4/AVX512.

128-bit and 256-bit ops are outdated, particularly AVX/SSE. AVX-512 128-bit/256-bit ops are okay.
The VL extension enables AVX-512 instructions to operate on XMM (128-bit) and YMM (256-bit) registers, and are not limited to just the full ZMM registers. This symmetry definitely is good news. AVX-512, with the VL extension, seems well set to be the programming option of choice for compilers and hand coders because it unifies so many capabilities together along with access to 32 vector registers regardless of their size (XMM, YMM or ZMM).
- https://www.hpcwire.com/2017/06/29/reinders-avx-512-may-hidden-gem-intel-xeon-scalable-processors/
 

IRobot23

Senior member
Jul 3, 2017
601
18
76
I didn't say anything about going 512-bit wide, but wider.
256-bit is definitely the way to go, at least for the next few years.
How much more power will they need for that? We know that Intel's node is still superior. AMD will have a hard time keeping power down at the same performance as Intel.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
85
106
How much more power will they need for that? We know that Intel's node is still superior. AMD will have a hard time keeping power down at the same performance as Intel.
I haven't measured it myself so far, but I've seen 60% quoted over 128-bit SIMDs in some publications.
The additional resources are only powered up on demand, so while executing legacy code the difference in power draw is negligible.
 

