Ryzen's halved 256bit AVX2 throughput

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
If the reviews have made one thing clear, it's that Ryzen's content creation capabilities are formidable. Content creation workloads, i.e. rendering, encoding, and transcoding, typically use SIMD instructions heavily, as they are very compute intensive.

I noticed when reading the reviews that despite the supposedly halved throughput, Ryzen was competing exceptionally well with Intel's HEDT processors. I have to admit, it surprised me. I knew Ryzen would do well in that area, but not this well.

Then I read Computerbase.de's fantastic review and it all made sense. Most reviewers use short clips or lower settings when doing rendering and encoding, understandably to reduce the already laborious amount of time it takes to review the CPUs. Look at the results for both Blender and HandBrake comparing light workloads to heavy workloads:
cbPesw.png

Ae2kBM.png


As you can see, the lighter the workload, the more favorable Ryzen appears, but as the workload lengthens or uses higher quality settings, then Intel's full width 256 bit SIMD throughput begins to gain steam.

What this roughly means is that for serious content creation, Intel will still be the "go to" CPU as long as it has the full 256 bit SIMD throughput and AMD has half of that.
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
If the reviews have made one thing clear, it's that Ryzen's content creation capabilities are formidable. Content creation workloads, i.e. rendering, encoding, and transcoding, typically use SIMD instructions heavily, as they are very compute intensive.

As you can see, the lighter the workload, the more favorable Ryzen appears, but as the workload lengthens or uses higher quality settings, then Intel's full width 256 bit SIMD throughput begins to gain steam.

What this roughly means is that for serious content creation, Intel will still be the "go to" CPU as long as it has the full 256 bit SIMD throughput and AMD has half of that.

Well....Time is money! Guess it would depend on if you have more money or time to spare. More time get the 1700, more money get the 6900k.

Viewing the charts Ryzen really makes those older AMD chips look silly at best!
 

Carfax83

It looks like the 1800X is doing pretty well against the 6900K, even at higher workloads.

I never said it wasn't doing well. I'm just making an observation that Ryzen's halved 256 bit SIMD throughput definitely seems to hurt it for long renders or heavy encoding. Case in point, the Intel hex-core 6850K is able to displace the 1700X in the heavier tests.
 

Carfax83

Well....Time is money! Guess it would depend on if you have more money or time to spare. More time get the 1700, more money get the 6900k.

I guess so! But AMD really needs to move to full width 256 bit SIMD with Zen 2 if they want to compete with Intel in content creation.

Viewing the charts Ryzen really makes those older AMD chips look silly at best!

I agree, Ryzen is almost twice as fast as the FX-9590, and let's not even talk about the Phenom. Most of it is due to AVX2 optimization. AVX2 is finally coming home to roost, at least for content creation.
 

imported_jjj

Senior member
Feb 14, 2009
660
430
136
It's not really halved. People call it that because Ryzen does 2x128b, but that's not half of Intel's, and it's pretty strong considering.
AMD also benefits from better SMT yield than Intel.
All in all it's a neat core with solid efficiency and Intel is likely to follow a somewhat similar path in consumer to gain efficiency.

x264 shouldn't be using 256b
 

Carfax83

It's not really halved. People call it that because Ryzen does 2x128b, but that's not half of Intel's, and it's pretty strong considering.
AMD also benefits from better SMT yield than Intel.
All in all it's a neat core with solid efficiency and Intel is likely to follow a somewhat similar path in consumer to gain efficiency.

x264 shouldn't be using 256b

It's half the throughput of its competitor. Also, the number of AVX2 enabled programs continues to increase, along with the level of optimization. Just look at the benchmarks, and how non AVX2 CPUs do in comparison to AVX2 enabled processors. This 2x128 bit thing is a half measure, and whilst it might suffice for now, it won't cut it in the future as Intel seems hell bent on widening the SIMD vectors even more. AVX-512 will eventually make it to consumer chips one day, and if AMD is still on 128x2 because of "efficiency" then they're going to get a rude awakening.
 

Carfax83

AVX, or memory bandwidth? You're comparing a dual channel platform with a quad channel one.

I don't know about Blender, but from my experience, HandBrake is basically completely CPU bound. Memory bandwidth makes hardly any difference with HandBrake; it's all cores, threads, clock speed and instruction set.
 

leoneazzurro

Senior member
Jul 26, 2016
924
1,451
136
I said I was not sure, so if they use it, I stand corrected. It was a little strange to me because the theoretical output of an Intel AVX2 CPU is double when doing FMA (2x256 bit).
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
102,414
8,356
126
AVX, or memory bandwidth? You're comparing a dual channel platform with a quad channel one.

I don't know about Blender, but from my experience, HandBrake is basically completely CPU bound. Memory bandwidth makes hardly any difference with HandBrake; it's all cores, threads, clock speed and instruction set.

we do know that running single channel vs dual channel affects x265 (i need to update the thread!) thanks to @VirtualLarry for running it both ways on his G4600. so it's not as CPU bound as that. with a dual core kaby, i'd expect a CPU bound condition before a memory bandwidth bound condition. so with 4x moar cores (even if not quite so fast), i'd expect an even higher memory bound condition.
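
To put rough numbers on the per-core argument, here is a back-of-the-envelope sketch in Python. It assumes DDR4-2400 on every platform purely for illustration (not the configurations actually tested), counts theoretical peak bandwidth only, and uses a hypothetical helper name.

```python
def bandwidth_per_core_gbs(channels, cores, mt_per_s=2400, bus_bytes=8):
    """Theoretical peak DRAM bandwidth per core, in GB/s.

    Assumes 64-bit (8-byte) channels and DDR4-2400 by default.
    These are illustrative assumptions, not measured values.
    """
    total_gbs = channels * mt_per_s * 1e6 * bus_bytes / 1e9
    return total_gbs / cores


# Core and channel counts for the chips discussed in this thread.
platforms = {
    "G4600, single channel": bandwidth_per_core_gbs(channels=1, cores=2),
    "G4600, dual channel": bandwidth_per_core_gbs(channels=2, cores=2),
    "1800X, dual channel": bandwidth_per_core_gbs(channels=2, cores=8),
    "6900K, quad channel": bandwidth_per_core_gbs(channels=4, cores=8),
}

for name, gbs in platforms.items():
    print(f"{name}: {gbs:.1f} GB/s per core")
```

Under these assumptions an eight-core chip on dual channel has less peak bandwidth per core than the dual core running single channel, which is the direction of the argument above. It says nothing about how much bandwidth the encoder actually needs.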
 

Carfax83

I said I was not sure, so if they use it, I stand corrected. It was a little strange to me because the theoretical output of an Intel AVX2 CPU is double when doing FMA (2x256 bit).

Ryzen supports FMA as well, and the FX 9590 supports both AVX and FMA.

But look at the Phenom. The poor Phenom supports none of AVX2, AVX or FMA. Look at how slow it is compared to the other processors :eek:
 

Kenmitch

I don't know about Blender, but from my experience, HandBrake is basically completely CPU bound. Memory bandwidth makes hardly any difference with HandBrake; it's all cores, threads, clock speed and instruction set.

It's not like Ryzen was designed with a specific task in mind. It's more of an all-around chip that looks to somewhat live up to its hype. Sure, it'll have some growing pains, which I'd imagine will be settled one by one. Might take a while? Could be in a month? Time will tell in the end.

I'd imagine anybody really seriously into the content creation end of the spectrum would be looking at Ryzen's performance and drooling over Naples cores!
 

Carfax83

we do know that running single channel vs dual channel affects x265 (i need to update the thread!) thanks to @VirtualLarry for running it both ways on his G4600. so it's not as CPU bound as that. with a dual core kaby, i'd expect a CPU bound condition before a memory bandwidth bound condition. so with 4x moar cores (even if not quite so fast), i'd expect an even higher memory bound condition.

Yeah, but you have to consider the cache size as well. CPUs with larger cache sizes will be less affected by memory bandwidth, as the working set will fit in the cache. Maybe I'm wrong, but I can't think of a single review I've ever seen where an Intel HEDT CPU with quad channel memory was affected by memory bandwidth in encoding/transcoding tests.

The G4600 has only 3MB of L3 cache, compared to the massive 20MB that a 6900K has. The G4600 also lacks AVX2 if I'm not mistaken.
 

leoneazzurro

Ryzen supports FMA as well, and the FX 9590 supports both AVX and FMA.

But look at the Phenom. The poor Phenom supports none of AVX2, AVX or FMA. Look at how slow it is compared to the other processors :eek:

I know it supports FMA, but with AVX2 its FMA output is half that of Skylake/Kaby Lake, because Ryzen can do two 128 bit FMAs per cycle while Intel can do two 256 bit FMAs.
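
As a rough sanity check on those figures, a minimal Python sketch of peak FMA throughput per core per cycle. It assumes only what is stated above (two 128 bit FMA units on Ryzen, two 256 bit units on Intel); the function name and the choice of 32 bit floats are illustrative assumptions, and real code rarely reaches these peaks.

```python
def peak_flops_per_cycle(fma_units, vector_bits, element_bits=32):
    """Theoretical peak floating-point operations per core per cycle.

    Each FMA counts as two operations (one multiply, one add) per lane.
    """
    lanes = vector_bits // element_bits
    return fma_units * lanes * 2


ryzen = peak_flops_per_cycle(fma_units=2, vector_bits=128)
intel = peak_flops_per_cycle(fma_units=2, vector_bits=256)

print(f"Ryzen: {ryzen} FP32 ops/cycle/core")
print(f"Intel: {intel} FP32 ops/cycle/core")
print(f"Ratio: {ryzen / intel}")
```

This is a per-core, per-cycle peak for pure FMA code only, so it is an upper bound on the gap and not a prediction for any particular benchmark.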
 

imported_jjj

It's half the throughput of its competitor. Also, the number of AVX2 enabled programs continues to increase, along with the level of optimization. Just look at the benchmarks, and how non AVX2 CPUs do in comparison to AVX2 enabled processors. This 2x128 bit thing is a half measure, and whilst it might suffice for now, it won't cut it in the future as Intel seems hell bent on widening the SIMD vectors even more. AVX-512 will eventually make it to consumer chips one day, and if AMD is still on 128x2 because of "efficiency" then they're going to get a rude awakening.

Keep dreaming. Intel is pushing 512b in server but what they need is a supple core for thin and shiny notebooks and for that, they need to drop some bulk not add.
Starting this year, even Windows on ARM becomes a competitor.
Plus Intel has been hinting at lower perf in the future.
 

Carfax83

I know it supports FMA, but with AVX2 its FMA output is half that of Skylake/Kaby Lake, because Ryzen can do two 128 bit FMAs per cycle while Intel can do two 256 bit FMAs.

I'm not a programmer, so I don't know how useful or prevalent the FMA instructions are when it comes to encoding/rendering. But if you look at the benchmarks, the Intel CPUs definitely gain on Ryzen as the workloads become longer and heavier. The Intel hex-core is basically on par with the 1800X in the Blenderman test, and would presumably overtake it on a longer render.
 

Carfax83

Keep dreaming. Intel is pushing 512b in server but what they need is a supple core for thin and shiny notebooks and for that, they need to drop some bulk not add.
Starting this year, even Windows on ARM becomes a competitor.
Plus Intel has been hinting at lower perf in the future.

Where do you think Intel's HEDT CPUs are sourced from? Their server CPUs. There are tons of professionals and amateurs out there who use workstations for content creation, where AVX-512 would be very helpful indeed.

There's always going to be a need for more processing power as long as content creation and gaming exists.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
It's half the throughput of its competitor. Also, the number of AVX2 enabled programs continues to increase, along with the level of optimization. Just look at the benchmarks, and how non AVX2 CPUs do in comparison to AVX2 enabled processors. This 2x128 bit thing is a half measure, and whilst it might suffice for now, it won't cut it in the future as Intel seems hell bent on widening the SIMD vectors even more. AVX-512 will eventually make it to consumer chips one day, and if AMD is still on 128x2 because of "efficiency" then they're going to get a rude awakening.

So, Ryzen has less memory bandwidth than its competitor, lower IPC, and half the AVX throughput, and the worst it does is 20% slower (if I estimate correctly)?

Geez, I'm not sure I'd want to see what it would look like if AMD doubled Ryzen's AVX capacity - it would be a slaughter!
 

Rifter

Lifer
Oct 9, 1999
11,522
751
126
Not so bad really, maybe 15-20% slower for less than half the cost of the equivalent Intel 8 core HEDT CPU.

It all comes down to whether you have the $$$, I guess. Still a huge win on price/perf.

That, and with AM4 you should be good for the next 2 Zen generations, whereas Intel HEDT is a dead end platform that will be obsolete in a few months.

As with anything, it all comes down to what you do, how much $$ you have, and whether you care about a future upgrade path for the mobo.
 

Carfax83

So, Ryzen has less memory bandwidth than its competitor, lower IPC, and half the AVX throughput, and the worst it does is 20% slower (if I estimate correctly)?

Geez, I'm not sure I'd want to see what it would look like if AMD doubled Ryzen's AVX capacity - it would be a slaughter!

You're missing the point. The point is that Intel's margin of victory increases with heavier workloads and with greater AVX2 optimization. I've never done rendering, but I've heard that professional rendering/encoding jobs can take many hours to complete. With that in mind, the performance gap will likely be much larger than in the small rendering and encoding jobs that we see in the Ryzen reviews, including even Computerbase.de's review, and they used the biggest jobs I could find.
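
One way to reason about how the gap could scale is a simple Amdahl-style model, sketched here in Python. It assumes that some fraction of the run time is limited by 256 bit SIMD throughput and runs at half speed on the narrower design, while everything else runs at the same speed. The fractions below are made-up inputs for illustration, not measurements from any review, and the function names are hypothetical.

```python
def relative_runtime(simd_fraction, simd_slowdown=2.0):
    """Run time relative to the full-width chip (1.0 means equal).

    simd_fraction: share of the full-width chip's run time spent in code
                   limited by 256-bit SIMD throughput (assumed input).
    simd_slowdown: how much longer that share takes on the half-width chip.
    """
    return (1.0 - simd_fraction) + simd_fraction * simd_slowdown


def implied_simd_fraction(observed_runtime, simd_slowdown=2.0):
    """Invert the model: the SIMD-bound share implied by an observed gap."""
    return (observed_runtime - 1.0) / (simd_slowdown - 1.0)


for fraction in (0.1, 0.2, 0.5, 0.8):
    slower = (relative_runtime(fraction) - 1.0) * 100
    print(f"SIMD-bound share {fraction:.0%}: about {slower:.0f}% slower")

print(f"A 20% gap implies a share of {implied_simd_fraction(1.2):.0%}")
```

In this model a gap of about 20% corresponds to only about a fifth of the run time being throughput bound, and the gap grows only if heavier jobs actually raise that share. The model ignores core counts, clocks, cache and memory effects.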