Ryzen's halved 256-bit AVX2 throughput


Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,249
136
You're missing the point. The point is that Intel's margin of victory increases with heavier workloads and with greater AVX2 optimization. I've never done rendering, but I've heard that professional rendering/encoding jobs can take many hours to complete. With that in mind, the performance gap will likely be much larger than in the small rendering and encoding jobs that we see in the Ryzen reviews, including even Computerbase.de's review, which used the biggest jobs I could find.

Use your imagination and just 10x the shown results. Doesn't look like an earth-shattering sway in favor of Intel. People should really be happy that Ryzen is so competitive.

Interesting results. But as someone who does 3D, I'd rather build two Ryzen systems for the price of one 6900K system.

That's not playing fair!
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Not so bad really, maybe 15-20% slower for less than half the cost of an equivalent Intel 8-core HEDT CPU.

It all comes down to whether you have the $$$, I guess. Still a huge win on price/perf.

Yeah, Ryzen is awesome for price/performance no doubt. But for me the jury is still out when it comes to performance in content or media creation and AVX2 applications, which are becoming more and more prevalent.

The reviewers were too handicapped by time to do thorough reviews which would actually mimic what you'd see in the real world.

That, and with AM4 you should be good for the next two Zen generations, while Intel HEDT is a dead-end platform that's obsolete in a few months.

I'd hardly say it's obsolete. o_O I just ordered a 6900K after owning a 5930K. I hope to overclock the 6900K to at least 4.2GHz, and it will be paired with a DDR4-3200 32GB quad-channel kit. A setup like this will be viable for years, although I tend to upgrade my platform much faster, and I will probably upgrade again once the die shrink of Skylake-E becomes available or PCI-E 4.0 is released.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Use your imagination and just 10x the shown results. Doesn't look like an earth-shattering sway in favor of Intel. People should really be happy that Ryzen is so competitive.

I am happy that Ryzen is competitive, but I'm not happy that they decided to go with 2x128-bit rather than full 256-bit vectors. I think this is going to come back to haunt them in the future, as more and more applications become heavily AVX2 optimized.

Let's hope that Ryzen 2 has full 256-bit-wide vectors.
 

Stormflux

Member
Jul 21, 2010
140
26
91
I'm curious, but how much time does the average render take for a professional job?

Completely depends on what you're trying to do. Some of the most complex shots in films could take days for a single frame... or it could take a couple of minutes. This is an unanswerable question, and benchmark results cannot be thought of in a vacuum or as averages. Scale is the real metric, so price/perf is king.

AVX2's relevance is seriously being overstated. For these tasks, GPUs are completely blindsiding any CPU for 3D; AVX2 adoption isn't even a blip.
 

Rifter

Lifer
Oct 9, 1999
11,522
751
126
I'd hardly say it's obsolete. o_O I just ordered a 6900K after owning a 5930K. I hope to overclock the 6900K to at least 4.2GHz, and it will be paired with a DDR4-3200 32GB quad-channel kit. A setup like this will be viable for years, although I tend to upgrade my platform much faster, and I will probably upgrade again once the die shrink of Skylake-E becomes available or PCI-E 4.0 is released.

I don't mean obsolete as in not usable; I mean obsolete as in replaced by SL-X in a few months. It's a dead-end platform with no upgrade path. It will for sure be usable for years to come.
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Well said, Stormflux. AMD are not going to follow Intel's design philosophy with its wide AVX units. Intel even went one step further and have AVX-512 units on Skylake-EP. I think AMD are clear that for parallel FP workloads the GPU is the better fit, and that's why they will have server and consumer APUs with Zen/Vega. Those will murder any AVX-based design for FP throughput. AMD's design philosophy is very simple: maximum compute density for servers, using the best compute engine for the job. For traditional servers it's Naples, and for HPC it's Naples servers with Vega GPUs and Zen/Vega APUs.
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
Increasing the width of the AVX data path isn't without cost. It takes up die space, and affects thermals - note that Intel had to create a second, lower base clock rate specifically for AVX on its newer chips. IMO, the vast majority of laptops don't need AVX. Nor do gaming systems. This is pretty much purely a workstation/server feature.

In a future iteration of Zen, perhaps AMD should consider going with two different CCX types. You could have a 4-core CCX with a 128-bit AVX data path - basically a refined version of what they have now - for APUs. This would minimize die size while keeping TDP down (especially important for laptops). Then you could have an 8-core CCX with a 256-bit AVX data path for HEDT, workstations, and servers. There would still be the option of interlinking multiple CCXs with Infinity Fabric, as Naples will be doing, but this would allow HEDT to be a monolithic 8-core chip with none of the issues with L3 latency on core shuffling that we're seeing now with Ryzen.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Completely depends on what you're trying to do. Some of the most complex shots in films could take days for a single frame... or it could take a couple of minutes. This is an unanswerable question, and benchmark results cannot be thought of in a vacuum or as averages. Scale is the real metric, so price/perf is king.

AVX2's relevance is seriously being overstated. For these tasks, GPUs are completely blindsiding any CPU for 3D; AVX2 adoption isn't even a blip.

But most of the big 3D CGI companies look like they are using CPUs to render rather than GPUs. And if they are using CPUs to do rendering, then any type of SIMD would be useful.

That said, I don't expect CPUs to equal GPUs when it comes to rendering, as GPUs are explicitly made for that purpose.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Those will murder any AVX-based design for FP throughput. AMD's design philosophy is very simple: maximum compute density for servers, using the best compute engine for the job. For traditional servers it's Naples, and for HPC it's Naples servers with Vega GPUs and Zen/Vega APUs.

Utilizing the GPU to the utmost is a smart thing to do, but AVX2 isn't really about FP throughput, where the GPU is king; it's for integer workloads.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Increasing the width of the AVX data path isn't without cost. It takes up die space, and affects thermals - note that Intel had to create a second, lower base clock rate specifically for AVX on its newer chips. IMO, the vast majority of laptops don't need AVX. Nor do gaming systems. This is pretty much purely a workstation/server feature.

AVX is already used for gaming, in PhysX. AVX2 can also be used for gaming, and will probably be used in future iterations of PhysX that run on the CPU. Gaming is one of the rare consumer applications that demand serious processing power.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Why would AVX2 not work with light workloads?

Nobody is saying it doesn't work in light workloads. What Computerbase.de was saying is that lengthier and heavier workloads allow Intel's single-cycle 256-bit throughput to gain steam over AMD's halved throughput. To make a bad car analogy, it's like how a Porsche 911 with 530 hp and all-wheel drive will beat a Corvette ZR1 with 650 hp and rear-wheel drive in the quarter mile thanks to faster acceleration, but the longer the race continues, the more the faster Corvette catches up, until it eventually blows right past the Porsche.
 

nickmania

Member
Aug 11, 2016
47
13
81
You're missing the point. The point is that Intel's margin of victory increases with heavier workloads and with greater AVX2 optimization. I've never done rendering, but I've heard that professional rendering/encoding jobs can take many hours to complete. With that in mind, the performance gap will likely be much larger than in the small rendering and encoding jobs that we see in the Ryzen reviews, including even Computerbase.de's review, which used the biggest jobs I could find.

The problem for Intel and its optimizations is that 3D rendering and video encoding are moving pretty fast to the GPU side, using less and less CPU for those tasks. Every year more filters and programs push more of their calculations to the GPU. Today you can buy a $75 GPU and encode video faster than a $4,000 build, and in a few years 3D rendering will be done by a GPU or a stack of GPUs instead of a farm of x86 processors.

For example, Maxon (Cinema 4D) has signed with AMD to use their GPU renderer.
 
  • Like
Reactions: lightmanek

HurleyBird

Platinum Member
Apr 22, 2003
2,690
1,278
136
As you can see, the lighter the workload, the more favorable Ryzen appears, but as the workload lengthens or uses higher-quality settings, Intel's full-width 256-bit SIMD throughput begins to gain steam.

While those are interesting benchmarks, I don't see the evidence pointing towards the small-versus-large rendering task differences being AVX related, nor does it make logical sense that they would be. I'm sure there are a number of potential variables, but the two that come to mind as obvious possibilities are turbo clocks staying higher during long periods of activity on the Intel side, or simply the working-set size of the task: past 256 KB, Ryzen will have an advantage due to its larger L2, but as the dataset grows further and starts hitting main memory, Intel's superior memory controller should give it more and more of an edge.
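To make the working-set point concrete, here is a minimal C sketch of the kind of microbenchmark that would show it (my own hypothetical code, not anything from the reviews; the buffer sizes and pass counts are arbitrary): effective bandwidth stays high while the buffer fits in cache and falls off once it spills into L3 and then main memory.

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* sum the buffer several times so timing is dominated by data movement */
static double sum_repeated(const double *buf, size_t n, int passes)
{
    double s = 0.0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            s += buf[i];
    return s;
}

int main(void)
{
    /* working sets from 128 KB (cache-resident) up to 32 MB (DRAM-bound) */
    for (size_t kb = 128; kb <= 32768; kb *= 4) {
        size_t n = kb * 1024 / sizeof(double);
        double *buf = malloc(n * sizeof(double));
        if (!buf) return 1;
        for (size_t i = 0; i < n; i++) buf[i] = 1.0;

        int passes = (int)(32768 / kb) * 8;   /* ~256 MB of traffic per size */
        clock_t t0 = clock();
        volatile double s = sum_repeated(buf, n, passes); /* volatile: keep the work */
        double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;
        (void)s;

        printf("%6zu KB: %.1f GB/s\n",
               kb, (double)passes * n * sizeof(double) / sec / 1e9);
        free(buf);
    }
    return 0;
}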
 

TandemCharge

Junior Member
Mar 10, 2017
4
4
16
The FLOPs/cycle figure for a particular chip often gets overstated because of FMA. The real FLOP rate when using only multiplies or adds is half of that.

People need to realise that there are other operations too, like division and square root, and they run at half width (128-bit) on Intel chips (so they run at quarter rate). These operations are at times not fully pipelined, AFAIK.

So if I do a simple calculation like this:

cos_theta = (a · b) / (|a| |b|)
where |p| = sqrt(px^2 + py^2 + pz^2)

then division and square root will dominate the running time, not the chip's ability to do FMA.
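To illustrate that point, here is a hypothetical C sketch of my own (not anyone's production code): the dot product maps straight onto FMAs, but the two square roots and the final division run on the slow, mostly unpipelined units and end up dominating the runtime.

Code:
#include <math.h>

typedef struct { double x, y, z; } vec3;

/* cos_theta = (a . b) / (|a| |b|) */
double cos_theta(vec3 a, vec3 b)
{
    double dot  = a.x * b.x + a.y * b.y + a.z * b.z;        /* fused multiply-adds */
    double lena = sqrt(a.x * a.x + a.y * a.y + a.z * a.z);  /* sqrt: low throughput */
    double lenb = sqrt(b.x * b.x + b.y * b.y + b.z * b.z);  /* sqrt: low throughput */
    return dot / (lena * lenb);                             /* div: low throughput */
}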
 

bjt2

Senior member
Sep 11, 2016
784
180
86
It's half the throughput of its competitor. Also, the number of AVX2-enabled programs continues to increase, along with the level of optimization. Just look at the benchmarks, and how non-AVX2 CPUs do in comparison to AVX2-enabled processors. This 2x128-bit thing is a half measure, and whilst it might suffice for now, it won't cut it in the future as Intel seems hell-bent on widening the SIMD vectors even more. AVX-512 will eventually make it to consumer chips one day, and if AMD is still on 2x128-bit because of "efficiency" then they're going to get a rude awakening.

The throughput is double only for 256-bit FP FMAs. AMD can do 4x128-bit FP ops and 4x128-bit IVEC ops per cycle; Intel can do 2x256-bit FP ops (including FMAs) and 2+1 256-bit IVEC ops (2 arithmetic and 1 shuffle/misc).

So Zen can do 1x256-bit FMUL + 1x256-bit FADD, or 1x256-bit FMA, versus Intel's 2x256-bit FMUL or 2x256-bit FADD or 2x256-bit FMA (provided the ports are free, and with 2 threads this can happen; on Zen it cannot).
For IVEC (I assume H.264/H.265 don't use FP, but Blender does):
Zen can do 2x256-bit or 4x128-bit IVEC ops (all of them: add, sub, mul, div, etc.), while Intel can do 2x256-bit arithmetic + 1x256-bit shuffle/misc (again provided the ports are free, and with 2 threads this can happen; on Zen it cannot).
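Just to put rough numbers on those port counts, here's a back-of-the-envelope C sketch of peak single-precision FMA throughput; the clock speeds are placeholders I picked for illustration, not measured figures.

Code:
#include <stdio.h>

/* each FMA counts as two FLOPs */
static double peak_gflops(int cores, double ghz, int fma_per_cycle, int floats_per_vec)
{
    return cores * ghz * fma_per_cycle * floats_per_vec * 2.0;
}

int main(void)
{
    /* Zen: one 256-bit FMA per cycle (two 128-bit pipes working together) */
    printf("Zen 8C   @ 3.6 GHz: %6.0f GFLOP/s\n", peak_gflops(8, 3.6, 1, 8));
    /* Broadwell-E: two 256-bit FMAs per cycle */
    printf("BDW-E 8C @ 3.2 GHz: %6.0f GFLOP/s\n", peak_gflops(8, 3.2, 2, 8));
    return 0;
}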
 
  • Like
Reactions: Carfax83

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The problem for Intel and its optimizations is that 3D rendering and video encoding are moving pretty fast to the GPU side, using less and less CPU for those tasks. Every year more filters and programs push more of their calculations to the GPU. Today you can buy a $75 GPU and encode video faster than a $4,000 build, and in a few years 3D rendering will be done by a GPU or a stack of GPUs instead of a farm of x86 processors.

For example, Maxon (Cinema 4D) has signed with AMD to use their GPU renderer.

We've had GPU encoding/transcoding for years now, but it's still inferior to encoding/transcoding on a CPU when it comes to actual quality. What's the point of encoding a video at breakneck speed on a GPU if it's going to end up looking like VHS?

Now rendering is another matter, as rendering is naturally more suited to the GPU. But even that has some limitations, as GPUs don't have as much RAM to play with as CPUs. I tried looking, but I couldn't find any examples of a big rendering studio that uses GPUs for rendering.
 
  • Like
Reactions: pcp7

krumme

Diamond Member
Oct 9, 2009
5,955
1,591
136
So that's a best-case AVX2 scenario for Intel, and a Ryzen 1700X is still slightly faster than the similarly priced 6800K - and that's excluding the motherboard and RAM cost of the X99 platform.
For the rest of the encoding tests, the 1800X is simply faster than the 6900K.

Seems to me the benchmark ought to be Zen, not Intel.
Who has the balanced arch here?
3DNow!+ anyone?
 
  • Like
Reactions: lightmanek

nickmania

Member
Aug 11, 2016
47
13
81
We've had GPU encoding/transcoding for years now, but it's still inferior to encoding/transcoding on a CPU when it comes to actual quality. What's the point of encoding a video at breakneck speed on a GPU if it's going to end up looking like VHS?

Now rendering is another matter, as rendering is naturally more suited to the GPU. But even that has some limitations, as GPUs don't have as much RAM to play with as CPUs. I tried looking, but I couldn't find any examples of a big rendering studio that uses GPUs for rendering.

GPU video encoding does not look bad; there are tons of videos on YouTube encoded using the GPU. In fact, every camera on the market encodes its footage in real time using a dedicated chip, not a general-purpose x86 CPU.

The 3D engines are not ready for GPU rendering yet; they just render using OpenGL, but it is moving fast. Cinema 4D is going to add the AMD renderer, so we will see in a few months.
 

Ancalagon44

Diamond Member
Feb 17, 2010
3,274
202
106
But most of the big 3D CGI companies look like they are using CPUs to render rather than GPUs. And if they are using CPUs to do rendering, then any type of SIMD would be useful.

That said, I don't expect CPUs to equal GPUs when it comes to rendering, as GPUs are explicitly made for that purpose.

What makes you say that? How do you know whether it is true or not?

First, why would they use CPUs? There is no benefit. GPUs are purpose-built for rendering - they are much more efficient at it. Not only that, there is much better software support for GPU rendering than for CPU rendering. They just don't gain anything from CPU rendering.

Also bear in mind that rendering and encoding are two separate steps. They could render on GPUs and then encode on CPUs if they really wanted to. Although even then, a GPU is likely a better choice.
 

dacostafilipe

Senior member
Oct 10, 2013
772
244
116
We've had GPU encoding/transcoding for years now, but it's still inferior to encoding/transcoding on a CPU when it comes to actual quality. What's the point of encoding a video at breakneck speed on a GPU if it's going to end up looking like VHS?

If you are talking about dedicated hardware (Intel QuickSync, AMD VCE, ...), then yes, CPUs do a better job, but that's not true for CUDA/OpenCL encoders.
 
  • Like
Reactions: lightmanek

Nothingness

Platinum Member
Jul 3, 2013
2,444
767
136
Now rendering is another matter, as rendering is naturally more suited to the GPU. But even that has some limitations, as GPUs don't have as much RAM to play with as CPUs. I tried looking, but I couldn't find any examples of a big rendering studio that uses GPUs for rendering.
My understanding is that most (all?) large rendering farms are only using CPUs. Easier integration, fewer software issues, more RAM, etc.
 
  • Like
Reactions: Carfax83

naukkis

Senior member
Jun 5, 2002
710
586
136
Ryzen is so good because it lacks native 256-bit units, which are just a waste of power. The 6900K running AVX is 140W TDP; the 1800X is 95W. Ryzen already has at least equal performance per watt compared to Intel's best offerings, and let's wait and compare 180W Naples to Intel's 180W offerings - it will be very competitive in AVX workloads too.

Intel should drop 256-bit ops, or AMD will offer a faster all-around CPU soon.