***OFFICIAL*** Ryzen 5000 / Zen 3 Launch Thread REVIEWS BEGIN PAGE 39

Page 27

ksosx86

Member
Sep 27, 2012
Considering selling my 3900X and my case, moving to mATX... the RAM too... to segue into Zen 3. After looking at all of this, I didn't expect as many gains, tbh.
 

Carfax83

Diamond Member
Nov 1, 2010
All those cores and all of that throughput, Zen 3 is going to be really impressive for heavy compute workloads like encoding/transcoding and rendering.

Speaking of which, AnandTech needs to update their benchmarking suite for Zen 3. Why are they still using such an old version of Handbrake?

*edit* They used a newer version for the Tiger Lake review, but for desktop CPUs it seems they still use an older version.
 

DisEnchantment

Golden Member
Mar 3, 2017
The 12-core 5900X outperforms the 16-core 3950X in multimedia AVX2. That's likely thanks to an extra FPMUL and FPADD unit, combined with an extra load unit and an extra store unit.
Indeed, that extra FP unit is what gives AMD the performance crown in Cinebench R20. It will be the same in a lot of rendering loads.
This is in contrast to smartphone designs, which lean more toward integer operations.
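A rough way to see how 12 cores can beat 16 here: if per-core vector throughput scales with the number of full-width FP pipes, the totals compare as below. This is a toy model, not AMD's numbers; the pipe counts (2 per Zen 2 core, an assumed 3 per Zen 3 core) and the equal clocks are assumptions, and `relative_vector_throughput` is a hypothetical helper.

```python
def relative_vector_throughput(cores_a: int, pipes_a: int,
                               cores_b: int, pipes_b: int,
                               clock_a: float = 1.0, clock_b: float = 1.0) -> float:
    """Peak vector throughput of chip A relative to chip B.

    Toy model: throughput = cores * full-width FP pipes per core * clock.
    Ignores memory bandwidth, SMT, scheduling and power limits.
    """
    return (cores_a * pipes_a * clock_a) / (cores_b * pipes_b * clock_b)


# Assumed: 3 full-width FP pipes per Zen 3 core vs 2 per Zen 2 core, equal clocks.
ratio = relative_vector_throughput(12, 3, 16, 2)
print(f"5900X vs 3950X: {ratio:.3f}x")  # 36 / 32 -> 1.125x
```

Real results will land below this whenever power or memory becomes the limit.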

If you look at Phoronix, they have the most comprehensive tests for desktop and server use cases. For me, being more interested in server and dev-type loads, it is my go-to place for benchmarks.
They cover most of the applications that actually matter in real-world server environments and on dev machines, and for desktop use cases they have a bunch of common frameworks and apps.
If you don't like the way a benchmark is done, you can fork it and make changes yourself.

If AMD can get to a new node, they should have a nice power and area budget to improve "IPC" in a lot of other respects.
But it is not going to be revolutionary.
Something outside the box has to happen for that.
 

Hans de Vries

Senior member
May 2, 2008
www.chip-architect.com
So just to clarify, AMD has moved from 2x256b FMAC to 3x256b?

It seems so. The multimedia benchmark shows it.

The "multimedia" benchmark is really Mandelbrot/Julia. The whole dataset effectively resides in the (AVX2) register file; even L1 access doesn't matter much.

Therefore the performance is more or less proportional to the amount of vector (AVX2) execution resources.

It's even clearer with the 5800X results. (The 5950X seems limited by the 105W TDP when all AVX2 units run at max utilization.)
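To illustrate why a Mandelbrot-style kernel is bound by execution resources rather than memory, here is a scalar escape-time loop. It's a minimal sketch assuming the benchmark follows the textbook formulation, not its actual code: the whole loop state is a few floats, so compiled code keeps it in registers and spends its time on multiplies, adds and a compare. An AVX2 version would process 4 doubles (one 256-bit vector) per instruction, so extra vector pipes translate almost directly into throughput.

```python
def escape_time(cr: float, ci: float, max_iter: int) -> int:
    """Escape-time iteration z -> z*z + c for c = cr + i*ci.

    The whole loop state is a few floats (zr, zi, zr2, zi2), so compiled code
    keeps it in registers: no loads or stores inside the loop, just multiplies,
    adds and one compare per iteration.
    """
    zr = zi = 0.0
    for n in range(max_iter):
        zr2, zi2 = zr * zr, zi * zi   # 2 multiplies
        if zr2 + zi2 > 4.0:           # |z|^2 > 4 means the point has escaped
            return n
        zi = 2.0 * zr * zi + ci       # 2 multiplies, 1 add
        zr = zr2 - zi2 + cr           # 2 adds/subtracts
    return max_iter


print(escape_time(0.0, 0.0, 100))  # 100: c = 0 never escapes
print(escape_time(2.0, 0.0, 100))  # 2
```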

Thanks to TUM_APISAK for the 5800X find:

We can see a nearly 50% performance increase going from Zen 2 to Zen 3 in the multimedia benchmark, which would correspond to a 50% increase in AVX2 execution units.

(This 50% was actually one of the first Zen3 rumors)

[Image: 5800X multimedia benchmark result]
So we may expect something like the image below, in combination with 3 load and 2 store ports. (Mark Papermaster mentioned more L1 loads and more L1 stores per cycle in Ian's interview, at the 20-minute mark.)
[Image: expected Zen 3 FP execution units]

[Image]
 

Bigos

Member
Jun 2, 2019
This probably also explains why the base clocks are lower. If all of these new units run fully utilized, they probably generate a lot of heat. In regular workloads, however, the gain will probably be nowhere close to +50%, and the clocks should be comparable to the previous generation.
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
@Hans de Vries

Seems pretty conclusive. It should be interesting to see what happens with 5900X and 5950X clocks running things like Prime95 Small FFTs. A 3900X already drops clocks a lot running that bench.

I imagine that's why the base clocks are where they are. Having more AVX2 capability needs a bit more clock offset to hit their 24% efficiency improvement target as well as stay under the PPT limit.

Conversely, this could mean you may actually be able to substantially improve multicore AVX2 performance just by increasing PPT... since the process is the same but Fmax is ~200MHz higher we know it can handle higher clocks. So if you are willing to sacrifice efficiency for clocks I see more potential/theoretical gain in Zen 3 than with Zen 2.
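A back-of-the-envelope sketch of the PPT argument, assuming dynamic power P ∝ activity × V² × f and that voltage rises roughly linearly with frequency, so P ∝ activity × f³. Both are simplifying assumptions (leakage and the real V/f curve are ignored), and the 4.0 GHz reference clock is a hypothetical number, not a Zen 2 or Zen 3 spec.

```python
from typing import Optional


def sustainable_clock(f_ref_ghz: float, activity_ratio: float,
                      power_budget_ratio: float = 1.0,
                      fmax_ghz: Optional[float] = None) -> float:
    """Toy estimate of the all-core clock that fits a fixed power limit (e.g. PPT).

    Assumes dynamic power P ~ activity * V^2 * f with V roughly proportional
    to f, so P ~ activity * f^3. Leakage and the real V/f curve are ignored.

    f_ref_ghz:          clock that exactly fills the budget at reference activity
    activity_ratio:     switching activity vs. reference (1.5 = 50% more busy units)
    power_budget_ratio: new power limit vs. reference (1.5 = PPT raised by 50%)
    fmax_ghz:           optional hard clock ceiling (the chip's Fmax)
    """
    f = f_ref_ghz * (power_budget_ratio / activity_ratio) ** (1.0 / 3.0)
    return min(f, fmax_ghz) if fmax_ghz is not None else f


# Hypothetical: 4.0 GHz exactly fills the power limit at Zen 2-like FP activity.
print(round(sustainable_clock(4.0, 1.5), 2))       # 3.49: 50% more busy FP units, same limit
print(round(sustainable_clock(4.0, 1.5, 1.5), 2))  # 4.0: limit raised 50% to compensate
```

Real boost algorithms also react to temperature and current limits, so treat this only as the shape of the trade-off.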

We'll have to bait for wenchmarks, I mean, wait for benchmarks, of course.
 

Dave3000

Golden Member
Jan 10, 2011
Would running a Ryzen 5000 series CPU with memory faster than DDR4-3600 require overclocking the Infinity Fabric to keep a 1:1 ratio with the RAM?
 

IEC

Elite Member
Super Moderator
Jun 10, 2004

Anything greater than 3200 is technically an overclock. Although we'll have to wait for reviews to see, it looks like 4000 (2000 FCLK) may be possible with the 5000 series, whereas 3800 (1900 FCLK) was the mark of a good sample on Zen 2.
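The 1:1 arithmetic as a minimal sketch: DDR4 transfers data twice per memory clock, so the memory clock in MHz is half the DDR4 rating, and a 1:1 ratio means FCLK equals that memory clock. The 3200 threshold is the one stated in the post; `fclk_for_1to1` and `is_overclock` are hypothetical helper names.

```python
OFFICIAL_MAX_DDR4_MTS = 3200  # per the post: anything above this is technically an overclock


def fclk_for_1to1(ddr4_rating_mts: int) -> int:
    """FCLK in MHz needed to keep the Infinity Fabric 1:1 with memory.

    DDR transfers twice per memory clock, so MEMCLK (MHz) = rating / 2,
    and a 1:1 ratio means FCLK = MEMCLK.
    """
    return ddr4_rating_mts // 2


def is_overclock(ddr4_rating_mts: int) -> bool:
    return ddr4_rating_mts > OFFICIAL_MAX_DDR4_MTS


for rating in (3200, 3600, 3800, 4000):
    status = "overclock" if is_overclock(rating) else "in spec"
    print(f"DDR4-{rating}: FCLK {fclk_for_1to1(rating)} MHz for 1:1 ({status})")
```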
 

.vodka

Golden Member
Dec 5, 2014
If Zen 3 has 50% more fp resources, then full-speed 512-bit vector ops in Zen 4 feels like it's basically confirmed.

It looks like RedGamingTech had a pretty good source on Zen 3 about a year ago:

+ ~17% IPC
+ ~50% FP performance

He was the first to claim the FP performance increase, and based on the Sandra results that part seems to be the case... 7nm brought full-speed 256-bit execution on Zen 2, so if there's any good moment to add another unit for full-speed 512-bit execution (and probably AVX-512 to go with it) while keeping power mostly in check, it's with a node shrink (5nm -> Zen 4).

Zen4 also gets a nice memory bandwidth boost thanks to DDR5, so these units won't be starved. Why not increase the per CCX cache to 64MB while we're at it, lol. Zen3 isn't even out yet and I'm already getting hyped for Zen4. Goddamnit. Where's the time machine when you need it?
 

Dave3000

Golden Member
Jan 10, 2011

I don't want to overclock anything on my next rig, at least not until I know that it is stable at stock settings.
 

DrMrLordX

Lifer
Apr 27, 2000

Hopefully! Increasing PPT only brought lower clocks on my 3900x, but then PBO has been buggy as hell for me so I don't know what else to say there.
 

Hans de Vries

Senior member
May 2, 2008
www.chip-architect.com
It seems that significant improvements can be made for ZEN 3 by improving the FP units rather than adding extra ones. Mark Papermaster talks about improving the floating-point units, not adding new ones.

So let's have a look:

- The AVX2 register file shares ports between the FMUL and FADD units for FMA (multiply-add).
- Only one of the two FMUL units can do vector integer multiplies (both can do vector integer adds).
- Only one of the two FADD units can do vector integer adds.

Relatively cheap improvements that can be made for ZEN 3:

- Adding ports to the AVX2 register file could increase max FLOPS by 50% (the extra FLOPS come from the FADDs).
- Give the other FMUL unit vector integer multiplies.
- Give the other FADD unit vector integer adds.
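The +50% figure for extra register-file ports can be checked with simple arithmetic. A minimal sketch under the post's model: two FMA-capable FMUL pipes, two FADD pipes, and each FMA borrowing a read port from the FADD side. The function name and the one-borrowed-port-per-FMA rule are illustrative assumptions, not AMD documentation.

```python
LANES_DP = 256 // 64  # 4 double-precision lanes per 256-bit AVX2 vector


def peak_dp_flops_per_cycle(fma_pipes: int = 2, fadd_pipes: int = 2,
                            fma_borrows_fadd_port: bool = True) -> int:
    """Peak double-precision FLOPs per core per cycle in a simplified FP model.

    Assumptions (following the post, not an official description):
    - each FMA pipe does multiply + add = 2 FLOPs per lane;
    - each FADD pipe does 1 FLOP per lane;
    - if an FMA needs a register-file read port borrowed from an FADD pipe,
      that FADD pipe cannot issue in the same cycle.
    """
    lent_ports = fma_pipes if fma_borrows_fadd_port else 0
    free_fadd = max(fadd_pipes - lent_ports, 0)
    return (2 * fma_pipes + free_fadd) * LANES_DP


shared = peak_dp_flops_per_cycle(fma_borrows_fadd_port=True)        # (2*2 + 0) * 4 = 16
extra_ports = peak_dp_flops_per_cycle(fma_borrows_fadd_port=False)  # (2*2 + 2) * 4 = 24
print(shared, extra_ports, extra_ports / shared)  # 16 24 1.5
```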

From the ZEN 2 floorplan, very little extra area seems to be required for this.

[Image: Zen 2 floorplan]


Thanks to Jeff Smith for the port info: https://twitter.com/JeffSmith888
Thanks to Agner Fog for the instruction tables: https://www.agner.org/optimize/instruction_tables.pdf
 

JoeRambo

Golden Member
Jun 13, 2013

I think they took the simplest route and made the FP pipes more symmetrical. There's no need to add more pipes, which need more (register) ports and scheduler resources, when you can add some transistors to the existing pipes.
They could have gone with:

- 2 FMUL/FADD ports, giving FMA without the complexity of sharing the FADD pipe on a different port
- 1 FMUL port
- 1 FADD port

This would be a straightforward design with the most flexibility: straight MULs and ADDs at 3/0.33 (latency/reciprocal throughput), with 4 different ops per cycle, and it can execute 2 FMAs at 4/0.5 plus an additional ADD or MUL (or both, if the register ports allow?).

It would give AMD a strong boost where it matters and prepare things for the introduction of AVX-512 in the future.
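Reading the throughput numbers in the proposed layout as Agner Fog-style latency/reciprocal throughput (an interpretation, not something the post spells out), the reciprocal throughput falls straight out of how many pipes accept each op. A minimal sketch; the pipe table and function names are hypothetical, and latency is not modeled.

```python
from fractions import Fraction

# Hypothetical layout from the post: the set of ops each FP pipe accepts.
PIPES = [
    {"FMUL", "FADD", "FMA"},  # combined FMUL/FADD pipe 1 (FMA-capable)
    {"FMUL", "FADD", "FMA"},  # combined FMUL/FADD pipe 2 (FMA-capable)
    {"FMUL"},                 # multiply-only pipe
    {"FADD"},                 # add-only pipe
]


def ops_per_cycle(op: str, pipes=PIPES) -> int:
    """Peak issue rate for a stream of one op type = number of pipes accepting it."""
    return sum(op in pipe for pipe in pipes)


def reciprocal_throughput(op: str, pipes=PIPES) -> Fraction:
    """Cycles per instruction at steady state (the second number in '3/0.33')."""
    return Fraction(1, ops_per_cycle(op, pipes))


for op in ("FMUL", "FADD", "FMA"):
    print(op, ops_per_cycle(op), round(float(reciprocal_throughput(op)), 2))
```

With both combined pipes busy on FMAs, the multiply-only and add-only pipes are still free, which is where the extra ADD or MUL (or both) comes from, provided the register file has enough read ports.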
 