Discussion Zen 5 Architecture & Technical discussion

lopri

Elite Member
Jul 27, 2002
13,254
628
126
Why is this thing so bad at Super Pi? With the 7700X I got 6 seconds for 1M, but with the 9700X I can't get it under 7 seconds.
 

lopri

Elite Member
Jul 27, 2002
13,254
628
126
Super Pi is purely single-threaded. The power limit has no impact until you go way lower.
 

Det0x

Golden Member
Sep 11, 2014
1,307
4,284
136
Ok, maybe I wasn't clear enough. The CCD to IOD interface limits you to 64GB/s, while a 6000MT/s DDR5 setup provides a theoretical 96GB/s. Since CCD to IOD bandwidth is the limiting factor here, it doesn't matter how fast your DRAM is if you saturate the CCD to IOD link first [probably better to have it a bit higher for various controller-related overheads].

AVX512 would love to use the bandwidth but it won't be able to.
Not correct, a single-CCD Zen 4 scales a little with memory speed even in 2:1 8000MT/s vs 1:1 6600MT/s.
My own results with Clam cache/mem benchmark:

Latency ranking:
  1. SR 2x16gigs @ 6600MT/s 1:1 mode = 68.75 ns
  2. DR 2x32gigs @ 6600MT/s 1:1 mode = 70.17 ns
  3. SR 2x16gigs @ 8000MT/s 2:1 mode = 70.24 ns
  4. DR 2x32gigs @ 8000MT/s 2:1 mode = 71.84 ns

Bandwidth read-modify-write (ADD) ranking:
  1. SR 2x16gigs @ 8000MT/s 2:1 mode = 97.11 GB/s
  2. DR 2x32gigs @ 8000MT/s 2:1 mode = 92.87 GB/s
  3. SR 2x16gigs @ 6600MT/s 1:1 mode = 91.23 GB/s
  4. DR 2x32gigs @ 6600MT/s 1:1 mode = 87.34 GB/s
A few comments, in random order, on my findings above :)

A single 8-core Zen 4 CCD can take advantage of the higher bandwidth afforded by 2:1 mode vs 1:1 mode, even if the common misconception on many forums is that there is no benefit because people can hardly see any difference in the gimmicky AIDA64 memory bench. (It's also easy to double-check this in other benchmarks such as y-cruncher / GB3 membench, which will show the same thing.)
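For anyone wondering what these read-modify-write (ADD) numbers actually measure, here is a rough Rust sketch of the idea, not Clam's actual benchmark; the buffer size, thread count and pass count are arbitrary placeholders:

Code:
use std::time::Instant;

fn main() {
    // Every thread streams a[i] += 1.0 over its own chunk of one large buffer.
    // Each element counts as 16 bytes of traffic (8 read + 8 written).
    const THREADS: usize = 8;
    const ELEMS: usize = 128 * 1024 * 1024; // 1 GiB of f64, well past the caches
    const PASSES: usize = 4;

    let mut buf = vec![1.0f64; ELEMS];
    let chunk = ELEMS / THREADS;

    let start = Instant::now();
    std::thread::scope(|s| {
        for part in buf.chunks_mut(chunk) {
            s.spawn(move || {
                for _ in 0..PASSES {
                    for x in part.iter_mut() {
                        *x += 1.0;
                    }
                }
            });
        }
    }); // the scope joins all worker threads before returning
    let secs = start.elapsed().as_secs_f64();

    let bytes = (ELEMS * PASSES * 16) as f64;
    println!("ADD bandwidth: {:.2} GB/s (checksum {})", bytes / secs / 1e9, buf[0]);
}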

The next question would naturally be what the "best memory setup" is: 1:1 mode with its lower latency, or 2:1 with its higher bandwidth. There is no easy answer for this, as it all depends on which benchmark/game you are comparing the numbers in. Some will prefer latency while others prefer bandwidth, so you just have to check on an individual basis. :eek:

But what I can say is that higher memory speed is pretty much always better, be it in 1:1 mode or 2:1 mode... From time to time I see some people limit themselves to something like 6000/6200MT/s because they think it's faster in games than, say, 6400MT/s for some reason (?)

My next observation is that I did not find any bandwidth benefit from "dual rank" (quad) in the Clam cache/mem benchmark, but Karhu is seemingly showing higher MB/s. I suspect this is because of the larger memory size tested, not increased bandwidth from DR. I will do some more DR Karhu runs where I limit the tested memory size to the same as SR and check if the numbers change. (y) Edit: it's also possible that the forced GDM with DR is eating up the bandwidth benefit compared to SR.

I have also seen some complaints about people having a hard time tuning memory on the 1.1.7.0 PatchA FireRangeP AGESA; I can only say that it is working pretty well for me on the ASUS GENE, even if I'm using a beta BIOS. But be warned, stabilizing DR 64 gigs @ 8000MT/s is still insanely hard; I think I spent about 5x the time on this profile compared to all the others combined... It's really on a razor's edge: ±5 mV on some rails and you can forget about 10k Karhu.

I have also saved all the pictures here as well, in case this forum goes loco again with the screenshots.
The same should be true for Zen 5, as they share the same memory system.
 
Last edited:
Jul 27, 2020
20,089
13,762
146
Super Pi is purely single-threaded. The power limit has no impact until you go way lower.
It's legacy code. If PiFast from BenchMate or the following multithreaded Rust program also shows Zen 5 losing to Zen 4, then yes, we can say: Houston, we have a problem.
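A minimal sketch of such a multithreaded pi computation (plain std::thread, midpoint-rule integration of 4/(1+x^2); the step and thread counts here are arbitrary, not tuned for benchmarking):

Code:
use std::thread;

fn main() {
    // Estimate pi by integrating 4/(1+x^2) over [0, 1] with the midpoint rule,
    // splitting the iterations across a fixed number of threads.
    const STEPS: u64 = 200_000_000;
    const THREADS: u64 = 8;
    let step = 1.0 / STEPS as f64;

    let handles: Vec<_> = (0..THREADS)
        .map(|t| {
            thread::spawn(move || {
                let mut local = 0.0f64;
                let mut i = t;
                while i < STEPS {
                    let x = (i as f64 + 0.5) * step;
                    local += 4.0 / (1.0 + x * x);
                    i += THREADS; // strided partitioning of the index range
                }
                local
            })
        })
        .collect();

    let pi: f64 = handles.into_iter().map(|h| h.join().unwrap()).sum::<f64>() * step;
    println!("pi ~= {:.12}", pi);
}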

 

Nothingness

Diamond Member
Jul 3, 2013
3,104
2,104
136
It's legacy code. If PiFast from BenchMate or the following multithreaded Rust program also shows Zen 5 losing to Zen 4, then yes, we can say: Houston, we have a problem.
There are two sides to this: some people run some random obsolete benchmark, get odd results and draw conclusions (not saying anyone is doing the latter here); OTOH legacy code has to run fast enough.

But at this point I'm not sure what the point of running that obsolete, unmaintained PiFast is, especially if it hasn't been characterized (what SIMD extensions does it use? Is it memory bottlenecked?).
 

Det0x

Golden Member
Sep 11, 2014
1,307
4,284
136
In all its beauty 😘
[attached image]

But on a more serious note, guys, watch out with the direct die frame v2. Even if TG says it supports Zen 5, it's not without problems...

Long story short, I was getting a pretty bad temperature spread across the cores after the delid.
[attached image]

7 remounts later I found the problem (yes, this took hours): the frame had been pressing down on the glue on each side of the CCDs.
[attached image]

The temperature spread @ 310W PPT after the fix is looking much better :)
[attached image]

[attached image]
 
Last edited:

MS_AT

Senior member
Jul 15, 2024
261
594
96
Not correct, a single-CCD Zen 4 scales a little with memory speed even in 2:1 8000MT/s vs 1:1 6600MT/s.
My own results with Clam cache/mem benchmark:

Latency ranking:
  1. SR 2x16gigs @ 6600MT/s 1:1 mode = 68.75 ns
  2. DR 2x32gigs @ 6600MT/s 1:1 mode = 70.17 ns
  3. SR 2x16gigs @ 8000MT/s 2:1 mode = 70.24 ns
  4. DR 2x32gigs @ 8000MT/s 2:1 mode = 71.84 ns

Bandwidth read-modify-write (ADD) ranking:
  1. SR 2x16gigs @ 8000MT/s 2:1 mode = 97.11 GB/s
  2. DR 2x32gigs @ 8000MT/s 2:1 mode = 92.87 GB/s
  3. SR 2x16gigs @ 6600MT/s 1:1 mode = 91.23 GB/s
  4. DR 2x32gigs @ 6600MT/s 1:1 mode = 87.34 GB/s

The same should be true for Zen 5, as they share the same memory system.
I could have been more precise. So, just to clear up the first point: I was talking about bandwidth only, not latency.

The part that I ignored is the fact that the CCD to IOD connection is 32B/16B per cycle for read and write respectively (based on one of the earlier C&C investigations), with both lanes, so to speak, usable at the same time, which gives you a higher bandwidth limit for a test that mixes reads and writes. A pure read should show bandwidth closer to 32B x IF clock. Unless I have missed something in my analysis.
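As a back-of-the-envelope check, assuming a typical 2000 MHz FCLK (an assumption; the exact clock depends on the setup), the ceilings work out like this:

Code:
fn main() {
    // Theoretical per-CCD link ceilings, assuming FCLK = 2000 MHz (not measured values).
    let fclk_hz = 2.0e9;
    let read_bw = 32.0 * fclk_hz / 1e9;  // 32 B/cycle read lane  -> 64 GB/s
    let write_bw = 16.0 * fclk_hz / 1e9; // 16 B/cycle write lane -> 32 GB/s
    let mixed_bw = read_bw + write_bw;   // both lanes busy       -> 96 GB/s

    // Dual-channel DDR5-6000: 6000 MT/s * 8 B per channel * 2 channels.
    let dram_bw = 6000.0e6 * 8.0 * 2.0 / 1e9; // -> 96 GB/s theoretical

    println!("CCD read-only ceiling:  {:.0} GB/s", read_bw);
    println!("CCD read+write ceiling: {:.0} GB/s", mixed_bw);
    println!("DDR5-6000 theoretical:  {:.0} GB/s", dram_bw);
}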
 

yuri69

Senior member
Jul 16, 2013
545
979
136
Reading the Lion Cove/Skymont analysis at David Huang's Blog, there are interesting comparisons to Zen 5.

* Intel really went from 6-wide to 8-wide x86 decode this gen, while AMD apparently sticks to 4-wide
* Skymont internal structure sizing is dangerously close to Zen 5 (except FP-related)
* Lion Cove vs Zen 5 SPEC2017 INT scores achieved at 4.2GHz are very close
 

coercitiv

Diamond Member
Jan 24, 2014
6,693
14,367
136
Reading the Lion Cove/Skymont analysis at David Huang's Blog, there are interesting comparisons to Zen 5.

* Intel really went from 6-wide to 8-wide x86 decode this gen, while AMD apparently sticks to 4-wide
* Skymont internal structure sizing is dangerously close to Zen 5 (except FP-related)
* Lion Cove vs Zen 5 SPEC2017 INT scores achieved at 4.2GHz are very close
TL;DR - cores dangerously close to each other, collisions expected.

The one thing that does not sit right with me is the efficiency of the cores in the dense cluster; lower efficiency than the vanilla cores is weird.
[attached image]
 

Abwx

Lifer
Apr 2, 2011
11,560
4,358
136
* Skymont internal structure sizing is dangerously close to Zen 5 (except FP-related)

20/30% better perf/clock in INT/FP for Zen 5c vs SKT in GB6 ST, so that is only apparently close; in the real world they are far apart, about two generations apart. And to think that we had people here expecting SKT to match or even beat Zen 4; I once said that it was at Zen 3 level at best.
 

Saylick

Diamond Member
Sep 10, 2012
3,554
7,921
136
TL;DR - cores dangerously close to each other, collisions expected.

The one thing that does not sit right with me is the efficiency of the cores in the dense cluster; lower efficiency than the vanilla cores is weird.
[attached image]
They are called dense cores instead of efficient cores for a reason, although it is weird that they aren’t as efficient even though AMD touted a perf/W gain, e.g.
[attached image]
 

coercitiv

Diamond Member
Jan 24, 2014
6,693
14,367
136
They are called dense cores instead of efficient cores for a reason
AMD themselves showed Zen 4c improving efficiency in low-power scenarios in Phoenix 2. Also note the wording: "better optimized for NT efficiency, and size". The idea was to gain the density jump while also preserving or preferably improving efficiency. That being said, David Huang's package power readings might not be enough to tell the whole story here, but for what it's worth, they show a regression.

[attached image]