Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

H433x0n · Sep 29, 2023

Markfw said:
This is more of a comment, but also a little bit of a question. And yes I know about OEMs and idiot IT managers, BUT...

With AMD as king in performance and perf/watt and perf/$$ for at least 4 years, and with Genoa, Genoa-X and Bergamo crushing anything Intel has, or is going to release soon, how is it that they STILL can't get more market share? How long can Intel's name keep them selling their crap server parts this far down the line? This many years with crap?

Edit: not to mention Zen 5 and Turin.....

I think that's easy to see. If I was in charge of purchasing & configuring server racks for a major company that has used Intel for the past 20 years, would I risk my salary for the company to get better TCO? Even if I knew that the AMD product would be better in every way and just as reliable - I would still be fearful of getting blamed for anything that went wrong in the migration. Combine this with better availability for Intel processors and their willingness to accept much lower margins and it's easy to see people taking the path of least resistance.

Geddagod · Sep 29, 2023

TESKATLIPOKA said:
Who tested it?

Don't remember tbh. One of those Chinese reviewers that actually test perf/watt, unlike their western counterparts. Just google it lol.

TESKATLIPOKA · Sep 29, 2023

Geddagod said:
Don't remember tbh. One of those Chinese reviewers that actually test perf/watt, unlike their western counterparts. Just google it lol.

Link

Looks very bad. I have to wonder if it's really correct.

Markfw · Sep 29, 2023

Geddagod said:
GNR?

What is GNR ? what core ? how many ? what power ?

the 144/288 e-cores CPU they are coming out with is DOA IMO. e-cores vs less cache but full power Zen 4 cores ? Like 2 to 1. Is GNR something else ?

H433x0n · Sep 29, 2023

Markfw said:
What is GNR ? what core ? how many ? what power ?

Redwood Cove, ~120 cores, 500W.

Geddagod · Sep 29, 2023

Markfw said:
What is GNR ? what core ? how many ? what power ?

Granite Rapids? ~120 cores IIRC, power same as Turin, 500 watts.

Markfw said:
the 144/288 e-cores CPU they are coming out with is DOA IMO. e-cores vs less cache but full power Zen 4 cores ? Like 2 to 1.

People love comparing Bergamo and Sierra Forest in total MT performance as if people are really going to use it for that. What matters is how efficient they are per core or per core cluster at any given clock.

Markfw · Sep 29, 2023

H433x0n said:
Redwood Cove, 120 cores (could be slightly more depending how much is disabled per tile) 500W.

Doesn't Bergamo beat that ? at WAY less power usage ?

I just saw the post above this, but Turin at 500 watts is most likely WAY faster. And I will wait for reviews on Sierra Forest as to how fast and efficient it is.

H433x0n · Sep 29, 2023

Markfw said:
Doesn't Bergamo beat that ? at WAY less power usage ?

I just saw the post above this, but Turin at 500 watts is most likely WAY faster. And I will wait for reviews on Sierra Forest as to how fast and efficient it is.

To be honest, I don't know. I don't expect GNR to outperform Turin outside of AMX accelerated tasks.

Markfw · Sep 29, 2023

H433x0n said:
To be honest, I don't know. I don't expect GNR to outperform Turin outside of AMX accelerated tasks.

And then of course, does GNR support AVX-512?

Edit: I found this

"AVX10.1 is just Granite Rapids' AVX-512 renamed.
Granite Rapids has a few extra AVX-512 instructions (including those added by Tiger Lake, but omitted in Sapphire Rapids and Alder Lake), so Sapphire Rapids does not support all of AVX10.1. Therefore neither Sapphire Rapids nor Emerald Rapids may turn on the AVX10 CPUID bits.

Nevertheless, the differences between the AVX-512 instruction sets of Granite Rapids and Sapphire Rapids are small and of little importance."

adroc_thurston · Sep 29, 2023

Markfw said:
What is GNR ?

Granite Rapids.

Geddagod said:
People love comparing Bergamo and Sierra Forest in total MT performance as if people are really going to use it for that

Well yea, they're at practically the same socket power (360 vs 350W).

HurleyBird · Sep 29, 2023

adroc_thurston said:
It just doesn't scale to 64c*2p systems.

So? Perf/clock is basically agnostic to thread count scaling. If Zen5 relies so much less on SMT for its MT performance, and Cinebench scales poorly to 256 threads, then this benchmark should be a massive outlier in favor of the Zen5 system, not a massive outlier against it.

adroc_thurston said:
Yeah it would but really depends on the workload.

No, in the scenario where SMT yield is lowered because those resources are now better utilized without SMT, it wouldn't. That would only help MT because of Amdahl's law.

The only scenario where MT should decrease is when SMT yield is lowered for reasons other than increased resource utilization on the primary thread.

adroc_thurston · Sep 29, 2023

HurleyBird said:
then this benchmark should be a massive outlier in favor of the Zen5 system, not a massive outlier against it.

No, it'll load as many threads as it could and cap out the perf.
It's a bad bench for high-CC systems.

HurleyBird said:
That would only help MT because of Amdahl's law.

We're not talking a single nT app running on whole socket.
They're separate things instanced a zillion times over.

naukkis · Sep 29, 2023

TESKATLIPOKA said:
Link

Looks very bad. I have to wonder if it's really correct.

It's a freq/voltage curve, not perf/W. A dense core has lower static and dynamic power, so it's more efficient at a given voltage. The Zen 4 / Zen 4c efficiency switch point is somewhere around 3GHz, and a quick calculation suggests that the dense core's combined dynamic/static capacitance at 3GHz is about 2/3 of the regular core's.

HurleyBird · Sep 29, 2023

adroc_thurston said:
No, it'll load as many threads as it could and cap out the perf.

This is exactly what I'm getting at, so I'm not sure where the confusion is. Threads 0-127 are disproportionately faster on Zen5, while threads 128-255 are probably faster on Zen4 (assuming your claim of ~1/3rd the SMT yield on Z5 vs. Z4). So what happens when an application doesn't scale well to 256T? To the extent it doesn't, Zen5 looks disproportionately better against Zen4 than it would otherwise. But that's not what we're seeing in the Cinebench leak. We appear to be seeing the *opposite of that*, and that doesn't square up unless there's some other monkey wrench in the gears.

adroc_thurston said:
We're not talking a single nT app running on whole socket.
They're separate things instanced a zillion times over.

And are all those instances single threaded?

adroc_thurston · Sep 29, 2023

HurleyBird said:
So what happens when an application doesn't scale well to 256T

It loads the threads sequentially until it hits the socket cap.

HurleyBird said:
And are all those instances single threaded?

Yes, just read SPECcpu 2017 documentation.

Goop_reformed · Sep 29, 2023

Say what you want about MLID, the guy does have good info. So Zen 5 has been in the making since 2018 and the IPC uplift from Zen 4 is only 10-15%? These slides have to be either sandbagging or anti-leaker material.

Also, MLID outright dismissed the 30% IPC uplift claim. If he knew the actual number, and not from 10% - 15% +, he wouldn't have outright said that. Just my 2 cents. Still hoping for the magical numbers, fingers crossed!

Geddagod · Sep 29, 2023

Markfw said:
Doesn't Bergamo beat that ? at WAY less power usage ?

I mean I just posted this, but no one is using Bergamo for just raw MT perf. Oh sure, technically, Bergamo even beats Genoa in raw MT perf, but again, even AMD is calling these cloud processors, not HPC processors. The reason why is pretty obvious.

Markfw said:
I just saw the post above this, but Turin at 500 watts is most likely WAY faster.

Well, let's do some rough, simplistic math based on rumors.
Choosing average render tests since that's where Intel loses hardest from what I've seen:

SPR baseline: 1.00
Genoa baseline: 1.57
EMR guesswork: 1.10 (6% cores, better binning, less MCM)
Turin guesswork v0.5: 2.61 (33% cores, 25% IPC)
Granite Rapids guesswork: 2.20 (x2 over EMR from Intel internal documents)

There is an additional wildcard for Turin: power. It gets 25% higher power than Genoa, but this shouldn't translate perfectly into performance. It has more cores to feed, more CCDs requiring more power, and also fatter cores that require more power as well, unless AMD pulls a rabbit out of their hat and gets another increase or equalization in frequency iso power despite the fatter core. Also, 25% more power doesn't equate to 25% higher clocks, but let's just, very optimistically, assume it does.
So let's optimistically say the 25% does fully translate over (it prob won't):
Turin guesswork v1: 3.26 (33% cores, 25% IPC, 25% power)

In which case we get Turin being 48% more performant than GNR, which doesn't shrink the gap between how much SPR loses to Genoa... but it drastically improves the efficiency gap since both of these CPUs will be at 500 watts. Me personally? I think the gap is going to be closer to ~30%, but we will see.
(final disclaimer: I know this is a massively oversimplified projection based on leaks, not including stuff like memory bandwidth or SMT vs 1T, etc., but this was just for fun anyway)
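
For anyone who wants to poke at the guesswork, here is a minimal Python sketch of the same arithmetic. Every input is a rumor-based figure taken from the list above; the `scale` helper and the variable names are made up for illustration, and the model simply multiplies the gains together, which is exactly the oversimplification the disclaimer mentions.

```python
# Relative render-test scores, normalized to Sapphire Rapids (SPR) = 1.00.
# All inputs are the rumor-based guesses from the post, not measurements.
SPR = 1.00
GENOA = 1.57
EMR = 1.10


def scale(baseline, *gains):
    """Apply fractional gains multiplicatively (0.25 means +25%)."""
    result = baseline
    for gain in gains:
        result *= 1.0 + gain
    return result


# Turin v0.5: +33% cores and +25% IPC over Genoa.
turin_v05 = scale(GENOA, 0.33, 0.25)

# Turin v1: same, plus the optimistic assumption that +25% power
# translates fully into +25% performance.
turin_v1 = scale(GENOA, 0.33, 0.25, 0.25)

# Granite Rapids: x2 over EMR, i.e. a +100% gain.
gnr = scale(EMR, 1.00)

# Turin's lead over GNR as a fraction (0.48 means 48% faster).
turin_lead = turin_v1 / gnr - 1.0

print(f"Turin v0.5: {turin_v05:.2f}")
print(f"Turin v1:   {turin_v1:.2f}")
print(f"GNR:        {gnr:.2f}")
print(f"Turin lead over GNR: {turin_lead:.0%}")
```

Dropping the power term (comparing `turin_v05` against `gnr`) gives a much smaller lead, which shows how much the headline number depends on that one optimistic assumption.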

Geddagod · Sep 29, 2023

naukkis said:
It's a freq/voltage curve, not perf/W. A dense core has lower static and dynamic power, so it's more efficient at a given voltage. The Zen 4 / Zen 4c efficiency switch point is somewhere around 3GHz, and a quick calculation suggests that the dense core's combined dynamic/static capacitance at 3GHz is about 2/3 of the regular core's.

We have straight up power curves too, finally decided to stop being lazy and just search it up lol

Markfw · Sep 29, 2023

Geddagod said:
I mean I just posted this, but no one is using Bergamo for just raw MT perf. Oh sure, technically, Bergamo even beats Genoa in raw MT perf, but again, even AMD is calling these cloud processors, not HPC processors. The reason why is pretty obvious.

Geddagod said:
<snip>

In which case we get Turin being 48% more performant than GNR, which doesn't shrink the gap between how much SPR loses to Genoa... but it drastically improves the efficiency gap since both of these CPUs will be at 500 watts. Me personally? I think the gap is going to be closer to ~30%, but we will see.
(final disclaimer: I know this is a massively oversimplified projection based on leaks, not including stuff like memory bandwidth or SMT vs 1T, etc., but this was just for fun anyway)

I think this is basically what I am saying. AMD rules servers in every respect today, and for the foreseeable future. I saw the post about the IT manager thinking "I will never get fired for buying Intel", but I am so sick of that. I cannot believe that after more than 5 years, and with power getting more and more expensive, they won't rethink. They could get a raise if they told their manager "look, this is 30% more efficient on power and is faster". Not to mention that in a data center (the exact amount is arguable) 1 watt saved = 2-3 watts saved due to AC and to APS support.

Geddagod · Sep 29, 2023

Markfw said:
I think this is basically what I am saying. AMD rules servers in every respect today, and for the foreseeable future. I saw the post about the IT manager thinking "I will never get fired for buying Intel", but I am so sick of that. I cannot believe that after more than 5 years, and with power getting more and more expensive, they won't rethink. They could get a raise if they told their manager "look, this is 30% more efficient on power and is faster". Not to mention that in a data center (the exact amount is arguable) 1 watt saved = 2-3 watts saved due to AC and to APS support.

Intel's just going to continue selling stuff dirt cheap. Gonna be fun to see how they manage to do that with insanely expensive GNR, that prob uses more silicon than Turin while also using a "newer" node.

HurleyBird · Sep 29, 2023

adroc_thurston said:
It loads the threads sequentially until it hits the socket cap.

Which isn't relevant. The worse that additional threads perform, the better the architecture that relies less on SMT for its total throughput should come off.

adroc_thurston said:
Yes, just read SPECcpu 2017 documentation.

So now we're just talking about SPEC? I thought we were talking general enterprise/server performance?

randomhero · Sep 29, 2023

adroc_thurston said:
Wow you've almost nailed Turin perf.
Congrats.

What about x(3d) factor? That is big unknown.

And thank you!

adroc_thurston · Sep 29, 2023

HurleyBird said:
Which isn't relevant. The worse that additional threads perform, the better the architecture that relies less on SMT for its total throughput should come off.

Again, cinememe can't scale well on server CPUs since it runs out of tiles.

HurleyBird said:
So now we're just talking about SPEC

Socket perf projection numbers from every vendor out there tend to be SIR2017 rate n.

HurleyBird · Sep 29, 2023

adroc_thurston said:
Again, cinememe can't scale well on server CPUs since it runs out of tiles.

Again, that's the exact scenario that will disproportionately benefit the architecture that gains a massive non-SMT uplift at the expense of SMT yield, except that's not what the leak is showing.

At this point, I think you might be trolling me because based on your posting history (this is a compliment) I don't see how this could possibly be going over your head for so long now (this isn't). You aren't engaging. You're just repeating yourself despite the fact that the thing you're repeating goes against your claim.

One more time: If the non-SMT threads (which are assigned first) have a massive uplift while SMT threads (which are assigned later) actually regress, and you're talking about an application with a semi-hard physical limit to thread scaling due to additional threads not being able to perform any meaningful work after a point, when comparing two CPUs that go past that point we're talking about a task that disproportionately benefits the CPU that relies less on SMT.

Therefore if the uplift of 256T Z5 vs 256T Z4 in Cinebench is only 15%, then you should expect a much lower uplift in consumer parts. And that just doesn't sound realistic.
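
The argument above can be sketched as a toy model in Python. Everything in it is hypothetical: the 1T performance and SMT yield figures are placeholders rather than leaked or measured values, the only constraint taken from the thread is that the Zen 5 SMT yield is about a third of the Zen 4 one, and the model assumes threads land one per physical core before any SMT sibling is used. It does not model tile-based loading or any other scheduling order.

```python
def throughput(threads, cores, perf_1t, smt_yield):
    """Total throughput when threads fill physical cores first, then SMT siblings.

    perf_1t   : throughput of one thread running alone on a core
    smt_yield : extra throughput a core gains from its second thread,
                as a fraction of perf_1t
    """
    primary = min(threads, cores)
    siblings = min(max(threads - cores, 0), cores)
    return primary * perf_1t + siblings * perf_1t * smt_yield


CORES = 128  # 2 sockets x 64 cores, 256 hardware threads

# Hypothetical placeholder figures, chosen only to illustrate the shape.
Z4 = {"perf_1t": 1.00, "smt_yield": 0.30}
Z5 = {"perf_1t": 1.30, "smt_yield": 0.10}  # about 1/3 of the Z4 SMT yield


def uplift(threads):
    """Fractional Z5-over-Z4 uplift when the app can use `threads` threads."""
    z5 = throughput(threads, CORES, **Z5)
    z4 = throughput(threads, CORES, **Z4)
    return z5 / z4 - 1.0


for useful_threads in (128, 192, 256):
    print(f"{useful_threads:3d} useful threads: Z5 uplift {uplift(useful_threads):.1%}")
```

Under these assumptions the uplift is largest when the application stops scaling at or below the physical core count and shrinks as more SMT siblings do useful work, which is the direction of the claim being made here. A different thread placement order would change the result.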

adroc_thurston · Sep 29, 2023

HurleyBird said:
Again, that's the exact scenario that will disproportionately benefit the architecture that gains a massive non-SMT uplift at the expense of SMT yield, except that's not what the leak is showing.

Only if it loads in 1c/1t increments (it doesn't; tiles are loaded sequentially and contained to one socket).
Otherwise what you win in 1t IPC you lose in nT scaling.
See Milan (was a 0 to net negative nT gain ISO wattage for silly parallel stuff like cinememe).
