Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 154

H433x0n

Golden Member
Mar 15, 2023
This is more of a comment, but also a little bit of a question. And yes, I know about OEMs and idiot IT managers, BUT...

With AMD as king in performance, perf/watt, and perf/$$ for at least 4 years, and with Genoa, Genoa-X, and Bergamo crushing anything Intel has or is going to release soon, how is it that they STILL can't get more market share? How long can Intel's name keep them selling their crap server parts this far down the line? This many years with crap?

Edit: not to mention Zen 5 and Turin.....
I think that's easy to see. If I were in charge of purchasing and configuring server racks for a major company that has used Intel for the past 20 years, would I risk my salary for the company to get better TCO? Even if I knew that the AMD product would be better in every way and just as reliable, I would still be fearful of getting blamed for anything that went wrong in the migration. Combine this with better availability for Intel processors and their willingness to accept much lower margins, and it's easy to see people taking the path of least resistance.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
Don't remember tbh. One of those Chinese reviewers that actually test perf/watt, unlike their Western counterparts. Just google it lol.
[Linked chart image]

Looks very bad. I have to wonder if it's really correct.
 

Geddagod

Golden Member
Dec 28, 2021
What is GNR? What core? How many? What power?
Granite Rapids? ~120 cores IIRC, power same as Turin, 500 watts.
The 144/288 E-core CPU they are coming out with is DOA IMO. E-cores vs. Zen 4 cores with less cache but full power? Like 2 to 1.
People love comparing Bergamo and Sierra Forest in total MT performance as if people are really going to use them for that. What matters is how efficient they are per core or per core cluster at any given clock.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Redwood Cove, 120 cores (could be slightly more depending on how much is disabled per tile), 500 W.
Doesn't Bergamo beat that at WAY less power usage?

I just saw the post above this, but Turin at 500 watts is most likely WAY faster. And I will wait for reviews on Sierra Forest to see how fast and efficient it is.
 

H433x0n

Golden Member
Mar 15, 2023
Doesn't Bergamo beat that at WAY less power usage?

I just saw the post above this, but Turin at 500 watts is most likely WAY faster. And I will wait for reviews on Sierra Forest to see how fast and efficient it is.
To be honest, I don't know. I don't expect GNR to outperform Turin outside of AMX accelerated tasks.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
To be honest, I don't know. I don't expect GNR to outperform Turin outside of AMX accelerated tasks.
And then of course, does GNR support AVX-512?

Edit: I found this

"AVX10.1 is just Granite Rapids' AVX-512 renamed.
Granite Rapids has a few extra AVX-512 instructions (including those added by Tiger Lake, but omitted in Sapphire Rapids and Alder Lake), so Sapphire Rapids does not support all of AVX10.1. Therefore neither Sapphire Rapids nor Emerald Rapids may turn on the AVX10 CPUID bits.

Nevertheless, the differences between the AVX-512 instruction sets of Granite Rapids and Sapphire Rapids are small and of little importance."
 

HurleyBird

Platinum Member
Apr 22, 2003
It just doesn't scale to 64c*2p systems.

So? Perf/clock is basically agnostic to thread count scaling. If Zen5 relies so much less on SMT for its MT performance, and Cinebench scales poorly to 256 threads, then this benchmark should be a massive outlier in favor of the Zen5 system, not a massive outlier against it.

Yeah, it would, but it really depends on the workload.

No, in the scenario where SMT yield is lowered because those resources are now better utilized without SMT, it wouldn't. That would only help MT because of Amdahl's law.

The only scenario where MT should decrease is when SMT yield is lowered for reasons other than increased resource utilization on the primary thread.
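HurleyBird's appeal to Amdahl's law can be made concrete with a toy model. This is an illustrative sketch, not anything from the thread: it assumes a job with a serial fraction that runs on one thread plus a parallel part spread over the whole chip, and the per-thread speeds and the aggregate throughput (128 cores × 1.30 for a hypothetical 30% SMT yield) are made-up numbers. Both hypothetical chips get the same total parallel throughput; one gets there with strong SMT, the other with a faster primary thread and weaker SMT.

```python
def runtime(serial_fraction, per_thread_speed, parallel_throughput):
    """Amdahl-style runtime for a job that takes 1.0 time unit on one baseline thread.

    serial_fraction: share of the work that only one thread can execute.
    per_thread_speed: single-thread speed relative to the baseline thread.
    parallel_throughput: aggregate throughput of the whole chip on the parallel
        part, in baseline-thread units (any SMT yield already folded in).
    """
    serial_time = serial_fraction / per_thread_speed
    parallel_time = (1.0 - serial_fraction) / parallel_throughput
    return serial_time + parallel_time


# Hypothetical chips with the SAME total parallel throughput (128 cores x 1.30):
# one via a baseline thread plus strong SMT, one via a 25% faster thread plus weak SMT.
TOTAL = 128 * 1.30
for serial in (0.0, 0.001, 0.01, 0.05):
    smt_heavy = runtime(serial, per_thread_speed=1.00, parallel_throughput=TOTAL)
    fast_thread = runtime(serial, per_thread_speed=1.25, parallel_throughput=TOTAL)
    print(f"serial {serial:5.3f}: fast-thread chip is {smt_heavy / fast_thread:.3f}x faster")
```

With no serial part the two chips tie; any serial fraction tips the result toward the faster thread. For a throughput workload made of many independent single-threaded instances (adroc_thurston's reply), this per-job serial term doesn't apply in the same way, which is where the two posters part ways.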
 

adroc_thurston

Diamond Member
Jul 2, 2023
then this benchmark should be a massive outlier in favor of the Zen5 system, not a massive outlier against it.
No, it'll load as many threads as it can and cap out the perf.
It's a bad bench for high-core-count systems.
That would only help MT because of Amdahl's law.
We're not talking about a single nT app running on the whole socket.
They're separate things instanced a zillion times over.
 

naukkis

Golden Member
Jun 5, 2002
[Linked chart image]

Looks very bad. I have to wonder if it's really correct.

It's a freq/voltage curve, not perf/W. The dense core has smaller static and dynamic power terms, so it's more efficient at a given voltage. The Zen 4 / Zen 4c efficiency crossover point is somewhere around 3 GHz; a quick calculation from that approximates the dense core's combined dynamic/static capacitance at 3 GHz as about 2/3 that of the regular core.
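One plausible reading of that quick calculation uses the textbook CMOS switching-power relation P ≈ α·C·V²·f. The sketch below ignores leakage (static) power and takes naukkis's rough 2/3 capacitance ratio as an assumption; the 1.0 V and 3.0 GHz operating point is hypothetical. At equal voltage and clock the dense core burns 2/3 the switching power, and it only loses that edge once it needs about sqrt(3/2) ≈ 1.22x the regular core's voltage to hold the same clock, which is roughly what happens near the top of its frequency curve.

```python
import math


def switching_power(capacitance, voltage, frequency_ghz, activity=1.0):
    """CMOS dynamic (switching) power, P = activity * C * V^2 * f, in arbitrary units."""
    return activity * capacitance * voltage ** 2 * frequency_ghz


# Dense-core switched capacitance relative to the regular core (rough estimate, assumed).
CAP_RATIO = 2.0 / 3.0

# Same voltage and clock (hypothetical 1.0 V, 3.0 GHz): power scales with capacitance alone.
same_point = switching_power(CAP_RATIO, 1.0, 3.0) / switching_power(1.0, 1.0, 3.0)

# Break-even when CAP_RATIO * (V_dense / V_regular)^2 == 1 at the same clock.
break_even_voltage_ratio = math.sqrt(1.0 / CAP_RATIO)

print(f"power ratio at equal V and f: {same_point:.3f}")
print(f"break-even V_dense / V_regular: {break_even_voltage_ratio:.3f}")
```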
 

HurleyBird

Platinum Member
Apr 22, 2003
No, it'll load as many threads as it could and cap out the perf.

This is exactly what I'm getting at, so I'm not sure where the confusion is. Threads 0-127 are disproportionately faster on Zen5, while threads 128-255 are probably faster on Zen4 (assuming your claim of ~1/3rd the SMT yield on Z5 vs. Z4). So what happens when an application doesn't scale well to 256T? To the extent it doesn't, Zen5 looks disproportionately better against Zen4 than it would otherwise. But that's not what we're seeing in the Cinebench leak. We appear to be seeing the *opposite of that*, and that doesn't square up unless there's some other monkey wrench in the gears.
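The argument can be sketched as a toy model: the scheduler fills one thread per physical core first, then the SMT siblings, and the application stops gaining anything past some "useful thread" limit. The per-core figures are hypothetical: Zen 4 at 1.00 with a 30% SMT yield, Zen 5 at 1.25 with 10% (roughly the ~1/3 SMT yield mentioned above). The 128 cores follow the 2P × 64c systems discussed in the thread. The model deliberately leaves out adroc_thurston's point about how Cinebench loads tiles.

```python
def throughput(threads, cores, one_thread_perf, smt_gain):
    """Aggregate throughput when threads fill physical cores first, then SMT siblings.

    one_thread_perf: throughput of a core running a single thread.
    smt_gain: extra throughput a second (SMT) thread adds, as a fraction of one_thread_perf.
    """
    primary = min(threads, cores)
    secondary = min(max(threads - cores, 0), cores)
    return one_thread_perf * (primary + secondary * smt_gain)


CORES = 128  # 2 sockets x 64 cores = 256 hardware threads

# Hypothetical per-core figures, for illustration only.
ZEN4 = dict(one_thread_perf=1.00, smt_gain=0.30)
ZEN5 = dict(one_thread_perf=1.25, smt_gain=0.10)

for useful_threads in (128, 192, 256):
    z4 = throughput(useful_threads, CORES, **ZEN4)
    z5 = throughput(useful_threads, CORES, **ZEN5)
    print(f"{useful_threads:3d} useful threads: Zen 5 / Zen 4 = {z5 / z4:.3f}")
```

With these made-up numbers the Zen 5 lead shrinks from 1.25x when only 128 threads do useful work to about 1.06x when all 256 do, which is the direction HurleyBird expects: the worse an app scales past 128 threads, the better the SMT-light design should look.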

We're not talking about a single nT app running on the whole socket.
They're separate things instanced a zillion times over.

And are all those instances single threaded?
 

Goop_reformed

Senior member
Sep 23, 2023
Say what you want about MLID, the guy does have good info. So Zen 5 has been in the making since 2018 and the IPC uplift over Zen 4 is only 10-15%? These slides have to be either sandbagging or anti-leaker material.

Also, MLID outright dismissed the 30% IPC uplift claim. If he knew the actual number and it weren't somewhere in the 10-15%+ range, he wouldn't have said that outright. Just my 2 cents. Still hoping for the magical numbers, fingers crossed!
 

Geddagod

Golden Member
Dec 28, 2021
1,340
1,433
106
Doesn't Bergamo beat that ? at WAY less power usage ?
I mean, I just posted this, but no one is using Bergamo for just raw MT perf. Oh, sure, technically Bergamo even beats Genoa in raw MT perf, but again, even AMD is calling these cloud processors, not HPC processors. The reason why is pretty obvious.
I just saw the post above this, but Turin at 500 watts is most likely WAY faster.
Well, let's do some rough, simplistic math based on rumors.
Choosing the average of the rendering tests, since that's where Intel loses hardest from what I've seen:
[Chart: average rendering test results]
SPR baseline: 1.00
Genoa baseline: 1.57
EMR guesswork: 1.10 (+6% cores, better binning, less MCM)
Turin guesswork v0.5: 2.61 (+33% cores, +25% IPC)
Granite Rapids guesswork: 2.20 (2x over EMR, from Intel internal documents)
There is an additional wildcard for Turin: power. It gets 25% higher power than Genoa, but this shouldn't translate perfectly into performance for Turin. It has more cores to feed, more CCDs requiring more power, and also fatter cores that require more power as well, unless AMD pulls a rabbit out of its hat and gets another increase, or at least parity, in frequency at iso-power despite the fatter core. Also, 25% more power doesn't equate to 25% higher clocks, but let's just, again very optimistically, assume it does.
So let's optimistically say the 25% does fully translate over (it probably won't):
Turin guesswork v1: 3.26 (+33% cores, +25% IPC, +25% power)

In which case we get Turin being 48% more performant than GNR, which doesn't shrink the gap much compared to how much SPR loses to Genoa... but it drastically improves the efficiency gap, since both of these CPUs will be at 500 watts. Me personally? I think the gap is going to be closer to ~30%, but we will see.
(Final disclaimer: I know this is a massively oversimplified projection based on leaks, not including stuff like memory bandwidth or SMT vs. 1T, etc., but this was just for fun anyway :)
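The projection above is just a chain of multipliers. A short sketch reproducing it, using only the factors quoted in the post (all of them rumor-based guesswork), makes the arithmetic easy to tweak:

```python
# Relative average rendering performance, SPR = 1.00 (factors from the post above).
SPR = 1.00
GENOA = 1.57
EMR = 1.10                       # guesswork: +6% cores, better binning, less MCM
TURIN_V05 = GENOA * 1.33 * 1.25  # +33% cores, +25% IPC
GNR = EMR * 2.0                  # 2x EMR, per the rumored Intel internal documents
TURIN_V1 = TURIN_V05 * 1.25      # optimistic: +25% power fully becomes +25% perf

print(f"Turin v0.5: {TURIN_V05:.2f}")
print(f"GNR:        {GNR:.2f}")
print(f"Turin v1:   {TURIN_V1:.2f}")
print(f"Genoa vs SPR:    +{(GENOA / SPR - 1) * 100:.0f}%")
print(f"Turin v1 vs GNR: +{(TURIN_V1 / GNR - 1) * 100:.0f}%")
```

Swapping in a smaller power factor (or dropping it, as in v0.5) shows how quickly the Turin lead shrinks toward the ~30% the poster actually expects.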
 

Geddagod

Golden Member
Dec 28, 2021
1,340
1,433
106
It's a freq/voltage curve, not perf/W. The dense core has smaller static and dynamic power terms, so it's more efficient at a given voltage. The Zen 4 / Zen 4c efficiency crossover point is somewhere around 3 GHz; a quick calculation from that approximates the dense core's combined dynamic/static capacitance at 3 GHz as about 2/3 that of the regular core.
We have straight-up power curves too; I finally decided to stop being lazy and just search it up lol
[Chart: power curves]
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,001
15,952
136
I mean, I just posted this, but no one is using Bergamo for just raw MT perf. Oh, sure, technically Bergamo even beats Genoa in raw MT perf, but again, even AMD is calling these cloud processors, not HPC processors. The reason why is pretty obvious.

<snip>

In which case we get Turin being 48% more performant than GNR, which doesn't shrink the gap much compared to how much SPR loses to Genoa... but it drastically improves the efficiency gap, since both of these CPUs will be at 500 watts. Me personally? I think the gap is going to be closer to ~30%, but we will see.
(Final disclaimer: I know this is a massively oversimplified projection based on leaks, not including stuff like memory bandwidth or SMT vs. 1T, etc., but this was just for fun anyway :)
I think this is basically what I am saying. AMD rules servers in every respect today, and for the foreseeable future. I saw the post about the IT manager thinking "I will never get fired for buying Intel", but I am so sick of that. I cannot believe that after more than 5 years, with power getting more and more expensive, they won't rethink. They could get a raise if they told their manager "look, this is 30% more efficient on power and is faster". Not to mention that in a data center, 1 watt saved = 2-3 watts saved due to AC and UPS support (the exact amount is arguable).
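The power argument is simple arithmetic. A sketch under stated assumptions: a hypothetical rack drawing 10 kW today, the 30% perf/W advantage used as an example in the post, and the post's arguable 2-3x facility multiplier for cooling and power delivery. Note that 30% better perf/W means the same work needs 1/1.3 of the power, i.e. about 23% less, not 30% less.

```python
def server_watts_saved(baseline_watts, perf_per_watt_gain):
    """Power saved doing the same work on hardware with better perf/W.

    perf_per_watt_gain: 0.30 means 30% more work per watt, so the same
    work needs baseline_watts / 1.30.
    """
    return baseline_watts - baseline_watts / (1.0 + perf_per_watt_gain)


def facility_watts_saved(server_savings, overhead_multiplier):
    """Scale server-level savings by facility overhead (cooling, power delivery).

    The 2-3x multiplier is the rule of thumb from the post, not a measured value.
    """
    return server_savings * overhead_multiplier


RACK_WATTS = 10_000  # hypothetical rack drawing 10 kW today
saved = server_watts_saved(RACK_WATTS, 0.30)
for multiplier in (2.0, 3.0):
    print(f"x{multiplier:.0f}: {facility_watts_saved(saved, multiplier):,.0f} W saved at the facility")
```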
 

Geddagod

Golden Member
Dec 28, 2021
1,340
1,433
106
I think this is basically what I am saying. AMD rules servers in every respect today, and for the foreseeable future. I saw the post about the IT manager thinking "I will never get fired for buying Intel", but I am so sick of that. I cannot believe that after more than 5 years, with power getting more and more expensive, they won't rethink. They could get a raise if they told their manager "look, this is 30% more efficient on power and is faster". Not to mention that in a data center, 1 watt saved = 2-3 watts saved due to AC and UPS support (the exact amount is arguable).
Intel's just going to continue selling stuff dirt cheap. Gonna be fun to see how they manage to do that with the insanely expensive GNR, which prob uses more silicon than Turin while also using a "newer" node.
 

HurleyBird

Platinum Member
Apr 22, 2003
It loads the threads sequentially until it hits the socket cap.

Which isn't relevant. The worse that additional threads perform, the better the architecture that relies less on SMT for its total throughput should come off.

Yes, just read the SPEC CPU 2017 documentation.

So now we're just talking about SPEC? I thought we were talking general enterprise/server performance?
 

adroc_thurston

Diamond Member
Jul 2, 2023
Which isn't relevant. The worse that additional threads perform, the better the architecture that relies less on SMT for its total throughput should come off.
Again, cinememe can't scale well on server CPUs since it runs out of tiles.
So now we're just talking about SPEC
Socket perf projection numbers from every vendor out there tend to be SIR2017 (SPECrate 2017 Integer) with n copies running.
 

HurleyBird

Platinum Member
Apr 22, 2003
Again, cinememe can't scale well on server CPUs since it runs out of tiles.

Again, that's the exact scenario that will disproportionately benefit the architecture that gains a massive non-SMT uplift at the expense of SMT yield, except that's not what the leak is showing.

At this point, I think you might be trolling me, because based on your posting history (this is a compliment) I don't see how this could possibly be going over your head for so long now (this isn't). You aren't engaging. You're just repeating yourself despite the fact that the thing you're repeating goes against your claim.

One more time: if the non-SMT threads (which are assigned first) have a massive uplift while the SMT threads (which are assigned later) actually regress, and you're talking about an application with a semi-hard physical limit to thread scaling because additional threads can't perform any meaningful work after a point, then comparing two CPUs that both go past that point is a task that disproportionately benefits the CPU that relies less on SMT.

Therefore, if the uplift of 256T Z5 vs. 256T Z4 in Cinebench is only 15%, then you should expect a much lower uplift in consumer parts. And that just doesn't sound realistic.
 

adroc_thurston

Diamond Member
Jul 2, 2023
5,540
7,728
96
Again, that's the exact scenario that will disproportionately benefit the architecture that gains a massive non-SMT uplift at the expense of SMT yield, except that's not what the leak is showing.
Only if it loads in 1c/1t increments (it doesn't; tiles are loaded sequentially and contained to one socket).
Otherwise what you win in 1t IPC you lose in nT scaling.
See Milan (it was a zero to net-negative nT gain at iso-wattage for silly parallel stuff like cinememe).
 