Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Ajay · Sep 28, 2023

Markfw said:
And what does gaming have to do with desktop? Yes, it's a PART of desktop, but there are others. Some want office, some want mini workstations, some participate in DC or things that require a lot of cores. Why do you think DIY owns gaming? Consoles? Handhelds? Gaming is all over, but they don't own DIY.

A bit confused here about how your DC software works. For high performance servers (high load) it’s always been about ST * Cores for net performance. That’s the mistake some RISC vendors made that ceded the high performance crown in servers to Intel. Does DC software run differently? (F@H didn't, except for Big Units, where a minimum number of cores had to be used.)

Markfw · Sep 28, 2023

Ajay said:
A bit confused here about how your DC software works. For high performance servers (high load) it’s always been about ST * Cores for net performance. That’s the mistake some RISC vendors made that ceded the high performance crown in servers to Intel. Does DC software run differently? (F@H didn't, except for Big Units, where a minimum number of cores had to be used.)

Let's put it this way. My Genoa farm kills everything for performance, especially when AVX-512 is used, which the recent PG race proved. My 4 Genoa + 6 7950X + 2 7763 Milan plus 2 7V12 Rome beat everything except 1620 older Xeon cores. Yes, it's ST * cores, and Genoa has both. But on the flip side... 64 cores of Genoa in one chip = 2.7 7950X (16 cores), so ST works, but who wants 3 times as many boxes? The 64-core 9554 runs at 3.5 GHz fully loaded, so it's a good compromise.

Ajay · Sep 28, 2023

Markfw said:
But on the flip side... 64 cores of Genoa in one chip = 2.7 7950X (16 cores), so ST works, but who wants 3 times as many boxes? The 64-core 9554 runs at 3.5 GHz fully loaded, so it's a good compromise.

Thanks, I understand your point now. Wish I had the dosh to get back into DC. Electricity rates in NH have gone through the roof in recent years. Coal and nuclear plant shutdowns, along with increased NG prices, are killing us (on top of hardware costs).

Markfw · Sep 28, 2023

Ajay said:
Thanks, I understand your point now. Wish I had the dosh to get back into DC. Electricity rates in NH have gone through the roof in recent years. Coal and nuclear plant shutdowns, along with increased NG prices, are killing us (on top of hardware costs).

It's not cheap for me either, even with hydro and windmill as the sources (there might be others also). That month cost me $1000. But efficiency is also king. One 320 watt Genoa almost equals three 142 watt 7950X's (all my 7950X's are set to ECO mode).
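The efficiency point can be checked with a quick back-of-the-envelope calculation. The sketch below uses only figures quoted in these posts (one 64-core Genoa is roughly 2.7 7950X in throughput; 320 W versus 142 W per chip in ECO mode). The function name and the idea of measuring throughput in "7950X units" are illustrative assumptions, and chip power is not the same as power at the wall.

```python
# Back-of-the-envelope perf-per-watt comparison using only numbers quoted in the thread.
# Throughput is expressed in "7950X units" (1.0 = one 16-core 7950X in ECO mode).

def perf_per_watt(throughput: float, watts: float) -> float:
    """Throughput per watt; both inputs are rough, self-reported figures."""
    return throughput / watts

genoa = perf_per_watt(throughput=2.7, watts=320.0)   # one 64-core 9554
ryzen = perf_per_watt(throughput=1.0, watts=142.0)   # one 7950X, ECO mode

advantage = genoa / ryzen           # > 1.0 means Genoa does more work per watt
boxes_needed = 2.7 / 1.0            # 7950X boxes needed to match one Genoa chip
ryzen_power = boxes_needed * 142.0  # watts for the equivalent 7950X throughput

print(f"Genoa perf/W advantage: {advantage:.2f}x")
print(f"Power for equal throughput: {ryzen_power:.0f} W (7950X) vs 320 W (Genoa)")
```

On these numbers the single Genoa chip comes out roughly 20% ahead per watt, before counting the extra boards, memory and power supplies that additional boxes would need.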

StefanR5R · Sep 28, 2023

Below in the spoiler is some off-topic talk. (The last from me in this thread, I promise.) I look forward to Zen 5 discussion.

Fjodor2001 said:
Fixed it for you.

Your "fix" does not make sense.

There is virtually no limit to the number of cores which you can buy or rent *right now*. In the big picture, your costs are scaling merely linearly with the desired core count, give or take. It's very affordable today to go orders of magnitude beyond 16c/32t. — In contrast, there is a definite, absolute limit to the single-core performance you can buy or rent right now. No matter how much you are ready to pay, you can't break that ceiling. Yet at the same time, there are countless workloads, not the least among those which are typically executed on so-called desktop computers, at which (1.) application performance is governed by single-thread performance _and_ (2.) user experience would benefit from higher application performance.

DisEnchantment said:
I am beyond certain many if not most engineering loads need a combination of both. Engineering workloads are vast and diverse.
We have a large bunch of AMD EPYC 9374F clusters, and we chose this specific SKU because of its high boost within the 4th gen family. Sometimes I wish it had more cores, but the number of ST passes in our loads is so high that we are currently looking to migrate to 2P 9374F clusters instead of 1P 64C clusters.

It is my experience too with engineering applications that a mixture of ST _and_ MT computing is a common case, with both portions taking up notable fractions of the overall run time. Sometimes the presence of intense ST computing can be chalked up as the usual lack of optimization, because software product management sets other priorities. Other times it really is because the particular sub-problem is technically very difficult to parallelize.
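One common way to reason about such mixed ST/MT workloads is Amdahl's law. Neither post names it, so treat the sketch below as an assumed model. The serial fraction, core counts and clock ratio are hypothetical placeholders, not measurements from anyone's cluster.

```python
# Amdahl's-law sketch: run time of a job with a serial (ST) part and a parallel (MT) part.
# All inputs below are hypothetical; they are not figures from the thread.

def relative_runtime(serial_fraction: float, cores: int, clock_ratio: float = 1.0) -> float:
    """Run time relative to one baseline core at clock_ratio 1.0.

    serial_fraction: share of baseline single-core run time that cannot be parallelized.
    cores:           number of cores available to the parallel part.
    clock_ratio:     per-core speed relative to the baseline (speeds up both parts).
    """
    parallel_fraction = 1.0 - serial_fraction
    return (serial_fraction + parallel_fraction / cores) / clock_ratio

# Hypothetical comparison: many slower cores vs. fewer faster cores, 20% serial work.
many_slow = relative_runtime(serial_fraction=0.2, cores=64, clock_ratio=1.0)
few_fast = relative_runtime(serial_fraction=0.2, cores=32, clock_ratio=1.2)

print(f"64 cores at 1.0x clock: {many_slow:.4f}")
print(f"32 cores at 1.2x clock: {few_fast:.4f}")
```

With a large enough serial share, the faster cores finish sooner even with half the core count, which is the trade-off described in the posts above.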

Ajay said:
A bit confused here about how your DC software works. For high performance servers (high load) it’s always been about ST * Cores for net performance. That’s the mistake some RISC vendors made that ceded the High performance crown in servers to Intel. Does DC software run differently (F@H DIDN'T, except for Big Units where a minimum number of cores needed to be used).

In the bigger picture, Distributed Computing is all about embarrassingly parallel computing. It entirely depends on the ability to divide a huge computing task into a very large number of small tasks which can be solved almost independently of each other, with very little communication happening between work distribution/result collection server and compute client, and no communication at all between compute clients. The clients can be small or big, slow or fast; they can be online 24/7 or down for much of the day or week. From the POV of the science project, the sum of performances of all these clients makes up the project performance.

Given this, an individual contributor who wants to donate more computer time can choose between deploying few big or more small clients, and between having few clients active much of the time or more clients but at part-time.
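The "sum of performances" view can be sketched in a few lines. The `Client` type, its fields and all figures below are hypothetical placeholders chosen for illustration; no real project's accounting is implied.

```python
# Embarrassingly parallel model: project throughput is the sum over independent clients.
# Client figures are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Client:
    tasks_per_hour: float   # throughput while the client is running
    uptime: float           # fraction of time the client is online (0.0 to 1.0)

def project_throughput(clients: list[Client]) -> float:
    """Average tasks per hour contributed by all clients combined."""
    return sum(c.tasks_per_hour * c.uptime for c in clients)

few_big = [Client(tasks_per_hour=40.0, uptime=1.0)] * 2      # two big boxes, always on
many_small = [Client(tasks_per_hour=10.0, uptime=0.5)] * 16  # sixteen small, part-time

print(project_throughput(few_big))     # 80.0
print(project_throughput(many_small))  # 80.0
```

Because clients never talk to each other, the two fleets are interchangeable from the project's point of view; the choice between them comes down to the contributor's cost, space and power.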

The picture becomes more differentiated when we look at those individual tasks on the compute clients; they differ between Distributed Computing projects. Most typically they are

single-threaded tasks which have modest resource requirements per task, such that e.g. a 16c/32t CPU on dual-channel memory has no problem running 32 of these tasks in parallel,
single-threaded tasks which offload most of their computation onto a GPU. (A cheap consumer GPU works fine for that.)

But there are also some Distributed Computing projects which hand out

multi-threaded tasks which easily scale to a modest number of threads, such that one would typically run one such task at once on a typical desktop CPU or a small number of such tasks concurrently on typical server CPUs,
single- or multithreaded tasks which have one or another special resource requirement, such that they are difficult to scale to the particular hardware which a Distributed Computing contributor has got at hand.
For example, there have been meteorology tasks out there which created so much result data, that a somewhat faster & wider computer could easily compute these tasks faster than it could then upload the results to the project server, over a common home/small office internet link. (Incidentally, the result collection server of this project broke down several times under the sheer rate of data returns from all clients combined. Project management and service provider had underestimated the demand on the collection server.)
As another example of such less common task types, the tasks of the latest Distributed Computing competition which Markfw mentioned, required 30 MBytes CPU cache per task instance, otherwise they would be slowed down a lot by heavy memory accesses.
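The cache requirement in the last example puts a hard cap on how many task instances run at full speed. A minimal sketch, assuming each task's working set must fit inside a single L3 domain and treating the quoted 30 MBytes as 30 MiB; the function name and the cache configurations are hypothetical and chosen only to show the shape of the calculation.

```python
# How many cache-hungry tasks fit without spilling to main memory?
# Assumes each task needs its working set in one L3 domain (L3 not shared across domains).

def tasks_that_fit(l3_mib_per_domain: int, domains: int, mib_per_task: int = 30) -> int:
    """Number of task instances whose working set fits in last-level cache."""
    return (l3_mib_per_domain // mib_per_task) * domains

# Hypothetical configurations; not a statement about any specific CPU model.
print(tasks_that_fit(l3_mib_per_domain=32, domains=2))   # 2
print(tasks_that_fit(l3_mib_per_domain=32, domains=8))   # 8
print(tasks_that_fit(l3_mib_per_domain=96, domains=1))   # 3
```

Under this assumption the thread count of the CPU hardly matters: a chip with many cores but 32 MiB of L3 per domain would still run only one such task per domain at full speed.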

H433x0n · Sep 29, 2023

Source

MLID brought the receipts. It’s way different than what many people were claiming with the 32% IPC increase.

Internal AMD documents show 10-15% IPC increase for Zen 5. There’s also leaked details of the new core arch.

adroc_thurston · Sep 29, 2023

H433x0n said:
Internal AMD documents show 10-15% IPC increase for Zen 5

That's SIR n-copy for Turin-D with SMT on, yes, that's how it sits at .99 of Genoa vCPU perf.
Veeeeeerry old slideware with old timelines.

branch_suggestion · Sep 29, 2023

H433x0n said:
Source

MLID brought the receipts. It’s way different than what many people were claiming with the 32% IPC increase.

Internal AMD documents show 10-15% IPC increase for Zen 5. There’s also leaked details of the new core arch.

Presented without comment.

H433x0n · Sep 29, 2023

adroc_thurston said:
That's SIR n-copy for Turin-D with SMT on, yes, that's how it sits at .99 of Genoa vCPU perf.
Veeeeeerry old slideware with old timelines.

This document is from 2023. It doubled IPC expectations within the year?

H433x0n · Sep 29, 2023

branch_suggestion said:
Presented without comment.
View attachment 86466 View attachment 86467

So doubling down then?

adroc_thurston · Sep 29, 2023

H433x0n said:
This document is from 2023

Lol, it's old stuff for projections.

H433x0n said:
It doubled IPC expectations within the year?

1t IPC isn't N-copy SIR IPC.

branch_suggestion · Sep 29, 2023

H433x0n said:
So doubling down then?

Of course, AMD has underpromised and overdelivered with every single Zen generation.
They are big fans of the '+' and the '>' to use for marvelous sandbagging. The only time they screwed up in recent memory was RDNA3.

H433x0n · Sep 29, 2023

adroc_thurston said:
Lol, it's old stuff for projections.

It’s no more than 9 months old.

adroc_thurston said:
1t IPC isn't N-copy SIR IPC.

It doesn’t state that anywhere. The numbers it provides for Zen 3 & Zen 4 are the proper IPC. Why would they give the legit values for previous arch but sandbag Zen 5 with a Turin-D core in a suboptimal scenario for an internal document? That doesn’t make any sense.

Also you said it’s an old invalid document anyway? So that’s not representative of Turin-D anymore?

adroc_thurston · Sep 29, 2023

H433x0n said:
It’s no more than 9 months old.

A bit older.

H433x0n said:
It doesn’t state that anywhere

All server IPC projections for all vendors tend to be N-copy for both IPC and perf total.

H433x0n said:
Why would they give the legit values for previous arch

Those are specifically server numbers.
Zen4 is 14% SIR over 13% client 1t due to higher SMT yield (which they've disclosed at this year's Hot Chips).

H433x0n said:
That doesn’t make any sense.

Yes it does when you know Turin SIR perf numbers.
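The distinction being argued here (1T IPC versus N-copy SIR with SMT on) can be illustrated with a toy model. It assumes per-core N-copy throughput is 1T IPC times (1 + SMT yield), which is a simplification made up for this sketch and not AMD's methodology. Only the 13% and 14% Zen 4 figures come from the post; every SMT yield value and the 30% figure are hypothetical.

```python
# Toy model: n-copy (SMT on) per-core throughput = 1T IPC * (1 + SMT yield).
# This model is an assumption for illustration, not AMD's methodology.

def ncopy_uplift(uplift_1t: float, smt_yield_old: float, smt_yield_new: float) -> float:
    """Generational n-copy uplift given the 1T uplift and SMT yields of both generations."""
    return (1.0 + uplift_1t) * (1.0 + smt_yield_new) / (1.0 + smt_yield_old) - 1.0

# Hypothetical yields: a small yield gain turns +13% 1T into roughly +14% n-copy ...
print(f"{ncopy_uplift(0.13, smt_yield_old=0.20, smt_yield_new=0.21):.3f}")
# ... while a yield drop makes the n-copy figure understate a (hypothetical) 1T gain.
print(f"{ncopy_uplift(0.30, smt_yield_old=0.25, smt_yield_new=0.10):.3f}")
```

In this model a core that gets much faster single-threaded but gains less from its second thread can show a modest N-copy number, so the two camps in the thread could both be reading the same slide consistently. Whether that applies to Zen 5 is exactly what is in dispute.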

H433x0n · Sep 29, 2023

adroc_thurston said:
A bit older.

In the bottom left corner of the document it’s dated 2023.

adroc_thurston said:
All server IPC projections for all vendors tend to be N-copy for both IPC and perf total.

Those are specifically server numbers.

Suuureeee. Guess we’ll know when it launches. I’ll anxiously await the 32% IPC increase.

adroc_thurston · Sep 29, 2023

H433x0n said:
In the bottom left corner of the document it’s dated 2023.

I know.

H433x0n said:
Guess we’ll know when it launches. I’ll anxiously await the 32% IPC increase.

Yea, you only have a tiny little bit left to wait for the first three parts.

Thunder 57 · Sep 29, 2023

H433x0n said:
In the bottom left corner of the document it’s dated 2023.

Suuureeee. Guess we’ll know when it launches. I’ll anxiously await the 32% IPC increase.

Who claimed 32% increase? Sounds like a straw man.

adroc_thurston · Sep 29, 2023

Thunder 57 said:
Who claimed 32% increase?

Me, it's based off 96c Turin numbers AMD gave.
Definitely not SMT yield thingy given how relatively poorly SMT on Turin-D performs per thread relative to 128/96c Turins.

Saylick · Sep 29, 2023

Might as well show the slides MLID revealed so that we're all on the same page regarding what's discussed:

Edit: Had some time to look at the slides more closely and here are my thoughts.

1) That timeline slide... The years seem off. It's almost the end of 2023 and Zen 5 isn't even out yet, which suggests that it's either a fake slide, this timeline isn't to scale, or it's out of date. Seeing as how Covid delayed things, if it's an out of date slide, it probably dates to sometime before Covid.
2) Regarding the Zen 5 block diagram, here's Zen 4's for reference.

Zen 4 has 4 ALUs vs 6 for Zen 5.
Zen 4 has 6 op dispatch vs 8 for Zen 5.
Zen 4 has 3 load, 2 stores vs 4 loads, 2 stores for Zen 5.
Zen 4 has 32 KiB L1D cache vs 48 KiB for Zen 5.

adroc_thurston · Sep 29, 2023

Saylick said:
Zen 4 has 4 ALUs vs 6 for Zen 5.
Zen 4 has 6 op dispatch vs 8 for Zen 5.
Zen 4 has 3 load, 2 stores vs 4 loads, 2 stores for Zen 5.
Zen 4 has 32 KiB L1D cache vs 48 KiB for Zen 5.

Yes, Zen5 looks for the most part similar to Nuvia Phoenix or Apple Firestorm/Avalanche/Everest 'cept the non-baby mode FPU.
Which is...
I mean I've said that a thousand times over before.

leoneazzurro · Sep 29, 2023

It seems a bit like the Comet Lake->Alder Lake jump in terms of core resources.

Gideon · Sep 29, 2023

So, if the slides are to be believed (and they look quite authentic), the core ends up being much more similar to Alder Lake than I anticipated. But still noticeably fatter.

The same 12-way 48KB L1 cache as Golden Cove (hopefully without the latency penalty)
8-wide dispatch (+2 vs Alder Lake and Zen 4)
6 ALUs (+1 vs Alder Lake, +2 vs Zen 4)
4 loads / 2 stores per cycle (vs 3/2 for Golden Cove, 2/1 for Zen 4)
- If I'm reading this right, these are 512-bit (64 bytes)? That's a massive uplift from Zen 4 if true (4x the throughput in ideal AVX-512 scenarios)

The biggest unknown for me is how they plan to feed the beast. There are no mentions of any decoder changes; surely it would be an absurd bottleneck if not changed?

Anyway, looking forward to comparisons to the Arrow Lake core. In the end, they could end up pretty similar in width, so it would all come down to execution.

Gideon · Sep 29, 2023

Saylick said:
Zen 4 has 3 load, 2 stores vs 4 loads, 2 stores for Zen 5.

Hmm, have I misunderstood it? I always thought it's two 256-bit loads and one 256-bit store:

AMD’s Zen 4 Part 1: Frontend and Execution Engine (chipsandcheese.com)

This is likely because AMD didn’t implement wider buses to the L1 data cache. Zen 4’s L1D can handle two 256-bit loads and one 256-bit store per cycle, which means that vector load/store bandwidth remains unchanged from Zen 2. The Gigabyte leak suggested alignment changed to 512-bit, but that clearly doesn’t apply for stores.
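The "4x the throughput" estimate follows from simple port arithmetic. The sketch below takes the Zen 4 figures from the Chips and Cheese quote above, and the Zen 5 figures from the reading of the leaked slide discussed in this thread (4 loads and 2 stores, each assumed to be 512 bits wide). That reading is unconfirmed, and the function name is illustrative.

```python
# Peak L1D bandwidth per cycle = number of ports * port width.

def bytes_per_cycle(ports: int, width_bits: int) -> int:
    """Peak bytes moved per cycle for a given port count and port width."""
    return ports * width_bits // 8

zen4_load = bytes_per_cycle(ports=2, width_bits=256)   # per the Chips and Cheese quote
zen4_store = bytes_per_cycle(ports=1, width_bits=256)
# Unconfirmed reading of the leaked slide: 4 loads and 2 stores, each 512 bits wide.
zen5_load = bytes_per_cycle(ports=4, width_bits=512)
zen5_store = bytes_per_cycle(ports=2, width_bits=512)

print(zen4_load, zen5_load, zen5_load / zen4_load)      # 64 256 4.0
print(zen4_store, zen5_store, zen5_store / zen4_store)  # 32 128 4.0
```

These are peak figures only; sustained throughput would also depend on whether all ports can actually issue full-width accesses in the same cycle, which the slide does not settle.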

adroc_thurston · Sep 29, 2023

leoneazzurro said:
It seems a bit like the Comet Lake->Alder Lake jump in terms of core resources.

Bigger.

Gideon said:
the core ends up being much more similar to Alder Lake than I anticipated

It's. A. Firestorm.

Gideon said:
There are no mentions of any decoder changes, surely it would be an absurd bottleneck,

It kinda cutely mentions twice the i$ fetch bandwidth.

Henry swagger · Sep 29, 2023

Gideon said:
So, if the slides are to be believed (and they look quite authentic), the core ends up being much more similar to Alder Lake than I anticipated. But still noticeably fatter.

The same 12-way 48KB L1 cache as Golden Cove (hopefully without the latency penalty)

8-wide dispatch (+2 vs Alder Lake and Zen 4)

6 ALUs (+1 vs Alder Lake, +2 vs Zen 4)

4 loads / 2 stores per cycle (vs 3/2 for Golden Cove, 2/1 for Zen 4)

- If I'm reading this right, these are 512-bit (64 bytes)? That's a massive uplift from Zen 4 if true (4x the throughput in ideal AVX-512 scenarios)

The biggest unknown for me is how they plan to feed the beast. There are no mentions of any decoder changes; surely it would be an absurd bottleneck if not changed?

Anyway, looking forward to comparisons to the Arrow Lake core. In the end, they could end up pretty similar in width, so it would all come down to execution.

Zen 5 can compete if they can hit 6.2 GHz... or Arrow Lake wins easily.
