Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

StefanR5R · Oct 3, 2023

Re #4,028, #4,044:

Ansys Fluent throughput very much depends on memory bandwidth and level 3 cache (https://www.amd.com/en/server-docs/ansys-fluent-performance-amd-epyc-7003-series-processors). Hence, SMT isn't useful. However, on Linux, you get almost the same performance if you

a) leave SMT on but configure the application to limit its computing thread count to half of the number of logical CPUs,
b) switch SMT off in the BIOS and let the application use all of the logical CPUs,
c) do it like in a) but additionally bind the computing threads to dedicated logical CPUs such that none of them share a physical core.

I suspect that a) might be way behind b) and c) on Windows, but it definitely is not on Linux. — In other words, SMT=off is a bit of a primitive hack for applications like this.
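
Option c) can be sketched in a few lines on Linux. This is a minimal illustration, not how Fluent or any MPI launcher actually does its binding: it assumes the standard sysfs topology files under /sys/devices/system/cpu, and the helper names (parse_cpu_list, one_cpu_per_core, read_siblings) are made up for this example. It picks the lowest-numbered logical CPU of every physical core and restricts the current process to that set.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,64' or '2-3' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(siblings_by_cpu):
    """Pick the lowest-numbered logical CPU of every physical core.

    siblings_by_cpu maps a logical CPU number to the text of its
    thread_siblings_list file (hypothetical input format for this sketch).
    """
    chosen = set()
    for text in siblings_by_cpu.values():
        chosen.add(parse_cpu_list(text)[0])
    return sorted(chosen)


def read_siblings(sysfs_root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for every online CPU (Linux only)."""
    result = {}
    for path in Path(sysfs_root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parent.parent.name[3:])  # "cpu12" -> 12
        result[cpu] = path.read_text()
    return result


if __name__ == "__main__":
    cpus = one_cpu_per_core(read_siblings())
    if hasattr(os, "sched_setaffinity"):
        # Only keep CPUs this process is allowed to run on (cgroups, taskset).
        usable = [c for c in cpus if c in os.sched_getaffinity(0)]
        if usable:
            os.sched_setaffinity(0, usable)  # one logical CPU per physical core
    print(len(cpus), "physical cores found:", cpus)
```

Threads and child processes started afterwards inherit the affinity mask, so a solver launched from this process would see only one logical CPU per core. Real applications usually expose their own binding options, which are the better tool when available.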

Bergamo isn't quite intended for CFD, obviously. The F (cache and frequency optimized) and X (3D cache) SKUs of Milan, Genoa, and (to get back on topic) Turin, are or will be better suited. Much more so if you pay per-CPU license fees. Even if you have a license for 256 CPUs, I do suspect that really large simulations compute faster (at the expense of correspondingly more task energy) on a small Infiniband cluster of nodes with said F or X SKUs, with higher MPI latency but also with much more aggregate cache and memory channels. Actually, I wonder if a single 2P Genoa-X wouldn't already handily outperform the 2P Bergamo, given the same memory throughput, same power budget, same MPI latency, but more processor cache (though lower compute density) on Genoa-X.

And getting back to Zen 5 and Turin: Seems as if it might get some interesting updates for streaming FP math, but what about classic FP math? I'm rather out of touch with CFD by now; last time I dealt with it, vector arithmetic wasn't a thing for CFD (pressure solvers).

(edited for clarity)

Tigerick · Oct 3, 2023

DisEnchantment said:
I wonder if he would want to wake up if he realizes he will get single-digit percentage gains. [15% IPC gain + supposed 5% clock regression = paltry 9% perf gain.]

Besides that, core uarch update seems really interesting. Biggest update since Zen 1 for sure.
I would have liked to see more updates on the SoC architecture, but it looks like another long wait.

16M SLC on 8C Strix Zen 5c with 8WGP RDNA3+ would have been great for a lot of these Windows handhelds otherwise.

Strix seems to be getting a huge bump on AIE tiles and +33% CUs.

View attachment 86599

https://x.com/All_The_Watts/status/1708791849652273180?s=20

Curious whether they will put SLC/MALL all around for Zen 6. I have seen several MALL prefetch patents.

Hmm, now we are dealing with 2 different dies of Strix Point; instead of cutting core counts, AMD decided to cut the L3 cache of Zen 5c in half, interesting. At least we have a clearer picture of the mobile APU lineup in 2024:

2024	TDP	Node	Die Size	P-core	E-core	Total L3 Cache	RDNA3+	ALU	AIE	Memory BW
Ryzen 5 U-series	?	N4P	?	? x Zen5	8xZen5c 8 MB	? MB	?	?	?	128-bit 8533
Ryzen 7 U-series	28-35W+	N4P	225 mm2	4 x Zen5 16 MB	8xZen5c 16 MB	32 MB	8 WGP	1024	64	128-bit 8533
Ryzen 7 HS-series	45W+	N4P	225 mm2	4 x Zen5 16 MB	8xZen5c 16 MB	32 MB	8 WGP	1024	64	128-bit 8533
Ryzen 9 HS-series	?	N4Px2 + N3E	?	16xZen5 64 MB	NA	64 MB	20 WGP	2560	?	256-bit 8533
Ryzen 9 HX-series	55W+	N4Px2 + N6 ?	?	16xZen5 64 MB	NA	64 MB	?	?	NA	128-bit
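
For reference, the "Memory BW" column can be turned into theoretical peak bandwidth. This is a rough sketch that assumes "8533" means 8533 MT/s and that the full bus width transfers on every cycle; the function name is made up for the example and sustained bandwidth will be lower.

```python
def peak_bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Bus widths taken from the table above; 8533 MT/s is an assumed reading of "8533".
for label, width in (("128-bit 8533", 128), ("256-bit 8533", 256)):
    print(f"{label}: {peak_bandwidth_gbs(width, 8533):.1f} GB/s")
```

Doubling the bus width at the same transfer rate doubles the peak figure, which is the whole difference between the 128-bit and 256-bit rows.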

eek2121 · Oct 3, 2023

PJVol said:
I may have missed this, but were there any assumptions on whether Zen 6 will require a new socket?
Just curious of what actual AM5 lifespan is.

Zen 6 will likely still use AM5.

cortexa99 said:
I'm afraid there will be no Zen5 in January. Only a teaser.

OTOH, a few months ago Zen5 DT completion was already planned for Oct-Nov, and mass production could happen in that timeframe. There has been about a 4-month gap between completion and release since Zen2, so you can expect the actual release to happen in 1H2024, or even as early as Mar-Apr.

Mass production could even be happening right now as I type this message.

Oh I didn’t mean to imply as much. “Announce” was what I was referring to, though you are right we could just get a teaser. Usually parts follow the announcement after a period of weeks or months, so March - May are probably good bets. I have not actually seen any solid leaks for that timeline, however. The only leak I have seen that was reliable indicated late 3rd quarter.

HurleyBird said:
Also doesn't disprove that there are instances where a change can both increase IPC and reduce power consumption, which is very obviously true. Practically anything that reduces the need to move up the memory hierarchy may do that, or anything that reduces communication distances/hops, and of course not all work is created equal and it's possible to do more or less work to achieve a result. A more general approach to finding a result can both perform worse and consume more energy than a more specific approach. It's of course possible to do more work with less active transistors and vice versa. And then there's pipelines, branch prediction (which is huge, mispredictions are extremely expensive), OoO etc.

It's not a claim any engineer would make.

Very rarely will that ever be the case. When it is, the issue is usually either a failure to optimize the first iteration or the introduction of new power management features (or both)

Absent those two things, increasing IPC means increasing transistors, which means increasing power consumption.

Oh, and regardless of what you think of @adroc_thurston , note that I AM an engineer. I used to build some hardware products for a living, but these days it is all software, (though I did build a 6502 system on a breadboard recently, and I do have a product I am working on outside of work that is hardware, but not chip level stuff). I am also a tech veteran, having been building PCs since the late 80s. I also have an A+ cert and many other certs, used to work in IT (I do mostly web development these days since it pays very well) and have quite a few industry contacts, just very few that could (or are willing to) give any inside info.

What AMD did with Zen 3 is actually pretty unheard of in the industry. However…

coercitiv said:
Zen 3 vs. Zen 2

19% IPC increase

same node class, but improved

slightly higher clocks

bigger die

ISO power

right in our face

View attachment 86584
View attachment 86585

The biggest problem in this thread isn't this discussion point though, but rather whether folks around here are going to accept the rude verdicts of a poster as gospel or demand the minimum of proof and decorum.

Zen 3 is a perf/watt champ, but those charts show exactly what we are referring to. IPC is up, but so is power. That is why the big upgrades often happen after node shrinks. Shrinks drop power/die area, increasing the budget for more transistors, which allow for IPC upgrades.

Goop_reformed said:
To be honest I'm sort of baffled by how people here treat mlid like he's the antichrist. He definitely has good info, and people who are into tech news watch him religiously. Right after a new video is posted, a member links it here almost instantly. I used to think people hate-watch him, but my opinion has been swayed.

He makes stuff up all the time.
He has deleted videos where he made stuff up and got it wrong.
He also spins his failures to make it look like products were cancelled, etc.
He is not a tech person, so often he doesn’t understand what he does see.
He will do/say anything to get views, because he makes money, and that is the only reason he still exists.

Most valid leaks are found in other locations. I’ve yet to hear/see a single one that came originally from one of these YouTube “leakers”.

There are some folks on this forum that know A LOT about Intel/AMD’s release plans, and I am sure they die a bit inside every time he gets quoted.

bakyt115 · Oct 3, 2023

Also seems like mlid has friends on this forum (or his own accounts) promoting his videos.

Tuna-Fish · Oct 3, 2023

blackangus said:
From a platform perspective are we expecting new 770/750 chipsets?

adroc_thurston said:
No.
Why?

The motherboard vendors probably want a new model name. If they do, I can't see AMD not supporting their partners with one.

The reason is that Zen4 can currently support much higher memory clocks than it could at launch. Memory clock speed is a selling point printed in large type on MB packaging, and most current models only have 6400+, which was what it was possible to test when the motherboards were released. So MB vendors want to refresh their lineups, and when they do that, they usually want to have a new chipset name.

It's entirely possible that the chipset in question is literally PROM21, just rebranded.

HurleyBird · Oct 3, 2023

eek2121 said:
Very rarely will that ever be the case. When it is, the issue is usually either a failure to optimize the first iteration or the introduction of new power management features (or both)

Absent those two things, increasing IPC means increasing transistors, which means increasing power consumption.

Rarely isn't the same as never. Especially when a company focuses on agility, many things can be left on the table. Zen4c is a poster child for that. And as an engineer, you know there is such a thing as a free lunch--they just happen to be extremely rare.

But I don't think your categorization is correct.

Anything to do with cache and memory can increase IPC while reducing power consumption. That's a category you can pretty much always do something with.

And anything to do with layout can increase IPC (timings) while reducing power consumption (distances, voltages). That's a category that is never 100% optimal in this day of billions of transistors.

For the former, the most significant example I can think of is Maxwell memory compression: Huge increase to IPC with a massive decrease in power.

For the latter, as an extreme example, take an Epyc processor (or even desktop Zen) and make it monolithic (or stacked).

eek2121 said:
Zen 3 is a perf/watt champ, but those charts show exactly what we are referring to. IPC is up, but so is power.

It shows a mix, mostly because Zen3 runs at a higher frequency but has better voltage scaling. Taken as a whole, running clock-for-clock, volt-for-volt, Zen3 probably runs a bit hotter, but that's taking into account the aggregate changes. Unified L3 likely does both increase IPC and reduce power consumption, and there are most likely smaller optimizations that have some positive effect on both, documented or not. If you count changes that less directly improve power consumption by facilitating lower voltage, there are probably others of that kind also.

Ajay · Oct 3, 2023

Tuna-Fish said:
The motherboard vendors probably want a new model name. If they do, I can't see AMD not supporting their partners with one.

The reason is that Zen4 can currently support much higher memory clocks than it could at launch. Memory clock speed is a selling point printed in large type on MB packaging, and most current models only have 6400+, which was what it was possible to test when the motherboards were released. So MB vendors want to refresh their lineups, and when they do that, they usually want to have a new chipset name.

It's entirely possible that the chipset in question is literally PROM21, just rebranded.

I agree. It would almost be idiotic for AMD not to have a new chipset for Zen5. Maybe they tweak it, maybe they don't, but how are motherboard manufacturers going to name their new mobos without a new chipset?
ROG CROSSHAIR X670 E++ MAXIMUM OVDRIVE EDITION???

Joe NYC · Oct 3, 2023

bakyt115 said:
Also seems like mlid has friends on this forum (or his own accounts) promoting his videos.

Thanks for the reminder, I have been slacking off here. Here is a good discussion with Wendell:

Kepler_L2 · Oct 3, 2023

Ajay said:
I agree. It would almost be idiotic for AMD not to have a new chipset for Zen5. Maybe they tweak it, maybe they don't, but how are motherboard manufacturers going to name their new mobos without a new chipset?
ROG CROSSHAIR X670 E++ MAXIMUM OVDRIVE EDITION???

Zen5 doesn't have any new I/O what's the point of a new chipset?

Tuna-Fish · Oct 3, 2023

Kepler_L2 said:
Zen5 doesn't have any new I/O what's the point of a new chipset?

If they actually had a new chipset, it could be significantly better. The links from the CPU are all PCIE 5.0, so a better chipset could provide twice the throughput.

But what Ajay and I are proposing is that the reason is just literally marketing. New x770 motherboards will seem fancier than last-gen x670 ones. Even if it's the same chip. There are reasons why vendors do rebrands.

Abwx · Oct 3, 2023

HurleyBird said:
It shows a mix, mostly because Zen3 runs at a higher frequency but has better voltage scaling. Taken as a whole, running clock-for-clock, volt-for-volt, Zen3 probably runs a bit hotter, but that's taking into account the aggregate changes.

Zen 3 runs at 3% higher frequency in MT than Zen 2, with 15% more throughput, 9 W lower power, and a temperature 20°C lower at 64°C. The numbers are for the 5950X vs. the 3950X. As I pointed out, Zen 3 likely uses an enhanced N7.

AMD Ryzen 5950X, 5900X, 5800X & 5600X review: power consumption and temperature

AMD Ryzen 5000 review: power consumption and temperature / power consumption from idle to full load

www.computerbase.de

inf64 · Oct 3, 2023

I was reading this AT article on SMT scaling for Zen 3, and this quote piqued my interest:

"But, if a core design benefits from SMT, then perhaps the core hasn’t been designed optimally for a single thread of performance in the first place. If enabling SMT gives a user exact double performance and perfect scaling across the board, as if there were two cores, then perhaps there is a direct issue with how the core is designed, from execution units to buffers to cache hierarchy. It has been known for users to complain that they only get a 5-10% gain in performance with SMT enabled, stating it doesn't work properly - this could just be because the core is designed better for ST. Similarly, stating that a +70% performance gain means that SMT is working well could be more of a signal to an unbalanced core design that wastes power.

This is the dichotomy of Simultaneous Multi-Threading. If it works well, then a user gets extra performance. But if it works too well, perhaps this is indicative of a core not suited to a particular workload. The answer to the question ‘Is SMT a good thing?’ is more complicated than it appears at first glance."

Now looking forward to Zen 5, if the slide from mlid is for nT server workload and we assume AMD is sandbagging a bit (let's assume it's +20% higher MT performance at ISO clocks, or in other words 20% higher MT IPC), then below could be Zen 5's performance versus Zen 3 and Zen 4:

	ST	SMT multiplier	MT
Zen3	1.000	1.250	1.250
Zen4	1.110	1.285	1.426
Zen5	1.476	1.150	1.697

So that is 33% higher ST IPC vs. Zen 4: a core designed for ST supremacy that, as a consequence, scales much less with SMT (almost half the percentage Zen 4 gets from SMT: ~28.5% vs. 15%). I guess that the total MT throughput performance (at ISO core count and clocks) will greatly depend on what all-core turbo speeds the Turin parts can achieve at the given TDP. If TDP is +25%, I think it's possible that we might see similar all-core turbo clocks as we had in the Genoa case. That would translate to:

Turin classic vs Genoa classic: 128/96 x 1.2 = 1.6 or 60% more MT performance ; 0.95 x 1.33 = 1.26 or 26% higher ST performance (assuming 5% ST clock deficit vs Genoa)
Turin dense vs Bergamo (both with SMT) : 192 / 128 x 1.2 = 1.8 or 80% more MT performance ; 1 x 1.33 = 1.33 or 33% higher ST performance (assuming no ST clock deficit vs Bergamo)
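
The arithmetic behind the table and the two Turin lines can be reproduced with a short sketch. All inputs are the speculative figures from this post (11% and 33% ST gains, the three SMT multipliers, the assumed 1.2x per-core MT gain, the core counts and the 5% clock deficit); the function and variable names are made up for the example. The MT values agree with the table up to rounding of the last digit.

```python
# Speculative inputs taken from the post above, not measured data.
ZEN4_ST_GAIN = 1.11   # Zen 4 ST vs. Zen 3
ZEN5_ST_GAIN = 1.33   # assumed Zen 5 ST vs. Zen 4
SMT = {"Zen3": 1.250, "Zen4": 1.285, "Zen5": 1.150}


def build_table():
    """Return {generation: (ST, SMT multiplier, MT)} normalized to Zen 3 ST = 1."""
    st = {"Zen3": 1.000}
    st["Zen4"] = st["Zen3"] * ZEN4_ST_GAIN
    st["Zen5"] = st["Zen4"] * ZEN5_ST_GAIN
    return {gen: (st[gen], SMT[gen], st[gen] * SMT[gen]) for gen in st}


def socket_uplift(new_cores, old_cores, per_core_mt_gain):
    """MT uplift per socket at equal clocks: core-count ratio times per-core gain."""
    return new_cores / old_cores * per_core_mt_gain


table = build_table()
for gen, (st, smt, mt) in table.items():
    print(f"{gen}\t{st:.3f}\t{smt:.3f}\t{mt:.3f}")

mt_gain = table["Zen5"][2] / table["Zen4"][2]
print(f"Zen5 vs. Zen4 MT at ISO clocks: {mt_gain:.2f}x")
print(f"Turin classic vs. Genoa MT: {socket_uplift(128, 96, 1.2):.2f}x")
print(f"Turin dense vs. Bergamo MT: {socket_uplift(192, 128, 1.2):.2f}x")
print(f"ST with a 5% clock deficit: {0.95 * ZEN5_ST_GAIN:.2f}x")
```

Changing ZEN5_ST_GAIN or the Zen5 SMT multiplier shows how sensitive the result is: the +20% MT assumption only holds if the ST gain and the SMT loss happen to offset each other this way.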

Geddagod · Oct 3, 2023

Abwx said:
As I pointed out, Zen 3 likely uses an enhanced N7.

no

Geddagod · Oct 3, 2023

inf64 said:
I was reading this AT article on SMT scaling for Zen 3, and this quote piqued my interest:

"But, if a core design benefits from SMT, then perhaps the core hasn’t been designed optimally for a single thread of performance in the first place. If enabling SMT gives a user exact double performance and perfect scaling across the board, as if there were two cores, then perhaps there is a direct issue with how the core is designed, from execution units to buffers to cache hierarchy. It has been known for users to complain that they only get a 5-10% gain in performance with SMT enabled, stating it doesn't work properly - this could just be because the core is designed better for ST. Similarly, stating that a +70% performance gain means that SMT is working well could be more of a signal to an unbalanced core design that wastes power.

This is the dichotomy of Simultaneous Multi-Threading. If it works well, then a user gets extra performance. But if it works too well, perhaps this is indicative of a core not suited to a particular workload. The answer to the question ‘Is SMT a good thing?’ is more complicated than it appears at first glance."

Now looking forward to Zen 5, if the slide from mlid is for nT server workload and we assume AMD is sandbagging a bit (let's assume it's +20% higher MT performance at ISO clocks, or in other words 20% higher MT IPC), then below could be Zen 5's performance versus Zen 3 and Zen 4:

	ST	SMT multiplier	MT
Zen3	1.000	1.250	1.250
Zen4	1.110	1.285	1.426
Zen5	1.476	1.150	1.697

So 33% higher ST IPC vs Zen 4 , core that is designed for ST supremacy, but as a consequence of that, scales much less with SMT (almost a half of the % that Zen 4 gets from SMT : ~28.5% vs 15%). I guess that the total MT throughput performance (at ISO core count and clocks) will greatly depend on what all core turbo speeds the Turin parts can achieve at the given TDP. If TDP is +25%, I think it's possible that we might see similar all core Turbo clocks as we had in Genoa case. That would translate to :

Turin classic vs Genoa classic: 128/96 x 1.2 = 1.6 or 60% more MT performance ; 0.95 x 1.33 = 1.26 or 26% higher ST performance (assuming 5% ST clock deficit vs Genoa)
Turin dense vs Bergamo (both with SMT) : 192 / 128 x 1.2 = 1.8 or 80% more MT performance ; 1 x 1.33 = 1.33 or 33% higher ST performance (assuming no ST clock deficit vs Bergamo)

essentially the same math as I did here yesterday lol

inf64 · Oct 3, 2023

Geddagod said:
essentially the same math as I did here yesterday lol

I missed your post, but looking at it now, it's similar. I gave more details though

Joe NYC · Oct 3, 2023

Tuna-Fish said:
If they actually had a new chipset, it could be significantly better. The links from the CPU are all PCIE 5.0, so a better chipset could provide twice the throughput.

But what Ajay and I are proposing is that the reason is just literally marketing. New x770 motherboards will seem fancier than last-gen x670 ones. Even if it's the same chip. There are reasons why vendors do rebrands.

PCIe links to chiplet are Gen 5. I think 4.

All the other links coming directly from the CPU can be routed by Mobo makers.

I think the only problem is that there does not appear to be an easy way to split 8x PCIe Gen 5 into 16x Gen 4. Because if that's all the GPU needs, the mobo makers could gain the other 8x Gen 5 lanes.
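
The bandwidth side of that trade can be checked with a quick sketch. It uses the per-lane signalling rates (16 GT/s for Gen 4, 32 GT/s for Gen 5) and 128b/130b line coding, ignores packet and protocol overhead, and says nothing about the switch hardware such a split would need. The function name and the x4 link width in the usage lines are assumptions for illustration only.

```python
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}  # raw per-lane rate in GT/s
ENCODING = 128 / 130                   # 128b/130b line coding (Gen 3 to Gen 5)


def link_bandwidth_gbs(gen, lanes):
    """Approximate one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes


print(f"x8  Gen 5: {link_bandwidth_gbs(5, 8):.1f} GB/s")
print(f"x16 Gen 4: {link_bandwidth_gbs(4, 16):.1f} GB/s")
# Hypothetical x4 uplink, to illustrate the 'twice the throughput' point.
print(f"x4  Gen 4: {link_bandwidth_gbs(4, 4):.1f} GB/s")
print(f"x4  Gen 5: {link_bandwidth_gbs(5, 4):.1f} GB/s")
```

An x8 Gen 5 link and an x16 Gen 4 link carry the same raw bandwidth, and the same link width at Gen 5 carries twice what it does at Gen 4.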

Ajay · Oct 3, 2023

Kepler_L2 said:
Zen5 doesn't have any new I/O what's the point of a new chipset?

As @Tuna-Fish pointed out, this must be done, at a minimum, for marketing purposes. At least in the DIY space anyway. As he also pointed out, there are still technical reasons to do so, even if the I/O configuration on the CPU itself hasn't changed. AMD may opt to not spend another dime on the chipset - their choice obviously, but I'd be shocked if they didn't rename it.

Joe NYC · Oct 3, 2023

Ajay said:
As @Tuna-Fish pointed out, this must be done, at a minimum, for marketing purposes. At least in the DIY space anyway. As he also pointed out, there are still technical reasons to do so, even if the I/O configuration on the CPU itself hasn't changed. AMD may opt to not spend another dime on the chipset - their choice obviously, but I'd be shocked if they didn't rename it.

There are always tradeoffs, and time to market for Zen 5 is one of those tradeoffs.

If Zen 5 indeed launches in ~Q1 2024 into a well-established ecosystem vs. 2-3 quarter delays and overpriced mobos with buggy BIOSes, I would definitely take the faster time to market.

Abwx · Oct 3, 2023

Geddagod said:
no
View attachment 86623

Same 7nm technology doesn't imply that it's the same 7nm process; N7 and N7P are both based on the same 7nm process.

If we look at the 5950X vs 3950X the former has 20% better perf/watt at isoclocks, this is a hint that these are not the same iteration of 7nm.

TSMC’s N7P uses the same design rules as the company’s N7, but features front-end-of-line (FEOL) and middle-end-of-line (MOL) optimizations that enable to either boost performance by 7% at the same power, or lower power consumption by 10% at the same clocks.

TSMC Announces Performance-Enhanced 7nm & 5nm Process Technologies

www.anandtech.com
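
To put the two numbers side by side, here is a small sketch. It assumes the quoted N7P figure (10% lower power at the same clocks) would apply in full and takes the 20% perf/watt claim at face value; neither is verified here, and the function name is made up for the example.

```python
def perf_per_watt_gain(perf_ratio, power_ratio):
    """Perf/watt ratio given relative performance and relative power."""
    return perf_ratio / power_ratio


# N7P quote: same clocks, 10% lower power, no performance change from the process.
process_only = perf_per_watt_gain(1.0, 0.90)
claimed = 1.20                      # 5950X vs. 3950X perf/watt at ISO clocks (from the post)
residual = claimed / process_only   # share not covered by the process figure

print(f"process alone: {process_only:.3f}x perf/watt")
print(f"left to explain: {residual:.3f}x")
```

Under these assumptions a process change of N7P's size would cover roughly 11% of the claimed 20%, leaving about 8% that would have to come from somewhere else, such as the architecture.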

Det0x · Oct 3, 2023

Kepler_L2 said:
Zen5 doesn't have any new I/O what's the point of a new chipset?

It's only a few AM5 motherboards that can run 8000MT/s stable in 2:1 mode, pretty much only two 1DPC boards from Asus and Gigabyte atm...

A new generation of motherboards could improve memory layout/traces for the 2DPC boards.
At the moment they cap out at ~7400-7600MT/s if you want stability every reboot. (kinda behave like tuning memory on raptor lake, stability changes each reboot)

Below i did on my 1DPC GENE

Geddagod · Oct 3, 2023

Abwx said:
Same 7nm technology doesn't imply that it's the same 7nm process,

it does

Abwx said:
N7 and N7P are both based on the same 7nm process.

Then AMD would say "different TSMC 7nm finfet technology as Zen 2"

Abwx said:
If we look at the 5950X vs 3950X the former has 20% better perf/watt at isoclocks, this is a hint that these are not the same iteration of 7nm.

Better arch

H433x0n · Oct 3, 2023

inf64 said:
I was reading this AT article on SMT scaling for Zen 3, and this quote piqued my interest:

"But, if a core design benefits from SMT, then perhaps the core hasn’t been designed optimally for a single thread of performance in the first place. If enabling SMT gives a user exact double performance and perfect scaling across the board, as if there were two cores, then perhaps there is a direct issue with how the core is designed, from execution units to buffers to cache hierarchy. It has been known for users to complain that they only get a 5-10% gain in performance with SMT enabled, stating it doesn't work properly - this could just be because the core is designed better for ST. Similarly, stating that a +70% performance gain means that SMT is working well could be more of a signal to an unbalanced core design that wastes power.

This is the dichotomy of Simultaneous Multi-Threading. If it works well, then a user gets extra performance. But if it works too well, perhaps this is indicative of a core not suited to a particular workload. The answer to the question ‘Is SMT a good thing?’ is more complicated than it appears at first glance."

Now looking forward to Zen 5, if the slide from mlid is for nT server workload and we assume AMD is sandbagging a bit (let's assume it's +20% higher MT performance at ISO clocks, or in other words 20% higher MT IPC), then below could be Zen 5's performance versus Zen 3 and Zen 4:

	ST	SMT multiplier	MT
Zen3	1.000	1.250	1.250
Zen4	1.110	1.285	1.426
Zen5	1.476	1.150	1.697

So 33% higher ST IPC vs Zen 4 , core that is designed for ST supremacy, but as a consequence of that, scales much less with SMT (almost a half of the % that Zen 4 gets from SMT : ~28.5% vs 15%). I guess that the total MT throughput performance (at ISO core count and clocks) will greatly depend on what all core turbo speeds the Turin parts can achieve at the given TDP. If TDP is +25%, I think it's possible that we might see similar all core Turbo clocks as we had in Genoa case. That would translate to :

Turin classic vs Genoa classic: 128/96 x 1.2 = 1.6 or 60% more MT performance ; 0.95 x 1.33 = 1.26 or 26% higher ST performance (assuming 5% ST clock deficit vs Genoa)
Turin dense vs Bergamo (both with SMT) : 192 / 128 x 1.2 = 1.8 or 80% more MT performance ; 1 x 1.33 = 1.33 or 33% higher ST performance (assuming no ST clock deficit vs Bergamo)

This all makes sense but makes me wonder.. why keep SMT then? If there’s a meager 15% uplift from SMT, is it really worth the trade offs at that point? Getting rid of it reduces a lot of security and validation hurdles.

Geddagod · Oct 3, 2023

H433x0n said:
This all makes sense but makes me wonder.. why keep SMT then? If there’s a meager 15% uplift from SMT, is it really worth the trade offs at that point? Getting rid of it reduces a lot of security and validation hurdles.

looks at LNC

H433x0n · Oct 3, 2023

Geddagod said:
looks at LNC

Starting to think they had the right idea. For client it makes a lot of sense to get rid of it.

Glo. · Oct 3, 2023

H433x0n said:
This all makes sense but makes me wonder.. why keep SMT then? If there’s a meager 15% uplift from SMT, is it really worth the trade offs at that point? Getting rid of it reduces a lot of security and validation hurdles.

Expect both Intel and AMD to ditch SMT from their mainstream CPUs.
