Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

StefanR5R · Jan 9, 2024

naukkis said:
Cinebench would be doing just as fine

I suspect Cinebench would be fine if there were only core-private caches.

coercitiv said:
I happen to think the second option is more likely because AMD believes the cases with optimal resource allocation will outweigh the other ones.

They surely ran lots of simulations, leaving uncertainty mostly to the question how well the investigated workloads overlap with customers' workloads.

cherullo · Jan 9, 2024

In current AMD designs, the L3 is clocked dynamically at the same clock of the fastest core in the cluster. If all the C cores are in the same cluster, then the low power cluster's L3 can be optimized in area/power for a lower maximum frequency. And during low power/idle time, the other larger, faster L3 can be completely turned off.
So I'd guess that this is the most power efficient setup for common laptop workloads, while still offering plenty of multi-threaded performance.

naukkis · Jan 9, 2024

adroc_thurston said:
That may sound weird, but Zen5 cluster interconnect is designed to be explicitly modular, goes from 4 to 16 cores.

I know but I also know that it ain't cheap to redesign anything in modern manufacturing processes. Look how AMD did reuse full 8-ring stop cpu ring for Phoenix2 instead of designing 6-stop reduced cache version for such a cost-oriented cpu - design costs seems to play big role.

adroc_thurston · Jan 9, 2024

naukkis said:
I also know that it ain't cheap to redesign anything in modern manufacturing processes.

?
It's not that expensive.

naukkis said:
Look how AMD did reuse full 8-ring stop cpu ring for Phoenix2 instead of designing 6-stop reduced cache version for such a cost-oriented cpu - design costs seems to play big role.

The ring isn't provisioned for scaling seamlessly like that.

Saylick · Jan 10, 2024

adroc_thurston said:
That may sound weird, but Zen5 cluster interconnect is designed to be explicitly modular, goes from 4 to 16 cores.

Mmm, it’s starting to make sense to me about that ladder cache then. Basically introduce additional horizontal rungs so that portions of the ladder can be lopped off based on how many cores you want in the CCX.

Edit: Adding some images to explain.

8 cores:

6 cores:

4 cores:

eek2121 · Jan 10, 2024

Hitman928 said:
P-core boost differences are typically very small, especially if all of the cores are in close proximity on the same piece of silicon. That's just max boost though which doesn't come into play in this context because you're not hitting max boost clocks past 1 - 2 cores being loaded. Then the question becomes, what is the 3 - 4 cores boost frequency of the P-cores? If the C-cores can't hit that same frequency, it makes no sense to have a 2p4c+2p4c split because you'd have a significant drop-off in performance past 2 cores being loaded. If the C-cores can hit that frequency but are at the end of their frequency range and thus less efficient than the P-cores in that range, then it makes no sense to have a 2p4c+2p4c split because you are using more power for no performance improvement. Additionally, once you move to the 2nd CCX, you are now bringing in 2 P-cores that will never boost above a 7-8 core loaded frequency, which the C-cores could easily achieve, so why make them P-cores at all? You are then using more space for no performance or efficiency gain. The proposed configuration makes no sense.

On AMD the differences are actually quite large. more than 300 Mhz on my 7950X. Only 2 of my cores can hit the 5.75 ghz. The worst of the bunch can technically hit 5.5. ghz, but rarely ever comes close.

At least 1 of my cores hits 6ghz with a bit of prodding (read: overclocking magic, though I run stock)

Hitman928 · Jan 10, 2024

eek2121 said:
On AMD the differences are actually quite large. more than 300 Mhz on my 7950X. Only 2 of my cores can hit the 5.75 ghz. The worst of the bunch can technically hit 5.5. ghz, but rarely ever comes close.

At least 1 of my cores hits 6ghz with a bit of prodding (read: overclocking magic, though I run stock)

In this case you're talking about 2 different pieces of silicon that I'm pretty sure AMD bins for 1 fast and 1 slow. Considering the difference is <5% even in this case, that's pretty close. The difference will be even tighter on the same silicon and much smaller than the jump down to a C-core speed.

eek2121 · Jan 10, 2024

Hitman928 said:
In this case you're talking about 2 different pieces of silicon that I'm pretty sure AMD bins for 1 fast and 1 slow. Considering the difference is <5% even in this case, that's pretty close. The difference will be even tighter on the same silicon and much smaller than the jump down to a C-core speed.

Even on the same chiplet there are wide variations.

I wasn’t intending to compare it to Zen4C, I was merely stating that there is absolutely a significant difference between P-Cores.

Hitman928 · Jan 10, 2024

eek2121 said:
Even on the same chiplet there are wide variations.

I wasn’t intending to compare it to Zen4C, I was merely stating that there is absolutely a significant difference between P-Cores.

On the same CCD I would expect the cores to have a spread of ~2% at most, though probably less than 1% (assuming adequate cooling). When dealing with a 7950x, that means around 100 - 125 MHz max boost difference but probably closer to 50 MHz.

moinmoin · Jan 10, 2024

Hitman928 said:
On the same CCD I would expect the cores to have a spread of ~2% at most, though probably less than 1% (assuming adequate cooling). When dealing with a 7950x, that means around 100 - 125 MHz max boost difference but probably closer to 50 MHz.

What makes you think so? That'd essentially be margin of error level. If that were the case the whole effort of designating preferred cores etc. going on since the first Zen gen in CPPC and Ryzen Master would be a sure waste.

Hitman928 · Jan 10, 2024

moinmoin said:
What makes you think so? That'd essentially be margin of error level. If that were the case the whole effort of designating preferred cores etc. going on since the first Zen gen in CPPC and Ryzen Master would be a sure waste.

Because TSMC has better wafer skew characteristics than GF, especially GF's early FinFET wafers and AMD's physical design team has gotten really good. Preferred core is still relevant as you'll still have 1 or 2 cores that can reach slightly higher frequencies or hit the same frequencies but with slightly less voltage, but preferred core is much more significant in multi-CCD products where one CCD is going to be from a completely different, and most likely weaker, wafer. So, in multi-CCD products, having a preferred core system makes sure you are keeping the single/lightly threaded loads on the stronger CCD.

naukkis · Jan 10, 2024

adroc_thurston said:
?
It's not that expensive.

The ring isn't provisioned for scaling seamlessly like that.

What do you mean? We have die shot from Phoenix2 which clearly shows that there's 8-slice L3 ring probably directly from bigger Phoenix. Ring still works fine even if some of ring client slots are unused. Amd chose that copy-paste approach instead of designing more area-friendly 6-slice L3 ring for Phoenix2 - that if something should show how much they prefer copy-pasting previously designed parts instead of totally new designs for one product.

adroc_thurston · Jan 10, 2024

naukkis said:
What do you mean?

The. Ring. Is. Not. Provisioned. For. <8 stop. Operation.
Got it?

naukkis said:
there's 8-slice L3 ring probably directly from bigger Phoenix. Ring still works fine even if some of ring client slots are unused.

They are used, ring only connects L3 slices, not cores.

naukkis · Jan 10, 2024

adroc_thurston said:
The. Ring. Is. Not. Provisioned. For. <8 stop. Operation.
Got it?

They are used, ring only connects L3 slices, not cores.

Ring has to have a ring stop for each L3-slice. That ring stop serve both core and it's L3 slice. Phoenix2 is pretty unusual design which wastes a bit space as each L3 slice still has shadow tags and other stuff needed for 8 cores. AMD decided not to redesign L3-ring for mass produced 6-core part to save silicon space.

adroc_thurston · Jan 10, 2024

Saylick said:
Edit: Adding some images to explain.

8 cores:

6 cores:

4 cores:

I've no idea if that is what AMD is doing but IBM surely did that for their racetrack ring in Power10 and (I think) Telum.

Glo. · Jan 11, 2024

https://twitter.com/x/status/1745426266730487982

ASRock > News

asrock.com

I know that this is FP6, but you can imagine how boards with for example Strix Point and Strix Point Halo soldered in would look like.

naukkis · Jan 11, 2024

Saylick said:
Mmm, it’s starting to make sense to me about that ladder cache then. Basically introduce additional horizontal rungs so that portions of the ladder can be lopped off based on how many cores you want in the CCX.

Edit: Adding some images to explain.

8 cores:
View attachment 91691

6 cores:
View attachment 91692

4 cores:
View attachment 91693

L3 on todays designs is sliced. What it means that every L3 slice is hard-wired to handle it's own part of address space - with AMD ring designs every slice handle 1/8 of address space. That means that cache has always have all it's slices active no matter how many cross-interconnects they utilize - cache it's only in working condition with every slice. They can, of course, change that design but hardware sliced L3 has obvious performance and performance/watt advantage so doing some other kind of design is a quite a big step.

AMD's Zen1 design have 4-way sliced L3 where every slice is connected to every other slice directly. But they cannot disable slices as that address space is hardwired. So 2-core Zen1-designs have 4-slice L3 too.

DrMrLordX · Jan 11, 2024

Glo. said:
https://twitter.com/x/status/1745426266730487982

ASRock > News

asrock.com

I know that this is FP6, but you can imagine how boards with for example Strix Point and Strix Point Halo soldered in would look like.

Looks aimed at the OEM market. Which is interesting.

Joe NYC · Jan 11, 2024

Glo. said:
https://twitter.com/x/status/1745426266730487982

ASRock > News

asrock.com

I know that this is FP6, but you can imagine how boards with for example Strix Point and Strix Point Halo soldered in would look like.

Hopefully, someone will release a similar mobo with LPCAMM2 memory in the future.

Doug S · Jan 12, 2024

DrMrLordX said:
Looks aimed at the OEM market. Which is interesting.

Doesn't it have to be, given that it seems to be using soldered on CPUs (at least that's what I infer from it using mobile CPUs) That would be a royal pain to stock with more than one or two options, and you have a lot more inventory value sitting on shelves when there is a CPU attached.

But yeah I agree LPCAMM2 would make that better. Using DDR4 is terrible for a 2024 product, but I guess it is using outdated CPUs too. It is too soon for LPCAMM2, you're probably paying through the nose to get it until late this year (and then only if we actually start to see products using LPCAMM2 shipping in the hundreds of thousands or higher by the end of the year)

DrMrLordX · Jan 12, 2024

Doug S said:
but I guess it is using outdated CPUs too.

It is. It's going to use "7nm AMD processors" according to the blurb, which probably means Zen3-gen mobile SoCs. Those SoCs would only be compatible with DDR4.

Glo. · Jan 12, 2024

Joe NYC said:
Hopefully, someone will release a similar mobo with LPCAMM2 memory in the future.

LPCAMM2 are LPDDR5/X memory chips. So the only options are 7000 series and later.

I expect that this form factor will become prevalent, because use case for it will be much more widespread. Not only for AMD CPUs, but also Intel, and potentially - ARM chips.

Joe NYC · Jan 12, 2024

Glo. said:
LPCAMM2 are LPDDR5/X memory chips. So the only options are 7000 series and later.

I expect that this form factor will become prevalent, because use case for it will be much more widespread. Not only for AMD CPUs, but also Intel, and potentially - ARM chips.

Yes, I understand. Which is why I mentioned "in the future".

One thing I am wondering if the memory controllers of desktop parts, such as those using AM5 socket can deal with LPDDR5x and LMCAMM2 memory.

adroc_thurston · Jan 13, 2024

Joe NYC said:
such as those using AM5 socket can deal with LPDDR5x and LMCAMM2 memory.

You can't.

Fjodor2001 · Jan 13, 2024

So no new info about Zen5 from AMD at CES 2024.

Do you still think it’ll be released and available in stores in April 2024, like some on this forum have said previously?

Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Golden Member

Elite Member

Member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Golden Member

Lifer

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Diamond Member