Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


StefanR5R

Elite Member
Dec 10, 2016
6,670
10,551
136
Cinebench would be doing just as fine
I suspect Cinebench would be fine if there were only core-private caches.

I happen to think the second option is more likely because AMD believes the cases with optimal resource allocation will outweigh the other ones.
They surely ran lots of simulations, leaving the uncertainty mostly to the question of how well the investigated workloads overlap with customers' workloads.
 

cherullo

Member
May 19, 2019
55
126
106
In current AMD designs, the L3 is dynamically clocked to match the fastest core in the cluster. If all the C cores are in the same cluster, then the low-power cluster's L3 can be optimized in area/power for a lower maximum frequency. And during low-power/idle periods, the other larger, faster L3 can be turned off completely.
So I'd guess that this is the most power efficient setup for common laptop workloads, while still offering plenty of multi-threaded performance.
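As a rough sketch of the behavior described above (my own illustration, not anything from AMD documentation), the shared L3 tracking the fastest active core in its cluster could be modeled like this:

```python
# Toy model of a per-cluster L3 clock that follows the fastest active
# core, as described above. Illustrative only -- not AMD's actual DVFS.

def l3_clock_mhz(active_core_clocks_mhz: list[int]) -> int:
    """The L3 runs at the clock of the fastest active core in its
    cluster; with no active cores, the L3 can be power-gated (0 MHz)."""
    return max(active_core_clocks_mhz, default=0)

# A low-power C-core cluster never drags its L3 above the C cores' own
# max clock, so that L3 can be built for a lower frequency ceiling:
print(l3_clock_mhz([3300, 3500, 3200]))  # fastest C core sets the pace
print(l3_clock_mhz([]))                  # idle cluster: L3 fully gated
```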
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
That may sound weird, but the Zen 5 cluster interconnect is designed to be explicitly modular, scaling from 4 to 16 cores.

I know, but I also know that it ain't cheap to redesign anything in modern manufacturing processes. Look at how AMD reused the full 8-stop CPU ring for Phoenix2 instead of designing a 6-stop reduced-cache version for such a cost-oriented CPU - design costs seem to play a big role.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,074
9,822
106
I also know that it ain't cheap to redesign anything in modern manufacturing processes.
?
It's not that expensive.
Look at how AMD reused the full 8-stop CPU ring for Phoenix2 instead of designing a 6-stop reduced-cache version for such a cost-oriented CPU - design costs seem to play a big role.
The ring isn't provisioned for scaling seamlessly like that.
 

Saylick

Diamond Member
Sep 10, 2012
4,035
9,454
136
That may sound weird, but the Zen 5 cluster interconnect is designed to be explicitly modular, scaling from 4 to 16 cores.
Mmm, it’s starting to make sense to me about that ladder cache then. Basically introduce additional horizontal rungs so that portions of the ladder can be lopped off based on how many cores you want in the CCX.

Edit: Adding some images to explain.

8 cores:
[attached diagram: 1704905032611.png]

6 cores:
[attached diagram: 1704905040770.png]

4 cores:
[attached diagram: 1704905048417.png]
 

eek2121

Diamond Member
Aug 2, 2005
3,408
5,046
136
P-core boost differences are typically very small, especially if all of the cores are in close proximity on the same piece of silicon. That's just max boost, though, which doesn't come into play in this context because you're not hitting max boost clocks past 1 - 2 cores being loaded. The question then becomes: what is the 3 - 4 core boost frequency of the P-cores?

If the C-cores can't hit that same frequency, it makes no sense to have a 2p4c+2p4c split because you'd have a significant drop-off in performance past 2 cores being loaded. If the C-cores can hit that frequency but are at the end of their frequency range, and thus less efficient than the P-cores in that range, it also makes no sense to have a 2p4c+2p4c split, because you are using more power for no performance improvement.

Additionally, once you move to the 2nd CCX, you are bringing in 2 P-cores that will never boost above a 7-8 core loaded frequency, which the C-cores could easily achieve, so why make them P-cores at all? You are then using more space for no performance or efficiency gain. The proposed configuration makes no sense.

On AMD the differences are actually quite large: more than 300 MHz on my 7950X. Only 2 of my cores can hit 5.75 GHz. The worst of the bunch can technically hit 5.5 GHz, but rarely ever comes close.

At least 1 of my cores hits 6 GHz with a bit of prodding (read: overclocking magic, though I run stock)
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
On AMD the differences are actually quite large: more than 300 MHz on my 7950X. Only 2 of my cores can hit 5.75 GHz. The worst of the bunch can technically hit 5.5 GHz, but rarely ever comes close.

At least 1 of my cores hits 6 GHz with a bit of prodding (read: overclocking magic, though I run stock)

In this case you're talking about 2 different pieces of silicon that I'm pretty sure AMD bins for 1 fast and 1 slow. Considering the difference is <5% even in this case, that's pretty close. The difference will be even tighter on the same silicon and much smaller than the jump down to a C-core speed.
 

eek2121

Diamond Member
Aug 2, 2005
3,408
5,046
136
In this case you're talking about 2 different pieces of silicon that I'm pretty sure AMD bins for 1 fast and 1 slow. Considering the difference is <5% even in this case, that's pretty close. The difference will be even tighter on the same silicon and much smaller than the jump down to a C-core speed.

Even on the same chiplet there are wide variations.

I wasn’t intending to compare it to Zen4C, I was merely stating that there is absolutely a significant difference between P-Cores.
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
Even on the same chiplet there are wide variations.

I wasn’t intending to compare it to Zen4C, I was merely stating that there is absolutely a significant difference between P-Cores.

On the same CCD I would expect the cores to have a spread of ~2% at most, though probably less than 1% (assuming adequate cooling). When dealing with a 7950x, that means around 100 - 125 MHz max boost difference but probably closer to 50 MHz.
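For a sense of scale, here is the back-of-envelope arithmetic behind that estimate (the 5.7 GHz nominal max boost is my assumption for a 7950X-class part):

```python
# Back-of-envelope check: a 1-2% core-to-core spread at a ~5.7 GHz
# max boost corresponds to roughly 57-114 MHz of variation.
max_boost_mhz = 5700  # assumed 7950X-class single-core max boost

for spread in (0.01, 0.02):
    delta_mhz = max_boost_mhz * spread
    print(f"{spread:.0%} spread -> {delta_mhz:.0f} MHz")
```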
 

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
On the same CCD I would expect the cores to have a spread of ~2% at most, though probably less than 1% (assuming adequate cooling). When dealing with a 7950x, that means around 100 - 125 MHz max boost difference but probably closer to 50 MHz.
What makes you think so? That'd essentially be margin-of-error level. If that were the case, the whole effort of designating preferred cores etc. going on since the first Zen generation in CPPC and Ryzen Master would be a sure waste.
 

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
What makes you think so? That'd essentially be margin of error level. If that were the case the whole effort of designating preferred cores etc. going on since the first Zen gen in CPPC and Ryzen Master would be a sure waste.

Because TSMC has better wafer skew characteristics than GF, especially GF's early FinFET wafers, and AMD's physical design team has gotten really good. Preferred core is still relevant, as you'll still have 1 or 2 cores that can reach slightly higher frequencies, or hit the same frequencies with slightly less voltage, but preferred core is much more significant in multi-CCD products, where one CCD is going to be from a completely different, and most likely weaker, wafer. So, in multi-CCD products, having a preferred-core system makes sure you are keeping single/lightly threaded loads on the stronger CCD.
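The preferred-core idea above can be illustrated with a toy ranking (the scores and core layout are hypothetical, loosely modeled on CPPC-style per-core performance ratings):

```python
# Toy preferred-core selection: rank cores by a per-core quality score
# (hypothetical CPPC-style ratings) so lightly threaded work lands on
# the best cores -- which, on a dual-CCD part, sit on the stronger CCD.

def preferred_cores(scores: dict[int, int], n: int) -> list[int]:
    """Return the n core IDs with the highest quality scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical dual-CCD part: cores 0-7 (CCD0) binned stronger than
# cores 8-15 (CCD1), which came from a weaker wafer.
scores = {c: 230 - c for c in range(8)}
scores.update({c: 200 - c for c in range(8, 16)})

print(preferred_cores(scores, 2))  # both picks come from CCD0
```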
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
?
It's not that expensive.

The ring isn't provisioned for scaling seamlessly like that.

What do you mean? We have a die shot of Phoenix2 which clearly shows an 8-slice L3 ring, probably taken directly from the bigger Phoenix. The ring still works fine even if some of the ring client slots are unused. AMD chose that copy-paste approach instead of designing a more area-friendly 6-slice L3 ring for Phoenix2 - if anything, that should show how much they prefer copy-pasting previously designed parts over doing totally new designs for one product.
[attached die shot: 1704912145092.png]
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
The. Ring. Is. Not. Provisioned. For. <8 stop. Operation.
Got it?

They are used, ring only connects L3 slices, not cores.

The ring has to have a ring stop for each L3 slice, and that ring stop serves both a core and its L3 slice. Phoenix2 is a pretty unusual design which wastes a bit of space, as each L3 slice still has the shadow tags and other structures needed for 8 cores. AMD decided not to redesign the L3 ring for a mass-produced 6-core part, even though a redesign would have saved silicon space.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,074
9,822
106
Edit: Adding some images to explain.

8 cores:
[attached diagram: 1704905032611.png]

6 cores:
[attached diagram: 1704905040770.png]

4 cores:
[attached diagram: 1704905048417.png]
I've no idea if that is what AMD is doing, but IBM surely did that for their racetrack ring in Power10 and (I think) Telum.
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
Mmm, it’s starting to make sense to me about that ladder cache then. Basically introduce additional horizontal rungs so that portions of the ladder can be lopped off based on how many cores you want in the CCX.

Edit: Adding some images to explain.

8 cores:
[attachment 91691]

6 cores:
[attachment 91692]

4 cores:
[attachment 91693]

L3 in today's designs is sliced. What that means is that every L3 slice is hard-wired to handle its own part of the address space - with AMD's ring designs, every slice handles 1/8 of the address space. That means the cache always has to have all of its slices active, no matter how many cross-interconnects they utilize - the cache is only in working condition with every slice present. They can, of course, change that design, but a hardware-sliced L3 has obvious performance and performance/watt advantages, so moving to some other kind of design is quite a big step.

AMD's Zen1 design has a 4-way sliced L3 where every slice is connected directly to every other slice. But they cannot disable slices, as the address space is hardwired. So 2-core Zen1 designs have a 4-slice L3 too.
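A minimal sketch of what "hard-wired address slicing" means (using a toy modulo hash, not AMD's actual slice hash):

```python
# Toy model of a statically sliced L3: each 64-byte line is owned by
# exactly one slice, so the cache only covers the full address space
# when every slice is active. The modulo hash is illustrative only.

CACHE_LINE_BYTES = 64
NUM_SLICES = 8  # AMD ring designs: each slice owns 1/8 of the space

def slice_for_address(phys_addr: int) -> int:
    """Pick the owning slice from the line address (toy hash)."""
    return (phys_addr // CACHE_LINE_BYTES) % NUM_SLICES

# Consecutive cache lines interleave across all 8 slices -- disable one
# slice and 1/8 of all addresses would have nowhere to live:
owners = [slice_for_address(a)
          for a in range(0, 8 * CACHE_LINE_BYTES, CACHE_LINE_BYTES)]
print(owners)  # [0, 1, 2, 3, 4, 5, 6, 7]
```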
 

Doug S

Diamond Member
Feb 8, 2020
3,574
6,305
136
Looks aimed at the OEM market. Which is interesting.

Doesn't it have to be, given that it seems to be using soldered-on CPUs (at least that's what I infer from it using mobile CPUs)? That would be a royal pain to stock with more than one or two options, and you have a lot more inventory value sitting on shelves when there is a CPU attached.

But yeah, I agree LPCAMM2 would make that better. Using DDR4 is terrible for a 2024 product, but I guess it is using outdated CPUs too. It is too soon for LPCAMM2; you're probably paying through the nose to get it until late this year (and then only if we actually start to see products using LPCAMM2 shipping in the hundreds of thousands or higher by the end of the year).
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,991
136
Hopefully, someone will release a similar mobo with LPCAMM2 memory in the future.
LPCAMM2 modules use LPDDR5/X memory chips, so the only options are the 7000 series and later.

I expect that this form factor will become prevalent, because the use case for it will be much more widespread - not only for AMD CPUs, but also Intel, and potentially ARM chips.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,634
5,174
136
LPCAMM2 modules use LPDDR5/X memory chips, so the only options are the 7000 series and later.

I expect that this form factor will become prevalent, because the use case for it will be much more widespread - not only for AMD CPUs, but also Intel, and potentially ARM chips.

Yes, I understand. Which is why I mentioned "in the future".

One thing I am wondering is whether the memory controllers of desktop parts, such as those using the AM5 socket, can deal with LPDDR5X and LPCAMM2 memory.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,208
583
126
So no new info about Zen5 from AMD at CES 2024.

Do you still think it’ll be released and available in stores in April 2024, like some on this forum have said previously?