Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


Tigerick

Senior member
Apr 1, 2022
846
799
106
AFAIK Strix Point (4+8, 16 CU, 4nm monolithic) is STX1; it was originally 8+4 and 3nm but got redefined due to TSMC issues. Strix Halo is (at least internally) called SAR (Sarlak), and STX3 has been cancelled entirely.
If STX3 is indeed being canceled, that would be pretty bad, because Intel is coming on strong with Lunar Lake targeting ultraportables. So far, Lunar Lake is pretty much confirmed to have a 4P+4E design with a Battlemage graphics engine made on TSMC N3, and those specs match STX3. So I would hope AMD resumes the design :confused:
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
AFAIK Strix Point (4+8, 16 CU, 4nm monolithic) is STX1; it was originally 8+4 and 3nm but got redefined due to TSMC issues. Strix Halo is (at least internally) called SAR (Sarlak), and STX3 has been cancelled entirely.
STX3 being canceled certainly seems like an odd choice. It leaves no direct successor to PHX2. It's extra weird because the equivalent in Intel's lineup (2+8 ADL, for example) seems to be their best-selling chip.
 

Tigerick

Senior member
Apr 1, 2022
846
799
106
STX3 being canceled certainly seems like an odd choice. It leaves no direct successor to PHX2. It's extra weird because the equivalent in Intel's lineup (2+8 ADL, for example) seems to be their best-selling chip.
Hmm, come to think of it, I have a wild speculation about STX3... We all know about Intel's Lunar Lake design, and so would AMD, so it is possible that AMD has redesigned STX3 by porting it to N3E to fit in more CPU and GPU cores with double the memory bandwidth.

All this while, I have been thinking LNL might employ dual-channel memory support, especially since I heard Battlemage might support a dual-issue pipeline, which would bring the total ALU count to 1024. That would explain how Intel could position LNL across the whole U series, top to bottom (from i7 to i5). With an STX3 refresh, AMD could scale up to Ryzen 7 to have a better chance. What do you think?

Or AMD could scale STX down to 4P+4E with a lower V/F curve (just like the Z1 series) in order to target a 12W TDP?
 
Last edited:

yuri69

Senior member
Jul 16, 2013
677
1,215
136
TBH, investing in many mobile designs is weird given the OEM adoption track record.

AMD keeps reiterating how reuse and minimizing tape-out count have been critical, but now goes all-in on an OEM-dominated sector?
 
  • Like
Reactions: Tlh97

Joe NYC

Diamond Member
Jun 26, 2021
3,639
5,177
136
Doesn't make a lot of sense at this "early" point in time, not to mention the voltage sensitivity of having L3 as a V-cache die on top.
The recent burned-out CPUs are proof of the current unsafe design/control.

I don't think L2 has grown significantly as suggested, for the simple reason that the die size has to be kept as close as possible to the same as before, with node shrinks mostly used to add other stuff (the 5-wide expansion) or to squeeze in the c/d variants. Not to mention SRAM doesn't shrink well.

Removing L3 from the main CCD die and increasing the size of L2 would not grow the die. It may even shrink it.

Sharing the L2 by making the other cores' L2s into victim caches would improve utilization of the L2s and simulate some of the benefits of L3.

The interesting bit of the rumor is that the L2 is somehow unified. I can only guess it's somewhat similar to IBM's virtual cache on Telum (z16), or moving toward that kind of solution. I'm guessing it's more like keeping core-bound data in closer L2 cells while putting larger data in more distant, area-sharing cells. It's still conjecture at this point. Hence, depending on how you view it, the L2 has grown even if it hasn't.

Yes, and a faster mesh-like interconnect would make accessing the other cores' L2s faster. The improved core-to-core latency would have one very specific benefit: moving the contents of L2s back and forth.
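
To make the victim-cache idea concrete, here's a toy model of it (purely my own sketch, not AMD's actual mechanism): each core's L2 spills evictions into a peer's L2 and snoops its peers on a miss, so the pool of L2s behaves a little like a shared L3.

```python
from collections import OrderedDict

class L2:
    """Tiny LRU model of a private L2 that can spill evictions to peer L2s."""
    def __init__(self, capacity):
        self.lines = OrderedDict()   # addr -> None, ordered LRU -> MRU
        self.capacity = capacity
        self.peers = []              # other cores' L2s, wired up after creation

    def lookup(self, addr):
        if addr in self.lines:                     # local hit
            self.lines.move_to_end(addr)
            return "local L2 hit"
        for peer in self.peers:                    # check peers: the "virtual L3"
            if addr in peer.lines:
                del peer.lines[addr]               # migrate line back to requester
                self._insert(addr)
                return "peer L2 hit (virtual L3)"
        self._insert(addr)
        return "miss -> memory"

    def _insert(self, addr):
        if len(self.lines) >= self.capacity:       # evict LRU line...
            victim, _ = self.lines.popitem(last=False)
            if self.peers:                         # ...into a peer instead of dropping it
                self.peers[0].lines[victim] = None # (no peer capacity check: toy model)
        self.lines[addr] = None

a, b = L2(capacity=2), L2(capacity=2)
a.peers, b.peers = [b], [a]
for addr in (1, 2, 3, 1):
    print(a.lookup(addr))   # the last access finds line 1 in B's L2, not DRAM
```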
 
  • Like
Reactions: Tlh97

Joe NYC

Diamond Member
Jun 26, 2021
3,639
5,177
136
TBH, investing in many mobile designs is weird given the OEM adoption track record.

AMD keeps reiterating how reuse and minimizing tape-out count have been critical, but now goes all-in on an OEM-dominated sector?

The notebook segment is by far the largest in the PC sphere. AMD is positioning itself to have the best solution for many use cases in this market, as a way to grow its market share.
 
  • Like
Reactions: Tlh97

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
does intel have plans for a discrete gpu in mobile platforms in addition to their future igpus?
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
If STX3 is indeed being canceled, that would be pretty bad, because Intel is coming on strong with Lunar Lake targeting ultraportables. So far, Lunar Lake is pretty much confirmed to have a 4P+4E design with a Battlemage graphics engine made on TSMC N3, and those specs match STX3. So I would hope AMD resumes the design :confused:
We hear about cancellations, if true, well after AMD has made them. That would mean they altered their plans and are possibly bringing another design to market. Seriously, they may have cancelled it two years ago, and we could just be getting hints of that now. All the semi-ODMs have learned to keep a real tight lid on the details of their designs compared to 20 or even 10 years ago.
 

Anhiel

Member
May 12, 2022
81
34
61
Removing L3 from the main CCD die and increasing the size of L2 would not grow the die. It may even shrink it.

Sharing the L2 by making the other cores' L2s into victim caches would improve utilization of the L2s and simulate some of the benefits of L3.
The advantage is beyond question, but that's not the point I'm getting at.
Obviously the cost and complexity of managing production & packaging of another chiplet, as well as the disadvantage for low-end products of having to forgo the L3 cache chiplet, might be more important in practice.

By "early" I'm referring to backside power rail that will come with TSMC's N2 family as outlined in a recent article coincidentally posted the same day (after our posts?) https://www.anandtech.com/show/1883...e-power-delivery-in-2026-n2x-added-to-roadmap
 

Anhiel

Member
May 12, 2022
81
34
61
Yeah, 120W does seem high; even the 16-core Zen4 7945HX only requires 75W. Besides high clocks, the reasons I can think of are the additional 128-bit memory bus (that would require 4 x 32-bit) and the huge amount of cache (FYI, the M2 Pro has a total L2+L3 cache of 60MB; STX Halo has 64MB of total cache excluding the L2 of each CPU core, and I would assume at least 16MB of L2, so Halo's total cache would be at least 80MB). That's why I don't believe the 96MB rumors unless Zen5/5c has 2MB of L2 cache each...
The 5000 X3D chiplets only amount to ~3-5W; the ones for the 7000 X3D shouldn't be too far off. So their specific TDP was never a major problem; rather, it was their voltage sensitivity and the added heat insulation of the CCD from having the cache die on top.
Everything should be solved with the N2 process node. It's too bad we won't see this before Zen6 or Zen6+.
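
As an aside, the cache arithmetic in the quote above is easy to sanity-check (a throwaway sketch; every figure comes from the rumors in the quote, none are confirmed):

```python
# Rumored STX Halo cache totals from the quote (MB); nothing here is confirmed.
last_level = 64                  # total cache excluding per-core L2
cores      = 16
print(last_level + cores * 1)    # 80 -> the "at least 80MB" with 1 MB L2/core
print(last_level + cores * 2)    # 96 -> the rumor only works with 2 MB L2/core
```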
 

soresu

Diamond Member
Dec 19, 2014
4,105
3,563
136
Removing L3 from the main CCD die and increasing the size of L2 would not grow the die. It may even shrink it.
Each 'slice' of Zen4's on-die L3$ is more than comparable in area to the cores themselves in this die shot:

zen4 CCD.jpg

If they can find a way to stack all the L3$ above or beneath the CCD without incurring significant power or latency overhead, then the potential density increase of CCDs for the same socket package could be at least 50-75% at iso process.
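
A quick back-of-the-envelope under that premise (all mm^2 figures eyeballed from die shots, nothing official):

```python
# Checking the claim with toy numbers, taking the die shot's premise that a core
# and its L3 slice are about the same size (all figures assumed, not measured):
core, l3_slice, other = 3.8, 3.8, 8.0    # per-core logic+L2, L3 slice, uncore/PHY
ccd   = 8 * (core + l3_slice) + other    # ~68.8 mm^2, roughly Zen4-CCD-sized
no_l3 = 8 * core + other                 # same CCD with all L3 stacked off-die
print(f"{ccd / no_l3 - 1:.0%} more cores per mm^2")   # ~79%, inside "at least 50-75%"
```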
 
  • Like
Reactions: Tlh97

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
^^ With 16 cores and 1 MB of L2$ per core, sans L3, the CCD area would increase by ~8%. A 2 MB L2$ would make it ~16% (keeping tags on die would increase that). Though, if AMD were to do that, wouldn't it be better to use an SLC (basically a memory-side cache) on the IOD?
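
A rough sanity check on those percentages, using the widely reported N5 SRAM bitcell size (the ~2x array overhead and the CCD size are my assumptions):

```python
# Ballparking the L2 area cost. Assumptions: TSMC N5 HD SRAM bitcell ~0.021 um^2
# (as widely reported), ~2x overhead for tags/periphery, and a ~66 mm^2 Zen4 CCD.
def sram_mm2(megabytes, bitcell_um2=0.021, overhead=2.0):
    bits = megabytes * 2**20 * 8
    return bits * bitcell_um2 * overhead / 1e6   # um^2 -> mm^2

ccd_mm2 = 66.0
for mb in (16, 32):   # 16 cores x 1 MB or x 2 MB of private L2
    print(f"{mb} MB of L2 ~ {sram_mm2(mb):.1f} mm^2 = {sram_mm2(mb)/ccd_mm2:.1%} of a CCD")
# -> roughly 8.5% and 17%, in line with the ~8% / ~16% above
```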
 
  • Like
Reactions: Tlh97

soresu

Diamond Member
Dec 19, 2014
4,105
3,563
136
surely there's another hapless engineer from amd @ linkedin who's listed everything in his or her profile going back to their childhood limeade stand.
Don't believe it for a second.

It came so long after Zen6 would have started its initial design, and Zen5 will be deep into heavy pre-fab prep at this point, possibly even very early engineering samples.

AMD just stealth-released the info, and it doesn't really compromise anything; it just creates future buzz.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,791
136
Don't believe it for a second.

It came so long after Zen6 would have started its initial design, and Zen5 will be deep into heavy pre-fab prep at this point, possibly even very early engineering samples.

AMD just stealth-released the info, and it doesn't really compromise anything; it just creates future buzz.
Zen 5 is in post-silicon bring-up.
Zen 6 probably would have long passed HLS, RTL tests, etc., and be deep into physical implementation and pre-silicon validation.
Zen 7 would already be in HLS.
Zen 8 would be in the architecture definition phase.

Leaks are not much use for pivoting in the short term but could help with long-term counters.
Looking at the lead times, it is obvious why having a good engineering & R&D budget helps: all the phases can run at full steam without waiting for engineers to be freed up from the immediately forthcoming product.
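
The overlap falls straight out of the lead times; a back-of-the-envelope (both inputs are my assumptions, not AMD figures):

```python
# How many generations are in flight at once, given design lead time and cadence.
lead_time_years = 4.5    # assumed: arch definition through post-silicon bring-up
cadence_years   = 1.25   # assumed: spacing between Zen generations
print(round(lead_time_years / cadence_years))   # ~4 concurrent designs (Zen 5..8 above)
```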
 

moustachio

Junior Member
May 19, 2019
1
2
81
2 takeaways for me:
- the new bus, which can scale to 16 unified cores in a single CCD
- a larger L2 without a latency penalty being possible in future cores

The two combined lead me to believe that AMD plans on dropping L3 entirely from future generations of processors. Unknown if it will be Zen 5 or Zen 6.
Assuming the information from AdoredTV regarding the L2 cache testing is true, it seems weird to me for AMD to actually test monolithic chips with different L2 sizes, considering the large design and mask costs involved just for R&D.

The way it is phrased, that the "latency penalty is negligible when adding more L2 cache", sounds like the same type of analysis that would have been done for AMD's 3D-stacked L3 cache chiplet. From all this, it seems to me (as in, I am guessing) that AMD is instead testing the stacking of one or multiple "L2+L3" cache chiplets on a CCD, hence the necessity of seeing whether latencies change when doing so. Does this seem feasible based on what we know of TSMC's process and AMD's current CCD design, esp. with TSVs?
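
One hedged data point on feasibility: the existing V-cache numbers. The 5800X3D's stacked L3 reportedly costs only around 4 extra cycles over the base die's ~46-cycle L3, so the TSV hop itself is cheap; the open question is whether L2 could tolerate the same penalty. A crude framing:

```python
# Crude framing of the stacking cost, using reported (not official) 5800X3D numbers.
base_l3_cycles, tsv_penalty = 46, 4
print(f"stacked L3 penalty: {tsv_penalty / base_l3_cycles:.0%}")   # ~9% on a 46-cycle path
# The same ~4 cycles on a ~14-cycle L2 would be far more painful - presumably why
# AMD would need to test whether the L2 latency penalty really is "negligible".
print(f"same penalty on L2:  {tsv_penalty / 14:.0%}")              # ~29%
```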
 
  • Like
Reactions: Joe NYC and Vattila

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
Don't believe it for a second.

It came so long after Zen6 would have started its initial design, and Zen5 will be deep into heavy pre-fab prep at this point, possibly even very early engineering samples.

AMD just stealth-released the info, and it doesn't really compromise anything; it just creates future buzz.
you overestimate the common sense of engineers, even veterans who don't know how a platform works. any person in the know can figure out what to get from the info posted. the linkedin profile was nothing interesting. we've seen what lazy self-awareness and security can do to any company. on the other hand, those who want to make a lil green will steal slowly over time. the list of people I've worked with over my career who've been arrested, tried, and sometimes jailed grows every 4-5 years. some slip through the cracks.
 
Last edited:

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
Removing L3 from the main CCD die and increasing the size of L2 would not grow the die. It may even shrink it.

Sharing the L2 by making the other cores' L2s into victim caches would improve utilization of the L2s and simulate some of the benefits of L3.

Notice that the L3 partition of a core isn't just cache - it's an uncore part of the CPU integrated into the core. It includes the interconnect between cores, the snoop filter, etc. If AMD wanted to cut L3 out of their cores and rely on L2 alone, it would make absolutely no sense unless they simultaneously increased the L2 to something like the size L3 is now. Which isn't happening without an L1 cache increase.
 

Panino Manino

Golden Member
Jan 28, 2017
1,143
1,383
136
Sorry, it's just that I was reading Anand's Phenom II review and there was this line: "Carrying that further, we may even see future CPUs with more cores add a fourth level of cache."

Anand predicted V-Cache!

But seriously now, It got me thinking.
Zen 4 increased L2; it doubled. There are even rumors, I think, about increasing L2 even more, and I understand the logic. If it's to get rid of the on-die L3 it makes sense, but otherwise, no?
By Anand's logic, the more cores you have, the less a big L2 makes sense. With the constant talk of increasing core counts, each core will be working on its own unique task without needing to share much, so a large L2 is unnecessary; instead, a smaller and faster L2 would be preferable. Seems to make sense to me? That increasing L2 at the same time they cram more cores onto the die would be a mistake? Or is that the trick about getting rid of L3 - to keep the die size smaller and be able to use an even larger L3 V-Cache?
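
A toy average-memory-access-time comparison of that small-fast vs big-slow L2 trade-off (every number below is invented; only the shape of the trade-off matters):

```python
# AMAT = L2 latency + L2 miss rate x next-level latency (cycles, made-up values).
def amat(l2_lat, l2_hit, next_lat):
    return l2_lat + (1 - l2_hit) * next_lat

print(amat(l2_lat=12, l2_hit=0.55, next_lat=50))   # small, fast L2 -> 34.5
print(amat(l2_lat=14, l2_hit=0.62, next_lat=50))   # big, slower L2 -> 33.0
# The bigger L2 only wins if its extra hits outrun its extra latency, which depends
# on exactly the sharing behaviour discussed above.
```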
 
  • Like
Reactions: Joe NYC

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
More like he predicted either an SLC (see Apple) or an L4 cache.
yeah, cause the v-cache is a bit of l3 slapped onto the core chiplet, connected by tsvs. what I don't get is that intel's prior explorations of l4$ were on systems where it was for the igpu and done with edram. the mtl leaks from a week ago said intel's new approach blocks the gpu from using the l4$, but from my own historical knowledge of l4$ on x86, that $ level is largely useless. has intel made a breakthrough? because all I've seen regarding this new approach is criticism.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,508
3,190
136
AMD likely still wants to be able to sell SKUs without V-cache for the next few generations. If they expand the L2 to 2MB, go to a larger CCX (perhaps 12 cores?), and also use a similar technique to IBM's, with L2 cache sharing as a virtual L3, they could achieve a similar effect to having a larger L3, since L2 has lower latency. That would keep average memory latency in the same ballpark as current non-V-cache chips, while allowing the optional V-cache to act as an effective L4 cache with latency no worse than today's, save for the snoop cycle on the virtual L3.

While using the L2s as a shared virtual L3 cache would typically require a lot of extra logic, the fact that smaller nodes don't shrink SRAM much but achieve good gains on other circuit types makes that extra logic less painful.
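
A sketch of that latency argument in numbers (all latencies and hit rates invented for illustration; hit rates are conditional on reaching each level):

```python
# AMAT with a real L3 vs a "virtual L3" built from peer L2s plus a snoop cycle.
def amat(l2, l2_hit, l3, l3_hit, mem):
    return l2 + (1 - l2_hit) * (l3 + (1 - l3_hit) * mem)

today   = amat(l2=14, l2_hit=0.55, l3=50, l3_hit=0.50, mem=120)   # on-die L3
virtual = amat(l2=14, l2_hit=0.65, l3=35, l3_hit=0.30, mem=120)   # bigger L2 + peer L2s
print(today, virtual)   # 63.5 vs ~55.7 -> same ballpark or better, as argued above
```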
 
  • Like
Reactions: Joe NYC and Tlh97

BorisTheBlade82

Senior member
May 1, 2020
707
1,130
136
Might someone give a short explainer as to what that "ladder" cache is supposed to be? I mean, the current L3 is already shared over a bidirectional ring.
So might they be adding further links in order to decrease latency and increase bandwidth?
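
For intuition on why extra links would help: on a plain bidirectional ring, the average hop count grows roughly linearly with the number of stops (a generic ring model, not AMD's actual topology):

```python
# Average shortest-path hop count between two distinct stops on an n-stop
# bidirectional ring; extra "ladder" cross-links would cut these numbers down.
def avg_hops(n):
    dists = [min(d, n - d) for d in range(1, n)]
    return sum(dists) / len(dists)

print(avg_hops(8), avg_hops(16))   # ~2.3 vs ~4.3 hops: rings hurt as core counts grow
```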