Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


Joe NYC

Diamond Member
Jun 26, 2021
3,650
5,189
136
MALL doesn't need SoIC-X (MI300 MALL is run-thru a bunch of 2.5D interfaces really).

Yeah, seems to be working fine with RDNA 3.

A large one, it's a memory side cache.

Good point

Good lord someone missed his comparch course in college.

I would be curious what the latency of a MALL cache hit would be vs. an L3 cache hit.
(Zen 4's L3 cache latency is 8-9 ns, according to Chips and Cheese.)

Client yes, battery life is nice.
It seems that Strix Halo could, in theory, be using MALL. Not sure about other Strix parts...

Server ehhhh.

Seems like it would be tricky to implement that in an IOD that resembles the current Genoa IOD.

But in theory, since the IOD is about 500 mm2, and could grow in Turin and Venice, it could be possible to stack SRAM on top of the IOD. TSMC said up to 12 layers of SRAM could be stacked, so say 1 layer per memory channel... 500 mm2 could fit 512 MB of SRAM...

But this would be a massive undertaking...
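
As a rough sanity check on the 512 MB figure, here is a back-of-the-envelope sketch. Every input is an assumption: the density is taken as roughly 64 MB in about 40 mm2 (in the ballpark of a V-Cache-sized die), and the coverage fraction and layer count are hypothetical knobs, not anything TSMC or AMD has stated for an IOD.

```python
def stacked_sram_mb(footprint_mm2, layers, mb_per_mm2, coverage):
    """Estimate stacked SRAM capacity in MB. All inputs are assumptions."""
    return footprint_mm2 * coverage * layers * mb_per_mm2


def coverage_needed(target_mb, footprint_mm2, layers, mb_per_mm2):
    """Fraction of the footprint that must be covered to reach target_mb."""
    return target_mb / (footprint_mm2 * layers * mb_per_mm2)


# Assumed density: ~64 MB in ~40 mm2 (hypothetical round numbers).
DENSITY_MB_PER_MM2 = 64 / 40

full_single_layer = stacked_sram_mb(500, 1, DENSITY_MB_PER_MM2, 1.0)
needed = coverage_needed(512, 500, 1, DENSITY_MB_PER_MM2)

print(f"1 layer over all of 500 mm2: {full_single_layer:.0f} MB")
print(f"coverage needed for 512 MB in 1 layer: {needed:.0%}")
```

Under those assumed numbers, a single layer covering about two thirds of a 500 mm2 die would already reach 512 MB, so the estimate is at least not unreasonable on area alone. It says nothing about power delivery, TSV routing, or cost.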
 

Joe NYC

Diamond Member
Jun 26, 2021
3,650
5,189
136
They sold more 7950x3ds than 13900k per your source, 7900x3d matches the Intel chip in sales but both are still ahead of the vanilla 7950x. For halo chips they seem to have a very solid place in the lineup, that's a lot more than just existing for a marketing goal. And your logic makes the same argument for Intel which I would also completely disagree with, since flagships aren't supposed to be huge sellers in a random month well after launches.

75% of the content of the review videos was gibberish about which CCD the game would run on. Some "expert" videos were even dedicated to this.

What got lost in this gibberish is that AMD has the best gaming CPU, and it deprived AMD of the halo effect.

If AMD released 7800x3d first, the message would be simple: "7800x3d is the best gaming CPU. Period."

And increased CPU sales would follow.

The interesting part about 7800x3d is that AMD probably makes as much or more money on it as on the higher core count, 2 CCD CPUs, but because the price is mid-range, the potential sales and profits from this CPU are high, and from the subsequent halo effect, sales of other CPUs would improve too. It did not happen, because AMD marketing is just not good.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,073
3,897
136
MALL doesn't need SoIC-X (MI300 MALL is run-thru a bunch of 2.5D interfaces really).

A large one, it's a memory side cache.

Good lord someone missed his comparch course in college.

Client yes, battery life is nice.
Server ehhhh.
My assumption is a traditional IOD + CCDs on a standard organic substrate.
Also, while latency will be higher than L3 V-Cache, it will still be a lot faster than main memory.
The question then would be around interconnect bandwidth and pJ/bit.
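
To put the bandwidth and pJ/bit question in numbers, link power is just bits per second times energy per bit. A minimal sketch follows; the bandwidth and pJ/bit values are hypothetical placeholders, not measured figures for any AMD interconnect.

```python
def link_power_watts(bandwidth_gb_s, pj_per_bit):
    """Power drawn by a link moving bandwidth_gb_s (GB/s) at pj_per_bit (pJ/bit)."""
    bits_per_second = bandwidth_gb_s * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12


# Hypothetical operating points: same traffic, two different energy costs.
for pj in (2.0, 0.5):
    watts = link_power_watts(100, pj)
    print(f"100 GB/s at {pj} pJ/bit: {watts:.2f} W")
```

The relationship is linear, so feeding a memory-side cache with several times the traffic of today's links costs several times the power unless the energy per bit drops with it.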
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Also, while latency will be higher than L3 V-Cache, it will still be a lot faster than main memory.
The question then would be around interconnect bandwidth and pJ/bit.

I think it's more complicated than that. AMD has excellent performance and energy efficiency due to their excellent L3. Stripping the L3 out of the chiplet would result in huge hits in perf and efficiency.
They would need to increase L2 and maybe even keep some on-chiplet L3 to still keep quite a few requests local. Zen5 is rumoured to have 2MB or even 3MB of L2, so that is kinda confirming preparations for MALL cache. It might not come in Z5, maybe in Z6. According to rumors there is some sort of "ladder" L3, so maybe AMD is just going to keep L3 and throw 3D SRAM on the IOD.
 
  • Like
Reactions: Tlh97 and Joe NYC

itsmydamnation

Diamond Member
Feb 6, 2011
3,073
3,897
136
I think it's more complicated than that. AMD has excellent performance and energy efficiency due to their excellent L3. Stripping the L3 out of the chiplet would result in huge hits in perf and efficiency.
They would need to increase L2 and maybe even keep some on-chiplet L3 to still keep quite a few requests local. Zen5 is rumoured to have 2MB or even 3MB of L2, so that is kinda confirming preparations for MALL cache. It might not come in Z5, maybe in Z6. According to rumors there is some sort of "ladder" L3, so maybe AMD is just going to keep L3 and throw 3D SRAM on the IOD.
I'm not talking about removal of L3, only moving the V-Cache to the memory controller side and increasing the fabric to handle the extra throughput, so you could have a max clock symmetric 16-core design that still gets a lot of the V-Cache benefits.

Crystalwell on the IOD, if you will.
 
  • Like
Reactions: Tlh97 and Joe NYC

Saylick

Diamond Member
Sep 10, 2012
4,036
9,456
136
That would allow the MALL cache to be shared between multiple CCDs.

Between 2 CCDs each having its own 64MB of L3 as one alternative and a MALL with 128MB of shared cache as the other, the MALL would come out ahead in most scenarios as far as achieving cache hits, but there would be a small latency hit vs. a cache hit in the CCD's own or stacked L3.
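
A toy illustration of the hit-rate side of that argument, assuming a plain LRU cache and a deliberately lopsided workload (one CCD loops over more data than its private cache holds, the other barely uses its share). The sizes are abstract "lines" rather than MB, the trace is contrived to favor sharing, and latency is not modeled at all.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count hits for an access trace on a fully associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits


# Hypothetical workload: CCD0 cycles through 96 lines, CCD1 through 16.
ccd0 = list(range(96)) * 10
ccd1 = [1000 + i for i in range(16)] * 60

# Two private caches of 64 lines each vs. one shared cache of 128 lines.
private_hits = lru_hits(ccd0, 64) + lru_hits(ccd1, 64)
interleaved = [addr for pair in zip(ccd0, ccd1) for addr in pair]
shared_hits = lru_hits(interleaved, 128)

print("private 2 x 64:", private_hits, "hits")
print("shared 1 x 128:", shared_hits, "hits")
```

With balanced working sets that each fit in 64 lines the two setups would tie on hits, which is why the claim is "most scenarios" rather than all of them.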

If CCD is connected to IOD+MALL using Hybrid Bond bridges, as could happen with Venice, the extra latency would be reduced.

Looking at AMD deploying MALL in client GPUs and in the datacenter GPU / APU with MI300, I think MALL will be the answer for CPUs too, both client and server.

As far as when we could see this, definitely not in Zen 5 client, highly unlikely in Zen 5 Turin server (even though there is a new IOD coming).

But I think highly likely with Zen 6, like 90+ % likely.
I think you're onto something here. Rumors do seem to align with this take, where Zen 6 goes all in on silicon bridges to replace the organic substrate Infinity Fabric connection. It's been hinted here that Zen 6 is a big system overhaul (analogous to "Penryn to Nehalem" level of change, implying a cache restructuring) and Kepler concurs w/ the use of silicon bridges. Seeing as how SRAM doesn't scale with advanced nodes, it makes more and more sense to shift the big L3 from the CCD to the IOD, and if the penalty of having the L3 off the CCD is reduced significantly via silicon bridges, it then becomes viable. If you can stack V-cache, which is also on an older node, onto that IOD it nets you a very scalable and cost optimized product.

 
  • Like
Reactions: Tlh97 and Joe NYC

Joe NYC

Diamond Member
Jun 26, 2021
3,650
5,189
136
I think it's more complicated than that. AMD has excellent performance and energy efficiency due to their excellent L3. Stripping the L3 out of the chiplet would result in huge hits in perf and efficiency.
They would need to increase L2 and maybe even keep some on-chiplet L3 to still keep quite a few requests local. Zen5 is rumoured to have 2MB or even 3MB of L2, so that is kinda confirming preparations for MALL cache. It might not come in Z5, maybe in Z6. According to rumors there is some sort of "ladder" L3, so maybe AMD is just going to keep L3 and throw 3D SRAM on the IOD.

That is what I was thinking: substantially increased L2, getting rid of L3, introducing shared MALL and shifting SRAM from L3 to it.

- This would shift latencies, where a lot of former L2 misses and L3 hits would become L2 hits with lower latency. Advantage
- The remainder of former L3 hits would turn into MALL hits, with higher latency. Disadvantage
- Some L3 misses would become MALL hits (since MALL would be shared, better utilized). Advantage
- The rest of the cache misses, which go to memory: the L3 lookup time is eliminated, but a MALL lookup is introduced, so it depends on how these two compare. The outcome is unclear.
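
The four cases above can be folded into a simple average-latency model. This is only a sketch: every latency and hit rate below is a made-up placeholder (only the ~9 ns L3 figure echoes the Chips and Cheese number mentioned earlier in the thread), and real hierarchies overlap lookups in ways this ignores.

```python
def average_latency_ns(levels):
    """Expected latency for a list of (latency_ns_if_hit_here, local_hit_rate).

    Levels are checked in order; the last entry should be memory with a
    hit rate of 1.0 so every access is eventually served.
    """
    reach = 1.0   # fraction of accesses that get as far as this level
    total = 0.0
    for latency_ns, hit_rate in levels:
        total += reach * hit_rate * latency_ns
        reach *= 1.0 - hit_rate
    return total


# Hypothetical numbers: (total latency if served here, local hit rate).
current = [(3, 0.60), (9, 0.70), (80, 1.0)]         # L2, L3, DRAM
mall_fast = [(4, 0.75), (25, 0.80), (90, 1.0)]      # big L2, MALL, DRAM
mall_slow = [(4, 0.75), (35, 0.70), (90, 1.0)]      # same, worse MALL

for name, levels in (("current", current),
                     ("MALL (optimistic)", mall_fast),
                     ("MALL (pessimistic)", mall_slow)):
    print(f"{name}: {average_latency_ns(levels):.2f} ns")
```

With these placeholders the optimistic MALL case comes out slightly ahead of the current hierarchy and the pessimistic one clearly behind, which is the point: the answer hinges on MALL latency and hit rate, neither of which is known.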

As far as changing L2 sizes, BTW, that is not happening in Zen 5. These are being sampled already, and someone would have noticed. So maybe Zen 6...

But in summary, it looks quite challenging to overcome the low latency of L3 that Zen has, and additionally the size it can have with the addition of V-Cache.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,108
9,865
106
That is what I was thinking: substantially increased L2, getting rid of L3, introducing shared MALL and shifting SRAM from L3 to it.
Not happening.
So maybe Zen 6...
No.
it looks quite challenging to overcome the low latency of L3 that Zen has
Yea it's the best L3 in the industry and there's no reason to ever get rid of it.
MALL exists for very different reasons (just like every other SLC out there it's a bandwidth ramp and traffic catcher for things not CPU).
 
  • Like
Reactions: Tlh97 and Joe NYC

Joe NYC

Diamond Member
Jun 26, 2021
3,650
5,189
136
I think you're onto something here. Rumors do seem to align with this take, where Zen 6 goes all in on silicon bridges to replace the organic substrate Infinity Fabric connection. It's been hinted here that Zen 6 is a big system overhaul (analogous to "Penryn to Nehalem" level of change, implying a cache restructuring) and Kepler concurs w/ the use of silicon bridges. Seeing as how SRAM doesn't scale with advanced nodes, it makes more and more sense to shift the big L3 from the CCD to the IOD, and if the penalty of having the L3 off the CCD is reduced significantly via silicon bridges, it then becomes viable. If you can stack V-cache, which is also on an older node, onto that IOD it nets you a very scalable and cost optimized product.


So, what @Kepler_L2 and @adroc_thurston are saying is that in the upcoming generations, still in development, only the highest end and lower volume parts (Mi400, Navi5 Halo, Venice) will get the silicon bridges.

And lower priced parts - CPU, GPU client - will get FOWLP - like RDNA3. Client desktop - likely with Zen 6.

So this is probably about cost and capacity TSMC has for Hybrid Bond. When the capacity catches up, another product can adopt it.

As far as where the technology is going, these high end Halo products are pointing the way. From the cancelled RDNA4, we already know where GPUs are going, and from Venice, we will see where client desktop may go. So Venice will be the one to watch.

As far as laptop chips, we will see what Strix Halo brings. FOWLP, apparently, in this upcoming generation. We will see if monolithic will still be the mainstream laptop part for AMD in the Zen 6 generation, at which point Intel will be onto their 2nd generation chiplet mobile parts.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,108
9,865
106
only the highest end and lower volume parts (Mi400, Navi5 Halo, Venice) will get the silicon bridges.

And lower priced parts - CPU, GPU client - will get FOWLP - like RDNA3. Client desktop - likely with Zen 6.

So this is probably about cost and capacity TSMC has for Hybrid Bond. When the capacity catches up, another product can adopt it.
Ya obvious things are obvious.
and from Venice, we will see where client desktop may go
No, desktop will be a small, least relevant extension of mobile starting with Zen6.
Won't ever have any relation to server anymore.
Period.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
So the dies will only be shared between EPYC and Threadripper? Doesn't it cost AMD more to develop two different compute dies, or are they now financially able to do this? Or, and it's a big or, is it because AMD is seeing some limits years in advance with their current method and wants to explore a more vibrant option for client to kill Intel there too?
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,108
9,865
106
  • Like
Reactions: A///

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
Yes.

Yes.

$1.45B in quarterly R&D gotta count for something.

No they're just folding DT into mobile.
OK, when you say folding, what do you mean: those two will share similar dies with mobile getting a more unique efficient design because it's mobile? Does this mean Ryzen will get new cores? I assume TR or EPYC will get a core count increase too at that point.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,108
9,865
106
those two will share similar dies with mobile getting a more unique efficient design because it's mobile
Well Zen6 mobile is quite special, we'll talk about it at a later date.
does this mean ryzen will get new cores
everything is getting new cores, it's Zen6 after all.
I assume tr or epyc will get a core count increase too at that point.
TR is not a priority but EPYC yes, Venice is another core count bump.