Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

A/// · Aug 22, 2023

Joe is a film editor iirc.

Joe NYC · Aug 22, 2023

adroc_thurston said:
MALL doesn't need SoIC-X (MI300 MALL is run-thru a bunch of 2.5D interfaces really).

Yeah, seems to be working fine with RDNA 3.

adroc_thurston said:
A large one, it's a memory side cache.

Good point

adroc_thurston said:
Good lord someone missed his comparch course in college.

I would be curious what the latency of a MALL cache hit would be vs. an L3 cache hit.
(Zen 4's L3 cache latency is 8-9 ns, according to Chips and Cheese.)

adroc_thurston said:
Client yes, battery life is nice.

It seems that Strix Halo could, in theory, be using MALL. Not sure about other Strix parts...

adroc_thurston said:
Server ehhhh.

Seems like it would be tricky to implement that in an IOD that resembles current Genoa IOD.

But in theory, since the IOD is about 500 mm2, and could grow in Turin and Venice, it could be possible to stack SRAM on top of IOD. TSMC said up to 12 layers of SRAM could be stacked, so say 1 layer per memory channel... 500 mm2 could fit 512 MB of SRAM...

But this would be a massive undertaking...

Joe NYC · Aug 22, 2023

A/// said:
Joe is a film editor iirc.

Not exactly. And I may have taken some sort of elementary Computer Architecture 101 class, but it was before they had caches.

Joe NYC · Aug 22, 2023

dr1337 said:
They sold more 7950x3ds than 13900k per your source, 7900x3d matches the Intel chip in sales but both are still ahead of the vanilla 7950x. For halo chips they seem to have a very solid place in the lineup, that's a lot more than just existing for a marketing goal. And your logic makes the same argument for Intel which I would also completely disagree with, since flagships aren't supposed to be huge sellers in a random month well after launches.

75% of the content of the review videos was gibberish about which CCD the game would run on. Some "expert" videos were even dedicated to it.

What got lost in this gibberish is that AMD has the best gaming CPU, and that deprived AMD of the Halo effect.

If AMD released 7800x3d first, the message would be simple: "7800x3d is the best gaming CPU. Period."

And increased CPU sales would follow.

The interesting part about the 7800x3d is that AMD probably makes as much money on it as on the higher-core-count, 2-CCD CPUs, or more. Because its price is mid-range, the potential sales and profits from this CPU are high, and through the subsequent Halo effect, sales of other CPUs would improve too. It did not happen, because AMD marketing is just not good.

A/// · Aug 22, 2023

Joe NYC said:
Not exactly. And I may have taken some sort of elementary Computer Architecture 101 class, but it was before they had caches.

oh sorry I always mix you up with someone else on an old site who went by the same name.

itsmydamnation · Aug 22, 2023

adroc_thurston said:
MALL doesn't need SoIC-X (MI300 MALL is run-thru a bunch of 2.5D interfaces really).

A large one, it's a memory side cache.

Good lord someone missed his comparch course in college.

Client yes, battery life is nice.
Server ehhhh.

my assumption is traditional IOD + CCD's on standard organic substrate.
Also while latency will be higher than L3 vcache it will still be a lot faster than main memory.
The question then would be around interconnect bandwidth and pJ/bit.
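The pJ/bit point can be made concrete: the power a die-to-die link burns is just traffic times energy per bit. A minimal sketch; the 100 GB/s traffic figure and both pJ/bit values are hypothetical placeholders, not published figures for any AMD interconnect.

```python
def link_power_watts(bandwidth_gb_per_s: float, energy_pj_per_bit: float) -> float:
    """Power (W) needed to move a sustained data stream across a die-to-die link.

    bandwidth_gb_per_s: traffic in gigabytes per second (1 GB = 10**9 bytes)
    energy_pj_per_bit:  transfer energy in picojoules per bit
    """
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8
    joules_per_bit = energy_pj_per_bit * 1e-12
    return bits_per_second * joules_per_bit


if __name__ == "__main__":
    # Assumed: 100 GB/s of former V-cache traffic now crossing from CCD to IOD.
    traffic = 100.0
    for name, pj in [("link A, assumed 2.0 pJ/bit", 2.0),
                     ("link B, assumed 0.5 pJ/bit", 0.5)]:
        print(f"{name}: {link_power_watts(traffic, pj):.2f} W")
```

At these assumed figures, every 100 GB/s moved off the CCD costs 1.6 W on link A versus 0.4 W on link B, which is why the energy per bit of the CCD-to-IOD connection matters as much as its raw bandwidth.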

JoeRambo · Aug 22, 2023

itsmydamnation said:
Also while latency will be higher than L3 vcache it will still be a lot faster than main memory.
The question then would be around interconnect bandwidth and pJ/bit.

I think it's more complicated than that. AMD has excellent performance and energy efficiency due to their excellent L3. Stripping the L3 out of the chiplet would result in huge hits in perf and efficiency.
They would need to increase L2 and maybe even keep some on-chiplet L3 to still keep quite a few requests local. Zen5 is rumoured to have 2MB or even 3MB of L2, so that is kinda confirming preparations for MALL cache. It might not come in Z5, maybe in Z6. According to rumors there is some sort of "ladder" L3, so maybe AMD is just going to keep L3 and throw 3D SRAM on IOD.

itsmydamnation · Aug 22, 2023

JoeRambo said:
I think it's more complicated than that. AMD has excellent performance and energy efficiency due to their excellent L3. Stripping the L3 out of the chiplet would result in huge hits in perf and efficiency.
They would need to increase L2 and maybe even keep some on-chiplet L3 to still keep quite a few requests local. Zen5 is rumoured to have 2MB or even 3MB of L2, so that is kinda confirming preparations for MALL cache. It might not come in Z5, maybe in Z6. According to rumors there is some sort of "ladder" L3, so maybe AMD is just going to keep L3 and throw 3D SRAM on IOD.

I'm not talking about removing the L3. Only about moving the V-cache to the memory-controller side and widening the fabric to handle the extra throughput, so you could have a max-clock, symmetric 16-core design that still gets a lot of the V-cache benefits.

Crystalwell on iod if you will.

igor_kavinski · Aug 22, 2023

itsmydamnation said:
Crystalwell on iod if you will.

Curious that I've never heard of AMD's codename for their V-cache. There's gotta be something they lovingly call it in their offices.

adroc_thurston · Aug 22, 2023

Joe NYC said:
Not sure about other Strix parts

They don't have any.

itsmydamnation said:
my assumption is traditional IOD + CCD's on standard organic substrate

For Zen6 parts?
Hahahahahha

Saylick · Aug 22, 2023

Joe NYC said:
That would allow the MALL cache to be shared between multiple CCDs.

Between 2 CCDs each having its own 64MB of L3 as one alternative, and a MALL with 128MB of shared cache as another, the MALL would come out ahead in most scenarios as far as achieving cache hits, but there would be a small latency penalty vs. a hit in the CCD's own or stacked L3.

If CCD is connected to IOD+MALL using Hybrid Bond bridges, as could happen with Venice, the extra latency would be reduced.

Looking at AMD deploying MALL to client GPU, datacenter GPU / APU in Mi300, I think MALL will be the answer to CPU, both client and server.

As far as when we could see this, definitely not in Zen 5 client, highly unlikely in Zen 5 Turin server (even though there is a new IOD coming)

But I think highly likely with Zen 6, like 90+ % likely.

I think you're onto something here. Rumors do seem to align with this take, where Zen 6 goes all in on silicon bridges to replace the organic substrate Infinity Fabric connection. It's been hinted here that Zen 6 is a big system overhaul (analogous to "Penryn to Nehalem" level of change, implying a cache restructuring) and Kepler concurs w/ the use of silicon bridges. Seeing as how SRAM doesn't scale with advanced nodes, it makes more and more sense to shift the big L3 from the CCD to the IOD, and if the penalty of having the L3 off the CCD is reduced significantly via silicon bridges, it then becomes viable. If you can stack V-cache, which is also on an older node, onto that IOD it nets you a very scalable and cost optimized product.
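The shared-vs-private hit-rate argument quoted above can be checked with a toy LRU simulation. A minimal sketch, assuming fully associative LRU caches, a repeated linear sweep as each CCD's access pattern, 64 units per private cache (think MB, scaled down), and a hypothetical imbalance where CCD0 touches 96 units and CCD1 only 16. `LRUCache`, `sweep`, and `simulate` are illustrative names, not anything from AMD.

```python
from collections import OrderedDict
from itertools import chain, zip_longest


class LRUCache:
    """Fully associative LRU cache that counts hits; capacity is in abstract units."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines: OrderedDict = OrderedDict()
        self.hits = 0
        self.accesses = 0

    def access(self, addr) -> None:
        self.accesses += 1
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
        else:
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used entry


def sweep(ccd: int, working_set: int, passes: int) -> list:
    """Repeated linear sweep over one CCD's private working set (addresses tagged by CCD)."""
    return [(ccd, i) for _ in range(passes) for i in range(working_set)]


def simulate(per_ccd_capacity: int, ws0: int, ws1: int, passes: int = 4):
    """Return (private_hits, shared_hits, total_accesses) for two CCDs."""
    t0, t1 = sweep(0, ws0, passes), sweep(1, ws1, passes)

    # Option A: each CCD owns a private cache of per_ccd_capacity.
    priv0, priv1 = LRUCache(per_ccd_capacity), LRUCache(per_ccd_capacity)
    for a in t0:
        priv0.access(a)
    for a in t1:
        priv1.access(a)

    # Option B: one shared cache of twice the size, with the two CCDs' accesses interleaved.
    shared = LRUCache(2 * per_ccd_capacity)
    for a in chain.from_iterable(zip_longest(t0, t1)):
        if a is not None:  # zip_longest pads the shorter trace with None
            shared.access(a)

    return priv0.hits + priv1.hits, shared.hits, shared.accesses


if __name__ == "__main__":
    private_hits, shared_hits, total = simulate(64, 96, 16)
    print(f"private 2x64: {private_hits}/{total} hits")
    print(f"shared 128:   {shared_hits}/{total} hits")
```

Under this imbalance the two private caches get 48 hits out of 448 accesses (CCD0's 96-unit sweep thrashes its 64-unit LRU cache), while the shared 128-unit cache gets 336. With balanced working sets, e.g. `simulate(64, 64, 64)`, both options give the same 384 hits. The model deliberately ignores the extra latency of reaching a shared cache, which is the other half of the trade-off.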

adroc_thurston · Aug 22, 2023

Saylick said:
where Zen 6 goes all in on silicon bridges to replace the organic substrate Infinity Fabric connection.

No.
Not the client parts anyway.
Client will look like STX-halo which in itself is a peek into the future™.
Which also means, yes, Apple-style eDTC spam!
So cute.

A/// · Aug 22, 2023

igor_kavinski said:
Curious that I've never heard of AMD's codename for their V-cache. There's gotta be something they lovingly call it in their offices.

big papa

Joe NYC · Aug 22, 2023

JoeRambo said:
I think it's more complicated than that. AMD has excellent performance and energy efficiency due to their excellent L3. Stripping the L3 out of the chiplet would result in huge hits in perf and efficiency.
They would need to increase L2 and maybe even keep some on-chiplet L3 to still keep quite a few requests local. Zen5 is rumoured to have 2MB or even 3MB of L2, so that is kinda confirming preparations for MALL cache. It might not come in Z5, maybe in Z6. According to rumors there is some sort of "ladder" L3, so maybe AMD is just going to keep L3 and throw 3D SRAM on IOD.

That is what I was thinking: substantially increased L2, getting rid of L3, introducing shared MALL and shifting SRAM from L3 to it.

- This would shift latencies: a lot of former L2 misses / L3 hits would become L2 hits with lower latency. Advantage.
- The remaining former L3 hits would turn into MALL hits, with higher latency. Disadvantage.
- Some L3 misses would become MALL hits (since the MALL would be shared and better utilized). Advantage.
- For the remaining misses that go to memory, the L3 lookup time is eliminated but a MALL lookup is added, so the result depends on how the two compare. The outcome is unclear.
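The four bullets above amount to an average-memory-access-time (AMAT) trade-off. A minimal sketch of that chain, assuming a simple serial lookup model (no bandwidth, queuing, or prefetch effects). All latencies and miss rates are made-up placeholders, except that the 9 ns L3 echoes the 8-9 ns Zen 4 figure quoted earlier in the thread; `amat_after_l1_miss` is a hypothetical helper name.

```python
def amat_after_l1_miss(levels, memory_ns):
    """Average latency (ns) seen by a request that has already missed in L1.

    levels: list of (lookup_ns, local_miss_rate), ordered from L2 outward.
    Every request that reaches a level pays its lookup time; the fraction
    given by local_miss_rate continues to the next level, and finally to DRAM.
    """
    total_ns = 0.0
    reach = 1.0  # fraction of requests that get this far down the hierarchy
    for lookup_ns, miss_rate in levels:
        total_ns += reach * lookup_ns
        reach *= miss_rate
    return total_ns + reach * memory_ns


# All numbers below are illustrative placeholders, not measurements of any AMD part.
DRAM_NS = 80.0
baseline = [(3.0, 0.40),   # L2: small and fast
            (9.0, 0.30)]   # private on-CCD L3
bigger_l2_plus_mall = [(4.0, 0.30),    # larger, slightly slower L2 catches more
                       (20.0, 0.25)]   # shared MALL: slower, but better hit rate

if __name__ == "__main__":
    print(f"L2 + L3:       {amat_after_l1_miss(baseline, DRAM_NS):.2f} ns")
    print(f"big L2 + MALL: {amat_after_l1_miss(bigger_l2_plus_mall, DRAM_NS):.2f} ns")
```

With these placeholders the two designs land at 16.2 ns and 16.0 ns, and raising the MALL latency to 25 ns pushes the second design to 17.5 ns, so small changes in any one parameter decide the winner.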

As far as changing L2 sizes, BTW, that is not happening in Zen 5. These are being sampled already, and someone would have noticed. So maybe Zen 6...

But in summary, it looks quite challenging to overcome the low latency of L3 that Zen has, and additionally the size it can have with addition of V-Cache.

adroc_thurston · Aug 22, 2023

Joe NYC said:
That is what I was thinking: substantially increased L2, getting rid of L3, introducing shared MALL and shifting SRAM from L3 to it.

Not happening.

Joe NYC said:
So maybe Zen 6...

No.

Joe NYC said:
it looks quite challenging to overcome the low latency of L3 that Zen has

Yea it's the best L3 in the industry and there's no reason to ever get rid of it.
MALL exists for very different reasons (just like every other SLC out there it's a bandwidth ramp and traffic catcher for things not CPU).

Joe NYC · Aug 22, 2023

Saylick said:
I think you're onto something here. Rumors do seem to align with this take, where Zen 6 goes all in on silicon bridges to replace the organic substrate Infinity Fabric connection. It's been hinted here that Zen 6 is a big system overhaul (analogous to "Penryn to Nehalem" level of change, implying a cache restructuring) and Kepler concurs w/ the use of silicon bridges. Seeing as how SRAM doesn't scale with advanced nodes, it makes more and more sense to shift the big L3 from the CCD to the IOD, and if the penalty of having the L3 off the CCD is reduced significantly via silicon bridges, it then becomes viable. If you can stack V-cache, which is also on an older node, onto that IOD it nets you a very scalable and cost optimized product.


So, what @Kepler_L2 and @adroc_thurston are saying is that in the upcoming generations, still in development, only the highest end and lower volume parts (Mi400, Navi5 Halo, Venice) will get the silicon bridges.

And lower priced parts - CPU, GPU client - will get FOWLP - like RDNA3. Client desktop - likely with Zen 6.

So this is probably about cost and capacity TSMC has for Hybrid Bond. When the capacity catches up, another product can adopt it.

As far as where the technology is going, these high end Halo products are pointing the way. From the cancelled RDNA4, we already know where GPUs are going, and from Venice, we will see where client desktop may go. So Venice will be the one to watch.

As far as laptop chips, we will see what Strix Halo brings. FOWLP, apparently in this upcoming generation. We will see if monolithic will still be the mainstream laptop part for AMD in Zen 6 generation, at which point, Intel will be onto their 2nd generation chiplet mobile parts.

igor_kavinski · Aug 22, 2023

A/// said:
big papa

BPWBB <<<< I think you can decipher that 😛

A/// · Aug 22, 2023

igor_kavinski said:
BPWBB <<<< I think you can decipher that 😛

You sir have strange fetishes.

adroc_thurston · Aug 22, 2023

Joe NYC said:
only the highest end and lower volume parts (Mi400, Navi5 Halo, Venice) will get the silicon bridges.

And lower priced parts - CPU, GPU client - will get FOWLP - like RDNA3. Client desktop - likely with Zen 6.

So this is probably about cost and capacity TSMC has for Hybrid Bond. When the capacity catches up, another product can adopt it.

Ya obvious things are obvious.

Joe NYC said:
and from Venice, we will see where client desktop may go

No. Desktop will be a small, least relevant extension of mobile starting with Zen6.
Won't ever have any relation to server anymore.
Period.

A/// · Aug 22, 2023

adroc_thurston said:
No. Desktop will be a small, least relevant extension of mobile starting with Zen6.
Won't ever have any relation to server anymore.
Period.

with zen 6 the compute aka core dies will not be the same from epyc down to ryzen?

adroc_thurston · Aug 22, 2023

A/// said:
with zen 6 the compute dies will not be the same from epyc down to ryzen?

No.

A/// · Aug 22, 2023

adroc_thurston said:
No.

so the dies will only be shared between epyc and thread ripper? doesn't this cost more for amd to develop two different compute dies? or are they now financially able to do this? or, and big or here, is it because amd is seeing some limits years in advance with their current method and they want to explore a more vibrant option for client to kill intel there too?

adroc_thurston · Aug 22, 2023

A/// said:
so the dies will only be shared between epyc and thread ripper?

Yes.

A/// said:
doesn't this cost more for amd to develop two different compute dies

Yes.

A/// said:
or are they now financially able to do this

$1.45B in quarterly R&D gotta count for something.

A/// said:
they want to explore a more vibrant option for client to kill intel there too

No they're just folding DT into mobile.

A/// · Aug 22, 2023

adroc_thurston said:
Yes.

Yes.

$1.45B in quarterly R&D gotta count for something.

No they're just folding DT into mobile.

ok, when you say folding, what do you mean? those two will share similar dies, with mobile getting a more unique efficient design because it's mobile? does this mean ryzen will get new cores? I assume tr or epyc will get a core count increase at that point.

adroc_thurston · Aug 22, 2023

A/// said:
those two will share similar dies with mobile getting a more unique efficient design because it's mobile

Well Zen6 mobile is quite special, we'll talk about it at a later date.

A/// said:
does this mean ryzen will get new cores

everything is getting new cores, it's Zen6 after all.

A/// said:
I assume tr or epyc will get a core count increase at that point.

TR is not a priority but EPYC yes, Venice is another core count bump.
