Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


BorisTheBlade82

Senior member
May 1, 2020
708
1,130
136
I wonder if there is a way to retrofit the AM5 socket to move to the Mi300 approach on the client side. Maybe it is not worth it, and maybe AMD will offer an alternative to the AM5 socket sometime between Zen 5 and Zen 6. Strix Halo may potentially be the harbinger of this.
Do you mean the RAM approach or the packaging?
Regarding packaging: Maybe I am oversimplifying things, but anything that happens "above the pins" should be socket agnostic, as long as the socket provides enough signal and power traces as well as enough surface area. So I see no reason why they should not be able to change the packaging on AM5.
Regarding memory: I expect them to go the Apple route for future mobile SoCs and that is what we hear already. And yes, that won't work for the current socket IMHO. But that's only a problem for OEMs and not so much for customers.
 
Last edited:

Joe NYC

Diamond Member
Jun 26, 2021
3,647
5,186
136
Do you mean the RAM approach or the packaging?
Regarding packaging: Maybe I am oversimplifying things, but anything that happens "above the pins" should be socket agnostic, as long as the socket provides enough signal and power traces as well as enough surface area. So I see no reason why they should not be able to change the packaging on AM5.
Regarding memory: I expect them to go the Apple route for future mobile SoCs and that is what we hear already. And yes, that won't work for the current socket IMHO. But that's only a problem for OEMs and not so much for customers.

Rumor has it that Mi300 does not have any local motherboard memory - only on-package HBM, plus CXL links for more external memory.

Remember when Lisa Su said that the package would be the new motherboard? The biggest part of this is to eliminate memory from the motherboard and move it to the package.

Apple client computers have made this move without suffering any adverse effects. The ability of OEMs to pair AMD CPUs with garbage memory is only to AMD's detriment, so having fast enough memory inside the package would be to AMD's benefit.

The memory can be LPDDR5 at the outset, for lowest cost, biggest bang for the buck, and later possibly HBM memory.

So, hopefully, AMD is already planning on this approach for the future generations. We will see if Strix Halo (as was leaked by MLID) will have this sort of socket, or if it is only a notebook chip.
 

BorisTheBlade82

Senior member
May 1, 2020
708
1,130
136
Rumor has it that Mi300 does not have any local motherboard memory - only on-package HBM, plus CXL links for more external memory.

Remember when Lisa Su said that the package would be the new motherboard? The biggest part of this is to eliminate memory from the motherboard and move it to the package.

Apple client computers have made this move without suffering any adverse effects. The ability of OEMs to pair AMD CPUs with garbage memory is only to AMD's detriment, so having fast enough memory inside the package would be to AMD's benefit.

The memory can be LPDDR5 at the outset, for lowest cost, biggest bang for the buck, and later possibly HBM memory.

So, hopefully, AMD is already planning on this approach for the future generations. We will see if Strix Halo (as was leaked by MLID) will have this sort of socket, or if it is only a notebook chip.
Yep, exactly my line of thinking.
And yes, it has the theoretical drawback of non-upgradeable RAM - but IMHO, for low-power devices, that has already been dead in the water for a couple of years.
 
  • Like
Reactions: Tlh97 and Joe NYC

Tuna-Fish

Golden Member
Mar 4, 2011
1,667
2,532
136
That seems like the most logical progression.

Mi400 will get Zen 5 cores. Cloud providers / hyperscalers start migration from local motherboard memory to pooled CXL memories.
I doubt that will ever happen. Latency is a thing; CXL will mostly be used where large pools of memory are needed, not to replace local memory. That will migrate to the package.
 

Thibsie

Golden Member
Apr 25, 2017
1,127
1,334
136
I think that is what was meant: memory either on the package or over CXL, but not on the motherboard itself anymore.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,647
5,186
136
I doubt that will ever happen. Latency is a thing; CXL will mostly be used where large pools of memory are needed, not to replace local memory. That will migrate to the package.
HBM would provide the local memory. It is a little limited at 128 GB, so there are some tradeoffs.

HBM3 theoretical maximum for 8 stacks is 512 GB, but that's not what AMD is going to be able to deliver in the first iteration.

The problem for hyperscalers is the cost of memory. It is the most expensive part of the server, and it ends up very under-utilized. So a pool of memory serving multiple servers can be a fraction of the capacity, and a fraction of the cost - at the cost of extra latency.
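
To put rough numbers on the pooling argument, here is a minimal Python sketch; all of the capacities, demand figures and the per-GB price are illustrative assumptions, not sourced values.

```python
# Illustrative model of CXL memory pooling vs. per-server provisioning.
# Every number below is an assumption for the sake of the example.
servers = 32          # servers sharing one CXL memory pool
local_gb = 256        # local memory kept in each server
peak_gb = 1024        # worst-case demand a single server might see
avg_gb = 384          # typical demand per server
usd_per_gb = 3.0      # assumed DRAM price

# Without pooling, every server is provisioned for its own peak.
no_pool_gb = servers * peak_gb

# With pooling, each server keeps some local memory and a shared pool
# covers the aggregate average overflow instead of per-server peaks.
pool_gb = servers * max(avg_gb - local_gb, 0)
pooled_gb = servers * local_gb + pool_gb

print(f"no pooling: {no_pool_gb} GB (${no_pool_gb * usd_per_gb:,.0f})")
print(f"pooled    : {pooled_gb} GB (${pooled_gb * usd_per_gb:,.0f})")
print(f"capacity needed with a pool: {pooled_gb / no_pool_gb:.0%} of the no-pool case")
```

With these made-up numbers the pooled setup needs well under half the DRAM, which is the whole appeal despite the latency penalty on pool accesses.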
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
When Zen 2 came out in 2019, it was a radical change in CPU packaging as well as core scaling with SCF/SDF.
Five years later, in 2024, we would hope something new comes along to address the shortcomings of this technology for next-gen CPUs:
  • Latency is a known issue with AMD's IFOP, and to address it a few LinkedIn posts put next-gen IF at 64 Gbps. This is a big jump and could have a major efficiency impact if the same IFOP is used going forward.
  • Bandwidth was not an issue with earlier cores, but Zen 4 showed signs of bandwidth deficiency in many workloads with the 36 Gbps IFOP.
  • Power: 0.4 pJ/bit for the MCD links vs. 2 pJ/bit for IFOP speaks for itself. GLink is being advertised at 0.25 pJ/bit.
  • Die area consumed for IFOP on Zen 4 is ~7 mm² of a 66 mm² die (excluding the SDF/SCF that is part of L3) - that is roughly 10% of what is now N4P and soon N3E silicon. GCD-MCD links have demonstrated smaller beachfront at higher bandwidth density; GUC's GLink, for instance, needs 3 mm of beachfront to provide 7.5 Tbps of BW (a quick back-of-the-envelope check of these numbers follows the list).
  • Trace density from the IOD to the CCDs and IO. It seems a limit has already been reached on how much space is available to route signals from the IOD to the CCDs, considering space is also needed for IO/memory traces as well.
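
A quick back-of-the-envelope check of the numbers above: the aggregate per-link bandwidth used in the power comparison is my own assumption for illustration, while the pJ/bit, area and beachfront figures are the ones quoted.

```python
# Back-of-the-envelope check of the interconnect figures listed above.
ifop_area_mm2, ccd_area_mm2 = 7.0, 66.0
print(f"IFOP share of CCD area: {ifop_area_mm2 / ccd_area_mm2:.1%}")

glink_bw_tbps, glink_beachfront_mm = 7.5, 3.0
print(f"GLink bandwidth density: {glink_bw_tbps / glink_beachfront_mm:.1f} Tbps/mm")

# Link power = energy per bit x bit rate. The 512 Gbps figure is an assumed
# aggregate per-link bandwidth, only there to compare the pJ/bit numbers.
assumed_link_gbps = 512
for name, pj_per_bit in (("IFOP", 2.0), ("MCD fanout", 0.4), ("GLink", 0.25)):
    watts = pj_per_bit * 1e-12 * assumed_link_gbps * 1e9
    print(f"{name:10s}: {watts:.2f} W at {assumed_link_gbps} Gbps")
```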
AMD will have to address the above problems with a new interconnect; even their competitor is using much more exotic back-end packaging in current-gen products. But I wouldn't be surprised if next-gen CPUs are stuck on the same tech.
As for cost, AMD is doing fanout links - six of them - on a 750 USD GPU product whose actual chip should sell for half of that, if not less. And SoIC parts like the 5800X3D sell for less than 300 USD.


I noticed one patent which attempts to address this issue by using normal fanout when the CCDs are in a single column on each side of the IOD, and adding bridges in the fanout when there are multiple columns of CCDs. So basically the same CCD/IOD can be used but packaged differently for different configs.
https://www.freepatentsonline.com/11469183.html

Rumors of MI300C have been floating around; let's see if this is real in a couple of days. It could be a precursor.
I remember an interview with an AMD person discussing this briefly quite a while ago, but I don't remember who it was or when. Such long bridges sounded problematic. It might be that they still run a SerDes type link, but run it through the embedded silicon to reduce power and increase speed?

If I am remembering correctly, the memory is currently routed under the chiplets, with the IO being on the other 2 edges without compute die for Epyc. The routing required on package is getting too complicated, and going to the next gen at higher speed will take a lot of power. I hope the infinity fabric fan-out used for MCD/GCD gets used in other places.

I haven't read much about Strix Halo. I have said in the past that it may make sense to make an APU with an infinity fabric connection for other chiplets. That would work great if they can use infinity fabric fan-out rather than waste power with regular IFOP links. That would allow them to make something like an 8-core APU with low power cores and then connect several different chip types, like a high performance 8-core chiplet, a 16-core dense chiplet, or maybe even a GPU or other accelerator chiplet.

I have also wondered if they might use an MCD in such a situation, perhaps with an LPDDR controller rather than a GDDR controller. Connecting that together with regular IFOP likely would take too much power, but it may be reasonable if infinity fabric fan-out can be used. There aren't really any adjacency issues; it would just be memory, MCD, APU, then accessory chip, roughly in a line.
 
  • Like
Reactions: Tlh97 and Joe NYC

jamescox

Senior member
Nov 11, 2009
644
1,105
136
HBM would provide the local memory. It is a little limited at 128 GB, so there are some tradeoffs.

HBM3 theoretical maximum for 8 stacks is 512 GB, but that's not what AMD is going to be able to deliver in the first iteration.

The problem for hyperscalers is the cost of memory. It is the most expensive part of the server, and it ends up very under-utilized. So a pool of memory serving multiple servers can be a fraction of the capacity, and a fraction of the cost - at the cost of extra latency.
AMD may be at a disadvantage if they do not have anything other than HBM on package and can't connect off-package memory other than via CXL. Grace Hopper has up to 512 GB LPDDR5X per module and up to 96 GB HBM3; AMD may only have 128 GB HBM3. Even in a 4S system, this would only be 512 GB vs. 2 TB for Nvidia. I don't know if Nvidia and AMD are using the HBM in exactly the same way though.
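
A rough tally of the per-system capacities being compared here; the per-module figures are the ones quoted above, and the 4-socket totals are simple multiplication.

```python
# Rough per-system memory tally from the figures quoted above.
sockets = 4

mi300_hbm_gb = 128                 # assumed HBM3 per MI300 package
gh_lpddr_gb, gh_hbm_gb = 512, 96   # Grace Hopper: LPDDR5X + HBM3 per module

print(f"4S MI300       : {sockets * mi300_hbm_gb} GB HBM3 total")
print(f"4x Grace Hopper: {sockets * gh_lpddr_gb} GB LPDDR5X + {sockets * gh_hbm_gb} GB HBM3")
```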
 
  • Like
Reactions: Tlh97 and Joe NYC

itsmydamnation

Diamond Member
Feb 6, 2011
3,072
3,897
136
AMD may be at a disadvantage if they do not have anything other than HBM on package and can't connect off-package memory other than via CXL. Grace Hopper has up to 512 GB LPDDR5X per module and up to 96 GB HBM3; AMD may only have 128 GB HBM3. Even in a 4S system, this would only be 512 GB vs. 2 TB for Nvidia. I don't know if Nvidia and AMD are using the HBM in exactly the same way though.
Are we sure they don't have flat-mounted MRDIMMs or something like that under those heat sinks?
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
Are we sure they don't have flat-mounted MRDIMMs or something like that under those heat sinks?
It looks like these are likely to be used in 8-socket systems, so that will be 1 TB, or 1.5 TB for the 192 GB GPU-only version. This may still be problematic for some HPC applications where they need to hold a large amount of data in memory; multiple TB of system memory is sometimes required. With the HBM acting as cache, having some CXL to back it may work in some cases. I don't really know how the GPU-only systems will be set up. The 8-socket GPU-only module looks relatively compact, but I am not that familiar with the OCP/OAM form factor. I was wondering if the GPU-only 8-way node would be paired with SP5 processors for up to 12 TB of memory with 2 sockets.
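
The capacities work out roughly as follows; the SP5 per-socket maximum is my assumption (12 channels x 2 DIMMs x 256 GB), while the HBM figures are the ones discussed above.

```python
# HBM capacity of an 8-GPU OAM board, using the per-package figures above.
gpus = 8
for hbm_gb in (128, 192):
    print(f"8x MI300X @ {hbm_gb} GB each: {gpus * hbm_gb / 1024:.2f} TB HBM")

# Host memory of a 2-socket SP5 head node.
# Assumption: 12 channels x 2 DIMMs per channel x 256 GB DIMMs = 6 TB per socket.
channels, dimms_per_channel, dimm_gb = 12, 2, 256
per_socket_tb = channels * dimms_per_channel * dimm_gb / 1024
print(f"2x SP5 host: {2 * per_socket_tb:.0f} TB DDR5")
```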
 

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
The AMD Instinct Infinity 8x MI300X OAM UBB Platform doesn't look like it would cover the CPUs in any way, which makes the interface to other boards crucial.

Photo from StH:
 
  • Like
Reactions: Exist50 and Joe NYC

Joe NYC

Diamond Member
Jun 26, 2021
3,647
5,186
136
AMD may be at a disadvantage if they do not have anything other than HBM on package and can't connect off-package memory other than via CXL. Grace Hopper has up to 512 GB LPDDR5X per module and up to 96 GB HBM3; AMD may only have 128 GB HBM3. Even in a 4S system, this would only be 512 GB vs. 2 TB for Nvidia. I don't know if Nvidia and AMD are using the HBM in exactly the same way though.
Theoretical max for HBM3 memory is 64 GB per stack, which would be 512 GB for the Mi300.

The way to get from 128 GB to 512 GB is to double the memory chip size and then to double the stack height from 8-high to 16-high.

AMD is apparently going to use Samsung for HBM, while NVidia is using Hynix. Hynix has been leading the HBM area, but Samsung is right behind.

Samsung is planning production of 12-high stacks for Q4 - which coincides with the Mi300x ramp, and is what allows the 192 GB capacity.
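
The per-stack arithmetic behind the 128 / 192 / 512 GB package capacities, assuming 8 stacks per package (the die densities are my assumptions, chosen to be consistent with the figures above):

```python
# HBM capacity per package = stacks x stack height x die density.
# Die densities (in Gb) are assumptions consistent with the capacities above.
stacks = 8
configs = [
    ("8-high,  16 Gb dies", 8, 16),
    ("12-high, 16 Gb dies", 12, 16),
    ("16-high, 32 Gb dies", 16, 32),
]
for name, height, die_gbit in configs:
    per_stack_gb = height * die_gbit // 8   # Gb -> GB
    print(f"{name}: {per_stack_gb} GB/stack -> {stacks * per_stack_gb} GB per package")
```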

The way AMD is planning on selling Mi300x - with 2 Genoa sockets and 8 Mi300x - it competes OK with an NVidia system of 2 Sapphire Rapids (8 memory channels vs. 12 for Genoa), and also on GPU memory, where Mi300x would be clearly ahead in capacity.

On the APU side, Grace Hopper would have some extra memory attached to the Grace CPU, but since the memory is not unified, there would be duplication and copying back and forth to Hopper. But still, NVidia will have more memory when it is released. We will see how customers perceive the tradeoffs.

AMD has not mentioned Mi300c, the CPU-only version, so it may come after Mi300a and Mi300x. If it does, it will likely have the option of using 12-high stacks for 192 GB of memory.

When it comes to Zen 5 (Turin), some leaks suggest it is moving forward on schedule, and we may see it out in H1 2024.

And then the next-gen Mi400, likely in H2 2024. By then, the memory makers may be able to deliver 48-64 GB HBM stacks, so we could see 512 GB on Mi400.

The time to pull the plug on local motherboard memory is approaching. Faster on the datacenter side of things.
 
Last edited:

Kepler_L2

Senior member
Sep 6, 2020
998
4,262
136
Theoretical max for HBM3 memory is 64 GB per stack, which would be 512 GB for the Mi300.

The way to get from 128 GB to 512 GB is to double the memory chip size and then to double the stack height from 8-high to 16-high.

AMD is apparently going to use Samsung for HBM, while NVidia is using Hynix. Hynix has been leading the HBM area, but Samsung is right behind.

Samsung is planning production of 12-high stacks for Q4 - which coincides with the Mi300x ramp, and is what allows the 192 GB capacity.

The way AMD is planning on selling Mi300x - with 2 Genoa sockets and 8 Mi300x - it competes OK with an NVidia system of 2 Sapphire Rapids (8 memory channels vs. 12 for Genoa), and also on GPU memory, where Mi300x would be clearly ahead in capacity.

On the APU side, Grace Hopper would have some extra memory attached to the Grace CPU, but since the memory is not unified, there would be duplication and copying back and forth to Hopper. But still, NVidia will have more memory when it is released. We will see how customers perceive the tradeoffs.

AMD has not mentioned Mi300c, the CPU-only version, so it may come after Mi300a and Mi300x. If it does, it will likely have the option of using 12-high stacks for 192 GB of memory.

When it comes to Zen 5 (Turin), some leaks suggest it is moving forward on schedule, and we may see it out in H1 2024.

And then the next-gen Mi400, likely in H2 2024. By then, the memory makers may be able to deliver 48-64 GB HBM stacks, so we could see 512 GB on Mi400.

The time to pull the plug on local motherboard memory is approaching. Faster on the datacenter side of things.
MI400 is H2 2025 at best.
 
  • Like
Reactions: Tlh97 and Joe NYC

Joe NYC

Diamond Member
Jun 26, 2021
3,647
5,186
136
MI400 is H2 2025 at best.
That would be a 2-year gap again - not optimal.

Any ideas what features it could have that are not present in Mi300?

Certain things are obvious:
- more advanced process nodes for all the compute units
- latest gen CPU and GPU CCDs
- latest iteration of HBM, with greater capacity

It seems to me that AMD could, in theory, release a new version in the Mi line with Zen 5 Dense cores on N3 (doubling the CPU core count) and possibly the GPU unit also on N3, with a modest increase in HBM specs.

Eventually, in future generations, AMD is probably aiming to have all of the internal links be SoIC, plus some optical links to the outside world...
 
  • Love
  • Like
Reactions: Tlh97 and A///

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
Joe, I agree. Recently I was reading some translated chatter about Zen 5 coming sooner than most of us were anticipating many months ago; that H2 target may be closer to the start of the half than the end of the year. Arrow Lake will have some stiff competition, but I'm not expecting it to be significantly faster than the Raptor Lake refresh coming out soon. Contrary to the wide-jawed fool and the greasy doughy lad, I expect Arrow Lake to be anywhere from 5-12% behind Zen 5 in performance while pushing higher watts, regardless of any fancy footwork on Intel's behalf to get that under control, which they'd soon violate anyway.
 
  • Like
Reactions: Joe NYC

eek2121

Diamond Member
Aug 2, 2005
3,410
5,049
136
Joe, I agree. Recently I was reading some translated chatter about Zen 5 coming sooner than most of us were anticipating many months ago; that H2 target may be closer to the start of the half than the end of the year. Arrow Lake will have some stiff competition, but I'm not expecting it to be significantly faster than the Raptor Lake refresh coming out soon. Contrary to the wide-jawed fool and the greasy doughy lad, I expect Arrow Lake to be anywhere from 5-12% behind Zen 5 in performance while pushing higher watts, regardless of any fancy footwork on Intel's behalf to get that under control, which they'd soon violate anyway.
Arrow Lake will be on a superior node. It will very likely beat Zen 5 in terms of perf/watt unless AMD completely switches up the design.

I have seen a few indications that Zen 5 might be “more of the same”. Disappointing, if true. I hope to see further innovations from AMD. The multidie and multichiplet designs were both incredible.
 

soresu

Diamond Member
Dec 19, 2014
4,105
3,566
136
That would be a 2-year gap again - not optimal.

Any ideas what features it could have that are not present in Mi300?

Certain things are obvious:
- more advanced process nodes for all the compute units
- latest gen CPU and GPU CCDs
- latest iteration of HBM, with greater capacity

It seems to me that AMD could, in theory, release a new version in the Mi line with Zen 5 Dense cores on N3 (doubling the CPU core count) and possibly the GPU unit also on N3, with a modest increase in HBM specs.

Eventually, in future generations, AMD is probably aiming to have all of the internal links be SoIC, plus some optical links to the outside world...
Is there still no sign of HBM in the consumer CPU space?

You would think with HMC going belly up that the HBM people would exploit this.
 

soresu

Diamond Member
Dec 19, 2014
4,105
3,566
136
unless AMD completely switches up the design
From the way Papermaster has been talking it up years in advance, it is either a Bulldozer-level flop from switching to a snazzy but unproven µArch - or a significant leap forward, even if only at a foundational level for future µArch improvements, while giving a decent first-gen 15-20% improvement over previous Zen designs.

The latter seems much more likely - I don't think that AMD would risk a big change at the core µArch level after the roiling hellscape that was Bulldozer's CMT.
I hope to see further innovations from AMD. The multidie and multichiplet designs were both incredible.
On this score you can be certain AMD has much planned in advance.

There were slides from them some time ago outlining possible future 3D packaging and logic designs, including monolithic multi-layer 3D logic (and likely memory too), which would seem like the natural way forward given their past moves.

I'm most interested to see if they shake up the V-Cache game with non-SRAM cache designs now that they have started the ball rolling on decoupling cache again from the main die - newer MRAM types like Spin-Orbit Torque could do well vs. SRAM in L3 while dramatically reducing die area.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,647
5,186
136
Is there still no sign of HBM in the consumer CPU space?

You would think with HMC going belly up that the HBM people would exploit this.
The datacenter GPU market is in such a feeding frenzy that HBM will continue to be priced at very high premiums.

Which is bad news short term.

But the volume of HBM production is going to get to such a high level that the costs of production will continue to decline, maybe getting down to the consumer price range.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
I'm most interested to see if they shake up the V-Cache game with non-SRAM cache designs now that they have started the ball rolling on decoupling cache again from the main die - newer MRAM types like Spin-Orbit Torque could do well vs. SRAM in L3 while dramatically reducing die area.
They'll need SRAM (or something equivalently low latency) for anything up through L3 cache, but if they add an additional memory side cache, they could afford to focus more on capacity. That could be an opportunity for different technology. Probably would be a volatile memory, however. Doubt MRAM has any appeal.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
Is there still no sign of HBM in the consumer CPU space?

You would think with HMC going belly up that the HBM people would exploit this.
No, and I would be shocked if we ever do see it. SoC designers have plenty of choices with stacked SRAM (like AMD's v-cache), which improves performance at a lower cost than putting HBM on package. Or, if you mean as main system RAM, as pointed out, it is just too expensive for that - DRAM is produced in massive quantities.
 

soresu

Diamond Member
Dec 19, 2014
4,105
3,566
136
The datacenter GPU market is in such a feeding frenzy that HBM will continue to be priced at very high premiums.

Which is bad news short term.

But the volume of HBM production is going to get to such a high level that the costs of production will continue to decline, maybe getting down to the consumer price range.
Ah, interesting. I guess I'd just figured that in the 6 years since Vega it would already have been scaled up enough for that; put a pin in that for now then 😅

Hopefully 3D DRAM comes along in the interim and scales up the per-die capacity too - as it is now, it's going to need at least 4 stacks just to reach the DDR4 maximum on consumer platforms (TR excluded).
 
  • Like
Reactions: Joe NYC

soresu

Diamond Member
Dec 19, 2014
4,105
3,566
136
They'll need SRAM (or something equivalently low latency) for anything up through L3 cache, but if they add an additional memory side cache, they could afford to focus more on capacity. That could be an opportunity for different technology. Probably would be a volatile memory, however. Doubt MRAM has any appeal.
MRAM comes in several flavors, and the latest, like SOT-MRAM, have been pushing into SRAM's latency category for L3 cache since 2015.

If they can crack the voltage gate assisted SOT MRAM variant (VG-SOT MRAM) then they could get all of that at even lower power too with field free switching - such as that demonstrated by IMEC back in February: