Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


BorisTheBlade82

Senior member
May 1, 2020
708
1,130
136
I wonder if there is a way to retrofit the AM5 socket to move to the Mi300 approach on the client side. Maybe it is not worth it, and maybe AMD will offer an alternative to the AM5 socket sometime between Zen 5 and Zen 6. Strix Halo may potentially be the harbinger of this.
Do you mean the RAM approach or the packaging?
Regarding packaging: Maybe I am oversimplifying things, but anything that happens "above the pins" should be socket agnostic, as long as the socket provides enough signal and power traces as well as enough surface area. So I see no reason why they should not be able to change the packaging on AM5.
Regarding memory: I expect them to go the Apple route for future mobile SoCs and that is what we hear already. And yes, that won't work for the current socket IMHO. But that's only a problem for OEMs and not so much for customers.
 
Last edited:

Joe NYC

Diamond Member
Jun 26, 2021
3,647
5,186
136
Do you mean the RAM approach or the packaging?
Regarding packaging: Maybe I am oversimplifying things, but anything that happens "above the pins" should be socket agnostic, as long as the socket provides enough signal and power traces as well as enough surface area. So I see no reason why they should not be able to change the packaging on AM5.
Regarding memory: I expect them to go the Apple route for future mobile SoCs and that is what we hear already. And yes, that won't work for the current socket IMHO. But that's only a problem for OEMs and not so much for customers.

Rumor has it that Mi300 does not have any local motherboard memory - only on-package HBM, plus CXL links for more external memory.

Remember when Lisa Su said that the package would be the new motherboard? The biggest part of this is to eliminate memory from the motherboard and move it to the package.

Apple client computers have made this move without suffering any adverse effects. The ability of OEMs to pair AMD CPUs with garbage memory is only to AMD's detriment, so having fast enough memory inside the package would be to AMD's benefit.

The memory can be LPDDR5 at the outset, for lowest cost, biggest bang for the buck, and later possibly HBM memory.

So, hopefully, AMD is already planning on this approach for the future generations. We will see if Strix Halo (as was leaked by MLID) will have this sort of socket, or if it is only a notebook chip.
 

BorisTheBlade82

Senior member
May 1, 2020
708
1,130
136
Rumor has it that Mi300 does not have any local motherboard memory - only on-package HBM, plus CXL links for more external memory.

Remember when Lisa Su said that the package would be the new motherboard? The biggest part of this is to eliminate memory from the motherboard and move it to the package.

Apple client computers have made this move without suffering any adverse effects. The ability of OEMs to pair AMD CPUs with garbage memory is only to AMD's detriment, so having fast enough memory inside the package would be to AMD's benefit.

The memory can be LPDDR5 at the outset, for lowest cost, biggest bang for the buck, and later possibly HBM memory.

So, hopefully, AMD is already planning on this approach for the future generations. We will see if Strix Halo (as was leaked by MLID) will have this sort of socket, or if it is only a notebook chip.
Yep, exactly my line of thinking.
And yes, it has the theoretical drawback of non-upgradeable RAM - but IMHO, for low-power devices, that has already been dead in the water for a couple of years.
 
  • Like
Reactions: Tlh97 and Joe NYC

Tuna-Fish

Golden Member
Mar 4, 2011
1,667
2,532
136
That seems like the most logical progression.

Mi400 will get Zen 5 cores. Cloud providers / hyperscalers start migration from local motherboard memory to pooled CXL memories.
I doubt that will ever happen. Latency is a thing; CXL will mostly be used where large pools of memory are needed, not to replace local memory. That will migrate to the package.
 

Thibsie

Golden Member
Apr 25, 2017
1,127
1,334
136
I think that is what was meant: memory either on the package or over CXL, but not on the motherboard itself anymore.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,647
5,186
136
I doubt that will ever happen. Latency is a thing; CXL will mostly be used where large pools of memory are needed, not to replace local memory. That will migrate to the package.
HBM would provide the local memory. It is a little limited at 128 GB, so there are some tradeoffs.

HBM3 theoretical maximum for 8 stacks is 512 GB, but that's not what AMD is going to be able to deliver in the first iteration.

The problem for hyperscalers is the cost of memory. It is the most expensive part of the server, and it ends up very under-utilized. So a pool of memory serving multiple servers can be a fraction of the capacity, and a fraction of the cost - at the cost of extra latency.
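
To put rough numbers on the pooling argument, here is a minimal Python sketch; all of the capacities, demand figures and the per-GB price are illustrative assumptions, not sourced values.

```python
# Illustrative model of CXL memory pooling vs. per-server provisioning.
# Every number below is an assumption for the sake of the example.
servers = 32          # servers sharing one CXL memory pool
local_gb = 256        # local memory kept in each server
peak_gb = 1024        # worst-case demand a single server might see
avg_gb = 384          # typical demand per server
usd_per_gb = 3.0      # assumed DRAM price

# Without pooling, every server is provisioned for its own peak.
no_pool_gb = servers * peak_gb

# With pooling, each server keeps some local memory and a shared pool
# covers the aggregate average overflow instead of per-server peaks.
pool_gb = servers * max(avg_gb - local_gb, 0)
pooled_gb = servers * local_gb + pool_gb

print(f"no pooling: {no_pool_gb} GB (${no_pool_gb * usd_per_gb:,.0f})")
print(f"pooled    : {pooled_gb} GB (${pooled_gb * usd_per_gb:,.0f})")
print(f"capacity needed with a pool: {pooled_gb / no_pool_gb:.0%} of the no-pool case")
```

With these made-up numbers the pooled setup needs well under half the DRAM, which is the whole appeal despite the latency penalty on pool accesses.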
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
When Zen 2 came out in 2019, it was a radical change in CPU packaging as well as core scaling with SCF/SDF.
Five years later, in 2024, we would hope something new comes along to address the shortcomings of this technology for next-gen CPUs:
  • Latency is a known issue with AMD's IFOP, and to address it a few LinkedIn posts put next-gen IF at 64 Gbps. This is a big jump and could have a major efficiency impact if the same IFOP is used going forward.
  • Bandwidth was not an issue with earlier cores, but Zen 4 showed signs of bandwidth deficiency in many workloads with the 36 Gbps IFOP.
  • Power: 0.4 pJ/bit for the MCD links vs. 2 pJ/bit for IFOP speaks for itself. GLink is being advertised at 0.25 pJ/bit.
  • Die area consumed for IFOP on Zen 4 is ~7 mm² of a 66 mm² die (excluding the SDF/SCF that is part of L3) - that is roughly 10% of what is now N4P and soon N3E silicon. GCD-MCD links have demonstrated smaller beachfront at higher bandwidth density; GUC's GLink, for instance, needs 3 mm of beachfront to provide 7.5 Tbps of BW (a quick back-of-the-envelope check of these numbers follows the list).
  • Trace density from the IOD to the CCDs and IO. It seems a limit has already been reached on how much space is available to route signals from the IOD to the CCDs, considering space is also needed for IO/memory traces as well.
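
A quick back-of-the-envelope check of the numbers above: the aggregate per-link bandwidth used in the power comparison is my own assumption for illustration, while the pJ/bit, area and beachfront figures are the ones quoted.

```python
# Back-of-the-envelope check of the interconnect figures listed above.
ifop_area_mm2, ccd_area_mm2 = 7.0, 66.0
print(f"IFOP share of CCD area: {ifop_area_mm2 / ccd_area_mm2:.1%}")

glink_bw_tbps, glink_beachfront_mm = 7.5, 3.0
print(f"GLink bandwidth density: {glink_bw_tbps / glink_beachfront_mm:.1f} Tbps/mm")

# Link power = energy per bit x bit rate. The 512 Gbps figure is an assumed
# aggregate per-link bandwidth, only there to compare the pJ/bit numbers.
assumed_link_gbps = 512
for name, pj_per_bit in (("IFOP", 2.0), ("MCD fanout", 0.4), ("GLink", 0.25)):
    watts = pj_per_bit * 1e-12 * assumed_link_gbps * 1e9
    print(f"{name:10s}: {watts:.2f} W at {assumed_link_gbps} Gbps")
```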
AMD will have to address the above problems with a new interconnect; even their competitor is using much more exotic back-end packaging in current-gen products. But I wouldn't be surprised if next-gen CPUs are stuck on the same tech.
As for cost, AMD is doing fanout links - six of them - on a 750 USD GPU product whose actual chip should sell for half of that, if not less. And SoIC parts like the 5800X3D sell for less than 300 USD.


I noticed one patent which attempts to address this issue by using normal fanout when the CCDs are in a single column on each side of the IOD, and adding bridges in the fanout when there are multiple columns of CCDs. So basically the same CCD/IOD can be used but packaged differently for different configs.
https://www.freepatentsonline.com/11469183.html

Rumors of MI300C have been floating around; let's see if this is real in a couple of days. It could be a precursor.
I remember an interview with an AMD person discussing this briefly quite a while ago, but I don't remember who it was or when. Such long bridges sounded problematic. It might be that they still run a SerDes type link, but run it through the embedded silicon to reduce power and increase speed?

If I am remembering correctly, the memory is currently routed under the chiplets, with the IO being on the other 2 edges without compute die for Epyc. The routing required on package is getting too complicated, and going to the next gen at higher speed will take a lot of power. I hope the infinity fabric fan-out used for MCD/GCD gets used in other places.

I haven't read much about Strix Halo. I have said in the past that it may make sense to make an APU with an infinity fabric connection for other chiplets. That would work great if they can use infinity fabric fan-out rather than waste power with regular IFOP links. That would allow them to make something like an 8-core APU with low power cores and then connect several different chip types, like a high performance 8-core chiplet, a 16-core dense chiplet, or maybe even a GPU or other accelerator chiplet.

I have also wondered if they might use an MCD in such a situation, perhaps with an LPDDR controller rather than a GDDR controller. Connecting that together with regular IFOP likely would take too much power, but it may be reasonable if infinity fabric fan-out can be used. There aren't really any adjacency issues; it would just be memory, MCD, APU, then accessory chip, roughly in a line.
 
  • Like
Reactions: Tlh97 and Joe NYC

jamescox

Senior member
Nov 11, 2009
644
1,105
136
HBM would provide the local memory. It is a little limited at 128 GB, so there are some tradeoffs.

HBM3 theoretical maximum for 8 stacks is 512 GB, but that's not what AMD is going to be able to deliver in the first iteration.

The problem for hyperscalers is the cost of memory. It is the most expensive part of the server, and it ends up very under-utilized. So a pool of memory serving multiple servers can be a fraction of the capacity, and a fraction of the cost - at the cost of extra latency.
AMD may be at a disadvantage if they do not have anything other than HBM on package and can't connect off-package memory other than via CXL. Grace Hopper has up to 512 GB LPDDR5X per module and up to 96 GB HBM3; AMD may only have 128 GB HBM3. Even in a 4S system, this would only be 512 GB vs. 2 TB for Nvidia. I don't know if Nvidia and AMD are using the HBM in exactly the same way though.
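
A rough tally of the per-system capacities being compared here; the per-module figures are the ones quoted above, and the 4-socket totals are simple multiplication.

```python
# Rough per-system memory tally from the figures quoted above.
sockets = 4

mi300_hbm_gb = 128                 # assumed HBM3 per MI300 package
gh_lpddr_gb, gh_hbm_gb = 512, 96   # Grace Hopper: LPDDR5X + HBM3 per module

print(f"4S MI300       : {sockets * mi300_hbm_gb} GB HBM3 total")
print(f"4x Grace Hopper: {sockets * gh_lpddr_gb} GB LPDDR5X + {sockets * gh_hbm_gb} GB HBM3")
```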
 
  • Like
Reactions: Tlh97 and Joe NYC

itsmydamnation

Diamond Member
Feb 6, 2011
3,072
3,897
136
AMD may be at a disadvantage if they do not have anything other than HBM on package and can't connect off-package memory other than via CXL. Grace Hopper has up to 512 GB LPDDR5X per module and up to 96 GB HBM3; AMD may only have 128 GB HBM3. Even in a 4S system, this would only be 512 GB vs. 2 TB for Nvidia. I don't know if Nvidia and AMD are using the HBM in exactly the same way though.
Are we sure they don't have flat-mounted MRDIMMs or something like that under those heat sinks?
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
Are we sure they don't have flat-mounted MRDIMMs or something like that under those heat sinks?
It looks like these are likely to be used in 8-socket systems, so that will be 1 TB, or 1.5 TB for the 192 GB GPU-only version. This may still be problematic for some HPC applications where they need to hold a large amount of data in memory; multiple TB of system memory is sometimes required. With the HBM acting as cache, having some CXL to back it may work in some cases. I don't really know how the GPU-only systems will be set up. The 8-socket GPU-only module looks relatively compact, but I am not that familiar with the OCP/OAM form factor. I was wondering if the GPU-only 8-way node would be paired with SP5 processors for up to 12 TB of memory with 2 sockets.
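
The capacities work out roughly as follows; the SP5 per-socket maximum is my assumption (12 channels x 2 DIMMs x 256 GB), while the HBM figures are the ones discussed above.

```python
# HBM capacity of an 8-GPU OAM board, using the per-package figures above.
gpus = 8
for hbm_gb in (128, 192):
    print(f"8x MI300X @ {hbm_gb} GB each: {gpus * hbm_gb / 1024:.2f} TB HBM")

# Host memory of a 2-socket SP5 head node.
# Assumption: 12 channels x 2 DIMMs per channel x 256 GB DIMMs = 6 TB per socket.
channels, dimms_per_channel, dimm_gb = 12, 2, 256
per_socket_tb = channels * dimms_per_channel * dimm_gb / 1024
print(f"2x SP5 host: {2 * per_socket_tb:.0f} TB DDR5")
```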
 

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
The AMD Instinct Infinity 8x MI300X OAM UBB Platform doesn't look like it would cover the CPUs in any way, which makes the interface to other boards crucial.

Photo from StH:
 
  • Like
Reactions: Exist50 and Joe NYC

Joe NYC

Diamond Member
Jun 26, 2021
3,647
5,186
136
AMD may be at a disadvantage if they do not have anything other than HBM on package and can't connect off-package memory other than via CXL. Grace Hopper has up to 512 GB LPDDR5X per module and up to 96 GB HBM3; AMD may only have 128 GB HBM3. Even in a 4S system, this would only be 512 GB vs. 2 TB for Nvidia. I don't know if Nvidia and AMD are using the HBM in exactly the same way though.
Theoretical max for HBM3 memory is 64 GB per stack, which would be 512 GB for the Mi300.

The way to get from 128 GB to 512 GB is to double the memory chip size and then to double the stack height from 8-high to 16-high.

AMD is apparently going to use Samsung for HBM, while NVidia is using Hynix. Hynix has been leading the HBM area, but Samsung is right behind.

Samsung is planning production of 12-high stacks for Q4 - which coincides with the Mi300x ramp, and is what allows the 192 GB capacity.
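
The per-stack arithmetic behind the 128 / 192 / 512 GB package capacities, assuming 8 stacks per package (the die densities are my assumptions, chosen to be consistent with the figures above):

```python
# HBM capacity per package = stacks x stack height x die density.
# Die densities (in Gb) are assumptions consistent with the capacities above.
stacks = 8
configs = [
    ("8-high,  16 Gb dies", 8, 16),
    ("12-high, 16 Gb dies", 12, 16),
    ("16-high, 32 Gb dies", 16, 32),
]
for name, height, die_gbit in configs:
    per_stack_gb = height * die_gbit // 8   # Gb -> GB
    print(f"{name}: {per_stack_gb} GB/stack -> {stacks * per_stack_gb} GB per package")
```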

The way AMD is planning on selling Mi300x - with 2 Genoa sockets and 8 Mi300x - it competes OK with an NVidia system of 2 Sapphire Rapids (8 memory channels vs. 12 for Genoa), and also on GPU memory, where Mi300x would be clearly ahead in capacity.

On the APU side, Grace Hopper would have some extra memory attached to the Grace CPU, but since the memory is not unified, there would be duplication and copying back and forth to Hopper. But still, NVidia will have more memory when it is released. We will see how customers perceive the tradeoffs.

AMD has not mentioned Mi300c, the CPU-only version, so it may come after Mi300a and Mi300x. If it does, it will likely have the option of using 12-high stacks for 192 GB of memory.

When it comes to Zen 5 (Turin), some leaks suggest it is moving forward on schedule, and we may see it out in H1 2024.

And then the next-gen Mi400, likely in H2 2024. By then, the memory makers may be able to deliver 48-64 GB HBM stacks, so we could see 512 GB on Mi400.

The time to pull the plug on local motherboard memory is approaching. Faster on the datacenter side of things.
 
Last edited:

Kepler_L2

Senior member
Sep 6, 2020
998
4,262
136
Theoretical max for HBM3 memory is 64 GB per stack, which would be 512 GB for the Mi300.

The way to get from 128 GB to 512 GB is to double the memory chip size and then to double the stack height from 8-high to 16-high.

AMD is apparently going to use Samsung for HBM, while NVidia is using Hynix. Hynix has been leading the HBM area, but Samsung is right behind.

Samsung is planning production of 12-high stacks for Q4 - which coincides with the Mi300x ramp, and is what allows the 192 GB capacity.

The way AMD is planning on selling Mi300x - with 2 Genoa sockets and 8 Mi300x - it competes OK with an NVidia system of 2 Sapphire Rapids (8 memory channels vs. 12 for Genoa), and also on GPU memory, where Mi300x would be clearly ahead in capacity.

On the APU side, Grace Hopper would have some extra memory attached to the Grace CPU, but since the memory is not unified, there would be duplication and copying back and forth to Hopper. But still, NVidia will have more memory when it is released. We will see how customers perceive the tradeoffs.

AMD has not mentioned Mi300c, the CPU-only version, so it may come after Mi300a and Mi300x. If it does, it will likely have the option of using 12-high stacks for 192 GB of memory.

When it comes to Zen 5 (Turin), some leaks suggest it is moving forward on schedule, and we may see it out in H1 2024.

And then the next-gen Mi400, likely in H2 2024. By then, the memory makers may be able to deliver 48-64 GB HBM stacks, so we could see 512 GB on Mi400.

The time to pull the plug on local motherboard memory is approaching. Faster on the datacenter side of things.
MI400 is H2 2025 at best.
 
  • Like
Reactions: Tlh97 and Joe NYC

Joe NYC

Diamond Member
Jun 26, 2021
3,647
5,186
136
MI400 is H2 2025 at best.
That would be a 2-year gap again - not optimal.

Any ideas what features it could have that are not present in Mi300?

Certain things are obvious:
- more advanced process nodes for all the compute units
- latest gen CPU and GPU CCDs
- latest iteration of HBM, with greater capacity

It seems to me that AMD could, in theory, release a new version in the Mi line with Zen 5 Dense cores on N3 (doubling the CPU core count) and possibly the GPU unit also on N3, with a modest increase in HBM specs.

Eventually, in future generations, AMD is probably aiming to have all of the internal links be SoIC, plus some optical links to the outside world...
 
  • Love
  • Like
Reactions: Tlh97 and A///

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
Joe, I agree. Recently I was reading some translated chatter about Zen 5 coming sooner than most of us were anticipating many months ago; that H2 target may be closer to the start of the half than the end of the year. Arrow Lake will have some stiff competition, but I'm not expecting it to be significantly faster than the Raptor Lake refresh coming out soon. Contrary to the wide-jawed fool and the greasy doughy lad, I expect Arrow Lake to be anywhere from 5-12% behind Zen 5 in performance while pushing higher watts, regardless of any fancy footwork on Intel's behalf to get that under control, which they'd soon violate anyway.
 
  • Like
Reactions: Joe NYC

eek2121

Diamond Member
Aug 2, 2005
3,410
5,049
136
Joe, I agree. Recently I was reading some translated chatter about Zen 5 coming sooner than most of us were anticipating many months ago; that H2 target may be closer to the start of the half than the end of the year. Arrow Lake will have some stiff competition, but I'm not expecting it to be significantly faster than the Raptor Lake refresh coming out soon. Contrary to the wide-jawed fool and the greasy doughy lad, I expect Arrow Lake to be anywhere from 5-12% behind Zen 5 in performance while pushing higher watts, regardless of any fancy footwork on Intel's behalf to get that under control, which they'd soon violate anyway.
Arrow Lake will be on a superior node. It will very likely beat Zen 5 in terms of perf/watt unless AMD completely switches up the design.

I have seen a few indications that Zen 5 might be “more of the same”. Disappointing, if true. I hope to see further innovations from AMD. The multidie and multichiplet designs were both incredible.
 

soresu

Diamond Member
Dec 19, 2014
4,105
3,566
136
That would be a 2-year gap again - not optimal.

Any ideas what features it could have that are not present in Mi300?

Certain things are obvious:
- more advanced process nodes for all the compute units
- latest gen CPU and GPU CCDs
- latest iteration of HBM, with greater capacity

It seems to me that AMD could, in theory, release a new version in the Mi line with Zen 5 Dense cores on N3 (doubling the CPU core count) and possibly the GPU unit also on N3, with a modest increase in HBM specs.

Eventually, in future generations, AMD is probably aiming to have all of the internal links be SoIC, plus some optical links to the outside world...
Is there still no sign of HBM in the consumer CPU space?

You would think with HMC going belly up that the HBM people would exploit this.
 

soresu

Diamond Member
Dec 19, 2014
4,105
3,566
136
unless AMD completely switches up the design
From the way Papermaster has been talking it up years in advance, it is either a Bulldozer-level flop from switching to a snazzy but unproven µArch - or a significant leap forward, even if only at a foundational level for future µArch improvements, while giving a decent first-gen 15-20% improvement over previous Zen designs.

The latter seems much more likely - I don't think that AMD would risk a big change at the core µArch level after the roiling hellscape that was Bulldozer's CMT.
I hope to see further innovations from AMD. The multidie and multichiplet designs were both incredible.
On this score you can be certain AMD has much planned in advance.

There were slides from them some time ago outlining possible future 3D packaging and logic designs, including monolithic multi-layer 3D logic (and likely memory too), which would seem like the natural way forward given their past moves.

I'm most interested to see if they shake up the V-Cache game with non-SRAM cache designs now that they have started the ball rolling on decoupling cache again from the main die - newer MRAM types like Spin-Orbit Torque could do well vs. SRAM in L3 while dramatically reducing die area.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,647
5,186
136
Is there still no sign of HBM in the consumer CPU space?

You would think with HMC going belly up that the HBM people would exploit this.
The datacenter GPU market is in such a feeding frenzy that HBM will continue to be priced at very high premiums.

Which is bad news short term.

But the volume of HBM production is going to get to such a high level that the costs of production will continue to decline, maybe getting down to the consumer price range.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
I'm most interested to see if they shake up the V-Cache game with non-SRAM cache designs now that they have started the ball rolling on decoupling cache again from the main die - newer MRAM types like Spin-Orbit Torque could do well vs. SRAM in L3 while dramatically reducing die area.
They'll need SRAM (or something equivalently low latency) for anything up through L3 cache, but if they add an additional memory side cache, they could afford to focus more on capacity. That could be an opportunity for different technology. Probably would be a volatile memory, however. Doubt MRAM has any appeal.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
Is there still no sign of HBM in the consumer CPU space?

You would think with HMC going belly up that the HBM people would exploit this.
No, and I would be shocked if we ever do see it. SoC designers have plenty of choices with stacked SRAM (like AMD's v-cache), which improves performance at a lower cost than putting HBM on package. Or, if you mean as main system RAM, as pointed out, it is just too expensive for that - DRAM is produced in massive quantities.
 

soresu

Diamond Member
Dec 19, 2014
4,105
3,566
136
The datacenter GPU market is in such a feeding frenzy that HBM will continue to be priced at very high premiums.

Which is bad news short term.

But the volume of HBM production is going to get to such a high level that the costs of production will continue to decline, maybe getting down to the consumer price range.
Ah, interesting. I guess I'd just figured that in the 6 years since Vega it would already have been scaled up enough for that; put a pin in that for now then 😅

Hopefully 3D DRAM comes along in the interim and scales up the per-die capacity too - as it is now, it's going to need at least 4 stacks just to reach the DDR4 maximum on consumer platforms (TR excluded).
 
  • Like
Reactions: Joe NYC

soresu

Diamond Member
Dec 19, 2014
4,105
3,566
136
They'll need SRAM (or something equivalently low latency) for anything up through L3 cache, but if they add an additional memory side cache, they could afford to focus more on capacity. That could be an opportunity for different technology. Probably would be a volatile memory, however. Doubt MRAM has any appeal.
MRAM comes in several flavors, and the latest, like SOT-MRAM, have been pushing into SRAM's latency category for L3 cache since 2015.

If they can crack the voltage gate assisted SOT MRAM variant (VG-SOT MRAM) then they could get all of that at even lower power too with field free switching - such as that demonstrated by IMEC back in February: