I wonder if there is a way to retrofit the AM5 socket to move to the Mi300 approach on the client side. Maybe it is not worth it, and maybe AMD will offer an alternative to AM5 sometime between Zen 5 and Zen 6. Strix Halo may potentially be the harbinger of this.
Do you mean the RAM approach or the packaging?
Regarding packaging: Maybe I am oversimplifying things, but anything that happens "above the pins" should be socket agnostic, as long as the socket provides enough signal and power traces as well as enough surface area. So I see no reason why they should not be able to change the packaging on AM5.
Regarding memory: I expect them to go the Apple route for future mobile SoCs and that is what we hear already. And yes, that won't work for the current socket IMHO. But that's only a problem for OEMs and not so much for customers.
Yep, exactly my line of thinking. The rumor has it that Mi300 does not have any local motherboard memory, only on-package HBM and CXL links for additional external memory.
Remember when Lisa Su said that the package would be the new motherboard? The biggest part of this is to eliminate memory from the motherboard and move it to the package.
Apple client computers have made this move without suffering any adverse effects. The ability of OEMs to pair AMD CPUs with garbage memory is only to AMD's detriment, so having fast enough memory inside the package would be to AMD's benefit.
The memory can be LPDDR5 at the outset, for lowest cost, biggest bang for the buck, and later possibly HBM memory.
So, hopefully, AMD is already planning on this approach for the future generations. We will see if Strix Halo (as was leaked by MLID) will have this sort of socket, or if it is only a notebook chip.
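To put numbers on the "fast enough memory inside the package" point, here is the peak-bandwidth arithmetic for a few illustrative configurations; the specific speeds and bus widths below are my assumptions for illustration, not confirmed specs of any product:

```python
# Peak DRAM bandwidth = transfer rate (MT/s) x bus width (bytes).
# The three configurations are illustrative assumptions, not product specs.
def peak_gb_s(mt_per_s: int, bus_bits: int) -> float:
    return mt_per_s * (bus_bits / 8) / 1000

configs = [
    ("Socketed DDR5-5600, 128-bit (2 channels)", 5600, 128),
    ("On-package LPDDR5X-8533, 256-bit",         8533, 256),
    ("One HBM3 stack, 6.4 Gbps/pin, 1024-bit",   6400, 1024),
]
for name, rate, width in configs:
    print(f"{name}: ~{peak_gb_s(rate, width):.0f} GB/s")
```

Even a modest on-package LPDDR5 configuration roughly triples what a typical two-channel DIMM setup delivers, which is the whole appeal of pulling memory onto the package.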
You mean this?

That could be likely; any changes would likely be triggered by DC and AI first.
Any idea about this snippet below I screen shotted from LinkedIn?
[Attachment: screenshot of the LinkedIn post]
That seems like the most logical progression. Mi400 will get Zen 5 cores, and cloud providers / hyperscalers start migrating from local motherboard memory to pooled CXL memory.

I doubt that will ever happen. Latency is a thing; CXL will mostly be used where large pools of memory are needed, not to replace local memory. That will migrate to the package.
HBM would be providing the local memory. It is a little limited at 128 GB, so there are some tradeoffs.
I remember an interview with an AMD person discussing this briefly quite a while ago, but I don't remember who it was or when. Such long bridges sounded problematic. It might be that they still run a SerDes-type link, but route it through the embedded silicon to reduce power and increase speed?

When Zen 2 came out in 2019, it was a radical change in CPU packaging as well as core scaling with the SDF/SCF.
Five years later, in 2024, we would hope something new comes along to address the shortcomings of this technology for next-gen CPUs.
AMD will have to address the problems below with a new interconnect; even their competitor is using much more exotic back-end packaging in current-gen products. But I wouldn't hold my breath; next-gen CPUs may well be stuck on the same tech.
- Latency is a known issue with AMD's IFOP; to address it, a few LinkedIn posts put next-gen IF at 64 Gbps. That is a big jump and could have a major efficiency impact if the same IFOP approach is used going forward.
- Bandwidth was not an issue with earlier cores, but Zen 4 showed signs of bandwidth deficiency in many workloads with the 36 Gbps IFOP.
- Power: 0.4 pJ/bit for the GCD-MCD links vs. ~2 pJ/bit for IFOP speaks for itself. GLink is being advertised at 0.25 pJ/bit (see the back-of-envelope sketch after this list).
- Die area consumed by IFOP on Zen 4 is ~7 mm² of a 66 mm² die (excluding the SDF/SCF that is part of L3); that is roughly 10% of costly N4P, and soon N3E, silicon. The GCD-MCD links have demonstrated a smaller beachfront for higher bandwidth density; GUC's GLink, for instance, needs 3 mm of beachfront to provide 7.5 Tbps of bandwidth.
- Trace density from the IOD to the CCDs and I/O. It seems a limit has already been reached on how much space is available to route signals from the IOD to the CCDs, considering that space is also needed for I/O, memory, and other traces.
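To put the pJ/bit and beachfront numbers above into more tangible units, a quick back-of-envelope sketch; the 2 Tbps example link bandwidth is just an assumed round number, while the energy and beachfront figures are the ones quoted in the list:

```python
# 1 pJ/bit at 1 Tbit/s equals exactly 1 W, so power scales linearly with both.
def link_power_w(pj_per_bit: float, bandwidth_tbps: float) -> float:
    return pj_per_bit * bandwidth_tbps

def bw_density_tbps_per_mm(bandwidth_tbps: float, beachfront_mm: float) -> float:
    return bandwidth_tbps / beachfront_mm

for name, pj in [("IFOP-style SerDes (~2 pJ/bit)", 2.0),
                 ("GCD-MCD fanout (0.4 pJ/bit)", 0.4),
                 ("GLink (0.25 pJ/bit)", 0.25)]:
    print(f"{name}: {link_power_w(pj, 2.0):.1f} W at an assumed 2 Tbps")

print(f"GLink beachfront density: {bw_density_tbps_per_mm(7.5, 3.0):.1f} Tbps/mm")
```

At chiplet-level bandwidths the gap between ~2 pJ/bit and sub-0.5 pJ/bit is a few watts per link, multiplied by however many CCD links a package carries.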
As for the costs, AMD is already doing fanout links on a 750 USD GPU product whose actual chip is probably sold at half of that if not less, and with six of these links. And SoIC parts like the 5800X3D are selling for less than 300 USD.
I noticed one patent that attempts to address this by using normal fanout when the CCDs sit in a single column on each side of the IOD, and adding bridges in the fanout when there are multiple columns of CCDs. So basically the same CCD/IOD can be used but packaged differently for different configs.
https://www.freepatentsonline.com/11469183.html
[Attachment: figure from the patent]
Rumors of MI300C have been floating around; let's see if it is real in a couple of days. It could be a precursor.
AMD may be at a disadvantage if they do not have anything other than HBM on package and can't connect off-package memory other than via CXL. Grace Hopper has up to 512 GB of LPDDR5X per module and up to 96 GB of HBM3; AMD may only have 128 GB of HBM3. Even in a 4-socket system, that would only be 512 GB vs. 2 TB for Nvidia. I don't know if Nvidia and AMD are using the HBM in exactly the same way, though.
HBM3 theoretical maximum for 8 stacks is 512 GB, but that's not what AMD is going to be able to deliver in the first iteration.
The problem for hyperscalers is the cost of memory. It is the most expensive part of the server, and it ends up very under-utilized. So a pool of memory serving multiple servers can be a fraction of the capacity and a fraction of the cost, at the price of extra latency.
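As a rough illustration of that under-utilization argument, a toy model; the 55% aggregate-peak figure is a made-up assumption, not a number from any vendor:

```python
# Toy model of DRAM pooling economics. If per-server peaks rarely line up,
# a shared CXL pool sized for the aggregate peak can be much smaller than the
# sum of worst-case local provisioning. All numbers are illustrative assumptions.
servers = 32
local_gb_per_server = 1024        # worst-case local provisioning per server
aggregate_peak_fraction = 0.55    # assumed fraction of total capacity ever needed at once

dedicated_total = servers * local_gb_per_server
pooled_total = dedicated_total * aggregate_peak_fraction

print(f"Dedicated DRAM across the rack: {dedicated_total} GB")
print(f"Shared pool sized for aggregate peak: {pooled_total:.0f} GB")
print(f"Capacity (and roughly cost) saved: {1 - pooled_total / dedicated_total:.0%}")
```

The saving comes straight out of the most expensive line item in the server, which is why hyperscalers accept the extra latency for the cold portion of their working sets.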
Are we sure they don't have flat-mounted MRDIMMs or something like that under those heat sinks?
It looks like these are likely to be used in 8-socket systems, so that will be 1 TB, or 1.5 TB for the 192 GB GPU-only version. This may still be problematic for some HPC applications that need to hold a large amount of data in memory; multiple TB of system memory is sometimes required. With the HBM acting as cache, having some CXL to back it may work in some cases. I don't really know how the GPU-only systems will be set up. The 8-socket GPU-only module looks relatively compact, but I am not that familiar with the OCP/OAM form factor. I was wondering if the GPU-only 8-way node would be paired with SP5 processors for up to 12 TB of memory with 2 sockets.
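Rough math behind the 1 TB / 1.5 TB node figures and the 12 TB host number; the 256 GB 3DS DIMMs at 2 DIMMs per channel are my assumption for how the 12 TB is reached:

```python
# HBM per 8-way node, using the per-package capacities discussed in this thread.
stacks_per_package = 8
packages_per_node = 8

for name, gb_per_stack in [("128 GB package (16 GB stacks)", 16),
                           ("192 GB package (24 GB stacks)", 24)]:
    per_package = stacks_per_package * gb_per_stack
    per_node = packages_per_node * per_package
    print(f"{name}: {per_package} GB each, {per_node / 1024:.1f} TB per 8-way node")

# Host DRAM with two SP5 (Genoa) sockets, assuming 256 GB DIMMs at 2 per channel.
host_tb = 2 * 12 * 2 * 256 / 1024   # sockets x channels x DIMMs/channel x GB/DIMM
print(f"2x SP5 host DRAM: {host_tb:.0f} TB")
```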
Theoretical max for HBM3 memory is 64 GB per stack, which would be 512 GB for the Mi300.
MI400 is H2 2025 at best.
The way to get from 128 GB to 512 GB is to double the memory chip size and then to double the stack height from 8-high to 16-high.
AMD is apparently going to use Samsung for HBM, while NVidia is using Hynix. Hynix has been leading the HBM area, but Samsung is right behind.
Samsung is planning production of 12-high stacks for Q4, which coincides with the Mi300x ramp and is what allows the 192 GB capacity.
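The stack arithmetic behind the 128 GB, 192 GB, and 512 GB figures; the 2 GB (16 Gb) per-die capacity today and 4 GB (32 Gb) later are my assumptions about the DRAM dies being used:

```python
# HBM capacity = stacks x dies per stack x GB per die.
stacks = 8
configs = [
    ("8-high stacks of 2 GB dies (the 128 GB case)",    8, 2),
    ("12-high stacks of 2 GB dies (the 192 GB Mi300x)", 12, 2),
    ("16-high stacks of 4 GB dies (the 512 GB target)", 16, 4),
]
for name, high, gb_per_die in configs:
    print(f"{name}: {stacks * high * gb_per_die} GB")
```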
The way AMD is planning on selling Mi300x, with 2 Genoa sockets and 8 Mi300x, it competes okay with an Nvidia system of 2 Sapphire Rapids (8 memory channels vs. 12 for Genoa), and also on GPU memory, where Mi300x would be clearly ahead in capacity.
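On the host-bandwidth side of that comparison, assuming DDR5-4800 on both platforms (my assumption for a like-for-like number):

```python
# Peak host DRAM bandwidth = channels x data rate (MT/s) x 8 bytes per channel.
def host_bw_gb_s(channels: int, mt_per_s: int) -> float:
    return channels * mt_per_s * 8 / 1000

print(f"Genoa, 12-channel DDR5-4800:          {host_bw_gb_s(12, 4800):.0f} GB/s per socket")
print(f"Sapphire Rapids, 8-channel DDR5-4800: {host_bw_gb_s(8, 4800):.0f} GB/s per socket")
```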
On the APU side, Grace Hopper would have some extra memory attached to the Grace CPU, but since that memory is not unified, there would be duplication and copying back and forth to Hopper. Still, Nvidia will have more memory when it is released. We will see how customers perceive the tradeoffs.
AMD has not mentioned Mi300c, the CPU-only version, so it may come after Mi300a and Mi300x. If it does, it will likely have the option of using 12-high stacks for 192 GB of memory.
When it comes to Zen 5 (Turin), from some leaks it seems to be moving forward on schedule, and we may see it out in H1 2024.
And then the next-gen Mi400, likely in H2 2024. By then, memory makers may be able to offer 48-64 GB HBM stacks, so we could have 512 GB on Mi400.
The time to pull the plug on local motherboard memory is approaching. Faster on the datacenter side of things.
That would be a 2-year gap again - not optimal.
Joe, I agree. Recently I was reading some translated chatter about Zen 5 coming sooner than most of us were anticipating many months ago; that 2H target may be closer to the start of the half than the end of the year. Arrow Lake will have some stiff competition, and I'm not expecting it to be significantly faster than this Raptor Lake refresh coming out soon. Contrary to the wide-jawed fool and the greasy doughy lad, I expect Arrow Lake to be anywhere from 5-12% behind Zen 5 in performance while pushing higher watts, regardless of any fancy footwork on Intel's behalf to get it under control, which they'd soon violate anyway.

Arrow Lake will be on a superior node. It will very likely beat Zen 5 in terms of perf/watt unless AMD completely switches up the design.
Is there still no sign of HBM in the consumer CPU space? You would think, with HMC going belly up, that the HBM people would exploit this.
Any ideas on what features Mi400 could have that are not present in Mi300?
Certain things are obvious:
- more advanced process nodes for all the compute units
- latest gen CPU and GPU CCDs
- latest iteration of HBM, with greater capacity
It seems to me that AMD could, in theory, release a new version in the Mi line with Zen 5 dense cores on N3 (doubling the CPU core count) and possibly the GPU unit also on N3, with a modest increase in HBM specs.
Eventually, in future generations, AMD is probably aiming to have all of the internal links be SoIC, with some optical links to the outside world...
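For what it's worth, one way the "doubling the CPU core count" arithmetic could work, assuming a dense Zen 5 CCD keeps the 16-core layout of today's Zen 4c die (that last part is my assumption):

```python
# Mi300a carries 3 CPU CCDs with 8 Zen 4 cores each; a dense CCD packs 16 cores.
ccds = 3
print(f"Mi300a today:                 {ccds * 8} cores")
print(f"Same 3 CCD sites, dense CCDs: {ccds * 16} cores")
```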
From the way Papermaster has been talking it up years in advance, it is either a Bulldozer-level flop from switching to a snazzy but unproven µArch, or a significant leap forward, even if only at a foundational level for future µArch improvements, while giving a decent first-gen 15-20% improvement over previous Zen designs.
I hope to see further innovations from AMD. The multi-die and multi-chiplet designs were both incredible.

On this score you can be certain AMD has much planned in advance.
The datacenter GPU market is in such a feeding frenzy that HBM will continue to be priced at very high premiums.
I'm most interested to see if they shake up the V-cache game with non-SRAM cache designs, now that they have started the ball rolling on decoupling cache from the main die again; newer MRAM types like Spin Orbit Torque could do well vs. SRAM in L3 while dramatically reducing die area.

They'll need SRAM (or something equivalently low latency) for anything up through L3 cache, but if they add an additional memory-side cache, they could afford to focus more on capacity. That could be an opportunity for a different technology. It would probably be a volatile memory, however. I doubt MRAM has any appeal.
No, and I would be shocked if we ever do see it. SoC designers have plenty of choices with stacked SRAM (like AMD's V-cache), which improves performance at a lower cost than putting HBM on package. Or, if you mean as main system RAM, as pointed out, it is just too expensive for that; DRAM is produced in massive quantities.
Ah interesting, I guess I'd just figured that in the 6 years since Vega it would already have been scaled up enough for that. Put a pin in that for now then 😅
Which is bad news short term.
But the volume of HBM production is going to get to such a high level that the costs of production will continue to decline, maybe getting down to the consumer price range.
MRAM comes in several flavors, and the latest, like SOT-MRAM, have been pushing into SRAM's latency category for L3 cache as far back as 2015.