MLID said it is a giant interposer. Also, I don't know if this is true, but AMD may have dumped EFB, at least for now. OTOH, AMD's Papermaster mentioned that there may be a future EFB with hybrid bonding.

But not a giant silicon-based monolithic interposer. Maybe a giant RDL interposer. Mark my words: although it is only my humble opinion, I am dead sure about this.

I find Dylan makes a mountain out of every molehill, but MLID is in a different league: not only is he barely literate on such matters, he just makes things up.

Maybe the specifics are missing. If we are talking about 2.5D packaging, then an RDL fan-out package makes more sense if the trace counts are low and no active routing logic is needed between the dies. A giant Si interposer makes less sense when EFB is there. For 3D stacking, the base die can basically be an Si interposer with functional logic. But I bet MLID would say "I told you so" even if he had said Si interposer from the beginning.
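As a purely illustrative way to capture that trade-off (the thresholds and categories below are my own assumptions, not anything AMD or TSMC have published), a toy selector might look like this:

```python
# Toy heuristic encoding the 2.5D/3D packaging trade-off described above.
# Thresholds and categories are illustrative assumptions, not AMD/TSMC rules.

def pick_packaging(trace_count: int, needs_active_routing: bool,
                   links_are_localized: bool) -> str:
    if needs_active_routing:
        # Routing logic between dies -> put it in silicon (base die / interposer).
        return "active base die (3D stack) or silicon interposer"
    if trace_count < 10_000 and not links_are_localized:
        # Modest trace counts, passive routing only -> organic RDL fan-out is enough.
        return "RDL fan-out (InFO-R style)"
    # Dense but short, die-edge-to-die-edge links -> small embedded bridges.
    return "embedded silicon bridge (EFB/EMIB style)"

print(pick_packaging(trace_count=5_000, needs_active_routing=False,
                     links_are_localized=False))   # RDL fan-out
print(pick_packaging(trace_count=50_000, needs_active_routing=True,
                     links_are_localized=True))    # base die / interposer
```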
On MI300, MLID has been very good. Also, another thing MLID was the first to say is that the V-Cache on Zen 4 will be N6, which the guy from Tom's Hardware let slip yesterday. Anyway, this would mean that TSMC can stack N5 on N6 (MI300) and N6 on N5 (Zen 4).

I always found the assumed different-node stacking issue strange. The interface, where the bond actually happens, is where the dies must match. Why would it matter what the interior logic size is, when it is not directly involved? Validation would take time, but stacking different nodes appears perfectly normal.

Drive currents, voltage ranges, CTEs etc. differ between nodes, so they need to take care of such things.

So basically, design for it.
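As a rough sketch of the "only the interface has to match" argument (a toy model of my own; the pitch and voltage numbers are illustrative, not actual SoIC specs):

```python
# Toy model: hybrid-bond compatibility depends on the interface (bond pitch,
# I/O voltage), not on the process node of the logic behind it.
# All numbers are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Die:
    name: str
    node: str            # internal process node (not part of the bond check)
    bond_pitch_um: float
    io_voltage_v: float

def can_stack(top: Die, base: Die) -> bool:
    # The bond interfaces must match; CTE/voltage mismatches are handled by
    # designing both dies to the same interface spec ("design for it").
    return (top.bond_pitch_um == base.bond_pitch_um
            and abs(top.io_voltage_v - base.io_voltage_v) < 0.05)

vcache = Die("V-Cache", node="N6", bond_pitch_um=9.0, io_voltage_v=0.75)
ccd    = Die("CCD",     node="N5", bond_pitch_um=9.0, io_voltage_v=0.75)
print(can_stack(vcache, ccd))   # True: different nodes, same interface
```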
I couldn't find that on THW - do you have a link? Nevertheless, it is quite likely that even THW sometimes echoes things MLID mentions.
A lot of different stuff has been invented to avoid using a giant silicon interposer, so I don't know if I believe that it is a single piece of silicon, if that is what they are trying to say. I would definitely doubt that it is a single, monolithic silicon interposer under all 4 groups of chiplets. It is much more plausible that it is a separate interposer under each set of 2 GPU chiplets (4 silicon interposers total), but even that seems like more than necessary.
Note that much of the advanced packaging technology has reticle size limits. Even if they are talking about a reticle size limit, that doesn't mean it is a single, monolithic piece of silicon. It seems more likely to be some form of LSI (local silicon interconnect) plus RDL, which can use very thin pieces of silicon embedded under the die. TSMC has a bunch of different forms of this, which all have reticle size limits, although those are likely at 3 or 4x now. I believe the thin pieces of silicon can be just passive interconnect or active chiplets, so it seems plausible that they could use an MCD-type chiplet under the compute die.
This old link is still a good overview: https://www.anandtech.com/show/16051/3dfabric-the-home-for-tsmc-2-5d-and-3d-stacking-roadmap
None of these use a full silicon interposer. I don't know if the infinity fabric fan-out that they are using for RDNA3 with MCD matches any of these, so that may be something new. I thought they indicated that it was not embedded silicon. I believe they said something about it being derived from tech originally meant for mobile use.
The last slide from the link above looks a lot like the "EFB" that AMD has talked about. It appears to have copper pillars (TIV) that elevate the main chip and allow other chiplets to be embedded underneath. It also shows an SoIC stacked die (like an MCD with v-cache) under other chiplets.
View attachment 76032
I agree with you that this is all highly confusing. AFAIK the fan-out used on N31 is more or less identical to InFO-R(DL). The redistribution layer is not silicon-based and has no real reticle limit, but it might be bound to the reticle limit in the sense that no single connection can exceed it. The same goes for EFB: as per my understanding, the reticle limit there applies to each single bridge, while the composition as a whole is more or less unlimited. Of course, I might be terribly wrong.
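To illustrate the point about the reticle limit applying per bridge rather than to the whole composition, here is a minimal sketch (my own toy numbers; the ~858mm2 figure is just the standard 26mm x 33mm maximum exposure field):

```python
# Each embedded bridge is its own small piece of silicon and must fit within
# one reticle field, but the package it stitches together can be much larger.
# Bridge sizes and the package footprint below are made-up examples.
RETICLE_MM2 = 26 * 33          # ~858 mm^2 maximum exposure field
PACKAGE_FOOTPRINT_MM2 = 2500   # hypothetical total area spanned by all dies

bridges_mm2 = [50, 45, 60, 55, 48, 52, 58, 47]   # hypothetical EFB/EMIB bridges

# Per-bridge check: every bridge fits comfortably inside a single reticle.
assert all(b <= RETICLE_MM2 for b in bridges_mm2)

# The composition as a whole is not reticle-bound; only a monolithic silicon
# interposer would need stitching once it exceeds the exposure field.
print(PACKAGE_FOOTPRINT_MM2 > RETICLE_MM2)   # True
```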
No, that is the "base die" - an N6-based die that will have the I/O, memory controllers, and SRAM. There are 4 of them, and each one is 300-350mm2.
Here is a picture from MLID that makes it clearer:
View attachment 76039
The silicon interposer is underneath all of these stacked dies, these being:
- 8x stacks of HBM
- 4x base dies, with the compute dies stacked on top of them.
As for the connections in the picture above, each of the 4 pairs of HBM stacks most likely only needs to talk to its adjacent base die.
So, one way to save on the size of the silicon interposer would be to have those 4 connections use a different technology.
But the 4 base dies need to have a high bandwidth, low latency interconnect, so possibly the silicon interposer would only be under those 4 base dies.
Intel is using EMIB in SPR to connect the "tiles", but I think the bandwidth requirements of the disaggregated GPGPU is an order of magnitude (or more) higher bandwidth than what SPR requires.
BTW, this may mean nothing, but there is a rumor out there that AMD had yield issues with the EFB on the MI250.
Not for a single moment do I believe that AMD might produce a silicon interposer north of 1600mm2 - I do not even know if anyone in the world is able to do that. And, more importantly, there is absolutely no need.
The HBM stacks have the same bandwidth demand as the MCDs of N31, where InFO-R is sufficient. Apple has shown that you can produce chiplet GPUs by connecting them via a silicon bridge.
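To put rough numbers on why a monolithic interposer looks implausible here: taking the 300-350mm2 base-die figure from the post above, and assuming roughly 110mm2 per HBM stack and the standard ~858mm2 reticle field (both of those are my assumptions), a quick back-of-envelope check gives:

```python
# Back-of-envelope footprint for the layout described above:
# 4x base dies (300-350 mm^2 each, per the post) + 8x HBM stacks.
# The ~110 mm^2 HBM footprint and 858 mm^2 reticle field are assumptions.
RETICLE_MM2 = 26 * 33            # ~858 mm^2 single exposure field

base_die_mm2 = (300 + 350) / 2   # midpoint of the quoted 300-350 mm^2
hbm_stack_mm2 = 110              # rough HBM2e/HBM3 footprint (assumption)

total = 4 * base_die_mm2 + 8 * hbm_stack_mm2
print(f"{total:.0f} mm^2 of die area, {total / RETICLE_MM2:.1f}x the reticle")
# ~2180 mm^2, i.e. ~2.5x the reticle even before die-to-die spacing, which is
# why stitched interposers, bridges, or fan-out get used instead.
```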
Found an interesting patent from AMD for increasing IPC by concurrently executing both sides of a branch instruction.
ALTERNATE PATH FOR BRANCH PREDICTION REDIRECT
View attachment 76053
They would need a good bump in the size of the register file and the other OoO resources to pull this off.
Reposting earlier patents for increasing decode width and adding multiple op cache pipelines, which did not make it into Zen 4.
Not sure what "Re-pipelined front end and wide issue" is going to be, or whether it will include such patents at all, but interesting regardless.
PROCESSOR WITH MULTIPLE FETCH AND DECODE PIPELINES
View attachment 76051
"A processor employs a plurality of fetch and decode pipelines by dividing an instruction stream into instruction blocks with identified boundaries. The processor includes a branch p..." (www.freepatentsonline.com)
PROCESSOR WITH MULTIPLE OP CACHE PIPELINES
View attachment 76052
"A processor employs a plurality of op cache pipelines to concurrently provide previously decoded operations to a dispatch stage of an instruction pipeline. In response to receiving..." (www.freepatentsonline.com)
Compressing Micro-Operations in Scheduler Entries in a Processor
View attachment 76056
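For intuition on the "multiple fetch and decode pipelines" patent above, here is a minimal toy sketch (my own simplification, not the patented mechanism): the instruction stream is cut into blocks at identified boundaries - taken branches here - and the blocks are decoded on independent pipelines concurrently, then reassembled in program order at dispatch.

```python
# Toy model of splitting an instruction stream into blocks at identified
# boundaries and decoding the blocks on independent pipelines.
# The "ISA" and the boundary rule are invented for illustration.
from concurrent.futures import ThreadPoolExecutor

stream = ["add", "mul", "jcc.taken", "load", "store", "jcc.taken", "sub", "ret"]

def split_blocks(insts):
    """Cut the stream after every instruction flagged as a taken branch."""
    blocks, cur = [], []
    for inst in insts:
        cur.append(inst)
        if inst.endswith(".taken"):
            blocks.append(cur)
            cur = []
    if cur:
        blocks.append(cur)
    return blocks

def decode(block):
    # Stand-in for a decode pipeline turning instructions into micro-ops.
    return [f"uop({inst})" for inst in block]

blocks = split_blocks(stream)
with ThreadPoolExecutor(max_workers=2) as pool:       # two decode pipelines
    decoded = list(pool.map(decode, blocks))

# Re-assemble in program order at "dispatch".
print([uop for block in decoded for uop in block])
```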
Thanks for posting. I'd say at least the first one is the kind of patent they apply for in the thousands. The kind of trade-offs involved make it possible that we might never see it in a product.

Actually, in one embodiment they said to use the resources which would otherwise have been used by SMT, so it is something they might try - but indeed it is just a patent, and a one-off at that. Still, let's say you disable SMT and get IPC gains - that sounds acceptable. From the patent:
For example, a processor (or processor core) that implements simultaneous multithreading executes a software thread along the main path using a first logical or physical pipeline (or first hardware thread) and the alternate path using a second logical or physical pipeline (or second hardware thread).
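A minimal sketch of what that embodiment describes, as I read it (toy code, not AMD's implementation): on a low-confidence branch, the spare SMT context executes the alternate path as well, and whichever path matches the resolved branch direction is kept.

```python
# Toy model of eager (dual-path) execution: when branch confidence is low,
# run both paths using a second hardware-thread context and keep the one
# that matches the resolved branch. Entirely illustrative.
from concurrent.futures import ThreadPoolExecutor

def taken_path(x):      # work done if the branch is taken
    return x * 2

def not_taken_path(x):  # work done if the branch falls through
    return x + 100

def execute_branch(x, predicted_taken, confidence, resolve):
    if confidence >= 0.9:
        # High confidence: normal speculative execution down one path only.
        result = taken_path(x) if predicted_taken else not_taken_path(x)
        return result if resolve(x) == predicted_taken else "flush + redirect"
    # Low confidence: borrow the second SMT context and run both paths.
    with ThreadPoolExecutor(max_workers=2) as smt_contexts:
        t = smt_contexts.submit(taken_path, x)
        nt = smt_contexts.submit(not_taken_path, x)
        return t.result() if resolve(x) else nt.result()  # discard the loser

print(execute_branch(7, predicted_taken=True, confidence=0.5,
                     resolve=lambda v: v % 2 == 0))
# 107: the branch resolves not-taken, so the not-taken path's result is kept.
```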
I am wondering if the feature they had with Xilinx Virtex UltraScale+ and Samsung Aquabolt-XL for PIM will make it to Zen 5 DC parts with HBM (MI300-type parts). Usually, recurring patents and provisional patents are good candidates for making it into a product (see the sketch after this list):

PROVIDING ATOMICITY FOR COMPLEX OPERATIONS USING NEAR-MEMORY COMPUTING
From <https://www.freepatentsonline.com/y2022/0413849.html>
APPROACH FOR REDUCING SIDE EFFECTS OF COMPUTATION OFFLOAD TO MEMORY
From <https://www.freepatentsonline.com/y2023/0004491.html>
ERROR CHECKING DATA USED IN OFFLOADED OPERATIONS
From <https://www.freepatentsonline.com/y2022/0318089.html>
DETECTING EXECUTION HAZARDS IN OFFLOADED OPERATIONS
From <https://www.freepatentsonline.com/y2022/0318085.html>
Processing-in-memory concurrent processing system and method
From <https://www.freepatentsonline.com/11468001.html>
OFFLOADING COMPUTATIONS FROM A PROCESSOR TO REMOTE EXECUTION LOGIC
From <https://www.freepatentsonline.com/y2022/0206855.html>
MEMORY ALLOCATION FOR PROCESSING-IN-MEMORY OPERATIONS
From <https://www.freepatentsonline.com/y2021/0303355.html>
Command throughput in PIM-enabled memory using available data bus bandwidth
From <https://www.freepatentsonline.com/11262949.html>
HARDWARE-SOFTWARE COLLABORATIVE ADDRESS MAPPING SCHEME FOR EFFICIENT PROCESSING-IN-MEMORY SYSTEMS
From <https://www.freepatentsonline.com/y2022/0066662.html>
PROCESSOR-GUIDED EXECUTION OF OFFLOADED INSTRUCTIONS USING FIXED FUNCTION OPERATIONS
From <https://www.freepatentsonline.com/y2022/0188117.html>
REUSING REMOTE REGISTERS IN PROCESSING IN MEMORY
From <https://www.freepatentsonline.com/y2022/0206685.html>
PRESERVING MEMORY ORDERING BETWEEN OFFLOADED INSTRUCTIONS AND NON-OFFLOADED INSTRUCTIONS
From <https://www.freepatentsonline.com/y2022/0206817.html>
Providing host-based error detection capabilities in a remote execution device
From <https://www.freepatentsonline.com/11409608.html>
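To make the appeal of these PIM/offload patents concrete (the sketch promised above; the numbers are purely illustrative): what processing-in-memory saves is not compute but data movement across the memory bus.

```python
# Back-of-envelope: bytes moved over the memory bus for a sum-reduction of
# N 8-byte elements, host-side vs. offloaded to processing-in-memory.
# Numbers are illustrative; real PIM command/result traffic is more complex.
N = 1_000_000_000          # one billion doubles
ELEM = 8                   # bytes per element

host_bytes = N * ELEM      # every element is read across the bus to the CPU
pim_bytes  = 64 + 8        # a command packet out, an 8-byte result back (assumed)

print(f"host: {host_bytes / 1e9:.1f} GB moved, PIM: {pim_bytes} B moved")
# The compute itself is unchanged; what PIM removes is the data movement,
# which is exactly what the offload/atomicity/ordering patents above manage.
```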
Unless it's a CPU optimized specifically to be the best gaming CPU, period.
Exactly this.

There has been research/work on this since at least the '90s, and while I believe a few CPUs may do it on an extremely limited basis (I've seen claims that Apple's big cores can run both paths in certain cases, though that may simply be to allow progress before the branch predictor has its result ready), no one has gone all-in on it, because branch predictors are so good these days that you won't get much out of it.

Sure, there are some branches that are essentially impossible to predict where it would be of benefit (so long as they aren't quickly followed by more such branches), but then you are paying a price in terms of additional transistors, the power to operate them, and verification time, for something that doesn't help you very often.
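To make "essentially impossible to predict" concrete, here is a small sketch (a plain 2-bit saturating counter, far weaker than a modern TAGE-class predictor) that scores near-perfectly on a regular loop-exit branch but only ~50% on a data-dependent random branch - the only kind of branch where eagerly running both paths would really pay off:

```python
# A 2-bit saturating-counter predictor: great on regular branches, ~50% on
# random ones. Real predictors are far stronger, which is the point above.
import random

def run(outcomes):
    state, correct = 2, 0            # 0-1 predict not-taken, 2-3 predict taken
    for taken in outcomes:
        correct += (state >= 2) == taken
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

random.seed(0)
loop_branch   = [i % 100 != 99 for i in range(100_000)]       # taken 99/100 times
random_branch = [random.random() < 0.5 for _ in range(100_000)]

print(f"loop-like branch:   {run(loop_branch):.3f}")    # ~0.99
print(f"data-random branch: {run(random_branch):.3f}")  # ~0.50
```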
Yeah, MLID's mock-up is what I meant by "a separate interposer under each set of 2 GPU chiplets (4 silicon interposers total)". That is plausible - much more plausible than a single giant silicon interposer. You would still need to connect them together somehow, with RDL or embedded bridge chips. The HBM would likely need embedded bridge chips, so you are up to possibly 3 layers of silicon die instead of 2. Four of the "base die" would probably be over 1200 mm2 of silicon, with each one not too far from an entire Epyc IO die in size.
It's certainly nothing new, but any company would be foolish to put something in a product that they don't have a patent on. Too many patent trolls out there who are eager for an opportunity at a potential payout.

If it's nothing new, the patent is invalid and completely wasted.