Hi guys, since STX1 should be moving to a chiplet design, there has been some discussion about which layout AMD will employ. I have listed 3 designs with pros and cons.
Cyan indicates the N3E process, orange the N4 process. AMD should use 2 MCDs, each containing 16MB of Infinity Cache and a 2 x 32-bit LPDDR5X-8533 memory controller.
I created 2 versions of RDNA3+ because there were rumors of 24 CUs with 1536 SPs. That would put STX1 at around 18 TFLOPS. I have my doubts, but Intel is rumored to have a 320 EU part with 2560 ALUs in the works, so AMD may need to respond.
See for yourself and let me know which one you think AMD will employ. If you have other ideas, do let me know...
Oh yeah, I used PHX1 as the scale baseline: a 178mm2 die. N3E no doubt has higher density, but STX has 8 Zen 5 cores and 4 Zen 4c cores (plus 8MB of L3 cache?), RDNA3+, and other improvements. For the sake of comparison, let's stick with 178mm2.
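The two headline numbers above are easy to sanity-check with back-of-envelope math. In this sketch the SP count and the 2 x 32-bit LPDDR5X-8533 configuration come from the post, while the 3.0 GHz GPU clock is purely my own assumption:

```python
# Back-of-envelope checks for the numbers above. The SP count and the
# 2 x 32-bit LPDDR5X-8533 config are from the post; the 3.0 GHz GPU
# clock is purely an assumption for illustration.

def gpu_tflops(shaders: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS. RDNA3-style SPs are dual-issue (2 ALUs per SP),
    and each ALU can retire one FMA (2 FLOPs) per clock."""
    alus = shaders * 2
    return alus * 2 * clock_ghz / 1000.0

def mem_bandwidth_gbs(mt_per_s: int, bus_bits: int) -> float:
    """Peak DRAM bandwidth in GB/s from transfer rate and bus width."""
    return mt_per_s * (bus_bits / 8) / 1000.0

print(gpu_tflops(1536, 3.0))          # ~18.4 TFLOPS, in line with the rumor
print(mem_bandwidth_gbs(8533, 128))   # ~136.5 GB/s for 2 MCDs x 64-bit
```

At ~3 GHz the 1536 SP configuration lands right at the rumored ~18 TFLOPS, and the combined 128-bit LPDDR5X-8533 bus gives roughly 136 GB/s.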
RDNA3+ 768SP (1536 ALU) / RDNA3+ 1536SP (3072 ALU) - Pros and Cons
- Derived from desktop 7000 CPU
- CCD contains CPU with L3 cache
- IOD contains graphics, AIE and FCH
- Small CCD saves cost
- IOD can get pretty big with more features, e.g. doubling the graphics SP count
- Diagram leaked by RGT
- CCD contains CPU + IOD
- GCD contains graphics only
- CCD die is comparatively big
- GCD can come in multiple sizes
- AMD could swap the GCD depending on the competition
- A GCD with 1536SP will draw more power; my estimate would be an additional 10-15W
- That's why AMD would create a monolithic version of STX2 to cater to the ultraportable market
- Wild speculation: in 2025, STX+ with RDNA4 and 1536SP
- Half the SP count of N43
- On N3E, the GCD should consume less power than the STX version
- All cores in one die, with external cache + memory controller dies
- Almost like a 3nm version of the M2 Pro, with Infinity Cache
- BOM will be the highest
- Lack of flexibility
"There is going to be too much capacity. TSMC did not actually start F12P9, which was meant for Intel, only F12P8. Intel has cut orders. Plenty of N3 capacity will be there by the beginning of 2024: F18 P5/6/7 are for N3, and F18P4 was also running N3 during risk production in 2022. By 2H24, Fab18P8 will be online. No more expansion is planned for N3, and TSMC is actually starting to build N2 fabs now; they already have clearance to build behind Fab12P9 in Hsinchu."

F12P8 was for Intel, I thought...
Among them, P5 of Fab 18B at the Southern Taiwan Science Park is mainly for the second-generation processors of Apple's next-generation tablets and MacBook notebooks, while the new eighth phase (P8) of the Fab 12 R&D center in Hsinchu is for Intel, where TSMC will produce the supporting chips that go alongside Intel's core processors.
F12P8 and F12P9 are meant for Intel. TSMC did not start F12P9.
They already said N3 and N4 for Zen 5. Also, on LinkedIn some engineers mentioned 64 Gbps GMI on N3.
But 3D stacking would be more mature on N4, I'd guess, at the beginning of 2024.
"I know what was said."

That is assuming you know all there is to know and are ready to share. Otherwise there is not much to argue about until official disclosure.
This also tracks well with the recent report of TSMC seeking to woo its big customers into transitioning to the N3 family of nodes faster, allegedly even pitching price cuts/discounts.
"So my thought was that the only die-to-die connections are from the IOD-fast to each other die, so that would take care of the big-ticket items like feeding the CPU and GPU. Then the IOD-slow would only have to support enough bandwidth to run whatever IO you put on there — namely, stuff like PCIe, USB4, and a few other bits and pieces."

There is no "slow" IO die bit for Epyc. With 128 PCIe 5.0 lanes, that can be more than 500 GB/s of bandwidth, so it is similar to the memory interfaces. With CXL, it may actually be memory, so latency is important also.
I think the big question would actually be the GPU. It's likely to be the most bandwidth-intensive die, so that link might need special treatment. Perhaps IFOP for the CPU/IOD-slow and "Infinity Fanout Links" or EFB for the GPU?
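The 500 GB/s claim for 128 lanes is straightforward to verify; the 32 GT/s line rate and 128b/130b encoding below are standard PCIe 5.0 spec values:

```python
# Rough check of the ">500 GB/s from 128 PCIe 5.0 lanes" claim.
# 32 GT/s per lane and 128b/130b encoding are PCIe 5.0 spec values.

def pcie_gb_per_s(lanes: int, gt_per_s: float = 32.0) -> float:
    """Per-direction bandwidth in GB/s, net of 128b/130b encoding."""
    return lanes * gt_per_s * (128 / 130) / 8

print(pcie_gb_per_s(128))   # ~504 GB/s each direction
```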
Another reason I believe Zen 5 doubles the per-CCD core count: Turin will double from 96 cores to 192 cores in the same SP5 socket.
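The arithmetic only works if something doubles. In this sketch the 12-CCD package is Genoa's known layout; carrying it over to Turin and the 16-core CCD are my assumptions:

```python
# Genoa today: 12 CCDs x 8 Zen 4 cores = 96 cores. If the package keeps
# 12 CCDs (an assumption), reaching 192 cores in the same SP5 socket
# requires 16-core Zen 5 CCDs.
ccds = 12
genoa_cores = ccds * 8
turin_cores_guess = ccds * 16
print(genoa_cores, turin_cores_guess)   # 96 192
```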
"At the very least N5 --> N3E should yield as much perf as the N7 --> N5 transition if not better."

Even taking it at face value, that comparison seems to be against base N5. Comparing N3E to N5P or N4P would show a much smaller gap.
"Zen 4 CCD has an abysmally low density for N5 at ~93 MTr/mm2. The thermal hotspot also is a constraint."

I doubt that's something that would change much on a new node. With SRAM density near constant, you're not going to get any relief there, and the high-speed logic you need to hit ~6GHz is going to be quite low density. You'd only expect a change there if AMD backed off its frequency targets significantly.
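The ~93 MTr/mm2 figure is consistent with the commonly reported Zen 4 CCD numbers; both inputs below are from public reporting, not official AMD disclosures:

```python
# The inputs are commonly reported figures, not official AMD specs:
# ~6.57B transistors and ~70.7 mm^2 for the Zen 4 CCD.
transistors_millions = 6570
die_area_mm2 = 70.7
density = transistors_millions / die_area_mm2
print(round(density))   # ~93 MTr/mm^2, matching the quoted figure
```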
"I wouldn’t mind a frequency regression if it meant improved efficiency, but I doubt AMD will willingly drop frequencies. Many people still look at clock speed to this day as an indicator of performance."

I think clock speed can also tell us a bit about how "ground-up" the new architecture is. For an iterative design, at least, you don't want to budge your cycle times much, because that means re-tuning all those critical timing loops you spent so long refining.
"I don't see any advantage in having CPU+IOD in a single 3nm die, but IGP using a separate die."

It limits their flexibility, but combining the CPU and IOD (or at least the memory controller) would make some sense from a PnP perspective. Will be very, very curious to see where they draw the line, if at all. If Strix is still on N4/P, then I think they'd probably leave it monolithic.
Of the designs you made, the first or the last is most likely.
Yeah, that idea was just meant for client. For the datacenter chips, I think if they did a multi-chip IO die, they'd just chunk it up into 2 or 4 mirrored pieces, with something fast like EFB in between.
Someone in the Zen 4 thread had this idea (forgot who it was, but all credit goes to them)
If Zen 5c is indeed on an N3 node, that explains how AMD solves the die-size problem for Turin's Bergamo successor. I was expecting 32 Zen 5c cores per CCD, but the die area seemed like it would be too big on N4. So with N3, 32 * 8 = 256 cores seems possible... not sure about stacked cache though.
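A toy area model shows why 32 cores per CCD looks tight on N4 but plausible on N3. Every number in it (the per-core size, the fixed L3/fabric budget, the 0.7x logic scaling) is my own illustrative assumption, not a leak:

```python
# Toy CCD-area model. Every input is an illustrative assumption:
# ~2.5 mm^2 per Zen 4c-class core (incl. L2) on N4-class silicon,
# ~0.7x logic scaling moving to N3, and a fixed budget for L3 +
# fabric/PHY that barely shrinks because SRAM and IO scale poorly.

def ccd_area_mm2(cores: int, core_mm2: float, fixed_mm2: float) -> float:
    return cores * core_mm2 + fixed_mm2

n4_area = ccd_area_mm2(32, 2.5, 22.0)          # uncomfortably big on N4
n3_area = ccd_area_mm2(32, 2.5 * 0.7, 22.0)    # more plausible on N3
print(round(n4_area), round(n3_area), 32 * 8)  # 102 78 256
```

Under these assumptions the N3 version drops back into typical CCD territory, which is the poster's point about why N3 makes the 256-core part possible.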
Seems I keep forgetting what I posted on the first page. From these slides, it would seem desktop Zen 5 will be on N4, with Zen 5c and Strix ("Advanced Node") on N3.
Zen 5 and Zen 5 3D V-Cache would make sense on N4; the 3D stacking tech is probably not ready on N3.
I forgot that they said EPYC will have XDNA.
"No it doesn't. The process used gains minimally in density, and the uarch improves, which means the core size will at best be equal, if not bigger."

Of course, I know. Please refer to the front page of this thread for a better idea of how AMD might put 16 cores per CCD on N4 within the same die area...
"The V-cache die is not the limitation. The bonding process is. I read somewhere that TSMC currently has a production limit of about 30K V-cache CPUs per month. Has there been any progress in increasing that rate?"

Didn't know about the bonding issues until now. But I would expect this to be resolved as long as there is demand.
"...then 600K CPUs per quarter are available for Intel and AMD to pursue."

Not sure if Intel can have access to V-cache. I think it's more likely that AMD has reserved all V-cache capacity at TSMC's fabs for the next few years. Depending on their deal, it could even be perpetually exclusive, where only AMD gets to enjoy V-cache, because maybe they had a big hand in helping TSMC develop it.
Nah, I doubt that. My understanding is that the tools they're using for hybrid bonding come from one supplier, and so far they've basically just been shipping trial systems unsuited to volume production. AMD's probably been the guinea pig for it. This year or next, they should get the proper equipment, and things should really ramp up.
Don't you find it weird that no one has V-cache on their roadmap other than AMD?
Not really, in light of what I just said. TSMC needed something they could make in fairly low volume, and the customer couldn't be dependent on the tech this early in its life cycle. AMD fit the bill. I'm sure in a year or two we'll start to see broader adoption.
I don't think that necessarily follows. It's a reasonable assumption, but AMD's clearly trying not to publicly tie any core or product to a given node. The "Advanced Node" label for Strix is them deliberately withholding that info for now. Tbh, it wouldn't be surprising if Strix uses N4P and the server parts use N3.