Discussion Zen 5 Discussion (EPYC Turin and Strix Point/Granite Ridge - Ryzen 8000)


Poll: What do you expect with Zen 5? (103 voters)

TESKATLIPOKA

Golden Member
May 1, 2020
1,280
1,527
106
Hi guys, since STX1 is expected to move to a chiplet design, there have been some discussions about which solution AMD will employ. So I have listed 3 designs with pros and cons.

Cyan indicates the N3E process, whereas orange means the N4 process. AMD should use 2 MCDs, each containing 16 MB of Infinity Cache and a 2 × 32-bit LPDDR5X-8533 memory controller.

I created 2 versions of RDNA3+ because there were rumors of 24 CU with 1536 SP. That means STX1 would have 18 TF ;). I have my doubts, but Intel is rumored to have 320 EU with 2560 ALUs in the works, so AMD may need to respond.
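For what it's worth, the ~18 TF figure checks out under RDNA3-style dual-issue; here is a back-of-envelope sketch, where the ~2.9 GHz clock is my own assumption for illustration, not part of the rumor:

```python
# Peak-FP32 estimate for the rumored 24 CU / 1536 SP RDNA3+ iGPU.
# Assumes RDNA3-style dual-issue (2 FP32 ALUs per SP) and a ~2.9 GHz clock;
# both are illustrative assumptions, not confirmed specs.
def peak_tflops(shader_processors, alus_per_sp=2, flops_per_alu=2, clock_ghz=2.9):
    # An FMA counts as 2 FLOPs per ALU per cycle.
    return shader_processors * alus_per_sp * flops_per_alu * clock_ghz / 1000

print(round(peak_tflops(1536), 1))  # ~17.8 TFLOPS, in line with the ~18 TF above
```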

See for yourself and let me know what you think AMD will employ. If you have other ideas, do let me know...

Oh yeah, I use the PHX1 die as the scale: 178 mm² die size. N3E no doubt has higher density, but STX has 8 Zen 5 cores and 4 Zen 4c cores (plus 8 MB of L3 cache?), RDNA3+, and other improvements. For the sake of comparison, let's use 178 mm².

RDNA3+ 768 SP (1536 ALU) | Pros and Cons | RDNA3+ 1536 SP (3072 ALU)
View attachment 74936
  • Derived from the desktop Ryzen 7000 CPUs
  • CCD contains the CPU with L3 cache
  • IOD contains graphics, AIE, and FCH
  • Small CCD saves cost
  • IOD can get pretty big with more features, e.g. doubling the graphics SPs
View attachment 74937
View attachment 74938
  • Diagram leaked by RGT
  • CCD contains CPU + IOD
  • GCD contains graphics only
  • CCD die is comparably big
  • GCD can come in multiple sizes
  • AMD could change the GCD depending on the competition
  • A GCD with 1536 SP will draw more power; my estimate would be an additional 10-15 W
  • That's why AMD will create a monolithic version of STX2 to cater to the ultraportable market
View attachment 74939
  • Wild speculation: in 2025, STX + RDNA4 with 1536 SP
  • Half the SPs of N43
  • With N3E, the GCD should consume less power than the STX version
View attachment 74940
View attachment 74941
  • All-in-one die: cores with external cache + memory controllers
  • Almost like a 3nm version of the M2 Pro with Infinity Cache
  • BOM will be the highest
  • Lack of flexibility
View attachment 74942
I don't see any advantage in having the CPU+IOD on a single 3nm die but the iGPU on a separate die.
Of the options you made, the first or last is most likely.
 

Geddagod

Member
Dec 28, 2021
171
152
76
There is going to be too much capacity.

TSMC did not actually start F12P9, which was meant for Intel, only F12P8. Intel has cut orders.
Plenty of N3 capacity will be there by the beginning of 2024. F18 P5/6/7 are for N3, and F18P4 was also doing N3 during risk production in 2022. And by 2H24, Fab18P8 will be online.
There is no more expansion for N3, and TSMC is actually starting to build N2 fabs now; they already have clearance to build behind Fab12P9 in Hsinchu.
F12P8 was for Intel I thought...
Among them, the P5 fab of Fab 18B at the Tainan Science Park is mainly used for second-generation processors for Apple's next-generation tablets and MacBook-series notebooks; the new eighth fab (P8) at the Fab 12 R&D center in Hsinchu is doing foundry work for Intel, supporting the companion chips in Intel's core processors.
 

Kaluan

Senior member
Jan 4, 2022
429
897
96
There is going to be too much capacity.

TSMC did not actually start F12P9, which was meant for Intel, only F12P8. Intel has cut orders.
Plenty of N3 capacity will be there by the beginning of 2024. F18 P5/6/7 are for N3, and F18P4 was also doing N3 during risk production in 2022. And by 2H24, Fab18P8 will be online.
There is no more expansion for N3, and TSMC is actually starting to build N2 fabs now; they already have clearance to build behind Fab12P9 in Hsinchu.
This also tracks well with the recent report of TSMC seeking to woo its big customers into transitioning to the N3 family of nodes faster, allegedly even pitching price cuts/discounts.

Yeah, I don't think capacity will be an issue. TSMC wouldn't risk greasing the wheels for chipmakers to jump aboard quicker only to be caught with their pants down, unable to allocate enough capacity.


That being said, I don't think Turin or Granite Ridge will be N3-based, at least not fully. Not hearing anything about Zen 6 yet also leaves a lot of room for Zen 5+/shrink speculation in 2025.
 

jamescox

Senior member
Nov 11, 2009
588
1,016
136
So my thought was that the only die-to-die connections are from the IOD-fast to each other die, so that would take care of the big-ticket items like feeding the CPU and GPU. Then the IOD-slow would only have to support enough bandwidth to run whatever IO you put on there — namely, stuff like PCIe, USB4, and a few other bits and pieces.

I think the big question would actually be the GPU. It's likely to be the most bandwidth-intensive die, so that link might need special treatment. Perhaps IFOP for the CPU/IOD-slow and "Infinity Fanout Links" or EFB for the GPU?
There is no "slow" IO die for Epyc. With 128 PCIe 5.0 lanes, that can be more than 500 GB/s of bandwidth, so it is similar to the memory interfaces. With CXL, it may actually be memory, so latency is important as well.
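As a quick sanity check on that 500 GB/s figure (PCIe 5.0 signals at 32 GT/s per lane with 128b/130b encoding, per direction):

```python
# Per-direction bandwidth for 128 PCIe 5.0 lanes (Epyc SP5-class IO).
GT_PER_SEC = 32          # PCIe 5.0 raw signaling rate per lane
ENCODING = 128 / 130     # 128b/130b line-code efficiency
LANES = 128

per_lane_gbs = GT_PER_SEC * ENCODING / 8   # ~3.94 GB/s per lane
total_gbs = LANES * per_lane_gbs           # ~504 GB/s per direction
print(round(total_gbs, 1))
```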
 

IntelUser2000

Elite Member
Oct 14, 2003
8,578
3,634
136
Another reason I believe Zen 5 will double the core count is that Turin will double from 96 cores to 192 cores in the same SP5 socket.
No it doesn't. The process used gains minimally in density, and the uarch improvements mean the core size will at best be equal, if not bigger.
 
  • Like
Reactions: scineram

jamescox

Senior member
Nov 11, 2009
588
1,016
136
I am looking at the MI300 rendering and it isn't making sense. They say 24 CPU cores, but the image clearly shows two 8-core CPU chiplets and a big square that I assume is some of the IO die components (maybe). Where are the other 8 CPU cores?

It is unclear whether they have interposers running under the whole thing. That would be very expensive, but an MI300 will not be cheap. They do need a massive switching network for 6 GPU chiplets and 3 (?) CPU chiplets, but it doesn't seem like that would take up the entire area under the CPU and GPU chiplets. The connections have to be very wide due to the bandwidth requirements for GPUs using HBM.

Are the big blue rectangles between the HBM and CPU/GPU chiplets some kind of cache, maybe? It would be a strange aspect ratio, but it does look separate from the other die. If they have a base die, then it may make sense to put large caches in the base die. The rest of it would just be Infinity Fabric switches and maybe distributed IO.

I am wondering if each of the 4 quadrants acts like the 4 quadrants of an Epyc IO die, with 3 DDR5 memory channels and 2 x16 PCIe links. That still seems like it would be significantly smaller than 2 GPU chiplets, but comparing with the CPU chiplets, maybe they are smaller than I thought.

With the cache and IO scaling differently, it seems like they would want to split the physical interfaces from the rest of the IO (like having the UMC on logic-optimized silicon and the PHY on a different chip) and also split the cache and the compute. I am thinking there may be several smaller chips underneath rather than a monolithic base die. They may be the same chiplets that will be used across the Zen 5 product line. Really curious as to how this thing is put together.


Not sure hotlink will work:



Edit: Also, whatever the square chip between what look like 2 CPU chiplets is, it may have massive cache. If the blocks in the upper left are cache and it is as dense as V-cache, then that might be 96 to 128 MB.



Edit: if this is the real device, then it looks completely different from the rendering.


This has two small chiplets in between each HBM stack. Does anyone know what those are?
 
Last edited:
  • Like
Reactions: ftt

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
At the very least N5 --> N3E should yield as much perf as the N7 --> N5 transition if not better.
Even taking it at face value, that comparison seems to be to base N5. Comparing N3E to N5P or N4P would show a much smaller gap.
Zen 4 CCD has an abysmally low density for N5 at ~93 MTr/mm². The thermal hotspot is also a constraint.
I doubt that's something that would change much on a new node. With SRAM density near constant, you're not going to get any relief there, and high-speed logic like you need to hit ~6 GHz is going to be quite low density. You'd only expect a change there if AMD backed off their frequency targets significantly.
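The ~93 MTr/mm² figure in the quote is easy to reproduce from commonly cited Zen 4 CCD numbers; treat both inputs as approximate figures from public die analyses:

```python
# Rough density check for the ~93 MTr/mm^2 figure quoted above. The transistor
# count and die area are commonly cited (approximate) Zen 4 CCD numbers.
transistors_mtr = 6570    # ~6.57B transistors, in millions
die_area_mm2 = 70.8       # ~71 mm^2 CCD

density_mtr_per_mm2 = transistors_mtr / die_area_mm2
print(round(density_mtr_per_mm2))  # ~93 MTr/mm^2
```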
I wouldn’t mind a frequency regression if it meant improved efficiency, but I doubt AMD will willingly drop frequencies. Many people still look at clock speed to this day as an indicator of performance.
I think clock speed can also tell us a bit about how "ground-up" the new architecture is. For an iterative design, at least, you don't want to budge your cycle times much, because it means you have to re-tune all those critical timing loops you spent so long refining.
 

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
I don't see any advantage in having the CPU+IOD on a single 3nm die but the iGPU on a separate die.
Of the options you made, the first or last is most likely.
It limits their flexibility, but combining the CPU and IOD (or at least memory controller) would make some sense from a PnP perspective. Will be very, very curious to see where they draw the line, if at all. If Strix is still on N4/P, then I think they'd probably leave it monolithic.
There is no "slow" IO die bit for Epyc. With 128 pci-express 5 lanes, that can be more than 500 GB/s of bandwidth, so it is similar to the memory interfaces. With CXL, it may actually be memory, so latency is important also.
Yeah, that idea was just meant for client. For the datacenter chips, I think if they did a multi-chip IO die, they'd just chunk it up into 2 or 4 mirrored pieces, with something fast like EFB in between.
 
  • Like
Reactions: TESKATLIPOKA

Vattila

Senior member
Oct 22, 2004
771
1,240
136
Someone in the Zen 4 thread had this idea (forgot who it was, but all credit goes to them)
Thanks. I did speculate that MI300 contains 6 APU chiplets, but there are other possibilities, of course. And, someone intriguingly hinted to me that my speculation is wrong. Anyway, here is my mock-up based on the slide rendering and the actual chip photo:




Now, continue the "Zen 5" speculation!
 

DisEnchantment

Golden Member
Mar 3, 2017
1,419
4,787
136
View attachment 75060 View attachment 75061 View attachment 75062

Seems I keep forgetting what I posted on the first page. From these slides, it would seem DT Zen 5 will be on N4, and Zen 5c and Strix ("Advanced Node") on N3.
It would make sense for Zen 5 and Zen 5 3D V-Cache to be on N4; the 3D stacking tech is probably not ready on N3.
I forgot that they said EPYC will have XDNA.
 

Tigerick

Member
Apr 1, 2022
129
150
76
View attachment 75060 View attachment 75061 View attachment 75062

Seems I keep forgetting what I posted on the first page. From these slides, it would seem DT Zen 5 will be on N4, and Zen 5c and Strix ("Advanced Node") on N3.
It would make sense for Zen 5 and Zen 5 3D V-Cache to be on N4; the 3D stacking tech is probably not ready on N3.
I forgot that they said EPYC will have XDNA.
If Zen 5c is indeed on an N3 node, that explains how AMD solves the die-size issue for Turin (Bergamo's successor). I was expecting 32 Zen 5c cores per CCD, but it seems the die area would be too big on N4. So with N3, 32 × 8 = 256 cores seems possible... not sure about stacking cache though :p
 

Tigerick

Member
Apr 1, 2022
129
150
76
No it doesn't. The process used gains minimally in density, and the uarch improvements mean the core size will at best be equal, if not bigger.
Of course, I know. Please refer to the front page of this thread to get a better idea of how AMD might put 16 cores per CCD on N4 within the same die area...
 

Tigerick

Member
Apr 1, 2022
129
150
76
The V-cache die is not the limitation; the bonding process is. I read somewhere that TSMC currently has a production limit of about 30K V-cache CPUs per month. Has there been any progress in increasing that rate?
I didn't know about the bonding issues until now. But I would expect this to be resolved as long as there is demand.

Plus, the DIY market is not really that big. I expect a DIY TAM of around 2 million units per quarter. Let's say 30% of customers are targeting a $200+ CPU; then 600K CPUs per quarter are available for Intel and AMD to pursue.
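Spelled out, the sizing works like this (both inputs are rough guesses on my part, not market data):

```python
# The market-sizing estimate above; both inputs are rough guesses, not data.
diy_units_per_quarter = 2_000_000   # estimated DIY desktop CPU TAM per quarter
share_above_200_usd = 0.30          # share of buyers targeting a $200+ CPU

addressable = int(diy_units_per_quarter * share_above_200_usd)
print(addressable)  # 600000 units per quarter
```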
 
  • Like
Reactions: ftt and Vattila

igor_kavinski

Diamond Member
Jul 27, 2020
7,602
4,394
106
then 600K CPUs per quarter are available for Intel and AMD to pursue.
Not sure if Intel can get access to V-cache. I think it's more likely that AMD has reserved all V-cache capacity at TSMC's fabs for the next few years. Depending on the terms, it could even be perpetually exclusive, where only AMD gets to enjoy V-cache, because maybe they had a big hand in helping TSMC develop it.
 
  • Like
Reactions: Tigerick

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
Not sure if Intel can get access to V-cache. I think it's more likely that AMD has reserved all V-cache capacity at TSMC's fabs for the next few years. Depending on the terms, it could even be perpetually exclusive, where only AMD gets to enjoy V-cache, because maybe they had a big hand in helping TSMC develop it.
Nah, I doubt that. My understanding is that the tools they're using for hybrid bonding come from one supplier, and they've basically just been shipping trial systems unsuited to volume production. AMD's probably been the guinea pig for it. This year or next, they should get the proper equipment, and things should really ramp up.
 

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
Don't you find it weird that no one has V-cache on their roadmap other than AMD?
Not really, in light of what I just said. TSMC needed something they could make in fairly low volume, and the customer couldn't be dependent on the tech this early in its life cycle. AMD fit the bill. I'm sure in a year or two we'll start to see broader adoption.
 
  • Like
Reactions: ftt

Exist50

Golden Member
Aug 18, 2016
1,694
1,778
136
Seems I keep forgetting what I posted in the first page. From these slides, DT Zen 5 it would seem will be on N4, and Zen 5c and Strix (Advanced Node) on N3.
I don't think that necessarily follows. It's a reasonable assumption, but AMD is clearly trying not to publicly tie any core or product to a given node. The "Advanced Node" label for Strix is them deliberately withholding that info for now. Tbh, it wouldn't be surprising if Strix uses N4P and the server parts use N3.
 
  • Like
Reactions: ftt

Mopetar

Diamond Member
Jan 31, 2011
7,096
4,540
136
Don't you find it weird that no one has V-cache on their roadmap other than AMD?
Not really, in light of what I just said. TSMC needed something they could make in fairly low volume, and the customer couldn't be dependent on the tech this early in its life cycle. AMD fit the bill. I'm sure in a year or two we'll start to see broader adoption.
Lead times on design are usually around four years, and V-cache isn't just something you can bolt onto an existing chip. It's not just a matter of capacity, but also of being able to actually use it.

I'd imagine that Intel has something in the works that uses this sort of technology by now, or has started figuring out how to do something similar with their own future processes.

There may also be a matter of experience in creating designs with it. Apparently some after-the-fact analysis of earlier Zen designs (I recall an article pointing out that Zen 2 had the physical structures necessary to connect to V-cache) showed that AMD had probably been tinkering with it for quite a while before they released Zen 3D. Companies incorporating it now will obviously have less of a learning curve given that the technology has matured, but it's still not trivial.
 
