Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
Apart from the details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).
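A quick back-of-the-envelope on what that buys (the Zen 3 figures here are my speculation, as noted above, not confirmed):

```python
# Zen 2 vs. rumoured Zen 3 CCX layout (the Zen 3 L3 size is my guess).
zen2 = {"cores": 4, "l3_mb": 16}   # shipping Rome/Matisse CCX
zen3 = {"cores": 8, "l3_mb": 64}   # "32+ MB", assuming it doubles

for name, ccx in (("Zen 2", zen2), ("Zen 3 (speculative)", zen3)):
    print(f"{name}: {ccx['cores']} cores share {ccx['l3_mb']} MB L3 "
          f"-> {ccx['l3_mb'] / ccx['cores']:.0f} MB/core; a single "
          f"thread can reach all {ccx['l3_mb']} MB")
```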

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

jpiniero

Lifer
Oct 1, 2010
Since you don't need/want every IO die to have a GPU, you wind up making a special IO die with a GPU. Why not just make a separate GPU chiplet at that point? You're still designing a separate unique piece of silicon, but at least the different chips can be manufactured on the separate nodes best suited to each.

It would depend on what node the IO die is on. If it's N6, the IGP die might not be big enough to be worth separating at this point, given how good the yields are. We're talking about the lowest viable CU count; if that's 3 or 6, that's what it will be.

Remember, there are going to be mobile versions of the entire Raphael lineup to combat Alder/Raptor Lake-S BGA.
 

jamescox

Senior member
Nov 11, 2009
Hmm, I wonder if it's just the max socket TDP for AM5. Could be AMD is giving itself some extra headroom.
If they added a lot of extra floating-point units (wider AVX or whatever), those could pull a lot of power when utilized. It might also be extra power for a high-end stacked-cache version.
 

jamescox

Senior member
Nov 11, 2009
Stacked on the IO die would mean the IO die and chiplets are on the same node, right? TSMC doesn't do cross-node stacking, or is it that they don't do it yet?
Depends on the type of stacking. SoIC stacking without micro-solder bumps probably requires that the chips be on the same process. The lower-density stacking based on micro-solder balls doesn't have that restriction, though, so it should be possible to mix chips made on different process tech at different locations.
 

jamescox

Senior member
Nov 11, 2009
I'd say that Genoa looks like Rome, except it has 12 dies instead of 8. 96 cores was probably chosen because of power consumption and perhaps space. I have to read that article to see what Charlie thinks, but Bergamo could be stacked dies, or stacked on top of the IO die. Either way the power consumption is going to be crazy.
I was thinking the same, but that requires 3 links per quadrant. They might have gone that route for the first iteration of Zen 4 with only stacked L3. Going up to 128 cores doesn't really fit with that, though.

I have also been thinking that 4 dies (or stacks) is 32 cores, and they could be connected with LSI since they can be directly adjacent to the IO die. Even with higher core counts available, I would expect most sales to still be 32 cores or fewer, so the common, single-layer part is cheap. Going up to 2 layers is 64 cores, 3 layers is 96, and 4 layers would be 128. The thermal constraint gets worse with each layer, although SoIC stacking without micro-solder balls has good thermal conductivity compared to the micro-solder-ball solution, and the dies are also polished down very thin. Perhaps 3 layers is doable, but 4 pushes the clocks too low without some extra cooling tech, which might come later in a refreshed version. I don't know how they would handle a cache die stacked on top, unless they use the cache die only on 32-core, single-layer, single-core-optimized parts, basically the F-series.
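The layer math, spelled out (all of this is my speculation):

```python
# Speculative packaging math: 4 CCD stacks per package, 8 cores per die,
# one die per layer in each stack.
stacks, cores_per_die = 4, 8

for layers in range(1, 5):
    print(f"{layers} layer(s): {stacks * cores_per_die * layers} cores")
# -> 32, 64, 96 (Genoa?), 128 (Bergamo?)
```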

It should be interesting once we actually get some 3D stacking rather than just 2.5D. Speculation is going to be all over the place due to the number of possibilities with die stacking tech.
 

DisEnchantment

Golden Member
Mar 3, 2017
If AMD is eventually going MCM with their GPUs, it wouldn't be a bad idea to get some practical experience with an APU first to work out some of the quirks of such an approach.
They already have this with Aldebaran, and it seems it has also been available outside AMD for some time to partners like HPE and supporting ISVs.

Solid-state heat conduction for 3D stacks sounds cool and simple in theory.

But in practice it simply isn't viable for logic stacks without some radically new tech, perhaps thermal transistors, which are still largely at the lab-research phase IIRC.

The main problem is that the heat won't just move in the right direction without some kind of external force acting on it, as with a liquid current or the wicking through which heat pipes work - though I'm not even certain a heat pipe could work adequately at that scale.
Yeah, the article I posted describes some of it.
The temperature gradient induces mechanical stress, leading to device failure.
There are patents for thermoelectric devices embedded in the die to address this, but the temperature gradient could still be a problem.
Immersion cooling, which is very common in HPC, could actually accelerate device failure due to the temperature gradient (in case the heat from the bottom die cannot be dissipated evenly). SoIC packaging is much desired because of this.
Another knob to tune, in addition to the already very long list of knobs.
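To get a feel for why the gradient scales badly with stack height, here's a crude 1-D series-resistance model (every number is an illustrative guess on my part, not a measured value):

```python
# Crude 1-D model: heat from a hot bottom die must cross every bonded
# interface above it. Per-interface R values are illustrative guesses;
# SoIC hybrid bonds conduct far better than micro-solder bumps.
power_bottom_w = 15.0   # hypothetical logic die at the bottom of the stack
r_per_interface = {"SoIC hybrid bond": 0.05, "micro-solder bump": 0.5}  # K/W

for bond, r in r_per_interface.items():
    for layers in (1, 2, 4):
        delta_t = power_bottom_w * r * layers   # series thermal resistance
        print(f"{bond}, {layers} layer(s) above: bottom die ~{delta_t:.1f} K hotter")
```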
 

JoeRambo

Golden Member
Jun 13, 2013
Interesting news about 170W. These power limits pretty much put a ceiling on stock all-core clocks. So it's not that Zen 4 uses 170W; it's more that AMD knows it needs 170W to reach the performance it wants from its top SKUs.
AMD could release 170W Zen 3 parts today if there were a socket with a power-delivery spec that allowed it. The CPU could use that power budget with ease; there's just no competition from Intel, so AMD can enjoy its efficiency advantage by running lower all-core clocks and voltages.
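Rough per-core arithmetic behind that ceiling (the AM4 figure is the well-known 142W PPT; the AM5 number is the rumour, and the uncore share is my guess):

```python
# Watts available per core in an all-core load, before and after.
uncore_w, cores = 20.0, 16   # ~20 W for IOD/uncore is an assumption

for socket, package_w in (("AM4 (142 W PPT)", 142.0), ("AM5 (170 W rumour)", 170.0)):
    print(f"{socket}: ~{(package_w - uncore_w) / cores:.1f} W per core")
# More watts per core is what lifts the stock all-core clock ceiling.
```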
 

soresu

Diamond Member
Dec 19, 2014
The previous APU was monolithic, so is the Raphael APU built on chiplets too?
Raphael is meant to be the successor to Vermeer. It's going to be primarily a high-performance desktop part. It isn't meant to be a replacement for Cezanne.
As DrMrLordX says, Raphael is a different market segment and a successor to Vermeer or Vermeer-X (Warhol?).

The replacement for Cezanne will be Rembrandt, likely a monolithic 6nm/N6 die from TSMC.

Rembrandt is rumoured to have an 8-core Zen 3 CPU, a 12-CU RDNA2 GPU, USB4, PCIe 4, DDR5, and probably a host of less obvious goodies.

As an RDNA2-based chip, it should also support hardware AV1 decode.
 

soresu

Diamond Member
Dec 19, 2014
Interesting news about 170W. These power limits pretty much put a ceiling on stock all-core clocks. So it's not that Zen 4 uses 170W; it's more that AMD knows it needs 170W to reach the performance it wants from its top SKUs.
AMD could release 170W Zen 3 parts today if there were a socket with a power-delivery spec that allowed it. The CPU could use that power budget with ease; there's just no competition from Intel, so AMD can enjoy its efficiency advantage by running lower all-core clocks and voltages.
Probably mostly a 65W reservation for GPU powaaaa.
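If that's the reading, the implied split would be something like this (my numbers, not AMD's):

```python
# Hypothetical: rumoured 170 W AM5 ceiling = today's 105 W top AM4 TDP + GPU budget.
am5_limit_w, top_am4_tdp_w = 170, 105
print(f"Left over for GPU powaaaa: {am5_limit_w - top_am4_tdp_w} W")   # -> 65 W
```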
 

Abwx

Lifer
Apr 2, 2011
Why build an IO die on N6, though, when you don't see much size reduction (remember, the physical interfaces always take up the same amount of space)? Given that AMD already can't get enough wafers to satisfy all of the demand they're seeing, it seems bizarre to go down that route.

Also, pairing an IO die with such low-end GPU capabilities seems pointless outside of ensuring that everything now has some minimal onboard video. If you want something more powerful, then you need yet another piece of silicon. Are they going to make another IO die with 8-12 CUs?

If you wanted to make a lowest-viable-CU product, just make it part of a monolithic die. There were some other rumors about AMD doing an Athlon refresh on a newer node at GlobalFoundries, so it would seem odd to duplicate that using a far more expensive TSMC node.

Athlons and 8-core APUs are not in the same market, but in any case there's no Zen 4 monolithic APU coming before 2023; the rumour is that the 5000-series replacements will include a GPU.


Of course, a truckload of salt is in order...

Also, the pic posted by ComputerBase seems to show three dies of different sizes; dunno if it's related to the article:

[Image from the ComputerBase article]


 

Hitman928

Diamond Member
Apr 15, 2012
Athlons and 8-core APUs are not in the same market, but in any case there's no Zen 4 monolithic APU coming before 2023; the rumour is that the 5000-series replacements will include a GPU.

Of course, a truckload of salt is in order...

Also, the pic posted by ComputerBase seems to show three dies of different sizes; dunno if it's related to the article:

[Image from the ComputerBase article]



That image looks like the sample Lisa Su showed of the 3D V-Cache CPU, where there's the IO die and then two chiplets, one with 3D cache and one without, to show the difference.
 

Abwx

Lifer
Apr 2, 2011
That image looks like the sample Lisa Su showed of the 3D V-Cache CPU, where there's the IO die and then two chiplets, one with 3D cache and one without, to show the difference.

Well, dunno the purpose of a single stacked chip if those are two CPU clusters on top.

Other than this, I wonder if it's not the backported Zen 2 below:

 

LightningZ71

Golden Member
Mar 10, 2017
Why build an IO die on N6, though, when you don't see much size reduction (remember, the physical interfaces always take up the same amount of space)? Given that AMD already can't get enough wafers to satisfy all of the demand they're seeing, it seems bizarre to go down that route.

Also, pairing an IO die with such low-end GPU capabilities seems pointless outside of ensuring that everything now has some minimal onboard video. If you want something more powerful, then you need yet another piece of silicon. Are they going to make another IO die with 8-12 CUs?

If you wanted to make a lowest-viable-CU product, just make it part of a monolithic die. There were some other rumors about AMD doing an Athlon refresh on a newer node at GlobalFoundries, so it would seem odd to duplicate that using a far more expensive TSMC node.

Why not use N6? There are two significant issues that AMD needs to address for competitive reasons. The first is excessive power draw from the IOD and the IF links between the various dies. N6 can help there: it is an improvement over N7 (though not a massive one), which is in turn a big improvement over the GF 14/12LPP process currently in use. An N6-based IOD should draw notably less power for PCIe 4 and for the SerDes links connecting it to the CCDs, and it will let the memory controller run more efficiently. This leaves a greater fraction of the package power for the CCDs.

The second big advantage is density. While N6 is only a minor density improvement over N7, it is a big improvement over GF 12/14LPP. Making the IOD denser gives AMD the space to add a small iGPU, fixing a competitive disadvantage versus Intel. Furthermore, since Intel has moved to an Xe-based iGPU, it has nontrivial performance available to the user. AMD will need a nontrivial amount of die resources to match or beat it, which means something more than the 3-CU Vega solution in Raven2/Dali.
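Putting rough numbers on that pad-vs-logic argument (all of these areas and scaling factors are my guesses, not measurements):

```python
# Guessed split of a ~125 mm^2 GF 12/14LPP IOD into pads/PHYs and logic.
pad_mm2, logic_mm2 = 60.0, 65.0   # PHYs and pads barely shrink; logic does
logic_scaling = 0.45              # guessed 14LPP -> N6 logic-area factor
igpu_mm2 = 15.0                   # hypothetical few-CU RDNA2 block + display/media

n6_iod_mm2 = pad_mm2 + logic_mm2 * logic_scaling + igpu_mm2
print(f"N6 IOD: ~{n6_iod_mm2:.0f} mm^2 vs ~{pad_mm2 + logic_mm2:.0f} mm^2 today")
# ~104 vs ~125 mm^2: the shrink roughly pays for the iGPU, which is the point.
```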

As for choosing N6 over N7, the bits of information we are getting indicate that it will be a long-lived node with minimal extra cost over N7, design rules compatible enough to make migrating IP easier, and a slight wafer-yield improvement. While both are more expensive per wafer than GF 12/14LPP, there is a cost saving in not having to ship the IODs from GlobalFoundries to the package-assembly site, which offsets some of that. AMD already has much of what's needed for an N6 IOD designed for N7/N6 from working on Cezanne, Renoir, and their revisions, so the migration is less complicated as well.
 

jpiniero

Lifer
Oct 1, 2010
I just don't see the point outside of providing a bare-bones setup to drive a display, which some people would no doubt appreciate. However, I don't think it adds as much value to the product as it costs to include.

The mobile version of Raphael is going to need it.
 

eek2121

Diamond Member
Aug 2, 2005
The IO die can't make as much use of a node shrink because, no matter how physically small the transistors get, the physical interfaces can't shrink. Obviously there's more to the die than just that, but it's still a big part of the total area.

Even if the GPU part has only a really small number of CUs, it still needs the front end and the other parts for video display. Normally those take up only a small portion of a GPU's total area, but they'll be a bigger share relative to a low CU count.

Any space savings you'd get from going to a new node are probably largely lost by adding a GPU, so you're not saving any money with that new node. More likely it costs a fair bit more.

I just don't see the point outside of providing a bare-bones setup to drive a display, which some people would no doubt appreciate. However, I don't think it adds as much value to the product as it costs to include.

It is a requirement for larger OEMs. GPUs are near impossible to find. Finally, with FSR the GPU is useful for light gaming.
 

Mopetar

Diamond Member
Jan 31, 2011
The mobile version of Raphael is going to need it.

Why go through all this extra trouble and not just make a monolithic APU at that point? Is there enough of a market for 16-core mobile CPUs (that also can't just use a desktop CPU for whatever reason) to justify going to a chiplet design? It seems all that's really accomplished is moving all of the video components off of the APU die (along with anything else that also needs to be there which isn't usually included on a chiplet) and putting them on an IO die instead.

It is a requirement for larger OEMs. GPUs are near impossible to find. Finally, with FSR the GPU is useful for light gaming.

I can certainly see the merit in that, but people are proposing a 3-6 CU GPU. That's not going to be terribly great even with FSR. Creating a potentially bigger die containing upwards of 12 CUs certainly allows more flexibility, but again it comes at the expense of space and practically erodes any advantage of going with a new node outside of lower power use.

Also what stops AMD from selling APUs to OEMs that don't want to use a dedicated graphics card for those builds? Or why not just develop a chiplet-GPU part that connects to the same IO die that they already use? Basically just run with 1/1 CPU/GPU chiplets instead of the 2/0 arrangement that we see with current Zen 3 desktop parts.

This sounds more and more like a solution in search of a problem.
 

LightningZ71

Golden Member
Mar 10, 2017
I believe that having the GPU on a separate chiplet on the MCM, as Ryzen currently does with CPU cores, would cost a significant amount of power. The GPU chiplet would constantly keep the IF link between it and the IOD saturated as it makes calls to memory and drives the displays, so that link would be running at its theoretical maximum power draw continuously. For a product that isn't sacrificing power at the altar of maximum performance, it just doesn't make sense.

Going to an N6-based IOD instead of the current GF 14LPP one will give them enough space on the IOD for a usable iGPU that is competitive with the market. Yes, I fully realize that the IO pad area on the die won't shrink much, but there's still a significant amount of die area not involved in IO that can provide enough room for the iGPU. And remember, APU dies use design rules that are generalized across all the needs of the die and biased towards the most performance-critical parts. The IOD on N6 will certainly have different design rules, or, as they called it, different knobs and levers pulled, making it more favorable to its intended use. We've already seen how this affects density on the stacked SRAM die in the presentation a few weeks ago.
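To put a number on the IF-link cost mentioned above (the pJ/bit figure is a commonly cited ballpark for on-package SerDes, not an AMD spec, and the traffic figure is hypothetical):

```python
# Energy cost of feeding a GPU chiplet over an Infinity Fabric SerDes link.
pj_per_bit = 2.0         # ballpark for on-package SerDes (assumption)
gpu_traffic_gb_s = 50.0  # hypothetical sustained iGPU memory + display traffic

watts = pj_per_bit * 1e-12 * gpu_traffic_gb_s * 1e9 * 8
print(f"~{watts:.1f} W just moving {gpu_traffic_gb_s:.0f} GB/s across the link")
# Under a watt at these guesses, but the SerDes PHYs also burn power whenever
# the link is up, which is what really hurts a mobile idle-power budget.
```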

If we believe the rumor that AMD is working on a mobile version of the 12+ core Ryzen products, then this makes even more sense. It lets them shrink the package a bit, save power on the memory controller via the process-tech change, and reduce "uncore" power in general by having a more efficient IOD. The extra cost is going to bring them the improvements they need to be competitive, and as a premium product it will also allow higher ASPs to cover the additional costs.
 

jpiniero

Lifer
Oct 1, 2010
Why go through all this extra trouble and not just make a monolithic APU at that point?

It's on two different nodes - the IO die is presumably on N6 while the CPU chiplets are on N5. Ideally the IGP would be a chiplet on a cheap node, but I don't think AMD wants to spend the effort backporting the RDNA2+ IP to some GloFo node.

Is there enough of a market for 16-core mobile CPUs (that also can't just use a desktop CPU for whatever reason) to justify going to a chiplet design?

They might need more than 8 cores for marketing/sales/competitive reasons. And as you've seen with Cezanne, the extra L3 makes a big difference in gaming.
 

eek2121

Diamond Member
Aug 2, 2005
Why go through all this extra trouble and not just make a monolithic APU at that point? Is there enough of a market for 16-core mobile CPUs (that also can't just use a desktop CPU for whatever reason) to justify going to a chiplet design? It seems all that's really accomplished is moving all of the video components off of the APU die (along with anything else that needs to be there which isn't usually included on a chiplet) and putting them on an IO die instead.

I can certainly see the merit in that, but people are proposing a 3-6 CU GPU. That's not going to be terribly great even with FSR. Creating a potentially bigger die containing upwards of 12 CUs certainly allows more flexibility, but again it comes at the expense of space and practically erodes any advantage of going with a new node outside of lower power use.

Also, what stops AMD from selling APUs to OEMs that don't want to use a dedicated graphics card for those builds? Or why not just develop a chiplet-GPU part that connects to the same IO die they already use? Basically just run with a 1/1 CPU/GPU chiplet arrangement instead of the 2/0 arrangement we see with current Zen 3 desktop parts.

This sounds more and more like a solution in search of a problem.

AMD’s competitors all have it. Intel is using Xe for machine learning/AI tasks, so it goes above and beyond gaming.
 

Mopetar

Diamond Member
Jan 31, 2011
AMD’s competitors all have it. Intel is using Xe for machine learning/AI tasks, so it goes above and beyond gaming.

I don't think that's a particularly good argument considering AMD also sells APUs which contain graphics. While a GPU isn't just limited to gaming, anyone who needs one for the kind of workloads they excel at is going to buy a discrete card because what's included with a CPU typically isn't enough for professional work.

Also, it would probably work considerably better to design separate circuitry for AI/ML tasks as dedicated hardware will be better at that than offloading it to a GPU. Apple does this with their "neural engine" in their SoCs. Even Nvidia has special tensor cores in their GPUs to handle these tasks.

Finally, AMD is having a hard time keeping enough of their Zen 3 CPUs (which completely lack a built-in GPU) in stock to actually satisfy consumer demand. I really don't think they need to go tacking on something that not every consumer needs or wants just because the competition is doing it. If AMD stuck to what all of their competitors were doing, we wouldn't even have Zen in the first place.