Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Page 75

Vattila

Senior member
Oct 22, 2004
821
1,457
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
  • Like
Reactions: richardllewis_01

soresu

Diamond Member
Dec 19, 2014
4,115
3,570
136
That's refrigerant based cooling.
So is the TSMC effort.

Solid state heat conduction for 3D stacks sounds cool and simple in theory.

But in practice it simply isn't viable for logic stacks without some radically new tech, perhaps thermal transistors, which are still largely at the lab research phase IIRC.

The main problem is that the heat won't just move in the right direction without some kind of external force acting upon it, as with liquid current, or the process through which heat pipes work (wicking?) - though I'm not even certain a heat pipe could work adequately at that scale.

Micro fluidic silicon via channels are a lot more advanced than anything currently in the cooling market that does not cost an arm and a leg to operate.

It wouldn't surprise me if all in one MF pump/reservoir/radiator coolers could be created which just directly attach to the socket as current air coolers do - and likely somewhat more compact than current AIO models, at least for SKUs closer to current TDPs rather than 4-8 hi stacks of logic, or something with similarly insane thermal density.

I have to say that I've never been a fan of liquid cooling before, but this does make me hope for a future with it fully integrated at chip level.

At least until some near perfect solid state solution arrives in the future, assuming that better materials/devices for logic, memory, IO and power IC's don't make cooling largely redundant by then.

Which advanced spintronic devices/materials could certainly do for logic and memory - likewise for photonics and IO. Power remains problematic though, unless graphene or some similar 2D material can handle that without significant resistance thermals in stacks.
 

eek2121

Diamond Member
Aug 2, 2005
3,414
5,051
136
Dunno, but if the pic above is accurate then the I/O device seems quite big, since TSMC's 6nm has 3x GF's 14LP density, so possibly for a basic GPU.

Probably a pre-overclocked model. Anyone remember the Athlon FX-51 and similar models from back in the day?

Coincidentally, a 280mm AIO can cool a 170W CPU without much issue. Maybe we will see 5 GHz all-core out of it.
 

Mopetar

Diamond Member
Jan 31, 2011
8,488
7,729
136
Most likely the IGP is a part of the IO die.

Seems like a bit of an odd combination given that the IO doesn't benefit as much from smaller nodes because the physical interface can't shrink, whereas the GPU benefits a lot from being on a newer node.

Since you don't need/want every IO die to have a GPU you wind up making a special IO die with a GPU. Why not just make a separate GPU chiplet at that point since you're still designing a separate unique piece of silicon, but at least the different chips can be manufactured on separate nodes which are best suited for each chip.

If AMD is eventually going MCM with their GPUs it wouldn't be a bad idea to get some practical experience with an APU first to work out some of the quirks of using such an approach.
 

jpiniero

Lifer
Oct 1, 2010
16,814
7,257
136
Since you don't need/want every IO die to have a GPU you wind up making a special IO die with a GPU. Why not just make a separate GPU chiplet at that point since you're still designing a separate unique piece of silicon, but at least the different chips can be manufactured on separate nodes which are best suited for each chip.

It would depend on what node the IO die is on. If it's N6, the IGP die might not be big enough to be worthwhile to separate at this point given how good the yields are. We're talking about the lowest viable CU possible, if that's 3 or 6 that's what it will be.

Remember there's going to be mobile versions of the entire Raphael lineup to combat Alder/Raptor Lake-S BGA.
 
  • Like
Reactions: Tlh97

jamescox

Senior member
Nov 11, 2009
644
1,105
136
Hmm, wonder if it's just the max socket TDP for AM5. Could be AMD is giving itself some extra headroom.
If they added a lot of extra floating point units (AVX or whatever), then that could pull a lot of power when utilized. Also might be extra power for a high end stacked cache chip version.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
Stacked on the IO die would mean the IO Die and chiplets are on the same node right? TSMC don't do cross node stacking or is it that they don't do cross node stacking yet?
Depends on the type of stacking. The SoIC stacking without micro-solder bumps probably requires that the chips be on the same process. The lower-density micro-solder-ball-based stacking doesn’t have those restrictions though, so it should be possible to mix chips made on different process tech at different locations.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
I'd say that Genoa looks like Rome, except it has 12 dies instead of 8. 96 was probably chosen because of power consumption and perhaps space. Have to read that article to see what Charlie thinks but Bergamo could be stacked dies or stacked on top of the IO die. Either way the power consumption is going to be crazy.
I was thinking the same, but that requires 3 links per quadrant. They might have gone that route for the first iteration of Zen 4 with only stacked L3. Going up to 128-core doesn’t really fit with that though.

I have also been thinking that 4 die (or stacks) is 32 cores and they could be connected with LSI since they can be directly adjacent to the IO die. Even with higher core counts available, I would expect most sales are still 32-core or less, so the common, single layer part is cheap. Going up to 2 layers is 64 cores, 3 layers is 96, and 4 layers would be 128. The thermal constraint would get worse with each layer, although the SoIC stacking without micro-solder balls has good thermal conductivity compared to the micro-solder ball solution. The die are also polished down very thin. Perhaps 3 layers is doable, but 4 pushes the clocks too low without using some extra cooling tech, which might come later, in a refreshed version. I don’t know how they would handle cache die stacked on top unless they only use the cache die on 32-core, single layer, single core optimized parts only, basically F-series.
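To make the layer arithmetic above concrete, here is a minimal sketch. The 8 cores per die is inferred from "4 die (or stacks) is 32 cores"; the constant and function names are purely illustrative and not anything AMD has disclosed.

```python
# Assumption: 8 cores per die, inferred from "4 die (or stacks) is 32 cores".
CORES_PER_DIE = 8
# From the post: 4 dies (or stacks) sit directly adjacent to the IO die.
DIES_PER_LAYER = 4


def cores_for_layers(layers: int) -> int:
    """Total core count if every stack is `layers` dies high."""
    if layers < 1:
        raise ValueError("need at least one layer")
    return layers * DIES_PER_LAYER * CORES_PER_DIE


# Core count for 1 to 4 layers, matching the 32/64/96/128 progression.
core_counts = {layers: cores_for_layers(layers) for layers in range(1, 5)}
print(core_counts)  # {1: 32, 2: 64, 3: 96, 4: 128}
```

This only reproduces the counting; it says nothing about whether the thermals of 3 or 4 layers are actually workable.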

It should be interesting once we actually get some 3D stacking rather than just 2.5D. Speculation is going to be all over the place due to the number of possibilities with die stacking tech.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,791
136
If AMD is eventually going MCM with their GPUs it wouldn't be a bad idea to get some practical experience with an APU first to work out some of the quirks of using such an approach.
They have this already with Aldebaran, and it seems to have been available outside AMD for some time, to partners like HPE and supporting ISVs.

Solid state heat conduction for 3D stacks sounds cool and simple in theory.

But in practice it simply isn't viable for logic stacks without some radically new tech, perhaps thermal transistors, which are still largely at the lab research phase IIRC.

The main problem is that the heat won't just move in the right direction without some kind of external force acting upon it, as with liquid current, or the process through which heat pipes work (wicking?) - though I'm not even certain a heat pipe could work adequately at that scale.
Yeah this article I posted describes some of it.
The temperature gradient induces mechanical stress leading to device failure.
There are patents for thermoelectric devices embedded in the device to address these topics, but the temperature gradient could be a problem.
Immersion cooling, which is very common in HPC, could actually accelerate device failure due to the temperature gradient (in case the heat from the bottom die cannot be dissipated evenly). SoIC packaging is highly desirable because of this.
Another knob to tune in addition to the already very long list of knobs.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Interesting news about 170W. These power limits pretty much put a ceiling on stock CPU all core clocks. So it's not like Zen4 uses 170W, it's more like AMD knows that it needs 170W to reach performance they want to have with top SKUs.
AMD could release 170W Zen 3 parts today as long as there is a socket with a power delivery spec that allows it. The CPU is ready to use that power budget with ease; there is just no competition from Intel, so they can enjoy the efficiency advantages of running lower all-core clocks and voltages.
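A rough illustration of why lower all-core clocks and voltages buy efficiency: the usual first-order approximation is that dynamic power scales with frequency times voltage squared. The sketch below ignores leakage and uses made-up ratios (they are assumptions, not Zen 3 or Zen 4 measurements), so treat the numbers as illustrative only.

```python
def relative_dynamic_power(freq_ratio: float, voltage_ratio: float) -> float:
    """First-order dynamic power scaling: P is proportional to f * V^2.

    Both arguments are ratios relative to a baseline operating point.
    Leakage power is ignored.
    """
    if freq_ratio <= 0 or voltage_ratio <= 0:
        raise ValueError("ratios must be positive")
    return freq_ratio * voltage_ratio ** 2


# Hypothetical example: a 10% higher all-core clock that needs 10% more voltage.
power_ratio = relative_dynamic_power(1.10, 1.10)
print(f"{power_ratio:.2f}x power for 1.10x clock")
```

Under these assumed ratios, a 10% clock bump costs roughly a third more power, which is the kind of trade-off a higher socket power limit would leave room for.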
 

soresu

Diamond Member
Dec 19, 2014
4,115
3,570
136
Previous APU was monolithic, so Raphael APU is built on chiplets too?
Raphael is meant to be the successor to Vermeer. It's going to be primarily a high-performance desktop part. It isn't meant to be a replacement for Cezanne.
As DrMrLordX says Raphael is a different market segment and a successor to Vermeer or Vermeer-x (Warhol?).

The replacement for Cezanne will be Rembrandt, likely a monolithic 6nm/N6 die from TSMC.

Rembrandt is rumoured to have 8C Zen3 CPU, 12CU RDNA2 GPU, USB4, PCIe4, DDR5 and probably a host of other goodies less obvious.

It should also support HW AV1 decode as a RDNA2 based chip.
 

soresu

Diamond Member
Dec 19, 2014
4,115
3,570
136
Interesting news about 170W. These power limits pretty much put a ceiling on stock CPU all core clocks. So it's not like Zen4 uses 170W, it's more like AMD knows that it needs 170W to reach performance they want to have with top SKUs.
AMD could release 170W Zen 3 parts today as long as there is a socket with a power delivery spec that allows it. The CPU is ready to use that power budget with ease; there is just no competition from Intel, so they can enjoy the efficiency advantages of running lower all-core clocks and voltages.
Probably mostly a 65W reservation for GPU powaaaa.
 

Mopetar

Diamond Member
Jan 31, 2011
8,488
7,729
136
It would depend on what node the IO die is on. If it's N6, the IGP die might not be big enough to be worthwhile to separate at this point given how good the yields are. We're talking about the lowest viable CU possible, if that's 3 or 6 that's what it will be.

Remember there's going to be mobile versions of the entire Raphael lineup to combat Alder/Raptor Lake-S BGA.

Why build an IO die on N6, though, when you don't see much size reduction (remember the physical interfaces always take up the same amount of space)? Given that AMD already can't get enough wafers to satisfy all of the demand they're seeing, it seems bizarre to go down that route.

Also pairing an IO die with such low-end GPU capabilities seems pointless outside of ensuring that everything now has some minimal onboard video. If you want something more powerful then you need yet another piece of silicon. Are they going to make another IO die with 8 - 12 CU?

If you wanted to make a lowest viable CU product, just make it part of a monolithic die. There were some other rumors about AMD doing an Athlon refresh on a newer node at Global Foundries, so it would seem odd to duplicate that using a far more expensive TSMC node.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
Why build an IO die on N6, though, when you don't see much size reduction (remember the physical interfaces always take up the same amount of space)? Given that AMD already can't get enough wafers to satisfy all of the demand they're seeing, it seems bizarre to go down that route.

Also pairing an IO die with such low-end GPU capabilities seems pointless outside of ensuring that everything now has some minimal onboard video. If you want something more powerful then you need yet another piece of silicon. Are they going to make another IO die with 8 - 12 CU?

If you wanted to make a lowest viable CU product, just make it part of a monolithic die. There were some other rumors about AMD doing an Athlon refresh on a newer node at Global Foundries, so it would seem odd to duplicate that using a far more expensive TSMC node.

Athlons and 8C APUs are not in the same market, but anyway there's no Zen 4 monolithic APU coming before 2023; rumour is that the 5000s replacements will include a GPU.


Of course a truckload of salt is to be considered...

Also the pic posted by Computerbase seems to show three dies of different sizes, dunno if it's related to the article:

article-630x354.b42a6c0d.jpg


 
  • Like
Reactions: Elfear

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
Athlons and 8C APUs are not in the same market, but anyway there's no Zen 4 monolithic APU coming before 2023; rumour is that the 5000s replacements will include a GPU.


Of course a truckload of salt is to be considered...

Also the pic posted by Computerbase seems to show three dies of different sizes, dunno if it's related to the article:

article-630x354.b42a6c0d.jpg



That image looks like the sample Lisa Su showed of the 3d cache CPU where there's the IO die and then 2 chiplets, one with 3d cache and one without to show the difference.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,509
3,191
136
Why build an IO die on N6, though, when you don't see much size reduction (remember the physical interfaces always take up the same amount of space)? Given that AMD already can't get enough wafers to satisfy all of the demand they're seeing, it seems bizarre to go down that route.

Also pairing an IO die with such low-end GPU capabilities seems pointless outside of ensuring that everything now has some minimal onboard video. If you want something more powerful then you need yet another piece of silicon. Are they going to make another IO die with 8 - 12 CU?

If you wanted to make a lowest viable CU product, just make it part of a monolithic die. There were some other rumors about AMD doing an Athlon refresh on a newer node at Global Foundries, so it would seem odd to duplicate that using a far more expensive TSMC node.

Why not use N6? There are two significant issues that AMD needs to address for competitive reasons. The first is excessive power draw from their IOD chips and the IF links between the various dies. N6 can help there: it is a modest improvement over N7, which is itself a big improvement over the GF 12/14LPP currently in use. The N6-based IOD should have notably lower draw from PCIe 4 and from the SerDes links connecting it to the CCDs. It will also allow the memory controller to run more efficiently. This leaves a greater fraction of the package power for the CCDs.

The second big advantage is density. While N6 is only a minor density improvement over N7, it is a big improvement over GF 12/14LPP. Making the IOD more dense allows AMD the space to add a small iGPU, fixing a competitive disadvantage that they have vs. Intel. Furthermore, since Intel has moved to a Xe based iGPU, it has nontrivial performance available to the user. AMD will need a nontrivial amount of die resources to match or beat it, which means something more than the 3CU Vega solution in Raven2/Dali.
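As a rough way to weigh the density argument against the earlier objection that physical interfaces do not shrink, the sketch below splits an IO die into a non-scaling PHY part and a scaling logic part. The 3x figure is the one quoted earlier in the thread; the 100 mm² die size and the 40% PHY share are placeholder assumptions, not measured values.

```python
def shrunk_io_die_area(area_mm2: float, phy_fraction: float,
                       logic_density_gain: float) -> float:
    """Estimate IO die area after a node change.

    Assumes the PHY/analog share keeps its area while the remaining
    logic shrinks by `logic_density_gain`.
    """
    if not 0.0 <= phy_fraction <= 1.0:
        raise ValueError("phy_fraction must be between 0 and 1")
    if logic_density_gain <= 0:
        raise ValueError("logic_density_gain must be positive")
    phy_area = area_mm2 * phy_fraction      # does not scale
    logic_area = area_mm2 - phy_area        # scales with density
    return phy_area + logic_area / logic_density_gain


# Placeholder inputs (hypothetical, for illustration only).
OLD_AREA_MM2 = 100.0
PHY_FRACTION = 0.4
DENSITY_GAIN = 3.0   # "3x" density figure quoted earlier in the thread

new_area = shrunk_io_die_area(OLD_AREA_MM2, PHY_FRACTION, DENSITY_GAIN)
freed_area = OLD_AREA_MM2 - new_area
print(f"new area: {new_area:.1f} mm^2, freed: {freed_area:.1f} mm^2")
```

With these placeholder inputs the die shrinks by far less than 3x overall, but some area is still freed that could in principle go to a small iGPU. The result is only as good as the assumed PHY share.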

As for choosing N6 over N7, the bits of information that we are getting indicate that it will be a long-life node with minimal extra cost over N7, compatible design rules to make migrating IP easier, and a slight wafer yield improvement. While both are more expensive per wafer than GF 12/14LPP, there is a cost saving in not having to ship the IODs from their foundry to the package assembly site, which offsets some of that. AMD already has much of what's needed for an N6 IOD designed for N7/N6 from working on Cezanne, Renoir, and their revisions, so that's less complicated to migrate as well.