Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
809
1,412
136
Except for details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
Last edited:
  • Like
Reactions: richardllewis_01

jamescox

Senior member
Nov 11, 2009
644
1,105
136
There will be a glut of N6 capacity by the time Zen 4 comes out. The mobile hordes are already moving from N7 to N5, and we are still at least 6 months away from first wafer starts for Zen 4 and its IO die.

By then, the mobile customers of TSMC will not only be gone from N7/N6, they will also be in their seasonal low periods.



A separate chip is a goal I bet AMD is aspiring to, but it would mean another communication link (from this GPU chiplet to the I/O die), with its power and latency overhead.

Big OEMs like to have some base video.

Whether AMD tries to go beyond that remains to be seen.



But if AMD is going to keep reusing the CCD between desktop and server, with the potential of multiple CCDs per package, putting the graphics on the CCD is a non-starter.
There is almost no overhead for SoIC stacking, which doesn't use any micro-solder balls. The stacked solution will be more expensive, though, so there needs to be a reason to make the stacked device on a separate process rather than just put it on the same die. For the cache die, they seem to have achieved significantly higher density in the 64 MB cache chip by making it on a slightly different process variant. It is probably almost entirely cache, though, with a lot of the control already on the base CCD. I don't think there is much of an advantage for the GPU to be on a separate process from the CPU, although the GPU might be better on a more density-optimized variant with the CPU on a more high-performance variant. I don't know if there is enough of an advantage that incurring the cost of a stacked device will make sense. I have no idea how much the stacking tech costs, though, compared to just using more die area.
 
  • Like
Reactions: Tlh97 and Joe NYC

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
The exotic cooling might be needed for logic on logic stacking, but not really for L3 SRAM stacking.
Oh, I see now. You were referring to core layers, in reference to the theory that AMD would layer cores/logic atop one another and would have to come up with a method to cool these layers. This is a very interesting and hotly discussed (pun not intended) topic when it comes to future Zens, since space on the mobo is at a premium the smaller the mobo gets.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
We don't know exactly where the I/O overhead lies, but if a big part of it is in the I/O die, then perhaps that would be part of the motivation.

Stacking on top of the I/O die might be simpler with TSMC, and also, if the GPU resides in the I/O die, it could be a beefier and more efficient GPU.

I think the long-term objective might be to move APUs to the chiplet era, and using TSMC could bring AMD closer to that goal.
That's exactly what I'm alluding to. TSMC has a wealth of tech they can implement for a tighter stack of a product from start to finish for clients. For reference, "tighter stack" doesn't refer to physically stacked layers, but to the overall product offerings, public knowledge or not.
 
  • Like
Reactions: Tlh97 and Joe NYC

Joe NYC

Platinum Member
Jun 26, 2021
2,535
3,466
106
There is almost no overhead for SoIC stacking, which doesn't use any micro-solder balls. The stacked solution will be more expensive, though, so there needs to be a reason to make the stacked device on a separate process rather than just put it on the same die. For the cache die, they seem to have achieved significantly higher density in the 64 MB cache chip by making it on a slightly different process variant. It is probably almost entirely cache, though, with a lot of the control already on the base CCD. I don't think there is much of an advantage for the GPU to be on a separate process from the CPU, although the GPU might be better on a more density-optimized variant with the CPU on a more high-performance variant. I don't know if there is enough of an advantage that incurring the cost of a stacked device will make sense. I have no idea how much the stacking tech costs, though, compared to just using more die area.

There are a few advantages to stacking vs. the same die:
- optimized process (as you mentioned), or even a different node: 5nm Zen 4 may have 6nm L3 V-Cache
- the cost of a processed SRAM wafer is likely a fraction of the price of a logic wafer, so L3 on a logic wafer is a lot more expensive. If you have 80 mm2 of logic and 288 mm2 of L3 (see below), that cost differential may be higher than the entire cost of stacking.
- the 8x64 MB would actually not be 288 mm2 on a logic-oriented, non-optimized die. It may in fact be close to double that die size.
- going 1, 2, 4, or 8 levels high requires just 2 types of die, rather than 5 (base + 4) or salvaging dies with less L3.
- the area of the core CCD + 8 levels of L3 is 80 + 8*36 = 80 + 288 = 368 mm2, which could hurt yields, vs. assembling 9 (tested) good dies.
- AMD can decide late whether it wants to stack any L3, and how many layers
- distances are actually shorter in a stacked die vs. monolithic, since each layer is extremely thin.

The advantages of stacking IMO could completely outweigh the costs. Additionally, stacking turns an insane product spec - say, an Epyc with 2 GB of L3 - into something quite feasible and affordable.
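A quick sketch of the arithmetic in the bullets above. The 80 mm2 logic and 36 mm2 cache-die figures, the "roughly double" density penalty, and the defect density in the toy Poisson yield model are all illustrative assumptions from this discussion, not official AMD or TSMC numbers:

```python
import math

# Back-of-envelope figures from the bullet list above (assumptions):
CCD_LOGIC_MM2 = 80     # core/logic area of one CCD
VCACHE_DIE_MM2 = 36    # one stacked 64 MB cache die
LAYERS = 8

# Stacked approach: one base CCD plus 8 cache dies, each tested before assembly.
stacked_silicon = CCD_LOGIC_MM2 + LAYERS * VCACHE_DIE_MM2   # 80 + 288 = 368 mm2

# Monolithic alternative: SRAM on the logic-optimized process is taken as
# roughly half as dense, i.e. close to double the area for the same 512 MB.
monolithic_l3 = 2 * LAYERS * VCACHE_DIE_MM2                 # 576 mm2 of L3
monolithic_die = CCD_LOGIC_MM2 + monolithic_l3              # 656 mm2, one die

# Toy Poisson yield model, Y = exp(-D * A), with an illustrative defect
# density D = 0.001 defects/mm2, to show why one huge die yields worse
# than 9 small, individually tested dies.
def die_yield(area_mm2, defects_per_mm2=0.001):
    return math.exp(-defects_per_mm2 * area_mm2)

print(f"stacked total silicon:    {stacked_silicon} mm2")
print(f"monolithic die:           {monolithic_die} mm2")
print(f"monolithic die yield:     {die_yield(monolithic_die):.0%}")
print(f"80 mm2 CCD yield:         {die_yield(CCD_LOGIC_MM2):.0%}")
print(f"36 mm2 cache die yield:   {die_yield(VCACHE_DIE_MM2):.0%}")
```

With these made-up numbers the 656 mm2 monolithic die yields around half, while each small die yields well above 90%, which is the "9 tested good dies" argument in a nutshell.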
 
  • Like
Reactions: Tlh97

Joe NYC

Platinum Member
Jun 26, 2021
2,535
3,466
106
Oh, I see now. You were referring to core layers, in reference to the theory that AMD would layer cores/logic atop one another and would have to come up with a method to cool these layers. This is a very interesting and hotly discussed (pun not intended) topic when it comes to future Zens, since space on the mobo is at a premium the smaller the mobo gets.

The CCD chiplet is still quite small compared to the size of the overall MCM. Probably not even 20% of the MCM area for a single chiplet:

1626581482704.png

But I think the best way to save space on desktop and mobile mobos is to integrate DRAM right into the CPU MCM, like Apple has done. That space saving, from eliminating the need for memory DIMMs, is the same as or greater than the total area of the CPU MCM.
 
  • Like
Reactions: Tlh97

Joe NYC

Platinum Member
Jun 26, 2021
2,535
3,466
106
That's exactly what I'm alluding to. TSMC has a wealth of tech they can implement for a tighter stack of a product from start to finish for clients. In reference, "tighter stack" isn't inferring physical stacked layers, but overall product offerings, public knowledge or not.

Yup. Also worth considering is the level of automation TSMC is installing in their facilities for assembly and stacking - I think the resulting cost to customers will be very competitive.

And it opens the door for TSMC to sell even more wafers as a result of stacking...
 
  • Like
Reactions: Tlh97 and A///

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
The CCD chiplet is still quite small compared to the size of the overall MCM. Probably not even 20% of the MCM area for a single chiplet:

View attachment 47340

But I think the best way to save space on desktop and mobile mobos is to integrate DRAM right into the CPU MCM, like Apple has done. That space saving, from eliminating the need for memory DIMMs, is the same as or greater than the total area of the CPU MCM.
Having held CCD chiplets in my hand that were 'torn' off a CPU (I don't have them anymore), they are indeed small. We likely won't see DRAM on the CPU replace DIMMs anytime soon. It's more likely we'll see both on-CPU memory and DIMM slots in the future. I know I sound crazy, but AMD published a research paper on this a few years ago. I want to say 2014 or 2016.
 
  • Like
Reactions: Tlh97

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
When Apple introduces a new line of iPhone, they continue selling last year's model at a discount. For the last few years they've been keeping the low end of the two-year-old model on the price list as well. Plus there's the "SE", which is only updated every few years and stays on the same SoC for a while - the "SE2" uses the A13, which is N7P, and will be around for 2 or 3 years. When you see carrier deals for iPhones, they are often last year's version, probably because they are cheaper for the carrier, and Apple is perhaps willing to offer additional discounts on those that they won't on the latest and greatest, so they probably sell a lot more of those than you'd think.

I've seen estimates that only about half of Apple's iPhone sales are of the latest model, and the other half are the N+1/N+2 and SE. Whether that's true we have no way of verifying, but the fact that analysts are guessing that probably means there's plenty of evidence that those older phones sell in much larger numbers than you seem to think.

Then there's the iPad (non-Pro), the iPad Mini, and the Apple TV (three generations now - they still ship the Apple TV HD, which uses the 20nm A8, and the old 4K model using the 10nm A10X). Don't forget about the Apple Watch S-series SoC, which they sell multiple years' versions of, and the HomePod, which I think is also A8, or at least was at first. And I'm not really sure what process the "W" chips in the AirPods are made with, but it is not going to be N5, simply based on when the various models were released.

None of those sell as many units as the iPhone, and some (i.e. Watch and AirPods) have smaller, less complex chips, but when you add all that stuff together, that's a crapload of wafers on multiple processes older than N5. And that doesn't even count any ancillary chips that may be included in various products that don't get the "fanfare" of the A*, M*, S*, and W* chips.
It really depends on the take rate of these older devices. iPhones last for 5 or 6 years before they become EOL through Apple, and generally don't slow down save for bad batteries. I think the iPhone X and this past year caused more people to buy 'older' models.
 
  • Like
Reactions: Tlh97

jpiniero

Lifer
Oct 1, 2010
15,223
5,768
136
The number of customers who want a 16-core Zen CPU with onboard graphics isn't terribly large. I'm not sure the added cost across an entire product line is worth what niche market segments they might be able to pick up or the small bit of extra convenience that the onboard graphics provides if a GPU goes bad and there isn't a spare to use.

It's not the 16 cores as much as it is the better performance. Especially for discrete GPU gaming. You kind of need the IGP for battery life reasons.

The N6 choice was probably driven by Epyc's need for lower IO die power consumption. They just didn't want to backport the IP to a GloFo node, especially when the IGP lets them stay/be in new markets.

I suppose they could update the existing IO die that the 5000 series is using so that it will work in AM5 and use that in some products. But that would mean DDR4.
 
Last edited:
  • Like
Reactions: Joe NYC

Mopetar

Diamond Member
Jan 31, 2011
8,114
6,770
136
It's not the 16 cores as much as it is the better performance. Especially for discrete GPU gaming. You kind of need the IGP for battery life reasons.

Just use an APU if you want the best battery life. Having to move data over infinity fabric from the IO die to a chiplet is just extra power cost.

You're basically describing someone who needs a 16-core laptop, but also doesn't need a powerful GPU to accompany it. I'm sure that there are some people who do legitimately have those kinds of performance needs, but they're already in the desktop replacement crowd that's going to be plugged in most of the time. Even people who game on laptops are probably going to be plugged in if they're playing anything that requires mouse use.

Who is this hypothetical product actually being made to sell to, and what real-world needs is it addressing? It seems more like the result of working backwards from a conclusion, because it feels entirely like a solution in search of a problem. Only, any problem it might solve is better solved by some other product that already exists.
 

jpiniero

Lifer
Oct 1, 2010
15,223
5,768
136
You're basically describing someone who needs a 16-core laptop, but also doesn't need a powerful GPU to accompany it. I'm sure that there are some people who do legitimately have those kinds of performance needs, but they're already in the desktop replacement crowd that's going to be plugged in most of the time. Even people who game on laptops are probably going to be plugged in if they're playing anything that requires mouse use.

It's for when you would use the laptop outside of gaming. There's also a competitiveness issue: OEMs will lose interest in using AMD in gaming laptops and just use Alder or Raptor exclusively if you don't do this.
 

Mopetar

Diamond Member
Jan 31, 2011
8,114
6,770
136
It's for when you would use the laptop outside of gaming. There's also a competitiveness issue: OEMs will lose interest in using AMD in gaming laptops and just use Alder or Raptor exclusively if you don't do this.

So why can't they just pair an AMD APU with a discrete GPU like they're already doing?
 

Mopetar

Diamond Member
Jan 31, 2011
8,114
6,770
136
And what prevents the next AMD APU from leapfrogging Alder Lake in similar fashion assuming it is behind Alder Lake enough for it to matter?
 

Doug S

Platinum Member
Feb 8, 2020
2,784
4,747
136
Thanks for the detailed post. Hmm... Maybe I was going by subjective experience, having gotten a quote for an older model (on 2 occasions) for our corporate account; when I went to order them, they were gone. Discontinued, no stock.



I am looking at the wafer starts, rather than the trailing indicator of sales. If this is the last sales breakdown:

View attachment 47355

and the wafer starts are what is going to be sold 3-4 months in the future, I would say that, as of today, none of the N7-based iPhone 11 models have any more wafer starts. Only the N5-based ones for the 12 and the upcoming 13 (wafers started in May, 2 months ago).

From the overall TSMC sales figures, N7 was down from Q1 to Q2. Apple may have contributed to it, and other mobile players as well. But in Q3, TSMC has no more mainstream iPhone orders for N7 (only the budget models); all mainstream models are on N5.

The M1 is on N5 (and the new upcoming variant as well).



There is the iPhone, a big gap, the Mac, a big gap, and then everything else - not really volume products in the same class.

You may be right about some ancillary chips going into these high-volume products that may be on a trailing-edge process node, but Apple has been putting just about everything on the SoC, so there is less stuff outside of the SoC. And those ancillary chips are likely on nodes behind the node of interest in this thread for AMD, which as of now is N7 in 2021.


Check the image you inserted more closely. It shows the N7 XR still being sold in June 2021 - and indeed it is still on Apple's site. That will almost certainly be dropped this fall, but they will carry the 11 as their "two years old" model, as well as the SE2, which will both require N7P wafer starts through next spring (longer for the SE2, which at the earliest will be upgraded in spring 2023).

You're also mistaken on the volumes. The iPad, AirPods, and Apple Watch all have higher unit volumes than the Mac. Granted, the W* & S* in the latter two are significantly smaller than the M1, so if you go by wafer volume then the iPad is the only one that beats the Mac.
 
  • Like
Reactions: Tlh97 and Ajay

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
You're also mistaken on the volumes. The iPad, AirPods, and Apple Watch all have higher unit volumes than the Mac. Granted, the W* & S* in the latter two are significantly smaller than the M1, so if you go by wafer volume then the iPad is the only one that beats the Mac.
Yes, the various iPads sell around 60M units per year. Not sure what the breakdown is by SoC generation.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
I suppose they could update the existing IO die that the 5000 series is using so that it will work in AM5 and use that in some products. But that would mean DDR4.
I think I commented on this the other day or some weeks ago, but from what I've personally gathered, including asking industry colleagues who work on more complex stuff than I do, the IOD will likely be redesigned for TSMC's N7 for the Zen 4/6000 series. Epyc will probably use N6 and take advantage of the efficiency gain it provides over N7. My personal hot take was an N7 IOD for both Ryzen and Epyc, though the proposed theory makes a lot of sense because you're not wasting wafer time on a single node. I suspect Threadripper may get the N6 IOD from Epyc, too, simply because the HEDT space is going to heat up going forward, with Intel possibly looking to get back into the game in 2-3 generations if their big.LITTLE venture pays off and simply isn't a stupid internet meme.
 
  • Like
Reactions: Tlh97 and Joe NYC

A///

Diamond Member
Feb 24, 2017
4,351
3,158
136
And what prevents the next AMD APU from leapfrogging Alder Lake in similar fashion assuming it is behind Alder Lake enough for it to matter?
Nothing. If you were of age from the 90s into the mid-2000s and could comprehend benchmark articles, then none of this should surprise you. It's younger folks who only grew up with the Core era thinking Intel was the best. Core was a whirlwind of an architecture for Intel, truly one of their best in history. Younger folks - and I want to be careful here and not make a derogatory comment on younger generations, because I'm not that kind of person (old man yells at sky) - should know that Intel has failed more than they have succeeded. This can be said for just about any company in tech, but Intel was bested by AMD over and over again. I remember speaking with former Intel engineers from that era who claimed they suffered mental health problems because Intel's corporate level kept beating down on them to give it their all against AMD, who'd routinely snatch Intel's lunch money.

I hope Intel does make a decent effort, because if AMD is left alone, not only will they keep improving - they're not a company that prefers being stagnant (they have this "us first" fascination to them) - but as a result of bringing better tech, they will charge more. If people balk at the prices now, like @DrMrLordX did back before Zen 3 launched, then they'll lose their minds in a few years.
 
  • Like
Reactions: Ajay and Tlh97

Joe NYC

Platinum Member
Jun 26, 2021
2,535
3,466
106
Check the image you inserted more closely. It shows the N7 XR still being sold in June 2021 - and indeed it is still on Apple's site. That will almost certainly be dropped this fall, but they will carry the 11 as their "two years old" model, as well as the SE2, which will both require N7P wafer starts through next spring (longer for the SE2, which at the earliest will be upgraded in spring 2023).

The X model is at ~5%, and that's where a single 11 will be headed, while the higher-end models of the 12 will start to disappear.

BTW, I don't know what it is about the SE, but it is being pushed by phone carriers in the US.

You're also mistaken on the volumes. The iPad, AirPods, and Apple Watch all have higher unit volumes than the Mac. Granted, the W* & S* in the latter two are significantly smaller than the M1, so if you go by wafer volume then the iPad is the only one that beats the Mac.

Wow, I am surprised about the iPad being higher volume than the Mac. I have to look into it some more, since tablets are really obsolete as consumer devices. I do see them being used at some businesses.

Looking at it, the iPad Pro and Air are on N5, with the Mini likely moving to N5 later this year, which just leaves the regular iPad on N7.

The other devices are small dies and/or small volumes to take into account for TSMC N7.
 
  • Like
Reactions: Tlh97

DrMrLordX

Lifer
Apr 27, 2000
22,065
11,693
136
I am skeptical about whether or not OEMs will want Raphael in their laptops and mass-market desktops. The only flaw in AMD's APUs as of late is that they have been late to market compared to desktop, workstation, and server parts. AMD's current Cezanne offerings are very competitive CPU-wise (unlike their APUs from days of yore). Even if Vermeer had had an iGPU:

-Vermeer only had 4 SKUs
-Vermeer could not be had for less than $299 MSRP
-Vermeer had an I/O die with all the power draw issues that come with it (see below)
-Vermeer only had one current-gen "budget" chipset option (A520), which might not have met all OEMs' needs. B550 and X570 might not have met OEM price targets.

AMD's OEM laptop problems in particular are often related to power budget. AMD's APUs still have relatively "big" iGPUs which means a fairly significant part of the cTDP budget has to be committed to the iGPU in instances where it can be utilized. OEMs seem interested in including budget dGPUs to shift power budget away from the SoC and enable them to use cheaper distributed cooling solutions. If they include a budget dGPU and disable the iGPU, it's just wasted silicon. Throw something like Raphael into the mix and now you've got cTDP budget being consumed by the I/O die. Not to speak of potential platform power draw issues!

And judging by the way many OEMs build their "desktops" now, I'm assuming the same issues could prevail in All-in-One desktop units etc.

What OEMs would really want from AMD is something like their 5900HX or HS, only with a tiny iGPU, so that OEMs can slap on whatever dGPU they like and then configure cTDP accordingly. Raphael doesn't give OEMs that (read: I/O die), and I'm not sure Rembrandt will either.

For OEM boxes in the $1000+ range that feature decent build quality and more serious components, I do see Raphael with a small iGPU as being potentially viable, even for non-gamer (read: workstation) buyers. That's where I see the Raphael iGPU opening a few doors for AMD. But for laptops and cheaper desktops? Maybe not so much.
 
May 17, 2020
123
233
116
I am skeptical about whether or not OEMs will want Raphael in their laptops and mass-market desktops. The only flaw in AMD's APUs as of late is that they have been late to market compared to desktop, workstation, and server parts. AMD's current Cezanne offerings are very competitive CPU-wise (unlike their APUs from days of yore). Even if Vermeer had had an iGPU:
Raphael is on the AM5 socket, so it's only for desktop, not for laptop; after Cezanne for laptop there will be Rembrandt and Phoenix.

AMD's OEM laptop problems in particular are often related to power budget.
Renoir is power efficient, like Cezanne, so I don't understand. Which laptops are you talking about?
 
  • Like
Reactions: Tlh97 and Joe NYC

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
I am skeptical about whether or not OEMs will want Raphael in their laptops and mass-market desktops. The only flaw in AMD's APUs as of late is that they have been late to market compared to desktop, workstation, and server parts. AMD's current Cezanne offerings are very competitive CPU-wise (unlike their APUs from days of yore). Even if Vermeer had had an iGPU:

-Vermeer only had 4 SKUs
-Vermeer could not be had for less than $299 MSRP
-Vermeer had an I/O die with all the power draw issues that come with it (see below)
-Vermeer only had one current-gen "budget" chipset option (A520), which might not have met all OEMs' needs. B550 and X570 might not have met OEM price targets.

AMD's OEM laptop problems in particular are often related to power budget. AMD's APUs still have relatively "big" iGPUs which means a fairly significant part of the cTDP budget has to be committed to the iGPU in instances where it can be utilized. OEMs seem interested in including budget dGPUs to shift power budget away from the SoC and enable them to use cheaper distributed cooling solutions. If they include a budget dGPU and disable the iGPU, it's just wasted silicon. Throw something like Raphael into the mix and now you've got cTDP budget being consumed by the I/O die. Not to speak of potential platform power draw issues!

And judging by the way many OEMs build their "desktops" now, I'm assuming the same issues could prevail in All-in-One desktop units etc.

What OEMs would really want from AMD is something like their 5900HX or HS, only with a tiny iGPU, so that OEMs can slap on whatever dGPU they like and then configure cTDP accordingly. Raphael doesn't give OEMs that (read: I/O die), and I'm not sure Rembrandt will either.

For OEM boxes in the $1000+ range that feature decent build quality and more serious components, I do see Raphael with a small iGPU as being potentially viable, even for non-gamer (read: workstation) buyers. That's where I see the Raphael iGPU opening a few doors for AMD. But for laptops and cheaper desktops? Maybe not so much.

Raphael appears to be the only “Zen 4” part (ignoring Genoa) leaked thus far.

I suspect this means exactly what has been leaked: Raphael will have an iGPU, and there will be no “G” parts this time around.

Laptops aren’t moving to Zen 4, according to leaks.
 

moinmoin

Diamond Member
Jun 1, 2017
5,064
8,032
136
How are regular APUs simultaneously not good enough for AMD, but good enough for the competition? That doesn't make any sense.
You know we are discussing future products? You can bet that Intel will put a version of its 8+8 ADL on laptops as an H series as well, TDP be damned. Currently AMD equals even H-series CPUs with its APUs, but I doubt AMD will do a monolithic 16c APU to be able to do the same with ADL as well. That's where Raphael with any iGPU can slot in.

AMD wants to be clearly ahead of the competition spec-wise. Intel trying to match AMD pushed more and more desktop-style chips into the laptop market, so with said market and its audience getting used to those frankly absurd machines, AMD is likely to follow suit now.
 
  • Like
Reactions: Tlh97
May 17, 2020
123
233
116
You know we are discussing future products? You can bet that Intel will put a version of its 8+8 ADL on laptops as an H series as well, TDP be damned. Currently AMD equals even H-series CPUs with its APUs, but I doubt AMD will do a monolithic 16c APU to be able to do the same with ADL as well. That's where Raphael with any iGPU can slot in.

AMD wants to be clearly ahead of the competition spec-wise. Intel trying to match AMD pushed more and more desktop-style chips into the laptop market, so with said market and its audience getting used to those frankly absurd machines, AMD is likely to follow suit now.
Raphael is AM5 only... Rembrandt and/or Phoenix will be in laptops, and Phoenix is on 5nm; I hope that AMD can fit more cores on it.