Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
800
1,364
136
Except for the details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) and new memory support (likely DDR5).

[Attached slide: Untitled2.png]


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
Last edited:
  • Like
Reactions: richardllewis_01

Mopetar

Diamond Member
Jan 31, 2011
7,935
6,227
136
Why not use N6? There are two significant issues that AMD needs to address for competitive reasons.

The second big advantage is density.

The IO die can't make as much use of a newer node because no matter how physically small the transistors get, the physical interfaces can't shrink. Obviously there's more to the die than just that, but it's still a big part of the total area.

Even if the GPU part only has a really small number of CUs, it still needs the front end and other parts for video display. Normally those only take up a small part of a GPU's total area, but they'll be a proportionally bigger part of a design with a low CU count.

Any space savings you'd get from going to a new node are probably largely lost by adding a GPU, so you're not saving any money with the new node. More likely it costs a fair bit more.
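As a rough illustration of that cost math (every number here is a made-up assumption for the sake of argument, not real foundry pricing or real die sizes), something like this is what I have in mind:

```python
# Back-of-envelope IO die cost comparison. All figures are illustrative
# assumptions, not actual foundry pricing or measured die sizes.

OLD_NODE_COST_PER_MM2 = 1.0   # normalized cost on the current (old) node
NEW_NODE_COST_PER_MM2 = 1.7   # assume the newer node costs ~1.7x per mm^2

io_die_old = 125.0            # assumed IO die area on the old node (mm^2)
shrinkable_fraction = 0.5     # roughly half the die is logic that scales
logic_scaling = 0.65          # assumed logic area scaling on the new node
gpu_block = 20.0              # assumed area for a small iGPU block (mm^2)

# Area on the new node: the PHYs stay the same size, the logic shrinks,
# and the iGPU block gets added on top.
io_die_new = (io_die_old * (1 - shrinkable_fraction)
              + io_die_old * shrinkable_fraction * logic_scaling
              + gpu_block)

cost_old = io_die_old * OLD_NODE_COST_PER_MM2
cost_new = io_die_new * NEW_NODE_COST_PER_MM2

print(f"Old node: {io_die_old:.0f} mm^2, relative cost {cost_old:.0f}")
print(f"New node: {io_die_new:.0f} mm^2, relative cost {cost_new:.0f}")
# With these assumptions the new die ends up barely smaller (~123 mm^2)
# yet costs noticeably more, which is the point being made above.
```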

I just don't see the point outside of providing a bare-bones setup to drive a display, which some people would no doubt appreciate. However, I don't think it adds as much value to the product as it costs to include.
 

jpiniero

Lifer
Oct 1, 2010
14,686
5,316
136
I just don't see the point outside of providing a bare-bones setup to drive a display, which some people would no doubt appreciate. However, I don't think it adds as much value to the product as it costs to include.

The mobile version of Raphael is going to need it.
 

eek2121

Platinum Member
Aug 2, 2005
2,933
4,030
136
The IO die can't make as much use of a newer node because no matter how physically small the transistors get, the physical interfaces can't shrink. Obviously there's more to the die than just that, but it's still a big part of the total area.

Even if the GPU part only has a really small number of CUs, it still needs the front end and other parts for video display. Normally those only take up a small part of a GPU's total area, but they'll be a proportionally bigger part of a design with a low CU count.

Any space savings you'd get from going to a new node are probably largely lost by adding a GPU, so you're not saving any money with the new node. More likely it costs a fair bit more.

I just don't see the point outside of providing a bare-bones setup to drive a display, which some people would no doubt appreciate. However, I don't think it adds as much value to the product as it costs to include.

It is a requirement for larger OEMs. GPUs are near impossible to find. Finally, with FSR the GPU is useful for light gaming.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
It is a requirement for larger OEMs. GPUs are near impossible to find. Finally, with FSR the GPU is useful for light gaming.
From what I recall reading online, all the Raphaels will get an iGPU. Do you think AMD may leverage hardware acceleration like Quick Sync?
 

Mopetar

Diamond Member
Jan 31, 2011
7,935
6,227
136
The mobile version of Raphael is going to need it.

Why go through all this extra trouble and not just make a monolithic APU at that point? Is there enough of a market for 16-core mobile CPUs (that also can't just use a desktop CPU for whatever reason) to justify going to a chiplet design? It seems all that's really accomplished is moving all of the video components off of the APU die (along with anything else that also needs to be there which isn't usually included on a chiplet) and putting them on an IO die instead.

It is a requirement for larger OEMs. GPUs are near impossible to find. Finally, with FSR the GPU is useful for light gaming.

I can certainly see the merit in that, but people are proposing a 3 - 6 CU GPU. That's not going to be terribly great even with FSR. Creating a potentially bigger die containing upwards of 12 CU certainly allows more flexibility, but again it comes at the expense of space and practically erodes any advantages of going with a new node outside of lower power use.

Also what stops AMD from selling APUs to OEMs that don't want to use a dedicated graphics card for those builds? Or why not just develop a chiplet-GPU part that connects to the same IO die that they already use? Basically just run with 1/1 CPU/GPU chiplets instead of the 2/0 arrangement that we see with current Zen 3 desktop parts.

This sounds more and more like a solution in search of a problem.
 

LightningZ71

Golden Member
Mar 10, 2017
1,629
1,898
136
I believe that having the GPU on a separate chiplet on the MCM, as Ryzen currently uses, would cost a significant amount of power. The chiplet would constantly keep the IF link between it and the IOD saturated as it makes calls to memory and drives the displays, so that IF link would be consuming its theoretical maximum power draw continuously. For something that isn't sacrificing power at the altar of maximum performance, it just doesn't make sense to do that.

Going to an N6-based IOD over the GF 14LPP one currently in use is going to allow them enough space on the IOD to have a usable iGPU that is competitive with the market. Yes, I fully realize that IO pad area on the die won't shrink much, but there's still a significant amount of die area that's not involved in that, which can provide enough room for the iGPU. And remember, the APU dies use design rules that are generalized across all the needs of the die, and biased towards the most performance-critical parts from there. The IOD on N6 will certainly have different design rules, or, as they called it, knobs and levers pulled, that will make it more favorable to its intended use. We've already seen this impact density on the SRAM-stacked die in the presentation a few weeks ago.
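Just to put a rough number on that IF link power point (the pJ/bit figure and the traffic levels are pure assumptions for illustration, not measured values):

```python
# Rough estimate of IF link power for a GPU chiplet that keeps the link
# busy with memory traffic plus display scanout. All figures are assumptions.

ENERGY_PER_BIT_PJ = 2.0     # assumed on-package IF energy cost, pJ per bit
gpu_traffic_gb_s = 40.0     # assumed sustained GPU memory traffic (GB/s)
display_gb_s = 1.5          # 4K60 8-bit scanout is on the order of 1.5 GB/s

total_bits_per_s = (gpu_traffic_gb_s + display_gb_s) * 8e9
link_power_w = total_bits_per_s * ENERGY_PER_BIT_PJ * 1e-12

print(f"Estimated IF link power: {link_power_w:.2f} W")
# ~0.66 W with these numbers: nothing on a desktop, but a constant drain
# in a mobile power budget where the display is lit even at idle.
```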

If we believe the rumor that AMD is working on a mobile version of the 12+ core Ryzen products, then this makes even more sense. It allows them to shrink the package a bit, it allows them to save power on the memory controller via the process tech change, and it allows a general reduction in "uncore" power by having a generally more efficient IOD. The extra cost is going to bring them the improvements that they need to be competitive. As a premium product, it will also allow higher ASPs to cover the additional costs.
 
  • Like
Reactions: Joe NYC and Tlh97

jpiniero

Lifer
Oct 1, 2010
14,686
5,316
136
Why go through all this extra trouble and not just make a monolithic APU at that point?

It's on two different nodes - the IO die is presumably on N6 while the CPUs are on N5. Ideally the IGP would be a chiplet on a cheap node but I don't think AMD wants to spend the effort backporting the RDNA2+ IP to some GloFo node.

Is there enough of a market for 16-core mobile CPUs (that also can't just use a desktop CPU for whatever reason) to justify going to a chiplet design?

Might need more than 8 cores for marketing/sales/competitive reasons. And as you've seen with Cezanne, the extra L3 makes a big difference in gaming.
 
  • Like
Reactions: Tlh97

eek2121

Platinum Member
Aug 2, 2005
2,933
4,030
136
Why go through all this extra trouble and not just make a monolithic APU at that point? Is there enough of a market for 16-core mobile CPUs (that also can't just use a desktop CPU for whatever reason) to justify going to a chiplet design? It seems all that's really accomplished is moving all of the video components off of the APU die (along with anything else that also needs to be there which isn't usually included on a chiplet) and putting them on an IO die instead.



I can certainly see the merit in that, but people are proposing a 3 - 6 CU GPU. That's not going to be terribly great even with FSR. Creating a potentially bigger die containing upwards of 12 CU certainly allows more flexibility, but again it comes at the expense of space and practically erodes any advantages of going with a new node outside of lower power use.

Also what stops AMD from selling APUs to OEMs that don't want to use a dedicated graphics card for those builds? Or why not just develop a chiplet-GPU part that connects to the same IO die that they already use? Basically just run with 1/1 CPU/GPU chiplets instead of the 2/0 arrangement that we see with current Zen 3 desktop parts.

This sounds more and more like a solution in search of a problem.

AMD’s competitors all have it. Intel is using Xe for machine learning/AI tasks, so it goes above and beyond gaming.
 

Mopetar

Diamond Member
Jan 31, 2011
7,935
6,227
136
AMD’s competitors all have it. Intel is using Xe for machine learning/AI tasks, so it goes above and beyond gaming.

I don't think that's a particularly good argument considering AMD also sells APUs which contain graphics. While a GPU isn't just limited to gaming, anyone who needs one for the kind of workloads they excel at is going to buy a discrete card because what's included with a CPU typically isn't enough for professional work.

Also, it would probably work considerably better to design separate circuitry for AI/ML tasks as dedicated hardware will be better at that than offloading it to a GPU. Apple does this with their "neural engine" in their SoCs. Even Nvidia has special tensor cores in their GPUs to handle these tasks.

Finally, AMD is having a hard time keeping enough of their Zen 3 CPUs (which completely lack a built-in GPU) in stock to actually satisfy consumer demand. I really don't think they need to go tacking on something that not every consumer needs or wants just because the competition is doing it. If AMD stuck to what all of their competitors were doing, we wouldn't even have Zen in the first place.
 

Mopetar

Diamond Member
Jan 31, 2011
7,935
6,227
136
It's on two different nodes - the IO die is presumably on N6 while the CPUs are on N5. Ideally the IGP would be a chiplet on a cheap node but I don't think AMD wants to spend the effort backporting the RDNA2+ IP to some GloFo node.

This makes even less sense. Why would you put an IO die on N6 and make a graphics chiplet on an older node? The graphics logic pretty much all benefits from a die shrink whereas likely half of the IO die doesn't. Put the graphics chiplet on N6 and the IO die on the cheap node.
 

eek2121

Platinum Member
Aug 2, 2005
2,933
4,030
136
I don't think that's a particularly good argument considering AMD also sells APUs which contain graphics. While a GPU isn't just limited to gaming, anyone who needs one for the kind of workloads they excel at is going to buy a discrete card because what's included with a CPU typically isn't enough for professional work.

Also, it would probably work considerably better to design separate circuitry for AI/ML tasks as dedicated hardware will be better at that than offloading it to a GPU. Apple does this with their "neural engine" in their SoCs. Even Nvidia has special tensor cores in their GPUs to handle these tasks.

Finally, AMD is having a hard time keeping enough of their Zen 3 CPUs (which completely lack a built-in GPU) in stock to actually satisfy consumer demand. I really don't think they need to go tacking on something that not every consumer needs or wants just because the competition is doing it. If AMD stuck to what all of their competitors were doing, we wouldn't even have Zen in the first place.

Why would I buy a GPU for a virus scanner? What about encoding movies? What about photography work? There are several use cases for having a GPU capable of GPGPU workloads. Windows Defender already uses the GPU for virus scanning. Expect Windows and other software to use the GPU for other workloads in the future. Smart compression? Encryption? Fast hashing? Video/photo upscaling?

Why should an end user have to buy a GPU?
 

Mopetar

Diamond Member
Jan 31, 2011
7,935
6,227
136
They don't. There are plenty of APUs that include a bare-bones GPU for people who don't need a lot of 3D graphical capabilities or just do some light processing. Anyone doing any kind of professional work is going to want a discrete GPU. Hell, even a casual user might be able to benefit from a discrete card if they spend a considerable amount of time using those capabilities.

But not everyone needs an iGPU. Why should they be forced to buy something that they don't need or will never use?
 

jpiniero

Lifer
Oct 1, 2010
14,686
5,316
136
This makes even less sense. Why would you put an IO die on N6 and make a graphics chiplet on an older node? The graphics logic pretty much all benefits from a die shrink whereas likely half of the IO die doesn't. Put the graphics chiplet on N6 and the IO die on the cheap node.

Power savings? The graphics could use the power savings too; it's just that I think the tradeoff for the cost savings could be worth it in some cases.

But not everyone needs an iGPU. Why should they be forced to buy something that they don't need or will never use?

Eventually I think AMD will separate the IGP out as a chiplet. When that will happen I have no idea.

I could see the DIY-focused models having the IGP disabled for yield purposes.
 

Mopetar

Diamond Member
Jan 31, 2011
7,935
6,227
136
Maybe I just misread what you wrote, but you basically made it sound like you were proposing the following:

N5: Zen chiplet
N6: IO die
??: GPU chiplet

It makes far more sense to use the following configuration:

N5: Zen chiplet
N6: GPU chiplet
??: IO die

To better illustrate why this is, here are some die annotations from @GPUsAreMagic on Twitter.

This first one is Zeppelin (Zen1), which was made on the Global Foundries 14LP process:

[Annotated die shot: Zeppelin]


The next is Renoir (Zen2) which was made on the TSMC 7nm process:

[Annotated die shot: Renoir]


Even without doing a detailed analysis, it's pretty easy to just eyeball the core sizes and compare them to the DDR PHY. In the first image it looks like about 2 of the Zen 1 cores (including the blue band identified as the L2 cache) will fit in the same physical area. Now compare this with the second image (here the gold areas identified as "Core" contain the L2 cache), where you can fit at least 4 of those Zen 2 cores. It's pretty clear that the cores were able to get a lot smaller thanks to the shrink from the 14nm GF node to the 7nm TSMC node. The size of the IO doesn't get a similar benefit.

For one more point of reference, here's the IO die that was used with Matisse (Zen2), which I believe used the 12nm process from Global Foundries:

[Annotated die shot: Matisse IO die]


All of the parts annotated PHY are parts that don't really benefit from a process node shrink. An eyeball estimate puts them at slightly less than half of the die area.
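To make that concrete with some made-up but plausible numbers (the areas and scaling factor below are assumptions, not measurements from these die shots):

```python
# Why a die that's roughly half PHY barely benefits from a shrink, while a
# die that's mostly cores and cache does. All areas/factors are assumptions.

logic_scaling = 0.6   # assumed logic/SRAM area scaling on the newer node

def shrunk_area(total_mm2: float, phy_fraction: float) -> float:
    """Area after a shrink where only the non-PHY portion scales."""
    phy = total_mm2 * phy_fraction
    logic = total_mm2 * (1 - phy_fraction) * logic_scaling
    return phy + logic

io_die = 125.0   # assumed IO-die-like chip, ~50% PHY
ccd = 80.0       # assumed core-chiplet-like chip, ~10% PHY/interfaces

print(f"IO die:  {io_die:.0f} -> {shrunk_area(io_die, 0.5):.0f} mm^2")
print(f"Chiplet: {ccd:.0f} -> {shrunk_area(ccd, 0.1):.0f} mm^2")
# IO die:  125 -> 100 mm^2 (only ~20% smaller)
# Chiplet:  80 ->  51 mm^2 (~36% smaller) -- the chiplet gets far more
# benefit out of the same expensive wafer area.
```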

Maybe in an ideal world where you can get as many wafers as you want, you put your IO die on the best node available just because it will have better power characteristics and that's important, but we don't live in that reality. So why waste wafers on a chip that mostly contains parts that don't shrink well, instead of on another chip that scales far, far better?

Perhaps there's some really clever design possible, such as stacking a lot of 3D cache on top of part (or all) of that PHY, but I can't really speak to the feasibility of doing something like that. However, the only reason to move the IO die to N6 is if they can't make anything else on it, and given the utter shortage of GPUs right now, that's really hard to believe.
 
  • Like
Reactions: Vattila

Mopetar

Diamond Member
Jan 31, 2011
7,935
6,227
136
It may be a moot point anyway, since I think the main reason AMD is using N6 for the IO die is that the IP has been designed for that node.

That's just as well explained by an eventual APU (or other products like SoCs or GPUs that would also utilize those units) being designed for N6. An APU needs to incorporate all of the IO onto the APU, so there would be a need to have designs for all of that on the process as well. Rumors about someone doing design work for that on a particular node don't necessarily imply it's for an IO die.
 
Last edited:

Ajay

Lifer
Jan 8, 2001
15,614
7,945
136
That's just as well explained by an eventual APU (or other products like SoCs or GPUs that would also utilize those units) being designed for N6. An APU needs to incorporate all of the IO onto the APU, so there would be a need to have designs for all of that on the process as well. Rumors about someone doing design work for that on a particular node don't necessarily imply it's for an IO die.
Seems like it would make much more sense to target 7N for Zen5 IODs. Save EUV for the chiplets where it will make the most difference.
 

moinmoin

Diamond Member
Jun 1, 2017
4,975
7,735
136
Including the iGPU on the IOD makes perfect sense for one reason: the GPU needs bandwidth first and foremost, and the memory controllers are part of the IOD. The bandwidth of the links to the individual chiplets, while sufficient for CPU cores, is rather lackluster; a link to a GPUlet would need to run through something other than organic substrate to achieve the necessary higher bandwidth, likely whatever a future MCM-based GPU would also use.
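As a rough sketch of that gap (the 32-byte-per-FCLK read width is the commonly cited figure for the current IFOP links; the GPU-side numbers are assumptions):

```python
# Comparing a current-style IFOP chiplet link against what even a small GPU
# would want. The 32B/16B per-FCLK widths are the commonly cited IFOP
# figures for Zen 2/3; the GPU-side numbers are assumptions.

fclk_hz = 1.8e9                        # assumed FCLK with fast DDR4/DDR5
ifop_read_gb_s = 32 * fclk_hz / 1e9    # ~58 GB/s toward the chiplet
ifop_write_gb_s = 16 * fclk_hz / 1e9   # ~29 GB/s back to the IOD

cu_count = 8                           # assumed small GPUlet
gb_s_per_cu = 10.0                     # rough RDNA-era bandwidth per CU

gpu_demand_gb_s = cu_count * gb_s_per_cu
print(f"IFOP read bandwidth:  {ifop_read_gb_s:.0f} GB/s")
print(f"GPU bandwidth demand: {gpu_demand_gb_s:.0f} GB/s")
# Even this small GPUlet would want more than one IFOP link can deliver,
# before the CPU chiplets get any memory traffic at all.
```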

Seems like it would make much more sense to target 7N for Zen5 IODs. Save EUV for the chiplets where it will make the most difference.
TSMC is encouraging all N7-using customers to use N6 instead in the future.
 

tomatosummit

Member
Mar 21, 2019
184
177
116
Including the iGPU on the IOD makes perfect sense for one reason: the GPU needs bandwidth first and foremost, ...

TSMC is encouraging all N7-using customers to use N6 instead in the future.

I agree the GPU should be on the IO die, but I really don't think it's going to be a high-performance part; 3-6 CUs would be more than adequate, so needing the extra bandwidth might be a moot point.
I think it's mostly to increase the target market for the high-performance Zen 4 desktop CPUs: OEMs would eat up being able to market 16-core business PCs, and you'd have the most powerful desktop-replacement laptops around that don't rely on a discrete GPU.

Leaks have said for a while that the IO die will shrink; N7 or N6 is neither here nor there in the grand scheme, given the shared production capabilities and design compatibility.
What I think will come with the shrink is leftover die space. People keep saying the PHY modules will not shrink, and that might set a minimum size limit for the die. If I weren't lazy I'd take the Renoir die layout, remove the two CCXs and a bit of uncore, and see how it fits together after some rearrangement. Two (or three) IF links have to be added as well, which takes up more edge space on the already PHY-heavy IO die, but once all that is done I think there'll be some leftover space in the middle which would suit a small iGPU.
I'll wait for someone to tell me PHYs can be inset now and I'm an idiot.
 

Mopetar

Diamond Member
Jan 31, 2011
7,935
6,227
136
The bandwidth of the links to the individual chiplets, while sufficient for CPU cores, is rather lackluster; a link to a GPUlet would need to run through something other than organic substrate to achieve the necessary higher bandwidth, likely whatever a future MCM-based GPU would also use.

That makes a certain amount of sense. It's not necessarily a problem if more IF links are used for a GPU chiplet, but it will still ultimately be limited by the bandwidth between the IO die and the system memory. DDR5 will increase that by a fair amount, and there's also the possibility of utilizing Infinity Cache to help offset it. I'm sure that the IFOPs for Zen 4/Zen 5 will need to be a bit beefier just to handle the additional memory bandwidth that DDR5 will provide.

Another possibility is adding a specialized link to allow higher transfer rates to the GPU. They've already talked about a 100 GB/s link that they've developed. I believe that Navi 21 has two of those links, but they may be intended for professional cards only since they haven't been talked about all that much. Of course that's not necessarily ideal because it now adds a specialized link to the IO die that wouldn't always be used. However, it is rather close to the 112 GB/s bandwidth that DDR5 will bring.

Still, that also creates a strong argument for just continuing to build a monolithic APU as well. Personally I think it would be interesting if they could manage to build something where the CPU and GPU share an exceptionally large L3 cache that can be provisioned as necessary. That makes for the most efficient overall design from the perspective of minimizing power use, which for a laptop chip is always going to be one of the most important aspects to design around.

Based on RDNA 1 cards, AMD seemed to have somewhere around 10 GB/s of memory bandwidth available for every CU, at least for the non-OEM parts that didn't have gimped memory. Assuming that future APUs/chiplets don't have any infinity cache to alleviate some of that, they probably wouldn't want more than 12 CU as a part of the design depending on what clock speeds they're targeting. That aside, if they are targeting some kind of MCM design for GPUs in the future there are probably some parts of the GPU that would do better on whatever they call the die that's designed to connect all of the GPU chiplets.
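Roughly, the arithmetic I have in mind looks like this (the transfer rates and the ~10 GB/s per CU figure are assumptions carried over from above):

```python
# Rough CU budget from dual-channel DDR5 bandwidth at ~10 GB/s per CU.
# Transfer rates and the per-CU figure are assumptions for illustration.

def ddr_bandwidth_gb_s(mt_s: float, bus_width_bits: int = 128) -> float:
    """Peak bandwidth for a given transfer rate and total bus width."""
    return mt_s * 1e6 * (bus_width_bits / 8) / 1e9

gb_s_per_cu = 10.0   # rough RDNA 1-era bandwidth per CU, as noted above

for mt_s in (4800, 5600, 7000):
    bw = ddr_bandwidth_gb_s(mt_s)
    print(f"DDR5-{mt_s}: {bw:5.1f} GB/s -> ~{bw / gb_s_per_cu:.0f} CUs")
# DDR5-4800:  76.8 GB/s -> ~8 CUs
# DDR5-5600:  89.6 GB/s -> ~9 CUs
# DDR5-7000: 112.0 GB/s -> ~11 CUs (the 112 GB/s figure mentioned earlier)
# The CPU cores need bandwidth too, so without Infinity Cache a ~12 CU
# design is about the practical ceiling.
```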
 
  • Like
Reactions: Vattila

Thibsie

Senior member
Apr 25, 2017
765
834
136
They don't. There are plenty of APUs that include a bare-bones GPU for people who don't need a lot of 3D graphical capabilities or just do some light processing. Anyone doing any kind of professional work is going to want a discrete GPU. Hell, even a casual user might be able to benefit from a discrete card if they spend a considerable amount of time using those capabilities.

But not everyone needs an iGPU. Why should they be forced to buy something that they don't need or will never use?

Mom and Dad don't need AVX, so why do they pay for it?
 

Ajay

Lifer
Jan 8, 2001
15,614
7,945
136
Including the iGPU on the IOD makes perfect sense for one reason: the GPU needs bandwidth first and foremost, and the memory controllers are part of the IOD. The bandwidth of the links to the individual chiplets, while sufficient for CPU cores, is rather lackluster; a link to a GPUlet would need to run through something other than organic substrate to achieve the necessary higher bandwidth, likely whatever a future MCM-based GPU would also use.


TSMC is encouraging all N7-using customers to use N6 instead in the future.
Still doesn't make any sense for AMD to use it unless 7N is, somehow, now more expensive than 6N. AMD is the customer - they choose the node they want to use.