Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Apart from the details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3); the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", which I think is likely to double to 64 MB).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) and new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

Kepler_L2

Senior member
Sep 6, 2020
330
1,162
106
It could be just that the Epyc/MI products are coming first and there isn't room to do the client products until later.
MI200 is coming this year, MI300 should be after RDNA3.

EPYC does release early for the big boys (google, facebook, tencent, etc.) but general availability is usually after the desktop release. Lisa Su confirmed recently that Genoa will launch 2022, so imo we should expect Raphael Q3 and Genoa Q4.
 

exquisitechar

Senior member
Apr 18, 2017
657
871
136
MI200 is coming this year, MI300 should be after RDNA3.

EPYC does release early for the big boys (google, facebook, tencent, etc.) but general availability is usually after the desktop release. Lisa Su confirmed recently that Genoa will launch 2022, so imo we should expect Raphael Q3 and Genoa Q4.
It has been rumored that Genoa will launch before Raphael, I think.
I think he's wrong, but we'll see. This would mean over 3 years of development for Zen 4 and 2.5 years for RDNA3, much longer than usual for AMD.
I believe it's true that RDNA3 hasn't been taped out yet, at least. Unfortunately, the rest might be true as well.
 

CakeMonster

Golden Member
Nov 22, 2012
1,389
496
136

This guy seems to think Zen 4 (desktop?) will be Q4 2022.

My bet would also be Q4 '22, since we have confirmation of a Zen 3 with more cache. That gives them time to get everything right, build up some stock, and account for unforeseen problems.

I'm not making shit up for Twitter views though, I'm just a clueless hardware fan who goes with what makes most sense given my very limited knowledge.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
I think he's wrong but we'll see. This would mean over 3 years of development for Zen 4 and 2.5 years for RDNA3, much longer than usual for AMD.
TSMC's 5nm-on-5nm die stacking won't be available until Q3 2022. The consensus is that stacking is the future, so this is expected. In fact, this is the earliest timeframe possible if Zen 4 has die stacking as standard rather than as an add-on.

Production timeframes are limited not only by design, but also by whether the product can actually be manufactured.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
It is theoretically possible that there are Zen4 versions of EPYC without v-cache.
I would say this is almost guaranteed.
Lambda services, load balancers, proxy servers, REST/gRPC API gateways, etc. scale well with cores. I'm not sure the folks hosting such services want to pay a premium for cache-heavy SKUs that won't bring any gain over regular EPYC 7002/7003-type cache SKUs.
I would even suggest that this is why Altra cut the cache on their 128-core chip, which is squarely aimed at nginx-type loads.
The weaker cores, in greater numbers and with less cache, did well in such tests on Phoronix.
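The core-count argument can be put in back-of-envelope form. A minimal sketch (the function name and the numbers are illustrative, not taken from any vendor material): for stateless, CPU-bound request handling, throughput is ceilinged by cores, and extra L3 doesn't move that ceiling if the per-request working set is already tiny.

```python
# Back-of-envelope model: for stateless, CPU-bound request handling
# (gateways, proxies, load balancers), peak throughput is bounded by
# core count once the per-request working set fits easily in cache.
def peak_rps(cores: int, cpu_ms_per_request: float) -> float:
    """Upper bound on requests/second if each request burns a fixed CPU budget."""
    return cores * 1000.0 / cpu_ms_per_request

print(peak_rps(64, 0.5))    # 64 cores, 0.5 ms of CPU per request -> 128000.0
print(peak_rps(128, 0.5))   # doubling cores doubles the ceiling -> 256000.0
```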
 

Doug S

Platinum Member
Feb 8, 2020
2,252
3,481
136
TSMC's 5nm-on-5nm die stacking won't be available until Q3 2022. The consensus is that stacking is the future, so this is expected. In fact, this is the earliest timeframe possible if Zen 4 has die stacking as standard rather than as an add-on.

Production timeframes are limited not only by design, but also by whether the product can actually be manufactured.


If they are waiting a full two years after the initial 5nm ramp to make 5nm die stacking available, that would be entirely down to customer scheduling. If Apple were going to use it, they would have had it available much earlier, so you can probably use the timing to tell when those AMD products will ship (though it is possible they have other customers wanting to use it that we don't know about).
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
If they are waiting a full two years after the initial 5nm ramp to make 5nm die stacking available, that would be entirely down to customer scheduling. If Apple were going to use it, they would have had it available much earlier, so you can probably use the timing to tell when those AMD products will ship (though it is possible they have other customers wanting to use it that we don't know about).
I actually don't understand your point. What was the time lag for 7nm? AMD designed the CCDs for stacking knowing that it would only be available for use in Q4 2021. It takes time to R&D new techniques. Why would you think they could have done it earlier but held back? One could use that argument for almost anything: why 3D stacking only now? Why chiplets only recently? Why many other things?
 

DrMrLordX

Lifer
Apr 27, 2000
21,617
10,826
136
I would say this is almost guaranteed.
Lambda services, load balancers, proxy servers, REST/gRPC API gateways, etc. scale well with cores. I'm not sure the folks hosting such services want to pay a premium for cache-heavy SKUs that won't bring any gain over regular EPYC 7002/7003-type cache SKUs.
I would even suggest that this is why Altra cut the cache on their 128-core chip, which is squarely aimed at nginx-type loads.
The weaker cores, in greater numbers and with less cache, did well in such tests on Phoronix.

Bear in mind that, in the case of v-cache, it isn't a matter of sacrificing area that could be used for cores in favor of cache (as was the case with the Altra). It's more as @moinmoin indicated - waiting for validation. AMD can get you Genoa today without v-cache, or you can wait a year and get it with v-cache if your workload would actually benefit from the extra L3.

Not all workloads benefit from L3.

Genoa DOES offer an increase in core count vs. Milan, so it isn't necessarily a choice between Genoa w/out v-cache vs. Genoa with v-cache. It's a matter of choosing between Milan-X and Genoa-not-X.
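The "not all workloads benefit from L3" point is easy to demonstrate with a toy pointer chase, where each load depends on the previous one and latency depends on whether the working set fits in cache. A rough sketch (the sizes and step counts are arbitrary, and CPython overhead mutes the effect compared to native code):

```python
import random
import time

def build_cycle(n):
    """Random single-cycle permutation over n slots, so every load
    depends on the previous one (a classic pointer chase)."""
    idx = list(range(n))
    random.shuffle(idx)
    order = [0] * n
    for a, b in zip(idx, idx[1:]):
        order[a] = b
    order[idx[-1]] = idx[0]
    return order

def chase(order, steps):
    """Time `steps` dependent loads through the cycle."""
    i = 0
    t0 = time.perf_counter()
    for _ in range(steps):
        i = order[i]
    return time.perf_counter() - t0

# Small working set (cache-resident) vs. large (spills past any L3).
t_small = chase(build_cycle(1 << 12), 500_000)
t_large = chase(build_cycle(1 << 20), 500_000)
print(f"small: {t_small:.3f}s  large: {t_large:.3f}s")
```

On most machines the large chase is noticeably slower per step, while a workload that streams through data once (or that touches almost none, like a proxy) barely cares; that gap is what v-cache does or doesn't buy you.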
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Bear in mind that, in the case of v-cache, it isn't a matter of sacrificing area that could be used for cores in favor of cache (as was the case with the Altra). It's more as @moinmoin indicated - waiting for validation. AMD can get you Genoa today without v-cache, or you can wait a year and get it with v-cache if your workload would actually benefit from the extra L3.

Not all workloads benefit from L3.

Genoa DOES offer an increase in core count vs. Milan, so it isn't necessarily a choice between Genoa w/out v-cache vs. Genoa with v-cache. It's a matter of choosing between Milan-X and Genoa-not-X.
Mmmm ... I am not sure I understand the relation between Milan-X and Genoa, but what I am trying to say is that there are lots of customers who would be interested in a plain Genoa without the V-Cache, especially if it comes at a lower cost. The whole point of V-Cache is to have another tool (like IF for scaling cores) to scale the end product, be it cache, core count and so on.

Therefore I believe AMD would definitely offer such a high-core-count Genoa SKU without V-Cache, because it is suitable for many common loads.
We have a whole bunch of services on Azure/AKS that do nothing but authenticate and shuttle requests/responses to/from the worker nodes within our DMZ, performing the bare minimum of operations and processing at most a dozen bytes of data. Changing the instance type does nothing for us; changing the vCPU count makes a difference. For such a service I would select the SKU for my Azure subscription that allows the highest vCPU count possible, which is what we did.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
We have a whole bunch of services on Azure/AKS that do nothing but authenticate and shuttle requests/responses to/from the worker nodes within our DMZ, performing the bare minimum of operations and processing at most a dozen bytes of data. Changing the instance type does nothing for us; changing the vCPU count makes a difference. For such a service I would select the SKU for my Azure subscription that allows the highest vCPU count possible, which is what we did.

I think eventually ARM will eat both AMD and Intel in such usages. Graviton X or Altra Biturbo or whatever they call it will be a perfect fit for such workloads, unbeatable in all the metrics relevant for cloud use cases like yours.
AMD should be shooting for the higher-performance segment, and with V-Cache on Zen 3 they are doing just that. An 8C chiplet with 96MB of private L3 is a nice unit of computing for those non-trivial backend compute nodes.
 

Doug S

Platinum Member
Feb 8, 2020
2,252
3,481
136
I actually don't understand your point. What was the time lag for 7nm? AMD designed the CCDs for stacking knowing that it would only be available for use in Q4 2021. It takes time to R&D new techniques. Why would you think they could have done it earlier but held back? One could use that argument for almost anything: why 3D stacking only now? Why chiplets only recently? Why many other things?

What I'm saying is that the time lag depends on having customers who want to use it. Sure, it takes time to R&D new techniques, but doing the same thing with 5nm that they are doing with 7nm is not a "new technique".

What would be the point of having all the equipment ready and employees trained to handle stacking 5nm chips when Apple is your only 5nm customer and they aren't interested in doing it? They are releasing it in Q3 2022 because that's when the first customer is ready to take delivery of 3D stacked 5nm chips.

If Apple had wanted to 3D stack the A15, you can bet they would have had it ready by Q3 2021 instead of Q3 2022.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
I think eventually ARM will eat both AMD and Intel in such usages. Graviton X or Altra Biturbo or whatever they call it will be a perfect fit for such workloads, unbeatable in all the metrics relevant for cloud use cases like yours.
AMD should be shooting for the higher-performance segment, and with V-Cache on Zen 3 they are doing just that. An 8C chiplet with 96MB of private L3 is a nice unit of computing for those non-trivial backend compute nodes.
It's funny: with Zen 1, AMD would have had to be the least preferred CPU uarch among the big-iron CPUs (regardless of ISA) for big DB work. Now, with Zen 3 + V-Cache, it has to be the most preferred; the only downside right now is the maximum memory pool, because of 2P-only scaling.
 

MadRat

Lifer
Oct 14, 1999
11,910
238
106
Zen 4, for example, has a GPU included.
A GPU on die sounds like a great business case for at least 2GB of HBM. Aim at the consumers who don't want top-of-the-line 4K-or-higher gaming (many are content at 1920x1280 @ 30fps), and leave the top-end video cards securely in the hands of Bitcoin miners.

It's not like they have an endless glut of video card stock to sell off. Great timing for it.
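For scale, the modest target mentioned above is cheap even in bandwidth terms. A back-of-envelope calculation (assuming 4 bytes per pixel and counting scanout only; render traffic is many times larger, and is where on-package HBM would actually help):

```python
# Scanout bandwidth for 1920x1280 @ 30 fps at 4 bytes per pixel.
width, height, fps, bytes_per_px = 1920, 1280, 30, 4
frame_bytes = width * height * bytes_per_px   # bytes in one frame
scanout_bps = frame_bytes * fps               # bytes per second
print(frame_bytes)    # 9830400   (~9.4 MiB per frame)
print(scanout_bps)    # 294912000 (~0.3 GB/s)
```

So even a couple of frames fit comfortably in 2GB of HBM, with almost all of the capacity and bandwidth left over for actual rendering.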
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
What I'm saying is that the time lag depends on having customers who want to use it. Sure, it takes time to R&D new techniques, but doing the same thing with 5nm that they are doing with 7nm is not a "new technique".

What would be the point of having all the equipment ready and employees trained to handle stacking 5nm chips when Apple is your only 5nm customer and they aren't interested in doing it? They are releasing it in Q3 2022 because that's when the first customer is ready to take delivery of 3D stacked 5nm chips.

If Apple had wanted to 3D stack the A15, you can bet they would have had it ready by Q3 2021 instead of Q3 2022.
I have to disagree.

What makes you think that stacking 5nm is the same as stacking 7nm?

TSMC has already stated, and it was commented on here, that the copper pillar interconnect spacing is going to decrease by a factor of 10 over time. Advancing the state of the art is, and will remain, an ongoing effort.
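The factor-of-10 pitch claim is worth unpacking, since connection density scales with the inverse square of pitch. A quick sanity check (pure arithmetic, no TSMC-specific pitch values assumed):

```python
def density_gain(pitch_shrink: float) -> float:
    """Relative connection density after shrinking bond pitch by `pitch_shrink`x.
    Connections per unit area scale as 1 / pitch^2."""
    return pitch_shrink ** 2

print(density_gain(10))   # 10x finer pitch -> 100.0x the connections per area
```

That two-orders-of-magnitude jump in die-to-die connections is why finer-pitch stacking is not just "the same technique on a new node".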
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
They need to drastically increase L1 as well to get that IPC uplift.
Also my impression is the GPU is integrated into the IOD. That is what the old image GN shared showed as well.
There are a lot of possibilities if stacking is involved. It could be directly on the IO die or it could be a chiplet stacked on top of the IO die or a chiplet stacked on top of a larger interposer with other chiplets. If the IO die is made on the latest process, then it may make sense for it to be directly on the same die. For lower cost systems, it would make sense for it to just be directly on the IO die with no stacking; basically the same as current Zen 3.

That kind of comes back to making a chip for stacking on an interposer vs. a non-stacked solution. If they make a CPU chiplet specifically designed for stacking, then how do you do a lower-end design where stacking is possibly too expensive? Do they make two different chiplets? It seems like they wouldn't; the lower end would just be a fully integrated APU.

Where does the IO die with graphics fit? What market does it cover? It might be that the integrated graphics is so much better than previous solutions (due to DDR5 or large caches or something) that it can compete well with low-end discrete graphics. That would change the market positioning if the integrated IO-die graphics were sufficient for 1080p.

I have been suspicious of graphics in the IO die due to the market segmentation. If you are going for cheap, then a monolithic APU seems to make more sense. It does make some sense to have some graphics functionality across the whole line, though.