Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) with new memory support (likely DDR5).

[Attached image: Untitled2.png]


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

DisEnchantment

Golden Member
Mar 3, 2017
1,607
5,799
136
A bit refreshing to get some official dibs like these, they are way more interesting than listening to the average clairvoyant YouTube tech crystal balls.

I am wondering how this thing will perform compared to a vanilla 5950X for example.
The stacked dies could possibly get in the way of heat transfer from the CCD and impair boost behaviour.
Also, for regular usage, I am not really sure the extra cache will bring much performance, if any, since most of the data set can already fit in the giant L3. Especially for important "applications" like Cinebench and GB etc.

One thing though, APUs of the future will be insane. I can see this being applied everywhere.
Lower end SKUs just cut the stacked dies and go on sale as is, brilliant segmentation.
Can be applied to GPUs as well.
Although rumor has it that the RDNA3 chiplets sit on top of the cache chiplet. GCD sits on top of MCD.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
That wasn't really true in this case.

Aurora was supposed to be ready earlier, was won by Intel and is being built with Sapphire Rapids chiplets that have PCIe 5.0, "Rambo cache" chiplets, HBM2 on package if needed (and it looks like similar unified-memory-space software). The problem is it's using microbumps for stacking (well, it's also very late, but that wasn't certain when Frontier was announced). So if anything Intel had the I/O advantage.

There had to be some secret sauce in AMD's offerings to win Frontier like they did. This is certainly one key differentiator. Bear in mind the V-cache solution actually most likely has two layers (as it sits on top of 32MB L3 and is exactly as big on the same process). There is nothing stopping AMD from adding more layers for some server CPUs and I'm convinced now CDNA2 has this stacking as well.

And while all of this is only possible because of AMD's engineering prowess, keep in mind that this is also TSMC's win as much as it's AMD's, as it's the only foundry that has anything like that ready in this timeframe. The hoops TSMC had to jump through to make this work and be producible at scale are also enormous.

All in all ever since Zen 2 it looks like it's the trifecta of execution (Synopsys + AMD + TSMC) that is to be congratulated. AMD couldn't just do it alone.
Well, yeah, that is obvious. While AMD and Intel are competitors, Intel is a fab, so Intel competes with TSMC. By supporting AMD, TSMC managed to grab a large chunk of Intel's fab business. They could probably have done it eventually with ARM instead of AMD. At this point, some of the ARM solutions are more competitive with AMD than Intel is.

Good point about the cache chiplets possibly being 2 layers. I can't load that tweet for some reason. We will have to wait to see the specifics. The cache chiplet may be made on a different process compared to Zen 3 and it may be specifically optimized for cache density, so I don't know if it is 2 layers or not. You would get a massive number of the cache chips per wafer and it is going to probably be lower volume; only high-end products. They wouldn't need very many wafers and it is probably a good process pipe cleaner (simple, repeated structures and probably some redundancy). I haven't seen anything indicating what process was used to make the cache chiplets.
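As a rough illustration of the "massive number of cache chips per wafer" point, here is a minimal sketch using the common gross-dies-per-wafer approximation. The die areas are purely hypothetical placeholders (nothing in this thread confirms the cache chiplet size or process), and the formula ignores yield, scribe lines and edge exclusion.

```python
import math

def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    whole_area_term = math.pi * radius ** 2 / die_area_mm2
    edge_loss_term = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(whole_area_term - edge_loss_term)

# Hypothetical areas, chosen only for illustration (not confirmed figures).
CACHE_CHIPLET_AREA_MM2 = 36.0   # assumed small SRAM die
CPU_CHIPLET_AREA_MM2 = 81.0     # assumed CCD-sized die

if __name__ == "__main__":
    print("cache chiplets per wafer:", gross_dies_per_wafer(CACHE_CHIPLET_AREA_MM2))
    print("CPU chiplets per wafer:  ", gross_dies_per_wafer(CPU_CHIPLET_AREA_MM2))
```

Under those assumed areas the small die yields roughly twice as many candidates per wafer as the larger one, which is the gist of the argument; the real ratio depends on the actual die size.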

I am looking at the 4 images in the wccftech article. In the first graphic, it looks like it is the size of the underlying cache, but it is just a graphic. In the last image of 4, where Lisa is holding the die, it looks like there are very narrow pieces of structural silicon along the edges, if that is actually showing the cache chip. If that is the case, then it looks like it is actually roughly 2x the size of the 32 MB already on die, so probably a 7 nm chiplet. This means no larger variant, since it already covers most of the CPU die. That is, unless they can double stack.

Edit: I guess we have rumors talking about 8 high stacks. That would make some sense if it is essentially HBM style stacks except with SRAM. I don't know if I believe that just yet though. It seems like heat would be a problem with higher stacks. If this is used on GPUs, is it going to take the place of infinity cache? The possibly real die Lisa is holding up in the image looks like a single layer.

I can't imagine the cache thing being that big of a differentiator. A lot of HPC applications lean towards streaming, where caches don't help as much. For some of the government projects, they specifically try to source from multiple vendors anyway. The whole system is under consideration though. Do you really think that Epyc wouldn't have made the cut without the extra cache? They still would have looked very good when power consumption (very important) and rack space are taken into account. These systems consume megawatts of power and Intel does not look good at the moment. They also have the Cray Slingshot interconnect. For such large systems, the interconnect can be more important than what is actually in the nodes.
 

exquisitechar

Senior member
Apr 18, 2017
657
871
136
I am wondering how this thing will perform compared to a vanilla 5950X for example.
The stacked dies could possibly get in the way of heat transfer from the CCD and impair boost behaviour.
Also, for regular usage, I am not really sure the extra cache will bring much performance, if any, since most of the data set can already fit in the giant L3. Especially for important "applications" like Cinebench and GB etc.
I'm interested in this as well. Can't wait to see in-depth reviews. Expecting some crazy performance in certain applications and not much of an increase at all in most, as you said.

It's probably going to completely break some benchmarks.
 

coercitiv

Diamond Member
Jan 24, 2014
6,204
11,909
136
The stacked dies could possibly get in the way of heat transfer from the CCD and impair boost behaviour.
There will definitely be a price to pay in terms of thermals/max clocks, but this situation is very similar to what first gen Zen presented us with: a certain type of performance compromise in exchange for an entire new dimension in terms of flexibility. And then they'll fix the downsides with further iterations.

_rogame put it very succinctly in a recent tweet:
Relentless Execution at its finest. Every technology advancement unlocks a sea of possibilities similar to technology trees in RPG games.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
The stacked dies could possibly get in the way of heat transfer from the CCD and impair boost behaviour.

Cache is not where the heat is tho. Even in BW heavy tasks like Linpack etc, the main offenders are vector FPU units. And due to the way L3 cache "segmenting" works, the heat is spread even more, since hashing has 3x more slices to work with.

There is a reason AMD is not covering L2 and cores and is sticking exactly to the existing L3 area: they have planned this since the inception of ZEN3.
Remember the initial legendary leak of ZEN3, which told us about the 8C CCX size and all the other things that we later found were 100% true? In that very presentation there was also a golden gem of "32 + MB L3". At the time we thought it was for "server" stuff, an increase to 48 or so, but it was here all along, meant for stacking on those 32MB L3's.

[Attached image: 1622545699315.png]


So yeah, they have had vias and necessary infrastructure in every ZEN3 chiplet from day 1.
 

Shivansps

Diamond Member
Sep 11, 2013
3,855
1,518
136
If they could release an AM4 APU with this cache structure it would be marvelous (for me) :D

Having just bought a 4650G, this would really be a worthy upgrade for my needs. Even a GPU-less design on AM4 would be fantastic. Practically the best socket ever, having been supported for 4 full generations (Excavator through Zen3+).

With just Vega 8? It would be a massive waste. Vega 8 can't really take advantage of RAM speeds over 4200, so a large cache will not do much. But 12CU of RDNA2 on the other hand... that's where performance north of RX570 starts being possible. But I don't think it will be used for RMB; APUs are always one gen behind.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
Assuming that this becomes a standard thing for their CCDs going forward, do we think that stacking cache on the die will allow AMD to significantly increase the transistor counts for their actual ZEN4 cores, perhaps allowing them to go a bit wider, add a bunch of FPU circuits to support AVX-512, etc., and have a reduced amount of L3 on the CCD itself? Perhaps keeping the 8 core CCX, but using only 16MB of L3 between them, which is heavily taken up by the virtual op cache, and a large 64MB+ L3 on the stacked cache dies?

That should go a long way towards solving a whole lot of performance and competitive feature limitations against their competition.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,607
5,799
136

I also share this view. If this thing goes into production by year end, then we can see it by Q2 2022 at the earliest.

However, if this is also planned for a consumer product as mentioned below, I am not sure AMD will launch it back to back with Zen4 (which AMD has confirmed will launch next year).

However, if there are no Zen3+ DT parts, if this tweet is to be believed, then the only V Cache based DT Ryzens would launch next year in Q2, which would probably be Zen4 based unless AMD decides to launch a Zen3 based Ryzen series and a Zen4 based Ryzen series back to back.
I guess at this point it could be anything.
 

randomhero

Member
Apr 28, 2020
181
247
86
I also share this view. If this thing goes into production by year end, then we can see it by Q2 2022 at the earliest.

However, if this is also planned for a consumer product as mentioned below, I am not sure AMD will launch it back to back with Zen4 (which AMD has confirmed will launch next year).

However, if there is no Zen3+, if this tweet is to be believed, then the only V Cache based DT Ryzens would launch next year in Q2, which would probably be Zen4 based unless AMD decides to launch a Zen3 based Ryzen series and a Zen4 based Ryzen series back to back.
I guess at this point it could be anything.
My bet would be that DT Ryzen launches when DT Alder Lake launches. So Q1 2022. It is such an opportunity to piss on Intel's parade. Also Threadrippers whenever ready. Epyc will launch first without a doubt.
 

misuspita

Senior member
Jul 15, 2006
401
452
136
Yes, I think this might be a worthy countermeasure for Alder, with Zen 4 coming later, without having to be in a hurry, so that they launch without too much beta-testing directly on consumers.
 

dr1337

Senior member
May 25, 2020
337
566
106
I think they're being coy about timelines because they don't want to tank their current sales. Why would anyone in their right mind buy a 5900X right now knowing that 15% faster with 2x the cache is only a few months away? Also, the fact that Zen 3 has been designed with stacked cache in mind from the very beginning says a lot about how far along they are in its development. Right now it's their best play to take things slow, and I'm sure we'll hear more the closer we get to Alder Lake's launch. Also, I'd assume their main use for the cache chiplets is Frontier. There's a reason why they're demoing this technology on desktop chips first even though it seems like its best use is in graphics and datacenter; I bet this is partly due to NDAs.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
They demoed this "5950x" with gaming benchmarks, so it seems it is coming to Ryzen parts somehow. If they paired it with AM5 for the real product, then I would expect it to actually be a Ryzen 9 6xxx part. It is a weird demo and it would be odd to show off a non-ThreadRipper prototype part and then never make a product based on it. These may not be that expensive to make. It is a tiny cache chip; they would get huge numbers of them per wafer.
They demo'd a 5900X, not a 5950X. 35:46 mm:ss


I simply don't see it happening on anything before Zen 5 for mainstream, maybe Zen 4 next year. I can see it being used on Threadripper 5000 processors, though. I have faith in AMD releasing their products on time, but they also said these would be in their higher end products at first. That's a very generic statement that could cover swathes of products most consumers won't buy.

Intel's Q4 for Alder Lake is equally open, given that it is a whole quarter, but knowing how ridiculous they've gotten over the years, I expect a post-Christmas launch from them.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
While James makes great points on the enterprise sector, I can see this being excellent if Threadripper 5XXX comes with it. If not, it's something to look forward to in Zen 4. However, I don't believe R9 Ryzens will see this anytime soon. Maybe with Zen 5.
What is nuts is that many of us were expecting a Threadripper announcement. Instead we got an Intel basher.
They demoed this "5950x" with gaming benchmarks, so it seems it is coming to Ryzen parts somehow. If they paired it with AM5 for the real product, then I would expect it to actually be a Ryzen 9 6xxx part. It is a weird demo and it would be odd to show off a non-ThreadRipper prototype part and then never make a product based on it. These may not be that expensive to make. It is a tiny cache chip; they would get huge numbers of them per wafer.
It was a 5900X. I get the feeling it isn’t all that much more expensive to make. I am curious if they will release these as XT parts and charge more. It makes sense to do that rather than push out a completely new product line.

I am also second-guessing the Warhol stuff. Maybe Warhol WAS canceled for this project. Warhol may also be for the low end refresh.
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,663
136
All in all ever since Zen 2 it looks like it's the trifecta of execution (Synopsys + AMD + TSMC) that is to be congratulated.
Indeed. Also shows once more how closely AMD apparently works with TSMC.

And it only gets crazier
It's just the belated Zen moment for this third gen.
  • Zen 1: Competitive Zeppelin die. Oh, you can use 4 of them as 32 cores Epyc.
  • Zen 2: Chiplets and separate IOD. Oh, you can use 8 of them as 64 cores Epyc.
  • Zen 3: Uh, some nice IPC improvement. Wait, stacked 32MB SRAM as well, sneaked in as game cache v2 at a Computex keynote. You can use 8 of them for a total of 288MB L3 per CCD, for a total of 2304MB L3 in a 64 cores Epyc??? :oops:
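The totals in the Zen 3 bullet follow from simple arithmetic. A minimal sketch, taking the post's own figures as assumptions (32 MB of on-die L3, 32 MB per stacked layer, a rumored 8-high stack, 8 CCDs per socket); none of these are confirmed product specs, and the function names are hypothetical.

```python
BASE_L3_MB = 32        # L3 already on each Zen 3 CCD
STACKED_LAYER_MB = 32  # per stacked SRAM layer, as assumed in the post
RUMORED_LAYERS = 8     # rumored 8-high stack, unconfirmed
CCDS_PER_SOCKET = 8    # 64-core Epyc with 8 cores per CCD

def l3_per_ccd_mb(layers: int) -> int:
    """On-die L3 plus every stacked layer."""
    return BASE_L3_MB + layers * STACKED_LAYER_MB

def l3_per_socket_mb(layers: int, ccds: int = CCDS_PER_SOCKET) -> int:
    """Total L3 across all CCDs in one socket."""
    return ccds * l3_per_ccd_mb(layers)

if __name__ == "__main__":
    print(l3_per_ccd_mb(RUMORED_LAYERS))     # 32 + 8 * 32 = 288
    print(l3_per_socket_mb(RUMORED_LAYERS))  # 8 * 288 = 2304
```

With a single stacked layer instead, the same functions give 64 MB per CCD and 512 MB per socket under these assumptions.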

Probably Frontier paid a lot of the upfront R&D, like they have been doing with a lot of their tech, being starved for money.
It's all the fruit of research and development funded by the DOE's FastForward program, which started in 2012.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
Then, the only V Cache based DT Ryzens would launch next year Q2, which would probably be Zen4 based unless AMD decides to launch back to back a Zen3 based Ryzen and Zen4 based Ryzen series.
My bet would be that DT Ryzen launches when DT Alder Lake launches. So Q1 2022. It is such an opportunity to piss on Intel's parade. Also Threadrippers whenever ready. Epyc will launch first without a doubt.
One of you is saying 2Q22 for Zen 4, the other posted a vague statement of Q1, could you please clarify, RH?