Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Apart from the details of the microarchitecture improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
Last edited:
  • Like
Reactions: richardllewis_01

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
AMD already has existing 10-year commitments for all their embedded products. That should cover whatever relic WSA obligations may still exist at that point in time.
The obligations cover new products only; existing products only touch the revenue side. Existing products move significantly less volume than new products, meaning that existing products won't let AMD hit the payment target GlobalFoundries set.

For anything above 7nm that isn't FinFET, AMD is obligated to help GlobalFoundries sell those nodes.
FinFETs => ~$5500M debt @ GlobalFoundries (design-win and volume revenue; Mubadala is going to cut this hard)
FDSOI => ~$3500M debt @ GlobalFoundries (only design-win revenue, no volume revenue included, so it might be less)
Etc. => No debt, just not significant enough.

Fab 8 => Ramping up 45nm PDSOI/FDSOI first; 22FDX should take 40% of 14LPP/12LP/12LP+ capacity, then in 2H20 12FDX should take the rest. After that, FinFETs are gone for good.
 
Last edited:

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
The obligations cover new products only; existing products only touch the revenue side. Existing products move significantly less volume than new products, meaning that existing products won't let AMD hit the payment target GlobalFoundries set.

For anything above 7nm that isn't FinFET, AMD is obligated to help GlobalFoundries sell those nodes.
FinFETs => ~$5500M debt @ GlobalFoundries (design-win and volume revenue; Mubadala is going to cut this hard)
FDSOI => ~$3500M debt @ GlobalFoundries (only design-win revenue, no volume revenue included, so it might be less)
Etc. => No debt, just not significant enough.
Still, they're making a crapload of I/O dies. If everything goes well, the I/O dies alone could surpass the volume they had at GF a couple of years ago.
 
  • Like
Reactions: lightmanek

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
Still, they're making a crapload of I/O dies. If everything goes well, the I/O dies alone could surpass the volume they had at GF a couple of years ago.
I doubt the I/O dies actually do anything. They are probably worse volume than anything else.

They are also on an already dead node. Not once since 2014 has Malta been in the black with FinFETs, not once. So the idea that I/O dies will save the day is a joke.

CPUs, GPUs, APUs, IODs: none of these are going to save FinFET at GloFo. If AMD can't make FinFETs => no revenue target will be hit. Since the 7th WSA defaults to the original WSA, only the SOI nodes matter.
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
I doubt the I/O dies actually do anything. They are probably worse volume than anything else.
Excuse me? In every single Zen 2 chip there's an IOD. The IOD in Epyc and Threadripper is a monster as well. And then there are the chipset dies for both Ryzen 3xxx and Threadripper 3 on top of that. How can this result in "worse volume than anything else"?

They are also on an already dead node. Not once since 2014 has Malta been in the black with FinFETs, not once. So the idea that I/O dies will save the day is a joke.
That's all on GloFo though. Beyond the WSA AMD has no obligation to save anything for GloFo.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Zen 3: Yep. Agreed there. Other stuff as well is almost certain, as pretty much every new arch from here on out will likely increase cache sizes (they have to, to lower thermal density, which will be a major problem in the future).

Zen 4: No point in shrinking I/O afaik, but I could be wrong there. Definitely not to 6nm though. It'll likely use GF's 12LPP+ instead (or whatever it's called, their 12nm+ that's slated for late 2020). Agreed on the smaller IPC bump and clocks. The power reduction isn't much, but the density improvement is the real bonus of 5nm over 7nm.

Zen 5: I doubt cores per die will increase. Like I said before, thermal density is a real issue. This is where the big arch jump will happen again, so a big IPC improvement. As for how, heck if I know.

There is not much of a return on shrinking the I/O parts of the die. My thought was that they would move to an interposer for the I/O die eventually, maybe not with Zen 4 though. The I/O die may be pad-limited to some extent; that means that if they made it any smaller, they wouldn't have room for all of the contact pads for signals and power. It has over 128 PCIe lanes, 8 memory channels, and 8 Infinity Fabric links to the CPU chiplets. That is a lot of connections. An active interposer would make some sense. They could put most of the stuff that doesn't scale well (larger transistors for driving external interfaces) into the interposer and stack everything else on top. I don't think it makes sense to stack all of the CPU cores on an interposer. With PCIe 5 speeds, Infinity Fabric links on package are probably fine. It may be lower power to stack on the interposer, but it would require a significantly larger and much more expensive interposer.
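The pad-limited point is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch using the interface counts from the post; the per-interface signal counts (4 pads per PCIe lane, ~120 per DDR channel, ~128 per IF link) are rough assumptions for illustration, not datasheet values:

```python
# Back-of-envelope signal-pad count for a Rome-class I/O die, using the
# interface counts from the post. Per-interface pad counts are rough
# assumptions, not datasheet figures.

def iod_signal_estimate(pcie_lanes=128, ddr_channels=8, if_links=8):
    signals = 0
    # PCIe: one TX pair + one RX pair per lane -> ~4 signal pads per lane
    signals += pcie_lanes * 4
    # DDR4 channel: ~72 DQ (with ECC) plus command/address/control,
    # call it ~120 signals per channel (assumption)
    signals += ddr_channels * 120
    # Infinity Fabric link: assume ~128 signal pads per link (assumption)
    signals += if_links * 128
    return signals

print(f"~{iod_signal_estimate()} signal pads before power/ground")
```

Power and ground typically multiply this further, which is why shrinking the die doesn't shrink the pad field.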

That would allow some interesting things, like having separate memory controller chips stacked on the interposer. With the IO interfaces in the interposer, they could put a massive amount of cache on the memory controller(s) by using 7 or 5 nm, which allows very dense SRAM cache. That could essentially be taking the 4 quadrants of the current IO die, splitting them off into separate chips with the IO interfaces in the interposer and adding a lot of L4 cache.
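For a sense of what "very dense SRAM cache" buys, here is a rough sketch. The 0.027 µm² bitcell is TSMC's published 7nm high-density SRAM figure; the 50% overhead factor for periphery, tags and routing is purely an assumption:

```python
# Rough SRAM-density arithmetic for a 7nm L4 cache die. Bitcell size is
# TSMC's published 7nm HD figure; the array-overhead factor is assumed.

BITCELL_UM2 = 0.027   # TSMC 7nm high-density SRAM bitcell, in um^2
OVERHEAD = 0.5        # assumed fraction of area usable for bitcells

def sram_mb_per_mm2(bitcell_um2=BITCELL_UM2, overhead=OVERHEAD):
    bits_per_mm2 = 1e6 / bitcell_um2          # 1 mm^2 = 1e6 um^2
    return bits_per_mm2 * overhead / (8 * 1024 * 1024)  # bits -> MB

print(f"~{sram_mb_per_mm2():.1f} MB of cache per mm^2 at 7nm (rough estimate)")
```

So even a modest split-off memory-controller die could plausibly carry tens of MB of L4 per chip, which is the attraction of the scheme described above.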

I don't really expect HBM to be used. HBM doesn't actually have that great latency (it is still DRAM), so it may not be that different from accessing DRAM modules as far as latency is concerned. To fully take advantage of it, you really want it connected directly to the chip using the data; passing it through the Infinity Fabric interface just wastes power. HBM2 is also quite large. The original HBM was tiny chip stacks, but HBM2 is around 100 square mm per die stack. The Infinity Fabric bandwidth is a good match for DRAM speed currently, so what does HBM provide in this scenario?
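A quick sketch of the bandwidth-matching point. The DDR4-3200 and HBM2 figures (2.4 Gbps/pin, 1024-bit stack interface) are standard headline numbers; treat the comparison as approximate:

```python
# Approximate peak-bandwidth comparison behind the "IF bandwidth is a
# good match for DRAM speed" point. Headline numbers, not measured ones.

def ddr_bandwidth_gbs(mt_per_s, channels, bus_bits=64):
    # transfers/s * bytes per transfer per channel * channels -> GB/s
    return mt_per_s * (bus_bits // 8) * channels / 1000

ddr4 = ddr_bandwidth_gbs(3200, 8)            # 8-channel DDR4-3200
hbm2 = ddr_bandwidth_gbs(2400, 1, bus_bits=1024)  # one HBM2 stack @ 2.4 Gbps/pin

print(f"8-ch DDR4-3200: ~{ddr4:.0f} GB/s")
print(f"One HBM2 stack: ~{hbm2:.0f} GB/s")
```

A single HBM2 stack only modestly exceeds the whole 8-channel DDR4 system, so routed through the same fabric it mostly buys latency-free nothing, which is the point above.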

I have also wondered if they could put some CPU chiplets on the interposer. That could allow them to place, perhaps, 2 CPU chiplets on the interposer and have more CPU chiplets on the side, as they do now. That would make an 80-core device with 8-core chiplets. Perhaps they could even put a small HBM-based GPU on either side. The GPUs would have HBM cache and nearly direct access to 8-channel DDR5 over up to 4 Infinity Fabric interfaces each. Such an interposer would be very expensive though, so even if they do something like that, I would expect it to be limited to super-high-end Epyc processors. They could have more than one type of I/O die in use.

It could just be a minor shrink to the latest GF low-power process, except for support for PCIe 5 and DDR5. Perhaps we shouldn't over-hype it.
 
  • Like
Reactions: Vattila

ksec

Senior member
Mar 5, 2010
420
117
116
My Take on Zen Roadmap.

Zen 3 will have a new L3 cache structure that shares more between cores compared to the current CCX design: ~15+% IPC improvement, 7nm EUV. I am expecting an even smaller die size as cost optimisation.

Zen 4 will be a 5nm shrink plus an I/O shrink. Possibly improved Infinity Fabric, DDR5, PCIe 5, a smaller I/O die, maybe FD-SOI from GF, or a complete move to TSMC 6nm for the I/O die. Along with some minor IPC tweaks and clock-speed improvements. The I/O improvement alone would be extremely welcome for HPC customers. Not to mention the power reduction from the node shrink.

Zen 5 will see 5nm+, maybe 16 cores per die, or a minor IPC improvement.

But the trend is that AMD will basically be doing IPC, clock speed, node and I/O cost optimisation in tick-tock fashion like old Intel.

Quoting myself.

Actually, it makes more sense to have a 16-core die in Zen 4, where the basic uArch is the same as Zen 3, with an updated uArch again in Zen 5.

One thing I completely missed: I utterly overlooked, or assumed, that GF's normal node improvements were done and that they would focus on SOI. It wasn't until the news of 12nm first-gen Ryzen that it struck me: why wouldn't GF continue to iterate on DUV nodes? There is still quite a bit to go before EUV is an absolute must.

Turns out they were; AnandTech reported it, but I had somehow completely forgotten about it. (I even wrote in the comment section of the article suggesting AMD do an I/O die shrink on it.)


So to revise the original guess: the I/O die will stay at GF on 12nm+.
 
Last edited:
  • Like
Reactions: Vattila

uzzi38

Platinum Member
Oct 16, 2019
2,632
5,959
146
While you're correct about 12nm+ being a much better choice for the I/O die, assuming a 40% efficiency gain is a bad idea. We don't know if that efficiency bump applies to analog circuits, and if it does, by how much.
 

Panino Manino

Senior member
Jan 28, 2017
821
1,022
136
I remember people here talking about going from 4 to 5-6 ALUs as the next jump.
Today I discovered that the Samsung M5 is a 7-ALU architecture. What the hell... I remember that Samsung was not delivering actual performance despite going so wide; what about this last in-house effort?

Edit: this is a phone CPU, but, I mean, people were making comparisons here.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Sadly, Samsung disbanded that team. The M5/Exynos 990 will likely be the last Exynos.
Last Mongoose/Mx CPU core in Exynos, you mean?

Exynos existed as their flagship SoC branding well before they switched to their custom CPU core after the A57; it's also used for their midrange SoCs, which have continued using ARM's licensed core designs all this time.
 

DrMrLordX

Lifer
Apr 27, 2000
21,632
10,845
136
Last Mongoose/Mx CPU core in Exynos, you mean?

Exynos existed as their flagship SoC branding well before they switched to their custom CPU core after the A57; it's also used for their midrange SoCs, which have continued using ARM's licensed core designs all this time.

There is that. Exynos could go back to being just-another-ARM-derived SoC.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
There is that. Exynos could go back to being just-another-ARM-derived SoC.

It was bound to happen eventually, since Samsung was expanding unimaginably fast, especially at a time when their profitability in traditionally dominant sectors such as DRAM, NAND, panels and mobile devices is being threatened and eroded by their neighboring state-backed competitors ...

Samsung got too greedy for their own good when they had no strategic reason behind it. There's no need to dominate every high-end industry if you're not likely to face a technological blackout ...
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
There is that. Exynos could go back to being just-another-ARM-derived SoC.
A good one, to be fair; the last S-series phone I was tempted to buy was the S6 with the Exynos 7420 chip.

It well and truly smoked Qualcomm's less-than-stellar A57/A53 big.LITTLE implementation at the time.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Samsung got too greedy for their own good when they had no strategic reason behind it. There's no need to dominate every high-end industry if you're not likely to face a technological blackout ...
This 100%.

I got the Galaxy S2 before I ever began to dislike Apple, and did so because it was a competitive product at a very competitive cost.

Now they price-gouge at least as badly as Apple - which is why I switched to Huawei.

I'm only moving away from them because of their insane decision to make a proprietary nanoSIM sized variant of microSD their only accepted memory card type.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Going back to Zen 4: with the density improvement, given the heat-dissipation issues and the fact that 4-core CCXs on the same chiplet still communicate over the IOD/IF just like CCXs on different chiplets, would it make sense to put a single CCX on a chiplet, increasing yields further on an expensive/new/potentially higher-defect-rate process, without hurting latency too much? (IIRC inter-chiplet and inter-CCX latency aren't too far off, and should improve with advances in IF.)

The second question is tangential, and that is - if IF improves so much to the point that latency is close to the same as between cores in a CCX, what does that mean for the GPU realm?
 
  • Like
Reactions: Vattila

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,661
136
Going back to Zen 4: with the density improvement, given the heat-dissipation issues and the fact that 4-core CCXs on the same chiplet still communicate over the IOD/IF just like CCXs on different chiplets, would it make sense to put a single CCX on a chiplet, increasing yields further on an expensive/new/potentially higher-defect-rate process, without hurting latency too much? (IIRC inter-chiplet and inter-CCX latency aren't too far off, and should improve with advances in IF.)
The decision to put two CCXs on one CCD was likely due to two things: 1) assembly and routing are easier when there are fewer parts, and 2) the plan to interlink the two CCXs more closely in Zen 3 was likely set well before Zen 2, to be able to reuse Zen 2's package routing design in Zen 3.

Zen 4 may change far more than that. Aside from CCDs, the IOD may be split into smaller parts, like separating the IMC from all the other I/O, or some other separation where using different nodes makes sense. In any case, I don't think it's a given that Zen 4 will still use the same package topology as Zen 2.

The second question is tangential, and that is - if IF improves so much to the point that latency is close to the same as between cores in a CCX, what does that mean for the GPU realm?
It means nothing, since the bandwidth requirement for potential GPU chiplets is far, far higher than that of CPU ones. That's where Intel's choice to go with a chiplet approach for Xe will be very interesting to watch.
 
  • Like
Reactions: Vattila

Ajay

Lifer
Jan 8, 2001
15,454
7,862
136
Yields will depend on what the defect rate is for a die the size of a Zen 3 chiplet on 7nm EUV (whatever that will be). Latency should mainly be a function of SerDes encode/decode rates. The average latency for core-to-L3$ access could go down a bit, depending on the cache-level memory architecture (ring, mesh, other).
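The defect-rate dependence can be sketched with the simple Poisson yield model, Y = exp(-D·A). The defect density used here is illustrative, not a real 7nm EUV figure:

```python
import math

# Poisson yield model: fraction of defect-free dies for defect density
# D (defects/cm^2) and die area A. D = 0.5/cm^2 is illustrative only,
# not a real 7nm EUV figure.

def poisson_yield(defect_density_per_cm2, die_area_mm2):
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-defect_density_per_cm2 * area_cm2)

# ~74 mm^2 is roughly a Zen 2 CCD; larger areas shown for contrast
for area in (74, 150, 300):
    print(f"{area} mm^2 -> yield ~{poisson_yield(0.5, area):.1%}")
```

The exponential dependence on area is why small chiplets are attractive on a new, higher-defect-rate node, as the question above suggests.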
 
  • Like
Reactions: Vattila

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
It means nothing since the bandwidth requirement for potential GPU chiplets is far far higher than that of CPU ones. It's where Intel's choice to go with a chiplets approach for Xe will be very interesting to watch.

So far that's only been shown for compute Xe. Client and 3D-workload-oriented GPUs might still be monolithic anyway.

Yeah, for GPUs, splitting them up is akin to splitting the core into multiple dies. Imagine a "6-wide" CPU made up of 2x 3-wide core dies!
 
  • Like
Reactions: Vattila

Vattila

Senior member
Oct 22, 2004
799
1,351
136
With the news that AMD has made an EPYC win with Zen 4 in the 2+ exaflops El Capitan supercomputer, feel free to speculate on the configurations — and vote in the poll.


Node speculation artwork: (image attachment)
 
  • Like
Reactions: lightmanek

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,560
14,514
136
From the article:
"The new system, expected to be put into service in 2023, will be 10x faster than Summit, the fastest publicly-ranked supercomputer in the world today (Top500, November 2019) "

10 times faster than the fastest supercomputer today? WOW is all I can say.

Edit: And this: 2 of the 3 supercomputers built by this company use AMD?

"HPE, through Cray, has been the big winner so far in the U.S. Exascale sweepstakes, obtaining contracts for all three systems – Aurora, with an Intel CPU/GPU pair; Frontier, with another AMD CPU/GPU pair, and El Capitan, which we now know will also feature AMD processors and AMD accelerators. "
 
Last edited:

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,560
14,514
136
Silly question, but with ARM supporters maintaining they have better IPC and use less power, why would the fastest supercomputers in the world all use x86 processors (AMD/Intel)?

Since supercomputers use their own proprietary OS and applications, they should work on any selected hardware.