Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
  • Like
Reactions: richardllewis_01

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
Zen4 server chips (IOD+CCD) will have a lot of things which will not see any use in desktop (HEDT maybe), GenZ/CCIX/CXL support, CCP (Crypto Co Processor), IFIS (IF Intersocket), SEV stuffs...
From a die area and efficiency perspective it is redundant and rather detrimental, I would say. Of course it is a tradeoff vs reusability, but for power efficiency every little pJ saved counts.
They did not reuse CCDs for Mobile for this reason. But the more they can strip off irrelevant blocks, the better it is for space and energy efficiency.
They always scale down their server cores, instead of a purpose built Mobile core. This hurts the efficiency and therefore market perception.
They will have N5P on their Zen4 products, might as well extract maximum energy efficiency of the node with a purpose built SoC for Mobile.
Most of what you write is on the IOD. The PSP with SEV etc. is used for the PRO variants (even on mobile chips), so can't really be cut. I'm still not even sure the PSP is distinct from the ARM core responsible for SCF. I don't think there is much savings to be had on the CCDs with the split you propose.
 
  • Like
Reactions: Thibsie

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Most of what you write is on the IOD. The PSP with SEV etc. is used for the PRO variants (even on mobile chips), so can't really be cut. I don't think there is much savings to be had on the CCDs with the split you propose.
I am just giving examples: the coherency engine is in the L3 of each CCX, and the IOD has the PHY for going off socket. For a single 8-core CCX chip (Mobile, for example), this can be stripped, because the L3 only deals with maintaining the directory for its own L2 clients.
This is one example. I am just wondering how many more of these things are there.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
I am just giving examples: the coherency engine is in the L3 of each CCX, and the IOD has the PHY for going off socket. For a single 8-core CCX chip (Mobile, for example), this can be stripped, because the L3 only deals with maintaining the directory for its own L2 clients.
This is one example. I am just wondering how many more of these things are there.
Currently there are two lines: server/desktop, and mobile. Within server/desktop we already have two distinct IOD dies, with the CCDs likely having different binning targets. The mobile line up to now seemed like an afterthought, but if the leaked roadmap is to be believed it will be more of its own thing from now on, getting more dies and optimizations independent from the server/desktop roadmap.

Going with this pattern, I think it's way more likely that AMD creates more special purpose dies where everything not needed is left out than that it splits the server/desktop CCDs themselves. Van Gogh seems to be an example of such a new special purpose die. I feel that instead of creating new die designs by stripping existing ones, AMD is more likely to design seemingly superfluous parts in a more generic way from the start so they can be used for other purposes.
 
  • Like
Reactions: maddie

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Hm....So....Ok, bear with me.
Zen4 will probably be an evolution of Zen3 on the 5nm node. We could then expect that AMD would reuse the Zen3 topology. They also have experience with the Zen2 dual CCX. Bandwidth between IOD and CCD has to go up, but they need to preserve low latency, both IOD to CCD and CCX to CCX, hence L4. I believe it will be 64MB, basically double the L3s in a CCD with 2 CCX.

2x as a ratio is too small. You want 4x at the minimum. They can also straight up double the L3 cache, which is a better solution than having an L4 that's merely twice the size.

Rumors suggest increase in core count. I don't think it'll double as they suggest because I think the 80% density claim for 5nm will fall quite a bit short. Maybe 20-24 cores. If they achieve another 15% improvement with uarch using Zen 4 in single thread it will be another formidable CPU.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
Genoa, at 96 cores, sounds like it has 12 CCDs. The current ZEN3 CCDs are essentially the same size as the ZEN2 ones. So, in order to fit more, either the IOD has drastically shrunk, or it's using different CCDs. I guess it could also be a combination. There are multiple nodes between GF 12LP and the expensive N7 class nodes that could enable a shrink of the IOD at reasonable costs. It's not impossible that they shrank the IOD enough to put a CCD on either end and went from four CCDs per side to five.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Zen4 brings so many changes to the table it is quite interesting to see how the next year will shape up for AMD in terms of innovations in the server and HPC market.
DDR5, PCIe5, CCIX/CXL, N5, 3D stacking, NVDIMM, FPGA integration, Cache coherent Infinity Architecture... quite a long laundry list of possibilities
 

maddie

Diamond Member
Jul 18, 2010
4,722
4,625
136
Zen4 server chips (IOD+CCD) will have a lot of things which will not see any use in desktop (HEDT maybe), GenZ/CCIX/CXL support, CCP (Crypto Co Processor), IFIS (IF Intersocket), SEV stuffs...
From a die area and efficiency perspective it is redundant and rather detrimental, I would say. Of course it is a tradeoff vs reusability, but for power efficiency every little pJ saved counts.
They did not reuse CCDs for Mobile for this reason. But the more they can strip off irrelevant blocks, the better it is for space and energy efficiency.
They always scale down their server cores, instead of a purpose built Mobile core. This hurts the efficiency and therefore market perception.
They will have N5P on their Zen4 products, might as well extract maximum energy efficiency of the node with a purpose built SoC for Mobile/DT.
I have always expected AMD to start specializing as soon as sales in individual markets justified the investment. They still can utilize circuitry design (elements/blocks) as needed in various products. This should be a good way of reusing R&D expenditures for various markets while still optimizing for the various use cases.

By the time 5nm arrives, we should expect unified APU like products to satisfy the desktop PC segment. 16 cores is very close to the limit for the foreseeable future on the normal desktop. IOD + chiplets will be reserved for premium products.

RDNA & CDNA is the model for the CPU lines.
 

randomhero

Member
Apr 28, 2020
180
247
86
Genoa, at 96 cores, sounds like it has 12 CCDs. The current ZEN3 CCDs are essentially the same size as the ZEN2 ones. So, in order to fit more, either the IOD has drastically shrunk, or it's using different CCDs. I guess it could also be a combination. There are multiple nodes between GF 12LP and the expensive N7 class nodes that could enable a shrink of the IOD at reasonable costs. It's not impossible that they shrank the IOD enough to put a CCD on either end and went from four CCDs per side to five.
Or 6x16.
It also gives a nice 2 channels (actually 4, because it is DDR5) per CCD.
Edit: Forgot to say that this also aligns well with my speculation, for what it's worth.
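
As a quick sanity check on the arithmetic in this exchange, here is a minimal sketch showing that both layouts reach the rumoured 96 cores. The function name and the assumption that every CCD carries the same number of cores are illustrative only, not taken from any AMD material.

```python
def total_cores(ccds: int, cores_per_ccd: int) -> int:
    """Total core count, assuming every CCD has the same number of cores."""
    return ccds * cores_per_ccd


# The two layouts discussed above for a rumoured 96-core Genoa
# (hypothetical configurations, speculation only).
layouts = {"12 x 8": (12, 8), "6 x 16": (6, 16)}

for name, (ccds, cores_per_ccd) in layouts.items():
    print(f"{name}: {total_cores(ccds, cores_per_ccd)} cores")
```

Either way the total is the same, so the core count alone cannot tell the two layouts apart.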
 

randomhero

Member
Apr 28, 2020
180
247
86
2x as a ratio is too small. You want 4x at the minimum. They can also straight up double the L3 cache, which is a better solution than having an L4 that's merely twice the size.

Rumors suggest increase in core count. I don't think it'll double as they suggest because I think the 80% density claim for 5nm will fall quite a bit short. Maybe 20-24 cores. If they achieve another 15% improvement with uarch using Zen 4 in single thread it will be another formidable CPU.
Yes, if the CCD stays at the current one CCX. I believe it won't.

You think ST performance will be obviously better? I actually don't. I think clocks will be lower, and eat the IPC improvements, but MT performance will be substantially better.
 
  • Haha
Reactions: scineram

dr1337

Senior member
May 25, 2020
309
503
106
They did not reuse CCDs for Mobile for this reason.
No, frankly, the reason why they went monolithic for mobile is that their current MCM solution in Matisse and Vermeer is much more power hungry. In fact the core itself is completely identical to Matisse other than the L3 cache; they didn't strip down any features from the compute die, only the IO die.
They always scale down their server cores, instead of a purpose built Mobile core.
lol, "always". We've officially had 3 generations of Ryzen now. Sure, they've all been relatively similar, but IMO it doesn't make any sense to speak in absolutes about what is a fairly new product range. The fact of the matter is that if AMD could use chiplets in a mobile design without power getting out of control (by using stacking or an active interposer), they would.

Having different CCDs for server and desktop makes zero sense because they have different IO dies anyway. If they don't intend to use chiplets on mainstream mobile parts, it'd be far more likely that they transition to the Intel model and have all desktop and mobile chips share the same silicon. But then this would kill their core count lead, and a 16 core APU doesn't seem likely in the next few years. The points and concerns you bring up are legitimate, but the context you're arguing them in makes no sense. AMD already has a clear plan with clear roads for them to go ahead on; separate compute dies only make sense on a superficial level. If you account for diminishing returns and current plans/product lines, there isn't any logical, financial, or business reason why they would cut their margins and wafer supply in half. At least, no reason is actually presented in your argument.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
Genoa, at 96 cores, sounds like it has 12 CCDs. The current ZEN3 CCDs are essentially the same size as the ZEN2 ones. So, in order to fit more, either the IOD has drastically shrunk, or it's using different CCDs. I guess it could also be a combination. There are multiple nodes between GF 12LP and the expensive N7 class nodes that could enable a shrink of the IOD at reasonable costs. It's not impossible that they shrank the IOD enough to put a CCD on either end and went from four CCDs per side to five.
Unlike Zen 2 and 3 with N7, Genoa is Zen 4 using N5 which is significantly denser. The question is whether they keep the Zen 3 ratio of one CCX per CCD or increase that to two CCX per CCD again.

I'm not sure how feasible shrinking the IOD is. I'd expect it to be split up further into hard to shrink I/O parts that would use a node that's energy efficient for that job, and further logic that could use a denser node and be stacked on it. Packaging may make use of interposer and X3D then.

What's interesting is that the latest rumors say Genoa would still be on the SP5 socket. If that weren't the case AMD could easily define a new server platform and socket supporting DDR5 and PCIe 5 where the package is bigger to accommodate more chiplets.
 
  • Like
Reactions: Tlh97

gdansk

Golden Member
Feb 8, 2011
1,973
2,352
136
Hi guys, lurker here. What do you think is the likelihood of this leak? The performance claims seem outlandish to me but the improvements of the IOD seem plausible
Very high salt content. Off the bat apparently an AMD employee that would know about this and "CPU's".
  1. I would be surprised if it isn't launched within 16 months of Zen 3. I swear half their investor updates are emphasizing "consistent execution". But maybe they'll have a Zen 3 refresh/XT to deliver consistency.
  2. PCIe5 is essential for a new platform. If they are going DDR5, there is no reason not to include PCIe5. People want to future proof for next generation storage devices. I don't see why AMD would be a laggard here when PCIe4 was a good selling point against Intel. Alder Lake is planned for end of 2021 using DDR5 and PCIe5, so they must have it for competitiveness.
  3. 4 CCDs in the desktop chips? Why? They only need 2 to beat Alder Lake and 3 to obliterate it. If the new IOD is smaller they will have room to fit more CCDs. But I expect this to apply more to servers, where they want to kill Intel's 4 socket systems with only 2 sockets.
  4. L4 cache? Extreme doubt.
  5. 30-40% IPC uplift? 15%-20% improvement in 1T performance is my expectation with no basis except past performance. But they'll have many more transistors to play with so who knows.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,670
1,250
136
Unlike Zen 2 and 3 with N7, Genoa is Zen 4 using N5 which is significantly denser. The question is whether they keep the Zen 3 ratio of one CCX per CCD or increase that to two CCX per CCD again.

I expect the number of cores per CCX will improve. If the 96 core rumour is correct, then I wouldn't be surprised if Zen 4 CCDs have 12 cores and 48MB L3.

Main thing is I'm betting that when AMD refactored the CCX, that they built in some flexibility. Why refactor in a way that only allows eight cores if you're likely to need to refactor again for more? Just do it right the first time. Looking at the die shot of Zen 3 going around kind of supports this. Cores are no longer "mirrored" relative to their closest neighbours, and the interfaces between L3 blocks running east-to-west are no longer there. The area between east and west L3 regions looks like a giant crossbar, and perhaps this can scale arbitrarily albeit at the expense of latency.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,670
1,250
136
The 4 CCDs on mainstream desktop rumour and the 96-core Genoa rumour don't mix well together (if the first is true, then the second should be 128 cores). That said, AMD did try to convince everyone through false leaks that Rome was only 48 cores, so they could be doing the same thing again.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,670
1,250
136
Regarding the Reddit leak, at first blush it seems plausible.

At second blush, @jrdls linking it as his first foray onto the forum, exactly one week after the information was posted on Reddit without making too much of a splash? Yeah, it seems obvious enough that he's the author of the Reddit post.

And that makes me discount it a bit. Feels more like something an attention seeking faker would do rather than a genuine leaker, considering that he would be increasing his exposure and AMD is certainly on the warpath to root him out if the leak is legit. In such a scenario, the first and only thing on the leaker's mind should be avoiding detection. Only thing I can think of that might make sense here is if AMD somehow failed to notice the leak.
 
  • Like
Reactions: scannall

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
There's a lot of problems with that leak.

It's no big leap to understand that they are going to be pushing the IF bandwidth going forward. If they have a big leap in DDR bandwidth, the IF will need to keep up somehow. Either go wider, or go faster.

Desktop would be very well served by a node step for the IO die and using the space freed up for an additional CCD. That gives you a 24 core processor on AM5 being fed by DDR5. That puts them miles ahead of Intel on the desktop, and in HEDT, Intel will have to use an XCC die to match it. That's a drastic leap in costs. If they do go to 3D stacking, the sky is the limit, with the potential for four CCDs stacked on an active interposer or the IO die itself. 32 cores is a possibility there.
 

jrdls

Junior Member
Aug 19, 2020
12
12
51
At second blush, @jrdls linking it as his first foray onto the forum, exactly one week after the information was posted on Reddit without making too much of a splash? Yeah, it seems obvious enough that he's the author of the Reddit post.
I understand it is unorthodox the way I went about posting for the first time here, but that reddit post seemed too interesting not to post here, especially in light of the way AMD went about architecting RDNA2.
And as to me being the author, I'm not. I found out about it on Twitter. NerdTechGasm weighed in on it here:
 

soresu

Platinum Member
Dec 19, 2014
2,612
1,812
136
The 4 CCDs on mainstream desktop rumour and the 96-core Genoa rumour don't mix well together (if the first is true, then the second should be 128 cores). That said, AMD did try to convince everyone through false leaks that Rome was only 48 cores, so they could be doing the same thing again.
Especially considering that the Zen3 PR specifically highlights double digit perf/watt percentage gains on the same node.

Coupled with a likely move to the N5P node, a 50% increase in cores would be a poor improvement for 2 uArch generations since EPYC 2 last changed the core count.
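
To put rough numbers on that, here is a small sketch. The 64-core baseline is EPYC 2 "Rome"; the 96 and 128 figures are the rumoured counts discussed in this thread, and the helper name is made up for illustration.

```python
def core_increase_pct(old_cores: int, new_cores: int) -> float:
    """Percentage increase in core count going from old_cores to new_cores."""
    return (new_cores - old_cores) / old_cores * 100


ROME_CORES = 64  # EPYC 2 "Rome" maximum core count

# Rumoured Genoa core counts from the thread (speculation, not confirmed specs).
for rumoured in (96, 128):
    pct = core_increase_pct(ROME_CORES, rumoured)
    print(f"{ROME_CORES} -> {rumoured} cores: +{pct:.0f}%")
```

So 96 cores is the 50% step mentioned above, while 128 cores would be a full doubling.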
 
  • Like
Reactions: Tlh97