Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
820
1,456
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3). The system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

biostud

Lifer
Feb 27, 2003
19,798
6,891
136
If we are going to see a launch with two different chiplets, then it opens up all kinds of combinations.
The lower tiers will probably be a single chiplet with 6c/8c and 64 MB of L3 cache.
The next tier could be the chiplet with 8 P-cores + 8 E-cores and 64 MB of 3D cache, with varying numbers of cores enabled. (That depends on whether it is cheaper to make a CPU with two 8c chiplets without 3D cache or a single chiplet with 3D cache.)
The top tier would be two chiplets, each with 8 P-cores + 8 E-cores and 64 MB of 3D cache, with varying numbers of cores enabled.

And that is not even counting if they can mix the chiplets.

And if that is the case, that could explain why Threadripper is moving up a tier, if we are going to see 32 cores (even if 16 are efficiency cores) in a mainstream PC.
 

DrMrLordX

Lifer
Apr 27, 2000
22,757
12,767
136
Maybe it's more of a binning thing? They will have some cores binned to be power efficient and others binned for max frequency?

I'm not sure how that would work. According to the rumoured design, the die has top-speed "outer" cores and power-constrained "inner" cores. Those dice have to be purposefully fabbed for that specific alignment before any testing can be done. It's not like they're gonna have an L3-less design out there with 16 cores binned for full speed operation and no L3 stacked on top, and if any of the "outer" cores bin badly enough that they can't operate well except in low-power mode, they'll just have to scrap the entire die (they can't swap the cores around).

I don't believe that at all, because how does cache coherency work?

Look at the rumoured die layout. The L3 is stacked on top. There's no L3 on the base level of the die. So they remove L3, replace it with low-power cores, and then stack the L3 on top of the low-power cores. Ergo they double the core count, have a huge L3 cache, and accomplish this without any increase in lateral die area or density increase. In terms of total silicon area spent on the die, yes, that does increase.
 

DrMrLordX

Lifer
Apr 27, 2000
22,757
12,767
136
Uh oh. So stacked L3 spells the end of overclocking from AMD side?

Who knows? Unlike the 5800X3D, the hypothetical 16c Zen4 consumer core has the 3D cache stacked on top of a low-power core cluster that is meant to have a TDP of 30W. You wouldn't be overclocking those anyway. Zen3/Vermeer already lets you OC on a core-by-core basis, I think?
 

ryanjagtap

Member
Sep 25, 2021
144
204
96
You know, if they use the deep SoC power partitioning and power states, and configure it so that the low-TDP Zen 4c cores have limited power states relative to the outer Zen 4 cores, they could possibly do it. I don't know for sure; just looking at this slide, I thought it could be possible.
Ryzen 6000 Mobile Tech Day - Technology & Architecture-page-008.jpg
 

biostud

Lifer
Feb 27, 2003
19,798
6,891
136
Uh oh. So stacked L3 spells the end of overclocking from AMD side?
But they have shown a 5 GHz all-core Zen 4, so that in itself bodes well for clock speeds on Zen 4. And over the last couple of years, has overclocking even been useful? I mean, you get a few percent for a huge extra power draw. It is not like the good old days, when you could get 20-30% from an overclock.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Overclocking lately is mostly undervolting and fixing clocks. Aka getting same or better than stock performance with much better thermals and power consumption. There is no excuse for AMD taking these controls away.
 

Saylick

Diamond Member
Sep 10, 2012
3,950
9,217
136
I'm not sure how that would work. According to the rumoured design, the die has top-speed "outer" cores and power-constrained "inner" cores. Those dice have to be purposefully fabbed for that specific alignment before any testing can be done. It's not like they're gonna have an L3-less design out there with 16 cores binned for full speed operation and no L3 stacked on top, and if any of the "outer" cores bin badly enough that they can't operate well except in low-power mode, they'll just have to scrap the entire die (they can't swap the cores around).



Look at the rumoured die layout. The L3 is stacked on top. There's no L3 on the base level of the die. So they remove L3, replace it with low-power cores, and then stack the L3 on top of the low-power cores. Ergo they double the core count, have a huge L3 cache, and accomplish this without any increase in lateral die area or density increase. In terms of total silicon area spent on the die, yes, that does increase.
Please tell me you're not referring to this bogus rumor from WCCFTech?
AMD-Ryzen-7000-Raphael-Zen-4-CPU-Chiplet-Layout.png

Even the usual Twitter suspects called it out as bogus.

Edit: Okay, just scrolled back a few pages and saw that it IS this WCCFTech rumor. Yeah, imma call BS on it. First of all, dodgy MS Paint sketch, and secondly, why would AMD handicap itself by putting full size Zen 4 cores in a limited TDP configuration when they have what's likely smaller Zen 4c cores they could have used instead. Third, that base die isn't going to work without V-cache, or if it does, the "priority" cores would be gimped. There's pretty much no re-use for the base die if there's no V-cache.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,112
136

Saylick

Diamond Member
Sep 10, 2012
3,950
9,217
136
Thanks. I’ve been following this debate and see no business case for use of a more complex SoC in the desktop space. Using a specialized server chiplet is patently ridiculous - all because of some clickbait rumors.
No problem. Just trying to call out BS when I see it, and WCCFtech is just full of it.

The rumor feels almost as if someone saw Alderlake's hybrid approach and they thought up some way AMD could do the same but with AMD's own technology by simply adding V-cache. The moment I saw that it requires V-cache to work, it was an absolute bust in my opinion because it runs counter to AMD's ethos of modularity and re-usability.
 

ryanjagtap

Member
Sep 25, 2021
144
204
96
Thanks. I’ve been following this debate and see no business case for use of a more complex SoC in the desktop space. Using a specialized server chiplet is patently ridiculous - all because of some clickbait rumors.
Well, we're speculating, right? We already know that consumer/desktop Zen 4 will be just 16C/32T max. The rest is just speculation about future products, because there is no thread for AMD like the Intel "Current and Future Lakes & Rapids" thread.
 

DrMrLordX

Lifer
Apr 27, 2000
22,757
12,767
136
Please tell me you're not referring to this bogus rumor from WCCFTech?

I mean, it's just a rumour. And it is interesting, though it may be bunk. Still, Zen4 cores are already pretty area-efficient, so it kinda makes sense to just use stock Zen4 cores, limit their TDP, and then call them "Zen4c". As to whether they would be stacking cache this early in the game on N5 I don't know.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Zen4D is Zen4c. We know they are tweaking the cache to allow higher density, so the chances of a single 16C die are very high.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,112
136
Well, we're speculating, right? We already know that consumer/desktop Zen 4 will be just 16C/32T max. The rest is just speculation about future products, because there is no thread for AMD like the Intel "Current and Future Lakes & Rapids" thread.
Yes, but at some point it becomes ridiculous. Maybe I've been on these forums too long.
 

Saylick

Diamond Member
Sep 10, 2012
3,950
9,217
136
Zen4D is Zen4c. We know they are tweaking the cache to allow higher density, so the chances of a single 16C die are very high.
I don't doubt we'll see 16 Zen 4c cores on a single CCD. That "rumor" suggests that Zen 4c is basically Zen 4 but without an L3 cache, but my gut tells me it's not that simple. Every modern x86 architecture uses some kind of pooled last-level cache, so for that rumor to suggest that the interior cores only have individual L1 and L2 caches without a shared LLC is suspect. Even Gracemont uses a multi-megabyte shared L3 cache for each 4-core cluster. Given that Zen has traditionally used an eviction cache for its LLC, I suspect Zen 4c is Zen 4 with a much smaller L3 cache (similar to the amount offered to the mobile CPUs, so maybe 1 MB per core), built with libraries that are denser and more power efficient at the cost of clock speeds.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,787
136
I don't doubt we'll see 16 Zen 4c cores on a single CCD. That "rumor" suggests that Zen 4c is basically Zen 4 but without an L3 cache, but my gut tells me it's not that simple. Every modern x86 architecture uses some kind of pooled last-level cache, so for that rumor to suggest that the interior cores only have individual L1 and L2 caches without a shared LLC is suspect. Even Gracemont uses a multi-megabyte shared L3 cache for each 4-core cluster. Given that Zen has traditionally used an eviction cache for its LLC, I suspect Zen 4c is Zen 4 with a much smaller L3 cache (similar to the amount offered to the mobile CPUs, so maybe 1 MB per core), built with libraries that are denser and more power efficient at the cost of clock speeds.
Zen L3 is the cache coherence master. Removing L3 is like cutting the CCX off from Infinity Fabric. Don't waste your time on such a rumor.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,809
1,289
136
Edit: Okay, just scrolled back a few pages and saw that it IS this WCCFTech rumor. Yeah, imma call BS on it. First of all, dodgy MS Paint sketch, and secondly, why would AMD handicap itself by putting full size Zen 4 cores in a limited TDP configuration when they have what's likely smaller Zen 4c cores they could have used instead. Third, that base die isn't going to work without V-cache, or if it does, the "priority" cores would be gimped. There's pretty much no re-use for the base die if there's no V-cache.
Zen4 and Zen4c are probably the same size, core-wise. Where they differ is cache: Zen4 introduces 1 MB L2 per core with 32 MB of high-current-cell L3 cache, while Zen4c introduces 2 MB L2 per core with no on-die L3 SRAM, relying exclusively on 64 MB of high-density-cell L3D/X3D LLC.

5nm goals: 2x density + 2x power efficiency + >1.25x perf
Zen L3 is the cache coherence master. Removing L3 is like cutting the CCX off from Infinity Fabric. Don't waste your time on such a rumor.
SRAMs don't have logic; the L3 control and L2 shadow tags would still be on die.
 

Saylick

Diamond Member
Sep 10, 2012
3,950
9,217
136
Zen4 and Zen4c are probably the same size, core-wise. Where they differ is cache: Zen4 introduces 1 MB L2 per core with 32 MB of high-current-cell L3 cache, while Zen4c introduces 2 MB L2 per core with no on-die L3, relying exclusively on 64 MB of high-density-cell L3D/X3D LLC.

5nm goals: 2x density + 2x power efficiency + >1.25x perf
What's your source for this?
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,809
1,289
136
What's your source for this?
5nm goals:
amd20215nmgoals.jpg
On the press slides, these are 5nm improvements versus the 7nm process. Basically, if they achieve their goals, Zen4 will be half the size of Zen3, given that both are Family 19h. None of the details about Zen4 in the leak indicate units larger than Zen3's. A near-identical core to Zen3 would be half the size on 5nm, given the 2x density goal.
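
As a rough sketch of the scaling arithmetic above: if density doubles, the same block takes half the area. The 4.0 mm2 starting figure below is a made-up placeholder, not a measured Zen3 core size, and the function name is purely illustrative. Real designs rarely hit the headline factor everywhere, since SRAM and analog typically shrink less than logic.

```python
def scaled_area_mm2(area_mm2: float, density_gain: float) -> float:
    """Area of the same block after a shrink with the given density gain."""
    if density_gain <= 0:
        raise ValueError("density_gain must be positive")
    return area_mm2 / density_gain


# Hypothetical placeholder, NOT a measured Zen3 core area.
zen3_block_mm2 = 4.0

# 2x density is the stated 5nm goal from the slide.
zen4_block_mm2 = scaled_area_mm2(zen3_block_mm2, density_gain=2.0)

print(f"{zen3_block_mm2} mm2 on 7nm -> {zen4_block_mm2} mm2 on 5nm")
```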

Zen4/Zen4c being the way they are is similar to Milan/Milan-X (Genesis/Genesis-X). Genoa is the non-X3D die, and Bergamo is the X3D die. Basically, it is similar to Altra (32 MB SLC) versus Altra Max (16 MB SLC): same architecture, more cores, less L3 hogging die area. Hence, it is "Zen4" with a cache hierarchy that is cloud-optimized.
Milan-X = 96*8
Bergamo = 64*8 to 256*8, since Bergamo isn't the prototype die, but rather the production die.
--ARMv9 N3 Cores in progress of tapeout since July 2021 for cloud have indicated Neoverse 3nm is targeting 128+ cores w/ 2MB L2 + 128 MB L3. AMD in servers is competing against single architecture ARMv9, cloud-orientated processors.
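
Reading the Milan-X and Bergamo lines above as L3 per CCD (in MB) times 8 CCDs, the package totals work out as follows. That reading, and the 8-CCD count for Bergamo, are assumptions taken from the post's own shorthand rather than confirmed specifications.

```python
CCDS_PER_PACKAGE = 8  # assumption: the "*8" in the post means 8 CCDs


def total_l3_mb(l3_per_ccd_mb: int, ccds: int = CCDS_PER_PACKAGE) -> int:
    """Total L3 across the package, in MB."""
    return l3_per_ccd_mb * ccds


milan_x_mb = total_l3_mb(96)        # 96 MB per CCD
bergamo_low_mb = total_l3_mb(64)    # low end of the speculated range
bergamo_high_mb = total_l3_mb(256)  # high end of the speculated range

print(f"Milan-X: {milan_x_mb} MB")
print(f"Bergamo: {bergamo_low_mb} MB to {bergamo_high_mb} MB")
```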

On the power efficiency comments, since it is fully X3D optimized, there might be more aggressive power-savings added from Zen4 on-die cache hierarchy -> Zen4 stacked-die cache hierarchy:
zen3plus.jpg
Different die, different optimizations.

Raphael would thus be conventionally split into two dies:
Durango CCD = Genoa CCD = 8c/1 MB L2 + on-die 32 MB Hi-Current SRAM L3.
Durango-X CCD = Bergamo CCD = 16c/2 MB L2 + stacked-vertical-cache 64 MB Hi-Density SRAM via L3D.

Specifically, I am not operating on the assumption that the WCCFTech rumor is accurate, as there have been no indications of a 2x512KB L2 Durango part.

General profits model:
Vermeer = $300 for 8-core
Vermeer-X = $450 for 8-core (~80 mm2 + ~40 mm2 stacked-die)
Raphael 8C CCD = $375 for 8-core (Limited ASP gain)
Raphael 16C CCD = $675~$900 for 16-core (~80 mm2 + ~40 mm2 stacked-die; increase of 1.5x~2x ASP over Vermeer-X)
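
The price list above can be checked with a few lines of arithmetic. The prices are the post's own speculative figures, not real list prices, and the dictionary and function names are illustrative only.

```python
# (price in USD, core count); all figures are the post's speculation.
parts = {
    "Vermeer 8C": (300, 8),
    "Vermeer-X 8C": (450, 8),
    "Raphael 8C CCD": (375, 8),
    "Raphael 16C CCD (low)": (675, 16),
    "Raphael 16C CCD (high)": (900, 16),
}


def asp_ratio(new_price: float, base_price: float) -> float:
    """How many times higher the new price is than the base price."""
    return new_price / base_price


vermeer_x_price = parts["Vermeer-X 8C"][0]
low_ratio = asp_ratio(parts["Raphael 16C CCD (low)"][0], vermeer_x_price)
high_ratio = asp_ratio(parts["Raphael 16C CCD (high)"][0], vermeer_x_price)
print(f"16C vs Vermeer-X: {low_ratio}x to {high_ratio}x")

for name, (price, cores) in parts.items():
    print(f"{name}: ${price / cores:.2f} per core")
```

On these numbers the 16-core part lands at 1.5x to 2x the Vermeer-X price, matching the range quoted above, while its per-core price stays at or below Vermeer-X's.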

Alder Lake has two incompatible architectures in 8P+8E config without AVX512 for ~$600
Raphael 16C CCD has a single architecture in 16P config with AVX512 for >$600 (better solution can ask for a higher price)
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
I mean, it's just a rumour. And it is interesting, though it may be bunk. Still, Zen4 cores are already pretty area-efficient, so it kinda makes sense to just use stock Zen4 cores, limit their TDP, and then call them "Zen4c". As to whether they would be stacking cache this early in the game on N5 I don't know.

Feels like AMD might be a bit late on the whole efficiency cores thing, aren't they supposed to premiere with Zen 5?
 

DrMrLordX

Lifer
Apr 27, 2000
22,757
12,767
136
Zen L3 is the cache coherence master. Removing L3 is like cutting the CCX off from Infinity Fabric. Don't waste your time on such a rumor.

How would stacking L3 instead of including it in the base die harm cache coherence?

Feels like AMD might be a bit late on the whole efficiency cores thing, aren't they supposed to premiere with Zen 5?

Allegedly, Zen5 may include Zen4 cores (or Zen4c cores, or . . . something) on the same package. It doesn't mean they can't mess with per-core TDP power limits on Zen4.

But again, rumour.