Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
821
1,457
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

Kaluan

Senior member
Jan 4, 2022
515
1,092
106
Raptor Lake? Fine. But I think it's wild some of you think Zen4 won't be faster than Alder Lake in gaming. ADL really isn't that much faster to begin with, and in matchups like 12600K vs 5800X (non-X3D, obviously), the 5800X already comes out on top (provided you test a wide enough batch of games and not just Riftbreaker and FC6).

It's only when you test i9 ADL + fast DDR5 that ADL appears to be conclusively faster than AMD's non V-Cache Zen3 best (such as 5900X or 5950X).
 

Timorous

Golden Member
Oct 27, 2008
1,978
3,864
136
On the other hand, if there are no chip releases from either company until 2024, then there's no need to sit on a Raphael-X release after RPL-S drops. The only key question here will be: why release Raphael-X so soon if vanilla Raphael can take on RPL-S on its own?

Lots of reasons.
  • Higher ASPs for the 3D parts might mean more margin.
  • Getting AM5 locked in early so AMD can target future parts at drop-in upgraders, making it harder for Intel to reclaim any ground should they release something that competes well with Zen 5, Zen 6 etc.
  • They have the inventory to supply Genoa, Genoa-X, Raphael and Raphael-X in good quantities, so why sit on it?
  • More options. AMD will have a variety of options to suit different budgets and different use cases. Want the best of both? Go 7950X3D. Gaming only? 7800X3D. MT only? 7900X / 7950X. More budget friendly? 7700X / 7600X.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Zen 4, a month before launch, is still a mystery to me:

1/ All the extra die space for 8-10% IPC uplift? Why? Can't be all of that for just AVX-512, right?

2/ Why does AMD focus heavily on frequency now? You can't have high IPC and clock obviously but still, why now? Is it because of TSMC's limited capacity for N5?

Doubling the L2 and adding AVX-512 took all of the density gained from the 7nm-to-5nm node shrink, which was not that great to begin with.
 

Kaluan

Senior member
Jan 4, 2022
515
1,092
106
Lots of reasons.
  • Higher ASPs for the 3D parts might mean more margin.
  • Getting AM5 locked in early so AMD can target future parts at drop-in upgraders, making it harder for Intel to reclaim any ground should they release something that competes well with Zen 5, Zen 6 etc.
  • They have the inventory to supply Genoa, Genoa-X, Raphael and Raphael-X in good quantities, so why sit on it?
  • More options. AMD will have a variety of options to suit different budgets and different use cases. Want the best of both? Go 7950X3D. Gaming only? 7800X3D. MT only? 7900X / 7950X. More budget friendly? 7700X / 7600X.
Ryzen 5 7600 is also rumored to be a thing at launch. Couple that with some budget B650 options and a <$100 16GB DDR5-5200 CL38 kit and AMD might have a really good budget DDR5 proposition for "budget but futureproof" buyers.

Anyway, may I point out to people who want to believe Hasan's words in those WCCFT comments that HE ALSO SAID RAPTOR LAKE CAN PULL 350W OUT OF THE BOX... Maybe that's fine for a handful of people, but for me and (I think) a majority of people that's very *yikes!* (and sad)
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Ryzen 5 7600 is also rumored to be a thing at launch. Couple that with some budget B650 options and a <$100 16GB DDR5-5200 CL38 kit and AMD might have a really good budget DDR5 proposition for "budget but futureproof" buyers.

Anyway, may I point out to people who want to believe Hasan's words in those WCCFT comments that HE ALSO SAID RAPTOR LAKE CAN PULL 350W OUT OF THE BOX... Maybe that's fine for a handful of people, but for me and (I think) a majority of people that's very *yikes!* (and sad)

I'd really like to see how Ryzen 7600 (6 cores, 12 threads) will compete against RL 13400 (6P+4E, 16 threads). The 13400 @ $200-220 should be faster than the current 12600K.
 

Kaluan

Senior member
Jan 4, 2022
515
1,092
106
I'd really like to see how Ryzen 7600 (6 cores, 12 threads) will compete against RL 13400 (6P+4E, 16 threads). The 13400 @ $200-220 should be faster than the current 12600K.
I think it's in the ballpark (w/ maybe better percentile lows in gaming scenarios), but IDK if E cores are confirmed for non-K 13th gen this time around. That would mean a different die.

Either way, my point was based around the fact that non-K/budget Raptor Lake (and B760 boards) is 5-6 months away, while Ryzen 7600 might be just a month and a half away.
 

InstrEd

Junior Member
Aug 3, 2022
8
30
61
First post on this forum. I admit I have been lurking for several years. I would go the AMD route on a new build myself just because I know I'm more future-proof for the next several years. Even if Intel somehow manages to take the crown, it will be at a higher power consumption, I'm sure. I'm really interested to see what happens on the mobile front next year, when I'm thinking of buying a new 2-in-1 laptop.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
The next diagram illustrates the die area required to double the size of L2 and add AVX-512.

The one on the left is the compute unit of a Zen3 CCD; the next one is the same Zen3 CCD with double L2 and AVX-512 on TSMC N7; the one on the far right is a Zen4 with 1MiB L2 and AVX-512 built on TSMC N5.

1659625535668.png
 

moinmoin

Diamond Member
Jun 1, 2017
5,242
8,456
136
Why does AMD focus heavily on frequency now?
Does AMD really "focus heavily" on frequency "now"? Isn't it rather forum members and leakers focusing on that? Frequency is an easy parameter to focus on for the press and public. It's essentially the only thing Intel has publicly cared about since Skylake, to the point that they would rather turn their chips into furnaces instead of taking the efficiency gains new nodes should bring.

Whether frequency is AMD's focus we won't see until Zen 4 actually launches. As you wrote yourself, there's a lot of space essentially unaccounted for.

And the work on frequency is consequential: on N7, desktop Zen chips are limited by a hard wall which makes OC nigh impossible. Desktop Zen 2's V/F curve looked like a hockey stick. Desktop Zen 3 improved on it, but essentially only moved the wall in the end. V/F curves of mobile Zen chips showed how much better those curves could look, but those naturally are optimized for lower frequencies. It looks like AMD used the move to N5 to move the wall further, and possibly even tackle better V/F curves for desktop as well.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
Should be noted Zen4 doesn't actually have 512-bit width registers, it is actually just two 128-bit width registers.
zen4zmmhilo.png
It should be done like this.

Zen2-esque mode for AVX512 (FP0/1 + FP2/3 + FP4/5 :: 3-pipes for AVX512)
or
Zen3-esque mode for AVX256 (FP0 + FP1 + FP2 + FP3 + FP4 + FP5 :: 6-pipes for AVX256)

Zen4c cuts off the second FPU unit and runs AVX512 like AVX256 in Zen1.

Decoder = Full Decode of AVX512
NSQ = Splits AVX512 into AVX512_LO to Scheduler0 and AVX512_HI to Scheduler1.
Zen4 FPU's Store0 and Store1 can operate simultaneously to store the full 512-bit width.

There is no power penalty from AVX512, since it is just using the existing AVX256 units that were improved from Zen3. There is no penalty for mixing AVX256 and AVX512.

There is no 2x512-bit register.
There are no 12 FPU units.

Family 19h only has
3 FP Pipes of 256-bit width per PRF+Cluster.
2x128-bit Registers (6 FP pipes only)
2xStores (Last FP pipe in each FPU cluster)
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
The next diagram illustrates the die area required to double the size of L2 and add AVX-512.

The one on the left is the compute unit of a Zen3 CCD; the next one is the same Zen3 CCD with double L2 and AVX-512 on TSMC N7; the one on the far right is a Zen4 with 1MiB L2 and AVX-512 built on TSMC N5.

View attachment 65356

N5 density is around 137M transistors/mm2 while N5HP is at 93M/mm2, so most of the N5 vs N7 density improvement has been traded for high frequency.
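
To put a number on that trade, here is a quick back-of-the-envelope calculation using only the two density figures quoted above. The constant and function names are just illustrative, and the figures are the ones from this post, not independently checked.

```python
# Density figures as quoted in the post (million transistors per mm^2).
# They are taken at face value here, not verified against foundry data.
N5_DENSITY = 137.0    # standard N5 figure from the post
N5HP_DENSITY = 93.0   # N5HP figure from the post


def density_ratio(variant: float, base: float) -> float:
    """Return the fraction of the base density that the variant retains."""
    return variant / base


ratio = density_ratio(N5HP_DENSITY, N5_DENSITY)
print(f"N5HP retains {ratio:.0%} of N5 density, giving up {1 - ratio:.0%}")
```

On those figures, the high-performance variant keeps roughly two thirds of the density of standard N5.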
 

Schmide

Diamond Member
Mar 7, 2002
5,744
1,032
126
Should be noted Zen4 doesn't actually have 512-bit width registers, it is actually just two 256-bit width registers.
View attachment 65358
It should be done like this.

Zen2-esque mode for AVX512 (FP0/1 + FP2/3 + FP4/5 :: 3-pipes for AVX512)
or
Zen3-esque mode for AVX256 (FP0 + FP1 + FP2 + FP3 + FP4 + FP5 :: 6-pipes for AVX256)

Zen4c cuts off the second FPU unit and runs AVX512 like AVX256 in Zen1.

Decoder = Full Decode of AVX512
NSQ = Splits AVX512 into AVX512_LO to Scheduler0 and AVX512_HI to Scheduler1.
Zen4 FPU's Store0 and Store1 can operate simultaneously to store the full 512-bit width.

There is no power penalty from AVX512, since it is just using the existing AVX256 units that were improved from Zen3.

Yeah no. You can't implement AVX-512 without 512-bit registers. Permute, gather, shuffle, mask and the like make that basically impossible.

Sure, you could split AVX-512 registers to do AVX, but inter-lane operations and persistence of data make combining 2 lesser registers inadequate.
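
A toy way to see the objection: treat a 512-bit register as 16 32-bit elements kept as two 256-bit halves. Element-wise ops split cleanly, but a full-width permute can need data from the other half. This is only a sketch with made-up helper names (`split`, `add_halves`, `halves_needed`); it says nothing about how Zen 4 actually moves data between halves, only why a split design has to handle it somehow.

```python
# Toy model: a 512-bit register is 16 x 32-bit elements,
# stored as a low half (elements 0-7) and a high half (elements 8-15).
HALF = 8


def split(zmm):
    """Split a 16-element register into (low, high) 8-element halves."""
    return zmm[:HALF], zmm[HALF:]


def add_halves(a, b):
    """Element-wise add done as two independent 256-bit operations."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    lo = [x + y for x, y in zip(a_lo, b_lo)]   # touches only the low halves
    hi = [x + y for x, y in zip(a_hi, b_hi)]   # touches only the high halves
    return lo + hi


def halves_needed(idx):
    """For each output half, list which source halves a permute reads from."""
    return [sorted({i // HALF for i in part}) for part in split(idx)]


a = list(range(16))
summed = add_halves(a, [10] * 16)

# Full-width reverse: each output half reads only the opposite source half.
reverse_idx = list(range(15, -1, -1))
# Interleave low and high elements: each output half reads both source halves.
interleave_idx = [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]

print(halves_needed(reverse_idx))
print(halves_needed(interleave_idx))
```

The add never needs the other half, while both permutes do, which is the kind of cross-half traffic the objection is about.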
 

Timmah!

Golden Member
Jul 24, 2010
1,571
935
136
If that rumor is true, those base clocks are pretty impressive. So are the boost clocks! The 7950x has a 32.5% frequency increase over the 5950x.

Yeah, if the base clock is 4.5GHz, I don't question the "5.1GHz+ all core" rumors at all. It's pretty awesome. AutoCAD should run on this really swiftly.
My only hope is that these numbers will be true for the 3D version as well. V-cache is indeed fantastic, but if it comes at a 500 MHz penalty, it kinda loses a lot of its allure.
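
For anyone who wants to check the percentages, here is a small sketch. It assumes the rumored 4.5 GHz base and 5.1 GHz all-core figures from this thread and the 5950X's 3.4 GHz base clock; the 500 MHz V-cache penalty is purely the hypothetical from the post above, and the names are illustrative.

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


RUMORED_7950X_BASE_GHZ = 4.5      # rumor discussed in this thread
R9_5950X_BASE_GHZ = 3.4           # 5950X base clock
RUMORED_ALL_CORE_GHZ = 5.1        # "5.1GHz+ all core" rumor
HYPOTHETICAL_PENALTY_GHZ = 0.5    # hypothetical V-cache clock penalty

base_uplift = pct_change(RUMORED_7950X_BASE_GHZ, R9_5950X_BASE_GHZ)
vcache_drop = pct_change(RUMORED_ALL_CORE_GHZ - HYPOTHETICAL_PENALTY_GHZ,
                         RUMORED_ALL_CORE_GHZ)

print(f"Base clock uplift: {base_uplift:.1f}%")
print(f"All-core change with a 500 MHz penalty: {vcache_drop:.1f}%")
```

On these inputs the base clock uplift lands a touch under the quoted 32.5%, and a 500 MHz penalty would cost a bit under a tenth of the all-core clock.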
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Yeah no. You can't implement AVX-512 without 512-bit registers. Permute, gather, shuffle, mask and the like make that basically impossible.

Sure, you could split AVX-512 registers to do AVX, but inter-lane operations and persistence of data make combining 2 lesser registers inadequate.
AMD just pulled that off. Two AVX-256 combined to make a single, less power-hungry AVX-512.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
Yeah no. You can't implement AVX-512 without 512-bit registers. Permute, gather, shuffle, mask and the like make that basically impossible.

Sure, you could split AVX-512 registers to do AVX, but inter-lane operations and persistence of data make combining 2 lesser registers inadequate.
It would be categorized as the opposite. AMD found a way to reuse avx512 registers as 2 avx256 registers.
Zen4 does not have one 512-bit execution unit for any single pipe.
Zen4 does not have one PRF that has an entry having a 512-bit width.

Zen4 does ZMM this way:
2x128-bit on one PRF as ZMM_low and 2x128-bit on another PRF as ZMM_high.
Of which data is executed on two FP pipes:
FP0/1 for 512-bit MULs
FP2/3 for 512-bit ADDs
FP4/5 for 512-bit Stores

0-2-4 is FPU Cluster0 under FPScheduler0 which modifies PRF0
1-3-5 is FPU Cluster1 under FPScheduler1 which modifies PRF1

Zen4 functions as a Family 19h processor. Its behavior is a superset of Zen3. In this case, the superset behavior is Zen2-esque execution but with AVX512 instead of AVX256.

Zen2 has four pipes available for 256-bit, but it was actually routed via eight pipes that are 128-bit.
Four 128-bit + four 128-bit = four 256-bit.
128-bit work didn't have access to the upper four units.

Zen4 has three fused pipes available for 512-bit, but it is actually routed via six pipes that are 256-bit.
Three 256-bit + three 256-bit = three 512-bit.
256-bit work, however, does have access to the upper three units, since that behavior was present in Zen3.

The hardware control has two options to display behavior on Fam 19h with AVX512:
Parallel ZMM (Lo256+Hi256): ZMM_Low on Cluster0 and ZMM_High on Cluster1
Temporal ZMM (256Lo+256Hi): ZMM_Low and ZMM_High on same cluster at half_rate.

Zen4 is supposed to be Parallel ZMM halves.
Zen4c is supposed to be Temporal ZMM halves. Zen4c's superset behavior does Zen1-esque execution w/ lower FP power consumption from being 3-pipe only.
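
The two modes described above can be sketched as a toy issue model. This only mirrors the description in this post (low half on Cluster0 and high half on Cluster1 in the same cycle, versus both halves on one cluster over two cycles); it is not based on any AMD documentation, ignores latency and everything else in a real pipeline, and all names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    cycle: int      # cycle in which this 256-bit half is issued
    cluster: int    # FPU cluster that executes it (0 or 1)
    half: str       # "lo" or "hi" half of the 512-bit operation


def schedule_zmm_op(mode: str, start_cycle: int = 0) -> list:
    """Return the two 256-bit issues that make up one 512-bit op."""
    if mode == "parallel":
        # Both halves issue in the same cycle, one per cluster.
        return [Issue(start_cycle, 0, "lo"), Issue(start_cycle, 1, "hi")]
    if mode == "temporal":
        # Both halves use the same cluster on back-to-back cycles.
        return [Issue(start_cycle, 0, "lo"), Issue(start_cycle + 1, 0, "hi")]
    raise ValueError(f"unknown mode: {mode}")


def cycles_for(mode: str, n_ops: int) -> int:
    """Issue cycles needed for n_ops back-to-back 512-bit ops on one pipe pair."""
    cycle = 0
    for _ in range(n_ops):
        issues = schedule_zmm_op(mode, cycle)
        cycle = max(i.cycle for i in issues) + 1
    return cycle


print(cycles_for("parallel", 4), cycles_for("temporal", 4))
```

In this model the temporal mode takes twice the issue cycles for the same number of 512-bit ops, which matches the "half_rate" wording above.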
 

gruffi

Member
Nov 28, 2014
35
117
106
Here is techspot's 5800X3D running DDR 3800 CL16 vs 12900K running DDR5 6400 CL32, tested in 40(!) games with 3090ti:

5800X3D is still faster, so I'm not sure how computerbase got their results.
That guy from ComputerBase used overclocked 12th gen models with faster memory. 12900K +42% faster DDR5, 12700K +25% faster DDR4, 5800X3D +19% faster DDR4. That comparison doesn't say much about Ryzen 7000. The advantage of the 12900K in that test is clearly based on OC and fast DDR5. Ryzen 7000 also can use fast DDR5.
 

gruffi

Member
Nov 28, 2014
35
117
106
2/ Why does AMD focus heavily on frequency now? You can't have high IPC and clock obviously but still, why now? Is it because of TSMC's limited capacity for N5?
They are not focusing on frequency in general. Since the first Zen generation, the design has clearly had some speed path limits. With Zen 4, AMD seems to be eliminating the most critical speed path limits before focusing on a wider design for more IPC (Zen 5), which absolutely makes sense. And btw, up to 10% IPC isn't that bad after all. RPL seems to have ~0% IPC uplift. :grin: