Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Vattila · Oct 6, 2019

Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3). The system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!

RnR_au · Aug 4, 2022

Vope45 said:
To compete in the next 5+ years I think AMD needs their own fab for sure.

Alright... I'll ask. Why? Could you please expand on this bit?

Kaluan · Aug 4, 2022

Raptor Lake? Fine. But I think it's wild some of you think Zen4 won't be faster than Alder Lake in gaming. ADL really isn't that much faster to begin with, and in matchups like 12600K vs 5800X (non-X3D, obviously), the 5800X already comes out on top (provided you test a wide enough batch of games and not just Riftbreaker and FC6).

It's only when you test i9 ADL + fast DDR5 that ADL appears to be conclusively faster than AMD's non V-Cache Zen3 best (such as 5900X or 5950X).

Timorous · Aug 4, 2022

Zucker2k said:
On the other hand, if there's no chip releases from either company till 2024, then there's no need to sit on a Raphael-X release after RPL-S drops. The only key question here will be: Why release Raphael-X so soon if vanilla Raphael can take on RPL-S, especially, on its own?

Lots of reasons.

Higher ASPs for the 3D parts might mean more margin.
Getting AM5 locked in early so AMD can target future parts at drop in upgraders making it harder for Intel to reclaim any ground should they release something that competes well with Zen 5, Zen 6 etc.
They have the inventory to supply Genoa, Genoa-X, Raphael and Raphael-X in good quantities so why sit on it?
More options. AMD will have a variety of options to suit different budgets and different use cases. Want the best of both go 7950X3D. Gaming only 7800X3D. MT only 7900X / 7950X. More budget friendly 7700X / 7600X.

nicalandia · Aug 4, 2022

Vope45 said:
Zen 4, a month before launch is still a mystery to me:

1/ All the extra die space for 8-10% IPC uplift? Why? Can't be all of that for just AVX-512, right?

2/ Why does AMD focus heavily on frequency now? You can't have high IPC and clock obviously, but still, why now? Is it because of TSMC's limited capacity for N5?

Double L2 and AVX-512 took all of the die space saved by the density gain of the 7nm to 5nm shrink, which is not that great to begin with.

Kaluan · Aug 4, 2022

Timorous said:
Lots of reasons.

Higher ASPs for the 3D parts might mean more margin.

Getting AM5 locked in early so AMD can target future parts at drop in upgraders making it harder for Intel to reclaim any ground should they release something that competes well with Zen 5, Zen 6 etc.

They have the inventory to supply Genoa, Genoa-X, Raphael and Raphael-X in good quantities so why sit on it?

More options. AMD will have a variety of options to suit different budgets and different use cases. Want the best of both go 7950X3D. Gaming only 7800X3D. MT only 7900X / 7950X. More budget friendly 7700X / 7600X.

Ryzen 5 7600 is also rumored to be a thing at launch. Couple that with some budget B650 options and a <$100 16GB DDR5-5200CL38 kit and AMD might have a really good budget DDR5 proposition for "budget but futureproof" buyers.

Anyway, may I point out to people who want to believe Hasan's words in those WCCFT comments that HE ALSO SAID RAPTOR LAKE CAN PULL 350W OUT OF THE BOX... Maybe that's fine for a handful of people but for me and (I think) a majority of people that's very *yikes!* (and sad)

AtenRa · Aug 4, 2022

Kaluan said:
Ryzen 5 7600 is also rumored to be a thing at launch. Couple that with some budget B650 options and a <$100 16GB DDR5-5200CL38 kit and AMD might have a really good budget DDR5 proposition for "budget but futureproof" buyers.

Anyway, may I point out to people who want to believe Hasan's words in those WCCFT comments that HE ALSO SAID RAPTOR LAKE CAN PULL 350W OUT OF THE BOX... Maybe that's fine for a handful of people but for me and (I think) a majority of people that's very *yikes!* (and sad)

I'd really like to see how Ryzen 7600 (6 cores, 12 threads) will compete against Raptor Lake 13400 (6P+4E, 16 threads). The 13400 @ $200-220 should be faster than the current 12600K.

Kaluan · Aug 4, 2022

AtenRa said:
I'd really like to see how Ryzen 7600 (6 cores, 12 threads) will compete against Raptor Lake 13400 (6P+4E, 16 threads). The 13400 @ $200-220 should be faster than the current 12600K.

I think it's in the ballpark (w/ maybe better percentile lows in gaming scenarios), but IDK if E cores are confirmed for non-K 13th gen this time around. That would mean a different die.

Either way, my point was based on the fact that non-K/budget Raptor Lake (and B760 boards) is 5-6 months away, while Ryzen 7600 might be just a month and a half away.

InstrEd · Aug 4, 2022

First post on this forum. I admit I have been lurking for several years. I would go the AMD route on a new build myself just because I know I'm more future-proof for the next several years. Even if Intel somehow manages to take the crown, it will be at a higher power consumption, I'm sure. I'm really interested to see what happens on the mobile front next year, when I'm thinking of buying a new 2-in-1 laptop.

nicalandia · Aug 4, 2022

The next diagram illustrates the die area required to double the size of L2 and add AVX-512.

The one on the left is the compute unit of a Zen3 CCD; the next one is the same Zen3 CCD with double L2 and AVX-512 on TSMC N7; the one on the far right is a Zen4 with 1MiB L2 and AVX-512 built on TSMC N5.

moinmoin · Aug 4, 2022

Vope45 said:
Why does AMD focus heavily on frequency now?

Does AMD really "focus heavily" on frequency "now"? Isn't it rather forum members and leakers focusing on that? Frequency is an easy parameter to focus on for the press and public. It's essentially the only thing Intel has publicly cared about since Skylake, to the point that they would rather turn their chips into furnaces instead of taking the efficiency gains new nodes should bring.

Whether frequency is AMD's focus we won't see until Zen 4 actually launches. As you wrote yourself there's a lot of space essentially unaccounted for.

And the work on frequency is consequential: on N7, desktop Zen chips are limited by a hard wall which makes OC nigh impossible. Desktop Zen 2's V/F curve looked like a hockey stick. Desktop Zen 3 improved on it, but essentially only moved the wall in the end. V/F curves of mobile Zen chips showed how much better those curves could look, but those are naturally optimized for lower frequencies. It looks like AMD used the move to N5 to push the wall further out, and possibly even to tackle better V/F curves for desktop as well.

NostaSeronx · Aug 4, 2022

Should be noted Zen4 doesn't actually have 512-bit width registers, it is actually just two 128-bit width registers.

It should be done like this.

Zen2-esque mode for AVX512 (FP0/1 + FP2/3 + FP4/5 :: 3-pipes for AVX512)
or
Zen3-esque mode for AVX256 (FP0 + FP1 + FP2 + FP3 + FP4 + FP5 :: 6-pipes for AVX256)

Zen4c cuts off the second FPU unit and runs AVX512 like AVX256 in Zen1.

Decoder = Full Decode of AVX512
NSQ = Splits AVX512 into AVX512_L0 to Scheduler0 and AVX512_HI to Scheduler1.
Zen4 FPU's Store0 and Store1 can operate simultaneously to store the full 512-bit width.

There is no power penalty from AVX512, since it is just using the existing AVX256 units that were improved from Zen3. There is no penalty for mixing AVX256 and AVX512.

There is no 2x512-bit register.
There are no 12 FPU units.

Family 19h only has
3 FP Pipes of 256-bit width per PRF+Cluster.
2x128-bit Registers (6 FP pipes only)
2xStores (Last FP pipe in each FPU cluster)

Abwx · Aug 4, 2022

nicalandia said:
The next diagram illustrates the die area required to double the size of L2 and add AVX-512.

The one on the left is the compute unit of a Zen3 CCD; the next one is the same Zen3 CCD with double L2 and AVX-512 on TSMC N7; the one on the far right is a Zen4 with 1MiB L2 and AVX-512 built on TSMC N5.

N5 density is around 137M transistors/mm2 while N5HP is at 93M/mm2, so most of the N5 vs N7 density improvement has been traded for high frequency.
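
For scale, the gap between those two figures can be worked out directly. This is only a rough sketch using the two numbers quoted above; the variable and function names are made up for illustration, and no N7 figure is assumed.

```python
# Density figures as quoted in this post, in million transistors per mm^2.
N5_DENSE_MTR_PER_MM2 = 137.0  # the "N5" figure
N5_HP_MTR_PER_MM2 = 93.0      # the "N5HP" figure


def density_given_up(dense: float, fast: float) -> float:
    """Fraction of the denser option's density lost by picking the faster one."""
    return 1.0 - fast / dense


ratio = N5_HP_MTR_PER_MM2 / N5_DENSE_MTR_PER_MM2
loss = density_given_up(N5_DENSE_MTR_PER_MM2, N5_HP_MTR_PER_MM2)

print(f"HP figure is {ratio:.0%} of the dense figure")
print(f"Density given up: {loss:.0%}")
```

So on these numbers roughly a third of the achievable density is spent on clocks.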

biostud · Aug 4, 2022

Racan said:
If you’re a gamer mainly and not strapped for cash why would you even buy Zen 4 chips without 3D cache?

If it would be launched late '23 as I first thought.

nicalandia · Aug 4, 2022

NostaSeronx said:
Should be noted Zen4 doesn't actually have 512-bit width registers.

That's a given. It's been known for quite a while now.

NostaSeronx said:
There is no power penalty from AVX512, since it is just using the existing AVX256 units that were improved from Zen3.

They double on the FP Units.

eek2121 · Aug 4, 2022

If that rumor is true, those base clocks are pretty impressive. So are the boost clocks! The 7950X has a 32.5% frequency increase over the 5950X.
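
A quick check of that kind of number, as a sketch only. It assumes the rumored (unconfirmed) 4.5 GHz base clock for the 7950X and compares it against the 5950X's 3.4 GHz base; the function name is just for illustration.

```python
def percent_increase(old_ghz: float, new_ghz: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new_ghz - old_ghz) / old_ghz * 100.0


BASE_5950X_GHZ = 3.4          # 5950X base clock
RUMORED_BASE_7950X_GHZ = 4.5  # assumption: rumored value, not confirmed

uplift = percent_increase(BASE_5950X_GHZ, RUMORED_BASE_7950X_GHZ)
print(f"Base clock uplift: {uplift:.1f}%")
```

With those inputs it lands at a bit over 32%, in line with the figure above.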

Schmide · Aug 4, 2022

NostaSeronx said:
Should be noted Zen4 doesn't actually have 512-bit width registers, it is actually just two 256-bit width registers.
It should be done like this.

Zen2-esque mode for AVX512 (FP0/1 + FP2/3 + FP4/5 :: 3-pipes for AVX512)
or
Zen3-esque mode for AVX256 (FP0 + FP1 + FP2 + FP3 + FP4 + FP5 :: 6-pipes for AVX256)

Zen4c cuts off the second FPU unit and runs AVX512 like AVX256 in Zen1.

Decoder = Full Decode of AVX512
NSQ = Splits AVX512 into AVX512_L0 to Scheduler0 and AVX512_HI to Scheduler1.
Zen4 FPU's Store0 and Store1 can operate simultaneously to store the full 512-bit width.

There is no power penalty from AVX512, since it is just using the existing AVX256 units that were improved from Zen3.

Yeah no. You can't implement AVX-512 without 512-bit registers. Permute, gather, shuffle, mask and the like make that basically impossible.

Sure you could split avx512 registers to do avx but interlane operations and persistence of data make combining 2 lesser registers inadequate.

Timmah! · Aug 4, 2022

eek2121 said:
If that rumor is true, those base clocks are pretty impressive. So are the boost clocks! The 7950X has a 32.5% frequency increase over the 5950X.

Yeah, if the base clock is 4.5GHz, I don't question the "5.1GHz+ all core" rumors at all. It's pretty awesome. AutoCAD should run on this really swiftly.
My only hope is that these numbers will be true for the 3D version as well. V-cache is indeed fantastic, but if it comes at a 500 MHz penalty, it kinda loses a lot of its allure.

nicalandia · Aug 4, 2022

Schmide said:
Yeah no. You can't implement AVX-512 without 512-bit registers. Permute, gather, shuffle, mask and the like make that basically impossible.

Sure you could split avx512 registers to do avx but interlane operations and persistence of data make combining 2 lesser registers inadequate.

AMD Just Pulled that off.. Two AVX-256 combined to make a single less power hungry AVX-512

Schmide · Aug 4, 2022

nicalandia said:
AMD Just Pulled that off.. Two AVX-256 combined to make a single less power hungry AVX-512

It would be categorized as the opposite. AMD found a way to reuse avx512 registers as 2 avx256 registers.

deasd · Aug 4, 2022

Looks like L3 is almost as fast as Zen3's L2

https://twitter.com/x/status/1555212797520187392

also:

https://twitter.com/x/status/1555176687440199680

NostaSeronx · Aug 4, 2022

Schmide said:
Yeah no. You can't implement AVX-512 without 512-bit registers. Permute, gather, shuffle, mask and the like make that basically impossible.

Sure you could split avx512 registers to do avx but interlane operations and persistence of data make combining 2 lesser registers inadequate.

Schmide said:
It would be categorized as the opposite. AMD found a way to reuse avx512 registers as 2 avx256 registers.

Zen4 does not have one 512-bit execution unit for any single pipe.
Zen4 does not have one PRF that has an entry having a 512-bit width.

Zen4 does ZMM this way:
2x128-bit on one PRF as ZMM_low and 2x128-bit on another PRF as ZMM_high.
Of which data is executed on two FP pipes:
FP0/1 for 512-bit MULs
FP2/3 for 512-bit ADDs
FP4/5 for 512-bit Stores

0-2-4 is FPU Cluster0 under FPScheduler0 which modifies PRF0
1-3-5 is FPU Cluster1 under FPScheduler1 which modifies PRF1

Zen4 functions as a Family 19h processor. Its behavior is a superset of Zen3. In this case, the superset behavior is Zen2-esque execution but with AVX512 instead of AVX256.

Zen2 has four pipes available for 256-bit, but it was actually routed via eight pipes that are 128-bit.
Four 128-bit + Four 128-bit = For four 256-bit.
128-bit work didn't have access to the upper four units.

Zen4 has three fused pipes available for 512-bit, but it is actually routed via six pipes that are 256-bit.
Three 256-bit + Three 256-bit = For three 512-bit.
256-bit work, however, does have access to the upper three units, since that behavior was present in Zen3.

The hardware control has two options to display behavior on Fam 19h with AVX512:
Parallel ZMM (Lo256+Hi256): ZMM_Low on Cluster0 and ZMM_High on Cluster1
Temporal ZMM (256Lo+256Hi): ZMM_Low and ZMM_High on same cluster at half_rate.

Zen4 is supposed to be Parallel ZMM halves.
Zen4c is supposed to be Temporal ZMM halves. Zen4c's superset behavior does Zen1-esque execution with lower FP power consumption from being 3-pipe only.
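
A toy model of what the half-rate difference would mean for issue cycles. It assumes each cluster accepts one 256-bit half per cycle and ignores latency, dependencies and everything else about a real core; the class and field names are made up for illustration and do not describe actual hardware.

```python
from dataclasses import dataclass


@dataclass
class FpuModel:
    name: str
    clusters: int  # 256-bit clusters that can take halves of a 512-bit op

    def cycles_for(self, n_zmm_ops: int) -> int:
        """Issue cycles for n independent 512-bit ops split into 256-bit halves."""
        halves = 2 * n_zmm_ops
        # Toy assumption: one half per cluster per cycle, so round up.
        return -(-halves // self.clusters)


parallel = FpuModel("Parallel ZMM (Lo on Cluster0, Hi on Cluster1)", clusters=2)
temporal = FpuModel("Temporal ZMM (Lo then Hi on one cluster)", clusters=1)

for model in (parallel, temporal):
    print(f"{model.name}: {model.cycles_for(8)} cycles for 8 ops")
```

In this sketch the temporal case takes twice the cycles for the same work, which is all "half_rate" is meant to convey here.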

gruffi · Aug 4, 2022

inf64 said:
Here is techspot's 5800X3D running DDR 3800 CL16 vs 12900K running DDR5 6400 CL32, tested in 40(!) games with 3090ti:

Ryzen 7 5800X3D vs. Core i9-12900K in 40 Games (www.techspot.com)

5800X3D is still faster, so I'm not sure how computerbase got their results.

That guy from ComputerBase used overclocked 12th gen models with faster memory. 12900K +42% faster DDR5, 12700K +25% faster DDR4, 5800X3D +19% faster DDR4. That comparison doesn't say much about Ryzen 7000. The advantage of the 12900K in that test is clearly based on OC and fast DDR5. Ryzen 7000 also can use fast DDR5.

Schmide · Aug 4, 2022

NostaSeronx said:
Zen4 has three fused pipes available for 512-bit, but it is actually routed via six pipes that are 256-bit.

If they are fused they are 512 bit.

If you think they are separate, please explain how a _mm512_permutex2var is going to execute in separate units with 256-bit registers.
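
To make the concern concrete, here is a plain Python reference model of the 32-bit flavour of that permute. It is only a sketch of the instruction's semantics (low 4 index bits pick a position, bit 4 picks the source); the helper `crosses_halves` is a made-up name, and nothing here claims how any particular core implements it.

```python
from typing import List

LANES = 16  # a 512-bit register holds sixteen 32-bit elements


def permutex2var_epi32(a: List[int], idx: List[int], b: List[int]) -> List[int]:
    """Model of the two-source 32-bit permute.

    Per element: bits 3:0 of the index pick one of 16 positions,
    bit 4 picks the source (0 -> a, 1 -> b).
    """
    assert len(a) == len(idx) == len(b) == LANES
    out = []
    for i in range(LANES):
        pos = idx[i] & 0xF
        src = b if (idx[i] >> 4) & 1 else a
        out.append(src[pos])
    return out


def crosses_halves(idx: List[int]) -> bool:
    """True if any output element reads from the other 256-bit half."""
    half = LANES // 2
    return any((i < half) != ((ix & 0xF) < half) for i, ix in enumerate(idx))


a = list(range(100, 116))
b = list(range(200, 216))
# Alternate between the top of a (descending) and the bottom of b (ascending).
idx = [15, 16, 14, 17, 13, 18, 12, 19, 11, 20, 10, 21, 9, 22, 8, 23]

result = permutex2var_epi32(a, idx, b)
print(result)
print("needs data from the other half:", crosses_halves(idx))
```

The very first output element, which lives in the low 256 bits of the result, comes from the highest element of `a`, so the two halves of the result cannot be produced without seeing the full 512 bits of the sources.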

gruffi · Aug 4, 2022

Vope45 said:
2/ Why does AMD focus heavily on frequency now? You can't have high IPC and clock obviously, but still, why now? Is it because of TSMC's limited capacity for N5?

They are not focusing on frequency in general. Since the first Zen generation the design clearly had some speed path limits. With Zen 4 AMD seems to eliminate most critical speed path limits before focusing on a wider design for more IPC (Zen 5). Which absolutely makes sense. And btw, up to 10% IPC isn't that bad after all. RPL seems to have ~0% IPC uplift.

Det0x · Aug 4, 2022

deasd said:
Looks like L3 is almost as fast as Zen3's L2

I already have those L3 bandwidth numbers with regular Zen3..
