Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread-count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,829
136
I hope Lisa Su doesn't call FBI to break my house's door just for searching Zen4 CPU cuz it's not the right place.

If she does, ask her where my Radeon VII letter went, cuz they never sent me one. Le sad face.

Seriously though

Add me to the surprised queue over the reorder buffer changes. Did not expect.
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,829
136
Increasing the buffer requires significant engineering? It can't be increased like cache?

Per @arcsign and @Saylick, proper sizing of the ROB is important. It takes a fair amount of engineering and code profiling to size it correctly. Actually integrating the larger ROB into the design itself is probably not such a herculean task. But given how little AMD has changed it since Zen 2, such a change seemed unlikely.

Should be interesting to see what effects it has, or if anyone even bothers testing for that with commercial samples.
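
A rough way to see why the sizing question matters: the sketch below is a toy model, not a description of how any Zen core actually behaves. It assumes a single load miss blocks retirement at the head of the ROB while dispatch keeps filling it. The function name, the 6-wide dispatch, the 300-cycle miss latency and the entry counts are all illustrative assumptions.

```python
import math


def stall_cycles(rob_entries: int, dispatch_width: int, miss_latency: int) -> int:
    """Cycles dispatch is blocked while one load miss sits at the ROB head.

    Toy model (assumption): nothing retires for miss_latency cycles, and the
    front end keeps dispatching dispatch_width instructions per cycle until
    the ROB is full.
    """
    cycles_to_fill = math.ceil(rob_entries / dispatch_width)
    return max(0, miss_latency - cycles_to_fill)


if __name__ == "__main__":
    # Illustrative inputs only, not measured values.
    for entries in (256, 320, 2048):
        stalled = stall_cycles(entries, dispatch_width=6, miss_latency=300)
        print(f"{entries} entries -> {stalled} stalled cycles per isolated miss")
```

In this model each extra entry buys back only a fraction of a cycle per isolated miss, which is one way to read why the size has to be justified by profiling rather than simply maximised. Real cores overlap several misses at once, so the model likely understates the benefit for memory-heavy code.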
 

BorisTheBlade82

Senior member
May 1, 2020
663
1,014
106
Well it makes sense in light of Angstronomics' leak:

"As for the claimed Single-Thread Uplift of ‘greater than 15% expected‘, Angstronomics can confirm this is a conservative value, done at below final frequencies and using Maxon’s Cinebench R23 Single Thread Benchmark. We can independently confirm that the Performance Per Clock (PPC) targets for the Zen4 core are targeted at +7% Single-Thread PPC, +10% Multi-Thread PPC over their Zen3 core, with significantly higher PPC for memory sensitive workloads thanks to DDR5 while core execution bound workloads like Cinema4D have a lower PPC improvement. "

Significant uplifts in memory sensitive workloads (>20%) can be expected with changes like the tweet above suggests.
Well, that is an interesting matter.
The ROB helps with basically everything. In memory-intensive workloads many instructions wait for loads from caches and RAM. The ROB helps to keep the execution units busy, as well as the load and store units. More generally, the ROB size is a good indication of how wide an architecture is: the bigger, the wider. This is because, AFAIK, the ROB is also very expensive in terms of power consumption, speed, area or, most likely, a combination of all of them. Otherwise everyone would simply have a ROB with thousands of entries.
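
As a side check on the leak figures quoted above, the sketch below backs out the clock increase that would be needed to turn +7% PPC into a 15% single-thread uplift. It assumes performance is simply PPC multiplied by clock speed, which is a simplification, and the function name is made up for illustration.

```python
def implied_clock_gain(total_uplift: float, ppc_gain: float) -> float:
    """Clock-speed gain needed to reach total_uplift given ppc_gain.

    All values are fractions (0.07 means +7%). Assumes the simple model
    performance = PPC x clock, which ignores memory and boost behaviour.
    """
    return (1.0 + total_uplift) / (1.0 + ppc_gain) - 1.0


if __name__ == "__main__":
    # 15% and 7% are the figures from the quoted leak.
    gain = implied_clock_gain(total_uplift=0.15, ppc_gain=0.07)
    print(f"Implied clock gain: {gain:.1%}")  # prints "Implied clock gain: 7.5%"
```

Since the leak calls 15% conservative and measured below final frequencies, the real clock contribution would be larger than this floor if the PPC target holds.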
 

inf64

Diamond Member
Mar 11, 2011
3,697
4,015
136
Some changes in Zen 4 were also listed on wikichip: https://en.wikichip.org/wiki/amd/microarchitectures/zen_4#Key_changes_from_Zen_3

  • Core
    • AVX-512 instructions support
    • L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
    • L2 cache doubled from 512 KiB to 1 MiB per core
    • Max. physical and linear address size raised from 48 to 52 and 57 bits respectively
    • Improved cache load, write and prefetch from/to register (less latency).
    • Higher Transistor Density, due to 5nm process
    • Capable of higher all-core clockspeeds (shown by AMD to reach 5GHz+ on all cores)
They still have the "old" uOP (L0) cache size listed for Zen 4.
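
For a sense of scale on the address-size bullet, here is a small sketch that converts bit widths into addressable bytes. The helper names are hypothetical and the only inputs are the 48, 52 and 57 bit figures from the list.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with the given width."""
    return 1 << bits


def human(size: int) -> str:
    """Format a power-of-two byte count using binary units."""
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


if __name__ == "__main__":
    for bits in (48, 52, 57):
        print(f"{bits} bits -> {human(addressable_bytes(bits))}")
```

So the listed change would take the limits from 256 TiB to 4 PiB of physical and 128 PiB of linear address space.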

Edit: From the chipsandcheese article, we see the author pretty much nailed it:

" What AMD may change with Zen 4
Now I am going to preface this section by saying that cutting edge CPUs are basically black voodoo magic that no single person could ever fully understand however, based on the gathered data we can draw some conclusions on what AMD may change in Zen 4.

The first thing AMD will almost certainly do is make the Branch Predictor better. Now Zen 3’s BPU is already very good, however the fewer cycles you waste on branch mispredicts or on L2 BTB overrides the more cycles you can do useful work.

The next thing AMD will try and improve with Zen 4 is the amount of dispatch stalls that the core has which can be solved in two ways, either lower the latency of the structures in the backend so the resources the backend has can be freed up quicker or make the structures in the backend bigger to better absorb the amount of instructions coming from the frontend.

The three most common reasons for Zen 3 to stall out on the backend are the ROB filling, the Load Queue filling, and the Store Queue fill, with an honorary mention of the Floating Point Registers being the second most common reason for stalls in our testing of Linpack. So AMD making these structures larger and/or lower latency in Zen 4 is a likely possibility.

And lastly a larger L1 and L2 cache may also come with Zen 4 because the less time you spend trying to access data the more time you can spend working on data."
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Some changes in Zen 4 were also listed on wikichip: https://en.wikichip.org/wiki/amd/microarchitectures/zen_4#Key_changes_from_Zen_3

  • Core
    • AVX-512 instructions support
    • L1 and L2 DTLB size increased from 64 to 72 and 2,048 to 3,072 entries
    • L2 cache doubled from 512 KiB to 1 MiB per core
    • Max. physical and linear address size raised from 48 to 52 and 57 bits respectively
    • Improved cache load, write and prefetch from/to register (less latency).
    • Higher Transistor Density, due to 5nm process
    • Capable of higher all-core clockspeeds (shown by AMD to reach 5GHz+ on all cores)
They still have the "old" uOP (L0) cache size listed for Zen 4.
That has been known since the Gigabyte Zen4 Leak

1660768904889.png

 

Dave3000

Golden Member
Jan 10, 2011
1,351
91
91
I wonder if there is going to be a bigger difference in gaming between the Ryzen 7950X and 7700X than there is between the 5950X and 5800X. The difference in gaming between the 5800X and 5950X is almost nothing, and whatever little gain the 5950X has in gaming is because of its slightly higher boost clocks. If the difference is bigger for gaming with the next 8-core and 16-core Ryzens, I might be willing to go for the 7950X instead of the 7700X, as my PC is primarily used for gaming with work on the side. I wouldn't consider the 7900X since I want an 8-core CCX.
 

Joe NYC

Golden Member
Jun 26, 2021
1,934
2,272
106
Gains from higher boost clocks are tiny. The difference is that if the game uses more than one thread, it can benefit from the second L3 cache on the second CCD, and that's where most of the gains came from.

The 5800X3D delivered a greater performance increase over the 5800X than the 5950X did, even at the 5800X3D's significantly lower clock speeds.

So, for gaming, between the 7700X and 7950X: neither one, wait for the 7800X3D.

If AMD has solved the voltage issue that kept the boost clocks of the 5800X3D low, and the 7800X3D can clock at the same speeds as the rest of Zen 4, then it is going to completely outclass the 7700X, the 7950X and Raptor Lake in gaming.
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
OK, let's talk about the IHS design, I mean the text printed on top. I don't like it at all.

The "octopus" IHS design is pretty eye-catching. Very cool!

Regarding the text, I would like them to simplify it a bit. Most of all, I don't like the "Ryzen 3/5/7" tier branding nonsense. It is awkward when combined with the model number. If they need it, invent a letter prefix instead. But I think the model number is enough to communicate the tier. Bigger number, higher tier. Simple as that.

And I don't like the repetition of "AMD Ryzen". Say it once, bigger and bolder, instead. And, for old eyes like mine, I'd like the model number to be big and prominent.

The product details (QR code, SKU number, production code, serial number, origin and copyright) could be better formatted and more neatly aligned.

Something like my best effort edit on the right:

Ryzen 7700X IHS text design proposal.png

PS. Here is a variant with emphasis on the AMD name and logo branding:
Ryzen 7700X IHS text design proposal (Taylor edition).png
And here is a variant with emphasis on the Ryzen brand with the "Zen" ensō:
Ryzen 7700X IHS text design proposal (Clark edition).png
And, let's finish with a variant for that special occasion:
Ryzen 7700X IHS text design proposal (Su edition).png
 

Abwx

Lifer
Apr 2, 2011
10,939
3,440
136

Curious that CPU-Z detects a 96-core Epyc but still displays 128 cores/256 threads in the core/thread and L1/L2 cache counts, as well as 16 x L3, which also points to 128C if there's 32MB/CCX.

Edit: That would be possible if Zen 4 and Zen 4c use the same die, with Genoa having 32 cores disabled to get higher frequencies within the same TDP.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Zen 4 and Zen 4c are not the same die. Zen 4c is likely made on a much more power / density optimized process, but that will likely limit max boost clock significantly. That doesn’t matter much for the market they are targeting, where they will likely run at 100% load all of the time. Bergamo is a throughput processor likely aimed at 128-core ARM competition.

The first image has contradictory information. The 96 core Genoa is 12 chiplets with 32 MB L3 each. Bergamo is, I guess, 16 CCX across 8 chiplets, but it is supposed to be 16 x 16 MB L3, not 16 x 32 MB L3. Obviously 96 does not equal 128.

The bottom image seems correct, or at least it has 12 x 32 MB for the L3 cache.
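
The mismatch is easy to tabulate. This is only a sketch: the class and field names are made up, 8 cores per CCX is assumed throughout, and the Bergamo row uses the guessed 16 x 16 MB layout from this post rather than a confirmed spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    ccx_count: int
    cores_per_ccx: int
    l3_mb_per_ccx: int

    @property
    def cores(self) -> int:
        return self.ccx_count * self.cores_per_ccx

    @property
    def l3_total_mb(self) -> int:
        return self.ccx_count * self.l3_mb_per_ccx


# Values taken from the post; 8 cores per CCX is an assumption.
genoa = Config("Genoa, 96C", ccx_count=12, cores_per_ccx=8, l3_mb_per_ccx=32)
bergamo = Config("Bergamo (guess)", ccx_count=16, cores_per_ccx=8, l3_mb_per_ccx=16)
first_image = Config("First CPU-Z image", ccx_count=16, cores_per_ccx=8, l3_mb_per_ccx=32)

if __name__ == "__main__":
    for cfg in (genoa, bergamo, first_image):
        print(f"{cfg.name}: {cfg.cores} cores, {cfg.l3_total_mb} MB L3")
```

Under these assumptions the first image's 16 x 32 MB reading matches neither part: it has Bergamo's core count but more L3 than either layout.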
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
It's just a CPU-Z bug. It's two QS Genoa CPUs on that 2S system.
 

Timmah!

Golden Member
Jul 24, 2010
1,418
630
136
The "octopus" IHS design is pretty eye-catching. Very cool!

Regarding the text, I would like them to simplify it a bit. Most of all, I don't like the "Ryzen 3/5/7" tier branding nonsense. It is awkward when combined with the model number. If they need it, invent a letter prefix instead. But I think the model number is enough to communicate the tier. Bigger number, higher tier. Simple as that.

And I don't like the repetition of "AMD Ryzen". Say it once, bigger and bolder, instead. And, for old eyes like mine, I'd like the model number to be big and prominent.

The product details (QR code, SKU number, production code, serial number, origin and copyright) could be better formatted and nicer aligned.

Something like my best effort edit on the right:

View attachment 66069

PS. Here is a variant with an emphasis on the AMD name and logo branding:

Not quite what I wanted, but my point was in principle about more focus on aesthetics and you seem to get it, as your designs look significantly better and less busy than the original. Good work. On the second alternative I would move the Ryzen 7700X part upwards, closer to the AMD logo, though.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,684
1,267
136
Bergamo is, I guess, 16 CCX across 8 chiplets

That's the really interesting thing about the Zen4C CCDs. Either they have 16-core CCXes, or they have 2x8-core CCXes, and either possibility leads to interesting deductions.

If it's a 16-core CCX, then why is Bergamo 128 cores with 8 CCDs instead of 192 with 12 CCDs? No reason the latter couldn't be made, so in this scenario AMD is sand-bagging.

On the other hand if it's a 2x8 core CCX, then that means the 4th gen Epyc IOD can connect up to 16 CCXes, which means that it should be possible to get to the same 128-core count with regular Zen 4 Epyc, albeit requiring some work on the layout and routing.

And, that could also mean that the consumer IOD is similarly overprovisioned if it were designed with theoretical Zen4C products in mind.
 

BorisTheBlade82

Senior member
May 1, 2020
663
1,014
106
I would guess that they use 2x8 core CCX per CCD. This way they can reuse the current topology for connecting the cores and can use the 8c CCX as a "little" 8c CCD in the future.

WRT the IFOP links: We know from the Gigabyte leak that AMD doubled the number of ports per link from one to two. The reasoning seemed to be that they can disable one of them and save power when there is low bandwidth pressure. But what if they use both ports when connecting a Genoa CCD but only 1 port per CCD for Bergamo? Maybe they came to the conclusion that halving the bandwidth per CCX was no big disadvantage for cloud computing...
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
So, this looks like the IoD would have the wiring for 12 physical links to CCDs, each link with two ports. In Bergamo, there are 8 physical links with the ports split between the two CCXs in 8 CCDs. In Genoa, it's 12 physical links with two ports each.

I speculate that, on N5, they can't physically fit more than eight 16-core CCDs on the existing package. If there's a shrink, say to N4 (assuming there is a dense library that allows a physically smaller CCD with the same number of cores), they may be able to fit 12 CCDs that have two 8-core CCXs each, connected by 12 physical links with 2 ports each. That would be a possible path forward to a package with 192 cores (12 x 16-core CCDs).
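
The link and port bookkeeping above can be sketched as follows. The function and its defaults (12 links on the IOD, 2 ports per link, 8 cores per CCX) encode this thread's speculation, not confirmed specs, and the names are hypothetical.

```python
def package(ccds: int, ccx_per_ccd: int, cores_per_ccx: int = 8,
            ports_per_link: int = 2, max_links: int = 12) -> dict:
    """Summarise a hypothetical package with one physical link per CCD."""
    if ccds > max_links:
        raise ValueError("more CCDs than physical links on the IOD")
    ccx_total = ccds * ccx_per_ccd
    ports_used = ccds * ports_per_link
    return {
        "cores": ccx_total * cores_per_ccx,
        "links_used": ccds,
        "ports_used": ports_used,
        "ports_per_ccx": ports_used / ccx_total,
    }


if __name__ == "__main__":
    # All three layouts are speculative readings from this thread.
    layouts = {
        "Genoa": package(ccds=12, ccx_per_ccd=1),
        "Bergamo": package(ccds=8, ccx_per_ccd=2),
        "192-core idea": package(ccds=12, ccx_per_ccd=2),
    }
    for name, info in layouts.items():
        print(name, info)
```

On these assumptions Genoa gets two ports per CCX, while both Bergamo and the 192-core idea end up with one port per CCX, with the latter using all 24 ports.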
 

blackangus

Member
Aug 5, 2022
69
95
51
Has anyone seen motherboard costs for the X670 chipsets?
I saw one article that showed the cheapest board (no PCIe 5.0 slot) at over $300 USD, and anything with one PCIe 5.0 slot plus one PCIe 5.0 NVMe slot at $500+ USD.
Is this realistic?
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
^
Sounds like an early adopter premium to me. They'll take time with the cheaper boards anyway.

I would guess that they use 2x8 core CCX per CCD. This way they can reuse the current topology for connecting the cores and can use the 8c CCX as a "little" 8c CCD in the future.

WRT the IFOP links: We know from the Gigabyte leak that AMD doubled the number of ports per link from one to two. The reasoning seemed to be that they can disable one of them and save power when there is low bandwidth pressure. But what if they use both ports when connecting a Genoa CCD but only 1 port per CCD for Bergamo? Maybe they came to the conclusion that halving the bandwidth per CCX was no big disadvantage for cloud computing...
Considering FP workloads need the most bandwidth and Zen 4c is rumored to halve the AVX-512 throughput compared to Zen 4, the approach you describe makes perfect sense to me.
 

blackangus

Member
Aug 5, 2022
69
95
51
^
Sounds like an early adopter premium to me. They'll take time with the cheaper boards anyway.
These were comparisons of the same board model: the X570 version was around $220, compared to the same model with X670 at $500.
The $300 board was the low-end X670 chipset.
Which is where my concern comes from.
That just seems outlandish.