Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

poke01

Senior member
Mar 8, 2022
741
727
106
Of course Mac users cannot just drop in a standard Radeon card. They must pay the Apple ferryman 2x-3x the cost of a standard card for their firmware extortion fee.
The Intel Mac Pro has PCIe slots and supports RDNA 2 cards from the RX 6800 to the RX 6900 XT. You can use off-the-shelf cards from third parties too.
Same with any Intel MacBook: eGPUs support third-party AMD GPUs.

Sucks because with M1 or M2 that option is going away and it will only be Apple GPUs, and that's when the real extortion fee of 2-4x begins.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,821
3,643
136
More info on AVX512 Genoa

View attachment 65955

Genoa and Zen 4 are going to murk Xeons in AVX-512, because they will lock it unless you pay to unlock it through SDSi.


Genoa is such a monster. It has so many cores that it just breaks Cinebench and CPU-Z.
Eh, it's the same Score/GHz as Tiger Lake. For reference, my 11370H scores 785 points at ~4.2 GHz in ST.
 

deasd

Senior member
Dec 31, 2013
520
763
136
How reliable is this site 'Nanoreview'? They say they got all the Zen 4 samples, ran them, and have pre-release R23 results for all of them.


7950X: ST ~2000, MT ~38000

Edit: very likely just speculation.


and now it seems that there's a new Genoa GB score which is more reliable


Single-Core Score 1460
Crypto Score 4489
Integer Score 1212
Floating Point Score 1494
 

poke01

Senior member
Mar 8, 2022
741
727
106
The i9 and 3080 combination using Quicksync is truly impressive for workstation customers. But it is not simple and too much work for your average consumers. It is safe to say they do not compete for the same market.

The problem isn't a few percent, it's a steady erosion of the same base customer. Ignore it if you will, but this is a real threat to any manufacturer in the PC market.
MadRat is right. Apple has a serious advantage in the video editing space. The M2 Air, even though fanless, is pretty good thanks to its excellent encoders and decoders. AMD needs to address this drawback with Zen 4 mobile chips.
1660633706967.png
 

biostud

Lifer
Feb 27, 2003
18,251
4,765
136
MadRat is right. Apple has a serious advantage in the video editing space. The M2 Air, even though fanless, is pretty good thanks to its excellent encoders and decoders. AMD needs to address this drawback with Zen 4 mobile chips.
View attachment 65972
Isn't that basically because Apple has hardware that works like Intel's Quick Sync, Nvidia's NVENC and AMD's AMF encoder, and they compare it with pure CPU encoding? Currently AMD and Nvidia have their encoders placed on the GPU, but it would be interesting if AMD decided (I don't know if it already has) to support it on their APUs.

The real comparison would be to test the M2 against a PC using either of the dedicated encoders and post those results.
 

inf64

Diamond Member
Mar 11, 2011
3,703
4,034
136
and now it seems that there's a new Genoa GB score which is more reliable


16MB L3 x 4? Is this Genoa running a cut down L3 cache per CCD(X)?

Anyhow, versus Zen 3 at the same clock and excluding the aggregate ST score and crypto score, we have these per-clock ratios (Zen 4 / Zen 3) for int and fp in the GB5 ST benchmark:
Integer Score: 1.104
Floating Point Score: 1.085

Matches pretty well with what AMD has said so far. Floating point is a bit lower (8.5%), integer is a bit higher (10.4%). The overall score is a lot higher because of the crypto subtest, which scores 50% better per clock than Zen 3 (that's why I excluded it).

So if Zen 4 clocks to 5.7-5.9 GHz it should be between 26 and 28% faster than the 5950X. Below is the current ST benchmark table from Computerbase (without Raptor Lake):

1660640812961.png

Top Raptor Lake is rumored to clock to ~5.8 GHz at stock on a few threads, so that should put it at 130% (I assumed ~2% IPC uplift due to the bigger L2; 121 x 58/55 x 1.02 = 130).

If this checks out, the difference in average ST performance between the 7950X and 13900K should be within margin of error (~1-3%). This is pretty much expected, as Golden Cove has around 10-11% higher IPC vs Zen 3, and Zen 4 has about that much better IPC than Zen 3 (so they are pretty much even on average). Zen 4 and Raptor Cove clock to about the same max boost clocks, so ST performance should be very similar (not gaming performance).
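
For anyone who wants to redo the Raptor Lake estimate with different rumored clocks, here is a minimal Python sketch of the same arithmetic. The function name `project_st_index` is made up for illustration. The 121 baseline index and the 5.5 GHz baseline clock are simply the numbers used in the calculation above, and the 2% IPC uplift is the stated assumption, not a measured figure.

```python
def project_st_index(base_index, base_clock_ghz, new_clock_ghz, ipc_uplift):
    """Scale a relative ST index linearly with clock and IPC.

    Assumes performance scales perfectly with frequency, which is
    optimistic for real workloads (memory-bound code scales worse).
    """
    return base_index * (new_clock_ghz / base_clock_ghz) * (1.0 + ipc_uplift)


# Numbers taken from the post: index 121 at 5.5 GHz, rumored 5.8 GHz, ~2% IPC (assumed)
raptor_lake_index = project_st_index(121, 5.5, 5.8, 0.02)
print(f"Projected Raptor Lake ST index: {raptor_lake_index:.1f}")
```

Swapping in another clock or IPC guess only changes the arguments; the linear-scaling assumption is the weak point, not the arithmetic.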
 

tamz_msc

Diamond Member
Jan 5, 2017
3,821
3,643
136
and now it seems that there's a new Genoa GB score which is more reliable

Took a while to find a score with the same kernel version and GB version, but here is Milan for comparison, running at 3.8 GHz:

Genoa: 1212 pts integer @ 3.5 GHz = 346 pts/GHz
Milan: 1234 pts integer @ 3.8 GHz = 325 pts/GHz

Perf/clock improvement = 346/325 = 1.0646 or 6.5%.

Genoa: 1494 pts fp @ 3.5 GHz = 427 pts/GHz
Milan: 1583 pts fp @ 3.8 GHz = 417 pts/GHz

Perf/clock improvement = 427/417 = 1.0239 or 2.4%

That's lower than AMD's 8-10% projections.
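
The per-clock comparison above can be reproduced with a short Python sketch. The helper names are hypothetical, and the 3.5 GHz and 3.8 GHz figures are the clocks assumed in the post, not verified sustained clocks for either run.

```python
def points_per_ghz(score, clock_ghz):
    """Normalize a benchmark subscore by the clock it was run at."""
    return score / clock_ghz


def per_clock_uplift(new_score, new_clock_ghz, old_score, old_clock_ghz):
    """Fractional perf/clock improvement of the new part over the old one."""
    new_ppg = points_per_ghz(new_score, new_clock_ghz)
    old_ppg = points_per_ghz(old_score, old_clock_ghz)
    return new_ppg / old_ppg - 1.0


# GB5 ST subscores and clocks as quoted in the post (Genoa vs Milan)
int_uplift = per_clock_uplift(1212, 3.5, 1234, 3.8)
fp_uplift = per_clock_uplift(1494, 3.5, 1583, 3.8)

print(f"Integer perf/clock uplift: {int_uplift:.1%}")
print(f"FP perf/clock uplift:      {fp_uplift:.1%}")
```

Without rounding the pts/GHz values first, the results come out a hair higher (roughly 6.6% and 2.5%) than the 6.5% and 2.4% above, which does not change the conclusion.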
 

biostud

Lifer
Feb 27, 2003
18,251
4,765
136
16MB L3 x 4? Is this Genoa running a cut down L3 cache per CCD(X)?

Anyhow, versus Zen 3 at the same clock and excluding the aggregate ST score and crypto score, we have these per-clock ratios (Zen 4 / Zen 3) for int and fp in the GB5 ST benchmark:
Integer Score: 1.104
Floating Point Score: 1.085

Matches pretty well with what AMD has said so far. Floating point is a bit lower (8.5%), integer is a bit higher (10.4%). The overall score is a lot higher because of the crypto subtest, which scores 50% better per clock than Zen 3 (that's why I excluded it).

So if Zen 4 clocks to 5.7-5.9 GHz it should be between 26 and 28% faster than the 5950X. Below is the current ST benchmark table from Computerbase (without Raptor Lake):

View attachment 65973

Top Raptor Lake is rumored to clock to ~5.8 GHz at stock on a few threads, so that should put it at 130% (I assumed ~2% IPC uplift due to the bigger L2; 121 x 58/55 x 1.02 = 130).

If this checks out, the difference in average ST performance between the 7950X and 13900K should be within margin of error (~1-3%). This is pretty much expected, as Golden Cove has around 10-11% higher IPC vs Zen 3, and Zen 4 has about that much better IPC than Zen 3 (so they are pretty much even on average). Zen 4 and Raptor Cove clock to about the same max boost clocks, so ST performance should be very similar (not gaming performance).

Could it be Zen4c cores with less L3 cache?
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
Eh, it's the same Score/GHz as Tiger Lake. For reference, my 11370H scores 785 points at ~4.2 GHz in ST.
The implication there is that AMD only implemented a single AVX-512 vector unit in Zen 4 instead of the dual units in Cascade Lake and Ice Lake Xeon. It's good from an instruction compatibility point of view. It's good from a comparison-to-competition POV. It's bad when you need to do raw performance comparisons to Ice Lake and Sapphire Rapids Xeons for single-core throughput. The one saving grace for AMD in that comparison may be that they can sustain higher clocks than Sapphire Rapids in their 64-96 core parts, enough so that having the lower per-clock throughput won't hamper them in benchmarks.
 

inf64

Diamond Member
Mar 11, 2011
3,703
4,034
136

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
The implication there is that AMD only implemented a single AVX-512 vector unit in Zen 4 instead of the dual units in Cascade Lake and Ice Lake Xeon. It's good from an instruction compatibility point of view. It's good from a comparison-to-competition POV. It's bad when you need to do raw performance comparisons to Ice Lake and Sapphire Rapids Xeons for single-core throughput.
Is the CPU-Z AVX-512 benchmark using one or two? Because Genoa is wrecking Ice Lake Xeons in that AVX-512 benchmark.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Didn't AMD say that Zen4c has the same instruction support as Zen4?
They did... except that a full two-chunk AVX-512 implementation would take too much space, and they need every mm^2 they can find. For general tasks and cloud I don't see the point of AVX-512, nor of too much cache (either on die or 3D stacked). That is how they cut the size exactly in half.
 

BorisTheBlade82

Senior member
May 1, 2020
664
1,015
106
To reduce space (to fit 16 8-core CCDs), AMD halved the L3 and removed AVX512, but left AVX256.
You are stating this as if it was a fact. I am not convinced that they axed AVX512. Firstly they stated that Bergamo would have the same feature set which means to me AVX512 support (no matter if half, full or double speed). Secondly that would put them in the same awkward position as Intel if they wanted to use Bergamo as little cores later on (which IMHO is highly probable). And thirdly the ST crypto score is much better than Milan.
I know they want to save die space - but there are other options than entirely getting rid of AVX512. So do you have further background info?
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
I know they want to save die space - but there are other options than entirely getting rid of AVX512. So do you have further background info?

Perhaps they are using a denser/shorter AVX512 like they did with the PS5's Zen 2 (compact AVX256), which saved them 56% in die area.

This is what I believe Bergamo is going to look like compared to Genoa.

Bergamo: fully ISA-compatible (AVX512 with less throughput), 1 MiB L2 and 16 MiB L3 per CCD. Bergamo lacks the TSV rails.

1660663872291.png

1660664886387.png
 

inf64

Diamond Member
Mar 11, 2011
3,703
4,034
136
Looking at the CCD sizes in the pictures above, it appears to me that the Bergamo CCD is 0.75x the size of the full Zen 4 CCD. That makes perfect sense, as 96 cores / 0.75 = 128 cores. From a cost perspective, one 128C Bergamo part should not cost more to produce than one regular Genoa 96C part.
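
As a quick sanity check of that scaling, here is a tiny Python sketch. The function name is hypothetical, and it assumes the 0.75x figure eyeballed from the pictures applies to area per core and that total compute silicon stays constant; neither is confirmed.

```python
def cores_in_same_area(base_cores, relative_core_area):
    """Cores that fit in the same silicon area if each core shrinks
    to `relative_core_area` of its original footprint (assumed figure)."""
    return base_cores / relative_core_area


# 96-core Genoa as the baseline, 0.75x relative area as estimated above
bergamo_cores = cores_in_same_area(96, 0.75)
print(f"Cores in the same area: {bergamo_cores:.0f}")
```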
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,355
1,550
136
They did... except that a full two-chunk AVX-512 implementation would take too much space, and they need every mm^2 they can find. For general tasks and cloud I don't see the point of AVX-512, nor of too much cache (either on die or 3D stacked). That is how they cut the size exactly in half.

That would still be literally lying. To investors in an investor event. This would result in enforcement action.

What is possible, or even likely, is that they are implementing weaker AVX-512 support, such as some kind of <½ throughput split implementation. To meet the public statements they have made, Bergamo must be able to execute every instruction that Genoa can. But no-one has said anything about how fast it does that.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
What is possible, or even likely, is that they are implementing weaker AVX-512 support, such as some kind of <½ throughput split implementation. To meet the public statements they have made, Bergamo must be able to execute every instruction that Genoa can. But no-one has said anything about how fast it does that.

Hence my point about them being able to reduce the die area of AVX512 by sacrificing throughput.

1660666808292.png