Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Vattila · Oct 6, 2019

Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!

poke01 · Aug 15, 2022

Hans Gruber said:
Of course Mac users cannot just drop in a standard Radeon card. They must pay the Apple ferryman 2x-3x the cost of a standard card for their firmware extortion fee.

The Intel Mac Pro has PCIe slots and supports RDNA 2 cards from the RX 6800 to the RX 6900 XT. You can use off-the-shelf cards from third parties too.
Same with any Intel MacBook: eGPUs support third-party AMD GPUs.

Sucks, because with M1 or M2 that option is going away and it will only be Apple GPUs, and that's when the real extortion fee of 2-4x begins.

tamz_msc · Aug 16, 2022

nicalandia said:
More info on AVX512 Genoa

View attachment 65955

Genoa and Zen4 are going to murk Xeons in AVX512, because Intel will lock it unless you pay to unlock it via SDSi.

Genoa is such a Monster. It has so many Cores that it just breaks Cinebench and CPUZ.

Eh, it's the same Score/GHz as Tiger Lake. For reference, my 11370H scores 785 points at ~4.2 GHz in ST.

deasd · Aug 16, 2022

How reliable is this site, 'Nanoreview'? They say they got all the Zen4 samples, ran them, and collected pre-release R23 results.

AMD Ryzen 9 7950X: benchmarks and specs | NR

Performance tests in benchmarks and full specifications of AMD Ryzen 9 7950X with 16 cores, 4500 MHz.

nanoreview.net

AMD Ryzen 9 7900X: benchmarks and specs | NR

Performance tests in benchmarks and full specifications of AMD Ryzen 9 7900X with 12 cores, 4700 MHz.

nanoreview.net

AMD Ryzen 7 7700X: benchmarks and specs | NR

Performance tests in benchmarks and full specifications of AMD Ryzen 7 7700X with 8 cores, 4500 MHz.

nanoreview.net

AMD Ryzen 5 7600X: benchmarks and specs | NR

Performance tests in benchmarks and full specifications of AMD Ryzen 5 7600X with 6 cores, 4700 MHz.

nanoreview.net

7950X: ST ~2000, MT ~38000

Edit: very likely to be just speculation...

and now it seems that there's a new Genoa GB score which is more reliable

Suma R6250A0 - Geekbench

Benchmark results for a Suma R6250A0 with an AMD Eng Sample: 100-000000997-01 processor.

browser.geekbench.com

https://twitter.com/x/status/1559399820493742080

Single-Core Score 1460
Crypto Score 4489
Integer Score 1212
Floating Point Score 1494

poke01 · Aug 16, 2022

MadRat said:
The i9 and 3080 combination using Quicksync is truly impressive for workstation customers. But it is not simple and too much work for your average consumers. It is safe to say they do not compete for the same market.

The problem isn't a few percent, it's a steady erosion of the same base customer. Ignore it if you will, but this is a real threat to any manufacturer in the PC market.

MadRat is right. Apple has a serious advantage in the video editing space. The M2 Air, even though fanless, is pretty good thanks to its excellent encoders and decoders. AMD needs to address this drawback with Zen 4 mobile chips.

biostud · Aug 16, 2022

poke01 said:
MadRat is right. Apple has a serious advantage in the video editing space. The M2 Air, even though fanless, is pretty good thanks to its excellent encoders and decoders. AMD needs to address this drawback with Zen 4 mobile chips.
View attachment 65972

Isn't that basically because Apple has hardware that works like Intel's Quick Sync, Nvidia's NVENC and AMD's AMF encoder, and they compare it with pure CPU encoding? Currently AMD and Nvidia have their encoders placed on the GPU, but it would be interesting if AMD decided (I don't know if it already has) to support it on their APUs.

The real comparison would be to test the M2 against a PC using either of the dedicated encoders and post those results.

inf64 · Aug 16, 2022

deasd said:
and now it seems that there's a new Genoa GB score which is more reliable

Suma R6250A0 - Geekbench

Benchmark results for a Suma R6250A0 with an AMD Eng Sample: 100-000000997-01 processor.

browser.geekbench.com

https://twitter.com/x/status/1559399820493742080

16MB L3 x 4? Is this Genoa running a cut down L3 cache per CCD(X)?

Anyhow, versus Zen 3 at the same clock and excluding the aggregate ST score and crypto score, we have these uplifts (as Zen 4 / Zen 3 ratios) for int and fp in the GB5 ST benchmark:

Integer Score: 1.104
Floating Point Score: 1.085

Matches pretty well with what AMD has said so far. Floating point is a bit lower (8.5%), integer is a bit higher (10.4%). The overall score is a lot higher because of the crypto subtest, which scores 50% better per clock than Zen 3 (that's why I excluded it).

So if Zen 4 clocks to 5.7-5.9 GHz it should be between 26 and 28% faster than the 5950X. Below is the current ST benchmark table from Computerbase (without Raptor Lake):

The top Raptor Lake part is rumored to clock to ~5.8 GHz at stock on a few threads, so that should put it at 130% (I assumed a ~2% IPC uplift due to the bigger L2; 121 x 58/55 x 1.02 = 130).

If this checks out, the difference in average ST performance between the 7950X and 13900K should be within margin of error (~1-3%). This is pretty much expected, as Golden Cove has around 10%-11% higher IPC vs Zen 3, and Zen 4 has that much better IPC than Zen 3 (so they are pretty much even on average). Zen 4 and Raptor Cove clock to about the same max boost clocks, so ST performance should be very similar (not gaming performance).
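The Raptor Lake estimate above is a plain scale-by-clock-and-IPC calculation, and it can be redone in a few lines. This is only a sketch: the function name is made up for illustration, the 121 baseline and the 5.5 GHz reference clock are taken from the "121 x 58/55 x 1.02" expression in the post, and it assumes performance scales perfectly linearly with clock, which is optimistic.

```python
def project_relative_score(base_score_pct, base_clock_ghz, new_clock_ghz, ipc_factor=1.0):
    """Scale a relative ST score by a clock ratio and an assumed IPC factor.

    Assumes perfectly linear scaling with frequency (an optimistic simplification).
    """
    return base_score_pct * (new_clock_ghz / base_clock_ghz) * ipc_factor


# Figures from the post: 121% baseline at 5.5 GHz, rumored 5.8 GHz boost,
# and an assumed ~2% IPC gain from the bigger L2.
raptor_lake = project_relative_score(121, 5.5, 5.8, ipc_factor=1.02)
print(f"projected relative ST score: {raptor_lake:.1f}%")
```

Changing `ipc_factor` or the boost clock shows how sensitive the "within margin of error" conclusion is to either rumor being off by a little.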

tamz_msc · Aug 16, 2022

deasd said:
and now it seems that there's a new Genoa GB score which is more reliable

Suma R6250A0 - Geekbench

Benchmark results for a Suma R6250A0 with an AMD Eng Sample: 100-000000997-01 processor.

browser.geekbench.com

https://twitter.com/x/status/1559399820493742080

Took a while to find a score with the same kernel version and GB version, but here is Milan for comparison, running at 3.8 GHz:

Suma R6240A0 - Geekbench

Benchmark results for a Suma R6240A0 with an AMD EPYC 7763 processor.

browser.geekbench.com

Genoa: 1212 pts integer @ 3.5 GHz = 346 pts/GHz
Milan: 1234 pts integer @ 3.8 GHz = 325 pts/GHz

Perf/clock improvement = 346/325 = 1.0646 or 6.5%.

Genoa: 1494 pts fp @ 3.5 GHz = 427 pts/GHz
Milan: 1583 pts fp @ 3.8 GHz = 417 pts/GHz

Perf/clock improvement = 427/417 = 1.0239 or 2.4%

That's lower than AMD's 8-10% projections.
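For anyone who wants to redo the per-clock comparison without rounding the intermediate pts/GHz values, here is a minimal sketch. The helper names are invented for illustration; the scores and the 3.5 GHz and 3.8 GHz clocks are the ones quoted in the post, and the clocks themselves are only as reliable as what Geekbench reported.

```python
def points_per_ghz(score, clock_ghz):
    """Geekbench subscore normalised by clock speed."""
    return score / clock_ghz


def per_clock_uplift(new_score, new_clock_ghz, old_score, old_clock_ghz):
    """Fractional perf/clock gain of the new part over the old one."""
    new = points_per_ghz(new_score, new_clock_ghz)
    old = points_per_ghz(old_score, old_clock_ghz)
    return new / old - 1.0


# Genoa ES (3.5 GHz) vs Milan EPYC 7763 (3.8 GHz), figures from the post.
int_uplift = per_clock_uplift(1212, 3.5, 1234, 3.8)
fp_uplift = per_clock_uplift(1494, 3.5, 1583, 3.8)

print(f"integer perf/clock uplift: {int_uplift:.1%}")
print(f"fp perf/clock uplift:      {fp_uplift:.1%}")
```

Without the intermediate rounding the results land about a tenth of a percentage point above the 6.5% and 2.4% figures, which does not change the conclusion.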

biostud · Aug 16, 2022

inf64 said:
16MB L3 x 4? Is this Genoa running a cut down L3 cache per CCD(X)?

Anyhow, versus Zen 3 at the same clock and excluding the aggregate ST score and crypto score, we have these uplifts (as Zen 4 / Zen 3 ratios) for int and fp in the GB5 ST benchmark:

Integer Score: 1.104
Floating Point Score: 1.085

Matches pretty well with what AMD has said so far. Floating point is a bit lower (8.5%), integer is a bit higher (10.4%). The overall score is a lot higher because of the crypto subtest, which scores 50% better per clock than Zen 3 (that's why I excluded it).

So if Zen 4 clocks to 5.7-5.9 GHz it should be between 26 and 28% faster than the 5950X. Below is the current ST benchmark table from Computerbase (without Raptor Lake):

View attachment 65973

The top Raptor Lake part is rumored to clock to ~5.8 GHz at stock on a few threads, so that should put it at 130% (I assumed a ~2% IPC uplift due to the bigger L2; 121 x 58/55 x 1.02 = 130).

If this checks out, the difference in average ST performance between the 7950X and 13900K should be within margin of error (~1-3%). This is pretty much expected, as Golden Cove has around 10%-11% higher IPC vs Zen 3, and Zen 4 has that much better IPC than Zen 3 (so they are pretty much even on average). Zen 4 and Raptor Cove clock to about the same max boost clocks, so ST performance should be very similar (not gaming performance).

Could it be Zen4c cores with less L3 cache?

inf64 · Aug 16, 2022

biostud said:
Could it be Zen4c cores with less L3 cache?

It's possible that this is some sort of lower-core-count Bergamo ES based on Zen4c. The ST performance vs. the full-fat L3 Genoa is pretty much the same, in GB5 at least.

nicalandia · Aug 16, 2022

tamz_msc said:
Eh, it's the same Score/GHz as Tiger Lake. For reference, my 11370H scores 785 points at ~4.2 GHz in ST.

Except a single Genoa CPU does over 15,000 points in MT. It smokes Ice Lake Xeons, and SPR has AVX512 as a paid-for feature via SDSi.

deasd · Aug 16, 2022

I guess some paranoid guys will have started measuring how many times larger this '>' is than the Computex one...

https://twitter.com/x/status/1559457964230152193

LightningZ71 · Aug 16, 2022

tamz_msc said:
Eh, it's the same Score/GHz as Tiger Lake. For reference, my 11370H scores 785 points at ~4.2 GHz in ST.

The implication there is that AMD only implemented a single AVX-512 vector unit in Zen4 instead of the dual units in Cascade Lake and Ice Lake Xeon. It's good from an instruction compatibility point of view. It's good from a comparison-to-competition POV. It's bad when you need to do raw performance comparisons to Ice Lake and Sapphire Rapids Xeons for single-core throughput. The one saving grace for AMD in that comparison may be that they can sustain higher clocks than Sapphire Rapids in their 64-96 core parts, enough so that having the lower per-clock throughput won't hamper them in benchmarks.

nicalandia · Aug 16, 2022

deasd said:
seems that there's a new Genoa GB score which is more reliable

Suma R6250A0 - Geekbench

Benchmark results for a Suma R6250A0 with an AMD Eng Sample: 100-000000997-01 processor.

browser.geekbench.com

https://twitter.com/x/status/1559399820493742080

It's not Genoa. It's Bergamo (1/2 cache and no AVX512).

inf64 · Aug 16, 2022

deasd said:
I guess some paranoid guys will have started measuring how many times larger this '>' is than the Computex one...

https://twitter.com/x/status/1559457964230152193

There is no need, as we have all the info from AMD's presentation: ~8-10% IPC and >5.5 GHz ST clock. If they hit 5.6-5.8 GHz, it's at minimum ~21% and at most around 27%.
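The 21-27% range can be roughly reproduced by multiplying the IPC gain by the clock ratio. A loud caveat: the post does not say which Zen 3 clock it is measured against, so the 5.0 GHz baseline below is purely an assumption, picked because it lands close to the quoted range. The function and constant names are also just for illustration.

```python
# ASSUMPTION: the post gives no Zen 3 baseline clock; ~5.0 GHz is a guess
# chosen because it roughly reproduces the quoted 21-27% range.
ASSUMED_ZEN3_ST_CLOCK_GHZ = 5.0


def st_uplift(ipc_gain, new_clock_ghz, base_clock_ghz=ASSUMED_ZEN3_ST_CLOCK_GHZ):
    """Fractional ST gain from an IPC gain combined with a clock increase."""
    return (1.0 + ipc_gain) * (new_clock_ghz / base_clock_ghz) - 1.0


# Pessimistic corner: 8% IPC at 5.6 GHz. Optimistic corner: 10% IPC at 5.8 GHz.
low = st_uplift(0.08, 5.6)
high = st_uplift(0.10, 5.8)
print(f"projected ST uplift: {low:.1%} to {high:.1%}")
```

With that baseline the corners come out at about 21% and a bit under 28%; a different baseline clock shifts the whole range, so treat the output as a sanity check rather than a prediction.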

nicalandia · Aug 16, 2022

LightningZ71 said:
The implication there is that AMD only implemented a single AVX-512 vector unit in Zen4 instead of the dual units in Cascade Lake and Ice Lake Xeon. It's good from an instruction compatibility point of view. It's good from a comparison-to-competition POV. It's bad when you need to do raw performance comparisons to Ice Lake and Sapphire Rapids Xeons for single-core throughput.

Is the CPU-Z AVX512 benchmark using one or two? Because Genoa is wrecking Ice Lake Xeons on that AVX512 benchmark.

nicalandia · Aug 16, 2022

Genoa 866

vs

Bergamo 897

To reduce space (to fit 16 pieces of 8-core CCDs), AMD halved the L3 and removed AVX512, but left AVX256.

Tarkin77 · Aug 16, 2022

AMD to Host Livestream Event to Unveil Next Generation Ryzen Processors

SANTA CLARA, Calif., Aug. 16, 2022 (GLOBE NEWSWIRE) -- Today, AMD (NASDAQ: AMD) announced “together we advance_PCs,” a livestream premiere…...

ir.amd.com

AMD to Host Livestream Event to Unveil Next Generation Ryzen Processors

The show will premiere at 7 p.m. ET on Monday, August 29, on the AMD YouTube channel.

Hitman928 · Aug 16, 2022

nicalandia said:
To reduce space (to fit 16 pieces of 8-core CCDs), AMD halved the L3 and removed AVX512, but left AVX256.

Didn't AMD say that Zen4c has the same instruction support as Zen4?

nicalandia · Aug 16, 2022

Hitman928 said:
Didn't AMD say that Zen4c has the same instruction support as Zen4?

They did... except that a full two-chunk AVX-512 implementation would take too much space, and they need every mm^2 they can find. For general tasks and cloud I don't see the point of AVX-512, nor of too much cache (either on-die or 3D-stacked). That is how they cut the size exactly in half.

LightningZ71 · Aug 16, 2022

nicalandia said:
Is the CPU-Z AVX512 benchmark using one or two? Because Genoa is wrecking Ice Lake Xeons on that AVX512 benchmark.

Hmm, I didn't think of a flawed bench doing that. How does Ice Lake Server compare to Ice Lake Mobile in that benchmark? Ice Lake mobile certainly only has one AVX-512 unit as compared to server with 2.

BorisTheBlade82 · Aug 16, 2022

nicalandia said:
To reduce space (to fit 16 pieces of 8-core CCDs), AMD halved the L3 and removed AVX512, but left AVX256.

You are stating this as if it was a fact. I am not convinced that they axed AVX512. Firstly they stated that Bergamo would have the same feature set which means to me AVX512 support (no matter if half, full or double speed). Secondly that would put them in the same awkward position as Intel if they wanted to use Bergamo as little cores later on (which IMHO is highly probable). And thirdly the ST crypto score is much better than Milan.
I know they want to save die space - but there are other options than entirely getting rid of AVX512. So do you have further background info?

nicalandia · Aug 16, 2022

BorisTheBlade82 said:
I know they want to save die space - but there are other options than entirely getting rid of AVX512. So do you have further background info?

Perhaps they are using a denser/shorter AVX512 implementation, like they did with the PS5's Zen 2 (compact AVX256), which saved them 56% in die area.

This is what I believe Bergamo is going to look like compared to Genoa.

Bergamo: full ISA compatibility (AVX512 with less throughput), 1 MiB L2 and 16 MiB per CCD. Bergamo lacks the TSV rails.

inf64 · Aug 16, 2022

Looking at the above CCD sizes in the pictures, it appears to me that the Bergamo CCD is 0.75x the size of the full Zen 4 CCD. That makes perfect sense, as 96 cores x 1/0.75 = 128 cores. From a cost perspective, one 128C Bergamo part should not cost more to produce than one regular Genoa 96C part.
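The core-count argument is a one-line area calculation. A tiny sketch, reading the 0.75x figure as "the same number of cores takes 0.75x the silicon" (that reading is an assumption, as is the function name), and holding total CCD silicon constant:

```python
def cores_at_equal_silicon(base_cores, relative_area_per_core):
    """Cores that fit in the same total die area if each core's share of area shrinks."""
    return base_cores / relative_area_per_core


# 96-core Genoa as the reference, 0.75x area per core eyeballed from the pictures.
bergamo_cores = cores_at_equal_silicon(96, 0.75)
print(f"cores at equal silicon: {bergamo_cores:.0f}")
```

This ignores yield, packaging and IO die cost, so it only supports the "should not cost more in compute silicon" part of the claim.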

Tuna-Fish · Aug 16, 2022

nicalandia said:
They did... except that a full two-chunk AVX-512 implementation would take too much space, and they need every mm^2 they can find. For general tasks and cloud I don't see the point of AVX-512, nor of too much cache (either on-die or 3D-stacked). That is how they cut the size exactly in half.

That would still be literally lying. To investors in an investor event. This would result in enforcement action.

What is possible, or even likely, is that they are implementing weaker AVX-512 support, such as some kind of <½ throughput split implementation. To meet the public statements they have made, Bergamo must be able to execute every instruction that Genoa can. But no one has said anything about how fast it does that.

nicalandia · Aug 16, 2022

Tuna-Fish said:
What is possible, or even likely, is that they are implementing weaker AVX-512 support, such as some kind of <½ throughput split implementation. To meet the public statements they have made, Bergamo must be able to execute every instruction that Genoa can. But no one has said anything about how fast it does that.

Hence my point about them being able to reduce the footprint size of AVX512 on the die area by sacrificing throughput
