Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

AnandTech Forums, page 462

Vattila

Senior member
Oct 22, 2004
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) with new memory support (likely DDR5).

[Image: slide from the Hilgeman presentation]


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

Geddagod

Golden Member
Dec 28, 2021
man.....

So .....

1. Integer and FP resources are separate. Approximately zero workloads stress both at the same time; that's why SPEC, for example, has separate FP and INT benchmarks.
2. When people talk IPC they are almost always talking about integer/logic/branch operations.
3. Look at the number of EXECUTION resources the two cores have:
Zen 4:
simple integer: 4
complex mul: 1
complex div: 1
load/store: 3

Golden Cove:
simple integer: 5
complex mul: 1
complex div: 1
load/store: 5

If you look at FP it's a little more complex, because Intel has a dedicated low-latency FADD/ALU unit plus two FMA/misc units, while AMD has two FADD units (slower than the fast Intel ALU) and two FMA units. But again, Golden Cove has way more load/store resources.
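As a quick sketch, the per-category counts listed above can be compared mechanically. The numbers are simply the ones given in this post, not independently verified, and the names `ZEN4`, `GOLDEN_COVE` and `compare` are hypothetical.

```python
# Execution resources per core as listed in the post above (not independently verified).
ZEN4 = {"simple_int": 4, "complex_mul": 1, "complex_div": 1, "load_store": 3}
GOLDEN_COVE = {"simple_int": 5, "complex_mul": 1, "complex_div": 1, "load_store": 5}


def compare(a: dict, b: dict) -> dict:
    """Return b minus a for every category the two dicts share."""
    return {key: b[key] - a[key] for key in a if key in b}


if __name__ == "__main__":
    for name, delta in compare(ZEN4, GOLDEN_COVE).items():
        print(f"{name:12s} Golden Cove - Zen 4: {delta:+d}")
```

By these counts the integer-side gap is one simple ALU and two load/store units in Golden Cove's favour.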

Golden Cove is a significantly bigger core; it's bigger in almost all ways:
bigger core-local caches
bigger front end
bigger execution / more PRF ports (please actually understand what this means before ignoring it like last time!)
bigger ROB
bigger L/S

The microarch cheat sheet column is quite literally labelled "execution ports". So AMD has more execution ports, not execution resources?
OK, I will make an addendum then: AMD has more execution PORTS (not resources) than Intel which allows them competitive execution throughput, and that also lets them get much closer in performance without blowing up the front end.
Even though it's bigger in almost every way, Golden Cove still has parts that are smaller, which is why Chips and Cheese claims "Against Intel, Zen 4 has competitive vector throughput already. Intel doesn’t have a significant throughput advantage".

Overall though, my conclusion still doesn't change. RWC (GLC+), on a node competitive with AMD's Zen 4, has at worst 15% worse perf/area while also being more performant. GLC is not a hilariously bloated architecture as some here make it out to be, at least on client.
AMD targeted the backend more than Intel, which means they have to build up the front end in their architectures more. Zen 4 does this, and it is also the main focus of Zen 5, which increases core width. GLC mostly focused on the front end, which means Intel can optimize its massive front-end resources and focus on increasing execution to make use of the massive capacity it has. This could also be gleaned by comparing the part-by-part shrinkage of GLC vs RWC from Intel 7 to Intel 4.
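To make the perf/area claim concrete: perf/area of one core relative to another is just the performance ratio divided by the area ratio. A minimal sketch; the 5% performance lead below is a hypothetical input, not a measured figure, and the helper names are made up for illustration.

```python
def perf_per_area_ratio(perf_ratio: float, area_ratio: float) -> float:
    """Perf/area of core A relative to core B, given A/B performance and area ratios."""
    return perf_ratio / area_ratio


def max_area_ratio(perf_ratio: float, ppa_floor: float) -> float:
    """Largest A/B area ratio that keeps A's relative perf/area at or above ppa_floor."""
    return perf_ratio / ppa_floor


# Hypothetical input: core A is 5% faster than core B.
perf_lead = 1.05
# "At worst 15% worse perf/area" means a relative perf/area floor of 0.85.
area_limit = max_area_ratio(perf_lead, 0.85)
print(f"Core A can be up to {area_limit:.3f}x the area of core B")
```

In other words, being both more performant and at most 15% worse in perf/area bounds how much larger the core can be.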
 

Geddagod

Golden Member
Dec 28, 2021
With a big enough sample size of games, Raptor and vanilla Zen 4 seem pretty evenly matched. It's not really hard to cherry-pick examples that show either in a better light. I tend to think Spider-Man is just a poorly optimized console port that favors Intel. There are plenty of games where AMD has the upper hand, like Horizon Zero Dawn or Battlefield V.

The 30% number is completely misleading. The memory configuration and resolution used for that testing make the data nearly useless IMO. It's not necessarily an invalid way to test, but it's also very far removed from reality. Nobody would (intentionally) configure either of these platforms with JEDEC timings and officially supported memory speeds to play their video games. 720p data on flagship CPUs/GPUs in 2023 is kind of silly IMO. It's mildly interesting for the science but not particularly relevant to how any reasonable person would use the tested hardware. With a much more realistic memory config at 1080p, the 13900K is nowhere near 30% faster than a 7950X in Spidey.

[Chart: Spider-Man benchmark results]
You can't really have it both ways: either you test at low resolutions to see how CPUs perform to the fullest, or you should test at 1440p at least, because that's what the vast majority of people who buy those CPUs (13900K or 7950X) will be using for gaming: 1440p and 4K with 4090s and 4080s. And a small subsection of pro e-sports players, I suppose.
Testing at lower resolutions also helps you predict future CPU performance at lower resolutions. For example, in HWUB's initial 12900K testing with a 3090, he found the 12900K to be 3% faster. But with the only changes being DDR 6400 instead of 6000 and a 4090, the improvement became nearly 20%. That also matched the 12900K vs 5950X gains in the 3DCenter meta review, which was a mix of 1080p and 720p testing IIRC.
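The GPU-bottleneck argument can be illustrated with a deliberately simplified model in which the delivered frame rate is the minimum of a CPU-limited and a GPU-limited rate. Every number below is hypothetical and chosen only to show how a faster GPU can widen a measured CPU gap; real games do not bottleneck this cleanly.

```python
def delivered_fps(cpu_fps: float, gpu_fps: float) -> float:
    """Simplified bottleneck model: the slower component sets the frame rate."""
    return min(cpu_fps, gpu_fps)


def gap_pct(a: float, b: float) -> float:
    """How much faster a is than b, in percent."""
    return (a / b - 1.0) * 100.0


# Hypothetical CPU-limited frame rates for two CPUs (not measured data).
cpu_fast, cpu_slow = 240.0, 200.0

# Hypothetical GPU limits: a slower card and a faster card.
for gpu_limit in (206.0, 300.0):
    fast = delivered_fps(cpu_fast, gpu_limit)
    slow = delivered_fps(cpu_slow, gpu_limit)
    print(f"GPU limit {gpu_limit:.0f} fps -> measured CPU gap {gap_pct(fast, slow):.1f}%")
```

With the slower hypothetical GPU limit the two CPUs measure 3% apart; with the faster one the full 20% CPU-side gap shows through.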
 

itsmydamnation

Platinum Member
Feb 6, 2011
No, you're just flat-out wrong; the backend of GC is larger in every way that matters. That's because you have to get data in and out of the PRF. Remember the number one rule of all computing: moving data is hard, executing data is easy. The things that matter to sustained IPC in the execution engine of the core are the number of arithmetic units, the number of PRF read/write ports, PRF size and ROB size, and GC has more in all cases.

The cost of having both a bigger PRF and an additional 2 read ports and 1 write port on the PRF is very large, because you have to sustain the same latency.
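A commonly cited first-order approximation for register files is that each extra port adds a wordline and a bitline to every cell, so area grows roughly with the square of the total port count (and access latency grows with the resulting wire length). The sketch below applies that rule of thumb; the entry and port counts are hypothetical placeholders, not published Zen 4 or Golden Cove figures.

```python
def relative_rf_area(entries: int, read_ports: int, write_ports: int) -> int:
    """Rule-of-thumb register file area in arbitrary units.

    Each port adds roughly one wordline and one bitline per cell, so cell area
    scales about with ports**2. Entry width is assumed equal for both designs
    and is therefore left out.
    """
    ports = read_ports + write_ports
    return entries * ports**2


# All inputs are hypothetical placeholders, not published specifications.
baseline = relative_rf_area(entries=200, read_ports=8, write_ports=4)
# Same design with 25% more entries plus 2 extra read ports and 1 extra write port.
larger = relative_rf_area(entries=250, read_ports=10, write_ports=5)
print(f"Estimated area ratio: {larger / baseline:.2f}x")
```

Under these made-up inputs, 25% more entries plus three extra ports roughly doubles the estimated area, which is the shape of the argument above.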

The GC backend is significantly bigger because the things that actually cost power and area are larger by a non-trivial amount.

edit: fixing a dyslexic moment of every PRF being RPF...

edit2: Answer me this: why would Intel have so much more capacity in the load/store queue, bytes/cycle to L1D and L/S ports if it didn't also have more execution resources? Load/store is extremely complex, with memory disambiguation and the large number of reads and writes it can do to L1D per cycle. They just did it for fun?

And no, it's not a future-growth thing, because Intel has kept it growing at the same rate as ALU capacity over the years (from 3 -> 4 -> 5).
 

Geddagod

Golden Member
Dec 28, 2021
That's just flat-out wrong, since the number of execution ports AMD has is larger, and I'm literally quoting an established source when I state that the execution throughput of Zen 4 is roughly the same as GLC's. For having a backend that is "larger in every way that matters", Intel is STILL more bottlenecked than AMD because its front end is disproportionately larger, ALSO according to Chips and Cheese.
And you could see this in the total performance as well.
 

Geddagod

Golden Member
Dec 28, 2021
About your edits:
edit 2: Idk if you just didn't read the post you are responding to, because I said, and I'm quoting, "AMD has more execution PORTS (not resources) than Intel which allows them competitive execution throughput".
Also yes, architectures aren't completely balanced right out of the gate. What?
It quite literally is a future-growth thing. What did they keep growing at the same rate as ALU capacity?
Idk what is with the passive aggressiveness too, like c'mon dude, chill, we are talking about CPU architecture, not deciding the future of your country lmao.
 

Exist50

Platinum Member
Aug 18, 2016
I think all of this talk of execution resources, buffer sizes, etc. is rather superfluous. GLC/RPC use significantly more transistors than Zen 4 (with higher power to match), yet they don't have anything significant (IPC, a meaningful frequency gap) to show for it. I think it's thus fair to conclude that GLC is bloated. The fact that Intel seems to be more or less reusing it until '24 doesn't make it look any better today. The opposite, if anything.
 

Hans Gruber

Platinum Member
Dec 23, 2006
Can we talk about the non-X Zen 4 7900 processor that is rated at 65W? Normally those would be 105W, vs. the Zen 3 5900X. It says that it tops out at 5.4GHz. Is anybody else interested in this CPU?
 

coercitiv

Diamond Member
Jan 24, 2014

Timorous

Golden Member
Oct 27, 2008
There's no evidence for that particular change to Zen 4c.

It fits logically though.

Current density of Zen 4 is 94M xtors/mm^2. At that density the Zen 4c CCD would be around 117mm^2 (based on 11B xtors, which is 2x Zen 4 with 2B removed because the L3 cache is not doubled), which is just too big, so density must be higher, and not by a small amount either. Now, you might argue it will be fewer transistors than that because clocks are going to be lower, and it may be a bit lower, but to hit 80mm^2 at 94M per mm^2 the CCD needs to be no more than 7.5B transistors. With 32MB of cache, that is just 5.5B transistors for 16c. It just does not look doable, at least not while still being called Zen 4.
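The arithmetic above can be reproduced directly. This is a minimal sketch using only the figures quoted in the post (94M transistors/mm², an 11B-transistor Zen 4c CCD, roughly 2B transistors for 32MB of L3, an 80mm² target); the constant and function names are illustrative.

```python
DENSITY = 94e6           # transistors per mm^2 quoted for the Zen 4 CCD
ZEN4C_CCD_XTORS = 11e9   # 2x a Zen 4 CCD minus ~2B because the L3 is not doubled
L3_32MB_XTORS = 2e9      # rough cost of 32MB of L3 implied by the post
TARGET_AREA_MM2 = 80.0   # die area the post is trying to hit


def area_mm2(xtors: float, density: float) -> float:
    """Die area needed for a transistor count at a given density."""
    return xtors / density


def transistor_budget(area: float, density: float) -> float:
    """Transistors that fit in a given area at a given density."""
    return area * density


area = area_mm2(ZEN4C_CCD_XTORS, DENSITY)
budget = transistor_budget(TARGET_AREA_MM2, DENSITY)
core_budget = budget - L3_32MB_XTORS
print(f"11B xtors at 94M/mm^2: {area:.0f} mm^2")
print(f"80 mm^2 budget: {budget / 1e9:.2f}B xtors, {core_budget / 1e9:.2f}B left for 16 cores")
```

The post rounds these results to 117mm², 7.5B and 5.5B.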

On Bergamo you may only have 8 CCDs to fit, but you have twice the links because the IO die is connected to each CCX, so rather than the 12 CCXs you have in Genoa you need links for 16 CCXs, more if there are 2 IO dies (which with Siena would make perfect sense: 1 for Siena for up to 64 cores and 2 for Bergamo for up to 128 cores) so they can talk to each other and act as 1.
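The link-count point is simple multiplication. A sketch, assuming (as the post does) one CCX per Genoa CCD and two 8-core CCXs per 16-core Bergamo CCD, with a single IO die; the function name is hypothetical.

```python
def ccx_count(ccds: int, ccx_per_ccd: int) -> int:
    """Number of CCX-to-IO-die links if the IO die connects to every CCX."""
    return ccds * ccx_per_ccd


genoa_links = ccx_count(ccds=12, ccx_per_ccd=1)
bergamo_links = ccx_count(ccds=8, ccx_per_ccd=2)
bergamo_cores = bergamo_links * 8  # assumes 8 cores per CCX, as in the post
print(genoa_links, bergamo_links, bergamo_cores)
```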

What you get with APU transistor density is 11B fitting in 79mm^2 which seems far easier to fit for both Siena and Bergamo.
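Working backwards, fitting 11B transistors in 79mm² implies a density of roughly 139M transistors/mm², about 1.48x the 94M/mm² quoted for the Zen 4 CCD. A one-line check using only the post's numbers; the helper name is illustrative.

```python
def implied_density(xtors: float, area_mm2: float) -> float:
    """Transistors per mm^2 needed to fit xtors into area_mm2."""
    return xtors / area_mm2


needed = implied_density(11e9, 79.0)
zen4_density = 94e6  # figure quoted in the post
print(f"{needed / 1e6:.1f}M xtors/mm^2, {needed / zen4_density:.2f}x the Zen 4 CCD figure")
```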

It also fits because the APU CCX has been designed, tested and is in production, so re-using that block and just doubling it up gives you a huge head start in designing the Zen 4c CCD. That is pretty standard for AMD, given they like to reuse as much as possible as widely as possible.
 

nicalandia

Diamond Member
Jan 10, 2019
AnandTech's review of the AMD 7000 non-X parts is up.


[Charts from the AnandTech review]


TechPowerUp 7600 review:
AMD Ryzen 5 7600 Review - Affordable Zen 4 for the Masses

 

Asterox

Golden Member
May 15, 2012
1,039
1,823
136

Approximate worst-case scenario for the small stock Wraith Stealth CPU cooler, Cinebench R23. In gaming or mixed use, significantly lower temperatures and higher all-core boost can be expected.

[Screenshot: Cinebench R23 with the Wraith Stealth cooler]

As expected, the Wraith Prism RGB CPU cooler is very good for CPUs with lower TDP. Again Cinebench R23, showing the CPU temperatures in the classic CPU rendering test.

[Screenshots: Cinebench R23 with the Wraith Prism RGB cooler]
 

yuri69

Senior member
Jul 16, 2013
Something is seriously off at AMD right now. Did they hire too many too quickly?
Nah, it seems to be more or less in line with the previous releases.

Early Zen 1 chips were replaced due to an erratum causing segfaults.
The Zen 2 era had boosting issues, resulting in the "ABBA" version meme.
The Zen 3 era had those endless USB port issues.

RDNA1 had an endless stream of black-screen reports.
Early RDNA2 had problems with power management.
RDNA 3 is... RDNA3.
 

eek2121

Diamond Member
Aug 2, 2005
AMD 7900 vs 7900X vs 12900K vs 5950X


This has got to be the most efficient x86 CPU: at a 65W TDP it beats the not-so-old 12900K and 5950X.

The 7950X is likely more efficient when run at lower power limits. I know it is a beast when TDP is set to 65W. AMD likely does not use the best dies with these non-X chips.
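Efficiency comparisons like this one boil down to benchmark score per watt of package power, ideally compared at matched power limits. A minimal sketch; the chip labels, scores and wattages are hypothetical placeholders, not results from the charts discussed above.

```python
def points_per_watt(score: float, package_watts: float) -> float:
    """Benchmark score divided by measured package power."""
    return score / package_watts


# Hypothetical inputs (score, package watts); substitute real measurements.
results = {
    "chip_a_low_limit": (30000.0, 90.0),
    "chip_b_high_limit": (38000.0, 200.0),
}

# Rank from most to least efficient.
ranked = sorted(results, key=lambda name: points_per_watt(*results[name]), reverse=True)
for name in ranked:
    print(f"{name}: {points_per_watt(*results[name]):.1f} points/W")
```

Running the same chip at several power limits and ranking the results this way is what shows whether a higher-binned part pulls ahead once its limit is lowered.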
 

nicalandia

Diamond Member
Jan 10, 2019
I would be very surprised if AMD just used the APU design (half L3) to get the space required to put 128 cores' worth of die area on the same package as Genoa.
 

lopri

Elite Member
Jul 27, 2002
We have enough information already, but let me add one more graph. Didn't Computerbase.de publish a chart not too long ago that showed the 7700X, 7900X, and 7950X at the same performance rating at 45W? I thought someone linked that chart here. This is apparently a new chart.

[Chart: Computerbase.de]
 