Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
809
1,412
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3). The system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,166
15,309
136

Tuna-Fish

Golden Member
Mar 4, 2011
1,486
2,023
136
On paper Zen 4 *does* look weaker/simpler than Golden Cove, but its unquantifiable characteristics must be excellent, as it ends up with similar performance at similar clock rates. Not in every workload, but when averaged out they're pretty close.

Cache latencies are notably lower. That's probably all it is.

[Chart: Zen 4 cache latency in cycles]


Latency in cycles is lower for all levels of cache. That 1 cycle of L1 latency probably has a deceptively high impact. L1 has >95% hit rate on most well behaved loads, one extra cycle on every access adds up.
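To see why a single L1 cycle matters, here is a minimal average-memory-access-time (AMAT) sketch. Everything numeric in it is an assumption, not a measurement: latencies are treated as total load-to-use cycles per level, L1 is 4 or 5 cycles, L2 is 14 cycles, DRAM is 200 cycles, and the hit rates are 95% (L1) and 80% (L2). Only the L1 latency differs between the two cases.

```python
def amat(hit_rates, latencies):
    """Average memory access time (cycles) for a multi-level hierarchy.

    hit_rates[i] is the local hit rate of level i (the last level, e.g. DRAM,
    should be 1.0); latencies[i] is the total load-to-use latency in cycles
    when an access is served by level i.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for hit, lat in zip(hit_rates, latencies):
        total += reach * hit * lat
        reach *= 1.0 - hit
    return total


# Hypothetical numbers, not measured: 95% L1 hits, 80% L2 hits, rest go to DRAM.
hit_rates = [0.95, 0.80, 1.0]
fast = amat(hit_rates, [4, 14, 200])
slow = amat(hit_rates, [5, 14, 200])  # identical except one extra L1 cycle
print(f"{fast:.2f} vs {slow:.2f} cycles, delta {slow - fast:.2f}")
```

Because 95% of accesses stop at L1 in this model, the extra cycle raises the average by about 0.95 cycles, roughly 15% on these made-up numbers.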

Zen 4's reorder buffer performs surprisingly well compared to GLC's, despite the size difference
When your average latencies are lower, you don't need as big of a ROB to keep the units occupied.
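The ROB argument is essentially Little's law: instructions in flight equal throughput times how long each one stays in the window. The sketch below is only a back-of-the-envelope bound under that assumption; real window sizing also depends on the scheduler, load/store queues, register files, branch mispredicts and memory-level parallelism. The IPC of 4 and the stall lengths are illustrative numbers, not figures for either core.

```python
import math


def window_needed(ipc, latency_cycles):
    """Little's law: in-flight instructions = throughput (IPC) x wait time.

    A rough lower bound on the reorder-buffer entries needed to keep retiring
    at `ipc` while the oldest instruction waits `latency_cycles` for its data.
    """
    return math.ceil(ipc * latency_cycles)


# Hypothetical: sustaining 4 IPC across stalls of different lengths.
for latency in (14, 16, 50):
    print(latency, window_needed(4, latency))
```

Cutting the stall from 16 to 14 cycles drops the bound from 64 to 56 entries at the same IPC, and the saving scales with the stall length.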

Separate integer and FP resources. Approximately zero workloads stress both at the same time; you know how SPEC has separate FP and INT benchmarks.
That's not really right. Even heavy vector loads typically have a bunch of scalar integer "bookkeeping" (loop counters, list indexes, etc.) that they need to update.

Granted, with the 2 dedicated scalar ports, GLC now probably has enough for most cases.
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
They will probably reveal pricing once the KS part drops.

Target audience, you mean? Yeah, I suppose.
7000 non-X, 7600X/7700X budget builds, upcoming APUs, etc.

But I doubt you can't run a 7950X on them if you really wanted to.

Indeed, these boards will very likely be able to run all chips. Expect I/O to be gimped, however: likely no PCIe 5, and limited PCIe 4, if any.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,931
3,556
136
That's not really right. Even heavy vector loads typically have a bunch of scalar integer "bookkeeping" (loop counters, list indexes, etc.) that they need to update.

Granted, with the 2 dedicated scalar ports, GLC now probably has enough for most cases.
That's why I used the word "stressed". It's not that there isn't a bunch of flow control, loop counters, etc. going on. If this were really such a big advantage, AMD would have much higher dispatch than they do :). Does anyone know if Zen 3/4 has higher retirement per cycle than dispatch, like Zen 2 did?
 

Abwx

Lifer
Apr 2, 2011
11,557
4,349
136

JustViewing

Senior member
Aug 17, 2022
217
383
106
1. Separate integer and FP resources. Approximately zero workloads stress both at the same time; you know how SPEC has separate FP and INT benchmarks.

Not necessarily; not all FP workloads are streaming AVX. Most of the time they will be a mix of both. Having said that, compilers may optimize for Intel's combined Int/FP architecture by keeping FP and integer code sections separate to help the scheduler. Unlike in the past, modern schedulers are very flexible, with their deep out-of-order buffers.
 

Mopetar

Diamond Member
Jan 31, 2011
8,113
6,768
136
Now that we're starting to see cheaper Zen 4 CPUs, I think we're going to see less expensive motherboards. It's considerably easier to talk the person who just bought a 7950X into a premium board, but the guy who's in the market for a 7600 or a 7700 isn't looking to spend nearly as much.

It's the same every time a new platform launches. Companies lead with high-end products to get higher margins out of enthusiasts that are willing to spend more than the average consumer. Over time the mainstream and low-end options trickle out, but there's not a lot of financial incentive to lead with a low-cost, low-margin product.
 

biostud

Lifer
Feb 27, 2003
18,700
5,431
136
When there was a difference in functionality between X and B boards, I could understand why you would choose an X board. But the X670E and B650E are based on the same chip (x2 for the X670E); it is just the number of PCIe lanes, SATA and USB connectors that differs. How many users really need that?
 

Det0x

Golden Member
Sep 11, 2014
1,299
4,234
136

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Latency in cycles is lower for all levels of cache. That 1 cycle of L1 latency probably has a deceptively high impact. L1 has >95% hit rate on most well behaved loads, one extra cycle on every access adds up.

Intel has advantages with size, tho: 50% more L1 for just 1 more cycle of latency is a good tradeoff. And 100% more L2 at 16 or 17 cycles vs 14 is a damn good tradeoff as well.
Combined, that creates a situation where, when a workload does not allow AMD to make use of their excellent L3, they get beaten badly by Intel's wider machinery:

5 GHz vs 5 GHz:
[Benchmark screenshots]
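The size-versus-latency tradeoff above can be illustrated with a toy average-memory-access-time model. This is a minimal sketch under loudly hypothetical assumptions: the L1 latencies of 4 vs 5 cycles are assumed (the thread only gives the one-cycle difference), 14 vs 16 cycles for L2 follow the post, the bigger caches are modelled only as somewhat higher hit rates that are invented for illustration, and a single "beyond L2" cost lumps L3 and DRAM together. None of it is measured data for Zen 4 or Golden Cove.

```python
def amat(hit_rates, latencies):
    """Average memory access time (cycles); the last level must have hit rate 1.0."""
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for hit, lat in zip(hit_rates, latencies):
        total += reach * hit * lat
        reach *= 1.0 - hit
    return total


# Invented hit rates: [L1, L2, everything beyond L2].
small_fast_hits = [0.95, 0.80, 1.0]  # 4-cycle L1, 14-cycle L2 (assumed)
big_slow_hits = [0.96, 0.88, 1.0]    # 5-cycle L1, 16-cycle L2, more capacity

# Average cost of an L2 miss: low if L3 catches most misses, high if it doesn't.
for beyond_l2 in (50, 200):
    a = amat(small_fast_hits, [4, 14, beyond_l2])
    b = amat(big_slow_hits, [5, 16, beyond_l2])
    print(f"beyond L2 = {beyond_l2:3d} cycles: small/fast {a:.2f}, big/slow {b:.2f}")
```

With these made-up numbers the smaller, faster hierarchy wins (about 4.86 vs 5.60 cycles) when L2 misses are cheap, but the bigger, slower one edges ahead (about 6.32 vs 6.36) once misses past L2 get expensive. That mirrors the shape of the argument in the post without being evidence for it.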
 