Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

jamescox

I think I had good laughs at a certain mister in this forum, who used to shout about the 6 ALUs that AMD/Intel need to add to match Apple, who is already 6-wide (at a much lower clock, with a much tighter memory subsystem and a different ABI).
So let's not pretend to be smarter than actual chip designers.

But I think some facts are true for all chips. Even if average IPC is 1, there are sections of code where suddenly a lot of ops become ready (maybe some blocking dependency arrived from DRAM), and then a wider machine with 5 ALUs can chew through those ready ops faster. Maybe the end difference in average IPC will be 0.97 versus 0.98, but the wider chip will still come out on top.
So armed with this information, we find that in web browser benchmarks, especially Speedometer 2.0, Apple and Alder Lake are especially strong. And I think most of that power comes from having massive OoO cores backed by 5-6 ALUs and other resources. How else would you explain a score of 250 vs 325 for 5 GHz Zen 3 vs ADL?
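The burst argument above can be illustrated with a toy model. This is only a minimal sketch: the function name `cycles_to_drain` and the arrival pattern are made up for illustration, and the model ignores dependencies, latencies, and everything else that makes real cores hard to predict. It just counts how many cycles a machine of a given issue width needs to work through ops that become ready in a burst.

```python
def cycles_to_drain(arrivals, width):
    """Count cycles needed to issue all ops.

    arrivals[t] is the number of ops that become ready in cycle t
    (hypothetical numbers); width is the max ops issued per cycle.
    """
    backlog = 0
    cycles = 0
    t = 0
    while t < len(arrivals) or backlog > 0:
        if t < len(arrivals):
            backlog += arrivals[t]
        backlog -= min(backlog, width)  # issue as many ready ops as the width allows
        cycles += 1
        t += 1
    return cycles


# Mostly idle, then 12 ops become ready at once (e.g. after a DRAM miss returns).
pattern = [1, 0, 1, 0, 12]
for width in (4, 6):
    cycles = cycles_to_drain(pattern, width)
    print(f"{width}-wide: {cycles} cycles, IPC {sum(pattern) / cycles:.2f}")
```

In this made-up pattern the wider machine only gains a single cycle, which fits the point that the average difference can be small even though the wider design is never slower.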



I did a test on Zen 3 as well, and with tuned memory @ 4.4 GHz it scores 7950 MIPS (vs 6800 in the article @ 4.9 GHz, I believe), so it is another confirmation that the 7zr compression algorithm scales too well with memory to be a proper measurement of ALU performance. Who knows where Zen 3 or ADL peak?
I don’t really have time to study the specifics here. Performance of modern processors is really difficult to predict with multiple levels of caches, prefetch, TLB effects, and everything else. Those things affect applications differently. One application may work well with the prefetcher while another might not. More bandwidth will help the prefetcher, but it might evict useful data in some cases. I know that the large number of ALUs is very unlikely to be fully utilized. You obviously might get some better utilization with SMT, but that doesn’t help single-thread performance. The complexity of the scheduler goes up significantly with added units, so it can limit clock speed. That doesn’t affect Apple as much since their target clock isn’t 5 GHz.

Due to the complexity of modern systems, it is difficult to determine the cause for sure, but the out-of-order window on all of these processors is quite large, so do you really expect that to be limiting Zen 3 performance? At 4 or 5 GHz, I would expect even a single ALU would be waiting on memory accesses quite a bit. Also, doesn’t Zen 3 have four ALUs? How often do the extra 1 or 2 ALUs come into play? It just doesn’t seem like it is going to make a difference very often.

You have some subset of applications that are likely very cacheable and two processors with larger and/or faster L2 that do a lot better. It seems likely that it is the caches rather than the “core”, although you can’t really separate them. Cache design is probably the most complicated component of high performance CPUs these days. Intel was winning for a long time since they were very good at it. I would say that AMD is kind of brute forcing it a bit by going for larger cache sizes and using the advantages of their MCM architecture to compete, or dominate in some cases. Intel’s Sapphire Rapids appears to still be made up of rather large, expensive, monolithic dies. I haven’t read too much on it, but it looks like 4 x 400 mm² or so. That is ~1600 mm² of silicon on an advanced process, while Milan is about 300 mm² for a common 4-die part and 600 mm² for the more expensive 8-die part. The IO die is cheap GF silicon. That is a big difference. It would be a crippling difference for Intel if they didn’t own their own fabs. It will still limit Intel’s capacity and pricing for these parts. AMD can make a lot of Epyc chips per 5nm wafer. The compute die count will be 4, 8, or 12 with Genoa vs. just 4 or 8 with Milan, but the parts that use more than 8 will likely be very expensive and less common.
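For anyone who wants to play with the silicon numbers above, here is a minimal sketch of the arithmetic. The 400 mm² per Sapphire Rapids tile is the rough figure from the post; the 75 mm² per Milan compute die is an assumption obtained by dividing the quoted "about 300 mm² for 4 dies", not a measured value, and the IO die is left out on purpose.

```python
def total_area_mm2(die_count, die_area_mm2):
    """Total leading-edge silicon for a package, ignoring the IO die."""
    return die_count * die_area_mm2


SPR_TILE_MM2 = 400   # rough figure from the post ("4 x 400 mm² or so")
MILAN_CCD_MM2 = 75   # assumption: 300 mm² / 4 dies, as quoted in the post

spr = total_area_mm2(4, SPR_TILE_MM2)
milan_4 = total_area_mm2(4, MILAN_CCD_MM2)
milan_8 = total_area_mm2(8, MILAN_CCD_MM2)

print(f"Sapphire Rapids: {spr} mm²")
print(f"Milan, 4 CCD:    {milan_4} mm² ({spr / milan_4:.2f}x less silicon)")
print(f"Milan, 8 CCD:    {milan_8} mm² ({spr / milan_8:.2f}x less silicon)")
```

Swapping in better die-size estimates only changes the two constants.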
 

A///

Vidyacardz ran an article this morning quoting life's a disease boy about TR7000. The interesting "leak" was that it has 96 cores, but nothing apart from that. There were some comments on the article and other places I checked discussing AVX512 possibly not being present. Let's for a moment assume disease boy wasn't lying up and down a great wall: would AMD really release AVX512 on Ryzen but not TR, but also have it again on EPYC? Although I can't see TR being a DIY product; it looks more like a pro product sold through AMD's vendors, unless Intel drops something in the HEDT space to get AMD's attention.
 

DrMrLordX

would AMD really release AVX512 on Ryzen but not TR, but also have it again on EPYC?

Likely not. You never know, but creating that kind of product segmentation wouldn't make a lot of sense. AMD has been all too happy for there to be overlap between Threadripper, Threadripper Pro, and EPYC in the past.
 

coercitiv

There were some comments on the article and other places I checked discussing AVX512 possibly not being present.
The AVX512 comments on that article make no sense, and the least we can do to keep sanity around here is discuss leaks/rumors from sources with a minimum of proven track record.

Imagine if I told you that AMD were considering disabling SMT on Threadripper in order to keep AVX512 enabled.
 

deasd

Well, since this is the Zen 4 thread... IIRC there's no indication that Zen 5 would be a big+little design, though. Didn't an AMD executive clarify that they haven't considered it yet?
 

itsmydamnation

Not really. Also, AMD has a bunch of patents describing a core at the CCX that handles low-power work; if the real core needs to wake, it does so, basically transparently to the OS. Having local pools of L3 like this works in AMD's favor, because in a way it is just a more advanced version of sleep states: to migrate, the small core just needs to flush to the local L3 and the big core wakes up.
 

jamescox

The AVX512 comments on that article make no sense, and the least we can do to keep sanity around here is discuss leaks/rumors from sources with a minimum of proven track record.

Imagine if I told you that AMD were considering disabling SMT on Threadripper in order to keep AVX512 enabled.
The only way I could see the AVX512 support being different is if the APUs do not support it but the chiplets do, or they don’t support full width on the APUs. A lot of lower end parts may be APUs. AVX512 takes a lot of extra power, so not having it on a mobile derived part makes some sense. I always thought that they should have left the width at 256-bit but added new operations. The 512-bit width seems like it was originally more of a kludge to make a CPU behave more like a GPU. Intel kind of gave up on that and actually made a GPU. If you leave the architectural width at 256-bit then you can have lower end parts with 1 unit and higher end parts with a lot more. They could also split it across two clocks, though. That might be a reasonable solution in a mobile part. I have also wondered if it would be plausible to just execute AVX512 code segments on an integrated GPU.
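The "split it across two clocks" idea can be shown in a few lines. This is a purely illustrative sketch, not a claim about how any real core implements it: a 512-bit add over sixteen 32-bit lanes is executed as two passes through a 256-bit (eight-lane) unit. The function names `add_256` and `add_512_double_pumped` are made up for this example.

```python
LANES_512 = 16        # sixteen 32-bit lanes in a 512-bit register
LANES_256 = 8         # eight 32-bit lanes in a 256-bit execution unit
MASK32 = 0xFFFFFFFF   # wrap each lane like a 32-bit integer add


def add_256(a, b):
    """One pass through a hypothetical 256-bit integer add unit."""
    assert len(a) == len(b) == LANES_256
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def add_512_double_pumped(a, b):
    """A 512-bit add done as two 256-bit passes (two 'clocks')."""
    assert len(a) == len(b) == LANES_512
    low = add_256(a[:LANES_256], b[:LANES_256])    # pass 1: lower half
    high = add_256(a[LANES_256:], b[LANES_256:])   # pass 2: upper half
    return low + high


print(add_512_double_pumped(list(range(16)), [1] * 16))
```

The software sees the same 512-bit result either way; only the throughput per clock differs, which is why this looks attractive for a power-limited part.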
 

soresu

This. I wonder if AVX512 is going to be done on Zen 4's iGPU since every Zen 4 SKU is supposed to have it.
I have speculated on it in the past based on the original AMD Fusion PR, however I have my doubts about it these days.

Integrating the CPU so tightly with the GPU, even if it were feasible to manage x64/AVX <-> GCN/RDNA ISA translation efficiently, would make it harder to shrink the design for the 2x core count Zen 4c.

If it were possible it would more likely be some future iteration of the CDNA architecture series rather than RDNA as the SIMD engine for the CPU.
 

coercitiv

The only way I could see the AVX512 support being different is if the APUs do not support it but the chiplets do, or they don’t support full width on the APUs. A lot of lower end parts may be APUs.
The discussion was focused on Threadripper; the AM5 platform is a different game indeed.

I'd think you'd have finally lost it but in this weird world we're in your satire will be picked up by some bumbling fool and purged onto the twitter as if it's truth.
My point was that making such suggestions with no proof or serious logic behind them is quite close to random speculation (not you, those comments in the VC article). Even based on the MLID video, assuming the leak is close to reality, Intel is most likely going to support AVX512 in their new workstation platform. And that's on top of workstation being the second most likely segment to put AVX512 to good use, after server.

So is it possible AMD may remove AVX512 from Threadripper? Sure, as long as we admit they may do anything wild like removing SMT or limiting PCIe lane count, memory channel count, etc... It's all just as random, as any of these is already baked into the silicon.
 

Tuna-Fish

This. I wonder if AVX512 is going to be done on Zen 4's iGPU since every Zen 4 SKU is supposed to have it.

Can't be done. AVX512 has many instructions that tightly integrate with the existing vector ISAs or the scalar GPRs. Placing it anywhere outside of the core would absolutely murder performance, so much so that there would be no point.
 

Mopetar

The only reason not to include it is die area. Most workloads don't use it, but even if they do it doesn't matter if the clocks have to run slower because it's still a performance boost in all the benchmarks I've seen where it's used. Yes it uses more power too, but if you look at the performance as well the efficiency is better.

I don't think it really matters either way. It's not widely used and most people could go the lifespan of their machine without using those instructions.
 

Thibsie


That's the real use case. It completely ignores marketing but I hear you.
 

A///

So is it possible AMD may remove AVX512 from Threadripper? Sure, as long as we admit they may do anything wild like removing SMT or limiting PCIe lane count, memory channel count, etc... It's all just as random, as any of these is already baked into the silicon.
I see it more as a matter of whether Threadripper will even be available to the general masses via the DIY route or remain a Pro product through their vendor, Lenovo. Zen 3 based TR was later than most anticipated, which may have been caused by the pandemic, or AMD realized they could, for the time being at least, make more money through their Epyc sales. If Intel can get their act together and release a competent HEDT solution, I can see Threadripper coming back to the masses. AMD isn't really too concerned with this now, it seems, because of higher Epyc sales, and of course Lenovo handles the TR Pros for them. In my earnest opinion as an outsider, it all hinges on Intel's efforts.

Of course we could be wrong and AMD didn't see the need for TR Zen 3 for the masses because Zen 2 TR is still an incredibly powerful system, even if slightly hamstrung by some limitations.

Zen 4's R7 and R9 should be incredible beasts based on the very little preliminary info we have on them. Zen 5 will be my upgrade-or-not decision point, since I'm on a new Intel platform. AM5 platform woes should be addressed by then, and DDR5 won't cost me an arm and a leg, and a kidney.
 

deasd

Another possible Zen 4 ES with 1 MB of L2 per core appeared in Geekbench, but the result is an OpenCL run, which means it is a GPU test.



CPU Information
Name: AMD Eng Sample: 100-000000866-01
Topology: 1 Processor, 32 Cores, 64 Threads
Base Frequency: 1.20 GHz
Maximum Frequency: 4698 MHz
L1 Instruction Cache: 32.0 KB x 32
L1 Data Cache: 32.0 KB x 32
L2 Cache: 1.00 MB x 32
L3 Cache: 32.0 MB x 4
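To make the cache figures in the listing easier to compare, here is a minimal sketch that totals them. The helper names `parse_cache` and `total_mb` are hypothetical, and the parser only handles the "size unit x count" shape seen in this listing; it is not a general Geekbench parser.

```python
def parse_cache(entry):
    """Split an entry such as '1.00 MB x 32' into (size_in_mb, count)."""
    size_part, count_part = entry.split(" x ")
    value, unit = size_part.split()
    size_mb = float(value) / 1024 if unit == "KB" else float(value)
    return size_mb, int(count_part)


def total_mb(entry):
    """Total capacity in MB across all instances of a cache level."""
    size_mb, count = parse_cache(entry)
    return size_mb * count


CORES = 32  # from the Topology line of the listing

l2_total = total_mb("1.00 MB x 32")
l3_total = total_mb("32.0 MB x 4")

print(f"L2 total: {l2_total:.0f} MB ({l2_total / CORES:.0f} MB per core)")
print(f"L3 total: {l3_total:.0f} MB ({l3_total / CORES:.0f} MB per core)")
```

The "x 4" on the L3 line also suggests four 8-core complexes, each with its own 32 MB pool, if the listing is accurate.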

More interestingly, this newly found one has a very different ID (100-000000866-01) from the one leaked in March, which is also 32C/64T (100-000000479-13).

Possible 32C desktop variant?
 

Mopetar

Threadripper has been a bit of an odd case. AMD basically drove Intel out of HEDT and then found themselves in a very wafer constrained world.

If they have the free chiplets we'll eventually see the return of DIY TR parts, but it's a small enough market that they can keep selling older parts because of a lack of competition.

Intel seems like they could return to that space now so I think that will force AMD's hand.
 

eek2121


I doubt we will see future DIY Threadripper parts, for two reasons:

  1. The DIY HEDT market is small. (And it would be smaller with future products, since the cost of TR is going up along with the motherboards.)
  2. It is just another product line that pulls chiplets away from EPYC.
How many are willing to drop $700 on a motherboard and another 4 grand on a CPU for DIY? Not many.
 

dnavas

Threadripper has been a bit of an odd case. AMD basically drove Intel out of HEDT and then found themselves in a very wafer constrained world.

I prefer to think of it like: https://s3.amazonaws.com/libapps/accounts/2581/images/nowwhat.png

Intel seems like they could return to that space now so I think that will force AMD's hand.

I fully expect Intel will leave the space abandoned as well. Would love to be wrong. I continue to believe that TR is a good platform to experiment with more radical designs -- it's a niche product with niche appeals, but it's hard to experiment when you have a two+ year release cycle, and it's probably easier to run experiments with NDA'd partners. A 32 core AM5 product seems more likely to me than a TRzen4, sad though that is to me.

I doubt we will see future DIY Threadripper parts [...]
How many are willing to drop $700 on a motherboard and another 4 grand on a CPU for DIY? Not many.

Well, yes, if a 32 core TR is 8 times the cost of a 16 core desktop (it's more like 6x right now, although there's a generation gap contributing to the lack of value as well), that would be a bit of a hurdle. I'm not sure the issue is the buying public in this case, though :) Per-core pricing with a modest platform cover charge seems reasonable. I'd also expect that the motherboard will be a little more than twice the cost of a decent AM5 board given the higher IO density.
 

eek2121


Current sWRX8 boards start at $725 and go up into the thousands. I expect AMD to raise prices on the chips themselves to be closer to EPYC pricing. Currently you can find the 3995WX for around $5,600, the 3975WX for $3,200, and the 3955WX for $1,200.
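As a rough way to look at those street prices per core, here is a minimal sketch. The prices are the ones quoted above; the core counts (64, 32, and 16) are not from the post but are the published counts for these Threadripper PRO models, and `price_per_core` is just a name chosen for this example.

```python
# (street price in USD from the post, core count from AMD's published specs)
listings = {
    "3995WX": (5600, 64),
    "3975WX": (3200, 32),
    "3955WX": (1200, 16),
}


def price_per_core(price_usd, cores):
    """Street price divided evenly across cores; ignores platform cost."""
    return price_usd / cores


per_core = {name: price_per_core(*spec) for name, spec in listings.items()}

for name, usd in per_core.items():
    print(f"{name}: ${usd:.2f} per core")
```

Adding the $725 entry-level board on top shifts the picture most for the 16-core part, where the board is more than half the CPU price.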

People asking for DIY do NOT understand where the platform is going.

Just a thought.
 