Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Vattila

Senior member
Oct 22, 2004
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

jamescox

Senior member
Nov 11, 2009
Depending on how AVX-512 is implemented, and on the workload being run, I suppose a 16 core Zen4 could have a peak power of 170W. Of course, that's socket power, which means the actual max power draw could be as high as it is for Zen2/3.

A lot of tasks would be bandwidth limited with 24 cores, IMHO. I suppose this question will be put to bed once we have a verified layout for Raphael.

It seems that AMD is only interested in the high performance workstation market with Threadripper Pro, which actually puts it out of the range of a true HEDT system for the small 'prosumer' market. They are probably using chiplets that don't quite meet the standards for chiplets used in Epyc CPUs. Hence the long delay: Epyc is so much more profitable, with its high demand rate, that Zen3-based workstations didn't make business sense until Zen4-based Epyc CPUs started trickling into the server market.

IMHO.
Bandwidth limited at 24 cores? Well, I guess we might need 4-high V-Cache stacks. An extra 256 MB per CPU die would go a long way. Anyway, DDR5 can still supply a lot of bandwidth even at 2 “channels”. If they make the IO die on TSMC for the GPU, then it could contain other interesting things. It is amazing how secretive AMD has been.
 

LightningZ71

Golden Member
Mar 10, 2017
So, working off the assumption that the IOD is moving to TSMC N6, and that TSMC has (supposedly) announced that stacking will be available for the N6 process this year, is it possible that AMD might look at implementing an "SLC" sort of L4 cache on the IOD? A 4-high stack could have 256MB of cache, significantly reducing usage of the memory bus. Because the IOD runs at lower clocks, it would likely not have the same sort of limitations that the on-CCD stack imposes on the 5800X3D. It would also help the integrated GPU, as that's going to have a certain amount of memory draw as well.
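The capacity and traffic arithmetic behind this idea can be sketched as below. It assumes each stacked layer holds 64 MB (the size of the V-Cache die on the 5800X3D); the function names are hypothetical, and the L3 miss traffic and L4 hit rate in the example are made-up illustrative numbers, not AMD figures.

```python
LAYER_MB = 64  # assumed capacity per stacked SRAM layer (as on the 5800X3D V-Cache die)


def stack_capacity_mb(layers: int, layer_mb: int = LAYER_MB) -> int:
    """Total capacity of a stack of identical SRAM layers."""
    return layers * layer_mb


def dram_traffic_gbs(l3_miss_traffic_gbs: float, l4_hit_rate: float) -> float:
    """DRAM traffic left over after a hypothetical IOD-side L4 absorbs some L3 misses."""
    if not 0.0 <= l4_hit_rate <= 1.0:
        raise ValueError("l4_hit_rate must be between 0 and 1")
    return l3_miss_traffic_gbs * (1.0 - l4_hit_rate)


print(stack_capacity_mb(4))         # 256 (MB for a 4-high stack)
print(dram_traffic_gbs(60.0, 0.5))  # 30.0 (GB/s, illustrative numbers)
```

Every L3 miss that hits in such an L4 is one less trip over the DDR5 bus, which is where the reduced memory bus usage would come from.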
 

eek2121

Platinum Member
Aug 2, 2005
Bandwidth limited at 24 cores? Well, I guess we might need 4-high V-Cache stacks. An extra 256 MB per CPU die would go a long way. Anyway, DDR5 can still supply a lot of bandwidth even at 2 “channels”. If they make the IO die on TSMC for the GPU, then it could contain other interesting things. It is amazing how secretive AMD has been.

Excluding core improvements that may shift the bottleneck, AMD could move to a 32 core design and not be any worse than the 5950X when it comes to bandwidth.
 

Ajay

Lifer
Jan 8, 2001
Excluding core improvements that may shift the bottleneck, AMD could move to a 32 core design and not be any worse than the 5950X when it comes to bandwidth.
For what? Games. Odds are, anyone needing 32 cores is doing professional workstation work on a client platform. I'm sure some of those loads don't push bandwidth (maybe a bunch of VMs for development work running light loads). This just doesn't make sense to me. I don't see why AMD would do this, even if they can fit 3 chiplets on AM5 (4 chiplets is certainly a no-go).
 

dnavas

Senior member
Feb 25, 2017
Excluding core improvements that may shift the bottleneck, AMD could move to a 32 core design and not be any worse than the 5950X when it comes to bandwidth.

PCIe 5 is twice PCIe 4, but DDR5 is not twice DDR4. Yet. Unless you're overclocking 6000 kits to 8888 because ... lucky ;^/ And a higher IPC will make that worse (which you mention). The last time I ran the numbers, I wound up 50% short of equality. For memory-bandwidth-limited use cases, a 24 core Zen4 on present DDR5 would seem to be more comparable to a 5950X than a 32 core one would. Of course, that view will change as DDR5 gets faster, and how much you care will depend on the extent to which your limit is solely memory. I don't think we know the max stable DDR5 frequency of the Zen4 IOD either.

I suspect power is going to be the limiting factor until Zen5, honestly, and not just for the CPUs. Have you seen what PCIe 5 NVMe drives draw? Yikes....
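Both positions (32 cores at parity vs. 24 cores at parity) can be checked against theoretical peak bandwidth with a quick sketch. Assumptions: two 64-bit DIMM channels on both AM4 and AM5 (a DDR5 DIMM is split into two 32-bit subchannels, but is still 64 bits wide in total), peak numbers only, and no allowance for the efficiency differences between DDR4 and DDR5. The function names are hypothetical.

```python
BUS_BYTES = 8  # one 64-bit DIMM channel (a DDR5 DIMM is 2 x 32-bit subchannels)
CHANNELS = 2   # dual-channel desktop platform, assumed for both AM4 and AM5


def peak_mbs(mt_per_s: int) -> int:
    """Theoretical peak bandwidth in MB/s: transfers/s x bytes/transfer x channels."""
    return mt_per_s * BUS_BYTES * CHANNELS


# Reference point: Ryzen 9 5950X, 16 cores on DDR4-3200 (51,200 MB/s peak)
REF_PER_CORE = peak_mbs(3200) / 16  # 3200.0 MB/s per core


def per_core_mbs(mt_per_s: int, cores: int) -> float:
    """Peak bandwidth share per core at a given data rate."""
    return peak_mbs(mt_per_s) / cores


def parity_cores(mt_per_s: int) -> float:
    """Core count at which per-core peak bandwidth matches the 5950X."""
    return peak_mbs(mt_per_s) / REF_PER_CORE


for speed in (4800, 5200, 6400):
    print(f"DDR5-{speed}: {per_core_mbs(speed, 32):.0f} MB/s per core with 32 cores, "
          f"parity at {parity_cores(speed):.0f} cores")
```

On these assumptions a 32 core part only matches the 5950X's per-core peak at DDR5-6400, while at DDR5-4800 parity lands at 24 cores.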
 

Exist50

Platinum Member
Aug 18, 2016
Hmm, there are twice the channels, which does make the comparison complicated. Are there other differences of note?
I'm unsure of the exact breakdown of the cause, but you can find some numbers from the memory vendors.

Compared to DDR4 at an equivalent data rate of 3200 megatransfers per second (MT/s), a DDR5 system-level simulation example indicates an approximate performance increase of 1.36X effective bandwidth. At a higher data rate, DDR5-4800, the approximate performance increase becomes 1.87X—nearly double the bandwidth as compared to DDR4-3200.


So at DDR5-5200 (a perfectly likely number for '22 memory controllers), that's about twice DDR4-3200 bandwidth.
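The DDR5-5200 estimate can be reproduced by drawing a straight line through the two vendor data points quoted above. That linear extrapolation is an assumption of this sketch, not something the vendor claims, and the function name is hypothetical.

```python
# Vendor simulation figures quoted above: DDR5 effective bandwidth relative
# to DDR4-3200, keyed by DDR5 data rate in MT/s.
VENDOR_POINTS = {3200: 1.36, 4800: 1.87}


def ratio_vs_ddr4_3200(mt_per_s: float) -> float:
    """Straight-line extrapolation through the two vendor data points.

    Assumption: the ratio keeps rising linearly with data rate. The vendor
    figures do not say this, so treat anything beyond 4800 MT/s as rough.
    """
    (x0, y0), (x1, y1) = sorted(VENDOR_POINTS.items())
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (mt_per_s - x0)


print(round(ratio_vs_ddr4_3200(5200), 1))  # 2.0
```

Scaling the DDR5-4800 figure in proportion to data rate instead (1.87 × 5200/4800 ≈ 2.03) lands in roughly the same place.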
 

nicalandia

Diamond Member
Jan 10, 2019
Another Review


"To get a better idea of how the Radeon RX 6800S stacks up against Nvidia’s mobile GPUs, I re-ran these benchmarks at 1920x1080 and compared them to RTX-equipped laptops like the Asus TUF Dash F15 and Lenovo Legion 7i. Generally speaking, I’d put it somewhere between the RTX 3070 and the RTX 3060"
 

itsmydamnation

Platinum Member
Feb 6, 2011
PCIe 5 is twice PCIe 4, but DDR5 is not twice DDR4. Yet. Unless you're overclocking 6000 kits to 8888 because ... lucky ;^/ And a higher IPC will make that worse (which you mention). The last time I ran the numbers, I wound up 50% short of equality. For memory-bandwidth-limited use cases, a 24 core Zen4 on present DDR5 would seem to be more comparable to a 5950X than a 32 core one would. Of course, that view will change as DDR5 gets faster, and how much you care will depend on the extent to which your limit is solely memory. I don't think we know the max stable DDR5 frequency of the Zen4 IOD either.

I suspect power is going to be the limiting factor until Zen5, honestly, and not just for the CPUs. Have you seen what PCIe 5 NVMe drives draw? Yikes....
A higher IPC will generally make memory contention lower: if you have to fetch from memory and stall more often, you aren't getting "higher IPC".

There are also other potential bottlenecks before memory, like the bandwidth of the interface leaving the CCX, that could come into play.

The obvious caveat is highly parallel SIMD workloads.
 

jamescox

Senior member
Nov 11, 2009
Seems someone at AMD went ahead and tried patenting this.

ML accelerator die stacked on IOD

20220101179 - DIRECT-CONNECTED MACHINE LEARNING ACCELERATOR
Interesting. I don’t have time to try and sort through the patent. Does this indicate the type of stacking used? SoIC levels of connectivity don't seem like they would be required. Some micro-bump tech would be sufficient to take full advantage of the memory system and possibly the connectivity to the CPU cores.
 

eek2121

Platinum Member
Aug 2, 2005
For what? Games. Odds are, anyone needing 32 cores is doing professional workstation work on a client platform. I'm sure some of those loads don't push bandwidth (maybe a bunch of VMs for development work running light loads). This just doesn't make sense to me. I don't see why AMD would do this, even if they can fit 3 chiplets on AM5 (4 chiplets is certainly a no-go).

I wasn't arguing the why, but rather that they CAN do it if they want to, and bandwidth would not be the limit. However, since you asked:

They've moved Threadripper into the high-end workstation space, which leaves many of us who run high core count workloads without an option that can handle our productivity workloads. For me that means the ability to quickly compile code, encode video, render 3D, and also play games in my spare time. Could I get a Threadripper? Sure, I might even decide to go that route; however, I would lose out on the high peak clocks of Ryzen, and Threadripper also has a much higher TDP. If I wanted a chip to keep my office warm, I would have purchased an Intel chip.

That being said, I don't expect they will release such a chip, at least not right away.
 

LightningZ71

Golden Member
Mar 10, 2017
I was hoping that we would see desktop processors that were effectively a quarter of a Genoa processor. One quadrant would have three CCDs and three DDR5 channels. That would be something special on a desktop...
 

Panino Manino

Senior member
Jan 28, 2017
Another Review


"To get a better idea of how the Radeon RX 6800S stacks up against Nvidia’s mobile GPUs, I re-ran these benchmarks at 1920x1080 and compared them to RTX-equipped laptops like the Asus TUF Dash F15 and Lenovo Legion 7i. Generally speaking, I’d put it somewhere between the RTX 3070 and the RTX 3060"

I was under the impression these had better battery life?
 

Mopetar

Diamond Member
Jan 31, 2011
it's not so much a bottleneck as it is an advantage. Many Apps will take advantage of extra$$

If they can take advantage it just means that previously there was a bottleneck in that area. It's entirely possible that a particular application can have multiple bottlenecks or that by removing or lessening one the ultimate bottleneck is shifted to another hardware element.

But it doesn't change the fact that it was a bottleneck. If more cache results in better performance, then cache was a bottleneck. If performance improvements scale linearly with whatever factor was increased, then you know it was a significant bottleneck. 100% scaling also means you probably still haven't eliminated that bottleneck, because no other part of the computer has started slowing down the workload.
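This is essentially Amdahl's law. A minimal sketch, assuming the workload splits cleanly into a fraction that is bound by the resource being improved (here, cache) and a fraction that is not; real cache behaviour is messier than a single speedup factor, and the function name is hypothetical.

```python
def overall_speedup(bound_fraction: float, improvement: float) -> float:
    """Amdahl's law: only `bound_fraction` of the runtime speeds up by `improvement`."""
    if not 0.0 <= bound_fraction <= 1.0:
        raise ValueError("bound_fraction must be between 0 and 1")
    if improvement <= 0.0:
        raise ValueError("improvement must be positive")
    return 1.0 / ((1.0 - bound_fraction) + bound_fraction / improvement)


# Doubling the bottlenecked resource for workloads bound to different degrees
for fraction in (0.25, 0.5, 1.0):
    print(fraction, round(overall_speedup(fraction, 2.0), 2))
# 0.25 1.14
# 0.5 1.33
# 1.0 2.0
```

Only a fully bound workload (fraction 1.0) scales linearly with the improvement, which is why linear scaling suggests the bottleneck is still in place.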
 

Hitman928

Diamond Member
Apr 15, 2012
I was under the impression these had better battery life?

It's a smaller laptop compared to most 'gaming laptops' which means a smaller battery but it still has a fairly powerful dGPU included and the only battery life test they did was while gaming at full brightness on a high res screen. I wouldn't expect more than an hour of battery life in such a scenario.
 

Ajay

Lifer
Jan 8, 2001
I wasn't arguing the why, but rather that they CAN do it if they want to, and bandwidth would not be the limit. However, since you asked:

They've moved Threadripper into the high-end workstation space, which leaves many of us who run high core count workloads without an option that can handle our productivity workloads. For me that means the ability to quickly compile code, encode video, render 3D, and also play games in my spare time. Could I get a Threadripper? Sure, I might even decide to go that route; however, I would lose out on the high peak clocks of Ryzen, and Threadripper also has a much higher TDP. If I wanted a chip to keep my office warm, I would have purchased an Intel chip.

That being said, I don't expect they will release such a chip, at least not right away.
Well, with a 16 core Zen4 CPU (7950X?), all those tasks will run faster as it stands. Fast DDR5 and a fast NVMe SSD are important for those kinds of tasks. With the new iGPU on the IOD, there could be some encoding accelerators as well. For a b*lls to the wall workstation, there's only Threadripper Pro now (Chagall w/Zen3). It's a big $$ investment, but if it's for one's job, then it'll pay off in productivity and as a business expense. I think it would game fine with a consumer GFX card. Obviously, high core count CPUs burn more watts, but one must pay the piper to get the performance one wants.
 

moinmoin

Diamond Member
Jun 1, 2017
$ has ALWAYS been a bottleneck, no way around that, since the beginning of CPU time. There is no reason to point that out as if it just happened to Zen3
True. Another way I'd put it: ideally you'd want data on storage to be available with no latency, at the same speed as the processor itself. Since that's clearly not possible, systems have over time added memory and all the different levels of cache.
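The trade-off can be quantified with the standard average memory access time (AMAT) formula, applied recursively down the hierarchy: each level costs its hit time plus its miss rate times the AMAT of the level below. The latencies and local miss rates below are illustrative assumptions, not measured Zen figures.

```python
from typing import List, Tuple


def amat(levels: List[Tuple[float, float]], memory_latency: float) -> float:
    """AMAT = hit_time + miss_rate * (AMAT of the next level down).

    levels: (hit_time_cycles, local_miss_rate) pairs from L1 outward;
    main memory is treated as always hitting at memory_latency cycles.
    """
    access_time = memory_latency
    for hit_time, miss_rate in reversed(levels):
        access_time = hit_time + miss_rate * access_time
    return access_time


caches = [(4, 0.10), (12, 0.40), (40, 0.50)]  # hypothetical L1, L2, L3
print(round(amat(caches, 300), 1))      # 12.8 cycles with L1 + L2 + L3
print(round(amat(caches[:1], 300), 1))  # 34.0 cycles with only L1
```

Each extra level only has to catch a share of the misses from the level above to pull the average well below the DRAM latency.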