Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
  • Like
Reactions: richardllewis_01

Hans Gruber

Platinum Member
Dec 23, 2006
2,131
1,088
136
I just want to point out that DDR4 didn't disappoint in performance over the years. The disappointment is that neither AMD nor Intel embraced high-frequency DDR4 memory. There are DDR4 kits that go up to 5000 MHz+. I have a Hynix CJR kit that is guaranteed up to 4500 MHz and it's a 3600 MHz kit. Intel scores higher than AMD clock for clock in synthetic DDR4 memory benchmarks. AMD wins in real-world CPU performance with high-clocked memory.

DDR5 is supposed to be a bigger leap forward in RAM performance than DDR4 and DDR3 were in their time. I know there are memory standards, but I hope AMD takes full advantage of DDR5 during the product lifecycle. If they plan for 8000 MHz sticks, expect memory frequencies of 10,000 MHz or more before DDR5 reaches end of life. The memory manufacturers always go way beyond supported memory speeds.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
Keep in mind:

threadripper 2990wx: 32 cores, quad dram channels at 2400 to maybe 3000 if you were lucky.

conjectural 24 core zen 4 desktop part: 24 cores, quad channel dram (ddr5 DIMMs are essentially two channels each), 2400 (1/2 of ddr5 4800 bandwidth per channel) to around 3000 per channel (ddr5 5800+ will be available).

feeding the beast won’t be a problem.
Yeah, but data throughput will go up by what, 80 - 100% from that generation? pretty much cancelling the DDR5 advantage. Look at the Intel 12 series. They are already giving up performance with fast DDR4 instead of DDR5. Zen 4 will be faster still, so back to 1 channel/8 core rule of thumb. The DDRx advance continues to be aligned with the core demands from IPC and speed.
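For reference, the peak figures in the quoted comparison can be worked out with a quick sketch. It assumes theoretical peak bandwidth only (transfer rate times bus width), treats each DDR5 DIMM as two 32-bit sub-channels, and takes the 24-core part as purely hypothetical, as in the quote; real sustained bandwidth will be lower.

```python
def peak_bandwidth_gbs(transfer_rate_mts: int, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return transfer_rate_mts * bus_width_bits / 8 / 1000


# (label, cores, transfer rate in MT/s, total bus width in bits)
# The 24-core entries are hypothetical; no such part is confirmed.
configs = [
    ("Threadripper 2990WX, 4 x 64-bit DDR4-2400", 32, 2400, 256),
    ("Threadripper 2990WX, 4 x 64-bit DDR4-3000", 32, 3000, 256),
    ("Hypothetical 24-core Zen 4, 4 x 32-bit DDR5-4800", 24, 4800, 128),
    ("Hypothetical 24-core Zen 4, 4 x 32-bit DDR5-5800", 24, 5800, 128),
]

for label, cores, rate, width in configs:
    total = peak_bandwidth_gbs(rate, width)
    print(f"{label}: {total:.1f} GB/s total, {total / cores:.2f} GB/s per core")
```

On paper the hypothetical part gets at least as much raw bandwidth per core as the 2990WX did; whether that is enough depends on how much more data each core consumes, which is the point being argued here.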
 
  • Like
Reactions: Tlh97 and Vattila

jamescox

Senior member
Nov 11, 2009
637
1,103
136
You need space to route the chiplet<>IOD connections and you need to feed the beast.
I think 3 die would fit on the current AM4 socket if they turned the cpu die 90 degrees. The die would be very close though. I don’t know if the routing is as big of a problem. With the standard substrates used for Zen 3 and older, it was quite complicated. Zen 1 actually had an extra IFOP link (4 total) with only 3 of them being used. The extra link was just there to simplify the routing on an Epyc package, but with all of the new packaging tech available now, going up to 3 die may not be that big of a routing issue.

The AM5 package in the renderings we have seen looks rather odd, so I suspect it has some special sauce. It seems rather thick, so I have wondered if it may use an integrated vapor chamber for some parts. I thought some GPUs have used such a solution. Also, I have seen some patents that indicate the possibility of integrated peltier devices. It would need to be a very low powered TEC, but if such a thing exists, then needing an integrated vapor chamber might make sense. That all seems unlikely and probably too expensive, but there is an AMD patent showing a TEC layer in a stack with 2 logic die, the TEC layer, and then stacked cache die.

I agree with some of the others that this is likely technically possible, but does this product make sense in the stack with market segmentation? That may depend on what Intel, and to some extent, what Apple does. Both will be increasing core counts significantly. I assume that AMD wants to make a big push into the mobile market and they don’t just want low end, which does mean competing with both Intel and Apple to some extent. If we have at least 16 core parts in the mobile market then it might make sense for higher end desktop parts to go up to 24. A lot will depend on if they actually have an in between socket with maybe 6 channel memory and 6 chiplets for up to 48 cores.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Why? There are boards with 4 slots, yet only 2 channels, so it's 2 sticks per channel, right? How does the number of sticks concern the inner layout of the CPU, whether the traces there are routed to two 6-core chiplets rather than a hypothetical single 12-core chip? (I assume that the hypothetical 24-core part would be four 6-core chiplets, not three 8-core ones.)
The thinking behind 24-cores was likely a high powered 8-core chiplet paired with a low power 16-core Zen 4c type chiplet. This makes a lot of sense since games generally don’t take much advantage of beyond 8 cores. I saw some test a while ago where some games continue to scale up to 10 cores, but not by that much. The Zen 4c type part might have relatively good base clocks, but probably would not be able to boost very high. It seems like a good solution for a lot of scenarios, but I kind of doubt we will see such things until Zen 5.

The memory controller is rather independent of the number of chiplets. The channels are not bound to any individual cpu chiplet. It is a little different for Epyc since the IO die is split into quadrants with each quadrant being very similar to a desktop IO die (2 memory channels, 2 cpu links, 2 x16 pci express). So Epyc does have NUMA even in a single socket. It can be configured as 1, 2, or 4 NUMA nodes per socket because of that.
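A minimal sketch of how the 1, 2, or 4 NUMA node options fall out of the quadrant layout described above. The channel numbering and the assumption that neighbouring quadrants are merged are hypothetical and only illustrate the grouping; real firmware may pair quadrants differently.

```python
from typing import List

QUADRANTS = 4               # IO die quadrants, as described above
CHANNELS_PER_QUADRANT = 2   # memory channels attached to each quadrant


def numa_layout(nodes_per_socket: int) -> List[List[int]]:
    """Group memory channels into NUMA nodes for a 1-, 2- or 4-node configuration."""
    if nodes_per_socket not in (1, 2, 4):
        raise ValueError("nodes_per_socket must be 1, 2 or 4")
    # Each node owns a whole number of quadrants and therefore their channels.
    channels_per_node = (QUADRANTS // nodes_per_socket) * CHANNELS_PER_QUADRANT
    total_channels = QUADRANTS * CHANNELS_PER_QUADRANT
    return [
        list(range(start, start + channels_per_node))
        for start in range(0, total_channels, channels_per_node)
    ]


for nps in (1, 2, 4):
    print(f"{nps} node(s) per socket:", numa_layout(nps))
```

With one node per socket all eight channels are interleaved together; with four, each quadrant's two channels form their own node, which is the configuration that looks most like four desktop IO dies side by side.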
 

Timmah!

Golden Member
Jul 24, 2010
1,417
630
136
The thinking behind 24-cores was likely a high powered 8-core chiplet paired with a low power 16-core Zen 4c type chiplet. This makes a lot of sense since games generally don’t take much advantage of beyond 8 cores. I saw some test a while ago where some games continue to scale up to 10 cores, but not by that much. The Zen 4c type part might have relatively good base clocks, but probably would not be able to boost very high. It seems like a good solution for a lot of scenarios, but I kind of doubt we will see such things until Zen 5.

The memory controller is rather independent of the number of chiplets. The channels are not bound to any individual cpu chiplet. It is a little different for Epyc since the IO die is split into quadrants with each quadrant being very similar to a desktop IO die (2 memory channels, 2 cpu links, 2 x16 pci express). So Epyc does have NUMA even in a single socket. It can be configured as 1, 2, or 4 NUMA nodes per socket because of that.
Thank you for the explanation, that's what I wanted to know.
 

randomhero

Member
Apr 28, 2020
180
247
86
There is also a possibility of having 3 chiplets and 3 memory channels.
The Ryzen IOD was one quarter of the Epyc IOD. So, if we have 12 CCDs and 12 memory channels, and if the pictures of the Genoa package are real, then one quadrant gives you a template for the Ryzen IOD.
AMD has become well known for reusing silicon and chip layouts across different product segments.

Just my 2c.
 

Ajay

Lifer
Jan 8, 2001
15,429
7,847
136
The traces require a lot less space (and the chiplets can be closer together) if the rumors about the InFO packaging are correct. This would also help reduce the size of the io chiplet, it might be surprisingly small if it's made on N6.

Thanks. Looking up InFO again: more, thinner layers can be fabricated in the substrate, and the electrical characteristics are improved. So, if I understand correctly, what you said makes sense. More traces can be accommodated, and, with a smaller IOD, it would be easier to fit three CCDs. Interesting stuff, even if only rumors.
 
  • Like
Reactions: Tlh97

Mopetar

Diamond Member
Jan 31, 2011
7,831
5,980
136
A 6nm IO die that can connect to 3 chiplets and includes some small amount of graphics would be bigger than the IO die from earlier Zen parts.

The physical interfaces that make up most of the IO die don't benefit from node shrinks like the logic for the cores or hardware accelerators does.

I suspect that at least the server parts will still be using Global Foundries for the IO dies. If the graphics are truly bare bones they could probably get away with GF for the Zen 4 IO die as well.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
Yeah, but data throughput will go up by what, 80 - 100% from that generation? pretty much cancelling the DDR5 advantage. Look at the Intel 12 series. They are already giving up performance with fast DDR4 instead of DDR5. Zen 4 will be faster still, so back to 1 channel/8 core rule of thumb. The DDRx advance continues to be aligned with the core demands from IPC and speed.
L2 caches will also be doubled. Many parts will have massively more L3 cache. Even the base parts will have twice the total L3 cache and four times as much L3 cache that is "local" to each core as compared to the Zen+ era. So, memory bus pressure per core should be lower as a result for workloads that have small footprints or high locality.

So, the question really isn't if it will over-subscribe the existing memory architecture. It's more about what can use all those cores.
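A quick check of the "twice the total, four times the local" L3 claim. The layouts are my assumptions: an 8-core Zen+ part with two 4-core CCXs of 8 MB each, and an 8-core base Zen 4 part with a single 8-core CCX sharing 32 MB (carried over from Zen 3, not confirmed for Zen 4).

```python
from dataclasses import dataclass


@dataclass
class CacheLayout:
    name: str
    ccx_count: int        # core complexes on the part
    l3_per_ccx_mb: int    # L3 shared within one complex

    @property
    def total_l3_mb(self) -> int:
        return self.ccx_count * self.l3_per_ccx_mb

    @property
    def local_l3_mb(self) -> int:
        # L3 a core can reach without leaving its own complex
        return self.l3_per_ccx_mb


# Assumed layouts, see the note above.
zen_plus = CacheLayout("8-core Zen+", ccx_count=2, l3_per_ccx_mb=8)
zen4_base = CacheLayout("8-core Zen 4 (assumed)", ccx_count=1, l3_per_ccx_mb=32)

print("total L3 ratio:", zen4_base.total_l3_mb / zen_plus.total_l3_mb)
print("local L3 ratio:", zen4_base.local_l3_mb / zen_plus.local_l3_mb)
```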
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
L2 caches will also be doubled. Many parts will have massively more L3 cache. Even the base parts will have twice the total L3 cache and four times as much L3 cache that is "local" to each core as compared to the Zen+ era. So, memory bus pressure per core should be lower as a result for workloads that have small footprints or high locality.

So, the question really isn't if it will over-subscribe the existing memory architecture. It's more about what can use all those cores.
I know you know this, but, there's a reason cache exists and is increasing in size.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Thank you for the explanation, that's what I wanted to know.
The problem with heterogeneous cores is that the OS needs to handle the scheduling properly if they are all visible to the OS. With Intel already releasing such a cpu with performance and energy efficient cores, the software might already be there to some extent. That could allow AMD to move up plans for heterogeneous core devices, but previous rumors and some patents seemed to indicate that AMD was not going to go this route. They already have a power advantage, so “efficiency” cores may not be necessary with Zen 4.

The Zen 5 rumors made it seem like they were going to handle heterogeneous cores at the hardware level with no OS changes required. It would just fire up the big Zen 5 core depending on the load and use a low power Zen 4 based core otherwise.

I still kind of doubt that we will see a mixed device with Zen 4, at least not with Zen 4c. I was thinking that Bergamo may be a stacked device. If it is a stacked device (using silicon bridges instead of high speed serial to save power), then the Zen 4c chiplets would be incompatible with desktop IO die. It may not be stacked though, so they may be compatible, but Zen 4c likely comes out a while after Zen 4.

There are all kinds of weird things that they could do with modular chips and/or stacking. I have wondered about combining multiple APUs for a really high end mobile device using a silicon bridge, similar to the M1 Ultra. Perhaps using SoIC to stack a gpu chiplet on top of an IO die to allow for a much larger gpu on some products. Perhaps have an APU act as an IO die by just putting an IFOP link on it; that would allow the same thing as the Zen 4c type device with 8 low power cores on die with the option of adding another chiplet for more cores. If they ran 2 external links then they could go up to 24 cores and the device would really look like a quadrant of an Epyc IO die, except one CCX and a gpu would be integrated or stacked. An IO die with a gpu starts to look like an APU anyway. This would allow a huge number of different products with the base APU die, cpu chiplet(s), and possibly a stacked GPU die. The low end could just be the APU die and nothing else. Perhaps this is wild speculation, but AMD has talked about stacking other things using SoIC rather than just cache.
 
  • Like
Reactions: Tlh97 and Timmah!

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
Cant you have 2 chiplets served by single memory channel?
Yes. Running with 1 memory stick. Not wise, but it will work.

The number of memory channels does not have anything to do with the number of chiplets. The chiplets have no idea how many channels there are. Unsure where you guys are getting this from. The memory controllers are on the IO die, NOT the compute dies.

AMD could make a quad channel Ryzen part without ever touching the compute dies. If the IF link is saturated, they WOULD have to speed it up using more links or a higher speed, however, I doubt they are saturating things currently, and Zen 4 is rumored to use a new version of IF anyway.
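A rough way to sanity-check the saturation question is to compare peak DRAM bandwidth with peak read bandwidth of the chiplet links. The 32 bytes per fabric clock read width per link and the 1800 MHz fabric clock are my assumptions based on commonly cited Zen 2/3 figures, not something from this thread, and Zen 4's link may well differ.

```python
def link_read_mbs(fclk_mhz: int, bytes_per_cycle: int = 32) -> int:
    """Peak read bandwidth of one chiplet link in MB/s (assumed 32 B per fabric clock)."""
    return fclk_mhz * bytes_per_cycle


def dram_mbs(transfer_rate_mts: int, channels: int, bytes_per_channel: int = 8) -> int:
    """Peak DRAM bandwidth in MB/s for 64-bit channels."""
    return transfer_rate_mts * channels * bytes_per_channel


def links_needed(transfer_rate_mts: int, channels: int, fclk_mhz: int) -> int:
    """Smallest number of links whose combined peak read rate covers peak DRAM bandwidth."""
    dram = dram_mbs(transfer_rate_mts, channels)
    link = link_read_mbs(fclk_mhz)
    return -(-dram // link)  # ceiling division on integers


for channels in (2, 4):
    dram = dram_mbs(3600, channels) / 1000
    print(f"{channels} channels of DDR4-3600: {dram:.1f} GB/s, "
          f"links needed at FCLK 1800: {links_needed(3600, channels, 1800)}")
```

Under those assumptions a two-chiplet part already has enough aggregate link bandwidth for quad channel reads, though a single chiplet on its own could not pull the full quad channel rate.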

EDIT: Threadripper is a perfect example.

At any rate, I stopped by to say DDR5 32gb kits have dropped to under $200 (cheapest I've seen is $179). Hopefully prices continue to decline as we get closer to the launch of Zen 4.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
The number of memory channels does not have anything to do with the number of chiplets. The chiplets have no idea how many channels there are. Unsure where you guys are getting this from. The memory controllers are on the IO die, NOT the compute dies.

AMD could make a quad channel Ryzen part without ever touching the compute dies. If the IF link is saturated, they WOULD have to speed it up using more links or a higher speed, however, I doubt they are saturating things currently, and Zen 4 is rumored to use a new version of IF anyway.

EDIT: Threadripper is a perfect example.

At any rate, I stopped by to say DDR5 32gb kits have dropped to under $200 (cheapest I've seen is $179). Hopefully prices continue to decline as we get closer to the launch of Zen 4.
I think you might be confused.


The question was, "Cant you have 2 chiplets served by single memory channel?".

My answer was, yes, which happens if you only have 1 memory stick installed.
 
  • Like
Reactions: Tlh97 and Thibsie

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
I think you might be confused.


The question was, "Cant you have 2 chiplets served by single memory channel?".

My answer was, yes, which happens if you only have 1 memory stick installed.

You can have 4 chiplets served by 1 memory channel, or 1 chiplet served by 4 memory channels. It makes no difference. Memory channels have nothing to do with the chiplets on a Ryzen processor.
 
  • Like
Reactions: lightmanek

deasd

Senior member
Dec 31, 2013
516
746
136
the socket looks quite huge......no?


GBT-AM5-HERO-X670-banner-1200x437.jpg



edit: ok this Asrock X670E Taichi is much more intuitive:

ASROCK-X670-TAICHI-768x432.jpg
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
the socket looks quite huge......no?


View attachment 61832

Looks to me to be very close to AM4 in size. The change in bracket/mounting hardware is probably throwing you off.

At least we have a solid confirmation that Zen 4 will be announced. :D
 

biostud

Lifer
Feb 27, 2003
18,237
4,755
136
I sincerely hope that the move from X570 Taichi which had five slots to X670 which appears (?) to have two is not indicative of a trend in X670 boards. Particularly if TR is getting the boot.
I thought it was a weird layout too. Who needs two PCIe x16 slots nowadays, except for some special professional work?
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
I sincerely hope that the move from X570 Taichi which had five slots to X670 which appears (?) to have two is not indicative of a trend in X670 boards. Particularly if TR is getting the boot.
I am curious; what other types of cards do people add with so much built into the motherboard these days (USB, network, SATA, m.2, sound, etc)? I would expect that a huge number of systems add a video card and that is all. That says to me that the form factor is likely headed towards being obsolete. A lot of people just use a laptop at best these days and don’t own a desktop.
 

dnavas

Senior member
Feb 25, 2017
355
190
116
I am curious; what other types of cards do people add with so much built into the motherboard these days (USB, network, SATA, m.2, sound, etc)? I would expect that a huge number of systems add a video card and that is all. That says to me that the form factor is likely headed towards being obsolete. A lot of people just use a laptop at best these days and don’t own a desktop.

Plenty of people are dissatisfied with networking and sound on motherboards. :> I've got a networking card (sfp28) and a video capture board. If I could get properly isolated sound equipment, I would be tempted. I've yet to have sound output that isn't horrifyingly noisy. There are a few sundry items I can imagine adding at some point as well, but they would be very hobbyist in nature.
 
  • Like
Reactions: ryan20fun