Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3). The system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5) with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

moinmoin

Diamond Member
Jun 1, 2017
Die sizes as well now
Interesting sizes. If true (it looks like somebody close to the tape-out leaked it), AMD continues to stay close to past die sizes while introducing improvements within those constraints.

True, but they also went from GF 12nm to TSMC 7nm, which is one of the best improvements in node tech in a while. Samsung's EUV 5nm seems to be only about equal to that in power draw. If Apple's chips are any indication, 5nm didn't offer anywhere near as large power or SRAM (cache) shrinking benefits.
How did Samsung's 5nm enter this comparison? It barely plays catch-up with TSMC's 7nm. While the jump from GF 12nm to TSMC N7 is definitely bigger, I don't think Samsung's 5nm helps with guessing the properties of TSMC N5.

Aside from all that, increasing the number of cores is more a problem of package size, and with both AM4 and SP3 bound to make way for new platforms that very likely introduce bigger packages, that's essentially a non-issue. Furthermore, I believe that even at high core counts like 96 or 128, a TDP over 300W matters only for cores clocked beyond their power-efficiency sweet spot.
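To put the TDP remark in rough numbers, here is a per-core power budget sketch. It assumes a hypothetical fixed uncore/I/O-die allowance (`uncore_w`, not a figure from this thread) and splits the remainder evenly across cores; real chips do not divide power this neatly.

```python
def per_core_budget_w(tdp_w: float, cores: int, uncore_w: float = 0.0) -> float:
    """Watts per core left after subtracting an assumed uncore/IOD allowance."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    if not 0.0 <= uncore_w < tdp_w:
        raise ValueError("uncore_w must be non-negative and below the TDP")
    return (tdp_w - uncore_w) / cores


# uncore_w=0 gives an upper bound; 90 W is a made-up uncore figure for illustration.
for cores in (64, 96, 128):
    print(f"{cores} cores @ 300 W: "
          f"{per_core_budget_w(300, cores):.2f} W/core (no uncore), "
          f"{per_core_budget_w(300, cores, uncore_w=90):.2f} W/core (90 W uncore)")
```

At a few watts per core or less, the cores are presumably running well below peak boost clocks, which is the regime the post is talking about.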
 

uzzi38

Platinum Member
Oct 16, 2019
Interesting sizes. If true (it looks like somebody close to the tape-out leaked it), AMD continues to stay close to past die sizes while introducing improvements within those constraints.
What's most surprising to me is the I/O die. I'm really not at all surprised by the CCDs, if I'm honest, but the I/O die... New nodes have little to no effect on analog circuit density, and Genoa has a very significant increase in I/O (12ch DDR5, 128 PCIe5 lanes, etc.), yet despite that, the I/O die is actually smaller than Rome's.
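For a sense of how much bandwidth that I/O die has to carry, here is a rough sketch. PCIe 5.0 runs at 32 GT/s per lane with 128b/130b encoding; the DDR5-4800 speed is an assumption (the post only says "12ch DDR5"), and both figures are theoretical peaks that ignore protocol overhead.

```python
PCIE5_GT_PER_S = 32.0       # PCIe 5.0 raw rate per lane (GT/s)
PCIE_ENCODING = 128 / 130   # 128b/130b line coding used by PCIe 3.0 and later


def pcie_gbytes_per_dir(lanes: int, gt_per_s: float = PCIE5_GT_PER_S) -> float:
    """Approximate GB/s in one direction, ignoring packet/protocol overhead."""
    return lanes * gt_per_s * PCIE_ENCODING / 8


def dram_gbytes(channels: int, mt_per_s: float, bus_bits: int = 64) -> float:
    """Peak DRAM GB/s: transfers/s x bytes per transfer, summed over channels."""
    return channels * mt_per_s * (bus_bits / 8) / 1000


print(f"128 x PCIe5: {pcie_gbytes_per_dir(128):.1f} GB/s per direction")
print(f"12 x DDR5-4800 (assumed speed): {dram_gbytes(12, 4800):.1f} GB/s")
```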
 

DisEnchantment

Golden Member
Mar 3, 2017
Zen2-->Zen3 CCD saw a 9% bump in transistor count.
Assuming a very conservative 1.35x-1.40x density gain from N7 --> N5P (compared to the advertised logic density gain of 1.8x), that gives a rough minimum transistor count gain of 20-25%, and probably closer to 30% if the design is not cache-heavy.
A fairly substantial bump; I was expecting it to be around 65 mm².



A 20% transistor gain per CCD, if it's not mostly cache, should result in a fairly potent upgrade, going by how Zen 3 turned out.
I am not sure the cache will go up big time; they probably need to keep latency in check too.
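The estimate above amounts to multiplying the die-area ratio by an assumed density gain. A minimal sketch: the 80.7 mm² Zen 3 CCD and ~72 mm² Zen 4 CCD areas are assumed inputs (the leaked figure isn't quoted in this post), and uniform scaling ignores that SRAM and analog shrink less than logic.

```python
def transistor_gain(old_area_mm2: float, new_area_mm2: float, density_gain: float) -> float:
    """Fractional change in transistor count if the whole die scales by density_gain."""
    return (new_area_mm2 / old_area_mm2) * density_gain - 1.0


ZEN3_CCD_MM2 = 80.7  # assumed Zen 3 CCD area
ZEN4_CCD_MM2 = 72.0  # assumed leaked Zen 4 CCD area (hypothetical input)

for gain in (1.35, 1.40):  # the "very conservative" N7 -> N5P range from the post
    change = transistor_gain(ZEN3_CCD_MM2, ZEN4_CCD_MM2, gain)
    print(f"{gain:.2f}x density -> {change:+.1%} transistors")
```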

EDIT:
I saw that Zen 4 adds 52-bit physical addressing, which is a fairly substantial uplift: 4096 TB of addressable space, up from 256 TB.
I imagine most of the blocks dealing with address operations would be upgraded.
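The 256 TB and 4096 TB figures follow directly from the address width; they are binary terabytes (TiB), i.e. 2^48 and 2^52 bytes divided by 2^40. A quick check:

```python
TIB = 2 ** 40  # bytes in one tebibyte


def addressable_tib(phys_addr_bits: int) -> int:
    """Bytes reachable with an n-bit physical address, expressed in TiB."""
    return 2 ** phys_addr_bits // TIB


print(addressable_tib(48), addressable_tib(52))  # 256 4096
```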
 

DisEnchantment

Golden Member
Mar 3, 2017
What's most surprising to me is the I/O die. I'm really not at all surprised by the CCDs, if I'm honest, but the I/O die... New nodes have little to no effect on analog circuit density, and Genoa has a very significant increase in I/O (12ch DDR5, 128 PCIe5 lanes, etc.), yet despite that, the I/O die is actually smaller than Rome's.
I think there is a possibility that the new IOD is on GF 12LP+ or even 8LPP/U.
There is a lot of logic in the IOD as well, if you look at Fritz's scans. Those should scale.

You can see line drivers, SerDes, etc. on the edges of the Rome IOD here, and a big chunk of logic (and probably some cache) in the middle:
[Image: Rome IOD die shot]
 

Ajay

Lifer
Jan 8, 2001
I think there is a possibility that the new IOD is on GF 12LP+ or even 8LPP/U.
There is a lot of logic in the IOD as well, if you look at Fritz's scans. Those should scale.

You can see line drivers, SerDes, etc. on the edges of the Rome IOD here, and a big chunk of logic (and probably some cache) in the middle:
[Image: Rome IOD die shot]
I would think SRAM buffers rather than 'cache'. A lot of switches/routers and lookup tables. Also, a ton of traces for all that I/O - must be a routing nightmare.
 

soresu

Platinum Member
Dec 19, 2014
True, but they also went from GF 12nm to TSMC 7nm, which is one of the best improvements in node tech in a while. Samsung's EUV 5nm seems to be only about equal to that in power draw. If Apple's chips are any indication, 5nm didn't offer anywhere near as large power or SRAM (cache) shrinking benefits.
AMD are not moving to Samsung 5nm though.

They may also use it for some things, but I'd be surprised if the Zen 4 CCD was one of them.

Assuming AMD is using N5P, it would be a very sizable jump from N7P: perhaps not the jump from GloFo 12nm to N7P, but certainly a whopper of a change.
 

Hans de Vries

Senior member
May 2, 2008
www.chip-architect.com
What's most surprising to me is the I/O die. I'm really not at all surprised by the CCDs, if I'm honest, but the I/O die... New nodes have little to no effect on analog circuit density, and Genoa has a very significant increase in I/O (12ch DDR5, 128 PCIe5 lanes, etc.), yet despite that, the I/O die is actually smaller than Rome's.

The physical I/O of the 7nm Cezanne is quite small. The 128-bit memory bus takes just 5% of the 180 mm² die (the top-right rectangle), or 9 mm².

[Image: Cezanne die shot]
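Using the Cezanne figures from the post (9 mm² for the 128-bit bus on a 180 mm² die), a naive estimate of what a wider memory interface would cost in PHY area is a linear scale by bus width. This is a crude assumption: DDR5 PHYs differ from DDR4/LPDDR4 ones, and the 256-bit case is hypothetical.

```python
CEZANNE_DIE_MM2 = 180.0  # from the post
CEZANNE_PHY_MM2 = 9.0    # 128-bit memory bus area, from the post
CEZANNE_BUS_BITS = 128


def phy_area_mm2(bus_bits: int) -> float:
    """Naive estimate: memory-PHY area grows linearly with bus width (an assumption)."""
    return CEZANNE_PHY_MM2 * bus_bits / CEZANNE_BUS_BITS


print(f"128-bit share of Cezanne die: {phy_area_mm2(128) / CEZANNE_DIE_MM2:.0%}")
print(f"Hypothetical 256-bit bus: ~{phy_area_mm2(256):.0f} mm^2")
```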

This makes me think that AM5 may leapfrog Alder Lake's 1700-pin package and go to 2000+ pins (up from 1331 for AM4).
That would allow desktop motherboards with 4 memory slots, each on its own channel.

AM5 needs to support 3nm CPUs and APUs. Two-channel LPDDR4-4266 is already exhausted by 8 Vega compute units (Rembrandt has 12 Navi2 compute units on 6nm). The 5nm Raphael has an unknown number of Navi3 compute units on AM5, and they could be clocked north of 3GHz.
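Peak DRAM bandwidth is simply transfer rate times bus width, which makes the channel-count argument easy to quantify. A sketch, treating a "channel" as 64 bits as the post does; these are theoretical peaks (sustained bandwidth is lower), and the 4-channel DDR5-6400 line is the hypothetical AM5 board described above.

```python
def peak_gbytes(mt_per_s: float, bus_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: transfer rate x bus width in bytes."""
    return mt_per_s * bus_bits / 8 / 1000


configs = {
    "2ch LPDDR4-4266 (128-bit)": (4266, 128),
    "2ch DDR5-6400 (128-bit)": (6400, 128),
    "4ch DDR5-6400 (256-bit)": (6400, 256),  # hypothetical 4-channel AM5 board
}
for name, (rate, bits) in configs.items():
    print(f"{name}: {peak_gbytes(rate, bits):.1f} GB/s")
```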
 

jpiniero

Lifer
Oct 1, 2010
AM5 needs to support 3nm CPUs and APUs. Two-channel LPDDR4-4266 is already exhausted by 8 Vega compute units (Rembrandt has 12 Navi2 compute units on 6nm). The 5nm Raphael has an unknown number of Navi3 compute units on AM5, and they could be clocked north of 3GHz.

The IGP is obviously not a priority for AMD right now, especially on desktop. No big deal if it's memory-bottlenecked. DDR5 does officially go up to 6400 and beyond, so there is room for additional bandwidth.

Now, I wouldn't be surprised if Threadripper goes to 6 channels.
 

DisEnchantment

Golden Member
Mar 3, 2017
I would think SRAM buffers rather than 'cache'. A lot of switches/routers and lookup tables.
It was just an example; it could be registers, among other things. Even cache is basically SRAM on N5/N7: TSMC does not offer another cache type, such as eDRAM, for these processes. I don't know whether Samsung has other cache types for sub-10nm processes.
 

coercitiv

Diamond Member
Jan 24, 2014
AM5 needs to support 3nm CPUs and APUs. Two-channel LPDDR4-4266 is already exhausted by 8 Vega compute units (Rembrandt has 12 Navi2 compute units on 6nm). The 5nm Raphael has an unknown number of Navi3 compute units on AM5, and they could be clocked north of 3GHz.
If their RDNA2 move is to be continued, AMD is trying to reduce data movement for their GPU IP. I think we're more likely to see a continued emphasis on caching as the means to increase effective bandwidth.
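One simple way to model "caching as a means to increase effective bandwidth": if DRAM only has to serve cache misses, the demand bandwidth it can sustain scales as 1 / (1 - hit rate). A minimal sketch, assuming hypothetical hit rates and a 102.4 GB/s DRAM figure (peak for 128-bit DDR5-6400); it ignores the cache's own bandwidth limit and write-back traffic.

```python
def effective_bandwidth(dram_gbps: float, hit_rate: float) -> float:
    """Demand bandwidth sustainable when DRAM serves only the cache misses.

    Simplified model: ignores cache bandwidth limits and write-back traffic.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return dram_gbps / (1.0 - hit_rate)


for hit in (0.0, 0.3, 0.5, 0.7):  # hypothetical hit rates
    print(f"hit rate {hit:.0%}: {effective_bandwidth(102.4, hit):.0f} GB/s effective")
```

The catch is that the hit rate falls as the working set outgrows the cache (higher resolutions, for instance), so the benefit depends on having enough SRAM.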
 

Gideon

Golden Member
Nov 27, 2007
How did Samsung's 5nm enter this comparison? It barely plays catch-up with TSMC's 7nm. While the jump from GF 12nm to TSMC N7 is definitely bigger, I don't think Samsung's 5nm helps with guessing the properties of TSMC N5.

AMD are not moving to Samsung 5nm though.

Sorry, I put two totally different points into the same paragraph; I should have explained them separately.

  1. The fact that both Intel (10nm++) and Samsung (7nm EUV) are unable to match TSMC's efficiency shows, IMO, that TSMC N7 was a particularly good node. Especially so when coming from the relatively weak GlobalFoundries 12nm (which was hardly any better than the Samsung 14nm it was licensed from). The fact that AMD was able to get Renoir up to 8 cores seems to confirm that.
    1. One might argue that this is due to AMD's engineering prowess, which is certainly true, but AMD has been adamant in their presentations that the gains were due to a huge combined effort by them, TSMC and the synthesis software maker (Synopsys, I believe).
  2. Apple has made multiple dies (A14 and M1) on TSMC N5 and their scaling (especially SRAM, which is used in caches a lot) hasn't been all that impressive - not in dimensions nor power draw.

Overall these were the points I was trying to make.

Now, obviously I might be wrong; that's why we're discussing it on the forums, after all. I agree that TSMC N5P looks quite strong, but I'm sceptical about how soon AMD will get their hands on it, with Apple taking it all. IMO Zen 4 is still "vanilla" N5.
 

Hans de Vries

Senior member
May 2, 2008
The IGP is obviously not a priority for AMD right now, especially on desktop. No big deal if it's memory-bottlenecked. DDR5 does officially go up to 6400 and beyond, so there is room for additional bandwidth.

Now, I wouldn't be surprised if Threadripper goes to 6 channels.

If you want a 4 year lifespan for AM5 then you have to plan for the future with products way past the ones we have today.

If their RDNA2 move is to be continued, AMD is trying to reduce data movement for their GPU IP. I think we're more likely to see a continued emphasis on caching as the means to increase effective bandwidth.

This is an option, but it only works above a minimum amount of SRAM, and that minimum is quite large. At 4K it's already less effective today.
 

jpiniero

Lifer
Oct 1, 2010
If you want a 4 year lifespan for AM5 then you have to plan for the future with products way past the ones we have today.

I think additional caches are the solution (maybe even HBM2e or a successor), that is, if there is any actual desktop OEM demand for it.

I'd say the bigger issue is CPU core counts. Although, since it looks like Intel isn't going beyond "10" on the mainstream platform any time soon, AMD isn't really going to be pushed to go much beyond 16 on AM5.
 

Hans de Vries

Senior member
May 2, 2008
I think additional caches are the solution (maybe even HBM2e or a successor), that is, if there is any actual desktop OEM demand for it.

I'd say the bigger issue is CPU core counts. Although, since it looks like Intel isn't going beyond "10" on the mainstream platform any time soon, AMD isn't really going to be pushed to go much beyond 16 on AM5.

According to Intel marketing, they will be at 16 cores for the desktop (and probably high-end mobile as well) early next year :sweatsmile:
 

moinmoin

Diamond Member
Jun 1, 2017
What's most surprising to me is the I/O die. I'm really not at all surprised by the CCDs, if I'm honest, but the I/O die... New nodes have little to no effect on analog circuit density, and Genoa has a very significant increase in I/O (12ch DDR5, 128 PCIe5 lanes, etc.), yet despite that, the I/O die is actually smaller than Rome's.
I personally don't find that surprising: technically the IOD is still the uncore we know from Zen 1, which is now three generations old. So AMD has had a lot of room and time for improvements there. The APUs hinted at some of the possibilities, and finding and using a node better suited to I/O should offer further improvements on top.

Apple has made multiple dies (A14 and M1) on TSMC N5 and their scaling (especially SRAM, which is used in caches a lot) hasn't been all that impressive - not in dimensions nor power draw.
Being the first mover, Apple essentially uses TSMC nodes as is (or rather, the nodes are likely created primarily to fit Apple's specification and timing demands, with little time for further optimization of either the process node or the silicon design, as hitting the launch window with a preset quantity is imperative). AMD has shown with Zen 2 and 3 that it takes the additional time to adapt and optimize its designs to the nodes. I have no doubt it will continue doing so.
 

Ajay

Lifer
Jan 8, 2001
OEMs absolutely do not want more memory channels in the high-volume products. They add significant amount of additional cost to the lowest-performance, highest-volume products.
HPC and ML folks can't get enough bandwidth. If there is flexibility in channel use, this would work fine for multiple segments.
 

Hans de Vries

Senior member
May 2, 2008
OEMs absolutely do not want more memory channels in the high-volume products. They add significant amount of additional cost to the lowest-performance, highest-volume products.



- There is of course no obligation or need to route out all channels on low-end boards for low-end processors. That's the same as not filling all slots.
- The problem is to avoid limiting what you can do at the high end of the spectrum which is where much of AMD's products are right now.
- AM4 spans from 2017 to 2021. If AM5 has to go all the way from 2022 to 2026 then you have to keep the 2026 products in mind as well.
- It seems that from 2022 and onward all AMD client processors will have integrated graphics on board, both desktop and mobile.
- DDR5-6400 only provides a 50% improvement over LPDDR4-4266, further increases will come slowly, and DDR6 is a different package altogether.
- SRAM scales much less than general logic, so its effectiveness in reducing bandwidth demand becomes less economical going forward.
- We have been hoping for HBM2e for many years now. Still waiting. I can see it being used in Apple laptops as main & graphics memory when a single stack can hold 32GB in a few years' time. Not expandable, of course.


Now AMD has to consider all of these for their long-term roadmaps. It's not for me to decide anything. We'll see what comes out.
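The 50% figure in the list above is just the ratio of transfer rates at equal bus width. The sketch below also divides 128-bit peak bandwidth by the CU counts mentioned earlier in the thread (8 Vega CUs, 12 Navi2 CUs); pairing 12 CUs with DDR5-6400 is a hypothetical configuration, and "GB/s per CU" is only a crude yardstick.

```python
def rate_ratio(new_mt_per_s: float, old_mt_per_s: float) -> float:
    """At equal bus width, the bandwidth ratio is just the transfer-rate ratio."""
    return new_mt_per_s / old_mt_per_s


def gbytes_per_cu(mt_per_s: float, bus_bits: int, compute_units: int) -> float:
    """Theoretical peak GB/s divided evenly across GPU compute units."""
    return mt_per_s * bus_bits / 8 / 1000 / compute_units


print(f"DDR5-6400 vs LPDDR4-4266: {rate_ratio(6400, 4266):.2f}x")
print(f"8 CUs, 128-bit LPDDR4-4266: {gbytes_per_cu(4266, 128, 8):.1f} GB/s per CU")
print(f"12 CUs, 128-bit DDR5-6400 (hypothetical): {gbytes_per_cu(6400, 128, 12):.1f} GB/s per CU")
```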
 

fleshconsumed

Diamond Member
Feb 21, 2002
At the beginning of the Zen/Ryzen saga I was very excited, but now? Everything seems so excessive.
Wish we could go back to simpler times.
Back to simpler times of paying $400 for 4c8t? Years of 2-4% performance gain year over year? No thanks. I'll take my 16 cores without having to pay through the nose.