Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

Untitled2.png


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
  • Like
Reactions: richardllewis_01

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Curious why the patent shows the SI under most of the CCD. I would think a narrow SI from the CCD to the IOD would provide all that's needed for higher speed (frequency) IF interconnects. SI is only needed for signalling. Anyway, this would explain how Genoa will increase IF speeds at the same or lower power. I suppose it could be easier to manufacture, or the patent is just providing more coverage for legal reasons.
The above is just my opinion/guess. But for sure they are trying to be as vague as possible to cover whatever permutations of that idea.

When you look at all these patents, AMD is trying to avoid wafer-level / chip-first packaging like InFO (because of yield issues, cost, and also because they might need to integrate an off-the-shelf die like HBM).
The tendency is towards RDL, with mentions of interconnect chips/bridges (aka LSI) in some cases. They do mention that the dies could be molded to fan out.
So I imagine it will initially be CoWoS-R, then CoWoS-L once the traces are too excessive to be done in RDL only. The Si bridges need the usual process steps of a normal active chip, which is why they are avoiding them.

I imagine the extra thickness of the AM5 package (leaked by ExecuFix) could hint at the presence of the RDL/Molding Layer/LSI Bridges/Comm Die etc
e.g.
20200294923
MULTI-RDL STRUCTURE PACKAGES AND METHODS OF FABRICATING THE SAME
[0030] The disclosed arrangements utilize stacked RDL structures to create a fan-out on fan-out package that can have a total thickness of less than 1 mm, while providing for surface component mounting, thick metal for power/ground, and high density RDL proximate the die or chip. In addition, bumpless mounting can be used such that ESD protection is not necessary.

Diagram below shows CoWoS-R on left and CoWoS-L on right.
1634836590220.png
1634836610198.png


20210098437 : INTEGRATED CIRCUIT MODULE WITH INTEGRATED DISCRETE DEVICES
20210313269 : INTEGRATED CIRCUIT PACKAGE WITH INTEGRATED VOLTAGE REGULATOR
20200294923 : MULTI-RDL STRUCTURE PACKAGES AND METHODS OF FABRICATING THE SAME
20210057352 : FAN-OUT PACKAGE WITH REINFORCING RIVETS
20200168549 : MOLDED DIE LAST CHIP COMBINATION
Contains some cool stuff like integrating IVR within the RDL and also putting discrete devices on package.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Rumors about RDNA3 so far point to something akin to COWOS-L
I think RDNA3 could be SoIC. The throughput needed for the L3/MCD to GCD interconnect is very high. Interconnect density and power could be the constraints. But let's see, it seems AMD have lots of options on the table.

Updated with one patent reference I found.
I think Bondrewd also mentioned SoIC
1634840412845.png
Bumpless direct die-to-die bonding.
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
The correct answer, IMO, is to fix the interconnect and go for even more modularity, to be able to deploy an even wider portfolio of products cost-efficiently.
The selection of the interconnect used as well as the degree of modularity still depends on the overhead and the cost induced. It's always a balancing act where the cost overhead for a superior solution just isn't feasible because the resulting packaging costs more than the product's budget allows. That's another reason why mobile APUs up to now are all monolithic, and very likely stay that way.
 
  • Like
Reactions: BorisTheBlade82

Ajay

Lifer
Jan 8, 2001
15,429
7,847
136
I think RDNA3 could be SoIC. The throughput needed for the L3/MCD to GCD interconnect is very high. Interconnect density and power could be the constraints. But let's see, it seems AMD have lots of options on the table.

Updated with one patent reference I found.
I think Bondrewd also mentioned SoIC
View attachment 51705
Bumpless direct die-to-die bonding.
5A or 5B make sense. I think going to 3 layers might be a bridge too far right now.

Another diagram with some definitions on it - just for reference; as I keep forgetting ;)

TSMC_COWOS_RDL_LSI.png
 
  • Like
Reactions: Tlh97 and prtskg

Ajay

Lifer
Jan 8, 2001
15,429
7,847
136
The selection of the interconnect used as well as the degree of modularity still depends on the overhead and the cost induced. It's always a balancing act where the cost overhead for a superior solution just isn't feasible because the resulting packaging costs more than the product's budget allows. That's another reason why mobile APUs up to now are all monolithic, and very likely stay that way.
Also yield impact. Having a Cu-Cu cold weld bond fail sends a lot of silicon into the bin (unless the parts can be heated, removed and go through some surface rework).
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
Also yield impact. Having a Cu-Cu cold weld bond fail sends a lot of silicon into the bin (unless the parts can be heated, removed and go through some surface rework).
Indeed, that's a major part of the indirect packaging cost. The more modular your package is, the lower the combined yield, as every single part can bring your house of cards crashing down. Even when each part yields well, the individual yields multiply with each other, so small failure rates compound. Ponte Vecchio has to be a nightmare for that reason.
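
To put rough numbers on the compounding effect described above, here is a minimal Python sketch. It assumes that die and bond failures are independent, and every yield figure in it is hypothetical and chosen only for illustration; none of them come from AMD, Intel or TSMC. The function name `package_yield` is also just an illustrative choice.

```python
from math import prod


def package_yield(die_yields, bond_yield, num_bonds):
    """Probability that every die and every bond in one package is good.

    Assumes all failures are independent, which is a simplification.
    """
    return prod(die_yields) * bond_yield ** num_bonds


# Hypothetical yields, purely illustrative (not vendor data).
few_parts = package_yield([0.99] * 2, bond_yield=0.99, num_bonds=1)
many_parts = package_yield([0.99] * 40, bond_yield=0.99, num_bonds=40)

print(f"2 dies, 1 bond:    {few_parts:.3f}")
print(f"40 dies, 40 bonds: {many_parts:.3f}")
```

Under these assumptions a 99% yield per part still leaves less than half of the 40-die packages fully working, which is why testing dies before assembly and being able to rework failed bonds matter so much for highly modular designs.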
 

Kepler_L2

Senior member
Sep 6, 2020
330
1,162
106
The selection of the interconnect used as well as the degree of modularity still depends on the overhead and the cost induced. It's always a balancing act where the cost overhead for a superior solution just isn't feasible because the resulting packaging costs more than the product's budget allows. That's another reason why mobile APUs up to now are all monolithic, and very likely stay that way.
Nope, mobile APUs will be MCM as well (and 3D stacked!).
 

Joe NYC

Golden Member
Jun 26, 2021
1,934
2,269
106
The selection of the interconnect used as well as the degree of modularity still depends on the overhead and the cost induced. It's always a balancing act where the cost overhead for a superior solution just isn't feasible because the resulting packaging costs more than the product's budget allows. That's another reason why mobile APUs up to now are all monolithic, and very likely stay that way.

If AMD can get these active SoIC bridges into production in RDNA 3 next year, it means that interconnect is solved. Cost (in power and latency) is, in round numbers, zero, and bandwidth is, in round numbers, infinite.

So the gate to further modularity will be far more wide open than the initial stab at chiplets that we are seeing now.

As far as next-gen chips go, certain things may be misleading. Say, N6 IOD vs. N5 compute / logic vs. N6 or N7 SRAM.

I think IOD will be on N6 only because it supports stacking. It does not need N6. And AMD's IODs going forward may be on N6 for another decade, when N6 is dirt cheap. So AMD may accumulate 5-10 different IODs over next few years, or could even use 4, 6, 8 smaller IODs on a big server chip. Any future product will not have to worry about IO, it will just take one from the shelf.

This will also extend to a wide array of Xilinx chips, once that goes through.

SRAM / active bridges can just follow price competitiveness and available capacity to select their node.

Logic can use the leading node. The final product may only use 30-50% of the die on the leading node, while still offering the full performance of the leading node.

Next semi-custom, for X-Box / PS6? Microsoft and Sony can just pick how much of each component they want, and the whole semi-custom design will come at a small fraction of the current design cost.
 
  • Like
Reactions: Tlh97

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
What would be the possible lower end of the price range for these? I can't imagine MCM being cost effective, with a decent margin, at an end consumer price of ~$100, never mind ~$50.
Does anyone have an idea of the SoIC bonding costs? They might be a lot lower than is assumed.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Nice interview by AnandTech

MC: I think it comes back to that balance aspect, in the sense that I think going beyond four with the number of transistors and the smarts we have in our branch predictor, and the ability to feed it worked fine. But we are going to go wider, you're going to see us go wider, and to be efficient, we'll have the transistors around the front end of the machine to make it the right architectural decision. So it's really having the continuous increase in transistors that we get, allowing us to beef up the whole design to continue to get more and more IPC out of it.

We do see core counts growing, and we will continue to increase the number of cores in our core complex that are shared under an L3. As you point out, communicating through that has both latency problems, and coherency problems, but though that's what architecture is, and that's what we signed up for. It’s what we live for - solving those problems. So I'll just say that the team is already looking at what it takes to grow to a complex far beyond where we are today, and how to deliver that in the future.

MC: I think IPC gets all the glory! What it really is – I call it the ‘Wheel of Performance’ because there's four main tenets – performance, frequency, area and power. They really are all equal in a sense and you have to balance them all out to get a good design. So if you go for a really high frequency but crush IPC, you can end up with a really bad design, and increased area. If you go really hard on IPC and that adds a lot of area and a lot of power, you can be going backwards. So that's really the critical part like we said, we're trying to get that IPC but we have to get it in a way that optimizes the transistor use for both area and power, and frequency too.

There were a lot of people, even internally, that were worried we weren't going to be able to sustain the rate of progress. It is a very risky strategy, - tearing the whole core up every three years is risky. But to me, I've managed to convince everyone, and it’s what the market requires. If we don't do it, someone else will.

Re Zen5
MC: It's going be great! I wish I could tell you of all what's coming. I have this annual architecture meeting where we go over everything that's going on, and at one of them (I won't say when) the team and I went through Zen 5. I learned a lot, because of nowadays as running the roadmap, I don't get as close to the design as I wish I could. Coming out of that meeting, I just wanted to close my eyes, go to sleep, and then wake up and buy this thing. I want to be in the future, this thing is awesome and it's going be so great - I can't wait for it. The hard part of this business is knowing how long it takes to get what you have conceived to a point where you can build it to production.


Small tidbit, they originally planned the K12 core with similar performance to Zen
MC: Originally Zen and K12 were, I think, we call them sister projects. They had kind of the same goals, just a different ISA actually hooked up.

What to expect (not necessarily Zen4)
Wider front end
More Cores/CCX(D)
Balanced approach to IPC wrt Area/Power/Frequency
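
The balance Clark describes can be sketched with the usual first-order relation, performance ≈ IPC × frequency, judged against area and power. The Python below is only an illustration: the `CoreDesign` class and all three sets of numbers are hypothetical and do not describe any real Zen core.

```python
from dataclasses import dataclass


@dataclass
class CoreDesign:
    name: str
    ipc: float        # relative instructions per clock
    freq_ghz: float   # sustained clock in GHz
    area_mm2: float   # core area
    power_w: float    # core power

    @property
    def perf(self) -> float:
        # First-order model: throughput scales with IPC times frequency.
        return self.ipc * self.freq_ghz

    @property
    def perf_per_watt(self) -> float:
        return self.perf / self.power_w

    @property
    def perf_per_mm2(self) -> float:
        return self.perf / self.area_mm2


# Hypothetical design points, purely illustrative.
baseline = CoreDesign("baseline", ipc=1.0, freq_ghz=4.0, area_mm2=4.0, power_w=10.0)
speed_demon = CoreDesign("high clock, low IPC", ipc=0.8, freq_ghz=4.8, area_mm2=4.5, power_w=14.0)
wide = CoreDesign("wide, high IPC", ipc=1.3, freq_ghz=3.9, area_mm2=7.0, power_w=18.0)

for d in (baseline, speed_demon, wide):
    print(f"{d.name:20s} perf={d.perf:.2f} "
          f"perf/W={d.perf_per_watt:.3f} perf/mm2={d.perf_per_mm2:.3f}")
```

With these made-up numbers the high-clock design ends up slower than the baseline despite its frequency, and the wide design is faster but worse per watt and per mm², which is the kind of trade-off the "Wheel of Performance" is meant to capture.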
 

Ajay

Lifer
Jan 8, 2001
15,429
7,847
136
Seems like Zen5 will be the next 'tear up' of the core. Apparently, Zen4 will be boring like Zen2 /jk. Glad AMD is going wider in the future; it seems like the only way to improve efficiency compared to the current trajectory. More cores seems almost crazy, well, at least right now. I can only imagine that AMD will have to have 2 different CCDs by then: 8 cores for desktop, 12/16 for server or something like that. Or maybe APUs take over the whole bottom range up to 8 cores. Interesting times ahead.
 

leoneazzurro

Senior member
Jul 26, 2016
919
1,450
136
I think that Zen4 will be more than what Zen2 was to the original Zen, after all going from Zen2 to Zen3 was done on the same process, and this probably limited the budget in terms of transistors. Looking at the architecture, I'd say there are some obvious low hanging fruits especially in the interconnects but quite probably AMD engineers know a lot more.
 

Saylick

Diamond Member
Sep 10, 2012
3,125
6,294
136
I think that Zen4 will be more than what Zen2 was to the original Zen, after all going from Zen2 to Zen3 was done on the same process, and this probably limited the budget in terms of transistors. Looking at the architecture, I'd say there are some obvious low hanging fruits especially in the interconnects but quite probably AMD engineers know a lot more.
Zen 4 is a big lift, irrespective of whether or not the core itself is re-built. A new node, new socket, new platform, new memory standard, and to top it off still updating the core, even if it's just a "derivative"? AVX-512? I think it's safe to say that the engineering hours needed to get this next generation up and running is going to be big.
 
  • Like
Reactions: Tlh97

Ajay

Lifer
Jan 8, 2001
15,429
7,847
136
I think that Zen4 will be more than what Zen2 was to the original Zen, after all going from Zen2 to Zen3 was done on the same process, and this probably limited the budget in terms of transistors. Looking at the architecture, I'd say there are some obvious low hanging fruits especially in the interconnects but quite probably AMD engineers know a lot more.

There's a lot of high-speed I/O and memory coming on board with Zen4, so faster, lower-power interconnects along with a lower-power IOD would be a big help, particularly for Epyc. If it's true that the IOD will contain an iGPU for Ryzen, well, power would go up in some scenarios, but not for owners of AIB GPUs.

Zen 4 is a big lift, irrespective of whether or not the core itself is re-built. A new node, new socket, new platform, new memory standard, and to top it off still updating the core, even if it's just a "derivative"? AVX-512? I think it's safe to say that the engineering hours needed to get this next generation up and running is going to be big.

There is certainly the potential to feed more bandwidth into the core with DDR5, and AVX-512 execution units would definitely eat that up. It will be interesting to see how DDR5 will affect various applications and games, in light of the longer latency. I wouldn't expect larger L3$ sizes to hide some of that, but IDK. Again, it's easier to see how the improvements will benefit Epyc compared to Raphael. PCIe 5.0 in Raphael doesn't seem very useful at present, except for lane splitting down to more PCIe4 lanes. I'm sure there will be improvements from the front end to the back end to bring up IPC and net performance (I give up, I will use IPC, even though it meant something very different to me when I started out in firmware development).
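
For a sense of scale on the bandwidth point, peak theoretical DRAM bandwidth is transfer rate × bus width × channel count. The Python sketch below assumes 64 data bits per channel and uses DDR4-3200 and DDR5-4800 only as example speed grades; which speeds Zen 4 will actually support is not stated in this thread. The function name and the 4 GHz single-core figure are hypothetical.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels, bus_width_bits=64):
    """Peak theoretical DRAM bandwidth in GB/s (10^9 bytes per second).

    Assumes 64 data bits per channel; real sustained bandwidth is lower.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Example dual-channel desktop configurations (speed grades are examples only).
ddr4 = peak_bandwidth_gbs(3200, channels=2)
ddr5 = peak_bandwidth_gbs(4800, channels=2)

# Hypothetical demand: one core issuing one 64-byte (512-bit) load per
# cycle at 4 GHz, if every load had to come from DRAM.
core_demand = 64 * 4.0e9 / 1e9

print(f"DDR4-3200 x2: {ddr4:.1f} GB/s")
print(f"DDR5-4800 x2: {ddr5:.1f} GB/s")
print(f"One core streaming 512-bit loads at 4 GHz: {core_demand:.0f} GB/s")
```

Even in this simplified picture a single core streaming 512-bit loads could ask for several times what two DDR5 channels deliver, so caches still have to do most of the work.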

AMD is clearly aware that they need to maintain their edge over Intel in their key customer domains (laptop, PC, server and, increasingly, HPC). HPC is more than just supercomputers and cloud instances. Building an HPC rack or two for ML/DL, etc., will become more and more important for developers so they can push out scalable solutions to an increasing number of corporate users, game developers and those in the sciences.

Personally, I don't think I'll be upgrading to Zen4, but I'd like to - so much coolness.
 
  • Like
Reactions: Tlh97 and Saylick

gdansk

Platinum Member
Feb 8, 2011
2,078
2,559
136
It is interesting. He gave a general description of what AMD is going to do but we still have few (official) explanations as to what they are going to do in Zen 4. And his excitement for Zen 5 suggests it will be a big improvement.