Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Vattila · Oct 6, 2019

Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging looks to be the same, with the same 9-die chiplet design and the same maximum core and thread-count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!

Tarkin77 · Mar 4, 2021

yuri69 said:
So Milan is shipping next month. This means Q2, right? Do we really expect Genoa to be anytime soon in 2022? Are the SP3 and SP5 platforms supposed to coexist?

Mark Papermaster said THIS week, Milan is shipping since Q4 2020. And next gen Epyc (Genoa) is on track for 2022 and will leverage 5nm TSMC

direct qoute:

"And the choice of EPYC configurations is going again with the third-generation Milan that will be launching later this month, but has already been shipping in select accounts since the end of last year."

"In fact second gen and third gen will be in the market coincidence. And we're on track with fourth-gen EPYC to go-to-market in 2022."

eek2121 · Mar 4, 2021

You guys are forgetting that Warhol could be used in a manner similar to Lucienne. It would be beneficial for AMD to do so, because it would increase capacity.

Ajay · Mar 4, 2021

eek2121 said:
You guys are forgetting that Warhol could be used in a manner similar to Lucienne. It would be beneficial for AMD to do so, because it would increase capacity.

So I must have missed something here. How did Lucienne allow AMD to produce more chips?

Doug S · Mar 4, 2021

Ajay said:
Doh! Thanks. To think, I have an iPhone 12 with an A14 and still blew that. @Doug S nailed the problem with expanding 5N and getting 3N up and running. Fortunately for TSMC, Intel being out of the leading edge ATM has allowed TSMC to get a larger allocation.

It appears Intel and TSMC made a deal where Intel will trade TSMC some of the EUV orders they don't need delivered yet, in exchange for the additional N3 capacity that will allow TSMC to build. So it isn't likely to help AMD.

Ajay · Mar 4, 2021

Doug S said:
It appears Intel and TSMC made a deal where Intel will trade TSMC some of the EUV orders they don't need delivered yet, in exchange for the additional N3 capacity that will allow TSMC to build. So it isn't likely to help AMD.

TSMC has a large % of the total AMSL EUV lithography machines that have been delivered. I can only assume that Intel (who was expected to be large purchaser) cancelled some orders in advance making more available for TSMC to buy. I don't have specific evidence of this, other than the fact that the number EUV machines TSMC has represents the bulk of said equipment that has been shipped from AMSL.

turtile · Mar 4, 2021

Ajay said:
So I must have missed something here. How did Lucienne allow AMD to produce more chips?

Lucienne is a smaller chip because it has half the cache and the Zen 2 architecture. A 6nm shrink of Zen 3 would also yield more chips per wafer. Not sure if it can clock as high considering it seems to mainly increase density with EUV.

Ajay · Mar 4, 2021

turtile said:
Lucienne is a smaller chip because it has half the cache and the Zen 2 architecture. A 6nm shrink of Zen 3 would also yield more chips per wafer. Not sure if it can clock as high considering it seems to mainly increase density with EUV.

A shrink to 6N would reduce supply, since 6N relies on EUV, which is also being used by 5N. Unless there is some wafer agreement between AMD and TSMC that will boost AMD's total wafer allocation.

NostaSeronx · Mar 4, 2021

6nm uses different EUV machines than 5nm, btw.

Fab 15 (only 7nm/~~7nm+~~/6nm) => Older EUV machines. (Re-uses Fab 12 R&D EUV equipment) // Aged 2015 and up.
Fab 18 (only 5nm/4nm) => Newer EUV machines. (Used newly ordered EUV equipment) // Aged 2018 and up.

The 3nm fab will be re-using Fab 18 R&D EUV equipment.

6nm is existing supply at TSMC. (As they ran down Fab 12's EUV R&D phase)
5nm/3nm is new supply from ASML. As they build up Fab 18's EUV R&D phase for 3nm/2nm/so on. Which eventually gets converted over to 3nm/2nm/so on mass production fabs.

Fab 15 will be running pretty much fully automotive. (6nm has extremely high demand in auto market)
So, Fab 18 will be needed to run consumer products. (5nm has lower demand in auto market)

Only those who outrun the (or join the) automotive market will have enough supply.

jamescox · Mar 4, 2021

Tarkin77 said:
Mark Papermaster said THIS week, Milan is shipping since Q4 2020. And next gen Epyc (Genoa) is on track for 2022 and will leverage 5nm TSMC

direct qoute:

"And the choice of EPYC configurations is going again with the third-generation Milan that will be launching later this month, but has already been shipping in select accounts since the end of last year."

"In fact second gen and third gen will be in the market coincidence. And we're on track with fourth-gen EPYC to go-to-market in 2022."

The rumors aren’t quite what I expected. The mock-up of Genoa does not appear to use any die stacking unless there is more than one layer in the IO package or cpu packages. That could make some sense to try stacking in the IO die first since it should be lower power than cpu cores and lower risk. I kind of expected that we would see infinity cache in the Genoa IO die. The UMC (taking DDR vs. QDR graphics memory and such into account) seems like it would be very similar internally. It could be plausible, although wild speculation, that the IO die is 2 layer device with a layer made on an older process with the physical interfaces and another layer on a newer process for logic and infinity cache. An Epyc processor 128 MB L4 would be amazing. I don’t think they would want to make L4 cache on GF process, so it would make sense that either TSMC makes the whole thing on older process tech or GF only makes the interposer (actual IO) portion.

The mock up doesn’t look like any die stacking at all. When I heard the 96 core rumor I was thinking that they might make a stacked device with possibly the multi-layer IO die described above (essentially an active interposer) and 4 cpu chiplets stacked on top. That would allow them to make devices with 32-cores and room for an HBM gpu on either side. It also would allow placing cpu cores on either side for 96 cores, but latency would probably be asymmetric, so it seems unlikely. With a lot of cache, it might not make that much difference though. The other thought was that they might stack two cpu die for a maximum of 128 core and 96 core was just one sku. They could connect to the IO package with one link in the same way that 2 CCX share one link in Zen 3. It would be a much faster link though.

It is kind of disappointing if we don’t get any of this stuff in Zen 4, but if Zen 4 is a completely new architecture, then that would make up for it a bit. If they did go up to 12 links, then it would make sense for each quadrant (and the desktop parts) to have 3 cpu links, 3 DDR5 (whatever that means for DDR5), and 2 x16 pci express. They don’t really need to increase the IO; Zen 3 already has ridiculous levels of IO bandwidth.

I doubt that the CCX will be more than 8 cores and 32 MB L3 unless stacking is used. I thought that one possibility for stacking is to place some or all of the L3 onto a separate die stacked with the cpu die. They could then bin cache die by usable size, possibly offering up to 64 MB. The cache die could also possibly be made in a different process that is better for making SRAM. That could also save valuable fab capacity. The cpu die would be very power dense without L3 though. That is wild speculation, but if it is that much of a new architecture, then they may change the cache hierarchy significantly. I expect we are going to, at a minimum, get a much larger L2 cache. 32 MB is still very large for one 8 core CCX for L3.

eek2121 · Mar 4, 2021

Ajay said:
So I must have missed something here. How did Lucienne allow AMD to produce more chips?

Different nodes, different fabs. As long as they have components, they will be able to boost output by 60-80% once Zen 4 rolls out.

andermans · Mar 4, 2021

eek2121 said:
Different nodes, different fabs. As long as they have components, they will be able to boost output by 60-80% once Zen 4 rolls out.

I thought Lucienne and Cezanne are the same node though?

turtile · Mar 4, 2021

Ajay said:
A shrink to 6N would reduce supply, since 6N relies on EUV, which is also being used by 5N. Unless there is some wafer agreement between AMD and TSMC that will boost AMD's total wafer allocation.

How would it reduce the supply of chips if AMD will not be using 5nm at the point it moves to 6nm? I assume AMD will be using 6nm for the APU since it's much larger than Zen chiplets. AMD can cheaply move RDNA 2 and Zen 3 designs to 6nm since the design rules are the same. Otherwise, they'd have to wait until RDNA 3 + Zen 4 which would be late 2022 at the earliest. AMD has stated that it wants to have a yearly release for mobile CPUs.

The cost of porting RDNA 2 and Zen 3 to 5nm doesn't make sense.

moinmoin · Mar 5, 2021

jamescox said:
The mock up doesn’t look like any die stacking at all.

I personally doubt the mock up is based on any real layout. To me it looks like a guess based on the "known" specs including pin count and die sizes when expecting the layout to be an evolution of the existing layout. Once die stacking, interposers etc. are used the layout is bound to change more than this.

eek2121 · Mar 5, 2021

andermans said:
I thought Lucienne and Cezanne are the same node though?

They are. I am sorry I wasn’t clear, I was referring to AMD using 5nm for Zen 4 and 7nm or 6nm for a Zen 3 refresh, all released under the Ryzen 6000 series.

moinmoin · Mar 5, 2021

eek2121 said:
They are. I am sorry I wasn’t clear, I was referring to AMD using 5nm for Zen 4 and 7nm or 6nm for a Zen 3 refresh, all released under the Ryzen 6000 series.

This may be necessary for the OEM/ODM market (like laptops) where only the latest supposedly sells well so a lot of stuff gets reused with newer model numbers. But for DIY desktops I sure hope AMD will just continue offering older gens instead mixing them using new model numbers.

Saylick · Mar 5, 2021

moinmoin said:
I personally doubt the mock up is based on any real layout. To me it looks like a guess based on the "known" specs including pin count and die sizes when expecting the layout to be an evolution of the existing layout. Once die stacking, interposers etc. are used the layout is bound to change more than this.

FWIW, the leaker said this about the layout:

https://twitter.com/x/status/1366310317635088385

They also said this regarding 3D stacking:

https://twitter.com/x/status/1366310770858934272

jamescox · Mar 6, 2021

Saylick said:
FWIW, the leaker said this about the layout:

https://twitter.com/x/status/1366310317635088385

They also said this regarding 3D stacking:

https://twitter.com/x/status/1366310770858934272

I saw that they are claiming it is based on the real layout but is photoshop. I don’t know how much I trust that but it might make sense for the first revision of Zen 4 to not be stacked since that they can still make a cheap desktop part.

If the IO die is remotely accurate, then it still may be a stacked die. It would make sense to make the physical layer interfaces on the older process as an active interposer and stack cache and other logic as a separate die on top made on the latest process. It does look too small but perhaps it is doable. It takes a huge number of solder balls for the required signals on the IO die, especially if it has 50% more interfaces.

It is also possible that it is using embedded silicon interconnect bridges. I don’t remember what TSMC calls their version of it (intel calls it EMIB), but it would allow the IO die to be smaller since it would use much smaller micro-solder balls and much simpler interfaces. I assume that current IO die is essentially just in a bga package. An embedded silicon bridge would not be obvious from just looking at the package since it would just be under the cpu die and partially under the IO die. Using embedded silicon interconnect bridges could reduce power consumption and increase bandwidth. If they are using embedded interconnect die, then I would expect the cpu die to not have much of any space between adjacent die and the IO die. The embedded die could be around the the size of 4 cpu die or smaller for a row of 3 cpu die.
It is still all just speculation at this point. Even if that image reflects the organization, it may not be accurate in size and exact placement.

jamescox · Mar 7, 2021

jamescox said:
I saw that they are claiming it is based on the real layout but is photoshop. I don’t know how much I trust that but it might make sense for the first revision of Zen 4 to not be stacked since that they can still make a cheap desktop part.

If the IO die is remotely accurate, then it still may be a stacked die. It would make sense to make the physical layer interfaces on the older process as an active interposer and stack cache and other logic as a separate die on top made on the latest process. It does look too small but perhaps it is doable. It takes a huge number of solder balls for the required signals on the IO die, especially if it has 50% more interfaces.

It is also possible that it is using embedded silicon interconnect bridges. I don’t remember what TSMC calls their version of it (intel calls it EMIB), but it would allow the IO die to be smaller since it would use much smaller micro-solder balls and much simpler interfaces. I assume that current IO die is essentially just in a bga package. An embedded silicon bridge would not be obvious from just looking at the package since it would just be under the cpu die and partially under the IO die. Using embedded silicon interconnect bridges could reduce power consumption and increase bandwidth. If they are using embedded interconnect die, then I would expect the cpu die to not have much of any space between adjacent die and the IO die. The embedded die could be around the the size of 4 cpu die or smaller for a row of 3 cpu die.
It is still all just speculation at this point. Even if that image reflects the organization, it may not be accurate in size and exact placement.

I looked up the tsmc stacking roadmap again:

3DFabric: The Home for TSMC’s 2.5D and 3D Stacking Roadmap

www.anandtech.com

The most likely tech for stacked cpu die or cpu die with stacked caches seems like it would be the SoIC tech. This is stacking without micro-solder bumps. It has better thermal characteristics than other stacking tech that uses micro-solder bumps. The rough equivalent of Intel EMIB is LSI (Local Silicon Interconnect), I guess. They have a bunch of different technologies with a bunch of confusing names. I don't know if that will be usable for the Epyc package, but it would save a lot of power vs. running serial at pci-express 5.0 speeds. Perhaps something like the InFo_LSI in the second to last slide. That doesn't really match with the mock-up image though. If it was using LSI, I would expect the die to be closer together and lined up differently. I didn't really expect them to make the package larger since stacked die will be much more space efficient. If the rumored layout is correct, then it seems likely that there is no stacking in initial Genoa processors, unless it is in the suspiciously small IO die.

Gideon · Mar 16, 2021

The Anandtech's interview with Forrest Norrod has quite a few interesting dibits regarding AMDs future (Genoa) as well as Trento and CDNA2:

1. AMD is strongly hinting at (finally) using better nodes for I/O dies as well as beter packaging tech for Zen 4:

You’re going to see us continue to drive the process node very hard on both the cores as well as the uncore. We’re going to continue to drive innovation around the interconnect. So Infinity Fabric as a protocol has got a lot of legs, but you’ll see us continue to do things to make that more and more power efficient, and lower the picojoule per bit of switch traffic.

2. On package HBM for CPUs. Initially for server workloads: AI and such, but it would work quite well in some client scenarios as well (not even strictly APUs think of a 5800X with on die HBM2 stack instead of the other CPU die. The memory latency/bandwidth numbers would dwarf anything now):

You can see a bifurcation coming in the roadmap, where there are parts that have different memory hierarchies. Maybe with storage class memory as the main store with an HBM - on die, or a smaller memory almost like an L4 cache, or maybe a software managed resource that the application can take advantage of. But anyway, I think you’ll see innovation in the memory system in the next few years.

3 Trento, as long specualted, is Milan with custom I/O die and possibly HBM2 on board:

the first exascale system in the world, which will be deployed at Oak Ridge National Labs later this year. It’s called Frontier, and it really uses a next generation CDNA architecture, Instinct parts, which is something we haven’t announced yet. It also uses a Milan generation CPU, and the reason I say that is it actually is the CPU in that system is something called Trento - it’s a sibling of Milan if you will. It’s slightly different - it has a physically different piece of silicon in the I/O die, so it’s slightly different from Milan. But the key aspect there is something we think is hugely important going forward - it’s a coherent system.

4. From the previous quote AMD will launch a new CDNA product for Oak Ridge. My guess: it's CDNA2 and probably their first 5nm product. They did this with Vega based 7nm instinct in 2018 and it just makes sense as shrinking a GPU is easier than any of their other products. It also gels nicely with this tweet:

https://twitter.com/x/status/1371623039742271489

yuri69 · Mar 16, 2021

1. AMD needs to get rid of the 14nm 2019 die. There is a huge potential for improvement.
2. The HPC fields seems to be pretty strong with Zen, so making a dedicated/semi-custom HPC line with HBM makes sense. Stories about "the BIG APU" make no sense besides HPC. Consumer APUs are value market with tight margins.
3. From the quote it seems Trento features the Intifnity Architecture 3 - maybe somehow ported Genoa IOD tech. Not even a hint about different packaging, memory, etc.

NTMBK · Mar 16, 2021

Don't worry, GloFo will get their FDSOI node working any day now, and we'll have a magically efficient northbridge...

andermans · Mar 16, 2021

I think it is going to be hard to really get the IO die cost down as I'd expect something like 40W or so just being the MCM + memory IO cost of going outside the chips. Then with Zen4 you likely have DDR5 and an increase of the number of channels & number of cores.

If

https://twitter.com/x/status/1366020413327818752

is to be believed we're not going to see more efficient packaging options for Zen4, so I'm not sure that side of the problem gets more efficient. So with an increase of connectivity, same chiplet technology and a process shrink I would expect the power usage of the IO die to land approximately on the same level for Zen4.

Note that even with that there is some more power going to the cores by increasing the overall TDP (

https://twitter.com/x/status/1365981401808580614

), so the ratio between cores and IO should improve a bit.

LightningZ71 · Mar 16, 2021

Both Samsung and TSMC offer intermediate nodes between GF 14/12LP and TSMC N7. Those nodes are mature and have been in volume production for the better part of half a decade at this point in some cases. It hasn't made a whole lot of sense to move to those nodes yet, however, as the tech needed to increase the grid pitch (number of contacts per square mm) of the BGA/microdots under the I/O die has been more slowly developing. In other words, a part of the reason for the I/O die being so large was to spread it out enough to make all the contacts required to drive 128 PCIe lanes, 8 DDR4 channels, power, ground, connections to all the CCDs, other I/o, etc. That's not to say that it was impossible to do previously on a more compact node, just that the gain wasn't worth the expense to do so. Now, the tech is more matured, and the need is there.

jpiniero · Mar 16, 2021

LightningZ71 said:
Both Samsung and TSMC offer intermediate nodes between GF 14/12LP and TSMC N7.

TSMC doesn't really. 10 uses the same tools as 7. Don't think 16FF would be much of an improvement.

CHADBOGA · Mar 16, 2021

NTMBK said:
Don't worry, GloFo will get their FDSOI node working any day now, and we'll have a magically efficient northbridge...

22FDX 4 Life

Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Senior member

Member

Platinum Member

Lifer

Platinum Member

Lifer

Senior member

Lifer

Diamond Member

Senior member

Platinum Member

Member

Senior member

Diamond Member

Platinum Member

Diamond Member

Diamond Member

Senior member

Senior member

Golden Member

Senior member

Lifer

Member

Golden Member

Lifer

Platinum Member