Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 86

jamescox

Senior member
Nov 11, 2009
644
1,105
136
They'll need SRAM (or something equivalently low latency) for anything up through L3 cache, but if they add an additional memory side cache, they could afford to focus more on capacity. That could be an opportunity for different technology. Probably would be a volatile memory, however. Doubt MRAM has any appeal.
They can get very dense SRAM cache on a process optimized for it. Since the infinity cache is connected to the compute die in MI300 with SoIC, it can be as fast as on-die caches. I don’t know if there is need in the hierarchy for an explicit memory side cache.

I initially thought that MI300 would have separate cache, IO, and compute die. I guess IO may be fine in a cache optimized process, so perhaps there is less need to split IO and cache since it seems to be cheap enough to make on the 6 nm node. If significantly larger caches make sense, then it may also make sense to split out the cache onto a separate die such that multiple cache die can be stacked to increase capacity. Infinity cache doesn’t seem to need to be that large to be effective, so I don’t know if that will happen.
 

turtile

Senior member
Aug 19, 2014
623
299
136
I don't imagine that they will launch the CPU only MI400 SKU without its CDNA variants which may not be ready by 2024.
I'm not saying that it will be CPU only. I think it could be upgraded to Zen 5, which will be the first platform on the server with Xilinx IP, and they could shrink CDNA and add a few minor improvements to get a boost.
 

Ajay

Lifer
Jan 8, 2001
16,094
8,109
136
I'm not saying that it will be CPU only. I think it could be upgraded to Zen 5, which will be the first platform on the server with Xilinx IP, and they could shrink CDNA and add a few minor improvements to get a boost.
I don't think so. Nvidia is NOT standing still, so MI400 will still need a much larger increase in performance than AMD would get from an optical shrink.
 

soresu

Diamond Member
Dec 19, 2014
3,230
2,515
136
I don't think so. Nvidia is NOT standing still, so MI400 will still need a much larger increase in performance than AMD would get from an optical shrink.
I think going forward AMD have the benefit of having a compact high density CPU + accelerator platform which may offer advantages beyond just equal or better accelerator performance for some customers.

This is something nVidia will never have in the x86 space at least, hence their move to cover lotsa ARM cores with Grace.
 

moinmoin

Diamond Member
Jun 1, 2017
5,064
8,032
136
I think the point is that AMD continues to have plenty of flexibility open to exploit.

Bergamo is the first product where within the same gen the same platform and IOD is offered with two different types of CCDs.

MI300 has a pretty insane flexibility, with the CPU, GPU and APU variants as well as being offered for SH5 and OAM platforms. Upgrading only specific chiplets without changing the overall package can and probably should happen. Previously this already happened with Zen 2 and Zen 3 both using the same IOD and package. Whether the result is marketed as MI400 or as a more gradual upgrade is a different matter.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
I think going forward AMD have the benefit of having a compact high density CPU + accelerator platform which may offer advantages beyond just equal or better accelerator performance for some customers.

This is something nVidia will never have in the x86 space at least, hence their move to cover lotsa ARM cores with Grace.
AMD already has a bunch of advantages here with a much higher bandwidth, more HBM, and a truly unified memory system compared to Grace-Hopper. Grace-Hopper will have higher capacity within the package, but only at around 500 GB/s to the LPDDR. AMD can build machines with SH5 gpus paired with dual socket SP3 cpus with close to 500 GB/s memory bandwidth to each cpu socket. AMD also probably has lower cost to produce since it uses small chiplets rather than giant, monolithic die. It will still be very expensive, but I suspect Nvidia's solution will be just ridiculously expensive.
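
For a rough sense of where a figure like "close to 500 GB/s" per CPU socket could come from, here is a back-of-the-envelope sketch. The channel count and memory speed are assumptions for illustration (12 channels of DDR5-4800 with 64 bits of data per channel); the post itself does not state a memory configuration.

```python
def peak_dram_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    channels             -- number of memory channels (assumed, not from the thread)
    mega_transfers_per_s -- data rate in MT/s, e.g. 4800 for DDR5-4800 (assumed)
    bus_bytes            -- data bytes moved per transfer per channel (64-bit = 8)
    """
    # MT/s * bytes per transfer gives MB/s per channel; divide by 1000 for GB/s.
    return channels * mega_transfers_per_s * bus_bytes / 1000


# Hypothetical per-socket configuration: 12 channels of DDR5-4800.
socket_bw = peak_dram_bandwidth_gbs(channels=12, mega_transfers_per_s=4800)
print(f"{socket_bw:.1f} GB/s per socket (theoretical peak)")
```

This is a theoretical ceiling only; sustained bandwidth in real workloads would be lower.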
 

soresu

Diamond Member
Dec 19, 2014
3,230
2,515
136
AMD already has a bunch of advantages here with a much higher bandwidth, more HBM, and a truly unified memory system compared to Grace-Hopper. Grace-Hopper will have higher capacity within the package, but only at around 500 GB/s to the LPDDR. AMD can build machines with SH5 gpus paired with dual socket SP3 cpus with close to 500 GB/s memory bandwidth to each cpu socket. AMD also probably has lower cost to produce since it uses small chiplets rather than giant, monolithic die. It will still be very expensive, but I suspect Nvidia's solution will be just ridiculously expensive.
I seem to recall some research a while back pertaining to ARM based chiplets from somewhere.

It seems unlikely that ARM or its server/datacenter partners will stay monolithic for long given how ridiculously expensive chip design is becoming even with ML based placement assistance and other computational design tools.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,102
136
I seem to recall some research a while back pertaining to ARM based chiplets from somewhere.

It seems unlikely that ARM or its server/datacenter partners will stay monolithic for long given how ridiculously expensive chip design is becoming even with ML based placement assistance and other computational design tools.
Nah, I'm sure Nvidia has something cooking up. Grace-Hopper is just a stopgap product pending a more integrated solution.
 

Saylick

Diamond Member
Sep 10, 2012
3,532
7,859
136
Nah, I'm sure Nvidia has something cooking up. Grace-Hopper is just a stopgap product pending a more integrated solution.
I wonder if we'll see in our lifetime a Jensen-Huang. I've heard JHH has an ego, and many analysts see him as some sort of AI god/pioneer/etc.
 

soresu

Diamond Member
Dec 19, 2014
3,230
2,515
136
I wonder if we'll see in our lifetime a Jensen-Huang. I've heard JHH has an ego, and many analysts see him as some sort of AI god/pioneer/etc.
That seems more like a Musk move.

I think for all his ego JHH is a bit too savvy about his company's PR to court that kind of comparison - especially at a time when public opinion over AI is still very much in flux and influenced by decades of negative attention in Hollywood media and fiction literature.

There have been plenty of advances in more recent times (like Li ion battery co inventor Goodenough) worthy of named product generations, so we may eventually see some of them highlighted by nVidia.

Or we can just get moar 19th century stuff ala Curie and Faraday 😅
 

eek2121

Diamond Member
Aug 2, 2005
3,100
4,398
136
It will be curious to see if AMD backtracks on high TDP/power limits with Zen 5. Zen 4 at 105/142W loses very little in terms of performance vs. stock, and by raising power limits, messaging about the chip became a lot more negative.

AMD should give all the ‘X’ chips 125W TDPs (170W PPT) next round. More room to clock high, but still efficient.
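
The TDP/PPT pairs above follow the usual rule of thumb that the socket power limit (PPT) is about 1.35x the rated TDP. A minimal sketch of that arithmetic, treating the 1.35 factor as an assumption rather than anything stated in this thread:

```python
def ppt_from_tdp(tdp_watts: int) -> float:
    """Estimate the package power limit (PPT) from TDP.

    Assumes the commonly quoted PPT = 1.35 * TDP relationship.
    Integer math (135 / 100) keeps the result free of float noise.
    """
    return tdp_watts * 135 / 100


# TDP tiers mentioned in the post, plus 65 W and 170 W for comparison.
for tdp in (65, 105, 125, 170):
    print(f"TDP {tdp:>3} W -> PPT ~{ppt_from_tdp(tdp):.2f} W")
```

Under that assumption 105 W maps to roughly the 142 W quoted above, and a 125 W TDP lands within a couple of watts of the suggested 170 W PPT.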
 

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
It will be curious to see if AMD backtracks on high TDP/power limits with Zen 5. Zen 4 at 105/142W loses very little in terms of performance vs. stock, and by raising power limits, messaging about the chip became a lot more negative.

AMD should give all the ‘X’ chips 125W TDPs (170W PPT) next round. More room to clock high, but still efficient.
I do not see a world in which peak power for Zen 5 per core is lower than that for Zen 4.

Maybe if it had a good node shrink to prop it up, but N4P vs N5 ain't it. With that in mind, I'm not sure backtracking on power limits would make sense.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,747
6,598
136
Maybe if it had a good node shrink to prop it up, but N4P vs N5 ain't it. With that in mind, I'm not sure backtracking on power limits would make sense.
From TSMC official numbers (best case scenarios)
N4P --> N3E is ~10% efficiency gain.
N5 --> N4P is ~22% efficiency gain
N5 --> N3E is ~35% efficiency gain

Even a 15% efficiency gain from process is significant; if AMD cannot extract at least that much, then it is poor execution.
From architecture they should be able to extract some efficiency as well.
But if there is a net loss of efficiency due to bigger/wider cores, then the execution is lacking.

The only acceptable reason would be that they are driving the core to even higher frequencies.
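
As a quick sanity check on how the three node-to-node figures relate, the two smaller steps can be chained together. How "efficiency gain" is defined here is an assumption: the sketch tries both a perf/W multiplier reading and a power-reduction-at-equal-performance reading, since the post does not say which one the numbers use.

```python
from math import prod


def chain_perf_per_watt(gains):
    """Chain gains read as perf/W multipliers, e.g. 0.22 means 1.22x perf/W."""
    return prod(1 + g for g in gains) - 1


def chain_power_reduction(gains):
    """Chain gains read as power saved at equal performance, e.g. 0.22 means 0.78x power."""
    return 1 - prod(1 - g for g in gains)


# N5 -> N4P (~22%) followed by N4P -> N3E (~10%), as quoted in the post.
steps = [0.22, 0.10]
print(f"perf/W reading:          {chain_perf_per_watt(steps):.1%}")
print(f"power-reduction reading: {chain_power_reduction(steps):.1%}")
```

Under the multiplier reading the chained result comes out close to the quoted ~35% for N5 to N3E; under the power-reduction reading it is nearer 30%. Either way these are best-case marketing numbers, so treat the comparison as rough.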
 

Kepler_L2

Senior member
Sep 6, 2020
537
2,198
136
From TSMC official numbers (best case scenarios)
N4P --> N3E is ~10% efficiency gain.
N5 --> N4P is ~22% efficiency gain
N5 --> N3E is ~35% efficiency gain

Even a 15% efficiency gain from process is significant; if AMD cannot extract at least that much, then it is poor execution.
From architecture they should be able to extract some efficiency as well.
But if there is a net loss of efficiency due to bigger/wider cores, then the execution is lacking.

The only acceptable reason would be that they are driving the core to even higher frequencies.
It's likely to be similar to Zen3 vs Zen2, where the perf gap is small at low power (15-25W) but grows significantly at 45W+.
 

DrMrLordX

Lifer
Apr 27, 2000
22,065
11,693
136
Even a 15% efficiency gain from process is significant; if AMD cannot extract at least that much, then it is poor execution.
It's numbers like that that remind me how good N4P is relative to its release. N3-family nodes will take a while to become significantly better in any metric outside of maybe density.
 

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146
From TSMC official numbers (best case scenarios)
N4P --> N3E is ~10% efficiency gain.
N5 --> N4P is ~22% efficiency gain
N5 --> N3E is ~35% efficiency gain

Even a 15% efficiency gain from process is significant; if AMD cannot extract at least that much, then it is poor execution.
From architecture they should be able to extract some efficiency as well.
But if there is a net loss of efficiency due to bigger/wider cores, then the execution is lacking.

The only acceptable reason would be that they are driving the core to even higher frequencies.
I don't think there will be a net loss in efficiency?

I specifically stated higher peak power. I'm going to double down on this actually. I think that at any given clock frequency Zen 5 will require more power than Zen 4.

That doesn't mean I believe Zen 5 will sport worse power efficiency than Zen 4 (at least at any frequency where it actually matters). Just that the peak of what is possible will be increased.
 

Gideon

Golden Member
Nov 27, 2007
1,774
4,145
136
I do not see a world in which peak power for Zen 5 per core is lower than that for Zen 4.

Maybe if it had a good node shrink to prop it up, but N4P vs N5 ain't it. With that in mind, I'm not sure backtracking on power limits would make sense.
IMO the only way forward is a significantly wider core, which absolutely means more power when totally maxed out (far past its efficiency optimum).
 

soresu

Diamond Member
Dec 19, 2014
3,230
2,515
136
Zen5 to use CCX design? What's the purpose? Maybe a typo (Zen5c instead of Zen5)?
Lol, every time I see the Dense core variant mentioned.....

back-to-the-future-george-mcfly.gif
 

SpudLobby

Senior member
May 18, 2022
989
680
106
I don't think there will be a net loss in efficiency?

I specifically stated higher peak power. I'm going to double down on this actually. I think that at any given clock frequency Zen 5 will require more power than Zen 4.

That doesn't mean I believe Zen 5 will sport worse power efficiency than Zen 4 (at least at any frequency where it actually matters). Just that the peak of what is possible will be increased.
Yeah, this is a straightforward consequence of a wider, deeper microarchitecture, which AMD somewhat explicitly alluded to (especially on the frontend) and which I believe will take Zen to parity with Golden Cove on sheer size. All else equal - and the node isn't changing afaict for mobile - it's going to require more power at a given frequency unless gating was just lackluster on Zen 4. Of course, at similar performance I suspect energy efficiency will be improved, likely because you won't have to clock as high or raise voltages as high. Idle could improve as well, but that's more uncore stuff.
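
A toy model of that argument, using the textbook dynamic power relation P = C * V^2 * f. Every number here is made up for illustration (the 1.3x switched capacitance, the 1.2x IPC, and the linear V/f curve are all hypothetical, not Zen 4 or Zen 5 data); the point is only the shape of the tradeoff.

```python
def voltage_for(freq_ghz: float) -> float:
    """Hypothetical linear V/f curve; not measured data for any real core."""
    return 0.70 + 0.12 * freq_ghz


def dynamic_power(cap: float, freq_ghz: float) -> float:
    """Dynamic power P = C * V^2 * f in arbitrary units."""
    volts = voltage_for(freq_ghz)
    return cap * volts ** 2 * freq_ghz


# Hypothetical cores: the wide core switches 1.3x the capacitance
# and delivers 1.2x the work per clock of the narrow one.
NARROW_CAP, WIDE_CAP, WIDE_IPC = 1.0, 1.3, 1.2

clock = 5.0  # GHz, illustrative
narrow = dynamic_power(NARROW_CAP, clock)
wide_same_clock = dynamic_power(WIDE_CAP, clock)
# To match the narrow core's performance, the wide core can clock 1.2x lower.
wide_same_perf = dynamic_power(WIDE_CAP, clock / WIDE_IPC)

print(f"narrow core at {clock} GHz:       {narrow:.2f}")
print(f"wide core at same clock:       {wide_same_clock:.2f}")
print(f"wide core at same performance: {wide_same_perf:.2f}")
```

With these assumed parameters the wide core burns more at the same clock but less at the same performance, because the lower clock also allows a lower voltage and power scales with voltage squared. Different assumed values could flip the second result, so this is a sketch of the reasoning and not a prediction.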