Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 847

OneEng2

Member
Sep 19, 2022
131
161
86
The X3D is a miracle of modern silicon engineering. So they could be a bit limited by the laws of physics. Big deal. It's still going to be better than what came before it. Seriously give the melodrama a rest :)
I am also wondering if AMD will not make X3D part of its main product line. Surely more than just games would benefit from more L3?
If I am not mistaken, the first HT-related vulnerability was reported in 2005. It is 2024 now that they have removed it ;) I wouldn't call that fast. And if they really did it for security, they would have removed it from the newest Xeons too. The simplest explanation, that they did it to make scheduling easier with two types of cores and to save on validation time, is the most fitting imo.

When it comes to AMD, they have their own implementation. It seems to be doing better on the security front; of course it is also "younger", so we might see new vulnerabilities popping up. But as with speculative execution, the idea itself is a clever way to boost CPU utilization, so I guess companies will try to salvage it as much as possible.
Agree. Intel didn't remove SMT because of vulnerability. You have it right.
Yeah, I don't know, man. The result from this morning looks very bad. 328 W for 42286 in R23?

View attachment 109867

Source: HWBOT https://t.co/Rkxoev9JKf https://t.co/4KapR8CWtp

HXL (@9550pro) on X
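As a rough sanity check on the quoted figures, here is a minimal sketch of the points-per-watt arithmetic. It assumes the 328 W reading and the 42286 R23 score come from the same run, which the post implies but the missing screenshot cannot confirm; the function name is purely illustrative.

```python
def points_per_watt(score: float, watts: float) -> float:
    """Benchmark points delivered per watt of reported power draw."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return score / watts


# Figures as quoted in the post: 42286 points in R23 at 328 W.
efficiency = points_per_watt(42286, 328)
print(f"{efficiency:.1f} points per watt")
```

That is roughly 129 points per watt, a figure that only means something when compared against another chip measured the same way.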
I would like to believe that we still have the most competitive environment :).
Some are saying that the 62K score was achieved using chilled water. Either way, it's probably higher than even an average person with an AIO can achieve. Still a really great score, but given that, and it being just one game, I'm trying to temper expectations a little.
I suspect it wasn't that obscure.
I wonder how the economics would work out if AMD were to completely forgo L3 on the main CCD die (saving die area), make all the processors V-Cache, and stack the CCDs on top of the IO die.

Then, the IO die would be broken into a section of dedicated V-Cache, private to the CCD sitting above it (maintaining the low latency). Plus there would be a link to IO die for the rest of the communication. IO die could continue to be N6 based.

The alternative, a Strix Halo-like link between the CCD and the IO die, is cheap but not free.

Advantages and disadvantages of the proposed CCD on top of IO die:
- cost of 3D stacking
+ die area saved on the CCD's expensive node, which is not well suited to SRAM, by moving the SRAM to the cheaper N6 node
+ every CPU already starts with V-Cache and its performance advantage
+ unlimited bandwidth and low latency to IO die

Strix Halo / Navi 31/32 fanout link:
+ cheaper than 3D stacking
- fanout link still has its own cost
- SRAM on expensive node, where it does not scale
- adding V-Cache still has the same additional cost
I agree. I can see AMD making lots of different variations using this process.
The problem with hybrid bonding is not cost, but capacity. The process used for it is slow, meaning that the throughput of a line doing it is not very high, meaning you have to build a lot of capacity, which is slow.

I cannot see them doing a product stack that strictly depends on hybrid bonding for all SKUs either next gen or the one after that. Not because of cost, but because you cannot magic up capacity for it.



There are ways it would help latency, because the most distant piece of cache would be closer.
Agree; however, I suspect they will find a way to make it much faster as they use it more.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,211
15,353
136
I am also wondering if AMD will not make X3D part of its main product line. Surely more than just games would benefit from more L3?

Agree. Intel didn't remove SMT because of vulnerability. You have it right.

I would like to believe that we still have the most competitive environment :).

I suspect it wasn't that obscure.

I agree. I can see AMD making lots of different variations using this process.

Agree; however, I suspect they will find a way to make it much faster as they use it more.
You have not even mentioned the (code-name)-X server CPUs. They use the same technology and have become a very saleable CPU.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,211
15,353
136
AMD has been very quiet about these for Zen 5. They may be a mid-generation upgrade for Zen 5 in server space.
Early for Zen 5. Supermicro has been backordered on my Turin motherboard since Zen 5 was released. It is coming this week, they say (per a Supermicro salesman, directly to me).

Wait a while. My point is at least for several generations it has been a hot ticket for Zen server.
 

Joe NYC

Platinum Member
Jun 26, 2021
2,568
3,549
106
I wonder how the economics would work out if AMD were to completely forgo L3 on the main CCD die (saving die area), make all the processors V-Cache, and stack the CCDs on top of the IO die.

Then, the IO die would be broken into a section of dedicated V-Cache, private to the CCD sitting above it (maintaining the low latency). Plus there would be a link to IO die for the rest of the communication. IO die could continue to be N6 based.

The alternative, a Strix Halo-like link between the CCD and the IO die, is cheap but not free.

Advantages and disadvantages of the proposed CCD on top of IO die:
- cost of 3D stacking
+ die area saved on the CCD's expensive node, which is not well suited to SRAM, by moving the SRAM to the cheaper N6 node
+ every CPU already starts with V-Cache and its performance advantage
+ unlimited bandwidth and low latency to IO die

Strix Halo / Navi 31/32 fanout link:
+ cheaper than 3D stacking
- fanout link still has its own cost
- SRAM on expensive node, where it does not scale
- adding V-Cache still has the same additional cost

Kepler just posted a patent that is one level above what I was thinking: there is a separate "pair node" which sits on top of the IOD / AID. I am guessing this "pair node" would be SRAM for L3.

My thinking was that there could just be a section of the bottom die dedicated to L3.


 

Saylick

Diamond Member
Sep 10, 2012
3,567
7,973
136
Kepler just posted a patent that is one level above what I was thinking: there is a separate "pair node" which sits on top of the IOD / AID. I am guessing this "pair node" would be SRAM for L3.

My thinking was that there could just be a section of the bottom die dedicated to L3.


Seems pretty generalized to me... This diagram appears to include a lot of the various packaging techniques in one, such as TSVs + hybrid bonding, silicon bridges, and silicon interposer.
 

Gideon

Golden Member
Nov 27, 2007
1,790
4,205
136
Kepler just posted a patent that is one level above what I was thinking: there is a separate "pair node" which sits on top of the IOD / AID. I am guessing this "pair node" would be SRAM for L3.

My thinking was that there could just be a section of the bottom die dedicated to L3.


Isn't it essentially what MI300C is doing?
 

therealmongo

Member
Jul 5, 2019
127
287
136
Core isolation also seems to kill performance on my 7800X3D: 42k on Win10 and 38k on Win11 with core isolation turned on. I was using the high desktop preset; I just switched to standard laptop and am waiting for the score.

Gpu is a 4070 Ti Super

Score for standard laptop (Win11, core isolation on) is 42700. So, to answer my own question, the preset plays a significant role in the score, well, at least on this hardware.

I had Win10 on the performance profile and Win11 on balanced; maybe this also affects the score.

Summary
Win10, high desktop - 42k
Win11, high desktop - 38k (core isolation on)
Win10, standard laptop - 48k
Win11, standard laptop - 42k (core isolation on)
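To put the summary in relative terms, here is a small sketch that computes the Win11-versus-Win10 gap for each preset. It uses the rounded "k" values from the summary as-is (so 42k rather than the 42700 mentioned earlier); the dictionary layout and function names are illustrative only.

```python
# Scores as posted in the summary, rounded to the nearest thousand by the poster.
scores = {
    ("Win10", "high desktop"): 42_000,
    ("Win11", "high desktop"): 38_000,      # core isolation on
    ("Win10", "standard laptop"): 48_000,
    ("Win11", "standard laptop"): 42_000,   # core isolation on
}


def percent_change(baseline: float, value: float) -> float:
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def win11_vs_win10(preset: str) -> float:
    """Win11 score relative to the Win10 score for the same preset."""
    return percent_change(scores[("Win10", preset)], scores[("Win11", preset)])


for preset in ("high desktop", "standard laptop"):
    print(f"{preset}: Win11 vs Win10 = {win11_vs_win10(preset):+.1f}%")
```

The gap comes out at roughly 10 to 12 percent in both presets, but the Win11 runs also had core isolation on and used a different power profile, so the numbers do not isolate the OS itself.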
 

Joe NYC

Platinum Member
Jun 26, 2021
2,568
3,549
106
Isn't it essentially what MI300C is doing?

MI300 does not have this "pair node"; the compute chiplets are stacked directly on top of the AID base.

And MI300 does not have silicon bridges to connect the base dies. Although it may be a safe bet that AMD is considering such an option, since it was already planned in Navi 4c.

Being able to use silicon bridges rather than CoWoS would also let AMD bypass the CoWoS capacity bottleneck in the supply chain for the datacenter GPUs.
 

adroc_thurston

Diamond Member
Jul 2, 2023
3,584
5,187
96
Being able to use silicon bridges rather than CoWoS
That's also CoWoS, just -L.
Unless you mean bumpless, then yeah it's SoIC with custom wunderwaffen process flow.
AMD bypass the CoWoS capacity bottleneck in the supply chain for the datacenter GPUs.
You're hammering into an even harder bottleneck then (hybrid bonding is hella slow and ass).
 

tsamolotoff

Member
May 19, 2019
194
364
136
During the cpu heavy parts how many threads get loaded? Does it keep to one ccd or go full 32 threads?
If it goes to the other CCD the score tanks, so it is better to keep it within the X3D CCD, either manually or via Process Lasso / an affinity mask (actually, this time the devs of the game knew of this and they specifically try to keep the game within CCD0; a few months ago I tried to keep it within CCD1 and it wasn't easy lol). Anyway, here are some results for CCD1 only (non-cache chiplet) and with Game Bar/pinning off:
 

Attachments

  • 720p_scale50_ccd1.png (1.5 MB)
  • 720p_scale100_default.png (1.5 MB)
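For anyone wanting to try the affinity-mask route mentioned above, here is a minimal sketch of how a per-CCD bitmask can be computed. It rests on several assumptions that are not from the post and should be checked on the actual system: two CCDs of 8 cores with 2 threads each (matching the "32 threads" in the question), logical CPUs numbered contiguously so that CCD0 owns CPUs 0-15 and CCD1 owns 16-31, and CCD0 being the cache chiplet. The function names are hypothetical.

```python
def ccd_affinity_mask(ccd_index: int, cores_per_ccd: int = 8,
                      threads_per_core: int = 2) -> int:
    """Bitmask with one bit set per logical CPU belonging to the given CCD.

    Assumes logical CPUs are numbered contiguously per CCD (an assumption,
    not something every OS or BIOS guarantees).
    """
    if ccd_index < 0:
        raise ValueError("ccd_index must be non-negative")
    threads_per_ccd = cores_per_ccd * threads_per_core
    ccd_bits = (1 << threads_per_ccd) - 1          # e.g. 0xFFFF for 16 threads
    return ccd_bits << (ccd_index * threads_per_ccd)


def mask_to_cpus(mask: int) -> list[int]:
    """Expand a bitmask into the list of logical CPU indices it selects."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(hex(ccd_affinity_mask(0)))   # mask for the assumed cache CCD (CCD0)
print(hex(ccd_affinity_mask(1)))   # mask for the assumed non-cache CCD (CCD1)
print(mask_to_cpus(ccd_affinity_mask(1)))
```

Under those assumptions the hex value is what an affinity tool would take as a mask, while on Linux the CPU list form could be passed to `os.sched_setaffinity`.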

tsamolotoff

Member
May 19, 2019
194
364
136
Overall, it seems a lot of the Intellers have gone quiet and are contemplating switching because they know Intel won't be able to pull a leprechaun out of their hat in less than two years to counter this level of performance increase in gaming.
Just a flesh wound, they'll find a game benchmark in which ARL will reign supreme and build their impregnable fortress off that 🏹🏹
 

Gideon

Golden Member
Nov 27, 2007
1,790
4,205
136
Notice the "DDR" part of the AID, implying this is a server or desktop CPU.
I have to say, it's a super interesting concept. I can't really imagine it being used in server or desktop CPUs, though: would they really route hundreds of watts of power (total) through DDR chips?

But I'd really like to see a mobile chip where the entire CPU is placed on top of the RAM. This looks like the next logical step up from the "memory on package" era, offering power (and potentially latency) improvements.

However, it's worth noting that not every exciting packaging patent ends up being used. Remember this one:
https://www.freepatentsonline.com/20210097013.pdf