Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 719

DisEnchantment

Golden Member
Mar 3, 2017
What would you do if your engineering teams were developing something exciting and it turned out to be a turd, like Zen 5 for example? :D

I am wondering whether this is the reason David Suggs left AMD a year and a half ago.

I wonder if they realized early on that Z5 was going to suck, but they were already 4 years into development.
He was chief architect of Zen 2 and Zen 5.
Z3 and Z4 seem OK; Z4 especially got a big help from clocks.

Z6 is going to suffer the same fate, being a derivative architecture.

More theory crafting ...

If Z4 got delayed to accommodate CXL (as per Forrest) and COVID played some part, that would have given Z5 a very long dev time.
It could be that they were trying hard to polish this turd so it would not regress as badly as BD did.

However, they could have done something in the uncore to address the BW and latency shortcomings and shore up the perf a bit.
 

Thunder 57

Platinum Member
Aug 19, 2007
DisEnchantment said:
> What would you do if your engineering teams were developing something exciting and it turned out to be a turd, like Zen 5 for example? :D
>
> I am wondering whether this is the reason David Suggs left AMD a year and a half ago.
>
> I wonder if they realized early on that Z5 was going to suck, but they were already 4 years into development.
> He was chief architect of Zen 2 and Zen 5.
> Z3 and Z4 seem OK; Z4 especially got a big help from clocks.
>
> Z6 is going to suffer the same fate, being a derivative architecture.

Saying it sucks is a bit harsh and premature considering the whole lineup isn't even out yet. The 9 series may fare better with more traditional TDPs.
 

DisEnchantment

Golden Member
Mar 3, 2017
Thunder 57 said:
> Saying it sucks is a bit harsh and premature considering the whole lineup isn't even out yet. The 9 series may fare better with more traditional TDPs.
Well, it is not exactly stellar; calling it a mild improvement is being too generous considering the time frame involved.

I am mostly looking at Alexander Yee's blog to make this statement.

Other than AVX512, there is not much improvement.
 

CouncilorIrissa

Senior member
Jul 28, 2023
DisEnchantment said:
> What would you do if your engineering teams were developing something exciting and it turned out to be a turd, like Zen 5 for example? :D
>
> I am wondering whether this is the reason David Suggs left AMD a year and a half ago.
>
> I wonder if they realized early on that Z5 was going to suck, but they were already 4 years into development.
> He was chief architect of Zen 2 and Zen 5.
> Z3 and Z4 seem OK; Z4 especially got a big help from clocks.
>
> Z6 is going to suffer the same fate, being a derivative architecture.
>
> More theory crafting ...
>
> If Z4 got delayed to accommodate CXL (as per Forrest) and COVID played some part, that would have given Z5 a very long dev time.
> It could be that they were trying hard to polish this turd so it would not regress as badly as BD did.
>
> However, they could have done something in the uncore to address the BW and latency shortcomings and shore up the perf a bit.
To me this release feels like a consequence of misreading the room when development of Zen 5 started, which was realistically 5-6 years ago.

At the time, Intel had a big lead in FP and vector throughput in HEDT/server with SKL-X and then followed it up by bringing it to client with ICL and TGL.

To me it feels like AMD decided to match them in this respect no matter what and dedicated the bulk of the resources to FP throughput (doubled L1 -> FP PRF bandwidth, doubled the FP register file, went for the most overkill AVX-512 implementation known to man).

Little did they know that Intel would ditch the thing and ARM would become a major threat with their ultra-wide OOO machines with ridiculous integer throughput.

Couple that with Suggs' propensity for large FP units and bean counters reverting Zen 5 to N4P, and you have a perfect storm for the lowest gen-on-gen INT gain.

I'd also add that Zen 3 was more than OK; it was a goated gen-on-gen jump. 16 months after Zen 2, minuscule area increase, massive improvement in INT throughput.
 

yuri69

Senior member
Jul 16, 2013
541
975
136
CouncilorIrissa said:
> To me this release feels like a consequence of misreading the room when development of Zen 5 started, which was realistically 5-6 years ago.
>
> At the time, Intel had a big lead in FP and vector throughput in HEDT/server with SKL-X and then followed it up by bringing it to client with ICL and TGL.
>
> To me it feels like AMD decided to match them in this respect no matter what and dedicated the bulk of the resources to FP throughput (doubled L1 -> FP PRF bandwidth, doubled the FP register file, went for the most overkill AVX-512 implementation known to man).
>
> Little did they know that Intel would ditch the thing and ARM would become a major threat with their ultra-wide OOO machines with ridiculous integer throughput.
>
> Couple that with Suggs' propensity for large FP units and bean counters reverting Zen 5 to N4P, and you have a perfect storm for the lowest gen-on-gen INT gain.
>
> I'd also add that Zen 3 was more than OK; it was a goated gen-on-gen jump. 16 months after Zen 2, minuscule area increase, massive improvement in INT throughput.
Intel did not ditch exotic and expensive stuff. Intel server chips still keep pushing AVX512, AMX, accelerators, etc. The goal for server SKUs has been set to match those instructions.

In the case of AVX512, they provided a very balanced implementation with Zen 4. However, they somehow felt the need to jump to full speed with the following design, unlike Zen 3, which kept the vector width.

Zen 5 feels like another big bold design being great at niches but not being an all-rounder.
 

DisEnchantment

Golden Member
Mar 3, 2017
CouncilorIrissa said:
> To me this release feels like a consequence of misreading the room when development of Zen 5 started, which was realistically 5-6 years ago.
>
> At the time, Intel had a big lead in FP and vector throughput in HEDT/server with SKL-X and then followed it up by bringing it to client with ICL and TGL.


You have an interesting angle.
CouncilorIrissa said:
> Couple that with Suggs' propensity for large FP units and bean counters reverting Zen 5 to N4P, and you have a perfect storm for the lowest gen-on-gen INT gain.
He is no longer at AMD. They knew at least a couple of years earlier that it would turn out this way.

One of the weirder things I heard in some of the interviews with Mike Clark was that they unified the int schedulers so that they could make do with a smaller int PRF.
They are counting every int register there, yet they doubled the FP PRF.

On the other hand, while Z6 would also be a minor iterative core architecturally, it is going to benefit from the higher clocks of N3E.
So I think the physical implementation team would be able to come to their rescue here. They would have had enough time.
I think there is potential uplift from improving the uncore too, which can help.
 

LightningZ71

Golden Member
Mar 10, 2017
Keep in mind that, while Zen5 appears to be a typical server-first core design, this is the first time we're seeing a tangible difference between the base core of the server chip and a client-only part (Strix Point). We have heard in the past that the server and client cores were going to diverge, and that by Zen6 the divergence would be notable. What we are likely to see going forward, as has been said in the past, is that client becomes based on the family line from Strix's Zen5, while server continues the progression of the desktop/server CCDs with full-dress Zen5.

In a more academic sense, client is always going to be more memory-bandwidth constrained than server, and putting a pair of giant, full-throughput AVX-512 units in the client cores is just a tremendous waste in the vast majority of cases (though I can certainly see where tasks with limited dataset sizes might absolutely fly on 9Xx0X3D parts). Switching to half-throughput AVX-512 like mobile Zen5, and maybe picking up AVX10.2 from Intel on client, would seem to make a lot more sense going forward.
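To put rough numbers on the bandwidth argument, here is a minimal back-of-envelope sketch. Every parameter is an illustrative assumption, not a figure from the thread: 2 FMA pipes per core, 512-bit datapaths for full-throughput AVX-512 versus 256-bit for the half-throughput variant, FP32 math, a hypothetical 16-core part at 5.0 GHz, and dual-channel DDR5-6000. It computes how many FLOPs per byte of DRAM traffic a kernel would need before the vector units, rather than memory, become the limit (a simple roofline-style check).

```python
# Back-of-envelope roofline check. All parameters are illustrative assumptions,
# not measured or official Zen 5 figures.
FMA_PIPES = 2        # assumed FMA pipes per core
FLOPS_PER_FMA = 2    # a fused multiply-add counts as 2 FLOPs
FP32_BITS = 32


def peak_fp32_flops_per_cycle(datapath_bits: int) -> int:
    """Peak FP32 FLOPs per core per cycle for a given SIMD datapath width."""
    lanes = datapath_bits // FP32_BITS
    return FMA_PIPES * lanes * FLOPS_PER_FMA


def dram_bandwidth_gbs(mt_per_s: int, channels: int, bytes_per_transfer: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s for 64-bit (8-byte) channels."""
    return mt_per_s * bytes_per_transfer * channels / 1000


def required_intensity(datapath_bits: int, cores: int, ghz: float, bw_gbs: float) -> float:
    """FLOPs per byte of DRAM traffic needed to saturate the vector units."""
    peak_gflops = peak_fp32_flops_per_cycle(datapath_bits) * cores * ghz
    return peak_gflops / bw_gbs


if __name__ == "__main__":
    bw = dram_bandwidth_gbs(6000, channels=2)  # assumed dual-channel DDR5-6000
    for bits, label in [(512, "full-width AVX-512"), (256, "half-width AVX-512")]:
        need = required_intensity(bits, cores=16, ghz=5.0, bw_gbs=bw)
        print(f"{label}: {need:.1f} FLOP/byte needed to be compute-bound")
```

Under these assumptions the full-width case needs twice the arithmetic intensity of the half-width case, which illustrates the point above: only kernels with lots of data reuse, or data sets that fit in cache as on the X3D parts, can keep the wider units busy.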

That split will allow the client core to focus more on improvements on client centric tasks and server cores to continue to focus on what they need to do better at.
 

CouncilorIrissa

Senior member
Jul 28, 2023
yuri69 said:
> Intel did not ditch exotic and expensive stuff. Intel server chips still keep pushing AVX512, AMX, accelerators, etc. The goal for server SKUs has been set to match those instructions.
They did on client, though. Whereas AMD, with their "one size fits all" approach, ended up with a core that dedicates a large portion of its area to stuff that's almost irrelevant.

DisEnchantment said:
> On the other hand, while Z6 would also be a minor iterative core architecturally, it is going to benefit from the higher clocks of N3E.
> So I think the physical implementation team would be able to come to their rescue here. They would have had enough time.
> I think there is potential uplift from improving the uncore too, which can help.
We'll have our preview with STX Halo soon enough I guess.
The uncore is just pure cope at this point.
 

DisEnchantment

Golden Member
Mar 3, 2017
CouncilorIrissa said:
> We'll have our preview with STX Halo soon enough I guess.
> The uncore is just pure cope at this point.
Discussing the bits and pieces of architectural weaknesses and how to overcome them is not coping. I don't know why this word is used so much in the shillicon Twitter universe.

I am saying this based on the visible improvements in SPEC int at fixed clocks when Z4 is equipped with 3D V-Cache. I would think removing the uncore bottlenecks that 3D V-Cache attempts to work around would improve the situation, at least until the next bottleneck.
Also, Z4 in MI300A benefits from LLC prefetching, as per AMD themselves.
 

CouncilorIrissa

Senior member
Jul 28, 2023
DisEnchantment said:
> Discussing the bits and pieces of architectural weaknesses and how to overcome them is not coping. I don't know why this word is used so much in the shillicon Twitter universe.
>
> I am saying this based on the visible improvements in SPEC int at fixed clocks when Z4 is equipped with 3D V-Cache. I would think removing the uncore bottlenecks that 3D V-Cache attempts to work around would improve the situation, at least until the next bottleneck.
> Also, Z4 in MI300A benefits from LLC prefetching, as per AMD themselves.
I don't mean that discussing it is cope, it isn't. I meant that the uncore is just poor. It's just downright funny that a CCD is unable to use all of the memory bandwidth because of a single GMI3 link.
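A rough way to see the single-link limit is to compare one CCD link's read bandwidth with dual-channel DRAM bandwidth. The 32 bytes/cycle read width and the 2000 MHz fabric clock (FCLK) below are assumptions for illustration, not numbers given in the thread; substitute measured values for a real comparison.

```python
# Compare one CCD-to-IOD link's read bandwidth with DRAM bandwidth.
# Link width and FCLK below are illustrative assumptions.


def link_read_gbs(bytes_per_cycle: int, fclk_mhz: int) -> float:
    """Read bandwidth of one link in GB/s: bytes per fabric cycle x FCLK."""
    return bytes_per_cycle * fclk_mhz / 1000


def dram_bandwidth_gbs(mt_per_s: int, channels: int) -> float:
    """Peak DRAM bandwidth in GB/s, assuming 8 bytes per transfer per channel."""
    return mt_per_s * 8 * channels / 1000


link = link_read_gbs(32, 2000)                # assumed 32 B/cycle at 2000 MHz
dram = dram_bandwidth_gbs(6000, channels=2)   # assumed DDR5-6000, dual channel
print(f"link {link:.0f} GB/s vs DRAM {dram:.0f} GB/s "
      f"({link / dram:.0%} of DRAM reads reachable by one CCD)")
```

With these assumed values a single link tops out below what the DRAM can deliver, so a one-CCD part cannot saturate the memory on reads no matter how good the cores are.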
 


moinmoin

Diamond Member
Jun 1, 2017
DisEnchantment said:
> Well, it is not exactly stellar; calling it a mild improvement is being too generous considering the time frame involved.
>
> I am mostly looking at Alexander Yee's blog to make this statement.
>
> Other than AVX512, there is not much improvement.
Yee did point out huge improvements in scalar integer though. It's everything in between that's stagnating.

The bigger issue is that AMD continues to slip on its cadence, which was supposed to be under 18 months. But competition in DC appears not to be strong enough to push AMD to keep that up, causing it to lag more and more in mobile (and in desktop, as far as Apple and ARM can already be considered competition there). It wouldn't matter as much if there were a realistic chance of the cadence catching up, but it seems to be getting worse instead.
 

inquiss

Senior member
Oct 13, 2010
> So this is another typical AMD launch. A couple of users overhype the product. Others fall for the hype. When the product is actually released, everyone feels disappointed. For me, the performance meets expectations from an architectural perspective.
> It is well known that it is very difficult to increase integer IPC. The number of general-purpose registers is a bottleneck. More read/write ports will help, but they may also increase power usage. As I said before, we need to wait for an APX implementation before we see a huge IPC increase.
> Having said that, there is still a lot of potential left in AVX. With AVX512 they can probably go over 16 execution units.
>
> My real disappointment is that there is no 24/32-core AM5 Zen5 CPU.
How would you feed these cores in AM5?
 

gdansk

Platinum Member
Feb 8, 2011
moinmoin said:
> It wouldn't matter as much if there were a realistic chance of the cadence catching up, but it seems to be getting worse instead.
It's about the same time between releases as Zen 4. But for this length of time, people expect bigger gains (even if the process uplift was smaller).
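The cadence argument is easy to check with simple month arithmetic. This sketch uses the launch month of the first desktop Ryzen parts of each generation (Ryzen 3000 in July 2019, 5000 in November 2020, 7000 in September 2022, 9000 in August 2024). EPYC launch dates would give somewhat different gaps, so treat this as one possible yardstick rather than the definitive measure.

```python
# Months between first desktop launches of consecutive Zen generations.
# Dates are the launch months of Ryzen 3000/5000/7000/9000 desktop parts.
LAUNCHES = [
    ("Zen 2", 2019, 7),
    ("Zen 3", 2020, 11),
    ("Zen 4", 2022, 9),
    ("Zen 5", 2024, 8),
]


def months_between(year_a: int, month_a: int, year_b: int, month_b: int) -> int:
    """Whole calendar months from (year_a, month_a) to (year_b, month_b)."""
    return (year_b - year_a) * 12 + (month_b - month_a)


gaps = []
for (prev, ya, ma), (curr, yb, mb) in zip(LAUNCHES, LAUNCHES[1:]):
    gap = months_between(ya, ma, yb, mb)
    gaps.append(gap)
    print(f"{prev} -> {curr}: {gap} months")
```

By this yardstick Zen 2 to Zen 3 (16 months) is the only gap under the 18-month cadence, and Zen 4 to Zen 5 (23 months) is close to Zen 3 to Zen 4 (22 months).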
 

DisEnchantment

Golden Member
Mar 3, 2017
moinmoin said:
> Yee did point out huge improvements in scalar integer though.
But it is constrained by memory bandwidth; that is, they have the throughput only as long as no data needs to be fetched from lower in the memory hierarchy.
But they kept the L2 at 1 MiB and the L2-to-L3 bandwidth at 32B/cycle. So no respite there either.
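To make the 32B/cycle figure concrete, here is a quick sketch converting it to GB/s and to the number of cycles needed to move a full 1 MiB L2's worth of data from L3. The 1 MiB and 32 B/cycle figures come from the post; the 5.0 GHz core clock, and the assumption that the 32 B is per core clock cycle, are illustrative. Latency and contention are ignored.

```python
# Convert the L2<->L3 link width into bandwidth and refill time.
L2_BYTES = 1024 * 1024          # 1 MiB L2 per core (from the post)
L2_L3_BYTES_PER_CYCLE = 32      # L2-to-L3 width (from the post)
CORE_GHZ = 5.0                  # assumed core clock


def bandwidth_gbs(bytes_per_cycle: int, ghz: float) -> float:
    """Bytes per cycle x billions of cycles per second = GB/s."""
    return bytes_per_cycle * ghz


def refill_cycles(l2_bytes: int, bytes_per_cycle: int) -> int:
    """Cycles needed to move l2_bytes at bytes_per_cycle (ignoring latency)."""
    return l2_bytes // bytes_per_cycle


cycles = refill_cycles(L2_BYTES, L2_L3_BYTES_PER_CYCLE)
print(f"{bandwidth_gbs(L2_L3_BYTES_PER_CYCLE, CORE_GHZ):.0f} GB/s per core")
print(f"{cycles} cycles (~{cycles / (CORE_GHZ * 1000):.2f} us) to stream 1 MiB")
```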
 

moinmoin

Diamond Member
Jun 1, 2017
gdansk said:
> It's about the same time between releases as Zen 4.
Which was known to be delayed to account for CXL. So Zen 5 taking the same time means it is actually doubly delayed instead of catching up with the intended cadence.

DisEnchantment said:
> But it is constrained by memory bandwidth; that is, they have the throughput only as long as no data needs to be fetched from lower in the memory hierarchy.
> But they kept the L2 at 1 MiB and the L2-to-L3 bandwidth at 32B/cycle. So no respite there either.
That was to be expected, though, considering we already knew bigger uncore/IO changes would only happen with Zen 6, going by previous gens.
 

gdansk

Platinum Member
Feb 8, 2011
moinmoin said:
> Which was known to be delayed to account for CXL. So Zen 5 taking the same time means it is actually doubly delayed instead of catching up with the intended cadence.
Let's just put it this way: only once did a Zen land on time, and that was Zen 3. And Zen 5 is right on average. If all Zens but one are delayed, then well, what's the exception? It isn't Zen 5.
 

inquiss

Senior member
Oct 13, 2010
For Zen 3, DDR4-4000 was the sweet spot; for Zen 5 it's DDR5-6000. So that's 50% more bandwidth. The cores are also higher performance, so they need more memory bandwidth. You can't have more than 26 cores without increasing the memory channels, which taxes everyone on the platform. If you need more cores or bandwidth, you go to TR.
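The 50% figure follows directly from the transfer rates, since both platforms are dual-channel with 64-bit channels. The sketch below checks that and divides the peak bandwidth across cores; it uses theoretical peak bandwidth, not measured numbers, and the core counts are just examples from the discussion.

```python
# Peak dual-channel bandwidth and per-core share (theoretical, not measured).


def dram_bandwidth_gbs(mt_per_s: int, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s for 64-bit (8-byte) channels."""
    return mt_per_s * bytes_per_transfer * channels / 1000


ddr4_4000 = dram_bandwidth_gbs(4000)  # Zen 3 sweet spot from the post
ddr5_6000 = dram_bandwidth_gbs(6000)  # Zen 5 sweet spot from the post
print(f"DDR4-4000: {ddr4_4000:.0f} GB/s, DDR5-6000: {ddr5_6000:.0f} GB/s, "
      f"ratio {ddr5_6000 / ddr4_4000:.2f}x")

for cores in (16, 24, 26):
    print(f"{cores} cores on DDR5-6000: {ddr5_6000 / cores:.2f} GB/s per core")
```

As a crude per-core measure that ignores caches, a 16-core Zen 3 part on DDR4-4000 gets 64 / 16 = 4.0 GB/s per core, and a 24-core AM5 part on DDR5-6000 would get 96 / 24 = 4.0 GB/s, so it would only match Zen 3's share.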