Discussion Zen 7 speculation thread


soresu

Diamond Member
Dec 19, 2014
3,910
3,338
136
Hasn't each follow on generation taken longer?
tbf it's not like there weren't insane global trade conditions created by the supply chain knock-on effects of COVID and the associated inflation.

It's hard to say whether the longer gaps between generations are down to AMD, global conditions, or a combination of the two without looking directly into the headspace of their upper management.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,300
4,817
136
What are the odds that AMD adds both APX and AMX instruction support?

APX - I think excellent. With Intel adding it likely in Diamond Rapids in 2026, there should be some software and compiler support. And it seems like this would be only a modest change to the CPU, centered on the decoder. So seems like a very easy addition leading to some cleanup of the x86 code. But it will likely take years for the software support to find its way to client applications.

AMX - AMD will find itself with a huge transistor budget increase from Zen 6 to Zen 7, if L3 completely moves to V-Cache level. If ~33% of die area migrates to V-Cache, that leaves +50% increase in die size availability, and then, going from N2 -> A16 -> A14 could add another ~50% logic transistor density increase. So, possibly, doubling transistor budget.
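For anyone who wants to check the arithmetic, here is a minimal sketch of the estimate above. The function name and the exact 1/3 area share are illustrative assumptions; the ~50% density gain is the figure from the post, not a confirmed node spec.

```python
def logic_budget_multiplier(l3_area_share, density_gain):
    """Rough multiplier on the logic transistor budget at constant die size.

    l3_area_share: fraction of the current die occupied by L3 that would
                   move to a stacked V-Cache die (assumed, ~1/3 in the post).
    density_gain:  logic density multiplier from the node change
                   (assumed, 1.5 in the post).
    """
    # Area left for logic grows from (1 - share) of the die to the whole die.
    area_gain = 1.0 / (1.0 - l3_area_share)
    return area_gain * density_gain


if __name__ == "__main__":
    area_only = logic_budget_multiplier(1 / 3, 1.0)
    combined = logic_budget_multiplier(1 / 3, 1.5)
    print(f"area only: {area_only:.2f}x, with density gain: {combined:.2f}x")
```

With these inputs the area term alone is about 1.5x and the combined figure is about 2.25x, so "doubling" only holds if the density assumption does. A smaller density gain scales the result down proportionally.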

Would it make sense to use these transistors for AMX? Or will it eventually be overkill on client to have matrix operation capability in 3 places: CPU, GPU and NPU?
 

gdansk

Diamond Member
Feb 8, 2011
4,266
7,147
136
Why put AMX in Zen? Put an AMX accelerator somewhere near memory to get a nearly free benefit from Intel's software mis-investment. Zen does not grow larger. Benchmarks for ML junk still look good, and if you need more accelerators, just make a big IOD with more of them included. But that's basically reinventing an ML accelerator starting from no adoption, and for some reason I don't think AMD would need to do that.
 
  • Like
Reactions: Joe NYC

511

Diamond Member
Jul 12, 2024
3,084
3,084
106
APX - I think excellent. With Intel adding it likely in Diamond Rapids in 2026, there should be some software and compiler support. And it seems like this would be only a modest change to the CPU, centered on the decoder. So seems like a very easy addition leading to some cleanup of the x86 code. But it will likely take years for the software support to find its way to client applications.
APX is excellent but unlikely to be introduced until Zen 7.
AMX - AMD will find itself with a huge transistor budget increase from Zen 6 to Zen 7, if L3 completely moves to V-Cache level. If ~33% of die area migrates to V-Cache, that leaves +50% increase in die size availability, and then, going from N2 -> A16 -> A14 could add another ~50% logic transistor density increase. So, possibly, doubling transistor budget.
The logic xtor density gain from N2 to A14 is a mere 1.23x. AMX is fine for server, though.
 
  • Like
Reactions: Joe NYC

OneEng2

Senior member
Sep 19, 2022
691
933
106
SMT is forever.
Well... for AMD I agree. Intel seems to believe the die space is better utilized for E cores.... at least for the time being.

The idea that Zen 7 will be released within a year of Zen 6 would require that the reality of the world of lithography be drastically reversed.

AMD has been taking 18-24 months between generational releases. I personally don't expect this to get LOWER. If anything it will get HIGHER.

My speculation is that we will see Zen 6 by the end of 2026 and maybe not until the beginning of 2027. I would tag Zen 7 for around 2029 or at best late 2028.
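A quick sketch of what an 18-24 month gap would imply from those Zen 6 dates. The helper names and the two candidate launch months (December 2026 and January 2027) are assumptions picked to match "end of 2026" and "beginning of 2027"; none of these are announced dates.

```python
def add_months(year, month, months):
    """Shift a (year, month) pair forward by a whole number of months."""
    index = year * 12 + (month - 1) + months
    return index // 12, index % 12 + 1


def zen7_window(zen6_launch, gaps=(18, 24)):
    """(year, month) implied by each gap, in months, after the Zen 6 launch."""
    year, month = zen6_launch
    return [add_months(year, month, gap) for gap in gaps]


if __name__ == "__main__":
    for launch in [(2026, 12), (2027, 1)]:  # assumed Zen 6 launch months
        earliest, latest = zen7_window(launch)
        print(f"Zen 6 {launch[0]}-{launch[1]:02d} -> Zen 7 between "
              f"{earliest[0]}-{earliest[1]:02d} and {latest[0]}-{latest[1]:02d}")
```

If the historical gap simply held, that would put Zen 7 somewhere between mid-2028 and early 2029. A longer gap pushes the window further out.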

Since A16 is only going to get a ~10% density improvement and will be TSMC's first BSPDN node, I could see AMD holding off for A14 for Zen 7 server parts.

Considering the difficulties OEMs have faced in the past with BSPDN (hot spots limiting overall clock speed), I wouldn't be surprised to see AMD stick with N2 for high-end desktop/workstation for some time.

In DC, it seems more likely they will push forward with A14. I personally don't see why AMD wouldn't greatly increase core counts with Zen 7c in DC.
 
  • Like
Reactions: CakeMonster

Joe NYC

Diamond Member
Jun 26, 2021
3,300
4,817
136
Well... for AMD I agree. Intel seems to believe the die space is better utilized for E cores.... at least for the time being.

The idea that Zen 7 will be released within a year of Zen 6 would require that the reality of the world of lithography be drastically reversed.

AMD has been taking 18-24 months between generational releases. I personally don't expect this to get LOWER. If anything it will get HIGHER.

There is the AI wild card to consider.

During this week's AI summit in Washington DC, Lisa Su was on stage (with the All-In Podcast hosts) and alluded to AMD's in-house use of AI.

One thing she mentioned as a current benefit is a faster release schedule for products. So it is possible that AI tools will shorten the cadence, as some of the optimization and testing is done faster and better by AI / ML tools.


In DC, it seems more likely they will push forward with A14. I personally don't see why AMD wouldn't greatly increase core counts with Zen 7c in DC.

MLID shows a very modest increase in core count from Zen 6c to Zen 7c, 32 -> 33 per CCD. But moving all of L3 to V-Cache frees up a lot of die space and increases the transistor budget that way (in addition to the transistor density gain from the new node).

MLID also mentioned L2 going from 1 MB to 2 MB, which should only mean a modest increase in transistor count and die area.
 

gdansk

Diamond Member
Feb 8, 2011
4,266
7,147
136
So it is possible that AI tools will shorten the cadence
Doubtful. It's not new, so the cadence should have shortened before now. But it didn't. The designs simply grew instead.

The faster release schedule is for MI because AMD is throwing money/people at it and feels the need to iterate.
 
  • Like
Reactions: RnR_au and marees

AMDK11

Senior member
Jul 15, 2019
459
379
136
Not sure I buy into the concept that they somehow need a multi layer core to get more IPC, unless it's just about reducing IO wire lengths between certain parts of the core.
Core logic includes structures whose latency is partly a consequence of being laid out on a single plane, for example between the far edges of a given logic region. A single logic plane can become a problem as the structure grows across generations. Splitting a given piece of logic across two layers shortens the critical paths and so reduces the logic's latency.
 
  • Like
Reactions: Thibsie

AMDK11

Senior member
Jul 15, 2019
459
379
136
ngl with the less than mind blowing perf uplift of Zen5 after years of hype I'm starting to feel that someone is trying to generate that kind of pre release hype train here.
Regarding Zen5, people inflated their own bubble based on the Epyc (Zen5) results.

Zen5 is a massively, but cautiously, expanded (widened and deepened) x86 core. It features a super-advanced BPU and prefetching. It's very extensive and significantly deeper than previous Zen generations. The BPU in Zen5 anticipates two consecutive branches and can have three branch windows. Zen5 sees very long and complex branch patterns.

Zen5 has about 26% more transistors per core, or 218 million more than Zen4. The difference is practically an entire Skylake+L2 core (217 million transistors).

The IPC increase is an average of +16% (average +13% Int and average +24% FP).

Compared to Zen2, it's about 50-55% (average IPC increase).

Zen5 has the most powerful and modern BPU in the x86 architecture.

Golden/RaptorCove
BTB L0 128
BTB L1 5K
BTB L2 12K
Return Address Stack 2-4

LionCove
BTB L0 256
BTB L1 6K
BTB L2 12K
Return Address Stack 24

Zen4
BTB L0 128
BTB L1 1.5K
BTB L2 7K
Return Address Stack 32

Zen5
BTB L0 1K!(1024!)
BTB L1 16K!
BTB L2 8K(victim cache for BTB L1)
Return Address Stack 52x2(104 for SMT)

Golden/RaptorCove
Cache L3 ST 90-100GB/s (60-70 cycles)

LionCove
Cache L3 ST 57GB/s (84 cycles)???

Zen5
Cache L3 ST 173GB/s (48 cycles)!!!

Edit:
The Zen5 BPU can predict the next two independent branch paths not only for two threads(SMT) but also within a single thread(ST). When the ST code is heavily branched, the second decoder cluster can take over part of the ST code (2x4-Wide(8-Wide))! (Zen1-Zen4 decode 4-Wide)

Zen4 SMT gain: average +13%

Zen5 SMT gain: average +18%

Zen5 op cache: 6144 entries (with instruction fusion), 16-way, 12 ops/cycle in ST and 2x 6 ops/cycle in SMT. Thanks to instruction fusion, the Zen5 op cache has a larger effective capacity than the Zen4 op cache (6912 entries, 12-way, 9 ops/cycle) despite having fewer entries.
 

AMDK11

Senior member
Jul 15, 2019
459
379
136
I feel like extrapolating from CCD transistor count is not a solid enough lead to use, if that is what you are using to get to this figure.
Until official data for the Zen5+L2 core is available, this is the only way.

Even considering that some of these transistors also went to the ring bus and L3 (despite the same capacity), this is a decent estimate, and the only one available at this time.

Zen4 CCD 6.57 billion transistors
~821 million transistors per core Zen4 + 1MB L2 + 4MB L3 + Ring stop, etc.

Zen5 CCD 8.315 billion transistors
~1 billion transistors per core Zen5 + 1MB L2 + 4MB L3+ Ring stop, etc.
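The per-core figures above come from dividing the CCD totals by the core count, which can be sketched as below. The 8 cores per CCD is an assumption implied by the numbers in the post, and the function names are illustrative; the result is a per-core share of the whole CCD (L2, L3 slice, ring stop, etc.), not a measured core-only count.

```python
CORES_PER_CCD = 8  # assumption: 8-core CCD for both generations

ZEN4_CCD_TRANSISTORS = 6.57e9   # figure quoted in the post
ZEN5_CCD_TRANSISTORS = 8.315e9  # figure quoted in the post


def per_core(ccd_transistors, cores=CORES_PER_CCD):
    """CCD transistors split evenly across cores (includes L2, L3 slice, ring stop)."""
    return ccd_transistors / cores


def compare(old_ccd, new_ccd, cores=CORES_PER_CCD):
    """Return (absolute per-core increase, percentage per-core increase)."""
    old_core = per_core(old_ccd, cores)
    new_core = per_core(new_ccd, cores)
    return new_core - old_core, (new_core / old_core - 1.0) * 100.0


if __name__ == "__main__":
    delta, pct = compare(ZEN4_CCD_TRANSISTORS, ZEN5_CCD_TRANSISTORS)
    print(f"Zen4 per core: {per_core(ZEN4_CCD_TRANSISTORS) / 1e6:.0f}M")
    print(f"Zen5 per core: {per_core(ZEN5_CCD_TRANSISTORS) / 1e6:.0f}M")
    print(f"increase: {delta / 1e6:.0f}M ({pct:.1f}%)")
```

Because the percentage is a ratio of CCD totals, it does not depend on the assumed core count; only the absolute per-core numbers do.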
 
  • Like
Reactions: Joe NYC

soresu

Diamond Member
Dec 19, 2014
3,910
3,338
136
Core logic includes structures whose latency is partly a consequence of being laid out on a single plane, for example between the far edges of a given logic region. A single logic plane can become a problem as the structure grows across generations. Splitting a given piece of logic across two layers shortens the critical paths and so reduces the logic's latency.
That's basically what I meant by the comment you responded to.
 

Asterox

Golden Member
May 15, 2012
1,043
1,837
136
AMD's marketing team is amateurish; the name of the Italian city is Verona. Looks like Lisa Su is unaware of this city, since she publicly made the typo.
Well, why wouldn't AMD use the Verano name for a new EPYC series, and why would that be a mistake? :grinning: If that city didn't exist, it would be quite logical to call it a mistake in the marketing presentation. But there are two cities, Verona and Verano, so now we can have a beer in Verano.