Discussion Zen 7 speculation thread

Page 5

Abwx

Lifer
Apr 2, 2011
11,852
4,827
136
Well, why wouldn't AMD use the Verano name for the new EPYC series, and why would that be a mistake? :grinning: If that city didn't exist, it would be quite logical to call it a mistake in the marketing presentation. But there are two cities, Verona and Verano, so now we can have a beer in Verano.
Verano is a village with barely 1,000 inhabitants; I doubt it was ever mentioned anywhere other than in this thread. They just made a typo of Verona, since that's a big city close to Venice.
 
  • Like
Reactions: NTMBK

StefanR5R

Elite Member
Dec 10, 2016
6,577
10,352
136
The server CPU codename Verano is neither a typo, nor is it a reference to Vöran in South Tyrol. Rather, it refers to Veràn in Lombardy, right outside of Milan. AMD chose this name because this CPU isn't Zen 7 but a Zen 3 refresh. ;-)
 
  • Like
Reactions: marees

yuri69

Senior member
Jul 16, 2013
668
1,198
136
Regarding Zen5, people have blown their own bubble based on the Epyc (Zen5) results.
Yes, that 40% IPC bubble is Zen 5's trademark.
Zen5 is a massively, but cautiously, expanded (widened and deepened) x86 core. It features a very advanced BPU and prefetching, significantly more extensive and deeper than in previous Zen generations. The BPU in Zen5 anticipates two consecutive branches and can track three branch windows, so Zen5 can follow very long and complex branch patterns.
Yes, but... reading the Chips'n'Cheese profiling data, the majority of stalls within the Zen 5 core are caused by frontend latency.
Zen5 has about 26% more transistors per core, or 218 million more than Zen4. The difference is practically the entire Skylake+L2 core (217 million transistors).
This metric is hugely flawed. Capacitance/frequency optimization, weird IO stuff, niche features, etc. Specifically, Zen 5 invested a notable number of transistors to support the full-speed AVX-512 datapath.
The IPC increase is an average of +16% (average +13% Int and average +24% FP).

Compared to Zen2, it's about 50-55% (average IPC increase).
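The ~50-55% cumulative figure roughly checks out if you compound the commonly cited per-generation average uplifts (+19% for Zen3, +13% for Zen4, +16% for Zen5; rough averages used here only as a sanity check, not exact numbers):

```python
# Compound the per-generation average IPC uplifts over Zen2 -> Zen5.
# The +19%/+13%/+16% figures are commonly cited averages, used here
# only as a rough sanity check of the ~50-55% cumulative claim.
uplifts = {"Zen3": 1.19, "Zen4": 1.13, "Zen5": 1.16}

cumulative = 1.0
for gen, u in uplifts.items():
    cumulative *= u

print(f"cumulative vs Zen2: +{cumulative - 1:.0%}")  # about +56%
```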
Even that 13% is not so impressive for such an ambitious design, right? I know the relative base value of Zen 4 was higher than Zen 1's, but still, there are Apple and Qualcomm.
Those sizes are cool, but how about the int RF or ROB size?

At 448 entries, Zen 5's ROB is still smaller than Golden Cove's 512, let alone Lion Cove's 576.
The same applies to the 240-entry int RF vs Intel's 280-300.

Both limits manifest themselves in the profiled data.
 

marees

Golden Member
Apr 28, 2024
1,291
1,844
96
The server CPU codename Verano is neither a typo, nor is it a reference to Vöran in South Tyrol. Rather, it refers to Veràn in Lombardy, right outside of Milan. AMD chose this name because this CPU isn't Zen 7 but a Zen 3 refresh. ;-)
It is zen6+ (despite the misleading site name, Charlie is usually accurate)

Update June 16, 2025 @ 945am: Well we screwed this up, and we blame chronic lack of sleep last week coupled to ultra-short time lines to write up the info. Verano is _NOT_ the successor to Venice, it is a Venice variant. Florence is Venice+1, something we knew and should have remembered before writing the above paragraph.

 

Asterox

Golden Member
May 15, 2012
1,044
1,839
136
Compared to Zen2, it's about 50-55% (average IPC increase).

Zen5 has the most powerful and modern BPU of any x86 core.

Golden/RaptorCove
BTB L0 128
BTB L1 5K
BTB L2 12K
Return Address Stack 2-4

LionCove
BTB L0 256
BTB L1 6K
BTB L2 12K
Return Address Stack 24

Zen4
BTB L0 128
BTB L1 1.5K
BTB L2 7K
Return Address Stack 32

Zen5
BTB L0 1K!(1024!)
BTB L1 16K!
BTB L2 8K(victim cache for BTB L1)
Return Address Stack 52x2(104 for SMT)

Golden/RaptorCove
Cache L3 ST 90-100GB/s (60-70 cycles)

LionCove
Cache L3 ST 57GB/s (84 cycles)???

Zen5
Cache L3 ST 173GB/s (48 cycles)!!!

Edit:
The Zen5 BPU can predict the next two independent branch paths not only for two threads (SMT) but also within a single thread (ST). When ST code is heavily branched, the second decoder cluster can take over part of the ST code (2x 4-wide = 8-wide)! (Zen1-Zen4 decode 4-wide.)

SMT Zen4 profit average +13%

SMT Zen5 profit average +18%

Op cache: 6144 entries (with instruction fusion), 16-way, 12 ops/cycle in ST and 2x 6 ops/cycle in SMT. Thanks to instruction fusion, the Zen5 op cache has a larger effective capacity than Zen4's op cache (6912 entries, 12-way, 9 ops/cycle) despite the lower entry count.
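As a quick illustration of the fusion point (entry counts from the post above; the one-fused-pair-per-entry model and the fusion fraction are simplifying assumptions, not measured figures):

```python
# Break-even fusion rate for Zen5's op cache to match Zen4's raw capacity.
# Entry counts are from the post above; the model (each fused entry holds
# two ops) is a simplifying assumption, not a measured figure.
zen5_entries = 6144
zen4_entries = 6912

# If a fraction f of Zen5 entries each hold a fused pair, the effective
# number of ops stored is zen5_entries * (1 + f).
# Solve zen5_entries * (1 + f) = zen4_entries for f:
break_even_f = zen4_entries / zen5_entries - 1
print(f"break-even fusion fraction: {break_even_f:.1%}")  # 12.5%
```

So under this toy model, as soon as more than about one in eight op-cache entries holds a fused pair, the smaller Zen5 structure stores more ops than Zen4's.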
If you look at the classic Geekbench test, you get a much broader picture. Roughly speaking, Zen 5 brings on average a 100% better single-core and multi-core CPU score.
If you take the average numbers, that would roughly be the comparison.
 

OneEng2

Senior member
Sep 19, 2022
710
949
106
MLID shows a very modest core-count increase from Zen 6c to Zen 7c, 32 -> 33 per CCD, but with a lot of die space freed up by moving all of the L3 to V-Cache, increasing the transistor budget that way (in addition to the density gains from the new node).

MLID also mentioned L2 going from 1 MB to 2 MB, which should mean only a modest increase in transistor count and die area.
If all the L3 is placed off-die using V-Cache, I would definitely expect AMD to raise core counts for Zen 7. In fact, I would expect double... or as many as they can feed within the bandwidth and power limits of the socket at the time.
Where have you been? Every node is a half node now... a quarter node, really.
I wouldn't be surprised if both AMD and Intel start producing new designs only every 2 nodes. They might just do a "refresh" every half node (or quarter node ;)). It doesn't make sense to redesign without an increase in transistor budget, IMO.
Yes, but... reading the Chips'n'Cheese profiling data, the majority of stalls within the Zen 5 core are caused by frontend latency.
If that is the case, then Zen 6 could show some interesting IPC improvements by freeing this up and having a lower latency IOD as well.
this stuff sells..... decision makers see a big number and make a decision.....
... and lots of applications in the DC today can take advantage of AVX-512... so it isn't just marketing; it really makes a difference.
Update June 16, 2025 @ 945am: Well we screwed this up, and we blame chronic lack of sleep last week coupled to ultra-short time lines to write up the info. Verano is _NOT_ the successor to Venice, it is a Venice variant. Florence is Venice+1, something we knew and should have remembered before writing the above paragraph.
Well then .... that makes a great deal of sense. I definitely can't see Zen 7 before 2028 .... maybe 2029 unfortunately.
 
  • Like
Reactions: Tlh97 and marees

Joe NYC

Diamond Member
Jun 26, 2021
3,321
4,858
136
Well then .... that makes a great deal of sense. I definitely can't see Zen 7 before 2028 .... maybe 2029 unfortunately.

While Lisa Su said (at the recent AI summit) that new AI tools will allow AMD a faster release schedule, you expect a slowdown, with Zen 7 taking 50% longer than any other Zen release.

What are your reasons for this? Why do you think you have a better idea of AMD's release schedule than Lisa?
 
  • Like
Reactions: Tlh97 and Racan

fastandfurious6

Senior member
Jun 1, 2024
614
769
96
Lisa Su said (at recent AI summit) that some new AI tools will allow AMD faster release schedule

exactly!!!! I said that a few weeks ago and everyone here was like "but we don't know, AMD doesn't confirm anything" lol

everyone uses AI for both R&D and development lmao, it's a pure accelerator, no frills, just works. It only needs a pair of senior eyes and strong QA/testing (also enhanced by AI) to catch the unavoidable hallucinations
 
  • Like
Reactions: Tlh97 and Joe NYC

DavidC1

Golden Member
Dec 29, 2023
1,683
2,769
96
Edit:
The Zen5 BPU can predict the next two independent branch paths not only for two threads(SMT) but also within a single thread(ST). When the ST code is heavily branched, the second decoder cluster can take over part of the ST code (2x4-Wide(8-Wide))! (Zen1-Zen4 decode 4-Wide)
The second decode cluster is only active when SMT is on; it doesn't work in ST. C&C even postulates it is essentially a return of Bulldozer-style clustered multithreading.
 

DavidC1

Golden Member
Dec 29, 2023
1,683
2,769
96
Most of the gains are due to the new ISA, not the 512-bit path.
When taking the geometric mean of all the raw AVX-512 performance benchmarks, AVX-512 in the default FP512 configuration yielded 1.45x the performance compared to disabling AVX-512 outright. Having the 512-bit data path allowed for 1.12x the performance compared to running the EPYC 9755 processor in the 256-bit data path mode, similar to how AVX-512 operates with Zen 4.
Without the 512-bit datapath, but keeping the 256-bit path, you still get +29%.

For power/area efficiency, it makes sense to have AVX-512 on a 256-bit datapath.
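The +29% follows directly from dividing the two quoted ratios (1.45x overall vs AVX-512 disabled, 1.12x attributable to the 512-bit datapath). A quick check:

```python
# Split the quoted 1.45x AVX-512 gain into ISA vs datapath contributions.
# Both ratios are taken from the benchmark numbers quoted above.
total_gain = 1.45     # AVX-512 with full 512-bit datapath vs AVX-512 off
datapath_gain = 1.12  # 512-bit datapath vs 256-bit "double-pumped" mode

isa_gain = total_gain / datapath_gain  # gain from the ISA alone
print(f"ISA-only gain: +{isa_gain - 1:.0%}")  # about +29%
```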
 
Last edited:

Joe NYC

Diamond Member
Jun 26, 2021
3,321
4,858
136
exactly!!!! I said that a few weeks ago and everyone here was like "but we don't know, AMD doesn't confirm anything" lol

everyone uses AI for both R&D and development lmao, it's a pure accelerator, no frills, just works. It only needs a pair of senior eyes and strong QA/testing (also enhanced by AI) to catch the unavoidable hallucinations

Exactly. When I posted this in another thread (with link to the video where Lisa said it) someone replied that AMD will only do it for datacenter GPUs.

Which makes zero sense. If you develop tools to accelerate your work, you will deploy the same tools across all the divisions, across the product stack.

And with such a wide portfolio of products, AMD can get the best return on the investment in these tools.
 

DavidC1

Golden Member
Dec 29, 2023
1,683
2,769
96
The second decode cluster is only active when SMT is on; it doesn't work in ST. C&C even postulates it is essentially a return of Bulldozer-style clustered multithreading.
And you don't need "heavy branching" to make clustered decode work. Branches happen often enough that Tremont reaches 6-wide quite often even without a load balancer.
 

gdansk

Diamond Member
Feb 8, 2011
4,299
7,201
136
Which makes zero sense. If you develop tools to accelerate your work, you will deploy the same tools across all the divisions, across the product stack.
Not what I said at all. If you'd like to be charitable go read it again. But to reiterate:
1. ML has been used in hardware design for some time. Cadences didn't improve; instead they have become worse. Why?
2. Design complexity grows, often beyond any increase in productivity.
3. MI is getting more resources. It should have more overlapping work and could have a shorter cadence.

And obviously the Zen teams are doing far more work producing variants and proliferating SKUs into more niches than they were before. I don't expect the cadence to improve.
 
Last edited:

soresu

Diamond Member
Dec 19, 2014
3,922
3,349
136
Even that 13% is not so impressive for such ambitious design, right?
I always felt like AMD saw potential vulnerabilities in Zen5's design in the later engineering stages causing them to preemptively disable parts of the silicon to prevent exploitation.
 

gdansk

Diamond Member
Feb 8, 2011
4,299
7,201
136
Even that 13% is not so impressive for such ambitious design, right?
No, it's pretty good. People are just expecting too much exploitable ILP. Despite all the changes, Zen is still limited by the frontend, and that is (allegedly) the most difficult part of an x86 chip.
 
Last edited:
  • Like
Reactions: Tlh97

Geddagod

Golden Member
Dec 28, 2021
1,426
1,540
106
I always felt like AMD saw potential vulnerabilities in Zen5's design in the later engineering stages causing them to preemptively disable parts of the silicon to prevent exploitation.
I just don't think AMD hit the goals they were expecting, tbh.
No, it's pretty good. People are just expecting way too much ILP from existing real world x86 code.
TBF, Zen 5's tock just does not seem great. Esp in comparison to Zen 3.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,055
3,859
136
I just don't think AMD hit the goals they were expecting, tbh.

TBF, Zen 5's tock just does not seem great. Esp in comparison to Zen 3.

LNC wasn't a bad tock either. That's just how the fruit gets picked.
Zen5 is very different/odd compared to everyone else.
They went and built way bigger bones but then kept the muscle at Zen4+ size.
If I remember right, it was supposed to be on N3, so it ended up being a bigger core on N4.

it seems the balance mantra is still king, and they aren't in a rush to take all the sugar many other cores have (ROB/reg file sizes).

I'm guessing they would have targeted higher clocks on N4 plus some larger internal structures, and then Zen5 would have been more in line with the typical uplift.

Here is hoping Zen6 spends two generations of sugar at once, plus a good clock boost to boot.
If I was a betting man, Zen 7 is probably gonna be a "Zen5 plus a bit" style uplift.