Question Zen 6 Speculation Thread

Page 245 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

soresu

Diamond Member
Dec 19, 2014
4,062
3,519
136
yeah.

oh nooo 3D core is something wayyyyy different. Forget about it for a moment.
Oh no I got that it isn't, MLID made that clear - it's just what I originally assumed he meant before he explained.

disappointed-hercules.gif
 

soresu

Diamond Member
Dec 19, 2014
4,062
3,519
136
Intel has been trying to make this work for a certain forest for years now. Lots of slideware available on that.
Oh I have no doubt, but the density of vertical interconnects is gonna have to be pretty insane to make it truly viable to put L2 off die.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,480
3,152
136
Yeah. Don't think that L2 is leaving the building any time soon. If anything, it might get doubled to hide the extra cycles paid to move all the L3 off to a cache die.
 

soresu

Diamond Member
Dec 19, 2014
4,062
3,519
136
With 50% more cores and an expected 1.7-1.75x better MT, it's a definite no; it won't match the Zen/Zen+ to Zen 2 transition.


It'll be a nice transition from my aging 3950X tho 😅

Should definitely supercharge my encoding speed with x265 and SVT-AV1.
 

OneEng2

Senior member
Sep 19, 2022
815
1,082
106
SP5 Turin Classic has no successor.
Why not? Why does this make any sense at all?

If there is some reason it DOES make sense, then why make anything (other than Threadripper) with full Zen 6 cores?

Put another way .....

What applications will run faster on 96 Zen 6 full cores than on 128 full Zen 5 cores?
I think that people are failing to see the big change for this generation and not seeing where it will make the most difference. The "c" core CCD is rumored to have the same total amount of L3 cache as the normal core CCD. It's still less per core, but the total local pool is much larger. In addition, with the node improvement, even if just from N3, you still get a notable improvement in throughput per watt. I suspect that, in many cases, 128 cores of Zen 6c will be faster than regular Zen5, and I don't think that there will be a notable difference in all core steady state clocks under load with Zen6 possibly doing better.
128 cores of Zen 6c will be faster than 128 cores of Zen 5 full? Please explain why.

Even if the above is true, in workloads where Turin Zen 5 full was used, it is hard to imagine how 96c Zen 6 could perform better than 128c Zen 5.
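
For what it's worth, the 96 vs. 128 question boils down to a break-even per-core uplift. A rough sketch, assuming throughput scales perfectly linearly with core count and ignoring memory bandwidth, cache, and licensing effects (the function name is just illustrative; only the core counts come from the thread):

```python
def break_even_per_core_uplift(old_cores: int, new_cores: int) -> float:
    """Per-core speedup the new part needs to match the old part's
    total throughput, ASSUMING perfectly linear scaling with cores."""
    return old_cores / new_cores


# 128 full Zen 5 cores vs. 96 full Zen 6 cores
uplift = break_even_per_core_uplift(old_cores=128, new_cores=96)
print(f"96 cores need ~{(uplift - 1) * 100:.0f}% more per-core throughput to tie 128")
```

Anything that scales worse than linearly with core count lowers that bar, which is where the per-thread-sensitive workloads come in.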
 

marees

Golden Member
Apr 28, 2024
1,657
2,261
96
Who will buy 192 Zen 6p? And are there any system bottlenecks that make it less cost-effective?
 

adroc_thurston

Diamond Member
Jul 2, 2023
6,757
9,449
106
Who wants it?
Why does this make any sense at all?
Because.
If there is some reason it DOES make sense, then why make anything (other than Threadripper) with full Zen 6 cores?
Because enterprise dinosaurs exist.
What applications will run faster on 96 Zen 6 full cores than on 128 full Zen 5 cores?
Anything sensitive to per-thread perf.
128 cores of Zen 6c will be faster than 128 cores of Zen 5 full? Please explain why.
idk new cores and N2p looks caaaash mang.
 

branch_suggestion

Senior member
Aug 4, 2023
809
1,747
106
What applications will run faster on 96 Zen 6 full cores than on 128 full Zen 5 cores?
Well each core has 50% moar L3 to access if needed, they clock higher and IPC bump, so should be a fair battle. Memory is actually a wash with the clock uplift and MRDIMM.
128 cores of Zen 6c will be faster than 128 cores of Zen 5 full? Please explain why.
Each core has up to 4x moar L3 if needed, clocks at actual operating power will be similar enough and IPC is better.
Even if the above is true, in workloads where Turin Zen 5 full was used, it is hard to imagine how 96c Zen 6 could perform better than 128c Zen 5.
Z6 EPYC having only up to 8 CCDs vs 12/16 for Z5 helps a lot with just about everything, along with the upgrade for dense to 4MB L3/core.
SP8 is a cheaper platform than SP5 with a different customer mix, it doesn't have to beat the old all classic part in socket perf, just single core for those who license such things.
 

adroc_thurston

Diamond Member
Jul 2, 2023
6,757
9,449
106
clocks at actual operating power will be similar enough
higher.
It's really really funny given what A0 booted at.
SP8 is a cheaper platform than SP5 with a different customer mix, it doesn't have to beat the old all classic part in socket perf, just single core for those who license such things.
It's just that the appeal of gigasockets is limited outside of cloud favelas.
 

basix

Senior member
Oct 4, 2024
209
416
96
Keep in mind that we are talking about the Zen 6c variant for the 128C SKU. Those 5.0 GHz of the F-SKUs will be hard to reach. But as I speculated before, 4.5 GHz could be a thing.

But in the end, peak boost clocks don't matter that much; what matters is the average frequency under load. And there, at the same core count, Zen 6 could potentially run circles around Zen 5.
 

marees

Golden Member
Apr 28, 2024
1,657
2,261
96
OK but what is it? Experimenting with chip on chip? Experimenting with making a single chip in layers? Hmmm
The leaker claims that Zen 7 chips will have 2 MB of on-die L2 cache per core alongside 7 MB of L3 per core in the form of V-Cache chiplets. This way, AMD is taking the 3D V-Cache concept that it first introduced with the Ryzen 7 5800X3D and giving each Zen 7 CPU core its own V-Cache, hence the term “3D Core”.

 

MS_AT

Senior member
Jul 15, 2024
856
1,734
96
The leaker claims that Zen 7 chips will have 2 MB of on-die L2 cache per core alongside 7 MB of L3 per core in the form of V-Cache chiplets. This way, AMD is taking the 3D V-Cache concept that it first introduced with the Ryzen 7 5800X3D and giving each Zen 7 CPU core its own V-Cache, hence the term “3D Core”.

It makes little sense for each core to have its own distinct L3 die. Wouldn't it simply be easier to move the whole L3 to a separate die beneath the compute die? I mean, they can tune the layout to ensure that each 7 MB segment lands underneath its corresponding core, making it a tad faster to access, but a single die should be easier to produce and would ensure that no additional synchronization is needed between smaller L3 dies. It might be that I have misread the leak, though.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,662
2,523
136
Wouldn't it simply be easier to move the whole L3 to a separate die beneath the compute die?
Yes, that is precisely what they are doing; marees is confused.

I mean, they can tune the layout to ensure that each 7 MB segment lands underneath its corresponding core, making it a tad faster to access
You don't want to do that. Say you need to access a line that is not currently in your L2. Which L3 slice do you send the request to? If that line could live in any L3 slice, you would have to send a message to all of them, which would absolutely destroy power efficiency. The way all Zen L3s work is that line placement depends only on the physical address of the line, so lines are evenly striped across all slices. This, plus extra tag arrays in each L3 slice that cover any lines currently in the L2 caches, minimizes coherency traffic and can be made quite fast, as shown by how good the Zen L3 generally is.
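
To make the address-striping point concrete, here is a toy sketch. The hash, the slice count, and every name in it are made up for illustration; the real Zen slice hash is not described in this thread. The only point is that a pure function of the physical address tells you the one slice to ask, whereas "the line can be anywhere" means probing every slice.

```python
from collections import Counter

LINE_BYTES = 64   # cache line size
NUM_SLICES = 8    # hypothetical number of L3 slices


def l3_slice(phys_addr: int, num_slices: int = NUM_SLICES) -> int:
    """Pick the L3 slice from the physical address alone (toy hash,
    NOT the actual Zen function)."""
    line = phys_addr // LINE_BYTES
    # Fold some upper bits into the low bits so strided accesses spread out.
    folded = line ^ (line >> 7) ^ (line >> 13)
    return folded % num_slices


def probes_if_hashed(phys_addr: int) -> list[int]:
    """Address-striped placement: exactly one slice can hold the line."""
    return [l3_slice(phys_addr)]


def probes_if_anywhere() -> list[int]:
    """Free placement: every slice has to be asked."""
    return list(range(NUM_SLICES))


# Consecutive lines end up evenly striped across the slices.
spread = Counter(l3_slice(line * LINE_BYTES) for line in range(8192))
print(sorted(spread.items()))
print(len(probes_if_hashed(0x1234_5678)), "probe vs", len(probes_if_anywhere()))
```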

Also, for a lot of workloads, such as games, you want all cores to be able to efficiently use all of the L3, since there will probably be a lot of sharing. (This is less of an issue for webservers and such.)
 

MadRat

Lifer
Oct 14, 1999
11,997
305
126
It makes little sense for each core to have its own distinct L3 die. Wouldn't it simply be easier to move the whole L3 to a separate die beneath the compute die? I mean, they can tune the layout to ensure that each 7 MB segment lands underneath its corresponding core, making it a tad faster to access, but a single die should be easier to produce and would ensure that no additional synchronization is needed between smaller L3 dies. It might be that I have misread the leak, though.
It should speed up sharing information from other L3s if they work it that way.
 

Kaluan

Senior member
Jan 4, 2022
513
1,082
106
Sure, but that's nothing that we didn't know already from back in October last year.

N2 = 38 Mb/mm²
vs
N3E/N4(x)/N5(x) = 31.8 Mb/mm²
~20% more bitcells per mm²
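
Quick sanity check on that ~20%, using only the two density figures quoted above (the variable names are just for illustration):

```python
n2_density = 38.0      # Mb/mm², N2 figure quoted above
older_density = 31.8   # Mb/mm², N3E/N4(x)/N5(x) figure quoted above

# Relative density gain of N2 over the older nodes, in percent
gain_pct = (n2_density / older_density - 1) * 100
print(f"N2 density gain: {gain_pct:.1f}%")
```

So "~20%" is a fair rounding of the ratio between the two numbers.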


Edit: Well, that's certainly interesting. I honestly expected Z7 to be more of a "tock" than a potential industry gamechanger. But TBH, I followed exactly 0 leaks on Z7 until now. As late as this year's CES I wasn't even sure AMD would continue with the Zen moniker, hence I didn't bother to look up whether Zen 7 even exists as a mention from credible sources.
It makes little sense for each core to have its own distinct L3 die. Wouldn't it simply be easier to move the whole L3 to a separate die beneath the compute die? I mean, they can tune the layout to ensure that each 7 MB segment lands underneath its corresponding core, making it a tad faster to access, but a single die should be easier to produce and would ensure that no additional synchronization is needed between smaller L3 dies. It might be that I have misread the leak, though.
 

Kaluan

Senior member
Jan 4, 2022
513
1,082
106
That is the macro, not the bitcell I was referring to.

Ah OK.
But the end result is still an improvement? They're making the arrays/periphery/I-O that house the SRAM more efficient (that's what the macro SRAM structure entails, no?), so more N2 cells can fit in the same space an N3E/N4/N5 chip would have allocated.

That was mostly my point; I don't think I even mentioned bitcells specifically, and I'm too lazy to check right now lol

Also, now I wonder how that would affect the 3D stacking? Has TSMC further optimized N7-SRAM node for N2 use? Anyway, I might just be splitting hairs at this point.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,519
5,097
136
I think that people are failing to see the big change for this generation and not seeing where it will make the most difference. The "c" core CCD is rumored to have the same total amount of L3 cache as the normal core CCD. It's still less per core, but the total local pool is much larger.

The pool goes from 32 MB for Zen 5c to 128 MB for Zen 6c.

I remember there were performance gains in Zen 3 just from going from 2x16 MB to 1x32 MB. So we may see a parallel here.

In addition, with the node improvement, even if just from N3, you still get a notable improvement in throughput per watt. I suspect that, in many cases, 128 cores of Zen 6c will be faster than regular Zen5, and I don't think that there will be a notable difference in all core steady state clocks under load with Zen6 possibly doing better.

Comparing full Zen 6c vs. full Zen 5
- IPC: Zen 6c > Zen 5
- L3 per core: same
- local L3 pool: 4x the size for Zen 6c
- clock speeds: with node improvement, we could possibly expect Zen 6c ~= Zen 5

So performance per core of Zen 6c should be approximately equal to full Zen 5 in Turin.
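
The L3 numbers can be lined up like this. The pool sizes come from this thread; the cores-per-CCD values are assumptions (8 for full Zen 5, 16 for Zen 5c, and 32 for Zen 6c, the last inferred from the rumored 128 MB pool at 4 MB/core), so treat it as a sketch rather than confirmed specs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCD:
    name: str
    cores: int        # cores per CCD (ASSUMED, see note above)
    l3_pool_mb: int   # shared L3 per CCD, in MB

    @property
    def l3_per_core_mb(self) -> float:
        return self.l3_pool_mb / self.cores


ccds = [
    CCD("Zen 5 (full)", cores=8, l3_pool_mb=32),
    CCD("Zen 5c", cores=16, l3_pool_mb=32),
    CCD("Zen 6c (rumored)", cores=32, l3_pool_mb=128),
]

for ccd in ccds:
    print(f"{ccd.name:18s} pool={ccd.l3_pool_mb:4d} MB  per core={ccd.l3_per_core_mb:.1f} MB")
```

Under those assumptions, Zen 6c matches full Zen 5 on L3 per core while having a 4x larger local pool, which is what the comparison list above says.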