Zen 6 Speculation Thread


OneEng2

Senior member
Sep 19, 2022
N3B isn’t that great of an improvement over N4P
N3B is a pretty good bump over N4P. Even N3E was. N3B was a very good node (better than N3E), it was just expensive.
52 cores on dual channel memory is a gimmick or rather a marketing play,

Intel can’t win in 1T so they try to win nT which is much easier.
I am wondering more and more as we discuss this point if this isn't exactly what is going on. It seems very "Intel-esque" like the days of Netburst and the blue man group.
You must be joking. 1T is easily more difficult than MT. You can just spam cores and add power to win at MT with a halfway decent design.
I agree.
Of these rumored 52 cores, 4 do not pull any power to speak of: They are low-power cores for background load/ near idle scenarios. These cores exist for battery powered devices. Intel could just as well fuse these cores off in desktop SKUs. But maybe they won't for marketing purposes.

So let's look just a hair's breadth beneath the surface.
  • Intel's rumored top desktop CPU runs either up to 8+8 fast threads or up to 48 throughput threads.
  • AMD's rumored top desktop CPU runs either up to 12+12 fast threads or up to 48 throughput threads.
From that, it is not hard to extrapolate the likely behaviour of programs which scale better or worse on such CPUs.
True. NVL is actually a 48c part. Still, on a single Zen 5 core running highly MT code, SMT only gets you about 30% on desktop (can be more like 40% in DC).

Two Skymont cores give you 100% scaling in highly MT code.
A lot of people here are claiming N3B is barely any better than N4P. If that's the case, then Intel going from N3B to N2 is going to be very close to the same jump that AMD is getting going from N4P to N2. Can't have it both ways.
Agree. We should use the same logic in both cases!
 

Joe NYC

Diamond Member
Jun 26, 2021
yeah

You don't need a 'whole lot' since Hi is only for R9 (tiny subset of the overall volume).

So the low-end configuration is just the monolithic chip, and the higher-end implementation just uses the LP cores, disables the cores on the SoC, and uses the cores from the attached CCD?

I kind of like the idea (other than wasting die space). The cores on the 12-core CCD will perform much faster on CPU-intensive tasks, so why leave it up to MSFT to screw it up with their scheduling and threads jumping back and forth? And there will be a nice 48 MB L3 to use.
 

Hulk

Diamond Member
Oct 9, 1999
Nothing is a given.
Exactly. It's all rumors at this point. Always in motion are these CPUs on the horizon.

In other news, I wasted the morning finding the USB drive with my new TP-Link Deco BE63 mesh network. Got it connected, but it was waaay harder to find and connect to the USB drive than it should have been. Good news is USB sharing speed is like 12 times faster, from 5 Mbps with the old Netgear R7000 to 60 Mbps with the Deco mesh. Coverage with two units is of course much better as well. Funny thing is I was on a chat with TP-Link tech support for an hour before they escalated me to higher support; during the wait I figured it out myself. Just in case you come across a network file sharing issue...

Fixed using PowerShell (note: these relax SMB security, so only do this on a trusted home network):

# Allow unauthenticated (guest) access to SMB shares from this client
Set-SmbClientConfiguration -EnableInsecureGuestLogons $true -Force

# Stop requiring SMB signing on the client and the server side
Set-SmbClientConfiguration -RequireSecuritySignature $false -Force
Set-SmbServerConfiguration -RequireSecuritySignature $false -Force
 

StefanR5R

Elite Member
Dec 10, 2016
  • Intel's rumored top desktop CPU runs either up to 8+8 fast threads or up to 48 throughput threads.
  • AMD's rumored top desktop CPU runs either up to 12+12 fast threads or up to 48 throughput threads.
Difference is that Intel's and AMD's throughput threads are different animals, because the latter uses HT/SMT.
Still, a single SMT core with Zen 5 in highly MT code, only gets you about 30% on desktop (can be more like 40% in DC) by using SMT.

Two Skymont cores gives you 100% scaling in highly MT code.
But we aren't talking 2 SMT2 threads vs. 2 E cores. We are talking forty-eight SMT2 threads (in a desktop computer) vs. forty-eight P/E cores (in another desktop computer). This gets you pretty much the same scaling (i.e. total computer throughput) as if AMD put over 9000 SMT2 threads into their desktop computer and Intel put over 9000 P/E cores into theirs.

Edit, that is: At these high thread counts, uncore, memory subsystem, socket power budget... suddenly become much more interesting than core-internal implementation details such as SMT2 vs. P/E. Not to mention the impact of the software side (algorithms, dataset sizes, data formats...) on your scaling.
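The thread-count arithmetic behind that comparison can be sketched in a few lines. A toy Python model, using the rumored core counts from this thread and the ~30% desktop SMT uplift quoted above; it deliberately pretends (unrealistically) that a P core, an E core, and a Zen 6 core each do the same work per cycle, which is exactly the simplification being criticized:

```python
# Rough "throughput thread" comparison using the rumored configs and
# the ~30% desktop SMT uplift quoted earlier in the thread.
# All numbers are illustrative, not measurements.
ZEN6_CORES = 24          # rumored 2x 12-core CCDs
SMT_UPLIFT = 0.30        # ~30% extra throughput per core from SMT2
NVL_P, NVL_E = 16, 32    # rumored 16P + 32E (the 4 LP-E cores ignored)

amd_threads = ZEN6_CORES * 2       # 48 SMT2 threads
intel_threads = NVL_P + NVL_E      # 48 physical P/E cores

# Relative MT throughput in "single-core units", pretending every core
# type does 1.0 unit of work per cycle (they don't).
amd_mt = ZEN6_CORES * (1 + SMT_UPLIFT)   # 24 cores + SMT uplift
intel_mt = NVL_P + NVL_E                 # physical cores scale ~100%
print(amd_threads, intel_threads)
print(amd_mt, intel_mt)
```

Under that (bad) equal-cores assumption Intel's 48 physical cores win easily, which is why the per-core and uncore details above matter so much more than the raw thread count.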
 

Fjodor2001

Diamond Member
Feb 6, 2010
Yeah. In the end it's going to be 48 vs 48. Those 4 LP-E cores will only make a difference in the marketing presentations
Difference is that the 48 AMD threads will be slower than the 48 Intel threads. Because for AMD it's 24C/48T, but for Intel it's 48C/48T.

(And yeah I know for Intel it's a mix of P+E cores while AMD uses only P cores, but the difference above will trump that.)
 

Tuna-Fish

Golden Member
Mar 4, 2011
Edit, that is, uncore, memory subsystem, socket power budget... suddenly become much more interesting than core-internal implementation details such as SMT2 vs. P/E.
Seconding this, at 24 full cores the max core throughput of Zen6 probably depends more on the IOD than on the cores. With all that throughput driven by just 128 bits of DDR5, just how fast the DDR5 runs and how well it's utilized will matter a lot.
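For a sense of scale, a quick back-of-the-envelope in Python; the DDR5-8000 speed grade here is purely my assumption for illustration, not a Zen 6 spec:

```python
# Peak bandwidth of a 128-bit (dual-channel) DDR5 interface, split
# across 24 cores. The speed grade is an assumption, not a spec.
mt_per_s = 8000              # mega-transfers/s (DDR5-8000, assumed)
bus_bytes = 128 // 8         # 128-bit bus = 16 bytes per transfer
cores = 24

total_gbs = mt_per_s * bus_bytes / 1000      # GB/s, peak
per_core = total_gbs / cores
print(f"{total_gbs:.0f} GB/s total, {per_core:.1f} GB/s per core")
```

Even at that optimistic speed grade, each of 24 cores sees only a bit over 5 GB/s of peak bandwidth, which is why how well the DDR5 is utilized matters so much.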
 

LightningZ71

Platinum Member
Mar 10, 2017
Something else to keep in mind: the P cores of Nova Lake are switching to a shared-L2 strategy where pairs of P cores share a single 4 MB L2 pool. This will have a modest but negative impact on MT performance, as they will both have to share a single ring bus port. The E cores will also continue to share L2 pools in a 4:1 ratio. The more throughput the individual cores demand, the more of a bottleneck those shared ports will create. That's 24 full-performance cores vying for 8 ports on a ring bus per compute chiplet vs. 12 cores with 2 SMT threads each vying for 12 links to a hybrid mesh L3. Now, the L2 on Intel's cores is larger, for the P cores at least, so that should help, but in the end that's a whole lot of contention.
 

Philste

Senior member
Oct 13, 2023
You don't need a 'whole lot' since Hi is only for R9 (tiny subset of the overall volume).
So R7 is basically Krackan with ZEN6? (In a similar way that Cezanne was Renoir with ZEN3).

And the R9s get a CCD with 10/12 active ZEN6 cores and the 4+4 in the main Die gets fused off?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Difference is that the 48 AMD threads will be slower than the 48 Intel threads. Because for AMD it's 24C/48T, but for Intel it's 48C/48T.

(And yeah I know for Intel it's a mix of P+E cores while AMD uses only P cores, but the difference above will trump that.)
AMD does not have P cores or E cores, that's an Intel thing. AMD's cores are all the same.
 

OneEng2

Senior member
Sep 19, 2022
No it's not. N3E is better than N3B, except for SRAM.
Link please.
But we aren't talking 2 SMT2 threads vs. 2 E cores. We are talking forty-eight SMT2 threads (in a desktop computer) vs. forty-eight P/E cores (in another desktop computer). This gets you pretty much the same scaling (i.e. total computer throughput) as if AMD put over 9000 SMT2 threads into their desktop computer and Intel put over 9000 P/E cores into theirs.

Edit, that is: At these high thread counts, uncore, memory subsystem, socket power budget... suddenly become much more interesting than core-internal implementation details such as SMT2 vs. P/E. Not to mention the impact of the software side (algorithms, dataset sizes, data formats...) on your scaling.
Currently, it appears that ARL is not yet memory bound and scales well with the number of E cores.

I believe that both Intel and AMD will increase memory bandwidth with the next gen. Will it be enough? Great question.
Difference is that the 48 AMD threads will be slower than the 48 Intel threads. Because for AMD it's 24C/48T, but for Intel it's 48C/48T.

(And yeah I know for Intel it's a mix of P+E cores while AMD uses only P cores, but the difference above will trump that.)
Agree.
 

MS_AT

Senior member
Jul 15, 2024
So you are saying that e.g. 48 FP64 fused multiply add ops (without data interdependence) will go faster on Intel than on AMD? And why will this be the case?
In favourable circumstances for Intel, it will be the case. Think Cinebench vs. Y-cruncher. E cores have more but narrower execution units (4 FMAs per cycle vs. 2 on Zen), so if software uses mostly scalar or narrow SIMD, E cores might have a lead.
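Taking the pipe counts in that post at face value, the crossover is easy to see; the vector widths below (512-bit FMA units on Zen 5, 128-bit on the E cores) are my assumptions for the sake of illustration:

```python
# Peak FP64 FMA throughput per cycle, using the pipe counts from the
# post above (4 narrow E-core pipes vs. 2 wide Zen pipes). The unit
# widths (512-bit Zen 5, 128-bit E core) are assumptions.
FP64_BITS = 64

def fmas_per_cycle(pipes, width_bits, elem_bits=FP64_BITS):
    """FP64 FMA operations retired per cycle at full utilization."""
    return pipes * (width_bits // elem_bits)

zen5_simd = fmas_per_cycle(pipes=2, width_bits=512)   # full-width SIMD
ecore_simd = fmas_per_cycle(pipes=4, width_bits=128)  # full-width SIMD
zen5_scalar = fmas_per_cycle(pipes=2, width_bits=64)  # scalar code
ecore_scalar = fmas_per_cycle(pipes=4, width_bits=64) # scalar code
print(zen5_simd, ecore_simd, zen5_scalar, ecore_scalar)
```

With full-width SIMD the Zen core leads 16 to 8 FMAs per cycle, but in scalar code the ratio flips to 2 vs. 4, matching the Cinebench vs. Y-cruncher contrast above.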
 

naukkis

Golden Member
Jun 5, 2002
Something else to keep in mind: the P cores of Nova Lake are switching to a shared-L2 strategy where pairs of P cores share a single 4 MB L2 pool. This will have a modest but negative impact on MT performance, as they will both have to share a single ring bus port. The E cores will also continue to share L2 pools in a 4:1 ratio. The more throughput the individual cores demand, the more of a bottleneck those shared ports will create. That's 24 full-performance cores vying for 8 ports on a ring bus per compute chiplet vs. 12 cores with 2 SMT threads each vying for 12 links to a hybrid mesh L3. Now, the L2 on Intel's cores is larger, for the P cores at least, so that should help, but in the end that's a whole lot of contention.

You know ring bus performance won't grow with more ring stops but will actually regress? L3 speed increases directly with more slices, but that's a different story. Nova Lake's L3 slices will be a massive 20 MB, so instead of splitting L3 accesses across the ring like the current generation, it might prefer caching in the local slice(s); each CPU pair might then have 24 MB of local cache before having to access the ring, thus greatly reducing ring bandwidth demand. And because the ring stops are limited to so few, Intel has a real possibility to unify the L3 in their two-chiplet designs. There is a possibility that Nova Lake might perform quite well.

My impression is that Pat kicked the bean counters in the nuts and this design is built for performance without cost limits. It might not be a financially viable product, but it sure shouldn't lack performance, and AMD needs to be on top of their game to retain the top spot in the performance race.
 

Thunder 57

Diamond Member
Aug 19, 2007
1 Million?! :eek: Who? How? When?
Idiocracy was awesome, I agree. Hard to believe it came out almost 20 years ago.

You want me to do your homework for you ;) ? This is supposed to be a family friendly-ish site. I'll give you a clue, open any search engine and type "bh" and look at what it suggests.