Zen 6 Speculation Thread


OneEng2

Senior member
Sep 19, 2022
N3B isn’t that great of an improvement over N4P
N3B is a pretty good bump over N4P. Even N3E was. N3B was a very good node (better than N3E), it was just expensive.
52 cores on dual channel memory is a gimmick or rather a marketing play,

Intel can’t win in 1T so they try to win nT which is much easier.
I am wondering more and more as we discuss this point if this isn't exactly what is going on. It seems very "Intel-esque" like the days of Netburst and the blue man group.
You must be joking. 1T is easily more difficult than MT. You can just spam cores and add power to win at MT with a halfway decent design.
I agree.
Of these rumored 52 cores, 4 do not pull any power to speak of: They are low-power cores for background load/ near idle scenarios. These cores exist for battery powered devices. Intel could just as well fuse these cores off in desktop SKUs. But maybe they won't for marketing purposes.

So let's look just a hair's breadth beneath the surface.
  • Intel's rumored top desktop CPU runs either up to 8+8 fast threads or up to 48 throughput threads.
  • AMD's rumored top desktop CPU runs either up to 12+12 fast threads or up to 48 throughput threads.
From that, it is not hard to extrapolate the likely behaviour of programs which scale better or worse on such CPUs.
True. NVL is actually a 48c part. Still, on a single Zen 5 core running highly MT code, SMT only gets you about 30% on desktop (can be more like 40% in DC).

Two Skymont cores give you 100% scaling in highly MT code.
A lot of people here are claiming N3B is barely any better than N4P. If that's the case, then Intel going from N3B to N2 is going to be very close to the same jump that AMD is getting going from N4P to N2. Can't have it both ways.
Agree. We should use the same logic in both cases!
 

Joe NYC

Diamond Member
Jun 26, 2021
yeah

You don't need a 'whole lot' since Hi is only for R9 (tiny subset of the overall volume).

So the low-end configuration is just the monolithic chip, and the higher-end implementation just uses the LP cores, disables the cores on the SoC, and uses the cores from the attached CCD?

I kind of like the idea (other than wasting die space). The cores on the 12-core CCD will perform much faster on CPU-intensive tasks, so why leave it up to MSFT to screw it up with their scheduling and threads jumping back and forth? And there will be a nice 48 MB L3 to use.
 

Hulk

Diamond Member
Oct 9, 1999
Nothing is a given.
Exactly. It's all rumors at this point. Always in motion are these CPUs on the horizon.

In other news, I wasted the morning finding the USB drive with my new TP-Link Deco BE63 mesh network. Got it connected, but it was waaay harder to find and connect to the USB drive than it should have been. Good news is USB sharing speed is like 12 times faster, from 5 Mbps with the old Netgear R7000 to 60 Mbps with the Deco mesh. Coverage with two units is of course much better as well. Funny thing is I was on a chat with TP-Link tech support for an hour before they escalated me to higher support; during the wait I figured it out myself. Just in case you come across a network file sharing issue...

Fixed using PowerShell (note: these relax SMB security, so only do this on a trusted home network):

# Allow unauthenticated (guest) access to SMB shares from this client
Set-SmbClientConfiguration -EnableInsecureGuestLogons $true -Force

# Stop requiring SMB signing on the client and the server side
Set-SmbClientConfiguration -RequireSecuritySignature $false -Force
Set-SmbServerConfiguration -RequireSecuritySignature $false -Force
 

StefanR5R

Elite Member
Dec 10, 2016
  • Intel's rumored top desktop CPU runs either up to 8+8 fast threads or up to 48 throughput threads.
  • AMD's rumored top desktop CPU runs either up to 12+12 fast threads or up to 48 throughput threads.
Difference is that Intel's and AMD's throughput threads are different animals, because the latter uses HT/SMT.
Still, a single SMT core with Zen 5 in highly MT code, only gets you about 30% on desktop (can be more like 40% in DC) by using SMT.

Two Skymont cores gives you 100% scaling in highly MT code.
But we aren't talking 2 SMT2 threads vs. 2 E cores. We are talking forty-eight SMT2 threads (in a desktop computer) vs. forty-eight P/E cores (in another desktop computer). This gets you pretty much the same scaling (i.e. total computer throughput) as if AMD put over 9000 SMT2 threads into their desktop computer and Intel put over 9000 P/E cores into theirs.

Edit, that is: At these high thread counts, uncore, memory subsystem, socket power budget... suddenly become much more interesting than core-internal implementation details such as SMT2 vs. P/E. Not to mention the impact of the software side (algorithms, dataset sizes, data formats...) on your scaling.
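The thread-count arithmetic behind that comparison can be sketched in a few lines. A toy Python model, using the rumored core counts from this thread and the ~30% desktop SMT uplift quoted above; it deliberately pretends (unrealistically) that a P core, an E core, and a Zen 6 core each do the same work per cycle, which is exactly the simplification being criticized:

```python
# Rough "throughput thread" comparison using the rumored configs and
# the ~30% desktop SMT uplift quoted earlier in the thread.
# All numbers are illustrative, not measurements.
ZEN6_CORES = 24          # rumored 2x 12-core CCDs
SMT_UPLIFT = 0.30        # ~30% extra throughput per core from SMT2
NVL_P, NVL_E = 16, 32    # rumored 16P + 32E (the 4 LP-E cores ignored)

amd_threads = ZEN6_CORES * 2       # 48 SMT2 threads
intel_threads = NVL_P + NVL_E      # 48 physical P/E cores

# Relative MT throughput in "single-core units", pretending every core
# type does 1.0 unit of work per cycle (they don't).
amd_mt = ZEN6_CORES * (1 + SMT_UPLIFT)   # 24 cores + SMT uplift
intel_mt = NVL_P + NVL_E                 # physical cores scale ~100%
print(amd_threads, intel_threads)
print(amd_mt, intel_mt)
```

Under that (bad) equal-cores assumption Intel's 48 physical cores win easily, which is why the per-core and uncore details above matter so much more than the raw thread count.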
 

Fjodor2001

Diamond Member
Feb 6, 2010
Yeah. In the end it's going to be 48 vs 48. Those 4 LP-E cores will only make a difference in the marketing presentations
Difference is that the 48 AMD threads will be slower than the 48 Intel threads. Because for AMD it's 24C/48T, but for Intel it's 48C/48T.

(And yeah I know for Intel it's a mix of P+E cores while AMD uses only P cores, but the difference above will trump that.)
 

Tuna-Fish

Golden Member
Mar 4, 2011
Edit, that is, uncore, memory subsystem, socket power budget... suddenly become much more interesting than core-internal implementation details such as SMT2 vs. P/E.
Seconding this, at 24 full cores the max core throughput of Zen6 probably depends more on the IOD than on the cores. With all that throughput driven by just 128 bits of DDR5, just how fast the DDR5 runs and how well it's utilized will matter a lot.
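For a sense of scale, a quick back-of-the-envelope in Python; the DDR5-8000 speed grade here is purely my assumption for illustration, not a Zen 6 spec:

```python
# Peak bandwidth of a 128-bit (dual-channel) DDR5 interface, split
# across 24 cores. The speed grade is an assumption, not a spec.
mt_per_s = 8000              # mega-transfers/s (DDR5-8000, assumed)
bus_bytes = 128 // 8         # 128-bit bus = 16 bytes per transfer
cores = 24

total_gbs = mt_per_s * bus_bytes / 1000      # GB/s, peak
per_core = total_gbs / cores
print(f"{total_gbs:.0f} GB/s total, {per_core:.1f} GB/s per core")
```

Even at that optimistic speed grade, each of 24 cores sees only a bit over 5 GB/s of peak bandwidth, which is why how well the DDR5 is utilized matters so much.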
 

LightningZ71

Platinum Member
Mar 10, 2017
Something else to keep in mind: the P cores of Nova Lake are switching to a shared-L2 strategy where pairs of P cores share a single 4 MB L2 pool. This will have a modest but negative impact on MT performance, as they will both have to share a single ring bus port. The E cores will also continue to share L2 pools in a 4:1 ratio. The more throughput the individual cores demand, the more of a bottleneck those shared ports will create. That's 24 full-performance cores vying for 8 ports on a ring bus per compute chiplet vs. 12 cores with 2 SMT threads each vying for 12 links to a hybrid mesh L3. Now, the L2 on Intel's cores is larger, for the P cores at least, so that should help, but in the end that's a whole lot of contention.
 

Philste

Senior member
Oct 13, 2023
You don't need a 'whole lot' since Hi is only for R9 (tiny subset of the overall volume).
So R7 is basically Krackan with ZEN6? (In a similar way that Cezanne was Renoir with ZEN3).

And the R9s get a CCD with 10/12 active ZEN6 cores and the 4+4 in the main Die gets fused off?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Difference is that the 48 AMD threads will be slower than the 48 Intel threads. Because for AMD it's 24C/48T, but for Intel it's 48C/48T.

(And yeah I know for Intel it's a mix of P+E cores while AMD uses only P cores, but the difference above will trump that.)
AMD does not have P cores or E cores, that's an Intel thing. AMD's cores are all the same.
 

OneEng2

Senior member
Sep 19, 2022
No it's not. N3E is better than N3B, except for SRAM.
Link please.
But we aren't talking 2 SMT2 threads vs. 2 E cores. We are talking forty-eight SMT2 threads (in a desktop computer) vs. forty-eight P/E cores (in another desktop computer). This gets you pretty much the same scaling (i.e. total computer throughput) as if AMD put over 9000 SMT2 threads into their desktop computer and Intel put over 9000 P/E cores into theirs.

Edit, that is: At these high thread counts, uncore, memory subsystem, socket power budget... suddenly become much more interesting than core-internal implementation details such as SMT2 vs. P/E. Not to mention the impact of the software side (algorithms, dataset sizes, data formats...) on your scaling.
Currently, it appears that ARL is not yet memory bound and scales well with the number of E cores.

I believe that both Intel and AMD will increase memory bandwidth with the next gen. Will it be enough? Great question.
Difference is that the 48 AMD threads will be slower than the 48 Intel threads. Because for AMD it's 24C/48T, but for Intel it's 48C/48T.

(And yeah I know for Intel it's a mix of P+E cores while AMD uses only P cores, but the difference above will trump that.)
Agree.
 

MS_AT

Senior member
Jul 15, 2024
So you are saying that e.g. 48 FP64 fused multiply add ops (without data interdependence) will go faster on Intel than on AMD? And why will this be the case?
In favourable circumstances for Intel, it will be the case. Think Cinebench vs. Y-cruncher. E cores have more but narrower execution units (4 FMAs per cycle vs. 2 on Zen), so if software uses mostly scalar or narrow SIMD, E cores might have a lead.
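Taking the pipe counts in that post at face value, the crossover is easy to see; the vector widths below (512-bit FMA units on Zen 5, 128-bit on the E cores) are my assumptions for the sake of illustration:

```python
# Peak FP64 FMA throughput per cycle, using the pipe counts from the
# post above (4 narrow E-core pipes vs. 2 wide Zen pipes). The unit
# widths (512-bit Zen 5, 128-bit E core) are assumptions.
FP64_BITS = 64

def fmas_per_cycle(pipes, width_bits, elem_bits=FP64_BITS):
    """FP64 FMA operations retired per cycle at full utilization."""
    return pipes * (width_bits // elem_bits)

zen5_simd = fmas_per_cycle(pipes=2, width_bits=512)   # full-width SIMD
ecore_simd = fmas_per_cycle(pipes=4, width_bits=128)  # full-width SIMD
zen5_scalar = fmas_per_cycle(pipes=2, width_bits=64)  # scalar code
ecore_scalar = fmas_per_cycle(pipes=4, width_bits=64) # scalar code
print(zen5_simd, ecore_simd, zen5_scalar, ecore_scalar)
```

With full-width SIMD the Zen core leads 16 to 8 FMAs per cycle, but in scalar code the ratio flips to 2 vs. 4, matching the Cinebench vs. Y-cruncher contrast above.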
 

naukkis

Golden Member
Jun 5, 2002
Something else to keep in mind: the P cores of Nova Lake are switching to a shared-L2 strategy where pairs of P cores share a single 4 MB L2 pool. This will have a modest but negative impact on MT performance, as they will both have to share a single ring bus port. The E cores will also continue to share L2 pools in a 4:1 ratio. The more throughput the individual cores demand, the more of a bottleneck those shared ports will create. That's 24 full-performance cores vying for 8 ports on a ring bus per compute chiplet vs. 12 cores with 2 SMT threads each vying for 12 links to a hybrid mesh L3. Now, the L2 on Intel's cores is larger, for the P cores at least, so that should help, but in the end that's a whole lot of contention.

You know ring bus performance won't grow with more ring stops but will actually regress? L3 speed increases directly with more slices, but that's a different story. Nova Lake's L3 slices will be a massive 20 MB, so instead of splitting L3 accesses across the ring like the current generation, it might prefer caching in the local slice(s); each CPU pair might then have 24 MB of local cache before having to access the ring, thus greatly reducing ring bandwidth demand. And because the ring stops are limited to so few, Intel has a real possibility to unify the L3 in their two-chiplet designs. There is a possibility that Nova Lake might perform quite well.

My impression is that Pat kicked the bean counters in the nuts and this design is built for performance without cost limits. It might not be a financially viable product, but it sure shouldn't lack performance, and AMD needs to be on top of their game to retain the top spot in the performance race.
 

Thunder 57

Diamond Member
Aug 19, 2007
1 Million?! :eek: Who? How? When?
Idiocracy was awesome, I agree. Hard to believe it came out almost 20 years ago.

You want me to do your homework for you ;) ? This is supposed to be a family friendly-ish site. I'll give you a clue, open any search engine and type "bh" and look at what it suggests.