Question Zen 6 Speculation Thread

Nothingness · Tuesday at 10:02 AM

MS_AT said:
Do they give higher uplift in SPEC specifically compared to other codebases? I remember Phoronix was doing AOCC against vanilla clang, but since Phoronix is not doing SPEC I was not able to find a place that would compare the two (aocc vs vanilla clang) on the same hw/os in SPEC.

I only know that it has impacts on parts of SPEC. In my experience too often the optimizations targetting SPEC have little interest outside of it, but that's only my dated experience

OneEng2 · Tuesday at 1:10 PM

Hitman928 said:
One easy way to see that CPU design affects power is that different architectures use different numbers of transistors and power is directly related to the number of transistors in your design (static portion of total power unless power gated) and the number of transistors switching at any given time (dynamic portion of total power).

True. Still, something is always given up. Die size, clock speed, PPA, etc. You can't win on all sides at once.

Clever CPU designs do a better job of making the trade-offs though

.

Josh128 said:
Stop dogging Intel bruh, I thought you were rooting for them.

I absolutely am rooting for them! I don't wish for them to rely on US tax payer money forever though. I want them to actually be competitive.

I think that BSPDN and GAA in one step was very very risky. Intel only did this because they felt it was the only way to jump ahead of TSMC. My fear is that the gamble will not pay off.

I also feel their lack of SMT is killing them in DC where they used to reign supreme.

These kinds of strategic mistakes are troubling for anyone who is rooting for Intel IMO.

jpiniero · Tuesday at 1:11 PM

OneEng2 said:
I also feel their lack of SMT is killing them in DC where they used to reign supreme.

The main problem is that they need to be fabbing the server products at TSMC to even have a chance at competitiveness.

adroc_thurston · Tuesday at 1:12 PM

jpiniero said:
The main problem is that they need to be fabbing the server products at TSMC to even have a chance at competitiveness.

neither is really the problem.
Intel just has bad cores, bad fabric, bad etc.

Geddagod · Tuesday at 2:02 PM

adroc_thurston said:
neither is really the problem.
Intel just has bad cores, bad fabric, bad etc.

Honestly IMO there's no point of fabbing anything at TSMC if they couldn't compete even if they did.
ARL on TSMC might have outright been a strategic mistake in hindsight, but then again with 20A canned they might have been stuck on MTL-R if they didn't use TSMC lmao.

adroc_thurston · Tuesday at 2:04 PM

Geddagod said:
Honestly IMO there's no point of fabbing anything at TSMC if they couldn't compete even if they did.

well the plan was that they could.

Geddagod said:
ARL on TSMC might have outright been a strategic mistake in hindsight

They wanted the best node for the best cores, only that the cores turned out to be not-best.

LightningZ71 · Tuesday at 2:28 PM

The sad part is that, in the context of MTL(R) as seen in ARL-U, Intel 3 isn't a bad node. It is my opinion that they under invested in building it out, or it just ran too late for what they needed, to have been used for ARL proper, or for ARL(R) instead of TSMC. Of course, I'm not privy to the number of wafers that the committed to at TSMC on N3B and they may have just been in a situation that it didn't make any economic sense to use anything else.

Geddagod · Tuesday at 2:37 PM

LightningZ71 said:
The sad part is that, in the context of MTL(R) as seen in ARL-U, Intel 3 isn't a bad node. It is my opinion that they under invested in building it out, or it just ran too late for what they needed, to have been used for ARL proper, or for ARL(R) instead of TSMC. Of course, I'm not privy to the number of wafers that the committed to at TSMC on N3B and they may have just been in a situation that it didn't make any economic sense to use anything else.

Given how N3B turned out, I think Intel deff has some regrets. I still think it's the best node Intel could have gotten at the time... but still...

I do wonder though why they didn't end up porting their stuff to N3E. Or have at least LNL be on N3E. Even if ARL was planned for 2023 and thus uses N3B, wasn't LNL always planned for 2024? And during the time they were making these decisions, was Intel not still relatively well financially?

adroc_thurston · Tuesday at 2:38 PM

Geddagod said:
I do wonder though why they didn't end up porting their stuff to N3E.

because they're completely different nodes.

Geddagod · Tuesday at 2:51 PM

adroc_thurston said:
because they're completely different nodes.

Hence the use of the word port.
Intel deff has made relatively wasteful die segmentation decisions in the past, this would be right up Intel's alley, except this time the difference would actually be some what beneficial.

adroc_thurston · Tuesday at 2:54 PM

Geddagod said:
Hence the use of the word port.

Intel uses a ton of custom IP and all that is just a hassle to port.

Joe NYC · Tuesday at 3:02 PM

OneEng2 said:
I think that BSPDN and GAA in one step was very very risky. Intel only did this because they felt it was the only way to jump ahead of TSMC. My fear is that the gamble will not pay off.

It seems to me that where BSPD would help the most is high performance, high power chip, where a lot of power has to be delivered.

At the same time 18A does not excel for these types of chips, Intel went to TSMC N2 for Nova Lake.

adroc_thurston · Tuesday at 3:26 PM

Joe NYC said:
It seems to me that where BSPD would help the most is high performance, high power chip, where a lot of power has to be delivered.

BSPDNs are inherent heat insulators and provide a massive challenge for HPC applications.

Joe NYC said:
At the same time 18A does not excel for these types of chips, Intel went to TSMC N2 for Nova Lake.

That's because N2p is a much faster node.

poke01 · Tuesday at 3:26 PM

Joe NYC said:
It seems to me that where BSPD would help the most is high performance, high power chip, where a lot of power has to be delivered.

Or you could just make a powerful chip that is efficient without BSPD

adroc_thurston · Tuesday at 3:30 PM

poke01 said:
Or you could just make a powerful chip that is efficient without BSPD

It's a scaling booster.
It doesn't make anything efficient.

Geddagod · Tuesday at 3:30 PM

adroc_thurston said:
BSPDNs are inherent heat insulators and provide a massive challenge for HPC applications.

GPU specific maybe, if not for the highest end DT CPU. Nvidia seems to be using A16

poke01 said:
Or you could just make a powerful chip that is efficient without BSPD

Am I tweaking or does Apple not prob have amazing thermal density? Huge cores, low(ish) 1T power...

adroc_thurston · Tuesday at 3:31 PM

Geddagod said:
GPU specific maybe, if not for the highest end DT CPU. Nvidia seems to be using A16

Everyone.
Heat's getting hard for everyone.

Doug S · Tuesday at 4:57 PM

MS_AT said:
You don't need to go to assembly unless compiler is doing something stupid (or you are writing microbenchmarks that are checking architecture specific things, but then it's easier from assembly). You have intrinsics. You have 3rd party libs. Some language have support built-in into the standard library of the language. And most of all, if you write the high level code using well known idioms for a language the compiler will be able to pick this up. Problem is compilers are capricious so the libs are better fit if you want control. But we are well past: "want to use SIMD? you have to roll your hand optimized assembly every time".

True but when it comes to libraries that's another variable that can confuse results. If a library is rewritten to use SIMD or matrix instructions it could have some pretty significant impact on the results of certain benchmarks. Unless Geekbench is statically linking everything (and that's increasingly not an option these days) you could do update your OS to the latest patches and if it includes that new library suddenly the same version of Geekbench on the same hardware is showing improved results.

Not really any way to control for that factor, and you may not even know it IS a factor unless you notice a performance gain that's large enough to not dismiss as noise and start digging around to find out why.

OneEng2 · Tuesday at 7:17 PM

Tuna-Fish said:
If you can fit some of your most important indexes in cache, it suddenly becomes very relevant for databases.

Absolutely.

Joe NYC said:
It seems to me that where BSPD would help the most is high performance, high power chip, where a lot of power has to be delivered.

At the same time 18A does not excel for these types of chips, Intel went to TSMC N2 for Nova Lake.

I think it is just the opposite. Where BSPDN is likely to do well is in something like DC where you are likely to max out the socket power with a butt ton of cores all running at relatively lower clocks.

I believe BSPDN provides higher density, and lower IR losses (less heat) at the interconnects .... but it makes it easier to get hot spots in your core logic ..... making it difficult to scale up in clock speed for applications like HEDT.

Markfw · Tuesday at 7:49 PM

OneEng2 said:
Absolutely.

I think it is just the opposite. Where BSPDN is likely to do well is in something like DC where you are likely to max out the socket power with a butt ton of cores all running at relatively lower clocks.

I believe BSPDN provides higher density, and lower IR losses (less heat) at the interconnects .... but it makes it easier to get hot spots in your core logic ..... making it difficult to scale up in clock speed for applications like HEDT.

OK, I can't find the original post to quote, but where you agreed about the cache fitting in memory, I wanted to add a real world scenario. A primegrid task where you use pinning (can be accomplished by smaller instances that are pinned) time goes from days to 9 hours. smaller examples exist, just giving real world times on a database task.

Thibsie · 2025-10-15T00:46:56-0400

Doug S said:
True but when it comes to libraries that's another variable that can confuse results. If a library is rewritten to use SIMD or matrix instructions it could have some pretty significant impact on the results of certain benchmarks. Unless Geekbench is statically linking everything (and that's increasingly not an option these days) you could do update your OS to the latest patches and if it includes that new library suddenly the same version of Geekbench on the same hardware is showing improved results.

Not really any way to control for that factor, and you may not even know it IS a factor unless you notice a performance gain that's large enough to not dismiss as noise and start digging around to find out why.

Not if the said library is bundled with geekbench ?

Doug S · 2025-10-15T00:53:37-0400

Thibsie said:
Not if the said library is bundled with geekbench ?

I should have clarified I was talking about libraries that are part of the OS / shipped by the OS vendor, like Apple's Accelerate library. Those are the only ones an OS update might modify/improve.

MS_AT · 2025-10-15T04:55:16-0400

Doug S said:
True but when it comes to libraries that's another variable that can confuse results

Just a clarification by mentioning libs in SIMD context I meant libs facilitating SIMD usage like Agner Fog's vectorclass, EVE or google's highway. These you could link statically if you so desired.

In general sure, depending on system wide DLLs will introduce variations in the results.

Nothingness · 2025-10-15T08:04:40-0400

Doug S said:
True but when it comes to libraries that's another variable that can confuse results. If a library is rewritten to use SIMD or matrix instructions it could have some pretty significant impact on the results of certain benchmarks. Unless Geekbench is statically linking everything (and that's increasingly not an option these days) you could do update your OS to the latest patches and if it includes that new library suddenly the same version of Geekbench on the same hardware is showing improved results.

Not really any way to control for that factor, and you may not even know it IS a factor unless you notice a performance gain that's large enough to not dismiss as noise and start digging around to find out why.

Can't remember if it was in GB4 or GB5, but one of the tests was making a significant use of sinf/cosf (single precision sine/cosine) which back then were poorly optimized in Android math lib. Also note how results posted on SPEC website use specific versions of math and memory libraries to override system libraries.

Geddagod · 2025-10-15T11:04:43-0400

reaperrr3 said:
Zen4->5 was on a barely improved node, with a ~30% fatter core due to full-rate AXV512/512bit FP pipes and 50% more INT ALUs.
It was a very server-focused design, so hitting 6+ GHz was secondary, as server CPUs don't clock in that range anyway.
With Zen6 on the other hand, the only server-focused aspect is the design of the 32c dense CCD, otherwise it seems to be more about clocks and core count.

And ARL's E core was a shrink from Intel 7 to TSMC N3B, that's like 3 full node jumps by today's standards (Intel 7 ~= TSMC N10, in terms of density and power efficiency).
Also, right now it still has a fundamentally shorter pipeline and worse V/f curve than Zen, let's wait and see how well its PPA advantage holds up when they keep adding transistors for IPC, SMT, increasing pipeline length and adjust the physical design to hit higher frequencies etc. to reach Zen6/7-like ST performance.

Zen 4 to Zen 5 was a massive sub node improvement.
Zen 5 being a server focused design is just cope for why it's so mid tbh.
ARL's E-core's pipeline is extremely long. Way longer than Zen's.

Question Zen 6 Speculation Thread

Diamond Member

Senior member

Lifer

Diamond Member

Golden Member

Diamond Member

Platinum Member

Golden Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Senior member

Moderator Emeritus, Elite Member

Golden Member

Diamond Member

Senior member

Diamond Member

Golden Member