Zen 6 Speculation Thread


Nothingness

Diamond Member
Jul 3, 2013
3,307
2,379
136
Do they give higher uplift in SPEC specifically compared to other codebases? I remember Phoronix benchmarking AOCC against vanilla clang, but since Phoronix doesn't run SPEC, I wasn't able to find a place that compares the two (AOCC vs vanilla clang) on the same hardware/OS in SPEC.
I only know that it has an impact on parts of SPEC. In my experience, too often the optimizations targeting SPEC have little interest outside of it, but that's only my dated experience ;)
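For anyone wanting to compare the two themselves: a minimal sketch of the kind of trivially vectorizable kernel where clang-family compilers tend to diverge. The compile lines in the comments are illustrative guesses, not a known-good SPEC config.

```cpp
// saxpy.cpp -- a trivially vectorizable kernel; compilers differ mainly
// in how aggressively they unroll and vectorize loops like this one.
//
// Hypothetical apples-to-apples compile lines on one box (flags assumed):
//   vanilla clang: clang++ -O3 -march=native saxpy.cpp -o saxpy_clang
//   AOCC:          same command, invoking AOCC's clang++ driver
#include <cstddef>
#include <cstdio>
#include <vector>

void saxpy(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];   // the loop both compilers should vectorize
}

int main() {
    std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
    saxpy(3.0f, x.data(), y.data(), x.size());
    std::printf("%f\n", y[0]);    // keep the result live
}
```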
 

OneEng2

Senior member
Sep 19, 2022
855
1,115
106
One easy way to see that CPU design affects power is that different architectures use different numbers of transistors, and power is directly related to both the number of transistors in the design (the static portion of total power, unless power gated) and the number switching at any given time (the dynamic portion of total power).
True. Still, something is always given up. Die size, clock speed, PPA, etc. You can't win on all sides at once.

Clever CPU designs do a better job of making the trade-offs though :).
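To put rough numbers on that static/dynamic split, here's a back-of-the-envelope sketch of the textbook CMOS power model (dynamic power = α·C·V²·f plus leakage); every value in it is made up for illustration, not a measurement of any real chip.

```cpp
// Textbook CMOS power model: P_total = P_static + P_dynamic, where
// P_dynamic = alpha * C * V^2 * f (activity factor, switched capacitance,
// supply voltage squared, clock frequency). All values are illustrative.
#include <cstdio>

int main() {
    double alpha    = 0.15;    // fraction of transistors switching per cycle
    double C        = 5e-8;    // total switched capacitance (farads)
    double V        = 1.1;     // supply voltage (volts)
    double f        = 4.0e9;   // clock frequency (hertz)
    double p_static = 15.0;    // leakage from non-power-gated transistors (watts)

    double p_dynamic = alpha * C * V * V * f;   // ~36 W with these inputs
    std::printf("dynamic %.1f W + static %.1f W = %.1f W\n",
                p_dynamic, p_static, p_dynamic + p_static);
}
```

Note the V² term: it's why a design that hits its performance target at lower voltage wins so much on power.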
Stop dogging Intel bruh, I thought you were rooting for them.
I absolutely am rooting for them! I don't wish for them to rely on US taxpayer money forever, though. I want them to actually be competitive.

I think that BSPDN and GAA in one step was very very risky. Intel only did this because they felt it was the only way to jump ahead of TSMC. My fear is that the gamble will not pay off.

I also feel their lack of SMT is killing them in DC where they used to reign supreme.

These kinds of strategic mistakes are troubling for anyone who is rooting for Intel IMO.
 

Geddagod

Golden Member
Dec 28, 2021
1,541
1,627
106
Neither is really the problem.
Intel just has bad cores, bad fabric, bad etc.
Honestly, IMO there's no point in fabbing anything at TSMC if they can't compete even when they do.
ARL on TSMC might have outright been a strategic mistake in hindsight, but then again, with 20A canned, they might have been stuck on MTL-R if they hadn't used TSMC lmao.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,536
3,226
136
The sad part is that, in the context of MTL(R) as seen in ARL-U, Intel 3 isn't a bad node. It is my opinion that they either under-invested in building it out, or it simply ran too late for what they needed, for it to have been used for ARL proper or for an ARL(R) instead of TSMC. Of course, I'm not privy to the number of wafers they committed to at TSMC on N3B, and they may simply have been in a situation where it didn't make economic sense to use anything else.
 

Geddagod

Golden Member
Dec 28, 2021
1,541
1,627
106
The sad part is that, in the context of MTL(R) as seen in ARL-U, Intel 3 isn't a bad node. It is my opinion that they either under-invested in building it out, or it simply ran too late for what they needed, for it to have been used for ARL proper or for an ARL(R) instead of TSMC. Of course, I'm not privy to the number of wafers they committed to at TSMC on N3B, and they may simply have been in a situation where it didn't make economic sense to use anything else.
Given how N3B turned out, I think Intel deff has some regrets. I still think it's the best node Intel could have gotten at the time... but still...

I do wonder, though, why they didn't end up porting their stuff to N3E, or at least have LNL be on N3E. Even if ARL was planned for 2023 and thus uses N3B, wasn't LNL always planned for 2024? And at the time they were making these decisions, wasn't Intel still in relatively good shape financially?
 

Geddagod

Golden Member
Dec 28, 2021
1,541
1,627
106
because they're completely different nodes.
Hence the use of the word port.
Intel deff has made relatively wasteful die-segmentation decisions in the past, so this would be right up Intel's alley, except this time the difference would actually be somewhat beneficial.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,680
5,208
136
I think that BSPDN and GAA in one step was very very risky. Intel only did this because they felt it was the only way to jump ahead of TSMC. My fear is that the gamble will not pay off.

It seems to me that where BSPD would help the most is in high-performance, high-power chips, where a lot of power has to be delivered.

At the same time, 18A does not excel for these types of chips; Intel went to TSMC N2 for Nova Lake.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,248
10,007
106
It seems to me that where BSPD would help the most is in high-performance, high-power chips, where a lot of power has to be delivered.
BSPDNs are inherently heat insulators and pose a massive challenge for HPC applications.
At the same time, 18A does not excel for these types of chips; Intel went to TSMC N2 for Nova Lake.
That's because N2P is a much faster node.
 
  • Like
Reactions: Joe NYC

Doug S

Diamond Member
Feb 8, 2020
3,601
6,368
136
You don't need to go to assembly unless the compiler is doing something stupid (or you are writing microbenchmarks that check architecture-specific things, but then it's easier to start from assembly). You have intrinsics. You have 3rd-party libs. Some languages have support built into their standard library. And most of all, if you write the high-level code using well-known idioms for a language, the compiler will be able to pick this up. The problem is that compilers are capricious, so the libs are a better fit if you want control. But we are well past "want to use SIMD? you have to roll your hand-optimized assembly every time".


True, but when it comes to libraries, that's another variable that can confuse results. If a library is rewritten to use SIMD or matrix instructions, it could have a pretty significant impact on the results of certain benchmarks. Unless Geekbench is statically linking everything (and that's increasingly not an option these days), you could update your OS to the latest patches, and if they include that new library, suddenly the same version of Geekbench on the same hardware is showing improved results.

There's not really any way to control for that factor, and you may not even know it IS a factor unless you notice a performance gain that's too large to dismiss as noise and start digging around to find out why.
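To illustrate the "well known idioms" point from the quoted post, here's a sketch of a loop that mainstream compilers will typically vectorize on their own at -O3 (with -ffast-math to permit reassociating the float sum); whether it actually vectorizes depends on compiler version and flags.

```cpp
// Plain-idiom dot product: no intrinsics, no SIMD library. Written this
// way, the autovectorizer generally recognizes the reduction and emits
// SIMD code by itself.
#include <cstddef>

float dot(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];   // the reduction pattern the vectorizer looks for
    return sum;
}
```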
 
  • Like
Reactions: Nothingness

OneEng2

Senior member
Sep 19, 2022
855
1,115
106
If you can fit some of your most important indexes in cache, it suddenly becomes very relevant for databases.
Absolutely.
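A minimal sketch of why fitting in cache matters so much: the same dependent-load chase gets dramatically slower once the working set spills out of cache (the two sizes below are illustrative; the crossover depends on the actual cache hierarchy).

```cpp
// Pointer-chasing through a shuffled permutation: each load depends on the
// previous one, so runtime tracks whether the working set fits in cache.
// (A random permutation's cycle through index 0 is usually long; for rigor,
// build a single full cycle with Sattolo's algorithm instead of shuffle.)
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

static double chase(std::size_t elems) {
    std::vector<std::uint32_t> next(elems);
    std::iota(next.begin(), next.end(), 0u);
    std::shuffle(next.begin(), next.end(), std::mt19937{42});

    auto t0 = std::chrono::steady_clock::now();
    std::uint32_t idx = 0;
    for (int i = 0; i < 10'000'000; ++i)
        idx = next[idx];                        // serialized random loads
    auto t1 = std::chrono::steady_clock::now();
    std::printf("sink %u\n", idx);              // keep the chase alive
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    std::printf("cache-sized (512 KiB): %.3f s\n", chase(1u << 17));
    std::printf("DRAM-sized (256 MiB):  %.3f s\n", chase(1u << 26));
}
```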
It seems to me that where BSPD would help the most is in high-performance, high-power chips, where a lot of power has to be delivered.

At the same time, 18A does not excel for these types of chips; Intel went to TSMC N2 for Nova Lake.
I think it is just the opposite. Where BSPDN is likely to do well is in something like DC, where you are likely to max out the socket power with a butt-ton of cores all running at relatively low clocks.

I believe BSPDN provides higher density and lower IR losses (less heat) at the interconnects... but it makes it easier to get hot spots in your core logic... making it difficult to scale up in clock speed for applications like HEDT.
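The IR-loss half of that is plain Ohm's-law arithmetic: delivery loss scales as I²R, so cutting the power-delivery network's effective resistance pays off most at the huge currents of a maxed-out DC socket. A sketch with purely made-up numbers:

```cpp
// I^2 * R loss in the power delivery network. Backside power delivery
// shortens and widens the delivery path, i.e. lowers effective R.
// Both resistances below are hypothetical, chosen only to show the shape.
#include <cstdio>

int main() {
    double current = 300.0;     // amps into a heavily loaded many-core socket
    double r_front = 0.20e-3;   // assumed frontside PDN resistance (ohms)
    double r_back  = 0.10e-3;   // assumed backside PDN resistance (ohms)

    std::printf("frontside loss: %.1f W\n", current * current * r_front);  // 18 W
    std::printf("backside loss:  %.1f W\n", current * current * r_back);   //  9 W
}
```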
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,258
16,116
136
Absolutely.

I think it is just the opposite. Where BSPDN is likely to do well is in something like DC, where you are likely to max out the socket power with a butt-ton of cores all running at relatively low clocks.

I believe BSPDN provides higher density and lower IR losses (less heat) at the interconnects... but it makes it easier to get hot spots in your core logic... making it difficult to scale up in clock speed for applications like HEDT.
OK, I can't find the original post to quote, but on the point you agreed with about indexes fitting in cache, I wanted to add a real-world scenario: a PrimeGrid task where you use pinning (which can be accomplished with smaller instances that are pinned) goes from days to 9 hours. Smaller examples exist; I'm just giving real-world times on a database task.
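For anyone who hasn't played with it, the "pinning" here is just CPU affinity; a minimal Linux sketch (the core number is arbitrary, and `taskset -c 2 ./task` does the same from the shell):

```cpp
// Pin the calling thread to a single core on Linux (glibc).
#include <sched.h>    // sched_setaffinity, CPU_ZERO, CPU_SET
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);                                 // core 2, chosen arbitrarily
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        std::perror("sched_setaffinity");
        return 1;
    }
    std::puts("pinned to core 2");
    // ... run the cache-sensitive workload here ...
}
```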
 

Thibsie

Golden Member
Apr 25, 2017
1,130
1,334
136
True, but when it comes to libraries, that's another variable that can confuse results. If a library is rewritten to use SIMD or matrix instructions, it could have a pretty significant impact on the results of certain benchmarks. Unless Geekbench is statically linking everything (and that's increasingly not an option these days), you could update your OS to the latest patches, and if they include that new library, suddenly the same version of Geekbench on the same hardware is showing improved results.

There's not really any way to control for that factor, and you may not even know it IS a factor unless you notice a performance gain that's too large to dismiss as noise and start digging around to find out why.
Not if said library is bundled with Geekbench?
 

MS_AT

Senior member
Jul 15, 2024
878
1,770
96
True but when it comes to libraries that's another variable that can confuse results
Just a clarification: by mentioning libs in a SIMD context, I meant libs that facilitate SIMD usage, like Agner Fog's vectorclass, EVE, or Google's highway. These you could link statically if you so desired.

In general, sure, depending on system-wide DLLs will introduce variation in the results.
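For a taste of what those wrapper libs look like, a sketch using Agner Fog's vectorclass (assuming its v2 API, where Vec8f wraps eight floats in one AVX register); being header-only, it compiles straight into the binary, so no system-DLL variable:

```cpp
// SAXPY with vectorclass. Build with AVX enabled, e.g. -O2 -mavx.
#include "vectorclass.h"   // https://github.com/vectorclass/version2

void saxpy_vcl(float a, const float* x, float* y, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        Vec8f xv, yv;
        xv.load(x + i);
        yv.load(y + i);
        (a * xv + yv).store(y + i);   // eight lanes per iteration
    }
    for (; i < n; ++i)                // scalar tail for leftover elements
        y[i] = a * x[i] + y[i];
}
```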
 
  • Like
Reactions: Nothingness

Nothingness

Diamond Member
Jul 3, 2013
3,307
2,379
136
True, but when it comes to libraries, that's another variable that can confuse results. If a library is rewritten to use SIMD or matrix instructions, it could have a pretty significant impact on the results of certain benchmarks. Unless Geekbench is statically linking everything (and that's increasingly not an option these days), you could update your OS to the latest patches, and if they include that new library, suddenly the same version of Geekbench on the same hardware is showing improved results.

There's not really any way to control for that factor, and you may not even know it IS a factor unless you notice a performance gain that's too large to dismiss as noise and start digging around to find out why.
I can't remember if it was GB4 or GB5, but one of the tests made significant use of sinf/cosf (single-precision sine/cosine), which back then were poorly optimized in the Android math lib. Also note how the results posted on the SPEC website use specific versions of math and memory libraries to override the system ones.
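The flavor of that is easy to reproduce: a micro-loop dominated by libm's single-precision sine/cosine, so swapping the system math library moves the "score". A sketch (a real harness would also guard against constant folding):

```cpp
// A loop whose runtime is almost entirely sinf/cosf calls: on a platform
// where those are slow (as in the old Android libm case), this is what a
// benchmark subtest ends up measuring.
#include <cmath>
#include <cstdio>

int main() {
    float acc = 0.0f;
    for (int i = 0; i < 10'000'000; ++i) {
        float x = 1e-6f * static_cast<float>(i);
        acc += sinf(x) * cosf(x);   // single-precision libm calls
    }
    std::printf("%f\n", acc);       // keep the calls from being elided
}
```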
 

Geddagod

Golden Member
Dec 28, 2021
1,541
1,627
106
Zen 4 -> Zen 5 was on a barely improved node, with a ~30% fatter core due to full-rate AVX-512 (512-bit FP pipes) and 50% more INT ALUs.
It was a very server-focused design, so hitting 6+ GHz was secondary, as server CPUs don't clock in that range anyway.
With Zen 6, on the other hand, the only server-focused aspect is the design of the 32c dense CCD; otherwise it seems to be more about clocks and core count.

And ARL's E-core was a shrink from Intel 7 to TSMC N3B; that's like 3 full node jumps by today's standards (Intel 7 ~= TSMC N10 in terms of density and power efficiency).
Also, right now it still has a fundamentally shorter pipeline and a worse V/f curve than Zen. Let's wait and see how well its PPA advantage holds up as they keep adding transistors for IPC and SMT, lengthen the pipeline, and adjust the physical design for higher frequencies to reach Zen 6/7-like ST performance.
Zen 4 to Zen 5 was a massive sub-node improvement.
Zen 5 being a server-focused design is just cope for why it's so mid, tbh.
ARL's E-core pipeline is extremely long. Way longer than Zen's.