
Question Zen 6 Speculation Thread

Do they give higher uplift in SPEC specifically compared to other codebases? I remember Phoronix was doing AOCC against vanilla clang, but since Phoronix is not doing SPEC I was not able to find a place that would compare the two (aocc vs vanilla clang) on the same hw/os in SPEC.
I only know that it affects parts of SPEC. In my experience, too often the optimizations targeting SPEC are of little interest outside of it, but that's only my dated experience 😉
 
One easy way to see that CPU design affects power is that different architectures use different numbers of transistors. Power is directly related to the total number of transistors in your design (the static portion of total power, unless power-gated) and to the number of transistors switching at any given time (the dynamic portion).
True. Still, something is always given up. Die size, clock speed, PPA, etc. You can't win on all sides at once.

Clever CPU designs do a better job of making the trade-offs though 🙂.
Stop dogging Intel bruh, I thought you were rooting for them.
I absolutely am rooting for them! I don't wish for them to rely on US taxpayer money forever though. I want them to actually be competitive.

I think that BSPDN and GAA in one step was very very risky. Intel only did this because they felt it was the only way to jump ahead of TSMC. My fear is that the gamble will not pay off.

I also feel their lack of SMT is killing them in DC where they used to reign supreme.

These kinds of strategic mistakes are troubling for anyone who is rooting for Intel IMO.
 
neither is really the problem.
Intel just has bad cores, bad fabric, bad etc.
Honestly, IMO there's no point in fabbing anything at TSMC if they couldn't compete even when they did.
ARL on TSMC might have outright been a strategic mistake in hindsight, but then again with 20A canned they might have been stuck on MTL-R if they didn't use TSMC lmao.
 
The sad part is that, in the context of MTL(R) as seen in ARL-U, Intel 3 isn't a bad node. It is my opinion that they underinvested in building it out, or it simply ran too late for what they needed, for it to have been used for ARL proper, or for ARL(R) instead of TSMC. Of course, I'm not privy to the number of wafers they committed to at TSMC on N3B; they may have just been in a situation where it didn't make any economic sense to use anything else.
 
The sad part is that, in the context of MTL(R) as seen in ARL-U, Intel 3 isn't a bad node. It is my opinion that they underinvested in building it out, or it simply ran too late for what they needed, for it to have been used for ARL proper, or for ARL(R) instead of TSMC. Of course, I'm not privy to the number of wafers they committed to at TSMC on N3B; they may have just been in a situation where it didn't make any economic sense to use anything else.
Given how N3B turned out, I think Intel deff has some regrets. I still think it's the best node Intel could have gotten at the time... but still...

I do wonder, though, why they didn't end up porting their stuff to N3E, or at least have LNL be on N3E. Even if ARL was planned for 2023 and thus uses N3B, wasn't LNL always planned for 2024? And during the time they were making these decisions, was Intel not still relatively well off financially?
 
I think that BSPDN and GAA in one step was very very risky. Intel only did this because they felt it was the only way to jump ahead of TSMC. My fear is that the gamble will not pay off.

It seems to me that where BSPD would help the most is high-performance, high-power chips, where a lot of power has to be delivered.

At the same time, 18A does not excel for these types of chips; Intel went to TSMC N2 for Nova Lake.
 
It seems to me that where BSPD would help the most is high-performance, high-power chips, where a lot of power has to be delivered.
BSPDNs inherently act as heat insulators and pose a massive challenge for HPC applications.
At the same time, 18A does not excel for these types of chips; Intel went to TSMC N2 for Nova Lake.
That's because N2P is a much faster node.
 
You don't need to go to assembly unless the compiler is doing something stupid (or you are writing microbenchmarks that check architecture-specific things, but then it's easier in assembly). You have intrinsics. You have 3rd-party libs. Some languages have support built into the standard library. And most of all, if you write the high-level code using well-known idioms for a language, the compiler will be able to pick this up. The problem is that compilers are capricious, so the libs are a better fit if you want control. But we are well past "want to use SIMD? You have to roll your own hand-optimized assembly every time".


True, but when it comes to libraries, that's another variable that can confuse results. If a library is rewritten to use SIMD or matrix instructions, it could have a pretty significant impact on the results of certain benchmarks. Unless Geekbench is statically linking everything (and that's increasingly not an option these days), you could update your OS to the latest patches, and if they include that new library, suddenly the same version of Geekbench on the same hardware is showing improved results.

There's not really any way to control for that factor, and you may not even know it IS a factor unless you notice a performance gain large enough not to dismiss as noise and start digging around to find out why.
 
If you can fit some of your most important indexes in cache, it suddenly becomes very relevant for databases.
Absolutely.
It seems to me that where BSPD would help the most is high-performance, high-power chips, where a lot of power has to be delivered.

At the same time, 18A does not excel for these types of chips; Intel went to TSMC N2 for Nova Lake.
I think it is just the opposite. Where BSPDN is likely to do well is in something like DC where you are likely to max out the socket power with a butt ton of cores all running at relatively lower clocks.

I believe BSPDN provides higher density and lower IR losses (less heat) at the interconnects... but it makes it easier to get hot spots in your core logic... making it difficult to scale up in clock speed for applications like HEDT.
 
Absolutely.

I think it is just the opposite. Where BSPDN is likely to do well is in something like DC where you are likely to max out the socket power with a butt ton of cores all running at relatively lower clocks.

I believe BSPDN provides higher density and lower IR losses (less heat) at the interconnects... but it makes it easier to get hot spots in your core logic... making it difficult to scale up in clock speed for applications like HEDT.
OK, I can't find the original post to quote, but where you agreed about the indexes fitting in cache, I wanted to add a real-world scenario. With a PrimeGrid task, when you use pinning (which can be accomplished with smaller instances that are pinned), the time goes from days to 9 hours. Smaller examples exist; I'm just giving real-world times on a database-style task.
 
True, but when it comes to libraries, that's another variable that can confuse results. If a library is rewritten to use SIMD or matrix instructions, it could have a pretty significant impact on the results of certain benchmarks. Unless Geekbench is statically linking everything (and that's increasingly not an option these days), you could update your OS to the latest patches, and if they include that new library, suddenly the same version of Geekbench on the same hardware is showing improved results.

There's not really any way to control for that factor, and you may not even know it IS a factor unless you notice a performance gain large enough not to dismiss as noise and start digging around to find out why.
Not if said library is bundled with Geekbench?
 
True, but when it comes to libraries, that's another variable that can confuse results
Just a clarification: by mentioning libs in a SIMD context, I meant libs facilitating SIMD usage, like Agner Fog's vectorclass, EVE, or Google's Highway. These you could link statically if you so desired.

In general, sure: depending on system-wide DLLs will introduce variations in the results.
 
True, but when it comes to libraries, that's another variable that can confuse results. If a library is rewritten to use SIMD or matrix instructions, it could have a pretty significant impact on the results of certain benchmarks. Unless Geekbench is statically linking everything (and that's increasingly not an option these days), you could update your OS to the latest patches, and if they include that new library, suddenly the same version of Geekbench on the same hardware is showing improved results.

There's not really any way to control for that factor, and you may not even know it IS a factor unless you notice a performance gain large enough not to dismiss as noise and start digging around to find out why.
I can't remember if it was in GB4 or GB5, but one of the tests made significant use of sinf/cosf (single-precision sine/cosine), which back then were poorly optimized in Android's math lib. Also note how results posted on the SPEC website use specific versions of math and memory libraries to override the system libraries.
 
Zen4->5 was on a barely improved node, with a ~30% fatter core due to full-rate AVX512/512-bit FP pipes and 50% more INT ALUs.
It was a very server-focused design, so hitting 6+ GHz was secondary, as server CPUs don't clock in that range anyway.
With Zen6 on the other hand, the only server-focused aspect is the design of the 32c dense CCD, otherwise it seems to be more about clocks and core count.

And ARL's E core was a shrink from Intel 7 to TSMC N3B, that's like 3 full node jumps by today's standards (Intel 7 ~= TSMC N10, in terms of density and power efficiency).
Also, right now it still has a fundamentally shorter pipeline and a worse V/f curve than Zen. Let's wait and see how well its PPA advantage holds up when they keep adding transistors for IPC and SMT, lengthen the pipeline, and adjust the physical design to hit higher frequencies, etc., to reach Zen6/7-like ST performance.
Zen 4 to Zen 5 was a massive sub node improvement.
Zen 5 being a server focused design is just cope for why it's so mid tbh.
ARL's E-core's pipeline is extremely long. Way longer than Zen's.
 