Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

adroc_thurston · Feb 11, 2024

DisEnchantment said:
It is very odd for AMD to send patches before the HW is shipping

no?
They've been hiring compiler and linux engineers for a reason.

Gideon · Feb 12, 2024

adroc_thurston said:
no?
They've been hiring compiler and linux engineers for a reason.

Makes sense.

Keeping secrets is nice and all, but as AMD caters more and more towards the data-center - these customers really appreciate (not to say expect) day-1 support in Linux kernel, compilers etc ... with minimal proprietary solutions / hacks if possible. That's why intel tends to upstream architectures years in advance.

Considering GCC 14 will be released annually in March - April, there isn't much time to get stuff stable for the launch year.

So if the plan was for this year, it's the absolute latest hour for these patches.

yuri69 · Feb 12, 2024

The Zen 4 enablement is not indicative since it wasn't really different form the rest of Family 19h from the machine PoV. OTOH the Zen 3 situation was similar to the Family 1Ah/Zen 5 one. Yet the Zen 3 GCC patches were published early Dec 2020 - a month after the launch.

Seeing both binutils and GCC posted like 2Qs before launch is certainly an improvement. However, when you count the real "end user availability" there is a room for improvement... Post patches for a toolchain component -> wait till merged -> wait till released -> wait till a mainstream distro upgrades (-> wait till somebody actually recompiles packages using it).

Nothingness · Feb 12, 2024

Gideon said:
Considering GCC 14 will be released annually in March - April, there isn't much time to get stuff stable for the launch year.

So if the plan was for this year, it's the absolute latest hour for these patches.

I'm afraid it's already too late for inclusion in 14.0 since gcc entered staged 4 in January.

GCC Development Plan - GNU Project

gcc.gnu.org

Stage 4

During this period, the only (non-documentation) changes that may be made are changes that fix regressions. Other important bugs like wrong-code, rejects-valid or build issues may be fixed as well. All changes during this period should be done with extra care on not introducing new regressions - fixing bugs at all cost is not wanted. Note that the same constraints apply to release branches. This period lasts until stage 1 opens for the next release.

GCC Development Plan - GNU Project

gcc.gnu.org

GCC 14 Stage 4 (starts 2024-01-12)

mikegg · Feb 12, 2024

adroc_thurston said:
Not even close.

Do you have a source? Apple claims the Macbook Air is the best selling laptop in the world, which is believable considering how the models are concentrated compared to PCs.

yottabit said:
Geekbench 6 sucks and doesn't represent true MT performance.

It represents MT performance better than Cinebench for the vast majority of consumers.

yottabit said:
Edit: anyway, it was probably a bad comparison. I just happened to be shopping those Lenovo laptops recently. The 7840hs would be better to compare against the M1 Pro/Max for its MT and GPU. Notebookcheck has some comparisons here: https://www.notebookcheck.net/R7-7840HS-vs-M2-Pro-vs-M1-Max_14948_14973_13843.247596.0.html

Notebookcheck's performance numbers are quite bad. Unlike Anandtech, Notebook check bases most of its performance and perf/watt figure on Cinebench. Unfortunately, Cinebench is heavily hand optimized for x86 AVX instructions which means it simply does not represent how most applications will run.

Regardless, AMD's Zen4 is still about 2-3x worse in perf/watt compared to M1 and even farther behind compared to M3.

Nothingness · Feb 12, 2024

mikegg said:
Notebookcheck's performance numbers are quite bad. Unlike Anandtech, Notebook check bases most of its performance and perf/watt figure on Cinebench. Unfortunately, Cinebench is heavily hand optimized for x86 AVX instructions which means it simply does not represent how most applications will run.

I agree with everything you wrote except this. My understanding is that this stopped being true starting with Cinebench 24. And it's visible in CB24 vs CB23 results in the link yottabit posted: in CB23 ST 7840HS is faster than M2 Pro while in CB24 the M2 Pro is faster.

The problem is the version Notebookcheck uses for power consumption: CB R15! This is ridiculous.

Gideon · Feb 12, 2024

Nothingness said:
I'm afraid it's already too late for inclusion in 14.0 since gcc entered staged 4 in January

Oh well. Hopefully they manage better with LLVM and its byannual release

mikegg · Feb 12, 2024

Nothingness said:
I agree with everything you wrote except this. My understanding is that this stopped being true starting with Cinebench 24. And it's visible in CB24 vs CB23 results in the link yottabit posted: in CB23 ST 7840HS is faster than M2 Pro while in CB24 the M2 Pro is faster.

The problem is the version Notebookcheck uses for power consumption: CB R15! This is ridiculous.

We should take ST and MT from CB2024 with a grain of salt.

https://www.reddit.com/r/hardware/comments/pitid6

We should strive to use SPEC. Absent SPEC, we should use Geekbench.

Nothingness · Feb 12, 2024

mikegg said:
We should take ST and MT from CB2024 with a grain of salt.

https://www.reddit.com/r/hardware/comments/pitid6

You're quoting a 2 years old thread. It'd have to be revisited for CB24. I wasn't pleading for CB, it's completely stupid to assess CPU performance only a rendering test.

Anyway my take on this, as always, is that using a single benchmark or aggregated scores of subtests (as in SPEC and GB) is the best way to draw wrong general conclusions.

mikegg said:
We should strive to use SPEC. Absent SPEC, we should use Geekbench.

Agreed, with the added comment I made above: always look at scores of subtests. The execution profiles of all these are so different that geo-meaning them is misleading.

StefanR5R · Feb 12, 2024

mikegg said:
AMD's Zen4 is still about 2-3x worse in perf/watt compared to M1 and even farther behind compared to M3.

Not that I for one care a lot about Geekbench, but nonetheless, do you have links to lab tests which show that GB task energy (Joule per testrun) is 2…3× on (mobile) Zen 4 compared to M1? Thanks in advance.

eek2121 · Feb 12, 2024

soresu said:
Design/fab node was my point, as in heftier core regressing clock without a dramatically better fab node.

N4P should be significantly better than N5, but enough to absorb the shift to 6+ wide core?

Colour me doubtful for the moment, though I'll be extremely happy to be proven an apostate 🙏

Considering a good 7950X bin can hit 6ghz, even if peak clocks dropped 400mhz, it wouldn’t matter much.

uzzi38 said:
This is entirely dependant on either:

a) Base ARM offerings being competitive with Intel/AMD designs, which really isn't a given. Recently gains haven't been very impressive, they've either come at significant increases to die area spent or power consumption. (Zen and even more so particular Zen xC competes extremely well against Cortex-X/V based designs with regards to power/perf/area)

b) those companies producing their own in-house cores to compete against AMD/Intel. Apple did a great job getting to A14 gen where they're on a level playing field, and have swiftly dropped off with no improvements since. Ampere have almost vanished off the face of the earth with their own in house core debuting with AmpereOne. Only real hopes at this approach is really Qualcomm with Nuvia cores and potentially Nvidia, but we don't really know if the latter is still developing their own in-house cores given Orin uses base ARM ones instead - and they're weren't even recent cores for when Orin started shipping either.

Also, losing margins to AMD/Intel is a great tagline, but you're missing out the bit where all of those companies need to spend millions on R&D to develop their own in-house products, have to then fight for wafer and packaging capacity with significantly less volume than AMD/Intel as they're only servicing their own needs. Reality isn't quite as rosy as "we get to save money if we do it ourselves". There's a good reason why AWS hasn't totally abandoned anything but Graviton - it's because they don't have a choice.

It seems some also forget that AWS is not the only cloud provider.

My cloud provider offers a choice of Intel/ARM/AMD, and ARM/AMD cost the same, but the AMD chip is faster.

FlameTail said:
Ah, you haven't heard of ARM Blackhawk (Cortex X5), the Ultimate ARM core to kill all custom ARM cores!

RESEARCH NOTE: Arm’s “Blackhawk” CPU Is An Audacious Plan To Have The Best Smartphone CPU Core This Year

For years now, there has been what I consider a healthy, competitive tension between Arm CPU instruction set licensees and Arm’s pe-packaged and pre-validated IP licensees. (I am sure some licensees would challenge me on “healthy” given Apple’s performance.) I think it made sense that Arm would...

moorinsightsstrategy.com

A word from ARM themselves about Blackhawk

Just right when the CPU architects left!

I doubt Nvidia would go to the trouble of designing custom ARM cores. Custom designs are difficult (Look at the fate of Samsung Mongoose). If Blackhawk is good enough, there is no need for custom cores.

UNRELEASED PRODUCT HYPE is usually find and all, however….

mikegg said:
Regardless, AMD's Zen4 is still about 2-3x worse in perf/watt compared to M1 and even farther behind compared to M3.

…this is an AMD future products thread, not an ARM one, and that statement above is a complete fabrication of reality. The M3 is nowhere NEAR 200-300% more efficient than Zen 4.

FlameTail · Feb 12, 2024

eek2121 said:
The M3 is nowhere NEAR 200-300% more efficient than Zen 4.

Give evidence in the form of numbers.

DrMrLordX · Feb 12, 2024

FlameTail said:
Give evidence in the form of numbers.

He doesn't need to since he did not make the initial claim.

Hitman928 · Feb 12, 2024

FlameTail said:
Give evidence in the form of numbers.

The onus is on the person making the claim to provide evidence, in this case that would mean providing evidence that M3 is 2x - 3x more efficient than Zen 4, which was the original claim.

adroc_thurston · Feb 12, 2024

Hitman928 said:
The onus is on the person making the claim to provide evidence, in this case that would mean providing evidence that M3 is 2x - 3x more efficient than Zen 4, which was the original claim.

It is, in 1t, I think.

adroc_thurston · Feb 12, 2024

mikegg said:
Apple claims

opinion discarded

mikegg said:
It represents MT performance better than Cinebench for the vast majority of consumers.

no? the subtests are all over the place; half of them are outright server stuff.

mikegg said:
Notebookcheck's performance numbers are quite bad. Unlike Anandtech, Notebook check bases most of its performance and perf/watt figure on Cinebench. Unfortunately, Cinebench is heavily hand optimized for x86 AVX instructions which means it simply does not represent how most applications will run.

"your benchmark is cope. my isn't"
Also cinememe isn't hand-optimised SIMD.
Try something like y-cruncher.

soresu · Feb 12, 2024

FlameTail said:
I doubt Nvidia would go to the trouble of designing custom ARM cores. Custom designs are difficult (Look at the fate of Samsung Mongoose)

nVidia already had their failed custom ARM core period.

There was Denver 1, Denver 2 and Carmel.

The latter couldn't stand up to A76 - seemingly causing the death of their custom CPU core ambitions and was in the Tegra Xavier SoC that came before Orin which uses off the shelf A78AE.

soresu · Feb 12, 2024

adroc_thurston said:
They've been hiring compiler and linux engineers for a reason.

Yup - all hands to the ROCm/HIP pump.

Speaking of which did anyone hear about the new ZLUDA release that supports AMD GPU hardware now?

AMD funded the guy for a year or 2 and just recently cut funding at which point he released the code and Phoronix reported on it:

https://www.phoronix.com/review/radeon-cuda-zluda

Hitman928 · Feb 12, 2024

adroc_thurston said:
It is, in 1t, I think.

The original M1 was insanely efficient in 1t thanks to the very sensible clock speed and a stripped down SOC. The subsequent chips that added SOC features and pushed the frequency for performance actually dropped 1t efficiency significantly to better compete with what others in the marketplace were offering. So if the comparison is 1t efficiency with the M1, then sure given the proper context, but even Apple can’t match its own bar in the years since its release. The original claim was that M3 is even more efficient though which I haven’t seen evidence that this is true (if we’re still talking 1t efficiency only).

Doug S · Feb 12, 2024

I'm assuming the claims of Apple SoCs being 2-3x more efficient than Intel's would come from Andrei's deep dive into M1/M1P/M1M where he showed that. I'm not aware of anything similar for M3, and while the claims it has reduced power efficiency compared to M1 appear to be true I tend to discount that, because there are ample reasons to believe N3B is such a broken process that Apple's ability to minimize power draw may have been compromised by the yield issues. I'd rather wait for A18/M4 before concluding that Apple is on a path towards reduced efficiency rather than assuming A17/M3 was a special case.

adroc_thurston · Feb 12, 2024

Doug S said:
N3B is such a broken process

That's bull.
It's just expensive.

Tigerick · Feb 12, 2024

adroc_thurston said:
That's bull.
It's just expensive.

I am with adroc. Someone keep forgetting Apple managed to launch M3 Max with 92 billions transistors within a year of HVM is a real achievement. As for any new process, there is learning curve, clearly TSMC has managed to pass through it.

S'renne · Feb 13, 2024

Btw where would Strix Halo even fit in current laptop configurations, wouldn't it really unfortunately get paired with a dGPU system unless Lenovo makes a special exception or something

mikegg · Feb 13, 2024

eek2121 said:
…this is an AMD future products thread, not an ARM one, and that statement above is a complete fabrication of reality. The M3 is nowhere NEAR 200-300% more efficient than Zen 4.

It's probably 3-4x.

mikegg · Feb 13, 2024

adroc_thurston said:
"your benchmark is cope. my isn't"
Also cinememe isn't hand-optimised SIMD.
Try something like y-cruncher.

Cinebench uses Intel Embree Engine underneath. What do you think it's optimized for?

Read this on why Cinebench is a poor standard CPU benchmark from Ex-Anandtech Andrei F:

https://www.reddit.com/r/hardware/comments/pitid6

The reason why Cinebench is so popular is because AMD used it in its main marketing because it was the main benchmark earlier Zen chips excelled in. Cinebench does not correlate with most consumer workloads.

I smell denial.

Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Golden Member

Diamond Member

Platinum Member

Senior member

Diamond Member

Platinum Member

Diamond Member

Platinum Member

Platinum Member

Diamond Member

Elite Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Senior member

Member

Platinum Member

Platinum Member