Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

adroc_thurston · Aug 8, 2025

jpiniero said:
The point of Strix Halo is AI

Nope.

MS_AT · Aug 8, 2025

Hail The Brain Slug said:
It's the Nimo Mini PC Pro. It has a GameMax 12VO Flex ATX unit that appears to be a modified version of their Flex ATX gold rated 350W, and back-of-the-napkin math comparing it to the mini pc's using a power brick, which has a datasheet-provided efficiency figure, it looks to be 90%+ efficient, like GameMax claims.

It doesn't seem to me like the PSU efficiency is a major factor here, especially considering my desktop is
1) using a 1000W unit so 190 watts is not meaningfully more efficient than this PSU appears to be
2) has the GPU idling at 35W because high refresh rate displays
3) has 8 fans, leds, a 4x25GbE SFP28 NIC chugging additional power overhead
and it still draws the same power at the wall.

my desktop idles at 100W at the wall, so the relatively smaller idle to load power draw delta is pretty definitive IMO.

Sure, I didn't mean to say it's PSU fault

Anyway, after some digging, it seems notebookcheck measured 154 ns latency: https://www.notebookcheck.net/HP-Z2...en-AI-Max-and-Radeon-RX-8060S.1069652.0.html# That's much worse than I thought; I was expecting something like 100-110 ns, like Strix Point. I guess it's double the latency of your desktop.

Josh128 · Aug 8, 2025

poke01 said:
It’s interesting to note that Ars also had similar results, in fact I would say the 9700X was doing much better relative to the power consumption when compared to the Max+ 395 in Handbrake.
Note: All AMD readings include the package power.

View attachment 128461 View attachment 128462 View attachment 128463

However the 9700X did poorly against the Max+ 395 in Cinebench/Blender.

What's going on with Handbrake here? 32 threads equaling 16 threads while using twice the power? Getting smoked by 24 threads at the same power? The only thing that makes sense is that something is broken for Halo in this workload.

igor_kavinski · Aug 8, 2025

Josh128 said:
The only thing that makes sense is that something is broken for Halo in this workload.

The broken thing is probably latency. It's what destroyed Arrow Lake's potential. As a version 1.0 product, Strix Halo is decent. Just not the realization of all our dreams. Medusa Halo may end up fixing a lot of the issues.

Hail The Brain Slug · Aug 8, 2025

Josh128 said:
What's going on with Handbrake here? 32 threads equaling 16 threads while using twice the power? Getting smoked by 24 threads at the same power? The only thing that makes sense is that something is broken for Halo in this workload.

In my testing, Strix Halo has some rigid scheduling rules that make every effort to group threads onto CCD0. Something that schedules, say, 1 thread per physical core and expects to get exactly that actually has all 16 threads shoved onto CCD0, leaving the other CCD idle.

As far as I can figure out, there's no way around this for Strix Halo. The 9950X3D exhibits similar behavior sometimes, but changing the scheduling directive in the BIOS fixes it.

So workloads that don't saturate all 32 threads but expect to get every physical core, well, don't.

My Unreal results were significantly worse for Strix Halo initially due to this. I used config overrides to spam enough threads that the scheduler had no choice but to saturate both CCDs.

AFAIK Handbrake is similar: it doesn't saturate every logical processor, so if the scheduler bunches as many threads as it can onto CCD0 and only then spills over onto CCD1, it will perform extremely poorly with no real good remedy.
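To illustrate the bunching described above, here is a minimal toy model, not a reproduction of the real Windows or firmware scheduler. It assumes the topology mentioned in the thread (2 CCDs, 8 cores each, 2 SMT threads per core); the function name and the "pack"/"spread" policy labels are made up for illustration.

```python
def place_threads(n_threads, ccds=2, cores_per_ccd=8, smt=2, policy="pack"):
    """Return the set of (ccd, core) pairs that receive at least one thread.

    policy="pack":   fill every logical CPU of CCD0 before touching CCD1
                     (the behaviour reported in this thread).
    policy="spread": give each physical core one thread before using SMT siblings.
    Both policies are simplifications invented for this sketch.
    """
    # Enumerate logical CPUs as (ccd, core, smt_slot) tuples.
    logical = [(d, c, s)
               for d in range(ccds)
               for c in range(cores_per_ccd)
               for s in range(smt)]
    if policy == "pack":
        # CCD first, then core, then SMT slot.
        order = sorted(logical)
    elif policy == "spread":
        # SMT slot first, so all physical cores are used before any sibling.
        order = sorted(logical, key=lambda t: (t[2], t[0], t[1]))
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return {(d, c) for d, c, _ in order[:n_threads]}


if __name__ == "__main__":
    for policy in ("pack", "spread"):
        cores = place_threads(16, policy=policy)
        print(f"{policy}: 16 threads land on {len(cores)} physical cores")
```

Under these assumptions, a 16-thread job gets 8 physical cores when packed and 16 when spread, which is the gap a "1 thread per physical core" workload would run into.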

This behavior might make sense in a laptop with extremely constrained cooling, power delivery, and a battery, but it doesn't once those factors are removed, as is the case with these mini PCs.

NOTE: all the perf figures I shared earlier were taken after I applied workarounds to get full CPU saturation, so this is not the reason why it's inexplicably slow and inefficient compared to GNR eco mode in my workloads.

igor_kavinski · Aug 8, 2025

How about sharing your experience with AMD support? Might result in an AGESA fix.

poke01 · Aug 8, 2025

Hail The Brain Slug said:
This behavior might make sense in a laptop with extremely constrained cooling, power delivery, and a battery, but it doesn't once those factors are removed, as is the case with these mini PCs.

Surely it's up to AMD and Framework to test this and fix it. They are leaving perf on the table.

Hail The Brain Slug · Aug 8, 2025

igor_kavinski said:
How about sharing your experience with AMD support? Might result in an AGESA fix.

It's not my job to make sure AMD's product functions competently and as expected. They can read the thread like everyone else.

I've already abandoned Strix Halo as a failure for my needs and have moved on.

StefanR5R · Aug 9, 2025

MS_AT said:
I just wonder what is the reason the performance is worse than expected given apparently higher clocks.

One potential reason: If the cores stall a lot on memory accesses, they don't pull a lot of power and the firmware may therefore clock them higher.

I have seen this with other workloads (vector arithmetic; if the data fits into L3$, it is energy-intensive, produces quite a bit of heat and reduces the core clocks while power-limited; but if the data has to be read/written a lot from/to main memory, the job takes of course longer, the CPU produces less heat yet the cores clock higher). I don't know if this corresponds with what @Hail The Brain Slug saw.

Hail The Brain Slug · Aug 9, 2025

StefanR5R said:
One potential reason: If the cores stall a lot on memory accesses, they don't pull a lot of power and the firmware may therefore clock them higher.

I have seen this with other workloads (vector arithmetic; if the data fits into L3$, it is energy-intensive, produces quite a bit of heat and reduces the core clocks while power-limited; but if the data has to be read/written a lot from/to main memory, the job takes of course longer, the CPU produces less heat yet the cores clock higher). I don't know if this corresponds with what @Hail The Brain Slug saw.

Temps were sky high, it was the hottest workload by far. Wall draw never showed a decrease in power consumption, it was pegged at max power nonstop and drawing 190W the entire time.

MS_AT · Aug 9, 2025

StefanR5R said:
One potential reason: If the cores stall a lot on memory accesses, they don't pull a lot of power and the firmware may therefore clock them higher.

I discarded that option because, as Hail reinforces above, the power draw was still significant. Another option is that it was unstable and clock-stretching. Either way, something was off.

fastandfurious6 · Aug 9, 2025

there's something about if a chip is overly-optimized for low power then max perf suffers right?

it's super insane they managed to slap full 9950X+midrange gpu into a handheld lmao

yottabit · Aug 9, 2025

I mean, I’m not surprised a chip with V-cache in eco mode can beat out Strix Halo efficiency-wise in certain workloads. Assuming more of the hot loop code can fit into the L3 it would make sense. IMO it would be more “fair” to compare it to 9950x efficiency before concluding there is something “wrong” with Halo

poke01 · Aug 9, 2025

yottabit said:
I mean, I’m not surprised a chip with V-cache in eco mode can beat out Strix Halo efficiency-wise in certain workloads. Assuming more of the hot loop code can fit into the L3 it would make sense. IMO it would be more “fair” to compare it to 9950x efficiency before concluding there is something “wrong” with Halo

I mean, it’s losing to a 9700X in the Ars review in the Handbrake test. Something is wrong, and AMD so far hasn’t commented.

Hail The Brain Slug · Aug 9, 2025

yottabit said:
I mean, I’m not surprised a chip with V-cache in eco mode can beat out Strix Halo efficiency-wise in certain workloads. Assuming more of the hot loop code can fit into the L3 it would make sense. IMO it would be more “fair” to compare it to 9950x efficiency before concluding there is something “wrong” with Halo

I retested the workloads with each CCD disabled to verify V$ gains. It was only single digit %, so nothing significant to alter the outcome of my other testing.

poke01 · Aug 9, 2025

@Hail The Brain Slug , do you still have your Halo?

Hail The Brain Slug · Aug 9, 2025

poke01 said:
@Hail The Brain Slug , do you still have your Halo?

It's reset and boxed up pending RMA approval.

I wish I could keep it, if only to keep testing stuff, but I needed a workstation and I need the money back from it to go toward that. $1700 is a bit high to spend on a toy.

poke01 · Aug 9, 2025

While going down the HPC ARM vs x86 rabbit hole, I found this interesting note regarding Strix Halo on Windows.

My first impression on HP Zbook Ultra G1a (Ryzen AI Max+ 395, Strix Halo +128 GB)

I got my HP Zbook G1a (395, 128 GB version) a month ago for my research, manipulating big matrices (need large memory capacity) and running FDTD simulations (require large memory bandwidth). For those two primary workloads, I think Strix Halo fits quite well among current laptops in the market...

forum.level1techs.com

“
Things that make performance-squeezing-out tricky on Windows

Regardless of the power plan, the second CCD remains parked by default—even when running on AC power—and it doesn’t wake up unless all 16 threads (8 cores + 8 SMT) are fully utilized. As a result, if you run a 16-threaded program, the second CCD won’t be activated. I’m not sure whether this behavior is controlled by AMD or HP, but I hope this policy will be changed later.

So, to make use of 16 threads across the two CCDs while running the COMSOL benchmark, I had to use Process Lasso to manually wake up the second CCD.

It would be best if HP provided an option to disable SMT in the BIOS, but I could not find it. Considering this laptop is intended for workstation use, I think this is more or less disappointing.

“
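For anyone wanting to script the "one thread per physical core across both CCDs" placement instead of clicking through Process Lasso, here is a rough sketch. It assumes logical CPUs are numbered CCD by CCD with SMT siblings adjacent (0,1 = core 0; 2,3 = core 1; and so on), which should be checked on the actual machine. The helper names are hypothetical, and the thread does not confirm whether an affinity mask alone is enough to wake a parked CCD.

```python
import os


def one_thread_per_core(ccds=2, cores_per_ccd=8, smt=2):
    """Logical CPU indices of the first SMT sibling of every physical core.

    ASSUMPTION: logical CPUs are numbered CCD by CCD with SMT siblings
    adjacent. Other numbering schemes exist; verify before relying on this.
    """
    return [(d * cores_per_ccd + c) * smt
            for d in range(ccds)
            for c in range(cores_per_ccd)]


def pin_current_process(cpus):
    """Apply the affinity mask where the standard library supports it.

    os.sched_setaffinity is only available on some Unix platforms (e.g. Linux).
    On Windows this returns False; a tool such as Process Lasso or a
    third-party library would be needed there.
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)
        return True
    return False


if __name__ == "__main__":
    # Only prints the mask; call pin_current_process(...) on the target machine.
    print(one_thread_per_core())
```

With the default 2 x 8 x 2 layout this selects the even-numbered logical CPUs 0 through 30, i.e. 16 logical CPUs, one per physical core.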

This is down to AMD to fix since it’s happening on ALL Halo machines regardless of laptop or mini PC. AMD cannot advertise this as a workstation class machine when it’s running Windows…

fastandfurious6 · Aug 10, 2025

process lasso is kinda essential software for windows, big benefits

StefanR5R · Aug 10, 2025

StefanR5R said:
One potential reason: If the cores stall a lot on memory accesses, they don't pull a lot of power and the firmware may therefore clock them higher.

Hail The Brain Slug said:
Temps were sky high, it was the hottest workload by far. Wall draw never showed a decrease in power consumption, it was pegged at max power nonstop and drawing 190W the entire time.

I should have written: If the cores stall a lot on memory accesses, they don't pull a lot of power at a given clock speed, e.g. 3.8 GHz (at which the 9950X3D-eco happened to pull the same system power). The firmware may therefore clock them higher until the power limit, or some other limit (temperature, current, …, if not f_max), is reached again, e.g. at 4.6 GHz (was that a time average? It is already 90% of f_max). That just means burning power for burning power's sake while the execution units aren't actually doing much.
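A toy model of that argument, purely as a sketch: it assumes power scales as activity × f³ (voltage rising roughly linearly with clock), which is a common rule of thumb and not something measured in this thread. The 3.8 GHz reference is taken from the post above, and f_max ≈ 5.1 GHz is inferred from "4.6 GHz is already 90% of f_max"; the function name and the activity factor are invented for illustration.

```python
def sustained_clock(activity, f_ref=3.8, f_max=5.1):
    """Clock (GHz) at which a power-limited core settles in this toy model.

    ASSUMPTIONS (not from the thread):
      - power ~ activity * f**3, where activity in (0, 1] is the fraction of
        time the execution units are busy rather than stalled on memory;
      - f_ref is the clock at which a fully busy core (activity = 1.0)
        exactly hits the power limit.
    """
    if not 0.0 < activity <= 1.0:
        raise ValueError("activity must be in (0, 1]")
    # Same power budget: activity * f**3 == 1.0 * f_ref**3
    f = f_ref * (1.0 / activity) ** (1.0 / 3.0)
    return min(f, f_max)


if __name__ == "__main__":
    for activity in (1.0, 0.8, 0.56, 0.3):
        print(f"activity {activity:.2f} -> {sustained_clock(activity):.2f} GHz")
```

In this model a core that is busy only a bit over half the time ends up near 4.6 GHz at the same power limit, so high clocks plus pegged power draw do not by themselves rule out memory stalls.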

StefanR5R · Aug 10, 2025

poke01 said:
Regardless of the power plan, the second CCD remains parked by default

Does anybody know whether Strix Point computers are set up in the same way — that is, keep the dense CCX idle as long as all runnable software threads fit onto the logical CPUs of the classic CCX¹, IOW prefer SMT usage over dual-CCX spread usage?

________
¹) That would be 8 threads in case of Strix Point, except Ryzen AI 7 PRO 360 in which the classic CCX only has 3 cores/ 6 threads.

Josh128 · Aug 10, 2025

fastandfurious6 said:
process lasso is kinda essential software for windows, big benefits

It's extremely niche software for very specific use cases for eccentric enthusiasts. Far from essential. If you have to rely on third-party software for your hardware to work properly, your hardware is not up to par...

Shmee · Aug 11, 2025

Any updates here on the supposedly upcoming CPU with dual 3D cache CCDs?

Hail The Brain Slug · Aug 12, 2025

Shmee said:
Any updates here on the supposedly upcoming CPU with dual 3D cache CCDs?

AMD Ryzen CPU With Dual X3D Chiplets Is Reportedly Fake And Doesn't Exist

As per a new report, the news about AMD preparing another Ryzen 9000X3D chip with Dual X3D CCDs is reportedly untrue.

wccftech.com

Shmee · Aug 12, 2025

Ah poo, oh well.
