Sure, I didn't mean to say it's the PSU's fault.
It's the Nimo Mini PC Pro. It has a GameMax 12VO Flex ATX unit that appears to be a modified version of their Gold-rated 350W Flex ATX unit. Back-of-the-napkin math comparing it to the mini PCs using a power brick, which has a datasheet-provided efficiency figure, suggests it is 90%+ efficient, like GameMax claims.
It doesn't seem to me like PSU efficiency is a major factor here, especially considering my desktop:
1) uses a 1000W unit, which at 190 watts is not meaningfully more efficient than this PSU appears to be
2) has the GPU idling at 35W because of high refresh rate displays
3) has 8 fans, LEDs, and a 4x25GbE SFP28 NIC adding power overhead
and it still draws the same power at the wall.
My desktop idles at 100W at the wall, so the relatively smaller idle-to-load power draw delta is pretty definitive IMO.
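For anyone who wants to redo the napkin math, here is a minimal sketch. The 190W wall draw, the 100W desktop idle, and the roughly 90% efficiency are the figures quoted in this thread; the function names are made up for illustration, and treating efficiency as one flat number is an assumption (real PSU efficiency varies with load).

```python
def dc_power(wall_watts: float, efficiency: float) -> float:
    """Estimate the power delivered past the PSU from a wall reading."""
    return wall_watts * efficiency


def idle_to_load_delta(load_watts: float, idle_watts: float) -> float:
    """Extra wall power drawn when going from idle to full load."""
    return load_watts - idle_watts


# Wall readings quoted in the thread.
WALL_LOAD_W = 190.0     # wall draw under load, reported as the same for both systems
DESKTOP_IDLE_W = 100.0  # desktop idle, including GPU, fans, LEDs, NIC
PSU_EFFICIENCY = 0.90   # assumption: flat efficiency taken from the "90%+" estimate

if __name__ == "__main__":
    mini_pc_dc = dc_power(WALL_LOAD_W, PSU_EFFICIENCY)
    desktop_delta = idle_to_load_delta(WALL_LOAD_W, DESKTOP_IDLE_W)
    print(f"Mini PC DC-side estimate: {mini_pc_dc:.0f} W")
    print(f"Desktop idle-to-load delta: {desktop_delta:.0f} W")
```

The point of the second number is that a large share of the desktop's wall draw is idle overhead, so the power actually attributable to the CPU load is well below 190W.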
What's going on with Handbrake here? 32 threads equaling 16 threads while using twice the power? Getting smoked by 24 threads at the same power? The only thing that makes sense is that something is broken for Halo in this workload.
It's interesting to note that Ars also had similar results; in fact, I would say the 9700X was doing much better relative to its power consumption than the Max+ 395 in Handbrake.
Note: All AMD readings include the package power.
Attachments: 128461, 128462, 128463
However, the 9700X did poorly against the Max+ 395 in Cinebench/Blender.
The broken thing is probably latency. It's what destroyed Arrow Lake's potential. As a version 1.0 product, Strix Halo is decent. Just not the realization of all our dreams. Medusa Halo may end up fixing a lot of the issues.
The only thing that makes sense is that something is broken for Halo in this workload.
In my testing, Strix Halo has some rigid scheduling rules that make every effort to group threads onto CCD0, so software that schedules, say, 1 thread per physical core and expects that to be the outcome actually gets all 16 threads shoved onto CCD0, leaving the other CCD idle.
What's going on with Handbrake here? 32 threads equaling 16 threads while using twice the power? Getting smoked by 24 threads at the same power? The only thing that makes sense is that something is broken for Halo in this workload.
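The gap between what such software expects and what reportedly happens can be sketched as below. This is only an illustration: the logical CPU numbering (CPUs 0-15 on CCD0, 16-31 on CCD1, SMT siblings adjacent) is an assumption, not something confirmed for Strix Halo, and all function names are hypothetical.

```python
from typing import List

# Assumed topology: 2 CCDs x 8 cores x 2 SMT threads = 32 logical CPUs.
CORES_PER_CCD = 8
SMT_WAYS = 2
NUM_CCDS = 2


def one_thread_per_core(n_threads: int) -> List[int]:
    """Logical CPUs an app would pick for 1 thread per physical core:
    the first SMT sibling of each core, CCD0's cores first, then CCD1's."""
    total_cores = CORES_PER_CCD * NUM_CCDS
    if not 0 <= n_threads <= total_cores:
        raise ValueError("n_threads must be between 0 and the core count")
    return [core * SMT_WAYS for core in range(n_threads)]


def packed_onto_ccd0(n_threads: int) -> List[int]:
    """Placement resembling the behavior described above: fill every
    logical CPU of CCD0 (both SMT siblings) before touching CCD1."""
    total_cpus = CORES_PER_CCD * NUM_CCDS * SMT_WAYS
    if not 0 <= n_threads <= total_cpus:
        raise ValueError("n_threads must be between 0 and the logical CPU count")
    return list(range(n_threads))


def ccd_of(cpu: int) -> int:
    """CCD index of a logical CPU under the assumed numbering."""
    return cpu // (CORES_PER_CCD * SMT_WAYS)


if __name__ == "__main__":
    for name, cpus in (("expected", one_thread_per_core(16)),
                       ("observed", packed_onto_ccd0(16))):
        print(name, cpus, "CCDs used:", sorted({ccd_of(c) for c in cpus}))
```

On Linux, a list like the first one could be passed to `os.sched_setaffinity(0, cpus)` to pin the current process by hand, provided the real CPU numbering has been checked first.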
Surely it's up to AMD and Framework to test this and fix it. They are leaving perf on the table.
This behavior might make sense in a laptop with extremely constrained cooling, power delivery, and a battery, but it doesn't once those factors are removed, as is the case with these mini PCs.
It's not my job to make sure AMD's product functions competently and as expected. They can read the thread like everyone else.
How about sharing your experience with AMD support? Might result in an AGESA fix.
One potential reason: If the cores stall a lot on memory accesses, they don't pull a lot of power and the firmware may therefore clock them higher.
I just wonder what is the reason the performance is worse than expected given apparently higher clocks.
Temps were sky high, it was the hottest workload by far. Wall draw never showed a decrease in power consumption, it was pegged at max power nonstop and drawing 190W the entire time.
One potential reason: If the cores stall a lot on memory accesses, they don't pull a lot of power and the firmware may therefore clock them higher.
I have seen this with other workloads (vector arithmetic; if the data fits into L3$, it is energy-intensive, produces quite a bit of heat and reduces the core clocks while power-limited; but if the data has to be read/written a lot from/to main memory, the job takes of course longer, the CPU produces less heat yet the cores clock higher). I don't know if this corresponds with what @Hail The Brain Slug saw.
I discarded that option because, as Hail reinforces above, the power draw was still significant. Another option is that it was unstable and clock-stretching. Either way, something was off.
One potential reason: If the cores stall a lot on memory accesses, they don't pull a lot of power and the firmware may therefore clock them higher.
I mean, it's losing to a 9700X in the Ars review in the Handbrake test. Something is wrong, and AMD so far hasn't commented.
I mean, I'm not surprised a chip with V-cache in eco mode can beat out Strix Halo efficiency-wise in certain workloads. Assuming more of the hot loop code can fit into the L3, it would make sense. IMO it would be more “fair” to compare it to 9950X efficiency before concluding there is something “wrong” with Halo.
I retested the workloads with each CCD disabled to verify V$ gains. It was only single digit %, so nothing significant enough to alter the outcome of my other testing.
I mean, I'm not surprised a chip with V-cache in eco mode can beat out Strix Halo efficiency-wise in certain workloads. Assuming more of the hot loop code can fit into the L3, it would make sense. IMO it would be more “fair” to compare it to 9950X efficiency before concluding there is something “wrong” with Halo.
It's reset and boxed up pending RMA approval.
@Hail The Brain Slug, do you still have your Halo?
One potential reason: If the cores stall a lot on memory accesses, they don't pull a lot of power and the firmware may therefore clock them higher.
I should have written: If the cores stall a lot on memory accesses, they don't pull a lot of power at a given clock speed, e.g. 3.8 GHz (at which the 9950X3D-eco happened to pull the same system power), and the firmware may therefore clock them higher until the power limit is reached again, or any other limit is reached (temperature, amperage, …, if not f_max), e.g. 4.6 GHz (was that a time average? – it is already 90% of f_max), which just means burning power for burning power's sake while the execution units aren't actually doing much.
Temps were sky high, it was the hottest workload by far. Wall draw never showed a decrease in power consumption, it was pegged at max power nonstop and drawing 190W the entire time.
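The mechanism proposed here can be illustrated with a toy model. Everything in it is an assumption for illustration only: package power is modeled as `activity * k * f**3`, where `activity` stands for the fraction of time the execution units do real work; `k` and the power limit are arbitrary numbers; and the default `f_max` of 5.1 GHz is simply back-derived from "4.6 GHz is already 90% of f_max". None of this is measured data.

```python
def boost_clock_ghz(activity: float, power_limit: float,
                    k: float = 1.0, f_max_ghz: float = 5.1) -> float:
    """Highest clock that keeps activity * k * f^3 within the power limit,
    capped at f_max. Toy model, not a description of real firmware."""
    if activity <= 0 or k <= 0:
        # No dynamic power in this model, so only f_max limits the clock.
        return f_max_ghz
    f = (power_limit / (activity * k)) ** (1.0 / 3.0)
    return min(f, f_max_ghz)


if __name__ == "__main__":
    limit = 55.0  # arbitrary units, chosen so a fully busy core lands near 3.8 GHz
    for label, activity in (("compute-bound", 1.0), ("memory-stalled", 0.56)):
        print(f"{label}: {boost_clock_ghz(activity, limit):.2f} GHz at the same power limit")
```

In this model, a workload that keeps the execution units busy about 56% of the time boosts to roughly 4.6 GHz under the same power limit that holds a fully busy workload near 3.8 GHz, so a higher clock with pegged power and less work done are not contradictory.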
Does anybody know whether Strix Point computers are set up in the same way, that is, keeping the dense CCX idle as long as all runnable software threads fit onto the logical CPUs of the classic CCX¹, IOW preferring SMT usage over dual-CCX spread usage?
Regardless of the power plan, the second CCD remains parked by default
It's extremely niche software for very specific use cases for eccentric enthusiasts. Far from essential. If you have to rely on third-party software for your hardware to work properly, your hardware is not up to par...
Process Lasso is kinda essential software for Windows, big benefits
Any updates here on the supposedly upcoming CPU with dual 3D cache CCDs?