Discussion AMD SoC Halo series GPU discussion

adroc_thurston · Feb 18, 2025

poke01 said:
If my hunch is right Strix Halo should have higher single thread than Strix Point.

Well yeah it has twice the accessible L3.

Joe NYC · Feb 18, 2025

adroc_thurston said:
Or you can just quadruple the MALL with SoIC magicwaffen.

If Medusa Halo stays with 256-bit LPDDR5, and assuming the GPU will be more capable, the only way to avoid the GPU being bandwidth-bottlenecked all the time would be a massive MALL / Infinity Cache.
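As a rough yardstick for the bandwidth concern, peak DRAM bandwidth is the bus width in bytes times the transfer rate. This is a minimal sketch that assumes LPDDR5X-8000 on the 256-bit bus (8 × 32-bit channels), which is Strix Halo's configuration. Medusa Halo's memory speed is not known, so the transfer rate is a parameter. The RX 7800 XT line (19.5 Gbps GDDR6, 256-bit) is included only for comparison.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Peak theoretical DRAM bandwidth in GB/s (decimal units).

    bus_width_bits: total memory bus width, e.g. 8 channels x 32 bits = 256.
    transfer_rate_mts: data rate in megatransfers per second, e.g. 8000 for LPDDR5X-8000.
    Ignores refresh, protocol overhead and real-world efficiency.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000  # MB/s -> GB/s


if __name__ == "__main__":
    # Assumption: LPDDR5X-8000 on 8 x 32-bit channels, as on Strix Halo.
    print(peak_bandwidth_gbs(8 * 32, 8000))  # 256.0
    # For comparison: RX 7800 XT, 19.5 Gbps GDDR6 on a 256-bit bus.
    print(peak_bandwidth_gbs(256, 19500))    # 624.0
```

The gap between those two numbers is roughly what a larger MALL would have to make up.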

If both the top and bottom dies were ~250 mm², then the MALL could be 256 MB and still leave plenty of room on the bottom die for other things.

Joe NYC · Feb 18, 2025

HP ad for the Strix Halo mini PC:

adroc_thurston · Feb 18, 2025

Joe NYC said:
then the MALL could be 256 MB

nope.

Joe NYC · Feb 18, 2025

adroc_thurston said:
nope.

From 32 MB to 256 MB would be some jump, but if AMD were to use SoIC with two equally sized dies, there would be a lot of available die area.

adroc_thurston · Feb 18, 2025

Joe NYC said:
From 32 MB to 256 MB would be some jump,

You can't hit this capacity in the bottom die.

Joe NYC · Feb 18, 2025

adroc_thurston said:
You can't hit this capacity in the bottom die.

What is the bottleneck stopping it?

There are 8 x 32-bit channels in Strix Halo. Is it a practical limit per channel, or some other limit?

leoneazzurro · Feb 18, 2025

More than a bottleneck, it is probably a case of heavily diminishing returns for a noticeable increase in cost. I remember a graph of cache hit rate against cache size, which gave the bandwidth multiplier factor; while the hit rate always increased with size, the gains dropped off rapidly beyond a certain amount of cache. AMD has never used 256 MB, not even on N21 (128 MB). N31 used 96 MB for a 384-bit bus, and both of these GPUs are classes above the Halo GPU. The 7800 XT uses 64 MB. I don't see Halo using more than 32 MB for the reasons above.
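To put those capacities side by side, one rough normalization is cache capacity per 32-bit memory channel. The capacities come from the post above; the 256-bit bus widths for Navi 21 and Navi 32 (7800 XT) are from their public specs. This is only a ratio for comparison, not a model of hit rate.

```python
# Infinity Cache / MALL capacity (MB) and memory bus width (bits).
PARTS = {
    "Navi 21": (128, 256),
    "Navi 31": (96, 384),
    "RX 7800 XT (Navi 32)": (64, 256),
    "Strix Halo": (32, 256),
}


def mb_per_32bit_channel(cache_mb: int, bus_bits: int) -> float:
    """Cache capacity divided by the number of 32-bit channels on the bus."""
    return cache_mb / (bus_bits // 32)


if __name__ == "__main__":
    for name, (cache_mb, bus_bits) in PARTS.items():
        print(f"{name}: {mb_per_32bit_channel(cache_mb, bus_bits):.0f} MB per 32-bit channel")
```

By this measure a 256 MB MALL on a 256-bit bus would be 32 MB per channel, double Navi 21's ratio.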

adroc_thurston · Feb 18, 2025

Joe NYC said:
What is the bottleneck stopping it?

The bottom die needs a bajillion TSVs for power and ground to deliver the oomph for the compute on top.
Please consult the MI300 HC deck for further information.

Joe NYC · Feb 18, 2025

leoneazzurro said:
More than a bottleneck, it is probably a case of heavily diminishing returns for a noticeable increase in cost. I remember a graph of cache hit rate against cache size, which gave the bandwidth multiplier factor; while the hit rate always increased with size, the gains dropped off rapidly beyond a certain amount of cache. AMD has never used 256 MB, not even on N21 (128 MB). N31 used 96 MB for a 384-bit bus, and both of these GPUs are classes above the Halo GPU. The 7800 XT uses 64 MB. I don't see Halo using more than 32 MB for the reasons above.

In that chart, the gains in hit rate flattened out at a point that depended on resolution, and the hit rate itself was mostly only 50-60% at 1440p and 40-50% at 4K.

It would be interesting to see the effect on actual FPS in a system with bandwidth balanced to its GPU performance, and then in one that is severely bandwidth-starved.
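A first-order way to read hit rates like these: if only cache misses go to DRAM, the request traffic the GPU can sustain before saturating DRAM scales by 1 / (1 - hit rate). This is a simplified model assumed here for illustration, not AMD's published formula; it ignores the cache's own bandwidth ceiling, write-backs and latency.

```python
def bandwidth_amplification(hit_rate: float) -> float:
    """First-order model (an assumption for illustration): only cache misses
    reach DRAM, so total request bandwidth = DRAM bandwidth / (1 - hit_rate).
    Ignores the cache's own bandwidth limit, write-backs and latency effects."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)


if __name__ == "__main__":
    # Hit-rate ranges mentioned above: ~40-50% at 4K, ~50-60% at 1440p.
    for hit_rate in (0.40, 0.50, 0.60):
        print(f"hit rate {hit_rate:.0%}: {bandwidth_amplification(hit_rate):.2f}x")
```

Under this model, going from a 50% to a 60% hit rate moves the multiplier from 2.0x to 2.5x.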

leoneazzurro · Feb 18, 2025

I think AMD modified the caches in N3x and later chips after that graph, because with RDNA3 the cache size went down while efficiency stayed at the same level. IIRC I never saw a similar graph for the RDNA3 architecture. Also, Strix Halo is certainly not made for gaming at 4K.

coercitiv · Feb 18, 2025

leoneazzurro said:
I think AMD modified the caches in N3x and later chips after that graph, because with RDNA3 the cache size went down while efficiency stayed at the same level. IIRC I never saw a similar graph for the RDNA3 architecture. Also, Strix Halo is certainly not made for gaming at 4K.

RDNA3 went from 256 bit max to 384 bit max, memory clocks went up as well.

CouncilorIrissa · Feb 18, 2025

poke01 said:
Also we can see how memory bandwidth affects rendering. If my hunch is right Strix Halo should have higher single thread than Strix Point.

Not by much. Read bandwidth remains the same; only write bandwidth on the CCD-IOD link has changed, from 16B/clk to 32B/clk.
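Per-clock link widths translate to bandwidth as bytes per clock × fabric clock. A minimal sketch: the 2000 MHz fabric clock below is a hypothetical value for illustration, not a Strix Halo spec. At any fixed clock, doubling the write width doubles peak write bandwidth, while read bandwidth stays where it was, which fits the expectation of only a small single-thread gain.

```python
def link_bandwidth_gbs(bytes_per_clock: int, fclk_mhz: int) -> float:
    """Peak one-direction link bandwidth in GB/s: bytes per clock x clock (MHz) / 1000."""
    return bytes_per_clock * fclk_mhz / 1000


if __name__ == "__main__":
    FCLK_MHZ = 2000  # hypothetical fabric clock, for illustration only
    write_before = link_bandwidth_gbs(16, FCLK_MHZ)  # 16 B/clk write path
    write_after = link_bandwidth_gbs(32, FCLK_MHZ)   # 32 B/clk write path
    print(f"write: {write_before:.0f} -> {write_after:.0f} GB/s per CCD link")
```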

leoneazzurro · Feb 18, 2025

coercitiv said:
RDNA3 went from 256 bit max to 384 bit max, memory clocks went up as well.

I was referring more to the 7800 XT, which uses 64 MB on a 256-bit bus, even though its memory is faster in this case.

SolidQ · Feb 18, 2025

Here is

igor_kavinski · Feb 18, 2025

Hitman928 · Feb 18, 2025

@igor_kavinski I'd like to see long idle tests to find out whether the rumored LPE cores are a thing (I haven't seen any official material on them, so it seems the rumor isn't true). The heavy-load test is also very interesting: the M4 is supposed to have a significantly lower power limit and a bigger battery, yet it still shows less battery life. The lower scores for the AMD systems on battery are probably because they reduce the power limit to get better battery life under load, whereas the MacBook doesn't lower its power limit (or not nearly as much), so its performance doesn't drop significantly on battery, but its battery life takes a much bigger hit under heavy load.

branch_suggestion · Feb 18, 2025

Hitman928 said:
@igor_kavinski I'd like to see long idle tests to find out whether the rumored LPE cores are a thing (I haven't seen any official material on them, so it seems the rumor isn't true). The heavy-load test is also very interesting: the M4 is supposed to have a significantly lower power limit and a bigger battery, yet it still shows less battery life. The lower scores for the AMD systems on battery are probably because they reduce the power limit to get better battery life under load, whereas the MacBook doesn't lower its power limit (or not nearly as much), so its performance doesn't drop significantly on battery, but its battery life takes a much bigger hit under heavy load.

Well, the SoC tile is N4P, so you can probably figure out why they aren't there.
Either plans changed or the N3E info was completely bogus.
It does just fine vs the M4 Pro when you consider the node disadvantage and the less-than-ideal chassis.

branch_suggestion · Feb 18, 2025

https://imgur.com/sZkDGvS

I love me some 2.5D interposers.

Hitman928 · Feb 18, 2025

branch_suggestion said:
Well, the SoC tile is N4P, so you can probably figure out why they aren't there.
Either plans changed or the N3E info was completely bogus.
It does just fine vs the M4 Pro when you consider the node disadvantage and the less-than-ideal chassis.

Right, seems like the leaks were wrong or, at the least, very outdated info.

fastandfurious6 · Feb 18, 2025

> The CPU can consume up to 86 Watts and then levels off at 70 Watts. The new Max+ 395 is comparable to the Ryzen 9 7945HX3D (Zen 4) running at 115 Watts.

> multi-core performance is very stable under sustained workloads

On a freaking tablet. Huge.

Then this:

+4200 points over the 7945HX3D, and it's already gutted at 70 W max (86 W turbo)!
