Discussion AMD SoC Halo series GPU discussion

marees · Jan 7, 2025

Literal teaser

insertcarehere · Jan 7, 2025

adroc_thurston said:
Uh no it doesn't.

Not to rehash an old point but yeah it is.

12CU RDNA3 is basically 12-13% faster than 12CU RDNA2 in an APU, holding memory speeds and bandwidth constant. That is despite a node shrink from 6nm to 4nm and, as a result, the ability to run higher clocks. Not seeing much there that RDNA2 with a node shrink can't replicate.

adroc_thurston · Jan 7, 2025

insertcarehere said:
Not seeing much there that RDNA2 with a node shrink can't replicate.

RDNA2 would be more area. Hope that helps!

marees · Jan 7, 2025

adroc_thurston said:
RDNA2 would be more area. Hope that helps!

Also, only RDNA 3.5 has the low-power learnings from the Samsung Radeon experiment

adroc_thurston · Jan 7, 2025

marees said:
Also, only RDNA 3.5 has the low-power learnings from the Samsung Radeon experiment

Oh it's not an experiment. S.LSI would be aggressively shipping SoCs even now if they had a goddamn node they could use

gdansk · Jan 7, 2025

GTracing said:
I think some people are overestimating how much Strix Halo costs. It isn't that expensive.

I reserve the right to be pleasantly surprised but the only interesting devices so far (HP) look like they'll be very expensive.

marees · Jan 7, 2025

gdansk said:
I reserve the right to be pleasantly surprised but the only interesting devices so far (HP) look like they'll be very expensive.

You will have to sell your children to get a device with this apu

But performance will be worth it. A monster that will eat your children

insertcarehere · Jan 7, 2025

GTracing said:
I think some people are overestimating how much Strix Halo costs. It isn't that expensive. Plugging die size for the GPUIO die into a calculator gives 148 good dies. At $17000 per wafer, that's $114.86 per die.

The CCDs are around $20 each. Even after the advanced packaging, the SKU costs less than $200 to manufacture.

That's not taking into account the margins that AMD would want though. Plugging the same parameters for Strix Point gets us a bit less than $80/die. And from what we've seen on the market AMD isn't selling that for cheap. Strix Halo is a much lower volume chip that will necessitate higher margins than Strix Point for the math to work out...
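For anyone who wants to check the arithmetic being quoted here, a minimal sketch follows. It only uses figures from the posts above ($17000 per wafer, 148 good dies, roughly $20 per CCD, the "less than $200" ceiling, and roughly $80 per Strix Point die). The two-CCD count is an assumption for the top SKU, and the variable names are made up for illustration. Margins, yield detail and packaging cost are not modelled.

```python
# Back-of-the-envelope die cost using only figures quoted in the thread.
WAFER_COST_USD = 17_000     # quoted wafer price
GOOD_DIES_GPU_IO = 148      # quoted calculator result for the GPU/IO die
CCD_COST_USD = 20           # "around $20 each"
NUM_CCDS = 2                # assumption: top SKU with two CCDs
SKU_CEILING_USD = 200       # "less than $200 to manufacture"
STRIX_POINT_DIE_USD = 80    # "a bit less than $80/die"


def cost_per_die(wafer_cost, good_dies):
    """Wafer cost spread evenly over the good dies."""
    return wafer_cost / good_dies


gpu_io_cost = cost_per_die(WAFER_COST_USD, GOOD_DIES_GPU_IO)
silicon_cost = gpu_io_cost + NUM_CCDS * CCD_COST_USD

# What is left under the quoted ceiling for packaging and substrate.
packaging_headroom = SKU_CEILING_USD - silicon_cost

# Inverting the same formula: good dies per wafer implied by $80 per die.
strix_point_good_dies = WAFER_COST_USD / STRIX_POINT_DIE_USD

print(f"GPU/IO die:          ${gpu_io_cost:.2f}")
print(f"Silicon (GPU+CCDs):  ${silicon_cost:.2f}")
print(f"Packaging headroom:  ${packaging_headroom:.2f}")
print(f"Strix Point dies/wafer implied by $80: {strix_point_good_dies:.1f}")
```

So "a bit less than $80" corresponds to somewhat more than 212 good Strix Point dies per wafer at the same wafer price, and the quoted figures leave about $45 for packaging before hitting $200.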

GTracing · Jan 7, 2025

insertcarehere said:
That's not taking into account the margins that AMD would want though. Plugging the same parameters for Strix Point gets us a bit less than $80/die. And from what we've seen on the market AMD isn't selling that for cheap. Strix Halo is a much lower volume chip that will necessitate higher margins than Strix Point for the math to work out...

That $80 doesn't take into account the dGPU though. Strix Halo should perform roughly like an HX 370 + RTX 4060 mobile (stronger in CPU, but weaker in GPU). By my estimates, that combo is at most $30 less than an HX 395 to manufacture.

I agree margins are a big unknown. And there would be other costs that go into selling the chip.

marees · Jan 7, 2025

GTracing said:
That $80 doesn't take into account the dGPU though. Strix Halo should perform roughly like an HX 370 + RTX 4060 mobile (stronger in CPU, but weaker in GPU). By my estimates, that combo is at most $30 less than an HX 395 to manufacture.

I agree margins are a big unknown. And there would be other costs that go into selling the chip.

You can game on battery (sub-30 W) on Strix Halo. I think that is difficult with a discrete GPU.

Joe NYC · Jan 7, 2025

GTracing said:
That $80 doesn't take into account the dGPU though. Strix Halo should perform roughly like an HX 370 + RTX 4060 mobile (stronger in CPU, but weaker in GPU). By my estimates, that combo is at most $30 less than an HX 395 to manufacture.

I agree margins are a big unknown. And there would be other costs that go into selling the chip.

Did you count separate video memory and extra cost to assemble that as well, including cooling for the video chip?

GTracing · Jan 7, 2025

Joe NYC said:
Did you count separate video memory and extra cost to assemble that as well, including cooling for the video chip?

No, that's just a rough estimate of the cost to manufacture the dies and package them on a substrate. It doesn't include shipping, RAM, motherboard, cooling, etc.

insertcarehere · Jan 7, 2025

marees said:
You can game on battery (sub-30 W) on Strix Halo. I think that is difficult with a discrete GPU.

I can't see it being that much better than Strix Point by itself at those power levels though. Just running stuff through the IOD + CCDs is going to take considerably more power than a single die, which will matter at these power levels, not to mention how well (or not) 16c Zen 5 + 40CU RDNA 3.5 can scale down effectively.

gdansk · Jan 7, 2025

insertcarehere said:
I can't see it being that much better than Strix Point by itself at those power levels though. Just running stuff through the IOD + CCDs is going to take considerably more power than a single die, which will matter at these power levels, not to mention how well (or not) 16c Zen 5 + 40CU RDNA 3.5 can scale down effectively.

Wide & low is a recipe for GPU efficiency. C.f. Apple.
Still dubious at ~30W because of the multiple chips. Though really how much data is being sent between them if it can share memory? It depends possibly on how quickly the interconnect can power down. But the crossover point with Strix Point is probably not too bad.

insertcarehere · Jan 7, 2025

gdansk said:
Wide & low is a recipe for GPU efficiency. C.f. Apple.
Still dubious at ~30W because of the multiple chips. Though really how much data is being sent between them if it can share memory? It depends possibly on how quickly the interconnect can power down. But the crossover point with Strix Point is probably not too bad.

I'd imagine the main memory (RAM?) to be orders of magnitude higher latency and lower bandwidth than pinging stuff through the interconnect, and not advisable unless the data is not very latency-sensitive.

Given the interconnect links both CCDs and the GPU + IOD, it'd be pretty inadvisable to power it down during a gaming session.

gdansk · Jan 7, 2025

insertcarehere said:
Given the interconnect links both CCDs and the GPU + IOD, it'd be pretty inadvisable to power it down during a gaming session.

Oh, right. The IOD and GPU are the same die still.

I'd imagine the main memory (RAM?) to be orders of magnitude higher latency and lower bandwidth than pinging stuff through the interconnect, and not advisable unless the data is not very latency-sensitive.

Where are you putting the assets? You don't have to stream anything. Pass a pointer, done. And I guess the MC is on the GPU die because the GPU will generate the majority of memory traffic. The CPU, which has to go through the interconnect, needs less bandwidth, and draw lists are small, so there is ample opportunity for most of the links to sit idle.

adroc_thurston · Jan 7, 2025

gdansk said:
Still dubious at ~30W because of the multiple chips

USRs are hella cheap.

gdansk said:
Though really how much data is being sent between them if it can share memory? It depends possibly on how quickly the interconnect can power down

You can pretty much ignore that it has a D2D link. It's incredibly overbuilt if you're doing just CPU stuff.

DavidC1 · Jan 7, 2025

GTracing said:
I think some people are overestimating how much Strix Halo costs. It isn't that expensive. Plugging die size for the GPUIO die into a calculator gives 148 good dies. At $17000 per wafer, that's $114.86 per die.

It isn't the raw cost that's the problem, but volume. New motherboard required, new packaging required for the CPU on a much lower volume.

Sure, dGPUs require a separate PCB and all that, but that's on a mass-produced, already existing design that needs little modification to accommodate newer generations.

I again doubt these halo iGPUs will gain real traction.

gdansk said:
Wide & low is a recipe for GPU efficiency. C.f. Apple.

It depends a lot on the V/F curve, meaning it can vary between different uarch, silicon, and power levels.

Going from 0.6V to 0.7V is a no brainer, because you are going from say 300MHz to over 1GHz. So in this case, you can't win with a "wide and slow strategy", because you need to make up for over 3x difference in clocks.

At the very end, 1.1V might be 2.4GHz while 1.2V is 2.5GHz. Then you absolutely benefit from taking 1.1V instead and making it 5-10% wider.

The scaling goes from superlinear, to linear, to sublinear. Wide & Slow only works for sublinear scaling.
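A rough way to put numbers on the two examples above, as a sketch only. It assumes throughput scales as width × clock and dynamic power as width × V² × clock, and it ignores leakage, memory and area cost entirely. The function names are made up for illustration, and the voltage and clock points are the hypothetical ones from this post.

```python
def width_needed(f_low_mhz, f_high_mhz):
    """How much wider the low-clock design must be to match throughput,
    assuming throughput ~ width * clock."""
    return f_high_mhz / f_low_mhz


def dynamic_power_ratio(v_low, v_high):
    """Dynamic power of the wide/low-voltage design relative to the
    narrow/high-voltage one at equal throughput.
    Assumes P ~ width * V^2 * f; width * f is equal on both sides,
    so only the V^2 term is left."""
    return (v_low / v_high) ** 2


# (label, low-V point, high-V point) as (volts, MHz), taken from the post.
cases = [
    ("bottom of the curve", (0.6, 300), (0.7, 1000)),
    ("top of the curve", (1.1, 2400), (1.2, 2500)),
]

for label, (v_lo, f_lo), (v_hi, f_hi) in cases:
    w = width_needed(f_lo, f_hi)
    p = dynamic_power_ratio(v_lo, v_hi)
    print(f"{label}: {w:.2f}x width for equal throughput, "
          f"{p:.2f}x dynamic power")
```

Under these assumptions the bottom-of-the-curve case needs roughly 3.3x the width to break even on throughput, while the top-of-the-curve case needs only about 4% more width and saves around 16% dynamic power. That is the sublinear region where wide and slow pays off.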

Joe NYC · Jan 8, 2025

gdansk said:
You don't have to stream anything. Pass a pointer, done.

I was wondering if this is possible in a Windows environment. This would be ideal, if possible.

igor_kavinski · Jan 8, 2025

Joe NYC said:
I was wondering if this is possible in a Windows environment. This would be ideal, if possible.

Using zero-copy buffers on integrated GPUs | ArrayFire

arrayfire.com

Intel and AMD already have a form of this unified memory with the zero-copy buffers for their iGPUs. I suppose AMD will do some driver magic to let both the CPU and GPU access any part of the unified memory space, and the memory controller will ensure coherence.

UMA apparently was introduced back in DX11: https://learn.microsoft.com/en-us/windows/win32/direct3d11/unified-memory-architecture

igor_kavinski · Jan 8, 2025

ROG Flow Z13 (2025) | LAPTOP | ASUS UAE

Discover the ROG Flow Z13 (2025) GZ302: Experience top-tier gaming with Intel Core i9, NVIDIA GeForce RTX, and 1TB SSD for unbeatable performance.

rog.asus.com

AED 8999 / 3.68 = $2445 for 32GB

I hope 128GB doesn't exceed $3000 from the cheapest manufacturer.

igor_kavinski · Jan 8, 2025

OK, $2445 sounds like a bargain for a whole AI device, instead of just a fat stupid GPU.

Joe NYC · Jan 8, 2025

igor_kavinski said:
OK, $2445 sounds like a bargain for a whole AI device, instead of just a fat stupid GPU.

The power efficiency should also be superior by a similar factor.

Meteor Late · Jan 8, 2025

Display is not that good, though, no Mini LED or OLED, unless I looked at it wrong.

techjunkie123 · Jan 8, 2025

Asus has some Strix Halo performance data on their page.

GPU Time Spy score seems to be ~4060 / low-power 4070. Not bad. Comparable to the 32 CU 7700S too, but at lower power (ofc 6 nm vs 4 nm). GPU efficiency seems similar to 4060/4070 too, perhaps even slightly higher.

CPU R23 nT scores are also solid. Comparable at the top end to 16C Dragon Range from last gen. 60 W or so seems to be the sweet spot. Efficiency should be comparable to M4 Pro/Max/Strix Point for R23 nT specifically.
