Discussion AMD SoC Halo series GPU discussion

Page 31 (AnandTech Forums)

Glo.

Diamond Member
Apr 25, 2015
I mean it.
Look, I know that AMD is focused on increasing perf/CU with next-gen GPU archs. That's great.

That does not exclude the possibility for larger APU than Halo.

As I have said: IF, and ONLY IF, Strix Halo and Medusa Halo have high enough demand, and IF there is demand from users for an even larger design, that's what AMD will do.
 

MS_AT

Senior member
Jul 15, 2024
Should've used RDNA4 for Strix Halo, not fast enough for Blender. The 8060S is just 10% faster than the base M4.
Isn't that more of a software problem? Apparently the 8060S should be comparable to the RTX 4060, yet the 4060 benches much better in Blender. I doubt RDNA4 would have helped.
 

poke01

Diamond Member
Mar 8, 2022
Isn't that more of a software problem? Apparently the 8060S should be comparable to the RTX 4060, yet the 4060 benches much better in Blender. I doubt RDNA4 would have helped.
I don’t believe it’s a software problem; hardware RT acceleration using HIP is supported in Blender.

I could be wrong here, so we’ll just wait for the 9070 XT results.
 

Glo.

Diamond Member
Apr 25, 2015
I think AMD would be happy to if the market existed.
But demand for an Mx Max competitor is far too niche. Not sure about Apple's SKU balance, but I can imagine Max/Ultra are <10% of Mac volume.
Mac Studio and Mac Pro ALONE are 5% of all Mac volume. The 16" MacBook Pro with M4 Max is roughly 15% of all MBP sales.

Low volume, high profit margin part, as it should be.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
Don't know about you guys, but I am pretty sad Strix Halo didn't end up with RDNA4 instead.

7800XT: 60CU, 2.43GHz, 37.3 TFLOPs; 256-bit 19.5Gbps, 624GB/s, 64MB IC, 263W
9070XT: 64CU, 2.97GHz, 48.7 TFLOPs; 256-bit 20Gbps, 640GB/s, 64MB IC, 304W
The 9070XT is ~50% faster than the 7800XT in games at 4K (excluding RT) while having only 16% higher TDP and 2.6% higher BW.
Conclusion: RDNA4 offers higher performance within the same TDP and the same BW.
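A quick sanity check of those ratios, as a minimal Python sketch. It only re-derives the percentages from the TDP and bandwidth figures quoted above; the ~1.5x performance ratio is the post's own 4K non-RT estimate and is taken here as an assumption, not a measurement.

```python
# Board power (W) and peak memory bandwidth (GB/s), as quoted in the post.
SPECS = {
    "7800XT": {"tdp_w": 263, "bw_gbs": 624},
    "9070XT": {"tdp_w": 304, "bw_gbs": 640},
}

# Assumption: ~1.5x raster performance at 4K (RT excluded), per the post.
PERF_RATIO = 1.50


def ratio(field: str) -> float:
    """Return the 9070XT / 7800XT ratio for one spec field."""
    return SPECS["9070XT"][field] / SPECS["7800XT"][field]


tdp_ratio = ratio("tdp_w")   # ~1.156, i.e. ~16% more board power
bw_ratio = ratio("bw_gbs")   # ~1.026, i.e. ~2.6% more bandwidth

# Performance delivered per unit of each resource, relative to the 7800XT.
perf_per_watt = PERF_RATIO / tdp_ratio   # ~1.30x
perf_per_gbs = PERF_RATIO / bw_ratio     # ~1.46x

if __name__ == "__main__":
    print(f"TDP +{(tdp_ratio - 1) * 100:.1f}%, BW +{(bw_ratio - 1) * 100:.1f}%")
    print(f"perf/W x{perf_per_watt:.2f}, perf per GB/s x{perf_per_gbs:.2f}")
```

Under that assumption RDNA4 delivers roughly 30% more performance per watt and ~46% more per GB/s of bandwidth, which is what "higher performance within the same TDP and BW" amounts to.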

Strix Halo could easily have had an RDNA4 IGP (40CU RDNA4, 2.9GHz, 29.7 TFLOPs; 256-bit 8Gbps, 256GB/s, 32MB IC)
and been less limited by BW and TDP than the current IGP.
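The TFLOP and bandwidth figures in these spec lines can be reproduced with a short Python sketch. It assumes the usual RDNA3-era marketing FP32 formula (CUs x 64 ALUs x 2 FLOPs per FMA x 2 for dual issue x clock), which is an assumption about how the quoted numbers were derived, and peak bandwidth as bus width in bytes times per-pin data rate.

```python
def fp32_tflops(cus: int, clock_ghz: float) -> float:
    """Assumed marketing FP32 rate: CUs x 64 ALUs x 2 (FMA) x 2 (dual issue) x clock."""
    return cus * 64 * 2 * 2 * clock_ghz / 1000


def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width in bytes x per-pin data rate in Gbps."""
    return bus_bits / 8 * gbps_per_pin


# (CUs, clock in GHz, bus width in bits, per-pin rate in Gbps), from the post.
CONFIGS = {
    "7800XT": (60, 2.43, 256, 19.5),
    "9070XT": (64, 2.97, 256, 20.0),
    "40CU RDNA4 IGP (hypothetical)": (40, 2.9, 256, 8.0),
}

if __name__ == "__main__":
    for name, (cus, clk, bus, rate) in CONFIGS.items():
        print(f"{name}: {fp32_tflops(cus, clk):.1f} TFLOPs, "
              f"{bandwidth_gbs(bus, rate):.0f} GB/s")
```

The hypothetical IGP would have roughly 80% of the 9070XT's compute on 40% of its bandwidth, which is why BW keeps coming up as the limiter.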

But the biggest question is how big a 40CU RDNA4 IGP would be compared to the existing one, because we already know RDNA4 uses a lot more transistors than RDNA3.
If you can fit at least a 32CU RDNA4 IGP into that IO die without increasing die size, that would already be good enough, because IPC looks to be ~20-25% higher, which would compensate for the missing CUs.
Then the higher average gaming clock (thanks to a more efficient architecture and fewer CUs) plus better BW efficiency should mean higher raster performance than Strix Halo.
Just not sure by how much.
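The CU-compensation argument is simple arithmetic, sketched below in Python. It assumes equal clocks and treats the ~20-25% "IPC" gain as a straight per-CU throughput multiplier, ignoring bandwidth and clock differences; the constant and function names are illustrative only.

```python
# Hypothetical: can ~20-25% higher per-CU throughput ("IPC") let 32 RDNA4 CUs
# match 40 RDNA3.5 CUs? Same clock assumed; bandwidth effects ignored.
RDNA35_CUS = 40
RDNA4_CUS = 32


def cu_equivalents(cus: int, ipc_gain: float) -> float:
    """CU count scaled by a relative per-CU throughput gain (0.20 == +20%)."""
    return cus * (1 + ipc_gain)


# IPC gain needed to break even: 40 / 32 - 1 = 0.25.
break_even_gain = RDNA35_CUS / RDNA4_CUS - 1

if __name__ == "__main__":
    for gain in (0.20, 0.25):
        eq = cu_equivalents(RDNA4_CUS, gain)
        print(f"IPC +{gain:.0%}: {eq:.1f} CU-equivalents "
              f"({eq / RDNA35_CUS:.0%} of {RDNA35_CUS} CUs)")
```

So +20% covers about 96% of the gap and +25% closes it exactly; anything beyond that has to come from the clock and BW-efficiency gains.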
 

Timorous

Golden Member
Oct 27, 2008
Yeah, just like we would never get Strix Halo soldered into desktop motherboards.

Which desktop motherboard?

AFAIK the one Framework has built is fundamentally a laptop board in an ITX form factor. It is entirely unique.
 

branch_suggestion

Senior member
Aug 4, 2023
Mac Studio and Mac Pro ALONE are 5% of all Mac volume. The 16" MacBook Pro with M4 Max is roughly 15% of all MBP sales.

Low volume, high profit margin part, as it should be.
That is a hair above 10% of Mac shipments overall, and Mac has a higher ASP mix than PC.
Demand would be maybe ~10Mu yearly for a 512b MoP mobile/SFF SoC at maturity.
Let's say the ASP is $3k, which is very optimistic, for a $30B TAM sometime in 5-10 years.
Note Mac shipments in 2024 were <25Mu, so maybe 3Mu were Mx Max/Ultra.
Not really worth the investment for AMD compared with other markets. Others might try, but it is a gamble without strong OEM support.
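The market-size numbers above reduce to two multiplications, sketched in Python below. Every input is the poster's own rough estimate; nothing here is sourced shipment data.

```python
# Rough TAM arithmetic; all inputs are the post's estimates, not real data.
UNITS_PER_YEAR = 10_000_000   # ~10Mu/yr for a 512-bit MoP SoC "at maturity"
ASP_USD = 3_000               # the "very optimistic" ASP

tam_usd = UNITS_PER_YEAR * ASP_USD   # 30,000,000,000, i.e. $30B

# Share of Mac shipments that were Max/Ultra, using the post's rough numbers.
MAC_UNITS_2024 = 25_000_000   # upper bound ("<25Mu")
MAX_ULTRA_UNITS = 3_000_000   # "maybe 3Mu"
max_ultra_share = MAX_ULTRA_UNITS / MAC_UNITS_2024   # 0.12

if __name__ == "__main__":
    print(f"TAM ~${tam_usd / 1e9:.0f}B, Max/Ultra share ~{max_ultra_share:.0%}")
```

Because 25Mu is an upper bound on total shipments, the ~12% share is a floor under those assumptions, consistent with "a hair above 10%".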
 

Glo.

Diamond Member
Apr 25, 2015
Don't know about you guys, but I am pretty sad Strix Halo didn't end up with RDNA4 instead.

7800XT: 60CU, 2.43GHz, 37.3 TFLOPs; 256-bit 19.5Gbps, 624GB/s, 64MB IC, 263W
9070XT: 64CU, 2.97GHz, 48.7 TFLOPs; 256-bit 20Gbps, 640GB/s, 64MB IC, 304W
The 9070XT is ~50% faster than the 7800XT in games at 4K (excluding RT) while having only 16% higher TDP and 2.6% higher BW.
Conclusion: RDNA4 offers higher performance within the same TDP and the same BW.

Strix Halo could easily have had an RDNA4 IGP (40CU RDNA4, 2.9GHz, 29.7 TFLOPs; 256-bit 8Gbps, 256GB/s, 32MB IC)
and been less limited by BW and TDP than the current IGP.

But the biggest question is how big a 40CU RDNA4 IGP would be compared to the existing one, because we already know RDNA4 uses a lot more transistors than RDNA3.
If you can fit at least a 32CU RDNA4 IGP into that IO die without increasing die size, that would already be good enough, because IPC looks to be ~20-25% higher, which would compensate for the missing CUs.
Then the higher average gaming clock (thanks to a more efficient architecture and fewer CUs) plus better BW efficiency should mean higher raster performance than Strix Halo.
Just not sure by how much.
Yeah, I was thinking that recently. RDNA4 is much better; it makes more efficient use of the resources available to the GPU.

Especially in the context of Medusa Point: even with 16 CUs it could be faster than the desktop RX 6500 XT, assuming it has Infinity Cache available to the GPU, even just 8MB. But as Adroc and Kepler say, Medusa is RDNA3.5, unless I am mistaken.
That is a hair above 10% of Mac shipments overall, and Mac has a higher ASP mix than PC.
Demand would be maybe ~10Mu yearly for a 512b MoP mobile/SFF SoC at maturity.
Let's say the ASP is $3k, which is very optimistic, for a $30B TAM sometime in 5-10 years.
Note Mac shipments in 2024 were <25Mu, so maybe 3Mu were Mx Max/Ultra.
Not really worth the investment for AMD compared with other markets. Others might try, but it is a gamble without strong OEM support.
10 million units/year of demand for a product whose SoC price starts at $1,000 means $10 billion per year in revenue from this product alone.

And on a 512-bit bus with LPDDR6 it could scale all the way to 512GB of RAM, so the margins would be sky-high.
 

branch_suggestion

Senior member
Aug 4, 2023
Yeah, I was thinking that recently. RDNA4 is much better; it makes more efficient use of the resources available to the GPU.
A 32CU RDNA4 IGP would've crushed the 4070M; alas, the RDNA3.5 IGP merely trades blows with it.
TTM is harsh, but getting it out now vs 6+ months later is the right call.
Especially in the context of Medusa Point: even with 16 CUs it could be faster than the desktop RX 6500 XT, assuming it has Infinity Cache available to the GPU, even just 8MB. But as Adroc and Kepler say, Medusa is RDNA3.5, unless I am mistaken.
Yeah, only Halo is getting the new shinies; Medusa Point might be more or less Strix with a new CPU bolted on.
Lame, but still a great CPU bump. Shame about handhelds, though, if true.
10 million units/year of demand for a product whose SoC price starts at $1,000 means $10 billion per year in revenue from this product alone.
Split across a few potential players, ehh.
And on a 512-bit bus with LPDDR6 it could scale all the way to 512GB of RAM, so the margins would be sky-high.
Would depend on future demand for extended memory configs, especially in enterprise.
It's just so risky: the PC market relies on a lot of unspoken truths, and this would break many of them.
 

Glo.

Diamond Member
Apr 25, 2015
Would depend on future demand for extended memory configs, especially in enterprise.
It's just so risky: the PC market relies on a lot of unspoken truths, and this would break many of them.
That would be exactly the MAIN target market, at first. It's absolutely not a mainstream product, just as Mx Max and Ultra chips are not mainstream products.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
Continuation to my previous post.

Strix Halo in the Asus ROG Flow Z13 managed 10,224 points in Time Spy Graphics on the Turbo power profile (60W TGP). (Link)
RX 7700S (2.5GHz boost, 2.2GHz game clock) at 100W(?) managed 10,206 points in Time Spy Graphics. (Link)
RX 7600M XT (2.6GHz boost, 2.3GHz game clock) at 120W(?) managed 10,777 points in Time Spy Graphics. (Link)
The funny thing is that even the desktop RX 7600 manages only 10,771 points in Time Spy Graphics. (Link)

I will ignore Halo's 80W score in Time Spy, because it looks like it's already bottlenecked, gaining only 5.5%, which is interesting considering that in CP2077 they still measured an 11% FPS increase moving from 60W to 80W TGP.
It looks like the IGP ran at ~80% of the 7700S's clock speed (32/40 CUs = 0.8, or 80%), so around 1.8GHz at 60W TGP, assuming the 7700S runs at 2.2GHz during Time Spy.
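The clock estimate above can be written out as a small Python sketch. It rests on the post's simplifying assumption that Time Spy Graphics scales linearly with CU count times clock speed (real scaling is less clean), plus the post's assumed 2.2GHz for the 7700S; the function name is illustrative.

```python
# Assumption (from the post): Time Spy Graphics score scales linearly with
# CU count x clock speed, so equal scores imply equal CU x clock products.


def implied_clock_ghz(score: float, cus: int,
                      ref_score: float, ref_cus: int, ref_clock_ghz: float) -> float:
    """Clock a GPU with `cus` CUs would need to reach `score`, scaled from a reference GPU."""
    return ref_clock_ghz * (score / ref_score) * (ref_cus / cus)


# Reference: RX 7700S, 32 CUs, 10,206 points; 2.2GHz is the post's own assumption.
REF_SCORE, REF_CUS, REF_CLOCK_GHZ = 10_206, 32, 2.2
# Strix Halo IGP: 40 CUs, 10,224 points at 60W TGP.
HALO_SCORE, HALO_CUS = 10_224, 40

halo_clock = implied_clock_ghz(HALO_SCORE, HALO_CUS, REF_SCORE, REF_CUS, REF_CLOCK_GHZ)

if __name__ == "__main__":
    print(f"Implied Halo IGP clock at 60W: ~{halo_clock:.2f} GHz")  # ~1.76 GHz
```

The near-identical scores make the score ratio almost exactly 1, so the result is essentially 2.2GHz x 32/40 = 1.76GHz, i.e. the "~1.8GHz" in the post.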

[Image: process node power/performance comparison chart]

Not sure if that IO die uses N4 or N4P (N4X), but if the above graph holds true, moving from N6 (N7P) to N4P should give you 39% lower power consumption or 19.3% higher performance (clock speed).
With this in mind, Halo's 40CU IGP doesn't look very impressive, because a 7700S ported to N4P would need a 61W TGP (100W*0.61), only 1W more than Halo, despite being at the disadvantage of clocking higher.
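The node-porting comparison, as a minimal Python sketch. The -39% power and +19.3% clock factors are the values the post reads off its chart, and the 100W TGP and 2.2GHz clock for the 7700S are the post's own unconfirmed assumptions; all are treated as inputs, not verified data.

```python
# Assumption: N6 -> N4P scaling factors read off the chart cited in the post.
# Either ~39% less power at the same performance, or ~19.3% more clock at the same power.
POWER_SCALE = 1 - 0.39
CLOCK_SCALE = 1 + 0.193

R7700S_TGP_W = 100       # the post's (unconfirmed) TGP for the RX 7700S
R7700S_CLOCK_GHZ = 2.2   # the post's assumed Time Spy clock
HALO_TGP_W = 60          # Strix Halo's TGP in the Turbo profile

ported_tgp_w = R7700S_TGP_W * POWER_SCALE          # ~61 W at the same performance
iso_power_clock = R7700S_CLOCK_GHZ * CLOCK_SCALE   # ~2.62 GHz at the same 100 W

if __name__ == "__main__":
    print(f"7700S on N4P: ~{ported_tgp_w:.0f} W "
          f"({ported_tgp_w - HALO_TGP_W:+.0f} W vs Halo)")
    print(f"or ~{iso_power_clock:.2f} GHz at {R7700S_TGP_W} W")
```

The first branch is the one the post uses: a hypothetical N4P 7700S lands within about 1W of Halo's 60W while still running the higher clock.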

What does It mean for an RDNA4 based IGP?
40CU RDNA4 looks like overkill for that BW, just like Strix Halo's IGP. Even a 32CU RDNA4 IGP should provide some increase in performance.
It would be a downclocked desktop RX 9600, which should have a ~50% raster increase over the RX 7600 (the same as N48 vs N32) if it can boost to 3.2GHz and is paired with 20Gbps 128-bit GDDR6, so ~16,000 points. Not sure about Time Spy scaling, but whatever.
A game clock of 2.4-2.6GHz for this hypothetical 32CU RDNA4 IGP plus 256-bit 8000MT/s memory should allow maybe 12,000-13,000 points at 70-85W TGP? With a higher TGP even a bit more, but BW could prove to be a problem once more.
Performance would be better than the 40CU RDNA3.5 IGP by ~10-20%, but let's not forget about other advantages like better RT performance, better features, etc.
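The estimates in this post can be cross-checked with a short Python sketch. The 1.5x RDNA4 uplift and the 12,000-13,000 IGP range are the post's assumptions, and "RX 9600" is a hypothetical card; the scores are the ones quoted earlier in the post.

```python
# Time Spy Graphics scores quoted in the post.
RX7600_DESKTOP = 10_771
HALO_60W = 10_224
HALO_80W_GAIN = 0.055   # the ~5.5% gain going from 60W to 80W

# Assumption: a hypothetical RDNA4 "RX 9600" is ~1.5x an RX 7600 (like N48 vs N32).
RDNA4_UPLIFT = 1.50
rx9600_est = RX7600_DESKTOP * RDNA4_UPLIFT   # 16,156.5, the post's "~16,000"

halo_80w = HALO_60W * (1 + HALO_80W_GAIN)    # ~10,786

# Peak bandwidth: desktop 128-bit 20Gbps GDDR6 vs. IGP 256-bit 8000MT/s (GB/s).
desktop_bw = 128 / 8 * 20.0   # 320 GB/s
igp_bw = 256 / 8 * 8.0        # 256 GB/s


def uplift(score: float, baseline: float) -> float:
    """Relative gain of `score` over `baseline` (0.10 == +10%)."""
    return score / baseline - 1


if __name__ == "__main__":
    for igp_score in (12_000, 13_000):   # the post's hypothetical 32CU RDNA4 IGP range
        print(f"{igp_score}: +{uplift(igp_score, HALO_60W):.0%} vs Halo@60W, "
              f"+{uplift(igp_score, halo_80w):.0%} vs Halo@80W")
    print(f"RX 9600 est. ~{rx9600_est:,.0f}; IGP has {igp_bw / desktop_bw:.0%} of its BW")
```

Against the 60W score the range works out to roughly +17-27%; against the ~80W score it is roughly +11-21%, which is closer to the post's ~10-20% figure. The IGP would also have only 80% of the desktop card's bandwidth.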

My conclusion stands, RDNA4 would have been better even with only 32CU IGP.

P.S. Feel free to correct me if I made any mistakes, which I probably did, at least in my assumption that the 7700S runs at only 2.2GHz during Time Spy.
 

gorobei

Diamond Member
Jan 7, 2007
More telling is that it's still Q3. Either the batches are small or there are plenty of Halo chips guaranteed for Framework.
Do we know how many are in each round?
No clue; maybe you could figure it out from typical minimum order quantities for a subcomponent.
The motherboard PCB and assembly could easily be a custom order, so no luck there.
No dGPU.
So about all you could look into is the SFX PSU. Assuming FW will only do a pre-order batch if they have enough orders to match the PSU quantity they can source in bulk, we would need to find out how many units FSP will do per round at a discount price.