Question AMD Rembrandt/Zen 3+ APU Speculation and Discussion


izaic3

Member
Nov 19, 2019
61
96
91
Alright, so we've had some leaks so far. I don't know if any of it's been confirmed yet, as it's pretty early, but here is what I've surmised so far (massive grain of salt of course):

If it turns out to have RDNA 2 and 12 CU, I could see iGPU performance potentially almost doubling over Cezanne.

If I've made any mistakes or gotten anything wrong, please let me know. I'd also love to hear more knowledgeable people weigh in on their expectations.
 

Shivansps

Diamond Member
Sep 11, 2013
3,918
1,570
136
Trust me, if AMD does anything like what Apple did, it will be $700 or more.

Most likely more. AMD only sells the APU, while Apple sells the ecosystem. But at least that would mean they can't brand a low-end or mainstream APU as "premium" anymore.
 
  • Like
Reactions: Mopetar

Thibsie

Golden Member
Apr 25, 2017
1,128
1,334
136
Do you think everyone wants to spend $3K only to get "the fastest and most efficient APU available" when a cheaper x86 alternative with discrete graphics will have the same or better performance at the cost of worse battery life? Especially when that platform lacks the main sales driver for most people (video games)?

But it's a Mac!
 

Spicy

Member
Oct 5, 2021
46
48
51
Using older architectures for the iGPU has several reasons behind it. First, it is a well-established "building block": if the projects for the new graphics architecture and for the new APU start in parallel, it is unlikely that the iGPU will use something new and unproven, which does not even exist in reality yet. Second, power optimization: we have seen Vega go from a power-hungry monster in the form of Radeon VII to a sober mobile incarnation in Renoir and Cezanne, even clocking higher on the very same production process. This could have been done for RDNA too, but it would have taken longer and the launch windows quite probably would not have been met. A third reason may be driver stability, which is important for corporate users, and these are one of the main targets of these APUs.
I've always thought the reason was that Navi2 needs high bandwidth (DDR5), so it would have been a waste to pair it with DDR4. False?
 
  • Like
Reactions: Joe NYC

leoneazzurro

Golden Member
Jul 26, 2016
1,114
1,867
136
I've always thought the reason was that Navi2 needs high bandwidth (DDR5), so it would have been a waste to pair it with DDR4. False?

I'd say false, as the need for bandwidth is mainly driven by the amount of data to be processed per second. It is probably true that these APUs are mainly limited by the available bandwidth, but you could have simply solved that by using fewer CUs for the same performance. The main reason must be related to when the design freeze took place, and how much time you need to optimize these graphics blocks for mobile use.
 

Spicy

Member
Oct 5, 2021
46
48
51
I'd say false, as the need for bandwidth is mainly driven by the amount of data to be processed per second. It is probably true that these APUs are mainly limited by the available bandwidth, but you could have simply solved that by using fewer CUs for the same performance.
Yes, but why change the design for the same performance?
Everyone would have criticized it: a new Navi2 iGPU, but no performance boost? Bad publicity.
 
  • Like
Reactions: Joe NYC

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,655
146
Yes, but why change the design for the same performance?
Everyone would have criticized it: a new Navi2 iGPU, but no performance boost? Bad publicity.
Even on DDR4 you'd have a performance boost.

I'm just gonna say it now, Rembrandt's iGPU performance boost over Renoir/Cezanne is significantly greater than the boost in memory bandwidth.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,114
1,867
136
Yes, but why change the design for the same performance?
Everyone would have criticized it: a new Navi2 iGPU, but no performance boost? Bad publicity.

Rembrandt will use DDR5, which will make more bandwidth available. At this point the optimizations for mobile RDNA2 are probably finished, and the new architecture is also more feature rich (not Ray Tracing, which is quite useless on an APU with only 12 CU, but variable rate shading and so on). So now it makes sense to replace the iGPU block with a more advanced and efficient one.
 
  • Like
Reactions: prtskg

Joe NYC

Diamond Member
Jun 26, 2021
3,656
5,200
136
...how long do you think it takes to produce a new SoC? Honestly- from the "oh, that is what we should build" moment, to the point where they actually put it on a store shelf? It's a lot longer than 1 year.

I know it takes a while.

But AMD is already doing it in the Xbox and PS5 with a different type of memory, and in the Steam Deck with exactly the same memory controller. The Steam Deck is 1+ quarter ahead of Rembrandt. So AMD knows what is needed to make the GPU part of the CPU perform better.

So with 2 technologies to make a powerful laptop CPU (V-Cache and multiple channels of memory inside the MCM), if AMD does neither in Rembrandt, it is an opportunity missed.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,656
5,200
136
Worth reiterating that Apple is charging an extra $400 to go from the 10+16 to the 10+32. Plus you are also required to go to 32 GB of memory, which is an extra $400 over 16 GB. So the cheapest model with the 10+32 you can buy is $3199. You really can't compare that to an AMD APU that goes into sub-$1K machines.

It also goes the other way.

The reason AMD APUs go into sub-$1,000 laptops is because AMD does not put the same fast memory in the MCM feeding a powerful GPU.
 
  • Like
Reactions: scannall

Joe NYC

Diamond Member
Jun 26, 2021
3,656
5,200
136
There's not much of a need to partition the gpu into a Chiplet if you can stack a monster SRAM die on it and use it as infinity cache. But that's way off in the future and impossible for AMD to do right now (/s)

As of today, since Apple has redefined the APU, I think the race is on for AMD and Intel to match it. Except at prices that would not limit the penetration to a small segment of the ruling class.

It will be cheaper to put together 4 chiplets (CPU, GPU, SRAM, IOD) to match Apple. Probably at half the cost or less.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,656
5,200
136
Even on DDR4 you'd have a performance boost.

I'm just gonna say it now, Rembrandt's iGPU performance boost over Renoir/Cezanne is significantly greater than the boost in memory bandwidth.

That's what my understanding is as well. Which means that Rembrandt needs more bandwidth...
 

Shivansps

Diamond Member
Sep 11, 2013
3,918
1,570
136
Yes, but why change the design for the same performance?
Everyone would have criticized it: a new Navi2 iGPU, but no performance boost? Bad publicity.
It would have been faster for sure, as RDNA2 is more memory efficient than Vega. It's just that, instead of 8/12 CU like they have in Van Gogh or RMB, it would have been, let's say, 6. RDNA2 was likely way too late to be used in Renoir and Cezanne, but it should perform faster than Vega at the same bandwidth.

They also didn't have much incentive to push for iGPU perf until TGL came out. Even Vega 3 was faster than the Intel UHD 630...
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,695
136
They also didn't have much incentive to push for iGPU perf until TGL came out. Even Vega 3 was faster than the Intel UHD 630...

Faster is a relative term in that case. Is it faster? Sure. But slightly faster than the UHD 630 isn't enough to practically matter. That said, you can game on it in a pinch, and older titles are just fine.

Where Vega3 has the real advantage is its drivers, since they're the same as regular desktop Radeons.
 
  • Like
Reactions: prtskg and Joe NYC

Asterox

Golden Member
May 15, 2012
1,058
1,864
136
It would have been faster for sure, as RDNA2 is more memory efficient than Vega. It's just that, instead of 8/12 CU like they have in Van Gogh or RMB, it would have been, let's say, 6. RDNA2 was likely way too late to be used in Renoir and Cezanne, but it should perform faster than Vega at the same bandwidth.

They also didn't have much incentive to push for iGPU perf until TGL came out. Even Vega 3 was faster than the Intel UHD 630...

Shame on you, you put salt on a painful old Intel iGPU wound... :mask:


A 3 CU RDNA2 iGPU at 2000 MHz would be, hm, 40-50% faster than a 3 CU Vega iGPU at 1100 MHz.
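
For a rough sense of scale, here is a minimal sketch of the paper math behind that comparison. It assumes 64 FP32 lanes per CU on both Vega and RDNA2 and one fused multiply-add (2 FLOPs) per lane per clock. The function name `peak_fp32_gflops` is made up for illustration, and it only gives theoretical peak throughput, not game performance.

```python
def peak_fp32_gflops(compute_units: int, clock_mhz: float, lanes_per_cu: int = 64) -> float:
    """Theoretical peak FP32 throughput in GFLOPS (hypothetical helper).

    Assumes each lane retires one fused multiply-add (2 FLOPs) per clock.
    """
    return compute_units * lanes_per_cu * 2 * clock_mhz / 1000.0


# Figures from the post: 3 CU Vega at 1100 MHz vs 3 CU RDNA2 at 2000 MHz.
vega3 = peak_fp32_gflops(3, 1100)
rdna2 = peak_fp32_gflops(3, 2000)
ratio = rdna2 / vega3

print(f"Vega 3 CU @ 1100 MHz: {vega3:.1f} GFLOPS")
print(f"RDNA2 3 CU @ 2000 MHz: {rdna2:.1f} GFLOPS")
print(f"Paper ratio: {ratio:.2f}x")
```

On paper the clock difference alone gives roughly 1.8x, so a 40-50% real-world estimate is on the cautious side, which seems reasonable for an iGPU sharing system memory.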
 
  • Like
Reactions: lightmanek

LightningZ71

Platinum Member
Mar 10, 2017
2,527
3,220
136
As of today, since Apple has redefined the APU, I think the race is on for AMD and Intel to match it. Except at prices that would not limit the penetration to a small segment of the ruling class.

It will be cheaper to put together 4 chiplets (CPU, GPU, SRAM, IOD) to match Apple. Probably at half the cost or less.

That's a very expensive package to put into mid-grade laptops...

It would be FAR cheaper to just take the same 8-core, 6 WGP / 12 CU RDNA2 package, expand it to 10 WGP / 20 CU, and do four channels of DRAM in four SODIMM slots, two on either side of the processor, or have 16GB soldered on two channels and two SODIMM slots for the other two. That's MUCH cheaper than integrating a dGPU that will likely cost over $100 and that requires routing 128 bits of GDDR6, handling thermals for those packages and the dGPU, handling power for that dGPU, and dealing with the UEFI oddities that come with it. It also keeps the revenue in AMD's pocket.

We aren't seeing many design wins for AMD mobile GPUs in the market. We've got, what, 6 different models with the 6600M right now, based around 4 distinct chassis designs? Nothing higher in the x86 market, and a few 5500M designs knocking around in the low end? If AMD decided to invest in an SoC that could obviate the need for low-end dGPUs, they could get FAR more market penetration and revenue from that market, revenue that they are currently not getting! Sure, they are getting money from low-end gaming laptops with 4600/5600H processors, but they are giving away the GPU side to 1650s and 3050s. With an SoC that costs maybe $20 more per unit to make and package, they can achieve an additional $100 in realized revenue per unit in laptops that are price and performance competitive with 1650/3050 discrete designs, letting both the laptop manufacturer and AMD realize a greater share of the revenue.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,527
3,220
136
Shame on you, you put salt on a painful old Intel iGPU wound... :mask:


A 3 CU RDNA2 iGPU at 2000 MHz would be, hm, 40-50% faster than a 3 CU Vega iGPU at 1100 MHz.

I was under the impression that the smallest denomination of RDNA2 was a WGP, which is 2CU equivalents. The bottom end product would likely be 2 WGPs, being roughly 4CU. That should be in the neighborhood of the 32EU Xe packages in the desktop chips. I don't know if it meets or beats the 48EU base configs in the G4 packages.
 
  • Like
Reactions: scineram

maddie

Diamond Member
Jul 18, 2010
5,156
5,545
136
That's a very expensive package to put into mid-grade laptops...

It would be FAR cheaper to just take the same 8-core, 6 WGP / 12 CU RDNA2 package, expand it to 10 WGP / 20 CU, and do four channels of DRAM in four SODIMM slots, two on either side of the processor, or have 16GB soldered on two channels and two SODIMM slots for the other two. That's MUCH cheaper than integrating a dGPU that will likely cost over $100 and that requires routing 128 bits of GDDR6, handling thermals for those packages and the dGPU, handling power for that dGPU, and dealing with the UEFI oddities that come with it. It also keeps the revenue in AMD's pocket.

We aren't seeing many design wins for AMD mobile GPUs in the market. We've got, what, 6 different models with the 6600M right now, based around 4 distinct chassis designs? Nothing higher in the x86 market, and a few 5500M designs knocking around in the low end? If AMD decided to invest in an SoC that could obviate the need for low-end dGPUs, they could get FAR more market penetration and revenue from that market, revenue that they are currently not getting! Sure, they are getting money from low-end gaming laptops with 4600/5600H processors, but they are giving away the GPU side to 1650s and 3050s. With an SoC that costs maybe $20 more per unit to make and package, they can achieve an additional $100 in realized revenue per unit in laptops that are price and performance competitive with 1650/3050 discrete designs, letting both the laptop manufacturer and AMD realize a greater share of the revenue.
Seeing that the console makers have committed financially to GPU development, is there a possibility of a secret agreement preventing AMD from getting too close in performance with a retail client APU?
 
  • Like
Reactions: Spicy

eek2121

Diamond Member
Aug 2, 2005
3,415
5,056
136
Seeing that the console makers have committed financially to GPU development, is there a possibility of a secret agreement preventing AMD from getting too close in performance with a retail client APU?

It makes little business sense to make a giant APU, as much as everyone drools over the idea. AMD would rather sell a GPU and a CPU instead of just a CPU. I am not saying I agree with it.

AMD could consider selling motherboard OEMs radeon mobile chips that would be integrated directly onto the motherboard.

I wish they would get more creative and do something like that.
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,655
146
We have a score boys:


Single channel DDR5 and the performance is somewhere in the RX560/GTX1050 region (the 560 is the lower score here)

What'd I say huh? The performance uplift is greater than the improvement to memory bandwidth

This is actually less memory bandwidth than dual channel DDR4-3200 and it still beats the average 4800U score by 50%
 

DisEnchantment

Golden Member
Mar 3, 2017
1,777
6,791
136
Those Windows console makers will be really happy.
Put in two LPDDR5-6000+ modules and they have a potent GPU for a small screen, better than the Steam Deck.
If it is on N6, that would be around a 10% power efficiency gain over plain N7. On top of that, Hallock mentioned some big engineering work to make parts of the SoC power-gateable at a finer granularity.
Will be interesting to see vs Steam Deck, but of course the Deck SoC is much cheaper.

Single channel DDR5 and the performance is somewhere in the RX560/GTX1050 region (the 560 is the lower score here)
I believe it is operating in dual-channel mode here, even with a single SODIMM, so it would have achieved at least comparable memory parallelism.
DDR5 compensates for the narrower bus width with a longer Burst Length.
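
To put numbers on the burst length point, here is a minimal sketch. It assumes the standard figures of a 64-bit DDR4 channel with burst length 8 and a 32-bit DDR5 subchannel (two per DIMM) with burst length 16. The helper name `bytes_per_burst` is just for illustration.

```python
def bytes_per_burst(bus_width_bits: int, burst_length: int) -> int:
    """Bytes delivered by one burst on a channel of the given data width (hypothetical helper)."""
    return bus_width_bits // 8 * burst_length


# One 64-bit DDR4 channel at burst length 8.
ddr4_burst = bytes_per_burst(64, 8)

# One 32-bit DDR5 subchannel at burst length 16; a DDR5 DIMM carries two of these.
ddr5_burst = bytes_per_burst(32, 16)

print(f"DDR4 channel burst:    {ddr4_burst} bytes")
print(f"DDR5 subchannel burst: {ddr5_burst} bytes")
```

Both come out to one 64-byte cache line per burst, which is why a single DDR5 module can keep two independent accesses in flight without wasting transfers.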

maddie

Diamond Member
Jul 18, 2010
5,156
5,545
136
It makes little business sense to make a giant APU, as much as everyone drools over the idea. AMD would rather sell a GPU and a CPU instead of just a CPU. I am not saying I agree with it.

AMD could consider selling motherboard OEMs Radeon mobile chips that would be integrated directly onto the motherboard.

I wish they would get more creative and do something like that.
Your argument assumes that margins might be similar. The selling price of a low-end GPU compared to a high-end APU, to me, appears to favor the APU. After all, the APU + GPU combo replicates close to 40-50% of the silicon of a low-end GPU (video). Adding more CUs and additional memory controllers does not scale linearly with the performance increase.

With a fair number of low-end motherboards already offering 4 memory slots, the increased cost would be the motherboard routing for 4-channel memory, one slot per channel.

Pipe dream most likely.
 

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,655
146
I believe it is operating in dual-channel mode here, even with a single SODIMM, so it would have achieved at least comparable memory parallelism.
DDR5 compensates for the narrower bus width with a longer Burst Length.
Yeah, you can still read and write at the same time, but the memory bandwidth is still limited by the 64b bus width instead of the usual 128b bus width you'd get from two modules correctly configured
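
As a back-of-the-envelope check on the 64b vs 128b point, here is a minimal sketch. The DDR5 speed of the tested module isn't stated in the thread, so DDR5-4800 is purely an assumption here, and `peak_bandwidth_gbs` is a made-up helper name. The 1.5x score ratio is the "beats the average 4800U score by 50%" figure quoted earlier.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (hypothetical helper)."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000.0


# Dual-channel DDR4-3200: 128-bit total bus width.
ddr4_dual = peak_bandwidth_gbs(128, 3200)

# Single DDR5 module: 64-bit total bus width.
# ASSUMPTION: DDR5-4800; the actual module speed is not given in the thread.
ddr5_single = peak_bandwidth_gbs(64, 4800)

bandwidth_ratio = ddr5_single / ddr4_dual

# Score ratio taken from the post above: about 50% over the average 4800U.
score_ratio = 1.5
perf_per_bandwidth = score_ratio / bandwidth_ratio

print(f"Dual-channel DDR4-3200: {ddr4_dual:.1f} GB/s")
print(f"Single DDR5-4800:       {ddr5_single:.1f} GB/s")
print(f"Bandwidth ratio: {bandwidth_ratio:.2f}x, perf per GB/s: {perf_per_bandwidth:.2f}x")
```

Under that assumption the single DDR5 module has about three quarters of the bandwidth yet scores 50% higher, which works out to roughly twice the performance per unit of bandwidth.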