Discussion: RDNA 5 / UDNA (CDNA Next) speculation


marees

Golden Member
Apr 28, 2024
I was wondering if these would end up as standalone or shared dies.

Sharing memory I/O would suggest a shared die is the better option.
If I grokked MLID correctly, the CPU is in the I/O die.

Then you have 2 options:
  1. Add another CPU die (replacement for Strix Point)
  2. Add another GPU die (Medusa Premium & also Halo, I think)
 

Tuna-Fish

Golden Member
Mar 4, 2011
I wonder what will be the smallest LPDDR6 chip available? IIRC the smallest LPDDR5 ones are 8Gb, but I dunno if those are even still in production; the most common ones I see around are 12Gb.

AMD might be forced to put more memory on AT3 cards than on AT2 ones?
 
  • Like
Reactions: Tlh97 and marees

marees

Golden Member
Apr 28, 2024
I wonder what will be the smallest LPDDR6 chip available? IIRC the smallest LPDDR5 ones are 8Gb, but I dunno if those are even still in production; the most common ones I see around are 12Gb.

AMD might be forced to put more memory on AT3 cards than on AT2 ones?
Need a thread on how much VRAM is too much VRAM 😉
 

Joe NYC

Diamond Member
Jun 26, 2021
If I grokked MLID correctly, the CPU is in the I/O die.

Then you have 2 options:
  1. Add another CPU die (replacement for Strix Point)
  2. Add another GPU die (Medusa Premium & also Halo, I think)

One thing that contradicts this is the Xbox configuration shown in previous videos, where there is a base monolithic CPU/SoC die and a separate GPU die.

And this video suggests that approach may be shared with laptops.
 

marees

Golden Member
Apr 28, 2024
One thing that contradicts this is the Xbox configuration shown in previous videos, where there is a base monolithic CPU/SoC die and a separate GPU die.

And this video suggests that approach may be shared with laptops.
No, the Xbox is GDDR7 (desktop mode);
Medusa Halo & Premium are LPDDR6 (laptop mode).

So the architecture changes. But still, take all this with mountains of salt, as MLID is the source.
 

Joe NYC

Diamond Member
Jun 26, 2021
No, the Xbox is GDDR7 (desktop mode);
Medusa Halo & Premium are LPDDR6 (laptop mode).

So the architecture changes. But still, take all this with mountains of salt, as MLID is the source.

It is not spelled out what is on each die, but the way I understand it, what he calls the IOD in Medusa Point is a die that also has the base set of cores: ~4 full + 4-8 dense + 2 LP cores.

And this would be the base, low-cost monolithic laptop die.

Then, on one end, you can add the 12-core Zen 6 CPU chiplet.
And in Medusa Mini, on the other end, you can add the small GPU chiplet.

In Medusa Full, instead, add the big GPU chiplet.
 

marees

Golden Member
Apr 28, 2024
Any guesses on (2027) launch prices?

  1. AT0 (10090xt > 5090) — 384bit bus, so $1500+?
  2. AT1 (10080xt) — scrapped
  3. AT2 (10070xt = 5080 > Xbox Next) — 72 CU, 192bit GDDR7, so $600+
  4. AT3 (10060xt < 5070) — 48 CU, 384bit LPDDR6, so $400+
  5. 9060xt 16GB (= PS5 Pro) ~ $300
  6. AT4 (10050xt > 3060 12GB in raster) — 24 CU, 128bit LPDDR6, so $250?
My revised estimations / guesstimates (no LLMs used)

(Assuming this LPDDR VRAM thingy is true & also assuming it works out) Imagine this line-up (in 2027):

  • AT0
    • 10090xt+ — multiple models starting at $1500+, with huge VRAM like the Radeon VII or Titan
  • AT1
    • 10080xt — scrapped (Lisa Su took her toys & went home)
  • AT2 (GDDR7)
    • 10070 xtx 24GB = $700 (~5080)
    • 10070 xt 18GB = $600 (~5070 Ti)
    • 10070 gre 15GB = $500-$550 (~5070 Super)
  • AT3 (LPDDR6)
    • 10060 xt 24GB = $450-$500 (~5070)
    • 10060 16GB = $400 (~5060 Ti 16GB)
  • AT4 (LPDDR6/LPDDR5X)
    • 10050xt 32GB = $350 (~9060xt 16GB)
    • 10050xt 24GB = $300 (~9060)
    • 10040xt 16GB = $250 (~3060 12GB in raster)
 
Last edited:

marees

Golden Member
Apr 28, 2024
Why add so much memory to AT2, AT3 and AT4? I would assume 18GB / 16GB / 12GB for these.

I'd like to get more, but it is unlikely that we are seeing that.
AT3 & AT4 are joke guesses, because MLID said LPDDR.


AT2 I have to give a serious rethink.
I am now thinking the xtx will use 4GB GDDR7 chips while the xt & gre will use 3GB GDDR7 chips.
 

Saylick

Diamond Member
Sep 10, 2012
My revised estimations / guesstimates (no LLMs used)

(Assuming this LPDDR VRAM thingy is true & also assuming it works out) Imagine this line-up (in 2027):

  • AT0
    • 10090xt+ — multiple models starting at $1500+, with huge VRAM like the Radeon VII or Titan
  • AT1
    • 10080xt — scrapped (Lisa Su took her toys & went home)
  • AT2 (GDDR7)
    • 10070 xtx 24GB = $700 (~5080)
    • 10070 xt 18GB = $600 (~5070 Ti)
    • 10070 gre 15GB = $500-$550 (~5070 Super)
  • AT3 (LPDDR6)
    • 10060 xt 24GB = $450-$500 (~5070)
    • 10060 16GB = $400 (~5060 Ti 16GB)
  • AT4 (LPDDR6/LPDDR5X)
    • 10050xt 32GB = $350 (~9060xt 16GB)
    • 10050xt 24GB = $300 (~9060)
    • 10040xt 16GB = $250 (~3060 12GB in raster)
I'd be surprised if the 10070 XT, or whatever they call it, ends up being only a 5070 Ti at $600. That's basically the same perf/$ as a 9070 XT, but with 50% more VRAM.
 

basix

Member
Oct 4, 2024
AT3 & AT4 are joke guesses, because MLID said LPDDR.


AT2 I have to give a serious rethink.
I am now thinking the xtx will use 4GB GDDR7 chips while the xt & gre will use 3GB GDDR7 chips.
Even if they are using LPDDR6, why add more memory than is useful? These things have to be cheap, and 16 GByte for a mainstream GPU and 12 GByte for the low-end part seem reasonable.
Yes, you can add more memory. But why should AMD do that when it is of no big benefit to the average gamer?
The same logic applies to AT2 with GDDR7. 18 GByte is the most reasonable on 192bit with 24 Gbit chips. And 18 GByte is perfectly suited for a 1440p card, and fine enough for 4K with upscaling. Most gamers won't benefit from 24 GByte but would have to pay more.

For workstation and professional parts it will be another story. There you could attach 128 GByte to a 2-ch LPDDR6 bus if AMD wants to.
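To make the capacity arithmetic explicit, here is a quick Python sanity check (a sketch, assuming one x32 GDDR7 device per 32 bits of bus, no clamshell; densities in Gbit):

```python
# GDDR7 capacity back-of-the-envelope: one x32 device per 32 bits of bus.
def gddr7_capacity_gbyte(bus_bits: int, density_gbit: int) -> float:
    devices = bus_bits // 32          # GDDR7 devices are 32 bits wide
    return devices * density_gbit / 8 # Gbit -> GByte

print(gddr7_capacity_gbyte(192, 24))  # 18.0 -> the 18 GByte AT2 config above
print(gddr7_capacity_gbyte(192, 32))  # 24.0 -> the 24 GByte option with 32 Gbit chips
```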

never say the AI bubble didn't do anything for you

Maybe this ML/AI stuff is even the sole reason for exchanging a GDDR7 memory interface for an LPDDR6 one on the lower-end parts. You can build dGPUs alongside APUs with the same chips and the same humongous amounts of memory for the professional market. Good for ML/AI workloads, and probably also nice for other workstation applications where the FLOPS of a midrange GPU are enough but more memory is welcome (EDA etc.).
 
Last edited:
  • Like
Reactions: Magras00 and marees

marees

Golden Member
Apr 28, 2024
Even if they are using LPDDR6, why add more memory than is useful? These things have to be cheap, and 16 GByte for a mainstream GPU and 12 GByte for the low-end part seem reasonable.
Yes, you can add more memory. But why should AMD do that when it is of no big benefit to the average gamer?
The same logic applies to AT2 with GDDR7. 18 GByte is the most reasonable on a 192bit SI with 24 Gbit chips. And 18 GByte is perfectly suited for a 1440p card, and fine enough for 4K with upscaling. Most gamers won't benefit from 24 GByte but would have to pay more.
Not sure what will happen with AT2.

But for AT3 & AT4, the digital-camera megapixels scenario applies, IMO.
Basically marketing.

Anecdotally, there was a 2GB VRAM Nvidia card much slower than a 1GB VRAM card, but my colleague bought the slower 2GB one & very proudly proclaimed that he bought the 2GB. That is the market I have in mind for AT3 & AT4, definitely not forum users.
 

basix

Member
Oct 4, 2024
I know what you are thinking of. But does that work as well today as it did in the past? Anybody can pull out ChatGPT and ask for the better GPU (and might get the correct answer - or not).
For example, the megapixel race has pretty much ended. Many new phones and cameras get released with fewer pixels than their predecessors. People either have gained more knowledge (more pixels != more quality), simply don't care because it's good enough, or are not interested in technical details.

Higher VRAM amounts on lower-end parts make the more expensive ones less attractive as well.
 

marees

Golden Member
Apr 28, 2024
I know what you are thinking of. But does that work as well today as it did in the past? Anybody can pull out ChatGPT and ask for the better GPU (and might get the correct answer - or not).
For example, the megapixel race has pretty much ended. Many new phones and cameras get released with fewer pixels than their predecessors. People either have gained more knowledge (more pixels != more quality), simply don't care because it's good enough, or are not interested in technical details.

Higher VRAM amounts on lower-end parts make the more expensive ones less attractive as well.
You are being logical.
But IMO AMD needs a marketing trick to beat the 6050 9GB?
This could be it.

But now that Jensen knows of this, he will be scheming up a riposte.

If you had these 3 options (for an entry-level GPU), which one are you buying? 🤔

  • 10050xt 32GB = $350 (~9060xt 16GB)
  • 10050xt 24GB = $300 (~9060)
  • 10040xt 16GB = $250 (~3060 12GB in raster)
 

Tuna-Fish

Golden Member
Mar 4, 2011
Why add so much memory to AT2, AT3 and AT4? I would assume 18GB / 16GB / 12GB for these.

I'd like to get more, but it is unlikely that we are seeing that.

AT3 and AT4 use LPDDR interfaces. AT3 supposedly has 384bit LPDDR6. This puts a lower limit on the amount of RAM, as I doubt there will ever be very small LPDDR6 chips.
 

basix

Member
Oct 4, 2024
Good point. 16...24 Gbit chips are probably the lower boundary. There will for sure be no 8 Gbit chips.

Which then makes 4 modules, and therefore 8...12 GByte, on a "dual-channel" 192bit LPDDR6 interface, and 8 modules, resulting in 16...24 GByte, at quad-channel. Still, everything is possible ;)
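A minimal sketch of that module math, assuming x48 LPDDR6 modules (two 24bit channels each) at 16...24 Gbit densities:

```python
# LPDDR6 capacity from bus width and per-module density (no clamshell).
def lpddr6_capacity_gbyte(bus_bits: int, density_gbit: int) -> float:
    modules = bus_bits // 48          # assuming x48 modules (2x 24bit channels)
    return modules * density_gbit / 8 # Gbit -> GByte

for bus in (192, 384):
    lo = lpddr6_capacity_gbyte(bus, 16)   # 16 Gbit modules
    hi = lpddr6_capacity_gbyte(bus, 24)   # 24 Gbit modules
    print(f"{bus}bit: {lo:.0f}...{hi:.0f} GByte")
# 192bit: 8...12 GByte
# 384bit: 16...24 GByte
```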
 

Magras00

Member
Aug 9, 2025
Cc @dangerman1337

AT3 — 48 CU, 384bit LPDDR6
AT4 — 24 CU, 128bit LPDDR6


MLID said LPDDR5X, not LPDDR6.

I wonder what will be the smallest LPDDR6 chip available? IIRC the smallest LPDDR5 ones are 8Gb, but I dunno if those are even still in production; the most common ones I see around are 12Gb.

32Gb per chip, according to Cadence. Yep, no 16Gb or even 24Gb capacities.

They list 4-64GB device densities, but that's for a 48bit bus width vs LPDDR5X's 32bit. AT3's quad-channel 384bit LPDDR6 has 8 modules, as @basix said, so a minimum capacity of 32GB, capping out at 512GB with no clamshell.
A 256bit LPDDR5X AT3 design using 9600Mbps or faster: the lowest amount is 24GB, and up to 128GB without clamshell.

Yeah, Samsung only lists 12Gb, but that's for slower LPDDR5X, not practical for AT4 unless it's really weak. Samsung lists LPDDR5X 9600Mbps densities from 24-128Gb, so AT4-based designs can be 12-64GB without clamshell.
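Running those quoted density ranges through the same arithmetic (a sketch using the Cadence and Samsung figures as given above, x48 dies for LPDDR6 and x32 for LPDDR5X, no clamshell):

```python
# (min, max) capacity = (bus width / die width) * die density range.
def capacity_range_gbyte(bus_bits, die_bits, min_gbit, max_gbit):
    dies = bus_bits // die_bits
    return dies * min_gbit / 8, dies * max_gbit / 8

print(capacity_range_gbyte(384, 48, 32, 512))  # AT3 LPDDR6:  (32.0, 512.0) GByte
print(capacity_range_gbyte(256, 32, 24, 128))  # AT3 LPDDR5X: (24.0, 128.0) GByte
print(capacity_range_gbyte(128, 32, 24, 128))  # AT4 LPDDR5X: (12.0, 64.0) GByte
```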

AT3 & AT4 are joke guesses, because MLID said LPDDR.


AT2 I have to give a serious rethink.
I am now thinking the xtx will use 4GB GDDR7 chips while the xt & gre will use 3GB GDDR7 chips.

LPDDR is unconventional but not unreasonable. It's about trade-offs. GDDR6 has a much higher GB/s per mm^2 for PHYs, while LPDDR6 lowers board complexity and power draw, and at least halves $/GB while allowing for true LLM slot-in cards. Trade-offs reminiscent of Infinity Cache.

Did some pixel counting to figure out GB/s/mm^2 for LPDDR5X and LPDDR6 (speculative). Skip to the conclusion for comparisons.
  • Strix Halo (N4) 256bit LPDDR5X PHYs (256GB/s @8000Mbps) = ~37mm^2
  • GB203/5080 (4N) 256bit GDDR7 PHYs (960GB/s @30Gbps) = ~51mm^2
  • Navi 48/9070 XT (N4) 256bit GDDR6 PHYs (645GB/s @20.1Gbps) = 41mm^2

The area overhead of GDDR7 is ~25%. No info on LPDDR6 yet, but for a worst case let's use the 25% overhead from GDDR7 and add 50 percent for the 384bit PHY (could be lower if 256bit area matches 384bit). That puts a 384bit LPDDR6 PHY at ~69mm^2, but there's no way the PHYs will be this big.

AT3: 128bit GDDR7 vs 384bit LPDDR6, with higher data rates (assuming no change to area):
69mm^2 384bit @12Gbps LPDDR6 = 576GB/s
51mm^2 256bit @36Gbps GDDR7 = 1152GB/s

128bit GDDR7 is enough for 576GB/s. That's 25.5mm^2 vs 69mm^2, so -43.5mm^2 worst case. With N3P at ~$21K/wafer, that's +$16 in silicon cost.

Conclusion
12Gbps LPDDR6 requires 2.71x (speculative worst case) more area per GB/s than 36Gbps GDDR7.
8000Mbps LPDDR5X requires 2.27x more area per GB/s than 20.1Gbps GDDR6.

GDDR6 and GDDR7 require more interconnects and spacing than LPDDR5X, so this somewhat reduces the gap, but it's still massive. If AMD is using LPDDR for AT3 and AT4, then they're increasing GPU die size for some other benefit. With that said, 2X memory per tier vs NVIDIA without a higher BOM is probably wishful thinking, but deprecating MALL and shrinking the supersized L2 (as MLID implied) will result in significant area savings.
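The conclusion numbers fall out of a few divisions; a small script reproducing them (the PHY areas are the pixel-counted and speculative figures above, so treat everything as rough; the wafer math assumes a 300mm wafer at perfect yield, so the +$16 above presumably includes yield loss):

```python
# Bandwidth = bus bits * Gbps / 8; efficiency = GB/s per mm^2 of PHY.
phys = {                    # name: (bus bits, Gbps, PHY area mm^2)
    "lpddr5x": (256,  8.0, 37.0),   # Strix Halo, pixel-counted
    "gddr6":   (256, 20.1, 41.0),   # Navi 48, pixel-counted
    "gddr7":   (256, 36.0, 51.0),   # GB203 area, at 36 Gbps
    "lpddr6":  (384, 12.0, 69.0),   # speculative worst-case area
}
eff = {n: (b * r / 8) / a for n, (b, r, a) in phys.items()}
for n, e in eff.items():
    print(f"{n}: {e:.1f} GB/s per mm^2")

print(eff["gddr7"] / eff["lpddr6"])     # ~2.71x -> LPDDR6 area penalty
print(eff["gddr6"] / eff["lpddr5x"])    # ~2.27x -> LPDDR5X area penalty

# Extra silicon: 69 mm^2 (384bit LPDDR6) vs 25.5 mm^2 (128bit GDDR7).
wafer_mm2 = 3.14159 * 150**2            # ~70,700 mm^2 on a 300mm wafer
print(43.5 * 21000 / wafer_mm2)         # ~$12.9 at perfect yield
```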
 

basix

Member
Oct 4, 2024
The board-complexity point can be an important one for mobile designs. If you can move the LPDDRx packages closer to the chip (because of the lower data rate compared to GDDR), your design gets more compact. Neat for laptops.

I did look up LPDDR5 packages, and there are 64bit packages available. If that extends to LPDDR6 (96bit per package, then), you could serve dual-channel with just two packages and quad-channel with four packages. That would be very dense regarding PCB space.
But a single die should always be 16/32bit (LPDDR5) or 24/48bit (LPDDR6) in width. So if 32 Gbit is the smallest available LPDDR6 die, AT4 would land at 16 GByte.

As a side note:
There are dual-PHYs available which support both LPDDR5X and LPDDR6, so we could see "anything" regarding SKU definition: some SKUs might use LPDDR6, some might use LPDDR5X. For chips like AT3 and AT4, such a dual-PHY would make sense and allow for a very broad specification range (bandwidth and memory capacity). Ideal for two dies which will allegedly be used for a wide range of applications: both dGPUs (low-end to mainstream, professional, ML/AI) and APUs (premium to high-end, professional, ML/AI), with a wide range of memory capacities and bandwidths (we could think of 8...512 GByte capacity and 100...700 GByte/s bandwidth). LPDDR6/5X is just much better suited for that than GDDR7. So for die re-use between dGPUs and APUs, that choice of memory type (LPDDR instead of GDDR) makes additional sense.
 
Last edited:

Magras00

Member
Aug 9, 2025
The board-complexity point can be an important one for mobile designs. If you can move the LPDDRx packages closer to the chip (because of the lower data rate compared to GDDR), your design gets more compact. Neat for laptops.

I did look up LPDDR5 packages, and there are 64bit packages available. If that extends to LPDDR6 (96bit per package, then), you could serve dual-channel with just two packages and quad-channel with four packages. That would be very dense regarding PCB space.
But a single die should always be 16/32bit (LPDDR5) or 24/48bit (LPDDR6) in width. So if 32 Gbit is the smallest available LPDDR6 die, AT4 would land at 16 GByte.

As a side note:
There are dual-PHYs available which support both LPDDR5X and LPDDR6, so we could see "anything" regarding SKU definition: some SKUs might use LPDDR6, some might use LPDDR5X. For chips like AT3 and AT4, such a dual-PHY would make sense and allow for a very broad specification range (bandwidth and memory capacity).

x96 packages could allow some interesting designs for mobile indeed. 2027 will be an interesting year for PC tech.

Yep, the smallest x64/64bit LPDDR5X package is 32Gb, and that's LPDDR5X 7500-8533; the faster memory (>8533Mbps) begins at 48Gb per package. Anything less than 96bit 64Gb packages won't happen. AT3 LPDDR6 = 32GB, AT3 LPDDR5X = 24GB. AT4 LPDDR6 = 16GB, AT4 LPDDR5X = 12GB.

Could be a weird situation with AT3 vs AT2. AT2 could be restricted to 18/24GB offerings (without clamshell), while AT3 defaults to 24/32GB.

Totally forgot: IIRC, haven't AMD mobile APUs sported dual-PHYs since IDK how long? Zen 2 or even earlier? So one dual-PHY that supports, let's say, LPDDR5X 10677 and LPDDR6 12000, used for AT3 and AT4. Likely a repeat of RDNA 4's Navi 48 and 44: a mirrored die for lower design cost, with the addition of IO, display and media moved to a separate MID.

Edit: Like @basix said, dual-PHYs are available, and I can't see AMD not going for an easy win here. AT4 with 24 RDNA 5 CUs and 192bit LPDDR6 12000: 16GB at 288 GB/s, only a 10% regression vs the 9060 XT, could be very interesting. Add 16-24MB of L2, increase clocks by 10-15% vs the 9060 XT, and add ~20% higher raster IPC (Kepler's guesstimate), and AMD could easily match the 9060 XT 16GB in raster with a significantly lower BOM. $249-279 seems doable. With a 2X RT performance bump (@Kepler_L2's guesstimate) an AT4 dGPU would annihilate the 9060 XT 16GB in path-traced games. A 2X increase in raw ray traversal and intersection throughput should land it around an RX 9070. However, based on MLID's claims, the massive PT perf gap between Blackwell and RDNA 4, and AMD's patent filings, I don't think that 2X figure vs RDNA 4 is high enough.

For the entry market, AMD could use slower LPDDR5X + a cut-down config or clocks for a PCIe-power-only card with 12GB VRAM, around the 4060-level perf in the leak. But it'll probably be 4060-level or higher, not 3060-level.
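For the bandwidth claim in the edit above, the quick math (the 9060 XT's 128bit 20 Gbps GDDR6 figure is its public spec; the AT4 config is speculative):

```python
# GB/s = bus bits * Mbps / 8 / 1000
def bw_gbs(bus_bits: int, mbps: int) -> float:
    return bus_bits * mbps / 8 / 1000

at4  = bw_gbs(192, 12000)   # speculative 192bit LPDDR6-12000 -> 288 GB/s
n44  = bw_gbs(128, 20000)   # 9060 XT: 128bit 20 Gbps GDDR6   -> 320 GB/s
print(at4, n44, f"{1 - at4 / n44:.0%}")   # 288.0 320.0 10% regression
```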
 
Last edited:
  • Like
Reactions: basix and marees

basix

Member
Oct 4, 2024
And another consideration could be Apple. The two Medusa Halo versions (12C / 24C), together with AT3 and AT4, could challenge the M5/M6 Pro & Max.

Medusa Point competes with the base M5/M6 SoC.

And I would like to get a workstation card with AT3 and 256 or even 512 GByte VRAM :)

Regarding ML/AI, AT3 and AT4 with LPDDR6 would operate in a bandwidth range per tensor/matrix TFLOPS similar to an RTX 5090. Crazy, if you think about it. So AT3 and AT4 could be quite decent "accelerator cards" for ML/AI. Together with the huge VRAM capacity, you could even outmatch the RTX Pro 6000 with its 96 GByte, or a GB202 successor with max. 128 GByte (if 32 Gbit GDDR7 modules are available), for some tasks, despite featuring much lower raw TFLOPS numbers. And because the chips are small and use cheap memory: much better bang for the buck. And having the possibility to pair that in APU style with a CPU is just the cherry on top.

I see a strategy there... ;)

Edit:
One funny thought I just had concerns US export restrictions to China. Because AT3 and AT4 are well below the bandwidth and TFLOPS limits of those restrictions, they could potentially sell well in China.
 
Last edited: