Discussion RDNA 5 / UDNA (CDNA Next) speculation

Tachyonism · Monday at 12:43 AM

marees said:
RGT on L1 cache pooling/sharing & reduced L2 cache & lack of infinity cache

https://twitter.com/x/status/2015463229980393612

"Patent leak" lol.

marees · Monday at 9:29 AM

wayner. said:
Yes, that’s correct. LLVM has a roughly 6-month release cycle. If AMD misses contributing or integrating for one release, the next official release opportunity would be about 6 months later. So missing a deadline can introduce a half-year delay for official inclusion.

so the question is what did they hope to achieve in these 6 months of H1-2026

Some testing of hardware ?? why does that need LLVM updates ???

Win2012R2 · Monday at 10:14 AM

marees said:
Some testing of hardware ?? why does that need LLVM updates ???

Perfecting drivers? Though I don't think they've committed anywhere near enough for that. Maybe it's just so somebody can collect a bonus for shipping LLVM enablement on time and on budget.

marees · Monday at 3:32 PM

so the target is September 2026 ???

For now this initial AMDGPU GFX13 target is based on the features of GFX12 and GFX1250. Nothing too exciting but it's a start.

This commit lays out that initial GFX13 target. We'll see how the AMD GFX13 support evolves by the time of the LLVM 23.1 stable release around late August or September.

Initial AMD GFX13 Target Merged To LLVM 23 Git - Presumably RDNA5 - Phoronix

www.phoronix.com

adroc_thurston · Monday at 3:37 PM

marees said:
so the target is September 2026 ???

That's just normal compiler enablement schedules.

MrMPFR · 2026-01-27T17:51:44-0500

Missed the interesting TBIMR discussion in the Zen6 thread, so I'll move some of it here:

Kepler_L2 said:
Well gfx13+ are TBIMR

Yup patents (see next post) confirm.

adroc_thurston said:
As is everything Nvidia since Maxwell.
You still have no real tile storage (or a real tiler) to distribute screenspace across.

Seems like AMD's implementation is more advanced.

reaperrr3 said:
It's entirely possible AMD's tiling/binning still isn't on Nvidia's level and that AMD won't have true TBIMR until gfx13.

IDK if it's true TBIMR but looks improved.

basix said:
If gfx13 makes a good improvement towards a (better) TBIMR, we could expect decent efficiency gains (energy and memory bandwidth).

It does look promising, but I can only repeat what the patents say (see next post), not what it means and what the implications are.

MrMPFR · 2026-01-27T17:54:43-0500

TBIMR patents:
Found these patents a while ago but was not sure what to do with them. Now that Kepler has confirmed, I'll drop them here. Seems like one of Chris Brennan's last RDNA5 efforts before he left AMD and joined Meta:

#1 TBIMR:
- PPC buffers, per-tile queues managed by FF HW, reduced resource use and processing time
https://patentscope.wipo.int/search/en/detail.jsf?docId=US464435295

#2 TBIMR + pixel circuitry balancing:
- Improved load balancing
https://patentscope.wipo.int/search/en/detail.jsf?docId=US464435313

#3 TBIMR + per-tile depth pre-passes
- No need to repeat assembly and shading of primitives
https://patentscope.wipo.int/search/en/detail.jsf?docId=US464435298

#4 TBIMR + SE localized geometry + deferred attribute shading
- "reducing unnecessary computations and memory bandwidth usage"
https://patentscope.wipo.int/search/en/detail.jsf?docId=US464101837
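
For anyone unfamiliar with what "per-tile queues" means in general, here is a toy sketch of screen-space binning. This is NOT taken from the patents: the 32-pixel tile size, the bounding-box overlap test, and the `bin_primitives` name are all my own assumptions for illustration, and the patents describe fixed-function hardware rather than software like this.

```python
from collections import defaultdict

TILE = 32  # assumed tile edge in pixels; real hardware tile sizes differ


def bin_primitives(triangles, width, height, tile=TILE):
    """Return {(tx, ty): [triangle indices]} using a bounding-box overlap test.

    Each triangle is a list of three (x, y) screen-space vertices.
    A bounding box is conservative: a tile may receive a triangle that
    does not actually cover any of its pixels.
    """
    queues = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for idx, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles that lie entirely outside the screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the valid tile range.
        tx0 = max(0, int(min(xs)) // tile)
        tx1 = min(max_tx, int(max(xs)) // tile)
        ty0 = max(0, int(min(ys)) // tile)
        ty1 = min(max_ty, int(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                queues[(tx, ty)].append(idx)  # submission order is preserved
    return queues


if __name__ == "__main__":
    tris = [
        [(0, 0), (10, 0), (0, 10)],
        [(20, 20), (70, 20), (20, 40)],
        [(100, 100), (120, 100), (100, 120)],
    ]
    for tile_xy, prims in sorted(bin_primitives(tris, 128, 128).items()):
        print(tile_xy, prims)
```

Once primitives are sorted into queues like this, each tile can be processed with its working set kept on chip, which is where the bandwidth savings of tile-based designs generally come from.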

TLPBB a thing?
There's also another rendering pipeline efficiency effort called TLPBB (Two-Level Primitive Batch Binning) that has five public patents, but I won't list them here unless Kepler confirms.

basix · 2026-01-27T19:04:26-0500

MrMPFR said:
TLPBB a thing?
There's also another rendering pipeline efficiency effort called TLPBB (Two-Level Primitive Batch Binning) that has five public patents, but I won't list them here unless Kepler confirms.

Isn't that the old news regarding chiplet based GPUs?

https://www.computerbase.de/artikel/grafikkarten/amd-patentantrag-gpu-chiplet-auslastung.81049/

As far as I understand TLPBB and TBIMR could be combined.

MrMPFR · 2026-01-27T19:33:18-0500

basix said:
Isn't that the old news regarding chiplet based GPUs?

https://www.computerbase.de/artikel/grafikkarten/amd-patentantrag-gpu-chiplet-auslastung.81049/

As far as I understand TLPBB and TBIMR could be combined.

Seems like the new TLPBB patents expand on the underlying idea significantly and have been modified to enhance the TBIMR pipeline. But as a standalone thing it seems to achieve the same general goal: reducing unnecessary computations and memory bandwidth usage.
Here's one of them: https://patentscope.wipo.int/search/en/detail.jsf?docId=US425302144

In the meantime, here's an interesting post from Imagination comparing their TBDR vs old school IMR (pre-Maxwell):

A look at the PowerVR Graphics Architecture: Tile-Based Deferred Rendering - Imagination

Explore the intricacies of PowerVR's Tile-Based Deferred Rendering architecture and its efficiency advantages over Immediate Mode Renderers in modern embedded systems.

blog.imaginationtech.com

It seems like the proposed design from the TBIMR patents is a lot closer to TBDR, albeit still without the strict requirements and characteristics, and does look quite different from the simple TBIMR designs we've seen so far.

That is my understanding as well. As I said, it'll enhance the TBIMR pipeline by feeding it better inputs while also making the TLPBB pipeline compatible with deferred attribute shading and chiplet friendly designs (local SE geo + pixel pipelines). The TBIMR pipeline already looks like a significant step up in efficiency, but, as I said, the five TLPBB patents further enhance this.
It will be interesting to see just how large the perf/watt and perf/BW impact of these changes is, but if it's anywhere close to TBDR characteristics then that's a massive win.
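
A toy illustration of why resolving visibility before shading saves work. It only covers the general IMR vs deferred idea from the Imagination post, not the patented design. Assumptions of mine: opaque geometry only, one fragment per (pixel, depth) pair, smaller depth means closer, and the function names are made up.

```python
def shaded_immediate(fragments):
    """Count shader invocations with an early depth test in submission order.

    A fragment is shaded if it is closer than everything seen so far at
    that pixel, even if a later fragment ends up hiding it.
    """
    depth = {}
    shaded = 0
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
            shaded += 1
    return shaded


def shaded_deferred(fragments):
    """Count shader invocations when visibility is resolved before shading.

    Only the nearest fragment per pixel is shaded, so the count equals the
    number of covered pixels regardless of submission order.
    """
    nearest = {}
    for pixel, z in fragments:
        if z < nearest.get(pixel, float("inf")):
            nearest[pixel] = z
    return len(nearest)


if __name__ == "__main__":
    # Pixel (0, 0) is drawn back to front (worst case for early-Z);
    # pixel (1, 0) is drawn front to back (best case).
    frags = [((0, 0), 0.9), ((0, 0), 0.5), ((0, 0), 0.1),
             ((1, 0), 0.2), ((1, 0), 0.8)]
    print("immediate:", shaded_immediate(frags))
    print("deferred: ", shaded_deferred(frags))
```

With this input the immediate path shades four fragments and the deferred path shades two. The gap depends entirely on draw order and overdraw, which is why a per-tile depth pre-pass or deferred attribute shading can matter so much.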

Just to be certain, I'll reiterate that TLPBB hasn't been confirmed, unlike TBIMR. There are also many more related patents that could improve further upon the design, but I won't flood the thread with them.

Have to note that this is based on my limited, surface-level understanding. Maybe @basix or someone else can do a better job at explaining what those patents achieve?

basix · 2026-01-28T04:00:09-0500

All market players agree that memory bandwidth and data movement (energy efficiency) are the major concern when it comes to making chips faster (besides the slowdown of Moore's Law). So developing technologies, or at least enhancements of existing stuff, with such a focus makes a lot of sense (TBIMR, TBDR, TLPBB, shared L0/L1 caches, universal compression, ...).

MrMPFR · 2026-01-28T05:08:22-0500

basix said:
All market players agree that memory bandwidth and data movement (energy efficiency) are the major concern when it comes to making chips faster (besides the slowdown of Moore's Law). So developing technologies, or at least enhancements of existing stuff, with such a focus makes a lot of sense (TBIMR, TBDR, TLPBB, shared L0/L1 caches, universal compression, ...).

More like the death of Moore's Law for caches and memory. PHY scaling is done and cache scaling is moving at a snail's pace.

I've rewritten #2,210 after taking a closer look at the TBIMR + TLPBB patent-derived implementation. In terms of efficiency it looks closer to a TBDR implementation than to partial TBIMR (Maxwell and later). This is very impressive and not what I expected at all.
In some of the related patents there's also a common theme of dedicated logic to get rid of unused data, deallocations without writing discarded data back to memory, and a method for overwriting the same physical page multiple times to save on cache and bandwidth use. The last one seems like a great fit for a tile based renderer. The same physical page can be reused many times and overwriting it instead of doing expensive cache line invalidations is just smarter.
Hope that the design implements these changes as well, especially the last one, but we'll see.
