Discussion RDNA 5 / UDNA (CDNA Next) speculation


marees

Platinum Member
Apr 28, 2024
2,217
2,862
96
Yes, that’s correct. LLVM has a roughly 6-month release cycle. If AMD misses contributing or integrating for one release, the next official release opportunity would be about 6 months later. So missing a deadline can introduce a half-year delay for official inclusion.
So the question is: what did they hope to achieve in these six months of H1 2026?

Some hardware testing? Why would that need LLVM updates?
 

Win2012R2

Golden Member
Dec 5, 2024
1,322
1,361
96
Some hardware testing? Why would that need LLVM updates?
Perfecting drivers? Though I don't think they've committed anywhere near enough for that. Maybe it's just somebody collecting a bonus for shipping LLVM enablement on time and on budget.
 

MrMPFR

Member
Aug 9, 2025
198
397
96
Missed the interesting TBIMR discussion in the Zen 6 thread, so I'll move some of it here:

Well gfx13+ are TBIMR
Yup patents (see next post) confirm.

As is everything Nvidia since Maxwell.
You still have no real tile storage (or a real tiler) to distribute screen space across.
Seems like AMD's implementation is more advanced.

It's entirely possible AMD's tiling/binning still isn't on Nvidia's level and that AMD won't have true TBIMR until gfx13.
IDK if it's true TBIMR, but it looks improved.

If gfx13 makes a good improvement towards a (better) TBIMR, we could expect decent efficiency gains (energy and memory bandwidth).
It does look promising, but I can only repeat what the patents say (see next post), not what it means or what the implications are.
 

MrMPFR

Member
Aug 9, 2025
198
397
96
TBIMR patents:
Found these patents a while ago but wasn't sure what to do with them. Now that Kepler has confirmed TBIMR, I'll drop them here. Seems like one of Chris Brennan's last RDNA5 efforts before he left AMD and joined Meta:

#1 TBIMR:
- PPC buffers, per-tile queues managed by FF HW, reduced resource use and processing time
https://patentscope.wipo.int/search/en/detail.jsf?docId=US464435295

#2 TBIMR + pixel circuitry balancing:
- Improved load balancing
https://patentscope.wipo.int/search/en/detail.jsf?docId=US464435313

#3 TBIMR + per-tile depth pre-passes
- No need to repeat assembly and shading of primitives
https://patentscope.wipo.int/search/en/detail.jsf?docId=US464435298

#4 TBIMR + SE localized geometry + deferred attribute shading
- "reducing unnecessary computations and memory bandwidth usage"
https://patentscope.wipo.int/search/en/detail.jsf?docId=US464101837
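
For anyone unfamiliar with the binning idea behind the per-tile queues mentioned above, here is a minimal Python sketch. It is a generic software illustration, not what the patents describe: the function name `bin_primitives`, the 32-pixel tile size, and the bounding-box overlap test are all assumptions made for the example.

```python
from collections import defaultdict

def bin_primitives(triangles, width, height, tile=32):
    """Return {(tile_x, tile_y): [triangle indices]} using bounding boxes.

    `tile` (pixels per tile side) is an assumed value for illustration.
    """
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for idx, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Clamp the bounding box to the screen and convert to tile coordinates.
        tx0 = max(0, int(min(xs)) // tile)
        tx1 = min(max_tx, int(max(xs)) // tile)
        ty0 = max(0, int(min(ys)) // tile)
        ty1 = min(max_ty, int(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                # Conservative: uses the bounding box, not exact coverage.
                bins[(tx, ty)].append(idx)
    return bins

# Hypothetical input: two small triangles on a 128x64 screen.
triangles = [
    ((0, 0), (10, 0), (0, 10)),
    ((40, 40), (70, 40), (40, 70)),
]
bins = bin_primitives(triangles, width=128, height=64)
for tile_xy in sorted(bins):
    print(tile_xy, bins[tile_xy])
```

Each tile ends up with its own list of primitives, so a tile can be processed while its working set stays on chip. Real hardware would do this with fixed-function logic and bounded queues rather than Python dictionaries.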

TLPBB a thing?
There's also another rendering pipeline efficiency effort called TLPBB (Two-Level Primitive Batch Binning) that has five public patents, but I won't list them here unless Kepler confirms.
 

basix

Senior member
Oct 4, 2024
308
605
96

MrMPFR

Member
Aug 9, 2025
198
397
96
Isn't that the old news regarding chiplet based GPUs?

As far as I understand TLPBB and TBIMR could be combined.
Seems like the new TLPBB patents expand on the underlying idea significantly and have been modified to enhance the TBIMR pipeline. But as a standalone thing, it seems to achieve the same goal: reduce unnecessary computations and memory bandwidth usage.
Here's one of them: https://patentscope.wipo.int/search/en/detail.jsf?docId=US425302144

In the meantime, here's an interesting post from Imagination comparing their TBDR vs old school IMR (pre-Maxwell):
It seems like the proposed design from the TBIMR patents is a lot closer to TBDR, albeit still without the strict requirements and characteristics, and it does look quite different from the simple TBIMR designs we've seen so far.
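
To make the IMR vs TBDR difference concrete, here is a toy Python model that counts pixel shader invocations for opaque geometry. It is a simplified sketch under stated assumptions (opaque fragments only, smaller depth means closer, an early depth test in the IMR case); the function names are made up for the example and it does not model any specific GPU.

```python
def shaded_fragments_imr(fragments_per_pixel):
    """Shade in submission order with an early depth test (smaller = closer)."""
    shaded = 0
    for depths in fragments_per_pixel:
        nearest = float("inf")
        for d in depths:
            if d < nearest:  # passes the depth test, so it gets shaded
                shaded += 1
                nearest = d
    return shaded

def shaded_fragments_deferred(fragments_per_pixel):
    """Resolve visibility first, then shade one surviving fragment per pixel."""
    return sum(1 for depths in fragments_per_pixel if depths)

# Hypothetical per-pixel depth lists, in submission order.
pixels = [
    [0.9, 0.5, 0.1],  # back-to-front: every fragment passes the depth test
    [0.1, 0.5, 0.9],  # front-to-back: only the first fragment is shaded
    [],               # nothing covers this pixel
]
print(shaded_fragments_imr(pixels), shaded_fragments_deferred(pixels))
```

The gap between the two counts is the overdraw that a deferred design avoids, and it depends heavily on submission order. A per-tile depth pre-pass, as in patent #3, is one way an immediate mode pipeline can close part of that gap.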

That is my understanding as well. As I said, it'll enhance the TBIMR pipeline by feeding it better inputs while also making the TLPBB pipeline compatible with deferred attribute shading and chiplet friendly designs (local SE geo + pixel pipelines). The TBIMR pipeline already looks like a significant step up in efficiency, but as I said, the five TLPBB patents enhance this further.
It will be interesting to see just how large the perf/watt and perf/BW impact of these changes is, but if it's anywhere close to TBDR characteristics then that's a massive win.

Just to be clear, I'll reiterate that TLPBB hasn't been confirmed, unlike TBIMR. There are also many more related patents that could improve further upon the design, but I won't flood the thread with them.

Have to note that this is based on my limited, surface-level understanding. Maybe @basix or someone else can do a better job at explaining what those patents achieve?
 

basix

Senior member
Oct 4, 2024
308
605
96
All market players agree that memory bandwidth and data movement (energy efficiency) are the major concern when it comes to making chips faster (besides the slowdown of Moore's Law). So developing technologies, or at least enhancing existing ones, with that focus makes a lot of sense (TBIMR, TBDR, TLPBB, shared L0/L1 caches, universal compression, ...).
 

MrMPFR

Member
Aug 9, 2025
198
397
96
All market players agree that memory bandwidth and data movement (energy efficiency) are the major concern when it comes to making chips faster (besides the slowdown of Moore's Law). So developing technologies, or at least enhancing existing ones, with that focus makes a lot of sense (TBIMR, TBDR, TLPBB, shared L0/L1 caches, universal compression, ...).
More like the death of Moore's Law for caches and memory. PHY scaling is done and cache scaling is moving at a snail's pace.

I've rewritten #2,210 after taking a closer look at the TBIMR + TLPBB patent-derived implementation. In terms of efficiency it looks closer to a TBDR implementation than to partial TBIMR (Maxwell and later). This is very impressive and not what I expected at all.
In some of the related patents there's also a common theme of dedicated logic to get rid of unused data, deallocations without writing discarded data back to memory, and a method for overwriting the same physical page multiple times to save on cache and bandwidth use. The last one seems like a great fit for a tile based renderer. The same physical page can be reused many times and overwriting it instead of doing expensive cache line invalidations is just smarter.
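
As a rough illustration of why dropping discarded data matters for a tile based renderer, here is a toy Python model that counts bytes written back to memory when one scratch page is reused for every tile. Everything in it is an assumption for the example (the `TileScratch` class, the page size, the single dirty flag); it is not taken from the patents.

```python
class TileScratch:
    """One reusable scratch page with a dirty flag and a write-back counter."""

    def __init__(self, page_bytes):
        self.page_bytes = page_bytes
        self.dirty = False
        self.bytes_written_back = 0

    def write(self):
        self.dirty = True

    def release(self, discard):
        # discard=True drops dirty contents; otherwise they are flushed to
        # memory before the page is handed to the next tile.
        if self.dirty and not discard:
            self.bytes_written_back += self.page_bytes
        self.dirty = False

def render_tiles(num_tiles, page_bytes, discard):
    scratch = TileScratch(page_bytes)
    for _ in range(num_tiles):
        scratch.write()            # tile-local data, e.g. intermediate depth
        scratch.release(discard)   # same page is reused for the next tile
    return scratch.bytes_written_back

print(render_tiles(100, 4096, discard=False))  # every tile flushes its page
print(render_tiles(100, 4096, discard=True))   # nothing is written back
```

In this model the write-back traffic scales with the number of tiles unless the data can be marked as discardable, which is the kind of saving the post is describing.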
Hope that the design implements these changes as well, especially the last one, but we'll see.
 