ToTTenTranz · Senior member
This is honestly the best outcome, as long as they don’t take away the render backend units...you don't need to improve your NPU anymore if your GPU is very good for AI
AMD blog on DGF (from Feb) — NANITE accelerator for RT use cases

DGF + prefiltering related patents (a rough sketch of the low-precision test idea follows the list):
- PRE-FILTERING NODES FOR BOUNDING VOLUME HIERARCHY
- Intersection Testing on Dense Geometry Data using Triangle Prefiltering
- Dense Geometry Format
- SIMPLIFIED LOW-PRECISION RAY INTERSECTION THROUGH ACCELERATED HIERARCHY STRUCTURE PRECOMPUTATION
- System and Method for Low-precision Ray Tests
- DISCRETE ROTATIONS FOR ORIENTED BOUNDING BOXES BASED ON PLATONIC SOLIDS
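Roughly the idea behind the prefiltering and low-precision ray-test patents, as a toy C++ sketch (the 8-bit grid, names, and encoding are my guesses, not the actual patented scheme): child boxes are quantized to a few bits relative to the parent's bounds, and the ray test runs conservatively at that reduced precision, so a cheap unit can cull most nodes and only false positives ever reach the full-precision intersector.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical 8-bit child box, quantized on a 256^3 grid spanning the
// parent's AABB (an assumption, not the patented encoding).
struct QuantizedBox {
    uint8_t lo[3];
    uint8_t hi[3];
};

struct Ray {
    float origin[3];
    float invDir[3];  // precomputed 1/direction (assumes non-zero components)
    float tMin, tMax;
};

// Conservative slab test at reduced precision: the box is dilated by one
// grid cell per axis, so quantization can cause false hits but never a
// false miss. "Maybe hit" nodes go on to the full-precision intersector.
bool lowPrecisionHit(const Ray& r, const QuantizedBox& q,
                     const float parentLo[3], const float parentExtent[3])
{
    float t0 = r.tMin, t1 = r.tMax;
    for (int a = 0; a < 3; ++a) {
        const float cell = parentExtent[a] / 256.0f;
        const float lo = parentLo[a] + std::max(0,   int(q.lo[a]) - 1) * cell;
        const float hi = parentLo[a] + std::min(256, int(q.hi[a]) + 1) * cell;
        float tNear = (lo - r.origin[a]) * r.invDir[a];
        float tFar  = (hi - r.origin[a]) * r.invDir[a];
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);
        t1 = std::min(t1, tFar);
        if (t0 > t1) return false;  // definite miss, node culled
    }
    return true;  // potential hit
}
```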
> Thought RX and MI wouldn't become more µarch aligned given Kepler's earlier statements on UDNA.

Nvm, I didn't recall things correctly. Here's the explanation from Kepler:
> People misunderstood what "UDNA" meant. It's not a single architecture across gaming and datacenter, but a unification of the development pipeline.
> CDNA1/2/3/4 have many architecture advancements that are not in RDNA2/3/4 because they are in a completely different architecture branch.
> With the "UDNA" strategy, development follows a gaming->datacenter->gaming pattern, where advancements from one type of architecture can be re-integrated into the next if they make sense, but the architectures are still different as they don't need to have the same features (i.e. gaming doesn't need strong FP64 or extremely large matrix cores; datacenter doesn't need RT/Texture/Geometry/Raster features).
> What's UDNA?

I wonder how long you will try to act dumb and keep asking this question over and over again
> I wonder how long you will try to act dumb and keep asking this question over and over again

nothing called 'UDNA' exists.
AMD was the first one to talk about it. No one else made it up.
AMD announces unified UDNA GPU architecture — bringing RDNA and CDNA together to take on Nvidia's CUDA ecosystem (www.tomshardware.com)
> nothing called 'UDNA' exists.

He. Said. It.
> AMD blog on DGF (from Feb) — NANITE accelerator for RT use cases

DGF isn't Nanite; it's really just AMD's take on NVIDIA's now-deprecated Displaced Micro-Meshes introduced with Ada Lovelace, albeit with fewer drawbacks and some extra functionality, for example an OMM header (see patent). I don't recall DMM supporting animation either.
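For reference, DGF packs geometry into fixed-size blocks that the RT hardware can consume directly. A loose mock-up of what such a block carries (field names and widths are my assumptions chosen for readability; the real format is a dense 128-byte bit layout, see AMD's DGF spec), including the OMM header bit mentioned above:

```cpp
#include <cmath>
#include <cstdint>

// Loose mock-up of a DGF-style block, not the actual bit layout.
struct DgfBlockSketch {
    int32_t anchor[3];      // quantized-space anchor position
    int8_t  exponent;       // power-of-two scale back to float space
    uint8_t vertexCount;    // up to 64 vertices per block
    uint8_t triangleCount;  // up to 64 triangles per block
    uint8_t ommDescriptors; // OMM palette header: per-triangle opacity
                            // micromap state, the extra bit vs. DMM

    // Variable-width in the real format; fixed sizes here for clarity.
    uint16_t offsets[64][3]; // anchor-relative quantized vertex positions
    uint8_t  indices[64][3]; // strip/reuse-compressed in the real format
};

// Decoding a coordinate: (anchor + offset) * 2^exponent.
inline float decodeCoord(const DgfBlockSketch& b, int vtx, int axis) {
    return std::ldexp(float(b.anchor[axis] + b.offsets[vtx][axis]), b.exponent);
}
```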
He. Said. It.
The guy responsible for GPUs at AMD.
Fine. Make him unsay it then.
Tomorrow. On news sites. He apologizes and says that the word UDNA was a slip of the tongue.
Interesting: co-compute units linked to L3 to avoid cache-thrashing of the L1 (for memory-intensive loads such as RT)
This seems like it would need a lot more die area 🤔
Or is this just a renaming of the RT core to a more generic co-compute core 🤔 🤔
> So it seems, but patent specifies it can be any non-local cache, so they could be coupled to Shader Engine private cache, L2, or MALL.

Is this CCU restricted only to RT cores or can it do the matrix*vector multiplication of the tensor cores too??
If the reduced L2 (AT2 = 24MB L2 vs Navi 48 = 64MB MALL) is accurate and the CCU is leveraged for RDNA 5, then coupling them to L2 would shrink the L2 available to everything else, since they require sizeable dedicated capacity. Wonder how AMD engineers would tackle this. An SE cache implementation could happen as well; it would require a much bigger SE cache, but with some other benefits like superior cache latency and closer integration with the CUs (routing and latency), etc. The latter seems more likely given the whole supposed (not confirmed, IIRC) overhaul with autonomous SE scheduling and dispatch (WGS and ADC). In that case, CCUs outside the SEs would complicate things a lot.
There's also a virtual CCU implementation. The patent doesn't clearly specify the difference, but it sounds like pooled CCU resources managed by a central scheduler instead of one private CCU per CU. That directly conflicts with the WGP local launch patent, a major HW optimization for work graphs, so I still lean towards a non-virtual CU-CCU direct link.
Remember the CCU offloads work from the CU, so the overhead might be lower than it seems. No need to duplicate instructions, but yeah, there's still some overhead; how much, though?
Highly doubt that. But some BW-heavy RT instructions could be offloaded to CCUs.
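A software analogy of the offload idea, to make the overhead question concrete (the interface here is entirely invented, not the patent's): the CU ships a small query packet, the CCU walks the pointer-chasing data next to whichever shared cache it's attached to, and only the result travels back, so BVH nodes never stream through the CU's L1.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>

// Invented interface, software analogy only. The CCU sits next to a
// shared cache level (L2 / SE cache / MALL in the patent's options).
struct RayQueryPacket {
    float    origin[3];
    float    direction[3];
    uint32_t bvhRoot;                        // root node index (invented)
    std::function<void(uint32_t)> onResult;  // callback with hit primitive id
};

class CoComputeUnit {
public:
    // Called by the CU: hand off the query, keep executing other work.
    void submit(RayQueryPacket p) { pending_.push(std::move(p)); }

    // One scheduler step: all memory traffic for traversal stays here.
    void step() {
        if (pending_.empty()) return;
        RayQueryPacket p = std::move(pending_.front());
        pending_.pop();
        p.onResult(traverse(p));  // only the small result crosses back
    }

private:
    // Stub standing in for BVH traversal against the attached cache.
    uint32_t traverse(const RayQueryPacket&) { return 0; }

    std::queue<RayQueryPacket> pending_;
};
```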
> He. Said. It.
> The guy responsible for GPUs at AMD.
> Fine. Make him unsay it then.
> Tomorrow. On news sites. He apologizes and says that the word UDNA was a slip of the tongue.

You see that in the same interview he calls UDNA a strategy when asked about the merging of CDNA and RDNA.
> Is this CCU restricted only to RT cores or can it do the matrix*vector multiplication of the tensor cores too??

The patent is vague, so it could probably apply to whatever instructions AMD thinks they need to offload: ML, RT, and other cache-greedy instructions.
> Biggest implications might not be for RDNA 5 but actually MI500. Maybe offload GMV to AID and take advantage of that massive MALL slab? Or perhaps a clever cache hierarchy bypass mode (not in patent), allowing cores to bypass local registers and XCD cache and store everything in the AID MALL slab? But all this could be less important if HBM4 with PIM gets rolled out in 2027.

There is no MALL there.
For actual chiplet stuff (SED+AID+MID) I think so, but for the quasi-monolithic GPUs like ATx with just GMD+MID it's gone.
> So MI400 and MI500 also deprecate MALL?

Yeah, why not.
> Kepler doesn't seem certain here, but perhaps that only applies to true chiplet consumer GPUs, not DC.

MALL's dead because SRAM is kinda expensive.
> Yeah, why not.
> MALL's dead because SRAM is kinda expensive.

SRAM should start getting less expensive because N2 and lower finally show a substantial increase in transistor density for memory cells.
> SRAM should start getting less expensive because N2 and lower finally show a substantial increase in transistor density for memory cells.

It's only roughly matching N2's pathetic ~20% logic scaling, after being completely stagnant for one gen (N5->N3) and underwhelming for another (N7->N5). Things don't look better for the Angstrom-era nodes. Complete joke.
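Back-of-envelope math on that claim, using approximate public bitcell figures (assumed numbers, not measurements):

```cpp
#include <cstdio>

// Rough sanity check. Assumed bitcell sizes: N5 ~0.021 um^2/bit,
// N3E ~0.021 um^2/bit (stagnant), N2 ~0.0175 um^2/bit.
int main() {
    const double bits = 64.0 * 1024.0 * 1024.0 * 8.0;  // Navi 48's 64MB MALL

    const double cellN5 = 0.021;    // um^2 per bit (assumed)
    const double cellN2 = 0.0175;   // um^2 per bit (assumed)

    // Raw bitcell area only; real SRAM macros add ~1.5-2x for periphery.
    const double areaN5 = bits * cellN5 / 1.0e6;  // -> ~11.3 mm^2
    const double areaN2 = bits * cellN2 / 1.0e6;  // -> ~9.4 mm^2

    std::printf("64MB of bitcells: %.1f mm^2 on N5 vs %.1f mm^2 on N2 (%.0f%% smaller)\n",
                areaN5, areaN2, 100.0 * (1.0 - cellN2 / cellN5));

    // If an N2 wafer costs ~1.3x an N5 wafer (assumed), cost per bit is
    // (0.0175 / 0.021) * 1.3 ~= 1.08x: the density gain is fully offset.
    return 0;
}
```

Even that ~17% bitcell shrink gets eaten the moment wafers cost ~30% more (an assumed figure), which is the wafer-cost point raised below.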
> Maybe the code name for the next version of CDNA is UDNA just to confuse the ever-loving hell out of everyone. I proposed Project Opposite Day once, just so that when it was terminated the following week no one was quite sure of the actual status of the project. It didn't sow quite as much confusion as Project Withajee, but for some odd reason I've been banned from coming up with project names.

Oh lol, you delicious troll 🤣😂
> You see that in the same interview he calls UDNA a strategy when asked about the merging of CDNA and RDNA.

And I think that interview is the only place where AMD has uttered the word 'UDNA'. It wasn't mentioned at a later Financial Analyst Day... they just talked about CDNA-Next.
So make him unsay what, exactly?
> SRAM should start getting less expensive because N2 and lower finally show a substantial increase in transistor density for memory cells.

Offset by higher wafer costs.
> Are they still doing the stacking? MALL must have been a small percentage of the die...

Their cache hierarchy is just different now.
> Offset by higher wafer costs.

Not if you go with Samsung's foundries.
> Not if you go with Samsung's foundries.

Yes, I too love sub-3% yield on a >300mm^2 die.