Commentary on #1,967. Sorry in advance for the wall of text.
#1. Lower BVH maintenance cost because animating/moving geometry no longer needs rebuilds. Much higher flexibility for RT due to the reduced BVH overhead, which makes new RT use cases practical in real time. Does all this by extending DGF functionality. SW patent, but perhaps hardcoded into RDNA5's DGF decompressor.
Related to DGF/DMM: at 20:25, one of AMD's RT Fellow architects is asked whether DGF and DMM can be combined. He basically says yes, but it will require architecture work to do so, and the goal is to have a DGF base layer with DMM encoded on top of it.
#2. DGF with subdivisions (implicit and explicit, essentially DMM-like), as @vinifera spotted^. The authors at HPG match 1:1, and that was one month after the patent filing. Noticed the patent claims extremely efficient geometry compression; such strong wording is very rare in patents.
We can derive the impact on BVH build times (~8-20X vs. RTX MG) and storage overhead (>30X easily). Also, this is effectively RTX MG's CBLAS (DGF = CBLAS as per the patents) with a DMM multiplier on top.
#3. Additional improvement to standard DGF compression for implied mesh topologies: it permits omitting the index buffer entirely and storing just the vertices, from which the topology can be inferred.
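To make "implied topology" concrete, here is a minimal Python sketch. It assumes the simplest case I can think of, a regular grid of vertices stored in row-major order; that is my assumption and not necessarily the scheme in the patent. The helper name `grid_triangles` is hypothetical.

```python
def grid_triangles(cols, rows):
    """Infer triangle indices for a cols x rows vertex grid stored row-major.

    Hypothetical helper: no index buffer is stored; the connectivity
    follows entirely from the order of the vertices.
    """
    tris = []
    for y in range(rows - 1):
        for x in range(cols - 1):
            v0 = y * cols + x          # top-left corner of the quad
            v1 = v0 + 1                # top-right
            v2 = v0 + cols             # bottom-left
            v3 = v2 + 1                # bottom-right
            tris.append((v0, v2, v1))  # first triangle of the quad
            tris.append((v1, v2, v3))  # second triangle of the quad
    return tris


# A 3x3 vertex patch: 9 stored vertices, 8 inferred triangles, 0 stored indices.
patch = grid_triangles(3, 3)
print(len(patch), patch[0], patch[-1])
```

The saving is that only the grid dimensions need to be known; every triangle index is recomputed on demand instead of being read from memory.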
Wrap up #1-3: It seems like AMD will finally address the BVH issue entirely, allowing ray tracing against the full-detail geometry without any compromises (doesn’t apply to shading though). DGF rn isn't the full picture, and I expect AMD to update it in the future alongside a BVH SDK superior to RTX MG.
#4. This is a very interesting idea and it could extend to many other things besides collisions and ray tracing; really everything that requires spatial awareness can benefit from it. No more dumb AI vision and hearing etc... Potential for a larger impact on gameplay realism and immersion than PT, and it should be doable on PS6 and RDNA5 given the large expected ray traversal gains.
Hopefully AMD, MS and Sony can agree on some new universal DXR-like standard that doesn't care what the input and output is (assume it's a ray). I know it's not the same framework across all of them, but assuming basically identical frameworks, similar to how it is now, this standardization would help with game adoption. It would also make things a lot easier for devs: no need for many different systems for different things. Just trace everything through a universal shared BVH = massive engine simplification.
This one seems kind of silly. You generally don't want the same collision geometry and render geometry; render geometry is too complex for efficient physics calculation. Hunting down overly complex physics meshes is a pretty common optimization step when you're trying to speed up CPU performance.
This is only tracing “rays” for collision detection and other stuff; it doesn’t offload the actual physics calculations or change them. Doom TDA already uses this. Not sure if it’s the same geometry, but I would suspect so, as they said 1 pixel accuracy.
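As a rough illustration of what "tracing a ray for collision detection" means, here is a small Python sketch. It treats one frame of an object's movement as a ray segment and tests it against a single triangle with the standard Möller-Trumbore intersection test. The function name and the projectile setup are my own illustration, not anything taken from Doom TDA or the patent.

```python
def ray_hits_triangle(origin, direction, v0, v1, v2, t_max, eps=1e-9):
    """Moller-Trumbore ray/triangle test. Returns the hit parameter t or None."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                 # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv             # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv     # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if 0.0 <= t <= t_max else None


# Hypothetical projectile: this frame it moves by `step`; t is in units of step,
# so t_max=1.0 limits the query to the distance actually travelled.
floor = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
start, step = (0.25, 0.25, 1.0), (0.0, 0.0, -2.0)
print(ray_hits_triangle(start, step, *floor, t_max=1.0))
```

The query only answers "did the movement cross geometry, and where"; whatever the physics system does with that answer stays unchanged.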
#5. Precomputed (at BVH build) and dynamic (along the ray path) tagging of nodes (discard values) in a BVH, which removes nodes from consideration during traversal and intersection. This leads to far fewer ray-box intersection tests, stack pushes, and subtree traversals. It seems like the benefits scale proportionately with BVH width, potentially allowing for some absurdly wide and shallow BVH trees.
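Here is a toy Python sketch of why skipping tagged nodes saves work. How the patent actually encodes and compares discard values is not something I know, so this models a discard value as a per-node bitmask that is compared against a per-ray mask (both assumptions), and it only covers the precomputed case, not the dynamic one. `Node`, `ray_box` and `traverse` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                      # AABB minimum corner
    hi: tuple                      # AABB maximum corner
    discard: int = 0               # hypothetical tag bits set at BVH build
    children: list = field(default_factory=list)


def ray_box(origin, direction, lo, hi):
    """Slab test: True if the ray (t >= 0) overlaps the box."""
    t0, t1 = 0.0, float("inf")
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < l or o > h:     # parallel to this slab and outside it
                return False
            continue
        ta, tb = (l - o) / d, (h - o) / d
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
    return t0 <= t1


def traverse(root, origin, direction, ray_mask):
    """Return (leaves_reached, box_tests). A child whose discard bits overlap
    ray_mask is dropped before any box test, stack push or subtree visit."""
    leaves, tests, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        if not node.children:
            leaves.append(node)
            continue
        for child in node.children:
            if child.discard & ray_mask:
                continue           # pruned by its tag
            tests += 1
            if ray_box(origin, direction, child.lo, child.hi):
                stack.append(child)
    return leaves, tests


# One wide node with 8 leaf children in a row; every odd child is tagged with bit 0.
kids = [Node((i, 0, 0), (i + 1, 1, 1), discard=i % 2) for i in range(8)]
root = Node((0, 0, 0), (8, 1, 1), children=kids)
ray_o, ray_d = (-1.0, 0.5, 0.5), (1.0, 0.0, 0.0)
print(traverse(root, ray_o, ray_d, ray_mask=0)[1])  # 8 box tests, nothing pruned
print(traverse(root, ray_o, ray_d, ray_mask=1)[1])  # 4 box tests, odd children skipped
```

In this toy the skipped work grows with the number of children per node, which matches the intuition that wider nodes benefit more.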
Can be implemented in SW but needs HW to fully realize the benefits, though some modifications are needed there. For example, this patent implemented in HW would likely result in a higher quality BVH. In addition, one parallel fixed-point intersection tester per ray isn’t practical. For a BVH32 where more than half of the nodes are removed, that’s a huge waste of resources, and the range of intersected nodes could vary a ton. So AMD has to find a way to fill the slots up with calculations, likely by executing multiple rays in parallel. This is likely only practical if the BVH data is shared between multiple rays. Implementing this requires sorting rays into buckets/payloads by projected ray path destination (spatial coherency sorting) and finally executing the sorted ray buckets in parallel on each RT core.
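The bucketing step can be sketched in a few lines of Python. This is only my guess at what sorting by "projected ray path destination" could look like: each ray is projected a fixed distance `reach` along its direction and binned by the grid cell it lands in. `reach`, `cell`, `width` and both function names are assumptions for illustration, not anything from the patents.

```python
from collections import defaultdict


def bucket_rays(rays, reach, cell):
    """Group ray indices by the grid cell of their projected endpoint.

    rays  : list of (origin, direction) tuples, 3 floats each
    reach : hypothetical distance along the ray used for the projection
    cell  : edge length of the sorting grid (also an assumption)
    """
    buckets = defaultdict(list)
    for i, (o, d) in enumerate(rays):
        end = tuple(oc + dc * reach for oc, dc in zip(o, d))
        key = tuple(int(c // cell) for c in end)
        buckets[key].append(i)
    return dict(buckets)


def batches(buckets, width):
    """Split each bucket into batches of at most `width` rays, so that one
    batch can share the same fetched node data across its tester slots."""
    out = []
    for key in sorted(buckets):
        ids = buckets[key]
        for s in range(0, len(ids), width):
            out.append(ids[s:s + width])
    return out


rays = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((0.0, 1.0, 0.0), (1.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),  # heads somewhere else entirely
    ((1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
]
groups = bucket_rays(rays, reach=10.0, cell=4.0)
print(groups)
print(batches(groups, width=2))
```

Rays 0, 1 and 3 end up in the same bucket and would plausibly walk the same nodes, while ray 2 gets its own bucket. Real hardware would presumably do this with fixed-size queues rather than a dictionary.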
Wrap up #5: Discard values further extend the massive RT traversal uplifts expected in RDNA5 (IF included) and should finally make ludicrously wide BVHs viable. For reference, fixed-point testing only made very wide BVHs possible. I really did not expect to see another idea of a similar, if not even bigger, magnitude than low-precision intersection testing. Realistically, with only a modest area overhead (at iso-node) from the new traversal logic (Radiance cores), the performance/area gains are conceivably so massive that they can get away with not spending any more area on intersection logic, but we'll see. I also wonder how this is going to run (shaders or HW accel) and whether it would benefit from a GPU BVH builder (no idea).
For RT shading overhead there's SER, OMM, and soon prob SER on steroids, which is delayed any-hit shader evaluation combined with HW-accelerated payload sorting and execution via work graphs or a later evolution (see prev posts from October). Then for RT traversal overhead there's low-precision prefiltering, ray coherency sorting (again, see Oct posts) and now this insane BVH pruning for ray traversal. What other innovations lie around the corner? I honestly can't wait to see how NVIDIA counters all this and what other innovations AMD has in the pipeline.
There are some early indications. Let's remember that as lighting becomes more complex, shading overhead begins to balloon out of control and dominate RT traversal, making traversal far less important than shading performance. So even if NVIDIA loses in RT traversal to RDNA5, they could decide to use some further refinement of GATE and much stronger tensor cores (CPX = 6090?) to brute-force the MLP overhead. Assuming MLP-based neural rendering will replace ALL PT shading, heck even most traversal, bar a limited MLP training input, NVIDIA will prob just move the goalpost even further. I'm certain they'll find a way to spam as many offline-renderer-quality effects into the pipeline as possible, forcing the 6090 to its knees (4K DLSS P 40-50FPS xD). While this application of MLPs is prob excessive, I really hope RDNA5's ML HW is much, much stronger than RDNA4's (on-paper specs and HW optimizations); enough to drive many lighting MLPs, an improved FSR suite and many other things.
Do we have leaks for RDNA5's ML FP8 rate per CU vs RDNA4 per WGP?