@vinifera you're free to steal this formatting and include it in your post. Makes it easier to reference later. I'll delete when it's moved:
- RESCHEDULING WORK ONTO PERSISTENT THREADS
- SHADER CORE INDEPENDENT SORTING CIRCUIT
- SYSTEMS AND METHODS FOR GRAPHICS PROCESSING UNITS WITH ENHANCED RESOURCE BARRIERS
- IMPROVED QUALITY OF ANIMATION FOR COMPRESSED GEOMETRY
- DEFERRED ANY HIT SHADER EXECUTION FOR REDUCED DIVERGENCE
- HIGH-QUALITY SKINNED OBJECT ANIMATIONS FOR DISPLACED MICRO MESHES
END
---------------------------
Reporting and analysis
#1: Looks like deadlock and long-latency stall mitigation that can make GPUs more versatile (i.e. supporting more application types). Introduces fine-grained context saves and restores on the GPU, down to wavefront level.
Might be related to this patent, which sounds a lot like Volta's Independent Thread Scheduling:
https://patents.google.com/patent/US20250306946A1
- Guessing this is CDNA 5 related.
#2: A Shader Engine level payload sorting circuit coupled to the Work Graph Scheduler. Might also be implemented at CU level. It is a specific HW optimization for work graphs independent of compute units. It
"...improves coherency recovery time by sorting payloads to be consumed by the same consumer compute unit(s) into the same bucket(s). The producer compute units are able to perform processing while the sorting operations are being performed by the sorting circuit in parallel."
While the main target is work graphs, the technique is
"...applicable to other operations, such as raytracing or hit shading, and other objects, such as rays and material identifiers (IDs)." Complementary to the Streaming Wave Coalescer.
- Since they mention rays, it's very possible that this unit is responsible for the ray coalescing against DGF nodes that I described earlier. Very likely an RDNA 5 patent. Chajdas is involved, and once again this optimization is crucial for Work Graphs.
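To make the bucketing idea in #2 concrete, here's a toy CPU-side sketch. It only illustrates sorting payloads by their consumer so each consumer sees a contiguous batch. The names (`bucket_payloads`, `count_switches`) and the payload layout are my own assumptions, not anything from the patent, and the real thing would be a fixed-function circuit running in parallel with the producer CUs.

```python
from collections import defaultdict

def bucket_payloads(payloads):
    """Group (consumer_id, data) payloads into per-consumer buckets, keeping arrival order."""
    buckets = defaultdict(list)
    for consumer_id, data in payloads:
        buckets[consumer_id].append(data)
    return dict(buckets)

def count_switches(stream):
    """Count how often consecutive payloads target a different consumer."""
    return sum(1 for a, b in zip(stream, stream[1:]) if a[0] != b[0])

# Hypothetical payload stream as emitted by producers: (consumer_id, data)
incoming = [(2, "a"), (0, "b"), (2, "c"), (1, "d"), (0, "e"), (2, "f")]

buckets = bucket_payloads(incoming)

# Replay the payloads bucket by bucket, as a consumer-side scheduler might see them.
sorted_stream = [(cid, d) for cid in sorted(buckets) for d in buckets[cid]]

print(count_switches(incoming), "->", count_switches(sorted_stream))
```

Fewer consumer switches in the replayed stream is a rough stand-in for the coherency the patent is after.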
#3: This allows a resource for a second task to be assessed in advance without interfering with the first task. It works as follows: execute the first task, then initiate the second task but pause it before it accesses said resource. If the resource for the second task is ready once the first task completes, the second task gets executed. Looks like this is implemented at the Shader Engine level. The patent states:
"...sequential tasks can be executed more quickly and/or GPU resources can be utilized more fully and/or efficiently."
- Not sure about this one, but could end up in RDNA 5 or perhaps CDNA 5.
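A rough CPU analogy of how I read #3, using Python threads. This is just my interpretation, not the patent's mechanism: the `threading.Event` stands in for the barrier, and `run_with_barrier`, `first_task` and `second_task` are made-up names. The second task starts early, does the work that doesn't touch the shared resource, and only pauses right before its first access.

```python
import threading

def run_with_barrier():
    resource = {}
    ready = threading.Event()  # stand-in for the barrier guarding the shared resource
    results = []

    def first_task():
        resource["value"] = 21  # first task produces the resource
        ready.set()             # release the barrier once it is done

    def second_task():
        scale = 2               # setup work that does not touch the resource
        ready.wait()            # pause just before the first access to the resource
        results.append(resource["value"] * scale)

    # The second task is launched first on purpose: it can overlap its setup
    # with the first task instead of waiting for it to finish.
    t2 = threading.Thread(target=second_task)
    t1 = threading.Thread(target=first_task)
    t2.start()
    t1.start()
    t1.join()
    t2.join()
    return results

print(run_with_barrier())
```

The point is just that the wait moves from "before the task starts" to "before the resource is touched".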
#4: A method for animating compressed geometry that's based on curved surface patches. This is related to the beyond-DMM patent I discussed in my prev post.
- Looks too novel to be in RDNA 5 + no HW blocks specified. Gruen is the sole originator.
#5: A method of deferring any hit shader execution, which makes it
"...possible to group the execution of an any hit shader for multiple work-items together, thereby reducing divergence."
- This is a big deal, possibly even bigger than SER if they can make the any hit shader evaluation very coherent.
NVIDIA said this at the launch of SER:
"With increasingly complex renderer implementations, more workloads are becoming limited by shader execution rather than the tracing of rays." Until fairly recently I thought SER was for coalescing ray tracing operations. Yeah I know it's stupid.
- This patent has McAllister listed alongside many researchers. Has to be in RDNA 5 since not including it would be asinine.
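Toy model of why deferring and grouping helps in #5. Assumptions are all mine: a wave of 4 lanes, and a wave has to run one serialized pass per distinct any hit shader among its lanes. `WAVE_SIZE`, `shader_passes` and `defer_and_group` are hypothetical names, and the real hardware behaviour will be more involved.

```python
WAVE_SIZE = 4  # hypothetical wave width, kept small for readability

def waves(items, size=WAVE_SIZE):
    """Split work-items into consecutive waves of `size` lanes."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def shader_passes(work_items):
    """Total serialized passes: each wave runs once per distinct shader among its lanes."""
    return sum(len({shader for _, shader in wave}) for wave in waves(work_items))

def defer_and_group(work_items):
    """Instead of running any hit shaders immediately, queue them and regroup by shader."""
    return sorted(work_items, key=lambda item: item[1])  # stable sort keeps ray order per shader

# Hypothetical (ray_id, any_hit_shader) pairs in traversal order
hits = [(i, "leaf" if i % 2 else "glass") for i in range(8)]

immediate = shader_passes(hits)
deferred = shader_passes(defer_and_group(hits))
print(immediate, "->", deferred)
```

In this contrived case the interleaved stream needs 4 passes and the regrouped one needs 2.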
#6: This looks like the technique behind the Animated DMM GPUOpen paper unveiled at Eurographics 2024 and shared by @basix.
- I don't see specific HW mentions of logic for the animated DMMs beyond the basic DMM HW pipeline, but AMD needs this or a better approach because the paper stated that on RDNA 3 it has
"...∼ 2.3−2.5× higher ray tracing costs than regular ray tracing of static DMMs." Gruen is the sole originator.
What can we expect?
#2 and #5 are the most important and will almost certainly end up in RDNA 5, on top of what I discussed in my last comment. This strongly implies their GFX13 RT implementation is leapfrogging NVIDIA Blackwell by several gens, at least in sophistication. AMD could decide to just gimp the RT cores to save on die space, but overall it looks like AMD might turn the tables on NVIDIA in next-gen RT. Rubin is still a wild card, so anything could happen and we'll see.
If they lose, NVIDIA will prob go: "RT is for console peasants, now here's a selection of AI-generated games that can run on the new 6090 at 20 frames per second. We use DLSS and MFG to run it at 120 FPS xD." or
"Now our tensor cores are so powerful that we can replace most of the ray tracing pipeline and it looks better."
Regardless, I'm not surprised AMD and Sony are openly talking about path tracing on future HW when the pipeline looks this capable. Hope they resist the temptation to offset architectural sophistication with less HW by cutting it down because it's "good enough". It can be amazing if they let it shine.
hey that's normal, Intel GFX R&D guys got swallowed whole by AMD.
Think we're beginning to see the results of that in patent filings rn.
Looks like RDNA5 def won't be short of paradigm shifts and novel ideas.