Does all this benefit RDNA 5 (and DirectX 13 / Xbox Next etc.)?
If the patents (#1-6) are part of RDNA 5, then sure, a helluva lot. Scheduling is tailored to Programmable Shaders 2.0. Maybe @Kepler_L2 can confirm if any will be missing?
The potential speedup is massive, especially with rewritten game engines (GCN Mantle fine wine on steroids). And that's only counting the six patents I listed in an earlier post, ignoring all the potential data management, cache and memory, dispatch, and execution related changes from other patent filings. Add those on top and the gains are probably even greater.
This quote from the WGP Local launcher patent illustrates it quite well:
"In some implementations, the workgroup processor local launch mechanism provides an order-of-magnitude improvement to thread launch performance, allowing finer-grained dispatches, local consumption of data within a compute unit, and much improved performance in highly variable workloads. For example, the local launch mechanism improves the performance of application program interfaces that utilize work graphs by allowing a workgroup processor to self-schedule work without needing to submit a request to a work scheduling mechanism such as a command processor. Additionally, enabling resources to be allocated by either the local launcher or the SPI allows for better distribution of workloads that use both at the same time, such as graphics functions running concurrently with compute functions."
I also dug up some old responses from Kepler related to the patents (see below). It seems like #6 (WGP local launcher) is confirmed to work through the compiler, and I would guess #1-3 (WGS related) are hardcoded into the design, though I'm not sure whether they need compiler support as well. I assume it's very likely that some aspects of #5 (smarter graphics pipeline) could be enabled via the compiler too.
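To get a feel for why the local launcher (#6) could matter so much for work graphs, here is a toy Python model, not anything taken from the patent. It counts launch overhead for a small tree-shaped work graph in two cases: every node launched through a central scheduler, versus only the root going through it and all child nodes launched locally. The cost constants, the 10:1 ratio (only borrowed from the "order-of-magnitude" wording in the quote), and the names `build_graph` and `launch_overhead` are all my own assumptions for illustration.

```python
from collections import deque

# Hypothetical per-launch costs in arbitrary time units. The 10:1 ratio only
# mirrors the "order-of-magnitude" wording in the patent quote; it is not a
# measured figure.
CENTRAL_LAUNCH_COST = 10
LOCAL_LAUNCH_COST = 1


def build_graph(depth, fanout):
    """Return a toy work graph as {node_id: [child_ids]} (a complete tree)."""
    graph = {0: []}
    next_id = 1
    frontier = [0]
    for _ in range(depth):
        new_frontier = []
        for node in frontier:
            for _ in range(fanout):
                graph[node].append(next_id)
                graph[next_id] = []
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return graph


def launch_overhead(graph, local_launch):
    """Sum launch costs while walking the graph breadth-first.

    The root is always submitted through the central scheduler. Child nodes
    are launched locally when local_launch is True; otherwise every launch
    goes back through the central scheduler.
    """
    total = CENTRAL_LAUNCH_COST  # root dispatch
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for child in graph[node]:
            total += LOCAL_LAUNCH_COST if local_launch else CENTRAL_LAUNCH_COST
            queue.append(child)
    return total


graph = build_graph(depth=3, fanout=4)
central = launch_overhead(graph, local_launch=False)
local = launch_overhead(graph, local_launch=True)
print(f"nodes={len(graph)} central={central} local={local}")
# prints: nodes=85 central=850 local=94
```

The model only counts launch cost and ignores the actual shader work, so the gap it shows applies to launch overhead alone. Still, it shows why fine-grained dispatches with lots of small nodes are where a cheap local launch path would pay off the most.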
Clarifying that an API change isn't mandatory:
Yes but it goes beyond that. WorkGraphs are used by developers but the WGP Local Scheduler can be used by the shader compiler without developer intervention.
Work Graphs RDNA3 gen support:
Work Graphs is only supported on RDNA3 and later, and is really only optimized for performance in RDNA5 and later.
Meanwhile on Linux it can be emulated through VKD3D-Proton:
Didn't know emulating Work Graphs was even possible. Compute shaders FTW!
Assessing my old RDNA3 vs RDNA5 Work Graphs description + GCN fine wine 2.0:
Pretty accurate
It will probably end up a repeat of Async Compute performance advantage for AMD for a few years
I can recommend the exchange from early to mid August last year (begins at page 33). It has quite a lot of confirmations by Kepler regarding RDNA5.
If they've implemented some limited form of true dataflow execution like in this patent, then the perf and efficiency gains should extend even further, but I have no idea if that's part of RDNA5.