They can't run on base PS5?
Work Graphs is only supported on RDNA3 and later, and is really only optimized for performance in RDNA5 and later.
Ah, that's a pity, but at least this looks like nice, exciting stuff for next gen, far more promising than mesh stuff and RTRT, which are obviously limited by hardware.
Seems like work graphs have a lot more potential benefits.
AMD already touted work graphs vs Execute Indirect = 39% less ms for PCG with the 7900 XTX, and you're telling me it isn't even optimized for performance at the HW level. Impressive!
Could just as easily mean that the current SW solution(s) that work graphs are designed to replace are pretty suboptimal, such that even a relatively unoptimised HW µArch can extract a decent perf uptick.
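For anyone wanting the "39% less ms" figure as a speedup factor, here is the conversion. This is just arithmetic on the number quoted above, not a new measurement, and the function name is made up for illustration.

```python
def speedup_from_time_reduction(reduction):
    """Convert a fractional reduction in frame/task time into a speedup factor.

    reduction=0.39 means the new time is 61% of the old time.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Figure quoted in the thread: 39% less time.
factor = speedup_from_time_reduction(0.39)
print(f"{factor:.2f}x")
```

So 39% less time works out to roughly 1.64x the throughput, which is a bigger gap than "39%" sounds like at first.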
It wouldn't surprise me at all to learn that certain features of the API are essentially just lower level equivalents to what was already in OpenGL or DirectX at the time of Vulkan's inception and hadn't yet been replaced with a more optimal solution.
Is Nvidia fully onboard with this?
I get the feeling AMD is ahead of the curve here, just like with Mantle, and it's unlikely Nvidia likes this.
Will AMD again delay, like with RDNA 4, and launch behind Nvidia, or take the lead with RDNA 5?
And why do you feel that way? NVIDIA has supported work graphs on Ampere and later since driver 551.76. I think they were first with driver support, if I remember correctly.
Yes, and they have blog posts about it. Really don't think they have a choice.
Likely another repeat of GCN+Mantle. AMD is spearheading it and basically building an entire GPU architecture around it with GFX13.
NVIDIA has superior official backwards HW support (Ampere and later), one gen earlier than AMD (RDNA 3 and later).
They were forced to adapt last time (Turing) but didn't mind as they were already building an AI and compute monster. Maybe Rubin will be a repeat of that.
Pretty accurate.
Anything is pretty suboptimal compared to work graphs, especially if the underlying workload is branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG, etc.
No, it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely change how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or programmable shader moment.
One definition of GPU Work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU with more fine-grained parallelism and dynamic execution."
This sounds nothing like Execute Indirect. Execute Indirect was about addressing the shortcomings of high-level APIs, not completely reinventing the API, as is the case with GPU work graphs.
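To make the "dependency-driven task graph" idea a bit more concrete, here is a toy CPU-side model in Python. It is only a sketch of the concept, not the D3D12 API and not how any GPU scheduler actually works: `run_work_graph`, `split` and `leaf` are made-up names, and a single queue stands in for the hardware. The point is that each node decides at run time which records to emit for other nodes (or itself), so the amount of work depends on the data and nobody has to go back to the host to launch the next step.

```python
from collections import deque


def run_work_graph(nodes, entry, first_record):
    """Drain a toy work graph.

    nodes maps a node name to a function taking one record and returning
    (outputs, results), where outputs is a list of (node_name, record).
    """
    pending = deque([(entry, first_record)])
    results = []
    while pending:
        name, record = pending.popleft()
        outputs, done = nodes[name](record)
        pending.extend(outputs)   # new work produced by the node itself
        results.extend(done)
    return results


# Hypothetical graph: recursively split a range until it is small enough,
# then hand it to a leaf node that does the actual work.
def split(rng):
    lo, hi = rng
    if hi - lo <= 2:
        return [("leaf", rng)], []
    mid = (lo + hi) // 2
    return [("split", (lo, mid)), ("split", (mid, hi))], []


def leaf(rng):
    lo, hi = rng
    return [], [sum(range(lo, hi))]


nodes = {"split": split, "leaf": leaf}
total = sum(run_work_graph(nodes, "split", (0, 10)))
print(total)
```

The recursion in `split` is the kind of branchy, data-dependent pattern mentioned above: with Execute Indirect-style approaches the number of launches has to be planned around in advance, whereas here the graph just keeps feeding itself until it runs dry.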
Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding, GFX13 has aligned the HW scheduling scheme with GPU work graphs by making it as decentralized and localized as possible, which helps to further harness the benefits of GPU work graphs.
One example is WGP local self launch where the HW matches SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine instead of relying on the command processor.
RDNA 3: everything is orchestrated through L2 cache and command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache), and when required each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: much lower scheduling latency and more fine-grained scheduling and dispatch, benefiting most tasks, although branchy, recursive, and/or data-dependent tasks should see the biggest benefits.
This probably only scratches the surface, and with GFX13 I would expect many more changes benefitting work graphs. The schedulers will probably be upgraded to better handle the complexity of work graphs.
As for SWC that is probably most beneficial to coalescing launches.
Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."
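A rough way to picture what that quote describes, again as a toy Python model rather than real API code: records headed for the same node get collected into batches of at most some maximum size, and each batch becomes one launch. `coalesce` and the node names are made up for illustration, and real hardware is free to launch partially filled batches whenever it likes, which this sketch ignores.

```python
from collections import defaultdict


def coalesce(records, max_records):
    """Group (node_name, payload) records by target node into batches of
    at most max_records, keeping arrival order within each node."""
    by_node = defaultdict(list)
    for node, payload in records:
        by_node[node].append(payload)
    launches = []
    for node, payloads in by_node.items():
        for i in range(0, len(payloads), max_records):
            launches.append((node, payloads[i:i + max_records]))
    return launches


# Five records for one node and three for another, at most four per launch.
records = [("shade", i) for i in range(5)] + [("cull", i) for i in range(3)]
launches = coalesce(records, max_records=4)
for node, batch in launches:
    print(node, batch)
```

Eight individual records end up as three launches here, which is the whole appeal: fewer, fuller groups instead of lots of tiny ones.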
RDNA introduced Wave-32; I wouldn't be at all surprised if RDNA 5 introduces additional wave execution granularity and flexibility, since this could greatly benefit work graphs in some cases, but there's still no patent for this.
@Kepler_L2 is this accurate or am I missing some important details?
It will probably end up being a repeat of the Async Compute performance advantage for AMD for a few years.
Rubin will launch first.
Seems perfect for distributed chiplet designs.
Feels like really exciting stuff and it isn't "ai", that's why!
Hopefully they have accelerated RDNA 5 as much as possible. MLID keeps saying it's early 2027, but it sounds like bull to me.
Has it been confirmed that Rubin will also be in gaming cards? I guess after the Blackwell fiasco Nvidia has no choice really; they'd better put out a cracking 6000 series or else.
It's just a name. Ampere and Blackwell have the same name for datacenter and gaming GPUs, but the architectures are completely different.
Volta was also the name, no?
It wouldn't be the first time AMD had taken so long between one high end SKU and another.
Not what I meant.
I meant that it wouldn't surprise me if many/most API features in Vulkan v1 were just lower level equivalent implementations of features with the same name in OGL or DX11.
As in literally just blank replication of the same feature set in the higher level gfx APIs without redesigning for modern hardware capabilities or more modern software paradigms.
Given Kepler's take that RDNA5 represents the biggest change to AMD's GPU µArch since Southern Islands or GCN v1 then it's a significantly bigger step forward vs Vega -> RDNA1 or even Vega -> RDNA2, which will have no small technical debt attached to it on the driver side of things.
Especially if AMD wants to get serious about non CDNA ROCm use and give RDNA5 GPUs first class citizenship in ROCm from day 1, which is a whole extra thing on top of graphics API compliance and optimisation.
Bearing in mind that real time RT/PT added a whole extra chunk of work to do on the gfx side already vs a decade ago.
Even if they eliminated driver level work on OGL + DX7-11 and just left it to Zink/DXVK Vulkan translation, it's still so much work to stand up a new µArch.
Assuming the RDNA 5 name sticks around, that's unlike the previous rebrands (Terascale, GCN, and RDNA). Is UDNA still being used internally @Kepler_L2?
These are marketing names and not used internally. They either use the gfx number or, in some cases, MIxxx, Navix (and now ATx).
How many gens do we skip to find a change equivalent to RDNA 4 -> RDNA 5 @Kepler_L2?
gfx9 to gfx10 lmao
Is it the same time interval, like first Ampere and then RDNA2, or more of a gap?