> ...it's just that the current WMMA/dual-issue implementation is too limited and doesn't trigger often enough, so AMD stopped advertising dual-issue FLOPs.

Oh thank god, it's like they were just begging for nVidia to point it out.
> RDNA4 (and RDNA3) already have 128 ALUs per CU, it's just that the current WMMA/dual-issue implementation is too limited and doesn't trigger often enough, so AMD stopped advertising dual-issue FLOPs.

They still advertise dual-issue FLOPs; what they don't do is advertise 2x the number of "streaming processors", because they don't want another Bulldozer lawsuit. I expect that will change with RDNA5. RDNA5 will have the same ALU count per WGP/CU, only with much better utilisation.
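For context (my arithmetic, not from the thread): the spec-sheet FLOPs already assume dual issue. Taking the RX 7900 XTX and its roughly 2.5 GHz boost clock as the example:

$$ 96\ \text{CU} \times 128\ \text{ALUs} \times 2\ \tfrac{\text{FLOPs}}{\text{ALU}\cdot\text{clk}} \times 2.5\ \text{GHz} \approx 61\ \text{TFLOPS} $$

which is the advertised dual-issue number; counting only the 64 "advertised" SPs per CU would give half that, ~31 TFLOPS.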
> The Medusa 3nm-SoC-Die only still has an NPU because it'll be used as a stand-alone APU as well, and its small RDNA3.5 IGP can't handle AI workloads.

What is the spec of the Medusa Point 1 monolithic APU? Is it 4 + 8 + 2?

> What is the spec of the Medusa Point 1 monolithic APU? Is it 4 + 8 + 2?

4x Zen6, 4x Zen6c, 2x Zen6LP.

> 4x Zen6, 4x Zen6c, 2x Zen6LP.

Thx. One more question: do Medusa Halo & Medusa Premium share any chiplets at all?

> Do Medusa Halo & Medusa Premium share any chiplets at all?

No.
> Updated RDNA5-AT2 Lineup Speculation
>
> [attachment 133874]
>
> - As explained in the SWV thread, RDNA5 will get double the SPs per CU. Thus AT2, with a max of 70 CUs, would be 140 CUs in the old format. That explains 20% faster performance than the RTX 4080. It also means the AT2 GPU is severely bound by memory bandwidth.
> - Therefore, AMD does not need to clock as high as RDNA4; I am expecting 2 GHz+, not ~3 GHz. It also means AT2 has headroom to grow. That's why I suspect AMD is reserving an XTX model with the full 70-CU die for future 40 Gbps, 4 GB GDDR7 dies to appear. That also explains the cancellation of AT1: an AT2 XTX is good enough to compete with the upcoming Rubin-70Ti with 24 GB on a 256-bit memory bus.
> - There are leaks saying RDNA5 dGPUs will be released in Q2 next year. I am actually expecting an early announcement in Q1. The cancellation of the RTX 50 Super therefore makes sense, because it would be a bloodbath for NV: no amount of overclocking will save the RTX 50 Super series. NV needs to speed up the release of Rubin dGPUs. If Rubin dGPUs are indeed fabbed on 3N (a variant of 3X), then the earliest release date would be Q3 next year. That gives AMD an early head start in the next-gen dGPU war.
> - That's why I am predicting AMD will set a higher price point for the AT2-70XT and AT2-70. AMD will keep selling the RX 9070 XT until NV is able to launch Rubin-60.
> - AMD will most likely still be selling N48 as a 9070 GRE by then. And no, AT3 and AT4 are NOT for the cheap dGPU lineup, period. Now that we know Medusa will have XDNA3, where do you think the NPU will reside in AT3, huh?

Are we sure AT2 is 70 CUs? It feels like if AT0 is 192 CUs then AT2 should be 72 CUs, since the CU count scales with the memory bus width (512/192 = 8/3 ≈ 2.67, and 72 × 8/3 = 192). Unless AMD has a reason to cut 2 CUs out there.
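Spelling out that proportionality argument (my arithmetic, assuming AT0 really is 192 CUs on a 512-bit bus and AT2 keeps the same CU-per-bus-width ratio on 192-bit):

$$ \text{CU}_{\text{AT2}} = 192 \times \frac{192\ \text{bit}}{512\ \text{bit}} = 192 \times \frac{3}{8} = 72 $$

so 70 CUs would mean AMD trimmed one WGP (2 CUs) below the natural scaling point.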
> Unless AMD has a reason to cut 2 CUs out there.

If both kepler_l2 (4 SE) and MLID (70 CUs for the full chip) are right, it looks like an asymmetric 9-9-9-8 (WGP) config.
> Are we sure AT2 is 70 CUs? It feels like if AT0 is 192 CUs then AT2 should be 72 CUs... Unless AMD has a reason to cut 2 CUs out there.

Yield?
> There are leaks saying RDNA5 dGPUs will be released in Q2 next year. I am actually expecting an early announcement in Q1...

At the Financial Analyst Day 2025, AMD talked about the line-up for 2026. Medusa was mentioned because it is based on Zen 6. Why do you think AMD has not mentioned RDNA5? I would suggest because RDNA5 is not in the 2026 line-up.

> At the Financial Analyst Day 2025, AMD talked about the line-up for 2026. Medusa was mentioned because it is based on Zen 6. ...

Still doesn't rule out Xbox.
Valve is coming for Microsoft's lunch. The Xbox team needs to communicate their strategy better to the public. There is now a decent window to strike if they launch next gen in another year's time (the 25th anniversary) and capitalize on the GTA 6 delay. Xbox Next could be the best way to play GTA 6, if they manage to fix the Windows-Xbox software merge in time by then.
Any next-gen Xbox demands a UI that works. In that way, Valve is already far ahead of whatever is happening at Microsoft. Valve’s Linux-based SteamOS software is easier to navigate on handhelds, and it could be coming to more devices like a VR headset or a PC-like console. Microsoft “has always been chasing Valve,” video game researcher and NYU Stern School of Business professor Joost van Dreunen told Gizmodo. It’s a smaller, more agile company run by ex-Microsoft programmer Gabe Newell. Steam—which most developers think is a monopoly—makes so much money, you could consider it a yacht factory for the Valve CEO. It’s not likely to sell out to Microsoft or anybody else any time soon.
> I believe xbox has a great opportunity to release RDNA 5 hardware (magnus)

No chance with current RAM/NAND trends for the next couple of years. Frankly, they won't sell many even at old pricing, and even if they could match whatever Sony's cooking: they've lost the war. Sony themselves recently said that console cycles are getting longer; it's a no-brainer for them to wait till 2029 and get a proper upgrade on N2.
> No chance with current RAM/NAND trends for the next couple of years

When's the last time DRAM prices remained stupid for 2 years?
> My biggest regret is not getting a high-speed 96 GB kit. Instead, I got a DDR5-6000 64 GB kit.

I was on the verge of getting an X870E + 9950X3D + 4x48 GB DDR5 combo in September, which I postponed due to the news of a dual-V-Cache CPU from AMD. I guess I either spend $2k on RAM or that upgrade isn't coming anytime soon.

> I was on the verge of getting an X870E + 9950X3D + 4x48 GB DDR5 combo in September...

Recently, investing in gold is almost as profitable as investing in RAM... go for it.
> When's the last time DRAM prices remained stupid for 2 years?

This time it's different ™️
> Console release needs at least 12-18 months of fixed pricing.

Not really, no.
> If both kepler_l2 (4 SE) and MLID (70 CUs for the full chip) are right, it looks like an asymmetric 9-9-9-8 (WGP) config. Cutting 1 WGP (2 CUs) from one SE like this was possibly the only way to squeeze more than 64 CUs into the maximum desired chip size/layout. 72 CUs possibly would've required a different layout with more whitespace and led to worse PPA.

I thought Kepler_L2 speculated it was 72 CUs?
> Sony is under zero pressure now; if anything they are better off maxing out PS5 sales. GTA6 will help push the PS5 Pro nicely too.

Better hope Sony can get GTA VI to 60 FPS or so on the PS5 Pro then...
> I thought Kepler_L2 speculated it was 72 CUs?

It sounded like Kepler is only sure about the SE count, not the CU count, while MLID's info says 70 CUs for the full chip, of which 68 will be enabled on Xbox Next and possibly as few as 64 enabled for desktop cards (it may depend on yields).
> Better hope Sony can get GTA VI to 60 FPS or so on the PS5 Pro then...

Seems unlikely, as they have almost the same CPU, which is going to be the main bottleneck for 60 FPS. But I guess we'll see; the incentive from Sony to help make it happen will be very high.
> Seems unlikely, as they have almost the same CPU, which is going to be the main bottleneck for 60 FPS...

CPU performance can be scaled as well; it's not fixed. You can have fewer NPCs/vehicles and less complex AI in a potential 60 fps mode.
> CPU performance can be scaled as well; it's not fixed. You can have fewer NPCs/vehicles and less complex AI in a potential 60 fps mode.

It can be, but that would be the standard performance mode that works on the base PS5 too. And that's assuming their game logic is still CPU-bottlenecked; chances are they've shifted a bunch of it to the GPU, if that's humanly possible.
> The developers also noted that this version brings experimental support for D3D12 Work Graphs. There are no games that use this technology yet, but based on their testing, it already runs faster than the native implementation after converting it to normal compute shaders.

Anything else is pretty suboptimal compared to work graphs, especially if the underlying workload is branchy, recursive, and/or data-dependent: SpMV, RTRT and PT, ML, physics simulations, PCG, etc...
No, it's not another DX11 -> DX12 shift in code abstraction. Work graphs completely change how a GPU is programmed and what a GPU even is. Think of it as another GPGPU or programmable-shader moment.
One definition of GPU work graphs could be: "Autonomous execution and scheduling of dependency-driven task graphs on a GPU, with more fine-grained parallelism and dynamic execution."
This sounds nothing like ExecuteIndirect. ExecuteIndirect was about addressing the shortcomings of high-level APIs, not completely reinventing the API as is the case with GPU work graphs.
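To make that execution model concrete, here is a minimal CPU-side analogy (my own toy sketch in plain C++, not the D3D12 API): nodes are functions, records are queued payloads, and a node can emit new records at run time, which is exactly the kind of data-dependent amplification a pre-recorded ExecuteIndirect buffer can't express.

```cpp
// Toy CPU analogy of a GPU work graph: nodes consume "records" and may
// dynamically emit records to downstream nodes (data-dependent amplification).
#include <cstdio>
#include <deque>
#include <functional>
#include <vector>

struct Record { int node; int value; };            // payload routed to a node id

int main() {
    std::deque<Record> queue;                      // stand-in for on-chip record storage

    std::vector<std::function<void(const Record&)>> nodes = {
        [&](const Record& r) {                     // node 0: entry, fans out work
            for (int i = 0; i < r.value; ++i)
                queue.push_back({1, i});           // fan-out decided at run time
        },
        [&](const Record& r) {                     // node 1: branchy, data-dependent routing
            if (r.value % 2 == 0)
                queue.push_back({2, r.value * 10});
        },
        [&](const Record& r) {                     // node 2: leaf, just does work
            std::printf("leaf node got %d\n", r.value);
        },
    };

    queue.push_back({0, 5});                       // like DispatchGraph with one input record
    while (!queue.empty()) {                       // scheduler loop (done by HW on a GPU)
        Record r = queue.front(); queue.pop_front();
        nodes[r.node](r);
    }
}
```

On real hardware the scheduler loop and the record queue live on the GPU, which is the whole point: no CPU round trip per expansion.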
Work graphs in RDNA 3 vs RDNA 5
Based on my limited understanding, GFX13 has aligned the HW scheduling scheme with GPU work graphs by making it as decentralized and localized as possible, which helps to further harness the benefits of GPU work graphs.
One example is WGP-local self-launch, where the HW matches SW's independent task launch for a node in a work graph.
Another example is the autonomous schedulers and task dispatchers within each Shader Engine, instead of relying on the command processor.
RDNA 3: everything is orchestrated through the L2 cache and the command processor.
RDNA 5: work is scheduled (WGS) and dispatched (ADC) within each Shader Engine (L1 cache), and when required each WGP (L0 cache) can launch and manage its own work when a node in a work graph launches tasks.
Result: much lower scheduling latency and finer-grained scheduling and dispatch, benefiting most tasks, although branchy, recursive, and/or data-dependent tasks should see the biggest benefits.
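A toy contrast of the two schemes, under my reading of the above (plain C++, just counting how often follow-up work has to travel through a shared queue; the real HW is obviously far richer than this):

```cpp
#include <cstdio>
#include <deque>
#include <vector>

struct Task { int engine; int remaining; };

// "RDNA 3 style": every spawned task goes back through one shared queue
// (stand-in for the command processor / L2 path). Returns shared-queue handoffs.
static int runCentralized(int engines, int chain) {
    std::deque<Task> shared;
    for (int e = 0; e < engines; ++e) shared.push_back({e, chain});
    int hops = 0;
    while (!shared.empty()) {
        Task t = shared.front(); shared.pop_front();
        if (t.remaining > 0) { shared.push_back({t.engine, t.remaining - 1}); ++hops; }
    }
    return hops;
}

// "RDNA 5 style": each Shader Engine keeps a local queue, and follow-up work
// is self-launched locally, so it never touches the shared queue at all.
static int runLocalized(int engines, int chain) {
    std::vector<std::deque<Task>> local(engines);
    for (int e = 0; e < engines; ++e) local[e].push_back({e, chain});
    for (auto& q : local)
        while (!q.empty()) {
            Task t = q.front(); q.pop_front();
            if (t.remaining > 0) q.push_back({t.engine, t.remaining - 1});
        }
    return 0;  // no shared-queue handoffs, by construction
}

int main() {
    std::printf("centralized shared-queue handoffs: %d\n", runCentralized(4, 3)); // 12
    std::printf("localized   shared-queue handoffs: %d\n", runLocalized(4, 3));   // 0
}
```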
This probably only scratches the surface, and with GFX13 I would expect many more changes benefitting work graphs. The schedulers will probably be upgraded to better handle the complexity of work graphs.
As for SWC, that is probably most beneficial for coalescing launches.
Page 223 of the course notes PDF states: "We can specify how many records to the same node should be grouped together at maximum, and how many threads the group that is processing this collection should have."
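That maps onto the coalescing launch mode; roughly like this, as a toy C++ sketch of my understanding (the MaxRecords name mirrors the HLSL attribute, everything else here is made up for illustration):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Toy model of a coalescing node launch: records aimed at the same node are
// batched up to a maximum, and each batch is handed to one thread group
// (a plain loop stands in for the group's threads here).
int main() {
    const size_t kMaxRecords = 4;  // mirrors [MaxRecords(4)] on the node input
    std::vector<int> records = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    for (size_t base = 0; base < records.size(); base += kMaxRecords) {
        size_t count = std::min(kMaxRecords, records.size() - base);
        std::printf("group launch, %zu records:", count);
        for (size_t i = 0; i < count; ++i)  // the group processes its collection
            std::printf(" %d", records[base + i]);
        std::printf("\n");
    }
}
```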
RDNA introduced Wave32; I wouldn't be at all surprised if RDNA 5 introduces additional wave execution granularity and flexibility, since this could greatly benefit work graphs in some cases, but there's no patent for this yet.
@Kepler_L2 is this accurate or am I missing some important details?
> The developers also noted that this version brings experimental support for D3D12 Work Graphs. There are no games that use this technology yet, but based on their testing, it already runs faster than the native implementation after converting it to normal compute shaders.
>
> Source: VKD3D-Proton
>
> | GPU | FPS (WG native) | FPS (WG emulated) |
> | --- | --- | --- |
> | RX 7600 | 80 | ~500 |
> | RX 6800 | n/a | 1339 |
> | RTX 4070 | 400 | 650 |
>
> | GPU | WG native (ms) | WG emulated (ms) | EI native (ms) | EI emulated (ms) |
> | --- | --- | --- | --- | --- |
> | RX 7600 | 1.7 | 0.9 | 2.9 | 1 |
> | RTX 4070 | 0.55 | n/a | 3.1 | |
>
> | GPU | native WG (ms) | native compute dispatch/EI (ms) | proton WG (ms) | proton compute dispatch/EI (ms) |
> | --- | --- | --- | --- | --- |
> | RTX 4070 | 3.2 | 3.1 | 5.5 | 3.9 |
> | RX 7600 | 6.8 | 5.8 | 5.1 | 5.8 |

Didn't know emulating Work Graphs was even possible. Compute shaders FTW!
