adroc_thurston
> That just shows me that Nvidia has more to lose and AMD more to gain when the tides turn.

the what turn
> the what turn

Something other than the GPU excels at AI workloads at lower power.
> Something other than the GPU excels at AI workloads at lower power.

That's a net loss for both AMD and NV.
> That's a net loss for both AMD and NV.

Not if it comes out of AMD
> Not if it comes out of AMD

AMD has a GPU roadmap and that's it.
> AMD has a GPU roadmap and that's it.

They have Xilinx too, which is not a GPU.
> They have Xilinx too, which is not a GPU.

GPU guys won.
> Not if you go with Samsung's foundries.

If their "N2" was any good they would never drop prices so much at a time of max "AI" demand.
> Microsoft has changed all the Xbox Game Pass plans and they all include cloud gaming now, which appears to finally be out of beta.

& the Ultimate plan has higher quality cloud gaming.
They also raised the prices by 50%+ (99% in my country). Gotta pay for all those AT0s.
> Is this patent related to the gfx13 implementation?

Yeah, but gfx13 does the real cocaine stuff like register renaming.
> Meanwhile here's one AMD patent dissing round-robin for load balancing: https://patents.google.com/patent/US11941723B2

Not the case with RDNA4 anymore, they support round-robin scheduling now.
> All that was before the AMD R&D heavyweights Matthäus G. Chajdas, Michael J. Mantor, and Christopher J. Brennan got involved and proposed the CP decoupled scheduling and dispatch with WGS+ADC:

MI450X doesn't have any texture/geometry/RT engine stuff, and it also doesn't have any of the gfx13 goodies like SWC/WGS/DGF.
> Not surprisingly, the local launcher patent first spotted by Kepler lists him as well, alongside the other heavyweights:

More like they improved SE-level scaling with some trickery: https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2025144455&_cid=P21-MCPWUB-70374-1
Seeking some clarification on info + updating and providing new info potentially relevant for RDNA 5. Hope y'all don't mind.
Is this patent related to the gfx13 implementation?
US20230315536A1 - Dynamic register renaming in hardware to reduce bank conflicts in parallel processor architectures
"To reduce inter- and intra-instruction register bank access conflicts in parallel processors, a processing system includes a remapping circuit to dynamically remap virtual registers to physical registers of a parallel processor during execution of a wavefront. The remapping circuit remaps..."
For OoO execution, SWC was less clear, but here there can be no doubt. Why would AMD do something this insane if it weren't to do actual OoO execution on a GPU? Just hope it doesn't suffer from the same issues as LOOG, but perhaps AMD can rein it in through the compiler.
Comparison between the LOOG and the lightweight SOTA GhOST OoO scheduler available here: https://liberty.cs.princeton.edu/Publications/isca24_ghost.pdf
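For intuition on why renaming helps with bank conflicts, here's a toy sketch in the spirit of the patent above. This is entirely my own illustration, not AMD's actual mechanism: the bank count, the round-robin bank picker, and the stall model are all invented.

```python
# Toy sketch of dynamic register renaming to reduce bank conflicts, in the
# spirit of US20230315536A1. NOT AMD's actual design: the bank count, the
# round-robin bank picker, and the stall model are invented for illustration.

NUM_BANKS = 4  # assume the VGPR file is split into 4 single-ported banks

class Remapper:
    """Maps virtual registers to (bank, slot) pairs, spreading consecutive
    definitions across banks so consumers rarely read one bank twice."""
    def __init__(self):
        self.mapping = {}                 # virtual reg -> (bank, slot)
        self.next_slot = [0] * NUM_BANKS
        self.next_bank = 0

    def define(self, vreg):
        bank = self.next_bank
        self.next_bank = (self.next_bank + 1) % NUM_BANKS
        self.mapping[vreg] = (bank, self.next_slot[bank])
        self.next_slot[bank] += 1

    def stall_cycles(self, srcs):
        """Extra read cycles: one per source operand beyond the first that
        hits a bank already read this cycle."""
        banks = [self.mapping[s][0] for s in srcs]
        return len(banks) - len(set(banks))

rm = Remapper()
for v in ("v0", "v1", "v2"):
    rm.define(v)

# A 3-source FMA reading v0, v1, v2: the remapper placed each virtual
# register in a different bank, so the read incurs no conflict stalls.
print(rm.stall_cycles(["v0", "v1", "v2"]))  # 0
```

A naive allocator that happened to put two FMA sources in the same bank would pay an extra cycle per collision; renaming at write time lets the hardware dodge that without compiler heroics.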
Meanwhile here's one AMD patent dissing round-robin for load balancing: https://patents.google.com/patent/US11941723B2
Maybe this was the original low-effort plan for RDNA 5, or possibly what was to be used in N4C (the cancelled gargantuan RDNA 4 chiplet GPU).
Possibly also these two for N4C:
#1 https://patents.google.com/patent/US12361628B2
#2 https://patents.google.com/patent/US11755336B2
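The round-robin criticism in US11941723B2 above is easy to demonstrate with a toy scheduler. This is my own sketch, not the patent's mechanism: with uneven, branchy work-item costs, blind round-robin can pile every expensive item onto one queue, while a load-aware picker spreads them out.

```python
# Toy demonstration (my own sketch, not the patent's actual mechanism) of why
# plain round-robin distribution loses to load-aware assignment when work
# items have uneven cost.

def round_robin(costs, n):
    """Finish time (busiest queue) when item i always goes to queue i % n."""
    loads = [0] * n
    for i, c in enumerate(costs):
        loads[i % n] += c
    return max(loads)

def least_loaded(costs, n):
    """Finish time when each item goes to the currently least-loaded queue."""
    loads = [0] * n
    for c in costs:
        loads[loads.index(min(loads))] += c
    return max(loads)

# Branchy workload: every 4th item is 10x the cost of the rest. With 4 queues,
# round-robin sends every heavy item to queue 0.
costs = [10, 1, 1, 1] * 8
print(round_robin(costs, 4))   # 80 -- queue 0 got every 10-cost item
print(least_loaded(costs, 4))  # well under 80: heavy items are spread out
```

Real hardware load balancing (and work stealing) is far more dynamic than this greedy picker, but the failure mode of static round-robin is the same.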
All that was before the AMD R&D heavyweights Matthäus G. Chajdas, Michael J. Mantor, and Christopher J. Brennan got involved and proposed the CP decoupled scheduling and dispatch with WGS+ADC:
#1 https://patents.google.com/patent/US20240111574A1
#2 https://patents.google.com/patent/US12153957B2
#3 https://patents.google.com/patent/US20240111575A1
Not surprisingly, the local launcher patent first spotted by Kepler lists him as well, alongside the other heavyweights:
Here's the Google Patents version: https://patents.google.com/patent/US20250217195A1
Completely instrumental for accelerating the prime candidate code for work graphs (branchy code):
"...local launch mechanism provides an order-of-magnitude improvement to thread launch performance, allowing finer-grained dispatches, local consumption of data within a compute unit, and much improved performance in highly variable workloads. For example, the local launch mechanism improves the performance of application program interfaces that utilize work graphs by allowing a workgroup processor to self-schedule work without needing to submit a request to a work scheduling mechanism such as a command processor."
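A hedged toy model of that quoted mechanism: a workgroup processor keeps a small local queue and launches child work itself, only falling back to the command processor when the queue overflows. The class names, queue capacity, and spill policy are my own inventions for illustration; the real hardware is far more involved.

```python
# Toy model of the quoted "local launch" idea. Everything here (class names,
# LOCAL_CAPACITY, spill-to-CP fallback) is an illustrative assumption, not
# the actual RDNA 5 mechanism.
from collections import deque

class CommandProcessor:
    """Global scheduler: every submission here models a costly round-trip."""
    def __init__(self):
        self.queue = deque()
        self.submissions = 0

    def submit(self, item):
        self.submissions += 1
        self.queue.append(item)

class WorkgroupProcessor:
    LOCAL_CAPACITY = 16  # assumed size of the CU-local launch queue

    def __init__(self, cp):
        self.cp = cp
        self.local_queue = deque()
        self.executed = 0

    def launch(self, item):
        # Prefer the cheap local path; spill to the CP only when full.
        if len(self.local_queue) < self.LOCAL_CAPACITY:
            self.local_queue.append(item)
        else:
            self.cp.submit(item)

    def run(self, fanout):
        # A work-graph node that spawns `fanout` leaf workgroups.
        self.executed += 1
        for _ in range(fanout):
            self.launch("leaf")
        while self.local_queue:          # consume children locally
            self.local_queue.popleft()
            self.executed += 1

cp = CommandProcessor()
wgp = WorkgroupProcessor(cp)
wgp.run(8)   # fans out into 8 leaves, all launched and consumed locally
print(wgp.executed, cp.submissions)  # 9 0
```

The point of the sketch: the fine-grained fan-out never touches the global scheduler, which is exactly the round-trip the patent says it eliminates for work graphs.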
Here's another patent listing him that rewrites the pipeline for fixed-function units, unconfirmed for now:
WO2024129349A1 - Inclusion of dedicated accelerators in graph nodes
"Systems, apparatuses, and methods for implementing a hierarchical scheduling in fixed-function graphics pipeline are disclosed. In various implementations, a processor includes a pipeline comprising a plurality of fixed-function units and a scheduler. The scheduler is configured to schedule a..."
Might be going out on a limb, but patents that list Matthäus G. Chajdas especially are very likely to inform finalized HW implementations going forward. He's part of the DGF and GPU Work Graphs SW teams, with multiple patents already implemented, etc. Keep an eye on his work and patents for sure.
WGS+ADC+work stealing (load balancing) and local launchers in RDNA 5 alone are enough to warrant gfx13 a clean-slate designation. RDNA tackled CU-level scheduling and dispatch, while it looks like RDNA 5 will address the entire stack from CP to SIMD unit in a radical fashion. Remember that this is just one aspect of a GPU architecture. Imagine this kind of clean-slate approach, or at the bare minimum major changes, across the entire GPU architecture, as Kepler alluded to here:
"Well yeah they are changing everything, gfx13 is the biggest architectural overhaul since GCN."
The craziest thing is that, as Kepler alluded to, this redesign is just one aspect of a GPU, and while obviously I can't say this is how it'll be for sure, the patents suggest AMD is tackling every single aspect of the GPU. It indicates AMD is building a power-efficient, data-locality-focused GPU architecture heavily reliant on adaptive and feedback-driven mechanisms that are extremely fine-grained and flexible, while also implementing clever and novel techniques to maximize processing and data efficiency. RDNA 5 isn't a repeat of RDNA, which tried to overcome GCN's limitations by aligning itself with Maxwell µarch derivatives, and it isn't a repeat of GCN, which, while forward-looking in multiple areas, was unfortunately still limited by the circumstances of being mostly funded by Sony and MS during the comatose AMD era. It's AMD finally tackling nearly every single aspect of GPU design in a manner that is generally forward-looking, novel, and well beyond NVIDIA's current gen.
The shared design and IP pipeline moving forward means R&D spent on consumer and DC overlaps. Many of these optimizations also carry over to DC, which makes the monetary incentives much stronger. Aside from the gaming-specific optimizations (no overlap with DC), I doubt RDNA 5 is primarily funded by MS and Sony, as these changes are too big to be primarily for the next-gen consoles and gaming GPU family. I would wager that the combined R&D funding for CDNA 5 and RDNA 5 massively eclipses anything previous, even when accounting for the number of implementations (dies) and the node-related design cost explosion.
With such a big redesign comes enormous risk of HW bugs and flaws, so I really hope AMD finally has the necessary money and manpower to avoid any major issues and unused features similar to Vega's DSBR, the RDNA 1 HW bugs, and the RDNA 3 chiplet bugs. So AMD, take your time with this one.
> Crazy work. Thanks for the detailed explanation.

Yw, but I should probably stop torturing myself by compiling AMD GPU patent filings. AMD files a crazy amount of patents. Almost 1,200 patents glanced at with publication dates from EoY 2022 to now. 150 saved for RDNA 5, later RDNA, or never used. I've given a few dozen a closer look. Every new patent I read makes the wait increasingly unbearable. Can't wait ~2 years xD
> Mid to late 2027

A combined launch with NG consoles, Zen 7 (to counter Intel), CDNA AND RDNA 5 around Black Friday-Holidays 2027 would be insane, but we'll see.
> "It is very easy to implement a Mega Geometry-like solution in hardware, while DGF can be emulated via compute shaders, which is important, because a DGF-like decompression engine can be extremely complex in hardware, so it will take a while for Intel and NVIDIA to implement it."

Can this DGF work be outsourced to CCU?

> Can this DGF work be outsourced to CCU?

That patent is not in RDNA5.
@marees Someone on Reddit found this additional useful info about DGF. It's from JoeMan in the Disqus section below the Videocardz article on animated DGF. He explains it much better than I ever could:
"So let’s clear this up, because an incredible amount of nonsense is being said about it.
NVIDIA DMM and AMD DGF are fundamentally different approaches to the same problem. DMM works by taking the original surface and then radically reducing its detail, so in the end it consists of fewer triangles, while storing the surface information in the form of a displacement map. This way, the BVH acceleration structure can remain simple, since only a fraction of the surface’s real detail needs to be taken into account for the actual calculations. DGF, on the other hand, is a scalable, lossy compression method for meshlets, and unlike DMM, it actually represents the geometry instead of only reproducing it through a displacement map. The result is similar, and the BVH acceleration structure can remain relatively simple. Since both methods are lossy compression techniques, there will be some quality degradation, but the benefits gained from compression are significantly greater than the loss in quality.
The principle is therefore similar, but the advantages and disadvantages lie in different places. DMM compresses much more effectively, so in theory the gain is greater, but because of the required surface preprocessing it imposes a significant overhead on the CPU, and it is also not compatible with the actual geometry used in today’s games, since the content has to be tailored specifically for DMM. DGF compresses less effectively, so in theory the gain is smaller, but it is compatible with all kinds of geometry, and it does not impose any significant overhead on the CPU either.
Because DMM proved so impractical that no developer was willing to adopt it, NVIDIA decided to discontinue the technology in the RTX 50 series, meaning it is unlikely to ever see use in practice.
Since DMM is practically unusable, NVIDIA introduced Mega Geometry as its replacement, which primarily works by clustering triangles rather than manipulating the surface itself. This addresses DMM’s compatibility issues and imposes relatively low additional overhead on the CPU, but it does not perform actual geometric compression, meaning its memory requirements are extremely high compared to both DMM and DGF.
A simple comparison of the situation:
DMM: very limited surface compatibility, high CPU overhead, extremely low memory usage
Mega Geometry: full surface compatibility, moderate CPU overhead, very high memory usage
DGF: full surface compatibility, low CPU overhead, low memory usage"
DMM is well ahead of DGF in compression factor when it works, but its limited usability explains the deprecation. DMM's memory usage is also superior to RTX Mega Geometry's, which relies on raw uncompressed triangle clustering.
"One more point: while DMM and DGF target similar problem areas and therefore cannot really be used alongside each other, DGF and Mega Geometry are not direct replacements and these can actually complement each other, as they approach the situation differently. A Mega Geometry-like solution can work on DGF surfaces, so it is likely that both will eventually be utilized together, as they work very well side by side. It is very easy to implement a Mega Geometry-like solution in hardware, while DGF can be emulated via compute shaders, which is important, because a DGF-like decompression engine can be extremely complex in hardware, so it will take a while for Intel and NVIDIA to implement it."
DGF and RTX Mega Geometry are complementary. DGF + RTX MG = Goated RT
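For intuition on what "scalable, lossy compression for meshlets" means, here's a toy quantizer: store vertex positions as fixed-point offsets from a per-meshlet anchor. The bit width, anchor choice, and error bound are my own illustrative assumptions; the real DGF bitstream layout is more sophisticated.

```python
# Toy DGF-flavored meshlet compression: quantize vertex positions to a
# fixed-point grid relative to a per-meshlet anchor. BITS, the anchor choice,
# and the error bound are illustration-only assumptions, not the real format.

BITS = 8  # fixed-point bits per coordinate in this toy

def compress_meshlet(verts):
    """verts: list of (x, y, z) floats -> (anchor, scale, quantized ints)."""
    anchor = [min(v[i] for v in verts) for i in range(3)]
    spans = [max(v[i] for v in verts) - anchor[i] for i in range(3)]
    scale = [max(s / (2**BITS - 1), 1e-12) for s in spans]
    quantized = [tuple(round((v[i] - anchor[i]) / scale[i]) for i in range(3))
                 for v in verts]
    return anchor, scale, quantized

def decompress_meshlet(anchor, scale, quantized):
    return [tuple(anchor[i] + q[i] * scale[i] for i in range(3))
            for q in quantized]

verts = [(0.0, 0.0, 0.0), (1.0, 0.25, 0.5), (0.5, 1.0, 0.125)]
anchor, scale, quantized = compress_meshlet(verts)
out = decompress_meshlet(anchor, scale, quantized)

# Lossy but bounded: per-axis error is at most half a grid step.
err = max(abs(a - b) for v, w in zip(verts, out) for a, b in zip(v, w))
print(err <= max(scale) / 2 + 1e-9)  # True
```

This is why the geometry stays directly usable (unlike a DMM displacement map) while memory drops: each coordinate shrinks from a 32-bit float to a handful of bits plus a shared anchor/scale per meshlet.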
> Zen 7 (to counter Intel)

Intel has to counter Zen6 first, NVL ain't it.
> Intel has to counter Zen6 first, NVL ain't it.

Umm, in terms of gaming it has to beat Zen5X3D too. Forget Zen6 lol
> Xcloud = AT0

There is a gaming AT0, with a lower CU count.
Huge AT0 news

Round-up of a few news stories that could impact future RDNA 5 (2027 to 2029):
1. Memory prices to stay high for next decade — https://www.club386.com/nand-shorta...e-of-ai-data-centre-demand-claims-phison-ceo/
2. Xbox raises Game Pass prices just as it adds the most games ever, and adds cloud gaming to all console tiers. (Nadella was earlier head of Azure.) https://insider-gaming.com/microsoft-tries-to-defend-xbox-game-pass-price-increase/
3. Strix Halo (Zen 5 + RDNA 3.75) scales linearly from 25 watts to 45 watts, raising hope that Medusa Halo (Zen 6 + AT3) will scale well from 15 watts itself
4. Current leaked RDNA 5 line-up:
   1. Series S / handheld / Z3 Extreme = AT4 in Medusa Premium
   2. Halo tablets / NUC / all-in-one / AI box = AT3 in Medusa Halo
   3. Xbox = AT2 in Magnus
   4. scrapped? = AT1 (6080 competitor)
   5. Xcloud = AT0
> Strix Halo (Zen 5 + RDNA 3.75) scales linearly from 25 watts to 45 watts

Idk what you snorted, but mdsH is an 85 W part.
> Idk what you snorted, but mdsH is an 85 W part.

Buy 2 extra battery packs with the 2028 version of GPD Win 5 😉