News [Toms] AMD multi chiplet gpu patent

gorobei · Jan 8, 2021

AMD Patents Chiplet Design To Build Colossal GPUs

After chiplet-based CPUs, chiplet-based GPUs make sense, right?

www.tomshardware.com

actual patent

https://www.freepatentsonline.com/20200409859.pdf

Given the rumors that Intel and Nvidia are delaying multi-chiplet designs for consumer parts, this looks like a server product for AMD.

TnTecks · Jan 8, 2021

That's exactly where it would make the most sense!

thesmokingman · Jan 9, 2021

It was only a matter of time...

GodisanAtheist · Jan 9, 2021

If I'm reading that patent right, it looks like AMD's design sort of mimics what they're doing with their Zen chiplets, with a master/IO chiplet that is ganged together with a set of subchiplets to do the actual processing.

As was mentioned, this looks like a fine method of handling compute and machine learning, but the data latency issues would tank any sort of graphics processing workload.

Curious to see how much Infinity Cache figures into hiding latency issues in this design.

Greenman · Jan 9, 2021

A couple of YouTube reviewers are claiming this will be a whole new level of performance for GPUs.

Bigos · Jan 9, 2021

From the looks of things, the chiplets will be symmetrical (given how they are meant to sit closely together in a square formation in a 4-chiplet configuration). One will be the primary one simply because it is connected directly to the PCIe bus, and the rest need to be accessed through the primary one. I wonder if that means there will be more duplicated structures than usual (like multiple VCN blocks).

Regarding latency, it is pretty much not an issue in GPU workloads with very long pipelines. Using a tile-based rasterizer will also make it possible to subdivide the geometry between the chiplets to reduce the inter-chiplet bandwidth requirements. However, things like texture/geometry access might need to go through the interposer, as each chiplet will contain its own memory bank (something akin to NUMA or Zen 1 EPYC). I wonder how much bandwidth the interposer solution will allow for. It was evidently not possible on an organic substrate (like EPYC and Ryzen 3000+ use), as it would probably burn too much power for the required bandwidth.
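To make the tile-splitting idea concrete, here is a minimal sketch of binning triangles into screen tiles and handing each tile to a chiplet. Nothing here comes from the patent: the tile size, the 2x2 quadrant mapping, and the names `tile_owner`, `tiles_touched` and `bin_triangles` are all assumptions made up for illustration.

```python
from collections import defaultdict

TILE = 64               # tile edge in pixels (assumed value)
GRID_W, GRID_H = 8, 8   # 8x8 tiles, i.e. a 512x512 render target (assumed)


def tile_owner(tx, ty, chiplets_x=2, chiplets_y=2):
    """Map a tile to a chiplet by screen quadrant (hypothetical policy)."""
    cx = tx * chiplets_x // GRID_W
    cy = ty * chiplets_y // GRID_H
    return cy * chiplets_x + cx


def tiles_touched(tri):
    """Tiles overlapped by the triangle's bounding box (conservative test)."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    tx0 = max(0, int(min(xs)) // TILE)
    tx1 = min(GRID_W - 1, int(max(xs)) // TILE)
    ty0 = max(0, int(min(ys)) // TILE)
    ty1 = min(GRID_H - 1, int(max(ys)) // TILE)
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


def bin_triangles(tris):
    """Return {chiplet_id: set of triangle indices that chiplet must process}."""
    bins = defaultdict(set)
    for i, tri in enumerate(tris):
        for tx, ty in tiles_touched(tri):
            bins[tile_owner(tx, ty)].add(i)
    return bins


tris = [
    ((10, 10), (50, 10), (10, 50)),        # fits in one tile on chiplet 0
    ((300, 300), (400, 300), (300, 400)),  # stays inside chiplet 3's quadrant
    ((240, 10), (270, 10), (250, 40)),     # straddles the chiplet 0/1 boundary
]
bins = bin_triangles(tris)
```

Only the third triangle lands in two bins, so only geometry that crosses a chiplet boundary has to be duplicated or shared over the interposer. Texture fetches are a separate question, since any tile can sample memory attached to another chiplet.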

Dribble · Jan 12, 2021

Bigos said:
From the looks of things, the chiplets will be symmetrical (given how they are meant to sit closely together in a square formation in a 4-chiplet configuration). One will be the primary one simply because it is connected directly to the PCIe bus, and the rest need to be accessed through the primary one. I wonder if that means there will be more duplicated structures than usual (like multiple VCN blocks).

Regarding latency, it is pretty much not an issue in GPU workloads with very long pipelines. Using a tile-based rasterizer will also make it possible to subdivide the geometry between the chiplets to reduce the inter-chiplet bandwidth requirements. However, things like texture/geometry access might need to go through the interposer, as each chiplet will contain its own memory bank (something akin to NUMA or Zen 1 EPYC). I wonder how much bandwidth the interposer solution will allow for. It was evidently not possible on an organic substrate (like EPYC and Ryzen 3000+ use), as it would probably burn too much power for the required bandwidth.

The way to reduce the latency and inter-chip transfer issues is to optimise for them. To do that you'd need to code your game renderer with that in mind. The easiest place to do this would be the GPU's drivers, but that's only possible with a high-level driver (something AMD killed when they helped usher in the current age of low-level drivers). With a low-level driver it becomes, to a much greater extent, the game dev's problem. As we know, game devs really aren't keen on this sort of thing as it's too much work (see the death of CrossFire and SLI, which died because making them work moved from being a driver problem to being the game devs' problem, and the devs just weren't interested).
Even if AMD (and of course Nvidia and Intel) provided libraries to help support this, they'd all be different. Unlike high-level drivers, which all support the same unified interface, these would all have completely different interfaces. It would just be a ton of work for the devs to sort out, which they won't be keen to do.

Bigos · Jan 12, 2021

At the very least, Vulkan provides facilities to program for tile-based renderers by partitioning render passes into subpasses. That should make it possible to reduce the inter-tile dependencies and allow tiles to be scheduled onto different chiplets. This should be done transparently in the driver.

CrossFire/SLI worked with very little communication between the GPUs. Both the dedicated bridges and the PCIe interconnect provide an order of magnitude less bandwidth than even the GPU memory bus, let alone the on-chip cache interconnects. The solution described in the patent seems to imply this won't be an issue anymore, thanks to a high-performance interconnect through a passive interposer.
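As a rough sanity check on the "order of magnitude" gap, the arithmetic can be sketched as below. The two configurations are assumptions picked for illustration, not figures from the patent or the article: a PCIe 3.0 x16 link, and a 256-bit GDDR6 bus at 16 Gbps per pin.

```python
# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding, 16 lanes, one direction.
pcie3_x16_gbs = 8e9 * (128 / 130) * 16 / 8 / 1e9   # GB/s

# GDDR6 at 16 Gbps per pin on a 256-bit bus (assumed example configuration).
gddr6_256bit_gbs = 16e9 * 256 / 8 / 1e9            # GB/s

ratio = gddr6_256bit_gbs / pcie3_x16_gbs

print(f"PCIe 3.0 x16: {pcie3_x16_gbs:.2f} GB/s per direction")
print(f"GDDR6 256-bit @ 16 Gbps: {gddr6_256bit_gbs:.0f} GB/s")
print(f"memory bus / PCIe link: {ratio:.1f}x")
```

With these assumed numbers the memory bus comes out roughly 30x faster than the PCIe link, which fits the point that AFR-style multi-GPU could not afford to share much data between cards.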

The patent also says that the solution is meant to appear as a "single GPU" to the CPU. It remains to be seen whether that means the solution will be 100% hardware/firmware based or whether the driver will be involved. I don't think the game engine will have to be involved, though, unless there is a "NUMA" mode or something like that.

gorobei · Jan 12, 2021

AMD's work on active interposers (ButterDonut) suggests they can get the bandwidth needed, but the 'single GPU' part may be why they need Xilinx FPGAs, which would let them patch the logic later rather than having to get it perfect in hardware from the start.

soresu · Jan 14, 2021

gorobei said:
but the 'single GPU' part may be why they need Xilinx FPGAs

Nah, FPGA tech acquisition is just about diversifying and remaining competitive with Intel.

Not counting ML optimised HW in CDNA (or splitting compute/gaming uArch), the Xilinx acquisition represents the first major diversification AMD has made in years.

If they had not been in dire straits financially for at least half a decade, they might have bought into FPGA tech much sooner. As it is, they had to prioritise belt tightening and core uArch R&D to keep going. Now they have regained a lot* of investor confidence and can afford to splurge on new investments.

Right now Qualcomm has bought Nuvia, I believe in response to Samsung/RDNA as much as for its competitive CPU possibilities. It remains to be seen just how well prepared Qualcomm's Adreno team was for the last couple of years' DX12 Ultimate and Vulkan RT features, so they may be facing uncertain times unless they also cut a deal with AMD (or IMG Tec for Wizard/Caustic RT licensing).

*if in part due to Intel's process foibles vs their fabless model.
