Discussion General GPU µArch Research Thread

soresu · Feb 5, 2025

Hi all, thought I'd start some new threads just for research on µArchs, with this one specific to GPUs alone.

soresu · Feb 5, 2025

First up, a paper about GPU out of order execution:

GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction

https://ieeexplore.ieee.org/document/10609594

soresu · Mar 4, 2025

Another paper I found about Out of Order execution on GPUs:

TURBULENCE: Complexity-Effective Out-of-Order Execution on GPU With Distance-Based ISA

TURBULENCE: Complexity-Effective Out-of-Order Execution on GPU With Distance-Based ISA | IEEE Computer Architecture Letters

A graphics processing unit (GPU) is a processor that achieves high throughput by exploiting data parallelism. We found that many GPU workloads also contain instruction-level parallelism that can be extracted through out-of-order execution to provide ...

dl.acm.org

soresu · Jan 12, 2026

Anybody got anything to add to this thread?

imma go lookin for some research papers....

MrMPFR · Jan 12, 2026

soresu said:
Anybody got anything to add to this thread?

imma go lookin for some research papers....

There are many interesting papers listing Gabriel H. Loh as an author that cover pretty much all bases. I'll let you decide which ones to add: https://scholar.google.com/citations?hl=en&user=e_D2XsUAAAAJ&view_op=list_works&sortby=pubdate

Should we keep this thread limited to research papers or include patents as well?

MrMPFR · Jan 12, 2026

PDF download options for post #2 and #3 papers:

#2/GhOST: https://liberty.cs.princeton.edu/Publications/isca24_ghost.pdf
#3/TURBULENCE (looks like an early preprint): https://past.date-conference.com/proceedings-archive/2023/DATA/1031.pdf

I couldn't find a percentage area overhead for TURBULENCE; the paper only said the scheduler was 4X larger. GhOST had a tiny 0.007% area overhead.
The speed-ups are impossible to compare because the benchmark applications are apples to oranges.

MrMPFR · Feb 1, 2026

Hardware Acceleration of Neural Graphics

Rendering and inverse-rendering algorithms that drive conventional computer graphics have recently been superseded by neural representations (NR). NRs have recently been used to learn the geometric and the material properties of the scenes and use the information to synthesize photorealistic...

arxiv.org

The paper proposes a neural graphics ASIC engine called the NGPC, or Neural Graphics Processing Cluster. It consists of Neural Field Processors (NFPs). These consist of two stages that are fully fused:
#1 hashgrid encoders (Encoding Engines)
#2 specialized MLP calculations (MLP Engine)
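
To make the two stages a bit more concrete, here is a minimal pure-Python sketch of a hash-grid encoding feeding a tiny MLP. It loosely follows the Instant-NGP style of multiresolution hash encoding in 2D and is not the NGPC hardware design from the paper. All names (`encode`, `mlp`, `make_model`, `neural_field`), the table sizes, the layer widths and the random weights are assumptions for illustration only.

```python
import math
import random

# Per-dimension hash multipliers in the style of Instant-NGP's spatial hash.
# Assumption: plain Python ints are used, so there is no 32-bit wraparound here.
PRIMES = (1, 2654435761)


def hash_corner(ix, iy, table_size):
    """Map an integer grid corner to an index in the feature table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % table_size


def encode(x, y, tables, base_res=4, growth=2.0):
    """Stage 1: multiresolution hash-grid encoding of a 2D point in [0, 1)^2."""
    features = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)  # grid resolution at this level
        gx, gy = x * res, y * res
        ix, iy = int(math.floor(gx)), int(math.floor(gy))
        fx, fy = gx - ix, gy - iy
        # Bilinear interpolation of the feature vectors at the 4 cell corners.
        acc = [0.0] * len(table[0])
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            for dy, wy in ((0, 1.0 - fy), (1, fy)):
                entry = table[hash_corner(ix + dx, iy + dy, len(table))]
                for k, v in enumerate(entry):
                    acc[k] += wx * wy * v
        features.extend(acc)  # concatenate the features of all levels
    return features


def mlp(features, layers):
    """Stage 2: small fully connected network with ReLU on hidden layers."""
    h = features
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def make_model(levels=4, table_size=64, feat_dim=2, hidden=8, out_dim=3, seed=0):
    """Build random (untrained) feature tables and MLP weights."""
    rng = random.Random(seed)
    tables = [[[rng.uniform(-1, 1) for _ in range(feat_dim)]
               for _ in range(table_size)] for _ in range(levels)]
    dims = [levels * feat_dim, hidden, out_dim]
    layers = []
    for d_in, d_out in zip(dims, dims[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
        layers.append((weights, [0.0] * d_out))
    return tables, layers


def neural_field(x, y, tables, layers):
    """Both stages back to back: encode the point, then run the MLP."""
    return mlp(encode(x, y, tables), layers)
```

Calling `tables, layers = make_model()` and then `neural_field(0.3, 0.7, tables, layers)` returns a 3-element output for that point. The stage 1 table lookups are scattered memory reads while stage 2 is dense multiply-accumulate work, which is presumably why the two get separate engines.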

This engine is situated alongside the GPCs with a local 1MB scratchpad and is to be used for ALL Neural Shading MLP calculations.

The area and power overhead is substantial, but at +4.52% area and +2.75% power an NGPC-8 configuration could be doable.

Massive speedups were observed across NeRF, NSDF, NVR, and GIA. Based on a modified 3090 design, that works out to 4K at 30 FPS for NeRF and 8K at 120 FPS for the other applications.
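
For a rough sense of scale between those two targets, here is a quick back-of-the-envelope calculation. It assumes 4K means 3840x2160 and 8K means 7680x4320, which is my assumption and not stated above, and the function name `pixels_per_second` is just for illustration.

```python
def pixels_per_second(width, height, fps):
    """Pixels that must be produced per second at a given resolution and frame rate."""
    return width * height * fps


# Assumed resolutions: 4K = 3840x2160, 8K = 7680x4320.
nerf_rate = pixels_per_second(3840, 2160, 30)     # 248,832,000 pixels/s
other_rate = pixels_per_second(7680, 4320, 120)   # 3,981,312,000 pixels/s

print(other_rate / nerf_rate)  # 16.0
```

Under those assumptions the 8K 120 FPS target is 16 times the pixel throughput of 4K 30 FPS.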

I know the thread says GPU only, but this is a design for augmenting a GPU with an ASIC, similar to augmenting shaders by adding RT cores and other previous additions, so nothing new. Let me know if I need to repost it somewhere else.

I've taken a brief look at the rest of the literature, with things such as Instant-NGP, and it looks like everyone is trying to avoid MLPs like the plague and find other approaches that are faster and use fewer resources. MLPs are reliable and established but not particularly efficient, to say the least.

So the paper is interesting, but as for an actual HW implementation I would not bet on it. If possible, SW is always king.
