
Question Intel Celestial XE3 discussion - not dead yet


Timestamp 20:40

Tom Peterson is asked why Arc Battlemage doesn't have SIMT, and he hints that a future Xe architecture (Xe3? Xe4?) might add it.
 
What are some GPU architectural upgrades that might come to Xe3/Celestial?

1. Shader Execution Re-ordering
2. Mesh Nodes and Work Graphs
3. SIMT architecture
Maybe do like Nvidia did on Ampere and add FP to the INT pipe?
 
So the whole confusion is because Celestial dGPUs will be using Xe⁴, right? That's why some say Celestial dGPU is canceled, they only got the information that Xe³ dGPUs are canceled, so the conclusion was Celestial=canceled.

In reality they probably keep the two-year rhythm, and Celestial dGPUs are probably targeting late 2026/early 2027 with Xe⁴, while Xe³ is APU-only starting H2 2025.
 
Too early for Xe4. Nova Lake and Wildcat Lake are on Xe3 and, according to Exist50, Razor Lake as well. Alchemist dGPU to Battlemage dGPU was a 2.5-year cadence (A380 launched on June 14th, 2022), with mobile versions even earlier. Celestial to Druid sounds like a bigger redesign.
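For anyone who wants to check the 2.5-year figure, here is a quick sketch of the arithmetic. The A380 date is the one given above; the Battlemage date (Arc B580 retail availability on December 13th, 2024) is an assumption added for illustration and is not stated in the post.

```python
from datetime import date

# A380 launch date as given in the post.
a380_launch = date(2022, 6, 14)
# Assumption (not from the post): Arc B580 retail launch date.
b580_launch = date(2024, 12, 13)

gap_days = (b580_launch - a380_launch).days
gap_years = gap_days / 365.25  # average year length, leap years included

print(f"{gap_days} days, about {gap_years:.1f} years")
```

Under those two dates the gap comes out at roughly two and a half years, which matches the cadence quoted above.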
 
Too early for Xe4. Nova Lake and Wildcat Lake are on Xe3 and according to Exist50 Razor Lake as well.
But these are all iGPUs. Strix Halo with RDNA3.5 will launch on the same day as Desktop RDNA4, so what stops Intel from launching Xe⁴ Desktop Cards in early 2027 while iGPUs are using Xe³?
 
Lack of development time and budget stops them. Two generations within 2.5 years is not realistic. If it's Xe4, it's called Druid and not Celestial. Intel favours iGPUs, whereas AMD usually favours dGPUs.
 
Xe3 with a larger cache? If each bit controls two banks instead of one, and the number of bits did not decrease, this could translate to a potentially doubled L3 cache size if fully enabled. Or it is just a different organization.
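A rough sketch of that reasoning, assuming the bits form a bank-enable mask. The mask width and bank size below are hypothetical placeholders, not figures for any real Xe3 part.

```python
def enabled_cache_kib(enable_mask: int, banks_per_bit: int, bank_kib: int) -> int:
    """Capacity implied by a bank-enable mask: set bits x banks per bit x bank size."""
    enabled_bits = bin(enable_mask).count("1")
    return enabled_bits * banks_per_bit * bank_kib

# Hypothetical values, for illustration only.
MASK = 0xFFFF    # 16 enable bits, all set ("fully enabled")
BANK_KIB = 512   # assumed size of one bank in KiB

one_bank_per_bit = enabled_cache_kib(MASK, 1, BANK_KIB)
two_banks_per_bit = enabled_cache_kib(MASK, 2, BANK_KIB)

print(one_bank_per_bit, two_banks_per_bit, two_banks_per_bit / one_bank_per_bit)
```

With the same number of bits, doubling the banks per bit doubles the capacity only if the bank size itself stays the same. If the banks were halved at the same time, it would just be a different organization of the same cache.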

 
Tom Peterson is asked why Arc Battlemage doesn't have SIMT, and he hints that a future Xe architecture (Xe3? Xe4?) might add it.
Since when did Intel not have SIMT? Up until Broadwell graphics, they were using a hybrid SIMD/SIMT architecture and switched between the two on the fly. On Skylake they moved to SIMT and saved transistors on it. If they are not on SIMT anymore, it means they would have changed back. SIMT is what GPUs use and SIMD is what CPUs use.
 
post the patent news in this thread @MrMPFR
Think I might have found Xe3P's RT core design. This guy seems to be the HW RT architect.
  1. Apparatus and method for manageable fragmented acceleration structures
  2. Apparatus and method for implementing a bounding volume hierarchy with oriented bounds using quantized shared orientations
  3. Apparatus and method for using multiple bounds for child nodes in a bounding volume hierarchy
  4. Apparatus and method for block-friendly ray traversal
  5. Apparatus and Method for Extended Cache Control for Workloads using Temporary or Scratch Memory Space
  6. Apparatus and method for throttling ray tracing operations based on cache hit rate

Haven't had the chance to read the patents properly. @basix what do you think?
 
Will go through. Are you sure these are for Xe3P and not Xe4P (or Xe5P etc.)?

There is a rumour from the dark satanic interwebs that Xe3P will be purely iGPU-based but Xe4P could come as discrete GPUs.
 
Patent contributions:
  1. BVH consisting of fragmented, independently managed chunks that are very flexible. Mechanism for tracking, updating and traversing across fragments.
  2. OBBs with a shared, quantized orientation set. Reining in bounding box orientations makes them compressible and reduces overhead.
  3. Union-bounded child nodes (Children 0...N/box). Tight coverage of complex/diagonal/curved geometry without exploding node count.
  4. HW-accelerated BVH node block allocation (co-located parent and child nodes). Strict policy to reduce traversal misses and pointer chasing.
  5. Per-workload dirty-state cache control for workloads flooding into the scratchpad. Avoids redundant writebacks and reloads.
  6. Throttling ray dispatch operations based on cache hit rate for RT memory operations such as BVH node loads, ray state loads, stack accesses etc. Avoids cache thrashing.
While this is not RDNA5-level RT progress, it's still significant. Should help either Xe3P (most likely) or later Intel GPUs run ray tracing a lot faster.
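To make item 6 a bit more concrete, here is a toy software model of hit-rate-based throttling. It only illustrates the general idea as summarized above and is not the mechanism claimed in the patent. The class name, window size, thresholds and halving/doubling policy are all made up for the sketch.

```python
from collections import deque

class RayDispatchThrottle:
    """Toy model: shrink the ray dispatch budget when the recent cache hit rate is low."""

    def __init__(self, window=64, low=0.5, high=0.8, min_rays=4, max_rays=32):
        self.samples = deque(maxlen=window)  # recent hit (1) / miss (0) outcomes
        self.low = low            # below this hit rate, back off (hypothetical threshold)
        self.high = high          # above this hit rate, ramp back up (hypothetical threshold)
        self.min_rays = min_rays
        self.max_rays = max_rays
        self.budget = max_rays    # rays allowed in flight per dispatch step

    def record(self, hit: bool) -> None:
        """Log one RT memory access (BVH node load, ray state load, stack access...)."""
        self.samples.append(1 if hit else 0)

    def hit_rate(self) -> float:
        if not self.samples:
            return 1.0  # no data yet, so do not throttle
        return sum(self.samples) / len(self.samples)

    def update_budget(self) -> int:
        """Halve the budget when thrashing, double it when the cache is healthy again."""
        rate = self.hit_rate()
        if rate < self.low:
            self.budget = max(self.min_rays, self.budget // 2)
        elif rate > self.high:
            self.budget = min(self.max_rays, self.budget * 2)
        return self.budget

throttle = RayDispatchThrottle(window=8)
for _ in range(8):
    throttle.record(False)           # a burst of misses
print(throttle.update_budget())      # budget is halved
for _ in range(8):
    throttle.record(True)            # window refills with hits
print(throttle.update_budget())      # budget recovers
```

The point of the feedback loop is that fewer rays in flight means a smaller working set of BVH nodes and ray state competing for cache, which is how throttling can avoid thrashing instead of just lowering throughput.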
 