Question Speculation: RDNA2 + CDNA Architectures thread


uzzi38

Platinum Member
Oct 16, 2019
2,635
5,983
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

Saylick

Diamond Member
Sep 10, 2012
3,170
6,403
136
VCZ - RDNA 2 Press Deck (Transcript)

AMD RDNA 2 ARCHITECTURE DESIGNS GOALS

  • Pushing performance with higher frequencies
  • New levels of power efficiency with AMD Infinity Cache
  • Designed with features for gamers
PRODUCT DESIGN GOALS

  • Engineering – Exceptional thermals, PCB, and electrical
  • Platform – Built with the entire PC platform in mind
  • Experience – Tangible benefits for end-users
THE ROAD TO POWER EFFICIENCY
Achieving an average of 4.1X perf/watt with AMD RDNA2

[ graph where R9 290X is 1x, RX 6800 XT is 4.1x ]

EXCEPTIONAL THERMAL DESIGN

  • Extended vapor chamber for maximum performance
  • Graphite thermal interface material on GPU for high performance and maximum reliability
  • Die-cast aluminum frame for structural rigidity
  • High-performance, ultra-soft gap pads for efficient GDDR6 and MOSFET cooling
  • Zero RPM fan mode for silent operation during light workloads
  • Custom-designed axial fans for outstanding cooling and quiet operation
  • Premium die-cast aluminum shroud
PREMIUM PCB | INNOVATIVE ELECTRICAL

  • HDMI 2.1 with FRL
  • USB Type-C
  • Low PCIe slot peak currents
  • Premium IT-170 material
  • 15 high efficiency power-stages phases
  • Standard edge location of power connectors
  • RGB Control [header]
  • 14-layer high performance PCB with 4 layers of 2 oz. copper for exceptional power delivery
MEMORY POWER PHASE COUNTS
High performance, low power

  • RX 6800 XT: 2 power phases, 8 memory devices
  • RTX 3090: 4 power phases, 24 memory devices
  • RTX 3080: 3 power phases, 10 memory devices
PLATFORM: BUILT FOR STANDARDS
Enabled by exceptional engineering

[ A render with RX 6800 air-flow in chassis, similar to the famous RTX 30 air flow render ]

  • STANDARD Air flow for push-pull chassis configuration
  • STANDARD Enthusiast power draw for simple upgrades (RX 6800: 650W min, RX 6800XT: 750W min PSU)
  • STANDARD Power connector and location for clean cable management
DESIGNED WITH PARTNERS IN MIND
Enabling broad ecosystem and platform partnership

  • STANDARD SIZE – A 2 to 2.5 slot form factor enables seamless integration into existing chassis and partners systems
  • STANDARD PCB FORM FACTOR – A common design language suited for after-market cooling including AIO liquid cooling casing
  • STANDARD POWER – Suited for operation with existing enthusiast PSUs starting at 650W
EXPERIENCE – PHENOMENAL ACOUSTICS
Enabled by custom fan design and extended vapor chamber

  • Radeon RX 6800 XT 6 dBA quieter than Radeon RX 5700 XT
  • 70% less perceived noise with Radeon RX 6800 XT (compared to the Radeon RX 5700 XT at 35C intake)
LOW POWER IDLE AND FAST WAKE-UP
Enabled by system-level power management innovations

  • Low power graphics off – 0.54X power – monitor idle vs RX 5700XT
  • Display – 850ms monitor wake-up from long idle
EXCELLENT OVERCLOCKING
Extra performance on Radeon RX 6800 XT

  • 14-layer premium PCB – 4 layers of 2 ounces of copper for overclocking stability
  • 15 power stage phases – High efficiency power stages for clean voltage draw
  • Exceptional cooling – Extra thermal and acoustics margin built-in
AMD RADEON SOFTWARE
PERFORMANCE TUNING PRESETS
Simple, one-click custom power tuning modes to improve performance or save power

BENEFITS

  • QUIET – Reduces power and fan noise for cool & quiet operation with little impact on performance
  • BALANCED – Default power levels
  • RAGE MODE – Takes advantage of any extra headroom on the GPU to deliver the ultimate gaming performance
Radeon RX 6800 XT Preset | Game Clock | Boost Clock
QUIET                    | 1950 MHz   | up to 2185 MHz
BALANCED                 | 2015 MHz   | up to 2250 MHz
RAGE                     | 2065 MHz   | up to 2310 MHz
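
To put the presets in proportion, here is a small sketch that expresses each one as a percentage change against BALANCED. The clock figures are the ones from the table above; the function name is just illustrative.

```python
# Game/boost clocks in MHz per preset, copied from the table above.
PRESETS = {
    "QUIET": (1950, 2185),
    "BALANCED": (2015, 2250),
    "RAGE": (2065, 2310),
}


def uplift_vs_balanced(preset):
    """Return (game, boost) clock change in percent relative to BALANCED."""
    base_game, base_boost = PRESETS["BALANCED"]
    game, boost = PRESETS[preset]
    return (
        100.0 * (game - base_game) / base_game,
        100.0 * (boost - base_boost) / base_boost,
    )


if __name__ == "__main__":
    for name in PRESETS:
        game_pct, boost_pct = uplift_vs_balanced(name)
        print(f"{name:9s} game {game_pct:+.1f}%  boost {boost_pct:+.1f}%")
```

On these numbers Rage Mode moves the rated clocks by roughly 2.5 to 2.7 percent, so any real-world gain would presumably be of that order at best.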
INTRODUCING AMD FidelityFX Super Resolution

  • Currently in development at AMD
  • Stay tuned for more information as we collaborate with game developers
RASTERIZATION VS RAY TRACING

RASTERIZATION

  • Traditional path for real-time graphics rendering
  • Fast & Flexible
  • Can look very, very good, but results not “perfect”
    • Trade-offs between performance and quality are the norm
RAY TRACING

  • Ultimate solution to recreating reality in games
  • High performance cost
  • Typically reserved for offline rendering
RAY-TRACING ACCELERATION
Changes the game

  • As rasterization becomes more capable and complex, its performance cost grows
  • In some cases, tracing rays becomes a reasonable trade-off for improved image quality
  • Hardware acceleration of ray tracing makes some ray-traced effects feasible now
SELECTIVE RAY-TRACED EFFECTS ARE NOW POSSIBLE

  • Developers can judiciously deploy ray tracing to improve realism in their games
  • Real-time ray tracing will involve quality and performance tradeoffs
  • Developers are still learning about how best to use ray-traced effects in combination with rasterization
COMMON USES OF RAY TRACING IN HYBRID RENDERING

REFLECTIONS

  • Can show reflections of objects not currently on-screen, which rasterized reflections typically miss
  • Fallback option: FidelityFX Screen Space Reflections
SHADOWS

  • Replaces often incredibly complex shadow volume implementations with higher-quality results
AMBIENT OCCLUSION

  • More accurately renders the finer detail of light and shadow, especially in the nooks and crannies of indirectly lit areas
  • Fallback options: FidelityFX Ambient Occlusion
GLOBAL ILLUMINATION

  • Attempts to model the transport of light around a scene, especially diffuse reflections from object to object
INTRODUCING FIDELITYFX DENOISER

  • Tracing rays is computationally expensive, so ray-traced effects are typically sparsely sampled
  • The resulting ray-traced images include some visual noise
  • FidelityFX Denoiser removes this noise and produces a clean, clear image
OUR GOAL: ENABLING DEVELOPERS TO DELIVER ASTOUNDING EXPERIENCES

  • The AMD RDNA 2 architecture and its ray-tracing acceleration hardware will set the standard for the industry
  • AMD is working with developers to enable the use of ray-traced effects where they will have the best impact
  • The goal, as always, remains fast and fluid animation with compelling results
AMD RDNA 2 DEEP DIVE
AMD RDNA 2 ARCHITECTURE

Enthusiast gaming with performance-per-watt leadership

  • PERFORMANCE – Up to 2X AMD Radeon RX 5700 XT in Just Over One Year
  • EFFICIENCY – Up to 54% Performance-per-Watt Gains in Same Process Node
  • FEATURES – Deliver DX12 Ultimate Experience for Every Gamer
RDNA 2 GAMING ARCHITECTURE
MORE PERFORMANCE, LESS POWER

  • BREAKTHROUGH HIGH-SPEED DESIGN – High frequencies and superb efficiency
  • REVOLUTIONARY AMD INFINITY CACHE – 128MB cache with extreme bandwidth at lower power
  • ADVANCED FEATURES – DX12 Ultimate and support for DirectStorage API
NAVI21 GPU details

  • 7nm
    • 519.8 sqmm
    • 26.8 Billion Transistors
  • I/O
    • x16 PCIe Gen4
    • 256-bit GDDR6 @ 16 Gbps peak
  • Display Engine
    • HDMI 2.1, AMD FreeSync Technology, DSC, and VRR
    • Future Ready for up to 8K 120Hz
  • Multi-Media Engine
    • 8K AV1 Decode
    • High Quality 8K HEVC Encode Accelerator
    • H.265 B-frame support
  • Command Processors
    • Graphics Engine
    • 4 Async Compute Engine
  • Cache Hierarchy
    • 128MB AMD Infinity Cache
    • 4MB L2
    • 1MB Distributed L1
  • Up to 80 Compute Units
    • 5120 Stream Processors
    • 320 Texture Units
    • 80 Ray Accelerators
  • Geometry Processor
    • 8 Pre-Cull Prims/Cycle
    • 4 Post-Cull Prims/Cycle
  • RB+
    • 1024 HiZ Pixels/Cycle
    • 256 Depth Samples/Cycle
    • 128 Pixel Launch/Cycle
    • 128 32b Pixel color write/Cycle
    • 64 64b Pixel color write/Cycle
    • 64 Pixel color blend/Cycle
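
A quick sanity check on the list above, as a sketch: it divides the chip-wide unit counts by the CU count and works out peak memory bandwidth. It assumes the "256" on the GDDR6 line is a 256-bit bus width; the helper names are made up for illustration.

```python
# Figures copied from the Navi21 list above.
COMPUTE_UNITS = 80
STREAM_PROCESSORS = 5120
TEXTURE_UNITS = 320
RAY_ACCELERATORS = 80
DATA_RATE_GBPS = 16  # peak GDDR6 data rate per pin

# Assumption: the "256" on the GDDR6 line is the bus width in bits.
BUS_WIDTH_BITS = 256


def per_cu(total_units, compute_units=COMPUTE_UNITS):
    """Split a chip-wide unit count evenly across the compute units."""
    if total_units % compute_units != 0:
        raise ValueError("unit count does not divide evenly across CUs")
    return total_units // compute_units


def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak DRAM bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    print("stream processors per CU:", per_cu(STREAM_PROCESSORS))
    print("texture units per CU:", per_cu(TEXTURE_UNITS))
    print("ray accelerators per CU:", per_cu(RAY_ACCELERATORS))
    print("peak GDDR6 bandwidth (GB/s):",
          peak_bandwidth_gb_s(BUS_WIDTH_BITS, DATA_RATE_GBPS))
```

Under that assumption the raw GDDR6 bandwidth comes out at 512 GB/s, which is the baseline the Infinity Cache slides further down build on.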
BREAKTHROUGH HIGH-SPEED DESIGN

HIGH FREQUENCY IN THE DNA

  • Leverages world-class CPU design methodologies
  • Streamlined micro-architecture
PERFORMANCE-POWER SCALABILITY

  • Up to 1.3x frequency at the same power per CU
  • Up to 50% per CU power at the same frequency
PERFORMANCE-PER-WATT ACHIEVEMENT UP TO 54%

16% – DESIGN FREQUENCY INCREASE

  • Leverages CPU high frequency expertise
  • High speed performance libraries
  • Streamlined micro-architecture and design
  • Aggressive re-pipelined logic for speed
17% – CAC and Power Optimizations

  • Pervasive fine-grain clock gating
  • Clock tree splitting and gating
  • Redesigned for minimal data movement
  • Aggressive pipeline rebalancing
21% – Performance per Clock Enhancement

  • Infinity Cache amplified low latency/power bandwidth
  • TLD streamlined for latency reductions
  • Redesigned 32-bit pipe and included new HDR format
  • Optimized geometry distribution and tessellation
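
A short sketch of how the three contributions relate to the 54% headline. It only uses the percentages from the slide; whether AMD meant them to add or to compound is not stated in the deck, so both are computed.

```python
# Contributions in percent, copied from the slide above.
CONTRIBUTIONS_PCT = {
    "design frequency increase": 16,
    "CAC and power optimizations": 17,
    "performance per clock enhancement": 21,
}


def additive_gain_pct(parts):
    """Total gain if the contributions are simply summed."""
    return sum(parts)


def compounded_gain_pct(parts):
    """Total gain if each contribution multiplies the previous result."""
    factor = 1.0
    for pct in parts:
        factor *= 1.0 + pct / 100.0
    return (factor - 1.0) * 100.0


if __name__ == "__main__":
    parts = list(CONTRIBUTIONS_PCT.values())
    print(f"additive:   {additive_gain_pct(parts)}%")
    print(f"compounded: {compounded_gain_pct(parts):.1f}%")
```

The plain sum matches the 54% figure, while compounding would give about 64%, so the slide appears to be a breakdown of the headline number rather than three independent multipliers.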
THE ENHANCED AMD RDNA 2 COMPUTE UNIT

  • Streamlined for increased frequency and low power
  • Mixed Precision Operations for tensor math
  • Sampler feedback streaming and texture space shading
  • Ray Accelerator: 4 Box or 1 Triangle Intersection per cycle
OPERAND / RESULT | MODE            | OPS/CYCLE/CU
FP16/FP16        | Packed          | 256
FP16/FP32        | Mixed Precision | 256
FP32             | Native          | 128
FP64             | Native          | 8
Int64            | Native          | 32
Int32            | Native          | 128
Int16/Int16      | Packed          | 256
Int16/Int32      | Mixed Precision | 256
Int8/Int32       | Mixed Precision | 512
Int4/Int32       | Mixed Precision | 1024
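
The per-CU rates in the table turn into chip-level peak throughput once multiplied by CU count and clock. The sketch below does that for a few rows. Pairing 80 CUs with a 2250 MHz clock is an illustrative assumption taken from elsewhere in the deck, not a figure the table itself gives.

```python
# Ops per cycle per CU, copied from a few rows of the table above.
OPS_PER_CYCLE_PER_CU = {
    "FP16 packed": 256,
    "FP32": 128,
    "FP64": 8,
    "Int8 mixed": 512,
    "Int4 mixed": 1024,
}


def peak_tera_ops(compute_units, ops_per_cycle_per_cu, clock_mhz):
    """Peak throughput in tera-ops/s = CUs * ops/cycle/CU * cycles/s."""
    return compute_units * ops_per_cycle_per_cu * clock_mhz * 1e6 / 1e12


if __name__ == "__main__":
    # Assumption for illustration: full 80 CU chip at a 2250 MHz boost clock.
    cus, clock_mhz = 80, 2250
    for name, rate in OPS_PER_CYCLE_PER_CU.items():
        print(f"{name:12s} {peak_tera_ops(cus, rate, clock_mhz):7.2f} Tops/s")
```

These are theoretical peaks only; sustained rates depend on how long the card actually holds that clock.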
REDESIGNED RB+
DESIGNED GROUND UP FOR FREQUENCY, POWER, AND EFFICIENCY

  • Each RB+ natively doubled the 32bpp color rate by processing eight 32-bit pixels per cycle.
  • The RB+ in conjunction with Rasterization expands Variable Rate Shading (VRS) results for 2×1, 1×2, 2×2 modes to the destination surface.
AMD RDNA 2 MESH SHADING

Mesh shaders process workgroups of primitives

  • A geometry front-end with the flexibility of GPU Compute
Shader-based culling and work optimizations

  • Object ID, facedness, depth, occlusion
  • Bounding volume
  • LOD-based mesh determination
  • Custom vertex and geometry data de-composition
Data reuse

  • Vertex reuse on a workgroup scale
Optimized Computation

  • Attribute shading only for primitives that are not culled
  • Particle system physics + mesh in the same shader
AMD RDNA 2 SAMPLER FEEDBACK
Sampler feedback supports both advanced streaming and next-generation rendering

Advanced streaming

  • Memory footprint optimization
  • Texture filtering constrained to resident mipmap levels
  • Asynchronous updates of resident texture data
Texture space rendering

  • Identification of texture locations used in rasterization
  • Feedback data to optimize shading workloads
AMD RDNA 2 RAYTRACING

  • Dynamic Global Illumination
  • Ray-traced soft shadows from area lights
  • Hybrid reflections mixing compute and screen-space effects with full raytracing
AMD RDNA 2 RAYTRACING

  • 4 Ray/Box Intersections processed per CU per clock
  • 1 Ray/Triangle Intersection processed per CU per clock
  • AMD RDNA 2 implements a high-performance ray tracing intersection acceleration architecture
    • The Ray Accelerator handles intersection of rays with the BVH, and sorting of ray intersection times
  • It provides an order of magnitude increase in intersection performance compared to a software implementation
  • Traversal of the BVH and shading of ray results is handled by shader code running on the Compute Units
  • AMD Infinity Cache can hold a very high percentage of the BVH working set, reducing intersection latency
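
For readers unfamiliar with what a ray/box test involves, here is a minimal software sketch using the textbook slab method, followed by a helper that tests a node's child boxes and sorts the hits by entry distance. This is not AMD's hardware implementation. The four-children-per-node layout is an assumption suggested by the "4 Ray/Box" figure, and all function names are hypothetical.

```python
import math


def ray_box_hit(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) if the ray hits the box, else None."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: miss unless the origin is inside it.
            if o < lo or o > hi:
                return None
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far


def sorted_child_hits(origin, direction, child_boxes):
    """Test every child box of one BVH node; return (t_near, index) nearest first."""
    hits = []
    for index, (box_min, box_max) in enumerate(child_boxes):
        hit = ray_box_hit(origin, direction, box_min, box_max)
        if hit is not None:
            hits.append((hit[0], index))
    return sorted(hits)


if __name__ == "__main__":
    # Assumed layout: one node with four child boxes.
    node = [
        ((-1, -1, 4), (1, 1, 5)),    # hit, farther away
        ((-1, -1, 1), (1, 1, 2)),    # hit, nearer
        ((2, 2, 1), (3, 3, 2)),      # off to the side, miss
        ((-1, -1, -3), (1, 1, -2)),  # behind the ray origin, miss
    ]
    print(sorted_child_hits((0, 0, 0), (0, 0, 1), node))
```

In the split described above, the loop that decides which child to visit next would be the shader-side traversal, while the box and triangle tests themselves are what the Ray Accelerator takes over.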
AMD RDNA VARIABLE RATE SHADING

  • AMD RDNA2 variable rate shading is designed to deliver the maximum usability and flexibility for developers
  • Fine-grained rate selection (per 8×8 pixels) makes it easier to select the appropriate shading rate for each region. Larger regions could cause more image quality or performance compromises.
  • AMD RDNA 2 supports coarse shading rates up to 2×2 with consistent and predictable performance improvements. Up to 4x improvements in effective shading throughput are attainable.
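
The "up to 4x" figure follows directly from the 2×2 rate: one shader invocation covers four pixels. The sketch below counts invocations for a frame split into 8×8 tiles, each with its own rate. The 4K resolution and the half-and-half rate map are made-up inputs for illustration.

```python
TILE = 8  # rate-selection granularity in pixels, per the deck


def shading_invocations(width, height, rate_for_tile):
    """Count pixel-shader invocations for one frame.

    rate_for_tile(tx, ty) returns (rx, ry), the coarse pixel size for the
    8x8 tile at tile coordinates (tx, ty); (1, 1) means full-rate shading.
    """
    if width % TILE or height % TILE:
        raise ValueError("sketch assumes dimensions are multiples of 8")
    total = 0
    for ty in range(height // TILE):
        for tx in range(width // TILE):
            rx, ry = rate_for_tile(tx, ty)
            total += (TILE // rx) * (TILE // ry)
    return total


if __name__ == "__main__":
    width, height = 3840, 2160
    full = shading_invocations(width, height, lambda tx, ty: (1, 1))
    coarse = shading_invocations(width, height, lambda tx, ty: (2, 2))
    # Hypothetical rate map: top half at full rate, bottom half at 2x2.
    mixed = shading_invocations(
        width, height, lambda tx, ty: (1, 1) if ty < 135 else (2, 2)
    )
    print("all 1x1:", full)
    print("all 2x2:", coarse, f"({full / coarse:.1f}x fewer)")
    print("mixed:  ", mixed, f"({full / mixed:.1f}x fewer)")
```

Real gains would be lower than the invocation ratio whenever the frame is not limited by pixel shading.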
AMD INFINITY CACHE BENEFITS

  • 1.3 pJ Infinity Cache Access vs 7-8 pJ GDDR6 Access (Average hit rates for 4K titles up to 58%)
  • AMD Infinity Cache unleashes the potential of high-frequency GPU
  • Performance gains from frequency are significantly amplified by the cache
  • Key to unlocking more power-efficient gaming performance
  • A larger configuration will generally mean higher latency (wasted power and lower performance)
  • But with Radeon RX 6800 XT we source most of our bandwidth from the AMD Infinity Cache with up to 48% lower latency than Radeon RX 5700 XT memory
  • With our higher AMD Infinity Fabric clock rates, even raw memory accesses are faster
  • Combined, we get 34% reduction in average latency for improved energy efficiency and performance
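
A rough way to read the energy figures above is a hit-rate-weighted average. The sketch below uses the 1.3 pJ and 58% numbers from the slide and takes 7.5 pJ as the midpoint of the 7-8 pJ GDDR6 range. It assumes a miss costs only the GDDR6 access and ignores the cache lookup itself, which is a simplification.

```python
CACHE_PJ = 1.3   # Infinity Cache access energy, from the slide
GDDR6_PJ = 7.5   # assumption: midpoint of the slide's 7-8 pJ range
HIT_RATE = 0.58  # "up to 58%" average hit rate at 4K, from the slide


def average_access_energy_pj(hit_rate, cache_pj=CACHE_PJ, dram_pj=GDDR6_PJ):
    """Hit-rate-weighted energy per access (simplified: a miss costs DRAM only)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return hit_rate * cache_pj + (1.0 - hit_rate) * dram_pj


if __name__ == "__main__":
    avg = average_access_energy_pj(HIT_RATE)
    saving = 100.0 * (1.0 - avg / GDDR6_PJ)
    print(f"average energy per access: {avg:.2f} pJ")
    print(f"saving vs GDDR6-only:      {saving:.0f}%")
```

Under those assumptions the average access costs a bit under 4 pJ, so the best-case hit rate roughly halves memory access energy.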
BANDWIDTH ON DEMAND
Cache boost clock for turbo-charged bandwidth

  • Games go through phases with widely varying bandwidth requirements
  • Since AMD Infinity Cache sources most bandwidth, power management can boost on-demand
  • Boost Infinity Fabric clock for up to a 550 GB/s BW increase when needed, save power when not
 

PhoBoChai

Member
Oct 10, 2017
119
389
106
Cannot wait to see Minecraft with RT on Series X (RDNA 1.8 or whatever it is). This was shown to media folks months ago and while obviously it would have made an amazing "launch" update I have yet to find any rumors on when it is actually showing up.

If you have any steam on this I would really like to chase it down :)

No info on that sorry.

Though I think when the reviews show up for 6800 series, ppl should look more carefully at RT performance in current DXR 1.0 games (RTX titles) and then compare to Dirt 5 that's using DXR 1.1, as that is the future of RT. Will be interesting if there's any difference.

It appears that RDNA 2 is more optimized for the DXR 1.1 spec with inline RT and BVH rebuilding, as MS announced the feature as they started talking about RDNA 2 in the Series X.
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,666
136
Anyone care to speculate on the benefits of DXR 1.0 vs 1.1?
What is there to speculate? 1.1 is more flexible.


"Inline raytracing is an alternative form of raytracing that doesn’t use any separate dynamic shaders or shader tables. It is available in any shader stage, including compute shaders, pixel shaders etc."


"Inline ray tracing is an alternative form of ray tracing that gives developers the option to drive more of the ray tracing process, as opposed to handing work scheduling entirely to the system (dynamic-shading). It is available in any shader stage (including compute shaders, pixel shaders etc) which also allows for easier integration into existing game engines."
(Highlights by me, thought this addition to the text is noteworthy.)

I also expect 1.1 to work better on RDNA2 GPUs as the changes seem to be geared toward how RT is setup on there compared to Nvidia's hardware implementation.
 

kurosaki

Senior member
Feb 7, 2019
258
250
86
Still can't shake that feeling the 6800 is over priced with 150 bucks, at least. Compared to the 3070 anyways. What's your take on this? It should perform similar to the 3070, maybe a bit higher in raster, maybe a bit lower in RT. All in all, 579 ex VAT, which seems to translate into over 800 usd in some parts of Europe is kind of, unsexy.
 
Last edited:
  • Haha
Reactions: scineram

leoneazzurro

Senior member
Jul 26, 2016
930
1,465
136
Still can't shake that feeling the 6800 is over priced with 150 bucks, at least. Compared to the 3070 anyways. What's your take on this? It should perform similar to the 3070, maybe a bit higher in raster, maybe a bit lower in RT. All in all, 650 ex VAT, which seems to translate into over 800 usd in some parts of Europe is kind of, unsexy.

The 6800 should be respectably above the 3070 in raster and probably below in RT (we have yet to see in AMD optimized titles), but it comes with 16 GB of VRAM (which is a cost) and with a big overclocking potential, too. I feel too that the price should be lower, but not by 150$. 530-540$ would have been a sweeter spot for it.
 
  • Like
Reactions: Tlh97

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
The 6800 is a weird orphan that they'd really rather not be producing or selling. Not cutting a huge die down that much.

Either a way to project down the market a bit until the lower tier parts are available or maybe indicative that their final product stack is going to have a few holes in it. A few holes would still represent significant progress vs recent times!

They've got very little motivation to try and price very aggressively, and that's ignoring the fact that every next gen GPU anyone can make is sold out 10 times over....
 

kurosaki

Senior member
Feb 7, 2019
258
250
86
The 6800 should be respectably above the 3070 in raster and probably below in RT (we have yet to see in AMD optimized titles), but it comes with 16 GB of VRAM (which is a cost) and with a big overclocking potential, too. I feel too that the price should be lower, but not by 150$. 530-540$ would have been a sweeter spot for it.
Well, that was taking into account that the 3070 was overpriced as well. It's still uncommon for me to even look at high end cards above the 3-400 usd mark. Including VAT, that is. Crazy times. Feels like stagnation in perf per dollar is still quite strong.
 
  • Haha
Reactions: scineram

leoneazzurro

Senior member
Jul 26, 2016
930
1,465
136
The 6800 is a weird orphan that they'd really rather not be producing or selling. Not cutting a huge die down that much.

Either a way to project down the market a bit until the lower tier parts are available or maybe indicative that their final product stack is going to have a few holes in it. A few holes would still represent significant progress vs recent times!

They've got very little motivation to try and price very aggressively, and that's ignoring the fact that every next gen GPU anyone can make is sold out 10 times over....

By that logic, 3080 should not be produced as well, because the percentage of cutting is very similar to the 6800.
 

Leadbox

Senior member
Oct 25, 2010
744
63
91
Pretty sure these 6800s are faulty/defective die that don't make the cut as XTs or 6900s. I seriously doubt they're cutting fully functional die to get these and IF they are, it goes to explain the $50 too high price tag.
 

exquisitechar

Senior member
Apr 18, 2017
657
871
136
I wonder where the problem in the hardware implementation is for RT to cause a (seemingly) bigger drop off in performance than Turing. Will be interesting to see if things change as more games that make use of RT come out in the future.
 

kurosaki

Senior member
Feb 7, 2019
258
250
86
I wonder where the problem in the hardware implementation is for RT to cause a (seemingly) bigger drop off in performance than Turing. Will be interesting to see if things change as more games that make use of RT come out in the future.
I'm playing with the thought that the RT-cores in the AMD cards could work as fast in 1440p as 4k. That they have a problem with scaling (down), but on the other hand hold the performance when going up in resolutions. The "leaked" slide from VideoCardz is a 1440p slide; if true, what if AMD just kept their target of 60+ FPS in RT, regardless of resolution?
Maybe the RT cores themselves aren't the bottleneck. What if the 128 MB Infinity Cache is the part targeted for keeping ~60 fps under worst-case conditions, and then the buffer runs out and hits RAM, where we see a steep decline?
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
By that logic, 3080 should not be produced as well, because the percentage of cutting is very similar to the 6800.

Not at all similar really? The 6900XT does exist :) They've cut a good quarter of the die off the 6800 (including a shader engine) and down clocked it as well.

Its an unusual amount, but entirely rational if they're trying to cover the market with slightly fewer dies than NV are using.
 

leoneazzurro

Senior member
Jul 26, 2016
930
1,465
136
Not at all similar really? The 6900XT does exist :) They've cut a good quarter of the die off the 6800 (including a shader engine) and down clocked it as well.

Its an unusual amount, but entirely rational if they're trying to cover the market with slightly fewer dies than NV are using.

I mean, full GA102 is 84 SM, 3080 is 68 SM, a fifth of the SM less. It has 96 ROPs instead of 112, and 320 bit bus instead of 384. So it has around 80% of the capabilities of the full chip all around. The 6800 is 75% of the full chip, and in some areas (cache and bus) is exactly the same as the 6800XT/6900XT. So one can say 3080 and 6800 are similarly cut chips. But one should "not be produced" and the other "is just fine". Which seems silly to me.
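
For anyone who wants to check the ratios, a quick sketch. The GA102 and 3080 figures are the ones quoted in this post; the 60 CU count for the 6800 is taken from AMD's public spec rather than from this thread, and the 80 CU full chip is from the press deck above.

```python
# (cut-down value, full-chip value) per resource.
RTX_3080_VS_FULL_GA102 = {
    "SMs": (68, 84),
    "ROPs": (96, 112),
    "bus width (bit)": (320, 384),
}
# Assumption: 60 CUs for the RX 6800, from AMD's public spec.
RX_6800_VS_FULL_NAVI21 = {
    "CUs": (60, 80),
}


def enabled_fraction(resources):
    """Map each resource name to the fraction of the full chip left enabled."""
    return {name: cut / full for name, (cut, full) in resources.items()}


if __name__ == "__main__":
    for label, table in (
        ("RTX 3080", RTX_3080_VS_FULL_GA102),
        ("RX 6800", RX_6800_VS_FULL_NAVI21),
    ):
        for name, fraction in enabled_fraction(table).items():
            print(f"{label:9s} {name:16s} {100 * fraction:5.1f}%")
```

That gives roughly 81 to 86 percent for the 3080 depending on the resource, against 75 percent of the CUs for the 6800.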
 
  • Like
Reactions: Zepp and Tlh97

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
[ image: power-gaming-average.png ]


This is absolutely mental efficiency.
 

Leon

Platinum Member
Nov 14, 1999
2,215
4
81
DXR performance looks BAD, in some cases slower than 3070, a much cheaper card. Also, it trails in raster performance in the majority of games tested above. AMD, always a bridesmaid, never the bride. Some things never change....

DXR is coming, and AMD have nothing to compensate for it. Enjoy the "efficiency" boys :)
 
  • Like
Reactions: kurosaki

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,995
136
[ image: power-gaming-average.png ]


This is absolutely mental efficiency.

Yeah that's nuts when you think about it. It's the same TSMC 7nm process, but AMD has managed to get the same amount of power draw from the 6800 as the 5700 despite adding 20 additional CUs and increasing the base clock by 350 MHz.

I mean I'm looking at the actual results, but I still don't quite believe it. Just insane.
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
@Glo. That's crazy. AMD sandbagged hard on their efficiency claim.
Yeah that's nuts when you think about it. It's the same TSMC 7nm process, but AMD has managed to get the same amount of power draw from the 6800 as the 5700 despite adding 20 additional CUs and increasing the base clock by 350 MHz.

I mean I'm looking at the actual results, but I still don't quite believe it. Just insane.
Scrap that review.

TPU has made some errors in their testing, since both their power draw AND performance numbers are way lower than anybody else's.
 
  • Haha
  • Like
Reactions: moinmoin and Tlh97