
Info AMD CDNA Compute GPU architecture

soresu

Golden Member
Dec 19, 2014
1,532
761
136
Seems like AMD is truly creating a permanent separation between gaming-focused and compute-focused GPU uArchs - the new compute-focused uArch family is called CDNA.


Hopefully this doesn't mean any significant drop in async compute and general Vulkan compute performance with RDNA.
 

Veradun

Senior member
Jul 29, 2016
564
780
136
I guess for now CDNA is just a label put on GCN (Vega). The true departure will happen somewhere down the road, my guess is with CDNA3.
 

DisEnchantment

Senior member
Mar 3, 2017
699
1,619
106
Not quite. RDNA still has general compute capabilities, it's just not focused on that, so likely no double-precision (DP FP) RDNA ever, and it will lack the tensor acceleration of CDNA too, so ML will not run nearly as well or as efficiently on RDNA.
I can imagine the first thing they will do is nerf Navi10's FP64 capabilities and strip lots of IF (Infinity Fabric) function blocks. Navi10 has a lot more FP64 throughput than Turing.
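The Navi10-vs-Turing FP64 gap follows from the rate ratios: Navi10 runs FP64 at 1/16 of its FP32 rate, while consumer Turing runs it at 1/32. A quick sketch with approximate boost-clock FP32 peaks (RX 5700 XT and RTX 2080; illustrative figures, not measurements):

```python
# Assumed rate ratios: Navi10 FP64 = 1/16 of FP32; consumer Turing = 1/32.
# FP32 peaks are approximate boost-clock figures.
navi10_fp32 = 9.75    # TFLOPS, RX 5700 XT
tu104_fp32 = 10.1     # TFLOPS, RTX 2080

navi10_fp64 = navi10_fp32 / 16
tu104_fp64 = tu104_fp32 / 32

print(f"Navi10 FP64 ~ {navi10_fp64:.2f} TFLOPS")
print(f"TU104  FP64 ~ {tu104_fp64:.2f} TFLOPS")
print(f"Navi10 advantage ~ {navi10_fp64 / tu104_fp64:.1f}x")
```

So despite similar FP32 peaks, the ratio difference alone gives Navi10 roughly twice the FP64 throughput.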
 

GodisanAtheist

Platinum Member
Nov 16, 2006
2,522
1,008
136
I guess for now CDNA is just a label put on GCN (Vega). The true departure will happen somewhere down the road, my guess is with CDNA3.
-AMD seems to draw a distinction in their own slides and there are some serious changes under the hood as well.

They're ripping out all the rasterization HW used for pumping out graphics and replacing it with Tensor cores and other compute focused stuff.
 
  • Like
Reactions: Tlh97 and Stuka87

Hitman928

Diamond Member
Apr 15, 2012
3,200
3,209
136
Is there a write up of this anywhere?
Short write-up here:


It's faster than A100 in 'traditional' FP32 and FP64 but slower in pure matrix/bfloat16 calculations. In mixed workloads, MI100 may have the advantage as well. A100 has more VRAM, but MI100 should be considerably cheaper unless Nvidia adjusts pricing in response.
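For context on the 'traditional' FP32/FP64 numbers, peak throughput falls out of CU count, SIMD width, and clock. A quick sketch assuming MI100's published figures (120 CUs, 64 lanes per CU, ~1502 MHz boost, FP64 at half the FP32 rate):

```python
def peak_tflops(cus, lanes_per_cu, clock_ghz, ops_per_lane_per_clock=2):
    """Peak TFLOPS: an FMA counts as 2 floating-point ops per lane per clock."""
    return cus * lanes_per_cu * ops_per_lane_per_clock * clock_ghz / 1000

mi100_fp32 = peak_tflops(120, 64, 1.502)   # ~23.1 TFLOPS vector FP32
mi100_fp64 = mi100_fp32 / 2                # CDNA runs FP64 at half rate, ~11.5

# A100's published non-tensor peaks for comparison: 19.5 FP32 / 9.7 FP64 TFLOPS.
print(f"MI100: {mi100_fp32:.1f} FP32 / {mi100_fp64:.1f} FP64 TFLOPS")
```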
 

Qwertilot

Golden Member
Nov 28, 2013
1,568
226
106
I did think I'd seen a few people here say that the A100 could somehow dual-purpose its tensor cores to give it a bunch more effective FP performance?

Probably depends somewhat on the details of specific workloads though.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,153
1,674
136
I did think I'd seen a few people here say that the A100 could somehow dual-purpose its tensor cores to give it a bunch more effective FP performance?
Yes, and they are wrong; I kept asking them to prove it and got crickets.

If you understand how a tensor core actually works, it's easy to understand why.

But the thing to also remember is memory/register pressure; the execution is largely the easy part, it's the data movement that costs. So A100 / MI100 etc. will largely be designed so their execution resources match (or only slightly exceed) what the bandwidth/register/cache read/write paths can feed, because execution resources are cheap and easy while data movement is expensive and hard, so they just aren't going to leave that performance on the table.

If you could rewrite your FMA code to be GEMM then of course that is a different situation.
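The FMA-vs-GEMM distinction comes down to arithmetic intensity (FLOPs per byte moved). A rough sketch in Python, assuming float32 data and counting only main-memory traffic (an idealized model, not a measurement of any real GPU):

```python
def fma_intensity(n, bytes_per_elem=4):
    # Elementwise d = a * b + c: reads a, b, c and writes d,
    # so 4n elements moved for 2n FLOPs.
    flops = 2 * n
    bytes_moved = 4 * n * bytes_per_elem
    return flops / bytes_moved

def gemm_intensity(n, bytes_per_elem=4):
    # C = A @ B for n x n matrices: 2n^3 FLOPs over ~3n^2 elements moved,
    # because each element is reused n times.
    flops = 2 * n**3
    bytes_moved = 3 * n**2 * bytes_per_elem
    return flops / bytes_moved

print(fma_intensity(10**6))   # 0.125 FLOPs/byte: hopelessly bandwidth-bound
print(gemm_intensity(4096))   # hundreds of FLOPs/byte: compute-bound
```

Elementwise FMA is stuck at 0.125 FLOPs/byte no matter how much execution hardware exists, while GEMM's reuse grows with matrix size, which is why matrix engines only help code that can actually be expressed as GEMM.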
 

gdansk

Senior member
Feb 8, 2011
523
211
116
On a similar node, it must be at least twice as big as Vega 20. Would this be the closest AMD has ever come to a reticle limit chip?

They seem to be winning some big deals, but I guess if you're building a very expensive supercomputer you also have the money to hand-tune the software to run well on the machine. Where this will suffer is in selling to cloud vendors who rent them out; their clients will still prefer CUDA to HIP.
 
  • Like
Reactions: lightmanek

soresu

Golden Member
Dec 19, 2014
1,532
761
136
and their clients will still prefer CUDA to HIP.
The entire point of HIP is portability from CUDA.

Not just to AMD hardware, but also back to CUDA from HIP if you wish, so you can keep a dual-hardware codebase if you don't mind it lagging the CUDA state of the art a bit.
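Since HIP mirrors the CUDA API almost name-for-name (hipMalloc/cudaMalloc, hipMemcpy/cudaMemcpy, and so on), the translation is largely mechanical in both directions. A toy Python sketch of the idea behind a hipify-style source translator (illustrative only; the real hipify-perl/hipify-clang tools handle hundreds of symbols plus headers and kernel-launch syntax):

```python
import re

# A small illustrative subset of the 1:1 API renames.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    pattern = re.compile("|".join(re.escape(k) for k in CUDA_TO_HIP))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)

def cudify(source: str) -> str:
    # Reverse mapping: porting back to CUDA is just as mechanical.
    hip_to_cuda = {v: k for k, v in CUDA_TO_HIP.items()}
    pattern = re.compile("|".join(re.escape(k) for k in hip_to_cuda))
    return pattern.sub(lambda m: hip_to_cuda[m.group(0)], source)

src = "cudaMalloc(&d_a, n); cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice); cudaFree(d_a);"
print(hipify(src))
```

The round trip `cudify(hipify(src)) == src` is what lets a project keep one codebase and generate whichever dialect the target hardware needs.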
 

gdansk

Senior member
Feb 8, 2011
523
211
116
The entire point of HIP is portability from CUDA.

Not just to AMD hardware, but also back to CUDA from HIP if you wish, so you can keep a dual-hardware codebase if you don't mind it lagging the CUDA state of the art a bit.
Have you tried using it? There are many corner cases where HIPify doesn't actually work, and you'll have to dig through and fix them manually. And as you say, you have to restrict yourself to CUDA 8, which hasn't been a big deal but might be a step back for some people.

It's basically their only shot at getting Nvidia customers over to their side, but they need people to use it. And no one wants to use it, because Nvidia hardware isn't prohibitively expensive and the code we have already written works as-is.
 

gdansk

Senior member
Feb 8, 2011
523
211
116
Why is VCN supposedly still included in these? Are they expected to be used for video decode/encode at all? It seems strange when all graphics capability has been stripped out.
For machine learning applications which need to decode video, per the overview:
the AMD CDNA family retains dedicated logic for HEVC, H.264, and VP9 decoding that is sometimes used for compute workloads that operate on multimedia data, such as machine learning for object detection
 

Saylick

Senior member
Sep 10, 2012
899
621
136
This thing must be *massive*
I think people were already estimating it to be in the low-to-mid-700 mm2 range based on the size of the HBM PHYs. That seems kind of large, to be honest: it's got 50% more CUs than Big Navi, which is estimated in the low-500 mm2 range with a huge 128 MB LLC, yet it has the entire graphics pipeline stripped out (no TMUs, ROPs, geometry engines, etc.). Are the tensor cores and doubled register files really that space hungry? I wouldn't imagine so.
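The arithmetic behind that estimate, assuming 120 CUs for the CDNA part versus Big Navi's 80, and taking the ~520 mm2 Big Navi figure from the thread (a back-of-envelope sketch, not a real area model):

```python
arcturus_cus = 120          # assumed CU count for the CDNA part
big_navi_cus = 80           # Big Navi (Navi21) CU count
extra = arcturus_cus / big_navi_cus - 1
print(f"{extra:.0%} more CUs")            # 50% more CUs

# Naive check: scale Big Navi's ~520 mm2 estimate by CU count alone,
# ignoring that the 128 MB LLC and graphics pipeline don't scale with CUs.
naive_area = 520 * (1 + extra)
print(f"naive scaled area ~ {naive_area:.0f} mm2")   # ~ 780 mm2
```

Even this naive scaling lands near the estimated range, so the real question is how much area the stripped graphics hardware gives back versus what the doubled register files and extra HBM PHYs consume.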
 

Stuka87

Diamond Member
Dec 10, 2010
5,216
985
126
On a similar node, it must be at least twice as big as Vega 20. Would this be the closest AMD has ever come to a reticle limit chip?

Seem to be winning some big deals but I guess if you're building a very expensive super computer you also have the money to hand tune the software to run well on the machine. Where this will suffer is selling to cloud vendors who rent them out and their clients will still prefer CUDA to HIP.
Vega20 still had rasterization hardware in it. It was a full-blown GPU.

CDNA cards aren't video cards; they can't output video at all. So yes, it will be larger than Vega20, but unlikely to be double the size, since lots of what Vega20 had has been removed.
 
  • Like
Reactions: prtskg
