
Question Speculation: RDNA2 + CDNA Architectures thread

edited this post, because it is not so meaningful in light of recent Mesa updates. :expressionless:

It seems Sienna and Navy are both GFX1030: Sienna = HBM and Navy = GDDR6. Both Sienna and Navy are Big Navi.
At least it seems the two are similar variants (like Polaris 12/11/Vega M were).


C:
    case CHIP_RENOIR:
        return "gfx909";
    case CHIP_ARCTURUS:
        return "gfx908";
    case CHIP_NAVI10:
        return "gfx1010";
    case CHIP_NAVI12:
        return "gfx1011";
    case CHIP_NAVI14:
        return "gfx1012";
    case CHIP_SIENNA_CICHLID:
    case CHIP_NAVY_FLOUNDER:
        return "gfx1030";

So it is untrue that Navy Flounder is Navi22/GFX1031, as reported here.
 
Two Renoir-sized CCXs are pretty small, and remember a regular GPU has an x16 PCIe interface; the consoles, if they have a south bridge, would likely only need 12 lanes and none of the other misc I/O.

edit: here is the PS4 Pro; there is not much "uncore" about it.
Two Renoir CCXs are ~40-50 mm², and a GDDR6 PHY plus memory controller is much bigger than an HBM2 PHY; just look at Navi10 vs Navi12. There is also a die shot of Renoir here.
So there is no reason why a 72-CU GPU should be much bigger than the Xbox SoC with its 56 CUs. Keep in mind that CUs are very small: about 2.1 mm² each for RDNA1.
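The area argument above can be sanity-checked with some quick arithmetic. All figures here are the thread's own rough numbers (CU area, CCX area, SoC size), not measured die sizes:

```python
# Back-of-the-envelope die-area sketch; inputs are the rough figures
# from the post, not official die measurements.
CU_AREA_RDNA1 = 2.1   # mm^2 per CU (RDNA1, per the post)
XBOX_SOC = 360.0      # mm^2, Xbox Series X SoC (approx.)
CCX_PAIR = 45.0       # mm^2, two Renoir-sized CCXs (~40-50 mm^2)

extra_cus = 72 - 56
extra_cu_area = extra_cus * CU_AREA_RDNA1  # area cost of 16 more CUs

# Dropping the CPU cores frees roughly the CCX area:
discrete_estimate = XBOX_SOC - CCX_PAIR + extra_cu_area
print(f"Extra CU area: {extra_cu_area:.1f} mm^2")          # 33.6
print(f"Rough 72-CU die estimate: {discrete_estimate:.1f} mm^2")  # 348.6
```

So by this crude math, trading two CCXs for 16 extra CUs roughly breaks even, which is the point being made.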
 
Hmmm... no wonder the code names, commit messages, and commit cherry-picking were so obscure and often misleading.

That's the goal - to get upstream support in place before launch without anyone knowing what happened.

On the plus side, ROCm will land for Navi1x/2x.
 
Meaning? Are only the Ti models considered high end?
You can call the top of the performance stack whatever you like.

The terrible Vega was more than 15% faster than the 980 Ti. Being 0 to 15% faster than the 2080 Ti would be another disaster unless it's a relatively small chip, 400 mm² tops. For the rumored 505 mm² chip it would be a joke.
 
You can call the top of the performance stack whatever you like.

The terrible Vega was more than 15% faster than the 980 Ti. Being 0 to 15% faster than the 2080 Ti would be another disaster unless it's a relatively small chip, 400 mm² tops. For the rumored 505 mm² chip it would be a joke.

If it's cheaper, why complain? If you think people are going to continue to pay high GPU prices in the midst of COVID-19 and other world factors, you have another thing coming. The consoles are finally pretty good in the graphics department.
 
If it's cheaper, why complain? If you think people are going to continue to pay high GPU prices in the midst of COVID-19 and other world factors, you have another thing coming. The consoles are finally pretty good in the graphics department.

I think it would be a problem for a few reasons:

- It's not a sustainable business model in this market to pump out inferior products and sell them at narrower margins than the competition. AMD being perpetually behind NV means they either exit the discrete GPU space or become irrelevant in due time.

- Lower margins mean less money to put into the intangibles, like the additional software features that increasingly separate the wheat from the chaff.

- Not having the halo product in this space means that your competition will effectively always be able to match your performance and will likely have more room to play with price, which comes back to the first point.

- Most distressing, IMO, is that it means AMD doesn't have the mental muscle or will to catch NV even if NV makes a misstep (if all the rumors surrounding the node drama are true).
 
Coreteks seems to think Big Navi isn't really that big. Supposedly only 15% faster than the 2080 Ti in AMD-optimized titles:
I'm a little skeptical... Only 15% faster, and that's with AMD-optimized titles?

According to ComputerBase, the 2080 Ti FE is on average 53% faster than the 5700XT at 4K across a wide variety of games. If we assume 72 CUs and similar clocks, Big Navi should be 80% faster than a 5700XT, which makes it ~18% faster than the 2080 Ti. If you add in IPC gains or the chance that there's actually 80 CUs, not 72, then it ought to be closer to 30% faster on average.

https://www.computerbase.de/thema/grafikkarte/rangliste/#diagramm-performancerating-3840-2160
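That estimate is just linear CU scaling; here is the arithmetic spelled out (assuming performance scales linearly with CU count at equal clocks, which is optimistic, and using the 1.53x ComputerBase figure quoted above):

```python
# Sketch of the CU-scaling estimate above. Assumes perf scales linearly
# with CU count at equal clocks; 1.53x is the ComputerBase 4K average.
cu_5700xt = 40
speedup_2080ti = 1.53  # 2080 Ti FE vs 5700 XT at 4K

for big_navi_cus in (72, 80):
    vs_5700xt = big_navi_cus / cu_5700xt      # 1.80x / 2.00x the 5700 XT
    vs_2080ti = vs_5700xt / speedup_2080ti
    print(f"{big_navi_cus} CUs: {vs_5700xt:.2f}x 5700 XT, "
          f"{(vs_2080ti - 1) * 100:+.0f}% vs 2080 Ti")
```

This reproduces the ~18% (72 CUs) and ~30% (80 CUs) figures, before any IPC gains.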
 
Does it?
Isn't the new Xbox SoC just 360 mm²? It has a 56-CU RDNA2 GPU, 8 CPU cores, a 320-bit GDDR6 memory bus, a south bridge and so on.
You can't measure a dedicated GPU's die size from a custom APU.
The Xbox APU doesn't have many units that a dedicated GPU will have: it has a less sophisticated display engine, no PCIe root complex, probably no FP64 units, a less complex encode/decode engine, a memory controller without ECC support, and probably less cache.
 
I'm a little skeptical... Only 15% faster, and that's with AMD optimized titles?
53% is a best-case scenario for the RTX 2080 Ti. On TechPowerUp's list the 2080 Ti is only 34% faster at 1080p, 42% at 1440p and 49% at 2160p. But at high resolutions the 2080 Ti has more shaders, ROPs and bandwidth than the RX 5700 XT.
 
People will get good prices on GPUs if people keep buying Nvidia.
I had a 5970 and a 7970. I only miss the 290X; that was the last good AMD high-end card, and that was freaking 2013.

Then a 980 Ti and a 1080 Ti. If AMD launches a third turd in a row (after Fiji and Vega), those who buy high end will continue to have only Nvidia as an option.
 
I'm a little skeptical... Only 15% faster, and that's with AMD optimized titles?

The 5700 XT is clocked way past its efficiency point. Scale it up size-wise and the TDP very quickly starts to limit your performance. You're much more likely to scale up at 5700-style clocks.

Even granting the marketing's, and so a priori dubious, 50% efficiency gain claim.

We'll see.
 
Then 980Ti and 1080Ti. If AMD lauch the 3rd turd in a row (Fiji, Vega) who buys high end will continue to only have Nvidia as option.
Are you saying the GTX 980 Ti was also a turd, since it was only 2-5% faster than the Fury X?
Only the GTX 1080 Ti was 30% faster than Vega 64, but it was also more expensive than Vega 64. And Vega 64 was such a turd that Apple used it in their expensive iMacs.
 
It seems Sienna and Navy are both GFX1030. Sienna = HBM and Navy = GDDR6. Both Sienna and Navy are Big Navi.
Not surprising. Apple will use those, so HBM is a must.

^ Two modules in a 2048-bit configuration can give 921.6 GB/s of bandwidth, which would be enough for "Big Navi".
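The 921.6 GB/s figure falls out of the bus width if you assume HBM2E running at 3.6 Gbps per pin (my assumption; the post doesn't name the pin speed):

```python
# HBM bandwidth arithmetic: bus width (bits) * per-pin data rate (Gbps) / 8.
# 3.6 Gbps per pin is an assumed HBM2E speed grade, not stated in the post.
bus_width_bits = 2048   # two HBM2E stacks, 1024 bits each
pin_speed_gbps = 3.6    # assumed per-pin rate

bandwidth_gbs = bus_width_bits * pin_speed_gbps / 8
print(f"{bandwidth_gbs:.1f} GB/s")   # 921.6
```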
 
The CDNA MI100 seems to be 46 TFLOPS for FP16.
46 vs 30 is about +50%.

That is faster than Nvidia's A100 for SGEMM, and with a smaller die.
But I'm not sure if this score for the A100 is correct.
According to the specs the A100 should be faster (it also has more transistors).

MI60 https://www.techpowerup.com/gpu-specs/radeon-instinct-mi60.c3233
MI100 https://www.techpowerup.com/gpu-specs/amd-mi100.g927
A100 https://www.techpowerup.com/gpu-specs/a100-pcie.c3623
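For sanity-checking the TFLOPS numbers on spec pages like those, peak throughput is just shaders × clock × 2 FLOPs per FMA × a precision rate multiplier. A small helper, checked against the MI60's published figures (4096 shaders, ~1.8 GHz boost, 2:1 FP16 rate):

```python
# Peak-throughput helper: shaders * clock (GHz) * 2 FLOP/FMA * rate,
# returned in TFLOPS. "rate" is the precision multiplier vs FP32
# (e.g. 2 for a 2:1 FP16 path, 0.5 for a 1:2 FP64 path).
def peak_tflops(shaders, clock_ghz, rate=1.0):
    return shaders * clock_ghz * 2 * rate / 1000.0

print(f"MI60 FP32: {peak_tflops(4096, 1.8):.1f} TFLOPS")          # ~14.7
print(f"MI60 FP16: {peak_tflops(4096, 1.8, rate=2):.1f} TFLOPS")  # ~29.5
```

Both match TechPowerUp's MI60 listing, which makes this a reasonable way to test whether the rumored MI100 numbers are internally consistent.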
 
I don't get this one.
If the MI100 is 2.4x faster in FP32 and 2x slower in FP16, where the A100's FP16 is 4x its FP32, it would mean the MI100's FP32 is faster than its FP16? OK, the slide says "up to", but still, it would mean FP32 = FP16 in TFLOPS?
I mean, the MI50/60 have an FP16:FP32 ratio of 2:1, so FP16 on the MI100 should be at least the same as on the A100. Similar goes for FP64.

I mean, this is CDNA; this chip should be optimized for compute. It makes no sense for it to have lower FP64 and FP16 performance than the MI60.

Edit:
So I just found this one from a few days ago.
Here they say it has 9.5 TFLOPS FP64 (same as the A100) and 150 TFLOPS FP16 (2x more than the A100).
So what am I missing?
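Plugging the claimed ratios into the A100's published baseline numbers (19.5 TFLOPS FP32, 312 TFLOPS tensor FP16) shows where the confusion sits:

```python
# Ratio check using A100's published peak figures.
a100_fp32 = 19.5          # TFLOPS, FP32 (non-tensor)
a100_fp16_tensor = 312.0  # TFLOPS, FP16 tensor ops

mi100_fp32 = 2.4 * a100_fp32   # "2.4x faster in FP32" claim -> 46.8
mi100_fp16_claim = 150.0       # FP16 figure from the article above

print(f"MI100 FP32 (2.4x A100): {mi100_fp32:.1f} TFLOPS")
print(f"MI100 FP16 claim vs A100 tensor FP16: "
      f"{mi100_fp16_claim / a100_fp16_tensor:.2f}x")   # ~0.48x
```

So the "2x slower in FP16" only holds if you compare against the A100's tensor-op peak; 150 TFLOPS is roughly half of 312, not double, which is exactly the objection raised in the next post.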
 
150 TFLOPS FP16 (2x more than A100)
Eh? That's not right; that number is more like half the A100's FP16 (312 TFLOPS). Link.

Though I'm not sure if the A100 number doesn't come from the sparse AI tensor ops peak perf:

[slide: NVIDIA GTC A100 peak-performance figures]


If so, and the 150 TFLOPS FP16 Arcturus figure is accurate, then CDNA1 and Ampere may well be close on certain counts.

With all these numbers floating around (hahaha, pun), we will probably go mad speculating about their meaning until the official AMD presentation and slides give us both accurate numbers and an explanation to fit them.

Given that they just reaffirmed that Zen 3, RDNA2 and CDNA1 are all coming out this year, it's likely a huge mega-announcement is coming that covers it all at once, with deep dives to follow.
 
Something to bear in mind: AMD made the very odd move of talking about CDNA2 before CDNA1 products had even been announced, and I don't think that was simply because of the future HPC/supercomputer contract(s) they had recently won with CDNA2.

It seems that, much as with RDNA1, we will get a not-so-dramatically-impressive first step into the new compute-accelerator (not GPU) paradigm with CDNA1 - but the xDNA2 generations will be the real intended demonstrators of AMD's new diverged accelerator uArch strategy.
 
Eh? That's not right, that number is half the A100 FP16 (312 TFLOPs) more like. Link.
But that is from the Tensor cores, and IIUC those can be used only for DL/NN applications, not for general computing. Or even if they could, does it require additional code changes, or does Nvidia's hardware/software handle it automatically?
 
But that is from Tensor cores, and IIUC, can be used only for DL/NN applications, not for general computing. Or even if it could, does it require additional code change or nVidia's hardware/software handles it automatically?

Tensor cores can only perform specific types of matrix math. They cannot be used for anything outside of that.
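For illustration, the one operation a tensor core does perform is a fused matrix multiply-accumulate, D = A × B + C, on small fixed-size tiles (16×16 FP16 tiles on Volta/Ampere). A plain-Python stand-in for that primitive, not real tensor-core code:

```python
# Illustrative matrix multiply-accumulate (MMA): D = A @ B + C on a
# square tile. Real tensor cores do this in hardware on fixed-size
# FP16 tiles; anything that can't be phrased as MMA can't use them.
def mma(a, b, c):
    n = len(a)
    return [[c[i][j] + sum(a[i][k] * b[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
print(mma(a, b, c))   # [[20, 23], [44, 51]]
```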
 
But that is from Tensor cores, and IIUC, can be used only for DL/NN applications, not for general computing.
Given that, if you go off the A100's FP64 figure of 9.7 TFLOPS ×4 (38.8 TFLOPS), then the two GPUs should be within about 800 GFLOPS of each other for FP16 general-compute use cases (Arcturus being 38 TFLOPS) - assuming those shader cores even do general FP16 compute on the A100, that is.
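A quick check of that 800 GFLOPS gap, deriving the A100's non-tensor FP16 rate as 4× its FP64 figure per the reasoning above:

```python
# Gap between A100's inferred non-tensor FP16 peak and the rumored
# Arcturus FP16 figure. The 4x multiplier is the assumption from the
# post, not an official spec.
a100_fp64 = 9.7
a100_fp16_shader = a100_fp64 * 4   # inferred non-tensor FP16: 38.8 TFLOPS
arcturus_fp16 = 38.0               # rumored Arcturus figure

gap_gflops = (a100_fp16_shader - arcturus_fp16) * 1000
print(f"Gap: {gap_gflops:.0f} GFLOPS")   # 800
```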
 