Discussion RDNA4 + CDNA3 Architectures Thread

Page 4

DisEnchantment

Golden Member
Mar 3, 2017
1,608
5,810
136
With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
AMD usually takes around three quarters to get support into LLVM and amdgpu. Since RDNA2, though, the window in which they push support for new devices has been much reduced, to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan early (perhaps to avoid a slow bring-up situation like Frontier's).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of no host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it :grimacing:

This is nuts, MI100/200/300 cadence is impressive.


Previous thread on CDNA2 and RDNA3 here

 

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,993
136
No FP4 or INT4?

I thought that it did have INT4, but I can't recall if that's just support or native hardware. Technically anything will support INT4 using the hardware it has, but there wouldn't be any speed up.

Is FP4 actually a thing? It seems incredibly pointless no matter how you try to arrange it. If it's signed, you'd only have two bits for either the exponent or the mantissa, and the other would get a single bit.

16-bit floating point (at least with the bfloat format) still gives a range essentially as large as FP32, so it's still useful as long as you don't care as much about precision. Anything below that runs into the same issues that make the IEEE FP16 format less desirable.
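For what it's worth, the range claim is easy to see in code: bfloat16 is just float32 with the mantissa truncated to 7 bits, keeping all 8 exponent bits. A minimal sketch in plain Python bit-twiddling (not any particular library's API):

```python
import struct

def to_bfloat16(x: float) -> float:
    # Reinterpret as float32, then keep only the top 16 bits:
    # 1 sign bit + 8 exponent bits + 7 mantissa bits (truncate toward zero).
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return struct.unpack('>f', struct.pack('>I', bits & 0xFFFF0000))[0]

print(to_bfloat16(3.141592653589793))  # 3.140625 - precision lost
print(to_bfloat16(1e38))               # still finite - range kept
```

IEEE FP16, with only 5 exponent bits, overflows to infinity around 65504, which is why bfloat16 tends to win when range matters more than precision.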
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
I thought that it did have INT4, but I can't recall if that's just support or native hardware. Technically anything will support INT4 using the hardware it has, but there wouldn't be any speed up.

Is FP4 actually a thing? It seems incredibly pointless no matter how you try to arrange it. If it's signed, you'd only have two bits for either the exponent or the mantissa, and the other would get a single bit.

16-bit floating point (at least with the bfloat format) still gives a range essentially as large as FP32, so it's still useful as long as you don't care as much about precision. Anything below that runs into the same issues that make the IEEE FP16 format less desirable.
The little reading I've done indicated the best results are actually with a sign bit, 3 exponent bits, and no bits for the mantissa, but I think FP4 is more of a research format than something actually used at this time.
 

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,993
136
The little reading I've done indicated the best results are actually with a sign bit, 3 exponent bits, and no bits for the mantissa, but I think FP4 is more of a research format than something actually used at this time.

Assuming you wanted FP4 to be able to store special values (infinities and 0), you'd effectively lose one bit of that exponent, and you'd get positive or negative numbers in the set {∞, 16, 8, 4, 2, 0, .5, .25}, assuming I've calculated it correctly, which I'll admit I may not have. There might be some issues due to not having a mantissa, as I think some special values like NaN are represented in a specific way using those bits.
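That set is easy to double-check by enumerating the encodings. A quick sketch assuming one plausible layout: 1 sign bit, 3 exponent bits, no mantissa, bias 3, with the all-zeros code as zero and all-ones reserved for infinity. All of those are assumptions (no actual FP4 spec is being quoted here), and a different bias shifts the whole set:

```python
import math

def fp4_e3m0_values(bias=3):
    # Positive values of a hypothetical 1-sign/3-exponent/0-mantissa format.
    # Exponent code 0 encodes zero, code 7 is reserved for infinity,
    # and codes 1..6 encode 2 ** (code - bias).
    values = {0.0, math.inf}
    for code in range(1, 7):
        values.add(2.0 ** (code - bias))
    return sorted(values)

print(fp4_e3m0_values())  # [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, inf]
```

With bias 3 the positive finite values run 0.25 through 8; a smaller bias would reach 16 but lose 0.25, so the exact set hinges on that choice.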

It seems like you could probably find a clever way of using INT4 to accomplish the same thing you'd be trying to do with such a small floating point number. I suppose having an infinity value might be useful as INT4 overflow could cause some nasty issues.
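For illustration, the usual "clever way" this is done in practice is linear quantization: store INT4 codes plus one shared float scale, and clamp rather than overflow. A minimal sketch (the names here are made up for the example, not any real API):

```python
def quantize_int4(xs):
    # Map floats to signed 4-bit codes in [-8, 7] with a shared scale.
    scale = max(abs(x) for x in xs) / 7.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any scale works
    codes = [max(-8, min(7, round(x / scale))) for x in xs]
    return codes, scale

def dequantize_int4(codes, scale):
    return [c * scale for c in codes]

codes, scale = quantize_int4([0.1, -0.5, 0.9, 0.25])
print(codes)                          # [1, -4, 7, 2]
print(dequantize_int4(codes, scale))  # roughly recovers the inputs
```

Clamping to [-8, 7] sidesteps the nasty INT4 overflow issue at the cost of saturating large values, which is usually the acceptable trade-off.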
 

Joe NYC

Golden Member
Jun 26, 2021
1,962
2,294
106
Semianalysis has an article on MI300. Enjoy!
Interesting info. MLID got nearly everything right in his original leak.

BTW, I am most curious about the status: when it is ready, when it is shipping, etc. There were a number of contradictory rumors/statements: Q4 2023, Q1 2024. And now Dylan adds another one - that it is shipping now and ramping in Q3.
 

Saylick

Diamond Member
Sep 10, 2012
3,170
6,398
136
Interesting info. MLID got nearly everything right in his original leak.

BTW, I am most curious about the status: when it is ready, when it is shipping, etc. There were a number of contradictory rumors/statements: Q4 2023, Q1 2024. And now Dylan adds another one - that it is shipping now and ramping in Q3.
I suspect you'll get a better answer to those questions tomorrow at their AI event.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
Semianalysis has an article on MI300. Enjoy!
A marvel of engineering, yes. But that doesn't help if it takes a team of expert devs to get the software running on it.

Nvidia has the market share because of software. And because you can just prototype things on your company-provided Windows laptop. Good luck trying to get ROCm running on Windows: the only chance is via WSL, and even then it's hit and miss. Plus, AFAIK normal laptop GPUs aren't even supported by ROCm. No engineering marvel will fix that for AMD to get widespread adoption outside of supercomputers.
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,666
136
Good luck trying to get ROCm running on Windows.
Seriously though who would want that on a professional production machine? I understand that Nvidia has the advantage of allowing client GPUs using gaming OSes to use professional grade software, but experts would never use either in production.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Well, that was a damp squib. :confused:
Doesn't seem like much info on the package construction. They basically just confirmed the MI300A and MI300X products and that is about it. I skipped around a bit, but I don't think there was any mention of a cpu only version, but they have to connect a gpu only version to a cpu somehow. Would they have a mixed board with an SH5 and an SP5 socket?

With 896 GB/s infinity fabric bandwidth, would that be 4 x16 links at ~56 GB/s per base die?
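The numbers do divide evenly under that guess. Assuming four base dies with four x16 links each (my reading of the question, not a confirmed topology):

```python
# Back-of-envelope check on the quoted aggregate Infinity Fabric bandwidth.
total_bw_gbs = 896      # GB/s, figure quoted above
base_dies = 4           # assumption
links_per_die = 4       # assumption: 4 x16 links per base die
per_link_gbs = total_bw_gbs / (base_dies * links_per_die)
print(per_link_gbs)  # 56.0 GB/s per x16 link
```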
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Was there any good reason to be excited about CDNA3? Sorry, didn't follow this thread too closely and kinda glossed over the enterprise/data center talk.
If you aren’t a victim of Nvidia vendor lock-in practices, then there is reason to be interested. Also, you obviously have to be interested in such hardware in the first place. This isn’t a gaming GPU.

I am more interested in the packaging tech and how it could be applied elsewhere. I use systems with Nvidia GPUs and AMD CPUs at the moment. We may have to make a decision to port to ARM or drop cuda eventually. I suspect that nvidia’s arm cores will be rather weak compared to Zen 4 or Zen 5, so porting to them wouldn’t just be x86 to ARM, it would also probably mean making the code much more multi-threaded, which is not simple.

I have wondered if unified memory will reduce the porting effort significantly, if a lot of the low level memory management stuff goes away. I am not a gpu programmer though.
 

Joe NYC

Golden Member
Jun 26, 2021
1,962
2,294
106
Doesn't seem like much info on the package construction. They basically just confirmed the MI300A and MI300X products and that is about it. I skipped around a bit, but I don't think there was any mention of a cpu only version, but they have to connect a gpu only version to a cpu somehow. Would they have a mixed board with an SH5 and an SP5 socket?

With 896 GB/s infinity fabric bandwidth, would that be 4 x16 links at ~56 GB/s per base die?
Really nothing new was said, other than MI300X being a drop-in replacement for H100, and confirmation of some widely leaked info.

It seems this was more of a Bergamo and Genoa-X launch event, and another MI300 tease event. The actual launch of MI300 - maybe sometime late Q3, early Q4 - will have the more detailed technical info we were looking for.
 

Joe NYC

Golden Member
Jun 26, 2021
1,962
2,294
106
Would they have a mixed board with an SH5 and an SP5 socket?
MI300X actually uses an OCP socket.

So like a lot of the AI systems out there featuring 2 CPUs and 8 GPUs, the AMD platform will have 2 Genoa CPUs, 8 OCP MI300X GPUs, and no SH5 socket.

MI300A and MI300C will be SH5 socket.
 

Ajay

Lifer
Jan 8, 2001
15,458
7,862
136
Really nothing new was said, other than MI300X being a drop-in replacement for H100, and confirmation of some widely leaked info.

It seems this was more of a Bergamo and Genoa-X launch event, and another MI300 tease event. The actual launch of MI300 - maybe sometime late Q3, early Q4 - will have the more detailed technical info we were looking for.
It’s not a 'drop in' solution as Nvidia uses a different connector and fabric which are proprietary, AFAIK. Also, not CUDA, which isn’t a problem for supercomputers, but will be for workstations and small server setups.
 

DeathReborn

Platinum Member
Oct 11, 2005
2,746
741
136
It’s not a 'drop in' solution as Nvidia uses a different connector and fabric which are proprietary, AFAIK. Also, not CUDA, which isn’t a problem for supercomputers, but will be for workstations and small server setups.
Yea, this isn't a drop-in replacement at all; it's an almost complete start from scratch in both HW & SW.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
Seriously though who would want that on a professional production machine? I understand that Nvidia has the advantage of allowing client GPUs using gaming OSes to use professional grade software, but experts would never use either in production
Not for production, but for development. If you can't do it on your laptop, you need yet another infrastructure/server for development, which means higher cost.
 

Joe NYC

Golden Member
Jun 26, 2021
1,962
2,294
106
It’s not a 'drop in' solution as Nvidia uses a different connector and fabric which are proprietary, AFAIK. Also, not CUDA, which isn’t a problem for supercomputers, but will be for workstations and small server setups.

From a hardware POV, it is the same form factor: OAM.
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,666
136
Not for production, but for development. If you can't do it on your laptop, you need yet another infrastructure/server for development, which means higher cost.
Let's remember that Nvidia's approach ensured that gaming hardware was perfectly usable for mining, with the predictable results on the market. Current development on ROCm appears to imply AMD intends to support up to last-gen consumer graphics in ROCm, while the current gen will have to wait.