Discussion AMD XDNA AIE and FPGA Speculation and Discussion

moinmoin

Diamond Member
Jun 1, 2017
4,926
7,609
136
Next to Zen, RDNA and CDNA, the Xilinx-contributed XDNA is the next major core IP to join AMD's portfolio.

bild_2022-06-12_23431t5kfc.png

(Source: slide 3 Mark Papermaster presentation)

It will first join us with the Rembrandt successor Phoenix Point.

bild_2022-06-10_0112426jqy.png

(Source: slide 11 Saeid Moshkelani presentation)


AI Engine (AIE) and FPGA are both part of AMD XDNA.

bild_2022-06-12_23532q0jro.png


AIE is optimized for typical neural network topologies.

bild_2022-06-12_23560c5jxl.png


AIE will come to Ryzen and Epyc and make them vastly more capable for AI applications.

bild_2022-06-12_235901jkrx.png


AMD's and Xilinx's software stack will be unified to access AI capability across Zen, RDNA, CDNA and XDNA.

bild_2022-06-13_00032s6jut.png

bild_2022-06-13_00040e4j5g.png

bild_2022-06-13_00043pkju5.png


My hope is that the unified stack above will also make code portable to non-AMD systems, increasing the likelihood of it being adopted in common code paths that then make use of AIE when available.

(Source: slide 19, 20, 22, 23-25 Victor Peng presentation)
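The "common code making use of AIE if available" idea could look something like runtime backend detection with a CPU fallback. This is purely an illustrative sketch, assuming a hypothetical probe function; none of the names (`detect_backends`, `aie_present`, `"xdna-aie"`) are a real AMD/Xilinx API.

```python
# Hypothetical sketch: dispatch to an AIE backend when present, fall back
# to CPU otherwise. All names here are illustrative, not a real API.

def aie_present():
    """Stand-in for a real capability probe (e.g. checking a device node)."""
    return False  # assume no AIE on this machine

def detect_backends():
    """Return available inference backends, best first (illustrative)."""
    backends = []
    if aie_present():
        backends.append("xdna-aie")  # hypothetical accelerator backend name
    backends.append("cpu")           # always-available fallback
    return backends

def run_inference(model, inputs):
    """Common code path: same model, backend chosen at runtime."""
    backend = detect_backends()[0]
    return model.run(inputs, backend=backend)
```

The point of such a scheme is that application code never hard-codes the accelerator, so the same binary benefits from AIE where it exists and still runs everywhere else.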

As @nicalandia pointed out, AI Engine is an IP block Xilinx had already presented earlier. AMD is believed to have originally licensed it before merger talks started.


The website actually has some more information on it not included in the PDF, like the actual output of each of the two types of AIE tile: https://www.xilinx.com/products/technology/ai-engine.html

1622846115613.png

1622846124385.png
 
Last edited:

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
I wonder if the AI Acceleration module used on Ryzen 7000 is part of the Xilinx IP or if it's using AMD's own patented AI approach.


1655073812059.png

Or is that due to AVX-512 and the RDNA GPU on the I/O chiplet?
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
4,926
7,609
136
I wonder if the AI Acceleration module used on Ryzen 7000 is part of the Xilinx IP or if it's using AMD's own patented AI approach.


View attachment 63002

Or is that due to AVX-512 and the RDNA GPU on the I/O chiplet?
To me that indeed sounds like the Xilinx IP. The AI-oriented AVX-512 instructions are part of the core; I don't think splitting those off and putting them on a stacked accelerator is really feasible.
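For context on what those in-core AI instructions do: the AVX-512 VNNI extension adds instructions like VPDPBUSD, which compute int8 dot products accumulated into 32-bit lanes. Below is a pure-Python model of a single 32-bit lane of the non-saturating form, for illustration only; real code would use compiler intrinsics.

```python
# Model of one 32-bit lane of a VPDPBUSD-style operation (AVX-512 VNNI):
# four unsigned-8-bit * signed-8-bit products summed into a 32-bit
# accumulator, without saturation.

def vnni_lane(acc, a_u8, b_s8):
    """Accumulate four u8*s8 products into one 32-bit lane."""
    assert len(a_u8) == len(b_s8) == 4
    return acc + sum(u * s for u, s in zip(a_u8, b_s8))
```

This is exactly the kind of low-precision multiply-accumulate that neural-network inference leans on, which is why it lives in the core rather than on a separate die.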
 

DisEnchantment

Golden Member
Mar 3, 2017
1,587
5,703
136
The interesting thing to me about the AIE is that it is architected in a PIM-like layout, so it typically hits very close to peak throughput when running real workloads. Efficiency is higher too, since it avoids the constant data movement to and from memory that GPUs incur. However, it loses some efficiency once the entire network cannot fit on the device.
In a best-case scenario you can just stream the entire data set in, and out comes the final inference result, without any trip to main memory to store the results of intermediate layers or activation functions.
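The contrast between the two execution styles described above can be sketched in a few lines. The "layers" and numbers here are toy stand-ins, not real AIE or GPU behavior; only the difference in data movement matters.

```python
# Toy layers standing in for neural-network stages.
def layer_a(x):
    return [v * 2 for v in x]

def layer_b(x):
    return [v + 1 for v in x]

def run_with_memory_roundtrips(data):
    """GPU-style: each layer's output is materialized ("written to memory")
    before the next layer reads it back."""
    intermediates = []
    out = layer_a(data)
    intermediates.append(out)  # round-trip to memory between layers
    out = layer_b(out)
    intermediates.append(out)
    return out, len(intermediates)

def run_streamed(data):
    """Dataflow-style: each element streams through all layers in one pass;
    no intermediate result is materialized off-core."""
    return [layer_b(layer_a([v]))[0] for v in data]
```

Both paths produce identical results; the streamed version simply never stores intermediate layer outputs, which is the efficiency win being described, and also why it degrades once the network no longer fits on the device.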

Radically different from what I have seen in some ARM SoCs I have worked with, which seem to be evolutions of DSP SIMD VLIW blocks extended to handle low-precision arithmetic (e.g. mDSP, cDSP, SLPI, aDSP, if you have ever heard of these in the public domain).

In the slide you can also see they are used for signal processing. In Android you can load an algorithm onto such a block, and it can trigger a wakeup of the CPU when it detects a hotword from the mic or recognizes an image from the camera. That is very good for power efficiency when waking from sleep via camera or voice: you can put the CPU to sleep and let the block work in the background.
Skype, MS Teams and Webex integrate echo and noise cancellation, so it would be interesting if AMD can get MS on board with this. Same thing with video conferencing, where you can blur or change the background.
Very useful in work-from-home scenarios or when working across different geographies.
Remains to be seen if other software can integrate support for this AIE. But there's potential.

But I think AMD is lacking an AOP block.
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
4,926
7,609
136
Not much to report in this area it seems. Closest is following ongoing work on a driver to connect FPGAs to CPUs:

"Since last year AMD-Xilinx has been posting Linux patches for enabling CDX as a new bus between application processors (APUs) and FPGAs."

Seems to be quite a convoluted setup. Will the same interface be used with the integrated AIE on Phoenix and upcoming CPUs, or only with external FPGAs?
 
  • Like
Reactions: Tlh97 and Vattila

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
Am I reading the original post's slides correctly in that the only AIE involved with Zen 4 is on Phoenix Point, and desktop didn't get anything (which I know it didn't)? Why mobile first? Did they begin PP's development later in the Zen 4 cycle?
 

moinmoin

Diamond Member
Jun 1, 2017
4,926
7,609
136
Did they begin PP's development later in the Zen 4 cycle?
Yes, mobile usually gets the same core significantly after that core premieres in server and desktop chips. It looks much closer now time-wise due to the delays the Zen 4 Ryzen and Epyc chips saw.

Another indication of that delay is how soon Zen 4c, as used in PHX2 and Bergamo, is supposed to arrive. In future gens I expect that to happen more toward the middle of the gap between Zen generations.
 
  • Love
Reactions: A///

VirtualLarry

No Lifer
Aug 25, 2001
56,225
9,987
126
Will be interesting if you can use these consumer CPUs with an integrated FPGA to do "FPGA mining" with the right software / bitstreams. It could really shake things up in the GPU mining scene and drive some strong adoption for these new AMD CPUs.
 
  • Like
Reactions: A///

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
drive some strong adoption for these new AMD CPUs.
that's underselling the blowout sales. If it comes to that, and if Intel is smart, they'd incorporate the tech too. It'd be a massive sales boom. JH will be off in the corner playing with a doll of himself, crying over losing out on that sweet money rolling in.
 

DrMrLordX

Lifer
Apr 27, 2000
21,570
10,763
136
Will be interesting, if you can use these consumer CPUs with integrated FPGA to do "FPGA mining", with the right software / bitstreams. Could really shake things up in the GPU mining scene, and drive some strong adoption for these new AMD CPUs.

Depends on how large and powerful the integrated FPGA is.
 

nickmania

Member
Aug 11, 2016
47
13
81
I am really interested in this. My next computer purchase needs to have AI capabilities to train models, but I'm really confused at this point. The last interview with an AMD representative clearly stated that the consumer and server lines will have different tools and hardware optimizations for AI, so it's possible that buying a new mobile APU with integrated AI is not what I'm expecting, due to the lack of software and the differentiation described in the interview. Also, the current Zen 4 CPUs don't have any AI silicon, so should I wait for the Threadripper line to include the same AI silicon as the Epyc server line? Am I missing something?

I'd like to make an optimal purchase here, but it's really difficult at this point. For graphics-based AI the winner is clearly an Nvidia GPU with tons of memory, but for CPUs I have no clue at this time.
 

DrMrLordX

Lifer
Apr 27, 2000
21,570
10,763
136
I am really interested in this. My next computer purchase needs to have AI capabilities to train models, but I'm really confused at this point. The last interview with an AMD representative clearly stated that the consumer and server lines will have different tools and hardware optimizations for AI, so it's possible that buying a new mobile APU with integrated AI is not what I'm expecting, due to the lack of software and the differentiation described in the interview. Also, the current Zen 4 CPUs don't have any AI silicon, so should I wait for the Threadripper line to include the same AI silicon as the Epyc server line? Am I missing something?

I'd like to make an optimal purchase here, but it's really difficult at this point. For graphics-based AI the winner is clearly an Nvidia GPU with tons of memory, but for CPUs I have no clue at this time.

Consumer products (e.g. APUs/SoCs) with integrated AI functionality seem aimed more at inference than anything else. If you're looking to do training on consumer hardware, you're probably still stuck with whatever NV dGPU you can get.