Parallel processing

nshariff22

Junior Member
Mar 14, 2005
11
0
0
Hi

I am not sure if I am posting this in the right category. I have been studying electronics engineering, and I am interested in doing a project in multicore programming for image or signal processing.
What I really want to know is whether the code would have to be written differently for Intel and AMD multicore processors, or whether it would be the same. And which one do you think is easier to understand and implement?

Ideally I would like to see the same sample code in both sequential form and parallelized (multithreaded) form.

Any help would be great.

Thanks
 

blackllotus

Golden Member
May 30, 2005
1,875
0
0
Code does not have to be written differently for Intel processors or AMD processors.

EDIT: At the assembly level there may be a few differences, but it is highly unlikely that you will have to worry about them.
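To illustrate that, and the sequential-versus-parallel comparison asked for in the first post, here is a minimal sketch in C using OpenMP. It assumes an 8-bit grayscale image stored as a flat array; the function names `threshold_seq` and `threshold_par` are made up for this example. Nothing in it is vendor-specific, so the same source should build for Intel and AMD x86 chips alike (e.g. `gcc -O2 -fopenmp`, or Visual C++ with `/openmp`).

```c
#include <assert.h>

/* Hypothetical example: binary threshold on an 8-bit grayscale image
   stored as a flat array of n pixels. */

/* Sequential version: one core walks every pixel. */
void threshold_seq(const unsigned char *in, unsigned char *out,
                   int n, unsigned char level)
{
    for (int i = 0; i < n; i++)
        out[i] = (in[i] >= level) ? 255 : 0;
}

/* Parallel version: the loop body is identical. The pragma asks the
   compiler to split the iterations across threads (by default roughly
   one per core). If OpenMP is not enabled, the pragma is ignored and
   the loop runs sequentially with the same result. */
void threshold_par(const unsigned char *in, unsigned char *out,
                   int n, unsigned char level)
{
    assert(n >= 0);
    /* OpenMP before 3.0 requires a signed integer loop index,
       so int is used rather than size_t. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = (in[i] >= level) ? 255 : 0;
}
```

Because every pixel is processed independently, the parallel loop needs no locking at all; that is what makes this kind of per-pixel image operation an easy first target.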
 

f95toli

Golden Member
Nov 21, 2002
1,547
0
0
blackllotus is right. However, one thing you need to worry about when doing numerical work (image or signal processing in your case) on parallel processors is how to distribute the work. Essentially there is always a compromise between "redundant code" (meaning you are doing the same thing on two or more processors) and communication BETWEEN the processors (which is slow, meaning it causes a lot of communication overhead).
Communication overhead is obviously less of an issue with a multicore architecture than with several "discrete" CPUs, but I wonder whether the "optimal" trade-off between communication and "redundancy" is the same for the AMD and Intel processors. I suspect the answer is not obvious and might depend a great deal on the application.

Last year I worked on a project where I was using a cluster made up of a number of quad-Opteron computers on a high-speed network. Distributing the work was a major PITA because the communication overhead was reasonably low as long as only CPUs in the same computer were involved, but it went through the roof if one of the four CPUs needed to communicate with a fifth on another computer.
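One common way to see the redundancy/communication trade-off is domain decomposition with "halo" (ghost) samples. The sketch below is a generic illustration under assumptions, not the setup described here: a 3-tap moving average over an integer signal is split into chunks, and each hypothetical worker copies its chunk plus one overlapping sample on each side into private memory before computing. Those duplicated boundary samples are the redundancy that saves workers from exchanging values mid-computation; wider filters need wider halos, so the duplicated share grows. To keep it short, the "workers" run one after another in a plain loop, and `smooth3_seq`/`smooth3_chunked` are made-up names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Clamp-to-edge read: x[-1] is treated as x[0], x[n] as x[n-1]. */
static int at(const int *x, long n, long i)
{
    if (i < 0) return x[0];
    if (i >= n) return x[n - 1];
    return x[i];
}

/* Reference: 3-tap moving average over the whole signal. */
void smooth3_seq(const int *x, int *y, long n)
{
    for (long i = 0; i < n; i++)
        y[i] = (at(x, n, i - 1) + x[i] + at(x, n, i + 1)) / 3;
}

/* Decomposed version: `workers` chunks, each computed only from a
   private buffer holding the chunk plus a one-sample halo per side.
   Returns 0 on success, -1 if the buffer cannot be allocated. */
int smooth3_chunked(const int *x, int *y, long n, int workers)
{
    assert(workers > 0 && n > 0);
    long chunk = (n + workers - 1) / workers;
    int *buf = malloc((size_t)(chunk + 2) * sizeof *buf);
    if (!buf) return -1;

    for (int w = 0; w < workers; w++) {      /* one iteration = one worker */
        long lo = w * chunk;
        long hi = lo + chunk < n ? lo + chunk : n;
        if (lo >= hi) break;
        long len = hi - lo;

        /* "Communication" step: gather chunk + halo into private memory.
           The halo samples are also owned by the neighbouring workers. */
        buf[0] = at(x, n, lo - 1);
        memcpy(buf + 1, x + lo, (size_t)len * sizeof *buf);
        buf[len + 1] = at(x, n, hi);

        /* Local compute touches only the private buffer. */
        for (long i = 0; i < len; i++)
            y[lo + i] = (buf[i] + buf[i + 1] + buf[i + 2]) / 3;
    }
    free(buf);
    return 0;
}
```

On a multicore chip the gather step is just a memory copy; on a cluster of separate machines it would become a network message, which is where the communication cost can explode.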

 

bobsmith1492

Diamond Member
Feb 21, 2004
3,875
3
81
Signal processing is extremely parallel by nature. The DSP board we've been using in class can do 8 operations/cycle, with some restrictions (2 can be multiplications, 2 can be memory fetches, and so on). I know its compiler automatically splits it up, and it does the DFT by radix 4 FFTs more efficiently than radix 2 just because of this. I would wonder if there are any compilers around that might allow for multithreaded compilation.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: bobsmith1492
Signal processing is extremely parallel by nature. The DSP board we've been using in class can do 8 operations/cycle, with some restrictions (2 can be multiplications, 2 can be memory fetches, and so on). I know its compiler automatically splits it up, and it does the DFT by radix 4 FFTs more efficiently than radix 2 just because of this. I would wonder if there are any compilers around that might allow for multithreaded compilation.

For your DSP board or in general?

Many compilers support multithreaded applications, as long as you have the right libraries and the OS you are compiling for supports multithreading. I've personally used gcc and Visual C++, compiling for both Windows and UNIX/Linux.
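As a rough illustration of the library side, here is a sketch using POSIX threads, which gcc on Linux/UNIX supports (on Windows you would use its own thread API or a portable layer instead). It computes the energy (sum of squares) of a signal by giving each thread a contiguous slice and a private result slot; `energy_threaded`, `energy_worker` and `MAX_THREADS` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

/* Work description handed to each thread. */
struct slice {
    const double *x;   /* whole signal (read-only, shared)           */
    long lo, hi;       /* this thread's half-open range [lo, hi)     */
    double partial;    /* private result slot, so no locking needed  */
};

static void *energy_worker(void *arg)
{
    struct slice *s = arg;
    double acc = 0.0;
    for (long i = s->lo; i < s->hi; i++)
        acc += s->x[i] * s->x[i];
    s->partial = acc;
    return NULL;
}

/* Signal energy split across nthreads POSIX threads.
   Returns -1.0 if a thread could not be created. */
double energy_threaded(const double *x, long n, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    struct slice s[MAX_THREADS];
    long chunk = (n + nthreads - 1) / nthreads;

    for (int t = 0; t < nthreads; t++) {
        long lo = t * chunk, hi = lo + chunk;
        s[t].x = x;
        s[t].lo = lo < n ? lo : n;
        s[t].hi = hi < n ? hi : n;
        s[t].partial = 0.0;
        if (pthread_create(&tid[t], NULL, energy_worker, &s[t]) != 0) {
            for (int j = 0; j < t; j++) pthread_join(tid[j], NULL);
            return -1.0;
        }
    }
    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);   /* wait, then combine in fixed order */
        total += s[t].partial;
    }
    return total;
}
```

Build with something like `gcc -O2 -pthread`. Because the partial sums are added in a different order than a plain loop would use, floating-point results can differ from a sequential sum in the last bits.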

To 'natively' support multiple threads using the DSP board, it would have to have drivers that are thread-safe and/or reentrant. Its documentation should tell you if it supports this kind of operation. Otherwise, to have multiple threads using it, you would need to put some sort of thread-safe scheduler (or at least a mutual exclusion lock) around the driver calls.
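A minimal sketch of the mutual-exclusion approach, assuming a driver entry point that is not thread-safe. `dsp_submit` is a hypothetical stand-in (it only increments a counter) because no real board API is given here; the point is that every call goes through `dsp_submit_safe`, which holds one mutex for the duration of the driver call. `client_thread` and `SUBMITS_PER_THREAD` are likewise made up for the demo.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a vendor driver call that is NOT
   thread-safe: it updates internal state without any locking. */
static long board_jobs = 0;
static int dsp_submit(const float *block, int len)
{
    (void)block;
    board_jobs += 1;          /* unsynchronized read-modify-write */
    return len;
}

/* One lock serializes every call into the driver. */
static pthread_mutex_t dsp_lock = PTHREAD_MUTEX_INITIALIZER;

int dsp_submit_safe(const float *block, int len)
{
    assert(block != NULL);
    pthread_mutex_lock(&dsp_lock);
    int rc = dsp_submit(block, len);
    pthread_mutex_unlock(&dsp_lock);
    return rc;
}

long dsp_jobs_submitted(void)
{
    pthread_mutex_lock(&dsp_lock);
    long n = board_jobs;
    pthread_mutex_unlock(&dsp_lock);
    return n;
}

/* Example client thread: submits SUBMITS_PER_THREAD blocks. */
#define SUBMITS_PER_THREAD 10000
void *client_thread(void *arg)
{
    float block[64] = {0};
    (void)arg;
    for (int i = 0; i < SUBMITS_PER_THREAD; i++)
        dsp_submit_safe(block, 64);
    return NULL;
}
```

Several threads can now share the board, but only one is ever inside the driver at a time; without the lock, concurrent `board_jobs += 1` updates could be lost.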

For example, you could have one dedicated thread in your program that takes data from a shared queue and sends it to the DSP board; the other threads would then just put data into that queue rather than sending it directly to the board.
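That queue arrangement could look roughly like this sketch: a bounded FIFO protected by a mutex and two condition variables, with exactly one `board_thread` draining it. The board itself is replaced by a running sum, and all names (`queue_push`, `queue_pop`, `board_thread`, `producer_thread`, `QCAP`, ...) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QCAP 8

/* Bounded FIFO shared by producer threads and the single board thread. */
struct queue {
    int items[QCAP];
    int head, count;
    bool closed;                          /* no more items will arrive */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void queue_init(struct queue *q)
{
    q->head = q->count = 0;
    q->closed = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Called by any producer thread; blocks while the queue is full. */
void queue_push(struct queue *q, int item)
{
    pthread_mutex_lock(&q->lock);
    assert(!q->closed);                   /* pushing after close is a bug */
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QCAP] = item;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Called only by the board thread. Returns false once the queue has
   been closed and fully drained. */
bool queue_pop(struct queue *q, int *item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    bool ok = q->count > 0;
    if (ok) {
        *item = q->items[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

void queue_close(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* The only thread that would touch the board; summing the items
   stands in for the actual driver calls. */
struct board_ctx { struct queue *q; long processed_sum; };

void *board_thread(void *arg)
{
    struct board_ctx *ctx = arg;
    int item;
    while (queue_pop(ctx->q, &item))
        ctx->processed_sum += item;       /* stand-in for a driver call */
    return NULL;
}

/* Example producer: pushes `count` consecutive values starting at `first`. */
struct producer_arg { struct queue *q; int first, count; };

void *producer_thread(void *arg)
{
    struct producer_arg *p = arg;
    for (int i = 0; i < p->count; i++)
        queue_push(p->q, p->first + i);
    return NULL;
}
```

Since only `board_thread` ever calls into the driver, the driver itself needs no locking; producers block only when the queue is full.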
 

f95toli

Originally posted by: bobsmith1492
Signal processing is extremely parallel by nature.

That is not entirely correct. There are plenty of examples of algorithms that are very difficult to run efficiently on parallel processors.
One example that comes to mind is a signal recovery algorithm used in 3G base stations: there is literally no gain in running it on multiple processors, so a very fast serial DSP is used instead. The problem is that the same approach is impossible for 4G base stations, because the higher bandwidth would require a DSP running at a clock frequency of something like 200 GHz.
This is not possible using semiconductors, and it is actually one area where superconducting electronics might have a future, since that is the only technology that can handle those speeds.
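The specific 3G algorithm isn't named, so as a generic stand-in, here is a sketch (with hypothetical function names) contrasting a filter whose outputs depend only on inputs with a first-order recursive (IIR) filter, y[i] = a*y[i-1] + x[i]. In the recursive case each iteration needs the previous result, a loop-carried dependency, so the loop cannot simply be cut into chunks for different cores.

```c
#include <assert.h>
#include <stddef.h>

/* First difference, y[i] = x[i] - x[i-1]: every output depends only on
   inputs, so iterations are independent and could go to different cores. */
void first_difference(const double *x, double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = x[i] - (i > 0 ? x[i - 1] : 0.0);
}

/* First-order recursive (IIR) filter, y[i] = a*y[i-1] + x[i]: iteration i
   needs the result of iteration i-1 (a loop-carried dependency), so the
   loop cannot simply be cut into chunks for different cores. */
void first_order_iir(const double *x, double *y, size_t n, double a)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double prev = 0.0;                    /* y[-1] taken as 0 */
    for (size_t i = 0; i < n; i++) {
        prev = a * prev + x[i];
        y[i] = prev;
    }
}
```

Linear recurrences like this one can be parallelized with prefix-scan reformulations, but that costs extra arithmetic and memory traffic, so a fast serial processor can still come out ahead.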

 

blackllotus

Originally posted by: bobsmith1492
I would wonder if there are any compilers around that might allow for multithreaded compilation.

IIRC, the optimizer in Microsoft's Visual C++ compiler can automatically multithread certain sections of code.

EDIT: NVM, it just adds support for OpenMP.
 

bobsmith1492

Originally posted by: Matthias99
For your DSP board or in general?

Well, the compiler works more on the individual instruction level rather than on multiple threads. That is, a single-threaded program can be compiled to run more quickly since the processor can do several steps at once. Granted, this won't help much in sequential operations, which is why the radix-4 FFT is faster on this particular core - it can use the processor more efficiently.

I'm pretty sure modern CPUs do this to an extent as well; possibly that's what the Core 2 Duo's "4-issue" core means? Maybe this DSP is essentially a limited 8-issue core...
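The instruction-level parallelism described here can be made concrete with a single-threaded sketch (hypothetical function names): a dot product with one accumulator forms a single dependency chain, while splitting it into four independent partial sums gives a superscalar CPU or VLIW DSP independent multiply-adds it can issue in the same cycles. This is a generic illustration, not what the class DSP's compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each add must wait for the previous add to finish,
   so the loop is one long dependency chain. */
float dot_single(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Four independent accumulators: the four chains do not depend on each
   other, so the hardware can overlap them. Still a single thread.
   Float results can differ from dot_single in the last bits because
   the additions happen in a different order. */
float dot_unrolled4(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)                    /* leftover elements */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```

Compilers generally won't reorder floating-point additions like this on their own unless told they may (for example with gcc's `-ffast-math`), since the reordering can change the rounding of the result.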

Originally posted by: f95toli
That is not entirely correct. There are plenty of examples of algorithms that are very difficult to run efficiently on parallel processors.

I'm sure there are... I was thinking along the lines of image and video processing, though, and wasn't really thinking about multithreading at all with this board.