Why not skip vector extensions (AVX) and focus on adding full vector units (GPGPU/APU)?

bobb2qw

Junior Member
Jul 28, 2010
Can someone explain why the CPU makers are bothering to add vector extensions when they are working towards adding full vector units to their CPUs?


Especially AMD, since their GPU tech is ahead of Intel's. They could have created an advantage like they did by developing x86-64. When Intel announced they would add AVX instead of SSE5, AMD could have decided to focus on getting their GPGPU integrated instead of playing ball with AVX.

Or on Intel's side: instead of bothering with AVX extensions, why not add a full GPGPU core? Instead of a quad-core CPU, they could replace one CPU core with a GPGPU core, making it 3 CPU + 1 GPGPU. The GPGPU would be made up of 4-8 Larrabee cores.
 

cbn

Lifer
Mar 27, 2009
bobb2qw said:
Or on Intel's side: instead of bothering with AVX extensions, why not add a full GPGPU core? Instead of a quad-core CPU, they could replace one CPU core with a GPGPU core, making it 3 CPU + 1 GPGPU. The GPGPU would be made up of 4-8 Larrabee cores.

http://forums.anandtech.com/showpost.php?p=30187264&postcount=166

Computer Bottleneck said:
I know you are referring to AMD in this post, but how is Intel's support for OpenCL coming along?

Scali said:
I have no idea really.
Intel officially supports the OpenCL standard; they've mentioned it a few times in press releases as well.
Perhaps they were/are planning to release OpenCL support once Larrabee is on the market. I doubt they have much interest in a regular x86 runtime, as there's not much use: all software already supports x86 natively, and native code will always be more efficient than OpenCL code. Enabling OpenCL support for their x86 processors would only open the door to more GPGPU support in applications, which is obviously not in Intel's interest.
And Intel's current GPUs are probably slower at OpenCL than their CPUs, so not much point in supporting those either (although I think their GPUs may be capable of it, technically. They have full DX10 support, and certain operations such as triangle clipping are also done on their shaders, rather than with dedicated hardware).

So I don't know the status of their OpenCL support... perhaps they have developed a CPU runtime but just don't see a reason to release it... and perhaps they have not invested time in it at all. My guess would be the former, though.

AMD's CPU OpenCL runtime works fine on Intel CPUs though, so end-users can still use OpenCL on Intel processors if they want.
 

aphorism

Member
Jun 26, 2010
Interesting question.

AVX is basically improved SSE with double the register width. I think one strategy of the x86 makers is to avoid adding more cores (and the scaling issues that come with them) by adding wider SIMD to the cores they already have.
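A minimal sketch of what that doubling looks like in intrinsics (the function names here are made up for illustration; needs a compiler with AVX enabled, e.g. -mavx):

Code:
#include <immintrin.h>  // SSE/AVX intrinsics

// SSE: one instruction adds 4 floats (128-bit registers).
void add4(const float *a, const float *b, float *out) {
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

// AVX: the same single instruction now adds 8 floats (256-bit registers).
void add8(const float *a, const float *b, float *out) {
    _mm256_storeu_ps(out, _mm256_add_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b)));
}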

Also, the programming models would be very different if the SIMD cores were not an extension of x86: they would have their own ISA and could not share the same memory space. This is a problem with AMD's "fusion", because you have to buffer memory in a really inefficient way if you need to send small amounts of data at a time.
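Something like this round trip, to sketch it with the CUDA host API (the kernel and wrapper names are made up for illustration):

Code:
#include <cuda_runtime.h>

// The device has its own address space, so even a tiny job pays the full
// copy-in / launch / copy-out round trip.
__global__ void scale(float *v, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= s;
}

void scale_on_gpu(float *host, int n) {
    float *dev;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);  // buffer in
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);  // buffer out
    cudaFree(dev);
    // For small n, the two copies plus launch latency dwarf the actual work.
}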

Lastly, you cannot just make one massive vector unit and expect good utilization. You have to add more units, and then you end up with some ratio of vector cores to scalar cores, when you could instead make a superscalar core with internal vector and scalar units.
 

Cogman

Lifer
Sep 19, 2000
The way a GPGPU works and the way a CPU works are in two very different ballparks. Think of the GPU as 100 really dumb, and somewhat slow, CPUs. The reason GPGPU programming works (for some applications) is because what you are essentially saying is "this half of the GPGPU cores: add 1 to each 64-bit element of this chunk of memory". Yes, it does a lot of processing, but the problem is that, individually, each stream processor is somewhat slow.
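That "add 1 to each element" example maps almost literally onto a GPGPU kernel. A minimal sketch in CUDA (the array size and launch configuration are arbitrary numbers for illustration):

Code:
#include <cstdio>
#include <cuda_runtime.h>

// One thread per element: each "dumb CPU" adds 1 to its own 64-bit slot.
__global__ void add_one(long long *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const size_t n = 1 << 20;
    long long *d;
    cudaMalloc((void **)&d, n * sizeof(long long));
    cudaMemset(d, 0, n * sizeof(long long));
    add_one<<<(n + 255) / 256, 256>>>(d, n);  // thousands of slow threads, one simple op each
    cudaDeviceSynchronize();
    long long first;
    cudaMemcpy(&first, d, sizeof(long long), cudaMemcpyDeviceToHost);
    printf("data[0] = %lld\n", first);  // prints 1
    cudaFree(d);
    return 0;
}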

CPUs, on the other hand, work in a much more specialized fashion. They work with small chunks of data in a very serial manner, quickly jumping around and performing different tasks based on what the data tells them to do.

The GPGPU isn't the silver bullet of computing, and the same goes for multiple cores. You can't just say "make everything a GPU, then things will go faster", because it won't. No amount of programming-paradigm switching will change that; some operations are inherently sequential in nature.
 

Scali

Banned
Dec 3, 2004
AVX is pretty similar to what they were (are) planning to do with Larrabee.
Adding vector extensions is Intel's way of adding full vector units and arriving at a GPGPU/APU.
Larrabee just had an entirely different implementation of this x86+extensions architecture... namely with 4-way hyperthreading and in-order cores.
After all, the instruction set (or its extensions) doesn't define HOW you implement it.
 

Scali

Banned
Dec 3, 2004
Cogman said:
The way a GPGPU works and the way a CPU works are in two very different ballparks. Think of the GPU as 100 really dumb, and somewhat slow, CPUs.

SSE/AVX and GPUs are very similar in operation, though, especially in the case of nVidia and their 'scalar thread' approach.

A single SSE/AVX instruction performs one operation on a whole register's worth of elements in parallel.
That is exactly what CUDA does as well; the difference is mainly that CUDA takes the parallelism way further than just 4 or 8 elements per register.
A CUDA multiprocessor is basically like an uber-wide SSE/AVX unit on an Intel CPU.
The main difference is in how nVidia selects the instructions to execute next. The hardware can have many instances of the same thread in flight, and it uses a scoreboarding algorithm to keep track of which threads are ready to execute their next instruction (i.e., all dependencies on previous instructions have been resolved).
This is more or less what Intel tried to do with their 4-way SMT in Larrabee, I suppose: having 4 different threads of SIMD routines running on a single core, where they can hide each other's latency.
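To give a flavor of that equivalence, here is the same elementwise operation written both ways (a sketch; the AVX version assumes n is a multiple of 8, and the warp width of 32 is nVidia's, not something in the code):

Code:
#include <cstddef>
#include <immintrin.h>

// AVX on the CPU: one instruction, 8 float lanes per 256-bit register.
void saxpy_avx(float a, const float *x, float *y, size_t n) {
    __m256 va = _mm256_set1_ps(a);
    for (size_t i = 0; i < n; i += 8) {
        __m256 r = _mm256_add_ps(_mm256_mul_ps(va, _mm256_loadu_ps(x + i)),
                                 _mm256_loadu_ps(y + i));
        _mm256_storeu_ps(y + i, r);
    }
}

// CUDA on the GPU: each thread is one "lane"; a warp of 32 threads issues
// the same instruction together -- an uber-wide SIMD unit, in effect.
__global__ void saxpy_cuda(float a, const float *x, float *y, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}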
 

ydnas7

Member
Jun 13, 2010
Amdahl's law is why we haven't been using multiprocessors since the late transputer.
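For reference, Amdahl's law caps the speedup from n processors at 1 / ((1 - p) + p/n), where p is the fraction of the work that can run in parallel. A quick sketch of how hard that cap bites (the 0.9 fraction is just an illustrative number):

Code:
#include <cstdio>

// Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n),
// where p is the fraction of the program that can run in parallel.
double amdahl(double p, double n) { return 1.0 / ((1.0 - p) + p / n); }

int main() {
    double cores[] = {4, 16, 1000};
    for (double n : cores)  // even 90% parallel code never exceeds 10x
        printf("p = 0.9, n = %4g -> speedup %.2fx\n", n, amdahl(0.9, n));
    return 0;
}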