What is the difference between CPU and GPU?

draggoon01

Senior member
May 9, 2001
858
0
0
Recently I've been trying to understand the difference. There's a lot of technical stuff, but I'm just aiming for a basic conceptual understanding.

It's a bit difficult because the articles I find are either too basic (CPUs are programmable, GPUs are massively parallel) or too technical (getting into how code is handled).

So here's my understanding so far. Please make corrections, fill in the gaps, etc.:




The work units of the CPU are the ALU and the FPU. The fundamental work unit of the GPU is the FPU.

The CPU is general and handles any type of processing. Since the work is so varied, left alone, programs would stall at various stages of the pipeline and waste many cycles. Maximum performance is achieved by dedicating transistors to organizational stuff (branch prediction, out-of-order processing, prefetch, cache, etc.) to keep the work units busy at all times.

The GPU is more specific, and its workload is very regular and predictable. So when the CPU does this kind of work, the resources used for organizing are wasted and unnecessary. GPUs get much better performance than CPUs because the organizing transistors are thrown out and more work units are put in their place.

Special instructions for the CPU, like SSE4, improve performance by acting as shortcut commands that skip unnecessary steps, and therefore make the CPU act more GPU-like.

Because GPU work is much simpler, the stages are shorter, and this is why their clock speeds are so much lower than CPUs' on the same manufacturing process.

PPU physics processors are very much like GPUs, and the fundamental work unit is the FPU. PPUs try to improve performance over GPUs by removing the unnecessary rendering resources and dedicating more transistors to physics processing. Whereas GPUs allow a vast improvement over CPUs, many believe the advantage of a PPU over a GPU is minimal because they are so similar, so it would be better to use a second GPU card for physics, if not just rely on a multi-core CPU.
 

techkyle

Junior Member
Aug 22, 2007
1
0
66
Researching stream processing might help on this. Sorry I couldn't give a more technical answer.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
I make some serious simplifications in this post. I make some guesses about GPU design and manufacturing. When I say "GPU", I'm referring to modern programmable GPUs, not older GPUs (their lack of programmability significantly simplifies some things, but I know practically nothing about their internal architecture, whereas I've seen a few PowerPoint presentations (e.g. this) about modern GPUs).

Originally posted by: draggoon01
The work units of the CPU are the ALU and the FPU. The fundamental work unit of the GPU is the FPU.

A CPU is ~3 ALUs and 1 FPU (which might do 2-3 operations at a time) and a boatload of support logic. A GPU is ~64 FPUs (oversimplification) and support logic.

The CPU is general and handles any type of processing. Since the work is so varied, left alone, programs would stall at various stages of the pipeline and waste many cycles. Maximum performance is achieved by dedicating transistors to organizational stuff (branch prediction, out-of-order processing, prefetch, cache, etc.) to keep the work units busy at all times.

It's not really that it is general-purpose... there are very specific tasks that a CPU is very good at. A CPU is very good at tasks where you frequently have to look at some data and make a decision based on it (a branch). A CPU can do a few things in parallel at the same time, but not very many, because for most workloads, there aren't that many things you can compute before you hit another decision point. Look through this file (you don't have to understand any of it) and note how many of the lines are conditional statements ("if (whatever)"). Partly as a result of this, a CPU has to put a lot of resources towards predicting the branches so it can run fast.
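
To make that concrete, here's a made-up C fragment (purely illustrative, not taken from the file linked above) where almost every line looks at the data and makes a decision, so there's very little straight-line work for the CPU to line up in between:

```c
/* Illustrative only: a hypothetical "branchy" routine.  Almost every
   statement is a data-dependent decision, so there is very little
   computation available between branches; the CPU relies on branch
   prediction to keep its few work units busy. */
int checksum_frame(const unsigned char *buf, int len)
{
    if (len < 4)                        /* decision */
        return -1;

    int sum = 0;
    for (int i = 0; i < len; i++) {     /* decision every iteration */
        if (buf[i] == 0x7e)             /* data-dependent decision */
            sum ^= 0xff;
        else if (buf[i] & 0x80)         /* another data-dependent decision */
            sum += buf[i] & 0x7f;
        else
            sum += buf[i];
    }
    return (sum & 0xff) ? sum : 0;      /* one more decision */
}
```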

The GPU is more specific, and its workload is very regular and predictable. So when the CPU does this kind of work, the resources used for organizing are wasted and unnecessary. GPUs get much better performance than CPUs because the organizing transistors are thrown out and more work units are put in their place.

The GPU is good at "embarrassingly parallel" tasks - tasks in which you do the same thing to, say, 1000 pieces of data before you do something else. If you want to take two long lists of numbers and multiply them, the GPU can perform a LOT of the multiplies each cycle (whereas the CPU can finish at most 1 per cycle, or maybe 2/4 if there's some MMX/SSE support).
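
The canonical example (my own sketch, not tied to any particular GPU) is an element-wise multiply of two arrays: every iteration is independent, so a GPU can hand each multiply to a different FPU, while a plain CPU loop finishes roughly one per cycle:

```c
/* Sketch of an "embarrassingly parallel" job: multiply two long lists
   element by element.  No decisions and no dependencies between
   iterations, so the work maps straight onto a GPU's many FPUs. */
void multiply_lists(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}
```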

Special instructions for the CPU, like SSE4, improve performance by acting as shortcut commands that skip unnecessary steps, and therefore make the CPU act more GPU-like.

MMX and SSE basically add some "embarrassingly parallel" support to the CPU. For example, if you want to multiply 4 pairs of numbers, without MMX/SSE it would be 4 operations, none of which happen simultaneously (because there's only 1 integer multiplier). With MMX, I think you can do them all in one shot.
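
For the floating-point case it's actually SSE rather than MMX (MMX only handles packed integers), but the idea is the same. A minimal sketch using SSE intrinsics - the function name is mine, and real code would also deal with alignment and leftover elements:

```c
#include <xmmintrin.h>   /* SSE intrinsics */

/* Multiply 4 pairs of floats with a single _mm_mul_ps instruction. */
void mul4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);              /* load a[0..3] */
    __m128 vb = _mm_loadu_ps(b);              /* load b[0..3] */
    _mm_storeu_ps(out, _mm_mul_ps(va, vb));   /* 4 multiplies at once */
}
```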

Because GPU work is much simpler, the stages are shorter, and this is why their clock speeds are so much lower than CPUs' on the same manufacturing process.

The manufacturing process isn't as similar as you might think. Two 90nm manufacturing processes can be very different. The transistors can have different threshold voltages, oxide thicknesses, gate lengths (a "65nm" transistor's gate length is NOT actually 65nm, and Intel's probably differs from AMD's probably differs from TSMC's probably differs from ...), dielectric materials, metal thicknesses, etc. There are a huge number of parameters that can be tuned to trade off speed, power, cost, design density, etc.

GPUs are generally made by third party manufacturers ("foundries"), while high-performance CPUs are generally made by the same company that designs them. This also leads to some differences. At a CPU company, design of a chip can start well before the manufacturing process the chip will use is ready for production. The transistor models used by the designers are highly speculative (read: wrong) at the beginning of a project. As the project approaches completion, the models get more accurate (since the fab is also getting closer to being ready for production). This allows CPU design teams to work on a chip well before the manufacturing technology is nailed down. The design team also talks with the fab and may request many changes as the manufacturing process is being developed ("make this metal thicker", "make this oxide thinner", etc).

A GPU company, on the other hand, may not start designing for a given process until the foundry is reasonably confident about what the production-ready process will look like (if the foundry's speculative models were very inaccurate, you can imagine their customers might be pretty annoyed, so they may not want to release early, highly-speculative models). There may also be more limits on how much the foundry is willing to change to suit the chip designer's desires.

The circuits in GPUs tend to be largely designed by computers (automatic synthesis). Synthesis is very fast (it might take a day to re-spin the whole chip), but it doesn't produce very good results. Critical circuits in high-performance CPUs are almost always designed by hand - while this takes months, the result is usually faster and smaller. Now, when you combine this with the preceding paragraphs, you might see that GPU companies start later, but their design cycles are also going to be shorter, which does help them catch up a bit.

There are probably also architectural factors - in a CPU, since you're not doing much work each cycle, you have to clock it very high. In a GPU, you could double the number of units or double the frequency, and probably end up with similar performance (and lower frequencies generally translate to significant power savings).
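
Rough back-of-the-envelope numbers (completely made up) showing why "wide and slow" can match "narrow and fast" on throughput while saving power:

```c
#include <stdio.h>

/* Made-up numbers: peak multiply rate is roughly units * clock, so
   16 FPUs at 1.5 GHz and 32 FPUs at 0.75 GHz have the same throughput.
   Dynamic power scales roughly with C * V^2 * f, and a lower clock
   usually tolerates a lower supply voltage, so the wide/slow design
   tends to burn noticeably less power for the same work. */
int main(void)
{
    double narrow_fast = 16 * 1.5e9;    /* multiplies per second */
    double wide_slow   = 32 * 0.75e9;
    printf("narrow+fast: %.0f Gmul/s, wide+slow: %.0f Gmul/s\n",
           narrow_fast / 1e9, wide_slow / 1e9);
    return 0;
}
```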

Oh, and don't bump posts in this forum. It gets very few new threads per day, so a thread will stay on the front page for quite a while before it needs to be bumped.
 

imported_Baloo

Golden Member
Feb 2, 2006
1,782
0
0
OK, the OP asks "what's the difference?" and then gives a pretty good explanation of the difference in his first post, answering his own question.

/thread closed. ;)