Why can't GPUs/VPUs get to 1 GHz?

futuristicmonkey

Golden Member
Feb 29, 2004
1,031
0
76
I was just thinking: if CPUs are already at 3.40 GHz, why can graphics processing units only run at something like 520 MHz? Is it because you wouldn't be able to get a big enough cooler on there? Or enough power? Or are they just incapable of running at those speeds because of their increased number of transistors (complexity- and heat-wise)?

Why?
 

VIAN

Diamond Member
Aug 22, 2003
6,575
1
0
Well, a CPU basically has one pipeline, with as many as 31 stages.

A GPU has up to 16 pipelines, with hundreds of stages.

Note: saying a GPU has 16 pipelines is about the same as saying the chip has 16 similar GPUs in it. CPUs will be coming out with something like that (multiple cores) in the next year or so.
 

Diablo6178

Senior member
Aug 23, 2000
448
0
0
Mostly it's due to the architecture and the transistor count. They're on the same or an older manufacturing process than CPUs, but the primary objective isn't a fast clock; what they need is a lot of bandwidth. There are also a lot of parallel pipelines executing multiple instructions at once, which gets around a lot of the need for clock speed increases.
 

alent1234

Diamond Member
Dec 15, 2002
3,915
0
0
I thought graphics processors were built with RISC in mind, and that the architecture was prone to heat.
 

imported_obsidian

Senior member
May 4, 2004
438
0
0
Transistors:

P4 Northwood - 55 million
P4 Prescott - 125 million
P4EE - 178 million (most of it for the L3 cache)

9800 XT - 107 million
X800 - 160 million
GeForce 6800 - 222 million
 

VIAN

Diamond Member
Aug 22, 2003
6,575
1
0
Don't forget that:

The P4 Northwood has 512kB of L2 cache.
The P4 Prescott has 1MB of L2 cache.
The P4 Extreme Edition has 512kB of L2 cache and 2MB of L3 cache.
 

Goi

Diamond Member
Oct 10, 1999
6,771
7
91
GPUs come out really fast, with a new iteration every 6-12 months. New CPU designs come out once every few years, so the engineers have a lot of time to optimize the die/mask. Because of that they're able to ramp up clock speeds a lot more than GPUs. If GPU engineers took that long to optimize, we'd probably still be stuck on a RIVA 128 now ;)
 

jiffylube1024

Diamond Member
Feb 17, 2002
7,430
0
71
Think about it. A GPU has up to 16 parallel pipelines (e.g. the X800 XT and 6800 Ultra). That means it processes 16 pixels at once - if the chip is running at 500 MHz, it's like an 8000 MHz processor running on a single pipeline.

GPUs sacrifice raw clock speed (which isn't needed) for ultra-heavy parallelization (which is essential). As has been proven time and time again, clock speed isn't everything, and that's never more true than in the GPU field.
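Quick back-of-the-envelope version of that, in Python (idealized - it assumes every pipeline finishes one pixel per clock with no stalls, which real hardware never quite manages):

```python
# Toy throughput math: a wide-but-slow GPU vs. a hypothetical
# narrow-but-fast one. One pixel per pipeline per clock, best case.
def effective_rate(pipelines, clock_mhz):
    return pipelines * clock_mhz  # idealized megapixels/sec

print(effective_rate(16, 500))   # X800-style: 16 pipes @ 500 MHz -> 8000
print(effective_rate(1, 8000))   # same ideal throughput, one pipe @ 8 GHz
```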
 

Insomniak

Banned
Sep 11, 2003
4,836
0
0
Originally posted by: VIAN
yeah, so technically... the X800 XT is an 8GHz single-pipeline GPU.


Technically, it's a 500-odd MHz 16-pipeline GPU. Theoretically, an 8GHz single-pipeline GPU would give the same result.

Key word - theoretically.
 

Dman877

Platinum Member
Jan 15, 2004
2,707
0
0
CPUs are designed by hand; GPUs are laid out by computers. That's part of the difference.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
CPUs are designed by computers as well, with the important parts hand-tuned by humans with the know-how.
You'd need a ~3.3GHz single-pipeline, single-texturing GPU to keep up with a 412MHz 8-pipeline single-texturing GPU.
Now, do you go through all the hard work CPU manufacturers do to get there... or go through 80% of it and add 7 more units, plus a couple of parts to manage them? You do the latter.
The Athlons and Pentium 4s have shown that a lower-clocked chip can still perform very well. But that's with decision-making tasks: 1+1, 10 cycles, next, 10 cycles, OK, now do something else, wait, wait, wait... and you can get other stuff done in those 9 'blank' pipeline stages.
Most GPU work could use pipelines hundreds of cycles long (and for all I know, they may), but as long as each stage stays filled - like an assembly line - you get an output of pipes * MHz.
The 9500 Pro and 9600 XT at similar clock speeds basically prove it:
9500 Pro: 8x1, 275MHz core, 540MHz RAM.
9600 XT: 4x1, 500MHz core, 600MHz RAM.
A 9500 Pro at 275/600 will get the same single- and multi-texture results as a 9600 XT at 550/600.
So if you need cost savings, work on clock speed. But if you need performance, the extra pipelines give you near-perfect scaling.
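If you want to check that arithmetic yourself (idealized fill rate only - real results also depend on memory, which is why I matched the RAM at 600MHz in the comparison):

```python
# Idealized fill rate = pipelines * clock, in Mpixels/s.
# Ignores the memory bottleneck entirely.
cards = {
    "9500 Pro (8x1 @ 275MHz)": 8 * 275,
    "9600 XT  (4x1 @ 500MHz)": 4 * 500,
    "9600 XT  (4x1 @ 550MHz, overclocked)": 4 * 550,
}
for name, mpix in cards.items():
    print(f"{name}: {mpix} Mpixels/s idealized")
# 8 pipes at 275MHz and 4 pipes at 550MHz both come out to 2200.
```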
 

Falloutboy

Diamond Member
Jan 2, 2003
5,916
0
76
In my mind it would be in the better interest of Nvidia and ATI to try for a 2GHz 2x2-pipeline card. They would save a ton of transistors and it would be a lot cheaper to build.
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
1. 16 pipelines.
2. Uses generic transistors and design. They don't customize it as much as a CPU, since the shelf life is about 18 months vs 5 years for a CPU.
3. Highly parallel.

Overall, just the sheer mass of execution units is probably what keeps them at such a low clock.
It isn't that big of a deal when you consider the sheer amount of processing power these things have. I bet when the DirectNext cards come out with a full programming model, a bunch of GPUs will get used for mathematical work - a highly parallel GPU with very fast local memory.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Originally posted by: Falloutboy525
In my mind it would be in the better interest of Nvidia and ATI to try for a 2GHz 2x2-pipeline card. They would save a ton of transistors and it would be a lot cheaper to build.
It already costs insane R&D for each new chip. When a new Direct3D spec is on its way, they might as well be redesigning from the ground up. It might be a smaller chip, but when all is said and done, can they get it to market on time? If not, it's more expensive. Can they get yields similar to their larger parts? If not, it's more expensive.
Lastly, if they could run at 2GHz, why not just tack more pipelines on and blow the competition away completely? :)
 

Wolfdog

Member
Aug 25, 2001
187
0
0
There are quite a few parts that go into making a CPU/GPU chip. When you look at it, the manufacturing processes most CPU makers use to fab their chips are far more advanced than TSMC's, or IBM's for that matter. CPU makers build their chips with performance and clock speed in mind, though they must keep a somewhat general approach to how they run. GPUs, on the other hand, have a very specific set of things that they do.

Complexity is not really holding them back, though. GPU producers are generally one or two processes behind the CPU generations. We won't see .09 micron GPUs until late this year or early next, while both Intel and AMD have invested billions into using bleeding-edge processes to fab their parts this year.

When it comes down to it, though, a 1GHz GPU will ultimately be limited by its memory interface. Graphics designers know this and will be settling in for the long slog soon, since they can no longer just pile on complexity; process shrinks are not going to carry them much longer. One only needs to look at the current-generation X800s and 6800 Ultras: they run very hot and use more power than any desktop CPU. On their current trend we should see 300-million-plus transistor parts next year. So, to get back on track: GPU makers are definitely going wider and slower rather than narrower and faster.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Originally posted by: JeremiahTheGreat
Well.. I don't think the AGP slot could handle 1/2 kg of copper tacked to the side.. :eek:
That has nothing to do with it. Current GPUs already draw 80% or more of the power a P4 does, and people seem to be doing OK with that much on the video card in the form of aftermarket coolers...
 

Insomniak

Banned
Sep 11, 2003
4,836
0
0
Originally posted by: Falloutboy525
In my mind it would be in the better interest of Nvidia and ATI to try for a 2GHz 2x2-pipeline card. They would save a ton of transistors and it would be a lot cheaper to build.


Bad idea. Graphics rendering requires MANY threads to be processed in parallel, otherwise it takes forever to render a frame because so many different kinds of effects are going on at once. GPUs need massive throughput, not sky high speeds.

Trust me, the engineers at ATi and NV know what they're doing, k? ;)
 

jiffylube1024

Diamond Member
Feb 17, 2002
7,430
0
71
Originally posted by: Insomniak
Originally posted by: Falloutboy525
In my mind it would be in the better interest of Nvidia and ATI to try for a 2GHz 2x2-pipeline card. They would save a ton of transistors and it would be a lot cheaper to build.


Bad idea. Graphics rendering requires MANY threads to be processed in parallel, otherwise it takes forever to render a frame because so many different kinds of effects are going on at once. GPUs need massive throughput, not sky high speeds.

Trust me, the engineers at ATi and NV know what they're doing, k? ;)

Damn, beat me to it!

Yep, you can't just say a 2GHz 2x2 card would be better; it would be many times worse. Think about the super-long pipeline that would be necessary for a 2GHz GPU. Cache misses and mispredicted branches would be a killer.

You need the GPU to do all that work in parallel - unlike a CPU, which is processing different instructions all the time, there is enormous repetition in rendering every single pixel on a screen.

As Insomniak said - trust the engineers, they know what they're doing.
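To put some rough numbers on the branch problem (the pipeline depths and mispredict rate below are invented, just to show the shape of it):

```python
# Toy model: average cycles per operation when some fraction of
# operations are mispredicted branches that flush the pipeline.
# A flush costs roughly one pipeline refill.
def avg_cycles(pipeline_depth, mispredict_rate):
    return 1 + mispredict_rate * pipeline_depth

print(avg_cycles(pipeline_depth=20, mispredict_rate=0.05))  # 2.0 - CPU-ish
print(avg_cycles(pipeline_depth=60, mispredict_rate=0.05))  # 4.0 - a deep 2GHz design
# The deeper the pipe, the more each mispredict hurts.
```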
 

KF

Golden Member
Dec 3, 1999
1,371
0
0
CPUs work at a multiple of the memory data rate, around 5x. It makes sense to do this only because of what is called "locality": average programs reuse relatively small (under 1 MB) sections of memory repeatedly. Therefore a small local memory (cache) operating at CPU speed is feasible. Without that CPU-speed cache, a 3GHz CPU would be unable to do much useful work at a 3GHz rate. Working at a multiple of the memory speed is not very efficient in terms of CPU cycles, but it does maximize program speed.

A GPU needs to access 8 MB of data or more to do a single high-resolution frame, and video cards are coming with 128 MB. The data lacks enough locality for a reasonably sized cache to make running the GPU at a multiple of the memory data rate usable. (Maybe a somewhat higher rate might be useful as a kind of overkill, to make up for stalls and to keep buffers full.) Instead they use very high-speed (expensive) and very wide memory to get the data rate very high. GPUs with enough on-chip, full-speed memory have just been too expensive to build, I think. It appears that, given the data GPUs work on, it works out just as well the way they have to do it, without a lot of wasted cycles.
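Some rough numbers on why a cache can't save a GPU (assuming 32-bit color at 1600x1200, and a guessed factor for overdraw and texture traffic):

```python
# One high-res frame is several MB, touched once and thrown away -
# far past any reasonable on-chip cache, so locality doesn't help.
width, height, bytes_per_pixel = 1600, 1200, 4
frame_mb = width * height * bytes_per_pixel / 2**20
print(f"one 32-bit frame: {frame_mb:.1f} MB")   # ~7.3 MB

fps = 60
touches_per_pixel = 4  # guess: overdraw plus texture reads
mb_per_sec = frame_mb * touches_per_pixel * fps
print(f"rough bandwidth need: {mb_per_sec / 1024:.1f} GB/s")  # ~1.7 GB/s, and up
```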
 

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
Originally posted by: Insomniak
Originally posted by: VIAN
yeah, so technically... the X800 XT is a 8GHz single pipeline GPU.

Technically, it's a 600-odd Mhz 16 pipeline GPU. Theoretically, an 8Ghz single pipeline GPU would give the same result.

Key word - theoretically.

Which is why I've often wondered why someone hasn't built a multi-processor, software-based "virtual GPU" card by slapping a bunch of cheap, modern, high-clock-speed CPUs on a card and running the 3D pipeline in software. It would be infinitely more flexible and upgradable, and when not processing 3D it could be dedicated to other things, like distributed-computing projects, etc. I think the main issues are interconnect bandwidth (with multiple CPUs), and even more so heat.

But with the introduction of things like 1.5GHz 25W Athlon XP chips, or maybe undervolted 1.8GHz Durons, this seems a bit more feasible. Indeed, there are now some partially software-implemented GPU solutions, like the XGI Volari and whatnot, except that they use the host CPU instead of offloading onto dedicated slave CPUs. Btw, most older 3D arcade games didn't use dedicated GPU ASICs; they actually had a group of slave multi-processor CPUs running a 3D rendering pipeline in software. Most Sega Model 2 and Model 3 arcade games were like this, as was some of Namco's System 22/23 hardware. It was only with the Naomi and System 246 hardware that they started to use dedicated rendering GPUs and such.

Even better, maybe you could use those slave CPUs to run a software emulation of an entire hardware system. That would be great for entertainment (arcade and console game emulation), business (emulating a different "work computer" OS), and even design (hardware simulation on the desktop).

This thought is mostly driven by the fact that modern video cards run at only 500MHz and cost $200-500, and yet you can purchase a 1.8GHz Duron for around $40. For $500 you could buy 8 of them, which, with three effective pipelines each at 1.8GHz, would yield somewhere around - let's say conservatively - the equivalent of a 32GHz single-pipeline GPU/CPU. You would still have $180 left over for some high-speed DRAM, interconnect switching hardware, and PCB fabrication.
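Here's the napkin math, fudge factor and all (every number is a guess from the paragraph above, not a measurement):

```python
# Back-of-envelope for the "bunch of cheap Durons as a GPU" idea.
chips, price_each = 8, 40
pipes_per_chip, clock_ghz = 3, 1.8

raw_ghz = chips * pipes_per_chip * clock_ghz    # 43.2 "GHz" of pipes, ideal
usable_ghz = raw_ghz * 0.75                     # fudge for overhead -> ~32
budget_left = 500 - chips * price_each          # $180 for RAM/PCB/interconnect
print(raw_ghz, usable_ghz, budget_left)
```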
 

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
Originally posted by: Insomniak
Originally posted by: Falloutboy525
In my mind it would be in the better interest of Nvidia and ATI to try for a 2GHz 2x2-pipeline card. They would save a ton of transistors and it would be a lot cheaper to build.

Bad idea. Graphics rendering requires MANY threads to be processed in parallel, otherwise it takes forever to render a frame because so many different kinds of effects are going on at once. GPUs need massive throughput, not sky high speeds.

Trust me, the engineers at ATi and NV know what they're doing, k? ;)

If you think about it, that's basically the same strategy Intel has had for their Itanium design - lower clock speeds, but many parallel pipelines/execution units. (Indeed, maybe Intel should convert their Itanium chips into GPUs instead - or have NVIDIA engineers take over their IA-64 program. Hmm.)

Of course, the reason one approach (massively parallel execution) works better than the other (high-speed single execution) is that the task being performed (graphics rendering) is inherently, almost infinitely, parallelizable - you could theoretically build a GPU with a separate pipeline for every output pixel on the screen and render an entire new frame in only a few clocks, essentially the "ultimate in SLI" - whereas execution of logical program control code isn't (due to branching and looping).
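In toy Python terms (shade() here is a hypothetical stand-in for whatever a pixel shader computes - the point is just that no iteration depends on another):

```python
# Embarrassingly parallel: each output pixel depends only on its own
# inputs, so every iteration could run on its own pipeline at once.
def shade(x, y):
    return (x ^ y) & 0xFF  # placeholder math, not a real shader

width, height = 320, 240
frame = [[shade(x, y) for x in range(width)] for y in range(height)]
# A CPU walks this loop serially; a 16-pipe GPU does 16 (x, y) per clock,
# and a pipe-per-pixel design would do the whole frame in a few clocks.
```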

With the trend toward more and more "CPU-like" instruction sets (programmable shaders, etc.) that now include loop/branch constructs, I'm curious how 3D GPU makers will respond, since those features have the potential to disrupt shader execution pipelines, and potentially stall other parts of the chip if not implemented correctly.