Why can't GPU/VPU's get to 1 GHz?

futuristicmonkey · Jun 11, 2004

I was just thinking, if CPUs are already at 3.40 Gigahertz, why are graphics processing units only able to run at like 520 mhz? Is it because you wouldn't be able to get a big enough of a cooler on there? or enough power? or are they just incapable of running at those speeds because of their increased amount of transistors (complexity- and heatwise)?

Why?

VIAN · Jun 11, 2004

Well a CPU just has one pipeline with as many as 31 stages.

A GPU has up to 16 pipelines with hundreds of stages.

Note: GPU has 16 pipelines - that's the same as saying that the chip has 16 similar GPUs in it. CPUs will be coming out with this next year or so.

Diablo6178 · Jun 11, 2004

Mostly it's due to the architecture and the transistor count. They are on the same or lower manufacturing process as cpu's but it's not the primary objective to have fast chips, instead they need alot of bandwidth. Also there are alot of parallel pipelines for executing multiple instructions at once. This gets around alot of the need for speed increases.

alent1234 · Jun 11, 2004

i thought graphics processors were built with RISC in mind and that the architecture was prone to heat.

imported_obsidian · Jun 11, 2004

Transistors:

P4 Northwood - 55 million
P4 Prescott - 125 million
P4EE - 178 million(most for L3 cache)

9800 XT - 107 million
X800 - 160 million
Geforce 6800 - 222 million

VIAN · Jun 11, 2004

Don't forget that:

The P4 Northwood has 512kB of L2 cache.
The P4 Prescott has 1MB of L2 cache.
The P4 Extreme Edition has 512kB of L2 cache and 2MB of L3 cache.

Goi · Jun 12, 2004

GPUs come out really fast, between 6-12 months every iteration. CPUs come out once every few years, and hence the engineers have a lot of time to optimize the die/mask. Because of this they're able to ramp up the clockspeeds a lot more than GPUs. If the GPU engineers took as long to optimize we'd probably still be stuck at a RIVA128 now

otispunkmeyer · Jun 12, 2004

isnt it because cpu is 32/64 bit but gpus are 256bit processors or sumthing?

jiffylube1024 · Jun 12, 2004

Think about it. A GPU has up to 16 parallel pipelines (ie the X800XT and 6800U). That means it process 16 instructions at once, ie if the chip is running at 500 MHz, it's like an 8000 MHz processor running on a single pipeline.

GPU's sacrifice raw clock speed (which is not needed) for ultra heavy paralellization (which is essential). As has been proven time and time again, clockspeed isn't everything; this is never more true than in the GPU field.

VIAN · Jun 12, 2004

yeah, so technically... the X800 XT is a 8GHz single pipeline GPU.

Insomniak · Jun 12, 2004

Originally posted by: VIAN
yeah, so technically... the X800 XT is a 8GHz single pipeline GPU.

Technically, it's a 600-odd Mhz 16 pipeline GPU. Theoretically, an 8Ghz single pipeline GPU would give the same result.

Key word - theoretically.

VIAN · Jun 12, 2004

It was 500MHz * 16. Not odd.

Yes, we must not forget THEORETICALLY.

Dman877 · Jun 12, 2004

CPU's are designed by hand, gpu's are done by computers. That's part of the difference.

Cerb · Jun 12, 2004

CPUs are designed by computers as well, and important parts are modified by humans with the know-how.
You'd need a 3.2GHz single-pipeline single-texturing GPU to keep up with a 412MHz 8-pipeline single-texturing GPU.
Now, do you go through all kinds of hard work like CPU manufacturers do to get there...or go through 80% of it and add 7 more units, and a couple parts to manage them? You do the latter.
The Athlons and Pentium4s have shown that a lesser-clocked beast can still perform very well. But that's with decision-making tasks. 1+1, 10 cycles, next, 10 cycles, OK, now do something else, wait, wait, wait...and you can have other stuff done in those 9 'blank' pipeline stages.
Most of the GPU work could have pipelines hundreds of cycles long (and for all I know, they may), but as long as each stage is filled--like an assembly line--you get an output of pipes * Mhz.
The 9500 Pro and 9600 XTs at similar clock speeds basically prove it.
9500 Pro 8x1 275MHz core, 540MHz RAM.
9600 XT 4x1 500Mhz core, 600MHz RAM.
A 9500 Pro at 275/600 will get the same single and multi-texture results as a 9600 XT at 550/600.
So if you need cost-saving, work on speed. But if you need performance, the extra pipelines give you near perfect scaling.

Falloutboy · Jun 13, 2004

in my mind it would be in better interest of Nvidia and ATI to try for a 2ghz 2x2 pipelined card. they would save a ton of transistors and would be alot cheaper to build

Genx87 · Jun 13, 2004

1. 16 pipelines
2. Uses generic transistors and design. They dont customize it as much as a CPU since the shelf life is about 18 months vs 5 years for a CPU.
3. Highly parallel.

Overall just the shear mass of executing units is what probably keeps them such a low clock.
It isnt that big of a deal when you consider the shear amount of processing power these things have. I bet when the DirectNext cards come out with a full programming model that a bunch of GPUs will be used for mathematical uses. Highly parallel GPU with a very fast local memory.

Cerb · Jun 13, 2004

Originally posted by: Falloutboy525
in my mind it would be in better interest of Nvidia and ATI to try for a 2ghz 2x2 pipelined card. they would save a ton of transistors and would be alot cheaper to build

It already costs insane R&D for each new chip. When a new Direct3D spec is on its way, they might as well be redesigning from the ground up. It might be a smaller chip, but when all is said and done, can they get it to market on time? If not, it's more expensive. Can they get similar yields to their larger parts? If not, it's more expensive.
Lastly, why can't they just tack more pipelines on it at 2GHz and blow the competition away completely?

JeremiahTheGreat · Jun 13, 2004

Well.. i don't think the AGP slot could handle 1/2 KG of copper tacked to the side..

Wolfdog · Jun 13, 2004

There are a quite a few parts that goes into making a cpu/gpu chip. When you look at it, the manufacturing processes that most cpu makers use to fab thier chips is far more advanced than TSMC, or IBM for that matter. Cpu makers build thier chips with performance and clockspeed in mind. While they must keep a somewhat general approach to how they run. Since gpu's on the other hand have a very specific set of things that they do. Complexity is not really holding them back though. GPU producers are generally one or two processes behind the cpu generations. We won't see .09 micron gpu's until late this, early next year. While both Intel and AMD have invested billions into utilizing the bleeding edge processes to fab thier parts this year. When it comes down to it though having a 1ghz gpu will ultimately be limited to its memory interface. Graphics designers know this and will be settling into the long slog here soon. Since they can no longer just add complexity on, since process is not going to carry them for much longer. One only need to look at the current generation x800, and 6800u's. They run very hot and use more power than any desktop cpu. On thier current trend we should see 300 million+ transistor parts next year. So to get back on track gpu makers are definitely looking toward the wider slower rather than narrower faster.

Cerb · Jun 13, 2004

Originally posted by: JeremiahTheGreat
Well.. i don't think the AGP slot could handle 1/2 KG of copper tacked to the side..

That has nothing to do with it. Current GPUs are already using 80% or more of what the P4s can as far as power. And people seem to be doing OK with that much on the video card in the form of aftermarket coolers...

Insomniak · Jun 13, 2004

Originally posted by: Falloutboy525
in my mind it would be in better interest of Nvidia and ATI to try for a 2ghz 2x2 pipelined card. they would save a ton of transistors and would be alot cheaper to build

Bad idea. Graphics rendering requires MANY threads to be processed in parallel, otherwise it takes forever to render a frame because so many different kinds of effects are going on at once. GPUs need massive throughput, not sky high speeds.

Trust me, the engineers at ATi and NV know what they're doing, k?

jiffylube1024 · Jun 13, 2004

Originally posted by: Insomniak

Originally posted by: Falloutboy525
in my mind it would be in better interest of Nvidia and ATI to try for a 2ghz 2x2 pipelined card. they would save a ton of transistors and would be alot cheaper to build

Click to expand...

Bad idea. Graphics rendering requires MANY threads to be processed in parallel, otherwise it takes forever to render a frame because so many different kinds of effects are going on at once. GPUs need massive throughput, not sky high speeds.

Trust me, the engineers at ATi and NV know what they're doing, k?

Damn, beat me to it!

Yep, you can't just say a 2 Ghz 2X2 card would be better, it would be many times worse. Think about the super long pipeline that would be necessary for a 2 GHz GPU. Cache misses and mispredicted branches would be a killer.

You need the GPU to do all that work in parallel - unlike a CPU where it's processing different instructions, there is so much repetition with rendering every single pixel on a screen.

As Insomniak said - trust the engineers, they know what they're doing.

KF · Jun 14, 2004

CPUs work at a multiple of the memory data rate of around 5. It makes sense to do this only because of what is called "locality." Average programs reuse relatively small (under 1 MB) sections of memory repeatedly . Therefore a small local memory (cache) operating at CPU speed is feasible. Without that CPU-speed cache, a 3 GHz CPU would be unable to do much useful at a 3GHz rate. Working at muliple speed is not very efficient in terms of CPU cycles, but it does maximize program speed.

A GPU needs to access 8 MB of data and more to do a single high-resolution frame. Video cards are coming with 128 MB. Therefore the data lacks enough locality for a reasonable sized cache to make operating the GPU at a multiple of the memory data rate useable. (Maybe a somewhat higher rate might be useful as a kind of overkill to make up for stalls and to keep buffers full.) Instead they use very high speed (expensive) and very wide memory to get the data rate very high. GPUs with enough on-chip, full-speed memory have been just too expensive to build, I think. It appears that given the data which GPUs work on, it works out just as well the way they have to do it, without a lot of wasted cycles.

VirtualLarry · Jun 15, 2004

Originally posted by: Insomniak

Originally posted by: VIAN
yeah, so technically... the X800 XT is a 8GHz single pipeline GPU.

Click to expand...

Technically, it's a 600-odd Mhz 16 pipeline GPU. Theoretically, an 8Ghz single pipeline GPU would give the same result.

Key word - theoretically.

Which is why I've often wondered, why someone hasn't built a multi-processor software-based "virtual GPU" card, by slapping a bunch of cheap, modern, high-clock-speed CPUs on a card, running the 3D pipeline in software. It would be infinitely more flexible and upgradable, and when not processing 3D, could be dedicated to other things, like distributed-computing projects, etc. I think that the main issue is interconnect bandwidth (with multiple CPUs), and moreso heat issues.

But with the introduction of things like 1.5Ghz 25W AXP chips, or maybe undervolted 1.8Ghz Durons, this seems a bit more feasable. Indeed, there are now some partially software-implemented GPU solutions, like the XGI Volari and whatnot, except that they use the host CPU, instead of offloading onto dedicated slave CPUs. Btw, most older 3D arcade games, didn't use dedicated GPU ASICs, they actually did have a group of slave multi-processor CPUs running a 3D rendering pipeline in software. Most Sega arcade Model2 and Model3 games were like this, as were some of Namco's System22/23 hardware. It was only the Naomi and System246 hardware that they started to use dedicated rendering GPUs and stuff.

Even better, maybe you could use those slave CPUs to run a software emulation of an entire hardware system. That would be great, both for entertainment (arcade and console game emulation), business (emulate a different "work computer" OS), and even design (hardware simulation on the desktop work).

This thought mostly driven by the fact that modern video cards only run at 500Mhz, and cost $200-500, and yet, you can purchase a 1.8Ghz Duron for around $40. For $500 you could purchase 8 of them, with three effective pipelines each, at 1.8Ghz, would yield somewhere around, lets say conservatively, equivalent to a 32Ghz single-pipeline GPU/CPU. You would still have $180 left over for some high-speed DRAM and interconnect switching hardware and PCB fabrication.

VirtualLarry · Jun 15, 2004

Originally posted by: Insomniak

Originally posted by: Falloutboy525
in my mind it would be in better interest of Nvidia and ATI to try for a 2ghz 2x2 pipelined card. they would save a ton of transistors and would be alot cheaper to build

Click to expand...

Bad idea. Graphics rendering requires MANY threads to be processed in parallel, otherwise it takes forever to render a frame because so many different kinds of effects are going on at once. GPUs need massive throughput, not sky high speeds.

Trust me, the engineers at ATi and NV know what they're doing, k?

If you think about it, that's basically been the same strategy that Intel has had for their Itanium design - lower clock speeds, but many parallel pipelines/execution units. (Indeed, maybe Intel should convert their Itanium chips into GPUs instead - or have NVidia engineers take over their IA-64 program. Hmm.)

Of course, the reason that one approach (massive parallel execution) works better than the other (high-speed single execution), is because the tasks that they are performing (graphics rendering) is basically inherently infinitely-parallelizable (you could theoretically build a GPU with a seperate pipeline for every output pixel on the screen, and generate an entire new frame rendering in only a few clocks - essentially the "ultimate in SLI"), whereas execution of logical program control-code isn't (due to branching and looping).

With the trend towards more and more "CPU like" instruction codes (programmable shaders, etc.) that now include loop/branch constructs, I'm curious how 3D GPU makers will respond, as those features have the potential to disrupt shader execution pipelines, and potentially stall other parts of the chip if not implemented correctly.

Why can't GPU/VPU's get to 1 GHz?

Golden Member

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Banned

Diamond Member

Platinum Member

Elite Member

Diamond Member

Lifer

Elite Member

Senior member

Member

Elite Member

Banned

Diamond Member

Golden Member

No Lifer

No Lifer