Multi-core GPUs? Why not?

Locut0s

Lifer
Nov 28, 2001
22,205
44
91
With all the buzz in the CPU world these days being about more cores rather than more MHz, it's interesting to see that the latest graphics cards have been all about more MHz and more features. It seems to me that the graphics card world is where more cores would make the most sense, given the almost infinite scalability of rendering. Instead of making each new generation of GPU more and more complex than the last, why not work instead on making these GPUs work together better? Then your next-generation card could just be 4 or 5 of the current-generation GPUs on the same die or card. Think of it: if they can get the scaling and drivers down pat, they could churn out blazingly fast cards just by adding more cores to the card. And as long as they are manufacturing the same generation of chip, and doing so at HUGE volumes, the cost per chip should go down too.

Think this is something we will start to see soon?

Edit: I agree the thermal envelope problem is a huge one to overcome with this idea.

Moved from Off Topic
moderator allisolm
 

Mojoed

Diamond Member
Jul 20, 2004
4,473
1
81
We will be seeing more of this in the future. In fact, AMD will be releasing a dual-GPU card early next year based on the new 38xx series.

I think the big reason you don't see widespread use of multiple cores on a card is power usage.

An 8800 GTX in SLI uses 373 watts under load. Even the new 8800 GT, which is built on a smaller process, uses 327 watts under load.

Now imagine the power requirements for a multi-GPU/multi-CPU system. :Q

Even if it weren't for the power issues, there are still driver issues with SLI/Crossfire. Some games only see a 30-40% benefit from the additional GPU.
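As a back-of-the-envelope illustration of why the gain from a second GPU falls well short of 2x (an Amdahl's-law sketch; the parallel fractions below are made-up examples, not measurements):

# Rough Amdahl's-law view of multi-GPU scaling.
# parallel_fraction = share of frame time that actually splits across GPUs (hypothetical values).
def speedup(n_gpus, parallel_fraction):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_gpus)

for frac in (0.45, 0.55, 0.65):
    print(f"{frac:.0%} parallel -> 2 GPUs give {speedup(2, frac):.2f}x")
# A game whose frame time is only ~45-55% parallelizable lands in roughly that 30-40% range.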
 

Sphexi

Diamond Member
Feb 22, 2005
7,280
0
0
Realistically, they're already doing something similar to that. They don't really need to add cores; they just add more stream processors, and those can handle any of the different functions required. It's like having 30+ simple cores on a chip, each of which can be programmed to do something different. It works really well too, apparently. I somehow don't think GPUs can be improved much more than they already are.
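As a toy illustration of that "many simple programmable units" idea (purely a sketch; the function and sizes are made up, and this is nothing like a real shader compiler or driver), the stream-processing model is essentially one small program mapped across many data elements at once:

from multiprocessing.dummy import Pool  # a thread pool standing in for many simple units

# A tiny "shader": the same small program run independently on every element.
def shade(pixel):
    r, g, b = pixel
    return (min(255, int(r * 1.2)), g, b)  # e.g. brighten the red channel

pixels = [(100, 50, 25)] * 1024     # a pretend framebuffer
with Pool(32) as pool:              # "32 stream processors"
    shaded = pool.map(shade, pixels)
print(shaded[0])                    # (120, 50, 25)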
 

Mojoed

Diamond Member
Jul 20, 2004
4,473
1
81
Originally posted by: Sphexi
Realistically, they're already doing something similar to that. They don't really need to add cores; they just add more stream processors, and those can handle any of the different functions required. It's like having 30+ simple cores on a chip, each of which can be programmed to do something different. It works really well too, apparently. I somehow don't think GPUs can be improved much more than they already are.

"640K ought to be enough for anybody." -- Bill Gates

Your comment above reminded me of this quote. Personally, I think 3D is still in its infancy. We still have a long way to go for real-time radiosity/ray tracing, etc.
 

Sphexi

Diamond Member
Feb 22, 2005
7,280
0
0
Originally posted by: Mojoed
Originally posted by: Sphexi
Realistically, they're already doing something similar to that. They don't really need to add cores; they just add more stream processors, and those can handle any of the different functions required. It's like having 30+ simple cores on a chip, each of which can be programmed to do something different. It works really well too, apparently. I somehow don't think GPUs can be improved much more than they already are.

"640K ought to be enough for anybody." -- Bill Gates

Your comment above reminded me of this quote. Personally, I think 3D is still in its infancy. We still have a long way to go for real-time radiosity/ray tracing, etc.

Well, maybe I should've worded that better. I didn't mean to say that GPUs are as good as they're going to get, just that the improvements being done to them are already so advanced and cutting edge that I don't think we can ask for much more.
 

jagec

Lifer
Apr 30, 2004
24,442
6
81
Well, multi-core means more heat, more power consumption, and a much more expensive die (and higher reject rate), so companies will try to put it off if they can. I think the reason they haven't done it yet is simply that they haven't hit the MHz wall yet. Intel and AMD didn't start going to multiple cores until they were having trouble pushing their clock speeds any further. It's easier and cheaper to tweak the existing GPU architecture a bit, and run it faster and harder with the same basic silicon, than to develop a new architecture to deal with the problems of multiple cores (although I imagine they'll be able to peek over Intel and AMD's shoulder a bit in this regard). Until it gets seriously expensive and unproductive to up the MHz, I feel they'll probably continue in this vein.

In the meantime, enthusiasts who really want the ultimate performance are buying TWO TO FOUR high-margin graphics cards with cheap-to-produce processors in parallel, so why try to sell them one low-margin card with multiple cores and an expensive processor?

But it will happen eventually, I'm sure.
 

Locut0s

Lifer
Nov 28, 2001
22,205
44
91
Originally posted by: Sphexi
Realistically, they're already doing something similar to that. They don't really need to add cores; they just add more stream processors, and those can handle any of the different functions required. It's like having 30+ simple cores on a chip, each of which can be programmed to do something different. It works really well too, apparently. I somehow don't think GPUs can be improved much more than they already are.

Well, yes, the 'stream processors' AMD uses are sort of like simplified cores. However, I was thinking of fully functional (or nearly so) cores that could handle most if not all of the tasks in the rendering pipeline. Think of breaking a scene down into many sections and then intelligently assigning each core a section of the scene to render. Or you could have each core rendering a different frame while another, more specialized core handles physics calculations and the like.
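To make that split-frame idea concrete, here's a minimal sketch (illustrative only; the core counts, tile size, and physics-core assignment are hypothetical) of dealing screen tiles out to rendering cores while reserving another core for physics:

# Hypothetical work split: tile the screen, deal tiles out to render cores round-robin,
# and keep one extra core aside for physics, as described above.
def make_tiles(width, height, tile):
    return [(x, y, tile, tile)
            for y in range(0, height, tile)
            for x in range(0, width, tile)]

def assign(tiles, n_cores):
    buckets = [[] for _ in range(n_cores)]
    for i, t in enumerate(tiles):
        buckets[i % n_cores].append(t)  # real drivers would load-balance by tile cost
    return buckets

tiles = make_tiles(1600, 1200, tile=200)   # 8 x 6 = 48 tiles
render_work = assign(tiles, n_cores=4)     # 12 tiles per rendering core
physics_core = 5                           # a fifth core dedicated to physics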
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Originally posted by: Locut0s
Originally posted by: Sphexi
Realistically, they're already doing something similar to that. They don't really need to add cores; they just add more stream processors, and those can handle any of the different functions required. It's like having 30+ simple cores on a chip, each of which can be programmed to do something different. It works really well too, apparently. I somehow don't think GPUs can be improved much more than they already are.

Well, yes, the 'stream processors' AMD uses are sort of like simplified cores. However, I was thinking of fully functional (or nearly so) cores that could handle most if not all of the tasks in the rendering pipeline. Think of breaking a scene down into many sections and then intelligently assigning each core a section of the scene to render. Or you could have each core rendering a different frame while another, more specialized core handles physics calculations and the like.
You can already do all of that with a modern GPU. The former (breaking down a scene) is exactly how things are rendered today; in fact, it's exactly how things have been rendered since the olden days of the TNT, when Nvidia released the first multi-pipeline pre-GPU "GPU". The latter can also be done today, although there are certain limitations due to how GPUs handle threading. It's mostly not done because most games are GPU-limited, and taking power away from the GPU's graphics work only makes that worse. The threading issues, in turn, are being resolved with further improvements in GPU design; Microsoft has already mapped out several generations of Windows Display Driver Model (WDDM) requirements that progressively increase the number of threads, and the granularity of the threads, that can be run. WDDM 3.0 in particular (which will be DX11) calls for something very close to CPU-like threading.
 

NanoStuff

Banned
Mar 23, 2006
2,981
1
0
Originally posted by: jagec
Well, multi-core means more heat
no

Originally posted by: jagec
more power consumption
no!

Originally posted by: jagec
much more expensive die
wtf? shit no.

Originally posted by: jagec
and higher reject rate

Higher reject rate, but a higher success rate at the same time, since you're making more cores. Lower cost per reject because of the simplicity of each core. Net sum: +1.

Originally posted by: jagec
It's easier and cheaper to tweak the existing GPU architecture a bit, and run it faster and harder with the same basic silicon, than to develop a new architecture to deal with the problems of multiple cores

You might be surprised, but no. There's nothing easy about die shrinks. The architecture is already available.
 

Throckmorton

Lifer
Aug 23, 2007
16,829
3
0
I have a 7950GX2, and in Tabula Rasa I get 10-30 fps if I turn shadows on... People with single cards don't have that problem.
 

AMCRambler

Diamond Member
Jan 23, 2001
7,715
31
91
Originally posted by: NanoStuff
Originally posted by: jagec
Well, multi-core means more heat
no

Originally posted by: jagec
more power consumption
no!

Originally posted by: jagec
much more expensive die
wtf? shit no.

Originally posted by: jagec
and higher reject rate

Higher reject rate, but a higher success rate at the same time, since you're making more cores. Lower cost per reject because of the simplicity of each core. Net sum: +1.

Originally posted by: jagec
It's easier and cheaper to tweak the existing GPU architecture a bit, and run it faster and harder with the same basic silicon, than to develop a new architecture to deal with the problems of multiple cores

You might be surprised, but no. There's nothing easy about die shrinks. The architecture is already available.

Die shrinks really depend on the technology used to print the integrated circuits on the silicon. My brother is a technician for the machines that do this, so as far as die shrinks go, they can only go as far as current technology in this area allows. The way they measure it is the smallest distance between the pathways in the circuit; the tighter they are packed, the more you can fit on a die. The whole process is pretty interesting. I don't pretend to understand all of it, but sometimes we get talking about it when we hang out.
The way it works is they use light to expose a layer of photoresist on the silicon wafers. They've got some powerful lasers that shine through a lens onto the wafers to print the circuit. The stage the wafer sits on actually moves as the laser turns off, positioning the wafer for the next chip to be printed on it.
What they are doing now to achieve smaller distances between the pathways is using what they call immersion lenses. A bubble of water is placed between the lens and the wafer; the lens is actually immersed in the water. The reason is the different index of refraction of water compared to air: the wavelength of light is shorter passing through water than through air. It boggles my mind if you think about what actually has to happen mechanically for this to be possible. The water needs to be held in place so it doesn't just run off the top of the wafer, and I don't think they just plunk the wafers into a dish of water either. From what my brother was telling me, they use some sort of vacuum to hold the bubble of water in place between the lens and the wafer. Blows my mind. Anyway, I could "ramble" on all morning. Wikipedia's got some pretty good entries on this stuff if you want to check it out.

Steppers - Step and Scan & Lithography
Lens Immersion Lithography
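To put a rough number on the refraction point above (an illustrative calculation, not from the post): today's steppers use 193 nm ArF laser light, and water's refractive index at that wavelength is roughly 1.44, so the effective wavelength inside the immersion fluid is noticeably shorter.

# Effective wavelength inside the immersion fluid: lambda / n.
wavelength_nm = 193.0   # ArF excimer laser
n_water = 1.44          # approximate refractive index of water at 193 nm
print(wavelength_nm / n_water)  # ~134 nm, versus 193 nm in air (n ~ 1.0)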
 

jagec

Lifer
Apr 30, 2004
24,442
6
81
We are only talking about multiple cores here...not multiple cores combined with die shrinks.
Originally posted by: NanoStuff
Originally posted by: jagec
Well, multi-core means more heat
no
You've almost doubled the number of transistors. The extra transistors don't create any heat then?
Originally posted by: NanoStuff
Originally posted by: jagec
more power consumption
no!
Or consume any power, right.
Originally posted by: NanoStuff
Originally posted by: jagec
much more expensive die
wtf? shit no.
Twice as much silicon area (or more) per completed "chip". Highly refined silicon is free now?

Originally posted by: NanoStuff
Originally posted by: jagec
and higher reject rate

Higher reject rate, but a higher success rate at the same time, since you're making more cores. Lower cost per reject because of the simplicity of each core. Net sum: +1.
Whaa?
Take the following example:
10,000 single-core GPUs on a wafer, 50 irreparable defects, randomly spaced. At most you'll lose 50 GPUs, so you get 9,950 GPUs out of a wafer.

5,000 double-core GPUs on the same size wafer, 50 irreparable defects (we're still buying from the same supplier and using the same equipment to etch), randomly spaced. Since we're not Sony and won't sell a GPU that has a non-working core, again we'll lose at most 50 GPUs. Now there's a greater chance that a single GPU will have multiple defects (greater area per GPU), so let's say we lose 49 GPUs. So we get 4,951 GPUs out of a wafer.

#1 success rate=99.5%
#2 success rate=99.0%
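The same comparison as a quick Monte Carlo sketch (using only the numbers above; real yield models are more sophisticated):

import random

def surviving_dies(n_dies, n_defects, trials=200):
    """Average count of dies left when each defect lands on a random die and kills it."""
    total = 0
    for _ in range(trials):
        killed = {random.randrange(n_dies) for _ in range(n_defects)}
        total += n_dies - len(killed)
    return total / trials

print(surviving_dies(10000, 50))  # ~9950 single-core GPUs survive
print(surviving_dies(5000, 50))   # ~4950 double-core GPUs survive (a few more double hits)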

 

biostud

Lifer
Feb 27, 2003
19,952
7,049
136
I would think shared memory bandwidth would be a problem for a multicore GPU.
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
Originally posted by: Locut0s
With all the buzz in the CPU world these days being about more cores rather than more MHz, it's interesting to see that the latest graphics cards have been all about more MHz and more features. It seems to me that the graphics card world is where more cores would make the most sense, given the almost infinite scalability of rendering. Instead of making each new generation of GPU more and more complex than the last, why not work instead on making these GPUs work together better? Then your next-generation card could just be 4 or 5 of the current-generation GPUs on the same die or card. Think of it: if they can get the scaling and drivers down pat, they could churn out blazingly fast cards just by adding more cores to the card. And as long as they are manufacturing the same generation of chip, and doing so at HUGE volumes, the cost per chip should go down too.

Think this is something we will start to see soon?

What do you think the 128/320 stream processors on top-end ATI/Nvidia graphics cards are?
They have been multi-core for years. The latest ones aren't even that specific to graphics, which is why you can use your graphics card to do CPU-intensive things like Folding@home.

The 7950GX2 and the new equivalents being brought out by ATI/Nvidia at the beginning of next year aren't multi-core. They are two complete graphics cards in one slot - just SLI/Crossfire without the need for two slots on your motherboard.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: jagec
We are only talking about multiple cores here...not multiple cores combined with die shrinks.
Originally posted by: NanoStuff
Originally posted by: jagec
Well, multi-core means more heat
no
You've almost doubled the number of transistors. The extra transistors don't create any heat then?
Originally posted by: NanoStuff
Originally posted by: jagec
more power consumption
no!
Or consume any power, right.
Originally posted by: NanoStuff
Originally posted by: jagec
much more expensive die
wtf? shit no.
Twice as much silicon area (or more) per completed "chip". Highly refined silicon is free now?

Originally posted by: NanoStuff
Originally posted by: jagec
and higher reject rate

Higher reject rate, but a higher success rate at the same time, since you're making more cores. Lower cost per reject because of the simplicity of each core. Net sum: +1.
Whaa?
Take the following example:
10,000 single-core GPUs on a wafer, 50 irreparable defects, randomly spaced. At most you'll lose 50 GPUs, so you get 9,950 GPUs out of a wafer.

5,000 double-core GPUs on the same size wafer, 50 irreparable defects (we're still buying from the same supplier and using the same equipment to etch), randomly spaced. Since we're not Sony and won't sell a GPU that has a non-working core, again we'll lose at most 50 GPUs. Now there's a greater chance that a single GPU will have multiple defects (greater area per GPU), so let's say we lose 49 GPUs. So we get 4,951 GPUs out of a wafer.

#1 success rate=99.5%
#2 success rate=99.0%

Sony sells defective CPUs now? Doesn't surprise me one bit; Sony's motto should be "we hate you".


Anyway, multi-GPU cards are good... much, much better than a multi-card solution. Even more so than a multi-core CPU is better than a multi-socket board solution.

Because you can have video cards from any company you want work in the same single PCIe slot, but you must buy a specific multi-slot board for a specific company's multi-card crap.

And it will have BETTER bandwidth than having two cards connected over a bridge. And PCIe 2.0 is already WAY more than cards realistically need AT THE MOMENT (of course it will be maxed out eventually... but right now bandwidth is plentiful, largely because they NEED all that extra headroom for multi-card solutions).
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: Sphexi
Originally posted by: Mojoed
Originally posted by: Sphexi
Realistically, they're already doing something similar to that. They don't really need to add cores; they just add more stream processors, and those can handle any of the different functions required. It's like having 30+ simple cores on a chip, each of which can be programmed to do something different. It works really well too, apparently. I somehow don't think GPUs can be improved much more than they already are.

"640K ought to be enough for anybody." -- Bill Gates

Your comment above reminded me of this quote. Personally, I think 3D is still in its infancy. We still have a long way to go for real-time radiosity/ray tracing, etc.

Well, maybe I should've worded that better. I didn't mean to say that GPUs are as good as they're going to get, just that the improvements being done to them are already so advanced and cutting edge that I don't think we can ask for much more.

EDIT: It seems you edited the original post, and I was getting on his case for berating you for something inappropriately (that, or I misread your original post... meh).

And anyway, you make an excellent point. An "additional core" makes sense in a CPU because you have 30+ processes all fighting for the attention of the ONE CPU, which contains multiple ALUs (arithmetic logic units, aka calculators), etc. So having multiple cores improves multitasking efficiency...

For a video card you don't have that; if anything it is the opposite. You are making only ONE image: you have to take many, many sources of information and trim and combine them into one image. That's perfect work for a single "core" (aka architecture) with many secondary processors (aka stream processors and the like). That's why an average CPU core has 8 ALUs, while the new AMD card supports 320 stream processors, 16 ROPs, and 16 TSUs, and Nvidia's is 128 SPs, 64 (I think) ROPs, and 16 TSUs... (is it even called a TSU? whatever)

The point stands that it is most efficient NOT to simply duplicate the same "core" but rather to add "components" to that core. A core is not a single calculation-churning machine but a complex architecture of controllers, calculators, memory, manipulators, and special-task hardware.
 

NanoStuff

Banned
Mar 23, 2006
2,981
1
0
Originally posted by: jagec
Whaa?
Take the following example:
10,000 single-core GPUs on a wafer, 50 irreparable defects, randomly spaced. At most you'll lose 50 GPUs, so you get 9,950 GPUs out of a wafer.

5,000 double-core GPUs on the same size wafer, 50 irreparable defects (we're still buying from the same supplier and using the same equipment to etch), randomly spaced. Since we're not Sony and won't sell a GPU that has a non-working core, again we'll lose at most 50 GPUs. Now there's a greater chance that a single GPU will have multiple defects (greater area per GPU), so let's say we lose 49 GPUs. So we get 4,951 GPUs out of a wafer.

#1 success rate=99.5%
#2 success rate=99.0%
First off, you're measuring failure rate as a function of the number of cores; defect probability is usually a function of the total number of transistors. Also, you suggest that two cores would necessarily be grouped into a single die, which, as SLI shows, does not have to be the case.

Let's just simplify here.

1000x 1 Billion transistor die. 10% failure rate, 900 working cores.
> 900 Billion working transistors.

4000x 250 Million transistor die. 2.5% failure rate, 3900 working cores.
> 975 Billion working transistors.

Quarter complexity, quarter failure rate, lower sum of transistors lost per core. Everyone wins.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Um... you just told him "you are wrong" and then paraphrased him. He himself said that smaller dies mean a lower failure rate and then proceeded to explain and calculate why; you just calculated why without explaining how you got those numbers... which are the EXACT SAME AS HIS.
 

40sTheme

Golden Member
Sep 24, 2006
1,607
0
0
Aren't GPUs ultra-parallel/threaded already? Adding cores would probably cause more problems than it would solve.
 

aka1nas

Diamond Member
Aug 30, 2001
4,335
1
0
Originally posted by: 40sTheme
Aren't GPUs ultra-parallel/threaded already? Adding cores would probably cause more problems than it would solve.

Yes, for the most part you would just be duplicating supporting logic such as memory controllers, when what you usually really want to do is just add more shaders and texture units.
 

Snooper

Senior member
Oct 10, 1999
465
1
76
Originally posted by: NanoStuff
Originally posted by: jagec
and higher reject rate

Higher reject rate, but a higher success rate at the same time, since you're making more cores. Lower cost per reject because of the simplicity of each core. Net sum: +1.

Not quite. The complexity of the core really doesn't impact things THAT much (especially in the GPU world, as they are fairly simple to begin with). When it comes to wafers, die size is king. That, and a little thing called "critical defect density".

In general, in a given process that you have up and running, you will (on average) get X critical defects per wafer (critical meaning they kill a die for either functional or performance reasons). Say you have 100 critical defects per wafer. If you have small die sizes so that you make 1000 die per wafer, you will USUALLY get around 900 good die per wafer with 100 die being "killed" by a defect.

But if you have a large die that only gives you 200 die per wafer, then you will USUALLY only get 100 good die per wafer!

Needless to say, that is quite a difference! Driving down critical defects and die size is pretty much the holy grail in the semiconductor world.
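Snooper's numbers track the simplest textbook yield model, where yield falls off exponentially with the expected defects per die (a Poisson approximation; the figures below are just a sanity check on the example above, not measured data):

import math

defects_per_wafer = 100.0

def expected_good(dies_per_wafer):
    defects_per_die = defects_per_wafer / dies_per_wafer
    return dies_per_wafer * math.exp(-defects_per_die)  # Poisson yield: e^(-defects per die)

print(expected_good(1000))  # ~905 good die, close to the "around 900" above
print(expected_good(200))   # ~121 good die, the same ballpark as "only get 100"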

 

jagec

Lifer
Apr 30, 2004
24,442
6
81
Originally posted by: taltamir
Sony sells defective cpus now? doesn't surprise me one bit, sony's motto should be "we hate you"

Article.
The Cell processor is so complex that IBM even accepts chips that have only four out of the eight cores working. Not all cores end up functional, says Reeves. In regards to why the yields are so low, Reeves says "[D]efects become a bigger problem the bigger the chip is. With chips that are one-by-one and silicon germanium, we can get yields of 95 percent. With a chip like the Cell processor, you're lucky to get 10 or 20 percent. If you put logic redundancy on it, you can double that." According to Reeves, Sony will be using Cell processors whether they have all cores functional or not. Reeves says that the PlayStation 3 requires at least seven of the eight cores operational.

According to Reeves, IBM is still debating whether or not to discard the processors that have only six or less cores operational. Because of the design, the processors are still operational and can be used for various applications. IBM says that it will reserve the top chips for applications such as medical imaging and defense applications.

Originally posted by: NanoStuff
First off, you're measuring failure rate as a function of the number of cores; defect probability is usually a function of the total number of transistors. Also, you suggest that two cores would necessarily be grouped into a single die, which, as SLI shows, does not have to be the case.

Let's just simplify here.

1000x 1 Billion transistor die. 10% failure rate, 900 working cores.
> 900 Billion working transistors.

4000x 250 Million transistor die. 2.5% failure rate, 3900 working cores.
> 975 Billion working transistors.

Quarter complexity, quarter failure rate, lower sum of transistors lost per core. Everyone wins.

I think the OP was referring to using multi-core GPUs on a single card, as contrasted with the current scheme of using single-core GPUs and then either speeding them up or adding an extra GPU on its own graphics card (SLI, Crossfire). Or the multiple-GPU-on-a-single-card technique, which I don't think anyone uses. Either way, putting multiple cores close together at full clock speed on a single die allows for MUCH faster communication between them, for a number of reasons... basically why your L1 cache is faster than L2, which is faster than system memory, even though all three are just arrays of silicon transistors.

As for your numbers, I don't understand what you're getting at.

1 trillion transistors per wafer, 10% failure rate = 900 billion "working" transistors. Omitting the possibility of multiple defects on a single core (making this a pessimistic estimate), that's 900 "working" cores in the complex single-core case.

Now it appears that you're postulating a 4-core system, with each core being a quarter as complex as the single-core system, to create a GPU that in very rough terms is close to the original in speed. My scenario doubled up the single-core chip to create the multi-core... in other words, we were getting a lower yield of a faster GPU. I think this was part of the disagreement.

I don't know why you're postulating that the failure rate is somehow lower now... there ought to be just as many critical defects per wafer, regardless of whether we are using it to create a vast quantity of simple PNP transistors or making a mythical wafer-sized die for some imaginary computer. The difference is that a critical defect can be ANYWHERE on the wafer-sized die to render it inoperable, whereas we'd have a whole bunch of transistors that were "missed" by the defect and are therefore usable.

Now, if there were a way to create a whole bunch of very simple cores and then link them together as if they had been created on a single die, with no performance hit, you'd get a much higher yield from creating simpler cores. But this isn't how it works... AMD, Intel, and IBM all create multiple cores on a single die, so a single defective core will render the whole dual-core chip inoperable (or at the very least second-tier goods, not suitable for the enthusiast market). Thus, in your scheme of using 4 cores of 1/4 complexity, we should see EXACTLY the same reject rate as with the single core of normal complexity.
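Under that same any-defect-kills-the-die assumption, partitioning a fixed die area into more, simpler cores really does leave the yield unchanged; a tiny check of that claim (hypothetical defect density, and ignoring the inter-core glue logic discussed in the next paragraph):

import math

defect_density = 0.5  # hypothetical defects per unit of die area
die_area = 1.0

# Poisson: probability a region of area A contains zero defects is exp(-density * A).
single_core_yield = math.exp(-defect_density * die_area)
# Four cores of 1/4 the area on one die; the die is rejected if ANY core is hit.
quad_core_yield = math.exp(-defect_density * die_area / 4) ** 4

print(single_core_yield, quad_core_yield)  # identical: the total area at risk is the same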

What I do not know is whether the addition of transistors necessary for intercommunication between the cores is greater than or less than the reduction of transistors due to any shared resources between the cores.