Poll/discussion: do you think multi-GPU/multi-die is the future?

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
Because we're getting a bit off topic in the GT200/4xxx threads (sorry about that, Virge :eek:), I thought it was time to start this thread.

In this thread we can discuss the advantages, pitfalls, opinions and facts, and whether you think multi-GPU is the way of the future.

I'll start with my thoughts on the issue:

The first major point made against single monolithic cores is the fact that die size is increasing despite process improvements. While this is certainly true, my response is that although the GT200 will be a big die @ 65 nm, it should be much more reasonable once it shrinks to 55 nm, and it'll likely be mid-range by the time it hits 45 nm. Also, we haven't even touched alternative manufacturing techniques like laser and organic.

I think the major problem at the moment is that GPU vendors are pursuing performance at the expense of everything else, much like Intel during the P4 days, and it backfired on them once they started hitting thermal limits. Even throwing multiple cores at the problem couldn't dig them out of that hole because their single cores were already at the limit. It was only once they started designing chips with a focus on doing as much work per watt as possible (i.e. the Core architecture) that things got much more reasonable.

Moving on from this, slapping multiple smaller cores together may seem like a good idea from yield and thermal perspectives, but it's absolutely horrid from a performance and compatibility perspective. Adding multiple cores doesn't guarantee any kind of speedup over a single core. None whatsoever. You need to get the driver in there and start optimizing on a per-application basis to get scaling working and to work around the individual quirks of today's complex games.

Sure, an NV40 chip @ 65 nm would be tiny, and a card with eight such cores might be quite easy to build compared to a GT200, but will it be a performance match for a GT200? I doubt it, not unless you get reasonably perfect 8-way GPU scaling in most games, which of course isn't going to happen anytime soon. And this is ignoring the other issues associated with multi-GPU drivers such as input lag, micro-stutter, and general driver incompatibilities.

Also, single cores form the basis of multi-core, so if you're hitting limits on them your multi-GPU won't be viable either. Case in point: the R600 before the 55 nm shrink. A 3870 X2 simply wasn't possible with that core before that.

You still need more performance in single cores, so if you hit a wall you can't really make multi-core faster either, unless you keep adding more and more cores and rely on the driver to provide n-way scaling. Given it's quite easy to see 2-way scaling fail, what chance does n-way have?
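To put rough numbers on that, here's a quick back-of-the-envelope sketch (Python) using Amdahl's law, treating the driver's scaling efficiency as the parallel fraction; the p values are made up purely for illustration:

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction
# of frame time the driver actually manages to parallelize across GPUs
# and n is the GPU count. The p values below are made up for illustration.
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.95, 0.80, 0.60):
    row = ", ".join(f"{n} GPUs: {speedup(p, n):.2f}x" for n in (2, 4, 8))
    print(f"p = {p:.2f} -> {row}")

# Even at p = 0.80, eight GPUs only manage about 3.3x; most of the extra
# cores end up waiting on the serial/synchronization portion.
```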

The final example given is the Voodoo 5. While that is indeed a shining example of multi-GPU "just working", it's also not really relevant to today's world. Firstly, the card used SFR (up to 128 scanlines per GPU), and while this is more compatible than AFR, it won't scale as well because, among other things, vertex performance isn't increased with this method. Both ATi and nVidia are currently pursuing AFR whenever possible.
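For anyone unfamiliar with the two modes, here's a toy sketch of how they divide work; the 128-scanline band size matches the Voodoo 5, but the rest is simplified pseudologic, not how any real driver is written:

```python
# SFR (split frame rendering): each GPU renders interleaved horizontal
# bands of the same frame. Vertex/geometry work is still duplicated on
# every GPU, which is why SFR doesn't scale vertex performance.
def sfr_assign(frame_height, num_gpus, band=128):
    bands = [(y, min(y + band, frame_height))
             for y in range(0, frame_height, band)]
    return {gpu: [b for i, b in enumerate(bands) if i % num_gpus == gpu]
            for gpu in range(num_gpus)}

# AFR (alternate frame rendering): whole frames round-robin across GPUs.
# It scales all per-frame work, but frames can complete unevenly
# (micro-stutter) and inter-frame dependencies break compatibility.
def afr_assign(frame_number, num_gpus):
    return frame_number % num_gpus

print(sfr_assign(480, 2))                    # scanline bands per GPU
print([afr_assign(f, 2) for f in range(6)])  # frames 0-5 alternate GPUs
```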

The other point is that games, APIs and drivers are much more advanced now. We have multiple render targets, complex shaders and the like, while the Voodoo 5 didn't even have T&L, so it was basically just a simple rasterizer that ran very simple games. To get proper scaling today a lot more work is required from the driver.

Edit: added third poll option.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
I think it needs a third option... "neither, it will remain a niche market".

I think multi-GPU is here to stay, but it is not going to replace single GPUs. Most people will use a single GPU and some will use multi-GPU.

I think the die size increased not despite process improvement, but BECAUSE of process improvement. nVidia felt that they could make a profitable die with a monstrous bus width, a massive increase in SPs, etc. They could have chosen to make a small bus and use GDDR4 or 5 to compensate, but they are actually compensating for cheap, slow GDDR3 by using more die space. This could only come about with the DEFECT REDUCTION type of process improvement.

I completely agree with you that performance/watt is the way to go. And on most other points too...

I think the middle road will triumph. Both will remain, each with its own role.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
I've added the third option, but unfortunately it seems to have messed with the poll votes. :(
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
It actually reset them all to 0. When I voted they were all at 0 (except 1 for the option I voted for).

So the poll is more accurate now.


Nothing is stopping nVidia from making a G200 variant with GDDR5 + a 256-bit bus: identical bandwidth with a significantly smaller die and lower power consumption. It will just cost more to reach the same level of performance, but save you oodles on electricity.
Of course, I harbor a strong suspicion that we will soon see a G290 with a 512-bit bus + GDDR5.
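Quick back-of-the-envelope on the "identical bandwidth" point (the data rates are rough guesses, just for illustration):

```python
# Peak memory bandwidth (GB/s) = (bus width in bits / 8) * data rate (GT/s)
def bandwidth_gbs(bus_bits, data_rate_gts):
    return bus_bits / 8 * data_rate_gts

print(bandwidth_gbs(512, 2.2))   # ~141 GB/s: 512-bit GDDR3 @ ~2.2 GT/s
print(bandwidth_gbs(256, 4.4))   # ~141 GB/s: 256-bit GDDR5 @ ~4.4 GT/s
```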
 

Extelleron

Diamond Member
Dec 26, 2005
3,127
0
71
To me I don't so much see "multi-GPU" as the future; as I said, I see "multi-die GPU" as the future.

I'm no EE and I don't have a technical background, but I don't see why a future GPU can't look something like this. You have a "main" die which contains the basic GPU parts such as UVD. Perhaps the ROPs could be located here as well. ATI's Xenos GPU found in the X360 does this; the ROPs are located on a separate die from the rest of the GPU, along with the 10MB of eDRAM.

Connected to this main GPU, via HyperTransport links (or some high-speed connection, much faster than the FSB used in Intel's C2Q), you could attach multiple die containing shading processors and texture units. The number of these die would vary depending on what GPU you were creating: high end, 4; semi high end (i.e. GTX 260), 3; midrange, 2. For the low end, everything could be integrated onto one die, as there would not be a problem with die size and this would reduce complexity (no need for separate die / connections between die).
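Purely to illustrate the idea, here's a toy model of what that product stack could look like (all names and per-die SP counts hypothetical):

```python
from dataclasses import dataclass

# Toy model of the proposed modular GPU: one "main" die (ROPs, UVD,
# display logic) plus a variable number of identical shader/texture
# dies on high-speed links. Names and per-die SP counts are made up.
@dataclass
class ModularGPU:
    segment: str
    shader_dies: int
    sps_per_die: int = 64  # hypothetical cluster size

    @property
    def total_sps(self) -> int:
        return self.shader_dies * self.sps_per_die

lineup = [ModularGPU("high end", 4),
          ModularGPU("semi high end", 3),
          ModularGPU("midrange", 2),
          ModularGPU("low end (single die)", 1)]
for gpu in lineup:
    print(f"{gpu.segment}: {gpu.shader_dies} shader die(s), {gpu.total_sps} SPs")
```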

Maybe this is not possible, but it makes some sense in my mind.

So I wish you would alter the question a bit. It makes it sound like multi-GPU, as in Crossfire / SLI, is the future. That's not what I believe, or at least I hope not. Software scaling for GPUs is not ideal.

 

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
To me I don't so much see "multi-GPU" as the future; as I said, I see "multi-die GPU" as the future
In terms of software support these two things are no different (i.e. you still need Crossfire/SLI scaling to take advantage of them).

What you're describing doesn't sound like multi-die at all but sounds like a traditional single GPU with the design changed a bit.

This bit for example:

Connected to this main GPU, via HyperTransport links (or some high-speed connection, much faster than the FSB used in Intel's C2Q), you could attach multiple die containing shading processors and texture units. The number of these die would vary depending on what GPU you were creating.
...sounds exactly like SP clusters on a single GPU, and we're already adding or removing them on product lines.

Remember that single GPUs are already "multi-core" (e.g. all of those SPs linked internally). The problem with multi-core/multi-die is that you have several standalone discrete units (as opposed to different parts of a single discrete unit), and they need to somehow be synchronized to provide a performance gain.

Nevertheless, I have changed the poll per your request.
 

Extelleron

Diamond Member
Dec 26, 2005
3,127
0
71
Originally posted by: BFG10K
To me I don't so much see "multi-GPU" as the future; as I said, I see "multi-die GPU" as the future
In terms of software support these two things are no different (i.e. you still need Crossfire/SLI scaling to take advantage of them).

What you're describing doesn't sound like multi-die at all but sounds like a traditional single GPU with the design changed a bit.

Remember that single GPUs are already "multi-core" (e.g. all of those SPs linked internally). The problem with multi-core/multi-die is that you have several standalone discrete units (as opposed to different parts of a single discrete unit), and they need to somehow be synchronized to provide a performance gain.

Nevertheless, I have changed the poll per your request.

As I said I'm no expert, but I don't think there is any reason you would need software for this. It would be a pure hardware connection AFAIK. Maybe it is not possible, I don't know, but it makes sense in my head.

The idea is that it is exactly like a single GPU, just that there are multiple die, allowing yields to improve greatly and the most state-of-the-art manufacturing process to be used.

Intel doesn't need any software to synchronize the two die of Kentsfield/Yorkfield; they work together exactly as if they were a single die containing four cores. Why can't GPUs be the same? I don't see any reason why not.

There are certainly other options as well. As I've mentioned before, a single-die GPU might make sense for the next two years, because in less than two years TSMC should have 32nm process technology ready for use. Considering we are still on 55nm / 65nm, 32nm would likely allow for great performance with a single die.

The other thing that could be a possibility (not in the near future, but farther along) is 3D die stacking, which has yet to be implemented in a consumer product AFAIK but is coming along. With 3D stacking, you can stack multiple die on top of each other instead of making the die wider. Not only does this make the horizontal die size more manageable, it increases interconnect performance and, at least with CPUs, allows for shorter pipeline depth, which increases IPC. For example, in a P4 Prescott the number of pipeline stages could be cut down considerably from 31, because a lot of those stages would no longer be needed thanks to reduced metal runs and the elimination of wire delay (reducing latency). I know G80's shading units are deeply pipelined, so this might help there.

Anyway, this is a bit off topic, but clearly there are more options than Crossfire / SLI software scaling, and more options than what I have been talking about as well.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Did you happen to read the post where I came up with the multi-die idea, or did you come up with it in parallel?

I see no reason for other people not to come to the same conclusion... And if we both could come up with it, so could engineers at nVidia and AMD :p (or maybe they haven't; they could be less creative than us, or maybe it just doesn't work, or maybe they are working on it).

I figured, you know how Intel packages 2x Wolfdale die on a single quad-core CPU? That's the same thing that needs to happen, but with the option to further "stretch" them. Make a special link so that you could put a whole cluster of parts on a separate chip on the other end of the card, and connect them directly to the main die.

I would also suggest that you could implement improvements faster... Upgrade the video decoder die and you can release a slightly improved model; upgrade the efficiency of the SP units and, again, just swap them out, and so on.
This would mean REALLY quick product cycles. And the quicker the product cycle, the more you sell.
 

bunnyfubbles

Lifer
Sep 3, 2001
12,248
3
0
Yes, simply because it should be almost perfectly scalable and without the headaches we know of today if/when ray tracing takes over.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
As long as enough people think like you, how could it take over? People just wouldn't buy it...

Anyways, on the exploded-die idea... Did you know that back in the day, the HDCP circuitry was on its own mini die? You could buy a version of your video card with or without HDCP, and the one with it came with a TINY second die on the same chip in an MCM package.

But that is probably not the same thing, since the HDCP circuitry is probably independent of the GPU.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: BFG10K
To me I don't so much see "multi-GPU" as the future; as I said, I see "multi-die GPU" as the future
In terms of software support these two things are no different (i.e. you still need Crossfire/SLI scaling to take advantage of them).

What you're describing doesn't sound like multi-die at all but sounds like a traditional single GPU with the design changed a bit.

This bit for example:

Connected to this main GPU, via HyperTransport links (or some high-speed connection, much faster than the FSB used in Intel's C2Q), you could attach multiple die containing shading processors and texture units. The number of these die would vary depending on what GPU you were creating.
...sounds exactly like SP clusters on a single GPU, and we're already adding or removing them on product lines.

Remember that single GPUs are already "multi-core" (e.g. all of those SPs linked internally). The problem with multi-core/multi-die is that you have several standalone discrete units (as opposed to different parts of a single discrete unit), and they need to somehow be synchronized to provide a performance gain.

Nevertheless, I have changed the poll per your request.


Hi!
I would like to ask what tech timeframe we're considering here, as long as I'm connecting the dots.

I didn't vote because I believe it's 1/2. In the short range I see Extelleron's point of view as the correct one, along with this:
http://www.siggraph.org/s2008/...=papers&id=34%20%20%20

Later on, I agree with this post.
 

biostud

Lifer
Feb 27, 2003
19,868
6,974
136
I think that the HD 4xxx line is likely to be the way of the future. 95% of users will have a single-GPU setup, and the top 5% of gamers will have multi-GPU. It will be very interesting to see how much CF has been improved in the 4870 X2. So, niche market for me.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
Intel doesn't need any software to synchronize the two die of Kentsfield/Yorkfield; they work together exactly as if they were a single die containing four cores. Why can't GPUs be the same? I don't see any reason why not.
Yes, but in order to take advantage of multiple cores, the software that runs on them has to be multi-threaded and not have too many inter-thread dependencies. If it isn't, it'll run no faster on a quad core than it does on a single core. I think this is the point you're missing.

Now, if Intel shipped a single core four times faster than previous single cores, all CPU-limited software would automatically run four times faster regardless of whether it was threaded properly or not. That is the point I'm making when I state single core is more robust than multi-core.

With multi-GPUs the same applies, but instead of the games it's the driver that does most of the heavy lifting to enable multiple GPUs to scale to higher levels of performance than a single core. Without proper scaling you've basically got one working core with the rest acting as paperweights, so you gain absolutely nothing from having them there.
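To make the paperweight point concrete, here's a minimal CPU-side analogy (Python, with arbitrary busywork as the workload): the serial version ignores the extra cores entirely, and only the explicitly parallelized version benefits from them.

```python
import time
from multiprocessing import Pool, cpu_count

def burn(n):
    # Arbitrary CPU-bound busywork standing in for a GPU-limited workload.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    chunks = [2_000_000] * 8

    t0 = time.perf_counter()
    serial = [burn(n) for n in chunks]   # one core does all the work,
    t1 = time.perf_counter()             # the others sit idle (paperweights)

    with Pool(cpu_count()) as pool:      # the same work split across cores
        parallel = pool.map(burn, chunks)
    t2 = time.perf_counter()

    assert serial == parallel
    print(f"serial: {t1 - t0:.2f}s, parallel: {t2 - t1:.2f}s")
```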
 

biostud

Lifer
Feb 27, 2003
19,868
6,974
136
Is there even perfect scaling between otherwise similar chips with different numbers of SPs?
 

Extelleron

Diamond Member
Dec 26, 2005
3,127
0
71
Originally posted by: BFG10K
Intel doesn't need any software to synchronize the two die of Kentsfield/Yorkfield; they work together exactly as if they were a single die containing four cores. Why can't GPUs be the same? I don't see any reason why not.
Yes, but in order to take advantage of multiple cores, the software that runs on them has to be multi-threaded and not have too many inter-thread dependencies. If it isn't, it'll run no faster on a quad core than it does on a single core. I think this is the point you're missing.

Now, if Intel shipped a single core four times faster than previous single cores, all CPU-limited software would automatically run four times faster regardless of whether it was threaded properly or not. That is the point I'm making when I state single core is more robust than multi-core.

With multi-GPUs the same applies, but instead of the games it's the driver that does most of the heavy lifting to enable multiple GPUs to scale to higher levels of performance than a single core. Without proper scaling you've basically got one working core with the rest acting as paperweights, so you gain absolutely nothing from having them there.

That's different though; you are talking about multi-core CPUs, which do need to be programmed for in software.

AMD's quad-core Phenom has all four cores on one die, and it needs software to support multi-core. Intel has four cores spread across two connected die, and they need software to support multiple cores as well. It has nothing to do with the fact that there are two die; that is just the nature of multi-core processing.

 

Aberforth

Golden Member
Oct 12, 2006
1,707
1
0
Originally posted by: Extelleron
Originally posted by: BFG10K
Intel doesn't need any software to synchronize the two die of Kentsfield/Yorkfield; they work together exactly as if they were a single die containing four cores. Why can't GPUs be the same? I don't see any reason why not.
Yes, but in order to take advantage of multiple cores, the software that runs on them has to be multi-threaded and not have too many inter-thread dependencies. If it isn't, it'll run no faster on a quad core than it does on a single core. I think this is the point you're missing.

Now, if Intel shipped a single core four times faster than previous single cores, all CPU-limited software would automatically run four times faster regardless of whether it was threaded properly or not. That is the point I'm making when I state single core is more robust than multi-core.

With multi-GPUs the same applies, but instead of the games it's the driver that does most of the heavy lifting to enable multiple GPUs to scale to higher levels of performance than a single core. Without proper scaling you've basically got one working core with the rest acting as paperweights, so you gain absolutely nothing from having them there.

That's different though; you are talking about multi-core CPUs, which do need to be programmed for in software.

AMD's quad-core Phenom has all four cores on one die, and it needs software to support multi-core. Intel has four cores spread across two connected die, and they need software to support multiple cores as well. It has nothing to do with the fact that there are two die; that is just the nature of multi-core processing.

I think you are not quite clear on how threading works. Any multi-core CPU will act as if it contains independent processors, and the task gets split into a set of parallel threads. In fact, when you are on Windows it makes use of multiple cores whether the application is optimized for multi-threading or not. Windows is a virtual layer for interacting with hardware: its memory addressing is virtual, so the multi-threading model is also virtual (not directly connected to the hardware layer). In Windows an application can create as many threads as it wants, and the processor decides how those threads are scheduled over multiple cores. For example, a quad core can run 4 hardware threads, but an application can use something like 100 threads, and the processing stack is scheduled over the 4 cores. You can override this default behavior by initiating a dedicated thread, which is very useful for special applications that make use of compression, video/audio decoding, physics in games, etc.
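A quick sketch of that oversubscription point (Python purely as illustration; its threads map onto OS threads):

```python
import os
import threading
import time

def worker(i):
    # Simulate a thread that mostly waits (I/O, sync, etc.); the OS
    # scheduler time-slices all of these onto the available cores.
    time.sleep(0.1)

# 100 application threads, scheduled onto however many hardware
# threads actually exist (os.cpu_count(), e.g. 4 on a quad core).
threads = [threading.Thread(target=worker, args=(i,)) for i in range(100)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{len(threads)} threads on {os.cpu_count()} cores, "
      f"done in {time.perf_counter() - start:.2f}s")
```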

---

Also, regarding GPUs: Vista introduces a new driver model (which NV has had a hard time learning). This model introduces video memory paging, so when GPU memory runs low it can page data out to system RAM or the HDD. To counter the slowdown they make use of thread scheduling between shader programs; however, DX9-class apps cannot make use of it. Also, resources are shared across many processes, so DX10-class GPUs are heavily dependent on multi-core. These things are not done by the GPU drivers.
 

Extelleron

Diamond Member
Dec 26, 2005
3,127
0
71
Originally posted by: Aberforth
Originally posted by: Extelleron
Originally posted by: BFG10K
Intel doesn't need any software to synchronize the two die of Kentsfield/Yorkfield; they work together exactly as if they were a single die containing four cores. Why can't GPUs be the same? I don't see any reason why not.
Yes, but in order to take advantage of multiple cores, the software that runs on them has to be multi-threaded and not have too many inter-thread dependencies. If it isn't, it'll run no faster on a quad core than it does on a single core. I think this is the point you're missing.

Now, if Intel shipped a single core four times faster than previous single cores, all CPU-limited software would automatically run four times faster regardless of whether it was threaded properly or not. That is the point I'm making when I state single core is more robust than multi-core.

With multi-GPUs the same applies, but instead of the games it's the driver that does most of the heavy lifting to enable multiple GPUs to scale to higher levels of performance than a single core. Without proper scaling you've basically got one working core with the rest acting as paperweights, so you gain absolutely nothing from having them there.

That's different though; you are talking about multi-core CPUs, which do need to be programmed for in software.

AMD's quad-core Phenom has all four cores on one die, and it needs software to support multi-core. Intel has four cores spread across two connected die, and they need software to support multiple cores as well. It has nothing to do with the fact that there are two die; that is just the nature of multi-core processing.

I think you are not quite clear on how threading works. Any multi-core CPU will act as if it contains independent processors, and the task gets split into a set of parallel threads. In fact, when you are on Windows it makes use of multiple cores whether the application is optimized for multi-threading or not. Windows is a virtual layer for interacting with hardware: its memory addressing is virtual, so the multi-threading model is also virtual (not directly connected to the hardware layer). In Windows an application can create as many threads as it wants, and the processor decides how those threads are scheduled over multiple cores. For example, a quad core can run 4 hardware threads, but an application can use something like 100 threads, and the processing stack is scheduled over the 4 cores. You can override this default behavior by initiating a dedicated thread, which is very useful for special applications that make use of compression, video/audio decoding, physics in games, etc.

---

Also, regarding GPUs: Vista introduces a new driver model (which NV has had a hard time learning). This model introduces video memory paging, so when GPU memory runs low it can page data out to system RAM or the HDD. To counter the slowdown they make use of thread scheduling between shader programs; however, DX9-class apps cannot make use of it. Also, resources are shared across many processes, so DX10-class GPUs are heavily dependent on multi-core. These things are not done by the GPU drivers.

And this changes what I said how? Applications cannot take advantage of multi-core processors if they are not coded to spawn multiple threads. An application not coded for multi-core will not utilize multi-core. Other apps, like Cinebench, will spawn as many threads as your CPU supports. So if you have a dual-core CPU, 2 threads. Quad-core, 4 threads. Quad-core w/ 2-way SMT, 8 threads.

Not exactly sure what you are trying to say here.
 

Aberforth

Golden Member
Oct 12, 2006
1,707
1
0
No, the processor splits the task whether the application is multi-threaded or not. A single-core processor can also manage multiple threads; this is done by a process called thread scheduling.
 

Extelleron

Diamond Member
Dec 26, 2005
3,127
0
71
Originally posted by: Aberforth
No, the processor splits the task whether the application is multi-threaded or not. A single-core processor can also manage multiple threads; this is done by a process called thread scheduling.

And this is why we see a 0% performance improvement from multi-core processors in single-threaded apps...

 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
I could see them going to a multi-core package if the die gets so big that manufacturing costs climb too high.
 

Aberforth

Golden Member
Oct 12, 2006
1,707
1
0
Originally posted by: Extelleron
Originally posted by: Aberforth
No, the processor splits the task whether the application is multi-threaded or not. A single-core processor can also manage multiple threads; this is done by a process called thread scheduling.

And this is why we see a 0% performance improvement from multi-core processors in single-threaded apps...

This is the reason why OSes like Windows exist; this is why multi-tasking exists. Threads bring the benefit of I/O interruption and task switching; they're not only for performance. So a single-threaded app can make use of multi-core, but the task is treated as generic just like any other process; special operations (compression, decoding, etc.) require overriding the default workings of the CPU.
 

Lorne

Senior member
Feb 5, 2001
873
1
76
It will always be a niche market. Every design in the past always has been; some take off and some don't, but you will see that the ones that don't may really have influenced future products (ahem, P-Pro). Even SLI wasn't created by nVidia or by 3DFX, for those that don't know, but 3DFX got the patent and nVidia bought 3DFX.
I had a set of LB gfx cards (that's local bus for you newbs) that did SLI 10 years before 3DFX was around. They were used for building-design rendering, not for games (who the F would need SLI to play Pong?).

As certain limitations ($$, R&D and physics) hold back advancements, they try alternatives; both AMD and Intel have done that too many times to mention.
Multi-core on-die with an IMC is the way to go at this time due to the $$-to-physical-limitation ratio.
There are other limitations due to copyright ownership (e.g. Seagate and SSDs).

BTW, all hardware is software-driven to a point, hence the BIOS POST. Yes, even your GFX card has to boot its own BIOS and present itself to the system it's hooked into. Drivers enhance the ability to communicate with the OS; there's no way around the driver situation unless you go console, or a Mac II with no add-ons.



 

aka1nas

Diamond Member
Aug 30, 2001
4,335
1
0
Originally posted by: BFG10K

What you're describing doesn't sound like multi-die at all but sounds like a traditional single GPU with the design changed a bit.

This bit for example:

Connected to this main GPU, via HyperTransport links (or some high-speed connection, much faster than the FSB used in Intel's C2Q), you could attach multiple die containing shading processors and texture units. The number of these die would vary depending on what GPU you were creating.
...sounds exactly like SP clusters on a single GPU, and we're already adding or removing them on product lines.

Remember that single GPUs are already "multi-core" (e.g. all of those SPs linked internally). The problem with multi-core/multi-die is that you have several standalone discrete units (as opposed to different parts of a single discrete unit), and they need to somehow be synchronized to provide a performance gain.

This would be the "right" way for Nvidia et al. to do scaling, as you wouldn't have to deal with syncing up data between cards. Yields would also be higher since they would be fabbing smaller dies, and they wouldn't have to disable SPs on some models; they would just toss the few defective chips.

IIRC, there was some article here last year about Nvidia planning to do exactly this.