PhysX CUDA transition almost ready

biostud

Lifer
Feb 27, 2003
18,251
4,764
136
http://www.tomshardware.com/20...p_and_running_almost_/

While Intel's Nehalem demo had 50,000-60,000 particles and ran at 15-20 fps (without a GPU), the particle demo on a GeForce 9800 card resulted in 300 fps. In the very likely event that Nvidia's next-gen parts (G100: GT100/200) double their shader units, this number could top 600 fps, meaning that Nehalem at 2.53 GHz is lagging 20-40x behind 2006/2007/2008 high-end GPU hardware. However, you can't ignore the fact that Nehalem can run physics at all.
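For a rough idea of the kind of workload being compared, here is a minimal CUDA sketch of a brute-force particle integration step; the struct layout, particle count, and launch configuration are illustrative assumptions, not code from Nvidia's or Intel's actual demos:

// particles.cu -- illustrative sketch only, not the real demo code
#include <cuda_runtime.h>
#include <cstdio>

struct Particle { float3 pos; float3 vel; };

// One thread per particle: apply gravity and integrate one timestep.
__global__ void step(Particle* p, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    p[i].vel.y -= 9.81f * dt;            // gravity
    p[i].pos.x += p[i].vel.x * dt;
    p[i].pos.y += p[i].vel.y * dt;
    p[i].pos.z += p[i].vel.z * dt;
    if (p[i].pos.y < 0.0f) {             // crude ground-plane bounce
        p[i].pos.y = 0.0f;
        p[i].vel.y = -0.5f * p[i].vel.y;
    }
}

int main()
{
    const int n = 60000;                 // roughly the particle count quoted for the Nehalem demo
    Particle* d_p;
    cudaMalloc(&d_p, n * sizeof(Particle));
    cudaMemset(d_p, 0, n * sizeof(Particle));
    for (int frame = 0; frame < 1000; ++frame)
        step<<<(n + 255) / 256, 256>>>(d_p, n, 1.0f / 60.0f);
    cudaDeviceSynchronize();
    printf("done\n");
    cudaFree(d_p);
    return 0;
}

Each particle update is independent, which is why this kind of demo scales with shader count and makes the CPU-vs-GPU fps gap so lopsided.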
 

bunnyfubbles

Lifer
Sep 3, 2001
12,248
3
0
But Nehalem is the CPU...it's Larrabee and its potential performance that we'd want to know about...

And 300 or 600 fps is worthless; however, 250,000 or 500,000 particles @ 60 fps would be nice :p
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
now we know why nvidia has been pushing such unrealistically high shader counts without anything to keep them busy...
 

hooflung

Golden Member
Dec 31, 2004
1,190
1
0
Originally posted by: bunnyfubbles
But Nehalem is the CPU...it's Larrabee and its potential performance that we'd want to know about...

And 300 or 600 fps is worthless; however, 250,000 or 500,000 particles @ 60 fps would be nice :p

The point is that the Nehalem tested was an 8-core CPU, so if things were built with that in mind, the GPU the system would have could run all the standard rasterization and rendering while spare cores were used for physics. Nvidia now has a valid case for SLI motherboards. Why buy an 8-core CPU that will probably cost 300 bucks when you could buy an 8800GT for half the price, pop it into your SLI motherboard, and have SLI and/or physics as viable options for established and future games?
 

aka1nas

Diamond Member
Aug 30, 2001
4,335
1
0
Once it's available, I'll dig out some of my PhysX-enabled games and compare performance between the PPU and the CUDA implementation.
 

Piuc2020

Golden Member
Nov 4, 2005
1,716
0
0
I wonder if SLI is going to be a necessity or if the GPU can cope with graphics and physics at the same time. Doubtful, but otherwise it could be troublesome, since nvidia chipsets suck and they are certainly not in a position to force it upon their users.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Originally posted by: Piuc2020
I wonder if SLI is going to be a necessity or if the GPU can cope with graphics and physics at the same time. Doubtful, but otherwise it could be troublesome, since nvidia chipsets suck and they are certainly not in a position to force it upon their users.
The hardware and software are capable of running CUDA and graphics threads at the same time (or rather, interleaved via scheduling), so that's not the problem. If there's a problem, it's going to be the hit to rendering performance that results from weighing down the GPU with CUDA work on top of graphics.
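As a rough sketch of what "at the same time via scheduling" looks like from the application side, here is a minimal CUDA example: the kernel launch returns immediately and the driver interleaves the queued CUDA work with whatever graphics work is pending. The kernel, stream name, and sizes are illustrative assumptions, not PhysX code:

// async_physics.cu -- illustrative sketch of asynchronous scheduling
#include <cuda_runtime.h>
#include <cstdio>

__global__ void physicsKernel(float* state, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) state[i] += dt;   // stand-in for a real physics update
}

int main()
{
    const int n = 1 << 20;
    float* d_state;
    cudaMalloc(&d_state, n * sizeof(float));
    cudaMemset(d_state, 0, n * sizeof(float));

    cudaStream_t physicsStream;
    cudaStreamCreate(&physicsStream);

    // The launch is asynchronous: control returns to the CPU immediately,
    // so the application can keep issuing Direct3D/OpenGL draw calls while
    // the driver schedules the CUDA work and the graphics work on one GPU.
    physicsKernel<<<(n + 255) / 256, 256, 0, physicsStream>>>(d_state, n, 1.0f / 60.0f);

    // ... render the previous frame here ...

    // Block only when the physics results are actually needed.
    cudaStreamSynchronize(physicsStream);

    cudaStreamDestroy(physicsStream);
    cudaFree(d_state);
    printf("frame done\n");
    return 0;
}

The GPU still has to execute both workloads, which is exactly the rendering-performance hit described above; scheduling only hides the latency from the CPU, it doesn't create extra shader cycles.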
 

superbooga

Senior member
Jun 16, 2001
333
0
0
Just think of it as another graphical option. You turn it up, and your framerates will drop, just like any other graphical option.
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Originally posted by: superbooga
Just think of it as another graphical option. You turn it up, and your framerates will drop, just like any other graphical option.

according to ATi - when they first announced they were also doing physics on their x800-x1900[?] - they claimed that the "extra unused" cycles would manage it

did they ever demonstrate it, and was the performance penalty more limited than with nVidia's method?

 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Originally posted by: apoppin
Originally posted by: superbooga
Just think of it as another graphical option. You turn it up, and your framerates will drop, just like any other graphical option.

according to ATi - when they first announced they were also doing physics on their x800-x1900[?] - they claimed that the "extra unused" cycles would manage it

did they ever demonstrate it, and was the performance penalty more limited than with nVidia's method?
They never demonstrated simultaneous operation in the first place AFAIK. The only time you'd have spare cycles would be if you were bottlenecked by something other than the GPU (i.e. the CPU), whereas the whole idea of GPU physics is to get such work off of the CPU, in effect reducing the CPU bottleneck. Using spare cycles would be counterproductive, in other words.

GPU-accelerated <almost anything> has been a joke so far. Commercial GPGPU use has matured, but there's been no real use of the GPU in the consumer space (video decoding is a GPU feature, but it's handled with dedicated hardware, not as a GPGPU program as first intended). HavokFX, Quantum Effects, video encode acceleration, etc. have all been duds.
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Originally posted by: ViRGE
Originally posted by: apoppin
Originally posted by: superbooga
Just think of it as another graphical option. You turn it up, and your framerates will drop, just like any other graphical option.

according to ATi - when they first announced they were also doing physics on their x800-x1900[?] - they claimed that the "extra unused" cycles would manage it

did they ever demonstrate it, and was the performance penalty more limited than with nVidia's method?
They never demonstrated simultaneous operation in the first place AFAIK. The only time you'd have spare cycles would be if you were bottlenecked by something other than the GPU (i.e. the CPU), whereas the whole idea of GPU physics is to get such work off of the CPU, in effect reducing the CPU bottleneck. Using spare cycles would be counterproductive, in other words.

GPU-accelerated <almost anything> has been a joke so far. Commercial GPGPU use has matured, but there's been no real use of the GPU in the consumer space (video decoding is a GPU feature, but it's handled with dedicated hardware, not as a GPGPU program as first intended). HavokFX, Quantum Effects, video encode acceleration, etc. have all been duds.

thanks, i really have not been keeping up .. and what ATI said back then didn't seem to make any sense to me either


However, to question what you said [that i bolded] - isn't this the "fault" of the programs? If they are written properly with the GPU in mind, they should be far better than now - and nearly equal to what dedicated HW can do - at least that appears to be the 'theory': the "extra" cycles not being extra, but dedicated to processing a program written specifically to take advantage of the parallelism inherent in GPU architecture?
- and that would also mean they are really working on Fusion - this is all mostly "theoretical" and brand new, right?


 

aka1nas

Diamond Member
Aug 30, 2001
4,335
1
0
To the best of my understanding, the main issue with having the GPU process these sorts of tasks is that data has to be constantly sent back to the rest of the system. 3d graphics are mostly a one-way operation, in that data is sent to the GPU for processing and then dumped directly to the screen.

Communications in the other direction have always been relatively computationally expensive due to the latency of the bus connecting the GPU to the system, especially if the GPU has to then wait for the system to deliver more data to it. PCI-E probably helps in that regard, but latency is likely still a killer in some situations.

A lot of the performance-related aspects of DX10 are intended to do things like send multiple operations to the GPU as a single batch, so there is less communication over the bus and the GPU can stay fully utilized instead of waiting around for the rest of the system. First-order physics on the GPU seems to run counter to that, as you would need to be communicating data back and forth constantly.
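As a rough illustration of the batching idea, here is a minimal CUDA sketch that reads simulation results back in one asynchronous transfer per frame using pinned host memory, instead of many small synchronous per-object copies. Buffer names and sizes are illustrative assumptions, not from any real engine:

// readback.cu -- illustrative sketch of batched, latency-hiding readback
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    const int nBodies = 10000;
    const size_t bytes = nBodies * 3 * sizeof(float);   // xyz per body

    float* d_positions;
    cudaMalloc(&d_positions, bytes);
    cudaMemset(d_positions, 0, bytes);

    // Pinned (page-locked) host memory lets the copy run as a single DMA
    // transfer instead of being staged through pageable memory.
    float* h_positions;
    cudaMallocHost(&h_positions, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // One big asynchronous copy per frame: the bus latency is paid once,
    // rather than once per object as a naive per-body readback would pay it.
    cudaMemcpyAsync(h_positions, d_positions, bytes,
                    cudaMemcpyDeviceToHost, stream);

    // ... CPU-side game logic that doesn't need the new positions yet ...

    cudaStreamSynchronize(stream);   // wait only when the data is required
    printf("first y = %f\n", h_positions[1]);

    cudaStreamDestroy(stream);
    cudaFreeHost(h_positions);
    cudaFree(d_positions);
    return 0;
}

The transfer still costs bandwidth and latency; batching just keeps the GPU and CPU from stalling on each other for every individual object.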
 

jjzelinski

Diamond Member
Aug 23, 2004
3,750
0
0
Why couldn't the focus shift from maximum frame rate to minimum frame rate? If ATI or NV were to design their PPU integration with maintaining minimum frame rates in mind, then they in fact *would* have excess cycles to devote to physics processing. In fact, once that relatively simple step is taken, everything should feel rather familiar as far as handling image quality is concerned, meaning we would simply continue to play with shader levels, AA, AF, etc. in order to achieve the highest possible minimum frame rates while utilizing PPU capabilities.

Furthermore, I pointed out in a similar thread only a week or so back that game designers can easily take advantage of physics processing by tailoring their "screenplays" around the trade-off between typical eye candy and more physics-driven content. Not every "scene" may require beefy physics, and not every scene may require high levels of eye candy; the trade-off could be quite convincing if "scripted" properly.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Originally posted by: apoppin
Originally posted by: ViRGE
Originally posted by: apoppin
Originally posted by: superbooga
Just think of it as another graphical option. You turn it up, and your framerates will drop, just like any other graphical option.

according to ATi - when they first announced they were also doing physics on their x800-x1900[?] - they claimed that the "extra unused" cycles would manage it

did they ever demonstrate it, and was the performance penalty more limited than with nVidia's method?
They never demonstrated simultaneous operation in the first place AFAIK. The only time you'd have spare cycles would be if you were bottlenecked by something other than the GPU (i.e. the CPU), whereas the whole idea of GPU physics is to get such work off of the CPU, in effect reducing the CPU bottleneck. Using spare cycles would be counterproductive, in other words.

GPU-accelerated <almost anything> has been a joke so far. Commercial GPGPU use has matured, but there's been no real use of the GPU in the consumer space (video decoding is a GPU feature, but it's handled with dedicated hardware, not as a GPGPU program as first intended). HavokFX, Quantum Effects, video encode acceleration, etc. have all been duds.

thanks, i really have not been keeping up .. and what ATI said back then didn't seem to make any sense to me either


However, to question what you said [that i bolded] - isn't this the "fault" of the programs? If they are written properly with the GPU in mind, they should be far better than now - and nearly equal to what dedicated HW can do - at least that appears to be the 'theory': the "extra" cycles not being extra, but dedicated to processing a program written specifically to take advantage of the parallelism inherent in GPU architecture?
- and that would also mean they are really working on Fusion - this is all mostly "theoretical" and brand new, right?
I'm not quite sure what you mean, apoppin.
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Originally posted by: ViRGE
I'm not quite sure what you mean, apoppin.


OK, Let me begin again .. too many nested quotes

IF a program was written specially to take advantage of the GPU's incredible parallelism - and even perhaps to take advantage of the fact that 3d graphics are mostly a one-way operation - completely UNLIKE Programs written for the CPU calculations; THEN perhaps we would see real use of the GPU in the consumer space - almost as well as what is currently handled with dedicated hardware. And *later* when FUSION is complete and the one-way GPU communicates with the CPU much more effectively - with memory fully integrated and the MB changed from what we know it - it will finally come into its own. AMD's Vision

i am talking about the "future"
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Originally posted by: apoppin
Originally posted by: ViRGE
I'm not quite sure what you mean, apoppin.


OK, Let me begin again .. too many nested quotes

IF a program was written specially to take advantage of the GPU's incredible parallelism - and even perhaps to take advantage of the fact that 3d graphics are mostly a one-way operation - completely UNLIKE Programs written for the CPU calculations; THEN perhaps we would see real use of the GPU in the consumer space - almost as well as what is currently handled with dedicated hardware. And *later* when FUSION is complete and the one-way GPU communicates with the CPU much more effectively - with memory fully integrated and the MB changed from what we know it - it will finally come into its own. AMD's Vision

i am talking about the "future"
This implies that there's a lack of ability with current hardware, which I would argue is not the case. Certainly accessing data from a GPU isn't as fast as, say, local memory, but judging by what people have been doing with CUDA and Brook+ it doesn't seem like a serious problem. The limiting factor is not the hardware, IMHO, it's the development software. Until a year ago (even less on AMD's side) there was no practical way to write GPGPU software; you had to write it as Cg/HLSL shader code, which was picky about hardware and required an in-depth knowledge of how graphical rendering works. The realization of GPU acceleration is going to come from the fact that we finally have real high-level language development tools that can be easily integrated into current development practices.

For what it's worth, I don't see Fusion changing any of this.
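As a rough illustration of what those high-level tools look like, here is a minimal CUDA version of a simple y = a*x + y computation; before CUDA/Brook+, the same arithmetic would have had to be expressed as a pixel shader rendering into a floating-point texture. The names and sizes here are illustrative, not from any shipping toolkit:

// saxpy.cu -- illustrative sketch of plain C-like GPGPU code
#include <cuda_runtime.h>
#include <cstdio>

__global__ void saxpy(int n, float a, const float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));   // fill with real data in practice
    cudaMemset(d_y, 0, n * sizeof(float));

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
    cudaDeviceSynchronize();

    printf("done\n");
    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}

No render targets, no texture coordinates, no full-screen quads; an ordinary C programmer can read it, which is the accessibility argument being made above.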
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Originally posted by: ViRGE
Originally posted by: apoppin
Originally posted by: ViRGE
I'm not quite sure what you mean, apoppin.


OK, Let me begin again .. too many nested quotes

IF a program was written specially to take advantage of the GPU's incredible parallelism - and even perhaps to take advantage of the fact that 3d graphics are mostly a one-way operation - completely UNLIKE Programs written for the CPU calculations; THEN perhaps we would see real use of the GPU in the consumer space - almost as well as what is currently handled with dedicated hardware. And *later* when FUSION is complete and the one-way GPU communicates with the CPU much more effectively - with memory fully integrated and the MB changed from what we know it - it will finally come into its own. AMD's Vision

i am talking about the "future"
This implies that there's a lack of ability with current hardware, which I would argue is not the case. Certainly accessing data from a GPU isn't as fast as, say, local memory, but judging by what people have been doing with CUDA and Brook+ it doesn't seem like a serious problem. The limiting factor is not the hardware, IMHO, it's the development software. Until a year ago (even less on AMD's side) there was no practical way to write GPGPU software; you had to write it as Cg/HLSL shader code, which was picky about hardware and required an in-depth knowledge of how graphical rendering works. The realization of GPU acceleration is going to come from the fact that we finally have real high-level language development tools that can be easily integrated into current development practices.

For what it's worth, I don't see Fusion changing any of this.

i must be stupid tonight

:confused:

IF a program was written specially to take advantage of the GPU's incredible parallelism .. THEN perhaps we would see real use of the GPU in the consumer space - almost as well as what is currently handled with dedicated hardware

that is what i thought i said [^^this is me, my quote^^ - without the confusing stuff that i should have put in another sentence]



i *expect* to see CUDA take off this year!! - for sure
.. what about AMD? .. that IS the question
- *my question*