PhysX and multi-core support

Lonbjerg · Jan 26, 2010

Dribble said:
Richard Huddy is both right and wrong.

For the software based physics effects which is what the game devs basically use it runs on as many threads as the game devs like - which being as it's probably a console port isn't very many.

For the hardware based physics effects which are pretty well all nvidia sponsored (and I suspect written) it seems to only use one thread in software mode. That's because no one has bothered to code the software fallback for the hardware mode in a multi-threaded way.

Is this wrong? We personally I don't think so - if it's nvidia's work that is providing the hardware path (via TWIMTBP) then why would nvidia make it work multithreaded? They don't sell quad core cpu's, they sell nvidia graphics cards - and that's why they offer the hardware physx path - to sell nvidia gpu's not AMD cpu's - nvidia isn't a charity.

However it doesn't give a fair comparison (for example say in batman) of what a quad core cpu could really manage is fully utilised for those effects. If you want that comparison just use 3d mark I suppose - they coded it to maximise physics for both cpu and gpu path's.

If you want games with hardware physics to fully utilise multi-threading you'll need to ask the game devs to add them not nvidia. However the bottom line seems to be the game dev's don't care and will just give use straight console ports and no more - it's only nvidia or ati getting involved that seems to give the PC version anything other then higher resolutions and textures.

The problem by running massive parallel physcis on the CPU is that no only does the CPU have a SIMD "handicap" (and the parallel performance dosn't increase linear with more CPU cores, but actually have diminishing returns) but the rest of the game "stalls"...while waiting for the physics calculations, as seen in these 2 examples:
http://physxinfo.com/news/1727/dark-void-benchmark-and-physx-patch-available/

http://www.tomshardware.com/reviews/batman-arkham-asylum,2465-10.html

The CPU physics becomes a bottleneck, increasing the bottleneck the more physics is caculated...and threads waiting for CPU-time dosn't show on the CPU load graph.
(the CPU really isn't very happy about massive parallel SIMD caculations)

We are back to the old analogy:
Would you run your graphics on the CPU...or the GPU?

evolucion8 · Jan 26, 2010

What you post would make sense if current x86 processors were In Order, current CPU's are Out of Order processors and things like scalar instructions will run nicely with another type of calculation like math calculation. Plus, CPU's are notoriously better at collision detection thanks to their excellent branchy code prediction which current GPU's aren't good enough due to their extreme parallelism.

Seero · Jan 26, 2010

evolucion8 said:
What you post would make sense if current x86 processors were In Order, current CPU's are Out of Order processors and things like scalar instructions will run nicely with another type of calculation like math calculation. Plus, CPU's are notoriously better at collision detection thanks to their excellent branchy code prediction which current GPU's aren't good enough due to their extreme parallelism.

Huh?
3 years ago
Are you happier seeing more load on CPU than better FPS? Occupying all cores doesn't necessarily mean better performance, but the other way around. If all cores are busy doing physics, what is going to handle the game code and others?

BTW, in case you don't know, GPU compute matrices calculations far better than CPU.

evolucion8 · Jan 26, 2010

Seero said:
Huh?
3 years ago
Are you happier seeing more load on CPU than better FPS? Occupying all cores doesn't necessarily mean better performance, but the other way around. If all cores are busy doing physics, what is going to handle the game code and others?

BTW, in case you don't know, GPU compute matrices calculations far better than CPU.

Well, that's what I see currently, PhysX put more load in the CPU and yet, it runs like crap without an AGEIA or nVidia GPU.

And for the all cores being busy, welcome to the world of multi core, Xbox 360 and PS3 has been doing that for ages!! Handling PhysX calculation in PhysX supported games and handling the code and other stuff!! Software is not catching up with hardware yet.

Seero · Jan 26, 2010

evolucion8 said:
Well, that's what I see currently, PhysX put more load in the CPU and yet, it runs like crap without an AGEIA or nVidia GPU.

And for the all cores being busy, welcome to the world of multi core, Xbox 360 and PS3 has been doing that for ages!! Handling PhysX calculation in PhysX supported games and handling the code and other stuff!! Software is not catching up with hardware yet.

It was bad back then, and it is still bad now. Unlike game console, the number of cores varies in PC. Some people have 2 cores, some have more. Most people as well as I5 all have 2 cores, while the extra 2 cores from I5 are HT. Isn't it not logical to optimize games on dual core CPU? Whose fault it is that C2Q isn't 100% better than C2D? 2 extra core yet not 100% faster most of the time. When it comes to games, people believed that it was better to get C2D because of faster speed per core. Should games be written to utilize 8 cores now so everyone can ditch their PC? Sorry, I will love to see dynamic utilization too, but it isn't happening. OpenCL and DirectCompute have the potential, but the solution is far away.

Lonbjerg · Jan 27, 2010

evolucion8 said:
Well, that's what I see currently, PhysX put more load in the CPU and yet, it runs like crap without an AGEIA or nVidia GPU.

And for the all cores being busy, welcome to the world of multi core, Xbox 360 and PS3 has been doing that for ages!! Handling PhysX calculation in PhysX supported games and handling the code and other stuff!! Software is not catching up with hardware yet.

You seem to suffer form at lot of misconceptions.

A CPU is not better for collision physics than a GPU.
Infact a GPU is massively better for parallel calcualtions than a CPU.

I suggest you start reading up, the level of misconceptions in your post are quite staggering, this would be a good place to start...anno 2005:
http://personal.inet.fi/atk/kjh2348fs/ageia_physx_2005.html

Work yourself up to 2010 please, as my head hurts reading the false claims you come with.

This article (all though old) is quite good for an entry level into the world of physics:
http://www.blachford.info/computer/articles/PhysX1.html

evolucion8 · Jan 27, 2010

Lonbjerg said:
You seem to suffer form at lot of misconceptions.

A CPU is not better for collision physics than a GPU.
Infact a GPU is massively better for parallel calcualtions than a CPU.

I suggest you start reading up, the level of misconceptions in your post are quite staggering, this would be a good place to start...anno 2005:
http://personal.inet.fi/atk/kjh2348fs/ageia_physx_2005.html

Work yourself up to 2010 please, as my head hurts reading the false claims you come with.

This article (all though old) is quite good for an entry level into the world of physics:
http://www.blachford.info/computer/articles/PhysX1.html

You tell me to read those link and welcome myself to 2010, but those links dates from 2006 and before, you are pathettic.

We all know here that GPU's are massivelly better for parallel calculations than CPU, Mr. Obvious. Collision detection isn't very parallel at all, it requires calculations of aproximations of rigid bodies, reactions and stuff that usually doesn't scale linearly with parallel calculations, it is based on variables which uses branchy code that simply runs horrible in GPU's. GPU's don't do very well if there's data dependency from different threads. So if you don't know what you are talking, stop spreading lies, if you do know better than me, then enlight me, I'm no expert of this topic, but I don't bragg about it like you.

Seero · Jan 27, 2010

evolucion8 said:
You tell me to read those link and welcome myself to 2010, but those links dates from 2006 and before, you are pathettic.

We all know here that GPU's are massivelly better for parallel calculations than CPU, Mr. Obvious. Collision detection isn't very parallel at all, it requires calculations of aproximations of rigid bodies, reactions and stuff that usually doesn't scale linearly with parallel calculations, it is based on variables which uses branchy code that simply runs horrible in GPU's. GPU's don't do very well if there's data dependency from different threads. So if you don't know what you are talking, stop spreading lies, if you do know better than me, then enlight me, I'm no expert of this topic, but I don't bragg about it like you.

Yeah, why should people go to school for 20 years learning yesterday's tech?

Oh wait...

evolucion8 · Jan 27, 2010

Seero said:
Yeah, why should people go to school for 20 years learning yesterday's tech?

Oh wait...

Please stay in topic, thank you!!

Lonbjerg · Jan 27, 2010

evolucion8 said:
You tell me to read those link and welcome myself to 2010, but those links dates from 2006 and before, you are pathettic.

You start at the begining...and the you progress onwards...simple logic.
Oh, and you personal attack is reported. -Little antagonizing comments like this tend to derail threads. -Admin DrPizza

We all know here that GPU's are massivelly better for parallel calculations than CPU, Mr. Obvious. Collision detection isn't very parallel at all, it requires calculations of aproximations of rigid bodies, reactions and stuff that usually doesn't scale linearly with parallel calculations, it is based on variables which uses branchy code that simply runs horrible in GPU's. GPU's don't do very well if there's data dependency from different threads. So if you don't know what you are talking, stop spreading lies, if you do know better than me, then enlight me, I'm no expert of this topic, but I don't bragg about it like you.

I highligthed where you run of the road and into a tree.
Here is a link to correct you false view:
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch33.html

Like I said, you really need to read up *shrugs*

evolucion8 · Jan 27, 2010

Lonbjerg said:
You start at the begining...and the you progress onwards...simple logic.
Oh, and you personal attack is reported.

Who cares?

I already reported your trolling and attacks in the other thread and you got remorse and regret of it, now you are doing some revenge, cool, I know that it's wrong. -See what I mean? -Admin DrPizza

I highligthed where you run of the road and into a tree.
Here is a link to correct you false view:
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch33.html

Like I said, you really need to read up *shrugs*

I never stated that doing collision detection was imposible in GPU, is just that isn't practical enough to do all the calculations alone in the GPU, for sure, CPU is still doing collision detection which isn't very intensive compared to other stuff that PhysX does that will simply run better on a GPU. The funny thing is that Need for Speed Shift uses PhysX in CPU for rigid bodies and collision detection and it run fantastic. Would you be able to answer why?

DrPizza · Jan 28, 2010

Knock it off you two, or I'll vacation both of you.
-Admin DrPizza

Lonbjerg · Jan 28, 2010

evolucion8 said:
Who cares? I already reported your trolling and attacks in the other thread and you got remorse and regret of it, now you are doing some revenge, cool, I know that it's wrong.

Red herring...*shrugs*

I never stated that doing collision detection was imposible in GPU, is just that isn't practical enough to do all the calculations alone in the GPU, for sure, CPU is still doing collision detection which isn't very intensive compared to other stuff that PhysX does that will simply run better on a GPU.

You have already been linked to a years old video showing you the excact opposite *shrugs*

The funny thing is that Need for Speed Shift uses PhysX in CPU for rigid bodies and collision detection and it run fantastic. Would you be able to answer why?

NFS - Shift uses only CPU physics...and not very much of it, you point is none-existent.

evolucion8 · Jan 28, 2010

Lonbjerg said:
NFS - Shift uses only CPU physics...and not very much of it, you point is none-existent.

Are you a developer? Do you particpate on Shift development? Do you know the algorithms used for Phisics or made a PhysX debugging? No, so who you are to state how much Shift used PhysX or not, you are just using a straw man argument which is empty as a bottle. The game does use it for impact calculation and stuff like innertia and collision detection, a pretty complicated calculation if you aske me, but whatever.

Anyway enjoy your encapsulated little world of fantasy with your own reality thinking that everyone is wrong and you are right and be happy with the green goblins, no one cares.

Lonyo · Jan 28, 2010

Lonbjerg said:
You start at the begining...and the you progress onwards...simple logic.
Oh, and you personal attack is reported.

I highligthed where you run of the road and into a tree.
Here is a link to correct you false view:
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch33.html

Like I said, you really need to read up *shrugs*

about 21,000 distance queries per second for six-sided convex objects on a 3.0 GHz Pentium 4. The CUDA LCP solver demo computes about 69,000 queries per second on a GeForce 8800 GTX.

Assuming Nehalem is 50% faster than a P4 on a core vs core basis, that's 30k vs 69k.
Assuming that Fermi is 6x faster than an 8800GTX (4x as many shaders plus some extra to be nice, since you'd assume there are architectural improvements too), then that's 420k for Fermi.
30k per core on a Nehalem gives 240k total, or if you take a 980X and the clock speed difference you get 30*12*1.1 which gives you close to 400k.

Assume 50% more transistors for Gulftown over current Nehalem and you get 1.1bn transistors, which is less than half the number of transistors in Fermi. For almost equal (wildly guesstimated) performance.

So there's a lot of wild guesses there, but IMO I tried to be as nice as possible to Fermi, and on a transistor to performance basis, I would theorise that Fermi isn't better than a regular CPU at doing collision detection

evolucion8 · Jan 28, 2010

Lonyo said:
Assuming Nehalem is 50% faster than a P4 on a core vs core basis, that's 30k vs 69k.
Assuming that Fermi is 6x faster than an 8800GTX (4x as many shaders plus some extra to be nice, since you'd assume there are architectural improvements too), then that's 420k for Fermi.
30k per core on a Nehalem gives 240k total, or if you take a 980X and the clock speed difference you get 30*12*1.1 which gives you close to 400k.

Assume 50% more transistors for Gulftown over current Nehalem and you get 1.1bn transistors, which is less than half the number of transistors in Fermi. For almost equal (wildly guesstimated) performance.

So there's a lot of wild guesses there, but IMO I tried to be as nice as possible to Fermi, and on a transistor to performance basis, I would theorise that Fermi isn't better than a regular CPU at doing collision detection

Your calculations are awesome, I knew that my statement was more realistic than his pov, but your knowledge simply brings a much better understanding of it, let's sync with our virtual head bluetooth intranet, so I can learn more loll

Lonyo · Jan 28, 2010

evolucion8 said:
Your calculations are awesome, I knew that my statement was more realistic than his pov, but your knowledge simply brings a much better understanding of it, let's sync with our virtual head bluetooth intranet, so I can learn more loll

I would call it wild guesstimation that currently has no basis in empirically testable reality, but seems somewhat reasonable to a point.

Speculating on how two unreleased products compare is a little silly, but there really doesn't seem to be that much of a difference if an 8800GTX is only 3x as powerful as a single core Pentium 4 3GHz, given how far we have come since the days of the Pentium 4.

Seero · Jan 28, 2010

Lonyo said:
Assuming Nehalem is 50% faster than a P4 on a core vs core basis...

I think that is a wrong assumption. Pentium 4 Cedar Mill runs at 3.2Ghz, I7 975 runs at 3.2Ghz. I7 cores use much less power, produce much less heat and much smaller in size, but not faster. If things can scale linearly, then I7 is 4 times faster than P4, which is not the case.

Lonyo said:
Assuming that Fermi is 6x faster than an 8800GTX (4x as many shaders plus some extra to be nice, since you'd assume there are architectural improvements too), then that's 420k for Fermi.

Bad math. 6 times faster with 4x shaders indicated it isn't linear scaling. Plus, how did you get 6 times from?

Lonyo said:
30k per core on a Nehalem gives 240k total, or if you take a 980X and the clock speed difference you get 30*12*1.1 which gives you close to 400k.

Bad math, 21k x 4 = 84k assuming all cores are at max load. Sharing of cache, traffic to RAM will reduce this number, but we are here theory crafting anyways. If you ask why 4 not 8, then you really should study or do more researches.

Lonyo said:
Assume 50% more transistors for Gulftown over current Nehalem and you get 1.1bn transistors, which is less than half the number of transistors in Fermi. For almost equal (wildly guesstimated) performance.

Look up "Parallel Kernel Execution" and transistor.

Lonyo said:
So there's a lot of wild guesses there, but IMO I tried to be as nice as possible to Fermi, and on a transistor to performance basis, I would theorise that Fermi isn't better than a regular CPU at doing collision detection

No you didn't. Tesla claims to beat Home PC/station 250 times, tax it however ways you want but the answer will still be X times faster. With your method of calculation. One core = 21k, Fermi has 512 cores, 21k x 512 = 10,752k. Divide that by 2 (1.6Ghz shader clock) = 5,376k.

So i7 84k, Fermi 5,376k, 64 times faster than i7. Theorycraft? Yeah, big time.

Lonbjerg · Jan 28, 2010

evolucion8 said:
Are you a developer? Do you particpate on Shift development? Do you know the algorithms used for Phisics or made a PhysX debugging? No, so who you are to state how much Shift used PhysX or not, you are just using a straw man argument which is empty as a bottle. The game does use it for impact calculation and stuff like innertia and collision detection, a pretty complicated calculation if you aske me, but whatever.

Actually it's pretty low physics, I have run much more complex physcis (I have both the AGEIA's SDK(since 2006) and NVIDIA's SDK (since they acquired AGEIA) on my own PC.

The tick-rate for physics is standard 180Hz (180 collision per second) but the games runs much better if you modify it to run 360Hz (360 collison per second)..above 400Hz and you get no benefit....so yes I know how much PhysX the game is doing.

(Physicstweaker.xml --> <prop name="tick rate" data="360" />)

To put a perspective on that the old AGEIA PPU could handle ~533,000 complex collision per second...the game is really bad to use as an example for CPU vs GPU regarding physics calculations....but I have already stated that *shrugs*

BTW, would you stop accusing me of all sort of stuff, it's irrelevant and only makes you look less than stellar when it backfires.

Anyway enjoy your encapsulated little world of fantasy with your own reality thinking that everyone is wrong and you are right and be happy with the green goblins, no one cares.

I fail to see anything relevant, reported.

Lonbjerg · Jan 28, 2010

Lonyo said:
Assuming Nehalem is 50% faster than a P4 on a core vs core basis, that's 30k vs 69k.
Assuming that Fermi is 6x faster than an 8800GTX (4x as many shaders plus some extra to be nice, since you'd assume there are architectural improvements too), then that's 420k for Fermi.
30k per core on a Nehalem gives 240k total, or if you take a 980X and the clock speed difference you get 30*12*1.1 which gives you close to 400k.

Assume 50% more transistors for Gulftown over current Nehalem and you get 1.1bn transistors, which is less than half the number of transistors in Fermi. For almost equal (wildly guesstimated) performance.

So there's a lot of wild guesses there, but IMO I tried to be as nice as possible to Fermi, and on a transistor to performance basis, I would theorise that Fermi isn't better than a regular CPU at doing collision detection

Yup, wild guesses, but no where near accurate.
The clue should be in the video of a rigid bodies collisions that Seero posted in post #53:
http://www.youtube.com/watch?v=yIT4lMqz4Sk&feature=related

That is the PPU vs a P4 with HT..."only" 6000 rigid bodies...and the PPU stomps the P4.
Try and apply you guess-math to that senario and you will see how far off you guess was from reality

Lonyo · Jan 28, 2010

Seero said:
I think that is a wrong assumption. Pentium 4 Cedar Mill runs at 3.2Ghz, I7 975 runs at 3.2Ghz. I7 cores use much less power, produce much less heat and much smaller in size, but not faster. If things can scale linearly, then I7 is 4 times faster than P4, which is not the case.

What? You are assuming that a P4 single core at 3GHz is the same as an i7 core at 3GHz? Are you INSANE?
Did you even read what I wrote? Clockspeed doesn't matter on its own, performance per clock and clockspeed combined are important. Nehalem at 3GHz is faster in any single threaded app than a P4 at 3GHz, ignoring multiple cores.
Also why did you suddenly start talking about 3.2GHz? The metric was for a 3GHz Pentium 4.

Bad math. 6 times faster with 4x shaders indicated it isn't linear scaling. Plus, how did you get 6 times from?

I assumed that clock for clock, Fermi would have better performance than an 8800GTX, either due to architectural changes, or clockspeed changes.
4 times the shaders does not mean 4 times the processing power, with or without linear scaling, when you consider that the architecture is different, so I gave a fudge factor of 50% improvement in addition to useing 4x as many shaders.

Bad math, 21k x 4 = 84k assuming all cores are at max load. Sharing of cache, traffic to RAM will reduce this number, but we are here theory crafting anyways. If you ask why 4 not 8, then you really should study or do more researches.

i7 has hyperthreading (which I fudged again as being 4 extra cores), and 30k comes from the fact that clock for clock it's significantly faster than a P4.
An i7 at 3GHz on a parallel load will be a hell of a lot more than 4x faster than a single core 3GHz Pentium 4.

No you didn't. Tesla claims to beat Home PC/station 250 times, tax it however ways you want but the answer will still be X times faster. With your method of calculation. One core = 21k, Fermi has 512 cores, 21k x 512 = 10,752k. Divide that by 2 (1.6Ghz shader clock) = 5,376k.

So i7 84k, Fermi 5,376k, 64 times faster than i7. Theorycraft? Yeah, big time.

Why are you doing 512 x 21k? The 8800GTX didn't get 21k per shader, it got 69k total across 128 shaders, which means 69k * 4 = Fermi at 512 which is 276k

And are you just trolling or are you really that stupid? 250 times faster than a PC taxed however I want? That's got to be some kind of super troll.
If you really want to believe that a GPU will be 250 times faster than a CPU no matter what, then go ahead, but even the most die hard GPU fanboy from either side would absolutely laugh in your face.

Did you even actually read what I wrote?
I said it was highly speculative, but jesus, at least I stayed in the real world instead of going into some cloud cuckoo land where a single 8800GTX shader manages the same performance as a 3GHz P4.

Lonbjerg · Jan 28, 2010

Some real world numbers(mixed with guesswork) on CPU vs GPU:
http://golubev.com/about_cpu_and_gpu_2_en.htm

akugami · Jan 28, 2010

Lonbjerg said:
The problem by running massive parallel physcis on the CPU is that no only does the CPU have a SIMD "handicap" (and the parallel performance dosn't increase linear with more CPU cores, but actually have diminishing returns) but the rest of the game "stalls"...while waiting for the physics calculations...

We are back to the old analogy:
Would you run your graphics on the CPU...or the GPU?

You're assuming that CPU's are static and can't be updated. Now, granted certain very parallel operations may benefit from something like a GPU currently but is that any different from the past when we used math co-processors to enhance the CPU? No. These co-processors like FPU's are now integrated within the CPU.

It would be stupid to assume someone can't build a CPU with add-ons that would benefit physics processing and other operations currently slated for GPGPU. Not that I'm saying this is the best approach but lets not write off the CPU because of what it currently can't do. As we build better and smaller CPU's, we can add more to them. Granted the same will be true for GPU's.

Who's to say that at the end of the day, it might not make more sense to have higher end CPU's that are great at what is currently used with GPGPU? If we were talking about regular users, current CPU's are overkill for 90% of what consumers do. For power users, we could have multi-core CPU's with specialized cores that are specifically aimed at operations like physics and video encoding.

The other issue is that while GPU's may currently be better built to handle physics acceleration, using it for that purpose is not free either. That's why enabling PhysX on an nVidia card will net you a performance hit. You usually need a separate card for the express purpose of PhysX.

Lonbjerg · Jan 28, 2010

akugami said:
You're assuming that CPU's are static and can't be updated. Now, granted certain very parallel operations may benefit from something like a GPU currently but is that any different from the past when we used math co-processors to enhance the CPU? No. These co-processors like FPU's are now integrated within the CPU.

It would be stupid to assume someone can't build a CPU with add-ons that would benefit physics processing and other operations currently slated for GPGPU. Not that I'm saying this is the best approach but lets not write off the CPU because of what it currently can't do. As we build better and smaller CPU's, we can add more to them. Granted the same will be true for GPU's.

Who's to say that at the end of the day, it might not make more sense to have higher end CPU's that are great at what is currently used with GPGPU? If we were talking about regular users, current CPU's are overkill for 90% of what consumers do. For power users, we could have multi-core CPU's with specialized cores that are specifically aimed at operations like physics and video encoding.

I don't speculate on "what if" or "in some years"...i am talking about the facts...today.
If you want to go down that line, we migh render the graphics on CPU's, since they are so powerfull today...oh wait.

The other issue is that while GPU's may currently be better built to handle physics acceleration, using it for that purpose is not free either. That's why enabling PhysX on an nVidia card will net you a performance hit. You usually need a separate card for the express purpose of PhysX.

For what games?
And that argument is the same with AA...or increasing the resolution...no calculations comes for free...point being?

Seero · Jan 28, 2010

Lonyo said:
What? You are assuming that a P4 single core at 3GHz is the same as an i7 core at 3GHz? Are you INSANE?
Did you even read what I wrote? Clockspeed doesn't matter on its own, performance per clock and clockspeed combined are important. Nehalem at 3GHz is faster in any single threaded app than a P4 at 3GHz, ignoring multiple cores.
Also why did you suddenly start talking about 3.2GHz? The metric was for a 3GHz Pentium 4.

In case you don't know, the Hz refers to cycle, which equates to one operation of a CPU. 1 hz = 1 cycle(operation) per second, and 3Ghz CPU means 3,000 operation per second, regardless of the type of CPU. A single I7 core that runs at 3Ghz handles exact same number of operations than a P4 at 3Ghz. The P4 CPU I picked, Cedar Mill, is a 64-bit processor, same as I7. That means the number of cycle required to complete instructions is exactly the same.

Lonyo said:
I assumed that clock for clock, Fermi would have better performance than an 8800GTX, either due to architectural changes, or clockspeed changes.
4 times the shaders does not mean 4 times the processing power, with or without linear scaling, when you consider that the architecture is different, so I gave a fudge factor of 50% improvement in addition to useing 4x as many shaders.

Say things scale perfectly.
I7 = 4x P4 = 4x21k = 84k
Fermi = 4x8800GTS = 4x69k = 276k

Therefore, Fermi is 276/84 X 100% = 328.57% or more than 3x faster than I7.

Rumors said the architecture of Fermi is designed for GPGPU, not gaming. Thus the performance may be much higher when it comes to computing. I am not trying to justify the correctness of those figures, but simply saying that GPU > CPU with PhysX no matter how you play with them.

Lonyo said:
i7 has hyperthreading (which I fudged again as being 4 extra cores), and 30k comes from the fact that clock for clock it's significantly faster than a P4.
An i7 at 3GHz on a parallel load will be a hell of a lot more than 4x faster than a single core 3GHz Pentium 4.

P4 has HT too my friend. 4x faster is the best case, the theoretical max, where all cores are utilized and no battleneck from varies parts like RAM and cache. The HT implementation on I7 is better than P4, but HT doesn't mean more cores. You need to look up Hyper-thread and its implementation before you can come to a reasonable figure. When Core 0 is in action, cache that are reserved for Core 0 can be freed before Core 0 finishes, tricking the OS as if there is another Core available. P4 with HT was never 2x as fast as P4, no where close. Even with the initial Duo design, data can't arrive fast enough to utilize them. Until C2D, a complete ground up redesign, utilization becomes much better when using 2 cores. However, it was not still no where near 2x the speed, and one of the reason was because a single thread can't be separated into 2 cores. Which was how those multi-threaded talk started. So was C2D twice as fast when working on 2 threads? No. Was C2Q 2x faster as C2D? No. Was I7 2x faster than C2Q? No. How did you come up with the idea that I7 is 4x faster than P4 with HT is beyond me.

Lonyo said:
Why are you doing 512 x 21k? The 8800GTX didn't get 21k per shader, it got 69k total across 128 shaders, which means 69k * 4 = Fermi at 512 which is 276k

Lonyo said:
Assuming that Fermi is 6x faster than an 8800GTX (4x as many shaders plus some extra to be nice, since you'd assume there are architectural improvements too), then that's 420k for Fermi.

So is it 276k or 420k?

Lonyo said:
And are you just trolling or are you really that stupid? 250 times faster than a PC taxed however I want? That's got to be some kind of super troll.
If you really want to believe that a GPU will be 250 times faster than a CPU no matter what, then go ahead, but even the most die hard GPU fanboy from either side would absolutely laugh in your face.

Quote from Nvidia Website

NVIDIA TESLA GPU COMPUTING SOLUTIONS FOR WORKSTATIONS
Experience cluster level computing performanceup to 250 times faster than standard PCs and workstationsright at your desk. Convert your workstation into a NVIDIA® Tesla Personal Supercomputer by adding Tesla GPU computing processors. Each Tesla GPU is based on the revolutionary NVIDIA® CUDA massively parallel computing architecture with a rich set of developer tools (compilers, profilers, debuggers) for popular programming languages APIs like C, C++, Fortran, and driver APIs like OpenCL and DirectCompute.

Keep laughing.

Lonyo said:
Did you even actually read what I wrote?
I said it was highly speculative, but jesus, at least I stayed in the real world instead of going into some cloud cuckoo land where a single 8800GTX shader manages the same performance as a 3GHz P4.

I read your post carefully, and realized that those are bad maths. If you are really living in the real world, then you should simply watch the video comparison. Fact showed that GPU handles PhysX far better than CPU.

I am not trolling you, but I can't sit there watching people picking numbers out of thin air and claim that it is the truth. You on the other hand said i am insane, stupid and should be laugh at. I guess we have different definition of trolling.

PhysX and multi-core support

Diamond Member

Platinum Member

Golden Member

Platinum Member

Golden Member

Diamond Member

Platinum Member

Golden Member

Platinum Member

Diamond Member

Platinum Member

Administrator Elite Member Goat Whisperer

Diamond Member

Platinum Member

Lifer

Platinum Member

Lifer

Golden Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Golden Member