"PhysX hobbled on CPU by x87 code"

GaiaHunter · Jul 8, 2010

Scali said:
No, I'm saying that in this particular case, getting a performance boost from SSE is far from trivial.
The Bullet library actually uses some of the SSE intrinsics from VS2008 as well, so it has received at least a bit of hand-optimization.

As I said before in the thread, if the computational part is not the bottleneck in the first place, you're not going to gain much by optimizing that part.
I think this small Bullet-test at least shows two things:
1) David Kanter was jumping to conclusions with his figures of 1.5-2x speedup. It's not that simple.
2) nVidia was correct in stating that some things are just faster with x87 than with SSE (like the example I gave, the dot product).
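Scali's dot-product example is easier to judge with code in front of you. The sketch below is illustrative only and is not taken from PhysX or Bullet: a 4-component dot product written as plain scalar C, and the same operation with SSE1 intrinsics. The packed version does all four multiplies in one instruction, but summing the four lanes then takes a shuffle, a move and two more adds, because SSE1 has no horizontal-add instruction. For a single short dot product that leaves little to gain over scalar x87 code. The `HAVE_SSE` guard and the function names are assumptions added so the example compiles on any target.

```c
/* Hypothetical guard: only build the SSE version where SSE1 is available. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

/* Scalar 4-component dot product: compiled for x87 (or scalar SSE) this is
   four multiplies and three adds, one element at a time. */
float dot4_scalar(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

#ifdef HAVE_SSE
/* SSE1 version: one packed multiply produces all four products, but the
   horizontal sum needs extra shuffles and adds (haddps only came with SSE3). */
float dot4_sse(const float a[4], const float b[4])
{
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));        /* x, y, z, w */
    __m128 shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1)); /* y, x, w, z */
    __m128 sums = _mm_add_ps(prod, shuf);   /* x+y, y+x, z+w, w+z */
    shuf = _mm_movehl_ps(shuf, sums);       /* lanes 0-1 now hold z+w, w+z */
    sums = _mm_add_ss(sums, shuf);          /* lane 0 = (x+y) + (z+w) */
    return _mm_cvtss_f32(sums);
}
#endif
```

With inputs `{1, 2, 3, 4}` and `{5, 6, 7, 8}` both functions return 70; the SSE version simply gets there with more instructions than its single packed multiply suggests.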

I don't know enough at the technical level to dispute or refute anything, so I'll just ask a couple of questions, if you don't mind.

Can it be we don't see any differences because Bullet might be more optimized in the first place? Or maybe it isn't optimized enough?

Is it possible that you didn't see much difference because it isn't actually a game you are running?

The only thing we have running on a GPU so far is the Cuda demo released with Bullet 2.74, and that performs better than a CPU yes.

And can you give an estimate how much faster it is (I seriously don't know)? Is it like 20% faster or 2x faster or 4x faster?

I'm not saying that guy proved anything; for that he would have to recompile it for SSE and see whether it was faster. But by the same token, I don't think you proved him conclusively wrong either.

I'll do some research and then chime in.

EDIT: I guess Schmide raised an interesting point:

If you recompile with just the SSE flag, it's not going to swizzle and pack the FP operations into a vector, so yeah, it's going to be similar. You're basically saving an FXCH and an FSTP.
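Schmide's point can be sketched as follows, with hypothetical function names and assuming the compiler does not auto-vectorize the plain loop (modern compilers often will at high optimization levels). An SSE code-generation flag turns the scalar loop into `mulss`/`addss` instead of `fmul`/`fadd`, which is still one float at a time. The hand-packed loop processes four floats per `mulps`/`addps`, and that is where any large speedup would have to come from.

```c
#include <stddef.h>

/* Hypothetical guard: only build the packed version where SSE1 is available. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

/* Plain C: without auto-vectorization, an SSE flag only swaps x87 ops for
   scalar SSE ops; each element is still handled individually. */
void integrate_scalar(float *pos, const float *vel, float dt, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        pos[i] += vel[i] * dt;
}

#ifdef HAVE_SSE
/* Hand-packed: four floats per mulps/addps, plus a scalar tail loop for
   the leftover elements when n is not a multiple of 4. */
void integrate_packed(float *pos, const float *vel, float dt, size_t n)
{
    const __m128 vdt = _mm_set1_ps(dt);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 p = _mm_loadu_ps(pos + i);
        __m128 v = _mm_loadu_ps(vel + i);
        _mm_storeu_ps(pos + i, _mm_add_ps(p, _mm_mul_ps(v, vdt)));
    }
    for (; i < n; ++i)
        pos[i] += vel[i] * dt;
}
#endif
```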

Schmide · Jul 8, 2010

Lonbjerg said:
DKanter did a borked "analysis".
He PRESUMES that the use of x87 (instead of SSE) means that PhysX is borked on the CPU and running much slower than it could.
But he didn't verify his findings by any means.

His analysis is fine; as he said, without the source code it would be impossible to verify.

Lonbjerg said:
Scali actually put this to the test... a test with a different physics API, and found no major difference between x87 and SSE in Bullet Physics.

If you recompile with just the SSE flag, it's not going to swizzle and pack the FP operations into a vector, so yeah, it's going to be similar. You're basically saving an FXCH and an FSTP.

Lonbjerg said:
That can only lead you to conclude that DKanter's piece was directed against NVIDIA on a false premise... one that he never tested, but nonetheless he still (with no factual evidence) concluded that "those are the facts".

A quick conclusion to an analysis that is basically just asking questions? There is no false premise, and most of the facts are not as divisive as you attempt to make them.

Lonbjerg said:
It's no secret that quite a few people dislike PhysX.
It's also no secret that no one can show another physics API on the CPU doing it better.

Those are the facts.
Prove me wrong.

But saying so...means you get accused of "threadcapping" and "namecalling".

Makes you wonder eh?

You can't prove anything by inferring the negative. You need to learn how to make a constructive argument and not make everything out to be a conspiracy theory against nVidia.

Lonbjerg · Jul 8, 2010

Schmide said:
His analysis is fine; as he said, without the source code it would be impossible to verify.

If you recompile with just the SSE flag, it's not going to swizzle and pack the FP operations into a vector, so yeah, it's going to be similar. You're basically saving an FXCH and an FSTP.

A quick conclusion to an analysis that is basically just asking questions? There is no false premise, and most of the facts are not as divisive as you attempt to make them.

You can't prove anything by inferring the negative. You need to learn how to make a constructive argument and not make everything out to be a conspiracy theory against nVidia.

Again all these words, but no proof of anything :hmm:

Where is the CPU physics API that makes CPU PhysX look borked?!
Why did Intel buy Havok...for Larrabee "GPU" acceleration.
Why did AMD tout HavokFx...for GPU physics.
Why is AMD now touting Bullet Physics...for the GPU physics.

It's odd that PhysX gets so "demonized"...yet no one can show a faster, better, more feature-rich alternative?

Schmide · Jul 8, 2010

Lonbjerg said:
Again all these words, but no proof of anything :hmm:

Where is the CPU physics API that makes CPU PhysX look borked?!
Why did Intel buy Havok...for Larrabee "GPU" acceleration.
Why did AMD tout HavokFx...for GPU physics.
Why is AMD now touting Bullet Physics...for the GPU physics.

It's odd that PhysX gets so "demonized"...yet no one can show a faster, better, more feature-rich alternative?

My words were there to bring some sanity to your rant. There is no proof yet, and I think everyone has come forth with at least a bit of cautious perspective except you. Maybe reread the article and calm down?

Lonbjerg · Jul 8, 2010

Schmide said:
My words were there to bring some sanity to your rant. There is no proof yet, and I think everyone has come forth with at least a bit of cautious perspective except you. Maybe reread the article and calm down?

I didn't see any proof in DKanter's article (besides the fact that he is an IRL friend of Char-lie).
I did see his speculation and presumptions being served as fact.

Why can't you show me another CPU physics API that makes PhysX look borked...thus confirming the article?

It can't be that hard...or?

Schmide · Jul 8, 2010

Schmide said:
There is no proof yet, and I think everyone has come forth with at least a bit of cautious perspective except you. Maybe reread the article and calm down?

Ironically I say the above

Lonbjerg said:
I didn't see any proof in DKanter's article (besides the fact that he is an IRL friend of Char-lie).
I did see his speculation and presumptions being served as fact.

Why can't you show me another CPU physics API that makes PhysX look borked...thus confirming the article?

It can't be that hard...or?

and you come back with that first sentence? You want things cut and dried, and they aren't. He explains in great detail what he did and the range of implications. As far as I'm concerned, you are not being productive and/or rational, so continue your ranting without me.

Lonbjerg · Jul 8, 2010

Schmide said:
Ironically I say the above

and you come back with that first sentence? You want things cut and dried, and they aren't. He explains in great detail what he did and the range of implications. As far as I'm concerned, you are not being productive and/or rational, so continue your ranting without me.

No, Char-lie's friend did a lot of guesswork and didn't prove anything.
But it is served as facts.
I guess I know why they are friends now.

thedosbox · Jul 8, 2010

Lonbjerg said:
It's also no secret that no one can show another physics API on the CPU doing it better.

Classic example of a straw man. The topic under discussion is Nvidia's behaviour, not other Physics API vendors.

But saying so...means you get accused of "threadcapping" and "namecalling".

IMO, your posts in this thread seem to consist of sly accusations about people's motivations and crowing about anti-nvidia FUD. Instead, how about following Scali's example and posting something on-topic?

Lonbjerg · Jul 9, 2010

thedosbox said:
Classic example of a straw man. The topic under discussion is Nvidia's behaviour, not other Physics API vendors.

Based on what?
A presumption by DKanter?
Funny thing you should look into.
When AGEIA acquired NOVODEX, their physics API was coded for x87.
So you are blaming NVIDIA for not optimizing for SSE on the CPU but focusing on their GPU...but trying to make it look like they artificially "borked" the PhysX API?

If that is the case, you should do some reading up...

IMO, your posts in this thread seem to consist of sly accusations about people's motivations and crowing about anti-nvidia FUD. Instead, how about following Scali's example and posting something on-topic?

Why don't you do the same?
You could link me to all those AAA titles showing CPU physics in all its glory and thus show me why we don't need to run physics on the GPU.

Or you could read up on what devs are saying about the topic:
http://www.pcgameshardware.com/aid,706182/Exclusive-tech-interview-on-Metro-2033/News/

PCGH: What are the visual differences between physics calculated by CPU and GPU (via PhysX, OpenCL or even DX Compute)? Are there any features that players without an Nvidia card will miss? What technical features cannot be realized with the CPU as "physics calculator"?

Oles Shishkovstov: There are no visible differences as they both operate on ordinary IEEE floating point. The GPU only allows more compute heavy stuff to be simulated because they are an order of magnitude faster in data-parallel algorithms. As for Metro2033 - the game always calculates rigid-body physics on CPU, but cloth physics, soft-body physics, fluid physics and particle physics on whatever the users have (multiple CPU cores or GPU). Users will be able to enable more compute-intensive stuff via in-game option regardless of what hardware they have.

Don't point your finger at me when you bring nothing to the table yourself.

thedosbox · Jul 9, 2010

Lonbjerg said:
So you are blaming NVIDIA for not optimizing for SSE on the CPU but focusing on their GPU...but trying to make it look like they artificially "borked" the PhysX API?

Show me where I made such a claim.

Don't point your finger at me when you bring nothing to the table yourself.

You're the one being PM'd by a moderator for obnoxious behaviour, so I'll leave you to your ranting rather than indulge in a pointless flame war

Lonbjerg · Jul 9, 2010

thedosbox said:
Show me where I made such a claim.

What is your claim then...so there are no "misunderstandings"?

You're the one being PM'd by a moderator for obnoxious behaviour, so I'll leave you to your ranting rather than indulge in a pointless flame war

That PM was retracted, as the Mod misunderstood my post.
Nice of you to blame me for posting irrelevant stuff...and then doing it yourself...nice own goal.

I guess you made your hat-trick now.

So do you have anything relevant for the topic or do you just want to carry on, delivering nothing?

GaiaHunter · Jul 9, 2010

Lonbjerg said:
You could link me to all those AAA titles showing CPU physics in all its glory and thus show me why we don't need to run physics on the GPU.

Well, and you could link us to all those AAA titles showing GPU physics in all its glory, but considering the list is like 15-16 titles in the last, what, 3-4 years, I dunno how much you can show. http://www.nzone.com/object/nzone_physxgames_home.html

Especially because all those games run on CONSOLES with no problems. (I guess the console processors simply have huge amounts of processing power compared to desktop CPUs.)

But I guess we will just have to stay put because we'll have dozens of upcoming games with GPU PhysX next year... and I've been hearing this since the 4800 series launched ("buy GTX260/280 because PhysX will be hot next year").

And I guess engines like Infernal from Ghostbusters are really sucky.

Lonbjerg · Jul 9, 2010

GaiaHunter said:
Well, and you could link us to all those AAA titles showing GPU physics in all its glory, but considering the list is like 15-16 titles in the last, what, 3-4 years, I dunno how much you can show. http://www.nzone.com/object/nzone_physxgames_home.html

Especially because all those games run on CONSOLES with no problems. (I guess the console processors simply have huge amounts of processing power compared to desktop CPUs.)

But I guess we will just have to stay put because we'll have dozens of upcoming games with GPU PhysX next year... and I've been hearing this since the 4800 series launched ("buy GTX260/280 because PhysX will be hot next year").

And I guess engines like Infernal from Ghostbusters are really sucky.

So that was the long way of saying:
"I got nothing, but I will post smoke & mirrors to make it look like I got something"?

And about the Infernal Engine, I direct you to this post:
http://hardforum.com/showpost.php?p=1035917653&postcount=21

An i7 was used in the Velocity Engine tornado demo. It maxes out 8 threads on the CPU with 1500 rigid bodies (plain boxes) + 200 soft bodies + one force actor, or runs in a low double digit framerate with 3500 rigid bodies (plain boxes) and one force actor. ATI and nvidia have shown demos with dozens of times more rigid bodies (in varied complex shapes) and normal framerate on older generation hardware (ATI's demo was made with the ancient Havok FX, in 2006!). Infernal shows the limitations of CPU physics pretty well. It may be better than PhysX on the CPU, but it's far inferior to physics on a GPU.

So still nothing to show?

Ashkael · Jul 9, 2010

Ars has an article on this subject, complete with a response from NVidia.

http://arstechnica.com/gaming/news/...cpu-gaming-physics-library-to-spite-intel.ars

GaiaHunter · Jul 9, 2010

Lonbjerg said:
So that was the long way of saying:
"I got nothing, but I will post smoke & mirrors to make it look like I got something"?

And about the Infernal Engine, I direct you to this post:
http://hardforum.com/showpost.php?p=1035917653&postcount=21

An i7 was used in the Velocity Engine tornado demo. It maxes out 8 threads on the CPU with 1500 rigid bodies (plain boxes) + 200 soft bodies + one force actor, or runs in a low double digit framerate with 3500 rigid bodies (plain boxes) and one force actor. ATI and nvidia have shown demos with dozens of times more rigid bodies (in varied complex shapes) and normal framerate on older generation hardware (ATI's demo was made with the ancient Havok FX, in 2006!). Infernal shows the limitations of CPU physics pretty well. It may be better than PhysX on the CPU, but it's far inferior to physics on a GPU.

So still nothing to show?

So basically you have no game to show, just tech demos.

Since I've yet to see a game using the ridiculous numbers of bodies presented in that Infernal demo, I'm not worried.

Curiously, you just linked to a CPU physics API that puts PhysX on the CPU to shame in terms of speed (yes, you covered your bases with subjective factors unrelated to the piece under discussion, which only addresses speed).

Lonbjerg · Jul 9, 2010

Ashkael said:
Ars has an article on this subject, complete with a response from NVidia.

http://arstechnica.com/gaming/news/...cpu-gaming-physics-library-to-spite-intel.ars

So SDK 3.0 will spill the beans on how accurate DKanter's article was.

This should be interesting.

beginner99 · Jul 9, 2010

I don't think they ever decided "Let's not improve PhysX on the CPU so that it looks very crippled." It was probably just a financial / project management decision. If you make more money by focusing on consoles, any company would use its workforce to improve console performance. However, they probably know it's crippled and could run faster, but have no resources to do it.

And who cares about CPU PhysX? nvidia is stupid to block GPU PhysX when an ATI card is present. Remove that limitation and it would probably take off, and so would their sales of mid-range cards.
If you go from nv to ATI now, you sell your nv card. If you kept it instead, the guy who buys it would have bought a new one. -> There would be more demand for nv cards.

evolucion8 · Jul 9, 2010

Lonbjerg said:
I guess you made your hat-trick now.

So do you have anything relevant for the topic or do you just want to carry on, delivering nothing?

Don't be too sentimental about nVidia; they just want your wallet and your soul, it's all a conspiracy.

Back on topic: I wonder which type of code was originally used by AGEIA with their PPU. I know it was parallel, but I don't know if they used RISC-type code or equivalent...

Lonbjerg · Jul 9, 2010

evolucion8 said:
Don't be too sentimental about nVidia; they just want your wallet and your soul, it's all a conspiracy.

Back on topic: I wonder which type of code was originally used by AGEIA with their PPU. I know it was parallel, but I don't know if they used RISC-type code or equivalent...

I presume you are talking about the PPU "path" right?:
http://www.blachford.info/computer/articles/PhysX2.html

Because their CPU "path" had to run on x86 CPUs.

Scali · Jul 9, 2010

Schmide said:
His analysis is fine; as he said, without the source code it would be impossible to verify.

Not really; an experienced programmer can analyze the assembly code.
However, in this case I think 'speculation' is a better term than 'analysis'.

Schmide said:
If you recompile with just the SSE flag, it's not going to swizzle and pack the FP operations into a vector, so yeah, it's going to be similar. You're basically saving an FXCH and an FSTP.

As I said, Bullet actually uses the SSE intrinsics/extensions from VS2008, so it should be packing (AOS) and aligning the data to favour SSE.
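For readers wondering what "packing (AOS) and aligning the data to favour SSE" means in practice, here is a minimal sketch with a hypothetical `vec4` type (not Bullet's actual vector class): x, y, z plus one padding float, aligned to 16 bytes so that one vector fills exactly one XMM register and can use the aligned `_mm_load_ps`/`_mm_store_ps`. It assumes a C11 compiler for `_Alignas`, and falls back to a plain loop on targets without SSE.

```c
/* Hypothetical guard: use the SSE path only where SSE1 is available. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

/* Hypothetical AoS vector: x, y, z plus a padding lane, 16-byte aligned
   so the whole vector maps onto one XMM register. */
typedef struct {
    _Alignas(16) float f[4];
} vec4;

/* Component-wise add. With SSE this is two aligned loads, one addps and
   one aligned store; otherwise it is four scalar adds. */
vec4 vec4_add(const vec4 *a, const vec4 *b)
{
    vec4 r;
#ifdef HAVE_SSE
    _mm_store_ps(r.f, _mm_add_ps(_mm_load_ps(a->f), _mm_load_ps(b->f)));
#else
    for (int i = 0; i < 4; ++i)
        r.f[i] = a->f[i] + b->f[i];
#endif
    return r;
}
```

The padding lane wastes a quarter of the storage, which is the usual trade-off of this layout: every vector lines up with a register, at the cost of some memory and bandwidth.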

Lonbjerg · Jul 9, 2010

Scali said:
Not really; an experienced programmer can analyze the assembly code.
However, in this case I think 'speculation' is a better term than 'analysis'.

As I said, Bullet actually uses the SSE intrinsics/extensions from VS2008, so it should be packing (AOS) and aligning the data to favour SSE.

Someone at B3D suggested that you use Intel's compiler instead, FYI.

Scali · Jul 9, 2010

Lonbjerg said:
Someone at B3D suggested that you use Intel's compiler instead, FYI.

I don't own a copy of the Intel compiler. Why don't they do it? It's open source. Make it a community effort! (By the way, wasn't the Intel compiler recently in a similar situation, because it allegedly crippled AMD processors?)

I will point out this though:
I didn't recompile to SSE... the codebase was already aimed at taking advantage of SSE (and if they bothered to make an effort and actually download and inspect the code, they would see).
I *disabled* SSE and recompiled to x87-only (I also disabled the BT_USE_SSE flag in the headers, to skip the SSE intrinsics).
That's a difference.
So these people are barking up the wrong tree.
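The build Scali describes (SSE code generation off, intrinsics switched off in the headers) can be modelled with a compile-time switch. In this sketch `FORCE_SCALAR_PATH` is a hypothetical macro; it plays the opposite role to `BT_USE_SSE` (defining it disables the intrinsics, like turning `BT_USE_SSE` off) and is not Bullet's actual header logic. `FLT_EVAL_METHOD` from `<float.h>` is a quick way to check what the compiler actually did: it is typically 2 when floats are evaluated on the x87 stack and 0 with SSE code generation.

```c
#include <float.h>

/* Hypothetical switch: define FORCE_SCALAR_PATH to skip the intrinsics
   even on SSE-capable targets. */
#if !defined(FORCE_SCALAR_PATH) && \
    (defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1))
#include <xmmintrin.h>
#define USE_SSE_PATH 1
#else
#define USE_SSE_PATH 0
#endif

/* Scales all four lanes of v by s using whichever path was compiled in. */
void scale4(float v[4], float s)
{
#if USE_SSE_PATH
    _mm_storeu_ps(v, _mm_mul_ps(_mm_loadu_ps(v), _mm_set1_ps(s)));
#else
    for (int i = 0; i < 4; ++i)
        v[i] *= s;
#endif
}

/* 1 if the intrinsics path was compiled in, 0 for the plain-C path. */
int uses_sse_path(void) { return USE_SSE_PATH; }

/* How the compiler evaluates float expressions: typically 2 for x87 code
   generation (80-bit intermediates), 0 for SSE; -1 if not reported. */
int float_eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;
#endif
}
```

On a 32-bit x86 build with x87 floating point (for example GCC's `-m32 -mfpmath=387`) plus `-DFORCE_SCALAR_PATH`, one would expect `uses_sse_path()` to return 0 and `float_eval_method()` to typically return 2.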

Scali · Jul 9, 2010

Ashkael said:
Ars has an article on this subject, complete with a response from NVidia.

http://arstechnica.com/gaming/news/...cpu-gaming-physics-library-to-spite-intel.ars

Thanks for the link.
It's interesting to see that the comments are completely different from B3D:
http://74.200.65.90/showthread.php?t=56878&page=5

At ArsTechnica, many people are skeptical about Kanter's claims, even without having seen my recompilation test with Bullet.

At Beyond3D they seem to be in denial despite Lonbjerg pointing out the Bullet test.

I think that demonstrates what I said about Beyond3D earlier.

Scali · Jul 9, 2010

evolucion8 said:
Back on topic: I wonder which type of code was originally used by AGEIA with their PPU. I know it was parallel, but I don't know if they used RISC-type code or equivalent...

I believe the PPU's cores were based on the MIPS architecture, with vector extensions.

Scali · Jul 9, 2010

GaiaHunter said:
Can it be we don't see any differences because Bullet might be more optimized in the first place? Or maybe it isn't optimized enough?

If Bullet is more optimized (towards SSE, since that is the default setting), that would imply that the differences between x87 and SSE are larger than with PhysX, not smaller.

GaiaHunter said:
Is it possible that you didn't see much difference because it isn't actually a game you are running?

No, this is a purely synthetic test. It measures ONLY the physics performance. In an actual game, the difference would be smaller, as physics would only be a certain percentage of the total CPU load.
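Scali's remark that the difference would be smaller in an actual game is Amdahl's law: if physics takes a fraction p of the CPU time per frame and only that part gets s times faster, the whole frame speeds up by 1 / ((1 - p) + p / s). A minimal sketch, with a hypothetical function name:

```c
/* Amdahl's law: overall speedup when only a fraction p (0..1) of the work
   is accelerated by a factor s (> 0); the remaining 1 - p is unchanged. */
double overall_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, taking the low end of the 1.5-2x figure discussed in this thread and assuming, purely hypothetically, that physics is 20% of the frame time, `overall_speedup(0.2, 1.5)` is 1 / (0.8 + 0.2 / 1.5), or about 1.07: the whole frame gets only about 7% faster.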

GaiaHunter said:
And can you give an estimate how much faster it is (I seriously don't know)? Is it like 20% faster or 2x faster or 4x faster?

Depends on many factors... what CPU and GPU do you compare (do you take a Core Solo vs a GTX480, or a Core i7 980X vs an Ion)? What kind of test do you construct? Etc.

One of the tests that was demonstrated used 100,000 cubes in 3D, running at 3 FPS in Cuda:
http://bulletphysics.com/GDC09_ErwinCoumans_BreakingBarriers_2nd.pdf

There was no CPU comparison made, but playing around with the CPU-based demos, I think it's pretty safe to assume that my Core2 Duo doesn't get anywhere close; at a few hundred stacked boxes it already gets into trouble. Even if we assume that a 980X is about 3 times as fast as my CPU, I doubt it would get anywhere near 100,000 cubes at 3 FPS.
I would guess that a GTX480 would easily be 4x faster than a 980X in this case, probably much more (I believe the Cuda sample allows you to switch between Cuda and CPU, so anyone with a Cuda-capable card can do their own testing... but that leaves me out).