
"PhysX hobbled on CPU by x87 code"


Scali

Banned
Dec 3, 2004
2,495
0
0
To add some historic perspective...
I still had a copy of the original NovodeX SDK and the NovodeXRocket samples on my hard drive.
I've put them online so everyone can download and compare the original CPU code to the current nVidia PhysX:
http://bohemiq.scali.eu.org/NovodeX/
After all, if you want to claim that nVidia has 'hobbled' the CPU code, you have to compare it to what it was before... This is that code.

I also have an old Ageia 2.7.0 SDK, from the PPU era, if anyone is interested:
http://bohemiq.scali.eu.org/PhysX/
That would tell us whether it was nVidia or Ageia who did it, if they did anything at all...

My take on things:
A quick disassembly of the NxPhysics.dll revealed no SSE code; everything appeared to be purely x87. The performance also doesn't seem to be any better than that of similar samples from later PhysX SDKs. If anything, it looks worse.
 
Last edited:

Skurge

Diamond Member
Aug 17, 2009
5,195
1
71
To add some historic perspective...
I still had a copy of the original NovodeX SDK and the NovodeXRocket samples on my hard drive.
I've put them online so everyone can download and compare the original CPU code to the current nVidia PhysX:
http://bohemiq.scali.eu.org/NovodeX/
After all, if you want to claim that nVidia has 'hobbled' the CPU code, you have to compare it to what it was before... This is that code.

I also have an old Ageia 2.7.0 SDK, from the PPU era, if anyone is interested:
http://bohemiq.scali.eu.org/PhysX/
That would tell us whether it was nVidia or Ageia who did it, if they did anything at all...

My take on things:
A quick disassembly of the NxPhysics.dll revealed no SSE code; everything appeared to be purely x87. The performance also doesn't seem to be any better than that of similar samples from later PhysX SDKs. If anything, it looks worse.

So it's basically just negligence. I guess they aren't the only ones who can be accused of that.
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
I will say, though, that there is basically one function with SSE implemented. I can see how there is very little, if any, difference in execution.

WHAT!!!

Bullet is hobbled on the CPU...down with NVID...oh wait :D
 

Scali

Banned
Dec 3, 2004
2,495
0
0
WHAT!!!

Bullet is hobbled on the CPU...down with NVID...oh wait :D

Hey, at least he confirms that recompiling code to SSE does little for performance :)

And it's easy to create conspiracy theories around Bullet as well... after all, Erwin Coumans works for Sony. It must be Sony's attempt to hobble PC gaming, in order to make their PS3 look better!
There's another article in there, David Kanter!
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
I'm not going to affirm or deny any point on either side; there are too many unknowns.

This I don't get.
So you don't even agree that hand-optimizing with SSE intrinsics will yield more performance than just compiling vanilla C++ code and letting the compiler extract and optimize whatever parallelism it can for SSE?
Because that part is not an unknown.

No, I agree with that. I'm not poisoning the well. I'm basically saying that the one routine with intrinsics may not be enough, even if it is the meat of the system. I will say that Intel's vectorizing compiler sometimes works magic with just the flip of a switch.

The conclusion I was inferring was this:

Bullet having enough intrinsics optimization to confirm or deny how much improvement can be accomplished with them. You seem to be steering your argument towards: "I disabled SSE in one part of Bullet and it made very little difference; therefore, SSE doesn't matter that much."

nVidia's Skolones (ARS)
It's fair to say we've got more room to improve on the CPU. But it's not fair to say, in the words of that article, that we're intentionally hobbling the CPU. The game content runs better on a PC than it does on a console, and that has been good enough.

Which I think is a fair statement; "hobbling" is too strong a word, though "neglecting" could be appropriate. As my position has mostly been: it's probably somewhere in the middle.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Bullet having enough intrinsics optimization to confirm or deny how much improvement can be accomplished with them. You seem to be steering your argument towards: "I disabled SSE in one part of Bullet and it made very little difference; therefore, SSE doesn't matter that much."

That's not at all what I was saying... but I think that says more about you than about me.
What I'm saying is that I disabled *all* SSE in Bullet, to do the x87 vs SSE test that Kanter suggested.
If I hadn't disabled the SIMD optimizations, then setting the compiler to emit only non-SSE instructions would still have left the intrinsics compiled as SSE code, and it wouldn't have been a fair test.

So now that we have established that I created a fair SSE vs x87 test case for Bullet... I then pointed out that this case is actually favourable to SSE, since it doesn't rely entirely on the compiler for auto-vectorization, but has some hand-optimizations as well. Assuming PhysX doesn't have such hand-optimizations and has to rely completely on the compiler, the results would be even less spectacular.
How well Bullet is actually optimized for SSE is irrelevant here.

I have never said that SSE doesn't matter... on the contrary. I just said that:
1) Physics is not necessarily a good case for SSE optimizations.
2) Compilers aren't that good at auto-vectorizing code to SSE.
Hence, the 1.5-2x performance improvement that Kanter suggested could come from just recompiling PhysX to SSE is unrealistic.
Bullet actually confirms what I said.

But I said that despite that, there's no reason NOT to use SSE.
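
To make those first two points a bit more concrete, here's a minimal sketch (not code from PhysX or Bullet, just an illustration) of why typical physics math is an awkward fit: a 3-component dot product ends in a horizontal reduction, which early SSE handles clumsily, so even hand-written intrinsics don't automatically run circles around the scalar version, let alone whatever the compiler auto-vectorizes on its own:

Code:
#include <xmmintrin.h>  // SSE1 intrinsics

// Plain C++: the compiler has to discover any parallelism itself.
inline float dotScalar(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Hand-written SSE: one 4-wide multiply, but the horizontal sum at the end
// is still done lane by lane. Assumes both inputs are padded to 4 floats so
// the unaligned 4-wide load is safe.
inline float dotSSE(const float* a, const float* b) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 m  = _mm_mul_ps(va, vb);   // x*x, y*y, z*z, pad*pad
    float t[4];
    _mm_storeu_ps(t, m);
    return t[0] + t[1] + t[2];        // horizontal reduction, scalar again
}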

Which I think is a fair statement; "hobbling" is too strong a word, though "neglecting" could be appropriate. As my position has mostly been: it's probably somewhere in the middle.

That's what I said as well.
Kanter's accusations were way off the mark in three ways:
Firstly, he made the x87 code sound a lot worse than it really is... Yes, it can be optimized, but no, it won't really change the perceived difference between CPU and GPU performance in PhysX.
Secondly, nVidia didn't purposely DISABLE existing SSE code. There is no evidence of nVidia (or Ageia) making PhysX perform worse than it did before the acquisition.
Lastly, PhysX does support multithreading, and various software exists to prove that. nVidia did nothing to prevent it. It just isn't automated (yet), so the developers have to put in some work to make it multithreaded (what else is new).
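
To illustrate what that "some work" looks like in the most generic way (this is not PhysX API code; all of the names below are made up for illustration), the developer has to explicitly push the simulation step onto a worker thread and synchronize with it:

Code:
#include <functional>
#include <thread>
#include <vector>

struct RigidBody { float pos[3]; float vel[3]; };

// Made-up stand-in for a physics SDK's simulation step.
void stepPhysics(std::vector<RigidBody>& bodies, float dt) {
    for (size_t i = 0; i < bodies.size(); ++i)
        for (int k = 0; k < 3; ++k)
            bodies[i].pos[k] += bodies[i].vel[k] * dt;
}

void runFrame(std::vector<RigidBody>& bodies, float dt) {
    // Kick the physics step off on a worker thread...
    std::thread physics(stepPhysics, std::ref(bodies), dt);
    // ...the main thread is free to do other per-frame work here, as long
    // as it doesn't touch `bodies` until the join below.
    physics.join();
}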
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
Hence, the 1.5-2x performance improvement that Kanter suggested could come from just recompiling PhysX to SSE is unrealistic.
Bullet actually confirms what I said.

I still can't find this claim. Who said just recompiling could yield 1.5-2x?
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
I still can't find this claim. Who said just recompiling could yield 1.5-2x?

DKanter "suggested" that:
http://realworldtech.com/page.cfm?ArticleID=RWT070510142143&p=5

"That 2-4X performance gain sounds respectable on paper. In reality though, if the CPU could run 2X faster by using properly vectorized SSE code, the performance difference would drop substantially and in some cases disappear entirely. Unfortunately, it is hard to determine how much performance x87 costs.."

He also wrote this:
"PhysX could take advantage of several cores in a modern CPU. For example, Westmere sports 6 cores, and using two cores for physics could easily yield a 2X performance gain. Combined with the benefits of vectorized SSE over x87, it is easy to see how a proper multi-core implementation using 2-3 cores could match the gains of PhysX on a GPU."

Suggesting a 4x speed-up of CPU physics... and that is why I wondered if Havok/Bullet were "hobbled" too... since I don't see any games surpassing or even matching GPU physics.

And that has spread around... just like it has spread around that his article was a factual technical piece... a lot of things have spread around this topic, not many of them true.

Fanboys won't see the little "if" about the CPU...just like they posted that NVIDIA had artificially crippled CPU physics...otherwise this thread wouldn't be here.

Just look at how this thread changed character from the first posts until now.
It went from argumentum ad ignorantiam to factual posts.
 
Last edited:

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
DKanter "suggested" that:
http://realworldtech.com/page.cfm?ArticleID=RWT070510142143&p=5

"That 2-4X performance gain sounds respectable on paper. In reality though, if the CPU could run 2X faster by using properly vectorized SSE code, the performance difference would drop substantially and in some cases disappear entirely. Unfortunately, it is hard to determine how much performance x87 costs.."

He also wrote this:
"PhysX could take advantage of several cores in a modern CPU. For example, Westmere sports 6 cores, and using two cores for physics could easily yield a 2X performance gain. Combined with the benefits of vectorized SSE over x87, it is easy to see how a proper multi-core implementation using 2-3 cores could match the gains of PhysX on a GPU."

Suggesting a 4x speed-up of CPU physics... and that is why I wondered if Havok/Bullet were "hobbled" too... since I don't see any games surpassing or even matching GPU physics.

He suggested nothing of the sort! It is a very contingent set of statements about what could happen if properly vectorized code were implemented, with a dose of sanity at the end saying it is truly hard to determine what benefit could be gained! Finally, a pinch of multi-threading is added to round out the equation.

And that has spread around... just like it has spread around that his article was a factual technical piece... a lot of things have spread around this topic, not many of them true.

Fanboys won't see the little "if" about the CPU...just like they posted that NVIDIA had artificially crippled CPU physics...otherwise this thread wouldn't be here.

Just look at how this thread changed character from the first posts until now.
It went from argumentum ad ignorantiam to factual posts.

I kept the fanboyism out of it. It would probably be better if you did as well. His article was factual in keeping the speculation to very discrete statements of the hypothetical. You are so quick to condemn and misconstrue what was actually said that it's no wonder these things spread like wildfire.
 
Last edited:

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
He suggested nothing of the sort! It is a very contingent set of statements about what could happen if properly vectorized code were implemented, with a dose of sanity at the end saying it is truly hard to determine what benefit could be gained! Finally, a dose of multi-threading is added to round out the hypothetical.



I kept the fanboyism out of it. It would probably be better if you did as well. His article was factual in keeping the speculation to very discrete statements of the hypothetical. You are so quick to condemn and misconstrue what was actually said that it's no wonder these things spread like wildfire.


So you haven't seen the fallout... from a speculative "article" served up as fact... by an IRL friend of Char-lie... I have seen the fallout in many forums... sadly, the article is not being used as a "presumption", but as evidence that NVIDIA deliberately crippled PhysX on the CPU... facts don't matter.
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
So you haven't seen the fallout... from a speculative "article" served up as fact... by an IRL friend of Char-lie... I have seen the fallout in many forums... sadly, the article is not being used as a "presumption", but as evidence that NVIDIA deliberately crippled PhysX on the CPU... facts don't matter.

This sentence is so full of fanboyism. For one, you're damning the article based on associations, not content. You've shown your predisposition to take it out of context.

BTW, facts do matter, as do semantics. If you weren't going around feeding one side or the other, rational heads would prevail.
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,700
406
126
So you haven't seen the fallout.

Yeah - big fallout!

Thousands upon thousands of threads and posts in every forum!

NVIDIA shares just dropped like a rock and people went out burning NVIDIA buildings!

AMD gained huge market share and NVIDIA gained a reputation for late and hot cards in a day!

Look at this thread - thousands upon thousands of rabid posts against NVIDIA calling for blood!

Get a grip.
 
Last edited:

Skurge

Diamond Member
Aug 17, 2009
5,195
1
71
Yeah - big fallout!

Thousands upon thousands of threads and posts in every forum!

NVIDIA shares just dropped like a rock and people went out burning NVIDIA buildings!

AMD gained huge market share and NVIDIA gained a reputation for late and hot cards in a day!

Look at this thread - thousands upon thousands of rabid posts against NVIDIA calling for blood!

Get a grip.

Call him out on something and he's just gonna ignore it and continue spewing bile like always.
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
More smoke & mirrors... now that it's been hammered out that NVIDIA didn't do an evil thing... S.O.P.... carry on :)
 

GaiaHunter

Diamond Member
Jul 13, 2008
3,700
406
126
More smoke & mirrors... now that it's been hammered out that NVIDIA didn't do an evil thing... S.O.P.... carry on :)

Me, for example, on post #8 of this thread:

I guess it isn't something surprising or that can be seen as unethical.

NVIDIA is interested in having its products working.

But apparently it is just smoke and mirrors.

Of course, for someone at one extremity of a spectrum, even those in the middle look so far away that they could be at the other end.
 

Skurge

Diamond Member
Aug 17, 2009
5,195
1
71
More smoke & mirrors... now that it's been hammered out that NVIDIA didn't do an evil thing... S.O.P.... carry on :)

I wonder who claimed that they did?

Of course, it's not like the claim where you said nV sold more cards when they didn't.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
He suggested nothing of the sort! It is a very contingent set of statements about what could happen if properly vectorized code were implemented, with a dose of sanity at the end saying it is truly hard to determine what benefit could be gained! Finally, a pinch of multi-threading is added to round out the equation.

I actually mailed DKanter; his answer was this, and I quote:
"And actually it is just a flick of the switch to get SSE instead of x87. I'm sure your familiar with GCC:

-mfpmath=sse
-march=prescott

Both of those would radically improve the situation. fastmath would as well."

So that should leave no doubt about whether we misinterpreted the article. This really was what he was saying: he thinks all it takes is a flick of a switch, and you get super-fast vectorized SSE code.
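
For what it's worth, here is my reading of those switches (GCC syntax; these build lines are hypothetical, not taken from any actual PhysX or Bullet makefile):

Code:
// Hypothetical build lines, GCC syntax:
//   g++ -O2 -mfpmath=sse -march=prescott             physics.cpp
//   g++ -O2 -mfpmath=sse -march=prescott -ffast-math physics.cpp
//
// -mfpmath=sse only moves *scalar* float math from the x87 unit to scalar
// SSE instructions; -march=prescott enables SSE/SSE2/SSE3 and tunes for that
// core. Neither flag vectorizes anything by itself. Auto-vectorization is a
// separate pass (-O3 or -ftree-vectorize), and even then it only fires on
// loops it can prove safe; a reduction like the one below also needs
// -ffast-math, because vectorizing it reorders the additions.
float sumScaled(const float* v, int n, float s) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += v[i] * s;
    return acc;
}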

The point of me emailing DKanter was to get him to amend, rectify or follow up on the article, given the new information from nVidia, the recompilation of Bullet, and various other points.
I also pointed him to the original NovodeX code and told him he should profile that as well.
Apparently he has no interest in any of that. The article is still there; nothing has changed.
As a result, I no longer have any reason to consider him trustworthy or well-intentioned.
 
Last edited:

Keysplayr

Elite Member
Jan 16, 2003
21,219
55
91
More smoke & mirrors... now that it's been hammered out that NVIDIA didn't do an evil thing... S.O.P.... carry on :)

You're taking this quite a bit over the top, and personally I might add. Why don't you just simmer down for a while. You're pretty much the only one crying with rage here. Doesn't do any good, especially for the rest of the members. State your piece, with facts (not in an antagonistic way because that will force people to ignore your facts, if any, and focus on YOU) and be done with it. Others can do it, so should you be able to.
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
I actually mailed DKanter; his answer was this, and I quote:
"And actually it is just a flick of the switch to get SSE instead of x87. I'm sure your familiar with GCC:

-mfpmath=sse
-march=prescott

Both of those would radically improve the situation. fastmath would as well."

So that should leave no doubt about whether we misinterpreted the article. This really was what he was saying: he thinks all it takes is a flick of a switch, and you get super-fast vectorized SSE code.

I don't think that's what he said. Regardless, you can't say what he thinks from that quote above.

BTW, the Intel compiler will vectorize some code, as I said above.

Your own analysis gave a nearly 8% improvement in some circumstances, as did mine (VS2008).

My benches

x87 on a q9550 stock
Code:
Results for 3000 fall: 22.069597
Results for 1000 stack: 15.439860
Results for 136 ragdolls: 12.464890
Results for 1000 convex: 17.588723
Results for prim-trimesh: 9.597156
Results for convex-trimesh: 16.227468
Results for raytests: 21.439204

SSE2 on a q9550 stock
Code:
Results for 3000 fall: 20.411149
Results for 1000 stack: 14.613794
Results for 136 ragdolls: 11.288755
Results for 1000 convex: 17.307443
Results for prim-trimesh: 9.465809
Results for convex-trimesh: 15.908466
Results for raytests: 21.341343
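
As a quick sanity check on those numbers (assuming they are timings, so lower is better), the first case works out to roughly an 8% gain:

Code:
#include <cstdio>

int main() {
    double x87  = 22.069597;  // "3000 fall", x87 build
    double sse2 = 20.411149;  // "3000 fall", SSE2 build
    double speedup = x87 / sse2;  // ~1.081
    std::printf("speedup: %.3fx (%.1f%% faster)\n",
                speedup, (speedup - 1.0) * 100.0);
    return 0;
}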
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
I don't think that's what he said. Regardless, you can't say what he thinks from that quote above.
What? Scali said he wrote a mail to him and that's what he answered - it looks like a direct quote to me, and I'm not sure how "Both of those would radically improve the situation" could be misunderstood.
So far I haven't seen any proof for that claim, but some discrepancies - taking standard C code and extracting vectorizable instructions out of it is not easy... you may very well get a 5% improvement here and there, but his "radical improvements" without hand-optimizing the code? I don't think so.
Also, Scali's example wasn't just "change a compiler switch"; from flipping through it, they use at least some custom data structures.

Though without profiling the code I don't think we can make universally valid statements about it - how much time do those functions take globally? An 8% improvement in functions that account for 30% of the runtime is something different than in functions that account for 60%.

And actually, we'd have to compare SSE2 instructions without the IFDEFs; after all, we're interested in how much performance improvement we gain by just flipping some switches. I hope somebody tries that for me; otherwise, I hope those projects work just as well under VS2010 (I had some small problems with linker settings from VC6 projects) and I'll do it myself tomorrow - kinda late today ;)
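
To spell out the two separate knobs being discussed (purely illustrative; the macro and function names below are made up, not Bullet's actual source): the preprocessor guard selects the hand-written intrinsics, while the compiler switch decides whether the remaining plain code is emitted as SSE or x87:

Code:
#if defined(USE_SIMD_PATH)      // hypothetical guard, the "IFDEF" part
    #include <xmmintrin.h>
    // Hand-written intrinsics: present regardless of the /arch setting.
    inline __m128 vecAdd(__m128 a, __m128 b) { return _mm_add_ps(a, b); }
#else
    struct Vec4 { float x, y, z, w; };
    // Plain scalar path: with /arch:SSE2 (VS2008) this compiles to scalar
    // SSE instructions; without /arch it compiles to x87 instructions.
    inline Vec4 vecAdd(const Vec4& a, const Vec4& b) {
        Vec4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
        return r;
    }
#endif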
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
What? Scali said he wrote a mail to him and that's what he answered - it looks like a direct quote to me, and I'm not sure how "Both of those would radically improve the situation" could be misunderstood.

I'm saying: I think it's a leap to go from him saying "radically improve" to "he thinks all it takes is a flick of the switch, and you get super-fast vectorized SSE code." That's not what he said.
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
55
91
I'm saying: I think it's a leap to go from him saying "radically improve" to "he thinks all it takes is a flick of the switch, and you get super-fast vectorized SSE code." That's not what he said.

And at this point, I'm going to say that just about anything quoted from that guy can be interpreted 90 different ways, solving nothing. Am I right?
Since Scali was able to contact this guy, why don't you give it a try as well?

Better still, why don't you and Scali compose an email together and send it to him addressing the most important ingredient in all this, which is..... MAKE YOURSELF CLEAR!! AND UNDERSTOOD!!!

just my 2 cents.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
I don't think that's what he said. Regardless, you can't say what he thinks from that quote above.

Given that he claimed figures of 1.5-2x improvement, I don't think his meaning of 'radically' comes down to 8% (which would be 1.08x).

Your own analysis gave a nearly 8% improvement in some circumstances, as did mine (VS2008).

Exactly. I never denied that SSE would improve the code; I just said it wouldn't improve it by anywhere near 1.5-2x.
Thanks for confirming that my Bullet tests were fair.
An 8% performance gain is not really 'hobbling the CPU'; it will be barely measurable in an actual game. I'm not saying that nVidia shouldn't enable SSE... just that it's not really going to matter in terms of CPU PhysX vs GPU PhysX, so DKanter's claims are pretty much unfounded.