GaiaHunter
Diamond Member
- Jul 13, 2008
No, I'm saying that in this particular case, getting a performance boost from SSE is far from trivial.
The Bullet library actually uses some of the SSE intrinsics from VS2008 as well, so it has received at least a bit of hand-optimization.
As I said before in the thread, if the computational part is not the bottleneck in the first place, you're not going to gain much by optimizing that part.
I think this small Bullet-test at least shows two things:
1) David Kanter was jumping to conclusions with his figures of 1.5-2x speedup. It's not that simple.
2) nVidia was correct in stating that some things are just faster with x87 than with SSE (just like the example I gave, the dot product).
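To make the dot product example concrete, here is a rough sketch of why a single dot product is an awkward fit for packed SSE. The function names `dot3_scalar` and `dot3_packed` are made up for illustration and are not taken from Bullet or PhysX, and the packed version assumes the vectors are padded to four floats with the last one set to 0. The multiply is one instruction, but summing the lanes afterwards takes extra shuffle and add steps that the scalar version never needs.

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Plain scalar dot product: three multiplies and two adds.
float dot3_scalar(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Packed version (hypothetical helper). Assumes a and b each point to
// four floats, with the fourth component set to 0.
float dot3_packed(const float* a, const float* b) {
    assert(a != nullptr && b != nullptr);
#if HAVE_SSE
    // One packed multiply gives the lane products p0..p3.
    __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    // Horizontal sum, step 1: lane0 = p0 + p2, lane1 = p1 + p3.
    __m128 s = _mm_add_ps(p, _mm_movehl_ps(p, p));
    // Horizontal sum, step 2: lane0 += lane1.
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1)));
    return _mm_cvtss_f32(s);
#else
    // Fallback so the sketch still builds on targets without SSE.
    return dot3_scalar(a, b);
#endif
}
```

Whether the packed version ends up faster or slower depends on the CPU and the surrounding code, so this only shows where the extra work comes from, not a measured result.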
I don't know enough on the technical level to dispute or refute any of this, so I'll just ask a couple of questions, if you don't mind.
Could it be that we don't see any difference because Bullet was already well optimized in the first place? Or maybe it isn't optimized enough?
Is it possible that you didn't see much difference because it isn't actually a game you are running?
The only thing we have running on a GPU so far is the Cuda demo released with Bullet 2.74, and that performs better than a CPU, yes.
And can you give an estimate of how much faster it is (I seriously don't know)? Is it like 20% faster or 2x faster or 4x faster?
I'm not saying that guy proved anything. For that he would have to recompile it for SSE and see whether it was then faster. But by the same token, I don't think you proved him conclusively wrong either.
I'll do some research and then chime in.
EDIT: I guess Schmide raised an interesting point:
If you recompile with just the SSE flag, it's not going to swizzle and pack the FP operations into a vector, so yeah, it's going to be similar. You're basically saving an FXCH and an FSTP.
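A minimal sketch of the difference Schmide is describing, as far as I understand it. All names here (`dot_batch_scalar`, `dot_batch_packed`) are hypothetical and not from Bullet or PhysX, and the packed version assumes the data has already been rearranged ("swizzled") into separate x, y and z arrays, with a count that is a multiple of 4. The first function is the kind of one-float-at-a-time code that a compiler switch alone would translate into scalar SSE instructions. The second does four dot products per iteration, which only works because the data layout was changed by hand.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// One dot product per iteration, one float per arithmetic operation.
void dot_batch_scalar(const float* ax, const float* ay, const float* az,
                      const float* bx, const float* by, const float* bz,
                      float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = ax[i] * bx[i] + ay[i] * by[i] + az[i] * bz[i];
}

// Four dot products per iteration using packed SSE (hypothetical helper).
// Assumes structure-of-arrays input and n divisible by 4.
void dot_batch_packed(const float* ax, const float* ay, const float* az,
                      const float* bx, const float* by, const float* bz,
                      float* out, std::size_t n) {
    assert(n % 4 == 0);
#if HAVE_SSE
    for (std::size_t i = 0; i < n; i += 4) {
        // Each lane holds a different vector pair, so no shuffles are needed.
        __m128 r = _mm_mul_ps(_mm_loadu_ps(ax + i), _mm_loadu_ps(bx + i));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(ay + i), _mm_loadu_ps(by + i)));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(az + i), _mm_loadu_ps(bz + i)));
        _mm_storeu_ps(out + i, r);
    }
#else
    // Fallback so the sketch still builds on targets without SSE.
    dot_batch_scalar(ax, ay, az, bx, by, bz, out, n);
#endif
}
```

Newer compilers can sometimes vectorize a simple loop like the scalar one on their own, but that depends on the compiler and on the data already being laid out like this, so it doesn't change the basic point.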
