PhysX: x87 and SSE
David Kanter of RealWorldTech.com, in his “PhysX87: Software Deficiency” article, hypothesized that PhysX content runs slowly on the CPU because the PhysX SDK relies mostly on x87 instructions rather than the faster SSE instruction set.
“On modern CPUs, SSE can easily run 1.3-2X faster than similar x87 code” – stated Kanter.
However, TGDaily managed to obtain a comment from Bryan Del Rizzo, an Nvidia spokesperson:
“[And although] our SDK does [include] some SSE code, we found [that] non-SSE code can result in higher performance than SSE in many situations. [Nevertheless], we will continue to use SSE and plan to enable it by default in future releases. That being said, not all developers want SSE enabled by default, because they still want support for older CPUs for their SW versions.”
Update: official response from Nvidia – We’re not hobbling CPU PhysX
Update #2: some more Nvidia statements in this Ars Technica article
Update #3: and more at Hothardware.com article “NVIDIA Sheds Light On Lack Of PhysX CPU Optimizations”
But let’s get back to the original article. According to David, the sole reason for the PhysX SDK to rely on outdated x87 instructions is to make “Nvidia GPUs looks a lot better than the CPU”. This idea was picked up by other websites, like TechReport.com:
“The PhysX logo is intended as a selling point for games taking full advantage of Nvidia hardware, but it now may take on a stronger meaning: intentionally slow on everything else.”
and Semi Accurate:
“In the end, there is one thing that is unquestionably clear, if you remove the de-optimizations that Nvidia inflicts only on the PC CPU version of PhysX, the GPU version would unquestionably be slower than a modern CPU.”
Unfortunately, these authors are missing a few vital points. The PhysX SDK is used in many games that run physics on the CPU, and the level of physics in those titles is easily comparable to that in games based on other, “non-crippled” physics engines such as Havok. Nor are there any games that offer content similar to GPU PhysX effects while running on the CPU at a stable framerate.
Most importantly, the GPU can accelerate only a few parts of the PhysX code: rigid bodies, joints, raycasts, forcefields, broadphase and so on rely purely on the CPU. So why not optimize those to their full potential, making the PhysX SDK more attractive to developers (and thus increasing the number of games with GPU PhysX support)? Something tells us that “just to make GPUs look better than the CPU” is not such an obvious reason.
And what do you think? Tell us in the comments.
I think SemiAccurate is taking it a bit too far. What basis does he use to conclude that the GPU is weaker than the CPU? Even if they did cripple the CPU version, it could have been just so it doesn’t come close, not because the CPU would win!
P10-17000
8 Jul 10 at 1:24 pm
There are two sides to this: hand-optimized SSE, and switching the compiler to generate SSE math code instead of x87.
As someone who has tried both approaches with PhysX, I can say the hand-optimized approach is not worth it: the gains do not justify the effort (unlike with some similar technologies such as VMX128). Modern CPUs are very efficient at executing even poor code out of order.
However, switching the compiler to produce SSE is certainly a good idea. The CPU support argument seems like BS. Plus I was under the impression that the x64 build will use compiler SSE code anyway.
Having said that, I still don’t see a CPU beating a similar-generation GPU for anything but the most complex (difficult to parallelize) rigid body stuff.
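To make the two approaches from this comment concrete, here is a minimal C++ sketch. It is not PhysX code: the function names `dot_scalar` and `dot_sse` and the `SKETCH_HAS_SSE` macro are made up for illustration, and a dot product merely stands in for "math code". The first function is ordinary scalar code, where the choice between x87 and SSE is left to compiler options; the second is hand-written with SSE intrinsics and falls back to the scalar version when SSE is not available.

```cpp
// Sketch only: names are hypothetical and not part of the PhysX SDK.
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Approach 1: plain scalar code. Whether the compiler emits x87 or SSE
// instructions for it is decided by build options, e.g. on 32-bit x86:
//   GCC/Clang: -mfpmath=387   versus   -msse2 -mfpmath=sse
//   MSVC:      /arch:IA32     versus   /arch:SSE2
// 64-bit x86 builds use SSE for scalar float math by default.
float dot_scalar(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Approach 2: hand-written SSE, processing four floats per instruction.
float dot_sse(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
#if SKETCH_HAS_SSE
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned loads, no alignment assumed
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    for (; i < n; ++i) {  // leftover elements when n is not a multiple of 4
        sum += a[i] * b[i];
    }
    return sum;
#else
    return dot_scalar(a, b, n);  // no SSE on this target
#endif
}
```

Note that the two functions add the products in a different order, so for general floating-point input their results can differ in the last bits. Which version is faster depends on the compiler, the CPU and the surrounding code, so any claim about gains should be checked by measuring.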
David Black
8 Jul 10 at 3:10 pm
I should probably also add that the lack of SSE doesn’t mean that Nvidia are not deliberately creating a slow CPU version; there are many other (more effective) ways :-)
David Black
8 Jul 10 at 3:14 pm
>Plus I was under the impression that the x64 build will use compiler SSE code anyway
Well, neither of the apps tested above uses 2.8.3.
Zogrim
8 Jul 10 at 3:17 pm
My 2 cents..
>nor there are any games, that can offer content, similar to GPU PhysX effects, but running on CPU with stable framerate.
Not sure about smoke and fluids, but I must say that CryEngine2 delivered some great physics, including nice real-time destruction of buildings, without a serious impact on fps. I must say it was the most impressive and real-like feel for physics I ever saw, unlike CPU/GPU PhysX.
I’m convinced that any serious developer is able to code a great physics engine for his project. Because of the layered nature of PhysX, any custom engine can do the job more effectively if created and integrated by the developer himself. It depends on his talent and experience, but the results may be more impressive than any PhysX game.
I can easily accept what they said about “not hobbling CPU performance”, but again – they put zero effort into better CPU support, didn’t they?
>so what is the reason not to optimize those at the full potential, to make PhysX SDK more attractive for developers (and thus increase number of games with GPU PhysX support) ?!
And that’s why they made those promises regarding 3.x release? Will it really affect developers’ opinion? Maybe it will increase fps for CPU mode by 2-5 frames… or more? Also I’m pretty sure the updated engine won’t change anything for existing released projects.
If they really want to make PhysX more attractive, why don’t they work further on the engine itself (it hasn’t changed much since Ageia times, right?)? Why are they not porting it to open computing engines like OpenCL, not helping developers with CPU mode, but adding more and more GPU-only content?
We can speculate further, but they will never answer these questions. All their defensive responses (marketing again) are addressed only to those articles/interviews, to make PhysX look better imo.
GenL
8 Jul 10 at 9:04 pm
>I must say it was the most impressive and real-like feel for physics I ever saw, unlike CPU/GPU PhysX
Destruction ? Easy:
http://www.youtube.com/watch?v=fa_S2yA_YaA
Pure CPU PhysX, and the simulation is more complex than in Crysis.
>any custom engine can do the job more effectively if created and integrated by the developer himself
Developers must focus on gameplay and content, not on engine and tools creation. That is why all middleware (from UE3 to Havok) exists.
>And that’s why they made those promises regarding 3.x release?
I dunno, SDK 3.0 has been in development for more than 2 years, and even I know only a few details. I believe the multithreading feature was planned from the beginning.
>not porting it to open computing engines like OpenCL, not helping developers with CPU mode, but adding more and more GPU-only content?
Not sure about OpenCL, but SDK 3.0 will contain many changes.
Zogrim
8 Jul 10 at 9:21 pm
>Destruction ? Easy:
>http://www.youtube.com/watch?v=fa_S2yA_YaA
Looks complex, but the simulation doesn’t look anywhere near realistic to me… In fact, I think it is far too complex for such a low-detail scene.
>Developers must focus on gameplay and content, not on engine and tools creation.
Very wrong! Coding is an art too; it is how the game plays and feels, the very basis of any gameplay and content. Middleware is mostly for consoles, where developers are lazy and don’t care about technical quality.
Although Havok at least delivers some basic non-ugly physics without a performance hit, I’d still prefer to see a well-made custom engine.
If a developer asked me, I’d recommend delaying the release and developing an effective physics solution, even spending another 6 months, but doing it right, so it will look and feel really good. Impressive things don’t always require the heaviest simulations, that’s it.
GenL
8 Jul 10 at 10:01 pm
I think maybe PhysX 3.0 will raise the framerate for ATi from 5-8fps to a more playable 16-18fps with multithreading, but probably only for games that don’t use 2.8.3 or below
P10-17000
9 Jul 10 at 4:37 am
Knowing how far they can go with adding PhysX content, they are likely to insert enough stuff to promote their latest, faster GPU series, thus leaving it slower on their older GPUs. And this may eliminate any possible fps increase for CPU mode in most games…
GenL
9 Jul 10 at 8:05 am
BTW, very good technical post at Beyond3D Forum
Zogrim
12 Jul 10 at 6:11 pm
Hmm, I thought AndyTX was an Nvidian… but maybe I am confusing him with someone else (or it was just Nvidia’s presentation of VSM that confused me…)
I imagine Bullet did quite well here for SSE, since the code is probably fairly amenable to compiler optimization and there is only really physics going on (hence the cache is not overloaded).
David Black
12 Jul 10 at 8:08 pm