Archive for the ‘x87’ tag
PhysX: an easy target?
An interesting technical article, “PhysX: An easy target?”, was posted by user Bohemiq Scali on Windows Live blogs.
The first part of the entry is a brief overview of the PhysX vs. AMD physics solutions topic, similar to our “AMD and PhysX: History of the Problem” article, and can be skimmed.
The second part, however, focuses on the recent PhysX and x87 theme and on the original “PhysX87: Software Deficiency” article by David Kanter. Kanter’s original statement was that “SSE can easily run 1.3-2X faster than similar x87 code”, and this is where Scali gives him a full dose of criticism:
Kanter then makes claims about the gains that can be had from converting the code to SSE. He claims that SSE could quadruple performance in theory, and in reality a boost of more than 2x would be possible. Kanter claims that a modern optimizing compiler can easily vectorize the code for SSE automatically, and such gains could be had from just a recompile.
So nVidia is just leaving all this performance on the table. What’s more, if PhysX would indeed be 2-4 times faster on CPU, it would actually be a threat to GPU-accelerated physics. Kanter claims that PhysX is hobbled on the CPU, and that nVidia is deliberately doing this to make GPU physics look good.
while in fact the “magic” powers of SSE were somewhat exaggerated, since recent tests (#1; #2) with the Bullet physics engine, which is without doubt SSE-optimized, have shown that
In synthetic tests, there is about 8% to be gained from recompiling. This is nowhere near the 2-4x figure that Kanter was using. In fact, 8% faster PhysX processing would mean even less than 8% higher framerates in games, since PhysX is not the only CPU-intensive task in a game.
Perhaps the net gain in framerate would be closer to 3-4%, depending on the game. In other words, recompiling PhysX with SSE would not make CPUs threaten GPU physics. Not even close. The difference would be lost in the margin of error, most likely.
but in spite of this
Kanter’s article, wrong as it may be, is linked on many news sites and forums all over the web, and many discussions ensue. Most people buy into Kanter’s article, and some sites make even more bold claims than Kanter himself, referring to Kanter’s article as ‘absolute proof’ of nVidia’s evil actions. This is exactly what AMD needs.
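The step from "8% faster PhysX" to "3-4% higher framerate" in the quoted passage is essentially Amdahl's law: only the physics share of each frame gets faster. Here is a minimal sketch of that arithmetic. The function names and the frame-time fractions are illustrative assumptions, not measurements from any of the cited tests.

```python
def overall_speedup(physics_fraction, physics_speedup):
    """Amdahl's law: whole-frame speedup when only the physics part gets faster.

    physics_fraction: share of frame time spent in physics (0.0 to 1.0), assumed.
    physics_speedup:  how much faster the physics code runs (1.08 means 8% faster).
    """
    if not 0.0 <= physics_fraction <= 1.0:
        raise ValueError("physics_fraction must be between 0 and 1")
    if physics_speedup <= 0.0:
        raise ValueError("physics_speedup must be positive")
    return 1.0 / ((1.0 - physics_fraction) + physics_fraction / physics_speedup)


def framerate_gain_percent(physics_fraction, physics_speedup):
    """Framerate gain in percent, assuming framerate scales with 1 / frame time."""
    return (overall_speedup(physics_fraction, physics_speedup) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical physics shares of frame time; real games will differ.
    for fraction in (0.25, 0.4, 0.5, 1.0):
        # 1.08 is the synthetic-test figure quoted above; 2.0 is the low end
        # of the 2-4x range attributed to Kanter.
        for speedup in (1.08, 2.0):
            gain = framerate_gain_percent(fraction, speedup)
            print(f"physics share {fraction:.0%}, physics {speedup}x faster: "
                  f"framerate +{gain:.1f}%")
```

Under these assumptions, an 8% physics speedup yields a 3-4% framerate gain only if physics takes roughly 40-50% of the frame time. With a smaller physics share the gain shrinks further.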
Summary:
You may find Scali’s article biased (AMD conspiracy theory and so on), but it is worth a read, as it makes sense. Give it a look and share your thoughts.
Also, don’t forget that PhysX SDK 2.8.4 already includes an SSE2 compiler option and should be included in the next release of FluidMark, so we hope to perform some tests soon.
PhysX: Fright or Delight
An interesting technical article, “PhysX: Lust, Last oder Frust?”, appeared on Tom’sHardware.de today. Its purpose is to revisit recent events in the area of GPU PhysX (and CPU execution of PhysX effects), so basic knowledge of the topic is required.
Update: English version available
The x87 vs. SSE question is updated with new tests of the CPU instructions used by Mafia II:
(note: the graph is named "vtune_metro2033", so it may contain some mistakes)

As the new PhysX SDK 2.8.4 with the SSE2 compiler option is still in beta, and Mafia II is based on SDK 2.8.3, the game still relies on the x87 instruction set.
The author correctly remarks that moving from x87 to SSE won’t magically double performance, as several websites are promising; a gain of 10-20% or even less is more likely in real applications.
PhysX: x87 and SSE
In his “PhysX87: Software Deficiency” article, David Kanter of RealWorldTech.com hypothesized that the slow execution of PhysX content on the CPU stems from the fact that the PhysX SDK relies mostly on x87 rather than on the faster SSE instruction set.
“On modern CPUs, SSE can easily run 1.3-2X faster than similar x87 code” – stated Kanter.
However, TGDaily has managed to get comments from Bryan Del Rizzo, an Nvidia spokesperson:
[And although] our SDK does [include] some SSE code, we found [that] non-SSE code can result in higher performance than SSE in many situations. [Nevertheless], we will continue to use SSE and plan to enable it by default in future releases. That being said, not all developers want SSE enabled by default, because they still want support for older CPUs for their SW versions.
Update: official response from Nvidia – We’re not hobbling CPU PhysX
Update #2: some more Nvidia statements in this Ars Technica article
Update #3: and more at Hothardware.com article “NVIDIA Sheds Light On Lack Of PhysX CPU Optimizations”
But let’s get back to the original article. According to David, the sole reason for the PhysX SDK to rely on outdated x87 instructions is to make “Nvidia GPUs looks a lot better than the CPU”. This idea was picked up by other websites, like TechReport.com
The PhysX logo is intended as a selling point for games taking full advantage of Nvidia hardware, but it now may take on a stronger meaning: intentionally slow on everything else.
and SemiAccurate
In the end, there is one thing that is unquestionably clear, if you remove the de-optimizations that Nvidia inflicts only on the PC CPU version of PhysX, the GPU version would unquestionably be slower than a modern CPU.
Unfortunately, these authors miss a few vital points. The PhysX SDK is used in many games that run physics on the CPU, and the level of physics in those titles is easily comparable to that in games based on other, “non-crippled” physics engines such as Havok. Nor are there any games that offer content similar to GPU PhysX effects while running on the CPU at a stable framerate.
And, most importantly, the GPU can accelerate only a few parts of the PhysX code; rigid bodies, joints, raycasts, force fields, broadphase, etc. rely purely on the CPU. So what would be the reason not to optimize those to their full potential, making the PhysX SDK more attractive to developers (and thus increasing the number of games with GPU PhysX support)? Something tells us that the “just to make GPUs look better than the CPU” explanation is not so obvious.
And what do you think? Tell us in the comments.












