
Povray Recompilation Project for Pentium 4s

pm

This subject came up near the end of this thread here. There was a discussion between Remnant and fkloster about Pentium 4 performance on the freeware/open-source raytracing program POVRay. So I volunteered to grab the Intel v5 C++ Compiler with SSE2 optimizations and recompile POVRay for everyone to run. This would be an interesting experiment for two reasons: it could give an indication of the benefits of SSE2 in a real-world application, and it would show how much recompilation benefits real-world applications on the Pentium 4.

Unfortunately, I discovered that the Intel C++ Compiler isn't a standalone program that I can run; it's actually a plug-in for Visual C++ 6.0, and without Visual C++ 6 you can't use it. I'm actually a Delphi programmer (which is Pascal, so that won't help), and my copy of Visual C++ is v4, which is pretty out of date. It didn't work.

The Intel v5 C++ Compiler is available here as a free 30-day trial download from the Intel website. Remnant has Visual C++ 6, so he's going to recompile it. I think I have some hope of getting my management to approve a Visual C++ license for myself, but this will take a day or two. If anyone else has it and wants to help, feel free.

POVRay source code is here.
Eval copy of Intel's C++ Compiler is here.

Note: I hope this thread won't turn into some sort of Pentium 4 vs. Athlon debate. This discussion is largely irrelevant to that debate; this is about recompiling an application to enable SSE2.
 
pm, care to recompile the RC5 core for the P4? We would really be interested to see if it can improve on its poor RC5 showing in Anand's P4 review. We believe that SSE2 could significantly improve the performance of the RC5 core when running on the P4.
 
dasm! you shelled out $399 for that???

I was waiting for the 30-day eval Intel keeps promising. (And then hacking it to NOT expire after 30 days 😉)

Hey, go to distributed.net and recompile their RC5 cores! Please!!!!!
 
The RC5 recompile effort would be substantially harder though, or at least I think it would be. For a start, they don't give out the full source code. Then there are more issues: they also use a lot of assembler, and the networking code will require additional libraries. I'm pretty sure it would be a more difficult effort.

I guess we'll see when we get POVRay recompiled, but I'd imagine that time will keep me from attempting the RC5 recompile effort (my wife tends to get irritated with me if I spend 90% of my time at home hunkered in the basement cursing at my computer). Plus there's the fact that I don't have a Pentium 4 to play with. I'm kinda an RC5 addict, so if I had a Pentium 4 then I'd probably want to tweak RC5 to work faster on it. Since I don't, I probably won't.
 
Train, I work for Intel. I didn't have to spend $400. 🙂

Plus, there currently is a 30 day trial for Windows. I haven't tried it myself, but it has a link to it at the Intel site - I posted the link in my first posting up above. The Linux eval is scheduled for a few months from now. What isn't working about the eval?
 
PM, they give out enough code to get a client working on any machine. It won't be a working Dnet client, but it's enough to benchmark, and if it's faster than Dnet's client, they will integrate the new core into their existing client.
 
No, PM, go to the page: it says "Coming Soon" and that same message has been up for a LONG time. I check every day 😉
 
Aight train, quit neffing 🙂

I want to know about the recompiled code, train. Let me know.
 
Ok, I threw up an HTML page that has the binaries and some instructions on running the benchmark.

Povray Optimized Binaries

P4 owners, if you can, try both the P3- and P4-optimized versions, so we have a baseline for how much the P4 optimizations helped.

Folks with other systems are very welcome to post results with the P3-optimized binary also, to help give an idea of where the different performance levels fall.
 
Well, I just ran the benchmarks on my P4 rig and came up with these numbers:

P3 @ 1.7GHz (122 x 14): 79 seconds (1 minute 19 seconds)
P4 @ 1.7GHz (122 x 14): 73 seconds (1 minute 13 seconds)
P4 @ 1.4GHz (stock): 115 seconds (1 minute 55 seconds)
P4 @ 1568MHz (112 x 14, my everyday setting): 80 seconds (1 minute 20 seconds)


I must say thanks, Remnant. Looks like the P4 will have a great future 🙂

 
Pentium III 933MHz
256MB CAS2 PC133

Original: 2m 42s - 162s
P3 Optimized: 2m 05s - 125s

Recompilation definitely makes a difference for Pentium IIIs, Athlons and Pentium 4s alike. It's a wonder that the guys who run the povray.org site don't use a better compiler.

Thanks very much for recompiling it, remnant.

I actually managed to get a license for Visual C++ 6.0 too. Guess it's a little late. 🙂
 
What does this run like on a Duron/Athlon? I will try that tomorrow. I have a Celeron 533@800 and a Duron 650@850 🙂 I may need help running the benchmark.
 
I ran the P3 binary on my Duron 750/PC100 (128MB CAS3) and I got 133 seconds. This is down from 186 seconds using the stock version.

My rig Specs:
Duron 750MHz
128MB PC100 CAS3
MSI K7T Pro 2-A
Nvidia TNT
SB Live! Value
some other unimportant stuff..
WinMe
 
Just ran the test on my Duron 650@900, 192MB RAM at 133MHz CAS3, Soltek SL-75KAV.

Original - 2:38 or 158 seconds
P3 opt - 1:53 or 113 seconds
 
We have a limited number of data points, but there is one interesting statistic that I noticed. Taking the before and after times and calculating the percentage improvement in time for each processor, we get:

Percentage improvement (by processor):

500MHz Athlon 27%
750MHz Duron 28%
900MHz Duron 28%
933MHz Pentium III 22%
1.7GHz Pentium 4 35%

The AMD processors improve by a roughly similar amount (~28%), while the Pentium III and Pentium 4 improve by very different amounts (22% vs. 35%).
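The percentages above are the reduction in run time, (before - after) / before. Here is a minimal Python sketch reproducing them from the original and P3-optimized times posted in this thread; the 500MHz Athlon and 1.7GHz Pentium 4 rows are left out because their before/after times aren't quoted here. The dictionary and function names are illustrative only.

```python
import math

# Original vs. P3-optimized POVRay times (seconds), as posted in this thread.
results = {
    "750MHz Duron": (186, 133),
    "900MHz Duron (650@900)": (158, 113),
    "933MHz Pentium III": (162, 125),
}


def pct_improvement(before: float, after: float) -> float:
    """Percentage of the original run time removed by the recompile."""
    return (before - after) / before * 100.0


for cpu, (before, after) in results.items():
    pct = pct_improvement(before, after)
    # Show the exact value and the value with the fraction dropped.
    print(f"{cpu}: {pct:.1f}% (truncated: {math.floor(pct)}%)")
```

The posted figures look truncated rather than rounded (the Pentium III works out to 22.8% but is listed as 22%). Note also that a 28% cut in run time is a speedup of only about 1.4x (186 / 133 ≈ 1.40), so time-reduction and speedup figures aren't interchangeable.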
 
Just for comparison I asked a friend of mine to run the test on his Celeron 500.
Original - 5:53 or 353 seconds
P3 optimized - 5:04 or 304 seconds
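Applying the same before/after formula to the Celeron 500 gives a noticeably smaller gain than the 22-28% seen for the other chips:

```latex
\frac{353 - 304}{353} \times 100\% = \frac{49}{353} \times 100\% \approx 13.9\%
```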
 
HOLY MOTHER OF SSE2 OPTIMIZATION!

1500MHz + non-SSE2 + pawntest = 178 seconds
1500MHz + SSE2 + pawntest = 86 seconds
1800MHz + non-SSE2 + pawntest = 149 seconds
1800MHz + SSE2 + pawntest = 70 seconds


That's what I'm talking about... much better!! Thanks, pm & remnant2, for your hard work; it makes me feel much better about my future 3D rendering performance!!!
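fkloster's numbers amount to roughly halving the render time. Below is a small Python sketch, with hypothetical helper names, computing both the speedup and the percentage reduction. Keep in mind that, per fkloster's later reply in this thread, at least the 149-second figure came from the original, non-recompiled build, so these ratios mix the effect of the new compiler with the effect of SSE2 rather than isolating SSE2.

```python
# fkloster's pawntest times in seconds, keyed by clock speed in MHz.
runs = {
    1500: (178, 86),  # (non-SSE2 seconds, SSE2 seconds)
    1800: (149, 70),
}


def speedup(old_s: float, new_s: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_s / new_s


def time_reduction_pct(old_s: float, new_s: float) -> float:
    """Percentage of the original run time that was eliminated."""
    return (old_s - new_s) / old_s * 100.0


for mhz, (old_s, new_s) in runs.items():
    print(f"{mhz}MHz: {speedup(old_s, new_s):.2f}x faster, "
          f"{time_reduction_pct(old_s, new_s):.0f}% less time")
```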
 
fkloster,

Are you sure about your non-SSE2 numbers? NOS440 reported 115 seconds @ 1700MHz, but yours are 149 seconds @ 1800MHz... The SSE2 numbers seem in line with his, though: 70 seconds vs. 74 seconds for him @ 1700MHz.
 
> OS440 reported 115 seconds @ 1700MHz, but yours are 149 seconds @ 1800MHz

Speculation: Perhaps he got 1:49, which is 109 seconds.

This would make sense: given 115 seconds at 1700MHz, a P4 at 1800MHz would be expected to score around 108.61111 seconds if the test scaled perfectly linearly with clock speed, and perhaps 109 seconds if it scaled slightly less than linearly.

-JC

PS: Hi, pm! ^_^
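JC's estimate is simple inverse scaling: t(1800) = t(1700) x 1700 / 1800. Here is a minimal Python sketch of that check, assuming run time is inversely proportional to clock speed (an idealization; as JC says, real results tend to fall slightly short of linear). The function name is hypothetical.

```python
def scaled_time(base_seconds: float, base_mhz: float, new_mhz: float) -> float:
    """Predicted seconds at new_mhz, assuming perfectly linear clock scaling."""
    return base_seconds * base_mhz / new_mhz


predicted = scaled_time(115, 1700, 1800)
print(f"Predicted @ 1800MHz: {predicted:.5f} s")  # 108.61111 s
print(f"Reported 149 s is {149 / predicted:.2f}x the prediction")
```

The reported 149 seconds comes out at roughly 1.37x the linear prediction, which is why it stood out against NOS440's 115 seconds.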
 
Ok, updated with the new results in the mini-results section.

Fkloster's on top with his 1.8GHz monster. Anyone with Thunderbirds at 1.2GHz or higher want to add some data points? I'd love to see how they compare against the P4 running optimized code.

I'm not sure whether I'm shocked by the recompile or not. Clearly the compiler used in the first place was ancient or not optimizing, because all CPUs got huge gains from being compiled with IntelC, as PM noted. Still, the P4 definitely gained the most from running its custom optimizations. I have to say that I have more respect for it now, seeing as once I settled on the optimization flags, it took almost no time at all to recompile for the P4.

So the big question becomes: how popular is IntelC? Clearly the opportunity for optimization is there, but how many authors use it? This is the second time we've seen an app post huge gains on all CPUs after a recompile (the first was the FlasKMPEG app that Tom talked about a while ago).

Clearly, a lot of people are using old compilers!

Edit: In case you missed it the first time, here is the link to the recompiled binaries and a table of results.
 
Simple.

<< OS440 reported 115 seconds @ 1700MHz >>

He was using the recompiled version. I had the opportunity to test the original, non-recompiled version about a week ago and got 149 seconds.
 