- Mar 11, 2000
- 24,055
- 1,697
- 126
Intel compiler trick. Bad ethics or not?
I wanted to pick a primarily CPU-bound test to run, so I chose
SPEC CPU2000. The test system was a P4 3.2G Extreme Edition with 1 gig
of ram running WIndows XP Pro. First I compiled and ran spec with the
"generic x86 flag" (-QxW), which compiles code to run on any x86
processor. After running the generic version, I recompiled and ran
spec with the "Intel-specific flag" (-QxN) to see what kind of
difference that would make. For most benchmarks, there was not very
much change, but for 181.mcf, there was a win of almost 22% !
Curious as to what sort of optimizations the compiler was doing to
allow the Intel-specific version to run 22% faster, I tried running
the same binary on my friend's computer. His computer, the second test
machine, was an AMD FX51, also with 1 gig of ram, running Windows XP
Pro. First I ran the "generic x86" binaries on the FX51, and then
tried to run the "Intel-only" binaries. The Intel-specific ones
printed out an error message saying that the processor was not
supported and exited. This wasn't very helpful, was it true that only
Intel processors could take advantage of this performance boost?
I started mucking around with a dissassembly of the Intel-specific
binary and found one particular call (proc_init_N) that appeared to be
performing this check. As far as I can tell, this call is supposed to
verify that the CPU supports SSE and SSE2 and it checks the CPUID to
ensure that its an Intel processor. I wrote a quick utility which I
call iccOut, to go through a binary that has been compiled with this
Intel-only flag and remove that check.
Once I ran the binary that was compiled with the Intel-specific flag
(-QxN) through iccOut, it was able to run on the FX51. Much to my
surprise, it ran fine and did not miscompare. On top of that, it got
the same 22% performance boost that I saw on the Pentium4 with an
actual Intel processor. This is very interesting to me, since it
appears that in fact no Intel-specific optimization has been done if
the AMD processor is also capable to taking advantage of these same
optimizations.
I wanted to pick a primarily CPU-bound test to run, so I chose
SPEC CPU2000. The test system was a P4 3.2G Extreme Edition with 1 gig
of ram running WIndows XP Pro. First I compiled and ran spec with the
"generic x86 flag" (-QxW), which compiles code to run on any x86
processor. After running the generic version, I recompiled and ran
spec with the "Intel-specific flag" (-QxN) to see what kind of
difference that would make. For most benchmarks, there was not very
much change, but for 181.mcf, there was a win of almost 22% !
Curious as to what sort of optimizations the compiler was doing to
allow the Intel-specific version to run 22% faster, I tried running
the same binary on my friend's computer. His computer, the second test
machine, was an AMD FX51, also with 1 gig of ram, running Windows XP
Pro. First I ran the "generic x86" binaries on the FX51, and then
tried to run the "Intel-only" binaries. The Intel-specific ones
printed out an error message saying that the processor was not
supported and exited. This wasn't very helpful, was it true that only
Intel processors could take advantage of this performance boost?
I started mucking around with a dissassembly of the Intel-specific
binary and found one particular call (proc_init_N) that appeared to be
performing this check. As far as I can tell, this call is supposed to
verify that the CPU supports SSE and SSE2 and it checks the CPUID to
ensure that its an Intel processor. I wrote a quick utility which I
call iccOut, to go through a binary that has been compiled with this
Intel-only flag and remove that check.
Once I ran the binary that was compiled with the Intel-specific flag
(-QxN) through iccOut, it was able to run on the FX51. Much to my
surprise, it ran fine and did not miscompare. On top of that, it got
the same 22% performance boost that I saw on the Pentium4 with an
actual Intel processor. This is very interesting to me, since it
appears that in fact no Intel-specific optimization has been done if
the AMD processor is also capable to taking advantage of these same
optimizations.