Intel compiler flag prevents AMD chips from running for no reason?

Eug

Lifer
Mar 11, 2000
24,055
1,697
126
Intel compiler trick. Bad ethics or not?

I wanted to pick a primarily CPU-bound test to run, so I chose
SPEC CPU2000. The test system was a P4 3.2G Extreme Edition with 1 gig
of RAM running Windows XP Pro. First I compiled and ran SPEC with the
"generic x86 flag" (-QxW), which compiles code to run on any x86
processor. After running the generic version, I recompiled and ran
SPEC with the "Intel-specific flag" (-QxN) to see what kind of
difference that would make. For most benchmarks, there was not very
much change, but for 181.mcf, there was a win of almost 22%!

Curious as to what sort of optimizations the compiler was doing to
allow the Intel-specific version to run 22% faster, I tried running
the same binary on my friend's computer. His computer, the second test
machine, was an AMD FX51, also with 1 gig of RAM, running Windows XP
Pro. First I ran the "generic x86" binaries on the FX51, and then
tried to run the "Intel-only" binaries. The Intel-specific ones
printed out an error message saying that the processor was not
supported and exited. This wasn't very helpful. Was it true that only
Intel processors could take advantage of this performance boost?

I started mucking around with a disassembly of the Intel-specific
binary and found one particular call (proc_init_N) that appeared to be
performing this check. As far as I can tell, this call is supposed to
verify that the CPU supports SSE and SSE2, and it checks the CPUID to
ensure that it's an Intel processor. I wrote a quick utility, which I
call iccOut, to go through a binary that has been compiled with this
Intel-only flag and remove that check.
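
To make the difference between the two tests concrete, here is a rough model of a vendor check versus a plain feature check. It is only a sketch: the function names are made up for illustration, the register values are passed in as plain integers rather than read from hardware, and vendor_and_feature_check is a hypothetical stand-in, not the actual logic of proc_init_N. What it does rely on is how CPUID works: leaf 0 returns the vendor ID in EBX, EDX, ECX, and leaf 1 reports SSE in EDX bit 25 and SSE2 in EDX bit 26.

```python
import struct

# CPUID leaf 1 reports feature flags in EDX; bit 25 is SSE, bit 26 is SSE2.
SSE_BIT = 1 << 25
SSE2_BIT = 1 << 26


def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Decode the 12-byte vendor ID that CPUID leaf 0 returns in EBX, EDX, ECX."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def has_sse_and_sse2(leaf1_edx: int) -> bool:
    """Feature test only: true for any CPU that reports both SSE and SSE2."""
    return bool(leaf1_edx & SSE_BIT) and bool(leaf1_edx & SSE2_BIT)


def vendor_and_feature_check(vendor: str, leaf1_edx: int) -> bool:
    """Hypothetical stand-in for a check that also insists on an Intel vendor ID."""
    return vendor == "GenuineIntel" and has_sse_and_sse2(leaf1_edx)


# Leaf 0 register values for the two well-known vendor strings.
INTEL = vendor_string(0x756E6547, 0x49656E69, 0x6C65746E)
AMD = vendor_string(0x68747541, 0x69746E65, 0x444D4163)

# Illustrative feature word with SSE and SSE2 set (not read from a real CPU).
FEATURES = SSE_BIT | SSE2_BIT

if __name__ == "__main__":
    for name in (INTEL, AMD):
        print(name, has_sse_and_sse2(FEATURES), vendor_and_feature_check(name, FEATURES))
```

With the same feature bits, the feature-only test accepts both vendor strings, while the combined test rejects anything that does not identify itself as GenuineIntel.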

Once I ran the binary that was compiled with the Intel-specific flag
(-QxN) through iccOut, it was able to run on the FX51. Much to my
surprise, it ran fine and did not miscompare. On top of that, it got
the same 22% performance boost that I saw on the Pentium 4 with an
actual Intel processor. This is very interesting to me, since it
appears that in fact no Intel-specific optimization has been done if
the AMD processor is also capable of taking advantage of these same
optimizations.
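
The post does not say how iccOut actually removes the check. One common way to neutralize a call in a compiled x86 binary is to overwrite the 5-byte near call instruction (opcode 0xE8 plus a 32-bit displacement) with single-byte NOPs (0x90). The sketch below shows only that idea, under the assumption that the offset of the call is already known. The function name, the offset, and the sample bytes are all made up for illustration and have nothing to do with a real ICC binary.

```python
CALL_REL32 = 0xE8  # x86 opcode for a near call with a 32-bit relative target
NOP = 0x90         # x86 single-byte no-op
CALL_LENGTH = 5    # one opcode byte plus four displacement bytes


def nop_out_call(code: bytes, offset: int) -> bytes:
    """Return a copy of code with the 5-byte call at offset replaced by NOPs."""
    if offset < 0 or offset + CALL_LENGTH > len(code):
        raise ValueError("offset is outside the code buffer")
    if code[offset] != CALL_REL32:
        raise ValueError("no call rel32 instruction at this offset")
    patched = bytearray(code)
    patched[offset:offset + CALL_LENGTH] = bytes([NOP]) * CALL_LENGTH
    return bytes(patched)


# Made-up bytes for illustration: push ebp; call <somewhere>; ret
SAMPLE = bytes([0x55, 0xE8, 0x10, 0x00, 0x00, 0x00, 0xC3])

if __name__ == "__main__":
    print(nop_out_call(SAMPLE, 1).hex())
```

Blanking out a call like this is only safe if nothing afterwards depends on the call's return value or side effects, so a real tool would have to look at the surrounding code as well as find the call in the first place.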
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
-QxW is not the generic x86 flag; it's the optimization level for Willamette P4s and newer. It requires SSE2 and can perform autovectorization, and it should work without problems on K8 CPUs. However, the current K8 SSE2 implementation has certain bottlenecks, and the 3:2 clock difference gives the P4 the advantage in most SSE2 apps.
-QxN is a brand new optimization level introduced in ICC 8. Since it's new, Intel probably hasn't been able to ascertain the correctness of such optimizations on non-Intel processors yet.

Of course, if AMD wants to have a compiler that optimizes fully for their processor, they should fund their own.
 

Duvie

Elite Member
Feb 5, 2001
16,215
0
71
What is your point??? Isn't this test a bit old, and at the time wasn't it likely that SSE2 was only a P4 trait??? If so, then wouldn't it be obvious that it would have been "optimized for P4"... Show me it working on a Barton without SSE2 and still getting 22%, and then I'd say it is a bit more noteworthy.....

Otherwise the FX as well as all the A64s have SSE2 optimizations NOW!!!! I am sure AMD has their own little optimization for the thing that does the same thing.

I am just missing why this is so great and why this has to be an attack on the ethics of the P4.... AMD, write your own with your own flags that enable the SSE2 potential on your own chips.... Intel can't pave the road for everyone...
 

Megatomic

Lifer
Nov 9, 2000
20,127
6
81
The question I have is why was the call to check for an Intel CPUID added when a call to check for SSE2 was already there? Should it really matter if the SSE2 enabled chip is an Intel product if the coder was not trying to hamstring any non-Intel SSE2 enabled chip? Just curious...
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Because the optimization level goes beyond SSE2. QxW is the first optimization level that requires SSE2 and works on the K8 cpus.
 

Eug

Lifer
Mar 11, 2000
24,055
1,697
126
Originally posted by: Accord99
Because the optimization level goes beyond SSE2. QxW is the first optimization level that requires SSE2 and works on the K8 cpus.
My question was whether it goes beyond SSE2 and, if it does, how. The optimization seems to work fine on the FX51.

And if it does not go beyond SSE2, do you think it is bad ethics or not? I.e., why not just test for SSE2? Or is it just a matter of timing as to when the chips and compilers were developed?