Quick question regarding icc 11.1 on AMD processors

jhu

Lifer
Oct 10, 1999
11,918
9
81
Do programs compiled with it still take suboptimal code paths on AMD processors, or has that been changed?
 

jhu

That's interesting. I've compiled povray 3.6.1 on a 64-bit Debian system and rendered the benchmark file on my Athlon II x4 with some interesting results comparing gcc 4.4 and icc 11.1:

icc, -march=core2
Total Time: 0 hours 13 minutes 45 seconds (825 seconds)

gcc, -march=barcelona
Total Time: 0 hours 15 minutes 6 seconds (906 seconds)

gcc, -march=k8
Total Time: 0 hours 25 minutes 36 seconds (1536 seconds)

I thought that after the lawsuit, icc would be using the better code path. So presumably the icc compile could be faster.

*edit

Added -ffast-math and -funroll-loops for gcc as suggested, and now it looks like gcc comes out slightly ahead.

gcc, -march=barcelona -ffast-math -funroll-loops
Total Time: 0 hours 13 minutes 40 seconds (820 seconds)
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
All compiled programs run sub-optimally. It's in the nature of compilers not to be perfect. Given your data above, however, icc seems to be stomping GCC (as expected).
 

Cogman

Lifer
Sep 19, 2000
10,284
138
106
Wow, pretty impressive. Never realized that gcc had fallen that far behind, again. Did you enable optimizations like -ffast-math and loop unrolling?
 

jhu

All compiled programs run sub-optimally. It's in the nature of compilers not to be perfect. Given your data above, however, icc seems to be stomping GCC (as expected).

Except icc presumably compiles a slower code path into the binary and runs that slower path if the program does not detect that it is running on an Intel processor.
 

jhu

Wow, pretty impressive. Never realized that gcc had fallen that far behind, again. Did you enable optimizations like -ffast-math and loop unrolling?

gcc didn't do too badly. It was only 81 seconds behind. But I assumed that, since icc was faster, Intel had uncrippled their compiler for other processors.
 

Cogman

gcc didn't do too badly. It was only 81 seconds behind. But I assumed that, since icc was faster, Intel had uncrippled their compiler for other processors.

A 10% increase based entirely on which compiler is being used is really pretty significant.

My bet is that it is coming from vectorization improvements (considering this is going to be a heavily FP-dependent program). ICC is known to have a pretty good vectorizer; GCC, not so much.
 

iCyborg

Golden Member
Aug 8, 2008
1,344
61
91
The updated times look better, icc and gcc about as fast on barcelona, so perhaps Intel isn't deliberately trying to cripple AMD (any more). Now we need someone to compare icc and gcc on a newer Intel.
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
The updated times look better, icc and gcc about as fast on barcelona, so perhaps Intel isn't deliberately trying to cripple AMD (any more). Now we need someone to compare icc and gcc on a newer Intel.

Is something mislabeled or removed? I don't see any icc results for Barcelona...??
 

eLiu

It's the first result.

Oh sorry, I thought -march=X meant you were running on X (like compilation targeted at X and running on X), as opposed to running all of those cases on the same machine. Oops. Thanks for the data!
 

degibson

Except icc presumably compiles a slower code path into the binary and runs that slower path if the program does not detect that it is running on an Intel processor.

Presume away, but I don't buy it. I find it hard to blame Intel on this one. Even if Intel wanted to, they can't make AMD's parts faster than their own via compiler magic. If I was an Intel shareholder, I would be worried if icc did too good of a job compiling for AMD targets. I would think that there's no value for Intel to have inside knowledge into their own microarchitecture that they simply can't get (or more realistically, can't legally use) about AMD's.

Besides, if Intel wanted to make AMD's performance under icc terrible, there wouldn't be a 9% difference. There'd be a 90% difference. I could be convinced otherwise, e.g., if you could prove that, across a variety of benchmarks, icc compiling on an Intel platform (perhaps with and without -march specified) would reliably produce a faster binary than icc compiling on an AMD platform. But one benchmark with comparisons against gcc means nothing.
 

jhu

Presume away, but I don't buy it. I find it hard to blame Intel on this one. Even if Intel wanted to, they can't make AMD's parts faster than their own via compiler magic. If I was an Intel shareholder, I would be worried if icc did too good of a job compiling for AMD targets. I would think that there's no value for Intel to have inside knowledge into their own microarchitecture that they simply can't get (or more realistically, can't legally use) about AMD's.

Besides, if Intel wanted to make AMD's performance under icc terrible, there wouldn't be a 9% difference. There'd be a 90% difference. I could be convinced otherwise, e.g., if you could prove that, across a variety of benchmarks, icc compiling on an Intel platform (perhaps with and without -march specified) would reliably produce a faster binary than icc compiling on an AMD platform. But one benchmark with comparisons against gcc means nothing.

Higher up post:
Still a problem according to this:

http://www.agner.org/optimize/blog/read.php?i=49

Intel's own website seems to admit this as well:

http://software.intel.com/en-us/articles/optimization-notice/
 

degibson

Higher up post:

Intel link is dead (hiding something, perhaps?)

As for the other, I fail to see the outrage (or the data). It seems to me entirely correct that there could exist an AMD processor slower than an Intel processor.

Bottom line: I see claims that icc does a poor(er) job when targeting or running on non-Intel platforms. What I don't see is proof that icc can make a faster binary for the given platform, so long as it is not compiled on an AMD part.
 

jhu

updated second link (it's all in the second post in this thread anyway...)

from a post in the first link:

Using a debugger, I could verify that it uses an old version of Intel MKL (version 7.2.0, 2004), and that it loads different versions of the MKL depending on the CPU ID as indicated in the table above. The speed is more than doubled when the CPU fakes to be an Intel Pentium 4.
 

esun

Platinum Member
Nov 12, 2001
2,214
0
0
Intel link is dead (hiding something, perhaps?)

As for the other, I fail to see the outrage (or the data). It seems to me entirely correct that there could exist an AMD processor slower than an Intel processor.

Bottom line: I see claims that icc does a poor(er) job when targeting or running on non-Intel platforms. What I don't see is proof that icc can make a faster binary for the given platform, so long as it is not compiled on an AMD part.

Try actually reading the information at the link (yes, all of it, top to bottom; don't just skim the first paragraph). You clearly want to continue thinking you are right, which means you probably don't want to read the types of things that would demonstrate you are wrong, but don't pretend like we aren't offering sufficient evidence that you are wrong.
 

degibson

Try actually reading the information at the link (yes, all of it, top to bottom; don't just skim the first paragraph). You clearly want to continue thinking you are right, which means you probably don't want to read the types of things that would demonstrate you are wrong, but don't pretend like we aren't offering sufficient evidence that you are wrong.

You seem to be taking this rather personally, and I don't see why. You'll find a lot of people who won't draw the same conclusions as you, so get used to it.
 

jhu

You seem to be taking this rather personally, and I don't see why. You'll find a lot of people who won't draw the same conclusions as you, so get used to it.

Did you read what's in the link? It's pretty damning evidence of what they're doing. However, for Povray, I don't think using anything beyond SSE2 is all that helpful.
 

randname

Junior Member
May 16, 2011
4
0
0
The (now very old) criticism of ICC is about how it enables SSE code on non-Intel processors. Rather than check the feature flags reported by CPUID, it instead checks whether the Vendor ID is "GenuineIntel". This would commonly result in SSE2 (or above) not being utilized on AMD & VIA systems, regardless of their level of SSE support.

This blog (linked by the Agner Fog stuff earlier) digs in with GDB to find the relevant code sections and how the Intel compiler determined whether or not to enable SSE2 extensions.

Quoting from the blog
Code:
 15f:   0f a2           cpuid
 161:   89 45 fc        mov    %eax,0xfffffffc(%ebp)
 164:   89 5d f8        mov    %ebx,0xfffffff8(%ebp)
 167:   89 4d f4        mov    %ecx,0xfffffff4(%ebp)
 16a:   89 55 f0        mov    %edx,0xfffffff0(%ebp)
(snip)
 19f:   8b 45 f8        mov    0xfffffff8(%ebp),%eax
 1a2:   3d 47 65 6e 75  cmp    $0x756e6547,%eax
 1a7:   bb 01 00 00 00  mov    $0x1,%ebx
 1ac:   75 18           jne    1c6 <__intel_cpu_indicator_init+0x8c>
 1ae:   8b 45 f0        mov    0xfffffff0(%ebp),%eax
 1b1:   3d 69 6e 65 49  cmp    $0x49656e69,%eax
 1b6:   75 0e           jne    1c6 <__intel_cpu_indicator_init+0x8c>
 1b8:   8b 45 f4        mov    0xfffffff4(%ebp),%eax
 1bb:   3d 6e 74 65 6c  cmp    $0x6c65746e,%eax
 1c0:   75 04           jne    1c6 <__intel_cpu_indicator_init+0x8c>

The 'cpuid' call here puts the CPU manufacturer's ID in ebx:edx:ecx. The 'cmp' instructions later then check that these values were 'Genu','ineI','ntel' (i.e. 'GenuineIntel'). If not, then we jump off to a bit of code that doesn't even pretend to check for the CPU capabilities but instead returns 0x1 (i.e. it's a 386, and nothing other than the bog-standard 386 instruction set is available).

Basically the complaints stem from Intel not using ISA-standard ways of checking for flags and capabilities, and instead disallowing features solely based upon the Vendor ID string.


You know how it's annoying when a website decides it won't display in your browser of choice solely because you are using a non-IE browser, and not based upon your browser's capabilities? That's basically what the Intel compiler does with SSE code. Those websites (and Intel) can suck it.
 

TheRyuu

Diamond Member
Dec 3, 2005
5,479
14
81
Specify /arch:SSE2 (MSVC-style; I don't know if the same applies on Linux with -msse2) if you want to target AMD processors. If you specify /QxSSE2, I don't think the binary will run on an AMD box. You can also add /Qax for additional Intel-specific code paths if you want, with /arch:SSE2 remaining the default code path.

Autovectorization is pretty shitty anyway so it shouldn't be a huge deal.