All compiled programs run sub-optimally. It's in the nature of compilers not to be perfect. Given your data above, however, it seems to be stomping GCC (as expected).
Wow, pretty impressive. Never realized gcc had fallen that far behind, again. Did you enable optimizations like -ffast-math and loop unrolling?
gcc didn't do too badly. It was only 81 seconds behind. But I assumed that, since icc was faster, Intel had uncrippled their compiler for other processors.
The updated times look better: icc and gcc are about as fast on Barcelona, so perhaps Intel isn't deliberately trying to cripple AMD (any more). Now we need someone to compare icc and gcc on a newer Intel.
Is something mislabeled or removed? I don't see any icc results for Barcelona...??
It's the first result.
Except icc presumably compiles a slower code path into the binary and runs that slower path if the program does not detect that it is running on an Intel processor.
Presume away, but I don't buy it. I find it hard to blame Intel on this one. Even if Intel wanted to, they can't make AMD's parts faster than their own via compiler magic. If I were an Intel shareholder, I would be worried if icc did too good a job compiling for AMD targets. I would think that there's no value for Intel to have inside knowledge into their own microarchitecture that they simply can't get (or more realistically, can't legally use) about AMD's.
Besides, if Intel wanted to make AMD's performance under icc terrible, there wouldn't be a 9% difference. There'd be a 90% difference. I could be convinced otherwise, e.g., if you could prove that, across a variety of benchmarks, icc compiling on an Intel platform (perhaps with and without -march specified) would reliably produce a faster binary than icc compiling on an AMD platform. But one benchmark with comparisons against gcc means nothing.
Still a problem according to this:
http://www.agner.org/optimize/blog/read.php?i=49
Intel's own website seems to admit this as well:
http://software.intel.com/en-us/articles/optimization-notice/
Higher up post:
Using a debugger, I could verify that it uses an old version of Intel MKL (version 7.2.0, 2004), and that it loads different versions of the MKL depending on the CPU ID as indicated in the table above. The speed is more than doubled when the CPU fakes to be an Intel Pentium 4.
Intel link is dead (hiding something, perhaps?)
As for the other, I fail to see the outrage (or the data). It seems to me entirely correct that there could exist an AMD processor slower than an Intel processor.
Bottom line: I see claims that icc does a poor(er) job when targeting or running on non-Intel platforms. What I don't see is proof that icc can make a faster binary for the given platform, so long as it is not compiled on an AMD part.
Try actually reading the information at the link (yes, all of it, top to bottom; don't just skim the first paragraph). You clearly want to continue thinking you are right, which means you probably don't want to read the types of things that would demonstrate you are wrong, but don't pretend like we aren't offering sufficient evidence that you are wrong.
You seem to be taking this rather personally, and I don't see why. You'll find a lot of people who won't draw the same conclusions as you, so get used to it.
Code:
 15f: 0f a2              cpuid
 161: 89 45 fc           mov %eax,0xfffffffc(%ebp)
 164: 89 5d f8           mov %ebx,0xfffffff8(%ebp)
 167: 89 4d f4           mov %ecx,0xfffffff4(%ebp)
 16a: 89 55 f0           mov %edx,0xfffffff0(%ebp)
 (snip)
 19f: 8b 45 f8           mov 0xfffffff8(%ebp),%eax
 1a2: 3d 47 65 6e 75     cmp $0x756e6547,%eax
 1a7: bb 01 00 00 00     mov $0x1,%ebx
 1ac: 75 18              jne 1c6 <__intel_cpu_indicator_init+0x8c>
 1ae: 8b 45 f0           mov 0xfffffff0(%ebp),%eax
 1b1: 3d 69 6e 65 49     cmp $0x49656e69,%eax
 1b6: 75 0e              jne 1c6 <__intel_cpu_indicator_init+0x8c>
 1b8: 8b 45 f4           mov 0xfffffff4(%ebp),%eax
 1bb: 3d 6e 74 65 6c     cmp $0x6c65746e,%eax
 1c0: 75 04              jne 1c6 <__intel_cpu_indicator_init+0x8c>
The 'cpuid' call here puts the CPU manufacturer's ID in ebx:edx:ecx. The 'cmp' instructions later then check that these values were 'Genu','ineI','ntel' (i.e. 'GenuineIntel'). If not, then we jump off to a bit of code that doesn't even pretend to check for the CPU capabilities but instead returns 0x1 (i.e. it's a 386, and nothing other than the bog-standard 386 instruction set is available).
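If anyone wants to check those constants without trusting my reading of the disassembly, here is a minimal Python sketch. It does not execute cpuid; it only decodes the three immediates from the 'cmp' instructions as little-endian ASCII. The names decode_vendor and is_genuine_intel are made up for illustration, and the 'AuthenticAMD' string is my assumption about what an AMD part reports (it is not shown in the listing above).

```python
import struct

# Immediates from the three cmp instructions in the listing,
# in the order they are checked (saved ebx, then edx, then ecx).
VENDOR_IMMEDIATES = [0x756E6547, 0x49656E69, 0x6C65746E]


def decode_vendor(registers):
    """Pack each 32-bit value little-endian and read the bytes as ASCII."""
    raw = b"".join(struct.pack("<I", value) for value in registers)
    return raw.decode("ascii", errors="replace")


def is_genuine_intel(ebx, edx, ecx):
    """Hypothetical mirror of the three compare-and-branch checks."""
    return decode_vendor([ebx, edx, ecx]) == "GenuineIntel"


# The constants being compared against spell out the Intel vendor string.
print(decode_vendor(VENDOR_IMMEDIATES))

# Assumption: an AMD part reports "AuthenticAMD" in ebx:edx:ecx,
# so it would fail the first comparison and take the jne branch.
amd_ebx, amd_edx, amd_ecx = struct.unpack("<3I", b"AuthenticAMD")
print(is_genuine_intel(amd_ebx, amd_edx, amd_ecx))
```

Note that this only shows what the vendor comparison is testing for. It says nothing about which code path the rest of the dispatcher picks afterwards.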