Intel Clear Linux ignores Zen's AVX2, FMA, BMI in Math library for Python, C/C++, SQL, ....

Hans de Vries

Senior member
May 2, 2008
321
1,018
136
www.chip-architect.com
This is currently emerging on the Internet: Page URL

Bug 24979 - sysdeps: dl_platform detection effectively performs "cripple AMD"

Glibc received the capability to transparently load libraries for specific CPU families with some SIMD extensions combinations in 2017, a move that should be beneficial for a lot of x86 users. However, the current form of the implementation[1] limits two "good" sets of modern SIMD instructions to Intel processors only, preventing competitor CPUs with equivalent capabilities to fully perform, something that should not happen in any free software package. This feels quite like a flashback to the icc/mkl "cripple AMD" routine from 2009.[2]

Here on this page is how it works. The open source math library OpenBLAS provides built-in functionality for Python, SQL, R and SAS, which are used by over 90% of data scientists, and it can be called from many other languages such as C/C++, Java and so on.

Multiple ISA extensions such as AVX2, FMA and BMI are not recognized individually but only as a whole, as the "haswell" platform. Intel made a Linux patch that uses some deprecated parameters for this. If the patch sees all the extensions, then it sets the dl_platform parameter to "haswell". This patch was introduced shortly after the Ryzen launch in 2017.


Even though Zen1 and Zen2 have all the required extensions (AVX2, FMA, BMI, ...), AMD CPUs will not get dl_platform set to "haswell", although they are Haswell ISA compatible. The result is that all calls to OpenBLAS go to a non-optimized dynamic OpenBLAS library, while on newer Intel processors a different, optimized dynamic library is used that contains all the AVX2, FMA and BMI optimized functions.

Here is de code:

[Attachment 10536: screenshot of the code]
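
The behaviour described in this post can be modelled in a few lines of C. This is only a simplified sketch and not the glibc source: the struct `cpu_model`, its fields and both function names are hypothetical, and the feature list is reduced to the three extensions named above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a CPU; not glibc's cpu_features. */
enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

struct cpu_model {
    enum vendor vendor;
    bool avx2, fma, bmi;
};

/* Models the reported behaviour: the feature set is only examined
   for Intel CPUs, so an AMD CPU with the same features gets no
   platform name at all. */
const char *pick_platform(const struct cpu_model *cpu)
{
    if (cpu->vendor == VENDOR_INTEL && cpu->avx2 && cpu->fma && cpu->bmi)
        return "haswell";
    return NULL;  /* no platform name: only the generic library is used */
}

/* A vendor-neutral alternative: decide on the features alone. */
const char *pick_platform_by_features(const struct cpu_model *cpu)
{
    if (cpu->avx2 && cpu->fma && cpu->bmi)
        return "haswell";
    return NULL;
}
```

With a Zen 2 style model (AMD vendor, all three flags set), the first function returns no platform name while the second returns "haswell", which is the difference the bug report complains about.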
 

Nothingness

Platinum Member
Jul 3, 2013
2,410
744
136
This matches what I found and understood.

Now who, beyond Intel ClearLinux, relies on that mechanism? None of the distros I looked at support that multilib, but they still have dedicated paths in their C library that should be correctly executed on AMD CPUs.

EDIT: to clarify, I don't mean to say this shouldn't be fixed. It has to be fixed and that shouldn't be difficult. I'm just saying the impact likely is very small, perhaps only for those Zen users running ClearLinux.
 

moinmoin

Diamond Member
Jun 1, 2017
4,949
7,659
136
Note that this is not an issue with Clear Linux but with GCC, which is used by pretty much all Linux distributions. So any software is affected as soon as it uses GCC as compiler and "haswell" as target arch. The issue is likely not malice on the part of the Intel developers who implemented this mechanism in GCC, but simply the bad design of using an arch to define an arbitrary supported feature set (which, thanks to Intel's random segmentation strategy, doesn't work for all Intel chips either). GCC developers want AMD developers to get involved to fix it, but imo GCC's guidelines (and their enforcement) must be lackluster for such a case to happen to begin with.
 

Schmide

Diamond Member
Mar 7, 2002
5,587
719
126
That screenshot seems to be misleading. I haven't ventured down the rabbit hole but this code snippet seems different than the current repo.

Looking at the code in GitHub there is a whole section to determine simd usage and seems more than likely to enable AVX and such on AMD and possibly others.

Code:
 118   /* This spells out "GenuineIntel".  */
 119   if (ebx == 0x756e6547 && ecx == 0x6c65746e && edx == 0x49656e69)
(snip)
Code:
 251   /* This spells out "AuthenticAMD".  */
 252   else if (ebx == 0x68747541 && ecx == 0x444d4163 && edx == 0x69746e65)
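
For anyone wondering where those hex constants come from: CPUID leaf 0 returns the vendor string as four ASCII characters per register, in the order EBX, EDX, ECX. A minimal sketch that decodes them is below; the function name `vendor_string` is my own and not something from the repo.

```c
#include <stdint.h>

/* Rebuild the 12-character CPUID vendor string from the register
   values of CPUID leaf 0. The registers are read in the order
   EBX, EDX, ECX, and each one holds four ASCII characters with the
   lowest byte first. `out` must have room for 13 bytes. */
void vendor_string(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[4 * r + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}
```

Feeding it the constants from the two snippets above gives "GenuineIntel" and "AuthenticAMD" respectively.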

The code that is similar to the one in the graphic follows a no_cpuid def which bypasses all that code from line 112. (I would guess to maybe deal with frankenprocessosaurses)

Code:
108 #if !HAS_CPUID
 109   if (__get_cpuid_max (0, 0) == 0)
 110     {
 111       kind = arch_kind_other;
 112       goto no_cpuid;
 113     }
 114 #endif

Code:
307 #if !HAS_CPUID
 308 no_cpuid:
 309 #endif
(code in graphic)

The secret sauce seems to be in the snips.

To me this looks kind of malicious.
 

Glo.

Diamond Member
Apr 25, 2015
5,707
4,551
136



crashtech said:
At least they are not trying to hide the fact that AMD users need not apply.
If that were correct, there would be no effect on the performance of AMD EPYC 2 CPUs, and they would not see the highest performance on Clear Linux out of ALL mainstream distributions.

The performance margin between Clear Linux and other Distros is the same regardless of CPU manufacturer.

Yes, it is highly tuned for Intel CPUs. But it does not cripple AMD CPU performance at the same time. Even AMD CPUs benefit from some of the optimizations in the OS.

Examples here: https://www.phoronix.com/scan.php?page=article&item=8way-amd-rome&num=1
 

crashtech

Lifer
Jan 4, 2013
10,524
2,111
146
^ All very interesting! I was being a bit hyperbolic when saying AMD users need not apply, but since Intel pretty much disclaims their CPU bias right in plain sight, I kind of figured that any complaints re compiler shens are not as underhanded as some that have occurred in the past.

So, the question becomes, how are these Epyc CPUs performing so well on an OS that is explicitly not optimized for them? Could they do even better with a bit of tweaking, like CPUID spoofing?
 

thesmokingman

Platinum Member
May 6, 2010
2,307
231
106
crashtech said:
^ All very interesting! I was being a bit hyperbolic when saying AMD users need not apply, but since Intel pretty much disclaims their CPU bias right in plain sight, I kind of figured that any complaints re compiler shens are not as underhanded as some that have occurred in the past.

So, the question becomes, how are these Epyc CPUs performing so well on an OS that is explicitly not optimized for them? Could they do even better with a bit of tweaking, like CPUID spoofing?

Raw power! There is literally no good reason to go Intel on the corporate front. That should demonstrate just how big the gap is.
 

Schmide

Diamond Member
Mar 7, 2002
5,587
719
126
Dudes. Look at the code. Read my post. It isn't doing anything wrong. Look at lines 119-305.
 

Schmide

Diamond Member
Mar 7, 2002
5,587
719
126
So, that was sarcasm? Because I don't know what the code means.

The graphic and the way they snipped code to make it look like it is doing something it isn't.

Edit: Where did this graphic come from? Source your images please.
 

Hans de Vries

Senior member
May 2, 2008
321
1,018
136
www.chip-architect.com
Schmide said:
The graphic and the way they snipped code to make it look like it is doing something it isn't.

Edit: Where did this graphic come from? Source your images please.

1) If you are looking for the code, you could have clicked on "Here is de code:", which is just above the code.

2) If you want to know what the code does, then you should have clicked on "Here on this page is how it works." (the author of the code is the co-author of the article), or otherwise read the explanation in the bug report: Page URL
 

Schmide

Diamond Member
Mar 7, 2002
5,587
719
126
Here's my quick analysis of the code.

There is a lot more nuance to the Intel features (lines 128-250); most of those set flags to limit the usage of certain operations on Intel platforms. Example:

Code:
237       /* To avoid SSE transition penalty, use _dl_runtime_resolve_slow.
238          If XGETBV suports ECX == 1, use _dl_runtime_resolve_opt.  */
239       cpu_features->feature[index_arch_Use_dl_runtime_resolve_slow]
240         |= bit_arch_Use_dl_runtime_resolve_slow;

There is a penalty for mixing SSE and AVX on most if not all Intel platforms.

Other flags around that tell the libraries to allow unaligned loads for Intel processors with AVX2, disable Intel TSX on Haswell processors, etc.

Lines 252-292 adjust a couple flags for AMD.

The lack of flags there is more because AMD processors lack AVX512 and don't pay a penalty for mixing SSE and AVX.

Excavator seems to prefer a couple copy parameters.

The code on lines 322-350 seems only to name certain Intel platforms based on certain features; it does not alter any cpu_features->feature[index] flags.

The need for all this is more that Intel has so many segments of SIMD support, now including mobile AVX512 support.

AMD, on the other hand, has 3: pre-Bulldozer, Bulldozer, and Zen.

Edit: Digging further I now understand that GLRO(dl_platform) is doing more to decide what library to use. Yes this is not how things should be done. So...
 

Nothingness

Platinum Member
Jul 3, 2013
2,410
744
136
Schmide said:
Edit: Digging further I now understand that GLRO(dl_platform) is doing more to decide what library to use. Yes this is not how things should be done. So...

Yes but do you know of any distro that provides those dl_platform specific libraries? I mean except Intel ClearLinux. I don't.
 

TheGiant

Senior member
Jun 12, 2017
748
353
106
Isn't there a simple solution: don't install Intel ClearLinux?
Install another distro that is optimised for Zen 2?
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Schmide said:
Here's my quick analysis of the code.

Edit: Digging further I now understand that GLRO(dl_platform) is doing more to decide what library to use. Yes this is not how things should be done. So...

It is doing nothing more than adding to the library search path. The libraries themselves do not necessarily exist. Nothing wrong with that.
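
To make the search path point concrete, here is a rough sketch of what "adding to the search path" amounts to. It is not the loader's actual code: the function name `first_candidate`, the directory `/usr/lib64` and the library file name are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Write into `out` the first path a loader would try for `lib`.
   If a platform name is set, the platform subdirectory is tried
   first; without one, only the generic directory is searched.
   `libdir` and the directory layout are illustrative assumptions. */
int first_candidate(const char *libdir, const char *platform,
                    const char *lib, char *out, size_t outlen)
{
    if (platform != NULL)
        return snprintf(out, outlen, "%s/%s/%s", libdir, platform, lib);
    return snprintf(out, outlen, "%s/%s", libdir, lib);
}
```

With platform "haswell" the first candidate would be something like /usr/lib64/haswell/libopenblas.so.0, and if that file does not exist the generic /usr/lib64/libopenblas.so.0 is used instead. So the platform name only matters on a distro that actually ships such a subdirectory.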