Linux and/or compiler experts re: Piledriver optimization

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
In the interest of benchmarking, and specifically of using Linux to test some things, I happened upon Funtoo Linux, a Gentoo variant with profiles for almost every processor, so one can build the whole system for maximum performance (within the limits of the distro itself) at the cost of more effort to get everything up and running.

The thing I'm wondering, though: the page below says AVX performance is a problem on Piledriver, yet AMD's compiler optimization profile enables AVX. Funtoo uses that same profile, -march=bdver2, along with -O2 -pipe. An answer on the page also suggests a workaround based on Agner Fog's vector class and seems to imply that XOP might be worth avoiding too. I'm not sure why that would be. Does XOP also have performance problems?

http://stackoverflow.com/questions/33460592/forcing-avx-intrinsics-to-use-sse-instructions-instead#

Memory writes with the 256-bit AVX registers are exceptionally slow. The measured throughput is 5 - 6 times slower than on the previous model (Bulldozer), and 8 - 9 times slower than two 128-bit writes.
There is a solution for this: Agner Fog's vector class. Use an AVX vector such as Vec8f and compile with -D__SSE4_2__ -D__XOP__.

...

If you don't want to use XOP don't use -D__XOP__.

etc.

What I'm mainly wondering is whether the bdver2 profile should be modified for better performance (under Funtoo, for instance) and, if so, how.
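To make the question concrete, here is a minimal sketch of the flag variants one might benchmark against each other. The base flags are the ones quoted above; `-mprefer-avx128`, `-mno-avx`, and `-mno-xop` are GCC options from that era (newer GCC releases spell the first one `-mprefer-vector-width=128`). Whether any of them actually helps on Piledriver is an assumption to be measured, not something established here, and the helper name and variant labels are made up for illustration.

```python
# Hypothetical helper: the function name and variant labels are illustrative only.
BASE_FLAGS = ["-march=bdver2", "-O2", "-pipe"]  # flags quoted in the post

# Candidate tweaks to try; none is known here to be a win on Piledriver.
VARIANTS = {
    "stock": [],                    # the profile as shipped
    "avx128": ["-mprefer-avx128"],  # ask the vectorizer for 128-bit AVX only
    "no-avx": ["-mno-avx"],         # fall back to SSE code generation
    "no-xop": ["-mno-xop"],         # keep AVX but drop XOP
}


def cflags(variant="stock"):
    """Return the CFLAGS string for one benchmark variant."""
    return " ".join(BASE_FLAGS + VARIANTS[variant])


if __name__ == "__main__":
    for name in VARIANTS:
        print(f'{name:8s} CFLAGS="{cflags(name)}"')
```

Each printed line could be pasted into a make.conf or passed to a single build, so the same benchmark can be rebuilt once per variant and compared.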

Also, is it more useful to compile just individual programs with something like -march=bdver2 rather than everything? That would make it possible to skip something as complex as Gentoo in favor of an easier-to-use and possibly faster distro. Benchmarks I've seen on Phoronix show big swings in performance from distro to distro, depending on the test. It does seem, though, that Intel's distro benefits quite a bit in CPU tests from the instruction-level optimization Intel has done.
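For the per-program route, a rough sketch of what the comparison could look like: build one benchmark source twice (generic and bdver2) on whatever distro is already installed, then time each binary. The file names (`bench.c`, `bench_generic`, `bench_bdver2`) and both helper functions are hypothetical, and it assumes `gcc` is on the PATH.

```python
import subprocess
import time


def build_cmd(source, output, march=None, compiler="gcc"):
    """Return the compiler command line; march=None means a generic build."""
    cmd = [compiler, "-O2", "-pipe"]
    if march:
        cmd.append(f"-march={march}")
    return cmd + ["-o", output, source]


def best_time(cmd, runs=5):
    """Run cmd several times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


# Example use (hypothetical file names, needs gcc installed):
#   subprocess.run(build_cmd("bench.c", "bench_generic"), check=True)
#   subprocess.run(build_cmd("bench.c", "bench_bdver2", march="bdver2"), check=True)
#   print(best_time(["./bench_generic"]), best_time(["./bench_bdver2"]))
```

Taking the best of several runs is just one way to damp noise; it says nothing about libraries the program links against, which would still be the distro's generic builds.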

Of course, the other question is whether anything has changed since this 2012 article, in terms of the benefits to be had:

Phoronix said:
With the Piledriver support came work within AMD's Open64 compiler fork for handling AVX, XOP, FMA3, FMA4, BMI, TBM, and F16C instruction sets.

I went over what the bdver2 target adds: BMI, TBM, F16C, and FMA3.

For all of the tests carried out under the latest AMD Open64 compiler release for Linux, none of these common open-source Linux benchmarks benefited from being built under "-march=bdver2" for the latest Piledriver support (BMI/TBM/F16C/FMA3) compared to just targeting the first-generation Bulldozer processors. Once software is better able to take advantage of BMI/TBM/F16C/FMA3, we will hopefully see the FX-8350 become even more competitive.

So bdver1 was faster or equivalent in every test, maybe because of FMA4? From what I've read, though, Open64 isn't considered a fast compiler.
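As a sanity check before comparing bdver1 and bdver2 builds, it may help to confirm the CPU actually reports the extensions bdver2 adds. The sketch below parses the `flags` line of /proc/cpuinfo; it assumes the usual Linux flag spellings (`bmi1`, `tbm`, `f16c`, and `fma` for FMA3), and the function names are made up for illustration.

```python
# Extensions the bdver2 target adds over bdver1, as listed in the quote above,
# written with the names Linux uses in /proc/cpuinfo ("fma" is FMA3).
BDVER2_ADDS = {"bmi1", "tbm", "f16c", "fma"}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_bdver2_additions(flags):
    """Return the bdver2 additions the CPU does not report, sorted by name."""
    return sorted(BDVER2_ADDS - flags)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        flags = set()  # not Linux, or /proc is unavailable
    print("missing bdver2 additions:", missing_bdver2_additions(flags))
```

An empty list only means the hardware supports the instructions; it does not say the compiler emits them or that they make anything faster.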

I know Piledriver is old news, but I'm curious anyway. I'd rather not spend a huge amount of time and effort on benchmarking, but I'd like to do more than just run generic compiles in Ubuntu if a bit more effort brings a benefit. However, if the Gentoo/Funtoo distro itself is going to be much slower than something like Ubuntu, it would make more sense to compile just the apps with processor-specific optimization rather than the whole OS.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
If you have the time to compile the entire OS for every CPU you have, it would be interesting to see how much the performance of your benchmarked applications changes. It otherwise seems like a waste of time, unless you have a way to automate it all and don't mind waiting.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
Funtoo does automate the process to a degree.

For the latest Intel processors, though, Intel's own distro seems to speed up some things, mainly CPU-bound workloads, although it falls behind in some other benchmarks. It is apparently compiled to take advantage of the latest Intel architectures.

From the tests that I saw on Phoronix, no single distro is fastest in all areas. It does look like BSD is slower than Linux in general except possibly in database hosting. Ubuntu 15 seems to do fairly well overall.
 