
How do modern compilers handle multiple instruction sets?

GWestphal

Golden Member
Say I have code that could benefit from AVX2, but I know a majority of my user base will have hardware that does not support this feature. When I compile, do modern compilers automatically create branches in the binary to use other instruction sets? For instance, if I compile it with AVX2 support, does it create parallel paths? So the compiled version says this block of code can be executed if AVX2 is present, else if AVX is present do this, else do this. And then depending on which hardware is running it, it picks the most efficient execution path through the binary. Then the binary is four times larger than if it were optimized for just AVX2, but it is also compatible across more instruction sets/hardware.
 
The process you are specifically referring to is called "vectorization" and, TBH, most compilers aren't great at it.

The process is very complex. As far as I understand it, the middle end of the compiler identifies snippets that can be vectorized, and the backend does the actual vectorization; each backend is specifically programmed to perform vectorization for its given target. In other words, identifying vectorizable code and vectorizing it are two different processes.

(I'm not an expert on compilers in the slightest. So take everything I say with a spoonful of salt. This is just my understanding of how things work).
 

I'm far from an expert, but I do not believe this process is automatic. If you compile with AVX2 enabled, you need to manually add some mechanism for the executable to be able to run on older computers, or recompile for a different machine.

Here's some discussion of it (with reference to SSE, not AVX):
http://stackoverflow.com/questions/...-sse-and-other-cpu-extensions/1894818#1894818
 
Typically what you do is set a minimum instruction set and you don't worry about producing multiple executables or code paths. In very rare circumstances you might code multiple paths by hand and choose on the basis of the CPU's capabilities, but it's not something I have done very often. I have had different DLLs compiled with different targets that are loaded depending on CPU capabilities, but typically that is 32-bit vs. 64-bit!
 
Typically, what happens is that you create two different functions, probably in two different source files so that they are compiled separately with different compiler switches. Then you have a third function, written in assembler, that detects the current CPU's capabilities (like what a CPUID program does), and based on the results of that detection at runtime, your program then chooses which function to call. It's why most people don't bother with these things except in situations where it would make a significant difference.
 