SSE4.2
The consoles might help for AVX, but AVX2 is still not really a useful option.
		
		
	 
If you are referring to the PS4 and XBone, do recall that their APUs are based on Jaguar.  Jaguar's SIMD units are only 128 bits wide, so 256-bit AVX instructions get split into two halves and run at roughly half the throughput a full-width design would manage.  There's still a performance increase, but it isn't all that great.
	
	
		
		
			I personally think that HSA runtime is the best way to get AVX2 and even AVX512 support for the applications.
		
		
	 
Why? As of now, HSA is only publicly supported under Linux, and even then only tenuously.  HSA also has nothing to do with SIMD instruction sets.  It would need to be re-engineered from the ground up to optimize code blocks for SIMD, which is something no developer should rationally need or want from their development toolset and/or the end user's software stack.
ICC and GCC can do autovectorization, and even MS compilers can do it, to an extent.  SIMD has been around in dev circles long enough that if they really want to use it, they ought to be able to figure it out already.
The main problem is this: how often does a video game give the engine a chance to take advantage of SIMD outside of the graphics pipeline? SIMD thrives in circumstances where you have lots of similar computational tasks involving the same data types (int or fp) of the same length (32-bit, 64-bit) without dependencies or branching interrupting the task flow.  The most obvious application is in rendering tasks, but those are primarily offloaded to the GPU, so that's out.  Physics is another area, but again, the push is to move that to GPUs wherever possible.  There are other areas where SIMD could be used.  It just requires some outside-of-the-box thinking.
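To make the "no dependencies, no branching" point concrete, here is a minimal sketch in C.  The function names (`scale_add`, `prefix_sum`) are made up for illustration.  The first loop is the kind of thing GCC or ICC can usually autovectorize at higher optimization levels; the second has a loop-carried dependency, so it doesn't map onto SIMD lanes in the same straightforward way.

```c
#include <assert.h>
#include <stddef.h>

/* Every iteration is independent, uses one data type, and has no branches.
   'restrict' promises the compiler that the arrays do not overlap, which
   is often what lets the autovectorizer go ahead. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    assert(n == 0 || (out && a && b));
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * k + b[i];
}

/* Each iteration needs the result of the previous one (a loop-carried
   dependency), so the elements cannot simply be processed side by side. */
void prefix_sum(float *data, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        data[i] += data[i - 1];
}
```

With GCC, building with something like `-O3 -fopt-info-vec` should report which of the two loops was vectorized, though the exact output depends on the compiler version and target flags.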
 
	
	
		
		
			That platform is cheap and we can target a lot of extensions/accelerators with the same codebase. I really think that SYCL 2.1 will also be a revolutionary step for programmers.
		
		
	 
At least from what I've seen of SYCL 1.2, it doesn't look like the Khronos Group is targeting HSA at all, or any kind of SIMD autovectorization.  Again, that would be somewhat redundant . . .
	
		
	
	
		
		
			Game developers aren't going to want to lose sales to people who don't have AVX2 CPUs. This is one of the reasons why game engines aren't being made that fully take advantage of eight threads. They need the games to run well on an i3.
		
		
	 
Skylake i3s support AVX/AVX2, at least until you overclock them (at that point I'm not sure what the problem is; hopefully the board OEMs can fix that).
	
		
	
	
		
		
			I'm no programmer, but I was under the impression that extensions such as AVX2 were backward compatible with older extensions.  For example, a new CPU like Haswell or Skylake would run the fastest codepath with AVX2, while a CPU like Sandy Bridge would use the same codepath but with less throughput/performance due to lacking AVX2..
		
		
	 
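That depends on the binary.  If it has separate code paths, it'll pick the best one the CPU supports at run time.  If it doesn't and it was built for AVX2 only, the program could bomb out with an illegal-instruction error on an older CPU, or refuse to run at all.  Of course, if you use a JVM language, the JIT compiler can optimize for whatever SIMD extensions are available on the fly.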
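For anyone curious what "separate code paths" looks like in practice, here is a rough sketch in C, assuming GCC or Clang on x86.  `__builtin_cpu_supports` and the `target` attribute are real compiler features; the function names (`add_scalar`, `add_avx2`, `pick_add`, `add_arrays`) are made up for this example.  On other compilers or architectures the sketch just falls back to the scalar path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_DISPATCH 1
#include <immintrin.h>
#else
#define HAVE_X86_DISPATCH 0
#endif

typedef void (*add_fn)(int32_t *, const int32_t *, const int32_t *, size_t);

/* Baseline path: runs on any CPU. */
static void add_scalar(int32_t *out, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

#if HAVE_X86_DISPATCH
/* AVX2 path: only this function is compiled with AVX2 enabled, so the rest
   of the binary stays safe to run on older CPUs. */
__attribute__((target("avx2")))
static void add_avx2(int32_t *out, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {               /* 8 x 32-bit ints per step */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; ++i)                         /* leftover elements */
        out[i] = a[i] + b[i];
}
#endif

/* Choose a code path at run time based on what the CPU reports. */
add_fn pick_add(void)
{
#if HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2"))
        return add_avx2;
#endif
    return add_scalar;
}

void add_arrays(int32_t *out, const int32_t *a, const int32_t *b, size_t n)
{
    assert(n == 0 || (out && a && b));
    pick_add()(out, a, b, n);
}
```

A binary built this way gives the same results on Sandy Bridge and on Haswell or Skylake; only the speed differs.  A binary compiled with AVX2 enabled everywhere and no fallback is the case that bombs out on older chips.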
	
	
		
		
			And the game definitely was CPU limited when it first shipped no doubt, but now it performs very well.  So I wonder, did they get these gains by exploiting more vectorization, or was it all due to better multithreading?
		
		
	 
Shouldn't be too hard to tell, as you articulated here:
	
	
		
		
			It seems more the latter, as CPUs with more threads/cores gained more performance.
		
		
	 
though the best indicator would be to run the two different versions side-by-side and see how CPU behavior changed between versions.  If utilization went up on cores that were already active, then it was probably due to code optimization (including maybe SIMD extension use).  If under-utilized or completely unutilized cores became active, then it was probably due to multithreading.
	
		
	
	
		
		
			why should Intel pay for validation for CPU from a competitor?
		
		
	 
They shouldn't, but why be concerned at all about a competitor's CPU then? Validation implies that Intel would be/should be concerned over whether or not AMD's CPUs can conform to an ISA extension's rules by handling the instructions properly.  Let AMD worry about that.  If AMD's implementation of SSEwhatever or AVX is flawed, causing an ICC-compiled application to crash, then it's on AMD's head.  Intel shouldn't worry about it one way or the other.
Let AMD's CPUs indicate compatibility for whatever ISA extension they think they can support, and let the program use it if it finds it.  Intel never needed to check for the Intel label before allowing ICC-compiled apps to use SIMD.  That was lame as hell.
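To illustrate the difference between checking the label and checking the capability, here is a small sketch using `<cpuid.h>` from GCC/Clang on x86.  The helper names (`cpu_vendor`, `cpu_has_sse42`, `use_sse42_path`) are invented for this example, and this is not a reconstruction of what ICC's dispatcher actually does.  CPUID leaf 0 returns the vendor string; leaf 1 returns feature bits, with SSE4.2 reported in ECX bit 20.

```c
#include <assert.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

/* Reads the vendor string ("GenuineIntel", "AuthenticAMD", ...) from CPUID
   leaf 0.  Returns 1 on success, 0 if CPUID is unavailable. */
int cpu_vendor(char out[13])
{
    assert(out != NULL);
    out[0] = '\0';
#if HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    memcpy(out, &ebx, 4);       /* the string is stored in EBX, EDX, ECX order */
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 1;
#else
    return 0;
#endif
}

/* Reads the SSE4.2 feature bit (CPUID leaf 1, ECX bit 20). */
int cpu_has_sse42(void)
{
#if HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 20) & 1u);
#else
    return 0;
#endif
}

/* Vendor-neutral decision: trust the feature bit, ignore the label. */
int use_sse42_path(void)
{
    return cpu_has_sse42();
}
```

A dispatcher that tests `strcmp(vendor, "GenuineIntel") == 0` before looking at the feature bits is the behavior being complained about here; `use_sse42_path` never looks at the vendor at all.  Note that AVX and AVX2 detection needs an extra step (confirming the OS saves the wider registers), which this sketch leaves out.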