Originally Posted by cytg111
- You don't have to wait for your favorite software vendor to get the benefit from your new arch. Your software vendor may never get around to it, or may not even be in business anymore. There's a ton of scenarios where this makes sense IMO.
We have libre software, and languages based on IRs, both of which are quite efficient ways to do it.
With AVX(2), there is the option to get a performance boost just by recompiling vector SSE2 code (scalar code won't gain anything). That's not the norm. Getting any kind of meaningful performance boost out of an existing binary will not be an easy feat, and may be close to impossible, without taking the time to analyze all of the code's side effects and determine which ones are and are not actually relied upon.
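To make the recompilation case concrete, here is a minimal sketch, assuming you still have the source. The function name `saxpy` and the build lines are just illustrative, and whether the loop actually gets vectorized is up to the compiler and its version.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
 * `restrict` promises the compiler that x and y do not overlap, which is
 * the kind of information it needs before it will vectorize the loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC or Clang, the same file can be built as `cc -O3 -msse2 -c saxpy.c` and again as `cc -O3 -mavx2 -c saxpy.c`. The source is identical in both builds; only the target the compiler is allowed to use changes. That step is cheap from source and very hard from a finished binary.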
The information you really want, to make it work well, lives in some form of AST and/or set of object files, worked out from the source code. Most of that is thrown away as part of making the final binary. Most of it can still be kept with an abstract virtual machine language, made to describe the program's activity rather than some other machine's activity. The JVM and .NET CLR are the premier examples of this today, and LLVM is getting pervasive in odd places you wouldn't normally look (drivers, FI). These pseudo-machine languages are very much made to leave enough semantic information in them for a compiler, while being far enough from the source code that compiling them doesn't take much CPU time. Binaries for actual machines, however, have all of that machine's quirks wrapped up in them, and use every little trick the compiler (or programmer, in highly-optimized cases) could figure out to speed them up. To make use of whole new sets of instructions, the code would need to be reverse-engineered, refactored, and re-optimized.
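As a toy illustration of the difference, here is a made-up, one-operation "IR" that only says "add two float arrays of length n". Everything in it (`IrAddArrays`, `Lowered`, `lower_add`) is hypothetical and not taken from the JVM, the CLR, or LLVM. It is only meant to show that when the intent is still in the representation, picking a different vector width is trivial.

```c
#include <stddef.h>

/* Hypothetical IR operation: "add two float arrays of length n".
 * It records intent only: no registers, no vector width, no loop shape. */
typedef struct {
    size_t n;
} IrAddArrays;

/* What lowering produces for one (hypothetical) target. */
typedef struct {
    size_t vector_ops; /* number of full-width vector adds */
    size_t scalar_ops; /* leftover elements handled one at a time */
} Lowered;

/* Lower the IR operation for a target holding `lanes` floats per vector
 * register. Assumes lanes > 0. */
Lowered lower_add(IrAddArrays op, size_t lanes)
{
    Lowered out;
    out.vector_ops = op.n / lanes;
    out.scalar_ops = op.n % lanes;
    return out;
}
```

Calling `lower_add` with 4 lanes and then with 8 lanes stands in for retargeting from SSE-width to AVX-width registers. A shipped binary holds only the output of one such lowering, already mixed with register allocation and scheduling choices, so the original "add two arrays" intent would have to be reconstructed before it could be lowered again.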
Now, for compatibility, I'm all for it, like making a program that breaks in 64-bit work fine in 64-bit (I really do wish MS supported thunking all the way down for that, because I occasionally do need to read files from ancient programs). For performance, though, it may not even be possible to improve things that way, and if it is, it would take an insane amount of work.