Haswell is all about AVX2. Its key feature is 'gather' support, which in some cases enables an eightfold increase in performance.
Gather is the vector counterpart of a memory load: it reads up to eight 32-bit values from different memory locations simultaneously, or nearly so. In fact a single gather instruction replaces 18 legacy instructions!
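To make that concrete, here is a minimal C sketch of what a gather does. It assumes a compiler that ships `immintrin.h` and defines `__AVX2__` when AVX2 is enabled (for example `-mavx2` on GCC or Clang). The function names `gather8_scalar` and `gather8` are made up for illustration, and without AVX2 the second function simply falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Scalar reference: eight independent indexed loads, done one at a time. */
void gather8_scalar(const int32_t *base, const int32_t idx[8], int32_t out[8])
{
    for (size_t i = 0; i < 8; ++i)
        out[i] = base[idx[i]];
}

/* Same result, but with one gather instruction when AVX2 is enabled. */
void gather8(const int32_t *base, const int32_t idx[8], int32_t out[8])
{
#ifdef __AVX2__
    /* Load the eight 32-bit indices into one 256-bit register. */
    __m256i vindex = _mm256_loadu_si256((const __m256i *)idx);
    /* Fetch base[idx[i]] for all eight lanes; scale 4 = sizeof(int32_t). */
    __m256i v = _mm256_i32gather_epi32((const int *)base, vindex, 4);
    _mm256_storeu_si256((__m256i *)out, v);
#else
    /* Assumption: no AVX2 at compile time, so use the scalar version. */
    gather8_scalar(base, idx, out);
#endif
}
```

Both functions should produce identical output for the same table and indices. The difference is only in how many instructions it takes to get there.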
This is very significant because vector instructions used to be very hard for compilers to use. Having only sequential access to memory more often than not made compilers stick to slow scalar code. AVX2 finally lets them auto-vectorize loops a lot more effectively. In fact gather support is the key to the GPU's high performance in 'throughput computing'. So Intel is effectively enabling the CPU to sustain performance levels similar to those of a GPU of equivalent size. This is further reinforced by the fused multiply-add (FMA) instructions that arrive alongside AVX2. Again this is a feature borrowed from GPUs.
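As a rough illustration of where FMA pays off, here is a small C dot product. It is only a sketch: it assumes the compiler defines `__AVX2__` and `__FMA__` when those extensions are enabled (for example `-mavx2 -mfma` on GCC or Clang), the function name `dot` is made up, and without those flags it degrades to a plain scalar loop.

```c
#include <stddef.h>
#if defined(__AVX2__) && defined(__FMA__)
#include <immintrin.h>
#endif

/* Dot product of two float arrays of length n. */
float dot(const float *a, const float *b, size_t n)
{
    size_t i = 0;
    float sum = 0.0f;
#if defined(__AVX2__) && defined(__FMA__)
    /* Eight running sums, one per 32-bit lane. */
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        /* acc = a[i..i+7] * b[i..i+7] + acc, in a single fused instruction. */
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(a + i),
                              _mm256_loadu_ps(b + i), acc);
    }
    /* Add the eight lanes together. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (int k = 0; k < 8; ++k)
        sum += lanes[k];
#endif
    /* Scalar tail (or the whole loop when FMA is not enabled). */
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Each `_mm256_fmadd_ps` does eight multiplies and eight adds at once, which is where the doubling of peak floating-point throughput over separate multiply and add instructions comes from.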
Because AVX2 is a massive extension, we should not expect any other major changes to the architecture, other than those needed to better support AVX2. For instance it will demand higher cache bandwidth to sustain the high throughput, so that is expected to be doubled compared to Ivy Bridge. The only realistic IPC improvement would be an extension of the macro-op fusion capabilities to support non-destructive scalar operations.
And because a quad-core Haswell chip will have higher floating-point performance on the CPU side than on the GPU side, you can expect to see a return of software vertex processing to enhance the graphics performance (much like the Cell high-throughput CPU assists the GPU in a PlayStation 3). This is also reinforced by the addition of 16-bit floating-point conversion support in Ivy Bridge (called F16C).