Can anyone explain to me WHAT gather and scatter instructions are?! Are they supposed to make things multithreaded automagically?
Scatter/gather = vectorized indirect memory access. Gather eats a vector of memory addresses and spits out a vector of the values loaded from those addresses. Scatter eats a vector of values and a vector of addresses, and stores each value to its corresponding address in memory. Real implementations often offer somewhat richer addressing modes too, e.g. a base address plus a vector of scaled indices.
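To make the semantics concrete, here is a minimal scalar reference model in C, assuming 8 lanes of 32-bit data (as in a 256-bit vector) and base-plus-index addressing. The names gather8 and scatter8 are made up for illustration. A hardware gather/scatter would do each whole loop in a single instruction; this just spells out what that instruction computes.

```c
#include <stdint.h>

#define LANES 8 /* assumption: 8 x 32-bit lanes, i.e. one 256-bit vector */

/* Gather (reference semantics): lane i loads base[idx[i]].
   Hypothetical helper; real hardware does all lanes in one instruction. */
void gather8(float dst[LANES], const float *base, const int32_t idx[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = base[idx[i]];
}

/* Scatter (reference semantics): lane i stores src[i] to base[idx[i]].
   In this sketch, if two lanes target the same address, the higher lane wins
   because it is written last. */
void scatter8(float *base, const int32_t idx[LANES], const float src[LANES])
{
    for (int i = 0; i < LANES; i++)
        base[idx[i]] = src[i];
}
```

Note that nothing here involves threads: it is one instruction stream doing 8 loads (or stores) to arbitrary addresses at once, rather than 8 loads to consecutive addresses like an ordinary vector load.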
The main thing that makes vectorizing code hard is memory access, and scatter/gather is a proven solution that has worked on many other platforms. For some problems it just makes life easier for programmers and compiler writers; other problems simply cannot be solved with acceptable performance without it (texture fetch, for instance).
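A rough sketch of why texture fetch needs gather, assuming a nearest-neighbour lookup into a row-major width x height texture with coordinates in [0,1). The function names and the texture layout are hypothetical. The index arithmetic vectorizes fine, but the texel addresses depend on the data, so an ordinary contiguous vector load can't fetch them; with gather, the 8 loads become one instruction (modelled here by the scalar gather8_u32 loop).

```c
#include <stdint.h>

#define LANES 8 /* assumption: 8 lanes per vector */

/* Hypothetical reference gather: lane i loads base[idx[i]]. */
static void gather8_u32(uint32_t dst[LANES], const uint32_t *base,
                        const int32_t idx[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = base[idx[i]];
}

/* Nearest-neighbour texture fetch for n samples, processed 8 at a time.
   u[] and v[] must lie in [0,1); tex is width x height, row-major.
   Assumption: n is a multiple of LANES (no tail loop, to keep it short). */
void sample_nearest(uint32_t *out, const float *u, const float *v, int n,
                    const uint32_t *tex, int width, int height)
{
    for (int i = 0; i < n; i += LANES) {
        int32_t idx[LANES];
        /* This part is plain arithmetic and vectorizes easily. */
        for (int l = 0; l < LANES; l++) {
            int x = (int)(u[i + l] * width);
            int y = (int)(v[i + l] * height);
            idx[l] = y * width + x;
        }
        /* Data-dependent addresses: this is the step that needs a gather. */
        gather8_u32(&out[i], tex, idx);
    }
}
```

Without a gather instruction, the compiler has to extract each index and issue 8 separate scalar loads, which eats most of the benefit of vectorizing the arithmetic.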
While scatter/gather would be a major advance, it would not be a magic bullet that makes all code vectorizable; it just makes vectorizing a lot easier. It is, however, quite expensive to implement. For example, the present L1 caches would have to become a lot more complex to support 8 simultaneous memory accesses.