- Nov 8, 2011
- 392
- 0
- 0
> What about the unstoppable power of HSA?
It's going to revolutionize computing as we know it. It brings all of the general-purpose computing power of the GPU into the CPU cores. No more heterogeneous overhead.
R.I.P. GPGPU.
> AVX adds like what, a 10% performance boost to x264 encoding when compiled to use it? So AVX-512 might add what, 20%? lol, it is hardly a threat to GPGPU.

AVX only extended floating-point operations to 256-bit. x264 uses integer operations. AVX2 offers 256-bit integer vector operations. AVX-512 does not extend them to 512-bit, for now.
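To make the integer-versus-floating-point distinction concrete, here is a rough Python sketch (not x264's actual code, which is hand-written SIMD assembly) of the sum-of-absolute-differences kernel that dominates motion estimation. The hot loop is pure 8-bit integer work, which is why 256-bit *float* lanes (AVX) don't help but 256-bit *integer* lanes (AVX2) can:

```python
# Hypothetical sketch of an x264-style SAD (sum of absolute differences)
# kernel. The point: it is entirely integer arithmetic, so only wider
# integer vectors shorten the loop.

def sad_scalar(block_a, block_b):
    """One pixel per step."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def sad_chunked(block_a, block_b, lanes):
    """Process `lanes` pixels per step, mimicking an integer SIMD unit.
    SSE2 = 16 byte lanes per instruction, AVX2 = 32 byte lanes."""
    total = 0
    for i in range(0, len(block_a), lanes):
        total += sum(abs(a - b)
                     for a, b in zip(block_a[i:i + lanes],
                                     block_b[i:i + lanes]))
    return total

a = [10, 20, 30, 40] * 8   # 32 pixels
b = [12, 18, 33, 37] * 8
assert sad_scalar(a, b) == sad_chunked(a, b, 32) == 80  # AVX2: one step
```

With 32 byte lanes the whole 32-pixel row is one step instead of 32, which is the kind of gain the integer-SIMD kernels in an encoder see.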
> AVX only extended floating-point operations to 256-bit. x264 uses integer operations. AVX2 offers 256-bit integer vector operations. AVX-512 does not extend them to 512-bit, for now.

So why is AVX2 not gaining a lot on x264? A few percent only.
So why is AVX2 not gaining a lot on x264? A few percent only.
> Because AVX doesn't gain anything on C code, which rules out half of x264. Only a few algorithms in x264 gain from AVX or AVX2. That doesn't mean other types of software only improve by a couple of percent; it isn't that easy.

I wasn't the one who brought x264 to the discussion.
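The "rules out half of x264" argument is just Amdahl's law. A back-of-the-envelope sketch (the fractions and speedups below are illustrative, not measured figures):

```python
# Amdahl's law: overall speedup when only a fraction p of the runtime
# is vectorizable, and that fraction runs s times faster.
def amdahl(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# If, say, only 20% of an encoder's time is in AVX2-friendly kernels
# and those kernels double in speed, the whole program gains ~11%:
print(round(amdahl(0.20, 2.0), 2))  # -> 1.11
```

Even a large per-kernel gain translates into single-digit percentages overall once the non-vectorizable C code dominates the runtime.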
Interestingly, it's not exactly new. It's for the most part the Xeon Phi ISA, made compatible with the legacy 256-bit and 128-bit instructions.

BenchPress,
Could you give us more detailed thoughts on what you think of this new ISA extension?
I'm all ears!
> So why is AVX2 not gaining a lot on x264? A few percent only.

It's a programming problem.
> It's a programming problem.
OpenCL is the best way to harness the power that AVX2 provides. Or HSA with a legacy fallback.
Meh, OpenCL is horribly clunky in my eyes.
Going against some of my earlier stances, I actually have big hopes for autovectorization with AVX-512. It has gather, scatter, operation masking, and plenty of other cool little tricks which should finally make it feasible to really crack autovectorization. It is, frankly, going to be awfully close to a GPU. Not surprising, given that the instructions are very much inspired by the Phi (and hence by Larrabee).
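To illustrate why per-lane masking matters for autovectorization, here is a Python sketch of the semantics only (the helper name is made up, not a real intrinsic): a loop with a data-dependent `if` becomes a single branch-free masked operation, which is exactly the shape a compiler can vectorize mechanically:

```python
# Emulate AVX-512-style predication: instead of branching per element,
# compute a mask and apply the operation in every lane, keeping the
# old value where the mask is false.

def masked_add(dst, src, mask):
    """dst[i] + src[i] where mask[i] is set, else dst[i] unchanged."""
    return [d + s if m else d for d, s, m in zip(dst, src, mask)]

# The scalar loop
#     for i in range(n):
#         if x[i] > 0:
#             y[i] += x[i]
# becomes one mask computation plus one masked vector op:
x = [3, -1, 5, -2]
y = [10, 10, 10, 10]
mask = [v > 0 for v in x]
print(masked_add(y, x, mask))  # -> [13, 10, 15, 10]
```

Before opmasks, the compiler had to prove the branch away or give up; with them, any conditional body can be emitted unconditionally under a predicate.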
> So what other programs that matter to the end user would benefit a lot from these wide vectors?

That's a bit like asking what programs benefit from having more execution units per core. Sure, some have more instruction-level parallelism (ILP) than others, but you can't draw a line between the ones that do benefit and the ones that don't. Likewise, these wide vector instructions are very generic, and they can be used to extract any data-level parallelism (DLP).
> AVX2 had gather, but not scatter, correct?

Yes. But the next AVX implementation should support scatter.
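For reference, a scalar Python sketch of what gather and scatter mean: vector loads and stores through an index vector rather than from contiguous memory, which is what irregular, index-chasing loops need in order to vectorize at all:

```python
# Gather: one vector load from non-contiguous locations (AVX2 has this).
def gather(memory, indices):
    return [memory[i] for i in indices]

# Scatter: one vector store to non-contiguous locations
# (missing from AVX2, added later in AVX-512).
def scatter(memory, indices, values):
    for i, v in zip(indices, values):
        memory[i] = v

mem = [0, 10, 20, 30, 40]
print(gather(mem, [4, 0, 2]))    # -> [40, 0, 20]
scatter(mem, [1, 3], [-1, -3])
print(mem)                       # -> [0, -1, 20, -3, 40]
```

Without scatter, a loop that writes `a[idx[i]] = v[i]` cannot be expressed as a single vector store, so the compiler falls back to scalar code for the write side.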
But this is not a huge problem. With AVX, Intel wants very efficient execution of MIMD->SPMD algorithms. This is a good thing, but there isn't any good standardized SPMD extension for C++. C++AMP has one, but WARP doesn't support AVX/AVX2. OpenCL is the only way right now to support AVX2 with good efficiency. Also, in OpenCL it is easier to compile a pre-vectorized input to the GPU. There are some other problems. Doing a divergent branch on an AVX unit is worse than doing it on a modern GPU. The CPUs don't have the same flexibility in their vector units.

What we need now is a good standardized SPMD extension for C++. This is a must. C++AMP is also a good solution, but then WARP must support AVX and AVX2.
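On the divergence point: when SPMD code runs on a wide vector unit (as in a GPU warp), a data-dependent if/else executes *both* sides across all lanes under complementary masks, so a divergent branch costs roughly the sum of both paths. A rough sketch of those semantics:

```python
def diverge(x):
    # Vectorized if/else: both branches are evaluated over every lane,
    # and the results are blended by the mask afterwards, so total
    # work = work(then-branch) + work(else-branch).
    mask = [v >= 0 for v in x]
    then_result = [v * 2 for v in x]   # executed for every lane
    else_result = [-v for v in x]      # also executed for every lane
    return [t if m else e
            for m, t, e in zip(mask, then_result, else_result)]

print(diverge([1, -2, 3, -4]))  # -> [2, 2, 6, 4]
```

GPUs have dedicated hardware for managing these per-lane predicates and reconverging; on a CPU vector unit, the masking and blending has to be orchestrated in the instruction stream, which is part of why divergence hurts more there.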
> Doing a divergent branch on an AVX unit is worse than doing it on a modern GPU. The CPUs don't have the same flexibility in their vector units.

What makes you think that?
> no vector-op masking, which is a very big thing.

Same question.
> So you can expect to see many new applications. Stuff that hasn't been done (successfully) before. Use your imagination, and someone might create it.

I highly doubt that a killer app will appear. If such a killer app existed, don't you think nVidia would have shown it, given how long they've been claiming GPGPU was the next big thing?