The Official AVX2 Thread

Olikan · Jun 14, 2012

posts about avx-2 comming in ..........3.......2........1.......

This thread was created from posts that were contributed to another CPU thread but were derailing that thread. The OP's post here was one of those posts, not an issue on its own (the OP is not sanctioned) but its just where we started the thread so there you go.

Administrator Idontcare

KompuKare · Jun 14, 2012

Meh, AVX2 is just around the corner and will eclipse everything that has gone before by speeding every program x10 fold just be recompiling. It will finally live up to the hype of Larrabee, Itanium and even the old i860 DSP. All other processor makers will be doomed and all other foundries will henceforth only make memory chips...

@Olikan: was that what you had in mind...

CPUarchitect · Jun 15, 2012

KompuKare said:
Meh, AVX2 is just around the corner and will eclipse everything that has gone before by speeding every program x10 fold just be recompiling.

Actually scalar floating-point code can become up to 16x faster. But the worst case for parallel compute limited code is 2x. Bandwidth also goes up by 2x and there's gather support, so that should be a pretty consistent minimum.

Nemesis 1 · Jun 15, 2012

podspi said:
Seems kind of odd, given how important memory bandwidth is for APUs...

Also, I honestly do not understand why AVX2 always comes up in discussions of HSA/FSA/OpenCL. I see people continually framing the discussion as "AVX2 vs. heterogeneous computing" and I just do not think that is the case. We've seen Intel and AMD release OpenCL-compatible CPUs, and we will see both Intel and AMD introduce AVX2-compatible CPUs.

The main benefit of HSA, as far as I can tell, is making it easier for developers to extract processing power out of a heterogeneous system. While I agree that a homogeneous system will probably always be "the fastest", I think it makes a lot of sense for developers to be able to use as much processing power as possible, from as many sources as possible. Especially in mobile situations where you might not have an abundance of any one computing resource.

Look I understand that AMD can do AVX but what makes you think AMD will be able to use AVX2 . Intel won't stop them . But Intel will not allow AMD to use the vec. prefix now or ever. I know with AVX amd was able to use There prefix with AVX . I not sure AMD has a anything they can use for AVX2. I really don't see why AVx can't work with open CL either. AMD new about AVX long ago but AVX2 when announced they didn't know about . As intel has that locked up with the vecprefix Flag football going on big time here. Its not likely that AMD will not be able to use the same code path.

Riek · Jun 15, 2012

Nemesis 1 said:
Look I understand that AMD can do AVX but what makes you think AMD will be able to use AVX2 . Intel won't stop them . But Intel will not allow AMD to use the vec. prefix now or ever. I know with AVX amd was able to use There prefix with AVX . I not sure AMD has a anything they can use for AVX2. I really don't see why AVx can't work with open CL either. AMD new about AVX long ago but AVX2 when announced they didn't know about . As intel has that locked up with the vecprefix Flag football going on big time here. Its not likely that AMD will not be able to use the same code path.

AMD can use the VEX prefix tag without problems. What AMd cannot do is put their own defined operations with those prefix. Heck why XOP is not using the prefix. But they are free to use it for the istructions defined by intel.

(also you had that discussion months ago, multiple times... apparently you didn't learn it... although you said you got it back then.)

inf64 · Jun 15, 2012

I'm reading the posts of this Nemesis poster and I can't believe my eyes. So much absurd and untrue statements that it's incredible. At one point I was sure he is yanking our chains but now I really think he has no clue what he is talking about. Amazing.

bronxzv · Jun 15, 2012

Nemesis 1 said:
Look I understand that AMD can do AVX but what makes you think AMD will be able to use AVX2 .

not sure what you are after but the VEX prefix is the same for AVX2 and AVX, AMD already supports AVX (thus VEX) moreover most people now talk about AVX2 as including FMA3 so AMD already support part of AVX2 in Trinity

SocketF · Jun 15, 2012

bronxzv said:
not sure what you are after but the VEX prefix is the same for AVX2 and AVX, AMD already supports AVX (thus VEX) moreover most people now talk about AVX2 as including FMA3 so AMD already support part of AVX2 in Trinity

No, AMD supports partly AVX2 functionality. But as long as the decoder does not decode the explicit AVX2 instructions into internal µOps, AVX2 code wont work.

AMD can only activate that, if they can support all AVX2 instructions. There are some rather strange (though useful) scatter/gather instructions, I wonder if they can "emulate" this things easily with their current architecture.

For FMA3 things were easy, because FMA3 has no other/new functionality compared to FMA4. So an easy decoder update was enough ;-)

bronxzv · Jun 15, 2012

SocketF said:
No, AMD supports partly AVX2 functionality.

thus my "AMD already supports *part* of AVX2" comment, there is several AVX2 related feature flags in at least 2 distinct CPUID leaves

SocketF said:
There are some rather strange (though useful) scatter/gather instructions,

nothing strange about the vgatherx/vpgatherx series of instructions, they are perfectly documented and used by existing compilers (for HNI targets) and the SDE emulator, there is no scatter instructions in AVX2, though

CPUarchitect · Jun 15, 2012

podspi said:
Also, I honestly do not understand why AVX2 always comes up in discussions of HSA/FSA/OpenCL. I see people continually framing the discussion as "AVX2 vs. heterogeneous computing" and I just do not think that is the case. We've seen Intel and AMD release OpenCL-compatible CPUs, and we will see both Intel and AMD introduce AVX2-compatible CPUs.

Only one of these technologies will prevail. There is no room for both since both attempt to cover the need for general purpose throughput computing. History shows that incompatible competing technologies cannot coexist. Think about AMD64 versus IA64: Itanium is practically dead. Think about 3DNow! versus SSE: Bulldozer no longer supports 3DNow!.

So the question now is which is the superior throughput computing technology: homogeneous AVX2+ or heterogeneous GPGPU? And yes, both companies will support both for a while, but they have a different idea of what to focus on. There's a lot at stake for AMD since it's sacrificing CPU performance to make the GPU more powerful, in an attempt to make GPGPU more attractive. Not just that, it's also sacrificing graphics performance. As illustrated by NVIDIA's Fermi and Kepler, graphics and GPGPU require different architectures. HSA leans very much toward GPGPU, which compromises graphics.

Intel doesn't make any sacrifices. It already has a superior CPU architecture and it will be the first to add high throughput performance to it using AVX2. Even when AMD implements AVX2, there will still be a big difference in computing density because of Bulldozer's shared SIMD cluster architecture. There's also no sign of Intel sacrificing graphics performance for the sake of GPGPU. And last but definitely not least, AVX2 is much easier to adopt by developers than GPGPU, and will offer more consistent performance across system configurations.

The main benefit of HSA, as far as I can tell, is making it easier for developers to extract processing power out of a heterogeneous system.

Easier, yes, but it will never be easy. In fact heterogeneous computing becomes harder when things scale up. So they're fighting an uphill battle. The only way to guarantee that it doesn't suffer from bad latency and bandwidth scaling, is to fully merge the GPU technology into the CPU. And that's what AVX2 already does!

It's no coincidence that Intel's Knights Corner chip, which is pretty much a GPU architecture (minus the graphics components), uses an instruction set that has a very close resemblance to AVX2.

So it's inevitable that things will converge into a single architecture. All general purpose computing will happen on the CPU. The GPU either has to become fully focused on graphics, or the programmable shaders too get processed on the CPU and the GPU decays into some fixed-function units that act as peripheral components which assist the CPU in graphics processing.

AMD desperately wants the CPU and GPU to remain heterogeneous, but in doing so it ironically converges them closer together, making the case for AVX2 and its successors.

pelov · Jun 15, 2012

CPUarchitect said:
It's also no coincidence that Intel's Knights Corner chip, which is pretty much a GPU architecture (minus the graphics components), uses an instruction set that has a very close resemblance to AVX2.

It's x86-based. Larrabee 1.0 was also x86-based as well. Do you remember what the biggest drawback for Larrabee was? It was in being x86-based

Developers HATED it with a passion. You can't start throwing around x86 everywhere and expect people to follow along, particularly the GPU space. It's completely unnecessary.

AVX2 is an instruction set, HSA isn't. You seem to be confusing those. What stops AMD from adopting AVX2 on their CPUs? Considering their addition of FMA3 as well as FMA4 you'd figure they almost certainly will. But furthermore, what's to stop AMD from adding AVX2 to their GPUs as well? Nothing.

It is converging to a single SoC style architecture but with different routes and ideals in mind. Intel is looking to get there via instruction sets and x86 while AMD is looking past x86 and embracing and implementing HSA in such a way that all microarchitectures, regardless of who makes them, can benefit. openCL/HSA aren't proprietary anything.

CPUarchitect said:
There's a lot at stake for AMD since it's sacrificing CPU performance to make the GPU more powerful, in an attempt to make GPGPU more attractive. Not just that, it's also sacrificing graphics performance. As illustrated by NVIDIA's Fermi and Kepler, graphics and GPGPU require different architectures. HSA leans very much toward GPGPU, which compromises graphics.

Considering the modern era of computing, I'm pretty sure that's a good bet. What amazes me about posts like this is that people like yourself think this frame of thought only applies to AMD yet they completely neglect IB/Haswell/Sky Lake and how all 3 of those are GPU-focused architectures. So while you, and others, may throw around CPU benchmarks as if they mean something, the majority of users across all platforms are better served with a quicker SSD, GPU performance and display resolutions rather than a quicker pass through Pi. If laptops outsell desktops 3:1 (I mentioned 2:1 earlier, it's far closer to 3:1 in reality) and tablets and smartphones sell like hotcakes, what in the world makes you think we need more CPU processing power? You and me? Sure. But what about pretty much everyone else? They couldn't care less. They want a higher res screen, larger storage space and thinner design. Intel didn't bring about Ultrabooks because we needed more raw CPU throughput.

And that's the point. While AVX2 is just another instruction set that requires recompiling and benefits only an incredibly small base of hardware, openCL has been adopted by pretty much every single hardware maker worth their salt, even Intel and nVidia, to leverage the single piece of hardware that's making the greatest gains in your typical device: your GPU.

bronxzv · Jun 15, 2012

pelov said:
It's x86-based. Larrabee 1.0 was also x86-based as well. Do you remember what the biggest drawback for Larrabee was?

the fact that it wasn't released, isn't it?

pelov said:
It was in being x86-based Developers HATED it with a passion.

source for this BOLD statement ?

See for example Richard Huddy comment here;
http://www.bit-tech.net/hardware/graphics/2011/03/16/farewell-to-directx/1
"[...]and I guess it was actually the primary appeal of Larrabee to developers – not the hardware, which was hot and slow and unimpressive, but the software – being able to have total control over the machine[...]'

I can assure you that he knows very well what real developers want, real development teams care about development tools and software lifecycles not about the latest x86 bashing thread at slashdot.net

pelov said:
even Intel and nVidia, to leverage the single piece of hardware that's making the greatest gains in your typical device: your GPU.

look around for benchmarks comparing Open CL on Ivy Bridge GPU vs CPU and we will have some basis for discussion, it will be far better to have simply twice the CPU cores than this unmanageable (for complex projects) "heterogeneous" thingy, people (see AMD misleading HSA slides) keep comparing GPU+CPU vs CPU when the sensible comparison is GPU+CPU vs CPU+CPU (i.e. constant chip area and/or power), AMD claims 3:1 power adavantage for GPU:CPU but they are comparing with their worst ever CPU architecture and a soon obsolete ISA

pelov · Jun 15, 2012

As a consequence of underpowered hardware and questionable driver decisions, Intel was always ridiculed by gaming development teams and cursed upon whenever a publisher would force that the game supports Intel's integrated graphics - resulting in a paradox of having the best CPU and the worst GPU [several key developers warned us about never writing GMA graphics as a "GPU"]. In fact, during the preparation of this story a certain affair involving GMA graphics and driver optimizations in 3DMark Vantage broke out courtesy of Tech Report. Tim Sweeney of Epic Games lacked courtesy of Intel's graphics capabilities commenting that "Intel's integrated graphics just don't work. I don't think they will ever work." But that statement could be considered as courtesy compared to his latter statement "[Intel] always say 'Oh, we know it has never worked before, but the next generation ...' It has always been the next generation. They go from one generation to the next one and to the next one. They're not faster now than they have been at any time in the past."

http://www.brightsideofnews.com/pri...ient-truth-intel-larrabee-story-revealed.aspx

bronxzv said:
look around for benchmarks comparing Open CL on Ivy Bridge GPU vs CPU and we will have some basis for discussion, it will be far better to have simply twice the CPU cores than this unmanageable (for complex projects) "heterogeneous" thingy

You can say the same about nVidia and their openCL numbers. nVidia and Intel each have a separate horse in the race (nVidia has CUDA and Intel has anything-but-a-GPU for compute).

bronxzv said:
it will be far better to have simply twice the CPU cores than this unmanageable (for complex projects) "heterogeneous" thingy

Like gaming, for instance? Or driving a GUI? Because CPU cores can't do that alone. The reason we have GPUs is because CPUs sucked at each task. As resolutions get bigger you'll need bigger GPUs and both Intel and AMD know this. CPU performance has taken a back seat to GPUs. If you look at their upcoming CPU architectures you'll see they're both doing the same thing: addressing GPU performance. The biggest difference is that AMD has stated it publicly while Intel can't.

The point of HSA is to take advantage of that beefier GPU and have it do something rather than sit idle. That's all. And it's not like we're stopping on GPU power. In fact we're accelerating at a far faster rate than CPU performance is and it has to be that way to drive an ever-accelerating pixel count. Compare 7xxx to 6xxx, or Kepler to Fermi. Those are massive upgrades. Now compare your SB-E to a 920 or Thuban to a Phenom I (I won't mention Bulldozer as that was a fail of an epic magnitude). For near 95%> users they wouldn't be able to tell a difference.

bronxzv · Jun 15, 2012

pelov said:
Like gaming, for instance? Or driving a GUI? Because CPU cores can't do that alone.

The GUI ? you're kidding us ? even for games, sorry, I don't see why we need such a clumsy architecture, do you really think that, let's say 20 years down the road, there will be still separate "GPU" and "CPU" ?

pelov · Jun 15, 2012

bronxzv said:
The GUI ? you're kidding us ?

Try driving something with lots of sparkles via CPU like compiz 3D. Yes, GUI. You need a compatible 3D-capable GPU or on-die GPU.

bronxzv said:
even for games, sorry, I don't see why we need such a clumsy architecture, do you really think that, let's say 20 years down the road, there will be still separate "GPU" and "CPU" ?

Sorry, what side of the fence are you on again? Isn't it AMD that's looking to merge the two and not Intel? Because you're arguing FOR HSA now.

It's no secret GPUs are better at FP-tasks. They have always been and they will always be. They're specifically built with that single purpose in mind. CPUs, otoh, have to be able to do it all. Well, why not have the GPU take over the FPUs place? Or if you want to look at it the other way, why not have the FPUs grow up to be big boy GPUs? That's the point here. While Intel is looking to keep those two completely separate (other than specific hardware dedicated to specific tasks like Quicksync. Even still the HDxxxx can't do hardware accelerated GPGPU), AMD, ARM and Apple, etc. are looking to merge the two together for GPU computing.

AVX2 is an instruction set. Why are people forgetting this? AMD can embrace AVX2 adoption and almost certainly will. Moreover, AMD can embrace AVX2 adoption even better than Intel can because their GPUs can utilize AVX2 as well. What the hell does AVX2 have to do with HSA?

bronxzv · Jun 15, 2012

pelov said:
http://www.brightsideofnews.com/pri...ient-truth-intel-larrabee-story-revealed.aspx

don't see how it's related with Sweeney being against or pro x86, I'm quite sure his team will have released their own version of the Unreal engine with a richer/faster code than Intel's own by bypassing the Intel default DX compatibility stack

CPUarchitect · Jun 15, 2012

pelov said:
It's x86-based. Larrabee 1.0 was also x86-based as well. Do you remember what the biggest drawback for Larrabee was? It was in being x86-based Developers HATED it with a passion.

That's absurd because Larrabee was never released to the public.

The reason Larrabee failed was because it was aiming to compete with highly dedicated discrete graphics cards from NVIDIA and AMD, while also being suitable for HPC purposes. It succeeded in the latter market, hence the Knights Corner successor. And so you can't use Larrabee as an argument against AVX2 for general purpose throughput computing.

AVX2 is an instruction set, HSA isn't. You seem to be confusing those.

AVX2 in the strictest sense is indeed just an instruction set, but there's an underlying micro-architecture to support that instruction set! So what really matters is that Haswell's hardware will have wide vector units, gather support, fused multiply-add, high cache bandwidth, etc. So I could just talk about Haswell versus HSA, but AVX2 will outlive Haswell. Hence it's more useful to group all of the involved hardware under a broader definition of AVX2.

And in fact while HSA isn't an instruction set, it's also just a collection of concepts which wouldn't be very meaningful without an actual micro-architecture which is specifically designed to support it. So in many ways AVX2 in the broader sense and HSA in the broader sense are closely comparable.

What stops AMD from adopting AVX2 on their CPUs?

Nothing is. But please read my post above again. AMD and Intel have a different focus and a different interest.

But furthermore, what's to stop AMD from adding AVX2 to their GPUs as well? Nothing.

No, that's not an option for AMD. AVX2 contains instructions which require the general purpose registers. So you'd basically need all of x86 in the GPU. Which means it would actually be a homogeneous architecture. The whole point of HSA is to have non-x86 processing.

...what in the world makes you think we need more CPU processing power?

While AVX2 is just another instruction set that requires recompiling and benefits only an incredibly small base of hardware, openCL has been adopted by pretty much every single hardware maker worth their salt, even Intel and nVidia, to leverage the single piece of hardware that's making the greatest gains in your typical device: your GPU.

Why is it that you think more general-purpose CPU processing power is meaningless while more general-purpose GPU processing power is the best thing since sliced bread? It's both just computing.

But the difference is that while AVX2 requires a recompile, OpenCL requires a rewrite! That's far more developer effort. Furthermore, there's no guarantee that a GPU will outperform a CPU. As shown many times before, even a 3 TFLOPS GTX 680 can lose against a 230 GFLOPS CPU, and that's before AVX2!

pelov · Jun 15, 2012

bronxzv said:
don't see how it's related with Sweeney being against or pro x86, I'm quite sure his team will have released their own version of the Unreal engine with a richer/faster code than Intel's own by bypassing the Intel default DX compatibility stack

It's unnecessary. It's Intel sticking x86 onto something that has absolutely no need for x86.

Further, the performance was absolutely dreadful and even the team that worked on Larrabee badmouthed it to no end. It was essentially a P4/Bulldozer on the GPU.

CPUarchitect said:
No, that's not an option for AMD. AVX2 contains instructions which require the general purpose registers. So you'd basically need all of x86 in the GPU. Which means it would actually be a homogeneous architecture. The whole point of HSA is to have non-x86 processing.

My point was AMD can make an MIC as well. In fact, they could have but I think they came to the conclusion that it was unnecessary. Take a gander at how well nVidia has done in HPC without it.

AVX2 is paradoxically more useful in the place where it makes sense less: HPC. x86 is completely unnecessary in HPC for the GPGPU portion, but it's also the one segment of computing where recompiling and optimization to specific ISAs and architectures is far more likely.

bronxzv · Jun 15, 2012

pelov said:
It's no secret GPUs are better at FP-tasks.

nope, by far not all FP tasks, they are good mostly for dumb brute force very simple algorithms I'll say

bronxzv · Jun 15, 2012

pelov said:
It's unnecessary

indeed it's not important, so don't put x86 into this discussion

CPUarchitect · Jun 15, 2012

pelov said:
It's no secret GPUs are better at FP-tasks.

That's a false blanket statement. A quad-core Haswell processor with a GT2 iGPU will have more floating-point processing power on the CPU end. Likewise there are mainstream Sandy Bridge models that have more CPU power too.

And there really is no reason for a GPU to be better at it. Sure, they've had a phenomenal increase in floating-point performance in the past decade. This was thanks to multi-core and wide vectors. But now that evolution is stagnating. They've simply reached the point where they can no longer increase the computing density, without sacrificing other components which it needs to feed those computing resources. In the mean time the CPU is catching up, by going multi-core and adding wider vectors as well. A quad-core Haswell will be 32 times more powerful than a single-core Pentium 4 at the same frequency!

pelov · Jun 15, 2012

CPUarchitect said:
That's a false blanket statement. A quad-core Haswell processor with a GT2 iGPU will have more floating-point processing power on the CPU end. Likewise there are mainstream Sandy Bridge models that have more CPU power too.

It's a blanket statement that for the most part is true. We're not gaming with our CPUs driving the pixels/triangles anymore and neither are the theoretical GFLOPs any better, single or double precision. Compare an equal costing CPU with GPU and you'll see the difference.

And there really is no reason for a GPU to be better at it.

You're right, but if you keep improving the FPUs in a CPU you'll achieve the same thing AMD is hoping to do with HSA. It's the same goal.

AMD/Intel aren't going in separate directions. In fact, they're headed in the exact same direction but taking parallel routes: AMD branching off x86 and adopting an open standard while Intel is looking to cement x86 into the GPU portion as well - -though that hinges on whether or not Larrabee 2.0 aka MIC is another flop. If it is then they might have screwed the GPGPU pooch for good.

That's my biggest issue with this. I dislike proprietary anything. It stifles competition and means the consumer gets less. AMD's approach is to get everyone to play with their toys in the same sandbox while Intel is off in a completely separate sandbox with only AMD able to enter.

bronxzv · Jun 15, 2012

pelov said:
It's a blanket statement that for the most part is true. We're not gaming with our CPUs driving the pixels/triangles anymore

in modern engines the trend is clearly toward more complex hybrid renderers with not only triangles but PBR (point based representation) and/or voxels for light source rays / GI effects, IBR (image based rendering) to capitalize on temporal coherence for far details + a lot more tricks, the new frontier is programmer's productivity, not raw FLOPS or "triangles per second" anymore

beginner99 · Jun 15, 2012

Noob QUestion:

If AVX2 really is such a killer, why hasn't it been implemented like 10 years ago?

ShintaiDK · Jun 15, 2012

beginner99 said:
Noob QUestion:

If AVX2 really is such a killer, why hasn't it been implemented like 10 years ago?

Same reason we didnt have SB/IB 10 years ago.

The Official AVX2 Thread

Platinum Member

Golden Member

Senior member

Lifer

Senior member

Diamond Member

Senior member

Senior member

Senior member

Senior member

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Senior member

Senior member

Diamond Member

Senior member

Senior member

Senior member

Diamond Member

Senior member

Diamond Member

Lifer