AMD Fusion and GPGPU killer app

BenchPress · Dec 21, 2011

frostedflakes said:
AMD desperately needs more developers embracing OpenCL like this. GPGPU is pretty much the only area they have a solid advantage over Intel and could take the performance crown away from them.

Indeed AMD should be desperate since Intel is bringing GPU technology right into the CPU, with AVX2. It features a vector version of every scalar operation, just like a GPU, allowing to parallelize code by a factor 8. Most importantly it can do it with far fewer threads so it doesn't suffer so much from Amdahl's Law.

So AMD has got nothing on Intel. Two wrongs don't make a right. Fusion is a monstrosity with a crippled CPU side and they're putting all their eggs in the GPU's basket. GPGPU is proving to be a FAIL though and the few things it does right, AVX2 enabled CPU cores can do better.

KingFatty · Dec 21, 2011

As for the disk compression and the argument that "it's been done before" - when it was done before, I believe it was based on a CPU that was so pitifully slow compared to a modern GPGPU.

So isn't it a bad argument to say that it's been done before, when really it hasn't because there was no such thing as a modern GPGPU back then? Like saying you can't make a car that goes 200 mph fast, because they already tried making horses run that fast and the horses just died, it's been done before and didn't work...

Phynaz · Dec 21, 2011

ncalipari said:
It's never been tried before with as much computational power as GPGPU have.

It's been done for many years.

http://www.indranetworks.com/products.html

http://www.aha.com/index.php/products-2/data-compression/

ncalipari · Dec 21, 2011

Phynaz said:
It's been done for many years.

http://www.indranetworks.com/products.html

http://www.aha.com/index.php/products-2/data-compression/

it's like saying that multithreaded computation has been done for years because we had dual processor MBs.

GPGPU is a completely different environment, with different limits and strenght.

Please don't oversimplify it.

Chiropteran · Dec 21, 2011

Phynaz said:
It's been done for many years.

http://www.indranetworks.com/products.html

http://www.aha.com/index.php/products-2/data-compression/

What is this? Just a few posts above you said there was no money in this, now you supply us with a link to a whole company that offers this as a product. How do they pay their employees?

Phynaz · Dec 21, 2011

Chiropteran said:
What is this? Just a few posts above you said there was no money in this, now you supply us with a link to a whole company that offers this as a product. How do they pay their employees?

I did? I must be having a stroke, because I can't remember that. Mind finding that post for me?

A5 · Dec 21, 2011

BenchPress said:
Indeed AMD should be desperate since Intel is bringing GPU technology right into the CPU, with AVX2. It features a vector version of every scalar operation, just like a GPU, allowing to parallelize code by a factor 8. Most importantly it can do it with far fewer threads so it doesn't suffer so much from Amdahl's Law.

So AMD has got nothing on Intel. Two wrongs don't make a right. Fusion is a monstrosity with a crippled CPU side and they're putting all their eggs in the GPU's basket. GPGPU is proving to be a FAIL though and the few things it does right, AVX2 enabled CPU cores can do better.

You're half-right about AVX (in that it exists and enables 8x SIMD in the CPU), but it's not aiming to solve that problem on the same scale as GPGPU.

GPGPU hasn't done anything in the consumer space yet (due to a number of factors, mostly the small market with capable hardware and the competing programming standards), but it's growing steadily in the academic and business worlds because it lets you process massive datasets much more quickly than a bunch of CPUs in the same TDP and price envelopes. Even Intel acknowledges that this is a separate market (Knights Corner), so I'm not sure where your stance comes from besides wanting to hate on AMD.

Chiropteran · Dec 21, 2011

Phynaz said:
I did? I must be having a stroke, because I can't remember that. Mind finding that post for me?

Lets see, you wrote:

Really, you're not getting it. It's been tried before. The same argument was used before. It doesn't pay. The overhead is too large.

You said "It doesn't pay", plain as day.

And then apparently after you got called out on it, you went back and edited to add:

The only place it's used today is in WAN links.
Last edited by Phynaz

Phynaz · Dec 21, 2011

Chiropteran said:
You said "It doesn't pay", plain as day.

"It doesn't pay" has nothing to do with money. Put that line in the context of what I wrote and it means that there is no improvement in disk throughput.

I can only assume that you must not be a native English speaker to have not heard that phrase before.

Chiropteran said:
And then apparently after you got called out on it, you went back and edited to add:

Nope. Check the timestamp of the edit. It was a few minutes after I posted.

Chiropteran · Dec 21, 2011

Phynaz said:
"It doesn't pay" has nothing to do with money. Put that line in the context of what I wrote and it means that there is no improvement in disk throughput.

Okay, I'll humor you. When you said "it doesn't pay" you meant that the performance wasn't improved by the compression, right?

So again I say, you just gave us links to companies that offers this as a product, presumably as a way to make money. How do these companies exist at all if the product the sell "doesn't pay"? What companies buy the products if they offer no improvident?

There are holes in your argument.

Phynaz · Dec 21, 2011

Chiropteran said:
Okay, I'll humor you. When you said "it doesn't pay" you meant that the performance wasn't improved by the compression, right?

So again I say, you just gave us links to companies that offers this as a product, presumably as a way to make money. How do these companies exist at all if the product the sell "doesn't pay"? What companies buy the products if they offer no improvident?

There are holes in your argument.

Those are enterprise class hardware solutions, not consumer level software solutions.

The only holes are the ones you are failing to recognize in your own posts.

ncalipari · Dec 21, 2011

Phynaz said:
Those are enterprise class hardware solutions, not consumer level software solutions.

The only holes are the ones you are failing to recognize in your own posts.

now that a great part of the market has GPGPU capabilities in their devices, expect to see much more of this.

Chiropteran · Dec 21, 2011

Phynaz said:
Those are enterprise class hardware solutions, not consumer level software solutions.

The only holes are the ones you are failing to recognize in your own posts.

I wonder why this technology has only caught on in the enterprise class. Could it possibly be due to cost of hardware?

Do you think that having a built in GPUs on cheap commodity hardware combined with GPU accelerated software might help the spread of this technology?

Phynaz · Dec 21, 2011

Chiropteran said:
I wonder why this technology has only caught on in the enterprise class. Could it possibly be due to cost of hardware?

Do you think that having a built in GPUs on cheap commodity hardware combined with GPU accelerated software might help the spread of this technology?

No, I do not. Again, it comes down to overhead. Have you ever tried reading data in and out of a GPU? It's slow, slow slow.

Chiropteran · Dec 21, 2011

Phynaz said:
No, I do not. Again, it comes down to overhead. Have you ever tried reading data in and out of a GPU? It's slow, slow slow.

We are talking about compressing data that is being written or read from a hard disk drive.

I am not a technical expert, but I do know that reading data from memory, whether it's system RAM or some sort of dedicated GPU RAM, is an order of magnitude faster than reading the same data from a spindle hard drive.

It's not slow, slow slow, or slow slow slow. So what is your next objection?

Phynaz · Dec 21, 2011

What happens to your game frame rate when you have to texture from main memory? That's writing to a GPU.

Reading back into system memory from the GPU is even massively slower than that. Really, go ahead and Google it.

You seem to be under the impression that the compressed bitstream doesn't have to get into and out of the video card. And by your own admission writes are slow.

iCyborg · Dec 21, 2011

Phynaz said:
What happens to your game frame rate when you have to texture from main memory? That's writing to a GPU.

Reading back into system memory from the GPU is even massively slower than that. Really, go ahead and Google it.

You seem to be under the impression that the compressed bitstream doesn't have to get into and out of the video card. And by your own admission writes are slow.

On Fusion which is the topic here, CPU and GPU share RAM, there's no writing to GPU, so your argument doesn't hold there.
And even for discrete if this was so massively slow slow slow, then CPUs would be doing rendering instead of GPUs - obviously it's fast enough that you can offset it with GPU's compute power for appropriate tasks.

Vesku · Dec 21, 2011

The unified RAM addressing hasn't currently been implemented. Not sure if it is showing up with AMD GCN or the generation after.

ncalipari · Dec 21, 2011

Vesku said:
The unified RAM addressing hasn't currently been implemented. Not sure if it is showing up with AMD GCN or the generation after.

well it's partially supported by CUDA and not yet supported by OpenCL, but you can very well use one of the existing software layers to simulate that.

http://mathema.tician.de/software/pyopencl

Chiropteran · Dec 21, 2011

Phynaz said:
What happens to your game frame rate when you have to texture from main memory? That's writing to a GPU.

You are talking about how FPS drops because main memory is slower than GPU RAM?

Wanna know what is even slower than both, by about 10 fold difference or more? Your hard disk drive. Compared to a hard disk drive, reading or writing to video card memory is much faster, and no amount of rhetoric is going to change that fact.

Phynaz said:
You seem to be under the impression that the compressed bitstream doesn't have to get into and out of the video card. And by your own admission writes are slow.

A PCI-E 2.0 x16 slot video card could theoretically move the data back and forth at 8GB/sec. The fastest SSD hard drives in existence struggle to approach .5GB/sec

You seem to be under the impression that a 8GB/sec limitation is going to hold back the performance of a .5GB/sec peak device.

It won't, you are wrong.

iCyborg · Dec 21, 2011

Vesku said:
The unified RAM addressing hasn't currently been implemented. Not sure if it is showing up with AMD GCN or the generation after.

Unified RAM addressing will make programming easier, as now using relatively low-level OpenCL or DirectCompute is still quite cumbersome, CUDA is better in that respect. The point is that data doesn't go over an external bus for Fusion compared to dGPU.

Phynaz · Dec 21, 2011

Chiropteran said:
You seem to be under the impression that a 8GB/sec limitation is going to hold back the performance of a .5GB/sec peak device.

It won't, you are wrong.

Find a video card that can write system memory anywhere near that - even 1/4th of that, and then we will talk. Until then we are done, as you just want to argue and not actually gain any knowledge.

BenchPress · Dec 21, 2011

A5 said:
You're half-right about AVX (in that it exists and enables 8x SIMD in the CPU), but it's not aiming to solve that problem on the same scale as GPGPU.

What makes you think that? A quad-core Haswell CPU will deliver 500 GFLOPS. For comparison, the A8-3850's GPU tops out at 480 GFLOPS. Although Haswell is still more than a year out, again keep in mind that GPGPU has a really hard time achieving high efficiency. Also, they can't exceed the CPU's bandwidth anyway, and Fusion has no L3 cache to compensate. So what it's left with is a crippled CPU side and a GPU that is of little value beyond graphics.

Chiropteran · Dec 21, 2011

Phynaz said:
Find a video card that can write system memory anywhere near that - even 1/4th of that, and then we will talk. Until then we are done, as you just want to argue and not actually gain any knowledge.

First of all, I'm not doing your homework for you. If you are going to make a claim, you back it up with a source. If you'd like, I'll give you mine. Wikipedia, "PCI-E". Not the most prestigious source, but I trust it about 1000X more than I trust some random forum poster named "Phynaz".

Second, why are you bringing up a point we have already covered? We have already been over writes, you and I both know that the speed it takes to write isn't going to impact application performance at all because they can be buffered. Even if compression makes them slower, it will still be a net gain as long as reads are faster.

Third, as others have already corrected you on, Fusion arch means that the GPU doesn't have to work over the PCIE bus to access system memory, became it uses system memory as it's own.

Fourth, if what you imply is true (hard drive access is FASTER than GPU access), we would be storing our textures on the hard drive instead of GPU RAM... and that would make a lot of sense, right? No, of course not. I don't know where you went wrong but somewhere you missed a decimal point in your math, GPU access is dramatically faster than hard drive accesses in reality.

ncalipari · Dec 21, 2011

BenchPress said:
What makes you think that? A quad-core Haswell CPU will deliver 500 GFLOPS. For comparison, the A8-3850's GPU tops out at 480 GFLOPS. Although Haswell is still more than a year out, again keep in mind that GPGPU has a really hard time achieving high efficiency. Also, they can't exceed the CPU's bandwidth anyway. So what Fusion is left with is a crippled CPU side and a GPU that is of little value beyond graphics.

one year is a long time in the IT world.

AMD Fusion and GPGPU killer app

Senior member

Diamond Member

Lifer

Senior member

Diamond Member

Lifer

Diamond Member

Diamond Member

Lifer

Diamond Member

Lifer

Senior member

Diamond Member

Lifer

Diamond Member

Lifer

Golden Member

Diamond Member

Senior member

Diamond Member

Golden Member

Lifer

Senior member

Diamond Member

Senior member