
Engineers boost Llano GPU performance by 20% without overclocking

Olikan

Platinum Member
Found this on the web...

To achieve the 20% boost, the researchers reduce the CPU to a fetch/decode unit, and the GPU becomes the primary computation unit. This works out well because CPUs are generally very strong at fetching data from memory, and GPUs are essentially just monstrous floating point units. In practice, this means the CPU is focused on working out what data the GPU needs (pre-fetching), the GPU’s pipes stay full, and a 20% performance boost arises.

look at the link, the journalist is very confused 🙄

http://www.extremetech.com/computing/117377-engineers-boost-amd-cpu-performance-by-20-without-overclocking

Our experiments on a set of benchmarks show that our proposed preexecution improves the performance by up to 113% and 21.4% on average
from the link of the university
http://news.ncsu.edu/releases/wmszhougpucpu/
 
So are we using the CPU and main memory as one giant branch predictor too? It makes no sense. I don't see how this 20% could have any relevance in real-world computing.
 
So are we using the CPU and main memory as one giant branch predictor too? It makes no sense. I don't see how this 20% could have any relevance in real-world computing.
so now the new Llano will have a better GPU and a worse CPU!?
so before we had a good CPU with integrated graphics as a joke
now we will have good graphics with a joke CPU?
 
This looks essentially like run-ahead execution, allowing otherwise idle CPU resources to speed up execution by doing memory fetches for the GPU and warming up the cache. Previously this idea was only applied between threads within the CPU. Interesting idea, although I'm sure the quoted improvements are extremely optimistic 🙂

For some reason I got the initial impression that this was going to be some hardware trick but it's just clever software optimization.
 
so now the new Llano will have a better GPU and a worse CPU!?
so before we had a good CPU with integrated graphics as a joke
now we will have good graphics with a joke CPU?
An integrated GPU with the power of a 3870 is NOT a joke.
 
Updated @ 17:54: The co-author of the paper, Huiyang Zhou, was kind enough to send us the research paper. It seems production silicon wasn't actually used; instead, the software tweaks were carried out on a simulated future AMD APU with shared L3 cache (probably Trinity). It's also worth noting that AMD sponsored and co-authored this paper.

Erm, ok. Larrabee destroys all when it comes to these kinds of research results. 😀
 

wow this is really deep stuff. I wonder if IDC, CTho, Intel etc. will be able to make sense of these tips! Seems strange to just throw this performance-enhancing mod to the wolves. Do you think we'll be able to apply a BIOS update to do the same thing on our desktops?
 
Horsepower means very little though, doesn't it? It's my understanding that what is holding integrated GPUs back is memory-to-GPU bandwidth. Or is that wrong?

yeah I'm pretty sure you're right. My old laptop had more than enough GPU to play WoW on medium settings but even with it at ultra low it could only muster about 13fps due to bandwidth bottleneck. Sometimes I wish they would give you 64MB video memory integrated...would it be that hard? Then you could at least get some mildly serious gaming done with older games.

The real world still uses dedicated video for ultimate performance... gl

😕
 
wow this is really deep stuff. I wonder if IDC, CTho, Intel etc. will be able to make sense of these tips! Seems strange to just throw this performance-enhancing mod to the wolves. Do you think we'll be able to apply a BIOS update to do the same thing on our desktops?

What I get out of this is that they are basically making GPGPU stuff faster with the help of the CPU, not making CPU stuff faster.

This paper presents a novel approach to utilize the CPU resource to facilitate the execution of GPGPU programs on fused CPU-GPU architectures.

So you need to start with an app that already lends itself nicely to GPGPU compute, then rewrite it to take advantage of the CPU, and your GPGPU application will run faster.

Sounds like good stuff, but it's not going to benefit x86 apps as far as I can tell.
 
Wouldn't it be faster still to make the whole die GPU blocks? Using the whole CPU power for one part of the GPU's function seems inefficient on power.
 
It's ironic that the CPU is being used to speed up the GPU.

Things like this reinforce that the future will be homogeneous architectures, not heterogeneous ones. The CPU simply needs high-throughput execution units like a GPU. And Intel will offer just that with the AVX2 support in Haswell next year. GPGPU's days are numbered.
 
Die shot? I didn't see it, do you have full access to the article?

[Image: amd-llano-die-348x196.jpg]
 
Things like this reinforce that the future will be homogeneous architectures, not heterogeneous ones. The CPU simply needs high-throughput execution units like a GPU. And Intel will offer just that with the AVX2 support in Haswell next year. GPGPU's days are numbered.

This is something that I am very curious about...

AMD's GCN supports x86 instructions, while AVX-1024 seems to be a holy grail.
Which one is better?
 
Obviously Intel employees are going to pump whatever they have. But the work MS put into C++ AMP would suggest that GPGPU has a strong future; it isn't dead. lol There's more potential in 10,000 shaders than there is in AVX2.
 