GDDR5 RAM vs On-Package Cache RAM to improve IGP performance?

Fjodor2001 · Mar 29, 2013

Hi,

Apparently both Intel and AMD have come to the conclusion that IGPs are getting memory bandwidth bottlenecked. To solve this problem they seem to have chosen different routes however.

AMD has announced that they will support GDDR5 RAM for Kaveri, see:
http://www.xbitlabs.com/news/cpu/di..._Kaveri_APU_Supports_GDDR5_Memory_Report.html

Intel on the other hand will use On-Package graphics cache memory starting with Haswell, see:
http://wccftech.com/intel-haswell-g...ip-q3-2013-mobile-cpus-gt2-variants-detailed/

So I just thought it would be interesting to explore these two different solutions to the same common problem of the IGP being memory bandwidth bottlenecked. For example:

1. What could be the reasons Intel vs AMD have selected different solutions?

2. What are the pros & cons of each solution? E.g. costs, possibility of the CPU also getting benefits of faster RAM, amount of "fast RAM" available to the IGP, memory bandwidth differences (perhaps On-Package RAM is faster, but smaller amount available?), etc.

3. Do you think we'll see Broadwell bring On-Package cache across a broader range of Intel CPUs (on Haswell it's expected to only be used on a limited set of Notebook CPUs)?

4. Do you think AMD and Intel will stick with their respective selected solution? Or going forward, could AMD go in the direction of adding On-Package RAM, or Intel go in the direction of adding GDDR5 support, or perhaps both?

Please let me know what you think.

EDIT: Corrected text from "DDR5" to "GDDR5".

ShintaiDK · Mar 29, 2013

You mean GDDR5 and not DDR5. GDDR5 is very expensive and very power hungry. Its simply terrible for broad main memory usage. 8GB can cost more than the CPU.

I wouldnt be surprised if Skylake ships with stracked DRAM like nVidia Volta. The future is stacked DRAM. Anything in between is just short temporary solutions.

SPBHM · Mar 29, 2013

sony is using 8GB of GDDR5 for the PS4, perhaps the price is going down quickly?

ShintaiDK · Mar 29, 2013

SPBHM said:
sony is using 8GB of GDDR5 for the PS4, perhaps the price is going down quickly?

I doubt. If anything it might raise the prices for a short while.

Exophase · Mar 29, 2013

GDDR5. Not DDR5. Kaveri is speculated to be using GDDR5m which changes things a bit further.

The big problem with using a single fast memory pool is that it's overspecified for CPU tasks which is what a majority of the memory will be used for. 8GB is medium-end for PCs these days, but that amount of GDDR5 carries a big cost premium. It also wastes power if you don't need the extra bandwidth.

This pushes back on the two main reasons that APUs are attractive in the first place, lower cost and better power consumption.

So IMO, Intel's solution can be superior if the amount of memory is sufficient to alleviate the bandwidth limitations for typical gaming tasks. The solution worked well on XBox 360 and seems to be doing okay on Wii U. If it's really 128MB then that should be enough for render targets in a lot of situations. But this also has its own cost overheads and it's even worse if Intel is charging a big additional premium for it.

I could see AMD going with on-package memory in the future if they have the resources. They've shown development work for technologies like interposers. I doubt Intel is interested in support GDDR5 or other expensive specialized external memory. I also doubt they'll offer on-package memory for their desktop processors unless it can be used to provide a useful boost to CPU performance too. Desktop APUs are purely a value proposition and Intel has been getting more aggressive on mobile IGP performance and less on desktop IGP performance. I don't see this trend reversing.

Exophase · Mar 29, 2013

SPBHM said:
sony is using 8GB of GDDR5 for the PS4, perhaps the price is going down quickly?

Sony probably wants PS4 to be their main console offering for at least five years. The cost dynamics behind supporting that are very different from those behind supporting an APU that'll be replaced in two years at the latest (probably more like one year).

Fjodor2001 · Mar 29, 2013

ShintaiDK said:
You mean GDDR5 and not DDR5.

Yes, you are correct. Thanks for pointing it out. I edited the original post and corrected this now to avoid further confusion.

Phynaz · Mar 29, 2013

You aren't going to see GDDR5 used in a PC. You can bet on it.

Arkadrel · Mar 29, 2013

Do you see how fast the GPU is developing towards Gflops rates, compaired to the CPU?

Right now, alot of that "potential" is just wasted, because CPU&GPU dont work together well.
(The "idea" is to fix that, and make use of wasted potential, so you get MUCH more performance)

"These allow the GPU access to system memory and the CPU to access the GPU frame buffer through a 256-bit and 128-bit wide bus (per channel, each direction) respectively. This allows the graphics core and x86 processor modules to access the same memory areas and communicate with each other."

CPU and GPU would be able to share data between them from a single unified address space. This will prevent the relatively time-intensive need to copy data from CPU-addressable memory to GPU-addressable memory space — and will vastly improve performance as a result.

AMD gave two examples of programs and situations where the heterogeneous architecture can improve performance — and how shared memory can push performance even further.

The first example involved face detection algorithms...... snip...

It was at this point that Phil Rogers talked up the company’s heterogeneous architecture and the benefits of a “unified, shared, coherent memory.” By allowing the individual parts to play to their strengths, the company estimates 2.5 times the performance and up to a 40% reduction in power usage versus running the algorithm on either the CPU or GPU only (Intel methode).

By "codeing" software to use CPU&GPU at the same time, for the same tasks, working together, they can squeeze out 250% performance, while useing nearly half the power!!!

This is STRONGER than just useing the GPU alone for GPGPU, or the CPU alone.
Which means you might see CPU tasks in the future where AMD kicks Intels arse.
You might see GPGPU tasks, where AMD kicks Intels arse.

So the idea is to make ALL future APUs about x3 times as fast as they are now, with programs, by useing both CPU&GPU (while useing much less power).

Phynaz · Mar 29, 2013

Arkadrel said:
[B]By "codeing" software to use CPU&GPU at the same time, for the same tasks, working together, they can squeeze out 250% performance, while useing nearly half the power!!![/B]
[/QUOTE]

If only Gflops told the whole story. Try running a database engine on the 1,000 Gflop GPU. Oops, doesn't work so well, does it?

Let's say it again. GPU's don't run general purpose code.

Arkadrel · Mar 29, 2013

If only Gflops told the whole story. Try running a database engine on the 1,000 Gflop GPU. Oops, doesn't work so well, does it?

Have you ever wondered why that is?

....the relatively time-intensive need to copy data from CPU-addressable memory to GPU-addressable memory space...

^ maybe this is why (or part of the reason)? its called a bottleneck.

Maybe AMD removeing this "bottleneck" will change that?
Maybe a few years from now, you ll see APUs doing the work you now dont consider "worth while" to run on GPU as well, and doing it much faster than CPUs today can.

Insert_Nickname · Mar 29, 2013

Exophase said:
But this also has its own cost overheads and it's even worse if Intel is charging a big additional premium for it.

According to the rumour mill, GT3e (with on-package memory) will only be available in high performance i7's. On-package memory will not be used for mainstream. Perhaps its cost prohibits this?.

ShintaiDK · Mar 29, 2013

Insert_Nickname said:
According to the rumour mill, GT3e (with on-package memory) will only be available in high performance i7's. On-package memory will not be used for mainstream. Perhaps its cost prohibits this?.

Or that demand is still low.

Haswell GPU also introduces 2 new DX functions.
http://www.ubergizmo.com/2013/03/intel-pixelsync-order-independent-transparency/

Not sure how much I like the new DX of randomness tho.

Idontcare · Mar 29, 2013

Fjodor2001 said:
So I just thought it would be interesting to explore these two different solutions to the same common problem of the IGP being memory bandwidth bottlenecked. For example:

1. What could be the reasons Intel vs AMD have selected different solutions?

2. What are the pros & cons of each solution? E.g. costs, possibility of the CPU also getting benefits of faster RAM, amount of "fast RAM" available to the IGP, memory bandwidth differences (perhaps On-Package RAM is faster, but smaller amount available?), etc.

3. Do you think we'll see Broadwell bring On-Package cache across a broader range of Intel CPUs (on Haswell it's expected to only be used on a limited set of Notebook CPUs)?

4. Do you think AMD and Intel will stick with their respective selected solution? Or going forward, could AMD go in the direction of adding On-Package RAM, or Intel go in the direction of adding GDDR5 support, or perhaps both?

In my view of things this situation is completely to be expected, including the bifurcation of Intel and AMD on their chosen solution to the same general problem.

Think about it...what is AMD doing? They are trending towards evolving an existing solution (the discrete GPU PCIe board that plugs into a PCIe slot) into being a solution for an issue that was created (mem bandwidth to GPU) because they took the GPU off the discrete card and migrated it into the CPU socket. Time to migrate all the other stuff that was on the discrete card for a reason as well. Stick it all on the mobo.

That is a very natural progression for AMD's engineers, it fits with the existing way of thinking and solving problems the same engineers faced 5-10yrs ago when they had memory bandwidth issues with their discrete GPU ICs.

large_7669_hp-elitebook_8760w-amd_firepro_m5950-1gb-gddr5-graphics_card109-_647659-001-image2.jpg

^AMD FirePro M5950 w/1GB GDDR5

And what about Intel? They don't have a bunch of engineers experienced in solving bandwidth problems related discrete GPU add-on boards.

But what do they have? A bunch of engineers steeped in solving bandwidth issues for CPUs, going all the way back to the days of the PPro (if not even earlier): put a bunch of cache into the package and solve your bandwidth-limited issues

What did they do when their Pentium II's were bandwidth limited?

S_Intel-Pentium%20II%20400%20(SECC2%20back).jpg

And now they are faced with a situation where they are bandwidth limited, albeit for the GPU-centric compute portions of the IC, and what kind of "tried and true" solution do they fall back on? MCM some cache onto the CPU's PCB.

Now it is true that AMD has similar history of solving CPU bandwidth issues the same as Intel, but it is clear from AMD's chosen solution with GDDR5 that they have opted to morph their existing discrete GPU technology into an on-board solution and I think it shows who is in charge of the engineering decisions that are being made.

As to the benefits of either path...I'm inclined to assume AMD knows what it is doing. They are the one's who knew enough about making discrete GPU products (with GDDR5) that they were able to do something that Intel failed at (Larrabee). And I have to believe that if AMD thinks on-mobo GDDR5 is the right way to solve this problem then it most likely is true.

Phynaz · Mar 29, 2013

Arkadrel said:
Have you ever wondered why that is?

^ maybe this is why (or part of the reason)? its called a bottleneck.

Maybe AMD removeing this "bottleneck" will change that?
Maybe a few years from now, you ll see APUs doing the work you now dont consider "worth while" to run on GPU as well, and doing it much faster than CPUs today can.

Let's say it again, because didn't hear it the first ten times.

GPU's don't run general purpose code. If they did they would be CPU's.

Some time if the future you will see it happen, such as when x87 was integrated. Buy it's a long ways off. And when it does happen I have doubts that it will be AMD's instruction set.

Arkadrel · Mar 29, 2013

Phynaz said:
Let's say it again, because didn't hear it the first ten times.

GPU's don't run general purpose code. If they did they would be CPU's.

Some time if the future you will see it happen, such as when x87 was integrated. Buy it's a long ways off. And when it does happen I have doubts that it will be AMD's instruction set.

You have it backwards

CPUs run non general purpose code (but slowly, compaired to GPUs).

In the future maybe the "most used" type of code that exsist is the
non general purpose type. The only thing your CPU is used for, is that which it does best, or that which is not able to be used by another.

Whos to say?

Nothing wrong with useing a more optimised way of doing things,
as long as you dont suffer too much for it.

exsample: Digging a hole in the earth.

ME: I have this instrument called a Shovel, lets use it (for this task, because its smart).
(I ll get my spoon out, for the soup, when we have to eat (and not use the shovel for that))

You: Nah, I have something called a Spoon. Not only can you dig with it, but you can also use it to eat soup with. That makes it a better instrument, so lets just use it for everything (even digging holes).

Its just a matter of getting people to adapt it.
If programs start running 3 times as fast, it could take off.

Pilum · Mar 29, 2013

Idontcare said:
As to the benefits of either path...I'm inclined to assume AMD knows what it is doing. They are the one's who knew enough about making discrete GPU products (with GDDR5) that they were able to do something that Intel failed at (Larrabee). And I have to believe that if AMD thinks on-mobo GDDR5 is the right way to solve this problem then it most likely is true.

As Exophase pointed out, GDDR5 is overkill for the CPU, and it comes with decisive disadvantages (cost, power). Most RAM of a PC needs only be fast enough for the CPU; in a 8GiB configuration, you'll need only 1-2 GiB to be graphics memory. Requiring RAM to be all-GDDR5 will have a negative impact on system cost and is also decisively inelegant. Intels solution is far more cost-effective and flexible; just adjust the amount of stacked RAM when requirements increase.

Honestly, to me this seems to be a case of "When all you have is a hammer, every problems looks like a nail", rather than the conscious choice of an optimal solution.

Arkadrel · Mar 29, 2013

@Pilum

If you could get GPGPU software to be 3 times as fast, by shared memory.
Would you consider a 300% performance upgrade, worth the extra cost of GDDR5 ?

"to me this seems to be a case of "When all you have is a hammer, every problems looks like a nail"

But theres a performance reason behinde it. Doesnt that validate the use of it?

Imagine this 300% performance differnce, to slowly work its way into more and more games, and software (non gameing).

Can you see the APUs amd makeing, suddenly beating Intel by huge margins?
Would it be worth it?
how long has it been since AMD has had any sort of performance advantage over Intel?

Is it worth the cost of GDDR5?

Idontcare · Mar 29, 2013

Pilum said:
As Exophase pointed out, GDDR5 is overkill for the CPU, and it comes with decisive disadvantages (cost, power). Most RAM of a PC needs only be fast enough for the CPU; in a 8GiB configuration, you'll need only 1-2 GiB to be graphics memory. Requiring RAM to be all-GDDR5 will have a negative impact on system cost and is also decisively inelegant. Intels solution is far more cost-effective and flexible; just adjust the amount of stacked RAM when requirements increase.

Honestly, to me this seems to be a case of "When all you have is a hammer, every problems looks like a nail", rather than the conscious choice of an optimal solution.

AMD also doesn't have the option of stacking ram on the CPU's PCB as GloFo is not aggressively pursuing that option, unlike Intel and TSMC (Nvidia).

For AMD, they have to make GDDR5 work, they don't have a lot of options because when it comes to their foundry choices they also don't have a lot of options.

Mopetar · Mar 29, 2013

Phynaz said:
Let's say it again, because didn't hear it the first ten times.

GPU's don't run general purpose code. If they did they would be CPU's.

Some time if the future you will see it happen, such as when x87 was integrated. Buy it's a long ways off. And when it does happen I have doubts that it will be AMD's instruction set.

My understanding is that they're not going to run general purpose code, merely the parts of the code (if any) that can be done on well with a GPU. Having a memory space that both can access makes that easier. Of course there's still the matter of the chip being able to tell when it should be using the GPU. Doing that automagically isn't easy so you're probably stuck with requiring code to be explicitly written to do just that while others work on a compiler and front-end capable of doing most of the lifting because the number of people capable of hand-tuning code are rather limited.

So yes, you might be able to get more performance in some cases, but if it takes too much effort to get it, is anyone going to bother?

Blitzvogel · Mar 29, 2013

Intel's solution seems heavily dependent on on-chip memory size, and even still it would mean having to manage how it uses that memory in reference to the main memory and it's bandwidth. This on-chip memory does benefit both the CPU and GPU portions yes? But I assume it's not HSA like?

AMD's solution while arguably less elegant is reliant on brute size and strength, but in the end is necessary for the kind of sustained GFLOPS it can put out for graphics. With Kaveri, you get HSA features too, so CPU and GPU processes benefit. I think GDDR5 equipped APU systems will be relatively uncommon, mostly limited to APU equipped servers, embedded systems, and perhaps a few consumer PCs. Problem is that GDDR5 is expensive. 4 GB of GDDR5 would be hellishly expensive for a laptop or desktop. If a person cares that much about graphics, I think the money would be better sunk into the same amount of DDR3 and a basic dedicated GPU with 1 GB of GDDR5. But that means losing out on HSA and the efficiency of having a single memory bank.

Exophase · Mar 29, 2013

Insert_Nickname said:
According to the rumour mill, GT3e (with on-package memory) will only be available in high performance i7's. On-package memory will not be used for mainstream. Perhaps its cost prohibits this?.

The rumors I heard are slightly different - GT3e is only going to hit some of the lower power mobile parts. Not necessarily limited to those with i7 branding, but that could be the case. I don't know where the line is drawn between mainstream but this definitely won't be paired with the highest performance CPUs.

Intel is pushing a perf/W solution that could be very competitive, with both the on-package memory and a huge number of shaders. This is afforded them by their process technologies. This advantage is indispensable for mobile gaming platforms. It makes it hard for AMD to compete with APUs, nevermind anyone competing with GPUs. If there's limited competition and it's serving that gives Intel more room to raise margins..

Of course we're going to have to see how Kaveri does here but if it takes them many months later to release it it may not matter..

Idontcare said:
AMD also doesn't have the option of stacking ram on the CPU's PCB as GloFo is not aggressively pursuing that option, unlike Intel and TSMC (Nvidia).

Rumor is that Crystalwell isn't stacked, it's side by side. But AMD may not have the option of making such a large package instead.

Every mobile SoC has been using stacking on top of the package for years, this is surely different from the kind of stacking you refer to but I hope GF at least has this capability or they're in serious trouble.

BTW, everyone's referring to the (supposedly 128MB) of (probably) eDRAM Crystalwell (GT3e) offers as "cache." I doubt it's cache. It's probably just a straight memory buffer that has to be managed manually.

Piroko · Mar 29, 2013

Idontcare said:
As to the benefits of either path...I'm inclined to assume AMD knows what it is doing. They are the one's who knew enough about making discrete GPU products (with GDDR5) that they were able to do something that Intel failed at (Larrabee). And I have to believe that if AMD thinks on-mobo GDDR5 is the right way to solve this problem then it most likely is true.

I agree.

Both are valid solutions to the problem, but I think that the gddr5 solution is the better one for now until we can put some serious memory size onto the mcm. 128 Mb cache on chip might enable you some almost free effects (especially AA, if it's implemented resembling the xbox 360), but at the same time cost the game studios time and money to develop for. I don't think that this buffer will have that large of an effect if you don't optimize for it.
On the other side, replacing your system memory with gddr5 will come with a price premium (not that high imho, since you're saving the DDR3, its slots, and in comparison to the mcm, said mcm). On the plus side, the solution should be completely transparent to the game studios and it can scale, well, to about three to four times the speed of DDR3 with similar pin count afaik. Power consumption should be comparable if you don't push for the highest clocked bins.

Exophase · Mar 29, 2013

I don't think a 128MB buffer needs to be aggressively optimized by the game developer for to make a tangible difference. Just having the driver put what render target objects it can fit on it can make a big difference. Even with a less than sophisticated allocation policy. This isn't really new ground either, there were discrete GPUs in the past with only small amounts of dedicated RAM that were meant to directly address the rest from system memory.

It'll be very dependent on resolution but I don't think they were intent on getting reasonable performance current gen PC games going at 1080p with < 15W devices this generation.

I don't see how power consumption is going to be comparable. RAM with 3 times the throughput per pin doesn't use the same amount of power, even if you only use it 1/3rd as frequently.

Saving DIMM slots as a cost saving measure doesn't really factor into this if you're comparing against motherboards that could have (and often do) have soldered down DDR3.

Enigmoid · Mar 29, 2013

NUSNA_Moebius said:
Intel's solution seems heavily dependent on on-chip memory size, and even still it would mean having to manage how it uses that memory in reference to the main memory and it's bandwidth. This on-chip memory does benefit both the CPU and GPU portions yes? But I assume it's not HSA like?

AMD's solution while arguably less elegant is reliant on brute size and strength, but in the end is necessary for the kind of sustained GFLOPS it can put out for graphics. With Kaveri, you get HSA features too, so CPU and GPU processes benefit. I think GDDR5 equipped APU systems will be relatively uncommon, mostly limited to APU equipped servers, embedded systems, and perhaps a few consumer PCs. Problem is that GDDR5 is expensive. 4 GB of GDDR5 would be hellishly expensive for a laptop or desktop. If a person cares that much about graphics, I think the money would be better sunk into the same amount of DDR3 and a basic dedicated GPU with 1 GB of GDDR5. But that means losing out on HSA and the efficiency of having a single memory bank.

4 GB GDDR5 is not that expensive. Look at the lenovo y500. Costs $900 and you get a 1080p laptop with an i5, TWO 650m and 4 GB total of GDDR5. Cutting the screen and processor and two gk 107 chips out and you could easily reach $600 for a 720p 4 GB GDDR5 laptop with an amd cpu.

GDDR5 RAM vs On-Package Cache RAM to improve IGP performance?

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Lifer

Diamond Member

Diamond Member

Lifer

Elite Member

Lifer

Diamond Member

Member

Diamond Member

Elite Member

Diamond Member

Platinum Member

Diamond Member

Senior member

Diamond Member

Platinum Member