AMD introduces heterogeneous Uniform Memory Access

Page 4

LogOver

Member
May 29, 2011
198
0
0
You can't have software/programs before the hardware for it is out!

Not exactly true. Intel posted the AVX2 specification long before actual hardware availability. There is also a free AVX2 emulator available on Intel's site for testing AVX2 software.

As for hUMA, I'm eager to see the actual details on MMU implementation (especially on Memory Access Protection). It may require changes in operating systems.
 

cbrunny

Diamond Member
Oct 12, 2007
6,791
406
126
So, does this mean that the GPU will be utilized to increase CPU performance? What if you choose to have a discrete GPU as your "main" GPU? Does the on-die GPU still have a purpose?
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
The concept is great and at some point something like it will be adopted by the masses, but for AMD I think the PS4 is really their only chance for success. They just don't have the clout or capabilities to push a whole new way of doing things to market. Basically they need Intel to make a hardware standard, MS to produce the software to make it work and then AMD can provide some chips that support it. AMD is no Intel or MS - it simply can't do it alone.

With the PS4 however they get Sony who have made a hardware standard, and will produce millions of machines using it, and Sony will also develop the software libraries to make it all work. Here HSA can work.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
That's not true. The cache coherency alone adds latency and takes up bandwidth. It's a simple trade-off. You hope for a greater benefit than the penalty.

This is wrong.

parvadomus said:
This will add exactly the same latency that is added when you add additional cores to a CPU die.

This is closer to correct.

Like it says in the slides, probe filters and coherence directories will tell whether the most up-to-date copy of the data currently resides in a CPU or GPU cache. I'm guessing that accessing the GPU cache is going to be slower (longer latency) than accessing the CPU cache, but I don't have anything to base this on other than my gut and experience. My guess is that it's faster for the GPU to access CPU-cache-resident data than vice versa.

What will NOT happen is across-the-board longer-latency accesses for CPUs reading something in another CPU cache, but this will add a greater degree of NUCA than we've seen in AMD CPUs before (Intel has had this for a while with their L3-on-a-ring-bus).
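The directory lookup described above can be sketched as a toy model (purely illustrative; the `Directory` class and its fields are invented here, not AMD's hardware design):

```python
# Toy directory-based coherence: the directory records which agent's
# cache holds the most recent copy of each line, so a reader knows
# whether to probe a remote cache. Real probe filters are hardware.

class Directory:
    def __init__(self):
        self.owner = {}          # cache-line address -> owning agent

    def write(self, agent, addr, caches, value):
        self.owner[addr] = agent # this agent now holds the dirty copy
        caches[agent][addr] = value

    def read(self, agent, addr, caches, memory):
        holder = self.owner.get(addr)
        if holder is not None and holder != agent:
            # Probe the remote cache -- the slow path discussed above
            return caches[holder][addr]
        return caches[agent].get(addr, memory.get(addr))

caches = {"cpu": {}, "gpu": {}}
memory = {0x100: 0}
d = Directory()
d.write("cpu", 0x100, caches, 42)                    # CPU dirties the line
assert d.read("gpu", 0x100, caches, memory) == 42    # GPU probes the CPU cache
assert d.read("cpu", 0x100, caches, memory) == 42    # local hit for the CPU
```

The asymmetric latency guess above would correspond to the two branches of `read` having different costs depending on which agent does the probing.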
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
So, does this mean that the GPU will be utilized to increase CPU performance? What if you choose to have a discrete GPU as your "main" GPU? Does the on-die GPU still have a purpose?

No, the GPU is not increasing the CPU performance. A low-end CPU is still a low-end CPU.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
So, does this mean that the GPU will be utilized to increase CPU performance? What if you choose to have a discrete GPU as your "main" GPU? Does the on-die GPU still have a purpose?

The proper way to phrase it is the GPU will be used to increase software performance where applicable.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Let me know when Kaveri gets near Tesla performance. Or even when there is a spec for it. You can't code for something nobody besides AMD knows anything about.

A Kaveri with lower CPU clocks and more shaders... not the retail Kaveri, of course.

Tesla performance, really? Weren't you defending KC against peak theoretical performance?
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Are they ever gonna reveal anything technical - or do they know its future would look more bleak if so?

1) CPU and GPU share the same physical address space
2) CPU and GPU are cache coherent

Seems pretty straightforward. Others are doing it as well - you can connect various GPUs over a cache-coherent interconnect with ARM clusters. Not sure if Haswell will have cache coherence or not, but I suspect it will.
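As a rough illustration of what point 1 buys you, here is a toy contrast between the copy-based discrete-GPU model and in-place sharing over one address space (function names are invented for illustration; real code would go through an API such as OpenCL):

```python
# Copy model vs. unified model, sketched with plain Python lists.
# The "GPU" here is just a function doubling every element.

def copy_model(data):
    gpu_copy = list(data)                 # explicit transfer to device memory
    gpu_copy = [x * 2 for x in gpu_copy]  # kernel runs on the copy
    return list(gpu_copy)                 # explicit transfer back

def unified_model(data):
    # The GPU works in place on the same allocation the CPU sees --
    # no staging copies, just pointer passing.
    for i, x in enumerate(data):
        data[i] = x * 2
    return data

buf = [1, 2, 3]
assert copy_model(buf) == [2, 4, 6] and buf == [1, 2, 3]     # original untouched
assert unified_model(buf) == [2, 4, 6] and buf == [2, 4, 6]  # shared, mutated in place
```

Point 2 (cache coherence) is what makes the in-place version safe when both sides touch the buffer concurrently.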
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
Interesting article at Ars. If games are being optimized for this for the PS4 and Xbox-whatever then AMD might actually be the best choice for gaming next year IF they can execute decently.

Related: Eurogamer ran a poll among triple-A game developers, and all of them selected an AMD FX-8350 chip as "the best way to future-proof a games PC built in the here and now."

http://www.eurogamer.net/articles/digitalfoundry-future-proofing-your-pc-for-next-gen

We already have a thread on that Ars article. It would be appreciated if you guys could stick to it
-ViRGE
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
This isn't a PS4 thread nor a FX-8350 thread. You have also posted that link in multiple other threads. Link spamming isn't appreciated.
 

LogOver

Member
May 29, 2011
198
0
0
PS4 is a HSA design with GDDR5 unified memory. Kaveri will use GDDR5 memory and hUMA for HSA. Therefore my bet is that the PS4 APU will be using hUMA.

EDIT: ExtremeTech seems to agree as well



http://www.extremetech.com/gaming/1...u-memory-should-appear-in-kaveri-xbox-720-ps4

Hope they won't take this route. GDDR5 has better bandwidth but worse latency than DDR3 (good for the GPU, not good for the CPU). But the worst thing is that it would imply RAM soldered to the motherboard, since no standard memory interface (like a DIMM) exists for GDDR5.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
The idea looks very interesting to me from a programming point of view. Is the APU responsible for keeping this operation thread-safe, or does the burden lie with programmers and languages?
Some in the compiler, for sure. Reasonably strong ordering and coherency cost a lot in hardware, and GPUs can get away with lacking some, using fancy scheduling and SMT to get around the impact of barriers and such. I can see thread-safety implementations being quasi-software for years to come on the GPU side of things (GCN's "OOO" write ordering should offer good enough HW guarantees to do the rest in software, if it eats too much in hardware resources, for now). But it can mostly be compiled away, so you should never have to deal with it.

Memory coherency, however, really needs to be done in hardware. Microcode is fine; and doing it, "in hardware," by actually doing it in firmware, is also fine. Leaving it up to software has, and will again, result in longer development cycles, subtle bugs, and some software never being able to reach the potential its makers thought it could, because the average case ends up so much worse than the paper specs and tailored benchmark scores.

Guys, layman question of the day:

- Won't this more complex memory management add latency?
Yes, in best-case terms. It always does. That's nothing new. The problem is that, just like virtual memory v. physical memory, the vast majority of software does not live inside that best case bubble.

Of course, Intel has been there for a while; they just need to improve their IGP and the software for it. Good slides, and it's good AMD is finally doing it, but Intel beat them to this integration step. Intel has already been here, done that, and been sharing the LLC (it's like getting the T-shirt, but for the CPU :)).

That is the big issue, isn't it? Nobody is targeting GPUs, they're targeting GPU drivers. I think one of the advantages of Intel's Phi is that you target x86.
But, a very special x86. Also, it's a small niche-market product.
 

LogOver

Member
May 29, 2011
198
0
0
Not sure if Haswell will have cache coherence or not but I suspect it will.

Not sure if Haswell will have CPU/GPU cache coherence, but the CPU will definitely have the ability to read and write the GPU's portion of memory. Intel revealed its Haswell InstantAccess technology a few weeks ago.
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
Hope they won't take this route. GDDR5 has better bandwidth but worse latency than DDR3 (good for the GPU, not good for the CPU). But the worst thing is that it would imply RAM soldered to the motherboard, since no standard memory interface (like a DIMM) exists for GDDR5.

AMD will offer the option to choose between DDR3 and GDDR5

http://techreport.com/news/24737/amd-sheds-light-on-kaveri-uniform-memory-architecture

My bet is that GDDR5 builds will be faster in spite of slightly worse latency.

Take a look at this:

http://www.interfacebus.com/Memory_Module_GDDR_DIMM.html

Maybe AMD is working with manufacturers to provide some kind of AMP GDDR5 memory kits.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
Hope they won't take this route. GDDR5 has better bandwidth but worse latency than DDR3 (good for the GPU, not good for the CPU). But the worst thing is that it would imply RAM soldered to the motherboard, since no standard memory interface (like a DIMM) exists for GDDR5.

I don't want to type it all out again (search my post history for a full description of why if you want it), but this is totally incorrect. GDDR5 doesn't have higher latency than DDR3. In fact, GDDR5 has 0 (zero) performance disadvantages when compared to DDR3. GDDR5 is strictly superior to DDR3 when it comes to performance, and is only ill-suited to being main system memory for economic, serviceability, and manufacturing reasons.
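A quick back-of-the-envelope check is relevant here: absolute CAS latency is cycles divided by clock, and with illustrative (not measured) timings the two memory types land in the same ballpark:

```python
# Back-of-the-envelope CAS latency in nanoseconds: cycles / clock.
# The timings below are illustrative example values, not measurements
# of any specific part.

def cas_ns(cas_cycles, io_clock_mhz):
    return cas_cycles / io_clock_mhz * 1000

ddr3 = cas_ns(11, 800)    # e.g. DDR3-1600 at CL11 -> 13.75 ns
gddr5 = cas_ns(15, 1250)  # e.g. a 5 Gbps GDDR5 part at CL15 -> 12.0 ns
print(round(ddr3, 2), round(gddr5, 2))
```

GDDR5's higher cycle counts are quoted against a much faster clock, which is why the absolute numbers come out comparable.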
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
AMD will offer the option to choose between DDR3 and GDDR5

http://techreport.com/news/24737/amd-sheds-light-on-kaveri-uniform-memory-architecture

My bet is that GDDR5 builds will be faster in spite of slightly worse latency.

Take a look at this:

http://www.interfacebus.com/Memory_Module_GDDR_DIMM.html

Maybe AMD is working with manufacturers to provide some kind of AMP GDDR5 memory kits.

From your first link:

Will hUMA mean CPUs and discrete GPUs can share a unified pool of memory, too? Not quite. When the question came up during the briefing, AMD said hUMA "doesn't directly unify those pools, but it does put them in a unified address space." The company then stressed that bandwidth won't be consistent between the CPU and discrete GPU memory pools—that is, GDDR5 graphics memory will be quicker, while DDR3 system memory will lag behind, so some hoop-jumping will still be required.

That makes it highly doubtful that GDDR5 will be used for system memory. They are even flat out saying it won't be, so at least we can put that rumor to rest.

So, as the article says - and I didn't catch this either - unified memory space doesn't mean unified memory chips.
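A toy sketch of "unified address space, separate pools": one flat address range decoded to different physical backing with different bandwidth. The ranges below are invented purely for illustration:

```python
# One address space, two physical pools: a toy decoder routing a flat
# address to DDR3 or GDDR5 backing. Ranges are made up for the example.

POOLS = [
    (0x0000_0000, 0x7FFF_FFFF, "DDR3 (system)"),
    (0x8000_0000, 0xFFFF_FFFF, "GDDR5 (graphics)"),
]

def route(addr):
    for lo, hi, name in POOLS:
        if lo <= addr <= hi:
            return name
    raise ValueError("unmapped address")

assert route(0x0000_1000) == "DDR3 (system)"
assert route(0x9000_0000) == "GDDR5 (graphics)"
```

Both pools are reachable through one set of pointers, which is the article's point: the address space is unified even though the chips, and their bandwidth, are not.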
 

NTMBK

Lifer
Nov 14, 2011
10,448
5,831
136
That makes it highly doubtful that GDDR5 will be used for system memory. They are even flat out saying it won't be, so at least we can put that rumor to rest.

Nice selective quoting there! The very next sentence of that article says:

(As an interesting side note, AMD then added that "people will be able to build APUs with either type of memory [DDR or GDDR] and then share any type of memory between the different processing cores on the APU.")

So they just said that there will be GDDR5 APUs...
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Nice selective quoting there! The very next sentence of that article says:



So they just said that there will be GDDR5 APUs...

They certainly aren't being clear.

But let's face it, what manufacturer is going to put GDDR5 system memory in these? They would price themselves completely out of the market.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
But let's face it, what manufacturer is going to put GDDR5 system memory in these? They would price themselves completely out of the market.

They would price in the same market as low-end discrete cards...
 

Plimogz

Senior member
Oct 3, 2009
678
0
71
It may only be a powerpoint slide, but at least it's a sign that we're finally reaching the "future" point from the original Fusion slides which made the rounds years ago.

There needs to be a point pretty soon where AMD can position its chips with respect to utilizing both traditional CPU and GPU components for tasks which historically have been CPU-bound. For their sake, as well as mine -- speaking as a fanboi, of course.

The consoles will surely act as a catalyst, but it seems to me that merely looking at Adobe's adoption of some OpenCL functions in their suite should be a positive indicator that all was not for naught when AMD jeopardized themselves on the ATi acquisition. hUMA is an important and necessary step. Hopefully the SR decode upgrade grants them a sufficient margin of improvement relative to the competition and we find ourselves in a tighter race once again. In any case, with Koduri and Keller back on board, and Read steering the ship, it sure is starting to feel like a heady time to root for the underdog.

Now, can I at least expect two GPUs on a single PCB to finally begin sharing a memory pool in time for the next generation of Radeons, or is there some VGA bandwidth angle which I don't at all get?
 

Sleepingforest

Platinum Member
Nov 18, 2012
2,375
0
76
They would price in the same market as low-end discrete cards...

How much improvement are we expecting from Kaveri? AMD is claiming around 15% from Piledriver to Steamroller. If it really is the same price overall as the 7750 plus a cheap CPU, it won't be worthwhile--the 7750 is around 50% stronger than Trinity in GPU work.