AMD introduces heterogeneous Uniform Memory Access

Page 4

LogOver

Member
May 29, 2011
198
0
0
You can't have software/programs before the hardware for it is out!

Not exactly true. Intel posted the AVX2 specification long before actual hardware availability. There is also a free AVX2 emulator available on Intel's site for testing AVX2 software.

As for hUMA, I'm eager to see the actual details on MMU implementation (especially on Memory Access Protection). It may require changes in operating systems.
 

cbrunny

Diamond Member
Oct 12, 2007
6,791
406
126
So, does this mean that the GPU will be utilized to increase CPU performance? What if you choose to have a discrete GPU as your "main" GPU? Does the on-die GPU still have a purpose?
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
The concept is great and at some point something like it will be adopted by the masses, but for AMD I think the PS4 is really their only chance for success. They just don't have the clout or capabilities to push a whole new way of doing things to market. Basically they need Intel to make a hardware standard, MS to produce the software to make it work and then AMD can provide some chips that support it. AMD is no Intel or MS - it simply can't do it alone.

With the PS4 however they get Sony who have made a hardware standard, and will produce millions of machines using it, and Sony will also develop the software libraries to make it all work. Here HSA can work.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
That's not true. The cache coherency alone adds latency and takes up bandwidth. It's a simple trade-off. You hope for a greater benefit than the penalty.

This is wrong.

parvadomus said:
This will add exactly the same latency that is added when you add additional cores to a CPU die.

This is closer to correct.

Like it says in the slides, probe filters and coherence directories will tell whether the most up-to-date copy of the data currently resides in a CPU or GPU cache. I'm guessing that accessing the GPU cache is going to be slower (longer latency) than accessing the CPU cache, but I don't have anything to base this on other than my gut and experience. My guess is that it's faster for the GPU to access CPU-cache-resident data than vice versa.

What will NOT happen is across-the-board longer-latency accesses for CPUs reading something in another CPU cache, but this will add a greater degree of NUCA than we've seen in AMD CPUs before (Intel has had this for a while with their L3-on-a-ring-bus).
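The directory lookup described above can be sketched as a toy model (purely illustrative; the `Directory` class and its fields are invented here, not AMD's hardware design):

```python
# Toy directory-based coherence: the directory records which agent's
# cache holds the most recent copy of each line, so a reader knows
# whether to probe a remote cache. Real probe filters are hardware.

class Directory:
    def __init__(self):
        self.owner = {}          # cache-line address -> owning agent

    def write(self, agent, addr, caches, value):
        self.owner[addr] = agent # this agent now holds the dirty copy
        caches[agent][addr] = value

    def read(self, agent, addr, caches, memory):
        holder = self.owner.get(addr)
        if holder is not None and holder != agent:
            # Probe the remote cache -- the slow path discussed above
            return caches[holder][addr]
        return caches[agent].get(addr, memory.get(addr))

caches = {"cpu": {}, "gpu": {}}
memory = {0x100: 0}
d = Directory()
d.write("cpu", 0x100, caches, 42)                    # CPU dirties the line
assert d.read("gpu", 0x100, caches, memory) == 42    # GPU probes the CPU cache
assert d.read("cpu", 0x100, caches, memory) == 42    # local hit for the CPU
```

The asymmetric latency guess above would correspond to the two branches of `read` having different costs depending on which agent does the probing.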
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
So, does this mean that the GPU will be utilized to increase CPU performance? What if you choose to have a discrete GPU as your "main" GPU? Does the on-die GPU still have a purpose?

No, the GPU is not increasing the CPU performance. A low-end CPU is still a low-end CPU.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
So, does this mean that the GPU will be utilized to increase CPU performance? What if you choose to have a discrete GPU as your "main" GPU? Does the on-die GPU still have a purpose?

The proper way to phrase it is the GPU will be used to increase software performance where applicable.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Let me know when Kaveri gets near Tesla performance. Or even when there is a spec for it. You can't code for something nobody besides AMD knows anything about.

A Kaveri with lower CPU clocks and more shaders... not the retail Kaveri, of course.

Tesla performance, really? Weren't you defending KC against peak theoretical performance?
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Are they ever gonna reveal anything technical - or do they know its future would look more bleak if so?

1) CPU and GPU share the same physical address space
2) CPU and GPU are cache coherent

Seems pretty straightforward. Others are doing it as well - you can connect various GPUs over a cache-coherent interconnect with ARM clusters. Not sure if Haswell will have cache coherence or not, but I suspect it will.
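As a rough illustration of what point 1 buys you, here is a toy contrast between the copy-based discrete-GPU model and in-place sharing over one address space (function names are invented for illustration; real code would go through an API such as OpenCL):

```python
# Copy model vs. unified model, sketched with plain Python lists.
# The "GPU" here is just a function doubling every element.

def copy_model(data):
    gpu_copy = list(data)                 # explicit transfer to device memory
    gpu_copy = [x * 2 for x in gpu_copy]  # kernel runs on the copy
    return list(gpu_copy)                 # explicit transfer back

def unified_model(data):
    # The GPU works in place on the same allocation the CPU sees --
    # no staging copies, just pointer passing.
    for i, x in enumerate(data):
        data[i] = x * 2
    return data

buf = [1, 2, 3]
assert copy_model(buf) == [2, 4, 6] and buf == [1, 2, 3]     # original untouched
assert unified_model(buf) == [2, 4, 6] and buf == [2, 4, 6]  # shared, mutated in place
```

Point 2 (cache coherence) is what makes the in-place version safe when both sides touch the buffer concurrently.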
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
Interesting article at Ars. If games are being optimized for this for the PS4 and Xbox-whatever then AMD might actually be the best choice for gaming next year IF they can execute decently.

Related: Eurogamer ran a poll among triple-A game developers, and all of them selected an AMD FX-8350 chip as "the best way to future-proof a games PC built in the here and now."

http://www.eurogamer.net/articles/digitalfoundry-future-proofing-your-pc-for-next-gen

We already have a thread on that Ars article. It would be appreciated if you guys could stick to it
-ViRGE
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
This isn't a PS4 thread nor a FX-8350 thread. You have also posted that link in multiple other threads. Link spamming isn't appreciated.
 

LogOver

Member
May 29, 2011
198
0
0
PS4 is a HSA design with GDDR5 unified memory. Kaveri will use GDDR5 memory and hUMA for HSA. Therefore my bet is that the PS4 APU will be using hUMA.

EDIT: ExtremeTech seems to agree as well



http://www.extremetech.com/gaming/1...u-memory-should-appear-in-kaveri-xbox-720-ps4

Hope they won't take this route. GDDR5 has better bandwidth but worse latency than DDR3 (good for the GPU, not good for the CPU). But the worst thing is that it would imply RAM soldered to the motherboard, since no standard memory interface (like a DIMM) exists for GDDR5.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
The idea looks very interesting to me from a programming point of view. Is the APU responsible for keeping this operation thread-safe, or does the burden lie with programmers and languages?
Some in the compiler, for sure. Reasonably strong ordering and coherency cost a lot in hardware, and GPUs can get away with lacking some, using fancy scheduling and SMT to get around the impact of barriers and such. I can see thread-safety implementations being quasi-software for years to come on the GPU side of things (GCN's "OOO" write ordering should offer good enough HW guarantees to do the rest in software, if it eats too much in hardware resources, for now). But it can mostly be compiled away, so you should never have to deal with it.

Memory coherency, however, really needs to be done in hardware. Microcode is fine; and doing it, "in hardware," by actually doing it in firmware, is also fine. Leaving it up to software has, and will again, result in longer development cycles, subtle bugs, and some software never being able to reach the potential its makers thought it could, because the average case ends up so much worse than the paper specs and tailored benchmark scores.

Guys, layman question of the day:

- Won't this more complex memory management add latency?
Yes, in best-case terms. It always does. That's nothing new. The problem is that, just like virtual memory v. physical memory, the vast majority of software does not live inside that best case bubble.

Of course, Intel has been there for a while; they just need to improve their IGP and the software for it. Good slides, and it's good AMD is finally doing it, but Intel beat them to this integration step. Intel has already been here, done that, and been sharing the LLC (it's like getting the T-shirt, but for the CPU :)).

That is the big issue, isn't it? Nobody is targeting GPUs, they're targeting GPU drivers. I think one of the advantages of Intel's Phi is that you target x86.
But, a very special x86. Also, it's a small niche-market product.
 

LogOver

Member
May 29, 2011
198
0
0
Not sure if Haswell will have cache coherence or not but I suspect it will.

Not sure if Haswell will have CPU/GPU cache coherence, but the CPU will definitely have the ability to read and write the GPU's portion of memory. Intel revealed its Haswell InstantAccess technology a few weeks ago.
 

galego

Golden Member
Apr 10, 2013
1,091
0
0
Hope they won't take this route. GDDR5 has better bandwidth but worse latency than DDR3 (good for the GPU, not good for the CPU). But the worst thing is that it would imply RAM soldered to the motherboard, since no standard memory interface (like a DIMM) exists for GDDR5.

AMD will offer the option to choose between DDR3 and GDDR5

http://techreport.com/news/24737/amd-sheds-light-on-kaveri-uniform-memory-architecture

My bet is that GDDR5 builds will be faster in spite of slightly worse latency.

Take a look at this:

http://www.interfacebus.com/Memory_Module_GDDR_DIMM.html

Maybe AMD is working with manufacturers to provide some kind of AMP GDDR5 memory kits.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
Hope they won't take this route. GDDR5 has better bandwidth but worse latency than DDR3 (good for the GPU, not good for the CPU). But the worst thing is that it would imply RAM soldered to the motherboard, since no standard memory interface (like a DIMM) exists for GDDR5.

I don't want to type it all out again (search my post history for a full description of why if you want it), but this is totally incorrect. GDDR5 doesn't have higher latency than DDR3. In fact, GDDR5 has 0 (zero) performance disadvantages when compared to DDR3. GDDR5 is strictly superior to DDR3 when it comes to performance, and is only ill-suited to being main system memory for economic, serviceability, and manufacturing reasons.
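A quick back-of-the-envelope check is relevant here: absolute CAS latency is cycles divided by clock, and with illustrative (not measured) timings the two memory types land in the same ballpark:

```python
# Back-of-the-envelope CAS latency in nanoseconds: cycles / clock.
# The timings below are illustrative example values, not measurements
# of any specific part.

def cas_ns(cas_cycles, io_clock_mhz):
    return cas_cycles / io_clock_mhz * 1000

ddr3 = cas_ns(11, 800)    # e.g. DDR3-1600 at CL11 -> 13.75 ns
gddr5 = cas_ns(15, 1250)  # e.g. a 5 Gbps GDDR5 part at CL15 -> 12.0 ns
print(round(ddr3, 2), round(gddr5, 2))
```

GDDR5's higher cycle counts are quoted against a much faster clock, which is why the absolute numbers come out comparable.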
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
AMD will offer the option to choose between DDR3 and GDDR5

http://techreport.com/news/24737/amd-sheds-light-on-kaveri-uniform-memory-architecture

My bet is that GDDR5 builds will be faster in spite of slightly worse latency.

Take a look at this:

http://www.interfacebus.com/Memory_Module_GDDR_DIMM.html

Maybe AMD is working with manufacturers to provide some kind of AMP GDDR5 memory kits.

From your first link:

Will hUMA mean CPUs and discrete GPUs can share a unified pool of memory, too? Not quite. When the question came up during the briefing, AMD said hUMA "doesn't directly unify those pools, but it does put them in a unified address space." The company then stressed that bandwidth won't be consistent between the CPU and discrete GPU memory pools—that is, GDDR5 graphics memory will be quicker, while DDR3 system memory will lag behind, so some hoop-jumping will still be required.

That makes it highly doubtful that GDDR5 will be used for system memory. They are even flat out saying it won't be, so at least we can put that rumor to rest.

So, as the article says - and I didn't catch this either - unified memory space doesn't mean unified memory chips.
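A toy sketch of "unified address space, separate pools": one flat address range decoded to different physical backing with different bandwidth. The ranges below are invented purely for illustration:

```python
# One address space, two physical pools: a toy decoder routing a flat
# address to DDR3 or GDDR5 backing. Ranges are made up for the example.

POOLS = [
    (0x0000_0000, 0x7FFF_FFFF, "DDR3 (system)"),
    (0x8000_0000, 0xFFFF_FFFF, "GDDR5 (graphics)"),
]

def route(addr):
    for lo, hi, name in POOLS:
        if lo <= addr <= hi:
            return name
    raise ValueError("unmapped address")

assert route(0x0000_1000) == "DDR3 (system)"
assert route(0x9000_0000) == "GDDR5 (graphics)"
```

Both pools are reachable through one set of pointers, which is the article's point: the address space is unified even though the chips, and their bandwidth, are not.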
 

NTMBK

Lifer
Nov 14, 2011
10,448
5,831
136
That makes it highly doubtful that GDDR5 will be used for system memory. They are even flat out saying it won't be, so at least we can put that rumor to rest.

Nice selective quoting there! The very next sentence of that article says:

(As an interesting side note, AMD then added that "people will be able to build APUs with either type of memory [DDR or GDDR] and then share any type of memory between the different processing cores on the APU.")

So they just said that there will be GDDR5 APUs...
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Nice selective quoting there! The very next sentence of that article says:



So they just said that there will be GDDR5 APUs...

They certainly aren't being clear.

But let's face it, what manufacturer is going to put GDDR5 system memory in these? They would price themselves completely out of the market.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
But let's face it, what manufacturer is going to put GDDR5 system memory in these? They would price themselves completely out of the market.

They would price in the same market as low-end discrete cards...
 

Plimogz

Senior member
Oct 3, 2009
678
0
71
It may only be a powerpoint slide, but at least it's a sign that we're finally reaching the "future" point from the original Fusion slides which made the rounds years ago.

There needs to be a point pretty soon where AMD can position its chips with respect to utilizing both traditional CPU and GPU components for tasks which historically have been CPU-bound. For their sake, as well as mine -- speaking as a fanboi, of course.

The consoles will surely act as a catalyst, but it seems to me that merely looking at Adobe's adoption of some OpenCL functions in their suite should be a positive indicator that all was not for naught when AMD jeopardized themselves on the ATi acquisition. hUMA is an important and necessary step. Hopefully the SR decode upgrade grants them a sufficient margin of improvement relative to the competition and we find ourselves in a tighter race once again. In any case, with Koduri and Keller back on board, and Read steering the ship, it sure is starting to feel like a heady time to root for the underdog.

Now, can I at least expect two GPUs on a single PCB to finally begin sharing a memory pool in time for the next generation of Radeons, or is there some VGA bandwidth angle which I don't at all get?
 

Sleepingforest

Platinum Member
Nov 18, 2012
2,375
0
76
They would price in the same market as low-end discrete cards...

How much improvement are we expecting from Kaveri? AMD is claiming around 15% from Piledriver to Steamroller. If it really is the same price overall as the 7750 plus a cheap CPU, it won't be worthwhile--the 7750 is around 50% stronger than Trinity in GPU work.