
Maybe We Don't Understand the Implications of OCL and HSA


Hitman928 (Diamond Member, joined Apr 15, 2012)
http://images.anandtech.com/doci/5491/Screen Shot 2012-02-01 at 2.14.16 PM.png

http://images.anandtech.com/doci/5503/Screen Shot 2012-02-02 at 3.12.58 PM.png

http://images.anandtech.com/doci/5493/Screen Shot 2012-02-02 at 9.20.35 AM.png

The HSAIL isn't even finalized yet; version 0.95 was released just last month. Kaveri is the first APU that is HSA capable; first gen Jaguar products are not going to be HSA capable. Sites like ExtremeTech are essentially failing to understand what HSA is; unified memory is a part of HSA. It is not all of HSA.

While I agree with almost everything here, my only point of contention is that you are equating first-generation Jaguar with the new console APUs. While I agree this very well could be the case, we already know these are semi-custom chips, so I wouldn't be surprised if they have additional features not found in the 'consumer' Jaguar cores, perhaps matching Kaveri in their HSA implementation. I don't know the answer to this; like I said, nowhere have I seen anyone announce it, just lots of comments/speculation pointing that way. It would, I think, be in AMD's best interest to make the consoles HSA compatible, but the customer is the one who really determines whether they want it or not. I guess we'll find out soon enough ;).
 

USER8000 (Golden Member, joined Jun 23, 2012)
The PS4 SOC looks extensively customised:

http://techreport.com/news/24725/ps4-architect-discusses-console-custom-amd-processor

TR said:
A 256-bit interface links the console's processor to its shared memory pool. According to Cerny, Sony considered a 128-bit implementation paired with on-chip eDRAM but deemed that solution too complex for developers to exploit. Sony has also taken steps to make it easier for developers to use the graphics component for general-purpose computing tasks. Cerny identifies three custom features dedicated to that mission:

  • An additional bus has been grafted to the GPU, providing a direct link to system memory that bypasses the GPU's caches. This dedicated bus offers "almost 20GB/s" of bandwidth, according to Cerny.
  • The GPU's L2 cache has been enhanced to better support simultaneous use by graphics and compute workloads. Compute-related cache lines are marked as "volatile" and can be written or invalidated selectively.
  • The number of "sources" for GPU compute commands has been increased dramatically. The GCN architecture supports one graphics source and two compute sources, according to Cerny, but the PS4 boosts the number of compute command sources to 64.
 

zlatan (Senior member, joined Mar 15, 2011)
Did OpenCL slow down that IB? Otherwise it makes no sense.
OpenCL code is not performance-portable. Adobe optimized its implementation for AMD, so on Intel it will suck. The Adobe programs won't use the Intel iGPU, but they do work on the CPU if you use the AMD OpenCL driver.
This is a problem with OpenCL, because most OpenCL programs were made by AMD. Developers have to write a separate implementation for Intel and NVIDIA if they want good performance on that hardware. HSA will be a good solution for this, because it will provide performance portability.
 

zlatan (Senior member, joined Mar 15, 2011)
Actually they won't. The console APUs are not HSA capable. First generation Jaguar is not HSA ready, among other limitations.

The first HSA capable APU is Kaveri, and even that's going to be a bit of a stretch since it's more like HSA 0.9 (the GPU won't be capable of fast context switching).
The console APUs are HSA capable.
 

mrmt (Diamond Member, joined Aug 18, 2012)
OpenCL code is not performance-portable. Adobe optimized its implementation for AMD, so on Intel it will suck. The Adobe programs won't use the Intel iGPU, but they do work on the CPU if you use the AMD OpenCL driver.
This is a problem with OpenCL, because most OpenCL programs were made by AMD. Developers have to write a separate implementation for Intel and NVIDIA if they want good performance on that hardware. HSA will be a good solution for this, because it will provide performance portability.

How so? Intel isn't boarding the HSA boat, and neither is Nvidia. So how is performance going to be portable in the first place? I see AMD painting themselves into a corner, again.
 

zlatan (Senior member, joined Mar 15, 2011)
How so? Intel isn't boarding the HSA boat, and neither is Nvidia. So how is performance going to be portable in the first place? I see AMD painting themselves into a corner, again.
For the others: developers write native C/C++, Python, Java, or other code (OpenCL if they want), and the HSA runtime will run the application. Then they will write an OpenCL implementation for Intel and NVIDIA. Of course, they don't need to do this; the HSA runtime will allow any application to run in legacy mode (this is the Intel-compatible mode).
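A minimal sketch of the dispatch-with-fallback idea in the post, assuming a hypothetical runtime that checks whether any HSA-capable device is present and otherwise runs the work in "legacy mode". `Agent` borrows the HSA term for a compute device, but this class and `run_kernel` are made-up names for illustration, not the HSA runtime API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Agent:
    """Hypothetical description of a device the runtime can see."""
    name: str
    hsa_capable: bool


def run_kernel(kernel: Callable[[float], float],
               data: List[float],
               agents: Sequence[Agent]) -> Tuple[str, List[float]]:
    """Pick an execution path the way the post describes (illustrative only).

    If any agent reports HSA support, take the "hsa" path; otherwise fall
    back to "legacy" mode. Both branches just apply the kernel element-wise
    on the CPU here; a real runtime would dispatch the HSA path to the GPU.
    """
    path = "hsa" if any(a.hsa_capable for a in agents) else "legacy"
    return path, [kernel(x) for x in data]


apu = [Agent("cpu", False), Agent("gpu", True)]
no_hsa = [Agent("cpu", False), Agent("igpu", False)]
print(run_kernel(lambda x: x * 2, [1.0, 2.0], apu))     # ('hsa', [2.0, 4.0])
print(run_kernel(lambda x: x * 2, [1.0, 2.0], no_hsa))  # ('legacy', [2.0, 4.0])
```

The point of the sketch is only that the application code is identical on both paths; whether the "legacy" path performs well is exactly what the following replies dispute.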
 

Imouto (Golden Member, joined Jul 6, 2011)
How so? Intel isn't boarding the HSA boat, and neither is Nvidia. So how is performance going to be portable in the first place? I see AMD painting themselves into a corner, again.

AMD ain't alone in the HSA boat. Companies like ARM, Samsung, and Qualcomm are cofounders. With mobile stuff leading the compute market and that consortium dwarfing Intel + Nvidia, they're not the cornered ones.

In fact I see this as a colossal effort to drive Intel into a more seizable status. AMD is just the visible head of this and prolly the one to blame if it fails.
 

mrmt (Diamond Member, joined Aug 18, 2012)
For the others: developers write native C/C++, Python, Java, or other code (OpenCL if they want), and the HSA runtime will run the application. Then they will write an OpenCL implementation for Intel and NVIDIA. Of course, they don't need to do this; the HSA runtime will allow any application to run in legacy mode (this is the Intel-compatible mode).

And do you think this process will be seamless? That there will be no performance loss compared with standard code, let alone code hand-optimized for Intel or ARM processors? Sorry, this is the definition of painting themselves into a corner.

AMD is daydreaming if they think people will develop for them instead of developing for Intel or ARM. They are painting themselves into a corner. They'll probably end up with bulky, complex, and expensive HSA hardware that nobody will use, still lagging on performance compared to everyone else.
 

Imouto (Golden Member, joined Jul 6, 2011)
AMD is daydreaming if they think people will develop for them instead of developing for Intel or ARM. They are painting themselves into a corner. They'll probably end up with bulky, complex, and expensive HSA hardware that nobody will use, still lagging on performance compared to everyone else.

Again, ARM is part of the HSA Foundation among others.
 

mrmt (Diamond Member, joined Aug 18, 2012)
AMD ain't alone in the HSA boat. Companies like ARM, Samsung, and Qualcomm are cofounders. With mobile stuff leading the compute market and that consortium dwarfing Intel + Nvidia several times over, they're not the cornered ones.

Qualcomm, ARM, and Samsung wouldn't put themselves at the mercy of AMD, and that is exactly what happens if they let HSA dominate their design and development. Especially when every one of those you mentioned (except for ARM) will be competing with each other in every market you can think of.

The fact that nobody outside AMD is hyping HSA is proof that their commitment does not go much beyond formalities.
 

SammichPG (Member, joined Aug 16, 2012)
Actually they won't. The console APUs are not HSA capable. First generation Jaguar is not HSA ready, among other limitations.

The first HSA capable APU is Kaveri, and even that's going to be a bit of a stretch since it's more like HSA 0.9 (the GPU won't be capable of fast context switching).

You have a very good point (it totally flew over my head), but unless you signed an NDA, you don't know for sure.

Sony is getting a GDDR5 APU while we most likely won't, so it wouldn't be that surprising if AMD squeezed HSA into the SoCs used for the new consoles.

It's just so stupidly important from a market point of view that, for their sake, you hope they won't launch the consoles without the full HSA feature set...
 

Imouto (Golden Member, joined Jul 6, 2011)
Qualcomm, ARM, and Samsung wouldn't put themselves at the mercy of AMD, and that is exactly what happens if they let HSA dominate their design and development. Especially when every one of those you mentioned (except for ARM) will be competing with each other in every market you can think of.

At the mercy of AMD? You should pay a visit to the HSA website before saying something like that.

So you know better than all of those billion-dollar corps. Be sure to tell them in an email.

The fact that nobody outside AMD is hyping HSA is proof that their commitment does not go much beyond formalities.

Different approaches. AMD is known for making grand claims way before being ready for the stage, claims that turn out to be all bragging and no substance. Nvidia is almost the same; Intel doesn't hype too much, and neither does Qualcomm. You're reading way too much into this.
 

ViRGE (Elite Member, Moderator Emeritus, joined Oct 9, 1999)
http://images.anandtech.com/doci/5493/Screen Shot 2012-02-02 at 9.20.35 AM.png

Your own link says that Kaveri (GCN graphics) has GPU context switching.
You should see Kaveri as a 2014 part; 2013 is Trinity and Richland.
Kaveri doesn't have GPU context switching. That doesn't come until the next gen of GCN (2.0?). Kaveri would be based on GCN 1.1, or whatever you want to call Sea Islands.

http://images.anandtech.com/doci/5503/Screen Shot 2012-02-02 at 3.12.58 PM.png
 

Khato (Golden Member, joined Jul 15, 2001)
The developers write a native C/C++, Python, Java ... or other code (OpenCL if they want). The HSA runtime will run the application.

While allowing developers to use any language they want would be a nice touch... unless there's something magical about HSA, it's not going to be able to extract any more parallelism than what the developer was capable of coding in their language of choice. (And even that amount is questionable due to the abstraction layer and the quality of the HSA software.)

Please feel free to correct me, but I have a hard time getting excited about something that is effectively just an extension of the idea behind OpenCL on the software side. (Note that the hardware requirements of HSA are equally beneficial to OpenCL.)
 

mrmt (Diamond Member, joined Aug 18, 2012)
At the mercy of AMD? You should pay a visit to the HSA website before saying something like that.

So you know better than all of those billion-dollar corps. Be sure to tell them in an email.

Why do I know better? They applied for membership in the foundation, got good info on what AMD is planning and another industry forum in which to talk among themselves, and they don't have to commit to anything. Why should they bother? They might even offer token compatibility as a feature of their future silicon.

But so far nobody but AMD is designing silicon from the ground up with HSA in mind, and until I see the big guys doing this, I'll consider HSA AMD's next pipe dream, just like GPGPU was Nvidia's pipe dream.
 

Cerb (Elite Member, joined Aug 26, 2000)
For the others: developers write native C/C++, Python, Java, or other code (OpenCL if they want), and the HSA runtime will run the application. Then they will write an OpenCL implementation for Intel and NVIDIA. Of course, they don't need to do this; the HSA runtime will allow any application to run in legacy mode (this is the Intel-compatible mode).
Erm...how's this HSA runtime going to run this application? That simply makes no sense, without reinventing so many wheels as to be unfathomable.

If they write a C++ program, they're generally going to compile it directly to the CPU ISA.

If they write a Python program, at most, it might be compatible with CPython (compiled to the CPU's ISA), PyPy (compiled to the CPU's ISA, with a JIT for a few ISAs), and Jython (JVM).

And so on.

They're going to have to write their programs to use HSA. Now, in some cases, this may be easier than in others. For instance, with an HSA-enabled version of something like NumPy (if you think that's :rolleyes:, check out Magnum P.y. ;)), just by using its data structures and functions for work on arrays, you could easily get some speed boosts. But that's going to be the odd case.

The problem now is that every HW maker has their own driver, their own libraries, with their own APIs, to implement their own special sauce, which also sees the computer's world through its own senses, which are shaped by how it works inside, not by what would be the best way for others to make use of it. So you've got hundreds of little alien universes trying to work together. It's a mess, wasting time and effort in ways that negatively affect everyone but Intel, IBM, Oracle, etc. (and we generally don't care much about IBM or Oracle these days). That is what HSA could help with.

For a bad car analogy, imagine if every automaker used a different tail-light pattern for every series of car they made, so you had to remember which light turning on in what color meant what for every model of every make. It would be chaos, right? You don't care if they use burning filaments, gas, or LEDs, whether they are run by transistors or relays, or any of that, but you do care that a blinking colored light on one side means turning that way, a center red light means braking, multiple white lights mean backing up, etc.

It would be nice to be able to write some code to run on a generic DSP-like or GPU-like machine, and then have the system figure out the details, for the most part. You'd still have to write DSP-like or GPU-like code, and you'd still have to deal with all the threading intricacies to make it work well...you just wouldn't have to care exactly what it ran on. Outside of x86, these days, that's not an easy task, and it's not even common on x86, for OpenCL, yet.
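A rough sketch of the "write GPU-like code once, let the system pick the hardware" idea, under stated assumptions: the kernel is written per work-item (one output element per global ID, as in OpenCL kernels), and `launch` is a hypothetical stand-in for a runtime that could map work-items onto whatever device is available. In this sketch it simply loops on the CPU; `saxpy_kernel` and `launch` are illustrative names, not part of HSA, OpenCL, or any library.

```python
from typing import Callable, List


def saxpy_kernel(gid: int, a: float, x: List[float], y: List[float],
                 out: List[float]) -> None:
    # GPU-style kernel: each work-item computes exactly one output element.
    out[gid] = a * x[gid] + y[gid]


def launch(kernel: Callable[..., None], global_size: int, *args) -> None:
    # Hypothetical launcher: a real runtime could spread work-items across
    # GPU lanes, DSP slots, or CPU threads; this sketch runs them in order.
    for gid in range(global_size):
        kernel(gid, *args)


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0]
```

Note that the kernel itself is still written in the data-parallel style the post mentions; only the choice of hardware is hidden behind `launch`.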

Qualcomm, ARM, and Samsung wouldn't put themselves at the mercy of AMD, and that is exactly what happens if they let HSA dominate their design and development. Especially when every one of those you mentioned (except for ARM) will be competing with each other in every market you can think of.
Dominating their design and development, by implementing features that are mostly sensible and desirable, putting them at the mercy of AMD? Not really.

While certainly as byzantine as expected by a committee of semi-competitors, what's in the HSA/HSAIL doc they have up is, for the most part, pretty sensible, and doesn't get too much into how it has to be done, by the hardware.

But so far nobody but AMD is designing silicon from the ground up with HSA in mind, and until I see the big guys doing this, I'll consider HSA AMD's next pipe dream, just like GPGPU was Nvidia's pipe dream.
Because as we all know, nobody uses CUDA on Quadros, and nVidia hasn't sold a single Tesla. It's not a pipe dream for nVidia, but a successful revenue generator.
 

galego (Golden Member, joined Apr 10, 2013)
http://images.anandtech.com/doci/5491/Screen Shot 2012-02-01 at 2.14.16 PM.png

http://images.anandtech.com/doci/5503/Screen Shot 2012-02-02 at 3.12.58 PM.png

http://images.anandtech.com/doci/5493/Screen Shot 2012-02-02 at 9.20.35 AM.png

The HSAIL isn't even finalized yet; version 0.95 was released just last month. Kaveri is the first APU that is HSA capable; first gen Jaguar products are not going to be HSA capable. Sites like ExtremeTech are essentially failing to understand what HSA is; unified memory is a part of HSA. It is not all of HSA.

I don't think that ExtremeTech is failing to understand HSA, and I don't see any indication that they are confusing it with unified memory, only that they mentioned both in the same paragraph.

Yes, HSA implementation comes in steps. Trinity already includes some HSA and Richland was presented by AMD as HSA enhanced. Kaveri is a step forward in HSA.

However, we know that the APU in the consoles is custom. In this early interview, AMD talks about something that looks like HSA, although the term is not mentioned:

http://www.techradar.com/news/gamin...-gave-it-the-hardware-nvidia-couldn-t-1141607

However, in this interview Cerny (PS4 architect) explicitly discusses "the HSA in PS4"

http://translate.google.com/transla...ess.co.jp/docs/series/rt/20130325_593036.html
 

Khato (Golden Member, joined Jul 15, 2001)
However, in this interview Cerny (PS4 architect) explicitly discusses "the HSA in PS4"

http://translate.google.com/transla...ess.co.jp/docs/series/rt/20130325_593036.html

Indeed he does, though take note that the little snippet you chose to quote is from the interviewer/editor, not Mark Cerny. Here's the only mention of HSA that Cerny made (in google translate English):

GPU and CPU because is different in nature quite now, at this stage, when using the language uniform based on the architecture such as HSA, and use it to improve the efficiency is difficult. However, if you used to be able to use the same language on the GPU and CPU, the efficiency of the development will increase dramatically. That point will be a great help extremely. So I think for this part, and how it comes to long-term goals.
And yeah, I don't really want to try and draw anything from that interpretation.
 

galego (Golden Member, joined Apr 10, 2013)
Yes, the automated translation *ucks at his answer, but I think that the question asked by Masayasu Ito (Sony Computer) makes it clear: the PS4 has HSA. In his answer, I believe Cerny is trying to say that they will disclose full HSA support at the software level in the long run. That is, the first games will not be using HSA capabilities.
 

Khato (Golden Member, joined Jul 15, 2001)
An actual translation makes that Mark Cerny interview at least a bit easier to understand - http://www.neogaf.com/forum/showthread.php?t=532077

Again, I don't really want to try and draw any conclusions regarding the hardware implementation from that as it can be taken both ways. (Instead I'll just guess that it kinda sounds like a half-implemented version in terms of hardware.) As for the comment on HSA, well, it's not clear whether he's just saying that its current software state is useless or if the initiative is a dead end and they're developing their own API to make it happen. Or if he's talking about the next generation hardware already.

The properties of CPU and GPU are quite different, so at the current stage, if you were to use a unified architecture such as HSA, it would be difficult to use the CPU and GPU efficiently. However, once the CPU and GPU are able to use the same APIs, development efficiency should increase exponentially. This will be rather huge. Thus, we expect to see this as somewhat of a long-term goal.
 

NostaSeronx (Diamond Member, joined Sep 18, 2011)
Kaveri doesn't have GPU context switching. That doesn't come until the next gen of GCN (2.0?). Kaveri would be based on GCN 1.1, or whatever you want to call Sea Islands.
Graphics Core Next came with out-of-order engines and has multiple context-switching engines. I think going from Southern Islands (SI) to Sea Islands (CI) doubled the number of contexts the GPU can have. Are we talking about the same thing?

SI ACE - 4 contexts
CI ACE - 8 contexts
4 CI ACE engines = GK110 context switching.
7970 has two? -> 8 contexts (GK104 capacity)
7970 CI successor has four -> 32 contexts.
PS4/XB1/Kaveri has two -> 16 contexts.
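The totals in that list come from multiplying the number of ACEs by the per-ACE context count given in the post (4 for Southern Islands, 8 for Sea Islands). A quick check of the arithmetic, using only the poster's figures, which are not independently verified here:

```python
# Per-ACE context counts as stated in the post (not independently verified).
CONTEXTS_PER_ACE = {"SI": 4, "CI": 8}


def total_contexts(arch: str, ace_count: int) -> int:
    """Total contexts = number of ACE engines x contexts per ACE."""
    return ace_count * CONTEXTS_PER_ACE[arch]


configs = {
    "7970 (SI, 2 ACEs)": total_contexts("SI", 2),
    "7970 CI successor (4 ACEs)": total_contexts("CI", 4),
    "PS4/XB1/Kaveri (CI, 2 ACEs)": total_contexts("CI", 2),
}
for name, n in configs.items():
    print(f"{name}: {n} contexts")
```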
 

ViRGE (Elite Member, Moderator Emeritus, joined Oct 9, 1999)
Yes, HSA implementation comes in steps. Trinity already includes some HSA and Richland was presented by AMD as HSA enhanced. Kaveri is a step forward in HSA.
But that's not it at all. HSA isn't just a loose collection of technologies; it's the technologies necessary to support a new ISA: HSAIL. Trinity isn't HSA and Jaguar isn't HSA; neither of those parts can execute HSAIL. Kaveri is the HSA test vehicle; the standard and IL are being written around it.
Graphics Core Next came with out-of-order engines and has multiple context-switching engines. I think going from Southern Islands (SI) to Sea Islands (CI) doubled the number of contexts the GPU can have. Are we talking about the same thing?
To clarify, I'm talking about fast GPU context switching. It's not the number of ACEs that's the issue; the issue is that any given CU needs to be able to switch contexts much faster than what GCN 1.x is currently capable of. Kaveri actually won't have this feature - it's not necessary for HSAIL execution - but it's one of the features that "finalizes" HSA, because otherwise you're locked into so-called "monolithic" HSA programs that primarily run on the GPU and don't play nicely with others.
 

NostaSeronx (Diamond Member, joined Sep 18, 2011)
To clarify, I'm talking about fast GPU context switching.
Fermi -> Graphics Core Next -> Kepler

The three GPU architectures that have fast GPU context switching. Sea Islands just makes it more competitive with Kepler/Kepler+.

What I can get from C.I. right now:
Very vague stuff pretty much in line with CPU compute right now.
- Uses the CPU counters.
- Video & Compute is the same memory space.
- Unaligned Optimization.
- Able to detect memory issues like a CPU.

Also, in reference to SI & CI, what is considered the "compute pipeline"?
 

mrmt (Diamond Member, joined Aug 18, 2012)
Because as we all know, nobody uses CUDA on Quadros, and nVidia hasn't sold a single Tesla. It's not a pipe dream for nVidia, but a successful revenue generator.

It was a pipe dream for Nvidia, a dead pipe dream FWIW. Just have a look at this:

http://www.tomshardware.com/news/nvidia-cuda-gpu-fermi-geforce,8766.html

This was the mood in 2009, when Nvidia was trying to become a compute powerhouse. Jensen himself rallied Nvidia around GPGPU; they sent their GPU business into the red to fund R&D for their first GPGPU chip (Fermi), and yet they don't have much to show for their Tesla line. The Quadro business grew along with its native segment, and Tesla is still a blip on their revenues. Add to that the fact that Nvidia is investing heavily in mobile and, yes, you can say that GPGPU was a flop for Nvidia.

No wonder they reverted to smaller chips with limited compute capabilities for their mainstream parts and are investing heavily in the mobile arena for new revenue streams.