"Why won't ATI Support Cuda and PhysX?"


Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: evolucion8
Originally posted by: Keysplayr
GPGPU performance.

Why isn't it optimized? Why can't they optimize it? Why won't they? If they were able to do it, they would, right? This argument is utter BS, because they have had waaaaay more than enough time to properly code for this arch. I think it's the best they could get out of it. I love how you guys are touting performance that will never materialize, because it's damn near impossible to code for ATI in its current arch. Which translates to: if you can't code for it, it's almost useless to try.

That's your opinion, which we all respect, but nVidia and ATi engineers are no lousy engineers; they make decisions based on R&D. GeForce FX was a very flawed architecture, and yet nVidia was able to optimize it so well that it could almost keep up with the R3X0 architecture in nVidia-optimized games. Considering that ATi has been working much longer on their superscalar architecture, and that ATi has a much better software engineering background thanks to its merger with AMD, there's no huge reason to spend so much time on GPGPU performance when most ATi cards sold are used for games.

It's a matter of execution and resource allocation with the driver development team. While nVidia's architecture currently has the upper hand in GPGPU, I don't see it as a key selling point or a must-have feature, since most applications today aren't completely parallel and will require general-purpose calculations, which will run like crap on the massively parallel GPUs of today. In the end, both the software engineer and the ATi driver engineer must work to take advantage of the optimizations the architecture allows. And yet Folding@Home is an old client which doesn't even use the local data share found on the HD 4000 series; it was made only for the HD 3x00 and lower, and for GeForce 8, which uses a completely different approach that will work great no matter what. nVidia is about predictable performance, since no optimization is necessary to get good performance out of it in GPGPU applications; ATi is about extracting and maximizing parallelism, which will require more work.

Well spoken, but after saying all that, it still doesn't change the end game.
This raises many questions. Questions that have been asked many times before.
For examples:

"There is no huge reason for AMD to spend so much time in GPGPU performance"
Ask the universities, laboratories, military, corporations what they spent on Nvidia GPU's for their computing (not gaming) needs. If that's not a reason to pursue GPGPU technologies, I don't know what else is.

"Is a matter of execution and resource allocation with the driver development team."
As many of us have suspected over the last few months, AMD may not even HAVE the resources to dedicate to GPGPU R&D. They are running very thin these days and it's understandable that they need to focus on what can make them money, right now.
It's a tough situation.

"Folding@Home is an old client which doesn't even use the Data cache share found on the HD 4000 series"
I'm no programmer, but is it that difficult to code F@H to utilize the Data caches found on the HD 4000 series? These are my points. It must be TERRIBLY difficult to code for this architecture, otherwise we would be seeing many many more 3rd party applications over the last year or since R7xx launched. We don't have to rely on AMD's resources other than an good SDK for devs. Let the devs do all the work. But they are not. Either because of a really crappy SDK (stream) or the SDK is good, but the hardware is just so wrong, or awkward for these types of GPGPU applications that devs don't even bother.

"nVidia is about predictable performance since no optimization is necessary to get a good performance of it in GPGPU applications, ATi is about extracting and maximizing parallelism which will require more work."

Apparently this is correct. People are using CUDA. Devs are using it. We see what their labors accomplish. Fruitful. Worth the time. ATi is not about extracting and maximizing parallelism. As you said, they are about gaming. GPGPU is third fiddle to them, and that painfully shows in the form of next to no support from devs, or from AMD themselves.
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: evolucion8
http://www.extremetech.com/art.../0,2845,2324319,00.asp

Right now, Nvidia's cards are better folders, due primarily to better optimized code. With the latest drivers, most GeForce cards are getting pretty close to peak utilization. ATI's cards, which rely on their CAL driver, still seem to have a lot of headroom. In fact, the new Radeon HD 4800 have 800 stream processors, but the current client runs on them as if they were older cards with only 320.

Like I said, Folding@Home doesn't take advantage of the HD4x00 architecture.

Only in the single aspect of not utilizing the 4xxx series' local data share - not the fact that it's actually only using 320 shaders out of the 800.

 

dguy6789

Diamond Member
Dec 9, 2002
8,558
3
76
The folding development team is pretty well known for being really, really slow on updating things. It is no surprise at all that they don't have full support for something that takes a bit more effort than everything else.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: SSChevy2001
How can the ATi client be optimized when it's repeating some of the workload? While it might never be fully exploited, in its current state it's far from efficient.

It's repeating some of the workload because that's faster than storing and retrieving.

It's very simple...
Say you want to do this:
C = A + B;
E = C + D;

Now, apparently on nVidia's architecture, it makes sense to do it this way...
On ATi's architecture, it's faster to do it this way:
C = A + B;
E = A + B + D;

That's a difference in architecture, because apparently nVidia's GPU can store to C and retrieve it again for another operation, in an efficient manner... With ATi it's faster to just recalc A + B rather than to retrieve it from memory.

So in both cases it *is* optimized. The difference is in the hardware. Different hardware requires different optimizations.
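
To make it concrete, here's a rough CUDA-style sketch of the two variants (purely illustrative - not actual Folding@Home code, and the ATi client obviously isn't written in CUDA anyway):

__global__ void reuse_kernel(const float* A, const float* B, const float* D, float* C, float* E, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        C[i] = A[i] + B[i];        // store the intermediate result
        E[i] = C[i] + D[i];        // read it back and reuse it (cheap on nVidia)
    }
}

__global__ void recompute_kernel(const float* A, const float* B, const float* D, float* C, float* E, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        C[i] = A[i] + B[i];
        E[i] = A[i] + B[i] + D[i]; // one extra add, but no read-back of C[i] (cheaper on ATi)
    }
}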
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: evolucion8
GeForce FX was a very flawed architecture, and yet nVidia was able to optimize it so well that it could almost keep up with the R3X0 architecture in nVidia-optimized games

Those were games, however. nVidia could just cheat by doing simpler calculations at lower precision. That's not optimization.
You can't do that in a GPGPU application, because it would produce incorrect results.
Optimization is taking an algorithm and rewriting it so it performs faster while still producing correct results.
In the strictest sense, GeForce FX didn't produce correct results. If you compared an nVidia image with R3x0, the quality on FX was significantly lower. But for a game, 'correct' results don't matter much. As long as it still more or less looks like the same game, people are happy.
With scientific calculations such as MilkyWay, SETI, Folding and all that, precision and accuracy are obviously a LOT more important.
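
A trivial CPU-side example of the kind of thing that goes wrong (just a sketch to illustrate the point, nothing to do with any actual client): sum 0.1 ten million times in single and in double precision and compare.

#include <stdio.h>

int main(void)
{
    float  sum_f = 0.0f;
    double sum_d = 0.0;
    for (int i = 0; i < 10000000; i++) {
        sum_f += 0.1f;   // single precision: rounding error piles up as the sum grows
        sum_d += 0.1;    // double precision: stays very close to the true value
    }
    printf("float:  %f\n", sum_f);   // visibly off from 1000000
    printf("double: %f\n", sum_d);   // ~1000000
    return 0;
}

For a game nobody would notice the drift; for a scientific result it's unacceptable.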

Originally posted by: evolucion8
Considering that ATi has been working much longer on their superscalar architecture, and that ATi has a much better software engineering background thanks to its merger with AMD, there's no huge reason to spend so much time on GPGPU performance when most ATi cards sold are used for games.

Funny enough, games are exactly where GPGPU will be used in the future. AMD has been promoting both Havok physics and AI on their videocards.
I'm quite sure AMD also wants a slice of the lucrative supercomputing sector. Just look at what prices nVidia's Tesla cards go for. AMD actually has a similar product line, called FireStream: http://ati.amd.com/products/streamprocessor/specs.html
No, I think AMD has many reasons to spend time on GPGPU performance.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: Forumpanda
Scali is leaning towards nvidia, evolucion8 is leaning towards ATI, but both argue their point of view

Exactly. I'm leaning towards nVidia's current hardware and toolset because I believe they currently offer a better balance of performance and ease of use when it comes to GPGPU. I've argued why.

If AMD's next generation of hardware and software addresses their current weaknesses and delivers a better overall balance than nVidia, I'll be the first to buy their stuff.
I've switched brands many times in the past, and I'm not ever planning to stick with inferior technology just because it has a certain brand on it.
I have this feeling that my next videocard could be Intel... :)
 

Atechie

Member
Oct 15, 2008
60
0
0
Originally posted by: Scali
Originally posted by: Forumpanda
Scali is leaning towards nvidia, evolucion8 is leaning towards ATI, but both argue their point of view

*snip*...I have this feeling that my next videocard could be Intel... :)

I see what you did there... :evil:

 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: Scali
*snip*...
So in both cases it *is* optimized. The difference is in the hardware. Different hardware requires different optimizations.

This is what I was figuring (not the technical part but how everything is already optimized).
So, there is no magical performance to tap into. It is what it is going to be.

 

cm123

Senior member
Jul 3, 2003
489
2
76
Originally posted by: Keysplayr
Even if ATI hardware can't run PhysX "as fast" as comparable Nvidia hardware, it couldn't possibly be worse than trying to run it on the CPU. GPUs are so powerful today, and there are so many of them to choose from, that I don't think PhysX would hold back a powerful enough ATI card. Unless, of course, performance is abysmal even when tweaked at the programming level. But I think ATI cards have enough juice to run PhysX acceptably.



Very true statement, Keys - however, do you think AMD wants to enable this and not be #1, or at least close to #1, in performance?

More so with the way the next gen of cards looks - maybe once AMD is on top, or very close, and has a next gen of cards that look to be on top or very close. I don't need to tell you that nVidia's next-gen cards are monsters/powerhouses of cards, yet AMD's next release is a bit more of a refresh, arriving with or before Win7, with the release after that looking a bit more like the main event against nVidia's new cards.

Being a focus group member, you know it sometimes is in fact about being #1, even when many here would debate that - other times it's about having what will sell or be mainstream, as long as people here think so too. How would it look (guessing you have seen those projection numbers) if AMD enabled PhysX on their cards and did not fare very well against nVidia, even though it may run the game's PhysX just fine?
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: cm123
*snip*... How would it look (guessing you have seen those projection numbers) if AMD enabled PhysX on their cards and did not fare very well against nVidia, even though it may run the game's PhysX just fine?

I can safely say that any ATI user would be positively happy if ATI enabled PhysX on their GPUs. If they said they wouldn't be happy, they would be lying. It would be free for the end user and would let them play games that utilize GPU PhysX. I've seen people complaining that with hardware physics enabled they are only getting 11fps in Cryostasis. That would upset me too. Having faith that AMD will run Havok, or OpenCL-based Havok, on their GPUs is kind of a stretch with their current architecture and the R8xx refresh of that same architecture. Like you mention, we will probably have to wait for ATI's true next-gen architecture, and that's only if they move away from this current architectural design. It's like the Pentium 4 of GPUs: good at certain tasks (the P4 was good at encoding, R7xx is good at gaming) but kind of sucking wind at most everything else - at least compared to AMD's Athlon 64 offerings, or Nvidia's CUDA arch, respectively.

By not enabling PhysX on their GPUs, AMD avoids the risk of being "second" in performance. But that still leaves all their users out in the cold. The exact REAL reasons for their decision are known only to the ladies and gentlemen of AMD's boardroom and top engineers. We'll never know the truth of it.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
We may soon find ourselves in a rather strange situation:
1) nVidia OpenCL performs better than ATi OpenCL in most applications
2) nVidia C for Cuda performs better than nVidia OpenCL in most applications

What if this means that developers will just stick to Cuda?
 

cm123

Senior member
Jul 3, 2003
489
2
76
Originally posted by: Keysplayr
*snip*...



I agree, Keys...

However, even after getting AMD on board with PhysX, step two - since it's not part of DX (yet, at least) - is getting lots more games with PhysX support. None of my favorite games have PhysX right now; I'd like to see that change myself.

Not even good AMD partners have the answers from AMD about many of their decisions (as of late), and that includes those who are beta members or what nVidia calls focus group members. They are in such a rush just to land a full DX11 part ahead of nVidia that little focus is on performance or a true product-line update.

Too bad Intel won't come more into the picture with higher-end graphics cards, Matrox won't make a comeback, and nVidia/AMD won't merge - that would be interesting.

Have you had your hands on any of the Intel solutions? Their more top-of-the-line part is a bit like an 8800GTX in features and performance - I wonder if Intel will in fact launch this fall, and with a bit more improved product line.




 

tommo123

Platinum Member
Sep 25, 2005
2,617
48
91
Originally posted by: Scali
Originally posted by: Forumpanda
Scali is leaning towards nvidia, evolucion8 is leaning towards ATI, but both argue their point of view
I have this feeling that my next videocard could be Intel... :)

I think mine might be. I don't like nVidia cards (tried one a few months back and it reminded me of when I had a 7950 GX2 - hated it), but... ATi is useless for other things; e.g. there are zero options for anything CUDA-like. Stream isn't used in anything I've seen that would be useful for me.

Hopefully Intel will have something out that at least competes with a midrange part but would be great for GPGPU purposes (I'm hoping for x264 offloading some of the workload onto the GPU), and if Intel is serious, it might be priced aggressively too.
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
Originally posted by: Scali
We may soon find ourselves in a rather strange situation:
2) nVidia C for Cuda performs better than nVidia OpenCL in most applications

What if this means that developers will just stick to Cuda?

3) It's easier to programme for Cuda than OpenCL.

Obviously in the end CUDA will die and get replaced by DirectX compute or OpenCL - I suspect DirectX compute will win in the end because it'll end up easier to programme than OpenCL (the same way DX managed to displace the very well-established OGL) - MS are good at that sort of thing.

However, right now CUDA is really the only option, particularly because, as I understand it, nVidia's OpenCL implementation is currently slow (i.e. not really any faster than trying to do the same thing with the CPU), and ATi are still a way off from bringing out OpenCL support.
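
To illustrate point 3 (a hedged sketch, not a benchmark): a complete CUDA kernel plus launch is only a few lines, while the OpenCL equivalent needs a pile of host setup before anything runs.

__global__ void add(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Host side: cudaMalloc / cudaMemcpy the buffers, then launch:
//   add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
// The OpenCL version of the same thing also needs clGetPlatformIDs, clCreateContext,
// clCreateCommandQueue, clCreateProgramWithSource, clBuildProgram, clCreateKernel,
// clSetKernelArg and clEnqueueNDRangeKernel before the kernel ever runs.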
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
Originally posted by: Scali
It's repeating some of the workload because that's faster than storing and retrieving.

It's very simple...
Say you want to do this:
C = A + B;
E = C + D;

Now, apparently on nVidia's architecture, it makes sense to do it this way...
On ATi's architecture, it's faster to do it this way:
C = A + B;
E = A + B + D;

That's a difference in architecture, because apparently nVidia's GPU can store to C and retrieve it again for another operation, in an efficient manner... With ATi it's faster to just recalc A + B rather than to retrieve it from memory.

So in both cases it *is* optimized. The difference is in the hardware. Different hardware requires different optimizations.

But the Folding@home client is optimized for older ATi architectures, which didn't have the cache hierarchy that the current architecture has, so what you said makes sense, because it would be a lot slower trying to retrieve it from memory instead of recalculating it.

Originally posted by: Scali
Those were games, however. nVidia could just cheat by doing simpler calculations at lower precision. That's not optimization.
You can't do that in a GPGPU application, because it would produce incorrect results.
Optimization is taking an algorithm and rewriting it so it performs faster while still producing correct results.
In the strictest sense, GeForce FX didn't produce correct results. If you compared an nVidia image with R3x0, the quality on FX was significantly lower. But for a game, 'correct' results don't matter much. As long as it still more or less looks like the same game, people are happy.
With scientific calculations such as MilkyWay, SETI, Folding and all that, precision and accuracy are obviously a LOT more important.

The main reason the FX was slow was that it had too few registers available at full precision, so it had to drop to lower precision; its hardware scheduler was also very weak, and all it did was juggle data senselessly inside the GPU; it also lacked math power, had too few texture units, and its weird vertex shader layout was very inefficient. There's more to it than simply precision accuracy.

Originally posted by: Scali
*snip*...
Funny enough, games are exactly where GPGPU will be used in the future. AMD has been promoting both Havok physics and AI on their videocards.
I'm quite sure AMD also wants a slice of the lucrative supercomputing sector. Just look at what prices nVidia's Tesla cards go for. AMD actually has a similar product line, called FireStream: http://ati.amd.com/products/streamprocessor/specs.html
No, I think AMD has many reasons to spend time on GPGPU performance.

Yes, you mentioned FireStream, which uses a specialized driver certified for certain applications - do ATi or nVidia do the same for consumer graphics? AFAIK nobody buys FireStream to run Folding@Home or similar applications. Stating that ATi GPGPU performance is lacking because of its weak performance in Folding@Home - which isn't optimized and updated to take advantage of the HD 4x00 architecture - or because it doesn't have PhysX, is simply a personal opinion without a strongly founded basis, because there's a valid reason why it runs slower than it should: ATi hardware is more dependent on software optimizations than nVidia's counterpart.

 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
Originally posted by: Keysplayr
This is what I was figuring (not the technical part but how everything is already optimized).
So, there is no magical performance to tap into. It is what it is going to be.

It's not about magic. Like I stated in previous posts using Sandra, ATi hardware is weak when it has to retrieve data from memory, but it's much faster when it's crunching calculations, especially in double precision. Scali's post shows the same thing, only Folding@Home performs worse because it doesn't fully utilize the cache hierarchy of the HD 4x00 series or all of its stream processors; that explains why the HD 4800 is theoretically almost 2.5 times faster than the HD 3870 but is barely faster when it comes to Folding@Home.
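
For reference, the "almost 2.5 times" figure is just the theoretical peak math (assuming the usual reference clocks of 750MHz for the HD 4870 and 775MHz for the HD 3870, and counting a multiply-add as 2 flops per stream processor per clock):

HD 4870: 800 SPs x 750MHz x 2 flops = 1200 GFLOPS
HD 3870: 320 SPs x 775MHz x 2 flops =  496 GFLOPS
1200 / 496 = ~2.4x theoretical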
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: evolucion8
But the Folding@home client is optimized for older ATi architectures, which didn't have the cache hierarchy that the current architecture has, so what you said makes sense, because it would be a lot slower trying to retrieve it from memory instead of recalculating it.

I, and the developers in the thread I linked to, were talking about the 4000 series.
Older architectures did not have LDS at all, so I couldn't possibly have been talking about those.
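
For anyone wondering what "using the data share" even means: it's the same idea as shared memory in CUDA - stage data in on-chip memory so a thread group reads global memory once instead of over and over. A minimal sketch in CUDA syntax, assuming 256-thread blocks (ATi's CAL/Brook+ route looks different; this is just to show the concept):

__global__ void sum_neighbours(const float* in, float* out, int n)
{
    __shared__ float tile[256];                    // on-chip 'data share' for this block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;    // one global-memory read per thread
    __syncthreads();                               // wait until the whole tile is loaded

    float left  = (threadIdx.x > 0)   ? tile[threadIdx.x - 1] : 0.0f;  // neighbours come from
    float right = (threadIdx.x < 255) ? tile[threadIdx.x + 1] : 0.0f;  // on-chip memory, not VRAM
    if (i < n) out[i] = left + tile[threadIdx.x] + right;
}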
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: evolucion8
It's not about magic. Like I stated in previous posts using Sandra, ATi hardware is weak when it has to retrieve data from memory, but it's much faster when it's crunching calculations, especially in double precision. Scali's post shows the same thing, only Folding@Home performs worse because it doesn't fully utilize the cache hierarchy of the HD 4x00 series or all of its stream processors; that explains why the HD 4800 is theoretically almost 2.5 times faster than the HD 3870 but is barely faster when it comes to Folding@Home.

Because of the hardware design, it's just not possible to reach the same levels of efficiency as nVidia gets.
That's why ATi's theoretical figures are far more theoretical than nVidia's.
You can't just fix that by optimizing software. The software already IS optimized; this is as good as it gets. It's a choice ATi made when designing the hardware. Their architecture crams a lot of processing power into a small area... but the downside is that you can't really extract that processing power.
In graphics you already see that: the 4890 has the highest theoretical specs, but the GTX285 is generally faster. And graphics is the best possible case for ATi's architecture, since the code is embarrassingly parallel and doesn't rely on any kind of local sharing at all. Most GPGPU algorithms will not run as efficiently as graphics does.
Heck, some of the nVidia SDK samples give an estimate of actual flops... My 8800GTS scores about 90 GFLOPS in the n-body sample... but the theoretical performance is 230 GFLOPS.
I'd be living in a fairytale world if I expected a few optimizations to suddenly make the n-body sample 2.5x as fast just because that's what my card could *theoretically* do. Real-world algorithms just don't work that way.
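
(For the curious: that kind of "actual GFLOPS" number is just derived from the kernel timing, roughly like the hypothetical helper below - it's not from the SDK, and the ~20 flops per body-body interaction is the conventional estimate, not an exact count.)

// Rough sketch: sustained GFLOPS for an n-body run, estimated from the timing.
double nbody_gflops(int numBodies, int iterations, double seconds)
{
    const double flopsPerInteraction = 20.0;   // conventional estimate, not an exact count
    double interactions = (double)numBodies * (double)numBodies * (double)iterations;
    return interactions * flopsPerInteraction / seconds / 1e9;
}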
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
Yeah, unfortunately only games can get such a performance boost on video cards, while regular applications can only get such a boost on CPUs, which are flexible as hell. But I mean that Folding@Home should at least be able to run much faster on the HD 4870 than on the HD 3870 - probably not near its maximum theoretical speed, but considerably faster than it's running today, and not magically, like Keysplayr stated previously. And of course, if the 8800GTS can reach 230 GFLOPS, it's when running 3D games because of their parallel nature; in stuff like n-body, no general-purpose application can tap such power.