AMD: 4XXX not great in OpenCL

Soulkeeper

Diamond Member
Nov 23, 2001
6,712
142
106
I didn't see any benchmarks in there.
Sounds like they are just going off a quote.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Wasn't that the same site that had several "fermi cards" lying around?
 

VirtualLarry

No Lifer
Aug 25, 2001
56,226
9,990
126
It's known that ATI's 48xx series lacks the type of local memory cache that NV cards have, which is (partially) why their F@H performance is less. From what I've read, it's actually faster on ATI cards to compute some things twice, rather than compute them once and store the value to be retrieved later. Hence ATI cards score half as much as NV cards in F@H.
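The recompute-versus-store tradeoff described above can be sketched as a toy cost model. This is only an illustration: the function names and every cycle count are hypothetical, chosen to show the shape of the argument, and are not measurements of any ATI or NV card.

```python
# Toy cost model: keep an intermediate value in on-chip scratch memory
# versus recomputing it at every use.
# All cycle counts are made-up illustrative numbers, not measurements.

def cost_store_and_reuse(uses, compute_cycles, store_cycles, load_cycles):
    """Compute the value once, store it, then load it back for every use."""
    return compute_cycles + store_cycles + uses * load_cycles


def cost_recompute(uses, compute_cycles):
    """Recompute the value from scratch at every use; no memory traffic."""
    return uses * compute_cycles


def cheaper_strategy(uses, compute_cycles, store_cycles, load_cycles):
    """Return 'recompute' or 'store', whichever costs fewer cycles."""
    stored = cost_store_and_reuse(uses, compute_cycles, store_cycles, load_cycles)
    recomputed = cost_recompute(uses, compute_cycles)
    return "recompute" if recomputed < stored else "store"


# Hypothetical device with fast local memory: storing wins (26 vs 40 cycles).
fast_local = cheaper_strategy(uses=2, compute_cycles=20,
                              store_cycles=2, load_cycles=2)

# Hypothetical device whose "local" memory is really slow off-chip memory:
# recomputing wins (40 vs 320 cycles).
emulated_local = cheaper_strategy(uses=2, compute_cycles=20,
                                  store_cycles=100, load_cycles=100)

print(fast_local, emulated_local)
```

With these assumed numbers, the same two-use value is cheaper to store on the first device and cheaper to compute twice on the second, which is the situation the post describes.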
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,719
7,016
136
ATI was probably hurting for money so badly after the HD 2xxx & 3xxx series (and AMD's Phenom flop) that they just needed a competitive part out, without much R&D or future-proofing, that could make them a quick buck. Hence the HD 4xxx series was born.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
ATI was probably hurting for money so badly after the HD 2xxx & 3xxx series (and AMD's Phenom flop) that they just needed a competitive part out, without much R&D or future-proofing, that could make them a quick buck. Hence the HD 4xxx series was born.

These parts are designed 2 to 4 years before they ever hit the market, so it's unlikely the 4xxx series could have reacted very much at all to the 3xxx series and Phenom. And since the 4xxx series isn't very different from the 3xxx series, it's unlikely to be a reaction to the 2xxx series either.

Besides that, DirectX 10.1 is future proofing.
As for OpenCL, well, AMD had their own idea of what a compute language would look like, and Nvidia's won out. The 4xxx cards can still offer performance competitive with Nvidia's 8xxx/9xxx cards once double precision computations are used anyway. And the 5xxx series blows away Nvidia's cards in single and double precision.

But yes, the 4xxx series' OpenCL implementation is basically emulating local storage, so algorithms that rely on it will run like crap. Still, it's not like the 4xxx is worthless for OpenCL, and at this point it's much better to have the widest install base for OpenCL possible, rather than limiting it to only the cards that will run it optimally.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Fox5, could you expand on that statement right here?

As for OpenCL, well AMD had their own idea of what a compute language would look like, and nvidia's won out. The 4xxx cards can still offer competitive performance

You're not suggesting OpenCL is an Nvidia thing, are you??? It's Apple, not NV. C is for CUDA and the game is just starting. No winners yet. Just whiners.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
Fox5, could you expand on that statement right here?

As for OpenCL, well AMD had their own idea of what a compute language would look like, and nvidia's won out. The 4xxx cards can still offer competitive performance

You're not suggesting OpenCL is an Nvidia thing, are you??? It's Apple, not NV. C is for CUDA and the game is just starting. No winners yet. Just whiners.

OpenCL isn't made by nvidia, but it's not very different in design from CUDA. Nvidia certainly had a large influence on it, even if just leading by example.
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
ATI was probably hurting for money so badly after the HD 2xxx & 3xxx series (and AMD's Phenom flop) that they just needed a competitive part out, without much R&D or future-proofing, that could make them a quick buck. Hence the HD 4xxx series was born.

Totally wrong. The RV770 took 3 years to design; it isn't something pulled out of a sleeve.

These parts are designed 2 to 4 years before they ever hit the market, so it's unlikely the 4xxx series could have reacted very much at all to the 3xxx series and Phenom. And since the 4xxx series isn't very different from the 3xxx series, it's unlikely to be a reaction to the 2xxx series either.

I agree with you.

It's known that ATI's 48xx series lacks the type of local memory cache that NV cards have, which is (partially) why their F@H performance is less. From what I've read, it's actually faster on ATI cards to compute some things twice, rather than compute them once and store the value to be retrieved later. Hence ATI cards score half as much as NV cards in F@H.

I also heard that the proteins run on ATi cards are much more complex than the ones run on nVidia hardware, which loves tons of simple threads instead of the very large instructions that the ATi architecture loves, hence the double calculation.

I also think that ATi's OpenCL implementation is still in the beta stage. I'm using the latest beta drivers, dated this month, and I'm still not able to run any OpenCL benchmark. So I don't know how this speculation is being made when there aren't even benchmarks available, and in the NGOHQ DirectCompute benchmark, the HD 4890 scores higher than the HD 5770. But we all know that because of the global data share limitation in the HD 4800 series, it's quite hard to keep all its execution resources busy, especially on such a wide architecture as the RV7x0 series. I don't know much about hardware limitations, but I love to read and learn.
 

Soleron

Senior member
May 10, 2009
337
0
71
http://www.techreport.com/discussions.x/18201

That is a better article on the situation. Has a second AMD quote.

Villmow later qualified that response by saying, "[the Radeon HD 4870] just has to be programmed differently than the 5XXX series to get performance because of the lack of proper hardware local support. It is possible to get good performance, just not with a direct port from Cuda [Nvidia's GPU compute architecture]." He also stressed that AMD's compiler stack will include more device-specific optimizations as it matures.
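To make "programmed differently" concrete, here is a CPU-side simulation of one sum reduction written two ways: a tree reduction through a per-group local buffer (the style a direct CUDA port would typically use) and a version where each work-item accumulates into a private variable. This is a sketch, not OpenCL code; the function names and the access counting are hypothetical, and whether this particular restructuring is what Villmow had in mind is an assumption.

```python
def tree_reduce_with_local(data, group_size):
    """Each group stages its tile in a 'local' buffer, then halves it repeatedly."""
    assert group_size > 0 and group_size & (group_size - 1) == 0, "power of two"
    partials, local_accesses = [], 0
    for start in range(0, len(data), group_size):
        local = list(data[start:start + group_size])
        local += [0] * (group_size - len(local))   # pad a short final tile
        local_accesses += group_size               # one local write per slot
        stride = group_size // 2
        while stride > 0:
            for i in range(stride):
                local[i] += local[i + stride]      # two local reads, one local write
                local_accesses += 3
            stride //= 2
        partials.append(local[0])
    return sum(partials), local_accesses


def strided_private_reduce(data, num_items):
    """Each work-item sums a strided slice into a private accumulator."""
    partials = []
    for item in range(num_items):
        acc = 0                                    # stands in for a register
        for i in range(item, len(data), num_items):
            acc += data[i]
        partials.append(acc)                       # one result per work-item
    return sum(partials), 0                        # no local-memory traffic


data = list(range(1, 1025))
total_local, accesses_local = tree_reduce_with_local(data, group_size=64)
total_private, accesses_private = strided_private_reduce(data, num_items=64)
print(total_local, accesses_local, total_private, accesses_private)
```

Both versions return the same sum; the second simply never touches the local buffer, so it does not depend on how fast (or how emulated) that buffer is. The tradeoff is that the final pass over the per-item partials has to happen somewhere else, here a plain `sum` standing in for a host-side or second-kernel step.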
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,719
7,016
136
These parts are designed 2 to 4 years before they ever hit the market, so it's unlikely the 4xxx series could have reacted very much at all to the 3xxx series and Phenom. And since the 4xxx series isn't very different from the 3xxx series, it's unlikely to be a reaction to the 2xxx series either.

Besides that, DirectX 10.1 is future proofing.
As for OpenCL, well, AMD had their own idea of what a compute language would look like, and Nvidia's won out. The 4xxx cards can still offer performance competitive with Nvidia's 8xxx/9xxx cards once double precision computations are used anyway. And the 5xxx series blows away Nvidia's cards in single and double precision.

But yes, the 4xxx series' OpenCL implementation is basically emulating local storage, so algorithms that rely on it will run like crap. Still, it's not like the 4xxx is worthless for OpenCL, and at this point it's much better to have the widest install base for OpenCL possible, rather than limiting it to only the cards that will run it optimally.

-I don't follow:

The 3xxx series was itself a card released to make up for the 2xxx series shortfalls (namely heat and price). You're effectively saying ATI was planning on including a ring-bus on its 2xxx series parts, only to ditch it one generation later? I don't think ATI dumped that kind of R&D into the project for a one off.

Yeah these cards are PLANNED 2-3 years in advance, but that doesn't mean the situation on the ground cannot or does not radically affect future product launches.

Take the 4xxx series' original rumored spec of 480 SPs. That could very well have been ATI's original roadmap back before they released the 2xxx and saw it get clobbered. Something like SPs scales easily and very well; there is no reason ATI couldn't have done some shotgun cores for testing purposes and upped the SP count to be competitive with Nvidia.
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
-I don't follow:

The 3xxx series was itself a card released to make up for the 2xxx series shortfalls (namely heat and price). You're effectively saying ATI was planning on including a ring-bus on its 2xxx series parts, only to ditch it one generation later? I don't think ATI dumped that kind of R&D into the project for a one off.

Well, you seem to have forgotten that the ringbus was first used on the Radeon X1800 series and its derivatives. It worked fine and was later improved with fully distributed reads and writes, which gave the Radeon HD 2900XT/HD 3800 series about 85% bandwidth efficiency. The main issue with it was that it consumed a lot of power; the hub-based memory controller used in the HD 4800 series gave almost 100% bandwidth efficiency with much better power consumption and granularity.
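For a sense of what those efficiency percentages mean in practice, usable bandwidth is roughly peak bandwidth times efficiency. In the sketch below, only the 85% and near-100% figures come from the post; the bus widths and data rates are assumptions supplied for illustration, and the function names are hypothetical.

```python
def peak_bandwidth_gbs(bus_width_bits, mega_transfers_per_sec):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9


def effective_bandwidth_gbs(peak_gbs, efficiency):
    """Usable bandwidth after applying a controller efficiency between 0 and 1."""
    return peak_gbs * efficiency


# Assumed ringbus-era configuration: 512-bit bus at 1656 MT/s, 85% efficient.
ring_peak = peak_bandwidth_gbs(512, 1656)
ring_effective = effective_bandwidth_gbs(ring_peak, 0.85)

# Assumed hub-based configuration: 256-bit bus at 3600 MT/s, treated as ~100% efficient.
hub_peak = peak_bandwidth_gbs(256, 3600)
hub_effective = effective_bandwidth_gbs(hub_peak, 1.0)

print(round(ring_peak, 1), round(ring_effective, 1))
print(round(hub_peak, 1), round(hub_effective, 1))
```

Under these assumed inputs, a bus half as wide can still deliver more usable bandwidth, provided the memory runs fast enough and the controller wastes less of it.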

Yeah these cards are PLANNED 2-3 years in advance, but that doesn't mean the situation on the ground cannot or does not radically affect future product launches.

You are correct, like the last-minute change made to the HD 4850 before launch, when it got its VRAM doubled plus higher core and RAM speeds.

Take the 4xxx series' original rumored spec of 480 SPs. That could very well have been ATI's original roadmap back before they released the 2xxx and saw it get clobbered. Something like SPs scales easily and very well; there is no reason ATI couldn't have done some shotgun cores for testing purposes and upped the SP count to be competitive with Nvidia.

I heard that the HD 4800 series initially had a 480 SP design as its target, but the architecture was done so nicely, with die-space-saving techniques and such, that the chip couldn't use a 256-bit bus due to padding issues, so they stuffed the GPU with more SPs so they could use a wider bus. It's nearly impossible to create a SKU to test against a rival's cards, especially if the rival is still designing its GPU and it isn't even out yet.
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,712
142
106
A quote made by ATI.
Seems a safe bet that ATI aren't going to trash talk their own parts for fun.

I guess my opinion is wrong again ....
Thanks for replacing it with your opinion.

We wouldn't want me to say anything on a forum without being quoted.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,719
7,016
136
Well, you seem to have forgotten that the ringbus was first used on the Radeon X1800 series and its derivatives. It worked fine and was later improved with fully distributed reads and writes, which gave the Radeon HD 2900XT/HD 3800 series about 85% bandwidth efficiency. The main issue with it was that it consumed a lot of power; the hub-based memory controller used in the HD 4800 series gave almost 100% bandwidth efficiency with much better power consumption and granularity.



You are correct, like the last-minute change made to the HD 4850 before launch, when it got its VRAM doubled plus higher core and RAM speeds.



I heard that the HD 4800 series initially had a 480 SP design as its target, but the architecture was done so nicely, with die-space-saving techniques and such, that the chip couldn't use a 256-bit bus due to padding issues, so they stuffed the GPU with more SPs so they could use a wider bus. It's nearly impossible to create a SKU to test against a rival's cards, especially if the rival is still designing its GPU and it isn't even out yet.

-Sorry, forgot where the ringbus was introduced and when it was discontinued. I retract my statement.