The reason ATi GPUs underperform nVidia hardware

Denithor
Stumbled across this post in the GPU forum - thought I'd share and elicit some conversation. Make sure you follow the link and read the stuff there.

Originally posted by: Scali
Graphics and GPGPU aren't the same thing. nVidia's G80 pretty much rewrote the book on GPGPU by adding a large shared cache to its shader processors.
This has absolutely no use for graphics, because D3D and OpenGL are designed in a way that each vertex and each pixel is completely independent by definition, and there is no sharing of any data between shaders, ever.

However, when doing GPGPU tasks, you can use the shared memory to have multiple threads communicate with each other efficiently.
Prior to the 4000-series, ATi GPUs had no shared memory at all. They added it in the 4000-series, but the size is rather limited (it boils down to about 128 bytes per thread, compared to nVidia's 512 bytes), as is the bandwidth (about 544 GB/s on RV790, compared to 1,417 GB/s on GT200b).

Then I believe there is another limitation in ATi's design... namely that only one thread in every block can write to the shared memory, while the others have read-only access.

All this combined means that ATi cards do have some limitations in GPGPU compared to nVidia. This is also apparent in Folding@home.
Read this thread, for example:
http://foldingforum.org/viewto...p?f=51&t=10442&start=0
It includes comments from people like mhouston, who works for AMD on the Folding@home client. Basically they're saying that they calculate certain values multiple times because on ATi hardware this is faster than using the shared memory (LDS, Local Data Share).

 
"Due to a difference in the implementation (in part due to hardware differences), the ATI code must do two force calculations where the x86, Cell, and NVIDIA hardware need only do one. This increases the overall native FLOP count for ATI hardware, but since these are not useful FLOPS in a sense, we did not include them in the x86 count."
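A minimal sketch of why avoiding shared writes can double the work. It assumes that the "two force calculations" means evaluating each particle pair from both sides instead of once via Newton's third law; the quote does not spell this out. The toy inverse-square force and the names `pair_force`, `forces_shared` and `forces_independent` are hypothetical and are not taken from the Folding@home code.

```python
import math


def pair_force(pi, pj):
    """Toy inverse-square attraction exerted on particle i by particle j."""
    d = [b - a for a, b in zip(pi, pj)]
    r2 = sum(c * c for c in d)
    inv_r3 = 1.0 / (r2 * math.sqrt(r2))
    return [c * inv_r3 for c in d]


def forces_shared(pos):
    """Evaluate each pair once and write the result to BOTH particles.

    On a GPU this needs threads to exchange or accumulate data,
    which is what fast shared memory makes cheap.
    """
    n = len(pos)
    forces = [[0.0] * 3 for _ in range(n)]
    evals = 0
    for i in range(n):
        for j in range(i + 1, n):
            f = pair_force(pos[i], pos[j])
            evals += 1
            for k in range(3):
                forces[i][k] += f[k]
                forces[j][k] -= f[k]  # equal and opposite reaction
    return forces, evals


def forces_independent(pos):
    """Each particle recomputes every pair it belongs to.

    No particle writes to another particle's accumulator, so no
    inter-thread communication is needed, at the cost of twice
    as many force evaluations.
    """
    n = len(pos)
    forces = [[0.0] * 3 for _ in range(n)]
    evals = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            f = pair_force(pos[i], pos[j])
            evals += 1
            for k in range(3):
                forces[i][k] += f[k]
    return forces, evals
```

For N particles the first version performs N(N-1)/2 force evaluations and the second N(N-1). Both return the same forces, so the extra evaluations in the second version add to the FLOP count without adding useful work.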
 
Interesting, thanks 🙂

Btw you should make it clear in your title that you are referring to F@H.
Outside of F@H, ATI cards are pretty competitive 🙂.
 
Good post! That is interesting, I always wondered where the big difference came from.

I don't fold on my 4890, and I'm not sure I would if I owned an nVidia-based card either, as I really don't want the extra heat. I've bought cards from both of them over the years and think they both make great cards, but for me, currently anyway, when it comes to gaming I think ATI has the edge on value.
 