Milkyway@Home: consumer GPU or professional GPU?

Sunny129

Diamond Member
i'll get right to the point. i've been wondering lately if it would be worth it to invest in a professional graphics solution whose FP64 performance isn't reduced to 1/4 or 1/5 of its single precision performance like all the AMD/ATI consumer GPUs (or 1/8 or 1/12 of its single precision performance like all the nVidia consumer GPUs), and whose streaming processors aren't limited to only 2 FP64 operations per clock. i know they can get real expensive and have far more memory than i'll ever need for crunching purposes, but i figure even a lower-end and relatively inexpensive professional GPU might compete with (or even eclipse the performance of) a top of the line consumer GPU such as a GTX 580 or an HD 6970. of course this is just a guess at this point b/c i really don't know the exact compute capabilities of the architectures behind either ATI's or nVidia's latest professional level GPUs.

in his Milkyway@Home - GPU performance statistics thread, RussianSensation was kind enough to point out to me that a single Cypress or Cayman streaming processor (from ATI's HD 58xx and HD 69xx series consumer GPUs) is capable of the following calculations per clock:

  • 4 32-bit FP MAD per clock
  • 2 64-bit FP MUL or ADD per clock
  • 1 64-bit FP MAD per clock
  • 4 24-bit Int MUL or ADD per clock
  • SFU : 1 32-bit FP MAD per clock
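as a sanity check, those per-SP rates can be turned into chip-level peak numbers. here's a minimal sketch (my assumptions, not from RussianSensation's post: a Cypress-based HD 5870 with 320 VLIW5 streaming processors at 850 MHz, and a MAD counted as 2 FLOPs):

```python
# Peak throughput from per-clock MAD rates (sanity-check sketch).
# Assumptions: HD 5870 -> 320 VLIW5 streaming processors, 850 MHz core clock,
# one MAD = 2 floating-point operations (multiply + add).

def peak_gflops(units, clock_ghz, mads_per_clock, flops_per_mad=2):
    """Theoretical peak throughput in GFLOPs."""
    return units * clock_ghz * mads_per_clock * flops_per_mad

UNITS, CLOCK_GHZ = 320, 0.850

# FP32: 4 MADs on the regular slots + 1 on the SFU slot = 5 per clock
fp32 = peak_gflops(UNITS, CLOCK_GHZ, 5)
# FP64: 1 MAD per clock
fp64 = peak_gflops(UNITS, CLOCK_GHZ, 1)

print(fp32, fp64, fp32 / fp64)
```

this lands on 2720 GFLOPs single precision and 544 GFLOPs double precision, i.e. the 1/5 ratio, which matches the HD 5870's advertised numbers.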
i would like to find similar information for nVidia's GTX 4xx and GTX 5xx lines of consumer GPUs, as well as for both ATI's and nVidia's latest professional GPU architectures, but i haven't had much luck finding it. even at AMD's and nVidia's own sites i couldn't manage to find that kind of detailed spec. it's like the most technical specs they make available to the public are things like memory capacity, memory bandwidth, etc., but nothing about single or double precision performance. might someone be able to point me in the right direction?

TIA,
Eric
 
I found this page where the V5900 is compared to the V5800. On the chart it shows the V5900 as having 0.158 teraflops under double precision while the V5800 has 0.221. The V7900 gets 0.464 and the V7800 0.403.

Edit:
I went to this Wikipedia page where it lists all the AMD GPUs. It lists both the V5900 and V7900 as running double precision at 1/4 of the single precision speed.
 
thanks, that's exactly what i'm looking for! the charts in the wiki link for ATI GPUs include double precision performance data. and while the nVidia GPU charts don't contain that information, they do provide the single precision performance in GFLOPs, from which i can calculate the double precision performance so long as i know the FP64:FP32 ratio.
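that arithmetic is simple enough to script. a quick sketch (my assumed example figures, not from the wiki charts: a GTX 580 with 512 CUDA cores at a 1544 MHz shader clock, an FMA counted as 2 FLOPs, and a 1:8 FP64:FP32 ratio):

```python
# Derive FP64 throughput from FP32 throughput and an FP64:FP32 ratio.
# Assumed example figures: GTX 580 -> 512 CUDA cores, 1544 MHz shader clock,
# one FMA = 2 floating-point operations.

def fp32_gflops(cores, shader_clock_ghz, flops_per_clock=2):
    """Theoretical peak single precision throughput in GFLOPs."""
    return cores * shader_clock_ghz * flops_per_clock

def fp64_gflops(fp32, ratio):
    """ratio is FP64:FP32, e.g. 1/8 for GeForce Fermi parts."""
    return fp32 * ratio

fp32 = fp32_gflops(512, 1.544)    # ~1581 GFLOPs single precision
fp64 = fp64_gflops(fp32, 1 / 8)   # ~198 GFLOPs double precision
print(round(fp32, 1), round(fp64, 1))
```

so roughly 1581 GFLOPs FP32 works out to about 198 GFLOPs FP64 at the 1:8 ratio.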

unfortunately, after further research i think it's becoming quite clear that professional GPUs are still too cost-prohibitive for individual home PC crunching rigs given the performance they yield. i guess my quest for something faster/more powerful for crunching started as i was reading through one of AnandTech's articles on the Fermi architecture for Tesla GPUs. toward the bottom of THIS page of the article, there is a chart that says the Fermi architecture is capable of 16 double precision operations per clock (as compared with the 2 double precision operations per clock that FP64-capable consumer GPUs can handle). when i saw this, i guess i got overly excited...but i'm still not sure what i'm not understanding about that chart.

both Wiki and nVidia's official site claim Fermi's FP64 performance to be 515 GFLOPs. after going through the math, this number seemed to make sense to me (if the GPU has a core clock of 1150 MHz, 448 SPs, and is capable of 2 FLOPs per clock with FP64 performance 1/2 that of its FP32 performance, then 1150 * 448 * 2 / 2 = 515,200 MFLOPs = ~515 GFLOPs). so then why does the article i linked above claim that the Fermi architecture can perform 16 double precision operations per clock? if it were truly capable of 16 FP64 operations per clock, then its FP64 performance would have to be 1/16 of its single precision performance in order to equal 515 GFLOPs. what am i missing here?
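for reference, here's the 515 GFLOPs math from above written out as a quick sketch (assuming a Tesla C2050/C2070-style part: 448 cores at a 1150 MHz shader clock, an FMA counted as 2 FLOPs, and FP64 at 1/2 the FP32 rate):

```python
# Tesla-class Fermi FP64 estimate: cores * clock * FLOPs-per-FMA * FP64 ratio.
CORES = 448
CLOCK_GHZ = 1.15      # 1150 MHz shader clock
FLOPS_PER_FMA = 2     # one fused multiply-add = 2 floating-point operations
FP64_RATIO = 1 / 2    # Tesla Fermi runs FP64 at half the FP32 rate

fp32 = CORES * CLOCK_GHZ * FLOPS_PER_FMA   # ~1030 GFLOPs single precision
fp64 = fp32 * FP64_RATIO                   # ~515 GFLOPs double precision
print(fp64)
```

which reproduces the ~515 GFLOPs figure from Wiki and nVidia's site.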
 
If you go to page 3 of the article you linked to, in the middle of the page he states that the Fermi architecture can run FP64 at half the speed of FP32. But when I pulled up the reviews of the GTX 580 & 590, both show that they run FP64 at 1/8 the FP32 rate.

Edit: The GTX 480 & 470 article (http://www.anandtech.com/show/2977/...tx-470-6-months-late-was-it-worth-the-wait-/6) states that nVidia limited the FP64 to 1/8 to separate it from Tesla. By looking at their newer cards, it looks like they are still limiting it.
 
yup, that's what i thought. if you look at the chart on page 1 of THIS article, you'll also see that the GTX 570 and 580 are limited to an FP64:FP32 ratio of 1:8 like the GTX 470 & 480. moreover, the double precision performance of the GTX 560 Ti and the GTX 460 is even worse, at a ratio of 1:12.

at any rate, i'm still confused as to why page 4 of THIS article says that Fermi is capable of 16 FP64 operations per clock, when both nVidia's Fermi-based consumer (GeForce) and professional (Tesla) GPUs only perform 2 FP64 operations per clock. maybe the article has a typo? maybe i'm misreading or misunderstanding something?
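one possible reading (just my guess, not confirmed by the article): the "16 FP64 operations per clock" figure may be per SM rather than per chip. a Fermi SM has 32 CUDA cores, and at the 1/2 FP64 rate that would be 16 FP64 FMAs per SM per clock; across the 14 SMs of a 448-core Tesla part, the numbers line up with 515 GFLOPs:

```python
# Sanity check, assuming the article's "16 FP64 ops per clock" is per SM.
# A Fermi SM has 32 CUDA cores; at the 1/2 FP64 rate that is 16 FP64 FMAs
# per SM per clock. A 448-core Tesla part has 14 SMs at 1150 MHz.
SMS = 14
FP64_FMA_PER_SM_PER_CLOCK = 16
FLOPS_PER_FMA = 2
CLOCK_GHZ = 1.15

fp64 = SMS * FP64_FMA_PER_SM_PER_CLOCK * FLOPS_PER_FMA * CLOCK_GHZ
print(fp64)   # same ~515 GFLOPs as the chip-wide calculation
```

so under that assumption there's no contradiction: per SM it's 16 ops per clock, while chip-wide the FP64 rate is still half the FP32 rate.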
 