Milkyway@Home - GPU performance statistics


Sunny129

Diamond Member
Nov 14, 2000
^ thanks for that info! i decided to research it a bit further, and there is a thread on the MW@H message boards dedicated to MW@H GPU requirements. it lists the following graphics cards as capable of double-precision floating point arithmetic, and therefore capable of running MW@H:

NVIDIA:
Compute Capability 1.3 and Above
Example Products:
GeForce GTX 295
GeForce GTX 285
GeForce GTX 280
GeForce GTX 275 (credits to Bruce)
GeForce GTX 260
Tesla S1070
Tesla C1060
Quadro Plex 2200 D2
Quadro FX 5800
Quadro FX 4800
(Based on GT200 GPU)

ATI:
Example Products:
ATI Radeon HD 6970
ATI Radeon HD 6950
ATI Radeon HD 5970 (credits to kashi)
ATI Radeon HD 5870
ATI Radeon HD 5850
ATI Radeon HD 4890
ATI Radeon HD 4870
ATI Radeon HD 4850
ATI Radeon HD 4770
ATI Radeon HD 4830
ATI Radeon HD 38x0
ATI Firestream 9270
ATI Firestream 9250
ATI Firestream 9170
the thread was started on 1/19/10 and edited/updated on 1/21/10, but not since then, so obviously this list is going to be missing some newer cards that are also capable of running MW@H. also, for more information - and a good read in general - the entire thread can be found HERE.

i also found a link in that thread to an article (click HERE) at geeks3D.com entitled "Radeon HD 5770 Has No Double Precision Floating Point Support." if you scroll down slightly, you'll find a fairly current chart of ATI's/AMD's GPU lineup specifying which cards are double precision capable and which cards aren't (i'm not sure how current the list is b/c 1) the fastest card on the list is a 5870, and 2) i can't seem to find a publication date for the article).

i don't know if this kind of information already exists here @ AT, but i was too lazy to search...and besides, this is probably good info for this thread despite being slightly OT. i figure it helps to know which cards are compatible w/ MW@H before one purchases a particular card and tries to document WU times with it.
 

RussianSensation

Elite Member
Sep 5, 2003
i don't know if this kind of information already exists here @ AT, but i was too lazy to search...and besides, this is probably good info for this thread despite being slightly OT. i figure it helps to know which cards are compatible w/ MW@H before one purchases a particular card and tries to document WU times with it.

Definitely not off topic. Having a list of compatible double precision cards will help those who want to contribute to this project :)

BTW, I recently sold my GTX470 and purchased an HD6950 which I plan to unlock into a 6970. I will update the chart with results as soon as I can (likely next week).

HD6950/6970 series now deliver double-precision throughput equal to 1/4 of their single-precision rate, vs. 1/5 for the HD5850/5870 series. Therefore, I see HD6950/6970 cards finally overtaking the 5870. Unfortunately, the HD6850/6870 series do not have FP64 (double-precision) support.

Double-Precision Performance:

*GTX285 = 88.5 GFlops
*GTX480 = 169 GFlops (the chart in the *Source has a mistake: 1.35 TFlops SP * 1/8 for Fermi ~ 169 GFlops DP)
*5870 = 544 GigaFLOPS
(*Source)

**6950 = 563 GFlops
**6970 = 675 Gflops
(**Source)
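The DP figures above come from multiplying peak single-precision throughput by each architecture's DP:SP ratio. A minimal Python sketch of that arithmetic, assuming the SP peaks and ratios quoted in this post (the function name `dp_gflops` is just for illustration):

```python
def dp_gflops(sp_gflops: float, dp_ratio: float) -> float:
    """Peak double-precision GFLOPS, taken as a fixed fraction of peak single-precision GFLOPS."""
    return sp_gflops * dp_ratio


# (card, peak SP GFLOPS, DP:SP ratio) as quoted in this thread
cards = [
    ("HD5870", 2720.0, 1 / 5),
    ("GTX480", 1350.0, 1 / 8),  # ~1.35 TFLOPS SP, 1/8 DP rate for Fermi per the post
]

for name, sp, ratio in cards:
    print(f"{name}: {dp_gflops(sp, ratio):.0f} GFLOPS DP")
```

These are theoretical peaks only; actual MW@H run times also depend on drivers, the app, and how busy the GPU is kept.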
 

Sunny129

Diamond Member
Nov 14, 2000
i don't know how specific you're looking to get with crunch times, but my 5870 is finishing MW@H tasks in approx. 1:27...which makes sense since Dajeepster's 5870 is doing them in 1:30...i don't know if he rounded off to 30 seconds or if his WU's were actually completing in 1:30 on the dime though...
 

petrusbroder

Elite Member
Nov 28, 2004
i also found a link in that thread to an article (click HERE) at geeks3D.com entitled "Radeon HD 5770 Has No Double Precision Floating Point Support." if you scroll down slightly, you'll find a fairly current chart of ATI's/AMD's GPU lineup specifying which cards are double precision capable and which cards aren't (i'm not sure how current the list is b/c 1) the fastest card on the list is a 5870, and 2) i can't seem to find a publication date for the article).

The article @ Geeks.3D is from October 2009 ... a bit old, but still very informative!
Here are some more data for the table above:

HD6970 ... double precision .... 683 GFLOPS (as mentioned above)
HD6950 ... double precision .... 562 GFLOPS (as mentioned above)

HD6870 ... single precision only ... not usable for MilkyWay@Home
HD6850 ... single precision only ... not usable for MilkyWay@Home

Source: www.ati.com
 

Sunny129

Diamond Member
Nov 14, 2000
thanks for the additions Petrus...i've updated my list above w/ the HD 6xxx GPUs.
 

Assimilator1

Elite Member
Nov 4, 1999
i don't know how specific you're looking to get with crunch times, but my 5870 is finishing MW@H tasks in approx. 1:27...which makes sense since Dajeepster's 5870 is doing them in 1:30...i don't know if he rounded off to 30 seconds or if his WU's were actually completing in 1:30 on the dime though...
As I was saying earlier in this thread, WU times are meaningless unless you quote the granted credit you got for them too :).
 

Sunny129

Diamond Member
Nov 14, 2000
As I was saying earlier in this thread, WU times are meaningless unless you quote the granted credit you got for them too :).
oops...forgot about that, despite you mentioning this to me on more than one occasion :oops:. i don't have any MW@H GPU tasks going right now, so i'll have to download some work when i get home later today, let a few tasks run to completion, and keep a close eye on my task status at the MW@H website since valid results only stay on the screen for a few minutes before disappearing. granted, i know that my taking it this far does the community little good unless everyone else reporting run times also reports their earned credit...otherwise we won't know for sure if anyone comparing their run times to mine is actually getting an "apples to apples" comparison. but we gotta start somewhere...

thanks for reminding me again about this important aspect of comparing run times...
 

Sunny129

Diamond Member
Nov 14, 2000
OK,

i ran 17 MW@H GPU tasks on my 5870 when i got home from work today. the first task ran for 1:26, the 2nd ran for 1:27, and the remaining 15 ran for 1:28. three of them are still pending, while the other 14 earned 213.76 credits each. unfortunately, because valid results disappear from the MW@H server in a matter of minutes (if that), i have no links to the valid WU's. i do however have links to the 3 that are still pending...this way you at least know what kind of WU's i was crunching:

de_separation_11_3s_fix_3_1263533_1299965200_0

de_separation_11_3s_fix_3_1263529_1299965200_0

de_separation_11_3s_fix_3_1263518_1299965200_0

they were all consecutive tasks, i.e. 1263517 through 1263533, so they were all of the same type. hopefully others will follow suit from here on out and post the credit they earned for the specific type of WU they crunched.

*EDIT* - the above links are now dead b/c the "data-driven web pages" server only holds that information for a short period of time.
 

Assimilator1

Elite Member
Nov 4, 1999
Lol, no probs :)

Yea MW is even more of a PITA to benchmark than SETI, your reporting the granted credit is plenty good enough anyway :).

Talking of WU times+credit, I should report some numbers!:$
 

Sunny129

Diamond Member
Nov 14, 2000
should i be reporting the initial "claimed" credit as well? i can go back and do that if it'll help any.
 

RussianSensation

Elite Member
Sep 5, 2003
i don't know how specific you're looking to get with crunch times, but my 5870 is finishing MW@H tasks in approx. 1:27...which makes sense since Dajeepster's 5870 is doing them in 1:30...i don't know if he rounded off to 30 seconds or if his WU's were actually completing in 1:30 on the dime though...

I am not too concerned about exact times simply because I doubt it is realistic given the sample size. I will specify a range based on different observations provided in the thread (and discount large outliers). I wanted to start this thread to get a better idea of GPU rankings for this particular application since it takes full advantage of double precision performance. Of course if we had 10,000 data points for an HD5870, for example, I would be much more confident in the exact time. :D

My stock HD6950 completes a unit in 1 min 26-27 seconds as well.
 

Assimilator1

Elite Member
Nov 4, 1999
Like I mentioned earlier, WU times are meaningless unless you quote them with the granted credit. It's not even worth a rough figure without those numbers.

Where granted credit numbers are included, the times become consistent & comparable wherever the credit number is the same or at least close (but getting those numbers itself can be a PITA! ;)).

I have wondered whether dividing the credit by the time, to get credits per second, would let you compare all WUs where you have both the granted credit & the time, however I haven't looked into it yet. Would need lots of WU times with granted credit to work that out.
 

RussianSensation

Elite Member
Sep 5, 2003
Like I mentioned earlier, WU times are meaningless unless you quote them with the granted credit. It's not even worth a rough figure without those numbers.

We have been recording the 212-213 credit work unit, though. I believe that is the smallest work unit in this project. I will specify the exact size once the BOINC servers are up. As far as I am aware this is also the fastest work unit, so when people report their fastest times, they are all targeting the same work unit. As a result, there should be no ambiguity in the chart.

I have other work units which compute in 1 min 38 seconds and 1 min 49 seconds. Post #6 specifies that work unit times are to be collected for a work unit worth ~213 credits. I should have been clearer in the original post. Thanks for pointing this out. :thumbsup:
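The post #6 rule (only compare times for ~213-credit units) amounts to filtering results by granted credit before taking the fastest time. A small Python sketch; the sample records and the 1-credit tolerance are made up for illustration, not taken from anyone's results:

```python
from typing import Optional


def fastest_for_credit(results: list[tuple[int, float]],
                       target: float = 213.76,
                       tol: float = 1.0) -> Optional[int]:
    """Shortest run time (s) among results whose granted credit is within tol of target."""
    matching = [seconds for seconds, credit in results if abs(credit - target) <= tol]
    return min(matching) if matching else None


# Hypothetical (run time in seconds, granted credit) records
results = [(98, 267.20), (86, 213.76), (109, 267.20), (87, 213.76)]
print(fastest_for_credit(results))  # 86
```

Returning `None` when nothing matches keeps a card with no ~213-credit results out of the chart instead of silently reporting a time for a different WU type.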
 

Assimilator1

Elite Member
Nov 4, 1999
Ah that's cool then :), if you're 100% sure they are the smallest WUs ;).
Re post 6, doh! I missed that.

Finally got a time with a credit number now for my HD 4830, 307s for a 213.76 credit WU.
Oh & it's slightly overclocked now, with the GPU at 600 MHz.
 

Sunny129

Diamond Member
Nov 14, 2000
i noticed some odd MW@H task completion times and credit assessments this morning before i left for work. first, let me start by saying that up until this morning, i had only been aware of two specific types of MW@H "de separation" tasks - ones that award ~213 credits (which run for ~88 seconds on my HD 5870), and ones that award ~267 credits (which run for ~108 seconds on my HD 5870). for the purpose of this thread i obviously reported the tasks that took ~88 seconds to complete, since [up until now] they had the shortest run times of all the MW@H tasks i would receive (and were presumably earning the same ~213 credits that everyone else was earning with their fastest-completing MW@H tasks).

however, this morning i noticed some run times that seemed pretty randomly distributed. for instance, some of the run times i noticed were 72, 75, 76, 77, and 97 seconds. while none of these run times approximate the 88-sec and 108-sec run times i'm used to seeing, the 75, 76, and 77-sec tasks each earned ~213 credits (just like an 88-sec task would have earned), the 97-sec task earned ~267 credits (just like a 108-sec task would have earned), and the 72-sec task earned an uncommon ~241 credits.

so i'm hesitant to say that the 72-sec task is a valid result for the MW@H GPU performance chart on page 1 of this thread b/c it earned more than the typical ~213 credits...but it is without a doubt the shortest run time i've seen from a MW@H task on my HD 5870 GPU. the 75, 76, and 77-sec tasks on the other hand did earn the typical ~213 credits that the run times in the GPU performance chart are supposed to be based off of. and so i suppose the shortest run time based on the appropriate amount of awarded credit (~213) with my HD 5870 is now 75-sec (instead of 88-sec previously). i don't know why these tasks took approx. 10 seconds less to finish than the typical ~213-credit tasks take on my 5870. likewise, i don't know why the 97-sec task took approx. 10 seconds less to finish than the typical ~267-credit task.

anyone have a possible explanation for this?
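Assimilator1's credits-per-second idea from earlier in the thread can be applied directly to these run times, which puts tasks with different credit values on one scale. A minimal Python sketch using the HD 5870 figures from this post; it assumes credit is proportional to the work in a task, which nobody here has established:

```python
# (run time in seconds, granted credit) for the HD 5870 tasks described above.
# The 241-credit task was reported only as "~241", so its rate is approximate.
tasks = [(88, 213.76), (108, 267.20), (75, 213.76), (97, 267.20), (72, 241.0)]


def credits_per_second(seconds: float, credit: float) -> float:
    """Granted credit divided by run time."""
    return credit / seconds


for seconds, credit in tasks:
    rate = credits_per_second(seconds, credit)
    print(f"{credit:7.2f} credits in {seconds:3d} s -> {rate:.2f} credits/s")
```

On this scale the odd 72-second task can be compared with the ~213 and ~267 credit units instead of being set aside.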
 

RussianSensation

Elite Member
Sep 5, 2003
however, this morning i noticed some run times that seemed pretty randomly distributed. for instance, some of the run times i noticed were 72, 75, 76, 77, and 97 seconds. while none of these run times approximate the 88-sec and 108-sec run times i'm used to seeing, the 75, 76, and 77-sec tasks each earned ~213 credits (just like an 88-sec task would have earned), the 97-sec task earned ~267 credits (just like a 108-sec task would have earned), and the 72-sec task earned an uncommon ~241 credits.

Same for me on a 6950. I have also noticed my GPU usage in MSI Afterburner fluctuate from 82-88% to 94-97% depending on the unit. Certain 213 units go all out at 94-97% while other units are only utilizing 82-88% of my GPU for the same granted credit. :hmm:

[Image: screenshot of 213.76-credit results]


The above credits were earned with an HD6950 unlocked to 1536 shaders @ 840mhz. Yet, the card is still slower than your stock HD5870. Strange considering it has more double precision performance at that point. Perhaps the drivers aren't optimized enough for VLIW-4 architecture yet.
 

Sunny129

Diamond Member
Nov 14, 2000
Same for me on a 6950. I have also noticed my GPU usage in MSI Afterburner fluctuate from 82-88% to 94-97% depending on the unit. Certain 213 units go all out at 94-97% while other units are only utilizing 82-88% of my GPU for the same granted credit. :hmm:
yes i get that GPU load fluctuation too...i'm guessing the reason for that though lies in the nature of the structure of the data itself, hence forcing the GPU to use more transistors simultaneously at some times more than others during the crunching process, and thus causing GPU load to fluctuate to some degree.


The above credits were earned with an HD6950 unlocked to 1536 shaders @ 840mhz. Yet, the card is still slower than your stock HD5870. Strange considering it has more double precision performance at that point. Perhaps the drivers aren't optimized enough for VLIW-4 architecture yet.
actually, strictly from a "shader clock X shader count" perspective, your 840MHz X 1536 shaders = 1,290,240 is less than my 850MHz X 1600 shaders = 1,360,000. even a 6970 @ the stock 880MHz doesn't have as much compute power as the 5870 in that sense. but then there's also what you mentioned to be considered - the VLIW-4 architecture of the 69xx series GPUs, which has been shown to be more efficient than my GPU's VLIW-5 architecture. so maybe there's some sense in the fact that your run times are slower (but just barely) than mine.
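The "shader clock X shader count" comparison above is plain multiplication. A quick Python check of those products; note this proxy ignores the per-clock flop count and each architecture's DP:SP ratio, so it only ranks raw shader throughput:

```python
def shader_clock_product(clock_mhz: int, shaders: int) -> int:
    """Crude throughput proxy: shader clock (MHz) x shader count."""
    return clock_mhz * shaders


cards = {
    "HD6950 unlocked, 1536 shaders @ 840MHz": (840, 1536),
    "HD5870, 1600 shaders @ 850MHz": (850, 1600),
    "HD6970, 1536 shaders @ 880MHz": (880, 1536),
}

for name, (mhz, shaders) in cards.items():
    print(f"{name}: {shader_clock_product(mhz, shaders):,}")
```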

but in all fairness, i only noticed a few of those seventy-something second tasks this morning, and never before. that is not to say that these tasks don't ever slip by when i'm not in front of the computer, but for the most part i typically see ~88 second and ~108 second MW@H tasks worth ~213.76 and ~ 267.20 respectively.

at any rate, you may also be right that something isn't optimized for the VLIW-4 architecture as well as it could be...i'm not completely sure it's the drivers, despite the fact that the Catalyst 11.2 drivers (the latest official release) are known to cause issues for folks crunching SETI@Home AP v5.06 and S@H MB 6.10 tasks on their ATI GPUs...and i don't know if the 11.4 beta drivers fix those problems or not. it could just as well be that the MW@H GPU code has yet to be optimized for the VLIW-4 architecture. i don't know if MW@H started out specifically with GPU apps, but SETI@Home tasks were not meant to be run on an ATI GPU, and so volunteer coders and developers have had to write the code and optimize it on their own time. the issues i mentioned above were common to the 6xxx series cards and the 4xxx series cards. if you scour the SETI@Home forums, you'll find that their anonymous platform GPU apps are a bit more optimized for the 5xxx series cards. i know it seems like i went off on a S@H tangent, but my point is that, assuming MW@H's GPU code was developed similarly to S@H's GPU code, then perhaps MW@H is susceptible to the same symptoms?...all of this speculation and i don't even know if you're using the 11.2 drivers lol.
 

RussianSensation

Elite Member
Sep 5, 2003
yes i get that GPU load fluctuation too...i'm guessing the reason for that though lies in the nature of the structure of the data itself, hence forcing the GPU to use more transistors simultaneously at some times more than others during the crunching process, and thus causing GPU load to fluctuate to some degree.

So I found a strange "quirk" on my system. When I simply open Windows Media Player, my GPU usage goes to 99% for MilkyWay@Home, even when WMP is minimized in the Start Toolbar and not playing any music/video. I tried this last night and I got an instant bump from 180,000 credits per day to 215,000 credits. With every single unit in the last 24 hours it has crunched it at 99% flat out. :wub:

[Image: MilkyWay@Home GPU usage with Windows Media Player open]


Try this out and let me know if your performance improves.
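Per-task run time and credits per day are linked by simple arithmetic, which is why a few percent more GPU usage shows up so clearly in the daily total. A Python sketch, assuming one task at a time running back to back with no idle time or pending units (real machines won't quite hit this ceiling):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def credits_per_day(seconds_per_task: float, credit_per_task: float) -> float:
    """Ceiling on daily credit: tasks run back to back, one at a time, with no idle time."""
    return SECONDS_PER_DAY / seconds_per_task * credit_per_task


# e.g. a 213.76-credit unit finishing in 86 s
print(round(credits_per_day(86, 213.76)))  # 214754
```

By the same formula, an 88 s unit gives about 209,900 credits/day, so shaving two seconds per unit is worth roughly 5,000 credits a day.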

actually, strictly from a "shader clock X shader count" perspective, your 840MHz X 1536 shaders = 1,290,240 is less than my 850MHz X 1600 shaders = 1,360,000. even a 6970 @ the stock 880MHz doesn't have as much compute power as the 5870 in that sense.

Right, but the double precision (FP64) performance of the HD5870 is limited to 1/5th of its single precision. On HD6950/6970 series, the double precision rate is limited to 1/4th of the single precision rate.

HD5870 @ 850mhz x 1600 x 2 = 2720 GFlops SP * 1/5 ==> 544 GFlops DP
HD6970 @ 840mhz x 1536 x 2 = 2580 GFlops SP * 1/4 ==> 645 GFlops DP

all of this speculation and i don't even know if you're using the 11.2 drivers lol.

Yes, currently using the 11.2s.
 

Sunny129

Diamond Member
Nov 14, 2000
So I found a strange "quirk" on my system. When I simply open Windows Media Player, my GPU usage goes to 99% for MilkyWay@Home, even when WMP is minimized in the Start Toolbar and not playing any music/video. I tried this last night and I got an instant bump from 180,000 credits per day to 215,000 credits. With every single unit in the last 24 hours it has crunched it at 99% flat out. :wub:
odd...i'll give it a try and let you know if anything changes. btw, you might want to turn your GPU fan up a bit. i know the GPU can handle those temps, but if you want to prolong the life of your card, i'd try to keep temps at or below 60°C at all times. i have to keep the fan on my 5870 @ 60% in order to keep temps below 60°C while crunching MW@H. it does sound like a small vacuum, but fortunately it's in the home office rig, so the noise doesn't bother me.

*EDIT* - i just realized that the variation in GPU load you're getting w/ MW@H is something i only experience with S@H. when i crunch MW@H, i don't have that problem...entirely. what i mean is, GPU load sits at 99% for most of each MW@H task i crunch, although it does drop as low as 95% momentarily a few times over the duration of the task...still, that doesn't sound nearly as bad as your 82-88% and 94-97% GPU load situations. see my MSI Afterburner graph below for a better picture of what's going on w/ my GPU load:
[Image: MSI Afterburner GPU load graph]



Right, but the double precision (FP64) performance of the HD5870 is limited to 1/5th of its single precision. On HD6950/6970 series, the double precision rate is limited to 1/4th of the single precision rate.

HD5870 @ 850mhz x 1600 = 2720 GFlops SP * 1/5 ==> 544 GFlops DP
HD6970 @ 840mhz x 1536 = 2580 GFlops SP * 1/4 ==> 645 GFlops DP
could you elaborate on the math a bit more? how did you actually calculate the GFlop values of 2720 and 2580? i understand that double precision operations are slightly faster on the 69xx cards than they are on the 58xx cards. in other words i see that (2720)*(1/5) = 544, and that (2580)*(1/4) = 645. but i'm unclear on how to calculate Gflops.
 

RussianSensation

Elite Member
Sep 5, 2003
odd...i'll give it a try and let you know if anything changes. btw, you might want to turn your GPU fan up a bit. i know the GPU can handle those temps, but if you want to prolong the life of your card, i'd try to keep temps at or below 60°C at all times. i have to keep the fan on my 5870 @ 60% in order to keep temps below 60°C while crunching MW@H. it does sound like a small vacuum, but fortunately its in the home office rig, so the noise doesn't bother me.

I have actually operated my GTX470 at 76-77°C crunching SETI@Home since June 15, 2010. Before that, my HD4890 crunched Milkyway 24/7 at 81-82°C for 1.5 years. So I am very confident that these GPUs can easily function for 5+ years 24/7 at 80°C+. Therefore, I feel that my 71-72°C temps are fairly safe. Thanks for the suggestion though.

As a side note, I have crunched SETI@Home on a Q6600 @ 3.4GHz at 65-66°C for 2 years and on a Core i7 860 @ 3.9GHz at 60-65°C for 2 years. I know that GPUs are rated for much higher temperatures than CPUs. I believe Fermi (GF100) is even rated up to 105°C.

could you elaborate on the math a bit more? but i'm unclear on how to calculate Gflops.

A single Cypress or Cayman SP can do in a single clock cycle:
  • 4 32-bit FP MAD per clock
  • 2 64-bit FP MUL or ADD per clock
  • 1 64-bit FP MAD per clock
  • 4 24-bit Int MUL or ADD per clock
  • SFU : 1 32-bit FP MAD per clock
http://www.anandtech.com/show/2841/5

HD4890, HD5870, HD6970 (and their slower cousins) can each do 2x 64-bit Floating Point Multiply (MUL) or Addition (ADD) instructions per clock.

So for example,

HD4890 = 800 Execution Units * 2 flops/unit (for a multiply-add) * 850mhz GPU clock speed = 1,360,000 MegaFLOPs = 1,360 GigaFLOPs = 1.36 TeraFLOPs

HD5870 = 1600 Execution Units * 2 flops/unit (for a multiply-add) * 850mhz GPU clock speed = 2.72 TFLOPs

HD6970 = 1536 Execution Units * 2 flops/unit (for a multiply-add) * 880mhz GPU clock speed = 2.70 TFLOPs

The calculation is different for NV because some of their units can do 4 instructions per clock. Anandtech did a quick calculation, 2nd last paragraph, at the bottom of this page.
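The same arithmetic as a small Python sketch, covering the AMD cards in this post plus the unlocked HD6950 from earlier. It only applies to these AMD parts (NVIDIA needs a different per-clock count, as noted above); the DP ratios are the 1/5 and 1/4 figures quoted in this thread, and no ratio is assumed for the HD4890:

```python
from typing import Optional


def peak_sp_gflops(units: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Units x flops per clock (2 for a multiply-add) x clock in MHz = MFLOPS; /1000 = GFLOPS."""
    return units * flops_per_clock * clock_mhz / 1000


# (card, execution units, clock in MHz, DP:SP ratio, or None if not given in this thread)
cards: list[tuple[str, int, int, Optional[float]]] = [
    ("HD4890", 800, 850, None),
    ("HD5870", 1600, 850, 1 / 5),
    ("HD6970", 1536, 880, 1 / 4),
    ("HD6950 unlocked @ 840MHz", 1536, 840, 1 / 4),
]

for name, units, mhz, dp_ratio in cards:
    sp = peak_sp_gflops(units, mhz)
    line = f"{name}: {sp:,.0f} GFLOPS SP ({sp / 1000:.2f} TFLOPS)"
    if dp_ratio is not None:
        line += f", {sp * dp_ratio:,.0f} GFLOPS DP"
    print(line)
```

Keeping MFLOPS, GFLOPS and TFLOPS as separate steps (divide by 1000 each time) avoids the unit slip that is easy to make when doing this by hand.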
 

sandorski

No Lifer
Oct 10, 1999
XFX 5870 at Stock Clocks, anywhere from just over 1 minute to just under 2 minutes, with the majority around 1:30, depending on the WU.
 

Ovven

Member
Feb 13, 2005
If you want to lower your temps in M@H, downclock your RAM. My 4870's RAM is running at 175MHz instead of the default 900MHz with no effect on completion times. However, this doesn't work on Collatz projects, because those need a lot of memory bandwidth, unlike M@H.