Milkyway@Home - GPU & CPU performance stats wanted, any capable h/w, old or new!

Page 5 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
Uh-oh, I think the benchmark WU now has a problem! Seems you were right after all Sunny! Don't know why you got hit so early though :confused:
glad to know that i'm not going crazy!

BTW, i wonder how Mikey on the Milkyway@Home forums is getting an average run time of 83s for his supposedly stock 7970...after all, he's on the same OS as i am. i did note that the ATI driver version on one of his 7970 hosts is 1.4.1848, which corresponds to Cat 13.9 or 13.12...again, i'm only running Cat 13.4, so maybe that has something to do with it.
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
Possibly, although I am wondering when these faster WUs 1st started to trickle out.......

Btw 3 of his 7900 rigs are using .1848 drivers. Looking at his 1st client (537489) I saw there are still several results from the GPU there; 3 are 213.76 WUs (~83s) reported on 7/8th Feb, so that's long before the quick ones showed up, I think!
485062, using .1471 drivers, is also showing 4 213.76s at ~83s

When did you 1st see the 60s 213.76's?
 
Last edited:

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
When did you 1st see the 60s 213.76's?
yesterday, around 1:30:30 UTC, not long before i edited post #99 in this thread to ask about it.

btw, i've been experimenting here to see if we can get better consistency out of another type of MW@H GPU task. with only the Milkyway@Home application selected (and the Separation and Modified Fit applications deselected), i receive only tasks that are worth 159.86 points...and sure enough, they all result in extremely consistent run times with hardly any variance at all. every task i've crunched since approx 23:23:00 UTC has consistently taken 45-46s to complete. perhaps we should use the 159.86-point tasks for benchmarking purposes instead?..although i would recommend that some folks keep an eye on their 159.86-point tasks over the course of a few days first, just to make sure these tasks don't intermittently see the same kind of behavior that the 213.76-point tasks have seen recently...
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
Well prior to the new WUs coming out most people were getting very consistent times with the 213.76 WUs, also it would be a pain to get everyone to post new times. Besides, there's such a large difference between the std & the fast WUs that there's no real chance of getting them mixed up.
However if they were to add a 3rd slightly different speed 213.76 WU then I would change it, also it's possible they could add different speed WUs to the other credit WUs.

Aren't your 'std' speed 213.76 WUs a consistent time?

I just looked at a load of WUs on your 496426 rig, & the times are all over the place. Were you experimenting with different drivers or clock speeds?


Oh & new time about to be added for Leonhearts 7950.
And some CPU times finally added :).

Tempted to see what my CPU will do them in ;).
Temporarily switched my CPU to MW from Asteroids@H & instantly a 4C drop in temps, lol.
BOINC reckons 2hrs 50mins for the current WUs, which if right, at 10,200s, would be right at the top of this little table!
 
Last edited:

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,208
3,768
75
Hm, "Linux" needs to go by my entry.
 

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
Aren't your 'std' speed 213.76 WUs a consistent time?
well yes, i consistently get either ~60s run times or ~90s run times with the 213.76-point tasks...so i suppose we could continue to use these tasks as the benchmark so long as we're careful not to mix the longer-running and shorter-running tasks into the same averages.

I just looked at a load of WUs on your 496426 rig, & the times are all over the place. Were you experimenting with different drivers or clock speeds
hahaha no, i went back to crunching 2 at a time on each GPU...that, and i enabled more than just the Modified Fit tasks...if i have to do more benchmarking, i'll go back to running just the Modified Fit tasks, and i'll go back to running one at a time of course...
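The "don't mix the two populations into one average" rule above can be sketched in a few lines; the 75s cutoff and the function name are my own illustrative choices (splitting the ~60s and ~90s clusters), not anything from the project:

```python
# Sketch: split 213.76-point task times into the ~60s and ~90s
# populations before averaging, so the benchmark isn't skewed.
# The 75s cutoff is an assumption based on the two clusters above.

def split_and_average(times, cutoff=75.0):
    short = [t for t in times if t < cutoff]
    long_ = [t for t in times if t >= cutoff]
    avg = lambda xs: sum(xs) / len(xs) if xs else None
    return avg(short), avg(long_)

times = [61, 59, 60, 91, 89, 90, 62, 88]  # made-up sample run times
short_avg, long_avg = split_and_average(times)
print(short_avg, long_avg)  # 60.5 89.5
```

Averaging each group separately keeps a stray fast WU from dragging down a benchmark average of the standard-length ones.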
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
Ok so 90s is your time for the std (long) 213.76 WUs :thumbsup:
Re times, ahhh ok, crunching 2 at a time will do it ;)

I wonder if the long & short 213.76 WU times apply to CPUs too??
I've not crunched enough to find out atm; so far only 2 validated, 12,747s & 12,717s, & 1 not yet validated at 12,714s

Hmm, I chose a bad time to do this, BOINC managed to grab 4 LHC WUs earlier; they're on hold atm. I'd like LHC to have 1 core but I find the resource division never seems to work properly!

Oh btw, I've u/ced my VRAM to 600 MHz (from 1GHz) which cut power usage by 11w (from the wall) & seems to have only added ~1s to 213.76 WU times. GPU @850 MHz & getting 211s so far, with nearly the same power usage as def. 725/1000 :).
I use AMD GPU Tool to exceed the driver's clocking range, I think it works for 6900s too but not sure about 7900s.

Ken g6
Ok thx.
 
Last edited:

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
you should probably note next to my entry that i'm at 800MHz VRAM. i was at the stock 1375MHz while benchmarking, but i get the same run times at 800MHz VRAM...
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
You get identical run times with VRAM @800MHz?
My 5850's run times dropped by a whole 1s going from 1000 to 600 MHz ;)

Btw have you checked out AMD GPU Tool to see if you can downclock your RAM further?


Trying to dig out my CPU time atm, rather hard among the sea of GPU times!
So in addition to the earlier 12,747s & 12,717s WUs, I've got 12,398, 12,395, 12,395, 12,718, 12,407 & 12,479. I've also got 2 unvalidated WUs @ 12,476 & 12,473s.
So averaging the validated ones gives 12,532s.

Oh & the answer to my question of whether the CPUs get long & short 213.76 WUs is yes; the short ones (I only saw 3) were in the mid-5000s area.
 
Last edited:

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
You get identical run times with VRAM @800MHz?
My 5850's run times dropped by a whole 1s going from 1000 to 600 MHz ;)

Btw have you checked out AMD GPU Tool to see if you can downclock your RAM further?


Trying to dig out my CPU time atm, rather hard among the sea of GPU times!
yes, identical run times, whether VRAM is at 1375MHz or 800MHz. you have to keep in mind that VRAM throughput is a product of both clock speed AND bus width. so while MW@H hardly requires any VRAM throughput, it may require just enough that a 5850 could become bottlenecked once the VRAM is underclocked past a certain lower limit. a 7970 on the other hand still has plenty of VRAM throughput even at 800MHz, so it doesn't become a bottleneck and start increasing run times. to put things into perspective, here are their comparative theoretical VRAM throughputs:

*EDIT* - now that i think about it, i can't say for sure that the run times are identical at either VRAM clock...not based on a sample size of 5. now if i averaged say 50 tasks at 1375MHz VRAM and 50 tasks at 800MHz VRAM, then i might have a chance at seeing a difference in average run times...of course there's the chance that the difference could come out smaller than either average's own margin of error, making it impossible to tell if VRAM clock makes a difference in MW@H GPU task run times. one thing's for sure - if it does in fact make a difference, it is not substantial, and the marginal amount of additional crunching one might get from a high VRAM clock, even if it adds up over time, isn't worth the extra heat generated and electricity consumed (especially if you pay for your own electricity).

5850: 1,000,000,000 clocks per second X 4 transfers per clock (quad-pumped) X 256-bit bus width = 1,024,000,000,000 bits per second, or 128GB/s

7970: 1,375,000,000 clocks per second X 4 transfers per clock (quad-pumped) X 384-bit bus width = 2,112,000,000,000 bits per second, or 264GB/s

...so the 7970 already has roughly twice the VRAM throughput of the 5850 at stock clocks...i would have to underclock my VRAM to ~667MHz (if i could) just to match the 5850's 128GB/s at stock clocks. so somewhere south of 128GB/s there must be a throughput rate that starts to hinder the efficiency of MW@H task computations.
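The arithmetic above is just clock x transfers-per-clock x bus width; a quick sketch (the function name is illustrative, and GB/s here means 8e9 bits/s to match the figures in the post):

```python
# Theoretical GDDR5 bandwidth: clock (Hz) x 4 transfers/clock x bus width (bits),
# converted to GB/s.

def vram_bandwidth_gbs(clock_mhz, bus_bits, pump=4):
    bits_per_sec = clock_mhz * 1e6 * pump * bus_bits
    return bits_per_sec / 8e9

print(vram_bandwidth_gbs(1000, 256))  # 5850 at stock: 128.0
print(vram_bandwidth_gbs(1375, 384))  # 7970 at stock: 264.0
print(vram_bandwidth_gbs(800, 384))   # 7970 underclocked: 153.6
```

Even underclocked to 800MHz the 7970 keeps ~154GB/s, still well above the 5850's stock 128GB/s, which is consistent with the run times not moving.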


...and yes, i tried AMD GPU Tool to see if i could get my VRAM timings even lower, but i didn't have much luck...in fact, AMD GPU Tool won't even detect my GPUs...:confused:
 
Last edited:

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
Sapphire 7970, core @ 1000MHz (OCed), VRAM @ 1375MHz (factory/reference), BOINC v7.2.28, Catalyst 13.4, Win7 x64: ~85s

Sapphire 7970, core @ 1050MHz (OCed), VRAM @ 1375MHz (factory/reference), BOINC v7.2.28, Catalyst 13.4, Win7 x64: ~81s

Sapphire 7970, core @ 1100MHz (OCed), VRAM @ 1375MHz (factory/reference), BOINC v7.2.28, Catalyst 13.4, Win7 x64: ~77s

Sapphire 7970, core @ 1150MHz (OCed), VRAM @ 1375MHz (factory/reference), BOINC v7.2.28, Catalyst 13.4, Win7 x64: ~75s

...you can see the law of diminishing returns start to kick in above 1100MHz...
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
1000 > 1050 MHz, 5% clock speed increase giving ~5% cut in time.
1050 > 1100 MHz, 4.8% clock speed increase giving ~4.8% cut in time.
1100 > 1150 MHz, 4.6% clock speed increase giving ~2.6% cut in time.

You're right, weird! Not throttling, is it?

Re VRAM, yea makes sense.
I tried to downclock the RAM to 400 MHz last night & it locked up then BSOD! (& instantly trashed 6 WUs, don't know how it managed 6 when it was only working on 4 :confused:, anyway 1 was a MW CPU task about 2/3 done :().
I've managed 500 MHz since, & although on the 1st attempt it caused the graphics card to spin up its fan briefly, it seems to be ok. That cut power by another 3w, not a lot, so if I see a significant slowdown I'll knock it back up to 600 MHz.
 

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
1000 > 1050 MHz, 5% clock speed increase giving ~5% cut in time.
1050 > 1100 MHz, 4.8% clock speed increase giving ~4.8% cut in time.
1100 > 1150 MHz, 4.6% clock speed increase giving ~2.6% cut in time.

Your right, weird! not throttling is it?
no it's not throttling...but if you think about it, the results make sense mathematically. we wouldn't expect every 50MHz increase in core clock to knock the same 5s off the run time. run time is inversely proportional to clock speed: if i were to double my core clock, my run times would halve, and each further doubling would shave off less and less absolute time. equal 50MHz steps are a smaller and smaller fraction of the clock each time, so they buy smaller and smaller reductions in run time...the law of diminishing returns is expected.

*EDIT* - when you add the above times to the list, note that i only tested them at the reference VRAM clock of 1375MHz (for some reason the VRAM clock jumps back up to 1375MHz when i OC the core clock beyond the factory OC of 950MHz, and it can't be underclocked again until i drop the core clock back down to default or lower), whereas my original average run time of 90s at the factory OC of 950MHz was confirmed both at the reference VRAM clock of 1375MHz and underclocked to 800MHz...so please note that in parentheses. also, listings that include a VRAM clock should show the real clock speed instead of the quad-pumped "effective" clock speed. for instance, Leonheart's VRAM clock for his 7950 should be shown as 1500MHz instead of 6GHz...it'll make the chart easier to read b/c we'll have to do less math in our heads.
 
Last edited:

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
You're right about the RAM clock speeds; I don't normally list effective speeds as it can be confusing, although OTOH it can help show performance differences.

Re performance change, I wasn't saying that I'd expect every 50 MHz increase to give a 5s decrease. I had actually meant to say earlier that the reducing % increase with each 50 MHz step makes sense, because 50 MHz becomes proportionally less each time, so it would mean a gradually smaller decrease in time with each step.

The results don't make sense mathematically for the final step because it gives a 4.6% increase in clock speed yet only a 2.6% decrease in time (hence my question about throttling). The 1st 2 steps % increase clk vs decrease time match up.

If you're sure your GPU isn't throttling then I wonder if you're hitting a ceiling as to how fast a WU can be crunched?

So no one fancy posting any CPU times?
 
Last edited:

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
ah, but think about it - while it might seem intuitive that a 5% increase in core clock (and thus processing power) will result in a 5% decrease in run time, that is not the case. i think the best way i can explain it is to use a runner as an analogy...

suppose a runner runs 100m at 5 m/s. obviously it'll take him 100/5 = 20s to run it. if he decreases that distance by 5% (100*0.95 = 95m) and maintains the same speed of 5m/s, it will take him 95/5 = 19s to run it, which also happens to be a 5% decrease in run time (20*0.95 = 19)...

BUT, if he increases his speed by 5% (5*1.05 = 5.25m/s), instead of reducing his running distance by 5%, we see that it'll take him 100/5.25 = 19.0476s to run it. 19.0476s is 95.2381% of 20s, and so we see that a 5% increase in speed actually resulted in a 4.7619% decrease - not a 5% decrease - in his run time. now i chose to increase the runner's speed by 5% specifically because that is the percentage increase in core clock i happened to start out with when going from 1000MHz to 1050MHz. with a little math, we see that the following is true:

50MHz is a 50/1000 = 5.0000% increase over 1000MHz
50MHz is a 50/1050 = 4.7619% increase over 1050MHz
50MHz is a 50/1100 = 4.5455% increase over 1100MHz
50MHz is a 50/1150 = 4.3478% increase over 1150MHz

now in this analogy, the distance of the race is analogous to the amount of raw data contained in each "long" 213.76-point task, and the speed of the runner is analogous to the core clock of the GPU. unfortunately i can't calculate the theoretical GPU task run time as easily as i could for a runner covering a fixed distance at a fixed speed, b/c i don't know the exact amount of raw data contained in a "long" 213.76-point task...but i can take the diminishing percentage increases in core clock calculated above, apply them to the example of the runner starting at 5m/s, and extrapolate to find what percentage decreases in his run times result from increasing his speed by those percentages...these will be the same percentage decreases in run times seen in my benchmark progression from 1000MHz to 1150MHz in 50MHz increments. so if we start with 5m/s and apply a little math, we see that the following is true:

0.25m/s is a 0.25/5.00 = 5.0000% increase over 5m/s
0.25m/s is a 0.25/5.25 = 4.7619% increase over 5.25m/s
0.25m/s is a 0.25/5.50 = 4.5455% increase over 5.5m/s
0.25m/s is a 0.25/5.75 = 4.3478% increase over 5.75m/s

a little more math, and we have the following:

100m / 5.00m/s = 20.0000s
100m / 5.25m/s = 19.0476s
100m / 5.50m/s = 18.1818s
100m / 5.75m/s = 17.3913s

...and:

19.0476s / 20s = 95.2381%
18.1818s / 20s = 90.9091%
17.3913s / 20s = 86.9565%

...therefore:

the average run time at 1050MHz should be 95.2381% of 85s (which was the average run time at 1000MHz), or 80.9524s
the average run time at 1100MHz should be 90.9091% of 85s (which was the average run time at 1000MHz), or 77.2727s
the average run time at 1150MHz should be 86.9565% of 85s (which was the average run time at 1000MHz), or 73.9130s

i'd say that my actual average run times of 81s and 77s (taken at 1050MHz and 1100MHz respectively) agree quite well with the theoretical prediction. but i see your point about the last core clock increase from 1100MHz -> 1150MHz, and how the corresponding run time of 75s doesn't quite "fit the curve." but i don't think it's b/c there is a limit to how fast a MW@H GPU task can be crunched. rather i think it's a combination of factors contributing to the discrepancy, and we'll never really know what all of those factors are or how much each one contributes.
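The whole runner derivation above boils down to run time scaling inversely with clock, t = t0 x (f0 / f). A minimal check against the 85s baseline at 1000MHz from the post (function name is illustrative):

```python
# Run time scales inversely with core clock: t = t0 * f0 / f.
# Baseline: 85s measured at 1000MHz, per the benchmark list above.

def predicted_time(t0, f0, f):
    return t0 * f0 / f

for f in (1050, 1100, 1150):
    print(f, round(predicted_time(85.0, 1000, f), 2))
# 1050 80.95
# 1100 77.27
# 1150 73.91
```

These match the hand-worked predictions (80.95s, 77.27s, 73.91s), and the measured 81s and 77s land on the curve while the 75s at 1150MHz sits slightly above it.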
 
Last edited:

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
I don't know what happened to my maths earlier :confused: lol

You're right that the 1st stat is a 4.7% decrease in time, but TBH that small difference could just be an artefact of averaging multiple runs. The only way to get really accurate times would be to run the same WU over & over again.
I remember contemplating such an idea for a benchmark WU for SETI, simply by copying the entire SETI installation including WU. That would work for the short term, at least until the WUs expiry date passed.

My 2nd stat was slightly out too, it should have been 4.9%; somehow I did get the last 1 right! ;)
But my overall point (& yours initially) is still the same: the last increase in speed gave a considerably poorer decrease in time.
Did you check for power gating (I think that's the right term!) as well as clock throttling?
That still seems like the most likely cause, or something else using GPU time.
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
Oh btw folks, still after benchmarks, would be nice to see more 6900s, 5800s stats & see any 3800 & 4700 stats!
As well as more NVidia & CPU stats :thumbsup:
 
Last edited:

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
HD 7750 score added to table.
Intel Xeon time added + edited description.
 
Last edited:

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
Was hoping to add a time for a Titan, but single WUs hugely under-utilise it, only ~18% GPU load! Multiple WUs can't be used for even a rough idea as the mix of long & short WUs messes that up!
I suppose I could post the range ......

No more times folks?
 
Last edited:

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
Was hoping to add a time for a Titan but single WUs hugely under-utilise it, only ~18% GPU load!
i'm sure that has more to do with the fact that nVidia GPUs are far less efficient at MW@H than AMD GPUs are, and less to do with the fact that the Titan is one of the most powerful cards currently available (if it has anything to do with the latter at all). i say this because a 7970 is easily fully utilized by MW@H, and a GTX Titan is nowhere near several times as powerful as a 7970. if nVidia GPUs were able to use OpenCL as well as AMD GPUs do, then they would all have ~100% utilization while crunching MW@H too, the GTX Titan included. i doubt anything can be done about that GPU utilization deficit, short of nVidia bringing their OpenCL development out of the stone age, or introducing a new driver set with highly specialized optimizations.

as such, a GTX Titan benchmark should still be taken and submitted, pretty please...
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
I wasn't implying that it was because of the power of the Titan that it was being under-utilised; I didn't say at all why I thought it was ;). Although I have heard NVidia's OpenCL support is poor, & OpenCL is what MW uses now.

Oh just for the record, not sure if you were implying otherwise but the Titan's DP power is about 50% more than the 7970's! :cool:. Not sure if running multiple WUs at once fully utilises that.

I'll see what I can do for a Titan benchmark.
Btw, just a thought, would allocating more CPU time via an app_info for a Titan increase the GPU load?
 

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
Oh just for the record, not sure if you were implying otherwise but the Titan's DP power is about 50% more than the 7970! :cool:.
...which is a far cry from being several times more powerful than a 7970...that's not even twice as powerful as a 7970 from a DP perspective.

Not sure if running multiple WUs at once fully utilises that.
i'm not sure b/c i have no personal experience w/ MW@H and the GTX Titan...but i have my doubts. keep in mind that increased efficiency on the MW@H project (by way of running multiple tasks simultaneously) comes not from shortened run times of the tasks themselves, but from the overlap of one task by another while the former finishes its last few seconds' worth of calculations on the CPU, thereby preventing the GPU from sitting idle for those few seconds. in fact, experiment will show that running 2 MW@H tasks simultaneously will exactly double their run times, regardless of how much the tasks overlap. so it's no surprise to see a single MW@H task load a 7970 to ~100% just like 2 simultaneous MW@H tasks do. in other words, a single MW@H task should take a GTX Titan to 100% utilization and simply finish sooner than it would on a 7970, were it not for nVidia's crappy OpenCL support.
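The overlap argument can be put as a toy throughput model: each task needs some GPU seconds plus a short CPU-only tail, and running 2-up hides one task's tail behind the other's GPU work. G and C below are made-up illustrative numbers, not measurements:

```python
# Toy model of why overlapping helps on MW@H: each task takes G seconds of
# GPU work plus a CPU-only tail of C seconds. Run singly, the GPU idles
# during the tail; run 2-up, one task's GPU work hides the other's tail.

def tasks_per_hour_single(G, C):
    # GPU sits idle for C seconds out of every (G + C).
    return 3600.0 / (G + C)

def tasks_per_hour_overlapped(G, C):
    # With enough concurrency the GPU stays busy; assumes C <= G.
    return 3600.0 / G

G, C = 85.0, 5.0  # illustrative: 85s GPU work, 5s CPU tail per task
print(tasks_per_hour_single(G, C))      # 40.0
print(tasks_per_hour_overlapped(G, C))  # ~42.35
```

So the gain is only the hidden tail, a few percent on a card MW@H already loads to ~100%, which is consistent with the bigger win on the under-utilised Titan coming from raising GPU load rather than from the overlap itself.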

Btw, just a thought, would allocating more CPU time via an app_info for a Titan increase the GPU load?
i have no idea, but it would be great if we could get someone w/ a GTX Titan to test MW@H using an app_info and manipulate the CPU time parameter...
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
...which is a far cry from being several times more powerful than a 7970...that's not even twice as powerful as a 7970 from a DP perspective.

Err, I never said it was several times more powerful than a 7970 in the 1st place ;):p.

Re CPU time on a Titan, I'll ask Yankton.

Btw re running multiple WUs at once on the Titan, it improves efficiency on that GPU because it increases GPU load. Yankton found that he could increase GPU load to near 100% running 4 or 5 WUs at once, depending which WUs were being crunched. See the KWSN thread p2 for more info :). Oh & GPU times don't go up at those levels.

Incidentally, looking back through his posts in that thread he mentions that the Titan's OpenCL does well in other projects :confused:.
 
Last edited:

Sunny129

Diamond Member
Nov 14, 2000
4,823
6
81
Err, I never said it was several times more powerful than a 7970 in the 1st place ;):p.
i know you didn't...i said that a few posts above in response to your having pointed out that the GTX Titan is hugely under-utilized. i also understand that you weren't implying that the reason for this is the Titan's raw compute power...but i thought i would point out that it's probably not the Titan's compute power causing the under-utilization anyway, just for folks who might be wondering.


Btw re running multiple WUs at once on the Titan, it improves efficiency on that GPU because it increases GPU load. Yankton found that he could increase GPU load to nr 100% running 4 or 5 WUs at once, depending which WUs were being crunched. See the KWSN thread p2 for more info :). Oh & GPU times don't go up at those levels.
what exactly do you mean run times don't go up at those levels? are you saying that 4 or 5 simultaneous tasks finish in the same amount of time it takes to complete a single task running by itself? or are you saying that the increase in run time isn't directly proportional to the number of tasks being run (i.e. 1 task takes X seconds to complete, 2 tasks take <2X seconds to complete, 3 tasks take <3X seconds to complete, etc.)?
 

Assimilator1

Elite Member
Nov 4, 1999
24,118
507
126
Re Titan's power, ahhh ok :), guess that was a language issue ;).

are you saying that 4 or 5 simultaneous tasks finish in the same amount of time it takes to complete a single task running by itself?

Yes apparently, see the KWSN thread; Yankton in turn discovered this on the MW forums, where there were a couple of threads discussing running 6 WUs at once with little penalty. I haven't seen those threads for myself so I can't vouch for them.

Btw Yankton doesn't think the CPU time can be altered in the app_info; where would he find it?
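For reference, in a BOINC anonymous-platform setup the per-task CPU reservation lives in the <app_version> section of app_info.xml, in the <avg_ncpus> element. The sketch below is hedged: the file name, version number, and counts are placeholders rather than the real MW@H values, and whether raising <avg_ncpus> actually raises Titan GPU load is exactly the open question here.

```xml
<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>milkyway_opencl_example.exe</name>  <!-- placeholder file name -->
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>100</version_num>      <!-- placeholder version -->
    <avg_ncpus>1.0</avg_ncpus>          <!-- CPU share reserved per task -->
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>                  <!-- GPUs (or fraction) per task -->
    </coproc>
    <file_ref>
      <file_name>milkyway_opencl_example.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```

The file sits in the project's directory under BOINC's data folder; note that installing one makes the client drop server-issued app versions, so it's worth backing up first.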
 
Last edited: