GPUPI challenge! This time, the gimp beats you! Or not.

DrMrLordX

Lifer
Apr 27, 2000
23,223
13,301
136
Several years ago, I invited abuse of my Sempron 140 in a SuperPi challenge. That probably did some partial damage to the chip, which eventually forced me to replace the poor old thing.

Ah, good times, good times.

Anyway, there's a new gimp in town, and there's a new test, too:

GPUPI.

Long story short, it's like SuperPi, but it's on your GPU. I think we should throw down and do some seriously not-serious-just-for-fun GPUPI runs in here to see who can post the best time. I propose two categories:

Socket-only results
Full-system results

Socket-only means: post the best time you can get from whatever is in your CPU socket(s). The test will run on a CPU or a GPU, so those of you with no iGPU (or no iGPU worth a darn) can always run the program on the CPU. Obviously, this category favors APUs. Big whoop.

Full-system means: post the best time you can get, period, using any device in your system.

GPUPI now supports multiple GPUs or CPUs (but not GPUs + CPUs at the same time). Have at it, you Xfire/SLI freaks!

I can maintain a leaderboard if anyone wants me to, though updates may have some gaps since my work schedule is meh right now. Feel free to redo runs if you manage to tweak your hardware and achieve better results. I'll do my best to keep up, which may not be very good over the weekends.

The only submissions I'll post to the leaderboard will use default settings for 1b or 32m runs.

Anyway, let's get this started off the right way! Here goes the gimp:

https://www.dropbox.com/s/nn28xxgz466p19f/win107700kGPUPI1b4700CPU2100NB1028GPU.png?dl=0

socket-only, 1b:

4m 22.132s

AMD A10-7700k @ 4.7 GHz
2100 MHz NB
DDR3-2400 10-12-13-32 2T
1028 MHz iGPU

edit: I punched my score up a bit by raising iGPU speed to 1107 mhz in a "suicide" run. 1200 mhz iGPU locked the system shortly after boot (lulz).

https://www.dropbox.com/s/5g5c3cktpz17x7y/win107700kGPUPI1b4700CPU2100NB1107GPU.png?dl=0

edit: The best I can do is with my iGPU running @ 1152 mhz. Northbridge does nothing, memory does nothing, and any iGPU speed higher than that is crashy. Also, I'll add my 32m speed for kicks.

https://www.dropbox.com/s/kupw107lkabrrap/win107700kGPUPI1b4512CPU2016NB1152GPU.png?dl=0

https://www.dropbox.com/s/p10hiamim7zyw1a/win107700kGPUPI32m4512CPU2016NB1152GPU.png?dl=0

Socket-only 1b leaderboard
Code:
kagui:        3m 29.399s  | iGPU (A10-7850k, 960 MHz)
DrMrLordX:    3m 54.106s  | iGPU (A10-7700k, 1152 MHz)
biostud:      9m 25.828s  | CPU (i7-5820k, 4.3 GHz)
Makaveli:     9m 34.923s  | CPU (i7-970, 4.2 GHz)
Ramses:       13m 58.400s | CPU (FX 9590, 4716 MHz)
Yuriman:      15m 13.210s | CPU (i5-3570k, 4800 MHz)
Ruiner1:      17m 17.995s | CPU (Xeon L5639, 3.2 GHz)
.vodka:       17m 30.275s | CPU (i5-2500k, 4.5 GHz)
Bubbleawsome: 17m 34.126s | CPU (i5-4670k, 4.3 GHz)
DrMrLordX:    22m 26.459s | CPU (A10-7700k, 4.7 GHz)

Socket-only 32m leaderboard
Code:
DrMrLordX:    3.442s  | iGPU (A10-7700k, 1152 MHz)
biostud:      10.500s | CPU (i7-5820k, 4.3 GHz)
Bubbleawsome: 11.051s | CPU (i5-4670k, 4.3 GHz)
Makaveli:     13.036s | CPU (i7-970, 4.2 GHz)
.vodka:       20.055s | CPU (i5-2500k, 4.5 GHz)
DrMrLordX:    28.764s | CPU (A10-7700k, 4.7 GHz)

Full-system 1b leaderboard
Code:
.vodka:       22.383s    | GPU (R9 290, 1180/1600 MHz)
WiseUp216:    38.917s    | GPU (GTX 970, 1610 MHz)
ShintaiDK:    38.944s    | GPU (GTX 980, 1240 MHz)
cmdrdredd:    39.663s    | GPU (GTX 970, 1560 MHz)
Makaveli:     40.978s    | GPU (HD 7970, 1150 MHz)
Bubbleawsome: 42.655s    | GPU (R9 280x, 1100 MHz)
Ramses:       45.519s    | GPU (R9 280x, 1050 MHz)
biostud:      46.614s    | GPU (HD 7990, 1000 MHz)
Lepton87:     47.411s    | GPU (GTX Titan SLI, 1200 MHz)
Deders:       50.876s    | GPU (GTX 780, 1215 MHz)
Ruiner1:      1m 2.791s  | GPU (GTX 770, 1189 MHz)
Sable:        1m 3.12s   | GPU (GTX 680, 1308 MHz)
Yuriman:      1m 26.624s | GPU (HD 7850, 1225 MHz)
SPBHM:        1m 45.720s | GPU (HD 5850, 900 MHz)
elemein:      2m 54.116s | GPU (GTX 860M)

Full-system 32m leaderboard
Code:
.vodka:       0.403s  | GPU (R9 290, 1180/1600 MHz)
cmdrdredd:    0.496s  | GPU (GTX 970, 1560 MHz)
WiseUp216:    0.502s  | GPU (GTX 970, 1610 MHz)
ShintaiDK:    0.504s  | GPU (GTX 980, 1240 MHz)
Makaveli:     0.656s  | GPU (HD 7970, 1150 MHz)
Bubbleawsome: 0.671s  | GPU (R9 280x, 1100 MHz)
biostud:      0.731s  | GPU (HD 7990, 1000 MHz)
sm625:        12.200s | GPU (Quadro 3800)
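
For anyone keeping their own copy of these leaderboards, here is a minimal Python sketch of how entries in this format could be parsed and ranked. The helper names (`parse_time`, `rank`) are made up for illustration, and the sample entries are copied from the socket-only 1b board.

```python
import re

def parse_time(text):
    """Convert a leaderboard time such as '3m 29.399s' or '22.383s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d*\.?\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))

def rank(entries):
    """Sort (name, time_text, device) tuples from fastest to slowest."""
    return sorted(entries, key=lambda entry: parse_time(entry[1]))

# Sample rows taken from the socket-only 1b leaderboard, deliberately out of order.
entries = [
    ("DrMrLordX", "22m 26.459s", "CPU (A10-7700k, 4.7 GHz)"),
    ("kagui", "3m 29.399s", "iGPU (A10-7850k, 960 MHz)"),
    ("biostud", "9m 25.828s", "CPU (i7-5820k, 4.3 GHz)"),
]

for name, time_text, device in rank(entries):
    print(f"{name + ':':<14}{time_text:<11} | {device}")
```

Sorting on the parsed seconds rather than on the text avoids the trap where "13m ..." sorts ahead of "3m ..." as a string.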
 

Ramses

Platinum Member
Apr 26, 2000
2,871
4
81



GPU only

Device time for pi calculation: 44.836 s
Device time for memory reduction: 0.682 s


280x/16gb
 

Ramses

I think it's geared toward APUs, at least in this thread. I threw my GPU in there just for reference; CPU-only time was longer than I felt like waiting lol...
 

DrMrLordX

I used the CUDA version if that's okay. I can do the OpenCL one instead if necessary.

Use whatever gives you the best time for the hardware tested. Please try your CPU for a socket-only result as well! It's not going to be as fast as your video card, but still . . .

Maybe this belongs in the video card forum?

The code runs on CPUs just as well as GPUs. It could go either way.

I think it's geared toward APUs, at least in this thread. I threw my GPU in there just for reference; CPU-only time was longer than I felt like waiting lol...

Well, also, it'll be interesting to see how modern CPUs stack up to GPUs. Once I get finished toying with some other settings, I'll try it in CPU mode to show how much faster it is using the iGPU than it is with the Steamroller cores.

Hopefully someone with Haswell-E will show up and try to knock off my socket-only time.
 

elemein

Member
Jan 13, 2015
114
0
0
Use whatever gives you the best time for the hardware tested. Please try your CPU for a socket-only result as well! It's not going to be as fast as your video card, but still . . .



The code runs on CPUs just as well as GPUs. It could go either way.



Well, also, it'll be interesting to see how modern CPUs stack up to GPUs. Once I get finished toying with some other settings, I'll try it in CPU mode to show how much faster it is using the iGPU than it is with the Steamroller cores.

Hopefully someone with Haswell-E will show up and try to knock off my socket-only time.

The software doesn't seem to detect my HD4600 and I'm not gonna troubleshoot why when my time likely won't beat yours :$
 

Makaveli

Diamond Member
Feb 8, 2002
5,024
1,624
136
OpenCL GPU: AMD Tahiti (32 CUs, 1050 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

1B
Device time for pi calculation: 44.297 s
Device time for memory reduction: 0.539 s
32M
Device time for pi calculation: 0.633 s
Device time for memory reduction: 0.037 s

Overclocked

OpenCL GPU: AMD Tahiti (32 CUs, 1150 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

1B
Device time for pi calculation: 40.466 s
Device time for memory reduction: 0.512 s
32M
Device time for pi calculation: 0.625 s
Device time for memory reduction: 0.031 s


OpenCL CPU: Intel(R) Core(TM) i7 CPU 970 @ 3.20GHz (12 CUs, 4200 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

1B
Device time for pi calculation: 553.724 s
Device time for memory reduction: 21.199 s


OpenCL CPU: Intel(R) Core(TM) i7 CPU 970 @ 3.20GHz (12 CUs, 4200 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Compiling OpenCL kernels ... done.

Calculating 32.000.000th digit of PI. 20 iterations.

Allocated device memory : 16779264 Bytes
Batch Size : 1M
Reduction Size : 64

32M
Device time for pi calculation: 12.666 s
Device time for memory reduction: 0.370 s
 

Sable

Golden Member
Jan 7, 2006
1,130
105
106
OpenCL and CUDA gave me more or less the same time.

OpenCL GPU: NVIDIA GeForce GTX 680 (8 CUs, 1058 MHz)
OpenCL 1.1 CUDA 6.5.30 is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

Device time for pi calculation: 69.823 s
Device time for memory reduction: 1.278 s



CUDA GPU: GeForce GTX 680 with compute capability 3.0

=> Kernel 1, Batch Size: 20M, Blocks: 20480, Threads: 1024
=> Kernel 2, Batch Size: 20M, Blocks: 20480, Threads: 1024

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335545360 Bytes
Batch Size : 20M
Reduction Size : 64

Device time for pi calculation: 69.599 s
Device time for memory reduction: 1.331 s



And OpenCL suicide run +200mhz on the clock

Device time for pi calculation: 64.620 s
Device time for memory reduction: 1.188 s

+300mhz gave me an invalid result.

edit:

+250mhz

Device time for pi calculation: 61.977 s
Device time for memory reduction: 1.143 s
 

Ruiner1

Member
Sep 13, 2013
26
0
66
I've got my CPU overclocked to a 200 MHz base clock with a x16 multiplier when all cores are in use, so it is actually running at 3.2 GHz. That said, it is miles slower than my GPU.

OpenCL GPU: NVIDIA GeForce GTX 770 (8 CUs, 1189 MHz)
OpenCL 1.1 CUDA 7.0.18 is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

Device time for pi calculation: 61.614 s
Device time for memory reduction: 1.177 s




CUDA GPU: GeForce GTX 770 with compute capability 3.0

=> Kernel 1, Batch Size: 20M, Blocks: 20480, Threads: 1024
=> Kernel 2, Batch Size: 20M, Blocks: 20480, Threads: 1024

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335545360 Bytes
Batch Size : 20M
Reduction Size : 64

Device time for pi calculation: 64.462 s
Device time for memory reduction: 1.243 s




OpenCL CPU: Intel(R) Xeon(R) CPU L5639 @ 2.13GHz (12 CUs, 2130 MHz)
OpenCL 1.2 is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

Device time for pi calculation: 1030.129 s
Device time for memory reduction: 7.866 s
 

DrMrLordX

The software doesn't seem to detect my HD4600 and I'm not gonna troubleshoot why when my time likely won't beat yours :$

Hmm. That's odd . . . HD4600 drivers should support OpenCL. Unfortunate.

OpenCL CPU: Intel(R) Core(TM) i7 CPU 970 @ 3.20GHz (12 CUs, 4200 MHz)

Device time for pi calculation: 553.724 s
Device time for memory reduction: 21.199 s

Now that's interesting. I seriously think a 5960X could beat me, if you're putting up a score like that on your CPU. Is the clockspeed accurate?

edit: looked at your sig. Duh.

I've got my CPU overclocked to a 200 MHz base clock with a x16 multiplier when all cores are in use, so it is actually running at 3.2 GHz. That said, it is miles slower than my GPU.

1b is pretty rough in CPU mode. I have yet to get around to it myself . . . still trying to get my iGPU running faster. Turns out memory speeds and timings do nothing to make the iGPU on my 7700k run faster or slower. It's all down to iGPU clockspeed and maybe NB speed (haven't tested that yet).

+300mhz gave me an invalid result.

I had a lot of this from unstable memory, though I also got it from most of my attempted bclk OC scenarios. Man bclk OC on this system is terrible . . . but I digress.

I'd be interested to know whether any change in your card's memory speed affects your scores. Memory speed does nothing for my iGPU. I went from DDR3-2400 to DDR3-1600 and messed with timings (CAS10 -> CAS7), and nothing changed the score.
 

Makaveli

Now that's interesting. I seriously think a 5960X could beat me, if you're putting up a score like that on your CPU. Is the clockspeed accurate?

edit: looked at your sig. Duh.

May have to add the clock speeds to the chart results for the CPUs :)

Also, my GPU is running at GHz Edition speed, not vanilla 7970 speed, so I'm not sure if you want to include clocks for those as well.
 

Sable

I had a lot of this from unstable memory, though I also got it from most of my attempted bclk OC scenarios. Man bclk OC on this system is terrible . . . but I digress.

I'd be interested to know whether any change in your card's memory speed affects your scores. Memory speed does nothing for my iGPU. I went from DDR3-2400 to DDR3-1600 and messed with timings (CAS10 -> CAS7), and nothing changed the score.

Everything stock

Device time for pi calculation: 75.689 s
Device time for memory reduction: 1.386 s

+500Mhz on memory

Device time for pi calculation: 75.504 s
Device time for memory reduction: 1.367 s

So nope, nothing.
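
To put numbers on that, here is a small Python sketch (the function names are hypothetical, not anything from GPUPI) that compares how much the 1b pi-calculation time moved against how much a clock moved. The memory figures are the two runs above; the core-clock figures are Makaveli's 1050 MHz and 1150 MHz Tahiti runs from earlier in the thread.

```python
def percent_faster(t_old, t_new):
    """Percentage reduction in run time going from t_old to t_new (seconds)."""
    return (t_old - t_new) / t_old * 100.0

def scaling_efficiency(clock_old, clock_new, t_old, t_new):
    """Ratio of throughput gain to clock gain; 1.0 would be perfectly linear scaling."""
    speedup = t_old / t_new - 1.0
    clock_gain = clock_new / clock_old - 1.0
    return speedup / clock_gain

# GTX 680: stock vs +500 MHz memory (pi calculation time only)
print(f"memory OC: {percent_faster(75.689, 75.504):.2f}% faster")

# Tahiti: 1050 MHz vs 1150 MHz core (pi calculation time only)
print(f"core OC scaling: {scaling_efficiency(1050, 1150, 44.297, 40.466):.2f}")
```

On those numbers the memory overclock is worth well under 1%, while the core overclock scales almost one-to-one with clock speed.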
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
I did several runs, then set the top dropdown box to 100M, the Batch Size dropdown to 20M, and the Reduction Size dropdown to 512. It crashed! What's up with that? Also, my computer feels horribly slow while this is running.

Anyway, it took 12.2 seconds to do a 32M run on a Quadro 3800. Can you guys settle on 32M so that it
a) won't take forever to run and
b) will be more easily comparable to CPU scores.
 

Ramses

Mine crashes on those settings too.
But I can't even tell it's running when it's minimized, GPU or CPU lol.
 

DrMrLordX

Maybe we can open up a 32M category as well, for people with slower computers or who just don't want to run 1b.
 

DrMrLordX

May have to add the clock speeds to the chart results for the CPUs :)

Also, my GPU is running at GHz Edition speed, not vanilla 7970 speed, so I'm not sure if you want to include clocks for those as well.

I'll gather what data I can and add it to the scores as time permits.

Everything stock

Device time for pi calculation: 75.689 s
Device time for memory reduction: 1.386 s

+500Mhz on memory

Device time for pi calculation: 75.504 s
Device time for memory reduction: 1.367 s

So nope, nothing.

Okay, that's very interesting. Very interesting indeed . . .
 

SPBHM

Diamond Member
Sep 12, 2012
5,076
440
126
5850

OpenCL GPU: AMD Cypress (18 CUs, 900 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 00.441s Batch 1 finished.
00h 00m 03.225s Batch 2 finished.
00h 00m 06.059s Batch 3 finished.
00h 00m 10.766s Batch 4 finished.
00h 00m 19.589s Batch 5 finished.
00h 00m 27.353s Batch 6 finished.
00h 00m 30.137s Batch 7 finished.
00h 00m 32.972s Batch 8 finished.
00h 00m 37.543s Batch 9 finished.
00h 00m 45.870s Batch 10 finished.
00h 00m 53.222s Batch 11 finished.
00h 00m 56.005s Batch 12 finished.
00h 00m 58.839s Batch 13 finished.
00h 01m 03.546s Batch 14 finished.
00h 01m 12.367s Batch 15 finished.
00h 01m 20.131s Batch 16 finished.
00h 01m 22.915s Batch 17 finished.
00h 01m 25.749s Batch 18 finished.
00h 01m 30.319s Batch 19 finished.
00h 01m 38.650s Batch 20 finished.
00h 01m 45.720s PI value output -> 5895585A0

Device time for pi calculation: 105.025 s
Device time for memory reduction: 0.695 s
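
In this log the final timestamp (1m 45.720s) equals the two device times added together (105.025 s + 0.695 s), and the leaderboard entries elsewhere in the thread follow the same pattern. Assuming that holds in general, a short Python sketch like the following could turn a pasted log into a leaderboard-style time. The regular expression and function names are illustrative and only cover the two "Device time" lines shown in this thread.

```python
import re

DEVICE_TIME = re.compile(
    r"Device time for (pi calculation|memory reduction):\s*([\d.]+)\s*s"
)

def total_seconds(log_text):
    """Add up the two 'Device time' lines found in a pasted GPUPI log."""
    times = {kind: float(value) for kind, value in DEVICE_TIME.findall(log_text)}
    if len(times) != 2:
        raise ValueError("expected one pi calculation and one memory reduction line")
    return times["pi calculation"] + times["memory reduction"]

def as_leaderboard_time(seconds):
    """Format seconds the way the leaderboards do, e.g. 105.72 -> '1m 45.720s'."""
    minutes, rest = divmod(round(seconds, 3), 60)
    return f"{int(minutes)}m {rest:.3f}s" if minutes else f"{rest:.3f}s"

log = """
Device time for pi calculation: 105.025 s
Device time for memory reduction: 0.695 s
"""
print(as_leaderboard_time(total_seconds(log)))  # 1m 45.720s
```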
 

Yuriman

Diamond Member
Jun 25, 2004
5,530
141
106
I'm not home right now, but I just set everything to stock via remote desktop and ran it on my GPU and CPU:

HD7850 @ 975mhz core = 1m 48.461s, lowest of three runs

3570K @ stock (3.4 base, 3.8 with 1 core) = 20m 05.077s, two runs

It's a shame Intel's iGPU isn't detected. I'll re-run with my overclocks back in place when I get home; I'm thinking I might want to squeeze a bit more out.
 

sm625

DrMrLordX, you have Socket-only 32m leaderboard listed twice. The 2nd one should probably read Full-system 32m leaderboard.
 

Ramses

CPU only - 9590/16gb



OpenCL CPU: AMD FX(tm)-9590 Eight-Core Processor (8 CUs, 4716 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 03.000s Batch 1 finished.
00h 00m 33.430s Batch 2 finished.
00h 01m 03.931s Batch 3 finished.
00h 01m 41.687s Batch 4 finished.
00h 02m 40.281s Batch 5 finished.
00h 03m 38.339s Batch 6 finished.
00h 04m 07.919s Batch 7 finished.
00h 04m 37.580s Batch 8 finished.
00h 05m 14.078s Batch 9 finished.
00h 06m 07.603s Batch 10 finished.
00h 07m 04.498s Batch 11 finished.
00h 07m 34.337s Batch 12 finished.
00h 08m 03.968s Batch 13 finished.
00h 08m 41.783s Batch 14 finished.
00h 09m 38.074s Batch 15 finished.
00h 10m 36.061s Batch 16 finished.
00h 11m 05.427s Batch 17 finished.
00h 11m 34.934s Batch 18 finished.
00h 12m 11.260s Batch 19 finished.
00h 13m 04.924s Batch 20 finished.
00h 13m 58.400s PI value output -> 5895585A0

Device time for pi calculation: 811.686 s
Device time for memory reduction: 26.714 s
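
The "PI value output" line is a run of hexadecimal digits. That fits GPUPI being, as far as I know, a BBP-formula benchmark (treat that as an assumption): the Bailey-Borwein-Plouffe formula can produce hex digits of pi starting at an arbitrary position without computing the earlier ones. As a rough illustration only (plain Python on the CPU with made-up function names, not GPUPI's actual kernels), digit extraction looks something like this:

```python
def _bbp_series(j, n):
    """Fractional part of sum over k of 16^(n-k) / (8k + j)."""
    total = 0.0
    # Terms with k <= n: use modular exponentiation so the numbers stay small.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n: these shrink by 16x each step, so a few are enough.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0

def pi_hex_digits(n, count=8):
    """Hex digits of pi starting at position n + 1 after the point.

    Double precision limits this to roughly 8 reliable digits per call.
    """
    x = (4 * _bbp_series(1, n) - 2 * _bbp_series(4, n)
         - _bbp_series(5, n) - _bbp_series(6, n)) % 1.0
    digits = ""
    for _ in range(count):
        x *= 16
        digit = int(x)
        digits += "0123456789ABCDEF"[digit]
        x -= digit
    return digits

print(pi_hex_digits(0))  # 243F6A88 (pi = 3.243F6A88... in hex)
```

Each starting position is independent of the others, which is the property that lets work of this kind be split across many GPU threads or CPU cores.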
 

kagui

Member
Jun 1, 2013
78
0
0
APU: 7850k with 2400 MHz memory.
GPU-Z gives me the iGPU at 960 MHz and memory at 1200 MHz, but the temp at 12 C.


OpenCL GPU: AMD Spectre (8 CUs, 1070 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Compiling OpenCL kernels ... done.

Calculating 1.000.000.000th digit of PI. 20 iterations.

Allocated device memory : 335546368 Bytes
Batch Size : 20M
Reduction Size : 64

00h 00m 00.682s Batch 1 finished.
00h 00m 06.208s Batch 2 finished.
00h 00m 11.773s Batch 3 finished.
00h 00m 21.173s Batch 4 finished.
00h 00m 38.484s Batch 5 finished.
00h 00m 53.718s Batch 6 finished.
00h 00m 59.231s Batch 7 finished.
00h 01m 04.797s Batch 8 finished.
00h 01m 14.015s Batch 9 finished.
00h 01m 30.658s Batch 10 finished.
00h 01m 45.333s Batch 11 finished.
00h 01m 50.851s Batch 12 finished.
00h 01m 56.411s Batch 13 finished.
00h 02m 05.812s Batch 14 finished.
00h 02m 23.124s Batch 15 finished.
00h 02m 38.356s Batch 16 finished.
00h 02m 43.869s Batch 17 finished.
00h 02m 49.429s Batch 18 finished.
00h 02m 58.647s Batch 19 finished.
00h 03m 15.290s Batch 20 finished.
00h 03m 29.399s PI value output -> 5895585A0

Device time for pi calculation: 207.186 s
Device time for memory reduction: 2.213 s
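
The batch timestamps are cumulative, so the per-batch cost is easier to see as differences between consecutive lines. A minimal Python sketch, assuming the "HHh MMm SS.sss Batch N finished." line format shown in these logs (the function name is illustrative):

```python
import re

BATCH_LINE = re.compile(r"(\d+)h (\d+)m ([\d.]+)s Batch (\d+) finished\.")

def batch_durations(log_text):
    """Return seconds spent on each batch, from cumulative 'Batch N finished.' stamps."""
    durations = []
    previous = 0.0
    for hours, minutes, seconds, _batch in BATCH_LINE.findall(log_text):
        stamp = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        durations.append(round(stamp - previous, 3))
        previous = stamp
    return durations

# First five batch lines from the log above.
log = """
00h 00m 00.682s Batch 1 finished.
00h 00m 06.208s Batch 2 finished.
00h 00m 11.773s Batch 3 finished.
00h 00m 21.173s Batch 4 finished.
00h 00m 38.484s Batch 5 finished.
"""
print(batch_durations(log))  # [0.682, 5.526, 5.565, 9.4, 17.311]
```

Run over the whole log, this shows the batches are far from uniform in cost: the first finishes in under a second, while the slowest take about 17 seconds.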
 

Bubbleawsome

Diamond Member
Apr 14, 2013
4,834
1,204
146
The final category is a bit off. Says socket only, should be full-system 32m.

4670k @ 3.8 GHz with boost, R9 280x. I'll re-overclock the 4670k to 4.3 GHz later.

32M
CPU:
Device time for pi calculation: 12.141 s
Device time for memory reduction: 0.420 s
GPU:
Device time for pi calculation: 0.603 s
Device time for memory reduction: 0.068 s

1b
CPU:
00h 20m 13.699s PI value output -> 5895585A0

Device time for pi calculation: 1203.502 s
Device time for memory reduction: 10.197 s
GPU:
Device time for pi calculation: 41.963 s
Device time for memory reduction: 0.692 s
 

DrMrLordX

DrMrLordX, you have Socket-only 32m leaderboard listed twice. The 2nd one should probably read Full-system 32m leaderboard.

Woops! Thanks for pointing out that problem.

APU: 7850k with 2400 MHz memory.
GPU-Z gives me the iGPU at 960 MHz and memory at 1200 MHz, but the temp at 12 C.


00h 03m 29.399s PI value output -> 5895585A0

Device time for pi calculation: 207.186 s
Device time for memory reduction: 2.213 s

Ach I am unseated! Sadly I do not think I can defeat you, since the only way to improve performance is to increase iGPU clockspeed (the NB does nothing). You = winnar, for now.

The final category is a bit off. Says socket only, should be full-system 32m.

Fixed, thanks.

CPU only - 9590/16gb

OpenCL CPU: AMD FX(tm)-9590 Eight-Core Processor (8 CUs, 4716 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Is that the proper clockspeed? And are you using turbo?

5850

OpenCL GPU: AMD Cypress (18 CUs, 900 MHz)
OpenCL 2.0 AMD-APP (1642.5) is ready.

Is 900 mhz the correct clockspeed here? Or are you running it faster than that?

On a lark, I ran 1b using my CPU (4.7 ghz, DDR3-2400 10-12-13-32 2T, 2100 NB) and got:

22m 26.459s

Not too shabby.
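
For a sense of scale, here is a quick Python sketch comparing that CPU run with the 3m 54.106s iGPU entry for the same chip on the socket-only 1b leaderboard. The helper name is just for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert an 'Xm Y.YYYs' style time to plain seconds."""
    return minutes * 60 + seconds

cpu_time = to_seconds(22, 26.459)   # A10-7700k CPU cores, 1b
igpu_time = to_seconds(3, 54.106)   # A10-7700k iGPU @ 1152 MHz, 1b

print(f"iGPU is {cpu_time / igpu_time:.2f}x faster than the CPU cores")
```

That works out to roughly 5.75x in favor of the iGPU.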
 