Thoughts on building GPU crunchers (riser cables)

StitchExperimen

Senior member
Feb 14, 2012
345
5
81
At present I'm working on the concept only, but later I'm thinking of many crunchers in a rectangular room with some type of air cooling or evaporative-tower cooling.

What I'm also interested in is suggestions on processors, because some GPU projects need multiple cores. With 4 GPUs on riser cables and no case, Xeon CPUs might be worth considering, either single- or dual-socket. But how slow a clock is still useful, I don't know, because I previously ran an overclocked Intel 3770 and an LGA 2011 chip at 4.3 to 4.5 GHz on WCG with 7950s and 660 Tis. So what I'm trying to say is that all I have now to make comparisons with is a server with two AMD Opteron 4386s (3.1 GHz base, 3.7 GHz turbo, 3.4 GHz all-core turbo) and an EVGA Nvidia GTX 460.

The price point of which GPUs to get is up for discussion. Longevity is another question of usefulness, along with wattage.

Another thing is the main board... a server board is made for 24/7/365 service for 5+ years.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,842
4,822
75
The number of cores required, and the speed required, totally depends on the projects. Most projects don't use more than one CPU core per GPU in the actual software, just because the threads (or lack thereof) naturally line up that way. However, the video card drivers can take another full core in some cases, so allocating one extra virtual core per card to that can be helpful.
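For anyone tuning this, BOINC lets you budget CPU per GPU task with an app_config.xml in the project's data directory. A minimal sketch; the app name "example_gpu_app" is a placeholder, so substitute the real app name from your project's client_state.xml:

```xml
<!-- app_config.xml, placed in the project's folder under the BOINC data
     directory. "example_gpu_app" is a placeholder app name; use the one
     from client_state.xml for your project. -->
<app_config>
  <app>
    <name>example_gpu_app</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>  <!-- one task per GPU -->
      <cpu_usage>1.0</cpu_usage>  <!-- reserve a full CPU core for driver overhead -->
    </gpu_versions>
  </app>
</app_config>
```

Setting cpu_usage to 1.0 tells the scheduler to leave a whole core free for each GPU task, which is the "one extra virtual core per card" allocation described above.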

What GPUs to get also depends on the project, and sometimes the OS. These new, very efficient Maxwell cards are doing very well in F@H in Linux, for instance, but with the new core in Windows they're (currently) terrible. They probably aren't very good at any double-precision projects either. 280Xs are great for double-precision projects, but they take a lot more power than Maxwell, and some projects are CUDA only. GTX 580s or Teslas are great for CUDA-only double-precision, but the 580s have a huge power draw and the Teslas cost a lot.
 

StitchExperimen

Senior member
I had an R9 290 for a while and ran MW or Moo. I'm pretty sure it was MW, and I think it was around the time they had recently come out with their GPU software.
Anyway, what I was trying to get at was that it took a load of 3 GPU work units to keep an R9 290 busy.

Now, another shortcoming: my testing is based on WCG Cancer, where the WUs were done 4 at a time, each finishing in under 2 minutes, and I don't know what role the CPU cache played. On a 3770 I could send 5 GPU work units down each of 2 GPUs, and whether at 4 or 5 per GPU, the times came out about equal.
On an LGA 2011 6-core (the $560 version, I think) I could put a max of 11 GPU work units through each of 2 7950s (the times still averaged out the same as when running 4 per GPU).
Lessons learned: faster clocks on the GPU mattered, and some people were hitting 1200 MHz or higher, up to 1250 with Gigabyte cards. Our budgets were tight enough that when it came to forking over the money for a 7970, we bought 7950s instead. Overclocked processor speed mattered too; you could tell who had better tweaking, considering these WUs were speeding by in under 2 minutes, and I think it was more like a minute and 20 seconds.
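For reference, running several WUs per card like this is done in BOINC with a fractional gpu_usage in app_config.xml. A sketch, again with a placeholder app name; the fractions shown assume you want 4 tasks sharing each GPU:

```xml
<!-- Runs 4 tasks concurrently on each GPU (gpu_usage = 1/4 = 0.25).
     "example_gpu_app" is a placeholder; use the app name from
     client_state.xml for your project. -->
<app_config>
  <app>
    <name>example_gpu_app</name>
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

In general, gpu_usage = 1/N runs N tasks per card; whether that helps depends on how well a single task saturates the GPU, as the timings above show.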

But some of it is generally GPU-only, and I'm badly out of the loop on the projects. Way back when, I did F@H, PrimeGrid, MilkyWay@home, and GPUGRID.
I had never heard until now of a driver needing a virtual CPU.

FROM BOINC:
Projects with NVIDIA applications:
Collatz Conjecture (Windows and Mac OS X on Intel)
DistrRTgen (Windows, Linux 64bit)
Einstein@home (Linux, Windows and Mac OS X on Intel)
GPUgrid.net (Linux 64bit and Windows 64bit or 32bit, requires a video card with Compute Capability 1.3 (CC1.3) or higher)
Milkyway@home (Double precision GPU required, so compute capability 1.3 or higher, meaning a Geforce GTX 260 or better) (Linux 64bit and Windows)
Moo! (Driver 256.00 or better, Compute Capability 1.0 or higher, Minimum device memory 384 MB - - - http://moowrap.net/forum_thread.php?id=16)
PrimeGrid (Proth Prime Search (Sieve), Linux 32bit, Linux 64bit Windows and Mac OS X on Intel; Cullen/Woodall Prime Search (Sieve), Linux 32bit, Linux 64bit, Windows and Mac OS X on Intel)
SETI@home (Windows and Linux only)

Projects with ATI/AMD applications:
Collatz Conjecture (Windows, Windows 64bit, Linux 64bit)
DNETC@Home (Windows and Linux 32bit)*Project retired
Einstein (Linux, Windows on Intel; Mac OS X under active development)
Milkyway@home (OpenCL support and Double precision GPU required, so a Radeon 48xx, 47xx, 58xx, 69xx, FirePro V87xx, FireStream 92xx) (Windows only)
Moo! (Driver v10.4 or later, ATI Runtime (not older AMD), Minimum device memory 250 MB - http://moowrap.net/forum_thread.php?id=16)
PrimeGrid (Proth Prime Search (Sieve), Linux 32bit, Linux 64bit and Windows)
SETI@home (Windows and Linux only)

Also:
Folding at Home (F@H)
Please help me add to the list. Thanks!


GPU FP32 GFLOPS FP64 GFLOPS Notes

GeForce GTX 690 ----------- 5621.8 234.2 $1000
GTX Titan Black ------------ 5100 1300
GeForce GTX 980 ----------- 4612 144 TDP 165 W, $549, 4-way SLI
GTX Titan ------------------ 4500.0 1000.0
GeForce GTX 970 ----------- 3494 109 TDP 145 W, $329, 3-way SLI
GeForce GTX 680 ----------- 3090.4 128.8
GeForce GTX 590 ----------- 2488.3 311.0
GeForce GTX 660 Ti --------- 2460.0 102.5
GeForce GTX 670 ------------ 2460.0 102.5
GeForce GTX 660 ------------ 1881.6 78.4
GeForce GTX 580 ------------ 1581.1 197.6
GeForce GTX 650 Ti Boost --- 1505.3 62.7
GeForce GTX 650 Ti ---------- 1420.8 59.2
GeForce GTX 570 ------------ 1405.4 175.7
GeForce GTX 560 Ti ---------- 1263.4 105.3

Radeon R9 295X2 ----------- 11466.8 1433.3 $799.99
Radeon R9 290X ------------- 5632 704 $300/330/380, TDP 290 W
Radeon R9 290 -------------- 4848.6 606.1 $269 (Sapphire), TDP 275 W

GPU FP32 GFLOPS FP64 GFLOPS Ratio
Radeon R9 295X2 --------- 11264 1408 FP64 = 1/8 FP32
Radeon HD 7990 ------------7782 1946 FP64 = 1/4 FP32
GeForce GTX Titan Black -- 5645 1881 FP64 = 1/3 FP32
GeForce GTX 690 ----------- 5622 234 FP64 = 1/24 FP32
Radeon R9 290X ------------ 5632 704 FP64 = 1/8 FP32
GeForce GTX 780 Ti -------- 5345 223 FP64 = 1/24 FP32
Radeon HD 6990 ----------- 5099 1276 FP64 = 1/4 FP32
GeForce GTX 980 ----------- 4981 156 FP64 = 1/32 FP32
Radeon R9 290 -------------- 4849 606 FP64 = 1/8 FP32
GeForce GTX Titan --------- 4709 1523 FP64 = 1/3 FP32
Radeon HD 7970 GHz ------- 4301 1075 FP64 = 1/4 FP32
GeForce GTX 780 ------------ 4156 173 FP64 = 1/24 FP32
GeForce GTX 970 ------------ 3920 122 FP64 = 1/32 FP32
Radeon R9 280X ------------- 4096 1024 FP64 = 1/4 FP32
Radeon R9 280 --------------- 3344 836 FP64 = 1/4 FP32
Radeon HD 7950 Boost ------ 3315 828 FP64 = 1/4 FP32
GeForce GTX 770 ------------ 3210 134 FP64 = 1/24 FP32
GeForce GTX 680 ------------ 3090 129 FP64 = 1/24 FP32
Radeon HD 7950 ------------- 2867 717 FP64 = 1/4 FP32
Radeon HD 5870 ------------- 2720 544 FP64 = 1/5 FP32
Radeon HD 6970 ------------- 2703 675 FP64 = 1/4 FP32
Radeon R9 270X ------------- 2688 168 FP64 = 1/16 FP32
Radeon HD 7870 ------------- 2560 160 FP64 = 1/16 FP32
GeForce GTX 590 ------------ 2488 311 FP64 = 1/8 FP32
GeForce GTX 670 ------------ 2460 102 FP64 = 1/24 FP32
GeForce GTX 660 Ti --------- 2460 102 FP64 = 1/24 FP32
Radeon R9 270 --------------- 2368 148 FP64 = 1/16 FP32
GeForce GTX 760 ------------ 2258 94 FP64 = 1/24 FP32
Radeon HD 6950 ------------- 2253 563 FP64 = 1/4 FP32
Radeon HD 5850 ------------- 2088 417 FP64 = 1/5 FP32
Radeon R7 260X ------------- 1971 123 FP64 = 1/16 FP32
Radeon R7 265 --------------- 1894 118 FP64 = 1/16 FP32
GeForce GTX 660 ------------ 1882 78 FP64 = 1/24 FP32
Radeon HD 7790 ------------- 1792 128 FP64 = 1/14 FP32
Radeon HD 7850 ------------- 1761 110 FP64 = 1/16 FP32
GeForce GTX 580 ------------ 1581 197 FP64 = 1/8 FP32
Radeon R7 260 --------------- 1536 96 FP64 = 1/16 FP32
GeForce GTX 650 Ti Boost --- 1505 62 FP64 = 1/24 FP32
GeForce GTX 650 Ti ---------- 1425 60 FP64 = 1/24 FP32
GeForce GTX 570 ------------- 1405 175 FP64 = 1/8 FP32
GeForce GTX 750 Ti ---------- 1388 43 FP64 = 1/32 FP32
Radeon HD 7770 GHz -------- 1280 80 FP64 = 1/16 FP32
Radeon R7 250X -------------- 1280 80 FP64 = 1/16 FP32
GeForce GTX 750 ------------- 1110 34 FP64 = 1/32 FP32
GeForce GTX 650 -------------- 812 33 FP64 = 1/24 FP32
Radeon R7 250 ----------------- 806 50 FP64 = 1/16 FP32
Radeon R7 240 ----------------- 500 31 FP64 = 1/16 FP32


Workstation cards:
GPU FP32 GFLOPS FP64 GFLOPS Ratio
FirePro W9100 5240 2620 FP64 = 1/2 FP32
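As a sanity check, the FP64 column is just the FP32 figure divided by the architecture's ratio. A quick check in Python, using three rows from the tables above:

```python
# FP64 throughput is the FP32 figure divided by the architecture's
# FP64:FP32 divisor. Values are taken from the table above.
cards = {
    "GeForce GTX 980": (4981, 32),  # Maxwell, FP64 = 1/32 FP32
    "Radeon R9 280X":  (4096, 4),   # Tahiti,  FP64 = 1/4  FP32
    "Radeon R9 290X":  (5632, 8),   # Hawaii,  FP64 = 1/8  FP32
}
for name, (fp32, divisor) in cards.items():
    # prints ~156, ~1024, and ~704 GFLOPS, matching the table
    print(f"{name}: ~{round(fp32 / divisor)} FP64 GFLOPS")
```

This is why a cheap Tahiti card like the R9 280X beats a much newer Maxwell card on any double-precision project: the divisor matters more than the raw FP32 number.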
 

StitchExperimen

Senior member
If anyone would like a Word copy of this file for your personal records, PM me with an email address and you'll get one.
 

Ken g6

Programming Moderator, Elite Member
Moderator
I never heard till now of a driver needing a virtual cpu.
Yeah, this is somewhat my fault. :p I implemented some additional checks in the OpenCL Proth Prime Search (Sieve) application, while at the same time increasing the speed of the GPU code. I also failed to figure out a way to compile the app for 64-bit Windows. The net result is that some fast GPUs with moderately slow CPUs need more than one core per GPU. Because my application only uses a single thread, the only reason I can think of that it would use more than one core is that the driver takes some CPU too.

P.S. Thanks for that list! Do you happen to have a list of applications that can use Intel GPUs? That's another thing I didn't get around to with Proth Prime Search (Sieve) this year.
 

StitchExperimen

Senior member
Which works better?
Collatz Conjecture
Nvidia
AMD
6s
additional comments:

Einstein@home
Nvidia
AMD
6s
additional comments:

GPUgrid.net
What's needed to be GPU productive? Last time I used a pair of 660 Ti's. What's the most bang for the buck?

Milkyway@home
Nvidia
AMD
6s
additional comments:

Moo!
Nvidia
AMD
6s
additional comments:

PrimeGrid
Nvidia
AMD
6s
additional comments:

SETI@home
Nvidia
AMD
6s
additional comments:
 

GLeeM

Elite Member
Apr 2, 2004
7,199
128
106
Milkyway is better, by far, on AMD.

Collatz:
GTX 560 Ti = 475,000 PPD
GTX 970 = 688,000 PPD
GTX 780 = 760,000 PPD
HD 7950 = 850,000 PPD

Einstein: the 7950 and 780 are close

MOO: the 7950 is twice as fast as the 780

If I remember right, Nvidia is better than AMD on PrimeGrid and SETI.

There is an Nvidia GPU client for Asteroids, but it is only 2x faster than one CPU core, so it doesn't make sense to run since it uses some CPU too.

A little-known GPU project is Albert@Home, the Einstein testing project. (It looks like it counts toward Free-DC Milestone Stats!)
 

Ken g6

Programming Moderator, Elite Member
Moderator
PrimeGrid depends on the sub-project. I notice you listed Cullen/Woodall Prime Search (Sieve). It's already finished and isn't likely to resume any time soon.

PPS Sieve is quite a bit better on Nvidia, and since another PPS Sieve race is tentatively scheduled for next December, it's likely that sub-project will continue for at least another year. The best card for that is probably a 780 Ti. If you do try to do this with AMD, go Linux, as that will reduce CPU usage.

GFN requires double-precision, and its OpenCL version is better than its CUDA version in many cases, so it has different requirements. I'm not absolutely sure, but I think the best card for that is an AMD R9 280X. But a GTX 580 is not bad either. GFN is likely to continue after PPS Sieve ends, but not necessarily very long.

There have also been some attempts to port LLR, the usual prime-finding program, to GPUs. None have been very effective. Even the CPU client is limited by RAM latency/bandwidth. Skylake (the next Intel CPU) is likely to be faster than anything else at LLR, especially with fast RAM. LLR is what we'll be using long after all the sieves are finished.