Some major changes to my rigs. (Mining, DC, and otherwise.)

VirtualLarry

Lifer
Aug 25, 2001
48,384
5,094
126
I had been running a number of Ryzen-based gaming PCs with semi-high-end video cards (GTX 1660 Ti 6GB cards; one of them, my main rig, has an RX 5700 as well). I like to do DC occasionally (PG races, the Dec. F@H race). But to pay for my video cards, I use NH to make a few bucks on the side daily.

To conserve power, or rather to consolidate power usage: my bedroom and living room share a 20A circuit, and the A/C in my bedroom is on that circuit too. The living room has a separate circuit for A/V (the wall plate on that circuit has both a 120V and a 230V outlet), to which I've attached a power strip, so I was running some crunchers/miners on that circuit as well.
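The circuit-sharing concern above comes down to simple load arithmetic. Here is a rough sketch of the budget for a shared 20A/120V circuit; all the individual wattages are illustrative assumptions, not measured values from my rigs.

```python
# Rough load budget for a shared 20 A / 120 V household circuit.
# The per-device wattages are illustrative assumptions.
CIRCUIT_AMPS = 20
VOLTS = 120
CONTINUOUS_FACTOR = 0.8  # common rule of thumb: size continuous loads at 80%

budget_watts = CIRCUIT_AMPS * VOLTS * CONTINUOUS_FACTOR  # 1920 W usable

loads_watts = {
    "window A/C": 900,        # assumed
    "main gaming rig": 450,   # assumed
    "cruncher/miner": 350,    # assumed, per rig
}

total_watts = sum(loads_watts.values())
headroom = budget_watts - total_watts
print(f"budget: {budget_watts:.0f} W, drawn: {total_watts} W, "
      f"headroom: {headroom:.0f} W")
```

With numbers like these there is little room for a second cruncher on the bedroom circuit, which is why moving the miners to the A/V circuit helps.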

I've consolidated all of my GTX 1660 Ti cards except my main rig's into a Rosewill mining chassis (the 8-GPU, dual-PSU-capable one). It's a 6U rack case, and it's fairly nice for what I paid for it ($65). It was interesting putting it together. It's running off my other Antec EDG 750W 80Plus Gold PSU; I have another one in my main rig.

I'm using Rosewill x1 PCI-E risers, the kind that use blue USB 3.0 Type-A cables: a little PCI-E x1 "stub" circuit board plugs into an x1 slot on the mobo, and the cable runs to a board with a PCI-E x16 slot that the video card plugs into.

So I have 5x GTX 1660 Ti 6GB cards hooked up this way now, using an ASRock H110 PRO BTC+ as the mobo. (Picked up a couple of those for $20 ea. from Monoprice.)

Anyways, the whole rig is powered by a dual-core Skylake Celeron G3900, because that's what I had around. That CPU is a popular choice for mining rigs like this one, though.

My question is, what DC projects will run on GPUs connected by PCI-E x1 connections? Any of them? I know that F@H likes bandwidth. Not sure about PrimeGrid.

Fear not, I will keep my main rig "pure", as a gaming PC, that can run DC apps if need be.

But this consolidation for mining and thermal/electrical optimization was long needed.

Edit: The rig has 2x8GB of DDR4-3200, running at 2133. I believe that even with an x1 connection, the GPUs will still be faster than CPUs for some projects.
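For a sense of what an x1 link actually provides: PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, which works out to roughly 985 MB/s per lane before protocol overhead. A quick sketch of usable bandwidth by link width:

```python
# Approximate PCIe 3.0 bandwidth per link width.
# 8 GT/s per lane with 128b/130b encoding ~= 985 MB/s per lane;
# real throughput is somewhat lower due to protocol overhead.
GT_PER_S = 8e9
ENCODING = 128 / 130

def pcie3_bw_mb_s(lanes):
    """Raw one-direction bandwidth in MB/s for a given lane count."""
    return GT_PER_S * ENCODING / 8 / 1e6 * lanes

for lanes in (1, 4, 8, 16):
    print(f"x{lanes:<2} ~ {pcie3_bw_mb_s(lanes):8.0f} MB/s")
```

So an x1 riser caps host transfers at about 1 GB/s each way, which is plenty for some DC apps and a bottleneck for others.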

Edit: Oh, I still have the Ryzen 6C/12T rigs, they just now have GTX 1650 4GB GDDR5 cards, about half the performance of the GTX 1660 ti cards. So they can still crunch, especially on the CPU. I've upgraded one more of them to a Ryzen R5 3600 CPU, so I now have two of those in my stable of rigs.
 

StefanR5R

Diamond Member
Dec 10, 2016
3,492
3,757
106
As you have probably seen too (certainly while folding), not only host bus bandwidth is a concern, but also host processor speed. While bus bandwidth usage varies quite a bit between DC applications, quite a few of the NVidia-based ones tend to run a polling thread on the CPU, which takes 100% of one logical CPU if it can get it. (AMD applications tend to use less host processor time, but it's been a while since I last DC'd on an AMD GPU.)

It is of course possible to commit the CPU to additional work (other CPU work, or of course more GPU feeders in a multi-GPU rig), but this is going to reduce GPU computing throughput to some degree.
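The polling-thread behavior described above can be illustrated with a small Python sketch (not how any actual GPU driver is written, just the spin-wait vs. blocking-wait pattern): a thread that busy-polls a flag consumes a full logical CPU, while a thread that blocks until signalled consumes essentially none.

```python
import threading
import time

def busy_poll(stop_flag):
    # Spin-wait, like a driver feeder thread polling for GPU completion:
    # it re-checks the flag as fast as it can, burning a whole logical
    # CPU while doing no useful work.
    while not stop_flag.is_set():
        pass

def blocking_wait(stop_flag):
    # Interrupt-style wait: the thread sleeps in the kernel until
    # signalled, using essentially no CPU time.
    stop_flag.wait()

def cpu_seconds(worker, wall_seconds=0.5):
    """CPU time the whole process accumulates while `worker` runs."""
    stop = threading.Event()
    t = threading.Thread(target=worker, args=(stop,))
    start = time.process_time()
    t.start()
    time.sleep(wall_seconds)
    stop.set()
    t.join()
    return time.process_time() - start

print(f"busy poll : {cpu_seconds(busy_poll):.2f} s CPU per 0.5 s wall")
print(f"blocking  : {cpu_seconds(blocking_wait):.2f} s CPU per 0.5 s wall")
```

On a loaded host, that spinning feeder competes with CPU tasks for a core, which is exactly why committing the CPU elsewhere costs GPU throughput.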

While I never collected data systematically, here are at least two recent observations, with GTX 1080Ti on Linux:
  • Moo! Wrapper's "opencl_nvidia_101" DNET client uses the host bus with merely ~80 MB/s RX and <30 MB/s TX.
    On the other hand, it has got one of those polling threads which take 100 % CPU.
  • PrimeGrid's "cudaPPSsieve" is polling too. On host processors with Intel HyperThreading enabled, I had to reduce the overall number of concurrent GPU and CPU tasks (GPU at PrimeGrid, CPU elsewhere) to the number of physical cores in order to maintain 99-100% GPU utilization.
    Vice versa, when I ran more tasks such that HyperThreading was actually used, GPU utilization frequently dipped down to 80 or 70 percent or less.
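That second observation reduces to a simple budgeting rule. A sketch, with assumed (hypothetical) core and task counts:

```python
# Rule of thumb from the PrimeGrid observation above: on a
# HyperThreaded host, keep (GPU feeder tasks + CPU tasks) at or
# below the number of physical cores to hold ~99-100% GPU
# utilization. The counts below are assumptions for illustration.
physical_cores = 8   # e.g. a hypothetical 8C/16T host
gpu_tasks = 2        # one polling feeder thread per GPU task

cpu_task_budget = max(0, physical_cores - gpu_tasks)
print(f"run at most {cpu_task_budget} CPU tasks alongside "
      f"{gpu_tasks} GPU tasks on {physical_cores} physical cores")
```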
Back to the subject of PCIe bus usage: If you use Windows, bus bandwidth usage in all DC applications will be higher than in Linux. That's because the Windows driver stack architecture requires frequent swapping between VRAM and system RAM. This enables features such as driver crash recovery and forensics, and live driver updates — at the cost of GPGPU performance.
 

VirtualLarry

Lifer
Aug 25, 2001
48,384
5,094
126
StefanR5R said:
Back to the subject of PCIe bus usage: If you use Windows, bus bandwidth usage in all DC applications will be higher than in Linux. That's because the Windows driver stack architecture requires frequent swapping between VRAM and system RAM. This enables features such as driver crash recovery and forensics, and live driver updates — at the cost of GPGPU performance.
Interesting to note, thank you. I do run Windows 10 1909 (currently) on the box. I have allocated 65536 MB of Virtual Memory.
 

StefanR5R

Diamond Member
Dec 10, 2016
3,492
3,757
106
PS, by "swapping between VRAM and system RAM" I meant memory copies between GPU memory and system memory. (IOW I did not mean paging out to disk, as my wording might have wrongly implied.) The cost of this copying is primarily increased bus utilization, and typically lower GPU shaders utilization too.
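Those extra GPU-to-system-RAM copies are where a narrow link hurts. A rough cost estimate per copy, using the approximate ~985 MB/s-per-lane PCIe 3.0 figure (the 64 MB buffer size is just an illustrative assumption):

```python
# Rough cost of one VRAM <-> system RAM copy over PCIe 3.0,
# assuming ~985 MB/s per lane (approximate; ignores protocol
# overhead and latency). Buffer size is an illustrative assumption.
LANE_MB_S = 985

def copy_ms(buffer_mb, lanes):
    """Milliseconds to move buffer_mb over a link of `lanes` lanes."""
    return buffer_mb / (LANE_MB_S * lanes) * 1000

for lanes in (1, 16):
    print(f"64 MB buffer over x{lanes:<2}: {copy_ms(64, lanes):6.2f} ms")
```

A copy that is negligible at x16 takes tens of milliseconds at x1, during which the shaders can sit partially idle — hence the lower GPU utilization.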
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
15,010
1,987
55
PrimeGrid PPS Sieve will do fine. :)
 

StefanR5R

Diamond Member
Dec 10, 2016
3,492
3,757
106
Quoting the Folding@home GPUs PPD Price Watt spreadsheet:
* On Windows pcie 3.0 x8 is needed for full performance of fast GPUs, x4 is minimum. On Linux for full performance x4 is optimum. x1 risers are 20% slower on Linux and more than 50% on Windows
(From what I have seen myself and in others' reports, Folding@home's fahcore_21/22 is known as the DC GPGPU application with the biggest dependence on host bus bandwidth.)
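Putting the spreadsheet's slowdown figures into numbers, with an arbitrary illustrative baseline:

```python
# Effective F@H throughput on x1 risers, using the slowdown figures
# quoted from the spreadsheet: 20% slower on Linux, more than 50%
# slower on Windows. baseline_ppd is an arbitrary illustrative number.
baseline_ppd = 1_000_000

linux_x1 = baseline_ppd * (1 - 0.20)
windows_x1 = baseline_ppd * (1 - 0.50)  # "more than 50%": best case

print(f"Linux   x1: ~{linux_x1:,.0f} PPD")
print(f"Windows x1: <{windows_x1:,.0f} PPD")
```

In other words, an x1 Windows folding rig gives up at least half its points, which matches the reports below.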
 

TennesseeTony

Elite Member
Aug 2, 2003
3,824
2,666
136
I can confirm that under Windows, Folding at x1 takes a massive hit on performance. My data was collected 3-4 years ago, so with modern cards the hit must be even worse. Linux would likely fare much better.
 

Pokey

Platinum Member
Oct 20, 1999
2,642
212
106
StefanR5R said:
Quoting the Folding@home GPUs PPD Price Watt spreadsheet:
(From what I have seen myself and in others' reports, Folding@home's fahcore_21/22 is known as the DC GPGPU application with the biggest dependence on host bus bandwidth.)
FWIW
My method isn’t scientific; I simply monitor each slot’s average over time.

I would pretty much echo Stefan and Tony. FAH uses more PCIe bandwidth than other DC projects, or mining.

I can’t tell any difference in Linux output between x4, x8, and x16. So I shoot for x8, but no less than x4 (I only have one card right now on an x4 slot, and it keeps up pretty well). And keep the temps down to prevent throttling.

Pay attention to which slots give you which speeds; it can be tricky sometimes, because the ideal slots are often too close together and temps become an issue.

My rigs are all naked (open frame), but I have had cards bumping up against the slowdown temp until I separated them. I am fortunate in that I have the room to spread out and avoid having to water-cool.
 
