Identify physical GPUs in a multi-GPU system

Fardringle

Diamond Member
Oct 23, 2000
So I recently acquired a retired old software compiling/rendering system and wanted to give it a workout in BOINC.

The GPUs aren't super powerful (Quadro K2200) but there are four of them in the system, so I figured it would at least be interesting to play with, if not overly productive.

Windows 10 and BOINC had no problem identifying and activating all four cards for me. However, one of the cards is running a LOT slower than the other three in BOINC (about 2.2 times the processing time on the same tasks), and I'd like to figure out which physical card it is so I can see whether the card itself is going bad, whether it's in a slow PCIe slot, or something like that.

Dell's specs for the motherboard say that all four slots are PCIe x16, but one is also labeled "PCIe 2.0 wired as x4" on that page, so it might actually be a slower slot. I suspect that slot could be the issue, since the project I'm testing them on at the moment (MLC@Home) likes high bandwidth, but I'm not 100% sure, and I'd really like to know whether the card in that slot is the one Windows identifies as Device 3.

Anyway, on to the question:

GPU-Z shows all four cards with very similar clock and RAM speeds, similar temperatures, bouncing around 70-100% GPU load, 11-18 watts total power draw (one of the reasons these cards intrigue me), etc., but the one slower card - Device 3 - stays around 40-60% GPU load and 8-11 watts. So there is definitely something different about that one card or slot.

I want to physically identify which card is Device 3 in the system and swap the cards around to try to determine if it's an issue with the card or the PCIe slot.

My Google search skills are failing me on this one, and I haven't been able to find anything useful to help me. Anyone here have any ideas?

If you don't know of a way to physically identify cards based on their Device ID in Windows, a suggestion for a project that doesn't use much PCIe bandwidth would also be helpful, so I can see whether it's actually a problem with a card or just a slower PCIe slot.
 

lane42

Diamond Member
Sep 3, 2000
I used to run 4-5 GPUs in my SETI rigs, but that was a while ago and my memory is a little foggy. They might be numbered GPU 0-3, so Device 3 might be the fourth GPU, the one closest to the end of the board.
 

Fardringle

Diamond Member
Oct 23, 2000
I was hoping it would be that simple, but from what little I have found online, people say that it's rarely (if ever) that easy, and GPU 0, 1, 2, 3, etc. could literally be in any location on the motherboard and it just depends on how Windows (arbitrarily?) assigns them...

It is possible, though, since the bottom PCIe slot is the one labeled as "PCIe 2.0 wired as x4" on the specs page in my link, and Device 3 (card 4) is the one that is getting slower times than the others.
 

emoga

Member
May 13, 2018
@Fardringle You could try 'Exclude GPU' within the cc_config.xml file.

You could fiddle around with the device numbers until you find the trouble GPU. Just change the URL to the appropriate project.

Code:
<!-- cc_config.xml lives in the BOINC data directory; exclude_gpu entries go inside <options> -->
<cc_config>
    <options>
        <exclude_gpu>
            <url>www.primegrid.com</url>
            <device_num>0</device_num>
        </exclude_gpu>
        <exclude_gpu>
            <url>www.primegrid.com</url>
            <device_num>1</device_num>
        </exclude_gpu>
    </options>
</cc_config>

You might need to restart BOINC for it to take effect.
 

Fardringle

Diamond Member
Oct 23, 2000
Thanks, but I know which Device number is slower than the others as I can see that Device 3 is the one that has much longer running times in the BOINC client.

Unless I'm missing something, I don't see how that code would help me physically identify the card/slot that is running slower. I suppose it could make the GPU fan spin slower when the card is not working hard, but these Quadro cards have completely enclosed single slot coolers so the fans are not visible from the side and are only just barely visible looking straight at the face of the card...

I suppose the simplest option is to just turn the machine off, pull one of the cards, then boot it back up and see if the slow times disappear, and repeat the process with each card one at a time. Sadly, MLC doesn't have checkpoints so that would mean losing all progress on the current tasks. I'll have to decide if satisfying my curiosity is worth losing the work in progress. ;)
 

Fardringle

Diamond Member
Oct 23, 2000
The one that is excluded will not be hot.
That's a fair suggestion, but they are all reporting very nearly the same temperatures in GPU-Z, around 60-65C. I suspect that's because the PCIe slots are in pairs and the pairs are very close together, with only a couple of millimeters between the two cards in each pair. The one slow card is reporting the lowest temperature, but only by a few degrees so I can't tell any real difference by touching the sides of the cards.
 

Fardringle

Diamond Member
Oct 23, 2000
To make things a little more interesting, and possibly explain what I'm seeing, I took a closer look at the task history for this computer and realized that it looks like two of the GPUs are running "fast" and averaging around 5,700 seconds per task, the "slow" one I've been talking about is averaging around 12,800 seconds, and another card is also running a bit slower than the fast two and averaging around 7,500 seconds.

So based on that, I think what's happening is this: the two fastest cards are in the slots labeled "PCIe 3.0 x16 slot" on that specs sheet, the slowest one is in the slot labeled "PCIe x16 slot (PCIe 2.0 wired as x4)", and the middle-speed card is in the slot labeled "PCIe x16 slot (PCIe 3.0 wired as x8)". Since MLC@Home is very dependent on communication between the GPU and the CPU, the x4- and x8-wired cards just aren't getting enough bandwidth to run the tasks at full speed.

I need to try a project that is not a PCIe bandwidth hog so I can test to see if all four cards run it at about the same speed. Any suggestions? :)
 

Fardringle

Diamond Member
Oct 23, 2000
If the problem GPU is excluded from running, it should feel noticeably cooler, should it not?
Oh, you meant turn that card off completely in the project. Yes, that should theoretically make it feel cooler. I think I got it sorted out based on my last post, though. Thanks.
 

Fardringle

Diamond Member
Oct 23, 2000
Now that you have it figured out, would it not be more interesting to see how many GPU projects are affected by PCIe slot bandwidth?
In case you are bored. :)
I don't know about bored, but it is something interesting to find out. Maybe I'll let them run SRBase as the first test since I'm working on getting to 100M with my other GPUs right now.
 

StefanR5R

Elite Member
Dec 10, 2016
Based on what you say in post #9, MLC@Home is not only very dependent on bus bandwidth, but actually extremely dependent. Even much more so than the previously known top bandwidth hog, Folding@home. (With these K2200 cards, you don't even have over-the-top amounts of execution resources per slot, and the power budget and clocks are rather low too.)

Given this, I wonder if MLC@Home would perform considerably better on Linux compared to Windows. On Windows, there are extra memory transfers involved, to support features of the driver stack such as crash dumps and live driver updates.

BTW, you already monitored shader utilization and power draw. The K2200 is based on the Maxwell architecture, so you might be able to get PCIe utilization readouts too. Try nvidia-smi dmon -s puct (t is for PCIe throughput), or with all possible metrics enabled: nvidia-smi dmon -s pucvmet. You'll need to find out where nvidia-smi.exe is located in Windows; it's probably not in your %PATH%.
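
For example, something like this from a command prompt; the path below is only the usual default of older driver packages (newer drivers drop nvidia-smi.exe into C:\Windows\System32), so adjust as needed:

Code:
rem adjust to wherever your driver installed nvidia-smi.exe
cd "C:\Program Files\NVIDIA Corporation\NVSMI"

rem per-GPU power, utilization, clocks, and PCIe rx/tx throughput, one sample per second
nvidia-smi dmon -s puct

rem the same, with every available metric group enabled
nvidia-smi dmon -s pucvmet

rem list device index, name, and PCI bus ID, to help tie a device number to a physical slot
nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv

A card that is starved for bus bandwidth should stand out in the PCIe throughput columns compared to its siblings.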
 

Fardringle

Diamond Member
Oct 23, 2000
That's an interesting idea to try to find out exactly where/why MLC is such a bandwidth hog (the developers acknowledge that it is, but I haven't seen any details about it). But I just switched the machine over to SRBase to test to see if it has any measurable bandwidth limitations, so I'll try your suggestion on MLC when I am done testing and switch back to that project.
 

Fardringle

Diamond Member
Oct 23, 2000
Given this, I wonder if MLC@Home would perform considerably better on Linux compared to Windows. On Windows, there are extra memory transfers involved, to support features of the driver stack such as crash dumps and live driver updates.

I actually kind of want to switch that machine to Linux to let it run the new Ramanujan Machine project. That might let me give a useful answer to this question.

I have installed Linux a few times, but mostly just in VMs, and never with a decent GPU, and certainly never with multiple GPUs. Is there anything special I need to do to get multiple GPUs working, or should the OS and drivers automatically detect all four when the correct drivers are installed? I'm leaning toward installing Linux Mint 20 since that's the distro I'm most familiar with, but if it would be easier to do in a different distro, I'm open to suggestions.
 

crashtech

Lifer
Jan 4, 2013
My Linux boxes are all Mint 20. My impression is that Stefan has little against Mint, but little for it either. I believe his preferences run toward less user-friendly distros like Gentoo, if I'm not mistaken. Great stuff if you like building from scratch. Most of us on the TeAm use Mint, I think.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Well, I have Mint 19 on 16 boxes... My favorite, as a Linux idiot.
 

Icecold

Golden Member
Nov 15, 2004
I actually kind of want to switch that machine to Linux to let it run the new Ramanujan Machine project. That might let me give a useful answer to this question.

I have installed Linux a few times, but mostly just in VMs, and never with a decent GPU, and certainly never with multiple GPUs. Is there anything special I need to do to get multiple GPUs working, or should the OS and drivers automatically detect all four when the correct drivers are installed? I'm leaning toward installing Linux Mint 20 since that's the distro I'm most familiar with, but if it would be easier to do in a different distro, I'm open to suggestions.
If you're using a recent version of Ubuntu or Linux Mint, there is a Driver Manager (GUI, not command line) where you just select the proprietary Nvidia driver to be installed. It's really easy. If you're running a different distro you may need to install the Nvidia drivers manually, which is hit or miss. Mint seems to work great.
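
If you prefer the terminal, the rough equivalent on Ubuntu-based distros (Mint included) should be something like this, assuming the ubuntu-drivers-common package is present; as far as I know, the Driver Manager GUI is a front end for the same mechanism:

Code:
# list detected GPUs and the recommended proprietary driver
ubuntu-drivers devices

# install the recommended Nvidia driver for the detected cards
sudo ubuntu-drivers autoinstall

# after a reboot, check that all four cards are visible to the driver
nvidia-smi -L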
 

Fardringle

Diamond Member
Oct 23, 2000
Thanks for the info. I'll install Mint as soon as the GPUs finish running their current SRBase tasks. I'll be sure to beg for help if I have any problems getting all four GPUs working in Linux. ;)
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Just remember, it's easy to install and run, but you don't have as many tools to tell you what's going on.
 

Fardringle

Diamond Member
Oct 23, 2000
While anxiously waiting for the SRBase tasks to finish as their run times continued to climb higher and higher, silly me finally remembered that SRBase doesn't work well with multiple GPUs, and all four tasks were running on one poor, abused K2200.

So I killed those tasks, and killed Windows as well. The computer is now running Linux Mint 20.2, and after installing the official 470.23 Nvidia drivers using the Driver Manager, BOINC recognized all four cards and I have been running MLC on them for a couple of hours. It definitely looks like the cards in the two slower slots are still being bandwidth limited just about the same as they were in Windows, but I'll let it run overnight and then post an update with the average run times.

After that I'll start testing some other GPU projects. Any that I should try first? Or, going the other direction, are there any other projects that don't work right with multiple GPUs like SRBase?
 

StefanR5R

Elite Member
Dec 10, 2016
My impression is that Stefan has little against Mint, but little for it either. I believe his preferences run toward less user-friendly distros like Gentoo, if I'm not mistaken. Great stuff if you like building from scratch.
I have Mint on the DC computers which have GPUs. That's because of the convenience Mint offers, and because of the help with GPU issues one can get from several folks here in the TeAm; I'll point out @biodoc especially.

Of the headless 2P computers, I initially had one with Gentoo and one with openSUSE Tumbleweed. That setup predates my picking up the DC hobby. The two machines had the same hardware but different OSes because I wanted openSUSE as a cross-check on how I had set up Gentoo, and vice versa Gentoo as a platform for self-compiled stuff to use on SUSE if needed.

Gentoo became a nuisance to maintain on the compute slave, and Tumbleweed soon had a bad update which prevented subsequent updates. Now my 2P computers have openSUSE Leap.

Just my desktop computer is still running Gentoo. But I am thinking of trying openSUSE for the desktop when I eventually replace the hardware. (Before Gentoo, I used Mandrake Linux, and before that, SuSE Linux, HP-UX, Amiga OS…)


While anxiously waiting for the SRBase tasks to finish as their run times continued to climb higher and higher, silly me finally remembered that SRBase doesn't work well with multiple GPUs, and all four tasks were running on one poor, abused K2200.
There is a workaround to use SRBase's GPU application on all GPUs. It involves a separate boinc-client instance for each GPU, and edited project files.
 

Fardringle

Diamond Member
Oct 23, 2000
There is a workaround to use SRBase's GPU application on all GPUs. It involves a separate boinc-client instance for each GPU, and edited project files.
Yes, but I didn't have it set up that way. I was just intending to let them run a couple of "quick" tasks to see if there is any difference in processing times on the different PCIe slots. Waited 10 hours before I realized that it wasn't going to work because they were all running on the same GPU in the same BOINC instance... :rolleyes:
 

Fardringle

Diamond Member
Oct 23, 2000
I'm seeing some interesting results so far, running MLC in Linux.

First, it appears that the Linux client is treating the MLC tasks as if they are CPU tasks for scheduling purposes. If I set the BOINC client to use 100% of the CPUs, it will run 4 MLC tasks and 8 RamanujanMachine tasks for a while, but then it occasionally suspends the MLC tasks and runs 12 Ramanujan tasks instead.

If I set the BOINC client to use 67% of the CPUs (reserving 4 for GPU tasks, like I'd normally do on other computers/projects), it drops down to running only 8 tasks in total, instead of running 8 CPU tasks and letting the 4 GPU tasks run on the 'unused' CPU threads.

I don't know if this is the way it is going to behave for all GPU projects or not since I've only tried MLC so far, or if it is just more odd behavior from MLC.

That said, of the tasks that have completed, it looks like the cards in the full-speed slots are actually slower than in Windows, averaging around 1.9 hours instead of 1.6 hours; the 'middle' speed card is around 2.3 hours vs. 2.1 hours in Windows; and the slowest card is at 2.9 hours vs. 3.5 hours in Windows.

So it appears that the slowest card is not being bandwidth limited as much in Linux as it was in Windows, but the other three are slower, which is really strange but might be related to the fact that the BOINC client seems to be treating them as CPU tasks instead of GPU tasks...
 

StefanR5R

Elite Member
Dec 10, 2016
Sometimes a science application is implemented in Linux, then more or less successfully ported to Windows. Other times it's the other way around.

I have read elsewhere that MLC's Linux application binary is dynamically linked, and the library files are not distributed by the project but taken from the host system. This and your performance findings could be hints that it was a Windows application ported to Linux, without additional efforts on robustness and/or efficiency.

it appears that the Linux client is treating the MLC as if they are CPU tasks for scheduling purposes.
All GPU tasks are actually CPU+GPU tasks, on Windows, on Linux, and any other platform.

The server sets an average CPU usage percentage and average GPU usage percentage for each application version. This can be overridden locally via app_config.xml (or app_info.xml if the user is going that far). The boinc client takes these usage percentages as hints for how many tasks to launch at once.

If you run a CPU-only application along with a CPU+GPU application on the same host, and are not satisfied with how the client schedules tasks of one sort and the other, there are two ways how to force the client to do what you want:
  • Use two client instances, one for each application type. Configure each client to use as much of the host resources as you intend to dedicate to each application type.
  • Or use just one client instance. Create an app_config.xml which tells the client that the CPU+GPU application uses just 0.01 CPUs on average. (The client will practically take that as "Completely ignore that the CPU+GPU application is using some amount of CPU".) Configure the client to use only as many CPUs as you want to dedicate to the CPU-only application.
Credits: I learned the latter method from @TennesseeTony.

The former method with separate client instances has a bit more initial configuration overhead. But it gets you even more control per application, notably separate settings of the work buffer depths.

Edit:
The contents of the app_config.xml will of course be project specific. Also, the average CPU usage given in app_config.xml is really only information for the BOINC client; the science application itself won't care and will keep taking as much CPU time as it does without app_config.xml.
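
For illustration only, a minimal sketch of such an app_config.xml; the application name below is a placeholder and has to be replaced with the name the project actually uses (check the project's applications page or the client's event log):

Code:
<app_config>
    <app>
        <name>mlds-gpu</name>            <!-- placeholder; use the project's real app name -->
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>   <!-- one task per GPU -->
            <cpu_usage>0.01</cpu_usage>  <!-- the client then all but ignores the app's CPU share -->
        </gpu_versions>
    </app>
</app_config>

The file goes into the project's directory under the BOINC data directory and is picked up after "Options → Read config files" in the manager, or after a client restart.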
 