Identify physical GPUs in multiple GPU system


StefanR5R

Elite Member
I'd like to figure out which physical card it is to see if the card itself is going bad, or maybe it's a slow PCIe slot, or something like that.
Users would love it if device enumeration, including PCIe device enumeration, were more predictable. But that would require a concerted standardization effort between hardware vendors (SoC vendors and mainboard vendors), firmware vendors, and OS vendors. As far as I know (but I could be wrong), the state of affairs is that PCIe device enumeration is mostly stable across OS reboots, but there are no hints to the user as to which physical slot belongs to which bus ID.

Edit: it's not a trivial problem. PCIe is generally hot-pluggable, the user can add switches, and so on.
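
For what it's worth, a rough sketch of commands that at least tie a device index to a PCIe bus ID and, on Linux, show the negotiated link width for that bus ID (the 01:00.0 below is just a placeholder):

Code:
# list device index, name, and PCIe bus ID for each GPU
nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv
# Linux only: show the negotiated link speed/width ("LnkSta") for one card
# (substitute a bus ID reported by the query above for the placeholder 01:00.0)
sudo lspci -vv -s 01:00.0 | grep -i lnksta

Matching a bus ID to a physical slot still takes the mainboard manual or a bit of trial and error, e.g. pulling one card at a time.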
 

Fardringle

Diamond Member
Sometimes a science application is implemented in Linux, then more or less successfully ported to Windows. Other times it's the other way around.

I have read elsewhere that MLC's Linux application binary is dynamically linked, and the library files are not distributed by the project but taken from the host system. This and your performance findings could be hints that it was a Windows application ported to Linux, without additional efforts on robustness and/or efficiency.
Probably true. It sounds logical.

All GPU tasks are actually CPU+GPU tasks, on Windows, on Linux, and any other platform.
That's true, but in all other cases I've seen, including exactly the same projects on the same hardware in Windows, the GPU tasks always run (unless disabled) even when other CPU projects are running. That isn't happening in this case. BOINC seems to be treating them as purely CPU tasks and scheduling them as such to alternate with the actual CPU project, and NOT running them if there aren't any CPU threads available.

Use two client instances, one for each application type. Configure each client to use as much of the host resources as you intend to dedicate to each application type.
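
A rough sketch of the two-instance approach on Linux (the data directory and RPC port below are just example values):

Code:
# second client instance with its own data directory and its own RPC port
mkdir -p ~/boinc-gpu
boinc --dir ~/boinc-gpu --allow_multiple_clients --gui_rpc_port 31418 &
# address that instance explicitly from the command line
boinccmd --host localhost:31418 --get_state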

Or use just one client instance. Create an app_config.xml which tells the client that the CPU+GPU application uses just 0.01 CPUs on average. (The client will practically take that as "Completely ignore that the CPU+GPU application is using some amount of CPU".) Configure the client to use only as many CPUs as you want to dedicate to the CPU-only application.
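
For illustration, a minimal app_config.xml along those lines could look like this (the app name is just a placeholder; the real name can be looked up in the project's client_state.xml or in the event log):

Code:
<app_config>
    <app>
        <!-- placeholder; must match the project's actual application name -->
        <name>mlds-gpu</name>
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>
            <!-- claim almost no CPU, so the client schedules the GPU tasks
                 independently of the CPU-only work -->
            <cpu_usage>0.01</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

The file goes into the project's subdirectory of the BOINC data directory; "Options → Read config files" in the manager, or a client restart, makes the client pick it up.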

I have used the second "trick" before to actually change the amount of CPU used by some projects. But MLC completely ignores it for that, so I don't know if it would be effective even for scheduling tasks in the BOINC client. It's worth a try just to see what happens. If it doesn't work, I'll go with the two client option.

Another option is to tell the OTHER project that it is only allowed to run a specific number of tasks simultaneously, forcing it to leave the rest of the CPU threads available for other work, but since I tend to switch between projects a lot, that would mean having to configure the setting for every project and I don't really want to do that.

edit: I set the MLC project to use .01 CPUs per task in app_config.xml and told BOINC to limit itself to 67% of the CPU (8 out of 12 threads). It immediately switched from running 12 Ramanujan tasks to 8 Ramanujan tasks and 4 MLC tasks the way it should be. Still using pretty much 100% of the CPU since each MLC task uses a full CPU thread, but that's to be expected. I'll keep an eye on it for a while to make sure it keeps behaving. :)

After maybe a day or two to get some good averages for my report in my other thread, I'll swap to other projects to see if any of them show signs of bandwidth limiting on the slower slots. I'll have to set up four BOINC clients (or test the cards one at a time) for SRBase. Since my goal is to get to 100M total and then move my other GPUs to something else, and I'm not very far from that goal now, I think I might make that my next test.
 

StefanR5R

Elite Member
I have used the second "trick" before to actually change the amount of CPU used by some projects. But MLC completely ignores it for that, so I don't know if it would be effective even for scheduling tasks in the BOINC client.
All of the existing GPU applications ignore this parameter. But the client takes it into account, independently of the project. And that's what makes the trick work.

As for CPU-only applications: Some, but not all, multithreaded applications take this parameter in order to configure their thread count. E.g. vboxwrapper-based applications do this. But these are rather the exception. (Edit:) The more general case is that the applications ignore it. The client, however, always respects it. (The client doesn't monitor what percentage of the CPUs a running application is actually using; instead it takes for granted what the server or the app_config.xml claims about that.)

Another option is to tell the OTHER project that it is only allowed to run a specific number of tasks simultaneously, forcing it to leave the rest of the CPU threads available for other work,
I have seen this approach fail occasionally: While the client honors the limit of concurrent tasks per project or concurrent tasks per application, it does not always fill up the remaining CPUs with work from other projects or applications. Instead, some CPUs may be left idle.
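
(For reference, that per-application or per-project limit is also set via app_config.xml; a minimal sketch with a placeholder app name:)

Code:
<app_config>
    <app>
        <!-- placeholder; limit this application to 4 tasks at a time -->
        <name>example_app</name>
        <max_concurrent>4</max_concurrent>
    </app>
    <!-- or cap the project as a whole instead: -->
    <project_max_concurrent>4</project_max_concurrent>
</app_config>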

None of this is directly Windows- or Linux-specific. Rather, this sort of behavior depends on a) the client version, b) which particular projects/ applications are active, c) which priorities the client is giving the tasks in the work buffer.

The latter point is a confused mix of project "resource share", reporting deadlines vs. estimated task durations¹, how much work was recently done for each of the active projects², and whatnot.

If you see different work scheduling behavior on Windows and Linux, it's not directly because of the OS.

________
¹) the client prioritizes work which seems to be in danger of missing its deadline
²) the client prioritizes work from projects which were not run a lot on this host in recent times
 

Fardringle

Diamond Member
BTW, you already monitored shader utilization and power draw. The K2200 is supposedly based on the Maxwell architecture, therefore you might be able to get PCIe utilization readouts too. Try nvidia-smi dmon -s puct (t is for PCIe throughput), or with all possible metrics enabled: nvidia-smi dmon -s pucvmet. You'll need to find out where nvidia-smi.exe is located in Windows; it's probably not in your %PATH%.

I'm not quite sure of the significance of some of the column headers, but here's an excerpt from the results of "nvidia-smi dmon -s puct" while running four MLC@Home tasks:

Code:
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk rxpci txpci
# Idx     W     C     C     %     %     %     %   MHz   MHz  MB/s  MB/s
    0    13    58     -    53     8     0     0  2505  1124   209    45
    1    11    55     -    65    13     0     0  2505  1124  1627   445
    2    15    52     -    49     7     0     0  2505  1124   342    73
    3    10    46     -    61     6     0     0  2505  1124   834   123
    0    11    58     -    47     9     0     0  2505  1124   206    68
    1    10    55     -    65    12     0     0  2505  1124  2867   427
    2    11    52     -    48    10     0     0  2505  1124   463    37
    3     8    46     -    60    10     0     0  2505  1124   967   128
    0    15    58     -    49     7     0     0  2505  1124  1842   117
    1    14    55     -    66    13     0     0  2505  1124  2736   513
    2    12    52     -    49     7     0     0  2505  1124   151    57
    3    13    46     -    61     6     0     0  2505  1124   948   122
    0     7    58     -    52    11     0     0  2505  1124   158    32
    1    13    55     -    64    12     0     0  2505  1124  2817   674
    2     9    52     -    48    11     0     0  2505  1124   161    26
    3     7    46     -    61     6     0     0  2505  1124   948   125
    0     8    58     -    43     3     0     0  2505  1124   118    39
    1    11    56     -    64    13     0     0  2505  1124  2661   312
    2    11    53     -    48     8     0     0  2505  1124   212    42
    3     7    46     -    61     6     0     0  2505  1124   955   120
    0    15    58     -    50     7     0     0  2505  1124   130    29
    1    12    56     -    70     9     0     0  2505  1124  2895   690
    2    11    53     -    56    11     0     0  2505  1124   268   374
    3     7    47     -    57     4     0     0  2505  1124   965   125
    0    14    58     -    51    11     0     0  2505  1124   205    61
    1    10    56     -    65    13     0     0  2505  1124  2658   561
    2    11    53     -    43     3     0     0  2505  1124    56    21
    3     8    47     -    61     6     0     0  2505  1124   976   128
    0    10    58     -    44     7     0     0  2505  1124   126    27
    1    13    56     -    67    13     0     0  2505  1124  2812   266
    2    10    53     -    49     7     0     0  2505  1124   516   172
    3    11    47     -    61     6     0     0  2505  1124  1201   132
    0     8    58     -    53     7     0     0  2505  1124   149    44
    1    14    56     -    65    12     0     0  2505  1124  3053   582
    2    10    53     -    49    11     0     0  2505  1124    78    39
    3     7    47     -    61     6     0     0  2505  1124   948   125
    0     9    58     -    50    11     0     0  2505  1124   322    42
 

StefanR5R

Elite Member
gpu Idx ......... GPU index
pwr W ........... board power usage in Watts
gtemp C ......... GPU temperature in °C
mtemp C ......... memory temperature in °C
sm % ............ shader utilization in %
mem % ........... memory use in %
enc/dec % ....... encoder/ decoder units utilization in %
mclk MHz ........ memory clock
pclk MHz ........ GPU clock
rxpci MB/s ...... host bus use, reception direction
txpci MB/s ...... host bus use, transmission direction

I'm not quite sure if "reception/ transmission" is from the perspective of the host, or from the perspective of the GPU.

Some GPUs or cards don't support all of the metrics which nvidia-smi can potentially monitor.

BTW, the period at which nvidia-smi dmon is sampling can be configured with the parameter -d, e.g. nvidia-smi dmon -d 10 [...] for a 10 seconds interval. The minimum and default is 1 s, and AFAIU that's also what should be used if one is looking for peaks.

You can furthermore use the parameter -i to select one or more GPUs to monitor. E.g. nvidia-smi dmon -i 0 [...] for the first, or nvidia-smi dmon -i 1,2 [...] for the second and third.
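
Putting the switches together, a longer unattended run could look like the line below (I believe the -c switch for a fixed sample count and -o T for a time column are available in reasonably recent drivers, but nvidia-smi dmon -h will confirm):

Code:
nvidia-smi dmon -i 1,2 -d 5 -c 120 -s puct -o T

That samples the second and third GPU every 5 seconds, 120 times, and prefixes each row with a timestamp.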
 