"GPU Multitasking" in the modern era, vendor relative ability

VirtualLarry

No Lifer
Aug 25, 2001
Just a real-world test here.

I've got a couple of R5 1600 boxes here, one with a GTX 1070 Ti 8GB card, and one with TWO AMD cards (RX 470 + RX 570). Windows 10, with, I believe, the newest video drivers on both.

I'm normally on the box with the RX 570, and I sometimes Skype and watch YouTube at the same time while mining in the background. On that box, I simply watch my mining H/s output decrease when I put the additional load on the video card.

On the box with the GTX 1070 Ti, which is arguably the more powerful card, I see stutters and FF/RW-style jumps in the video "chunks" while watching YouTube. Very disconcerting, and hard to watch. Skype was OK, just at a slightly lower frame rate than when not mining.

I put this all down to multitasking, and the very real advantage that AMD has with their GPU architecture: hardware-based scheduling, with multiple ACEs and whatnot. It just runs a lot smoother overall when you've got multiple programs demanding work from the GPU while, at the same time, the host CPU is loaded way down (mining XMR); in that situation a software-based scheduling solution is going to add a lot of latency, which is what I was seeing with the YouTube videos.

Has anyone else noticed this, with real gritty A/B-type real-world testing?
 

AdamK47

Lifer
Oct 9, 1999
One solution I know of to get around the stuttering issue is to go into task manager and kill all mining processes. This will free up resources immensely and allow for smooth PC gaming with your graphics card.
 

Muhammed

Senior member
Jul 8, 2009
having hardware-based scheduling, with multiple ACEs and whatnot
Sigh, how many times has this fallacy been repeated? NV GPUs don't lack a hardware scheduler at all. THEY ARE NOT USING A SOFTWARE SCHEDULER!
Ever thought that the reason you're seeing this disparity is that you're using TWO AMD GPUs instead of one?
 

VirtualLarry

No Lifer
Aug 25, 2001
I just did this again, with the SAME video on YouTube, and calling the SAME person on Skype, on the AMD rig, and ... no stutters whatsoever with the YouTube vid.

You really think that having a secondary AMD GPU in the rig actually accelerates Skype or YouTube MORE than just having a single AMD GPU? I've not heard of any support for multi-GPU acceleration for those things mentioned in driver release notes, have you?

Could a scheduling issue be behind why the GT1030 has such horrible frametimes in this game, as compared to AMD's 2200G and 2400G APUs (released today, so drivers aren't even well optimized yet)?

https://techreport.com/review/33235/amd-ryzen-3-2200g-and-ryzen-5-2400g-processors-reviewed/7

I cannot speak to "boots on the ground" experience with that game on the GT1030; I do own a GT1030 (in my FX-8320E rig as of last week), but I don't own that game yet.

I don't have a 2200G APU yet, either; I'll probably have one by next week, and could maybe do some testing?
 

tamz_msc

Diamond Member
Jan 5, 2017
I've noticed that on my GT 730, doing some memory-intensive Einstein@home jobs on the GPU causes visible stutter - the BOINC screensaver and desktop rendering are pretty choppy during that time.
 

Despoiler

Golden Member
Nov 10, 2007
Sigh, how many times has this fallacy been repeated? NV GPUs don't lack a hardware scheduler at all. THEY ARE NOT USING A SOFTWARE SCHEDULER!
Ever thought that the reason you're seeing this disparity is that you're using TWO AMD GPUs instead of one?

Nvidia moved away from full hardware scheduling in Kepler. https://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/3
However based on their own internal research and simulations, in their search for efficiency NVIDIA found that hardware scheduling was consuming a fair bit of power and area for few benefits. In particular, since Kepler’s math pipeline has a fixed latency, hardware scheduling of the instruction inside of a warp was redundant since the compiler already knew the latency of each math instruction it issued. So NVIDIA has replaced Fermi’s complex scheduler with a far simpler scheduler that still uses scoreboarding and other methods for inter-warp scheduling, but moves the scheduling of instructions in a warp into NVIDIA’s compiler. In essence it’s a return to static scheduling.

Pascal differs from the above approach by introducing dynamic load balancing, which is done in the driver. It's in Nvidia's own presentation of the feature. Once the driver determines the ratio of work types coming in, it distributes it to the GPU (assuming the dev didn't choose static allocation). Eventually there is hardware scheduling, but only once all of the work is being broken down for processing at the SMs.
 

VirtualLarry

No Lifer
Aug 25, 2001
I disabled the second card in Device Manager, so I'm now just using the primary RX 570. I exited and restarted NiceHash; it gave me an error about the missing video card, but started mining on the remaining primary card.

Then I about had a heart attack: my hashrate went down to 4 MH/s rather than 22 MH/s, and my entire Windows 10 experience slowed way down. By the time I decided to reboot, I checked HWMonitor, and it appeared that the MemClock had gone down to 300 MHz for some reason, rather than 1850 MHz. I shut down most of my programs and then started the mining app back up; MemClock returned to 1850 MHz, and I'm back to 22 MH/s. I guess there are still some driver quirks with the MemClock getting set way down for some reason and not bouncing back up under load.

I still kind of persistently have that problem with my other rig, with a pair of R9 260X cards: HWMonitor shows them at 300 MHz MemClock, and they don't want to mine very well. On the newest drivers, I believe. I guess AMD hasn't tested the newest drivers with ALL of their old GCN-based cards, or maybe there's some BIOS or power-management quirk with those cards. Most of my cards are XFX.
 

Headfoot

Diamond Member
Feb 28, 2008
My 290s do the same thing, VL. If you figure out the solution, definitely post again please. So far I've just been periodically rebooting the machine to "reset" back to normal, and that seems to work okay.
 

MrTeal

Diamond Member
Dec 7, 2003
I think it's just an nVidia thing. My system bogs right down when mining on my 1080 Tis and is basically unusable. Videos stutter, and the mouse lags horribly. With AMD cards, while you wouldn't have been able to game, it was perfectly usable for video watching and internet browsing while also mining.
 

VirtualLarry

No Lifer
Aug 25, 2001
I think it's just an nVidia thing. My system bogs right down when mining on my 1080 Tis and is basically unusable. Videos stutter, and the mouse lags horribly. With AMD cards, while you wouldn't have been able to game, it was perfectly usable for video watching and internet browsing while also mining.
That's been my experience as well, in a nutshell. AMD cards seem to me, from observation, to be MUCH better at "actual multitasking".
 

Crono

Lifer
Aug 8, 2001
The default intensity level may be higher for Nvidia cards. AMD cards would definitely get bogged down just the same running at higher intensity. Not sure if there's an option to change it in NiceHash (you probably can through the config file), but you usually can when pool mining via Claymore or ethminer.
 

SimianR

Senior member
Mar 10, 2011
I don't mine on NiceHash; I just use the Claymore miner and mine Ethereum with a GTX 1080. I found that I had this mouse lag and video lag as well, and when I changed the intensity to 1, which is -ethi 1 in the config, the mouse lag went away and there was only a minor performance hit (from 25 MH/s to 24 MH/s).
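
For reference, the intensity is just a command-line flag, so you can set it in your launch script too. Something like the line below (the pool and wallet here are placeholders, and the exe name may differ depending on your Claymore version):

EthDcrMiner64.exe -epool eu1.ethermine.org:4444 -ewal 0xYourWalletAddress -ethi 1

IIRC the default is -ethi 8, so dropping it to 1 trades a little hashrate for a much more responsive desktop.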
 

24601

Golden Member
Jun 10, 2007
It's because OpenCL sucks for high-performance computing like mining, as OpenCL programs usually rely on the driver to load the GPU as much as it can.

The CUDA miners are optimized closer to the metal, while the OpenCL ones are more generic and rely on whatever the AMD driver allows them to do.

Because the CUDA miners are so well optimized, you are left with fewer idle resources on the graphics card.

The last AMD graphics card that had an architecture that wasn't hilariously lopsided was Tahiti, and you will see exactly the same thing happen with Tahiti when you run hyper-optimized hand-coded miners.

The entire reason AMD jammed more and more scheduling silicon into every iteration after Tahiti is that AMD sucks at actually keeping its resources busy.

It's the same reason Ryzen gains more from hyperthreading than Skylake does: Ryzen sucks at extracting instruction-level parallelism compared to Skylake.
 

zlatan

Senior member
Mar 15, 2011
Windows and the traditional GPGPU model are the problem. A GPGPU program launches kernels, no matter what API you use, and the driver asks the OS to launch these for the user, which involves a lot of system calls. If an app is heavily optimized for a GPU and launches a lot of kernels, that means even more system calls, and Windows may give you a bad multitasking experience, because the OS kernel has a lot of work to do just to feed the GPU. It doesn't matter if you don't see a big CPU load; the problem is in the traditional GPGPU programming model, which also leads to a lot of unnecessary copy overhead. This actually gets worse with the Meltdown patch.
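
To make that concrete, here is a minimal CUDA sketch (a hypothetical toy kernel, not from any real miner) of the launch pattern I mean. Every iteration is a separate trip through the driver, and under WDDM every submission is mediated by the OS, so the per-launch overhead adds up even though the GPU work itself is trivial:

#include <cstdio>
#include <cuda_runtime.h>

// Deliberately tiny kernel: the point is launch overhead, not GPU work.
__global__ void addOne(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main()
{
    const int n = 1 << 20;                      // 1M floats (arbitrary size)
    float *d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    // 10,000 separate launches: each one goes through the user-mode driver
    // and, on Windows/WDDM, through an OS-managed submission queue.
    for (int pass = 0; pass < 10000; ++pass)
        addOne<<<(n + 255) / 256, 256>>>(d_x, n);

    cudaDeviceSynchronize();                    // drain the whole queue
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d_x);
    return 0;
}

A heavily optimized miner is effectively running the aggressive version of this loop nonstop, which is why the desktop struggles to get its own work scheduled in between.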

This can also be reproduced on AMD, although their driver is just more robust, and they specifically optimize for these scenarios, while Nvidia doesn't. But the underlying issue is still there in the OS, and there may be some GPGPU apps that lead to a bad multitasking experience even with a Radeon GPU.

AMD also solved the whole problem with ROCm on Linux. In that model a kernel launch doesn't involve an OS system call. But this approach requires a very big structural change in the OS, which is why ROCm is not supported on Windows.
 

VirtualLarry

No Lifer
Aug 25, 2001
While I agree it might have something to do with the "intensity" setting, I think the AMD drivers have something to do with it too. On my FM1 APU rig, even at max CPU load, it was still "responsive": far, far better than a Sandy Bridge i3 rig that I happened to be working on, and compared it against, the same day while updating Windows 10.

One thing I noticed is that on the FM1 rig, running Windows 10 64-bit, the CPU load would max out at 99%; it wouldn't allow (user?) threads to occupy 100% of CPU time. As a matter of fact, I could listen to internet radio flawlessly, well, unless the internet connection got bogged down, which it can during Windows 10 updates. But the CPU portion of the FM1 APU never did, and browsing was still possible while doing updates.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
Larry, I do not have this issue with my 1080 Ti. One thing I can recommend is using EWBF's miner to mine Equihash coins.
 

VirtualLarry

No Lifer
Aug 25, 2001
Larry, I do not have this issue with my 1080 Ti. One thing I can recommend is using EWBF's miner to mine Equihash coins.
Interesting. So maybe the issue is NiceHash? Or how "optimized" the CUDA miners are, not leaving that extra 1-2% of GPU power spare for other things?

Now that you mention it, I think some DC projects that ran on my 7950 3GB cards were like that too, but in later years they weren't as bad, because they were modified to only use up to a set maximum percentage of the GPU's resources, leaving some free for occasional other things. I don't know if that was a driver change, or a change in the DC program that "drove" the GPUs.
 

Headfoot

Diamond Member
Feb 28, 2008
When I need to use my PC with the miner on, I just turn down the intensity and it works fine.
 

PhonakV30

Senior member
Oct 26, 2009
I use only Claymore CryptoNote, and it's great, no stutter. If I use xmr-stak-amd.exe for mining Monero, I get horrible stutter in gaming. So change your mining app for NV cards.
 

Muhammed

Senior member
Jul 8, 2009
Nvidia moved away from full hardware scheduling in Kepler. https://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/3
The concept of AMD having better scheduling than NVIDIA is BLATANTLY WRONG; NVIDIA still has more scheduling hardware than AMD even after the reduction. There are two parts to scheduling, instruction scheduling and latency tracking, and Fermi has BOTH. Kepler did without the latency-tracking scoreboarding, but it still retained the instruction part, which is arguably even more complicated than GCN's fixed-latency pipeline and simple counters. NVIDIA still uses a reduced set of scoreboards for that part.

In essence: NVIDIA had huge scheduling hardware in Fermi operating at two levels; AMD never tried that approach, and only introduced one level in GCN. NVIDIA also reduced their scheduling to one level in Kepler, but they still operate an arguably more complicated scheduler than AMD to this day.

Read here for more:
https://forum.beyond3d.com/posts/2015733/
https://forum.beyond3d.com/posts/1989879/
https://forum.beyond3d.com/posts/1975077/
https://forum.beyond3d.com/posts/1930541/
https://forum.beyond3d.com/threads/...rs-and-discussion.59649/page-102#post-1989856
 

Despoiler

Golden Member
Nov 10, 2007
The concept of AMD having better scheduling than NVIDIA is BLATANTLY WRONG; NVIDIA still has more scheduling hardware than AMD even after the reduction. There are two parts to scheduling, instruction scheduling and latency tracking, and Fermi has BOTH. Kepler did without the latency-tracking scoreboarding, but it still retained the instruction part, which is arguably even more complicated than GCN's fixed-latency pipeline and simple counters. NVIDIA still uses a reduced set of scoreboards for that part.

In essence: NVIDIA had huge scheduling hardware in Fermi operating at two levels; AMD never tried that approach, and only introduced one level in GCN. NVIDIA also reduced their scheduling to one level in Kepler, but they still operate an arguably more complicated scheduler than AMD to this day.

Read here for more:
https://forum.beyond3d.com/posts/2015733/
https://forum.beyond3d.com/posts/1989879/
https://forum.beyond3d.com/posts/1975077/
https://forum.beyond3d.com/posts/1930541/
https://forum.beyond3d.com/threads/...rs-and-discussion.59649/page-102#post-1989856

That wasn't the statement, though. We are discussing whether Nvidia does software scheduling, and the answer is they do. We know they use software to do some part of their scheduling. We know that AMD does it all in hardware. So really, Nvidia should be referred to as a hybrid solution and AMD as a hardware one.
 

hemla

Junior Member
Jul 29, 2017
If you have an integrated GPU, you might try setting your OS to use that card for display while leaving your mining cards alone?
 

JimKiler

Diamond Member
Oct 10, 2002
That wasn't the statement, though. We are discussing whether Nvidia does software scheduling, and the answer is they do. We know they use software to do some part of their scheduling. We know that AMD does it all in hardware. So really, Nvidia should be referred to as a hybrid solution and AMD as a hardware one.

Nope. Because AMD has a much simpler scheduler, they use a hybrid solution as well. In fact, they always did, from the very beginning.

Are we talking about schedulers, or about who is better, AMD versus Nvidia?
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
Same thing happens on my Pascal cards when running BOINC as well.

The solution is to suspend BOINC on my main rig when doing something that requires the absence of stutter.
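
If you'd rather script that than click through the BOINC manager, the boinccmd tool that ships with the client should do it; I believe the syntax is something like the below (check boinccmd --help on your install):

boinccmd --set_gpu_mode never
boinccmd --set_gpu_mode auto

"never" suspends GPU tasks while leaving CPU work running, and "auto" hands control back to your normal preferences.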