Microsoft Refines DirectX 12 Multi-GPU with Simple Abstraction Layer


Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
As long as they use AFR, latency is double. It's not an API thing, but a simple fact that every frame you see took 2 frames of time to render (every other frame is done by a different GPU, and each took 2 frames' worth of time to render).

If they use SFR, scaling will be much worse.

Neither results in a great mid-range CF/SLI experience. Multi-GPU is really only good when you want to go beyond what a single GPU can do at the high end.

Please read up on how it works in DX12. The Ashes of the Singularity devs have written about how they implemented it and removed latency / input delay.
 

RampantAndroid

Diamond Member
Jun 27, 2004
6,591
3
81
I think they just mean same-card scaling, similar to current SLI (CrossFire has been able to use different GPUs of the same arch for a while now).

Cross vendor / advanced functionality will take more work.

Why? If they're not relying on proprietary interconnects between the cards like an SLI bridge... Windows already supports multiple video cards running at the same time (I think that limitation was removed in Win8?)
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
Why? If they're not relying on proprietary interconnects between the cards like an SLI bridge... Windows already supports multiple video cards running at the same time (I think that limitation was removed in Win8?)

Explicit mGPU comes in two flavors: homogeneous, and heterogeneous.

Homogeneous mGPU refers to a hardware configuration in which you have multiple GPUs that are identical (and linked). Currently, this is what most people think of when ‘MultiGPU’ is mentioned. Right now, this is effectively direct DX12 control over Crossfire/SLI systems. This type of mGPU is also the main focus of this post.

Heterogeneous mGPU differs in that the GPUs in the system are different in some way, whether it be vendor, capabilities, etc. This is a more novel but exciting concept that game developers are still learning about. It opens the door to many more opportunities to use all of the silicon in your system. For more information on heterogeneous mGPU, you can read our blog posts here and here.

In both cases, MultiGPU in DX12 exposes the ability for a game developer to use 100% of the GPU silicon in the system as opposed to a more closed-box and bug-prone implicit implementation.

https://blogs.msdn.microsoft.com/di...rectx-12-multigpu-and-a-peek-into-the-future/
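To make the "explicit control" point concrete, here is a minimal C++ sketch of how a D3D12 engine addresses a linked (homogeneous) multi-GPU adapter through node masks. The node-mask mechanism is standard D3D12; the structure, adapter index, and values here are illustrative assumptions, not taken from the blog post.

```cpp
// Minimal sketch (assumed flow, not from the linked blog post): how a D3D12
// engine sees a linked "homogeneous" multi-GPU adapter and pins work to a
// specific GPU via node masks. Error handling omitted.
#include <cstdio>
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter;
    factory->EnumAdapters1(0, &adapter);          // assume adapter 0 is the linked CF/SLI group

    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    // Each physical GPU behind a linked adapter is exposed as a "node".
    std::printf("GPU nodes on this adapter: %u\n", device->GetNodeCount());

    // Queues, command lists, and resources carry a node mask, so the engine
    // explicitly chooses which GPU executes which work (AFR, SFR, or something
    // custom), instead of the driver guessing behind its back.
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;
    desc.NodeMask = 0x1;                          // bit 0 = GPU 0, bit 1 = GPU 1, ...
    ComPtr<ID3D12CommandQueue> queueGpu0;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queueGpu0));
    return 0;
}
```

Heterogeneous mGPU drops the single-linked-adapter assumption: you would enumerate more than one adapter, create a device per adapter, and copy shared resources between them yourself.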
 

bystander36

Diamond Member
Apr 1, 2013
5,154
132
106
What?

You need to re-think that.

Even in DX11, there are some titles where CF/SLI has zero issues with latency. Higher frame rate and smoother.

Let's imagine a 60 FPS scenario, 16ms per frame.

1 GPU = 60 FPS = 16ms per frame.
2 GPU with perfect scaling (95% is possible) = 120 FPS = 8ms per frame.

The problem is when it's done poorly: GPU #1 and #2 are not well in sync, leading to big frame time variance.

All DX12/Vulkan mGPU does is give developers more control. If they are capable, the result should be better. If they are not, well, no mGPU support at all. :/

This is where you fail to understand how AFR works.

With AFR, the frames you see are displayed by alternating GPUs. If you get 120 FPS, your frame times are 8.33ms, the same as with a single GPU getting 120 FPS, BUT there is one major difference: each GPU is only creating 60 FPS, and each individual frame it creates takes 16.67ms.

Let me see if I can create a visual for you.

[GPU 1 frame][GPU 1 frame][GPU 1 frame]
..........[GPU 2 frame][GPU 2 frame][GPU 2 frame]

While the displayed frames are 8.33ms apart, the rendering time of every frame is 16.67ms.

With a single GPU at 120 FPS, the displayed and rendering times are both 8.33ms.
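If it helps, here is the same point as a throwaway calculation. The 120/60 FPS figures are the example numbers from the post above, not measurements:

```cpp
// Toy arithmetic for two-GPU AFR: the displayed frame interval halves,
// but the time any single frame spent rendering does not.
#include <cstdio>

int main()
{
    const double displayedFps = 120.0;                    // what the FPS counter shows
    const int    gpuCount     = 2;                        // AFR across two GPUs
    const double displayedMs  = 1000.0 / displayedFps;    // ~8.33 ms between frames on screen
    const double perGpuFps    = displayedFps / gpuCount;  // each GPU finishes only 60 frames/s
    const double renderMs     = 1000.0 / perGpuFps;       // ~16.67 ms to render any one frame

    std::printf("displayed interval:            %.2f ms\n", displayedMs);
    std::printf("render time per frame (AFR):   %.2f ms\n", renderMs);
    std::printf("render time per frame (1 GPU): %.2f ms\n", displayedMs);
    return 0;
}
```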
 

bystander36

Diamond Member
Apr 1, 2013
5,154
132
106
Please read up on how it works in DX12. The Ashes of the Singularity devs have written about how they implemented it and removed latency / input delay.

No matter how you slice it, every frame created in AFR takes twice as long to create as a frame from a single card at the same FPS. At 120 FPS, your frame times are 8.33ms, but each frame rendered took 16.67ms to create, and the frames are displayed at staggered intervals, alternating between the two GPUs.

They are talking about reducing other forms of latency created with SLI/CF, but you cannot get rid of that inherent limitation.
 
Feb 19, 2009
10,457
10
76
This is where you fail to understand how AFR works.

With AFR, the frames you see are displayed by alternating GPUs. If you get 120 FPS, your frame times are 8.33ms, the same as with a single GPU getting 120 FPS, BUT there is one major difference: each GPU is only creating 60 FPS, and each individual frame it creates takes 16.67ms.

Let me see if I can create a visual for you.

[GPU 1 frame][GPU 1 frame][GPU 1 frame]
..........[GPU 2 frame][GPU 2 frame][GPU 2 frame]

While the displayed frames are 8.33ms apart, the rendering time of every frame is 16.67ms.

With a single GPU at 120 FPS, the displayed and rendering times are both 8.33ms.

Wait, are you talking about frame time lag or input lag?

Because 120 fps at an 8.33ms interval will still appear as fluid as 120 FPS, unless GPU #1 and #2 fail to sync up their frames and miss an interval.
 

bystander36

Diamond Member
Apr 1, 2013
5,154
132
106
Wait, are you talking about frame time lag or input lag?

Because 120 fps at an 8.33ms interval will still appear as fluid as 120 FPS, unless GPU #1 and #2 fail to sync up their frames and miss an interval.

I said latency in general. Your latency from input to display is increased due to the increased time to render each frame. Being fluid is great for viewing, but latency has a big effect on gameplay too.

Edit: to be more clear, each frame displayed in AFR had double the rendering latency, which also increases total latency. And while Mantle implementations greatly improved frame time variance, they still aren't as good as a single GPU.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Edit: to be more clear, each frame displayed in AFR had double the rendering latency, which also increases total latency. And while Mantle implementations greatly improved frame time variance, they still aren't as good as a single GPU.

That's not necessarily the right way to look at it. If you are using AFR to increase the framerate, the latency would not increase. However, you will not see the expected reduction in latency as the framerate increases, compared with an (apparently faster) single GPU.
That having been said, I am not the biggest supporter of AFR, to say the least. However, I typically choose settings where the minimum framerate is above 60fps so that I can vsync. The input latency would then be comparable to running at 30fps (which means 33ms from input to buffer swap), which I consider acceptable.
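A back-of-the-envelope version of that vsync scenario. The 60Hz target and the two-GPU AFR setup come from the post; the pipeline is simplified to the assumption that input is sampled when a GPU starts rendering its frame:

```cpp
// Rough estimate of input-to-swap latency for two-GPU AFR under 60Hz vsync,
// assuming input is sampled when a GPU begins rendering its frame.
#include <cstdio>

int main()
{
    const double refreshHz = 60.0;                  // vsync target
    const int    gpuCount  = 2;                     // AFR pair
    const double swapMs    = 1000.0 / refreshHz;    // ~16.7 ms between buffer swaps
    const double inputToSwapMs = swapMs * gpuCount; // each GPU spans two swap intervals per frame

    std::printf("input -> buffer swap: ~%.0f ms (comparable to a single GPU at %.0f fps)\n",
                inputToSwapMs, refreshHz / gpuCount);
    return 0;
}
```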
 

krumme

Diamond Member
Oct 9, 2009
5,952
1,585
136
It's a cost/benefit issue. What is better, e.g. 4 smaller dies on an interposer vs. one big die? There's the cost of development and production, but also performance, and performance with latency issues.

It's the consoles driving this async stuff, and it seems to me they need to get on board for this to become more mainstream tech. But how difficult/costly is it to implement a proper low-latency multi-GPU solution in an engine, if a similar total die size is used and made on an interposer?
 

Flapdrol1337

Golden Member
May 21, 2014
1,677
93
91
It's a cost/benefit issue. What is better, e.g. 4 smaller dies on an interposer vs. one big die? There's the cost of development and production, but also performance, and performance with latency issues.

It's the consoles driving this async stuff, and it seems to me they need to get on board for this to become more mainstream tech. But how difficult/costly is it to implement a proper low-latency multi-GPU solution in an engine, if a similar total die size is used and made on an interposer?
I thought the main advantage was that you can combine parts from different manufacturing processes. Like the Intel chips with 128MB of on-package memory: that cache is made on a process that yields slower but more power-efficient transistors.