[Part 2] Measuring CPU Draw Call Performance

TheELF · Jul 31, 2017

MajinCry said:
It involves a specific optimization that could only ever be used in extremely simplistic, synthetic tests such as the one in this thread: when there is only one object, with no lights, materials, shadow maps, parallax maps, etc., so that the whole scene consists of duplicate draw calls of a single source, NVidia's driver will appear about twice as fast as AMD's.

But it's not; it only appears to be when we configure the synthetic test as outlined in the OP.

That's called instancing. First of all, that's why you told people to turn it off, so you can see pure draw calls. Second of all, just look at the Assassin's Creed or Hitman games: the crowds are actually clones galore, and those scenes would be impossible without instancing. It's not just a synthetic test case that you will never encounter in games; in fact, almost everything in games uses instancing (hence the huge differences in Fallout you brought up earlier), but the clone crowds are the easiest way to spot it.

MajinCry said:
Keep in mind that the draw call disparity is way more pronounced in real-world (non-synthetic) scenarios. The Lynnfield scores a bit over 50% better than the Phenom II, but in Fallout 4's Corvega factory, when overlooking Lexington, it gets more than 3x the framerate. Tested that with a user over on the ENB forums.

Basic explanation from NVIDIA:
https://docs.nvidia.com/gameworks/c...cssamples/opengl_samples/instancingsample.htm
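For anyone unfamiliar with the term, here is a toy Python sketch of what instancing changes: the same crowd is submitted either as one draw call per object or as a single call that carries all the per-instance data. `CommandRecorder`, `draw`, and `draw_instanced` are hypothetical stand-ins that only record calls; in real APIs the instanced path corresponds to calls such as OpenGL's `glDrawElementsInstanced` or Direct3D 11's `DrawIndexedInstanced`. It only illustrates the concept and says nothing about what either vendor's driver does internally.

```python
from dataclasses import dataclass, field

# Hypothetical per-instance data: just an (x, y) position for each crowd member.
Transform = tuple[float, float]


@dataclass
class CommandRecorder:
    """Hypothetical stand-in for a graphics API that just records the calls it receives."""
    calls: list = field(default_factory=list)

    def draw(self, mesh: str, transform: Transform) -> None:
        # Non-instanced: one API call per object.
        self.calls.append(("draw", mesh, [transform]))

    def draw_instanced(self, mesh: str, transforms: list[Transform]) -> None:
        # Instanced: one API call for every object; per-instance data travels in a buffer.
        self.calls.append(("draw_instanced", mesh, list(transforms)))

    def instances_drawn(self) -> int:
        # Total number of objects rendered across all recorded calls.
        return sum(len(batch) for _, _, batch in self.calls)


def draw_crowd_naive(api: CommandRecorder, transforms: list[Transform]) -> None:
    for t in transforms:
        api.draw("Pedestrian", t)


def draw_crowd_instanced(api: CommandRecorder, transforms: list[Transform]) -> None:
    api.draw_instanced("Pedestrian", transforms)


crowd = [(float(x), 0.0) for x in range(500)]
naive, instanced = CommandRecorder(), CommandRecorder()
draw_crowd_naive(naive, crowd)
draw_crowd_instanced(instanced, crowd)
print(len(naive.calls), naive.instances_drawn())          # 500 500
print(len(instanced.calls), instanced.instances_drawn())  # 1 500
```

Both paths put the same 500 pedestrians on screen; the difference is 500 API calls versus one.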

MajinCry · Jul 31, 2017

TheELF said:
That's called instancing. First of all, that's why you told people to turn it off, so you can see pure draw calls. Second of all, just look at the Assassin's Creed or Hitman games: the crowds are actually clones galore, and those scenes would be impossible without instancing. It's not just a synthetic test case that you will never encounter in games; in fact, almost everything in games uses instancing (hence the huge differences in Fallout you brought up earlier), but the clone crowds are the easiest way to spot it.

No, that's explicitly not instancing, and there's no way to disable this skeezy driver "optimization".

As I said, NVidia's driver has an optimization tailored for, and usable only by, unrealistic synthetics that issue the exact same draw call a large number of times. This optimization can only come into play when said draw call only draws an object: no shadow casting, no receiving shadows, no receiving lights, no parallax maps, no materials, etc.

Geometry Instancing is a shader technique that has a slightly similar use case, but performs orders of magnitude better and has an actual real-world use. It's a completely different thing.

Boris Vorontsov talks about it here: http://enbseries.enbdev.com/forum/viewtopic.php?p=69741&sid=9eed93bb7dbc0d46fe51ae225ffa9d73#p69741

Not just draw function call cost lot of performance, but every command to driver between draw calls. You may (for old nvidia drivers at least) call crazy amount of draw calls in cycle without modifying anything, performance will be awesome. Insert any changes to object -> bottleneck.

So if we have:

DoDraw {
    Draw(Rock)
    Draw(Rock)
    Draw(Rock)
    Draw(Rock)
    Draw(Rock)
    Draw(Rock)
    Draw(Rock)
    Draw(Rock)
    Draw(Rock)
    // ...repeat ad nauseam...
}

Then NVidia's driver will appear a good deal faster than AMD's. But if we have:

DoDraw {
    Draw(Rock, CastShadow)
    Draw(Rock, RockMaterial, CastShadow)
    Draw(Rock, ReceiveShadow, ReceiveLight)
    Draw(Rock)
    Draw(Rock, RockMaterial)
}

or

DoDraw {
    Draw(Rock, RockMaterial)
    Draw(Rock, RockMaterial)
    Draw(Rock, RockMaterial)
    Draw(Rock, RockMaterial)
    // ...repeat ad nauseam...
}

or

DoDraw {
    Draw(Rock, CastShadow)
    Draw(Rock, CastShadow)
    Draw(Rock, CastShadow)
    Draw(Rock, CastShadow)
    Draw(Rock, CastShadow)
}

etc., etc., then NVidia's driver will not perform any better. This "optimization" can only be used in synthetics, and its purpose, lo and behold, is to make NVidia look better in synthetics.
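To make the DoDraw sequences above checkable, here is a toy Python model of Boris's description: a draw counts as "fast" only if the driver received no other command since the previous draw. `Draw`, `commands_before`, and `fast_and_slow_draws` are hypothetical names, and the rules in `commands_before` (features re-sent before every draw, one bind command per mesh change, leftover features switched off explicitly) are assumptions about a naive renderer, not a description of how any real driver works.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Draw:
    """One Draw(...) line from the pseudocode: a mesh plus its extra features."""
    mesh: str
    features: tuple[str, ...] = ()


def commands_before(call: Draw, previous: Draw | None) -> int:
    """Count the state commands a naive renderer sends right before `call`.

    Hypothetical rules: every feature is re-sent before every draw (even if
    unchanged), a mesh change costs one bind command, and features used by the
    previous draw but not this one must be switched off explicitly.
    """
    commands = len(call.features)
    if previous is None or call.mesh != previous.mesh:
        commands += 1
    if previous is not None:
        commands += len(set(previous.features) - set(call.features))
    return commands


def fast_and_slow_draws(frame: list[Draw]) -> tuple[int, int]:
    """Split a frame into draws with no commands in between ("fast") and the rest."""
    fast = slow = 0
    previous = None
    for call in frame:
        if commands_before(call, previous) == 0:
            fast += 1
        else:
            slow += 1
        previous = call
    return fast, slow


only_rocks = [Draw("Rock")] * 1000
with_material = [Draw("Rock", ("RockMaterial",))] * 1000
with_shadow = [Draw("Rock", ("CastShadow",))] * 5
mixed = [
    Draw("Rock", ("CastShadow",)),
    Draw("Rock", ("RockMaterial", "CastShadow")),
    Draw("Rock", ("ReceiveShadow", "ReceiveLight")),
    Draw("Rock"),
    Draw("Rock", ("RockMaterial",)),
]
print(fast_and_slow_draws(only_rocks))     # (999, 1)
print(fast_and_slow_draws(with_material))  # (0, 1000)
print(fast_and_slow_draws(with_shadow))    # (0, 5)
print(fast_and_slow_draws(mixed))          # (0, 5)
```

Under these assumptions only the bare Draw(Rock) loop ever hits the "fast" case, which is the pattern MajinCry describes; a renderer that skipped redundant state commands would classify the repeated-material loop differently.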

As for comparing actual draw call performance, I have that Fallout 4 thread (draw calls can be measured via ENBSeries) with all the necessary settings provided, but it never really got anywhere.

TheELF · Jul 31, 2017

Yeah, but everybody is disabling instancing in this demo, so no matter which one of us is right or wrong, the results have nothing to do with sleazy driver practices, since the sleazy part is being disabled.

MajinCry · Jul 31, 2017

TheELF said:
Yeah, but everybody is disabling instancing in this demo, so no matter which one of us is right or wrong, the results have nothing to do with sleazy driver practices, since the sleazy part is being disabled.

Again, we're not talking about Geometry Instancing here. It's a completely different thing.

TheELF · Jul 31, 2017

Oh, so you believe that the "no instancing" button only disables one or some kinds of instancing, but others are still "hardcoded" into the driver?

MajinCry · Jul 31, 2017

TheELF said:
Oh, so you believe that the "no instancing" button only disables one or some kinds of instancing, but others are still "hardcoded" into the driver?

Again, it's not instancing. It's a specific driver optimization intended for synthetic benchmarks. Did you bother to read the quote (or even better, the thread) I linked from the ENB forums?

TheELF · Jul 31, 2017

He's just saying that as long as nothing changes, you are operating at maximum capacity; as soon as you have to do additional stuff, things get slower... because you have to do additional stuff.
Well, duh!

Not just draw function call cost lot of performance, but every command to driver between draw calls. You may (for old nvidia drivers at least) call crazy amount of draw calls in cycle without modifying anything, performance will be awesome. Insert any changes to object -> bottleneck.

Not just draw function call cost lot of performance, but every command to driver between draw calls.

Every command needs CPU time, not only the draw calls.

You may (for old nvidia drivers at least) call crazy amount of draw calls in cycle without modifying anything, performance will be awesome. Insert any changes to object -> bottleneck.

Doing only draw calls will perform better than doing draw calls plus whatever else, because whatever else uses up CPU cycles that could have been spent on draw calls.
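TheELF's reading amounts to a plain additive cost model: every command, draw or otherwise, costs CPU time, so extra commands per draw simply reduce how many draws fit in a frame. A minimal Python sketch of that model; `max_draws_per_frame` is a hypothetical name and all costs are arbitrary made-up units, not measurements.

```python
def max_draws_per_frame(budget: float, commands_per_draw: float,
                        cost_per_draw: float = 1.0,
                        cost_per_command: float = 1.0) -> int:
    """Draw calls that fit in a CPU budget when each draw also needs extra commands.

    Costs are arbitrary units. Every command costs the same whether or not it
    repeats the previous one, so this model deliberately has no special
    fast path for back-to-back identical draws.
    """
    cost_of_one_draw = cost_per_draw + commands_per_draw * cost_per_command
    return int(budget // cost_of_one_draw)


for extra in (0, 1, 2, 4):
    print(extra, max_draws_per_frame(10_000, extra))
# 0 10000
# 1 5000
# 2 3333
# 4 2000
```

In this model the slowdown scales smoothly with the number of extra commands; MajinCry's position is that NVidia's driver additionally special-cases repeated identical draws, which is exactly what this sketch leaves out.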

MajinCry · Jul 31, 2017

TheELF said:
He's just saying that as long as nothing changes, you are operating at maximum capacity; as soon as you have to do additional stuff, things get slower... because you have to do additional stuff.
Well, duh!

Every command needs CPU time, not only the draw calls.

Doing only draw calls will perform better than doing draw calls plus whatever else, because whatever else uses up CPU cycles that could have been spent on draw calls.

As I made clear, those "changes" are additional draw calls, such as shadows, lighting, materials, and so on and so forth. He specifically stated that issuing the same draw call repeatedly will have the driver perform much better, but if there are draw calls other than rendering just one specific object that has no lights, shadows, or what have you, the optimization doesn't come into play.

This is why NVidia's performance results are skewed, making them invalid for drawing a comparison to AMD's cards (i.e., "NVidia has less overhead!").

MajinCry · Nov 20, 2019

Interestingly, I found that there are several things you can do to increase draw call performance, measurable with this program.

1. Go exclusive full-screen. You can test this by pressing Alt+Enter in the demo, and you'll see about 0.10 more fps.

2. Set the process priority to High. The average FPS went up from 21.45 to 21.75 on my system (i7 6700K with HT disabled @ 4.0 GHz, Vega 56).

3. Disable the Intel speculative execution vulnerability fixes. Disabling the Spectre fix alone gives a 0.10-0.15 fps increase. To disable the Spectre and Meltdown fixes, use https://www.grc.com/inspectre.htm, and for a slew of the other vulnerability fixes, use https://gallery.technet.microsoft.com/scriptcenter/Speculation-Control-e36f0050
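Since the gains in this list are small, it can help to express them as a percentage and as frame time saved. A minimal Python sketch of that arithmetic, using the 21.45 and 21.75 fps figures from item 2 (the function names are just for illustration):

```python
def frame_time_ms(fps: float) -> float:
    """Average frame time in milliseconds for a given average FPS."""
    return 1000.0 / fps


def gain(before_fps: float, after_fps: float) -> tuple[float, float]:
    """Return (percent FPS gain, milliseconds saved per frame)."""
    percent = (after_fps / before_fps - 1.0) * 100.0
    saved_ms = frame_time_ms(before_fps) - frame_time_ms(after_fps)
    return percent, saved_ms


percent, saved = gain(21.45, 21.75)  # normal vs. high process priority (item 2)
print(f"{percent:.2f}% faster, {saved:.3f} ms less per frame")
# 1.40% faster, 0.643 ms less per frame
```

By the same arithmetic, and assuming a similar ~21 fps baseline, the ~0.10 fps from exclusive full-screen works out to roughly a 0.5% gain.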

MajinCry · Nov 14, 2020

A user on Reddit gave me his results for Zen 2 and Zen 3, so I've updated the main post. Zen 3 puts AMD just ahead of Skylake's draw call performance.

.vodka · Nov 14, 2020

MajinCry said:
A user on Reddit gave me his results for Zen 2 and Zen 3, so I've updated the main post. Zen 3 puts AMD just ahead of Skylake's draw call performance.

Holy crap. Very, very nice!

I wonder how Zen 3 does in this regard with a maxed-out fabric clock and tight memory timings.

MajinCry · Nov 14, 2020

.vodka said:
Holy crap. Very, very nice!

I wonder how Zen 3 does in this regard with a maxed-out fabric clock and tight memory timings.

He didn't tell me what speeds he was running, and he didn't want to test how it performs with the driver thread on another CCX. So if we've got any other Ryzen users here, a bit still needs testing.

Also Zen+. I wanna see how that fits in.
