[Part 3] Measuring CPU Draw Call Performance in Fallout 4

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Previous Thread

The most significant hit to framerates in large games, open world games especially, is caused by excessive draw call counts. And this isn't a new thing; as far back as 2006 with Oblivion, performance would utterly die during rainy weather, as NPCs would take out their light-casting torches. This is due to the game using Forward Rendering, where the entire scene is redrawn once for each active light. So where the game would normally issue 2,000 draw calls, it would now be making 8,000 draw calls should three NPCs equip their illuminating torches.

Luckily, game developers invented Deferred Rendering, where only the objects a light affects are redrawn. This reduced draw calls significantly. If we took the above scene and somehow swapped Oblivion's renderer for a deferred one, we'd see, for example, 2,030 draw calls with those three NPCs' torches out.
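To make that arithmetic concrete, here's a minimal sketch; the per-light lit-object count is an assumption chosen to reproduce the 2,030 figure above, not a number measured from the game:

```cpp
// Toy comparison of draw call counts under multi-pass forward vs deferred
// rendering. The numbers mirror the Oblivion example above; lit_object_draws
// is an assumed figure, not something measured in the engine.
#include <cstdio>

int main() {
    const int base_draws       = 2000; // scene with no dynamic lights
    const int active_lights    = 3;    // three NPCs pull out their torches
    const int lit_object_draws = 10;   // objects each torch actually touches (assumed)

    // Multi-pass forward rendering: the whole scene is re-submitted once per light.
    const int forward_draws = base_draws * (1 + active_lights);               // 2,000 * 4 = 8,000

    // Deferred rendering: geometry goes down once; each light only re-draws
    // (or shades) the objects it actually affects.
    const int deferred_draws = base_draws + active_lights * lit_object_draws; // 2,000 + 30 = 2,030

    std::printf("forward:  %d draw calls\n", forward_draws);
    std::printf("deferred: %d draw calls\n", deferred_draws);
    return 0;
}
```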

But even deferred rendering is not enough these days. In Fallout 4, there are areas where your framerate plummets. And why is this? Draw calls. The worst offender is player-made settlements, where draw calls can reach 20,000. But that's not part of the base game in and of itself, so instead I picked two particularly problematic areas: Corvega and Diamond City. The former because it issues the most draw calls of any area in Fallout 4 (11,000), and the latter because it issues many draw calls (8,000) while also having NPCs interacting with each other.

And here are the results, from an array of systems. First is the worst offender, Corvega:

Fallout_4_Draw_Calls_-_Corvega.png


Second, is Diamond City:

Fallout_4_Draw_Calls_-_Diamond_City.png


We see a couple of very interesting things. The first is that Fallout 4 loves fast RAM more than anything else. The second is that NVidia's driver delivers around 30% higher framerates than AMD's, no matter the CPU architecture. For Ryzen to pull 60fps in Fallout 4's intensive scenes, blazing fast DDR4 is required, as is an NVidia card. An Intel xLake CPU can just about hold 60fps when partnered with an AMD card, but when partnered with an NVidia card, the game can reach above 70fps in these draw call limited scenes.

When paired with a Ryzen processor, the difference in driver overhead is especially pronounced, with 50fps being a hard-to-reach target when partnered with an AMD GPU. But an NVidia one? And with fast RAM? Ryzen then appears to give the xLake architecture a run for its money. Outside of that particular configuration, however, Ryzen is sorely lacking in draw call performance.

Edit1: This behaviour also coincides with the results from Part 2, where we saw Ryzen performing around 25% slower than Skylake in a draw call benchmark. With Fallout 4, we see Ryzen performing 21-23% slower than Skylake in draw call intensive scenes.
 
Last edited:

Elfear

Diamond Member
May 30, 2004
7,097
644
126
Thanks for compiling all the info! Very useful.

I'd be very interested in seeing results for a 1st-gen Ryzen paired with fast ram and an Nvidia card.
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Thanks for compiling all the info! Very useful.

I'd be very interested in seeing results for a 1st-gen Ryzen paired with fast ram and an Nvidia card.

We also need a 2nd gen Ryzen paired with an AMD card, as the 2700x with an NVidia GPU doesn't do too shabby.
 

Mopetar

Diamond Member
Jan 31, 2011
7,835
5,981
136
Is this all that concerning going forward?

It seems that this is only a real problem in Bethesda games, and likely because they're using an utterly ancient game engine that dates back over 15 years at this point. Hopefully it gets replaced or fixed at some point in the near future. I was watching a video on the making of Fallout 76, and they interviewed some guys who worked on the multiplayer who said they had to seriously overhaul it since it wasn't compatible with the notion of multiple characters. Hopefully they modernized a lot of the other parts of the codebase as well.
 
  • Like
Reactions: ksec

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Is this all that concerning going forward?

It seems that this is only a real problem in Bethesda games, and likely because they're using an utterly ancient game engine that dates back over 15 years at this point. Hopefully it gets replaced or fixed at some point in the near future. I was watching a video on the making of Fallout 76, and they interviewed some guys who worked on the multiplayer who said they had to seriously overhaul it since it wasn't compatible with the notion of multiple characters. Hopefully they modernized a lot of the other parts of the codebase as well.

Yes it is, as draw calls are the largest detriment to performance in open world games. This would not be an issue with a renderer that uses Vulkan, which is designed around doing tens of thousands of draw calls per core, whereas Direct3D 11 is built around a few thousand draw calls processed by a single core.

As games get bigger and/or more detailed, draw calls are gonna get even higher in number.
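As a rough back-of-the-envelope illustration (the per-call CPU costs below are assumptions picked for the sake of the arithmetic, not measured driver numbers), the frame budget works out something like this:

```cpp
// Rough frame-budget arithmetic: how many draw calls fit into a 60fps frame
// if one thread must encode them all (D3D11-style) versus several recording
// threads (Vulkan/D3D12-style). The per-call costs are illustrative guesses.
#include <cstdio>

int main() {
    const double frame_budget_us   = 1e6 / 60.0; // ~16,667 microseconds per frame at 60fps
    const double d3d11_cost_us     = 5.0;        // assumed CPU cost per D3D11 draw call (single thread)
    const double vulkan_cost_us    = 0.5;        // assumed cost per Vulkan draw call, per recording thread
    const int    recording_threads = 6;

    const double d3d11_calls_per_frame  = frame_budget_us / d3d11_cost_us;
    const double vulkan_calls_per_core  = frame_budget_us / vulkan_cost_us;
    const double vulkan_calls_per_frame = vulkan_calls_per_core * recording_threads;

    std::printf("D3D11, one submission thread: ~%.0f draws/frame\n", d3d11_calls_per_frame);
    std::printf("Vulkan, per recording thread: ~%.0f draws/frame\n", vulkan_calls_per_core);
    std::printf("Vulkan, %d recording threads: ~%.0f draws/frame\n", recording_threads, vulkan_calls_per_frame);
    return 0;
}
```

Swap in whatever per-call costs you like; the point is that D3D11 lands in the "few thousand" range while the explicit APIs land in the tens of thousands per core.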
 

Uncle_Cherry

Junior Member
Jul 15, 2018
1
0
1
Hello Majin, I made this account so I could add some insight to your results. On the previous thread you asked people to test CCX configurations with Ryzen processors, and I can confirm that it does not have a huge impact on performance, maybe 1/2%. I have a Ryzen 1600X @ stock and a GTX 970 @ 1.5GHz (driver version 398.36) with 3200MHz CL14 RAM (haven't tried to run it at 3200 since 2 BIOS updates ago, running it at 2933MHz atm).

At Corvega: 11,030 draw calls
Cores 0-11 in Task Manager = 52-53 FPS
0-5: 50-51 FPS
6-11: 52 FPS
6, 8, 10: 50-51 FPS
7, 9, 11: 50-51 FPS

Diamond City: 8,000 draw calls (draw calls weren't very consistent in Diamond City, ranging anywhere from 7,900 to 8,400)
0-11: 55 FPS
0-5: 52 FPS
6-11: 53-54 FPS
6, 8, 10: 51 FPS
7, 9, 11: 51 FPS

Don't know if it was a firmware update in between all of the test results, but from my results it seems leaving all threads enabled provides the best performance, contrary to what other people have found about using a single CCX.

Also, why is my 1600X beating a 1700 that has 500MHz faster RAM by like 4 FPS if this is purely an architectural issue? And the normal 1600 with slower RAM is beating mine. You'd think the better-binned processors would be better, no? Maybe it's just NVidia's drivers causing the difference with the 1700? And I don't really see a reason why the 1080 Ti would make a 1600 with slower RAM faster than a 1600X with faster RAM.

Edit: Didn't think about this until afterwards, but I do have the Fallout 4 Texture Optimization Project baked into my BA2 files. I'm pretty sure I have backups of the originals and will retry with them, but I don't think textures would change much when the game is being run at such low settings.

Edit2: Yeah, I retested with the original BA2s, and I don't really understand if it was just because the game loaded differently or something, but Diamond City gained 3 FPS while Corvega lost 2.
 
Last edited:

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Hello Majin, I made this account so I could add some insight to your results. On the previous thread you asked people to test CCX configurations with Ryzen processors, and I can confirm that it does not have a huge impact on performance, maybe 1/2%. I have a Ryzen 1600X @ stock and a GTX 970 @ 1.5GHz (driver version 398.36) with 3200MHz CL14 RAM (haven't tried to run it at 3200 since 2 BIOS updates ago, running it at 2933MHz atm).

At Corvega: 11,030 draw calls
Cores 0-11 in Task Manager = 52-53 FPS
0-5: 50-51 FPS
6-11: 52 FPS
6, 8, 10: 50-51 FPS
7, 9, 11: 50-51 FPS

Diamond City: 8,000 draw calls (draw calls weren't very consistent in Diamond City, ranging anywhere from 7,900 to 8,400)
0-11: 55 FPS
0-5: 52 FPS
6-11: 53-54 FPS
6, 8, 10: 51 FPS
7, 9, 11: 51 FPS

Don't know if it was a firmware update in between all of the test results, but from my results it seems leaving all threads enabled provides the best performance, contrary to what other people have found about using a single CCX.

Also, why is my 1600X beating a 1700 that has 500MHz faster RAM by like 4 FPS if this is purely an architectural issue? And the normal 1600 with slower RAM is beating mine. You'd think the better-binned processors would be better, no? Maybe it's just NVidia's drivers causing the difference with the 1700? And I don't really see a reason why the 1080 Ti would make a 1600 with slower RAM faster than a 1600X with faster RAM.

Edit: Didn't think about this until afterwards, but I do have the Fallout 4 Texture Optimization Project baked into my BA2 files. I'm pretty sure I have backups of the originals and will retry with them, but I don't think textures would change much when the game is being run at such low settings.

Edit2: Yeah, I retested with the original BA2s, and I don't really understand if it was just because the game loaded differently or something, but Diamond City gained 3 FPS while Corvega lost 2.

The reason you're getting a higher framerate than the other Ryzen systems is that you're using an NVidia GPU, which I assume is because Fallout 4 makes use of DCLs (Direct3D 11 driver command lists), which would explain the difference in performance.
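For anyone who wants to check what their own driver reports, here's a small Windows-only probe (a sketch, not anything pulled from Fallout 4 itself) that asks D3D11 whether the driver natively supports command lists; drivers that answer no fall back to the runtime's emulation of deferred contexts:

```cpp
// Minimal D3D11 feature probe (Windows only, links against d3d11.lib):
// reports whether the installed driver natively supports command lists and
// concurrent resource creation. Historically NVidia drivers report
// DriverCommandLists = yes, which is the "DCL" advantage discussed above.
#include <cstdio>
#include <d3d11.h>
#pragma comment(lib, "d3d11.lib")

int main() {
    ID3D11Device*        device  = nullptr;
    ID3D11DeviceContext* context = nullptr;

    HRESULT hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                   nullptr, 0, D3D11_SDK_VERSION,
                                   &device, nullptr, &context);
    if (FAILED(hr)) {
        std::printf("D3D11CreateDevice failed: 0x%08lx\n", static_cast<unsigned long>(hr));
        return 1;
    }

    D3D11_FEATURE_DATA_THREADING threading = {};
    device->CheckFeatureSupport(D3D11_FEATURE_THREADING, &threading, sizeof(threading));

    std::printf("DriverConcurrentCreates: %s\n", threading.DriverConcurrentCreates ? "yes" : "no");
    std::printf("DriverCommandLists:      %s\n", threading.DriverCommandLists ? "yes" : "no");

    context->Release();
    device->Release();
    return 0;
}
```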

Your score isn't particularly high, especially compared to the Ryzen 2000 series paired with an NVidia GPU. If we could get another Ryzen 2000 user to run the benchmark, we'd see whether it's mainly down to driver differences or to genuine optimizations in the new Ryzen series.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
Thanks for collecting the data. I think, though, that this is too narrow a sample size to make the conclusion "Ryzen is bad at draw calls" across the board. Too much to ride on one test, especially when it's Bethesda's famously terrible Gamebryo derivative they've been flogging since Morrowind. They didn't even ship Skyrim with SSE extensions at launch, so I'd think it's much more likely that Fallout 4's engine is simply not optimized. Fair to say that Ryzen is not great at Fallout 4, though.
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Thanks for collecting the data. I think, though, that this is too narrow a sample size to make the conclusion "Ryzen is bad at draw calls" across the board. Too much to ride on one test, especially when it's Bethesda's famously terrible Gamebryo derivative they've been flogging since Morrowind. They didn't even ship Skyrim with SSE extensions at launch, so I'd think it's much more likely that Fallout 4's engine is simply not optimized. Fair to say that Ryzen is not great at Fallout 4, though.

The previous draw call benchmark helps shine a light on Ryzen's draw call performance. At best, with super fast RAM, it's 23% slower than Skylake at processing draw calls. With slower RAM, it's 28% slower.

And in this Fallout 4 benchmark, Ryzen + fast DDR4 is 21% slower than Skylake in scenes with plenty of draw calls. I think we can say this conclusively shows that Ryzen has a significant performance deficit when compared against Skylake, in draw call limited scenarios, such as MMOs, strategy games, and open world games without serious batching methods.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
A lot of this depends on the API used, and the complexity of the scene. For instance, using multithreading in the latest version of the .NET framework for Unity, I was able to issue a HUGE number of draw calls on my 1080 Ti/Threadripper 1950X (I believe I was targeting Vulkan or DX12). Game makers just need to start optimizing for higher core counts. The demo in question, which I may one day make into a game or benchmark, was a voxel engine. Chunks suck, so the smaller you can make your chunks, the more interactive your world can be. I pegged out 16 cores AND around 85% of my 1080 Ti, but got a playable framerate that allowed me to walk around, jump, etc., and the world was infinitely generating on the fly, saving chunks to an SSD for later recall. This included meshes like trees, grass, etc. Note that in Unity (at least the last time I used it) you had to manually update it to use newer C#/.NET framework standards.

Once developers start learning to get creative with multiple cores, games are going to change DRASTICALLY. Not just graphically; AI, etc. are all going to get a huge boost. The only cards that will be limited are the ones that are draw call limited. This is also something that PCIe 4.0 could actually help with: higher transfer rates mean faster uploads to the card.

Of course, we also need to solve the fast + cheap storage issue. SSDs need to come down in price. The problem with infinite open world games is that save games will eventually take up gigabytes of space.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
The previous draw call benchmark helps shine a light on Ryzen's draw call performance. At best, with super fast RAM, it's 23% slower than Skylake at processing draw calls. With slower RAM, it's 28% slower.

And in this Fallout 4 benchmark, Ryzen + fast DDR4 is 21% slower than Skylake in scenes with plenty of draw calls. I think we can say this conclusively shows that Ryzen has a significant performance deficit when compared against Skylake, in draw call limited scenarios, such as MMOs, strategy games, and open world games without serious batching methods.

Not really, you don't know what FO4 is doing outside of those draw calls, and it all depends on how multithreaded FO4 is. I haven't played far enough to get to one of those scenes, but it doesn't seem to use very much CPU.
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Not really, you don't know what FO4 is doing outside of those draw calls, and it all depends on how multithreaded FO4 is. I haven't played far enough to get to one of those scenes, but it doesn't seem to use very much CPU.

That's simply not true. We know that the game, in the background, is processing a few thousand NPCs. In previous Bethesda games, there were under a thousand NPCs being processed, but with Fallout 4, they made it so that even the generic NPCs are pathing in the background.

You can also check whether draw calls are the limiting factor: point the camera at the ground, and the framerate will skyrocket. That's because all of those objects are no longer being rendered, thanks to frustum culling.
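For anyone curious what that culling step looks like, here's a minimal sphere-vs-frustum sketch (the Plane/Sphere structs and the object count are illustrative, not engine code); only objects that survive the test cost a draw call:

```cpp
// Minimal sphere-vs-frustum culling sketch: objects whose bounding spheres fall
// entirely behind any of the six frustum planes are skipped, so pointing the
// camera at the ground leaves very few objects to submit.
#include <array>
#include <cstdio>
#include <vector>

struct Plane  { float nx, ny, nz, d; };   // plane: nx*x + ny*y + nz*z + d = 0, normal pointing inward
struct Sphere { float x, y, z, radius; }; // object bounding sphere

float signedDistance(const Plane& p, const Sphere& s) {
    return p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
}

bool insideFrustum(const std::array<Plane, 6>& frustum, const Sphere& s) {
    for (const Plane& p : frustum)
        if (signedDistance(p, s) < -s.radius) // completely behind one plane -> culled
            return false;
    return true;
}

int main() {
    // In a real renderer these six planes are extracted from the camera's
    // view-projection matrix every frame; zero-initialised here as a placeholder.
    std::array<Plane, 6> frustum{};
    std::vector<Sphere> objects(11000, Sphere{0.0f, 0.0f, 0.0f, 1.0f});

    int drawCalls = 0;
    for (const Sphere& s : objects)
        if (insideFrustum(frustum, s))
            ++drawCalls; // only visible objects get submitted to the GPU

    std::printf("draw calls issued: %d\n", drawCalls);
    return 0;
}
```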
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
That's simply not true. We know that the game, in the background, is processing a few thousand NPCs. In previous Bethesda games, there were under a thousand NPCs being processed, but with Fallout 4, they made it so that even the generic NPCs are pathing in the background.

You can also check whether draw calls are the limiting factor: point the camera at the ground, and the framerate will skyrocket. That's because all of those objects are no longer being rendered, thanks to frustum culling.

Which may not have anything to do with draw calls at all. I will get into this more tomorrow.
 

outsidefactor

Junior Member
Jul 25, 2018
2
6
16
It may or may not be the case if it was an argument being made by one person. This is not the case.

Many people have been making this case about Fallout 4, but the truth is that these issues are not new, and as such they are very well understood. Boris Vorontsov, of ENB fame, has been making this case successfully for a long time. It is well documented that the engine in recent Fallout and Elder Scrolls games, which began as a toolkit in Gamebryo and is now called the Creation Engine, has had significant inefficiencies that have gone unaddressed by Bethesda for over a decade, many of them having their origin in Oblivion.

Here is a question you should try to answer: how does a programmer with no access to source code resolve significant performance and stability issues with Bethesda games weeks or even only days after a game is released, using code injection? That's exactly what ENB is. I, like many people, could not get Fallout 4 to run at launch because of the state it was released in. I had to wait for the first ENB for Fallout 4 before I could even play the game. Thankfully I didn't have to wait long, as Boris had a beta ENB out four days after Fallout 4 became available to the public.

The Creation Engine offers a lot of creative freedom. Its moddability is its strength. At the other end of the scale, the Creation Engine's graphics are garbage, with a lot of functionality jerry-rigged in using third party APIs. Just ask Boris about the way the Creation Engine does shadows...

The Creation Engine is out of date, but worse than that, it's out-of-date bad code. This means that it produces visuals that are well behind the state of the art and takes silly hardware to produce those out-of-date visuals.
 
  • Like
Reactions: KompuKare

Madcap_Magician

Junior Member
Apr 26, 2018
11
31
91
Can you separate out Windows 7 and Windows 10 results with a different color? Windows 7 has the highest score, and according to other draw call tests it has better performance than 10 in this regard. Also, it would be nice to include the guy who posted last on the prior thread, because they have the highest performance seen so far.

The difference in performance between Windows 7 and 10 is unfortunate for VR users, as Windows 10 is about it as far as compatibility goes. If I were to play in pancake-only mode, I would go with Windows 7 for these games.

Thank you for creating this test because it has been very useful to see how certain hardware compares for open world games. I've found that many flight simulators have performance that correlates directly to the results in your test.
 

ZGR

Platinum Member
Oct 26, 2012
2,052
656
136
I'd love to see this test with MoreSpawns and WOTC combined. This allows nearly 200 AI in battle around you. Looks incredible, and really pushes the CPU.
 

USER8000

Golden Member
Jun 23, 2012
1,542
780
136
The game seems to love lower latencies, as Ryzen 2, with its lower-latency caches, is significantly faster than Ryzen 1.

[Attached graph: Fallout 4 CPU benchmark from Sweclockers]


The graph is from Sweclockers. Skylake-X does not seem to perform well in Fallout 4.
 
  • Like
Reactions: KompuKare
Jul 24, 2017
93
25
61
Yes it is, as draw calls are the largest detriment to performance in open world games. This would not be an issue with a renderer that uses Vulkan, which is designed around doing tens of thousands of draw calls per core, whereas Direct3D 11 is built around a few thousand draw calls processed by a single core.

As games get bigger and/or more detailed, draw calls are gonna get even higher in number.

I'll admit this is a really n00by question, but what's the difference, then, between Fallout 4 and Assassin's Creed Unity? Unity is also a game that suffers greatly due to high draw call numbers, and it uses DirectX 11. However, it scales far better with core count compared to FO4. Is it just due to other factors that have nothing to do with draw calls?
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
I'll admit this is a really n00by question, but what's the difference, then, between Fallout 4 and Assassin's Creed Unity? Unity is also a game that suffers greatly due to high draw call numbers, and it uses DirectX 11. However, it scales far better with core count compared to FO4. Is it just due to other factors that have nothing to do with draw calls?

It all depends on the scene being rendered. We see that there's a significant difference between the benchmarks other sites performed for Fallout 4 and the performance we see with Fallout 4 in this draw call benchmark. Considering how unintelligently benchmarks are generally conducted, I wouldn't be surprised if the AC Unity benchmarks are done in scenes where there are not as many objects being rendered as there could be.
 

outsidefactor

Junior Member
Jul 25, 2018
2
6
16
Ryzen ain't great at draw calls.

That feels like a gross oversimplification...

I think there is an unspoken understanding on the side of the developers that isn't really communicated to consumers. I suspect that in most dev houses one hardware vendor or another will be represented on most of the devs' systems. Once a product is nearing the end of the dev cycle, they then start to tune the engine as part of the QA process. This means that performance biases either way tend to be baked in during the design stage and then can only be addressed in a very cursory way during the final stages of the release engineering process, when large scale engine changes are too difficult and/or costly to make. This process is even more baked in when closed source versions of vendor-specific APIs, like Gameworks, are used to fill gaps in the engine. To suggest that the closed source version of Gameworks does not favor nVidia is naive to the point of idiocy, especially when third party patch authors (like Boris Vorontsov) have clearly demonstrated that this bias exists and does not need to exist. Fallout 4 makes this issue especially clear because Gameworks is the API that provides much of the lighting and shadows in-game, and this is exactly the part of the Fallout 4 draw process that most penalises AMD CPUs and Radeon GPUs.

There is such a significant difference in the design philosophies of AMD, nVidia and Intel that most games will favor one vendor over another, be it an Intel/nVidia or AMD/Radeon bias, unless care is taken throughout the design process; trying to address the bias near the end of design or as part of QA yields poor results.

DirectX 12 does address this a little bit by forcing vendors to move closer to the hardware earlier in development if they want the best performance, and with the removal of the monolithic submission thread much of the anti-AMD bias has been removed from DX. I feel that's why Mantle was released: as a clear demonstration that there was a bias in DX 9 and 11, and that the bias could be addressed without penalising nVidia. Open source versions of Gameworks will also allow more scope for closing the bias gap between nVidia and AMD, but it's worth noting that the developer of Gameworks (nVidia) is leaving the work of closing this bias gap to the people they sell Gameworks to...
 

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
I think there is an unspoken understanding on the side of the developers that isn't really communicated to consumers. I suspect that in most dev houses one hardware vendor or another will be represented on most of the devs' systems. Once a product is nearing the end of the dev cycle, they then start to tune the engine as part of the QA process. This means that performance biases either way tend to be baked in during the design stage and then can only be addressed in a very cursory way during the final stages of the release engineering process, when large scale engine changes are too difficult and/or costly to make.
Well, that would mean that all these games, since they are designed for consoles (so AMD CPU and GPU), should run better on AMD, since anything Intel- or nVidia-specific is just cursory and shouldn't have a big impact.
 

Mopetar

Diamond Member
Jan 31, 2011
7,835
5,981
136
I'll admit this is a really n00by question, but what's the difference, then, between Fallout 4 and Assassin's Creed Unity? Unity is also a game that suffers greatly due to high draw call numbers, and it uses DirectX 11. However, it scales far better with core count compared to FO4. Is it just due to other factors that have nothing to do with draw calls?

Could be the engines. I watched a documentary-style video about Fallout 76 a while back where they talked about how much they had to overhaul the Gamebryo engine to make it work. One developer mentioned finding some code chunks from TES: Morrowind still around.

I suspect that Fallout 4 does a lot of things in a poorly optimized manner as a result of an old and clunky code base that’s had far too much tacked on over the years. All of that tinkering on top of some ancient code blobs is likely what makes it run less ideally at times, even on high end hardware.
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
That feels like a gross oversimplification...

I think there is an unspoken understanding on the side of the developers that isn't really communicated to consumers. I suspect that in most dev houses one hardware vendor or another will be represented on most of the devs' systems. Once a product is nearing the end of the dev cycle, they then start to tune the engine as part of the QA process. This means that performance biases either way tend to be baked in during the design stage and then can only be addressed in a very cursory way during the final stages of the release engineering process, when large scale engine changes are too difficult and/or costly to make. This process is even more baked in when closed source versions of vendor-specific APIs, like Gameworks, are used to fill gaps in the engine. To suggest that the closed source version of Gameworks does not favor nVidia is naive to the point of idiocy, especially when third party patch authors (like Boris Vorontsov) have clearly demonstrated that this bias exists and does not need to exist. Fallout 4 makes this issue especially clear because Gameworks is the API that provides much of the lighting and shadows in-game, and this is exactly the part of the Fallout 4 draw process that most penalises AMD CPUs and Radeon GPUs.

There is such a significant difference in the design philosophies of AMD, nVidia and Intel that most games will favor one vendor over another, be it an Intel/nVidia or AMD/Radeon bias, unless care is taken throughout the design process; trying to address the bias near the end of design or as part of QA yields poor results.

DirectX 12 does address this a little bit by forcing vendors to move closer to the hardware earlier in development if they want the best performance, and with the removal of the monolithic submission thread much of the anti-AMD bias has been removed from DX. I feel that's why Mantle was released: as a clear demonstration that there was a bias in DX 9 and 11, and that the bias could be addressed without penalising nVidia. Open source versions of Gameworks will also allow more scope for closing the bias gap between nVidia and AMD, but it's worth noting that the developer of Gameworks (nVidia) is leaving the work of closing this bias gap to the people they sell Gameworks to...

The reason I said that Ryzen isn't great at draw calls is that we see it underperforming in draw call limited scenarios; we tested this in the previous synthetic draw call benchmark, and again here in a real-world game.

https://forums.anandtech.com/threads/part-2-measuring-cpu-draw-call-performance.2499609/

With slow RAM and both CCXs in use, Ryzen's draw call performance is on par with Core 2. With one CCX in use and slow RAM, draw call performance is slower than Nehalem. With one CCX in use and fast RAM (3000MHz), draw call performance is faster than Nehalem but slower than Sandy Bridge.

The reason Direct3D 12, Mantle, and Vulkan perform so much better than Direct3D 11 comes down to two factors. The first is that the cost of an individual draw call is many times lower; the second is that submission can be parallelized, allowing draw calls to be recorded by multiple cores. The difference we're talking about here is that Direct3D 11 games should only be issuing several thousand draw calls to keep framerates high, while with these low-level APIs, even with only a single core used by the renderer, ~50,000 draw calls is the recommended limit.
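A toy CPU-side model of that second factor (this is not real API code; the "encode" function is just busy-work standing in for the driver's per-call validation cost, and the call count is arbitrary):

```cpp
// Toy model of single-threaded vs multi-threaded draw call encoding.
// encodeDrawCall() is arbitrary busy-work standing in for the CPU cost the
// driver/runtime pays per call; no graphics API is involved.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

long long encodeDrawCall(int i) {
    long long acc = 0;
    for (int k = 1; k <= 2000; ++k) acc += i % k; // arbitrary busy-work
    return acc;
}

int main() {
    const int totalDraws = 20000;

    // "D3D11 style": one thread encodes every draw call.
    auto t0 = std::chrono::steady_clock::now();
    long long serialSum = 0;
    for (int i = 0; i < totalDraws; ++i) serialSum += encodeDrawCall(i);
    auto t1 = std::chrono::steady_clock::now();

    // "D3D12/Vulkan style": each worker records its slice into its own command
    // list; the lists are submitted together afterwards (submission cost ignored).
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const int begin = static_cast<int>(w) * totalDraws / static_cast<int>(workers);
            const int end   = static_cast<int>(w + 1) * totalDraws / static_cast<int>(workers);
            for (int i = begin; i < end; ++i) partial[w] += encodeDrawCall(i);
        });
    }
    for (std::thread& t : pool) t.join();
    auto t2 = std::chrono::steady_clock::now();

    long long parallelSum = 0;
    for (long long p : partial) parallelSum += p;

    using ms = std::chrono::duration<double, std::milli>;
    std::printf("single-thread encode: %.1f ms (checksum %lld)\n", ms(t1 - t0).count(), serialSum);
    std::printf("%u-thread encode:     %.1f ms (checksum %lld)\n", workers, ms(t2 - t1).count(), parallelSum);
    return 0;
}
```

On any multi-core CPU the second path finishes several times faster; that headroom is what the explicit APIs expose to the renderer.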
 
  • Like
Reactions: ozzy702