= = = =
Hi xxxxxxxxx,
Thanks for your interest in the Ashes of the Singularity benchmark.
In order to get an accurate picture of how well a given CPU will perform, it's important to look at the CPU frame rate with Infinite GPU both on and off ( a check box exists on the benchmark settings panel ). Note that while it is on, you may see some graphical corruption due to the use of async shaders; however, the results will still be valid.
With Infinite GPU, you should see a 90%+ workload on your CPU. In this mode, we do not "wait" in the case where the GPU is still busy. You should see excellent scaling between 4- and 16-thread machines. This can only be tracked on DX12.
Without Infinite GPU, the CPU will "wait" on a signal from the GPU that it is ready to process another frame. During this wait, the CPU tends to power down when there isn't any additional work to do, and the wait effectively serializes a portion of the frame. This serialization is what causes the CPU frame rate discrepancy between Infinite GPU on and off.
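To make the on/off discrepancy concrete, here is a rough back-of-the-envelope sketch. It is not engine or benchmark code: the `FrameModel` struct, both function names, and the simplification that the wait adds directly to the CPU's frame time are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical toy model of one frame, not taken from the benchmark.
struct FrameModel {
    double cpu_work_ms;  // CPU time to build one frame when it never stalls
    double gpu_wait_ms;  // time the CPU idles waiting on the GPU's "ready" signal
};

// Infinite GPU on: the CPU never waits, so only its own work counts.
double cpu_fps_infinite_gpu(const FrameModel& m) {
    assert(m.cpu_work_ms > 0.0);
    return 1000.0 / m.cpu_work_ms;
}

// Infinite GPU off: assume the wait is fully serialized behind the CPU work.
double cpu_fps_with_wait(const FrameModel& m) {
    assert(m.cpu_work_ms > 0.0 && m.gpu_wait_ms >= 0.0);
    return 1000.0 / (m.cpu_work_ms + m.gpu_wait_ms);
}
```

With made-up numbers of 10 ms of CPU work and 10 ms of waiting, this model gives 100 frames per second with Infinite GPU on and 50 with it off. In practice only a portion of the frame is serialized, so the real gap depends on how much work overlaps the wait.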
In addition, due to this "wait", one interesting stat to track is your power draw. On DX11 the power draw tends to be much higher than on DX12, as the additional serial threads that the driver needs to process the GPU commands effectively force the CPU to stay active even if it is only using a fraction of its cores. This tends to be an overlooked benefit of DX12, since the API is designed so that engines can evenly distribute work.
Regarding specific CPU workloads and the differences between AMD and Intel, it is important to note a few things.
1. We have heavily invested in SSE ( mostly 2 for compatibility reasons ) and a significant portion of the engine is executing that code during the benchmark. It could very well be 40% of the frame. Possibly more.
2. While we do have large contiguous blocks of SSE code ( mainly in our simulations ) it is also rather heavily woven into the entire game via our math libraries. Our AI and gameplay code tend to be very math heavy.
3. The Nitrous engine is designed to be data oriented ( basically we know what memory we need and when ). Because of this, we can effectively utilize the SSE streaming memory instructions in conjunction with prefetch ( both temporal and non-temporal ). In addition, because our memory accesses are more predictable, the hardware prefetcher tends to be better utilized.
4. Memory bandwidth is definitely something to consider. The larger scope of the application, paired with going highly parallel, puts a lot of pressure on the memory system. On my i7 3770s I'm hitting close to peak bandwidth on 40% of the frame.
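For readers unfamiliar with the streaming and prefetch instructions mentioned in point 3, the following is a minimal sketch of the general pattern using the standard SSE intrinsics from `<xmmintrin.h>`. It is not Nitrous code: the function `scale_stream`, its parameters, and the prefetch distance are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_mul_ps, _mm_stream_ps, _mm_prefetch

// Hypothetical helper: writes src[i] * factor into dst[i] for `count` floats.
// Assumes both pointers are 16-byte aligned and count is a multiple of 4.
void scale_stream(const float* src, float* dst, std::size_t count, float factor) {
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(count % 4 == 0);

    const __m128 f = _mm_set1_ps(factor);
    // Prefetch distance in floats (256 bytes ahead); a guess, tune per machine.
    constexpr std::size_t kPrefetchAhead = 64;

    for (std::size_t i = 0; i < count; i += 4) {
        if (i + kPrefetchAhead < count) {
            // Non-temporal hint: the input is read once, so try not to
            // displace other data in the caches.
            _mm_prefetch(reinterpret_cast<const char*>(src + i + kPrefetchAhead),
                         _MM_HINT_NTA);
        }
        const __m128 v = _mm_load_ps(src + i);    // aligned 4-float load
        _mm_stream_ps(dst + i, _mm_mul_ps(v, f)); // streaming store, bypasses cache
    }
    _mm_sfence();  // make the streaming stores visible before returning
}
```

This pattern only pays off when the access order is known in advance, which is what a data-oriented layout provides. Because streaming stores go straight to memory, loops like this tend to be limited by memory bandwidth rather than by arithmetic.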
I hope this information helps point you in the right direction for your investigation into the performance differences between AMD and Intel. We haven't done exhaustive comparative tests, but generally speaking we have found AMD chips to compare more favorably to Intel than what is displayed via synthetic benchmarks. I'm looking forward to your results.
# # #