Techspot: Rise of the Tomb Raider (PC) - CPU Performance


Deders

Platinum Member
Oct 14, 2012
In this particular case - just no. If the game shows almost no performance difference between a 2.5 GHz and a 4.5 GHz core clock, then it is not remotely compute limited. It is directly the GPU taking advantage of higher memory bandwidth.

Usually if all the video data fits in the VRAM, the GPU won't need much system RAM bandwidth.

How are these new console games handled? The consoles have shared system and video memory. Do they have to put a copy of the video data into both system and video RAM? Is that why 6GB is used? If that were the case, wouldn't the PCIe bus be the bottleneck?

I'd still like to see how other architectures handle reduced or increased RAM speeds.
 
Aug 11, 2008
But the Pentium *is* clock-speed limited. It goes from 77 to 26. Vishera is too, to a lesser extent, going from 81 to 71. I also don't see how they are saying the game uses a single core. At 2.5 GHz, a quad-core Skylake is 3x as fast as the dual-core Pentium.

I would think there are multiple factors at work, and the limiting factor can vary depending on the CPU.
 

Zodiark1593

Platinum Member
Oct 21, 2012
Usually if all the video data fits in the VRAM, the GPU won't need much system RAM bandwidth.

How are these new console games handled? The consoles have shared system and video memory. Do they have to put a copy of the video data into both system and video RAM? Is that why 6GB is used? If that were the case, wouldn't the PCIe bus be the bottleneck?

I'd still like to see how other architectures handle reduced or increased RAM speeds.

CPU-GPU latency probably plays a role, as the CPU and GPU can communicate closely in an APU setup, particularly with a fast shared RAM configuration.
 

2is

Diamond Member
Apr 8, 2012
In this particular case - just no. If the game shows almost no performance difference between a 2.5 GHz and a 4.5 GHz core clock, then it is not remotely compute limited. As a consequence we can conclude that it is directly the GPU taking advantage of higher memory bandwidth.

No it isn't, because if the GPU were relying on system memory, it would be slow as molasses, even with the fastest DDR4 modules. It's the CPU that benefits from more system memory bandwidth. The GPU relies on its own memory.

The exception being if you're running integrated video.
 

Deders

Platinum Member
Oct 14, 2012
It should be clear by now that it is not the Skylake core but the faster memory interface. The game is apparently not compute bound... so you draw the wrong conclusion.

My impression of Skylake is that it can handle more instructions per core than previous gens.
 

Thala

Golden Member
Nov 12, 2014
But the Pentium *is* clock-speed limited. It goes from 77 to 26. Vishera is too, to a lesser extent, going from 81 to 71. I also don't see how they are saying the game uses a single core. At 2.5 GHz, a quad-core Skylake is 3x as fast as the dual-core Pentium.

I would think there are multiple factors at work, and the limiting factor can vary depending on the CPU.

Indeed. You essentially have two asynchronous tasks: the CPU task and the GPU task, each with a certain workload per frame. Whichever task takes longer determines the framerate. And then there is a certain amount of synchronous workload, where both CPU and GPU have to synchronize. For example, when the CPU locks a buffer in VRAM, the GPU has to wait.

When you now reduce the CPU clock, you extend the CPU task in time, and the synchronous part as well. If the framerate only decreases slightly, it is due to the synchronous part. If you reduce the clock frequency further (or use a slower CPU), eventually the CPU task will become limiting and you'll see a steeper decline in framerate.

What we can conclude from these numbers is that, at least between 4.5 and 2.5 GHz (Skylake), the CPU task is not limiting at all. The second conclusion following from this argument is that whatever makes Skylake win here, it is the GPU task being faster (and not Skylake having superior IPC).

No it isn't, because if the GPU were relying on system memory, it would be slow as molasses, even with the fastest DDR4 modules.

Oh dear, it is not nearly that simple. For each buffer, the allocation location is determined by its usage. Some buffers, in particular those the CPU is supposed to have access to (in addition to the GPU), are allocated in system memory (DRAM). I never implied that all buffers are allocated in system memory. Render targets, on the other hand, apparently have to use D3DPOOL_DEFAULT and are allocated in VRAM.
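
To make that concrete, here is a minimal D3D9-style sketch of how the declared usage steers where a resource lives. This is an illustration, not the game's actual code; it assumes an already-created IDirect3DDevice9* device, and the sizes and formats are made up:

```cpp
#include <d3d9.h>

// Minimal sketch: where a resource lives depends on how it is declared.
// Error handling omitted for brevity.
void AllocateBuffers(IDirect3DDevice9* device)
{
    // Dynamic vertex buffer the CPU locks and rewrites every frame.
    // D3DUSAGE_DYNAMIC lets the driver place it where CPU writes are
    // cheap (typically system or shared memory) - exactly the kind of
    // buffer that benefits from system memory bandwidth.
    IDirect3DVertexBuffer9* dynamicVB = nullptr;
    device->CreateVertexBuffer(64 * 1024,
                               D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
                               0, D3DPOOL_DEFAULT, &dynamicVB, nullptr);

    // Staging buffer the CPU reads back from: explicitly system memory (DRAM).
    IDirect3DVertexBuffer9* stagingVB = nullptr;
    device->CreateVertexBuffer(64 * 1024, 0, 0,
                               D3DPOOL_SYSTEMMEM, &stagingVB, nullptr);

    // Render targets must use D3DPOOL_DEFAULT and end up in VRAM.
    IDirect3DTexture9* renderTarget = nullptr;
    device->CreateTexture(1920, 1080, 1, D3DUSAGE_RENDERTARGET,
                          D3DFMT_A8R8G8B8, D3DPOOL_DEFAULT,
                          &renderTarget, nullptr);
}
```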
 

TheELF

Diamond Member
Dec 22, 2012
Indeed. You essentially have two asynchronous tasks: the CPU task and the GPU task, each with a certain workload per frame. Whichever task takes longer determines the framerate.
If those two really were without synchronization, you would get missing graphics, just like in GTA V where the CPU task runs as fast as possible and the GPU task just does its best to keep up. If one of the tasks determines the framerate, then it is synchronized.

What we can conclude from these numbers is that, at least between 4.5 and 2.5 GHz (Skylake), the CPU task is not limiting at all. The second conclusion following from this argument is that whatever makes Skylake win here, it is the GPU task being faster.
There are also games, like BF4/Black Ops 3, that can use more worker cores/threads when more cores are available, so maybe with lower clocks more cores are working to keep up the numbers.
 

Thala

Golden Member
Nov 12, 2014
If those two really were without synchronization, you would get missing graphics, just like in GTA V where the CPU task runs as fast as possible and the GPU task just does its best to keep up. If one of the tasks determines the framerate, then it is synchronized.

Yes, there is synchronization; otherwise you would have two different framerates: the number of frames per second the CPU is processing, and the number of frames per second the GPU is processing :)
And of course it does not make much sense for the CPU to go ahead and calculate frames the GPU is never going to render. Therefore, given the CPU task is shorter than the GPU task, the CPU will eventually block/suspend.
See the following sketch (time increases from left to right):

Compute limited:
CCCCCCSCCCCCCSCCCCCCSCCCCCC
IIIIIIIIIISGGGIIIISGGGIIIIISGGGIIII

GPU limited:
CCCSCCCBBBBSCCCBBBBSCCCBBB
IIIIISGGGGGGSGGGGGGSGGGGGG

C = CPU task, G = GPU task, S = synchronous part, B = CPU is blocking, I = GPU is idle

In any case, the frametime is determined by the sum of the synchronous part and the longest asynchronous part (G or C).
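
In code form, that last sentence is just the following toy model (the millisecond numbers are hypothetical, picked to mimic the GPU-limited case above):

```cpp
#include <algorithm>
#include <cstdio>

// Toy model of the sketch: frametime = synchronous part +
// the longer of the two asynchronous parts (CPU task or GPU task).
double FrameTimeMs(double cpuMs, double gpuMs, double syncMs)
{
    return syncMs + std::max(cpuMs, gpuMs);
}

int main()
{
    // Hypothetical GPU-limited case: dropping from 4.5 to 2.5 GHz
    // stretches the CPU-side work by 1.8x, but the GPU task (20 ms)
    // still dominates the asynchronous part, so only the synchronous
    // part (2 ms -> 3.6 ms) shows up in the frametime.
    std::printf("4.5 GHz: %.1f ms\n", FrameTimeMs(6.0, 20.0, 2.0));              // 22.0
    std::printf("2.5 GHz: %.1f ms\n", FrameTimeMs(6.0 * 1.8, 20.0, 2.0 * 1.8));  // 23.6
    return 0;
}
```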
 

Deders

Platinum Member
Oct 14, 2012
What we can conclude from these numbers is that, at least between 4.5 and 2.5 GHz (Skylake), the CPU task is not limiting at all. The second conclusion following from this argument is that whatever makes Skylake win here, it is the GPU task being faster (and not Skylake having superior IPC).

I was thinking more along the lines of wider data paths in the CPU, similar to a wider data bus for memory, but for CPU instructions. Hence Skylake is able to operate well at lower frequencies, with lower power consumption, for many tasks.
 

AtenRa

Lifer
Feb 2, 2009
About the single core, let's see what Guru3D says:

On this page we show some performance relative to CPU cores, e.g. differences based on the number of CPU cores. Our X99 based motherboard can independently control the number of CPU cores of our Core i7 5960X in the sense they can be disabled / enabled.

So basically with our motherboard we start to disable CPU cores as shown in the above BIOS screenshot. We leave hyper-threading enabled. Our CPU per core is clocked at 4.4 GHz and we drop from 8 towards 4 towards 2 and even 1 cpu core enabled. I'll be using FCAT for measurements as that way we can display the tiniest of differences in performance.

1. They use a Haswell-E, which has a massive 20MB of shared L3 cache.
2. They use Hyper-Threading, so the game has access to 2x the threads.
3. The CPU is clocked at 4.4GHz.

I will say that the CPU performance in that game (at the time of release) is more about L3 cache size, and especially L3 cache speed/latency, and only then single-core clock frequency.
 

TheELF

Diamond Member
Dec 22, 2012
Yes, there is synchronization; otherwise you would have two different framerates: the number of frames per second the CPU is processing, and the number of frames per second the GPU is processing :)
And of course it does not make much sense for the CPU to go ahead and calculate frames the GPU is never going to render.

What do you mean, otherwise?
You do have two different framerates; why do you think benchmarks measure all the CPUs against the fastest GPU and then every GPU against the fastest CPU?
And in GTA V specifically, you can run it much faster if you don't care that you only see half the city, but if you want to see every frame perfectly rendered you will have to run it much slower.

And of course it makes a lot of sense for the CPU to go ahead and calculate frames the GPU is never going to render, because this ensures that the most recent frame will be displayed whenever the GPU is ready. Otherwise you get what people call lag: slightly delayed frames, because the CPU gives you an older frame.
 

TheELF

Diamond Member
Dec 22, 2012
I will say that the CPU performance in that game (at the time of release) is more about L3 cache size, and especially L3 cache speed/latency, and only then single-core clock frequency.

But the G3258 also gets good framerates. Actually, at 2.5GHz it sucks and at 3GHz it's starting to be OK.
Same CPU, same caches: as soon as the CPU is over a certain level of computing power, anything more only adds minimal gains...
 

Innokentij

Senior member
Jan 14, 2014
Kind of a pointless benchmark: console resolution when 980 Ti owners are at 1440p or higher, and only set to High instead of Very High? Is it running exclusive fullscreen mode, which removes the random drops? FXAA? What level? What time of day? This is garbage; giving it an F.
 

Sweepr

Diamond Member
May 12, 2006
Eurogamer is testing The Division (PC):

On the plus side, CPU utilisation looks solid. The game appears to utilise as many threads as you can throw at it - a quad-core i7 shows even balance across all eight threads, while testing on a six-core 3930K shows a similar story across 12. A Core i3 processor should be good for maintaining a baseline 30fps with relative comfort - the exception being in loading time length. Curiously, load times are tied directly to CPU power, and the gulf between i3 and i7 (both running from an SSD) is often palpable. Even the i7s see all threads maxed here, a situation that thankfully does not persist into gameplay where an i7 4790K overclocked to 4.6GHz sees utilisation in the 40-50 per cent area when paired with a GTX 970.
 

RampantAndroid

Diamond Member
Jun 27, 2004
Wow... is a 6700K really 10 FPS faster than a 3770K in real-world performance in that game?

If only I could get a 6700 at a good price, I'd consider moving my 3770K to server duty...
 

Thala

Golden Member
Nov 12, 2014
And of course it makes a lot of sense for the CPU to go ahead and calculate frames the GPU is never going to render, because this ensures that the most recent frame will be displayed whenever the GPU is ready. Otherwise you get what people call lag: slightly delayed frames, because the CPU gives you an older frame.

That's not how it works. The CPU never runs ahead of the GPU by a significant margin. Where did you get that idea from?
 

TheELF

Diamond Member
Dec 22, 2012
That's not how it works. The CPU never runs ahead of the GPU by a significant margin. Where did you get that idea from?

I said that the CPU keeps updating the frames all the time so that it can send the most recent frame to the GPU whenever the GPU is ready.

The GPU itself can also pre-render a certain number of frames so it always has the freshest frame to display.


And the reason why so many new games stutter on so many different systems?
Yup, the thread that is made for an ancient 1.5GHz core runs way too fast on desktop CPUs, so you are right: instead of running ahead of the GPU by a significant margin, it stops from time to time so that the graphics can catch up. Result: stutter.
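
For what it's worth, drivers usually bound how far the CPU can run ahead with a "maximum pre-rendered frames" limit. Here is a rough sketch of that mechanism, assuming C++20 semaphores; all names, counts and timings are hypothetical, not taken from any actual driver:

```cpp
#include <chrono>
#include <cstdio>
#include <semaphore>  // C++20
#include <thread>

// The CPU may queue at most kMaxAhead frames before it blocks, so it
// does run ahead of the GPU, but only by a small, fixed margin.
constexpr int kMaxAhead = 3;
std::counting_semaphore<kMaxAhead> freeSlots(kMaxAhead);
std::counting_semaphore<kMaxAhead> queuedFrames(0);

int main()
{
    std::thread gpu([] {
        for (int f = 0; f < 8; ++f) {
            queuedFrames.acquire();  // wait until the CPU has queued a frame
            std::this_thread::sleep_for(std::chrono::milliseconds(20));  // "render" it
            freeSlots.release();     // a queue slot frees up for the CPU
        }
    });
    for (int f = 0; f < 8; ++f) {
        freeSlots.acquire();                     // blocks once kMaxAhead frames ahead
        std::printf("CPU built frame %d\n", f);  // simulation + draw-call recording
        queuedFrames.release();                  // "submit" the frame to the GPU queue
    }
    gpu.join();
    return 0;
}
```

This is essentially a bounded producer/consumer queue: when the GPU is the slow side, the CPU spends most of its time blocked in freeSlots.acquire(), which is the "B" part of the earlier sketch.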
 

MrTeal

Diamond Member
Dec 7, 2003
This just goes to show you that for gaming, CPUs really offer diminishing returns. There seems to be no reason to ever get above an i5 for gaming*.

*if all you plan to play is Rise of the Tomb Raider
 

coercitiv

Diamond Member
Jan 24, 2014
Yup, the thread that is made for an ancient 1.5GHz core runs way too fast on desktop CPUs, so you are right: instead of running ahead of the GPU by a significant margin, it stops from time to time so that the graphics can catch up. Result: stutter.
Any testing for this?
 

2is

Diamond Member
Apr 8, 2012
*if all you plan to play is Rise of the Tomb Raider

You must be new to PC gaming and this must be the only game benchmark you've ever come across. I can't think of another reason why you'd think this is the only game that doesn't benefit a whole lot from an i7, considering this is a performance trend shared by the majority of games. There are exceptions, but they're just that. Exceptions.
 

Deders

Platinum Member
Oct 14, 2012
I can argue that they'll help less because the GPU becomes more important at higher resolutions.

Yes, the GPU is the more important factor, but there will be situations where an i7 can help drive that GPU to maintain the higher framerates needed for high refresh rates.