
C++ : The cost of pointers

Carlis

Senior member
Hi
I am planning to solve a physics problem through a special simulation technique.
I am very fond of pointers, and in this problem they would make life a lot easier.
In this problem the 'core' of the task would consist of manipulating pointers and thus the way different objects are related to each other.

I have the impression that there is a price to pay for this in terms of computational performance. How can I estimate the performance loss? Does it result from poor utilisation of the cache, or what is the story?

Best regards,
Carlis
 
Interesting question, and you are right to wonder about the cache, because ultimately the cost of pointers is closely tied to how often you are hitting or missing it.

I would strongly suggest that you use a profiler such as gprof. You'll often find that I/O and algorithmic complexity trump the cost of individual operations. At the very least, you'll be able to focus your efforts on the things that matter most.
 
Bogosort FTL
 
Using pointers is probably going to be quicker than any other solution you come up with. Unless using them requires a lot of nested loops and recursive functions, you shouldn't pay a penalty.
 
See Agner's "Optimizing software in C++" optimization guide[1]. Pointers (and references) are addressed on pages 36-38.

Basically you shouldn't have to worry about performance because of how microprocessors are constructed. Also a lot of what you're already doing in C++ is being done through pointers (e.g. relative to the stack pointer or implicit "this" pointer).

The optimization manual should cover all the cases you'd want to know about when dealing with pointers. There are also others which deal with a few other topics (such as optimizing stuff in assembly[2]).

Edit:
Unless using them requires a lot of nested loops and recursive functions, you shouldn't pay a penalty.

It would only be a problem if it had to read it from memory on every iteration (i.e. it couldn't be stored in a register).

[1] http://www.agner.org/optimize/optimizing_cpp.pdf
[2] http://www.agner.org/optimize/
 
Thank you all for your input. I think a good starting point is to read the referred guide.
Perhaps I can do some small scale experiments on some select problem to see what happens.
Thanks a lot
//
Carlis
 
Pointers add a layer of indirection, so of course there is a (negligible) amount of overhead associated with their use. This is basically irrelevant, because it's an unavoidable cost of developing software.

Cache thrashing should only be a problem if your data set is very large (much too large for the d-cache) and your data access patterns are bad. The classic example is accessing a 2d array. If you iterate down the columns of the array, every data access causes a large jump in memory, destroying the effectiveness of the d-cache (aka cache thrashing). Iterating along each row before moving to the next row maximizes the effectiveness of the d-cache, because memory accesses are sequential. On a large array, a column-first algorithm can literally be hundreds of times slower than a row-first algorithm that does the exact same thing.

In short, don't make blatantly bad choices in your algorithm, then if your performance is not acceptable run profiling to identify problems.
 
If I recall correctly, the cost of dereferencing a pointer and subsequently accessing the data pointed to depends on the size of the object being referenced.
Built-in types should be extremely fast; very large objects might be somewhat slower.
 
If I recall correctly, the cost of dereferencing a pointer and subsequently accessing the data pointed to depends on the size of the object being referenced.
Built-in types should be extremely fast; very large objects might be somewhat slower.

The size of the object shouldn't matter most of the time. If the object is larger than a single cache line, you increase the odds of a cache miss, since part of the object could be evicted from cache while you're still using it. In a modern CPU, however, caches are large enough that this seems unlikely, and even if part of the object were evicted from L1 it would almost certainly be re-loaded from L2 or L3, so the penalty is minimal.
 
I'm not specifically talking about the cache penalty...

Dereferencing a pointer is a very quick operation, since it merely involves obtaining a memory address.

What I was addressing is that it is faster to read/copy/move a single int/float than an array of 100 items or a class with 50 members.
 
I'm not specifically talking about the cache penalty...

Dereferencing a pointer is a very quick operation, since it merely involves obtaining a memory address.

What I was addressing is that it is faster to read/copy/move a single int/float than an array of 100 items or a class with 50 members.

I'm confused as to what you are asking.

At the very least, adding in a pointer dereference is costing you an additional operation. If your access patterns are bad, then you'll also be hitting main memory and missing the cache.

If you're asking whether it's faster to do:
int x1;
int x2;
... up to int x50;

or int x[50];

or a class with 50 member variables...

Well it depends. On an optimizing compiler, all 3 might be equal.
Otherwise, I'd expect the array to be the fastest, and the class with 50 member variables to be the slowest. Each member variable would require a dereference of a memory location, followed by the operation you want to do. On an optimizing compiler, it could create a new stack frame to handle the class members, or otherwise realize they're all contiguous for accesses.
 
I'm confused as to what you are asking.

At the very least, adding in a pointer dereference is costing you an additional operation. If your access patterns are bad, then you'll also be hitting main memory and missing the cache.

If you're asking whether it's faster to do:
int x1;
int x2;
... up to int x50;

or int x[50];

or a class with 50 member variables...

Well it depends. On an optimizing compiler, all 3 might be equal.
Otherwise, I'd expect the array to be the fastest, and the class with 50 member variables to be the slowest. Each member variable would require a dereference of a memory location, followed by the operation you want to do. On an optimizing compiler, it could create a new stack frame to handle the class members, or otherwise realize they're all contiguous for accesses.

A class with 50 member variables is going to be just as fast as an array with 50 elements. In either case you are dereferencing memory to get to a memory location. In both cases, the memory is going to be a contiguous block (cache effective). This is before talking about optimizations.

Talking about optimizations, a class can be much faster than an array. This is because you can put access constraints on the class that can help the compiler out. For example, making a class immutable allows the compiler to do a whole lot of tricks with method calls and member accesses that it can't do with an array.
 
I'm not specifically talking about the cache penalty...

Dereferencing a pointer is a very quick operation, since it merely involves obtaining a memory address.

What I was addressing is that it is faster to read/copy/move a single int/float than an array of 100 items or a class with 50 members.

The array operation can potentially be optimized with SSE instructions. Other than that, there is no difference.
 
When I wonder what a particular piece of code is costing me, I just put it in a loop of one million or so iterations and then use the performance counter before and after the loop to tell me how long it takes. I've tested many simple things like function calls and type casts. Needless to say, they all cost pretty much nothing. But I've never seen any instance where using a pointer is slower; it has always been either the same or faster. What really costs time is anything to do with strings (other than simple char arrays), CString being the biggest offender. It actually takes 5000 times more CPU cycles just to assign a value to a CString than to a char* string.
 
Code:
std::string a;
a.size();

Code:
std::string a;
std::string *b = &a;
b->size();

produce functionally identical assembly on my machine.

The true cost of pointers comes in when data is not proximate to each other. For example, in linked lists. If the address is not in the cache, the processor will be forced to load a new section from memory. TheRyuu previously linked Agner's guide. Read that.
 
When I wonder what a particular piece of code is costing me, I just put it in a loop of one million or so iterations and then use the performance counter before and after the loop to tell me how long it takes.

That's not a very reliable way to test performance, FYI. For example, a piece of code that normally only runs a few times might have a very poor (say 50%) branch prediction rate. Loop the same code a million times so that the branch predictor has a chance to warm up, and it might easily hit a 95+% prediction rate. The same goes for cache performance. IOW, the first few iterations of the loop are likely to perform drastically worse than the overall average. A profiler is really the only way to test accurately.
 
Code:
std::string a;
a.size();

Code:
std::string a;
std::string *b = &a;
b->size();

produce functionally identical assembly on my machine.

I don't know what settings you compiled with, but it appears that the compiler has optimized everything away. Your main, if unoptimized, should contain at least 3 function calls: string::string(), string::size(), and string::~string().

An easier example is in C, compiled with -m32 for easier readability (IMO):

Code:
#include <stdio.h>

int main()
{
    int x = 10;
    printf("%d\n", x);
}

----

080483c4 <main>:
80483c4: 55                    push   %ebp
80483c5: 89 e5                 mov    %esp,%ebp
80483c7: 83 e4 f0              and    $0xfffffff0,%esp
80483ca: 83 ec 20              sub    $0x20,%esp
// set the value of x in memory
80483cd: c7 44 24 1c 0a 00 00  movl   $0xa,0x1c(%esp)
80483d5: b8 b4 84 04 08        mov    $0x80484b4,%eax
// load x from memory
80483da: 8b 54 24 1c           mov    0x1c(%esp),%edx
// places copy of x on the stack as the printf arg  
80483de: 89 54 24 04           mov    %edx,0x4(%esp)  
80483e2: 89 04 24              mov    %eax,(%esp)
80483e5: e8 0a ff ff ff        call   80482f4 <printf@plt>
80483ea: c9                    leave
80483eb: c3                    ret

Code:
#include <stdio.h>

int main()
{
    int x = 10;
    int* y = &x;
    printf("%d\n", *y);
}

----

080483c4 <main>:
80483c4: 55                    push   %ebp
80483c5: 89 e5                 mov    %esp,%ebp
80483c7: 83 e4 f0              and    $0xfffffff0,%esp
80483ca: 83 ec 20              sub    $0x20,%esp
// store x in memory
80483cd: c7 44 24 18 0a 00 00  movl   $0xa,0x18(%esp)
// get the address of x
80483d5: 8d 44 24 18           lea    0x18(%esp),%eax
// store the address of x in y
80483d9: 89 44 24 1c           mov    %eax,0x1c(%esp)
// load y from memory
80483dd: 8b 44 24 1c           mov    0x1c(%esp),%eax
// load *y from memory
80483e1: 8b 10                 mov    (%eax),%edx
80483e3: b8 c4 84 04 08        mov    $0x80484c4,%eax
// copy *y on the stack as a printf arg
80483e8: 89 54 24 04           mov    %edx,0x4(%esp)
80483ec: 89 04 24              mov    %eax,(%esp)
80483ef: e8 00 ff ff ff        call   80482f4 <printf@plt>
80483f4: c9                    leave
80483f5: c3                    ret
 
In my experience the cost of pointers is pretty small compared to the bad access patterns you can fall into because of them. Only a few times, when doing some numerically intensive work, has there been a small but noticeable performance hit (depending on the application, you may be able to mitigate it with templates). Out of curiosity, what sort of problem are you planning on solving?

Also, don't forget to turn on optimization when benchmarking, since I'm assuming you do that when actually running your application.
 