C++ Pointers to Functions question

Sassy Rabbit

Member
Sep 7, 2007
89
0
0
I have several functions in a program I am working on which are called under different conditions. Is it faster to set up an array of these functions and call them through a simple lookup, or to use a series of if/else-if/else-if etc. statements, or is the speed difference negligible?

I believe it would be faster to use the pointer array.

Thanks ahead of time for the input.....
 

Markbnj

Elite Member / Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
Function pointers are a pretty typical approach for a state machine, which is what your mechanism sounds like, but I don't think performance is the driving factor. Rather, function pointers are used to decouple code at compile time and make it possible to add states without changing the core logic, assuming all the functions share the same signature. Inheritance and virtual functions can be used to get the same advantages.
 

Sassy Rabbit

Member
Sep 7, 2007
89
0
0
Well, performance is a bit of an issue - this program will probably take days to run, so if function pointers were significantly slower they wouldn't be a great choice. That said, I wanted to use that structure for the adaptability you described, provided I don't sacrifice computation time.

Thanks for the feedback tho....
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
How big are the functions being called? Is a lot of the 2 days likely to be spent in the overhead of a poor choice in this part of the program, or is the vast majority of the time spent elsewhere? How many possible functions could you be calling? What CPU are you targeting?
 

Markbnj

Elite Member / Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
Originally posted by: Sassy Rabbit
Well, performance is a little bit of an issue - this program will probably take days to run, so if it is significantly slower it wouldn't be a great choice, however I wanted to use that structure for the adaptability as you had stated if I don't sacrifice computation time.

Thanks for the feedback tho....

It's really going to be more about the efficiency of the overall algorithm. Virtual functions are called through a pointer, for example, and eventually all function calls map into transfers of control to an address. The only overhead would be how much work the compiler has to do to get the address and push the params onto the stack. I doubt simply using function pointers will have any impact at all.
 

Sassy Rabbit

Member
Sep 7, 2007
89
0
0
It's a simulation - so each time step calls N^2 functions (where N is the number of bodies in my simulation). Right now, the way the program was originally written before I inherited it, it consists of a lot of if/else-if/else-if etc. constructions, which seems slow to me, especially as the number of body types increases.
 

Sassy Rabbit

Member
Sep 7, 2007
89
0
0
Thanks - that was what I had been thinking, but I had wanted someone else to tell me I was thinking the right way before I undertake an overhaul.
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Function pointers are great when your routine does not know at "development" time which function will be needed.

Using them otherwise may make the code look pretty, but it is a pain to read and understand the logic path unless a debugger is used.

As Markbnj indicated, it all ends up as a transfer of control to an address in the end. Speed will not be impacted, and the memory footprint of the code would be negligible unless you have a few thousand if/case statements being tested.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
I asked the question I asked because many CPUs lack the ability to predict indirect branches. If the functions are very short, the overhead is going to be significant, since the function call will act like a branch that's mispredicted nearly 100% of the time.
 

Sassy Rabbit

Member
Sep 7, 2007
89
0
0
Originally posted by: CTho9305
I asked the question I asked because many CPUs lack the ability to predict indirect branches. If the functions are very short, the overhead is going to be significant, since the function call will act like a branch that's mispredicted nearly 100% of the time.

Ah, OK - the functions are fairly short, between 10 and 100 lines long.

But function pointers seem a more efficient way of calling them than going through a potentially long chain of evaluations, especially since I need to iterate over an N×N matrix each time step, and each iteration involves several of these function calls.
 

EagleKeeper

Discussion Club Moderator / Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Originally posted by: Sassy Rabbit
Originally posted by: CTho9305
I asked the question I asked because many CPUs lack the ability to predict indirect branches. If the functions are very short, the overhead is going to be significant, since the function call will act like a branch that's mispredicted nearly 100% of the time.

AHH ok - the functions are fairly short between 10 and 100 lines long.

But function pointers seem a more efficient way of calling them rather than going through a potentially large evaluation.....especially I need to iterate an NXN matrix each time step, and each iteration would involve several of these function calls.

Using a switch/case statement will be more efficient in terms of time than a series of if/else-ifs.

The compiler sets up a single indexed branch based on the case value.

 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Sassy Rabbit
Originally posted by: CTho9305
I asked the question I asked because many CPUs lack the ability to predict indirect branches. If the functions are very short, the overhead is going to be significant, since the function call will act like a branch that's mispredicted nearly 100% of the time.

AHH ok - the functions are fairly short between 10 and 100 lines long.

But function pointers seem a more efficient way of calling them rather than going through a potentially large evaluation.....especially I need to iterate an NXN matrix each time step, and each iteration would involve several of these function calls.

That many lines of C++ is likely long enough that the overhead won't be bad.
 

Markbnj

Elite Member / Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
My CPU optimization chops are woefully ancient at this point, but since you're calling the same relatively small chunks of code over and over, won't most of them end up in the L1 cache anyway?
 

Sassy Rabbit

Member
Sep 7, 2007
89
0
0
Originally posted by: Markbnj
My CPU optimization chops are woefully ancient at this point, but since you're calling the same relatively small chunks of code over and over, won't most of them end up in the L1 cache anyway?

OK - I admit this is losing me a little. I've never had formal computer science training....I'm in chemical engineering and I do simulation work. So all I know about C++ I learned myself from writing simulation software. Anyway, I'm not sure how the L1 cache would affect my program...
 

Sassy Rabbit

Member
Sep 7, 2007
89
0
0
Originally posted by: CTho9305
Originally posted by: Sassy Rabbit
Originally posted by: CTho9305
I asked the question I asked because many CPUs lack the ability to predict indirect branches. If the functions are very short, the overhead is going to be significant, since the function call will act like a branch that's mispredicted nearly 100% of the time.

AHH ok - the functions are fairly short between 10 and 100 lines long.

But function pointers seem a more efficient way of calling them rather than going through a potentially large evaluation.....especially I need to iterate an NXN matrix each time step, and each iteration would involve several of these function calls.

That many lines of C++ is likely long enough that the overhead won't be bad.

Oh - the architecture I run on is almost strictly AMD 64-bit processors, if that makes a difference.
 

Markbnj

Elite Member / Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
Originally posted by: Sassy Rabbit
Originally posted by: Markbnj
My CPU optimization chops are woefully ancient at this point, but since you're calling the same relatively small chunks of code over and over, won't most of them end up in the L1 cache anyway?

OK - I admit this is losing me a little. I've never had formal computer science training....I'm in chemical engineering and I do simulation work. So all I know about C++ I learned myself from writing simulation software. Anyway, I'm not sure how the L1 cache would affect my program...

It's a matter of whether the CPU has to fetch the next few instructions from memory (slooow), the L2 cache (faster), or the L1 cache (fastest). Branch prediction is basically all about having the code that will be needed next already prefetched from memory into the cache. But if you are hitting the same small set of functions over and over, most of their instructions will likely get into the cache and stay there.
 

Sassy Rabbit

Member
Sep 7, 2007
89
0
0
Originally posted by: Markbnj
Originally posted by: Sassy Rabbit
Originally posted by: Markbnj
My CPU optimization chops are woefully ancient at this point, but since you're calling the same relatively small chunks of code over and over, won't most of them end up in the L1 cache anyway?

OK - I admit this is losing me a little. I've never had formal computer science training....I'm in chemical engineering and I do simulation work. So all I know about C++ I learned myself from writing simulation software. Anyway, I'm not sure how the L1 cache would affect my program...

It's a matter of whether the CPU has to fetch the next few instructions from memory (slooow) the L2 cache (faster), or the L1 cache (fastest). Branch prediction is basically all about having the code that will be needed next already prefetched from memory into the cache. But if you are hitting the same small set of functions over and over much of their instructions will likely get into the cache and stay there.

OK - that is really cool to know. Now, this is probably the tougher question: how could I determine whether a set of if/else-if statements (some with complex logic) or an array of function pointers would be faster? On one hand, once the if statements are evaluated they jump to the right function (though I'm not sure where it would be stored); on the other, I could go through many evaluations before finding the right subroutine.

And yes - due to the nature of the simulation, the same subroutines - ten or so - are called over and over again.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Sassy Rabbit
Originally posted by: CTho9305
Originally posted by: Sassy Rabbit
Originally posted by: CTho9305
I asked the question I asked because many CPUs lack the ability to predict indirect branches. If the functions are very short, the overhead is going to be significant, since the function call will act like a branch that's mispredicted nearly 100% of the time.

AHH ok - the functions are fairly short between 10 and 100 lines long.

But function pointers seem a more efficient way of calling them rather than going through a potentially large evaluation.....especially I need to iterate an NXN matrix each time step, and each iteration would involve several of these function calls.

That many lines of C++ is likely long enough that the overhead won't be bad.

Oh the architecture I run on is almost strictly AMD 64bit processors if that makes a difference.

Barcelona/Phenom are the first AMD CPUs that can predict indirect branches. RealWorldTech has some stuff on it that's worth reading.

Originally posted by: Markbnj
Originally posted by: Sassy Rabbit
Originally posted by: Markbnj
My CPU optimization chops are woefully ancient at this point, but since you're calling the same relatively small chunks of code over and over, won't most of them end up in the L1 cache anyway?

OK - I admit this is losing me a little. I've never had formal computer science training....I'm in chemical engineering and I do simulation work. So all I know about C++ I learned myself from writing simulation software. Anyway, I'm not sure how the L1 cache would affect my program...

It's a matter of whether the CPU has to fetch the next few instructions from memory (slooow) the L2 cache (faster), or the L1 cache (fastest). Branch prediction is basically all about having the code that will be needed next already prefetched from memory into the cache. But if you are hitting the same small set of functions over and over much of their instructions will likely get into the cache and stay there.

Branch prediction and caches are somewhat orthogonal. The cache issue is, "are the instructions accessible quickly", and the branch prediction issue is, "do I know which instruction is coming next?". A reasonably small program will probably fit completely in the L1 I-cache (64KB on AMD CPUs), but even if the program is entirely in cache, a K8 is still going to eat a ~12 cycle penalty (I'm not sure of the exact number) when it hits an indirect branch that goes to a different place each time.

Originally posted by: Sassy Rabbit
OK - that is really cool to know. Now, this is probably the tougher question. How could I determine if using a set of if..else if.. etc statements, some with complex logic or an array of function pointers would be the fastest. On one hand - once the if statements are evaluated they would go to the right function (not sure where it would be stored though), on the other, I could go through many evaluations before I find the right subroutine.

The easiest way to find out would be to try them all. It depends on a lot of factors, so working it out would be non-trivial and you'd need to understand a lot of subtleties about your CPU and your program.

...but before you do that, you should use a profiler to check if you're really spending much of your execution time on the branching part. If 90% of the time is spent elsewhere, you'll get at most a 10% speedup.
 

Markbnj

Elite Member / Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
Branch prediction and caches are somewhat orthogonal. The cache issue is, "are the instructions accessible quickly", and the branch prediction issue is, "do I know which instruction is coming next?".

I was thinking the next part of that sentence would be "so I can put it in the cache and access it quickly." Optimization at this level is beyond my expertise on modern CPUs. Pretty cool stuff, though :).
 

Sassy Rabbit

Member
Sep 7, 2007
89
0
0
Originally posted by: CTho9305


Originally posted by: Markbnj
It's a matter of whether the CPU has to fetch the next few instructions from memory (slooow) the L2 cache (faster), or the L1 cache (fastest). Branch prediction is basically all about having the code that will be needed next already prefetched from memory into the cache. But if you are hitting the same small set of functions over and over much of their instructions will likely get into the cache and stay there.

Branch prediction and caches are somewhat orthogonal. The cache issue is, "are the instructions accessible quickly", and the branch prediction issue is, "do I know which instruction is coming next?". A reasonably small program will probably fit completely in the L1 I-cache (64KB on AMD CPUs), but even if the program is entirely in cache, a K8 is still going to eat a ~12 cycle penalty (I'm not sure of the exact number) when it hits an indirect branch that goes to a different place each time.

So if you happen to call the functions in the same order each time the routine that assesses the state is called, would that help keep the instructions in the cache? Likewise, if the same one happened to be called over and over again?
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Sassy Rabbit
Originally posted by: CTho9305


Originally posted by: Markbnj
It's a matter of whether the CPU has to fetch the next few instructions from memory (slooow) the L2 cache (faster), or the L1 cache (fastest). Branch prediction is basically all about having the code that will be needed next already prefetched from memory into the cache. But if you are hitting the same small set of functions over and over much of their instructions will likely get into the cache and stay there.

Branch prediction and caches are somewhat orthogonal. The cache issue is, "are the instructions accessible quickly", and the branch prediction issue is, "do I know which instruction is coming next?". A reasonably small program will probably fit completely in the L1 I-cache (64KB on AMD CPUs), but even if the program is entirely in cache, a K8 is still going to eat a ~12 cycle penalty (I'm not sure of the exact number) when it hits an indirect branch that goes to a different place each time.

So if you happen to call the functions in the same order each time a routine that assess the state is called - would that help keep the instructions stored in the Cache? Likewise, if the same one happened to be called over and over again?

Calling the same one over and over again: yes. Calling them in the same order: unlikely to matter for the cache, though a consistent order is potentially good for branch prediction. Really, though, make sure your program is burning enough time here that it matters before spending too much effort optimizing this one portion.