Originally posted by: Sassy Rabbit
Originally posted by: CTho9305
Originally posted by: Sassy Rabbit
Originally posted by: CTho9305
I asked that question because many CPUs lack the ability to predict indirect branches. If the functions are very short, the call overhead is going to be significant, since an indirect function call will act like a branch that's mispredicted nearly 100% of the time.
AHH ok - the functions are fairly short, between 10 and 100 lines long.
But function pointers seem like a more efficient way of calling them than going through a potentially large chain of evaluations... especially since I need to iterate over an NxN matrix each time step, and each iteration involves several of these function calls.
That many lines of C++ is likely long enough that the overhead won't be bad.
Oh - the architecture I run on is almost strictly AMD 64-bit processors, if that makes a difference.
Barcelona/Phenom are the first AMD CPUs that can predict indirect branches.
RealWorldTech has some stuff on it that's worth reading.
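To make the dispatch pattern under discussion concrete, here's a minimal sketch of the function-pointer setup Sassy Rabbit described: each cell of the grid is tagged with a rule index, and the inner loop calls through a pointer table. The rule names and the per-cell update scheme are made up for illustration; the point is that the call on the last line is the indirect branch a pre-Barcelona AMD CPU can't predict when the target varies from cell to cell.

```cpp
#include <cstddef>

// Hypothetical per-cell update rules (names invented for illustration).
double rule_linear(double x)    { return 0.5 * x; }
double rule_quadratic(double x) { return x * x; }
double rule_decay(double x)     { return x * 0.9; }

using RuleFn = double (*)(double);

// Dispatch table: each cell stores an index into this array.
constexpr RuleFn kRules[] = { rule_linear, rule_quadratic, rule_decay };

// One time step over an n x n grid, flattened to a 1-D array.
void step(double* grid, const int* rule_of, std::size_t n) {
    for (std::size_t i = 0; i < n * n; ++i) {
        // Indirect call: the branch target depends on rule_of[i],
        // so a CPU without an indirect-branch predictor mispredicts
        // whenever the rule changes between consecutive cells.
        grid[i] = kRules[rule_of[i]](grid[i]);
    }
}
```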
Originally posted by: Markbnj
Originally posted by: Sassy Rabbit
Originally posted by: Markbnj
My CPU optimization chops are woefully ancient at this point, but since you're calling the same relatively small chunks of code over and over, won't most of them end up in the L1 cache anyway?
OK - I admit this is losing me a little. I've never had formal computer science training... I'm in chemical engineering and I do simulation work, so all I know about C++ I taught myself writing simulation software. Anyway, I'm not sure how the L1 cache would affect my program...
It's a matter of whether the CPU has to fetch the next few instructions from memory (slooow), the L2 cache (faster), or the L1 cache (fastest). Branch prediction is basically all about having the code that will be needed next already prefetched from memory into the cache. But if you are hitting the same small set of functions over and over, most of their instructions will likely get into the cache and stay there.
Branch prediction and caches are somewhat orthogonal. The cache issue is, "are the instructions accessible quickly", and the branch prediction issue is, "do I know which instruction is coming next?". A reasonably small program will probably fit completely in the L1 I-cache (64KB on AMD CPUs), but even if the program is entirely in cache, a K8 is still going to eat a ~12 cycle penalty (I'm not sure of the exact number) when it hits an indirect branch that goes to a different place each time.
Originally posted by: Sassy Rabbit
OK - that is really cool to know. Now, this is probably the tougher question: how could I determine whether a set of if..else if.. statements (some with complex logic) or an array of function pointers would be faster? On one hand, once the if statements are evaluated they go straight to the right function (though I'm not sure where it would be stored); on the other, I could go through many evaluations before I find the right subroutine.
The easiest way to find out would be to try them all. It depends on a lot of factors, so working it out would be non-trivial and you'd need to understand a lot of subtleties about your CPU and your program.
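"Try them all" can be as simple as writing both dispatch styles against the same tag data and timing them with std::chrono. The sketch below is a toy harness, not a rigorous benchmark - the rule functions and workload are invented, and real results will depend on your compiler flags, CPU, and how predictable your tag sequence is.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Two hypothetical update rules, just for illustration.
static double rule_a(double x) { return x + 1.0; }
static double rule_b(double x) { return x * 2.0; }

using Fn = double (*)(double);
static const Fn table[] = { rule_a, rule_b };

// Version 1: branch on a tag with if/else.
double run_branches(const std::vector<int>& tags, double x) {
    for (int t : tags) {
        if (t == 0) x = rule_a(x);
        else        x = rule_b(x);
    }
    return x;
}

// Version 2: indirect call through a function-pointer table.
double run_pointers(const std::vector<int>& tags, double x) {
    for (int t : tags) x = table[t](x);
    return x;
}

// Time one version; printing the result keeps the compiler
// from optimizing the whole loop away.
template <typename F>
double time_it(F f, const char* label) {
    auto t0 = std::chrono::steady_clock::now();
    double r = f();
    auto t1 = std::chrono::steady_clock::now();
    long long us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
    std::printf("%s: %lld us (result %f)\n", label, us, r);
    return r;
}
```

Usage would be something like `time_it([&]{ return run_branches(tags, 1.0); }, "if/else");` versus the pointer version on the same `tags` vector. Run each variant several times and ignore the first pass, which pays the cache-warming cost the thread discussed above.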
...but before you do that, you should use a profiler to check if you're really spending much of your execution time on the branching part. If 90% of the time is spent elsewhere, you'll get at most a 10% speedup.