How does the number of arguments to a function affect performance?

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Hi all,
How (if at all) does the number of arguments to a function affect performance? For example, is it OK to have an argument list that's 15 items long? (I don't think I have any that are quite 15, but just for the sake of argument.)

Does the size of the arguments affect the answer to my question? (i.e. is it different to pass 10 doubles vs 10 ints vs 10 pointers vs 10 strings, etc or some mix of the previous)

If there is some kind of practical upper limit on the # of arguments, would it be preferable to load arguments into a struct and pass that?

Thanks,
-Eric
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
At the lowest possible level, limiting arguments to the number of machine registers is beneficial. It takes a bloody smart compiler for it to matter much at all, however (i.e. a compiler that flexibly register-allocates across function call boundaries -- going beyond 'convention' here). If arguments are not passed in registers, there is a small, fixed cost per argument: pushing it onto the stack in the caller and retrieving it from the stack in the callee. So each argument slows the act of calling down a tiny, tiny bit -- nominally one store per argument, plus a small fixed cost of setting up the stack frame (amortized over all the arguments), plus one load to retrieve the argument in the callee, which is fairly likely to hit in the processor's store buffer since it is temporally related to the original store.

There is no real benefit, besides perhaps readability and encapsulation, to putting all the variables in a struct and passing that instead -- the compiler does something very like that anyway on the stack, and probably does it better than most programmers can.

More important than the number is the type: basically any primitive type is OK (ints, bools, floats, doubles -- I'm assuming C++ here). With non-primitive types you must be very careful. In particular, pass pointers or references to prevent objects from being copied (unless you want them to be copied).

i.e., A (below) is faster than B (below)

void myFuncA( const string &str1, const string &str2, ... );
void myFuncB( string str1, string str2, ... );
 

dinkumthinkum

Senior member
Jul 3, 2008
203
0
0
First things first: I wouldn't worry about this sort of thing in 95% of cases [insert diatribe about premature optimization], and you'd probably be best served by using inline functions when it does matter (inner loops, say).

The size of each individual argument does matter. Four separate ints, or a struct of four ints passed by value -- either way the same amount of space has to go somewhere. On the AMD64 architecture, the System V C calling convention passes the first integer arguments in registers rdi, rsi, rdx, rcx, r8, and r9, and puts the rest on the stack. So you can store four int arguments in the lower portions of rdi, rsi, rdx, and rcx. Or if you put them in a struct, the compiler will split it up for you. As degibson noted, a smart compiler can do 'interesting' things like create alternative entry points with faster conventions or better packing. But you are still subject to the frame-setup stack accesses and the price of any caller/callee-saved registers that must be preserved.

Your best bet is inlining functions called on a fast path.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
At some point it's better for maintainability to collect your items into a class and add member functions that operate on existing member variables, which then don't need to be passed as arguments.

instead of spaceship_add_delta( delta_x, delta_y, delta_z, x, y, z, inertia_x, inertia_y, inertia_z, ... )

use SpaceObject::AddDelta( delta_x, delta_y, delta_z );

That way you're only passing the new information, not the existing state.
 

Cogman

Lifer
Sep 19, 2000
10,284
138
106
Originally posted by: GodlessAstronomer
The first question you need to ask yourself is, is this optimisation worthwhile?

aye, and that is only visible through profiling.

Since this is your second thread on optimization, I would say you should consider a few techniques.

1. Loop unrolling. Due to pipelining, it can sometimes be beneficial to change a statement like
for (int i = 0; i < 100; ++i)
{
someArray[ i ] = i << 2;
}
to
for (int i = 0; i < 100; i += 2)
{
someArray[ i ] = i << 2;
someArray[ i + 1 ] = ( i + 1 ) << 2;
}

many compilers will do that for you. Also, limiting the number of function calls can be a great speed up. For example

for (int i = 0; i < something.size(); ++i)
to
int size = something.size();
for (int i = 0; i < size; ++i)

One other thing that improves speed is reducing the number of memory references by storing stuff into local variables.

Most of the time this isn't possible. However, sometimes it is. So for example.
for (int i = 0; i < 100; ++i)
{
array[7] += i * someValue;
}
to
int accum = array[7];
for (int i = 0; i < 100; ++i)
{
accum += i * someValue;
}
array[7] = accum;

There are other optimizations (i.e. making your code more cache-friendly), but be aware that in almost every case they make the code less readable. Don't optimize every function; only optimize the functions where your program is spending the most time.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: GodlessAstronomer
The first question you need to ask yourself is, is this optimisation worthwhile?

If you don't learn about the benefits of various optimizations, you'll never know.
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Originally posted by: degibson
At the lowest possible level, limiting arguments to the number of machine registers is beneficial. It takes a bloody smart compiler for it to matter much at all, however (i.e. a compiler that flexibly register-allocates across function call boundaries -- going beyond 'convention' here). If arguments are not passed in registers, there is a small, fixed cost per argument: pushing it onto the stack in the caller and retrieving it from the stack in the callee. So each argument slows the act of calling down a tiny, tiny bit -- nominally one store per argument, plus a small fixed cost of setting up the stack frame (amortized over all the arguments), plus one load to retrieve the argument in the callee, which is fairly likely to hit in the processor's store buffer since it is temporally related to the original store.

There is no real benefit, besides perhaps readability and encapsulation, to putting all the variables in a struct and passing that instead -- the compiler does something very like that anyway on the stack, and probably does it better than most programmers can.

More important than the number is the type: basically any primitive type is OK (ints, bools, floats, doubles -- I'm assuming C++ here). With non-primitive types you must be very careful. In particular, pass pointers or references to prevent objects from being copied (unless you want them to be copied).

i.e., A (below) is faster than B (below)

void myFuncA( const string &str1, const string &str2, ... );
void myFuncB( string str1, string str2, ... );

On the 2nd point (size)... first, I'm in C, not C++ (not sure how much that matters). Second, yup I only ever pass primitives by value; everything else gets a pointer.

On the 1st: sounds like there's not much use in optimising this then. The only functions i have with long argument lists are only called a few times.

Also, I'm the one that made the thread about inlining :)

GodlessAstronomer: given what degibson has said, it seems like the answer is 'Nope'.

Cogman: in your 2nd example, why is the new version any better than the old? There's still 1 function call. (I mean this is assuming the only reason I need size is to set the loop limits. If I need the size for other things, then I will store it.)

In your 3rd example: I had always assumed that the compiler takes care of this for you (i.e. multiple accesses to the same array location don't always get compiled as multiple accesses). Doh. I guess I'll start doing that myself then.

 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: eLiu
Also, I'm the one that made the thread about inlining :)
Heh, I seldom notice the actual poster names.

Cogman: in your 2nd example, why is the new version any better than the old? There's still 1 function call. (I mean this is assuming the only reason I need size is to set the loop limits. If I need the size for other things, then I will store it.)

My name isn't "Cogman", but I do love Cogman's example, so I'm going to chime in anyway.

The loop in question has been unrolled. That is, there are more memory accesses per branch in the assembly, meaning there is more useful computation going on (i.e. filling the array) per overhead instruction (branches and loop induction).
 

Cogman

Lifer
Sep 19, 2000
10,284
138
106
Originally posted by: degibson
Originally posted by: eLiu
Also, I'm the one that made the thread about inlining :)
Heh, I seldom notice the actual poster names.

Cogman: in your 2nd example, why is the new version any better than the old? There's still 1 function call. (I mean this is assuming the only reason I need size is to set the loop limits. If I need the size for other things, then I will store it.)

My name isn't "Cogman", but I do love Cogman's example, so I'm going to chime in anyway.

The loop in question has been unrolled. That is, there are more memory accesses per branch in the assembly, meaning there is more useful computation going on (i.e. filling the array) per overhead instruction (branches and loop induction).

Exactly. Essentially it reduces the number of times the branch is checked by a factor of 2. Branches can be expensive, so reducing them is beneficial. It also reduces the number of times the comparison is run, so you get a two-instruction decrease per iteration.

However, like I said in the post, many compilers have automatic loop unrolling, so be aware that the compiler might already have that option available to you (I haven't actually looked at how good the generated assembly is, so I can't vouch for it completely).
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Originally posted by: Cogman

Exactly. Essentially it reduces the number of times the branch is checked by a factor of 2. Branches can be expensive, so reducing them is beneficial. It also reduces the number of times the comparison is run, so you get a two-instruction decrease per iteration.



However, like I said in the post, many compilers have automatic loop unrolling, so be aware that the compiler might already have that option available to you (I haven't actually looked at how good the generated assembly is, so I can't vouch for it completely).

degibson: I just notated that part of the post w/Cogman's name b/c I was too lazy to go back and quote the post.

But continuing, I think possibly we're talking about different examples? I was referring to this:

for (int i = 0; i < something.size(); ++i)
to
int size = something.size();
for (int i = 0; i < size; ++i)

I didn't think this would speed up anything at first. But then it struck me that the compiler potentially doesn't know that something.size() isn't changing... so it gets re-evaluated every time. Is that correct? So declaring a size variable lets the compiler know that the number of looping steps isn't changed by the loop operations?


And on the topic of loop unrolling, how much should you unroll something? Like wouldn't it then be better to make 4 explicit statements (equivalent to setting i, i+1, i+2, and i+3) or just have all 100 things done explicitly?
 

Cogman

Lifer
Sep 19, 2000
10,284
138
106
Originally posted by: eLiu

degibson: I just notated that part of the post w/Cogman's name b/c I was too lazy to go back and quote the post.

But continuing, I think possibly we're talking about different examples? I was referring to this:

for (int i = 0; i < something.size(); ++i)
to
int size = something.size();
for (int i = 0; i < size; ++i)

I didn't think this would speed up anything at first. But then it struck me that the compiler potentially doesn't know that something.size() isn't changing... so it gets re-evaluated every time. Is that correct? So declaring a size variable lets the compiler know that the number of looping steps isn't changed by the loop operations?


And on the topic of loop unrolling, how much should you unroll something? Like wouldn't it then be better to make 4 explicit statements (equivalent to setting i, i+1, i+2, and i+3) or just have all 100 things done explicitly?

Yep, that's exactly why there is a speedup. Even if the code isn't changing the size of the object, the compiler has to assume that it might.

As for loop unrolling, it really varies from system to system. The only way to know how much is enough is by profiling on the target machine. Also note that there are cases where a 5x unroll will be slower than a 4x unroll, but a 6x unroll will be faster than the 4x.
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Originally posted by: Cogman

Yep, that's exactly why there is a speedup. Even if the code isn't changing the size of the object, the compiler has to assume that it might.

As for loop unrolling, it really varies from system to system. The only way to know how much is enough is by profiling on the target machine. Also note that there are cases where a 5x unroll will be slower than a 4x unroll, but a 6x unroll will be faster than the 4x.

Oh geez. Possibly I'll skip the loop unrolling for the most part b/c there isn't really a target system per se. Anyone with a C compiler and a good bit of RAM should be able to run our code, and our users have at least P4, Core, Core2, and K8 & K10 (aside: was there ever a K9?). OSes include Windows (XP... not aware of any vista) and various Linux distros. Compilers are all gcc and icc to my knowledge.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: eLiu
Oh geez. Possibly I'll skip the loop unrolling for the most part b/c there isn't really a target system per se. Anyone with a C compiler and a good bit of RAM should be able to run our code, and our users have at least P4, Core, Core2, and K8 & K10 (aside: was there ever a K9?). OSes include Windows (XP... not aware of any vista) and various Linux distros. Compilers are all gcc and icc to my knowledge.

99.999% of code should never be hand-optimized. If you need optimization, you'll know it. But it's still fun to talk about.
 

erwos

Diamond Member
Apr 7, 2005
4,778
0
76
Originally posted by: degibson
99.999% of code should never be hand-optimized. If you need optimization, you'll know it. But it's still fun to talk about.
Well, I disagree somewhat, but it's more of a nitpick. There are often very simple optimizations that you can make that will make your code run much faster. But I would _not_ suggest digging down and writing your own assembly code in the vast majority of cases. People who think they're smarter than the compiler, especially a good one like gcc... well, they're usually not. Optimize in the language you're writing in, in other words.

Worrying about the number of arguments you're passing a function does not strike me as a worthwhile optimization compared to making sure that the function is using that data as efficiently as it can. To my eyes, any work you do in realm of passing in variables should be to make sure that the code is actually maintainable. Functions with a dozen inputs are probably begging to be subdivided into further functions or have some of those inputs be combined into arrays or structs/classes.

I am not going to claim to be a performance guru, but I do have some professional experience in the area of high-performance, soft real-time computing.
 

Cogman

Lifer
Sep 19, 2000
10,284
138
106
Originally posted by: erwos
Originally posted by: degibson
99.999% of code should never be hand-optimized. If you need optimization, you'll know it. But it's still fun to talk about.
Well, I disagree somewhat, but it's more of a nitpick. There are often very simple optimizations that you can make that will make your code run much faster. But I would _not_ suggest digging down and writing your own assembly code in the vast majority of cases. People who think they're smarter than the compiler, especially a good one like gcc... well, they're usually not. Optimize in the language you're writing in, in other words.

Worrying about the number of arguments you're passing a function does not strike me as a worthwhile optimization compared to making sure that the function is using that data as efficiently as it can. To my eyes, any work you do in realm of passing in variables should be to make sure that the code is actually maintainable. Functions with a dozen inputs are probably begging to be subdivided into further functions or have some of those inputs be combined into arrays or structs/classes.

I am not going to claim to be a performance guru, but I do have some professional experience in the area of high-performance, soft real-time computing.

And what you are saying is really spot on. The first optimization should always be picking a good algorithm. After that, the rest is just nitpicking at tiny pieces of code. Some optimizations I would suggest all the time (e.g. if a function returns the same value every time it is called, call it once and store the result in a local variable rather than calling it several times).

It is possible to do better than GCC; however, GCC is getting better all the time. I just saw that GCC 4.4.0 added a ton of loop optimizations (like loop blocking). If your compiler will do it for you, let it :).