C/C++ prefix vs postfix increment

duragezic · Jun 30, 2007

A while ago I read on some C++ site that was fairly reputable that was basically said

prefix increment can be faster than postfix, but never slower, so you may as well use the prefix version if you don't care about the result. i.e. in a for() loop, ++i and i++ do the same thing.

Then a month or so later in my C++ class, my teacher (who was a grad student, not a professor yet) said the same thing basically.

Is this even true? Or is it just rule of thumb, or approximation? It might also depend on the compiler and platform. I would think it would be very hard to gauge a difference in speed with such a simple operation, but like it says, if it is at least as fast, wouldn't it be better to use? I guess the biggest problem is just the habit of people writing the postfix version more, such as in for loops.

I'm not too familiar with the inner workings of C or C++, but I have wondered if this can be considered true. I almost never see people use it though, so I'm inclined to believe that it is a myth?

dighn · Jun 30, 2007

I read this a long time ago too. I do the prefix increments out of habit. But either way I don't think it makes any difference in most applications. It's simply not worth it to optimize unnecessarily.

Besides, I would think any modern compiler would be able to optimize any unevaluated postfix increments.

engineereeyore · Jun 30, 2007

I call bull-crap on this. I could be wrong, but here's my logic.

The ++i creates a RAW data hazard, meaning the the assignment operation can not begin execution until the increment operation completes. For the i++ implementation, a RAR name dependency is created, which unlike a RAW hazard, is easily avoidable. The value can be both read and sent in for incrementation at the same time. So if anything, i++ is always faster.

If someone want to point out something I'm missing, I'd love to know. Just my opinion on the subject.

dighn · Jun 30, 2007

Originally posted by: engineereeyore
I call bull-crap on this. I could be wrong, but here's my logic.

The ++i creates a RAW data hazard, meaning the the assignment operation can not begin execution until the increment operation completes. For the i++ implementation, a RAR name dependency is created, which unlike a RAW hazard, is easily avoidable. The value can be both read and sent in for incrementation at the same time. So if anything, i++ is always faster.

If someone want to point out something I'm missing, I'd love to know. Just my opinion on the subject.

I looked more into this. It mostly have to do with operator overloading. When the post-operation is overloaded as a function, the return can't happen before the operation, so it would have to keep a copy of the original value first.

For primitive types however I don't see why the compiler would be restricted to that.

engineereeyore · Jun 30, 2007

Originally posted by: dighn
I looked more into this. It mostly have to do with operator overloading. When the post-operation is overloaded as a function, the return can't happen before the operation, so it would have to keep a copy of the original value first.

For primitive types however I don't see why the compiler would be restricted to that.

I think the problem is people associate fewer instructions with faster execution. On a modern superscaler architecture, multiple instructions are always being executed concurrently. So the number of instruction has almost no bearing on how quickly something is actually done. For instance, given

DIV R1,R2,R3

and

ADD R1,R2,R3
MULT R4,R5,R6
ADD R3,R4,R5

On almost any architecture, the second set of operations will be faster, simply due to the computation time of the DIV. Anyone who claims that one C++ instruction executes faster than another on all architectures scares me. I understand what they're saying, but everything I've read so far points to an idea presented by people who have absolutely no concept of actual microprocessor architectures.

Again, I could be wrong, but Tomasulo's algorithm, implemented on most modern microprocessors, should implement an i++ quicker than a ++i.

dighn · Jun 30, 2007

Again, I could be wrong, but Tomasulo's algorithm, implemented on most modern microprocessors, should implement an i++ quicker than a ++i.

It should be the same speed either way because in a for loop like that, the result isn't used in the same statement anyway. However in loops that use iterators instead of integer counters, the post vs. pre argument may hold depending on how well the compiler is able to optimize the overloaded operator function.

piasabird · Jul 1, 2007

If a compiler is good, it will pick that fastest method of implementation for the object code.

Ken g6 · Jul 1, 2007

Hmm. I just tried a little experiment with gcc -S. Even with no optimizations:

int i;for(i=0;i<10;++i);
and
int i;for(i=0;i<10;i++);

compile to the exact same assembly code! (On x86, version permitting; this was version 3.4.4 cygming special.)

OK, now that that's out of the way, f(++i) shouldn't compile the same as f(i++). Further, in this case, engineereyore is right. In the second case, the function doesn't depend on the incremented value, so both can execute in parallel. (Processor permitting, of course.)

Templeton · Jul 1, 2007

Originally posted by: Ken_g6
Hmm. I just tried a little experiment with gcc -S. Even with no optimizations:

int i;for(i=0;i<10;++i);
and
int i;for(i=0;i<10;i++);

compile to the exact same assembly code! (On x86, version permitting; this was version 3.4.4 cygming special.)

OK, now that that's out of the way, f(++i) shouldn't compile the same as f(i++). Further, in this case, engineereyore is right. In the second case, the function doesn't depend on the incremented value, so both can execute in parallel. (Processor permitting, of course.)

The issue isn't with primative types, there's no reason that the compiler shouldn't generate the most efficient version for primatives if you don't care about the previous value. The issue arises with user defined types with overloaded pre/post increment operators (such as iterators) where the post increment version will generate a needless temporary. Whether the compiler is able to optimize the contstruction of this temporary away is a question for your particular compiler, but it's certainly not gaurenteed and I would think of it being even less likely for non inline operators. If you have no need for the previous value, use preincrement version as it will never be any slower but could definitely be faster when dealing with user defined types.

smack Down · Jul 1, 2007

I don't think it matters,
j = i++; means copy the value of i, then increment i.
j = ++i; means increment the value of i, then copy i.

So both do the same basic operations so in an unoptimized complier both would use the same instructions just in a different order. In theory an out of order processor would be able to better optimizes a copy before increment because it doesn't need to wait for i + 1 to be calculated.

In the case of:
i++
++i

Don't use the return value and it should be easier to optimisms away the copy in ++i because the copy operation would be closer to the place where the value is discarded. To say one is always faster then would require more knowledge of the system.

Templeton · Jul 1, 2007

Originally posted by: smack Down
I don't think it matters,
j = i++; means copy the value of i, then increment i.
j = ++i; means increment the value of i, then copy i.

So both do the same basic operations so in an unoptimized complier both would use the same instructions just in a different order. In theory an out of order processor would be able to better optimizes a copy before increment because it doesn't need to wait for i + 1 to be calculated.

In the case of:
i++
++i

Don't use the return value and it should be easier to optimisms away the copy in ++i because the copy operation would be closer to the place where the value is discarded. To say one is always faster then would require more knowledge of the system.

No, consider the overloaded operators for some class A

In order to properly implement a postfix increment for a user defined type, a copy of the object needs to be made and returned back to the caller. As for whether this temporary copy can be optimized out if not used, that's going to depend entirely on how good your compiler is. (And I would bet that none are good enough to optimize it out if these operators are non inline in a seperately compiled library)

Markbnj · Jul 1, 2007

This condition of postfix increment on objects of class type has been known for years. It doesn't matter at all for scalar values (unless logically important in the order of execution). For scalars in the examples shown the compiler will generate the exact same code.

For class types, it is only one of a number of situations in which you have to be sensitive to the creation of temporaries. Nearly every pass-by-value situation represents another.

engineereeyore · Jul 1, 2007

Originally posted by: dighn
It should be the same speed either way because in a for loop like that, the result isn't used in the same statement anyway. However in loops that use iterators instead of integer counters, the post vs. pre argument may hold depending on how well the compiler is able to optimize the overloaded operator function.

Well, that depends. If you're using the increment operator inside of an array reference, i++ is going to work faster. I know ++i and i++ would reference two different values in an array, but you can typically change how you're referencing things to make it work.

I'm still not sure if I buy it or not though. I definitely don't agree that i++ will never be faster, because that's just plain wrong. But ++i sometimes being faster, I don't know. I'll have to research this a little more.

dighn · Jul 1, 2007

Originally posted by: engineereeyore
<div class="FTQUOTE"><begin quote>Originally posted by: dighn
It should be the same speed either way because in a for loop like that, the result isn't used in the same statement anyway. However in loops that use iterators instead of integer counters, the post vs. pre argument may hold depending on how well the compiler is able to optimize the overloaded operator function.</end quote></div>

Well, that depends. If you're using the increment operator inside of an array reference, i++ is going to work faster. I know ++i and i++ would reference two different values in an array, but you can typically change how you're referencing things to make it work.

I'm still not sure if I buy it or not though. I definitely don't agree that i++ will never be faster, because that's just plain wrong. But ++i sometimes being faster, I don't know. I'll have to research this a little more.

Yeah it definitely depends on the loop. I was referring to the typical scenario (well to me anyway) where you have for(int i = 0; i < somevalue; ++i){...}
In this case if i is an object, then ++i definitely should be faster ignoring optimization.

mlm · Jul 1, 2007

We discussed this a couple of weeks ago in the STL class that I'm taking. Basically, the postfix ++ is more costly because it has to create a copy to return. Prefix doesn't make any copies.

If it doesn't matter to the code, the compiler will change postfixes to prefixes, but if it does matter, the compiler will keep it as-is. I can't remember any examples of that though

engineereeyore · Jul 2, 2007

Here's an example I found of why people believe this to be true. Consider this example:

void PrintElement(const std::vector<int>::iterator& i)
{
std::cout << *i << " ";
}

int main()
{
std::vector<int> v;

v.push_back(1); v.push_back(2); v.push_back(3);

std::vector<int>::iterator i = v.begin();

while (i != v.end())
PrintElement( i++ );

while (i != v.begin())
PrintElement( --i );
}

The person that posted this stated that in order for the first Print Element to run, it required the use of a temporary to hold the original value of i to be sent to the PrintElement function, while i itself was incremented. The second implementation would therefore run faster because the temporary was not required.

THIS IS ABSOLUTE IDIOCY!

While the logic behind what is being said is true, the effect of this implementation on the speed of the program is absurd. The first implementation, even if it required the use of a temporary, would still run just as fast, if not faster. Why? Because, as I said before, the prefix operator introduces a RAW (read-after-write) data hazard, which introduces a single cycle delay in ANY microprocessor architecture. The postfix creates a RAR (read-after-read) name dependency, something very easily avoided and introducing zero delays.

So people can claim whatever they wish, but even using an object, this is not true. Again, just my understanding of the subject, but unless I'm missing something, this claim is complete crap.

dandragonrage · Jul 2, 2007

That's BS. You don't need a temporary. You call PrintElement THEN you INC i. Though if you were passing in a pointer to an int and the function modified that int, it could be a different story. I'm not sure if that behavior is actually defined, though. I don't think the spec defines when the increment is done, but I could be wrong.

mlm · Jul 2, 2007

<div class="FTQUOTE"><begin quote>Originally posted by: engineereeyore
Here's an example I found of why people believe this to be true. Consider this example:
</end quote></div>

If you do a trace-through, you're calling different functions. This wouldn't make much of a difference with built-in types, but if it's a costly user-defined type, prefix is definitely a better choice.

But of course, it also depends on how your compiler interprets the increments.

_Myt operator++(int)
{ // postincrement
_Myt _Tmp = *this;
++*this;
return (_Tmp);
}

_Myt& operator--()
{ // predecrement
--(*(_Mybase *)this);
return (*this);
}

Edit: Whoops, copied the wrong ones

dighn · Jul 2, 2007

Originally posted by: engineereeyore
Here's an example I found of why people believe this to be true. Consider this example:

void PrintElement(const std::vector<int>::iterator& i)
{
std::cout << *i << " ";
}

int main()
{
std::vector<int> v;

v.push_back(1); v.push_back(2); v.push_back(3);

std::vector<int>::iterator i = v.begin();

while (i != v.end())
PrintElement( i++ );

while (i != v.begin())
PrintElement( --i );
}

The person that posted this stated that in order for the first Print Element to run, it required the use of a temporary to hold the original value of i to be sent to the PrintElement function, while i itself was incremented. The second implementation would therefore run faster because the temporary was not required.

THIS IS ABSOLUTE IDIOCY!

While the logic behind what is being said is true, the effect of this implementation on the speed of the program is absurd. The first implementation, even if it required the use of a temporary, would still run just as fast, if not faster. Why? Because, as I said before, the prefix operator introduces a RAW (read-after-write) data hazard, which introduces a single cycle delay in ANY microprocessor architecture. The postfix creates a RAR (read-after-read) name dependency, something very easily avoided and introducing zero delays.

So people can claim whatever they wish, but even using an object, this is not true. Again, just my understanding of the subject, but unless I'm missing something, this claim is complete crap.

The RAW/RAR issue is significantly overshadowed by the temporary creation. You are looking at this from a very low-level which doesn't really apply to the object case. The --/++ operations are proper, actual function calls and the way the language is structured necessarily requires the construction of a temporary object for the post-operator. Object creation is always relatively expensive, certainly when compared to the minute delay caused by data hazards.

PrinterElement(i++); --> the compiler calls the ++ function, which must increment i but also return the original value of i for passing into PrintElement() after the increment. copy must be made.

PrintElement(--i); --> -- function is called, which decrements i but can return the current state of the object.

engineereeyore · Jul 2, 2007

Originally posted by: dighn
The RAW/RAR issue is significantly overshadowed by the temporary creation. You are looking at this from a very low-level which doesn't really apply to the object case. The --/++ operations are proper, actual function calls and the way the language is structured necessarily requires the construction of a temporary object for the post-operator. Object creation is always relatively expensive, certainly when compared to the minute delay caused by data hazards.

PrinterElement(i++); --> the compiler calls the ++ function, which must increment i but also return the original value of i for passing into PrintElement() after the increment. copy must be made.

PrintElement(--i); --> -- function is called, which decrements i but can return the current state of the object.

But that is not at all true. When the processor creates a copy, it's not copying the entire data structure. i is nothing more than a pointer. Copying a pointer is a very simple process. You can't increment an object, only a pointer to an object. Therefore, what you are copying is a pointer and that operation, on current processors, is almost not even worth considering. Therefore, the RAR and RAW become very much a big deal as all that is copied is the pointer, not the structure.

I've asked this question of several other computer architects, many with Ph. D's, and they all agree. Prefix does not function faster. To a software person, it may appear so because of the use of a temporary, but to a hardware person who knows how these things are handled by the processor, this claims is absolutely false.

engineereeyore · Jul 2, 2007

Originally posted by: dandragonrage
That's BS. You don't need a temporary. You call PrintElement THEN you INC i. Though if you were passing in a pointer to an int and the function modified that int, it could be a different story. I'm not sure if that behavior is actually defined, though. I don't think the spec defines when the increment is done, but I could be wrong.

Agreed. Any compiler should be able to look at the use of the incremented value and determine that the function should be called first, thereby eliminating the need for a temp.

mlm · Jul 2, 2007

Originally posted by: engineereeyore
But that is not at all true. When the processor creates a copy, it's not copying the entire data structure. i is nothing more than a pointer. Copying a pointer is a very simple process. You can't increment an object, only a pointer to an object. Therefore, what you are copying is a pointer and that operation, on current processors, is almost not even worth considering. Therefore, the RAR and RAW become very much a big deal as all that is copied is the pointer, not the structure.

Sure you can. All you have to do is have ++ overloaded for that object.

engineereeyore · Jul 2, 2007

<div class="FTQUOTE"><begin quote>Originally posted by: mlm
<div class="FTQUOTE"><begin quote>Originally posted by: engineereeyore
But that is not at all true. When the processor creates a copy, it's not copying the entire data structure. i is nothing more than a pointer. Copying a pointer is a very simple process. You can't increment an object, only a pointer to an object. Therefore, what you are copying is a pointer and that operation, on current processors, is almost not even worth considering. Therefore, the RAR and RAW become very much a big deal as all that is copied is the pointer, not the structure.
</end quote></div>

Sure you can. All you have to do is have ++ overloaded for that object.</end quote></div>

True, if you wanted to do that you could. But the temporary used is still going to be just a pointer. That won't change anything.

EDIT: Matter of fact, that would make i++ even more efficient. i++ can create a temp, return it, and continue working on the i++, whereas ++i requires you to complete all operations pertaining to ++i before you can return the value.

mlm · Jul 2, 2007

Originally posted by: engineereeyore

Originally posted by: mlm

Originally posted by: engineereeyore
But that is not at all true. When the processor creates a copy, it's not copying the entire data structure. i is nothing more than a pointer. Copying a pointer is a very simple process. You can't increment an object, only a pointer to an object. Therefore, what you are copying is a pointer and that operation, on current processors, is almost not even worth considering. Therefore, the RAR and RAW become very much a big deal as all that is copied is the pointer, not the structure.

Click to expand...

Sure you can. All you have to do is have ++ overloaded for that object.

Click to expand...

True, if you wanted to do that you could. But the temporary used is still going to be just a pointer. That won't change anything.

EDIT: Matter of fact, that would make i++ even more efficient. i++ can create a temp, return it, and continue working on the i++, whereas ++i requires you to complete all operations pertaining to ++i before you can return the value.

But it wouldn't be just a pointer. The compiler would invoke that object's copy constructor to create the temp. If you look at the code that I posted from the compiler, the postincrement takes the int by value, not by reference.

I'm not quite sure how you could do a return before finishing the increment. I've hardly worked with concurrency (or whatever it's called), but from what I've seen that doesn't seem possible.

smack Down · Jul 2, 2007

mlm is correct with respect to object because the standard uses a hack to make it faster.
With an int++ returns a copy of the initial value, ++int returns a copy of the final value, object++ returns a copy of the initial value, but ++object returns a reference to the final value.

The int in "_Myt operator++(int) " is another hack. The int isn't used for data, but instead used to make the function signatory different between the pre and post.

C/C++ prefix vs postfix increment

Lifer

Lifer

Platinum Member

Lifer

Platinum Member

Lifer

Lifer

Programming Moderator, Elite Member

Senior member

Diamond Member

Senior member

Elite Member <br>Moderator Emeritus

Platinum Member

Lifer

Senior member

Platinum Member

Senior member

Senior member

Lifer

Platinum Member

Platinum Member

Senior member

Platinum Member

Senior member

Diamond Member