Okay, it is obvious that a hardware lesson is needed, because you guys just aren't getting it. First and foremost, looking at this at the hardware level is NOT too low-level. How do you measure a software program? How do you say one implementation is faster than another? By the size of the code? By the number of instructions? NO. You measure it by how quickly it runs on the hardware. Therefore, if you want to know whether ++i is really faster than i++, you have to look at it at the hardware level. Assembly code alone is not going to tell you anything.
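If you actually want to compare them, measure them. Here is a minimal timing sketch (my own example, assuming a C++ compiler with std::chrono; note that for a plain built-in int most compilers emit identical code for both forms, so expect the two times to be essentially the same):

#include <chrono>
#include <cstdio>

int main() {
    using clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::nanoseconds;

    const long reps = 100000000L;
    volatile int a = 0;   // volatile forces a real load/increment/store each pass

    auto t0 = clock::now();
    for (long n = 0; n < reps; ++n) ++a;   // prefix
    auto t1 = clock::now();
    for (long n = 0; n < reps; ++n) a++;   // postfix
    auto t2 = clock::now();

    std::printf("prefix : %lld ns\n", (long long)duration_cast<nanoseconds>(t1 - t0).count());
    std::printf("postfix: %lld ns\n", (long long)duration_cast<nanoseconds>(t2 - t1).count());
    return 0;
}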
As for the hardware, for simplicity's sake let's use a dual-core processor, and then we'll move on to a single core. For your ++i implementation, this is how it works:

CPU1                                     | CPU2
increment value 1                        |
increment value 2                        |
increment value ...                      |
wait until all increments have completed |
...                                      |
...                                      |
return value                             |

Now consider the use of a temp (the i++ version):

CPU1            | CPU2
copy value 1    | increment value 1
copy value 2    | increment value 2
copy value ...  | increment value ...
return value    | wait for all increments to complete

(EDIT: sorry, the spacing got mangled; the | separates operations running on the two CPUs.)
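For reference, here is how the two forms are conventionally written for a user-defined type in C++ (Counter is just a made-up example); the postfix overload is where the temp in the second table comes from: it copies the old value, increments in place, and returns the copy.

struct Counter {
    int value = 0;

    Counter& operator++() {        // prefix: ++c
        ++value;                   // increment in place
        return *this;              // return the updated object
    }

    Counter operator++(int) {      // postfix: c++
        Counter temp = *this;      // copy the old value (the "temp" above)
        ++value;                   // increment the original
        return temp;               // return the copy
    }
};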
Now, which one completes first? The second implementation, the one used with i++. And contrary to popular belief, you don't need to thread the process for this to work. The CPU is capable of spotting independent operations (ones with no data dependence between them) and executing them concurrently. Don't have a dual core? A hyper-threaded machine will do the same thing. Thanks to Tomasulo's algorithm, which (I believe) every mainstream microarchitecture since the Pentium Pro has implemented in order to perform out-of-order execution and out-of-order completion, the processor can take in more than one instruction at a time and can therefore still execute the above code as shown.
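If you want to see the effect of independent operations without any threads, here is a rough sketch of the kind of experiment I mean (my own example; exact numbers depend on your compiler, flags, and CPU). Summing with one accumulator forms a single dependency chain, while two accumulators give the core two independent chains it can overlap.

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// One accumulator: every add depends on the previous add (one long chain).
long long sum_one_chain(const std::vector<long long>& v) {
    long long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}

// Two accumulators: the two adds in each pass are independent, so an
// out-of-order core can execute them concurrently.
long long sum_two_chains(const std::vector<long long>& v) {
    long long s0 = 0, s1 = 0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < v.size()) s0 += v[i];
    return s0 + s1;
}

int main() {
    std::vector<long long> v(1 << 24, 1);
    using clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    auto t0 = clock::now();
    long long a = sum_one_chain(v);
    auto t1 = clock::now();
    long long b = sum_two_chains(v);
    auto t2 = clock::now();

    std::printf("one chain : sum=%lld, %lld us\n", a, (long long)duration_cast<microseconds>(t1 - t0).count());
    std::printf("two chains: sum=%lld, %lld us\n", b, (long long)duration_cast<microseconds>(t2 - t1).count());
    return 0;
}

With aggressive vectorization the compiler may blur the difference, but the point stands: the core exploits independence you never expressed with threads.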
Now, if you want to tell me again how exactly it is that ++i executes faster, I'm all ears. As for the original statement, "prefix increment can be faster than postfix, but never slower," that is completely wrong. Consider:
j = array[i++] vs. j = array[++i] (ignore that the two expressions index different elements)
add temp, i, 0      ; copy the old i into temp
inc i               ; increment i (independent of the load below)
ld  j, array(temp)  ; load array[temp] into j
vs.
inc i               ; increment i
ld  j, array(i)     ; load array[i] into j -- must wait for the new i
Which one happens faster? Neither, if your compiler sucks. If the compiler is doing its job, the first one will execute faster, because the second stalls on a read-after-write (RAW) hazard: the load of array(i) cannot issue until the increment has produced the new i. So the stalls are exceptionally important.
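To make that dependence concrete at the source level (postfix_form and prefix_form are just illustrative names of mine):

#include <cstdio>

int array[16] = {0};
int i = 0;
int j = 0;

// j = array[i++]: the load uses the OLD i, so the load and the increment
// are independent and can proceed in parallel.
void postfix_form() {
    j = array[i++];
}

// j = array[++i]: the load cannot start until the increment has produced
// the new i (a read-after-write dependence).
void prefix_form() {
    j = array[++i];
}

int main() {
    postfix_form();
    prefix_form();
    std::printf("i=%d j=%d\n", i, j);
    return 0;
}

Whether your compiler schedules around that stall is its job, which is exactly why the compiler matters here.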
Point is, the only metric for measuring how quickly code executes is clock cycles, and you can't count clock cycles if you don't understand the hardware. Feel free to disagree if you like, but you'll be wrong if you do. I have a Master's in Computer Engineering with a specialty in processor architecture, and several other people with Ph.D.s in Computer Engineering concur with what I have said. If you want to ask ANY other computer engineer with such a specialty, feel free; they'll tell you the same thing. The claim that ++i may occasionally run faster than i++ may have been valid on older architectures, but like I said, this is no longer the case.