Out of Order fetch

imgod2u

Senior member
Sep 16, 2000
So I'm looking over the old basic 5-stage MIPS pipeline, and I see just how much of a pain branches can be. So I think to myself, why follow branches at fetch time at all? There seem to be plenty of instructions after the branch, and most branches only jump over maybe 5-6 instructions. So why not just skip those instructions entirely (during the fetch process) and fetch the instructions that come after them?

I started thinking about what it would take to do out-of-order fetch on an MPU. Of course, you would need a buffer, a window if you will, to hold the instructions being decoded (the P4 already does something like this with its trace cache; it would need a few modifications, of course, to make it out of order). This would increase the average decoding bandwidth in two ways. First, an instruction miss in cache no longer has to hold up decoding, because you can decode other instructions that are in cache. Second, branches that only span a few instructions can be skipped, and the skipped instructions processed later (when the branch instruction is evaluated) in parallel with the instructions that are already decoded.

That is the worst case, where all the instructions after the branch rely on the result of the branch. The best case would be if the instructions after the branch were independent of it and could, therefore, be executed before the branch is resolved. This would speed things up significantly for long-pipeline machines: not only would you not have to predict (and suffer the misprediction penalty when you're wrong), you would also improve decoding throughput.
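To make the worst case / best case distinction concrete, here is a rough Python sketch of the bookkeeping such a fetch window would need. It is only a toy model under my own assumptions, not how any real front end works: `Instr`, `classify_after_branch`, the `skip` field and the register names are all made up for illustration. It assumes a single short forward branch, register renaming (so only true read-after-write dependences matter), and no memory or side effects.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dst: str = ""               # register written ("" if none)
    srcs: Tuple[str, ...] = ()  # registers read
    skip: int = 0               # >0 marks a forward branch over `skip` instructions


def classify_after_branch(program: List[Instr], window: int):
    """Split the code around the first forward branch into three groups:
    deferred (the skipped body), early (safe before the branch resolves),
    and waiting (depends on whether the body runs)."""
    # Locate the first forward branch (hypothetical model: exactly one exists).
    br = next(i for i, ins in enumerate(program) if ins.skip > 0)
    body_end = br + 1 + program[br].skip
    deferred = program[br + 1:body_end]            # fetched once the branch resolves
    fetched = program[body_end:body_end + window]  # fetched right away, past the branch

    # Registers whose value depends on whether the skipped body executes.
    tainted = {ins.dst for ins in deferred if ins.dst}
    early, waiting = [], []
    for ins in fetched:
        if any(src in tainted for src in ins.srcs):
            waiting.append(ins.name)      # must wait for the branch outcome
            if ins.dst:
                tainted.add(ins.dst)      # its result is now uncertain too
        else:
            early.append(ins.name)        # independent of the branch
            tainted.discard(ins.dst)      # a fresh write makes the register safe again
    return [ins.name for ins in deferred], early, waiting


program = [
    Instr("i0", "r1", ("r2", "r3")),
    Instr("br", "", ("r1",), skip=2),   # jumps over b0 and b1 when taken
    Instr("b0", "r4", ("r1", "r1")),
    Instr("b1", "r5", ("r4", "r2")),
    Instr("p0", "r6", ("r2", "r3")),
    Instr("p1", "r7", ("r4", "r6")),
    Instr("p2", "r8", ("r7", "r2")),
    Instr("p3", "r4", ("r2", "r2")),
    Instr("p4", "r9", ("r4", "r6")),
]

print(classify_after_branch(program, window=8))
```

In this example `b0` and `b1` are deferred, `p0`, `p3` and `p4` could go ahead before the branch resolves, and `p1` and `p2` have to wait. The more instructions land in the "early" group, the closer you are to the best case.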
 

Matthias99

Diamond Member
Oct 7, 2003
Well, the P4 already supports out-of-order execution (OOE), and I know it does branch prediction as well (it would have to; otherwise it'd be hosed with that long pipeline, because it would stall at every branch). So I'm guessing it already does something very much like this.