Over the summer, I put together a simple demonstration of out-of-order execution in JavaScript. I thought some of you might find it interesting... you can find it here.
The instruction set:
[*]ldi - load immediate, register is set to the value in the instruction
[*]addr - register add, destination register is set to sum of source registers
[*]mul - register multiply, destination register is set to product of source registers
[*]bne - branch if not equal, jumps to location specified in the instruction if the source registers are not equal
[*]halt - stops execution
The boxes you see are:
[*]Program: the instructions in the program
[*]PC: the program counter (i.e. the location of the instruction that will be fetched)
[*]The decoder: shows instruction to decode, the PC of that instruction, and the branch prediction (right now, the branch predictor always predicts branches as taken); the "map" table which is used for register renaming; the decoder's results
[*]PRF: physical register file - speculative registers
[*]ARF: architectural register file - commited (non-speculative) register values, Note that its values are pointers into the physical register file.
[*]Free list: indicates which physical registers are not in use
[*]Issue Queue: holds instructions waiting to execute. Instructions are "Ready" when all source values are valid (i.e. the instructions that produced the source values have finished)
[*]ALUs: the execution units. there are 2.
[*]Re-Order Buffer: ensures the instructions update the architectural register file in order, and allows for recovery when a branch is mispredicted
Description of the program it executes:
What the program does:
register 1 gets incremented repeatedly
register 3 gets multiplied by 5 repeatedly, until register 1 has reached 5
How to run it
Click the "step" button to execute 1 cycle.
I hope at least a few people find it interesting and understandable. It's hard to show all of the pieces on the screen in a readable way.
If anyone has questions, I'd be happy to answer them. If I need another break from studying, I might write something more about how out-of-order execution works.
Notes to people familiar with out-of-order execution
I cheat for precise interrupts. The "right" way of doing it would be to take snapshots of the free list at each branch (or, I guess, instruction that could fault), and restore them when a misprediction or exception occurs. Since I already have too many tables on the screen, I just reconstruct the free list myself when an exception occurs - I look at the ARF to see what physical registers are in use and rebuild the freelist from there.
Also, by leaving out memory support, I eliminate one of the most complicated parts of a modern processor.
My issue queue is obviously way too big for the sample programs I use. I sized everything to be large enough that I never have stalls due to full ROB/issue queue/physical register file.
Related project
I'm currently writing another version in C, which presently has about the same level of functionality, but I'm hoping to eventually make it support a complete, real instruction set and integrating it with a full-system simulator. It would be pretty cool if that worked, since it would allow for performance modelling of unmodified binaries.
The instruction set:
[*]ldi - load immediate, register is set to the value in the instruction
[*]addr - register add, destination register is set to sum of source registers
[*]mul - register multiply, destination register is set to product of source registers
[*]bne - branch if not equal, jumps to location specified in the instruction if the source registers are not equal
[*]halt - stops execution
The boxes you see are:
[*]Program: the instructions in the program
[*]PC: the program counter (i.e. the location of the instruction that will be fetched)
[*]The decoder: shows instruction to decode, the PC of that instruction, and the branch prediction (right now, the branch predictor always predicts branches as taken); the "map" table which is used for register renaming; the decoder's results
[*]PRF: physical register file - speculative registers
[*]ARF: architectural register file - commited (non-speculative) register values, Note that its values are pointers into the physical register file.
[*]Free list: indicates which physical registers are not in use
[*]Issue Queue: holds instructions waiting to execute. Instructions are "Ready" when all source values are valid (i.e. the instructions that produced the source values have finished)
[*]ALUs: the execution units. there are 2.
[*]Re-Order Buffer: ensures the instructions update the architectural register file in order, and allows for recovery when a branch is mispredicted
Description of the program it executes:
Note that instruction 4 is the fifth instruction, since the count starts at 0. Note also that setting register 0 to 5 often doesn't actually do anything - the extra instructions are there just so you can see them finish before the multiply does.register 0 = 5
register 1 = 1
register 2 = 2
register 3 = 2
register 2 = register 1 + register 2
register 3 = register 0 * register 3
register 0 = 5
register 0 = 5
if register2 differs from register 0, go to instruction 4
halt
What the program does:
register 1 gets incremented repeatedly
register 3 gets multiplied by 5 repeatedly, until register 1 has reached 5
How to run it
Click the "step" button to execute 1 cycle.
I hope at least a few people find it interesting and understandable. It's hard to show all of the pieces on the screen in a readable way.
If anyone has questions, I'd be happy to answer them. If I need another break from studying, I might write something more about how out-of-order execution works.
Notes to people familiar with out-of-order execution
I cheat for precise interrupts. The "right" way of doing it would be to take snapshots of the free list at each branch (or, I guess, instruction that could fault), and restore them when a misprediction or exception occurs. Since I already have too many tables on the screen, I just reconstruct the free list myself when an exception occurs - I look at the ARF to see what physical registers are in use and rebuild the freelist from there.
Also, by leaving out memory support, I eliminate one of the most complicated parts of a modern processor.
My issue queue is obviously way too big for the sample programs I use. I sized everything to be large enough that I never have stalls due to full ROB/issue queue/physical register file.
Related project
I'm currently writing another version in C, which presently has about the same level of functionality, but I'm hoping to eventually make it support a complete, real instruction set and integrating it with a full-system simulator. It would be pretty cool if that worked, since it would allow for performance modelling of unmodified binaries.