Do all designs use a one-to-one mapping of future file entries (ignoring any microcode-only entries) to architectural register file entries? It seems that a one-to-one mapping would mean code like:
    mov ax, [0]    ; sorry, I don't remember the right notation for "mem[0]"
    inc ax
    mov [0], ax
    mov bx, [1]
    inc bx
    mov [1], bx
    mov ax, [2]
    inc ax
    mov [2], ax

would unnecessarily stall (assuming a wide enough superscalar core and a large enough reorder buffer). The three blocks have no true data dependencies between them. The only dependency is an artificial one on ax (write-after-read and write-after-write), which exists only because we lack enough registers to operate on all three blocks at once. (Assume for the sake of argument that the other registers are in use, or that I made the example long enough to use all 8 of x86's general-purpose registers.)
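The claim that the blocks are linked only by false dependencies can be checked mechanically. The sketch below is a hypothetical encoding of the example, not anything taken from a real simulator. Each instruction lists the locations it writes and reads, and "mem0".."mem2" stand for [0]..[2]. The flags written by inc are ignored to keep it small. The sketch then classifies every dependence as RAW (true), WAR or WAW (name dependences).

```python
# Hypothetical encoding of the example: (text, locations written, locations read).
# "mem0".."mem2" stand for [0]..[2]; flags written by inc are ignored here.
PROGRAM = [
    ("mov ax, [0]", {"ax"}, {"mem0"}),
    ("inc ax", {"ax"}, {"ax"}),
    ("mov [0], ax", {"mem0"}, {"ax"}),
    ("mov bx, [1]", {"bx"}, {"mem1"}),
    ("inc bx", {"bx"}, {"bx"}),
    ("mov [1], bx", {"mem1"}, {"bx"}),
    ("mov ax, [2]", {"ax"}, {"mem2"}),
    ("inc ax", {"ax"}, {"ax"}),
    ("mov [2], ax", {"mem2"}, {"ax"}),
]


def dependences(program):
    """Return (producer, consumer, kind, location) tuples for RAW, WAR and WAW."""
    last_writer = {}          # location -> index of its most recent writer
    readers_since_write = {}  # location -> indices that read it since that write
    deps = []
    for j, (_, writes, reads) in enumerate(program):
        for loc in sorted(reads):
            if loc in last_writer:  # true (read-after-write) dependence
                deps.append((last_writer[loc], j, "RAW", loc))
        for loc in sorted(writes):
            for i in readers_since_write.get(loc, []):  # anti-dependence
                deps.append((i, j, "WAR", loc))
            if loc in last_writer:  # output dependence
                deps.append((last_writer[loc], j, "WAW", loc))
        # Update the bookkeeping only after this instruction has been checked.
        for loc in reads:
            readers_since_write.setdefault(loc, []).append(j)
        for loc in writes:
            last_writer[loc] = j
            readers_since_write[loc] = []
    return deps


if __name__ == "__main__":
    # Each block is three instructions, so index // 3 is the block number.
    for i, j, kind, loc in dependences(PROGRAM):
        if i // 3 != j // 3:
            print(f"{PROGRAM[i][0]} -> {PROGRAM[j][0]}: {kind} on {loc}")
```

The only cross-block dependences it reports are a WAR (`mov [0], ax` -> `mov ax, [2]`) and a WAW (`inc ax` -> `mov ax, [2]`), both on ax. Giving the third block's ax a different storage location would remove both.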
A processor like the Athlon or Hammer would presumably execute the first load into ax and the load into bx simultaneously, then the two increments, and then the two stores. However, if a future file can map each architectural register to only one location, then the second set of operations involving ax would have to wait for the first set to finish before it could begin.
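The scheduling argument can be made concrete with a toy timing model. Everything here is an assumption for illustration, not a description of the Athlon or Hammer pipelines. Issue width is unlimited, every instruction (loads and stores included) has a 1-cycle latency, and operands are read at the start of the issue cycle. The `rename_registers=False` case models the hypothesis above, where each architectural register has exactly one storage location. The `True` case gives every register write a fresh location. Memory is never renamed in this sketch.

```python
# Hypothetical toy model: unlimited issue width, 1-cycle latency for everything,
# operands read at the start of the issue cycle. "mem0".."mem2" stand for [0]..[2].
PROGRAM = [
    ("mov ax, [0]", {"ax"}, {"mem0"}),
    ("inc ax", {"ax"}, {"ax"}),
    ("mov [0], ax", {"mem0"}, {"ax"}),
    ("mov bx, [1]", {"bx"}, {"mem1"}),
    ("inc bx", {"bx"}, {"bx"}),
    ("mov [1], bx", {"mem1"}, {"bx"}),
    ("mov ax, [2]", {"ax"}, {"mem2"}),
    ("inc ax", {"ax"}, {"ax"}),
    ("mov [2], ax", {"mem2"}, {"ax"}),
]


def schedule(program, rename_registers):
    """Earliest issue cycle of each instruction under the assumptions above."""
    ready = {}      # location -> cycle at which its newest value is available
    last_read = {}  # location -> latest cycle in which it has been read
    times = []
    for _, writes, reads in program:
        # RAW: wait until every source value has been produced.
        t = max((ready.get(loc, 0) for loc in reads), default=0)
        for loc in writes:
            if rename_registers and not loc.startswith("mem"):
                continue  # fresh location for the register: no WAR/WAW stall
            # WAR: don't overwrite before earlier readers have read the old value.
            # WAW: don't write before the previous write has completed.
            t = max(t, last_read.get(loc, 0), ready.get(loc, 0))
        times.append(t)
        for loc in reads:
            last_read[loc] = max(last_read.get(loc, 0), t)
        for loc in writes:
            ready[loc] = t + 1
    return times


if __name__ == "__main__":
    for rename in (False, True):
        times = schedule(PROGRAM, rename_registers=rename)
        print(f"rename_registers={rename}: issue cycles {times}, total {max(times) + 1}")
```

Under these assumptions, the one-location-per-register case takes 5 cycles and the renamed case takes 3. Without renaming, `mov ax, [2]` cannot issue until `mov [0], ax` has read the old value of ax. With renaming, all three blocks start in cycle 0.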
Of course, not doing a 1-1 mapping would add a lot of complexity, and the performance gains might not justify it.
Sorry if this isn't clear, or if I made some stupid oversight... I've been working on this stuff for a couple of hours.