heres a little info on vex from a source outside intels white paper/ If you read down you will see that depending on wjat FMA 3/4 is used there is room for 5 registers . I really haven't found yet what this 4/5register is for but I sure will know soon enough .
[edit] FeaturesThe proposed VEX coding scheme extends the existing x86 instruction set architecture to allow the definition of new instructions and the extension or modification of previously existing instruction codes. This serves the following purposes:
The opcode map is extended to make space for future instructions.
It allows instruction codes to have up to five operands, where the original scheme allows only two operands (in rare cases three operands).
It allows the size of SIMD vector registers to be extended from the 128-bits XMM registers to 256-bits registers named YMM. There is room for further extensions of the register size in the future.
It allows existing two-operand instructions to be modified into non-destructive three-operand forms where the destination register is different from both source registers. For example c:=a+b instead of a:=a+b (where register a is changed by the instruction).
[edit] Technical descriptionThe proposed VEX coding scheme uses a code prefix consisting of 2 or 3 bytes which is added to existing or new instruction codes[1].
The VEX prefix replaces the most commonly used instruction prefix bytes and escape codes. In many cases, the number of prefix bytes and escape bytes that are replaced is the same as the number of bytes in the VEX prefix, so that the total length of the VEX-encoded instruction is the same as the length of the legacy instruction code. In other cases, the VEX-encoded version is longer or shorter than the legacy code.
The 3-bytes VEX prefix contains the following components:
The four bits R,X,B,W contained in the REX prefix used in the x86-64 instruction set extension.
Two bits named pp to replace operand size prefixes and operand type prefixes (66, F2, F3).
A bit named L specifying 256 bit vector length.
Four bits named vvvv specifying an second source register operand.
Five bits named m-mmmm. Two of the m bits are used for replacing existing escape codes and for specifying the length of the instruction. The remaining three m bits are reserved for future use, such as specifying vector lengths > 256 bits, specifying different instruction lengths, or extending the opcode space.
The 2-bytes VEX prefix contains a subset of these components and can be used in cases where not all components are needed.
The encoding is as follows:
First byte Second byte Third byte
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
3-byte VEX 1 1 0 0 0 1 0 1 R̅ X̅ B̅ m m m m m W v v v v L p p
2-byte VEX 1 1 0 0 0 1 0 1 R̅ v v v v L p p
The R̅, X̅ and B̅ bits are equivalent to the REX prefix's R, X and B bits, providing a fourth register number bit for each of the three registers referenced by a standard x86 instruction: the register operand, and the index and base registers for the memory operand. The v̅ bits specify an additional source register, or are set to all-ones if not used. All of these bits are complemented in the instruction stream, so they are encoded as 1 bits in 32-bit mode.
The VEX opcode bytes are the same as that used by the LDS and LES instructions. These instructions are not supported in 64-bit mode, while in 32-bit mode, the following "mod R/M" byte can not be of the form "11xxxxxx" (which would specify a register operand). The bit inversion ensures that the second byte of a VEX prefix is always of this form in 32-bit mode.
The W bit is equivalent to the REX prefix's W bit, and specifies a 64-bit operand. For non-integer instructions, it is a general opcode extension bit.
The 5 m bits replace leading opcode bytes. The values 1, 2 and 3 are equivalent to opcodes 0F, 0F 38 and 0F 3A; all other values are currently reserved. (The 2-byte VEX prefix always corresponds to a 0F prefix.)
The L bit indicates the vector length. It is 0 for 128-bit SSE (xmm) registers, and 1 for 256-bit AVX (ymm) registers.
The p bits encode additional prefix bytes. The 4 possible values are none, 66, F3, and F2. These encode the operand type for SSE instructions: packed single, packed double, scalar single and scalar double, respectively.
Instructions that need more than three operands have an extra suffix byte specifying one or two additional register operands. Instructions coded with the VEX prefix can have up to five operands. At most one of the operands can be a memory operand; and at most one of the operands can be an immediate constant of 4 or 8 bits. The remaining operands are registers.
The AVX instruction set is the first instruction set extension to use the VEX coding scheme. The AVX instructions have up to four operands. The AVX instruction set allows the VEX prefix to be applied only to instructions using the SIMD XMM registers. However, the VEX coding scheme has space for applying the VEX prefix to other instructions as well in future instruction sets.
Legacy instructions with a VEX prefix added are equivalent to the same instructions without VEX prefix with the following differences:
The VEX-encoded instruction can have one more operand, making it non-destructive.
A 128-bit XMM instruction without VEX prefix leaves the upper half of the full 256-bit YMM register unchanged, while the VEX-encoded version sets the upper half to zero.
Instructions that use the whole 256-bit YMM register should not be mixed with non-VEX instructions that leave the upper half of the register unchanged, for reasons of efficiency.