VEX prefix encoding applies to SIMD instructions operating on YMM registers, XMM
registers, and in some cases with a general-purpose register as one of the operands.
The VEX prefix is not supported for instructions operating on MMX or x87 registers.
Details of VEX prefix and instruction encoding are discussed in Chapter 4.
1.4.3 Arithmetic Primitives for 128-bit Vector and Scalar Processing
Intel AVX provides 131 128-bit numeric processing instructions that employ VEX-prefix
encoding. These VEX-encoded instructions generally provide the same functionality
as instructions operating on XMM registers that are encoded using SIMD
prefixes. The 128-bit numeric processing instructions in AVX cover floating-point and
integer data processing, across 128-bit vector and scalar processing.
The AVX enhancement to 128-bit floating-point compare operations provides 32
conditional predicates to improve programming flexibility in evaluating conditional
expressions. This contrasts with the floating-point SIMD compare instructions in SSE and
SSE2, which support only 8 conditional predicates.
FMA provides 60 128-bit floating-point instructions to process 128-bit vector and
scalar data. The arithmetic operations cover fused multiply-add, fused multiply-subtract,
and signed-reversed multiply on fused multiply-add and multiply-subtract.
The new variable blend instructions support four-operand syntax with a non-destructive
source operand. This is more flexible than the non-VEX-encoded instruction syntax,
which uses the XMM0 register as an implied mask for blend selection. While variable
blend with the implied-XMM0 syntax is supported in SSE4 using SIMD prefix encoding,
VEX-encoded 128-bit variable blend instructions only support the more flexible
four-operand syntax. Branching conditions dependent on floating-point data or
integer data can benefit from Intel AVX.
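To make the selection rule concrete, here is a small scalar C model of a variable blend. It is a sketch. It assumes that the most significant bit of each 32-bit mask element selects the second source. That detail comes from the published BLENDVPS/VBLENDVPS definition, not from this excerpt. The name blendvps_model is hypothetical. Because dst is a separate argument, the model also shows the non-destructive form that the VEX encoding allows.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a variable blend such as VBLENDVPS (hypothetical helper).
 * For each 32-bit lane, bit 31 of the mask lane selects src2 when set and
 * src1 when clear. dst is a separate buffer, mirroring the non-destructive
 * VEX form; legacy BLENDVPS would always take the mask from XMM0. */
static void blendvps_model(float dst[], const float src1[], const float src2[],
                           const float mask[], size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        uint32_t m;
        memcpy(&m, &mask[i], sizeof m);   /* read the lane's raw bits */
        dst[i] = (m & 0x80000000u) ? src2[i] : src1[i];
    }
}
```

Note that -0.0f has its sign bit set, so it selects src2 just as -1.0f does.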
Prior to using AVX, the application must verify that the operating system supports
the XGETBV instruction and the YMM register state, in addition to the processor's
support for YMM state management using XSAVE/XRSTOR and for AVX instructions. The
following simplified sequence accomplishes both and is strongly recommended.
1) Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application use; see
footnote 1).
2) Issue XGETBV and verify that XFEATURE_ENABLED_MASK[2:1] = ‘11b’ (XMM
state and YMM state are enabled by the OS).
3) Detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).
(Step 3 can be done in any order relative to steps 1 and 2.)
The following pseudocode illustrates this recommended application AVX detection
process:
----------------------------------------------------------------------------------------
INT supports_AVX()
{ ; result in eax: 1 = AVX and FMA usable, 0 = not usable
    push ebx              ; CPUID overwrites EBX (use RBX in 64-bit code)
    mov eax, 1
    cpuid
    ; keep only OSXSAVE (bit 27), AVX (bit 28), FMA (bit 12);
    ; use 018000000H in both AND and CMP to test OSXSAVE and AVX alone
    and ecx, 018001000H
    cmp ecx, 018001000H   ; check OSXSAVE, AVX, FMA feature flags
    jne not_supported
    ; processor supports AVX, FMA instructions and XGETBV is enabled by OS
    mov ecx, 0            ; specify 0 for XFEATURE_ENABLED_MASK register (XCR0)
    xgetbv                ; result in EDX:EAX
    and eax, 06H
    cmp eax, 06H          ; check OS has enabled both XMM and YMM state support
    jne not_supported
    mov eax, 1
    jmp done
not_supported:
    mov eax, 0
done:
    pop ebx
}
----------------------------------------------------------------------------------------
Figure 2-1. General Procedural Flow of Application Detection of AVX
1. If CPUID.01H:ECX.OSXSAVE reports 1, it also indirectly implies that the processor
supports XSAVE, XRSTOR, XGETBV, and the processor extended state bit vector
XFEATURE_ENABLED_MASK register. Thus an application may streamline the checking
of CPUID feature flags for XSAVE and OSXSAVE. XSETBV is a privileged instruction.
It is unwise for an application to rely exclusively on CPUID.1:ECX.AVX[bit 28],
or at all on CPUID.1:ECX.XSAVE[bit 26]: these indicate hardware support but not
operating system support. If YMM state management is not enabled by the operating
system, AVX instructions will #UD regardless of CPUID.1:ECX.AVX[bit 28].
“CPUID.1:ECX.XSAVE[bit 26] = 1” does not guarantee that the OS actually uses the
XSAVE process for state management.
The steps above also apply to the enhanced 128-bit SIMD floating-point instructions
in AVX (using VEX prefix encoding) that operate on the YMM states. Application
detection of VEX-encoded AES is described in Section 2.2.2.
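The same decision can be written as a pure C function. This keeps the bit tests easy to check without AVX hardware.

This is a sketch with these assumptions:
- avx_usable is a hypothetical name.
- The caller supplies CPUID.1:ECX and the low 32 bits of XCR0, obtained from XGETBV with ECX = 0.
- The caller reads XCR0 only after confirming OSXSAVE, and passes 0 otherwise.

On GCC or Clang, those inputs could come from __get_cpuid in <cpuid.h> and the _xgetbv intrinsic. That wiring is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the detection steps above. */
#define CPUID1_ECX_FMA     (1u << 12)
#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX     (1u << 28)
#define XCR0_XMM_YMM       0x6u   /* XCR0[2:1]: XMM and YMM state enabled */

/* Hypothetical helper: true if AVX (and optionally FMA) may be used.
 * xcr0_lo must come from XGETBV(ECX=0), executed only when OSXSAVE is set;
 * pass 0 when it is not. */
static bool avx_usable(uint32_t cpuid1_ecx, uint32_t xcr0_lo, bool need_fma)
{
    uint32_t want = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    if (need_fma)
        want |= CPUID1_ECX_FMA;
    if ((cpuid1_ecx & want) != want)
        return false;   /* missing hardware support, or XGETBV not enabled */
    /* The OS must have enabled both XMM and YMM state in XCR0. */
    return (xcr0_lo & XCR0_XMM_YMM) == XCR0_XMM_YMM;
}
```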
The lower 128 bits of a YMM register are aliased to the corresponding XMM register.
Legacy SSE instructions (i.e. SIMD instructions operating on XMM state but not using
the VEX prefix, also referred to as non-VEX-encoded SIMD instructions) will not access
the upper bits (255:128) of the YMM registers. AVX and FMA instructions with a VEX
prefix and a vector length of 128 bits zero the upper 128 bits of the YMM register.
See Chapter 2, “Programming Considerations with 128-bit SIMD Instructions” for
more details.
Upper bits of YMM registers (255:128) can be read and written by many instructions
with a VEX.256 prefix.
XSAVE and XRSTOR may be used to save and restore the upper bits of the YMM registers.
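A toy C model of one YMM register illustrates the two kinds of 128-bit writes described above. It is only an illustration. The ymm_model type and both helper names are hypothetical, and real registers are not byte-addressable like this.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of one 256-bit YMM register; bytes 0-15 alias the XMM register. */
typedef struct { uint8_t b[32]; } ymm_model;

/* Legacy SSE write: only bits 127:0 change; bits 255:128 are left untouched. */
static void legacy_sse_write(ymm_model *r, const uint8_t xmm[16])
{
    memcpy(r->b, xmm, 16);
}

/* VEX.128 write: bits 127:0 are written and bits 255:128 are zeroed. */
static void vex128_write(ymm_model *r, const uint8_t xmm[16])
{
    memcpy(r->b, xmm, 16);
    memset(r->b + 16, 0, 16);
}
```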
Processor Extended State Enumeration Main Leaf (EAX = 0DH, ECX = 0):
EAX Bits 31-0: Reports the valid bit fields of the lower 32 bits of the
    XFEATURE_ENABLED_MASK register (XCR0). If a bit is 0, the corresponding bit
    field in XCR0 is reserved.
      Bit 00: legacy x87
      Bit 01: 128-bit SSE
      Bit 02: 256-bit AVX
EBX Bits 31-0: Maximum size (bytes, from the beginning of the XSAVE/XRSTOR
    save area) required by enabled features in XCR0. May be different from ECX
    if some features at the end of the XSAVE save area are not enabled.
ECX Bits 31-0: Maximum size (bytes, from the beginning of the XSAVE/XRSTOR
    save area) of the XSAVE/XRSTOR save area required by all supported features
    in the processor, i.e. all the valid bit fields in XCR0.
EDX Bits 31-0: Reports the valid bit fields of the upper 32 bits of the
    XFEATURE_ENABLED_MASK register (XCR0). If a bit is 0, the corresponding
    bit field in XCR0 is reserved.
Processor Extended State Enumeration Sub-Leaves (EAX = 0DH, ECX = n, n >= 2):
    Leaf 0DH output depends on the initial value in ECX. If ECX contains an
    invalid sub-leaf index, EAX/EBX/ECX/EDX return 0. Each valid sub-leaf index
    maps to a valid bit in the XCR0 register, starting at bit position 2.
EAX Bits 31-0: The size in bytes (from the offset specified in EBX) of the
    save area for an extended state feature associated with a valid sub-leaf
    index, n. This field reports 0 if the sub-leaf index, n, is invalid.
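The sub-leaf numbering rule can be sketched in C. Assume EDX:EAX holds the XCR0 valid-bit mask returned by the main leaf (EAX = 0DH, ECX = 0). The hypothetical helper below then lists which sub-leaf indices n >= 2 are worth querying for per-component sizes.

```c
#include <stdint.h>

/* Hypothetical helper: write the sub-leaf indices n >= 2 whose XCR0 bit is
 * valid (per the main-leaf EDX:EAX mask) into out[]; return how many. */
static int xsave_valid_subleaves(uint32_t edx, uint32_t eax, int out[], int max_out)
{
    uint64_t mask = ((uint64_t)edx << 32) | eax;
    int count = 0;
    for (int n = 2; n < 64 && count < max_out; n++) {
        if (mask & (1ull << n))
            out[count++] = n;   /* bit n of XCR0 <-> sub-leaf n */
    }
    return count;
}
```

With the three components listed above (mask 7), only sub-leaf 2, the AVX state, is reported.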
YMM STATE, VEX PREFIX AND SUPPORTED OPERATING MODES
AVX and FMA instructions comprise 256-bit and 128-bit instructions that operate
on YMM states via VEX prefix encoding. SIMD instructions operating on XMM states
(i.e. not accessing the upper 128 bits of YMM) generally do not use the VEX prefix.
For processors that support YMM states, the YMM state exists in all operating modes.
However, the available interfaces to access YMM states may vary in different modes.
The processor's support for instruction extensions that employ VEX prefix encoding is
independent of the processor's support for YMM state.
Instructions requiring VEX prefix encoding are generally supported in 64-bit mode,
32-bit mode, and 16-bit protected mode. They are not supported in real mode,
virtual-8086 mode, or SMM.
Note that bits 255:128 of the YMM register state are maintained across transitions into
and out of these modes. Because the XSAVE/XRSTOR instructions can operate in all operating
modes, the processor's YMM register state can be modified by
software in any operating mode by executing XRSTOR. The YMM registers can be
updated by XRSTOR using the state information stored in the XSAVE/XRSTOR area
residing in memory.
Operating systems must use the XSAVE/XRSTOR instructions for YMM state management.
The XSAVE/XRSTOR instructions also provide a flexible and efficient interface to
manage XMM/MXCSR states and x87 FPU states in conjunction with new processor
extended states.
An OS must enable its YMM state management to support AVX and FMA extensions.
Otherwise, an attempt to execute an instruction in the AVX or FMA extensions (including
enhanced 128-bit SIMD instructions using VEX encoding) will cause a #UD exception.
3.2.6 Processor Extended State Save Optimization and XSAVEOPT
The XSAVEOPT instruction paired with XRSTOR is designed to provide a high performance
method for system software to perform state save and restore.
A processor may indicate its support for the XSAVEOPT instruction if
CPUID.(EAX=0DH, ECX=1):EAX.XSAVEOPT[Bit 0] = 1. The functionality of
XSAVEOPT is similar to XSAVE. Software can use XSAVEOPT/XRSTOR in a pair-wise
manner similar to XSAVE/XRSTOR to save and restore processor extended states.
The syntax and operands for XSAVEOPT instructions are identical to XSAVE, i.e. the
mask operand in EDX:EAX specifies the subset of enabled features to be saved.
Note that software using XSAVEOPT must observe the same restrictions as XSAVE
when allocating a new save area, i.e. the header area must be initialized to zeroes.
The first 64 bits in the save image header, starting at offset 512, are referred to as
XHEADER.BV. However, the instruction differs from XSAVE in several important
aspects:
1. If a component state in the processor specified by the save mask corresponds to
an INIT state, the instruction may clear the corresponding bit in XHEADER.BV,
but may not write out the state (unlike the XSAVE instruction, which always
writes out the state).
2. If the processor determines that the component state specified by the save mask
hasn't been modified since the last XRSTOR, the instruction may not write out the
state to the save area.
3. An implication of this optimization is that software which needs to examine the
saved image must first check the corresponding XHEADER.BV bit. If the
header bit is clear, it means that the state is INIT and the saved memory image
may not correspond to the actual processor state.
4. The performance of XSAVEOPT will always be better than, or at least equal to, that
of XSAVE.
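The consequence for software that reads a saved image can be modeled in a few lines of C. This is a deliberately simplified sketch:
- Each state component is reduced to one 32-bit value.
- The processor's INIT and modified-since-XRSTOR tracking are passed in as flags.
- All type and function names are hypothetical.

The point it illustrates is that a clear XHEADER.BV bit means the slot may hold stale bytes. A reader must therefore substitute the component's INIT value.

```c
#include <stdint.h>

/* Hypothetical, simplified save image: one 32-bit value per component. */
typedef struct {
    uint64_t xheader_bv;   /* bit i set: slot i holds real saved state */
    uint32_t slot[3];      /* save-area slots for components 0..2 */
} xsave_image_model;

/* XSAVEOPT-like save of component i, following rules 1 and 2 above. */
static void xsaveopt_model(xsave_image_model *img, int i, uint32_t value,
                           int is_init, int modified_since_xrstor)
{
    if (is_init) {
        img->xheader_bv &= ~(1ull << i);  /* clear the bit, skip the write */
        return;
    }
    img->xheader_bv |= 1ull << i;
    if (modified_since_xrstor)
        img->slot[i] = value;  /* otherwise the slot already holds what XRSTOR loaded */
}

/* Rule 3: a reader must check XHEADER.BV before trusting a slot. */
static uint32_t read_component(const xsave_image_model *img, int i, uint32_t init_value)
{
    return (img->xheader_bv & (1ull << i)) ? img->slot[i] : init_value;
}
```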
3.2.6.1 XSAVEOPT Usage Guidelines
When using the XSAVEOPT facility, software must be aware of the following guidelines:
1. The processor uses a tracking mechanism to determine which state components
will be written to memory by the XSAVEOPT instruction. The mechanism includes
three sub-conditions that are recorded internally each time XRSTOR is executed
and evaluated on the invocation of the next XSAVEOPT. If a change is detected in
any one of these sub-conditions, XSAVEOPT will behave exactly as XSAVE. The
three sub-conditions are:
— current CPL of the logical processor
— indication whether or not the logical processor is in VMX non-root operation
— linear address of the XSAVE/XRSTOR area
2. Upon allocation of a new XSAVE/XRSTOR area and before an XSAVE or XSAVEOPT
instruction is used, the save area header (HEADER.XSTATE) must be initialized to
zeroes for proper operation.
3. XSAVEOPT is designed primarily for use in context switch operations. The values
stored by the XSAVEOPT instruction depend on the values previously stored in a
given XSAVE area.
4. Manual modifications to the XSAVE area between an XRSTOR instruction and the
matching XSAVEOPT may result in data corruption.
5. For optimization to be performed properly, the XRSTOR/XSAVEOPT pair must use
the same segment when referencing the XSAVE area, and the base of that
segment must be unchanged between the two operations.
6. Software should avoid executing XSAVEOPT into a buffer from which it hadn’t
previously executed an XRSTOR. For newly allocated buffers, software can execute
XRSTOR with the linear address of the buffer and a restore mask of EDX:EAX = 0.
Executing XRSTOR(0:0) doesn’t restore any state, but ensures expected
operation of the XSAVEOPT instruction.
7. The XSAVE area can be moved or even paged, but the contents at the linear
address of the save area at an XSAVEOPT must be the same as that when the
previous XRSTOR was performed.
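Guideline 1 amounts to comparing three values recorded at XRSTOR with the same values at the next XSAVEOPT. The C sketch below expresses that comparison. The struct and the function are hypothetical. A true result only means the processor may optimize; it does not mean any writes will actually be skipped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the three sub-conditions from guideline 1. */
typedef struct {
    unsigned  cpl;           /* current privilege level, 0-3 */
    bool      vmx_non_root;  /* in VMX non-root operation? */
    uintptr_t area_linear;   /* linear address of the XSAVE/XRSTOR area */
} xsaveopt_tracking;

/* False means XSAVEOPT behaves exactly as XSAVE; true means it may optimize. */
static bool xsaveopt_may_optimize(const xsaveopt_tracking *at_xrstor,
                                  const xsaveopt_tracking *at_xsaveopt)
{
    return at_xrstor->cpl == at_xsaveopt->cpl &&
           at_xrstor->vmx_non_root == at_xsaveopt->vmx_non_root &&
           at_xrstor->area_linear == at_xsaveopt->area_linear;
}
```

Guideline 6 fits this model. An XRSTOR with mask 0:0 on a fresh buffer is what records a matching linear address in the first place.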
A destination operand not aligned to a 64-byte boundary (in either 64-bit or 32-bit
mode) will result in a general-protection (#GP) exception. In 64-bit
mode, the upper 32 bits of RDX and RAX are ignored.
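Guideline 2 and the alignment rule above suggest how a save area might be allocated. The following C11 sketch assumes the size comes from CPUID leaf 0DH: EBX for the currently enabled features, or ECX for all supported ones. alloc_xsave_area is a hypothetical helper. It zeroes the whole buffer, which covers the 64-byte header at offset 512.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define XSAVE_ALIGN        64    /* misaligned areas raise #GP */
#define XSAVE_HEADER_OFF   512   /* XHEADER starts at offset 512 */
#define XSAVE_HEADER_SIZE  64

/* Hypothetical helper: allocate a zeroed, 64-byte-aligned XSAVE/XRSTOR area
 * of at least `size` bytes. Release it with free(). Returns NULL on failure. */
static void *alloc_xsave_area(size_t size)
{
    if (size < XSAVE_HEADER_OFF + XSAVE_HEADER_SIZE)
        size = XSAVE_HEADER_OFF + XSAVE_HEADER_SIZE;
    /* aligned_alloc requires the size to be a multiple of the alignment. */
    size = (size + XSAVE_ALIGN - 1) & ~(size_t)(XSAVE_ALIGN - 1);
    void *p = aligned_alloc(XSAVE_ALIGN, size);
    if (p != NULL)
        memset(p, 0, size);   /* header must start out as zeroes */
    return p;
}
```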
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AVX and FMA instructions are encoded using a more efficient format than previous
instruction extensions in the Intel 64 and IA-32 architecture. The improved encoding
format uses a new prefix referred to as “VEX”. The VEX prefix may be two or three
bytes long, depending on the instruction semantics. Despite the length of the VEX
prefix, the instruction encoding format using VEX addresses two important issues:
(a) the inefficiency in instruction encoding due to SIMD prefixes and some
fields of the REX prefix, and (b) the increase in instruction byte length caused by
both SIMD prefixes and the REX prefix. This chapter describes the instruction
encoding format using VEX.
4.1 INSTRUCTION FORMATS
Legacy instruction set extensions in IA-32 architecture employ one or more “single-purpose”
bytes as an “escape opcode”, or a required SIMD prefix (66H, F2H, F3H), to
expand the processing capability of the instruction set. Intel 64 architecture uses the
REX prefix to expand the encoding of register access in instruction operands. Both
SIMD prefixes and the REX prefix carry the side effect that they can significantly
increase the length of an instruction. The legacy Intel 64 and IA-32 instruction set is
limited to instruction syntax with only two operands that can be encoded to
access registers (and only one of which can access a memory address).
Instruction encoding using VEX prefix provides several advantages:
• Instruction syntax support for three operands, and up to four operands when
necessary. For example, the third source register used by VBLENDVPD is encoded
using bits 7:4 of the immediate byte.
• Encoding support for vector length of 128 bits (using XMM registers) and 256 bits
(using YMM registers)
• Encoding support for instruction syntax of non-destructive source operands.
• Elimination of escape opcode byte (0FH), SIMD prefix byte (66H, F2H, F3H) via a
compact bit field representation within the VEX prefix.
• Elimination of the need to use the REX prefix to encode the extended half of the
general-purpose register set (R8-R15) for direct register access, memory addressing, or
accessing XMM8-XMM15 (including YMM8-YMM15).
• Flexible and more compact bit fields are provided in the VEX prefix to retain the
full functionality provided by REX prefix. REX.W, REX.X, REX.B functionalities are
provided in the three-byte VEX prefix only because only a subset of SIMD instructions
need them.
• Extensibility for future instruction extensions without significant instruction
length increase.
AMD has no support for this or software now or ever
Figure 4-1 shows the Intel 64 instruction encoding format with VEX prefix support.
Legacy instructions without a VEX prefix are fully supported and unchanged. The use of
the VEX prefix in an Intel 64 instruction is optional, but a VEX prefix is required for Intel
64 instructions that operate on YMM registers or support three- and four-operand
syntax. The VEX prefix is not a constant-valued, “single-purpose” byte like 0FH, 66H,
F2H, F3H in legacy SSE instructions. The VEX prefix provides substantially richer capability
than the REX prefix.
4.1.1 VEX and the LOCK prefix
Any VEX-encoded instruction with a LOCK prefix preceding VEX will #UD.
4.1.2 VEX and the 66H, F2H, and F3H prefixes
Any VEX-encoded instruction with a 66H, F2H, or F3H prefix preceding VEX will #UD.
4.1.3 VEX and the REX prefix
Any VEX-encoded instruction with a REX prefix preceding VEX will #UD.
4.1.4 The VEX Prefix
The VEX prefix is encoded in either the two-byte form (the first byte must be C5H) or
the three-byte form (the first byte must be C4H). The two-byte VEX is used mainly
for 128-bit, scalar, and the most common 256-bit AVX instructions, while the three-byte
VEX provides a compact replacement of REX and 3-byte opcode instructions
(including AVX and FMA instructions). Beyond the first byte, the VEX prefix
consists of a number of bit fields providing specific capabilities; they are shown in
Figure 4-2.
The bit fields of the VEX prefix can be summarized by its functional purposes:
• Non-destructive source register encoding (applicable to three and four operand
syntax): This is the first source operand in the instruction syntax. It is
represented by the notation VEX.vvvv. This field is encoded in 1’s complement (inverted) form.
— NDS, NDD, DDS: specifies that VEX.vvvv field is valid for the encoding of a
register operand:
• VEX.NDS: VEX.vvvv encodes the first source register in an instruction
syntax where the content of source registers will be preserved.
• VEX.NDD: VEX.vvvv encodes the destination register that cannot be
encoded by ModR/M:reg field.
• VEX.DDS: VEX.vvvv encodes the second source register in a three-operand
instruction syntax where the content of the first source register will
be overwritten by the result.
• If none of NDS, NDD, and DDS is present, VEX.vvvv must be 1111b (i.e.
VEX.vvvv does not encode an operand). The VEX.vvvv field can be
encoded using either the 2-byte or 3-byte form of the VEX prefix.
— 128,256: VEX.L field can be 0 (denoted by VEX.128 or VEX.LZ) or 1
(denoted by VEX.256). The VEX.L field can be encoded using either the 2-
byte or 3-byte form of the VEX prefix. The presence of the notation VEX.256
or VEX.128 in the opcode column should be interpreted as follows:
• If VEX.256 is present in the opcode column: The semantics of the
instruction must be encoded with VEX.L = 1. An attempt to encode this
instruction with VEX.L= 0 can result in one of two situations: (a) if
VEX.128 version is defined, the processor will behave according to the
defined VEX.128 behavior; (b) an #UD occurs if there is no VEX.128
version defined.
• If VEX.128 is present in the opcode column but there is no VEX.256
version defined for the same opcode byte: Two situations apply: (a) For
VEX-encoded, 128-bit SIMD integer instructions, software must encode
the instruction with VEX.L = 0. The processor will treat the opcode byte
encoded with VEX.L= 1 by causing an #UD exception; (b) For VEX-encoded,
128-bit packed floating-point instructions, software must
encode the instruction with VEX.L = 0. The processor will treat the opcode
byte encoded with VEX.L= 1 by causing an #UD exception (e.g.
VMOVLPS).
• If VEX.LIG is present in the opcode column: The VEX.L value is ignored.
This generally applies to VEX-encoded scalar SIMD floating-point instructions.
Scalar SIMD floating-point instructions can be distinguished by their
mnemonics: generally, the last two letters of the instruction mnemonic
are “SS” or “SD”, or “SI” for SIMD floating-point conversion
instructions.
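As a worked example of the field layout, the C sketch below assembles the second byte of the two-byte (C5H) form. The vvvv and L fields follow the description above. Two fields are not described in this excerpt and are taken from the published VEX layout referred to as Figure 4-2:
- the inverted R bit in bit 7;
- the pp field in bits 1:0 (00 = none, 01 = 66H, 10 = F3H, 11 = F2H).

The helper name vex2_byte1 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: second byte of a two-byte (C5H) VEX prefix.
 *   bit 7    : R, stored inverted (REX.R equivalent)
 *   bits 6:3 : vvvv, register number in 1's complement form
 *   bit 2    : L, 0 = 128-bit, 1 = 256-bit
 *   bits 1:0 : pp, implied SIMD prefix */
static uint8_t vex2_byte1(unsigned r, unsigned vvvv, unsigned l, unsigned pp)
{
    return (uint8_t)(((~r & 1u) << 7) |
                     ((~vvvv & 0xFu) << 3) |
                     ((l & 1u) << 2) |
                     (pp & 3u));
}
```

For example, VADDPS ymm0, ymm1, ymm2 uses R = 0, vvvv = 1 (ymm1), L = 1 and pp = 00. That gives C5 F4, followed by opcode 58H and ModR/M C2H. When vvvv does not encode an operand, register number 0 inverts to the required 1111b.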
5-4 Ref. # 319433-010
INSTRUCTION SET REFERENCE
• “Legacy SSE”: Refers to SSE, SSE2, SSE3, SSSE3, SSE4, and any future
instruction sets referencing XMM registers and encoded without a VEX prefix.
• XGETBV, XSETBV, XSAVE, XRSTOR are defined in the IA-32 Intel Architecture
Software Developer’s Manual, Volume 3A, and the Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 2B.
• VEX: refers to a two-byte or three-byte prefix. AVX and FMA instructions are
encoded using a VEX prefix.
• VEX.vvvv: The VEX bitfield specifying a source or destination register (in 1’s
complement form).
• rm_field: shorthand for the ModR/M r/m field and any REX.B
• reg_field: shorthand for the ModR/M reg field and any REX.R
128-bit Legacy SSE version: The first source operand and the destination operand are
the same. Bits (VLMAX-1:128) of the corresponding YMM destination register remain
unchanged. The mask register operand is implicitly defined to be the architectural
register XMM0. An attempt to execute BLENDVPS with a VEX prefix will cause #UD.
VEX.128 encoded version: The first source operand and the destination operand are
XMM registers. The second source operand is an XMM register or 128-bit memory
location. The mask operand is the third source register, encoded in bits[7:4] of
the immediate byte (imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode,
imm8[7] is ignored. The upper bits (VLMAX-1:128) of the corresponding YMM
register (destination register) are zeroed. VEX.W must be 0; otherwise, the instruction
will #UD.
VEX.256 encoded version: The first source operand and destination operand are YMM
registers. The second source operand can be a YMM register or a 256-bit memory
location. The mask operand is the third source register, encoded in bits[7:4] of
the immediate byte (imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode,
imm8[7] is ignored. VEX.W must be 0; otherwise, the instruction will #UD.
VBLENDVPS permits the mask to be any XMM or YMM register. In contrast,
BLENDVPS treats XMM0 implicitly as the mask and does not support non-destructive
destination operation.
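The register-in-immediate encoding of the mask operand can be sketched as follows. The helpers are hypothetical. One encodes a register number into imm8[7:4]; the other decodes it again, dropping imm8[7] outside 64-bit mode as the text specifies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: place the mask register number in imm8[7:4];
 * imm8[3:0] are ignored by the instruction and left as 0 here. */
static uint8_t is4_encode(unsigned mask_reg)
{
    return (uint8_t)((mask_reg & 0xFu) << 4);
}

/* Hypothetical helper: recover the mask register number from imm8. */
static unsigned is4_decode(uint8_t imm8, bool mode64)
{
    unsigned reg = imm8 >> 4;
    return mode64 ? reg : (reg & 0x7u);   /* imm8[7] ignored in 32-bit mode */
}
```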
The comparison predicate operand is an 8-bit immediate:
• For instructions encoded using the VEX prefix, bits 4:0 define the type of
comparison to be performed (see Figure 5-9). Bits 5 through 7 of the immediate
are reserved.
• For instruction encodings that do not use VEX prefix, bits 2:0 define the type of
comparison to be made (see the first 8 rows of Table 5-9). Bits 3 through 7 of the
immediate are reserved.
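Here is a small C sketch of how an assembler or decoder might extract and validate the predicate field under the two rules above. The function names are hypothetical. Treating any set reserved bit as invalid follows the guidance later in this section that reserved Imm8 values should be treated as illegal syntax.

```c
#include <stdbool.h>
#include <stdint.h>

/* Width of the predicate field: imm8[4:0] with VEX, imm8[2:0] without. */
static uint8_t cmp_predicate_mask(bool vex_encoded)
{
    return vex_encoded ? 0x1Fu : 0x07u;
}

/* Extract the predicate number from a CMPPS/VCMPPS immediate. */
static unsigned cmp_predicate(uint8_t imm8, bool vex_encoded)
{
    return imm8 & cmp_predicate_mask(vex_encoded);
}

/* An immediate is well-formed only if none of its reserved bits are set. */
static bool cmp_imm_valid(uint8_t imm8, bool vex_encoded)
{
    return (imm8 & (uint8_t)~cmp_predicate_mask(vex_encoded)) == 0;
}
```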
The unordered relationship is true when at least one of the two source operands
being compared is a NaN; the ordered relationship is true when neither source
operand is a NaN.
A subsequent computational instruction that uses the mask result in the destination
operand as an input operand will not generate an exception, because a mask of all 0s
corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a
QNaN.
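The claim about mask values can be checked by reinterpreting the bit patterns as IEEE-754 single-precision values. In the C sketch below, the helper names are hypothetical. An all-zeros lane reads as +0.0. An all-ones lane has an all-ones exponent and the quiet bit (the most significant fraction bit) set, so it is a QNaN.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view a 32-bit compare-mask lane as a float. */
static float mask_lane_as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: true for a quiet NaN bit pattern. */
static int is_quiet_nan_bits(uint32_t bits)
{
    return (bits & 0x7F800000u) == 0x7F800000u   /* exponent all ones */
        && (bits & 0x00400000u) != 0;            /* quiet bit set */
}
```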
Note that processors with “CPUID.1H:ECX.AVX =0” do not implement the “greater-than”,
“greater-than-or-equal”, “not-greater-than”, and “not-greater-than-or-equal”
relation predicates. These comparisons can be made either by using the inverse
relationship (that is, use the “not-less-than-or-equal” to make a “greater-than”
comparison) or by using software emulation. When using software emulation, the
program must swap the operands (copying registers when necessary to protect the
data that will now be in the destination), and then perform the compare using a
different predicate. The predicate to be used for these emulations is listed in the first
8 rows of Table 3-7 (Intel 64 and IA-32 Architectures Software Developer’s Manual
Volume 2A) under the heading Emulation.
Compilers and assemblers may implement the following two-operand pseudo-ops in
addition to the three-operand CMPPS instruction, for processors with
“CPUID.1H:ECX.AVX =0”. See Table 5-12. Compilers should treat reserved Imm8
values as illegal syntax.
The greater-than relations that the processor does not implement require more than
one instruction to emulate in software and therefore should not be implemented as
pseudo-ops. (For these, the programmer should reverse the operands of the corresponding
less-than relations and use move instructions to ensure that the mask is
moved to the correct destination register and that the source operand is left intact.)
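The operand-swap approach can be modeled per lane in C. In this sketch the helper names are hypothetical, and each true result is an all-ones lane, as a compare mask would be. It also shows why the inverse-relationship route needs care. NLE is an unordered predicate and returns true when an input is a NaN. The swapped less-than, like a true greater-than, returns false.

```c
#include <stdint.h>

/* CMPLT (predicate 1) is ordered: false if either input is a NaN. */
static uint32_t cmplt_lane(float a, float b)
{
    return (a < b) ? 0xFFFFFFFFu : 0u;
}

/* CMPNLE (predicate 6) is unordered: true if either input is a NaN. */
static uint32_t cmpnle_lane(float a, float b)
{
    return !(a <= b) ? 0xFFFFFFFFu : 0u;
}

/* Emulated greater-than: swap the operands of less-than. */
static uint32_t emulated_cmpgt_lane(float a, float b)
{
    return cmplt_lane(b, a);
}
```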
Processors with “CPUID.1H:ECX.AVX =1” implement the full complement of 32 predicates
shown in Table 5-13, so software emulation is no longer needed. Compilers and
assemblers may implement the following three-operand pseudo-ops in addition to
the four-operand VCMPPS instruction. See Table 5-13, where the notation of reg1
and reg2 represents either XMM registers or YMM registers. Compilers should treat
reserved Imm8 values as illegal syntax. Alternatively, intrinsics can map the pseudo-ops
to pre-defined constants to support a simpler intrinsic interface.
maximum signed doubleword integer, the floating-point invalid exception is raised,
and if this exception is masked, the indefinite integer value (80000000H) is returned.
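The excerpt above begins mid-sentence and does not name the instruction, so the C sketch below is an illustration under assumptions. It models a truncating scalar double-to-doubleword conversion (CVTTSD2SI-style) with the invalid exception masked. NaN and out-of-range inputs return the integer indefinite value 80000000H. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a truncating double -> int32 conversion with the
 * invalid exception masked: NaN and out-of-range inputs yield the
 * "integer indefinite" value 80000000H (INT32_MIN). */
static int32_t cvtt_to_int32_masked(double x)
{
    if (x != x)                                   /* NaN */
        return INT32_MIN;
    if (x >= 2147483648.0 || x <= -2147483649.0)  /* out of range after truncation */
        return INT32_MIN;
    return (int32_t)x;                            /* C cast truncates toward zero */
}
```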
Legacy SSE instructions: Use of the REX.W prefix promotes the instruction to 64-bit
operation. See the summary chart at the beginning of this section for encoding data
and limits.
Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise
instructions will #UD.
128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit
memory location. The destination is not distinct from the first source XMM register,
and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are
unmodified.
An attempt to execute VDPPD encoded with VEX.L = 1 will cause a #UD exception.
Extracts 128 bits of packed floating-point values from the source operand (second
operand) at the 128-bit offset selected by imm8[0] into the destination operand (first
operand). The destination may be either an XMM register or a 128-bit memory location.
VEX.vvvv is reserved and must be 1111b; otherwise, instructions will #UD.
The high 7 bits of the immediate are ignored.
An attempt to execute VEXTRACTF128 encoded with VEX.L = 0 will cause a #UD exception.
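The half-selection rule can be modeled in C by treating a YMM register as eight floats and an XMM register as four. The helper is hypothetical; only imm8[0] takes part in the selection.

```c
#include <string.h>

/* Hypothetical model of VEXTRACTF128: copy the 128-bit half of src chosen
 * by imm8[0] (0 = low, 1 = high) into dst; imm8[7:1] are ignored. */
static void vextractf128_model(float dst[4], const float src[8], unsigned imm8)
{
    memcpy(dst, src + 4 * (imm8 & 1u), 4 * sizeof(float));
}
```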
The processor maintains internal state to manage the upper and lower halves of the YMM
registers for 256-bit VEX-encoded instructions and legacy 128-bit SIMD instructions.
Functionally, VEX-encoded SIMD instructions can be intermixed with legacy SSE
instructions (non-VEX-encoded SIMD instructions operating on XMM registers). However,
there is a performance impact when intermixing VEX-encoded SIMD instructions (AVX,
FMA) with legacy SSE instructions that only operate on the XMM register state.
The general programming considerations to realize optimal performance are the
following:
• Minimize transition delays and partial register stalls with YMM register accesses:
Intermixed 256-bit, 128-bit, or scalar SIMD instructions that are encoded with
VEX prefixes have no transition delay due to internal state management.
Sequences of legacy SSE instructions (including SSE2 and subsequent
generations of non-VEX-encoded SIMD extensions) that are not intermixed with
VEX-encoded SIMD instructions are not subject to transition delays.
• When an application must employ AVX and/or FMA along with legacy SSE code,
it should minimize the number of transitions between VEX-encoded instructions
and legacy, non-VEX-encoded SSE code. Section 2.8.1 provides recommendations
for software to minimize the impact of transitions between VEX-encoded code
and legacy SSE code.
In addition to performance considerations, programmers should also be cognizant of
the implications of VEX-encoded AVX instructions with the expectations of system
software components that manage the processor state components enabled by
XCR0. For additional information see Section 4.1.9.1, “Vector Length Transition and
Programming Considerations”.
1.3.3 VEX Prefix Instruction Encoding Support
Intel AVX introduces a new prefix, referred to as VEX, in the Intel 64 and IA-32
instruction encoding format. Instruction encoding using the VEX prefix provides the
following capabilities:
• Direct encoding of a register operand within VEX. This provides instruction syntax
support for non-destructive source operand.
• Efficient encoding of instruction syntax operating on 128-bit and 256-bit register
sets.
• Compaction of REX prefix functionality: The equivalent functionality of the REX
prefix is encoded within VEX.
• Compaction of SIMD prefix functionality and escape byte encoding: The functionality
of SIMD prefix (66H, F2H, F3H) on opcode is equivalent to an opcode
extension field to introduce new processing primitives. This functionality is
replaced by a more compact representation of opcode extension within the VEX
prefix. Similarly, the functionality of the escape opcode byte (0FH) and two-byte
escape (0F38H, 0F3AH) are also compacted within the VEX prefix encoding.
• Most VEX-encoded SIMD numeric and data processing instruction semantics with
memory operands have more relaxed memory alignment requirements than instructions
encoded using SIMD prefixes (see Section 2.5).
This is the last I will add in this post. But anyone who says AMD can use the VEX prefix is kidding themselves. It states in this PDF what the VEX prefix is. It's Intel software computational slices encoded.