Thoughts on "8 Core" Bulldozer and "4 Core Sandy Bridge"

garagisti · Jun 9, 2011

Lepton87 said:
Is anyone else having problems understanding this incoherent gibberish?

wife: i'm going to push you under a bus
nemesis: intel is gr8
wife: i loved you once
nemesis: but i love intel...
wife: i've got a gun you know
nemesis: i love intel
wife: goodbye and good riddance. BANG!
nemesis: I LOVE INTEL....

Sorry admins, got too tempted as threads are constantly derailed by Intel shills.

Intel today owns process and AMD leads architecturally. Haswell including stuff from BD is a testament to that...

Nemesis 1 · Jun 10, 2011

more on compiler

http://software.intel.com/en-us/art...r-intel-avx-intel-advanced-vector-extensions/

Optimization Notice
Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the "Intel® Compiler User and Reference Guides" under "Compiler Options." Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Notice revision #20101101.

ed29a · Jun 10, 2011

Nemesis 1 said:
NO I am saying that AVX is getting recompiled work Ready for the Haswell generation of cores and compilers running morphed x86 software. NO hardware X86. When haswell appears all the sse2&3 code that uses the prefix of vex code correctly well be known New programs should use instruction set that gives best results. Intel can complete the software This is were LRBni comes in and the vector unit comes into play As intel continues to morph X86 code. Everthing intel has done since 1999 has lead in this direction The Elbrus buy out . Intel wanting to do itanic with VLIW What intel was trying to do with the P6, AVX FMA4 announcement . Than saying they would do AVX and the Prefix of vec. Only cancelling FMA4. Larribbee and the development work LRBin with inorder cores and that development continues on inorder cores using a vector Unit TO further development of Vex instruction code along with SB IB and finally the LRBin instruction set running VLIW cpu Morphing X86.legacy code that wouldn't convert VeX instruction set with LRBIN tieing all three elements to gether Jit compilers VEX instruction set Prefix of VEX. compilers

Someone translate this please.

Nemesis 1 · Jun 10, 2011

Read the links it might help

Topweasel · Jun 10, 2011

Nemesis 1 said:
more on compiler

http://software.intel.com/en-us/art...r-intel-avx-intel-advanced-vector-extensions/

Optimization Notice
Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the "Intel® Compiler User and Reference Guides" under "Compiler Options." Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Notice revision #20101101.

That has nothing to do with Haswell and everything to do with the fact that, as of about 8 years ago, the compiler caused programs to check CPUID and, if the vendor wasn't GenuineIntel, automatically turn off all SSE functionality even if the CPU supported it.
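
To make the difference concrete, here is a minimal Python sketch contrasting a dispatcher keyed on the CPUID vendor string with one keyed on the feature flags the CPU actually reports. The function names and the dictionary layout are made up for illustration. This is not Intel's dispatcher code, only a model of the behaviour described above.

```python
# Hypothetical model of runtime code-path selection; all names are illustrative.

def pick_path_by_vendor(cpu):
    """Vendor-keyed dispatch: SIMD path only if the vendor string matches."""
    if cpu["vendor"] == "GenuineIntel" and "sse2" in cpu["features"]:
        return "sse2"
    return "generic"


def pick_path_by_features(cpu):
    """Feature-keyed dispatch: SIMD path whenever the CPU reports support."""
    if "sse2" in cpu["features"]:
        return "sse2"
    return "generic"


intel_cpu = {"vendor": "GenuineIntel", "features": {"sse", "sse2"}}
amd_cpu = {"vendor": "AuthenticAMD", "features": {"sse", "sse2"}}
```

With these two sample CPUs, the vendor-keyed version sends the AMD part down the generic path even though it reports SSE2, while the feature-keyed version treats both the same.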

Intel didn't get in trouble because the use of the compiler is not as high as one would think, and Intel doesn't, as a company, have to make sure their stuff runs well on competitors' products.

What it did mean is that several developers, especially ones making enthusiast products (games and benchmarks) and encoders, stopped using Intel's compilers if they had been using them. If their hope is to have their compiler move stuff to "better" support Haswell, it won't happen.

Nemesis 1 · Jun 10, 2011

You really do need to read the posted linKs posted. Compilers is in fact the topic.
YOU need to understand what the vexPrefix is and how it functions. And why AMD cann't use prefix . Which makes avx pretty useless on an AMD processor. Its mostly AMD marketing. The FMA4/AVX using AMD XOD prefix or what ever it is called cann't access AVX Intel software . In the cupid. or OS mirroring and masking will not allow access to Intel software. Read the link to the 800 page pdf and you will understand the differance of AMD AVX and intel AVX use the VEXprefix. Read why AMD is 0/0 and SB with the vex prefix is 0/1. Than and only than will you beable comprend what reality is.

Compiler and software and arch In the SB/BD generation is whats seperates the Men from the boys . I have posted almost all available info on this except the secret stuff that intel withheld.

Topweasel · Jun 10, 2011

Nemesis 1 said:
You really do need to read the posted linKs posted. Compilers is in fact the topic.
YOU need to understand what the vexPrefix is and how it functions. And why AMD cann't use prefix . Which makes avx pretty useless on an AMD processor. Its mostly AMD marketing. The FMA4/AVX using AMD XOD prefix or what ever it is called cann't access AVX Intel software . In the cupid. or OS mirroring and masking will not allow access to Intel software. Read the link to the 800 page pdf and you will understand the differance of AMD AVX and intel AVX use the VEXprefix. Read why AMD is 0/0 and SB with the vex prefix is 0/1. Than and only than will you beable comprend what reality is.

Compiler and software and arch In the SB/BD generation is whats seperates the Men from the boys . I have posted almost all available info on this except the secret stuff that intel withheld.

What I am saying is that disclaimer has nothing to do with AVX or any other special technology. I am not going to get into that tech as reading your attempts at using technical information to build a debate would make Data explode.

That disclaimer was to prevent legal trouble from customers. When AMD went after Intel over how their compiler treated AMD's processors, Intel's lawyers drew that up to protect themselves. Not from AMD, but from users of their compiler who may have taken issue with the fact that the compiler could have been screwing with their customer base.

As for Haswell at this point, I really don't care, we are a waaays off from seeing it. But I just wanted to note that if you were implying that the Intel compilers were going to be needed to reap the benefits of Haswell, I wouldn't hold your breath, because no self-respecting developer of games, encoders, or enthusiast software titles is going to cross that bridge again.

LOL_Wut_Axel · Jun 10, 2011

Edrick said:
Clearly you seem to be the one who is consumed with the marketing lingo.

News flash, if SB-E and Bulldozer are priced in the same segment, then they will compete with each other. It doesn't matter really what the pre-release roadmaps say.

If they were priced the same they wouldn't be in different performance categories altogether. Also, it's not just the CPU here: the SB-E platform will be much more expensive than Sandy Bridge and Bulldozer. For your argument to make sense, they'd have to have a similar platform first and foremost, which they won't.

3DVagabond · Jun 10, 2011

Nemesis 1 said:
You really do need to read the posted linKs posted. Compilers is in fact the topic.
YOU need to understand what the vexPrefix is and how it functions. And why AMD cann't use prefix . Which makes avx pretty useless on an AMD processor. Its mostly AMD marketing. The FMA4/AVX using AMD XOD prefix or what ever it is called cann't access AVX Intel software . In the cupid. or OS mirroring and masking will not allow access to Intel software. Read the link to the 800 page pdf and you will understand the differance of AMD AVX and intel AVX use the VEXprefix. Read why AMD is 0/0 and SB with the vex prefix is 0/1. Than and only than will you beable comprend what reality is.

Compiler and software and arch In the SB/BD generation is whats seperates the Men from the boys . I have posted almost all available info on this except the secret stuff that intel withheld.

If you want to have a discussion on non technical forums you need to explain things in layman's terms. You are being condescending to tell people to go read an 800 pg. document to be able to understand you. Most people here aren't going to care enough about what you think to go through the effort. Besides, odds are they/we won't understand the 800 pg technical document either. You need to communicate at a level that others can understand.

notty22 · Jun 10, 2011

3DVagabond said:
If you want to have a discussion on non technical forums you need to explain things in layman's terms. You are being condescending to tell people to go read an 800 pg. document to be able to understand you. Most people here aren't going to care enough about what you think to go through the effort. Besides, odds are they/we won't understand the 800 pg technical document either. You need to communicate at a level that others can understand.

Yes.

Cerb · Jun 10, 2011

Nemesis 1 said:
You really do need to read the posted linKs posted. Compilers is in fact the topic.
YOU need to understand what the vexPrefix is and how it functions. And why AMD cann't use prefix . Which makes avx pretty useless on an AMD processor. Its mostly AMD marketing. The FMA4/AVX using AMD XOD prefix or what ever it is called cann't access AVX Intel software . In the cupid. or OS mirroring and masking will not allow access to Intel software. Read the link to the 800 page pdf and you will understand the differance of AMD AVX and intel AVX use the VEXprefix. Read why AMD is 0/0 and SB with the vex prefix is 0/1. Than and only than will you beable comprend what reality is.

Nothing you have posted has yet shown why AMD can't use it. Has Intel forced AMD into an agreement that forbids them from supporting it, or haven't they?

Compiler and software and arch In the SB/BD generation is whats seperates the Men from the boys . I have posted almost all available info on this except the secret stuff that intel withheld.

Has Intel withheld some kind of GenuineIntel check being added to MSVC or GCC, then? Have they slipped in a secret patch to LLVM's x86 backend? The compilers that matter are 3rd party, and we like it that way.

How the instructions work is a technical implementation issue. Whether AMD can offer support for it is an IP rights licensing issue. Has Intel created a forbidden ISA extension? If they have, how will it not be an antitrust violation, if not a violation of agreements existing between AMD and Intel? If they have not created a forbidden ISA extension, then AMD may implement it in future CPUs.

Nemesis 1 · Jun 10, 2011

The documents clearly state that, in order for the VEX prefix to perform as designed, a JIT compiler is required, as Intel installed computational slices with the VEX prefix. It's software; it's not part of the core. AMD, in their usage of FMA4, has to use 0/0.

Intel uses 0/1. Without the 0/1, Intel's software that makes AVX perform on Intel's CPUs gives an illegal CPUID signal. In AVX, Intel allowed for the 0/0 path, but Intel, using software, shortened that path for its CPUs. In other words, the only difference between AMD's AVX and Intel's is that a lot of the computational work is optimized in software that only the VEX-prefix computational slices access. The PDF is easy to follow and there is a lot in it. I suppose I could list a page for ya to read that shows this, and then another and another. It's no small thing. But in the first 100 pages there is a wealth of information, and it just keeps getting better.

Yes, the secret path is 0/1. AMD has to use 0/0 to meet all the protocol of the AVX instruction set, or the OS supporting AVX will not function correctly. Intel's 0/1 allows the use of genuine Intel CPUs and its software. AMD can do the same function using 0/0, but not using Intel's software speedups of up to 2.2.

Nemesis 1 · Jun 11, 2011

I will post the link here, then I will go get a page. You can view the page and we will discuss it.

http://software.intel.com/file/35247/

One of the interesting points here is this compaction of REX functionality: the equivalent of the REX prefix is encoded within VEX. There is a lot of encoding going on here with the VEX prefix. AMD AVX has to use the REX prefix. It's explained in here what that prefix is.

Page 20, 1.3; page 21, 1.3.2; page 22, 1.3.3. Read those and see what ya get out of this. Then I will take ya onward.

goodnight

podspi · Jun 11, 2011

But software runs on hardware. If AMD is allowed to make Intel-compatible hardware (and vice-versa, with x64), the only way for Intel to shut AMD out is to lock-out their software.

If Intel ties their software to their CPU, that is again an antitrust issue. Like it or not, when you own the majority of a market like Intel does (especially if you've acted the way Intel has), you aren't allowed to do whatever you want. This is the law.

BTW, I read the pages that you specified. It DOES sound like some really cool stuff, but nothing in there implies that it is anything AMD could not implement. It looks like a new instruction/instruction syntax, which AMD has always been allowed to implement. Yes, they might have to play catch-up (since they have to wait to see what Intel actually does), but they can't be barred from implementing it.

Technically that isn't true, but they can be barred from implementing it as easily as Intel can be barred from implementing x64.

Cerb · Jun 11, 2011

Um, yeah, I can't beat podspi's 3rd paragraph. Nothing there appears to state that only Intel® processors are allowed to support it.

JFAMD · Jun 11, 2011

We are working with all of the compiler vendors to ensure that AMD processors can run AVX code. I can't speak to their tools (that is their business) but we have had no complaints nor issues with PGI, GCC, Microsoft, etc.

I can't speak for intel, but if their compiler doesn't support AVX on AMD but everyone else's does, that either means that a.) they are not smart enough to figure it out or b.) they are purposely doing something to mess with us.

I seriously doubt it is A and I believe, per the terms of their agreement, it can't be B. Which would lead me to believe that the whole discussion is FUD at this point.

Nemesis 1 · Jun 11, 2011

podspi said:
But software runs on hardware. If AMD is allowed to make Intel-compatible hardware (and vice-versa, with x64), the only way for Intel to shut AMD out is to lock-out their software.

If Intel ties their software to their CPU, that is again an antitrust issue. Like it or not, when you own the majority of a market like Intel does (especially if you've acted the way Intel has), you aren't allowed to do whatever you want. This is the law.

BTW, I read the pages that you specified. It DOES sound like some really cool stuff, but nothing in there implies that it is anything AMD could not implement. It looks like a new instruction/instruction syntax, which AMD has always been allowed to implement. Yes, they might have to play catch-up (since they have to wait to see what Intel actually does), but they can't be barred from implementing it.

Technically that isn't true, but they can be barred from implementing it as easily as Intel can be barred from implementing x64.

Hold on. I said I would walk you through it. I will post pages that show AMD cannot use the software path.

Nemesis 1 · Jun 11, 2011

JFAMD said:
We are working with all of the compiler vendors to ensure that AMD processors can run AVX code. I can't speak to their tools (that is their business) but we have had no complaints nor issues with PGI, GCC, Microsoft, etc.

I can't speak for intel, but if their compiler doesn't support AVX on AMD but everyone else's does, that either means that a.) they are not smart enough to figure it out or b.) they are purposely doing something to mess with us.

I seriously doubt it is A and I believe, per the terms of their agreement, it can't be B. Which would lead me to believe that the whole discussion is FUD at this point.

I already stated that Intel left a path for AMD, but not the software path. The VEX prefix determines when the software is used. AMD's prefix does not allow for this path, period, and never will. Intel's software is patented. Everything I am trying to convey is about the VEX prefix, which you already know AMD can't use. No mask, no mirroring allowed, period.

Nemesis 1 · Jun 11, 2011

VEX prefix encoding applies to SIMD instructions operating on YMM registers, XMM
registers, and in some cases with a general-purpose register as one of the operand.
VEX prefix is not supported for instructions operating on MMX or x87 registers.
Details of VEX prefix and instruction encoding are discussed in Chapter 4.

1.4.3 Arithmetic Primitives for 128-bit Vector and Scalar processing
Intel AVX provides 131 128-bit numeric processing instructions that employ VEX-prefix
encoding. These VEX-encoded instructions generally provide the same functionality
over instructions operating on XMM register that are encoded using SIMD
prefixes. The 128-bit numeric processing instructions in AVX cover floating-point and
integer data processing; across 128-bit vector and scalar processing.
The enhancement in AVX on 128-bit floating-point compare operation provides 32
conditional predicates to improve programming flexibility in evaluating conditional
expressions. This contrasts with floating-point SIMD compare instructions in SSE and
SSE2 supporting only 8 conditional predicates.
FMA provides 60 128-bit floating-point instructions to process 128-bit vector and
scalar data. The arithmetic operations cover fused multiply-add, fused multiply-subtract,
signed-reversed multiply on fused multiply-add and multiply-subtract

new variable blend instructions supports four-operand syntax with nondestructive
source syntax. Branching conditions dependent on floating-point
data or integer data can benefit from Intel AVX. This is more flexible than
non-VEX encoded instruction syntax that uses the XMM0 register as implied
mask for blend selection. While variable blend with implied XMM0 syntax is
supported in SSE4 using SIMD prefix encoding, VEX-encoded 128-bit variable
blend instructions only support the more flexible four-operand syntax.

Prior to using AVX, the application must identify that the operating system supports
the XGETBV instruction, the YMM register state, in addition to processor’s support for
YMM state management using XSAVE/XRSTOR and AVX instructions. The following
simplified sequence accomplishes both and is strongly recommended.
1) Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application use [1])
2) Issue XGETBV and verify that XFEATURE_ENABLED_MASK[2:1] = ‘11b’ (XMM
state and YMM state are enabled by OS).
3) Detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).
(Step 3 can be done in any order relative to 1 and 2)
The following pseudocode illustrates this recommended application AVX detection
process:
----------------------------------------------------------------------------------------
INT supports_AVX()
{ ; result in eax
mov eax, 1
cpuid
Figure 2-1. General Procedural Flow of Application Detection of AVX

[1] If CPUID.01H:ECX.OSXSAVE reports 1, it also indirectly implies the processor supports XSAVE,
XRSTOR, XGETBV, processor extended state bit vector XFEATURE_ENABLED_MASK register.
Thus an application may streamline the checking of CPUID feature flags for XSAVE and OSXSAVE.
XSETBV is a privileged instruction.

It is unwise for an application to rely exclusively on CPUID.1:ECX.AVX[bit 28]
or at all on CPUID.1:ECX.XSAVE[bit 26]: These indicate hardware support but not
operating system support. If YMM state management is not enabled by an operating
system, AVX instructions will #UD regardless of CPUID.1:ECX.AVX[bit 28].
“CPUID.1:ECX.XSAVE[bit 26] = 1” does not guarantee the OS actually uses the
XSAVE process for state management.
These steps above also apply to enhanced 128-bit SIMD floating-point instructions
in AVX (using VEX prefix-encoding) that operate on the YMM states. Application
detection of VEX-encoded AES is described in Section 2.2.2.

and ecx, 018001000H; keep only the OSXSAVE, AVX, FMA bits of CPUID.1:ECX
cmp ecx, 018001000H; check OSXSAVE, AVX, FMA feature flags
jne not_supported
; processor supports AVX,FMA instructions and XGETBV is enabled by OS
mov ecx, 0; specify 0 for XFEATURE_ENABLED_MASK register
XGETBV; result in EDX:EAX
and eax, 06H
cmp eax, 06H; check OS has enabled both XMM and YMM state support
jne not_supported
mov eax, 1
jmp done
not_supported:
mov eax, 0
done:
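
For anyone who finds the assembly hard to follow, the same three-step AVX check can be sketched in Python. This is only a model, not part of the Intel document: the register values are passed in by the caller instead of being read from hardware, and the function and constant names are made up for illustration. The bit positions are the ones given in the quoted steps.

```python
# Bit positions taken from the quoted detection steps.
OSXSAVE_BIT = 1 << 27    # CPUID.1:ECX.OSXSAVE
AVX_BIT = 1 << 28        # CPUID.1:ECX.AVX
XCR0_XMM_YMM = 0x6       # XFEATURE_ENABLED_MASK[2:1] = 11b


def supports_avx(cpuid1_ecx, xcr0_low):
    """Return True only if both the CPU and the OS report AVX support.

    cpuid1_ecx: ECX value from CPUID leaf 1 (supplied by the caller).
    xcr0_low:   EAX value from XGETBV with ECX = 0 (supplied by the caller).
    """
    needed = OSXSAVE_BIT | AVX_BIT
    if (cpuid1_ecx & needed) != needed:
        return False  # hardware lacks AVX, or OS has not enabled XGETBV
    # OS must have enabled both XMM and YMM state management.
    return (xcr0_low & XCR0_XMM_YMM) == XCR0_XMM_YMM
```

Note that `OSXSAVE_BIT | AVX_BIT` is 0x18000000; adding the FMA flag (bit 12) gives the 018001000H constant used in the fragment above.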

The lower 128 bits of a YMM register is aliased to the corresponding XMM register.
Legacy SSE instructions (i.e. SIMD instructions operating on XMM state but not using
the VEX prefix, also referred to non-VEX encoded SIMD instructions) will not access
the upper bits (255:128) of the YMM registers. AVX and FMA instructions with a VEX
prefix and vector length of 128-bits zeroes the upper 128 bits of the YMM register.
See Chapter 2, “Programming Considerations with 128-bit SIMD Instructions” for
more details.
Upper bits of YMM registers (255:128) can be read and written by many instructions
with a VEX.256 prefix.
XSAVE and XRSTOR may be used to save and restore the upper bits of the YMM registers.
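
The aliasing rule above can be illustrated with a toy model that treats a YMM register as a 256-bit Python integer. The function names are invented for the sketch, and it models only the upper-bits behaviour the excerpt describes, nothing else about the instructions.

```python
MASK128 = (1 << 128) - 1  # selects bits 127:0, i.e. the XMM alias


def write_legacy_sse(ymm, xmm_result):
    """Non-VEX SSE write: bits 127:0 replaced, bits 255:128 left untouched."""
    return (ymm & ~MASK128) | (xmm_result & MASK128)


def write_vex128(ymm, xmm_result):
    """VEX-encoded 128-bit write: bits 127:0 replaced, bits 255:128 zeroed."""
    return xmm_result & MASK128


# Example register with a recognisable pattern in the upper half.
ymm0 = (0xAAAA << 128) | 0x1111
after_sse = write_legacy_sse(ymm0, 0x2222)   # upper half survives
after_vex = write_vex128(ymm0, 0x2222)       # upper half cleared
```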

EAX Bits 31-0: Reports the valid bit fields of the lower 32 bits of the
XFEATURE_ENABLED_MASK register. If a bit is 0, the corresponding bit
field in XFEATURE_ENABLED_MASK is reserved.
    Bit 00: legacy x87
    Bit 01: 128-bit SSE
    Bit 02: 256-bit AVX
EBX Bits 31-0: Maximum size (bytes, from the beginning of the
XSAVE/XRSTOR save area) required by enabled features in XCR0. May
be different than ECX if some features at the end of the XSAVE save
area are not enabled.
ECX Bits 31-0: Maximum size (bytes, from the beginning of the
XSAVE/XRSTOR save area) of the XSAVE/XRSTOR save area required
by all supported features in the processor, i.e. all the valid bit fields in
XCR0.
EDX Bits 31-0: Reports the valid bit fields of the upper 32 bits of the
XFEATURE_ENABLED_MASK register (XCR0). If a bit is 0, the corresponding
bit field in XCR0 is reserved.
Processor Extended State Enumeration Sub-

Leaf 0DH output depends on the initial value in ECX.
If ECX contains an invalid sub leaf index, EAX/EBX/ECX/EDX return 0.
Each valid sub-leaf index maps to a valid bit in the XCR0 register
starting at bit position 2
EAX Bits 31-0: The size in bytes (from the offset specified in EBX) of the
save area for an extended state feature associated with a valid subleaf
index, n. This field reports 0 if the sub-leaf index, n, is invalid*.

YMM STATE, VEX PREFIX AND SUPPORTED OPERATING MODES
AVX and FMA instructions comprise 256-bit and 128-bit instructions that operate
on YMM states via VEX prefix encoding. SIMD instructions operating on XMM states
(i.e. not accessing the upper 128 bits of YMM) generally do not use VEX prefix.
For processors that support YMM states, the YMM state exists in all operating modes.
However, the available interfaces to access YMM states may vary in different modes.
The processor's support for instruction extensions that employ VEX prefix encoding is
independent of the processor's support for YMM state.
Instructions requiring VEX prefix encoding generally are supported in 64-bit, 32-bit
modes, and 16-bit protected mode. They are not supported in Real mode, Virtual-
8086 mode or entering into SMM mode.
Note that bits 255:128 of YMM register state are maintained across transitions into
and out of these modes. Because, XSAVE/XRSTOR instruction can operate in all operating
modes, it is possible that the processor's YMM register state can be modified by
software in any operating mode by executing XRSTOR. The YMM registers can be
updated by XRSTOR using the state information stored in the XSAVE/XRSTOR area
residing in memory.

Operating systems must use the XSAVE/XRSTOR instructions for YMM state management.
The XSAVE/XRSTOR instructions also provide flexible and efficient interface to
manage XMM/MXCSR states and x87 FPU states in conjunction with new processor
extended states.
An OS must enable its YMM state management to support AVX and FMA extensions.
Otherwise, an attempt to execute an instruction in AVX or FMA extensions (including
enhanced 128-bit SIMD instructions using VEX encoding) will cause a #UD exception.

3.2.6 Processor Extended State Save Optimization and XSAVEOPT
The XSAVEOPT instruction paired with XRSTOR is designed to provide a high performance
method for system software to perform state save and restore.
A processor may indicate its support for the XSAVEOPT instruction if
CPUID.(EAX=0DH, ECX=1):EAX.XSAVEOPT[Bit 0] = 1. The functionality of
XSAVEOPT is similar to XSAVE. Software can use XSAVEOPT/XRSTOR in a pair-wise
manner similar to XSAVE/XRSTOR to save and restore processor extended states.
The syntax and operands for XSAVEOPT instructions are identical to XSAVE, i.e. the
mask operand in EDX:EAX specifies the subset of enabled features to be saved.
Note that software using XSAVEOPT must observe the same restrictions as XSAVE
while allocating a new save area. i.e., the header area must be initialized to zeroes.
The first 64-bits in the save image header starting at offset 512 are referred to as
XHEADER.BV. However, the instruction differs from XSAVE in several important
aspects:
1. If a component state in the processor specified by the save mask corresponds to
an INIT state, the instruction may clear the corresponding bit in XHEADER.BV,
but may not write out the state (unlike the XSAVE instruction, which always
writes out the state).
2. If the processor determines that the component state specified by the save mask
hasn't been modified since the last XRSTOR, the instruction may not write out the
state to the save area.
3. An implication of this optimization is that software which needs to examine the
saved image must first check the XHEADER.BV to see if any bits are clear. If the
header bit is clear, it means that the state is INIT and the saved memory image
may not correspond to the actual processor state.
4. The performance of XSAVEOPT will always be better than or at least equal to that
of XSAVE.

3.2.6.1 XSAVEOPT Usage Guidelines
When using the XSAVEOPT facility, software must be aware of the following guidelines:
1. The processor uses a tracking mechanism to determine which state components
will be written to memory by the XSAVEOPT instruction. The mechanism includes
three sub-conditions that are recorded internally each time XRSTOR is executed
and evaluated on the invocation of the next XSAVEOPT. If a change is detected in
any one of these sub-conditions, XSAVEOPT will behave exactly as XSAVE. The
three sub-conditions are:
— current CPL of the logical processor
— indication whether or not the logical processor is in VMX non-root operation
— linear address of the XSAVE/XRSTOR area
2. Upon allocation of a new XSAVE/XRSTOR area and before an XSAVE or XSAVEOPT
instruction is used, the save area header (HEADER.XSTATE) must be initialized to
zeroes for proper operation.
3. XSAVEOPT is designed primarily for use in context switch operations. The values
stored by the XSAVEOPT instruction depend on the values previously stored in a
given XSAVE area.
4. Manual modifications to the XSAVE area between an XRSTOR instruction and the
matching XSAVEOPT may result in data corruption.
5. For optimization to be performed properly, the XRSTOR XSAVEOPT pair must use
the same segment when referencing the XSAVE area and the base of that
segment must be unchanged between the two operations.
6. Software should avoid executing XSAVEOPT into a buffer from which it hadn’t
previously executed a XRSTOR. For newly allocated buffers, software can execute
XRSTOR with the linear address of the buffer and a restore mask of EDX:EAX = 0.
Executing XRSTOR(0:0) doesn’t restore any state, but ensures expected
operation of the XSAVEOPT instruction.
7. The XSAVE area can be moved or even paged, but the contents at the linear
address of the save area at an XSAVEOPT must be the same as that when the
previous XRSTOR was performed.
A destination operand not aligned to 64-byte boundary (in either 64-bit or 32-bit
modes) will result in a general-protection (#GP) exception being generated. In 64-bit
mode, the upper 32 bits of RDX and RAX are ignored.
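
Two of the rules above lend themselves to a small model: the 64-byte alignment requirement, and the three tracked sub-conditions (CPL, VMX non-root operation, linear address) that must be unchanged since the last XRSTOR for the optimization to apply. The sketch below is a toy in Python with made-up class and function names. It does not model the INIT or modified-state tracking, and a Python exception stands in for #GP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestoreContext:
    cpl: int               # current privilege level of the logical processor
    vmx_non_root: bool     # whether the processor is in VMX non-root operation
    linear_address: int    # linear address of the XSAVE/XRSTOR area


def check_alignment(linear_address):
    """Raise if the save area is not 64-byte aligned (stand-in for #GP)."""
    if linear_address % 64 != 0:
        raise ValueError("#GP: save area not aligned to a 64-byte boundary")


def xsaveopt_may_skip_writes(at_last_xrstor, now):
    """True if all three tracked sub-conditions match the last XRSTOR.

    If any of them differs, XSAVEOPT is described as behaving exactly
    like XSAVE, so no state write may be skipped.
    """
    check_alignment(now.linear_address)
    return at_last_xrstor == now
```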

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

AVX and FMA instructions are encoded using a more efficient format than previous
instruction extensions in the Intel 64 and IA-32 architecture. The improved encoding
format uses a new prefix referred to as “VEX”. The VEX prefix may be two or three
bytes long, depending on the instruction semantics. Despite the length of the VEX
prefix, the instruction encoding format using VEX addresses two important issues:
(a) instruction encoding is inefficient because of SIMD prefixes and some
fields of the REX prefix; (b) both SIMD prefixes and the REX prefix increase
instruction byte-length. This chapter describes the instruction encoding format using VEX.

4.1 INSTRUCTION FORMATS
Legacy instruction set extensions in IA-32 architecture employ one or more “single-purpose”
bytes as an “escape opcode”, or a required SIMD prefix (66H, F2H, F3H), to
expand the processing capability of the instruction set. Intel 64 architecture uses the
REX prefix to expand the encoding of register access in instruction operands. Both
SIMD prefixes and REX prefix carry the side effect that they can cause the length of
an instruction to increase significantly. Legacy Intel 64 and IA-32 instruction sets are
limited to supporting instruction syntax of only two operands that can be encoded to
access registers (and only one can access a memory address).
Instruction encoding using VEX prefix provides several advantages:
• Instruction syntax support for three operands and up-to four operands when
necessary. For example, the third source register used by VBLENDVPD is encoded
using bits 7:4 of the immediate byte.
• Encoding support for vector length of 128 bits (using XMM registers) and 256 bits
(using YMM registers)
• Encoding support for instruction syntax of non-destructive source operands.
• Elimination of escape opcode byte (0FH), SIMD prefix byte (66H, F2H, F3H) via a
compact bit field representation within the VEX prefix.
• Elimination of the need to use REX prefix to encode the extended half of general-purpose
register sets (R8-R15) for direct register access, memory addressing, or
accessing XMM8-XMM15 (including YMM8-YMM15).
• Flexible and more compact bit fields are provided in the VEX prefix to retain the
full functionality provided by REX prefix. REX.W, REX.X, REX.B functionalities are
provided in the three-byte VEX prefix only because only a subset of SIMD instructions
need them.
• Extensibility for future instruction extensions without significant instruction
length increase.
AMD has no support for this or software now or ever

Figure 4-1 shows the Intel 64 instruction encoding format with VEX prefix support.
Legacy instruction without a VEX prefix is fully supported and unchanged. The use of
VEX prefix in an Intel 64 instruction is optional, but a VEX prefix is required for Intel
64 instructions that operate on YMM registers or support three and four operand
syntax. VEX prefix is not a constant-valued, “single-purpose” byte like 0FH, 66H,
F2H, F3H in legacy SSE instructions. VEX prefix provides substantially richer capability
than the REX prefix.

4.1.1 VEX and the LOCK prefix
Any VEX-encoded instruction with a LOCK prefix preceding VEX will #UD.
4.1.2 VEX and the 66H, F2H, and F3H prefixes
Any VEX-encoded instruction with a 66H, F2H, or F3H prefix preceding VEX will #UD.
4.1.3 VEX and the REX prefix
Any VEX-encoded instruction with a REX prefix preceding VEX will #UD.
4.1.4 The VEX Prefix
The VEX prefix is encoded in either the two-byte form (the first byte must be C5H) or
in the three-byte form (the first byte must be C4H). The two-byte VEX is used mainly
for 128-bit, scalar, and the most common 256-bit AVX instructions; while the three-byte
VEX provides a compact replacement of REX and 3-byte opcode instructions
(including AVX and FMA instructions). Beyond the first byte of the VEX prefix, it
consists of a number of bit fields providing specific capability; they are shown in
Figure 4-2.
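Since Figure 4-2 did not survive the paste, here is a rough Python sketch of how the two-byte (C5H) and three-byte (C4H) forms break down into fields. The bit positions come from the published VEX layout rather than from the excerpt above, and `decode_vex` and `Vex` are names invented for this example. It ignores the 32-bit mode case where C4H/C5H can still decode as LES/LDS.

```python
from typing import NamedTuple, Optional

# pp field -> implied legacy SIMD prefix; m-mmmm field -> implied escape bytes
SIMD_PREFIX = {0b00: None, 0b01: 0x66, 0b10: 0xF3, 0b11: 0xF2}
OPCODE_MAP = {0b00001: "0F", 0b00010: "0F38", 0b00011: "0F3A"}


class Vex(NamedTuple):
    length: int                 # 2 or 3 bytes
    r: int                      # REX.R equivalent (stored inverted)
    x: int                      # REX.X equivalent (stored inverted)
    b: int                      # REX.B equivalent (stored inverted)
    w: int                      # REX.W equivalent
    opcode_map: str             # implied leading opcode bytes
    vvvv: int                   # extra register operand (stored inverted)
    l: int                      # 0 = 128-bit, 1 = 256-bit
    simd_prefix: Optional[int]  # implied 66H/F3H/F2H, or None


def decode_vex(code: bytes) -> Vex:
    """Split a VEX prefix at the start of `code` into its fields."""
    if code[0] == 0xC5:  # two-byte form: R, vvvv, L, pp only
        p = code[1]
        return Vex(2, ((p >> 7) & 1) ^ 1, 0, 0, 0, "0F",
                   ~(p >> 3) & 0xF, (p >> 2) & 1, SIMD_PREFIX[p & 3])
    if code[0] == 0xC4:  # three-byte form adds X, B, W and the opcode map
        p1, p2 = code[1], code[2]
        return Vex(3, ((p1 >> 7) & 1) ^ 1, ((p1 >> 6) & 1) ^ 1,
                   ((p1 >> 5) & 1) ^ 1, (p2 >> 7) & 1,
                   OPCODE_MAP.get(p1 & 0x1F, "reserved"),
                   ~(p2 >> 3) & 0xF, (p2 >> 2) & 1, SIMD_PREFIX[p2 & 3])
    raise ValueError("not a VEX prefix")
```

As a usage example, `decode_vex(bytes.fromhex("C5FC58C1"))` (the bytes of `vaddps ymm0, ymm0, ymm1`) reports a two-byte prefix with `l == 1` and `vvvv == 0`, and the three-byte spelling `C4 E1 7C 58 C1` decodes to the same field values.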
The bit fields of the VEX prefix can be summarized by its functional purposes:
• Non-destructive source register encoding (applicable to three and four operand
syntax): This is the first source operand in the instruction syntax. It is
represented by the notation, VEX.vvvv. This field is encoded using 1’s complement form.

— NDS, NDD, DDS: specifies that VEX.vvvv field is valid for the encoding of a
register operand:
• VEX.NDS: VEX.vvvv encodes the first source register in an instruction
syntax where the content of source registers will be preserved.
• VEX.NDD: VEX.vvvv encodes the destination register that cannot be
encoded by ModR/M:reg field.
• VEX.DDS: VEX.vvvv encodes the second source register in a three-operand
instruction syntax where the content of first source register will
be overwritten by the result.
• If none of NDS, NDD, and DDS is present, VEX.vvvv must be 1111b (i.e.
VEX.vvvv does not encode an operand). The VEX.vvvv field can be
encoded using either the 2-byte or 3-byte form of the VEX prefix.
— 128,256: VEX.L field can be 0 (denoted by VEX.128 or VEX.LZ) or 1
(denoted by VEX.256). The VEX.L field can be encoded using either the 2-
byte or 3-byte form of the VEX prefix. The presence of the notation VEX.256
or VEX.128 in the opcode column should be interpreted as follows:
• If VEX.256 is present in the opcode column: The semantics of the
instruction must be encoded with VEX.L = 1. An attempt to encode this
instruction with VEX.L= 0 can result in one of two situations: (a) if
VEX.128 version is defined, the processor will behave according to the
defined VEX.128 behavior; (b) an #UD occurs if there is no VEX.128
version defined.
• If VEX.128 is present in the opcode column but there is no VEX.256
version defined for the same opcode byte: Two situations apply: (a) For
VEX-encoded, 128-bit SIMD integer instructions, software must encode
the instruction with VEX.L = 0. The processor will treat the opcode byte
encoded with VEX.L= 1 by causing an #UD exception; (b) For VEX-encoded,
128-bit packed floating-point instructions, software must
encode the instruction with VEX.L = 0. The processor will treat the opcode
byte encoded with VEX.L= 1 by causing an #UD exception (e.g.
VMOVLPS).
• If VEX.LIG is present in the opcode column: The VEX.L value is ignored.
This generally applies to VEX-encoded scalar SIMD floating-point instructions.
Scalar SIMD floating-point instructions can be distinguished by
the mnemonic of the instruction. Generally, the last two letters of the
instruction mnemonic would be either “SS”, “SD”, or “SI” for SIMD
floating-point conversion instructions.
5-4 Ref. # 319433-010

• “Legacy SSE”: Refers to SSE, SSE2, SSE3, SSSE3, SSE4, and any future
instruction sets referencing XMM registers and encoded without a VEX prefix.
• XGETBV, XSETBV, XSAVE, XRSTOR are defined in IA-32 Intel Architecture
Software Developer’s Manual, Volumes 3A and Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 2B.
• VEX: refers to a two-byte or three-byte prefix. AVX and FMA instructions are
encoded using a VEX prefix.
• VEX.vvvv. The VEX bitfield specifying a source or destination register (in 1’s
complement form).
• rm_field: shorthand for the ModR/M r/m field and any REX.B
• reg_field: shorthand for the ModR/M reg field and any REX.R
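The rm_field and reg_field shorthands above amount to gluing a one-bit extension onto a three-bit ModR/M field. A minimal sketch, assuming the standard ModR/M layout (mod in bits 7:6, reg in bits 5:3, r/m in bits 2:0); `split_modrm` is a made-up helper name, and `r` and `b` stand for the already-decoded REX.R and REX.B values (or their VEX equivalents).

```python
def split_modrm(modrm: int, r: int = 0, b: int = 0):
    """Return (mod, reg_field, rm_field) with the extension bits applied."""
    mod = (modrm >> 6) & 0b11
    reg_field = (r << 3) | ((modrm >> 3) & 0b111)  # ModR/M reg plus REX.R
    rm_field = (b << 3) | (modrm & 0b111)          # ModR/M r/m plus REX.B
    return mod, reg_field, rm_field
```

With `modrm = 0xC1` and no extension bits this selects registers 0 and 1; setting both `r` and `b` moves the same byte to registers 8 and 9.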

128-bit Legacy SSE version: The first source operand and the destination operand are
the same. Bits (VLMAX-1:128) of the corresponding YMM destination register remain
unchanged. The mask register operand is implicitly defined to be the architectural
register XMM0. An attempt to execute BLENDVPS with a VEX prefix will cause #UD.
VEX.128 encoded version: The first source operand and the destination operand are
XMM registers. The second source operand is an XMM register or 128-bit memory
location. The mask operand is the third source register, and encoded in bits[7:4] of
the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode,
imm8[7] is ignored. The upper bits (VLMAX-1:128) of the corresponding YMM
register (destination register) are zeroed. VEX.W must be 0, otherwise, the instruction
will #UD.
VEX.256 encoded version: The first source operand and destination operand are YMM
registers. The second source operand can be a YMM register or a 256-bit memory
location. The mask operand is the third source register, and encoded in bits[7:4] of
the immediate byte(imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode,
imm8[7] is ignored. VEX.W must be 0, otherwise, the instruction will #UD.
VBLENDVPS permits the mask to be any XMM or YMM register. In contrast,
BLENDVPS treats XMM0 implicitly as the mask and does not support non-destructive
destination operation.
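To make the imm8 mask-register encoding concrete, here is a small Python model. `mask_register_from_imm8` follows the text above (bits 7:4 select the register, bits 3:0 are ignored, bit 7 is ignored in 32-bit mode). `blendv_ps` additionally assumes the documented BLENDVPS behaviour that the sign bit of each 32-bit mask element picks the second source; that detail is not in the pasted excerpt. Both function names are invented for illustration.

```python
def mask_register_from_imm8(imm8: int, mode64: bool = True) -> int:
    """Register number of the mask operand encoded in imm8[7:4]."""
    index = (imm8 >> 4) & 0xF   # bits 3:0 of imm8 are ignored
    if not mode64:
        index &= 0x7            # imm8[7] is ignored in 32-bit mode
    return index


def blendv_ps(src1, src2, mask_bits):
    """Per element: take src2 if the mask's bit 31 is set, else src1.

    mask_bits holds the raw 32-bit pattern of each mask element.
    """
    return [b if (m >> 31) & 1 else a
            for a, b, m in zip(src1, src2, mask_bits)]
```

So an immediate of `0x9F` names register 9 in 64-bit mode and register 1 in 32-bit mode.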

The comparison predicate operand is an 8-bit immediate:
• For instructions encoded using the VEX prefix, bits 4:0 define the type of
comparison to be performed (see Figure 5-9). Bits 5 through 7 of the immediate
are reserved.
• For instruction encodings that do not use VEX prefix, bits 2:0 define the type of
comparison to be made (see the first 8 rows of Table 5-9). Bits 3 through 7 of the
immediate are reserved.
The unordered relationship is true when at least one of the two source operands
being compared is a NaN; the ordered relationship is true when neither source
operand is a NaN.
A subsequent computational instruction that uses the mask result in the destination
operand as an input operand will not generate an exception, because a mask of all 0s
corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a
QNaN.
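The predicate-width and mask remarks above can be checked with a short Python sketch. The helper names are hypothetical. Note that `predicate_bits` simply masks off the reserved bits for illustration, whereas the text says tools should reject reserved values.

```python
import math
import struct


def predicate_bits(imm8: int, vex_encoded: bool) -> int:
    """Bits 4:0 select the predicate with VEX, bits 2:0 without it."""
    return imm8 & (0x1F if vex_encoded else 0x07)


def is_unordered(a: float, b: float) -> bool:
    """Unordered is true when at least one operand is a NaN."""
    return math.isnan(a) or math.isnan(b)


def mask_as_float(mask: int) -> float:
    """Reinterpret a 32-bit compare mask as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", mask))[0]


def is_quiet_nan_pattern(mask: int) -> bool:
    """Exponent all ones, non-zero fraction, and the quiet bit (bit 22) set."""
    return (mask >> 23) & 0xFF == 0xFF and (mask >> 22) & 1 == 1
```

An all-ones mask reinterprets as a NaN with the quiet bit set and an all-zeros mask as +0.0, which is why feeding the mask into a later arithmetic instruction does not raise an exception.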
Note that processors with “CPUID.1H:ECX.AVX =0” do not implement the “greater-than”,
“greater-than-or-equal”, “not-greater-than”, and “not-greater-than-or-equal”
relation predicates. These comparisons can be made either by using the inverse
relationship (that is, use the “not-less-than-or-equal” to make a “greater-than”
comparison) or by using software emulation. When using software emulation, the
program must swap the operands (copying registers when necessary to protect the
data that will now be in the destination), and then perform the compare using a
different predicate. The predicate to be used for these emulations is listed in the first
8 rows of Table 3-7 (Intel 64 and IA-32 Architectures Software Developer’s Manual
Volume 2A) under the heading Emulation.
Compilers and assemblers may implement the following two-operand pseudo-ops in
addition to the three-operand CMPPS instruction, for processors with
“CPUID.1H:ECX.AVX =0”. See Table 5-12. Compiler should treat reserved Imm8
values as illegal syntax.

The greater-than relations that the processor does not implement require more than
one instruction to emulate in software and therefore should not be implemented as
pseudo-ops. (For these, the programmer should reverse the operands of the corresponding
less than relations and use move instructions to ensure that the mask is
moved to the correct destination register and that the source operand is left intact.)
Processors with “CPUID.1H:ECX.AVX =1” implement the full complement of 32 predicates
shown in Table 5-13, so software emulation is no longer needed. Compilers and
assemblers may implement the following three-operand pseudo-ops in addition to
the four-operand VCMPPS instruction. See Table 5-13, where the notations reg1
and reg2 represent either XMM registers or YMM registers. Compilers should treat
reserved Imm8 values as illegal syntax. Alternately, intrinsics can map the pseudo-ops
to pre-defined constants to support a simpler intrinsic interface.

maximum signed doubleword integer, the floating-point invalid exception is raised,
and if this exception is masked, the indefinite integer value (80000000H) is returned.
Legacy SSE instructions: Use of the REX.W prefix promotes the instruction to 64-bit
operation. See the summary chart at the beginning of this section for encoding data
and limits.
Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise
instructions will #UD.

128-bit Legacy SSE version: The second source can be an XMM register or an 128-bit
memory location. The destination is not distinct from the first source XMM register
and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are
unmodified.
If VDPPD is encoded with VEX.L= 1, an attempt to execute the instruction encoded
with VEX.L= 1 will cause an #UD exception.

Extracts 128-bits of packed floating-point values from the source operand (second
operand) at an 128-bit offset from imm8[0] into the destination operand (first
operand). The destination may be either an XMM register or an 128-bit memory location.
VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
The high 7 bits of the immediate are ignored.
If VEXTRACTF128 is encoded with VEX.L= 0, an attempt to execute the instruction
encoded with VEX.L= 0 will cause an #UD exception.
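A tiny Python model of the extraction described above, treating a YMM register as a list of eight single-precision lanes with index 0 as the lowest element. The list representation and the function name are assumptions made for this example.

```python
def vextractf128(src, imm8: int):
    """Return the lower (imm8[0] = 0) or upper (imm8[0] = 1) four lanes."""
    if len(src) != 8:
        raise ValueError("expected 8 single-precision lanes (256 bits)")
    half = imm8 & 1  # the high 7 bits of the immediate are ignored
    return src[4 * half:4 * half + 4]
```

Calling it with `imm8 = 0xFE` gives the same result as `imm8 = 0`, because only bit 0 of the immediate matters.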

256-bit VEX-encoded instructions and legacy 128-bit SIMD instructions have internal
state to manage the upper and lower halves of the YMM states. Functionally, VEX-encoded
SIMD instructions can be intermixed with legacy SSE instructions (non-VEX-encoded
SIMD instructions operating on XMM registers). However, there is a performance
impact with intermixing VEX-encoded SIMD instructions (AVX, FMA) and
Legacy SSE instructions that only operate on the XMM register state.
The general programming considerations to realize optimal performance are the
following:
• Minimize transition delays and partial register stalls with YMM registers accesses:
Intermixed 256-bit, 128-bit or scalar SIMD instructions that are encoded with
VEX prefixes have no transition delay due to internal state management.
Sequences of legacy SSE instructions (including SSE2, and subsequent
generations non-VEX-encoded SIMD extensions) that are not intermixed with
VEX-encoded SIMD instructions are not subject to transition delays.
• When an application must employ AVX and/or FMA, along with legacy SSE code,
it should minimize the number of transitions between VEX-encoded instructions
and legacy, non-VEX-encoded SSE code. Section 2.8.1 provides recommendation
for software to minimize the impact of transitions between VEX-encoded code
and legacy SSE code.
In addition to performance considerations, programmers should also be cognizant of
the implications of VEX-encoded AVX instructions with the expectations of system
software components that manage the processor state components enabled by
XCR0. For additional information see Section 4.1.9.1, “Vector Length Transition and
Programming Considerations”.
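The interplay between the CPUID feature flag and the XCR0 state that system software enables can be sketched as a pure function. The bit positions (CPUID.1H:ECX bit 27 for OSXSAVE, bit 28 for AVX, bit 12 for FMA; XCR0 bit 1 for SSE state and bit 2 for AVX state) come from Intel's published documentation rather than from the excerpt above. Python cannot execute CPUID or XGETBV itself, so the raw register values are taken as inputs here, and the function names are invented for this example.

```python
CPUID1_ECX_FMA = 1 << 12      # FMA instructions present
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XGETBV
CPUID1_ECX_AVX = 1 << 28      # AVX instructions present
XCR0_SSE = 1 << 1             # OS saves/restores XMM state
XCR0_AVX = 1 << 2             # OS saves/restores upper YMM state


def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """True if the CPU reports AVX and the OS manages XMM and YMM state."""
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False  # XGETBV is not available, so XCR0 cannot be trusted
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False
    needed = XCR0_SSE | XCR0_AVX
    return (xcr0 & needed) == needed


def fma_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """FMA needs the same OS support as AVX plus its own feature bit."""
    return avx_usable(cpuid1_ecx, xcr0) and bool(cpuid1_ecx & CPUID1_ECX_FMA)
```

The check only looks at feature bits and OS-enabled state; real code would obtain `cpuid1_ecx` and `xcr0` from native CPUID and XGETBV calls.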

1.3.3 VEX Prefix Instruction Encoding Support
Intel AVX introduces a new prefix, referred to as VEX, in the Intel 64 and IA-32
instruction encoding format. Instruction encoding using the VEX prefix provides the
following capabilities:
• Direct encoding of a register operand within VEX. This provides instruction syntax
support for non-destructive source operand.
• Efficient encoding of instruction syntax operating on 128-bit and 256-bit register
sets.
• Compaction of REX prefix functionality: The equivalent functionality of the REX
prefix is encoded within VEX.
• Compaction of SIMD prefix functionality and escape byte encoding: The functionality
of SIMD prefix (66H, F2H, F3H) on opcode is equivalent to an opcode
extension field to introduce new processing primitives. This functionality is
replaced by a more compact representation of opcode extension within the VEX
prefix. Similarly, the functionality of the escape opcode byte (0FH) and two-byte
escape (0F38H, 0F3AH) are also compacted within the VEX prefix encoding.
• Most VEX-encoded SIMD numeric and data processing instruction semantics with
memory operand have more relaxed memory alignment requirements than instructions
encoded using SIMD prefixes (see Section 2.5).
VEX prefix encoding applies to SIMD instructions operating on YMM registers, XMM
registers, and in some cases with a general-purpose register as one of the operands.
VEX prefix is not supported for instructions operating on MMX or x87 registers.

This is the last I will add in this post. But anyone who says AMD can use the VEX prefix is kidding themselves. It states in this PDF what the VEX prefix is. It's Intel software, computational slices encoded.

podspi · Jun 11, 2011

Nemesis 1 said:
Hold on. I said I would walk you through it. I posted pages that show AMD cannot use the software path.

I read the pages, it does not show this

It IS a really cool tech, though its usefulness may be negated if we all have capable IGPs compatible with OpenCL in the near future. (Just my opinion).

I see absolutely nothing about it being exclusive to Intel, however.

Nemesis 1 · Jun 11, 2011

podspi said:
I read the pages, it does not show this

It IS a really cool tech, though its usefulness may be negated if we all have capable IGPs compatible with OpenCL in the near future. (Just my opinion).

I see absolutely nothing about it being exclusive to Intel, however.

Page 47, section 2.7.2, shows a chart of protected and compatibility modes.

Nemesis 1 · Jun 11, 2011

podspi said:
I read the pages, it does not show this

It IS a really cool tech, though its usefulness may be negated if we all have capable IGPs compatible with OpenCL in the near future. (Just my opinion).

I see absolutely nothing about it being exclusive to Intel, however.

You get anyone from AMD here to say the VEX prefix is not an Intel exclusive and they would be lying. It's a computational slice that is encoded.

Nemesis 1 · Jun 11, 2011

JFAMD said:
We are working with all of the compiler vendors to ensure that AMD processors can run AVX code. I can't speak to their tools (that is their business) but we have had no complaints nor issues with PGI, GCC, Microsoft, etc.

I can't speak for Intel, but if their compiler doesn't support AVX on AMD but everyone else's does, that either means that a.) they are not smart enough to figure it out or b.) they are purposely doing something to mess with us.

I seriously doubt it is A and I believe, per the terms of their agreement, it can't be B. Which would lead me to believe that the whole discussion is FUD at this point.

Get off it. That's pure hype and you know it. Good to see you properly worded it as AVX.

Are you saying that AMD can use the VEX prefix? You know they can't, because they don't have the encoding or the software. AMD using AVX can't compact REX. They have to use the REX prefix when doing legacy code where it applies.

Intel has no FMA on SB, yet they have AVX. AMD can't do AVX without FMA4; Intel can, and AMD will never have the VEX prefix.

Nemesis 1 · Jun 11, 2011

podspi said:
But software runs on hardware. If AMD is allowed to make Intel-compatible hardware (and vice-versa, with x64), the only way for Intel to shut AMD out is to lock-out their software.

If Intel ties their software to their CPU, that is again an antitrust issue. Like it or not, when you own the majority of a market like Intel does (especially if you've acted the way Intel has), you aren't allowed to do whatever you want. This is the law.

BTW, I read the pages that you specified. It DOES sound like some really cool stuff, but nothing in there implies that it is anything AMD could not implement. It looks like a new instruction/instruction syntax, which AMD has always been allowed to implement. Yes, they might have to play catch-up (since they have to wait to see what Intel actually does), but they can't be barred from implementing it.

Technically that isn't true, but they can be barred from implementing it as easy as Intel can be barred from implementing x64.

No. With the VEX prefix, AVX provides for a space. In that space is the VEX prefix, and space is set aside for further VEX prefix values to be added in the computational slices of VEX. This space is AMD no-man's-land. Intel has done nothing wrong. They are just using software alongside AVX. That space is the VEX prefix. AVX was invented by Intel for Intel, but Intel knew that they would have to share AVX. Thus the reason for the software in the VEX prefix space and the AVX space. Intel's VEX prefix is all patented software. AMD can't use this software, ever. It took Intel over 15 years to develop this hardware/software combination. AMD can do the hardware, but the software is untouchable.

AMD can never, under any circumstances, stop Intel from using x86-64, ever, no more than Intel can stop AMD from using x86. There are things that are not covered by the AMD/Intel license agreement. The VEX prefix is one of those things. It's software.

Nemesis 1 · Jun 11, 2011

As soon as software comes out using the VEX prefix, it's a simple matter of running a benchmark on SSE2 programs that have been recompiled automatically. Run the program on Intel, then on AMD, and you will see at least a 2.2 increase in performance. Not so on an AMD BD core. Not even close, and I don't care what compiler AMD uses. The OS will not allow an AMD processor to run, or try to run, the VEX prefix. I already showed you some pages that tell you when CPUID does its job correctly. AMD is 0/0, Intel is 0/1; the 1 is the key to the software in the VEX prefix. Think of the VEX prefix as mitosis; Intel is not giving this to AMD now or ever. I gave you the page number for compatibilities; read through it and you will see that AMD AVX is nothing more than a talking point. AMD has FMA4. AMD will try to do AVX within the primitives of Intel's AVX. If they do not, then CPUID will signal this and the OS will not allow access to AVX instructions. Also keep in mind this paper covers Intel AVX and FMA. AMD's FMA is not covered other than VEX prefix exceptions.
