Thoughts on "8 Core" Bulldozer and "4 Core Sandy Bridge"

Nemesis 1 · Jun 11, 2011

These are the basics of Intel's VEX prefix:

The VEX prefix and VEX coding scheme is a proposed future extension to the x86 instruction set architecture for microprocessors from Intel, AMD and others.

Features

The proposed VEX coding scheme extends the existing x86 instruction set architecture to allow the definition of new instructions and the extension or modification of previously existing instruction codes. This serves the following purposes:

The opcode map is extended to make space for future instructions.
It allows instruction codes to have up to five operands, where the original scheme allows only two operands (in rare cases three operands).
It allows the size of SIMD vector registers to be extended from the 128-bit XMM registers to 256-bit registers named YMM. There is room for further extensions of the register size in the future.
It allows existing two-operand instructions to be modified into non-destructive three-operand forms where the destination register is different from both source registers. For example c:=a+b instead of a:=a+b (where register a is changed by the instruction).

Technical description

The proposed VEX coding scheme uses a code prefix consisting of 2 or 3 bytes which is added to existing or new instruction codes[1].

The VEX prefix replaces the most commonly used instruction prefix bytes and escape codes. In many cases, the number of prefix bytes and escape bytes that are replaced is the same as the number of bytes in the VEX prefix, so that the total length of the VEX-encoded instruction is the same as the length of the legacy instruction code. In other cases, the VEX-encoded version is longer or shorter than the legacy code.

The 3-byte VEX prefix contains the following components:

The four bits R,X,B,W contained in the REX prefix used in the x86-64 instruction set extension.
Two bits named pp to replace operand size prefixes and operand type prefixes (66, F2, F3).
A bit named L specifying 256-bit vector length.
Four bits named vvvv specifying a second source register operand.
Five bits named m-mmmm. Two of the m bits are used for replacing existing escape codes and for specifying the length of the instruction. The remaining three m bits are reserved for future use, such as specifying vector lengths > 256 bits, specifying different instruction lengths, or extending the opcode space.

The 2-byte VEX prefix contains a subset of these components and can be used in cases where not all components are needed.

The encoding is as follows:

             First byte         Second byte        Third byte
             7 6 5 4 3 2 1 0    7 6 5 4 3 2 1 0    7 6 5 4 3 2 1 0
3-byte VEX   1 1 0 0 0 1 0 0    R̅ X̅ B̅ m m m m m    W v̅ v̅ v̅ v̅ L p p
2-byte VEX   1 1 0 0 0 1 0 1    R̅ v̅ v̅ v̅ v̅ L p p

The R̅, X̅ and B̅ bits are equivalent to the REX prefix's R, X and B bits, providing a fourth register number bit for each of the three registers referenced by a standard x86 instruction: the register operand, and the index and base registers for the memory operand. The v̅ bits specify an additional source register, or are set to all-ones if not used. All of these bits are complemented in the instruction stream, so they are encoded as 1 bits in 32-bit mode.

The VEX opcode bytes (C4 and C5) are the same as those used by the LES and LDS instructions. These instructions are not supported in 64-bit mode, while in 32-bit mode, the following "mod R/M" byte cannot be of the form "11xxxxxx" (which would specify a register operand). The bit inversion ensures that the second byte of a VEX prefix is always of this form in 32-bit mode.
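To make the LES/LDS trick concrete, here is a minimal Python sketch of the check a decoder could make. The function name `looks_like_vex` and the `long_mode` flag are made up for illustration; they do not come from any Intel or AMD document, and a real decoder has more cases to handle.

```python
def looks_like_vex(first: int, second: int, long_mode: bool) -> bool:
    """Return True if the two bytes start a VEX prefix rather than LES/LDS.

    Hypothetical helper, based only on the rule described above.
    """
    if first not in (0xC4, 0xC5):
        return False
    if long_mode:
        # LES/LDS do not exist in 64-bit mode, so C4/C5 can only mean VEX.
        return True
    # In 32-bit mode LES/LDS need a memory operand, so their mod R/M byte
    # never has the top two bits set to 11. The inverted bits stored at the
    # top of the VEX second byte are always 1 there, so the byte reads 11xxxxxx.
    return (second >> 6) == 0b11
```

In 32-bit mode the byte pair C5 F8 is reported as VEX, while C4 08 is left to be decoded as LES with a memory operand.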

The W bit is equivalent to the REX prefix's W bit, and specifies a 64-bit operand. For non-integer instructions, it is a general opcode extension bit.

The 5 m bits replace leading opcode bytes. The values 1, 2 and 3 are equivalent to opcodes 0F, 0F 38 and 0F 3A; all other values are currently reserved. (The 2-byte VEX prefix always corresponds to a 0F prefix.)

The L bit indicates the vector length. It is 0 for 128-bit SSE (xmm) registers, and 1 for 256-bit AVX (ymm) registers.

The p bits encode additional prefix bytes. The 4 possible values are none, 66, F3, and F2. These encode the operand type for SSE instructions: packed single, packed double, scalar single and scalar double, respectively.
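The field layout above can be sketched as a small Python decoder. This is only an illustration under the bit positions given in the table: `VexFields`, `decode_vex`, `OPCODE_MAP` and `LEGACY_PREFIX` are names invented here, and the R, X, B and vvvv values are returned already un-inverted.

```python
from typing import NamedTuple

# Lookup tables taken from the description above; other values are reserved.
OPCODE_MAP = {1: "0F", 2: "0F 38", 3: "0F 3A"}
LEGACY_PREFIX = {0: None, 1: "66", 2: "F3", 3: "F2"}


class VexFields(NamedTuple):
    length: int   # 2 or 3 prefix bytes
    r: int
    x: int
    b: int
    w: int
    mmmmm: int    # opcode map selector
    vvvv: int     # extra source register number (0 if unused)
    l: int        # 0 = 128-bit, 1 = 256-bit
    pp: int       # implied legacy prefix selector


def decode_vex(data: bytes) -> VexFields:
    """Split a 2- or 3-byte VEX prefix into its fields (hypothetical helper)."""
    if data[0] == 0xC4:
        b1, b2 = data[1], data[2]
        return VexFields(
            length=3,
            r=((b1 >> 7) & 1) ^ 1,         # stored inverted
            x=((b1 >> 6) & 1) ^ 1,         # stored inverted
            b=((b1 >> 5) & 1) ^ 1,         # stored inverted
            w=(b2 >> 7) & 1,
            mmmmm=b1 & 0x1F,
            vvvv=((b2 >> 3) & 0xF) ^ 0xF,  # stored inverted
            l=(b2 >> 2) & 1,
            pp=b2 & 0b11,
        )
    if data[0] == 0xC5:
        b1 = data[1]
        return VexFields(
            length=2,
            r=((b1 >> 7) & 1) ^ 1,
            x=0, b=0, w=0,                 # not present in the short form
            mmmmm=1,                       # short form always implies 0F
            vvvv=((b1 >> 3) & 0xF) ^ 0xF,
            l=(b1 >> 2) & 1,
            pp=b1 & 0b11,
        )
    raise ValueError("not a VEX prefix")
```

As a usage example, the bytes C5 F4 58 C2 (which, as far as I know, is what assemblers emit for vaddps ymm0, ymm1, ymm2) decode to a 2-byte prefix with vvvv = 1, L = 1 and pp = 0.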

Instructions that need more than three operands have an extra suffix byte specifying one or two additional register operands. Instructions coded with the VEX prefix can have up to five operands. At most one of the operands can be a memory operand; and at most one of the operands can be an immediate constant of 4 or 8 bits. The remaining operands are registers.

The AVX instruction set is the first instruction set extension to use the VEX coding scheme. The AVX instructions have up to four operands. The AVX instruction set allows the VEX prefix to be applied only to instructions using the SIMD XMM registers. However, the VEX coding scheme has space for applying the VEX prefix to other instructions as well in future instruction sets.

Legacy instructions with a VEX prefix added are equivalent to the same instructions without VEX prefix with the following differences:

The VEX-encoded instruction can have one more operand, making it non-destructive.
A 128-bit XMM instruction without VEX prefix leaves the upper half of the full 256-bit YMM register unchanged, while the VEX-encoded version sets the upper half to zero.
Instructions that use the whole 256-bit YMM register should not be mixed with non-VEX instructions that leave the upper half of the register unchanged, for reasons of efficiency.
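
The upper-half rule in the list above can be modelled with plain Python integers standing in for a 256-bit YMM register. This is a toy model, not how hardware works internally, and the function names are made up for the illustration.

```python
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1


def write_xmm_legacy(ymm: int, value128: int) -> int:
    """Non-VEX 128-bit write: bits 255..128 of the register are left unchanged."""
    upper = ymm & MASK256 & ~MASK128
    return upper | (value128 & MASK128)


def write_xmm_vex(ymm: int, value128: int) -> int:
    """VEX-encoded 128-bit write: bits 255..128 of the register are set to zero."""
    return value128 & MASK128
```

Starting from a register whose upper half holds some old value, the legacy write keeps that old value next to the new lower half, while the VEX write returns only the new lower half.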

The XOP instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, due to begin production in 2011.[1]

XOP is a revision of the SSE5 instruction set proposal announced on August 30, 2007. This revision makes the binary coding of the proposed new instructions more compatible with Intel's AVX instruction extensions, while the functionality of the instructions is unchanged.[2]

The XOP instructions include:

Integer vector multiply-accumulate instructions
Integer vector horizontal addition
Integer vector compare
Integer vector shift and rotate instructions
Vector byte permutation
Vector conditional move instructions
Floating point fraction extraction
The XOP instruction set is supplemented by the FMA4 (floating point vector multiply-accumulate) and CVT16 (Half precision floating point conversion) instruction sets, which were also included in SSE5.

Compatibility issues

AMD has changed the encoding from the original SSE5 specification in order to improve compatibility with Intel's AVX instruction set and the new VEX coding scheme.

All SSE5 instructions that were equivalent or similar to instructions in the AVX and FMA4 instruction sets announced by Intel have been changed to use the coding proposed by Intel. Integer instructions without equivalents in AVX were classified as the XOP extension.[3] The XOP instructions use the opcode byte 8F (hexadecimal), but otherwise have a coding scheme almost identical to AVX with the 3-byte VEX prefix.

Commentators[4] have seen this as evidence that Intel has not allowed AMD to use any part of the large VEX coding space. AMD has been forced to use different codes in order to avoid using any code combination that Intel might possibly be using in their development pipeline for something else. The XOP coding scheme is as close to the VEX scheme as technically possible without risking that the AMD codes overlap with any future Intel codes. It must be noted that this inference is speculative, since no public information is available about negotiations between the two companies on this issue.

The use of the 8F byte requires that the m-bits (see VEX coding scheme) have a value bigger than or equal to 8 in order to avoid overlap with existing instructions. The C4 byte used in the VEX scheme has no such restriction. This may prevent the use of the m-bits for other purposes in the future in the XOP scheme, but not in the VEX scheme. Another possible problem is that the pp bits have the value 00 in the XOP scheme, while they have the value 01 in the VEX scheme for instructions that have no legacy equivalent. This may complicate the use of the pp bits for other purposes in the future.
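
The m-bits restriction can be sketched as follows. My understanding (an assumption, not something stated in the text above) is that the existing instruction at 8F is POP, encoded as 8F /0, so its mod R/M reg field (bits 5..3) is always zero. A map value of 8 or more sets bit 3 or bit 4 of the second byte, which makes that field non-zero. The function name `is_xop_prefix` is invented here.

```python
def is_xop_prefix(first: int, second: int) -> bool:
    """Return True if the bytes start an XOP prefix rather than a legacy 8F opcode.

    Hypothetical helper based on the m-bits >= 8 rule described above.
    """
    if first != 0x8F:
        return False
    mmmmm = second & 0x1F  # low five bits of the second byte
    return mmmmm >= 8
```

With this check, 8F E8 (map 8) is treated as XOP, while 8F C0 (map 0, a register-form POP) is not.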

A similar compatibility issue is the difference between the FMA3 and FMA4 instruction sets. Intel initially proposed FMA4 in AVX/FMA specification version 3 to supersede the 3-operand FMA proposed by AMD in SSE5. After AMD adopted FMA4, however, Intel canceled FMA4 support and reverted to FMA3 in the AVX/FMA specification version

This is a part a server guy should get. Being that JW is a server guy, I ask him how this is done with AMD's XOP prefix:

The VEX-encoded instruction can have one more operand, making it non-destructive.
A 128-bit XMM instruction without VEX prefix leaves the upper half of the full 256-bit YMM register unchanged, while the VEX-encoded version sets the upper half to zero.
Instructions that use the whole 256-bit YMM register should not be mixed with non-VEX instructions that leave the upper half of the register unchanged, for reasons of efficiency

Lepton87 · Jun 11, 2011

Nemesis 1 if you want to be understood and taken seriously you should really work on your English, because right now it's worse than that of first year ESL students.

While Nemesis 1 works on his English, I recommend you work on your manners. What you just said is kind of insulting in case you didn't know or care to know. Keep these comments to yourself in the future.

Anandtech Moderator - Keysplayr

Nemesis 1 · Jun 11, 2011

YOU read the 800-page PDF, then. After you're done, come back and tell us what ya learned in a short paragraph. Good luck. "You need to work on your English" is BS when the links are here unread. But even if ya read the link, I doubt ya would comprehend what's written. The links are here; read them and then you should get it, rather than relying on others to tell ya what it says.

AMD saying XOP is as close to VEX as they can get is likely true. But they don't tell you it's a country mile in difference.

Cerb · Jun 11, 2011

Nemesis 1 said:
AMD saying XOP is as close to VEX as they can get is likely true. But they don't tell you it's a country mile in difference.

That all appears to refer to AMD adding their own codes, not implementing codes Intel has set in stone. I think it's safe to assume that nobody thinks AMD even should be able to add their own instructions to Intel's extension. There are legitimate perks to being the market leader, and that would be one of them.

If there is a clandestine agreement over it, or an open one, for that matter, that would be the evidence you will need. Not anything technical.

Commentators[4] have seen this as evidence that Intel has not allowed AMD to use any part of the large VEX coding space. AMD has been forced to use different codes in order to avoid using any code combination that Intel might possibly be using in their development pipeline for something else. The XOP coding scheme is as close to the VEX scheme as technically possible without risking that the AMD codes overlap with any future Intel codes. It must be noted that this inference is speculative, since no public information is available about negotiations between the two companies on this issue.

That, FI, is the only relevant point you've posted, and even that speculation only refers to AMD coming up with their own instructions.

Nemesis 1 · Jun 11, 2011

We went through this debate once before. In that debate I linked Intel's white paper on AVX. In that paper (the PDF I posted) it said that the VEX prefix is an Intel exclusive. It goes into some detail on the use of REX. Intel's VEX prefix compacts the REX prefix inside the VEX prefix so that Intel doesn't have to use the REX prefix in their coding scheme; AMD does, however. AMD cannot enter the VEX prefix space, so nothing they do with XOP will allow the OS to run AVX, because it will raise a UD exception.

Page 47, starting at 2.7.2, has a lot of charts on what will trigger exceptions. Intel left space in the VEX prefix for Intel processors only. AMD has XOP, which is basically nothing to write home about. XOP can't function in the VEX prefix space without causing a UD exception. It also lists illegal mask and mirror. All of which will tell the OS not to run AVX apps if Intel's space is infringed on in the VEX prefix space.

Nemesis 1 · Jun 11, 2011

I hope you guys can understand this; it doesn't get any simpler.

There is no transition penalty if an application clears the upper bits of all YMM registers
(set to 0) via VZEROUPPER, VZEROALL, before transitioning between AVX
instructions and legacy SSE instructions. Note: clearing the upper state via
sequences of XORPS or loading 0 values individually may be useful for breaking
dependency, but will not avoid state transition penalties.

podspi · Jun 11, 2011

Can we drop this? We're getting nowhere...

ANYWAY:
I think it is going to be very interesting looking at the performance of the FX-6xxx series vs the 980X. I'm the FX-6xxx manages to get within 10% of the 980X. If it manages to do that @ 95W then I would say AMD will have a winner

Arachnotronic · Jun 11, 2011

podspi said:
Can we drop this? We're getting nowhere...

ANYWAY:
I think it is going to be very interesting looking at the performance of the FX-6xxx series vs the 980X. I'm the FX-6xxx manages to get within 10% of the 980X. If it manages to do that @ 95W then I would say AMD will have a winner

The FX-6xxx will not get within 10% of the 980x...are you on crack?

Kevmanw430 · Jun 11, 2011

^We don't know that yet.

aegisofrime · Jun 11, 2011

Intel17 said:
The FX-6xxx will not get within 10% of the 980x...are you on crack?

I agree. From the get-go, I have always doubted that AMD's FX cores are comparable to Intel's SnB cores. To cram that many into a viable TDP, I think they would have to be weaker than what Intel could cram.

That said, it doesn't mean that AMD couldn't pull off some magic.

Man, Bulldozer is hyped up so badly.

Lepton87 · Jun 11, 2011

Expecting people to read an 800-page technical document just because you can't explain what you mean exactly is preposterous. If you can't explain a concept in simple layman's terms, you probably don't fully grasp it either. As Richard Feynman famously said:
"If you can't explain something to a first year student, then you haven't really understood it"

Cerb · Jun 11, 2011

Nemesis 1 said:
Page 47, starting at 2.7.2, has a lot of charts on what will trigger exceptions. Intel left space in the VEX prefix for Intel processors only. AMD has XOP, which is basically nothing to write home about. XOP can't function in the VEX prefix space without causing a UD exception. It also lists illegal mask and mirror. All of which will tell the OS not to run AVX apps if Intel's space is infringed on in the VEX prefix space.

The vex space isn't the issue. That only Intel can specify instructions is the way it should be. What Intel should not be able to do, and what is not specified in the PDF, AFAICT, is that AMD is prevented from implementing the exact instructions Intel has now specified. AMD not being able to implement instructions that Intel has already laid out would definitely be an antitrust issue.

That AMD must introduce their own new instructions using DREX is fine and dandy, and not an issue at all. The issue is whether they can copy what Intel has set in stone.

podspi · Jun 11, 2011

Intel17 said:
The FX-6xxx will not get within 10% of the 980x...are you on crack?

Why not? Both are on 32nm, and Bulldozer has the advantage of coming out after the 980X, so at the very least AMD's engineers can see how that level of performance was achieved.

Intel certainly has an advantage in its manufacturing process, but with both companies operating on comparable nodes, I would expect AMD to be more competitive than they have been in recent history.

Nemesis 1 · Jun 11, 2011

Cerb said:
The vex space isn't the issue. That only Intel can specify instructions is the way it should be. What Intel should not be able to do, and what is not specified in the PDF, AFAICT, is that AMD is prevented from implementing the exact instructions Intel has now specified. AMD not being able to implement instructions that Intel has already laid out would definitely be an antitrust issue.

That AMD must introduce their own new instructions using DREX is fine and dandy, and not an issue at all. The issue is whether they can copy what Intel has set in stone.

No it wouldn't. Here are 2 different CPUs, one Intel, one AMD. I leave it to you which is which. As I said, that pesky VEX prefix is a lot more than you're willing to admit. Here's the first, then I have to get the second. AMD has everything they need to operate in the AVX space. Intel just has more.

Processors with “CPUID.1H:ECX.AVX =1” implement the full complement of 32 predicates
shown in Table 5-15, software emulation is no longer needed. Compilers and
assemblers may implement the following three-operand pseudo-ops in addition to
the four-operand VCMPSS instruction. See Table 5-17, where the notations of reg1,
reg2, and reg3 represent either XMM registers or YMM registers. Compiler should
treat reserved Imm8 values as illegal syntax. Alternately, intrinsics can map the
pseudo-ops to pre-defined constants to support a simpler intrinsic interface.

Note that processors with “CPUID.1H:ECX.AVX =0” do not implement the “greater-than”,
“greater-than-or-equal”, “not-greater-than”, and “not-greater-than-or-equal”
relation predicates. These comparisons can be made either by using the inverse
relationship (that is, use the “not-less-than-or-equal” to make a “greater-than”
comparison) or by using software emulation. When using software emulation, the
program must swap the operands (copying registers when necessary to protect the
data that will now be in the destination), and then perform the compare using a
different predicate. The predicate to be used for these emulations is listed in the first
8 rows of Table 3-7 (Intel 64 and IA-32 Architectures Software Developer’s Manual
Volume 2A) under the heading Emulation.
Compilers and assemblers may implement the following two-operand pseudo-ops in
addition to the three-operand CMPSS instruction, for processors with
“CPUID.1H:ECX.AVX =0”. See Table 5-16. Compiler should treat reserved Imm8
values as illegal syntax.
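
For readers who do not want to dig through the manual, the two ideas in the excerpt can be sketched in Python. This is a rough model under stated assumptions: `has_avx` only tests bit 28 of an ECX value returned by CPUID leaf 1 (reading the real register is outside this sketch), and the compare helpers are stand-ins invented here for the scalar "less-than" and "not-less-than" predicates.

```python
AVX_BIT = 28  # position of the AVX flag in CPUID.1H:ECX


def has_avx(ecx: int) -> bool:
    """Test the AVX feature flag in an ECX value obtained from CPUID leaf 1."""
    return bool((ecx >> AVX_BIT) & 1)


def cmp_lt(a: float, b: float) -> bool:
    """Stand-in for the 'less-than' predicate (False if either input is NaN)."""
    return a < b


def cmp_nlt(a: float, b: float) -> bool:
    """Stand-in for the 'not-less-than' predicate (True if either input is NaN)."""
    return not (a < b)


def emulated_gt(a: float, b: float) -> bool:
    """'Greater-than' emulated by swapping the operands of 'less-than'."""
    return cmp_lt(b, a)


def emulated_ngt(a: float, b: float) -> bool:
    """'Not-greater-than' emulated by swapping the operands of 'not-less-than'."""
    return cmp_nlt(b, a)
```

The swap is the whole trick: a > b is evaluated as b < a, so software on a processor without the extra predicates only needs the operands exchanged and a different predicate selected.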

Nemesis 1 · Jun 11, 2011

So AMD's Llano is an x86 APU, so Intel should have all rights to that tech by your reasoning.

Cerb · Jun 12, 2011

Nemesis 1 said:
So AMD's Llano is an x86 APU, so Intel should have all rights to that tech by your reasoning.

No, it's not. By my reasoning, AMD should be able to implement the vex prefix, and decode and execute instructions that have had their software-level implementations specified. Nothing in my reasoning has anything to do with known CPUs, but with the legal ability of AMD to implement the decoding of certain series of bytes in future CPUs. It's not a physical technology; it's a packet of bytes.

SB, BD, IB, Llano, Haswell, are all not really important to the issue, as they are not sets of instructions, but physical implementations of CPUs. The DoJ, the FTC, and lawyers are what are important, because the only way that AMD can't support the same instructions, with the vex prefix, is if their agreements with Intel forbid them, or if Intel has gone out of their way to skirt around agreements already made, forbidding AMD by deceptive practices.

Arachnotronic · Jun 12, 2011

I mean, it is a matter of common sense...Intel's got the best manufacturing/process tech, and right now they're on a winning streak with making the right design decisions (based on real performance, not silly clock speeds...*ahem* Netburst *ahem*). So to think that AMD can put 8 cores in the same TDP that Intel's putting 6 cores in and expect each of AMD's cores to be on par with Intel's is...well, fanboy-ish!

Topweasel · Jun 12, 2011

Intel17 said:
I mean, it is a matter of common sense...Intel's got the best manufacturing/process tech, and right now they're on a winning streak with making the right design decisions (based on real performance, not silly clock speeds...*ahem* Netburst *ahem*). So to think that AMD can put 8 cores in the same TDP that Intel's putting 6 cores in and expect each of AMD's cores to be on par with Intel's is...well, fanboy-ish!

Along those same lines of thought, one can say that just because Intel has the most money and the best tech, and has been on a winning streak, doesn't mean they can do no wrong. To believe that Intel is always designing the most efficient and fastest processors they could with all of their money and tech is fanboyish.

Even if BD isn't as competitive as people would like, I wouldn't take AMD's stumbling here as any sign that Intel is working up to its potential and can't be beat. They have been before and there is an opportunity for it to happen again.

CPUarchitect · Jun 12, 2011

Topweasel said:
They have been before and there is an opportunity for it to happen again.

That opportunity has just been blown to bits; Haswell will support gather instructions: Haswell New Instructions (AVX2).

This is major news. It means AVX2 and LRBni (the Larrabee instruction set) are practically equivalent, making the CPU a throughput-oriented device like the GPU, without sacrificing flexibility.

Mark my words: Heterogeneous architectures are history. It used to offer great benefits but now we'll get the best of both worlds in a single chip for even greater power.

Abwx · Jun 12, 2011

Nemesis 1 said:
No it wouldn't. Here are 2 different CPUs.

Processors with CPUID.1H:ECX.AVX =1 implement the full complement of 32 predicates
shown in Table 5-15, software emulation is no longer needed. Compilers and
assemblers may implement the following three-operand pseudo-ops in addition to
the four-operand VCMPSS instruction. See Table 5-17, where the notations of reg1,
reg2, and reg3 represent either XMM registers or YMM registers. Compiler should
treat reserved Imm8 values as illegal syntax. Alternately, intrinsics can map the
pseudo-ops to pre-defined constants to support a simpler intrinsic interface.

Note that processors with CPUID.1H:ECX.AVX =0 do not implement the greater-than,
greater-than-or-equal, not-greater-than, and not-greater-than-or-equal
relation predicates. These comparisons can be made either by using the inverse
relationship (that is, use the not-less-than-or-equal to make a greater-than
comparison) or by using software emulation. When using software emulation, the
program must swap the operands (copying registers when necessary to protect the
data that will now be in the destination), and then perform the compare using a
different predicate. The predicate to be used for these emulations is listed in the first
8 rows of Table 3-7 (Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 2A) under the heading Emulation.
Compilers and assemblers may implement the following two-operand pseudo-ops in
addition to the three-operand CMPSS instruction, for processors with
CPUID.1H:ECX.AVX =0. See Table 5-16. Compiler should treat reserved Imm8
values as illegal syntax.

All this sounds as if the differentiation is made by the compilers,
which would not enable some optimisations based solely on the "GenuineIntel"
checking procedure.
Either you didn't understand Intel's optimisation guide, or
you are gullible enough to believe that such de-optimisations
will be implemented.

Anyway, your reasoning sounds more like that of a shareholder
who bought Intel's stock at its peak in 2001, at about $70,
and is now in search of some myths to boost his hope
of recouping his price if Intel ever crushes the competition
and can then quietly milk the market.

Abwx · Jun 12, 2011

CPUarchitect said:
That opportunity has just been blown to bits; Haswell will support gather instructions: Haswell New Instructions (AVX2).

This is major news. It means AVX2 and LRBni (the Larrabee instruction set) are practically equivalent, making the CPU a throughput-oriented device like the GPU, without sacrificing flexibility.

It means that Intel is pursuing its usual strategy of creating new instructions
that will allow them to keep a lead in computing performance.

Intel's harsh reaction when AMD proposed SSE5 is a good
clue that, should the two firms be on the same level in this matter,
Intel would not retain its lead for more than a few years.

jones377 · Jun 12, 2011

CPUarchitect said:
That opportunity has just been blown to bits; Haswell will support gather instructions: Haswell New Instructions (AVX2).

This is major news. It means AVX2 and LRBni (the Larrabee instruction set) are practically equivalent, making the CPU a throughput-oriented device like the GPU, without sacrificing flexibility.

Mark my words: Heterogeneous architectures are history. It used to offer great benefits but now we'll get the best of both worlds in a single chip for even greater power.

I just hope the AVX FMA fiasco will not be repeated with AVX2. Before, AMD implemented FMA4 as Intel themselves specified in the first draft of AVX. Then Intel changed their minds and changed the spec to FMA3. I hope there won't be a 2nd revision of AVX2 in the future that will make AMD CPUs incompatible again!

Unless AMD got these specs earlier than the public, the clock to implement this starts now. I bet it will take at least 2 years to work these instructions into an existing design. Way longer if they start from scratch. If Intel changes the spec it will throw AMD off their schedule as they would have to go back and redo their designs (again!)

podspi · Jun 12, 2011

Intel17 said:
I mean, it is a matter of common sense...Intel's got the best manufacturing/process tech, and right now they're on a winning streak with making the right design decisions (based on real performance, not silly clock speeds...*ahem* Netburst *ahem*). So to think that AMD can put 8 cores in the same TDP that Intel's putting 6 cores in and expect each of AMD's cores to be on par with Intel's is...well, fanboy-ish!

I understand where you are coming from. It has been a long time since AMD has released a superior architecture, and been even longer since AMD was competitive from a manufacturing standpoint. Starting this month (when Llano drops) both companies will be on comparable nodes.

Assuming that the above is true, there is no a priori reason to believe Intel's 32nm design will be superior to AMD's. Sure, Intel has held the performance crown for quite a while, but as I've said before in a few other threads, building a microarch is NOT a race. You don't have to "catch up" because the entire thing is designed, not run.

And, so you don't think everything I am saying is irreparably tainted with green-tinted glasses, this is the same reasoning I cite when I say that AMD will likely be in a lot of trouble in terms of IGP performance vs. Intel. There is no reason to believe that Intel's engineers can't design a GPU just as good as AMD's, and with a higher transistor budget (due to 22nm next year) AMD better be ready for Intel to do exactly that. The only thing stopping this from actually happening soon is Intel's desire for margins, and the fact that I believe they are too afraid of compromising CPU performance to dedicate too many transistors to the GPU... yet.

CPUarchitect said:
That opportunity has just been blown to bits; Haswell will support gather instructions: Haswell New Instructions (AVX2).

This is major news. It means AVX2 and LRBni (the Larrabee instruction set) are practically equivalent, making the CPU a throughput-oriented device like the GPU, without sacrificing flexibility.

Mark my words: Heterogeneous architectures are history. It used to offer great benefits but now we'll get the best of both worlds in a single chip for even greater power.

I wouldn't be so sure. Heterogeneous computing is available today, while Haswell will at the earliest be available in a year and a half.

Heterogeneous computing is so exciting, imho, because it really opens the possibilities of doing some quite advanced operations on very lightweight hardware. We're talking actually doing transcoding on your netbook WHILE browsing the web, or simple photo or video editing on the go.

Along with the fact that the ARM ecosystem is adopting OpenCL, and Intel also supporting OpenCL, assuming AVX2 isn't significantly faster than OpenCL, I think it is safe to say we'll be stuck with it for a while.

Heck, we've already seen adoption for OpenCL blow past AVX. Not that adoption for either technology is particularly impressive atm. But I couldn't find (via simple google search) anything that didn't come from www.intel.com that supports AVX. OpenCL at least has PowerPoint

Nemesis 1 · Jun 12, 2011

Well, remember what I was saying about Haswell not being x86. It's really looking as if I and others got it right. Check this out.

This is why AMD can't use the VEX prefix. It's called AVX2.

http://software.intel.com/file/36945

PCMPGTB/PCMPGTW/PCMPGTD/PCMPGTQ - Compare Packed Integers
This is a small section on packed integers. These also use the VEX prefix. There is a lot more. This is exciting; 13 years I waited. Zinn2b called this way back when, right here in these forums.

Nemesis 1 · Jun 12, 2011

jones377 said:
I just hope the AVX FMA fiasco will not be repeated with AVX2. Before, AMD implemented FMA4 as Intel themselves specified in the first draft of AVX. Then Intel changed their minds and changed the spec to FMA3. I hope there won't be a 2nd revision of AVX2 in the future that will make AMD CPUs incompatible again!

Unless AMD got these specs earlier than the public, the clock to implement this starts now. I bet it will take at least 2 years to work these instructions into an existing design. Way longer if they start from scratch. If Intel changes the spec it will throw AMD off their schedule as they would have to go back and redo their designs (again!)

Better read the AVX2 PDF. There won't be an x86 Haswell, and this is why AMD can't use the VEX prefix: it's for a new Intel microarch that won't be carrying the hefty x86 decode, which means it's not x86. It's likely a VLIW CPU (EPIC), ITANIC.
