Thoughts on "8 Core" Bulldozer and "4 Core Sandy Bridge"


Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Sorry. I don't understand. There is no mention of the phrase 'p-slice' in there. Also, what is being described above bears absolutely no resemblance to the 'p-slice' described in the mitosis papers.

Define p-slice in Mitosis, then read the last section of post 320. That's a p-slice, but keep in mind the software is doing the work, or should I say the Mitosis compiler. In the code that's written you will see the REX prefix code in that post 320 reply; that section explains it. You have to remember the work has to be done in the compiler, or it's not a software solution.


The encoding is as follows:

             First byte        Second byte       Third byte
             7 6 5 4 3 2 1 0   7 6 5 4 3 2 1 0   7 6 5 4 3 2 1 0
3-byte VEX   1 1 0 0 0 1 0 0   R̅ X̅ B̅ m m m m m   W v v v v L p p
2-byte VEX   1 1 0 0 0 1 0 1   R̅ v v v v L p p
In the encoding above you can see the REX prefix encoding. Now at the end of the VEX prefix note the pp: Intel's pp is 0/1, AMD's pp is 0/0. If AMD tried to run this code it would up, as described in the 800-page PDF. But I did give those page numbers for flags in an earlier post. I gave all the info. How I choose to report the info is up to me. Because life is choice and we all have the right to make our own choices, then we have to live with them.
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
Unfortunately one of my posts was removed that really ties it together. I did however copy that page, as I did all these pages, to CD. When wife gets home I will ask her what folder she put it in, as it's not in the index yet on the research computer.

Hey! you just made me spit my coffee through my nose
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Hey! you just made me spit my coffee through my nose

Ya, I know I could just transfer the files, but I won't take any chances. Some years ago I came up with the idea of using diamond dust in a thermal compound for PC use. My computer was hacked and all the information and formulas were stolen. You're still on my ignore list, but when I'm not signed in, there you are spouting coffee through your nostrils.
 

Outrage

Senior member
Oct 9, 1999
217
1
0
you sure you didn't mix in some angel dust and you are breathing in the fumes whenever your PC heats up?
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
Now at the end of the VEX prefix note the pp: Intel's pp is 0/1, AMD's pp is 0/0

no, it isn't what you think it is, for a simple explanation see in the latest AVX Reference Manual (Ref. # 319433-011), bottom of page 4-5

"
pp: opcode extension providing equivalent functionality of a SIMD prefix
00: None
01: 66
10: F3
11: F2
"

VEX encodes in 2 bits what took one full byte with SSEx; these 2-bit values have the same meaning as their 0x66, 0xF3 and 0xF2 counterparts in legacy SSEx code, and obviously the same meaning for Intel, AMD, VIA, or any other x86 CPU
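The pp table quoted above can be written out as a tiny lookup. This is only an illustrative sketch; the names VEX_PP_TO_SIMD_PREFIX and implied_simd_prefix are made up for the example and do not come from the manual.

```python
# Mapping of the 2-bit VEX.pp field to the legacy SIMD prefix byte it stands for,
# taken from the table quoted above. None means "no prefix".
VEX_PP_TO_SIMD_PREFIX = {0b00: None, 0b01: 0x66, 0b10: 0xF3, 0b11: 0xF2}


def implied_simd_prefix(last_vex_byte: int):
    """Return the legacy prefix implied by the low two bits of the last VEX byte."""
    return VEX_PP_TO_SIMD_PREFIX[last_vex_byte & 0b11]


if __name__ == "__main__":
    # Hand-picked byte values that differ only in their pp bits.
    for byte in (0xFC, 0xFD, 0xFE, 0xFF):
        prefix = implied_simd_prefix(byte)
        label = "none" if prefix is None else format(prefix, "02X")
        print(format(byte, "02X"), "-> pp", format(byte & 0b11, "02b"), "->", label)
```

The lookup depends only on the bits in the instruction stream, not on which vendor's CPU decodes them.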
 

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
pp: opcode extension providing equivalent functionality of a SIMD prefix
00: None
01: 66
10: F3
11: F2

"
The funny thing is he had quoted that very section earlier in the thread.

He is quoting stuff that directly contradicts what he is saying!
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Is this the post you refer to?

Flexible and more compact bit fields are provided in the VEX prefix to retain the
full functionality provided by REX prefix. REX.W, REX.X, REX.B functionalities are
provided in the three-byte VEX prefix only because only a subset of SIMD instructions
need them.
I hope this helps the people with reading comprehension problems. If not, oh well!


This defines the VEX prefix. This is a p-slice. There is still one more example that's in simpler terms, but that's for later.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
no, it isn't what you think it is, for a simple explanation see in the latest AVX Reference Manual (Ref. # 319433-011), bottom of page 4-5

"
pp: opcode extension providing equivalent functionality of a SIMD prefix
00: None
01: 66
10: F3
11: F2
"

VEX encodes in 2 bits what took one full byte with SSEx; these 2-bit values have the same meaning as their 0x66, 0xF3 and 0xF2 counterparts in legacy SSEx code, and obviously the same meaning for Intel, AMD, VIA, or any other x86 CPU

Actually, if you look at the compatibility tables you will see that I understand perfectly well, thank you. If a VEX prefix is used and ends in a pp, Intel 0/1 and AMD 0/0, you will see it end as an up.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
This is from mitosis PDF .

Identifying Live-ins
Identifying the live-ins of a speculative thread requires a top-down
traversal of its control-flow graph starting at the CQIP to
identify register and memory values read before being written by
the speculative thread. Each path is explored until a certain length.
This length represents the time that previous threads take to
compute and commit these values. This is because once the
previous thread commits, the speculative thread need no longer
rely on predicted values, but can read committed values. This time
is estimated as the time it takes to sequentially execute all the
code between the SP and CQIP minus the thread spawn overhead

Two things I want you to retain here are "greater than or less than" and code length.

------------------------------------------------------------------------------------


Flexible and more compact bit fields are provided in the VEX prefix to retain the
full functionality provided by REX prefix. REX.W, REX.X, REX.B functionalities are
provided in the three-byte VEX prefix only because only a subset of SIMD instructions
need them.
I hope this helps the people with reading comprehension problems. If not, oh well!


This defines the VEX prefix. This is a p-slice. There is still one more example that's in simpler terms, but that's for later.
_________________________________________________________________________________________________________________________________________________

This is from Mitosis PDF

value prediction. The
synchronization approach imposes a high overhead when
dependences are frequent, as in the workload presented here.
Value prediction has more potential – if the values that are
computed by one thread and consumed by another can be
predicted, the consumer thread can be executed in parallel with
the producer thread since these values are only needed for
validation at a later stage. It is typically assumed that these value
predictions are computed in hardware. The Mitosis system
presents a novel approach, which adds code (derived from the
original program) to predict in software the live-ins (values
consumed, but not produced by, the thread) for each speculative
thread. Because mechanisms for recovery of incorrect threads are
already in place, the code to produce the values need not always
be correct, and can be highly optimized. We refer to this code as
pre-computation slices (p-slices). The main advantages of p-slices
are: (1) they are potentially more accurate in the prediction of
live-ins than a hardware-based predictor, since it is derived from
the original code, (2) they can encapsulate multiple control flows
that contribute to the prediction of live-ins, and (3) they can
accelerate the detection of incorrectly spawned threads.

Unfortunately one of my posts was removed that really ties it together. I did however copy that page, as I did all these pages, to CD. When wife gets home I will ask her what folder she put it in, as it's not in the index yet on the research computer.

________________________________________________________________________________________________________________________________________

Other than the missing post, this is the last post in this reply. Note it shows the REX prefix code inside the VEX prefix. That is a p-slice when used with the compiler.
3-byte VEX prefix:
The four bits R,X,B,W contained in the REX prefix used in the x86-64 instruction set extension.
Two bits named pp to replace operand size prefixes and operand type prefixes (66, F2, F3).
A bit named L specifying 256 bit vector length.
Four bits named vvvv specifying a second source register operand.
Five bits named m-mmmm. Two of the m bits are used for replacing existing escape codes and for specifying the length of the instruction. The remaining three m bits are reserved for future use, such as specifying vector lengths > 256 bits, specifying different instruction lengths, or extending the opcode space.
The 2-bytes VEX prefix contains a subset of these components and can be used in cases where not all components are needed.

The encoding is as follows:

             First byte        Second byte       Third byte
             7 6 5 4 3 2 1 0   7 6 5 4 3 2 1 0   7 6 5 4 3 2 1 0
3-byte VEX   1 1 0 0 0 1 0 0   R̅ X̅ B̅ m m m m m   W v v v v L p p
2-byte VEX   1 1 0 0 0 1 0 1   R̅ v v v v L p p

The R̅, X̅ and B̅ bits are equivalent to the REX prefix's R, X and B bits, providing a fourth register number bit for each of the three registers referenced by a standard x86 instruction: the register operand, and the index and base registers for the memory operand. The v̅ bits specify an additional source register, or are set to all-ones if not used. All of these bits are complemented in the instruction stream, so they are encoded as 1 bits in 32-bit mode.

The VEX opcode bytes are the same as that used by the LDS and LES instructions. These instructions are not supported in 64-bit mode, while in 32-bit mode, the following "mod R/M" byte can not be of the form "11xxxxxx" (which would specify a register operand). The bit inversion ensures that the second byte of a VEX prefix is always of this form in 32-bit mode.

The W bit is equivalent to the REX prefix's W bit, and specifies a 64-bit operand. For non-integer instructions, it is a general opcode extension bit.

The 5 m bits replace leading opcode bytes. The values 1, 2 and 3 are equivalent to opcodes 0F, 0F 38 and 0F 3A; all other values are currently reserved. (The 2-byte VEX prefix always corresponds to a 0F prefix.)

The L bit indicates the vector length. It is 0 for 128-bit SSE (xmm) registers, and 1 for 256-bit AVX (ymm) registers.

The p bits encode additional prefix bytes. The 4 possible values are none, 66, F3, and F2. These encode the operand type for SSE instructions: packed single, packed double, scalar single and scalar double, respectively.

Instructions that need more than three operands have an extra suffix byte specifying one or two additional register operands. Instructions coded with the VEX prefix can have up to five operands. At most one of the operands can be a memory operand; and at most one of the operands can be an immediate constant of 4 or 8 bits. The remaining operands are registers.

The AVX instruction set is the first instruction set extension to use the VEX coding scheme. The AVX instructions have up to four operands. The AVX instruction set allows the VEX prefix to be applied only to instructions using the SIMD XMM registers. However, the VEX coding scheme has space for applying the VEX prefix to other instructions as well in future instruction sets.

Legacy instructions with a VEX prefix added are equivalent to the same instructions without VEX prefix with the following differences:

The VEX-encoded instruction can have one more operand, making it non-destructive.
A 128-bit XMM instruction without VEX prefix leaves the upper half of the full 256-bit YMM register unchanged, while the VEX-encoded version sets the upper half to zero.
Instructions that use the whole 256-bit YMM register should not be mixed with non-VEX instructions that leave the upper half of the register unchanged, for reasons of efficiency
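
The field layout described in the quoted text can be made concrete with a small decoder for the 3-byte form. This is a rough sketch based only on the description above; decode_vex3 and the dictionary keys are hypothetical names, and the sample bytes C4 E2 7D are hand-picked for illustration.

```python
# Leading opcode bytes selected by the 5-bit m-mmmm field (other values are reserved).
OPCODE_MAPS = {1: "0F", 2: "0F 38", 3: "0F 3A"}
# Legacy SIMD prefix selected by the 2-bit pp field.
SIMD_PREFIXES = {0: None, 1: "66", 2: "F3", 3: "F2"}


def decode_vex3(prefix: bytes) -> dict:
    """Split a 3-byte VEX prefix (C4 xx xx) into its named fields."""
    if len(prefix) != 3 or prefix[0] != 0xC4:
        raise ValueError("expected a 3-byte VEX prefix starting with C4")
    b1, b2 = prefix[1], prefix[2]
    return {
        # R, X, B and vvvv are stored complemented, so flip them back here.
        "R": ((b1 >> 7) & 1) ^ 1,
        "X": ((b1 >> 6) & 1) ^ 1,
        "B": ((b1 >> 5) & 1) ^ 1,
        "map": OPCODE_MAPS.get(b1 & 0x1F, "reserved"),
        "W": (b2 >> 7) & 1,
        "vvvv": (~(b2 >> 3)) & 0xF,
        "vector_bits": 256 if (b2 >> 2) & 1 else 128,
        "simd_prefix": SIMD_PREFIXES[b2 & 0b11],
    }


if __name__ == "__main__":
    print(decode_vex3(bytes([0xC4, 0xE2, 0x7D])))
```

Every field here is read straight out of the prefix bytes, so any assembler or compiler that emits these bytes produces the same fields.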

Ah, ok, I am beginning to see it now. You've spent a great deal of time contemplating all of this, haven't you? :D

The timeline fits too.

So you are thinking this is all going to come together for Haswell? Or are you thinking it is the tock past Haswell? (saltmire or something like that?)
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
which compatibility tables are you referring to?

I'm not going to go look in the PDF. I did make an error, however. The code example and the whole copy-paste there was from Wiki. Poor choice on my part. I was wondering why it used a different print format as it appears on your screen. My bad. But the point is the code example is correct: the encoding is there for the REX values.

This is more to the point and is in the first few pages of the 800-page PDF:

1.3.3 VEX Prefix Instruction Encoding Support
Intel AVX introduces a new prefix, referred to as VEX, in the Intel 64 and IA-32
instruction encoding format. Instruction encoding using the VEX prefix provides the
following capabilities:
• Direct encoding of a register operand within VEX. This provides instruction syntax
support for non-destructive source operand.
• Efficient encoding of instruction syntax operating on 128-bit and 256-bit register
sets.
• Compaction of REX prefix functionality: The equivalent functionality of the REX
prefix is encoded within VEX.
• Compaction of SIMD prefix functionality and escape byte encoding: The functionality
of SIMD prefix (66H, F2H, F3H) on opcode is equivalent to an opcode
extension field to introduce new processing primitives. This functionality is
replaced by a more compact representation of opcode extension within the VEX
prefix. Similarly, the functionality of the escape opcode byte (0FH) and two-byte
escape (0F38H, 0F3AH) are also compacted within the VEX prefix encoding.
• Most VEX-encoded SIMD numeric and data processing instruction semantics with
memory operand have relaxed memory alignment requirements than instructions
encoded using SIMD prefixes (see Section 2.5).
VEX prefix encoding applies to SIMD instructions operating on YMM registers, XMM
registers, and in some cases with a general-purpose register as one of the operand.
VEX prefix is not supported for instructions operating on MMX or x87 registers.
Details of VEX prefix and instruction encoding are discussed in Chapter 4.

This is in fact the encoding for the Mitosis compiler, and it works exactly as the Mitosis PDF suggests.
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
This is in fact the encoding for the Mitosis compiler, and it works exactly as the Mitosis PDF suggests.

Any compiler can use this encoding; that's the whole point of any ISA. Though, as many people have tried to explain to you before, there is nothing especially linked to Mitosis there.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Ah, ok, I am beginning to see it now. You've spent a great deal of time contemplating all of this, haven't you? :D

The timeline fits too.

So you are thinking this is all going to come together for Haswell? Or are you thinking it is the tock past Haswell? (saltmire or something like that?)

I am hoping Haswell is where it all comes together, but I thought it would be sooner, as you will see from a link I will give you to XS from before I got banned. Speculating is hard when it's so far in the future.
I really thought Gesher would be the one. Nehalem C really got my thinking off track, but it ended up as a step away from x86. Gesher was changed to SB, I think. But as I see it, SB/IB are exactly what their names suggest: a bridge. I will get that link, but as you will see I made a lot of mistakes, especially on core counts for AMD/Intel. But that was the info available at the time. This link is from 2006. If you go back and look at Zinn2b's post you will have an idea how long I have been thinking on this. I will post this now, get the link and edit. Oh, just so ya know, I got banned for speculating. But as is normal I was 100% correct. I see the guys at XS as good overclockers in so far as world records on unstable machines, but when it comes to performance at a given frequency I view them as children.

http://www.xtremesystems.org/forums/showthread.php?120113-Vector-processing-on-nehelam
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Any compiler can use this encoding; that's the whole point of any ISA. Though, as many people have tried to explain to you before, there is nothing especially linked to Mitosis there.

No they can't. That's all I have to say. Without the software support in the compiler, not any compiler can do this, PERIOD.
 

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
This is in fact the encoding for the Mitosis compiler, and it works exactly as the Mitosis PDF suggests.

Could you explain this, please? I really don't get what you are trying to say.

Nothing you have said so far makes VEX prefix work anything like mitosis. The VEX prefix is just the way that AVX2 instructions are written.

Absolutely nothing in the AVX2 instruction set has anything to do with pre-computation (p-slice) or speculative execution. In particular, the REX prefix also has nothing to do with this.

If you read the Mitosis PDF, a p-slice is a program segment generated by the compiler that is used to guesstimate some results, so that the additional CPU cores have something to work on while the main program runs on the first core.
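
To make that description concrete, here is a toy, single-threaded simulation of the idea: a cheap p-slice guesses the live-in, the "speculative thread" runs ahead on the guess, and the result is squashed and redone when the guess turns out wrong. Everything here (full_step, p_slice, body, the arithmetic) is invented for illustration and is not taken from the Mitosis paper or any real compiler.

```python
def full_step(x: int) -> int:
    """Stand-in for the original code that produces the next live-in value."""
    # Hypothetical workload: the common path adds 3, a rare path adds 5.
    return x + (5 if x % 7 == 0 else 3)


def p_slice(x: int) -> int:
    """Trimmed copy of full_step that keeps only the common path (may be wrong)."""
    return x + 3


def body(x: int) -> int:
    """Work the speculative thread performs with its live-in."""
    return x * x


def run_sequential(x0: int, steps: int) -> list:
    """Reference execution with no speculation."""
    results, x = [], x0
    for _ in range(steps):
        x = full_step(x)
        results.append(body(x))
    return results


def run_speculative(x0: int, steps: int):
    """Simulate speculation: predict, run ahead, validate, squash on mismatch."""
    results, squashes, x = [], 0, x0
    for _ in range(steps):
        predicted = p_slice(x)            # cheap guess of the live-in
        spec_result = body(predicted)     # speculative work on the guess
        actual = full_step(x)             # the producer eventually commits
        if predicted != actual:           # validation failed
            squashes += 1
            spec_result = body(actual)    # squash and re-execute
        results.append(spec_result)
        x = actual
    return results, squashes


if __name__ == "__main__":
    results, squashes = run_speculative(1, 10)
    print(results == run_sequential(1, 10), "squashes:", squashes)
```

Note that nothing in this scheme depends on how individual instructions are encoded; it is about which code the compiler generates and when threads are spawned.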
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
No they can't

where did you get this strange idea? how do you think gcc, VC++, icc, etc. are generating AVX code without using this very encoding?

FYI here is a short example of AVX code with the ASM and machine code, you can see for example the typical 0xC5 leading byte for the 2-byte VEX encoding

00401  c5 fc 59 44 24 20           vmulps  ymm0, ymm0, YMMWORD PTR [32+esp]
00407  c5 c4 58 e4                 vaddps  ymm4, ymm7, ymm4
0040b  c5 fc 58 c9                 vaddps  ymm1, ymm0, ymm1
0040f  c5 e4 59 bc 24 c0 01 00 00  vmulps  ymm7, ymm3, YMMWORD PTR [448+esp]
00418  c5 e4 59 84 24 60 01 00 00  vmulps  ymm0, ymm3, YMMWORD PTR [352+esp]
00421  c5 e4 59 9c 24 20 01 00 00  vmulps  ymm3, ymm3, YMMWORD PTR [288+esp]
0042a  c5 c4 58 fa                 vaddps  ymm7, ymm7, ymm2
0042e  c5 fc 58 d4                 vaddps  ymm2, ymm0, ymm4
00432  c5 e4 58 c9                 vaddps  ymm1, ymm3, ymm1
00436  c5 fc 11 3c 97              vmovups YMMWORD PTR [edi+edx*4], ymm7
0043b  c5 fc 11 14 93              vmovups YMMWORD PTR [ebx+edx*4], ymm2
00440  c5 fc 11 0c 90              vmovups YMMWORD PTR [eax+edx*4], ymm1
00445  83 c2 08                    add     edx, 8
00448  3b 94 24 e8 01 00 00        cmp     edx, DWORD PTR [488+esp]
0044f  0f 82 19 ff ff ff           jb      .B22.5 ; Prob 82%
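
As a cross-check, the second byte of the 2-byte VEX prefixes in this listing (fc, c4, e4) can be unpacked by hand. The sketch below assumes the 2-byte layout R̅ v v v v L p p with R̅ and vvvv stored inverted; decode_vex2 is a made-up helper name.

```python
def decode_vex2(second_byte: int) -> dict:
    """Unpack the byte that follows 0xC5 in a 2-byte VEX prefix."""
    return {
        # R and vvvv are stored complemented in the instruction stream.
        "R": ((second_byte >> 7) & 1) ^ 1,
        "vvvv": (~(second_byte >> 3)) & 0xF,
        "vector_bits": 256 if second_byte & 0b100 else 128,
        "pp": second_byte & 0b11,
    }


if __name__ == "__main__":
    # Bytes taken from the listing, with the first source register shown there.
    for byte, first_source in ((0xFC, "ymm0"), (0xC4, "ymm7"), (0xE4, "ymm3")):
        fields = decode_vex2(byte)
        print(format(byte, "02x"), fields, "listing says", first_source)
```

In each case vvvv comes out as the number of the first source register in the disassembly (0, 7 and 3), L selects 256-bit operation, and pp is 00, meaning no implied SIMD prefix, which matches the packed-single (ps) instructions shown.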
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Any compiler can use this encoding; that's the whole point of any ISA. Though, as many people have tried to explain to you before, there is nothing especially linked to Mitosis there.

Mitosis is nothing more than a name given to a research project. When used on a real-world production CPU, Intel can call it whatever they choose. Example: IDC and I have gone around on FinFET. But Intel has no FinFET; Intel does, however, have 3D Tri-gate. Many research products that come to market come with a new name attached.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
where did you get this strange idea? how do you think gcc, VC++, icc, etc. are generating AVX code without using this very encoding?

FYI here is a short example of AVX code with the ASM and machine code, you can see for example the typical 0xC5 leading byte for the 2-byte VEX encoding

00401  c5 fc 59 44 24 20           vmulps  ymm0, ymm0, YMMWORD PTR [32+esp]
00407  c5 c4 58 e4                 vaddps  ymm4, ymm7, ymm4
0040b  c5 fc 58 c9                 vaddps  ymm1, ymm0, ymm1
0040f  c5 e4 59 bc 24 c0 01 00 00  vmulps  ymm7, ymm3, YMMWORD PTR [448+esp]
00418  c5 e4 59 84 24 60 01 00 00  vmulps  ymm0, ymm3, YMMWORD PTR [352+esp]
00421  c5 e4 59 9c 24 20 01 00 00  vmulps  ymm3, ymm3, YMMWORD PTR [288+esp]
0042a  c5 c4 58 fa                 vaddps  ymm7, ymm7, ymm2
0042e  c5 fc 58 d4                 vaddps  ymm2, ymm0, ymm4
00432  c5 e4 58 c9                 vaddps  ymm1, ymm3, ymm1
00436  c5 fc 11 3c 97              vmovups YMMWORD PTR [edi+edx*4], ymm7
0043b  c5 fc 11 14 93              vmovups YMMWORD PTR [ebx+edx*4], ymm2
00440  c5 fc 11 0c 90              vmovups YMMWORD PTR [eax+edx*4], ymm1
00445  83 c2 08                    add     edx, 8
00448  3b 94 24 e8 01 00 00        cmp     edx, DWORD PTR [488+esp]
0044f  0f 82 19 ff ff ff           jb      .B22.5 ; Prob 82%

So what's this prove? Nothing. IDC is beginning to see, but he's not convinced. But IDC will now look harder, because he understands that the REX instruction code is encoded into the VEX prefix. AMD has XOP and they do not have encodings in XOP for REX values. That's a fact. Or is AMD lying to us all, as is usual? AMD can't use the VEX prefix without the compiler software supporting it. It's useless and will cause UP; the OS will not run the VEX prefix without the correct code written correctly. As proof, AMD hasn't got this. Listen to what JFAMD has to say: Intel has to pad the YMM values. Padding, as he refers to it, is clearing the upper values to 0. But if you read the Mitosis PDF you can clearly see why the upper YMM register is cleared. I posted that part in this thread, so ya would see it in my reply to JFAMD. I will get that post so that there is no misunderstanding.
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
Example: IDC and I have gone around on FinFET. But Intel has no FinFET; Intel does, however, have 3D Tri-gate.

Back in 2002 Intel already announced that they preferred Tri-gate over FinFET
http://www.intel.com/technology/silicon/tri-gate.htm

so there was little left for speculation already 10 years before the first products were brought to market

btw this is a very bad example of names that change, since in this case they kept the original nomenclature, unlike for many other technologies
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
POST 311 .

Originally Posted by JFAMD
They will do that because Sandy Bridge has an issue with handling mixed SSE and AVX instructions. They need to clear out their pipeline between switching instructions, and this takes clock cycles. They recommended at IDF that companies convert all SSE instructions to AVX-128 to avoid performance penalties.





http://news.softpedia.com/news/Intel...n-187568.shtml

Well, if you say so. But this is more likely the case.


Identifying Live-ins
Identifying the live-ins of a speculative thread requires a top-down
traversal of its control-flow graph starting at the CQIP to
identify register and memory values read before being written by
the speculative thread. Each path is explored until a certain length.
This length represents the time that previous threads take to
compute and commit these values. This is because once the
previous thread commits, the speculative thread need no longer
rely on predicted values, but can read committed values. This time
is estimated as the time it takes to sequentially execute all the
code between the SP and CQIP minus the thread spawn overhead

As you can clearly see, JF implies Intel has to do this because they haven't figured out AMD's AVX instructions, so they don't have the same functionality as AMD's real-deal AVX. This is dishonest in a manner. Intel invented AVX for Intel CPUs; the very fact that JFAMD says Intel has to do something different than AMD should tell you something. JF is referring to clearing the YMM to all zeros, then he implies this takes more clock cycles. If you read the AVX PDF you will see this is not a fact. It's a necessary action for Mitosis to work.

If you read the AVX, the length is a big deal. One of these two processors can do greater than or less than; the other processor can't, and that will cause an up. You tell me which is which: SB or BD.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
I am going to post that missing post again, but since I have everything on file from this topic, I'm going to show it to some people first.