Thoughts on "8 Core" Bulldozer and "4 Core Sandy Bridge"

Nemesis 1 · Jun 13, 2011

Idontcare said:
Do you mean Anaphase?

I thought Mitosis was abandoned as a dead-end.

No, I don't mean Anaphase. Had you not already used that, I wouldn't even have brought Mitosis up. No, it didn't die. Fact is, Intel has brought it to market. The description of the VEX prefix is the same as that of Mitosis. It just has a different name now. But both use p-slices and increase performance a lot.

Nemesis 1 · Jun 13, 2011

podspi said:
In the end performance is all that matters. If Haswell really brings a speedup of 2x compared to SB/IB, you will have a believer out of me :awe:

We won't have to wait for Haswell. The wait will be for AMD's BD using AVX and Intel using AVX with the VEX prefix to see this performance jump. Intel's chips will run old SSE apps recompiled with the VEX prefix a lot faster than AMD's. When you run a program that has been recompiled to use the VEX prefix, the performance difference will be huge in most cases and not so much in others, but the average performance increase will be about 2.2. Most SSE2 apps will use the VEX prefix, and the recompile is automatic when the VEX prefix is applied.

JFAMD · Jun 14, 2011

bronxzv said:
Why will they do that since Sandy Bridge has good support for legacy SSE code and Bulldozer will have probably even better support ? Why will they bother to maintain one more code path since they must continue to support legacy CPUs?

They will do that because Sandy Bridge has an issue with handling mixed SSE and AVX instructions. They need to clear out their pipeline when switching between the two, and this takes clock cycles. They recommended at IDF that companies convert all SSE instructions to AVX-128 to avoid performance penalties.

Nemesis 1 said:
LINK

http://news.softpedia.com/news/Inte...ge-EP-CPUs-Will-Arrive-in-Autumn-187568.shtml

ed29a · Jun 14, 2011

bronxzv said:
I have been warned "2. an unconquerable opponent or rival.", you're really good at this game

Idiots tend to drag you down to their level and beat you with experience. I wouldn't argue with them. The chump can't even compose a coherent sentence.

No insulting other members here. This is your only warning before infractions.
Markfw900
Anandtech Moderator

Idontcare · Jun 14, 2011

Nemesis 1 said:
No, I don't mean Anaphase. Had you not already used that, I wouldn't even have brought Mitosis up. No, it didn't die. Fact is, Intel has brought it to market. The description of the VEX prefix is the same as that of Mitosis. It just has a different name now. But both use p-slices and increase performance a lot.

Are you sure you are talking about the same Mitosis that I am?

http://www.anandtech.com/show/1766/7

Nemesis 1 · Jun 14, 2011

IDC, I will refer you to post 273.
If you google Intel Mitosis, in the top five links you will see a PDF; it's 11 pages. I can't link.

Then read post 294. The best thing to do is read the section I pulled that from in the Intel 800-page PDF. You will see clearly this is Mitosis. Post 294 describes p-slices in the VEX prefix. If you read the 800-page PDF, then you will have no doubt about it.

Nemesis 1 · Jun 14, 2011

ed29a said:
Idiots tend to drag you down to their level and beat you with experience. I wouldn't argue with them. The chump can't even compose a coherent sentence.

I see forum rules must not apply to you. You could say the same thing, without breaking forum rules, if your vocabulary was better.

bronxzv · Jun 14, 2011

ed29a said:
I wouldn't argue with them.

Me neither, usually. In this case it's somehow fascinating, since it looks as if the poster is genuinely convinced he has found something very important. His intent apparently isn't to spread FUD but to make everybody aware of the discovery he made, but which discovery? I have no idea...

Mark R · Jun 14, 2011

I have no idea what relevance Intel's Mitosis technology and p-slices have to AVX; they are totally different and not even remotely related.

Mitosis is a compiler technology designed to maximize performance on explicitly parallel instruction computing (EPIC) processors, such as the Itanium series.

VEX is nothing but the way in which AVX instructions are encoded.

Am I missing something?

jones377 · Jun 14, 2011

3.......2.......1.......

Nemesis 1 · Jun 14, 2011

JFAMD said:
They will do that because Sandy Bridge has an issue with handling mixed SSE and AVX instructions. They need to clear out their pipeline when switching between the two, and this takes clock cycles. They recommended at IDF that companies convert all SSE instructions to AVX-128 to avoid performance penalties.

http://news.softpedia.com/news/Inte...ge-EP-CPUs-Will-Arrive-in-Autumn-187568.shtml

Well, if you say so. But this is more likely the case.

Identifying Live-ins
Identifying the live-ins of a speculative thread requires a topdown
traversal of its control-flow graph starting at the CQIP to
identify register and memory values read before being written by
the speculative thread. Each path is explored until a certain length.
This length represents the time that previous threads take to
compute and commit these values. This is because once the
previous thread commits, the speculative thread need no longer
rely on predicted values, but can read committed values. This time
is estimated as the time it takes to sequentially execute all the
code between the SP and CQIP minus the thread spawn overhead
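
As a rough illustration of the live-in identification described in the excerpt above, here is a minimal Python sketch. The toy CFG format, the name `find_live_ins`, and the use of an instruction count as the path-length budget are all assumptions made for illustration; the excerpt estimates that length from the execution time between SP and CQIP, which is not modeled here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy CFG: block name -> (instructions, successor block names).
# Each instruction is a pair (values read, values written).
Instr = Tuple[Set[str], Set[str]]
CFG = Dict[str, Tuple[List[Instr], List[str]]]


def find_live_ins(cfg: CFG, cqip: str, max_len: int) -> Set[str]:
    """Top-down traversal from the CQIP block, collecting every value that
    is read before being written on some path of at most max_len steps."""
    live_ins: Set[str] = set()

    def walk(block: str, written: Set[str], remaining: int) -> None:
        if remaining <= 0:
            return
        instrs, succs = cfg[block]
        written = set(written)          # each path keeps its own write set
        for reads, writes in instrs:
            if remaining <= 0:
                return
            live_ins.update(reads - written)   # read before written
            written |= writes
            remaining -= 1
        if not instrs:
            remaining -= 1              # charge empty blocks so cycles end
        for succ in succs:
            walk(succ, written, remaining)

    walk(cqip, set(), max_len)
    return live_ins


# Small example: block A branches to B and C.
example_cfg: CFG = {
    "A": ([({"x", "y"}, {"z"}), ({"z"}, {"x"})], ["B", "C"]),
    "B": ([({"x", "w"}, {"y"})], []),
    "C": ([({"y", "v"}, set())], []),
}
print(sorted(find_live_ins(example_cfg, "A", 10)))
```

A smaller `max_len` cuts the exploration short, so fewer values are reported as live-ins. Note that nothing in this traversal involves instruction encoding.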

bronxzv · Jun 14, 2011

Mark R said:
Mitosis is a compiler technology designed to maximize performance on explicitly parallel instruction computing (EPIC) processors, such as the Itanium series.

AFAIK the speculative thread techniques presented in the Mitosis paper are particularly effective on an in-order CPU like Itanium, but they can also provide speedups on OoO CPUs, though to a lesser extent. They are useful, for example, to decrease the cache miss rate in cases that can't trigger the hardware prefetchers and that are not easily amenable to explicit software prefetch.

My understanding is that the most effective techniques presented in the Mitosis paper (useful for x86 code on an OoO engine) are available in the auto-parallelizer of the mainstream Intel C++ compiler (with the -parallel flag set). It would be interesting to have a confirmation from someone on the compiler team.

The only required hardware support is multithreading and a shared cache between the main thread and the speculative thread. With hyperthreading the L1D$ is shared so it's probably the best possible target.

podspi · Jun 14, 2011

bronxzv said:
AFAIK the speculative thread techniques presented in the Mitosis paper are particularly effective on an in-order CPU like Itanium, but they can also provide speedups on OoO CPUs, though to a lesser extent. They are useful, for example, to decrease the cache miss rate in cases that can't trigger the hardware prefetchers and that are not easily amenable to explicit software prefetch.

My understanding is that the most effective techniques presented in the Mitosis paper (useful for x86 code on an OoO engine) are available in the auto-parallelizer of the mainstream Intel C++ compiler (with the -parallel flag set). It would be interesting to have a confirmation from someone on the compiler team.

The only required hardware support is multithreading and a shared cache between the main thread and the speculative thread. With hyperthreading the L1D$ is shared so it's probably the best possible target.

Isn't the idea to toss OoO, and use the extra transistors/power for more threads/faster clocks?

bronxzv · Jun 14, 2011

podspi said:
Isn't the idea to toss OoO, and use the extra transistors/power for more threads/faster clocks?

This is the idea behind some designs, but it's unrelated to Mitosis AFAIK. The Mitosis paper mentions SMT as a possible target, and SMT has typically been implemented on OoO engines since the canceled EV8 (POWER6 is the only exception I can think of).

podspi · Jun 14, 2011

bronxzv said:
This is the idea behind some designs, but it's unrelated to Mitosis AFAIK. The Mitosis paper mentions SMT as a possible target, and SMT has typically been implemented on OoO engines since the canceled EV8 (POWER6 is the only exception I can think of).

Don't forget the venerable Intel Atom!

bronxzv · Jun 14, 2011

podspi said:
Don't forget the venerable Intel Atom!

Ooops! indeed, sorry

Idontcare · Jun 14, 2011

Nemesis 1 said:
IDC, I will refer you to post 273.
If you google Intel Mitosis, in the top five links you will see a PDF; it's 11 pages. I can't link.

Then read post 294. The best thing to do is read the section I pulled that from in the Intel 800-page PDF. You will see clearly this is Mitosis. Post 294 describes p-slices in the VEX prefix. If you read the 800-page PDF, then you will have no doubt about it.

OK Nemesis, I followed your rabbit hole and I came up with nothing.

After the merry-go-round of "follow post ###, which references post ###, which quotes post ###", I finally got to the much-venerated 800pg PDF.

Likewise I got the 11pg mitosis document.

Did a search on the 800pg doc...searched for "p-slice" - came up nothing. Searched for "slice" - nothing again. Searched for "mitosis"...nada.

I like the 11pg mitosis doc. But I continue to see absolutely no correlation between it and the vex/avx stuff here.

If your goal is to keep the world's best kept secret all to yourself then you are succeeding.

podspi · Jun 14, 2011

I was under the impression that all of this speculative multithreading didn't work well in practice because it was simpler/more effective to just shut down cores/threads and boost the speed of working threads, instead of trying to use those threads/transistors to predict the working thread.

I, however, am way out of my league in this discussion.

Still, very cool idea. Maybe IBM will implement something like this in their quest for performance at all costs!

Nemesis 1 · Jun 14, 2011

OK. Then read post 164. It's taken from the 800-page PDF.

Here is a description of a p-slice in the VEX prefix. I also have something else that I pulled that says it in more simplified wording, but I won't post that as of yet. I was actually hoping someone else might read the PDF. But I will post it when I feel the right time is at hand.

AVX and FMA instructions are encoded using a more efficient format than previous
instruction extensions in the Intel 64 and IA-32 architecture. The improved encoding
format uses a new prefix referred to as “VEX”. The VEX prefix may be two or three
bytes long, depending on the instruction semantics. Despite the length of the VEX
prefix, the instruction encoding format using VEX addresses two important issues:
(a) there exists inefficiency in instruction encoding due to SIMD prefixes and some
fields of the REX prefix, (b) Both SIMD prefixes and REX prefix increase in instruction
byte-length. This chapter describes the instruction encoding format using VEX.

I have also posted a little more conclusive information on this in this thread. But for now I have posted enough on the subject that it leaves little doubt about it being Mitosis.

In the Mitosis PDF I pulled a small paragraph that addresses length, or L. If you read up on the VEX prefix you would see it more clearly. Sure, I could do this in a more simplistic fashion. But then that would not allow you guys to use your own minds.

But in my next post I will post the small sections I deemed relevant to the debate.

Nemesis 1 · Jun 14, 2011

This is from the Mitosis PDF.

Identifying Live-ins
Identifying the live-ins of a speculative thread requires a topdown
traversal of its control-flow graph starting at the CQIP to
identify register and memory values read before being written by
the speculative thread. Each path is explored until a certain length.
This length represents the time that previous threads take to
compute and commit these values. This is because once the
previous thread commits, the speculative thread need no longer
rely on predicted values, but can read committed values. This time
is estimated as the time it takes to sequentially execute all the
code between the SP and CQIP minus the thread spawn overhead

Two things I want you to retain here are "greater than or less than" and code length.

------------------------------------------------------------------------------------

Flexible and more compact bit fields are provided in the VEX prefix to retain the
full functionality provided by REX prefix. REX.W, REX.X, REX.B functionalities are
provided in the three-byte VEX prefix only because only a subset of SIMD instructions
need them.

I hope this helps the people with reading comprehension problems. If not, oh well!

This defines the VEX prefix. This is a p-slice. There is still one more example that's in simpler terms, but that's for later.
_________________________________________________________________________________________________________________________________________________

This is from Mitosis PDF

value prediction. The
synchronization approach imposes a high overhead when
dependences are frequent, as in the workload presented here.
Value prediction has more potential – if the values that are
computed by one thread and consumed by another can be
predicted, the consumer thread can be executed in parallel with
the producer thread since these values are only needed for
validation at a later stage. It is typically assumed that these value
predictions are computed in hardware. The Mitosis system
presents a novel approach, which adds code (derived from the
original program) to predict in software the live-ins (values
consumed, but not produced by, the thread) for each speculative
thread. Because mechanisms for recovery of incorrect threads are
already in place, the code to produce the values need not always
be correct, and can be highly optimized. We refer to this code as
pre-computation slices (p-slices). The main advantages of p-slices
are: (1) they are potentially more accurate in the prediction of
live-ins than a hardware-based predictor, since it is derived from
the original code, (2) they can encapsulate multiple control flows
that contribute to the prediction of live-ins, and (3) they can
accelerate the detection of incorrectly spawned threads.
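
To make the p-slice idea in the excerpt above concrete, here is a minimal, sequential Python model. Everything named here (`run_with_p_slice`, `producer`, `p_slice`, `consumer`) is hypothetical, and the real system would run the consumer in parallel with the producer on separate hardware threads; this sketch only shows the predict, execute, validate, and squash steps.

```python
from typing import Callable, Dict, Tuple

LiveIns = Dict[str, int]


def run_with_p_slice(
    state: LiveIns,
    producer: Callable[[LiveIns], LiveIns],
    p_slice: Callable[[LiveIns], LiveIns],
    consumer: Callable[[LiveIns], int],
) -> Tuple[int, bool]:
    """Return (result, speculation_succeeded)."""
    # 1. The p-slice is a trimmed copy of the producer's code that
    #    predicts the consumer's live-ins in software.
    predicted = p_slice(state)
    # 2. The consumer (speculative thread) runs on the predicted values.
    speculative_result = consumer(predicted)
    # 3. The producer (previous thread) finishes and commits real values.
    committed = producer(state)
    # 4. Validate: keep the speculative work only if the prediction held.
    if predicted == committed:
        return speculative_result, True
    # 5. Misprediction: squash and re-execute with committed values.
    return consumer(committed), False


def producer(s: LiveIns) -> LiveIns:
    return {"i": s["i"] + 1}


def good_slice(s: LiveIns) -> LiveIns:
    return {"i": s["i"] + 1}


def bad_slice(s: LiveIns) -> LiveIns:
    return {"i": 0}          # allowed to be wrong; recovery handles it


def consumer(live_ins: LiveIns) -> int:
    return live_ins["i"] * 10


print(run_with_p_slice({"i": 3}, producer, good_slice, consumer))
print(run_with_p_slice({"i": 3}, producer, bad_slice, consumer))
```

The final result is the same either way; a wrong p-slice only costs the wasted speculative work, which is why the excerpt says the slice "need not always be correct".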

Unfortunately one of my posts was removed that really ties it together. I did, however, copy that page to CD, as I did all these pages. When my wife gets home I will ask her what folder she put it in, as it's not in the index yet on the research computer.

________________________________________________________________________________________________________________________________________

Other than the missing post, this is the last part of this reply. Note that it shows the REX prefix code inside the VEX prefix. That is a p-slice when used with the compiler.
3-byte VEX prefix:
The four bits R,X,B,W contained in the REX prefix used in the x86-64 instruction set extension.
Two bits named pp to replace operand size prefixes and operand type prefixes (66, F2, F3).
A bit named L specifying 256 bit vector length.
Four bits named vvvv specifying a second source register operand.
Five bits named m-mmmm. Two of the m bits are used for replacing existing escape codes and for specifying the length of the instruction. The remaining three m bits are reserved for future use, such as specifying vector lengths > 256 bits, specifying different instruction lengths, or extending the opcode space.
The 2-bytes VEX prefix contains a subset of these components and can be used in cases where not all components are needed.

The encoding is as follows:

             First byte        Second byte       Third byte
             7 6 5 4 3 2 1 0   7 6 5 4 3 2 1 0   7 6 5 4 3 2 1 0
3-byte VEX   1 1 0 0 0 1 0 0   R̅ X̅ B̅ m m m m m   W v v v v L p p
2-byte VEX   1 1 0 0 0 1 0 1   R̅ v v v v L p p

The R̅, X̅ and B̅ bits are equivalent to the REX prefix's R, X and B bits, providing a fourth register number bit for each of the three registers referenced by a standard x86 instruction: the register operand, and the index and base registers for the memory operand. The v̅ bits specify an additional source register, or are set to all-ones if not used. All of these bits are complemented in the instruction stream, so they are encoded as 1 bits in 32-bit mode.

The VEX opcode bytes are the same as those used by the LDS and LES instructions. These instructions are not supported in 64-bit mode, while in 32-bit mode, the following "mod R/M" byte cannot be of the form "11xxxxxx" (which would specify a register operand). The bit inversion ensures that the second byte of a VEX prefix is always of this form in 32-bit mode.

The W bit is equivalent to the REX prefix's W bit, and specifies a 64-bit operand. For non-integer instructions, it is a general opcode extension bit.

The 5 m bits replace leading opcode bytes. The values 1, 2 and 3 are equivalent to opcodes 0F, 0F 38 and 0F 3A; all other values are currently reserved. (The 2-byte VEX prefix always corresponds to a 0F prefix.)

The L bit indicates the vector length. It is 0 for 128-bit SSE (xmm) registers, and 1 for 256-bit AVX (ymm) registers.

The p bits encode additional prefix bytes. The 4 possible values are none, 66, F3, and F2. These encode the operand type for SSE instructions: packed single, packed double, scalar single and scalar double, respectively.

Instructions that need more than three operands have an extra suffix byte specifying one or two additional register operands. Instructions coded with the VEX prefix can have up to five operands. At most one of the operands can be a memory operand; and at most one of the operands can be an immediate constant of 4 or 8 bits. The remaining operands are registers.
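
The field layout described above can be checked with a small decoder. This is a minimal Python sketch, not taken from any Intel tool: the names `VexFields` and `decode_vex` are made up for illustration, and only the prefix bytes themselves are decoded (not the opcode, ModR/M, or suffix bytes that follow).

```python
from dataclasses import dataclass
from typing import Optional

# Lookup tables taken from the field descriptions above.
LEADING_OPCODE = {1: "0F", 2: "0F 38", 3: "0F 3A"}
IMPLIED_PREFIX = {0: None, 1: "66", 2: "F3", 3: "F2"}


@dataclass
class VexFields:
    R: int       # un-inverted, same meaning as REX.R
    X: int       # un-inverted, same meaning as REX.X
    B: int       # un-inverted, same meaning as REX.B
    W: int
    mmmmm: int   # selects the implied leading opcode bytes
    vvvv: int    # un-inverted extra source register number
    L: int       # 0 = 128-bit (xmm), 1 = 256-bit (ymm)
    pp: int      # selects the implied 66/F3/F2 prefix

    @property
    def vector_bits(self) -> int:
        return 256 if self.L else 128

    @property
    def leading_opcode(self) -> Optional[str]:
        return LEADING_OPCODE.get(self.mmmmm)   # None if reserved

    @property
    def implied_prefix(self) -> Optional[str]:
        return IMPLIED_PREFIX[self.pp]


def decode_vex(prefix: bytes) -> VexFields:
    """Decode a 2-byte (C5 xx) or 3-byte (C4 xx xx) VEX prefix."""
    if len(prefix) == 2 and prefix[0] == 0xC5:
        b1 = prefix[1]
        return VexFields(
            R=((b1 >> 7) & 1) ^ 1,           # stored inverted
            X=0, B=0, W=0,                   # not present in 2-byte form
            mmmmm=1,                         # 2-byte form always implies 0F
            vvvv=((b1 >> 3) & 0xF) ^ 0xF,    # stored inverted
            L=(b1 >> 2) & 1,
            pp=b1 & 0x3,
        )
    if len(prefix) == 3 and prefix[0] == 0xC4:
        b1, b2 = prefix[1], prefix[2]
        return VexFields(
            R=((b1 >> 7) & 1) ^ 1,
            X=((b1 >> 6) & 1) ^ 1,
            B=((b1 >> 5) & 1) ^ 1,
            mmmmm=b1 & 0x1F,
            W=(b2 >> 7) & 1,
            vvvv=((b2 >> 3) & 0xF) ^ 0xF,
            L=(b2 >> 2) & 1,
            pp=b2 & 0x3,
        )
    raise ValueError("not a 2-byte or 3-byte VEX prefix")


# The prefix bytes C4 E2 7D select 256-bit, implied 66, implied 0F 38.
fields = decode_vex(bytes([0xC4, 0xE2, 0x7D]))
print(fields.vector_bits, fields.implied_prefix, fields.leading_opcode)
```

Every field here is a static property of one instruction's encoding; the decoder needs no knowledge of threads or control flow.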

The AVX instruction set is the first instruction set extension to use the VEX coding scheme. The AVX instructions have up to four operands. The AVX instruction set allows the VEX prefix to be applied only to instructions using the SIMD XMM registers. However, the VEX coding scheme has space for applying the VEX prefix to other instructions as well in future instruction sets.

Legacy instructions with a VEX prefix added are equivalent to the same instructions without VEX prefix with the following differences:

The VEX-encoded instruction can have one more operand, making it non-destructive.
A 128-bit XMM instruction without VEX prefix leaves the upper half of the full 256-bit YMM register unchanged, while the VEX-encoded version sets the upper half to zero.
Instructions that use the whole 256-bit YMM register should not be mixed with non-VEX instructions that leave the upper half of the register unchanged, for reasons of efficiency.
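
The upper-half difference in the list above can be modeled in a few lines. This is a toy Python sketch that treats a 256-bit YMM register as a plain integer; the function names are hypothetical, and it models only the architectural result of a 128-bit write, not the performance cost of mixing the two forms.

```python
MASK128 = (1 << 128) - 1


def write_xmm_legacy(ymm: int, value128: int) -> int:
    """Legacy (non-VEX) 128-bit write: the low half is replaced and the
    upper half of the 256-bit register is left unchanged."""
    upper = (ymm >> 128) << 128
    return upper | (value128 & MASK128)


def write_xmm_vex(ymm: int, value128: int) -> int:
    """VEX-encoded 128-bit write: the low half is replaced and the upper
    half of the 256-bit register is set to zero."""
    return value128 & MASK128


old = (0xAAAA << 128) | 0x1111
print(hex(write_xmm_legacy(old, 0x2222)))
print(hex(write_xmm_vex(old, 0x2222)))
```

Because the legacy form must keep the old upper half alive, the hardware has to track it, which is the efficiency concern in the last list item.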

GammaLaser · Jun 14, 2011

Seems a lot like Sun's "Rock" CPU, which was to pioneer the hardware scout/simultaneous speculative threading ideas. Found a paper at http://csrl.unt.edu/~kavi/CSCE6610/Sun-Rock.pdf. Rock had a lot of interesting architecture features, 'cept it got killed off by Oracle.

Nemesis 1 · Jun 14, 2011

GammaLaser said:
Seems a lot like Sun's "Rock" CPU, which was to pioneer the hardware scout/simultaneous speculative threading ideas. Found a paper at http://csrl.unt.edu/~kavi/CSCE6610/Sun-Rock.pdf. Rock had a lot of interesting architecture features, 'cept it got killed off by Oracle.

Yes indeed. Now all you have to do is research where SPARC tech came from: Elbrus. No, Oracle didn't kill it. I don't have the agreement, but I believe that when Intel bought Elbrus, Sun, under a new license agreement, couldn't transfer this tech to another owner such as AMD with x86.

bronxzv · Jun 14, 2011

GammaLaser said:
Seems a lot like Sun's "Rock" CPU, which was to pioneer the hardware scout/simultaneous speculative threading ideas. Found a paper at http://csrl.unt.edu/~kavi/CSCE6610/Sun-Rock.pdf. Rock had a lot of interesting architecture features, 'cept it got killed off by Oracle.

Sun was late to the party, look at an early paper here :
http://www.intel.com/technology/itj/2002/volume06issue01/vol6iss1_hyper_threading_technology.pdf
page 22

all this stuff is pretty much out of fashion now if you ask me

Mark R · Jun 14, 2011

Nemesis 1 said:
OK. Then read post 164. It's taken from the 800-page PDF.

Here is a description of a p-slice in the VEX prefix. I also have something else that I pulled that says it in more simplified wording, but I won't post that as of yet. I was actually hoping someone else might read the PDF. But I will post it when I feel the right time is at hand.

AVX and FMA instructions are encoded using a more efficient format than previous
instruction extensions in the Intel 64 and IA-32 architecture. The improved encoding
format uses a new prefix referred to as VEX. The VEX prefix may be two or three
bytes long, depending on the instruction semantics. Despite the length of the VEX
prefix, the instruction encoding format using VEX addresses two important issues:
(a) there exists inefficiency in instruction encoding due to SIMD prefixes and some
fields of the REX prefix, (b) Both SIMD prefixes and REX prefix increase in instruction
byte-length. This chapter describes the instruction encoding format using VEX.

Sorry. I don't understand. There is no mention of the phrase 'p-slice' in there. Also, what is being described above bears absolutely no resemblance to the 'p-slice' described in the mitosis papers.

Nemesis 1 · Jun 14, 2011

[ repeat post
