Originally posted by: Nemesis 1
Well, it's a pretty sure thing that you don't understand compilers or Mitosis. You should have read the link I gave you. Then you would understand the importance of SSE4 to Mitosis.
I have read the link before, actually...
How about one from Intel on what Mitosis actually is and what it does...
Intel's Mitosis White Paper
First, the reason for the program...
any time a compiler has to parallelize two pieces of code, it has to consider all potential dependences. It has to analyze whether one piece of code might write something in a given memory location that another piece of code may read. Unfortunately, in most cases, a compiler doing this only has an approximate view of the memory locations that are being touched by every single instruction. As a result, whenever the compiler has to detect potential dependences, it tends to be over-conservative. If the compiler cannot prove that two instructions are independent, it presumes they are dependent. This means when the compiler generates the code, it assumes a huge number of dependences that don't exist or very rarely exist. The code ends up overly serialized and misses many opportunities for parallelization
In other words, current compilers must be conservative with dependencies...now the Mitosis solution.
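To make the dependence problem concrete, here is a minimal sketch in Python (my own illustration, not from the white paper; the function names are made up). The same loop is safe to run "all at once" when the two buffers are distinct, but gives a different answer when they turn out to be the same buffer. A compiler that can't prove which case it has must assume the second.

```python
def shift_scale_sequential(dst, src):
    # Sequential semantics: iteration i runs only after iteration i-1 has finished.
    for i in range(len(src) - 1):
        dst[i + 1] = src[i] * 2
    return dst


def shift_scale_parallel(dst, src):
    # Models running every iteration at once: all reads happen before any write.
    reads = [src[i] for i in range(len(src) - 1)]
    for i, value in enumerate(reads):
        dst[i + 1] = value * 2
    return dst


# Distinct buffers: no iteration reads what another iteration writes.
src = [1, 2, 3, 4]
independent_seq = shift_scale_sequential([0, 0, 0, 0], src)
independent_par = shift_scale_parallel([0, 0, 0, 0], src)

# Same buffer passed twice: iteration i reads the value iteration i-1 just wrote.
a = [1, 2, 3, 4]
aliased_seq = shift_scale_sequential(a, a)
b = [1, 2, 3, 4]
aliased_par = shift_scale_parallel(b, b)

print(independent_seq == independent_par)  # parallelizing was safe here
print(aliased_seq == aliased_par)          # parallelizing changed the result here
```

In the aliased case the sequential loop produces [1, 2, 4, 8] while the "parallel" version produces [1, 2, 4, 6], which is exactly the kind of rare-but-possible dependence that forces the conservative code.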
Speculative threads could revolutionize how we parallelize applications. Compilers would no longer have to be conservative. Instead, they could be optimistic. Instead of generating code for the worst case, compilers could generate for the common case. The result would be a much higher degree of parallelism and a significant gain in performance
A brilliant idea, but how do we make it work?
First the software (compiler):
Mitosis relies on both hardware and software (compiler) support to work. On the software side, the Mitosis compiler is responsible for analyzing the program, and locating the sections of it that can efficiently be executed in parallel. A key component of this analysis is the identification of sections of code whose corresponding precomputation slices have a very low computation overhead. Other conventional aspects such as workload balance also need to be considered
Intel has one of the best software teams around, but this project is MASSIVE (certainly more difficult than the design of C2D)! It must also be distributed, tested, and utilized by the software community...not exactly a 1-2 year project (just ask the EPIC guys).
What about the hardware?
On the hardware side, Mitosis is built on top of a multi-core and/or multithreaded processor. The main extension required is support for buffering and multiversioning in the memory hierarchy. Buffering is needed to keep the speculative state until the thread is verified and can be committed. Multiversioning is required to allow each variable to have a different value for each of the threads that are running in parallel. This is needed because every thread is executing a piece of code that started out with sequential semantics, but now, parallelized in threads, is being worked on simultaneously with values that were previously supplied in different points in time in the program
As you can see, Intel is certainly headed in that direction, but the buffering and multiversioning aren't part of Penryn (at least not part of what we've seen, and that would be a HUGE change!), and HT certainly has nothing to do with it (though it does sound similar).
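For anyone trying to picture what "buffering" and "multiversioning" mean in the white paper excerpt, here is a toy software model in Python. It is only a sketch under my own assumptions: the class and method names are hypothetical, and real Mitosis does this in the memory hierarchy, not in software. Each speculative thread starts from predicted input values, keeps its writes in a private buffer (its own version of each variable), and only commits if the predictions turn out to match what the earlier code really produced.

```python
class SpeculativeThread:
    """Toy model of one speculative thread; all names here are hypothetical."""

    def __init__(self, memory, predicted_live_ins):
        self.memory = memory                      # shared, committed state
        self.live_ins = dict(predicted_live_ins)  # values assumed on entry
        self.buffer = {}                          # private version of written variables

    def read(self, name):
        # Own speculative writes first, then assumed inputs, then committed memory.
        if name in self.buffer:
            return self.buffer[name]
        if name not in self.live_ins:
            # Record what was read so it can be verified at commit time.
            self.live_ins[name] = self.memory[name]
        return self.live_ins[name]

    def write(self, name, value):
        # Buffering: speculative stores never touch committed memory directly.
        self.buffer[name] = value

    def try_commit(self):
        # Verification: every assumed input must match the committed value.
        if all(self.memory[k] == v for k, v in self.live_ins.items()):
            self.memory.update(self.buffer)
            return True
        self.buffer.clear()  # squash: discard the speculative state
        return False


def later_section(thread):
    # A later piece of the program: total = x * 10
    thread.write("total", thread.read("x") * 10)


memory = {"x": 1, "total": 0}
good = SpeculativeThread(memory, {"x": 5})  # predicts the earlier code sets x to 5
bad = SpeculativeThread(memory, {"x": 7})   # a wrong prediction
later_section(good)
later_section(bad)

# Multiversioning: two different values of "total" exist at the same time.
versions = (good.buffer["total"], bad.buffer["total"])

memory["x"] = 5  # the earlier, non-speculative code finishes and produces x

bad_committed = bad.try_commit()    # squashed, its buffer is thrown away
good_committed = good.try_commit()  # verified, its buffer becomes real state
```

The point of the sketch is that committed memory never sees the wrong thread's result, which is why the hardware needs somewhere to hold per-thread versions until verification.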
Now let's address your confusion about what SSE does...
SSE stands for Streaming SIMD Extensions.
You can think of SSE as a set of wide registers, plus the instructions that operate on them, which let software perform several calculations simultaneously with a single instruction. This takes a significant amount of work off the general-purpose integer and FP registers.
The first SSE added 8 registers and 70 new instructions.
SSE2 added another 144 instructions and SSE3 a further 13, predominantly geared towards media encoding.
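If the "single instruction, multiple data" idea is still fuzzy, here is a rough Python model of what one packed add does. This is not real SSE code, just an illustration: the helper names are mine, and the only thing borrowed from the actual instruction set is the idea behind ADDPS, where a 128-bit register holds four single-precision floats and one instruction adds all four lanes.

```python
import struct


def pack_ps(a, b, c, d):
    # Pack four single-precision floats into 16 bytes (128 bits), like one SSE register.
    return struct.pack("<4f", a, b, c, d)


def addps(x, y):
    # Model of a packed add: one operation, four independent lane-wise sums.
    lanes_x = struct.unpack("<4f", x)
    lanes_y = struct.unpack("<4f", y)
    return struct.pack("<4f", *(p + q for p, q in zip(lanes_x, lanes_y)))


xmm0 = pack_ps(1.0, 2.0, 3.0, 4.0)
xmm1 = pack_ps(10.0, 20.0, 30.0, 40.0)
result = struct.unpack("<4f", addps(xmm0, xmm1))

print(len(xmm0) * 8)  # register width in bits
print(result)         # the four lane sums
```

Without SIMD, the same work is four separate scalar adds. That is data-level parallelism inside one thread, which is a different thing from the thread-level speculation Mitosis is about.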
For SSE4, I think you saw the words "compiler vectorization primitives" and "parallelized code" without really reading the rest...
Firstly, remember that Nehalem is to be a CSI-based processor...
Second, remember that one of the original goals of CSI was to create a single platform for both Xeon and Itanium (which operates in EPIC only)...
Now read this line again...
"The bulk of SSE4's 50 or so instructions is comprised of new compiler vectorization primitives, which should make it easier for compilers to translate software written in high-level languages into effectively parallelized code and data structures"
As to why they are introducing it in Penryn, it seems to me to be the same scenario they followed with Prescott...where they introduced the hardware for EM64T but didn't activate it. SSE4 needs to be in the marketplace and be used by codewriters before it becomes effective.