Haswell will rival graphics performance of today's discrete cards!


ch33kym0use

Senior member
Jul 17, 2005
I can do without the hype. They should stick to what they do best: advancing the technology that they can, and providing computer things for peeps.
 

CPUarchitect

Senior member
Jun 7, 2011
Thanks much for your time and effort. Would you comment on this? Now, this is during recompile, let's say in the case of a JIT compiler. Could this be a true statement?

The entry and exit points of the current working thread are marked, and the portion of the thread that would be split off is separated. The new thread is then fed the appropriate input data that it needs to begin its execution.
Speculative threading, like in the Mitosis project, is just experimental academic research. It hasn't proven to be of any practical use yet, and I don't think it ever will. It attempts to dynamically turn single-threaded code into multi-threaded code. In theory that can speed some things up a little, but it won't scale beyond a few cores due to the speculative nature. It also wastes lots of electrical power, which just isn't acceptable. There are far more promising static techniques to improve effective multi-core performance.

Anyhow, this hasn't got a single thing to do with AVX. And on top of that all of the multi-threading research applies equally to Intel, AMD, and every other chip manufacturer. The Mitosis project is insignificant and isn't used outside of the research labs (and I found no indication that it's still actively in development).

AVX2 might be relevant to this thread's topic, but speculative threading certainly is not. Graphics is known as an "embarrassingly parallel" workload, and does not require speculative techniques to increase the number of threads.

In fact graphics is struggling with too many threads! As the transistor budget continues to scale, the thread count will have to be lowered. To achieve that, they'll have to start focusing on ILP within threads. CPU architectures already do that, which is another reason why AVX2+ may eventually take over graphics processing while achieving higher effective performance than a heterogeneous architecture at other tasks. We're still many years away from that, but it looks like Intel is closer to that goal than AMD.
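To be clear about the "embarrassingly parallel" label: each pixel's result depends only on its own inputs, so no speculation and no inter-thread communication is needed to parallelize it. A toy sketch (Python threads stand in for GPU lanes; the kernel is made up):

```python
from concurrent.futures import ThreadPoolExecutor

def shade(pixel):
    """Toy per-pixel kernel: the result depends only on this pixel's
    own coordinates, never on any other pixel."""
    x, y = pixel
    return (x * 31 + y * 17) % 256  # stand-in for real shading math

pixels = [(x, y) for y in range(4) for x in range(4)]

# Every pixel can be shaded independently, in any order, on any core.
with ThreadPoolExecutor(max_workers=4) as pool:
    frame = list(pool.map(shade, pixels))

# The parallel result matches the sequential one exactly -- the
# hallmark of an embarrassingly parallel workload.
assert frame == [shade(p) for p in pixels]
```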
 

Nemesis 1

Lifer
Dec 30, 2006
I just wanted to see who I was dealing with here. Good enough for me.

Now it's just a matter of waiting till the fat lady sings.
 

Keysplayr

Elite Member
Jan 16, 2003
I just wanted to see who I was dealing with here. Good enough for me.

Now it's just a matter of waiting till the fat lady sings.

You mean, you just wanted to see who was dealing with you.

;)

Even though I haven't got a clue about what he is talking about, it sure sounds like he does.
 

Nemesis 1

Lifer
Dec 30, 2006
You mean, you just wanted to see who was dealing with you.

;)

Even though I haven't got a clue about what he is talking about, it sure sounds like he does.

Yes he does. But I am not getting direct answers to a question; there will be a little dancing here. With Haswell we are talking about a CPU, yet everyone is focusing too much on graphics. In his last reply he wrote this.

AVX2 might be relevant to this thread's topic, but speculative threading certainly is not. Graphics is known as an "embarrassingly parallel" workload, and does not require speculative techniques to increase the number of threads.

I was referring to, let's say, an old app recompiled using AVX1 or AVX2 that contains computational slices in the VEX prefix. Not all VEX code contains these computational slices, only the VEX prefix. In the case of a Fusion-type processor doing graphics, it is possible that speculative threads can be spun off to the CPU to aid the GPU, so that in a parallel graphics workload the CPU offers even greater performance increases by harnessing the entire die. The GPU may not benefit from this, but the CPU portion most definitely would.

Let us not forget this tech is all new. Intel's aim is to harness the entire die for improved performance and efficiency, the same as NV and AMD are doing. NV has CUDA, AMD has OpenCL (yes, I know NV has CL too). In the case of Intel/AMD, which have both CL and AVX, it would be a shame if the compilers didn't take advantage of both at the same time. In the case of NV running CL, as I understand it NV uses a software layer for CL; are you going to debate that this is weak compared to what AMD is doing?

I also clearly wrote that we need to use the information that both Intel and AMD have in the AVX PDF documents, and no other sources. Stay with what's in the AVX PDF. We can clearly see that the VEX prefix and XOP are not the same. Close, but no cigar.


When I wrote the above, I misworded it. Where I said, "Not all VEX code contains these computational slices, only the VEX prefix," it should read: not all AVX instructions have the VEX prefix. Sorry about that. VEX is the computation slice (p-slice) as it appears at the front of an AVX recompile, say SSE2 instructions recompiled.
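For reference, the VEX prefix itself is just an instruction-encoding detail: in 64-bit mode an AVX instruction opens with a 0xC5 (two-byte) or 0xC4 (three-byte) VEX prefix. A rough sketch of spotting it in raw machine code (Python, purely illustrative, not a real decoder; it ignores EVEX, REX, and the legacy 32-bit meanings of those bytes):

```python
def vex_prefix_length(code: bytes, offset: int = 0) -> int:
    """Return the VEX prefix length at `offset`, or 0 if none.

    In 64-bit mode, 0xC5 starts a 2-byte VEX prefix and 0xC4 a
    3-byte VEX prefix; anything else is not VEX-encoded here.
    """
    b = code[offset]
    if b == 0xC5:
        return 2
    if b == 0xC4:
        return 3
    return 0

# vzeroupper is encoded C5 F8 77: a 2-byte VEX prefix, then opcode 0x77.
vzeroupper = bytes([0xC5, 0xF8, 0x77])
assert vex_prefix_length(vzeroupper) == 2

# A plain NOP (0x90) has no VEX prefix.
assert vex_prefix_length(bytes([0x90])) == 0
```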
 

Phynaz

Lifer
Mar 13, 2006
In theory that can speed some things up a little, but it won't scale beyond a few cores due to the speculative nature.

I was under the impression it requires lots of cores, and doesn't run with just a few.

This is due to the fact that another thread is spawned at every possible branch. You can very quickly have hundreds of threads running.
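A back-of-the-envelope illustration of that blow-up (pure counting in Python, no real threads): forking a speculative thread down both paths of every branch doubles the in-flight count at each level of branch depth.

```python
def speculative_thread_count(branch_depth: int) -> int:
    """Naively forking both paths at every branch doubles the number
    of in-flight speculative threads per level of branching."""
    return 2 ** branch_depth

# After only 8 nested branches you'd already need 256 contexts --
# far more than the handful of cores speculation can usefully fill.
assert speculative_thread_count(8) == 256
assert speculative_thread_count(10) == 1024
```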
 

Cerb

Elite Member
Aug 26, 2000
I was under the impression it requires lots of cores, and doesn't run with just a few.

This is due to the fact that another thread is spawned at every possible branch. You can very quickly have hundreds of threads running.

Mitosis primarily speculated on unknown data dependencies, a problem that is hard to solve statically (lifetime-optimizing compilers are good, though), and one that is more of a problem for IA-64 than for the rest of the universe (it is a problem for the rest of the universe too, just less so). It's a good bit more sophisticated than just spawning for branches. Actually, it's a good bit more complicated than just handling data dependencies, but that's effectively it in a nutshell, as data dependencies are Itanium's elephant in the room.

If a data dependency hinges on a not-yet-taken branch or a not-yet-calculated value, exactly how do you handle it? You can predicate (eats up instructions and registers, and gets really ugly on typical scalar/superscalar machines); you can wait (leave it up to hardware); you can speculate in software (useful, but not generally applicable, nor will one generalized set of code work for any given application); or you can act like it will go one way, with some exception-producing/handling code (risky without good profiling, since you want to know which path(s) are taken most often). Generally, these are not situations that scale out much: after a few slices execute, the whole thing is merged back into the main thread. And beyond what can be done, in many cases of value prediction there is simply nothing to do but hope that hardware speculation (and the cache eviction policies) likes your memory access patterns.
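To make the "speculate in software" option concrete, a toy sketch (Python, purely illustrative, not any real compiler's output): guess the not-yet-available value, run ahead with the guess, then validate against the real value and redo the work on a misprediction.

```python
def speculate(predict, compute_actual, use):
    """Run `use` on a predicted input; validate against the real
    value and roll back (recompute) on a misprediction."""
    guess = predict()
    speculative_result = use(guess)  # run ahead with the guess
    actual = compute_actual()        # real value arrives later
    if guess == actual:
        return speculative_result    # speculation paid off
    return use(actual)               # misspeculation: redo with real input

# Toy example: predict a loop-carried value stays at its last value (42).
result = speculate(
    predict=lambda: 42,
    compute_actual=lambda: 42,  # prediction happens to be right this time
    use=lambda v: v * 2,
)
assert result == 84
```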

Speculative threading is great outside of reality, but back in reality, nothing is free.
 

Nemesis 1

Lifer
Dec 30, 2006
The Mitosis PDF:

http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.81.8192&rep=rep1&type=pdf


From another paper.


This paper presents the Mitosis framework, which is a combined hardware-software approach to speculative multithreading, even in the presence of frequent dependences among threads. Speculative multithreading increases single-threaded application performance by exploiting thread-level parallelism speculatively - that is, executing code in parallel even when the compiler or runtime system cannot guarantee the parallelism exists. The proposed approach is based on predicting/computing thread input values via software, through a piece of code that is added at the beginning of each thread (the pre-computation slice). A pre-computation slice is expected to compute the correct thread input values most of the time, but not necessarily always. This allows aggressive optimization techniques to be applied to the slice to make it very short. This paper focuses on the microarchitecture that supports this execution model. The primary novelty of the microarchitecture is the hardware support for the execution and validation of pre-computation slices. Additionally, this paper presents new architectures for the register file and the cache memory in order to support multiple versions of each variable and allow for efficient roll-back in case of misspeculation. We show that the proposed microarchitecture, together with the compiler support, achieves an average speedup of 2.2 for applications that conventional non-speculative approaches are not able to parallelize at all.
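The execution model in that abstract can be caricatured in a few lines (Python threads standing in for hardware contexts; all names here are made up): a pre-computation slice cheaply predicts the spawned thread's live-in values so it can start early, and the live-ins are later validated against what the main thread actually produced, with the speculative work squashed and re-executed on a mismatch.

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_pslice(p_slice, spawned_body, main_body):
    """Mitosis-style speculative spawn, caricatured:
    1. p_slice() quickly predicts the spawned thread's live-in values
    2. the spawned thread starts early on those predicted live-ins
    3. the main thread computes the true live-ins
    4. live-ins are validated; a mismatch squashes the speculative work
    """
    predicted_live_ins = p_slice()
    with ThreadPoolExecutor(max_workers=1) as pool:
        speculative = pool.submit(spawned_body, predicted_live_ins)
        actual_live_ins = main_body()
        if predicted_live_ins == actual_live_ins:
            return speculative.result()  # validation passed: commit
    return spawned_body(actual_live_ins)  # squash and re-execute

total = run_with_pslice(
    p_slice=lambda: 10,            # cheap, usually-correct prediction
    spawned_body=lambda live_in: sum(range(live_in)),
    main_body=lambda: 10,          # true live-in matches this time
)
assert total == 45
```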
 

Nemesis 1

Lifer
Dec 30, 2006
Well, I see from where it has moved in position on Google that a few are interested in the paper above. Ya have to pay to play.