Haswell to support transactional memory in hardware

Phynaz · Feb 8, 2012

Intel's announcement:
http://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell/

Ars report:
http://arstechnica.com/business/new...mory-going-mainstream-with-intel-haswell.ars?

tl;dr:
Better multi-threaded performance.

Abwx · Feb 8, 2012

Hardware.fr seems quite pessimistic about this feature.....

http://translate.googleusercontent....e.html&usg=ALkJrhiYF5aGR1g_BP2sPi0WecxZD-FdJA

blckgrffn · Feb 8, 2012

If that really works, wow. Intel is bringing their A game to the table. ARM what?

AMD, you can get this via IP sharing, no? Hopefully yes.

janas19 · Feb 8, 2012

Transactional synchronization? Hmmm.... Kind of reminds me of *gasp* Bulldozer! (A little bit). Lol.

Phynaz · Feb 8, 2012

Abwx said:
Hardware.fr seems quite pessimistic about this feature.....

http://translate.googleusercontent....e.html&usg=ALkJrhiYF5aGR1g_BP2sPi0WecxZD-FdJA

Doesn't read that way to me.

Perhaps you should read some of the research Oak Ridge National Laboratory, DARPA, Cray, IBM and others have done before you start with "Not invented by AMD".

Abwx · Feb 8, 2012

Phynaz said:
Doesn't read that way to me.

Perhaps you should read some of the research Oak Ridge National Laboratory, DARPA, Cray, IBM and others have done before you start with "Not invented by AMD".

You have been in the IT sector long enough to know that documented
research often ends in failure once implemented...

The Pentium 4's double-pumped ALUs had plenty of documentation,
yet they struggled to keep up with the ALUs of older microarchitectures
such as the Athlon and Pentium III.....

Phynaz · Feb 8, 2012

Oh, kinda like how CMT worked out for AMD. Now I get what you are saying :whiste:

Anyway, if you want to bring up the P4 please start your own thread. The purpose of the thread is to discuss Haswell and TSX.
Thanks.

BenchPress · Feb 8, 2012

Abwx said:
You have been in the IT sector long enough to know that documented
research often ends in failure once implemented...

And you clearly haven't been in the "IT sector" long enough to understand the massive importance of transactional memory.

Try writing a lock-free task scheduler using only CAS. Then try it again using transactional memory.
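
For readers who haven't tried it, here is a minimal C++ sketch of what the CAS-only approach looks like for just one piece of a scheduler: a lock-free stack of pending tasks. `Task`, `TaskStack`, and `take_all` are hypothetical names made up for this illustration, not anything from Intel's spec or from an existing scheduler. Every update has to be written as a compare-and-swap retry loop on a single pointer.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical task record; the names here are illustrative only.
struct Task {
    int id;
    Task* next;
};

class TaskStack {
public:
    // Push with a CAS retry loop: if another thread changed head_ between
    // our read and our compare-exchange, re-link the task and try again.
    void push(Task* t) {
        assert(t != nullptr);
        t->next = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(t->next, t,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
            // On failure, t->next now holds the current head; just retry.
        }
    }

    // Detach the whole list with one atomic exchange. Draining everything at
    // once sidesteps the ABA problem that a CAS-based single-task pop has.
    Task* take_all() {
        return head_.exchange(nullptr, std::memory_order_acquire);
    }

private:
    std::atomic<Task*> head_{nullptr};
};
```

Note that this only covers pushing and draining. A CAS-based pop of a single task also has to deal with the ABA problem and safe memory reclamation, which is the kind of complexity transactional memory is meant to take off the programmer's hands.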

blckgrffn · Feb 8, 2012

Ars Technica had a great piece on this a while back, it's worth the read IMHO.

Abwx · Feb 8, 2012

BenchPress said:
And you clearly haven't been in the "IT sector" long enough to understand the massive importance of transactional memory.

Try writing a lock-free task scheduler using only CAS. Then try it again using transactional memory.

Long enough to realize that what used to be handled by software optimisations
will be partly replaced by hardware management of memory sharing and task locks....

You should read the article I linked....

RavenSEAL · Feb 8, 2012

Well, looks like my next re-build might just be Intel.

BenchPress · Feb 8, 2012

Abwx said:
Long enough to realize that what used to be handled by software optimisations
will be partly replaced by hardware management of memory sharing and task locks....

Your point being?

You should read the article I linked....

I know everything there is to know about it. I read the spec. Now what's your point?

Abwx · Feb 9, 2012

BenchPress said:
Your point being?

I know everything there is to know about it. I read the spec. Now what's your point?

My point is that a software implementation will be
more versatile in the long term.

What Intel is doing is just pushing the OS in a direction where
it will be optimised for its own CPUs first, but that is old history....

Phynaz · Feb 9, 2012

Abwx said:
My point is that a software implementation will be
more versatile in the long term.

What Intel is doing is just pushing the OS in a direction where
it will be optimised for its own CPUs first, but that is old history....

Really. I wonder what IBM is doing at Argonne then? What OS are they pushing to be optimized for their CPUs?

As humorous as your uninformed posts are, you are continuing to embarrass yourself. You really should consider spending a couple of hours educating yourself. Or don't, I really don't care either way.

Phynaz · Feb 9, 2012

BenchPress said:
I read the spec.

Thanks for the link. I read through chapter 8.

You know what I'm thinking....This is a first step towards speculative execution. This is the first of the control mechanisms that will be required. I wonder now how much of this initially started with the Mitosis research project?

Abwx · Feb 9, 2012

Speculative execution already exists; it's just that it can't be pushed
further, as that would mean useless, power-hungry speculative execution.....

Ever heard of pipeline stalls???....

CPUarchitect · Feb 9, 2012

Abwx said:
Speculative execution already exists; it's just that it can't be pushed
further, as that would mean useless, power-hungry speculative execution.....

He's talking about speculative threading assisted by hardware, not the existing branch prediction you're thinking of.

And speculation is not necessarily a bad thing for power consumption. For instance a processor without branch prediction and prefetching would run much slower than a modern one which does use these techniques. To make it match in performance, you'd have to do some extreme overclocking, greatly increasing the power consumption beyond what you saved by not speculating.

So from a performance/Watt perspective speculation can be a very interesting deal. It just needs a high prediction rate and a low penalty for mispredictions.

bronxzv · Feb 9, 2012

Abwx said:
Ever heard of pipeline stalls???....

hey!, the whole point of speculative multithreading is to *avoid stalls*

Cerb · Feb 9, 2012

Abwx said:
Hardware.fr seems quite pessimistic about this feature.....

http://translate.googleusercontent....e.html&usg=ALkJrhiYF5aGR1g_BP2sPi0WecxZD-FdJA

I dunno. Like Phynaz, I read a fairly balanced view on it, from that article.

TM is one of those things that we need, and in the long run, need supported in hardware.

nyker96 · Feb 9, 2012

It all looks good on paper, but I wonder what the actual performance improvements from this type of thing are. When it comes to threading and multitasking, at least from a software point of view, it's hard to tell the benefits without some actual testing.

blckgrffn · Feb 9, 2012

nyker96 said:
It all looks good on paper, but I wonder what the actual performance improvements from this type of thing are. When it comes to threading and multitasking, at least from a software point of view, it's hard to tell the benefits without some actual testing.

Having seen a "variation" of this sort of thing introduced in file systems, which have relied on traditional locks like this in the past, the upsides and practical throughput gains can be pretty enormous. I would think the more cores you have the more important this technology is to maintain throughput.

I say that because this is important when you have multiple servers accessing a clustered file system, which is how I view cores competing for memory access. The analogy probably isn't perfect or even 90% accurate, but it makes sense of this concept for me.

Cerb · Feb 9, 2012

nyker96 said:
It all looks good on paper, but I wonder what the actual performance improvements from this type of thing are. When it comes to threading and multitasking, at least from a software point of view, it's hard to tell the benefits without some actual testing.

The greatest benefit is that it makes a lot more sense, when sharing data across threads. Let each thread read and write as it will, with checks to verify correctness, and then allow a globally-visible commit, or fall back. It would be wrong to say it is simple, but it would be right to say that it fits most humans' thought patterns far better than locking. Using locks in a traditional way is very much a choice of lesser evil over greater evil (lockless operation with shared memory--run away in fear!).

Software transactional memory has a fair chance of costing more than it offers in added performance, due to the bookkeeping needed to manage transactions. Each transaction must be isolated until commit, must have a way to verify that it should commit, that it did or did not commit, and then something to do on commit failure. When the transactions themselves are for very small amounts of work and/or memory, keeping up with that can take longer than waiting to grab a lock. Meanwhile, optimistic locking gets you halfway into the problems of trying to go lock-free.
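
The bookkeeping described above can be sketched in a few lines of C++. This is a toy optimistic scheme under stated assumptions, not a real STM and not how TSX works internally: `Accounts`, `Snapshot`, `begin`, `try_commit`, and `transfer` are hypothetical names, a single version counter stands in for a read set, and a short mutex protects the validate-and-publish step.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical shared state: two balances plus a version counter.
struct Accounts {
    long a = 0;
    long b = 0;
    unsigned long version = 0;  // bumped on every successful commit
    std::mutex commit_lock;     // held only while snapshotting or committing
};

// Private copy a "transaction" works on until it tries to commit.
struct Snapshot {
    long a;
    long b;
    unsigned long version;
};

// Start a transaction: take a consistent private snapshot.
Snapshot begin(Accounts& s) {
    std::lock_guard<std::mutex> guard(s.commit_lock);
    return {s.a, s.b, s.version};
}

// Validate and publish. Returns false (and changes nothing) if someone else
// committed since `start` was taken, so the caller knows it must fall back.
bool try_commit(Accounts& s, const Snapshot& start, long new_a, long new_b) {
    std::lock_guard<std::mutex> guard(s.commit_lock);
    if (s.version != start.version) {
        return false;  // conflict detected: abort this attempt
    }
    s.a = new_a;
    s.b = new_b;
    ++s.version;
    return true;
}

// The "something to do on commit failure" here is simply to retry.
// Returns the number of attempts it took.
int transfer(Accounts& s, long amount) {
    assert(amount >= 0);
    int attempts = 0;
    for (;;) {
        ++attempts;
        Snapshot snap = begin(s);
        if (try_commit(s, snap, snap.a - amount, snap.b + amount)) {
            return attempts;
        }
    }
}
```

Even in this stripped-down form, every transaction pays for a snapshot, a validation, and a possible retry, which is the overhead that can outweigh a plain lock when the protected work is tiny.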

Looking at the spec, this won't remove STM from the picture, but reduce overhead for small contentious sections of code (tell the hardware how to handle a race condition, and only do something about it if a race actually happens). I love chiding Intel for craziness in their extensions, but this one looks just about right. All I don't see are ways to handle falsely-conflicting writes.

Phynaz said:
Thanks for the link. I read through chapter 8.

You know what I'm thinking....This is a first step towards speculative [threading]. This is the first of the control mechanisms that will be required. I wonder now how much of this initially started with the Mitosis research project?

First step? Isn't it the overwhelming majority of the steps? There is no need for the hardware to know that it is executing the same task in two threads; there just needs to be (1) a way for one thread to fail without screwing up program/memory state, and (2) a way to guarantee that only correct paths keep executing. If my understanding is correct, a little special handling of overlapping write sets (i.e., false sharing -> convoy -> live-lock) should be all the CPU would need, on top of a minimal HTM implementation, to be able to properly implement SpMT. A mechanism to handle sub-line sharing/contention would also improve basic HTM performance, so it's win-win.
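
To illustrate the false-sharing concern, here is a small C++ sketch. It assumes 64-byte cache lines and that conflicts are tracked per cache line; both are assumptions made for the example, and `Packed`, `Padded`, and `same_line` are hypothetical names. Two counters written by different threads never logically conflict, but if they sit in the same line they would look like a conflict to line-granular detection. Padding each one to its own line avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineSize = 64;  // assumption: 64-byte cache lines

// Both counters land in the same cache line, so writes to x and y by
// different threads would appear to overlap at line granularity.
struct alignas(kLineSize) Packed {
    long x;
    long y;
};

// Each counter is aligned to its own cache line, trading memory for
// the absence of false conflicts.
struct Padded {
    alignas(kLineSize) long x;
    alignas(kLineSize) long y;
};

static_assert(sizeof(Padded) == 2 * kLineSize,
              "each padded counter should occupy one full line");

// True if both addresses fall inside the same kLineSize-byte line.
bool same_line(const void* p, const void* q) {
    auto line_p = reinterpret_cast<std::uintptr_t>(p) / kLineSize;
    auto line_q = reinterpret_cast<std::uintptr_t>(q) / kLineSize;
    return line_p == line_q;
}
```

The padded layout doubles the footprint here, which is why hardware support for sub-line contention would be attractive compared with asking programmers to pad everything by hand.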

Nemesis 1 · Feb 9, 2012

Oh! cool

RobertPters77 · Feb 9, 2012

Goddamnit Intel you keep raping my wallet with your innovations!

I'LL TAKE 20 HASWELLS!

Abwx said:
Hardware.fr seems quite pessimistic about this feature.....

http://translate.googleusercontent....e.html&usg=ALkJrhiYF5aGR1g_BP2sPi0WecxZD-FdJA

Because it's not made by AMD?

Abwx · Feb 10, 2012

RobertPters77 said:
Because it's not made by AMD?

No, because I'm not gullible and I take the time to read articles,
such as the one from Hardware.fr:

Ultimately, this early information about the implementation of transactional memory in Haswell leaves us with mixed feelings. HLE mode, thanks to its backward compatibility, is particularly interesting and should simplify the execution of code that was not written optimally while providing performance gains. Of course, programmers will have to recompile their code with the necessary intrinsics to benefit from it, but the payoff could be worthwhile. RTM mode, for its part, shows the limits of the x86 standard: by dint of being extended in every direction, it runs into the problem of interoperability.

Unlike mathematical instructions, where a compiler can decide on its own to dispatch SSE or AVX instructions for certain calculations depending on the processor model the program will run on, transactional memory requires intervention by the programmer (via intrinsics, which may also be the case for math instructions) or a new memory model for the programming language (such as the transactional memory model proposed for C++11 mentioned above). If we add to that the uncertainty about a possible AMD implementation, RTM may have an even harder time establishing itself than other extensions to the x86 instruction set.

Edit :

As a side note, I will add that the topic title is misleading
according to the quote above. It seems that Phynaz
is taking Intel's claims at face value, as usual....
