all look good on paper, but I wonder what the actual performance improvements from this type of thing are. With threading and multitasking, at least from a software point of view, it's hard to tell the benefits without some actual testing.
The greatest benefit is that it makes a lot more sense when sharing data across threads. Let each thread read and write as it will, with checks to verify correctness, and then either allow a globally-visible commit or fall back. It would be wrong to say it is simple, but it would be right to say that it fits most humans' thought patterns far better than locking. Using locks in a traditional way is very much a choice of lesser evil over greater evil (lockless operation with shared memory--run away in fear!).
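To make the "read and write as it will, verify, then commit or fall back" idea concrete, here is a minimal sketch in C++. It is only a one-word software analogy, not how any real STM or the hardware extension works: `OptimisticCell`, `transact`, and `run_increments` are made-up names, the "transaction" covers a single `std::atomic<long>`, and the check is just "the value still equals my snapshot" (so it ignores ABA).

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical one-word "transaction": snapshot, compute privately, then
// publish only if the shared value still equals the snapshot.
struct OptimisticCell {
    std::atomic<long> value{0};
    std::atomic<long> retries{0};  // how often a commit attempt failed

    template <typename F>
    long transact(F update) {
        long seen = value.load(std::memory_order_acquire);
        for (;;) {
            long wanted = update(seen);  // private work, invisible to others
            // Commit: succeeds only if value still equals the snapshot.
            if (value.compare_exchange_weak(seen, wanted,
                                            std::memory_order_acq_rel,
                                            std::memory_order_acquire)) {
                return wanted;
            }
            // Fall back: 'seen' was refreshed by the failed CAS; redo the work.
            retries.fetch_add(1, std::memory_order_relaxed);
        }
    }
};

// Spawn 'threads' workers that each run 'per_thread' increment transactions.
long run_increments(OptimisticCell& cell, int threads, int per_thread) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&cell, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                cell.transact([](long v) { return v + 1; });
            }
        });
    }
    for (auto& th : pool) th.join();
    return cell.value.load();
}
```

Calling `run_increments(cell, 4, 10000)` on a fresh cell should leave the value at 40000 no matter how the threads interleave. The `retries` counter shows how much work got thrown away and redone, which is the cost side of the optimistic approach.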
Software transactional memory has a fair chance of costing more than it gains in performance, because of the overhead of managing transactions. Each transaction must be isolated until commit, and must have a way to verify that it should commit, a way to tell whether it did or did not commit, and something to do on commit failure. When the transactions themselves cover very small amounts of work and/or memory, keeping up with all that can take longer than waiting to grab a lock. Meanwhile, optimistic locking gets you halfway into the problems of trying to go lock-free.
Looking at the spec, this won't remove STM from the picture, but it will reduce overhead for small contentious sections of code (tell the hardware how to handle a race condition, and only do something about it if a race actually happens). I love chiding Intel for craziness in their extensions, but this one looks just about right. The only thing I don't see is a way to handle falsely-conflicting writes.
Thanks for the link. I read through chapter 8.
You know what I'm thinking... This is a first step towards speculative [threading]. This is the first of the control mechanisms that will be required. I wonder now how much of this initially started with the Mitosis research project?
First step? Isn't it the overwhelming majority of the steps? There is no need for the hardware to know that it is executing the same task in two threads; there just needs to be (1) a way for one thread to fail without screwing up program/memory state, and (2) a way to guarantee that only correct paths keep executing. If my understanding is correct, a minimal HTM implementation plus a little special handling of overlapping write sets (i.e., false sharing -> convoy -> live-lock) should be all the CPU would need to properly implement SpMT. A mechanism to handle sub-line sharing/contention would also improve basic HTM performance, so it's a win-win.
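For anyone wondering what "falsely conflicting" means in practice, here is a rough C++ sketch. It assumes conflicts are tracked per 64-byte cache line (my assumption for illustration, not something taken from the thread), and `Packed`, `Padded`, `line_of`, and `would_conflict` are made-up names. Two threads that touch different variables still collide if those variables share a line; padding each variable to its own line is the usual software workaround.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineBytes = 64;  // assumed cache-line size

// Both fields land in the same line: writes to 'a' and 'b' from different
// threads would look like a conflict even though the data is unrelated.
struct alignas(kLineBytes) Packed {
    long a;  // written only by thread A
    long b;  // written only by thread B
};

// Each field starts its own line, so the two writers never overlap.
struct Padded {
    alignas(kLineBytes) long a;
    alignas(kLineBytes) long b;
};

// Index of the (assumed) cache line that an address falls in.
inline std::uintptr_t line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kLineBytes;
}

// True if line-granular conflict detection would treat the two addresses
// as touching the same location.
inline bool would_conflict(const void* x, const void* y) {
    return line_of(x) == line_of(y);
}
```

With these definitions, `would_conflict(&p.a, &p.b)` is true for a `Packed p` and false for a `Padded q`, at the price of `Padded` taking two full lines. Sub-line tracking in hardware would make that padding unnecessary.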