so stm makes it slower?????
No, it can be slower. On average, it should be faster for high-concurrency programming, and even when it isn't, it should make higher-level programming easier (in C#, for instance; though C++ may get wide support first).
For easy, safe operation, you can use simple pessimistic locking. In that case, scalability sucks. Optimistic locking and finer-grained locking lead to emergent bugs all the time. Locks themselves may be simple, but 20+ locks, not all of which behave the same way, are not, and you still have to carefully analyze every global change you make to be sure you won't create races.
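To make that concrete, here's a minimal sketch of the coarse pessimistic approach (my own toy example, not from any particular library): one big mutex keeps everything trivially safe, and every thread queues up behind it.

    #include <map>
    #include <mutex>
    #include <string>

    // One global lock guards the whole table: trivially correct,
    // but all threads serialize on it, so throughput flatlines.
    std::mutex table_lock;
    std::map<std::string, int> table;

    void deposit(const std::string& key, int amount) {
        std::lock_guard<std::mutex> guard(table_lock);  // pessimistic: always lock
        table[key] += amount;
    }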
Alternatively, you can go shared-nothing, leaving you with no locks to worry about except around I/O. But even a more efficient shared-nothing system, one doing copy-on-write, is going to eat memory time and bandwidth compared to actually sharing. Worse, a task that looks parallel but has overlapping accesses often, especially once tasks get composed, ends up performing far worse than plain serial code, because it runs mostly serially anyway, with all the extra overhead of unshared threaded code on top. Ugh. Sometimes you want shared memory, but you still want protection from dangerous races.
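Here's a rough sketch of where that copy cost comes from (again my own example, not a specific framework): each worker gets its own private copy of its chunk, so there are no locks, but every hand-off pays for a copy.

    #include <thread>
    #include <vector>

    // Shared-nothing: each worker operates on its own private copy.
    // No locks needed, but the copies cost memory time and bandwidth.
    std::vector<int> process(std::vector<int> chunk) {  // by value: a copy
        for (int& v : chunk) v *= 2;
        return chunk;
    }

    int main() {
        std::vector<int> data(1000000, 1);
        std::vector<int> left(data.begin(), data.begin() + data.size() / 2);
        std::vector<int> right(data.begin() + data.size() / 2, data.end());

        std::vector<int> out_l, out_r;
        std::thread t1([&] { out_l = process(left); });   // each thread owns its chunk
        std::thread t2([&] { out_r = process(right); });
        t1.join();
        t2.join();
    }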
With transactional memory, you can get rid of most locks (improving performance and making for easier-to-follow code), but you must have a way to keep (or create while writing) a clean 'before' state, to detect a potential conflict (safe races may be flagged as conflicts), and to recover from a conflict. In exchange, on a single box with many concurrent memory operations to the same space, you get the performance advantages of sharing: you aren't wasting memory time and bandwidth making a lot of unnecessary copies, you aren't synchronizing (often flushing) memory 'just in case', and you aren't blocking a bunch of safe code 'just in case'. And while there is plenty of room for compiler, runtime-system, and code bugs, there is far less room to create a mire of locks that all but guarantees a hard-to-track-down bug some day. It won't make a bad design work better, but it can make a good design easier to implement and maintain.
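For a taste of what that looks like in code, here's a minimal sketch using GCC's transactional memory extension (compile with g++ -fgnu-tm; ICC has a similar construct). The account type is just an assumption for illustration:

    // Minimal sketch of GCC's STM extension (g++ -fgnu-tm).
    // The runtime tracks the accesses, detects conflicting transactions,
    // and rolls back and retries the block on conflict: no locks in sight.
    struct Account { int balance; };

    void transfer(Account& from, Account& to, int amount) {
        __transaction_atomic {
            from.balance -= amount;   // both writes commit atomically,
            to.balance   += amount;   // or neither does
        }
    }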
The catch is that the whole data memory space of the application has to be treatable like a coherent database, and that's not free. Every single write to shared memory is effectively wrapped in a check-try-catch-finally block. The performance cost of that complication, even in the common case where no conflict is detected, is part of why, to become more than a niche feature, STM needs some hardware backing to keep the overhead down.
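To show the shape of that per-write overhead, here's a toy illustration of the pattern (not a real STM runtime): snapshot a clean 'before' value, do the work, then commit only if nothing conflicted, retrying otherwise.

    #include <atomic>

    std::atomic<int> shared_counter{0};

    // Toy version of the per-write dance every transactional write goes through.
    void transactional_increment() {
        for (;;) {
            int before = shared_counter.load();   // keep a clean 'before' state
            int after  = before + 1;              // the actual work
            // 'commit' succeeds only if no conflicting write slipped in
            if (shared_counter.compare_exchange_weak(before, after))
                return;                           // committed
            // conflict detected: loop back and retry from a fresh snapshot
        }
    }

Even in the no-conflict fast path, that's extra loads, bookkeeping, and a branch on every write, which is exactly the overhead hardware support is meant to shave off.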
With real STM implementations coming along (ICC, GCC), with non-academic implementation research having mostly been successful (STM.NET and Velox, that I know of), and with pure software implementations fairly well tested (though not always speedy, and adding a complicated management layer to deal with), the need is becoming more obvious as we get more and more cores. Now is just the right time to get hardware support and lessen the performance burden.