Haswell to support transactional memory in hardware


Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
Actually yes, the instructions are backwards compatible. You can run this code today on a Sandy Bridge CPU; the new prefixes are simply ignored and it falls back to the old locking method.

So do compilers/runtimes supporting TSX insert a check for the CPU ID?
 

exar333

Diamond Member
Feb 7, 2004
8,518
8
91
Does TM require additional memory bandwidth over the existing locking methods, or is bandwidth just used more efficiently as the number of threads increases? My understanding is that this would have minimal impact with a small number of threads, but that the performance delta as the thread count goes up can be huge.
 

GammaLaser

Member
May 31, 2011
173
0
0
So do compilers/runtimes supporting TSX insert a check for the CPU ID?

For the HLE part of the extension, a separate codepath is not needed. HLE reuses existing instruction prefixes (REPNE and REPE, in this role named XACQUIRE and XRELEASE) that current hardware will ignore, because those prefixes have no defined meaning on lock-manipulation instructions.

On the other hand, the RTM part of the extension does require a separate codepath. RTM defines new instructions altogether (XBEGIN, XEND, XABORT), and older hardware will raise an invalid-opcode fault if it encounters them.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Does TM require additional memory bandwidth over the existing locking methods, or is bandwidth just used more efficiently as the number of threads increases? My understanding is that this would have minimal impact with a small number of threads, but that the performance delta as the thread count goes up can be huge.
Transactional memory is one big step away from intricate procedural programming for space- and logic-starved hardware, towards predicated programming for bandwidth-starved, logic-rich hardware. In pure software, transactions that edit common data structures can spend more time in the STM machinery than doing real work, which is where hardware support for the grunt work should help immensely. Whether it uses more or less bandwidth will depend on the implementation (software more than hardware, provided you have hardware support) and on the specific kind of work the software does (in particular, how likely are cache-line crossings on a regular basis?). It should use less, normalized to the same amount of results, but claiming it will across the board would, I think, be a bit naive.

If managing locks is eating a bunch of CPU time that could go towards doing real work, I would expect substantial gains. If tuning locks for performance carries too much risk under requirements-driven design changes over time (many a small business's dev team will be in this category), I would also expect substantial gains. Beyond that, however, it's another available tool, and whether it eats up more time/bandwidth/energy is all in how it's used. Shared-memory procedural languages (i.e., the common in-demand ones, like C++, C#, and Java) will still give programmers all the rope they need to hang themselves, IMO, as part of letting them make the hardware sing and dance.