OK, so I need to count the trailing zeros in a 64-bit number really fast. On 64-bit x86 I can use the bsf instruction; heck, GCC's __builtin_ctzll() maps directly to it. But in 32 bits it doesn't, because bsf only scans 32 bits at a time there. In this particular application the numbers usually have only a few zeros at the end, so the lowest set bit is almost always in the low half, and the 32-bit version of __builtin_ctzll() isn't as fast as the following GCC assembly:
Code:
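asm volatile(
" bsfl %[k0l], %[pos] \n" /* scan the low 32 bits; ZF is set if they are all zero */
" jnz 1f \n" /* found a bit in the low half, done */
" bsfl %[k0h], %[pos] \n" /* otherwise scan the high 32 bits */
" addl $32, %[pos] \n"
"1:" /* numeric local label, safe if this gets inlined more than once */
: [pos] "=&r" (pos) /* early clobber: pos is written before k0h is read */
: [k0l] "rm" ((unsigned int)(n)),
[k0h] "rm" ((unsigned int)((n) >> 32))
: "cc" );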
OK, so that works great, but now I need to port it to Visual C++. First, after reversing the operands for Intel syntax, I tried putting each operand expression, casts and all, directly in the inline assembly. I get syntax errors. I try casting to a struct of two 32-bit unsigned integers instead. I get syntax errors. Finally, after a couple of hours, I turn up this, on this page:
Note
Microsoft and Borland inline assemblers do not support type casts.
ARGH! D:
OK, fine. So I make an inline function, where n gets passed in as a const, I copy it to two const unsigned ints, nlo and nhi, add a non-const "pos", and stick those in the inline assembly. It says something about "Improper operand type". 😡
I'm gathering the Microsoft assembler isn't smart enough to realize it should pick a register, any free register, and just use that for pos.
I guess if anyone has any bright ideas they can post them; but I'm just going back to __builtin_ctzll().
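One idea along those lines, offered as an untested sketch rather than a finished port: as far as I can tell, the Visual C++ inline assembler has no operand constraints, so there is nothing like "=r" and a register for pos would have to be named by hand. The _BitScanForward intrinsic from <intrin.h> sidesteps that by leaving register choice to the compiler. The helper names lowest_bit32 and ctz64_scan are made up for illustration, the shift loop only exists so the sketch builds on other compilers, and returning 64 for n == 0 is a choice made here.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>   /* declares _BitScanForward */
#endif

/* Hypothetical helper: index of the lowest set bit of a NONZERO 32-bit value. */
static inline unsigned lowest_bit32(uint32_t x)
{
#if defined(_MSC_VER)
    unsigned long idx;
    /* Writes the bit index to idx; the return value (nonzero when x != 0)
       is ignored because callers only pass nonzero values. */
    _BitScanForward(&idx, (unsigned long)x);
    return (unsigned)idx;
#else
    /* Fallback so the sketch builds elsewhere: plain shift loop. */
    unsigned idx = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++idx;
    }
    return idx;
#endif
}

/* Hypothetical helper: same low-half-first split as the GCC assembly. */
static inline unsigned ctz64_scan(uint64_t n)
{
    uint32_t lo = (uint32_t)n;
    uint32_t hi = (uint32_t)(n >> 32);

    if (lo != 0)
        return lowest_bit32(lo);
    if (hi != 0)
        return 32u + lowest_bit32(hi);
    return 64u;   /* assumption: report 64 for n == 0 */
}
```

Whether this beats whatever the compiler already does for a 64-bit scan in a 32-bit build is something to benchmark, not assume.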