generating uniformly distributed random integers in C

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Hi all,
I have an array of N items, and I'd like to select "randomly" from it. What's the best way to do this? (That the selection sequence be highly pseudorandom is not important. I simply do not want to have a selection routine that's like "go in increasing order" or "take every 10th element (wrapping around w/o repeats)" or something like that.)

The most obvious answer seems to be: rand() % N, which will give me an integer between 0 and N-1, inclusive.

But from what [little] I understand about prng's, relying on the last bits of a pseudorandom # can be dangerous. But I don't know how bad it is. Like I don't care if the sequence of N numbers is very predictable. I do care if the sequence of N numbers lie mostly in the range say 10 to 20.

Lastly, computational complexity isn't that important, but it should not be more than a handful of simple arithmetic operations and a single call to rand(). Also it is CRITICAL that the method -always- return a number in the specified range--i.e. I don't want to deal with overflow/underflow, division by 0, etc. Avoiding denormalized numbers would also be good.

Thanks,
-Eric
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
You have several options here:

1) If you want pseudo-random numbers, aka determinism, aka your program does the same thing every time, use a predictable random seed, followed up by calls to rand(), or better yet, random(). Bear in mind that rand will return a value between 1 and RAND_MAX inclusive, usually 0x7fff on many implementations. So if you need bigger rand values, you may have to call rand()/random() several times and merge them together.

e.g., if your RAND_MAX is 0x7fff, you need to call rand() 3 times to get 32 random bits:
unsigned int r1 = rand();
unsigned int r2 = rand();
unsigned int r3 = rand();
unsigned int r = r1 ^ (r2 << 15) ^ (r3 << 27) ^ (r3 >> 10);

Note, the above uses high-order bits from r3 to add entropy to the low-order bits of r1.

After generating enough random bits, modding by N will give you close to a uniform distribution if your effective RAND_MAX >> N.

BTW, if rand()/random() isn't fast enough, check out the mersenne twister (i.e. google it).

2) If you want true random numbers, e.g. cryptographically secure randomness, you need a different source of entropy altogether. Finding a source of true randomness can be tough. The low-order bits of the rdtsc instruction on x86 (or rdtick on sparc, though that only has nanosecond granularity) provide some randomness, but aren't suitable for multiple calls, as above. Repeated calls will show strong correlation. Another source of entropy is recent user input, or timestamps, as reflected by system logs. Again, use only low-order bits. I think the timestamps of keystrokes, for instance, end up being normal (not uniform). Possibly noise on wireless signals.
 

presidentender

Golden Member
Jan 23, 2008
1,166
0
76
Originally posted by: degibson
You have several options here:

1) If you want pseudo-random numbers, aka determinism, aka your program does the same thing every time, use a predictable random seed, followed up by calls to rand(), or better yet, random(). Bear in mind that rand will return a value between 1 and RAND_MAX inclusive, usually 0x7fff on many implementations. So if you need bigger rand values, you may have to call rand()/random() several times and merge them together.

e.g., if your RAND_MAX is 0x7fff, you need to call rand() 3 times to get 32 random bits:
unsigned int r1 = rand();
unsigned int r2 = rand();
unsigned int r3 = rand();
unsigned int r = r1 ^ (r2 << 15) ^ (r3 << 27) ^ (r3 >> 10);

Note, the above uses high-order bits from r3 to add entropy to the low-order bits of r1.

After generating enough random bits, modding by N will give you close to a uniform distribution if your effective RAND_MAX >> N.

BTW, if rand()/random() isn't fast enough, check out the mersenne twister (i.e. google it).

2) If you want true random numbers, e.g. cryptographically secure randomness, you need a different source of entropy altogether. Finding a source of true randomness can be tough. The low-order bits of the rdtsc instruction on x86 (or rdtick on sparc, though that only has nanosecond granularity) provide some randomness, but aren't suitable for multiple calls, as above. Repeated calls will show strong correlation. Another source of entropy is recent user input, or timestamps, as reflected by system logs. Again, use only low-order bits. I think the timestamps of keystrokes, for instance, end up being normal (not uniform). Possibly noise on wireless signals.

I'm a fan of low-order bits on core temp, if you can get them and they're read with enough precision.
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Originally posted by: degibson
You have several options here:

1) If you want pseudo-random numbers, aka determinism, aka your program does the same thing every time, use a predictable random seed, followed up by calls to rand(), or better yet, random(). Bear in mind that rand will return a value between 1 and RAND_MAX inclusive, usually 0x7fff on many implementations. So if you need bigger rand values, you may have to call rand()/random() several times and merge them together.

e.g., if your RAND_MAX is 0x7fff, you need to call rand() 3 times to get 32 random bits:
unsigned int r1 = rand();
unsigned int r2 = rand();
unsigned int r3 = rand();
unsigned int r = r1 ^ (r2 << 15) ^ (r3 << 27) ^ (r3 >> 10);

Note, the above uses high-order bits from r3 to add entropy to the low-order bits of r1.

After generating enough random bits, modding by N will give you close to a uniform distribution if your effective RAND_MAX >> N.

BTW, if rand()/random() isn't fast enough, check out the mersenne twister (i.e. google it).

2) If you want true random numbers, e.g. cryptographically secure randomness, you need a different source of entropy altogether. Finding a source of true randomness can be tough. The low-order bits of the rdtsc instruction on x86 (or rdtick on sparc, though that only has nanosecond granularity) provide some randomness, but aren't suitable for multiple calls, as above. Repeated calls will show strong correlation. Another source of entropy is recent user input, or timestamps, as reflected by system logs. Again, use only low-order bits. I think the timestamps of keystrokes, for instance, end up being normal (not uniform). Possibly noise on wireless signals.

2) Sorry I wasn't very clear in the OP. I don't think true randomness matters here. I'm just trying to randomly select pivots for use with a randomized "Selection" (find the kth largest in the set) algorithm.

1) First, what is random()? Is this a C library function? B/c I can't find a reference to it...
Thanks for the tip on how to increase the range of rand. RAND_MAX is the same as INT_MAX in my version of stdlib.h, which is 2^31, and I don't think I'll have arrays of more than 2 billion elements any time soon.

But is RAND_MAX not usually 2^31? The value you listed is 2^15. Should I insert a check for what RAND_MAX is in case I run into a machine that doesn't use 32bit words? (And subsequently extend the range of rand)

Also, can I do this check by #if(RAND_MAX == 2^15) so that it gets eval-ed at compile time?

Finally, can I avoid the potential issue with using modulo by doing:
a = (int) rand()/((RAND_MAX + 1.0)/M)
to produce a random integer in [0,M). That seems like it would be better b/c then I can use all of the bits of rand(). I divide the denominator by M instead of multiplying rand() by M to avoid overflow.


presidentender, I have no idea as to how to access the core temp. This would have to be something that can be done rapidly in code--letting the OS handle it is not an option.
 

dinkumthinkum

Senior member
Jul 3, 2008
203
0
0
If you are going to use rand() then the proper use of rand() would be

int x = (int)((double)rand() / (((double)RAND_MAX + (double)1) / N))

/* 0 <= x < N */
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Originally posted by: dinkumthinkum
If you are going to use rand() then the proper use of rand() would be

int x = (int)((double)rand() / (((double)RAND_MAX + (double)1) / N))

/* 0 <= x < N */

Cool, this is what I proposed to do in the post above yours (eh I skipped a bunch of the casts so possibly my code wouldn't work as it stands). Sounds like it was a good idea, yay.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: eLiu
But is RAND_MAX not usually 2^31? The value you listed is 2^15. Should I insert a check for what RAND_MAX is in case I run into a machine that doesn't use 32bit words? (And subsequently extend the range of rand)

RAND_MAX can be anything, so long as its bigger than or equal to 0x7fff. The value I listed is 2^15 - 1, by the way. Its not that the word size on the machine isn't 32 bit, its that the algorithm doesn't necessarily operate on all 32 bits.

Also, can I do this check by #if(RAND_MAX == 2^15) so that it gets eval-ed at compile time?

Almost. More like #if (RAND_MAX == ((1<<15)-1)
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
Originally posted by: degibson
Originally posted by: eLiu
But is RAND_MAX not usually 2^31? The value you listed is 2^15. Should I insert a check for what RAND_MAX is in case I run into a machine that doesn't use 32bit words? (And subsequently extend the range of rand)

RAND_MAX can be anything, so long as its bigger than or equal to 0x7fff. The value I listed is 2^15 - 1, by the way. Its not that the word size on the machine isn't 32 bit, its that the algorithm doesn't necessarily operate on all 32 bits.

Also, can I do this check by #if(RAND_MAX == 2^15) so that it gets eval-ed at compile time?

Almost. More like #if (RAND_MAX == ((1<<15)-1)

Whooops... how did I miss that that number is odd, lol. Yeah I guess 2^15 is also not a valid C expression. (I spend the other half of my life in matlab, so I often forget.)

Thanks for the correction/tips :)

-Eric