How many threads can simultaneously access global shared memory at completely random addresses before memory saturates? Is this a simple function of the memory speed and timings?
The reason I ask is that I have a proof-of-work system called Cuckoo Cycle at https://github.com/tromp/cuckoo in which each thread performs a cheap siphash computation to access, on average, 3.3 random words in memory. Yet even running with 32 threads, I still don't see memory bandwidth being saturated.
Can anyone predict, based on memory timings, the maximum number of threads this application can use effectively?
Or does anyone here have access to systems with higher levels of multi-threading? If so, could you run a test like the following:
for i in {32,40,48,56,64}; do
  echo $i
  cc -o cuckoo -DNTHREADS=$i -DSIZEMULT=1 -DSIZESHIFT=30 cuckoo.c \
    -O3 -std=c99 -m64 -Wall -Wno-deprecated-declarations -pthread -lcrypto
  (time for j in {0..9}; do ./cuckoo $j; done) 2>&1
done > output
to see where the timings (real time) flatten out?
Thanks!