Essence_of_War
I have some very float-intensive Fortran code that is totally serial (no OpenMP, no MPI, no pthreads, no nothing). I typically run multiple instances of it with slightly different parameters to do simulation work, and recently I've had my first opportunity to run it on a Xeon that has HT.
I have a quad-core Xeon w/ HT, and when I load up 4 instances of this code, top consistently shows a load average of about 3.8. Am I leaving performance on the table if I only run 4 instances? If I loaded up an additional 4 instances, assuming I have enough RAM to not start swapping, would I slow down all of my simulation runs by forcing the extra instances onto the logical (non-physical) HT threads?
For reference, this system doesn't have to do anything but run this simulation code.
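One way to answer this empirically is to compare aggregate throughput (completed runs per second) with 4 vs. 8 simultaneous instances. Below is a minimal sketch, assuming Python 3 is available on the box. The stand-in workload is a short pure-Python float loop; that loop is an assumption, and it stresses the interpreter far more than the FPU, so its numbers will not transfer to the Fortran code. For a real answer, replace `STAND_IN_CMD` with the actual simulation command (the `./sim params.in` name in the comment is hypothetical).

```python
import subprocess
import sys
import time

# Stand-in workload (assumption): a float-heavy loop in a fresh Python process.
# Replace with the real simulation command, e.g. ["./sim", "params.in"]
# (hypothetical name and arguments).
STAND_IN_CMD = [
    sys.executable,
    "-c",
    "x = 0.0\nfor i in range(1, 2_000_000):\n    x += 1.0 / (i * i)\n",
]


def run_batch(n_instances, cmd=STAND_IN_CMD):
    """Start n_instances copies of cmd at once; return wall-clock seconds until all exit."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(n_instances)]
    # Wait for every process before checking codes so none are left running.
    codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    if any(code != 0 for code in codes):
        raise RuntimeError(f"an instance failed, exit codes: {codes}")
    return elapsed


def throughput(n_instances, cmd=STAND_IN_CMD):
    """Completed runs per second of wall-clock time for one batch of n_instances."""
    return n_instances / run_batch(n_instances, cmd)


if __name__ == "__main__":
    for n in (4, 8):
        print(f"{n} instances: {throughput(n):.3f} runs/s")
```

Each individual run will almost certainly take longer with 8 instances, since two instances then share each physical core. The number that matters for a parameter sweep is the aggregate rate: if 8 instances complete clearly more runs per second than 4, HT is helping; if the rate is flat or lower, sticking with 4 is the better choice.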