I just aborted 4 ATLAS tasks that had been "postponed" for twelve hours.
I haven't had this particular problem myself yet; in my case the tasks merely appear to the client to be running, while actually using only a negligible amount of CPU for hours (perhaps days, if allowed to continue).
I have been running the other vbox-loving project, Cosmology@home, often and for extended periods. I was occasionally getting "postponed" tasks there too. But there it was manageable for me, because (a) run times of Cosmo vbox tasks are measured in minutes, not hours, and (b) I have a background script which automatically rids me of such "postponed" tasks.
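For anyone wanting something similar: a minimal sketch of such a cleanup script (this is my illustration, not the actual script mentioned above). It parses `boinccmd --get_tasks` output for tasks whose status block mentions "postponed" and aborts them with `boinccmd --task`. Field names in the client's output vary between versions, so the pattern may need adjusting.

```shell
#!/bin/sh
# Sketch: abort tasks that the BOINC client reports as "postponed".
# Assumes boinccmd is on PATH; the project URL below is an example.
URL="https://lhcathome.cern.ch/lhcathome/"

# Print the names of tasks whose status block contains "postponed".
# boinccmd --get_tasks emits one block per task, starting with a
# "name: ..." line, so the most recently seen name is the task the
# "postponed" line belongs to.
postponed_tasks() {
    awk '/^ *name:/  { name = $2 }
         /postponed/ { print name }'
}

# Uncomment to actually abort the flagged tasks (e.g. from cron):
# boinccmd --get_tasks | postponed_tasks | while read -r task; do
#     boinccmd --task "$URL" "$task" abort
# done
```

Run from cron every few minutes; the abort loop is commented out so the parsing can be tried safely first.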
--------
It has been more than a year since I last ran LHC. I remembered various problems with it from that time, but this time it is different again for me, since I now have VirtualBox available on considerably more computers than last year.
Returning to LHC@home now, I have learned:
- I cannot run Theory/vbox, because these tasks fall idle far too often. Host utilization is a joke.
- I cannot run Atlas/vbox exclusively. The constraint here is not even that I lack the RAM to feed all CPUs with Atlas/vbox alone; worse, my internet connection cannot satisfy Atlas/vbox's extreme download bandwidth requirement. The effect: host utilization is very poor in this scenario too.
- I cannot run Atlas/vbox in combination with other applications either. While an application mix effectively works around the RAM and networking bottlenecks, problem #1 manifests itself: Atlas/vbox tasks randomly fall idle too. It's a good thing that my hosts are effectively overcommitted by means of Intel Hyper-Threading, so the other applications still use the CPUs while some Atlas tasks don't.
- At least the LHC@home admins appear to have implemented an effective workaround against SixTrack failing with "exceeded elapsed time limit", which used to happen after the FLOPS estimate was corrupted by a series of tasks that complete within a few seconds. (There are still many short tasks, and the FLOPS estimate still gets corrupted by them, making task queue management impossible. But this particular client-side cancellation of good tasks no longer happens, as far as I can tell; that is based on less than 2 days of observation so far.)
To summarize: While I have the technical means to run Theory/vbox and Atlas/vbox, the mere fact that these tasks go idle at random means I cannot run any of LHC's vbox-based applications. SixTrack is a poor alternative: queue management is impossible, which is very bad if, for example, you run this quorum-2 application in a competition.
Now, it would be somewhat interesting to me whether Atlas/Linux-native is plagued by the same network bandwidth problem as Atlas/vbox. If I had holidays like you on the other side of the pond, I might have taken the time to play with it. I detest having to install a custom networked filesystem for this,* but on the other hand, for all I know the vbox tasks may already be using that same networked filesystem.
*) Newsflash to the LHC devs: The boinc client can transfer files too! And it can manage bandwidth use, the number of concurrent transfers, and a daily/weekly networking schedule.
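For readers who haven't used these knobs: the client-side controls I mean live in global_prefs_override.xml and cc_config.xml. A sketch with the element names as documented by BOINC (the values here are arbitrary examples, not recommendations):

```xml
<!-- global_prefs_override.xml: bandwidth and schedule limits -->
<global_preferences>
   <max_bytes_sec_down>2000000</max_bytes_sec_down>  <!-- ~2 MB/s download cap -->
   <daily_xfer_limit_mb>4000</daily_xfer_limit_mb>   <!-- transfer quota ... -->
   <daily_xfer_period_days>1</daily_xfer_period_days><!-- ... per 1-day period -->
   <net_start_hour>22</net_start_hour>               <!-- network use only -->
   <net_end_hour>6</net_end_hour>                    <!-- between 22:00 and 06:00 -->
</global_preferences>

<!-- cc_config.xml: limits on concurrent file transfers -->
<cc_config>
   <options>
      <max_file_xfers>4</max_file_xfers>
      <max_file_xfers_per_project>2</max_file_xfers_per_project>
   </options>
</cc_config>
```

None of this helps, of course, as long as a project routes its big downloads around the client.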