Markfw
Moderator Emeritus, Elite Member
So we had a power outage. For some odd reason, one of my boxes wouldn't even POST afterward. After a day with it down, I swapped its physical location with a box that was powered off but known to be a good config running 2 x F@H, from before I started saving power, selling boxes, and leaving some off. So I powered it up, and no problems... except SMP unit #2 (machine ID #3) started getting EUEs.
Then I finally got the other box to POST, kept the same OC settings, and it started doing the same thing!
Both boxes have the same symptoms. Neither box has had any software or hardware changes. The boxes have different IP addresses. On both, SMP unit #1 is totally stable.
I tried de-OC'ing both a little, with no effect. They get 33-66% (or more) complete and then EUE.
Any ideas?
Here are the tail ends of the logs from the last EUE on each box.
[22:50:36] Completed 115000 out of 250000 steps (46 percent)
[23:06:33] Writing local files
[23:06:33] Completed 117500 out of 250000 steps (47 percent)
[23:21:30] Warning: long 1-4 interactions
[23:21:30] Gromacs cannot continue further.
[23:21:30] Going to send back what have done.
[23:21:30] logfile size: 97227
[16:41:29] Completed 92500 out of 250000 steps (37 percent)
[16:58:03] Writing local files
[16:58:03] Completed 95000 out of 250000 steps (38 percent)
[17:14:44] Writing local files
[17:14:44] Completed 97500 out of 250000 steps (39 percent)
[17:17:23] Gromacs cannot continue further.
[17:17:23] Going to send back what have done.
[17:17:23] logfile size: 82288
[17:17:23] - Writing 82824 bytes of core data to disk...
[17:17:23] ... Done.
I just took them down again, both to ~110 MHz below the original: 3510 down to 3400 and 3550 down to 3460. We will see now, but very odd....
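For anyone comparing several of these failures, here is a rough sketch of pulling the last completed step before each "Gromacs cannot continue further." line out of a log. It assumes the log lines look exactly like the excerpts above; the function name `find_failures` and the sample list are made up for illustration and are not part of the F@H client.

```python
import re

# Patterns based only on the log lines quoted in this post (an assumption
# about the general log format, not an official specification).
TIME_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\]")
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
FAIL_MARKER = "Gromacs cannot continue further."


def find_failures(lines):
    """Return (fail_time, last_step, total_steps, percent) for each failure.

    last_step/total_steps/percent come from the most recent "Completed"
    line seen before the failure marker; they are None if there was none.
    """
    failures = []
    last_progress = None
    for line in lines:
        step_match = STEP_RE.search(line)
        if step_match:
            last_progress = (int(step_match.group(1)), int(step_match.group(2)))
            continue
        if FAIL_MARKER in line:
            time_match = TIME_RE.search(line)
            fail_time = time_match.group(1) if time_match else None
            if last_progress is None:
                failures.append((fail_time, None, None, None))
            else:
                step, total = last_progress
                failures.append((fail_time, step, total, step * 100 // total))
            last_progress = None  # do not reuse progress for a later failure
    return failures


# Sample lines copied from the two excerpts in this post.
SAMPLE = [
    "[23:06:33] Completed 117500 out of 250000 steps (47 percent)",
    "[23:21:30] Warning: long 1-4 interactions",
    "[23:21:30] Gromacs cannot continue further.",
    "[17:14:44] Completed 97500 out of 250000 steps (39 percent)",
    "[17:17:23] Gromacs cannot continue further.",
]

for failure in find_failures(SAMPLE):
    print(failure)
```

Running this over the full log of each box would show whether the failures cluster around the same step count or are scattered, which might help tell a bad work unit from an unstable machine.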