Originally posted by: Rubycon
How do the clients like BOINC check for errata in overclocked systems? For example suppose a user signs up with an overclocked system that was stable, etc. But summer heat comes, dog hair, other stuff plugs up the heatsink, etc. System starts producing errors. What kind of checking happens to prevent error-laden completed work units from getting submitted back to the mothership? Does the client check like P95?
I'm curious about this as it looks like a lot of overclockers are also participants in distributed computing.
I think the real answer is... that they really don't know if the results are correct or not.
The only reliable way that I can see is to establish a quorum: that is, to send each work unit out to multiple independent people, and then check that the returned results all agree.
But I doubt that these projects implement such a thing.
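To make the quorum idea concrete, here is a rough sketch in Python of what the server-side check could look like. This is just my own illustration, not how BOINC or any real project does it: the function name, the host ids, and the result strings are all made up, and it assumes results can be compared for exact equality (floating-point science output would probably need a tolerance-based comparison instead).

```python
from collections import Counter

def validate_by_quorum(results, min_quorum=2):
    """Return the agreed-upon result, or None if no quorum was reached.

    `results` maps a (hypothetical) host id to the result that host returned
    for the same work unit. Exact equality is assumed for comparison.
    """
    if not results:
        return None
    top = Counter(results.values()).most_common(2)
    value, votes = top[0]
    if votes < min_quorum:
        return None  # not enough hosts agree yet; send the unit out again
    if len(top) > 1 and top[1][1] == votes:
        return None  # tie between two different answers; cannot pick one
    return value

# Three hosts crunched the same work unit; one of them returned bad data.
returned = {"host_a": "3f9c", "host_b": "3f9c", "host_c": "71e0"}
canonical = validate_by_quorum(returned)
```

In this example the two matching results win and the odd one out would simply be discarded. The cost is obvious: every work unit gets crunched at least twice.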
SoB was implementing some sort of double-checks, for precisely this reason, but I don't think that all of the work units sent out were double-checked. I kind of wish that they were; I'm suspicious of bad data. I might have even sent some bad data when I OCed my rig to 3.28GHz. I thought that it was Prime95 stable, but later on I was doing more testing and it failed Prime95, so I dropped it back down to 3.2GHz. Now it seems stable, but apart from testing, how can I know for certain?
It also seems that some systems that are Prime95 stable are not F@H stable. (Prime95 stresses the FPU; F@H stresses the SSE units.)
Ideally, I guess, BOINC or other distributed-computing clients would come with an integrated stress-tester that ran tests on the integer, FPU, and SSE units of the CPU to ensure stability. Perhaps in the future this will happen. Perhaps the client would even refuse to download work units unless the stress test passed successfully.
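Something like the following Python sketch is roughly what I have in mind for the client-side check. To be clear, this is a toy: all of the function names are my own invention, it only checks that repeated runs of the same deterministic workload give bit-identical answers, and a few seconds of Python will not load the CPU anywhere near as hard as Prime95 does, nor does it target the SSE units specifically. A real tester would run hand-tuned native code for much longer.

```python
import hashlib
import math

def integer_workload(rounds=20000):
    """Chain SHA-256 hashes; a single flipped bit changes the final digest."""
    digest = b"self-test seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()

def float_workload(terms=200000):
    """Sum a fixed floating-point series; same inputs must give the same bits."""
    total = 0.0
    for k in range(1, terms + 1):
        total += math.sin(k) / k
    return total.hex()  # exact bit pattern, so tiny differences are caught

def self_test(passes=5):
    """Run each workload `passes` times and require identical results."""
    for workload in (integer_workload, float_workload):
        reference = workload()
        for _ in range(passes - 1):
            if workload() != reference:
                return False  # the machine disagreed with itself
    return True
```

The client would then call `self_test()` before asking for new work and skip the download if it returns `False`. Passing only shows that the machine was consistent during the test, not that it will stay that way once the heatsink clogs up.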