I'm glad to see your system is starting to look stable. Good luck with it.
Well, I didn't want to get too far into the details of when short-run computations are useful in Prime95 and when they aren't. A lot of different configurations make short test runs useful, some make them not useful, and some (AFAIK) do little or no verification testing at all, so they would never catch errors.
Successfully completing an LL (Lucas-Lehmer) test on a very large candidate like 2^32,582,657 - 1, which is the largest known Mersenne prime, can take a system about 1-2 CPU weeks of computation with Prime95, as far as I know.
When you're running in "non stress test" mode, that is one of the more typical and useful types of calculation Prime95 does, e.g. LL testing a particular candidate like M32582657.
I believe it saves its intermediate LL calculation every so often (maybe every half hour? read the docs). When you tell it to gracefully pause/stop/close down, it typically also saves its spot within the LL test it is currently running, so when you start the program again it immediately picks back up at the same spot in the long calculation. That way you won't lose more than around half an hour's worth of calculations per restart, even if you stop and restart the program after 2 hours on 100 different occasions. Eventually it adds up to 2 weeks' worth of total CPU time and you have a result for that particular LL test.
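To make the save-and-resume idea concrete, here is a minimal Python sketch of a Lucas-Lehmer test that writes its position to disk and picks up from there on the next run. This is not Prime95's actual save-file format or code; the function names, the JSON checkpoint layout, and the `save_every` / `max_steps` parameters are all made up for illustration.

```python
import json
import os


def _save(path, p, i, s):
    """Write the checkpoint to a temp file, then swap it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        # Store the residue as hex text so very large values round-trip safely.
        json.dump({"p": p, "i": i, "s": hex(s)}, f)
    os.replace(tmp, path)


def ll_test_resumable(p, checkpoint_path, save_every=1000, max_steps=None):
    """Lucas-Lehmer test of 2**p - 1 for an odd prime p, resumable from disk.

    Returns True or False when the test finishes, or None if it stopped
    early (max_steps reached) after saving its position.
    """
    m = (1 << p) - 1
    i, s = 0, 4  # iterations done so far, current LL residue

    # Resume only if the checkpoint belongs to the same exponent.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
        if state["p"] == p:
            i, s = state["i"], int(state["s"], 16)

    total = p - 2
    steps = 0
    while i < total:
        if max_steps is not None and steps >= max_steps:
            _save(checkpoint_path, p, i, s)  # "graceful stop": save, then quit
            return None
        s = (s * s - 2) % m
        i += 1
        steps += 1
        if i % save_every == 0:
            _save(checkpoint_path, p, i, s)  # periodic save

    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return s == 0
```

Here `max_steps` stands in for you closing the program: each call does a bit of work, saves, and returns None until the test finally completes. If the process dies without a graceful stop, you lose at most `save_every` iterations.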
Other types of tests, like trial factoring, can run much more quickly than the LL test and may only take a few minutes per possible factor tested. It depends on your settings and work list.
In any of these cases it can use certain strategies to double-check the results of its calculations, either by repeating a given calculation, or by doing a sub-calculation such as a multiply with a couple of different algorithmic inputs and cross-checking the results to see whether they can possibly be correct. It can do that for pretty much any modular multiply operation it does (which is the basis of the whole thing). It just depends on its settings and work configuration.
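As a rough illustration of cross-checking a modular multiply, here is a Python sketch that verifies a modular squaring against an independent residue check. To be clear, this is a generic technique picked for the example, not a description of what Prime95 does internally; `CHECK_MODULUS`, `checked_square_mod`, and the injectable `multiply` argument are hypothetical names.

```python
# Arbitrary odd check modulus chosen for this sketch (an assumption, not a Prime95 value).
CHECK_MODULUS = (1 << 61) - 1


def checked_square_mod(s, m, multiply=lambda a, b: a * b):
    """Return s*s mod m, raising ArithmeticError if the cross-check fails.

    `multiply` is injectable so a faulty multiplier can be simulated.
    """
    product = multiply(s, s)
    q, r = divmod(product, m)

    # The same quantity computed two different ways, both reduced mod CHECK_MODULUS:
    # the square of the small residue of s, and the residue of q*m + r.
    expected = pow(s % CHECK_MODULUS, 2, CHECK_MODULUS)
    actual = ((q % CHECK_MODULUS) * (m % CHECK_MODULUS) + r) % CHECK_MODULUS

    if expected != actual:
        raise ArithmeticError("cross-check failed: result cannot be correct")
    return r
```

Passing a deliberately broken multiplier, e.g. `multiply=lambda a, b: (a * b) ^ (1 << 20)`, flips one bit of the product and makes the check raise. The check costs far less than redoing the full multiply, which is the appeal of this kind of approach compared with simply repeating the calculation.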
In stress test mode, I know it isn't trying to produce a useful calculation of an unknown result; it is just performing calculations with known or easily verifiable results to look for errors in the PC. I don't recall precisely what kinds of error checking it does in this case, how often it checks, or whether it even bothers to save intermediate results and resume calculations in this mode. Empirically, however, it can give you a red "ERROR/FAILED" indication after only a few minutes of runtime on an unstable system, which tells me they're probably verifying results pretty often, however they do it. So if you're talking about run times of more than a couple of hours at full load, it will have usefully verified many calculations within that span before you interrupt the testing. Basically the whole time would qualify as useful test time against a possibly random error, and the test time would be cumulatively reassuring even if you only do many somewhat short (an hour or more) runs.
I'd ask around at mersenneforum.org or mersenne.org if you want to know exactly how long it takes to detect an erroneous calculation in test mode, or how often it checks for errors in non-test mode depending on whether you're LL testing, trial factoring, or whatever. In test mode, the mere fact that it repeats the same SET of computations should not matter. Even if you were just calculating something simple like "1+1=? Check to see if result=2" trillions of times, you should eventually see an error on an unstable system, as long as your calculation program does a decent job of exercising the FPU/ALU/cache/registers and other parts of the CPU and memory. To a good approximation, the specific calculation doesn't matter to the probability of a CPU error, only that it fully uses those parts of the CPU (FPU/ALU/cache/registers/SSE2/whatever...).
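The "repeat a known-answer calculation and compare" idea can be sketched in a few lines of Python. This only shows the principle; a pure Python loop will not load the FPU, cache, or SSE2 units anywhere near as hard as Prime95 does, and the `KNOWN` table, `stress` function, and `compute` parameter are names invented for this example.

```python
import time

# Small exponents p with well-known answers for "is 2**p - 1 prime?"
KNOWN = {
    13: True, 17: True, 19: True, 23: False, 29: False,
    31: True, 61: True, 89: True, 107: True, 127: True,
}


def lucas_lehmer(p):
    """Plain Lucas-Lehmer test for an odd prime exponent p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def stress(seconds, compute=lucas_lehmer):
    """Repeat the same set of known-answer tests until the time is up.

    Returns (completed_passes, error_message); error_message is None
    if every result matched the expected answer.
    """
    deadline = time.monotonic() + seconds
    passes = 0
    while time.monotonic() < deadline:
        for p, expected in KNOWN.items():
            if compute(p) != expected:
                return passes, f"ERROR: test of M{p} gave the wrong result"
        passes += 1
    return passes, None
```

Every pass through the table is an independent chance to catch a fault, so ten one-hour runs give about as many verified passes as one ten-hour run. That is why repeating the same set of tests does not reduce the value of the testing.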
Originally posted by: Psynaut
Thanks Quix,
I did watch the Prime 95 test names, and I thought it appeared to be repeating the same tests every few hours or so, but I certainly wasn't sure and didn't want to make assumptions about how these programs work. Also, I was running it with error checking enabled.
BTW, I am stable after one hour on OCCT, so that is reassuring.