
stable for 28 days. not stable enough.

VirtualLarry

No Lifer
Trying to figure out what's unstable about my computer.
It always bluescreens or reboots in 30 days or a little less, of 24/7 usage.

My apologies for starting this thread, because I started a similar thread a couple of months ago, and I don't know where that thread went.

Here's the rig in question:
E2140 @ 3.2GHz, 1.425v (BIOS), 1.36v (CPU-Z prime95 load), 16-20C away from TJMax
GA-P35-DS3R v1.0 BIOS F4
2x2GB Patriot DDR2-800. RAM is specced at 5-5-5-12 2.0v, I'm running it at 5-5-5-15 1.8v. Memtest86+ passed a 24hr test at those specs.
CoolerMaster HyperTX2
WD 320GB SATA HD
LG 20X IDE DVD burner
Radeon X1950GT PCI-E 256MB
ThermalTake 430W PSU
CoolerMaster Elite 330 case with rear exhaust fan

Passes prime95 24hr test just fine. Haven't run OCCT:linpack, but I think the temps would cause thermal shutdown.

Runs SeventeenorBust 24/7 in the background, as well as my MagicJack phone software.

The weak spots that I can see might be the RAM voltage, or the PSU.

A little more background: I had this rig originally connected to a BE-550 UPS, and it would reboot once or more per day. I plugged it into the wall, and it stopped rebooting. So I replaced the UPS. Then it only rebooted weekly, or less. However, I connected my newly-built F@H rig (BE-2400 stock speeds, 3x9600GSO stock speeds, K9A2 Plat. mobo, EarthWatts 650 PSU), and it too rebooted, even though the CPU wasn't overclocked. However, it probably exceeded the max 330W output of the UPS, so if the power glitched, it probably wouldn't be able to hold up anyways. I never tried plugging that rig directly into the wall, I just assumed that it was my UPS.
So I went out and bought a beefy 1200VA/720W UPS, Ativa brand (OEMed by APC), at Office Depot for $75. I thought that if it was the UPS, that it would fix the problem.
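
For what it's worth, the UPS ratings quoted above can be sanity-checked with a little arithmetic. This is only a rough sketch: the 330W and 1200VA/720W figures come from the post, the 550VA rating is assumed from the BE-550 model number, and the 400W load and 80% headroom factor are made-up illustrative values, not measurements of either rig.

```python
# Sketch: compare an estimated load against a UPS's watt rating.
# The load figure and the headroom factor are hypothetical placeholders.

def power_factor(va_rating: float, watt_rating: float) -> float:
    """Ratio of the real-power rating (W) to the apparent-power rating (VA)."""
    return watt_rating / va_rating

def fits_on_ups(load_watts: float, ups_watts: float, headroom: float = 0.8) -> bool:
    """True if the load stays within a chosen fraction of the UPS watt rating."""
    return load_watts <= ups_watts * headroom

small_ups = {"va": 550, "watts": 330}   # 330W per the post; 550VA assumed from the model name
big_ups = {"va": 1200, "watts": 720}    # 1200VA/720W per the post

assumed_load = 400  # hypothetical draw in watts, NOT a measurement

for name, ups in (("550VA", small_ups), ("1200VA", big_ups)):
    pf = power_factor(ups["va"], ups["watts"])
    ok = fits_on_ups(assumed_load, ups["watts"])
    print(f"{name}: power factor {pf:.2f}, fits a {assumed_load}W load: {ok}")
```

A plug-in watt meter reading for each rig under load would replace the guessed `assumed_load` with a real number.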

Well, I was working on my new Q6600 quad-core rig, plugged into my other BE-550 UPS, into another outlet about 5-6 feet down the wall from the first outlet (possibly same circuit?), and one time when I was rebooting it, the other E2140 rig connected to the 1200VA UPS rebooted too! (Edit: It had been running for 28 days straight) (Was that just a coincidence?)

So are all of my UPSs junk? Something wrong with the wiring in my apt? Or do I just suck at building 24/7 stable PCs?

I have another rig at another location, an E4400, that was running at 2.8GHz OCed, but recently has been running at 2.0GHz (non-OCed). It's hooked up to a BX1500 APC UPS (1500VA), runs XP SP2 and SeventeenorBust, and stays up for multiple months straight without crashes or reboots.
 
You don't have to apologize.....You are a DAMN LIFER!!!!

I am leaning toward a power issue or that occasional rare memory error......

Do you write down and check the issues when a BSOD comes up? I would think the reboots are independent and separate of the BSOD. BSOD can definitely be memory, cache, and even software related (mostly vid cards).....but reboots in most instances have been related to thermal or power issues...except for when people don't uncheck the option in Windows to automatically reboot after a system error.

Do you run any gpu clients in any DC type app?
 
Let me just add a few questions...

Is the system cpu being loaded during this time, or just idle and sitting there?

Is the vid card being loaded in some fashion be it gaming or DC computing?



 
CPU is loaded 100%. Video card is unused (except for displaying the desktop). I don't think that the X1950s support "PowerPlay", although I'm not 100% certain.

I did at one point attempt to do a PSU stresstest by running Prime95 and ATItool's 3d viewer at the same time.

I suppose I can try OCCT 3's PSU test too.

I have "reboot on BSOD" disabled. It seems to want to reboot rather than BSOD the vast majority of the time. So that might indicate PSU more than RAM.

At this point, perhaps I should upgrade the PSU (I have some Antec Basiq 500W PSUs), and increase the memory voltage to 1.9v or 2.0v. I hesitate to give the RAM more volts than it needs, because of temps and longevity reasons, but if I'm not really giving them enough, then perhaps I should.
 
See, it's things like that (that IDC posted), that make me really pissed off at Intel for not REQUIRING ECC memory, for the i7 platform.

When you're talking about 12GB or 24GB worth of RAM, that's a lot of cells, and a pretty high probability of soft errors.
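
To put the "a lot of cells" point in rough numbers: if you assume soft errors hit each bit independently at a constant rate, the expected count grows linearly with RAM size and uptime. The sketch below only shows that scaling. The rate constant `ASSUMED_FIT_PER_MBIT` is a made-up placeholder (FIT means failures per 10^9 device-hours), so the absolute numbers it prints mean nothing.

```python
# Sketch: expected soft-error count scales linearly with RAM size and uptime.
# FIT = failures per 10**9 device-hours. The per-Mbit FIT value below is a
# hypothetical placeholder, not a measured or published rate.

HOURS_PER_FIT_UNIT = 1e9

def expected_soft_errors(ram_gb: float, hours: float, fit_per_mbit: float) -> float:
    """Expected number of soft errors for ram_gb of RAM over the given hours."""
    mbits = ram_gb * 1024 * 8  # GB -> MB -> Mbit
    return fit_per_mbit * mbits * hours / HOURS_PER_FIT_UNIT

ASSUMED_FIT_PER_MBIT = 1000.0  # hypothetical, for illustration only
MONTH_HOURS = 30 * 24

for size_gb in (4, 12, 24):
    n = expected_soft_errors(size_gb, MONTH_HOURS, ASSUMED_FIT_PER_MBIT)
    print(f"{size_gb} GB over 30 days: {n:.2f} expected soft errors")
```

Whatever the real rate is, 24GB sees twice the expected errors of 12GB over the same uptime, which is the point about bigger memory configurations.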
 
There was some point in the late nineties that I squandered close to $1k on ram for some new uber computing setup I convinced myself I needed. I can't recall the total ram size, IIRC it was either 256MB or 512MB, it was monster by any definition for desktop rigs at the time.

At any rate I was so paranoid about having so many ram cells, and thoughts of soft-errors making my computer crash every few hours really bedeviled me. So much so that I bought all ECC ram and ensured I had an ECC enabled chipset.

Nowadays it's a barely mentioned topic and here we have 10x-20x as much installed ram. I actually forgot all about it till I saw your thread.
 
28 days is a lunar cycle -- I say you have a werecomputer.

Or, there is another 28 day cycle -- do you think of your computer as a she :lips: 😀
 
I wonder. If the problem were with the RAM or PSU or UPS, wouldn't it show up if I clocked my CPU at stock speeds? Maybe I should try that, and see how long it lasts. OTOH, if the problem is the CPU VRM section or CPU PSU circuitry, it would seem to only happen when the CPU is overclocked and under heavy load. But I would think that an overclocked dual-core wouldn't stress the VRMs as much as an overclocked quad-core, and those boards will overclock a Q6600, although I don't know about the long-term reliability doing so.

The CoolerMaster HyperTX2 has an angled louvre that directs air through the CPU fan and towards the VRMs on the board.
 
Originally posted by: GLeeM
28 days is a lunar cycle -- I say you have a werecomputer.

Or, there is another 28 day cycle -- do you think of your computer as a she :lips: 😀

:laugh:

Some software you always have running... has a memory leak causing Windows to fail?

Didn't Windows 95 always, guaranteed, crash at 48 days or so?
 
Originally posted by: Zap
Originally posted by: GLeeM
28 days is a lunar cycle -- I say you have a werecomputer.

Or, there is another 28 day cycle -- do you think of your computer as a she :lips: 😀

:laugh:

Some software you always have running... has a memory leak causing Windows to fail?

+1

Windows in general is a memory leak.
 
Originally posted by: Zap
Some software you always have running... has a memory leak causing Windows to fail?

Or ...
Some software you always have running... updates and reboots?

If you didn't have it on a UPS I would have suggested the power company does something once a month ... or something on the line or in the house does something to cause a dip or surge to the power.

Is it possible for a virus or root kit to activate every 28 days?

I guess I just should have asked if it is a time of month thing or a length of time thing.
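
One way to tell a "time of month" pattern from a "length of uptime" pattern is to write down the crash dates and look at both the gaps between them and the calendar days they fall on. A minimal sketch, assuming you have a list of crash dates; the dates below are invented purely for illustration and are not from this thread.

```python
from datetime import date
from statistics import mean, pstdev

# Hypothetical crash dates, for illustration only (not from the thread).
crash_dates = [date(2009, 1, 5), date(2009, 2, 2), date(2009, 3, 3), date(2009, 3, 31)]

def uptime_intervals(dates):
    """Days between consecutive crashes, oldest first."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

def days_of_month(dates):
    """Calendar day of the month on which each crash happened."""
    return [d.day for d in sorted(dates)]

intervals = uptime_intervals(crash_dates)
print("days between crashes:", intervals)
print(f"mean {mean(intervals):.1f}, spread {pstdev(intervals):.2f}")
print("day of month:", days_of_month(crash_dates))
```

If the gaps cluster tightly while the day of month drifts, that points at length of uptime. If the crashes keep landing on the same calendar day or weekday, that points at something scheduled. Any manual reboot in between resets the uptime clock, so those need to go in the list as well.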

"bluescreens or reboots" ... seems like this makes it harder to troubleshoot, if it were one or the other it might be easier to pin down.

I guess I'm not very good at this, but maybe I can "bump" someone else into figuring it out. 🙂
 
Originally posted by: VirtualLarry
It always bluescreens or reboots in 30 days or a little less, of 24/7 usage.
.
.
.
I have another rig at another location...and it runs for multiple months straight without crashes or reboots.

These two bits of information would appear to eliminate the memory leak theory as well as the household electricity hiccup theory.

Memory leaks (the disease) usually manifest as individual apps erroring out for lack of system resources. Windows pops up a little box telling you this and closes the application or refuses to open a new one.

In the old days with Win95 the user had to manually reboot Windows when memory leaks became catastrophic (opening/closing MS Word 8 times was guaranteed to do it); the memory leaks themselves did not cause Windows to crash and reboot or BSOD.

Here's an example from MS support. The title incorrectly says it causes system hangs too, but if you read the document, the only thing that happens is that the system refuses to open new applications:

If this code is run hundreds of times, the system becomes unstable, and there is not enough memory to start other applications. In Excel, you may receive the following error messages:
http://support.microsoft.com/kb/192869

Also if power distribution was having issues then his other rigs would be getting hit by them too as this would be a point of commonality between his rigs.
 
Originally posted by: Duvie
I am leaning toward a power issue or that occasional rare memory error......

+1 with duvie all the way.
 
I don't think I saw someone suggesting this but have you tried a different PSU? I see everyone's leaning towards the PSU, but didn't see if someone suggested trying a different one to see if your current one is the culprit.
 
Originally posted by: Idontcare
Also if power distribution was having issues then his other rigs would be getting hit by them too as this would be a point of commonality between his rigs.

The "rig in the other location" is in a completely separate apt., across town.

However, I do have an AMD64 3800+ rig running in my old slimline HTPC case, non-overclocked, attached to an identical (and presumably working 100%) ES-550 UPS, that also runs for months at a time, in the same apt. However, in order to use its monitor for testing my Q6600 rig, I turned it off. So I'm unable to offer a datapoint in which it also reboots when something funny happens, because I've never had that happen with this machine. It only has a 200W PSU.


 
Hi VLarry,

I did not see the OS you are running....XP? If it is Windows 2000 (yes, I still run it on one PC....works well for my needs), the FAA found out that about 28 to 30 days of uptime is max....when someone forgot to do a scheduled reboot, and it brought down ATC:

http://www.techworld.com/opsys.../index.cfm?newsid=2275

It says 49.7 day reset cycle in above article....
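
The 49.7-day figure matches what you get when a 32-bit counter of milliseconds since boot wraps around, which is the usual explanation given for that number. A quick sketch of the arithmetic (the function name and parameters are just for illustration):

```python
# Sketch: how long until an unsigned counter of ticks since boot overflows.
SECONDS_PER_DAY = 60 * 60 * 24

def rollover_days(bits: int = 32, ticks_per_second: int = 1000) -> float:
    """Days until a counter of the given width wraps at the given tick rate."""
    return (2 ** bits) / ticks_per_second / SECONDS_PER_DAY

print(f"32-bit millisecond counter wraps after {rollover_days():.1f} days")
```

A crash at 28 to 30 days of uptime comes well before that wrap, so on its own the rollover would not explain it.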

I see that you are a Lifer(TM) and know your PCs, but, just to ask: do the system logs show anything at all?

Also, seems like your power situation is a bit weak, i.e. maybe a marginal (older?) power supply, and a sort of flaky UPS, maybe not big enough?

Patriot RAM: I think you said you were going to try it at 2.0v--what does Patriot recommend for the sticks you have?

Seventeen or Bust: every 28 days your machine gets fed some bogus data that causes a crash?

Some software you always have running... updates and reboots?

Like Microsoft Update Tuesdays? But, that would show up in your system logs I think....but it does happen about every 4 weeks or so.

You mentioned that the power circuit you have it plugged into may be a bit flaky...what else is plugged in on the same circuit? Isn't there like a 15 amp limit on each house circuit? With a couple of today's high powered PCs plugged in, you can exceed that. (I think that is why 110v x 15 amps = 1650 W or so is going to be the max possible for a power supply....)
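
The circuit arithmetic can be checked directly: volts times amps gives watts, so 110 V x 15 A is 1650 W. A small sketch follows. The list of loads is entirely hypothetical, and the 80% figure is only a commonly cited rule of thumb for continuous loads, assumed here rather than taken from the thread.

```python
# Sketch: capacity of a household circuit and what is left after some loads.
# The load wattages and the 80% continuous-load factor are assumptions.

def circuit_watts(volts: float, amps: float) -> float:
    """Maximum power a circuit can deliver, in watts."""
    return volts * amps

def remaining_watts(volts: float, amps: float, loads, continuous_factor: float = 0.8) -> float:
    """Headroom left on the circuit after derating and subtracting the loads."""
    return circuit_watts(volts, amps) * continuous_factor - sum(loads)

hypothetical_loads = [250, 400, 150]  # watts, made-up values for illustration

print("circuit capacity:", circuit_watts(110, 15), "W")
print("headroom after loads:", remaining_watts(110, 15, hypothetical_loads), "W")
```

Everything on the same breaker counts toward the total, not just the PCs, so monitors, printers and anything else on those outlets belongs in the list.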

As for joking about the lunar cycle: do you really want AmberClad to show up and open a can of hurt on everyone here?

Thank you for your concern.

NX
 
Originally posted by: NXIL
As for joking about the lunar cycle: do you really want AmberClad to show up and open a can of hurt on everyone here?
Do you want to be first in line when that happens :evil:?
 
See, I told you.

Do you want to be first in line when that happens ?

No! That is why I was warning all these immature, insensitive, and emotionally illiterate men (and boyz) to be careful.

And, old, but still relevant:

This is a handy guide that should be as common as a driver's license in the wallet of every husband, boyfriend, or significant other.....

DANGEROUS: What's for dinner?
SAFER: Can I help you with dinner?
SAFEST: Where would you like to go for dinner?
ULTRASAFE: Here, have some chocolate.

DANGEROUS: Are you wearing that?
SAFER: Gee, you look good in brown.
SAFEST: WOW! Look at you!
ULTRASAFE: Here, have some chocolate.

DANGEROUS: What are you so worked up about?
SAFER: What did I do wrong?
SAFEST: Here's fifty dollars.
ULTRASAFE: Here, have some chocolate.

DANGEROUS: Should you be eating that?
SAFER: You know, there are a lot of apples left.
SAFEST: Can I get you a glass of wine with that?
ULTRASAFE: Here, have some chocolate.

DANGEROUS: What did you do all day?
SAFER: I hope you didn't overdo it today.
SAFEST: I've always loved you in that robe!
ULTRASAFE: Here, have some more chocolate.
 
I'd shut off 17orBust and/or the overclock to eliminate them. I had a system that would pass every stress test but fail encoding HD video bench X264; you might try running that as another stability test?
 
Originally posted by: AmberClad
Originally posted by: NXIL
As for joking about the lunar cycle: do you really want AmberClad to show up and open a can of hurt on everyone here?
Do you want to be first in line when that happens :evil:?

Sorry, I can't stop giggling. No I don't. My sincere apologies! It was a timing thing. Just after I typed in the "werecomputer" thought, my wife shooed the dog away who was only looking for a pat on the head or a chin scratch.

I should have said, "Here have some chocolate".

PS: I just showed all of this to my wife and we both LOLed :laugh:
 
Just a little update. I tried running it at stock speeds to see if it rebooted, but my HD made a funny noise so I had to shut down and cut that experiment short. So I re-overclocked it, and I tried increasing the vcore a notch. Still rebooted. So I increased the vdimm to 2.0v (+0.2v). Still rebooted.

So I cranked the vcore back down a notch, and the vdimm back down to 1.8v, and now the GMCH is increased a notch (+0.1v). So I'll see how long it stays up with that setting.
 
Hope you get it figured out, man.

I personally am dealing with some evil satanic problem with my main PC right now that causes it to lock up during L4D.
It passes every single stresstest i can dream of to throw at it, & it happens OCed, lightly OCed, & at stock.
I've tried every single thing i can think of, many things that shouldn't have any effect, & i just cannot figure it out for the life of me.

So dear god, i feel your pain.
 
i've had systems be stable for months, then when the heat picks up during summer, or dust builds up, they become unstable and need a slight decrease in voltage/clocks
 