Running chkdsk on a RAID array

boomerang

Lifer
Jun 19, 2000
18,890
642
126
This is a real small business with 7 employees that my wife manages. They're experiencing what I'm now feeling must be data corruption on a file server I put together maybe 3 or 4 years ago. I've got two IDE drives connected to a 3ware card in RAID 1.

My first thought was to run chkdsk on these drives, but I had a feeling that I ought to do some research first and I see a general consensus that one does not really want to run chkdsk on an array. Actually, what I'm seeing is that it can lead to disaster. I'm thinking perhaps not on a RAID 1 array, but it just doesn't seem to make sense to run chkdsk on two drives at the same time. I'm really not sure.

So, I'm thinking maybe I should just run it on one disk at a time with the other being disconnected. Is this the way to do this?

I'm thinking another option would be to run the WD diagnostic on the drives. Once again, one at a time.

I'm very open to any guidance or suggestions.
 

RebateMonger

Elite Member
Dec 24, 2005
11,588
0
0
Running chkdsk on a RAID array can be risky. Windows tries to make the drive structure neat and tidy, and may take out files or directories if that's required to fix the structure.

Considering that the most expensive IDE drive available is about $70, you might want to:

1) Back up the entire system.
2) Record the RAID parameters so they can be duplicated if necessary.
3) Label the original drives/cabling.
4) Install two new IDE drives in RAID1.
5) Restore the system from the backups.

This leaves your original disks available for later diagnosis if you wish, allows you to get your original system back if something goes wrong with the restore, and avoids major downtime while doing diagnosis on those older drives. And it leaves the original drives as-is for data recovery if necessary.
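After step 5, it's worth confirming that the restore actually matches the backup before trusting the new array. Here's a rough Python sketch of one way to do that, assuming the backup is a plain directory of files rather than a proprietary backup image. The function names are made up for illustration, and the two paths you'd pass in are whatever your backup and restored data folders happen to be.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(backup_root: Path, restored_root: Path) -> dict[str, list[str]]:
    """Report files that are missing, extra, or different after a restore."""
    backup = tree_digests(backup_root)
    restored = tree_digests(restored_root)
    common = backup.keys() & restored.keys()
    return {
        "missing": sorted(backup.keys() - restored.keys()),
        "extra": sorted(restored.keys() - backup.keys()),
        "changed": sorted(k for k in common if backup[k] != restored[k]),
    }
```

If all three lists come back empty, the restored files are byte-for-byte the same as the backup. It says nothing about whether the backup itself was already corrupt when it was taken.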

If you think that your System files are corrupted, you may have to re-install your OS and rebuild from there. You didn't say what your symptoms are.
 

boomerang

We're doing redundant backups so that's not an issue. The corruption is happening on a partition that is used solely for data so no OS issues.

They're having issues with missing records, corrupted records and the like in a piece of database software. (This is a bit of a generalization.) There is a compiling routine built into the software and usually when they are having unusual problems that fixes it. In this case it hasn't. The makers of the software were consulted and they have no answers. Suddenly today I realized that it may be data corruption and my first thought is the hard drives themselves. Hence my wanting some guidance on trying to determine if that is in fact the problem.
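One non-destructive check I'm considering is simply reading every file on the data partition from start to end and noting any that fail. This is only a sketch in Python, the function name is my own invention, and a clean pass wouldn't prove the drives are healthy, since the controller may satisfy each read from either half of the mirror. Files held open exclusively by the database would also show up as failures, so it should be run with the software closed.

```python
import os
from pathlib import Path


def find_unreadable_files(root: Path, chunk_size: int = 1 << 20) -> list[tuple[str, str]]:
    """Read every file under root to the end; collect (path, error) for failures."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                with path.open("rb") as f:
                    # Discard the data; we only care whether the read succeeds.
                    while f.read(chunk_size):
                        pass
            except OSError as exc:
                failures.append((str(path), str(exc)))
    return failures
```

Nothing is written to the disk, so this shouldn't make anything worse. Any path that does come back with an I/O error would be a good reason to look at the controller's log.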

I concur on the drive replacement. I've already checked drive prices and availability. But I would really like to know what the true problem is first.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Originally posted by: boomerang
My first thought was to run chkdsk on these drives, but I had a feeling that I ought to do some research first and I see a general consensus that one does not really want to run chkdsk on an array. Actually, what I'm seeing is that it can lead to disaster. I'm thinking perhaps not on a RAID 1 array, but it just doesn't seem to make sense to run chkdsk on two drives at the same time. I'm really not sure.

Running chkdsk on a RAID array has exactly the same risk as running it on a single drive. Anyone who says otherwise is extremely confused. RebateMonger is correct that running chkdsk can have unintended side effects, but that has nothing to do with RAID.
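Array or single drive, the side effects only come into play when chkdsk is told to repair. Run without /F it only reports what it finds and changes nothing, so that's the low-risk first pass. If you'd rather script it, here's a minimal Python sketch. The function names are just for illustration, the volume you pass (say "D:") is whatever your data partition is, and it needs to run from an administrator account on the Windows box. A report-only pass on a volume that's in use can flag errors that aren't real, so treat the output as a hint rather than a verdict.

```python
import subprocess


def build_chkdsk_command(volume: str, fix: bool = False) -> list[str]:
    """Build a chkdsk command line. Without /F, chkdsk only reports."""
    cmd = ["chkdsk", volume]
    if fix:
        # /F asks chkdsk to repair file system errors; this is the risky mode.
        cmd.append("/F")
    return cmd


def run_read_only_check(volume: str) -> int:
    """Run chkdsk in report-only mode and return its exit code (Windows only)."""
    completed = subprocess.run(build_chkdsk_command(volume), check=False)
    return completed.returncode
```

Keeping the repair flag behind an explicit fix=True makes it harder to kick off a repair by accident.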

Originally posted by: boomerang
So, I'm thinking maybe I should just run it on one disk at a time with the other being disconnected. Is this the way to do this?

It'll give you a good, ready backup to fall back on in case something does go wrong but that's about it.

Originally posted by: boomerang
I'm thinking another option would be to run the WD diagnostic on the drives. Once again, one at a time.

Those are almost always just SMART tests which won't affect the data. It couldn't hurt and you'll probably have to do it anyway if you have to RMA one of the drives. But the controller should have some kind of error log and it should say if the controller's had problems with one of the drives.
 

boomerang

I want to thank you both for your help. It appears the best course is for me to dig into the utilities for the 3ware card and see what I can see. Replacement drives have been ordered and may arrive as early as tomorrow.

The drives in use are just out of their 5-year warranty and I'm going to consider them as having served all of their useful life.

As I said, redundant backups are already in existence and this swap should be an easy affair.
 

RebateMonger

Originally posted by: boomerang
But I would really like to know what the true problem is first.
That's certainly a good goal.

Databases are often where drive-induced corruption shows up, because the database files are often very large and a small error can sometimes corrupt the whole file. I've had Exchange databases on RAID 1 get corrupted by a couple of power outages (the UPS was dead, despite no warning light).

Something like loose power or data connectors could also do it.

Frankly, you might even consider putting in a new RAID controller and cables. I hate to sound like a wuss, but often it's cheaper to bite the bullet and replace the whole assembly than to spend days troubleshooting it and possibly miss a failing component and have the problem repeat.
 

boomerang

You've pretty much nailed my concerns. I'm wondering if it's the 3ware card. I purposely picked a good quality piece of hardware that was still affordable for us, but it has been in use for 4 1/2 years (I checked my records; it's amazing how time flies). Well, tomorrow I should be able to take a look at the logs and maybe they will help me make a decision on what to do. I truly want to thank you for your input.