
Intel RAID1 unstable

dtgoodwin

Member
I have a SuperMicro X9SCM server board (C204 chipset, roughly equivalent to the P67) with 4 GB of Kingston ECC RAM and an i3-2100T processor. I am running WHS v1 on it. I have two WD Scorpio Blue 250 GB drives running in RAID 1 off the two 6 Gb/s ports. For four months it ran flawlessly. Starting one month ago, I began having problems accessing the array, and consequently any data on the server, since everything is dependent on the system drive.

It starts with an error from iaStor that \Device\Ide\iaStor0 did not respond within the timeout period. I get this message for about 8-9 minutes, beginning near the time when WHS does its nightly chkdsk. There are no other disk events, although other normal events continue afterwards until I find the error. My symptoms are that the HD light is on solid with no other drives busy (the remaining drives are in trays that have activity lights). The system is basically unresponsive: I still have a mouse cursor and can ping it, but it can't access C: or D:, which are on the array. This has happened 4 times, about once a week.

It will not shut down properly; however, if I reset it, everything is fine. Here is what I've done to try to resolve the issue:

I've pulled each drive one at a time and tested them with the WD diags tool full check. No errors reported, and no SMART values out of line. Rebuilding is always quick and successful.
I've updated to the latest RST driver that I could see (10.6.0.1002 - 5/20/2011).
I've updated to the latest IPMI firmware 1.27
I've flashed the system BIOS to R 1.1a and cleared it.
I've run a verification on the array. According to the application event log, it begins and completes without any additional messages. It had shown one verification error, but I couldn't tell how to determine what it was. The last verification, done today, cleared that error, and again there were no messages.
I've removed and reseated the connections.
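As a cross-check on the vendor diagnostics above, the same handful of SMART attributes can be read with smartmontools' `smartctl -A` on any OS that can see the drives. A minimal sketch of watching the sector-health counters the poster mentions, using made-up illustrative output (the `raw_values` helper and the sample text are mine, not from the thread):

```python
import re

# Illustrative excerpt of `smartctl -A /dev/sdX` output (format follows
# smartmontools' attribute table; the values here are made up).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   0
"""

# Attribute IDs most relevant to a drive quietly struggling with bad sectors.
WATCH = {5, 197, 198}

def raw_values(text):
    """Map attribute ID -> raw value for the attributes in WATCH."""
    out = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(\S+).*\s(\d+)$", line)
        if m and int(m.group(1)) in WATCH:
            out[int(m.group(1))] = int(m.group(3))
    return out

vals = raw_values(SAMPLE)
print(vals)  # all zero raw values match "no SMART values out of line"
```

Nonzero raw values on 5/197/198 would point at the drives; all zeros, as here, points back at the controller, driver, or cabling.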

The server in every other way is rock solid. It never locks up or has this behavior except during the night.

I began by testing the drives, and with each lockup I have progressed further through the steps above, with the last changes occurring last week. It locked up most recently last night.

I'm growing frustrated. I know these are consumer drives, not necessarily intended for 24/7 operation. I purchased them for their small size and, since my server runs 24/7, for their low power consumption. The remainder of my drives are all WD 20EARS drives.


Any ideas or suggestions?
 
I thought the RE drives were the only ones WD sanctioned for RAID use, and that all the others had TLER disabled, which would present issues with RAID?
 
I know the RE drives have TLER set to a low value. My understanding is that non-RE drives usually just end up getting dropped by the controller because they take too long doing error recovery. I'm not having issues with drives dropping, or at least I don't think so, unless that's how the Intel soft RAID handles it.
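For what it's worth, on drives that expose it, the TLER timer can be queried as SCT Error Recovery Control via `smartctl -l scterc`. A small sketch of interpreting that report, using illustrative output (the `parse_scterc` helper and the sample text are assumptions of mine; exact wording varies by smartctl version and drive):

```python
import re

# Illustrative output of `smartctl -l scterc /dev/sdX` for a
# RAID-oriented drive; timers are reported in deciseconds.
SAMPLE = """\
SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
"""

def parse_scterc(text):
    """Return {'read': deciseconds, 'write': deciseconds}, or None
    if the drive reports the timers as disabled/unsupported."""
    timers = {}
    for kind in ("Read", "Write"):
        m = re.search(rf"{kind}:\s+(\d+)", text)
        if m:
            timers[kind.lower()] = int(m.group(1))
    return timers or None

print(parse_scterc(SAMPLE))  # {'read': 70, 'write': 70}
```

A 70-decisecond (7 s) timer is the RE-style behavior described above: the drive gives up on a bad sector quickly so the RAID layer can use the mirror copy. A drive reporting the timers as disabled may spend far longer in internal recovery, which is exactly the window in which a controller times the drive out.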
 
Yup, Intel RAID is pretty much only good for RAID 0, lol.

Spend $50 on a real RAID controller and get some RE4s. I've got 10 RE4s in RAID 10 and it works great.
 