I've got a couple of drives in my ZFS system that are giving me trouble. I have 16 identical HD103SJ drives, over 5 years old, that have been flawless up until now. I've been getting emails every few weeks about an unrecoverable read error on one of them, and today I got a message about a second drive for the first time.
I pulled SMART data on all 16 drives; these are the only two that show anything other than 0 for the RAW_VALUE of Raw_Read_Error_Rate. Both are fine from a value/threshold standpoint. For reference, the worst of the two:
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       5782
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   072   070   025    Pre-fail  Always       -       8782
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       62
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       38413
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       93
191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   059   053   000    Old_age   Always       -       41 (Min/Max 21/52)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       93
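For anyone wanting to spot-check the same attribute across a whole pool, here's a minimal sketch. It just parses `smartctl -A` output (the raw value is the last whitespace-separated field of the attribute's row); the demo line below is the row from the output above, so no disk access is needed:

```shell
#!/bin/sh
# Sketch: pull one SMART attribute's raw value out of `smartctl -A` output.
# The raw value is the last whitespace-separated field of the attribute's row.
attr_raw() {    # usage: smartctl -A /dev/adaN | attr_raw Attribute_Name
    awk -v name="$1" '$2 == name { print $NF }'
}

# Demo on the Raw_Read_Error_Rate row from the output above:
row='  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       5782'
printf '%s\n' "$row" | attr_raw Raw_Read_Error_Rate    # prints 5782
```

To sweep real drives, feed live output through it, e.g. `for n in 0 1 2 3; do smartctl -A /dev/ada$n | attr_raw Raw_Read_Error_Rate; done` — the `ada` device names are a guess for a FreeBSD/FreeNAS box; adjust for your controller.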
I have a spare drive on hand to replace this one, but much of what I'm finding says to ignore the raw value until the normalized value actually drops to the threshold and SMART flags a failure. When I have 14 other drives reporting 0, though, I feel like I'm tempting fate.
Should I order a few more spares and replace both of these, or just hold out until FreeNAS actually fails a drive? I never expected to have all 16 drives still alive after this long; they've been amazingly reliable for distinctly consumer-class drives.
Viper GTS
