ST2000DM001. 38 command timeouts. Should I worry?

eton975

Senior member
Jun 2, 2014
283
8
81
Backblaze's big data dump showed that drives with large numbers of command timeouts failed far, far more often than those without any. However, the range of the results was easily into the billions.

The lowest tier on their graph (above 0) ran from a single timeout to 13 billion, which is a huge range. What if, say, drives with only a few command timeouts didn't have a very high failure rate at all, but drives with billions pushed that up to the 10% the column in that graph shows?

Asking this because I have a ST2000DM001 with 38 command timeouts. Don't worry, everything's backed up and it's not my main drive anymore (Toshiba ftw!). Can anyone from Backblaze or one of the data recovery guys check in on this?
 

eton975

----------------------------------------------------------------------------
CrystalDiskInfo 6.5.2 (C) 2008-2015 hiyohiyo
Crystal Dew World : http://crystalmark.info/
----------------------------------------------------------------------------

OS : Windows 10 [10.0 Build 10240] (x64)
Date : 2015/09/26 11:56:18

-- Controller Map ----------------------------------------------------------
+ Intel(R) 8 Series/C220 Series SATA AHCI Controller - 8C02 [ATA]
- TOSHIBA DT01ACA300
- ST2000DM001-1ER164
- PHILIPS DVD+-RW DVD8881
- Microsoft Storage Spaces Controller [SCSI]

-- Disk List ---------------------------------------------------------------
(1) TOSHIBA DT01ACA300 : 3000.5 GB [0/0/0, pd1]
(2) ST2000DM001-1ER164 : 2000.3 GB [1/0/0, pd1] - st

----------------------------------------------------------------------------
(2) ST2000DM001-1ER164
----------------------------------------------------------------------------
Model : ST2000DM001-1ER164
Firmware : CC25
Serial Number : Z8E002SV
Disk Size : 2000.3 GB (8.4/137.4/2000.3/2000.3)
Buffer Size : Unknown
Queue Depth : 32
# of Sectors : 3907029168
Rotation Rate : 7200 RPM
Interface : Serial ATA
Major Version : ACS-2
Minor Version : ACS-3 Revision 3b
Transfer Mode : SATA/600 | SATA/600
Power On Hours : 2739 hours
Power On Count : 772 count
Temperature : 17 C (62 F)
Health Status : Good
Features : S.M.A.R.T., APM, 48bit LBA, NCQ
APM Level : 8001h [ON]
AAM Level : ----

-- S.M.A.R.T. --------------------------------------------------------------
ID Cur Wor Thr RawValues(6) Attribute Name
01 116 _99 __6 000006797640 Read Error Rate
03 _97 _96 __0 000000000000 Spin-Up Time
04 _99 _99 _20 000000000423 Start/Stop Count
05 100 100 _10 000000000000 Reallocated Sectors Count
07 _79 _60 _30 000005BC0709 Seek Error Rate
09 _97 _97 __0 000000000AB3 Power-On Hours
0A 100 100 _97 000000000000 Spin Retry Count
0C 100 100 _20 000000000304 Power Cycle Count
B7 100 100 __0 000000000000 Vendor Specific
B8 100 100 _99 000000000000 End-to-End Error
BB 100 100 __0 000000000000 Reported Uncorrectable Errors
BC 100 _99 __0 000000000027 Command Timeout
BD _98 _98 __0 000000000002 High Fly Writes
BE _83 _55 _45 000011100011 Airflow Temperature
BF 100 100 __0 000000000000 G-Sense Error Rate
C0 100 100 __0 000000000030 Power-off Retract Count
C1 _98 _98 __0 00000000104C Load/Unload Cycle Count
C2 _17 _45 __0 000A00000011 Temperature
C5 100 100 __0 000000000000 Current Pending Sector Count
C6 100 100 __0 000000000000 Uncorrectable Sector Count
C7 200 200 __0 000000000000 UltraDMA CRC Error Count
F0 100 253 __0 C39600000AAF Head Flying Hours
F1 100 253 __0 0006B9AAD16E Total Host Writes
F2 100 253 __0 00E5F1547CD0 Total Host Reads

-- IDENTIFY_DEVICE ---------------------------------------------------------
0 1 2 3 4 5 6 7 8 9
000: 0C5A 3FFF C837 0010 0000 0000 003F 0000 0000 0000
010: 2020 2020 2020 2020 2020 2020 5A38 4530 3032 5356
020: 0000 0000 0004 4343 3235 2020 2020 5354 3230 3030
030: 444D 3030 312D 3145 5231 3634 2020 2020 2020 2020
040: 2020 2020 2020 2020 2020 2020 2020 8010 4000 2F00
050: 4000 0200 0200 0007 3FFF 0010 003F FC10 00FB 5110
060: FFFF 0FFF 0000 0007 0003 0078 0078 0078 0078 0000
070: 0000 0000 0000 0000 0000 001F 850E 0006 00CC 0040
080: 03F0 001F 346B 7D69 4163 3449 BC49 4163 207F 0062
090: 0062 8001 FFFE 0000 D0D0 0000 0000 0000 0000 0000
100: 88B0 E8E0 0000 0000 0000 0000 6003 0000 5000 C500
110: 7A46 D33E 0000 0000 0000 0000 0000 0000 0000 405E
120: 401C 0000 0000 0000 0000 0000 0000 0000 0029 88B0
130: E8E0 88B0 E8E0 2020 0002 0140 0100 5000 3C06 3C0A
140: 0000 003C 0000 0008 0000 0000 05FF 0280 0000 0000
150: 0008 0000 0000 0000 0000 8000 0000 0000 5800 8000
160: 0000 0000 0000 0000 0000 0000 0000 0000 0002 0000
170: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
180: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
190: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
200: 0000 0000 0000 0000 0000 0000 1085 0000 0000 4000
210: 0000 0000 0000 0000 0000 0000 0000 1C20 0000 0000
220: 0000 0000 107E 0000 0000 0000 0000 0000 0000 0000
230: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
240: 0000 0000 0000 0007 0000 0000 0000 0000 0000 0000
250: 0000 0000 0000 0000 0000 16A5

-- SMART_READ_DATA ---------------------------------------------------------
+0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
000: 0A 00 01 0F 00 74 63 40 76 79 06 00 00 00 03 03
010: 00 61 60 00 00 00 00 00 00 00 04 32 00 63 63 23
020: 04 00 00 00 00 00 05 33 00 64 64 00 00 00 00 00
030: 00 00 07 0F 00 4F 3C 09 07 BC 05 00 00 00 09 32
040: 00 61 61 B3 0A 00 00 00 00 00 0A 13 00 64 64 00
050: 00 00 00 00 00 00 0C 32 00 64 64 04 03 00 00 00
060: 00 00 B7 32 00 64 64 00 00 00 00 00 00 00 B8 32
070: 00 64 64 00 00 00 00 00 00 00 BB 32 00 64 64 00
080: 00 00 00 00 00 00 BC 32 00 64 63 27 00 00 00 00
090: 00 00 BD 3A 00 62 62 02 00 00 00 00 00 00 BE 22
0A0: 00 53 37 11 00 10 11 00 00 00 BF 32 00 64 64 00
0B0: 00 00 00 00 00 00 C0 32 00 64 64 30 00 00 00 00
0C0: 00 00 C1 32 00 62 62 4C 10 00 00 00 00 00 C2 22
0D0: 00 11 2D 11 00 00 00 0A 00 00 C5 12 00 64 64 00
0E0: 00 00 00 00 00 00 C6 10 00 64 64 00 00 00 00 00
0F0: 00 00 C7 3E 00 C8 C8 00 00 00 00 00 00 00 F0 00
100: 00 64 FD AF 0A 00 00 96 C3 27 F1 00 00 64 FD 6E
110: D1 AA B9 06 00 00 F2 00 00 64 FD D0 7C 54 F1 E5
120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
160: 00 00 00 00 00 00 00 00 00 00 00 00 50 00 00 73
170: 03 00 01 00 01 CE 02 00 00 00 00 00 00 00 00 00
180: 00 00 00 00 1B 00 00 00 03 04 04 03 03 03 03 03
190: 03 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00
1A0: 00 00 00 00 00 00 00 00 47 0C 45 1D F8 08 00 00
1B0: 00 00 00 00 01 00 4D 15 6E D1 AA B9 06 00 00 00
1C0: D0 7C 54 F1 E5 00 00 00 00 00 00 00 00 00 00 00
1D0: 00 00 00 00 00 00 00 00 2F 00 00 00 01 00 00 00
1E0: 00 00 00 00 00 00 00 00 03 00 00 00 00 00 00 01
1F0: 00 00 00 00 00 00 00 00 00 00 14 18 00 00 00 0C

-- SMART_READ_THRESHOLD ----------------------------------------------------
+0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
000: 01 00 01 06 00 00 00 00 00 00 00 00 00 00 03 00
010: 00 00 00 00 00 00 00 00 00 00 04 14 00 00 00 00
020: 00 00 00 00 00 00 05 0A 00 00 00 00 00 00 00 00
030: 00 00 07 1E 00 00 00 00 00 00 00 00 00 00 09 00
040: 00 00 00 00 00 00 00 00 00 00 0A 61 00 00 00 00
050: 00 00 00 00 00 00 0C 14 00 00 00 00 00 00 00 00
060: 00 00 B7 00 00 00 00 00 00 00 00 00 00 00 B8 63
070: 00 00 00 00 00 00 00 00 00 00 BB 00 00 00 00 00
080: 00 00 00 00 00 00 BC 00 00 00 00 00 00 00 00 00
090: 00 00 BD 00 00 00 00 00 00 00 00 00 00 00 BE 2D
0A0: 00 00 00 00 00 00 00 00 00 00 BF 00 00 00 00 00
0B0: 00 00 00 00 00 00 C0 00 00 00 00 00 00 00 00 00
0C0: 00 00 C1 00 00 00 00 00 00 00 00 00 00 00 C2 00
0D0: 00 00 00 00 00 00 00 00 00 00 C5 00 00 00 00 00
0E0: 00 00 00 00 00 00 C6 00 00 00 00 00 00 00 00 00
0F0: 00 00 C7 00 00 00 00 00 00 00 00 00 00 00 F0 00
100: 00 00 00 00 00 00 00 00 00 00 F1 00 00 00 00 00
110: 00 00 00 00 00 00 F2 00 00 00 00 00 00 00 00 00
120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FD
 
Feb 25, 2011
16,983
1,616
126
Is the number increasing?

I ask because I had a bad SATA expander, which caused several of my hard drives to log several thousand errors (about 12k errors between the four drives that were affected). The drives are fine now, and the error counts haven't budged since, but the errors still show up in the SMART data. So it's possible that something in the past caused those errors to be logged.

The Wikipedia note on that particular attribute (command timeouts) is interesting: "The count of aborted operations due to HDD timeout. Normally this attribute value should be equal to zero and if the value is far above zero, then most likely there will be some serious problems with power supply or an oxidized data cable."

So I'd be curious if it's actually a problem with the disk at all? Or if this is just an echo of a bad cable you already swapped out or something.

I'd also be curious if Seagate is actually using that register correctly, or if it's being used to store other data (they can do that.)
 

eton975

Well, it did still increase with another cable when I tested for a few days. There was one day when it went up by like 20.

I might download the full Backblaze report and use SQL on it.
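A rough sketch of what that SQL could look like, run through Python's built-in sqlite3 module. It assumes the daily Backblaze CSVs have been loaded into a table called drive_stats (a made-up name) with the columns serial_number, model, failure and smart_188_raw; check those names against the actual files. The bucket edges (0, 1-100, over 100) are arbitrary picks for illustration, not anything Backblaze uses.

```python
import sqlite3

# One row per drive first (its highest command-timeout reading and whether it
# ever failed), then count drives and failures per bucket.
# Table name, column names and bucket edges are assumptions.
BUCKET_QUERY = """
SELECT CASE
         WHEN max_timeouts = 0 THEN '0'
         WHEN max_timeouts <= 100 THEN '1-100'
         ELSE '>100'
       END AS bucket,
       COUNT(*) AS drives,
       SUM(failed) AS failures
FROM (
    SELECT serial_number,
           MAX(smart_188_raw) AS max_timeouts,
           MAX(failure) AS failed
    FROM drive_stats
    WHERE model = ? AND smart_188_raw IS NOT NULL
    GROUP BY serial_number
)
GROUP BY bucket
ORDER BY MIN(max_timeouts)
"""


def timeout_buckets(conn: sqlite3.Connection, model: str):
    """Return (bucket, drives, failures) rows for one drive model."""
    return conn.execute(BUCKET_QUERY, (model,)).fetchall()
```

Splitting the lowest tier this way would show whether drives with a few dozen timeouts fail at anything like the rate of drives with huge counts.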
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,731
155
106
C5 100 100 __0 000000000000 Current Pending Sector Count
C6 100 100 __0 000000000000 Uncorrectable Sector Count
C7 200 200 __0 000000000000 UltraDMA CRC Error Count

These numbers seem wrong to me, 100/200 for everything like placeholder values.
Try doing an offline/extended SMART test with a different piece of software and getting the log.
Running the test will take hours, but then you know the log holds the most recently generated data.
 

AlienTech

Member
Apr 29, 2015
117
0
0
on mine it shows...
BC 100 _75 __0 00C400C4012F Command Timeout
C5 100 100 __0 000000000000 Current Pending Sector Count
C7 200 200 __0 00000000001D UltraDMA CRC Error Count


BC 100 _99 __0 000000000001 Command Timeout
C5 100 100 __0 000000000000 Current Pending Sector Count
C7 200 200 __0 000000000000 UltraDMA CRC Error Count

But they've been working fine for years.
 

Soulkeeper

In Linux with smartctl my Seagate output looks more like this:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       146418096
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       316
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       8682604585
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       24760
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       272
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   098   098   000    Old_age   Always       -       2
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       1 1 1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   060   045    Old_age   Always       -       31 (Min/Max 27/36)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       260
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       743319
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 17 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       23
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       18890h+01m+40.841s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       160499360995899
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       237927206390499

SMART Error Log Version: 1
No Errors Logged


I guess your raw values are in hex, so you'd just have to convert them. But even then some look odd, e.g. 000A00000011 Temperature

On the Seagates you can essentially ignore Raw_Read_Error_Rate; it just keeps incrementing every time you use it.
The only ones that really matter are Current_Pending_Sector, Offline_Uncorrectable, Reallocated_Sector_Ct, Runtime_Bad_Block, and UDMA_CRC_Error_Count (usually a bad cable or flaky drivers/firmware).
 

CiPHER

Senior member
Mar 5, 2015
226
1
36
C5 100 100 __0 000000000000 Current Pending Sector Count
C6 100 100 __0 000000000000 Uncorrectable Sector Count
C7 200 200 __0 000000000000 UltraDMA CRC Error Count

"These numbers seem wrong to me, 100/200 for everything like placeholder values."
They are not wrong - this is how SMART works, and many people do not know how it works. Let me try to explain:

You have:

Current - normalised value of the actual - current - value for this attribute
Worst - normalised value of the worst value for this attribute recorded in history
Threshold - the lowest limit the drive may have before the drive is considered faulty
Raw value - an absolute number often in hexadecimal which can mean anything depending on how the value has been generated - more about this later

Note that the first three are normalised values meaning they are 8-bit values ranging from 0 to 255. Contrary to what feels natural, a HIGHER number means a BETTER value. So 100 or 200 can be the best possible value. So 200 can mean no cable errors or bad sectors, while 199 can mean 20 cable errors or bad sectors, and so on. The harddrive itself decides what normalised value corresponds with an actual raw value - the raw value is often the 'real' value and the normalised value is just a reflection that automated systems can understand without having to know what that attribute means.

This is because SMART is meant to be an autonomous system where systems can read the SMART data and say: this drive has failed. Your system BIOS, for example, often has an option to enable SMART; it will then read the SMART data for each harddrive and check whether any Current value has decreased below the Threshold value. If so, it will display a SMART error saying that the drive has failed or has a serious error. Thus, if the normalised value (Current) decreases below the threshold value, the entire drive is considered to have failed. The Current value means the actual value at this moment, while the Worst value is the lowest value recorded in history.
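As a rough illustration of that automated check, here is a small Python sketch that parses one attribute row in the CrystalDiskInfo layout pasted earlier in the thread and compares Current against Threshold. The names SmartAttr, parse_cdi_line and has_tripped are made up for this example. Two assumptions are baked in: a value at or below the threshold counts as failed (the smartctl man page describes it that way, while the text above says "below"), and a threshold of 0 is treated as "never trips".

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int    # attribute ID, printed in hex by CrystalDiskInfo
    current: int    # normalised current value
    worst: int      # normalised worst value
    threshold: int  # normalised failure threshold
    raw: int        # 48-bit raw value read as one integer
    name: str


def parse_cdi_line(line: str) -> SmartAttr:
    """Parse a row such as 'BC 100 _99 __0 000000000027 Command Timeout'."""
    attr_id, cur, wor, thr, raw, name = line.split(None, 5)
    # CrystalDiskInfo pads the normalised values with underscores.
    return SmartAttr(
        int(attr_id, 16),
        int(cur.strip("_")),
        int(wor.strip("_")),
        int(thr.strip("_")),
        int(raw, 16),
        name.strip(),
    )


def has_tripped(attr: SmartAttr) -> bool:
    """True if the normalised value has reached the threshold (assumed rule)."""
    return attr.threshold > 0 and attr.current <= attr.threshold


attr = parse_cdi_line("BC 100 _99 __0 000000000027 Command Timeout")
print(attr.name, attr.current, attr.threshold, has_tripped(attr))
```

For the Command Timeout row above the threshold is 0, so under these assumptions it can never trip no matter how high the raw count gets.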

For example: the temperature can be 25 degrees now, giving a normalised value of 75, while it was 45 degrees at most in the past, giving a Worst value of 55. The threshold could then be 45, meaning that if the drive becomes hotter than 55 degrees you get a SMART error. In this example, a normalised value of 100 means 0 degrees Celsius (freezing temperature). Some drives start at 150 or 200 and subtract the actual temperature in Celsius. Not all drives follow this scheme by the way; some violate the SMART specification by reporting the actual temperature as the Current value.
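The arithmetic of that example, as a tiny Python sketch. The function names are made up, and the base of 100 is only the scheme used in the example above; drives that start at 150 or 200, or that report the temperature directly, would need a different base or no conversion at all.

```python
def normalised_temperature(celsius: int, base: int = 100) -> int:
    """Normalised value for a drive that reports base minus degrees Celsius."""
    return base - celsius


def celsius_from_normalised(value: int, base: int = 100) -> int:
    """Inverse: the temperature a normalised value (or threshold) stands for."""
    return base - value


print(normalised_temperature(25))   # current value in the example
print(normalised_temperature(45))   # worst value in the example
print(celsius_from_normalised(45))  # temperature at which threshold 45 is reached
```

The Airflow Temperature row in the dump earlier in the thread (Current 83, Worst 55, Threshold 45, with the drive reporting 17 C) happens to fit the base-100 scheme, while the Temperature row with Current 17 looks like the "actual temperature as Current value" case.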

on mine it shows...
BC 100 _75 __0 00C400C4012F Command Timeout
C5 100 100 __0 000000000000 Current Pending Sector Count
C7 200 200 __0 00000000001D UltraDMA CRC Error Count
UDMA CRC Error Count typically means cabling errors resulting in detectable data corruption. Not every cable problem shows up here: if the cable is so bad that it does not even link, nothing gets counted. But if it is semi-bad, the drive will appear to function while data gets corrupted in transit every once in a while. Thanks to CRC - which means error detection and NOT error correction - this does not go unnoticed, and the UDMA CRC Error Count raw value will be increased.

Do not look at the normalised value for Current Pending Sector (=bad sectors) or UDMA CRC Error Count (=cable errors) but look at the raw value. 0x1D is hexadecimal for 29. This means you had a few cable errors (29) in the past.

Note that this value will never be decreased - it will stay at the same value if you had cabling errors in the past but not any longer. So you should capture your SMART data regularly and compare the UDMA CRC Error Count raw value with prior readings. If it stays the same, you are good. If it keeps increasing, you have problems with your cabling.
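A minimal Python sketch of that comparison, assuming you have saved the raw values as the hex strings CrystalDiskInfo prints. The function names and the dictionary layout (attribute name mapped to raw hex string) are hypothetical. Note that it reads the raw field as one plain counter, which suits UDMA CRC Error Count but not attributes that pack several numbers into one raw value.

```python
def raw_to_int(raw_hex: str) -> int:
    """Convert a raw value such as '00000000001D' to a decimal integer."""
    return int(raw_hex, 16)


def increased_counters(previous: dict, current: dict) -> dict:
    """Return {attribute: increase} for every raw value that went up.

    Both arguments map an attribute name to its raw hex string, taken from
    two SMART captures of the same drive at different times.
    """
    changes = {}
    for name, new_hex in current.items():
        if name in previous:
            delta = raw_to_int(new_hex) - raw_to_int(previous[name])
            if delta > 0:
                changes[name] = delta
    return changes


print(raw_to_int("00000000001D"))  # the UDMA CRC Error Count quoted above
```

An empty result between two captures taken a few weeks apart would suggest the cabling errors are history; a non-empty one means they are still happening.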

Current Pending Sector is the most important SMART attribute as it means ACTIVE bad sectors, as opposed to PASSIVE bad sectors, which are sectors that have been swapped for reserve sectors. Passive bad sectors are reported as Reallocated Sector Count. Just one active bad sector can wreak havoc on legacy storage solutions, such as RAID or old filesystems like NTFS (Windows), HFS (Apple), Ext4 (Linux) or UFS (UNIX). Modern 3rd generation filesystems such as ZFS, Btrfs and ReFS are almost immune to bad sectors.

"I guess your raw values are in hex, you'd just have to convert them. But even then some look odd ie: 000A00000011 Temperature"
This is not odd - those values are binary encoded. Meaning: multiple values stored in one hexadecimal number. For temperature, this typically means actual/lowest/highest values which are all encoded in one hex number. You need knowledge about how the number is encoded to be able to extract this data. CrystalDiskInfo and SmartMonTools generally are up for the task.
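To make the packing concrete, here is a short Python sketch that splits a 48-bit raw value into its six bytes or three 16-bit words, least significant first. Only the splitting itself is certain; which field means what is vendor-specific. The helper names are made up. For the values in this thread, the low byte of 000A00000011 is 0x11 = 17, which matches the 17 C CrystalDiskInfo reports, and the three words of 00C400C4012F would print much like the "1 1 1" style smartctl shows for Command_Timeout. What the other bytes and words stand for is not stated anywhere in the thread.

```python
def raw_bytes(raw_hex: str) -> list:
    """Split a 48-bit raw value into 6 bytes, least significant byte first."""
    value = int(raw_hex, 16)
    return [(value >> (8 * i)) & 0xFF for i in range(6)]


def raw_words(raw_hex: str) -> list:
    """Split a 48-bit raw value into 3 16-bit words, least significant first."""
    value = int(raw_hex, 16)
    return [(value >> (16 * i)) & 0xFFFF for i in range(3)]


print(raw_bytes("000A00000011"))  # Temperature row from the first dump
print(raw_bytes("000011100011"))  # Airflow Temperature row from the first dump
print(raw_words("00C400C4012F"))  # Command Timeout row quoted above
print(int("00C400C4012F", 16))    # the same field read as one big number
```

The last line shows why reading a packed field as a single integer is misleading: three small counters turn into a number in the hundreds of billions.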

"On the seagates you can essentially ignore Raw_Read_Error_Rate it just keeps incrementing everytime you use it."
It does not just increment; it can go down as well. It is not an absolute number, it is an encoded value like the temperature example you gave above. For RRER and SER (Seek Error Rate) please pay special attention to the last word: rate. This means the absolute number of errors is not relevant, only the number relative to the number of operations. So you should interpret these attributes as telling you how often ECC error correction is required to read the data back without corruption (RRER) and how many seeks go awry so that the drive has to seek again (SER). A bad SER means the drive often encounters vibrations or its mechanical quality is suboptimal. A bad RRER often means the platter surface quality is suboptimal, so that a lot of error correction (ECC) is required to know the contents of the sector. This means bad sectors without physical damage are more likely to happen. Or otherwise put: a bad signal to noise ratio.

With RRER and SER, do not look at the raw value - some manufacturers simply set them to 0 because consumers tend to interpret them as errors and return the drive for no reason. But the truth is that no modern harddrive can read its contents without relying on ECC error correction - only ancient harddrives could read their contents without it. The same goes for SSDs: ECC is mandatory. Errors are normal. Only the degree to which error correction is necessary tells you something about the quality. That is why both RRER and SER are referred to as a rate - meaning they are relative to the number of operations performed.

"You should run the seatools tests and let it tell you the status of your drive."
Just beware that running the manufacturer utility might actually fix things, and it will then tell you the drive is in good shape while in fact bad sectors have disappeared or been fixed without telling you. It destroys 'evidence' this way. So I strongly recommend capturing the SMART data before running the manufacturer utility or any other test. This also includes chkdsk and basically any other utility. Simply reading the drive might resolve bad sectors due to error recovery.

The bad thing here is that the SMART data does not keep a record of past bad sectors the way UDMA CRC Error Count does for cabling errors. That number will never decrease. But for bad sectors there is no such attribute. There is only Uncorrectable Sector Count, which is the same as Current Pending Sector, except that the latter is an online attribute, meaning it is updated immediately, whereas Uncorrectable Sector Count is only updated every power cycle or once every x hours. So Current Pending Sector can be 0 while USC is still 29, for example, betraying the fact that in the past you had 29 bad sectors but since they have been overwritten they are gone now.

Always capture the SMART before doing anything else!
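One way to make that a habit on a machine with smartmontools installed is a small Python wrapper around `smartctl -a` that writes each capture to a timestamped file. This is only a sketch: the function names, the file naming scheme and the device path in the comment are made up for the example, and smartctl usually needs administrator rights. The run and now parameters exist only so the function can be tried out without a real drive.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def snapshot_path(directory, device: str, when: datetime) -> Path:
    """Build a file name like smart_dev_sda_20150926_115618.txt."""
    safe = device.strip("/").replace("/", "_")
    return Path(directory) / f"smart_{safe}_{when:%Y%m%d_%H%M%S}.txt"


def capture_smart(device: str, directory=".", run=subprocess.run, now=datetime.now) -> Path:
    """Save the output of `smartctl -a <device>` and return the file path.

    smartctl's exit status is a bitmask that can be non-zero even when it
    printed a full report, so the return code is deliberately not checked.
    """
    result = run(["smartctl", "-a", device], capture_output=True, text=True)
    path = snapshot_path(directory, device, now())
    path.write_text(result.stdout)
    return path


# Example call (hypothetical device path):
# capture_smart("/dev/sda", directory="smart_logs")
```

Keeping the files around gives you the prior readings to compare raw values against later.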
 

Soulkeeper

Odd enough for me, because I don't use CrystalDiskInfo
"You need knowledge about how the number is encoded to be able to extract this data."
That pretty much sums it up.

One thing worth noting: just as a BIOS gets updated to add support for new CPUs, or programs like cpuid get updated, your SMART program of choice must also be updated.
There is really no guarantee that your ATTRIBUTE_NAME is even correct or that any of the values in the program's particular struct are right. It's worth keeping this in mind, and also having the latest version if you have a new drive.