So many interesting posts, shame I didn't read all of them. However, I would like to make some contributions to this discussion:
What are the differences between WD Red, WD Green and WD AV drives?
WD Red has TLER enabled; WD Green has TLER disabled, and it cannot be enabled. WD AV are special 'Audio/Video' drives with TLER=0, meaning they make no attempt to recover weak sectors at all. This is to prevent lost frames: the harddrive could not keep up with the data rate if it spent much time on recovery.
WD Red has some different specifications than WD Green, including the statement that Reds are suitable for 24/7 operation. From what I have been able to gather from similar situations in the past, the difference is probably only the firmware and the added warranty, not anything physical. Only the special 'RE' drives are physically different.
There is some confusion about this, since the WD Green is listed with higher power consumption than the WD Red. But as already mentioned in this thread, this is because there are two versions of the WD Green: one with conservative 750GB platters and one with modern 1000GB platters. The latter should be equal to the WD Red. 1000GB platters are used in the most modern generation of harddrives today. The fewer platters, the better. The drive will be faster due to increased data density (sectors per track), power consumption will drop, height and weight may be lower, and there are fewer mechanical components such as heads and the platters themselves. All this probably also translates to increased reliability, following the general rule that fewer mechanical components mean a lower probability of failure.
TLER: what is it, and what is it not?
TLER, or Time-Limited Error Recovery, is a very simple setting in the harddrive firmware that tells how many seconds the harddrive may spend trying to recover the contents of an unreadable sector before it gives up and returns an error.
Basically, all drives have TLER. Consumer drives usually spend about 120 seconds recovering before they return an error. You could say this is TLER=120. But by convention, we say TLER is off when it is set to a high value like 120. When we enable TLER, that usually means TLER=7, or 7 seconds of recovery time. This value is the convention because the strictest hardware RAID controllers use 10 seconds as their timeout value, so the drive has to give up before then. This brings us to the next question...
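To make the convention concrete, here is a minimal Python sketch. It assumes a drive that exposes the setting through the standard SCT Error Recovery Control feature, which smartmontools can set with `smartctl -l scterc,READ,WRITE` (values in tenths of a second). The helper names and the `/dev/sdX` path are placeholders of mine, and the code only builds the command, it does not run it. Drives that do not allow the setting to be changed will simply refuse it.

```python
from typing import List

STRICT_CONTROLLER_TIMEOUT_S = 10  # the 10-second figure quoted above


def scterc_command(device: str, seconds: float) -> List[str]:
    """Build (but do not run) a smartctl call that limits error recovery time."""
    if seconds < 0:
        raise ValueError("recovery time cannot be negative")
    # smartctl expects the read and write limits in tenths of a second.
    deciseconds = round(seconds * 10)
    return ["smartctl", "-l", f"scterc,{deciseconds},{deciseconds}", device]


def gives_up_in_time(recovery_limit_s: float,
                     controller_timeout_s: float = STRICT_CONTROLLER_TIMEOUT_S) -> bool:
    """True if the drive reports an error before a strict controller times out."""
    return recovery_limit_s < controller_timeout_s


# TLER "on" (7 s) fits inside the 10 s window; TLER "off" (120 s) does not.
print(scterc_command("/dev/sdX", 7))
print(gives_up_in_time(7), gives_up_in_time(120))
```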
Why do we need TLER?
Some might say 'because you do not want to wait 120 seconds'. This statement is inaccurate and misleading in many ways. I will try to explain. But first, let's look at how hardware RAID controllers behave versus regular drives with TLER=120 ('off').
1. Hardware RAID controller sends read request to harddrive X
2. Harddrive X cannot read a sector and keeps trying for more than 10 seconds
3. Hardware RAID controller thinks 'huh, drive X is not responding, it must have failed!'
4. Hardware RAID controller kicks out drive X from the RAID-array
5. Hardware RAID controller updates the metadata on the remaining disk members, to reflect the detached disk. This is to prevent the disk from reattaching when you reboot or power-cycle.
6. Hardware RAID controller reports that it is running DEGRADED or FAILED, depending on which RAID scheme you were using.
This sequence is typical for many hardware RAID controllers. But not all; some have much higher timeout values. Others do not drop disks but return I/O errors to the application. There was also speculation that some controllers might fetch the contents of the bad sector from a redundant source and use that instead, but I have never seen anything to substantiate this claim.
The real reason you need TLER, therefore, is that hardware RAID controllers adhere to very strict timeouts. Basically, your drive is either working perfectly or it is being kicked out. Such a controller requires TLER-enabled harddrives because whenever you encounter a bad sector, you do not want the entire disk to be 'failed' because of one tiny 512-byte sector of unreadable data.
The truth is, hardware RAID, including what is called 'onboard RAID', is very dumb about timeouts, and many people have lost data because their RAID failed after disks were kicked out. The user can recover from this situation, but many fail, and their own attempts might finish off any chance of successful recovery. In general, hardware RAID and onboard RAID respond very poorly to bad sectors.
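The sequence above can be sketched as a tiny simulation. This is a toy model under my own assumptions, not the logic of any real controller: the `Drive` class and function name are made up, and it takes the worst case where a drive uses its whole recovery limit on the bad sector.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    recovery_limit_s: float  # TLER value: max seconds spent on one bad sector


def read_bad_sector(drive: Drive, controller_timeout_s: float = 10.0) -> str:
    """Outcome when `drive` hits an unreadable sector behind a strict controller."""
    if drive.recovery_limit_s > controller_timeout_s:
        # Steps 2-6: the drive is still retrying when the controller gives up,
        # so the whole disk is detached and the array runs degraded.
        return "drive dropped, array degraded"
    # The drive gives up first and reports an error for that one sector only.
    return "read error reported, drive stays in array"


print(read_bad_sector(Drive("TLER off", 120)))
print(read_bad_sector(Drive("TLER on", 7)))
```

A controller with a much higher timeout corresponds to passing a larger `controller_timeout_s`, in which case even the TLER=120 drive stays in the array.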
Can TLER be dangerous?
Yes, unfortunately. It is really just a dirty hack that exists because hardware RAID and onboard RAID handle disk timeouts so poorly.
If you do not need TLER, you do not want it. Why? Because you disable your last line of defense.
Assume you are running a RAID5 on a Linux software RAID platform, where you do not need TLER, so your disks do not have TLER enabled either. Assume one day a disk in the array fails. You have a spare disk lying around and swap the bad disk for the spare one. While rebuilding, your array is vulnerable because it has no redundancy available for the data still waiting to be synchronised. If one other disk member were to encounter a bad sector - and this happens more frequently than you think - you would have a major problem. This unreadable sector could cause serious headaches and even loss of all data in cases where the user's response ends up crippling the integrity of the RAID.
In such a situation, where you have lost your redundancy, you want your harddrives to spend the time they need trying to recover the data. Even if the chance is small, you want that last line of defense. Why else would we humans use seat belts in our cars? It is not like we want to crash, but if it happens, we want a last line of defense.
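To put a rough number on 'more frequently than you think', here is a back-of-the-envelope sketch. The inputs are my assumptions, not measurements: it uses the commonly quoted consumer datasheet figure of one unrecoverable read error per 10^14 bits read, treats errors as independent, and assumes a 4-disk RAID5 of 2 TB disks, so three surviving disks must be read in full. Real drives can do much better or worse than the datasheet figure.

```python
import math


def p_unreadable_during_rebuild(surviving_disks: int,
                                disk_bytes: float,
                                ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unreadable sector while reading every surviving disk."""
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical 4-disk RAID5 of 2 TB disks: the rebuild reads the 3 survivors.
p = p_unreadable_during_rebuild(surviving_disks=3, disk_bytes=2e12)
print(f"{p:.2f}")  # about 0.38 under these assumptions
```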
Why are bad sectors such a problem?
Good question. Why should everything go to hell because one tiny fraction of your harddrive is unreadable? Harddrives are designed to generate unreadable sectors, by the way: manufacturers have chosen to apply only basic error recovery. If more error recovery had been applied, bad sectors would have been much less common, but disks would also have been smaller, because less of the raw space could be used for actual data.
The real problem
The real problem is that today's storage hardware is not perfect, and due to higher data densities and increasing capacities, bad sectors are much more common than they used to be. In the meantime, the software we use (NTFS, Ext4, UFS) has not been designed to cope with bad sectors at all. These filesystems offer no protection to your data or the crucial filesystem metadata; it is at the mercy of bit rot. If a bad sector were to land on filesystem metadata, that could severely damage data integrity, make the data inaccessible, and require recovery utilities to get most of it back.
In other words, today's software is not designed for today's hardware.
The real solution
The real solution... is ZFS. It is simply superior in almost every way and, given a pool with redundancy, virtually immune to bad sectors! ZFS corrects them on the fly without you ever noticing there was a problem. Once you migrate your data to ZFS, you will have granted it formidable protection against corruption and loss of data in general. I can only recommend that people have a look at ZFS and see for themselves how superior it is to the legacy RAID solutions and filesystems of today.
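The idea behind correcting bad sectors on the fly can be sketched in a few lines. This is a toy model of checksum-verified reads with repair from a redundant copy, not ZFS's actual code or on-disk format: the function names are mine, SHA-256 stands in for whatever checksum the pool uses, and an unreadable sector is modelled as `None`.

```python
import hashlib
from typing import List, Optional


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def self_healing_read(copies: List[Optional[bytes]], expected: str) -> bytes:
    """Return verified data and overwrite every bad or unreadable copy with it."""
    good = next((c for c in copies
                 if c is not None and checksum(c) == expected), None)
    if good is None:
        raise IOError("no copy matches the checksum; nothing to repair from")
    for i, c in enumerate(copies):
        if c is None or checksum(c) != expected:
            copies[i] = good  # repair the damaged copy in place
    return good


# Two-way mirror where the first disk has an unreadable sector.
mirror = [None, b"important data"]
print(self_healing_read(mirror, checksum(b"important data")))
print(mirror)  # both copies are good again
```

Note that the repair needs at least one good copy, which is why a pool without redundancy can detect this kind of damage but not fix it.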
Oh, and ZFS likes those WD Green harddrives just fine. ZFS works very well with cheap harddisks. Headparking is a feature that other harddrives have too, including 7200rpm ones. It can be disabled by setting APM to 254. I believe only WD uses a persistent APM setting which survives a power cycle. Other vendors - due to patents - may only implement a volatile equivalent. So one could argue the headparking issue is the least severe on WD. Funny, isn't it?
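For reference, on Linux the APM level is usually set with `hdparm -B`. The sketch below only builds that command; the helper name and the `/dev/sdX` path are placeholders of mine. Whether a given drive accepts the setting, and whether it really stops the headparking, depends on the drive and its firmware.

```python
from typing import List


def apm_command(device: str, level: int = 254) -> List[str]:
    """Build (but do not run) an hdparm call that sets the APM level."""
    # hdparm accepts 1-255: 254 is the highest-performance level,
    # 255 turns APM off entirely on drives that support doing so.
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    return ["hdparm", "-B", str(level), device]


print(apm_command("/dev/sdX"))
```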
Challenge the authority
I invite you to challenge everything, as long as you provide good analyses and arguments.
Cheers,
sub.mesa