WD Red or WD RE4 drives for ZFS NAS?

jdcooper · Sep 9, 2012

I'm planning to build a ZFS NAS with FreeNAS. The initial plan is 5x1TB drives in RAID-Z2. My intent is to build the server using ECC RAM.

Is it crazy to consider the WD Red drives? On paper, they seem like a good choice, but I haven't heard a lot of reports of people actually using them yet. The RE4 on the other hand seems tried and true.

Jon

Arcanedeath · Sep 9, 2012

The RE4 will perform better and had a longer warrenty so I'd go with those.

jdcooper · Sep 11, 2012

Thanks. They seem to be the least expensive drive I can find that a) is suitable for RAID, b) doesn't have horrendous reviews on Newegg, and c) has a >1y warranty.

imagoon · Sep 11, 2012

I use the 2TB Red drives in my home NAS/Vmware server. I am easily pulling 90MB/s off of 3 in RAID 5 (from windows copy not VMware's monitor). The Blacks are a bit faster but not a ton really.

More info:

2TB RED 3 disk RAID 5 -> 1TB BLACK 3 disk RAID 5 on the same controller averaged at 90MB/s over 6gig of data. Copied from another VM running 2008R2 -> 2012 via SMB.

thelastjuju · Sep 11, 2012

Its all about longevity. The 5 year warranty on BLACKS and RE4's reflects how much confidence the manufacturer has in them.

The WD Reds have a decent 3 year warranty, but just haven't been out long enough to get a good read on them. Plus their prices keep going through the roof for no apparent reason. But I would trust them far over any GREEN offerings out there.

The $10 difference (between the REDS --> BLACKS/RE4's) is well worth the extra 2-year warranty, without a doubt.

:thumbsup:

Remember, the HDD probably has the highest % chance of failing out of every component.

Insert_Nickname · Sep 11, 2012

thelastjuju said:
Its all about longevity. The 5 year warranty on BLACKS and RE4's reflects how much confidence the manufacturer has in them.

The WD Reds have a decent 3 year warranty, but just haven't been out long enough to get a good read on them. Plus their prices keep going through the roof for no apparent reason. But I would trust them far over any GREEN offerings out there.

The $10 difference (between the REDS --> BLACKS/RE4's) is well worth the extra 2-year warranty, without a doubt. :thumbsup:

Remember, the HDD probably has the highest % chance of failing out of every component.

Cant argue with you there...

I did have one black fail on me though, must have been a monday drive...

imagoon · Sep 11, 2012

Not sure about how ZFS works but the Red are supported on RAID cards. TLER of less than 7 seconds etc. The blacks are not (anymore). If zfs can handle the 90 second recovery then the blacks are fine. Otherwise you need RE4 or Reds depending on your performance needs.

jdcooper · Sep 12, 2012

Thanks everybody. I think I'm going to hold out until I can find a decent price on the RE4s and just do that. The extra 2 years' warranty for a few extra bucks seems worth it.

I have read that the RE4s have a tendency to run hot so I will get some first rate fannage on them too.

murphyc · Sep 23, 2012

imagoon said:
Not sure about how ZFS works but the Red are supported on RAID cards. TLER of less than 7 seconds etc. The blacks are not (anymore). If zfs can handle the 90 second recovery then the blacks are fine. Otherwise you need RE4 or Reds depending on your performance needs.

The advantage of shorter recovery, which then results in a read error, is the RAID (or the file system in the case of redundant data ReFS, btrfs, and ZFS) can find the data elsewhere, be it a mirrored copy or reconstructed from parity. So the recovery of the whole storage system is simply faster rather than it waiting. In any case, the bad LBA that originally caused the error will be overwritten with corrected data.

Sounds like regular scrubs aren't being done, and almost certainly regular SMART extended offline tests aren't being done. If you're scrubbing a drive with very high ERC value, if a bad sector is recovered before that time out, the file system (or RAID implementation) won't question the delay. Nevertheless, the scrub, if a read error does occur at some point, it's going to get fixed at that point, rather than just persist until one day you try to read a totally unreadable sector and then end up with a stalled storage system while one disk tries deep recovery on a handful of sectors.

Anyway, more proactive scrubbing and testing seems warranted by most people having these problems in their storage systems. But ZFS can use either kind of drive, it's just a matter of how fast a read error finally is produced and thus how fast the file system recovers. It wouldn't otherwise risk the whole array (or storage pool).

imagoon · Sep 24, 2012

murphyc said:
The advantage of shorter recovery, which then results in a read error, is the RAID (or the file system in the case of redundant data ReFS, btrfs, and ZFS) can find the data elsewhere, be it a mirrored copy or reconstructed from parity. So the recovery of the whole storage system is simply faster rather than it waiting. In any case, the bad LBA that originally caused the error will be overwritten with corrected data.

Sounds like regular scrubs aren't being done, and almost certainly regular SMART extended offline tests aren't being done. If you're scrubbing a drive with very high ERC value, if a bad sector is recovered before that time out, the file system (or RAID implementation) won't question the delay. Nevertheless, the scrub, if a read error does occur at some point, it's going to get fixed at that point, rather than just persist until one day you try to read a totally unreadable sector and then end up with a stalled storage system while one disk tries deep recovery on a handful of sectors.

Anyway, more proactive scrubbing and testing seems warranted by most people having these problems in their storage systems. But ZFS can use either kind of drive, it's just a matter of how fast a read error finally is produced and thus how fast the file system recovers. It wouldn't otherwise risk the whole array (or storage pool).

Good to know for ZFS. In RAID, the drive will get dumped after about 10 seconds so it will force a rebuild. During read recovery the drive basically goes unresponsive. If ZFS can handle a unresponsive drive for 90 seconds without doing recovery actions then it is fine.

murphyc · Sep 24, 2012

There is a disadvantage of fast error recovery if a drive fails, leaving your RAID degraded. Now, in degraded mode, the RAID doesn't have a way to recover from these fast errors. So really what should happen is all working drives in a degraded array would have SCT ERC timeout set to a higher value to compel the drive to recover what's now the only copy of data. Premature early read errors with low ERC timeout would cause the whole array to fail.

murphyc · Sep 24, 2012

imagoon said:
In RAID, the drive will get dumped after about 10 seconds so it will force a rebuild.

This is between misleading and false. The timeout for a RAID controller dropping a drive is variable. If you have a controller that drops drives after 10 seconds, but you're buying drives that have longer recoveries, then you're a bad sysadmin. The controller timeout needs to be compatible with the drive timeout.

And software RAID will wait for the drive to conclusively deliver the data or a read error.

If ZFS can handle a unresponsive drive for 90 seconds without doing recovery actions then it is fine.

Pretty much anything can handle an unresponsive drive for 90 seconds without dumping the drive: either by default such as resilient file systems, or software RAID, or hardware RAID controllers which are typically configurable but default to far lower timeouts. But those same controllers, if they aren't junk, expect to have drives with lower SCT ERC settings, and also have full ECC so the likelihood of read errors is far lower from the get go. Unlike consumer disks.

imagoon · Sep 24, 2012

murphyc said:
There is a disadvantage of fast error recovery if a drive fails, leaving your RAID degraded. Now, in degraded mode, the RAID doesn't have a way to recover from these fast errors. So really what should happen is all working drives in a degraded array would have SCT ERC timeout set to a higher value to compel the drive to recover what's now the only copy of data. Premature early read errors with low ERC timeout would cause the whole array to fail.

The whole point of a fast failure is to allow the RAID controller to handle the read error which would normally either be a sector rewrite or a sector remap. IE you want it to fail fast so the drive does not fall out of the array. A single sector read error will not dump a drive unless the error count goes above a count specified in the controller or until it runs out of sector remappable space.

Long read timeouts (90 seconds) cause a disk drop which will require a rebuild and a degraded array until the rebuild is complete.

You seem to think that tler at 7 seconds means the drive drops. All it means is the drive reports a failed read to the controller. 90 second tler values like consumer drives is the same thing. The issue is the 90 second drives go unresponsive longer than the controller allows and drops the disk.

imagoon · Sep 24, 2012

murphyc said:
This is between misleading and false. The timeout for a RAID controller dropping a drive is variable. If you have a controller that drops drives after 10 seconds, but you're buying drives that have longer recoveries, then you're a bad sysadmin. The controller timeout needs to be compatible with the drive timeout.

And software RAID will wait for the drive to conclusively deliver the data or a read error.

Pretty much anything can handle an unresponsive drive for 90 seconds without dumping the drive: either by default such as resilient file systems, or software RAID, or hardware RAID controllers which are typically configurable but default to far lower timeouts. But those same controllers, if they aren't junk, expect to have drives with lower SCT ERC settings, and also have full ECC so the likelihood of read errors is far lower from the get go. Unlike consumer disks.

I am not sure where you are going with this. Software RAID is another animal. Hardware controllers typically have a 10second timeout. Most disks with tler set to 7 seconds will fail properly in this time allowing the RAID controller to do its job, IE rebuild the sector and take what ever recovery actions it needs to do.

Your jab at me about a bad sysadmin is misdirected because it was you that has fabricated that I am doing this. I simply said: drives with 90 second TLER values will more than likely get booted from the array. I am not as well versed in linux but the Windows software RAID will also boot the disk before 90 seconds. It will not wait for a conclusive read error if the disk appears to be hung. If the disk appears hung, it boots it and performs recovery actions. Hardware RAID controllers generally perform the same way except they tend to be even tighter ie the near standard of 10 seconds before booting a non-responsive disk.

murphyc · Sep 24, 2012

imagoon said:
The whole point of a fast failure is to allow the RAID controller to handle the read error which would normally either be a sector rewrite or a sector remap. IE you want it to fail fast so the drive does not fall out of the array.

Long read timeouts (90 seconds) cause a disk drop which will require a rebuild and a degraded array until the rebuild is complete.

You failed to read what I wrote. I am referring to an already degraded array. The disadvantage of short error recovery for an already degraded array made of consumer disks is that it increases the chances that the whole array will fail. There is no recovery for a degraded array, any additional read errors result in data loss.

Ideally, upon going degraded, the remaining disks should have higher SCT ERC timeouts set. You should not be using consumer disks with RAID controllers that expect fast recoveries and aren't otherwise configurable.

I think we're going to find a new category of failures with Red disks being used in arrays with controllers expecting fast recoveries, because they will have the same error rate as consumer drives, but don't have the better ECC that enterprise disks have.

Basically what's happening is people are buying improperly matched hardware. And your blanket assertion that long timeouts lead RAID to drop disks is not always true - it depends on the RAID implementation, and how the RAID controller is configured. Not all have a problem with long ERC times.

murphyc · Sep 24, 2012

imagoon said:
I am not sure where you are going with this. Software RAID is another animal. Hardware controllers typically have a 10second timeout. Most disks with tler set to 7 seconds will fail properly in this time allowing the RAID controller to do its job, IE rebuild the sector and take what ever recovery actions it needs to do.

1. I am referring to an already degraded array. The controller cannot do squat because it's already on its last copy of all available data. If there is a read error on a degraded array, there is data loss.

2. Hardware RAID is usually anything but except better cards in the $150+ range. But if that's what you're using, they are configurable. If they're not configurable *and* you're using consumer disks, then you're not a good sysadmin.

3. Controllers don't rebuild or reallocate sectors. Upon read error, either software or hardware RAID will find the data from mirrored copy or reconstruct from parity, and then write the correct data to the LBA on the disk that previously reported the failed read. It is the disk itself that reallocates the sector *IF* there is a persistent write error and there are spare sectors on the disk.

Your jab at me about a bad sysadmin is misdirected because it was you that has fabricated that I am doing this.

I never said you were doing this. I never called you personally a bad sysadmin. It's a blanket statement that people mismatching their hardware are bad sysadmins. And we see this *all the time* on various forums where people were pairing WDC Green drives in RAID 5 configurations and whining about multiple disk failures. They wanted cheap, they got cheap. And when told these disks were only meant for RAID 0 and 1, not RAID 4 or 5 or 6, they complained more.

I simply said: drives with 90 second TLER values will more than likely get booted from the array.

Untrue statement. In the consumer realm, many of those "hardware RAID" are actually software RAID on a card or in an enclosure, and are quite tolerant of high SCT ERC recoveries.

I am not as well versed in linux but the Windows software RAID will also boot the disk before 90 seconds. It will not wait for a conclusive read error if the disk appears to be hung. If the disk appears hung, it boots it and performs recovery actions.

How it's justifiable to have software RAID attempt fast recovery on slow recovery drives is bizarre.

This does not make sense at all for a software RAID to behave this way, so I find it difficult to believe it's true. Software RAID is overwhelmingly the domain of pairing with consumer disks, widely known to have slow error recoveries. For Windows to do this with RAID 0 arrays would cause large numbers of total array failures, yet I haven't seen this at all. And further, it doesn't make sense for the underlying software RAID to be less tolerant than NTFS, which even if it receives a read error from a disk (not in an array) it will insist the drive retry. So you can get really long recoveries on Windows with NTFS because of this.

Hardware RAID controllers generally perform the same way except they tend to be even tighter ie the near standard of 10 seconds before booting a non-responsive disk.

That is a piece of hardware that comes from a world that expects to be paired with enterprise disks. Those disks have vastly better ECC than consumer disks. If they haven't corrected the error in even a few seconds, it isn't recoverable data. That's why they fail quickly. That is simply not the case with consumer disks, which is why you get retries. Their ECC isn't as good. It's much much slower. So my point is if any person buys hardware RAID controllers with 10 second timeouts, and they pair it with disks that have longer than 10 second recoveries, they're bad sysadmins.

In enterprise installations, this whole question never comes up. Everything works exactly as designed. The problem arose when people started buying consumer drives with slow error recovery, and put them into a situation where they'd get booted sooner than they should.

imagoon · Sep 24, 2012

murphyc said:
You failed to read what I wrote. I am referring to an already degraded array. The disadvantage of short error recovery for an already degraded array made of consumer disks is that it increases the chances that the whole array will fail. There is no recovery for a degraded array, any additional read errors result in data loss.

Ideally, upon going degraded, the remaining disks should have higher SCT ERC timeouts set. You should not be using consumer disks with RAID controllers that expect fast recoveries and aren't otherwise configurable.

I think we're going to find a new category of failures with Red disks being used in arrays with controllers expecting fast recoveries, because they will have the same error rate as consumer drives, but don't have the better ECC that enterprise disks have.

Basically what's happening is people are buying improperly matched hardware. And your blanket assertion that long timeouts lead RAID to drop disks is not always true - it depends on the RAID implementation, and how the RAID controller is configured. Not all have a problem with long ERC times.

Can you support your comment that the REDs do not have the same ECC as the Enterprise drives? Also can you show me where I stated long ERC always drop disks? My comment is based on life experience. Yes you can change the controller timeouts. However I have rarely ever been in a case where I want the RAID controller to wait for the drive for 90 seconds. During that time the LUN is unresponsive, I don't want data to halt for 90 seconds. I would rather that RAID controller do its job and rebuild the data rather than waiting and hoping the disk can do it. Giving the disk 7 seconds before having it tell the controller that there was an issue seems reasonable.

90 seconds tends to cause the applications sitting above the disk subsystem to start timing out. IE ESXi dropping the datastore, Windows dropping the virtual disk etc.

Looking at the testing WD has done for the red line I think WD and those major manufactures [Syndology, Dlink, Drobo, etc] generally agree otherwise they would not default to 10 second time outs.

As for the array rebuild failing due to short timeouts... The brands listed as compatible with WD simply mark unrebuildable sectors as bad [up to a certain limit, typically a few hundred to several thousand]. If there was enough bad sectors to fail the array rebuild, disk sweeping should have discovered the failing disk already.

3. Controllers don't rebuild or reallocate sectors. Upon read error, either software or hardware RAID will find the data from mirrored copy or reconstruct from parity, and then write the correct data to the LBA on the disk that previously reported the failed read. It is the disk itself that reallocates the sector *IF* there is a persistent write error and there are spare sectors on the disk.

This is untrue. Nearly all LSI cards build a LUN with spare sectors. I would take a good guess the other major companies do also. I am pretty sure Highpoint does/did this in the consumer realm. They don't assume the disk can remap, they also will remap LUN sectors in the LUN spare area.

murphyc · Sep 24, 2012

imagoon said:
Can you support your comment that the REDs do not have the same ECC as the Enterprise drives?

It's a SATA disk, not a SAS disk for one. The ECC between the two is different by spec. Second, the Red is not even an NL SATA disk.

Also can you show me where I stated long ERC always drop disks?

By inference. Your statements have been unqualified. For example:

Long read timeouts (90 seconds) cause a disk drop

In RAID, the drive will get dumped after about 10 seconds so it will force a rebuild.

However I have rarely ever been in a case where I want the RAID controller to wait for the drive for 90 seconds.

Then don't use drives that have longer time outs than the controller.

During that time the LUN is unresponsive, I don't want data to halt for 90 seconds.

Then don't use drives or controllers that have longer time outs than required for the application.

I would rather that RAID controller do its job and rebuild the data rather than waiting and hoping the disk can do it.

It can't rebuild data when the array is degraded. That is my whole frigging point that you keep ignoring. Once degraded, a premature read error will cause data loss in the array, and in many implementations read errors while rebuilding a replaced drive in a degraded array will cause the rebuild to halt because the data simply cannot be reconstructed - it's not there once you're getting read errors in a degraded state.

Giving the disk 7 seconds before having it tell the controller that there was an issue seems reasonable.

For SAS it's very reasonable. There's a reason why consumer disks take longer.

And for consumer SATA in RAID arrays, fast ERC is reasonable when the array is otherwise functioning normally, i.e. not degraded. But as soon as you lose a drive, fast recovery is not what you want. You want those drives to do everything they can to recover data because as soon as you have a read error from any remaining drive there is no other copy of the data. (And this applies to RAID 6 in a two failed disk degraded state; it functionally operates OK with minimal risk in a one failed disk degraded state because there's still additional parity in the face of read errors).

90 seconds tends to cause the applications sitting above the disk subsystem to start timing out. IE ESXi dropping the datastore, Windows dropping the virtual disk etc.

Who are using dirt cheap consumer disks in such an application and then bitching about long error recoveries? Bad sysadmins.

Looking at the testing WD has done for the red line I think WD and those major manufactures [Syndology, Dlink, Drobo, etc] generally agree otherwise they would not default to 10 second time outs.

Red will work better in the short term with periodic read errors. But it's immediately a higher risk in a degraded condition. Instead of the controller dropping the disk, the disk will drop its own sectors, and the controller will have no way to recover since the array is already degraded.

As soon as you're degraded you're going to want SAS disks with better ECC. Or you're going to want all of those Reds (and the controller) to tolerate a longer error recovery time. It doesn't need to be 90 seconds, but the whole point of having RAID is availability. If you're faced with the whole array collapsing versus merely 15 second or 30 second delays, I think people will choose delays than collapse. And then promptly build a higher quality RAID with higher quality disks.

As for the array rebuild failing due to short timeouts... The brands listed as compatible with WD simply mark unrebuildable sectors as bad [up to a certain limit, typically a few hundred to several thousand]. If there was enough bad sectors to fail the array rebuild, disk sweeping should have discovered the failing disk already.

How do you propose a RAID rebuild works when your only copy of data (either mirrored copy or via parity) is reported as bad by a remaining disk?

This is untrue. Nearly all LSI cards build a LUN with spare sectors.

Enterprise gear works differently, true. Why someone is using a high quality LSI card with consumer disks is mysterious, however. You will build your LUNs with a higher quality controller and SAS disks. Maybe you'd go a bit cheaper and use NL SATA. But regular consumer SATA? Risky. Why bother? The unrecoverable error rate of consumer SATA disks is an order of magnitude higher than enterprise SATA and two orders of magnitude greater than enterprise SAS.

blastingcap · Sep 24, 2012

Performance issues aside how bad is it to use cheap green drives instead of red or re4 drives? Red drives are sold out or pricegouged, and it is a lot cheaper to go green. I would not mind having to swap out drives more often. Any special settings for zfs necessary if using green drives?

tynopik · Sep 24, 2012

murphyc said:
It's a SATA disk, not a SAS disk for one. The ECC between the two is different by spec.

That's the interface and has nothing to do with how the data is actually stored on the disk.

murphyc said:
Or you're going to want all of those Reds (and the controller) to tolerate a longer error recovery time. It doesn't need to be 90 seconds, but the whole point of having RAID is availability. If you're faced with the whole array collapsing versus merely 15 second or 30 second delays, I think people will choose delays than collapse. And then promptly build a higher quality RAID with higher quality disks.

Um, the point is that RAID controllers aren't going to wait 30 or even 15 seconds, they'll drop the disk from the array and move on.

imagoon · Sep 24, 2012

It's a SATA disk, not a SAS disk for one. The ECC between the two is different by spec. Second, the Red is not even an NL SATA disk.

IOEDC/IOECC is supported on both SAS and SATA. It is on the Red / Black / Blue / RE4 / WD Enterprise drives. The ECC portion of SAS / SATA is not interface specific. It is an HDA thing.

Also can you show me where I stated long ERC always drop disks?
By inference. Your statements have been unqualified. For example:

Obviously there is nothing absolute in the IT world. However I will give you that in the consumer land where cheap junk is common, I could see them tuning the timeouts to prevent issues for the consumer to deal with

I would rather that RAID controller do its job and rebuild the data rather than waiting and hoping the disk can do it.
It can't rebuild data when the array is degraded. That is my whole frigging point that you keep ignoring. Once degraded, a premature read error will cause data loss in the array, and in many implementations read errors while rebuilding a replaced drive in a degraded array will cause the rebuild to halt because the data simply cannot be reconstructed - it's not there once you're getting read errors in a degraded state.

You say I keep ignoring it but quite simply, I don't agree with you. Most current controllers while rebuilding from a degraded status will mark any sectors that it can't rebuild as 'Bad' at the file system level and keep on rebuilding. This is justified in that losing the entire array is 'bad.' Losing several files is generally better. At this point you restore from backups the missing files and replace the other failed disk in the LUN and let it rebuild again. RAID isn't a backup so this is acceptable.

Giving the disk 7 seconds before having it tell the controller that there was an issue seems reasonable.
For SAS it's very reasonable. There's a reason why consumer disks take longer.

And for consumer SATA in RAID arrays, fast ERC is reasonable when the array is otherwise functioning normally, i.e. not degraded. But as soon as you lose a drive, fast recovery is not what you want. You want those drives to do everything they can to recover data because as soon as you have a read error from any remaining drive there is no other copy of the data. (And this applies to RAID 6 in a two failed disk degraded state; it functionally operates OK with minimal risk in a one failed disk degraded state because there's still additional parity in the face of read errors).

90 second recovery on consumer drives is reasonable because it isn't assumed there is another disk or group of disks available for consumer drives. Also, I am trying to locate the study, but there was evidence that if the drive failed to read a sector in 10 seconds that the odds that it would be able to read that sector in 90 seconds was pretty near zero. If I can find it I will post it.

As soon as you're degraded you're going to want SAS disks with better ECC. Or you're going to want all of those Reds (and the controller) to tolerate a longer error recovery time. It doesn't need to be 90 seconds, but the whole point of having RAID is availability. If you're faced with the whole array collapsing versus merely 15 second or 30 second delays, I think people will choose delays than collapse. And then promptly build a higher quality RAID with higher quality disks.

As per above (feel free to check the WD and Seagate specifications.) The SATA and SAS disks use the same ECC correction technology.

How do you propose a RAID rebuild works when your only copy of data (either mirrored copy or via parity) is reported as bad by a remaining disk?

From above:

You say I keep ignoring it but quite simply, I don't agree with you. Most current controllers while rebuilding from a degraded status will mark any sectors that it can't rebuild as 'Bad' at the file system level and keep on rebuilding. This is justified in that losing the entire array is 'bad.' Losing several files is generally better. At this point you restore from backups the missing files and replace the other failed disk in the LUN and let it rebuild again. RAID isn't a backup so this is acceptable.

Enterprise gear works differently, true. Why someone is using a high quality LSI card with consumer disks is mysterious, however. You will build your LUNs with a higher quality controller and SAS disks. Maybe you'd go a bit cheaper and use NL SATA. But regular consumer SATA? Risky. Why bother? The unrecoverable error rate of consumer SATA disks is an order of magnitude higher than enterprise SATA and two orders of magnitude greater than enterprise SAS.

Considering the Red drives in question are rated for 1,000,000 hours vs Blue and Green's 750,000. Black, RE4 and RE SAS are rated at 1,200,000. Seems like the cheap Red drives are a pretty decent compromise for slower mass storage. Considering except for the XE series, which is WD's "very high reliability" line. All of the WD enterprise lines are 1.2million.

All those numbers for expected life and none of them are an order of magnitude better than any other. SAS line would need to be 7.5million hours to just 1 order higher than the lowest rated consumer drive.

murphyc · Sep 24, 2012

blastingcap said:
Performance issues aside how bad is it to use cheap green drives instead of red or re4 drives? Red drives are sold out or pricegouged, and it is a lot cheaper to go green. I would not mind having to swap out drives more often. Any special settings for zfs necessary if using green drives?

Well the only reason why you'd get long error recovery on a green drive is due to persistent read errors (internal to the disk). The problem is that it's non-obvious when this is happening until it gets really bad. If you're looking at the full SMART attributes rather than just the single pass/fail health attribute, you can see these show up in either the raw error read rate, ECC errors, or pending sector errors. Some system logs might also show read delays, not just a disk that reports back a read error. So you kinda have to do some diagnostics periodically and most people don't do that. They like to pretend the state of a disk drive is binary: working vs not working.

So my suggestion is that you initiate ATA Enhanced Secure Erase on the drives periodically which will remove persistently bad sectors from use. You could also just zero the disk with a zeroing utility: this is easier for most people, but it's much slower, and it only zero's addressable sectors rather than all sectors. But if there are any bad sectors, it's equally effective at compelling the firmware to remove them from use, and substitute reserve sectors instead.

murphyc · Sep 24, 2012

Definition of periodically: I don't know. Once a year seems reasonable.

More often, perhaps once a month, I'd have the RAID scrubbed. There are different kinds of scrubs.

For example md "check" will only record mismatches, it won't attempt to fix them. But if the disk reports back to the md driver a sector read error, then md's normal recovery routine will find the data elsewhere (mirrored copy or from parity) and write the correct data to the same LBA that returned the read error, causing the disk firmware to determine if that sector has transient or persistent errors. If transient the write should fix it. If persistent, the write won't fix it, and the firmware will remap the LBA to a reserve physical sector.

And the md "repair" basically resyncs the whole array, and recomputes parity and if necessary writes new parity chunks if there are mismatches. Obviously there is an ambiguity here with conventional RAID 1; whereas ZFS mirroring can unambiguously determine which chunks are correct and fix the wrong ones.

Check is faster than repair. So it makes sense to schedule a repair and check the log. If there are errors, then do a repair. But even check makes certain kinds of repairs if there are sector read errors as reported by the disk. So it's an important maintenance item to do scrubbing of some kind for the array. As well as checking the detail version of SMART attributes.

murphyc · Sep 24, 2012

tynopik said:
That's the interface and has nothing to do with how the data is actually stored on the disk.

a.) Not always true. SAS disks regularly support 520 and 528 byte sectors for the purpose of better ECC.

b.) The ECC hardware in a SAS disk is faster and more accurate than what you'll find in consumer drives, even with the same amount of checksum information stored on disk.

Um, the point is that RAID controllers aren't going to wait 30 or even 15 seconds, they'll drop the disk from the array and move on.

Not true. It depends on the controller. And depends on how it's configured.

tynopik · Sep 24, 2012

murphyc said:
a.) Not always true. SAS disks regularly support 520 and 528 byte sectors for the purpose of better ECC.

b.) The ECC hardware in a SAS disk is faster and more accurate than what you'll find in consumer drives, even with the same amount of checksum information stored on disk.

the point is that such things are not defined/required in the spec

it's possible to make a SAS disk without such things and a SATA disk with such things

also, please explain how the hardware can be 'more accurate' with the same amount of parity data? either you have enough to recover or you don't

murphyc said:
Not true. It depends on the controller. And depends on how it's configured.

You can always find exceptions, but generally speaking, it is true

WD Red or WD RE4 drives for ZFS NAS?

Member

Platinum Member

Member

Diamond Member

Senior member

Diamond Member

Diamond Member

Member

Senior member

Diamond Member

Senior member

Senior member

Diamond Member

Diamond Member

Senior member

Senior member

Diamond Member

Senior member

Diamond Member

Diamond Member

Diamond Member

Senior member

Senior member

Senior member

Diamond Member