
WD Red NAS HDs, hard to find stock?

The point is data consistency. Just because you have never worked with it doesn't mean you should tell everyone "never."

Huh? I was not the one who used the word "blindly" first, nor did I use the word "never". You're putting words into my mouth. You still haven't qualified what you meant by Windows and Linux software RAID "queries all drives, does the XOR and verifies the result" unless the feature is disabled. Please tell me exactly what feature you're referring to. Which mdadm flag are you referring to?

This $50 highpoint card is doing it right now.

It is doing what, exactly? And what model is it? And what RAID level?


You can and should also turn on scrubbing in md and Windows if you are using it. However, since the author of md has taken the position that he can't be bothered to add proper scrubbing and parity checking on read, I wouldn't use md personally.

Define proper scrubbing, and state how md doesn't do that. There are two kinds of md scrubs. Both do parity checking. In the case of RAID 1, 4, 5 what do you expect the behavior of RAID to be when there is a data to parity mismatch found on read?
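For reference, the two kinds of md scrub being asked about here are both triggered through sysfs; a minimal sketch, assuming an array at /dev/md0:

```shell
# "check": read every member, compare data against parity (or mirror copies),
# and count mismatches without writing anything.
echo check > /sys/block/md0/md/sync_action

# "repair": the same pass, but on a mismatch it rewrites parity (or a mirror
# copy) to make the stripe consistent again.
echo repair > /sys/block/md0/md/sync_action

# Mismatch count found by the last check/repair pass:
cat /sys/block/md0/md/mismatch_cnt
```

Both passes do full parity checking; the difference is only whether mismatches get rewritten.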

I'll stick to the $50 -> $150 cards I use at home where I can turn this functionality on.

See, and I'd sooner use ZFS or even btrfs than cheap sub-$150 cards.
 
Amazon finally has the 2TB WD Red in stock at $134, which is quite reasonable.

In contrast, the Hitachi drive is $234.

WD RE4 is $199

Yes, that makes sense. The Red is consumer, with a UER of 1 in 10^14 and a 5400RPM rotation speed and commensurate access times; the RE4 is nearline SATA 7200RPM with a UER of 1 in 10^15; the Ultrastar is enterprise SATA or SAS 7200RPM with a UER of 1 in 10^16.

If you're using a resilient file system, or the data is replaceable (i.e. not that important) then you're fine taking the small risk of corruption. If not, get a better drive.

My plan is to get 3 drives for RAID-Z1 for moderately important home files. Would you spend the money for the Enterprise versions? Running ZFS on an S3200 with dual GbE, 8GB ECC.

No. I would sooner spend the money on more cheap (or even cheaper) drives, and create a 2nd independent ZFS pool (striped pool or RAIDZ), and use ZFS send to keep the original data cloned. That way you have a backup. You could schedule this so that it happens once a week (or whatever) and otherwise spin down that 2nd set of disks.
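The ZFS send clone described above can be sketched like this (pool names "tank" and "backup" are hypothetical, as is the weekly naming):

```shell
# Snapshot the source pool recursively, then seed the backup pool once:
zfs snapshot -r tank@weekly-1
zfs send -R tank@weekly-1 | zfs recv -F backup/tank

# On the next scheduled run, take a new snapshot and send only the delta:
zfs snapshot -r tank@weekly-2
zfs send -R -i tank@weekly-1 tank@weekly-2 | zfs recv -F backup/tank
```

Because the receive side is a real, scrubbable pool, this is an actual backup rather than just redundancy, and the second set of disks can stay spun down between runs.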

And this is for a NAS? You're using 1GigE and wireless? The access times and performance rate of going to enterprise is not going to make much difference for that application. This is in a home or small office?
 
FWIW, you should check out NAS4Free. It uses an older GUI than FreeNAS, but it is based on FreeBSD 9 and ZFS v28 instead of v15. Another option for you is NexentaStor Community Edition, which is free up to 18TB of storage (it's Illumos-based).
 
They are trickling in. They showed up on Newegg two days ago for $120 just to be sold out an hour later. When they restocked the next day, I bought a set at $120 each. Promptly sold out again.

Listing on Newegg at $160 and $140 on Amazon (3rd party) this am.

Edit : talking 2TB version here.
 
The point is data consistency. Just because you have never worked with it doesn't mean you should tell everyone "never." This $50 highpoint card is doing it right now. You can and should also turn on scrubbing in md and Windows if you are using it. However, since the author of md has taken the position that he can't be bothered to add proper scrubbing and parity checking on read, I wouldn't use md personally. I'll stick to the $50 -> $150 cards I use at home where I can turn this functionality on.
To rephrase, how is it checking the data? The OS wants to see 512B or 4KB sectors, and the controller gives it stripesize*drivecount, which should be aligned multiples, but containing (stripesize*drivecount)/fsblocksize blocks each, and nothing else. If it is using additional space for its own ECC info, how much overhead is it adding, and how is it spread across the drive?

The core problem with RAID 1 is that if drive A got bad data to write, and drive B got good data, or they were updated out of sync somehow, the drive could report no error on that sector, but one set of data wouldn't match. At that point, you would need every FS block CRC'ed, and FS awareness of the RAID, or RAID driver awareness of the FS, so that the data could be verified against the FS' CRCs. And, that's the easy scenario, and one which probably gets corrected well under Windows, I would imagine (but, I'm not sure).

It would also be possible, with a bad PSU or power failure, or a driver-level bug, to have bad writes to both drives, leaving the data in a state that can be read as incorrect, but where figuring out whether either copy, or which one, is correct would be difficult to impossible without FS integration.

Unless the RAID controller explicitly stores the correct CRCs somewhere, you won't even be able to tell which is which. Scrubbing without an additional software layer of checking and correction only prevents the easy errors: 1 drive reporting a bad CRC, one a correct CRC, or both bad CRCs (bad block, data integrity compromised). Anything but that or bitrot won't be caught, much less prevented or corrected. OTOH, if it does, what mechanism is it using to not break RAID 1, yet also not harm performance by needing to perform several seeks per write?

RAID 5 ends up in a similar boat.

The correct solution is to invalidate some, "free space," and write new data there, rather than overwriting the old data in-place. This pretty much necessitates either a filesystem-integrated method; or a fully proprietary RAID arrangement, with some additional space per stripe allocated for error checking[ and correction] data, some dedicated EC[C] stripes, etc..

I'm not convinced that going from 10^-14 to 10^-15 to 10^-16 is really worth a damn, except when in a data center environment (at which point it will still only allow the drives to reach around parity with the consumer drives, in errors/time). Commodity hardware has so many sources of light failure that a pure software answer to data corruption will offer many more orders of magnitude the robustness, in practice (UER specs are assuming the drive got good data, has a good RAM chip, and there was no other error in the system--we really want to find those other errors, too, especially the ones that we aren't expecting, because expected errors tend to be prevented errors), not unlike it has for decades with networking protocols (where each layer, for protocols that care about integrity, will be made to not trust the layer it rides in). Traditional RAID, however, doesn't do that. It fundamentally trusts the drive controller(s) and disks.
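The "fundamentally trusts the drives" point can be illustrated with a toy XOR stripe: parity detects that something is wrong, but cannot say which member lied. The byte values below are made up; this is a sketch, not any controller's actual layout:

```shell
# Three data chunks (single bytes) and their XOR parity, as written:
d0=170; d1=204; d2=85
p=$(( d0 ^ d1 ^ d2 ))              # parity chunk

# Silent corruption: d1 comes back with a flipped bit, and the drive
# reports no read error.
d1_bad=$(( d1 ^ 8 ))

# A full-stripe XOR check detects the inconsistency...
check=$(( d0 ^ d1_bad ^ d2 ^ p ))
[ "$check" -ne 0 ] && echo "stripe inconsistent"

# ...but the controller cannot locate the bad chunk: recomputing any one
# member from the other three yields a stripe that checks out, so "repair"
# is a coin flip between trusting data and trusting parity.
echo "parity says d1 = $(( d0 ^ d2 ^ p ))"           # the original value
echo "data says p = $(( d0 ^ d1_bad ^ d2 ))"         # equally consistent
```

That is exactly why a checksum layer above the RAID (a CRC per FS block) changes the picture: it supplies the independent vote the XOR alone lacks.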

They are trickling in. They showed up on Newegg two days ago for $120 just to be sold out an hour later. When they restocked the next day, I bought a set at $120 each. Promptly sold out again.

Listing on Newegg at $160 and $140 on Amazon (3rd party) this am.
They are a nice Goldilocks drive series. Right now, cost and availability kind of suck, but that will work itself out over the next few months, for sure.
 
Cerb, first I am going to be all over the place with this because I don't ever think there is one 'best' way to do home NAS units / backups. I also don't have time to completely proof what I wrote atm.

The process varies between controllers and the configuration of the hardware and software. Here is why it is all over the place and not "all consumer drives suck / are great / whatever":

Issues that affect consumer level drives:
DIF is not commonly supported (but it is often added to consumer SATA drives used in arrays; e.g. Seagate Barracuda drives can do 520-byte sectors with the proper firmware). These are the same drives the consumer can pick up at Microcenter. However, consumer SATA controllers would simply barf when talking to them.
Some SATA and NL-SATA/NL-SAS drives report write success and write completed before the data gets to the disk, for performance reasons. This gives you the issue you mentioned above, where the drive could silently fail. This is a firmware thing and can often be configured with the manufacturer's tools (the Barracuda can do this).
Some of the cheaper SATA drives may simply drop the ECC bit blocks from the sectors to gain a larger capacity.

Issues with Enterprise drives:
SAS tends to have a higher cost
SAS drives are generally smaller
NL-SAS is available in common consumer sizes but cost more [typically]

Benefits with Enterprise drives:
Performance
Better rated UERs, etc.

So, assuming you did your homework and bought drives that at least have the basics:
The controllers can also often be configured to handle errors in differing ways.
They can be configured to read all sectors and do the XOR checking on all reads. The cheap PERC6i I got for $60 can do this for example.
They can be configured to do periodic "idle" scrubbing. PERC6i again
A disk CRC error on read should cause the controller to attempt to get enough data to rebuild, attempt to rewrite the defective sector, and remap it to a spare cluster at the controller level. The PERC6 again does this, as does the "H" series Dell stuff.

Up at the filesystem level there is more going on and obviously can vary a ton.

ZFS has a lot of the error correction built into it and can handle running on junk controllers and ECC-less drives without an issue. The trade-off is a certain amount of overhead and performance. ZFS itself has its own tuning that can drastically improve its performance but also drastically reduce consistency (especially if, say, power is lost or something panics the kernel). The fact that ZFS merges a lot of the layers has a certain "coolness" factor, but it is currently mostly limited to BSD, as the other ports do work but tend to be older revs. Also, as is typical with the CLI OSes, if something does go wrong, the home consumer is normally left in mystery land with error messages. Granted, many of the GUIs are getting better at this.

In general I think most people need to look at all this stuff if they really care about their data.

If you really don't want to lose stuff you need to keep multiple copies. That renders most of the stuff we were arguing here moot because a RAID error or ZFS error that loses a cluster is no longer a concern.

In reality, for most home consumers... I recommend they buy a "black box" like Synology. If they want to roll their own, I recommend they decide what matters to them (some people are scared of BSD / Linux / Windows, etc.) and then buy drives that work well with the hardware they select. Going with a used SAS card off eBay, the controller will likely expect drives with low read error recovery and the like, so you pick drives that either come like that from the factory or ones that have tools to configure it.

If they want to play in BSD land with zpools, then they can buy cheap SATA PCIe cards and toss whatever drives they want in the pool and let ZFS handle the drive issues as they occur. They might find themselves using Google for a few hours if something goes wrong, though.
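The "let ZFS handle the drive issues" part mostly comes down to a few commands (the pool name "tank" is hypothetical):

```shell
# Read every allocated block, verify checksums, and repair from redundancy
# where possible:
zpool scrub tank

# Per-device read/write/checksum error counters, plus any files ZFS could
# not repair:
zpool status -v tank

# Reset the error counters after fixing cabling or replacing a drive:
zpool clear tank
```

Scheduling the scrub from cron (weekly or monthly) is the usual approach for a home pool.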

I also tell them that: a) RAID is not a backup. Just because the consumer thinks it is, it isn't. b) ZFS itself is not a backup; see a). If you care about the data, back it up to something else, whether it be "the cloud" or a second NAS.
 
Some of the cheaper SATA drives may simply drop the ECC bit blocks from the sectors to gain a larger capacity.

(...) ECC-less drives without an issue
These would be my only points of contention/disagreement. AFAIK, no drives except for "AV" types (sadly, I do know that many people buy these because they are cheap) make any significant error-checking compromises. The SNR has just been getting too low for them to be able to go without decent ECC on the drives. It's not exposed to you, but it's there and being used by the drive.

Beyond that, it's been your presentation, by and large. Flashing drives to use 520B sectors and utilizing a 520B-supporting controller go beyond RAID, and beyond common disks. That's not the sort of thing most people would assume from limited information like "$50 highpoint card," all of which look like they have no such support (the cheap SAS one is $70, and a model can only be inferred after your last point). OTOH, it still trusts the drives and controllers, so it is far from eliminating the usefulness of an added layer of checking.
 
I found this article on how soon an unrecoverable error occurs at a sustained transfer rate, based on UER. Granted, this is a single bit, not a whole file. But we also don't know what data is affected, nor do we have a choice. What is affected matters in how graceful the failure is. If it's within a critical part of the file system itself, in theory a whole directory of images could be lost.
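The back-of-envelope arithmetic behind that kind of article is simple (the 100 MB/s sustained rate below is a hypothetical, not from the article):

```shell
# A UER of 1 in 10^14 bits means one unrecoverable read error expected
# per 10^14 bits read.
bits=$(( 10**14 ))
bytes=$(( bits / 8 ))                 # 12,500,000,000,000 bytes, ~12.5 TB
echo "~$(( bytes / 10**12 )) TB read per expected error"

# At a sustained 100 MB/s, reading that volume takes:
secs=$(( bytes / 100000000 ))
echo "~$(( secs / 3600 )) hours of continuous reading"
```

So a consumer 10^-14 drive statistically hits one such error per ~12.5 TB read, which is why the spec starts to matter for multi-TB rebuilds; each order of magnitude on the UER multiplies that volume by ten.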

The controllers can also often be configured to handle errors in differing ways. They can be configured to read all sectors and do the XOR checking on all reads. The cheap PERC6i I got for $60 can do this for example.

Is this the card? Because I don't know what feature you're referring to, if you aren't just referring to scrubbing. Or what you mean by configuring them to handle errors in different ways. But you said XOR so that means RAID 5 or 6.

In normal operation, i.e. the array is not degraded and the disks are returning the data requested without any read errors, the RAID controller reads only the data chunks, not parity.

Since you aren't using the commonly accepted term "scrubbing" it sounds like you are suggesting something else. When you say "XOR checking on all reads" sounds like you think it does this during normal operation. If you are suggesting this, here is why this is wrong for RAID 5.

If a data chunk and parity chunk mismatch, in normal operation (i.e. the disk has reported no read error for either one), you have two rather significant problems:

1. It is unknowable to the RAID controller which is right and which is wrong. There is no mechanism for the controller to choose between them. Therefore there is zero point in doing constant "XOR checking" on literally all reads in normal operation.

2. Any mismatch between a data chunk and parity almost certainly indicates a hardware problem. It's even really unlikely to be a controller firmware problem. But regardless, normal operation is not the time to be learning about this, it's what scrubs are for.

For RAID 6, you could do constant parity checking but that's unnecessary with regular scrubs. So I don't know what problem this is designed to solve if there are even products that do this. It's a lot more expensive to compute parity and check it, than to compute a CRC at a file system level (btrfs, ZFS, ReFS) and have that checked.
 
I found this article on how soon unrecoverable error occurs at sustained transfer rate based on UER. Granted, this is a bit. It's not a whole file. But we also don't know what data is affected, nor do we have a choice. What is affected, matters in how graceful the failure is. If it's within a critical part of the file system itself, in theory a whole directory of images could be lost.



Is this the card? Because I don't know what feature you're referring to, if you aren't just referring to scrubbing. Or what you mean by configuring them to handle errors in different ways. But you said XOR so that means RAID 5 or 6.

In normal operation, i.e. the array is not degraded and the disks are returning the data requested without any read errors, the RAID controller reads only the data chunks, not parity.

Since you aren't using the commonly accepted term "scrubbing" it sounds like you are suggesting something else. When you say "XOR checking on all reads" sounds like you think it does this during normal operation. If you are suggesting this, here is why this is wrong for RAID 5.

If a data chunk and parity chunk mismatch, in normal operation (i.e. the disk has reported no read error for either one), you have two rather significant problems:

1. It is unknowable to the RAID controller which is right and which is wrong. There is no mechanism for the controller to choose between them. Therefore there is zero point in doing constant "XOR checking" on literally all reads in normal operation.

2. Any mismatch between a data chunk and parity almost certainly indicates a hardware problem. It's even really unlikely to be a controller firmware problem. But regardless, normal operation is not the time to be learning about this, it's what scrubs are for.

For RAID 6, you could do constant parity checking but that's unnecessary with regular scrubs. So I don't know what problem this is designed to solve if there are even products that do this. It's a lot more expensive to compute parity and check it, than to compute a CRC at a file system level (btrfs, ZFS, ReFS) and have that checked.

Yes, that one. Specifically, using the LSI tools it can be set to a RAID 5 consistency mode where it uses the Intel IOP333 XOR offload engine that is on the card to a) do idle-time disk checks and b) do verify-on-read. It can also do this in RAID 6.
 
Specifically using the LSI tools it can be set to a RAID5 consistency mode where it uses the Intel IOP333 XOR offload engine that is on the card to a) do idle time disk checks and b) do verify on read. It can also do this in RAID6.

The documentation on "Consistency Checks" indicates it's only a background operation. It's not a mode or part of normal operation. You can schedule them, or run them manually. It's not a persistent setting. You can choose to have it abort the check when there is an inconsistency, instead of automatic correction. Basically, it's a scrub that you can set to be read-only (a check), or read-and-automatically-correct (repair).

The documentation does not distinguish between two kinds of inconsistencies possible, however. And you keep avoiding this as well.

A read error is a disk error. It's not ambiguous how RAID should behave when one is encountered.

A mismatch is a RAID error. In RAID 1 and 5 it is ambiguous how to fix the problem when two RAID 1 sectors which should be identical aren't, or when a RAID 5 data chunk doesn't match the parity chunk. The "Consistency Check" documentation for your controller indicates inconsistencies are automatically repaired, but for RAID 1 and 5, such a setting will choose incorrectly 50% of the time. Do you understand this?

For RAID 6, there is a relatively unambiguous way of determining which chunk is incorrect.

Having said that, I come back to your statement disputing that RAID controllers blindly trust the drive, and ask you what you mean. Because in fact the RAID controller you are citing behaves this way: it defers to the drive blindly.

Even in RAID 6 mode it does this, it's just that it has extra XOR so it can (statistically but not conclusively) resolve any mismatches.
 
You utilize RAID-5E Rotating Parity N with Data Continuation (PRL=15, RLQ=03) and the DDF and hotspare blocks in the extent to store check data. You then either a) choose to scrub, so the array reads all of the sectors and verifies the data, or b) have it do verify on reads. During writes you lazily calculate either XOR or CRC and store them in the HS blocks.
 