RE4 Hardware Differences from Caviar Black

Tsaar

Senior member
Apr 15, 2010
228
0
76
Up until a month ago I was using a VelociRaptor 300GB as my data drive (I don't store video, and my music collection is only 10-15GB).

I finally decided to make the leap to the 1TB Caviar Black. I downloaded about 400GB of Steam games since I had the space (I use a junction to my SSD for games that need the extra load performance).

It started out well, but over the last month this Caviar Black has begun rattling so loudly it can be heard from quite a distance. I have never returned an HDD, due to paranoia about personal data, but since this one is so new I am going to DBAN it and return it to Amazon (it is still within a month of purchase). I will then purchase a new drive.

Right now the RE4 is only $10 more than the Caviar Black. I am not too worried about the TLER being 7.5 seconds on the RE4 even though I will be in a single drive configuration (i.e. no RAID).

What I am wondering is whether RE4 drives are built to a higher manufacturing standard: better bearings, vibration tolerance, heat tolerance, etc. I have not been able to find a clear answer on this. The WD website makes it seem that the RE4 has higher reliability, but on some forums it seems the only difference is firmware.
 
Last edited:

gpse

Senior member
Oct 7, 2007
477
5
81
I would get another Black. I've had mine for almost a year now with zero issues! Perhaps you just got a lemon.
 

MarkLuvsCS

Senior member
Jun 13, 2004
740
0
76
I don't know what RE4 is, but what are you asking? I would always go WDC first, then Samsung..thx
Tweakboy, here is some information for you. Caviar Black and RE4 are both Western Digital drives; the latter is part of their enterprise line.

I don't think the drives will perform any differently, but for $10 I would probably get the RE4. Paying for enterprise/business-grade hardware is typically a waste of money for home use, but when the cost difference is this slim I'd usually spend the extra for peace of mind.

One benefit that isn't easily quantified: higher-end hardware can sometimes come with slightly better support.
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
RE4 is meant for RAID. It has 7-second time-limited error recovery (TLER) and is meant for continuous operation (less head unload/load, IIRC).

If you are not using a RAID controller, use the Black. It has 180-second deep cycle recovery to keep trying when there is a nasty bad sector. A RAID controller cannot communicate with the drive during that 180-second deep cycle recovery and will drop the drive from the array.

Black/Green/Blue -> no RAID
AV/AV-GP -> video surveillance (TLER=0)
RE4/RE4-GP -> RAID
Red -> like a Green RE4 but a little more eco-friendly; RAID

VelociRaptor non-enterprise -> 10K, really fast, non-RAID, 3.5" only
VelociRaptor Enterprise (yellow Enterprise label) -> 10K, really fast, RAID, TLER=7, 2.5" or 3.5"

Drives are binned: the top quality goes to RE4/Red, lower quality to consumer/OEM, and the worst quality to external drives.
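The timeout mismatch described above can be sketched in a few lines of Python. The controller timeout here is an assumed, illustrative value (hardware RAID command timeouts are often on the order of 8-10 seconds), not any vendor's spec:

```python
# Why a desktop drive's deep cycle recovery gets it dropped from a
# hardware RAID array: the drive goes silent longer than the
# controller is willing to wait. All numbers are illustrative.

CONTROLLER_TIMEOUT_S = 8    # assumed typical RAID controller command timeout
TLER_RECOVERY_S = 7         # RE4: time-limited error recovery
DESKTOP_RECOVERY_S = 180    # Caviar Black: deep cycle recovery

def drive_dropped(recovery_time_s, controller_timeout_s=CONTROLLER_TIMEOUT_S):
    """A drive that stays unresponsive longer than the controller's
    timeout is marked failed and dropped from the array."""
    return recovery_time_s > controller_timeout_s

print(drive_dropped(TLER_RECOVERY_S))     # False: RE4 answers in time
print(drive_dropped(DESKTOP_RECOVERY_S))  # True: desktop drive gets dropped
```

This is why TLER matters behind a controller but is irrelevant (or even a drawback) for a single drive, where you'd rather the drive keep trying.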
 

Tsaar

Senior member
Apr 15, 2010
228
0
76
I decided to purchase the 2TB Caviar Black. I had heard there were some reliability issues in the first generation of these 2TB drives (i.e. in the SATA II days of the CB series), but I am hoping those have been worked out.

They are going for $170 on Amazon currently.

Is anyone familiar with the validation process HDD manufacturers use to determine the bin level? Since it seems all of these drives are mechanically identical, there has to be some way to determine this since they list the RE4s with a higher MTBF.
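On the MTBF question: under the usual constant-failure-rate (exponential) model, a quoted MTBF converts to an annualized failure rate, which is a more intuitive number. A quick sketch; I believe 1.2 million hours is WD's published MTBF for the RE4, and the model itself is only illustrative, since real drives don't fail at a constant rate:

```python
import math

def mtbf_to_afr(mtbf_hours, hours_per_year=8760):
    """Annualized failure rate implied by a constant-failure-rate
    (exponential) model: AFR = 1 - exp(-hours_per_year / MTBF)."""
    return 1.0 - math.exp(-hours_per_year / mtbf_hours)

# RE4's quoted 1.2M-hour MTBF implies roughly a 0.73% chance of
# failure per drive-year under this model.
print(f"{mtbf_to_afr(1_200_000):.2%}")
```

The manufacturers derive these figures from accelerated life testing and field-return statistics on the design, not by testing each individual unit, which is why a published MTBF says more about the design than about your particular drive.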

Edit: Actually canceled and probably going to go with the 1TB Black again. The RE4 still seems tempting since the price is practically identical.
 
Last edited:

gpse

Senior member
Oct 7, 2007
477
5
81
RE4 uses different bearings/motor, among other things, that's why it has a higher MTBF.
 

Chapbass

Diamond Member
May 31, 2004
3,113
65
91
Very cool post, thanks for the info Emu.
 

n0x1ous

Platinum Member
Sep 9, 2010
2,526
181
106
RE4 definitely has a higher tolerance for vibration, as it's meant to be operated 24/7 in close proximity to tons of other drives. I have a few Blacks and they are rather noisy, so that's normal. I have had three in three different machines for a couple of years now without issue.

Just got four 3TB Reds for my upcoming Server 2012 Essentials home server. I have personally never had a WD drive fail on me.
 

bryanW1995

Lifer
May 22, 2007
11,143
32
91
I hadn't heard that the worst quality goes to external, though it makes sense that RE4 would get the best and OEM would also be better binned than externals.
 

Concillian

Diamond Member
May 26, 2004
3,751
8
81
As Emulex noted, sometimes the hardware is the same with different firmware, but usually something that affects reliability claims (MTBF) is the result of a design change somewhere along the way. You can't really bin for reliability unless there is a known cause for it (a manufacturing issue that leaves a batch of parts marginal in one way or another, for example).

Most reliability must be designed in. You can't bin for lower failure rates; it's a rate. If you know a part is going to fail, one of two things happened:
- it already failed, so you can't sell it.
- you tested a variable to be out of spec and know it will fail, so you remove that component before it has a chance to fail.

You can't really "test in" this level of MTBF change. It needs to be designed in.

Most likely the RE4 drives are designed to give up a little margin on density, data rate, or yield/cost. That margin buys design trade-offs that produce better reliability test results, which in turn produce better MTBF claims. It's not impossible that the MTBF difference comes from firmware alone, but my experience says that is unlikely.
 

Concillian

Diamond Member
May 26, 2004
3,751
8
81
I hadn't heard that the worst quality goes to external, though it makes sense that RE4 would get the best and OEM would also be better binned than externals.
- Some average Joe sees maybe 25 drives in his lifetime. His experience is going to be hit or miss by definition.

- An OEM who purchases several hundred thousand drives a quarter, and frequently installs 50,000 drives at a time in datacenters, is going to be able to tell you what the failure rate is within a fraction of a percent.

If you have a manufacturing issue that accidentally builds 100k components in a way that 99% of them will work fine but 1% will result in a drive failure, and you have no way of testing out the 1%...

Would you just let those components end up wherever? Or would you do everything in your power to make sure they don't end up at the OEM, who will WITHOUT A DOUBT notice the increased failure rate? You might say you'd scrap all of them, but if it's a constrained component, that could mean millions of dollars of lost opportunity. It's not always an obvious decision.

If you want the best possible failure rate for a consumer-level drive, buy that drive through a large OEM like Dell or HP. It will cost more, but it will be less likely to have a problem. In the end, though, every product has a failure rate, and you need backups anyway, so I go the route of cheap drives but keep at least two backups of anything important.
 

brandonb

Diamond Member
Oct 17, 2006
3,731
2
0
I have an RE4 drive. I bought it because I built a silent computer (apart from the ATI 7970, which I regret buying due to the noise, but I wanted something powerful enough to run Eyefinity with BF3).

I like it. I had WD Blacks, but they have a tendency to squeal a bit (a high-pitched noise coming from the drive), though you could really only hear it, along with the normal drive noise, if you had the case open. The high-pitched whine bothers me, though. The RE4 does not have this problem that I've noticed. Between the SSD and the RE4 there is little to no noise coming out of the computer at all. I can't even tell the computer is on when I push the button. I don't have the power LED hooked up; the only way I can tell it is on is that the HDD LED fires up. Apart from the 7970, I hear no difference between on and off.

I haven't really noticed any negatives to the drive. It seems just as fast as the WD Black. Sure, on paper it might have a drawback for a consumer-level (non-RAID) setup, but I cannot notice anything in reality. Granted, I bought the RE4 when I bought the Antec case (see sig), so it might be more the case than the drive. Rubber grommets work well.

Anyways. Long story short. I like my RE4 more than my Black.
 

ronbo613

Golden Member
Jan 9, 2010
1,237
45
91
I have a 1TB Caviar Black 1001FALS that has been my hardest working mechanical drive, no complaints at all; a great hard drive.
I recently replaced a bunch of aging Seagates with 1TB RE4 drives. I can tell no performance difference between the Black and the RE4s, but for the extra $10 per RE4 over the Black, I figured it was worth it to have enterprise-grade hard drives. I checked it out before I bought them; I believe the quality of some of the RE4 internals may be just a bit better than the Blacks'.
Either way, both are great hard drives, worth the extra money you pay for them.
I always back up my important files on at least two other hard drives.
 

Coup27

Platinum Member
Jul 17, 2010
2,140
3
81
I don't know what RE4 is, but what are you asking? I would always go WDC first, then Samsung..thx
RE4 is a hard drive manufacturer that, instead of using platters to store data, uses slices of pizza with the data embedded into the melted cheese for integrity..thx
 

murphyc

Senior member
Apr 7, 2012
235
0
0
WD Green, Blue, and Black can be used with RAID 1 and 0, officially sanctioned by WD; more complex RAID levels they won't support. However, TLER isn't an advantage for Linux mdraid of any level, including RAID 5 or 6, or for ZFS RAID-Z[123].
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
TLER is an advantage with any RAID. Do you think most programs really want to wait 180 seconds for a disk I/O request to return? SQL Server definitely isn't kind to that. Web servers aren't kind to that. What happens if two drives (or half of them) hit that 180-second deep cycle recovery? Not good.

QNAP/Synology/Drobo have special-sauce drivers that say "screw it, we roll our own": they can use TLER=0, TLER=7, and deep cycle recovery all in the same RAID set by ignoring all of the above and dealing with issues their own way. I do not believe any free RAID solution has this technology.

From my standpoint, I would rather the RAID controller handle the remap and get on with business than time out. In fact, I've seen deep cycle recovery push bad data through both soft-RAID and hardware RAID controllers.

check this out:
http://intelraid.com/uploads/FatSAS_WhitePaper_v1.0.pdf

^^ READ THIS ^^
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
and this one!!
http://www.snia.org/sites/default/education/tutorials/2008/spring/storage/Whittington-W_Desktop_Nearline_Enterprise_HDDS.pdf


So if your drives don't have full ECC like SAS and they corrupt, does your RAID controller check the parity/CRC (RAID 5/6) during reads? (Answer for most hardware RAID controllers: NO!)

Look: not all Intel server NICs have full ECC (go check!).
Look: UDIMM ECC doesn't protect against address errors, only data errors, unlike RDIMM.

When building a storage NAS, these are the little things that add up to 100% reliable storage versus strange random corruption.

ZFS has had to build in extra, intensive processes to check for this! Why? Because people cheap out on gear!!


Xeon CPU -> RDIMM ECC -> RAID controller with ECC -> SAS drive with full ECC = no need to check for corruption; it's not possible with IOEDC and IOECC.

Core i3 -> UDIMM ECC (yes, some models support it!) -> RAID controller with regular DIMM -> SATA drive with partial ECC = better have that ZFS scrubbing or you will feel the pain.

And to stack even more on top of that: SAS expanders have to tunnel SATA to the SAS controller. They work GREAT when drives are not misbehaving; it's when things hit the fan that it all goes to heck.

I've witnessed a RAID-5 (Seagate NS) with one drive failed push corrupted data through the RAID controller. The array was not rebuilding and did not fail. This is the worst nightmare possible, and it resulted in a format of the entire array due to unknown corruption, followed quickly by replacing every drive. The RAM (8GB) was tested in non-ECC mode and no bits failed. The controller was tested, with no problems.
 
Last edited:

jrichrds

Platinum Member
Oct 9, 1999
2,531
3
81
If you are not using a RAID controller, use the Black. It has 180-second deep cycle recovery to keep trying when there is a nasty bad sector. A RAID controller cannot communicate with the drive during that 180-second deep cycle recovery and will drop the drive from the array.
Do all consumer-level drives from other brands (Seagate, and Samsung before they were bought out by Seagate) also do the long recovery?

I know it was a big deal back when you could toggle TLER on Caviar Black drives with software, until WD disabled the ability to do so.

But I hear of so many people running Samsung and Seagate consumer-level drives in RAID arrays... I'm wondering if it's only WD that does the long, deep-cycle recovery.
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
They all do.

Some drives now have SCT commands to enable this for Linux/Solaris RAID at boot. However, they do not all work, and most lose the setting on power loss/reboot.

So it doesn't work so well with most RAID controllers.

Deep cycle recovery is a good thing for non-RAID use: if the drive can save a sector in 100 seconds, that's better than losing the data or corrupting it!
 

murphyc

Senior member
Apr 7, 2012
235
0
0
TLER is an advantage with any RAID. Do you think most programs really want to wait 180 seconds for a disk I/O request to return?
It's a fair point. However, such RAID implementations aren't going to drop the disk out of the array as a result. The advantage of TLER is for fussier controllers: they don't unnecessarily drop the disk but instead receive a CRC error and self-correct for that sector (by rebuilding the whole chunk from parity).

SQL Server definitely isn't kind to that. Web servers aren't kind to that. What happens if two drives (or half of them) hit that 180-second deep cycle recovery? Not good.
You're right. However, the fine print in WD Red's spec sheet says the disk is for home and small-office 1-5 bay NAS systems. For 6+ bay systems (the number of bays, not the actual number of disks) or rack-mount systems, they say you need to use enterprise disks.

Further, any disk that's taking more than a few seconds of ECC work to correct a sector error is correcting a persistent error. The disk needs to be zeroed, or ideally Secure Erased, to force those sectors out of use.

QNAP/Synology/Drobo have special-sauce drivers that say "screw it, we roll our own": they can use TLER=0, TLER=7, and deep cycle recovery all in the same RAID set by ignoring all of the above and dealing with issues their own way. I do not believe any free RAID solution has this technology.
Calls for speculation. Rebuilding data from parity is precisely what RAID is for. The legitimate question, which is not easy to test, is what these less sophisticated RAID implementations do when the drive reports a CRC error for a sector. Do they puke? Or do they rebuild the affected chunk from parity (or a mirrored copy)? And further, do they issue a write command for that rebuilt chunk to the original LBAs, causing the disk to write to those same sectors? If the sector error is persistent on write, the drive firmware will remove that sector from use and write the data, at the same LBA, to a previously reserved sector.
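The rebuild-from-parity step being discussed can be sketched with simple XOR parity. This is a toy model of RAID 5's recovery math, not any controller's actual implementation:

```python
# Minimal sketch of RAID-5-style recovery: when one member reports an
# unreadable chunk, its bytes are rebuilt by XOR-ing the surviving
# data chunks with the parity chunk.

def xor_chunks(chunks):
    """XOR a list of equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # data chunks on three members
parity = xor_chunks(data)            # parity chunk on a fourth member

# Member 1 returns a read error; rebuild its chunk from the survivors.
rebuilt = xor_chunks([data[0], data[2], parity])
print(rebuilt == data[1])            # True: the lost chunk is recovered
```

The write-back step described above would then rewrite `rebuilt` to the original LBAs, giving the drive's firmware the chance to remap the bad sector.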

From my standpoint, I would rather the RAID controller handle the remap and get on with business than time out. In fact, I've seen deep cycle recovery push bad data through both soft-RAID and hardware RAID controllers.
RAID does not remap sectors directly. It contains no sector maps. That's the job of the disk firmware.

And it's also a fair point that ECC recovery from sector read errors may in fact return bad data; the ECC isn't foolproof. If the disk thinks it has successfully corrected an error, it will not report a CRC error to the OS, and the OS almost certainly will accept the data as-is. But I'm hard pressed to think of any RAID, even sophisticated hardware RAID, that can deal with this, because the penalty of rebuilding data from parity on every read to confirm the data is correct is unreasonably high. The noted exceptions are the resilient file systems: ZFS, btrfs, and ReFS.

Bringing up the possibility of a false ECC correction actually strengthens your argument that it's better for the drive to give up and report a CRC error early on: not merely to save time, but because if it can't correct the error within a few seconds, there may be an increasingly high chance of getting bogus "corrected" data instead of an error reported to the OS.

But think about what we're paying extra for: a reduction in drive-firmware ECC capability. We're asking for less ECC (time-wise) and faster failure notification, and somehow that's a feature worth paying for. It's like paying more for fewer features on a microwave (not the best analogy...).

check this out:
http://intelraid.com/uploads/FatSAS_WhitePaper_v1.0.pdf

I think on paper, entirely on the basis of the drive technology alone, there's hardly a dispute about the superiority of enterprise SAS over consumer SATA. But resilient file systems, and clustered file systems, are making it possible to deploy thousands to tens of thousands of consumer-grade disks for enterprise applications. Google's study on HDD failure trends covered over 100,000 drives, all consumer PATA and SATA disks.
 
Last edited:

murphyc

Senior member
Apr 7, 2012
235
0
0
I've witnessed a RAID-5 (Seagate NS) with one drive failed push corrupted data through the RAID controller. The array was not rebuilding and did not fail. This is the worst nightmare possible, and it resulted in a format of the entire array due to unknown corruption, followed quickly by replacing every drive. The RAM (8GB) was tested in non-ECC mode and no bits failed. The controller was tested, with no problems.
Sounds like bit rot or silent data corruption affected one or more parity chunks, and at that point, once parity is toast, in RAID 5 with single parity you're hosed with a failed disk. You have no primary data to compare to, and even if you did, it's ambiguous which is wrong, the data or the parity. With double parity it's possible to resolve the ambiguity.
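That ambiguity is easy to demonstrate with the same toy XOR parity (illustrative only): with all members present a scrub can detect an inconsistency but cannot locate it, and once a member has failed, a silently corrupted survivor just produces a wrong rebuild with no error at all.

```python
# Single (XOR) parity can detect that *something* is inconsistent
# while all members are present, but with one member gone, bit rot
# on a survivor silently corrupts the rebuilt data.

def xor_chunks(chunks):
    """XOR a list of equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_chunks(data)

# Bit rot flips a byte in member 2's stored copy.
rotten = [data[0], data[1], b"CCCD"]

# All members present: the set no longer XORs to zero, so a scrub can
# detect the inconsistency, but it cannot tell data from bad parity.
print(xor_chunks(rotten + [parity]) != bytes(4))   # True: detected

# Member 0 has failed: the rebuild XORs the survivors and silently
# returns wrong bytes for member 0, with no error raised anywhere.
rebuilt = xor_chunks([rotten[1], rotten[2], parity])
print(rebuilt == data[0])                          # False: silent corruption
```

With a second, independent parity (as in RAID 6), there is enough redundancy to identify which copy is inconsistent instead of merely noticing a mismatch.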

So how does Google, using consumer PATA and SATA drives with ext4 (no data checksumming) and GFS (which also doesn't appear to have any kind of checksumming or ECC of its own), detect and correct corruption? They have multiple replicas of each chunk that they could compare to each other, but in normal operation I don't know that they look at more than one.
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
Do you believe your RAID controller is checking parity/CRC on reads when no error signals (SMART/timing) are present?

1. The controller will read from n-1 or n-2 drives and use cache to grab data as fast as possible. If no errors are suspected, why bother computing CRC/parity? The CPUs are not fast enough for modern 15K SAS drives, let alone SSDs.

ZFS is superior in this respect, but it comes at a huge cost, since every read has to go through parity/checksum verification, and even the fastest dual-ROC/Xeon setups are not going to keep up with large storage and/or SSDs.

Google isn't serving OLTP financial data in that sense. Ever click on a cached page and get nada? Kaput? Sorry? That is fine for Google, but for accounting or critical documents, not cool. Google is fine with 98% accuracy and non-ACID-compliant transactions; your auditors would not be.

So spend a few more bucks and have reliable storage. Hell, my Constellation 1TB SAS drives were only $99 compared to the SATA version ($90 at the time). I think the extra few bucks are well justified.

Anyhoo, I've had ZERO issues with my SAS RAID-10 setups. 1000% reliable. I can't say as much for SATA.

HP offers 3-way RAID-1 for 3TB and 4TB SATA drives. They know you need it for speed and to ensure reliability. It makes sense; drives are cheap again.

RAID controllers can keep a few sectors off the table for their own remapping; it's no different from SSD overprovisioning. If you take 5% off the table for controller "weak"-sector remaps, that's better than losing a drive, and the climb from 1% used to 5% would be an early warning indicator.
 
