RE4 Hardware Differences from Caviar Black

mikeymikec · Aug 17, 2012

Emulex said:
Drives are binned: the top quality goes to RE4 and Red, lower quality to consumer/OEM, and the worst quality to external drives.

/me holds up a sign saying "[citation needed]"

murphyc · Aug 17, 2012

Emulex said:
Do you believe that your raid controller is checking parity/crc on READS where no error models (smart/timing) are happening?

No, I do not; that was expressly my point.

1. Drive will read from n-1 or n-2 drives and use cache to grab data as fast as possible. If no errors are presumed, why bother computing CRC/parity? The CPUs are not fast enough for modern 15K SAS drives, let alone SSDs.

This is not how RAID works, even in degraded mode. In RAID 5/6, parity and data chunks are distributed among all member disks (RAID 4 is the exception, with a dedicated parity disk), so reads always go to all drives, not n-1 or n-2 of them. Parity is ignored in normal operation. Only in degraded mode do parity chunks have to be read in and the missing data chunks reconstructed on the fly.
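A minimal sketch of the rotating-parity point, assuming a 4-disk RAID 5 and one particular rotation order (real implementations offer several layouts, so `stripe_layout` and its rotation are purely illustrative):

```python
def stripe_layout(stripe, n_disks):
    """Return the role of each disk ("D" data, "P" parity) for one stripe.

    The rotation order used here is an assumption for illustration only.
    """
    parity_disk = (n_disks - 1 - stripe) % n_disks
    return ["P" if d == parity_disk else "D" for d in range(n_disks)]


n_disks = 4
layout = [stripe_layout(s, n_disks) for s in range(n_disks)]

# Disks that hold at least one data chunk, i.e. disks a normal
# (non-degraded) sequential read has to touch.
disks_read_normal = {
    d for row in layout for d, role in enumerate(row) if role == "D"
}

for s, row in enumerate(layout):
    print(f"stripe {s}: {' '.join(row)}")
print("disks read in normal operation:", sorted(disks_read_normal))
```

Because the parity chunk moves to a different disk on each stripe, every member ends up holding data, which is why a normal read spans all of them.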

The context of this hypothetical situation is an array in normal operation where there is a bad sector and the disk's ECC thinks it has corrected for it, but has corrected incorrectly. That is a rare event, but quite possible for consumer disks since, as you point out, they don't have full ECC. ZFS can mitigate this efficiently because everything is checksummed, which is not the same thing as parity. The checksum is a very efficient and effective "pass/fail" for a chunk. Conventional RAID 5/6 has no checksum, only parity. The only way you'd be able to second-guess a drive saying "here's your legit data" is if your controller were reconstructing data on the fly from parity. That is expensive not merely computationally: you also have a pile of parity chunks that now have to be read from the disk and pushed through the interface that otherwise would have been skipped.
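To make the checksum-versus-parity distinction concrete, here is a rough sketch. It assumes a single stripe of three data chunks plus XOR parity, and it uses CRC32 only as a stand-in for a per-chunk checksum (it is not what ZFS actually uses); all names are made up for the example.

```python
import zlib
from functools import reduce


def xor_parity(chunks):
    """XOR equal-length chunks byte by byte (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# One stripe as written to a healthy array.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)
checksums = [zlib.crc32(c) for c in data]  # stand-in for per-chunk checksums

# Disk 1 silently returns wrong data and reports success.
read_back = [data[0], b"BxBB", data[2]]

# Parity can only say the stripe is inconsistent, not which chunk is wrong,
# and checking it means reading the parity chunk and XORing every chunk.
stripe_consistent = xor_parity(read_back) == parity

# A per-chunk checksum is a direct pass/fail for each chunk.
bad_chunks = [i for i, c in enumerate(read_back) if zlib.crc32(c) != checksums[i]]

# Once the bad chunk is identified, parity can rebuild it from the others.
survivors = [c for i, c in enumerate(read_back) if i not in bad_chunks]
rebuilt = xor_parity(survivors + [parity])

print(stripe_consistent, bad_chunks, rebuilt)
```

The parity check flags the stripe but cannot point at the culprit; the checksum does, and only then is parity useful for the repair.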

Google is not OLTP financial data in that sense, you ever click on a cache page and nada?

Yeah, but they clearly have some mechanism for subsequently identifying those errors and retrieving copies of that data. I just can't find out what layer that's in; it's not in the two file systems I'm aware of them using, ext4 and GFS.

RAID controllers can keep a few sectors off the table for their own remap.

This doesn't make sense for multiple reasons, but I invite you to explain it in detail.

The file system asks for sectors by LBA. A controller remapping these to other LBAs would render the data on the array utterly useless (inaccessible and impossible to reconstruct) with a single point of failure: that particular controller. Even replacing it with the exact make/model would leave the array useless.

It's not different than SSD overprovisioning.

Except overprovisioning/wear leveling is done by that SSD's firmware. They are joined at the hip. HDDs already have this capability in their firmware, so it doesn't really make sense to duplicate it in the controller. Invariably the controller would remap sectors to a wholly different location on the disk than the drive firmware would; modern drives have reserved areas specifically designed to avoid excessive seeking as a result of such remaps. That wouldn't happen with a controller-based remap, which would also create an inseparable pairing between disks and controller.

If you take 5% off the table for raid controller "weak" sector remaps, that's better than losing a drive and that time between 1% and 5% would be an early warning indicator.

OK, if the RAID controller assumes complete and total control of the physical disk, i.e. it is responsible for writing out the partition map such that it creates its own physical-disk-LBA to controller-LBA abstraction between the file system and the disk, I could see how this would work. That mapping table could then be written to a reserved region on the disks so that another controller of the same make/model would be able to maintain the map. But that's woefully inefficient compared to the controller simply asking the drive to rewrite those sectors. If the write fails persistently, the drive firmware will do its own remapping. Both SATA and SAS drives do this. I don't see the advantage of a controller doing it.
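For what it's worth, the controller-owned map being discussed could look something like the sketch below. Everything here is hypothetical (`ControllerRemap`, the 5% reserve, the method names); it is not modeled on any real controller firmware.

```python
class ControllerRemap:
    """Hypothetical controller-side LBA remap layer over one physical disk."""

    def __init__(self, total_lbas, reserve_percent=5):
        reserved = total_lbas * reserve_percent // 100
        # The file system only ever sees LBAs below exported_lbas.
        self.exported_lbas = total_lbas - reserved
        self.spares = list(range(self.exported_lbas, total_lbas))
        # This table must be persisted somewhere a replacement controller
        # can find it; without it, remapped data is unreachable.
        self.table = {}

    def remap(self, lba):
        """Point a weak exported LBA at a spare physical LBA."""
        if not 0 <= lba < self.exported_lbas:
            raise ValueError("LBA outside the exported range")
        if lba not in self.table:
            if not self.spares:
                raise RuntimeError("reserve exhausted")
            self.table[lba] = self.spares.pop(0)
        return self.table[lba]

    def resolve(self, lba):
        """Translate a file-system LBA to the physical LBA actually used."""
        return self.table.get(lba, lba)

    def reserve_used(self):
        """Fraction of the reserve consumed (the early-warning indicator)."""
        return len(self.table) / (len(self.table) + len(self.spares))


ctrl = ControllerRemap(total_lbas=1000)
ctrl.remap(10)
print(ctrl.resolve(10), ctrl.resolve(11), ctrl.reserve_used())
```

Note how every read now depends on `table`: that dependency is the single point of failure, and the spare region sits at the far end of the disk rather than near the failed sector.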

Mark R · Aug 17, 2012

As I understand it, the main differences between the RE4 and the Black are:

1. Firmware changes that limit error recovery on the RE4 to a minimum (TLER), whereas the Black performs maximal error recovery.

2. The Black has firmware using weaker ECC codes, which require a less powerful processor but provide slightly less data integrity (partially compensated for by the more aggressive error recovery in the Black).

3. The RE4 contains a gyroscope (or possibly an array of accelerometers; WD has patents for both). This is used to compensate the head actuator in the event of rotational vibration (e.g. other hard drives in the same chassis seeking).

4. Possibly some other changes for 24/7 use, higher MTBF. Not sure if these are hardware differences (e.g. different bearings or lubricants) or just more aggressive binning, or even just marketing with identical hardware.

bononos · Aug 17, 2012

Emulex said:
...........

Drives are binned: the top quality goes to RE4 and Red, lower quality to consumer/OEM, and the worst quality to external drives.

Are you serious about this? It just sounds odd.
Edit: My understanding is that the different drive models have different hardware, and therefore whole drives can't be binned.

Tsaar · Aug 18, 2012

I bought another Black and just slapped it in my system. I am hearing the pleasant "grinding" sound that we all love from HDDs, and thankfully no rattling...so far.

I ran my last drive through a DBAN zero fill and then a DOD Short before putting it in a return box. I know I am being paranoid...

bryanW1995 · Aug 18, 2012

Concillian said:
- Some average Joe sees maybe 25 drives in his lifetime. His experience is going to be hit or miss by definition.

- An OEM who purchases several hundred thousand drives a quarter and frequently installs 50,000 drives at a time in datacenters is going to be able to tell you what your failure rate is within a fraction of a percent.

If you have a manufacturing issue that builds 100k components by accident in a way that 99% of them will work fine, but 1% of them will result in a drive failure and you have no way of testing out the 1%....

Would you just let those components end up wherever? Or would you do everything in your power to make sure those components don't end up at the OEM who will WITHOUT A DOUBT notice the increased failure rate? You might say that you scrap all of them, but if it's a constrained component, that could mean millions of dollars of lost opportunity. It's not always an obvious decision.

If you want the best possible failure rate for a consumer level drive, buy that drive through a large OEM like Dell or HP. It will cost more, but it will be less likely to have a problem. In the end though, every product has a failure rate, and you need backups anyway, so I go the route of cheap drives, but have at least 2 backups of anything important.
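To put rough numbers on the sample-size argument in the quote, here is a back-of-the-envelope sketch. It assumes a true failure rate of 1% (borrowed from the quote's example, not a measured figure) and uses the ordinary binomial standard error.

```python
import math


def std_error(p, n):
    """Standard error of an observed failure rate over n drives."""
    return math.sqrt(p * (1 - p) / n)


p = 0.01                      # assumed true failure rate
joe = std_error(p, 25)        # average Joe: 25 drives in a lifetime
oem = std_error(p, 50_000)    # OEM: one 50,000-drive install

# Chance that Joe never sees a single failure across his 25 drives.
p_joe_sees_none = (1 - p) ** 25

print(f"Joe: +/- {joe * 100:.2f} percentage points")
print(f"OEM: +/- {oem * 100:.3f} percentage points")
print(f"Joe sees no failures at all: {p_joe_sees_none:.0%}")
```

Under that assumption the OEM's estimate is good to a few hundredths of a percentage point, while Joe's uncertainty is about twice the rate he is trying to measure.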

I am now in the market for another 3 TB external. Maybe 2. We talk a lot in here about backups and why you do them, but imagine how my professional photographer wife would feel if she lost every single picture she'd ever taken of her little girls?

bryanW1995 · Aug 18, 2012

Coup27 said:
RE4 is a hard drive manufacturer that instead of using platters to store data on them they use slices of pizza with the data embedded into the melted cheese for integrity..thx

What happens if you RAID a cheddar pizza drive with a swiss pizza drive, and the cheese melts in parallel lines? Would that yield a striped array?

Tsaar said:
LOL so I was Google searching "RE4 vs Caviar Black" and someone took my OP and gave it worse grammar and reposted it:

http://www.tomshardware.com/forum/288535-32-hardware-differences-caviar-black

Not surprising to me at all, Toms is after all our "little brother". It's like the tech website for jocks.

Coup27 · Aug 18, 2012

bryanW1995 said:
What happens if you RAID a cheddar pizza drive with a swiss pizza drive, and the cheese melts in parallel lines? Would that yield a striped array?

No, I would call you a n00b for RAID'ing non-identical drives.

Concillian · Aug 18, 2012

Mark R said:
4. Possibly some other changes for 24/7 use, higher MTBF. Not sure if these are hardware differences (e.g. different bearings or lubricants) or just more aggressive binning, or even just marketing with identical hardware.

Might be different lubricants, but not likely for the bearings. The most likely lubricant change would be to the super thin layer bonded to the media or the super thin layer bonded to the heads.

Media is most often the component that isn't identical when MTBF claims are different. The platter manufacturing timeline is <1 day, so it's the easiest to tweak (compared to heads, which are silicon, and each wafer is in process for weeks). Motor, bearing, etc. are all pretty mature technology compared to media, heads and channel. Channel is usually outsourced, so it has very long lead times.

I think you'd be surprised at how small a change in the thickness of some of the layers on the heads and media it takes to make a significant design trade-off in terms of reliability.

I'd be really, really surprised if a Black and an RE4 used 100% identical media and heads.

murphyc · Aug 20, 2012

Emulex said:
QNAP/SYNOLOGY/DROBO have special sauce drivers that say screw it, we roll our own and we can use TLER=0,TLER=7, and Deep cycle recovery all in the same raid-set by ignoring all of the above and dealing with issues their own way. I do not believe any "Free" raid solution has this technology.

As it turns out, this is not true. The Linux md driver (Linux RAID) can do this. It will not drop a drive merely for a lengthy ECC recovery; it will wait. Upon a read error (e.g. a bad sector), the md driver locates correct data elsewhere (rebuilt from parity, or a mirrored copy), then overwrites the bad blocks and re-reads them. Normally that will cause the drive to remove the bad sectors from use, using reserve sectors. If the md write or re-read fails, then the drive is removed from the array.

For TLER, this process simply happens faster. So md raid can do exactly what you describe, and it is free.
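The read-error sequence described above can be sketched as a toy simulation. This is only an illustration of the described steps on a two-disk mirror, not md's actual code; the `Disk` class, its spare-sector model, and `read_with_repair` are all invented for the example.

```python
class Disk:
    """Toy disk: "bad" LBAs fail reads until a write lets firmware remap them."""

    def __init__(self, name, sectors, bad=(), spares=1):
        self.name = name
        self.sectors = dict(sectors)
        self.bad = set(bad)
        self.spares = spares

    def read(self, lba):
        if lba in self.bad:
            raise IOError(f"{self.name}: unrecoverable read at LBA {lba}")
        return self.sectors[lba]

    def write(self, lba, data):
        if lba in self.bad:
            if self.spares == 0:
                raise IOError(f"{self.name}: write failed, no spare sectors")
            self.spares -= 1           # firmware swaps in a reserve sector
            self.bad.discard(lba)
        self.sectors[lba] = data


def read_with_repair(members, lba):
    """Read from a mirror; repair a failing member or drop it from the array."""
    primary, *others = members
    try:
        return primary.read(lba)
    except IOError:
        good = others[0].read(lba)     # correct data from the redundant copy
        try:
            primary.write(lba, good)   # overwrite the bad block...
            primary.read(lba)          # ...and re-read it to confirm
        except IOError:
            members.remove(primary)    # write or re-read failed: kick the drive
        return good


content = {0: b"hello"}

healthy = [Disk("sda", content, bad={0}), Disk("sdb", content)]
first = read_with_repair(healthy, 0)    # sda is repaired and stays in the array

worn = [Disk("sdc", content, bad={0}, spares=0), Disk("sdd", content)]
second = read_with_repair(worn, 0)      # sdc cannot remap and is removed

print(first, [d.name for d in healthy])
print(second, [d.name for d in worn])
```

In both cases the caller still gets good data; the only difference is whether the failing member survives the rewrite.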

murphyc · Aug 20, 2012

Tsaar said:
I ran my last drive through a DBAN zero fill and then a DOD Short before putting it in a return box. I know I am being paranoid...

ATA Secure Erase is faster. It also overwrites sectors without LBAs (reserve good and bad sectors, bad sectors previously containing data). And it conforms to NIST 800-88.

murphyc · Aug 20, 2012

bryanW1995 said:
imagine how my professional photographer wife would feel if she lost every single picture she'd ever taken of her little girls?

The issue of image preservation for photographers is in many ways more difficult, or at least more tedious, with digital than with film. Film: throw it in a freezer. Digital is much more complicated: it means maintaining availability and integrity, and it requires periodic migration, all of which are non-obvious.

Further, the storage requirements for pro photographers are in the realm of what small to medium businesses produce, and those businesses have dedicated IT staff to leverage enterprise storage solutions. Yet pro photographers don't have dedicated IT staff, and rarely implement enterprise best practices in storage.

BFG10K · Aug 21, 2012

The RE4 drives have slightly lower access times than the Blacks too.

Emulex · Aug 21, 2012

murphyc said:
As it turns out, this is not true. The Linux md driver (Linux RAID) can do this. It will not drop a drive merely for a lengthy ECC recovery; it will wait. Upon a read error (e.g. a bad sector), the md driver locates correct data elsewhere (rebuilt from parity, or a mirrored copy), then overwrites the bad blocks and re-reads them. Normally that will cause the drive to remove the bad sectors from use, using reserve sectors. If the md write or re-read fails, then the drive is removed from the array.

For TLER, this process simply happens faster. So md raid can do exactly what you describe, and it is free.

So it will wait, and your MySQL will crash as the entire array waits? PHP scripts time out?

murphyc · Aug 21, 2012

Emulex said:
So it will wait, and your MySQL will crash as the entire array waits? PHP scripts time out?

Your complaints have no merit.

Crashes are bugs.

And the reason why there is a deep recovery in the first place is because there are sector errors. It is not a usual condition. It needs to be addressed rather than depending on, and complaining about, slow ECC recovery. Or choose a different disk.

murphyc · Aug 21, 2012

Adjustment: Choose a different disk model more appropriate to the task.

Mark R · Aug 22, 2012

Emulex said:
So it will wait, and your MySQL will crash as the entire array waits? PHP scripts time out?

Yes. This is the problem with mdraid - if an underlying drive stalls for an error recovery event, the array will stall, and your applications will stall and may abort operations due to timeouts.

For home use, this may not be an issue. For almost any serious use, this behavior is most undesirable.

bigsnyder · Aug 22, 2012

TLER can be toggled with the right utility on the Blacks (can't speak for the other drives).

murphyc · Aug 24, 2012

Mark R said:
Yes. This is the problem with mdraid - if an underlying drive stalls for an error recovery event, the array will stall, and your applications will stall and may abort operations due to timeouts.

I suggest before you post such things that you read why they haven't been implemented, and better understand that this should be corrected at lower levels, including specifying the right kind of hardware for the task.

If your system/workflow is that sensitive to delays, then it is your job to make sure those delays don't occur. One way is scheduled smartd testing to make sure disks aren't producing read errors. A disk is unlikely to suddenly have a series of sectors needing deep recovery. There will have been shorter ECC read problems before the problem you describe, but you're somehow suggesting it's OK to ignore these, wait for disaster, and then blame the md driver.

Another way, if it's serious usage, is to spec drives that don't go into deep recovery.

For home use, this may not be an issue. For almost any serious use, this behavior is most undesirable.

You're basically proposing serious use, but using non-serious consumer drives, and not monitoring their health, and then complaining about the ensuing behavior. It's absurd.
