This is a reply to Cerb in another thread, but I thought creating a separate topic would be better so as not to disturb the original thread. So here goes:
Quote: "With copies, you may as well have RAID"
If you set it on the entire filesystem, then yes. But ZFS can set these things on a per-filesystem basis. So you can assign copies=2 or even copies=3 to your important data like documents, and keep the regular copies=1 for bulk data like downloads, videos, etc. Generally, the documents are only 0.1-2% of total data, so it is 'cheap' to enable the feature.
And metadata gets copies=2 by default, while the crucial ZFS metadata that lets ZFS recognise a disk as belonging to a ZFS pool is stored in multiple copies spread across the drive (the vdev labels live in four places on each disk: two at the start and two at the end).
So unreadable sectors (UREs) should never damage the ZFS filesystem itself, only specific files - this is true even in a single-disk configuration. The chance that both metadata copies are hit by unreadable sectors is statistically very small. And when using a redundant configuration you multiply the protection, because the copies apply on top of the parity/mirror redundancy as well.
Quote: "and you are now using non-standard configs"
What do you mean exactly?
Enabling ditto blocks - as the feature is officially called - can be done with a zfs set copies=2 pool/documents command. ZFSguru, and probably the other ZFS platforms too, can do this easily via the web interface.
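To make that concrete - the pool and dataset names below are just placeholders for your own setup - the commands look something like this:

  # give the documents filesystem two copies of every block
  zfs create tank/documents
  zfs set copies=2 tank/documents

  # bulk data (assuming tank/downloads already exists) keeps the default single copy
  zfs set copies=1 tank/downloads

  # verify the setting
  zfs get copies tank/documents

Keep in mind that copies only applies to data written after the property is set; existing files keep their old number of copies until they are rewritten.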
Quote: "and with just metadata, not enough of importance is protected. Redundancy is needed to protect against UREs, and ZFS does not change that. It enables better protection, but you have to go out of your way to make that happen. FI, saving off files to another local directory will safeguard your data just as well."
I don't really understand what you mean. If you copy the files to another directory, how do you know they weren't corrupt to begin with? How do you know the file on the destination is the same as the original after the copy? Do you check the checksum? And if you do, doesn't the read I/O get served from the RAM file cache instead of being re-read from disk to verify?
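To illustrate the manual verification I am talking about - the file names are made up, and on Linux the tool is called sha256sum rather than FreeBSD's sha256:

  # hash the original and the copy and compare the results by hand
  sha256 /tank/documents/report.odt
  sha256 /backup/documents/report.odt
  # note: if the file was written recently, this read may be served from the RAM cache
  # rather than from the disk surface, so it proves little about the on-disk data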
You have to start with one thing: error detection.
Then you start to add layers of protection to correct damage, or to reverse it, like with ZFS snapshots - another waaaay cool feature of ZFS that allows easy incremental backups of your data. This way you also protect against other dangers, like viruses that eat your data or an accident with the delete button. These are dangers that normally only a backup protects against, not RAID. The feature is not unique to ZFS, as many other filesystems have it too, but the way it is implemented is really cool and very easy for beginners to use and understand.
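A quick sketch of how that looks in practice - the dataset and snapshot names are made up:

  # take a snapshot before doing anything risky
  zfs snapshot tank/documents@before-cleanup

  # deleted the wrong folder? roll the filesystem back to the snapshot
  zfs rollback tank/documents@before-cleanup

  # or send only the changes since an earlier snapshot to a backup pool
  zfs send -i tank/documents@monday tank/documents@tuesday | zfs recv backup/documents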
Quote: "A URE with plain RAID 5 and a URE with RAID-Z1 are functionally identical."
I would strongly object. ZFS is virtually immune to bad sectors when using redundant configurations. Let us compare the difference in reality:
- A RAID engine of an inferior class will panic on any timeout - including a bad sector recovery - and throw the disk out of the array. Some will even drop TLER disks, because they drop disks on I/O errors too. Worse, all these badly designed RAID engines update the metadata on the other disks to reflect the new state, so a reboot or power-cycle will not fix the issue. Have one bad sector on two disks and your RAID5 is already ruined. Even with all disks perfectly healthy, because the occasional bad sector falls within the specced 10^-14 uBER rate for unreadable sectors (see the back-of-the-envelope numbers after this list). At maximum duty cycle that works out to at most roughly one per day; in practice it averages out to about once per half year across many drives, because the variation between individual samples is huge.
- A RAID engine of a superior class will handle bad sectors gracefully: it lets the disk time out, causing a service interruption, but does not drop the disk from the array and simply returns an I/O error to the application. The application will most likely surface an error to the user as well. All is good. But hey, what if the unreadable data wasn't a regular file, but crucial filesystem metadata? Oops! The legacy filesystem has no redundancy of its own whatsoever. Now you're in trouble!
- I have had multiple people on the Dutch forums who lost their array on md-raid. They switched to ZFS. I should note that they did not seek expert help long enough to exhaust all possible ways of recovery; that would have required copying all disks to a backup first, so that every recovery attempt could be retried instead of being performed on the only copy. Simply put: I do not know the cause. The only people with a failed ZFS pool I have come across were one with failed redundancy (two disks dead in a RAID-Z) and one who ran a ZFS pool behind an Areca RAID controller with unsafe write-back caching. That causes write reordering across FLUSH requests, which kills the integrity of ZFS. The only real way to kill ZFS, aside from massive disk failure, is to put something between ZFS and the disks that changes the order of writes. Losing recent writes is not a problem at all, thanks to the transactional filesystem design.
- I have never had confirmation that any RAID engine actually corrects bad sectors from redundancy. I have heard claims, but nothing substantive like official documentation. It is theoretically perfectly possible for RAID engines to do this: they can read the data from the mirror/parity, calculate the missing data and write it to the disk that should have had it but was unreadable. That overwrites the bad sector and fixes the issue. It is similar to what ZFS does, but nobody has convinced me it actually exists in practice - aside from ZFS and its siblings.
- When using a RAID5 solution, the chance is beyond 99% that you will use a 'legacy' 2nd generation filesystem like NTFS, Ext4 or XFS. A filesystem of that generation blindly accepts metadata - it does not authenticate the metadata like ZFS does. This causes all kinds of funny things to happen, like entirely lost directory trees, files and directories with garbled names and other unforeseen disasters. I have screenshots of someone who encountered this. The creator of Ext4 (Theodore Ts'o) seems to agree and regards his creation as a 'stopgap' until Btrfs - a 3rd generation filesystem - can take over. He also labeled Ext4 as 'old technology'. I happen to agree with him. :awe:
- When using a RAID5 solution, the filesystem and the disk aggregation layer ('RAID') are separate entities that are unaware of each other's information. This means some features that require unification, or at least co-operation, of the two are simply not possible. ZFS, and also Btrfs, can do this. One unique feature is the dynamic stripe size, which legacy RAID simply cannot offer. It makes RAID-Z behave more like RAID3 than RAID5, fixes the 'RAID5 write hole' and achieves atomicity in a single pass. Since every write aligns perfectly with the stripe boundary, no read-modify-write ever occurs with the RAID-Z family - something that happens frequently with RAID5 for anything other than sequential, contiguous I/O.
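To put that 10^-14 uBER spec in perspective, here is the back-of-the-envelope calculation I referred to above. It assumes the drive actually performs at the specced worst case, which real drives usually beat by a wide margin:

  # uBER of 10^-14 means on average one unreadable bit per 10^14 bits read
  # reading one 4TB disk end to end: 4e12 bytes * 8 bits
  awk 'BEGIN { printf "%.2f expected UREs per full read of a 4TB disk\n", 4e12*8/1e14 }'
  # -> 0.32
  # rebuilding a 5-disk RAID5 of 4TB drives means reading ~16TB from the surviving disks
  awk 'BEGIN { printf "%.2f expected UREs during that rebuild\n", 16e12*8/1e14 }'
  # -> 1.28

So at the specced worst case, a plain RAID5 rebuild is statistically likely to trip over an unreadable sector on one of the surviving disks, and what happens next depends entirely on how gracefully the RAID engine handles it.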
I would love to know which parts you agree with and which ones you don't.
And by the way, anything I said about ZFS being 'immune' to bad sectors or UREs also applies to other 3rd generation filesystems like Btrfs and ReFS. Simply put: in 2015, protection against bitrot is almost mandatory.
Quote: "how much space does metadata take up v. data? Hardly any!"
Some ZFS systems have 128GiB of RAM so that they can cache about 64-96GiB of metadata, which is roughly 5-50% of their pool's total metadata. And all metadata is compressed by ZFS. So it's not 'hardly any' - although percentage-wise it may be, for a large pool. You have a point, of course, when you assert that a URE would land much more frequently on data than on metadata. But even that small chance is not low enough to call insignificant: 0.1% means a one in a thousand chance, and that is way too high. And I would assume even NTFS uses more metadata than 0.1% of stored data, depending on file size of course.
Caching metadata in RAM is very hot for some ZFS users - myself included! It makes file searching and directory browsing almost instant. It also causes fewer random seeks to the (5400rpm) disks, so they can spend their time on sequential I/O instead. This means the disks won't be stuck at 50MB/s because they are being hammered with random reads, but can do up to 175MB/s with the latest 6TB WD Green - which, by the way, has 1.2TB platters and higher sequential speeds than the 2-5TB Greens.
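If you want to steer this yourself, ZFS can even be told per filesystem what the ARC is allowed to cache - again, the dataset names are just examples:

  # for bulk video data, cache only the metadata so directory listings stay snappy
  # without the file contents pushing everything else out of RAM
  zfs set primarycache=metadata tank/videos

  # documents keep the default and are cached in full
  zfs set primarycache=all tank/documents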
Quote: "What ZFS specifically does protect against, that is not protected against in most systems (BTRFS and ReFS being notable exceptions), and which appear to be ever more important as time goes on, are not UREs, but successful reads of bad data, which need OS-level or higher awareness to handle."
You're referring to the end-to-end data security feature of ZFS, I presume. Yes, it is nice, but for home users not all that hot. End-to-end security means the whole path of the storage chain is protected by a checksum or another error-detection method. So data read from disk is verified against the checksum stored in the ZFS metadata; a good SAS controller can add a layer of protection as well; ZFS internally keeps the data guaranteed against corruption (*); and finally the data is delivered to the application only if it is known to be good.
The 'application' may be your copy command or another backup/filesync program, so this prevents corruption from spreading to your backups and, of course, to other data. For home users this is nice, but not extremely crucial. For companies it is essential: a bank cannot afford to occasionally transfer a trillion dollars by mistake because the application making the decisions was fed faulty data. They need end-to-end data security.
(*) I should note that ZFS' end-to-end data security stops working when RAM corruption is involved. ZFS can detect and sometimes correct corruption that originated in RAM once it reaches the disk, but the application itself remains vulnerable.
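If you want to see the checksumming at work on your own pool, a scrub reads every block and verifies it against its checksum, repairing from redundancy or ditto copies where it can - the pool name is just an example:

  # walk the entire pool and verify every block against its checksum
  zpool scrub tank

  # watch progress and see whether any checksum errors were found and repaired
  zpool status -v tank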
Quote: "Some other RAID implementations may or may not stop cold with a failed stripe. Likewise, Linux' RAID handles non-TLER disks just fine, along with degraded arrays."
We agree here. I would add BSD's GEOM software RAID, which is among the best and in some areas better than what Linux offers. But popular engines like Windows FakeRAID (Intel/AMD/nVidia/Silicon Image/ASMedia/Promise/JMicron/and so on) and quite a few hardware RAID cards - including the popular Areca series - are doomed. I should add that they may have fixed the worst of these issues in recent firmware I don't know about.
But this issue was swept under the carpet for a long time. Stupid RAID engines kick the disk out of the array and immediately write/update metadata on the disks to reflect the new (broken/degraded) array status, so a power-cycle or reboot will not fix it. Many home users have lost their data this way, all because a single bad sector caused panic in a badly designed RAID engine. It makes me feel both sad and angry when I consider how many people across the globe this affects. Technology should be smart, sleek and sexy. ZFS is just so much better in so many ways. There will be something better in the future, but right now ZFS is miles ahead of the legacy RAID and 2nd generation filesystem crap that cannot even detect corruption to begin with.