btrfs is supposed to be fault tolerant right?

Elixer · May 16, 2016

Long story short, was using btrfs, power cut out to the machine, and on next boot, I see something about btrfs flash on the screen, and then, nothing.

It just wouldn't recover.
I had to boot a linux rescue CD, and then do btrfsck, and then, finally it was back to normal.

This was on 16.04 Ubuntu I have been testing.

Anyone else playing around with btrfs, and run into this?

lxskllr · May 16, 2016

I wouldn't use btrfs on a mission critical machine. From what I've read around, it mostly works except when it doesn't. Most people use it without issue, and love it, but the rest get catastrophic failure. I'll stick with ext4 til it gets more mature, then think about switching.

Elixer · May 16, 2016

I was just surprised that it could fail so easily. Having a power outage isn't unheard of, yet, I can repeat this as many times as I want, and it always fails.

I was thinking using btrfs would be nice on a old SSD since it is supposed to keep track of bit errors and the like, but, so far, I am unimpressed with how it handles faults, and the r/w speeds are slower than ext4 for the things I was testing.

My test machine that this is on just don't have enough RAM for ZFS, and is limited to only 4GB. (Old machine).

replica9000 · May 16, 2016

btrfs wasn't unstable for me, but was slow on performance for my particular setup. I also didn't like the way it dealt with metadata.

I use ZFS on top of luks on a machine with only 4GB of RAM, and it runs fast with no issues.

Red Squirrel · May 16, 2016

I read on it, it looks like something very interesting but last I read it was not yet fit for production. I never looked at it further though when I realized my performance issues were NFS related and not mdadm raid, so sticking to that for now.

Elixer · May 17, 2016

Since I am too lazy to make a new thread, I am curious, how do you guys transfer from one file system to another?
Do you guys use something like cp -aX, or do you go with tar cpBdf - . | ( cd /filesystem2 ; tar xvpBdf - ) or just use rsync?

mv2devnull · May 17, 2016

rsync

replica9000 · May 17, 2016

I use rsync -Pavx

Essence_of_War · May 17, 2016

Elixer said:
Since I am too lazy to make a new thread, I am curious, how do you guys transfer from one file system to another?
Do you guys use something like cp -aX, or do you go with tar cpBdf - . | ( cd /filesystem2 ; tar xvpBdf - ) or just use rsync?

Rsync, typically with

-avhiP --stats

And -z if it is between hosts over something like ssh because why not, and -n before I do it for real.

Fallen Kell · May 18, 2016

lxskllr said:
I wouldn't use btrfs on a mission critical machine. From what I've read around, it mostly works except when it doesn't. Most people use it without issue, and love it, but the rest get catastrophic failure. I'll stick with ext4 til it gets more mature, then think about switching.

Hahaha..... I was going to say something similar, but I don't even trust EXT4 and was going to say use EXT3 where you can.

replica9000 · May 18, 2016

Fallen Kell said:
Hahaha..... I was going to say something similar, but I don't even trust EXT4 and was going to say use EXT3 where you can.

EXT4 has been solid for me. I haven't lost any files to it yet, even with multiple crashes on several systems. My Ex's laptop had no usable battery, and a faulty power cord. That laptop probably saw half a dozen power failures a day, and never lost any files.

replica9000 · May 18, 2016

Elixer said:
My test machine that this is on just don't have enough RAM for ZFS, and is limited to only 4GB. (Old machine).

I wanted to add that the recommendation for RAM with ZFS was 1 gigabyte of RAM per 1 terabyte of storage. I also found out that ZFS won't compress a block unless that block can be compressed by at least 12.5% or more.

Fallen Kell · May 19, 2016

replica9000 said:
I wanted to add that the recommendation for RAM with ZFS was 1 gigabyte of RAM per 1 terabyte of storage. I also found out that ZFS won't compress a block unless that block can be compressed by at least 12.5% or more.

And you risk data corruption without ECC memory as RAM is used as level 1 cache for all writes to disk.

Don't get me wrong, ZFS is a really great filesystem. Probably the best there is in terms of data integrity, but only if you use proper hardware.

replica9000 · May 19, 2016

Fallen Kell said:
And you risk data corruption without ECC memory as RAM is used as level 1 cache for all writes to disk.

That would affect any filesystem though.

jhu · May 19, 2016

I only trust my ~~porn~~ important data with ZFS.

Essence_of_War · May 19, 2016

Fallen Kell said:
Hahaha..... I was going to say something similar, but I don't even trust EXT4 and was going to say use EXT3 where you can.

Google and Backblaze both trust ext4.

Ext4 is fine.

Fallen Kell · May 19, 2016

replica9000 said:
That would affect any filesystem though.

Not as much as ZFS. While it is true that most of your data will be in RAM at some point for anything written to disk, with ZFS the memory used would be first copied to yet another part of RAM, using 2x or more the amount of RAM, doubling the chance that the file moved through a bad section of RAM. You also have to contend with the write block checksum that ZFS creates of the particular write which gets created in RAM while the file is still cached in RAM, further increasing the space used up in RAM, increasing the chance that the operation used a bad part of RAM. And if the checksum is bad for that write block, ZFS will mark the file as corrupt and believe your disks are failing, when in fact it is your memory that is bad. And I didn't even account for possible parity blocks being created since I am not yet making the assumption that you are using a RaidZ or RaidZ2 disk group(s) in your Zpool, but if you did, you again just increased you memory footprint in RAM. Then if you take advantage of using a ZFS snapshots, you just increased you RAM usage again for any files that you change... I can keep going on and on explaining how much more RAM ZFS uses and thus the higher increased likely hood of file corruption from non-ECC memory, but all the documentation already states this.

Please trust me, I do this for a living. I love ZFS, but as I said, you need the right hardware to use it, and if you don't have that correct hardware, it is much more dangerous to your data than using some other filesystem.

Elixer · May 20, 2016

Well, for me at least, the whole point in using btrfs was to try and detect SSD bit errors, since I don't really trust it anymore.
Having btrfs fail so easily, was depressing.

The only other choice is ZFS and the system don't have ECC RAM, so, I wouldn't be able to tell if the error is via the RAM or the SSD itself, but, wouldn't ZFS still be more robust that btrfs in the original and other recovery scenarios?

Fallen Kell · May 30, 2016

Elixer said:
Well, for me at least, the whole point in using btrfs was to try and detect SSD bit errors, since I don't really trust it anymore.
Having btrfs fail so easily, was depressing.

The only other choice is ZFS and the system don't have ECC RAM, so, I wouldn't be able to tell if the error is via the RAM or the SSD itself, but, wouldn't ZFS still be more robust that btrfs in the original and other recovery scenarios?

It is only more robust if you trust your memory and avoid many of the features of ZFS that make it so great. Absolutely do not enable data de-duplication. Absolutely do not use or create any data snapshots. Get a small SSD designed for your zil-log (extremely high number of small writes, so you want something with extensive number of writes/re-writes before failure). I would also run at a minimum weekly memory checks (memtest86). And even then, I would also do data backups.

ZFS is the best filesystem out there for protecting against bit-flips on the disk/storage drive, but as I stated, is extremely vulnerable to problems from RAM, which is why only ECC RAM should be used.

Red Squirrel · May 30, 2016

If you want solid and dependable, check out mdadm raid. Even after hard shut downs I've never had it fail on me to the point of losing data. I've had close calls caused by hardware failure (losing 2 drives in a raid 5) or situations where it had to resync the array, but I can't think of any incident where I had an array actually go completely dead on me. Before someone else says it, no, raid is not backup, but if you never have to use your backups then it's always better.

Though I always had the impression ZFS was even more solid, I just never really looked much into it myself.

Fallen Kell · May 30, 2016

ZFS can be much more solid than mdadm. It has all the same basic concepts of RAID devices (mirrors, essentially RAID 5 (single parity disk), RAID 6 (dual parity disks), and many combinations of them, heck you can have a filesystem that is built on a combination of mirrors, RAID 5, and RAID 6 all at the same time... that being said, there are performance and space consequences which may make it overkill depending on what you do and why).

At work, I have used ZFS to create a disk pool across 4x12disk disk arrays. I used a stripe of multiple raidz2 drives and a hot spare for each raidz2 as well as 2 hot spares for the entire pool. The specific ordering/grouping of the drives allowed for 1 disk array to be completely lost/removed and yet still have data integrity within the system, as well as handle the read and write operations of 2500-3000 CPUs from a small cluster. While initially people believed the setup was overkill in terms of the drive configuration, it turned out to have been the right call since I knew the data on there was not intended to be backed up. We suffered a failure which resulted in the loss of one of the disk arrays, but my foresight in ordering the disks such that no single raidz2 had more than 2 disks on any one array resulted in us not losing the data and ran perfectly fine for the 2 weeks it took to get a replacement.

ZFS is most probably the most powerful and capable filesystem that currently exists for data integrity. It is, however, entirely what you make of it, as improperly configured, you will be worse off than having just done a concatenation of your disks. But it is still one of the only filesystems that can protect against silent data corruption as it is the only one that I know of that will checksum each write-block and can then flag a portion that is bad and attempt to rebuild it using parity blocks (if the underlying zpool had parity blocks from using raidz or raidz2) or compare against mirrors (if the underlying zpool had mirrors).

thecoolnessrune · Jun 17, 2016

Fallen Kell said:
ZFS is most probably the most powerful and capable filesystem that currently exists for data integrity. It is, however, entirely what you make of it, as improperly configured, you will be worse off than having just done a concatenation of your disks. But it is still one of the only filesystems that can protect against silent data corruption as it is the only one that I know of that will checksum each write-block and can then flag a portion that is bad and attempt to rebuild it using parity blocks (if the underlying zpool had parity blocks from using raidz or raidz2) or compare against mirrors (if the underlying zpool had mirrors).

ReFS + Storage spaces offers similar data protection abilities. They certainly have their tradeoffs. ZFS is a Storage system capable of Petabyte scaling, whereas Storage Spaces has a much smaller limit in the tens of terabytes (though its increasing with every release).

I think, especially in small environments, a big limitation in ZFS is that although it has an immense amount of data care, it is limited to a singular node, and if that node goes down, all the data care in the world won't maintain your uptime. Storage Spaces offers effective Scale Out Storage features on par with the big boys, including Storage Spaces Direct coming with Server 2016.

ZFS at this point is still mostly limited to ludicrously expensive RSF-1 deployments for high availability, and it's still very limited compared to a true scale out system. Zetavault may change some of that, but you're still paying roughly $1,500 for licenses for a 20TB license pack.

For home users planning a home lab environment, I'd be inclined to recommend 2 Windows Server 2012 R2 Essentials nodes with Hyper-V over a ZFS + Hypervisor deployment.

That being said, I'm still using, and enjoying my ZFS Deployment

Fallen Kell · Jun 21, 2016

thecoolnessrune said:
ReFS + Storage spaces offers similar data protection abilities. They certainly have their tradeoffs. ZFS is a Storage system capable of Petabyte scaling, whereas Storage Spaces has a much smaller limit in the tens of terabytes (though its increasing with every release).

I think, especially in small environments, a big limitation in ZFS is that although it has an immense amount of data care, it is limited to a singular node, and if that node goes down, all the data care in the world won't maintain your uptime. Storage Spaces offers effective Scale Out Storage features on par with the big boys, including Storage Spaces Direct coming with Server 2016.

I would argue with that "limitation". The limitation only exists in that it isn't inherently a "clustered" filesystem meant to be actively controlled via more than one host server. Nothing prevents you from creating a high availability cluster which runs your storage as a clustered service, either as a virtual machine, a Veritas service, or Ricci/Lucci/Cman_tools service. The idea is that you create your disk pools from disks on your SAN, and have multiple servers connected to the SAN which have access to the disks. If the server that was currently hosting the disks fails, the HA service would notice that it is down and failover to another server (using the zpool export/import commands), fires up the IP address used to host this data on the new server, and you are done.

thecoolnessrune said:
ZFS at this point is still mostly limited to ludicrously expensive RSF-1 deployments for high availability, and it's still very limited compared to a true scale out system. Zetavault may change some of that, but you're still paying roughly $1,500 for licenses for a 20TB license pack.

As I just explained, no, it isn't limited to ludicrously expensive RSF-1 deployments for high availability. You can do this with a free linux distro such as CentOS and setup lucci/ricci/cman_tools high availability clustering. The only cost is the SAN or other methods of attaching disks to multiple servers (iSCSI, etc).

thecoolnessrune · Jun 21, 2016

Fallen Kell said:
I would argue with that "limitation". The limitation only exists in that it isn't inherently a "clustered" filesystem meant to be actively controlled via more than one host server. Nothing prevents you from creating a high availability cluster which runs your storage as a clustered service, either as a virtual machine, a Veritas service, or Ricci/Lucci/Cman_tools service. The idea is that you create your disk pools from disks on your SAN, and have multiple servers connected to the SAN which have access to the disks. If the server that was currently hosting the disks fails, the HA service would notice that it is down and failover to another server (using the zpool export/import commands), fires up the IP address used to host this data on the new server, and you are done.

This isn't even close to the same level of redundancy and high availability of a truly clustered file system implementation. There is an *immense* difference in telling my client that they had a node shutdown, but they still have one or more nodes holding up the system, and there was no downtime, vs. a node failure, high availability kicking in, losing 300 VMs hosted on it, and now we're in the process of starting them back up and hoping there wasn't any data loss.

Fallen Kell said:
As I just explained, no, it isn't limited to ludicrously expensive RSF-1 deployments for high availability. You can do this with a free linux distro such as CentOS and setup lucci/ricci/cman_tools high availability clustering. The only cost is the SAN or other methods of attaching disks to multiple servers (iSCSI, etc).

Until ZFS gets the ability to either A: Have its controller exist on multiple nodes, or B: Failover seamlessly, and automatically occurs within 10 seconds of node failure, then its just a bandaid when there are indeed other alternatives.

Like I said, I use ZFS now in my home. I love it. I have no real problems with the system. But it doesn't exist in a bubble, and when you mentioned that no other system has such data protections, that's not true anymore. Likewise Storage Spaces has a real advantage when it comes to system redundancy and high availability. Again, the only thing that gets that close to that, that I know, that gets ZFS close to that is RSF-1, which has some caching special sauce to get failover time down to a couple of seconds, which is acceptable for the vast majority of environments. But whether we like it or not, a system that is not designed to seamlessly hide failures is not a system that is designed for today's modern infrastructure.

Everything is going redundant. Disk systems have been forever. Disk controllers, servers, and programs are all built in modern times to exist in active / active groups. No failover, no failures, just work getting shunted to whatever is available to participate. Now high performance file systems are starting to get this focus. And whether or not true non-failure behavior is important to your project, there's no doubt that it's important to a lot of stakeholders.

Fallen Kell · Jun 21, 2016

How long do you think it takes to do a zpool import? We have tested it many times and it only takes 1-3 seconds. The entire failover only takes 5-10 seconds, well within the timeout values of NFS and SAMBA. And again, that is when created as a HA service group such as Ricci/Luci. You could easily do it as a VM on a clustered VM environment which support live migration/etc...

btrfs is supposed to be fault tolerant right?

Lifer

No Lifer

Lifer

Member

No Lifer

Lifer

Golden Member

Member

Platinum Member

Diamond Member

Member

Member

Diamond Member

Member

Lifer

Platinum Member

Diamond Member

Lifer

Diamond Member

No Lifer

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member