btrfs is supposed to be fault tolerant right?

thecoolnessrune (Diamond Member, joined Jun 8, 2005), replying to an earlier post:
> How long do you think it takes to do a zpool import? We have tested it many times and it only takes 1-3 seconds. The entire failover only takes 5-10 seconds, well within the timeout values of NFS and Samba. And again, that is when created as an HA service group, such as with Ricci/Luci. You could easily do it as a VM on a clustered VM environment that supports live migration, etc.
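
For anyone who wants to check that claim on their own hardware, the storage-side half of a failover is easy to time (a minimal sketch, assuming a hypothetical pool named "tank"; the numbers swing widely with disk count, share count, and cache devices):

    # time a full export/import cycle, roughly the storage-side work of a failover
    time ( zpool export tank && zpool import tank )

    # with no arguments, zpool import just scans for pools that could be imported
    zpool import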

You may be having great luck with ZFS's failover abilities, and that's great; I am (mostly) as well in the deployments I've looked over and in my home. However, import times for many are not "5-10 seconds" as you claim. That may be the case on raw iSCSI LUNs with no L2ARC, but ZILs, L2ARC, NFS, and of course a lot of disks can drastically increase the time it takes to do a pool import. Solaris 11's ZFS has a lot of little bits that work to improve this, a huge one being parallel disk scan operations, but it can still take a long time; this was posted about fairly often in the Solaris 10 days. Graceful export times (exports not related to a system crash) can also take a long time, because the exportfs command runs one share at a time (so if you have a lot of NFS shares, you're boned), and so does the zfs umount. It then has to de-serialize the L2ARC if it's in place.
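
The serial behavior is easy to see for yourself (a rough sketch; the exact plumbing differs between Solaris and Linux, but the one-dataset-at-a-time pattern is the same):

    # roughly how many NFS shares the export path has to walk, one by one
    zfs list -H -o name,sharenfs | grep -vc off

    # tear everything down; each dataset is unshared/unmounted serially
    time zfs unshare -a
    time zfs umount -a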

Building up all that gear to do it can make sense; Supermicro even offers a pod based on the Lustre solution. ZFS *can* be highly available, as you've noted several times, but it's not without its hiccups. It's also based on manually assigning resources to various nodes and hoping that you load-balance OK, rather than universal "right workload to the right node" management. And it's not baked in.

ZFS HA simply does not have the I/O granularity and uptime resiliency through a node failure that other cluster solutions do. That does not make ZFS a bad system by any stretch of the imagination. But to purport that a scripted-together HA backend, on a filesystem that is not capable of coordinating with multiple nodes, is on par with a fully supported scale-out filesystem that was built from the ground up to support clustering is simply a fallacy. For certain customers (especially customers needing huge amounts of data storage), clustering ZFS with RSF-1, Nexenta, or others makes sense, as does doing it yourself if you're a small shop that cannot afford supported systems. But it's hardly the same as what a true clustered filesystem does.
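
For what it's worth, "scripted together" is not an exaggeration. The start/stop half of a DIY ZFS failover service group usually boils down to something like this (a bare-bones sketch only; "tank" is a hypothetical pool name, and a real setup also needs fencing, monitoring, and a floating IP):

    #!/bin/sh
    # zfs-failover.sh - start/stop hook for a cluster service group to call
    POOL=tank

    case "$1" in
      start)
        # -f is needed after a node crash, since the pool still looks in use
        zpool import -f "$POOL" && zfs share -a
        ;;
      stop)
        zfs unshare -a && zpool export "$POOL"
        ;;
      *)
        echo "usage: $0 {start|stop}" >&2
        exit 1
        ;;
    esac

The filesystem itself has no idea the other node exists; all of the "clustering" lives in that script and the service manager wrapped around it, which is exactly the gap between this and a filesystem that coordinates between nodes natively.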

There are reasons why Nutanix, Nimble, Tegile, Pure, Tintri, NetApp, EMC, Microsoft, Hitachi, and others are all doing natively clustered storage arrays for their latest generation. The times, they are a-changin' :)