LUN/Datastore Issue

Page 2 - AnandTech Forums

gea

Senior member
Aug 3, 2014
It's free.

http://www.oracle.com/technetwork/s...11/downloads/index.html?ssSourceSiteId=ocomen

Server 2k12 ReFS has come a long way, to be sure, but it's still far inferior to ZFS IMO. Not saying ZFS is the be-all-end-all, right-for-everybody option. I'd still stick with old-fashioned RAID with BBWC for anything business-critical.

But I completely agree on the issue with the forks.

Oracle Solaris 11 is not free.
Without a paid subscription you may only install it for development and demo use.

What is free are some distributions based on the open-source Solaris fork. There is only one such fork, Illumos, which is the upstream for distributions like NexentaStor (commercial), OmniOS (free, with commercial support as an option), OpenIndiana, and SmartOS.

Besides that, only ZFS (and partly other new filesystems like btrfs and ReFS, or commercial storage boxes like NetApp's) offers a new set of storage features that are badly needed on modern high-capacity disks:

- Copy on write = always-consistent filesystem = no offline fsck
- Snapshots and versioning without initial space consumption or delay
- Realtime checksums (always valid data) + self-healing
- Advanced caching and log features (ARC, ZIL)
- Pooling with storage virtualisation
- Software RAID without the write-hole problem

Especially the last one is the real problem with hardware RAID, where data corruption can always happen on a crash during a write. So no, I would never go back to hardware RAID.

Currently, ZFS offers a unique set of modern storage features, especially on Solaris and Co.
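The realtime-checksum and self-healing point from the list above can be sketched in a few lines of Python. This is a minimal toy model, not ZFS code: the mirror, the disk names, and where the checksum lives are all invented for illustration.

```python
import hashlib

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

# A toy two-way mirror (names invented for illustration). The checksum is
# stored separately from the data - in ZFS it lives in the parent block -
# so a silently corrupted copy can be detected on read.
good = b"important data"
mirror = {"disk_a": good, "disk_b": b"important dat\x00"}  # one side bit-rotted
meta_checksum = checksum(good)

def mirrored_read(mirror, meta_checksum):
    """Return a copy that matches the checksum; repair any side that fails it."""
    valid = next((b for b in mirror.values() if checksum(b) == meta_checksum), None)
    if valid is None:
        raise IOError("no valid copy left on either side of the mirror")
    for name, block in list(mirror.items()):
        if checksum(block) != meta_checksum:
            mirror[name] = valid   # self-healing: rewrite the bad copy
    return valid

data = mirrored_read(mirror, meta_checksum)
print(data == good, mirror["disk_a"] == mirror["disk_b"])  # True True
```

The point of the sketch: with an independent checksum, the read path can tell which side of the mirror is valid and silently fix the other - something a plain hardware mirror cannot do.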
 

imagoon

Diamond Member
Feb 19, 2003
ZFS has a weakness related to the write-hole issue. All filesystems do: incomplete writes can always cause damage or inconsistency. Even if ZFS rolls back a write using copy on write, you may not have a coherent block chain afterwards. ZFS itself can be consistent, but the data sitting on top may be damaged.

No matter the disk tech, there is a very good reason to have NV caches that hold the blocks of data prior to the write commit, so that when the system is powered back up the data can be committed to disk.
 

gea

Senior member
Aug 3, 2014
ZFS has a weakness related to the write-hole issue. All filesystems do: incomplete writes can always cause damage or inconsistency. Even if ZFS rolls back a write using copy on write, you may not have a coherent block chain afterwards. ZFS itself can be consistent, but the data sitting on top may be damaged.

No matter the disk tech, there is a very good reason to have NV caches that hold the blocks of data prior to the write commit, so that when the system is powered back up the data can be committed to disk.

That is not the core of the problem, and it is not really true.
Example:

If you have a conventional hardware RAID-1 without a BBU (to start with a simple example) and a crash happens while you are editing data, you end up with one of these results:

- both disks are updated correctly: fine
- both disks contain errors (this can result in an inconsistent filesystem, as metadata may be partly updated incorrectly)
- often, one disk is updated while the other is not. On the next read, the system may read randomly from either disk (faulted data or correct data)

The main problem: without checksums, the system cannot decide whether data is valid. You can use a BBU on the RAID controller to reduce the problem, but it does not eliminate it - it only lowers the chance of corruption.

A copy-on-write filesystem like ZFS with software RAID works differently.
When you write to the RAID to update some data, the modified datablock is not overwritten in place but written completely anew (copy on write). On a crash during the write, the old data structure from before the write remains valid. Only when all data is written are the pointers updated, making the new data valid on disk. This is an all-or-nothing ("ok or not done") behaviour, not a rollback.

This does not prevent losing the file currently being written, but it does prevent an inconsistent filesystem.
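The "ok or not done" behaviour can be sketched as a toy model (all names are invented, the "disk" is an in-memory dict; this is not actual ZFS code): the new block always goes to fresh space first, and a single pointer switch commits it.

```python
# Toy copy-on-write commit, assuming an in-memory "disk" (all names here
# are invented for illustration). A modified block is never overwritten;
# it is written to fresh space, and one pointer switch makes it live.
blocks = {}        # block address -> data (the "disk")
next_addr = 0
root = None        # the single mutable pointer (like the ZFS uberblock)

def alloc(data: bytes) -> int:
    """Write data to a brand-new address; existing blocks stay untouched."""
    global next_addr
    addr = next_addr
    next_addr += 1
    blocks[addr] = data
    return addr

def cow_update(data: bytes, crash_before_commit: bool = False):
    global root
    addr = alloc(data)       # step 1: write the new block to fresh space
    if crash_before_commit:
        return               # crash here: root still points at the old data
    root = addr              # step 2: atomic pointer switch = commit

cow_update(b"version 1")
cow_update(b"version 2", crash_before_commit=True)
print(blocks[root])          # b'version 1' - the old state survived the crash
cow_update(b"version 2")
print(blocks[root])          # b'version 2'
```

A crash anywhere before the pointer switch leaves the old tree fully intact - which is exactly why there is no write hole and no offline fsck.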

The remaining problem is the cache on the disk and the write cache within the OS. This is the same with ZFS and other filesystems. With ZFS, you can address this with sync-write behaviour and a ZIL log device, which gives you a replay option. With a dedicated log device, this is done without speed degradation.
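The sync-write/ZIL idea can be sketched the same way - a hypothetical toy model, not the real on-disk ZIL format: a synchronous write is appended to a log before it is acknowledged, and after a crash the acknowledged writes are replayed.

```python
# Toy intent log, loosely modelled on the ZIL idea (hypothetical names,
# not the real on-disk format). A sync write is acknowledged only after
# it is in the log; after a crash, acknowledged writes are replayed.
log = []       # stands in for the dedicated log device (SLOG)
store = {}     # the main pool, which is updated lazily

def sync_write(key: str, value: bytes):
    log.append((key, value))   # persisted (and flushed) before the ack
    # ...acknowledge to the application; the main pool catches up later...

def replay_after_crash():
    for key, value in log:     # redo every acknowledged write
        store[key] = value

sync_write("file1", b"hello")
sync_write("file2", b"world")
# crash strikes before the lazy writes ever reach the main store
replay_after_crash()
print(store)   # {'file1': b'hello', 'file2': b'world'}
```

With a fast dedicated log device, the application pays only for the log append, which is why sync writes need not slow the pool down.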

A similar grade of security (or similar performance) cannot be achieved with hardware RAID, as it requires the OS to have full control over the RAID down to the disk - especially when paired with checksums that let you decide whether data is valid (or, in the case of a mirror, which disk contains valid data after a problem). If there is redundancy, errors are repaired automatically during reads.
 

imagoon

Diamond Member
Feb 19, 2003
That is not the core of the problem, and it is not really true.
Example:

If you have a conventional hardware RAID-1 without a BBU (to start with a simple example) and a crash happens while you are editing data, you end up with one of these results:

- both disks are updated correctly: fine
- both disks contain errors (this results in an inconsistent filesystem)
- often, one disk is updated while the other is not. On the next read, the system may read randomly from either disk (faulted data or correct data)

The main problem: without checksums, the system cannot decide whether data is valid. You can use a BBU on the RAID controller to reduce the problem, but it does not eliminate it - it only lowers the chance of corruption.

A copy-on-write filesystem like ZFS with software RAID works differently.
When you write to the RAID to update some data, the modified datablock is not overwritten in place but written completely anew (copy on write). On a crash during the write, the old data structure from before the write remains valid. Only when all data is written are the pointers updated, making the new data valid on disk. This is an all-or-nothing ("ok or not done") behaviour, not a rollback.

This does not prevent losing the file currently being written, but it does prevent an inconsistent filesystem.

The remaining problem is the cache on the disk and the write cache within the OS. This is the same with ZFS and other filesystems. With ZFS, you can address this with sync-write behaviour and a ZIL log device, which gives you a replay option. With a dedicated log device, this is done without speed degradation.

A similar grade of security (or similar performance) cannot be achieved with hardware RAID, as it requires the OS to have full control over the RAID down to the disk - especially when paired with checksums that let you decide whether data is valid (or, in the case of a mirror, which disk contains valid data after a problem). If there is redundancy, errors are repaired automatically during reads.

I fully understand all of what you said. However, even copy on write will not leave you with consistent data. Sure, the ZFS file structure survives (because on the next boot the failed write is rolled back), but the data living on it is suspect.

Hardware RAID has the same issue. Granted, the BBU you mentioned is yesterday's technology; nearly everything now is NVRAM. The files on the disk are suspect, but the filesystem itself can pass with flying colors. There are also other techniques out there now, like extended checksums on the drives themselves - 520-byte/4160-byte sectors that effectively do the same thing - along with ReFS cluster checksumming on the Windows side.
 

gea

Senior member
Aug 3, 2014
You cannot solve write-hole problems with any other technology in a way similar to copy on write. This is the reason why other modern filesystems (btrfs, ReFS, or WAFL/NetApp) are copy on write as well - this is definitely the future, especially as many other problems are addressed by this new generation of filesystems, like end-to-end checksums (disk to OS), not only the disk-internal checksumming that cannot help decide whether the real data is valid.