Question: My First NAS: Newbie Q&A on Hashing, Data & RAID Scrubbing, and Checksumming for Backups

chane

Member
Apr 18, 2010
In pursuit of building (or rather, having my local IT guy build) my first NAS, I’ve sunk my newbie brain as deep as it can go into learning how best to use it once my builder finishes the hardware and OS installs and walks me through the GUI.

Of course, beyond basic storage capacity and drive redundancy to prevent the loss of user files, a NAS (or any server) and its file system (ZFS or Btrfs) are only as useful as their ability to prevent data corruption. Save for the crazy math (and terms like “pool,” which seems to have multiple meanings in the data-storage biz), these articles were helpful for learning about hash functions and the tables of hash codes (“hashes”) they apparently create for each document, photo, audio, or video file:
https://en.wikipedia.org/wiki/Hash_function
https://en.wikipedia.org/wiki/Checksum

But please help with these questions:

Is a hash code automatically created for every user file (e.g., document, photo, audio, video) the first time it gets written to the NAS? Or do you have to use some kind of app or NAS utility to generate and assign a hash code to every one of your files?

And where are those codes stored? Inside of the file’s own container? Or are all user file hash codes stored someplace else? In a “hash table” and/or on a drive partition on the RAID drive array?

Are these hash codes what the ZFS and Btrfs file systems use for routine data scrubbing?

https://blog.synology.com/how-data-scrubbing-protects-against-data-corruption
https://www.qnap.com/en/how-to/tuto...a-corruption-by-using-data-scrubbing-schedule
https://par.nsf.gov/servlets/purl/10100012

Then, as mentioned in the above links, following data scrubbing, are these hash codes also usually used for routine RAID scrubbing?

But for both data and RAID scrubbing, is data integrity ensured by comparing each file’s original hash code (the first one ever created for it, stored wherever it is stored) against a hash code freshly computed from the file as it currently exists? If the system’s comparison shows that the codes differ, then one or more of the file’s bits have flipped, so it knows the file is corrupt?

If yes, then at that point will it flag me and ask whether I want the system to attempt a repair?

If I say yes, will it then try to overwrite the corrupt file with a good copy rebuilt from the redundant (e.g., RAID 5) array?
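
If that’s roughly the idea, here is a minimal sketch of it at the file level, just to make the mechanics concrete. It assumes you (or a utility) kept a manifest of SHA-256 hashes recorded when each file was first written; the manifest layout here is made up for illustration, and ZFS/Btrfs actually do this internally per data block rather than per file.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file in 1 MB chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_corrupt_files(root: Path, manifest_file: Path) -> list[Path]:
    """Compare each file's current hash to the hash recorded when it was first stored."""
    manifest = json.loads(manifest_file.read_text())  # e.g. {"photos/img001.jpg": "ab12...", ...}
    corrupt = []
    for relative_path, original_digest in manifest.items():
        if sha256_of(root / relative_path) != original_digest:
            corrupt.append(root / relative_path)  # at least one bit differs from the original
    return corrupt
```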


CAUTION: As RAID scrubbing puts mechanical stress and heat on HDDs, the rule of thumb seems to be to schedule it once monthly, and only when the drives are otherwise idle, so no user-triggered read/write errors can occur. https://arstechnica.com/civis/threa...bad-for-disks-if-done-too-frequently.1413781/

Beyond scrubbing, what else can I, or ZFS and/or Btrfs, do to combat bit rot?

And to minimize the risk of crashes:

Replace the RAIDed HDD array every 3 (consumer) to 5 (enterprise grade) years.

Do not install any software upgrade for the NAS until it has been out long enough for the NAS brand and the user community forums to declare it bug-free.

What else can I do to minimize the risk of crashes?


Finally, when backing up from my (main) NAS to an (ideally identical??) NAS, Kunzite says here “…and I'm check summing my backups...”
https://forum.qnap.com/viewtopic.php?t=168535

But hash functions are never perfect, and, while rare, hash “collisions” are inevitable: https://en.wikipedia.org/wiki/Hash_collision So, just as those hash algorithms are used for data and RAID scrubbing, they are evidently also used for checksumming to ensure that data transfers from the NAS to a backup device happen without file corruption.

Apparently, CRC-32 is among the least collision-resistant hash algorithms. https://en.wikipedia.org/wiki/Hash_collision#CRC-32

Thus, for backups from the main NAS to the backup NAS, how much more worthwhile is the SHA-256 hash function (algorithm) than MD5 for preventing collisions and verifying the data integrity of user files via checksumming, given that it uses twice the number of bits?

But if it is not much more advantageous even for potentially large audio files ( https://www.hdtracks.com/ ), would SHA-256 then be a lot better than MD5 for checksumming backups of DVD movie rips saved as uncompressed MKV and/or ISO files, since video files are so much bigger than audio files?
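
To get a feel for how much the digest size matters for accidental collisions, the usual “birthday bound” estimate puts the chance that any two of n files share a digest at roughly n^2 / 2^(b+1) for a b-bit hash. A quick sketch (the one-million-file count is just an example):

```python
import math

def collision_probability(n_files: int, digest_bits: int) -> float:
    """Birthday-bound estimate of the chance that any two of n random digests collide."""
    # -expm1(-x) computes 1 - e^(-x) accurately even when x is extremely small.
    return -math.expm1(-n_files * (n_files - 1) / 2 / 2**digest_bits)

for name, bits in [("CRC-32", 32), ("MD5", 128), ("SHA-256", 256)]:
    print(f"{name:8s} ({bits:3d} bits): {collision_probability(1_000_000, bits):.3g}")
```

With these numbers, an accidental MD5 collision among your own files is effectively impossible even for a huge library; the practical case for SHA-256 is mainly that MD5 collisions can be deliberately constructed (it is cryptographically broken), plus the extra margin. Note that the odds don’t depend on whether a file is audio or video; file size affects hashing time, not collision risk.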

And what would be a recommended checksum calculator app? https://www.lifewire.com/what-does-checksum-mean-2625825#toc-checksum-calculators
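
For what a “checksum calculator app” boils down to, here is an illustration-only version using Python’s standard hashlib; the GUI tools in that Lifewire list do essentially this and display the digests for you to compare:

```python
import hashlib
import sys

# Usage: python checksum.py /path/to/somefile.mkv
path = sys.argv[1]
md5, sha256 = hashlib.md5(), hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        md5.update(chunk)     # feeds the 128-bit MD5 digest
        sha256.update(chunk)  # feeds the 256-bit SHA-256 digest
print("MD5:    ", md5.hexdigest())
print("SHA-256:", sha256.hexdigest())
```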

But if the app reports a checksum mismatch between the file on my main NAS and the copy to be updated on my backup NAS, how do I then repair the corrupt file?

Again, by using the file’s original hash code (stored someplace), created the first time it was ever stored on the NAS?

If yes, would that app then prompt me to choose to have the system repair the file?
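
On the repair question: at the backup level, “repair” usually just means re-copying the known-good side over the bad one and re-verifying. A rough sketch, assuming you still have the original hash to decide which copy is good (the paths and helper are placeholders for illustration, not a real NAS feature):

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def repair_from_good_copy(good: Path, bad: Path, original_digest: str) -> bool:
    """Overwrite the corrupt copy with a verified-good one, then re-check it."""
    if sha256_of(good) != original_digest:
        return False                          # both copies are suspect; fall back to another backup
    shutil.copy2(good, bad)                   # copy contents and timestamps over the corrupt file
    return sha256_of(bad) == original_digest  # confirm the repaired copy matches the original hash
```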
 
Feb 25, 2011
Hashing as a concept is used for several different things. It's not really something you'll need to worry about - the software doing the hashing will tell you whether a file or block of data is corrupt, or whether a file copy was successful.

Basically everything you're talking about is handled automatically by the file system, OS, or the backup scheduler and automation software. ZFS scrubs are run on a schedule and you don't really have to think about them. Likewise, most backup software will determine what and when to copy, on its own.
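
For example, on ZFS the “scheduled scrub” your consultant sets up is typically nothing more than a cron job or systemd timer running one command. A minimal sketch of such a wrapper (“tank” is a placeholder pool name):

```python
import subprocess

POOL = "tank"  # placeholder; substitute your actual pool name

# Start a background scrub: ZFS re-reads every allocated block, verifies its checksum,
# and repairs anything it can from the pool's redundancy.
subprocess.run(["zpool", "scrub", POOL], check=True)

# "zpool status" later shows scrub progress and any checksum errors found or repaired.
status = subprocess.run(["zpool", "status", POOL], capture_output=True, text=True, check=True)
print(status.stdout)
```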

The likelihood of you actually running into a bitrot problem where you'd have to choose whether to try to repair an individual file is... astronomically small. That said, if you're paying an IT consultant to set this all up for you, you should probably ask them to create a disaster recovery guide on how to handle things like drive failures, array rebuilds, etc. - they should be dry-running all that stuff anyway to be sure things are working before they hand you the keys.
 

Tech Junky

Diamond Member
Jan 27, 2022
Well, you dove a mile deep and an inch wide so far. I keep it simple and just did RAID 10 for speed and redundancy. After about 8 years I saw no data issues. Getting into MD5 checksums just adds to the overhead of the system. They're good if it's vital for a business that has compliance reviews and audits.

The biggest issue is if a transfer is interrupted by power or network loss. That's when you see corruption.
 

gea

Member
Aug 3, 2014
Only a few remarks:
Checksum protection, e.g. on ZFS, is not done at the file level but at the level of ZFS data blocks (16 KB to 1 MB).

Checksum errors on any read result in a data repair from RAID redundancy (self-healing filesystem). Additionally, you can start a pool scrub that reads and checks/repairs all data.
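
As a toy illustration of that self-healing behavior (not real ZFS code): each data block’s checksum is stored in its parent metadata, so on a read the filesystem can tell which redundant copy is good and rewrite the bad one:

```python
import hashlib

def self_healing_read(copies: list[bytes], stored_checksum: bytes) -> bytes:
    """Toy model: return a verified copy of a block and repair any copies that fail the checksum.

    `copies` stands in for the redundant copies held by the mirror/RAID-Z layout;
    the checksum itself lives in the parent metadata block, not inside the data block.
    """
    for data in copies:
        if hashlib.sha256(data).digest() == stored_checksum:
            for i, other in enumerate(copies):
                if other != data:
                    copies[i] = data          # "self healing": rewrite the damaged copy
            return data
    raise IOError("all redundant copies failed their checksum; restore from backup")
```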

Bitrot (data corruption that cannot be repaired by the disks themselves during a read) is a statistical problem. The number of errors depends on pool size, usage time, and hardware quality (in that order). The larger the pool and the longer you use it, the higher the risk; with a busy multi-TB pool over some years, errors are practically guaranteed. A bad PSU, RAM, cables, or disk bays can also be the cause of data corruption.
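
To put rough numbers on that statistical risk: consumer HDDs are commonly specified at about one unrecoverable read error per 10^14 bits read (enterprise models often 10^15). With made-up but plausible usage figures:

```python
TB_READ_PER_YEAR = 50      # assumed data read from the pool per year, in TB
YEARS_IN_SERVICE = 5
URE_RATE = 1e-14           # ~1 unrecoverable read error per 1e14 bits (typical consumer HDD spec)

bits_read = TB_READ_PER_YEAR * YEARS_IN_SERVICE * 1e12 * 8
expected_errors = bits_read * URE_RATE
print(f"Expected unrecoverable read errors over {YEARS_IN_SERVICE} years: {expected_errors:.0f}")
# ~20 with these assumptions, which is why scrubs plus redundancy matter on multi-TB pools.
```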

ZFS protects against undetected data corruption. Unlike with older filesystems, a crash during a write cannot damage a ZFS filesystem or its RAID, thanks to Copy on Write.

Use read-only snapshots, e.g. one per hour kept for the current day, one per day kept for the current week, and one per month kept for the current year, to be protected against human error or ransomware.
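
A minimal sketch of that kind of rotation, using the standard `zfs snapshot` / `zfs list` / `zfs destroy` commands; “tank/data” and the retention count are placeholders, and in practice the NAS GUI or a tool like zfs-auto-snapshot or sanoid does this for you:

```python
import subprocess
from datetime import datetime

DATASET = "tank/data"   # placeholder dataset name
KEEP_HOURLY = 24        # example retention: keep the last 24 hourly snapshots

# Take a timestamped snapshot (ZFS snapshots are read-only by nature).
snap_name = f"{DATASET}@hourly-{datetime.now():%Y%m%d-%H%M}"
subprocess.run(["zfs", "snapshot", snap_name], check=True)

# List this dataset's hourly snapshots oldest-first and destroy any beyond the retention count.
listing = subprocess.run(
    ["zfs", "list", "-t", "snapshot", "-H", "-o", "name", "-s", "creation"],
    capture_output=True, text=True, check=True,
).stdout.split()
hourly = [s for s in listing if s.startswith(f"{DATASET}@hourly-")]
for old_snap in hourly[:-KEEP_HOURLY]:
    subprocess.run(["zfs", "destroy", old_snap], check=True)
```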

Use ECC RAM, as RAM errors can corrupt data, with the effect that ZFS may write bad data with correct checksums. This too is a small statistical problem; the danger scales with RAM size.

Be prepared for a disaster (fire, theft, hardware running amok): do backups even with ZFS, ideally at a different location and on hardware that is online only during backups.

Prefer RAID layouts where any two disks can fail, e.g. ZFS RAID-Z2. Btrfs is not stable with such RAID arrays.
 