The111
Looking for critique on my thought process here, and a few unanswered questions regarding data corruption.
As a photographer and videographer I generate a sizeable amount of data very quickly, data that I want to last a lifetime. For years I archived it on optical discs in triplicate (2 copies stored at home, 1 away from home), but this grows increasingly impractical for many reasons, and I also still fear data dropout from the optical medium over time (everyone has had burned discs go bad).
So, I've swapped to HDD storage. Right now my current "archive of everything" is about 650GB. I have 3x 1TB drives (WD Black) that I use as mirrored backup disks (2 stored at home, 1 away from home). I don't like the idea of RAID because I don't want a disk array that relies on any software or hardware external to the disk itself to function (motherboard, drivers, etc.). I refer to my manually mirrored triplicate-disk scheme as "ghetto RAID" since it accomplishes the same thing (and more, since the 3rd disk is off-site in case of fire, etc.). The disks are only powered on when I am transferring data to or from them. I store them in ESD cases, and when I need to hook them up to a PC I use an eSATA dock. There is another 1TB drive in my desktop machine which contains the same archive for immediate access and for daily updates (the backup disks get updated much less often).
I am mostly content with my methods, but one thing concerns me: data corruption. I don't know much about it, but I do know that data can go bad over time. More specifically, I'm worried about unintentionally replicating corrupt data over good data in my manual backup process.
Scenario 1 (acceptable corruption): Let's say that in my desktop I have my current archive defined as ABCDEF. And I also have this ABCDEF mirrored on my 3 external disks. Over the course of a few weeks, I update the archive in my machine such that ABCDEF becomes ABCDEF+G. Very easy to update all 3 external disks to ABCDEFG (simply add G to each disk). If, anywhere along the line, "C" has gone corrupt on any of the 4 disks, that corruption stays isolated to the one disk (since I have not over-written any of the C's). Somewhere in the future I become aware of the corruption, and it's no problem, I simply get a good "C" from one of the other disks.
Scenario 2 (unacceptable corruption): Let's call the archive A1B1C1D1E1F1. Over the course of a few weeks, the version in my machine which I update daily is transformed into A2B2C2D2E2F2+G. These 1->2 changes are minor. Maybe I've altered 4 photos out of a batch of 4,000 in each instance. Maybe I've also modified the directory tree structure in such a way that it's easiest to just completely rewrite all 3 of my backup disks with the brand new A2B2C2D2E2F2G set, and blow away A1B1C1D1E1F1 on all of these backups. Now, what if, unknown to me, a large part of C1 became corrupt on my internal drive. I changed C1 to C2 by modifying 4 out of 4,000 photos (another 2,000 have gone bad but I never discovered it). When I get rid of all my backup C1's and replace them with C2's... I am replicating this unknown corruption onto all my backups. VERY BAD. How do I avoid this?
I think the key here is that the corruption occurred "unknown to me." As I said, I don't know a lot about data corruption... I just know it happens. How can I check for it? Right now, if I copy a very large directory from one disk to another, when the copy finishes I check the "properties" in Windows for each folder and compare the file count and byte count. If both numbers are identical, I have always assumed that the backup was successful and both folders on their respective disks are indeed identical. Is this a valid assumption? If it is, then I can always check for data corruption on my primary internal disk by comparing it to the external disks just before making any internal modifications which might result in overwriting archive data on all the external disks.
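Worth noting: matching file counts and byte counts can't catch bit rot, because a corrupted file usually keeps its exact size while its contents change. A content-level check means hashing every file. As a minimal sketch (function names are mine, not from any particular tool), you could build a checksum manifest of the archive and re-run it later to see whether any file's hash has silently changed:

```python
import hashlib
import os

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so large video files never load fully into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hash."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            manifest[rel] = file_sha256(full)
    return manifest
```

Comparing an old manifest against a fresh one flags exactly which files changed, even when their sizes didn't, so corruption on the internal drive would be caught before it gets mirrored outward.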
Absent these manual "corruption checks," I am sure there exist all sorts of "backup software" packages which I have always ignored because I prefer the KISS principle. I am guessing these programs employ sync operations which would ostensibly find corruption as well... but again, I like to KISS. No fancy software where it's not necessary. Should I be re-thinking this?
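For what it's worth, a KISS-compatible alternative to dedicated backup software is a byte-for-byte compare of source vs. backup right after each copy, which the Python standard library can do without any third-party tools. A rough sketch (paths and the function name are hypothetical), verifying that every file in the backup actually matches the source rather than merely having the same size:

```python
import filecmp
import os

def verify_copy(src_root, dst_root):
    """Compare every file under src_root against its counterpart under
    dst_root byte-for-byte (shallow=False), not just by size or timestamp.
    Returns relative paths that are missing from or differ in the backup."""
    problems = []
    for dirpath, _, filenames in os.walk(src_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), src_root)
            src = os.path.join(src_root, rel)
            dst = os.path.join(dst_root, rel)
            if not os.path.exists(dst):
                problems.append(rel + " (missing)")
            elif not filecmp.cmp(src, dst, shallow=False):
                problems.append(rel + " (contents differ)")
    return problems
```

One caveat: this only proves the two copies match each other at copy time; it can't tell you which side is bad if they later diverge, which is why a stored checksum manifest is the stronger check.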
One of the very few advantages optical had over HDD was that I could not forcibly overwrite anything archived on a write-once medium. That said, I do not think this one advantage alone is enough to draw me back to optical archiving.
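A hard disk can't be made physically write-once, but the accidental-overwrite risk can be reduced by clearing the write permission on archived files, so an ordinary copy that would clobber them fails loudly instead of silently. A small sketch under that assumption (on Windows this maps to the read-only file attribute):

```python
import os
import stat

def make_read_only(root):
    """Clear all write-permission bits on every file under root, so ordinary
    copy/overwrite operations fail instead of silently replacing archive data."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            os.chmod(path, mode & ~stat.S_IWRITE & ~stat.S_IWGRP & ~stat.S_IWOTH)
```

It's not real write-once protection (the bit can be flipped back), but it turns a silent overwrite into a visible error.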