Advice on data corruption problem

Jhkh

Junior Member
Jan 2, 2013
18
0
66
The solutions provided suit my needs. Problem SOLVED.

Hi, here I am again, to decipher the obscure mysteries of computer tech!
I ran out of ideas for this one, but it seems it’s solved (it took almost 5 whole days ^^).


Anyway I want your advice on the matter.


This was mostly written before I found the cause, so it’s very detailed. If you don’t want to read the whole post, jump straight to the conclusion.
Be prepared, it’s a long ride!

Let me explain: I own an Acer Aspire V3-772G (laptop), equipped with the following:
-Core i7-4702MQ 2.2GHz turbo 3.2GHz
-Nvidia GeForce GTX 760M
-2x 8GB Kingston PC3-12800 DDR3 800MHz
-SSD Toshiba mSATA 120GB
-HDD Toshiba 2.5” 1TB
-Combo DVD burner/BD player
-Full HD matte display


I use it mostly for photo manipulation and video encoding with x264 and Avisynth, so the processor is often stressed at 100% for a long time. The “disks” have more than 6000 power-on hours behind them.

The PC was bought in August 2013, and had its motherboard and power supply replaced by the store in February 2014 after a complete failure.
I then used it with no problems for a while.


A month ago I decided to install Windows 10, but ended up reinstalling Windows 8.1 (clean install) because I prefer W8 for its better TrueType rendering (Lol).


On this occasion, I copied my external drive back to the internal one (which I had formatted to get rid of the manufacturer's rescue partitions). Note that I use FreeFileSync to “mirror” the files to/from my external hard drive.


Then I started to notice “holes” in some video frames from files I kept:
http://i.imgur.com/5WMiXuT.png
http://i.imgur.com/c6JP3G7.png


There were also “51 disk” errors (“An error was detected on device \Device\Harddisk2\DR2 during a paging operation”) in Event Viewer, targeting my external drive. (Ask if you want the hex details, but they are random.) See here: http://i.imgur.com/d3HmqTj.png


sfc /scannow reported uncorrectable errors, which I fixed with DISM followed by another sfc /scannow.



THE TESTS


Did a full AV scan (Avira) and a full Malwarebytes scan: nothing found.


I took the internal drive out, put it in a dock on another computer, and compared CRCs across the whole disks (external versus internal) with FreeFileSync. About 20 files turned out to be corrupted, mostly video files (mp4, mkv) but also some ISO files I kept. It seems to randomly affect anything bigger than 100MB.
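
For anyone wanting to repeat that kind of whole-disk comparison without a sync tool, here is a rough sketch in Python. It is not how FreeFileSync works internally; the helper names `file_digest` and `find_mismatches` are made up for illustration, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(reference_root, copy_root):
    """List files under reference_root that are missing or differ under copy_root.

    Returns (relative_path, reason) tuples sorted by path.
    Hypothetical helper, for illustration only.
    """
    reference_root = Path(reference_root)
    copy_root = Path(copy_root)
    problems = []
    for source in sorted(reference_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(reference_root)
        target = copy_root / relative
        if not target.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif file_digest(source) != file_digest(target):
            problems.append((relative.as_posix(), "content differs"))
    return problems
```

Calling `find_mismatches("E:/", "D:/")` (drive letters are placeholders) would return the list of suspect files. As in the post, it is best run on a machine known to be good, since a faulty machine can corrupt the data while it is being read for the check.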


On the laptop, I took two big video files (15GB for both) from another backup made a long time ago, which didn’t have holes in the frames, and copied them over and over:
-between 2 external hard drives (3.5” with external power supply)
-between the SSD and the internal HDD
-between SSD and external drives (both of them)
-between internal HDD and SSD


Always random corruption on the bigger of the two files, even with Teracopy.

Did the same on my other computer numerous times with no problem at all. So I’m sure the external drives are good; I even did a complete surface check with MiniTool Partition Wizard on one of them (the one that showed the 51 errors, which didn’t show up on my other computer when copying to it).


I ran 4 passes of MemTest86 on the memory (12 hours) without a single error.

I then booted from a live Ubuntu DVD and ran the following tests more than 3 times each (checked with md5sum; I had verified the reference files beforehand):

-USB 3.0 stick to external HDD 1 (USB 3)
-USB 3.0 stick to external HDD 2
-transfer between the two external HDDs
-transfer from SSD to external
-changed controllers (all drives on USB 2, then on USB 3)
No problem with the USB stick and the USB 3 drive (both are very fast).
Second file always corrupted on the USB 2.0 drive (the slow one).
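
The copy-then-md5sum routine above is easy to script if you need to repeat it many times. A minimal sketch, assuming Python is available on the live system; `md5_of` and `stress_copy` are hypothetical names, and MD5 is used only because the tests above used md5sum.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stress_copy(source, target_dir, rounds=3):
    """Copy source into target_dir `rounds` times and hash each copy.

    Returns the list of round numbers whose copy did not match the source.
    Each copy is deleted after it has been checked.
    """
    source = Path(source)
    target_dir = Path(target_dir)
    expected = md5_of(source)
    failed_rounds = []
    for round_number in range(1, rounds + 1):
        target = target_dir / f"{source.name}.round{round_number}"
        shutil.copyfile(source, target)
        if md5_of(target) != expected:
            failed_rounds.append(round_number)
        target.unlink()
    return failed_rounds
```

An empty list means every round matched. One caveat: right after a copy, the read-back may be served from the operating system's cache rather than from the drive, so for a strict test of the drive itself it is safer to unmount and remount it before hashing.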

Did a new test to be sure on my other computer (Windows 10), four times between my external drives, no problem.

On the laptop with the Ubuntu live DVD, I tested copying with:
-One RAM stick at a time (one, then the other): no problem
-Both sticks again: corruption
-Removed the BIOS (UEFI) battery: corruption remains
-Moved the sticks to the free slots (still dual channel): no problem
-Tested with another power supply (memory back in the original slots): corruption again

ADDITIONAL TESTS
-Here are CPUID results : http://i.imgur.com/pZRgTNR.png
-Cleaned slots with compressed air and a gentle brush: NO MORE PROBLEM!!!

CONCLUSION :
It must have been a dirty DIMM slot.

Big question: as file integrity is my priority, HOW should I avoid such a situation in the future, apart from using Teracopy and always checking the CRC of the files? (That's quite a hassle; I want reliable hardware, not hardware that corrupts even the backups.)


Should I use the empty slot pair to avoid the problem coming back?


This has been the trickiest problem I have ever stumbled upon, as it can’t be detected with standard tools and happens only with certain big files (about 20 out of more than 700GB of data).


Thanks for reading, and let me know what you think.
 

ksec

Senior member
Mar 5, 2010
420
117
116
Same problem here. But I think you should add a TL;DR at the start of your post.

I need simple storage devices that preserve file integrity. The larger the files and hard drives get, the higher the chance of things getting corrupted, especially with video and photos.

Another question: is there any way or tool to fix these corrupted photos / videos?
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Jhkh said:
Big question: as file integrity is my priority, HOW should I avoid such a situation in the future, apart from using Teracopy and always checking the CRC of the files? (That's quite a hassle; I want reliable hardware, not hardware that corrupts even the backups.)
Buy and use a PC that has ECC RAM (which generally entails moving to a desktop), then store and back up your photos on ReFS volumes. There's not much of a market for this sort of thing, because most DIMMs will perform well enough basically forever, and too few people have gotten bitten, so you really have to buy a workstation or tower server, or build a desktop with a server/workstation motherboard and a CPU that supports ECC (however, there are a few ECC-supporting notebook CPUs, so there's no reason it can't come to be). The situation generally sucks if you like carrying a PC around.

But, as a first step, use ReFS on Windows (via Storage Spaces, in 8 or newer) for data drives. All data on the drive is checksummed, and checked against the checksum when read. Anything corrupted during any kind of editing, or by a transfer process that transforms the data format, will not be detected, but you should have protection against silent corruption while data is not being used, or while copying from one ReFS volume to another.
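
On drives that can't use ReFS, a rough user-level stand-in for the "checksum on write, check on read" idea is a manifest of hashes kept alongside the backups. The sketch below is only that, a sketch: it is not how ReFS works internally, and `build_manifest`, `verify_manifest` and the JSON layout are assumptions made up for illustration.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, manifest_path):
    """Record a digest for every file under root in a JSON manifest."""
    root = Path(root)
    manifest = {
        item.relative_to(root).as_posix(): sha256_of(item)
        for item in sorted(root.rglob("*"))
        if item.is_file()
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def verify_manifest(root, manifest_path):
    """Return relative paths that are missing or no longer match the manifest."""
    root = Path(root)
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    damaged = []
    for relative, expected in manifest.items():
        candidate = root / relative
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(relative)
    return damaged
```

Build the manifest once while the files are known to be good, keep it outside the folder being hashed, and run `verify_manifest` before trusting or overwriting a backup. Unlike a checksumming filesystem, this only catches corruption when you actually run the check, and it treats any legitimately edited file as damaged until the manifest is rebuilt.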

Also, check the SMART values on the HDD (get the portable/zip CrystalDiskInfo, for a basic easy program), to be sure it's not failing.
 

Jhkh

ksec said:
Another question: is there any way or tool to fix these corrupted photos / videos?
I think it's too late in such a situation, especially because I relied too much on the "backup".

Cerb said:
Buy and use a PC that has ECC RAM (which generally entails moving to a desktop), then store and backup your photos on ReFS volumes.
Thank you for the advice, I didn't know ReFS actually existed.

I will consider buying a desktop with ECC RAM, and maybe a Proliant Gen8 for the backups.
Cerb said:
Also, check the SMART values on the HDD
Done. Nothing wrong there. The problem has not appeared again; I assume dust released during paint work in my house caused it.
 

Cerb

Jhkh said:
Thank you for the advice, I didn't know ReFS actually existed.
The same guy that decided how Windows 8's desktop and phone UI merging would be done was also in charge of developing ReFS as a Windows feature/product. For all the technical excellence MS can manage, they consistently seem to find ways to trip over themselves when it comes to management and marketing. Their own efforts to market and document it, as Storage Spaces, took something that's mildly complicated compared to the traditional structure (fully separated volume management, RAID, and file system, versus all combined) and made it seem awfully complex and convoluted.

It's basically their attempt to come up with something that can compete with ZFS (on Windows Server OSes, it has a lot more features to match, as well). You don't get recovery from data corruption without RAID (on the desktop), but anything more than a mirror (RAID 1) or parity array (RAID 5) on the desktop implementation is tricky.
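
Why a checksum alone only detects corruption, while a checksum plus a second copy can repair it, can be shown with a toy model. This is purely an illustration of the principle, not the actual ReFS or Storage Spaces logic; `read_with_repair` is a made-up name.

```python
import hashlib


def checksum(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def read_with_repair(copies, expected):
    """Return (data, repaired_indexes) using the first copy that matches `expected`.

    `copies` is a mutable list of bytes objects, one per mirrored drive.
    Bad copies are overwritten with the good one. Raises ValueError when no
    copy matches: the corruption is detected but cannot be repaired.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise ValueError("corruption detected, no good copy left")
    repaired = []
    for index, data in enumerate(copies):
        if data != good:
            copies[index] = good
            repaired.append(index)
    return good, repaired


original = b"frame data"
stored_checksum = checksum(original)

# Two-way mirror with one silently corrupted copy: detected and repaired.
mirror = [b"frame dat\x00", original]
data, repaired = read_with_repair(mirror, stored_checksum)
print(data == original, repaired)
```

With a single copy (`[b"frame dat\x00"]`) the same call raises `ValueError`: the checksum says the data is bad, but there is nothing to restore it from except a backup. A plain mirror without checksums has the opposite problem: it has two copies but no way of knowing which one is right.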

Jhkh said:
and maybe a Proliant Gen8 for the backups.
Backups can be done with and on just about anything. If you need to span across drives, though, a dedicated NAS, which a Windows PC will work very well as, may be convenient.
 

John Connor

Lifer
Nov 30, 2012
22,757
618
121
I didn't read the whole thing, but it sounds like you could benefit from RAID 1.

I've seen HDDs with a CRC thing, so look at the specs when buying a new HDD.
 

Cerb

John Connor said:
I've seen HDDs with a CRC thing, so look at the specs when buying a new HDD.
Only a full SAS chain can give you such extra error-checking, compared to a modern filesystem, AFAIK.

Agreed about RAID 1 (for Storage Spaces, a mirror), but that should be a much lower priority. If the OP wants to set up a NAS (with GbE and/or AC WiFi, performance isn't a big issue, even for photos and videos, as long as parity-based arrays are avoided), definitely use RAID 1 or 10 on that.

RAID 1 (and 10) will do a great job of protecting against downtime, including data getting out of sync, if one HDD dies. It can also protect against *non-silent errors*, like a bad HDD sector on one HDD, or a bad write that wasn't detected during the write, but then gets found when read. With a single drive, you can get an error (and then usually have to Google it, because it will probably be cryptic, when it actually pops up :)), but then have to go to your backups for good data.

Another thing to consider is rotating backups, with some rather old ones, so that in the time it takes to realize there was a problem, file versions that are not corrupted may still exist. That is also a good way to keep data protected against ransomware.
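
One way to picture such a rotation is a retention rule that keeps every recent backup plus one per month further back. The sketch below assumes 7 daily and 6 monthly copies, numbers picked only for illustration; `backups_to_keep` is a hypothetical helper, not a feature of any particular backup tool.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily=7, monthly=6):
    """Pick which backup dates to retain.

    Keeps every backup from the last `daily` days, plus the oldest backup
    of each of the `monthly` most recent calendar months that have one.
    Dates later than `today` are ignored.
    """
    past = sorted(d for d in backup_dates if d <= today)
    keep = {d for d in past if today - d < timedelta(days=daily)}
    first_of_month = {}
    for d in past:
        # setdefault keeps the earliest date seen for each (year, month)
        first_of_month.setdefault((d.year, d.month), d)
    for key in sorted(first_of_month, reverse=True)[:monthly]:
        keep.add(first_of_month[key])
    return sorted(keep)
```

With daily backups, this leaves a week of recent copies and roughly half a year of monthly ones, so a file that was silently corrupted a few weeks ago still has a good version somewhere. Everything not returned by the function is a candidate for deletion.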
 

Jhkh

I will definitely go for an ECC-capable desktop workstation (ReFS on the storage drive), and a NAS/server with a RAID 1 configuration for backups, as it ensures the disks are still readable if the RAID controller fails (never forget that eventuality ^_^).
Cerb said:
Another thing to consider is rotating backups, with some rather old ones, so that in the time it takes to realize there was a problem, file versions that are not corrupted may still exist.
A second copy from time to time on a regular USB drive seems like a good idea, for many reasons (power surge, etc.).

Cerb said:
Only a full SAS chain can give you such extra error-checking, compared to a modern filesystem, AFAIK.
Sure, I own an outdated server with full hardware SAS. I don't use it anymore because of the power draw and small HDD capacities.

I may be a bit tech savvy, but I still need others' experience ;).
I now have all the keys for choosing the best hardware solution for my needs. Thanks a lot!
 

ksec

Cerb said:
Only a full SAS chain can give you such extra error-checking, compared to a modern filesystem, AFAIK.

Agreed about RAID 1 (for Storage Spaces, a mirror), but that should be a much lower priority. If the OP wants to set up a NAS (with GbE and/or AC WiFi, performance isn't a big issue, even for photos and videos, as long as parity-based arrays are avoided), definitely use RAID 1 or 10 on that.

RAID 1 (and 10) will do a great job of protecting against downtime, including data getting out of sync, if one HDD dies. It can also protect against *non-silent errors*, like a bad HDD sector on one HDD, or a bad write that wasn't detected during the write, but then gets found when read. With a single drive, you can get an error (and then usually have to Google it, because it will probably be cryptic, when it actually pops up :)), but then have to go to your backups for good data.

Another thing to consider is rotating backups, with some rather old ones, so that in the time it takes to realize there was a problem, file versions that are not corrupted may still exist. That is also a good way to keep data protected against ransomware.

I am already using RAID 1 on my Synology NAS. RAID 1, as far as I know, only protects you from disk failure. File corruption is an entirely different thing. While it is nice to know about ReFS, it is only a Windows solution.

I wish there were consumer NAS units that offered some file corruption protection. After all, with 2 - 3TB of files, even a small % would make quite a difference.
 

Cerb

ksec said:
I am already using RAID 1 on my Synology NAS. RAID 1, as far as I know, only protects you from disk failure. File corruption is an entirely different thing. While it is nice to know about ReFS, it is only a Windows solution.

I wish there were consumer NAS units that offered some file corruption protection. After all, with 2 - 3TB of files, even a small % would make quite a difference.
No disagreement. BTRFS works well enough for a simple RAID 1, but still really requires *n*x familiarity; and ZFS is a rabbit hole to jump down into. Even not using turnkey, you've got to invest some time to learn about it and manage it, right now.
 

ksec

Cerb said:
No disagreement. BTRFS works well enough for a simple RAID 1, but still really requires *n*x familiarity; and ZFS is a rabbit hole to jump down into. Even not using turnkey, you've got to invest some time to learn about it and manage it, right now.

Thx for the BTRFS tip. I just learned my NAS (Synology) will get BTRFS in the next DSM upgrade. Not sure if my model supports it, but at least some consumer NAS is finally taking some action against file corruption.
 

Cerb

ksec said:
Thx for the BTRFS tip. I just learned my NAS (Synology) will get BTRFS in the next DSM upgrade. Not sure if my model supports it, but at least some consumer NAS is finally taking some action against file corruption.
Cool. I already consider them the best consumer/low-end business turnkey NASes. Nice to know they are working to maintain their reputation.
 

Jhkh

ksec said:
I wish there were consumer NAS units that offered some file corruption protection. After all, with 2 - 3TB of files, even a small % would make quite a difference.
I think current mainstream filesystems (NTFS, I'm looking at you) are just not good enough for large hard drives such as the 4TB one I own.
This may become even more of a problem in the near future. When enough consumers are affected, the switchover will eventually start.
Cerb said:
BTRFS works well enough for a simple RAID 1
ksec said:
I just learned my NAS ( Synology ) will get BTRFS in the next DSM upgrade
How nice is that, it is exactly what I need! Glad I postponed the purchase of the NAS; Synology is what I'll buy, for that very reason.