Which hash algorithm samples the most data?

TheDarkKnight

Senior member
This question has arisen from a need to verify the integrity of some of my DVDs using software that relies on the CRC stored on a CD/DVD disc.

Which hash method samples the most data that it represents...

CRC-32, MD5, or SHA-1?

I am curious: given a hash generated from a regular-size DVD .ISO file of ~4.38 GiB, what percentage of all those bytes would the algorithm use to generate the final hash?
 
All of them should be hashing the entire contents. The only difference is the length of the resultant output, and therefore the probability of collision. If you're checking what should be an identical file, then any errors will result in a mismatched hash.
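To make the "entire contents" point concrete, here is a minimal Python sketch that streams a file through CRC-32, MD5, and SHA-1 in one pass and counts the bytes each algorithm consumed. The function name `file_digests` and the 1 MiB chunk size are illustrative assumptions, not something from the thread; it uses only the standard `zlib` and `hashlib` modules.

```python
import hashlib
import zlib


def file_digests(path, chunk_size=1 << 20):
    """Return (crc32_hex, md5_hex, sha1_hex, bytes_read) for the file at path.

    Every chunk read from the file is fed to all three algorithms,
    so each digest depends on 100% of the file's bytes.
    """
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
            md5.update(chunk)
            sha1.update(chunk)
            total += len(chunk)
    crc_hex = format(crc & 0xFFFFFFFF, "08x")
    return crc_hex, md5.hexdigest(), sha1.hexdigest(), total
```

Calling it on a hypothetical image, e.g. `file_digests("disc.iso")`, should report a byte count equal to the file size, which is the "percentage sampled" the question asks about: all of it.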
 
CRC and MD5 are usually used for speed; SHA is slower but less likely to produce a collision.
They all use 100% of the bytes, but it is possible to get two different files with the same hash; it is just rare.
Generally, the larger the hash, the lower the chance of a collision, e.g. 32 bits, 256 bits, 512 bits, etc.
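As a rough illustration of why a longer hash means fewer accidental matches, the sketch below assumes that digests of corrupted files are uniformly distributed, so a random corruption slips past an n-bit digest with probability of about 1 in 2^n. That uniformity is a simplifying assumption (it says nothing about deliberately crafted collisions), and the helper name `false_match_probability` is made up for this example.

```python
from fractions import Fraction

# Output sizes in bits for the three algorithms discussed in the thread.
DIGEST_BITS = {"CRC-32": 32, "MD5": 128, "SHA-1": 160}


def false_match_probability(bits):
    """Chance that a randomly corrupted file still matches an n-bit digest,
    assuming digests are uniformly distributed over all 2**bits values."""
    return Fraction(1, 2 ** bits)


if __name__ == "__main__":
    for name, bits in DIGEST_BITS.items():
        p = false_match_probability(bits)
        print(f"{name}: 1 in 2^{bits} (about {float(p):.3g})")
```

Even the 32-bit case works out to roughly one undetected error in four billion random corruptions, which is why any of the three is adequate for spotting accidental damage.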
 
Don't worry about it; just use whichever is easier. They will all detect a problem on the DVDs perfectly well. The chance of a hash collision is just too tiny to matter at this size.
 
Actually, the data on a DVD is protected against errors by erasure codes.

CDs and DVDs use Reed-Solomon codes. You can read about them on Wikipedia.
 
Eh? Erasure codes? You mean ECC (error-correcting code). While it is true that the hardware does have ECC ability, both DVDs and CDs can indeed develop errors that make them worthless; there is only so much that ECC can "fix".

Using PAR2 recovery volumes is an excellent way to add an extra layer of protection so that you can recover your data. You just need to have at least as many recovery blocks available as there are damaged blocks in the original file(s).
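The idea of rebuilding damaged data from extra recovery data can be sketched with the simplest possible erasure code: a single XOR parity block. This is a toy illustration only. It is not Reed-Solomon and not the PAR2 format, and it can rebuild exactly one missing block, whereas PAR2 can rebuild as many blocks as it has recovery blocks. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Build one recovery block: the XOR of all data blocks."""
    return xor_blocks(data_blocks)


def recover_missing(blocks, parity):
    """Rebuild the single block marked as None, using the parity block.

    The position of the damaged block must be known (an "erasure"),
    e.g. because a per-block checksum failed.
    """
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("one parity block can rebuild exactly one missing block")
    survivors = [b for b in blocks if b is not None]
    repaired = list(blocks)
    # XOR of all surviving blocks and the parity yields the lost block.
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = make_parity(data)
    damaged = [data[0], None, data[2]]  # second block is unreadable
    print(recover_missing(damaged, parity) == data)
```

With two blocks lost the single parity block is not enough, which mirrors the rule above: the amount of recovery data has to cover the amount of damage.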
 
Reed-Solomon is an erasure code, a form of forward error correction. And an erasure code is an error-correcting code. So do not tell me what I mean...
The Parchive erasure code might not be the most effective.
To be 100% sure that you can recover the data, you need to use a code with maximum Hamming distance.
However, the question is formulated in an unspecific way. A hash function is a one-way function, not meant to hold any additional information.

-SG
 
The Parchive erasure code might not be the most effective.
-SG

Correct.

Parchive version 2 (.par2) has a minor bug in the specification which can, under some circumstances, give suboptimal protection. There are also a number of PAR2 software packages which have additional bugs (in particular, there is one package with a major bug in construction of the Reed-Solomon codes resulting in readable and valid files, but with virtually no redundancy).

There is a proposed .par3 standard, but only one beta software package (MultiPar) uses it, as its author wrote the .par3 proposal. This algorithm is mathematically optimal, as far as is known; it is also computationally far more efficient than .par2, so it can be an order of magnitude faster to calculate for large datasets. The risk with .par3 is that the standard has not been decided upon, and therefore the file format is subject to change. If you choose to use this, make sure you keep a copy of the exact software you used to create the files, in case future versions cannot read them.
 