
how does hashing work?

dpopiz

Diamond Member
as I understand it, a "hash" of a file is a very small bit of data that basically can represent an entire huge file, and can be used to tell whether or not two files are identical.

but how can you tell if two files are *identical* if all you have is the tiny little bit of hash data?

I must be missing something....
 
Most of the time you use an algorithm that behaves almost as if it were one-to-one: any given input always produces the same output, and in practice that output is unique to the input. It can't be strictly one-to-one, because the hash is far smaller than the files it represents.

If your hashing function satisfies that, then two different inputs are extremely unlikely to have the same output, so IF two inputs generate the same output you can treat them as identical.
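
To make the comparison concrete, here is a minimal Python sketch. It assumes SHA-256 from the standard `hashlib` module as the hash (the thread hasn't named one at this point), and `file_digest_hex` is a hypothetical helper name. The in-memory streams stand in for real files opened in binary mode.

```python
import hashlib
import io

def file_digest_hex(stream, chunk_size=65536):
    """Hash a binary stream chunk by chunk (hypothetical helper name)."""
    h = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)  # every byte of the input feeds into the digest
    return h.hexdigest()

# In-memory stand-ins for a file, an exact copy, and a copy with one byte changed.
original = io.BytesIO(b"hello world" * 1000)
copy = io.BytesIO(b"hello world" * 1000)
altered = io.BytesIO(b"hello world" * 999 + b"hello worle")

d_original = file_digest_hex(original)
d_copy = file_digest_hex(copy)
d_altered = file_digest_hex(altered)

print(d_original == d_copy)     # identical bytes give identical digests
print(d_original == d_altered)  # the altered copy gives a different digest
```

Only the short hex strings need to be compared, which is why two people can check that they hold the same file without exchanging the file itself.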
 
Originally posted by: dpopiz
as I understand it, a "hash" of a file is a very small bit of data that basically can represent an entire huge file, and can be used to tell whether or not two files are identical.

but how can you tell if two files are *identical* if all you have is the tiny little bit of hash data?

I must be missing something....

Right, but that tiny piece of information is derived from the entire contents of your file through a very complex mathematical function. So even if your file changes by a single bit, it will produce a completely different hash. But ultimately, it uses all of the information in the file to derive the hash, not just a tiny slice of it.
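
The single-bit claim is easy to check. The sketch below assumes SHA-256 from Python's `hashlib` as the hash function; `bit_difference` is a hypothetical helper that counts how many bits differ between two byte strings.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

data = bytearray(b"The quick brown fox jumps over the lazy dog")
flipped = bytearray(data)
flipped[0] ^= 0x01  # flip the lowest bit of the first byte

h1 = hashlib.sha256(bytes(data)).digest()
h2 = hashlib.sha256(bytes(flipped)).digest()

changed_bits = bit_difference(h1, h2)
print(f"input bits changed:  {bit_difference(bytes(data), bytes(flipped))}")
print(f"digest bits changed: {changed_bits} of {len(h1) * 8}")
```

For a well-designed hash you would expect roughly half of the digest bits to change, even though only one input bit did.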

 
1. ok, so why can't they be used for super-fantastic-unbelievable compression? if it's a one-to-one function, then couldn't you just take a hash and send it through the inverse of that function to get the original data?

2. how is a hash a "security feature"? I mean, as I understand it, this is how they're used for security:
- you want to send a file to somebody else and make sure it doesn't get tampered with along the way. so you make a hash of the file and send it off to the person along with the original file.
- along the way, somebody tampers with the original file. they also make a new hash of the file after they tampered with it and send that along with it instead of the hash of the original file.
- so the person you sent this file to receives a tampered version of it along with the "tampered hash" and thinks it hasn't been tampered with
so what's the point?
 
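1. You design your hash function so that it isn't reversible. The function has to throw information away. Take a look at MD5: it always produces a 128-bit hash no matter how large the input is, so many different inputs necessarily share each hash value. Given only the hash, you cannot determine what the input was.

2. Send the person the file, have them run the same hash on it, and have them return the hash they get. You then check it against the hash you computed yourself to see if it's valid.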
 
1. ok, so why can't they be used for super-fantastic-unbelievable compression? if it's a one-to-one function, then couldn't you just take a hash and send it through the inverse of that function to get the original data?

By definition hashes are non-reversible.
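
A quick way to see why there is nothing to invert is to look at output sizes. This is a small sketch using MD5 from Python's `hashlib`, since MD5 is the hash named in the thread; the sample inputs are arbitrary.

```python
import hashlib

# Inputs of very different sizes: empty, one byte, one million bytes.
inputs = [b"", b"a", b"a" * 1_000_000]

digest_sizes = [len(hashlib.md5(data).digest()) for data in inputs]
print(digest_sizes)  # every MD5 digest is 16 bytes (128 bits)

# Counting argument: there are 256**17 possible 17-byte inputs but only
# 256**16 possible 16-byte digests, so some inputs must share a digest.
possible_inputs = 256 ** 17
possible_digests = 256 ** 16
print(possible_inputs // possible_digests)  # average 17-byte inputs per digest
```

Since many inputs map to each digest, the digest alone cannot say which input produced it, and that rules out using it as compression.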

2. how is a hash a "security feature"? I mean, as I understand it, this is how they're used for security:

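It's used in other places besides verifying file integrity. For instance, most Linux systems store MD5 hashes of the passwords instead of the passwords themselves, so a login is checked by hashing what was typed and comparing the result.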
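
Here is a rough sketch of the password case. It is not how any particular Linux system does it: it swaps in PBKDF2-HMAC-SHA256 from Python's `hashlib` in place of MD5, and the function names, salt size, and iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a fixed-size hash from a password and a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the attempt and compare it with the stored hash."""
    return hmac.compare_digest(hash_password(password, salt), stored)

salt = os.urandom(16)                         # stored next to the hash
stored = hash_password("correct horse", salt)  # only this is kept, not the password

print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

The system never needs to reverse the hash. It only needs to hash the login attempt the same way and see whether the results match.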

- you want to send a file to somebody else and make sure it doesn't get tampered with along the way. so you make a hash of the file and send it off to the person along with the original file.

You made the fatal mistake of sending the verification along with the package; that's just asking for trouble. If you really want to do that, you need to do something extra, like signing the hash so it can be verified as good too. Or you just point them to a site with the hash or, as MCrusty said, have them send you their hash and you verify it.
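
One way to sketch the "something extra" in Python is a keyed hash (HMAC). This is a simpler stand-in for a real public-key signature, and it assumes sender and receiver already share a secret key that the attacker doesn't have. The key value and function names below are made up for illustration.

```python
import hashlib
import hmac

shared_key = b"example-shared-secret"  # assumption: agreed on out of band

def tag_for(data: bytes, key: bytes) -> str:
    """Keyed hash of the data; it can't be recomputed without the key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def is_authentic(data: bytes, tag: str, key: bytes) -> bool:
    return hmac.compare_digest(tag_for(data, key), tag)

original = b"contents of the file"
tag = tag_for(original, shared_key)

# The attacker changes the file and tries to make a matching tag without the key.
tampered = b"contents of the file, modified"
forged_tag = tag_for(tampered, b"attacker-guess")

print(is_authentic(original, tag, shared_key))         # untouched file and tag
print(is_authentic(tampered, tag, shared_key))         # tampered file, old tag
print(is_authentic(tampered, forged_tag, shared_key))  # tampered file, forged tag
```

With a plain hash the attacker can always recompute a matching value. With a keyed hash or a signature they can't, which is what makes sending it along with the file safe.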
 
Originally posted by: dpopiz
1. ok, so why can't they be used for super-fantastic-unbelievable compression? if it's a one-to-one function, then couldn't you just take a hash and send it through the inverse of that function to get the original data?

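It's not exactly one-to-one; it's almost one-to-one. Because the hash is much smaller than the data, different inputs can produce the same hash, so there is no inverse function to run. The chance of such a hash collision, while remote, exists.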
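
Collisions are easy to see once the hash is made small enough. The sketch below uses a deliberately weak hash, the first 2 bytes of SHA-256. That choice is an assumption made purely for demonstration, since finding a collision in a full-size hash this way isn't practical.

```python
import hashlib

def tiny_hash(data: bytes) -> bytes:
    """First 2 bytes of SHA-256: a deliberately weak 16-bit hash."""
    return hashlib.sha256(data).digest()[:2]

def find_collision():
    """Hash "0", "1", "2", ... until two inputs share a tiny hash."""
    seen = {}
    i = 0
    while True:
        data = str(i).encode("ascii")
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data
        seen[h] = data
        i += 1

a, b = find_collision()
print(a, b, tiny_hash(a).hex())
```

With only 65,536 possible 16-bit values, the loop is guaranteed to stop within 65,537 inputs. A real hash works the same way, just with so many possible values that an accidental collision is remote.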

 