One-way Hashing

kpb

Senior member
Oct 18, 2001
252
0
0
I was thinking about things and started wondering how these hashes work.

They seem to be used pretty commonly and can operate pretty quickly even on fairly low end hardware yet they seem to be considered all but uncrackable. I'm assuming that any given input always has to produce the same out and that output should be unique to that input. Yet computing the other way from the output back to the input is either impossible or far to time consuming to be practical.

I tried to search some but only found very vage descriptions that basically said thats what it does or very specific things like the source for md5 hashes.

With out getting into specific implimentations how exactly do these formula's work give that they have to be able to calculate quickly even on modest hardware yet be able to stand up to hour or even days of cracking on much more capable machines or multiple machines? How are they so much harder to resolve in reverse?
 

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
The key concept in one-way functions is that some stages of the computation throw away data. In other words, there are far fewer outputs than there are inputs.

A typical secure hash like MD5 has a 128 bit output, yet the input can be of arbitary length. There are inumerable different inputs that will produce the same output - the output alone doesn't give you enough information to compute the input.

Try this simple program - it's a trivially simple one-way function. You can't compute the input given only the output.

for i=0 to Length
hash = hash + data(i)
next

It's not suitable for a cryptographic hash, because changing the input leads to an easily predictable change in the output. If you wanted to find another input with the same hash as an existing input, it would be quite easy.

Secure hashing needs more steps designed so that a small change has a profound effect on the hash, and more importantly that change is highly dependent on the previous value of the hash. The algorithm can be as complex as you want - you could encrypt your file using a highly-secure encryption system and then use the last few bytes of the encrypted output as the hash. Most useful hashes are designed to be as fast as possible, they use operations that are as simple as possible. MD5 only needs addition, XOR and bit-shifting - the trick with MD5 is manipulating the data several times at each stage, so that each bit in the input, has the potential to affect each and every bit in the output.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Another nice property of a good hash like MD5 is that the odds of two "short" strings producing the same hash is very low. It can be proved that an algorithm will never produce a "collision" (two inputs producing same output) for any possible string under some length.

Hashing isn't really useful as encryption, because you can't reverse it. Asymmetric key encryption (RSA, for example) is much more interesting :).
 

idgaf13

Senior member
Oct 31, 2000
453
0
0
could a Hash be used to identify a user of a P2P ,such as bit torrent,so that over the life of a torrent file
that a user could be credited with sharing ?
current methods do not allow a user to be identified as seeding a file if they sign off after achieiving a total download. They may come back and seed later but are only reported as a seed and not distinguished as seeding a particular file.
many users get irritatted when people abandon them after completion but many peoplecome back later to help reseed but are branded as bad because they left to soon after completing.
 

V00D00

Golden Member
May 25, 2003
1,834
0
0
Originally posted by: idgaf13
could a Hash be used to identify a user of a P2P ,such as bit torrent,so that over the life of a torrent file
that a user could be credited with sharing ?
current methods do not allow a user to be identified as seeding a file if they sign off after achieiving a total download. They may come back and seed later but are only reported as a seed and not distinguished as seeding a particular file.
many users get irritatted when people abandon them after completion but many peoplecome back later to help reseed but are branded as bad because they left to soon after completing.

That's what pretty much all P2P networks do... kind of. They take hashes of the files, so if 2 people have the same file, they share the same hash, if one person goes offline the hashes match up between the 2 files, so it can resume from the other guy.

This is also how the RIAA is making P2P networks unreliable. They are faking the hashes for dummy files. When you download part of the song from an RIAA bot off there you're getting garbage that corrupts the file, but to the P2P network the hash is the same, so it thinks it's sending you the right thing. Only takes 1 RIAA bot to cripple a download that could have 100 other users with the actual file.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: V00D00
Originally posted by: idgaf13
could a Hash be used to identify a user of a P2P ,such as bit torrent,so that over the life of a torrent file
that a user could be credited with sharing ?
current methods do not allow a user to be identified as seeding a file if they sign off after achieiving a total download. They may come back and seed later but are only reported as a seed and not distinguished as seeding a particular file.
many users get irritatted when people abandon them after completion but many peoplecome back later to help reseed but are branded as bad because they left to soon after completing.

That's what pretty much all P2P networks do... kind of. They take hashes of the files, so if 2 people have the same file, they share the same hash, if one person goes offline the hashes match up between the 2 files, so it can resume from the other guy.

This is also how the RIAA is making P2P networks unreliable. They are faking the hashes for dummy files. When you download part of the song from an RIAA bot off there you're getting garbage that corrupts the file, but to the P2P network the hash is the same, so it thinks it's sending you the right thing. Only takes 1 RIAA bot to cripple a download that could have 100 other users with the actual file.


protocols like bittorrent, which breaks a file into chunks and hashes each chunk, are significantly less vulnerable to that type of attack.
 

V00D00

Golden Member
May 25, 2003
1,834
0
0
Originally posted by: CTho9305
protocols like bittorrent, which breaks a file into chunks and hashes each chunk, are significantly less vulnerable to that type of attack.


I didn't know bittorrent works like that. I'm sure it woudl just as easy to apply the same hash-faking ideas to this program. I'm sure it's not that hard. Maybe take a bit more programming, but defineatly possible.

get hash of file chunk -> apply to dummy chunk of equal size
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: V00D00
Originally posted by: CTho9305
protocols like bittorrent, which breaks a file into chunks and hashes each chunk, are significantly less vulnerable to that type of attack.


I didn't know bittorrent works like that. I'm sure it woudl just as easy to apply the same hash-faking ideas to this program. I'm sure it's not that hard. Maybe take a bit more programming, but defineatly possible.

get hash of file chunk -> apply to dummy chunk of equal size

Yes, but if you're downloading, say, a 700MB movie, and the RIAA sends you a block of junk, you download that chunk (chunks are usually between 256kb and 2MB), your client calculates the hash, sees it doesn't match, discards the data, and gets it somewhere else.
 

Rainsford

Lifer
Apr 25, 2001
17,515
0
0
Originally posted by: V00D00
Originally posted by: CTho9305
protocols like bittorrent, which breaks a file into chunks and hashes each chunk, are significantly less vulnerable to that type of attack.


I didn't know bittorrent works like that. I'm sure it woudl just as easy to apply the same hash-faking ideas to this program. I'm sure it's not that hard. Maybe take a bit more programming, but defineatly possible.

get hash of file chunk -> apply to dummy chunk of equal size

It works in P2P networks because, as I understand your explanation, the P2P networks misuse hashing. In Bit Torrent, or any reasonably well designed use of hashing, the way it works is that you give me a file (or a piece of a file) and I get the expected hash of the file or piece from a trusted source. I then perform the hash on the file or piece myself, and if it doesn't match the expected hash value, I know I don't have a good copy.

It sounds like what the P2P networks do is use hashes for unique identification of files, but they screw up by not having you verify the hash is actually the hash of the file you are getting. IE, you trust the source of the file, which is exactly why hashes exist in the first place, because you DON'T trust the source of the file and you want to verify it's the right thing.

Here's an example of how it SHOULD be used, and how it works in bit torrent (as far as I know).

I want chunk 5, which I know has a hash of 11ae (just an example, the real hash is 160 bits)
So I connect to RIAA toadie who sends me what he claims is chunk 5
I apply the hashing function to the "chunk 5" from RIAA toadie and see if its equal to 11ae
If it isn't, I know I got a bad chunk and try to get it from someone else

There is no way to fake this since I know what the output should be when the chunk is run through the hashing function. A dummy chunk will not hash to the same thing. In fact, that's a big thing with hashing functions. It should be very difficult to find another input that hashes to the same output, otherwise making a fake dummy chunk wouldn't be all that hard.
 

V00D00

Golden Member
May 25, 2003
1,834
0
0
Yeah, that would be exactly correct. Kazaa doesn't verify the files using the hashes, just uses them as a way to identify the file. Kind of stupid. Kind of really stupid.
 

AEB

Senior member
Jun 12, 2003
681
0
0
UNIX actually uses a hash type method to store passwords. It encrypts and hashes and stores the hashed "number". It doesnt have to go backwards becasue when it wants to verify if the enterd password matches it just hashes the enterd using the same scheme and key and see if it matches with the one stored. So the password is esentially secure because even if someone did get the password file they would have to run through every possible character combination. Nothing about the lenght is given away either because the algorithm only stores 8 bytes(i think), even if you have a 32 byte password. Very interesting stuff. but yes hashes are mostly used for authentication or signatures.
 

V00D00

Golden Member
May 25, 2003
1,834
0
0
Hashing is also used in the Windows logon authentication. Pretty much any windows actions requiring a password use hashes. It's the only way to verify something without a direct match and revealing the password.