LinkedIn was hacked and the hackers got 6.5M password hashes, about half of which were posted in cleartext (i.e., already "decrypted"; I use quotes because no real decryption actually happened).
What's funny was that just two weeks ago, I was talking about this very same topic, 'how to do password storage right', with my younger brother (I was presenting an overview of a framework, and part of it was the security focus), while we were both basically idling after our older brother's wedding. (Yes, isn't it awesome that there's still time for tech even after a wedding? :awe: )
How LinkedIn Stored Passwords
Basically, LinkedIn was not like reddit (IIRC), which was embarrassingly caught storing passwords in cleartext. Instead, LinkedIn stored the SHA-1 hash of the actual password. As I was telling my brother, using just SHA-1 (and more so MD5) is almost tantamount to using cleartext, except that it makes the developers feel better. Bottom line, your passwords are practically no safer.
So what's the state-of-the-art when it comes to password storage?
1.) Use a per-user salt that is a nonce.
2.) Don't use MD5 or SHA-1
3.) Use bcrypt (BLOWFISH) with an appropriate work factor, or apply key stretching on your preferred hash algorithm (hopefully, you followed #2, and have settled on a SHA-2 variant or its peers).
If LinkedIn had followed these, the Russian hackers would have gotten nothing but a useless bunch of hashes that they'd need an inordinate amount of time to brute-force.
How exactly do the following recommendations actually help?
1. Use a per-user salt that is a nonce.
A salt is simply another string you append to the password before hashing. It should be a cryptographic nonce, not just a “random” value, because being a cryptographic nonce goes a long way to guarantee, with no extra hassle, that your salts will actually be unique for every user, whether you have 10K users, or 600M like Facebook.
Salts are not secret information. They would end up as just another field in your user table, probably beside the password field. What they are meant for is to make pre-computed dictionary attacks obsolete. Even with beefy hardware, having proper salts will make it inefficient for attackers to use a pre-computed dictionary (“rainbow table”). Without a salt, they can afford to spend weeks or months generating their dictionary, and they can use that against all your users. With salts, they have to generate a dictionary for each user. You've just cut their efficiency by 99.9999...% and basically made a dictionary attack obsolete.
This is why 3M passwords seemed to have been “decrypted” very fast and posted in cleartext in the Russian forum. These were simply the passwords that were found in the dictionary. In other words, a look-up table (albeit a gigantic one) defeated the password hashing scheme.
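To make the effect of a per-user salt concrete, here is a minimal Python sketch. The function names (new_salt, salted_digest) are hypothetical, and the salt here is 16 bytes from the OS random generator, which is an assumption standing in for the "nonce" described above. Plain SHA-256 is used only to show what the salt does; by itself it is still too fast for real password storage.

```python
import hashlib
import secrets


def new_salt(num_bytes: int = 16) -> bytes:
    """Return a fresh, unpredictable per-user salt from the OS CSPRNG."""
    return secrets.token_bytes(num_bytes)


def salted_digest(password: str, salt: bytes) -> str:
    """Hash the password with the salt appended, as described in the post."""
    # Illustration only: a single fast hash is not a safe storage scheme.
    return hashlib.sha256(password.encode("utf-8") + salt).hexdigest()


# Two users who happened to pick the same password.
alice_salt, bob_salt = new_salt(), new_salt()
alice_hash = salted_digest("letmein", alice_salt)
bob_hash = salted_digest("letmein", bob_salt)

# Without a salt, both rows would hold this same value, and one
# pre-computed table entry would crack both accounts at once.
unsalted = hashlib.sha256(b"letmein").hexdigest()

print("same password, different stored hashes:", alice_hash != bob_hash)
print("neither matches the unsalted hash:", unsalted not in (alice_hash, bob_hash))
```

The salt is stored next to the hash, so verifying a login is just a matter of recomputing salted_digest with the stored salt and comparing.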
2. Don't use MD5 or SHA-1
In message authentication, MD5 is clearly broken (the last break I've read about involved pencil-and-paper; you can't get any more broken than that), and SHA-1 isn't faring much better. However, you may argue that their weakness and inappropriateness for message authentication have no bearing on their use as a password hash. That's correct. The problem with them (and similarly, with all message digests) is that they were designed to be fast, and speed is the enemy.
For example, using my secondary PC (Phenom II X4, 3.4GHz, 8GB RAM) and a non-optimized "word" generator and MD5 hashing code I drew up in a few minutes, I was able to generate 4M hashes per second (and random words for each, using the 92 chars I could find on my keyboard) and compare each one to a hash I have in order to determine what the original word was. Do the math. There are even faster ways to go about it (optimizing the sloppy code, bitslicing, using a statistics-based character distribution on the dataset in order to use chars more likely to appear in valid words, etc.), including using GPUs and, the real killer, FPGAs.
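For anyone who wants the "do the math" part spelled out, here is a rough Python sketch. It assumes an exhaustive search with no repeated guesses (the test code described above generated random words, so this is a simplification), and it uses only the two figures quoted in the post: 92 characters and 4M hashes per second. The function name is hypothetical.

```python
ALPHABET_SIZE = 92             # characters available, per the post
HASHES_PER_SECOND = 4_000_000  # unoptimized single-PC MD5 rate, per the post


def seconds_to_exhaust(length: int, rate: int = HASHES_PER_SECOND) -> float:
    """Worst-case time to try every password of exactly `length` characters."""
    return ALPHABET_SIZE ** length / rate


for length in (4, 6, 8):
    secs = seconds_to_exhaust(length)
    print(f"{length} chars: {secs:,.0f} s (about {secs / 86_400:,.1f} days)")
```

At that rate, every 6-character password falls in under two days on one unoptimized desktop CPU, and 8 characters takes on the order of decades, before any of the GPU or FPGA speedups mentioned above are applied.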
3. Use bcrypt (BLOWFISH) with an appropriate work factor, or apply key stretching on your preferred hash algorithm.
Since being fast is deadly for our password hashing scheme, the solution is either:
a.) Use a password hashing scheme that is not fast (bcrypt)
b.) Slow down your password hashing scheme (key stretching)
Key stretching works by simply looping the hashing process a thousand times or more, and then using that final hash result as the password hash (using the salt for each iteration is also a good idea, as it also makes the process a little more expensive). The exact number of iterations is not important; what's important is how slow the process becomes. Our target is to make it as slow as possible so that the attacker's effort will also be slowed down, but not so slow as to negatively affect our service. Depending on your server load, it could be 100ms, 400ms, higher, or lower.
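As a sketch of option (b), the snippet below uses PBKDF2 from Python's standard library instead of a hand-rolled loop. PBKDF2 is a standardized key-stretching construction that mixes the salt into an iterated HMAC, so it is a swapped-in technique rather than the exact loop described above. The function names and the iteration count of 200,000 are assumptions for illustration, not recommendations.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed starting point; tune to your own hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, int, bytes]:
    """Return (salt, iterations, digest); all three get stored with the user."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute the stretched hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)


salt, iterations, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, iterations, digest))  # matching password
print(verify_password("wrong guess", salt, iterations, digest))    # non-matching password
```

Storing the iteration count alongside the salt and digest means the count can be raised later for new or re-hashed passwords without breaking existing ones.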
Bcrypt is great because it is based on a block cipher (Blowfish) whose key setup is deliberately expensive AND can be made even slower (variable work factor). That's automatic key stretching for you, and it saves you the trouble of implementing a way to make the key stretching variable so that you can adjust it as processing power advances and increases.
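If you go the key-stretching route instead, that adjustability has to be built by hand. One rough way to do it, sketched below in Python, is to time the hash on the actual server and keep doubling the iteration count until a single hash takes about as long as you can tolerate. The function name, the 100ms default, and the use of PBKDF2 are all assumptions for illustration.

```python
import hashlib
import time


def calibrate_iterations(target_seconds: float = 0.1, start: int = 10_000) -> int:
    """Double the PBKDF2 iteration count until one hash takes >= target_seconds."""
    iterations = start
    while True:
        began = time.perf_counter()
        # Dummy inputs: only the elapsed time matters here.
        hashlib.pbkdf2_hmac("sha256", b"benchmark-password", b"benchmark-salt-16", iterations)
        elapsed = time.perf_counter() - began
        if elapsed >= target_seconds:
            return iterations
        iterations *= 2


if __name__ == "__main__":
    print("iterations for ~100ms per hash:", calibrate_iterations())
```

Re-running this when hardware is upgraded gives a new count to use for future hashes, which is roughly what raising the bcrypt work factor does for you.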
The moment I read of the LinkedIn hack in DailyTech, I sent a YM message (we were both at work that time; yeah yeah, “someone still uses YM, what a caveman”, I've heard that before) to my brother and reminded him of our conversation two weeks ago. If he didn't believe me then, I'm sure he believes me now.