Password Storage Lessons from the LinkedIn Hack

jvroig

Platinum Member
Nov 4, 2009
LinkedIn was hacked and the hackers got 6.5M password hashes, about half of which were posted in cleartext (i.e., already "decrypted"; I use quotes because no real decryption actually happened).

What's funny was that just two weeks ago, I was talking about this very same topic, 'how to do password storage right', with my younger brother (I was presenting an overview of a framework, and part of it was the security focus), while we were both basically idling after our older brother's wedding. (Yes, isn't it awesome that there's still time for tech even after a wedding? :awe: )

How LinkedIn Stored Passwords
LinkedIn was not like reddit (IIRC), which was embarrassingly caught storing passwords in cleartext. Instead, LinkedIn stored the unsalted SHA-1 hash of each password. As I was telling my brother, using just a single pass of SHA-1 (or, even worse, MD5) is almost tantamount to using cleartext, except that it makes the developers feel better. Bottom line: your passwords are practically no safer.

So what's the state-of-the-art when it comes to password storage?
1.) Use a per-user salt that is a nonce.
2.) Don't use MD5 or SHA-1
3.) Use bcrypt (Blowfish) with an appropriate work factor, or apply key stretching to your preferred hash algorithm (hopefully you followed #2 and have settled on a SHA-2 variant or one of its peers).

If LinkedIn had followed these, the Russian hackers would have gotten nothing but a useless bunch of hashes, which they'd need an inordinate amount of time to brute force.

How exactly do these recommendations help?

1. Use a per-user salt that is a nonce.
A salt is simply another string you append to the password before hashing. It should be a cryptographic nonce (for example, bytes from a cryptographically secure random number generator), not just a “random” value from an ordinary PRNG, because a cryptographic nonce goes a long way toward guaranteeing, with no extra hassle, that your salts will actually be unique for every user, whether you have 10K users or 600M like Facebook.
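A minimal Python sketch of generating and using a per-user salt, assuming the standard-library `secrets` module as the source of the nonce; 16 bytes is a common choice, not a requirement, and `new_salt` / `salted_digest` are illustrative names. The single SHA-256 pass here only shows the salting step; on its own it is still far too fast (see #3).

```python
import hashlib
import secrets


def new_salt(num_bytes: int = 16) -> bytes:
    """Return a fresh per-user salt from the OS's cryptographically secure RNG."""
    return secrets.token_bytes(num_bytes)


def salted_digest(password: str, salt: bytes) -> bytes:
    """Append the salt to the password and hash once (illustration only).

    A single SHA-256 pass is still far too fast for real password storage;
    see recommendation #3 for slowing it down.
    """
    return hashlib.sha256(password.encode("utf-8") + salt).digest()


# Two users who happen to pick the same password.
salt_alice, salt_bob = new_salt(), new_salt()
digest_alice = salted_digest("hunter2", salt_alice)
digest_bob = salted_digest("hunter2", salt_bob)

# With unique salts, identical passwords still produce unrelated digests.
print(digest_alice.hex())
print(digest_bob.hex())
```

Both the salt and the digest would be stored in the user table; only the digest depends on the password.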

Salts are not secret information. They end up as just another field in your user table, probably beside the password field. What they are meant for is to make precomputed dictionary attacks obsolete. Even with beefy hardware, proper salts make it inefficient for attackers to use a precomputed dictionary (“rainbow table”). Without a salt, they can afford to spend weeks or months generating their dictionary, and then use it against all your users at once. With salts, they have to generate a dictionary for each user (one per unique salt). You've just cut their efficiency by 99.9999...% and basically made a dictionary attack obsolete.

This is why 3M passwords seemed to have been “decrypted” so fast and were posted in cleartext on the Russian forum. These were simply the passwords that were found in the dictionary. In other words, a lookup table (albeit a gigantic one) defeated the password hashing scheme.
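A toy Python sketch of that lookup-table attack, assuming unsalted SHA-1 as reported for LinkedIn; the five-word `WORDLIST` and the sample passwords are made up for illustration. The table is built once and cracks every unsalted hash it covers, while the same table finds nothing once a per-user salt is mixed in.

```python
import hashlib
import secrets


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


# A toy "dictionary" of common passwords; real attackers use millions of entries.
WORDLIST = ["123456", "password", "linkedin", "letmein", "qwerty"]

# Precomputed once, then reusable against every unsalted hash in a leak.
LOOKUP = {sha1_hex(w.encode("utf-8")): w for w in WORDLIST}


def crack(leaked_hashes):
    """Return the plaintexts the lookup table recovers, in leak order."""
    return [LOOKUP[h] for h in leaked_hashes if h in LOOKUP]


user_passwords = ["linkedin", "password", "correct horse battery staple"]

# Unsalted: every dictionary password falls to a single table lookup.
unsalted = [sha1_hex(p.encode("utf-8")) for p in user_passwords]
print(crack(unsalted))  # ['linkedin', 'password']

# Salted: each hash covers password + salt, so the precomputed table misses.
# (A real system stores each salt beside its hash; the demo only needs the hashes.)
salted = [sha1_hex(p.encode("utf-8") + secrets.token_bytes(16)) for p in user_passwords]
print(crack(salted))
```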

2. Don't use MD5 or SHA-1

For their original job as message digests, MD5 is clearly broken (the last break I've read about involved pencil and paper; you can't get any more broken than that), and SHA-1 isn't faring much better. However, you may argue that their weakness and inappropriateness as message digests has no bearing on their use as password hashes. That's correct. The real problem with them (and, similarly, with all message digests) is that they were designed to be fast, and speed is the enemy. For example, on my secondary PC (Phenom II X4, 3.4GHz, 8GB RAM), with non-optimized "word"-generator and MD5 hashing code I threw together in a few minutes, I was able to generate 4M random words per second (drawing on the 92 characters I could find on my keyboard), hash each one, and compare the result to a target hash in order to determine what the original word was. Do the math. There are even faster ways to go about it (optimizing the sloppy code, bitslicing, using a statistics-based character distribution so that characters more likely to appear in real passwords are tried first, etc.), including using GPUs and, the real killer, FPGAs.
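To make "do the math" concrete, a back-of-the-envelope Python sketch using the figures above (92 characters, 4M hashes per second). It assumes passwords of an exact length and a pure exhaustive search with no smarter candidate ordering, so real attacks on dictionary-like passwords are far faster.

```python
HASHES_PER_SECOND = 4_000_000  # the unoptimized MD5 rate quoted above
CHARSET_SIZE = 92              # printable characters found on the keyboard
SECONDS_PER_DAY = 86_400


def exhaustive_search_seconds(length: int) -> float:
    """Worst-case seconds to try every password of exactly `length` characters."""
    return CHARSET_SIZE ** length / HASHES_PER_SECOND


for length in range(6, 9):
    days = exhaustive_search_seconds(length) / SECONDS_PER_DAY
    print(f"{length} chars: {days:,.1f} days ({days / 365.25:,.1f} years)")
```

With these figures, a full search takes roughly 42 hours for 6 characters, about 161 days for 7, and about 41 years for 8 on that one machine. On average an attacker finds a given password in half that time, and GPUs or FPGAs cut it much further.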

3. Use bcrypt (Blowfish) with an appropriate work factor, or apply key stretching to your preferred hash algorithm.
Since being fast is deadly for our password hashing scheme, the solution is either:
a.) Use a password hashing scheme that is not fast (bcrypt)
b.) Slow down your password hashing scheme (key stretching)

Key stretching works by simply looping the hashing process a thousand times or more and then using the final hash result as the password hash (mixing the salt into each iteration is also a good idea, as it makes the process a little more expensive). The exact number of iterations is not important; what's important is how slow the process becomes. Our target is to make it as slow as possible, so that the attacker's effort is slowed down too, but not so slow that it hurts our service. Depending on your server load, that could be 100ms per hash, 400ms, more, or less.
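A minimal Python sketch of the loop described above, assuming SHA-256 as the underlying hash and a 100 ms budget; `stretched_hash` and `calibrate` are illustrative names, not from any library. For real use, the standardized form of this idea, PBKDF2, is available in the standard library as `hashlib.pbkdf2_hmac` and is a safer choice than a hand-rolled loop.

```python
import hashlib
import secrets
import time


def stretched_hash(password: str, salt: bytes, iterations: int) -> bytes:
    """Hash once, then re-hash the result iterations - 1 more times, mixing in the salt each round."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    digest = hashlib.sha256(password.encode("utf-8") + salt).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha256(digest + salt).digest()
    return digest


def calibrate(target_seconds: float = 0.1, start: int = 1_000) -> int:
    """Double the iteration count until one hash takes at least target_seconds on this machine."""
    salt = secrets.token_bytes(16)
    iterations = start
    while True:
        began = time.perf_counter()
        stretched_hash("calibration-password", salt, iterations)
        if time.perf_counter() - began >= target_seconds:
            return iterations
        iterations *= 2


iterations = calibrate(0.1)
print(f"roughly {iterations:,} iterations reach 100 ms on this machine")
```

The calibrated count is what you would store alongside each hash, so it can be raised later without breaking existing records.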

Bcrypt is great because it is based on the Blowfish block cipher, whose key setup is notoriously expensive, AND it can be slowed down further (variable work factor). That's automatic key stretching for you, and it saves you the trouble of implementing a way to make the key stretching variable so that you can adjust it as processing power increases.
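bcrypt records its cost factor inside the stored hash string, which is what makes raising it later painless. As a rough sketch of what you would otherwise build by hand, the Python below uses the standard library's PBKDF2 in place of bcrypt (a swapped-in technique, not bcrypt itself); the `iterations$salt$hash` record layout and the 200,000 default are assumptions for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# Work factor for new hashes; raise it as hardware gets faster.
CURRENT_ITERATIONS = 200_000


def hash_password(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    """Return 'iterations$salt_hex$hash_hex', storing the cost beside the hash."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${derived.hex()}"


def verify_password(password: str, stored: str) -> Tuple[bool, Optional[str]]:
    """Check a password against a stored record.

    Returns (matches, new_record); new_record is a re-hash at the current
    cost when the old record used fewer iterations, otherwise None.
    """
    iterations_text, salt_hex, hash_hex = stored.split("$")
    iterations = int(iterations_text)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), iterations
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(derived, bytes.fromhex(hash_hex)):
        return False, None
    # Upgrade old records transparently the next time the user logs in.
    if iterations < CURRENT_ITERATIONS:
        return True, hash_password(password)
    return True, None


old_record = hash_password("s3cret", iterations=10_000)
ok, upgraded = verify_password("s3cret", old_record)
print(ok, upgraded is not None)  # True True
```

With the third-party `bcrypt` package, the equivalent calls are `bcrypt.hashpw(password_bytes, bcrypt.gensalt(rounds=12))` and `bcrypt.checkpw(password_bytes, stored_hash)`, and the cost travels inside the stored string without any extra bookkeeping.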


The moment I read about the LinkedIn hack on DailyTech, I sent a YM message (we were both at work at the time; yeah yeah, “someone still uses YM, what a caveman”, I've heard that before :p) to my brother and reminded him of our conversation two weeks earlier. If he didn't believe me then, I'm sure he believes me now :D
 

BrightCandle

Diamond Member
Mar 15, 2007
LinkedIn might not have actually been hacked. It looks like only 10% of the passwords are right. That means there is a high chance this was actually a social engineering attack, where they made it look plausible that LinkedIn had been hacked so they could tell everyone it had been. Evidence suggests the passwords actually come from previous hacks, and the hits are due to users reusing the same passwords elsewhere.

The interesting thing is that, at the same time, the hackers fired off millions of phishing emails saying LinkedIn had been hacked and providing a link to change your password. Actually, it's a pretty genius approach.

PS you didn't get this from me.
 

Leros

Lifer
Jul 11, 2004
I don't see the need to even store the user salt.

Why not use a hash to generate the user's salt from their user id? If your database gets compromised, the hackers wouldn't even have the salt. They would need to know the algorithm you use to generate the salt.
 

jvroig

Platinum Member
Nov 4, 2009
The salt is simply there to thwart dictionary attacks, nothing more.

What you are proposing is "security by obscurity". There is no crypto value in it*, and I can imagine a few cases where it would not be beneficial at all**. However, if your programmers are dead-set on it and it helps them feel more secure, you can implement a double-salting mechanism: use the first salt as it was meant to be used (see original post), and then add a secondary "secret" salt.

In short, that would look like hash(password + salt + secret_salt). If your salt (the real salt, not the "secret" salt) is an actual cryptographic nonce as previously described, you would not be sacrificing any of the crypto value of the salt, while still getting the desired "secret algorithm" effect. Since the effective security of your implementation no longer relies purely on a "secret algorithm", you are not stuck at the "security by obscurity" level of security (pretty low), but are instead at the "defense in depth" level.***
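A minimal Python sketch of hash(password + salt + secret_salt), assuming PBKDF2 as the slow hash (so the per-user salt goes into PBKDF2's salt argument rather than being concatenated); `hash_with_secret_salt` and `SECRET_SALT` are hypothetical names. An application-wide secret like this is often called a "pepper".

```python
import hashlib
import hmac
import secrets


def hash_with_secret_salt(password: str, salt: bytes, secret_salt: bytes,
                          iterations: int = 200_000) -> bytes:
    """Stretched hash over password + secret_salt, with the stored per-user salt.

    salt: the per-user nonce, stored in the user table as usual.
    secret_salt: an application-wide key kept out of the database.
    """
    material = password.encode("utf-8") + secret_salt
    return hashlib.pbkdf2_hmac("sha256", material, salt, iterations)


# Hypothetical: a real deployment would load this from configuration kept
# away from the database, not generate a new one at every startup.
SECRET_SALT = secrets.token_bytes(32)

user_salt = secrets.token_bytes(16)
stored = hash_with_secret_salt("hunter2", user_salt, SECRET_SALT)

# An attacker holding the user table (user_salt, stored) but not SECRET_SALT
# cannot test guesses: even the right password with the wrong secret fails.
guess = hash_with_secret_salt("hunter2", user_salt, b"not the real secret")
print(hmac.compare_digest(stored, guess))
```

The per-user salt still does all of its usual work; the secret only adds a layer that holds as long as the secret stays out of the attacker's hands.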



Notes:
* Because the only purpose of the salt is to make each record unique, whether you store it plainly or keep it a secret does not make it do its job better.

** Using usernames, IDs, or any other field makes the salt predictable, consistent, and (our real concern) repeatable. For example, in internal business apps rolled out by IT as various departmental / isolated systems (instead of one grand unified "one-system-to-handle-the-entire-company"), it is common practice to keep usernames the same across systems (commonly the employee ID, or a standard naming convention from IT). If the IT department applies the same password storage scheme everywhere, each user will be assigned the same salt on every system he accesses. For one thing, that makes his hashes visibly identical if he reused his password. Even if he didn't, we already know his salts are the same, and that makes a dictionary built for that salt twice as useful (or x times as useful, for however many internal systems IT rolled out that he accesses).

*** The real strength of any cryptosystem is that its protection does not depend on the attacker having limited information. It doesn't matter if they have only the user table, or the entire database, or the entire server; in our particular case (password storage) they should only be able to resort to brute forcing it, and it should be just as expensive for them each time, whatever they may have in their possession (one table, the entire DB, or the whole server including apps, or even if the "breach" was an inside job and those responsible come from the IT dept themselves, and thus know the "secret algo"). The recommended password storage mechanism assures us of that. Relying on a secret algo does not.

**** Defense in depth does have its merits. I am, in fact, in favor of the defense in depth tactic that I outlined in this post. However, I did not include it in the original post because I do not wish my personal bias to affect the presentation of the ideal password storage as defined by security experts and cryptographers. Also, to be honest, it won't help your security director sleep better at night when a breach happens; he'd be losing sleep wondering whether the attackers merely got the user table, or also got a copy of the algo from the source code, or whether it was an inside job and they already know the "secret algo". What would help him sleep better is "oh yeah, it'd take them months to brute force a single account... I'll just go to work tomorrow, have a cup of coffee, then make the system(s) prompt each user for a new password. Bless reliable password storage, I'm not gonna be fired for this".

***** The secondary "secret" salt can even be a better secret than just an algo that says "hash the username". It can be an application-specific key (so if your IT dept rolled out many internal business apps, each one would have a unique app-specific key as its secret salt). Then you can store this secret more carefully (on a separate server from the code and DB, for example). It's still just for "defense in depth", and never a replacement for the actual recommendation. In other words, the "secret" salt will always supplement the real salt, never replace it.
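To illustrate footnote **, a small Python sketch comparing a username-derived salt with nonce salts on two internal systems; the employee ID, password, and helper names are all hypothetical, and PBKDF2 with a deliberately low iteration count stands in for whatever scheme IT actually uses.

```python
import hashlib
import secrets


def username_salt(username: str) -> bytes:
    """The scheme footnote ** warns about: a salt derived from the username."""
    return hashlib.sha256(username.encode("utf-8")).digest()


def store(password: str, salt: bytes) -> bytes:
    # Low iteration count only to keep the demo quick.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 1_000)


# Hypothetical employee whose ID is his username on two separately deployed
# apps, and who reused his password on both.
username, password = "EMP-00417", "Spring2012!"

# Derived salts: identical on both systems, and so are the stored hashes.
payroll_derived = store(password, username_salt(username))
helpdesk_derived = store(password, username_salt(username))
print(payroll_derived == helpdesk_derived)  # True

# Nonce salts: each system picks its own, so the reuse is no longer visible.
payroll_random = store(password, secrets.token_bytes(16))
helpdesk_random = store(password, secrets.token_bytes(16))
print(payroll_random == helpdesk_random)
```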
 

jvroig

Platinum Member
Nov 4, 2009
BrightCandle said:
LinkedIn might not have actually been hacked. It looks like only 10% of the passwords are right. That means there is a high chance this was actually a social engineering attack, where they made it look plausible that LinkedIn had been hacked so they could tell everyone it had been. Evidence suggests the passwords actually come from previous hacks, and the hits are due to users reusing the same passwords elsewhere.
And that's the difference between using a proven, cryptographically secure password storage mechanism and using a "clever" developer-inspired one, or a clearly broken one such as hashing with SHA-1 once and calling it a day.

Had they been storing passwords as recommended, they would have just brushed off this attempt to phish (for the sake of this discussion, let us all assume it was in fact just a social engineering tactic). They would have immediately issued the statement: "Actually, we've been storing passwords in the most secure way, and even with the hashes (which we doubt they actually have, but it doesn't matter either way), they'd never be able to crack 3M passwords, so this is completely bogus, it's laughable".

But because the way they store their passwords is broken (at least, from how all news sources have described it, and they have not attempted to say "no, we aren't just using SHA1 one time", so I assume it really is as described), it becomes completely feasible that passwords were actually recovered and transformed into plaintext. And therein lies the difference between a reliable crypto system, and one that isn't: the reliable one lets you sleep better, the other one keeps you nervous and guessing and playing "what if" in your mind.
 

Mark R

Diamond Member
Oct 9, 1999
Leros said:
I don't see the need to even store the user salt.

Why not use a hash to generate the user's salt from their user id? If your database gets compromised, the hackers wouldn't even have the salt. They would need to know the algorithm you use to generate the salt.

Yes. This would work to prevent dictionary attacks on the password.

However, it would offer no real advantage over a randomly generated and stored salt. The potential problem with it is that if this algorithm is widely used, common user IDs potentially become vulnerable to dictionary attacks. Although such attacks seem unlikely, a random salt avoids the problem entirely.

Little is gained by using a secret hash algorithm, as this is, at its heart, security by obscurity, which offers only modest security.

There is some value in the use of an additional "secret" salt. If the database is compromised and the PW hashes and salts leaked, the database remains vulnerable to a brute-force attack. If a secret salt (embedded in the server-side code, not in the database) is used, then brute-force attacks become very impractical, unless the hackers are able to obtain the secret salt (which will hopefully be on a different server, with different accessibility, etc.)
 

alkemyst

No Lifer
Feb 13, 2001
15+ char passwords built from something you already remember plus where you joined.

1lu^mych@rm1n@n@ndt3ch564343v@r

The problem is sites that narrow down what your password is allowed to be.