Could someone explain today's XKCD to me?

Atreus21 · Aug 10, 2011

What are bits of entropy, and how do you establish how many a certain character on a keyboard has?

William Gaatjes · Aug 10, 2011

I have to look up the word entropy first.
IMHO: I do know that entropy is used to define the amount of chaos (or the lack of it) in a given situation. Chaos here means a lack of underlying connection between different variables.

But the picture is quite easy to understand.
Most people use common words where some characters are replaced. Meaning letters are replaced by numbers or other symbols. But it is still a word. A word that can be found in a dictionary.

In the Netherlands, you have this word game called Lingo where you have to guess what word it is while not having all the letters. Now when you have a computer and a digitally stored dictionary, it is possible to use a search algorithm. With the letters you do have, the search algorithm searches for the possible words. Then a list of words will be produced.
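
The dictionary search described above can be sketched in a few lines. This is only an illustration: the word list is made up, `matching_words` is a hypothetical helper name, and `_` is assumed to mark an unknown letter.

```python
def matching_words(pattern, dictionary):
    """Return the dictionary words that fit a pattern such as 'h__se'.

    '_' stands for an unknown letter; any other character must match exactly.
    """
    return [
        word
        for word in dictionary
        if len(word) == len(pattern)
        and all(p == "_" or p == c for p, c in zip(pattern, word))
    ]


# Hypothetical five-letter word list, for illustration only.
WORDS = ["horse", "house", "hoist", "worse", "nurse"]

candidates = matching_words("h__se", WORDS)
print(candidates)
```

With a real dictionary the list of candidates would be longer, but the idea is the same: every known letter shrinks the set of words left to try.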

But it is more difficult for a search algorithm to find a sentence made of words, because there are more words than there are characters. And for a human, a simple sentence is easier to remember than some weird-looking word.

WildW · Aug 10, 2011

xkcd really made me smile this morning, but looking at it again now. . .well, maybe I fail the human test, but I was sure it was horsestaplebatterycorrect, probably because that's left-to-right what the last picture was.

EarthwormJim · Aug 10, 2011

Bits of entropy usually means Shannon's Entropy. It has to do with random variables and information theory.

As an example, a single fair coin toss has an entropy of one bit. Two coin tosses have two entropy bits.

Since there are 28 unknown random bits, with each having an equally likely chance of being either a 1 or a 0, then there are 28 bits of entropy.

It's basically a way of measuring uncertainty. Since you are completely uncertain what the state of each of the 28 bits is, each bit contributes one bit of entropy.
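
As a rough sketch of the coin-toss examples, the Shannon entropy of a set of outcomes can be computed directly from their probabilities. The function name `shannon_entropy` is chosen here for illustration, and the outcomes are assumed to be independent and fair.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits: -sum(p * log2(p)) over all outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


one_toss = shannon_entropy([0.5, 0.5])       # heads or tails
two_tosses = shannon_entropy([0.25] * 4)     # HH, HT, TH, TT
# For N equally likely outcomes the entropy is simply log2(N),
# so 28 fair random bits (2**28 outcomes) carry 28 bits of entropy.
twenty_eight_bits = math.log2(2 ** 28)

print(one_toss, two_tosses, twenty_eight_bits)
```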

Mark R · Aug 10, 2011

Entropy in this case refers to the actual amount of 'information' contained within. If you use a password based upon an 8-letter word in the dictionary, there is much less information in this password, than one based on 8 random characters.

Let's say that there are approximately 60,000 words that could be used - that's roughly 2^16 options. In other words, choosing a word from the dictionary has the same amount of randomness as choosing a random 16-bit number (a number from 0-65535). In the appropriate jargon, you could say that a dictionary word offers 16 bits of entropy. This is actually far less than the amount of information that an 8-character password could potentially hold (assuming 64 possible character codes - it would be 64^8 or 2^48 - or 48 bits [a number between 0 and about 281 trillion]).
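
The comparison can be checked with a short calculation. This is a minimal sketch that assumes every choice is equally likely; `bits_of_entropy` is a name made up for the example.

```python
import math


def bits_of_entropy(equally_likely_choices):
    """Bits needed to pick one item out of N equally likely choices."""
    return math.log2(equally_likely_choices)


dictionary_word_bits = bits_of_entropy(60_000)   # one word out of ~60,000
random_8_char_bits = bits_of_entropy(64 ** 8)    # 8 characters, 64 options each

print(round(dictionary_word_bits, 2), random_8_char_bits)
```

The dictionary word comes out a little under 16 bits, while eight characters at 6 bits each give 48 bits.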

In the first example, there are lots of tweaks, numbers, etc. that add additional randomness (or entropy) to the password: Making a letter a cap or not - 1 bit per letter (i.e. doubling the number of possible combinations), etc.

The key point is that the amount of entropy dictates the number of passwords that need to be tried to find the right one (assuming that an attacker knows common ways in which a password might be assembled).

There is no point in using '256 bit' encryption (where there are 10^77 different keys to choose from) if your password choice can only generate about 10^8 different passwords.

So, if more randomness is what you want, why not use multiple words? This is why well designed high security systems don't use passwords, they use passphrases (essentially a combination of random words compiled into a sentence for easy memorization).

William Gaatjes · Aug 10, 2011

When reading back my own post, I realized I forgot to mention something that is important.

There are more words than characters.
Although this depends on which character map you use, it is generally the case. I am not a hacker, but I do know that the browser header reveals a lot of information about the user.
Thus the character map can be determined by probing your browser header. Also, your IP address can be used to find out which country you are in (by use of reverse DNS), which makes it possible to guess what language you speak; in such a case the list of character maps possibly in use gets smaller.
Also by use of whois, you can find out a lot.
There are more possible combinations of words.
As long as you do not use common phrases, expressions or sayings you are safe. Also these "common phrases, expressions or sayings" are different from country to country.

As a sidenote :
I am not a hacker, but I recently did a test to figure out what my browser gives away and how visible I am on the internet.

To do your own safety testing, visit this website :
https://www.grc.com/x/ne.dll?bh0bkyd2

General website.
https://www.grc.com

Jeff7 · Aug 10, 2011

My thought on this password issue is that the original idea was to make it difficult for a person to guess your password. For most of the 20-year period the comic references, computers didn't really have the raw computing power to brute-force a password unless it was one of the top x,xxx passwords used. So you had to be sure it wasn't something easy to guess. (And using odd characters made it tougher for any computer that might make the attempt.)

But we accidentally progressed to having immense computing resources without the leap to quantum computers, which everyone seemed to figure would render current encryption methods useless. Now we've got videocards that are optimized for parallel calculations, which just happen to provide a platform that's great for cracking encryption, or brute-forcing a password.

So that's just my own thought on how this all came about. Back when we had super-awesome 33MHz 386 processors, a brute-force attack wasn't much of a concern, but some random coworker sitting down at your PC and trying your dog's name as a password was. Now that we find ourselves with ridiculously fast computing power available to consumers, getting at encrypted goodies won't necessarily take >1,000 years and a government supercomputer, unless you increase the length of the password considerably.

(Though what about that 128-bit encryption key stuff? How does that work into all this? 128 bits = 16 bytes, ya? So if I have a 50-character password on some data that's using 128-bit encryption, would it be quicker to try to directly brute-force the encryption key instead of the password?
Edit: After reading Mark R's post above, I think I might be way off.)

DirkGently1 · Aug 11, 2011

I don't know how much this will help the OP understand the XKCD strip, but it is relevant and very interesting.

http://www.tomshardware.com/reviews/password-recovery-gpu,2945.html

Biftheunderstudy · Aug 11, 2011

My department routinely checks passwords by running password-cracking algorithms against the machines it administers. We have the advantage of having access to a supercomputer network, but quite often people get emails saying their password is weak and it's been brute-forced by the net admin.

The passwords for the same supercomputer network are closely monitored and they watch for things like the comic suggests. RSA keys and passphrases are some of the best ways to do this; LastPass helps in this regard.

William Gaatjes · Aug 11, 2011

There should be one disadvantage. When the server is located in one country and the person is in another, a mutual character set must be used. Usually this is the ASCII character set. But when combinations of words are used it is not that easy. I was also thinking: if someone wants to hack passwords remotely, every guess has to travel over the network, and this usually means a 10 to 100 millisecond delay between password entries, whether over Wifi or cable.
If the server also limits the number of password entries allowed each hour, then there really is not much to worry about, because the time to hack becomes a lot longer. The supercomputer can generate candidate passwords really fast by brute force, but it has no means to test them at the same speed. Here a bottleneck arises.
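
To put rough numbers on that bottleneck, here is a small sketch. The 2^28 search space is borrowed from the 28 bits discussed earlier in the thread; the limit of 5 tries per hour and the 100 billion guesses per second offline rate are assumptions picked only for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(search_space, attempts_per_second):
    """Worst-case time, in years, to try every candidate once."""
    return search_space / attempts_per_second / SECONDS_PER_YEAR


space = 2 ** 28  # 28 bits of entropy

online_years = years_to_exhaust(space, 10)               # one guess per 100 ms round trip
rate_limited_years = years_to_exhaust(space, 5 / 3600)   # assumed limit: 5 tries per hour
offline_years = years_to_exhaust(space, 100e9)           # assumed offline cracking rate

print(online_years, rate_limited_years, offline_years)
```

Under these assumptions the remote attack takes most of a year, the rate-limited one takes thousands of years, and the offline attack finishes in a fraction of a second.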

As long as there are no encrypted data packets to capture, it is a lot more difficult to hack the password or encryption method. It reminds me of the wifi network hacking that was shown a few years ago, where encrypted data packets were received at a constant rate by the hacker to analyze.

flvinny521 · Aug 11, 2011

How much more or less effective are passwords created by creating an acronym of a sentence or phrase?

For example:

"I want to create Vinny's new email password" becomes Iw2cV'sn3wep

EarthwormJim · Aug 11, 2011

flvinny521 said:
How much more or less effective are passwords created by creating an acronym of a sentence or phrase?

For example:

"I want to create Vinny's new email password" becomes Iw2cV'sn3wep

I foresee a post-it note stuck on the user's monitor with a password like that.

flvinny521 · Aug 11, 2011

Actually, I have been using this system for several months without issue, but I am just curious as to whether I am just making things more complicated for myself.

kevinsbane · Aug 11, 2011

What makes a password "effective"?

As to your question,

The shortened version of your passphrase is much, much, much more likely to be broken. That being said, brute-forcing it would take something like 174,000 years at 100 billion attempts a second (assuming all 95 printable ASCII characters are in play). I.e., don't worry.

The full version of your passphrase is much more secure (theoretically). It would take ~3x10^66 years to brute force at 100 billion attempts a second. So, theoretically speaking, it's something like 61 orders of magnitude more difficult to crack. Not that it matters at this point.
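
For anyone who wants to redo this kind of estimate, here is a minimal sketch. It assumes an attacker who tries every string of exactly the right length over the 95 printable ASCII characters at 100 billion guesses per second; `brute_force_years` is a hypothetical helper. The result is very sensitive to those assumptions, so treat it as an order-of-magnitude figure only.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_years(length, alphabet_size=95, guesses_per_second=100e9):
    """Worst-case years to try every string of exactly `length` characters."""
    return alphabet_size ** length / guesses_per_second / SECONDS_PER_YEAR


acronym = "Iw2cV'sn3wep"
sentence = "I want to create Vinny's new email password"

acronym_years = brute_force_years(len(acronym))
sentence_years = brute_force_years(len(sentence))
orders_of_magnitude = math.log10(sentence_years / acronym_years)

print(f"{acronym_years:.2e} years vs {sentence_years:.2e} years")
print(f"difference: about {orders_of_magnitude:.0f} orders of magnitude")
```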

Effective? Do you seriously want to type that sentence in every single time you need to enter in the password? How much time does it take to remember the shortened version as compared to the long version? How much effort to translate between the two?

So what's your goal? Something that's easy to remember but hard to guess? Or something that is a little harder to remember, but easier to input?

My opinion is to use something easy to remember, like a (short) word. Then add a number that is easy to remember. Then pad a series of letters or numbers that is easy to remember... oh, like... 1212121212 or aaaaaaaaaa or asdfghjkl; or something like that. So you end up with...

big1asdfghjkl;

Easy to type, easy to remember, and as long as you don't tell anyone how you padded your first two parts, then your password length is the only thing that matters. Theoretically speaking, big1asdfghjkl; is a stronger password than Iw2cV'sn3wep, and is arguably easier to remember (AND type).

Mark R · Aug 11, 2011

Jeff7 said:
(Though what about that 128-bit encryption key stuff? How does that work into all this? 128 bits = 16 bytes, ya? So if I have a 50-character password on some data that's using 128-bit encryption, would it be quicker to try to directly brute-force the encryption key instead of the password?
Edit: After reading Mark R's post above, I think I might be way off.)

A 50 character random alphanumeric password will easily contain over 128 bits of entropy (assuming that the password truly is random). What this means is that the password isn't a weak point - you're better off brute force searching the 128-bit key, rather than trying to guess passwords according to any sort of pattern.

Whether it is easier to simply brute-force search for an encryption key, or guess passwords depends on whether the passwords are likely to follow a predetermined pattern, and how time-consuming it is to convert a password into a key. Some algorithms for converting typed passwords into binary keys are simply conventional hash algorithms (e.g. MD5). These algorithms are simple and fast. E.g. if you have a list of passwords to test, you can go through the list calculating hundreds of millions of MD5 values per second (using a decent GPU), and check the decryption keys at a similar rate.

So, if you can MD5 and check decryption on 100 million passwords/s, or check 200 million keys/s, then if you reckon weak passwords let you exclude 50% of possible keys, you're better off guessing passwords.

This is a problem, and the use of MD5/SHA-1 or even SHA-512 hashes for converting passwords to keys is (although ubiquitous) considered insecure, as it makes it very easy to take advantage of weak user passwords.

'Secure' password-to-key conversion algorithms are specifically designed to be very slow, and require large amounts of resources (to make parallel processing difficult). E.g. some algorithms simply take the MD5, then take the MD5 of that, then repeat that 10000 times. This makes the conversion very slow, and makes a search for weak passwords too slow to be practical.
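
The repeated-hash idea can be sketched as follows. This mirrors only the "MD5 of the MD5, repeated 10000 times" description above and is not a recommendation; `stretched_key` is a made-up name, and no salt is used because the description does not mention one. In practice a standard, salted function such as Python's `hashlib.pbkdf2_hmac` is the more usual choice.

```python
import hashlib


def stretched_key(password, rounds=10_000):
    """Hash the password with MD5, then re-hash the digest until
    `rounds` hashes have been computed in total."""
    digest = hashlib.md5(password.encode("utf-8")).digest()
    for _ in range(rounds - 1):
        digest = hashlib.md5(digest).digest()
    return digest


key = stretched_key("correct horse battery staple")
print(key.hex())
```

Each guess now costs the attacker 10,000 hash computations instead of one, which is the whole point: the legitimate user pays that cost once, the attacker pays it for every candidate password.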

In this case, if you can only check 10 thousand passwords/s, compared to 200 million encryption keys/s, then you have to be looking for very, very weak passwords (compared to the underlying encryption) to make this a useful endeavour.

flvinny521 · Aug 12, 2011

kevinsbane said:
Lots of info...

Thanks for the reply, you more than answered what I was looking for.

Evadman · Aug 13, 2011

kevinsbane said:
Theoretically speaking, big1asdfghjkl; is a stronger password than Iw2cV'sn3wep, and is arguably easier to remember (AND type).

That math only works on brute forcing. If I want access to your account, I will use password dictionaries first, and some of those combinations are going to be in there.

kevinsbane · Aug 13, 2011

Evadman said:
That math only works on brute forcing. If I want access to your account, I will use password dictionaries first, and some of those combinations are going to be in there.

Very true. I assure you that asdfgbig1hjkl; is not in a dictionary. It is relatively trivial for anyone to come up with a combination that is not found in any dictionary of passwords; it simply isn't possible (currently) to store the number of permutations that is possible. For example, given my example above,

big1asdfghjkl;
bigasdfghjkl;1
asdfghjkl;big1
1asdfghjkl;big
1bigasdfghjkl;
asdfghjkl;1big

The above 6 are trivial recombinations of the same 3 base combos: big, 1, and asdfghjkl;. There are probably some millions of different combinations of the above three base factors. And that's only using 1 particular word, 1 particular number, and one particular padding string. There are thousands of words, an infinite number of numbers (let's say 10,000 for simplicity; 4 digits) and an infinite number of padding strings, with millions of combinations of those. Password dictionaries that try to store something like that... well, 10^5 (100,000 possible words) * 10^4 (4-digit numbers) * 10^6 (millions of combinations) * 10^6 (millions of unique padding strings) = 10^21 possible combinations. Let's say 1 password = 1 byte in a text file (a generous underestimate). 10^21 bytes = 1 billion terabytes. Ridiculously impractical. Not to mention that a dictionary attack is nothing more than a brute-force attack on a limited search space: 100 billion attempts/second on the above search space means it'd still take ~150 years on average to crack the above combo. That is, given that you chose a simple padding string. With current technology, it'd be faster to crack the password than to transfer the dictionary file via the internet. (Well, barring ocean liners filled with a billion 1-terabyte hard drives...)
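
The back-of-the-envelope numbers above can be reproduced with a short script. All four factors are the rough guesses from the post, not measured values, and the 1-byte-per-password storage figure is likewise just the simplification used there.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

words = 10 ** 5       # possible base words
numbers = 10 ** 4     # four-digit numbers
orderings = 10 ** 6   # "millions of combinations"
paddings = 10 ** 6    # unique padding strings

search_space = words * numbers * orderings * paddings
storage_terabytes = search_space / 10 ** 12   # at 1 byte per password

guesses_per_second = 100e9
worst_case_years = search_space / guesses_per_second / SECONDS_PER_YEAR
average_years = worst_case_years / 2          # on average, half the space is searched

print(search_space, storage_terabytes, round(average_years))
```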

You're right, SOME of those combinations will invariably end up in a dictionary somewhere. If you take some care in preparing your base components though, the likelihood of that happening is, in my opinion, of the same order as the probability of a hacker cracking your password on the first try by inputting a random string.

Fox5 · Aug 14, 2011

I once heard that only about 1000 words make up 95% of the usage of the English language. 5000 words gets you to 99.x%. That makes xkcd's example considerably weaker to a dictionary attack.
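
A quick way to see how much the word list size matters is the sketch below. It assumes four words picked independently and uniformly at random from the list; `passphrase_bits` is a made-up helper, and the 2048-word case is included on the assumption that it matches the 11 bits per word shown in the comic.

```python
import math


def passphrase_bits(wordlist_size, words=4):
    """Entropy in bits of `words` words drawn uniformly at random from a list."""
    return words * math.log2(wordlist_size)


bits_1000 = passphrase_bits(1000)   # only the 1000 most common words
bits_5000 = passphrase_bits(5000)   # the 5000 most common words
bits_2048 = passphrase_bits(2048)   # 11 bits per word

print(round(bits_1000, 1), round(bits_5000, 1), bits_2048)
```

Shrinking the list from 2048 to 1000 words costs about one bit per word, so the passphrase gets weaker but not dramatically so.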

soccerballtux · Aug 14, 2011

Fox5 said:
I once heard that only about 1000 words make up 95% of the usage of the English language. 5000 words gets you to 99.x%. That makes xkcd's example considerably weaker to a dictionary attack.

I don't think any hackers run past 2-3 words in a dictionary attack.

I like the horse. Funny.

Fox5 · Aug 15, 2011

soccerballtux said:
I don't think any hackers run past 2-3 words in a dictionary attack.

I like the horse. Funny.

If they were attacking an organization that 'suggested' (read: mandated) its users use a policy like XKCD offered, they certainly would.

Organizations don't want users to use weak passwords, but they often mandate certain criteria for passwords which users will meet the absolute minimum of in general.

kevinsbane · Aug 16, 2011

Fox5 said:
If they were attacking an organization that 'suggested' (read: mandated) its users use a policy like XKCD offered, they certainly would.

Organizations don't want users to use weak passwords, but they often mandate certain criteria for passwords which users will meet the absolute minimum of in general.

Salt the catch phrase. One or two simple additions will defeat a dictionary attack.

correct horse staple battery -> correct; horse battery staple

Adding a single ascii character to the above will increase the difficulty of a dictionary attack against the above passphrase by 2,755 times. (95 different ascii characters, and the ascii character can be placed in 29 unique locations)
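
The 2,755 figure is simply characters times positions, which can be verified like this. The sketch assumes the 95 printable ASCII characters (letters, digits, punctuation and the space); `single_insert_choices` is a name invented for the example.

```python
import string

# Letters, digits, punctuation and the space: the 95 printable ASCII characters.
PRINTABLE_ASCII = string.ascii_letters + string.digits + string.punctuation + " "


def single_insert_choices(phrase, alphabet=PRINTABLE_ASCII):
    """Number of (character, position) choices for inserting one character."""
    positions = len(phrase) + 1   # before, between and after the existing characters
    return len(alphabet) * positions


factor = single_insert_choices("correct horse battery staple")
print(factor)
```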

The problem for a hacker is twofold, in pulling off a successful dictionary attack against a long, but low entropy (non random) password/passphrase.
First, they must guess the pattern used to obscure the password.
Secondly, obviously, they need to guess the actual characters of the password. (the classic problem)

If a hacker knew the salt pattern or the base characters, their job becomes a simple dictionary attack on a vastly reduced search space. Instead of needing to search 308 million combinations to crack the password "search", they only need to search ~1000.

However, if a hacker knows neither the pattern used to obscure the base nor the exact composition of the base, then it is a problem with two unknowns, but only one equation (the dictionary). There is no way to use the single equation you have to solve both unknowns at the same time; you must exhaustively test every combination of the two in order to find the answer. Which means a brute-force attack on a slightly reduced search space.

alkemyst · Aug 16, 2011

an xkcd pic is worthless without its mouseover.

sm625 · Aug 17, 2011

The point is that "correcthorsebatterystaple" is a long password that is hard to crack mainly because of its sheer length. (It is 175 bits if you use 7 bit ascii, 200 bits if you use 8 bit ascii, 150 bits if you use 6 bit alphanumeric-only ascii). Such a code is hard for a computer to crack. Tr0ub4dor is easy to crack because it is only 54 bits alphanumeric-only ascii. Many password crackers will cycle through just alphanumerics because it yields results faster. Even if your password cracker cycles through common dictionary words then it still will take longer to come up with correcthorsebatterystaple. (A 100000 word dictionary to the power of 4 is greater than 2 to the power of 54.)
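
The bit counts above are just length times bits per character, as this sketch shows. Note that this measures how many bits the password occupies under each encoding, which is an upper bound on its entropy and is only reached if every character is chosen at random; `encoded_bits` is a made-up helper.

```python
def encoded_bits(password, bits_per_character):
    """Size of the password in bits for a given character encoding."""
    return len(password) * bits_per_character


long_phrase = "correcthorsebatterystaple"
short_word = "Tr0ub4dor"

phrase_bits = {bits: encoded_bits(long_phrase, bits) for bits in (6, 7, 8)}
word_bits = encoded_bits(short_word, 6)

# Four words from a 100,000-word dictionary versus a 54-bit search space.
four_word_guesses = 100_000 ** 4
dictionary_is_larger = four_word_guesses > 2 ** word_bits

print(phrase_bits, word_bits, dictionary_is_larger)
```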

Jeff7 · Aug 17, 2011

kevinsbane said:
Salt the catch phrase. One or two simple additions will defeat a dictionary attack.

correct horse staple battery -> correct; horse battery staple

Adding a single ascii character to the above will increase the difficulty of a dictionary attack against the above passphrase by 2,755 times. (95 different ascii characters, and the ascii character can be placed in 29 unique locations)
...

You could be a real jerk about it and use a special character for salt. ╫
