Thursday, November 03, 2011

How to calculate Password Strength...

When I visit websites that want me to create an account before doing something, I typically enter in bogus information and occasionally I see a "password meter" that determines that I've entered a "weak" password.  At least it is considered "weak" by some systems and "average/strong" in others.  Being the curious sort of person, I've been trying to come up with a good, consistent strategy for calculating password strength and then something useful to do with it.  I assume most developers only want to write password strength code one time, do something useful with it, and then move onto the next task.

What constitutes a strong password?  An excellent question and something the industry seems to have difficulties figuring out at the moment.  NIST, the National Institute of Standards and Technology, has a few words to say on the topic.  Basically, password strength boils down to the number of bits of entropy that a password has.

So the next question is:  How does one calculate the number of bits of entropy of a password?  NIST has proposed the following rules:

  • The first byte counts as 4 bits.
  • The next 7 bytes count as 2 bits each.
  • The next 12 bytes count as 1.5 bits each.
  • Anything beyond that counts as 1 bit each.
  • Mixed case + non-alphanumeric = up to 6 extra bits.

After thinking about this for a while and crunching some numbers, the NIST proposal seemed decent enough.  Plus, implementing the algorithm in software is easy.  Unlike a lot of programmers, I don't take algorithms like this at face value and just implement them blindly.  I like to test their claims first and sometimes introduce my own ideas.

During my research phase, I also ran across this xkcd web comic:


While I don't generally care for xkcd comics, this one actually made me think.  Is "correcthorsebatterystaple" actually 44 bits of entropy?  I immediately said to myself "no" but, being a sucker for self-inflicted punishment, I went to check.

Modern security is best done by not "handing out" bits of entropy.  Basically, you only want to say that something has "at most 'x' bits of entropy" if that something passes a number of tests.

My initial calculations of "correcthorsebatterystaple" against the NIST rules result in a maximum of 41 bits of entropy.  So, xkcd handed out three undeserved bits.  Three bits doesn't sound like a lot, but it is the difference between 550 years and 69 years.  Also, this is a phrase using words from the dictionary and the letters of the phrase have a lot of repetition, so I modified the NIST algorithm to only count a multiplier of 75% of each repeat character (e.g. the first 'r' counts for 100% of the bits, the next 'r' counts only for 75% of the bits, etc).  This drops the number of bits of entropy to 34.2.  My own algorithm runs some more tests but the result still comes out to 34 bits.  This is 10 bits fewer than the xkcd comic's claim for that strategy to selecting a password.  Or, more simply put, roughly half a year at 1000 guesses/sec.  Not 550 years.

For the "Tr0ub4dor&3" part of the comic, my same base algorithm gives me 22.1 bits, not 28 bits.  Or roughly 1.2 hours at 1000 guesses/sec.  However, the extra tests in my algorithm actually calculate it as having about 14 bits of entropy.

Of course, the actual time it takes to guess the password is going to be somewhere between my theoretical time and xkcd's theoretical time.  So, while on a surface level, xkcd seemed way off, the approach to selecting a good password with four random words isn't actually half bad and is a considerable improvement over today's passwords that users select.  And we are all for anything that improves the passwords that users select...right?

If the computer generates dictionary-based passwords for the user, then that will eliminate the human equation.  A human, following xkcd's rules, would likely select words of things that they can see, hear, and touch in their immediate vicinity - making it easier for a hacker to build a small attack dictionary.  A quick glance around me and I get "computerkeyboardvideogamenetflixmoviewikipedia" - a great password from the algorithm's perspective except for the fact that I'm sitting at a computer with a keyboard, I love playing video games, I have a Netflix DVD movie in front of me, and I visited Wikipedia recently.  So, even though I'm sure it has terrific entropy, it is a terrible password.  Hence the need for a computer to generate dictionary-based passwords.

One thing I've noted with so-called "password strength meters" is that they are generally useless.  The server-side doesn't enforce password strength requirements.  Part of the problem is that no one seems to have come up with a solid algorithm (other than NIST) and no one has actually said, "here is how you should use this password strength meter..."  This combination is useless to the average server-side programmer, so the only thing I've seen done is display the meter to the user and hope they make a strong password.  Ha!  As if that will ever happen.

What should actually happen:  Calculate the number of bits of entropy in a password using a good algorithm and then apply a minimum threshold.  (And get rid of the silly password meter.)  If the password exceeds the threshold, then the password is strong enough, otherwise the user has to try again.

This approach allows a web forum owner to choose a minimum threshold of 18 bits of entropy and an online bank to choose a minimum threshold of 40 bits of entropy (or more).  Each industry requires different password strengths.  A password strength meter, if you really need one, could then become useful for prevalidation against the threshold.  After all, the user only cares to know the answer to the question, "Will my password be strong enough so I won't have to resubmit this form?"

I'm still nitpicking at my algorithm, which is one of the reasons why it isn't being published (yet).  So, don't get your knickers in a twist over my "hand-waving" claiming to have a solution and not playing nice in the kiddie pool and sharing said solution.  The algorithm will be released in due time.

Another example:  Google recently published an online safety guide on their homepage.  This guide contains a YouTube video and other similar textual instructions on creating so-called strong passwords.  The YouTube video shows "Garden1ngF@n" as a "strong password".  For that password, my base algorithm gives 22.8 bits of entropy, but the extra tests portion of my algorithm calculate it as having around 16 bits of entropy.  Yikes.  Seems really broken to me.

So it looks like I'm onto something here.  My algorithm is blasting apart even Google's recommendations for strong passwords.  Woot!

Read the next part in this three part series: Calculating Password Strength: Part II

9 comments:

  1. Interesting blog! You've got a new follower :D

    ReplyDelete
  2. Why are you using "correcthorsebatterystaple" instead of "correct horse battery staple"? According to NIST, that's 3 extra bytes at 1.5b of entropy each, plus up to 6b for non-alphanumeric.
    Anyway those guidelines were made for passwords like Tr0ub4dor&3, not for passphrases.

    Let us know when you come up with a good algorithm for calculating the entropy of data. I suspect researchers making compression software, among others, will be most interested.

    ReplyDelete
  3. @Petr - I mashed it altogether because it didn't look like there were any spaces. A lot of websites won't take spaces in passwords either. I'm not sure how useful password entropy calculations will be for compression software. The NIST suggestions are just for passwords, not really intended for general data. My goal is a rough estimate of password entropy so a threshold can be applied and poorly selected passwords can be thrown out.

    ReplyDelete
  4. I've always thought it silly to have those password meters and not use them for any kind of enforcement.

    Since that XKCD comic came out I wonder how long before dictionary phrase attacks become a problem. 'correcthorsebatterystaple' wouldn't last long against it. However I think the whole point of the comic was to point out how ridiculous passwords are getting.

    My big lesson from the mtgox.com incident was length trumps complexity.

    ReplyDelete
    Replies
    1. Actually, I have a 300,000 word English dictionary I deploy with my SSO server. (300,000 ^ 4) / 2 is the minimum number of attempts required to brute force a single password using said dictionary.

      If you combine that with 'bcrypt'-like hashing, it becomes entirely infeasible to brute force even a four word password phrase. It looks weak but is surprisingly strong.

      Delete
  5. You say "computerkeyboardvideogamenetflixmoviewikipedia" is a terrible password. Why?

    It is only composed of dictionary words and knowledge about you, but you really think anyone will brute force that? Or narrow the search space by adding knowledge about you, such as computer or keyboard?
    There are soo many keywords one can say about a person that it won't narrow it down enough.

    Tiago Mendo

    ReplyDelete
    Replies
    1. Well, it isn't a good password because someone can build a much smaller dictionary of "objects likely to be in the vicinity" of the person sitting in front of the computer screen. That dictionary is a lot smaller than you would think and people are terrible at crafting passwords themselves as proven time and again with various hacked websites.

      If it is a worthwhile account to hack, then it is worth the effort for someone to brute force it.

      Delete
  6. Very interesting approach. I did the same thing from a diffrent prospective. http://thelivingpearl.com/2013/01/02/generating-and-checking-passwords-in-python/

    ReplyDelete
    Replies
    1. End users don't like randomly generated passwords. If your users are using KeePass or a similar tool already, then it is fine, but most users don't. For a truly random password, your algorithm is fine, but for everything else, NIST + dictionary is better. The key is to calculate entropy and then use a threshold and cut off bad password selections at that threshold.

      Delete