Saturday, January 21, 2012

How to calculate Password Strength (Part II)...

Previously, on Cubic:  The main character introduced a broad analysis of a new algorithm for calculating the entropy of passwords so that a threshold may be applied and weak passwords rejected.  Will our hero's new algorithm pass more rigorous testing or will his arch nemesis Statistics Boy defeat it?  Let's find out!

Since my last post, I've been busy doing some other things.  But this week I got back to working with this algorithm to see how good it actually is.  My primary goals with my tests were to figure out how well it performs against real-world data and to determine a baseline entropy threshold for the algorithm that rejects most bad passwords.  And what better real-world data is there than databases of passwords that were stolen from hacked websites?

I ended up testing against two types of information.  The first type was hacking dictionaries.  These are specially formulated files designed to defeat commonly selected weak passwords.  The second type was actual password databases that someone else extracted from various sources and published to the Internet.

My overall analysis so far shows that the algorithm works very well and eliminates most bad passwords at the 18 bits of entropy level.  This is the conclusion I've come to from analyzing millions of passwords and looking at the resulting output.  It also surprised me by doing remarkably well on non-English passwords, but I only had one database to work with for those.

Of course, you probably now want to see the relative levels of strength.  Here is example output for the top 10.4 million passwords from the massive 32.6 million password RockYou database leak:

    8 => 2531803,
    9 => 2501749,
    10 => 1561832,
    11 => 1548202,
    12 => 918072,
    13 => 825913,
    14 => 315297,
    15 => 296425,
    16 => 113774,
    17 => 102698,
    18 => 35126,
    19 => 25454,
    20 => 12288,
    21 => 4274,
    22 => 4183,
    23 => 1870,
    24 => 958,
    25 => 796,
    26 => 366,
    27 => 227,
    28 => 227,
    29 => 183,
    30 => 138,
    31 => 138,
    32 => 138,
    33 => 138,
    34 => 138,
    35 => 138,
    36 => 138,
    37 => 138,
    38 => 85,
    39 => 85,
    40 => 85,
    41 => 85,
    42 => 85,
    43 => 85,
    44 => 85,
    45 => 85,
    46 => 85,
    47 => 85,
    48 => 85,
    49 => 85,
    50 => 85,

How should you read this?  Each line shows how many passwords scored at least that many bits of entropy.  Out of the top 10.4 million passwords, only 35,126 (0.3%) would have made it past the algorithm at the 18 bits of entropy level.  My gut instinct and the preliminary testing I did before I started testing seriously showed 18 bits to be the minimum acceptable level for this algorithm.  I'm leaning toward the following rules:

18 bits of entropy = minimum for ANY website.
25 bits of entropy = minimum for a general purpose web service used relatively widely (e.g. Hotmail).
30 bits of entropy = minimum for a web service with business critical applications (e.g. SAAS).
40 bits of entropy = minimum for a bank or other financial service.

I was actually already leaning toward this as my recommended "rules of thumb" before running these tests.  These more serious statistical tests merely validate my gut instinct and preliminary testing.
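
For anyone who wants to wire these rules of thumb into a signup form, here is a minimal Python sketch.  It assumes the entropy has already been calculated by some other function (the algorithm itself isn't shown in this post), and the tier names and the is_strong_enough() helper are made up for illustration.  Only the four bit values come from the list above.

```python
# Minimum entropy (in bits) per kind of site, copied from the rules of thumb.
# The tier names are hypothetical labels, not part of the algorithm.
MIN_ENTROPY_BITS = {
    "any": 18,        # minimum for ANY website
    "general": 25,    # widely used general purpose web service
    "business": 30,   # business critical applications
    "financial": 40,  # bank or other financial service
}


def is_strong_enough(entropy_bits: float, tier: str = "any") -> bool:
    """Return True if an already-computed entropy value meets the tier minimum."""
    return entropy_bits >= MIN_ENTROPY_BITS[tier]


if __name__ == "__main__":
    # A password scored at 22 bits passes the baseline but not the higher tiers.
    for tier in MIN_ENTROPY_BITS:
        print(tier, is_strong_enough(22.0, tier))
```

Keeping the thresholds in one table means that raising a minimum later is a one-line change.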

The 85 passwords exceeding the minimum for the last rule are actually an outlier in the data and can be counted as just one password.  So out of the 10.4 million passwords that were analyzed, only 712 (796 - 84 = 712) were actually good enough to be used for RockYou at the 25 bits of entropy level.  That is the minimum level at which they should have been running their service.  What this tells me is that my algorithm rejects over 99% of all bad passwords from the start, which is actually better than I was expecting.  The other data sets show similar results.
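
The arithmetic behind those figures can be checked with a few lines of Python.  This is only a sketch:  the counts are copied from the RockYou output above, but the total is the rounded "10.4 million" figure (the exact number of passwords analyzed isn't given), so the percentages are approximate.  The variable and function names are made up for illustration.

```python
# Counts of passwords at or above each entropy level, from the RockYou output.
PASSING = {18: 35_126, 25: 796, 40: 85}

# Rounded total from the post; the exact count is not given.
TOTAL_PASSWORDS = 10_400_000


def percent_of_total(count: int, total: int = TOTAL_PASSWORDS) -> float:
    """Return count as a percentage of all passwords analyzed."""
    return 100.0 * count / total


# Share of passwords that would have passed at the 18-bit level.
passed_at_18 = percent_of_total(PASSING[18])

# The 85 high-entropy entries are treated as one password, so 84 are dropped.
outlier_duplicates = PASSING[40] - 1
adjusted_at_25 = PASSING[25] - outlier_duplicates  # 796 - 84 = 712

# Share of passwords rejected at the 25-bit level after the adjustment.
rejected_at_25 = 100.0 - percent_of_total(adjusted_at_25)

if __name__ == "__main__":
    print(f"passed at 18 bits: {passed_at_18:.2f}%")
    print(f"good enough at 25 bits: {adjusted_at_25}")
    print(f"rejected at 25 bits: {rejected_at_25:.3f}%")
```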

At this point, I expect some developers are chomping at the bit to see this algorithm.  The hard numbers above prove I've got something legit but you are still going to have to wait just a little longer...

What is going to happen?  Will developers everywhere finally gain access to a seemingly miraculous set of functions?  Will we be teased forever?  Join us next time when the adventures of Cubic continue!

"Take that, Statistics Boy!"
"Oh noes!  I've been defeated!"

(The boring conclusion to this series can be found in Part III.)

The Ultimate Chair (Partial Resolution)

A while back, I wrote a series of posts about creating a chair that would allow me to sit outside and soak up some rays.  Programmers are white and nerdy because they sit inside.  There are several benefits you can get from going outside:

  • Fresh air
  • The sounds of nature
  • Sunlight
  • Knowing what time of day it is
  • Not looking like a pasty-white programmer/office worker

Anyway, this post was an itch I've been meaning to scratch for a while to provide some closure and someone finally commented on it, so here goes...

I used the chair for a few months.  (Feel free to take that sentence out of context.)  And it sort of worked.  It got me outside but I had several problems that I could never resolve to my satisfaction:

1)  The chair itself wasn't comfortable to sit in for an extended period of time.  I hate anything that provides so-called "back support".  To me, as an intelligent alien lifeform on Earth (that's a joke, BTW), back "support" is more like back "torture".

2)  Glare from the sun made it impossible to sit outside in direct sunlight.  I know you aren't supposed to sit in the sun for long periods of time, but 15 to 30 minutes gives you the Vitamin D you need to stay healthy.  But I wasn't able to read the laptop screen due to glare.  I researched the issue back then and came up empty-handed.  I walked away from the experiment around the time e-ink hit the market and decided to watch from the sidelines, since e-ink displays might supposedly breathe new life into this project.  So far, they haven't.

3)  It took a while to set up and tear down, whereas my computer inside was all ready to go.  I couldn't leave it outside because of possible weather issues.  Nobody likes waking up to soggy chairs and wet digital equipment.  That, and electronics have this habit of not working when wet.  Go figure.

The digital egg timer eventually died and some of the custom parts of the setup actually melted one summer.  That latter aspect was an interesting mess to clean up:  Goo everywhere.

Overall, it was worth the effort for a DIY project.  The desire to get outside to write software is still in me.  I may try again if e-ink ever gets good enough that laptop/tablet manufacturers start using it.  I'll probably just use a regular outdoor table and chair though.  Comfortable chairs, outdoor or otherwise, are hard to come by.  I like chairs that I can lean back into and rest my head on when I just want to relax.

However, I still use one component extensively:  The ultra-sturdy mousepad.  That thing is built like a rock, works on any surface, and doesn't go anywhere.  My only complaint is its thickness.  1/8" Plexiglass would probably be plenty and, if anyone wanted to be really cheap, a slice of cardboard or thin wood might also work.  The only issue with alternate materials is getting the contact cement to grip the surfaces well enough that they stay put, or else finding a different glue to use.  There is room for improvement here and I'd like to think that there is even a market for it.

Ran a few Google searches on e-ink again.  Qualcomm's got something called 'Mirasol' that is prototype-ish looking and claims to solve the problems of e-ink - namely color and performance.  It is nearly impossible to find videos of it in action though, and it only just debuted at CES in product form (Kyobo reader), so it'll be another year before it gets to be solid enough tech to be worth trying again.

So, as it stands, I'm still kind of waiting for the technology to improve a bit before trying again.