GitHub commits publicly reveal your private life

GitHub is a great tool. It enables software developers to work together on open source projects. That's pretty awesome. Unfortunately, it also exposes your personal life to the entire world. It is easy to look at the commit history of any given GitHub user and identify their habits to an incredibly creepy level.

Using GitHub histories, an attacker can identify when you are probably awake, asleep, at home, and at work. They can also identify habits: which days of the week you tend to commit code and which days you never do, which days and months you commit the most and the least, and how frequently you commit. All of that information can be used to derive your physical location in the world, your religion, your favorite sports team(s), your relationship status with your significant other (if you are on good terms or not, having sex or not, etc.), and possibly your hobbies and general interests.

If you think your personal data isn't already being mined for the above, you are quite mistaken. It is.

In my opinion, commit timestamps are a security vulnerability. Let's say an attacker wants to "send a message" to a software developer they don't like. They simply figure out when the person is going to be away from home, show up, do their thing (tag/graffiti, rob/steal/destroy property, drop a threatening letter, etc.), and leave. GitHub commit timestamps provide a wealth of information and, by a common rule of thumb in statistics, an attacker needs only about 35 data points before the patterns in a sample start to become meaningful. Each commit timestamp is a valuable data point. Therefore, all someone needs is about 35 commits to start building a profile. For example, the attacker may notice that the commit history is devoid of commits Monday through Friday during "normal" business hours when mapped to a specific timezone (i.e. narrowed to a specific region of the world). They can reasonably assume that the target is at some form of day job during those hours. Physical addresses are pretty easy to obtain once the real name is acquired, so the commit history just confirms what is already published information. The lack of commits is information that is just as important as the actual commits.
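To see how little effort this kind of profiling takes, here is a minimal sketch you could run against your own repository to check what your history gives away. It assumes the timestamps are ISO 8601 strings with an explicit UTC offset, such as the ones `git log --format=%aI` prints. The function names, the 9-to-17 window, and the guessed UTC offset are all my own illustrative choices, not part of any existing tool.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def weekday_hour_histogram(timestamps, utc_offset_hours=0):
    """Count commits per (weekday, hour) after shifting to a guessed UTC offset.

    timestamps: ISO 8601 strings that carry an explicit offset,
    e.g. "2024-01-09T10:00:00+00:00". Weekday 0 is Monday, 6 is Sunday.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    counts = Counter()
    for ts in timestamps:
        local = datetime.fromisoformat(ts).astimezone(tz)
        counts[(local.weekday(), local.hour)] += 1
    return counts


def business_hours_share(counts, start=9, end=17):
    """Fraction of commits made Monday-Friday with start <= hour < end."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    inside = sum(
        n for (day, hour), n in counts.items()
        if day < 5 and start <= hour < end
    )
    return inside / total
```

Trying a handful of offsets and looking for the one where the business-hours share drops toward zero is roughly the "map to a specific timezone" step described above. A history with a large, regular gap on weekdays is exactly the kind of pattern that stands out.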

More data points simply improve the accuracy of the information. Thus, the more frequently you commit, the more information about your personal life you give away! At some point, with enough commits, everything about your personal life can be determined. Oh, you don't commit code whenever a specific sports team plays a game and it airs on television? You might be a fan of that team, because humans are literally incapable of doing two things at once. Despite what some people claim to the contrary, humans are single-taskers. When you are committing code, you are in front of a computer screen and focused on that singular task. Your favorite TV shows can possibly be determined from GitHub commits too, simply because you aren't committing code during the time you watch those shows. If you use Netflix, Hulu, etc., that information is a lot harder to determine, but you generally still won't be committing code while watching any given show.

So how should this be fixed? First off, the entire world doesn't need to see commit timestamps. Timestamps should only be accessible to trusted users and services. I realize timestamps are part of the commit log, so Git itself will have to be changed to accommodate fixing this issue. Second, there should be tight controls over how much timestamp information is disseminated even to trusted users and services. And finally, there should be a timestamp privacy (mangling) option that sets commit timestamps to specific/random times according to a ruleset the committer defines (e.g. hardcode all commits to Mondays at random times of the day, even though the code was actually committed on Thursday at some specific time of the day).
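The Monday ruleset from that last example can be roughed out today, since Git reads the `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables when it creates a commit. The sketch below is only an illustration of the idea under that assumption: `mangle_to_monday` and `git_date_env` are hypothetical helper names of my own, not a Git feature, and a real privacy option would need far more thought.

```python
import random
from datetime import datetime, time, timedelta, timezone


def mangle_to_monday(real_time, rng=random):
    """Return a datetime on the Monday of real_time's week, at a random time.

    Illustrative ruleset only. Note that mangled commits can appear out of
    order, and a commit made on a Monday may be stamped later that same day.
    """
    monday = real_time.date() - timedelta(days=real_time.weekday())
    seconds = rng.randrange(24 * 60 * 60)  # random second within the day
    midnight = datetime.combine(monday, time(0, 0), tzinfo=real_time.tzinfo)
    return midnight + timedelta(seconds=seconds)


def git_date_env(fake_time):
    """Build the two environment variables Git reads for commit dates."""
    stamp = fake_time.isoformat()
    return {"GIT_AUTHOR_DATE": stamp, "GIT_COMMITTER_DATE": stamp}


if __name__ == "__main__":
    # A commit actually made on a Thursday afternoon (UTC).
    real = datetime(2024, 1, 11, 15, 42, tzinfo=timezone.utc)
    fake = mangle_to_monday(real)
    print(git_date_env(fake))
```

To use it, you would merge the returned dictionary into the environment of the `git commit` process, for instance via the `env` argument of `subprocess.run`. This only rewrites the dates stored in the commit itself; it does nothing about the first two points, which is why I still think the real fix has to happen in how timestamps are stored and shared.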