
Adventuring deeply into software serial numbers

Be sure to check out the high-performance open source CubicleSoft License Server on GitHub, which generates serial numbers as described in this post. It's a complete, turnkey solution for creating and managing sellable software licenses.

Let's say you just finished making a really great piece of software and are ready to sell it to other people. But first, you "need" to "protect" your hard-earned investment. I put those two words in quotes because we should first identify our audience: Does your audience consist of people who are honest and upright and ethical? If they aren't those things, then no amount of effort you put into the software will make them that way.

The anti-piracy techniques in use today only work effectively against casual piracy. That is, they work against the average person who might be mildly predisposed to steal your work and pass it around the office or share it with friends, either not realizing they are stealing or not necessarily caring until they hit a wall of sorts. Then they'll say, "Fine, here's some cash to make this problem go away and make the software work for me/us."

The sort of person who will never pay for your software, no matter how onerous the anti-piracy mechanism might be, is the person who knows how to search Google for "cracks" and "keygens" for software and use a BitTorrent client. They have zero intention of paying for the software - whatever their rationale. The only solution for this sort of person is to involve the legal system but that gets expensive fast (lawyer's fees, court fees, time wasted, etc). These people also run the very real risk of running a crack or keygen executable that contains malware, which turns their once pristine computing device into a bot on a botnet and infects their entire local network (or worse). The folks who make cracks and keygens aren't exactly what you might call honorable people. I'm reminded of the old adage, "There's no honor among thieves."

In my opinion, how much effort you sink into a protection scheme should be primarily focused on preventing personal and casual office piracy. For that, an unguessable letter and number sequence is both a good enough and a fairly effective strategy. This is known in the software development industry as a serial number and is an alphanumeric string that usually looks something like:

394d-4gzd-9xrb-5tha

There are two primary ways to go about creating a serial number: Online and offline. For an online serial, it is a simple matter of generating a random sequence of characters, storing that in a database, and then using what is known as a license server to validate the serial number by communicating over the Internet every time the software starts up/does something significant. Online serials eliminate the "keygen" aspect of piracy since you directly control the keys that are possible and how many times they are used to install the software. The one major immediate downside is that the software won't run at all without an Internet connection, which is a non-starter for some people (e.g. DoD contractors who have to work in a Classified environment). However, there are long-term downsides as well. Consider what happens when you remove support for an older version of the product OR too many people hit the license server at once OR you stop running your business OR you forget to pay a bill and the domain your software talks to goes away OR...well, the list is endless...the result is that the license server that validates the online serial numbers goes offline or doesn't function properly. Those things actually happen with unfortunate regularity. Every installation of your software will eventually fail when using online-only serials, and the impact can be significant, including upset customers who demand their money back. Towards the end of this post, I have a few additional thoughts on how to handle online serial numbers the right way, so that customers who paid for your software and are using it as intended can continue to use it past End of Life (EOL).
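
The online approach (random characters stored in a database) can be sketched roughly as follows. This is only an illustration: the 32-character alphabet, the `randomSerial` and `issueSerial` names, and the in-memory `Map` standing in for a database table are all assumptions, not part of any particular license server.

```javascript
const crypto = require("crypto");

// Assumed alphabet: 32 characters with easily confused ones left out.
const ALPHABET = "abcdefghijkmnpqrstuvwxyz23456789";

// Build a random serial such as "394d-4gzd-9xrb-5tha".
function randomSerial(groups = 4, groupSize = 4) {
  const parts = [];
  for (let g = 0; g < groups; g++) {
    let part = "";
    for (let i = 0; i < groupSize; i++) {
      // crypto.randomInt draws from a CSPRNG without modulo bias.
      part += ALPHABET[crypto.randomInt(ALPHABET.length)];
    }
    parts.push(part);
  }
  return parts.join("-");
}

// Hypothetical stand-in for the database table of issued serials.
const issued = new Map();

// Issue a fresh serial and remember who bought it.
function issueSerial(purchaserId) {
  let serial;
  do {
    serial = randomSerial();
  } while (issued.has(serial));
  issued.set(serial, { purchaserId, activations: 0 });
  return serial;
}
```

The license server would then look up an incoming serial in that table and decide, based on the stored record, whether to allow the activation.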

Now that online serial numbers have been mostly addressed, let's move on to offline serial numbers. Obviously, if anyone can enter any letters and numbers into the serial number field of your software then people will quickly realize they can enter 1234-1234-1234-1234 and move on, as well as share that knowledge with their friends because who wants to see people wasting their time? So we need to limit the number of correct serial numbers and have two functions written in software: One function that generates valid serial numbers, used by the purchasing system, and another function that validates serial numbers, which goes into the software itself. This is around the point in time where people ask questions such as, "How to generate a software license key?" or "How can I make a serial number for my software?" or similar in online forums. They realize the need to limit the space of valid offline serial numbers but don't know how to do that (yet).

The first thing to understand is that the serial number that is entered is just a series of bytes and, on most systems these days, bytes are made up of 8 bits each. Now, let's look at the available alphanumeric characters that might make up the typical serial number:

ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

That is 26 letters and 10 numbers for a total of 36 possibilities per character (where 'A' = 0, 'B' = 1, etc). Unfortunately, that only gets us slightly more than 5 bits (2^5 = 32). You also don't want to require the user to enter a mix of upper and lowercase letters but rather allow them to enter either, so we start by limiting to just those 36 characters. However, if you've entered serial numbers before, you know it can be difficult to differentiate certain characters. For example, 'I', 'l', and '1' can be hard to differentiate depending on the font, as can 'O' vs '0' and 'Z' vs '2'. Quite a few people have difficulty reading due to a variety of physical conditions, so any help here is welcome. Since we have four more characters than we actually need to fill 5 bits, let's cull the possibilities to precisely 32 characters:

abcdefghijkmnpqrstuvwxyz23456789
ABCDEFGHJKLMNPQRSTUVWXYZ23456789

While most serials are uppercase, I actually prefer lowercase because it eliminates the difficulty that some people have of determining if something is a letter (e.g. z) versus a number (e.g. 2). The major difference above is 'i' vs 'L'.
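
As a rough sketch of how the software might accept either case and reject characters outside the set, consider the following. The function name `parseSerial`, the choice to ignore dashes and whitespace, and the fixed length of 16 are assumptions made for illustration.

```javascript
// The lowercase 32-character set from above; a character's index is its 5-bit value.
const ALPHABET = "abcdefghijkmnpqrstuvwxyz23456789";

// Lookup table: character -> value 0..31.
const VALUE_OF = new Map([...ALPHABET].map((ch, i) => [ch, i]));

// Accept either case, ignore dashes and whitespace, and return the list of
// 5-bit values, or null when the input cannot be a serial number.
function parseSerial(input, expectedLength = 16) {
  const cleaned = input.toLowerCase().replace(/[\s-]/g, "");
  if (cleaned.length !== expectedLength) return null;
  const values = [];
  for (const ch of cleaned) {
    if (!VALUE_OF.has(ch)) return null; // 'l', 'o', '0' and '1' are not in the set
    values.push(VALUE_OF.get(ch));
  }
  return values;
}
```

With this set, 1234-1234-1234-1234 is rejected outright because '1' is not a valid character.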

Now that we've chosen the character set of the final string and decided that each character in the serial number represents 5 bits of data, a series of difficult questions follows:
  • How long, in characters, should the serial number be?
  • How to validate the serial number?
  • What data should/can the serial number contain?
  • How can the serial number be protected?
Obviously, the longer the serial number, the more likely it is that someone will give up on entering it and go find another solution or complain loudly about how long and difficult the serial numbers are to type in. Also, the longer a serial number is, the more suspicious it becomes that it contains interesting data. Most serial numbers are somewhere in the range of 16 to 25 characters long. 25 characters (5 groups of 5 characters) seems to be the limit that users are willing to tolerate typing in. Let's look at the breakdown of a couple of options:
12345 12345 12345 12345 12345 <-- 5 columns of bits (0-31)

16 characters (4 groups of 4 characters):
12345 67812 34567 81234
56781 23456 78123 45678
12345 67812 34567 81234
56781 23456 78123 45678
(Total storage:  80 bits, 10 bytes)

25 characters (5 groups of 5 characters):
12345 67812 34567 81234 56781
23456 78123 45678 12345 67812
34567 81234 56781 23456 78123
45678 12345 67812 34567 81234
56781 23456 78123 45678 12345
(Total storage:  125 bits, 15 bytes + 5 bits)
Before making a choice, let's answer the other questions. Validation of the serial number can be done a number of ways. The simplest approach is to use a checksum. A CRC-32 of the data, for example, is 4 bytes long. Of course, that uses up almost half of the available space in an 80-bit serial just for validation. Also, CRC-32 is neither keyed nor cryptographic: it reliably catches typos, but anyone who knows the algorithm can compute a matching checksum for whatever data they like, and different inputs that share the same checksum ("data collisions") are easy to construct. Before making a decision on what to do here, let's move on to the next question.

80 to 125 bits of storage (i.e. 10-15 bytes) doesn't sound like a lot, but let's think about some of the things we might want to store inside a serial number (if possible):
  • Product identifier.
  • Product classification (e.g. Standard vs. Pro editions).
  • Version number.
  • Timestamp of when the serial number was created.
  • Timestamp of when the serial number expires (e.g. temporary licenses).
  • Purchaser ID (e.g. from your 'users' database table).
  • Flags or custom data for various per-product reasons.
Want to know how much of that you can cram into 80 bits and still have enough room for validation? Almost all of it. Behold:
 1 bit :  Expires - affects Date field (0 = No, 1 = Yes)
20 bits:  Date issued or expires ((int)(UNIX timestamp / 86400), ~2873 years)
10 bits:  Product ID (0-1023)
 4 bits:  Product classification (e.g. 0 = Standard, 1 = Pro, 2 = Enterprise)
 8 bits:  Major version (0-255)
 8 bits:  Minor version (0-255)
 5 bits:  Custom bits per app (e.g. flags)
24 bits:  Start of HMAC SHA-1 of the first 56 bits (7 bytes) of this serial plus user-specific info (e.g. ID, email).
-------
80 bits
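
To make the layout above concrete, here is a minimal sketch that packs the first seven fields (56 bits) into 7 bytes and unpacks them again. The field names, the most-significant-bit-first ordering, and the helper names `packFields`, `unpackFields`, and `daysSinceEpoch` are assumptions; the layout only fixes the widths.

```javascript
// Field widths in the order listed above (56 bits before the 24 HMAC bits).
const FIELDS = [
  ["expires", 1],
  ["date", 20],
  ["productId", 10],
  ["classification", 4],
  ["majorVersion", 8],
  ["minorVersion", 8],
  ["customBits", 5],
];

// Pack the fields, most significant first, into 7 bytes.
function packFields(info) {
  let bits = 0n;
  for (const [name, width] of FIELDS) {
    const value = BigInt(info[name]);
    if (value < 0n || value >= (1n << BigInt(width))) {
      throw new RangeError(`${name} does not fit in ${width} bits`);
    }
    bits = (bits << BigInt(width)) | value;
  }
  const out = new Uint8Array(7);
  for (let i = 6; i >= 0; i--) {
    out[i] = Number(bits & 0xffn);
    bits >>= 8n;
  }
  return out;
}

// Inverse of packFields: read the fields back, least significant first.
function unpackFields(bytes) {
  let bits = 0n;
  for (const b of bytes) bits = (bits << 8n) | BigInt(b);
  const info = {};
  for (const [name, width] of [...FIELDS].reverse()) {
    info[name] = Number(bits & ((1n << BigInt(width)) - 1n));
    bits >>= BigInt(width);
  }
  return info;
}

// Days since the UNIX epoch, as used by the Date field.
const daysSinceEpoch = (date) => Math.floor(date.getTime() / 86400000);
```

`BigInt` is used here only because 56 bits exceed what JavaScript's 32-bit bitwise operators can handle in one value.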
Three whole bytes are dedicated to validation. I recommend HMAC SHA-1 for several reasons that will become apparent in a moment. You may be aware that SHA-1 is deemed "broken" and therefore "should not be used" and there might be concern about data collisions. However, HMAC SHA-1 is still perfectly valid to use and collisions don't actually matter here anyway. The validation is just one tiny preventative measure.
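
The validation bytes could be computed along these lines using Node's built-in `crypto` module. How the user-specific info is joined to the 7 payload bytes is not specified above, so the '|' separator, the UTF-8 encoding, and the function names `addValidation` and `isValid` are assumptions of this sketch.

```javascript
const crypto = require("crypto");

// Append a 24-bit tag to the 7-byte payload, giving the 10-byte (80-bit) serial.
function addValidation(payload, userInfo, secret) {
  if (payload.length !== 7) throw new RangeError("payload must be 7 bytes");
  const mac = crypto
    .createHmac("sha1", secret)
    .update(payload)
    .update("|")
    .update(userInfo, "utf8")
    .digest(); // 20 bytes
  const serial = Buffer.alloc(10);
  Buffer.from(payload).copy(serial, 0);
  mac.copy(serial, 7, 0, 3); // keep only the first 3 bytes (24 bits)
  return serial;
}

// Recompute the tag from the first 7 bytes and compare it with the stored one.
function isValid(serial, userInfo, secret) {
  if (serial.length !== 10) return false;
  const expected = addValidation(serial.subarray(0, 7), userInfo, secret);
  return crypto.timingSafeEqual(Buffer.from(serial), expected);
}
```

Because only 24 bits of the HMAC are kept, roughly one in 16.7 million random guesses will pass this check, which is why it is only one preventative measure among several.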

Of course, once someone knows the scheme used to derive a serial number, they can simply make their own valid serial numbers en masse using a tool for that purpose. This is what is known as a key generator, or "keygen," for short. The last question from earlier asks how a serial number like the above could be protected from being keygenned. In short, it both can and can't be protected. The answer is: Encryption. If you just lay out the bits of the key as above and do nothing else, the software will be keygenned in under a day. Let's do something about that and encrypt the serial number. But which algorithm should be used? The first thought is AES, which is a very popular symmetric encryption cipher and quite secure. Unfortunately, AES requires 128 bits of storage (i.e. the block size), which is significantly larger than our serial number. An option might be to use something with a smaller block size like Blowfish, which only requires 64 bits so there could be two encryption cycles with double-encrypted data in the middle. However, Blowfish has an expensive key schedule (CPU-wise), which might not necessarily be ideal and I'll explain why in a bit. Plus we don't need anything super serious here as it's not like we are storing bank details or medical records. We just need something that makes it impossible for the average keygen developer to be able to create something to deploy to average people. I know what to do! Let's make our own encryption algorithm that results in an exact 80 bit block size!

The first step to encrypting something is choosing a "good" encryption key. That is a lot harder than it sounds. In this case, we want the secret key to be associated with each major version of the product itself. However, the encryption should also be unique per serial number. An HMAC takes in a secret key plus some non-secret data (e.g. the product ID and an email address) and produces a signature using a cryptographic hashing algorithm. HMAC SHA-1 produces 20 bytes of output. 20 bytes (160 bits) is exactly twice the size of our 10 byte (80 bit) serial number, which will be quite useful in a moment. Here is the algorithm:

// Calculate the HMAC SHA-1 key.
var key = hmac_sha1(product_id + '|' + major_version + '|' + user_info, per_product_major_version_secret);

// Encrypt step.  16 rounds.
// The 'serial' array in this example is 10 bytes (it's easier to work with that way).
for (var x = 0; x < 16; x++)
{
  // Rotate the bits across the bytes to the right by five.
  // The last 5-bit chunk becomes the first 5-bit chunk.
  var lastchunk = serial[9] & 0x1F;
  for (var x2 = 9; x2; x2--)
  {
    serial[x2] = (((serial[x2 - 1] & 0x1F) << 3) | ((serial[x2] >> 5) & 0x07));
  }
  serial[0] = ((lastchunk << 3) | ((serial[0] >> 5) & 0x07));

  // Apply XOR of the first 10 bytes of the encryption key and add the other 10 bytes to the serial (10 bytes).
  for (var x2 = 0; x2 < 10; x2++)  serial[x2] = (((serial[x2] ^ key[x2]) + key[x2 + 10]) & 0xFF);
}
Disclaimer: There is no guarantee that this algorithm is actually secure. Given enough valid, generated serial numbers, it may be possible to differentiate details about the encrypted data without the key, which may or may not in turn be sufficient to calculate the original HMAC secret based on statistical probabilities and differential analysis. In other words, this encryption algorithm shouldn't be used for anything other than generating serial numbers.

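The algorithm is split into three parts: the key generation phase (the first line), then, inside a loop of 16 rounds, a step that rotates the serial by one 5-bit chunk (16 * 5 = 80, one full trip around the serial) and, after each rotation, a step that performs simple XOR and ADD operations based on the key. (For those familiar with cryptography: this is a simple keyed rotate/XOR/ADD construction rather than a true Feistel network, since the block is never split into halves that are mixed into each other.)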
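
The key generation line uses a placeholder `hmac_sha1()` function. In Node.js, a rough equivalent could look like the sketch below; the function name `deriveKey`, its parameter names, and the UTF-8 encoding are assumptions.

```javascript
const crypto = require("crypto");

// Derive the 20-byte per-serial key with HMAC SHA-1.
// The '|' separator follows the key generation line in the listing.
function deriveKey(productId, majorVersion, userInfo, secret) {
  return crypto
    .createHmac("sha1", secret)
    .update(`${productId}|${majorVersion}|${userInfo}`, "utf8")
    .digest(); // bytes 0-9 feed the XOR step, bytes 10-19 feed the ADD step
}
```

The same inputs always produce the same key, so the validating side can rebuild it from the product ID, major version, and user info as long as it has the secret.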

Once the serial number has been encrypted, the final step is to map the data to the final characters and return the generated serial number.
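
One way to do that mapping is sketched below: treat the 10 bytes as a single 80-bit number and emit 16 characters, 5 bits at a time. The function names, the most-significant-first ordering, and the dash grouping are assumptions.

```javascript
// The lowercase 32-character set chosen earlier.
const ALPHABET = "abcdefghijkmnpqrstuvwxyz23456789";

// Convert 10 bytes (80 bits) into 16 characters grouped as xxxx-xxxx-xxxx-xxxx.
function bytesToSerial(bytes) {
  if (bytes.length !== 10) throw new RangeError("expected 10 bytes");
  let bits = 0n;
  for (const b of bytes) bits = (bits << 8n) | BigInt(b);
  let chars = "";
  for (let i = 15; i >= 0; i--) {
    chars += ALPHABET[Number((bits >> BigInt(i * 5)) & 31n)];
  }
  return chars.match(/.{4}/g).join("-");
}

// Convert the typed text back into 10 bytes, or null if it is malformed.
function serialToBytes(text) {
  const cleaned = text.toLowerCase().replace(/[\s-]/g, "");
  if (cleaned.length !== 16) return null;
  let bits = 0n;
  for (const ch of cleaned) {
    const value = ALPHABET.indexOf(ch);
    if (value < 0) return null;
    bits = (bits << 5n) | BigInt(value);
  }
  const out = new Uint8Array(10);
  for (let i = 9; i >= 0; i--) {
    out[i] = Number(bits & 0xffn);
    bits >>= 8n;
  }
  return out;
}
```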

Validation of a serial number is the reverse of this entire process.
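
For the encryption step specifically, "reverse" means undoing each round back to front: subtract instead of add, XOR again, then rotate left instead of right. The sketch below restates the encrypt loop so the round trip can be checked; the function names are assumptions, and `key` is any 20-byte array such as the HMAC SHA-1 output.

```javascript
const ROUNDS = 16;

// Forward direction, restated from the listing above.
function encryptSerial(serial, key) {
  for (let round = 0; round < ROUNDS; round++) {
    // Rotate right by five bits across all 10 bytes.
    const lastChunk = serial[9] & 0x1f;
    for (let i = 9; i > 0; i--) {
      serial[i] = ((serial[i - 1] & 0x1f) << 3) | ((serial[i] >> 5) & 0x07);
    }
    serial[0] = (lastChunk << 3) | ((serial[0] >> 5) & 0x07);
    // XOR with key bytes 0-9, then ADD key bytes 10-19.
    for (let i = 0; i < 10; i++) {
      serial[i] = ((serial[i] ^ key[i]) + key[i + 10]) & 0xff;
    }
  }
  return serial;
}

// Reverse direction: undo ADD, undo XOR, then rotate left by five bits.
function decryptSerial(serial, key) {
  for (let round = 0; round < ROUNDS; round++) {
    for (let i = 0; i < 10; i++) {
      serial[i] = ((serial[i] - key[i + 10]) & 0xff) ^ key[i];
    }
    // The first 5-bit chunk becomes the last 5-bit chunk.
    const firstChunk = (serial[0] >> 3) & 0x1f;
    for (let i = 0; i < 9; i++) {
      serial[i] = ((serial[i] & 0x07) << 5) | ((serial[i + 1] >> 3) & 0x1f);
    }
    serial[9] = ((serial[9] & 0x07) << 5) | firstChunk;
  }
  return serial;
}
```

Both functions modify the array in place, so pass a copy if the original bytes are still needed. After decryption, the 24 validation bits are recomputed and compared before any of the fields are trusted.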

Of course, this brings us to an interesting point: If the two HMAC secrets (one for the validation bits and one for the encryption key) are bundled with the software to decrypt and validate the serial number, what stops someone from simply writing a keygen for this? Nothing stops them from doing that. Unless, of course, the HMAC secrets are simply not included with the software and an online service (aka license server) is required to decrypt and validate and return the decrypted contents of the serial number.

Remember how I said earlier that online serial numbers cause problems including upset customers when (not if!) the license server stops functioning and that I'd have a solution for that? Well the solution is pretty simple: Instead of randomly generating character strings for serial numbers and relying entirely on an Internet-based service to run the software, the license server generates serial numbers using the above algorithm. However, the license server holds onto the HMAC secrets and each major version of the software gets new HMAC secrets. When a major version or the entire product is being retired (e.g. not enough sales), a special out-of-band release of the software can be made that includes the HMAC secrets. Existing customers download and use the now-offline version with their existing serial numbers and they no longer require an Internet-based license server to continue to run the software. They not only remain happy customers, there's also less incentive to keygen software that is several versions behind the latest release and a keygen for previous versions still won't work for the latest releases.

I also said earlier that Blowfish might not be ideal. When using an online license server solution, the server has to decrypt and validate keys as fast as possible. So expensive algorithms like Blowfish are not desirable for that purpose. I wrote a fairly basic implementation of the above algorithm in PHP and benchmarked it at 27,000 keys generated per second on a single thread on a 6th Gen Intel Core i7. It would be even faster in pure C/C++/assembler. Don't forget to check out the high-performance open source CubicleSoft License Server on GitHub, which generates serial numbers as described in this post and much more!

While not directly related, "cracks" are effectively modifications to executable code that completely bypass serial number checking logic. A crack will modify the executable code in a few select locations such that the serial number logic always returns successfully, which makes the rest of the software think it is running in a fully licensed fashion. Like keygens, cracks and cracked software also have a significant chance of containing malware. There is no defense against cracks so the solution to this problem is to provide non-software things that require a valid serial number to access such as: Technical support forums, a ticketing system, regular software updates, access to a plugin database for the software, etc. People who have a valid license (i.e. paid you money) get simple, easy access to those things while the people who don't have a valid license don't get access.

An alternative to using a serial number is to use public key cryptography. You digitally sign every single license with your private key and then the public key in the software validates the signed data. Instead of a short string, the user has to copy and paste a large body of text from somewhere and also keep a copy stored on their devices somewhere in case they need to reinstall the product. The software has to also include a working PKI library because implementing those things is a bit painful without a hefty understanding of asymmetric cryptography. I've used software that does public key crypto and it's an "okay" solution but I can see how some users might find it awkward because not everyone knows how to "copy and paste" (mostly older folks but I've seen younger people be amazed too when they learn how to copy and paste content for the first time). There are also folks out there who will take a photo of the text on the screen with their phone, print out the photo onto a piece of paper, and then type it in from the now blurry, printed image...one...character...at...a...time. In my opinion, PKI for software licensing is overkill and doesn't stop people from using cracks to just bypass the validation code. In fact, more people will run a crack because finding and using their personal giant blob of text to re-validate the product whenever the OS is reinstalled and/or upgraded is too onerous to deal with.
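
For comparison, here is a minimal sketch of the signed-license approach using Node's built-in `crypto` module with Ed25519 keys. The license fields, the `body.signature` text format, and the function names are all assumptions chosen for illustration.

```javascript
const crypto = require("crypto");

// Vendor side: a key pair generated once; only the public key ships with the software.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

// Sign the license fields and return a text blob the user can paste in.
function signLicense(fields, key) {
  const body = Buffer.from(JSON.stringify(fields), "utf8");
  const signature = crypto.sign(null, body, key); // Ed25519 takes no digest name
  return body.toString("base64") + "." + signature.toString("base64");
}

// Software side: verify the blob with the bundled public key.
// Returns the license fields, or null when the signature does not match.
function verifyLicense(blob, key) {
  const [bodyB64, sigB64] = blob.split(".");
  if (!bodyB64 || !sigB64) return null;
  const body = Buffer.from(bodyB64, "base64");
  const signature = Buffer.from(sigB64, "base64");
  if (!crypto.verify(null, body, key, signature)) return null;
  return JSON.parse(body.toString("utf8"));
}

const license = signLicense(
  { product: 7, edition: "pro", email: "user@example.com" },
  privateKey
);
console.log(license.length); // far longer than a 16-character serial
```

An Ed25519 signature alone is 64 bytes, so even this compact format is several times longer than a 16-character serial number, which is the usability cost described above.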
