Monday, March 16, 2009

unsigned long long long long long int

Congratulations! You've encountered one of the more hair-brained stupid schemes of the past century. There is no way in ANSI C/C++ to say, "I want X bits of storage for this integer."

Your initial thought might be, "Wait a minute, that can't be right...right?" Let us look at the various data types available to an ANSI C/C++ programmer:


A 'char' lets us declare a single byte. A byte is 8 bits...or is it? CHAR_BIT (in limits.h) is typically defined as 8 bits (and the Standard requires it to be defined as a _minimum_ of 8). However, it could be defined as 9 and there has been some hardware where it is defined as 32. Additionally, it is up to each compiler to decide whether just saying 'char' is signed or unsigned.

Thankfully 'float' and 'double' use the IEEE floating point standard. Edit: Due to some responses, I should clarify: Every compiler I have ever used (and I've used quite a few) relies on IEEE floating point for float and double. However, the ANSI C/C++ Standard does not require IEEE floating point to be used (it just happens to be conveniently used basically everywhere).

The struct/class/template/union are not technically 'data types' but more of a grouping mechanism to group logical chunks of data types (and data) together.

Pointers can point at any of the other data types.

Now that I've worked through all the other types, we are left with 'int'. People usually get their first taste of this problem when they discuss sizeof(int) on the various forums out there. They learn quickly about 'short', 'long', 'signed' and 'unsigned'. Below is a chart of what the ANSI C/C++ Standard says is the _minimum_ number of bits for each type of 'int':

short int - 16
signed short int - 16
unsigned short int - 16
int - 16
signed int - 16
unsigned int - 16
long int - 32
signed long int - 32
unsigned long int - 32
long long int - 64
signed long long int - 64
unsigned long long int - 64

Those are the minimum number of bits. Each compiler author chooses how many bits the various forms of 'int' actually get (sizeof reports the size in bytes, so the bit count is sizeof(int) * CHAR_BIT). So, on some compilers 'int' is 16 bits. On some it is 32. On some it is 64. And there are even a couple of compilers that define 'int' as 24 bits. It varies so widely.

The 'long long' type was added so that 64-bit hardware could be programmed for in C/C++. And, to accommodate this hacky extension of C/C++, you use the printf()-family format specifier %lld, %lli, or %llu to display the integer.

So, now here is a question to ponder: What happens when native 256-bit data types show up in hardware? Are we going to create more hacky accommodations and sit all day writing 'unsigned long long long long int/%llllu'? What happens if some popular compiler vendor decides that 'unsigned long long long long int' is 384 bits? Or are we going to start mixing 'short's and 'long's - 'unsigned long long short long int'? How much longer will this "minimum number of bits" foolishness continue? The lack of being able to declare how many bits/bytes we need for integer representation is silly and only going to get sillier as int sizes get larger.

Someone will point out bitfields. I hereby point any such someones at the numerous issues surrounding bitfields. Bitfields are typically restricted to the size of an 'unsigned int', have poorer performance (compared to doing the same things they do yourself), are only available inside 'struct's, etc. They are a modest solution, but hardly workable in the long run.

'int' (implicitly 'signed') and 'unsigned int' should be all we need and should mean, "I don't care how many bits there are for this integer and trust the compiler to generate enough bits for the target platform for most common operations." That is - 32 bits for 32-bit platforms, 64 bits for 64-bit platforms, etc. For specific declarations where we need exact bit sizes and refined control, I'd like to see something like:

int@64 x, y;

To declare a couple of 64-bit signed integers. The '@' is illegal C/C++ right now (i.e. won't compile if you try it) but would be a pretty good candidate.

For printf() and similar routines, %@64i seems like it could work well enough. The '@' symbol also avoids existing printf() "extensions" such as multilingual parameter support. This suggested approach would make things more readable and this approach also allows us to get rid of the near-worthless 'short' and 'long' keywords from the language.

I'm definitely agreeable to restrictions on such an addition to the language such as "implementations may choose to place a limit on the value after the '@'". And such limits could be "multiple of 8" or "power of 2" or "between 1 and 256 inclusive" (or a combination). For where a compiler has restrictions and does not necessarily provide native support, it should be possible to do something like this:

#if !native(int@512)
typedef __Int512Handler int@512;
#endif

int@512 x, y;

The 'native' preprocessor keyword would mean, "If the compiler supports this type natively" (the above example means, 'If the compiler does NOT support this type natively'). The above example would allow printf() to assume that the value passed in will be of the specific class if the @value falls out of range of native values. The class would provide the necessary logic to handle all normal integer operations.

Even with the above, the compiler author should still be able to put restrictions on what can be declared. Let's say the compiler author only wants to support "integers that are multiples of 8 and powers of 2 up to 512 bits" and the hardware only supports everything in that range up to 64 bits natively. Then the author only has to define 128-, 256-, and 512-bit support. As each platform comes into existence, native support can be added and, in the event older native elements disappear (e.g. 16-bit), support can be relegated to a class. Old code would simply need to be recompiled for the new hardware.

Sadly, though, the ANSI C/C++ committees aren't likely to ever see this suggestion. So, we will likely end up writing 'unsigned long long long long short long long long short short long int' in the not too distant future.

Edit: Someone pointed out stdint.h to me. I feel slightly embarrassed for not knowing about this, BUT implementations are only required to provide definitions for the types that the hardware and compiler support. My approach does something similar EXCEPT it allows us to get rid of a potential plague before it becomes quite troublesome: 'short' and 'long'. The whole point of this post is that we are using 'short' and 'long' for something they are NOT intended to be used for: defining integers of specific bit-lengths. If we keep on going like this, it will come back around to haunt us later. stdint.h is definitely a good starting point but we need to get rid of 'short' and 'long' in favor of a more generic approach.

"Hair-brained stupid schemes" is a great phrase.


  1. LP64 ... I'm reading your post because of something I regard as equally ridiculous.

    "long long int"

    I went through the wars of porting 32-bit C code to 64-bit platforms. Lots of churn in those days, but essentially DEC made it rational with a simple rule:

    32-bit: int, long and pointers are all 32-bit
    64-bit: long and pointers get promoted to 64-bit.

    I never understood why we had long and int in the first place (I guess long was the opposite of short), but the scheme works well ... with all compilers except Microsoft.

    If "long" defines 32-bit integers in 32-bit mode, and 64-bit integers in 64-bit mode; and "long long" defines 64-bit integers in 32-bit mode ... what on earth is "long long int" ... an artifact of the Dept of Redundancy Dept ?

  2. I know this post is nearly two years old now, but I had to point out: the Microchip C18 compiler DOES actually have a "short long int". It specifies a 24-bit integer.

  3. There is something like (u)int512_t.

    1. Well, it is better than stringing a bunch of 'long's together.