108: Data Types: Ints Part 1.

You’ll probably have one or more ints in almost every method and class you write. They’re everywhere so you really should know how to use them.

This episode explains the data lengths of various ints and pointers along with words. Do you know what a word is? How about a double word or a DWORD? A quad word or a QWORD? A double quad word or a DQWORD? Do you understand how these relate to processors and operating systems?

And how about the short, int, long, and long long types? Do you understand some of the problems that can be caused by pointers switching from 32 bits to 64 bits?

This episode describes all this plus five systems for describing the lengths of all these types:

LP32 – The long and pointer types are 32 bit. This means that shorts and ints are 16 bit.
ILP32 – The int, long, and pointer types are 32 bit. This means that shorts are 16 bit.
LLP64 – The long long and pointer types are 64 bit. This means that ints and longs are 32 bit. And shorts are still 16 bit.
LP64 – The long and pointer types are 64 bit. This means that long longs are also 64 bit. Ints are 32 bit. And shorts are still 16 bit.
ILP64 – This is not a very common system. The int, long, and pointer types are 64 bit. This means that long longs are also 64 bit. And I’m not sure what length shorts are.

Listen to the full episode or you can also read the full transcript below.

Transcript

Int is short for integer and that means they can hold whole numbers including zero. And just like the chars and bytes from the previous episode, an int can hold negative numbers if it’s signed.

There’s a lot of history around ints and knowing this can help you better understand all the differences surrounding ints. There are a lot more variations of ints than there are of chars and bytes.

Let’s start out then with the word. The way I always understood word was that it matched the basic width of the processor and while this definition might still be valid for the processors themselves, a lot of prior software assumes that a word is 16 bits. You can find 32 bit systems today that define a word to be 32 bits and you can also find 64 bit systems that define a word to be 64 bits. While I like this system because it gives a nice name to the width of data that the processor can handle natively, I have to say that I’ve personally had a lot more experience on systems where the word got stuck at 16 bits.

So if a word is stuck at 16 bits, then what do we call a 32 bit value on a 32 bit system? It’s called a double word or a dword for short. You probably want to avoid calling it a double word because a double is another data type that I’ll explain in an upcoming episode. Calling it a dword avoids this confusion.

And what is a 64 bit value called on a 64 bit system when the word is still defined to be 16 bits? That’s a quad word. Or a qword for short.

And what will we call a 128 bit value when 128 bit processors become more common? If we follow the same pattern, then it’ll probably be a double quad word or a dqword for short.

It would have probably been much better if the simple term word had advanced in software like it did on the hardware side in modern computers. Because of this, even the term word must be questioned and defined to make sure everybody agrees what they’re talking about.

You normally don’t have a built-in data type called a word in languages though. I explained it because it is used in programming as a definition for some other type. You may actually see a DWORD, usually spelled in all uppercase letters, when programming and now you’ll know some history behind it and can know to check exactly what type it really means. Usually, it’ll be 32 bits.

What most languages do support is the int. But we can’t just stop with a simple int. Not when we have all this confusion around words. So to even things out a bit, there’s short ints sometimes just called a short, there’s the int itself, there’s a long int type sometimes just called a long, and then there’s a fairly new type called a long long. I don’t know what to make of that name. The language designers must have wanted to go home quickly when it came time to come up with a new name.

What all this means is really just extra confusion. About the only thing you can say for sure is that a short will be smaller or maybe the same size as an int. And that an int will be smaller or maybe the same size as a long. And that a long will be smaller or maybe the same size as a long long. These are all relative sizes and don’t really say much. You can also safely assume that both a short and an int will be at least 16 bits, that a long will be at least 32 bits, and that a long long will be at least 64 bits.

Just because an int must be at least 16 bits though doesn’t mean that’s the common length. I’ll explain more right after this message from our sponsor.

( Message from Sponsor )

Processors started out at 8 bits or even smaller. If I remember right, some of the very early satellites even used 4 bit processors. Wow! I have a hard time imagining what the code looked like. Anyway, as processors have advanced from 8 bits to 16 bits to 32 bits and now mostly 64 bits, one thing actually did advance right along with the bitness and that’s been the size of pointers.

You can safely assume that a pointer will match the capability of your operating system starting with 32 bits anyway. This means that even if you have a 64 bit processor but are running an operating system that only knows about 32 bits, then your pointers will still be 32 bits. You can’t run a 64 bit capable operating system on a machine that only has 32 bits but you can run a 32 bit operating system on a machine that has 64 bits. And if you ever find a copy of Windows 3.1 running in a museum, then it’ll be a 16 bit operating system but with 32 bit pointers. I would assume that 128 bit computers will support pointers of 128 bits also. I don’t have access to a 128 bit computer to verify this and certainly don’t have access to a 128 bit operating system.

So what does all this about pointers have to do with ints? Well, this is actually a common problem with porting 32 bit code over to 64 bits. Back when 32 bit computers were still new, it was common for ints to also be 32 bits, and many developers made the mistake of writing code that assumed a pointer value could be stored safely in an int and then converted back to a pointer when needed. This worked fine when both ints and pointers were 32 bits. But as soon as 64 bit operating systems became available and defined pointers to be 64 bits, this assumption no longer worked out so well. Trying to squeeze 64 bits into 32 means that some of the bits are going to get cut off.

Beyond just pointers and ints though, mistakes can be made with longs and ints and sometimes even with shorts and ints. In order to help keep things straight, four common names appeared to describe the lengths of all these types. The names at first might seem like just a bunch of letters and numbers but there is a system. And this is one of the reasons why I started out describing pointer lengths.

The first system is called LP32 and this means that longs and pointers are 32 bits. In this system since ints and shorts are left out from the name, that means they’re both 16 bits. You have to go back to Windows 3.1 to find this system. Windows 3.1 is a 16 bit operating system that runs on a 32 bit computer.

The next system is called ILP32 and this means that ints, longs, and pointers are all 32 bits. Since short is left out, then a short is 16 bits. This is the system used by 32 bit versions of Windows, Linux, and Mac computers.

Beginning with 64 bit operating systems, we have the next system called LLP64 and this means that long longs and pointers are 64 bits. Since ints and longs are left out, they’re both 32 bits. Shorts are also left out of the name but they’re normally stuck at 16 bits by now. This is the system used by 64 bit versions of Windows probably because it was already hard enough to fix all the problems between 32 bit ints and 64 bit pointers that the designers didn’t also want to introduce problems between ints and longs too. So in this system, both ints and longs stayed at 32 bits.

The last system is found in modern Linux and Mac computers and is called LP64 which means longs and pointers are 64 bits. In this system, shorts are still stuck at 16 bits, ints remain at 32 bits, and longs, long longs, and pointers are all 64 bits.

And very rarely you might actually come across a fifth system called ILP64 and this means that ints, longs, long longs, and pointers are all 64 bits. I have no idea what length a short would be in this system. You might find this type of system running on a supercomputer.

If there’s one thing that stands out to me about all these systems, it has to be that last fifth system. Just think about it for a moment and realize that the computer you have, packaged into a portable case that’s 15 inches by 12 inches by maybe 1 inch thick and complete with a battery capable of lasting anywhere from 7 to 12 hours or more, is approaching the specs of older supercomputers.

Well, I had wanted to tie all this together with the topic from yesterday about chars and bytes but it’s already getting kind of long. And it’s a lot to absorb about words, shorts, ints, longs, long longs, pointers, etc. So I’ll continue tomorrow.
