
You’ll probably have one or more ints in almost every method and class you write. They’re everywhere so you really should know how to use them.

Knowing how to use the int data type means you need to understand the limits and consequences of using a fixed number of binary digits. The previous episode explained what types of ints exist and their lengths. This episode explains how negative numbers play a huge role when working with ints and an interesting security vulnerability that can result when you don’t properly use ints.

The math that computers perform with your variables is different than in real life, because the computer can’t just add extra digits when it needs to. If I asked you to add 5 plus 5 on paper, you’d need to start using a new digit to represent 10. If you limited your view to just that single digit, then you might think that 5 plus 5 is equal to 0. When a computer reaches its maximum number of bits like this, needing that extra digit is called an overflow, and it causes the result to wrap around back to the beginning.
And computers are also different because they work with the full word size even for small values. In real life, this would be like every time you wanted to write the number 5, you instead always wrote 0,000,000,005. Adding leading zeros doesn’t change the value. At least in real life.

And here’s yet another aspect where computers are different. A computer doesn’t have a place to put a little negative sign, so instead it uses two’s complement to represent negative numbers, which means the most significant bit signals whether a number is negative or not.

This causes small negative numbers to appear just like large unsigned numbers. You have to know ahead of time how you want to interpret the bits. Listen to this episode for more or read the full transcript below.

Transcript

Knowing how to use the int data type means you need to understand the limits and consequences of using a fixed number of binary digits. The previous episode explained what types of ints exist and their lengths.

The same rules apply no matter if we’re using 4 bits or some other amount. Every bit that you have to work with effectively doubles how high you can count. With a single bit, you can count from 0 to 1. With two bits, you can count from 0 to 3. With three bits, you can count from 0 to 7. And with four bits, you can count from 0 to 15.

If we let the letter n be the number of bits, then you can count from 0 up to 2 to the power of n minus 1. For example, if n equals 4 which means we have 4 bits, then 2 to the power of 4 is the same thing as 2 times 2 times 2 times 2 which is 16. Then just subtract 1 from 16 to give the highest number you can count up to with that many bits. Four bits means we can count from 0 up to 15.
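
If you want to see this on a real machine, here’s a minimal C++ sketch. It assumes nothing beyond a standard compiler; the loop just evaluates 2 to the power of n minus 1 for a few small values of n, and the standard library reports the limits of the built-in types.

```cpp
#include <cstdint>
#include <iostream>
#include <limits>

int main() {
    // 2 to the power of n, minus 1, is the highest value n bits can hold.
    for (int n = 1; n <= 4; ++n) {
        std::uint64_t highest = (std::uint64_t{1} << n) - 1;
        std::cout << n << " bits count from 0 up to " << highest << "\n";
    }

    // The standard library reports the same kind of limit for the built-in types.
    std::cout << "unsigned int max: " << std::numeric_limits<unsigned int>::max()
              << "\n";  // typically 4,294,967,295
    std::cout << "signed int max:   " << std::numeric_limits<int>::max()
              << "\n";  // about half of that
    return 0;
}
```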

With 32 bits, a typical unsigned int can hold any value from 0 up to 4,294,967,295. And a typical signed int can hold values up to about half of that. As long as you can work with numeric values less than this, then a 32 bit int is a good choice. Even if you expect values up to just a thousand, you’ll still normally use an int. A short int would fit values up to a thousand better, but a lot of times you’ll be doing things with your values that need to interact with other variables. Unless you absolutely need to save the two bytes that using a short would save, go ahead and use an int.

Now where you can save enough to make a difference is when you need a lot of variables at the same time in memory. Maybe you have a class with some data members and you need a million of them to be loaded into a collection in memory. Knowing when you can use a short vs. an int in this case is important and could save you two million bytes of memory. Just be aware that the compiler will usually add padding to your classes so that they align better with the processor’s natural word size. If your class has only a single numeric value, then making it a short probably won’t matter much, because padding and alignment will often round the storage up toward the processor’s word size anyway.
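
Here’s a minimal C++ sketch of that kind of padding. The Record struct and the exact sizes are just illustrative and assume a typical compiler with 4 byte ints and 4 byte alignment.

```cpp
#include <cstdint>
#include <iostream>

// One record out of the million we want to keep in memory. The compiler
// typically inserts 2 bytes of padding after the short so the int that
// follows stays aligned on a 4 byte boundary.
struct Record {
    std::int16_t id;     // 2 bytes
    std::int32_t count;  // 4 bytes, usually preceded by 2 bytes of padding
};

int main() {
    std::cout << "sizeof(Record) = " << sizeof(Record) << "\n";  // commonly 8, not 6
    std::cout << "1,000,000 records need about "
              << sizeof(Record) * 1'000'000 << " bytes\n";
    return 0;
}
```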

If you’re using a language such as C#, then there’s not a lot you can do about this other than selecting the smallest sized types to begin with. One good thing is that C# is more specific about the size of shorts, ints, and longs. In C#, a short is 16 bits, an int is 32 bits, and a long is 64 bits. You might get the same thing in C++ if your system is using the LP64 model or you might get something different. Listen to the previous episode to learn more about LP64.
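
As a rough C++ counterpart to those guaranteed C# sizes, the fixed-width types in <cstdint> pin the width down no matter which data model you’re on. This little sketch is just an illustration that prints the sizes so you can compare them on your own system.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    // The plain types depend on the data model, such as LP64.
    std::cout << "short: " << sizeof(short) << " bytes, "
              << "int: " << sizeof(int) << " bytes, "
              << "long: " << sizeof(long) << " bytes\n";

    // The fixed-width types pin the sizes down, much like C#'s short, int, and long.
    std::cout << "int16_t: " << sizeof(std::int16_t) << " bytes, "
              << "int32_t: " << sizeof(std::int32_t) << " bytes, "
              << "int64_t: " << sizeof(std::int64_t) << " bytes\n";
    return 0;
}
```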

But a language like C++ gives you more control. You can select smaller integer types too just like in C# but in C++, the order that you declare your class member variables is important and can affect the padding. You’ll probably want to try different arrangements and test them to see what works best.
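
Here’s a small C++ sketch of what reordering can do. The struct names are made up, and the exact sizes assume a typical compiler that aligns 32 bit ints on 4 byte boundaries.

```cpp
#include <cstdint>
#include <iostream>

// The same three members, declared in two different orders.
struct Scattered {
    std::int8_t  a;  // 1 byte, then typically 3 bytes of padding
    std::int32_t b;  // 4 bytes
    std::int8_t  c;  // 1 byte, then typically 3 bytes of padding
};

struct Grouped {
    std::int32_t b;  // 4 bytes
    std::int8_t  a;  // 1 byte
    std::int8_t  c;  // 1 byte, then typically 2 bytes of padding
};

int main() {
    std::cout << "sizeof(Scattered) = " << sizeof(Scattered) << "\n";  // commonly 12
    std::cout << "sizeof(Grouped)   = " << sizeof(Grouped) << "\n";    // commonly 8
    return 0;
}
```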

There’s one more important topic about ints that has to do with signed vs. unsigned use as well as an interesting security vulnerability that can arise from improper use of ints that I’ll explain right after this message from our sponsor.

( Message from Sponsor )

For this discussion, instead of using 32 or even 64 bits, I’ll explain things with just 4 bits. The same concepts apply no matter how many bits you have but it’s a lot easier to explain with fewer bits.

The math that computers perform with your variables is different than in real life, because the computer can’t just add extra digits when it needs to. If I asked you to add 5 plus 5 on paper, you’d need to start using a new digit to represent 10. If you limited your view to just that single digit, then you might think that 5 plus 5 is equal to 0. When a computer reaches its maximum number of bits like this, needing that extra digit is called an overflow, and it causes the result to wrap around back to the beginning.
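
C++ doesn’t have a 4 bit type, but an 8 bit unsigned value shows the same wrap around. This is just a sketch, and note that only unsigned types are guaranteed to wrap like this.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    // An 8 bit unsigned value tops out at 255, so adding 10 to 250 wraps around.
    std::uint8_t value = 250;
    value = static_cast<std::uint8_t>(value + 10);  // 260 doesn't fit in 8 bits

    std::cout << static_cast<int>(value) << "\n";   // prints 4, not 260

    // Unsigned types are defined to wrap like this. Overflowing a signed type
    // is undefined behavior in C++, even though it usually wraps in practice.
    return 0;
}
```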

And computers are also different because they work with the full word size even for small values. In real life, this would be like every time you wanted to write the number 5, you instead always wrote 0,000,000,005. Adding leading zeros doesn’t change the value. At least in real life.

And here’s yet another aspect where computers are different. In real life, to make a number negative, we just add a little negative sign in front of the number. Or maybe you’re used to putting the number inside parentheses to show that it’s negative. Neither method works with computers, because computers normally use two’s complement to represent negative numbers. A computer doesn’t have a place to put a little negative sign, so instead it uses the most significant bit to signal whether a number is negative or not.

This causes small negative numbers to appear just like large unsigned numbers. You have to know ahead of time how you want to interpret the bits. Back to our example with 4 bits, if the bits are all ones, then this could represent the value 15. But it could also just as easily mean the value -1. Looking at the bits alone isn’t enough to tell the correct interpretation. The compiler will write code to interpret these 4 bits as either 15 or -1 depending on how you declared your variable in the first place. If you declared a signed variable, then when all four bits are 1, the compiler will take this to mean the value -1. And if you declared your variable to be unsigned, then the same bit pattern will mean 15.
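
Again using 8 bits instead of 4, here’s a minimal C++ sketch of the same bit pattern being read two different ways. The cast to the signed type is technically implementation-defined before C++20, but on a two’s complement machine you’ll see -1.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    // The same all-ones bit pattern stored in 8 bits instead of 4.
    std::uint8_t asUnsigned = 0xFF;
    std::int8_t asSigned = static_cast<std::int8_t>(0xFF);  // -1 on two's complement machines

    std::cout << "unsigned view: " << static_cast<int>(asUnsigned) << "\n";  // 255
    std::cout << "signed view:   " << static_cast<int>(asSigned) << "\n";    // -1

    // The bits never change. Only the declared type tells the compiler
    // how to interpret them.
    return 0;
}
```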

This is only half the story though. Because remember when I said that if memory usage isn’t critical, you should probably just go ahead and use an int? Well, let’s say that memory was tight and you needed the full range of a short with no need for negative numbers. So you declared an unsigned short, which allows you to store values from 0 up to 65,535. Then comes the time in your code when you need to compare this value with some other number that’s declared to be an int. And let’s say that the actual value from your unsigned short is 40,000 while the actual value from the int is 5. That 40,000 is big enough that the bit pattern would be interpreted as a negative number had it been signed.

When you compile the code that compares your unsigned short with an int to see which has the larger value, you would expect the 40,000 to be much larger. But instead, you might get a compiler warning or error that says there’s a signed/unsigned mismatch. If you think you can fix it by just casting your unsigned short to a short, then what happens is that the 40,000 value gets reinterpreted as the value -25,536. And guess which value is now larger. Yep, the value 5 is now a lot larger than what should have been 40,000.
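
Here’s a small C++ sketch of that comparison going wrong, along with the direct cast to int described next. The exact -25,536 result assumes 16 bit shorts on a two’s complement machine, and before C++20 the out-of-range conversion is technically implementation-defined.

```cpp
#include <iostream>

int main() {
    unsigned short big = 40000;  // fits comfortably in an unsigned short
    int small = 5;

    // Casting to a signed short reinterprets the bit pattern.
    short reinterpreted = static_cast<short>(big);  // typically -25,536
    std::cout << reinterpreted << "\n";
    std::cout << (reinterpreted < small ? "5 wins\n" : "40000 wins\n");  // 5 wins

    // Casting straight to the wider int keeps the value intact.
    int widened = static_cast<int>(big);  // 40,000
    std::cout << (widened < small ? "5 wins\n" : "40000 wins\n");        // 40000 wins
    return 0;
}
```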

Had you instead just directly cast the unsigned short to an int, then the compiler would have performed a widening conversion. For signed values this involves an operation called sign extension, and the compiler actually does this kind of widening all the time anyway. But going straight from an unsigned short to the larger int would have allowed the compiler to make the correct decision.

To see what I mean, let’s go back to the simpler 4 bits and pretend for a moment that ints are 4 bits long. We need a smaller short int type, so let’s assume for a moment that short ints only use 3 bits. Alright, that 40,000 value just isn’t going to fit at all in 3 bits, so let’s change it to 7. The new scenario is that you now want to compare an unsigned short that takes up 3 bits and has the value 7 with an int that takes up 4 bits and has the value 5. The processor needs to eventually compare values of the same size. So the compiler will need to convert the 3 bit value into 4 bits. But it notices that the smaller value is unsigned and the larger value is signed and gives you an error. At least you should hope for an error. If it only gives you a warning, then you may never have noticed the signed/unsigned problem at all.

So how does it convert an unsigned value into a signed value that takes up more bits? If it adds leading zero bits, then that would take the value 7, which is all three bits set to one, and add a leading zero bit. The result will be 0111. That’s great because 0111 is still 7. But what if we had a signed 3 bit value of 111? That’s the value -1 in three bits. To convert this to 4 bits, we now need to extend the sign bit all the way out through the extra bits. I mean that 111, when it represents the value -1, needs to become 1111 in four bits in order to remain the same value -1. We can’t just fill in the larger bits with all zeros when the smaller value represented an actual negative number. We need to sign extend the bits all the way out.
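
C++ won’t give us 3 bit and 4 bit types, but the same extension happens when an 8 bit value is widened to 32 bits. Here’s a minimal sketch of both cases.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    std::int8_t negOne = -1;     // bit pattern 11111111, signed
    std::uint8_t allOnes = 0xFF; // the same bit pattern, unsigned

    // Widening the signed value copies the sign bit into the new high bits,
    // so all 32 bits end up set and the value stays -1.
    std::int32_t signExtended = negOne;

    // Widening the unsigned value fills the new high bits with zeros,
    // so the value stays 255.
    std::int32_t zeroExtended = allOnes;

    std::cout << "sign extended: " << signExtended << "\n";  // -1
    std::cout << "zero extended: " << zeroExtended << "\n";  // 255
    return 0;
}
```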

And that’s why converting the unsigned short to a short first caused the value 40,000 to be reinterpreted as the negative value -25,536. Then when the newly reinterpreted signed short got converted to the larger int, the compiler made sure to preserve what it thought was a negative value into the int. This caused the resulting int to also be -25,536.

And this is why, if you instead convert straight to the larger int type, the compiler will see the same bit pattern but this time it knows that the most significant bit does not mean the value is negative and will fill in the leading int bits with zeros. This maintains the value 40,000.

The final lesson here is that you need to be careful when casting signed types to unsigned and the other way around because you may end up reinterpreting your values in very different ways.

This same lesson applies to bytes and chars too. They’re all numeric types and use two’s complement to represent negative numbers. If you tell the compiler to reinterpret those bits in a different way, then don’t be surprised when large values suddenly become negative and when negative values suddenly become large positive values.

Alright, I said I was going to explain an interesting security vulnerability and it’s related to what I just described. Let’s say that you need to allocate some memory and know that you need at least 100 bytes plus some extra amount that depends on a couple values you read in from the outside world. And let’s say that you’re using ints for all your calculations. If the outside world is playing nice, then maybe you read an int value of 10 and another int value of 10 and you allocate 120 bytes and all is well.

But if you’re up against an attacker, then what happens if the attacker asks for the maximum int value both times? You might think that this would result in such a large amount of requested memory that your allocation will fail. And your program would detect the failure and send a message back to the nice attacker to please ask for less memory. But what ends up happening is that the maximum int value added to 100 is enough to cause the int bits to be interpreted as a very large negative value. Since you can’t ask for a negative amount of memory, this doesn’t cause problems yet. But another maximum int value added brings the int value back into the positive side of things, with a sinister result. You see, in wraparound arithmetic, adding the maximum int value twice works out to the same thing as subtracting 2, so the end result is as if we subtracted 1 two times. We end up asking for only 98 bytes of memory.

That’s definitely a reasonable amount of memory to ask for. The only problem is that we expect at least 100 bytes for our own use. And when we try to use those last two bytes, if we’re lucky, the application will crash. If we’re not lucky and the attacker planned all this very carefully, then that simple mistake in our program is enough to potentially give the attacker full control over the computer.
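
Here’s a minimal C++ sketch of that size calculation. The function name and the numbers are just for illustration, it assumes 32 bit ints, and it uses unsigned math so the wraparound is well defined. Plain signed ints typically wrap the same way in practice, but overflowing them is undefined behavior.

```cpp
#include <iostream>
#include <limits>

// A sketch of the vulnerable calculation: 100 bytes plus two amounts the
// attacker controls. The sum can wrap past the maximum and come back around.
unsigned int requestedBytes(unsigned int extra1, unsigned int extra2) {
    return 100u + extra1 + extra2;
}

int main() {
    // A well-behaved caller: 100 + 10 + 10 = 120 bytes.
    std::cout << requestedBytes(10, 10) << "\n";      // 120

    // An attacker sending the maximum int value both times: the sum wraps
    // around and we ask the allocator for only 98 bytes.
    unsigned int evil =
        static_cast<unsigned int>(std::numeric_limits<int>::max());
    std::cout << requestedBytes(evil, evil) << "\n";  // 98
    return 0;
}
```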

The lesson here is simple. Watch out for negative values and remember that even large positive values have the potential to switch over into a negative value if they overflow.