Chars and bytes form some of the most basic data types available. But what are they really? And what can you do with them?
This will depend on what language you are using and maybe even what platform you’re building your application to run on. Some languages, such as C#, may have both types while others, like C++, may have just one but with an idea of what the other is. And some languages may have just one or possibly neither.
In general though, these types are used to hold small numeric values and character codes.
The episode also describes both signed and unsigned types and a common way of representing negative values called two’s complement. To convert between positive and negative values using two’s complement, all you have to do is first flip all the bits and then add one. This works both ways.
An easy way to tell if a signed value is negative or not is to look at the most significant bit. If it’s set to one, then the value is negative. And if it’s a zero, then the value is positive. Listen to the full episode or you can also read the full transcript below.
This episode gets us back to some of the basics and I’ll explain many data types in their own episodes. You might already have an idea of what a char is after listening to the other episodes but there are some things that we can discuss in this dedicated episode that would have taken other episodes off topic.
This might seem simple but there are some complexities. The byte type is normally 8 bits long but it can be different. I’ve always known bytes to be 8 bits and that’s how I think of them. The technical term though for a type that’s always exactly 8 bits is an octet. You’ll probably come across this term sometimes, especially in networking. If you think of a byte as the smallest unit of addressable memory, then there have been many different sizes of bytes both smaller and larger than 8 bits throughout history. And some embedded systems even today can have a different number of bits in a byte.
The final answer then is that a byte normally has 8 bits but maybe not. If you really want to be specific and refer to exactly 8 bits, then you can call it an octet.
Except in C# where a byte is defined to be 8 bits and is unsigned. I’ll explain signed and unsigned in just a moment. C# also gives you a signed byte or just sbyte as it’s written.
In C++, there is no byte data type but the size of a byte is defined to be at least 8 bits. The data type that you use for a byte in C++ is the char. You also have besides the char, an explicit signed char and an unsigned char. This is where it gets strange. All three types, char, signed char, and unsigned char are distinct types in C++. So is char signed or unsigned then? Well, that depends on your compiler and sometimes even on the platform that you’re compiling for.
Alright, how does C# represent chars then? Hopefully your head’s not spinning too much yet. You might want to sit down though so you don’t get dizzy. In C#, a char is 2 bytes long and C# really goes out of its way to make sure that chars are used just for 16-bit Unicode characters. So while it has the same representation as an unsigned 16-bit numeric type, you really shouldn’t think of a char in C# as a number.
This is different in C++ where a char is defined to be one byte. Here, you can easily store simple ASCII characters as well as numbers. You just need to make sure that the number value will fit. For that, you need to know about signed vs. unsigned values. I’ll explain those right after this message from our sponsor.
( Message from Sponsor )
There are many ways that you could represent negative numbers and probably the most common is called two’s complement. Let’s start with some simple binary counting though. We’ll just count with 2 bits because anything more gets confusing with just audio. Okay, with 2 bits, we can count in binary up to the value 3 like this:
◦ 00 is the value 0
◦ 01 is the value 1
◦ 10 is the value 2
◦ and 11 is the value 3
The same pattern repeats as you add more bits. The two’s complement system is nice because it allows you to add positive and negative numbers in binary without worrying about the sign. You do still have to worry about overflowing the bits. The way it works is like this. To change a positive number into its two’s complement negative version, first flip all the bits and then add one. That’s it. Real simple. And to go back and convert a negative value back into its absolute positive value, just do the same steps again. Flip all the bits and then add one.
Let’s take the value positive one which is 01 in our 2 bit example. If we were using more bits, then the value one would still be the same with all leading zeros. For example, with 4 bits, it would be 0001.
Okay, so we have 01 and the first step is to flip all the bits. That gives us 10. Then we add one which gives 11. The value -1 in two’s complement will always be all ones no matter how many bits you have. If you have 4 bits, then -1 will be 1111.
When we were counting in binary from 00 up to 11, we went from 0 up to 3. But this was when we were interpreting these bits as unsigned. If instead, we want to accommodate negative values, then something has to go. Because the value 11 can’t be both 3 and -1. It turns out that with our 2 bit example 3 and -1 do have the same bit pattern. So we need to know ahead of time how to interpret the value. This is why it’s important to be able to specify if a value should be signed or unsigned.
Let’s complete the example though and convert 2 into a negative value. We start out with 10 which is the value 2 if unsigned. The first step is to flip the bits. That gives us 01. Then we add one. That gives 10. So in our 2 bit example, 10 can mean both 2 or -2. Which one depends again on how we want to interpret the bits.
An interesting fact about two’s complement is that you can always tell if a signed value is negative or not by just looking at the most significant bit. If that bit is a 1, then the value is negative. I’ll talk about this some more in the episode tomorrow.
Notice how our 2 bit example can count from 0 up to 3 if unsigned and from -2 up to 1 if signed. When signed, the largest positive value will always be one less than the absolute value of the smallest negative value. This is because of the zero. There’s only one bit pattern that represents zero which is actually a good thing. But it does take up a spot and that’s why the positive range comes up one short.
Going back to our 8 bit char, then. If it’s unsigned, then it can represent values from 0 up to 255. And if signed, then it can represent values from -128 up to 127.
In C++, if you know that you want a small numeric value that will fit in these ranges, then you can use a char. And if you want the most portable code, then you need to be specific about which range you mean and use either a signed char or an unsigned char.
The same thing applies to C# but with less ambiguity, because C# defines the byte type and the sbyte type. If you want a small numeric value between 0 and 255, then use a byte. And if you want to support negative values and can live with the reduced positive max value, then an sbyte will go from -128 up to 127. And if you really want to represent a 16-bit Unicode character in C#, then use a char type.