
If you want to work with fractional values instead of just whole numbers, then floating point types are usually a good choice. They’re different enough from ints that you need to understand how to use them.

Working with ints gives you the ability to accurately define each and every whole number that’s within the range of the number of bits available. But integer arithmetic is also limited to whole numbers. If you multiply 1 by 3, you get 3 as expected. If you divide 1 by 3, though, you get 0. If you want one third, then you’ll have to use floating point arithmetic instead to get close to 0.333.

Even floating point arithmetic is not perfectly accurate. It’s an approximation. Just like 0.333 is close but not exactly one third.

Modern computers have built-in support for performing really fast calculations using floating point types. It won’t be exact, but it’s fast. If you want an exact answer, then you’ll have to settle for 37 thirds instead of 12.333. That’s because there just aren’t enough 3’s in the answer. That 12.333 really should be 12.33333 all the way to infinity with 3’s. What we lose in terms of absolute accuracy is made up for by the ability to keep using the answer in further calculations. It would be too confusing to try calculating everything in terms of proper fractions. We settle for decimal point representations instead and use the computer’s built-in hardware to do the work.

You’ll also want to be very careful testing for equality or greater than or less than when working with floating point numbers. You may find that the answer is not what you expect. For example, if you have some simple code that checks if 1.1 plus 2.2 equals 3.3, well, my computer says it’s not. How can the computer get something so simple absolutely wrong? Because 1.1, 2.2, and 3.3 are all approximations, the calculation ends up just off enough for the comparison to fail. Listen to this episode for more about floats, or you can also read the full transcript below.

Transcript

Working with ints gives you the ability to accurately define each and every whole number that’s within the range of the number of bits available. But integer arithmetic is also limited to whole numbers. If you multiply 1 by 3, you get 3 as expected. If you divide 1 by 3, though, you get 0. If you want one third, then you’ll have to use floating point arithmetic instead to get close to 0.333.
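Here’s a small C++ sketch of that difference. Nothing fancy, just integer division next to floating point division:

```cpp
#include <iostream>

int main()
{
    int wholeResult = 1 / 3;          // integer division drops the fraction, so this is 0
    double floatResult = 1.0 / 3.0;   // floating point division gets close to one third

    std::cout << "1 / 3 as ints:        " << wholeResult << '\n';  // prints 0
    std::cout << "1.0 / 3.0 as doubles: " << floatResult << '\n';  // prints 0.333333
    return 0;
}
```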

Even floating point arithmetic is not perfectly accurate. It’s an approximation. Just like 0.333 is close but not exactly one third.

The only way to remain completely accurate and true to something like one third is to keep track of the fraction numerator and denominator directly. But this makes calculating difficult. If you want to multiply 37 by one third, it’s much easier and faster to perform floating point arithmetic as long as you realize the answer will be slightly off.
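If you really did want to stay exact, one way is to keep the numerator and denominator as separate whole numbers. This is just a sketch of the idea with a made-up Fraction type, not something the language gives you:

```cpp
#include <iostream>

// A made-up type that keeps a fraction exact by storing the
// numerator and denominator as whole numbers.
struct Fraction
{
    int numerator;
    int denominator;
};

// Multiplying a whole number by a fraction stays exact,
// but every operation now needs its own fraction logic.
Fraction multiply(int whole, Fraction f)
{
    return Fraction{whole * f.numerator, f.denominator};
}

int main()
{
    Fraction oneThird{1, 3};
    Fraction exact = multiply(37, oneThird);   // exactly 37/3
    double approximate = 37 * (1.0 / 3.0);     // roughly 12.3333

    std::cout << exact.numerator << "/" << exact.denominator << '\n';
    std::cout << approximate << '\n';
    return 0;
}
```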

Modern computers have built-in support for performing really fast calculations using floating point types. It won’t be exact, but it’s fast.

If you want an exact answer, then you’ll have to settle for 37 thirds instead of 12.333. That’s because there just aren’t enough 3’s in the answer. That 12.333 really should be 12.33333 all the way to infinity with 3’s. What we lose in terms of absolute accuracy is made up for by the ability to keep using the answer in further calculations. It would be too confusing to try calculating everything in terms of proper fractions. We settle for decimal point representations instead and use the computer’s built-in hardware to do the work.

One important point to realize, though, is that what starts out as a small difference between the true answer and the floating point representation can grow into a bigger error as more and more calculations are made. Each time a floating point number is multiplied, the small error also gets multiplied.
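As a rough illustration of how the error builds up, here’s a small sketch. The exact digits you see will depend on your compiler and hardware, but the total will drift noticeably from the true answer:

```cpp
#include <iomanip>
#include <iostream>

int main()
{
    float total = 0.0f;

    // Add one tenth a million times. The true answer is 100000,
    // but 0.1 can't be stored exactly, so each addition carries a tiny
    // error and those errors pile up.
    for (int i = 0; i < 1000000; ++i)
    {
        total += 0.1f;
    }

    std::cout << std::setprecision(10) << total << '\n'; // noticeably off from 100000
    return 0;
}
```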

I remember when I was first learning how to program and wanted to build a game. It was a side runner, and I had an image of a character that would appear to run across the screen and had to avoid obstacles. At first everything matched up well. But as the character ran farther and the scenery scrolled to the left, the character started running into obstacles that weren’t actually there. For example, the character would run into a wall and stop even though the wall was still many pixels away. I was using floating point numbers, and the accuracy of the calculations slowly drifted until it was noticeable.

All of this really has nothing to do with computers. It’s a problem that has more to do with the number system. Or, I should say, with the base of the number system. If we used a base three or a base six or any multiple-of-three number system, then one third would be no problem. In base three, one third is 0.1. This wouldn’t really solve anything, because in that system one half would be repeating. I’m not going to get into the details of why. It’s just too much to explain in an audio podcast. The main thing to remember is that no matter what number system is being used, there will always be values that don’t quite fit nicely and need to be approximated.

There are other things you need to know about floats and I’ll describe them right after this message from our sponsor.

( Message from Sponsor )

How did I solve my first attempt at a computer game? Well, I didn’t really. All I did was make my calculations more precise by using doubles instead of floats. It worked well enough that I was happy.

Just like ints have different sizes and names such as short, long, and long long, well, floats also have different sizes. It turns out that the basic float type is the smallest. The next larger floating point type is called a double. And beyond that, there’s the long double. When I refer to floating point numbers or floating point calculations, then I’m referring to the type of math that involves fractional values and not the specific float type itself. And when I refer to a float, then I’m talking about the specific type of floating point number representation.

Both float and double have well-defined meanings. The long double type is likely to be different depending on your compiler and platform. Floats are represented with 32 bits and doubles use 64 bits.
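Since the exact sizes, especially for long double, depend on your compiler and platform, it’s worth printing them yourself:

```cpp
#include <iostream>

int main()
{
    // Typical results: 4 bytes for float, 8 for double,
    // and 8, 12, or 16 for long double depending on the platform.
    std::cout << "float:       " << sizeof(float) << " bytes\n";
    std::cout << "double:      " << sizeof(double) << " bytes\n";
    std::cout << "long double: " << sizeof(long double) << " bytes\n";
    return 0;
}
```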

But these are not like integers where every new bit doubles the range of values. Floats and doubles use a format specified by the IEEE, and they store a sign bit, an exponent, and a mantissa. Think of the mantissa as the significant digits of the number, which then get scaled by two raised to the power of the exponent. This arrangement allows you to represent some very large numbers and also some very small numbers.

But you have a fixed number of digits to work with. What this means is that you won’t be able to represent a really big number that also has a tiny fractional part.
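Here’s one way to see that limit. A float only has about 7 significant decimal digits, so once the whole part gets big enough, adding a small amount simply disappears. The exact cutoff depends on the type, but this sketch shows the idea:

```cpp
#include <iomanip>
#include <iostream>

int main()
{
    // 16777216 is 2 raised to the 24th power. A float's mantissa runs out
    // of bits here, so adding 1 rounds right back to the same value.
    float big = 16777216.0f;
    float bigger = big + 1.0f;

    std::cout << std::setprecision(10) << big << '\n';
    std::cout << std::setprecision(10) << bigger << '\n';
    std::cout << (big == bigger ? "they compare equal" : "they differ") << '\n';
    return 0;
}
```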

When I switched my game to use doubles, all I really did was improve the range of the calculations so that my rounding errors became negligible. A better fix would have been to account for the types of numbers I was working with and not try to mix big numbers with small numbers.

If you have only small numbers, try to account for the accumulation of rounding errors by reducing the number of calculations you need to perform.

Even if you group your numbers and control how many operations you do, it may still not be enough. You’ll also want to be very careful testing for equality or greater than or less than when working with floating point numbers. You may find that the answer is not what you expect.

For example, if you have some simple code that checks if 1.1 plus 2.2 equals 3.3, well, just try it. We know that 1.1 plus 2.2 should be an exact match for 3.3, right? My computer says it’s not. What’s up with that? Computers are supposed to be really good with numbers! How can the computer get something so simple absolutely wrong?

The answer is a bit complicated. When I ask my computer if 1.5 plus 2.5 equals 4.0, then it gives me the correct answer. What usually happens is that because 1.1, 2.2, and 3.3 are all approximations, the calculations are just off enough for the comparison to fail. But 1.5, 2.5, and 4.0 can all be represented exactly with no approximations.
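If you want to try it yourself, here’s a small sketch. Your compiler may even warn you about comparing floating point values like this:

```cpp
#include <iostream>

int main()
{
    double sum = 1.1 + 2.2;

    // Both sides look like 3.3, but neither is stored exactly,
    // and the tiny differences don't line up.
    std::cout << (sum == 3.3 ? "1.1 + 2.2 == 3.3" : "1.1 + 2.2 != 3.3") << '\n';

    // 1.5, 2.5, and 4.0 all fit exactly in binary, so this comparison works.
    std::cout << (1.5 + 2.5 == 4.0 ? "1.5 + 2.5 == 4.0" : "1.5 + 2.5 != 4.0") << '\n';
    return 0;
}
```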

Now, you can’t write your program trying to figure out which values will be approximated and which will be exact. The best course of action is to just assume that all floating point values will have some degree of error to them.
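One common way to handle that is to check whether two values are close enough instead of exactly equal. This is only a sketch, and the tolerance of 1e-9 is a placeholder you’d choose based on the size of your numbers and how many calculations produced them:

```cpp
#include <cmath>
#include <iostream>

// Treat two doubles as equal if they differ by less than a small tolerance.
bool closeEnough(double a, double b, double tolerance = 1e-9)
{
    return std::abs(a - b) < tolerance;
}

int main()
{
    std::cout << (closeEnough(1.1 + 2.2, 3.3) ? "close enough" : "not equal") << '\n';
    return 0;
}
```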