If you want to work with fractional values instead of just whole numbers, then floating point types are usually a good choice. They’re different enough from ints that you need to understand how to use them.

Working with ints lets you represent every whole number exactly, as long as it fits within the range the available bits allow. But integer arithmetic is also limited to whole numbers. If you multiply 1 by 3, you get 3 as expected. If you divide 1 by 3, though, you get 0. If you want one third, then you'll have to use floating point arithmetic instead to get close to 0.333.

Even floating point arithmetic is not perfectly accurate. It's an approximation, just like 0.333 is close to but not exactly one third.

The only way to remain completely accurate and true to something like one third is to keep track of the fraction's numerator and denominator directly. But that makes calculating difficult. If you want to multiply 37 by one third, it's much easier and faster to perform floating point arithmetic, as long as you realize the answer will be slightly off.

Modern computers have built-in support for performing really fast calculations using floating point types. It won't be exact, but it's fast. If you want an exact answer, then you'll have to settle for 37 thirds instead of 12.333. That's because there just aren't enough 3's in the answer. That 12.333 really should be 12.33333 with 3's repeating all the way to infinity. What we lose in absolute accuracy is made up for by the ability to keep using the answer in further calculations. It would be too confusing to try calculating everything in terms of proper fractions. We settle for decimal point representations instead and use the computer's built-in hardware to do the work.

Just like ints come in different sizes with names such as short, long, and long long, floats also come in different sizes. The basic float type is the smallest. The next larger floating point type is called a double, and beyond that there's the long double. Float and double have well defined representations: on most platforms, floats are represented with 32 bits and doubles use 64 bits. Long double, though, is likely to be different depending on your compiler and platform.

But these are not like integers where every new bit doubles the range of values. Floats and doubles use a format specified by the IEEE 754 standard, and they store a sign bit, an exponent, and a mantissa. Think of the mantissa as a base value that gets multiplied by 2 raised to the power of the exponent. This arrangement allows you to represent some very large numbers and also some very small numbers. But you have a fixed number of digits to work with. What this means is that you won't be able to represent a really big number that also has a tiny fractional part.

You'll also want to be very careful when testing floating point numbers for equality, or even for greater than or less than. You may find that the answer is not what you expect. For example, if you have some simple code that checks if 1.1 plus 2.2 equals 3.3, well, my computer says it's not. How can the computer get something so simple absolutely wrong? Because 1.1, 2.2, and 3.3 are all approximations, the calculations end up just far enough off for the comparison to fail.

