What happens when code has undefined behavior? There are hundreds of ways code can have undefined behavior, and what happens is completely up to the compiler. You should not depend on undefined behavior because compilers can completely change or delete sections of code. Because the behavior is undefined, compilers can optimize code in ways you never intended.
I’ve known about undefined behavior for a long time. But this episode describes something that surprised me. I was not fully aware of the magnitude of the problem and the drastic steps that compilers can take. You really can’t depend on anything in your code once you enter undefined behavior.
And because compilers are getting better all the time, undefined behavior that used to go undetected can suddenly change your program when a newer compiler decides to rewrite your code. Since the behavior is already undefined, who’s to say that the compiler is wrong?
There are a lot of really great videos online at cppcon.org about undefined behavior if you want to learn more. Just search for “cppcon undefined behavior” and you’ll find lots more information.
If you’d like to improve your coding skills even more, then you can find all my favorite books and resources at the Resources page.
Listen to the full episode for more details or read the full transcript below.
Undefined means that we don’t know what will happen.
I’ve been programming for a long time and have known about undefined behavior. But I just learned something new about undefined behavior that I want to share with you. This surprised me.
Let’s first recap what undefined behavior is and what causes it.
You can get into this situation in a lot of ways. I recently found out that the C language has about 200 documented cases leading to undefined behavior in your code. C++ has even more but they’re not documented all in one place.
None of this includes your own code or that of some library you’re using.
I’ve always known that you can’t rely on undefined behavior, but what surprised me was that the compiler can do whatever it wants when it detects undefined behavior. Since the behavior is already undefined, the compiler writers’ viewpoint is that the compiler can’t make things any worse. So the compiler assumes that undefined behavior never happens and optimizes your code on that basis. In other words, compilers actually take advantage of undefined behavior.
And compilers don’t just keep using undefined behavior in the same way. They’re getting better at it.
Code that used to work even though it relied on undefined behavior can sometimes stop working when you upgrade your compiler. A new compiler can detect the undefined behavior and decide to reorder your code, skip function calls completely, exit out of your loops early, and who knows what else. The exact code generated by the compiler can change from one version to the next or even with different options.
Compilers are able to detect undefined behavior based on possible misuse even if other factors would normally cause nothing bad to happen.
For example, there’s a function called memcpy, usually pronounced as “memcopy”, that copies data from one place in memory to another. It takes two pointers and an integer. One pointer points to the destination. The other pointer points to the source of the data. And the integer lets you control how much data will be copied. Now, if you call this function with the size set to zero, then you might think it doesn’t matter what the source and destination point to, even if they’re null pointers. You’re saying don’t copy anything from over there to over here. But just the fact that the compiler notices you passing null pointers means you’ve already entered undefined behavior.
To put this in terms of something from real life, let’s say that you visit an ice cream shop and tell the clerk that you’d like to purchase nothing. Does it really matter if you wanted to pay with fake Monopoly money? No. Because you’re not really buying anything.
But if this were a compiler, it might decide that because you’re using fake money, your behavior is undefined and it’s free to do whatever it wants. Maybe you walk away from the shop with all your real money mysteriously gone from your wallet. Or the Secret Service is waiting for you outside the shop for using counterfeit money. They arrest you even though you explain that you never actually tried to buy anything.
When writing code, it might seem like a good idea to make calls like this in a loop or during recursion. Maybe you know that the first time through the loop, everything will be okay and it simplifies your code so you don’t have to write special cases. The compiler might see this and decide to do whatever it wants. It might remove all this code as if you never wrote it at all.
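Here is a minimal sketch of one way to keep the special case without handing memcpy a null pointer. The helper name copy_bytes is hypothetical and not something from the episode. It just checks the size first, so a call with nothing to copy never reaches memcpy at all.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (an assumption for illustration, not a standard function).
// Copies `count` bytes from `src` to `dest`, but only when there is something
// to copy, so memcpy is never called with null pointers and a zero size.
inline void* copy_bytes(void* dest, const void* src, std::size_t count) {
    if (count == 0) {
        return dest;  // Nothing to copy, so skip the memcpy call entirely.
    }
    return std::memcpy(dest, src, count);
}
```

With a wrapper like this, the first pass through a loop can still use an empty buffer. The check costs one comparison, and the pointers are only handed to memcpy when they have to be valid anyway.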
Maybe you do notice that your program isn’t behaving like you thought it would.
What do you do?
Well, you start debugging. One way to debug is to add code that will print messages to a log file. You think that you can trust that if your code reaches some log statement, then it will print the message to a file. But if the bug is caused by undefined behavior, then the compiler might just decide to skip everything. Including your log statement. You’re looking through your log file and getting even more confused because it doesn’t make sense.
That’s the thing with undefined behavior. Once you get into this situation, anything is possible.
You can’t even rely on warning messages from your compiler. What would that message look like? “Hi, this is your compiler speaking and I noticed that this code might sometimes not behave correctly. So I decided to make it better by changing it for you.”
Here’s one specific case of undefined behavior.
This is actually the case that I found out about and was surprised by. It turns out that C++ says it’s undefined to take the address of a member function of a class in the std namespace.
Why would you ever want to do that?
Well, maybe you learn about something called bind that lets you store away some arguments to pass to a function and then call the bind object later with the remaining arguments. The bind object will take the arguments it already has, add the arguments you provide when you call it, and then call the function with all the parameters it expects. In order to use bind, you need to give it the address of the function to call. This means if you try to bind the member function push_back in a vector, you’ve just entered undefined behavior.
The compiler is then free to do whatever it wants including deleting all that code and pretending it was never written.
How will your program react to gaping holes in the code? That’s for you to worry about. The compiler did its job.
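One way to avoid the problem is to never take the address of push_back in the first place. Whatever the exact wording in the standard, the standard library doesn’t promise that you can take the address of its member functions, so this sketch binds a small lambda that makes the call instead. The name make_appender is hypothetical and only here for illustration.

```cpp
#include <functional>
#include <vector>

// Hypothetical helper (an assumption for illustration).
// Returns a callable that appends one int to `values`. The caller must keep
// `values` alive for as long as the returned callable is used.
inline std::function<void(int)> make_appender(std::vector<int>& values) {
    // The lambda calls push_back by name, so no address of a standard
    // library member function is ever taken.
    auto push = [](std::vector<int>& target, int item) { target.push_back(item); };

    // Store the vector by reference now and supply the item later.
    return std::bind(push, std::ref(values), std::placeholders::_1);
}
```

You would call make_appender once with your vector and then call the result with each value you want to add. In a lot of code you could skip bind and use a lambda directly, but this keeps the same store-now-and-call-later shape.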
While researching this episode, I watched some really good videos from cppcon. Just search for cppcon undefined behavior. That’s c p p c o n.
One talk describes code that tries to use an uninitialized variable as a source of random numbers. We know that we can’t rely on memory containing any specific value unless we own that memory and have already written something there. You might expect that reading a value from memory when nothing was ever written will return whatever value happens to already exist.
Well, the compiler can detect this case.
And it might decide to remove all your code that deals with the uninitialized memory and replace it with something that’s constant.
Since you can’t rely on any specific value, why not just give your code the same value without reading the memory? The compiler can generate smaller and faster code whenever your code tries to do something that it’s not supposed to.
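If what you really want is a random number, here is a minimal sketch that uses the standard random library instead of uninitialized memory. The helper names make_engine and random_in_range are hypothetical. Every variable here is written before it’s read, so there’s nothing for the compiler to replace with a constant.

```cpp
#include <random>

// Hypothetical helper (an assumption for illustration).
// Builds a random engine seeded from the system's random_device.
inline std::mt19937 make_engine() {
    std::random_device device;
    return std::mt19937(device());
}

// Hypothetical helper (an assumption for illustration).
// Returns a value between `low` and `high`, inclusive.
inline int random_in_range(std::mt19937& engine, int low, int high) {
    std::uniform_int_distribution<int> distribution(low, high);
    return distribution(engine);
}
```

You would create the engine once and pass it to random_in_range each time you need a value, such as a number from 1 to 6 for a die roll.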