
The preprocessor is old, primitive, and strong. But you need to understand its strengths to use it effectively. It adds capabilities that the C++ compiler can’t come close to duplicating on its own.

All the preprocessor directives begin with the hash character, #, and do not use a semicolon at the end. The end of a directive is just the end of the line. You can extend a preprocessor directive to other lines by ending each line to be extended with a backslash character, \.

The include directive is the most common preprocessor directive and might look like this:

#include <iostream>
#include "../OtherProject/CustomClass.h"

You’ll also often find conditional sections like this:

#ifndef __cplusplus
#error You must compile this code with a C++ compiler
#endif

A very common pragma directive tells the compiler to process an include file only once, no matter how many times it gets included in a particular compilation unit:

#pragma once

And you can also use preprocessor directives to define elaborate replacement text like this:

#define MYCLASS(name, id) class name ## id { };

Using this macro like this:

MYCLASS(MyClass, A)

Would be equivalent to writing the following:

class MyClassA { };

The preprocessor not only replaced MYCLASS with the content of the macro, but the ## symbols also caused the text passed as name and id to be concatenated to form the class name.

Listen to the full episode, or you can also read the full transcript below.

Transcript

It’s called a preprocessor because it runs first before the compiler ever starts working with your code. You use it all the time without even realizing it. Anytime you write #include and then the name of some standard header file or one of your own header files, then you’re using the preprocessor.

I’m going to explain seven aspects or capabilities of the preprocessor with some examples. All the preprocessor directives begin with the hash character. The preprocessor also doesn’t use semicolons to mark the end of a directive. A directive goes all the way to the end of the line, but if you want, you can extend a very long directive onto subsequent lines by ending each line to be continued with a backslash character.
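Here is a minimal sketch of the backslash continuation just described. The macro name SUM_OF_SQUARES and the function around it are made up for illustration. The backslash has to be the very last character on each continued line.

```cpp
// A hypothetical macro split across three physical lines.
// Each continued line ends with a backslash, so the preprocessor
// treats all three lines as one directive.
#define SUM_OF_SQUARES(a, b) \
    (((a) * (a)) +           \
     ((b) * (b)))

// By the time the compiler sees this function, the macro text
// has already been substituted in.
int sumOfSquares(int a, int b) {
    return SUM_OF_SQUARES(a, b);
}
```

Notice that the directive still has no semicolon. The line without a trailing backslash is what ends it.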

Number one is the most common and one that I just mentioned. #include directives are followed by the name and optional path to a header file. The header file is placed within angle brackets for standard header files that come with the C++ compiler itself or that come with various libraries that you can install, and within double quotes for your own header files. These directives cause the contents of the header file to be included in the file being compiled. If the included file has its own #include directives, then they’re also included. You use header files to define the basic structure of classes or to declare information such as method declarations to be used elsewhere. Putting this information in header files allows you to define it once and make use of it wherever needed without having to re-declare everything each time. Think of this like a rubber stamp that you can use to imprint text over and over again in documents.

Number two is also used quite regularly whenever you want to selectively include or exclude sections of your code. These are conditional directives and you have the following available:

◦ #if, #ifdef, #ifndef, #else, #elif and #endif

◦ Some of these are abbreviated in code but when speaking their names, I’ve always found it easier to pronounce them fully. For example, #elseif is how I pronounce #elif. To me, else if just sounds better than elif. Just like how we normally write st in addresses but say street when talking.

◦ These directives work together to test conditions and then either include or exclude entire sections of code. You can also have other preprocessor directives inside these sections that get included or excluded.

◦ You can use these to change how your code is written. This goes beyond just changing how your code behaves at runtime. Maybe you want to have a single header file that defines useful information for the rest of your program. But this information is completely different depending on your platform. So if you’re writing a program to run on Windows and you want to include this file, then your information will be specific to Windows. On Linux, it will be specific to Linux. You do this by putting both Windows and Linux code in the same include file but each in its own preprocessor conditional section. Then by defining some term such as WINDOWS before beginning the compilation of your Windows code, you’ll be sure to include the sections that are specific to Windows and it’ll be as if the sections for Linux never existed.
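The platform example above could be sketched roughly like this. Instead of a term such as WINDOWS that you define yourself, this sketch assumes the macros _WIN32 and __linux__ that common compilers predefine for those platforms. PLATFORM_NAME and platformName are hypothetical names used only for this example.

```cpp
#include <string>

// Only one of these three sections survives preprocessing.
// The compiler never sees the other two.
#if defined(_WIN32)
    #define PLATFORM_NAME "Windows"
#elif defined(__linux__)
    #define PLATFORM_NAME "Linux"
#else
    #define PLATFORM_NAME "Other"
#endif

// Returns whichever string literal the preprocessor selected.
std::string platformName() {
    return PLATFORM_NAME;
}
```

The same shape works with your own term: define it on the compiler command line or before the include, and test it with #ifdef.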

Number three is something that I’ve never used. It’s called a null directive and consists of just a single hash character. Nothing happens and it gets ignored. I included it here just to make this list complete.

Number four is an error directive. This will stop the compilation with an error. Why would you want to force an error? Well, if you forced an error all the time, then I’d also ask you, “What are you doing?” No, the usage of this directive makes sense when included in a conditional section. Maybe you don’t want to support compiling your code on Linux. You can then force an error only if you detect Linux.
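A rough sketch of forcing an error only on Linux might look like the following. MYAPP_REJECT_LINUX is a hypothetical switch invented for this example so that the code still compiles everywhere by default. __linux__ is assumed to be predefined by your compiler when targeting Linux.

```cpp
// MYAPP_REJECT_LINUX is a hypothetical switch you would define yourself,
// for example on the compiler command line. Without it, this section
// is skipped and compilation continues normally.
#if defined(MYAPP_REJECT_LINUX) && defined(__linux__)
#error This project does not support compiling on Linux
#endif

// Reaching this point means no #error directive was triggered.
const bool buildWasAllowed = true;
```

If you drop the switch and test only __linux__, every Linux build stops at the #error line with your message.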

Number five is a file name and line information directive. The preprocessor is aware of the actual files being compiled. The C++ compiler itself does not have this information. If you want to include the actual source code file name or the line number of some statement in your code, you could manually type this information into your code as a string literal but that’s very fragile. What happens if you add some more lines of code to the beginning of your file? Or what happens if you copy this code to a different source code file entirely? You don’t want to have to keep updating this information. So let the preprocessor do this for you. Just be aware that this information is really of no use to your customers. Your customers won’t know or care in what file or at what line number your code encountered a serious error. This information is only useful to the developers. But it can be extremely useful when tracking down bugs.
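One common way to let the preprocessor fill in this information is through the predefined macros __FILE__ and __LINE__. The sketch below assumes that approach. SOURCE_LOCATION and the two functions are hypothetical names for illustration.

```cpp
#include <string>

// Hypothetical helper macro. It has to be a macro rather than a function
// so that __FILE__ and __LINE__ expand at the place where it is used.
#define SOURCE_LOCATION() \
    (std::string(__FILE__) + ":" + std::to_string(__LINE__))

// Each function reports the file name and the line it was written on.
std::string firstLocation() {
    return SOURCE_LOCATION();
}

std::string secondLocation() {
    return SOURCE_LOCATION();
}
```

If you add lines above these functions or move them to another file, the reported values follow along with no edits to the code.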

I’ll explain the last two directives right after this message from our sponsor.

( Message from Sponsor )

Number six is the pragma directive. It begins with #pragma and contains instructions that are specific to your compiler. You can use these directives to control the behavior of your compiler itself from within your source code file. What you can do and how you need to format these directives will be found in the documentation of your compiler.

And finally, we get to number seven. These are called replace text macros and use #define and #undef directives along with some clever use of either a single or a double hash character. This is where the full power of the preprocessor can be found. It’ll take some getting used to and even I need to refer to the language documentation anytime I try to write complex code that makes use of these directives.
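As a small sketch of the single and double hash characters mentioned here: a single hash turns a macro argument into a string literal, and a double hash joins two pieces of text into one identifier. STRINGIZE and PASTE are hypothetical helper names, not anything built into the language.

```cpp
#include <string>

// Hypothetical helper macros, named here only for illustration.
#define STRINGIZE(x) #x   // single hash: argument becomes a string literal
#define PASTE(a, b) a##b  // double hash: two tokens are joined into one

// Expands to: const int itemCount = 3;
const int PASTE(item, Count) = 3;

// Expands to: return "itemCount";
std::string variableName() {
    return STRINGIZE(itemCount);
}
```

Both operators only work inside the replacement text of a #define.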

The first thing to understand is that the preprocessor really doesn’t understand anything about C++ or any other language. All it knows is how to replace some text with some other text. Even so, it’s really good at this job. The basic directive is written like this:

◦ #define identifierText replacingText

◦ Most define directives use all capital letters for the text to be replaced. This makes it easy to spot text in your code that will be targeted for replacement.

◦ The original usage of this feature comes from the C language where it was used to define constant values. For example, instead of writing 3.14159265 in various places in your code, it used to be better to define a macro for this. You would write:

▪ #define PI 3.14159265

▪ Then in your code, anytime you need the value of Pi, you could just write capitalP capitalI and by the time your compiler gets to that part of your code, it would be as if you had written the digits directly. The preprocessor runs first and replaces the identifier text, capitalP capitalI, with 3.14159265.

▪ A better way to do this same task in C++ is to define a const double called pi and assign it the same value 3.14159265. The benefit of using a constant double variable is that you now have a typed variable that you get to specify. It’s protected from being modified because it’s const but now the compiler knows about the variable. When it was defined with a macro, the compiler just sees some digits.
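Side by side, the two approaches might look like this minimal sketch. The function names are made up for the comparison.

```cpp
// Old C style: the compiler only ever sees the digits.
#define PI 3.14159265

// C++ style: a typed, named constant that the compiler knows about.
const double pi = 3.14159265;

// After preprocessing this reads: return 3.14159265 * radius * radius;
double areaWithMacro(double radius) {
    return PI * radius * radius;
}

// Here pi is a real variable of type const double.
double areaWithConstant(double radius) {
    return pi * radius * radius;
}
```

Both functions compute the same value. The difference is that pi has a type and obeys scope rules, while PI is replaced wherever that text appears after the #define.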

◦ This has also been used to define utility methods such as a macro to determine the larger of two numbers. I don’t recommend you do this anymore. This was essential up until about the mid-1990s when templates became widely available.

◦ The main reason that I still use macros is for wrapping up repetitive code into simple replaceable identifiers. I’ve used macros to declare entire classes with methods that all get hidden behind the simple text identifier in my code. The compiler never sees the identifierText and it’s as if I had written the class definition directly. This could lead to duplicate definitions, so what I do is make use of the line number macros to give my class definitions unique names.
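The last two points can be sketched together. The first half shows one reason a template is safer than a larger-of-two macro: the macro pastes its argument text in twice. The second half is a guess at how line numbers can give macro-generated classes unique names, not the exact macros described above. Every name in it (LARGER_OF_MACRO, largerOf, JOIN, REGISTER_NAME, registeredNames) is hypothetical.

```cpp
#include <string>
#include <vector>

// Old style: a function-like macro. Each argument is pasted in as text,
// so an argument with a side effect can be evaluated twice.
#define LARGER_OF_MACRO(a, b) ((a) > (b) ? (a) : (b))

// Template replacement: each argument is evaluated exactly once.
template <typename T>
T largerOf(T a, T b) {
    return (a > b) ? a : b;
}

int counterAfterMacro() {
    int i = 5;
    (void)LARGER_OF_MACRO(i++, 2);  // i++ appears twice after expansion
    return i;                       // 7
}

int counterAfterTemplate() {
    int i = 5;
    (void)largerOf(i++, 2);  // i++ is evaluated once
    return i;                // 6
}

// Two-step concatenation so that __LINE__ is expanded to a number
// before ## joins the pieces.
#define JOIN_INNER(a, b) a##b
#define JOIN(a, b) JOIN_INNER(a, b)

// Shared list that each generated class adds its text to.
std::vector<std::string>& registeredNames() {
    static std::vector<std::string> names;
    return names;
}

// Hypothetical macro: declares a small class plus one static instance.
// Both names include the line number, so repeated uses do not clash
// as long as each use sits on its own line.
#define REGISTER_NAME(text)                    \
    static struct JOIN(Registrar_, __LINE__) { \
        JOIN(Registrar_, __LINE__)() {         \
            registeredNames().push_back(text); \
        }                                      \
    } JOIN(registrarInstance_, __LINE__)

REGISTER_NAME("first");
REGISTER_NAME("second");
```

Each REGISTER_NAME line expands to a full class definition with its own name, and the compiler never sees the macro identifier itself. Putting two uses on the same line would bring back the duplicate definition problem.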