A regular expression, or regex, lets you perform advanced text operations: matching, searching, tokenizing, and replacing.
Transcript
There are a lot of details involved that would make all the stars and parentheses of the function pointer episodes look downright simple by comparison. So there’s no way I can read exact regular expressions to you. I’ll focus on the concepts and highlight just a few of the special characters so you can get an idea of what you can do with regular expressions.
I’ll start referring to a regular expression as a regex. Some people might pronounce this as rejex. I keep the hard G from regular and say regex.
To begin, why would you ever want to use regexes? This is definitely something you’ll want to be careful about. The syntax of a regular expression can easily take several minutes to study, even when it’s only about 20 characters long. Beyond that, you’ll probably need to draw some diagrams to understand what it means. Just one little character can drastically change the meaning.
To say that a regex is terse would be a big understatement. But this terseness is also its biggest strength. It can pack a lot of subtle meaning into what may at first look like what you’d get if a cat walked across your keyboard. Yeah, regexes can be cryptic. That’s the word I was looking for.
Here are the four basic uses that I’d consider for a regex:
◦ #1 If you have a short string or piece of text and need to verify that it matches a known good pattern, then, depending on the complexity of the pattern, a regex might be a good idea. Be careful though: I’ve seen regular expressions that try to validate email addresses get completely out of control.
◦ #2 If you have a long string (this could even be a whole text file) that you want to search, either to find a pattern or just to determine whether it exists, then you could find a regex useful.
◦ #3 If you have a string that you want to divide into several smaller strings based on one or more patterns, then you might want to use a regex. This is called tokenizing.
◦ #4 If you want to not just find matching text but also replace it with something else, then a regex could help.
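To make the four uses concrete, here is a minimal sketch using Python’s standard `re` module. Python is just my choice for illustration; the episode doesn’t tie regexes to any one language. The ZIP-code rule, phone-number format, and sample strings are all made up.

```python
import re

# 1. Validate: does the WHOLE string fit a known good pattern?
#    Hypothetical rule: a US-style ZIP code of exactly five digits.
def is_zip_code(text: str) -> bool:
    return re.fullmatch(r"\d{5}", text) is not None

# 2. Search: find a pattern anywhere inside a longer text.
text = "Call 555-0100 or 555-0199 before Friday."
first_phone = re.search(r"\d{3}-\d{4}", text)  # first match object, or None
all_phones = re.findall(r"\d{3}-\d{4}", text)  # every matching substring

# 3. Tokenize: split wherever one or more commas, semicolons, or spaces appear.
tokens = re.split(r"[,; ]+", "red, green;blue  yellow")

# 4. Replace: swap every matching piece of text for something else.
masked = re.sub(r"\d{3}-\d{4}", "XXX-XXXX", text)

print(is_zip_code("98052"), is_zip_code("9805"))  # True False
print(first_phone.group())  # 555-0100
print(all_phones)           # ['555-0100', '555-0199']
print(tokens)               # ['red', 'green', 'blue', 'yellow']
print(masked)               # Call XXX-XXXX or XXX-XXXX before Friday.
```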
There are no hard rules that I can give you for when to use a regex in your code. I consider using regular expressions all the time, and I can tell you that I rarely end up using them. I think where they really shine is when the pattern needs to be configurable. I’ll explain what I mean right after this message from our sponsor.
A lot of books give you an example of using a regex to match HTML or XML tags. While this can be legitimate, I think it misleads developers into thinking that regular expressions are good at this sort of thing.
Let me be clear: if you have a complicated file with lots of unknown structure, then trying to parse it with regular expressions is a bad idea. If you need to parse it, then use a parser, even if this means you have to write your own.
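Here is a small illustration of why, sketched in Python (my choice of language; the sample document is made up). A lazy regex for `<b>...</b>` has no idea which closing tag belongs to which opening tag, while the standard library’s `xml.etree.ElementTree` parser understands nesting.

```python
import re
import xml.etree.ElementTree as ET

doc = "<note><b>outer <b>inner</b> tail</b></note>"

# Naive regex: the lazy .*? stops at the FIRST closing tag it reaches,
# so the captured text is cut off in the middle of the nested element.
naive = re.search(r"<b>(.*?)</b>", doc).group(1)

# A real parser builds the tree, so the outer <b> contains everything inside it.
root = ET.fromstring(doc)
outer_b = root.find("b")
parsed = "".join(outer_b.itertext())

print(naive)   # outer <b>inner
print(parsed)  # outer inner tail
```

Switching to a greedy `.*` happens to fix this one sample but breaks as soon as there are two sibling `<b>` elements on the same line, which is exactly the kind of whack-a-mole a parser avoids.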
I think a regex is much better suited to small, tactical work, and even then, you should only use one when the flexibility you get from a regex is critical.
I mentioned letting a regex be configurable. You really need to consider, though, who could possibly make changes to it. I’m not talking about your average computer user. I’m not even talking about most people we would consider to be “really good” with computers. I’m talking more about the one person in your whole company who might be able to understand them. Or the one person from your school who spent every waking moment at the computer, programming, that is.
If you’re building an application designed to be used by programmers, then you might find that a few of your customers will understand regular expressions.
I hope I’m beginning to get you to see the pitfalls of regular expressions. It’s not just that they’re not for everybody. It’s more that they’re only for very few people, even among programmers.
One of the best examples I’ve seen of where you might really get value from a regex is putting simple regular expressions in a config file for your application. This is great because it lets you adapt the behavior of your application if necessary. You probably won’t need to change the regex often, if at all, but it’s nice to know that it can be changed if needed.
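A minimal sketch of that idea in Python, assuming an INI-style config file read with the standard `configparser` module. The `[validation]` section, the `order_id` key, and the `ORD-` format are all hypothetical; the config text is inlined here so the example is self-contained.

```python
import configparser
import re

# Hypothetical config contents; a real app would read these from a file on disk.
CONFIG_TEXT = r"""
[validation]
order_id = ORD-\d{6}
"""

# interpolation=None so a "%" inside a regex isn't treated as config syntax.
config = configparser.ConfigParser(interpolation=None)
config.read_string(CONFIG_TEXT)

# Compile once at startup so a malformed pattern fails fast with re.error.
order_id_pattern = re.compile(config["validation"]["order_id"])

def is_valid_order_id(text: str) -> bool:
    # fullmatch: the whole string must fit the configured pattern.
    return order_id_pattern.fullmatch(text) is not None

print(is_valid_order_id("ORD-123456"))  # True
print(is_valid_order_id("ORD-12345"))   # False
```

Changing the config line to, say, `ORD-\d{8}` changes what the application accepts without touching the code.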
Another place where I’ve seen regular expressions used well was a log file viewer. First of all, log files are really only meant to be used by software developers. Sure, some users will be able to make sense of them. But to get full use out of a log file, you need to be able to match it with the source code that created it. If you’re a developer with a log file of many tens of thousands of lines, then you need a way to accurately search for complicated patterns. A log file viewer that lets you go beyond simple word matching and provide your own regular expressions can be a big boost to your ability to find important and relevant information.
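Here is a sketch of the core of such a viewer in Python. The `grep` function name and the sample log lines are made up; the point is that a user-supplied pattern can express things a plain word search can’t, such as “a duration of four or more digits.”

```python
import re
from typing import Iterable, Iterator, Tuple

def grep(lines: Iterable[str], user_pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each line containing a match of user_pattern."""
    pattern = re.compile(user_pattern)  # raises re.error if the user's pattern is invalid
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):  # search: the match may be anywhere in the line
            yield number, line

# Hypothetical sample log lines.
log_lines = [
    "10:00:01 INFO  request id=17 took 12ms",
    "10:00:02 ERROR request id=18 took 950ms",
    "10:00:03 WARN  request id=19 took 1204ms",
]

# Every request that took 1000ms or more, i.e. a duration with four or more digits.
slow = list(grep(log_lines, r"took \d{4,}ms"))
print(slow)  # [(3, '10:00:03 WARN  request id=19 took 1204ms')]
```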
Let’s switch gears now, and I’ll explain a bit about some of the ways you can write a regular expression.
The main thing to understand is that certain characters have special meanings, and all other characters are matched as-is.
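As one example, sketched in Python (my choice of language): the dot is a special character that matches any single character except a newline. If you meant a literal dot and forgot to escape it, the pattern quietly matches more than you intended. Python’s `re.escape` escapes every special character for you. The version string is made up.

```python
import re

version = "3.14"

# Unescaped: "." is special, so the pattern "3.14" also matches "3914".
loose = re.search(version, "build 3914") is not None

# Escaped: re.escape makes the dot literal, so "3914" no longer matches.
strict = re.search(re.escape(version), "build 3914") is not None

print(loose, strict)  # True False
```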
If you use the word “cat” all in lower-case as a regex, then the only thing it will match is those three characters. Depending on how you use the regex, it may find the three characters somewhere in a longer string if you’re searching instead of matching.
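In Python’s `re` module (my choice of language for illustration), those two modes map to `re.fullmatch`, which requires the whole string to fit the pattern, and `re.search`, which finds the pattern anywhere in the string.

```python
import re

# Matching: the entire string must be exactly "cat".
whole = re.fullmatch(r"cat", "cat") is not None               # True
whole_long = re.fullmatch(r"cat", "concatenate") is not None  # False

# Searching: "cat" may appear anywhere inside the longer string.
found = re.search(r"cat", "concatenate")

print(whole, whole_long, found.start())  # True False 3
```

Python also has `re.match`, which anchors only at the start of the string, so `re.match(r"cat", "catalog")` succeeds even though the string is longer than three characters.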
What if you want to find either “Cat” with a capital C or “cat” all in lower-case? You surround both the upper- and lower-case c with square brackets, followed by the a and t characters. Square brackets match exactly one of the characters inside them.
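Here is that pattern, `[Cc]at`, in Python (my choice of language; the sample sentence is made up), next to the case-insensitive flag for comparison.

```python
import re

text = "Cat, cat, and CAT walked in."

# [Cc] matches exactly one character, either "C" or "c"; "at" stays literal.
bracketed = re.findall(r"[Cc]at", text)

# Ignoring case is a broader tool: every letter becomes case-insensitive.
any_case = re.findall(r"cat", text, flags=re.IGNORECASE)

print(bracketed)  # ['Cat', 'cat']
print(any_case)   # ['Cat', 'cat', 'CAT']
```

“CAT” is missed by the bracketed version because only the first letter was given a choice; the a and t must still be lower-case.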
The only other special characters I can easily describe here are:
◦ The star, which matches zero or more of whatever comes before it.
◦ The question mark, which makes whatever comes before it optional, so it can either appear once or not at all.
◦ The plus sign, which matches one or more of whatever comes before it.
◦ The caret, sometimes called the hat character, which matches the beginning of a line of text. You can use this to make sure what you’re looking for comes at the beginning of a new line.
◦ And the dollar sign, which matches the end of a line.
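Here are those five characters in action, sketched in Python with made-up sample strings. One detail worth knowing: in Python, `^` matches only at the start of the whole string by default, and `$` only at its end (or just before a final newline). The `re.MULTILINE` flag makes them match at the start and end of every line.

```python
import re

# ?  "colou?r": the u is optional (zero or one of it).
optional = re.findall(r"colou?r", "color colour colouur")

# *  "ab*c": zero or more b's.   +  "ab+c": one or more b's.
star = re.findall(r"ab*c", "ac abc abbc")
plus = re.findall(r"ab+c", "ac abc abbc")

# ^ and $ with re.MULTILINE: keep only whole lines that start with ERROR.
log = "ERROR disk full\nINFO ok\nERROR timeout"
errors = re.findall(r"^ERROR.*$", log, flags=re.MULTILINE)

print(optional)  # ['color', 'colour']
print(star)      # ['ac', 'abc', 'abbc']
print(plus)      # ['abc', 'abbc']
print(errors)    # ['ERROR disk full', 'ERROR timeout']
```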
There are many more special characters, including ones that let you define groups and compare one part of the matching text with another part.
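A short Python sketch of both ideas (the sample sentence is made up): parentheses define a group, and `\1`, called a backreference, requires the exact text that group matched to appear again. Together they find doubled words, and the same group can be reused in a replacement.

```python
import re

text = "This is is a test of the the regex."

# (\w+) captures a word, \s+ skips whitespace, \1 must repeat that exact word.
# The leading \b stops the group from starting mid-word (the "is" inside "This").
doubled = r"\b(\w+)\s+\1\b"

repeats = re.findall(doubled, text)   # findall returns the captured group
fixed = re.sub(doubled, r"\1", text)  # keep one copy of each doubled word

print(repeats)  # ['is', 'the']
print(fixed)    # This is a test of the regex.
```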
By using regular expressions, you can add a lot of flexibility to your application and simplify tasks that would otherwise need complicated code. Just be aware that the cost is sometimes a lot of complexity in the regular expressions themselves, and that complexity can hide bugs if you don’t really understand them.