A regular expression, or regex, lets you perform advanced text operations: matching, searching, tokenizing, and replacing.
To begin, why would you ever want to use regexes? This is a decision you’ll want to be careful about. A regular expression can easily take several minutes to study even when it’s only about 20 characters long. Go much beyond that and you’ll probably need to draw some diagrams to understand what it means. Just one little character can drastically change the meaning.
To say that a regex is terse would be a big understatement. But this terseness is also its biggest strength. It can pack a lot of subtle meaning into what may at first look like what you would get if a cat walked across your keyboard. Yeah, regexes can be cryptic.
Here are the four basic uses that I’d consider for a regex.
- If you have a short string or piece of text and need to verify that it matches a known good pattern, then depending on the complexity of the pattern, a regex might be a good idea. Be careful, though. I’ve seen regular expressions that try to validate email addresses get completely out of control.
- If you have a long string, and this could even be a text file, that you want to search for some pattern, either to find the pattern or just to determine whether it exists, then you could find a regex useful.
- If you have a string that you want to divide into several smaller strings based on one or more patterns then you might want to use a regex. This is called tokenizing.
- If you want to not just find some matching text but then replace it with something else, then a regex could help.
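To make those four uses a bit more concrete, here is a minimal sketch using Python's standard `re` module. The patterns and sample strings (a ZIP-code check, an error line in a log, a delimited list, runs of spaces) are made up for illustration and are not from any particular application.

```python
import re

# All patterns and sample text below are hypothetical, chosen only to illustrate.

# 1. Matching: verify that a short string fits a known good pattern.
zip_pattern = re.compile(r"\d{5}(-\d{4})?")
is_valid = zip_pattern.fullmatch("98052-1234") is not None

# 2. Searching: find a pattern somewhere inside a longer piece of text.
log_text = "started ok\nERROR 42: disk full\nstopped"
found = re.search(r"ERROR (\d+)", log_text)
error_code = found.group(1) if found else None

# 3. Tokenizing: divide a string into smaller strings around a pattern.
tokens = re.split(r"[,;]\s*", "red, green;blue,  yellow")

# 4. Replacing: find matching text and substitute something else.
collapsed = re.sub(r"\s+", " ", "too    many   spaces")
```

Notice that even these small patterns take a moment to read, which is the terseness trade-off in action.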
There are no hard rules that I can give you for when to use a regex in your code. I consider regular expressions all the time, and I can tell you that I rarely end up using them. I think where they really shine is when the pattern needs to be configurable.
A lot of books will give you an example of using a regex to match HTML or XML tags, and while this could be legitimate, I think it leads developers into thinking that regular expressions are good at this sort of thing. Let me be clear: if you have a complicated file with lots of unknown structure, then trying to parse it with regular expressions is a bad idea. If you need to parse it, then use a parser, even if this means you have to write your own.
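As a rough illustration of reaching for a parser instead, here is a small sketch that collects link targets from HTML using Python's standard `html.parser` module. The `LinkCollector` class and the sample markup are hypothetical. The point is only that the parser deals with attribute order, quoting styles, and letter case so that you don't have to encode all of that in a pattern.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper that gathers the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names already lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(
    '<p>See <a class="x" href="https://example.com/a">one</a> '
    "and <A HREF='/b'>two</A>.</p>"
)
collector.close()
links = collector.links
```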
I think a regex is much better suited for smaller tactical work and even then, you should only use it when the flexibility that you get from using a regex is critical. One of the best examples I’ve seen of where you might really get value from a regex is putting simple regular expressions in a config file for your application. This is great because it lets you adapt the behavior of your application if necessary. You probably won’t need to change the regex often if at all but it’s nice to know that it can be changed if needed.
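Here is one way the config file idea might look, sketched with Python's standard `configparser` and `re` modules. The section name `filters`, the key `ticket_id`, the pattern itself, and the `load_pattern` helper are all assumptions made up for this example. A real application would read the text from its own config file.

```python
import configparser
import re

# Hypothetical config contents; normally this would be read from a file on disk.
CONFIG_TEXT = r"""
[filters]
ticket_id = [A-Z]{2,5}-\d+
"""


def load_pattern(config_text, section, key):
    """Compile the regex stored under section/key in the given config text."""
    # interpolation=None keeps a '%' inside a pattern from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return re.compile(parser[section][key])


ticket_pattern = load_pattern(CONFIG_TEXT, "filters", "ticket_id")
ticket_ids = ticket_pattern.findall("Fixed in PROJ-123 and OPS-7, see notes.")
```

If the ticket format ever changes, only the line in the config file needs to change and the code stays the same. Since the pattern now comes from outside the code, it would be sensible to handle `re.error` in case someone saves an invalid pattern.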