Regular Expression: What It Means

Regex commands aren't terribly hard to work with

Regular Expressions
Regex Groups in JavaScript.

What Is a Regular Expression?

Regex, or regular expressions, are a pattern matching markup that programmers use to search for certain patterns in text. Regular expressions can search for just about anything, depending on how you structure them; they're used all over by programmers because they're invaluable for helping computers sort through data quickly and filter out nonsense that could otherwise cause errors.

Regular expressions tend to look scary, especially for non-programmers. Just look at this:

[a-zA-Z0-9_.+-]+@[a-zA-Z0-9_-]+\.[a-zA-Z0-9_.-]+

Realistically, it's actually not that bad; that one matches email addresses. As you'll see, expressions can be broken down into individual characters, all of which tell the program what to look for.

Regular expressions are almost universal. The same general syntax applies across all languages with only slight variations here and there. This guide contains examples from both Python and JavaScript as well as plain old regex. If you work with a different language, don't worry. Nearly everything will apply to your language of choice, too.

Regex Basics

Technically, there aren't a lot of things that couldn't be considered regular expressions, because literal strings of text are really simple ones. If you were to use 'abcde' as a regular expression, the programming language would search for that exact string.

The first more dynamic matching character to take a look at is the '.' character. In this context, the dot character is a wildcard. If you're searching with it, your program will return any character it finds as a match.

So, what if you want to look for a literal dot? That's not hard either. When you want to use a literal period, add a backslash before it, like this: '\.'

Backslash Characters

The backslash plays plenty more roles here, though. Most of the major regex characters include a backslash.

Regular Expression Digits
Finding Digits With Regex In Python.

Take a look at a few examples:

  • \d: Digits from 0 to 9
  • \w: "Word Characters" letters, digits, and underscore
  • \s: Whitespace characters, including tabs, newlines, and regular spaces

If you use the capital letter instead with any of these, you'll get the reverse. For example, '\D' gives you everything but digits.

Classes

The backslash characters are good, but they're still kind of rigid. Generally, you're going to want to match either letters, numbers, or a few special characters.

Regular Expression Classes
Using Regex Classes To Find Letters In Python.

Place the characters you want to be matched in a pair of square brackets '[]', and your program will match to any of them. This is called a regex class.

[abcd1234]

The above example is still inefficient. Instead, you can use a dash to specify a range; for example, all lowercase letters:

[a-z]

You can list ranges too. The below expression matches all letters and digits:

[a-zA-Z0-9]

If you're going to include the dash in your set of characters, tack it on at the end to prevent it from being evaluated. It works with other special characters, too.

[a-zA-Z0-9_.+-]

Like with the backslash characters, you can get the inverse result here, too. Place a '^' at the beginning of your class to exclude them from your results. This will exclude digits and several special characters from the results:

[^0-9_+.-]

Groups

Groups use a set of parenthesis to chunk apart your expression. They group data, allowing your program to target and use it. When a program strips the 'http://' from a web address, it's using regex groups to accomplish that. The regex lets it target certain criteria, and the groups let it separate sections out.

Regular Expression Groups
Regex Groups Help Find URLs in JavaScript.

Groups also let you choose between one pattern or another. They employ a single '|' to act as "or" in the expression. The expression below will match any of these: .com, .org, .net, .edu, or .gov.

\.(com|org|net|edu|gov)

Quantifiers

Quantifiers are exactly what they sound like. They tell the expression the quantity of a character you're looking for. These are the available quantifiers:

  • *: Zero or more
  • +: One or more
  • ?: Zero or one
  • {3}: The amount in the brackets

Place any of these quantifiers at the end of the character or class you want to specify the amount of. This example looks for standard seven-digit telephone numbers:

\d{3}[.*-]\d{3}[.*-]\d{4}

Anchors and Boundaries

Regular expressions let you search for patterns based on their position within a string of text or around a word.

Regular Expression Anchors
Regex Anchors Use Positioning To Find a Match in JavaScript.

These are your primary options:

  • ^: The beginning of a string
  • $: The end of a string
  • \b: Word boundary (the beginning or end of a word)

If you want to only find strings that begin with a letter, you can try:

^[a-zA-Z]

Say you want to find just the word "it," not words containing the letters I and T; that's where you'd use word boundaries.

\b(i|I)t\b

Final Thoughts

Regular expressions can save you a ton of headaches when programming. Imagine trying to write logic to accomplish any of the examples in this article. It'd be a terrible mess. Once you become comfortable with them, you'll probably find yourself really enjoying the power and flexibility of regex.