Introduction to Regular Expressions

Matching Basics

Start here for the basics of pattern matching.

Pattern Matches Explanation
a abc as easy as 123. Letters and digits match themselves. Here the pattern "a" matches every occurrence of the character "a".
. abc as easy as 123. The period "." matches any single character except a line break.
abc abc as easy as 123. A string of characters matches exactly that sequence of characters.
(abc|123) abc as easy as 123. Parentheses are used to signify groups. Inside the group, a pipe "|" can be used as an OR operator.
[abc] abc as easy as 123. A set in square brackets matches exactly one character from the set. For instance, this will match a or b or c. Equivalent to (a|b|c).
[a-c1-3] abc as easy as 123. Ranges can be specified by using a dash.
[^a-c1-3] abc as easy as 123. A caret "^" immediately after the opening square bracket negates the set, so it matches any character not listed in the brackets. Notice that even spaces and the period are matched.
\. abc as easy as 123. Some characters have special meanings in regular expressions. These characters must be preceded by the escape character "\" for a literal match.

Special characters include: ^ $ ( ) . * + ? [ { \ |. The angle brackets < and > are special only in some flavors.
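
To see the basic patterns in action, here is a minimal sketch using Python's standard re module. Python is only one regex flavor, chosen for illustration, and the variable names and the price string are made up for this example.

```python
import re

text = "abc as easy as 123."

# A literal character matches itself: every "a" in the sample.
literal_a = re.findall(r"a", text)
# "." matches any single character (except a line break by default).
any_char = re.findall(r".", text)
# A string of characters matches that exact sequence.
sequence = re.findall(r"abc", text)
# Alternation inside a group: "abc" OR "123".
either = re.findall(r"(abc|123)", text)
# A bracketed set matches one character from the set.
char_set = re.findall(r"[abc]", text)
# Ranges inside a set: a through c, or 1 through 3.
ranges = re.findall(r"[a-c1-3]", text)
# A leading caret negates the set.
negated = re.findall(r"[^a-c1-3]", text)
# An escaped period matches only a literal ".".
literal_dot = re.findall(r"\.", text)

# re.escape adds the backslashes for you when the text to match
# contains special characters (the price string is a made-up example).
price = "$4.99 (approx.)"
found = re.search(re.escape(price), "Total: $4.99 (approx.) today")

print(len(literal_a), len(any_char), sequence, either)
print(char_set, ranges, negated, literal_dot)
print(found.group())
```

Escaping by hand works for short patterns; re.escape is handy when the literal text comes from somewhere else and may contain any of the special characters.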

Characters by Type

Sometimes, you don't care what the number is - you just care if it is a number.

Pattern Matches Explanation
\d abc as easy as 123. Matches any digit character. This is equivalent to [0-9].
\D abc as easy as 123. Any character that is not a digit. Equivalent to [^0-9].
\w abc as easy as 123. Any alphanumeric character or the underscore. Equivalent to [A-Za-z0-9_] for ASCII text.
\W abc as easy as 123. Any character that is not alphanumeric or an underscore. Equivalent to [^A-Za-z0-9_] for ASCII text.
\n abc as easy as 123. The line feed (new line) character. Unix-style line endings are \n alone; Windows-style line endings are \r\n.
\r abc as easy as 123. The carriage return character. It is a different character from \n, although the two often appear together as \r\n.
\s abc as easy as 123. Any whitespace character, including space, tab, line feed and carriage return.
\S abc as easy as 123. Any single non-whitespace character. Roughly equivalent to [^ \t\r\n].
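
The character types above can be tried out with a short sketch like the following, again assuming Python's re module as the flavor. The variable names and the "one\r\ntwo" string are illustrative only.

```python
import re

text = "abc as easy as 123."

digits = re.findall(r"\d", text)       # the three digits
non_digits = re.findall(r"\D", text)   # everything else
word_chars = re.findall(r"\w", text)   # letters, digits, underscore
non_word = re.findall(r"\W", text)     # the spaces and the period
spaces = re.findall(r"\s", text)       # whitespace only
non_spaces = re.findall(r"\S", text)   # anything but whitespace

# \n and \r are two separate characters; a Windows-style line
# ending contains one of each.
windows_line = "one\r\ntwo"
line_feeds = re.findall(r"\n", windows_line)
carriage_returns = re.findall(r"\r", windows_line)

print(digits, len(non_digits), len(word_chars), non_word)
print(len(spaces), len(non_spaces), line_feeds, carriage_returns)
```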

Position

You can also specify where the match needs to occur.

Pattern Matches Explanation
^a abc as easy as 123. Starting the expression with a caret "^" means that the match must start at the beginning of the text. Notice that only the first "a" character is matched.
123\.$ abc as easy as 123. Ending the expression with the dollar sign "$" means that the match must finish at the end of the text.
\ba abc as easy as 123. Use "\b" to match at a word boundary, the position between a word character and a non-word character (such as a space) or at either end of the text. Notice that the "a" in easy is not matched.
\Ba abc as easy as 123. Use "\B" to match at a position that is not a word boundary. Here only the "a" in easy is matched.
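
A quick way to check the position patterns is to look at where each match starts. The sketch below assumes Python's re module; starts is a hypothetical helper written for this example, not part of any library.

```python
import re

text = "abc as easy as 123."

def starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

at_beginning = starts(r"^a")      # only the "a" at index 0
at_end = starts(r"123\.$")        # "123." at the very end
word_initial = starts(r"\ba")     # "a" at the start of a word
word_internal = starts(r"\Ba")    # "a" inside a word ("easy")

print(at_beginning, at_end, word_initial, word_internal)
# prints: [0] [15] [0, 4, 12] [8]
```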

Quantifiers

How many matches are allowed? Let me count the ways...

Pattern Matches Explanation
ab* a b ab abb abbb The asterisk specifies that the previous character can appear 0 or more times in a row.
ab+ a b ab abb abbb The plus specifies that the previous character must appear 1 or more times in a row.
ab? a b ab abb abbb The question mark means that the preceding character is optional. Note that a(b+)? is equivalent to ab*.
ab{2} a b ab abb abbb The preceding character must appear exactly the number of times given in the braces.
ab{2,4} a b ab abb abbb Sets a range for how many times the preceding character must appear. The second number is optional. For instance, {2,} means to match 2 or more times.
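
The quantifier rows can be reproduced with a few lines of code. This is a small sketch assuming Python's re module; the dictionary name results is illustrative. The last pattern uses a non-capturing group, (?:...), so that findall returns whole matches when checking the equivalence noted for ab*.

```python
import re

text = "a b ab abb abbb"

results = {
    "ab*": re.findall(r"ab*", text),          # "a" then zero or more "b"
    "ab+": re.findall(r"ab+", text),          # "a" then one or more "b"
    "ab?": re.findall(r"ab?", text),          # "a" then an optional "b"
    "ab{2}": re.findall(r"ab{2}", text),      # "a" then exactly two "b"
    "ab{2,4}": re.findall(r"ab{2,4}", text),  # "a" then two to four "b"
    "ab{2,}": re.findall(r"ab{2,}", text),    # "a" then two or more "b"
    # Same matches as ab*, written with an optional group.
    "a(?:b+)?": re.findall(r"a(?:b+)?", text),
}

for pattern, matches in results.items():
    print(pattern, matches)
```

Note that ab{2} finds "abb" twice: once as the whole word "abb" and once as the first three characters of "abbb".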