Introduction to Regular Expressions
Matching Basics
Start here for the basics of pattern matching.
| Pattern | Matches | Explanation |
|---|---|---|
| a | abc as easy as 123. | Digits and letters match themselves. In this example, the pattern "a" matches every instance of the single character "a". |
| . | abc as easy as 123. | The period "." matches any single character except a line break (by default). |
| abc | abc as easy as 123. | A string of characters matches exactly that sequence of characters. |
| (abc|123) | abc as easy as 123. | Parentheses are used to signify groups. Inside the group, a pipe "|" can be used as an OR operator. |
| [abc] | abc as easy as 123. | Square brackets define a character class, which matches exactly one character from the set inside. For instance, this will match a or b or c. Equivalent to (a|b|c). |
| [a-c1-3] | abc as easy as 123. | Ranges can be specified by using a dash. |
| [^a-c1-3] | abc as easy as 123. | A caret "^" immediately after the opening square bracket negates the class, so it matches any single character not listed in the brackets. Notice that even spaces and the period are matched. |
| \. | abc as easy as 123. | Some characters have special meanings in regular expressions. These characters must be preceded by the escape character "\" for a literal match. Special characters include: ^ $ ( ) . * + ? [ { \ |. Some flavors also treat < and > specially. |
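
The rows above can be tried out directly. The following is a minimal sketch, assuming Python's standard `re` module as the regex flavor (the table itself does not name one); the variable names are illustrative only.

```python
import re

text = "abc as easy as 123."

# Literal character: every "a" in the text.
literal_a = re.findall(r"a", text)

# Unescaped period: any single character except a newline.
any_char = re.findall(r".", text)

# Alternation inside a group: "abc" or "123".
groups = re.findall(r"(abc|123)", text)

# Character class with ranges: one of a, b, c, 1, 2, 3 per match.
in_class = re.findall(r"[a-c1-3]", text)

# Negated class: any character NOT in a-c or 1-3,
# which includes the spaces and the final period.
not_in_class = re.findall(r"[^a-c1-3]", text)

# Escaped period: only the literal "." at the end.
literal_dot = re.findall(r"\.", text)

print(literal_a)
print(groups)
print("".join(in_class))
print(literal_dot)
```

Because the class and its negation are complements, every character of the sample text lands in exactly one of `in_class` or `not_in_class`.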
Characters by Type
Sometimes you don't care which number it is; you only care that it is a number.
| Pattern | Matches | Explanation |
|---|---|---|
| \d | abc as easy as 123. | Matches any digit character. This is equivalent to [0-9]. |
| \D | abc as easy as 123. | Any character that is not a digit. Equivalent to [^0-9]. |
| \w | abc as easy as 123. | Any word character: a letter, a digit, or the underscore. Equivalent to [A-Za-z0-9_] for ASCII text. |
| \W | abc as easy as 123. | Any character that is not a word character. Equivalent to [^A-Za-z0-9_] for ASCII text. |
| \n | abc as easy as 123. | The line feed character (LF). Unix-style line endings use \n alone. It is a different character from \r. |
| \r | abc as easy as 123. | The carriage return character (CR). Windows-style line endings use the pair \r\n. It is a different character from \n. |
| \s | abc as easy as 123. | Any white-space character, including space, tab, line feed and carriage return. |
| \S | abc as easy as 123. | Any single non-white-space character. Roughly equivalent to [^ \t\r\n]; most flavors also exclude form feed and vertical tab. |
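
As an illustration of the character types above, here is a short sketch that again assumes Python's `re` module. It also shows that \n and \r are distinct characters, and that in Python 3 the \w class covers non-ASCII letters unless the `re.ASCII` flag is given; other flavors may behave differently.

```python
import re

text = "abc as easy as 123."

digits = re.findall(r"\d", text)        # the three digits
non_digits = re.findall(r"\D", text)    # everything else
word_chars = re.findall(r"\w", text)    # letters, digits, underscore
non_word = re.findall(r"\W", text)      # the spaces and the period
spaces = re.findall(r"\s", text)        # white space only
non_spaces = re.findall(r"\S", text)    # everything but white space

# \n and \r are different characters: a Windows line ending has one of each.
windows_line = "one\r\ntwo"
line_feeds = re.findall(r"\n", windows_line)
carriage_returns = re.findall(r"\r", windows_line)

# An optional \r before \n handles both line-ending styles.
lines = re.split(r"\r?\n", "one\r\ntwo\nthree")

# In Python 3, \w on a str matches Unicode letters by default;
# re.ASCII restricts it to [A-Za-z0-9_].
unicode_word = re.findall(r"\w", "\u00e9")
ascii_word = re.findall(r"\w", "\u00e9", re.ASCII)

print(digits, len(word_chars), len(spaces))
print(lines)
```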
Position
You can also specify where the match needs to occur.
| Pattern | Matches | Explanation |
|---|---|---|
| ^a | abc as easy as 123. | Starting the expression with a caret "^" means that the match must start at the beginning of the input (or of a line, in multiline mode). Notice that only the first "a" character is matched. |
| 123\.$ | abc as easy as 123. | Ending the expression with the dollar sign "$" means that the match must finish at the end of the input (or of a line, in multiline mode). |
| \ba | abc as easy as 123. | "\b" matches a word boundary: the position between a word character and a non-word character (such as a space or line break), or the start or end of the input. Here it matches each "a" that begins a word. Notice that the "a" in easy is not matched. |
| \Ba | abc as easy as 123. | "\B" matches any position that is not a word boundary. Here it matches only the "a" inside easy. |
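
To see where the position patterns above actually match, this sketch reports match offsets rather than matched text. It assumes Python's `re` module; `re.MULTILINE` is Python's name for the multiline mode, and other flavors spell that option differently.

```python
import re

text = "abc as easy as 123."

# ^a: only an "a" at the very start of the input.
start_a = [m.start() for m in re.finditer(r"^a", text)]

# 123\.$: "123." must sit at the very end of the input.
end_match = re.search(r"123\.$", text)

# \ba: an "a" that begins a word.
boundary_a = [m.start() for m in re.finditer(r"\ba", text)]

# \Ba: an "a" that is NOT at a word boundary (the one inside "easy").
inner_a = [m.start() for m in re.finditer(r"\Ba", text)]

# With re.MULTILINE, ^ also matches at the start of each line.
two_lines = "abc\nabc"
single_mode = re.findall(r"^a", two_lines)
multi_mode = re.findall(r"^a", two_lines, re.MULTILINE)

print(start_a, boundary_a, inner_a)
print(end_match.span())
```

Together, `boundary_a` and `inner_a` account for all four "a" characters in the sample text, since every position is either a word boundary or not.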
Quantifiers
How many matches are allowed? Let me count the ways...
| Pattern | Matches | Explanation |
|---|---|---|
| ab* | a b ab abb abbb | The asterisk specifies that the previous character can appear 0 or more times in a row. |
| ab+ | a b ab abb abbb | The plus specifies that the previous character must appear 1 or more times in a row. |
| ab? | a b ab abb abbb | The question mark means that the preceding character is optional. Note that a(b+)? is equivalent to ab*. |
| ab{2} | a b ab abb abbb | The preceding character must appear exactly this many times. |
| ab{2,4} | a b ab abb abbb | Sets a range for how many times the preceding character must appear. The second number is optional. For instance, {2,} means to match 2 or more times. |
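
The quantifier rows can be checked against the same sample text. This is a minimal sketch assuming Python's `re` module; the helper name `matches` is hypothetical and exists only for this example.

```python
import re

text = "a b ab abb abbb"


def matches(pattern: str) -> list[str]:
    """Return the full text of every match of pattern in the sample text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


zero_or_more = matches(r"ab*")      # "a" followed by any number of "b"
one_or_more = matches(r"ab+")       # "a" followed by at least one "b"
optional = matches(r"ab?")          # "a" followed by at most one "b"
exactly_two = matches(r"ab{2}")     # "a" followed by exactly two "b"
two_to_four = matches(r"ab{2,4}")   # "a" followed by two to four "b"
two_or_more = matches(r"ab{2,}")    # "a" followed by two or more "b"

# a(b+)? accepts the same strings as ab*, as noted in the table.
grouped_optional = matches(r"a(b+)?")

print(zero_or_more)
print(one_or_more)
print(optional)
print(exactly_two)
```

Note that the lone "b" in the sample text is never matched, because every pattern requires a leading "a". The quantifiers here are greedy, so `ab{2}` still finds "abb" inside "abbb" while `ab{2,4}` takes all three "b" characters.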