A regular expression, or regex, is a string of characters that specifies a pattern. It's used across search, string validation, and lexical analysis. The languages that regular expressions describe are exactly the ones recognized by finite state automata.
Most programming languages support regex, either built in or through a library, including Python, C, C++, Java, JavaScript, and Dart.
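The link to finite state automata mentioned above can be made concrete with a tiny example. The sketch below, in Python, hand-codes a two-state deterministic automaton for the language "one or more z" and compares it with the equivalent regular expression zz*. The state numbering and the names TRANSITIONS, ACCEPTING, dfa_accepts, and regex_accepts are illustrative choices, not part of any library.

```python
import re

# Transition table of a hypothetical two-state automaton for "one or more z".
# State 0 is the start state; state 1 is the only accepting state.
TRANSITIONS = {
    (0, "z"): 1,
    (1, "z"): 1,
}
ACCEPTING = {1}


def dfa_accepts(text: str) -> bool:
    """Run the automaton over text and report whether it ends in an accepting state."""
    state = 0
    for ch in text:
        key = (state, ch)
        if key not in TRANSITIONS:
            return False  # no transition defined for this character: reject
        state = TRANSITIONS[key]
    return state in ACCEPTING


def regex_accepts(text: str) -> bool:
    """The same language, expressed as the regular expression zz*."""
    return re.fullmatch(r"zz*", text) is not None
```

Both functions should agree on every input, for example accepting "zzz" and rejecting both the empty string and "za".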
Basics of regex
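A single character is itself a regular expression, and so is a sequence of characters. The alternation (boolean or) operator | matches either of two patterns, * allows zero or more repetitions, + requires one or more, ? makes the preceding item optional, and parentheses group a subpattern:
- hi matches {hi}
- hi|hello matches {hi, hello}
- zz* matches {z, zz, zzz, ...}
- (haha)+ matches {haha, hahahaha, ...}
- analy(s|z)e matches {analyse, analyze}
- analog(ue)? matches {analog, analogue}
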
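The example patterns above can be checked mechanically. This is a minimal sketch in Python, assuming the standard re module. The helper names matches_whole and check_examples, and all of the non-matching strings, are illustrative additions rather than part of the original examples. The alternation is written hi|hello without spaces, because most engines treat a space inside a pattern as a literal character.

```python
import re

# Each entry: (pattern, strings expected to match, strings expected not to match).
# The non-matching strings are illustrative additions.
EXAMPLES = [
    (r"hi", ["hi"], ["hello", "h"]),
    (r"hi|hello", ["hi", "hello"], ["hey"]),
    (r"zz*", ["z", "zz", "zzz"], [""]),
    (r"(haha)+", ["haha", "hahahaha"], ["hahaha"]),
    (r"analy(s|z)e", ["analyse", "analyze"], ["analyxe"]),
    (r"analog(ue)?", ["analog", "analogue"], ["analogu"]),
]


def matches_whole(pattern: str, text: str) -> bool:
    """True when the entire string, not just a part of it, fits the pattern."""
    return re.fullmatch(pattern, text) is not None


def check_examples() -> bool:
    """Verify every expected match and non-match listed in EXAMPLES."""
    for pattern, should_match, should_not_match in EXAMPLES:
        if not all(matches_whole(pattern, s) for s in should_match):
            return False
        if any(matches_whole(pattern, s) for s in should_not_match):
            return False
    return True
```

Using re.fullmatch rather than re.search keeps the check faithful to the "set of matched strings" reading: re.search would also report a hit for hahaha, since it contains haha.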
Key operators
| Pattern | Meaning |
|---------|---------|
| [a-z] | Any lowercase letter |
| ^word | String begins with "word" |
| word$ | String ends with "word" |
| \d | Any digit |
| . | Any character (except a newline, by default) |
| o{2} | Exactly two occurrences of "o" |
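The operators in the table can be tried out one by one. The following is a rough sketch using Python's re module. The function name demo_operators and the sample strings are made up for illustration.

```python
import re


def demo_operators() -> dict:
    """Map a short description of each check to its boolean result."""
    return {
        # [a-z]: one lowercase letter
        "class accepts q": re.fullmatch(r"[a-z]", "q") is not None,
        "class rejects Q": re.fullmatch(r"[a-z]", "Q") is None,
        # ^word: the string must begin with "word"
        "start anchor hits wordplay": re.search(r"^word", "wordplay") is not None,
        "start anchor misses password": re.search(r"^word", "password") is None,
        # word$: the string must end with "word"
        "end anchor hits password": re.search(r"word$", "password") is not None,
        "end anchor misses wordplay": re.search(r"word$", "wordplay") is None,
        # \d: any digit
        "digit found in room 7": re.search(r"\d", "room 7") is not None,
        "no digit in room": re.search(r"\d", "room") is None,
        # . : any single character
        "dot accepts #": re.fullmatch(r".", "#") is not None,
        # o{2}: exactly two consecutive o characters
        "o{2} accepts oo": re.fullmatch(r"o{2}", "oo") is not None,
        "o{2} rejects ooo as a whole": re.fullmatch(r"o{2}", "ooo") is None,
        "o{2} still found inside ooo": re.search(r"o{2}", "ooo") is not None,
    }
```

Every value in the returned dictionary should be True. The last two entries show that "exactly two" describes the matched portion: a search still finds oo inside a longer run unless the pattern is anchored or matched against the whole string.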
Regex for validation
Say we want to validate emails for a specific domain, allowing only - and _ as special characters:
[A-Za-z0-9_-]+@mycompany\.com
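The character class [A-Za-z0-9_-] matches a single alphanumeric character, underscore, or dash, and the + after it requires one or more such characters. The \. escapes the dot, since . alone matches any character. To validate a whole string rather than find the pattern somewhere inside it, anchor the pattern with ^ and $ or use the engine's full-match function.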
You can interact with this example on regex101.
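The validation pattern above can be wrapped in a small function. This is a minimal sketch in Python, assuming the standard re module. The names EMAIL_RE and is_company_email and the sample addresses are hypothetical, chosen only for illustration.

```python
import re

# The pattern from the text, compiled once for reuse.
EMAIL_RE = re.compile(r"[A-Za-z0-9_-]+@mycompany\.com")


def is_company_email(address: str) -> bool:
    """True when the whole address fits the pattern, with nothing before or after it."""
    return EMAIL_RE.fullmatch(address) is not None
```

With this sketch, is_company_email("jane_doe-1@mycompany.com") should return True, while "jane.doe@mycompany.com" (a dot in the local part), "jane@mycompanyXcom" (no literal dot), and "jane@mycompany.com.evil.org" (trailing text) should all return False. The last case is why fullmatch is used here: a plain search would accept any string that merely contains a valid address.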