Basic Characters

  • . - Matches any character except newline (unless dotall mode is on)
  • \d - Matches any digit (0-9)
  • \D - Matches any non-digit
  • \w - Matches any word character (alphanumeric + underscore)
  • \W - Matches any non-word character
  • \s - Matches any whitespace character (space, tab, newline, carriage return, form feed, vertical tab)
  • \S - Matches any non-whitespace character
  • \t - Matches a tab character
  • \n - Matches a newline character
  • \r - Matches a carriage return
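
A quick way to see these classes in action is Python's standard re module. The sketch below is illustrative only, and the sample strings are made up. It runs a few of the classes above with re.findall and re.sub:

```python
import re

text = "Order 66\tshipped_on 2024\nTotal: 42"

# \d matches one digit at a time
digits = re.findall(r"\d", "a1b22")
# \w+ matches runs of letters, digits, and underscores
words = re.findall(r"\w+", text)
# \s matches each whitespace character: spaces, the tab, and the newline
spaces = re.findall(r"\s", text)
# \D matches every non-digit, so deleting those keeps only the digits
phone_digits = re.sub(r"\D", "", "(555) 123-4567")
# . does not match a newline by default, so only the first pair matches
dotted = re.findall(r"a.b", "a-b a\nb")

print(digits)        # ['1', '2', '2']
print(words)         # ['Order', '66', 'shipped_on', '2024', 'Total', '42']
print(spaces)        # [' ', '\t', ' ', '\n', ' ']
print(phone_digits)  # 5551234567
print(dotted)        # ['a-b']
```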

Anchors

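  • ^ - Start of the string (or start of each line in multi-line mode)
  • $ - End of the string (or end of each line in multi-line mode)
  • \b - Word boundary: the position between a word character and a non-word character, or at the start or end of the string next to a word character (e.g., \bcat\b matches “cat” on its own but not the “cat” in “catalog”)
  • \B - Non-word boundary: any position that is not a word boundary (e.g., the position between two word characters)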
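
The following Python sketch uses the standard re module and made-up sample words. It shows how multi-line mode changes ^ and $, and how \b and \B differ:

```python
import re

lines = "cat\ncatalog\nbobcat"

# Without MULTILINE, ^ and $ refer to the whole string, which spans three lines
whole = re.findall(r"^\w+$", lines)
# With MULTILINE, ^ and $ also match at each line break
per_line = re.findall(r"^\w+$", lines, re.MULTILINE)

candidates = ["cat", "catalog", "bobcat", "concatenate"]
# \b on both sides: "cat" must stand alone as a word
standalone = [w for w in candidates if re.search(r"\bcat\b", w)]
# \B on both sides: "cat" must sit between two word characters
embedded = [w for w in candidates if re.search(r"\Bcat\B", w)]

print(whole)       # []
print(per_line)    # ['cat', 'catalog', 'bobcat']
print(standalone)  # ['cat']
print(embedded)    # ['concatenate']
```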

Quantifiers

  • * - 0 or more occurrences (e.g., a* matches "", “a”, “aa”, etc.)
  • + - 1 or more occurrences (e.g., a+ matches “a”, “aa”, etc.)
  • ? - 0 or 1 occurrence (optional) (e.g., a? matches "" or “a”)
  • {n} - Exactly n occurrences (e.g., a{3} matches “aaa”)
  • {n,} - n or more occurrences (e.g., a{2,} matches “aa”, “aaa”, etc.)
  • {n,m} - Between n and m occurrences (e.g., a{2,4} matches “aa”, “aaa”, or “aaaa”)
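
To check each quantifier's examples, the sketch below uses Python's re.fullmatch. It succeeds only when the pattern consumes the entire sample string, so it shows exactly which strings each quantifier accepts:

```python
import re

samples = ["", "a", "aa", "aaa", "aaaa", "aaaaa"]
patterns = ["a*", "a+", "a?", "a{3}", "a{2,}", "a{2,4}"]

# fullmatch succeeds only when the pattern consumes the whole sample,
# which isolates exactly what each quantifier allows
results = {
    p: [s for s in samples if re.fullmatch(p, s)]
    for p in patterns
}

for p, matched in results.items():
    print(f"{p:7} {matched}")
```

For example, a{2,4} accepts "aa", "aaa", and "aaaa" but rejects "aaaaa", because fullmatch cannot stop partway through the string.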

Groups and Ranges

  • (abc) - Capturing group, matches exactly “abc” and stores it for backreferences
  • (?:abc) - Non-capturing group, matches “abc” without storing it for backreferences
  • [abc] - Character set, matches “a”, “b”, or “c”
  • [^abc] - Negated character set, matches any character except “a”, “b”, or “c”
  • [a-z] - Character range, matches any lowercase letter from a to z
  • [A-Z] - Matches any uppercase letter from A to Z
  • [0-9] - Matches any digit from 0 to 9
  • [a-zA-Z0-9_] - Matches any ASCII letter, digit, or underscore (the ASCII equivalent of \w)
  • (a|b) - Alternation, matches either “a” or “b” (| is the OR operator; the parentheses also capture)
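
A short Python sketch of groups and sets with the standard re module follows. One Python-specific detail is worth knowing: when a pattern has a capturing group, re.findall returns the group's text instead of the whole match. That makes the capturing vs non-capturing difference easy to see:

```python
import re

# A capturing group stores what it matched as group 1
m = re.search(r"x(abc)y", "xabcy")
captured = m.group(1)
# A non-capturing group matches the same text but stores nothing
n = re.search(r"x(?:abc)y", "xabcy")
stored = n.groups()

# Character sets and ranges each match exactly one character
in_set = re.findall(r"[abc]", "abcdef")
not_in_set = re.findall(r"[^abc]", "abcdef")
capitalized = re.findall(r"[A-Z][a-z]+", "Hello World")

# With a capturing group, findall returns only the group's text...
group_only = re.findall(r"gr(a|e)y", "gray grey groy")
# ...while a non-capturing group returns the whole match
whole_match = re.findall(r"gr(?:a|e)y", "gray grey groy")

print(captured, stored)         # abc ()
print(in_set, not_in_set)       # ['a', 'b', 'c'] ['d', 'e', 'f']
print(capitalized)              # ['Hello', 'World']
print(group_only, whole_match)  # ['a', 'e'] ['gray', 'grey']
```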

Special Characters

  • \ - Escape character: makes the next special character literal (e.g., \. matches a period, and \\ matches a literal backslash)
  • \n - Newline
  • \t - Tab
  • \r - Carriage return
  • \f - Form feed
  • \v - Vertical tab
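
The effect of escaping is easiest to see side by side. This Python sketch uses raw strings (r"...") so the backslashes reach the regex engine unchanged. It also uses re.escape, which adds the needed backslashes automatically:

```python
import re

# Unescaped, "." matches any character, so "3x14" slips through
loose = re.fullmatch(r"3.14", "3x14") is not None
# Escaped, "\." matches only a literal period
strict = re.fullmatch(r"3\.14", "3x14") is not None

# In a raw string, r"\\" is a regex that matches one literal backslash
backslashes = re.findall(r"\\", r"C:\temp\new")

# \t, \n, \r, \f, \v inside a pattern match the corresponding control characters
controls = re.findall(r"[\t\n\r\f\v]", "a\tb\nc\fd\ve")

# re.escape inserts the backslashes needed to match text literally
pattern = re.escape("1+1=2?")
literal_ok = re.fullmatch(pattern, "1+1=2?") is not None

print(loose, strict)     # True False
print(len(backslashes))  # 2
print(controls)          # ['\t', '\n', '\x0c', '\x0b']
print(literal_ok)        # True
```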

Lookahead and Lookbehind

  • (?=...) - Positive lookahead: succeeds if ... follows the current position, without consuming it (e.g., \d(?=abc) matches a digit followed by “abc”; only the digit is part of the match)
  • (?!...) - Negative lookahead: succeeds if ... does not follow (e.g., \d(?!abc) matches a digit not followed by “abc”)
  • (?<=...) - Positive lookbehind: succeeds if ... precedes the current position (e.g., (?<=abc)\d matches a digit preceded by “abc”)
  • (?<!...) - Negative lookbehind: succeeds if ... does not precede (e.g., (?<!abc)\d matches a digit not preceded by “abc”); some flavors, such as Python's re, require lookbehind contents to be fixed-width
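
Here are all four lookarounds applied to one made-up string, sketched with Python's re module. The last part shows that Python rejects a variable-width lookbehind; other engines may behave differently:

```python
import re

s = "1abc 2xyz abc3 xyz4"

ahead_pos = re.findall(r"\d(?=abc)", s)    # digit followed by "abc"
ahead_neg = re.findall(r"\d(?!abc)", s)    # digit not followed by "abc"
behind_pos = re.findall(r"(?<=abc)\d", s)  # digit preceded by "abc"
behind_neg = re.findall(r"(?<!abc)\d", s)  # digit not preceded by "abc"

print(ahead_pos, ahead_neg)    # ['1'] ['2', '3', '4']
print(behind_pos, behind_neg)  # ['3'] ['1', '2', '4']

# Lookarounds consume nothing: the match is only the digit itself
matched_text = re.search(r"\d(?=abc)", s).group()
print(matched_text)  # 1

# Python's re rejects variable-width lookbehind such as (?<=a+)
try:
    re.compile(r"(?<=a+)\d")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```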

Backreferences

  • \1, \2, etc. - Matches the same text captured by the first, second, etc. capturing group (e.g., (\w)\1 matches a repeated word character such as “aa” or “bb”)
  • Named Capturing Groups: (?<name>...) - Assigns a name to a capturing group (e.g., (?<digit>\d)); Python's re writes this as (?P<name>...)
  • Named Backreference: \k<name> - Matches the same text as a named capturing group (e.g., \k<digit>); Python's re writes this as (?P=name)
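
A Python sketch of numbered and named backreferences with the re module follows. Because it is Python, the named forms use (?P<name>...) and (?P=name) rather than (?<name>...) and \k<name>:

```python
import re

# \1 must repeat exactly what group 1 captured
pairs = re.findall(r"(\w)\1", "aabbcd")
print(pairs)  # ['a', 'b']

# A classic use: finding an accidentally doubled word
doubled = re.search(r"\b(\w+)\s+\1\b", "this is is a test")
repeated_word = doubled.group(1)
print(repeated_word)  # is

# Python spells named groups (?P<name>...) and named backreferences (?P=name)
m = re.search(r"(?P<digit>\d)(?P=digit)", "a1 b22 c3")
print(m.group("digit"), m.group(0))  # 2 22
```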

Useful Shorthands

  • .* - Matches any character (except newline) 0 or more times
  • \w+ - Matches one or more word characters
  • \d{n} - Matches exactly n digits
  • \s* - Matches zero or more whitespace characters
  • [a-zA-Z]+ - Matches one or more alphabetic characters
  • [A-Za-z0-9_.+-] - Matches one character from the set commonly used in email addresses (add + after the class to match a run of them)
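
Here the shorthands above are applied to one made-up line with Python's re module. It also shows that .* stops at a newline:

```python
import re

line = "  id: user_42 code 12345 alpha"

words = re.findall(r"\w+", line)
five_digits = re.findall(r"\d{5}", line)
letters = re.findall(r"[a-zA-Z]+", line)
# ^\s* matches the leading whitespace, which re.sub then removes
trimmed = re.sub(r"^\s*", "", line)
# .* runs to the end of the line but stops at the newline
to_eol = re.search(r"id:.*", "id: a b\nnext").group()
# Adding + to the email character class matches whole runs of those characters
email_runs = re.findall(r"[A-Za-z0-9_.+-]+", "mail: jane.doe+tag@example.com")

print(words)        # ['id', 'user_42', 'code', '12345', 'alpha']
print(five_digits)  # ['12345']
print(letters)      # ['id', 'user', 'code', 'alpha']
print(trimmed)      # id: user_42 code 12345 alpha
print(to_eol)       # id: a b
print(email_runs)   # ['mail', 'jane.doe+tag', 'example.com']
```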

Common Regex Patterns

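  • Email: \b[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,}\b (a practical approximation, not a full RFC 5322 validator; {2,} allows top-level domains longer than six letters)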
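  • URL: https?://[^\s/$.?#][^\s]* (a loose approximation: the scheme followed by a run of non-whitespace characters; it also picks up trailing punctuation such as a sentence-ending period)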
  • Phone Number (US): \(?\d{3}\)?[-\s.]?\d{3}[-\s.]?\d{4}
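  • Date (YYYY-MM-DD): \b\d{4}-\d{2}-\d{2}\b (checks the format only; it also accepts impossible dates such as 2024-13-45)
  • IPv4 Address: \b(?:\d{1,3}\.){3}\d{1,3}\b (checks the format only; it also accepts octets above 255, such as 999.999.999.999)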
  • Postal Code (US): \b\d{5}(?:-\d{4})?\b
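  • Hex Color Code: #(?:[0-9a-fA-F]{3}){1,2}\b (matches 3- or 6-digit codes such as #fff or #1a2b3c; the trailing \b keeps it from matching the first digits of a longer string)
  • Username: ^[a-zA-Z0-9_]{3,16}$ (letters, digits, and underscores, 3 to 16 characters; the anchors require the whole input to match, not just part of it)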
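
The sketch below tries several of these patterns in Python. The is_valid helper is hypothetical and exists only for this example. It uses re.fullmatch, which behaves like wrapping the pattern in ^...$. The results show that the date and IPv4 patterns check format only, and that an unanchored username pattern can match inside a string that is too long:

```python
import re

# Patterns copied from the list above (username shown without anchors)
PATTERNS = {
    "phone": r"\(?\d{3}\)?[-\s.]?\d{3}[-\s.]?\d{4}",
    "date": r"\b\d{4}-\d{2}-\d{2}\b",
    "ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    "zip": r"\b\d{5}(?:-\d{4})?\b",
    "username": r"[a-zA-Z0-9_]{3,16}",
}

def is_valid(kind: str, value: str) -> bool:
    """Hypothetical helper: True if the whole value matches the named pattern."""
    # fullmatch behaves like wrapping the pattern in ^...$
    return re.fullmatch(PATTERNS[kind], value) is not None

print(is_valid("phone", "(555) 123-4567"))  # True
print(is_valid("zip", "12345-6789"))        # True
print(is_valid("zip", "1234"))              # False
# Format-only checks: these impossible values are still accepted
print(is_valid("date", "2024-13-45"))       # True
print(is_valid("ipv4", "999.999.999.999"))  # True

# Unanchored search finds a 16-character run inside an over-long name,
# while fullmatch correctly rejects it
too_long = "a" * 17
found_inside = re.search(PATTERNS["username"], too_long) is not None
print(found_inside, is_valid("username", too_long))  # True False
```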

Flags

  • i - Case insensitive (e.g., /abc/i matches “ABC” as well as “abc”)
  • g - Global search (e.g., /abc/g finds all matches instead of stopping at the first)
  • m - Multi-line mode (^ and $ match start/end of lines)
  • s - Dotall mode (. matches newline as well)
  • u - Unicode mode (e.g., /\u{1F600}/u matches a Unicode emoji)
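
The flags above use JavaScript's /pattern/flags notation. Python's re module passes flags as arguments instead, and the mapping in this sketch is a reasonable approximation. Python has no g flag: re.search returns the first match and re.findall returns all of them. str patterns are already Unicode-aware, so the closest thing to toggling u is opting out with re.ASCII. Flags can be combined with |, e.g. re.IGNORECASE | re.MULTILINE.

```python
import re

# i -> re.IGNORECASE
insensitive = re.findall(r"abc", "abc ABC", re.IGNORECASE)
# g -> no flag; search stops at the first match, findall returns them all
first = re.search(r"abc", "abc abc").group()
every = re.findall(r"abc", "abc abc")
# m -> re.MULTILINE
line_starts = re.findall(r"^\w", "one\ntwo", re.MULTILINE)
# s -> re.DOTALL
dotall_hit = re.search(r"a.b", "a\nb", re.DOTALL) is not None
# u -> str patterns are Unicode-aware by default; re.ASCII opts out
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", re.ASCII)

print(insensitive)                 # ['abc', 'ABC']
print(first, every)                # abc ['abc', 'abc']
print(line_starts)                 # ['o', 't']
print(dotall_hit)                  # True
print(unicode_words, ascii_words)  # ['café'] ['caf']
```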

Tips for Using Regex

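  • Escape Special Characters: Put a backslash (\) before special characters like ., *, +, ?, etc. Inside ordinary string literals in many languages, that backslash must itself be written as \\ (Python's raw strings, r"...", avoid this).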
  • Capturing vs Non-Capturing Groups: Use (?:...) for non-capturing groups if you don’t need backreferences.
  • Testing Regex: Use tools like regex101 or RegExr to test and debug your expressions.
  • Use Comments for Complex Regex: In some flavors (e.g., Perl, PCRE, and Python's re.VERBOSE), the x flag ignores whitespace and allows # comments, making long patterns easier to read (e.g., / ( [A-Z] \w+ ) /x)
  • Break Down Complex Patterns: Break down your pattern into smaller parts to debug step by step.
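
The sketch below combines several of these tips in Python. The date pattern is assembled from small named pieces (YEAR, MONTH, and DAY are illustrative names) that can each be tested alone. It is then compiled with re.VERBOSE, Python's x flag, so it can carry comments. re.escape handles escaping when arbitrary text must be matched literally:

```python
import re

# Break the pattern into small pieces that can each be tested on their own
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>\d{2})"
DAY = r"(?P<day>\d{2})"

# re.VERBOSE (the x flag) ignores unescaped whitespace and allows # comments
DATE = re.compile(rf"""
    {YEAR}   # four-digit year
    -
    {MONTH}  # two-digit month
    -
    {DAY}    # two-digit day
""", re.VERBOSE)

parts = DATE.fullmatch("2024-05-17").group("year", "month", "day")
print(parts)  # ('2024', '05', '17')

# re.escape turns arbitrary text into a pattern that matches it literally
needle = "$5.00 (approx.)"
found = re.search(re.escape(needle), "Cost: $5.00 (approx.) today") is not None
print(found)  # True
```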

Regex Tools and Resources