Match HTML Tags Regex

A regex pattern to extract or strip HTML elements from a block of text.

How to Match HTML Tags with Regex

Extracting or stripping HTML tags from a string is a common requirement when cleaning up user-supplied text or processing web scraping results. For complex HTML, or for any input you do not trust, use a real HTML parser; regex is best kept for lightweight stripping of simple, predictable markup.

The Pattern Breakdown

The pattern <\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?> captures opening, closing, and self-closing tags, along with their attributes:

  • <\/?: Matches the opening bracket and an optional forward slash (for closing tags).
  • \w+: Matches the tag name (e.g., div, p, span).
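  • ((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*): Matches zero or more attributes, each a name optionally followed by = and a double-quoted, single-quoted, or unquoted value (like class="foo" or disabled). Because names are matched with \w+, hyphenated attributes such as data-id are not matched.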
  • \/?>: Matches the optional self-closing slash and the final closing bracket.
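
As a rough illustration, here is how the pattern might be used with Python's built-in re module. The helper names strip_tags and find_tags and the sample string are made up for this sketch; they are not part of the pattern itself.

```python
import re

# The pattern from the breakdown above, as a raw triple-quoted string
# so that both quote characters can appear unescaped.
TAG_PATTERN = re.compile(
    r"""<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>"""
)


def strip_tags(text):
    """Remove every tag the pattern matches, keeping the text in between."""
    return TAG_PATTERN.sub("", text)


def find_tags(text):
    """Return each matched tag as a full string (hypothetical helper)."""
    # finditer + group(0) is used because findall would return the
    # contents of the capturing groups rather than the whole tag.
    return [match.group(0) for match in TAG_PATTERN.finditer(text)]


sample = '<p class="intro">Hello <br/>world</p>'
print(strip_tags(sample))  # Hello world
print(find_tags(sample))   # ['<p class="intro">', '<br/>', '</p>']
```

Note that the pattern contains capturing groups, so re.findall would return group contents instead of whole tags; iterating over match objects avoids that.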

Why Regex is Not a Full HTML Parser

Remember the golden rule of web scraping: you cannot reliably parse HTML with regex alone. Because HTML is not a regular language, nested tags, comments, and malformed markup will eventually break this pattern. Use it only for lightweight stripping of simple markup, and rely on a proper parser or sanitizer for untrusted input.
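
The limits described above can be seen on a few small inputs. The sketch below is an assumption-laden demonstration, not a recommended sanitizer: strip_tags, TextExtractor, and text_via_parser are hypothetical names, and the parser comparison uses Python's standard html.parser module, which is an event-based parser rather than a full DOM.

```python
import re
from html.parser import HTMLParser

TAG_PATTERN = re.compile(
    r"""<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>"""
)


def strip_tags(text):
    return TAG_PATTERN.sub("", text)


class TextExtractor(HTMLParser):
    """Collects only text nodes; tags and comments are ignored."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def text_via_parser(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# 1. A hyphenated attribute name defeats \w+, so the opening tag survives.
print(strip_tags('<div data-id="1">kept</div>'))   # <div data-id="1">kept

# 2. Removing an inner tag can reassemble an outer one.
print(strip_tags("<scr<b>ipt>alert(1)</scr<b>ipt>"))  # <script>alert(1)</script>

# 3. Tags inside a comment are matched as if they were real markup.
print(strip_tags("<!-- <b>hidden</b> -->"))        # <!-- hidden -->

# The parser handles cases 1 and 3 as expected.
print(text_via_parser('<div data-id="1">kept</div>'))  # kept
print(repr(text_via_parser("<!-- <b>hidden</b> -->")))  # ''
```

The second case is the reason a single regex pass should not be treated as a security filter: the output can contain markup that was not present as a complete tag in the input.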