@gruber
Last active October 25, 2025 19:13

Revisions

  1. gruber revised this gist Feb 11, 2014. No changes.
  2. gruber renamed this gist Feb 11, 2014. 1 changed file with 8 additions and 2 deletions.
    @@ -1,9 +1,15 @@
    -Single-line version of pattern:
    +The regex patterns in this gist are intended to match any URLs,
    +including "mailto:foo@example.com", "x-whatever://foo", etc. For a
    +pattern that attempts only to match web URLs (http, https), see:
    +https://gist.github.com/gruber/8891611
    +
    +
    +# Single-line version of pattern:

     (?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))


    -Extended version of same pattern:
    +# Multi-line commented version of same pattern:

     (?xi)
     \b
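As an illustration of how the single-line pattern might be applied, here is a minimal Python sketch. The helper name `find_urls` and the sample strings are assumptions made for this example, not part of the gist; the pattern itself is copied unchanged from the revision above.

```python
import re

# The single-line pattern from the gist, copied verbatim. A raw
# triple-quoted string is used because the pattern contains both
# ' and " characters.
URL_PATTERN = re.compile(
    r"""(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))"""
)


def find_urls(text):
    """Return every URL-like substring (capture group 1) found in text.

    `find_urls` is a hypothetical helper written for this example.
    """
    return [m.group(1) for m in URL_PATTERN.finditer(text)]


# Sample text invented for illustration.
sample = "See http://example.com/foo_(bar), or www.example.com."
print(find_urls(sample))
```

In this sample the comma after the closing parenthesis and the final period are left out of the matches, because the last character class in the pattern refuses to end a URL on those punctuation marks.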
  3. gruber revised this gist Jul 27, 2010. 1 changed file with 25 additions and 16 deletions.
    41 changes: 25 additions & 16 deletions Liberal URL-matching Regex, Take Two
    @@ -1,25 +1,34 @@
    +Single-line version of pattern:
    +
    +(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
    +
    +
    +Extended version of same pattern:
    +
     (?xi)
     \b
    -( # Capture 1: entire matched URL
    +( # Capture 1: entire matched URL
     (?:
    -[a-z][\w-]+: # URL protocol and colon
    +[a-z][\w-]+: # URL protocol and colon
     (?:
    -/{1,3} # 1-3 slashes
    -| # or
    -[a-z0-9%] # Single letter or digit or '%'
    -# (Trying not to match e.g. "URI::Escape")
    +/{1,3} # 1-3 slashes
    +| # or
    +[a-z0-9%] # Single letter or digit or '%'
    +# (Trying not to match e.g. "URI::Escape")
     )
    -| # or
    -www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
    +| # or
    +www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
    +| # or
    +[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
     )
    -(?: # One or more:
    -[^\s()<>]+ # Run of non-space, non-()<>
    -| # or
    -\([^\s()<>]+\) # a matching set of parens
    +(?: # One or more:
    +[^\s()<>]+ # Run of non-space, non-()<>
    +| # or
    +\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
     )+
    -(?: # End with:
    -\([^\s()<>]+\) # a set of parens
    -| # or
    -[^`!()\[\]{};:'".,<>?«»“”‘’\s] # not a space or one of these punct chars
    +(?: # End with:
    +\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
    +| # or
    +[^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct char
     )
     )
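The change to the parenthesis handling in this revision can be illustrated with a small Python comparison. This is a sketch under stated assumptions: `TEMPLATE`, `ONE_LEVEL`, `TWO_LEVEL`, and `first_url` are names invented here, the template is a condensed form of the extended pattern in which only the parenthesis fragment varies, and the flags are passed as `re.VERBOSE | re.IGNORECASE` rather than through the inline `(?xi)`.

```python
import re

# Condensed form of the extended pattern. PARENS is a placeholder
# (an assumption of this sketch) that is swapped for one of the two
# parenthesis fragments below before compiling.
TEMPLATE = r"""
\b
(                                     # Capture 1: entire matched URL
  (?:
    [a-z][\w-]+: (?: /{1,3} | [a-z0-9%] )
    | www\d{0,3}[.]
    | [a-z0-9.\-]+[.][a-z]{2,4}/
  )
  (?: [^\s()<>]+ | PARENS )+
  (?: PARENS | [^\s`!()\[\]{};:'".,<>?«»“”‘’] )
)
"""

# Fragment used before this revision: one level of parentheses.
ONE_LEVEL = r"\([^\s()<>]+\)"
# Fragment introduced by this revision: balanced, up to two levels.
TWO_LEVEL = r"\(([^\s()<>]+|(\([^\s()<>]+\)))*\)"


def first_url(parens, text):
    """Return the first match using the given parenthesis fragment, or None."""
    pattern = re.compile(
        TEMPLATE.replace("PARENS", parens), re.VERBOSE | re.IGNORECASE
    )
    match = pattern.search(text)
    return match.group(1) if match else None


nested = "http://example.com/a_(b_(c))"  # sample URL invented for illustration
print(first_url(ONE_LEVEL, nested))
print(first_url(TWO_LEVEL, nested))
```

With the one-level fragment the match stops before the nested parentheses, while the two-level fragment captures the whole sample URL.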
  4. gruber revised this gist Dec 6, 2009. 1 changed file with 10 additions and 14 deletions.
    24 changes: 10 additions & 14 deletions Liberal URL-matching Regex, Take Two
    @@ -1,29 +1,25 @@
    -(?x)
    +(?xi)
     \b
     ( # Capture 1: entire matched URL
     (?:
    -[\w-]+: # URL protocol and colon
    +[a-z][\w-]+: # URL protocol and colon
     (?:
     /{1,3} # 1-3 slashes
    -| # or
    -[[:alpha:][:digit:]] # Single letter or digit
    -# (Try not to match, say "URI::Escape")
    +| # or
    +[a-z0-9%] # Single letter or digit or '%'
    +# (Trying not to match e.g. "URI::Escape")
     )
     | # or
    -www\d?[.] # "www.", "www1.", "www2.", etc.
    +www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
     )
     (?: # One or more:
     [^\s()<>]+ # Run of non-space, non-()<>
    -| # or
    +| # or
     \([^\s()<>]+\) # a matching set of parens
     )+
     (?: # End with:
    -\([^\s()<>]+\) # a set of parens
    -| # or
    -(?:
    -[^[:punct:]\s] # a non-punctuation non-space char
    -| # or
    -/ # a slash
    -)
    +\([^\s()<>]+\) # a set of parens
    +| # or
    +[^`!()\[\]{};:'".,<>?«»“”‘’\s] # not a space or one of these punct chars
     )
     )
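This revision replaces the POSIX-class ending with an explicit list of characters that may not end a URL. A rough Python sketch of that final character class in isolation follows; `TRAILING_OK` and `may_end_url` are hypothetical names chosen for this illustration only.

```python
import re

# The final character class from this revision, on its own: any single
# character that is neither whitespace nor one of the listed punctuation marks.
TRAILING_OK = re.compile(r"""[^`!()\[\]{};:'".,<>?«»“”‘’\s]""")


def may_end_url(ch):
    """Return True if the single character ch is allowed as the last URL character."""
    return TRAILING_OK.fullmatch(ch) is not None


# A few sample characters, chosen for illustration.
for ch in ["/", "=", "_", ".", ",", "?", ")"]:
    print(repr(ch), may_end_url(ch))
```

Under the explicit list, characters such as `/`, `=`, and `_` may end a URL, while sentence punctuation such as `.`, `,`, and `?` may not.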
  5. gruber revised this gist Dec 5, 2009. 1 changed file with 19 additions and 19 deletions.
    38 changes: 19 additions & 19 deletions Liberal URL-matching Regex, Take Two
    @@ -1,29 +1,29 @@
     (?x)
     \b
    -( # Capture 1: entire matched URL
    +( # Capture 1: entire matched URL
     (?:
    -[\w-]+: # URL protocol and colon
    +[\w-]+: # URL protocol and colon
     (?:
    -/{1,3} # 1-3 slashes
    -| # or
    -[[:alpha:][:digit:]] # Single letter or digit
    -# (Try not to match, say "URI::Escape")
    +/{1,3} # 1-3 slashes
    +| # or
    +[[:alpha:][:digit:]] # Single letter or digit
    +# (Try not to match, say "URI::Escape")
     )
    -| # or
    -www\d?[.] # "www.", "www1.", "www2.", etc.
    +| # or
    +www\d?[.] # "www.", "www1.", "www2.", etc.
     )
    -(?: # One or more:
    -[^\s()<>]+ # Run of non-space, non-()<>
    -| # or
    -\([^\s()<>]+\) # a matching set of parens
    +(?: # One or more:
    +[^\s()<>]+ # Run of non-space, non-()<>
    +| # or
    +\([^\s()<>]+\) # a matching set of parens
     )+
    -(?: # End with:
    -\([^\s()<>]+\) # a set of parens
    -| # or
    -(?:
    -[^[:punct:]\s] # a non-punctuation non-space char
    -| # or
    -/ # a slash
    +(?: # End with:
    +\([^\s()<>]+\) # a set of parens
    +| # or
    +(?:
    +[^[:punct:]\s] # a non-punctuation non-space char
    +| # or
    +/ # a slash
     )
     )
     )
  6. gruber created this gist Dec 5, 2009.
    29 changes: 29 additions & 0 deletions Liberal URL-matching Regex, Take Two
    @@ -0,0 +1,29 @@
    +(?x)
    +\b
    +( # Capture 1: entire matched URL
    +(?:
    +[\w-]+: # URL protocol and colon
    +(?:
    +/{1,3} # 1-3 slashes
    +| # or
    +[[:alpha:][:digit:]] # Single letter or digit
    +# (Try not to match, say "URI::Escape")
    +)
    +| # or
    +www\d?[.] # "www.", "www1.", "www2.", etc.
    +)
    +(?: # One or more:
    +[^\s()<>]+ # Run of non-space, non-()<>
    +| # or
    +\([^\s()<>]+\) # a matching set of parens
    +)+
    +(?: # End with:
    +\([^\s()<>]+\) # a set of parens
    +| # or
    +(?:
    +[^[:punct:]\s] # a non-punctuation non-space char
    +| # or
    +/ # a slash
    +)
    +)
    +)
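The comment about "URI::Escape" in this first version refers to the requirement that the colon be followed by slashes or by a letter or digit. A minimal Python sketch of just that opening fragment is shown below. It is an approximation: Python's `re` module has no POSIX bracket classes, so `[^\W_]` stands in for `[[:alpha:][:digit:]]`, and `SCHEME` and `looks_like_scheme` are names made up for this example.

```python
import re

# Opening fragment of the original pattern: a protocol-like word and colon,
# followed by 1-3 slashes or a single letter or digit.
# [^\W_] is this sketch's substitute for the POSIX class [[:alpha:][:digit:]].
SCHEME = re.compile(r"\b[\w-]+:(?:/{1,3}|[^\W_])")


def looks_like_scheme(text):
    """Return True if text contains something shaped like 'protocol:' plus a URL start."""
    return SCHEME.search(text) is not None


# Sample inputs invented for illustration.
for text in ["http://example.com", "mailto:foo", "URI::Escape", "Note: see below"]:
    print(text, looks_like_scheme(text))
```

The Perl-style module name is rejected because the character after the first colon is another colon, and the plain sentence is rejected because a space follows the colon.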