@regexyl
Last active February 25, 2022 13:43
Revisions

  1. regexyl revised this gist Feb 25, 2022. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions regex.md
    Original file line number Diff line number Diff line change
    @@ -77,6 +77,7 @@ Finding a non-word boundary? Just find the word boundaries, remove them, and eve
    - Parenthesized Back References (Capture Group)
    - `()`: Creates a capture group for extracting a substring or using a back reference.
    - In replacement strings, use `$1`, `$2`, ... (JS, Java, Perl) or `\1`, `\2`, ... (Python) to retrieve the captured groups in sequential order; inside the pattern itself, use `\1`, `\2`, ...
    - `(?:...)`: A non-capturing group; groups a subpattern without capturing it, so it is omitted from the resulting list of captures. [^so-cg]
    - Character Class (or Bracket List)
    - `[]`
    - `[...]`: Accepts any *one* of the characters within the brackets.
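    As a rough illustration of the capturing versus non-capturing distinction (a sketch only; the sample string is made up), `String.prototype.split` includes captured text in its result, so a capturing group keeps the separators while a non-capturing group drops them:

    ```javascript
    // Hypothetical sample input, chosen only for illustration.
    const input = '2022-02-25';

    // Capturing group: the matched separators are kept in the result.
    const withCapture = input.split(/(-)/);
    console.log(withCapture); // [ '2022', '-', '02', '-', '25' ]

    // Non-capturing group: same grouping, but nothing is captured.
    const withoutCapture = input.split(/(?:-)/);
    console.log(withoutCapture); // [ '2022', '02', '25' ]
    ```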
    @@ -96,6 +97,7 @@ Finding a non-word boundary? Just find the word boundaries, remove them, and eve
    [^ntu-guide]: https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#:~:text=On%20the%20other%20hand%2C%20the,%5CD%20or%20non%2Ddigit
    [^moz-eg]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#examples
    [^so-boundary]: https://stackoverflow.com/questions/4541573/what-are-non-word-boundary-in-regex-b-compared-to-word-boundary
    [^so-cg]: Lu, S. (2014, January 29). Use of capture groups in String.split(). Stack Overflow. https://stackoverflow.com/questions/21419530/use-of-capture-groups-in-string-split

    ### Awesome Resources
    - https://riptutorial.com/regex
  2. regexyl revised this gist Feb 25, 2022. 1 changed file with 6 additions and 0 deletions.
    6 changes: 6 additions & 0 deletions regex.md
    Original file line number Diff line number Diff line change
    @@ -32,6 +32,12 @@ function f2c(x) {
    return s.replace(test, convert);
    }
    ```
    5. Capturing the matched pattern
    ```js
    const regexChars = /[\\^$.*+?()[\]{}|]/g;
    const str = 'as[b*';
    console.log(str.replace(regexChars, `\\$&`)); // prints: as\[b\* (i.e. the string 'as\\[b\\*')
    ```
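    One place this pattern tends to be useful is building a `RegExp` from arbitrary text. A minimal sketch, assuming a helper named `escapeRegExp` (a hypothetical name introduced here for illustration) and a made-up haystack string:

    ```javascript
    // Hypothetical helper: escapes every regex metacharacter in `text`.
    function escapeRegExp(text) {
      // `$&` in the replacement string stands for the whole match.
      return text.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
    }

    const needle = 'as[b*';
    const escaped = escapeRegExp(needle);
    console.log(escaped); // as\[b\*

    // Without escaping, `new RegExp('as[b*')` would throw a SyntaxError
    // because of the unterminated character class.
    const re = new RegExp(escaped);
    console.log(re.test('xx as[b* yy')); // true
    ```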

    ## Possible Trip-Ups
    ### `\b` and `\B`: Matching [non-]word boundaries
  3. regexyl revised this gist Feb 25, 2022. 1 changed file with 2 additions and 1 deletion.
    3 changes: 2 additions & 1 deletion regex.md
    Original file line number Diff line number Diff line change
    @@ -50,7 +50,6 @@ Finding a non-word boundary? Just find the word boundaries, remove them, and eve
    [^so-boundary]

    ## Syntax
    ###
    - Metacharacters
    - `.`: Any *one* character except a line terminator; in JavaScript this is `[^\n\r\u2028\u2029]` unless the `s` (dotAll) flag is set.
    - `\d`, `\D`: Any *one* digit/non-digit character (where digits are `[0-9]`).
    @@ -83,6 +82,8 @@ Finding a non-word boundary? Just find the word boundaries, remove them, and eve
    - Regex recognizes common escape sequences such as `\n` for newline, `\t` for tab, `\r` for carriage return, `\nnn` for an octal number of up to 3 digits, `\xhh` for a two-digit hex code, `\uhhhh` for a 4-digit Unicode code unit, and, in some flavors, `\uhhhhhhhh` for an 8-digit Unicode code point (JavaScript uses `\u{h...}` with the `u` flag instead).
    - Laziness
    - `*?`, `+?`, `??`, `{m,n}?`, `{m,}?`: Curbs greediness for repetition operators.
    - Capturing matched pattern
    - `$&`: In a replacement string, inserts the entire matched substring.

    [^ntu-guide]
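    The difference between greedy and lazy repetition, and the effect of `$&`, can be sketched as follows (the sample HTML string is a made-up input):

    ```javascript
    // Hypothetical sample input.
    const html = '<b>bold</b> and <i>italic</i>';

    // Greedy: `.+` grabs as much as possible, so the match runs to the last `>`.
    const greedy = html.match(/<.+>/)[0];
    console.log(greedy); // <b>bold</b> and <i>italic</i>

    // Lazy: `.+?` stops at the first `>` that lets the match succeed.
    const lazy = html.match(/<.+?>/)[0];
    console.log(lazy); // <b>

    // `$&` re-inserts each whole match, here wrapped in square brackets.
    const marked = html.replace(/<.+?>/g, '[$&]');
    console.log(marked); // [<b>]bold[</b>] and [<i>]italic[</i>]
    ```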

  4. regexyl revised this gist Feb 25, 2022. 1 changed file with 4 additions and 2 deletions.
    6 changes: 4 additions & 2 deletions regex.md
    Original file line number Diff line number Diff line change
    @@ -34,17 +34,19 @@ function f2c(x) {
    ```

    ## Possible Trip-Ups
    ### `\b` and `\B`: Matching [non-]boundary characters
    ### `\b` and `\B`: Matching [non-]word boundaries
    A word boundary (`\b`) is a *zero width match* that can match:
    - Between a word character (`\w`) and a non-word character (`\W`) or
    - Between a word character and the start or end of the string.

    `\B` is the inverse of `\b`. It can match:
    `\B` is the inverse of `\b`, also *zero width*. It can match:
    - Between two word characters.
    - Between two non-word characters.
    - Between a non-word character and the start or end of the string.
    - The empty string.

    Finding a non-word boundary? Find the word boundaries and set them aside; every remaining position is a non-word boundary.

    [^so-boundary]
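    A quick way to see where each assertion matches is to replace every zero-width match with a visible marker. This is only a sketch with made-up sample strings:

    ```javascript
    // Hypothetical sample input.
    const text = 'ab cd';

    // Mark every word boundary: start, end, and each side of the space.
    const boundaries = text.replace(/\b/g, '|');
    console.log(boundaries); // |ab| |cd|

    // Mark every non-word boundary: only the gaps inside each word remain.
    const nonBoundaries = text.replace(/\B/g, '|');
    console.log(nonBoundaries); // a|b c|d

    // `\bcat\b` matches "cat" as a whole word but not inside "concat".
    const replaced = 'cat concat'.replace(/\bcat\b/g, 'dog');
    console.log(replaced); // dog concat
    ```

    Every position in the string is marked by exactly one of the two patterns, which is the complement relationship described above.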

    ## Syntax
  5. regexyl revised this gist Feb 25, 2022. 1 changed file with 20 additions and 4 deletions.
    24 changes: 20 additions & 4 deletions regex.md
    Original file line number Diff line number Diff line change
    @@ -2,7 +2,7 @@
    The JavaScript version.

    ## Frequent Examples
    Search for:
    Search for: [^moz-eg]
    1. "/example/": `/\/example\/[a-z]+/i`
    2. Switch words in a string
    ```js
    @@ -33,6 +33,20 @@ function f2c(x) {
    }
    ```

    ## Possible Trip-Ups
    ### `\b` and `\B`: Matching [non-]boundary characters
    A word boundary (`\b`) is a *zero width match* that can match:
    - Between a word character (`\w`) and a non-word character (`\W`) or
    - Between a word character and the start or end of the string.

    `\B` is the inverse of `\b`. It can match:
    - Between two word characters.
    - Between two non-word characters.
    - Between a non-word character and the start or end of the string.
    - The empty string.

    [^so-boundary]

    ## Syntax
    ###
    - Metacharacters
    @@ -68,9 +82,11 @@ function f2c(x) {
    - Laziness
    - `*?`, `+?`, `??`, `{m,n}?`, `{m,}?`: Curbs greediness for repetition operators.

    ## References
    - https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#:~:text=On%20the%20other%20hand%2C%20the,%5CD%20or%20non%2Ddigit
    - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#switching_words_in_a_string
    [^ntu-guide]

    [^ntu-guide]: https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#:~:text=On%20the%20other%20hand%2C%20the,%5CD%20or%20non%2Ddigit
    [^moz-eg]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#examples
    [^so-boundary]: https://stackoverflow.com/questions/4541573/what-are-non-word-boundary-in-regex-b-compared-to-word-boundary

    ### Awesome Resources
    - https://riptutorial.com/regex
  6. regexyl revised this gist Feb 25, 2022. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions regex.md
    Original file line number Diff line number Diff line change
    @@ -71,3 +71,6 @@ function f2c(x) {
    ## References
    - https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#:~:text=On%20the%20other%20hand%2C%20the,%5CD%20or%20non%2Ddigit
    - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#switching_words_in_a_string

    ### Awesome Resources
    - https://riptutorial.com/regex
  7. regexyl revised this gist Feb 25, 2022. 1 changed file with 38 additions and 5 deletions.
    43 changes: 38 additions & 5 deletions regex.md
    Original file line number Diff line number Diff line change
    @@ -2,7 +2,36 @@
    The JavaScript version.

    ## Frequent Examples

    Search for:
    1. "/example/": `/\/example\/[a-z]+/i`
    2. Switch words in a string
    ```js
    let re = /(\w+)\s(\w+)/;
    let str = 'John Smith';
    let newstr = str.replace(re, '$2, $1');
    console.log(newstr); // Smith, John
    ```
    3. Using an inline function that modifies the matched characters
    ```js
    function styleHyphenFormat(propertyName) {
      function upperToHyphenLower(match, offset, string) {
        return (offset > 0 ? '-' : '') + match.toLowerCase();
      }
      return propertyName.replace(/[A-Z]/g, upperToHyphenLower);
    }
    console.log(styleHyphenFormat('borderTop')) // border-top
    ```
    4. Converting Fahrenheit to Celsius
    ```js
    function f2c(x) {
      function convert(str, p1, offset, s) {
        return ((p1 - 32) * 5/9) + 'C';
      }
      let s = String(x);
      let test = /(-?\d+(?:\.\d*)?)F\b/g; // (?:...) is a non-capturing group
      return s.replace(test, convert);
    }
    ```

    ## Syntax
    ###
    @@ -12,7 +41,7 @@ The JavaScript version.
    - `\w`, `\W`: Any *one* word/non-word character. For ASCII, word characters are `[a-zA-Z0-9_]`.
    - `\s`, `\S`: Any *one* space/non-space character. For ASCII, whitespace characters are `[ \n\r\t\f]`.
    - Occurrence Indicators
    - `+`: One or more, e.g. `[0-9]+` matches 1 or more digits, such as "123", "0000".
    - `+`: One or more, e.g. `[0-9]+` matches 1 or more digits, such as "123", "0000".
    - `*`: Zero or more (accepts the above + empty strings).
    - `?`: Zero or one (optional), e.g. `[+-]?` matches "+", "-", or an empty string.
    - `{}`
    @@ -24,8 +53,9 @@ The JavaScript version.
    - `$`: End of line
    - `\b`: Boundary of word, i.e., start-of-word or end-of-word. E.g., `\bcat\b` matches the word "cat" in the input string.
    - `\B`: Inverse of `\b`, i.e. non-start-of-word or non-end-of-word.
    - Parenthesized Back References
    - `()`:
    - Parenthesized Back References (Capture Group)
    - `()`: Creates a capture group for extracting a substring or using a back reference.
    - In replacement strings, use `$1`, `$2`, ... (JS, Java, Perl) or `\1`, `\2`, ... (Python) to retrieve the captured groups in sequential order; inside the pattern itself, use `\1`, `\2`, ...
    - Character Class (or Bracket List)
    - `[]`
    - `[...]`: Accepts any *one* of the characters within the brackets.
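    In JavaScript, a back reference is written differently depending on where it appears. A small sketch, using a made-up sentence with a repeated word:

    ```javascript
    // Hypothetical sample input with an accidentally repeated word.
    const sentence = 'This is is a test';

    // `\1` inside the pattern refers back to whatever group 1 captured.
    const repeatedWord = /\b(\w+)\s+\1\b/;
    console.log(repeatedWord.test(sentence)); // true

    // `$1` in the replacement string inserts the captured text.
    const fixed = sentence.replace(repeatedWord, '$1');
    console.log(fixed); // This is a test
    ```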
    @@ -35,6 +65,9 @@ The JavaScript version.
    - `|`: OR operator, e.g. `four|4` accepts "four" or "4".
    - `\`: Escape sequence to accept a char with special meaning in regex.
    - Regex recognizes common escape sequences such as `\n` for newline, `\t` for tab, `\r` for carriage return, `\nnn` for an octal number of up to 3 digits, `\xhh` for a two-digit hex code, `\uhhhh` for a 4-digit Unicode code unit, and, in some flavors, `\uhhhhhhhh` for an 8-digit Unicode code point (JavaScript uses `\u{h...}` with the `u` flag instead).
    - Laziness
    - `*?`, `+?`, `??`, `{m,n}?`, `{m,}?`: Curbs greediness for repetition operators.

    ## References
    - https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#:~:text=On%20the%20other%20hand%2C%20the,%5CD%20or%20non%2Ddigit.
    - https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#:~:text=On%20the%20other%20hand%2C%20the,%5CD%20or%20non%2Ddigit
    - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#switching_words_in_a_string
  8. regexyl created this gist Feb 24, 2022.
    40 changes: 40 additions & 0 deletions regex.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,40 @@
    # Regex Cheatsheet
    The JavaScript version.

    ## Frequent Examples


    ## Syntax
    ###
    - Metacharacters
    - `.`: Any *one* character except a line terminator; in JavaScript this is `[^\n\r\u2028\u2029]` unless the `s` (dotAll) flag is set.
    - `\d`, `\D`: Any *one* digit/non-digit character (where digits are `[0-9]`).
    - `\w`, `\W`: Any *one* word/non-word character. For ASCII, word characters are `[a-zA-Z0-9_]`.
    - `\s`, `\S`: Any *one* space/non-space character. For ASCII, whitespace characters are `[ \n\r\t\f]`.
    - Occurrence Indicators
    - `+`: One or more, e.g. `[0-9]+` matches 1 or more digits, such as "123", "0000".
    - `*`: Zero or more (accepts the above + empty strings).
    - `?`: Zero or one (optional), e.g. `[+-]?` matches "+", "-", or an empty string.
    - `{}`
    - `{m,n}`: `m` to `n` (both inclusive).
    - `{m}`: Exactly `m` times.
    - `{m,}`: `m` or more times (`m+`).
    - Position Anchors
    - `^`: Start of line, e.g. `^[0-9]+$` matches a numeric string.
    - `$`: End of line
    - `\b`: Boundary of word, i.e., start-of-word or end-of-word. E.g., `\bcat\b` matches the word "cat" in the input string.
    - `\B`: Inverse of `\b`, i.e. non-start-of-word or non-end-of-word.
    - Parenthesized Back References
    - `()`:
    - Character Class (or Bracket List)
    - `[]`
    - `[...]`: Accepts any *one* of the characters within the brackets.
    - `[.-.]`: Accepts any *one* of the characters in the range, e.g. `[0-9]`, `[A-Za-z]`.
    - `[^...]`: Accepts any *one* character not listed in the brackets, e.g. `[^0-9]` matches any non-digit.
    - Only `^`, `-`, `]`, and `\` require an escape sequence inside the bracket list.
    - `|`: OR operator, e.g. `four|4` accepts "four" or "4".
    - `\`: Escape sequence to accept a char with special meaning in regex.
    - Regex recognizes common escape sequences such as `\n` for newline, `\t` for tab, `\r` for carriage return, `\nnn` for an octal number of up to 3 digits, `\xhh` for a two-digit hex code, `\uhhhh` for a 4-digit Unicode code unit, and, in some flavors, `\uhhhhhhhh` for an 8-digit Unicode code point (JavaScript uses `\u{h...}` with the `u` flag instead).
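    A few of the items in the list above (anchors, occurrence indicators, and character classes) can be combined as in the following sketch. The variable names and sample strings are made up for illustration:

    ```javascript
    // `^` and `$` anchor the pattern to the whole string; `+` requires at least one digit.
    const isNumeric = /^[0-9]+$/;
    console.log(isNumeric.test('0000')); // true
    console.log(isNumeric.test('12a')); // false
    console.log(isNumeric.test('')); // false

    // `[+-]?` allows an optional sign; `{1,3}` allows one to three digits.
    const smallSignedInt = /^[+-]?[0-9]{1,3}$/;
    console.log(smallSignedInt.test('-42')); // true
    console.log(smallSignedInt.test('1234')); // false

    // `[^0-9]` matches any single character that is not a digit.
    const digitsOnly = 'tel: 555-0100'.replace(/[^0-9]/g, '');
    console.log(digitsOnly); // 5550100
    ```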

    ## References
    - https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#:~:text=On%20the%20other%20hand%2C%20the,%5CD%20or%20non%2Ddigit.