Created
April 2, 2023 18:49
imranity created this gist
Apr 2, 2023
# python regex

## chap1 : intro to regex

* regex: Regular expressions are text patterns that define the form a text string should have.
  - useful for checking that an email address follows the expected pattern
  - matching both "color" and "colour"
  - extracting specific info like a postal code

LOL: "Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems."

* How regex started (birth of grep)

Ken Thompson's work didn't end in just writing a paper. He included support for these regular expressions in his version of QED. To search with a regular expression in QED, the following had to be written:

`g/<regular expression>/p`

In the preceding line of code, `g` means global search and `p` means print. If, instead of writing "regular expression", we write the short form `re`, we get `g/re/p`, and therefore the beginnings of the venerable UNIX command-line tool grep.

Two wildcards familiar from file-name matching:

* `?` : matches a single char (`file?.xml` matches `file1.xml` and `file9.xml` but not `file99.xml`)
* `*` : matches any number of chars

In `file?.xml`:

* literals -> `file` and `.xml`
* metacharacters -> `?` (or `*`)

### OUR FIRST REGEX

`/a\w*/` ==> matches any word starting with 'a'

### Escaping Metacharacters

Metachars and literals can coexist in one pattern, but what if we need to use a metachar as a literal? 3 ways to do it:

* escape the metachar by preceding it with a backslash
* in Python, use `re.escape`
* quoting with `\Q` and `\E` (not supported in Python)

There are 12 metachars that should be escaped when they need to be used as plain chars:

* `\` Backslash
* `^` Caret
* `$` Dollar sign
* `.` Dot
* `|` Pipe symbol
* `?` Question mark
* `*` Asterisk
* `+` Plus sign
* `(` Opening parenthesis
* `)` Closing parenthesis
* `[` Opening square bracket
* `{` Opening curly brace

## Character class

Character classes let us define a set of chars; the class matches if any one char from the set is present at that position.

For example, to match both "license" and "licence" --> `/licen[sc]e/`

We can use a range of chars `[b-e]` or of numbers `[2-9]`. Ranges can be combined: `[0-9a-zA-Z]`

* Negation of ranges: `[^0-9]` matches anything that is not a digit, but there still has to be a char in that position. E.g. `/hello[^0-9]/` won't match "hello" on its own, as there is no char after it to fill that place.

### Predefined char class

| Element | Description |
|---------|-------------|
| `.` | matches any char except newline |
| `\d` | matches any decimal digit, equivalent to `[0-9]` for ASCII text |
| `\D` | matches any non-digit, equivalent to `[^0-9]` |
| `\s` | matches any whitespace char, equivalent to `[ \t\n\r\f\v]` |
| `\S` | matches any non-whitespace char, equivalent to `[^ \t\n\r\f\v]` |
| `\w` | matches any alphanumeric char or underscore, equivalent to `[0-9a-zA-Z_]` |

`[^\/\\]` -> matches any char that's not a slash or a backslash
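
To try the notes above, here is a minimal sketch using Python's standard `re` module (assuming Python 3.7 or later). The sample strings and variable names are made up for illustration and are not part of the original notes.

```python
import re

# First regex: words starting with 'a'. The \b word boundary is an addition
# here so that an 'a' in the middle of a word (as in "day") does not count.
a_words = re.findall(r"\ba\w*", "an apple a day")

# Character set: [sc] accepts either spelling.
spelling = re.compile(r"licen[sc]e")
spelling_hits = [bool(spelling.fullmatch(w))
                 for w in ("license", "licence", "licenze")]

# A negated set must still consume one char, so bare "hello" does not match.
negated = re.compile(r"hello[^0-9]")
bare_hello = negated.search("hello")
hello_bang = negated.search("hello!")

# Predefined classes.
digits = re.findall(r"\d+", "room 42, floor 7")
words = re.findall(r"\w+", "snake_case, 2 words")

# Anything that is neither a slash nor a backslash.
parts = re.findall(r"[^\/\\]+", r"usr/local\bin")

# re.escape turns metachars into literals, so '?' and '.' lose their meaning.
literal = re.escape("file?.xml")
matches_itself = re.fullmatch(literal, "file?.xml") is not None
matches_file1 = re.fullmatch(literal, "file1.xml") is not None

print(a_words, spelling_hits, digits, words, parts)
print(bare_hello, hello_bang, matches_itself, matches_file1)
```

Raw strings (`r"..."`) keep Python from interpreting the backslashes before the regex engine sees them, which is why every pattern above uses one.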