# Parsing CSS file with monadic parser in Clojure Inspired by ["Parsing CSS with Parsec"](http://blog.jakubarnold.cz/2014/08/10/parsing-css-with-parsec.html). Just quick notes and code that you can play with in REPL. By [@kachayev](https://twitter.com/kachayev) ## 0. Intro What do we have? ``` .container h1 { color: rgba(255, 0, 0, 0.9); font-size: 24px; font-family: Monaco; } ``` What do we want to get? ```clojure {:selector ".container h1", :rules ({:key "color", :value "rgba(255, 0, 0, 0.9)"} {:key "font-size", :value "24px"} {:key "font-family", :value "Monaco"})} ``` Let's do this with [Monadic parsing](http://eprints.nottingham.ac.uk/223/1/pearl.pdf) approach. Step-by-step explanation of how it works one can find in [Monadic Parsing in Python](http://goo.gl/X7klJ8). What about Clojure? ## 1. Parser abstraction From theory: ```haskell type Parser a = String -> [(a, String)] ``` So, we're going to represent ```parser``` as simple function that takes input and returns seq of possible results. Simplest parsers: ```clojure ;; takes any input and "consume" first char from it (defn any [input] (if (empty? input) '() (list [(first input) (apply str (rest input))]))) ;; this one doesn't accept any input (defn failure [_] '()) (any "clojure-1.7") ;; => ([c lojure-1.7]) ``` We also will use few helpers: ```clojure (defn parse [parser input] (parser input)) (defn parse-all [parser input] (->> input (parse parser) (filter #(= "" (second %))) ffirst)) ``` ## 2. Monads I'm not going to pay much attention to what ```monads``` are. Everything that you need to know is that we will use it to compose small parsers into bigger one. Also we're not going to implement entire "monad infrastructure". ```clojure ;; builds parser that always returns given element without consuming (changing) input (defn return [v] (fn [input] (list [v input]))) ;; takes parser and function that builds new parsers from (each) result of applying first one (defn >>= [m f] (fn [input] (->> input (parse m) (mapcat (fn [[v tail]] (parse (f v) tail))))) ``` Next part is a simple macro that provides haskell-like ```do``` notation. If you don't know how ```do``` block works in Haskell, just skip this code and return to it when necessary. ```clojure (defn merge-bind [body bind] (if (and (not= clojure.lang.Symbol (type bind)) (= 3 (count bind)) (= '<- (second bind))) `(>>= ~(last bind) (fn [~(first bind)] ~body)) `(>>= ~bind (fn [~'_] ~body)))) (defmacro do* [& forms] (reduce merge-bind (last forms) (reverse (butlast forms)))) ``` ## 3. Basic parsers Let's build basic parsers that can recognize chars, strings, letters, spaces, etc. ```clojure (defn sat [pred] (>>= any (fn [v] (if (pred v) (return v) failure)))) ;; just a helper (defn char-cmp [f] (fn [c] (sat (partial f (first c))))) ;; recognizes given char (def match (char-cmp =)) ;; rejects given char (def noneOf (char-cmp not=)) ;; just a helper (defn from-re [re] (sat (fn [v] (not (nil? (re-find re (str v))))))) ;; recognizes any digit (def digit (from-re #"[0-9]")) ;; recognizes any letter (def letter (from-re #"[a-zA-Z]")) ``` Lets try: ```clojure (parse (match "c") "clojure1.7") ;; => ([\c "lojure1.7"]) (parse letter "clojure1.7") ;; => ([\c "lojure1.7"]) (parse digit "1.7clojure") ;; => ([\1 ".7clojure"]) ``` Ok, so far so good. Moving forward... ## 4. Combinators Useful combinators that will help us: ```clojure ;; (ab) (defn and-then [p1 p2] (do* (r1 <- p1) (r2 <- p2) ;; xxx: note, that it's dirty hack to use STR to concat outputs ;; Full functional implementation should use MonadPlus protocol (return (str r1 r2)))) ;; (a|b) (defn or-else [p1 p2] (fn [input] (lazy-cat (parse p1 input) (parse p2 input)))) (declare plus) (declare optional) ;; (a*) (defn many [parser] (optional (plus parser))) ;; (a+) equals to (aa*) (defn plus [parser] (do* (a <- parser) (as <- (many parser)) (return (cons a as)))) ;; (a?) (defn optional [parser] (or-else parser (return ""))) ``` Pay special attention to ```plus``` combinator implementation using ```do*``` mentioned above. ```do*``` is only syntax sugar for ```bind``` operations, in reality ```plus``` looks like: ```clojure (defn plus [parser] (>>= parser (fn [a] (>>= (many parser) (fn [as] (return (cons a as))))))) ``` Example of combinators usage: ```clojure ;; recognizes space (or newline) (def space (or-else (match " ") (match "\n"))) ;; recognizes empty string or arbitrary number of spaces (def spaces (many space)) ;; recognizes given string, i.e. "clojure" (defn string [s] (reduce and-then (map #(match (str %)) s))) ``` Lets try to build something more complicated and interesting: ```clojure (def clojure-version (do* (string "clojure") (match " ") (major <- digit) (match ".") (minor <- digit) (return (str "major: " major "; minor: " minor)))) (parse-all clojure-version "clojure 1.7") ;; => "major: 1; minor: 7" ``` ## 5. CSS From CSS 2.1 grammar definition: ```haskell type Selector = String data Rule = Rule String String deriving Show data Ruleset = Ruleset Selector [Rule] deriving Show ``` In Clojure we can use records to represent data types: ```clojure (defrecord Rule [key value]) (defrecord Ruleset [selector rules]) ``` Now lets define ```rule``` and ```ruleset``` parsers: ```clojure (def letter+ (or-else letter (match "-"))) (def rule (do* (p <- (many (noneOf ":"))) (match ":") spaces (v <- (many (noneOf ";"))) (match ";") spaces (return (Rule. (apply str p) (apply str v))))) (def ruleset (do* (s <- (plus (noneOf "{"))) (match "{") spaces (r <- (plus rule)) spaces (match "}") spaces (return (Ruleset. (trim (apply str s)) r)))) ``` Play a bit: ```clojure (parse-all rule "background: #fafafa; ") ;; {:key "background", :value "#fafafa"} (parse-all ruleset "p { background: #fafafa; color: red; }") ;; {:selector "p", ;; :rules ;; ({:key "background", :value "#fafafa"} {:key "color", :value "red"})} (def css ".container h1 { color: rgba(255, 0, 0, 0.9); font-size: 24px; font-family: Monaco; }") (parse-all ruleset css) ;; {:selector ".container h1", ;; :rules ;; ({:key "color", :value "rgba(255, 0, 0, 0.9)"} ;; {:key "font-size", :value "24px"} ;; {:key "font-family", :value "Monaco"})} ``` Looks nice. ## 6. Notes - it's easier to work both with monads and parsers in dynamically typed language - it's not that convenient to work with ```string```s and ```list``` of ```char```s (you can see few "irrational" ```apply str _``` and ```first _```) - there is still room for improvement, i.e. more general ```return```, ```bind``` and ```do*``` - you can try to rewrite parsers using applicative style