dpetranek · December 14, 2017 02:21
diff --git a/rhymes.clj b/rhymes.clj
 ;; So we want a rhyming dictionary in Clojure.
 ;; 
 ;; https://gist.github.com/jackrusher/8640437
 ;;
 ;; I'm going to study this code and learn as I go.
 ;;
 ;; First I put it in a namespace.

 (ns unscroll.rhyme
  (:require 
    [clojure.string :as string]))


 ;; So that I can mess with it inside my current environment in
 ;; emacs.
 ;; 
 (def rhyme-txt
  (map #(string/split % #"[ ]+")
    (string/split-lines
      (slurp "data/cmudict.txt"))))

 ;; Good ol' CMU rhyming dictionary; nice to see you again my friend!
 ;; 
 ;; The format of the dictionary is such:
 ;;
 ;; ABDOMEN  AE0 B D OW1 M AH0 N
 ;;
 ;; ie.
 ;;
 ;; WORD[SPACE]PHONEME_1[SPACE]PHONEME_2[SPACE]...PHONEME_N[NEWLINE]
 ;;
 ;; So we slurp in the file, split the lines by newlines, and then
 ;; split on space.
 ;; 
 ;; Gotta love slurp; one thing it took me a while to figure out the
 ;; other day is that slurp starts looking for files at the top level
 ;; of the Clojure project. Where the source files are means nothing
 ;; to Clojure (because it means nothing to the JVM).
 ;;
 ;; Anyway now that we've done that we can say:
 ;; (take 3 (drop 1010 rhyme-txt))
 ;;
 ;; (We want to fast-forward past a bunch of comments and miscellany
 ;; to get a clear example, so start around 1010)
 ;;
 ;; And we'd see
 ;; (["ACTUARY" "AE1" "K" "CH" "UW0" "EH1" "R" "IY2"]
 ;;  ["ACTUATE" "AE1" "K" "CH" "UW2" "EY1" "T"]
 ;;  ["ACTUATOR" "AE1" "K" "T" "Y" "UW0" "EY2" "T" "ER0"])
 ;; 
 ;; So now we're dealing with a list of vectors where the first value
 ;; is the word and the rest represent its phonemes.
 ;;
 ;; There's a lot going on with this one as there always is with
 ;; Clojure code. Let's see how it gets called first:
 ;;
 ;; (get-deepest-values (get-in rhyme-to-word (take 3 (word-to-rhyme "connection"))))
 ;; Which could probably be a function of its own called "rhyme."
 ;;
 ;; Although I've noticed Jack doesn't really care much whether
 ;; something is a function or a variable, it's like it DOESN'T EVEN
 ;; MATTER. Hmm.
 ;;
 ;; Anyway, we're going to give this function, word-to-rhyme a word
 ;; and get something back.
 ;;
 ;; (word-to-rhyme "connection")
 ;;
 ;; and we get
 ;;
 ;; [:N :AH :SH :K :EH :N :AH :K]
 ;;
 ;; which is the reverse order of phonemes. Which makes sense, we're
 ;; building a rhyming dictionary, gonna go end to beginning in terms
 ;; of phonemes.
 ;;
 ;; Going to add a crazy amount of indenting here so I can see the
 ;; levels more clearly as a n00b.

 (def word-to-rhyme
  (reduce (fn [m [word & rhyme]]
            (assoc m
              (string/lower-case word)
              (mapv
                #(keyword
                   (string/replace %1 #"[0-9]" ""))
                (reverse rhyme))))
    {} rhyme-txt))

 ;; This is a funny one because I mess with the parens my java keeps
 ;; running out of heap space. So it's clearly doing a lot, like
 ;; loading the whole CMU dictionary into memory.

 ;; The super hot LISPY action in there is

 ;; (reduce function=[that function] val=[an empty map {}]
 ;; coll=[rhyme-txt, our list of vectors])

 ;; Clojure docs are kind of a bear. For reduce, they say "If val is
 ;;  supplied, returns the result of applying f to val and the first
 ;;  item in coll, then applying f to that result and the 2nd item,
 ;;  etc." AWESOME.

 ;; So what reduce does here is return the results applying that
 ;; function in there to {} and the first item in the rhyme-text,
 ;; then applies that to the second item, etc.

 ;; So in [m [word & rhyme]] it's going to be going:
 ;;
 ;; [{} ["CONNECTION" K AH N EH K SH AH N ]]
 ;;
 ;; So that's interesting because we have the word & rhyme -- the way
 ;; that destructuring works, rhyme will catch all of the phonemes
 ;; into a list; it's almost like the CMU people could predict this
 ;; kind of programming would occur using their dictionary.
 ;; Then we say:
 ;;
 ;; (assoc map key val)
 ;;
 ;; Or here:
 ;;
 ;; (assoc {} "connection" ...)
 ;;
 ;; And then a couple things happen
 ;; 
 ;; First we reverse the rhyme so "K AH0 N EH1 K SH AH0 N" becomes
 ;; N AH0 SH K EH1 N AH0 K
 ;; that's (reverse rhyme)
 ;;
 ;; SECOND we replace all the numbers with nothing (likely because we
 ;; just don't need the data, can't remember why CMU uses numbers--
 ;; syllable stress maybe?)
 ;;
 ;; we do that with a regular expression yielding
 ;; 
 ;; N AH SH K EH N AH K
 ;;
 ;; THIRD we run a mapv across that list of phonemes--that is, apply
 ;; to each element and return a vector. And what we are applying is
 ;; the "keyword" function which turns a string to a clojure keyword so
 ;; we end up with a structure like:
 ;; 
 ;; {"connection" [:N :AH :SH :K :EH :N :AH :K]}
 ;;
 ;; NOTE: I'm not sure WHY we're converting to keywords but they are
 ;; prettier in general and make for better keywords in maps, and I'm
 ;; assuming they actually are optimized as, like, keywords, so....
 ;; 
 ;; Anyway, and then we repeat that (lazily, I guess, so in chunks of
 ;; 32 or whatever it is that Clojure does?) as needed until we've
 ;; slurped up the whole file into a big map or what I still think of
 ;; as an associative array.  Aaaand now we have a variable that
 ;; defines a function that when given---
 ;; 
 ;; Oh god, I SEE. I ACTUALLY SEE. This is a def instead of a
 ;; function for a reason. It's a var that calls a function which
 ;; returns a map, but in Clojure a map can operate as a function. So
 ;; when I say:
 ;;
 ;; (word-to-rhyme "connection") I'm causing the interpreter to read
 ;; the entirety of the dictionary into a map, and assigning that map
 ;; to "word-to-rhyme" and then because I'm calling word-to-rhyme as
 ;; the first item in a sexp, the interpreter evaluates it as a
 ;; function and returns the phonemes that it has assoc'd to that
 ;; word.
 ;;
 ;; Clojure is kind of dense.
 ;; 
 ;; So I'm going to assume we're in similar territory here with this
 ;; function.

 (def rhyme-to-word
  (reduce
    (fn [m [word rhyme]]
      (assoc-in m rhyme { :terminate word }))
    {}
    word-to-rhyme))

 ;; Aand we are,  KIND OF. Hmm. So in this case we take the map (now just a
 ;; map) from word-to-rhyme and do another reduce, except the
 ;; structure we're building up is going to be a trie (?) so we're
 ;; going:
 ;; 
 ;;  (assoc-in {} [:N :AH :SH :K :EH :N :AH :K] { :terminate "connection" })
 ;;
 ;; And as a result we're getting:
 ;;
 ;; {:N {:AH {:SH {:K {:EH {:N {:AH {:K {:terminate "connection"}}}}}}}}}
 ;;
 ;; Great but that's one word. NOW Clojure hands that same map back
 ;; to the reduce with ANOTHER word, and so on for thousands of
 ;; words, building up a huge nested behemoth of a data structure.
 ;;
 ;; So we've passed assoc-in the phonemes for "connection"; we can
 ;; now pass it "correction" and they should be all jammed up in a
 ;; really nice way...
 ;; 
 ;; (assoc-in (assoc-in {} [:N :AH :SH :K :EH :N :AH :K] { :terminate
 ;;   "connection" }) [:N :AH :SH :K :EH :ER :K] {:terminate
 ;;   "correction"})
 ;;
 ;; Okay, yes we end up with something that will let us take one
 ;; word, look up the phonemes (in reverse order) and look for
 ;; similar phonemes, then map those back to the words. That's what
 ;; we have here, no doubt. Looks like this:
 ;; 
 ;; {:N {:AH {:SH {:K {:EH {:ER {:K {:terminate "correction"}}, :N
 ;; {:AH {:K {:terminate "connection"}}}}}}}}}
 ;;
 ;; And since I can assoc-in I can get in too.
 ;;
 ;; Okay so on we go...

 (defn get-deepest-values [k]
  (if (string? k) [k] (mapcat get-deepest-values (vals k))))

 ;; What the hell is this? What is it for? OH GOD.
 ;; 
 ;; So here we're looking for strings inside a nest of keywords--that
 ;; makes sense. What is mapcat?  Clojure docs: "Returns the result
 ;; of applying concat to the result of applying map to f and
 ;; colls. Thus function f should return a collection."
 ;;
 ;; Great, thanks Clojure docs.  You're my pal.

 ;; What it means I think is that you're going to give this function
 ;; a bundle of stuff and it'll do something to each piece of stuff
 ;; (map...) and then smush everything together into one nice list
 ;; (...cat).  So we're saying given a nested associated structure
 ;; like the one we just made, pull out all the values ...
 ;;
 ;; wait hold on--let's look at how it's actually called.

 ;; Okay this is the big mooooment
 (get-deepest-values (get-in rhyme-to-word (take 5 (word-to-rhyme "erection"))))

 ;; And this gives a result thus: ("erection" "direction" "correction" "collection" "inflection" "preelection" "circumspection" "introspection" "imperfection" "perfection" "midsection" "transection" "connection" "protection")

 ;; But sometimes it's all too last-first for me, so Let's do that
 ;; using this guy "->>" with comments

 (->>
  
  (word-to-rhyme "erection")
  ;; gives us [:N :AH :SH :K :EH :R :IH]

  (take 5)
  ;; gives us (:N :AH :SH :K :EH)--i.e. five phonemes, or the "ection" part of the rhyme.

  (get-in rhyme-to-word)
  ;; so we're calling (get-in rhyme-to-word '(:N :AH :SH :K :EH))
  ;; and we get this result:
  ;;
  ;; {:R {:IH {:terminate "erection"}}, :ER {:D
  ;; {:terminate "direction"}, :K {:terminate "correction"}}, :L {:AH
  ;; {:K {:terminate "collection"}}, :F {:N {:IH
  ;; {:terminate "inflection"}}}, :IH {:IY {:R {:P
  ;; {:terminate "preelection"}}}}}, :P {:S {:M {:AH {:K {:ER {:S
  ;; {:terminate "circumspection"}}}}}, :AH {:R {:T {:N {:IH
  ;; {:terminate "introspection"}}}}}}}, :F {:ER {:P {:M {:IH
  ;; {:terminate "imperfection"}}, :terminate "perfection"}}}, :S {:D
  ;; {:IH {:M {:terminate "midsection"}}}, :N {:AE {:R {:T
  ;; {:terminate "transection"}}}}}, :N {:AH {:K
  ;; {:terminate "connection"}}}, :T {:AH {:R {:P
  ;; {:terminate "protection"}}}}}
  ;;
  ;; Okay so THAT's what we're working with when we call....

  
  (get-deepest-values))

 ;; So it turns out that all that
 ;; 
 ;; (defn get-deepest-values [k]
 ;;  (if (string? k) [k] (mapcat get-deepest-values (vals k))))
 ;; 
 ;; Does is say: Get the values from the key/value pairs that are in
 ;; a map called "k." If you hit any value at all that is a string,
 ;; return it and you're done for that part. OTHERWISE keep mapping
 ;; over the values and run this function again on each one of them
 ;; (until you hit a string). And however nested things are is fine
 ;; and all, but please return a nice flat list of results (that's
 ;; why it's mapcat instead of cat")
 ;;
 ;; So it's like you gave it a siamese russian doll. And it keeps
 ;; looking inside the first twin's dolls until it finds a piece of
 ;; paper with a word or two on it. And then it throws away all the
 ;; dolls. And it does the same to the other twin. Maybe the first
 ;; twin is three dolls deep. Maybe the second twin is four dolls
 ;; deep.

 ;; So that's the reason we switched strings to keywords, because
 ;; we knew we'd be looking things up by phonemes and we wanted
 ;; a data type that is good for "looking things up in deeply
 ;; nested maps" and if there's ONE data type like that, it's keywords! Maybe;
 ;; god if I know.

 ;; Of all of them, these tiny recursive functions are the hardest to
 ;; write and understand.

 ;; Anyway, that's how it works, I think.
	;; So we want a rhyming dictionary in Clojure.
	;;
	;; https://gist.github.com/jackrusher/8640437
	;;
	;; I'm going to study this code and learn as I go.
	;;
	;; First I put it in a namespace.

	(ns unscroll.rhyme
	(:require
	[clojure.string :as string]))


	;; So that I can mess with it inside my current environment in
	;; emacs.
	;;
	(def rhyme-txt
	(map #(string/split % #"[ ]+")
	(string/split-lines
	(slurp "data/cmudict.txt"))))

	;; Good ol' CMU rhyming dictionary; nice to see you again my friend!
	;;
	;; The format of the dictionary is such:
	;;
	;; ABDOMEN AE0 B D OW1 M AH0 N
	;;
	;; ie.
	;;
	;; WORD[SPACE]PHONEME_1[SPACE]PHONEME_2[SPACE]...PHONEME_N[NEWLINE]
	;;
	;; So we slurp in the file, split the lines by newlines, and then
	;; split on space.
	;;
	;; Gotta love slurp; one thing it took me a while to figure out the
	;; other day is that slurp starts looking for files at the top level
	;; of the Clojure project. Where the source files are means nothing
	;; to Clojure (because it means nothing to the JVM).
	;;
	;; Anyway now that we've done that we can say:
	;; (take 3 (drop 1010 rhyme-txt))
	;;
	;; (We want to fast-forward past a bunch of comments and miscellany
	;; to get a clear example, so start around 1010)
	;;
	;; And we'd see
	;; (["ACTUARY" "AE1" "K" "CH" "UW0" "EH1" "R" "IY2"]
	;; ["ACTUATE" "AE1" "K" "CH" "UW2" "EY1" "T"]
	;; ["ACTUATOR" "AE1" "K" "T" "Y" "UW0" "EY2" "T" "ER0"])
	;;
	;; So now we're dealing with a list of vectors where the first value
	;; is the word and the rest represent its phonemes.
	;;
	;; There's a lot going on with this one as there always is with
	;; Clojure code. Let's see how it gets called first:
	;;
	;; (get-deepest-values (get-in rhyme-to-word (take 3 (word-to-rhyme "connection"))))
	;; Which could probably be a function of its own called "rhyme."
	;;
	;; Although I've noticed Jack doesn't really care much whether
	;; something is a function or a variable, it's like it DOESN'T EVEN
	;; MATTER. Hmm.
	;;
	;; Anyway, we're going to give this function, word-to-rhyme a word
	;; and get something back.
	;;
	;; (word-to-rhyme "connection")
	;;
	;; and we get
	;;
	;; [:N :AH :SH :K :EH :N :AH :K]
	;;
	;; which is the reverse order of phonemes. Which makes sense, we're
	;; building a rhyming dictionary, gonna go end to beginning in terms
	;; of phonemes.
	;;
	;; Going to add a crazy amount of indenting here so I can see the
	;; levels more clearly as a n00b.

	(def word-to-rhyme
	(reduce (fn [m [word & rhyme]]
	(assoc m
	(string/lower-case word)
	(mapv
	#(keyword
	(string/replace %1 #"[0-9]" ""))
	(reverse rhyme))))
	{} rhyme-txt))

	;; This is a funny one because I mess with the parens my java keeps
	;; running out of heap space. So it's clearly doing a lot, like
	;; loading the whole CMU dictionary into memory.

	;; The super hot LISPY action in there is

	;; (reduce function=[that function] val=[an empty map {}]
	;; coll=[rhyme-txt, our list of vectors])

	;; Clojure docs are kind of a bear. For reduce, they say "If val is
	;; supplied, returns the result of applying f to val and the first
	;; item in coll, then applying f to that result and the 2nd item,
	;; etc." AWESOME.

	;; So what reduce does here is return the results applying that
	;; function in there to {} and the first item in the rhyme-text,
	;; then applies that to the second item, etc.

	;; So in [m [word & rhyme]] it's going to be going:
	;;
	;; [{} ["CONNECTION" K AH N EH K SH AH N ]]
	;;
	;; So that's interesting because we have the word & rhyme -- the way
	;; that destructuring works, rhyme will catch all of the phonemes
	;; into a list; it's almost like the CMU people could predict this
	;; kind of programming would occur using their dictionary.
	;; Then we say:
	;;
	;; (assoc map key val)
	;;
	;; Or here:
	;;
	;; (assoc {} "connection" ...)
	;;
	;; And then a couple things happen
	;;
	;; First we reverse the rhyme so "K AH0 N EH1 K SH AH0 N" becomes
	;; N AH0 SH K EH1 N AH0 K
	;; that's (reverse rhyme)
	;;
	;; SECOND we replace all the numbers with nothing (likely because we
	;; just don't need the data, can't remember why CMU uses numbers--
	;; syllable stress maybe?)
	;;
	;; we do that with a regular expression yielding
	;;
	;; N AH SH K EH N AH K
	;;
	;; THIRD we run a mapv across that list of phonemes--that is, apply
	;; to each element and return a vector. And what we are applying is
	;; the "keyword" function which turns a string to a clojure keyword so
	;; we end up with a structure like:
	;;
	;; {"connection" [:N :AH :SH :K :EH :N :AH :K]}
	;;
	;; NOTE: I'm not sure WHY we're converting to keywords but they are
	;; prettier in general and make for better keywords in maps, and I'm
	;; assuming they actually are optimized as, like, keywords, so....
	;;
	;; Anyway, and then we repeat that (lazily, I guess, so in chunks of
	;; 32 or whatever it is that Clojure does?) as needed until we've
	;; slurped up the whole file into a big map or what I still think of
	;; as an associative array. Aaaand now we have a variable that
	;; defines a function that when given---
	;;
	;; Oh god, I SEE. I ACTUALLY SEE. This is a def instead of a
	;; function for a reason. It's a var that calls a function which
	;; returns a map, but in Clojure a map can operate as a function. So
	;; when I say:
	;;
	;; (word-to-rhyme "connection") I'm causing the interpreter to read
	;; the entirety of the dictionary into a map, and assigning that map
	;; to "word-to-rhyme" and then because I'm calling word-to-rhyme as
	;; the first item in a sexp, the interpreter evaluates it as a
	;; function and returns the phonemes that it has assoc'd to that
	;; word.
	;;
	;; Clojure is kind of dense.
	;;
	;; So I'm going to assume we're in similar territory here with this
	;; function.

	(def rhyme-to-word
	(reduce
	(fn [m [word rhyme]]
	(assoc-in m rhyme { :terminate word }))
	{}
	word-to-rhyme))

	;; Aand we are, KIND OF. Hmm. So in this case we take the map (now just a
	;; map) from word-to-rhyme and do another reduce, except the
	;; structure we're building up is going to be a trie (?) so we're
	;; going:
	;;
	;; (assoc-in {} [:N :AH :SH :K :EH :N :AH :K] { :terminate "connection" })
	;;
	;; And as a result we're getting:
	;;
	;; {:N {:AH {:SH {:K {:EH {:N {:AH {:K {:terminate "connection"}}}}}}}}}
	;;
	;; Great but that's one word. NOW Clojure hands that same map back
	;; to the reduce with ANOTHER word, and so on for thousands of
	;; words, building up a huge nested behemoth of a data structure.
	;;
	;; So we've passed assoc-in the phonemes for "connection"; we can
	;; now pass it "correction" and they should be all jammed up in a
	;; really nice way...
	;;
	;; (assoc-in (assoc-in {} [:N :AH :SH :K :EH :N :AH :K] { :terminate
	;; "connection" }) [:N :AH :SH :K :EH :ER :K] {:terminate
	;; "correction"})
	;;
	;; Okay, yes we end up with something that will let us take one
	;; word, look up the phonemes (in reverse order) and look for
	;; similar phonemes, then map those back to the words. That's what
	;; we have here, no doubt. Looks like this:
	;;
	;; {:N {:AH {:SH {:K {:EH {:ER {:K {:terminate "correction"}}, :N
	;; {:AH {:K {:terminate "connection"}}}}}}}}}
	;;
	;; And since I can assoc-in I can get in too.
	;;
	;; Okay so on we go...

	(defn get-deepest-values [k]
	(if (string? k) [k] (mapcat get-deepest-values (vals k))))

	;; What the hell is this? What is it for? OH GOD.
	;;
	;; So here we're looking for strings inside a nest of keywords--that
	;; makes sense. What is mapcat? Clojure docs: "Returns the result
	;; of applying concat to the result of applying map to f and
	;; colls. Thus function f should return a collection."
	;;
	;; Great, thanks Clojure docs. You're my pal.

	;; What it means I think is that you're going to give this function
	;; a bundle of stuff and it'll do something to each piece of stuff
	;; (map...) and then smush everything together into one nice list
	;; (...cat). So we're saying given a nested associated structure
	;; like the one we just made, pull out all the values ...
	;;
	;; wait hold on--let's look at how it's actually called.

	;; Okay this is the big mooooment
	(get-deepest-values (get-in rhyme-to-word (take 5 (word-to-rhyme "erection"))))

	;; And this gives a result thus: ("erection" "direction" "correction" "collection" "inflection" "preelection" "circumspection" "introspection" "imperfection" "perfection" "midsection" "transection" "connection" "protection")

	;; But sometimes it's all too last-first for me, so Let's do that
	;; using this guy "->>" with comments

	(->>

	(word-to-rhyme "erection")
	;; gives us [:N :AH :SH :K :EH :R :IH]

	(take 5)
	;; gives us (:N :AH :SH :K :EH)--i.e. five phonemes, or the "ection" part of the rhyme.

	(get-in rhyme-to-word)
	;; so we're calling (get-in rhyme-to-word '(:N :AH :SH :K :EH))
	;; and we get this result:
	;;
	;; {:R {:IH {:terminate "erection"}}, :ER {:D
	;; {:terminate "direction"}, :K {:terminate "correction"}}, :L {:AH
	;; {:K {:terminate "collection"}}, :F {:N {:IH
	;; {:terminate "inflection"}}}, :IH {:IY {:R {:P
	;; {:terminate "preelection"}}}}}, :P {:S {:M {:AH {:K {:ER {:S
	;; {:terminate "circumspection"}}}}}, :AH {:R {:T {:N {:IH
	;; {:terminate "introspection"}}}}}}}, :F {:ER {:P {:M {:IH
	;; {:terminate "imperfection"}}, :terminate "perfection"}}}, :S {:D
	;; {:IH {:M {:terminate "midsection"}}}, :N {:AE {:R {:T
	;; {:terminate "transection"}}}}}, :N {:AH {:K
	;; {:terminate "connection"}}}, :T {:AH {:R {:P
	;; {:terminate "protection"}}}}}
	;;
	;; Okay so THAT's what we're working with when we call....


	(get-deepest-values))

	;; So it turns out that all that
	;;
	;; (defn get-deepest-values [k]
	;; (if (string? k) [k] (mapcat get-deepest-values (vals k))))
	;;
	;; Does is say: Get the values from the key/value pairs that are in
	;; a map called "k." If you hit any value at all that is a string,
	;; return it and you're done for that part. OTHERWISE keep mapping
	;; over the values and run this function again on each one of them
	;; (until you hit a string). And however nested things are is fine
	;; and all, but please return a nice flat list of results (that's
	;; why it's mapcat instead of cat")
	;;
	;; So it's like you gave it a siamese russian doll. And it keeps
	;; looking inside the first twin's dolls until it finds a piece of
	;; paper with a word or two on it. And then it throws away all the
	;; dolls. And it does the same to the other twin. Maybe the first
	;; twin is three dolls deep. Maybe the second twin is four dolls
	;; deep.

	;; So that's the reason we switched strings to keywords, because
	;; we knew we'd be looking things up by phonemes and we wanted
	;; a data type that is good for "looking things up in deeply
	;; nested maps" and if there's ONE data type like that, it's keywords! Maybe;
	;; god if I know.

	;; Of all of them, these tiny recursive functions are the hardest to
	;; write and understand.

	;; Anyway, that's how it works, I think.
No results found