Look for abbreviations in pattern files

I put a commented version at the bottom that explains what this does.

  1. Install Homebrew from http://brew.sh/
  2. In Terminal: brew install poppler hunspell
  3. In Terminal: mkdir ~/Dictionaries
  4. Download dictionaries from https://cgit.freedesktop.org/libreoffice/dictionaries/tree/en and put them in ~/Dictionaries. Use the "plain" link. You need both the .dic and the .aff files (the command in the next step uses en_US).
  5. In Terminal, replacing pattern.pdf with the path to your pattern: pdftotext pattern.pdf - | tr -s '[:blank:][:punct:]' '\n' | awk 'length($1) >= 2 && length($1) <= 5 { print $1 }' | hunspell -d ~/Dictionaries/en_US -a | awk '{print $1,$2}' | grep -v '^[\*@]' | tr -d '&#' | sort | uniq -c
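
The last few stages of the command in step 5 can also be sketched as a small shell function. This is only an illustration, not a replacement for the one-liner: the name count_misses is made up here, and it assumes hunspell's -a output marks unrecognized words with a leading & (suggestions available) or # (no suggestions), with the word itself in the second field. Under that assumption it keeps only those lines, so the version banner, the * lines for recognized words, and the blank separator lines are all dropped before counting.

```shell
# Hypothetical helper (the name is an assumption, not part of hunspell).
# Reads `hunspell -a` output on stdin and prints "count word" for each
# unrecognized word, sorted by word.
count_misses() {
  # Keep only miss lines: "& word ..." or "# word ...".
  awk '$1 == "&" || $1 == "#" { print $2 }' |
    sort |
    uniq -c |
    # Normalize uniq's padded output to "count word".
    awk '{ print $1, $2 }'
}

# Possible usage, with the same first stages as step 5:
#   pdftotext pattern.pdf - \
#     | tr -s '[:blank:][:punct:]' '\n' \
#     | awk 'length($1) >= 2 && length($1) <= 5 { print $1 }' \
#     | hunspell -d ~/Dictionaries/en_US -a \
#     | count_misses
```

Words that appear many times and are not in the dictionary are the likely abbreviations, so the highest counts are the ones to look at first.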