@henrik
Created March 3, 2012 17:07

Revisions

  1. henrik revised this gist Mar 3, 2012. 1 changed file with 3 additions and 1 deletion.
    4 changes: 3 additions & 1 deletion ocr.markdown
    @@ -4,7 +4,9 @@ Install ImageMagick for image conversion:

    Install tesseract for OCR:

    -brew install tesseract
    +brew install tesseract --all-languages
    +
    +Or install without `--all-languages` and [install them manually as needed](http://blog.philippklaus.de/2011/01/chinese-ocr/).

    Make sure the input image is a grayscale `.tif` and fairly large. ~500x150 was too small, while ~2000x500 worked very well.

  2. henrik created this gist Mar 3, 2012.
    17 changes: 17 additions & 0 deletions ocr.markdown
    @@ -0,0 +1,17 @@
    Install ImageMagick for image conversion:

    brew install imagemagick

    Install tesseract for OCR:

    brew install tesseract

    Make sure the input image is a grayscale `.tif` and fairly large. ~500x150 was too small, while ~2000x500 worked very well.

    convert input.png -resize 400% -type Grayscale input.tif

    OCR it. The default language is English. Language codes are 3 chars per `man tesseract`.

    tesseract -l eng input.tif output

    This creates `output.txt`.
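
The convert and OCR steps above could be wrapped in one small shell function. This is only a sketch: `ocr_png`, `run`, and `DRY_RUN` are hypothetical names invented here, not part of ImageMagick or tesseract, and unlike the commands above the text file is named after the input (`input.png` gives `input.txt`) rather than `output.txt`.

```shell
#!/bin/sh
# Sketch: upscale a PNG to a grayscale TIFF, then OCR it with tesseract.
# ocr_png, run and DRY_RUN are hypothetical names used for illustration.

# With DRY_RUN=1, print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

ocr_png() {
    src=$1              # source image, e.g. input.png
    lang=${2:-eng}      # 3-char tesseract language code, English by default
    base=${src%.*}      # strip the extension: input.png -> input

    # Same two commands as above; tesseract appends .txt to the output base.
    run convert "$src" -resize 400% -type Grayscale "$base.tif" &&
    run tesseract -l "$lang" "$base.tif" "$base"
}
```

After sourcing the file, `ocr_png input.png` runs both steps, and setting `DRY_RUN=1` first only prints the two commands, which is a cheap way to check the file names before running OCR on a large image.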