Created
March 3, 2012 17:07
Revisions
henrik revised this gist
Mar 3, 2012. 1 changed file with 3 additions and 1 deletion.
@@ -4,7 +4,9 @@ Install ImageMagick for image conversion:

Install tesseract for OCR:

    brew install tesseract --all-languages

Or install without `--all-languages` and [install them manually as needed](http://blog.philippklaus.de/2011/01/chinese-ocr/).

Make sure the input image is a grayscale `.tif` and fairly large. ~500x150 was too small, while ~2000x500 worked very well.
henrik created this gist
Mar 3, 2012.
@@ -0,0 +1,17 @@

Install ImageMagick for image conversion:

    brew install imagemagick

Install tesseract for OCR:

    brew install tesseract

Make sure the input image is a grayscale `.tif` and fairly large. ~500x150 was too small, while ~2000x500 worked very well.

    convert input.png -resize 400% -type Grayscale input.tif

OCR it. The default language is English. Language codes are three characters long; see `man tesseract`.

    tesseract -l eng input.tif output

This creates `output.txt`.
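
The convert and OCR steps above could be wrapped in one small shell function. This is only a sketch: `ocr_image` is a hypothetical helper name, the default scale of 400 simply mirrors the `-resize 400%` used above, and the options are passed to `tesseract` after the file names (the order given in its man page) rather than before them.

```shell
#!/bin/sh
# Sketch: upscale an image into a grayscale TIFF, then OCR it.
# ocr_image is a hypothetical helper, not part of ImageMagick or tesseract.
# Usage: ocr_image INPUT [LANG] [SCALE]
ocr_image() {
    input=$1
    lang=${2:-eng}      # three-character tesseract language code
    scale=${3:-400}     # resize percentage (assumed default, taken from the text)
    base=${input%.*}    # strip the extension: scan.png -> scan

    # Enlarge and convert to a grayscale .tif, as recommended above.
    convert "$input" -resize "${scale}%" -type Grayscale "$base.tif" || return 1

    # tesseract appends .txt to the output base name on its own.
    tesseract "$base.tif" "$base" -l "$lang" || return 1

    # Print the name of the text file that was written.
    echo "$base.txt"
}
```

For example, `ocr_image scan.png deu` would write `scan.tif` and `scan.txt`, assuming the German language data is installed. Passing an input that is already a `.tif` would overwrite it in place, so a real script might want to guard against that.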