@hamdshah
Forked from pnc/ocr-shot.sh
Last active March 19, 2018 13:00
Revisions

  1. @hamd-shah hamd-shah revised this gist Mar 19, 2018. 1 changed file with 7 additions and 1 deletion.
    8 changes: 7 additions & 1 deletion ocr-shot.sh
    @@ -14,4 +14,10 @@ EOF
    ) | plutil -convert binary1 - -o - | xxd -p | tr -d '\n')
    xattr -w -x com.apple.metadata:kMDItemFinderComment "$hex" "$1"
    -mdimport "$1"
    +mdimport "$1"
    +# Alternative: preprocess the image before running OCR.
    +# Screenshots of individual windows carry an alpha channel that prevents Tesseract from scanning them properly, and a lot of UI text is too small to scan accurately. To solve this I preprocessed with ImageMagick like so:
    +#CONTENTS=$(convert "$1" -magnify -alpha remove - | tesseract -c language_model_penalty_non_dict_word=0.8 --tessdata-dir /usr/local/share/ stdin stdout -l eng | xml esc)
    +# Testing with a screenshot of my Terminal, I got better results with -magnify than with -adaptive-resize '200%x200%', but YMMV.
  2. @pnc pnc revised this gist Mar 17, 2018. 1 changed file with 1 addition and 3 deletions.
    4 changes: 1 addition & 3 deletions ocr-shot.sh
    @@ -1,6 +1,6 @@
    #!/bin/bash

    -set -ev
    +set -e

    CONTENTS=$(tesseract -c language_model_penalty_non_dict_word=0.8 --tessdata-dir /usr/local/share/ "$1" stdout -l eng | xml esc)

    @@ -13,7 +13,5 @@ hex=$((cat <<EOF
    EOF
    ) | plutil -convert binary1 - -o - | xxd -p | tr -d '\n')
    -echo $hex
    xattr -w -x com.apple.metadata:kMDItemFinderComment "$hex" "$1"
    mdimport "$1"
  3. @pnc pnc created this gist Mar 17, 2018.
    19 changes: 19 additions & 0 deletions ocr-shot.sh
    @@ -0,0 +1,19 @@
    #!/bin/bash

    set -ev

    CONTENTS=$(tesseract -c language_model_penalty_non_dict_word=0.8 --tessdata-dir /usr/local/share/ "$1" stdout -l eng | xml esc)

    hex=$((cat <<EOF
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <string>$CONTENTS</string>
    </plist>
    EOF
    ) | plutil -convert binary1 - -o - | xxd -p | tr -d '\n')
    echo $hex
    xattr -w -x com.apple.metadata:kMDItemFinderComment "$hex" "$1"
    mdimport "$1"
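
Every revision above pipes Tesseract's output through `xml esc` before placing it inside the plist `<string>` element. That command appears to be xmlstarlet's escape subcommand, which is not installed by default. As a minimal sketch, assuming only the three characters `&`, `<`, and `>` need escaping inside a `<string>` element, a `sed` filter could stand in for it. The function name `xml_escape` is hypothetical and not part of the original gist.

```shell
#!/bin/bash

# Hypothetical stand-in for `xml esc` (not part of the original gist).
# Reads text on stdin and writes it to stdout with &, <, and > replaced
# by XML entities. The ampersand rule runs first so that the entities
# produced by the later rules are not escaped a second time.
xml_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Example: text that would otherwise break the plist markup.
printf '%s\n' 'if (a < b && c > d)' | xml_escape
# prints: if (a &lt; b &amp;&amp; c &gt; d)
```

With this helper defined, the tail of the `CONTENTS=$(...)` pipeline could end in `| xml_escape` instead of `| xml esc`. The rest of the script would stay unchanged.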