some small programs


GUI frontend to convert Scan Tailor .tiff output to an OCR'ed, searchable .djvu file


Detailed Info:

Made in AutoHotkey by nod5. Free software under GPLv3.

How to use: drag and drop a file onto a command.

The first command takes a .tiff as input,
operates on all .tiff files in the dropped file's folder, and
outputs an OCR'ed, searchable .djvu file.

- for use on .tiff from Scan Tailor
- operates on *all* .tiff in same folder as dropped file
- uses -lossy setting to minimize djvu file size
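
The "all .tiff in same folder" behaviour can be pictured with a short Python sketch. This is an illustration only, not the program's AutoHotkey source: the function name sibling_tiffs is hypothetical, and matching both .tif and .tiff as well as sorting by file name are assumptions.

```python
from pathlib import Path

def sibling_tiffs(dropped_file):
    """Return every .tif/.tiff file in the dropped file's folder, sorted by name.

    Hypothetical helper: the extension set and the sort order are assumptions,
    not documented behaviour of TiffDjvuOcr.
    """
    folder = Path(dropped_file).resolve().parent
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}
    )
```

Dropping any one page from a Scan Tailor output folder would, under these assumptions, select every page in that folder.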

Required programs (try the latest Windows binary version):
1. djvulibre (3.5.22 and 3.5.23 work)
2. tesseract (tesseract-3 works)
   Check the ReadMe/FAQ on the site; two downloads are needed:
   eng.traineddata.gz (unpack and put in subfolder tesseract-ocr\tessdata\ )

Command line use:
TiffDjvuOcr.exe "C:\b\a.tif"          = all .tiff in drop folder to .djvu, with OCR
TiffDjvuOcr.exe noocr "C:\b\a.tif"    = all .tiff in drop folder to .djvu, no OCR
TiffDjvuOcr.exe "C:\b\a.djvu"         = OCR an existing .djvu
TiffDjvuOcr.exe gettif "C:\b\a.djvu"  = get .tiff (.djvu to multipage .tiff)
TiffDjvuOcr.exe img "C:\b\a.jpg"      = single image file to .djvu
TiffDjvuOcr.exe join "C:\b\a.djvu"    = join all .djvu in drop folder
TiffDjvuOcr.exe noloss "C:\b\a.tiff"  = all .tiff in drop folder to .djvu, no-loss (bigger file; use if the .djvu shows character errors)
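
For scripted use, the command forms listed above can be assembled programmatically. The following Python sketch only builds the argument list. The helper name build_command is hypothetical, the mode keywords are copied from the list above, and it assumes TiffDjvuOcr.exe is on PATH or in the working directory.

```python
# Mode keywords copied from the command-line list; None means "no keyword",
# i.e. the default action chosen by the file type of the path.
MODES = {None, "noocr", "gettif", "img", "join", "noloss"}

def build_command(path, mode=None, exe="TiffDjvuOcr.exe"):
    """Build the argument list for one TiffDjvuOcr call (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = [exe]
    if mode is not None:
        cmd.append(mode)        # the keyword comes before the file path
    cmd.append(str(path))       # a single file path is always the last argument
    return cmd

# Possible use on Windows (not run here):
#   import subprocess
#   subprocess.run(build_command(r"C:\b\a.tif", "noocr"))
```

Passing the list to subprocess.run avoids manual quoting of paths that contain spaces. Whether the program reports failures through its exit code is not stated in this document, so the sketch does not check it.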

Changelog + md5:

50bc4f32bd7e1b91311bf725a65dc416 *TiffDjvuOcr.ahk
36d2633fdecbe4502fdbb49d0babed06 *TiffDjvuOcr.exe
v110305 New commands: to .djvu no-loss, join .djvu, img to .djvu; AutoHotkey_L compatible.
v101013 ImageMagick no longer needed; now using Tesseract 3; fixed error when running OCR on pages with no text
v100605 Perl no longer needed for processing tesseract output (thanks ewemoa!)
v100404 first release
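
The md5 sums listed above can be compared against a downloaded copy. A minimal Python sketch, using only the standard hashlib module; the helper names md5_of and matches are hypothetical. Note that an MD5 match guards against a corrupted download, not against deliberate tampering.

```python
import hashlib

def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex):
    """True if the file's MD5 equals the expected hex string (case-insensitive)."""
    return md5_of(path) == expected_hex.strip().lower()

# Possible use, with the sum listed for the .exe:
#   matches("TiffDjvuOcr.exe", "36d2633fdecbe4502fdbb49d0babed06")
```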