nod5
some small programs

TiffDjvuOcr

GUI frontend to convert Scan Tailor tiff output to an OCR'ed, searchable djvu file

   

Detailed Info:

made in Autohotkey by nod5 as free software under GPL3

How to use: Drag and drop a file onto a command.


The first command takes a .tiff as input,
operates on all .tiff files in the dropped file's folder and
outputs an OCR'ed, searchable .djvu file.


- for use on .tiff from Scan Tailor
- operates on *all* .tiff in same folder as dropped file
- uses -lossy setting to minimize djvu file size
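
The folder-wide behaviour in the list above can be pictured with a small Python sketch. It is not the tool's own code (the tool is written in Autohotkey). The function name, the matching of both .tif and .tiff, and the sort by file name are assumptions made for illustration only.

```python
from pathlib import Path

def tiffs_in_same_folder(dropped_file):
    """Return every .tif/.tiff file in the folder of dropped_file, sorted by name."""
    folder = Path(dropped_file).resolve().parent
    # Assumption: both extensions count, compared case-insensitively.
    return sorted(
        (p for p in folder.iterdir()
         if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}),
        key=lambda p: p.name.lower(),
    )
```

Running it on one page of a Scan Tailor output folder before dropping that page on the tool shows which files would probably be picked up, so stray .tiff files can be moved out first.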

required programs (try the latest Windows binary version):
1. djvu.sourceforge.net (djvulibre 3.5.22 and 3.5.23 work)
2. code.google.com/p/tesseract-ocr/ (tesseract-3 works)
   check the ReadMe/FAQ on the site; two downloads are needed:
   tesseract-3.00.win32.zip
   eng.traineddata.gz (unpack and put in subfolder tesseract-ocr\tessdata\ )
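
If no archive program that handles .gz is at hand, the unpack step for eng.traineddata.gz could be done with a short Python sketch like the one below. The function name and its two parameters are hypothetical. The only thing taken from the notes above is that the unpacked file should end up in the tessdata subfolder of the tesseract-ocr folder.

```python
import gzip
import shutil
from pathlib import Path

def unpack_traineddata(gz_path, tesseract_dir):
    """Unpack e.g. eng.traineddata.gz into <tesseract_dir>/tessdata/."""
    tessdata = Path(tesseract_dir) / "tessdata"
    tessdata.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the last suffix: eng.traineddata.gz -> eng.traineddata
    target = tessdata / Path(gz_path).stem
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, unpack_traineddata("eng.traineddata.gz", "tesseract-ocr") would write tesseract-ocr\tessdata\eng.traineddata, assuming the download sits in the current folder.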


command line use:
TiffDjvuOcr.exe "C:\b\a.tif"           = (all dropfolder) .tiff to .djvu ocr
TiffDjvuOcr.exe noocr "C:\b\a.tif"     = (all dropfolder) .tiff to .djvu
TiffDjvuOcr.exe "C:\b\a.djvu"          = ocr .djvu
TiffDjvuOcr.exe gettif "C:\b\a.djvu"   = get .tiff (.djvu to multipage tiff)
TiffDjvuOcr.exe img "C:\b\a.jpg"       = single image file to .djvu
TiffDjvuOcr.exe join "C:\b\a.djvu"     = join .djvu (all in dropfolder)
TiffDjvuOcr.exe noloss "C:\b\a.tiff"   = (all dropfolder) .tiff to .djvu no-loss (bigger file; use if there are character errors in the djvu)
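
For scripted use, the command forms above could be wrapped in a few lines of Python. This is only a sketch: the helper names build_command and run are made up here, and it assumes TiffDjvuOcr.exe can be found by Windows (same folder or on PATH). The mode keywords are the ones listed above and nothing else.

```python
import subprocess

# Mode keywords exactly as listed above; None means "no keyword".
MODES = {None, "noocr", "gettif", "img", "join", "noloss"}

def build_command(path, mode=None, exe="TiffDjvuOcr.exe"):
    """Return the argument list for one TiffDjvuOcr call."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    args = [exe]
    if mode is not None:
        args.append(mode)
    args.append(path)
    return args

def run(path, mode=None):
    # Windows only; the meaning of the exit code is not documented here,
    # so it is returned unchecked.
    return subprocess.run(build_command(path, mode)).returncode
```

build_command(r"C:\b\a.tif", "noocr") gives the same three parts as the second line of the list above. Passing a list instead of one string lets subprocess handle the quoting of paths with spaces.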

Changelog + md5:

50bc4f32bd7e1b91311bf725a65dc416 *TiffDjvuOcr.ahk
36d2633fdecbe4502fdbb49d0babed06 *TiffDjvuOcr.exe
v110305 New commands: to .djvu no-loss, join .djvu, img to .djvu; Autohotkey_L compatible.
v101013 ImageMagick no longer needed; now using Tesseract 3; fixed error at ocr on pages with no text
v100605 Perl no longer needed for processing tesseract output (thanks ewemoa!)
v100404 first release
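
The md5 lines above (digest, then *filename) can be checked against a downloaded file with a sketch like this one. The function names are hypothetical, and the "digest *filename" layout is assumed from how the two lines are written.

```python
import hashlib

def parse_md5_line(line):
    """Split a 'digest *filename' line into (digest, filename)."""
    digest, name = line.split(None, 1)
    return digest.lower(), name.strip().lstrip("*")

def md5_of_file(path, chunk_size=65536):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches(line, path):
    """True if the file at path has the digest given in the md5 line."""
    expected, _ = parse_md5_line(line)
    return md5_of_file(path) == expected
```

For example, matches("36d2633fdecbe4502fdbb49d0babed06 *TiffDjvuOcr.exe", "TiffDjvuOcr.exe") should be True for an unmodified download of the matching version. A mismatch means a different version or a damaged file.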