GOCR

GOCR, also often referred to as jOCR, is an OCR (Optical Character Recognition) program developed under the GNU General Public License. It converts scanned images of text back to text files. Joerg Schulenburg started the program and now leads a team of developers with Bruno Barberi Gnecco. It reads images in many formats (pnm, pbm, pgm, ppm, and some pcx and tga image files) and outputs a text file. The gOCR/2 port was done by Franz Bakan.

Here you can find OS/2 executables of GOCR Version 0.38 if you want to play with it. GOCR.EXE is compiled with GCC 3.2.1 (available via ftp from netlabs).

Usage
Type gocr -h for usage.

Example 'one-liner' of a scan2text.cmd:

    scanimage --device=epson --mode=Gray --resolution=300 | gocr - > textfile.txt

Another example:

    scanimage --device=epson --mode=Gray --resolution=300 1>out.pnm 2>out.error && gocr out.pnm > ocr.txt
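If you already have several scans saved as .pnm files, the file-based invocation (gocr image.pnm > text.txt) can be wrapped in a small loop. The following is only a sketch: it assumes a POSIX-style sh is available (for example the one used for sh configure), and the function name ocr_dir and the GOCR variable are made up for this illustration, not part of gocr itself.

```shell
#!/bin/sh
# Sketch: run gocr over every .pnm file in a directory, writing one
# text file per image. GOCR is a hypothetical override for the program
# name or path; it defaults to "gocr".
GOCR=${GOCR:-gocr}

# ocr_dir is an illustrative helper name, not a gocr feature.
ocr_dir() {
    dir=${1:-.}
    for img in "$dir"/*.pnm; do
        # Skip the unexpanded pattern when the directory has no .pnm files.
        [ -e "$img" ] || continue
        out="${img%.pnm}.txt"
        # Image file name in, recognized text on stdout.
        if "$GOCR" "$img" > "$out"; then
            echo "wrote $out"
        else
            echo "gocr failed on $img" >&2
        fi
    done
}
```

Calling ocr_dir scans would then process scans/page1.pnm into scans/page1.txt, and so on. Because each image is processed separately, a slow or failing page does not stop the remaining ones.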

Hints
If the image is complex or the letters are small, gocr is quite slow (expect a duration of several minutes).

I suggest that you make your first tests with small scans.

How to compile with GCC 3.2.1
 * get and install os2unix
 * delete make.bat
 * os2unix -all
 * sh configure (probably only if you use 2.0 of os2unix)
 * remove the following 3 lines, found after LIBOBJS = pgm2asc.$(obj), from src/Makefile:

    ifeq ($(omf),on)
    LIBOBJS := $(LIBOBJS:.o=.obj)
    endif

 * make

Compiling with emx-gcc required 2 more steps:

 * copy src\libPgm2asc.a src\Pgm2asc.a
 * make

Links

 * GOCR 0.47