Tesseract Open Source OCR Engine
Tesseract is a free optical character recognition (OCR) engine originally developed at Hewlett-Packard and now developed by Google. It is a raw OCR engine: it has no document layout analysis, no output formatting, and no graphical user interface. It processes only a TIFF or BMP image of a single column and produces text from it, and it can distinguish fixed-pitch from proportional text. The engine ranked in the top 3 for character accuracy in 1995. The source code reads a binary, grey or color image and outputs text. Tesseract can process English, French, Italian, German, Spanish, Brazilian Portuguese and Dutch, and can be trained to work in other languages as well.
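As a rough illustration of the image-in, text-out workflow described above, the following Python sketch wraps the `tesseract` command-line tool, which takes an image path, an output base name and an optional `-l` language code. The helper names `build_tesseract_command` and `ocr_image` are hypothetical, and the sketch assumes the `tesseract` binary and the requested language data are installed.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <outputbase> -l <lang>.

    Hypothetical helper; it only assembles the arguments and runs nothing.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run the tesseract CLI on one image and return the recognised text.

    Assumes the tesseract binary is on PATH and that the language data
    for `lang` is installed; raises RuntimeError if the binary is missing.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,
    )
    # Tesseract writes its plain-text result to "<output_base>.txt".
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `ocr_image("page.tif", "page")` would, under these assumptions, write `page.txt` next to the working directory and return its contents. Since the engine does no output formatting, the result is plain text only.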
Links to Publishing / tesseract-ocr
Source Files
Filename | Size | Changed |
---|---|---|
3.04.01.tar.gz | 2.16 MB | about 2 years ago |
_link | 118 Bytes | almost 2 years ago |
tesseract-ocr.changes | 6.51 KB | about 2 years ago |
tesseract-ocr.spec | 3.7 KB | almost 2 years ago |