Tesseract Open Source OCR Engine


Tesseract is a free optical character recognition (OCR) engine originally developed at Hewlett-Packard and currently developed by Google. It is a raw OCR engine: it has no document layout analysis, no output formatting, and no graphical user interface. It processes only a TIFF or BMP image of a single column and creates text from it. It can detect fixed-pitch vs. proportional text. The engine ranked in the top three for character accuracy in 1995. The source code reads a binary, grey, or color image and outputs text.

Tesseract can process English, French, Italian, German, Spanish, Brazilian Portuguese, and Dutch, and can be trained to work in other languages as well.
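As a rough illustration of the image-in, text-out workflow described above, the following sketch builds a command line for the `tesseract` executable. It assumes the usual `tesseract IMAGE OUTPUTBASE -l LANG --psm N` invocation and the standard traineddata codes for the listed languages. The helper names `build_command` and `run_ocr` are hypothetical and not part of the package.

```python
import shutil
import subprocess

# Traineddata codes for the languages named in the description.
# Assumption: Brazilian Portuguese is covered by the generic "por" model.
LANGUAGE_CODES = {
    "English": "eng",
    "French": "fra",
    "Italian": "ita",
    "German": "deu",
    "Spanish": "spa",
    "Brazilian Portuguese": "por",
    "Dutch": "nld",
}


def build_command(image_path, output_base, language="English", single_column=True):
    """Return the argument list for one tesseract run (hypothetical helper)."""
    cmd = ["tesseract", image_path, output_base, "-l", LANGUAGE_CODES[language]]
    if single_column:
        # Page segmentation mode 4 assumes a single column of text.
        cmd += ["--psm", "4"]
    return cmd


def run_ocr(image_path, output_base, language="English"):
    """Run tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_command(image_path, output_base, language), check=True)
    # tesseract appends ".txt" to the output base for plain-text output.
    return output_base + ".txt"
```

Calling `run_ocr("page.tif", "page")` would, under these assumptions, write the recognized text to `page.txt`. The matching language data must be installed separately for any language other than the default.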

Source Files
Filename                 Size (bytes)   Size
baselibs.conf                      14   14 Bytes
tesseract-5.4.0.tar.gz        1900009   1.81 MB
tesseract-ocr.changes           17020   16.6 KB
tesseract-ocr.spec               3868   3.78 KB
Latest Revision
Ana Guerrero (anag+factory) accepted request 1179190 from Martin Pluskal (pluskalm) (revision 17)
- Update to version 5.4.0:
  * Build fixes, code refactoring and other smaller changes.
  * Fix grey result of indexed PNG in pdfrenderer.
  * Rename frk -> deu_latf (ISO 639-3, ISO 15924).
  * Remove broken Dockerfile.
  * Fixes for several issues reported by Coverity Scan.
  * Remove unsupported OpenCL code and related API functions (#4220).
  * Facilitate vectorization for generic build (#4223).
  * Add PAGE XML renderer / export (#4214).
  * Support training without lstmf files.
  * Improve CCUtil::main_setup (fixes issue #4230 related to Conda).
  * Allow for text angle/gradient to be retrieved (#4070).
  * Fix setup of datadir on installations with Conda (issue #4230) (#4240).
  * Fix FP exception in Wordrec::angle_change (issue #4242) (#4243).
  * Small build fixes and code improvements.

- Disable OpenCL support due to boo#1213370