File ocrodjvu.changes of Package ocrodjvu
-------------------------------------------------------------------
Tue Apr 9 17:04:12 UTC 2019 - Kyrill Detinov <lazy.kent@opensuse.org>
- Update to 0.11.
* Tesseract:
+ Don't insist that language codes are always 3 letters long.
+ Make it possible to pass arbitrary options to Tesseract.
+ Speed up extraction of character-level details for Tesseract ≥ 3.04.
* Limit the number of OMP threads (used by Tesseract), so that
the overall number of threads doesn't exceed the number specified
by -j.
* Require subprocess32, even when no parallelism was requested
by the user.
* Reduce memory consumption by keeping OCR results in memory only
as long as necessary.
* Stop honoring the “tessdata” environment variable.
* Suggest using -e/--engine if the (implicitly selected) default
OCR engine was not found.
* Improve error handling.
* Improve documentation.
* Improve the test suite.
- Add runtime dependency: python2-subprocess32.
-------------------------------------------------------------------
Tue Oct 16 21:03:16 UTC 2018 - lazy.kent@opensuse.org
- Update to 0.10.4.
* Fix handling input files with non-ASCII names. Regression
introduced in 0.9.1.
* Improve documentation:
+ Fix punctuation.
+ Clarify Python version requirements.
+ Update the credits file to make it clear that the project is
no longer being funded.
-------------------------------------------------------------------
Thu May 24 09:23:55 UTC 2018 - lazy.kent@opensuse.org
- Update to 0.10.3.
* Tesseract: fix stripping multi-line comments.
* Drop support for python-djvulibre < 0.3.9.
* Improve documentation:
+ Document untrusted search path vulnerability that was unknowingly
fixed in 0.4.7.
+ Document that argparse is only needed for Python 2.6.
+ Link to Python 2 (not Python 3) documentation.
+ Use HTTPS for unicode.org URLs.
+ Update Tesseract bug tracker URL.
+ Update HTML5 specification URL.
+ Update PyPI URLs.
* Improve the setup script:
+ Don't import any own modules.
+ Use distutils644 to sanitize tarball permissions etc.
* Improve the test suite:
+ Fix compatibility with subprocess32 3.5.0.
- Add Recommends: python-subprocess32.
-------------------------------------------------------------------
Sat Apr 28 07:43:29 UTC 2018 - lazy.kent@opensuse.org
- Correct dependencies (python -> python2).
- Replace "env" shebang with "python2".
- Remove BuildRequires: gpg-offline.
-------------------------------------------------------------------
Tue Apr 4 11:52:51 UTC 2017 - lazy.kent@opensuse.org
- Update to 0.10.2.
* Make --version also print the versions of Python and the libraries.
* Make --version print to stdout, not stderr.
* Make bad usage exit status 1.
* Drop support for PyICU < 1.0.
* Update DocBook XSL homepage URL.
- Update the source URL.
-------------------------------------------------------------------
Fri Nov 25 16:20:22 UTC 2016 - lazy.kent@opensuse.org
- Update to 0.10.1.
* Don't hardcode the Python interpreter path in script shebangs;
use “#!/usr/bin/env python” instead.
* Include a missing test image in the tarball.
* Update Tesseract homepage URL.
* Update bug tracker URLs. The project repo has moved to GitHub.
-------------------------------------------------------------------
Tue Jul 5 18:07:39 UTC 2016 - lazy.kent@opensuse.org
- Update to 0.10.
* Add support for cuneiform-multilang as OCR engine.
* Improve error handling.
-------------------------------------------------------------------
Wed Jun 1 08:34:42 UTC 2016 - lazy.kent@opensuse.org
- Update to 0.9.2.
* Fix crashes on empty pages.
* Fix typos.
* Ignore boring diagnostic messages from Tesseract.
* Update the HTML5 specification URLs.
* Update the ICU website URL.
* Update the PyICU website URL.
* Rename the test modules, so that passing --all to nosetests is
no longer necessary.
- Correct the source URL.
-------------------------------------------------------------------
Mon Aug 31 16:41:15 UTC 2015 - lazy.kent@opensuse.org
- Update to 0.9.1.
* Use the subprocess32 module (a thread-safe replacement for the
subprocess module) when it's available.
* Issue a warning when the -j/--jobs option is enabled, but the
subprocess module is not thread-safe.
* Include an example script for converting scans to DjVu + hOCR.
* Improve error handling.
- Package an example script.
-------------------------------------------------------------------
Fri Jul 31 19:58:45 UTC 2015 - lazy.kent@opensuse.org
- Update to 0.9.
* If python-djvulibre >= 0.4 is installed, don't escape non-ASCII
characters in djvused scripts.
* Improve error handling.
-------------------------------------------------------------------
Sun Jun 21 13:18:35 UTC 2015 - lazy.kent@opensuse.org
- Update to 0.8.
* Change the default OCR engine to Tesseract.
* Add the “tesseract: ” prefix to messages Tesseract prints on
stderr.
* Ensure that exit code is non-zero if the program recovered from
an error.
* Improve error handling.
- Drop ocrodjvu-0.7.2-engine.patch: upstream uses tesseract now.
-------------------------------------------------------------------
Fri Nov 14 18:25:43 UTC 2014 - lazy.kent@opensuse.org
- Update to 0.7.19.
* Make sure that text zones are at least 1 pixel wide and 1 pixel
high.
* Tesseract: fix splitting bounding boxes for character clusters.
* Fix typos in the documentation.
-------------------------------------------------------------------
Wed Apr 23 04:54:43 UTC 2014 - lazy.kent@opensuse.org
- Update to 0.7.18.
* Fix counting pages when file identifier cannot be converted to
locale encoding.
* Use HTTPS URLs when they are available, in documentation and
code.
* Update some stale URLs in documentation and code.
- Update keyring.
-------------------------------------------------------------------
Sat Feb 8 09:11:56 UTC 2014 - lazy.kent@opensuse.org
- Update to 0.7.17.
* Fix compatibility with Tesseract > 3.02.
* ocrodjvu:
+ Ensure that the exit code is non-zero if the program was
interrupted by the user.
+ Fix typos in the documentation.
-------------------------------------------------------------------
Tue Apr 30 05:32:55 UTC 2013 - lazy.kent@opensuse.org
- Update to 0.7.16.
* Use “en-US-POSIX” as the default locale for ICU.
* ocrodjvu:
+ Fix option names in documentation of the --ocr-only option.
+ Don't crash if file identifier is not in UTF-8 or if it
cannot be converted to locale encoding; use the page number
instead.
+ Don't hang if a page cannot be decoded.
-------------------------------------------------------------------
Wed Apr 17 18:08:30 UTC 2013 - lazy.kent@opensuse.org
- Update to 0.7.15.
* Strip trailing whitespace from text zones bigger than words
(lines, paragraphs, …).
* Fix compatibility with Tesseract 3.02.
* ocrodjvu:
+ Make it possible to pass multiple languages to Tesseract ≥
3.02.
+ Cuneiform: rename mixed Russian-English language code:
“rus-eng” → “rus+eng”. This is for consistency with
Tesseract.
+ Tesseract: fix support for Chinese language pack.
+ Tesseract: make it possible to pass the -psm option in order
to customize layout analysis. For example, to enable OSD,
use: “-X extra_args='-psm 1'”.
+ Make --list-languages output sorted.
+ Tesseract: remove “osd” from language list.
+ Accept both ISO 639-2/T and ISO 639-2/B language codes.
+ Add the --save-raw-ocr option.
+ Add the --raw-ocr-filename-template option.
+ Improve documentation of the --ocr-only option.
* Fix compatibility with nose 1.2.
- Drop ocrodjvu-0.7.14-nose.patch (fixed upstream).
- Change Requires: -> Recommends: python-PyICU (required for the
``--word-segmentation=uax29`` option), python-html5lib (required
for the ``--html5`` option).
-------------------------------------------------------------------
Sun Apr 7 16:08:29 UTC 2013 - lazy.kent@opensuse.org
- Add ocrodjvu-0.7.14-nose.patch: fix compatibility with
python-nose 1.2 (needed for tests).
-------------------------------------------------------------------
Mon Apr 1 14:50:21 UTC 2013 - lazy.kent@opensuse.org
- Update to 0.7.14.
* Document which versions of OCRopus are supported.
* Document that PyICU and html5lib are only required for some
optional features.
* Document what software is needed to rebuild the manual pages
from source.
* djvu2hocr:
+ Add the --title option.
+ Add the --css option.
+ Document the -p/--pages option.
-------------------------------------------------------------------
Fri Feb 15 06:22:44 UTC 2013 - lazy.kent@opensuse.org
- Update to 0.7.13.
* Improve the manual pages.
* Improve the test suite.
- Verify GPG signature.
-------------------------------------------------------------------
Sat Aug 25 07:21:13 UTC 2012 - lazy.kent@opensuse.org
- Update to 0.7.12.
* Don't let “-X fix-html=1” break HTML snippets ocrodjvu
generates itself for the “-t chars” Tesseract support.
- Depends on python-xml in openSUSE < 12.1.
-------------------------------------------------------------------
Sat Jun 16 18:26:17 UTC 2012 - lazy.kent@opensuse.org
- Update to 0.7.11.
* hocr2djvused:
+ Allow processing multiple hOCR documents at once.
* Fix merging results of two Tesseract runs.
- Changes in 0.7.10.
* Improve error handling.
* ocrodjvu:
+ Attempt to fix encoding issues and eliminate unwanted control
characters in files produced by Tesseract and Cuneiform.
* hocr2djvused:
+ Add the --fix-utf8 option.
* djvu2hocr:
+ Translate DjVu “region” to <div class="ocrx_block"> (instead
of <span…>, which was causing XHTML validity errors).
* Include example scans2djvu+hocr script.
* Fix merging results of two Tesseract runs.
* Use RFC 3339 date format in the manual page.
- Changes in 0.7.9.
* Improve error handling.
* Fix compatibility with Tesseract > 3.01.
-------------------------------------------------------------------
Fri Jan 27 09:17:08 UTC 2012 - lazy.kent@opensuse.org
- Update to 0.7.8.
* Improve test suite.
- Changes in 0.7.7.
* Raise proper import error if html5lib is not installed.
-------------------------------------------------------------------
Mon Nov 14 08:38:05 UTC 2011 - lazy.kent@opensuse.org
- Update to 0.7.6.
* Improve error handling.
* ocrodjvu:
+ Fix a regression in gocr, ocrad and tesseract engines, which
made them unusable.
- Changes in 0.7.5.
* Accept slightly malformed hOCR documents (with a text zone not
completely within the page area).
* Fix compatibility with Tesseract > 3.00.
* ocrodjvu, hocr2djvused:
+ Add the --html5 option.
- Added COPYING to docs.
- spec clean up.
-------------------------------------------------------------------
Wed Aug 17 14:02:05 UTC 2011 - lazy.kent@opensuse.org
- Don't run the test suite (it succeeds locally but fails in OBS).
-------------------------------------------------------------------
Fri Aug 12 20:08:47 UTC 2011 - lazy.kent@opensuse.org
- Update to 0.7.4.
* hocr2djvused:
+ Ignore comments and <script> elements in hOCR.
* For Tesseract ≥ 3.00, extract bounding boxes of particular
characters with higher accuracy.
- Run test suite.
-------------------------------------------------------------------
Sat Jul 30 11:31:43 UTC 2011 - lazy.kent@opensuse.org
- Update to 0.7.3.
- Use python-setuptools.
- Corrected License tag.
- Use full URL as a source.
-------------------------------------------------------------------
Thu Apr 14 00:09:53 UTC 2011 - lazy.kent@opensuse.org
- Update to 0.7.2.
* Don't hang if one of the threads raises an exception.
* Use the logging module for printing progress messages, errors
etc.
- Changes in 0.7.1.
* ocrodjvu:
+ Work around a bug in Cuneiform, which mistakenly uses ‘slo’
(rather than ‘slv’) as the language code for Slovenian.
Accept ‘ces’, ‘nld’, ‘slv’, ‘ron’ as language codes for
Czech, Dutch, Slovenian and Romanian languages, even when
Cuneiform internally uses different ones.
* djvu2hocr:
+ Don't flip hOCR upside-down.
- Engine patch refresh.
- Added credits.txt, todo.txt.
-------------------------------------------------------------------
Mon Nov 8 10:08:16 UTC 2010 - lazy.kent.suse@gmail.com
- Update to 0.7.0.
* Correctly handle empty pages recognized by Cuneiform and Ocrad.
* Fix crash on Cuneiform-generated hOCR with bounding boxes for
whitespace characters.
* Fix compatibility with Tesseract 3.00.
* Fix colors in 24-bit BMP images.
* ocrodjvu:
+ Make ‘-e’ an alias for ‘--engine’.
+ Make ‘-l’ an alias for ‘--language’.
+ Add the -X option (for advanced users).
+ Work-around for Cuneiform returning files with control
characters is now disabled by default. Use ‘-X fix-html=1’ to
re-enable it.
+ Add the --on-error option (for advanced users).
* djvu2hocr:
+ Fix a typo, which prevented hocr2djvused from correctly
parsing files.
-------------------------------------------------------------------
Fri Oct 1 23:57:14 UTC 2010 - lazy.kent.suse@gmail.com
- Update to 0.6.1.
* Improve detection of Tesseract.
* Correctly handle unrecognized and non-ASCII characters in
Ocrad ORF output.
* Fix crash on hOCR with image elements.
* Fix insecure use of temporary files when using Cuneiform.
-------------------------------------------------------------------
Fri Sep 17 09:48:36 UTC 2010 - lazy.kent.suse@gmail.com
- Update to 0.6.0.
* Add support for the Tesseract OCR engine.
* Fix Cuneiform support (a regression introduced in 0.5).
- Dropped obsolete reorder_colors patch.
-------------------------------------------------------------------
Thu Sep 16 12:23:25 UTC 2010 - lazy.kent.suse@gmail.com
- Update to 0.5.1.
* Fix crash when listing engines/languages if Ocropus is not
found.
- Patch to reorder colors in bitonal BMPs.
-------------------------------------------------------------------
Wed Sep 15 06:40:31 UTC 2010 - lazy.kent.suse@gmail.com
- Update to 0.5.0.
* Add support for the Ocrad OCR engine.
* Add support for the GOCR engine.
* Prevent Cuneiform from asking interactive questions.
* Drop support for guessing page size from image (scan) contents.
- Updated engine patch.
- Replaced python-setuptools with python-base in BuildRequires.
- Added tesseract, ocropus, gocr, ocrad to Recommends.
- Man pages installed by setup.
-------------------------------------------------------------------
Wed Aug 25 23:38:49 UTC 2010 - lazy.kent.suse@gmail.com
- Update to 0.4.7.
* Preserve as much environment as possible when calling external
programs.
-------------------------------------------------------------------
Wed Aug 4 05:47:34 UTC 2010 - lazy.kent.suse@gmail.com
- Update to 0.4.6.
* Implement work-around for Cuneiform returning files with
control characters.
* Avoid deprecation warnings with PyICU ≥ 1.0.
* djvu2hocr:
+ Don't crash on very long documents.
- Dropped obsolete patches.
-------------------------------------------------------------------
Tue May 25 21:14:58 UTC 2010 - lazy.kent.suse@gmail.com
- Fixed problem with broken character encoding in Cuneiform output.
-------------------------------------------------------------------
Tue May 25 07:46:57 UTC 2010 - lazy.kent.suse@gmail.com
- Update to 0.4.5.
* Fix handling of ‘deu’ and ‘rus-eng’ languages.
* Properly handle hOCR with inline formatting.
* djvu2hocr:
+ Add ocr-system and ocr-capabilities meta information.
- Dropped obsolete patches (fixed upstream).
-------------------------------------------------------------------
Mon May 24 19:50:37 UTC 2010 - lazy.kent.suse@gmail.com
- Fixed order in which text with inline markup is read.
-------------------------------------------------------------------
Mon May 24 12:09:57 UTC 2010 - lazy.kent.suse@gmail.com
- Update to 0.4.4.
* Document that ocrodjvu honours TMPDIR environment variable.
* Don't remove temporary directory if ocrodjvu crashed.
- Fixed handling of ‘deu’ and ‘rus-eng’ languages.
-------------------------------------------------------------------
Sat Apr 10 11:51:30 UTC 2010 - lazy.kent.suse@gmail.com
- Default engine changed to Cuneiform.
-------------------------------------------------------------------
Tue Apr 6 18:35:04 UTC 2010 - lazy.kent.suse@gmail.com
- Initial package created - 0.4.3.