Revisions of OCRmyPDF

Frank Kunz (frank_kunz) committed (revision 17)
- Update to version 16.1.2
  Remove 0001-Drop-shebang-from-non-executable-files.patch from build
  v16.1.2
    Fixed test suite failure when using Ghostscript 10.3.
    Other minor corrections.
  v16.1.1
    Fixed PyPy 3.10 support.
  v16.1.0
    The improved hOCR renderer is now the default for left-to-right languages.
    Improved handling of rotated pages. Previously, OCR text might be missing for
    pages that were rotated with a /Rotate tag on the page entry.
    Improved handling of cropped pages. Previously, in some cases a page with a
    crop box would not have its OCR applied correctly and misalignment between
    OCR text and visible text could occur.
    Documentation improvements, especially installation instructions for less
    common platforms.
  v16.0.4
    Fixed some issues for left-to-right text with the new hOCR renderer. It is not
    the default yet but will be made so soon. Right-to-left text is still in progress.
    Added an error to prevent use of several versions of Ghostscript that seem to
    corrupt existing text in input PDFs. Newly generated OCR is not affected.
    For best results, use Ghostscript 10.02.1 or newer, which contains the fix
    for the issue.
  v16.0.3
    Changed minimum required Ghostscript to 9.54, to support users of RHEL 9 and its
    derivatives, since that is the latest version available there.
    Removed warning message about CVE-2023-43115, on the assumption that most
    distributions have backported the patch by now.
  v16.0.2
    Temporarily changed PDF text renderer back to sandwich by default to address
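
The Ghostscript version notes for v16.0.4 and v16.0.3 above can be illustrated with a small check. This is a minimal sketch, not OCRmyPDF's own code: the function names and the three result labels are hypothetical, and it does not model the specific versions that trigger the new error, since the notes do not name them.

```python
def parse_gs_version(text: str) -> tuple[int, ...]:
    """Turn a version string such as '10.02.1' into (10, 2, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


# Thresholds taken from the release notes above.
MINIMUM_GS = parse_gs_version("9.54")         # minimum required as of v16.0.3
RECOMMENDED_GS = parse_gs_version("10.02.1")  # recommended as of v16.0.4


def classify_ghostscript(version_text: str) -> str:
    """Classify a Ghostscript version (labels are illustrative only)."""
    version = parse_gs_version(version_text)
    if version < MINIMUM_GS:
        return "below minimum"
    if version < RECOMMENDED_GS:
        return "below recommended"
    return "ok"
```

The version string would typically come from running `gs --version`; tuple comparison handles the zero-padded minor number ("10.02.1") because each part is converted to an integer first.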
Frank Kunz (frank_kunz) committed (revision 16)
osc copypac from project:home:frank_kunz package:OCRmyPDF revision:14
Frank Kunz (frank_kunz) committed (revision 15)
- Update to version 15.3.1
  Fixed an issue with logging settings for misc/watcher.py introduced in the previous release. :issue:`1180`
  Updated documentation on Docker performance concerns.
  Update misc/watcher.py to improve command line interface using Typer, and support .env specification of environment variables. Improved error messages. Thanks to @mflagg2814 for the PR that prompted this improvement.
  Improved error message when a file cannot be read because we are running in a snap container.
  Added a Docker image based on Alpine Linux. This image is smaller than the Ubuntu-based image and may be useful in some situations. Currently hosted at jbarlow83/ocrmypdf-alpine. Currently not available in ARM flavor.
  The Ubuntu Docker is now aliased to jbarlow83/ocrmypdf-ubuntu.
  Updated Docker documentation.

- Update to version 15.1.0
  We now require Pillow 10.0.1, due to a serious security vulnerability in all earlier versions of that dependency. The vulnerability concerns WebP images and could be triggered in OCRmyPDF when creating a PDF from a malicious WebP image.
  Added some keyword arguments to ocrmypdf.ocr that were previously accepted but undocumented.
  Documentation updates and typing improvements.
Frank Kunz (frank_kunz) committed (revision 14)
- Update to version 15.0.2
  Added Python 3.12 to test matrix.
  Updated documentation for notes on Python 3.12, 32-bit support and some new features in v15.
Frank Kunz (frank_kunz) committed (revision 13)
fix dependencies
Frank Kunz (frank_kunz) committed (revision 12)
fix dependencies
Frank Kunz (frank_kunz) committed (revision 11)
fix dependencies
Frank Kunz (frank_kunz) committed (revision 10)
fix dependencies
Frank Kunz (frank_kunz) committed (revision 9)
- Update to version 15.0.1
  v15.0.1
    Wheels Python tag changed to py39.
    Marked a test that fails on recent Ghostscript versions as an expected failure.
    Clarified documentation and release notes around the extent of 32-bit support.
    Updated installation documentation to reflect changes in v15.
  v15.0.0
    Dropped support for Python 3.8.
    Dropped support for many older dependencies; see pyproject.toml for details. Generally speaking, Ubuntu 22.04 is our baseline system.
    Dropped support for 32-bit Linux wheels. You must use a 64-bit operating system, and 64-bit versions of Python, Tesseract and Ghostscript to use OCRmyPDF. Many of our dependencies are dropping 32-bit builds (e.g. Pillow), and we are following suit. (Maintainers may still build 32-bit versions from source.)
    Changed to trusted release for PyPI publishing.
    pikepdf memory mapping is enabled again for improved performance, now that an issue with pikepdf has been fixed.
    ocrmypdf.helpers.calculate_downsample previously had two variants, one that took a PIL.Image and one that took a tuple[int, int]. The latter was removed.
    The snap version of ocrmypdf is now based on Ubuntu core22.
    We now account for situations where a small portion of an image on a page reports a high DPI (resolution). Previously, the entire page would be rasterized at the highest resolution, which caused performance problems. Now, the page is rasterized at a resolution based on the average DPI of the page, weighted by the area that each feature occupies. Typically, small areas of high resolution in PDFs are errors or quirks from the repeated use of assets and high resolution is not beneficial. :issue:`1010,1104,1004,1079,1010`
    Ghostscript color conversion strategy is now configurable. :issue:`1143`
  v14.4.0
    Digitally signed PDFs are now detected. If the PDF is signed, OCRmyPDF will refuse to modify it. Previously, only encrypted PDFs were detected, not those that were signed but not encrypted. :issue:`1040`
    In addition, --invalidate-digital-signatures can be used to override the above behavior and modify the PDF anyway. :issue:`1040`
    tqdm progress bars replaced with "rich" progress bars. The rich library is a new dependency. Certain APIs that used tqdm are now deprecated and will be removed in the next major release.
    Improved integration with GitHub Releases. Thanks to @stumpylog.
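
The area-weighted DPI rule described in the v15.0.0 notes above can be sketched as follows. This is only an illustration of the arithmetic, not OCRmyPDF's implementation: the `ImageRegion` record and `weighted_average_dpi` function are hypothetical names, and the real code may measure area and resolution differently.

```python
from dataclasses import dataclass


@dataclass
class ImageRegion:
    """Hypothetical record for one image placed on a page."""
    width_in: float   # displayed width on the page, in inches
    height_in: float  # displayed height on the page, in inches
    dpi: float        # effective resolution of the image


def weighted_average_dpi(regions: list[ImageRegion]) -> float:
    """Average the DPI of all regions, weighted by the area each occupies."""
    total_area = sum(r.width_in * r.height_in for r in regions)
    if total_area == 0:
        raise ValueError("page has no image area")
    weighted_sum = sum(r.width_in * r.height_in * r.dpi for r in regions)
    return weighted_sum / total_area


# A full-page scan at 300 DPI plus a tiny 1200 DPI logo.
page = [ImageRegion(8.0, 10.0, 300.0), ImageRegion(1.0, 1.0, 1200.0)]
page_dpi = weighted_average_dpi(page)  # close to 300, far below the 1200 maximum
```

With the maximum-DPI approach the small logo would force the whole page to 1200 DPI; with the weighted average it barely moves the result away from 300.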
Frank Kunz (frank_kunz) committed (revision 8)
- Update to version 14.3.0
  Renamed master branch to main.
  Improve PDF rasterization accuracy by using the -dPDFSTOPONERROR option to Ghostscript.
    Use --continue-on-soft-render-error if you want to render the PDF anyway. The plugin
    specification was adjusted to support this feature; plugin authors may want to adapt
    PDF rasterizing and rendering plugins. :issue:`1083`
  The calculated deskew angle is now recorded in the logged output. :issue:`1101`
  Metadata can now be unset by setting a metadata type such as --title to an empty string. :issue:`1117,1059`
  Fixed random order of languages due to use of a set. This may have caused output to vary
    when multiple languages were set for OCR. :issue:`1113`
  Clarified the optimization ratio reported in the log output.
  Fixed :issue:`977`, where images inside Form XObjects were always excluded from image optimization.
  Added --tesseract-downsample-above to downsample larger images even when they do not exceed
    Tesseract's internal limits. This can be used to speed up OCR, possibly sacrificing accuracy.
  Fixed resampling AttributeError on older Pillow. :issue:`1096`
  Removed an error about using Ghostscript on PDFs that have the /UserUnit feature in use.
    Previously, Ghostscript would fail to process these PDFs, but in all supported versions it
    is now supported, so the error is no longer needed.
  Improved documentation around installing other language packs for Tesseract.
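
The language-ordering fix noted in the 14.3.0 entry above (:issue:`1113`) comes down to a general Python pitfall: iterating over a set of strings gives no guaranteed order, so a value built from it can differ between runs. A minimal sketch of an order-preserving alternative follows; the helper names are hypothetical and this is not necessarily how upstream fixed it. The `+` separator is the form Tesseract accepts for multiple languages.

```python
def unique_in_order(languages: list[str]) -> list[str]:
    """Drop duplicate language codes while keeping first-seen order."""
    # dict preserves insertion order, unlike set.
    return list(dict.fromkeys(languages))


def join_for_tesseract(languages: list[str]) -> str:
    """Build a deterministic language argument such as 'eng+deu'."""
    return "+".join(unique_in_order(languages))


requested = ["eng", "deu", "eng", "fra"]
lang_arg = join_for_tesseract(requested)
```

Because the order now follows the user's input, repeated runs with the same options pass the same language string to the OCR engine.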
Frank Kunz (frank_kunz) committed (revision 7)
- Update to version 14.1.0
  Added --tesseract-non-ocr-timeout. This allows using Tesseract's deskew and other non-OCR features while disabling OCR using --tesseract-timeout 0.
  Added --tesseract-downsample-large-images. This downsamples large images that exceed the maximum image size Tesseract can handle. Large images may still take a long time to process, but this allows them to be processed if that is desired.
  Fixed :issue:`1082`, an issue with snap package building.
  Change linter to ruff, fix lint errors, update documentation.
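
The idea behind --tesseract-downsample-large-images in the 14.1.0 entry above, shrinking an image until it fits a size limit, can be sketched like this. The function name is hypothetical, and the limit of 32767 pixels per side is an assumption used for illustration; the notes do not state Tesseract's exact limits, and OCRmyPDF's actual logic may consider other constraints as well.

```python
# Assumed per-side pixel limit, for illustration only.
ASSUMED_MAX_DIMENSION = 32767


def downsampled_size(width: int, height: int,
                     max_dimension: int = ASSUMED_MAX_DIMENSION) -> tuple[int, int]:
    """Return a size whose longer side fits max_dimension, keeping aspect ratio."""
    if width <= 0 or height <= 0 or max_dimension <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_dimension:
        return width, height  # already small enough, leave untouched
    # Integer arithmetic avoids floating-point rounding surprises.
    new_width = max(1, width * max_dimension // longest)
    new_height = max(1, height * max_dimension // longest)
    return new_width, new_height
```

For example, a 40000 x 20000 pixel scan would be reduced so that its longer side equals the limit, while an image already under the limit is returned unchanged.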
Frank Kunz (frank_kunz) committed (revision 6)
Fix dependency version
Frank Kunz (frank_kunz) committed (revision 5)
Enable unittests
Frank Kunz (frank_kunz) committed (revision 4)
- Update to version 14.0.4
Frank Kunz (frank_kunz) committed (revision 3)
- Update to version 14.0.2
Frank Kunz (frank_kunz) committed (revision 2)
- Update to version 13.4.4
Frank Kunz (frank_kunz) committed (revision 1)
osc copypac from project:home:fstrba package:OCRmyPDF revision:3