File python-ocrmypdf.changes of Package python-ocrmypdf

-------------------------------------------------------------------
Wed Nov  6 14:57:33 UTC 2024 - Matej Cepl <mcepl@cepl.eu>

- Update to 16.6.0:
  - Fixed an issue where damaged PDFs would fail with --redo-ocr.
    :issue:`1403`
  - Fixed an error that prevented JBIG2 optimization on Windows
    if the image was optimized in an earlier step. :issue:`1396`
  - Fixed an error detecting the version of unpaper 7.0.0.
    :issue:`1409`
  - Fixed a performance regression when scanning pages.
    :issue:`1378`. Thanks @aliemjay.
  - Fixed Alpine Docker image by enforcing Alpine 3.19. Alpine
    3.20 includes a defective version of Tesseract OCR and so is
    not usable.
  - Upgraded Ubuntu Docker image to use Ubuntu 24.04.
  - Build and test scripts/actions switched to uv.
  - When running in a container, we now remind the user that
    temporary folders are inside the container and may not be
    accessible.
  - Fixed Linux test coverage matrix, which was missing some key
    versions.
- Update to 16.5.0:
  - Fixed issue with interpreting PDFs that have images with
    array masks. :issue:`1377`
  - Enabled testing on Python 3.13.
  - Fixed a test that did not work correctly but still passed.
    :issue:`1382`
  - Improved "PDF/A conversion failed" warning message to better
    describe implications.
  - Updated documentation to better explain OCR_JSON_SETTINGS in
    batch processing.
  - Build backend changed from setuptools to hatchling.
- Update to 16.4.3:
  - Work around pdfminer.six issue where a token on the buffer
    boundary is incorrectly parsed as two tokens. :issue:`1361`
  - New rules are applied to stencil masks and explicit masks
    when calculating the optimal page DPI for rendering.
    :issue:`1362`
  - Fixed attempts to use an incompatible jbig2.EXE provided by
    TeX Live. :issue:`1363`
- Update to 16.4.2:
  - Fixed order of filenames passed to Ghostscript for PDF/A
    generation. :issue:`1359`
  - Suppressed missing jbig2dec warning message. :issue:`1358`
  - Fixed calculation of image size when soft mask dimensions
    don't match image dimension. :issue:`1351`
  - Several fixes to documentation. Thanks to users Iris and
    JoKalliauer who contributed these changes.
  - Fixed error on processing PDFs that are missing certain image
    metadata. :issue:`1315`
- Update to 16.4.1:
  - Fixed calculation of image printed area (used in finding
    weighted DPI for OCR). :issue:`1334`
  - Fixed "NotImplementedError: not sure how to get colorspace"
    error messages in logs, which simply record a failure
    to optimize images with print production colorspaces.
    :issue:`1315`
- Update to 16.4.0:
  - Selecting the osd and equ pseudo-languages with -l/--language
    now exits with an error when using Tesseract OCR, because
    these are not regular Tesseract languages but internal
    implementation details. Using them can cause Tesseract to
    crash.
  - The hOCR renderer is more tolerant of extra whitespace in
    input files.
  - watcher.py now changes the output file extension to .pdf when
    the input is not .pdf.
  - Improved handling of PDFs that contain circularly referenced
    Form XObjects. :issue:`1321`
  - Fixed Alpine Docker image for ARM64, which was not building
    correctly.
  - Docker images now use pikepdf 9.0.0.
  - Prevent use of Tesseract OCR 5.4.0, a version with known
    regressions.
  - Disabled progress bar for "Linearizing" when --no-progress-bar
    is set.
  - Fixed some tests that warn about missing JBIG2 decoding via
    pikepdf, by installing the necessary libraries during tests.
- Update to 16.3.1:
  - Fixed a test suite failure with Ghostscript 10.03.0+.
    :issue:`1316`
  - Fixed an issue with the presentation of the "OCR" progress
    bar. :issue:`1313`
- Update to 16.3.0:
  - Fixed progress bar not displaying for Ghostscript PDF/A
    conversion. :issue:`1313`
  - Added progress bar for linearization. :issue:`1313`
  - If --rotate-pages-threshold is issued without --rotate-pages,
    we now exit with an error since the user likely intended to
    use --rotate-pages. :issue:`1309`
  - If Tesseract hOCR gives an invalid line box, print an error
    message instead of exiting with an error. :issue:`1312`
- Update to 16.2.0:
  - Fixed issue 'NoneType' object has no attribute 'get' when
    optimizing certain PDFs. :issue:`1293,1271`
  - Switched formatting from black to ruff.
  - Added support for sending sidecar output to io.BytesIO.
  - Added support for converting HEIF/HEIC images (the native
    image format of iPhones and some other devices) to PDFs, when
    the appropriate pi-heif library is installed. This library is
    marked as a dependency, but maintainers may opt out if
    needed.
  - We now default to downsampling large images that would
    exceed Tesseract's internal limits, but only if they cause
    processing to fail. Previously, this behavior only occurred
    if specifically requested on the command line. It can still
    be configured and disabled. See the --tesseract command line
    options.
  - Added Macports install instructions. Thanks @akierig.
  - Improved logging output when an unexpected error occurs while
    trying to obtain the version of a third party program.
- Update to 16.1.2:
  - Fixed test suite failure when using Ghostscript 10.3.
  - Other minor corrections.
- Update to 16.1.1:
  - Fixed PyPy 3.10 support.
- Update to 16.1.0:
  - The improved hOCR renderer is now the default for
    left-to-right languages.
  - Improved handling of rotated pages. Previously, OCR text
    might be missing for pages that were rotated with a /Rotate
    tag on the page entry.
  - Improved handling of cropped pages. Previously, in some
    cases a page with a crop box would not have its OCR applied
    correctly and misalignment between OCR text and visible text
    could occur.
  - Documentation improvements, especially installation
    instructions for less common platforms.

-------------------------------------------------------------------
Mon Jan  8 15:26:44 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.0.4
  - Fixed some issues for left-to-right text with the new hOCR renderer.
    It is not the default yet but will be made so soon.
    Right-to-left text is still in progress.
  - Added an error to prevent use of several versions of Ghostscript
    that seem to corrupt existing text in input PDFs.
    Newly generated OCR is not affected.
    For best results, use Ghostscript 10.02.1 or newer,
    which contains the fix for the issue.

-------------------------------------------------------------------
Thu Jan  4 10:05:05 UTC 2024 - ecsos <ecsos@opensuse.org>

- Update to 16.0.3
  - Changed minimum required Ghostscript to 9.54, to support users
    of RHEL 9 and its derivatives, since that is the latest version
    available there.
  - Removed warning message about CVE-2023-43115, on the assumption
    that most distributions have backported the patch by now.
- Changes from 16.0.2
  - Temporarily changed PDF text renderer back to sandwich by
    default to address regressions in macOS Preview.
- Changes from 16.0.1
  - Fixed text rendering issue with new hOCR text renderer:
    extraneous byte order marks.
  - Tightened dependencies.
- Changes from 16.0.0
  - Added a new OCR text renderer that combines the best ideas of
    Tesseract's PDF generator and the older hOCR transformer
    renderer. The result is a hopefully permanent fix for
    wordssmushedtogetherwithoutspaces issues in extracted text,
    better registration/position of text on skewed baselines
    :issue:`1009`, fixes to character output when the German
    Fraktur script is used :issue:`1191`, and proper rendering of
    right-to-left languages (Arabic, Hebrew, Persian)
    :issue:`1157`. Asian languages may still have excessive word
    breaks compared to expectations. The new renderer is the
    default; the old sandwich renderer is still available using
    --pdf-renderer sandwich; the old hOCR renderer has been
    removed.
  - The ocrmypdf.hocrtransform API has changed substantially.
  - Support for Python 3.9 has been dropped. Python 3.10+ is now required.
  - pikepdf >= 8.8.0 is now required.

-------------------------------------------------------------------
Fri Dec 15 08:32:05 UTC 2023 - ecsos <ecsos@opensuse.org>

- Initial version 15.4.4