OCRmyPDF


OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Source Files
Filename                  Size (bytes)  Size
OCRmyPDF-16.1.2.tar.gz    6688943       6.38 MB
OCRmyPDF.changes          13608         13.3 KB
OCRmyPDF.spec             4706          4.6 KB
Latest Revision
Frank Kunz (frank_kunz) committed (revision 17)
- Update to version 16.1.2
  Remove 0001-Drop-shebang-from-non-executable-files.patch from build
  v16.1.2
    Fixed test suite failure when using Ghostscript 10.3.
    Other minor corrections.
  v16.1.1
    Fixed PyPy 3.10 support.
  v16.1.0
    Improved hOCR renderer is now default for left-to-right languages.
    Improved handling of rotated pages. Previously, OCR text might be missing for
    pages that were rotated with a /Rotate tag on the page entry.
    Improved handling of cropped pages. Previously, in some cases a page with a
    crop box would not have its OCR applied correctly and misalignment between
    OCR text and visible text could occur.
    Documentation improvements, especially installation instructions for less
    common platforms.
  v16.0.4
    Fixed some issues for left-to-right text with the new hOCR renderer. It is not
    default yet but will be made so soon. Right-to-left text is still in progress.
    Added an error to prevent use of several versions of Ghostscript that seem
    to corrupt existing text in input PDFs. Newly generated OCR is not affected.
    For best results, use Ghostscript 10.02.1 or newer, which contains the fix
    for the issue.
  v16.0.3
    Changed minimum required Ghostscript to 9.54, to support users of RHEL 9 and its
    derivatives, since that is the latest version available there.
    Removed warning message about CVE-2023-43115, on the assumption that most
    distributions have backported the patch by now.
  v16.0.2
    Temporarily changed PDF text renderer back to sandwich by default to address
Comments 2

Thomas Glatt

Thank you for this package! If you are interested in some feedback: on my system I needed the following Python modules to run the software:

  • pluggy
  • img2pdf
  • reportlab
  • pdfminer.six
  • coloredlogs
  • tqdm

Andrea Ippolito

Yes, that's unfortunate. I tried to use this repo to avoid having to remember to run a pip update every time a new ocrmypdf version comes out, but as it stands it's not really practical: since I have to fetch the dependencies manually elsewhere, it becomes a bit too much. I guess I'll keep using pip (well, pipx actually) until the authors of ocrmypdf finally provide a build for openSUSE (or even a Flatpak!).
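
A quick way to see which of the modules named in the first comment are missing on a given system is to ask Python whether each one can be imported. The following is only a minimal sketch: the module list is taken from that comment and is not claimed to be OCRmyPDF's complete dependency set, the helper name `missing_modules` is made up for illustration, and it assumes that the `pdfminer.six` distribution is imported under the name `pdfminer`.

```python
from importlib.util import find_spec

# Distribution name -> import name, for the modules listed in the comment.
# Assumption: the "pdfminer.six" distribution is imported as "pdfminer".
REQUIRED_MODULES = {
    "pluggy": "pluggy",
    "img2pdf": "img2pdf",
    "reportlab": "reportlab",
    "pdfminer.six": "pdfminer",
    "coloredlogs": "coloredlogs",
    "tqdm": "tqdm",
}


def missing_modules(required=REQUIRED_MODULES):
    """Return the sorted distribution names whose import name cannot be found."""
    return sorted(
        dist
        for dist, import_name in required.items()
        if find_spec(import_name) is None  # None means "not importable"
    )


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Run with the same Python interpreter that the packaged OCRmyPDF uses; otherwise the result reflects a different module search path.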
