#
# spec file for package OCRmyPDF
#
# Copyright (c) 2017 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
# Please submit bugfixes or comments via http://bugs.opensuse.org/
#
%define modname ocrmypdf
Name: OCRmyPDF
Version: 8.0.1
Release: 0
Summary: Add an OCR text layer to scanned PDF files
License: GPL-3.0
Group: Productivity/Publishing/PDF
Url: https://github.com/jbarlow83/OCRmyPDF
Source0: %{name}-%{version}.tar.gz
BuildRequires: fdupes
BuildRequires: ghostscript >= 9.15
BuildRequires: libjpeg-devel
BuildRequires: python3-Pillow >= 3.1.1
BuildRequires: python3-PyPDF2 >= 1.25.1
BuildRequires: python3-cffi >= 1.9.1
BuildRequires: python3-img2pdf >= 0.2
BuildRequires: python3-pytest-runner
BuildRequires: python3-ruffus >= 2.6.3
BuildRequires: python3-setuptools_scm >= 1.8.0
BuildRequires: python3-setuptools_scm_git_archive
BuildRequires: qpdf >= 5.1.1
BuildRequires: tesseract-ocr >= 3.03
BuildRequires: unpaper >= 6.1
Requires: ghostscript >= 9.15
Requires: python3-Pillow >= 3.1.1
Requires: python3-PyPDF2 >= 1.25.1
Requires: python3-cffi >= 1.9.1
Requires: python3-img2pdf >= 0.2
Requires: python3-reportlab >= 3.2.0
Requires: python3-ruffus >= 2.6.3
Requires: qpdf >= 5.1.1
Requires: tesseract-ocr >= 3.03
Requires: tesseract-ocr-traineddata-orientation_and_script_detection >= 3.03
Requires: unpaper >= 6.1
BuildRoot: %{_tmppath}/%{name}-%{version}-build
BuildArch: noarch
%description
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
Main features:
- Generates a searchable PDF/A file from a regular PDF
- Places OCR text accurately below the image to ease copy / paste
- Keeps the exact resolution of the original embedded images
- When possible, inserts OCR information as a "lossless" operation without rendering vector information
- Keeps file size about the same
- If requested, deskews and/or cleans the image before performing OCR
- Validates input and output files
- Provides debug mode to enable easy verification of the OCR results
- Processes pages in parallel when more than one CPU core is available
- Uses Tesseract OCR engine
- Supports the 39 languages recognized by Tesseract
- Battle-tested on thousands of PDFs, with a test suite and continuous integration
%prep
%setup -q
%build
CFLAGS="%{optflags}" python3 setup.py build
%install
python3 setup.py install --prefix=%{_prefix} --root=%{buildroot}
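# Superseded by the glob-based chmod calls below; macros are escaped (%%) so
# rpm does not expand them inside a comment.
# chmod ugo+x %%{buildroot}%%{python3_sitelib}/%%{modname}/{hocrtransform,leptonica,main,pageinfo,pdfa}.py
chmod ugo+x %{buildroot}%{python3_sitelib}/%{modname}/*.py
chmod ugo+x %{buildroot}%{python3_sitelib}/%{modname}/exec/*.py
# Deduplicate last, after file permissions are final.
%fdupes %{buildroot}%{python3_sitelib}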
%files
%defattr(-,root,root)
%doc README.md LICENSE
%{_bindir}/%{modname}
%{python3_sitelib}/*
%changelog