
# spec file for package OCRmyPDF
# Copyright (c) 2017 SUSE LINUX GmbH, Nuernberg, Germany.
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via

%define modname ocrmypdf
Name:           OCRmyPDF
Version:        4.5.6
Release:        0
Summary:        Add an OCR text layer to scanned PDF files
License:        GPL-3.0
Group:          Productivity/Publishing/PDF
Source0:        %{name}-%{version}.tar.gz
BuildRequires:  fdupes
BuildRequires:  ghostscript >= 9.15
BuildRequires:  libjpeg-devel
BuildRequires:  python3-Pillow >= 3.1.1
BuildRequires:  python3-PyPDF2 >= 1.25.1
BuildRequires:  python3-cffi >= 1.9.1
BuildRequires:  python3-img2pdf >= 0.2
BuildRequires:  python3-pytest-runner
BuildRequires:  python3-ruffus >= 2.6.3
BuildRequires:  python3-setuptools_scm >= 1.8.0
BuildRequires:  python3-setuptools_scm_git_archive
BuildRequires:  qpdf >= 5.1.1
BuildRequires:  tesseract-ocr >= 3.03
BuildRequires:  unpaper >= 6.1
Requires:       ghostscript >= 9.15
Requires:       python3-Pillow >= 3.1.1
Requires:       python3-PyPDF2 >= 1.25.1
Requires:       python3-cffi >= 1.9.1
Requires:       python3-img2pdf >= 0.2
Requires:       python3-reportlab >= 3.2.0
Requires:       python3-ruffus >= 2.6.3
Requires:       qpdf >= 5.1.1
Requires:       tesseract-ocr >= 3.03
Requires:       tesseract-ocr-traineddata-orientation_and_script_detection >= 3.03
Requires:       unpaper >= 6.1
BuildRoot:      %{_tmppath}/%{name}-%{version}-build
BuildArch:      noarch

%description
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.

Main features:
 -  Generates a searchable PDF/A file from a regular PDF
 -  Places OCR text accurately below the image to ease copy / paste
 -  Keeps the exact resolution of the original embedded images
 -  When possible, inserts OCR information as a "lossless" operation without rendering vector information
 -  Keeps file size about the same
 -  If requested, deskews and/or cleans the image before performing OCR
 -  Validates input and output files
 -  Provides a debug mode for easy verification of the OCR results
 -  Processes pages in parallel when more than one CPU core is available
 -  Uses Tesseract OCR engine
 -  Supports the 39 languages recognized by Tesseract
 -  Battle-tested on thousands of PDFs, with a test suite and continuous integration

%prep
%setup -q

%build
CFLAGS="%{optflags}" python3 setup.py build

%install
python3 setup.py install --prefix=%{_prefix} --root=%{buildroot}
install -D -m 775 %{buildroot}%{_bindir}
# chmod ugo+x  %{buildroot}%{python3_sitelib}/%{modname}/{hocrtransform,leptonica,main,pageinfo,pdfa}.py
%fdupes %{buildroot}%{python3_sitelib}
chmod ugo+x %{buildroot}%{python3_sitelib}/ocrmypdf/*.py
chmod ugo+x %{buildroot}%{python3_sitelib}/ocrmypdf/exec/*.py

%files
%doc README.rst LICENSE.rst