#
# spec file for package python-unstructured
#
# Copyright (c) 2024 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
# Please submit bugfixes or comments via https://bugs.opensuse.org/
#
Name: python-unstructured
Version: 0.15.9
Release: 0
Summary: A library that prepares raw documents for downstream ML tasks
License: Apache-2.0
URL: https://github.com/Unstructured-IO/unstructured
Source: https://files.pythonhosted.org/packages/source/u/unstructured/unstructured-%{version}.tar.gz
BuildRequires: python-rpm-macros
BuildRequires: %{python_module pip}
BuildRequires: %{python_module setuptools}
BuildRequires: %{python_module wheel}
# SECTION test requirements
BuildRequires: %{python_module backoff}
BuildRequires: %{python_module beautifulsoup4}
BuildRequires: %{python_module chardet}
BuildRequires: %{python_module dataclasses-json}
BuildRequires: %{python_module emoji}
BuildRequires: %{python_module filetype}
BuildRequires: %{python_module fsspec}
BuildRequires: %{python_module langdetect}
BuildRequires: %{python_module lxml}
BuildRequires: %{python_module nltk}
BuildRequires: %{python_module numpy1}
BuildRequires: %{python_module psutil}
BuildRequires: %{python_module pytest}
BuildRequires: %{python_module python-iso639}
BuildRequires: %{python_module python-magic}
BuildRequires: %{python_module python-oxmsg}
BuildRequires: %{python_module pytest-vcr}
BuildRequires: %{python_module pytz}
BuildRequires: %{python_module rapidfuzz}
BuildRequires: %{python_module requests}
BuildRequires: %{python_module tabulate}
BuildRequires: %{python_module tqdm}
BuildRequires: %{python_module typing-extensions}
BuildRequires: %{python_module unstructured-client}
BuildRequires: %{python_module wrapt}
# /SECTION
BuildRequires: fdupes
Requires(post): update-alternatives
Requires(postun): update-alternatives
Requires: python-backoff
Requires: python-beautifulsoup4
Requires: python-chardet
Requires: python-dataclasses-json
Requires: python-emoji
Requires: python-filetype
Requires: python-langdetect
Requires: python-lxml
Requires: python-nltk
Requires: python-numpy1
Requires: python-psutil
Requires: python-python-iso639
Requires: python-python-magic
Requires: python-python-oxmsg
Requires: python-rapidfuzz
Requires: python-requests
Requires: python-tabulate
Requires: python-tqdm
Requires: python-typing-extensions
Requires: python-unstructured-client
Requires: python-wrapt
Suggests: python-pyairtable
Suggests: python-pypandoc
Suggests: python-python-pptx >= 1.0.1
Suggests: python-pikepdf
Suggests: python-effdet
Suggests: python-google-cloud-vision
Suggests: python-onnx
Suggests: python-openpyxl
Suggests: python-unstructured.pytesseract >= 0.3.12
Suggests: python-markdown
Suggests: python-xlrd
Suggests: python-python-docx >= 1.1.2
Suggests: python-pdfminer.six
Suggests: python-pandas
Suggests: python-pi_heif
Suggests: python-pypdf
Suggests: python-unstructured-inference == 0.7.36
Suggests: python-networkx
Suggests: python-pdf2image
Suggests: python-astrapy
Suggests: python-adlfs
Suggests: python-fsspec
Suggests: python-azure-search-documents
Suggests: python-boto3
Suggests: python-langchain-community
Suggests: python-bs4
Suggests: python-boxfs
Suggests: python-chromadb
Suggests: python-importlib-metadata >= 8.2.0
Suggests: python-typer <= 0.9.0
Suggests: python-tenacity == 8.5.0
Suggests: python-clarifai
Suggests: python-atlassian-python-api
Suggests: python-databricks-sdk
Suggests: python-deltalake
Suggests: python-discord-py
Suggests: python-dropboxdrivefs
Suggests: python-elasticsearch
Suggests: python-langchain-huggingface
Suggests: python-mixedbread-ai
Suggests: python-openai
Suggests: python-tiktoken
Suggests: python-langchain
Suggests: python-langchain-google-vertexai
Suggests: python-langchain-voyageai
Suggests: python-gcsfs
Suggests: python-pygithub > 1.58.0
Suggests: python-python-gitlab
Suggests: python-google-api-python-client
Suggests: python-hubspot-api-client
Suggests: python-urllib3
Suggests: python-langdetect
Suggests: python-sacremoses
Suggests: python-sentencepiece
Suggests: python-torch
Suggests: python-transformers
Suggests: python-confluent-kafka
Suggests: python-pymongo
Suggests: python-notion-client
Suggests: python-htmlBuilder
Suggests: python-msal
Suggests: python-Office365-REST-Python-Client
Suggests: python-langchain-openai
Suggests: python-opensearch-py
Suggests: python-paddlepaddle == 3.0.0b1
Suggests: python-unstructured.paddleocr == 2.8.1.0
Suggests: python-pinecone-client >= 3.7.1
Suggests: python-psycopg2-binary
Suggests: python-qdrant-client
Suggests: python-praw
Suggests: python-s3fs
Suggests: python-simple-salesforce
Suggests: python-paramiko
Suggests: python-singlestoredb
Suggests: python-slack_sdk
Suggests: python-weaviate-client
Suggests: python-wikipedia
BuildArch: noarch
%python_subpackages
%description
The unstructured library provides open-source components for ingesting and
pre-processing images and text documents, such as PDFs, HTML, Word documents,
and many more. Its use cases revolve around streamlining and optimizing the
data processing workflow for LLMs. Its modular functions and connectors form
a cohesive system that simplifies data ingestion and pre-processing, adapts to
different platforms, and transforms unstructured data into structured output.
%prep
%autosetup -p1 -n unstructured-%{version}
%build
%pyproject_wheel
%install
%pyproject_install
%python_clone -a %{buildroot}%{_bindir}/unstructured-ingest
%python_expand %fdupes %{buildroot}%{$python_sitelib}
%check
%pytest
%post
%python_install_alternative unstructured-ingest
%postun
%python_uninstall_alternative unstructured-ingest
%files %{python_files}
%doc CHANGELOG.md README.md
%license LICENSE.md
%python_alternative %{_bindir}/unstructured-ingest
%{python_sitelib}/unstructured
%{python_sitelib}/unstructured-%{version}.dist-info
%changelog