File python-chardet.changes of Package python-chardet

-------------------------------------------------------------------
Fri Mar  6 07:41:56 UTC 2026 - Matej Cepl <mcepl@cepl.eu>

- update to 6.0.0 (the last version before the infringement;
  DON’T UPGRADE UNTIL gh#chardet/chardet#327 IS RESOLVED):
  - Features
    - Unified single-byte charset detection: Instead of only
      having trained language models for a handful of languages
      (Bulgarian, Greek, Hebrew, Hungarian, Russian, Thai,
      Turkish) and relying on special-case Latin1Prober and
      MacRomanProber heuristics for Western encodings, chardet
      now treats all single-byte charsets the same way: every
      encoding gets proper language-specific bigram models
      trained on CulturaX corpus data. This means chardet can now
      accurately detect both the encoding and the language for
      all supported single-byte encodings.
    - 38 new languages: Arabic, Belarusian, Breton, Croatian,
      Czech, Danish, Dutch, English, Esperanto, Estonian, Farsi,
      Finnish, French, German, Icelandic, Indonesian, Irish,
      Italian, Kazakh, Latvian, Lithuanian, Macedonian, Malay,
      Maltese, Norwegian, Polish, Portuguese, Romanian, Scottish
      Gaelic, Serbian, Slovak, Slovene, Spanish, Swedish, Tajik,
      Ukrainian, Vietnamese, and Welsh. Existing models for
      Bulgarian, Greek, Hebrew, Hungarian, Russian, Thai, and
      Turkish were also retrained with the new pipeline.
    - EncodingEra filtering: A new encoding_era parameter accepts
      an EncodingEra flag enum (MODERN_WEB, LEGACY_ISO,
      LEGACY_MAC, LEGACY_REGIONAL, DOS, MAINFRAME, ALL), letting
      callers restrict detection to encodings from specific
      eras. detect() and detect_all() default to MODERN_WEB,
      which should substantially improve accuracy for users who
      are not working with legacy data. The tiers are:
        - MODERN_WEB: UTF-8/16/32, Windows-125x, CP874, CJK
          multi-byte (widely used on the web)
        - LEGACY_ISO: ISO-8859-x, KOI8-R/U (legacy but well-known
          standards)
        - LEGACY_MAC: Mac-specific encodings (MacRoman,
          MacCyrillic, etc.)
        - LEGACY_REGIONAL: Uncommon regional/national encodings
          (KOI8-T, KZ1048, CP1006, etc.)
        - DOS: DOS/OEM code pages (CP437, CP850, CP866, etc.)
        - MAINFRAME: EBCDIC variants (CP037, CP500, etc.)
    - --encoding-era CLI flag: The chardetect CLI now accepts
      -e/--encoding-era to control which encoding eras are
      considered during detection.
    - max_bytes and chunk_size parameters: detect(),
      detect_all(), and UniversalDetector now accept max_bytes
      (default 200KB) and chunk_size (default 64KB) parameters
      for controlling how much data is examined. (#314, @bysiber)
    - Encoding era preference tie-breaking: When multiple
      encodings have very close confidence scores, the detector
      now prefers more modern/Unicode encodings over legacy ones.
    - Charset metadata registry: New chardet.metadata.charsets
      module provides structured metadata about all supported
      encodings, including their era classification and language
      filter.
    - should_rename_legacy now defaults intelligently: When set
      to None (the new default), legacy renaming is automatically
      enabled when encoding_era is MODERN_WEB.
    - Direct GB18030 support: Replaced the redundant GB2312
      prober with a proper GB18030 prober.
    - EBCDIC detection: Added CP037 and CP500 EBCDIC model
      registrations for mainframe encoding detection.
    - Binary file detection: Added basic binary file detection to
      abort analysis earlier on non-text files.
    - Python 3.12, 3.13, and 3.14 support (#283, @hugovk; #311)
    - GitHub Codespace support (#312, @oxygen-dioxide)
  - Fixes
    - Fix CP949 state machine: Corrected the state machine for
      Korean CP949 encoding detection. (#268, @nenw)
    - Fix SJIS distribution analysis: Fixed
      SJISDistributionAnalysis discarding valid second-byte range
      >= 0x80. (#315, @bysiber)
    - Fix UTF-16/32 detection for non-ASCII-heavy text: Improved
      detection of UTF-16/32 encoded CJK and other non-ASCII text
      by adding a MIN_RATIO threshold alongside the existing
      EXPECTED_RATIO.
    - Fix get_charset crash: Resolved a crash when looking up
      unknown charset names.
    - Fix GB18030 char_len_table: Corrected the character length
      table for GB18030 multi-byte sequences.
    - Fix UTF-8 state machine: Updated to be more spec-compliant.
    - Fix detect_all() returning inactive probers: Results from
      probers that determined "definitely not this encoding" are
      now excluded.
    - Fix early cutoff bug: Resolved an issue where detection
      could terminate prematurely.
    - Default UTF-8 fallback: If UTF-8 has not been ruled out and
      nothing else is above the minimum threshold, UTF-8 is now
      returned as the default.
  - Breaking changes
    - Dropped Python 3.7, 3.8, and 3.9 support: Now requires
      Python 3.10+. (#283, @hugovk)
    - Removed Latin1Prober and MacRomanProber: These special-case
      probers have been replaced by the unified model-based
      approach described above. Latin-1, MacRoman, and all other
      single-byte encodings are now detected by
      SingleByteCharSetProber with trained language models,
      giving better accuracy and language identification.
    - Removed EUC-TW support: EUC-TW encoding detection has been
      removed as it is extremely rare in practice.
    - LanguageFilter.NONE removed: Use specific language filters
      or LanguageFilter.ALL instead.
    - Enum types changed: InputState, ProbingState, MachineState,
      SequenceLikelihood, and CharacterCategory are now IntEnum
      (previously plain classes or Enum). LanguageFilter values
      changed from hardcoded hex to auto().
    - detect() default behavior change: detect() now defaults to
      encoding_era=EncodingEra.MODERN_WEB and
      should_rename_legacy=None (auto-enabled for MODERN_WEB),
      whereas previously it defaulted to considering all
      encodings with no legacy renaming.
  - Misc changes
    - Switched from Poetry/setuptools to uv + hatchling: Build
      system modernized with hatch-vcs for version management.
    - License text updated: Updated LGPLv2.1 license text and FSF
      notices to use URL instead of mailing address. (#304, #307,
      @musicinmybrain)
    - CulturaX-based model training: The create_language_model.py
      training script was rewritten to use the CulturaX
      multilingual corpus instead of Wikipedia, producing higher
      quality bigram frequency models.
    - Language class converted to frozen dataclass: The language
      metadata class now uses @dataclass(frozen=True) with
      num_training_docs and num_training_chars fields replacing
      wiki_start_pages.
    - Test infrastructure: Added pytest-timeout and pytest-xdist
      for faster parallel test execution. Reorganized test data
      directories.
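
  Note: the EncodingEra filtering and the should_rename_legacy=None
  default described in this entry can be sketched as follows. This
  is a minimal illustration built only on the standard library, not
  chardet's actual code: the EncodingEra class here is a stand-in
  whose member values are assumed, and CHARSET_ERAS, candidates(),
  and resolve_rename_legacy() are hypothetical names holding a
  small subset of the tier list above.

```python
from enum import Flag, auto


class EncodingEra(Flag):
    """Stand-in for the flag enum named in the changelog (values assumed)."""

    MODERN_WEB = auto()
    LEGACY_ISO = auto()
    LEGACY_MAC = auto()
    LEGACY_REGIONAL = auto()
    DOS = auto()
    MAINFRAME = auto()
    ALL = MODERN_WEB | LEGACY_ISO | LEGACY_MAC | LEGACY_REGIONAL | DOS | MAINFRAME


# Hypothetical registry: a few encodings taken from the tier list.
CHARSET_ERAS = {
    "UTF-8": EncodingEra.MODERN_WEB,
    "Windows-1252": EncodingEra.MODERN_WEB,
    "ISO-8859-1": EncodingEra.LEGACY_ISO,
    "KOI8-R": EncodingEra.LEGACY_ISO,
    "MacRoman": EncodingEra.LEGACY_MAC,
    "KOI8-T": EncodingEra.LEGACY_REGIONAL,
    "CP437": EncodingEra.DOS,
    "CP037": EncodingEra.MAINFRAME,
}


def candidates(era=EncodingEra.MODERN_WEB):
    """Return the encodings whose tier overlaps the requested era flags."""
    return [name for name, tier in CHARSET_ERAS.items() if tier & era]


def resolve_rename_legacy(should_rename_legacy, era):
    """Treat None as 'rename only when the era is exactly MODERN_WEB'."""
    if should_rename_legacy is None:
        return era == EncodingEra.MODERN_WEB
    return should_rename_legacy


print(candidates())
print(candidates(EncodingEra.DOS | EncodingEra.MAINFRAME))
print(resolve_rename_legacy(None, EncodingEra.ALL))
```

  Because the eras are flags, callers can combine them with "|" to
  widen detection without falling back to ALL.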

-------------------------------------------------------------------
Mon Sep  4 16:04:35 UTC 2023 - Dirk Müller <dmueller@suse.com>

- update to 5.2.0:
  * Adds support for running chardet CLI via `python -m chardet`

-------------------------------------------------------------------
Fri Apr 21 12:23:15 UTC 2023 - Dirk Müller <dmueller@suse.com>

- add sle15_python_module_pythons (jsc#PED-68)

-------------------------------------------------------------------
Thu Apr 13 22:40:29 UTC 2023 - Matej Cepl <mcepl@suse.com>

- Make calling of %{sle15modernpython} optional.

-------------------------------------------------------------------
Mon Jan 16 21:13:18 UTC 2023 - Dirk Müller <dmueller@suse.com>

- skip python 3.6 builds 

-------------------------------------------------------------------
Mon Jan  2 18:40:26 UTC 2023 - Dirk Müller <dmueller@suse.com>

- update to 5.1.0:
  * Add should_rename_legacy argument to most functions, which will rename
    older encodings to their more modern equivalents (e.g., GB2312 becomes
    GB18030) (#264, @dan-blanchard)
  * Add capital letter sharp S and ISO-8859-15 support 
  * Add a prober for MacRoman encoding
  * Add --minimal flag to chardetect command
  * Add type annotations to the project and run mypy on CI
  * Add support for Python 3.11
  * Clarify LGPL version in License trove classifier (#255, @musicinmybrain)
  * Remove support for EOL Python 3.6 (#260, @jdufresne)
  * Remove unnecessary guards for non-falsey values (#259, @jdufresne)
  * Switch to Python 3.10 release in GitHub actions (#257, @jdufresne)
  * Remove setup.py in favor of build package (#262, @jdufresne)
  * Run tests on macos, Windows, and 3.11-dev (#267, @dan-blanchard)

-------------------------------------------------------------------
Tue Jul  5 13:21:09 UTC 2022 - Ben Greiner <code@bnavigator.de>

- Update to 5.0.0
  * This release is the first release of chardet that no longer
    supports Python < 3.6
  * Added a prober for Johab Korean (#207, @grizlupo)
  * Added a prober for UTF-16/32 BE/LE (#109, #206, @jpz)
  * Added test data for Croatian, Czech, Hungarian, Polish, Slovak,
    Slovene, Greek, and Turkish, which should help prevent future
    errors with those languages
  * Improved XML tag filtering, which should improve accuracy for
    XML files (#208)
  * Tweaked SingleByteCharSetProber confidence to match latest
    uchardet (#209)
  * Made detect_all return child prober confidences (#210)
  * Updated examples in docs (#223, @domdfcoding)
  * Documentation fixes (#212, #224, #225, #226, #220, #221, #244,
    from too many contributors to mention)
  * Minor performance improvements (#252, @deedy5)
  * Add support for Python 3.10 when testing (#232, @jdufresne)
  * Lots of little development cycle improvements, mostly thanks to
    @jdufresne
- Canonicalize alternatives creation

-------------------------------------------------------------------
Fri Dec 10 09:05:04 UTC 2021 - pgajdos@suse.com

- pytest-runner is not required for build

-------------------------------------------------------------------
Thu Sep 30 08:18:47 UTC 2021 - Stefan Schubert <schubi@suse.de>

- Use libalternatives instead of update-alternatives.

-------------------------------------------------------------------
Sun Dec 20 05:52:28 UTC 2020 - John Vandenberg <jayvdb@gmail.com>

- Remove now unnecessary pytest4.patch and python-chardet-rpmlintrc
- Update to v4.0.0
  See https://github.com/chardet/chardet/compare/3.0.4...4.0.0

-------------------------------------------------------------------
Mon Oct 14 11:45:00 UTC 2019 - Matej Cepl <mcepl@suse.com>

- Replace %fdupes -s with plain %fdupes; hardlinks are better.

-------------------------------------------------------------------
Wed Jul  3 08:32:17 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>

- Add patch to fix build with pytest4:
  * pytest4.patch

-------------------------------------------------------------------
Tue Feb 26 08:14:25 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>

- Switch to multibuild to avoid buildcycles

-------------------------------------------------------------------
Tue Dec  4 12:49:12 UTC 2018 - Matej Cepl <mcepl@suse.com>

- Remove superfluous devel dependency for noarch package

-------------------------------------------------------------------
Tue May 15 07:02:02 UTC 2018 - antoine.belvire@opensuse.org

- Fix update-alternatives call in %postun.

-------------------------------------------------------------------
Wed Sep 20 21:47:30 UTC 2017 - dmueller@suse.com

- add update-alternatives post-requires 

-------------------------------------------------------------------
Fri Aug 25 13:09:48 UTC 2017 - tbechtold@suse.com

- Fix build for Leap-42.3

-------------------------------------------------------------------
Tue Aug 15 09:57:21 UTC 2017 - dmueller@suse.com

- add update-alternatives support for py2/py3 coinstallability

-------------------------------------------------------------------
Thu Jun 29 08:43:41 UTC 2017 - ecsos@opensuse.org

- fix source link

-------------------------------------------------------------------
Sat Jun 10 08:39:04 UTC 2017 - dmueller@suse.com

- update to 3.0.4

-------------------------------------------------------------------
Tue Mar 21 13:57:55 UTC 2017 - jmatejek@suse.com

- do not use %py_ver, replace with %python_version

-------------------------------------------------------------------
Sun Mar 19 08:23:54 UTC 2017 - aloisio@gmx.com

- Converted to single spec.

-------------------------------------------------------------------
Mon Jan 30 21:41:47 UTC 2017 - rjschwei@suse.com

- Include in SLE 12 (bsc#1002895, FATE#321630)

-------------------------------------------------------------------
Mon May 11 05:49:58 UTC 2015 - arun@gmx.de

- specfile:
  * added update alternative to prevent conflicts with python3 version
  * add tests

-------------------------------------------------------------------
Tue Feb 10 23:45:01 UTC 2015 - aloisio@gmx.com

- Update to version 2.3.0
  * Added support for CP932 detection (thanks to @hashy)
  * Fixed an issue where UTF-8 with a BOM would not be detected
    as UTF-8-SIG (#8)
  * Modified chardetect to use argparse for argument parsing
  * Moved docs to a gh-pages branch. You can now access them
    at http://chardet.github.io
- Changelog on https://github.com/chardet/chardet/commits/2.3.0
- Other minor changes

-------------------------------------------------------------------
Thu Oct 24 11:00:03 UTC 2013 - speilicke@suse.com

- Require python-setuptools instead of distribute (upstreams merged)

-------------------------------------------------------------------
Tue Oct  2 03:09:41 UTC 2012 - alexandre@exatati.com.br

- Update to 2.1.1:
  - Sorry, no changelog.

-------------------------------------------------------------------
Fri Jul 27 15:10:36 UTC 2012 - alexandre@exatati.com.br

- Update to 1.1:
  - Sorry, no changelog.

-------------------------------------------------------------------
Wed Dec 28 22:18:53 UTC 2011 - alexandre@exatati.com.br

- Standard in spec file;
- Remove CFLAGS and %clean section from spec file.

-------------------------------------------------------------------
Thu Dec  8 11:12:41 UTC 2011 - coolo@suse.com

- the license seems to be LGPL-2.1+

-------------------------------------------------------------------
Sat Mar 26 02:11:35 UTC 2011 - alexandre@exatati.com.br

- Regenerate spec file with py2pack;
- Bzip2 source file.

-------------------------------------------------------------------
Mon Jan 25 14:35:35 UTC 2010 - alexandre@exatati.com.br

- Initial package (2.0.1) for openSUSE.