File dwarfs.changes of Package dwarfs
-------------------------------------------------------------------
Tue Mar 24 18:35:20 UTC 2026 - Mia Herkt <mia@0x0.st>
- Add patches to fix the test suite on 32-bit architectures:
* 0001-fix-util-implement-time_with_unit-using-integer-arit.patch
* 0002-fix-utils-implement-ratio_to_string-using-integer-ar.patch
* 0003-fix-frozen-Layout-should-use-64-bit-type-for-64-bit-.patch
* 0004-fix-always-set-_FILE_OFFSET_BITS-64-on-UNIX-like-sys.patch
gh#mhx/dwarfs#354
-------------------------------------------------------------------
Mon Mar 23 03:05:32 UTC 2026 - Mia Herkt <mia@0x0.st>
- Update to version 0.15.1:
Bug fixes:
* mkdwarfs did not correctly handle inputs where hardlinks had
the same inode number on different devices. To run into this
issue, you would have to make mkdwarfs scan files from multiple
devices (e.g. the root of a directory tree with multiple
mounted filesystems) and have files with the same inode number
on different devices and have at least two of those files also
have a link count greater than 1. While this is hopefully rare
in practice, it is a serious bug that can lead to crashes (in
the best case) or even data loss (in the worst case), as only
the data of one of these files would be stored in the image.
This has been fixed and a test has been added to cover this
case.
- Version 0.15.0:
Bug fixes:
* Commas in the filesystem image path were not escaped when
passed to the FUSE driver as the fsname option. Because commas
are used as FUSE argument separators, this could cause mounting
to fail for paths containing commas.
gh#mhx/dwarfs#323
* Progress reporting in dwarfsextract was broken when extracting
a subset of files using patterns, because it was computed
relative to the total filesystem size rather than the total
size of the selected files. Several subtle edge cases could
also cause progress percentages to fail to reach 100% or even
exceed it. These issues have been fixed.
gh#mhx/dwarfs#316
* Fixed FUSE argument vector initialization in dwarfs_main, which
could trigger an assertion inside libfuse when extra arguments
were added after an uninitialized vector was passed to
FUSE_ARGS_INIT.
* Fixed a metadata lookup bug for parent_dir_entry in filesystems
with format version 2.2 and earlier (that is, DwarFS releases
before v0.5.0), where an additional level of indirection was
required but missing. Fortunately, this only affected the debug
output of dwarfsck: the parent= field shown with
-d directory_tree would display the parent inode number rather
than the parent entry number. The only other code path using
parent_dir_entry effectively compensated for the missing
indirection.
* Recompressing a filesystem image with sparse files, without
also rebuilding the metadata, could erroneously fail with an
error claiming that sparse file support could not be disabled,
even without --no-sparse-files. The root cause was an unchecked
std::optional access. This has been fixed.
* When rewriting a filesystem image, the bytes_in and bytes_out
progress counters were updated at different times, which could
lead to incorrect compression ratios being shown during
progress reporting. Both counters are now updated together
after compression.
* When using --format=newc and extracting a subset of hardlinked
files, dwarfsextract could crash with an "unexpected deferred
entry" error. This was caused by a peculiarity of libarchive's
newc implementation that was not handled correctly. The bug has
been fixed and is now covered by a test.
* Corrected the license information in a few headers, changing
them from GPL-3.0-or-later to MIT.
Features:
* Major dependency reduction / de-Meta-ing the project.
fbthrift and folly are no longer dependencies of DwarFS, and
the corresponding submodules have been removed from the
repository. fbthrift has been replaced by a new thrift_lite
library that implements the subset DwarFS actually needs,
including the thrift compiler, compact protocol support, JSON
serialization, debug output, and frozen-layout support.
The frozen library from fbthrift has been forked and is now
maintained as an internal component with all folly dependencies
removed. DwarFS now relies on standard C++23, Boost, and a few
new in-repo components instead. This also removes several
indirect dependencies (gflags, glog, double-conversion,
libevent, and on macOS also libsodium). The resulting code is
simpler, the dependency footprint is smaller, and binary size
is reduced in many cases. The compact protocol remains fully
compatible, but the debug and JSON output formats are no longer
identical to fbthrift's output.
* mkdwarfs now automatically selects the progress display mode
based on whether the output is connected to a terminal and
whether the current locale uses UTF-8. Previously, the default
was always unicode, which could produce garbled output in
non-UTF-8 environments.
gh#mhx/dwarfs#326
* The project is now compliant with the REUSE specification.
All source files now carry SPDX license identifiers, a
REUSE.toml file has been added, and full license texts are
included in the LICENSES directory.
* New --hollow option for mkdwarfs. This allows building hollow
filesystem images that preserve the structure, metadata, and
file sizes of the input while replacing actual file contents
with zero-filled sparse files. This is useful for testing
scenarios where realistic filesystem structure matters but the
actual contents do not.
gh#mhx/dwarfs#131
* mkdwarfs now supports ZSTD long-distance matching (LDM) via a
new long algorithm option for --compression. This can improve
compression ratios with extremely large block sizes (typically
above 128 MiB), or with smaller block sizes at lower
compression levels.
* New binary file categorizer (--categorize=binary).
This categorizer can identify ELF (Linux/FreeBSD),
PE (Windows), and Mach-O (macOS) executables and shared
libraries and group them into separate categories by type and
architecture. This can dramatically improve compression ratios
when binaries from different platforms and architectures are
mixed together.
* dwarfsextract has a new --num-disk-writers option to run
multiple writer threads in parallel when extracting files to
disk. This can improve throughput, especially when extracting
large numbers of small files.
* dwarfsextract has new --skip-devices and --skip-specials
options to skip device nodes and special files (such as sockets
and FIFOs) during extraction.
* dwarfsextract now emits a warning when a pattern is provided
but no matching files are found.
* dwarfsck can now export metadata to stdout instead of to a file
by using --export-metadata=-.
* With mkdwarfs in --input-list mode, specifying --order=none now
preserves the exact order of entries given in the input file.
Previously, none was not treated as a meaningful ordering
guarantee in this mode.
* New --no-check option for mkdwarfs to skip the filesystem
integrity check before recompression. This can speed up
recompression workflows when the source image is assumed to be
valid. The individual checks are still performed during the
rewrite itself.
gh#mhx/dwarfs#322
* When recompressing a filesystem image, blocks that are
uncompressed in the source image are no longer unnecessarily
copied into memory. Instead, the mapped memory region is passed
directly to the compressor, saving some memory and CPU time in
this case.
* While profiling mkdwarfs, reading memory usage from
/proc/self/smaps_rollup turned out to be a significant hotspot.
DwarFS now reads /proc/self/status by default instead, trading
some accuracy for much lower overhead. It is still possible to
use /proc/self/smaps_rollup by setting
DWARFS_ACCURATE_MEMORY_USAGE=1. This can be combined with
DWARFS_LOG_MEMORY_USAGE to periodically log memory usage during
a mkdwarfs run. If /proc/self/smaps_rollup is inaccessible,
DwarFS automatically falls back to /proc/self/status.
* Memory usage during mkdwarfs rewrite operations is now properly
constrained by the -L / --memory-limit option, taking into
account both queued blocks and memory used by the compression
algorithm itself.
gh#mhx/dwarfs#322
* The similarity ordering option now uses a different hash mixing
function. The main goal is to improve distribution, since only
a small number of hash bits are actually used. This may or may
not improve compression ratios, but it will affect the
resulting image size.
Build:
* The project now requires C++23 compiler support instead of
C++20. Care has been taken to restrict usage to widely
available and long-supported C++23 features.
* Cleaned up many compiler warnings across different platforms.
Docs:
* Added documentation for the fits, hotness, and binary
categorizers to the mkdwarfs manual page.
Test:
* Test coverage has been significantly improved from 96.4% to
97.1%, with more than 10,000 lines of new test code.
-------------------------------------------------------------------
Sat Mar 21 12:11:32 UTC 2026 - ecsos <ecsos@opensuse.org>
- Fix build error for Leap.
-------------------------------------------------------------------
Wed Nov 26 18:45:57 UTC 2025 - Mia Herkt <mia@0x0.st>
- Update to version 0.14.1:
Bugfixes:
* The metadata_builder now recomputes all total sizes
(total_fs_size, total_allocated_fs_size, and
total_hardlink_size) as part of the build() function. This not
only ensures that the totals are correct even if the allocated
size changes between scanning and segmenting (which has been
happening at least on ZFS volumes), but it also allows images
affected by a related bug in Windows builds of DwarFS to be
fixed by rebuilding the metadata.
* Instead of making the FUSE drivers fail hard when seeing the
options that were removed in v0.14.0, they now just log a
warning and ignore them. The options may still be fully removed
in a future release.
gh#mhx/dwarfs#303.
* The pcmaudio categorizer had two minor issues when compressing
a large number of WAV files. One was reporting an unsupported
format: 3/0 or unsupported format: 65,534/3 warning, which
isn't very useful for the end user. These format codes
correspond to IEEE floating point formats, which are indeed
unsupported. However, the format appears to be quite common,
so the warning has been downgraded to an info message that
explicitly mentions the floating point format. The second issue
was an unexpected fmt chunk size of 20 bytes, which caused the
file to be rejected as a PCM audio file (meaning it was added
using a generic compressor instead of FLAC). It turns out that
these non-conforming fmt chunks are also quite common in
practice, so the code has been changed to accept the
non-conforming file while also logging an info message
mentioning the non-conformance.
gh#mhx/dwarfs#309.
* The help text for the mkdwarfs compress level option (-l) was
misleading in combination with the manual page as neither
mentioned that the table with details was shown only by
-H / --long-help.
gh#mhx/dwarfs#312.
Features:
* Added shell completion for dwarfsck and dwarfsextract.
* Added sample desktop unmount handlers.
- Changes in 0.14.0:
Bugfixes:
* Leading dots in --input-list file paths were incorrectly
treated as literal directory names instead of being expanded.
gh#mhx/dwarfs#292.
* The SPDX license identifier in GPL-licensed source files was
incorrectly specified as GPL-3.0-only instead of
GPL-3.0-or-later.
gh#mhx/dwarfs#275.
* Fixed an off-by-one error when recovering self_index fields in
metadata, which could cause the sentinel directory to have a
non-zero self_entry. While harmless by itself (since that entry
is never actually used), this would cause the metadata
consistency check to fail. The fix covers three aspects:
correcting the off-by-one error; ensuring the self_entry
recovery code does not run for the sentinel directory;
and changing the metadata consistency check to only warn about
a non-zero self_entry rather than fail. Running mkdwarfs with
--rebuild-metadata will also reset a non-zero sentinel
self_entry to zero.
* Fixed the implementation of the read operation in the FUSE
driver to send positive error code values to libfuse. This was
likely never triggered in practice, but in cases where parts of
the filesystem image vanish while being accessed (which
previously caused SIGBUS crashes), libfuse would not understand
the negative error codes.
* When setting CPU thread affinity for worker group threads via
DWARFS_WORKER_GROUP_AFFINITY, the code did not CPU_ZERO the
cpu_set_t structure before setting individual CPUs. This could
pin threads to random CPUs in addition to the requested ones.
* The FITS categorizer would scan entire files for the
end-of-header marker if their size was a multiple of
2880 bytes, causing significant slowdowns on large non-FITS
files. Additional checks now ensure scanning only continues if
the data truly looks like a standards-compliant FITS header.
* GCC caught a potential null-pointer dereference on error when
opening a file in mkdwarfs. This has been fixed.
* Numerous fixes for 32-bit architectures, mostly related to
integer overflows with file sizes larger than 4 GiB.
* Another off-by-one error caused the first regular file inode to
be excluded from the file-size cache. This would be hard to
notice unless that file was highly fragmented. The cache will
be fixed when rebuilding the metadata.
* The FUSE driver’s enable_nlink option is now the default
behavior and cannot be disabled. The previous optimization
skipped building a table of hardlink counts, which produced
inherently incorrect file status information (hardlinked files
share an inode, so reporting a link count of 1 is wrong).
The hardlink table is now stored in the metadata by default;
if there are no hardlinks, it consumes no space. You can still
omit the hardlink table with --no-hardlink-table, at the cost
of building it on-the-fly when the filesystem image is loaded
(typically fast — e.g., ~300 ms for 14 million files).
Features:
* New I/O layer abstraction that supports “classic” mmap-based
file access, granular mmap-based access on 32-bit systems, and
fully mmap-less access if desired. This applies to all DwarFS
tools. By default, tools use the most efficient
method—memory-mapping whole files on 64-bit systems and
mapping file segments on 32-bit systems (to conserve address
space). This can be controlled via the new DWARFS_IOLAYER_OPTS
environment variable described in dwarfs-env(7).
* Full support for sparse files. mkdwarfs now detects and
efficiently processes sparse files, skipping holes where
possible and preserving them in the filesystem image.
dwarfsextract extracts sparse files as such and preserves
sparse representations when extracting to archive formats that
support them (e.g., tar).
Note: Sparse file support is not backwards compatible; images
containing sparse files cannot be processed by DwarFS versions
prior to 0.14.0. By default, mkdwarfs enables sparse file
support if it detects sparse input. Use --no-sparse-files to
disable it and ensure compatibility with older versions.
* Support for subsecond timestamp resolution. The default remains
one second, but finer resolutions (down to nanoseconds) can be
specified with --time-resolution. mkdwarfs will warn if the
requested resolution is finer than the native filesystem
resolution. This is fully backwards compatible: older DwarFS
versions will handle such images but ignore the subsecond
parts.
gh#mhx/dwarfs#294.
* Desktop integration for Linux. A new --auto-mountpoint option
automatically creates or selects a mount-point directory,
making it easier to mount DwarFS images from file managers.
Desktop files and MIME type definitions are now installed to
enable double-click mounting of .dwarfs files.
* Shell completion for mkdwarfs (bash and zsh).
* Improved error handling when DwarFS tools encounter SIGBUS
(usually caused by accessing memory-mapped files on unreliable
or faulty storage like network shares or flaky USB drives).
When SIGBUS is caught, tools now print an error suggesting
switching from mmap- to read-based I/O via DWARFS_IOLAYER_OPTS.
* dwarfsck now checks metadata consistency by default (unless
--no-check is given), improving detection of filesystem image
corruption.
* The FUSE driver exposes new options cache_sparse and
no_cache_sparse to control whether sparse files should be
cached in the kernel page cache. See dwarfs(1) for details.
* The JSON output from dwarfsck now contains a complete raw
metadata dump when the detail level includes
metadata_full_dump.
* dwarfsck no longer artificially limits string sizes when
dumping metadata.
* Accelerated search for the start of a DwarFS image in files
with custom headers; the new code is about four times faster,
scanning at more than 6 GiB/s on a modern CPU.
* The cache size can now be configured for dwarfsck, useful with
the --checksum option.
* Both dwarfsck and dwarfsextract now limit the amount of data
requested from the filesystem image at once to avoid exhausting
memory (and virtual address space on 32-bit systems).
* Improved self-extracting binary stub with better compatibility
for qemu, binfmt_misc, and old kernels. The stub now works on
Linux kernels as old as 2.6.21 (and possibly older), and it now
uses nanoprintf to further reduce binary size.
* The FUSE driver will now show the name of the mounted file
system image in the mount point listing (e.g., in df or mount
output).
Compatibility:
* The accepted minor version for the DwarFS image format has been
incremented. Release v0.16.0 will also increment the written
minor version. This means images produced with v0.16.0 will not
be readable by DwarFS tools prior to v0.14.0.
See the “Features” section in dwarfs-format(7) for details.
* The (no_)cache_image option has been removed from the FUSE
driver.
Build:
* Removed the hard dependency on the date library, which caused
build issues on distributions that no longer bundle it
(e.g., openSUSE).
- Drop remove_hhdate_dependency.patch
- Drop folly-remove-boost_system-dependency.patch
-------------------------------------------------------------------
Fri Oct 3 13:38:35 UTC 2025 - Filippo Bonazzi <filippo.bonazzi@suse.com>
- Remove hhdate dependency
- Add remove_hhdate_dependency.patch: replace date library usage with
C++20 std::chrono
- Add %check section and run tests
- Add test dependencies gtest and gmock
-------------------------------------------------------------------
Thu Oct 2 17:25:56 UTC 2025 - Mia Herkt <mia@0x0.st>
- Remove libboost_system-devel from BuildRequires
- Add folly-remove-boost_system-dependency.patch
Fixes build with Boost >=1.89.0
gh#mhx/dwarfs#288
gh#facebook/folly#2489
-------------------------------------------------------------------
Tue Sep 2 22:55:02 UTC 2025 - Mia Herkt <mia@0x0.st>
- Update to version 0.13.0:
Bugfixes:
* Made section index discovery more robust.
gh#mhx/dwarfs#264
* A recent kernel change (https://lkml.org/lkml/2025/5/5/2868)
caused the tools_test to fail on Linux 6.14 and later.
This has been fixed by accepting both EPERM and ENOSYS as
valid error codes for link() calls.
Features:
* Support for big-endian architectures.
This is still experimental, even though all unit tests pass
with QEMU, and the benchmark suite runs fine on real hardware.
This currently requires forked versions of folly and fsst.
The changes are small and the pull requests will hopefully be
merged upstream soon.
* Experimental support for 32-bit architectures.
While DwarFS should mostly "just work" on 32-bit when using
small images (a few hundred megabytes), the limited address
space is a problem for the extensive use of memory-mapped
files inside DwarFS. There will be changes to limit the use of
mmap in the future (mainly due to other issues), which should
help 32-bit compatibility as a side-effect.
gh#mhx/dwarfs#268.
* The category metadata for categorized blocks is now stored in
the metadata block by default. This allows re-compressing the
blocks with a metadata-dependent algorithm (e.g. FLAC) even if
they were previously compressed using a metadata-independent
algorithm.
This can be disabled using the --no-category-metadata option.
See the mkdwarfs man page for more details.
* The --no-category-names and --no-category-metadata options can
be used to reduce the size of the metadata. However, this will
make it impossible to use metadata-dependent compression
algorithms (e.g. FLAC), or even select category-specific
compression, when recompressing the image.
* Metadata rebuilding is now supported in mkdwarfs using the
--rebuild-metadata option. Previously, the metadata could only
be recompressed, but it was impossible to change it. With the
new option, it is now possible to change metadata packing and
apply operations like --set-owner, --set-group, --set-time,
--time-resolution, --chmod, or --no-create-timestamp.
Note that these are potentially lossy operations that may be
irreversible. By default, the history of metadata rebuilds is
tracked in the metadata itself, but this can be disabled using
--no-metadata-version-history.
* In addition to metadata rebuilding, it is now also possible to
change the block size of an existing image using the
--change-block-size option. This implies --rebuild-metadata
and --recompress=all. This can be useful for tuning the
performance of an existing image without having to re-create
it from scratch.
* mkdwarfs now shows its current memory usage while running.
Note that -L/--memory-limit still only limits the memory used
for the block queue, not the overall memory usage. Fixing this
is on the roadmap; there's no need to file an issue.
* dwarfsextract has new options to control the output format:
--format-options and --format-filters. There is also
--format=auto to automatically "guess" the format and filters
based on the output file name.
* dwarfsck has a new frozen_details detail level that will show
the frozen_analysis content ordered by memory location instead
of memory usage and also shows the address range of each
section.
-------------------------------------------------------------------
Sun Aug 24 12:05:51 UTC 2025 - Jan Engelhardt <jengelh@inai.de>
- Replace wrong BuildRequires pkgconfig(clzma) -> pkgconfig(liblzma);
build only succeeded previously by accident.
-------------------------------------------------------------------
Sat Jun 21 12:19:45 UTC 2025 - Mia Herkt <mia@0x0.st>
- Update to version 0.12.4:
Bugfixes
* Segfault on bad_compression_ratio_error. When recompressing a
filesystem where some blocks cannot be compressed using the
selected algorithm because of a bad_compression_ratio_error,
the resulting block was left empty.
* Add history unless --no-history is given when rewriting a file
system image.
* Allow dumping frozen_layout without frozen_analysis in dwarfsck.
* Logging timestamps should show local time.
Features
* More complete breakdown of metadata in dwarfsck.
* Add schema_raw_dump flag to dwarfsck --detail.
Build
* Update folly/fbthrift/fsst.
-------------------------------------------------------------------
Mon Apr 21 19:50:12 UTC 2025 - Mia Herkt <mia@0x0.st>
- Update to version 0.12.3:
Bugfixes
* Automatic image offset detection (for images using a custom
header) did not work correctly if the header contained a
string that would be identified as the start of a v1 section
header (these were only used before dwarfs-0.3.0).
If there was either "DWARFS\x02\x00" or "DWARFS\x02\x01" in
the header, offset detection would fail. The check has been
modified to peek further into the data and ensure this really
is a v1 section header, and to also check whether the next section
header position can be derived from the length field.
It is still possible to construct a file system image where
offset detection will ultimately fail, but it is much less
likely with the change.
- Changes in version 0.12.2:
Bugfixes
* The dwarfs-0.12.0 release introduced a performance regression
where FLAC compression took more than twice as long as in the
previous releases. This has been fixed. FLAC decompression was
unaffected.
-------------------------------------------------------------------
Sun Apr 13 19:46:34 UTC 2025 - Mia Herkt <mia@0x0.st>
- Update to version 0.12.1:
Features
* Added --memory-limit=auto to mkdwarfs to use a more reasonable
(hopefully) default for the block queue. The old default of
1 GiB was quite arbitrary and definitely not suitable for
low-end systems. The new auto default will determine the limit
based on the number of workers (which in turn is based on the
number of CPUs), the block size, and the amount of physical
memory of the system.
* Replaced vector_byte_buffer with malloc_byte_buffer, which is
internally based around a simple buffer that doesn't incur the
cost of initializing each element like std::vector. Especially
for large blocks which are known to be overwritten immediately,
this can save a few CPU cycles.
- Changes in version 0.12.0:
* New Licensing Conditions: Instead of being all GPL-3.0 like all
the previous releases, this release changes the license of a
large fraction of the DwarFS code to MIT. All tools and
libraries that only read DwarFS images are now MIT-licensed.
Everything else (e.g. mkdwarfs) is still GPL-3.0 for the time
being.
Bugfixes
* Changes for compatibility with Boost.Process v2.
Features
* Re-licensed all libraries required for reading DwarFS images
under the MIT license. The source of all tools that just read
DwarFS images (i.e. everything except for mkdwarfs) is also
under the MIT license now. Everything else is still GPL-3.0.
gh#mhx/dwarfs#255
* New hotness categorizer in mkdwarfs that allows a list of "hot"
files to be stored in distinct file system blocks.
* New explicit ordering mode in mkdwarfs that allows files to be
ordered according to the order in a given list file.
* dwarfs now shows the version of the FUSE library used.
* New dwarfs options preload_all and preload_category to populate
the block cache immediately after mounting.
* New dwarfs option analysis_file that can be used for profiling
and as input to mkdwarfs' new hotness categorizer and explicit
ordering mode.
* New dwarfs option block_allocator that allows the user to
switch from a malloc-based block allocator to an mmap-based
one. This can help with returning memory back to the system if
the blocks are evicted from the cache.
-------------------------------------------------------------------
Fri Apr 4 07:31:03 UTC 2025 - Jan Engelhardt <jengelh@inai.de>
- Use SRPM base name for devel subpackage
-------------------------------------------------------------------
Tue Apr 1 03:25:25 UTC 2025 - Mia Herkt <mia@0x0.st>
- Initial package, version 0.11.3