New I/O Layer, Sparse Files and Sub-Second Timestamps
New I/O Layer
A completely new I/O layer abstraction replaces the ubiquitous use of memory-mapped files across the DwarFS code base. Memory-mapping is still the default, but processing is done in "segments" rather than in whole files. This required substantial changes (this release adds or touches more than 5,000 lines of code and almost 10,000 lines of new tests) to almost every part of the code that processes file data, code that previously assumed any file could simply be accessed as a contiguous piece of memory.
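As a rough illustration of what mapping a file in segments involves, the POSIX sketch below maps only a window of a file instead of the whole file. It is not DwarFS's I/O layer, and the class name `mapped_segment` is hypothetical. Because `mmap()` requires page-aligned file offsets, the sketch aligns the offset down and hides the extra leading bytes from the caller.

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <system_error>

// Hypothetical RAII wrapper that maps [offset, offset + length) of a file.
// The range must lie within the file: touching mapped pages past EOF
// raises SIGBUS.
class mapped_segment {
 public:
  mapped_segment(int fd, off_t offset, std::size_t length) : len_{length} {
    assert(length > 0 && "mmap() rejects zero-length mappings");
    // mmap() offsets must be multiples of the page size: align down and
    // remember how many extra bytes were mapped in front of `offset`.
    auto const page = static_cast<off_t>(::sysconf(_SC_PAGESIZE));
    off_t const aligned = offset - offset % page;
    slack_ = static_cast<std::size_t>(offset - aligned);
    map_len_ = slack_ + length;
    void* const p =
        ::mmap(nullptr, map_len_, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (p == MAP_FAILED) {
      throw std::system_error(errno, std::generic_category(), "mmap");
    }
    base_ = p;
  }

  mapped_segment(mapped_segment const&) = delete;
  mapped_segment& operator=(mapped_segment const&) = delete;

  ~mapped_segment() { ::munmap(base_, map_len_); }

  // First requested byte (not the page-aligned start of the mapping).
  unsigned char const* data() const {
    return static_cast<unsigned char const*>(base_) + slack_;
  }
  std::size_t size() const { return len_; }

 private:
  void* base_{nullptr};
  std::size_t map_len_{0};
  std::size_t slack_{0};
  std::size_t len_{0};
};
```

Mapping bounded windows like this keeps address-space usage independent of file size, which is why segment-wise mapping matters most on 32-bit systems.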
In the new abstraction layer, backends are pluggable and configurable through the `DWARFS_IOLAYER_OPTS` environment variable, letting you:
- Configure the size up to which files are mapped "eagerly", i.e., as a whole rather than in segments. This is mostly relevant for 32-bit systems, where it is set to a reasonable default of 32 MiB.
- Switch from `mmap()` to classic `read()` for maximum robustness on unreliable storage or faulty hardware.

The latter is relevant if you are seeing "bus errors" (`SIGBUS`), as many users have in the past (#45, #50, #108, #163, #213). You can switch to the read-based backend using:
```
$ export DWARFS_IOLAYER_OPTS=open_mode=read
```
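The `read()` backend is more robust because an I/O error on flaky storage comes back as an error return from a system call, whereas the same error on a memory-mapped page can only be reported as a `SIGBUS` signal. A minimal sketch of segment-wise reading with `pread()`, using a hypothetical helper `for_each_segment` that is not part of DwarFS:

```cpp
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <system_error>
#include <vector>

// Hypothetical helper: read `fd` from the start in segments of at most
// `segment_size` bytes and pass each segment to `fn`. Storage errors show
// up as a failed pread() (turned into an exception here), not as a signal.
void for_each_segment(
    int fd, std::size_t segment_size,
    std::function<void(off_t offset, char const* data, std::size_t size)> const&
        fn) {
  assert(segment_size > 0);
  std::vector<char> buf(segment_size);
  off_t offset = 0;
  for (;;) {
    ssize_t const n = ::pread(fd, buf.data(), buf.size(), offset);
    if (n < 0) {
      if (errno == EINTR) {
        continue;  // interrupted by a signal: retry the same read
      }
      throw std::system_error(errno, std::generic_category(), "pread");
    }
    if (n == 0) {
      break;  // end of file
    }
    fn(offset, buf.data(), static_cast<std::size_t>(n));
    offset += n;
  }
}
```

The trade-off is an extra copy into a user-space buffer per segment, which is why memory-mapping remains the default.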
Sparse File Support
This release includes end-to-end sparse file support:
- `mkdwarfs` detects holes in files and preserves sparseness in the image.
- The FUSE driver exposes sparseness via `lseek()` where supported (Linux, FreeBSD).
- `dwarfsextract` writes sparse files and preserves them when targeting archive formats that support holes (e.g., `tar`).
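On POSIX systems, hole detection of this kind is commonly done with the `SEEK_DATA` and `SEEK_HOLE` extensions to `lseek()`. The sketch below lists the data extents of an open file; everything between them is a hole. It illustrates the technique and is not taken from `mkdwarfs`; `extent` and `data_extents` are hypothetical names.

```cpp
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstdio>
#include <system_error>
#include <vector>

struct extent {
  off_t offset;
  off_t length;
};

// Return the data (non-hole) extents of `fd`; the gaps between them are holes.
// Filesystems that do not track holes report a single data extent covering
// the whole file, which is still a correct (if non-sparse) answer.
// Note: this moves the file offset of `fd`.
std::vector<extent> data_extents(int fd) {
  std::vector<extent> out;
  off_t const size = ::lseek(fd, 0, SEEK_END);
  if (size < 0) {
    throw std::system_error(errno, std::generic_category(), "lseek(SEEK_END)");
  }
  off_t pos = 0;
  while (pos < size) {
    off_t const data = ::lseek(fd, pos, SEEK_DATA);
    if (data < 0) {
      if (errno == ENXIO) {
        break;  // no data after `pos`: the rest of the file is a hole
      }
      if (errno == EINVAL) {
        out.push_back({pos, size - pos});  // SEEK_DATA unsupported: all data
        break;
      }
      throw std::system_error(errno, std::generic_category(),
                              "lseek(SEEK_DATA)");
    }
    // If there is no earlier hole, SEEK_HOLE finds the implicit one at EOF.
    off_t const hole = ::lseek(fd, data, SEEK_HOLE);
    if (hole < 0) {
      throw std::system_error(errno, std::generic_category(),
                              "lseek(SEEK_HOLE)");
    }
    assert(hole > data);
    out.push_back({data, hole - data});
    pos = hole;
  }
  return out;
}
```

Only the extents need to be read and stored; the holes can be recorded as lengths, which is what makes preserving sparseness cheap.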
Compatibility: images containing sparse files require DwarFS ≥ 0.14.0. You can use `--no-sparse-files` to explicitly treat sparse files as non-sparse and keep compatibility with older versions. If your input does not contain sparse files, the images remain backwards-compatible even without that flag.
Sparse file support matrix
| Feature / OS | Linux | FreeBSD | macOS | Windows |
|---|---|---|---|---|
| Reading sparse files (`mkdwarfs`) | ✅ | ✅ | ✅ | ✅ |
| Writing sparse files (`dwarfsextract`) | ✅ | ✅ | ✅ | ✅ |
| Exposing sparse files via FUSE layer | ✅ | ✅ | ❌ | ❌ |
FUSE-level sparseness requires `lseek()` support in the FUSE implementation (currently Linux and FreeBSD). On macOS and Windows, files are exposed via FUSE as non-sparse, even though `dwarfsextract` can still write sparse files when extracting. Missing `lseek()` support is tracked in separate issues for Windows and macOS.
Sub-Second Timestamps
Timestamp resolution is now configurable down to nanoseconds using `--time-resolution`. The default remains one second. This is fully backwards-compatible: older DwarFS versions can read images with sub-second resolution but will ignore the sub-second part.
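To make the backwards-compatibility argument concrete: a timestamp split into whole seconds plus a sub-second remainder can be truncated to any resolution, and a reader that only knows about seconds simply ignores the remainder. The sketch below uses a hypothetical `timestamp` type and helpers; it does not describe the DwarFS on-disk encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timestamp type: whole seconds plus a sub-second part.
struct timestamp {
  std::int64_t sec;
  std::uint32_t nsec;  // 0 .. 999'999'999
};

// Truncate `t` to a resolution given in nanoseconds, e.g. 1'000'000'000 for
// whole seconds (the default), 1'000'000 for milliseconds, 1 for nanoseconds.
timestamp truncate_to_resolution(timestamp t, std::uint32_t resolution_ns) {
  assert(resolution_ns > 0 && resolution_ns <= 1'000'000'000u);
  assert(t.nsec < 1'000'000'000u);
  return {t.sec, t.nsec - t.nsec % resolution_ns};
}

// A reader that predates sub-second support only looks at whole seconds.
std::int64_t legacy_view(timestamp t) { return t.sec; }
```

For example, truncating `{1700000000, 123456789}` to millisecond resolution keeps `123000000` nanoseconds, while the legacy view still reports `1700000000`.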
Bug fixes
- Leading dots in `--input-list` file paths were incorrectly treated as literal directory names instead of being expanded. This has been fixed. Fixes #292.
- The SPDX license identifier in GPL-licensed source files was incorrectly specified as `GPL-3.0-only` instead of `GPL-3.0-or-later`. This has been corrected. Fixes #275.
- Fixed an off-by-one error when recovering `self_index` fields in metadata, which could cause the sentinel directory to have a non-zero `self_entry`. While harmless by itself (since that entry is never actually used), this would cause the metadata consistency check to fail. The fix covers three aspects: correcting the off-by-one error, ensuring the `self_entry` recovery code does not run for the sentinel directory, and changing the metadata consistency check to only warn about a non-zero `self_entry` rather than fail. Running `mkdwarfs` with `--rebuild-metadata` will also reset a non-zero sentinel `self_entry` to zero.
- Fixed the implementation of the `read` operation in the FUSE driver to send positive error code values to libfuse. This was likely never triggered in practice, but in cases where parts of the filesystem image vanish while being accessed (which previously caused `SIGBUS` crashes), libfuse would not have understood the negative error codes.
- Moved the FUSE driver binaries from `sbin` to `bin` and kept only the `mount.dwarfs`/`mount.dwarfs2` symlinks in `sbin`. This better aligns with user expectations, other FUSE drivers, and the fact that the man pages are installed in section 1. (Thanks to Ahmad Khalifa for the fix.)
- The `dwarfs2` binary was broken in builds using shared libraries. (Thanks to Ahmad Khalifa for the fix.)
- When setting CPU thread affinity for worker group threads via `DWARFS_WORKER_GROUP_AFFINITY`, the code did not clear the `cpu_set_t` structure with `CPU_ZERO` before setting individual CPUs. This could pin threads to random CPUs in addition to the requested ones.
- The FITS categorizer would scan entire files for the end-of-header marker if their size was a multiple of 2880 bytes, causing significant slowdowns on large non-FITS files. Additional checks now ensure scanning only continues if the data truly looks like a standards-compliant FITS header.
- GCC flagged a potential null-pointer dereference in the error path when opening a file in `mkdwarfs`. This has been fixed.
- Numerous fixes for 32-bit architectures, mostly related to integer overflows with file sizes larger than 4 GiB.
- Another off-by-one error caused the first regular file inode to be excluded from the file-size cache. This would be hard to notice unless that file was highly fragmented. Rebuilding the metadata of an existing image fixes the cache.
- The FUSE driver's `enable_nlink` option is now the default behavior and can no longer be disabled. The previous optimization skipped building a table of hardlink counts, which produced inherently incorrect file status information (hardlinked files share an inode, so reporting a link count of 1 is wrong). The hardlink table is now stored in the metadata by default; if there are no hardlinks, it consumes no space. You can still omit the hardlink table with `--no-hardlink-table`, at the cost of building it on the fly when the filesystem image is loaded (typically fast, e.g., ~300 ms for 14 million files).
- Fixed a typo in `dwarfs-format.md`. (Thanks to Dennis Brakhane for spotting this and sending a PR.)
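FITS data is organized in 2880-byte blocks, and the header is a sequence of 80-character ASCII cards ending with an `END` card. The sketch below shows one way to bail out early on non-FITS data, as the FITS fix above describes: it requires the mandatory `SIMPLE  =` keyword up front and stops at the first card containing non-printable bytes. These particular checks are assumptions for illustration, not the exact checks the DwarFS categorizer performs, and `fits_header_end` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string_view>

constexpr std::size_t kFitsBlock = 2880;  // FITS files consist of 2880-byte blocks
constexpr std::size_t kFitsCard = 80;     // header cards are 80 characters each

// FITS header cards may only contain printable ASCII (0x20 to 0x7E).
bool is_printable_card(std::string_view card) {
  for (char const ch : card) {
    auto const c = static_cast<unsigned char>(ch);
    if (c < 0x20 || c > 0x7E) {
      return false;
    }
  }
  return true;
}

// The header ends with a card holding the keyword END followed by spaces.
bool is_end_card(std::string_view card) {
  return card.substr(0, 3) == "END" &&
         card.find_first_not_of(' ', 3) == std::string_view::npos;
}

// Hypothetical check: return the offset just past the header (a multiple of
// 2880), or std::nullopt if `data` does not look like a FITS header. The scan
// stops at the first malformed card, so arbitrary binary data is rejected
// after a few bytes instead of being scanned to the end.
std::optional<std::size_t> fits_header_end(std::string_view data) {
  if (data.size() < kFitsBlock || data.size() % kFitsBlock != 0) {
    return std::nullopt;
  }
  if (data.substr(0, 9) != "SIMPLE  =") {  // mandatory first keyword
    return std::nullopt;
  }
  for (std::size_t off = 0; off + kFitsCard <= data.size(); off += kFitsCard) {
    std::string_view const card = data.substr(off, kFitsCard);
    if (!is_printable_card(card)) {
      return std::nullopt;
    }
    if (is_end_card(card)) {
      return (off / kFitsBlock + 1) * kFitsBlock;  // header fills whole blocks
    }
  }
  return std::nullopt;
}
```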
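The affinity fix above comes down to initialization: `cpu_set_t` is a plain bitmask, so a stack-allocated set holds indeterminate bits until it is cleared with `CPU_ZERO`. A Linux/glibc sketch of building and applying a mask correctly; `make_cpu_set` and `pin_current_thread` are hypothetical helpers, and how DwarFS parses `DWARFS_WORKER_GROUP_AFFINITY` is not shown.

```cpp
#include <sched.h>

#include <cassert>
#include <cerrno>
#include <system_error>
#include <vector>

// Build an affinity mask containing exactly the requested CPUs.
cpu_set_t make_cpu_set(std::vector<int> const& cpus) {
  cpu_set_t set;
  CPU_ZERO(&set);  // without this, `set` may contain arbitrary extra CPUs
  for (int const cpu : cpus) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_SET(cpu, &set);
  }
  return set;
}

// Pin the calling thread (pid 0 means "the caller" on Linux) to `cpus`.
void pin_current_thread(std::vector<int> const& cpus) {
  cpu_set_t const set = make_cpu_set(cpus);
  if (::sched_setaffinity(0, sizeof(set), &set) != 0) {
    throw std::system_error(errno, std::generic_category(),
                            "sched_setaffinity");
  }
}
```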
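Building a hardlink table on the fly, as mentioned in the `enable_nlink` item above, amounts to counting for each inode how many directory entries refer to it; in this sketch that is a single linear pass over the entries. The data layout (`entry_inode`, `build_nlink_table`) is hypothetical and much simpler than the real DwarFS metadata.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: directory entry i refers to inode entry_inode[i].
// The link count of an inode is the number of entries referring to it.
// (Directory link counts follow different rules and are not covered here.)
std::vector<std::uint32_t> build_nlink_table(
    std::vector<std::uint32_t> const& entry_inode, std::size_t inode_count) {
  std::vector<std::uint32_t> nlink(inode_count, 0);
  for (std::uint32_t const ino : entry_inode) {
    assert(ino < inode_count);
    ++nlink[ino];
  }
  return nlink;
}
```

With entries `{0, 1, 1, 2, 1}` over three inodes, inode 1 gets a link count of 3 and the others 1.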
Features
- New I/O layer abstraction that supports "classic" `mmap`-based file access, granular `mmap`-based access on 32-bit systems, and fully `mmap`-less access if desired. This applies to all DwarFS tools. By default, tools use the most efficient method: memory-mapping whole files on 64-bit systems and mapping file segments on 32-bit systems (to conserve address space). This can be controlled via the new `DWARFS_IOLAYER_OPTS` environment variable described in `dwarfs-env(7)`.
- Full support for sparse files. `mkdwarfs` now detects and efficiently processes sparse files, skipping holes where possible and preserving them in the filesystem image. This is supported on all platforms. The FUSE driver implements `lseek()` where supported by the FUSE library (currently Linux and FreeBSD); Windows and macOS fall back to showing files as non-sparse. `dwarfsextract` extracts sparse files as such and preserves sparse representations when extracting to archive formats that support them (e.g., `tar`). Note: sparse file support is not backwards compatible; images containing sparse files cannot be processed by DwarFS versions prior to 0.14.0. By default, `mkdwarfs` enables sparse file support if it detects sparse input. Use `--no-sparse-files` to disable it and ensure compatibility with older versions.
- Support for sub-second timestamp resolution. The default remains one second, but finer resolutions (down to nanoseconds) can be specified with `--time-resolution`. `mkdwarfs` will warn if the requested resolution is finer than the native filesystem resolution. This is fully backwards compatible: older DwarFS versions will handle such images but ignore the sub-second parts. Fixes #294.
- Desktop integration for Linux. A new `--auto-mountpoint` option automatically creates or selects a mount-point directory, making it easier to mount DwarFS images from file managers. Desktop files and MIME type definitions are now installed to enable double-click mounting of `.dwarfs` files. (Thanks to Ahmad Khalifa for the implementation.)
- Shell completion for `mkdwarfs` (bash and zsh). (Thanks to Ahmad Khalifa for the contribution.)
- Improved error handling when DwarFS tools encounter `SIGBUS` (usually caused by accessing memory-mapped files on unreliable or faulty storage like network shares or flaky USB drives). When `SIGBUS` is caught, tools now print an error suggesting switching from `mmap`- to `read`-based I/O via `DWARFS_IOLAYER_OPTS`.
- `dwarfsck` now checks metadata consistency by default (unless `--no-check` is given), improving detection of filesystem image corruption.
- If sparse files are supported by the FUSE library, the FUSE driver exposes new options `cache_sparse` and `no_cache_sparse` to control whether sparse files should be cached in the kernel page cache. See `dwarfs(1)` for details.
- The JSON output from `dwarfsck` now contains a complete raw metadata dump when the detail level includes `metadata_full_dump`.
- `dwarfsck` no longer artificially limits string sizes when dumping metadata. (Thanks to Dennis Brakhane for the contribution.)
- Accelerated search for the start of a DwarFS image in files with custom headers; the new code is about four times faster, scanning at more than 6 GiB/s on a modern CPU.
- The cache size can now be configured for `dwarfsck`, which is useful with the `--checksum` option.
- Both `dwarfsck` and `dwarfsextract` now limit the amount of data requested from the filesystem image at once to avoid exhausting memory (and virtual address space on 32-bit systems).
- Improved self-extracting binary stub with better compatibility for `qemu`, `binfmt_misc`, and old kernels. The stub now works on Linux kernels as old as 2.6.21 (and possibly older), and it now uses `nanoprintf` to further reduce binary size.
- The FUSE driver will now show the name of the mounted filesystem image in the mount point listing (e.g., in `df` or `mount` output) on Linux, FreeBSD and macOS, as well as the filesystem subtype (`dwarfs`) on Linux and FreeBSD.
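Finding an image behind a custom header (for example, a self-extracting stub) means searching for the start of the first section. A C++17 sketch using `std::boyer_moore_horspool_searcher`, assuming the six-byte section magic `DWARFS` described in `dwarfs-format(7)`. It is not the accelerated implementation mentioned above, and a real reader would also have to validate the section header that follows, since the magic bytes could appear by chance in the custom header. `find_image_start` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Assumption: DwarFS sections start with the six-byte magic "DWARFS".
// Returns the offset of its first occurrence in `data`, or std::nullopt.
std::optional<std::size_t> find_image_start(std::string_view data) {
  constexpr std::string_view kMagic{"DWARFS"};
  // Boyer-Moore-Horspool skips ahead by up to the pattern length per step.
  std::boyer_moore_horspool_searcher searcher(kMagic.begin(), kMagic.end());
  auto const it = std::search(data.begin(), data.end(), searcher);
  if (it == data.end()) {
    return std::nullopt;
  }
  return static_cast<std::size_t>(it - data.begin());
}
```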
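Catching `SIGBUS` to print a hint, as described in the features above, has to respect signal-safety rules: a handler may only call async-signal-safe functions such as `write()` and `_exit()`, so the message must be prepared in advance. A POSIX sketch with a hypothetical message text (not the one DwarFS prints); `on_sigbus` and `install_sigbus_hint` are illustrative names.

```cpp
#include <signal.h>
#include <unistd.h>

#include <cassert>
#include <cstring>

// Hypothetical hint text, prepared up front because the handler cannot
// format strings safely.
constexpr char kSigbusHint[] =
    "error: SIGBUS while accessing a memory-mapped file; the storage may be "
    "unreliable. Consider setting DWARFS_IOLAYER_OPTS=open_mode=read\n";

// Only async-signal-safe functions may be called here: write() and _exit().
extern "C" void on_sigbus(int /*signo*/) {
  ssize_t const r =
      ::write(STDERR_FILENO, kSigbusHint, sizeof(kSigbusHint) - 1);
  (void)r;  // nothing sensible to do if writing the hint fails
  ::_exit(128 + SIGBUS);  // mimics the shell's "128 + signal" exit status
}

bool install_sigbus_hint() {
  struct sigaction sa;
  std::memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_sigbus;
  sigemptyset(&sa.sa_mask);
  return ::sigaction(SIGBUS, &sa, nullptr) == 0;
}
```

The handler exits instead of returning because returning from a `SIGBUS` handler would re-execute the faulting memory access.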
Compatibility
- The accepted minor version for the DwarFS image format has been incremented. Release v0.16.0 will also increment the written minor version. This means images produced with v0.16.0 will not be readable by DwarFS tools prior to v0.14.0. See the "Features" section in `dwarfs-format(7)` for details.
- The `(no_)cache_image` option has been removed from the FUSE driver.
Docs
- Added documentation on manual FSST decoding to `dwarfs-format.md`. (Thanks to Dennis Brakhane for the PR.)
- Several cleanups and additions to `dwarfs-format.md`, including a glossary of terms, clarification of blocks vs. sections, descriptions of compatibility handling via features, and details on the representation of sparse files and hardlinks.
- New manual page `dwarfs-env(7)` documenting DwarFS-specific environment variables.
Build
- Removed the dependency on Boost.Iostreams and the hard dependency on Boost.System.
- Removed the hard dependency on the `date` library, which caused build issues on distributions that no longer bundle it (e.g., SUSE).
- The build system now creates the mount symlinks at install time rather than in the build directory.
Test
- Significantly improved test coverage.
New Contributors
- @thekhalifa made their first contribution in #277
- @brakhane made their first contribution in #296
Full Changelog: v0.13.0...v0.14.0
SHA-256 Checksums
ad4c9f31bf292d9a684add611c954f20b9aae698beae9d1a1144e3f9da4e8521 dwarfs-0.14.0-Linux-aarch64.tar.xz
7205055f731feee030c855c4c1cb2aeb30955bbd1f1178c4ddb39b858902383e dwarfs-0.14.0-Linux-arm.tar.xz
f006e7c2c0d527a3d40cbcdbdbbe339225dc2e0844abba862e684f8c1b79a8db dwarfs-0.14.0-Linux-i386.tar.xz
4b81e7476eb6286c53137cf6ecd955aec534fb458e983baddfd8a384b064107b dwarfs-0.14.0-Linux-loongarch64.tar.xz
f4150d897e689dc755233050f948d4c73de426311dee45ca8eed453f678e7e57 dwarfs-0.14.0-Linux-ppc64le.tar.xz
36fd97fd8b36703c416dbe8f24e1642339392bc7134a9f39b3962b84afd6785e dwarfs-0.14.0-Linux-ppc64.tar.xz
ff3507cfe450fc2e0c04437901d164d21d322ce0540ef1251be70fbb96c5f557 dwarfs-0.14.0-Linux-riscv64.tar.xz
e34aeefdab4ad7a8677444c90e1e11c5655b03c5920972e49d5fadfa82075af3 dwarfs-0.14.0-Linux-s390x.tar.xz
2b253aec82243437a7913e8a878944d96a7ab308a3861d59ce41aaa15e1409ad dwarfs-0.14.0-Linux-x86_64.tar.xz
514b851af356102abca9103dd12c92a31fad6d2f705c4cfaff4e815b5753250f dwarfs-0.14.0.tar.xz
188c870e6f1d01b09741f172cebcffcfb9ca3d52b95c3face4af99af34f6463d dwarfs-0.14.0-Windows-AMD64.7z
d7bcb4e3beba3d97cab0ef2c40d945c56642982f548905baf1fb43ae22af8611 dwarfs-fuse-extract-0.14.0-Linux-aarch64
27ceca09ae6733d29117035e3aa82e355c18bafe6fe3e2b73166cd1b074f9b64 dwarfs-fuse-extract-0.14.0-Linux-aarch64.upx
c42c86beb1bb7bd3c1923b8c0b7f4f3dfdb2c0e8a725e91be08afe1eb9305ffb dwarfs-fuse-extract-0.14.0-Linux-arm
d7cba4f871ccf33940ef295b7c1a3654c81c670d718b82607fc0ba6902cad789 dwarfs-fuse-extract-0.14.0-Linux-arm.upx
3e94f869c2d950e8dc298959811faa9b40040a94292643c0dc8a3de92a9e97be dwarfs-fuse-extract-0.14.0-Linux-i386
124c0ee4a08979c20be9dd046d2849b8e3ef22b1a955c2effba727fac13c3791 dwarfs-fuse-extract-0.14.0-Linux-i386.upx
8bafe0484a2e834555c53c3bc406b1b0c1eeef29b6306f387c1a8477833a00b1 dwarfs-fuse-extract-0.14.0-Linux-loongarch64
cbe651024537a778129a8532ec21356ff966a047ba50cd9fed937f752ebcc6c0 dwarfs-fuse-extract-0.14.0-Linux-ppc64
10b1b6b8f8aeb36a5e90080765fd03583ed9ae716333006ca91314aecb1bf75f dwarfs-fuse-extract-0.14.0-Linux-ppc64le
47cff3962b6d3d3a865882e1285855211ca889ef060698a68fae069517954977 dwarfs-fuse-extract-0.14.0-Linux-riscv64
860b58eb508e4921b7299d32449c966bc11db5386cf42b7b1eccbe106b82d6ce dwarfs-fuse-extract-0.14.0-Linux-s390x
32f508133049734f537730fbdc3f0fb8e33c4d1bfe4311ebb21584c2917f3503 dwarfs-fuse-extract-0.14.0-Linux-x86_64
012cdef2ae435bb865d106f616d044e20610ddcdc3a3877d5232e37b4990a0d2 dwarfs-fuse-extract-0.14.0-Linux-x86_64.upx
111509c66fef6791177720a95c495d10247d2afbbb9044a40fe85a414b23a88f dwarfs-universal-0.14.0-Linux-aarch64
bfdb7264834f55f90027cef6c69f57a16617fce79e469a2f13ad0638141b3f8d dwarfs-universal-0.14.0-Linux-aarch64.upx
1e10b1e391ef7207052a83ce2bb865b3d7d5d866925365985e3c3800275b90fd dwarfs-universal-0.14.0-Linux-arm
5c68e62a230870863d53b140c2dbe531de77705caa87efabdc81a1389a57d4a4 dwarfs-universal-0.14.0-Linux-arm.upx
02a47352525c4ff194dab0d6e4cceb8137de2b9e89d7bc3f8529db2a1bb11237 dwarfs-universal-0.14.0-Linux-i386
6d5154083a535369c801910f026bbcfd274039558fb7acc01ed7dd620dba5bf5 dwarfs-universal-0.14.0-Linux-i386.upx
806e7fdf702d03603c4f794eadbc8ed754e28f73caf75f6c055dc6a014e21cb7 dwarfs-universal-0.14.0-Linux-loongarch64
40bf4a02b2a2de53f4b95d7c9b7a5e9f01ad16379eb21c683d9bf1e9cf465966 dwarfs-universal-0.14.0-Linux-ppc64
be38e97a00074199526908b01ceb5c2809da361ac0d2bc40a5d0eae87402bf47 dwarfs-universal-0.14.0-Linux-ppc64le
8763dbd3492e25d85b3e510bf2baa547f4860246d4c5c36f6ac8fd0446639cc6 dwarfs-universal-0.14.0-Linux-riscv64
f043ba472083be62438af690e7ebb48ee5ed5291895fe55e8fff7ba91b3cc36a dwarfs-universal-0.14.0-Linux-s390x
d5b9876a2b1b81ef7030f0e1e683f2cd879a9b4a785a2c2c59d8f28d34127486 dwarfs-universal-0.14.0-Linux-x86_64
accec681104a93c3752ef19629925e73d53b3b9608826572dd8c6f064184dba9 dwarfs-universal-0.14.0-Linux-x86_64.upx
6b0ba1fae9789f5db950276bfadd04619065015924d2c86004d50101c5e7ff6f dwarfs-universal-0.14.0-Windows-AMD64.exe