6 releases
| 0.1.6 | Jan 6, 2026 |
|---|---|
| 0.1.5 | Jan 3, 2026 |
| 0.1.1 | Dec 31, 2025 |
#202 in Compression
160KB
2.5K
SLoC
safe_unzip
Secure archive extraction. Supports ZIP (core), TAR, and 7z (optional features).
The Problem
Zip files can contain malicious paths that escape the extraction directory:
import zipfile
zipfile.ZipFile("evil.zip").extractall("/var/uploads")
# Extracts ../../etc/cron.d/pwned β /etc/cron.d/pwned π
This is Zip Slip, and Python's default behavior is still vulnerable.
"But didn't Python fix this?"
Sort of. Python added warnings and ZipInfo.filename sanitization in 2014. In Python 3.12+, there's a filter parameter:
# The "safe" way β but who knows this exists?
zipfile.ZipFile("evil.zip").extractall("/var/uploads", filter="data")
The problem: the safe option is opt-in. The default is still vulnerable. Most developers don't read the docs carefully enough to discover filter="data".
safe_unzip makes security the default, not an afterthought.
The Solution
use safe_unzip::extract_file;
extract_file("/var/uploads", "evil.zip")?;
// Err(PathEscape { entry: "../../etc/cron.d/pwned", ... })
# Python bindings β same safety
from safe_unzip import extract_file
extract_file("/var/uploads", "evil.zip")
# Raises: PathEscapeError
Security is the default. No special flags, no opt-in safety. Every path is validated. Malicious archives are rejected, not extracted.
Why Not Just Use zip / tar / zipfile?
Because archive extraction is a security boundary, and most libraries treat it as a convenience function.
| Library | Default Behavior | Safe Option |
|---|---|---|
Python zipfile |
Vulnerable | filter="data" (opt-in, obscure) |
Python tarfile |
Vulnerable | filter="data" (opt-in, Python 3.12+) |
Rust zip |
Vulnerable | Manual path validation |
Rust tar |
Vulnerable | Manual path validation |
safe_unzip |
Safe by default | N/A β always safe |
If you're extracting untrusted archives, you need a library designed for that threat model.
Who Should Use This
- Backend services handling user-uploaded zip files
- CI/CD systems unpacking third-party artifacts
- SaaS platforms with file import features
- Forensics / malware analysis pipelines
- Anything running as a privileged user
If your zip files only come from trusted sources you control, the standard zip crate is fine. If users can upload archives, use safe_unzip.
Features
- CLI Tool β
safe_unzipcommand with--list,--verify, limits, and filtering - Archive Verification β Check CRC32 integrity without extracting
- Multi-Format Support β ZIP (core), TAR, and 7z (feature flags)
- Partial Extraction β Extract specific files with
only()or glob patterns - Progress Callbacks β Monitor extraction progress (Rust API)
- Async API β Optional tokio-based async extraction (feature flag)
- Zip Slip Protection β Path traversal attacks blocked via path_jail
- Zip Bomb Protection β Configurable limits on size, file count, and path depth
- Strict Size Enforcement β Catches files that decompress larger than declared
- Filename Sanitization β Blocks control characters and Windows reserved names
- Symlink Handling β Skip or reject symlinks (no symlink-based escapes)
- Secure Overwrite β Removes symlinks before overwriting to prevent symlink attacks
- Atomic File Creation β TOCTOU-safe file creation using
O_EXCL - Overwrite Policies β Error, skip, or overwrite existing files
- Filter Callback β Extract only the files you want
- Two-Pass Mode β Validate everything before writing anything
- Permission Stripping β Removes setuid/setgid bits on Unix
Installation
CLI:
cargo install safe_unzip --features cli
Rust:
[dependencies]
safe_unzip = "0.1"
Python:
pip install safe-unzip
Feature Flags
| Feature | Default | Description |
|---|---|---|
tar |
β | TAR/TAR.GZ extraction |
async |
β | Tokio-based async API |
sevenz |
β | 7z extraction (heavier deps) |
# ZIP only (smallest, ~30 deps)
safe_unzip = "0.1"
# With TAR support (~40 deps)
safe_unzip = { version = "0.1", features = ["tar"] }
# With async API
safe_unzip = { version = "0.1", features = ["async"] }
# Kitchen sink (~85 deps)
safe_unzip = { version = "0.1", features = ["tar", "async", "sevenz"] }
Note: Python bindings always include TAR support.
Python Bindings
The Python bindings are thin wrappers over the Rust implementation via PyO3. This means:
- β Identical security guarantees β same code path, same validation
- β Identical limits β same defaults (1GB total, 10K files, 100MB per file)
- β Identical semantics β same error conditions, same behavior
- β No re-implementation β Python calls Rust directly, no logic duplication
Security reviewers: the Python API is a direct binding, not a port.
CLI Usage
# Extract archive to destination
safe_unzip archive.zip -d /var/uploads
# List contents without extracting
safe_unzip archive.zip --list
# Verify integrity (CRC32 check)
safe_unzip archive.zip --verify
# With limits
safe_unzip archive.zip -d /var/uploads --max-size 100M --max-files 1000
# Glob filtering
safe_unzip archive.zip -d /var/uploads --include "**/*.py" --exclude "**/test_*"
# Partial extraction
safe_unzip archive.zip -d /var/uploads --only README.md --only LICENSE
# Verbose output
safe_unzip archive.zip -d /var/uploads -v
Quick Start
use safe_unzip::extract_file;
// Extract with safe defaults
let report = extract_file("/var/uploads", "archive.zip")?;
println!("Extracted {} files ({} bytes)",
report.files_extracted,
report.bytes_written
);
Usage Examples
Basic Extraction
use safe_unzip::Extractor;
let report = Extractor::new("/var/uploads")?
.extract_file("archive.zip")?;
Create Destination if Missing
use safe_unzip::Extractor;
// Extractor::new() errors if destination doesn't exist (catches typos)
// Extractor::new_or_create() creates it automatically
let report = Extractor::new_or_create("/var/uploads/new_folder")?
.extract_file("archive.zip")?;
// The convenience functions (extract_file, extract) also create automatically
use safe_unzip::extract_file;
extract_file("/var/uploads/new_folder", "archive.zip")?;
Custom Limits (Prevent Zip Bombs)
use safe_unzip::{Extractor, Limits};
let report = Extractor::new("/var/uploads")?
.limits(Limits {
max_total_bytes: 500 * 1024 * 1024, // 500 MB total
max_file_count: 1_000, // Max 1000 files
max_single_file: 50 * 1024 * 1024, // 50 MB per file
max_path_depth: 10, // No deeper than 10 levels
})
.extract_file("archive.zip")?;
Filter by Extension
use safe_unzip::Extractor;
// Only extract images
let report = Extractor::new("/var/uploads")?
.filter(|entry| {
entry.name.ends_with(".png") ||
entry.name.ends_with(".jpg") ||
entry.name.ends_with(".gif")
})
.extract_file("archive.zip")?;
println!("Extracted {} images, skipped {} other files",
report.files_extracted,
report.entries_skipped
);
Partial Extraction (New in v0.1.5)
Extract specific files by name or glob pattern:
use safe_unzip::Extractor;
// Extract only specific files
let report = Extractor::new("/var/uploads")?
.only(&["README.md", "LICENSE"])
.extract_file("archive.zip")?;
// Include by glob pattern
let report = Extractor::new("/var/uploads")?
.include_glob(&["**/*.py", "**/*.rs"])
.extract_file("archive.zip")?;
// Exclude by glob pattern
let report = Extractor::new("/var/uploads")?
.exclude_glob(&["**/__pycache__/**", "**/*.pyc"])
.extract_file("archive.zip")?;
Python:
from safe_unzip import Extractor
# Extract only specific files
report = Extractor("/var/uploads").only(["README.md", "LICENSE"]).extract_file("archive.zip")
# Include by pattern
report = Extractor("/var/uploads").include_glob(["**/*.py"]).extract_file("archive.zip")
# Exclude by pattern
report = Extractor("/var/uploads").exclude_glob(["**/__pycache__/**"]).extract_file("archive.zip")
Progress Callbacks
Monitor extraction progress:
use safe_unzip::Extractor;
let report = Extractor::new("/var/uploads")?
.on_progress(|p| {
println!("[{}/{}] {} ({} bytes)",
p.entry_index + 1,
p.total_entries,
p.entry_name,
p.entry_size
);
})
.extract_file("archive.zip")?;
Python:
from safe_unzip import Extractor
def show_progress(p):
print(f"[{p['entry_index']+1}/{p['total_entries']}] {p['entry_name']}")
Extractor("/var/uploads").on_progress(show_progress).extract_file("archive.zip")
# Or with tqdm for a progress bar
from tqdm import tqdm
entries = list_zip_entries("archive.zip")
pbar = tqdm(total=len(entries))
def update_bar(p):
pbar.update(1)
pbar.set_description(p['entry_name'])
Extractor("/var/uploads").on_progress(update_bar).extract_file("archive.zip")
pbar.close()
Archive Verification
Check archive integrity without extracting:
use safe_unzip::verify_file;
// Verify CRC32 for all entries
let report = verify_file("archive.zip")?;
println!("Verified {} entries ({} bytes)",
report.entries_verified,
report.bytes_verified
);
Or with the Extractor (useful if you want to verify then extract):
use safe_unzip::Extractor;
let extractor = Extractor::new("/var/uploads")?;
// First verify
extractor.verify_file("archive.zip")?;
// Then extract (re-reads archive, but guarantees integrity)
extractor.extract_file("archive.zip")?;
Overwrite Policies
use safe_unzip::{Extractor, OverwritePolicy};
// Skip files that already exist
let report = Extractor::new("/var/uploads")?
.overwrite(OverwritePolicy::Skip)
.extract_file("archive.zip")?;
// Or overwrite them
let report = Extractor::new("/var/uploads")?
.overwrite(OverwritePolicy::Overwrite)
.extract_file("archive.zip")?;
// Default: Error if file exists
let report = Extractor::new("/var/uploads")?
.overwrite(OverwritePolicy::Error) // This is the default
.extract_file("archive.zip")?;
Symlink Policies
use safe_unzip::{Extractor, SymlinkPolicy};
// Default: silently skip symlinks
let report = Extractor::new("/var/uploads")?
.symlinks(SymlinkPolicy::Skip)
.extract_file("archive.zip")?;
// Or reject archives containing symlinks
let report = Extractor::new("/var/uploads")?
.symlinks(SymlinkPolicy::Error)
.extract_file("archive.zip")?;
Extraction Modes
| Mode | Speed | On Failure | Use When |
|---|---|---|---|
Streaming (default) |
Fast (1 pass) | Partial files remain | Speed matters; you'll clean up on error |
ValidateFirst |
Slower (2 passes) | No files if validation fails | Can't tolerate partial state |
β οΈ Neither mode is truly atomic. If extraction fails mid-write (e.g., disk full), partial files remain regardless of mode. ValidateFirst only prevents writes when validation fails (bad paths, limits exceeded), not when I/O fails during extraction.
use safe_unzip::{Extractor, ExtractionMode};
// Two-pass extraction:
// 1. Validate ALL entries (no disk writes)
// 2. Extract (only if validation passed)
let report = Extractor::new("/var/uploads")?
.mode(ExtractionMode::ValidateFirst)
.extract_file("untrusted.zip")?;
Use ValidateFirst when you can't tolerate partial state from malicious archives. Use Streaming (default) when speed matters and you can clean up on error.
Extracting from Memory
use safe_unzip::Extractor;
use std::io::Cursor;
let zip_bytes: Vec<u8> = download_zip_somehow();
let cursor = Cursor::new(zip_bytes);
let report = Extractor::new("/var/uploads")?
.extract(cursor)?;
TAR Extraction (New in v0.1.2)
use safe_unzip::{Driver, TarAdapter};
// Extract a .tar file
let report = Driver::new("/var/uploads")?
.extract_tar_file("archive.tar")?;
// Extract a .tar.gz file
let report = Driver::new("/var/uploads")?
.extract_tar_gz_file("archive.tar.gz")?;
// With options
let report = Driver::new("/var/uploads")?
.filter(|entry| entry.name.ends_with(".txt"))
.validation(safe_unzip::ValidationMode::ValidateFirst)
.extract_tar_file("archive.tar")?;
The new Driver API provides a unified interface for all archive formats with the same security guarantees.
7z Extraction (Requires sevenz Feature)
Enable the sevenz feature:
[dependencies]
safe_unzip = { version = "0.1", features = ["sevenz"] }
use safe_unzip::{Driver, SevenZAdapter};
// Extract a .7z file
let report = Driver::new("/var/uploads")?
.extract_7z_file("archive.7z")?;
// Or from bytes
let report = Driver::new("/var/uploads")?
.extract_7z_bytes(&seven_z_bytes)?;
Note: 7z archives are fully decompressed into memory before extraction, so large archives may use significant RAM.
Python:
from safe_unzip import extract_7z_file, Extractor
# Simple extraction
report = extract_7z_file("/var/uploads", "archive.7z")
# With options
report = Extractor("/var/uploads").extract_7z_file("archive.7z")
Async Extraction (New)
Enable the async feature for tokio-based async extraction:
[dependencies]
safe_unzip = { version = "0.1", features = ["async"] }
use safe_unzip::r#async::{extract_file, extract_tar_file, AsyncExtractor};
#[tokio::main]
async fn main() -> Result<(), safe_unzip::Error> {
// Simple async extraction
let report = extract_file("/var/uploads", "archive.zip").await?;
// TAR extraction
let report = extract_tar_file("/var/uploads", "archive.tar").await?;
// With options
let report = AsyncExtractor::new("/var/uploads")?
.max_total_bytes(500 * 1024 * 1024)
.max_file_count(1000)
.extract_file("archive.zip")
.await?;
Ok(())
}
Concurrent extraction of multiple archives:
use safe_unzip::r#async::{extract_file, extract_tar_bytes};
let (zip_result, tar_result) = tokio::join!(
extract_file("/uploads/a", "first.zip"),
extract_tar_bytes("/uploads/b", tar_data),
);
The async API uses spawn_blocking internally, so extraction runs in a thread pool without blocking the async runtime.
Python Async API
Python async support uses asyncio.to_thread() to run extraction in a thread pool:
import asyncio
from safe_unzip import async_extract_file, AsyncExtractor
async def main():
# Simple async extraction
report = await async_extract_file("/var/uploads", "archive.zip")
# TAR extraction
from safe_unzip import async_extract_tar_file
report = await async_extract_tar_file("/var/uploads", "archive.tar")
# With options
report = await (
AsyncExtractor("/var/uploads")
.max_total_mb(500)
.max_files(1000)
.extract_file("archive.zip")
)
asyncio.run(main())
Concurrent extraction:
async def extract_all(archives):
tasks = [
async_extract_file(f"/uploads/{i}", path)
for i, path in enumerate(archives)
]
return await asyncio.gather(*tasks)
Security Model
| Threat | Attack Vector | Defense |
|---|---|---|
| Zip Slip | Entry named ../../etc/cron.d/pwned |
path_jail validates every path |
| Zip Bomb (size) | 42KB β 4PB expansion | max_total_bytes limit + streaming enforcement |
| Zip Bomb (count) | 1 million empty files | max_file_count limit |
| Zip Bomb (lying) | Declared 1KB, decompresses to 1GB | Strict size reader detects mismatch |
| Symlink Escape | Symlink to /etc/passwd |
Skip or reject symlinks |
| Symlink Overwrite | Create symlink, then overwrite target | Symlinks removed before overwrite |
| Path Depth | a/b/c/.../1000levels |
max_path_depth limit |
| Invalid Filename | Control chars, CON, NUL |
Filename sanitization |
| Overwrite | Replace sensitive files | OverwritePolicy::Error default |
| Setuid | Create setuid executables | Permission bits stripped |
| Encrypted Archives | Password handling complexity | Rejected (see Encrypted Archives) |
Default Limits
| Limit | Default | Description |
|---|---|---|
max_total_bytes |
1 GB | Total uncompressed size |
max_file_count |
10,000 | Number of files |
max_single_file |
100 MB | Largest single file |
max_path_depth |
50 | Directory nesting depth |
Error Handling
use safe_unzip::{extract_file, Error};
match extract_file("/var/uploads", "archive.zip") {
Ok(report) => {
println!("Success: {} files", report.files_extracted);
}
Err(Error::PathEscape { entry, detail }) => {
eprintln!("Blocked path traversal in '{}': {}", entry, detail);
}
Err(Error::TotalSizeExceeded { limit, would_be }) => {
eprintln!("Archive too large: {} bytes (limit: {})", would_be, limit);
}
Err(Error::FileTooLarge { entry, size, limit }) => {
eprintln!("File '{}' too large: {} bytes (limit: {})", entry, size, limit);
}
Err(Error::FileCountExceeded { limit }) => {
eprintln!("Too many files (limit: {})", limit);
}
Err(Error::AlreadyExists { path }) => {
eprintln!("File already exists: {}", path);
}
Err(Error::InvalidFilename { entry, .. }) => {
eprintln!("Invalid filename: {}", entry);
}
Err(Error::EncryptedEntry { entry }) => {
eprintln!("Encrypted entry not supported: {}", entry);
}
Err(e) => {
eprintln!("Extraction failed: {}", e);
}
}
Limitations
Format Limitations
- ZIP, TAR, 7z only β RAR not supported
- Requires seekable input for ZIP β ZIP format requires reading the central directory at the end
- TAR is sequential β TAR files are read in order;
ValidateFirstmode caches entries in memory - No encrypted archives β See below
Encrypted Archives
safe_unzip does not support password-protected zip files. Encrypted entries are rejected with Error::EncryptedEntry.
If you need to extract encrypted archives:
- Decrypt first using the
zipcrate directly - Then extract with
safe_unzip
This is intentionalβencryption handling is outside our security scope. Password management, key derivation, and cryptographic validation are complex domains that deserve dedicated tooling.
Extraction Behavior
- Partial state in Streaming mode β If extraction fails mid-way, already-extracted files remain on disk. Use
ExtractionMode::ValidateFirstto validate before writing. - Filters not applied during validation β In
ValidateFirstmode, limits are checked against ALL entries. Filtered entries still count toward limits. This is conservative: validation may reject archives that would succeed with filtering.
Security Scope
These threats are not fully addressed (by design or complexity):
| Limitation | Reason |
|---|---|
| Case-insensitive collisions | On Windows/macOS, File.txt and file.txt map to the same file. We don't track extracted names to detect this. |
| Unicode normalization | cafΓ© (NFC) vs cafΓ© (NFD) appear identical but are different bytes. Full normalization requires ICU. |
| Concurrent extraction | If multiple threads/processes extract to the same destination, race conditions can occur. Use file locking or separate destinations. |
| Sparse file attacks | Not applicable to zip format. |
| Hard links | Zip format doesn't support hard links. |
| Device files | Zip format doesn't support special device files. |
TOCTOU Mitigations
For OverwriteMode::Error and OverwriteMode::Skip, we use atomic file creation (O_CREAT | O_EXCL) instead of check-then-create. This eliminates race conditions between checking if a file exists and creating it.
For OverwriteMode::Overwrite, symlinks are removed before writing to prevent symlink-following attacks, but there's a brief window between removal and creation.
Filename Restrictions
These filenames are rejected for security:
- Control characters (including null bytes)
- Backslashes (
\) β prevents Windows path separator confusion - Paths longer than 1024 bytes
- Path components longer than 255 bytes
- Windows reserved names:
CON,PRN,AUX,NUL,COM1-9,LPT1-9
Development
Fuzzing
We use cargo-fuzz with two targets:
# Install cargo-fuzz (requires nightly)
cargo install cargo-fuzz
# Run the main extraction fuzzer
cargo +nightly fuzz run fuzz_extract
# Run the adapter fuzzer (tests parsing layer)
cargo +nightly fuzz run fuzz_zip_adapter
Fuzzing targets are in fuzz/fuzz_targets/. Run fuzzing before releases to catch parsing edge cases.
License
MIT OR Apache-2.0
Dependencies
~3β18MB
~217K SLoC