Crate soft_canonicalize

§soft-canonicalize

Path canonicalization that works with non-existing paths.

A Rust implementation inspired by Python 3.6+ pathlib.Path.resolve(strict=False). It provides the same functionality as std::fs::canonicalize (Rust’s equivalent of Unix realpath()), extended to handle non-existing paths. Optional features add simplified Windows output (dunce) and virtual filesystem semantics (anchored).

§Why Use This?

  • 🚀 Works with non-existing paths - Plan file locations before creating them
  • ⚡ Fast - Optimized performance with minimal allocations and syscalls
  • ✅ Compatible - 100% behavioral match with std::fs::canonicalize for existing paths, with optional UNC simplification via dunce feature (Windows)
  • 🎯 Virtual filesystem support - Optional anchored feature for bounded canonicalization within directory boundaries
  • 🔒 Robust - 495 comprehensive tests covering edge cases and security scenarios
  • 🛡️ Safe traversal - Proper .. and symlink resolution with cycle detection
  • 🌍 Cross-platform - Windows, macOS, Linux with comprehensive UNC/symlink handling
  • 🔧 Zero dependencies - Optional features may add minimal dependencies

§Lexical vs. Filesystem-Based Resolution

Path resolution libraries fall into two categories:

Lexical Resolution (no I/O):

  • Performance: Fast - no filesystem access
  • Accuracy: Incorrect if symlinks are present (doesn’t resolve them)
  • Use when: You’re 100% certain no symlinks exist and need maximum performance
  • Examples: std::path::absolute, normpath::normalize

Filesystem-Based Resolution (performs I/O):

  • Performance: Slower - requires filesystem syscalls to resolve symlinks
  • Accuracy: Correct - follows symlinks to their targets
  • Use when: Safety is priority over performance, or symlinks may be present
  • Examples: std::fs::canonicalize, soft_canonicalize, dunce::canonicalize

Rule of thumb: If you cannot guarantee symlinks won’t be introduced, or if correctness is critical, use filesystem-based resolution.
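
To make the distinction concrete, here is a minimal std-only sketch of lexical resolution. `lexical_normalize` is a hypothetical helper written for this illustration, not part of soft_canonicalize. It rewrites `.` and `..` purely from the path components and never touches the filesystem. If `/a/b` were a symlink to `/x/y`, the operating system would resolve `/a/b/../c` to `/x/c`, while this function returns `/a/c`.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: collapse `.` and `..` without any filesystem access.
/// Symlinks are NOT resolved, which is exactly the accuracy gap described above.
fn lexical_normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            // `.` never changes the location.
            Component::CurDir => {}
            Component::ParentDir => {
                let ends_with_name =
                    matches!(out.components().next_back(), Some(Component::Normal(_)));
                if ends_with_name {
                    // Drop the previous name, assuming it is a plain directory.
                    out.pop();
                } else if !out.has_root() {
                    // Relative path with nothing left to pop: keep the `..`.
                    out.push("..");
                }
                // At the root, `..` is ignored.
            }
            // Prefix, root, and normal names are copied through unchanged.
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

A filesystem-based resolver has to ask the operating system about each existing component instead, which is where the extra syscalls come from.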

§Use Cases

§Path Comparison

  • Equality: Determine if two different path strings point to the same location
  • Containment: Check if one path is inside another directory
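
Both comparisons can be sketched with std alone once the paths are canonical. The helpers below are hypothetical names for this illustration and assume their inputs have already been canonicalized (for example by soft_canonicalize); they are not part of the crate's API.

```rust
use std::path::Path;

/// Equality: two canonical paths name the same location when they compare equal.
fn same_location(a: &Path, b: &Path) -> bool {
    a == b
}

/// Containment: `Path::starts_with` compares whole components, so
/// `/srv/data-private` is not considered to be inside `/srv/data`.
fn is_within(base: &Path, candidate: &Path) -> bool {
    candidate.starts_with(base)
}
```

Comparing components rather than raw strings matters here: a plain string prefix check would wrongly accept `/srv/data-private` as being inside `/srv/data`.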

§Common Applications

  • Build Systems: Resolve output paths during build planning before directories exist
  • Configuration Validation: Ensure user-provided paths stay within allowed boundaries
  • Deduplication: Detect when different path strings refer to the same planned location
  • Cross-Platform Normalization: Handle Windows UNC paths and symlinks consistently

§Quick Start

[dependencies]
soft-canonicalize = "0.5"

§Basic Example

use soft_canonicalize::soft_canonicalize;

let non_existing_path = r"C:\Users\user\documents\..\non\existing\config.json";

// Using Rust's own std canonicalize function:
let result = std::fs::canonicalize(non_existing_path);
assert!(result.is_err());

// Using our crate's function:
let result = soft_canonicalize(non_existing_path);
assert!(result.is_ok());

// Shows the extended-length prefix and path normalization (default features)
let canonical = result.unwrap();
assert_eq!(
    canonical.to_string_lossy(),
    r"\\?\C:\Users\user\non\existing\config.json"
);

// With the `dunce` feature enabled, the path is simplified when safe,
// so the expected value is instead:
//     r"C:\Users\user\non\existing\config.json"

§Optional Features

§Anchored Canonicalization (anchored feature)

For correct symlink resolution within virtual/constrained directory spaces, use anchored_canonicalize. This function implements true virtual filesystem semantics by clamping ALL paths (including absolute symlink targets) to the anchor directory:

[dependencies]
soft-canonicalize = { version = "0.5", features = ["anchored"] }

use soft_canonicalize::anchored_canonicalize;
use std::fs;

// Set up an anchor/root directory (no need to pre-canonicalize)
let anchor = std::env::temp_dir().join("workspace_root");
fs::create_dir_all(&anchor)?;

// Canonicalize paths relative to the anchor (anchor is soft-canonicalized internally)
let resolved_path = anchored_canonicalize(&anchor, "../../../etc/passwd")?;
// Result: /tmp/workspace_root/etc/passwd (lexical .. clamped to anchor)

// Absolute symlinks are also clamped to the anchor
// If there's a symlink: workspace_root/config -> /etc/config
// It resolves to: workspace_root/etc/config (clamped to anchor)
let symlink_path = anchored_canonicalize(&anchor, "config")?;
// Safe: always stays within workspace_root, even if symlink points to /etc/config

Key features:

  • Virtual filesystem semantics: All absolute paths (including symlink targets) are clamped to anchor
  • Anchor-relative canonicalization: Resolves paths relative to a specific anchor directory
  • Complete symlink clamping: Follows symlink chains with clamping at each step
  • Component-by-component: Processes path components in proper order
  • Absolute results: Always returns absolute canonical paths within the anchor boundary
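
The clamping idea can be illustrated without the crate. The sketch below is a simplified, purely lexical stand-in: `clamp_lexically` is a hypothetical name, it does not follow symlinks, and it is not how anchored_canonicalize is implemented. It only shows why `..` can never climb above the anchor and why an absolute input is reinterpreted relative to it.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: treat `user_path` as living inside `anchor`.
/// No filesystem access and no symlink handling (unlike the real feature).
fn clamp_lexically(anchor: &Path, user_path: &Path) -> PathBuf {
    let mut relative: Vec<&OsStr> = Vec::new();
    for component in user_path.components() {
        match component {
            // A root or drive prefix is dropped: absolute input becomes anchor-relative.
            Component::Prefix(_) | Component::RootDir | Component::CurDir => {}
            // Popping an empty stack is a no-op, which is the clamp itself.
            Component::ParentDir => {
                relative.pop();
            }
            Component::Normal(name) => relative.push(name),
        }
    }
    let mut out = anchor.to_path_buf();
    out.extend(relative);
    out
}
```

With an anchor of `/tmp/workspace_root`, the input `../../../etc/passwd` yields `/tmp/workspace_root/etc/passwd`, matching the result shown in the example above.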

For a complete multi-tenant security example, run:

cargo run --example virtual_filesystem_demo --features anchored

§Simplified Path Output (dunce feature, Windows-only)

By default, soft_canonicalize returns Windows paths in extended-length UNC format (\\?\C:\foo) for maximum robustness and compatibility with long paths, reserved names, and other Windows filesystem edge cases.

If you need simplified paths (C:\foo) for compatibility with legacy applications or user-facing output, enable the dunce feature:

[dependencies]
soft-canonicalize = { version = "0.5", features = ["dunce"] }

Example:

use soft_canonicalize::soft_canonicalize;
let path = soft_canonicalize(r"C:\Users\user\documents\..\config.json")?;

// Without dunce feature (default):
// Returns: \\?\C:\Users\user\config.json (extended-length UNC)

// With dunce feature enabled:
// Returns: C:\Users\user\config.json (simplified when safe)

When to use:

  • ✅ Legacy applications that don’t support UNC paths
  • ✅ User-facing output requiring familiar path format
  • ✅ Tools expecting traditional Windows path format

How it works:

The dunce crate intelligently simplifies Windows UNC paths (\\?\C:\foo → C:\foo) only when safe:

  • Automatically keeps UNC for paths >260 chars
  • Automatically keeps UNC for reserved names (CON, PRN, NUL, COM1-9, LPT1-9)
  • Automatically keeps UNC for paths with trailing spaces/dots
  • Automatically keeps UNC for paths containing .. (literal interpretation)
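
The rules in this list can be approximated with a small string-based sketch. This is not the dunce crate's code: `simplify_verbatim` and `is_reserved_name` are hypothetical helpers that only encode the four conditions listed above, work on `&str` so they run on any platform, and consider only drive-letter paths.

```rust
/// Reserved device names, limited to the ones the list above mentions.
fn is_reserved_name(component: &str) -> bool {
    let upper = component.to_ascii_uppercase();
    match upper.as_str() {
        "CON" | "PRN" | "NUL" => true,
        s => {
            s.len() == 4
                && (s.starts_with("COM") || s.starts_with("LPT"))
                && matches!(s.as_bytes()[3], b'1'..=b'9')
        }
    }
}

/// Hypothetical helper: strip the `\\?\` prefix only when every rule allows it;
/// otherwise return the input unchanged.
fn simplify_verbatim(path: &str) -> &str {
    const PREFIX: &str = r"\\?\";
    let Some(rest) = path.strip_prefix(PREFIX) else {
        return path;
    };
    // Only `X:\...` forms are handled in this sketch (so `\\?\UNC\...` is kept).
    let b = rest.as_bytes();
    let has_drive =
        b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'\\';
    if !has_drive || rest.chars().count() > 260 {
        return path;
    }
    // Any component that would be reinterpreted without the prefix blocks the change.
    let risky = rest[3..].split('\\').any(|c| {
        c == ".." || c.ends_with(' ') || c.ends_with('.') || is_reserved_name(c)
    });
    if risky { path } else { rest }
}
```

Returning the original string whenever a rule fails keeps the function conservative: the extended-length form is always valid, so leaving it in place is the safe fallback.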

§When Paths Must Exist: proc-canonicalize

Since v0.5.0, soft_canonicalize uses proc-canonicalize by default for existing-path canonicalization instead of std::fs::canonicalize. This fixes a critical issue with Linux namespace boundaries.

The Problem: On Linux, std::fs::canonicalize resolves “magic symlinks” like /proc/PID/root to their targets, losing the namespace boundary:

// /proc/self/root is a "magic symlink" pointing to the current process's root filesystem
// std::fs::canonicalize incorrectly resolves it to "/"
let std_result = std::fs::canonicalize("/proc/self/root")?;
assert_eq!(std_result.to_string_lossy(), "/"); // Wrong! Namespace boundary lost

// proc_canonicalize preserves the namespace boundary
let proc_result = proc_canonicalize::canonicalize("/proc/self/root")?;
assert_eq!(proc_result.to_string_lossy(), "/proc/self/root"); // Correct!

Recommendation: If you need to canonicalize paths that must exist (and would previously use std::fs::canonicalize), use proc_canonicalize::canonicalize for correct Linux namespace handling:

[dependencies]
proc-canonicalize = "0.0"

§Security & CVE Coverage

Security does not depend on enabling features. The core API is secure-by-default; the optional anchored feature is a convenience for virtual roots. We test all modes (no features; --features anchored; --features anchored,dunce).

Built-in protections include:

  • NTFS Alternate Data Stream (ADS) validation - Blocks malicious stream placements and traversal attempts
  • Symlink cycle detection - Bounded depth tracking prevents infinite loops
  • Path traversal clamping - Never ascends past root/share/device boundaries
  • Null byte rejection - Early validation prevents injection attacks
  • UNC/device semantics - Preserves Windows extended-length and device namespace integrity
  • TOCTOU race resistance - Tested against time-of-check-time-of-use attacks

See docs/SECURITY.md for detailed analysis, attack scenarios, and test references.
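
As one small illustration of the early-validation idea, null byte rejection can be sketched with std alone. `reject_nul_bytes` is a hypothetical helper, and the `InvalidInput` error kind is an assumption made for this sketch; the checks the crate actually performs are platform-specific and described in docs/SECURITY.md.

```rust
use std::io;
use std::path::Path;

/// Hypothetical helper: fail fast if the path contains an embedded NUL.
/// The error kind is an assumption for this sketch.
fn reject_nul_bytes(path: &Path) -> io::Result<()> {
    // A NUL byte is valid UTF-8, so the lossy conversion preserves it.
    if path.to_string_lossy().contains('\0') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "path contains a null byte",
        ));
    }
    Ok(())
}
```

Running such a check before any filesystem call means a name like `report.txt\0.exe` is rejected instead of being silently truncated by a C-string based API.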

§Cross-Platform Notes

  • Windows: returns extended-length verbatim paths for absolute results (\\?\C:\…, \\?\UNC\…)
    • With dunce feature: returns simplified paths (C:\…) when safe
  • Unix-like systems: standard absolute and relative path semantics
  • UNC floors and device namespaces are preserved and respected

§Testing

495 tests including:

  • std::fs::canonicalize compatibility tests (existing paths)
  • Path traversal and robustness tests
  • Python pathlib-inspired behavior checks
  • Platform-specific cases (Windows/macOS/Linux)
  • Symlink semantics and cycle detection
  • Windows-specific UNC, 8.3, and ADS validation
  • Anchored canonicalization tests (with anchored feature)

§Known Limitation (Windows 8.3)

On Windows, for non-existing paths we cannot determine equivalence between a short (8.3) name and its long form. Existing paths are canonicalized to the same result.

use soft_canonicalize::soft_canonicalize;
let short_form = soft_canonicalize("C:/PROGRA~1/MyApp/config.json")?;
let long_form  = soft_canonicalize("C:/Program Files/MyApp/config.json")?;
assert_ne!(short_form, long_form); // for non-existing suffixes

§How It Works

For those interested in the implementation details, here’s how soft_canonicalize processes paths:

  1. Input validation (empty path, platform pre-checks)
  2. Convert to absolute path (preserving drive/root semantics)
  3. Fast-path: try fs::canonicalize on the original absolute path
  4. Lexically normalize . and .. (fast-path optimization for whole-path existence check)
  5. Fast-path: try fs::canonicalize on the normalized path when different
  6. Validate null bytes (platform-specific)
  7. Discover deepest existing prefix with symlink-first semantics: resolve symlinks incrementally, then process . and .. relative to resolved targets
  8. Optionally canonicalize the anchor (if symlinks seen) and rebuild
  9. Append non-existing suffix lexically, then normalize if needed
  10. Windows: ensure extended-length prefix for absolute paths
  11. Optional: simplify Windows paths when dunce feature enabled
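
The core of steps 2, 7, and 9 can be sketched with std alone. This is a deliberately simplified illustration, not the crate's implementation: `soft_canonicalize_sketch` is a hypothetical name, it skips the fast paths, input validation, and Windows prefix handling, and it issues one `std::fs::canonicalize` call per existing component. Symlinks are resolved before `..` is applied, so `..` is always taken relative to the resolved target.

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: resolve the existing prefix through the filesystem,
/// then append the non-existing remainder lexically.
fn soft_canonicalize_sketch(path: &Path) -> io::Result<PathBuf> {
    // Step 2: make the path absolute.
    let absolute = if path.is_absolute() {
        path.to_path_buf()
    } else {
        std::env::current_dir()?.join(path)
    };

    let mut resolved = PathBuf::new();
    // True while everything in `resolved` is known to exist and be canonical.
    let mut probing = true;
    for component in absolute.components() {
        match component {
            Component::Prefix(_) | Component::RootDir => resolved.push(component.as_os_str()),
            Component::CurDir => {}
            Component::ParentDir => {
                // `resolved` holds a canonical prefix plus plain names, so popping
                // is safe; at the root `pop` does nothing.
                resolved.pop();
                probing = true;
            }
            Component::Normal(name) => {
                resolved.push(name);
                if probing {
                    match std::fs::canonicalize(&resolved) {
                        // Step 7: the component exists, follow any symlink now.
                        Ok(real) => resolved = real,
                        // Step 9: from here on, names are appended lexically.
                        Err(err) if err.kind() == io::ErrorKind::NotFound => probing = false,
                        Err(err) => return Err(err),
                    }
                }
            }
        }
    }
    Ok(resolved)
}
```

The fast paths in steps 3 and 5 exist to avoid this per-component walk in the common case where the whole path already exists.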

Structs§

SoftCanonicalizeError
Error payload used by this crate to attach the offending path to I/O errors.

Constants§

MAX_SYMLINK_DEPTH
Maximum number of symlinks to follow before giving up. This matches the behavior of std::fs::canonicalize and OS limits:

Traits§

IoErrorPathExt
Extension to extract our path-aware payload from io::Error.

Functions§

soft_canonicalize
Performs “soft” canonicalization on a path.