`rawccopy-rs`

An Implementation for Direct NTFS File Content Extraction via Raw Disk Parsing

Summary

rawccopy-rs is a low-level library designed for the direct extraction of file content from New Technology File System (NTFS) volumes. It operates by parsing on-disk data structures, bypassing high-level operating system file I/O APIs. This methodology provides unfettered access to file data, irrespective of file system locks, security descriptors, or API-level data concealment mechanisms. The library's approach is rooted in a direct interpretation of NTFS metadata, including the Master File Table ($MFT), attribute runlists, and index B-trees, enabling the reconstruction of any file's data stream from a raw disk image or a live volume.

1. The Problem Domain: Limitations of API-Based File Access

Standard interaction with file systems is mediated through operating system APIs (e.g., CreateFileW, ReadFile in the Windows API). While suitable for general-purpose computing, this abstracted model presents significant limitations in specialized fields such as digital forensics and incident response (DFIR), where obtaining a "ground truth" representation of on-disk data is required.

API-level access is subject to several constraints:

Exclusive File Locking: The operating system and its applications often place exclusive locks on critical system files (e.g., registry hives, pagefiles, active database files), preventing them from being read by other processes.
Security and Permissions: Access to files is governed by security descriptors, which may prevent even a privileged user from reading specific data.
API-Level Obfuscation: Malicious software (rootkits) can intercept or "hook" file system APIs to conceal the presence of files, directories, or alternate data streams from user-mode applications.
Filesystem Abstractions: The API presents a simplified view of a file, hiding the underlying complexity of its physical storage, such as fragmentation, compression, or residency within the master file table.

Overcoming these limitations requires a methodology that circumvents the OS file system driver and interacts directly with the volume at the block level.

2. Methodology: Direct NTFS Structure Parsing

The rawccopy-rs library implements a direct-access model by parsing NTFS on-disk structures. This process reconstructs the location and content of a file by interpreting the file system's metadata as a database.

2.1. Volume and Boot Sector Initialization

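The process begins by acquiring a handle to a raw block device (e.g., \\.\PhysicalDrive0, \\.\C:) or a forensic disk image. The first sector of the target volume, the Volume Boot Record (VBR), is read; on a physical disk or a full-disk image, this sector sits at the start of the NTFS partition rather than at offset zero of the device. The 8-byte OEM ID field at offset 3 is validated to confirm the presence of an NTFS file system ("NTFS" followed by four spaces).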

From the VBR's BIOS Parameter Block (BPB), critical geometry parameters are extracted, including:

Bytes per sector.
Sectors per cluster.
The logical cluster number (LCN) of the Master File Table ($MFT).
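The fields listed above can be read with a few fixed-offset lookups. The following is a minimal sketch, not the library's actual code: the names `NtfsGeometry` and `parse_boot_sector` are hypothetical, and the offsets (0x03, 0x0B, 0x0D, 0x30) are those of the standard NTFS boot sector layout. It treats the sectors-per-cluster byte as a plain count.

```rust
/// Geometry fields read from an NTFS boot sector.
/// (Hypothetical struct name, used only for this illustration.)
#[derive(Debug, PartialEq)]
pub struct NtfsGeometry {
    pub bytes_per_sector: u16,
    pub sectors_per_cluster: u8,
    pub mft_lcn: u64,
}

/// Read a little-endian u64 from the first 8 bytes of `b`.
fn le_u64(b: &[u8]) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&b[..8]);
    u64::from_le_bytes(raw)
}

/// Parse the first sector of a volume.
/// Returns `None` if the buffer is shorter than 512 bytes or the
/// OEM ID at offset 3 is not "NTFS" followed by four spaces.
pub fn parse_boot_sector(sector: &[u8]) -> Option<NtfsGeometry> {
    if sector.len() < 512 || &sector[3..11] != b"NTFS    " {
        return None;
    }
    Some(NtfsGeometry {
        // BPB: bytes per sector at 0x0B, sectors per cluster at 0x0D.
        bytes_per_sector: u16::from_le_bytes([sector[0x0B], sector[0x0C]]),
        sectors_per_cluster: sector[0x0D],
        // Extended BPB: starting cluster of the $MFT at 0x30.
        mft_lcn: le_u64(&sector[0x30..0x38]),
    })
}
```

A caller would read the first 512 bytes of the volume into a buffer and pass it to `parse_boot_sector`; a `None` result means the volume should not be treated as NTFS.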

2.2. Master File Table ($MFT) Processing

The $MFT is the central metadata file in NTFS, containing at least one entry—an MFT record—for every file and directory on the volume. The library first locates the $MFT using the LCN from the boot sector and reads its own MFT record (always at index 0).
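The arithmetic for locating a record can be sketched as follows. This is an illustration under stated assumptions: `mft_record_offset` is a hypothetical helper, the record size is passed in by the caller (1024 bytes is typical), and the formula only holds for records that lie in the first contiguous extent of the $MFT. Record 0 always does, which is why it can be read before the $MFT's own runlist is known.

```rust
/// Byte offset, from the start of the volume, of MFT record `index`.
/// (Hypothetical helper; valid only while the $MFT is contiguous from
/// its first cluster up to the requested record.)
pub fn mft_record_offset(
    mft_lcn: u64,
    bytes_per_sector: u16,
    sectors_per_cluster: u8,
    record_size: u32,
    index: u64,
) -> u64 {
    // Cluster size in bytes, derived from the boot sector geometry.
    let cluster_size = bytes_per_sector as u64 * sectors_per_cluster as u64;
    // Start of the $MFT plus `index` fixed-size records.
    mft_lcn * cluster_size + index * record_size as u64
}
```

For records beyond the first extent, the offset has to be derived from the runlist of the $MFT's own $DATA attribute instead.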

Each MFT record is a fixed-size block of data (typically 1024 bytes). Before parsing, each record undergoes a "fix-up" procedure. NTFS uses an Update Sequence Array (USA) to detect torn writes. When a record is written, the last two bytes of each sector in the record are replaced with an update sequence number, and the original bytes are stored in the USA, whose offset and entry count are given in the record header. The library checks that every sector ends with this number and patches the original bytes back into the record before further processing.
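The fix-up step can be sketched as below. This is a minimal illustration, not the library's implementation: `apply_fixups` and `FixupError` are hypothetical names, and the header offsets (USA offset at 0x04, USA entry count at 0x06) follow the standard MFT record layout. The sketch validates every sector before patching, so a torn record is left unmodified.

```rust
/// Reasons a record can fail the fix-up check (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum FixupError {
    TooShort,
    BadSignature,
    BadUsa,
    Mismatch { sector: usize },
}

/// Verify and undo the Update Sequence Array protection in place.
pub fn apply_fixups(record: &mut [u8], sector_size: usize) -> Result<(), FixupError> {
    if record.len() < 8 || sector_size < 2 {
        return Err(FixupError::TooShort);
    }
    if &record[0..4] != b"FILE" {
        return Err(FixupError::BadSignature);
    }
    let usa_offset = u16::from_le_bytes([record[4], record[5]]) as usize;
    let usa_count = u16::from_le_bytes([record[6], record[7]]) as usize;
    // The USA holds the update sequence number plus one saved word per sector.
    if usa_count < 1 {
        return Err(FixupError::BadUsa);
    }
    let sectors = usa_count - 1;
    if usa_offset + usa_count * 2 > record.len() || sectors * sector_size > record.len() {
        return Err(FixupError::BadUsa);
    }
    let usn = [record[usa_offset], record[usa_offset + 1]];
    // Pass 1: every sector must end with the update sequence number.
    for i in 0..sectors {
        let tail = (i + 1) * sector_size - 2;
        if record[tail..tail + 2] != usn {
            return Err(FixupError::Mismatch { sector: i });
        }
    }
    // Pass 2: restore the original two bytes of each sector from the USA.
    for i in 0..sectors {
        let tail = (i + 1) * sector_size - 2;
        let saved = usa_offset + 2 * (i + 1);
        record[tail] = record[saved];
        record[tail + 1] = record[saved + 1];
    }
    Ok(())
}
```

A `Mismatch` result indicates that the sector was not written together with the rest of the record, so the record should be treated as corrupt rather than parsed.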

2.3. Attribute Interpretation and Data Retrieval

An MFT record is composed of a series of variable-length attribute structures. These attributes define the characteristics of a file, such as its name, timestamps, and data content. The library iterates through these attributes to locate the primary data stream, represented by the $DATA attribute.
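The attribute walk can be sketched as follows. The names `RawAttribute`, `attributes`, and `primary_data` are hypothetical; the offsets used (first attribute offset at 0x14 in the record header; type, length, non-resident flag, and name length at 0x00, 0x04, 0x08, and 0x09 in each attribute header) and the 0xFFFFFFFF end marker follow the standard NTFS layout. The sketch assumes the record has already been through the fix-up step.

```rust
/// Type code of the $DATA attribute.
pub const ATTR_DATA: u32 = 0x80;
/// Marker that terminates the attribute list in a record.
const ATTR_END: u32 = 0xFFFF_FFFF;

/// A borrowed view of one attribute (hypothetical type for this sketch).
pub struct RawAttribute<'a> {
    pub type_code: u32,
    pub non_resident: bool,
    pub name_len: u8,
    /// The whole attribute, header included.
    pub bytes: &'a [u8],
}

/// Collect every attribute in a fixed-up MFT record.
pub fn attributes(record: &[u8]) -> Vec<RawAttribute<'_>> {
    let mut out = Vec::new();
    if record.len() < 0x16 {
        return out;
    }
    // Offset of the first attribute, from the record header.
    let mut pos = u16::from_le_bytes([record[0x14], record[0x15]]) as usize;
    while pos + 16 <= record.len() {
        let type_code = u32::from_le_bytes([
            record[pos], record[pos + 1], record[pos + 2], record[pos + 3],
        ]);
        if type_code == ATTR_END {
            break;
        }
        let len = u32::from_le_bytes([
            record[pos + 4], record[pos + 5], record[pos + 6], record[pos + 7],
        ]) as usize;
        // Stop on a length that is implausibly small or overruns the record.
        if len < 16 || len > record.len() - pos {
            break;
        }
        out.push(RawAttribute {
            type_code,
            non_resident: record[pos + 8] != 0,
            name_len: record[pos + 9],
            bytes: &record[pos..pos + len],
        });
        pos += len;
    }
    out
}

/// The primary data stream is the $DATA attribute with no name.
pub fn primary_data(record: &[u8]) -> Option<RawAttribute<'_>> {
    attributes(record)
        .into_iter()
        .find(|a| a.type_code == ATTR_DATA && a.name_len == 0)
}
```

Named $DATA attributes (alternate data streams) have a non-zero name length and are skipped by `primary_data`, but they remain visible through `attributes`.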

NTFS file data can be stored in two ways:

Resident Data: For very small files, the data is stored directly within the $DATA attribute inside the MFT record itself. Extraction is a simple matter of reading the bytes from the attribute's value offset.
Non-Resident Data: For larger files, the data is stored in clusters elsewhere on the volume. The $DATA attribute contains not the data itself, but a set of pointers known as a runlist (or mapping pairs).
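For the resident case, extraction reduces to slicing the attribute. The sketch below uses a hypothetical helper, `resident_value`, and assumes the standard resident attribute header, in which the value length is a 32-bit field at offset 0x10 and the value offset is a 16-bit field at offset 0x14. Its input is the complete attribute, header included.

```rust
/// Return the value bytes of a resident attribute, or `None` if the
/// attribute is non-resident or its header points outside the buffer.
/// (Hypothetical helper for illustration.)
pub fn resident_value(attr: &[u8]) -> Option<&[u8]> {
    // Byte 8 is the non-resident flag; 0 means the value is stored inline.
    if attr.len() < 0x18 || attr[8] != 0 {
        return None;
    }
    let len = u32::from_le_bytes([attr[0x10], attr[0x11], attr[0x12], attr[0x13]]) as usize;
    let off = u16::from_le_bytes([attr[0x14], attr[0x15]]) as usize;
    // `get` performs the bounds check, so a corrupt header yields `None`.
    attr.get(off..off.checked_add(len)?)
}
```

For a non-resident attribute the same header position instead begins the VCN range fields, and the runlist is found through the mapping pairs offset stored at 0x20.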

The runlist is a compact encoding of data extents. Each entry stores the length of a contiguous run of clusters and the run's starting Logical Cluster Number (LCN) on the disk, encoded as a signed offset from the starting LCN of the previous run. The Virtual Cluster Number (VCN) at which a run begins within the file is not stored in the entry; it follows from the cumulative lengths of the preceding runs. The library parses these runlists to build a complete map of the file's physical layout on the disk, allowing for the precise reconstruction of fragmented files.
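Runlist decoding can be sketched as follows. This is an illustration with hypothetical names (`Run`, `decode_runlist`), based on the standard mapping-pairs encoding: each entry starts with a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the LCN offset field, and a zero header byte ends the list. An entry with no offset field is reported with `lcn: None`.

```rust
/// One decoded extent (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub struct Run {
    /// First virtual cluster of the run within the file.
    pub vcn: u64,
    /// First logical cluster on disk, or `None` when no offset is stored.
    pub lcn: Option<u64>,
    /// Run length in clusters.
    pub length: u64,
}

/// Little-endian unsigned integer of 1 to 8 bytes.
fn read_unsigned(b: &[u8]) -> u64 {
    b.iter().rev().fold(0u64, |acc, &x| (acc << 8) | x as u64)
}

/// Little-endian signed integer of 1 to 8 bytes, sign-extended.
fn read_signed(b: &[u8]) -> i64 {
    let negative = b.last().map_or(false, |x| x & 0x80 != 0);
    let mut buf = if negative { [0xFFu8; 8] } else { [0u8; 8] };
    buf[..b.len()].copy_from_slice(b);
    i64::from_le_bytes(buf)
}

/// Decode mapping pairs into runs. Returns `None` on malformed input.
pub fn decode_runlist(mut data: &[u8]) -> Option<Vec<Run>> {
    let mut runs = Vec::new();
    let mut vcn = 0u64;
    let mut prev_lcn = 0i64;
    while let Some((&header, rest)) = data.split_first() {
        if header == 0 {
            break; // end-of-runlist marker
        }
        let len_size = (header & 0x0F) as usize;
        let off_size = (header >> 4) as usize;
        if len_size == 0 || len_size > 8 || off_size > 8 || rest.len() < len_size + off_size {
            return None;
        }
        let length = read_unsigned(&rest[..len_size]);
        let lcn = if off_size == 0 {
            None
        } else {
            // Offsets are relative to the previous run's starting LCN.
            let delta = read_signed(&rest[len_size..len_size + off_size]);
            prev_lcn = prev_lcn.checked_add(delta)?;
            if prev_lcn < 0 {
                return None;
            }
            Some(prev_lcn as u64)
        };
        runs.push(Run { vcn, lcn, length });
        vcn = vcn.checked_add(length)?;
        data = &rest[len_size + off_size..];
    }
    Some(runs)
}
```

Multiplying each run's `lcn` and `length` by the cluster size gives the byte ranges to read from the volume, in file order.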

2.4. Support for Advanced NTFS Features

The methodology extends to handle more complex NTFS file system features:

Sparse Files: A runlist entry that carries no LCN offset field (the offset size in its header byte is zero) indicates a sparse region, which has no clusters allocated on disk. The library interprets this as a block of zero bytes of the specified length.
Compressed Files: NTFS supports transparent file compression using the LZNT1 algorithm. A compressed $DATA attribute has a non-zero compression unit size. The library reads the compressed data in blocks from the disk, identifies compressed versus uncompressed regions within a compression unit, and applies an LZNT1 decompression routine to reconstruct the original data.
Attribute Lists: If a file has too many attributes to fit within a single MFT record, some attributes are moved to extension MFT records. An $ATTRIBUTE_LIST attribute is created in the base record, which contains pointers to the MFT records holding the externalized attributes. The library parses the $ATTRIBUTE_LIST to locate and read all parts of a fragmented attribute, such as a highly fragmented $DATA stream.
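As an illustration of the decompression step, the following is a generic LZNT1 decoder sketch, written from the publicly documented chunk format rather than taken from the library. The names `lznt1_decompress` and `decompress_chunk` are hypothetical. The sketch covers only the LZNT1 stream itself: it does not decide which compression units are compressed, and it does not zero-pad short chunks to 4096 bytes.

```rust
/// Decompress an LZNT1 stream made of consecutive chunks.
/// Returns `None` on malformed input.
pub fn lznt1_decompress(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut pos = 0usize;
    while pos + 2 <= input.len() {
        let header = u16::from_le_bytes([input[pos], input[pos + 1]]);
        if header == 0 {
            break; // a zero header ends the stream
        }
        // Low 12 bits: total chunk size (header included) minus 3.
        let data_len = (header & 0x0FFF) as usize + 1;
        let start = pos + 2;
        let end = start.checked_add(data_len)?;
        let chunk = input.get(start..end)?;
        if header & 0x8000 == 0 {
            out.extend_from_slice(chunk); // stored without compression
        } else {
            decompress_chunk(chunk, &mut out)?;
        }
        pos = end;
    }
    Some(out)
}

/// Expand one compressed chunk, appending to `out`.
fn decompress_chunk(chunk: &[u8], out: &mut Vec<u8>) -> Option<()> {
    let chunk_start = out.len();
    let mut i = 0usize;
    while i < chunk.len() {
        let flags = chunk[i];
        i += 1;
        for bit in 0..8 {
            if i >= chunk.len() {
                break;
            }
            if flags & (1 << bit) == 0 {
                out.push(chunk[i]); // literal byte
                i += 1;
                continue;
            }
            // Back-reference: a 16-bit token split into displacement and
            // length; the split depends on how much of the chunk is written.
            let token = u16::from_le_bytes([chunk[i], *chunk.get(i + 1)?]);
            i += 2;
            let written = out.len() - chunk_start;
            if written == 0 || written > 4096 {
                return None;
            }
            let mut lg = 0u32;
            let mut n = written - 1;
            while n >= 0x10 {
                lg += 1;
                n >>= 1;
            }
            let back = (token >> (12 - lg)) as usize + 1;
            let len = (token & (0x0FFF >> lg)) as usize + 3;
            if back > written {
                return None;
            }
            // Copy byte by byte: source and destination may overlap.
            for _ in 0..len {
                let b = out[out.len() - back];
                out.push(b);
            }
        }
    }
    Some(())
}
```

In an NTFS context, the input would be the concatenated clusters of one compression unit, and the output would be truncated or padded to the unit's logical size by the caller.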

3. Path and Index Resolution

To locate a file by its path, the library implements a parser for NTFS index structures, which are used for directories. A directory's $INDEX_ROOT and $INDEX_ALLOCATION attributes form a B-tree that maps file names to their MFT record references.

The library traverses this B-tree by starting at the root directory (MFT record #5) and recursively searching the index for each component of the target path. This allows for the resolution of any file path to its corresponding MFT record number without relying on OS API calls. The path resolution logic also supports NTFS reparse points by parsing the $REPARSE_POINT attribute to follow symbolic links and volume mount points.
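The component-by-component walk can be sketched as below. This is a simplified illustration, not the library's resolver: `resolve_path` and `split_file_reference` are hypothetical names, the directory index search is abstracted behind a caller-supplied `lookup` closure, and reparse points are not followed. It relies on two standard facts: the root directory is MFT record 5, and a file reference packs the record number in its low 48 bits and a sequence number in its high 16 bits.

```rust
/// MFT record number of the volume's root directory.
pub const ROOT_DIRECTORY_RECORD: u64 = 5;

/// Split a 64-bit file reference into (record number, sequence number).
pub fn split_file_reference(reference: u64) -> (u64, u16) {
    (reference & 0x0000_FFFF_FFFF_FFFF, (reference >> 48) as u16)
}

/// Resolve a path to an MFT record number.
///
/// `lookup(directory_record, name)` stands in for the B-tree search of
/// one directory's index and returns the matching file reference.
pub fn resolve_path<F>(path: &str, mut lookup: F) -> Option<u64>
where
    F: FnMut(u64, &str) -> Option<u64>,
{
    let mut record = ROOT_DIRECTORY_RECORD;
    let components = path
        .split(|c: char| c == '\\' || c == '/')
        .filter(|c| !c.is_empty());
    for component in components {
        // Search the current directory, then descend into the match.
        let reference = lookup(record, component)?;
        record = split_file_reference(reference).0;
    }
    Some(record)
}
```

Keeping the index search behind a closure means the same walk works whether the entries come from a resident $INDEX_ROOT or from $INDEX_ALLOCATION blocks.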

Conclusion

The rawccopy-rs library provides a filesystem-native methodology for file data extraction. By parsing on-disk NTFS structures directly (the boot sector, MFT records, attribute runlists, and index trees), it reconstructs file content without relying on the operating system's view of the volume. This approach bypasses the abstractions and limitations of standard file I/O APIs, making it a suitable engine for forensic tooling that requires unmediated access to file system data.

xangelix/rawccopy-rs
