Dimi1010 · 2025-09-12T09:40:58Z

This PR adds a heuristic based on the file content, which is more robust than deciding based on the file extension.

The new decision model reads the start of the file for its magic number signature, compares it to the signatures of the supported file types [1], and constructs a reader instance based on the result.

Because this changes the public API of the factory, two new functions, createReader and tryCreateReader, have been added.
The functions differ in their error handling scheme: createReader throws, while tryCreateReader returns nullptr on error.

Method behaviour in error scenarios:

| Scenario | `getReader` | `createReader` | `tryCreateReader` |
|---|---|---|---|
| File not found | N/A | Throws exception | Returns `nullptr` |
| Unsupported format | Returns `PcapFileReaderDevice` | Throws exception | Returns `nullptr` |
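
The two error-handling schemes in the table can be sketched as a pair of functions where the throwing variant wraps the non-throwing one. This is only an illustration under stated assumptions: `Reader` and `hasKnownSignature` are hypothetical stand-ins, the signature check accepts just the pcapng block type, and the real functions in the PR return device objects rather than this placeholder struct.

```cpp
#include <fstream>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a reader device.
struct Reader
{
	std::string path;
};

// Placeholder check (assumption): accepts only the pcapng block type 0x0A0D0D0A.
bool hasKnownSignature(std::istream& in)
{
	char magic[4] = {};
	in.read(magic, sizeof(magic));
	return in.gcount() == 4 && magic[0] == 0x0a && magic[1] == 0x0d && magic[2] == 0x0d && magic[3] == 0x0a;
}

// Returns nullptr when the file is missing or its format is not recognised.
std::unique_ptr<Reader> tryCreateReader(const std::string& path)
{
	std::ifstream file(path, std::ios::binary);
	if (!file.is_open() || !hasKnownSignature(file))
		return nullptr;
	return std::unique_ptr<Reader>(new Reader{ path });
}

// Same decision, but reports failure by throwing.
std::unique_ptr<Reader> createReader(const std::string& path)
{
	auto reader = tryCreateReader(path);
	if (!reader)
		throw std::runtime_error("Cannot create a reader for '" + path + "'");
	return reader;
}
```

Building the throwing variant on top of the non-throwing one keeps the `tryCreateReader` path free of exceptions for the expected failure cases.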

codecov · 2025-09-12T09:59:59Z

Codecov Report

❌ Patch coverage is 88.78205% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.83%. Comparing base (0bd4834) to head (1f1fb30).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| Pcap++/src/PcapFileDevice.cpp | 85.97% | 19 Missing and 4 partials ⚠️ |
| Tests/Pcap++Test/Tests/FileTests.cpp | 91.36% | 5 Missing and 7 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1962      +/-   ##
==========================================
- Coverage   84.22%   83.83%   -0.40%     
==========================================
  Files         309      313       +4     
  Lines       55070    55976     +906     
  Branches    11310    11828     +518     
==========================================
+ Hits        46384    46928     +544     
- Misses       7556     8225     +669     
+ Partials     1130      823     -307

| Flag | Coverage Δ | |
|---|---|---|
| alpine320 | `76.40% <79.78%> (+0.01%)` | ⬆️ |
| fedora42 | `76.13% <80.21%> (-0.01%)` | ⬇️ |
| macos-14 | `81.97% <82.49%> (-0.01%)` | ⬇️ |
| macos-15 | `81.96% <83.65%> (+<0.01%)` | ⬆️ |
| mingw32 | `70.39% <79.41%> (+0.04%)` | ⬆️ |
| mingw64 | `70.39% <79.41%> (+0.13%)` | ⬆️ |
| npcap | `?` | |
| rhel94 | `75.76% <79.21%> (+0.01%)` | ⬆️ |
| ubuntu2004 | `59.58% <59.30%> (-0.02%)` | ⬇️ |
| ubuntu2004-zstd | `59.67% <57.98%> (-0.02%)` | ⬇️ |
| ubuntu2204 | `75.70% <79.21%> (+0.01%)` | ⬆️ |
| ubuntu2204-icpx | `59.05% <59.32%> (-0.02%)` | ⬇️ |
| ubuntu2404 | `76.08% <79.21%> (+0.02%)` | ⬆️ |
| ubuntu2404-arm64 | `76.07% <79.78%> (+0.01%)` | ⬆️ |
| unittest | `83.83% <88.78%> (-0.40%)` | ⬇️ |
| windows-2022 | `85.60% <85.60%> (+0.11%)` | ⬆️ |
| windows-2025 | `85.64% <85.60%> (+0.13%)` | ⬆️ |
| winpcap | `85.64% <85.60%> (-0.09%)` | ⬇️ |
| xdp | `51.62% <0.95%> (?)` | |

Pcap++/src/PcapFileDevice.cpp

Tests/Pcap++Test/Tests/FileTests.cpp

seladb · 2025-09-15T08:08:40Z

Tests/Pcap++Test/Tests/FileTests.cpp

-	PTF_ASSERT_NOT_NULL(dynamic_cast<pcpp::PcapNgFileReaderDevice*>(genericReader));
-	PTF_ASSERT_TRUE(genericReader->open());
+	// ------- IFileReaderDevice::createReader() Factory
+	// TODO: Move to a separate unit test.


We should add the following to get more coverage:

Open a snoop file

Open a file that is not any of the options

Open pcap files with different magic numbers

Assuming we add a version check for snoop and pcap file: create temp files with bogus data that has the magic number but wrong versions

3d713ab adds the following tests:

Pcap, PcapNG, Zst file with correct content + extension

Pcap, PcapNG file with correct content + wrong extension

Bogus content file with correct extension (pcap, pcapng, zst)

Bogus content file with wrong extension (txt)

I haven't found a snoop file to add. Do we have any?

> Open pcap files with different magic numbers

Do you mean Pcap content that has just its magic number changed? IMO it is reasonable to treat that as an invalid format and fail as with regular bogus data.

> Assuming we add a version check for snoop and pcap file: create temp files with bogus data that has the magic number but wrong versions

Pending on #1962 (comment).

seladb · 2025-09-21T08:10:16Z

@Dimi1010 some CI tests fail...

seladb · 2025-12-31T08:15:25Z

@Dimi1010 I'm working on parsing pcap files without libpcap: #2034
Maybe we can rework this PR after my PR is merged?

Dimi1010 · 2026-02-20T21:37:59Z

@seladb can we merge this? It has been sitting for a while.

seladb · 2026-02-22T08:09:20Z

Pcap++/src/PcapFileDevice.cpp

+		};
+
+		/// @brief Heuristic file format detector that scans the magic number of the file format header.
+		class CaptureFileFormatDetector


Since we're not parsing all formats (maybe except Zstd) in PcapPlusPlus, we can reuse the logic we already have. Maybe it can run the open() method (or extract a portion of it) for each reader type until it finds the right type?

WDYM, we are not parsing all formats? Did you mean "now"?

Also, the necessary logic to detect the file format is already extracted in this class. Tbh, the open() call should probably delegate the format detection to this class if more comprehensive magic number format validation is needed.

IMO, how the file is processed after format detection is a separate concern. Device selection is handled in the createReader device factory, which allows looser coupling between the actual device classes and format detection (e.g. changing whether PcapNG creates a PcapDevice or a PcapNGDevice is as simple as swapping a case statement).
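
To make the "swapping a case statement" point concrete, here is a minimal sketch of a format-to-device mapping kept separate from detection. The `CaptureFileFormat` values mirror the WIP enum quoted later in this thread; `DeviceKind` and `selectDevice` are hypothetical names, and treating `PcapMod` as unsupported is an assumption made only for this example.

```cpp
// Values mirror the WIP enum quoted later in this thread.
enum class CaptureFileFormat { Unknown, Pcap, PcapMod, PcapNano, PcapNG, Snoop, ZstArchive };

// Hypothetical tag for which device class the factory would construct.
enum class DeviceKind { None, Pcap, PcapNg, Snoop };

// Maps a detected format to a device. Changing which device handles a
// format only requires moving a case label.
DeviceKind selectDevice(CaptureFileFormat format)
{
	switch (format)
	{
	case CaptureFileFormat::Pcap:
	case CaptureFileFormat::PcapNano:
		return DeviceKind::Pcap;
	case CaptureFileFormat::PcapNG:
	case CaptureFileFormat::ZstArchive:  // Zst is folded into the PcapNG branch here
		return DeviceKind::PcapNg;
	case CaptureFileFormat::Snoop:
		return DeviceKind::Snoop;
	case CaptureFileFormat::PcapMod:  // assumption: no device handles this format
	case CaptureFileFormat::Unknown:
		return DeviceKind::None;
	}
	return DeviceKind::None;
}
```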

I think integrating the functionality into open() would be suboptimal for the following reasons:

It potentially adds more responsibilities to the function that just "open the device".

Looping through all the devices means iterating over more complicated operations: constructing each device, and possibly opening and closing the file repeatedly, since each open() call is designed to function independently.

An open() call can fail for multiple other reasons, not affiliated with the file format specifically.

> WDYM, we are not parsing all formats? Did you mean "now"?

Yes, I meant "now", sorry for the typo 🤦

> IMO, how the file is processed after format detection is a separate concern. Device selection is handled in the createReader device factory, which allows looser coupling between the actual device classes and format detection (e.g. changing whether PcapNG creates a PcapDevice or a PcapNGDevice is as simple as swapping a case statement).
>
> I think integrating the functionality into open() would be suboptimal for the following reasons:

Having duplicate logic to determine if the file is of a certain format in both the device and CaptureFileFormatDetector is not great because if we fix a bug in one of them, we might miss the other. I think this logic should be in one place: either CaptureFileFormatDetector calls open() (might be the easiest option), or we can extract the detection logic and use it in both places

Hmm, it should be possible. It will require expanding the CaptureFileFormatDetector a bit. Currently it only returns the format, but pcap for instance uses the magic number to also detect native or swapped byte order.

Depending on how specific we want to get it might involve a double read of the magic number, once by the format detector and once during the actual file header structure read. Impact should be minimal tho, as fstream is buffered by default.
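
The byte-order part can be illustrated with a small sketch. It assumes the first four bytes of the file were copied as-is into a `uint32_t` (for example with `std::memcpy`), so the value reflects the host's view of the on-disk bytes. `ByteOrder`, `byteSwap32` and `pcapByteOrder` are hypothetical names, not the WIP API.

```cpp
#include <cstdint>

// Hypothetical names for illustration.
enum class ByteOrder { Unknown, Native, Swapped };

// Reverses the four bytes of a 32-bit value.
constexpr std::uint32_t byteSwap32(std::uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) | ((v << 8) & 0x00ff0000u) | (v << 24);
}

// Classifies a pcap magic number as it appears to the host.
// Returns Unknown when the value is not a pcap magic number at all.
ByteOrder pcapByteOrder(std::uint32_t magicAsRead)
{
	constexpr std::uint32_t kMicro = 0xa1b2c3d4u;  // microsecond precision
	constexpr std::uint32_t kNano = 0xa1b23c4du;   // nanosecond precision
	if (magicAsRead == kMicro || magicAsRead == kNano)
		return ByteOrder::Native;
	if (magicAsRead == byteSwap32(kMicro) || magicAsRead == byteSwap32(kNano))
		return ByteOrder::Swapped;
	return ByteOrder::Unknown;
}
```

A palindromic value such as the pcapng block type 0x0A0D0D0A reads the same in both byte orders, so it cannot carry byte-order information.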

@seladb Tried a WIP implementation. It is possible to have open() call the format detector, tho I am not perfectly happy with the current iteration I have.

Can we do that merge of functionality in another PR, since those changes would also modify the PcapReader/Writer and SnoopReader and it goes out of scope of this PR?

PS: The WIP API would be something like this:

#include <istream>

/// @brief An enumeration representing different capture file formats.
enum class CaptureFileFormat
{
	Unknown,
	Pcap,        // regular pcap with microsecond precision
	PcapMod,     // Alexey Kuznetzov's "modified" pcap format
	PcapNano,    // regular pcap with nanosecond precision
	PcapNG,      // uncompressed pcapng
	Snoop,       // solaris snoop
	ZstArchive,  // zstd compressed archive
};

/// @brief Specifies the byte order (endianness) of a capture file relative to the host system.
enum class CaptureFileByteOrder
{
	Unknown,  // Unknown format. Magic number is palindrome.
	Native,   // Byte order is native to the host system.
	Swapped   // Byte order is swapped to the host system.
};

/// @brief Heuristic file format detector that scans the magic number of the file format header.
class CaptureFileFormatDetector
{
public:
	/// @brief Checks a content stream for the magic number and determines the type.
	///
	/// The function optionally detects the byte order of the file if it can be determined by the magic number.
	/// The byte order is not updated if no supported format is detected.
	///
	/// @param[in] content A stream that contains the file content.
	/// @param[out] byteOrder Optional location to store the detected byte order.
	/// @return A CaptureFileFormat value with the detected content type.
	CaptureFileFormat detectFormat(std::istream& content, CaptureFileByteOrder* byteOrder = nullptr) const;

	/// @brief Checks a content stream for the magic number and determines if it is a Pcap file.
	///
	/// The function optionally detects the byte order of the file if it can be determined by the magic number.
	/// The byte order is not updated if no supported format is detected.
	///
	/// @param[in] content A stream that contains the file content.
	/// @param[out] byteOrder Optional location to store the detected byte order.
	/// @return A CaptureFileFormat value with the detected Pcap format or Unknown if the file is not pcap.
	CaptureFileFormat detectPcapFile(std::istream& content, CaptureFileByteOrder* byteOrder = nullptr) const;

	/// @brief Checks a content stream for the magic number and determines if it is a PcapNG file.
	/// @param[in] content A stream that contains the file content.
	/// @return True if the content stream is PcapNG file, false otherwise.
	bool isPcapNgFile(std::istream& content) const;

	/// @brief Checks a content stream for the magic number and determines if it is a Snoop file.
	/// @param[in] content A stream that contains the file content.
	/// @param[out] byteOrder Optional location to store the detected byte order.
	/// @return True if the content stream is Snoop file, false otherwise.
	bool isSnoopFile(std::istream& content, CaptureFileByteOrder* byteOrder = nullptr) const;

	/// @brief Checks a content stream for the magic number and determines if it is a Zstd archive.
	///
	/// The function optionally detects the byte order of the file if it can be determined by the magic number.
	/// The byte order is not updated if no supported format is detected.
	///
	/// @param[in] content A stream that contains the file content.
	/// @param[out] byteOrder Optional location to store the detected byte order.
	/// @return True if the content stream is a Zstd archive, false otherwise.
	bool isZstdArchive(std::istream& content, CaptureFileByteOrder* byteOrder = nullptr) const;
};

I'm not sure we need CaptureFileFormatDetector if we call open() for each file type.

If we don't want to call open() we can extract the detection logic for each format in a static method, for example:

private:
	// The stream is taken by non-const reference because reading from it changes its state.
	static bool PcapFileReaderDevice::isPcapFile(std::ifstream& file, FileTimestampPrecision& precision, bool& needsSwap);

public:
	static bool PcapFileReaderDevice::isPcapFile(std::ifstream& file)
	{
		return isPcapFile(file, ...);
	}

	bool PcapFileReaderDevice::open()
	{
		...
		... = isPcapFile(...);
		...
	}

Dimi1010 added 4 commits September 12, 2025 12:03

Added heuristics file content detector that determines the content based on the magic number.

02de760

Merge remote-tracking branch 'upstream/dev' into feature/heuristic-file-selection

d2b6339

Moved stream checkpoint outside format detector as it is not directly tied to it.

685dd9f

Added a new factory function createReader that uses the new heuristics detection method.

40dee69

Dimi1010 added the enhancement label Sep 12, 2025

Add <algorithm> include.

f1e3e18

Dimi1010 added 2 commits September 12, 2025 13:17

Added unit tests.

8da1790

Deprecated old factory function.

3ad51e2

Dimi1010 added the API deprecation label Sep 12, 2025

Dimi1010 added 3 commits September 12, 2025 14:08

Add byte-swapped zstd magic number.

15c2000

Lint

17af8d4

Move enum closer to first usage.

46418ec

Dimi1010 marked this pull request as ready for review September 12, 2025 11:36

Dimi1010 requested a review from seladb as a code owner September 12, 2025 11:36

Dimi1010 requested review from clementperon, egecetin and tigercosmos September 12, 2025 11:36

tigercosmos approved these changes Sep 12, 2025

seladb reviewed Sep 15, 2025

Dimi1010 added 4 commits September 15, 2025 15:45

Added unit tests for file reader device factory.

3d713ab

Revert indentation.

a2391ec

Fixed StreamCheckpoint to restore original stream state.

ea328d7

Merge branch 'dev' into feature/heuristic-file-selection

db86c3e

Dimi1010 commented Sep 19, 2025

Pcap++/src/PcapFileDevice.cpp (outdated)

Dimi1010 added 3 commits September 20, 2025 12:59

Merge branch 'dev' into feature/heuristic-file-selection

4aed9bd

Moved isStreamSeekable helper to inside CaptureFileFormatDetector.

a83ae2b

Move it out if it needs to be reused somewhere.

Added pcap magic number for Alexey Kuznetzov's modified pcap format.

916e872

Libpcap supports reading this format since 0.9.1. The heuristics detection will identify such magic number as pcap and leave final support decision to the pcap backend infrastructure.

Merge remote-tracking branch 'upstream/dev' into feature/heuristic-file-selection

022529f

Trimmed pcapng sample.

54f7bae

Dimi1010 linked an issue Oct 20, 2025 that may be closed by this pull request

Add indication if LightPcapNG backend is compiled with ZSTD compression support. #1973

Open

Dimi1010 mentioned this pull request Oct 20, 2025

Add indication if LightPcapNG backend is compiled with ZSTD compression support. #1973

Open

nbooster mentioned this pull request Oct 21, 2025

Unaligned packet field (UBSanitizer error report) plus warnings... #2001

Open

Dimi1010 added 18 commits October 24, 2025 09:29

Merge branch 'dev' into feature/heuristic-file-selection

93cba3d

Change PcapNGZst to ZstArchive. Zst to PcapNG branch folding is done in `createReader` instead of having the Format detector assume that is what is intended.

3643fac

Added separate format value for "modified" pcap to separate the format detector from libpcap behaviour.

ec5980f

Docs fix.

b3639a9

Merge branch 'dev' into feature/heuristic-file-selection

ce561b9

Merge branch 'dev' into feature/heuristic-file-selection

6322b41

Add automatic open functionality to createReader factory to mimic RAII initialization.

ae9caa8

Merge branch 'dev' into feature/heuristic-file-selection

956b596

Update docstring.

5275349

Fix exception message assert.

a15f529

Merge remote-tracking branch 'upstream/dev' into feature/heuristic-file-selection

385528c

Merge remote-tracking branch 'upstream/dev' into feature/heuristic-file-selection

9f5b5f1

Refactored format tests to utilize the createReader factory.

b0674bd

Fix nanoprecision test issues.

e614176

Remove openDevice flag. Update create procedure to avoid exceptions on tryCreateDevice.

bb9917b

Docs update + Lint

9a2a390

Docs fix.

df0a5a8

Lint.

92c80d9

Dimi1010 requested a review from seladb December 30, 2025 10:38

Dimi1010 added 3 commits January 17, 2026 17:09

Merge remote-tracking branch 'upstream/dev' into feature/heuristic-file-selection

692df58

Conflicts:
Pcap++/src/PcapFileDevice.cpp
Tests/Pcap++Test/Tests/FileTests.cpp

Remove nano support checks as it should always be supported by the internal parser.

fdf8c89

Merge remote-tracking branch 'upstream/dev' into feature/heuristic-file-selection

5a62e9b

seladb reviewed Feb 22, 2026

Merge remote-tracking branch 'upstream/dev' into feature/heuristic-file-selection

1f1fb30
