Fix out-of-bounds access and key handling in Caffe/Caffe2 reader #6211

Merged: JanuszL merged 3 commits into NVIDIA:main from JanuszL:fix_caffe_Reader on Feb 19, 2026

@JanuszL JanuszL commented Feb 14, 2026

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

  • Use sized std::string construction for LMDB keys to avoid reading
    past the end of key data that is not null-terminated
  • Add bounds checks for label indices in the Caffe2 multi-label sparse
    and weighted-sparse parsing paths
  • Validate image byte_data size against the declared dimensions in
    Caffe2Parser
  • Add negative Python tests exercising the new error paths
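
The first bullet can be illustrated with a minimal, self-contained sketch. `KeyView` below is a hypothetical stand-in for LMDB's `MDB_val` (which carries `mv_size` and `mv_data`), and `KeyToString` is an illustrative helper name; neither is taken from the DALI sources.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for LMDB's MDB_val: a pointer plus an explicit length.
// LMDB keys are raw byte ranges and are not guaranteed to be null-terminated.
struct KeyView {
  std::size_t mv_size;
  void *mv_data;
};

// Builds the key string from exactly mv_size bytes, so nothing past the key
// buffer is read (a constructor that scans for '\0' could run past the end).
inline std::string KeyToString(const KeyView &key) {
  return std::string(static_cast<const char *>(key.mv_data), key.mv_size);
}
```

Because the length is passed explicitly, keys that contain embedded zero bytes also survive the conversion unchanged.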

Additional information:

Affected modules and functionalities:

  • caffe2 parser
  • lmdb
  • test_caffe

Key points relevant for the review:

  • NA

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
      • test_caffe.test_caffe2_parser_label_index_out_of_bounds_sparse
      • test_caffe.test_caffe2_parser_label_index_out_of_bounds_weighted_sparse
      • test_caffe.test_caffe2_parser_image_byte_data_size_mismatch
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Janusz Lisiecki <[email protected]>
@JanuszL JanuszL added the important-fix label (Fixes an important issue in the software or development environment.) on Feb 14, 2026

greptile-apps Bot commented Feb 14, 2026

Greptile Summary

This PR fixes critical security vulnerabilities in the Caffe/Caffe2 reader by preventing out-of-bounds memory accesses in three distinct code paths.

Key Changes:

  • LMDB Key Handling: Replaced unsafe to_string(reinterpret_cast<char*>(key.mv_data)) with sized string constructor std::string(static_cast<const char*>(key.mv_data), key.mv_size) to prevent reading past non-null-terminated key data
  • Label Index Validation: Added DALI_ENFORCE bounds checks for label indices in both MULTI_LABEL_SPARSE and MULTI_LABEL_WEIGHTED_SPARSE parsing paths, ensuring indices are within [0, num_labels) range
  • Image Data Size Validation: Added size verification for byte_data against declared dimensions (H×W×C) before memcpy, preventing buffer overruns
  • Integer Overflow Prevention: Changed static_cast<size_t>(H * W * C) to static_cast<size_t>(H) * W * C to prevent integer overflow before cast
  • Type Safety Improvement: Changed label index from auto idx = static_cast<int>(...) to int64_t idx in MULTI_LABEL_SPARSE path to avoid potential overflow when casting from float/int to int
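
The image-size item above amounts to "compare before you copy". A minimal sketch under stated assumptions: `CopyImageBytes` is a hypothetical helper, standard exceptions stand in for `DALI_ENFORCE`, and a `std::vector` stands in for the DALI tensor; this is not the code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: copies raw image bytes into an H x W x C buffer and
// refuses to copy when the payload size does not match the declared dims.
inline std::vector<uint8_t> CopyImageBytes(const std::string &byte_data,
                                           int H, int W, int C) {
  if (H < 0 || W < 0 || C < 0) {
    throw std::invalid_argument("image dimensions must be non-negative");
  }
  // Cast the first factor so the whole product is evaluated in size_t.
  const std::size_t expected = static_cast<std::size_t>(H) * W * C;
  if (byte_data.size() != expected) {
    throw std::length_error("byte_data holds " + std::to_string(byte_data.size()) +
                            " bytes, dimensions declare " + std::to_string(expected));
  }
  std::vector<uint8_t> image(expected);
  if (expected > 0) {
    // Source and destination sizes are known to agree at this point.
    std::memcpy(image.data(), byte_data.data(), expected);
  }
  return image;
}
```

Without the size check, a record declaring larger dimensions than its payload would make the copy read past the end of `byte_data`.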

Testing:
Three new negative tests were added with hand-crafted protobuf data to verify error handling for out-of-bounds label indices (sparse and weighted-sparse variants) and image data size mismatches.

Confidence Score: 5/5

  • This PR is safe to merge and addresses critical security vulnerabilities
  • All changes are defensive security fixes with proper bounds checking and validation. The implementation is thorough, includes comprehensive negative tests, and the review fixes commit addressed potential integer overflow issues. No functional changes to correct code paths.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| dali/operators/reader/loader/lmdb.h | Fixed LMDB key string construction to use sized constructor, preventing out-of-bounds reads on non-null-terminated keys |
| dali/operators/reader/parser/caffe2_parser.h | Added bounds checks for label indices and image data size validation, preventing buffer overruns and memory corruption |
| dali/test/python/reader/test_caffe.py | Added comprehensive negative tests for out-of-bounds label indices and image data size mismatches using crafted protobuf data |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LMDB Key Read] --> B{Key Data Valid?}
    B -->|Before Fix| C[to_string with reinterpret_cast]
    B -->|After Fix| D[Sized string constructor]
    C --> E[Potential OOB Read]
    D --> F[Safe Key String]
    
    G[Caffe2 Label Parse] --> H{Label Type?}
    H -->|MULTI_LABEL_SPARSE| I[Extract Label Index]
    H -->|MULTI_LABEL_WEIGHTED_SPARSE| J[Extract Index & Weight]
    I --> K{idx >= 0 && idx < num_labels?}
    J --> K
    K -->|No| L[DALI_ENFORCE Error]
    K -->|Yes| M[Store Label Value]
    
    N[Caffe2 Image Parse] --> O[Read Dims: H, W, C]
    O --> P{byte_data.size == H*W*C?}
    P -->|No| Q[DALI_ENFORCE Error]
    P -->|Yes| R[memcpy Image Data]
    
    style E fill:#ff6b6b
    style L fill:#ff6b6b
    style Q fill:#ff6b6b
    style F fill:#51cf66
    style M fill:#51cf66
    style R fill:#51cf66

Last reviewed commit: 9a2fa78

@greptile-apps greptile-apps Bot left a comment

4 files reviewed, no comments

JanuszL commented Feb 14, 2026

!build

@dali-automaton

CI MESSAGE: [44018446]: BUILD STARTED

@dali-automaton

CI MESSAGE: [44018446]: BUILD PASSED

JanuszL commented Feb 16, 2026

!build

@greptile-apps greptile-apps Bot left a comment

4 files reviewed, no comments

Comment thread dali/test/python/reader/test_caffe.py Fixed
@dali-automaton

CI MESSAGE: [44137002]: BUILD STARTED

Signed-off-by: Janusz Lisiecki <[email protected]>

JanuszL commented Feb 16, 2026

!build

@greptile-apps greptile-apps Bot left a comment

4 files reviewed, no comments

@dali-automaton

CI MESSAGE: [44137308]: BUILD STARTED

@dali-automaton

CI MESSAGE: [44137308]: BUILD PASSED

  for (int i = 0; i < label_data_size; ++i) {
-   label_tensor_data[static_cast<int>(proto_get_data<T>(label_indices, i))]
-       = static_cast<T>(1);
+   auto idx = static_cast<int>(proto_get_data<T>(label_indices, i));
@mzient mzient Feb 19, 2026

Suggested change
- auto idx = static_cast<int>(proto_get_data<T>(label_indices, i));
+ int64_t idx = proto_get_data<T>(label_indices, i);

Much more readable - and also supports larger range.
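
A minimal sketch of the idea behind this suggestion: widen the index to `int64_t` and range-check it before the write. `SetSparseLabels` is a hypothetical helper, `std::out_of_range` stands in for `DALI_ENFORCE`, and plain vectors stand in for the proto and tensor types; this is not the code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: expands sparse label indices into a dense 0/1 vector.
// Assumes num_labels >= 0 and that every raw index is representable in int64_t.
template <typename T>
std::vector<T> SetSparseLabels(const std::vector<T> &label_indices, int64_t num_labels) {
  std::vector<T> dense(static_cast<std::size_t>(num_labels), static_cast<T>(0));
  for (const T &raw : label_indices) {
    // Widen to int64_t so the value is not narrowed to int before the check.
    int64_t idx = static_cast<int64_t>(raw);
    if (idx < 0 || idx >= num_labels) {
      throw std::out_of_range("label index " + std::to_string(idx) +
                              " is outside [0, " + std::to_string(num_labels) + ")");
    }
    dense[static_cast<std::size_t>(idx)] = static_cast<T>(1);
  }
  return dense;
}
```

The check runs before the store, so a crafted index never reaches the indexing expression.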

Contributor Author

Fixed

  const int W = image_proto.dims(1);

  image.Resize({H, W, C}, DALI_UINT8);
+ DALI_ENFORCE(image_proto.byte_data().size() == static_cast<size_t>(H * W * C),
Contributor

Suggested change
- DALI_ENFORCE(image_proto.byte_data().size() == static_cast<size_t>(H * W * C),
+ DALI_ENFORCE(image_proto.byte_data().size() == static_cast<size_t>(H) * W * C,
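
Why the position of the cast matters: with `int` operands, `H * W * C` is evaluated entirely in `int`, and signed overflow there is undefined behavior that a later cast cannot repair. Casting the first factor makes the whole product evaluate in `size_t`. A small sketch, where `ExpectedByteSize` is a hypothetical helper name and the dimensions are assumed to be non-negative:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: byte count of an H x W x C uint8 image.
// static_cast<size_t>(H) promotes the remaining factors to size_t as well,
// so the multiplication never happens in int. Assumes H, W, C >= 0.
inline std::size_t ExpectedByteSize(int H, int W, int C) {
  return static_cast<std::size_t>(H) * W * C;
}
```

On a platform with a 64-bit `size_t`, dimensions such as 65536 x 65536 x 3 produce a well-defined result here, whereas the same product computed in 32-bit `int` would overflow.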

Contributor Author

Fixed

Signed-off-by: Janusz Lisiecki <[email protected]>

JanuszL commented Feb 19, 2026

!build

@dali-automaton

CI MESSAGE: [44382257]: BUILD STARTED

@greptile-apps greptile-apps Bot left a comment

4 files reviewed, no comments

@dali-automaton

CI MESSAGE: [44382257]: BUILD FAILED

@dali-automaton

CI MESSAGE: [44382257]: BUILD PASSED

@JanuszL JanuszL merged commit 6e98682 into NVIDIA:main Feb 19, 2026
7 checks passed
@JanuszL JanuszL deleted the fix_caffe_Reader branch February 19, 2026 21:35