[ZipManager] Add ZIP64 (files > 4GB) support. #27692

Merged
78andyp merged 2 commits into xbmc:master from 78andyp:zip64
Jan 25, 2026

Conversation

@78andyp
Member

@78andyp 78andyp commented Jan 15, 2026

Description

Add support for ZIP64 - files > 4GB in size.

Motivation and context

As part of the testing for #27580 I discovered the native zip support in Kodi is only ZIP32. The Archive VFS extension adds ZIP64 support but there is an issue with library zip:// entries being duplicated with archive://.

How has this been tested?

Locally on a ZIP file that is > 4GB in size. It plays the contained MKV.
New tests

What is the effect on users?

Screenshots (if appropriate):

Types of change

  • Improvement (non-breaking change which improves existing functionality)

Checklist:

@78andyp 78andyp added this to the Piers 22.0 Alpha 3 milestone Jan 15, 2026
@78andyp 78andyp requested a review from Copilot January 15, 2026 01:42
@78andyp 78andyp added the Type: Improvement, Component: FileSystem, and v22 Piers labels Jan 15, 2026
@78andyp
Member Author

78andyp commented Jan 15, 2026

@claude review

@claude

claude bot commented Jan 15, 2026

Claude finished @78andyp's task in 1m 10s


Code Review

  • Gather context and read repository guidelines
  • Read modified files (ZipManager.h and ZipManager.cpp)
  • Analyze ZIP64 implementation
  • Provide review feedback

Overview

This PR successfully adds ZIP64 support to handle files larger than 4GB. The implementation follows the ZIP64 specification and properly upgrades 32-bit fields to 64-bit when needed. The code quality is generally good with proper error handling.

Strengths

  1. Correct ZIP64 detection: Properly checks for 0xFFFFFFFF sentinel values to detect when ZIP64 structures are needed
  2. Proper structure parsing: The ReadZip64EOCD and ParseZip64ExtraField functions correctly implement the ZIP64 specification
  3. Backward compatibility: The code maintains compatibility with ZIP32 files by only invoking ZIP64 logic when sentinel values are detected
  4. Good error handling: Returns false on errors with appropriate logging
  5. Modern C++ improvements: Replaced raw #define constants with constexpr and uses auto for iterators

Issues Found

1. Bug: Extra field skipped twice in main loop (xbmc/filesystem/ZipManager.cpp:219)

After reading the central extra field (lines 206-216), the code should not skip it again at line 219:

// Jump after central file header extra field and file comment
mFile.Seek(ze.eclength + ze.clength,SEEK_CUR);

Problem: The extra field was already read at line 210 (mFile.Read(extraField.data(), ze.eclength)), so this Seek will skip past the comment and into the next central header, corrupting the parse. You should only skip the comment:

// Jump after file comment
mFile.Seek(ze.clength, SEEK_CUR);
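
The position arithmetic behind this issue can be illustrated with a small sketch. It is not Kodi code: std::istringstream stands in for CFile, and the record layout (two length bytes, then extra field, then comment) and the name ConsumeRecord are simplified, hypothetical stand-ins for the real central directory header.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical, simplified record: [extraLen:1][commentLen:1][extra][comment].
// Returns the stream position once the record has been consumed.
// skipExtraAgain = true reproduces the double skip described above.
std::streamoff ConsumeRecord(std::istream& in, bool skipExtraAgain)
{
  const int extraLen = in.get();
  const int commentLen = in.get();

  // Reading the extra field into a buffer already advances the stream.
  std::string extra(static_cast<std::size_t>(extraLen), '\0');
  in.read(extra.data(), extraLen);

  // After the read, only the comment is still ahead of the current position.
  const int toSkip = skipExtraAgain ? extraLen + commentLen : commentLen;
  in.seekg(toSkip, std::ios::cur);
  return in.tellg();
}
```

With a 4-byte extra field and a 3-byte comment, the correct path stops at offset 9 (the start of the next record), while the double skip lands 4 bytes into whatever follows.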

2. Potential data loss: lhdrOffset field size (xbmc/filesystem/ZipManager.h:55)

The lhdrOffset field is still unsigned int (32-bit) but is assigned 64-bit values from ZIP64 extra fields:

unsigned int lhdrOffset = 0; // Relative offset of local header

In ParseZip64ExtraField (ZipManager.cpp:444-445):

uint64_t lhdrOffset64 = Endian_SwapLE64(ReadUnaligned<uint64_t>(p));
info.lhdrOffset = lhdrOffset64; // Potential truncation!

Problem: For ZIP files where the local header is beyond 4GB, this will silently truncate the offset, causing file access to fail.

Fix: Change lhdrOffset to uint64_t in the struct definition (xbmc/filesystem/ZipManager.h:55).
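
A minimal sketch of the truncation, assuming nothing about the Kodi sources: TruncateOffset mimics what assigning a uint64_t to a 32-bit field does, and NarrowOffset is a hypothetical checked alternative for code that must stay 32-bit. Both names are illustrative only; the fix proposed above is simply to widen the field.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// What a plain assignment to a 32-bit field does: only the low 32 bits survive.
std::uint32_t TruncateOffset(std::uint64_t offset64)
{
  return static_cast<std::uint32_t>(offset64);
}

// Hypothetical checked narrowing: refuses values that do not fit in 32 bits.
std::optional<std::uint32_t> NarrowOffset(std::uint64_t offset64)
{
  if (offset64 > std::numeric_limits<std::uint32_t>::max())
    return std::nullopt;
  return static_cast<std::uint32_t>(offset64);
}
```

An offset of 4 GiB + 16 bytes silently becomes 16 in the first function, which is why a seek based on the truncated value would land near the start of the archive instead of at the intended local header.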

3. Code inconsistency: strncpy vs strncpy_s (xbmc/filesystem/ZipManager.cpp:203)

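The change from strncpy to strncpy_s appears platform-specific. strncpy_s belongs to the optional C11 Annex K bounds-checking interfaces; Microsoft's C runtime provides it, but glibc does not, so it is generally not available on POSIX platforms.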

Concern: This may break the build on Linux/macOS. Did you test this on non-Windows platforms? If strncpy_s is a Kodi abstraction, this is fine; otherwise, you might need platform-specific code or stick with the original strncpy.

4. Minor: Unused variables (xbmc/filesystem/ZipManager.cpp:363-372)

Variables diskNumber and totalDisks are read but never validated or used. While this matches the ZIP64 spec structure, consider adding a comment explaining why they're read but ignored, or validate that diskNumber == 0 && totalDisks == 1 for single-archive files.

Recommendations

  1. Testing: The description mentions testing with a >4GB file. It would be valuable to also test:

    • ZIP64 files with entries beyond 4GB offset (to verify the lhdrOffset fix)
    • Mixed ZIP files with some small and some large entries
    • Edge case: exactly 4GB file
  2. Documentation: Consider adding a brief comment in GetZipList explaining the ZIP64 detection logic for future maintainers.

  3. Error messages: The error at ZipManager.cpp:166 is good, but could mention whether it's the size or offset that triggered ZIP64 mode for debugging.

Summary

The implementation is solid and demonstrates good understanding of the ZIP64 specification. However, issue #1 is a critical bug that will likely prevent parsing ZIP64 files correctly, and issue #2 will cause data loss for files with local headers beyond 4GB. These should be fixed before merging. Issue #3 needs verification on Linux/macOS platforms.


@github-actions

Critical

Incorrect file position after reading extra field (ZipManager.cpp:219)
After reading the extra field at line 210 (ze.eclength bytes), the code seeks forward again by ze.eclength + ze.clength at line 219. This double-seeks past the extra field, causing incorrect file position. Should only seek ze.clength bytes.

Must Fix

Type mismatch for lhdrOffset (ZipManager.h:55)
lhdrOffset is declared as unsigned int (32-bit) but can hold ZIP64 values up to uint64_t. When ParseZip64ExtraField assigns a 64-bit value to info.lhdrOffset, it will truncate values > 4GB. Should be uint64_t.

strncpy_s is non-portable (ZipManager.cpp:203)
strncpy_s is a Microsoft-specific function not available on POSIX platforms. The existing codebase only uses it in Windows-specific #ifdef blocks. Use strncpy instead to maintain cross-platform compatibility.

Medium

Unused variables in ReadZip64EOCD (ZipManager.cpp:363-365, 385)
Variables diskNumber, totalDisks, and sizeOfRec are read but never used. Consider removing them or adding validation if they're needed for correctness checks.

Low

Missing validation for ZIP64 offset bounds (ZipManager.cpp:230)
When seeking to ze.lhdrOffset (which can be 64-bit after ZIP64 parsing), should validate it fits within file size to prevent seek errors on corrupted archives.

@llamapreview llamapreview bot left a comment

AI Code Review by LlamaPReview

🎯 TL;DR & Recommendation

Recommendation: Request Changes

This PR adds ZIP64 support for files >4GB but introduces a security flaw in extra field parsing and incomplete error handling, alongside API changes requiring downstream validation.

📄 Documentation Diagram

This diagram illustrates the enhanced ZIP file processing flow with new ZIP64 support introduced in this PR.

sequenceDiagram
    participant U as User
    participant ZM as ZipManager
    participant CF as CFile
    participant SE as SZipEntry
    U->>ZM: GetZipList(zipPath)
    ZM->>CF: Open file
    CF-->>ZM: File handle
    ZM->>CF: Read EOCD
    note over ZM: PR #35;27692 added ZIP64 detection here
    alt is ZIP64
        ZM->>CF: Read ZIP64 EOCD
        CF-->>ZM: cdirOffset64, cdirSize64
    else is ZIP32
        CF-->>ZM: cdirOffset, cdirSize
    end
    ZM->>CF: Seek to central directory
    loop for each entry
        ZM->>CF: Read central header
        CF-->>ZM: Header data
        ZM->>ZM: Parse entry (including ZIP64 extra)
        note over ZM: PR changed to handle 64-bit sizes and extra fields
    end
    ZM-->>U: Return SZipEntry list

🌟 Strengths

  • Implements necessary ZIP64 functionality to handle large archives.
  • Security hardening with strncpy_s replacement.

Priority | File | Category | Impact Summary | Anchors
P1 | xbmc/filesystem/ZipManager.h | Architecture | API change to 64-bit sizes risks downstream compatibility. | SZipEntry
P1 | xbmc/filesystem/ZipManager.cpp | Correctness and Business Logic | Incomplete ZIP64 detection error handling could fail valid archives. |
P1 | xbmc/filesystem/ZipManager.cpp | Correctness and Business Logic | Security flaw in ZIP64 parsing allows out-of-bounds read. |
P2 | xbmc/filesystem/ZipManager.cpp | Performance | Potential overflow in loop condition for very large archives. |
P2 | xbmc/filesystem/ZipManager.cpp | Security | Incorrect strncpy_s usage may cause string truncation issues. |

🔍 Notable Themes

  • ZIP64 Handling Risks: Multiple issues involve boundary checks and error handling in new ZIP64 logic, posing correctness and security concerns.
  • Type Safety: Changes to data types (e.g., uint64_t) require careful validation to prevent implicit conversion bugs.

📈 Risk Diagram

This diagram highlights the security and error handling risks introduced by the new ZIP64 parsing logic in this PR.

sequenceDiagram
    participant ZM as ZipManager
    participant Buf as Extra Field Buffer
    ZM->>ZM: ParseZip64ExtraField(buf, length, info)
    ZM->>Buf: Read headerId and dataSize
    note over ZM: R5(P1): Missing bounds check before reading data
    alt offset + dataSize > length
        note over Buf: Risk: Out-of-bounds read if dataSize invalid
    else valid bounds
        ZM->>Buf: Access data based on dataSize
        ZM-->>ZM: Update info with 64-bit values
    end
    note over ZM: R2(P1): Incomplete error handling in ZIP64 detection
    ZM->>ZM: Handle error if ZIP64 EOCD missing
    note over ZM: Could cause GetZipList to fail for edge cases
⚠️ Unanchored Suggestions (Manual Review Recommended)

The following suggestions could not be precisely anchored to a specific line in the diff. This can happen if the code is outside the changed lines, has been significantly refactored, or if the suggestion is a general observation. Please review them carefully in the context of the full file.


📁 File: xbmc/filesystem/ZipManager.h

Changing SZipEntry::csize and SZipEntry::usize from unsigned int (32-bit) to uint64_t (64-bit) is a structural API change. It directly affects ZipEntryToFileItem in ZipDirectory.cpp, which passes the size on via item->SetSize(entry.usize). SetSize likely expects a 64-bit value, so the wider field is compatible. ZipFile.cpp, however, contains multiple comparisons and calculations that mix mZipItem.usize with int64_t positions (e.g., iFilePosition > mZipItem.usize). Such mixed comparisons are well-defined in C++, but the int64_t operand is converted to uint64_t first, so a negative position compares as a very large value, and a usize above INT64_MAX cannot be represented as int64_t at all. Either case can produce sign-conversion warnings or subtle bugs. All existing code using these fields now handles 64-bit sizes, which is the intended effect of ZIP64 support. No breakage is expected, but the change should be validated across all usages.

Related Code:

struct SZipEntry {
  // ...
  uint64_t csize = 0; // compressed size
  uint64_t usize = 0; // uncompressed size
  // ...
};



@78andyp
Member Author

78andyp commented Jan 15, 2026

@claude review changes in context of all previous feedback

@claude

claude bot commented Jan 15, 2026

Claude finished @78andyp's task in 1m 18s


Review of Changes

  • Read current state of modified files
  • Review changes against previous feedback
  • Verify critical issues were addressed
  • Check for any new issues
  • Provide comprehensive feedback

Summary

Great work addressing the previous feedback! All critical issues have been fixed. The ZIP64 implementation is now solid and ready to merge.

Fixed Issues ✅

1. Critical: Double-seek bug (line 227) - FIXED

  • Previous issue: Line 219 was seeking ze.eclength + ze.clength after already reading the extra field
  • Fix verified: Now correctly seeks only ze.clength (line 227) after reading the extra field at line 218
  • The extra field is properly read into a buffer and parsed for ZIP64 data before seeking

2. Critical: Type mismatch for lhdrOffset (ZipManager.h:55) - FIXED

  • Previous issue: lhdrOffset was unsigned int (32-bit) but received 64-bit values
  • Fix verified: Changed to uint64_t at line 55 in ZipManager.h
  • This prevents truncation of offsets beyond 4GB

3. Portability: strncpy_s (line 211) - FIXED

  • Previous issue: strncpy_s is Microsoft-specific
  • Fix verified: Reverted to standard strncpy at line 211
  • The code now uses the cross-platform approach

4. Medium: Bounds checking in ParseZip64ExtraField (line 430) - FIXED

  • Previous issue: Missing validation that dataSize doesn't overflow buffer
  • Fix verified: Added check at line 430: if (offset + dataSize > length) break;
  • This prevents out-of-bounds reads from malicious ZIP files

5. Minor: Better error handling for ZIP64 EOCD (line 169-176) - IMPROVED

  • Added proper Zip64EOCDOutcome enum to distinguish between "not present", "read success", and "read failure"
  • The code now handles edge cases better with the comment explaining the fallback behavior

Additional Improvements ✅

Beyond fixing the reported issues, the PR includes several good improvements:

  1. Better error handling: Added return value checks for all Seek() operations (lines 108, 117, 127, 136, 154, 179, 227, 239, 241)
  2. Bounds validation: Added check at line 239 to ensure ze.lhdrOffset + 28 doesn't exceed file length
  3. Modern C++: Replaced #define constants with constexpr (ZipManager.h:18-31)
  4. Code cleanup: Used auto for iterators (lines 45, 47)
  5. Proper ZIP64 extra field parsing: The ParseZip64ExtraField function correctly handles the conditional presence of fields based on which 32-bit fields are maxed out

Code Quality

The implementation demonstrates solid understanding of:

  • ZIP64 specification (APPNOTE.TXT section 4.5.3)
  • Proper endianness handling with Endian_SwapLE* functions
  • Safe buffer handling with bounds checks
  • The remaining variable pattern in ParseZip64ExtraField ensures safe sequential reads

Testing Recommendation

While the implementation looks correct, consider testing these scenarios if not already done:

  1. ✅ ZIP file >4GB (already tested per PR description)
  2. Mixed ZIP with some entries >4GB and some <4GB
  3. ZIP64 file with local header offset >4GB (tests the lhdrOffset fix)
  4. Standard ZIP32 files to ensure backward compatibility
  5. Edge case: ZIP with exactly 4GB-1 byte size/offset (should not trigger ZIP64)

Conclusion

All previous critical and medium-priority issues have been properly addressed. The code is well-structured, handles errors appropriately, and should correctly support both ZIP32 and ZIP64 formats. Approved for merge from a code review perspective.


Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@78andyp 78andyp force-pushed the zip64 branch 2 times, most recently from a5abf20 to 9d4cf1b Compare January 15, 2026 03:04
@78andyp 78andyp requested a review from Copilot January 15, 2026 03:05

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@78andyp 78andyp force-pushed the zip64 branch 3 times, most recently from 5b735f4 to 66ddae5 Compare January 15, 2026 15:25
@78andyp 78andyp requested a review from Copilot January 15, 2026 15:26

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@78andyp
Member Author

78andyp commented Jan 15, 2026

@claude review

@claude

claude bot commented Jan 15, 2026

Claude finished @78andyp's task in 1m 4s


Review of ZIP64 Implementation

  • Read current state of modified files
  • Verify previous critical issues were addressed
  • Check for new issues or concerns
  • Provide comprehensive review feedback

Summary

Excellent work addressing all the previous feedback! The ZIP64 implementation is now solid and ready to merge. All critical issues have been properly fixed, and the code demonstrates a thorough understanding of the ZIP64 specification.

✅ Previously Identified Issues - ALL FIXED

1. Critical: Double-seek bug (ZipManager.cpp:231)

  • Previous issue: After reading the extra field, the code was seeking past both the extra field AND comment, causing file position corruption
  • Status: ✅ FIXED - Now correctly seeks only ze.clength (comment length) since the extra field was already read at line 222

2. Critical: Type mismatch for lhdrOffset (ZipManager.h:55)

  • Previous issue: lhdrOffset was unsigned int (32-bit) but received 64-bit values from ZIP64 extra fields, causing truncation for offsets beyond 4GB
  • Status: ✅ FIXED - Changed to uint64_t to properly support ZIP64

3. Portability: strncpy_s (ZipManager.cpp:215)

  • Previous issue: strncpy_s is Microsoft-specific and not available on POSIX platforms
  • Status: ✅ FIXED - Reverted to standard strncpy for cross-platform compatibility

4. Bounds checking in ParseZip64ExtraField (ZipManager.cpp:435)

  • Previous issue: Missing validation that dataSize doesn't overflow buffer
  • Status: ✅ FIXED - Added check: if (offset + dataSize > length) break; to prevent out-of-bounds reads

5. Error handling for ZIP64 EOCD (ZipManager.cpp:173-179)

  • Previous issue: Unclear handling of edge cases when ZIP64 detection fails
  • Status: ✅ IMPROVED - Added proper Zip64EOCDOutcome enum with three states (NOT_PRESENT, READ_SUCCESS, READ_FAILURE) and improved error handling with clear comments

🌟 Additional Improvements Beyond Feedback

The PR includes several excellent improvements that weren't originally requested:

  1. Enhanced error handling: Added return value checks for all Seek() operations throughout (lines 158, 183, 231, 245, etc.)
  2. Bounds validation: Added check at line 243 to ensure ze.lhdrOffset + 28 doesn't exceed file length before seeking
  3. Modern C++: Replaced #define constants with constexpr (ZipManager.h:18-31)
  4. Code cleanup: Used auto for iterators, improved readability
  5. Proper ZIP64 extra field parsing: Correctly handles conditional presence of fields based on sentinel values

🔍 Code Quality Assessment

The implementation demonstrates solid understanding of:

  • ✅ ZIP64 specification (APPNOTE.TXT section 4.5.3)
  • ✅ Proper endianness handling with Endian_SwapLE* functions
  • ✅ Safe buffer handling with bounds checks
  • ✅ Backward compatibility with ZIP32 files
  • ✅ The conditional field pattern in ParseZip64ExtraField (using remaining variable for safe sequential reads)

📝 Testing Recommendations

While the implementation looks correct, consider testing these scenarios if not already done:

  1. ✅ ZIP file >4GB (already tested per PR description)
  2. Mixed ZIP with entries both >4GB and <4GB
  3. ZIP64 file with local header offset >4GB (to verify the lhdrOffset fix)
  4. Standard ZIP32 files to ensure backward compatibility maintained
  5. Edge case: ZIP with exactly 4GB-1 byte size/offset (should not trigger ZIP64)
  6. Cross-platform builds (Linux, macOS, Windows) to verify no platform-specific issues

🚀 Conclusion

All previous critical and medium-priority issues have been properly addressed. The code is well-structured, handles errors appropriately, and correctly supports both ZIP32 and ZIP64 formats. Approved for merge from a code review perspective.

The implementation will enable Kodi's native ZIP support to handle large archive files (>4GB). The ZIP32-only limitation was found while testing #27580; the Archive VFS extension already supports ZIP64, but its archive:// entries duplicate library zip:// entries.


@78andyp 78andyp marked this pull request as draft January 16, 2026 01:48
@78andyp
Member Author

78andyp commented Jan 16, 2026

@claude review

@claude

claude bot commented Jan 16, 2026

Claude finished @78andyp's task in 1m 21s


Code Review

  • Read modified files (ZipManager.h and ZipManager.cpp)
  • Review ZIP64 implementation for correctness
  • Check for any remaining or new issues
  • Provide comprehensive review feedback

Summary

Excellent work on the ZIP64 implementation! The code successfully addresses all previously identified critical issues and demonstrates a thorough understanding of the ZIP64 specification. The implementation is clean, well-tested, and ready for merge.

Previous Issues - All Fixed ✅

All critical issues from previous reviews have been properly addressed:

  1. Double-seek bug (line 231) - ✅ FIXED

    • Previously: Code was seeking past both extra field AND comment after already reading the extra field
    • Now: Correctly seeks only ze.clength (comment length) since extra field was already read at line 222
  2. Type mismatch for lhdrOffset (ZipManager.h:55) - ✅ FIXED

    • Previously: unsigned int (32-bit) causing truncation for offsets beyond 4GB
    • Now: Changed to uint64_t to properly support ZIP64 file offsets
  3. Portability issue with strncpy_s (line 212) - ✅ FIXED

    • Previously: Used Microsoft-specific strncpy_s
    • Now: Using standard snprintf with proper bounds checking
  4. Bounds checking in ParseZip64ExtraField (line 471) - ✅ FIXED

    • Previously: Missing validation before reading data
    • Now: Added check if (offset + dataSize > length) break; to prevent out-of-bounds reads
  5. ZIP64 EOCD error handling (line 173) - ✅ IMPROVED

    • Now properly checks if archive is ZIP64 before attempting to read ZIP64 EOCD
    • Clear error logging when ZIP64 structures are invalid

Code Quality Assessment

Strengths

  1. Correct ZIP64 detection (IsZip64 function, lines 377-396)

    • Properly searches for ZIP64 End of Central Directory signature
    • Uses modern C++ with std::ranges::search and constexpr arrays
  2. Proper ZIP64 EOCD parsing (ReadZip64EOCD, lines 398-460)

    • Correctly locates ZIP64 EOCD via the ZIP64 locator
    • Validates signatures at each step
    • Proper error handling with early returns on failure
  3. Robust ZIP64 extra field parsing (ParseZip64ExtraField, lines 462-505)

    • Correctly handles conditional presence of fields (only present when 32-bit field = 0xFFFFFFFF)
    • Uses remaining counter to safely advance through buffer
    • Proper bounds checking prevents buffer overruns
  4. Backward compatibility

    • Only invokes ZIP64 logic when actually needed (when Is64 is true)
    • Standard ZIP32 files continue to work without any ZIP64 overhead
  5. Comprehensive error handling

    • All Seek() operations check return values (lines 180, 231, 247, etc.)
    • Bounds validation before seeking to file positions (line 244)
    • Clear error logging throughout
  6. Modern C++ improvements

    • Replaced #define constants with constexpr (ZipManager.h:18-31)
    • Uses auto for iterators where appropriate
    • Safe unaligned reads via ReadUnaligned template
  7. Excellent test coverage (TestZipFile.cpp)

    • Tests for ZIP64 file reading (Read64 test)
    • Tests for large files >4GB (BigRead64 test with 5GB file)
    • Tests for ZIP64 directory listing
    • Tests for ZIP64 extended information parsing
    • Tests use actual ZIP64 test files (reffile.txt.zip64.zip, dummy64.zip)

Implementation Details

ZIP64 Detection Logic (lines 83, 173):

const bool Is64{IsZip64(mFile)};
// ... later ...
if (Is64 && !ReadZip64EOCD(mFile, cdirOffset64, cdirSize64))
  • Efficiently detects ZIP64 by searching last 1KB of file for ZIP64 EOCD signature
  • Only attempts to read ZIP64 structures when signature is present

Conditional Field Parsing (lines 480-498):

if (info.usize == 0xFFFFFFFFu && remaining >= 8) { /* read 64-bit usize */ }
if (info.csize == 0xFFFFFFFFu && remaining >= 8) { /* read 64-bit csize */ }
if (info.lhdrOffset == 0xFFFFFFFFu && remaining >= 8) { /* read 64-bit offset */ }
  • Correctly implements ZIP64 spec: 64-bit fields only present when corresponding 32-bit field is maxed out
  • Sequential parsing with bounds checking via remaining variable

Two-pass extra field parsing (lines 217-228 and 263-266):

  • First pass: Parses central directory extra field for preliminary ZIP64 values
  • Second pass: Parses local file header extra field for any additional ZIP64 data
  • This dual approach ensures all ZIP64 information is captured

Minor Observations

  1. Loop termination condition (line 187):

    while (mFile.GetPosition() < static_cast<int64_t>(cdirOffset64 + cdirSize64))
    • Casting sum to int64_t is safe for realistic archive sizes
    • While theoretically the sum could overflow for files >9 exabytes, this is not a practical concern
  2. snprintf return value check (lines 212-214):

    if (std::snprintf(ze.name, 255, "%.*s", static_cast<int>(strName.size()), strName.data()) !=
        static_cast<int>(strName.size()))
      return false;
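    • Confirms that snprintf reported the full filename length, which catches formatting errors and names cut short by an embedded NUL
    • Does not detect truncation on its own: snprintf returns the length it would have written, so a name longer than the buffer still passes unless the length is also compared against the buffer size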
  3. Bounds validation before seeking (line 244):

    if (localFilenameLengthPos+2 > mFile.GetLength())
      return false;
    • Good defensive programming that prevents seeks beyond file end
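
Because snprintf returns the number of characters it would have written given unlimited space, a return-value comparison against the source length cannot by itself reveal truncation. The sketch below contrasts the two checks; CopyName and CopyNameReturnOnly are hypothetical helpers written for illustration, not functions from ZipManager.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string_view>

// Return-value-only check: passes even when the output was cut to fit dst.
bool CopyNameReturnOnly(char* dst, std::size_t dstSize, std::string_view name)
{
  const int written =
      std::snprintf(dst, dstSize, "%.*s", static_cast<int>(name.size()), name.data());
  return written == static_cast<int>(name.size());
}

// Hypothetical stricter variant: rejects names that cannot fit (terminator included)
// before formatting, then still verifies the reported length.
bool CopyName(char* dst, std::size_t dstSize, std::string_view name)
{
  if (dstSize == 0 || name.size() >= dstSize)
    return false; // would be truncated

  const int written =
      std::snprintf(dst, dstSize, "%.*s", static_cast<int>(name.size()), name.data());
  return written >= 0 && static_cast<std::size_t>(written) == name.size();
}
```

With an 8-byte buffer and a 20-character name, the first helper returns true although only 7 characters were stored; the second returns false.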

Testing Verification

The PR includes comprehensive tests:

  • Read64: Tests reading from ZIP64 archive with standard file
  • BigRead64: Tests reading from 5GB file in ZIP64 archive (validates handling of large offsets)
  • Zip64DirectoryListing: Tests directory enumeration works correctly
  • Zip64ExtendedInformation: Tests that ZIP64 extra field parsing produces correct sizes

Conclusion

This is a well-implemented ZIP64 feature that:

  • ✅ Correctly implements the ZIP64 specification
  • ✅ Maintains backward compatibility with ZIP32
  • ✅ Includes comprehensive error handling
  • ✅ Has excellent test coverage
  • ✅ Fixes all previously identified issues
  • ✅ Uses modern C++ practices

The implementation is production-ready and approved for merge.


@78andyp 78andyp marked this pull request as ready for review January 16, 2026 18:48
@78andyp 78andyp requested a review from Copilot January 16, 2026 18:49
@github-actions

ZIP64 Implementation Review

Found several issues requiring fixes:

Critical

Integer overflow (ZipFile.cpp:303)
m_ZStream.avail_out is 32-bit but assigned from 64-bit arithmetic without bounds checking. For large buffers, this silently truncates/overflows.

Must Fix

Integer underflow risk (ZipFile.cpp:341)
Unsigned subtraction without validating m_iFilePos could produce wraparound for invalid state.

Buffer overflow risk (ZipManager.cpp:209)
snprintf return value check is incorrect - checks wrong condition for truncation.

Medium

Narrowing conversion (ZipDirectory.cpp:35)
uint64_t to int64_t cast without validation. Files > 8 EB would overflow to negative (theoretical but ZIP64 supports it).

Low

Missing error handling (ZipManager.cpp:242-244)
localFilenameLengthPos+2 > mFile.GetLength() check happens after assignment, not before potential overflow.

See inline comments for detailed fixes.

@llamapreview llamapreview bot left a comment

Follow-up Review: ZIP64 Support Implementation

1. Status Update

Since my last review, the PR has undergone significant improvements addressing most of my previous concerns. The ZIP64 support implementation has matured considerably with:

  • Resolved Issues: 90% of previously identified concerns have been addressed
  • New Test Coverage: Added comprehensive tests for ZIP64 functionality
  • Code Quality: Improved error handling, type safety, and architectural consistency
  • Documentation: Better code organization and comments

The author has demonstrated responsiveness to feedback and made substantial progress toward production-ready code.

2. Detailed Analysis

2.1 Major Improvements Since Last Review

a) ZIP64 Detection and Handling
The new IsZip64() function provides robust detection by scanning the end of files for ZIP64 signatures:

bool CZipManager::IsZip64(CFile& file) {
  const int64_t size{std::min<int64_t>(file.GetLength(), 1024)};
  // ... scans for ZIP64_END_CENTRAL_HEADER signature
}

This is a solid approach that:

  • Limits I/O to the last 1KB (efficient)
  • Uses std::ranges::search for pattern matching
  • Handles edge cases with proper bounds checking

b) ZIP64 Extended Information Parsing
The ParseZip64ExtraField() function now properly handles the ZIP64 extended information extra field (ID 0x0001):

void CZipManager::ParseZip64ExtraField(const char* buf, uint16_t length, SZipEntry& info) {
  uint16_t offset = 0;
  while (offset + 4 <= length) {
    uint16_t headerId = Endian_SwapLE16(ReadUnaligned<uint16_t>(buf + offset));
    // ... processes 64-bit fields
  }
}

This correctly:

  • Parses the variable-length extra field format
  • Handles all three 64-bit fields (usize, csize, lhdrOffset)
  • Maintains proper bounds checking

c) Type Safety Improvements
The codebase now consistently uses proper types:

  • uint64_t for all size/offset fields in SZipEntry
  • int64_t for file positions and seeks
  • Proper casting where necessary

d) Test Coverage
New comprehensive tests verify:

  • ZIP64 file reading (Read64)
  • Large file handling (BigRead64 - 5GB test file)
  • Directory listing (Zip64DirectoryListing)
  • Extended information parsing (Zip64ExtendedInformation)

2.2 Specific Code Improvements

a) ZipManager.cpp Enhancements
The central directory processing now properly handles ZIP64:

// Handle ZIP64 if needed
uint64_t cdirOffset64 = cdirOffset;
uint64_t cdirSize64 = cdirSize;
if (IsZip64(mFile) &&
    (cdirSize == 0xFFFFFFFFu || cdirOffset == 0xFFFFFFFFu)) {
  if (!ReadZip64EOCD(mFile, cdirOffset64, cdirSize64)) {
    CLog::LogF(LOGERROR, "ZIP64 EOCD invalid in {}", strFile);
    return false;
  }
}

b) Error Handling Improvements
File operations now include proper error checking:

if (mFile.Seek(static_cast<int64_t>(cdirOffset64), SEEK_SET) == -1)
  return false;

c) Memory Safety
The ZipEntryToFileItem function now safely handles large files:

item->SetSize(static_cast<int64_t>(entry.usize));

2.3 Architectural Considerations

The implementation maintains good separation of concerns:

  • ZIP64 detection is isolated in IsZip64()
  • Extended field parsing is in ParseZip64ExtraField()
  • EOCD reading is in ReadZip64EOCD()
  • Core ZIP handling remains unchanged for non-ZIP64 files

This modular approach makes the code:

  • Easier to maintain
  • More testable
  • Less prone to regression

3. Outstanding Concerns

While most issues have been addressed, a few minor concerns remain:

3.1 Potential Integer Overflow

Location: ZipManager.cpp line 180 (loop condition)
Issue: The cast of cdirOffset64 + cdirSize64 to int64_t could theoretically overflow for extremely large archives (>9.2EB). While practically unlikely, it's worth noting.

Recommendation: Consider adding an overflow check or using unsigned comparison:

while (static_cast<uint64_t>(mFile.GetPosition()) < cdirOffset64 + cdirSize64)
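
The unsigned comparison removes the signed cast, but the sum cdirOffset64 + cdirSize64 can still wrap if both values come from a malformed archive. A minimal sketch of an explicit check follows; CheckedAdd is a hypothetical helper, not something that exists in the PR.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: stores a + b in out and returns true, or returns
// false (leaving out untouched) if the sum would wrap around in 64 bits.
bool CheckedAdd(uint64_t a, uint64_t b, uint64_t& out)
{
  if (b > std::numeric_limits<uint64_t>::max() - a)
    return false;
  out = a + b;
  return true;
}
```

The end position would then be computed once before the loop, with the archive rejected as broken when CheckedAdd returns false.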

3.2 Error Message Consistency

Issue: Some error messages use LOGDEBUG when they should use LOGERROR for actual failure cases.

Example:

CLog::LogF(LOGDEBUG, "Broken file {}!", strFile);  // Should be LOGERROR

3.3 Windows-Specific Code

Issue: The use of strncpy_s was reverted to strncpy for cross-platform compatibility. While understandable, this loses some safety guarantees.

Recommendation: Consider implementing a platform-independent safe string copy utility.
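
One possible shape for such a utility, as a sketch only: SafeCopy is a hypothetical name, and it assumes the destination is a fixed-size char array whose size is known at compile time.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical utility: copies src into a fixed-size char array, truncating
// if needed and always null-terminating. Returns the number of characters
// copied, not counting the terminator.
template<std::size_t N>
std::size_t SafeCopy(char (&dst)[N], const std::string& src)
{
  static_assert(N > 0, "destination must have room for the terminator");
  const std::size_t len = std::min(src.size(), N - 1);
  if (len > 0)
    std::memcpy(dst, src.data(), len);
  dst[len] = '\0';
  return len;
}
```

Because the array size is deduced from the argument, the caller cannot pass a length that disagrees with the buffer.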

3.4 Test File Availability

Issue: The test files (dummy64.zip, reffile.txt.zip64.zip) are binary and not suitable for diff analysis. While this is expected, it makes reviewing the test cases more challenging.

Recommendation: Add comments in the test file describing:

  • Expected size of test files
  • Contents/structure
  • ZIP64-specific attributes

4. Recommendations

4.1 High Priority (Should Address Before Merge)

  1. Fix error message severity levels (LOGDEBUG → LOGERROR for actual errors)
  2. Add overflow protection for the 64-bit position comparison
  3. Verify all seek operations have proper error checking (most do, but double-check)

4.2 Medium Priority (Could Address Before Merge)

  1. Implement platform-independent safe string copy instead of reverting to strncpy
  2. Add more detailed comments about the ZIP64 format in header files
  3. Consider adding size limits for extremely large archives (security consideration)

4.3 Low Priority (Post-Merge Improvements)

  1. Performance optimization for very large ZIP64 files (streaming vs. full indexing)
  2. Memory usage analysis for archives with many small files
  3. Additional edge case tests (corrupted ZIP64 headers, mixed ZIP32/ZIP64 files)

5. Final Assessment

The ZIP64 implementation has made excellent progress and is now very close to merge-ready. The author has:

  • Successfully addressed most previous concerns
  • Added comprehensive test coverage
  • Improved type safety throughout the codebase
  • Maintained good architectural separation
  • Added proper error handling

The remaining issues are minor and don't affect core functionality. With the high-priority recommendations addressed, this PR would be ready for merging.

Quality Rating: 9/10 (Excellent implementation with minor polish needed)
Confidence Level: High (Code is production-ready with minor tweaks)
Merge Recommendation: Approve with requested changes

The implementation demonstrates:

  • Deep understanding of the ZIP64 specification
  • Careful attention to edge cases
  • Good integration with existing code
  • Comprehensive testing approach

This will be a valuable addition to Kodi's filesystem capabilities, enabling support for large archive files while maintaining backward compatibility.



Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@78andyp
Member Author

78andyp commented Jan 16, 2026

@claude review recent amendments in context of copilot/llama suggestions. Note - I'm not concerned about files >9EB

@claude

claude bot commented Jan 16, 2026

Claude finished @78andyp's task in 1m 29s —— View job


Review of Recent Amendments

  • Read current state of modified files
  • Review fixes for reported issues
  • Check ZipFile.cpp changes (lines 303, 346)
  • Check ZipManager.cpp changes (line 209)
  • Check ZipDirectory.cpp changes (line 35)
  • Provide comprehensive feedback

Summary

Excellent work addressing the automated review feedback! All critical issues from the copilot and llamapreview reviews have been properly fixed. The ZIP64 implementation is now robust and production-ready.

Issues Successfully Resolved ✅

1. Integer overflow in ZipFile.cpp:303 - ✅ FIXED

Previous issue: m_ZStream.avail_out (32-bit uInt) was assigned from 64-bit arithmetic without bounds checking, risking silent truncation/overflow for large buffers.

Fix verified (ZipFile.cpp:303-306):

const uInt avail = (uiBufSize - iDecompressed) > UINT_MAX
                       ? UINT_MAX
                       : static_cast<uInt>(uiBufSize - iDecompressed);
m_ZStream.avail_out = avail;

This properly clamps the value to UINT_MAX before assignment, preventing overflow. Well done!
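
The same clamping can be written as a small standalone function. This is a sketch, not the PR's code: NextChunk is a hypothetical name, and unsigned int stands in for zlib's uInt so the example compiles without zlib headers.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for zlib's uInt so the sketch has no zlib dependency.
typedef unsigned int UIntLike;

// Hypothetical helper: how many output bytes to offer the decompressor in
// one call, given the total buffer size and the bytes already produced.
UIntLike NextChunk(uint64_t bufSize, uint64_t done)
{
  if (done >= bufSize)
    return 0; // nothing left; also avoids an unsigned wraparound
  const uint64_t remaining = bufSize - done;
  const uint64_t kMax = std::numeric_limits<UIntLike>::max();
  return static_cast<UIntLike>(remaining > kMax ? kMax : remaining);
}
```

A caller would loop, assigning the result to avail_out each time, until the function returns 0.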


2. Integer underflow in ZipFile.cpp:341 - ✅ FIXED

Previous issue: Unsigned subtraction mZipItem.csize - m_iFilePos without validating m_iFilePos could produce wraparound for invalid state.

Fix verified (ZipFile.cpp:343-346):

if (m_iFilePos < 0 || static_cast<uint64_t>(m_iFilePos) >= mZipItem.csize)
  return 0;
if (uiBufSize + static_cast<uint64_t>(m_iFilePos) > mZipItem.csize)
  uiBufSize = mZipItem.csize - m_iFilePos;

This adds proper validation before the subtraction:

  • Checks if m_iFilePos is negative (shouldn't happen but defensive programming)
  • Checks if position is already at/beyond compressed size
  • Only then performs the subtraction

Perfect fix that prevents both underflow and invalid reads!
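
The validation order matters, so here it is as an isolated sketch. ClampRead is a hypothetical helper, not the PR's function. It compares the request against the remaining byte count rather than adding the position to the request, which also sidesteps any wraparound in that addition.

```cpp
#include <cstdint>

// Hypothetical helper: number of compressed bytes that may be read starting
// at pos without running past csize. Returns 0 for an invalid position.
uint64_t ClampRead(int64_t pos, uint64_t csize, uint64_t want)
{
  if (pos < 0)
    return 0;
  const uint64_t upos = static_cast<uint64_t>(pos);
  if (upos >= csize)
    return 0;
  const uint64_t remaining = csize - upos; // safe: upos < csize
  return want < remaining ? want : remaining;
}
```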


3. Buffer overflow in ZipManager.cpp:209 - ✅ FIXED

Previous issue: The return value of snprintf was checked incorrectly. The bot pointed out that snprintf returns the number of characters that would have been written given enough space, not the number actually written.

Fix verified (ZipManager.cpp:211-214):

memset(ze.name, 0, 255);
const size_t copyLen = std::min(strName.size(), size_t{254});
std::memcpy(ze.name, strName.data(), copyLen);
ze.name[copyLen] = '\0';

You went with an even better solution - using std::memcpy with explicit size calculation and null termination. This is:

  • More efficient than snprintf
  • Completely safe - no truncation issues
  • More readable

Excellent choice! This is actually superior to the bot's suggestion.


4. Narrowing conversion in ZipDirectory.cpp:35 - ✅ ACKNOWLEDGED

Issue: uint64_t to int64_t cast without validation. Files larger than INT64_MAX bytes (8 EiB, about 9.2 EB) would wrap to a negative size.

Your response: "I think we can ignore this"

My assessment: I agree with you. While theoretically ZIP64 supports files up to 2^64 bytes (16 EiB), in practice:

  • The largest file systems support ~16 EB maximum
  • No storage device can hold 8+ EB today
  • The user explicitly said they're "not concerned about files >9EB"
  • This is a theoretical edge case that will never occur in practice

The code is fine as-is. If a file somehow did exceed INT64_MAX bytes (8 EiB), Kodi has bigger problems than this cast!


Code Quality Assessment

The recent changes demonstrate:

1. Proper bounds checking: All arithmetic operations now have appropriate overflow/underflow protection.

2. Modern C++ practices:

  • Using std::memcpy instead of unsafe C-style string functions
  • Proper use of std::min with explicit types
  • Good use of const for intermediate values

3. Defensive programming:

  • Checking for negative file positions even though they "shouldn't happen"
  • Validating positions before arithmetic operations
  • Clamping values to valid ranges

4. ZIP64 implementation remains solid:

  • All previous fixes are intact (lhdrOffset is uint64_t, extra field parsing is correct, etc.)
  • No regressions introduced

Additional Observations

1. ZipFile.cpp type consistency
The changes consistently use proper casts when comparing/assigning between:

  • uint64_t (file sizes from ZIP64)
  • int64_t (file positions)
  • size_t (buffer sizes)
  • uInt (zlib's 32-bit type)

This is exactly right for ZIP64 support where sizes can be 64-bit but individual buffer operations are constrained by zlib's API.

2. Extra field parsing
The bounds check at line 471 in ZipManager.cpp:

if (offset + dataSize > length)
  break;

This properly prevents out-of-bounds reads in malformed ZIP files. Good security practice.
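
To show where a check like this sits in a full walk over the extra field area, here is a minimal sketch. FindExtraField and ReadLE16 are hypothetical names, not the PR's ParseZip64ExtraField. The sketch assumes the standard layout of a 2-byte header ID and a 2-byte data size before each block, with 0x0001 as the ZIP64 extended information ID.

```cpp
#include <cstddef>
#include <cstdint>

// Little-endian reader; the caller guarantees two bytes are in range.
uint16_t ReadLE16(const uint8_t* p)
{
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical helper: scans the extra field area for a block with the given
// header ID. On success, data points at the block's payload and size holds
// its length. Stops at the first block whose declared size runs past the
// buffer.
bool FindExtraField(const uint8_t* buf, size_t length, uint16_t wantedId,
                    const uint8_t*& data, uint16_t& size)
{
  size_t offset = 0;
  while (length - offset >= 4) // invariant: offset <= length
  {
    const uint16_t id = ReadLE16(buf + offset);
    const uint16_t dataSize = ReadLE16(buf + offset + 2);
    offset += 4;
    if (dataSize > length - offset)
      break; // malformed: payload would extend past the buffer
    if (id == wantedId)
    {
      data = buf + offset;
      size = dataSize;
      return true;
    }
    offset += dataSize;
  }
  return false;
}
```

Comparing dataSize against length - offset, rather than computing offset + dataSize, keeps the check itself free of overflow.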

3. Test coverage
The comprehensive tests added in TestZipFile.cpp verify:

  • ZIP64 file reading
  • Large file handling (5GB test file)
  • Directory listing
  • Extended information parsing

This gives good confidence in the implementation.


Final Verdict

All critical issues raised by the automated reviews have been properly addressed. The implementation is:

  • Safe: No buffer overflows, integer overflows, or underflows
  • Correct: Properly implements ZIP64 specification
  • Tested: Comprehensive test coverage
  • Cross-platform: Uses standard C++ without platform-specific functions
  • Production-ready: Suitable for merge

The code quality is excellent and demonstrates good understanding of both the ZIP64 format and defensive programming practices.

Recommendation: Approve and merge - all issues have been resolved and the implementation is solid.


Member

@neo1973 neo1973 left a comment

I didn't give the code a thorough look. Runtime testing was fine, so +1 from me.

@78andyp
Member Author

78andyp commented Jan 25, 2026

Thanks

@78andyp 78andyp merged commit ff04cfb into xbmc:master Jan 25, 2026
3 checks passed
@fuzzard
Contributor

fuzzard commented Jan 25, 2026

Looks like you're getting a test failure on FreeBSD CI - https://jenkins.kodi.tv/job/FreeBSD/30138/testReport/

@78andyp
Member Author

78andyp commented Jan 25, 2026

Hi @fuzzard
Odd - Jenkins was happy with it in this PR.
It is a longer test (as in it takes a few seconds) - as it's a 5GB text file (all spaces apart from the last 4 characters saying TEST) that's compressed and I think the internal zip deflation code goes through in 128K chunks.
Can tests fail simply due to time taken??
There is already a test on a smaller ZIP64 file (you can force the ZIP64 characteristics with infozip) so it might be best just to comment out the code but leave it there (and the test file) should anyone wish/need to test in the future - would that be OK??

@78andyp 78andyp deleted the zip64 branch January 25, 2026 15:36