@derduher
Collaborator

Summary

This PR addresses multiple security vulnerabilities in the sitemap index parsing and generation functionality by adding comprehensive validation and security checks.

Security Fixes

🔴 HIGH: Protocol Injection Prevention

  • Added URL validation to prevent javascript:, data:, file:, and ftp: protocol injection attacks
  • Uses centralized validateURL() for consistent security enforcement
  • Enforces http/https protocol restriction
  • Validates URL format and structure
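A minimal sketch of the protocol allowlist described above. `isAllowedSitemapUrl` and `ALLOWED_PROTOCOLS` are hypothetical names used for illustration; this is not the library's `validateURL()`, whose internals are not shown in this PR text. It relies only on the standard WHATWG `URL` parser.

```typescript
// Hypothetical helper: NOT the library's validateURL(), only an illustration
// of an http/https allowlist built on the standard WHATWG URL parser.
const ALLOWED_PROTOCOLS = ["http:", "https:"];

function isAllowedSitemapUrl(raw: string): boolean {
  let parsed: URL;
  try {
    // Throws a TypeError when the string is not an absolute URL.
    parsed = new URL(raw);
  } catch {
    return false;
  }
  // URL.protocol is lowercased and includes the trailing colon.
  return ALLOWED_PROTOCOLS.indexOf(parsed.protocol) !== -1;
}

const samples = [
  "https://example.com/sitemap-1.xml",
  "javascript:alert(1)",
  "data:text/html,<script>alert(1)</script>",
  "file:///etc/passwd",
  "ftp://example.com/sitemap.xml",
  "not a url",
];

// Only the https entry passes the allowlist.
const accepted = samples.filter(isAllowedSitemapUrl);
console.log(accepted);
```

Checking the parsed `protocol` rather than a string prefix avoids case tricks such as `JavaScript:`, because the parser normalizes the scheme before the comparison.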

🟡 MEDIUM: URL Length DoS Protection

  • Enforced 2048 character URL limit per sitemaps.org specification
  • Prevents resource exhaustion attacks via excessively long URLs
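The length check can be sketched as below. `withinUrlLengthLimit` is a hypothetical name, and treating a URL of exactly 2048 characters as acceptable is an assumption based on the "at limit" test mentioned later in this description; the library's actual boundary handling may differ.

```typescript
// Limit quoted in the PR description.
const MAX_URL_LENGTH = 2048;

// Hypothetical helper. String.length counts UTF-16 code units, which is an
// approximation of "characters" for URLs containing non-BMP code points.
function withinUrlLengthLimit(raw: string, max: number = MAX_URL_LENGTH): boolean {
  return raw.length <= max;
}

const base = "https://example.com/";
const atLimit = base + "a".repeat(MAX_URL_LENGTH - base.length);
const overLimit = atLimit + "a";

console.log(withinUrlLengthLimit(atLimit), withinUrlLengthLimit(overLimit));
```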

🟡 MEDIUM: Memory Exhaustion Protection

  • Added maxEntries parameter to parseSitemapIndex() with default limit of 50,000 entries
  • Prevents DoS attacks via maliciously large sitemap indexes
  • Configurable limit for different use cases
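To illustrate the idea behind `maxEntries`, here is a sketch of a bounded collector. `BoundedEntryList` and `IndexEntry` are hypothetical; whether `parseSitemapIndex()` throws, rejects, or stops reading once the limit is hit is not stated here, so the throw below is an assumption.

```typescript
// Hypothetical entry shape for illustration.
interface IndexEntry {
  url: string;
  lastmod?: string;
}

// Hypothetical collector: refuses to grow past a configurable cap so that a
// hostile sitemap index cannot force unbounded memory use.
class BoundedEntryList {
  private readonly entries: IndexEntry[] = [];
  private readonly maxEntries: number;

  constructor(maxEntries: number = 50000) {
    this.maxEntries = maxEntries;
  }

  add(entry: IndexEntry): void {
    if (this.entries.length >= this.maxEntries) {
      // Assumption: exceeding the cap is reported as an error.
      throw new Error(`Sitemap index exceeds ${this.maxEntries} entries`);
    }
    this.entries.push(entry);
  }

  toArray(): IndexEntry[] {
    return this.entries.slice();
  }
}

const list = new BoundedEntryList(2);
list.add({ url: "https://example.com/sitemap-1.xml" });
list.add({ url: "https://example.com/sitemap-2.xml" });

let rejected = false;
try {
  list.add({ url: "https://example.com/sitemap-3.xml" });
} catch {
  rejected = true;
}
console.log(list.toArray().length, rejected);
```

Checking the count before pushing means the cap is enforced while parsing, not after the whole document has already been buffered.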

🟡 MEDIUM: Inconsistent Validation Fixed

  • Replaced basic URL validation in stream with centralized validateURL()
  • Ensures consistent security policies across all code paths

🟢 LOW-MEDIUM: Date Format Validation

  • Added ISO 8601 date format validation for lastmod fields
  • Prevents arbitrary text injection in date fields
  • Ensures XML schema compliance
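A rough sketch of shape-checking a lastmod value. `LASTMOD_PATTERN` is a hypothetical pattern written for this example; it is not the contents of `LIMITS.ISO_DATE_REGEX`, which this description does not show. It accepts a full date, optionally followed by a time with a mandatory timezone designator.

```typescript
// Hypothetical pattern, NOT the library's LIMITS.ISO_DATE_REGEX.
// Accepts YYYY-MM-DD, optionally followed by Thh:mm[:ss[.fraction]] and a
// timezone designator (Z or an offset such as +02:00).
const LASTMOD_PATTERN =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$/;

function isValidLastmod(value: string): boolean {
  return LASTMOD_PATTERN.test(value);
}

console.log(isValidLastmod("2025-10-15")); // true
console.log(isValidLastmod("2025-10-15T16:52:00Z")); // true
console.log(isValidLastmod("yesterday")); // false
console.log(isValidLastmod("2025-10-15<script>")); // false
```

A pattern like this checks the shape only: a value such as `2025-13-45` would still match, so a calendar check would have to be added separately if that matters.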

🟢 LOW: Empty URL Leakage Fixed

  • Fixed bug where items with failed validation were pushed with empty URLs
  • Invalid entries are now properly filtered out
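The fix can be pictured roughly as follows. `IndexEntry`, `isHttpUrl`, and `flushEntry` are hypothetical stand-ins; the real closetag handler and its validator are not reproduced in this description.

```typescript
// Hypothetical shapes and names for illustration only.
interface IndexEntry {
  url: string;
  lastmod?: string;
}

// Stand-in validator: accepts only absolute http/https URLs.
function isHttpUrl(raw: string): boolean {
  return /^https?:\/\/\S+$/i.test(raw);
}

// Called when a <sitemap> element closes: emit the entry only if it has a
// usable URL, so failed validation never leaks an empty-URL item downstream.
function flushEntry(pending: IndexEntry, out: IndexEntry[]): void {
  if (pending.url !== "") {
    out.push(pending);
  }
}

const parsed: IndexEntry[] = [];
for (const loc of ["https://example.com/sitemap-1.xml", "javascript:alert(1)"]) {
  // Failed validation leaves the URL empty instead of storing the bad value.
  const pending: IndexEntry = { url: isHttpUrl(loc) ? loc : "" };
  flushEntry(pending, parsed);
}
console.log(parsed.length);
```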

Changes

Modified Files

lib/sitemap-index-parser.ts

  • Added URL validation in text/cdata handlers using validateURL()
  • Added date format validation for lastmod fields using LIMITS.ISO_DATE_REGEX
  • Added check to skip items with invalid/empty URLs in closetag handler
  • Imported validateURL and LIMITS from validation/constants modules

lib/sitemap-index-stream.ts

  • Replaced basic new URL() check with centralized validateURL()
  • Improved error message formatting for consistency
  • Imported validateURL from validation module

tests/sitemap-index-security.test.ts (NEW)

  • 27 comprehensive security tests covering:
    • Protocol injection attacks (parser & stream, both WARN and THROW modes)
    • URL length limits (exceeding limit, at limit)
    • Date format validation (invalid formats, valid ISO 8601)
    • Memory exhaustion (50,001 entries test, custom limits)
    • CDATA handling security
    • Error level handling (SILENT, WARN, THROW)
    • Empty and malformed URL filtering

Test Results

✅ All 356 tests passing

  • Coverage maintained above 90%
  • New security tests validate all attack vectors
  • Existing tests confirm backward compatibility

Backward Compatibility

✅ 100% backward compatible

  • Default behavior unchanged (WARN level)
  • Invalid entries are filtered in WARN mode (existing behavior)
  • THROW mode properly rejects invalid data when explicitly requested
  • New maxEntries parameter is optional with sensible default (50,000)
  • Error messages maintain expected format for existing tests
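The difference between the three error levels can be sketched like this. The `Level` type is a local stand-in for the library's `ErrorLevel`, the URL check is a deliberately crude placeholder, and the warning text is invented for the example; none of it is taken from the actual implementation.

```typescript
// Local stand-in for the library's ErrorLevel values (an assumption).
type Level = "silent" | "warn" | "throw";

// Report an invalid entry according to the configured level.
function reportInvalid(level: Level, message: string, warnings: string[]): void {
  if (level === "throw") {
    throw new Error(message);
  }
  if (level === "warn") {
    warnings.push(message);
  }
  // "silent": the entry is dropped without any report.
}

// Placeholder validation: keep only strings that start with http:// or https://.
function keepValid(urls: string[], level: Level, warnings: string[]): string[] {
  return urls.filter((u) => {
    const ok = /^https?:\/\//i.test(u);
    if (!ok) {
      reportInvalid(level, `Invalid URL in sitemap index: ${u}`, warnings);
    }
    return ok;
  });
}

const input = ["https://example.com/a.xml", "javascript:alert(1)"];
const warnings: string[] = [];
const kept = keepValid(input, "warn", warnings);
console.log(kept.length, warnings.length);
```

In this sketch the invalid entry is filtered out in both the silent and warn modes; only the throw mode interrupts processing.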

Example Usage

// Default behavior - warns and filters invalid URLs
const items = await parseSitemapIndex(xmlStream);

// Strict mode - throws on invalid data
const stream = new XMLToSitemapIndexStream({ level: ErrorLevel.THROW });

// Custom memory limit - caps the number of parsed entries at 10,000
const limitedItems = await parseSitemapIndex(xmlStream, 10000);

Security Impact

This PR protects against:

  • ✅ XSS attacks via javascript: protocol injection
  • ✅ Local file access via file: protocol
  • ✅ Data URL injection attacks
  • ✅ Memory exhaustion DoS attacks
  • ✅ URL length-based resource exhaustion
  • ✅ Date field injection attacks

Checklist

  • All tests passing (356/356)
  • Code coverage maintained (>90%)
  • Backward compatibility preserved
  • Security tests added for all vulnerabilities
  • Documentation in code comments
  • Follows existing code patterns
  • Pre-commit hooks passed (eslint, prettier)

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

…ream

This commit addresses multiple security vulnerabilities in the sitemap
index parsing and generation functionality:

**Security Fixes:**

1. **Protocol Injection (HIGH)**: Added URL validation to prevent
   javascript:, data:, file:, and ftp: protocol injection attacks
   - Uses centralized validateURL() for consistent security
   - Enforces http/https protocol restriction
   - Validates URL format and structure

2. **URL Length DoS (MEDIUM)**: Enforced 2048 character URL limit
   per sitemaps.org specification to prevent resource exhaustion

3. **Memory Exhaustion (MEDIUM)**: Added maxEntries parameter to
   parseSitemapIndex() with default limit of 50,000 entries
   - Prevents DoS via maliciously large sitemap indexes
   - Configurable limit for different use cases

4. **Date Format Validation (LOW-MEDIUM)**: Added ISO 8601 date
   format validation for lastmod fields
   - Prevents arbitrary text injection
   - Ensures spec compliance

5. **Inconsistent Validation (MEDIUM)**: Replaced basic URL
   validation in stream with centralized validateURL()
   - Ensures consistent security across all code paths

6. **Empty URL Leakage (LOW)**: Fixed items with failed validation
   being pushed with empty URLs

**Changes:**

- lib/sitemap-index-parser.ts:
  - Added URL validation in text/cdata handlers
  - Added date format validation for lastmod
  - Added check to skip items with invalid URLs
  - Import validateURL and LIMITS

- lib/sitemap-index-stream.ts:
  - Replaced basic URL check with validateURL()
  - Improved error message formatting
  - Import validateURL from validation.ts

- tests/sitemap-index-security.test.ts (NEW):
  - 27 comprehensive security tests
  - Protocol injection tests (parser & stream)
  - URL length limit tests
  - Date validation tests
  - Memory exhaustion tests
  - CDATA handling tests
  - Error level handling tests

**Backward Compatibility:**

- All changes are 100% backward compatible
- Default behavior unchanged (WARN level)
- New maxEntries parameter is optional
- Invalid entries filtered in WARN mode (existing behavior)
- All 356 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@derduher derduher merged commit 4e390f6 into master Oct 15, 2025
6 checks passed
@derduher derduher deleted the security/sitemap-index-validation branch October 15, 2025 16:52