@dacoburn dacoburn commented Aug 22, 2025

🔧 Socket Python CLI - Critical Bug Fixes & Performance Improvements

🎯 Description

This PR addresses several critical issues in the Socket Python CLI that were affecting license compliance monitoring, committer identification, and performance with large repositories. The changes ensure that all security alerts, including license violations, are properly reported in diff scans while improving the overall reliability and performance of the tool.

🐛 Key Bug Fixes

1. License Violations Missing from Diff Scans

Critical Fix: The CLI was filtering license violation alerts (licenseSpdxDisj) out of diff processing results, so license violations went unreported when scanning changes between commits or branches.

Root Cause: The process_alerts_for_diff_scan() method in the core module contained a filter condition that explicitly excluded alerts of type licenseSpdxDisj from being added to the alerts collection.

Solution: Removed the license alert filter condition, ensuring all alert types including license violations are now properly included in diff scan results for comprehensive compliance monitoring.

# Before (filtered out license alerts)
if issue_alert.type != 'licenseSpdxDisj':
    if issue_alert.key not in alerts_collection:
        alerts_collection[issue_alert.key] = [issue_alert]
    else:
        alerts_collection[issue_alert.key].append(issue_alert)

# After (includes all alerts)
if issue_alert.key not in alerts_collection:
    alerts_collection[issue_alert.key] = [issue_alert]
else:
    alerts_collection[issue_alert.key].append(issue_alert)
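
The grouping logic in the "After" snippet can also be written with a defaultdict. The following is a minimal sketch, not the CLI's actual code: the Alert dataclass, the group_alerts_by_key() helper, and the "exampleOtherType" alert type are hypothetical stand-ins used only to show that license alerts now land in the collection alongside every other type.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical stand-in for the CLI's alert object (only key and type)."""
    key: str
    type: str


def group_alerts_by_key(alerts):
    """Group every alert by its key, with no type-based filtering."""
    grouped = defaultdict(list)
    for alert in alerts:
        # No `alert.type != 'licenseSpdxDisj'` guard: license alerts are kept.
        grouped[alert.key].append(alert)
    return dict(grouped)


alerts = [
    Alert(key="pkg-a", type="licenseSpdxDisj"),
    Alert(key="pkg-a", type="exampleOtherType"),  # hypothetical alert type
    Alert(key="pkg-b", type="licenseSpdxDisj"),
]
grouped = group_alerts_by_key(alerts)
license_alert_count = sum(
    1 for group in grouped.values() for a in group if a.type == "licenseSpdxDisj"
)
```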

2. Enhanced Committer Identification

Improvement: The CLI now resolves the committer through a fixed priority order that works across different CI/CD environments.

New Priority Order:

  1. CLI Arguments: Direct --committers parameter (highest priority)
  2. CI/CD Environment Variables:
    • GITHUB_ACTOR (GitHub Actions)
    • GITLAB_USER_LOGIN (GitLab CI)
    • BITBUCKET_STEP_TRIGGERER_UUID (Bitbucket Pipelines)
  3. Smart Email Parsing: Extract usernames from GitHub noreply emails (e.g. username@users.noreply.github.com)
  4. Git Email: Use the commit author's email address
  5. Git Author Name: Fallback to author name

This ensures accurate committer attribution across all major CI/CD platforms and development workflows.
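
The priority order above could be sketched roughly as follows. This is an illustration under assumptions, not the get_formatted_committer() implementation from this PR: resolve_committer(), its parameters, and the regular expression are hypothetical, and the sketch takes a single CLI value where the real --committers argument may accept more.

```python
import re

# Environment variables named in the PR description, in the order listed there.
CI_USER_ENV_VARS = (
    "GITHUB_ACTOR",
    "GITLAB_USER_LOGIN",
    "BITBUCKET_STEP_TRIGGERER_UUID",
)

# GitHub noreply addresses look like "user@users.noreply.github.com" or
# "12345+user@users.noreply.github.com" (the numeric prefix is optional).
NOREPLY_PATTERN = re.compile(r"^(?:\d+\+)?([^@]+)@users\.noreply\.github\.com$")


def resolve_committer(cli_committer, env, git_email, git_author_name):
    """Return the first committer identity available, in priority order."""
    # 1. Explicit CLI value wins.
    if cli_committer:
        return cli_committer
    # 2. CI/CD environment variables.
    for name in CI_USER_ENV_VARS:
        value = env.get(name)
        if value:
            return value
    if git_email:
        # 3. Username extracted from a GitHub noreply address.
        match = NOREPLY_PATTERN.match(git_email)
        if match:
            return match.group(1)
        # 4. Plain git email.
        return git_email
    # 5. Fallback: git author name.
    return git_author_name


example = resolve_committer(
    cli_committer=None,
    env={},  # pass os.environ in real use
    git_email="12345+octocat@users.noreply.github.com",
    git_author_name="The Octocat",
)
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the ordering easy to test.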

🚀 Performance Enhancements

3. Lazy File Loading with SDK 2.1.8

Major Performance Improvement: Upgraded to Socket SDK 2.1.8 with lazy file loading support, significantly improving performance for large repositories.

Benefits:

  • Prevents "Too many open files" errors when processing repositories with large numbers of manifest files
  • Reduced memory footprint through efficient file handle management
  • Better resource utilization with configurable max_open_files=50 limit
  • Improved scalability for enterprise-grade repositories
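
To illustrate the general idea behind a max_open_files limit, here is a minimal sketch of a bounded pool of file handles. It is not the SDK's implementation, whose internals are not shown in this PR: the LazyFilePool class and its methods are hypothetical, and the sketch simply opens files on first read and closes the least recently used handle once the limit is reached.

```python
import os
import tempfile
from collections import OrderedDict


class LazyFilePool:
    """Opens files on first read and keeps at most max_open_files handles open."""

    def __init__(self, max_open_files=50):
        self.max_open_files = max_open_files
        self._open = OrderedDict()  # path -> open handle, least recently used first

    def _handle(self, path):
        handle = self._open.pop(path, None)
        if handle is None:
            # Close least recently used handles before opening a new one.
            while len(self._open) >= self.max_open_files:
                _, oldest = self._open.popitem(last=False)
                oldest.close()
            handle = open(path, "rb")
        self._open[path] = handle  # re-insert as most recently used
        return handle

    def read(self, path):
        handle = self._handle(path)
        handle.seek(0)
        return handle.read()

    def open_count(self):
        return len(self._open)

    def close(self):
        for handle in self._open.values():
            handle.close()
        self._open.clear()


# Demo: five files, but never more than two handles open at once.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(5):
        path = os.path.join(tmp, f"manifest_{i}.txt")
        with open(path, "w") as f:
            f.write(f"package-{i}")
        paths.append(path)

    pool = LazyFilePool(max_open_files=2)
    contents = [pool.read(p) for p in paths]
    open_after_reads = pool.open_count()
    pool.close()
```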

4. Reduced Log Noise

Quality of Life: Changed ulimit warning messages from warning to debug level to reduce unnecessary log noise while maintaining diagnostic capability.

Implementation:

# Now uses debug level for better log hygiene
log.debug(f"Found {file_count} manifest files, which may exceed the file descriptor limit (ulimit -n = {ulimit_check['soft_limit']})")
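
For context, a check like the one feeding ulimit_check could look roughly like the sketch below. It assumes a Unix-like system, since it relies on the standard library's resource module. The check_file_descriptor_limit() function and the dictionary keys other than soft_limit are hypothetical and do not come from the CLI source.

```python
import logging
import resource  # Unix-only standard library module

log = logging.getLogger("socket-cli-sketch")  # hypothetical logger name


def check_file_descriptor_limit(file_count):
    """Compare a file count against the process's open-file soft limit."""
    soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = soft_limit == resource.RLIM_INFINITY
    ulimit_check = {
        "soft_limit": soft_limit,
        "hard_limit": hard_limit,
        "may_exceed": (not unlimited) and file_count >= soft_limit,
    }
    if ulimit_check["may_exceed"]:
        # Debug level, matching the change in this PR: still available for
        # diagnosis, but silent at the default log level.
        log.debug(
            f"Found {file_count} manifest files, which may exceed the file "
            f"descriptor limit (ulimit -n = {ulimit_check['soft_limit']})"
        )
    return ulimit_check


result = check_file_descriptor_limit(0)
```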

🎯 Release Notes

  • FIXED: License violations (licenseSpdxDisj) are now properly included in diff scan results, ensuring complete compliance monitoring
  • IMPROVED: Enhanced committer identification with proper priority order: CLI arguments → CI/CD environment variables (GITHUB_ACTOR, GITLAB_USER_LOGIN, BITBUCKET_STEP_TRIGGERER_UUID) → extracted usernames from GitHub noreply emails → git email → git author name as fallback
  • ENHANCED: Upgraded to Socket SDK 2.1.8 with lazy file loading support, significantly improving performance for large repositories and preventing "Too many open files" errors
  • OPTIMIZED: Reduced log noise by changing ulimit warning messages from warning to debug level for better log hygiene

*This PR ensures the Socket Python CLI provides complete, accurate, and performant security scanning across all development

…andling

- Upgrade socket-sdk-python dependency to version 2.1.8 to support lazy file loading capabilities
- Enable lazy file loading in fullscans.post() with use_lazy_loading=True and max_open_files=50 to prevent "Too many open files" errors when processing large numbers of manifest files
- Remove custom lazy_file_loader module as this functionality is now handled by the SDK
- Fix committer display format by implementing proper priority order:
  1. CLI --committers argument (highest priority)
  2. CI/CD SCM username (GITHUB_ACTOR, GITLAB_USER_LOGIN, BITBUCKET_STEP_TRIGGERER_UUID)
  3. Git username extracted from email patterns (e.g., GitHub noreply emails)
  4. Git email address
  5. Git author name (fallback)
- Add get_formatted_committer() method to Git class to properly format committer strings instead of displaying raw git.Actor objects
- Include license alerts in diff processing by removing licenseSpdxDisj filter condition
- Change ulimit warning messages from log.warning to log.debug to reduce noise
- Update create_full_scan() method signature to accept file paths directly instead of pre-processed file objects
- Remove deprecated load_files_for_sending() method as lazy loading is now handled by the SDK

This update improves performance for large repositories, provides better committer identification in CI/CD environments, and ensures license violations are properly reported.
@dacoburn dacoburn requested a review from a team as a code owner August 22, 2025 23:54
@dacoburn dacoburn requested review from nolanlawson and kapravel and removed request for a team August 22, 2025 23:54

github-actions bot commented Aug 22, 2025

🚀 Preview package published!

Install with:

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple socketsecurity==2.2.0.dev1

Docker image: socketdev/cli:pr-111

@dacoburn dacoburn changed the title from "feat: upgrade to SDK 2.1.8 with lazy loading and improved committer h…" to "fix: include license violations in diff results + SDK 2.1.8 upgrade" Aug 22, 2025
…dling

- Add --enable-diff flag to force differential scanning even when using --integration api
- Improve license policy violation grouping and display in PR comments
- Fix alert consolidation logic to prevent duplicate alerts based on manifest files
- Enhance empty baseline scan creation with proper file cleanup
- Add comprehensive test coverage for new enable_diff functionality
- Update documentation with new scanning mode examples and usage patterns

The --enable-diff flag enables differential mode without SCM integration,
useful for getting diff reports while using the API integration type.
License policy violations are now properly grouped by package and displayed
with consistent formatting in GitHub PR comments.
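
The interaction between --enable-diff and --integration api might be sketched as below. This is an assumption-heavy illustration, not the CLI's argument handling: build_parser() and should_run_diff() are hypothetical, the default of "api" is assumed, and the rule that non-api integrations run in differential mode by default is inferred from the description above rather than confirmed by it.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing only the two flags discussed above."""
    parser = argparse.ArgumentParser(prog="socketcli-sketch")
    parser.add_argument("--integration", default="api")  # default is an assumption
    parser.add_argument("--enable-diff", action="store_true")
    return parser


def should_run_diff(args):
    """Assumed rule: diff mode runs when forced, or when not using the api integration."""
    return args.enable_diff or args.integration != "api"


api_only = should_run_diff(build_parser().parse_args(["--integration", "api"]))
api_forced = should_run_diff(
    build_parser().parse_args(["--integration", "api", "--enable-diff"])
)
```
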
@dacoburn dacoburn merged commit c9df808 into main Aug 23, 2025
6 checks passed
@dacoburn dacoburn deleted the doug/fix-diff-results-for-violation branch August 23, 2025 04:17