🔍 Duplicate Logic Detector Action

GitHub Marketplace Tests License: MIT

Automatically detect duplicate logic in Python code changes using advanced AST analysis and semantic similarity.

Prevent code duplication, improve code quality, and maintain cleaner codebases with intelligent duplicate detection that goes beyond simple text matching.

✨ Key Features

  • 🧠 Multi-Strategy Detection: AST analysis, semantic similarity, and function signature matching
  • 🎯 Smart Pattern Recognition: Detects business logic patterns and common code structures
  • 💬 Actionable PR Comments: Provides suggestions and refactoring recommendations
  • ⚙️ Highly Configurable: Adjustable similarity thresholds and file patterns
  • 📊 Comprehensive Reports: JSON and Markdown reports with detailed analysis
  • 🚀 Fast & Efficient: Uses the uv package manager for fast dependency installation

🚀 Quick Start

Add this workflow to .github/workflows/duplicate-detection.yml:

name: Duplicate Logic Detection

on:
  pull_request:
    paths: ['**/*.py']

jobs:
  detect-duplicates:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
          
      - name: Detect Duplicate Logic
        uses: ArthurMor4is/duplicate-logic-detector-action@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}

📋 Input Parameters

| Parameter | Description | Required | Default |
|---|---|---|---|
| `github-token` | GitHub token for API access | ✅ | `${{ github.token }}` |
| `pr-number` | Pull request number | ❌ | `${{ github.event.number }}` |
| `repository` | Repository name (owner/repo) | ❌ | `${{ github.repository }}` |
| `base-ref` | Base reference for comparison | ❌ | `${{ github.base_ref }}` |
| `head-ref` | Head reference for comparison | ❌ | `${{ github.head_ref }}` |
| `post-comment` | Post findings as PR comment | ❌ | `true` |
| `fail-on-duplicates` | Fail if high-confidence duplicates found | ❌ | `false` |
| `similarity-method` | Similarity method to use (`jaccard_tokens`, `sequence_matcher`, `levenshtein_norm`) | ❌ | `jaccard_tokens` |

📊 Outputs

| Output | Description |
|---|---|
| `duplicates-found` | Whether any duplicates were detected |
| `match-count` | Total number of matches found |
| `report-path` | Path to the generated report file |

βš™οΈ Usage Examples

Basic Usage

- name: Detect Duplicate Logic
  uses: ArthurMor4is/duplicate-logic-detector-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}

Strict Mode (Fail on Duplicates)

- name: Detect Duplicate Logic
  uses: ArthurMor4is/duplicate-logic-detector-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    fail-on-duplicates: true

Silent Mode (No PR Comments)

- name: Detect Duplicate Logic
  uses: ArthurMor4is/duplicate-logic-detector-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    post-comment: false

Custom Similarity Method

- name: Detect Duplicate Logic (High Precision)
  uses: ArthurMor4is/duplicate-logic-detector-action@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    similarity-method: levenshtein_norm  # More thorough analysis
    fail-on-duplicates: true

🔍 Detection Strategies

The action uses configurable similarity analysis to detect duplicate logic patterns:

1. AST Analysis

  • Parses Python files to extract function definitions
  • Analyzes function signatures and structure
  • Identifies code patterns and complexity
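As an illustration of the extraction step, the following is a minimal sketch built on Python's standard `ast` module. It is not the action's actual implementation; the `FunctionInfo` dataclass and the `extract_functions` helper are hypothetical names introduced only for this example.

```python
import ast
from dataclasses import dataclass


@dataclass
class FunctionInfo:
    """Hypothetical record describing one function found in a file."""
    name: str
    args: list[str]
    lineno: int
    length: int  # number of source lines spanned by the function


def extract_functions(source: str) -> list[FunctionInfo]:
    """Collect every (sync or async) function definition in a source string."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found.append(
                FunctionInfo(
                    name=node.name,
                    # Positional parameter names only; *args/**kwargs are ignored here.
                    args=[a.arg for a in node.args.args],
                    lineno=node.lineno,
                    length=node.end_lineno - node.lineno + 1,
                )
            )
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(found, key=lambda f: f.lineno)
```

Calling `extract_functions(open("src/utils.py").read())` would return one record per function, including methods nested inside classes, which a later step could compare against functions from the rest of the repository.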

2. Similarity Methods

Choose from three different similarity algorithms:

jaccard_tokens (Default)

  • Best for: General purpose, fast analysis
  • Method: Token-based Jaccard similarity coefficient
  • Strengths: Fast, good balance of precision/recall
  • Use when: You want reliable results with good performance

sequence_matcher

  • Best for: Balanced approach between speed and accuracy
  • Method: Python's difflib.SequenceMatcher
  • Strengths: Good at detecting structural similarities
  • Use when: You need more nuanced similarity detection

levenshtein_norm

  • Best for: High precision, strict duplicate detection
  • Method: Normalized Levenshtein distance
  • Strengths: Most thorough analysis, best precision
  • Use when: You want to catch even subtle duplicates
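To make the difference between the three methods concrete, here is a rough sketch of how each score could be computed with the standard library alone. The function names mirror the `similarity-method` values, but the bodies (including the regex-based `tokenize` helper) are assumptions for illustration and may differ from what the action does internally.

```python
import difflib
import re


def tokenize(code: str) -> list[str]:
    """Assumed tokenizer: identifiers, integers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def jaccard_tokens(a: str, b: str) -> float:
    """Jaccard coefficient: |shared tokens| / |all distinct tokens|."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sequence_matcher(a: str, b: str) -> float:
    """Ratio of matching characters as computed by difflib.SequenceMatcher."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def levenshtein_norm(a: str, b: str) -> float:
    """1 - (edit distance / longer length), so identical strings score 1.0."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))
```

In this sketch the Jaccard score ignores token order and repetition, which is why it is cheap, while the Levenshtein variant runs in time proportional to the product of the two input lengths and is sensitive to every character edit.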

3. Smart Filtering

  • Excludes very small functions (< 5 lines)
  • Filters out test files and common patterns
  • Prioritizes business logic and complex functions
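The first two filtering rules could look roughly like the sketch below. Only the 5-line minimum comes from the list above; the test-file patterns, the directory names, and the helper names `is_test_file` and `should_analyze` are assumptions made for this example, and the prioritization rule is not modeled.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

MIN_FUNCTION_LINES = 5  # functions shorter than this are skipped
# Assumed naming conventions for test files; the action's real patterns may differ.
TEST_FILE_PATTERNS = ("test_*.py", "*_test.py", "conftest.py")
TEST_DIR_NAMES = ("tests", "test")


def is_test_file(path: str) -> bool:
    """True if the path sits in a test directory or matches a test-file pattern."""
    p = PurePosixPath(path)
    in_test_dir = any(part in TEST_DIR_NAMES for part in p.parts[:-1])
    return in_test_dir or any(fnmatch(p.name, pat) for pat in TEST_FILE_PATTERNS)


def should_analyze(path: str, function_length: int) -> bool:
    """Keep a function only if it is long enough and not part of the test suite."""
    return function_length >= MIN_FUNCTION_LINES and not is_test_file(path)
```

For example, `should_analyze("src/utils.py", 12)` is true, while a 4-line helper or anything under `tests/` is dropped before similarity scoring.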

📈 Example Output

## 🔍 Duplicate Logic Detection Results

Found 2 potential duplicates with high confidence:

### Match 1: Email Validation
- **New Function**: `check_email_format` (src/utils.py:15)
- **Existing Function**: `validate_email` (src/validators.py:8)  
- **Similarity**: 92%
- **Suggestion**: Consider using the existing `validate_email` function instead

### Match 2: Data Processing
- **New Function**: `process_user_data` (src/handlers.py:25)
- **Existing Function**: `handle_user_info` (src/services.py:45)
- **Similarity**: 87%
- **Suggestion**: Extract common logic into a shared utility function

📦 Dependencies

Runtime Dependencies

The action has minimal runtime dependencies for fast execution:

  • rich v14.1.0 - Console output and progress bars

Development Dependencies

For development, testing, and research, additional dependencies are available:

  • Testing: pytest, pytest-mock, pytest-cov, pytest-xdist
  • Code Quality: black, isort, flake8, mypy, pre-commit
  • Research: GitPython, PyGithub, scikit-learn, nltk, numpy, pandas, pyyaml

Dependency Management

The action uses modern Python packaging with pyproject.toml and uv for fast dependency management:

# Clean core dependencies
dependencies = []

# Runtime dependencies (action execution)
[project.optional-dependencies]
runtime = ["rich==14.1.0"]

# Research dependencies (experiments)
research = ["GitPython", "PyGithub", "scikit-learn", ...]

# Development dependencies
dev = ["black>=23.0.0", "isort>=5.12.0", ...]
test = ["pytest>=7.0.0", "pytest-mock>=3.10.0", ...]

🛠️ Development

# Clone the repository
git clone https://github.com/ArthurMor4is/duplicate-logic-detector-action.git

# Install dependencies using uv (recommended)
uv sync --all-extras

# Or using traditional pip
pip install -e ".[dev,test]"

# Run tests
make test
# or
uv run pytest

# Run sample analysis
make test-sample

Note: The config/default-config.yml file is used for development and testing purposes only. The GitHub Action uses built-in configuration optimized for CI/CD workflows.

📚 Documentation

🤝 Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋‍♂️ Support