

@ShigrafS

Closes #264

This PR enhances the flexibility of column specification in pairtools by allowing CLI flags (-c1, -c2, etc.) to accept both column indices (ints) and column names (strs). Additionally, it refactors sorting and deduplication logic to use canonicalized column references, making pair_type (pt) optional for sorting.
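To illustrate the "int or str" behaviour described above, here is a minimal sketch of how a flag value could be normalized. `parse_column_spec` is a hypothetical helper written for this example; it is not taken from the pairtools code base, and the actual PR may handle this differently.

```python
from typing import Union


def parse_column_spec(value: Union[int, str]) -> Union[int, str]:
    """Return an int for index-like input, otherwise the column name.

    Hypothetical helper: CLI values usually arrive as strings, so "3"
    is treated as the column index 3 and "chrom1" as a column name.
    """
    if isinstance(value, int):
        return value
    text = str(value).strip()
    try:
        return int(text)
    except ValueError:
        # Not an integer literal, so treat it as a column name.
        return text


if __name__ == "__main__":
    for raw in ("1", "chrom1", 3):
        print(repr(raw), "->", repr(parse_column_spec(raw)))
```

Keeping integers as integers means existing invocations that pass numeric indices would behave as before, while names are passed through for later resolution against the header.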

🛠️ Changes Introduced

  1. Modified CLI Flags (-c1, -c2, etc.)

    • Now accepts both integers and strings.
    • Default values remain integers for backward compatibility.
  2. Implemented Conversion Functions in headerops

    • Converts between column indices and names.
    • Introduced a canonicalization function for consistent column handling.
  3. Refactored sort and dedup Commands

    • Updated sorting and deduplication logic to work with both int- and str-based column references.
    • Ensured internal operations remain stable across different input types.
  4. Made pair_type (pt) Optional for Sorting

    • Sorting no longer strictly depends on pair_type.
    • Verified correctness in cases where pair_type is missing.
  5. Updated Tests

    • Added test cases for mixed column specification (ints & strs).
    • Verified correctness of sorting and deduplication without pair_type.
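The canonicalization step in item 2 can be sketched roughly as follows. This is an assumption-laden illustration, not the functions added to headerops: the names `canonicalize_column` and `column_name` are hypothetical, indices are assumed to be zero-based, and the column list is only an example header.

```python
from typing import Sequence, Union


def canonicalize_column(col: Union[int, str], column_names: Sequence[str]) -> int:
    """Resolve a column given by index or by name to a zero-based index."""
    if isinstance(col, int):
        if not 0 <= col < len(column_names):
            raise ValueError(f"column index {col} is out of range")
        return col
    if col not in column_names:
        raise ValueError(f"column {col!r} not found in header")
    return list(column_names).index(col)


def column_name(index: int, column_names: Sequence[str]) -> str:
    """Inverse mapping: zero-based index to column name."""
    return column_names[index]


if __name__ == "__main__":
    # Example header columns, assumed for illustration only.
    columns = ["readID", "chrom1", "pos1", "chrom2", "pos2",
               "strand1", "strand2", "pair_type"]
    print(canonicalize_column("chrom2", columns))
    print(canonicalize_column(1, columns))
    print(column_name(2, columns))
```

Resolving every reference to one canonical form up front lets the downstream sort and dedup logic deal with a single type, regardless of how the user specified the column.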

✅ How This Improves pairtools

  • More user-friendly CLI: Users can now specify columns either by index or name, making scripts more readable.
  • Backward compatibility maintained: Default values are still integers, ensuring existing workflows are not broken.
  • Greater flexibility in data handling: Sorting and deduplication work smoothly even if pair_type is missing.
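As a rough illustration of sorting when pair_type may be absent, the sketch below builds a sort key that only includes pair_type if the header lists it. It is an in-memory toy, not the pairtools sort implementation; `sort_pairs`, the key order (chrom1, chrom2, pos1, pos2, then pair_type) and the column names are assumptions made for the example.

```python
from typing import List, Sequence


def sort_pairs(rows: List[List[str]], column_names: Sequence[str]) -> List[List[str]]:
    """Sort rows by chromosomes, then positions, then pair_type if present."""
    names = list(column_names)
    c1, c2 = names.index("chrom1"), names.index("chrom2")
    p1, p2 = names.index("pos1"), names.index("pos2")
    # pair_type is optional: fall back to a shorter key when it is missing.
    pt = names.index("pair_type") if "pair_type" in names else None

    def key(row: List[str]):
        base = (row[c1], row[c2], int(row[p1]), int(row[p2]))
        return base + ((row[pt],) if pt is not None else ())

    return sorted(rows, key=key)


if __name__ == "__main__":
    columns = ["readID", "chrom1", "pos1", "chrom2", "pos2"]  # no pair_type
    rows = [
        ["r1", "chr2", "5", "chr2", "9"],
        ["r2", "chr1", "100", "chr1", "20"],
        ["r3", "chr1", "9", "chr1", "20"],
    ]
    for row in sort_pairs(rows, columns):
        print("\t".join(row))
```

Positions are compared numerically, so `9` sorts before `100`; the only difference between the two modes is whether pair_type contributes a final tie-breaker.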

📌 Checklist Before Merge

  • [ ] Code changes are complete and reviewed.
  • [ ] New features are covered with unit tests.
  • [ ] Backward compatibility is maintained.
  • [ ] Documentation updates (if necessary).

ShigrafS and others added 5 commits March 20, 2025 17:43
…dex) and strings (column names), with default values as integers.

Implemented conversion functions in headerops to handle column index ↔ column name mapping.

Introduced a canonicalization function to standardize column references.

Updated sort and dedup CLI logic to use the new conversion functions.

Made pair_type (pt) optional for sorting.

Updated test cases to cover new functionality.
Fixed the sorting check order in test_merge.py.
@ShigrafS ShigrafS marked this pull request as ready for review March 27, 2025 11:31
@ShigrafS ShigrafS closed this May 17, 2025

Development

Successfully merging this pull request may close these issues.

chrom1, chrom2 and pair_type fields are now required in pairs file header
