A genomic language model that distinguishes true structural variants from artifacts in long-read whole genome amplification

ylab-hi/ChimeraLM

ChimeraLM

A Genomic Language Model for Detecting WGA Chimeric Artifacts

Installation | Quick Start | Web Demo | Documentation | Citation


A deep learning-powered tool to identify chimeric artifacts introduced by whole genome amplification (WGA).

🌐 Try it Online

No installation required! Try ChimeraLM instantly in your browser:

🤗 Launch Web Demo on Hugging Face Spaces

Perfect for:

  • 🧪 Testing with individual sequences
  • 📊 Visualizing prediction confidence scores
  • 🎓 Learning about chimeric artifact detection
  • 🔬 Quick validation before batch processing

For production use with BAM files and batch processing, install the CLI tool below.

Installation

pip install chimeralm

Requirements: Python 3.10, 3.11, or 3.12

For GPU support, installation instructions, and troubleshooting, see the Installation Guide.

Quick Start

# Predict chimeric reads (CPU)
chimeralm predict your_data.bam

# Predict with GPU acceleration
chimeralm predict your_data.bam --gpus 1 --batch-size 24

# Filter BAM to remove chimeric reads
chimeralm filter your_data.bam your_data.predictions

Output:

  • Predictions: Tab-separated file with read names and labels (0=biological, 1=chimeric)
  • Filtered BAM: {input}.filtered.sorted.bam with chimeric reads removed
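The predictions file described above can be summarized with a few lines of Python. This is a minimal sketch, not part of ChimeraLM: it assumes each line holds a read name and a label separated by a tab, in that order, and it skips any line that does not fit that shape (for example, a header row). The function name `summarize_predictions` and the sample read names are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Label meanings as documented above: 0 = biological, 1 = chimeric.
LABELS = {"0": "biological", "1": "chimeric"}


def summarize_predictions(lines: Iterable[str]) -> Counter:
    """Count reads per label from tab-separated 'read_name<TAB>label' lines.

    Assumption: the read name is the first column and the label the second.
    Lines that do not match (blank lines, headers, malformed rows) are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        label = fields[1].strip()
        if label in LABELS:
            counts[LABELS[label]] += 1
    return counts


# Hypothetical sample rows standing in for a real predictions file.
example = ["read_1\t0\n", "read_2\t1\n", "read_3\t0\n"]
summary = summarize_predictions(example)
print(dict(summary))
```

To run it on real output, pass an open file handle instead of the sample list, e.g. `summarize_predictions(open(path))`, after checking that the file layout matches the assumption above.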

Need more help? See the Quick Start Tutorial for a complete walkthrough.

Documentation

Full documentation is available at ylab-hi.github.io/ChimeraLM

Features

  • 🌐 Interactive Web Demo: Try it online at Hugging Face Spaces - no installation needed!
  • 🎯 High Accuracy: Deep learning model trained on real WGA data
  • ⚡ GPU Accelerated: Optimized for CUDA, MPS (Apple Silicon), and CPU
  • 🚀 Easy to Use: Simple CLI with sensible defaults
  • 📦 Fast Processing: Batch inference with configurable parallelism
  • 🖥️ Local Web Interface: Run the web UI locally with chimeralm ui
  • 🏭 Production Ready: Includes filtering, sorting, and indexing of BAM files

Contributing

Contributions are welcome! See our Contributing Guide for development setup and guidelines.

Citation

If you use ChimeraLM in your research, please cite:

@software{chimeralm2025,
  title={ChimeraLM: A genomic language model to identify chimera artifacts},
  author={Li, Yangyang and Guo, Qingxiang and Yang, Rendong},
  year={2025},
  url={https://github.com/ylab-hi/ChimeraLM}
}

License

Apache License 2.0 - see LICENSE for details.
