A deep learning-powered tool to identify chimeric artifacts introduced by whole genome amplification (WGA).
pip install chimeralm
Requirements: Python 3.10, 3.11, or 3.12
For GPU support, installation instructions, and troubleshooting, see the Installation Guide.
# Predict chimeric reads (CPU)
chimeralm predict your_data.bam
# Predict with GPU acceleration
chimeralm predict your_data.bam --gpus 1 --batch-size 24
# Filter BAM to remove chimeric reads
chimeralm filter your_data.bam your_data.predictions
Output:
- Predictions: Tab-separated file with read names and labels (0=biological, 1=chimeric)
- Filtered BAM: {input}.filtered.sorted.bam, with chimeric reads removed
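To check how many reads were flagged before filtering, the predictions can be tallied with a few lines of Python. This is a minimal sketch, not part of ChimeraLM: it assumes the predictions are a single two-column, tab-separated text file (read name, then label) and that rows not matching that shape can be skipped. The function name `summarize_predictions` is hypothetical.

```python
import csv
import io
from collections import Counter

# Label meanings as documented above: 0 = biological, 1 = chimeric.
LABELS = {"0": "biological", "1": "chimeric"}


def summarize_predictions(handle):
    """Count reads per label in a tab-separated predictions stream.

    Assumes two columns per row (read name, label). Rows that do not
    match, such as blank lines or a header, are skipped.
    """
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[1] not in LABELS:
            continue
        counts[LABELS[row[1]]] += 1
    return counts


# Small in-memory example standing in for a real predictions file.
example = io.StringIO("read_1\t0\nread_2\t1\nread_3\t0\n")
summary = summarize_predictions(example)
total = sum(summary.values())
print(f"{summary['chimeric']} of {total} reads labeled chimeric")
```

For a real run, pass an open file handle instead of the in-memory example, for instance `open(path, newline="")`, where `path` points to wherever your predictions were written.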
Need more help? See the Quick Start Tutorial for a complete walkthrough.
Full documentation is available at ylab-hi.github.io/ChimeraLM
Key Resources:
- Installation Guide - Setup with pip, conda, uv, or from source
- Quick Start Tutorial - Your first prediction in 15 minutes
- CLI Reference - Complete command documentation
- BAM Filtering Tutorial - Comprehensive filtering guide
- Performance Optimization - Speed up your analysis
- Troubleshooting - Common issues and solutions
Features:
- High Accuracy: Deep learning model trained on real WGA data
- GPU Accelerated: Optimized for CUDA, MPS (Apple Silicon), and CPU
- Easy to Use: Simple CLI with sensible defaults
- Fast Processing: Batch inference with configurable parallelism
- Web Interface: Interactive web UI for visualization and analysis
- Production Ready: Includes filtering, sorting, and indexing of BAM files
Contributions are welcome! See our Contributing Guide for development setup and guidelines.
If you use ChimeraLM in your research, please cite:
@software{chimeralm2025,
title={ChimeraLM: A genomic language model to identify chimera artifacts},
author={Li, Yangyang and Guo, Qingxiang and Yang, Rendong},
year={2025},
url={https://github.com/ylab-hi/ChimeraLM}
}
Apache License 2.0 - see LICENSE for details.