
A genomic language model that distinguishes true structural variants from artifacts in long-read whole genome amplification

ChimeraLM Logo

ChimeraLM

A Genomic Language Model for Detecting WGA Chimeric Artifacts


Installation · Quick Start · Documentation · Citation


A genomic language model to identify chimeric artifacts introduced by whole genome amplification (WGA).

Overview

ChimeraLM is a deep learning-powered genomic language model that detects artificial chimeric reads arising from whole genome amplification (WGA) processes. Built with PyTorch Lightning and optimized for modern GPUs, it provides fast and accurate identification of chimeric artifacts in BAM files.

Key Features

  • High Accuracy: Deep learning model trained on real WGA data
  • GPU Accelerated: Optimized for CUDA, MPS (Apple Silicon), and CPU
  • Easy to Use: Simple CLI with sensible defaults
  • Fast Processing: Batch inference with configurable parallelism
  • Web Interface: Interactive web UI for visualization and analysis
  • Production Ready: Includes filtering, sorting, and indexing of BAM files

Installation

Install from PyPI

pip install chimeralm

Install from Source

# Clone the repository
git clone https://github.com/ylab-hi/ChimeraLM.git
cd ChimeraLM

# Install in development mode with uv
uv sync

uv run chimeralm --version

Quick Start

Detect chimeric reads in your BAM file:

# Install ChimeraLM
pip install chimeralm

# Predict chimeric reads (CPU)
chimeralm predict your_data.bam

# Predict with GPU acceleration
chimeralm predict your_data.bam --gpus 1 --batch-size 24

CLI Usage

ChimeraLM provides a Python CLI with three main commands:

  • predict: Detect chimeric reads using pre-trained models
  • filter: Filter BAM files based on predictions
  • web: Launch interactive web interface

Command Structure

chimeralm [OPTIONS] COMMAND [ARGS]...

Available Commands

predict - Detect Chimeric Reads

Predict chimeric reads in a BAM file using the pre-trained ChimeraLM model.

chimeralm predict [OPTIONS] DATA_PATH

Arguments:

  • DATA_PATH: Path to the input BAM file

Options:

  • -g, --gpus INTEGER: Number of GPUs to use (default: 0)
  • -o, --output PATH: Output path for predictions (default: {input}.predictions)
  • -b, --batch-size INTEGER: Batch size for processing (default: 12)
  • -w, --workers INTEGER: Number of worker threads (default: 0)
  • -v, --verbose: Enable verbose output
  • -m, --max-sample INTEGER: Maximum number of samples to process
  • -l, --limit-batches INTEGER: Limit prediction batches
  • -p, --progress-bar: Show progress bar
  • --random-seed: Make prediction non-deterministic

Examples:

# Basic prediction on CPU
chimeralm predict input.bam

# Prediction with GPU acceleration
chimeralm predict input.bam --gpus 1 --batch-size 24

# Prediction with custom output path and progress bar
chimeralm predict input.bam --output results/ --progress-bar --verbose

filter - Filter BAM Files

Filter BAM files based on prediction results.

chimeralm filter [OPTIONS] INPUT_BAM PREDICTIONS_DIR

Arguments:

  • INPUT_BAM: Path to the input BAM file
  • PREDICTIONS_DIR: Directory containing prediction results

Options:

  • -o, --output-prediction PATH: Output path for filtered BAM file

Example:

chimeralm filter input.bam predictions/ --output-prediction filtered.bam

web - Launch Web Interface

Launch an interactive web interface for visualizing and analyzing chimeric reads.

chimeralm web

This command starts a local web server that provides:

  • Interactive visualization of predictions
  • Analysis dashboards and metrics
  • Easy-to-use interface for non-technical users

Performance Tips

  1. GPU Usage: Use --gpus 1 for faster processing if CUDA is available
  2. Batch Size: Increase --batch-size for better GPU utilization (e.g., 24-32)
  3. Memory: Monitor memory usage with large batch sizes
  4. Threading: Adjust --workers based on your system's CPU cores
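Tip 4 can be scripted rather than hard-coded. A minimal sketch, assuming a Linux host where `nproc` is available (on macOS, substitute `sysctl -n hw.ncpu`); the `chimeralm` invocation is shown commented as a usage hint:

```shell
# Derive a worker count from the available CPU cores, leaving one core
# free for the main process (minimum of 1).
CORES=$(nproc)
WORKERS=$(( CORES > 1 ? CORES - 1 : 1 ))
echo "Using $WORKERS workers"

# Then pass it to the predict command, e.g.:
# chimeralm predict input.bam --workers "$WORKERS"
```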

Output Files

Prediction outputs:

  • {output_dir}/predictions.txt: Tab-separated file with read names and predicted labels (0=biological, 1=chimeric)
  • {input}.filtered.sorted.bam: BAM file with chimeric reads removed (auto-generated by filter command)
  • {input}.filtered.sorted.bam.bai: BAM index file
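Because the predictions file is plain tab-separated text, it can be inspected with standard Unix tools. A minimal sketch using a fabricated three-read `predictions.txt` (the read names are invented for illustration; labels follow the 0=biological, 1=chimeric convention described above):

```shell
# Fabricate a small predictions.txt in the documented format: read name, TAB, label
printf 'read_001\t0\nread_002\t1\nread_003\t0\n' > predictions.txt

# List the reads flagged as chimeric (label 1)
awk -F'\t' '$2 == 1 {print $1}' predictions.txt   # → read_002

# Count them
awk -F'\t' '$2 == 1' predictions.txt | wc -l
```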

Troubleshooting

Common Issues:

  1. CUDA out of memory: Reduce --batch-size or use CPU mode
  2. Slow processing: Enable GPU acceleration with --gpus 1
  3. Missing dependencies: Run uv sync to install all dependencies

Debug Mode: Use --verbose flag to get detailed logging information about the prediction process.

Version Information

chimeralm --version

Getting Help

# General help
chimeralm --help

# Command-specific help
chimeralm predict --help

Citation

If you use ChimeraLM in your research, please cite:

@software{chimeralm2025,
  title={ChimeraLM: A genomic language model to identify chimera artifacts},
  author={Li, Yangyang and Guo, Qingxiang and Yang, Rendong},
  year={2025},
  url={https://github.com/ylab-hi/ChimeraLM}
}

License

This project is licensed under the Apache License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
