bmVAE is a variational autoencoder method for clustering single-cell mutation data.
- Python 3.8+.
First, download bmVAE from GitHub and change to its directory:
git clone https://github.com/zhyu-lab/bmvae
cd bmvae
Create a new environment named "bmvae":
conda create --name bmvae python=3.8.13
Then activate it:
conda activate bmvae
Use pip to install the requirements:
python -m pip install -r requirements.txt
Now you are ready to run bmVAE!
bmVAE clusters cells into distinct subpopulations and infers the genotype of each subpopulation from single-cell binary mutation data.
Example:
python bmvae.py --input testdata/example.txt --output testdata
The SNVs of single cells are represented as a genotype matrix. Each row gives the mutational states of a single cell, and each column represents a mutation. Columns are separated by tabs. The genotype matrix is binary, with a separate code for missing entries.
The entry at position [i,j] should be
- 0 if mutation j is not observed in cell i,
- 1 if mutation j is observed in cell i, or
- 3 if the genotype information is missing.
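The format above can be checked before running bmVAE. The following is a minimal sketch, not part of bmVAE itself: the function name `check_genotype_file` is hypothetical, and it assumes that the file has no header row or cell-name column and that blank lines can be ignored.

```python
import os
import tempfile

# Entry codes described above: 0 = not observed, 1 = observed, 3 = missing.
ALLOWED = {"0", "1", "3"}


def check_genotype_file(path):
    """Return (n_cells, n_mutations); raise ValueError if the file is malformed.

    Hypothetical helper: assumes a tab-separated matrix with no header row.
    """
    n_cols = None
    n_rows = 0
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # assumption: blank lines are tolerated
            fields = line.split("\t")
            bad = [f for f in fields if f not in ALLOWED]
            if bad:
                raise ValueError(f"line {line_no}: unexpected entry {bad[0]!r}")
            if n_cols is None:
                n_cols = len(fields)
            elif len(fields) != n_cols:
                raise ValueError(
                    f"line {line_no}: expected {n_cols} columns, found {len(fields)}"
                )
            n_rows += 1
    if n_rows == 0:
        raise ValueError("file contains no rows")
    return n_rows, n_cols


if __name__ == "__main__":
    # Write a tiny 2-cell x 3-mutation matrix and check it.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.txt")
        with open(demo, "w") as out:
            out.write("0\t1\t3\n1\t1\t0\n")
        print(check_genotype_file(demo))
```

A check like this catches ragged rows and stray codes early, before any training time is spent.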
The output directory is specified by the user.
The genotypes of subpopulations are written to a file named "clusters.txt".
The cell-to-cluster assignments are written to a file named "labels.txt".
The estimated false positive rate (FPR) and false negative rate (FNR) are written to a file named "para.txt".
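As a quick sanity check on the clustering, the cluster sizes can be tallied from "labels.txt". The sketch below is illustrative only: the exact layout of "labels.txt" is not described here, so the code assumes the file contains nothing but cluster labels separated by whitespace or commas (either one per line or all on one line). The function name `cluster_sizes` is hypothetical.

```python
from collections import Counter


def cluster_sizes(labels_path):
    """Count how many cells carry each cluster label.

    Assumption: the file holds only labels, separated by whitespace or commas.
    Labels are kept as strings, so no numeric format is assumed.
    """
    with open(labels_path) as handle:
        tokens = handle.read().replace(",", " ").split()
    return Counter(tokens)


if __name__ == "__main__":
    # "testdata" is the output directory used in the example command above.
    for label, size in sorted(cluster_sizes("testdata/labels.txt").items()):
        print(f"cluster {label}: {size} cells")
```

If the real file carries a header or cell names, the parsing step would need to be adjusted accordingly.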
- --input <filename>
  Replace <filename> with the file containing the genotype matrix.
- --output <string>
  Replace <string> with the output directory.
- --Kmax <INT>
  Set <INT> to a positive integer. This specifies the maximum number of clusters to consider. The default is N/10, where N is the number of cells.
- --seed <INT>
  Set <INT> to a non-negative integer. This specifies the seed for random number generation. The default is 0.
- --epochs <INT>
  Set <INT> to a positive integer. This specifies the number of epochs used to train the VAE. The default is 250.
- --batch_size <INT>
  Set <INT> to a positive integer. This specifies the batch size for training the VAE. The default is 64.
- --lr <Double>
  Set <Double> to a positive real number. This specifies the learning rate for training the VAE. The default is 0.0001.
- --dimension <INT>
  Set <INT> to a positive integer. This specifies the dimension of the latent space of the VAE. The default is 3.
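When bmVAE is run repeatedly (for example, over several seeds), it can be convenient to assemble the command line in a script. The following is a minimal sketch that uses only the options listed above; the helper name `build_bmvae_command` is hypothetical and not part of bmVAE, and the script is assumed to be launched from the repository directory so that `bmvae.py` resolves.

```python
import sys

# Options documented above, apart from --input and --output.
KNOWN_OPTIONS = {"Kmax", "seed", "epochs", "batch_size", "lr", "dimension"}


def build_bmvae_command(input_path, output_dir, **options):
    """Return the argument list for one bmVAE run (hypothetical helper)."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    cmd = [sys.executable, "bmvae.py", "--input", input_path, "--output", output_dir]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_bmvae_command(
        "testdata/example.txt", "testdata", Kmax=10, seed=1, epochs=250
    )
    print(" ".join(cmd))
    # To actually launch the run from the repository directory:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Rejecting unknown option names up front avoids silently passing a misspelled flag to the script.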
If you have any questions, please contact [email protected].