ANDES is a suite of standalone scripts for measuring similarity between gene sets using precomputed gene embeddings. It includes a consensus protein–protein interaction network embedding generated with node2vec and a sample geneset database (Gene Ontology Biological Process genesets for Homo sapiens).
- Gene-set similarity: Compute pairwise similarity scores between two gene sets in embedding space.
- Embedding-based GSEA: Perform a ranked Gene Set Enrichment Analysis (GSEA) using embedding-derived gene rankings.
These functions are implemented in src/set_analysis_fun.py, and the demo Jupyter notebook (demo.ipynb) shows sample usage.
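To make the idea of scoring two gene sets in embedding space concrete, here is a minimal sketch. It assumes cosine similarity between gene vectors and a simple "best match" average in both directions; the function name `best_match_similarity` is hypothetical, and this is an illustration only, not the code in src/set_analysis_fun.py, which may differ in its scoring and any correction steps.

```python
import numpy as np


def best_match_similarity(emb_a, emb_b):
    """Illustrative best-match score between two sets of gene vectors.

    emb_a, emb_b: 2-D arrays with one row per gene (assumed non-zero rows).
    This is a hypothetical helper, not part of the ANDES code base.
    """
    # Normalize rows so that the dot product equals cosine similarity.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T  # shape: (genes in A, genes in B)

    # For each gene, keep only its best match in the other set,
    # then average the two directions.
    a_to_b = sim.max(axis=1).mean()
    b_to_a = sim.max(axis=0).mean()
    return 0.5 * (a_to_b + b_to_a)


set_a = np.array([[1.0, 0.0], [0.0, 1.0]])
set_b = np.array([[1.0, 0.0]])
print(best_match_similarity(set_a, set_b))  # 0.75
```

In this toy example one gene in the first set matches the second set perfectly and the other does not match at all, so the two directional averages are 0.5 and 1.0.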
If you use ANDES in your work, please cite:
A best-match approach for gene set analyses in embedding spaces. Li L, Dannenfelser R, Cruz C, Yao V. Genome Research. 2024.
- Install conda if you haven't already
- Create and activate the ANDES environment:
conda env create -f env.yml
conda activate ANDES
To get started quickly, we recommend looking at demo.ipynb. Alternatively, ANDES can be run from the command line in both modes with the following commands.
Compute similarity between all pairs of genesets in two databases / gmt files:
python src/andes.py --emb embedding_file.csv --genelist embedding_gene_ids.txt --geneset1 first_gene_set_database.gmt --geneset2 second_gene_set_database.gmt --out output_file.csv -n num_processor
Compute a rank-based comparison for a geneset database (such as Gene Ontology) given a ranked list of genes:
python src/andes_gsea.py --emb embedding_file.csv --genelist embedding_gene_ids.txt --geneset gene_set_database.gmt --rankedlist ranked_genes.txt --out output_file.csv -n num_processor
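Both commands take gene set databases as .gmt files. As a quick way to inspect such a file before running ANDES, the sketch below parses the standard GMT layout (one set per line, tab-separated: set name, description, then gene identifiers). The helper `parse_gmt` is hypothetical and is not the parser ANDES uses internally.

```python
def parse_gmt(lines):
    """Parse GMT-formatted lines into a {set_name: [genes]} dictionary.

    Hypothetical helper for inspecting inputs; not part of ANDES.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # A valid GMT row has a name, a description, and at least one gene.
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


example = ["SET_A\texample description\tTP53\tBRCA1\n"]
print(parse_gmt(example))  # {'SET_A': ['TP53', 'BRCA1']}
```

To read a real file, pass an open file handle, for example `parse_gmt(open("gene_set_database.gmt"))`, and check that the gene identifiers use the same naming scheme as the embedding gene ID file.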