GeneAAExtractor is a lightweight Google Colab tool for extracting amino acid sequences of specific genes directly from GFF3 and FASTA files. It's designed for microbiologists, bioinformaticians, and AMR researchers working with genome annotations and isolate analysis.
- Extracts amino acid sequences for only the user-specified genes
- Works with .gff3, .fasta, and .txt gene list inputs
- Supports strand orientation and reverse complement logic
- Exports individual .faa files for each gene in the format: GeneName IsolateName.faa
- Automatically zips all extracted protein files for download
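The strand handling listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names `reverse_complement` and `extract_protein` are hypothetical, the standard genetic code is assumed, and coordinates are treated as 1-based and inclusive, as in GFF3. Biopython's `Seq.reverse_complement()` and `Seq.translate()` provide the same operations.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_protein(genome, start, end, strand):
    """Translate the region start..end (1-based, inclusive) of a genome string.

    Features on the "-" strand are reverse-complemented before translation.
    Codons that are incomplete or contain ambiguous bases become "X".
    Trailing stop codons are removed from the result.
    """
    cds = genome[start - 1:end].upper()
    if strand == "-":
        cds = reverse_complement(cds)
    protein = "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )
    return protein.rstrip("*")


# The same gene encoded on the forward and on the reverse strand.
print(extract_protein("CCATGAAATAAGG", 3, 11, "+"))
print(extract_protein("GGTTATTTCATCC", 3, 11, "-"))
```

Both calls translate the same coding sequence (`ATGAAATAA`), so they yield the same protein regardless of which strand the gene sits on.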
- GFF3 file: Genome annotations
- FASTA file: Genomic sequence
- TXT file: List of gene names to extract (one per line; case-sensitive matching is optional)
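As a rough sketch of how these three inputs might be read, the snippet below parses each one using only the standard library. The helper names (`read_gene_list`, `read_fasta`, `find_cds_features`) are hypothetical, and the choice to look up gene names in the `gene` attribute and fall back to `Name` is an assumption about the annotation source, not something the tool documents.

```python
def read_gene_list(text, case_sensitive=True):
    """Return the set of gene names in a one-per-line list, skipping blanks."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return set(names) if case_sensitive else {n.lower() for n in names}


def read_fasta(text):
    """Return {sequence_id: sequence} from FASTA-formatted text."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif line and current is not None:
            records[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def find_cds_features(gff_text, wanted, case_sensitive=True):
    """Return (gene, seqid, start, end, strand) for CDS rows naming a wanted gene.

    Assumption: the gene name is stored in the `gene` attribute, or in `Name`
    when `gene` is absent.
    """
    hits = []
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):  # an embedded sequence section follows
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        name = attrs.get("gene") or attrs.get("Name")
        if name is None:
            continue
        key = name if case_sensitive else name.lower()
        if key in wanted:
            hits.append((name, cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return hits
```

With `case_sensitive=False`, a list entry such as `acra` would match a feature annotated as `acrA`. With the default, it would not.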
A .zip archive containing one .faa file per gene:
acrA SSS08.faa
blaTEM SSS08.faa
dfrA12 SSS08.faa
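A minimal sketch of how such an archive could be assembled with Python's standard `zipfile` module is shown below. The function name `build_faa_zip`, the FASTA header layout, and the 60-character line wrap are assumptions made for illustration. The file names follow the pattern in the listing above.

```python
import io
import zipfile


def build_faa_zip(proteins, isolate):
    """Pack {gene: protein_sequence} into an in-memory .zip of .faa files.

    Each archive member is named "<gene> <isolate>.faa", matching the
    listing above. The header line format is an assumption.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for gene, seq in proteins.items():
            # Wrap the sequence at 60 residues per line, a common FASTA width.
            wrapped = "\n".join(seq[i:i + 60] for i in range(0, len(seq), 60))
            archive.writestr(
                f"{gene} {isolate}.faa", f">{gene} {isolate}\n{wrapped}\n"
            )
    return buffer.getvalue()


# Placeholder sequences for illustration only.
data = build_faa_zip({"acrA": "MK", "blaTEM": "MS"}, "SSS08")
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())
```

Building the archive in memory keeps the sketch free of temporary files. Writing the returned bytes to disk with `open(path, "wb")` produces a regular .zip file.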
- Open the tool in Google Colab
- Upload your .gff3, .fasta, and .txt files
- Enter your isolate name when prompted
- The tool processes your genome and downloads the protein .zip
Input:
- ecoli_annotations.gff3
- ecoli_genome.fasta
- gene_list.txt (contains: acrA, acrB, blaTEM)
Output:
- acrA SSS08.faa
- acrB SSS08.faa
- blaTEM SSS08.faa
📦 Dependencies
Python 3.7+
Biopython
📄 License
MIT License: free to use and adapt for academic or research purposes.
✨ Acknowledgements
Developed with love for wet-lab researchers looking to automate their isolate curation workflows.