A toolkit designed specifically for researchers working with bacterial genomes. This pipeline creates circular genome visualizations and functional annotation analyses.
This toolkit is specifically designed for:
- Microbiologists and environmental microbiologists analyzing bacterial genomes
- Genomics researchers who need visualizations of the genomic DNA
- Research groups working with bacterial isolates from environmental or clinical samples
- Circular maps with customizable color schemes
- GC content and GC skew analysis with automatic calculation and visualization
- Gene annotation display with color-coded functional categories
- Graphic output in multiple formats (PNG, SVG, and PDF)
- Automatic scaling optimized for different genome sizes
- Donut charts for COG/eggNOG functional category distributions
- Color palettes designed for clear contrast between elements
- Statistical summaries of functional annotations
- Multiple visualization options for different presentation needs
- Protein sequence extraction from Excel annotation files
- EggNOG-mapper results parsing and summarization (the EggNOG-mapper analysis itself must be run separately by the user)
- Flexible input/output formats compatible with common bioinformatics tools
Basic Requirements:
- Computer: Windows 10+, macOS 10.15+, or Linux (Ubuntu 18.04+)
- Python: Version 3.8 or higher
- Memory: Minimum 4GB RAM (8GB+ recommended for large genomes)
- Storage: At least 1GB free space for installation and output files
This toolkit uses uv for easy dependency management (similar to how app stores work on your phone):
- Install uv (one-time setup):
# For macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# For Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- Restart your terminal after installation.
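To confirm the setup before running the pipeline, a small check like the one below can help. It is only a sketch: the function name check_environment is illustrative and not part of the toolkit, and it checks nothing beyond the Python version and whether uv is on PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements above


def check_environment(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required, found {found}")
    if shutil.which("uv") is None:
        problems.append("uv was not found on PATH; restart the terminal or reinstall uv")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

If the script prints nothing, both checks passed.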
This section walks you through the entire process from raw data to circular genome figures.
You will need:
- Genome Sequence File (genome.fasta)
  - Contains your bacterial genome DNA sequence
  - Must be in FASTA format (standard format used by most sequencing tools)
  - Can be assembled from your NGS data or downloaded from databases
- Gene Annotation File (annotation.xlsx)
  - Contains information about genes, their locations, and functions
  - Must be in Excel format (.xlsx)
  - Should include columns for gene name, start position, end position, strand, and functional category
Example of a properly formatted annotation file:
| Gene | Start | End | Strand | Function | COG_Category |
|---|---|---|---|---|---|
| gyrA | 100 | 2500 | + | DNA gyrase subunit A | L |
| recA | 2600 | 4000 | - | Recombinase A | L |
| rpoB | 4100 | 7000 | + | RNA polymerase beta | K |
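Before running the pipeline, it can be worth checking that the annotation rows look like the example above. The sketch below works on rows already loaded as dictionaries (however you read the Excel file). The column names follow the example table, while the rules it enforces (integer coordinates, 1 <= Start <= End, strand of + or -) are assumptions, not documented requirements of the toolkit.

```python
REQUIRED_COLUMNS = ("Gene", "Start", "End", "Strand", "Function", "COG_Category")


def validate_annotation_rows(rows):
    """Check rows (a list of dicts keyed by column name) and return a list of error strings."""
    errors = []
    for number, row in enumerate(rows, start=1):
        missing = [name for name in REQUIRED_COLUMNS if name not in row]
        if missing:
            errors.append(f"row {number}: missing columns {', '.join(missing)}")
            continue
        try:
            start, end = int(row["Start"]), int(row["End"])
        except (TypeError, ValueError):
            errors.append(f"row {number}: Start and End must be integers")
            continue
        # Assumed convention: 1-based coordinates with Start <= End on both strands.
        if start < 1 or end < start:
            errors.append(f"row {number}: expected 1 <= Start <= End, got {start}..{end}")
        if row["Strand"] not in ("+", "-"):
            errors.append(f"row {number}: Strand must be '+' or '-', got {row['Strand']!r}")
    return errors


example_rows = [
    {"Gene": "gyrA", "Start": 100, "End": 2500, "Strand": "+",
     "Function": "DNA gyrase subunit A", "COG_Category": "L"},
    {"Gene": "recA", "Start": 2600, "End": 4000, "Strand": "-",
     "Function": "Recombinase A", "COG_Category": "L"},
    {"Gene": "rpoB", "Start": 4100, "End": 7000, "Strand": "+",
     "Function": "RNA polymerase beta", "COG_Category": "K"},
]
print(validate_annotation_rows(example_rows))  # prints []
```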
If you plan to perform functional annotation using EggNOG-mapper:
uv run python create_fasta.py --input your_annotation.xlsx --output proteins.faa
What this does:
- Reads your Excel annotation file
- Extracts protein sequences for each gene
- Creates a FASTA file compatible with functional annotation tools
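For readers curious about what such a conversion involves, here is a minimal sketch of writing protein records to FASTA. It is not the code in create_fasta.py: the function name, the (name, sequence) input shape, and the 60-character line width are all assumptions, and the sequence shown is a toy string rather than a real protein.

```python
import io


def write_protein_fasta(records, handle, width=60):
    """Write (gene_name, protein_sequence) pairs to handle in FASTA format.

    Returns the number of records written; pairs with an empty sequence are skipped.
    """
    written = 0
    for name, sequence in records:
        # Remove whitespace, normalise case, and drop a trailing stop symbol.
        sequence = "".join(str(sequence or "").split()).upper().rstrip("*")
        if not sequence:
            continue
        handle.write(f">{name}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        written += 1
    return written


buffer = io.StringIO()
# Toy records for illustration only; geneB has no sequence and is skipped.
count = write_protein_fasta([("geneA", "MKTAYIAKQR*"), ("geneB", "")], buffer, width=5)
print(count)              # prints 1
print(buffer.getvalue())  # prints >geneA, MKTAY, IAKQR on separate lines
```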
Basic Usage:
uv run python circular_genome_map_publication.py \
--fasta your_genome.fasta \
--anno your_annotation.xlsx \
--window 9000 \
--outdir results
Understanding the Parameters:
- --fasta: Your genome sequence file
- --anno: Your gene annotation Excel file
- --window: Window size for GC content calculation (9000 works well for most bacteria)
- --outdir: Folder where results will be saved
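To make the --window parameter more concrete, here is a minimal sketch of windowed GC content. It assumes consecutive, non-overlapping windows and ignores ambiguous bases such as N; circular_genome_map_publication.py may compute this differently. The function name gc_content_windows is illustrative and not part of the toolkit.

```python
def gc_content_windows(sequence, window=9000):
    """Return (window_start, gc_fraction) pairs for consecutive non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    sequence = sequence.upper()
    results = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        counted = gc + at  # ambiguous bases such as N are ignored
        results.append((start, gc / counted if counted else 0.0))
    return results


# Toy sequence: one GC-only window, one AT-only window, one mixed window.
toy = "GGCC" * 3 + "ATAT" * 3 + "GCAT" * 3
for start, fraction in gc_content_windows(toy, window=12):
    print(start, round(fraction, 2))
# prints:
# 0 1.0
# 12 0.0
# 24 0.5
```

A larger window gives fewer, smoother data points around the ring; a smaller window shows more local detail at the cost of a noisier track.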
Output Files:
- circular_map.png: High-resolution image for presentations
- circular_map.svg: Vector format for printing (infinite resolution)
- circular_map.pdf: PDF format for journal submissions
- gc_content.txt: Tabular data for further analysis
Part A: Run EggNOG-mapper (external tool, not included)
- Upload your proteins.faa file to the EggNOG-mapper web server
- Download the results as a tab-separated file
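The downloaded file can be inspected before it is passed to the toolkit. The sketch below tallies COG category letters from a tab-separated annotations table. It assumes the layout commonly produced by EggNOG-mapper v2 (metadata lines starting with ##, a header row starting with #query, and a COG_category column); check your own file, since these names are assumptions here. It is not the logic of create_functional_table.py.

```python
import csv
import io
from collections import Counter


def count_cog_categories(handle, column="COG_category"):
    """Tally COG category letters from an EggNOG-mapper style annotations table."""
    counts = Counter()
    header = None
    index = None
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0].startswith("##"):
            continue  # skip blank lines and '##' metadata lines
        if header is None:
            # First remaining line is the header, e.g. '#query<TAB>...'.
            header = [name.lstrip("#") for name in fields]
            index = header.index(column)  # raises ValueError if the column is absent
            continue
        value = fields[index].strip() if index < len(fields) else ""
        if value in ("", "-"):
            continue  # no category assigned
        for letter in value:
            counts[letter] += 1  # a multi-letter value such as "EG" counts once per letter
    return counts


example = io.StringIO(
    "## toy example, not real EggNOG-mapper output\n"
    "#query\tCOG_category\tDescription\n"
    "gene1\tL\tDNA gyrase subunit A\n"
    "gene2\tEG\thypothetical two-category protein\n"
    "gene3\t-\t-\n"
)
print(sorted(count_cog_categories(example).items()))
# prints [('E', 1), ('G', 1), ('L', 1)]
```

Counting a multi-letter assignment once per letter is one possible convention; another is to count it once in a combined class. Whichever you choose, state it in your figure legend.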
Part B: Summarize Functional Categories:
uv run python create_functional_table.py \
--input emapper_out.annotations \
--output functional_summary.csv
Part C: Create functional category charts:
Option 1: Donut Chart
uv run python plot_nature_figure.py \
--input functional_summary.csv \
--output_prefix functional_analysis
Option 2: Scientific Bar Chart
uv run python generate_scientific_figure.py \
--input functional_summary.csv \
--output_prefix functional_bars
- Outer Ring 1: Blue colored ring, gene annotations for forward CDS
- Outer Ring 2: Red colored ring, gene annotations for reverse CDS
- Inner Ring 1: Black colored histogram ring, GC Content
- Inner Ring 2: Green and purple colored histogram ring, GC Skew (indicative of replication origin and terminus)
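The link between GC skew and replication can be illustrated with a short sketch. GC skew is computed per window as (G - C) / (G + C). In many bacterial chromosomes the cumulative skew reaches its minimum near the replication origin and its maximum near the terminus. The function names below are illustrative, the windows are assumed to be non-overlapping, and the toolkit's own calculation may differ.

```python
def gc_skew_windows(sequence, window=9000):
    """Return (window_start, skew) pairs, where skew = (G - C) / (G + C) per window."""
    sequence = sequence.upper()
    results = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        results.append((start, (g - c) / (g + c) if g + c else 0.0))
    return results


def cumulative_skew_extremes(skews):
    """Return the window starts where the cumulative skew is lowest and highest."""
    total = 0.0
    cumulative = []
    for start, skew in skews:
        total += skew
        cumulative.append((total, start))
    return min(cumulative)[1], max(cumulative)[1]


# Toy sequence: C-rich first half, G-rich second half.
toy = "CCCCCCCC" + "GGGGGGGG"
skews = gc_skew_windows(toy, window=4)
print(skews)                            # prints [(0, -1.0), (4, -1.0), (8, 1.0), (12, 1.0)]
print(cumulative_skew_extremes(skews))  # prints (4, 12)
```

On real genomes these positions are only estimates. They are limited by the window size and should be confirmed with other evidence, such as the location of dnaA.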
| Category | Description | Color |
|---|---|---|
| J | Translation, ribosomal structure | Blue |
| K | Transcription | Green |
| L | Replication, recombination and repair | Red |
| C | Energy production and conversion | Orange |
| E | Amino acid transport and metabolism | Purple |
| F | Nucleotide transport and metabolism | Brown |
| G | Carbohydrate transport and metabolism | Pink |
| H | Coenzyme transport and metabolism | Gray |
| I | Lipid transport and metabolism | Yellow |
| P | Inorganic ion transport and metabolism | Cyan |
| Q | Secondary metabolites biosynthesis | Magenta |
| R | General function prediction only | Light Blue |
| S | Function unknown | Light Gray |
You can modify colors by editing the script files:
- Open the relevant Python script in a text editor
- Look for color definitions (usually hex codes like #FF0000)
- Replace with your preferred colors
- Save and re-run the script
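A mistyped hex code is an easy mistake to make when editing colors by hand. The sketch below checks a palette before you paste it into a script. The palette dictionary and its hex values are purely hypothetical; the actual variable names and colors in the toolkit's scripts may look different.

```python
import re

# Matches exactly '#' followed by six hexadecimal digits, e.g. #FF0000.
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def invalid_colors(palette):
    """Return the keys of palette whose values are not '#RRGGBB' hex codes."""
    return [key for key, value in palette.items() if not HEX_COLOR.fullmatch(str(value))]


# Hypothetical palette keyed by COG category; the hex values are illustrative only.
palette = {"J": "#1F77B4", "K": "#2CA02C", "L": "#FF0000", "S": "D3D3D3"}
print(invalid_colors(palette))  # prints ['S'] because the leading '#' is missing
```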
For extremely high-resolution outputs:
uv run python circular_genome_map_publication.py \
--fasta your_genome.fasta \
--anno your_annotation.xlsx \
--window 9000 \
--outdir results \
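--dpi 600
Create a plain text file (genome_list.txt) listing one genome FASTA path per line, then run the loop below. Each annotation file is expected next to its genome, named <genome name>_annotation.xlsx: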
while IFS= read -r genome; do
uv run python circular_genome_map_publication.py \
--fasta "$genome" \
--anno "${genome%.fasta}_annotation.xlsx" \
--window 9000 \
--outdir "results_$(basename "$genome" .fasta)"
done < genome_list.txt
Q: "Python command not found" error
A: Install Python 3.8+ from python.org or use your system's package manager.
Q: "uv command not found" error
A: Make sure you installed uv correctly and restarted your terminal. Try running uv --version to verify installation.
Q: Memory errors with large genomes
A: Reduce the --window parameter to a smaller value (e.g., 5000) or use a computer with more RAM.
Q: Empty or corrupted output files
A: Check that your input files are properly formatted and not corrupted. Try opening them in their respective applications first.
Q: Colors don't match my journal's requirements
A: Use the SVG output format and import it into vector graphics software (Adobe Illustrator, Inkscape) for final adjustments.
For Technical Issues:
- Check the error messages carefully for specific file or parameter issues
- Verify your input files match the expected format
- Ensure all required software is properly installed
For Scientific Questions:
- Consult with your bioinformatics core facility
- Reference the original EggNOG-mapper publications for methodology
- Consider collaborating with computational biologists for complex analyses
Circular genome maps are visual representations of bacterial chromosomes that display:
- Gene organization and density
- Functional categorization of genes
- Nucleotide composition patterns (GC content/skew)
- Evolutionary relationships between genomic regions
These visualizations may help researchers identify:
- Replication origin and terminus regions
- Genomic islands and horizontal gene transfer events
- Metabolic capabilities and lifestyle adaptations
- Comparative genomic features between strains
The COG (Clusters of Orthologous Groups) classification system groups genes based on:
- Evolutionary relationships across different species
- Functional similarities and biochemical pathways
- Cellular processes and molecular functions
This standardized system enables meaningful comparisons between different bacterial genomes and facilitates meta-analyses across multiple studies.
If you use this toolkit in your research, please cite:
@software{CirculAn2025,
author = {Alex Prima},
title = {CirculAn: An automated Python toolkit for high-resolution circular genome mapping and functional annotation visualization},
year = {2026},
url = {https://github.com/axp-knickei/CirculAn},
version = {0.1.0},
doi = {10.5281/zenodo.18522435}
}
Additional citations to consider:
- EggNOG-mapper methodology paper (Huerta-Cepas et al., 2017, 2019)
- Original COG database publications (Tatusov et al., 1997, 2003)
- Specific journals' guidelines for figure preparation
This project is released under the MIT License, allowing for both academic and commercial use with proper attribution.
Developed by: Alex
Development Period: 2026
Purpose: Learning to build genome maps from whole-genome sequence data
Target Users: Beginners who want to test and inspect their own genomic datasets
Contributions and feedback are welcome! This toolkit continues to evolve.
Version v0.1.0: Initial release with core circular mapping functionality
Last updated: February 2026
Compatible with Python 3.8+ and modern operating systems