███████╗████████╗ █████╗ ██████╗ ██╗ ██╗███████╗ ██████╗ ██████╗ ██████╗ ███████╗
██╔════╝╚══██╔══╝██╔══██╗██╔══██╗██║ ██║██╔════╝██╔════╝██╔═══██╗██╔══██╗██╔════╝
███████╗ ██║ ███████║██████╔╝███████║███████╗██║ ██║ ██║██████╔╝█████╗
╚════██║ ██║ ██╔══██║██╔═══╝ ██╔══██║╚════██║██║ ██║ ██║██╔═══╝ ██╔══╝
███████║ ██║ ██║ ██║██║ ██║ ██║███████║╚██████╗╚██████╔╝██║ ███████╗
╚══════╝ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═╝ ╚═╝╚══════╝ ╚═════╝ ╚═════╝ ╚═╝ ╚══════╝
A species-optimized computational pipeline for rapid, accessible Staphylococcus aureus genotyping and surveillance
Complete MRSA/MSSA genomic analysis in minutes, not hours
📖 Documentation • ⚡ Quick Start • ✨ Features • 🔧 Installation • 🚀 Usage • 📊 Output • 📈 Performance • 🔮 Future Roadmap • 🤝 Contributing
- 🎯 Overview
- ✨ Key Features
- ⚡ Quick Start
- 🔧 Installation
- 🚀 Usage Guide
- 📊 Output Structure
- 🔍 Analytical Modules
- 📈 Performance Benchmarks
- 🔬 Validation & Accuracy
- 🆚 Tool Comparison
- 🔮 Future Development
- ❓ FAQ
- 🐛 Troubleshooting
- 📚 Citation
- 🙏 Acknowledgements
- 👥 Authors & Contact
- 📄 License
StaphScope is an automated, locally-executable computational pipeline designed specifically for comprehensive Staphylococcus aureus genomic surveillance. It addresses the critical bottleneck in MRSA (Methicillin-Resistant S. aureus) research by integrating six essential genotyping methods into a single, cohesive workflow.
- Fragmented Bioinformatics: Traditional MRSA analysis requires 5+ separate tools with conflicting dependencies
- Resource Barriers: Web-based services need constant internet and raise data privacy concerns
- Time Constraints: Generalist platforms take hours; outbreaks need answers in minutes
- Interpretation Challenges: Raw data without epidemiological context limits actionable insights
StaphScope delivers:
- ✅ Single-command installation via Conda
- ✅ 10-14 minute complete analysis (24 samples, 16 cores)
- ✅ 100% local execution with data privacy
- ✅ Intelligent resource management using Python's psutil library
- ✅ Interactive HTML reports with epidemiological context
- ✅ Automated MRSA/MSSA classification with confidence scoring
Perfect for: Clinical labs, outbreak investigations, research studies, and public health surveillance.
| Module | 🎯 Purpose | 📊 Key Outputs | ⚡ Speed |
|---|---|---|---|
| MLST Typing | Phylogenetic classification via 7 housekeeping genes | ST, CC, allele profiles, epidemiological context | <1 min |
| spa Typing | Hypervariable region analysis of protein A gene | spa type, repeat patterns, alignment metrics | <1 min |
| SCCmec Typing | Methicillin resistance cassette characterization | SCCmec type (I-XIII), mec/ccr complexes, confidence scores | 1-2 min |
| AMR Profiling | Comprehensive resistance gene detection | 5,000+ AMR genes, risk categorization, cross-sample patterns | 2-3 min |
| ABRicate Screening | Multi-database virulence/plasmid detection | 9 databases, plasmid replicons, virulence factors, clinical flags | 3-4 min |
| Lineage Database | Global epidemiological context | 44 major lineages, geographical distribution, outbreak potential | Instant |
- Automated MRSA Classification: Based on concurrent mecA/mecC + SCCmec detection
- Clinical Gene Flagging: Automatic highlighting of PVL, enterotoxins, van genes
- Risk Assessment: Categorizes genes as 'Critical Risk' (e.g., mecA, vanA) or 'High Risk'
- Cross-Genome Pattern Discovery: Summarizes gene frequencies across entire sample sets
- Curated Lineage Database: 44 major lineages with HA-MRSA, CA-MRSA, LA-MRSA classifications
- 8-10× faster than Bactopia for S. aureus-specific analyses
- Linear scaling with sample numbers (R² = 0.931)
- Dynamic resource allocation using Python psutil
- Low memory footprint: Runs on 4GB RAM, scales to HPC clusters
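The resource-aware behaviour described above can be pictured with a small sketch. This is not StaphScope's actual code: the function names, the 2 GB-per-worker budget, and the one-core reserve are assumptions made for illustration. Only `psutil.cpu_count` and `psutil.virtual_memory` are real library calls, and the sketch falls back to the standard library when psutil is not installed.

```python
import os

try:
    import psutil  # third-party; optional in this sketch
except ImportError:
    psutil = None


def pick_threads(cpu_count, available_gb, gb_per_worker=2.0, reserve_cores=1):
    """Return a worker count bounded by both free cores and free memory.

    gb_per_worker and reserve_cores are illustrative defaults, not
    values taken from StaphScope.
    """
    by_cpu = max(1, cpu_count - reserve_cores)
    by_mem = max(1, int(available_gb // gb_per_worker))
    return min(by_cpu, by_mem)


def detect_resources():
    """Read the core count and available memory in GB."""
    if psutil is not None:
        cores = psutil.cpu_count(logical=True) or 1
        free_gb = psutil.virtual_memory().available / 1024**3
    else:
        cores = os.cpu_count() or 1
        free_gb = 4.0  # assumed value when memory cannot be measured
    return cores, free_gb


cores, free_gb = detect_resources()
print(f"Suggested --threads value: {pick_threads(cores, free_gb)}")
```

Taking the minimum of the two bounds keeps a memory-poor machine from being oversubscribed even when it has many cores.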
# Method 1: Conda (recommended; handles all dependencies)
# List conda-forge first (it carries newer Biopython versions)
conda create -n staphscope -c conda-forge -c bioconda -c bbeckley-hub staphscope -y
conda activate staphscope
# Or: create an environment pinned to Python 3.9 and install staphscope
conda create -n staphscope-env python=3.9 staphscope -c conda-forge -c bioconda
conda activate staphscope-env
# Or: install using mamba (faster dependency solving)
mamba create -n staphscope -c conda-forge -c bioconda -c bbeckley-hub staphscope -y
# Or: install into an environment that is already active
mamba install -c conda-forge -c bioconda -c bbeckley-hub staphscope
# Method 2: From source
git clone https://github.com/bbeckley-hub/staphscope-typing-tool.git
cd staphscope-typing-tool
conda env create -f environment.yml
conda activate staphscope
pip install -e .
Update databases (recommended): see the database resources listed under Installation for StaphScope's integrated databases.
### **Run your first analysis**
```bash
# Single genome analysis
staphscope -i genome.fasta -o results/
# Batch processing (24 genomes)
staphscope -i "*.fna" -o batch_results --threads 16
# Analysis complete in ~14 minutes! 🎉
usage: staphscope [-h] -i INPUT -o OUTPUT [-t THREADS] [--skip-amr]
[--skip-abricate] [--skip-mlst] [--skip-spa] [--skip-sccmec]
[--skip-lineage] [--skip-comprehensive]
StaphScope: Complete S. aureus Typing Pipeline
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input FASTA file(s) - can use glob patterns like
"*.fna" or "*.fasta"
-o OUTPUT, --output OUTPUT
Output directory for all results
-t THREADS, --threads THREADS
Number of threads (default: 2)
--skip-amr Skip AMR analysis (AMRfinderPlus)
--skip-abricate Skip ABRicate analysis
--skip-mlst Skip MLST analysis
--skip-spa Skip spa typing analysis
--skip-sccmec Skip SCCmec analysis
--skip-lineage Skip lineage reference generation
--skip-comprehensive Skip comprehensive report generation (MLST + spa +
SCCmec)
Examples:
staphscope -i genome.fna -o results/
staphscope -i "*.fna" -o batch_results --threads 8
staphscope -i "*.fasta" -o analysis --threads 16 --skip-lineage
staphscope -i "genome*.fa" -o results/ --threads 4 --skip-comprehensive
Supported FASTA formats: .fna, .fasta, .fa, .fn
Analysis Modules:
• MLST (Multi-Locus Sequence Typing)
• spa typing (Staphylococcal Protein A)
• SCCmec typing (Methicillin Resistance Cassette)
• AMR profiling (Antimicrobial Resistance)
• ABRicate (Comprehensive resistance/Plasmid/virulence)
• Lineage reference database
• Comprehensive report (MLST + spa + SCCmec summary)
```
Output: Comprehensive results for all analyses, organized in per-module directories.
Run abricate --setupdb after installation to fetch the latest gene annotations.
⭐ Star us on GitHub if you find this tool useful!
Transforming fragmented genomic data into coherent biological narratives 🧬✨
| Resource | Minimum | Recommended | Production |
|---|---|---|---|
| CPU Cores | 2 | 8+ | 16+ |
| RAM | 4 GB | 8 GB | 16 GB |
| Storage | 2 GB | 10 GB | 50 GB+ |
| OS | Linux, macOS, WSL2 | Linux | Linux Cluster |
# Download Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Follow prompts, then:
source ~/.bashrc
# Create and activate environment (Python 3.8 to 3.14 are listed as supported; 3.10 shown here)
conda create -n staphscope python=3.10
conda activate staphscope
# Install, listing conda-forge first
conda install -c conda-forge -c bioconda -c bbeckley-hub staphscope
# Or install using mamba (faster)
mamba install -c conda-forge -c bioconda -c bbeckley-hub staphscope
# Verify installation
staphscope --version
NB: Always check the order of your channels before installing StaphScope to avoid Conda issues.
# Add channels so that conda-forge ends up with the highest priority
# (conda config --add prepends, so the channel added last ranks first)
conda config --add channels bbeckley-hub
conda config --add channels bioconda
conda config --add channels conda-forge
# Update ABRicate databases (version ≥1.0.1)
abricate --setupdb
Database resources:
- PubMLST S. aureus: https://pubmlst.org/organisms/staphylococcus-aureus
- mlst: https://github.com/tseemann/mlst
- Ridom spa repeats: http://spa.ridom.de/dynamic/sparepeats.fasta
- Ridom spa types: https://spa.ridom.de/dynamic/spatypes.txt
- SCCmecFinder: https://bitbucket.org/genomicepidemiology/Sccmecfinder
- AMRFinderPlus: https://github.com/ncbi/amr
# 🐳 Docker Installation & Usage
StaphScope is available as a Docker container for easy, reproducible, and portable analysis.
### Option 1: Pull from Docker Hub
```bash
# Pull the latest image
docker pull bbeckleyhub/staphscope:latest
# Run a quick test
docker run --rm bbeckleyhub/staphscope:latest --help
```
### Option 2: Build from Source
# Clone the repository
git clone https://github.com/bbeckley-hub/staphscope-typing-tool
cd staphscope-typing-tool
# Build the Docker image
docker build -t staphscope:latest .
# Test the image
docker run --rm staphscope:latest --help
The official Docker image is available on Docker Hub:
docker pull bbeckleyhub/staphscope:latest
docker pull bbeckleyhub/staphscope:1.0.0 # Specific version
# Prepare directories
mkdir -p data/input data/output
# Add your FASTA file
cp your_genome.fasta data/input/
# Run analysis
docker run --rm \
  -v "$(pwd)/data/input":/data/input \
  -v "$(pwd)/data/output":/data/output \
  bbeckleyhub/staphscope:latest \
  -i "your_genome.fasta" \
  -o /data/output \
  -t 4
# Prepare batch input
mkdir -p batch_analysis/input batch_analysis/output
cp *.fasta batch_analysis/input/
# Run batch analysis with 8 threads
docker run --rm \
-v $(pwd)/batch_analysis/input:/data/input \
-v $(pwd)/batch_analysis/output:/data/output \
bbeckleyhub/staphscope:latest \
-i "*.fasta" \
-o /data/output \
-t 8
# Test with minimal data
mkdir -p test/input test/output
echo ">test" > test/input/test.fasta
echo "ATCG" >> test/input/test.fasta
docker run --rm \
-v $(pwd)/test/input:/data/input \
-v $(pwd)/test/output:/data/output \
bbeckleyhub/staphscope:latest \
-i "*.fasta" -o /data/output \
--skip-amr --skip-abricate --skip-mlst \
--skip-spa --skip-sccmec --skip-lineage --skip-comprehensive
When running the container, mount these directories:
| Host Directory | Container Path | Purpose |
|---|---|---|
| ./input/ | /data/input | Input FASTA files (.fasta, .fna, .fa) |
| ./output/ | /data/output | All analysis results |
# Show help
docker run --rm bbeckleyhub/staphscope:latest --help
# Add user to docker group (Linux)
sudo usermod -aG docker $USER
newgrp docker
# Or run with sudo
sudo docker run --rm bbeckleyhub/staphscope:latest --help
# Limit memory usage
docker run --memory="8g" --rm bbeckleyhub/staphscope:latest ...
- Docker Engine 20.10+ or Docker Desktop 4.0+
- Minimum RAM: 4GB (8GB recommended for large datasets)
- Disk Space: 2GB for image + space for input/output
- CPU: 2+ cores (4+ recommended)
# Pull latest version
docker pull bbeckleyhub/staphscope:latest
# Remove old versions
docker image prune
✨ Pro Tip: For production use, consider using Docker volumes for persistent storage:
docker volume create staphscope_data
docker run -v staphscope_data:/data/output bbeckleyhub/staphscope:latest ...
Image size: staphscope:latest is approximately 500 MB.
# Single genome analysis
staphscope -i /path/to/genome.fasta -o /path/to/results
# Batch processing with wildcards
staphscope -i "*.fna" -o results_2025 --threads 8
# Specify custom number of threads
staphscope -i "*.fasta" -o analysis -t 16
# Skip specific modules (if already analyzed)
staphscope -i sample.fna -o results --skip-spa --skip-lineage
- Accepted: `.fna`, `.fasta`, `.fa`, `.fn`
- Required: Assembled genomes (contigs or complete)
- Not supported: Raw reads (FASTQ); see Future Roadmap
- Batch patterns: `*.fasta`, `sample_*.fna`, `[0-9].fa`
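To check in advance which files a batch pattern will pick up, the input rules above can be approximated in a few lines. This is a sketch under stated assumptions, not the pipeline's own input handling: `collect_genomes` is a hypothetical helper, and the extension set is copied from the accepted formats listed above.

```python
import glob
from pathlib import Path

# Extensions listed as accepted in this README.
FASTA_EXTENSIONS = {".fna", ".fasta", ".fa", ".fn"}


def collect_genomes(pattern):
    """Expand a glob pattern and keep only files with an accepted FASTA extension."""
    matches = (Path(p) for p in glob.glob(pattern))
    return sorted(
        p for p in matches
        if p.is_file() and p.suffix.lower() in FASTA_EXTENSIONS
    )
```

For example, `collect_genomes("daily_isolates/*")` would skip any FASTQ files sitting in the same folder, which is useful because raw reads are not supported.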
# Daily surveillance of 12 isolates
staphscope -i "daily_isolates/*.fasta" -o /mnt/shared/surveillance/$(date +%Y%m%d) --threads 12
# Expected: Complete analysis in ~8 minutes
# Output: Interactive HTML report for clinical team review
# 100-genome phylogeny project
# Supports glob patterns
# Expected: All 100 genomes analyzed in ~90 minutes
# Output: Machine-readable JSON/TSV for downstream analysis
# Urgent outbreak investigation (8 suspected cases)
staphscope -i "outbreak/*.fasta" -o /tmp/urgent_analysis --skip-lineage
# Expected: Results in ~4 minutes
# Output: Immediate identification of shared SCCmec types and resistance profiles
StaphScope generates a comprehensive, organized output directory with results from each analysis module. Below is a typical directory tree (example from sample MRSA252):
Staphscope/
├── abricate_results/ # Gene detection (ABRicate)
│ ├── MRSA252/ # Per-sample detailed results
│ │ ├── abricate_*.txt # Raw ABRicate outputs per database
│ │ ├── abricate_*_report.html # HTML reports per database
│ │ └── MRSA252_comprehensive_abricate_report.html
│ ├── staph_*_abricate_summary.tsv # Combined TSV summaries per database
│ ├── staph_*_summary.json # JSON summaries per database
│ ├── staph_*_summary_report.html # HTML summary reports per database
│ └── staph_abricate_master_summary.json # Master summary across all DBs
│
├── amr_results/ # AMR gene profiling (AMRFinder+)
│ ├── MRSA252/
│ │ ├── MRSA252_amrfinder.txt # Raw AMRFinder output
│ │ └── MRSA252_amrfinder_report.html
│ ├── staph_amrfinder_summary.tsv # Tabular summary
│ ├── staph_amrfinder_summary.json # JSON summary
│ ├── staph_amrfinder_summary_report.html
│ ├── staph_amrfinder_statistics_summary.tsv # Statistical summary
│ └── staph_amrfinder_master_summary.json # Master JSON
│
├── mlst_results/ # Multi-Locus Sequence Typing
│ ├── MRSA252/
│ │ ├── mlst_raw_output.txt # Raw MLST output
│ │ ├── mlst_report.txt/.tsv/.html # Formatted reports
│ ├── mlst_summary.tsv # Combined TSV summary
│ ├── mlst_summary.json # Combined JSON summary
│ └── mlst_summary.html # Combined HTML report
│
├── sccmec_results/ # SCCmec typing (MyKmerFinder)
│ ├── s_MRSA252/ # Per-sample SCCmec results
│ │ ├── results_MyKmerFinder.txt # Kmer-based typing results
│ │ ├── results_tab_MyDbFinder.txt # Database matching results
│ │ ├── sccmec_detailed_results.txt # Detailed typing report
│ │ ├── sccmec_enhanced_report.json # Enhanced JSON report
│ │ └── staphscope_comprehensive_report.html
│ ├── staphscope_summary.tsv # Combined SCCmec summary
│ ├── staphscope_summary.html # HTML summary
│ └── staphscope_detailed_results.csv # Detailed combined results
│
├── spa_results/ # spa typing
│ ├── MRSA252/
│ │ ├── spa_typing_raw.txt # Raw spa typing output
│ │ ├── spa_typing_report.txt/.tsv/.html
│ ├── spa_summary.tsv # Combined TSV summary
│ ├── spa_summary.json # Combined JSON summary
│ └── spa_summary.html # Combined HTML report
│
├── lineage_results/ # Phylogenetic lineage assignment
│ └── staphscope_lineage_reference.html # Lineage reference report
│
└── Staphscope_final_report/ # Consolidated final reports
├── staphscope_comprehensive_report.html/.json/.tsv # Master reports
└── STAPHSCOPE_ULTIMATE_REPORTS # High-level summary (gene-centric analysis)
- .txt/.tsv/.csv: Raw and tabulated data for downstream analysis
- .json: Structured data for programmatic access and integration
- .html: Interactive visual reports for manual inspection
- Per-sample directories: Contain raw and detailed outputs for individual isolates
- Summary files: Aggregated results across all processed samples
- Single-sample overview: Check the sample-specific directory (e.g., MRSA252/) within each module
- Cross-sample summaries: Look for `*_summary.tsv` or `*_summary.json` in each module's root
- Final consolidated report: All key results merged in Staphscope_final_report/
- Visualization-ready data: Most .tsv and .json files are optimized for the planned Automatic Visualization Module
This organized structure supports easy navigation, reproducibility, and integration with downstream bioinformatics workflows.
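Because the cross-sample summaries follow a `*_summary.tsv` / `*_summary.json` naming pattern in each module's root, they can be located programmatically. The sketch below only walks the directory layout shown above. `find_summaries` is a hypothetical helper, and it makes no assumption about the columns inside the files.

```python
from pathlib import Path


def find_summaries(output_dir):
    """Map each module directory to the summary TSV/JSON files in its root."""
    summaries = {}
    for module_dir in sorted(Path(output_dir).iterdir()):
        if not module_dir.is_dir():
            continue
        files = sorted(
            f.name for f in module_dir.iterdir()
            if f.is_file() and f.name.endswith(("_summary.tsv", "_summary.json"))
        )
        if files:
            summaries[module_dir.name] = files
    return summaries
```

Per-sample subdirectories such as MRSA252/ are skipped because only files directly inside each module directory are considered.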
- Dashboard Overview: Summary statistics at a glance
- Interactive Tables: Sort, filter, search all results
- Cross-Genome Pattern Discovery: Gene frequencies summarized across the sample set
- Clinical Alerts: Color-coded risk indicators
// Example JSON output structure
{
"sample": "USA300_FPR3757",
"mlst": {"st": "ST8", "cc": "CC8", "alleles": ["3","3","1","1","4","4","3"]},
"spa": {"type": "t008", "repeats": "11-19-12-21-17-34-24-34-22-25"},
"sccmec": {"type": "IV(2B)", "confidence": "very-high", "mec_complex": "B", "ccr_complex": "2"},
"mrsa_status": "MRSA",
"amr_genes": [
{"gene": "mecA", "risk": "CRITICAL", "coverage": 100, "identity": 99.8},
{"gene": "fosB", "risk": "HIGH", "coverage": 100, "identity": 98.5}
],
"virulence_factors": ["lukS-PV", "lukF-PV", "hlgA", "hlgB", "hlgC"],
"lineage": {
"name": "USA300",
"classification": "CA-MRSA",
"risk_level": "High",
"geography": "North America",
"pvl_status": "Positive"
}
}
- Database: PubMLST S. aureus
- Method: BLAST-based allele calling (100% coverage/identity default)
- Output: ST, CC, 7-gene profile (arcC, aroE, glpF, gmk, pta, tpi, yqiL)
- Enhanced: Automatic lineage database query for epidemiological context
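The MLST step boils down to an exact lookup: seven allele numbers, in a fixed locus order, map to one sequence type. The sketch below illustrates that idea with two well-known profiles hard-coded for demonstration. A real run uses the full PubMLST scheme through the mlst tool, and `assign_st` is a hypothetical name.

```python
# Locus order of the S. aureus MLST scheme, as listed above.
LOCI = ("arcC", "aroE", "glpF", "gmk", "pta", "tpi", "yqiL")

# Two well-known profiles, hard-coded for illustration only.
PROFILES = {
    (3, 3, 1, 1, 4, 4, 3): "ST8",
    (1, 4, 1, 4, 12, 1, 10): "ST5",
}


def assign_st(alleles):
    """Return the ST for a complete allele profile.

    Returns None if a locus is missing, an allele is not an exact
    (numeric) match, or the combination is not in the table.
    """
    try:
        key = tuple(int(alleles[locus]) for locus in LOCI)
    except (KeyError, TypeError, ValueError):
        return None
    return PROFILES.get(key)
```

Requiring every locus to resolve to a number mirrors the 100% coverage/identity default: a partial or novel allele yields no ST rather than a guess.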
- Database: Ridom spa repeat database
- Method: BLAST against repeat sequences
- Output: spa type, repeat pattern, contig location, alignment metrics
- Tool: SCCmecFinder (hierarchical two-method system)
- Primary: Gene-based (ccr/mec complexes, 90% ID, 60% coverage)
- Secondary: k-mer homology (types I-XIII, ≥50% template coverage)
- Confidence Levels: Very-high, High, Medium, Low, Not Assigned
- Subtyping: Types IV and V community-associated cassettes
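The two-method hierarchy above can be read as "trust gene hits first, fall back to k-mer homology". The sketch below encodes only the thresholds quoted in this list (90% identity and 60% coverage for genes, at least 50% template coverage for k-mers). The function name and the dictionary fields are assumptions for illustration, and the real SCCmecFinder logic is considerably more detailed.

```python
def choose_sccmec_call(gene_hits, kmer_hit):
    """Pick which evidence to report for one genome.

    gene_hits: list of dicts with 'gene', 'identity', 'coverage' (percent).
    kmer_hit:  dict with 'type' and 'template_coverage' (percent), or None.
    Field names are hypothetical, chosen for this sketch.
    """
    # Primary method: mec/ccr gene hits passing both thresholds.
    passing = [
        h for h in gene_hits
        if h["identity"] >= 90 and h["coverage"] >= 60
    ]
    if passing:
        return {"method": "gene-based", "genes": [h["gene"] for h in passing]}
    # Secondary method: whole-cassette k-mer homology.
    if kmer_hit and kmer_hit["template_coverage"] >= 50:
        return {"method": "k-mer", "type": kmer_hit["type"]}
    return {"method": None, "type": "Not Assigned"}
```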
- Tool: NCBI AMRFinderPlus v4.2.4 (bundled, with curated database 2025-12-03.1)
- Optimization: S. aureus-specific database curation
- Risk Assessment: Critical Risk (mecA, vanA, cfr), High Risk (erm, tetM)
- Pattern Discovery: Cross-genome frequency analysis
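A minimal sketch of the risk tiers named above, assuming a simple name-based lookup. It covers only the example genes quoted in this list (mecA, vanA, cfr as Critical Risk; the erm family and tetM as High Risk), so it is not StaphScope's full rule set, and `risk_level` is a hypothetical helper. Parentheses are stripped so that AMRFinderPlus-style symbols such as `erm(C)` and `tet(M)` also match.

```python
# Example genes taken from the list above; not an exhaustive rule set.
CRITICAL_GENES = {"meca", "vana", "cfr"}
HIGH_RISK_PREFIXES = ("erm", "tetm")


def risk_level(gene):
    """Classify a gene symbol into one of the illustrative risk tiers."""
    name = gene.strip().lower().replace("(", "").replace(")", "")
    if name in CRITICAL_GENES:
        return "Critical Risk"
    if name.startswith(HIGH_RISK_PREFIXES):
        return "High Risk"
    return "Other"  # not covered by this small example table
```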
- Databases:
- VFDB (Virulence factors)
- ResFinder (Acquired resistance)
- CARD (Comprehensive resistance)
- PlasmidFinder (Replicon typing)
- MEGARes, NCBI, ARG-ANNOT, EcOH, Ecoli_VF
- Thresholds: ≥80% identity and coverage
- Clinical Flags: Automatic highlighting of PVL, enterotoxins, van genes
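The ≥80% identity and coverage rule can be reproduced on a raw ABRicate table for spot checks. The sketch below reads ABRicate's tab-separated output and relies on its `GENE`, `%COVERAGE` and `%IDENTITY` columns. `filter_hits` is a hypothetical helper, not part of StaphScope or ABRicate.

```python
import csv
import io


def filter_hits(tsv_text, min_identity=80.0, min_coverage=80.0):
    """Return gene names from ABRicate TSV text that pass both thresholds."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        identity = float(row["%IDENTITY"])
        coverage = float(row["%COVERAGE"])
        if identity >= min_identity and coverage >= min_coverage:
            kept.append(row["GENE"])
    return kept
```

To use it on a file, read the file's contents with `open(path).read()` and pass the resulting text to `filter_hits`.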
- Content: 44 major S. aureus lineages (18 HA-MRSA, 19 CA-MRSA, 7 LA-MRSA)
- Metadata: Geographical distribution, clinical significance, virulence profiles
- High-Risk: 9 lineages flagged as high-risk, 5 PVL-positive
- Manual Curation: Updated via periodic literature review
| System | Samples | Time | Speed vs Bactopia |
|---|---|---|---|
| 💻 Laptop (2 cores, 8GB RAM) | 1 | 2m 33s | 5× faster |
| 💻 Laptop (2 cores, 8GB RAM) | 24 | 28m 17s | 6× faster |
| 🖥️ Workstation (16 cores, 16GB RAM) | 1 | 1m 31s | 8× faster |
| 🖥️ Workstation (16 cores, 16GB RAM) | 24 | 14m 34s | 10× faster |
| 🖥️ Workstation (16 cores, 16GB RAM) | 100 | ~60m | 12× faster |
- Memory Usage: 2-4 GB typical, scales linearly with samples
- CPU Utilization: Dynamic allocation via psutil (no resource waste)
- Storage: ~100 MB per sample analysis
- Parallelization: Sample-level + intra-module threading
| Reference Strain | Expected Type | StaphScope Result | Concordance |
|---|---|---|---|
| USA300 | ST8–t008–IV(2B) | ST8–t008–IV(2B) | ✅ 100% |
| N315 | ST5–t002–II(2A) | ST5–t002–II(2A) | ✅ 100% |
| MRSA252 | ST36–t018–II(2A) | ST36–t018–II(2A) | ✅ 100% |
| TW20 | ST239–t037–III(3A) | ST239–t037–III(3A) | ✅ 100% |
| NCTC8325 | ST8–t211–None | ST8–t211–Not Assigned | ✅ 100% |
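The concordance column above compares each expected ST–spa–SCCmec string with the reported one. The sketch below shows one way to compute it. It treats "None" and "Not Assigned" as the same outcome, which is how the NCTC8325 row is scored in the table. Both function names are hypothetical.

```python
EN_DASH = "\u2013"  # separator used in the table's type strings
NO_CASSETTE = {"none", "not assigned"}


def normalise(call):
    """Split 'ST–spa–SCCmec' and unify the labels meaning 'no cassette'."""
    st, spa, sccmec = (part.strip() for part in call.split(EN_DASH))
    if sccmec.lower() in NO_CASSETTE:
        sccmec = None
    return st, spa, sccmec


def concordance(pairs):
    """Percentage of (expected, observed) pairs that agree after normalisation."""
    matches = sum(normalise(exp) == normalise(obs) for exp, obs in pairs)
    return 100.0 * matches / len(pairs)
```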
- MRSA: 21 isolates (87.5%)
- MSSA: 3 isolates (12.5%)
- Dominant STs: ST5 (9), ST8 (5), ST22 (2)
- Critical Genes: mecA (21), mecC (1), fosB (20)
- PVL: 7 isolates (29.2%), all ST8/ST59
- Plasmids: 14/24 genomes (58.3%) with plasmid replicons
StaphScope was validated against gold-standard reference genomes with 100% concordance for:
- MLST types (PubMLST database)
- spa types (Ridom database)
- SCCmec types (CGE reference)
- AMR profiles (NCBI-AMRFinderPlus)
24 diverse S. aureus genomes analyzed:
# Detected lineages
ST5 (Healthcare-associated): 9 isolates (37.5%)
ST8 (USA300, CA-MRSA): 5 isolates (20.8%)
ST22 (EMRSA-15): 2 isolates (8.3%)
ST239 (Brazilian/Hungarian): 1 isolate (4.2%)
ST59 (Asian CA-MRSA): 1 isolate (4.2%)
ST398 (Livestock-associated): 1 isolate (4.2%)
ST9 (Livestock-associated): 2 isolates (8.3%)
ST36 (EMRSA-16): 1 isolate (4.2%)
ST425: 1 isolate (4.2%)
| Gene | Prevalence | Risk Level | Phenotype |
|---|---|---|---|
| mecA | 87.5% (21/24) | CRITICAL | Methicillin resistance |
| fosB | 83.3% (20/24) | HIGH | Fosfomycin resistance |
| blaZ | 37.5% (9/24) | HIGH | Beta-lactamase |
| dfrG | 16.7% (4/24) | HIGH | Trimethoprim resistance |
| mecC | 4.2% (1/24) | CRITICAL | Alternative methicillin resistance |
- 100% prevalence: hlgA/B/C (gamma-hemolysin), hld (delta-hemolysin), aur (aureolysin)
- 29.2% prevalence: PVL genes (lukS-PV, lukF-PV) in ST8/ST59 lineages
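The prevalence figures in this section are plain proportions, for example 21 of 24 genomes carrying mecA gives 87.5%. A small sketch of that arithmetic follows, assuming per-sample gene lists as input. `gene_prevalence` is a hypothetical helper and does not read StaphScope output directly.

```python
from collections import Counter


def gene_prevalence(samples):
    """Compute per-gene prevalence across a sample set.

    samples: dict mapping sample name -> iterable of gene names.
    Returns {gene: (count, percent)}, percent rounded to one decimal.
    Each gene is counted at most once per sample.
    """
    total = len(samples)
    counts = Counter(g for genes in samples.values() for g in set(genes))
    return {g: (n, round(100.0 * n / total, 1)) for g, n in counts.items()}
```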
| Feature | StaphScope | Bactopia | Nullarbor | Mykrobe |
|---|---|---|---|---|
| Analysis Focus | 🎯 S. aureus-optimized | Multi-species | Multi-species | Multi-species |
| Input Format | Assembled genomes | Raw reads | Raw reads | Raw reads |
| Installation | Single Conda package | Complex (Nextflow+Docker) | Conda + DB downloads | Single Conda |
| Execution | Local CLI | Local/Cluster | Local | CLI + Web GUI |
| Parallelization | Auto-resource detection | Pipeline-level | Sample-level | Single-threaded |
| MRSA Features | Integrated classification + lineage DB + S. aureus-specific typing | General typing | General typing | Resistance only |
| Critical Gene Flagging | ✅ mecA, PVL, van genes | ❌ Absent | ❌ Absent | ❌ Absent |
| Resource Needs | Low-moderate (2+ GB) | High (HPC recommended) | High (Cluster) | Low-moderate |
| Setup Ease | Single command | Multiple steps | Multiple steps | Single command |
- ✅ Ideal for: S. aureus-specific research, clinical MRSA surveillance, outbreak response
- ✅ Best when: You need integrated typing + resistance + virulence in one workflow
- ✅ Perfect if: You value speed (minutes vs hours) and data privacy (local execution)
- ⚠️ Use Bactopia/Nullarbor: Multi-species projects, raw read analysis, extensive QC
- ⚠️ Use Mykrobe: Quick resistance profiling only, web interface preferred
STAPHSCOPE generates comprehensive HTML reports that are perfect for AI analysis. Here's how to use AI tools to get the most from your data.
- Install any AI browser extension (ChatGPT, Claude, or Gemini)
- Open your report: staphscope_ultimate_report.html
- Select & Ask:
  - Navigate to any section (AMR Genes, MLST Analysis, etc.)
  - Select the text/data you're interested in
  - Right-click → Choose your AI extension → Ask your question
"What is the clinical significance of ST5 vs ST8?"
"Which samples are MRSA and what ST are they?"
"Explain the mecA gene and its importance"
"Which samples have multiple resistance genes?"
"What treatment implications do these genes have?"
"Which samples carry PVL toxin?"
"Are there any high-risk virulence combinations?"
"Are there correlations between ST and specific genes?"
"Identify any concerning patterns in this dataset"
- Provide context: Start with "I'm analyzing S. aureus genomics data..."
- Be specific: Instead of "tell me about this", ask "what does SCCmec type IV indicate?"
- Ask for interpretations: "What are the clinical implications of these findings?"
- Request summaries: "Summarize the resistance profile of sample XYZ"
STAPHSCOPE reports are structured with clear tables and organized data that AI can easily understand. Each gene is shown with ALL genomes that contain it, making pattern analysis straightforward.
- GitHub Issues: bbeckley-hub/staphscope-typing-tool
- Email: [email protected]
- University of Ghana Medical School
- Install AI extension (ChatGPT/Claude/Gemini)
- Open staphscope_ultimate_report.html
- Select text → Right-click → Ask AI
• "Most common sequence types?"
• "Clinical significance of [ST]?"
• "Samples with [gene]?"
• "Treatment implications?"
• "PVL toxin carriers?"
• "High-risk combinations?"
• "Correlations between ST and resistance?"
• "Concerning patterns?"
# Planned machine learning module
staphscope --ml-predict --input results.json --model outbreak_risk
# Raw read support (in development)
staphscope --raw-reads sample_R1.fastq sample_R2.fastq --assembler shovill
This planned module will automatically generate publication-quality visualizations from StaphScope analysis results using modern Python plotting libraries.
- Multi-format Support: Generate PNG, SVG, PDF, and interactive HTML visualizations
- Comprehensive Plot Types:
- Statistical Plots: Box plots, violin plots, and distribution histograms
- Comparison Charts: Bar charts, grouped bars, and stacked plots
- Trend Analysis: Line graphs, scatter plots with regression lines
- Composition Views: Pie charts, donut charts, and treemaps
- Correlation Insights: Heatmaps, pair plots, and correlation matrices
- Smart Defaults: Automatically selects appropriate plot types based on data structure
- Customizable Themes: Built-in color palettes optimized for scientific publishing
- Seaborn: Statistical visualizations with beautiful default styles
- Matplotlib: Foundation layer for complete customization
- Plotly: Interactive HTML plots for exploratory analysis
- Pandas: Built-in plotting for quick data exploration
- sample_distribution.png: Diversity metrics across samples
- variant_frequency_bar.svg: Top mutations with confidence intervals
- correlation_heatmap.html: Interactive sample similarity matrix
- time_series_trend.pdf: Longitudinal tracking of key markers
- Outbreak Prediction: Identify emerging patterns and transmission networks
- Phenotype Inference: Predict virulence, transmissibility from genotype
- Risk Scoring: Automated risk assessment for clinical isolates
- Anomaly Detection: Flag novel or unexpected genetic combinations
- Raw Read Support: Direct FASTQ analysis with integrated assembly (Snippy)
- Real-Time Updates: Live database synchronization
- Plugin System: Community-contributed analysis modules
- Database Contributions: User-submitted lineage updates
- Benchmark Datasets: Shared validation datasets
- Translation Support: Help translate the interface to your language
Q: Is StaphScope free to use?
A: Yes! StaphScope is open-source under the MIT License. Free for academic, clinical, and commercial use.
Q: What makes StaphScope different from other tools?
A: StaphScope is S. aureus-optimized, integrates 6 analysis types in one workflow, runs 8-10× faster than generalist tools, and includes a curated global lineage database.
Q: Can I use StaphScope for clinical diagnosis?
A: StaphScope is a research tool. While highly accurate, results should be validated with orthogonal methods for clinical decision-making.
Q: Why only assembled genomes? When will raw read support be added?
A: We focused first on assembled genomes for speed and simplicity. Raw read support is our #1 priority for 2026 development.
Q: How often are databases updated?
A: We plan a new release whenever database updates are needed. The lineage database is manually curated every 6 months. Users can run abricate --setupdb at any time.
Q: Can I run StaphScope on Windows?
A: Yes, via WSL2 (Windows Subsystem for Linux). Native Windows support is planned.
Q: How do I handle very large batches (1000+ genomes)?
A: Use glob patterns and take a coffee break.
Q: What does "Not Assigned" mean for SCCmec typing?
A: This indicates insufficient evidence for cassette classification—usually MSSA or novel SCCmec types.
Q: How is MRSA status determined?
A: MRSA = positive for both SCCmec element AND mecA or mecC gene. MSSA = lacks either criterion.
Q: Are virulence factors from other species filtered out?
A: Yes! The ABRicate module uses S. aureus-optimized thresholds and databases to minimize cross-species false positives.
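The MRSA rule given in the FAQ (an SCCmec element and mecA or mecC must both be present) can be sketched as a small function. This is an illustration of the stated rule, not the pipeline's source code: `mrsa_status` and its arguments are hypothetical, and an SCCmec call of "Not Assigned" is treated here as "no cassette detected".

```python
MEC_GENES = {"mecA", "mecC"}


def mrsa_status(sccmec_type, amr_genes):
    """Return 'MRSA' only when both criteria from the FAQ are met."""
    has_cassette = sccmec_type not in (None, "", "Not Assigned")
    has_mec = any(gene in MEC_GENES for gene in amr_genes)
    return "MRSA" if has_cassette and has_mec else "MSSA"
```

Requiring both signals means a genome with mecA but no typeable cassette is reported as MSSA under this rule, so such isolates are worth a manual look.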
# Issue: Database errors
# Solution:
abricate --setupdb
# Issue: Missing dependencies
# Solution:
conda remove staphscope
conda clean --all
conda install -c conda-forge -c bioconda -c bbeckley-hub staphscope # Fresh install
- Check existing issues: GitHub Issues
- Search closed issues: Many problems are already solved
- Create new issue: Include:
  - Full error message
  - Output of staphscope --version
  - Conda environment list (conda list)
  - Example command that failed
- Email support: [email protected] (response within 48 hours)
If you use StaphScope in your research, please cite our manuscript:
@article{beckley2025staphscope,
title={StaphScope: A species-optimized computational pipeline for rapid and accessible Staphylococcus aureus genotyping and surveillance},
author={Beckley, Brown and Vincent, Amarh},
journal={In preparation},
year={2025},
note={Manuscript submitted for publication}
}
@software{staphscope2025,
title = {StaphScope: A species-optimized computational pipeline for rapid and accessible Staphylococcus aureus genotyping and surveillance},
author = {Brown Beckley},
year = {2025},
publisher = {GitHub},
url = {https://github.com/bbeckley-hub/staphscope-typing-tool},
version = {1.0.0}
}
Please also cite these essential tools that make StaphScope possible:
# MLST
@article{seemann2018mlst,
title={mlst: Scan contig files against traditional PubMLST typing schemes},
author={Seemann, Torsten},
year={2018},
publisher={GitHub}
}
# AMRFinderPlus
@article{feldgarden2021amrfinderplus,
title={AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence},
author={Feldgarden, Michael and others},
journal={Scientific Reports},
volume={11},
number={1},
pages={12728},
year={2021}
}
# ABRicate
@software{seemann2024abricate,
title={ABRicate: Mass screening of contigs for antimicrobial and virulence genes},
author={Seemann, Torsten},
year={2024},
publisher={GitHub}
}
# SCCmecFinder
@article{kaya2018sccmecfinder,
title={SCCmecFinder, a Web-Based Tool for Typing of Staphylococcal Cassette Chromosome mec in Staphylococcus aureus Using Whole-Genome Sequence Data},
author={Kaya, H and others},
journal={mSphere},
volume={3},
number={1},
pages={e00612-17},
year={2018}
}
StaphScope stands on the shoulders of giants. We are deeply grateful to:
- Tool Developers: Torsten Seemann (MLST, ABRicate), NCBI team (AMRFinderPlus), H. Kaya (SCCmecFinder)
- Database Curators: PubMLST, Ridom spa, CGE, CARD, VFDB teams
- Python Ecosystem: Biopython, psutil, pandas, plotly developers
- Testing Community: Early adopters who provided invaluable feedback
- Reviewers & Editors: For strengthening this tool & its manuscript
- Open Science Community: For making this work possible
"If we ever meet in person, the drinks are on me!" - Brown Beckley
- Report Bugs: GitHub Issues
- Suggest Features: GitHub Discussions
- Improve Documentation: Pull requests welcome
- Share Data: Contribute to the lineage database
- Translate: Help translate to your language
Brown Beckley
- 🎓 Department of Medical Biochemistry, University of Ghana Medical School
- 🔬 Department of Biochemistry and Biotechnology, KNUST
- 📧 [email protected]
- 🐙 GitHub: bbeckley-hub
- 🐦 LinkedIn: @brownbeckley
Amarh Vincent
- 🎓 Department of Medical Biochemistry, University of Ghana Medical School
We welcome collaborations on:
- 🧬 MRSA epidemiology studies
- 🏥 Clinical validation projects
- 💻 Bioinformatics tool development
- 🌍 Global surveillance initiatives
- 🏥 Bioinformatics application in Public Health
- 🧬 Infectious disease and immunological studies
Contact for collaboration: [email protected]
- GitHub Releases: Star and watch the repository
- LinkedIn: Follow for announcements
StaphScope is released under the MIT License.
StaphScope integrates several open-source tools, each with their own licenses:
- MLST: GPL-2.0
- ABRicate: GPL-2.0
- AMRFinderPlus: Public Domain
- SCCmecFinder: Apache-2.0
All dependencies are properly credited and their licenses respected.
From days to minutes. From fragmented to integrated. From data to insights.
StaphScope: Precision surveillance for the antibiotic resistance era.
Found this tool useful? Drop a star ⭐ and follow the page for updates on planned modules!
Join the Fight Against Antimicrobial Resistance
Antimicrobial resistance (AMR) represents one of the most significant global health threats of our time. We invite researchers, clinicians, and public health professionals to collaborate with us in:
- Expanding and validating our S. aureus database
- Sharing regional epidemiological data
- Developing standardized typing methodologies
- Advancing AMR surveillance and intervention strategies
- Feature suggestions to improve practical utility
Together, we can enhance global AMR monitoring and develop more effective treatment strategies.