Mavhir

ML-powered chemical toxicity prediction API built with FastAPI, RDKit, and scikit-learn

Overview

The Toxicity Predictor API provides machine learning-based predictions for chemical toxicity endpoints including:

Ames Mutagenicity: Bacterial reverse mutation test (OECD 471)
Carcinogenicity: Rodent carcinogenicity studies (OECD 451/453)

Built for researchers, regulatory scientists, and pharmaceutical companies who need fast, reliable toxicity predictions for chemical risk assessment.

Key Features

Fast Predictions: Sub-second response times for single compounds
Batch Processing: Handle up to 1000 compounds per request
File Upload: Direct SDF file processing
Chemical Lookup: Integrated PubChem database search
Rich Metadata: Molecular properties, descriptors, confidence scores
Health Monitoring: Built-in system diagnostics
Auto Documentation: Interactive API docs with Swagger UI
Docker Ready: One-command deployment

Quick Start

Prerequisites

Python 3.9+
Git

1. Clone & Setup

git clone https://github.com/Ojochogwu866/Mavhir.git
cd Mavhir

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

2. Train Models

python data/create_example_data.py
python app/models/train_models.py

3. Configure Environment

cp .env.example .env

# Edit .env file with your settings (optional)

4. Start the API

# Development server
python -m app.main

# Or with uvicorn directly
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

5. Test the API

# Check health
curl http://localhost:8000/health

# Predict toxicity
curl -X POST "http://localhost:8000/api/v1/predict/smiles" \
  -H "Content-Type: application/json" \
  -d '{"smiles": "CCO", "endpoints": ["ames_mutagenicity"]}'

API Documentation

Once running, visit:

Swagger UI: http://localhost:8000/docs

Core Endpoints

Endpoint	Method	Description
`/health`	GET	Basic health check
`/api/v1/predict/smiles`	POST	Single compound prediction
`/api/v1/predict/batch`	POST	Batch prediction
`/api/v1/predict/sdf`	POST	SDF file upload
`/api/v1/chemical/lookup/{name}`	GET	PubChem compound lookup
`/api/v1/chemical/validate`	GET	SMILES validation

Usage Examples

Single Compound Prediction

import requests

response = requests.post(
    "http://localhost:8000/api/v1/predict/smiles",
    json={
        "smiles": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",  # Caffeine
        "endpoints": ["ames_mutagenicity", "carcinogenicity"],
        "include_properties": True
    }
)

result = response.json()
print(f"Ames prediction: {result['data']['predictions']['ames_mutagenicity']['prediction']}")

Batch Processing

compounds = [
    "CCO",           # Ethanol
    "CC(=O)O",       # Acetic acid
    "c1ccccc1"       # Benzene
]

response = requests.post(
    "http://localhost:8000/api/v1/predict/batch",
    json={
        "smiles_list": compounds,
        "endpoints": ["ames_mutagenicity"]
    }
)

results = response.json()
print(f"Processed {results['summary']['successful']} compounds successfully")

PubChem Lookup

response = requests.get("http://localhost:8000/api/v1/chemical/lookup/aspirin")
compound_info = response.json()

if compound_info["found"]:
    print(f"Aspirin SMILES: {compound_info['canonical_smiles']}")
    print(f"Molecular weight: {compound_info['molecular_weight']}")

Architecture

mavhir/
├── app/
│   ├── main.py
│   ├── api/
│   │   ├── health.py
│   │   ├── chemical.py
│   │   └── predict.py
│   ├── core/
│   │   ├── config.py
│   │   ├── models.py
│   │   └── exceptions.py
│   ├── services/
│   │   ├── chemical_processor.py
│   │   ├── descriptor_calculator.py
│   │   ├── predictor.py
│   │   └── pubchem_client.py
│   └── models/
├── data/
├── tests/
├── docs/
└── requirements.txt

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=app --cov-report=html

# Run specific test categories
pytest -m "not benchmark"
pytest tests/test_api.py

Docker Deployment

Development

docker-compose up --build

Production

# Build image
docker build -t mavhir .

# Run container
docker run -p 8000:8000 mavhir

📊 Model Information

Ames Mutagenicity Model

Algorithm: Random Forest Classifier
Features: ~200 Mordred molecular descriptors
Training Data: 6,500+ compounds from literature
Performance: 88% accuracy, 0.91 AUC-ROC
Endpoint: Bacterial reverse mutation (Salmonella typhimurium)

Carcinogenicity Model

Algorithm: Gradient Boosting Classifier
Features: ~180 Mordred molecular descriptors
Training Data: 1,200+ compounds from NTP/CPDB
Performance: 76% accuracy, 0.82 AUC-ROC
Endpoint: 2-year rodent bioassays

🔧 Configuration

Key environment variables:

# Basic settings
ENVIRONMENT=production
DEBUG=false
MAX_BATCH_SIZE=100

# Model settings
AMES_MODEL_PATH=app/models/ames_mutagenicity.pkl
CARCINOGENICITY_MODEL_PATH=app/models/carcinogenicity.pkl

# Processing settings
DESCRIPTOR_TIMEOUT=60
ENABLE_DESCRIPTOR_CACHING=true

# PubChem API
PUBCHEM_RATE_LIMIT_DELAY=0.2
PUBCHEM_MAX_RETRIES=3

📈 Performance

Single prediction: ~150ms average
Batch processing: ~100 compounds/minute
Descriptor calculation: ~50ms per compound
Memory usage: ~500MB base + ~1MB per cached compound

🤝 Contributing

Fork the repository
Create feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open Pull Request

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

# Run code formatting
black app/ tests/
isort app/ tests/

# Type checking
mypy app/

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
app		app
data		data
docs		docs
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
License		License
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Mavhir

Overview

Key Features

Quick Start

Prerequisites

1. Clone & Setup

2. Train Models

3. Configure Environment

4. Start the API

5. Test the API

API Documentation

Core Endpoints

Usage Examples

Single Compound Prediction

Batch Processing

PubChem Lookup

Architecture

Testing

Docker Deployment

Development

Production

📊 Model Information

Ames Mutagenicity Model

Carcinogenicity Model

🔧 Configuration

📈 Performance

🤝 Contributing

Development Setup

About

Uh oh!

Languages

License

Ojochogwu866/Mavhir

Folders and files

Latest commit

History

Repository files navigation

Mavhir

Overview

Key Features

Quick Start

Prerequisites

1. Clone & Setup

2. Train Models

3. Configure Environment

4. Start the API

5. Test the API

API Documentation

Core Endpoints

Usage Examples

Single Compound Prediction

Batch Processing

PubChem Lookup

Architecture

Testing

Docker Deployment

Development

Production

📊 Model Information

Ames Mutagenicity Model

Carcinogenicity Model

🔧 Configuration

📈 Performance

🤝 Contributing

Development Setup

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages