Aether

The Safe GenAI Platform: A reference implementation for ML inference with integrated safety, traffic governance, and observability.

What is Aether?

Aether is a complete GenAI serving platform that treats safety as a first-class concern, not an afterthought. It integrates four purpose-built components into a unified system designed for demos and reference deployments.

The Challenge

Most GenAI platforms focus only on performance. But in production, you need more:

Challenge	Typical Solution	Aether Solution
Prompt injection	Hope the LLM handles it	Sentinel: 3-tier defense for prompt-injection mitigation
PII leakage	Manual review	Sentinel: Auto-redaction before response
Cost explosion	Monthly bill shock	Atlas: Token-based quotas with forecasting
Performance at scale	Over-provision	Hyperion: Intelligent batching, 3x throughput
"What's happening?"	Grep through logs	MonitorX: ML-aware dashboards and alerting

The Four Pillars (in request flow order)

Component	Role	Key Capabilities
Atlas	Traffic Gateway	Auth, rate limiting, quotas, safety compute budget
Sentinel	Safety Analysis	Tiered defense (Heuristics→Fast ML→Deep ML/LLM), PII detection
Hyperion	Model Serving	GPU inference, intelligent batching, caching, optimization
MonitorX	Observability	Real-time metrics, alerting, drift detection, dashboards
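
The request flow order in the table above can be sketched as a chain of checks. This is an illustration only, not Aether's code: the function name, the callback parameters, and the status labels are all hypothetical stand-ins for calls to the real services.

```python
from typing import Callable, Dict


def handle_request(
    prompt: str,
    authorize: Callable[[], bool],
    check_input: Callable[[str], bool],
    infer: Callable[[str], str],
    check_output: Callable[[str], bool],
) -> Dict[str, str]:
    """Walk one request through the stages in order, stopping at the first refusal."""
    if not authorize():                  # Atlas: auth, rate limit, quota
        return {"status": "rejected", "stage": "atlas"}
    if not check_input(prompt):          # Sentinel: input analysis
        return {"status": "blocked", "stage": "sentinel-input"}
    completion = infer(prompt)           # Hyperion: model serving
    if not check_output(completion):     # Sentinel: output filter
        return {"status": "blocked", "stage": "sentinel-output"}
    return {"status": "ok", "completion": completion}
```

MonitorX is not part of the chain; it observes each stage from the side, which is why it has no callback here.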

Why Aether?

Safe by Design: Integrated safety layer catches attacks, PII leakage, and toxic content
One-Command Setup: Local infrastructure in minutes
Self-Observing: Automatic metrics collection and alerting
Quota-Aware: Built-in cost control and rate limiting
High-Performance: GPU acceleration with intelligent batching
Reference-Ready: Modular components, tests, and docs

🚀 Quick Start

Prerequisites

Docker & Docker Compose
8GB+ RAM recommended
(Optional) NVIDIA GPU for acceleration

Installation

# Clone Aether
git clone https://github.com/BugVanquisher/Aether
cd Aether

# (Optional) copy demo env defaults
cp .env.example .env

# Clone all four component repositories
git clone https://github.com/BugVanquisher/Atlas
git clone https://github.com/BugVanquisher/Hyperion
git clone https://github.com/BugVanquisher/MonitorX
git clone https://github.com/BugVanquisher/Sentinel

# Set up and start with the safety layer (recommended)
docker-compose -f docker-compose-with-sentinel.yml up -d

# Or without safety layer (original setup)
# ./setup-integrated-demo.sh

That's it!

Keeping Components Updated

Each component (Atlas, Hyperion, MonitorX, Sentinel) is an independent repository. To pull the latest changes from all components:

./sync-repos.sh

This script:

Pulls origin/main for each component
Skips repos with local changes (won't overwrite your work)
Shows status for each repo
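
The three behaviors listed above could be sketched in Python as follows. This is not the contents of sync-repos.sh; the function names, the status labels, and the choice of `git status --porcelain` to detect local changes are assumptions made for illustration.

```python
import subprocess
from pathlib import Path
from typing import Dict

COMPONENTS = ["Atlas", "Hyperion", "MonitorX", "Sentinel"]


def has_local_changes(porcelain_output: str) -> bool:
    """True if `git status --porcelain` listed any modified or untracked file."""
    return any(line.strip() for line in porcelain_output.splitlines())


def sync_repo(repo: Path) -> str:
    """Pull origin/main unless the working tree is dirty; return a status label."""
    if not (repo / ".git").exists():
        return "missing"
    status = subprocess.run(
        ["git", "-C", str(repo), "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    if has_local_changes(status):
        return "skipped (local changes)"
    subprocess.run(["git", "-C", str(repo), "pull", "origin", "main"], check=True)
    return "pulled origin/main"


def sync_all(root: Path) -> Dict[str, str]:
    """Sync every component directory under `root` and collect the labels."""
    return {name: sync_repo(root / name) for name in COMPONENTS}
```

Calling `sync_all(Path("."))` from the Aether directory would return one label per component.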

After syncing, rebuild containers if needed:

docker-compose -f docker-compose-with-sentinel.yml build

Verify Installation

# Run automated tests
./test-integrated-system.sh

# Expected output:
# ✓ Hyperion inference successful
# ✓ Atlas gateway working
# ✓ Quota tracking active
# ✓ MonitorX is collecting metrics
# ✓ Rate limiting working

Access Your Platform

Atlas Gateway (entry point): http://localhost:8080
Sentinel Safety API: http://localhost:8090
Hyperion API: http://localhost:8000
MonitorX Dashboard: http://localhost:8501
InfluxDB UI: http://localhost:8086

Default API Key (demo only): demo-key-12345 (override via API_KEY)

Security Notes (Before Production)

Demo credentials in this repo are for local testing only; rotate them before any real deployment.
Set secrets through environment variables (see .env.example) and remove demo defaults.
Disable demo bypass keys such as SENTINEL_DEMO_KEY and set a strong ADMIN_API_KEY.

Production Checklist (Quick)

Set API_KEY, ADMIN_API_KEY, and all storage tokens via env or secrets manager.
Enable Sentinel auth (SENTINEL_API_KEYS) and, if applicable, set the RapidAPI proxy secret.
Rotate all demo credentials and disable any demo bypass keys.

🎬 Interactive Demo

Run the guided presentation:

./demo-presentation.sh

This walks through:

System health checks
Complete inference flow
Quota tracking
Rate limiting in action
Performance monitoring
Cache optimization
Prometheus metrics

Perfect for demos, presentations, or understanding the system!

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    AETHER: SAFE GENAI PLATFORM                  │
│         Production ML serving with integrated safety            │
└─────────────────────────────────────────────────────────────────┘

                              Clients
                                 │
                                 ▼
                   ┌─────────────────────────┐
                   │     Atlas Gateway       │ ← TRAFFIC GATEWAY
                   │      Port 8080          │
                   │                         │
                   │ • Authentication        │
                   │ • Rate Limiting         │
                   │ • Quota Control         │
                   │ • Safety Compute Budget │
                   └───────────┬─────────────┘
                               │ (authenticated)
                               ▼
                   ┌─────────────────────────┐
                   │   Sentinel (Safety)     │ ← SAFETY ANALYSIS
                   │      Port 8090          │
                   │                         │
                   │ • Tier 1: Heuristics    │
                   │ • Tier 2: Fast ML       │
                   │ • Tier 3: Deep ML/LLM   │
                   │ • Reports tier to Atlas │
                   └───────────┬─────────────┘
                               │ (if safe)
                               ▼
                   ┌─────────────────────────┐
                   │    Hyperion Engine      │ ← MODEL SERVING
                   │      Port 8000          │
                   │                         │
                   │ • GPU Inference         │
                   │ • Intelligent Batching  │
                   │ • Response Caching      │
                   └───────────┬─────────────┘
                               │
                               ▼
                   ┌─────────────────────────┐
                   │   Sentinel (Safety)     │ ← OUTPUT FILTER
                   │                         │
                   │ • PII Leakage Check     │
                   │ • Toxicity Filtering    │
                   │ • Secret Detection      │
                   └───────────┬─────────────┘
                               │
                               ▼
                            Response

         ┌──────────────────────────────────────┐
         │        MonitorX (Observability)      │
         │    API: 8001 | Dashboard: 8501       │
         │                                      │
         │ • Safety Metrics (block rate, FP)    │
         │ • Performance Metrics (latency)      │
         │ • ML-Aware Alerting                  │
         └──────────────────────────────────────┘

Supporting Infrastructure:
├── Redis (6379): Caching + Quota Storage
└── InfluxDB (8086): Time-Series Metrics

Why Atlas Before Sentinel?
Sentinel's Tier 3 uses LLM-based analysis (significant compute).
Atlas must protect this resource with quotas, just like inference.
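
A minimal sketch of that idea, assuming a per-key counter for Tier 3 checks. The `SafetyBudget` class and its fields are hypothetical and are not taken from Atlas.

```python
from dataclasses import dataclass


@dataclass
class SafetyBudget:
    """Per-key allowance for expensive Tier 3 checks (hypothetical structure)."""
    tier3_limit: int
    tier3_used: int = 0

    def allow_tier3(self) -> bool:
        """True while the key still has Tier 3 checks left."""
        return self.tier3_used < self.tier3_limit

    def record(self, tier_invoked: int) -> None:
        """Charge the budget using the tier that Sentinel reports back."""
        if tier_invoked == 3:
            self.tier3_used += 1
```

With `SafetyBudget(tier3_limit=2)`, recording tiers 1, 3, 3, 2 exhausts the allowance, so `allow_tier3()` then returns False. What the gateway does at that point (reject, or fall back to Tier 2 only) is a policy choice this sketch leaves open.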

Safety Demo

Run the safety demonstration to see Sentinel in action:

./demo-safety.sh

This showcases:

Normal requests passing through safely
Prompt injection attacks being blocked
PII detection and flagging
Toxic content being caught
HIPAA violations detected
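
To give a feel for what PII detection and flagging involves, here is a deliberately small regex-based sketch. It is not how Sentinel works (Sentinel uses Microsoft Presidio); the two patterns and the `redact` function are illustrative assumptions and cover far less than a real detector.

```python
import re
from typing import List, Tuple

# Narrow, illustrative patterns only.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> Tuple[str, List[str]]:
    """Replace matches with <LABEL> placeholders and return the labels found."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"<{label}>", text)
        if count:
            found.append(label)
    return text, found
```

The returned label list is what a caller would use to flag a request, while the redacted text is what would be passed on.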

Documentation

Quick Links

Architecture Decision Records - Design decisions and rationale
Integration Guide - Complete setup and configuration
Troubleshooting Guide - Common issues and solutions
API Reference - Endpoint documentation
Production Deployment - Kubernetes and scaling

Component Documentation

For detailed information on each component:

Sentinel Documentation - Safety gateway details
Atlas Documentation - Traffic governance details
Hyperion Documentation - Inference engine internals
MonitorX Documentation - Observability setup

💡 Usage Examples

Basic Inference

# Send a request through the unified platform
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer demo-key-12345" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Explain machine learning in simple terms"}
    ],
    "max_tokens": 100
  }'

Check Quota Usage

curl http://localhost:8080/v1/usage \
  -H "Authorization: Bearer demo-key-12345"

Monitor Performance

# View batch statistics
curl http://localhost:8000/v1/batch/stats | jq

# Access real-time dashboard (macOS; on Linux use xdg-open)
open http://localhost:8501

Python SDK Integration

from aether import AetherClient

# Initialize client
client = AetherClient(
    gateway_url="http://localhost:8080",
    api_key="demo-key-12345"
)

# Make inference request
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "What is Aether?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)

# Check quota
usage = client.get_usage()
print(f"Daily usage: {usage['daily_usage']} / {usage['daily_limit']}")
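
If the `aether` package is not installed, the same chat request can be built with the Python standard library. This sketch mirrors the curl example earlier in this section; the helper names are hypothetical, and the response shape read in `send` is assumed to follow the OpenAI-compatible format used by the SDK example above.

```python
import json
import urllib.request


def build_chat_request(gateway_url: str, api_key: str, prompt: str,
                       max_tokens: int = 100) -> urllib.request.Request:
    """Build (but do not send) a POST to the gateway's chat completions endpoint."""
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{gateway_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the stack running, `send(build_chat_request("http://localhost:8080", "demo-key-12345", "What is Aether?"))` would perform the call.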

Key Features

Traffic Gateway (Atlas)

API key authentication and authorization
Rate limiting (QPS + burst control)
Daily and monthly quota enforcement
Safety compute budget (limit expensive Tier 3 checks)
Token-level accounting (not just request counting)
Priority-based routing (critical/high/normal/low)
Usage forecasting and capacity planning
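
Two of the items above, QPS plus burst control and token-level accounting, can be sketched as follows. This shows the general token-bucket technique and a simple quota counter; it is not Atlas's implementation, and the class names are hypothetical.

```python
import time
from typing import Optional


class TokenBucket:
    """Token bucket: refills at `rate` per second, holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: int, now: Optional[float] = None):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TokenQuota:
    """Charges by LLM tokens consumed rather than by request count."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.daily_usage = 0

    def charge(self, prompt_tokens: int, completion_tokens: int) -> bool:
        """Record usage if it fits in the remaining daily quota."""
        cost = prompt_tokens + completion_tokens
        if self.daily_usage + cost > self.daily_limit:
            return False
        self.daily_usage += cost
        return True
```

The `now` parameter exists only to make the bucket easy to test with a fixed clock. In a deployment with several gateway replicas the counters would need shared storage, which is presumably one reason the platform lists Redis for quota storage.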

Safety Analysis (Sentinel)

Tiered Defense: Heuristics (<1ms) → Fast ML (5-15ms) → Deep ML/LLM (50-100ms)
Prompt Injection Detection: Catches 95%+ of known attack patterns
PII Detection & Redaction: Microsoft Presidio-powered entity recognition
Toxicity Filtering: BERT-based toxicity classification
HIPAA/GDPR Compliance: Healthcare and privacy policy enforcement
Output Filtering: Catch leakage in generated responses
Tier Reporting: Reports tier invoked back to Atlas for quota tracking
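
The tiered escalation and tier reporting described above could look roughly like this. The tier functions are stand-ins written for this sketch (the single heuristic pattern and the stub classifiers are not Sentinel's rules or models), and the `(verdict, confident)` return convention is an assumption.

```python
from typing import Callable, List, Tuple

# Each tier returns (verdict, confident); verdict is "safe" or "block".
Tier = Callable[[str], Tuple[str, bool]]


def heuristic_tier(text: str) -> Tuple[str, bool]:
    """Tier 1 stand-in: one illustrative pattern, confident only when it matches."""
    if "ignore previous instructions" in text.lower():
        return "block", True
    return "safe", False


def fast_ml_stub(text: str) -> Tuple[str, bool]:
    """Tier 2 stand-in: pretends to be confident only on short inputs."""
    return "safe", len(text) < 200


def deep_stub(text: str) -> Tuple[str, bool]:
    """Tier 3 stand-in: always returns a final verdict."""
    return "safe", True


def run_tiers(text: str, tiers: List[Tier]) -> Tuple[str, int]:
    """Run tiers in order and stop at the first confident verdict.

    Returns (verdict, tier_invoked) with a 1-based tier number, which is the
    value a gateway could use for quota tracking.
    """
    verdict = "safe"
    for index, tier in enumerate(tiers, start=1):
        verdict, confident = tier(text)
        if confident:
            return verdict, index
    return verdict, len(tiers)
```

The point of the ordering is cost: most traffic should resolve in the cheap tiers, and only ambiguous inputs pay the Tier 3 latency.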

High-Performance Inference (Hyperion)

GPU acceleration with CUDA support
Intelligent request batching (2-8 requests/batch, 3x throughput)
Redis caching for repeated queries
Model optimization (quantization, compilation)
Support for multiple model types
Automatic CPU/GPU fallback
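
Request batching of the kind listed above is often implemented by waiting briefly for more requests before running the model. The sketch below shows that general technique with a standard-library queue; it is not Hyperion's batcher. The batch ceiling of 8 comes from the range above, while the function name and the wait time are assumptions.

```python
import queue
import time
from typing import List


def collect_batch(pending: "queue.Queue[str]", max_batch: int = 8,
                  max_wait_s: float = 0.01) -> List[str]:
    """Block for the first request, then gather more until the batch is full
    or `max_wait_s` has passed since the first request was taken."""
    batch = [pending.get()]
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

The wait time is the trade-off knob: a longer wait fills batches more often and raises throughput, at the cost of added latency for the first request in each batch.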

Comprehensive Observability (MonitorX)

Real-time metrics visualization
Safety metrics: Block rate, false positive rate, tier distribution
ML-aware alerting with adaptive thresholds
Multi-channel alerting (Email, Slack, Webhooks)
Historical analysis and trends
CSV/JSON data export
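
One common way to make an alert threshold adaptive is to compare each new value against a rolling mean plus a multiple of the rolling standard deviation. The sketch below illustrates that technique; whether MonitorX uses this method is not stated here, and the class name and default parameters are assumptions.

```python
import statistics
from collections import deque


class AdaptiveThreshold:
    """Flag values more than k standard deviations above a rolling mean."""

    def __init__(self, window: int = 50, k: float = 3.0, min_samples: int = 10):
        self.values = deque(maxlen=window)
        self.k = k
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous against recent history, then record it."""
        alert = False
        if len(self.values) >= self.min_samples:
            mean = statistics.fmean(self.values)
            spread = statistics.pstdev(self.values)
            alert = value > mean + self.k * spread
        self.values.append(value)
        return alert
```

Unlike a fixed limit, this follows gradual shifts in the baseline, so a service that is normally slow does not alert constantly. It stays silent until `min_samples` values have been seen.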

🚢 Production Deployment

Docker Compose (Recommended for getting started)

docker-compose -f docker-compose-integrated.yml up -d

Kubernetes

# Deploy to Kubernetes cluster
kubectl apply -f kubernetes/namespace.yaml
kubectl apply -f kubernetes/

# Or use Helm
helm install aether ./helm/aether-platform

Environment Configuration

# Copy environment template
cp .env.example .env

# Edit for your environment
vim .env

Key variables:

# Atlas
ATLAS_RATE_LIMIT_QPS=100
ATLAS_ADMIN_TOKEN=<secure-token>

# Hyperion
HYPERION_DEVICE_TYPE=cuda  # or 'cpu'
HYPERION_BATCH_SIZE=8
HYPERION_MODEL_NAME=microsoft/DialoGPT-small

# MonitorX
MONITORX_SLACK_WEBHOOK=<your-webhook>
[email protected]

API Endpoints

Sentinel Safety API (Port 8090)

Endpoint	Method	Description
`/health`	GET	Health check
`/supervise`	POST	Safety supervision (input/output check)
`/dashboard`	GET	Compliance dashboard (requires auth)

Atlas Gateway (Port 8080)

Endpoint	Method	Description
`/healthz`	GET	Health check
`/v1/chat/completions`	POST	OpenAI-compatible inference
`/v1/usage`	GET	Check quota usage
`/v1/admin/keys`	POST	Register API key
`/v1/forecasting/forecast`	GET	Traffic prediction
`/metrics`	GET	Prometheus metrics

Hyperion API (Port 8000)

Endpoint	Method	Description
`/healthz`	GET	Health + device info
`/v1/llm/chat`	POST	Direct LLM inference
`/v1/batch/stats`	GET	Batch performance
`/v1/models`	GET	Available models
`/metrics`	GET	Prometheus metrics

MonitorX API (Port 8001)

Endpoint	Method	Description
`/api/v1/health`	GET	API health
`/api/v1/models`	POST	Register model
`/api/v1/metrics/inference`	POST	Submit metrics
`/alerts/active`	GET	Active alerts
`/alerts/history`	GET	Alert history

Full API documentation available at /docs on each service.

🧪 Testing

Automated Testing

# Run integration test suite
./test-integrated-system.sh

# Run individual component tests (subshells keep each cd from affecting the next)
(cd Atlas && pytest)
(cd Hyperion && pytest)
(cd MonitorX && pytest)

Load Testing

# Install locust
pip install locust

# Run load test
locust -f tests/locustfile.py \
  --host http://localhost:8080 \
  --users 100 \
  --spawn-rate 10

Verification

# Quick health check
curl http://localhost:8080/healthz
curl http://localhost:8090/health
curl http://localhost:8000/healthz
curl http://localhost:8001/api/v1/health

# Verify metrics collection
curl http://localhost:8080/metrics | grep atlas_requests_total
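
For scripted checks, the grep above can be replaced by a small parser that sums a counter across its label sets. This sketch handles the common `name{labels} value` lines of the Prometheus text format; the function name is hypothetical, and the labels Atlas actually attaches to `atlas_requests_total` are not documented here.

```python
def sum_metric(exposition_text: str, metric_name: str) -> float:
    """Sum every sample of `metric_name` in Prometheus text-format output."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line.startswith(metric_name):
            continue  # also skips "# HELP" and "# TYPE" comment lines
        rest = line[len(metric_name):]
        if rest.startswith("{"):
            # Drop the label set; the value follows the closing brace.
            rest = rest[rest.rfind("}") + 1:]
        elif not rest.startswith((" ", "\t")):
            continue  # a different metric that merely shares the prefix
        fields = rest.split()
        if fields:
            total += float(fields[0])
    return total
```

Feeding it the body of `http://localhost:8080/metrics` before and after a test request gives a quick way to confirm that the counter increased.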

🎓 Advanced Topics

Multi-Model Deployment

Deploy multiple models simultaneously:

hyperion-gpt2:
  build: ./Hyperion
  environment:
    - MODEL_NAME=gpt2
    
hyperion-bert:
  build: ./Hyperion
  environment:
    - MODEL_NAME=bert-base-uncased

Custom Alerting

Configure MonitorX alerts:

thresholds = {
    "latency": 2000.0,      # Milliseconds; alert if >2s
    "error_rate": 0.05,     # Fraction of requests; alert if >5% errors
    "gpu_memory": 0.90,     # Fraction of GPU memory; alert if >90% used
}
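
How such a dictionary might be evaluated is sketched below. The `breached` helper is hypothetical and only shows the comparison; how MonitorX consumes these thresholds is described in its own documentation.

```python
from typing import Dict, List

thresholds = {"latency": 2000.0, "error_rate": 0.05, "gpu_memory": 0.90}


def breached(metrics: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return the names of metrics above their limit, sorted for stable output."""
    return sorted(
        name for name, limit in limits.items()
        if metrics.get(name, 0.0) > limit
    )
```

For example, a reading of 2500.0 for latency, 0.01 for error_rate, and 0.95 for gpu_memory would report gpu_memory and latency. Metrics missing from a reading are treated as zero, so they never trigger an alert.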

Horizontal Scaling

Scale with Kubernetes HPA:

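apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hyperion-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hyperion  # assumed Deployment name; match your manifest
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: {type: Utilization, averageUtilization: 70}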
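
To reason about what the autoscaler will do, the core of the Kubernetes HPA calculation is desired = ceil(current replicas × current utilization / target utilization), clamped to the replica bounds. The sketch below reproduces only that core; it ignores the controller's tolerance band and stabilization windows, and the function name is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float = 70.0,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Apply the HPA scaling ratio, then clamp to the configured bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))
```

With the bounds above, 4 replicas at 90% CPU scale to 6, while 2 replicas at 20% CPU stay at the minimum of 2.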

🤝 Contributing

Aether is an integration project. Each component lives in its own repository:

Atlas - Traffic governance
Hyperion - ML inference engine
MonitorX - Observability platform
Sentinel - Safety gateway

Development Workflow

Make changes in the standalone component repo
Commit and push to the component repo
Run ./sync-repos.sh in Aether to pull updates
Rebuild: docker-compose -f docker-compose-with-sentinel.yml build

For Integration Improvements

Fork this repository
Create a feature branch
Submit a pull request

📄 License

Apache License 2.0 - See LICENSE file for details.

Each component maintains its own Apache 2.0 license.

Showcase

Aether demonstrates:

Safety-First Design: Integrated safety layer from day one, not bolted on
System Design: Composable architecture with clear separation of concerns
DevOps Excellence: One-command deployment, comprehensive monitoring
Production Mindset: Health checks, graceful degradation, observability
Enterprise Features: Authentication, rate limiting, quota management, compliance
Performance: GPU acceleration, intelligent batching, caching

Perfect for:

Production ML deployments requiring safety compliance
Learning ML infrastructure with security best practices
Portfolio demonstrations of end-to-end platform design
Rapid prototyping with built-in guardrails
ML system benchmarking with safety metrics

🔗 Links

GitHub: github.com/BugVanquisher/Aether
Documentation: Full Integration Guide
Issues: Report a Bug
Discussions: Community Forum

Acknowledgments

Aether integrates four purpose-built components:

Sentinel - AI safety gateway with tiered defense
Atlas - Traffic governance for LLM inference
Hyperion - High-performance ML inference platform
MonitorX - Production ML observability

Built with:

Microsoft Presidio for PII detection
Unitary Toxic-BERT for toxicity classification
FastAPI for APIs
InfluxDB for time-series storage

Built by BugVanquisher

Aether: Safe GenAI at Scale

Get Started | Safety Demo | Documentation | ADRs
