Thanks to visit codestin.com
Credit goes to github.com

Skip to content

BugVanquisher/Aether

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

16 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Aether

The Safe GenAI Platform: A reference implementation for ML inference with integrated safety, traffic governance, and observability.

License Docker Platform Safety


What is Aether?

Aether is a complete GenAI serving platform that treats safety as a first-class concern, not an afterthought. It integrates four purpose-built components into a unified system designed for demos and reference deployments.

The Challenge

Most GenAI platforms focus only on performance. But in production, you need more:

Challenge Typical Solution Aether Solution
Prompt injection Hope the LLM handles it Sentinel: 3-tier defense for prompt-injection mitigation
PII leakage Manual review Sentinel: Auto-redaction before response
Cost explosion Monthly bill shock Atlas: Token-based quotas with forecasting
Performance at scale Over-provision Hyperion: Intelligent batching, 3x throughput
"What's happening?" Grep through logs MonitorX: ML-aware dashboards and alerting

The Four Pillars (in request flow order)

Component Role Key Capabilities
Atlas Traffic Gateway Auth, rate limiting, quotas, safety compute budget
Sentinel Safety Analysis Tiered defense (Heuristics→Fast ML→Deep ML/LLM), PII detection
Hyperion Model Serving GPU inference, intelligent batching, caching, optimization
MonitorX Observability Real-time metrics, alerting, drift detection, dashboards

Why Aether?

  • Safe by Design: Integrated safety layer catches attacks, PII leakage, toxic content
  • One-Command Setup: Local infrastructure in minutes
  • Self-Observing: Automatic metrics collection and alerting
  • Quota-Aware: Built-in cost control and rate limiting
  • High-Performance: GPU acceleration with intelligent batching
  • Reference-Ready: Modular components, tests, and docs

πŸš€ Quick Start

Prerequisites

  • Docker & Docker Compose
  • 8GB+ RAM recommended
  • (Optional) NVIDIA GPU for acceleration

Installation

# Clone Aether
git clone https://github.com/BugVanquisher/Aether
cd Aether

# (Optional) copy demo env defaults
cp .env.example .env

# Clone all four component repositories
git clone https://github.com/BugVanquisher/Atlas
git clone https://github.com/BugVanquisher/Hyperion
git clone https://github.com/BugVanquisher/MonitorX
git clone https://github.com/BugVanquisher/Sentinel

# Setup and start WITH SAFETY (recommended!)
docker-compose -f docker-compose-with-sentinel.yml up -d

# Or without safety layer (original setup)
# ./setup-integrated-demo.sh

That's it!

Keeping Components Updated

Each component (Atlas, Hyperion, MonitorX, Sentinel) is an independent repository. To pull the latest changes from all components:

./sync-repos.sh

This script:

  • Pulls origin/main for each component
  • Skips repos with local changes (won't overwrite your work)
  • Shows status for each repo

After syncing, rebuild containers if needed:

docker-compose -f docker-compose-with-sentinel.yml build

Verify Installation

# Run automated tests
./test-integrated-system.sh

# Expected output:
# βœ“ Hyperion inference successful
# βœ“ Atlas gateway working
# βœ“ Quota tracking active
# βœ“ MonitorX is collecting metrics
# βœ“ Rate limiting working

Access Your Platform

Default API Key (demo only): demo-key-12345 (override via API_KEY)

Security Notes (Before Production)

  • Demo credentials in this repo are for local testing only; rotate them before any real deployment.
  • Set secrets through environment variables (see .env.example) and remove demo defaults.
  • Disable demo bypass keys such as SENTINEL_DEMO_KEY and set a strong ADMIN_API_KEY.

Production Checklist (Quick)

  • Set API_KEY, ADMIN_API_KEY, and all storage tokens via env or secrets manager.
  • Enable Sentinel auth (SENTINEL_API_KEYS) and RapidAPI proxy secret if applicable.
  • Rotate all demo credentials and disable any demo bypass keys.

🎬 Interactive Demo

Run the guided presentation:

./demo-presentation.sh

This walks through:

  1. System health checks
  2. Complete inference flow
  3. Quota tracking
  4. Rate limiting in action
  5. Performance monitoring
  6. Cache optimization
  7. Prometheus metrics

Perfect for demos, presentations, or understanding the system!


Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    AETHER: SAFE GENAI PLATFORM                  β”‚
β”‚         Production ML serving with integrated safety            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

                              Clients
                                 β”‚
                                 β–Ό
                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                   β”‚     Atlas Gateway       β”‚ ← TRAFFIC GATEWAY
                   β”‚      Port 8080          β”‚
                   β”‚                         β”‚
                   β”‚ β€’ Authentication        β”‚
                   β”‚ β€’ Rate Limiting         β”‚
                   β”‚ β€’ Quota Control         β”‚
                   β”‚ β€’ Safety Compute Budget β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚ (authenticated)
                               β–Ό
                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                   β”‚   Sentinel (Safety)     β”‚ ← SAFETY ANALYSIS
                   β”‚      Port 8090          β”‚
                   β”‚                         β”‚
                   β”‚ β€’ Tier 1: Heuristics    β”‚
                   β”‚ β€’ Tier 2: Fast ML       β”‚
                   β”‚ β€’ Tier 3: Deep ML/LLM   β”‚
                   β”‚ β€’ Reports tier to Atlas β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚ (if safe)
                               β–Ό
                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                   β”‚    Hyperion Engine      β”‚ ← MODEL SERVING
                   β”‚      Port 8000          β”‚
                   β”‚                         β”‚
                   β”‚ β€’ GPU Inference         β”‚
                   β”‚ β€’ Intelligent Batching  β”‚
                   β”‚ β€’ Response Caching      β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
                   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                   β”‚   Sentinel (Safety)     β”‚ ← OUTPUT FILTER
                   β”‚                         β”‚
                   β”‚ β€’ PII Leakage Check     β”‚
                   β”‚ β€’ Toxicity Filtering    β”‚
                   β”‚ β€’ Secret Detection      β”‚
                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
                            Response

         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚        MonitorX (Observability)      β”‚
         β”‚    API: 8001 | Dashboard: 8501       β”‚
         β”‚                                      β”‚
         β”‚ β€’ Safety Metrics (block rate, FP)    β”‚
         β”‚ β€’ Performance Metrics (latency)      β”‚
         β”‚ β€’ ML-Aware Alerting                  β”‚
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Supporting Infrastructure:
β”œβ”€β”€ Redis (6379): Caching + Quota Storage
└── InfluxDB (8086): Time-Series Metrics

Why Atlas Before Sentinel?
Sentinel's Tier 3 uses LLM-based analysis (significant compute).
Atlas must protect this resource with quotas, just like inference.

Safety Demo

Run the safety demonstration to see Sentinel in action:

./demo-safety.sh

This showcases:

  1. Normal requests passing through safely
  2. Prompt injection attacks being blocked
  3. PII detection and flagging
  4. Toxic content being caught
  5. HIPAA violations detected

Documentation

Quick Links

Component Documentation

For detailed information on each component:


πŸ’‘ Usage Examples

Basic Inference

# Send a request through the unified platform
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer demo-key-12345" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Explain machine learning in simple terms"}
    ],
    "max_tokens": 100
  }'

Check Quota Usage

curl http://localhost:8080/v1/usage \
  -H "Authorization: Bearer demo-key-12345"

Monitor Performance

# View batch statistics
curl http://localhost:8000/v1/batch/stats | jq

# Access real-time dashboard
open http://localhost:8501

Python SDK Integration

from aether import AetherClient

# Initialize client
client = AetherClient(
    gateway_url="http://localhost:8080",
    api_key="demo-key-12345"
)

# Make inference request
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "What is Aether?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)

# Check quota
usage = client.get_usage()
print(f"Daily usage: {usage['daily_usage']} / {usage['daily_limit']}")

Key Features

Traffic Gateway (Atlas)

  • API key authentication and authorization
  • Rate limiting (QPS + burst control)
  • Daily and monthly quota enforcement
  • Safety compute budget (limit expensive Tier 3 checks)
  • Token-level accounting (not just request counting)
  • Priority-based routing (critical/high/normal/low)
  • Usage forecasting and capacity planning

Safety Analysis (Sentinel)

  • Tiered Defense: Heuristics (<1ms) β†’ Fast ML (5-15ms) β†’ Deep ML/LLM (50-100ms)
  • Prompt Injection Detection: Catches 95%+ of known attack patterns
  • PII Detection & Redaction: Microsoft Presidio-powered entity recognition
  • Toxicity Filtering: BERT-based toxicity classification
  • HIPAA/GDPR Compliance: Healthcare and privacy policy enforcement
  • Output Filtering: Catch leakage in generated responses
  • Tier Reporting: Reports tier invoked back to Atlas for quota tracking

High-Performance Inference (Hyperion)

  • GPU acceleration with CUDA support
  • Intelligent request batching (2-8 requests/batch, 3x throughput)
  • Redis caching for repeated queries
  • Model optimization (quantization, compilation)
  • Support for multiple model types
  • Automatic CPU/GPU fallback

Comprehensive Observability (MonitorX)

  • Real-time metrics visualization
  • Safety metrics: Block rate, false positive rate, tier distribution
  • ML-aware alerting with adaptive thresholds
  • Multi-channel alerting (Email, Slack, Webhooks)
  • Historical analysis and trends
  • CSV/JSON data export

🚒 Production Deployment

Docker Compose (Recommended for getting started)

docker-compose -f docker-compose-integrated.yml up -d

Kubernetes

# Deploy to Kubernetes cluster
kubectl apply -f kubernetes/namespace.yaml
kubectl apply -f kubernetes/

# Or use Helm
helm install aether ./helm/aether-platform

Environment Configuration

# Copy environment template
cp .env.example .env

# Edit for your environment
vim .env

Key variables:

# Atlas
ATLAS_RATE_LIMIT_QPS=100
ATLAS_ADMIN_TOKEN=<secure-token>

# Hyperion
HYPERION_DEVICE_TYPE=cuda  # or 'cpu'
HYPERION_BATCH_SIZE=8
HYPERION_MODEL_NAME=microsoft/DialoGPT-small

# MonitorX
MONITORX_SLACK_WEBHOOK=<your-webhook>
[email protected]

API Endpoints

Sentinel Safety API (Port 8090)

Endpoint Method Description
/health GET Health check
/supervise POST Safety supervision (input/output check)
/dashboard GET Compliance dashboard (requires auth)

Atlas Gateway (Port 8080)

Endpoint Method Description
/healthz GET Health check
/v1/chat/completions POST OpenAI-compatible inference
/v1/usage GET Check quota usage
/v1/admin/keys POST Register API key
/v1/forecasting/forecast GET Traffic prediction
/metrics GET Prometheus metrics

Hyperion API (Port 8000)

Endpoint Method Description
/healthz GET Health + device info
/v1/llm/chat POST Direct LLM inference
/v1/batch/stats GET Batch performance
/v1/models GET Available models
/metrics GET Prometheus metrics

MonitorX API (Port 8001)

Endpoint Method Description
/api/v1/health GET API health
/api/v1/models POST Register model
/api/v1/metrics/inference POST Submit metrics
/alerts/active GET Active alerts
/alerts/history GET Alert history

Full API documentation available at /docs on each service.


πŸ§ͺ Testing

Automated Testing

# Run integration test suite
./test-integrated-system.sh

# Run individual component tests
cd Atlas && pytest
cd Hyperion && pytest
cd MonitorX && pytest

Load Testing

# Install locust
pip install locust

# Run load test
locust -f tests/locustfile.py \
  --host http://localhost:8080 \
  --users 100 \
  --spawn-rate 10

Verification

# Quick health check
curl http://localhost:8080/healthz
curl http://localhost:8000/healthz
curl http://localhost:8001/api/v1/health

# Verify metrics collection
curl http://localhost:8080/metrics | grep atlas_requests_total

πŸŽ“ Advanced Topics

Multi-Model Deployment

Deploy multiple models simultaneously:

hyperion-gpt2:
  build: ./Hyperion
  environment:
    - MODEL_NAME=gpt2
    
hyperion-bert:
  build: ./Hyperion
  environment:
    - MODEL_NAME=bert-base-uncased

Custom Alerting

Configure MonitorX alerts:

thresholds = {
    "latency": 2000.0,      # Alert if >2s
    "error_rate": 0.05,     # Alert if >5% errors
    "gpu_memory": 0.90,     # Alert if >90% memory
}

Horizontal Scaling

Scale with Kubernetes HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hyperion-hpa
spec:
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

🀝 Contributing

Aether is an integration project. Each component lives in its own repository:

Development Workflow

  1. Make changes in the standalone component repo
  2. Commit and push to the component repo
  3. Run ./sync-repos.sh in Aether to pull updates
  4. Rebuild: docker-compose -f docker-compose-with-sentinel.yml build

For Integration Improvements

  1. Fork this repository
  2. Create a feature branch
  3. Submit a pull request

πŸ“„ License

Apache License 2.0 - See LICENSE file for details.

Each component maintains its own Apache 2.0 license.


Showcase

Aether demonstrates:

  • Safety-First Design: Integrated safety layer from day one, not bolted on
  • System Design: Composable architecture with clear separation of concerns
  • DevOps Excellence: One-command deployment, comprehensive monitoring
  • Production Mindset: Health checks, graceful degradation, observability
  • Enterprise Features: Authentication, rate limiting, quota management, compliance
  • Performance: GPU acceleration, intelligent batching, caching

Perfect for:

  • Production ML deployments requiring safety compliance
  • Learning ML infrastructure with security best practices
  • Portfolio demonstrations of end-to-end platform design
  • Rapid prototyping with built-in guardrails
  • ML system benchmarking with safety metrics

πŸ”— Links


Acknowledgments

Aether integrates four purpose-built components:

  • Sentinel - AI safety gateway with tiered defense
  • Atlas - Traffic governance for LLM inference
  • Hyperion - High-performance ML inference platform
  • MonitorX - Production ML observability

Built with:


Built by BugVanquisher

Aether: Safe GenAI at Scale

Get Started | Safety Demo | Documentation | ADRs

About

The unified intelligence layer that connects Atlas, Hyperion, and MonitorX into a self-observing, quota-aware, high-performance ML platform.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages