Production-ready WAF system using Transformer ML models to detect anomalous web traffic by learning benign patterns from DVWA, OWASP Juice Shop, and WebGoat applications.
SIH 2025 Problem Statement: PS-25172 (Department of Space, SAC)
waf-pipeline/
├── ingest/      # Log ingestion (Filebeat + Kafka consumer)
├── parser/      # Log parsing & normalization service
├── trainer/     # ML model training (Transformer + LoRA)
├── inference/   # FastAPI inference service
├── gateway/     # Node.js detection controller
├── dashboard/   # MERN dashboard (React + Node API)
├── blockchain/  # Solidity smart contract & gateway
├── infra/       # Docker Compose + Kubernetes manifests
├── demo/        # Demo scripts & attack generation
├── ci/          # GitHub Actions workflows
└── docs/        # Architecture diagrams & API specs
- Docker & Docker Compose
- Node.js 18+
- Python 3.10+
- CUDA-capable GPU (optional, for training)
- Start all services:
  cd infra
  docker-compose up -d
- Generate benign training data:
  python demo/generate_benign.py --app dvwa --count 100000 --seed 42
- Train model:
  docker exec -it waf-trainer python train.py --config configs/transformer_ae.yml
- Access dashboard:
  http://localhost:3000
- Test with attack injection:
  bash demo/attack_inject.sh
Web Traffic → Nginx (OpenResty) → Mirror Request → Gateway
                    ↓                                 ↓
              Upstream App                    Inference Service
                    ↓                                 ↓
              Access Logs → Filebeat → Kafka → Parser → MinIO/S3
                                                           ↓
                                                        Trainer
- Filebeat config for log collection
- Kafka consumer for real-time streaming
- Batch upload endpoint
- Normalizes Apache/Nginx logs
- Masks sensitive data (IDs, emails, tokens)
- Path templating & tokenization
- Transformer Autoencoder (6 layers, 512 hidden)
- LoRA fine-tuning for incremental learning
- Export to TorchScript & ONNX
- FastAPI with <30ms p50 latency
- Anomaly scoring endpoint
- Model versioning & hot reload
- Async request observation
- Redis blocklist management
- MongoDB alert storage
- Real-time detection visualization
- Threshold controls
- Metrics & alerts
- Polygon Mumbai smart contract
- Tamper-proof detection records
- IPFS metadata storage
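
The parser steps listed above (masking IDs, emails, and tokens, then path templating and tokenization) could look roughly like the following. This is a minimal sketch, not the project's parser: the regular expressions, the placeholder strings (`<EMAIL>`, `<ID>`, and so on), and the function names `mask_value`, `template_path`, and `tokenize_request` are all assumptions made for illustration.

```python
import re
from typing import List
from urllib.parse import parse_qsl, urlsplit

# Hypothetical patterns; the real masking rules are not shown in this README.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
TOKEN_RE = re.compile(r"[A-Za-z0-9_\-]{24,}")  # long opaque strings, e.g. session tokens
NUM_RE = re.compile(r"\d+")


def mask_value(value: str) -> str:
    """Replace sensitive-looking substrings with fixed placeholders."""
    value = EMAIL_RE.sub("<EMAIL>", value)
    value = UUID_RE.sub("<UUID>", value)   # before TOKEN_RE, which would also match UUIDs
    value = TOKEN_RE.sub("<TOKEN>", value)
    value = NUM_RE.sub("<NUM>", value)
    return value


def template_path(path: str) -> str:
    """Collapse variable path segments so /users/42 and /users/43 share a template."""
    segments = []
    for seg in path.split("/"):
        if seg.isdigit():
            segments.append("<ID>")
        elif UUID_RE.fullmatch(seg):
            segments.append("<UUID>")
        else:
            segments.append(seg)
    return "/".join(segments)


def tokenize_request(method: str, url: str) -> List[str]:
    """Turn a request line into a flat token sequence for the model."""
    parts = urlsplit(url)
    tokens = [method.upper()]
    tokens += [seg for seg in template_path(parts.path).split("/") if seg]
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        tokens += [key, mask_value(value)]
    return tokens


if __name__ == "__main__":
    print(tokenize_request("get", "/api/users/42/orders?email=alice@example.com&page=3"))
```

Masking before tokenization keeps the vocabulary small and stops the model from memorizing individual user identifiers, which is the usual motivation for this kind of normalization.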
Create .env file:
# Kafka
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
# MongoDB
MONGO_URI=mongodb://localhost:27017/waf
# Redis
REDIS_URL=redis://localhost:6379
# Model
MODEL_VERSION=v1.0.0
MODEL_PATH=/models/transformer_ae.pt
ANOMALY_THRESHOLD=3.0
# Blockchain
POLYGON_RPC_URL=https://rpc-mumbai.maticvigil.com
PRIVATE_KEY=your_private_key_here
# Security
JWT_SECRET=your_jwt_secret_here
- Architecture: 6-layer encoder/decoder
- Hidden dim: 512
- Attention heads: 8
- Max sequence length: 256 tokens
- Training data: 1M benign requests
- Score: Reconstruction loss + token-level anomalies
- Warning threshold: μ + 3σ
- Block threshold: μ + 6σ
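
As a rough illustration of the two thresholds above, the sketch below fits μ and σ on anomaly scores from benign traffic and then maps a new score to an action. It assumes that μ and σ are the mean and population standard deviation of benign reconstruction scores; the README does not say how they are estimated, and the names `fit_thresholds` and `classify` are hypothetical. Whether the `ANOMALY_THRESHOLD` setting corresponds to the warning multiplier is likewise not stated here.

```python
import statistics
from typing import Sequence, Tuple


def fit_thresholds(
    benign_scores: Sequence[float], warn_k: float = 3.0, block_k: float = 6.0
) -> Tuple[float, float]:
    """Return (warning, block) thresholds as mu + k * sigma over benign scores."""
    mu = statistics.fmean(benign_scores)
    sigma = statistics.pstdev(benign_scores)  # assumption: population std deviation
    return mu + warn_k * sigma, mu + block_k * sigma


def classify(score: float, warn_threshold: float, block_threshold: float) -> str:
    """Map an anomaly score to 'allow', 'warn', or 'block'."""
    if score >= block_threshold:
        return "block"
    if score >= warn_threshold:
        return "warn"
    return "allow"


if __name__ == "__main__":
    # Toy scores for illustration only, not measured data.
    warn, block = fit_thresholds([1.0, 2.0, 3.0, 4.0, 5.0])
    for s in (5.0, 8.0, 12.0):
        print(s, classify(s, warn, block))
```

Fitting the thresholds only on benign traffic matches the approach of learning normal behaviour first and flagging deviations from it.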
# Unit tests
pytest parser/tests/
pytest inference/tests/
# Integration tests
pytest tests/e2e_test.py
# Load tests
locust -f tests/locustfile.py
- Prometheus metrics: :9090/metrics
- Grafana dashboard: :3001
- Inference health: :8000/infer/health
- TLS for all inter-service communication
- JWT authentication
- Rate limiting per IP
- Secrets via environment variables
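
Per-IP rate limiting, listed above, can be done in several ways. The sketch below shows one common option, a sliding-window counter, in Python for illustration. The gateway itself is a Node.js service and this README does not say which algorithm or limits it uses; the class name `PerIPRateLimiter` and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class PerIPRateLimiter:
    """Allow at most max_requests per client IP within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record a request from ip and report whether it is within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = PerIPRateLimiter(max_requests=2, window_seconds=10.0)
    print([limiter.allow("203.0.113.7", now=t) for t in (0.0, 1.0, 2.0, 10.0)])
```

In a multi-instance deployment the counters would need to live in shared storage rather than process memory. Redis, which the stack already uses for the blocklist, would be a natural candidate, though that is a design suggestion rather than something this README specifies.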
- Inference API: http://localhost:8000/docs (OpenAPI)
- Gateway API: http://localhost:4000/api-docs
- Follow existing code structure
- Add tests for new features
- Update documentation
- Run linters before committing
MIT License - See LICENSE file for details
SIH 2025 - Department of Space, SAC