Upload anything → Get anomaly alerts & forecasts
Production-ready REST API with:
- 📝 Log Anomaly Detection (TF-IDF + DBSCAN for text)
- 🔍 Outlier Detection (IQR, Z-score, Isolation Forest)
- 📈 Time-Series Forecasting (Linear, Exponential, Prophet)
- 🧠 Unified Model (One model, all data, continuous learning)
- 📊 Smart Upload (Auto-detects CSV, Excel, JSON, Logs)
55,000+ rows tested | 6 datasets | 1 unified model
# Install & generate data
make install
make data
# Analyze a dataset (NO CURL NEEDED!)
python app.py analyze data/sensor_data.csv temperature
# Train unified model
python app.py train data/sales_data.csv sales
# Check status
python app.py status
# Start server
python app.py
# Or use Makefile shortcuts
make analyze-sensors # Analyze 20K sensor readings
make train # Train on all 55K samples
make help # See all commands✨ Best-in-class lightweight algorithms:
- Outlier Detection: IQR, Z-score, Isolation Forest
- Linear Regression: Ridge regression with automatic feature scaling
- Time-Series Forecasting: Exponential smoothing & linear trend
- Model Training & Persistence: Train once, predict many times
🚀 Production-ready:
- Thread-safe data & model storage
- Rate limiting & security headers
- Prometheus metrics
- Health checks & monitoring
- Docker support
pip install -r requirements.txtpython app.pyServer starts at http://localhost:5000
python test.pyUse the ready-to-go datasets in sample_data/ to exercise every endpoint without having to craft payloads from scratch:
| File | Format | Purpose |
|---|---|---|
sample_data/sales_data.csv |
CSV | Multivariate retail metrics with a deliberate sales spike to test regression & outlier detection |
sample_data/sales_data.xlsx |
Excel | Same data for validating Excel ingestion |
sample_data/sales_data.json |
JSON | Direct upload example for the REST API |
sample_data/energy_usage.csv |
CSV | Clean time-series readings for forecasting |
sample_data/energy_usage.json |
JSON | Alternative time-series payload |
sample_data/system_logs.txt |
Text | Small log file you can vectorize or preprocess before uploading |
-
Upload the CSV/JSON data
python - <<'PY' import pandas as pd, requests df = pd.read_csv('sample_data/sales_data.csv') payload = {"name": "sales_sample", "data": df.to_dict('records')} res = requests.post('http://localhost:5000/upload_data', json=payload, timeout=10) res.raise_for_status() print(res.json()) PY
-
Detect outliers:
curl -s http://localhost:5000/detect_outliers \ -H 'Content-Type: application/json' \ -d '{"dataset": "sales_sample", "method": "iforest", "columns": ["sales", "marketing_spend"]}' | jq .statistics
-
Run regression:
curl -s http://localhost:5000/regression \ -H 'Content-Type: application/json' \ -d '{"dataset": "sales_sample", "target": "sales", "features": ["marketing_spend", "temperature"], "model_name": "sales_model"}' | jq .metrics
-
Forecast with the energy dataset:
python - <<'PY' import pandas as pd, requests df = pd.read_csv('sample_data/energy_usage.csv') payload = {"name": "energy_sample", "data": df.to_dict('records')} requests.post('http://localhost:5000/upload_data', json=payload, timeout=10).raise_for_status() res = requests.post('http://localhost:5000/forecast', json={ "dataset": "energy_sample", "value_column": "energy_usage", "method": "linear", "horizon": 5 }, timeout=10) print(res.json()["forecast"]) PY
-
Run the automated suite once the data is loaded:
pytest tests/test_analytics.py -v
POST /upload_data
{"name": "sales", "data": [{"col1": 10, "col2": 20}]}POST /detect_outliers
{"dataset": "sales", "method": "iqr"}
# methods: iqr, zscore, iforestPOST /train_model
{"dataset": "sales", "target": "col1", "model_name": "predictor"}POST /predict
{"model_name": "predictor", "data": [{"col2": 25}]}POST /forecast
{"dataset": "sales", "value_column": "col1", "method": "linear", "horizon": 10}
# methods: linear, exponentialPOST /regression
{"dataset": "sales", "target": "col1", "features": ["col2"]}import requests
# Upload data
requests.post('http://localhost:5000/upload_data', json={
"name": "sales",
"data": [
{"date": "2024-01-01", "sales": 100, "marketing": 20},
{"date": "2024-01-02", "sales": 120, "marketing": 25}
]
})
# Detect outliers
result = requests.post('http://localhost:5000/detect_outliers', json={
"dataset": "sales",
"method": "iforest"
})
print(f"Found {result.json()['n_outliers']} outliers")
# Train model
requests.post('http://localhost:5000/train_model', json={
"dataset": "sales",
"target": "sales",
"model_name": "predictor",
"features": ["marketing"]
})
# Predict
predictions = requests.post('http://localhost:5000/predict', json={
"model_name": "predictor",
"data": [{"marketing": 30}]
})
print(predictions.json()['predictions'])Create .env file (see .env.example):
SECRET_KEY=your-secret-key
RATE_LIMIT_PER_MINUTE=60
MAX_BATCH_SIZE=10000
MODEL_DIR=modelsdocker-compose up -dIQR: Fast, no training. Flags values outside Q1-1.5×IQR to Q3+1.5×IQR
Z-Score: Assumes normal distribution. Flags |z| > 3
Isolation Forest: Best for high-dimensional, non-linear patterns
Ridge Regression: L2 regularization, automatic scaling
Exponential Smoothing: Fast, good for stationary data
Linear Trend: Simple, interpretable forecasting
- 100K rows in memory
- Sub-second responses
- Isolation Forest: ~100ms for 10K rows
- Regression: ~50ms for 10K rows
MIT