Thanks to visit codestin.com
Credit goes to Github.com

Skip to content

m1rl0k/LogAnalysis

Repository files navigation

Analytics API - Turnkey ML Platform

Upload anything → Get anomaly alerts & forecasts

Production-ready REST API with:

  • 📝 Log Anomaly Detection (TF-IDF + DBSCAN for text)
  • 🔍 Outlier Detection (IQR, Z-score, Isolation Forest)
  • 📈 Time-Series Forecasting (Linear, Exponential, Prophet)
  • 🧠 Unified Model (One model, all data, continuous learning)
  • 📊 Smart Upload (Auto-detects CSV, Excel, JSON, Logs)

55,000+ rows tested | 6 datasets | 1 unified model

Quick Start

# Install & generate data
make install
make data

# Analyze a dataset (NO CURL NEEDED!)
python app.py analyze data/sensor_data.csv temperature

# Train unified model
python app.py train data/sales_data.csv sales

# Check status
python app.py status

# Start server
python app.py

# Or use Makefile shortcuts
make analyze-sensors  # Analyze 20K sensor readings
make train            # Train on all 55K samples
make help             # See all commands

Features

Best-in-class lightweight algorithms:

  • Outlier Detection: IQR, Z-score, Isolation Forest
  • Linear Regression: Ridge regression with automatic feature scaling
  • Time-Series Forecasting: Exponential smoothing & linear trend
  • Model Training & Persistence: Train once, predict many times

🚀 Production-ready:

  • Thread-safe data & model storage
  • Rate limiting & security headers
  • Prometheus metrics
  • Health checks & monitoring
  • Docker support

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Run the Server

python app.py

Server starts at http://localhost:5000

3. Test the API

python test.py

Sample Data

Use the ready-to-go datasets in sample_data/ to exercise every endpoint without having to craft payloads from scratch:

File Format Purpose
sample_data/sales_data.csv CSV Multivariate retail metrics with a deliberate sales spike to test regression & outlier detection
sample_data/sales_data.xlsx Excel Same data for validating Excel ingestion
sample_data/sales_data.json JSON Direct upload example for the REST API
sample_data/energy_usage.csv CSV Clean time-series readings for forecasting
sample_data/energy_usage.json JSON Alternative time-series payload
sample_data/system_logs.txt Text Small log file you can vectorize or preprocess before uploading

Quick functional walkthrough

  1. Upload the CSV/JSON data

    python - <<'PY'
    import pandas as pd, requests
    df = pd.read_csv('sample_data/sales_data.csv')
    payload = {"name": "sales_sample", "data": df.to_dict('records')}
    res = requests.post('http://localhost:5000/upload_data', json=payload, timeout=10)
    res.raise_for_status()
    print(res.json())
    PY
  2. Detect outliers:

    curl -s http://localhost:5000/detect_outliers \
      -H 'Content-Type: application/json' \
      -d '{"dataset": "sales_sample", "method": "iforest", "columns": ["sales", "marketing_spend"]}' | jq .statistics
  3. Run regression:

    curl -s http://localhost:5000/regression \
      -H 'Content-Type: application/json' \
      -d '{"dataset": "sales_sample", "target": "sales", "features": ["marketing_spend", "temperature"], "model_name": "sales_model"}' | jq .metrics
  4. Forecast with the energy dataset:

    python - <<'PY'
    import pandas as pd, requests
    df = pd.read_csv('sample_data/energy_usage.csv')
    payload = {"name": "energy_sample", "data": df.to_dict('records')}
    requests.post('http://localhost:5000/upload_data', json=payload, timeout=10).raise_for_status()
    res = requests.post('http://localhost:5000/forecast', json={
        "dataset": "energy_sample",
        "value_column": "energy_usage",
        "method": "linear",
        "horizon": 5
    }, timeout=10)
    print(res.json()["forecast"])
    PY
  5. Run the automated suite once the data is loaded:

    pytest tests/test_analytics.py -v

Core Endpoints

Upload Data

POST /upload_data
{"name": "sales", "data": [{"col1": 10, "col2": 20}]}

Detect Outliers

POST /detect_outliers
{"dataset": "sales", "method": "iqr"}
# methods: iqr, zscore, iforest

Train Model

POST /train_model
{"dataset": "sales", "target": "col1", "model_name": "predictor"}

Predict

POST /predict
{"model_name": "predictor", "data": [{"col2": 25}]}

Forecast

POST /forecast
{"dataset": "sales", "value_column": "col1", "method": "linear", "horizon": 10}
# methods: linear, exponential

Regression

POST /regression
{"dataset": "sales", "target": "col1", "features": ["col2"]}

Example Usage

import requests

# Upload data
requests.post('http://localhost:5000/upload_data', json={
    "name": "sales",
    "data": [
        {"date": "2024-01-01", "sales": 100, "marketing": 20},
        {"date": "2024-01-02", "sales": 120, "marketing": 25}
    ]
})

# Detect outliers
result = requests.post('http://localhost:5000/detect_outliers', json={
    "dataset": "sales",
    "method": "iforest"
})
print(f"Found {result.json()['n_outliers']} outliers")

# Train model
requests.post('http://localhost:5000/train_model', json={
    "dataset": "sales",
    "target": "sales",
    "model_name": "predictor",
    "features": ["marketing"]
})

# Predict
predictions = requests.post('http://localhost:5000/predict', json={
    "model_name": "predictor",
    "data": [{"marketing": 30}]
})
print(predictions.json()['predictions'])

Configuration

Create .env file (see .env.example):

SECRET_KEY=your-secret-key
RATE_LIMIT_PER_MINUTE=60
MAX_BATCH_SIZE=10000
MODEL_DIR=models

Docker

docker-compose up -d

Algorithm Details

IQR: Fast, no training. Flags values outside Q1-1.5×IQR to Q3+1.5×IQR
Z-Score: Assumes normal distribution. Flags |z| > 3
Isolation Forest: Best for high-dimensional, non-linear patterns
Ridge Regression: L2 regularization, automatic scaling
Exponential Smoothing: Fast, good for stationary data
Linear Trend: Simple, interpretable forecasting

Performance

  • 100K rows in memory
  • Sub-second responses
  • Isolation Forest: ~100ms for 10K rows
  • Regression: ~50ms for 10K rows

License

MIT

About

ML / NLP for Log Analysis API

Resources

License

Stars

Watchers

Forks

Packages

No packages published