Maxkaizo/pred_maint

Predictive Maintenance – ML Engineering Challenge

This repository contains an end-to-end ML engineering solution for the Microsoft Azure Predictive Maintenance dataset.

The goal is to predict whether a machine will fail in the near future, using a binary classification model.
The focus of this project is on reproducibility, deployment craftsmanship, and MLOps maturity, rather than state-of-the-art modeling.


Quick Start

1. Clone the repository

git clone https://github.com/Maxkaizo/pred_maint.git
cd pred_maint

2. Environment variables

The project requires a .env file in the repository root.

If it does not exist, create it manually with the following content:

PREFECT_API_URL=http://prefect:4200/api
MLFLOW_TRACKING_URI=http://mlflow:5000
AWS_ACCESS_KEY_ID=test
AWS_SECRET_ACCESS_KEY=test
MLFLOW_S3_ENDPOINT_URL=http://localstack:4566
PYTHONPATH=/app
AWS_DEFAULT_REGION=us-east-1
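
Before starting the containers, it can help to confirm that the file defines every variable listed above. The following is a minimal sketch, assuming the plain KEY=VALUE format shown here (no quoting, no export prefixes). The helper names parse_env and missing_keys are illustrative and not part of the repository.

```python
from pathlib import Path

# Variable names taken from the .env listing above.
REQUIRED_KEYS = [
    "PREFECT_API_URL",
    "MLFLOW_TRACKING_URI",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "MLFLOW_S3_ENDPOINT_URL",
    "PYTHONPATH",
    "AWS_DEFAULT_REGION",
]


def parse_env(text):
    """Parse plain KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text):
    """Return the required variables that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path=".env"):
    """Return the missing variables for the file at path (all of them if it does not exist)."""
    env_path = Path(path)
    if not env_path.exists():
        return list(REQUIRED_KEYS)
    return missing_keys(env_path.read_text())
```

Calling check_env_file() from the repository root returns an empty list when the file is complete.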

3. Start the full environment

This project is fully containerized. Running the following command spins up all required services:

docker compose up --build

This may take a while on the first run, as Docker will download images and initialize all services.

Services included:

  • Prefect (pipeline orchestration) → UI at http://localhost:4200
  • MLflow (experiment tracking & model registry) → UI at http://localhost:5000
  • Postgres (metadata storage for Prefect & MLflow)
  • LocalStack (S3 emulation for datalake & artifacts)
  • Training pipeline (runs automatically on startup)
  • Inference API (REST service for predictions at http://localhost:8000)
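
Since the first startup can be slow, a small readiness poll can tell you when the services are reachable. The sketch below assumes each service answers plain HTTP on the host URLs listed above and treats any HTTP response, including an error status, as a sign that the server is listening. The function names and the retry settings are assumptions, not something the repository provides.

```python
import time
import urllib.error
import urllib.request

# Host URLs from the service list above.
SERVICES = {
    "prefect": "http://localhost:4200",
    "mlflow": "http://localhost:5000",
    "inference": "http://localhost:8000",
}


def is_up(url, timeout=2.0):
    """Return True if anything answers HTTP at url, whatever the status code."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
        return True
    except urllib.error.HTTPError:
        return True  # e.g. a 404 on "/": the server is listening
    except OSError:
        return False  # connection refused, reset or timed out


def wait_for_services(services, probe=is_up, attempts=60, delay=5.0):
    """Poll until every service answers; return the names still unreachable."""
    pending = dict(services)
    for attempt in range(attempts):
        pending = {name: url for name, url in pending.items() if not probe(url)}
        if not pending:
            break
        if attempt < attempts - 1:
            time.sleep(delay)
    return sorted(pending)
```

Running wait_for_services(SERVICES) after docker compose up returns an empty list once all three endpoints respond.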

[Figure: System Architecture]

4. Test inference

Once the containers are up, send a prediction request to the inference API:

curl -X POST http://localhost:8000/predict \
     -H "Content-Type: application/json" \
     -d @sample.json
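
The same request can be sent from Python using only the standard library. This is a sketch equivalent to the curl command above. The response format is not documented here, so the helper simply returns the raw response body. The function names are illustrative.

```python
import json
import urllib.request

PREDICT_URL = "http://localhost:8000/predict"  # endpoint from the curl command above


def build_predict_request(payload_bytes, url=PREDICT_URL):
    """Build a POST request with a JSON body and JSON content type."""
    json.loads(payload_bytes)  # fail early if the payload is not valid JSON
    return urllib.request.Request(
        url,
        data=payload_bytes,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload_path="sample.json"):
    """Send the payload file and return the raw response body as text."""
    with open(payload_path, "rb") as fh:
        request = build_predict_request(fh.read())
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

With the containers running, print(predict()) from the repository root shows whatever the API returns for sample.json.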

Repository Structure

.
├── app/                 # Core pipeline code (flows & tasks)
│   ├── flows/           # Prefect flows (orchestration entrypoints)
│   └── tasks/           # Modular tasks (data prep, FE, training)
├── data/                # Raw & processed data (local copy of Kaggle dataset)
├── docs/                # Documentation (TDD, design notes, tech report, diagrams)
├── inference_app/       # Inference service (REST API + model loader)
├── notebooks/           # EDA and experimentation notebooks
├── postgres-init/       # SQL init scripts for Postgres databases
├── docker-compose.yml   # Orchestration of all containers
├── Dockerfile*          # Container definitions (training, inference, mlflow)
├── environment.yml      # Python environment (dependencies)
└── sample.json          # Example payload for inference requests

Components

  • Training pipeline – Prefect orchestrates data ingestion, feature engineering, target creation, model training and registration in MLflow.
  • Models – CatBoost (primary) and LightGBM (secondary), chosen based on precision-recall (PR) and ROC curve performance.
  • Tracking – MLflow stores runs, metrics, and artifacts.
  • Storage – LocalStack (S3 emulation) for datalake and artifacts.
  • Deployment – Inference container serving the model via REST API.
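
The target creation step turns "will fail in the near future" into a binary label. The README does not state the exact rule the pipeline uses, so the sketch below is only one common way to do it: a reading is labeled 1 if a failure of the same machine occurs within a fixed horizon after it. The 24-hour horizon and the function name label_failures are assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def label_failures(timestamps, failure_times, horizon=timedelta(hours=24)):
    """Label each timestamp 1 if a failure occurs in (t, t + horizon], else 0.

    Both inputs belong to a single machine; horizon is a hypothetical value.
    """
    failures = sorted(failure_times)
    labels = []
    for t in timestamps:
        # Index of the first failure strictly after t.
        i = bisect_right(failures, t)
        in_window = i < len(failures) and failures[i] <= t + horizon
        labels.append(1 if in_window else 0)
    return labels
```

A reading taken at the moment of a failure is labeled 0 under this rule, because the window only looks forward.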

Results

  • Best model: CatBoost (average precision, AP ≈ 0.86; F1 ≈ 0.87).
  • Business trade-off: recall is prioritized over precision, so missed failures are avoided at the price of extra operational costs from false alarms.
  • Performance curves and full analysis can be found in docs/performance_curves.png.
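
Prioritizing recall usually comes down to choosing the decision threshold applied to the model's predicted probabilities. The sketch below shows one way to pick it: take the highest threshold that still reaches a target recall, which flags the fewest machines among the thresholds that qualify. The target recall and the function names are assumptions, and this is not necessarily how the repository selects its threshold.

```python
def precision_recall_f1(y_true, y_score, threshold):
    """Precision, recall and F1 when scores >= threshold count as predicted failures."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def threshold_for_recall(y_true, y_score, min_recall):
    """Return the highest observed score whose recall is at least min_recall, or None."""
    # Recall can only grow as the threshold drops, so the first hit is the highest.
    for threshold in sorted(set(y_score), reverse=True):
        if precision_recall_f1(y_true, y_score, threshold)[1] >= min_recall:
            return threshold
    return None
```

Lowering the threshold further would raise recall at the cost of more false alarms, which is the trade-off described above.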

Documentation

Project documentation (TDD, design notes, tech report, diagrams) is in the docs/ directory.

🛠 Roadmap

Planned improvements (not included due to time constraints):

  • Periodic retraining via Prefect schedules.
  • Continuous monitoring with Evidently.
  • CI/CD with GitHub Actions.
  • Code linting & pre-commit hooks.
