This repository contains my solutions and assignments for the MLOps Zoomcamp course by DataTalks.Club. Each module of the course is organized into separate directories with their respective assignments, code, and documentation.
MLOps Zoomcamp is a comprehensive course on Machine Learning Operations (MLOps) covering the full lifecycle of ML projects, from data preparation to model deployment and monitoring.
- `01-intro/` - Introduction to MLOps
  - NYC Yellow taxi trip duration prediction model
  - Data preprocessing, feature engineering, and model training
  - Metrics evaluation on training and validation datasets
- `02-experiment-tracking/` - Experiment tracking with MLflow
  - Training and hyperparameter tuning for taxi trip duration prediction
  - MLflow model registry and tracking server setup
  - Green Taxi Trip data processing and model optimization
- `03-orchestration/` - Workflow orchestration using Prefect
  - End-to-end ML pipeline for NYC taxi trip duration prediction
  - Prefect deployment configuration and scheduling
  - Integration of MLflow with Prefect for model tracking
  - Docker containerization for MLflow and Prefect services
- `04-deployment/` - Model deployment strategies
  - Batch inference with Docker
  - Cloud storage integration
  - Orchestrated batch workflow with Prefect
  - Production-ready code for model scoring
- `05-monitoring/` - Model monitoring for batch services
  - PostgreSQL for metrics storage
  - Grafana for metrics visualization
  - Evidently AI for data drift detection
  - Dockerized monitoring stack
- `06-best-practices/` - MLOps best practices
  - Unit and integration testing with pytest
  - CI/CD implementation with GitHub Actions
  - AWS Lambda for serverless model deployment
  - LocalStack for local AWS service testing
  - Infrastructure as code with Terraform
  - Makefile automation for development workflows
  - Code quality with isort, black, and pylint
## 01-intro: Introduction to MLOps

In this module, I created a linear regression model to predict the duration of taxi trips in NYC using Yellow Taxi Trip Records from January and February 2023. The assignment involved:
- Loading and exploring Yellow Taxi Trip data
- Computing trip durations and handling outliers
- Implementing one-hot encoding for categorical features
- Training a linear regression model
- Evaluating the model on both training and validation datasets
Tools & Libraries used:
- Python
- Pandas, NumPy
- scikit-learn
- Jupyter notebooks
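The steps above can be sketched roughly as follows. This is a minimal illustration using tiny synthetic data in place of the actual trip records; the column names follow the TLC Yellow Taxi schema, and the 1–60 minute outlier window matches the filtering described above.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Tiny synthetic stand-in for the Yellow Taxi Trip records
df = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2023-01-01 10:00", "2023-01-01 11:00",
        "2023-01-01 12:00", "2023-01-01 13:00",
    ]),
    "tpep_dropoff_datetime": pd.to_datetime([
        "2023-01-01 10:12", "2023-01-01 11:25",
        "2023-01-01 12:08", "2023-01-01 15:30",
    ]),
    "PULocationID": [43, 151, 43, 238],
    "DOLocationID": [151, 43, 238, 43],
})

# Compute trip duration in minutes and drop outliers (keep 1-60 minutes)
df["duration"] = (
    df.tpep_dropoff_datetime - df.tpep_pickup_datetime
).dt.total_seconds() / 60
df = df[(df.duration >= 1) & (df.duration <= 60)]

# One-hot encode the categorical location IDs with DictVectorizer
categorical = ["PULocationID", "DOLocationID"]
train_dicts = df[categorical].astype(str).to_dict(orient="records")
dv = DictVectorizer()
X_train = dv.fit_transform(train_dicts)
y_train = df.duration.values

# Fit a linear regression and evaluate RMSE on the training data
model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_train, model.predict(X_train)) ** 0.5
```

In the real assignment the same `DictVectorizer` is reused (without refitting) to transform the February validation data before computing the validation RMSE.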
## 02-experiment-tracking: Experiment Tracking with MLflow

In this module, I implemented experiment tracking and model management using MLflow. The assignment focused on:
- Setting up MLflow for experiment tracking and model versioning
- Processing NYC Green Taxi Trip data for January to March 2023
- Training RandomForestRegressor models with MLflow autologging
- Running hyperparameter optimization with hyperopt
- Setting up a local MLflow tracking server with SQLite backend
- Managing model lifecycle using MLflow Model Registry
- Evaluating and selecting the best model based on validation metrics
Tools & Libraries used:
- MLflow (version 1.27.0)
- Hyperopt for hyperparameter tuning
- scikit-learn (RandomForestRegressor)
- Pandas, NumPy
- SQLite for MLflow backend storage
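A local tracking server with a SQLite backend, as used in this module, can be started with a command along these lines (the database path, artifact directory, and port are illustrative):

```shell
# Start a local MLflow tracking server backed by SQLite;
# artifacts are stored on the local filesystem.
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./artifacts \
    --host 127.0.0.1 \
    --port 5000
```

Training code then points at the server with `mlflow.set_tracking_uri("http://127.0.0.1:5000")` before logging runs or registering models.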
## 03-orchestration: Workflow Orchestration with Prefect

In this module, I implemented workflow orchestration for the NYC taxi trip duration prediction project using Prefect. The work involved:
- Building an end-to-end ML pipeline with Prefect tasks and flows
- Implementing automated data preparation and model training processes
- Setting up Prefect deployments with scheduling capabilities
- Integrating MLflow tracking within the Prefect workflow
- Containerizing the entire workflow with Docker for reproducibility
- Creating monitoring utilities for tracking model performance
- Implementing error handling and retry mechanisms for workflow reliability
- Setting up a complete MLOps environment with both Prefect server and workers
Tools & Libraries used:
- Prefect 3.x for workflow orchestration
- Docker for containerization
- MLflow for experiment tracking
- scikit-learn for modeling
- Shell scripts for automation
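In Prefect, retries are declared on the task itself (e.g. `@task(retries=3, retry_delay_seconds=10)`). The retry mechanism mentioned above can be sketched in plain Python, independent of Prefect, as a decorator; `with_retries` and `flaky_extract` are illustrative names, not part of the actual pipeline:

```python
import time
from functools import wraps

def with_retries(retries=3, delay_seconds=0.01):
    """Retry a flaky step a fixed number of times before giving up."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise  # out of attempts: propagate the failure
                    time.sleep(delay_seconds)
        return wrapper
    return decorator

calls = {"n": 0}

@with_retries(retries=3)
def flaky_extract():
    # Fails twice, then succeeds -- simulates a transient data-source error
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "data"

result = flaky_extract()
```

Prefect applies the same idea per task, so a transient failure in one step (say, a download) does not fail the whole flow.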
## 04-deployment: Model Deployment

In this module, I implemented batch deployment strategies for the NYC taxi trip duration prediction model. The assignment focused on:
- Converting Jupyter notebooks to production-ready Python scripts
- Building a parameterized batch scoring system for different time periods
- Containerizing the model with Docker for portable execution
- Integrating cloud storage options for prediction results (AWS S3, GCS, Azure)
- Creating an orchestrated batch workflow using Prefect
- Implementing a complete deployment pipeline from data ingestion to result storage
- Designing a flexible system that can handle different data sources and output formats
Tools & Libraries used:
- Docker for containerization
- Prefect for workflow orchestration
- Cloud storage SDKs (boto3, google-cloud-storage)
- pandas for data processing
- scikit-learn for model inference
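The parameterized batch scoring idea can be sketched as below. The `ride_id` format and the function name are illustrative assumptions; in the real pipeline the model is loaded from a pickle/MLflow artifact and the data from a monthly parquet file, and the result is written to local disk or cloud storage.

```python
import pandas as pd
from sklearn.dummy import DummyRegressor

def score_batch(df: pd.DataFrame, model, year: int, month: int) -> pd.DataFrame:
    """Score one month of trips and attach a stable ride_id to each row."""
    out = pd.DataFrame()
    # Hypothetical id scheme: "YYYY/MM_rownumber"
    out["ride_id"] = [f"{year:04d}/{month:02d}_{i}" for i in range(len(df))]
    out["predicted_duration"] = model.predict(df[["trip_distance"]])
    return out

# Stand-in model and data for the sketch
train = pd.DataFrame({"trip_distance": [1.0, 2.0, 3.0]})
model = DummyRegressor(strategy="mean").fit(
    train[["trip_distance"]], [10.0, 20.0, 30.0]
)

df = pd.DataFrame({"trip_distance": [1.5, 2.5]})
result = score_batch(df, model, 2023, 3)
# result.to_parquet("output.parquet")  # or upload to S3 / GCS / Azure
```

Parameterizing on `year` and `month` is what lets the same script (or the Prefect flow wrapping it) be re-run for any time period.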
## 05-monitoring: Model Monitoring

In this module, I implemented a comprehensive monitoring system for ML batch services using NYC Green Taxi data. The assignment involved:
- Setting up a monitoring stack with PostgreSQL, Grafana, and Evidently AI
- Building a baseline linear regression model for taxi trip duration prediction
- Implementing data quality metrics, including `QuantileValue` for fare_amount and `ValueDrift` for predictions
- Creating a pipeline to calculate daily metrics for March 2024 data
- Developing a Prefect-orchestrated workflow for metrics calculation
- Storing monitoring metrics in a PostgreSQL database
- Designing custom Grafana dashboards for metric visualization
- Saving and managing dashboard configurations for reproducibility
- Containerizing the entire monitoring stack with Docker
Tools & Libraries used:
- Evidently AI for data quality and model performance monitoring
- PostgreSQL for metrics storage
- Grafana for visualization and alerting
- Docker and Docker Compose for containerization
- Prefect for workflow orchestration
- pandas and scikit-learn for data processing and modeling
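Evidently's `QuantileValue` metric reports a chosen quantile of a column; the underlying daily computation can be illustrated with plain pandas (synthetic data stands in for the March 2024 records):

```python
import pandas as pd

# Synthetic stand-in for Green Taxi records
df = pd.DataFrame({
    "lpep_pickup_datetime": pd.to_datetime([
        "2024-03-01 08:00", "2024-03-01 09:00",
        "2024-03-02 08:00", "2024-03-02 09:00",
    ]),
    "fare_amount": [10.0, 30.0, 12.0, 20.0],
})

# Daily 0.5-quantile (median) of fare_amount, computed with plain pandas;
# Evidently wraps this kind of calculation in its metric reports
daily_median_fare = (
    df.set_index("lpep_pickup_datetime")["fare_amount"]
      .resample("D")
      .quantile(0.5)
)
# In the pipeline, each daily value is inserted into the PostgreSQL
# metrics table that Grafana reads from
```

Running this once per day (orchestrated by Prefect) produces the time series that the Grafana dashboard visualizes.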
## 06-best-practices: Best Practices

In this module, I implemented MLOps best practices for the NYC taxi trip duration prediction project. The assignment focused on:
- Setting up unit and integration testing with pytest
- Implementing code quality checks with GitHub Actions CI/CD
- Using AWS Lambda for serverless model inference
- Working with LocalStack for local AWS service testing
- Implementing infrastructure as code with Terraform
- Creating Makefile automation for streamlined development workflow
- Establishing a robust project structure with proper package management
- Implementing proper environment management and configuration
- Setting up linting and formatting with isort, black, and pylint
- Creating comprehensive documentation for the project
Tools & Libraries used:
- pytest for testing
- GitHub Actions for CI/CD
- AWS Lambda and LocalStack
- Terraform for infrastructure as code
- Docker and Docker Compose
- Makefile for workflow automation
- isort, black, and pylint for code quality
## Setup

This project uses Python 3.10+, with dependencies defined in the `pyproject.toml` file. It uses the uv package manager for faster dependency resolution and installation:
```bash
# Create a virtual environment using uv
uv venv
source .venv/bin/activate  # On Linux/Mac

# Install dependencies with uv
uv pip install -e .

# The dependencies are locked in the uv.lock file for reproducibility
```

With all modules completed, the only remaining component is:
- Final Project: Putting all the MLOps practices together in a comprehensive end-to-end solution
Feel free to reach out if you have any questions about my assignments or solutions.