BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Part A: Course Design
Course Title MLOps
Course No(s) AIML ZG523
Credit Units 4
Content Authors Pravin Y Pawar
Version 1.0
Course Description
Adaptation of DevOps for building and deploying machine learning systems, Model Deployment:
Infrastructure requirements; Deployment patterns, Model CI/CD (Build, Test, Integration and Delivery of
model); Model Serving tools and technologies; Model life cycle management, ML pipelines with data
management support, model assessment, evolution and management in production, MLOps infrastructure and
tools; Trends in Model deployment: ML on the Cloud / Edge / Browsers; VMs, Containers, Docker,
Kubernetes (K8s), FaaS; ML-as-a-Service.
Course Objectives
The course aims to:
CO1 Provide an understanding of the requirements, stakeholders and essential steps involved in building a
machine learning pipeline
CO2 Equip students with conceptual knowledge and hands-on experience in ML model deployment on different
targets
CO3 Give students experience in automating the process of continually developing, evaluating, deploying and
updating models
CO4 Introduce and apply industry practices for model monitoring and observability
CO5 Orient students towards the latest trends in the MLOps space, especially on the cloud, on edge and mobile
devices, and in browsers
Text Book(s)
T1 Introducing MLOps, Treveil and Dataiku Team
T2 Designing ML Systems, Chip Huyen
T3 Engineering MLOps: Rapidly build, test, and manage production-ready machine learning life
cycles at scale
Reference Book(s) & other resources
R1 Reliable Machine Learning – Applying SRE Principles to ML in Production, Chen et al
R2 Machine Learning Engineering, Andriy Burkov
R3 Building Machine Learning Pipelines - Automating Model Life Cycles with TensorFlow, Hannes
Hapke & Catherine Nelson
R4 Practical MLOps, Noah Gift, Alfredo Deza
R5 Beginning MLOps with MLFlow: Deploy Models in AWS SageMaker, Google Cloud, and
Microsoft Azure
R6 AWS / Azure MLOps documentation
R7 Various product technical / white papers
Learning Outcomes:
Students will be able to:
LO1 Determine the infrastructure and tooling requirements necessary to realize a specified ML use case
LO2 Build, deploy, serve, orchestrate and analyze an ML pipeline using open-source tools/platforms
LO3 Refine ML models through retraining, periodic tuning and complete remodeling to ensure long-term
accuracy
LO4 Appreciate model serving approaches on various targets such as edge, mobile devices, the cloud and the
browser
Part B: Course Handout
Academic Term I Semester 2023-2024
Course Title MLOps
Course No AIML ZG523
Lead Instructor Pravin Y Pawar
Glossary of Terms
Module (M): A module is a standalone quantum of designed content. A typical course is
delivered using a string of modules. M2 means module 2.
Contact Hour (CH): A contact hour is an hour-long live session with students,
conducted either in a physical classroom or enabled through technology.
In this model of instruction, instructor-led sessions total 32 CH.
Recorded Lecture (RL): RL stands for Recorded Lecture or Recorded Lesson. It is presented to the
student through an online portal. A given RL unfolds as a sequence of
video segments interleaved with exercises.
Lab Exercises (LE): Lab exercises associated with various modules.
Self-Study (SS): Specific content assigned for self-study.
Homework (HW): Specific problems/design/lab exercises assigned as homework.
Modular Structure
Module Summary
No. Content of the Module
M1 MLOps Foundations
M2 Process and Tooling
M3 Model Experimentation and Packaging
M4 Model deployment & Orchestration
M5 Model Serving
M6 Monitoring & Observability
M7 Continual Learning and Testing
M8 Trends in MLOps
Detailed Structure
M1: MLOps Foundations
Contact Session 1-3
Session | Type | Description/Plan | Reference
1 | CH1-CH2 | Three levels of ML software | ML Workflow; Three Levels
1 | CH1-CH2 | ML life-cycle and System Architecture | AWS ML Lens; ML System Arch
2 | CH3-CH4 | Challenges with ML lifecycle | Challenges
2 | CH3-CH4 | Motivation and Drivers for MLOps | T1 Ch1
2 | CH3-CH4 | People of MLOps | T1 Ch2
3 | CH5-CH6 | Key MLOps features and maturity models | T1 Ch3; Google MLOps; Microsoft
3 | CH5-CH6 | AllTheOps: DataOps, ModelOps, AIOps | AllTheOps
Post CS | AR | Hidden technical debt in machine learning systems |
Post CS | AR | 2020 State of Enterprise Machine Learning - Algorithmia |
Post CS | AR | Why is DevOps for Machine Learning so Different? |
Post CS | AR | Delivering on the Vision of MLOps |
Post CS | AR | MLOps Principles |
Post CS | AR | MLOps Principles and How to Implement Them |
Post CS | AR | MLOps vs. DevOps vs. ModelOps |
Post CS | AR | Differences Between MLOps, ModelOps, AIOps, DataOps |
Post CS | AR | Roles in ML Team and How They Collaborate With Each Other |
Post CS | LE | Lab 1 |
M2: Process and Tooling
Contact Session 4-5
Session | Type | Description/Plan | Reference
4 | CH7-CH8 | MLOps life-cycle, process and capabilities | Google Guide
4 | CH7-CH8 | Infrastructure: Storage and Compute | T2 Ch10
4 | CH7-CH8 | Dev / Production Env, Runtimes | T1 Ch5
5 | CH9-CH10 | ML Platforms | T2 Ch10
5 | CH9-CH10 | Landscape of MLOps Tools / Platforms | ML Platforms; Mymlops tools
5 | CH9-CH10 | Uber’s Michelangelo | Michelangelo
5 | CH9-CH10 | TFX @ Spotify | SpotifyI; SpotifyII
5 | CH9-CH10 | ML @ Netflix | Netflix
Post CS | AR | Building a Machine Learning Platform [Definitive Guide] |
Post CS | AR | Building a machine learning platform |
Post CS | AR | Open Source MLOps: Platforms, Frameworks and Tools |
Post CS | AR | A Tour of End-to-End Machine Learning Platforms |
Post CS | AR | End to End ML Platform! Are we there yet? |
Post CS | AR | ML Platform Podcast |
Post CS | AR | MLOps Landscape in 2023: Top Tools and Platforms |
M3: Model Experimentation and Packaging
Contact Session 6-7
Session | Type | Description/Plan | Reference
6 | CH11-CH12 | Experimentation |
6 | CH11-CH12 | Model Versioning |
6 | CH11-CH12 | Model Metadata |
6 | CH11-CH12 | Model Registry |
7 | CH13-CH14 | Model Packaging | T3 Ch5
7 | CH13-CH14 | Model File formats | R4 Ch4
7 | CH13-CH14 | Serialization |
7 | CH13-CH14 | Containerization |
Post CS | AR | Three Levels of ML Software |
Post CS | AR | Guide to File Formats for Machine Learning: Columnar, Training, Inferencing, and the Feature Store |
Post CS | LE | Lab 2 |
M4: Model deployment & Orchestration
Contact Session 8-9
Session | Type | Description/Plan | Reference
8 | CH15-CH16 | Deployment Myths | T2 Ch7
8 | CH15-CH16 | Productionalization and Deployment | T1 Ch6
8 | CH15-CH16 | Deployment requirements and challenges |
8 | CH15-CH16 | Deployment Patterns: Static / Dynamic / Streaming | R2 Ch8
9 | CH17-CH18 | Orchestration of ML Pipelines |
9 | CH17-CH18 | Apache Beam / Airflow / Kubeflow |
Post CS | LE | Lab 3 |
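As a rough illustration of what "Orchestration of ML Pipelines" involves, the sketch below orders pipeline steps by their dependencies and runs them one after another. It uses only Python's standard-library graphlib module (Python 3.9+) and is not tied to Apache Beam, Airflow or Kubeflow. The step names, the PIPELINE mapping and the run_pipeline helper are hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "ingest": set(),
    "validate": {"ingest"},
    "train": {"validate"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def run_pipeline(dag, tasks):
    """Run each task once, only after all of its dependencies have run."""
    executed = []
    for step in TopologicalSorter(dag).static_order():
        tasks[step]()  # a real orchestrator adds retries, logging and scheduling
        executed.append(step)
    return executed


if __name__ == "__main__":
    # Stand-in tasks that only print; real steps would call training code etc.
    tasks = {name: (lambda n=name: print(f"running {n}")) for name in PIPELINE}
    print(run_pipeline(PIPELINE, tasks))
```

Dedicated orchestrators express the same dependency graph idea but add scheduling, distributed execution and monitoring on top.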
M5: Model Serving
Contact Session 10-12
Session | Type | Description/Plan | Reference
10 | CH19-CH20 | Properties of Model Serving runtime | R2 Ch8
10 | CH19-CH20 | Key serving questions: Load, Latency, Location, Hardware, Execution, Feature pipelines | R1 Ch7
10 | CH19-CH20 | Model serving Architectures / patterns | three-levels-of-ml-software
10 | CH19-CH20 | Batch vs Online Prediction/Scoring | T2 Ch7
11 | CH21-CH22 | Model Server (Model as a Service) |
11 | CH21-CH22 | Model API Design |
11 | CH21-CH22 | Real-time model serving |
12 | CH23-CH24 | Integrated ML Platforms / ML-as-a-Service Platform (MLaaS) |
12 | CH23-CH24 | Case study: Airbnb, Netflix, Booking.com |
Post CS | LE | How to Solve the Model Serving Component of the MLOps Stack |
Post CS | LE | Lab 3 and 5 |
M6: Monitoring & Observability
Contact Session 13-14
Session | Type | Description/Plan | Reference
13 | CH25-CH26 | Causes of ML system failures | T2 Ch8
13 | CH25-CH26 | Model degradation | T1 Ch7
13 | CH25-CH26 | Drift detection |
13 | CH25-CH26 | Feedback loop |
14 | CH27-CH28 | Production Monitoring | R1 Ch9
14 | CH27-CH28 | Monitoring and Observability: ML-specific metrics; monitoring toolbox; observability | T2 Ch8
Post CS | AR | A Comprehensive Guide on How to Monitor Your Models in Production |
Post CS | AR | Arize - Machine Learning Observability |
Post CS | LE | Lab 4 |
M7: Continual Learning and Testing
Contact Session 15
Session | Type | Description/Plan | Reference
15 | CH29-CH30 | Continual Learning: stateless retraining vs stateful training; challenges; stages of continual learning; model upgrades | T2 Ch9
15 | CH29-CH30 | Test in production: offline vs online evaluation; shadow deployment; A/B testing; canary releases; interleaving experiments; bandits | R2 Ch7; T2 Ch9
Post CS | SS | To be identified |
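To make the "canary releases" topic above concrete, here is a minimal sketch of deterministic traffic splitting between a stable model and a canary model. It is one possible approach and not a method prescribed by the course texts. The route_request function, the "stable" / "canary" labels and the user-id format are all assumptions made for illustration.

```python
import hashlib


def route_request(user_id: str, canary_percent: int) -> str:
    """Return "canary" for roughly canary_percent of users, else "stable".

    Hashing the user id keeps each user on the same model version across
    requests, which keeps per-user comparisons consistent.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 99]
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    # Hypothetical user ids; the observed share should be close to 10%.
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route_request(u, 10) == "canary" for u in users) / len(users)
    print(f"canary share: {share:.3f}")
```

Raising canary_percent in small steps while watching the canary's metrics gives a gradual rollout. A/B tests can reuse the same bucketing with a fixed split.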
M8: Trends in MLOps
Contact Session 16
Session | Type | Description/Plan | Reference
16 | CH31-CH32 | Model Compression |
16 | CH31-CH32 | ML in Browsers and Mobile Phones |
16 | CH31-CH32 | ML on Edge |
16 | CH31-CH32 | Continuous ML |
16 | CH31-CH32 | Federated Machine Learning |
Post CS | SS | To be identified |
Experiential Learning Component
Lab Topic
1 Construct an end-to-end Machine Learning Pipeline using MLflow (MATS?) [Virtual Labs]
Stages
a) Problem understanding (aka business understanding)
b) Data collection
c) Data annotation
d) Data wrangling
e) Model development, training and evaluation
f) Model Validation
g) Local Model deployment
h) Prediction
Tech-Stack
a) RDBMS / Real time Source
b) SQL/Python
c) dbt
d) Feast
e) DVC
f) Python/scikit-learn
g) MLflow
h) GitHub
i) REST
2 Manage Machine Learning Model Metadata using MLflow / Neptune (Continuous Integration) [Virtual Labs]
Components
a) Projects
b) Experiments
c) Metadata
d) Model tracking / logging
e) Model Registry
Tech-Stack
a) Python
b) Jupyter Notebooks
c) MLflow / Neptune
d) Git?
3 Deploy and serve the ML model as Microservices (Continuous Delivery) [Virtual Labs]
Stages
a) Triggers for deployment
b) Local Deployment using containers
c) Cloud deployment using AWS services (SageMaker + S3 etc.)
d) Offline (batch) serving
e) Online serving
Tech-stack
a) Python
b) Containers
c) AWS
d) MLflow Model Registry
e) API
4 Monitor the performance of the deployed predictive model [Virtual Labs]
Stages
a) Monitoring Data and feature drift
b) Monitoring target drift
c) Monitoring model performance
Tech-Stack
a) MLflow
b) Model Server
c) Evidently
5 Manage MLOps lifecycle using Cloud services [Virtual Labs]
Stages
Tech-Stack
a) Azure Machine Learning
b) Azure DevOps
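As an illustration of the "Monitoring Data and feature drift" stage of Lab 4, the sketch below computes a Population Stability Index (PSI) between a reference sample and a current sample of a single numeric feature. It is written in plain Python and does not use the Evidently API. The psi function, the bin count and the synthetic data are assumptions for this example only.

```python
import math
import random


def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


if __name__ == "__main__":
    rng = random.Random(0)  # synthetic data, for demonstration only
    train = [rng.gauss(0, 1) for _ in range(5000)]
    same = [rng.gauss(0, 1) for _ in range(5000)]
    shifted = [rng.gauss(1, 1) for _ in range(5000)]
    print("no drift:", psi(train, same))
    print("shifted :", psi(train, shifted))
```

A commonly quoted rule of thumb treats PSI values above roughly 0.2 as a notable shift, but any threshold should be tuned for the feature and use case at hand.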
Evaluation Scheme:
Legend: EC = Evaluation Component; AN = Afternoon Session; FN = Forenoon Session
No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1 | Experiential learning Assignment-I | Take Home | 15 days | 20% | TBA
EC-1 | Experiential learning Assignment-II | Take Home | 15 days | 20% | TBA
EC-2 | Mid-Semester Test | Closed Book | 2 hours | 30% | Per programme schedule
EC-3 | Comprehensive Exam | Open Book | 3 hours | 30% | Per programme schedule
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 7
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 16)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest
announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of two assignments. Announcements will be posted on the portal in a timely
manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed / written reference material (filed or bound) is
permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all
exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student
should follow the procedure to apply for the Make-Up Test/Exam which will be made available on the
Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on the dates to
be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as
given in the course handout, attend the online lectures, and take all the prescribed evaluation components such
as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme
provided in the handout.