Machine Learning Operations

The document outlines the objectives and structure of a course on Machine Learning Operations (MLOps) offered by Databricks Academy. It covers essential topics such as the integration of DataOps, DevOps, and ModelOps, practical applications within Databricks, and best practices for implementing machine learning projects. Prerequisites include basic knowledge of machine learning concepts and Python, along with an understanding of DevOps principles.

Machine Learning Operations

Databricks Academy
August 2024
©2024 Databricks Inc. — All rights reserved
Course Learning Objectives
● Explain modern machine learning operations within the frameworks of DataOps, DevOps, and ModelOps.
● Relate MLOps activities to the features and tools available in Databricks, and explore their practical applications in the machine learning lifecycle.
● Design and implement basic machine learning operations, including setting up and executing a machine learning project on Databricks, following best practices and recommended tools.
● Detail the implementation and monitoring capabilities of MLOps solutions on Databricks.



Prerequisites/Technical Considerations
Things to keep in mind before you work through this course

Prerequisites:
1. Basic knowledge of traditional machine learning concepts
2. Beginner experience with traditional machine learning development on Databricks
3. Intermediate knowledge of Python for machine learning projects
4. Recommended: beginner experience with basic DevOps concepts like CI/CD

Technical Considerations:
1. A cluster running on DBR ML 14.3+
2. Unity Catalog, Model Serving, and Lakehouse Monitoring enabled on the workspace
3. CLI authentication


AGENDA

1. Modern MLOps
   ● Defining MLOps (20 min, lecture)
   ● MLOps on Databricks (15 min; lecture, demo, and lab)

2. Architecting MLOps Solutions
   ● Opinionated MLOps Principles (lecture)
   ● Recommended MLOps Architectures (15 min; lecture, demo, and lab)

3. Implementation and Monitoring MLOps Solution
   ● MLOps Stacks Overview (10 min, lecture)
   ● Types of Model Monitoring (15 min, lecture)
   ● Monitoring in Machine Learning (25 min; lecture, demo, and lab)


Modern MLOps

Machine Learning Operations



Learning Objectives
● Explain the significance of MLOps by integrating DataOps, DevOps, and
ModelOps in modern machine learning.
● Identify and understand the components of DataOps, DevOps, and
ModelOps within the context of machine learning.
● Describe Databricks' capabilities for handling tasks related to DataOps,
DevOps, and ModelOps.
● Relate Databricks features and services to practical applications in
DataOps, DevOps, and ModelOps tasks.



Modern MLOps

LECTURE

Defining MLOps



The Machine Learning Full Lifecycle
End-to-End Process from Business Problem to Deployment and Monitoring

Model Development (uses static historical data):
Business Problem → Define Success Criteria → Data Collection → Data Preprocessing/Feature Engineering → Model Training → Model Evaluation

Deployment & Production (deals with continuously changing new data):
Model Deployment → Model Monitoring


Defining MLOps
An all-inclusive, holistic approach to managing ML systems

The set of practices, processes, and technologies for managing data, code, and models to improve performance, stability, and long-term efficiency in ML systems.


Understanding the Components of MLOps
MLOps components each address a specific part of ML projects

DataOps: a set of practices, processes, and technologies to organize and improve processes around data to increase speed, governance, quality, and collaboration.

DevOps: a set of practices, processes, and technologies to integrate and automate software development workflows.

ModelOps: a set of practices, processes, and technologies to organize and govern the lifecycle of machine learning and artificial intelligence models.


Responsibilities of MLOps Components
Key Functions of DataOps, DevOps, and ModelOps in MLOps

DataOps:
• Optimized data processing
• Centralized data discovery, management, and governance
• Ensured data quality
• Traceable data lineage and monitoring

DevOps:
• Machine learning is code
• Continuous integration and continuous deployment (CI/CD)
• Version control via Git
• Production-grade workflows
• Orchestration
• Automation

ModelOps:
• Move beyond models as objects
• Treating model code as software
• Treating models as data
• Manage the model lifecycle


Comprehensive MLOps
Operationalizing the entire machine learning solution

MLOps is the set of processes and automation for managing data, code, and models to improve performance, stability, and long-term efficiency in ML systems. It combines DataOps, DevOps, and ModelOps.


A Simple Example ML Project
Retail Recommendation System

1. A business owner defines a problem to be solved with a recommendation service.
2. A data scientist begins exploring governed data associated with the service.
3. A data scientist develops a scalable ML solution using relevant data while tracking the experiment.
4. A machine learning engineer implements CI/CD, automates the ML solution, and establishes model performance monitoring; the data is written to a production catalog.


Why does MLOps matter?
Success depends on quality data and operations practices

• Defining an effective strategy
• ML systems built on quality data
• Streamlining the process of taking solutions to production
• Operationalizing performance and effectiveness monitoring
• So what?
  • Time to realizing business value is accelerated
  • Reduction in manual oversight by high-value data science teams

Real-world Example: Databricks customer CareSource accelerated their model development and deployment, resulting in a self-service MLOps solution for data scientists that reduced ML project time from 8 weeks to 3-4 weeks. The CareSource team can extend this approach to other machine learning projects, realizing this benefit broadly. Learn more about the work here.


DataOps, DevOps, and ModelOps in MLOps
Effective machine learning involves managing data, code, and the model lifecycle to maintain and improve performance.

Model lifecycle management: Data Preparation → EDA → Model Development → Model Validation → Model Serving → Data and Model Monitoring, supported by CI/CD and workflows.

Tooling layers:
• Data processing solution: scalable, efficient, and performant data processing
• Code management: version control and automatic testing
• Orchestration: DAG-based orchestration and job scheduling
• Data governance solution: unified security, governance, and cataloging
• Data storage and management solution: unified data storage for reliability, quality, and sharing
• Foundation: lakehouse data architecture and storage


A Single Platform for Modern MLOps
Combining DataOps, DevOps, and ModelOps solutions

The same model lifecycle (Data Preparation → EDA → Model Development → Model Validation → Model Serving → Model Monitoring) maps onto Databricks services:
• Model lifecycle: Model Registry, Model Serving, and Lakehouse Monitoring
• Code management, version control, and automatic testing: Repos
• DAG-based orchestration and job scheduling: Workflow Orchestration
• Scalable, efficient, and performant data processing: Apache Spark and Photon
• Unified security, governance, and cataloging: Unity Catalog
• Unified data storage for reliability, quality, and sharing: Delta Lake
• Foundation: lakehouse data architecture and storage


Modern MLOps

LECTURE

MLOps on
Databricks



DataOps tasks and tools in Databricks
The table lists common DataOps tasks and tools in Databricks:

• Ingest & transform data: Auto Loader and Apache Spark*
• Track data changes, including versioning & lineage: Delta tables*
• Build, manage, & monitor data processing pipelines: Delta Live Tables*
• Ensure data security & governance: Unity Catalog*
• Exploratory data analysis, dashboards, & general coding: Databricks SQL, Dashboards, and Databricks notebooks
• Schedule data pipelines & automate general workflows: Databricks Workflows
• Create, store, manage, & discover features: Databricks Feature Store*
• Data monitoring: Lakehouse Monitoring**

* Discussed in other associated machine learning courses.
** Covered later in the training.
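Several of the rows above (ensuring data quality, data monitoring) reduce to automated checks on incoming data. The sketch below illustrates the idea as a tiny data-quality gate in plain Python; in Databricks itself this role is played by Delta Live Tables expectations or Lakehouse Monitoring, and every name here is hypothetical.

```python
# Illustrative only: a minimal data-quality gate in plain Python, independent
# of any Databricks API. All field and variable names are hypothetical.

def check_rows(rows, required_fields, min_rows=1):
    """Return a list of human-readable violations for a batch of records."""
    violations = []
    if len(rows) < min_rows:
        violations.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) is None:
                violations.append(f"row {i}: missing required field '{field}'")
    return violations

batch = [
    {"user_id": 1, "amount": 9.99},
    {"user_id": 2, "amount": None},  # a bad record the gate should flag
]
problems = check_rows(batch, required_fields=["user_id", "amount"])
```

A pipeline would fail (or quarantine the batch) whenever `problems` is non-empty, rather than letting bad data reach training.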
Databricks SQL
Delivering analytics on the freshest data with data warehouse performance and data lake economics

■ Better price/performance than other cloud data warehouses
■ Simplify discovery and sharing of new insights
■ Connect to familiar BI tools, like Tableau or Power BI
■ Simplified administration and governance

(Platform diagram: Data Science & AI, ETL & Orchestration, Real-time Analytics, and Data Warehousing workloads, served by Mosaic AI, Delta Live Tables, Workflows, and Databricks SQL, running on the Data Intelligence Engine, Unity Catalog, Delta Lake, and an open data lake holding all raw data: logs, text, audio, video, and images.)


Better price/performance

Run SQL queries on your lakehouse and analyze your freshest data with up to 12x better price/performance than traditional cloud data warehouses.

Source: Performance Benchmark with Barcelona Supercomputing Center


A new home for Data Analysts

Enable data analysts to quickly perform ad-hoc and exploratory data analysis with a new SQL query editor, visualizations, and dashboards. Automatic alerts can be triggered for critical changes, allowing teams to respond to business needs faster.


Databricks Workflows

Databricks Workflows is a fully-managed, cloud-based, general-purpose task orchestration service for the entire lakehouse.

Workflows is a service for data engineers, data scientists, and analysts to build reliable data, analytics, and AI workflows on any cloud.


Databricks Workflows
Databricks has two main task orchestration services: Workflows Jobs and Delta Live Tables.

Workflows Jobs:
• Execute jobs on a predefined schedule as a series of interrelated tasks
• Perform machine learning operations by running tasks within job frameworks like MLflow
• Implement a variety of tasks within a job using notebooks, JARs, Delta Live Tables pipelines, or Python, Scala, Spark submit, SQL, and Java applications
• Use cases: orchestration of dependent jobs, machine learning tasks, arbitrary code, external API calls, custom tasks

Delta Live Tables:
• ETL processes, compatible with batch and streaming inputs
• Enforced data quality and consistency
• Tracking & logging of data transformations
• Use case: data ingestion and transformation

Note: a DLT pipeline can be a task in a workflow.


Workflows Jobs
Key Features

• Easy creation, scheduling, and orchestration of your code with a DAG (Directed Acyclic Graph)
• Simplicity: easy creation and monitoring of the DAG of tasks in the UI
• Many task types suited to your workload
• Fully integrated in the Databricks platform, making inspecting results and debugging faster
• Reliability of the proven Databricks scheduler
• Observability to easily monitor status


Building Blocks of a Databricks Workflows Job
A unit of orchestration in Databricks Workflows is called a Job.

Jobs consist of one or more Tasks: Databricks Notebooks, Python Scripts, Python Wheels, SQL Files/Queries, DBSQL Dashboards, Delta Live Tables Pipelines, dbt, Java JAR files, and Spark Submit.

Control flows can be established between Tasks: sequential, parallel, conditionals (Run If), and Jobs as a Task (modular; Private Preview).

Jobs support different Triggers: manual trigger, scheduled (cron), API trigger, file arrival, Delta table update, and continuous (streaming).
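The task and control-flow building blocks above amount to executing a directed acyclic graph. The toy sketch below, in plain Python with the standard library's graphlib, shows how such a dependency map resolves to a valid execution order; the task names are hypothetical and nothing here calls a Databricks API.

```python
# Toy illustration of Jobs-style task ordering: each task lists the tasks it
# depends on, and a scheduler may only start a task once all of its
# dependencies have finished. Task names are hypothetical.
from graphlib import TopologicalSorter

tasks = {
    "ingest":   [],                     # no dependencies
    "train":    ["ingest"],             # depends_on: ingest
    "evaluate": ["train"],
    "notify":   ["train", "evaluate"],  # fan-in: waits on two tasks
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(tasks).static_order())
print(order)
```

A real scheduler would additionally run independent tasks in parallel and evaluate Run If conditions, but the ordering constraint is the same.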


ModelOps Tasks and Tools in Databricks
The table lists common ModelOps tasks and tools provided by Databricks:

• Manage model lifecycle: Models in Unity Catalog*
• Track model development: MLflow model tracking*
• Model code version control and sharing: Databricks Repos
• No-code model development: Databricks AutoML*
• Model monitoring: Lakehouse Monitoring**

* Discussed in other associated machine learning courses.
** Covered later in the training.
Introduction to Databricks Repos
Databricks Repos provide a visual Git client & API within Databricks, allowing
users to manage code repositories, collaborate, and integrate with Git services.

Key Features:
● Seamless Git integration.
● Collaborative coding
environment.
● Simplified version control.



Databricks Repo Setup and Commands
Once set up, easily manage and perform Git operations within Databricks

● Common Git operations from the Databricks UI:
  ■ Clone
  ■ Checkout
  ■ Commit
  ■ Push
  ■ Pull
  ■ Branch management
● Uses a Personal Access Token or equivalent to authenticate.


DevOps: Production and Automation
The table lists common DevOps tasks and tools provided by Databricks:

• Data and model lineage, access control, and governance: Unity Catalog*
• Maintain a highly available, low-latency REST endpoint: Mosaic AI Model Serving*
• Automate and schedule workloads, from ETL to ML: Databricks Workflows (Databricks also supports integrations with popular third-party orchestrators like Airflow)
• Deployment infrastructure for inference and serving: Asset Bundles, Databricks SDKs, Terraform provider, Databricks CLI
• Establish CI/CD pipelines: Databricks Asset Bundles, Azure DevOps, Jenkins, or GitHub Actions
• Monitoring and maintaining your applications: Lakehouse Monitoring**

* Discussed in other associated machine learning courses.
** Covered later in the training.
Overview of Developer Tools

• Databricks Asset Bundles (recommended): Infrastructure as Code for Databricks resources and workflows
• Databricks CLI: an easy-to-use interface for automation from a terminal, command prompt, or bash scripts
• Databricks SDKs: SDKs for Python, Java, Go, and R
• Terraform provider: a flexible tool to manage your Databricks workspaces and the associated cloud infrastructure




Databricks CLI
There are two CLIs available. Both are built on top of the REST API and organized into command groups.

Databricks CLI: used for Data Science & Engineering workspace assets such as cluster policies, clusters, file systems, groups, pools, and jobs. Use cases:
• Provision compute resources in Databricks workspaces
• Run data processing and data analysis tasks
• List, import, and export notebooks and folders in workspaces

Databricks SQL CLI: used for SQL warehouses. Use cases:
• Run SQL queries on existing warehouses
• Run queries from a query string. Example: dbsqlcli -e "SELECT * FROM default.diamonds LIMIT 2"
• Run queries from a text file. Example: dbsqlcli -e my-query.sql
• Run queries in a read-evaluate-print loop (REPL)


What is a DAB?

• Databricks Asset Bundles, or DABs, are a collection of Databricks artifacts¹ (e.g., jobs, ML models, DLT pipelines, and clusters) and assets² (e.g., Python files, notebooks, SQL queries, and dashboards).
• These DABs (aka bundles) are configured through YAML files and can be co-versioned in the same repository as the assets and artifacts referenced in the bundle.
• Using the Databricks CLI, these bundles can be materialized across multiple workspaces, like dev, staging, and production, enabling customers to integrate them into their automation and CI/CD processes.

¹ Artifacts are instantiations of sources that persist state.
² Assets are file-like resources that exist on a workspace path and carry little or no state.
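To make the YAML configuration concrete, here is a minimal sketch of a databricks.yml. It follows the general shape of the bundle schema, but the bundle name, target, job, and paths are invented placeholders; check the exact schema against the Databricks documentation before use.

```yaml
# Hypothetical minimal bundle definition; names and paths are placeholders.
bundle:
  name: retail_recommender

targets:
  dev:
    default: true
    workspace:
      host: https://example.cloud.databricks.com  # placeholder workspace URL

resources:
  jobs:
    train_recommender:
      name: "train-recommender (dev)"
      tasks:
        - task_key: train
          notebook_task:
            notebook_path: ./src/train_notebook
```

Because this file lives next to the code it references, the whole bundle can be versioned, reviewed, and deployed as one unit.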


Step 1: Build the bundle

● Use 'databricks bundle init' to generate the core folder structure on your workstation using the CLI
● All code (notebooks, DLT, and Python functions) is stored in the src/ folder
● All environment and resource configurations are stored in databricks.yml as well as other YAML files inside the resources/ folder
● The entire DAB folder is managed through source control


Step 2: Deploy to and run in Dev
As part of active development

$ databricks bundle deploy -e "dev"
$ databricks bundle run pipeline --refresh-all -e "dev"

★ Deploy and run your project, tweak configs, deploy and test changes
★ Deploy to multiple workspaces for testing differences
★ Deploy and run from IDEs, terminals, or Databricks


Step 3: Automated push to Stage and Prod
As part of CI/CD processes

On pull request:
➜ databricks bundle deploy -e "staging"
➜ databricks bundle run pipeline --refresh-all -e "staging"

On release:
➜ databricks bundle deploy -e "production"
➜ databricks bundle run pipeline --refresh-all -e "production"

Flow: check out → commit → pull request → deploy as test → merge → release → deploy to prod

★ Executed on a CI/CD server (e.g., GitHub Actions)
★ Triggered by CI or release pipelines
★ Ideally run as a service principal
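Wired into a CI/CD server, the staging leg of this flow might look roughly like the GitHub Actions workflow below. The databricks/setup-cli action and the secret names are assumptions to verify against your own setup; the bundle commands mirror the ones on this slide.

```yaml
# Sketch of a CI workflow that deploys the bundle to staging on pull requests.
# Secret names and the setup action version are placeholders to verify.
name: deploy-staging
on:
  pull_request:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - name: Deploy and run bundle in staging
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}   # ideally a service principal token
        run: |
          databricks bundle deploy -e "staging"
          databricks bundle run pipeline --refresh-all -e "staging"
```

A second, near-identical workflow triggered on release would target the "production" environment instead.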




Modern MLOps

DEMONSTRATION

Working with
Asset Bundles



Demo Outline

What we'll cover:

• Introduction to Asset Bundles - Overview and benefits
• Environment Setup - Authentication and CLI installation
• Creating Asset Bundles - Using templates and customizing directories
• Validating Bundles - Ensuring syntactical correctness
• Deploying Bundles - Deployment steps and viewing jobs
• Running Jobs - Executing and monitoring
• Modifying and Redeploying - Making changes and redeploying
• Cleaning Up - Destroying deployments and cleaning the environment


Modern MLOps

LAB EXERCISE

Creating and Managing Workflow Jobs Using the UI
Lab Outline

What you'll do:

• Create and Configure a Workflow Job - Set up multiple tasks using the UI
• Enable Email Notifications - Configure notifications for job status updates
• Manually Trigger Deployment Workflow - Initiate the job run manually
• Monitor Job Run - Observe job execution and monitor the workflow


Architecting
MLOps Solutions

Machine Learning Operations



Learning objectives
Things you’ll be able to do after completing this module

• Explain the importance of using the right MLOps architecture.


• Explain the reasoning behind the opinionated MLOps principles
informing the recommended architecture of Databricks.
• Describe the Databricks-recommended MLOps architecture
approach.
• Architect basic machine learning operations solutions for
traditional machine learning applications based on
Databricks-recommended best practices.



Architecting MLOps Solutions

LECTURE

Opinionated
MLOps Principles



Guiding Principle
A data-centric approach to machine learning

• ML projects are made up of data pipelines (data prep, EDA, development, validation, serving, monitoring)
• Operationalizing ML solutions requires the connection of a variety of data pipelines
• Data pipelines require storage, governance, orchestration, etc.
• Aligned to the Data Intelligence Platform vision: Repos and Workflows, Apache Spark and Photon for scalable data processing, Unity Catalog for unified governance, and Delta Lake for unified storage on the lakehouse architecture


Multi-environment Semantics
Defining Development, Staging, and Production environments

• Development: an environment where data scientists can explore, experiment, and develop
• Staging: an environment where machine learning practitioners can test their solutions
• Production: an environment where machine learning engineers can deploy and monitor their solutions


Environment Separation
How many Databricks workspaces do we have?

Direct Separation:
● Completely separate Databricks workspaces for each environment (dev, staging, prod)
● Simpler environments
● Scales well to multiple projects

Indirect Separation:
● One Databricks workspace with enforced separation
● Simpler overall infrastructure requiring fewer permissions
● Complex individual environment
● Doesn't scale well to multiple projects


Deployment Patterns
Moving from Deploy Model to Deploy Code

Deploy Model:
● Model is trained in the development environment
● Model artifact is moved from staging through production
● Separate process needed for other code (inference, monitoring, operational pipelines)

Deploy Code (recommended):
● Code is developed in the development environment
● Code is tested in the staging environment
● Code is deployed in the production environment
● Training pipeline is run in each environment; the model is deployed in production
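The deploy-code pattern implies a single training entry point whose only per-environment difference is configuration. The plain-Python sketch below illustrates the idea; the catalog and table names are invented for illustration.

```python
# Toy sketch of the "deploy code" pattern: the same training entry point runs
# in every environment, parameterized by an --env flag that selects an
# environment-specific catalog. All names here are hypothetical.
import argparse

CATALOGS = {"dev": "dev_catalog", "staging": "staging_catalog", "prod": "prod_catalog"}

def resolve_table(env: str, table: str = "features.transactions") -> str:
    """Build the fully qualified table name for the given environment."""
    return f"{CATALOGS[env]}.{table}"

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", choices=CATALOGS, default="dev")
    args = parser.parse_args(argv)
    source = resolve_table(args.env)
    # A real pipeline would read `source`, train, and register the model here.
    return source

print(main(["--env", "prod"]))  # prints prod_catalog.features.transactions
```

Because only the `--env` value changes between environments, the code tested in staging is byte-for-byte the code that runs in production.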


Architecting MLOps Solutions

LECTURE

Recommended
MLOps
Architectures



Importance of MLOps Architecture
Setting an ML project up for success starts with architecture

Simplicity: when ML projects are well architected, the downstream management, maintenance, and monitoring of the project are simplified.

Efficiency: when ML projects are well architected, processes around the project become more efficient.

Scalability: when ML projects are well architected, they can easily be scaled to adapt to changing requirements for infrastructure and compute.

Collaboration: when ML projects are well architected, it's easy for different users and different types of users to collaborate effectively.


Dimensions of Architecture
Differentiating initial organization/setup and ongoing workflows

Infrastructure: the organization, governance, and setup of environments, data, compute, and other resources.
● Set up one time (per project or per team/organization)
● Crucial to downstream success of project(s)

Workflow: the processes that ML practitioners follow within a defined architecture to achieve success on an ML project.
● Repeatable, fluid processes specific to a project
● Aligned to organizational best practices


Recommended MLOps Architecture
A high-level view of code, data, and ML environments

Code Management: a single project code repository to be used throughout all environments.

Development: a Databricks workspace (or environment) for exploratory data analysis, model training/tracking and validation, deployment for model selection, and monitoring.

Staging: a Databricks workspace (or environment) for testing the efficacy of the project, including unit tests, integration tests, and performance regression tests.

Production: a Databricks workspace (or environment) for production ML workflows, scaling, and monitoring.

Data/Artifact Management: a single data/artifact management solution with access to environment-specific catalogs.


MLOps Solution
Infrastructure through production

Infrastructure Setup (optional): organization and setup of infrastructure for a machine learning project.
Development: developing the EDA and ML pipelines of the ML solution.
Staging: establishing the automated testing setup for the ML solution.
Production: setting up the deployment and monitoring of the production-grade ML solution.


Infrastructure Setup
Getting set up for a machine learning project

• Who: architect or engineer
• How often: set up once
• What:
  • A Unity Catalog metastore
  • One or three Databricks workspaces
  • Three data catalogs: dev, staging, prod
  • Command line environment
  • MLOps Stacks project (per project)
  • Git repository (per project)
• How:
  • Manual setup/Terraform
  • MLOps Stacks


Infrastructure Setup Tasks
Using our DataOps + DevOps + ModelOps mental model

• Make infrastructure decisions:
  • Number of workspaces
  • Use existing vs. new infrastructure
  • Select CI/CD tooling
• Create the Databricks environment:
  • Unity Catalog metastore
  • 1 or 3 workspaces
  • Unity Catalog catalogs (dev, test, staging, prod)
  • Service principal permissions
• Optimize for efficiency:
  • Project templates with guardrails and best practices configured
  • Git repository creation and connection
  • Configure the Databricks CLI and IDE (e.g., the VS Code extension)
• Additional considerations:
  • Monitoring and logging
  • Backup and recovery
  • Security best practices
  • Network configuration


Deep Dive: Infrastructure
A closer look at the Infrastructure stage

Development
Developing a machine learning project

• Who: data scientist
• How often:
  • Initial solution development
  • Solution updates
• What:
  • EDA
  • ML development
  • ML validation and deployment
  • Monitoring solution
• How: develop ML pipelines within the project architecture by editing an MLOps Stacks project


Development Tasks
Using our DataOps + DevOps + ModelOps mental model

• Make changes to code:
  • Add and update code
  • Use project templates (DAB, MLOps Stacks, etc.)
  • Write notebooks and scripts
  • Create queries and alerts
  • Commit and pull code changes
• Validate data and code:
  • Ensure correct setup before deployment
  • Validate using DLT, Asset Bundles, etc.
  • Confirm data format compliance
  • Implement data quality checks
  • Review for security and compliance
• Deploy solution:
  • Deploy all jobs to ensure successful execution
  • Use the CLI, scripts, and automation for consistent deployments across environments
  • Establish rollback strategies
• Additional considerations:
  • Implement version control
  • Optimize deployment workflow
  • Ensure scalability and performance tuning
  • Set up automated notifications
Deep Dive: Development
A closer look at the Development stage

Staging
Testing a machine learning project

• Who: ML engineer
• How often:
  • Set up and run every time a change is made in Development
  • Run every time a model is refreshed
• What:
  • Project merge
  • Project code testing
• How:
  • Centralized Git repository
  • Automated CI/CD infrastructure tools


Staging Tasks
Using our DataOps + DevOps + ModelOps mental model

• Review CI/CD workflows and tests:
  • Examine existing CI/CD workflows
  • Add/change tests as needed
  • Ensure workflow alignment with the project
• Create a pull request to run tests:
  • Establish a trigger (e.g., a pull request to merge the dev branch into the main branch)
• Run tests:
  • Check for conflicts
  • Validate the CI/CD setup
  • Validate the project: unit, integration, and stress tests
• Analyze test results:
  • If all tests pass and the change is approved, merge into the main branch
  • If tests fail, return to the development stage
  • Review test reports and logs for insights
• Additional considerations:
  • Implement automated notifications for test results
  • Monitor staging environment performance
  • Ensure data integrity and consistency during staging
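The unit tests run in staging are ordinary code tests. As a minimal illustration (the helper function and its thresholds are hypothetical, not from the course), a staging CI run might execute checks like:

```python
# Illustrative unit test of a feature-engineering helper, the kind of check a
# staging CI run executes before a merge. Function and thresholds are made up.

def bucket_age(age: int) -> str:
    """Feature-engineering helper under test: map an age to a category."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

def test_bucket_age():
    # Boundary values are where bucketing bugs usually hide.
    assert bucket_age(0) == "minor"
    assert bucket_age(18) == "adult"
    assert bucket_age(64) == "adult"
    assert bucket_age(65) == "senior"

test_bucket_age()
print("unit tests passed")
```

In practice these functions live in the project's src/ folder and a test runner such as pytest discovers and executes them on each pull request.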
Deep Dive: Staging
A closer look at the Staging stage

Production
Deploying and monitoring a machine learning project

• Who: ML Engineer
• How often:
• When changes are made and tests are passed
• When the model needs to be refreshed
• What:
• Automated run/deployment of the solution
• Monitoring of the solution
• How:
• Centralized Git repository
• MLOps Stacks project

Production: setting up the deployment and monitoring of the production-grade ML solution.

Production Tasks
Using our DataOps + DevOps + ModelOps mental model
• Create/Merge to Release Branch:
• Set up the release branch
• Include deployment triggers
• Merge updates into the release branch

• Deploy Solution:
• Automatic deployment triggered by release branch updates
• Deploy components: project code; data, model, and monitoring workflows; compute resources

• Monitor:
• Track system performance
• Monitor deployment health
• Real-time alerting and notifications
• Log analysis for error detection
• Data and model drift detection

• Additional Considerations:
• Optimize resource utilization
• Ensure compliance with security policies
• Conduct post-deployment reviews
• Implement incident response protocols
• Set up automated retraining pipelines
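The "Monitor" tasks above ultimately feed a decision: keep the deployment or roll back. A hypothetical sketch of that decision rule follows; the metric names and thresholds are invented for illustration and would come from your monitoring system in practice.

```python
# Hypothetical post-deployment health check: compare recent serving metrics
# against thresholds and recommend a rollback when any is breached.
def should_roll_back(metrics, max_error_rate=0.02, max_p95_latency_ms=500):
    """Return (roll_back, reasons) given a dict of recent metrics."""
    reasons = []
    if metrics.get("error_rate", 0.0) > max_error_rate:
        reasons.append("error rate above threshold")
    if metrics.get("p95_latency_ms", 0.0) > max_p95_latency_ms:
        reasons.append("p95 latency above threshold")
    return (len(reasons) > 0, reasons)

# Example: elevated error rate triggers a rollback recommendation.
roll_back, why = should_roll_back({"error_rate": 0.05, "p95_latency_ms": 120})
```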

Deep Dive: Production
A closer look at the Production stage

Complete MLOps Architecture

Architecting MLOps Solutions

DEMONSTRATION

Model Testing Job with the Databricks CLI

Demo
Outline

What we’ll cover:


• CLI Basics
• Execute the help command to explore functionalities
• Workflow Job Configuration
• Create a JSON configuration file for the workflow
• Creating and Running a Workflow Job
• Create the job using Databricks CLI
• Extract job ID and run the job
• Monitoring and Exploring Jobs
• Access the job console
• View and explore tasks and run output.
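One way to build the JSON configuration file mentioned above is to assemble it in Python and serialize it. The field names below follow the Databricks Jobs API 2.1 schema; the job name, notebook path, and cluster id are placeholders to adapt to your workspace, so treat this as a sketch rather than a copy-paste recipe.

```python
import json

# Assemble a minimal job definition; the resulting JSON can be written to a
# file and passed to the CLI, e.g. `databricks jobs create --json @job.json`.
job_config = {
    "name": "model-testing-job",  # placeholder job name
    "max_concurrent_runs": 1,
    "tasks": [
        {
            "task_key": "run_model_tests",
            "notebook_task": {"notebook_path": "/Workspace/project/model_tests"},
            "existing_cluster_id": "<cluster-id>",  # placeholder
        }
    ],
}

payload = json.dumps(job_config, indent=2)
```

The job id returned by `jobs create` can then be used with `databricks jobs run-now` to launch the run, as the demo walks through.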
Architecting MLOps Solutions

LAB EXERCISE

Deploying Models
with Jobs and the
Databricks CLI

Lab
Outline

What you’ll do:


• Task 1: Identify and update a model's alias to "Champion".
• Task 2: Configure and use the Databricks CLI to manage jobs.
• Task 3: Create and run a workflow job for model deployment and
Batch Inferencing.
• Task 4: Monitor and explore the executing workflow job.

Implementation
and Monitoring
MLOps Solution

Machine Learning Operations

Learning objectives
Things you’ll be able to do after completing this module

• Understand the integration and application of Databricks MLOps Stacks to improve CI/CD practices and infrastructure management for machine learning environments.
• Develop skills to implement effective model monitoring strategies that
encompass business requirements, resource utilization, model
performance, and traceability.
• Develop expertise in diagnosing model drift types and setting up
appropriate retraining triggers to maintain model accuracy and reliability.
• Develop proficiency in employing monitoring techniques that ensure data
integrity, trace model performance, and automate alerts leveraging
Databricks' Lakehouse Monitoring.

Implementation and Monitoring MLOps Solution

LECTURE

Implementation of
MLOps Stacks

How do we set all of this up?
We recommend using Databricks MLOps Stacks

Databricks MLOps Stacks: out-of-the-box MLOps tooling
• CI/CD via GitHub Actions or Azure DevOps
• Infrastructure-as-code with asset bundles and templates
• Orchestration with Workflows

• Eases the implementation and management of MLOps infrastructure and architecture
• Returns your focus to solving business problems
• Aligned to recommended deploy-code architecture best practices
• Current status: Public Preview

Built on existing Databricks infrastructure components like Workflows, MLflow experiments, MLflow models, and Feature Store

What does MLOps Stacks actually do?
Creates a repo with a sample project structure for productionizing ML

├── README.md
├── requirements.txt
├── databricks.yml
├── training
├── validation                  # Project structure for structuring ML code
├── deployment
├── tests
├── .github/.azure              # ML-tailored CI/CD for deploying ML systems across multiple environments
└── resources
    ├── inference.yml           # Infra-as-code for configuring and managing ML resources across
    ├── training.yml            # multiple environments, including model registry, training jobs,
    ├── ml-artifacts.yml        # batch jobs, feature engineering, monitoring, serving endpoints, etc.
    └── feature-engineering.yml

How do we use MLOps Stacks?
Set up and run the project from the command line (in Public Preview)

Set up the project:

> databricks bundle init mlops-stacks
> # … answer the prompts

● Initialize the project
● Answer prompts with specific details
● Edit the project code
● Commit/merge the project code

Run the project:

> databricks bundle validate
> databricks bundle deploy -t <env-name>
> databricks bundle run -t <env-name> <job-name>
> databricks bundle destroy -t <env-name>

● Validate the project
● Deploy the project to an environment
● Run the project jobs
● Delete the project when complete
Implementation and Monitoring MLOps Solution

LECTURE

Types of Model
Monitoring

Types of Model Monitoring

Business Requirements
● Ensuring the ML solution aligns with and fulfills specific business objectives.
● Regular assessments to ensure continued relevance to evolving business needs.

Model Performance
● Tracking the accuracy and efficiency of the model over time.
● Detecting and addressing types of drift or degradation.

Resource Utilization
● Ensuring efficient resource utilization within the ML infrastructure.
● Compliance with Service Level Agreements (SLAs) for system performance and availability.

Traceability
● Facilitating audit trails for troubleshooting, regulatory compliance, and model improvement.
● Tracking data lineage to understand the origin, movement, and transformation of data.

Four Types of Drift

Data Drift:
● Occurs when the statistical properties of the input data change over time.
● Can impact the model's quality by introducing inconsistencies in the data patterns.

Concept Drift:
● Happens when the relationship between input features and the target variable changes.
● Forces models to adapt to new patterns to stay relevant.

Model Quality Drift:
● Reflects a decrease in the model's predictive performance over time.
● Can be detected through worsening metrics like accuracy, precision, recall, or F1 score.

Bias Drift:
● Involves shifts in model outcomes that could lead to unfair treatment of certain groups.
● Monitoring for bias drift is crucial to maintaining fairness and ethical standards in model predictions.
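To make data drift concrete, here is a minimal drift check using the Population Stability Index (PSI) in plain Python. This is an illustrative sketch, not the mechanism Lakehouse Monitoring uses internally; the 0.1/0.25 thresholds are common rules of thumb, not Databricks defaults.

```python
import math

# Population Stability Index over pre-binned distributions: compares the
# share of records per bin at training time vs. in production.
def psi(expected_pct, actual_pct, eps=1e-4):
    """PSI over two binned percentage distributions (each sums to 1)."""
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected_pct, actual_pct)
    )

train_dist = [0.80, 0.15, 0.05]  # e.g. share of customers per segment at training time
prod_dist = [0.10, 0.45, 0.45]   # same segments observed in production

score = psi(train_dist, prod_dist)
drifted = score > 0.25  # PSI above ~0.25 is often treated as significant drift
```

A scheduled monitoring job could compute this per feature and raise an alert (or a retraining trigger) when the score crosses the threshold.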

Illustrating Data and Model Drift
Visualizing changes in data and model performance over time

Data Drift: the training data distribution no longer matches the production data distribution.
Model Drift: model performance right after deployment vs. after a period of time in production.
Illustrating Concept and Bias Drift
Visualizing changes in concept and bias over time

Concept Drift: a change in group distribution and/or the appearance of a new group.
Bias Drift: prediction accuracy varies across different groups.
Data Drift Scenario:
Event: Introduction of a New Product Line

Scenario: The company decides to introduce a new line of eco-friendly products, heavily promoting them through social media and email campaigns. This new product line attracts a different demographic compared to the existing customer base.

Changes in Data:
• Demographic shift
• Browsing behavior
• Historical sales data
• Promotional activities

Impact of Data Drift:
● Prediction Accuracy: Training data no longer represents current customer behavior and product offerings.
● Sales Forecasting: Sales of eco-friendly products are underestimated, and sales of other products are overestimated.

Addressing Data Drift:
● Collect new data
● Retrain the model
● Monitor continuously

Note: This scenario also overlaps with concept drift.
ML Model Retraining Triggers

● Scheduled Retraining:
○ Databricks recommends starting with scheduled, periodic retraining and moving to triggered
retraining when needed.
● Data Changes:
○ Changes in the data can explicitly trigger a retraining job, or retraining can be triggered automatically when data drift is detected.
● Model Code Changes:
○ Retraining can be triggered by changes in the model code, often due to concept drift or other
factors that necessitate an update in the model.
● Model Configuration Changes:
○ Alterations in the model configuration can also initiate a retraining job.
● Monitoring and Alerts:
○ Jobs can monitor data and model drift, and Databricks SQL dashboards can display status and send alerts.
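The trigger list above can be reduced to a small decision function. This is a hypothetical sketch: the signal names and 30-day schedule are illustrative, not a Databricks API, and in practice the signals would come from monitoring jobs and Git events.

```python
from datetime import date, timedelta

# Decide whether to retrain: drift or code/config changes trigger immediately,
# otherwise fall back to scheduled, periodic retraining.
def should_retrain(last_trained, today, signals, schedule_days=30):
    if signals.get("data_drift") or signals.get("concept_drift"):
        return True, "drift detected"
    if signals.get("code_changed") or signals.get("config_changed"):
        return True, "model code/config changed"
    if today - last_trained >= timedelta(days=schedule_days):
        return True, "scheduled retraining due"
    return False, "no trigger"

# Example: 45 days since the last run, no drift signals -> scheduled retrain.
retrain, reason = should_retrain(
    last_trained=date(2024, 7, 1),
    today=date(2024, 8, 15),
    signals={"data_drift": False},
)
```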

Implementation and Monitoring MLOps Solution

LECTURE

Monitoring in
Machine Learning

Monitoring ML Systems
Continuous logging and review of key component/system metrics

Why?
Used to help diagnose issues before they become severe or costly

Data to Monitor:
• Input data (tricky with existing models)
• Data in feature stores and vector databases
• Human feedback data
• Model outputs

ML Assets to Monitor:
• Mid-training checkpoints for analysis
• Component evaluation metrics
• ML system evaluation metrics
• Performance/cost details

Lakehouse Monitoring
Manage, govern, evaluate, and switch models easily

• Monitor data and AI assets


• Centralized and standardized
mechanism for monitoring models in
production
• Simplified, built-in tool for monitoring
mechanisms to diagnose errors, detect
drift, etc.
• Allow for the creation of additional
custom metrics.
• Alerting to get notified on drift or
quality issues.

What does this look like in practice?

MLOps sits at the overlap of DataOps, DevOps, and ModelOps.

Lakehouse Monitoring Capabilities

Monitor Key Metrics
For fields: nulls, null %, zeros, zero %, avg, distincts, distinct %, max, min, stdev, median, max/min/avg length, value frequencies, quantiles, row counts

Define (Multiple) Time Granularities
Monitor metrics over time windows, e.g. every day, every 5 minutes, over n weeks

Monitor Data Slices
Slice metrics based on columns or predicates, e.g. state, product_class, "cart_total > 1000"

Monitor Tables, Views, and ML Models
Consistent quality and drift monitoring of all your production assets, including machine-learning model fairness and bias

Available through the UI (UC Data Explorer) or the Python API:

import databricks.lakehouse_monitoring as lm

# Set up monitoring parameters
lm.create_monitor(
    table_name="my_UC_table",
    …)

# Refresh monitoring metrics
lm.run_refresh("my_table")
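To make the metric names above concrete, here is a pure-Python sketch of how a few per-column profile statistics (null %, distinct %, min/max/avg) could be computed. Lakehouse Monitoring computes these for you over Delta tables, so this is purely illustrative; the definition of distinct % as a share of all rows is an assumption for this sketch.

```python
# Illustrative per-column profile metrics, of the kind a monitoring tool
# records per time window for each monitored field.
def profile_column(values):
    non_null = [v for v in values if v is not None]
    n = len(values)
    return {
        "row_count": n,
        "null_pct": 100.0 * (n - len(non_null)) / n if n else 0.0,
        "distinct_pct": 100.0 * len(set(non_null)) / n if n else 0.0,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "avg": sum(non_null) / len(non_null) if non_null else None,
    }

# Example: 4 rows, one null, two distinct non-null values.
stats = profile_column([10, 20, 20, None])
```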
Lakehouse Monitoring Capabilities

Dashboards and Alerts
Auto-generated Databricks SQL dashboards to visualize metrics and trends; SQL alerts for notifications

Open Monitoring Results
Monitoring results are stored in open-format Delta tables, so you can build custom analytics using your favorite BI tool

Simple Operations
Databricks-managed compute eliminates infrastructure management and scaling complexity

What are Databricks SQL alerts?

• Databricks SQL alerts periodically run queries, evaluate defined conditions, and send notifications if a condition is met.
• Scheduled Execution: Automatically runs queries at defined intervals to check specific conditions.
• Multi-Channel Notifications: Receive alerts via email, Slack, webhook, MS Teams, PagerDuty, and more.
• Explore the documentation for in-depth setup and customization options.
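Conceptually, an alert is a scheduled query plus a condition. The sketch below models that idea in plain Python; the operator set and the drift-score example are illustrative and do not mirror the actual Databricks SQL alert configuration schema.

```python
# Conceptual model of a SQL alert: a query result is compared against a
# threshold, and a notification fires when the condition is met.
OPERATORS = {
    ">": lambda v, t: v > t,
    "<": lambda v, t: v < t,
    "==": lambda v, t: v == t,
}

def evaluate_alert(query_result, operator, threshold):
    """Return True when the alert condition is met (notification fires)."""
    return OPERATORS[operator](query_result, threshold)

# e.g. alert when the latest daily drift score exceeds 0.25
fired = evaluate_alert(query_result=0.31, operator=">", threshold=0.25)
```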

Implementation and Monitoring MLOps Solution

DEMONSTRATION

Lakehouse
Monitoring
Dashboard
Note: The Classroom setup for the
upcoming Lab: Model Monitoring takes
about 20 minutes. Executing the setup
prior to demonstration is recommended.
Demo
Outline

What we’ll cover:


• Train and deploy a machine learning model.
• Send batched requests to the deployed model endpoint.
• Monitor the model's performance and detect anomalies or drift.
• Handle drift detection and trigger retraining and redeployment when necessary.
• Set up and utilize Databricks Lakehouse Monitoring to continuously track and alert on
model performance metrics.

Implementation and Monitoring MLOps Solution

LAB EXERCISE

Model Monitoring

Lab
Outline

What you’ll do:


• Task 1: Enable Inference Table in Serving Endpoint using UI
• Task 2: Save the Training Data as a Reference for Drift
• Task 3: Sending Batched Requests to the Model Endpoint
• Task 4: Processing and Monitoring Inference Data
• 4.1: Processing Inference Table Data
• 4.2: Monitoring the Inference Table
• 4.3: Analyzing Processed Requests
• Task 5: Persisting Processed Model Logs
• Task 6: Setting Up and Monitoring Inference Data
• 6.1: Creating an Inference Monitor with Databricks Lakehouse Monitoring
• 6.2: Inspect and Monitor Metrics Tables
Summary and Next
Steps
