Introduction to Machine Learning
Theory and Practice
David R. Pugh
Instructional Assistant Professor, KAUST
Director, SDAIA-KAUST AI
• 5+ years teaching applied machine learning and deep learning at KAUST.
• 2+ years as the director of SDAIA-KAUST AI, where I work to match applied AI problems of interest to SDAIA with AI solutions developed at KAUST.
• 15+ years of experience with the core data science Python stack: NumPy, SciPy, Pandas, Matplotlib, NetworkX, Jupyter, Scikit-Learn, PyTorch, etc.
Agenda
Introduction to Machine Learning: Theory and Practice
09:00 - 09:05 Welcome and Opening Remarks Prof. David Pugh
09:05 - 10:30 The Machine Learning Landscape Prof. David Pugh
10:30 - 10:45 Break
10:45 - 12:00 Classification and Regression Prof. David Pugh
12:00 - 13:00 Lunch
13:00 - 14:30 Linear Regression with NumPy Prof. David Pugh + TAs
14:30 - 14:45 Break
14:45 - 16:00 Introduction to Scikit-Learn Prof. David Pugh + TAs
References
• Slides closely follow Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron.
• Another great reference is Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka.
• The official Scikit-Learn documentation is also fantastic.
The ML Landscape
Prof. David R. Pugh
What is the difference between AI and ML?
AI is the broad goal of building systems that behave intelligently; ML is the subfield of AI that pursues this goal by learning from data rather than by following explicitly programmed rules.
What is ML?
• ML is the science (and art) of programming computers so they can learn from data (Geron, 2019).
• [ML is the] field of study that gives computers the ability to learn without being explicitly programmed (Samuel, 1959).
• A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E (Mitchell, 1997). For a spam filter, T is flagging spam, E is a corpus of emails labelled spam or ham, and P could be classification accuracy.
Why is ML so popular right now?
Stanford's Coursera machine learning course had more than 100,000 students expressing interest in its first year.
1. The field has matured, both in terms of identity and in terms of methods and tools.
2. There is an abundance of data available.
3. There is an abundance of computation to run methods.
4. There have been impressive results, increasing acceptance, respect, and competition.
Resources + Ingredients + Tools + Desire = Popularity
Based on: http://machinelearningmastery.com/machine-learning-is-popular/?__s=yq1qzcnf67sfiuzmnvjf
Traditional approach is model/rules based...
...ML approach is data-driven!
ML adapts to change!
ML can help humans learn!
Types of ML systems
• Supervised vs unsupervised
• Semi-supervised vs self-supervised
• Batch (offline) vs incremental (online)
• Instance-based vs model-based
Supervised learning
Classification (predicting discrete class labels) vs regression (predicting continuous values)
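A minimal sketch of the two supervised tasks, assuming scikit-learn is installed; the synthetic datasets and model choices are illustrative, not part of the slides:

    from sklearn.datasets import make_classification, make_regression
    from sklearn.linear_model import LinearRegression, LogisticRegression

    # Classification: the targets are discrete class labels.
    X_clf, y_clf = make_classification(n_samples=200, random_state=42)
    clf = LogisticRegression(max_iter=1000).fit(X_clf, y_clf)
    print(clf.predict(X_clf[:3]))  # e.g., three 0/1 labels

    # Regression: the targets are continuous values.
    X_reg, y_reg = make_regression(n_samples=200, random_state=42)
    reg = LinearRegression().fit(X_reg, y_reg)
    print(reg.predict(X_reg[:3]))  # three real-valued predictions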
Other forms of supervised learning
Semi-supervised learning (a few labelled examples plus many unlabelled ones) vs self-supervised learning (labels generated automatically from the data itself)
Unsupervised learning
Clustering (grouping similar instances) vs data visualization (e.g., projecting high-dimensional data down to 2D)
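As a rough illustration (assuming scikit-learn; the blob data is synthetic), clustering discovers group structure without ever seeing labels:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # labels discarded
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
    print(kmeans.labels_[:10])  # cluster assignments learned from X alone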
Reinforcement Learning
An agent learns by interacting with an environment: it observes states, takes actions, and receives rewards, learning a policy that maximizes cumulative reward over time.
Batch (offline) vs incremental (online) learning
Batch (offline) learning: train on the full dataset at once, then deploy the model as-is. Incremental (online) learning: keep training as new data arrives, one instance or mini-batch at a time.
Out-of-core learning: incremental learning applied to datasets too large to fit in memory; the data is loaded and trained on chunk by chunk.
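A minimal sketch of out-of-core (incremental) training, assuming scikit-learn; real code would stream the chunks from disk, here they are synthesized:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier()      # linear model trained with stochastic gradient descent
    classes = np.array([0, 1])   # partial_fit must see the full set of classes up front
    rng = np.random.default_rng(42)
    for _ in range(10):          # pretend each chunk was loaded from disk
        X_chunk = rng.normal(size=(100, 5))
        y_chunk = (X_chunk[:, 0] > 0).astype(int)
        model.partial_fit(X_chunk, y_chunk, classes=classes)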
Instance-based vs model-based learning
Instance-based learning: memorize the training examples and generalize by comparing new instances to them. Model-based learning: fit a parameterized model to the training data and predict from the model alone.
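A small sketch of the contrast, assuming scikit-learn; the dataset is synthetic and the two models are just representative examples of each style:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=200, random_state=42)

    # Instance-based: keeps the training data and compares new points to it.
    knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
    # Model-based: fits parameters, then predicts from the model alone.
    lin = LogisticRegression(max_iter=1000).fit(X, y)
    print(knn.predict(X[:3]), lin.predict(X[:3]))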
Main Challenges of Applying ML
• Insufficient quantity of training data
• Non-representative training data
• Poor quality data
• Irrelevant features
• Overfitting the training data
• Underfitting the training data
Insufficient quantity of training data
• The more data for training, the better!
• It can take a lot of data for most ML algorithms to work.
• "Simple" problems often require O(10k) samples.
• "Complex" problems often require O(1m) samples.
Non-representative training data
• Training data needs to be representative of new data for the model to generalize.
• Sampling noise: not enough data => training data not representative by chance.
• Sampling bias: poor sampling technique => training data not representative (biased).
Poor quality training data
• Data can be full of errors, outliers, and noise (e.g., due to poor-quality measurements).
• Dirty data => hard for any algorithm to detect patterns.
• A significant amount of your time will be spent cleaning data.
• Data types? Do you have numeric features? Ordinal features? Categorical features?
• Look for outliers in your data: remove them? Fix them manually?
• Look for missing data: remove it? Impute values?
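For the missing-data question, a minimal sketch using scikit-learn's SimpleImputer (the tiny array is purely illustrative):

    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[1.0, 2.0],
                  [np.nan, 3.0],
                  [7.0, np.nan]])
    imputer = SimpleImputer(strategy="median")  # fill missing values with column medians
    print(imputer.fit_transform(X))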
Irrelevant features
Garbage in => garbage out!
• Learning requires sufficient relevant features (and not too many irrelevant ones!).
• Developing a good set of features for training is a critical part of any ML project.
• A significant amount of your time will be spent doing feature engineering.
Feature engineering is often critical to success:
• Feature selection: selecting the "best" subset of features for training.
• Feature extraction: combining existing features to produce new ones.
• Creating new features from new data.
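A minimal feature-selection sketch, assuming scikit-learn; the synthetic dataset has 20 features, of which only 5 are informative:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = make_classification(n_samples=300, n_features=20,
                               n_informative=5, random_state=42)
    selector = SelectKBest(score_func=f_classif, k=5)  # keep the 5 "best" features
    X_selected = selector.fit_transform(X, y)
    print(X_selected.shape)  # (300, 5)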
Overfitting the training data
What is overfitting?
• Overfitting is when a model performs well on training data but poorly on new data.
• If the model is complex or training data is limited, the model will detect spurious patterns.
• Constraining a complex model to make it simpler is called regularization.
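As a sketch of regularization (assuming scikit-learn; the degree-15 polynomial and the alpha value are illustrative choices), the Ridge penalty constrains the same complex model that would otherwise chase the noise:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(42)
    X = rng.uniform(-3, 3, size=(30, 1))
    y = 0.5 * X.ravel() ** 2 + rng.normal(scale=1.0, size=30)

    # Unconstrained degree-15 polynomial: prone to fitting the noise.
    overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(X, y)
    # Same polynomial with a Ridge penalty (alpha): simpler, smoother fit.
    regularized = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0)).fit(X, y)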
Underfitting the training data
What is underfitting?
• Underfitting is when a model is too simple to learn the underlying structure of the data.
• Linear models will often underfit (but are often a good place to start).
How to reduce underfitting?
• Select a more complex model (more parameters).
• Feed better features to the model (feature engineering).
• Reduce the constraints on the model (reduce regularization).
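A small sketch of the first remedy, assuming scikit-learn: a plain linear model underfits quadratic data, while adding polynomial features gives it enough capacity:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)

    linear = LinearRegression().fit(X, y)  # too simple for a quadratic pattern
    quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
    print(linear.score(X, y), quadratic.score(X, y))  # quadratic R^2 should be much higher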
Validation and Testing
Why measure generalization error?
• The only way to know if your model is good is to measure its performance on new data!
• Split your data into train and test sets: error on the test set is an estimate of the generalization error.
• Low training error, high generalization error => overfitting!
Some train-test split heuristics:
• For datasets smaller than O(100k) samples, take 80% for training and hold out 20% for testing.
• For larger datasets, O(1m) samples, hold out 1-10% of the dataset for testing.
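A minimal sketch of the 80/20 heuristic with scikit-learn (synthetic data; accuracy plays the role of the performance measure):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)  # hold out 20% for testing

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("train accuracy:", model.score(X_train, y_train))
    print("test accuracy: ", model.score(X_test, y_test))  # generalization estimate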
Model Selection
• Often need to tune hyperparameters to find a good model within a particular class of models.
• How? Split the training data into a (smaller) training set and a validation set.
• Compare tuned models using the validation set; reserve the test set for the final estimate of generalization error.
• Validation set too small => might select a "bad" model by mistake.
• Validation set too large => training set too small!
• Cross-validation: create lots of small validation sets, evaluate the model on each validation set, and measure average performance across validation sets.
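A sketch of hyperparameter tuning with cross-validation, assuming scikit-learn; the grid over the regularization strength C is illustrative:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=500, random_state=42)

    # 5-fold CV: each candidate C is scored on 5 small validation sets, then averaged.
    search = GridSearchCV(LogisticRegression(max_iter=1000),
                          param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                          cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)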
Model selection process: tune candidate models on the validation set, retrain the best one on the full training set, then evaluate it once on the test set to estimate generalization error.
Thanks!