01 - ML - Introduction

Uploaded by

11-Nguyễn Thị Quỳnh Châu


Ho Chi Minh University of Banking

Department of Economic Mathematics

Machine Learning
Lecture 1. Overview of Machine Learning

Vuong Trong Nhan ([email protected])

Outline

1. Introduction to Machine Learning

2. What is Machine Learning?
3. Why use Machine Learning?
4. Machine Learning Applications
5. How does Machine Learning work?
6. Types of Machine Learning
7. Main Challenges of Machine Learning
8. Ethical and Social Considerations
9. Trends and Challenges

Introduction

Example applications (source: Internet):
Wine quality prediction
Autonomous cars
Healthcare

AI

General AI vs Narrow AI

What is Machine Learning?

Machine Learning (ML) is an active subfield of Artificial Intelligence.

What is Machine Learning?

Arthur Samuel (1959). Machine Learning: "Field of study that gives computers the ability to learn without being explicitly programmed".
Source: Wikipedia
What is Machine Learning?

Tom Mitchell (1998). Well-posed Learning Problem: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E".
Source: Tom Mitchell’s homepage

A learning problem can be described as a triple (T, P, E):

T: Task
P: Performance measure
E: Experience

Example 1
Suppose your email program watches which emails you
do or do not mark as spam, and based on that learns how
to better filter spam.

What is the task T in this setting?

A. Classifying emails as spam or not spam. (T)

B. Watching you label emails as spam or not spam. (E)
C. The number (or fraction) of emails correctly classified
as spam/not spam. (P)
D. None of the above

Example 1
In spam email detection:
Task, T: classify emails as “Spam” or “Not Spam”.
Performance measure, P: percentage of emails correctly classified as “Spam” or “Not Spam”.
Experience, E: a set of emails with “Spam” / “Not Spam” labels.
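
The (T, P, E) triple can be made concrete with a small sketch. The emails, the word-counting learner, and all function names below are hypothetical and chosen only for illustration; this is not a production spam filter.

```python
from collections import Counter

# Hypothetical toy data: (email text, is_spam) pairs. This is the experience E.
LABELED_EMAILS = [
    ("win free money now", True),
    ("meeting agenda for monday", False),
    ("free credit card offer", True),
    ("lunch with the project team", False),
    ("claim your free prize", True),
    ("quarterly report attached", False),
]

# Held-out emails used only to measure the performance P.
TEST_EMAILS = [
    ("free prize inside", True),
    ("credit offer for you", True),
    ("agenda for the team meeting", False),
    ("monday lunch", False),
]


def train(examples):
    """Learn the words that appear more often in spam than in non-spam."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in examples:
        (spam_counts if is_spam else ham_counts).update(text.split())
    return {w for w in spam_counts if spam_counts[w] > ham_counts[w]}


def classify(spam_words, text):
    """Task T: predict True (spam) if the email contains any learned spam word."""
    return any(word in spam_words for word in text.split())


def accuracy(spam_words, examples):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(classify(spam_words, text) == label for text, label in examples)
    return correct / len(examples)


# P measured after seeing 0, 2, and all 6 labeled emails.
scores = [accuracy(train(LABELED_EMAILS[:n]), TEST_EMAILS) for n in (0, 2, 6)]
print(scores)
```

On this toy data the measured accuracy rises from 0.5 to 0.75 to 1.0 as more labeled emails are seen, which is what "learning" means in Mitchell's definition.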

Why Use Machine Learning?
Consider how you would write a spam filter using:
Traditional programming techniques

Figure 1-1. The traditional approach

Why Use Machine Learning?
Consider how you would write a spam filter using:
The machine learning techniques

Figure 1-2. The Machine Learning approach

Why Use Machine Learning?
Consider how you would write a spam filter using:
The machine learning techniques

Figure 1-3. Automatically adapting to change
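
A minimal sketch of the contrast between the two approaches, assuming a hypothetical scenario in which spammers switch from "4U" to "for u" once the old wording is blocked. The rule list, the data, and the `min_ratio` threshold are all assumptions made for illustration.

```python
from collections import Counter

# Traditional approach: a hand-written rule list that someone must maintain.
HAND_WRITTEN_RULES = {"4u", "free"}


def rule_based_is_spam(text):
    return any(word in HAND_WRITTEN_RULES for word in text.lower().split())


def learn_spam_words(examples, min_ratio=2.0):
    """ML approach: flag words that are much more frequent in spam than in ham."""
    spam, ham = Counter(), Counter()
    for text, is_spam in examples:
        (spam if is_spam else ham).update(text.lower().split())
    # The +1 avoids division by zero for words never seen in ham.
    return {w for w, c in spam.items() if c / (ham[w] + 1) >= min_ratio}


def learned_is_spam(spam_words, text):
    return any(word in spam_words for word in text.lower().split())


# Emails recently flagged (or not) by users, after the spammers changed wording.
new_user_flags = [
    ("cheap pills for u", True),
    ("discount watches for u", True),
    ("notes for the exam", False),
    ("slides for lecture one", False),
]

new_spam = "amazing deal for u"
spam_words = learn_spam_words(new_user_flags)

print(rule_based_is_spam(new_spam))           # the fixed rules miss the new wording
print(learned_is_spam(spam_words, new_spam))  # retraining picks up "u" from the data
```

The hand-written rules need a person to notice the change and edit the list, while the learned filter adapts simply by being retrained on newly flagged emails.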

Why Use Machine Learning?

Figure 1-4. Machine learning can help humans learn

Examples of Applications

Applications in banking / finance

Examples of Applications
Analyzing images of products on a production line to automatically classify them
Detecting tumors in brain scans
Automatically classifying news articles
Automatically flagging offensive comments on discussion forums
Summarizing long documents automatically
Creating a chatbot or a personal assistant
Forecasting your company’s revenue next year, based on many performance metrics
Making your app react to voice commands

Examples of Applications
Detecting credit card fraud
Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment
Representing a complex, high-dimensional dataset in a clear and insightful diagram
Recommending a product that a client may be interested in, based on past purchases
Building an intelligent bot for a game

How does Machine Learning work?

Machine Learning Process

Source: Ng, Frederick, Runqing Jiang, and James CL Chow. "Predicting radiation treatment planning evaluation parameter using artificial intelligence and machine learning." IOP SciNotes 1.1 (2020): 014003.

CRISP-DM Methodology

Figure: A diagram of the CRISP-DM process which shows the six key phases and indicates the important relationships between them [Wirth and Hipp, 2000].
Types of Machine Learning

Machine learning systems can be categorized based on:
Supervision during Training:
o Supervised: Learn from labeled data with input-output pairs.
o Unsupervised: Discover patterns and structures without labeled data.
o Semi-supervised: Learn from a mix of labeled and unlabeled data.
o Self-supervised: Generate labels from the data itself.
o Reinforcement Learning: Learn through interaction with an environment, receiving rewards or penalties.
Learning Approach:
o Online Learning: Learn incrementally from new data as it arrives.
o Batch Learning: Train on a fixed dataset.
Learning Strategy:
o Instance-based (lazy): Compare new data directly to known data points.
o Model-based (eager): Detect patterns in training data to build predictive models.
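
The difference between the two learning strategies can be sketched as follows. The car data and function names are hypothetical; the instance-based learner is a 1-nearest-neighbor lookup and the model-based learner is an ordinary least-squares line, both chosen here only as simple representatives.

```python
# Hypothetical training data: (mileage in 10,000 km, price in $1,000).
train = [(1.0, 18.0), (2.0, 16.0), (4.0, 12.0), (5.0, 10.0)]


def predict_instance_based(x_new):
    """Lazy learner: no training step; return the label of the closest known point."""
    nearest_x, nearest_y = min(train, key=lambda point: abs(point[0] - x_new))
    return nearest_y


def fit_line(points):
    """Eager learner: compress the data into two parameters (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)


def predict_model_based(x_new):
    return slope * x_new + intercept


print(predict_instance_based(3.2))  # copies the price of the nearest car (x = 4.0)
print(predict_model_based(3.2))     # evaluates the fitted line
```

The instance-based learner must keep the whole training set around at prediction time, whereas the model-based learner only needs the two fitted parameters.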

Types of Machine Learning

Main ML algorithms:
Supervised
Unsupervised
Semi-supervised
Reinforcement Learning

SUPERVISED LEARNING
Supervised learning

The training set you feed to the algorithm includes the desired solutions, called labels.

Typical tasks:
Classification
Regression

Supervised learning: classification
The spam filter:
Train: sample emails with their class / target (spam or ham)
Goal: learn how to classify new emails.

Figure 1-5. A labeled training set for spam classification

Supervised learning: regression
Predicting the price of a car based on its features
like mileage, age, and brand.
Train: examples of cars including (features, price)

Figure 1-6. A regression problem: predict a value, given an input feature (usually multiple input features, and sometimes multiple output values)

Supervised learning
Data: $D = \{D_1, D_2, \dots, D_n\}$, a set of $n$ samples
where $D_i = \langle \mathbf{X}_i, y_i \rangle$
$\mathbf{X}_i$ is an input (feature vector or matrix)
$y_i$ is the desired output

Objective:
learn the mapping $f: \mathbf{X} \to y$
such that $y_i \approx f(\mathbf{X}_i)$ for all $i = 1, \dots, n$

Classification: $y$ is discrete
Regression: $y$ is continuous
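
One common way to make the requirement that each output is approximately reproduced precise is to pick $f$ from a family of candidate functions by minimizing an average loss over the training set. The family $\mathcal{F}$ and the loss $L$ below are assumptions introduced for illustration; the slides do not fix either.

```latex
\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(\mathbf{X}_i)\bigr)

% Typical choices of L (examples, not the only options):
%   regression:      L(y, \hat{y}) = (y - \hat{y})^2          (squared error)
%   classification:  L(y, \hat{y}) = \mathbb{1}[y \neq \hat{y}]  (0-1 loss)
```

With the squared error, a small average loss means the predictions are close to the continuous targets; with the 0-1 loss, it means few samples are assigned the wrong discrete class.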

Classification

Types of classification problems

Binary classification
o Only two classes; each sample has exactly one label
Multi-class classification
o More than two classes; each sample has exactly one label
Multi-label classification
o One sample can have multiple class labels
Image segmentation
o Traditionally treated as a clustering problem
o More recently treated as a pixel-wise classification problem
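
The first three problem types differ mainly in what a label looks like. The sketch below uses hypothetical label sets and a hypothetical helper, `to_indicator`, to show one common encoding.

```python
# Hypothetical label sets for illustration.
binary_labels = ["spam", "ham", "spam"]                        # two classes, one label each
multiclass_labels = ["sports", "politics", "tech"]             # many classes, one label each
multilabel_labels = [{"sports", "politics"}, {"tech"}, set()]  # zero or more labels each

TOPICS = ["politics", "sports", "tech"]


def to_indicator(labels, classes):
    """Encode a set of labels as a 0/1 vector with one slot per class."""
    return [1 if c in labels else 0 for c in classes]


# A single-label sample is the special case with exactly one 1 in its vector.
multiclass_vectors = [to_indicator({label}, TOPICS) for label in multiclass_labels]
multilabel_vectors = [to_indicator(labels, TOPICS) for labels in multilabel_labels]

print(multiclass_vectors)
print(multilabel_vectors)
```

In the multi-class case every vector contains exactly one 1, while in the multi-label case a vector may contain several 1s or none.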

Supervised learning: algorithms

k-Nearest Neighbors
Linear Regression
Logistic Regression
Support Vector Machines (SVMs)
Decision Trees and Random Forests
Neural networks
…

Scenario: You’re running a company, and you want to develop
learning algorithms to address each of two problems.

Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.

Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.

Should you treat these as classification or as regression problems?

A. Treat both as classification problems.
B. Treat problem 1 as a classification problem, problem 2 as a regression problem.
C. Treat problem 1 as a regression problem, problem 2 as a classification problem.
D. Treat both as regression problems.
UNSUPERVISED LEARNING

Unsupervised Learning
❖ Dataset: unlabeled examples $x^{(1)}, \dots, x^{(n)}$

❖ Goal: to find interesting structures in the data

❖ Typical tasks:

❖ Clustering

❖ Dimensionality reduction

❖ Anomaly detection

Unsupervised Learning: Clustering

Figure 1-7. An unlabeled training set for unsupervised learning

Figure 1-8. Clustering
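
A minimal sketch of clustering, using K-Means in one dimension. The data points and the starting centroids are hypothetical, and real implementations choose the starting centroids more carefully and stop when the assignments no longer change.

```python
def k_means_1d(points, centroids, iterations=10):
    """Lloyd's algorithm in one dimension with user-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabeled data: no target values are given, only the points themselves.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

On this data the two groups are found without any labels: the centroids settle at 2.0 and 11.0.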

Unsupervised Learning: Dimensionality reduction

Figure 1-9. Example of a t-SNE visualization highlighting semantic clusters

Unsupervised Learning: Anomaly detection

Figure 1-10. Anomaly detection
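
A minimal sketch of anomaly detection, assuming a simple statistical rule: learn the mean and standard deviation of normal data, then flag values that lie far from the mean. This is a stand-in chosen for brevity, not One-class SVM or Isolation Forest, and the amounts and the threshold of 3 are hypothetical.

```python
import statistics

# Hypothetical "normal" transaction amounts used to learn what typical looks like.
normal_amounts = [20.0, 22.0, 19.0, 21.0, 18.0, 20.0, 23.0, 17.0]

mean = statistics.mean(normal_amounts)
std = statistics.stdev(normal_amounts)


def is_anomaly(amount, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(amount - mean) / std > threshold


new_amounts = [21.0, 19.5, 250.0]
flags = [is_anomaly(a) for a in new_amounts]
print(flags)
```

Only the last amount is flagged, because it lies far outside the range seen during training.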

Unsupervised Learning: Algorithms
Clustering
o K-Means
o DBSCAN
o Hierarchical Cluster Analysis (HCA)
Anomaly detection and novelty detection
o One-class SVM
o Isolation Forest
Dimensionality reduction
o Principal Component Analysis (PCA)
o Kernel PCA
o Locally Linear Embedding (LLE)
Association rule learning
o Apriori
o Eclat

SEMI-SUPERVISED LEARNING

Semi-supervised learning

Figure 1-11. Semi-supervised learning with two classes (triangles and squares): the unlabeled examples (circles) help classify a new instance (the cross) into the triangle class rather than the square class, even though it is closer to the labeled squares
SELF-SUPERVISED LEARNING
Goal: generate a fully labeled dataset from a fully unlabeled one.

Self-supervised learning

Figure 1-12. Self-supervised learning example: input (left) and target (right)
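
The same idea of deriving inputs and targets from unlabeled data can be sketched for text: hide one word and use it as the target. The function name and the mask token below are hypothetical, and this text version is an analogy to the figure's example rather than a reproduction of it.

```python
def make_masked_pairs(sentence, mask_token="[MASK]"):
    """Turn one unlabeled sentence into (masked input, target word) training pairs."""
    words = sentence.split()
    pairs = []
    for i, target in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        pairs.append((" ".join(masked), target))
    return pairs


pairs = make_masked_pairs("machines learn from data")
for masked_input, target in pairs:
    print(masked_input, "->", target)
```

No human labeling is involved: every target comes from the data itself, and the resulting pairs can then be used to train a model in the usual supervised way.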
REINFORCEMENT LEARNING
• Reinforcement learning is a very different beast.
• The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards).
• It must then learn by itself the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
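
The agent, actions, rewards, and policy can be sketched with tabular Q-learning on a tiny corridor. The environment, the hyperparameters (`alpha`, `gamma`, the number of episodes), and the purely random exploration are all assumptions chosen to keep the example short; Q-learning is one of several reinforcement learning algorithms.

```python
import random

N_STATES = 5          # corridor cells 0..4; the reward sits in the last cell
ACTIONS = ("left", "right")
GOAL = N_STATES - 1


def step(state, action):
    """Environment: move one cell, reward 1.0 on reaching the goal, else 0.0."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)  # explore with purely random actions
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
# The policy: in each non-goal cell, pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

The agent is never told that moving right is correct; it discovers this from the rewards alone, and the learned policy chooses "right" in every cell.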

Reinforcement learning

Figure 1-13. Reinforcement learning

Main Challenges of Machine Learning

1. Insufficient Quantity of Training Data

2. Nonrepresentative Training Data
3. Poor-Quality Data
4. Irrelevant Features
5. Overfitting or Underfitting the Training Data

Main Challenges of Machine Learning

1. Insufficient Quantity of Training Data

Figure 1-21. The importance of data versus algorithms

Main Challenges of Machine Learning

2. Nonrepresentative Training Data

For example, the set of countries you used earlier for training the linear model was not
perfectly representative; it did not contain any country with a GDP per capita lower than
$23,500 or higher than $62,500

Figure 1-22. A more representative training sample

Main Challenges of Machine Learning

3. Poor-Quality Data
If the training data is full of errors, outliers, and noise
-> your system is less likely to perform well

=> spend a lot of time cleaning up your training data

Solutions:
• If some instances are clearly outliers, it may help to simply discard them or try to fix the errors manually.

• If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether you want to ignore this attribute altogether, ignore these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it.
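
Two of these options for missing values can be sketched in a few lines. The customer records below are hypothetical.

```python
import statistics

# Hypothetical customer records; None marks a missing age.
customers = [
    {"name": "A", "age": 25},
    {"name": "B", "age": None},
    {"name": "C", "age": 41},
    {"name": "D", "age": 33},
    {"name": "E", "age": None},
]

known_ages = [c["age"] for c in customers if c["age"] is not None]
median_age = statistics.median(known_ages)

# Option 1: ignore the instances with a missing value.
complete_only = [c for c in customers if c["age"] is not None]

# Option 2: fill in the missing values with the median of the known ones.
imputed = [
    {**c, "age": median_age if c["age"] is None else c["age"]} for c in customers
]

print(median_age, len(complete_only), [c["age"] for c in imputed])
```

Dropping instances keeps only observed values but shrinks the dataset (here from 5 records to 3), while imputation keeps every record at the cost of inserting estimated values.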
Main Challenges of Machine Learning

4. Irrelevant Features
Your system will only be capable of learning if the training data contains enough
relevant features and not too many irrelevant ones.

Solution:
Feature engineering:
• Feature selection
• Feature extraction
• Creating new features by gathering new data

-> spend a lot of time cleaning up your training data

Main Challenges of Machine Learning

5. Overfitting or Underfitting

Main Challenges of Machine Learning

5. Overfitting or Underfitting

Figure: training error and test error as a function of model complexity, from too simple (underfitting) through a good model to too complex (overfitting)
Main Challenges of Machine Learning

Underfitting
• Select a more powerful model, with more parameters.
• Feed better features to the learning algorithm (feature engineering).
• Reduce the constraints on the model (for example by reducing the
regularization hyperparameter)

Main Challenges of Machine Learning
Overfitting
Some solutions:
• Simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model.
• Gather more training data.
• Reduce the noise in the training data (e.g., fix data errors and remove outliers).
• Use a validation set (or dev set).
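
A validation set makes underfitting and overfitting visible. The sketch below compares three models on hypothetical noisy data: a constant (too simple), a least-squares line (about right), and a 1-nearest-neighbor lookup that memorizes the training set (too complex). The data and the choice of models are assumptions made for illustration.

```python
# Hypothetical data: y = 2x plus noise that alternates between +1 and -1.
train = [(float(x), 2.0 * x + (1.0 if x % 2 == 0 else -1.0)) for x in range(10)]
# Validation points lie between the training points and follow the true trend.
valid = [(x + 0.25, 2.0 * (x + 0.25)) for x in range(9)]


def mse(predict, data):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


mean_y = sum(y for _, y in train) / len(train)
mean_x = sum(x for x, _ in train) / len(train)
slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
    (x - mean_x) ** 2 for x, _ in train
)
intercept = mean_y - slope * mean_x

models = {
    "too simple (constant)": lambda x: mean_y,
    "good (straight line)": lambda x: slope * x + intercept,
    # Memorizes every training point, noise included.
    "too complex (1-nearest neighbor)": lambda x: min(train, key=lambda p: abs(p[0] - x))[1],
}

errors = {name: (mse(f, train), mse(f, valid)) for name, f in models.items()}
for name, (train_err, valid_err) in errors.items():
    print(f"{name}: train MSE = {train_err:.2f}, validation MSE = {valid_err:.2f}")
```

The memorizing model has zero training error but a larger validation error than the straight line, which is the signature of overfitting; the constant model is poor on both sets, which is the signature of underfitting.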

Main Challenges of Machine Learning

Learning algorithm
Under what conditions will the chosen algorithm converge?
For a given application/domain and a given objective function, which algorithm performs best?
No-free-lunch theorem [Wolpert and Macready, 1997]: if an algorithm performs well on a certain class of problems, then it necessarily pays for that with degraded performance on the set of all remaining problems.

There is no one model that works best for every situation.

In other words: no algorithm is the “best” algorithm for every problem.


Main Challenges of Machine Learning

Training data
How many observations are enough for learning?
Does the size of the training set affect the performance of an ML system?
What is the effect of corrupted or noisy observations?

Main Challenges of Machine Learning

Learnability:
How good is the learning algorithm, and what are its limits?
How well does the system generalize?
o Predict new observations well, not only the training data.
o Avoid overfitting.

Overfitting: example
Increasing the size of a decision tree can degrade prediction accuracy on unseen data, even while accuracy on the training data keeps increasing.

[Mitchell, 1997]
Ethical and Social Considerations

• Bias and Fairness

• Privacy Concerns
• Accountability and Transparency

Examples: Amazon’s hiring algorithm; COMPAS recidivism algorithm

Summary
Definition of Machine Learning
Types of Machine Learning
Supervised Learning
o Data: labeled
o Task: regression, classification, …
Unsupervised Learning
o Data: unlabeled
o Task: clustering, dimensionality reduction, …
Reinforcement Learning
Machine Learning Process
Ethical and Social Considerations
Emerging Trends and Challenges

Review Questions

Scenario 1: Credit Card Fraud Detection

You work for a financial institution and are tasked with developing a credit card fraud detection system. The dataset you have includes past credit card transactions, each labeled as either "fraudulent" or "legitimate." Your goal is to build a machine learning model that can accurately predict whether a new transaction is fraudulent or not.

Which type of machine learning approach should you use for this task?
A) Supervised Learning
B) Unsupervised Learning
C) Reinforcement Learning
D) Semi-Supervised Learning

Scenario 2: Customer Segmentation
You are a marketing analyst working for a retail company. Your team wants to group customers into different segments based on their purchasing behavior. You have a large dataset containing customer purchase history, but it does not have any pre-defined labels for segments.

Which type of machine learning approach should you use to perform customer segmentation?

A) Supervised Learning
B) Unsupervised Learning
C) Reinforcement Learning
D) Semi-Supervised Learning

Scenario 3: Anomaly Detection
As a cybersecurity analyst, your job is to detect network intrusions and malicious activities. You have access to log data from various network devices and systems. Your objective is to identify abnormal patterns that might indicate a potential security breach.

What type of machine learning approach should you use to detect anomalies in the network data?

A) Supervised Learning
B) Unsupervised Learning
C) Reinforcement Learning
D) Semi-Supervised Learning

Thank you!
