02 Machine Learning Overview
Foreword
Machine learning is a core research field of AI and a prerequisite for deep learning. This chapter
introduces the main concepts of machine learning, its classification, its overall process, and its
common algorithms.
1 Huawei Confidential
Objectives
Contents
6. Case Study
Machine Learning Algorithms (1)
Machine learning (including deep learning) is a study of learning algorithms. A computer program is said to
learn from experience 𝐄 with respect to some class of tasks 𝐓 and performance measure 𝐏 if its performance
at tasks in 𝑇, as measured by 𝑃, improves with experience 𝐸.
Machine Learning Algorithms (2)
Induction Training
Created by: Jim Liang
Training
data
Machine learning
Application Scenarios of Machine Learning (2)
Small Large
Scale of the problem
Rational Understanding of Machine Learning Algorithms
Target function (ideal): 𝑓: 𝑋 → 𝑌
Training data: 𝐷: {(𝑥1, 𝑦1), ⋯, (𝑥𝑛, 𝑦𝑛)}
Learning algorithms use the training data to produce a hypothesis function (actual): 𝑔 ≈ 𝑓
Main Problems Solved by Machine Learning
Machine learning can handle many types of tasks. The most typical and common types are as follows:
Classification: A computer program needs to specify which of k categories an input belongs to.
Regression: A computer program predicts a continuous numeric output for a given input.
Clustering: A large amount of data from an unlabeled dataset is divided into multiple categories according to the internal similarity of
the data.
Classification and regression are the two main types of prediction, accounting for 80% to 90% of prediction tasks. The output of
classification is a discrete category value, and the output of regression is a continuous number.
Machine Learning Classification
Supervised learning: Obtain an optimal model with the required performance by training on samples of known categories. Then use the
model to map inputs to outputs and check the outputs in order to classify unknown data.
Unsupervised learning: For unlabeled samples, the learning algorithms directly model the input dataset. Clustering is a common form
of unsupervised learning. We only need to put highly similar samples together, calculate the similarity between new samples and existing
ones, and classify the new samples by similarity.
Semi-supervised learning: Within one task, the machine learning model automatically uses a large amount of unlabeled data to assist
learning from a small amount of labeled data.
Reinforcement learning: An area of machine learning concerned with how agents ought to take actions in an environment to
maximize some notion of cumulative reward. The difference between reinforcement learning and supervised learning is the teacher
signal. The reinforcement signal provided by the environment is a scalar signal that evaluates how good an action is,
rather than telling the learning system how to perform the correct action.
Supervised Learning
Data feature Label
Supervised Learning - Regression Questions
Regression: models how the attribute values of samples in a dataset depend on one another by
expressing the mapping from samples to outputs as a function.
How much will I benefit from the stock next week?
What's the temperature on Tuesday?
Supervised Learning - Classification Questions
Classification: maps samples in a sample dataset to a specified category by using a
classification model.
Will there be a traffic jam on XX road during
the morning rush hour tomorrow?
Which method is more attractive to customers:
5 yuan voucher or 25% off?
Unsupervised Learning
Data features and resulting categories:
Monthly Consumption | Commodity        | Consumption Time | Category
1000–2000           | Badminton racket | 6:00–12:00       | Cluster 1
500–1000            | Basketball       | 18:00–24:00      | Cluster 2
1000–2000           | Game console     | 00:00–6:00       |
Unsupervised Learning - Clustering Questions
Clustering: classifies samples in a sample dataset into several categories based on the
clustering model. The similarity of samples belonging to the same category is high.
Semi-Supervised Learning
Data Feature Label
Semi-supervised learning
Feature 1 ... Feature n Unknown
algorithms
Reinforcement Learning
The model perceives the environment, takes actions, and makes adjustments and choices
based on the state and the reward or punishment.
(Figure: the model takes action 𝑎𝑡 on the environment based on state 𝑠𝑡 and reward or punishment 𝑟𝑡; the environment returns the next state 𝑠𝑡+1 and reward 𝑟𝑡+1.)
Reinforcement Learning - Best Behavior
Reinforcement learning: always looks for best behaviors. Reinforcement learning is
targeted at machines or robots.
Autopilot: Should it brake or accelerate when the yellow light starts to flash?
Cleaning robot: Should it keep working or go back for charging?
Machine Learning Process
Data collection → Data cleansing → Feature extraction and selection → Model training → Model evaluation → Model deployment and integration
Basic Machine Learning Concept — Dataset
Dataset: a collection of data used in machine learning tasks. Each data record is called a sample. Events or attributes
that reflect the performance or nature of a sample in a particular aspect are called features.
Training set: a dataset used in the training process, where each sample is referred to as a training sample. The process of
creating a model from data is called learning (training).
Test set: Testing refers to the process of using the model obtained after learning for prediction. The dataset used is called a
test set, and each sample is called a test sample.
Checking Data Overview
Typical dataset form
4 80 9 Southeast 1100
Importance of Data Processing
Data is crucial to models. It is the ceiling of model capabilities. Without good data, there is no good model.
Data preprocessing: data cleansing, data normalization
Data Cleansing
Most machine learning models process features, which are usually numeric representations of input variables that can be
used in the model.
In most cases, the collected data can be used by algorithms only after being preprocessed. The preprocessing operations
include the following:
Data filtering
Handling of missing data
Handling of possible exceptions, errors, or abnormal values
Combination of data from multiple data sources
Data consolidation
Dirty Data (1)
Generally, real data may have some quality problems.
Incompleteness: contains missing values or lacks attributes.
Noise: contains incorrect records or exceptions.
Inconsistency: contains inconsistent records.
Dirty Data (2)
# Id Name Birthday Gender IsTeacher #Students Country City
Necessity of Feature Selection
Generally, a dataset has many features, some of which may be redundant or irrelevant to the value to be
predicted.
Feature selection is necessary in the following aspects:
• Simplify models to make them easy for users to interpret
• Reduce the training time
• Avoid dimension explosion
• Improve model generalization and avoid overfitting
Feature Selection Methods - Filter
Filter methods are independent of the model during feature selection.
By evaluating the correlation between each feature and the target
attribute, these methods use a statistical measure to assign a value to
each feature. Features are then sorted by score, which is helpful for
preserving or eliminating specific features.
Procedure of a filter method: Traverse all features → Select the optimal feature subset → Train models → Evaluate the performance
Common methods
• Pearson correlation coefficient
• Chi-square coefficient
• Mutual information
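As a rough illustration of a filter method, the sketch below scores each feature by the absolute Pearson correlation coefficient with the target and sorts the features by score. The function names (`pearson`, `rank_features`) and the toy data are hypothetical and are not part of the course material.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target):
    """Score each feature by |r| with the target and sort, highest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: 'area' is exactly proportional to the target.
features = {
    "area": [50, 60, 80, 100, 120],
    "noise": [3, 1, 4, 1, 5],
}
target = [150, 180, 240, 300, 360]
ranking = rank_features(features, target)
```

Because the score is computed without training any model, the ranking is independent of the model that is trained afterwards, which is the defining property of filter methods.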
Feature Selection Methods - Wrapper
Wrapper methods use a prediction model to score feature subsets.
Wrapper methods consider feature selection as a search issue for which different combinations are evaluated and compared. A predictive model is used to evaluate a combination of features and assign a score based on model accuracy.
Procedure of a wrapper method: Traverse all features → Generate a feature subset → Train models → Evaluate models → Select the optimal feature subset
Common methods
• Recursive Feature Elimination (RFE)
Limitations
• Wrapper methods train a new model for each subset, resulting in a huge number of computations.
• The best-performing feature set is usually specific to one type of model.
Feature Selection Methods - Embedded
Embedded methods consider feature selection as a part of model construction.
The most common type of embedded feature selection method is the regularization method.
Regularization methods, also called penalization methods, introduce additional constraints into the optimization of a predictive algorithm that bias the model toward lower complexity and reduce the number of features.
Procedure of an embedded method: Traverse all features → Generate a feature subset → Train models + Evaluate the effect → Select the optimal feature subset
Overall Procedure of Building a Model
Model Building Procedure
1 2 3
Model
training
Each feature or a combination of several features can provide a basis for a
model to make a judgment.
Examples of Supervised Learning - Prediction Phase
New data (unknown data): recent data for which it is not known whether the people are basketball players.
Name     | City    | Age | Label
Marine   | Miami   | 45  | ?
Julien   | Miami   | 52  | ?
Fred     | Orlando | 20  | ?
Michelle | Boston  | 34  | ?
Nicolas  | Phoenix | 90  | ?
Application model:
IF city = Miami → Probability = +0.7
IF city = Orlando → Probability = +0.2
IF age > 42 → Probability = +0.05*age + 0.06
IF age ≤ 42 → Probability = +0.01*age + 0.02
• Interpretability
Is the prediction result easy to interpret?
• Prediction speed
How long does it take to predict each piece of data?
• Practicability
Is the prediction rate still acceptable when the service volume
increases with a huge data volume?
Model Validity (1)
Generalization capability: The goal of machine learning is that the model obtained after learning should perform well on new samples, not
just on samples used for training. The capability of applying a model to new samples is called generalization or robustness.
Error: difference between the sample result predicted by the model obtained after learning and the actual sample result.
Training error: error that you get when you run the model on the training data.
Generalization error: error that you get when you run the model on new samples. Obviously, we prefer a model with a smaller
generalization error.
Underfitting: occurs when the model or the algorithm does not fit the data well enough.
Overfitting: occurs when the training error of the model obtained after learning is small but the generalization error is large (poor
generalization capability).
Model Validity (2)
Model capacity: model's capability of fitting functions, which is also called model complexity.
When the capacity suits the task complexity and the amount of training data provided, the algorithm effect is usually optimal.
Models with insufficient capacity cannot solve complex tasks and underfitting may occur.
A high-capacity model can solve complex tasks, but overfitting may occur if the capacity is higher than that required by a task.
Bias:
Difference between the expected (or average) prediction value and the correct value we are trying to predict.
Variance and Bias
Combinations of variance and bias are as follows:
Low bias & low variance –> Good model
Low bias & high variance
High bias & low variance
High bias & high variance –> Poor model
Model Complexity and Error
As the model complexity increases, the training error decreases.
As the model complexity increases, the test error decreases to a certain point and then increases in the reverse
direction, forming a convex curve.
(Figure: error plotted against model complexity, with one curve for the training error and one for the testing error.)
Machine Learning Performance Evaluation - Regression
The closer the Mean Absolute Error (MAE) is to 0, the better the model can fit the training data.
$$MAE = \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|$$
The value range of $R^2$ is (–∞, 1]. A larger value indicates that the model can better fit the training data. TSS (total sum of squares) indicates the difference
between samples. RSS (residual sum of squares) indicates the difference between the predicted value and sample value.
$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}$$
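The two regression metrics can be computed directly from their definitions. The following is a minimal sketch in plain Python; the helper names and the four sample values are illustrative assumptions.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between actual and predicted values."""
    m = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / m

def r2_score(y_true, y_pred):
    """R^2 = 1 - RSS / TSS."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return 1 - rss / tss

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0 + 0.5 + 0) / 4 = 0.25
r2 = r2_score(y_true, y_pred)              # 1 - 0.5 / 20 = 0.975
```

A model that always predicts the mean of `y_true` gets RSS = TSS and therefore an R² of 0; a model worse than that gets a negative value, which is why the range extends to –∞.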
Machine Learning Performance Evaluation - Classification (1)
Terms and definitions:
𝐏: positive, indicating the number of real positive cases in the data.
𝐍: negative, indicating the number of real negative cases in the data.
𝐓𝐏: true positive, indicating the number of positive cases that are correctly classified by the classifier.
𝐓𝐍: true negative, indicating the number of negative cases that are correctly classified by the classifier.
𝐅𝐏: false positive, indicating the number of negative cases that are incorrectly classified as positive by the classifier.
𝐅𝐍: false negative, indicating the number of positive cases that are incorrectly classified as negative by the classifier.
Confusion matrix:
Actual \ Estimated | yes | no | Total
yes                | 𝑇𝑃  | 𝐹𝑁 | 𝑃
no                 | 𝐹𝑃  | 𝑇𝑁 | 𝑁
Total              | 𝑃′  | 𝑁′ | 𝑃 + 𝑁
Confusion matrix: at least an 𝑚 × 𝑚 table. The entry 𝐶𝑀𝑖,𝑗 in the first 𝑚 rows and 𝑚 columns indicates the number of cases that actually belong to class 𝑖 but
are classified into class 𝑗 by the classifier.
Ideally, for a high accuracy classifier, most prediction values should be located in the diagonal from 𝐶𝑀1,1 to 𝐶𝑀𝑚,𝑚 of the table while values outside the diagonal are 0
or close to 0. That is, 𝐹𝑃 and 𝐹𝑁 are close to 0.
Machine Learning Performance Evaluation - Classification (2)
Measurement: Ratio
Accuracy and recognition rate: $\frac{TP + TN}{P + N}$
Error rate and misclassification rate: $\frac{FP + FN}{P + N}$
Sensitivity, true positive rate, and recall: $\frac{TP}{P}$
Specificity and true negative rate: $\frac{TN}{N}$
Precision: $\frac{TP}{TP + FP}$
$F_1$, harmonic mean of the recall rate and precision: $\frac{2 \times precision \times recall}{precision + recall}$
$F_\beta$, where $\beta$ is a non-negative real number: $\frac{(1 + \beta^2) \times precision \times recall}{\beta^2 \times precision + recall}$
Example of Machine Learning Performance Evaluation
We have trained a machine learning model to identify whether the object in an image is a cat. Now we use 200
pictures to verify the model performance. Among the 200 images, objects in 170 images are cats, while others are
not. The identification result of the model is that objects in 160 images are cats, while others are not.
Confusion matrix:
Actual \ Estimated | yes | no | Total
yes                | 140 | 30 | 170
no                 | 20  | 10 | 30
Total              | 160 | 40 | 200
Precision: $P = \frac{TP}{TP + FP} = \frac{140}{140 + 20} = 87.5\%$
Recall: $R = \frac{TP}{P} = \frac{140}{170} = 82.4\%$
Accuracy: $ACC = \frac{TP + TN}{P + N} = \frac{140 + 10}{170 + 30} = 75\%$
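The arithmetic in this example can be reproduced with a few lines of Python. The function below is an illustrative sketch (its name and the returned dictionary layout are assumptions); it takes the four cells of the confusion matrix and applies the ratios defined earlier.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common ratios from the four confusion-matrix cells."""
    p = tp + fn  # real positive cases
    n = fp + tn  # real negative cases
    precision = tp / (tp + fp)
    recall = tp / p
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": precision,
        "recall": recall,
        "specificity": tn / n,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Cat example: 170 real cats, 160 images identified as cats, 140 of them correctly.
metrics = classification_metrics(tp=140, fn=30, fp=20, tn=10)
```

Note that the specificity here is only 10/30, which the accuracy figure alone does not reveal. This is one reason to report several ratios together when the classes are imbalanced.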
Machine Learning Training Method - Gradient Descent (1)
The gradient descent method uses the negative gradient direction at the
current position as the search direction, which is the direction of steepest descent.
The formula is as follows:
$$w_{k+1} = w_k - \eta \nabla f_{w_k}(x^i)$$
In the formula, 𝜂 indicates the learning rate and 𝑖 indicates the data record number.
The weight parameter 𝑤 changes in each iteration.
Convergence: The value of the objective function changes very little, or the maximum
number of iterations is reached.
(Figure: cost surface.)
Machine Learning Training Method - Gradient Descent (2)
Batch Gradient Descent (BGD) uses all samples in the dataset (m in total) to update the weight parameter based on the
gradient value at the current point.
$$w_{k+1} = w_k - \eta \frac{1}{m} \sum_{i=1}^{m} \nabla f_{w_k}(x^i)$$
Stochastic Gradient Descent (SGD) randomly selects one sample in the dataset to update the weight parameter based on the
gradient value at the current point.
$$w_{k+1} = w_k - \eta \nabla f_{w_k}(x^i)$$
Mini-Batch Gradient Descent (MBGD) combines the features of BGD and SGD and uses the gradients of n samples in the
dataset to update the weight parameter.
$$w_{k+1} = w_k - \eta \frac{1}{n} \sum_{i=t}^{t+n-1} \nabla f_{w_k}(x^i)$$
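The three variants differ only in how many samples contribute to each update, so one loop with a `batch_size` argument can sketch all of them. The example below is an illustration under stated assumptions: a one-parameter model `w * x`, a squared-error loss per sample, noise-free toy data, and samples drawn by shuffling once per pass rather than fully independently. None of these choices comes from the slides.

```python
import random

def gradient(w, x, y):
    """Gradient of the per-sample loss 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x

def train(data, batch_size, eta=0.05, epochs=200, seed=0):
    """Gradient descent where each update averages over `batch_size` samples."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        samples = data[:]
        rng.shuffle(samples)  # random order, so small batches are stochastic
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            avg_grad = sum(gradient(w, x, y) for x, y in batch) / len(batch)
            w -= eta * avg_grad  # w_{k+1} = w_k - eta * gradient
    return w

# Hypothetical noise-free data generated from y = 2x.
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
w_bgd = train(data, batch_size=len(data))  # BGD: all m samples per update
w_sgd = train(data, batch_size=1)          # SGD: one sample per update
w_mbgd = train(data, batch_size=2)         # MBGD: n = 2 samples per update
```

On this clean toy problem all three settings recover a weight close to 2. On noisy data, the smaller the batch, the more the update direction fluctuates from step to step.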
Machine Learning Training Method - Gradient Descent (3)
Comparison of three gradient descent methods
In SGD, the sample selected for each update is random. This makes the loss fluctuate and can even
move the weights away from the minimum as the loss approaches its lowest point.
BGD has the highest stability but consumes too many computing resources. MBGD is a method that balances SGD and BGD.
BGD
Uses all training samples for training each time.
SGD
Uses one training sample for training each time.
MBGD
Uses a certain number of training samples for training each time.
Parameters and Hyperparameters in Models
A model contains not only parameters but also hyperparameters. Hyperparameters are set so that the
model can learn the optimal parameters.
Hyperparameters are used to control training.
Hyperparameters of a Model
Characteristics
• Often used in model parameter estimation processes.
• Often specified by the practitioner.
• Can often be set using heuristics.
• Often tuned for a given predictive modeling problem.
Common hyperparameters
• λ during Lasso/Ridge regression
• Learning rate for training a neural network, number of iterations, batch size, activation function, and number of neurons
• 𝐶 and 𝜎 in support vector machines (SVM)
• K in k-nearest neighbor (KNN)
• Number of trees in a random forest
Hyperparameter Search Procedure and Method
Procedure for searching hyperparameters
1. Divide the dataset into a training set, validation set, and test set.
2. Optimize the model parameters using the training set based on the model performance indicators.
3. Search for the model hyperparameters using the validation set based on the model performance indicators.
4. Perform step 2 and step 3 alternately. Finally, determine the model parameters and hyperparameters and assess the model using the test set.
Search algorithms (step 3)
• Grid search
• Random search
• Heuristic intelligent search
• Bayesian search
Hyperparameter Searching Method - Grid Search
Grid search attempts to exhaustively search all possible hyperparameter combinations to form a hyperparameter value grid.
In practice, the range of hyperparameter values to search is specified manually.
Grid search is an expensive and time-consuming method.
This method works well when the number of hyperparameters is relatively small.
(Figure: grid search over Hyperparameter 1 and Hyperparameter 2.)
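A grid search can be sketched as a loop over the Cartesian product of the manually specified value lists. In the example below, `validation_score` is a hypothetical stand-in for "train the model with these hyperparameters and measure it on the validation set"; the hyperparameter names and value ranges are assumptions chosen for illustration.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate every combination in the grid and keep the best-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def validation_score(learning_rate, depth):
    """Hypothetical score that peaks at learning_rate=0.1 and depth=3."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (depth - 3) ** 2

# Manually specified search ranges: 3 x 5 = 15 combinations in total.
grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [1, 2, 3, 4, 5]}
best_params, best_score = grid_search(validation_score, grid)
```

The number of evaluations is the product of the list lengths, so adding a hyperparameter multiplies the cost. This is why the method is practical mainly for a small number of hyperparameters.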
Hyperparameter Searching Method - Random Search
When the hyperparameter search space is large, random search is better than grid search.
In random search, each setting is sampled from the distribution of possible
parameter values, in an attempt to find the best subset of hyperparameters.
Note:
Search is performed within a coarse range, which is then narrowed based on where the
best result appears.
Some hyperparameters are more important than others, and the search deviation will be
affected during random search.
(Figure: random search over Parameter 1 and Parameter 2.)
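Random search replaces the exhaustive loop with a fixed number of sampled settings. The sketch below assumes a log-uniform distribution for the learning rate and a uniform integer distribution for the depth; these distributions, the parameter names, and the `validation_score` function are hypothetical.

```python
import random

def random_search(score_fn, sample_params, n_trials, seed=0):
    """Sample n_trials settings and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def sample_params(rng):
    """Draw one setting from the assumed distributions."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
        "depth": rng.randint(1, 10),                # uniform integer in [1, 10]
    }

def validation_score(learning_rate, depth):
    """Hypothetical score that peaks at learning_rate=0.1 and depth=3."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (depth - 3) ** 2

best_params, best_score = random_search(validation_score, sample_params, n_trials=200)
```

To follow the coarse-to-fine advice in the note, a second call could sample from a narrower range centered on `best_params`.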
Cross Validation (1)
Cross validation: It is a statistical analysis method used to validate the performance of a classifier. The basic idea
is to divide the original dataset into two parts: training set and validation set. Train the classifier using the training
set and test the model using the validation set to check the classifier performance.
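The split described above is often repeated so that every sample is used for validation exactly once, a variant commonly called k-fold cross validation. The sketch below only generates the index splits; the function name and the choice of k = 5 are assumptions, and training and scoring the classifier on each split is left out.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx

# Ten samples split into five folds of two validation samples each.
splits = list(k_fold_indices(n_samples=10, k=5))
```

Averaging the validation score over the k splits gives a performance estimate that depends less on one particular split.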
Cross Validation (2)
Entire dataset
Machine Learning Algorithm Overview
Machine learning
GBDT GBDT
KNN
Naive Bayes
Linear Regression (1)
Linear regression: a statistical analysis method to determine the quantitative relationships between two or more
variables through regression analysis in mathematical statistics.
Linear regression is a type of supervised learning.
Linear Regression (2)
The model function of linear regression is as follows, where 𝒘 indicates the weight parameter, 𝒃 indicates the bias, and 𝒙 indicates the sample
attribute.
$$h_w(x) = w^T x + b$$
The relationship between the value predicted by the model and actual value is as follows, where 𝒚 indicates the actual value, and 𝜺 indicates the error.
$$y = w^T x + b + \varepsilon$$
The error 𝜺 is influenced by many factors independently. According to the central limit theorem, the error 𝜺 follows normal distribution. According to
the normal distribution function and maximum likelihood estimation, the loss function of linear regression is as follows:
$$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2$$
To make the predicted value close to the actual value, we need to minimize the loss value. We can use the gradient descent method to calculate the weight
parameter 𝑤 when the loss function reaches the minimum, and then complete model building.
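A minimal sketch of this procedure for a single input attribute is shown below. It minimizes the loss J(w) with batch gradient descent; the learning rate, iteration count, and toy data are illustrative assumptions.

```python
def fit_linear_regression(xs, ys, eta=0.05, iterations=2000):
    """Fit h(x) = w*x + b by batch gradient descent on J = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dw
        grad_b = sum(errors) / m                             # dJ/db
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
```

With real data, the inputs are usually normalized first so that a single learning rate works for both `w` and `b`.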
Linear Regression Extension - Polynomial Regression
Polynomial regression is an extension of linear regression. Generally, the complexity of a dataset exceeds the possibility of fitting by a
straight line. That is, obvious underfitting occurs if the original linear regression model is used. The solution is to use polynomial
regression.
$$h_w(x) = w_1 x + w_2 x^2 + \cdots + w_n x^n + b$$
Linear Regression and Overfitting Prevention
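Regularization terms can be used to reduce overfitting. The value of 𝑤 cannot be too large or too small in the sample space. You can add
a square sum loss on the target function.
$$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2 + \lambda \lVert w \rVert_2^2$$
Regularization terms (norm): The regularization term here is called L2-norm. Linear regression that uses this loss function is also called
Ridge regression.
$$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2 + \lambda \lVert w \rVert_1$$
The regularization term in this second loss function is the L1-norm. Linear regression that uses this loss function is called Lasso regression.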
Logistic Regression (1)
Logistic regression: The logistic regression model is used to solve classification problems. The model is defined
as follows:
$$P(Y = 1 \mid x) = \frac{e^{wx+b}}{1 + e^{wx+b}}$$
$$P(Y = 0 \mid x) = \frac{1}{1 + e^{wx+b}}$$
where 𝒘 indicates the weight, 𝑏 indicates the bias, and 𝑤𝑥 + 𝑏 is regarded as the linear function of 𝑥. Compare the preceding two probability
values. The class with a higher probability value is the class of 𝑥.
Logistic Regression (2)
Both the logistic regression model and linear regression model are generalized linear models. Logistic regression introduces nonlinear
factors (the sigmoid function) based on linear regression and sets thresholds, so it can deal with binary classification problems.
According to the model function of logistic regression, the loss function of logistic regression can be estimated as follows by using the
maximum likelihood estimation:
$$J(w) = -\frac{1}{m} \sum \left[ y \ln h_w(x) + (1 - y) \ln \left( 1 - h_w(x) \right) \right]$$
where 𝑤 indicates the weight parameter, 𝑚 indicates the number of samples, 𝑥 indicates the sample, and 𝑦 indicates the real value. The
values of all the weight parameters 𝑤 can also be obtained through the gradient descent algorithm.
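The model function and the loss can be written out in a few lines for a single scalar feature. This is a sketch for illustration only: the function names and the four labeled samples are assumptions, and no training loop is included.

```python
import math

def sigmoid(z):
    """Equals e^z / (1 + e^z), written in the equivalent form 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """P(Y = 1 | x) for a single scalar feature x."""
    return sigmoid(w * x + b)

def cross_entropy_loss(w, b, xs, ys):
    """J(w) = -(1/m) * sum(y * ln h(x) + (1 - y) * ln(1 - h(x)))."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = predict_proba(w, b, x)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / m

# Hypothetical samples: negative x belongs to class 0, positive x to class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
loss_untrained = cross_entropy_loss(0.0, 0.0, xs, ys)  # every h(x) is 0.5
loss_better = cross_entropy_loss(2.0, 0.0, xs, ys)     # weights that separate the classes
```

With `w = 0` and `b = 0` every prediction is 0.5, so the loss equals ln 2. Weights that push the predictions toward the correct labels give a smaller loss, which is what gradient descent exploits.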
Logistic Regression Extension - Softmax Function (1)
Logistic regression applies only to binary classification problems. For multi-class classification problems, use
the Softmax function.
Binary classification example: Male? Female?
Multi-class classification example: Grape? Orange? Apple? Banana?
Logistic Regression Extension - Softmax Function (2)
Softmax regression is a generalization of logistic regression that we can use for K-class classification.
The Softmax function is used to map a K-dimensional vector of arbitrary real values to another K-dimensional
vector of real values, where each vector element is in the interval (0, 1).
The regression probability function of Softmax is as follows:
$$p(y = k \mid x; w) = \frac{e^{w_k^T x}}{\sum_{l=1}^{K} e^{w_l^T x}}, \quad k = 1, 2, \ldots, K$$
Logistic Regression Extension - Softmax Function (3)
Softmax assigns a probability to each class in a multi-class problem. These probabilities must add up to 1.
Softmax produces, for each class, the probability that the input belongs to that class. Example:
Category Probability
Grape? 0.09
Banana? 0.01
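The Softmax function itself is short enough to sketch directly. In the example below, the input list stands for the K values $w_k^T x$; the three scores are hypothetical. Subtracting the maximum score before exponentiating is a common numerical safeguard that does not change the result.

```python
import math

def softmax(scores):
    """Map K real-valued scores to K probabilities that sum to 1."""
    shift = max(scores)  # avoids overflow; cancels out in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
```

The class with the largest score always receives the largest probability, so taking the most probable class gives the same decision as taking the highest score.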
Decision Tree
A decision tree is a tree structure (a binary tree or a non-binary tree). Each non-leaf node represents a test on
a feature attribute. Each branch represents the output of a feature attribute in a certain value range, and each leaf
node stores a category. To use the decision tree, start from the root node, test the feature attributes of the items
to be classified, select the output branches, and use the category stored on the leaf node as the final result.
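(Figure: example decision tree. The root node branches on Short or Tall and a further node branches on On land or In water; the leaf nodes read "Might be an elephant", "Might be a rhinoceros", and "Might be a hippo".)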
Decision Tree Structure
Root Node
Internal Node
Leaf Node Leaf Node Leaf Node
Decision Tree Construction Process
Feature selection: Select a feature from the features of the training data as the split standard of the current node.
(Different standards generate different decision tree algorithms.)
Decision tree generation: Generate internal nodes from the top down based on the selected features, and stop when the dataset
can no longer be split.
Pruning: The decision tree may easily overfit unless necessary pruning (including pre-pruning and post-pruning)
is performed to reduce the tree size and optimize its node structure.
Decision Tree Example
The following figure shows a classification when a decision tree is used. The classification result is impacted by
three attributes: Refund, Marital Status, and Taxable Income.
Tid | Refund | Marital Status | Taxable Income | Cheat
1   | Yes    | Single         | 125,000        | No
2   | No     | Married        | 100,000        | No
3   | No     | Single         | 70,000         | No
4   | Yes    | Married        | 120,000        | No
5   | No     | Divorced       | 95,000         | Yes
6   | No     | Married        | 60,000         | No
7   | Yes    | Divorced       | 220,000        | No
8   | No     | Single         | 85,000         | Yes
9   | No     | Married        | 75,000         | No
10  | No     | Single         | 90,000         | Yes
(Figure: decision tree that tests Refund, then Marital Status, then Taxable Income, with leaf nodes labeled No or Yes.)
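Using a trained tree amounts to walking from the root to a leaf. The sketch below hard-codes one tree that is consistent with the ten records above. The branch values are assumptions, because the figure's branch labels are not given in the text: Refund = Yes and Marital Status = Married lead to "No", and the income threshold of 80,000 is a hypothetical value chosen between the 70,000 and 85,000 records.

```python
INCOME_THRESHOLD = 80_000  # assumed split point, not stated in the source

def predict_cheat(refund, marital_status, taxable_income):
    """Walk the tree: Refund, then Marital Status, then Taxable Income."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    return "Yes" if taxable_income >= INCOME_THRESHOLD else "No"

# The ten records from the table: (Refund, Marital Status, Taxable Income, Cheat).
records = [
    ("Yes", "Single", 125_000, "No"),
    ("No", "Married", 100_000, "No"),
    ("No", "Single", 70_000, "No"),
    ("Yes", "Married", 120_000, "No"),
    ("No", "Divorced", 95_000, "Yes"),
    ("No", "Married", 60_000, "No"),
    ("Yes", "Divorced", 220_000, "No"),
    ("No", "Single", 85_000, "Yes"),
    ("No", "Married", 75_000, "No"),
    ("No", "Single", 90_000, "Yes"),
]
predictions = [predict_cheat(r, ms, inc) for r, ms, inc, _ in records]
```

Under these assumed splits the tree reproduces the Cheat label of every record, but a tree that fits its training data perfectly may still need pruning to generalize.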
SVM
SVM is a binary classification model whose basic model is a linear classifier defined in the feature space with the
maximum margin. SVMs also include kernel tricks that make them nonlinear classifiers. The SVM learning
algorithm is an optimization algorithm for solving a convex quadratic programming problem.
weight
Projection
Linear SVM (1)
How do we split the red and blue datasets by a straight line?
(Figure: a two-dimensional dataset with binary classification, divided by one straight line on the left or a different straight line on the right.)
Both the left and right methods can be used to divide the datasets. Which of them is correct?
Linear SVM (2)
Straight lines are used to divide data into different classes. Actually, many different straight lines can divide the data. The core idea of
the SVM is to find the straight line that keeps the points closest to it as far away from it as possible. This gives the model
strong generalization capability. These closest points are called support vectors.
In two-dimensional space, we use straight lines for segmentation. In high-dimensional space, we use hyperplanes for segmentation.
(Figure: the distance between the support vectors is as large as possible.)
Nonlinear SVM (1)
How do we classify a nonlinear separable dataset?
Linear SVM can function well for linear separable datasets.
Nonlinear datasets cannot be split with straight lines.
Nonlinear SVM (2)
Kernel functions are used to construct nonlinear SVMs.
Kernel functions allow algorithms to fit the maximum-margin hyperplane in a transformed high-dimensional feature space.
Common kernel functions
• Linear kernel function
• Polynomial kernel function
• Gaussian kernel function
• Sigmoid kernel function
(Figure: mapping from the input space to the high-dimensional feature space.)
KNN Algorithm (1)
The KNN classification algorithm is a theoretically mature
method and one of the simplest machine learning algorithms.
KNN Algorithm (2)
As the prediction result is determined based on the number and
weights of neighbors in the training set, the KNN algorithm has a
simple logic.
KNN is a non-parametric method which is usually used in
datasets with irregular decision boundaries.
The KNN algorithm generally adopts the majority voting method for
classification prediction and the average value method for regression
prediction.
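The majority voting rule can be sketched in a few lines. The example below assumes Euclidean distance and unweighted votes, neither of which is specified in the text; the function name and the six training points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-dimensional training set with two classes.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]
label_near_a = knn_predict(train_points, train_labels, query=(2, 2))
label_near_b = knn_predict(train_points, train_labels, query=(6, 5))
```

For regression, the same neighbor search would be followed by averaging the neighbors' target values instead of voting.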
Naive Bayes (1)
Naive Bayes algorithm: a simple multi-class classification algorithm based on the Bayes theorem. It assumes that
features are independent of each other. For given sample features 𝑋1, …, 𝑋𝑛, the probability that a sample belongs to a
category 𝐶𝑘 is:
$$P(C_k \mid X_1, \ldots, X_n) = \frac{P(X_1, \ldots, X_n \mid C_k) \, P(C_k)}{P(X_1, \ldots, X_n)}$$
𝑋1, …, 𝑋𝑛 are data features, which are usually described by measurement values of m attribute sets.
For example, the color feature may have three attributes: red, yellow, and blue.
Naive Bayes (2)
Independent assumption of features.
For example, if a fruit is red, round, and about 10 cm (3.94 in.) in diameter, it can be considered an apple.
A Naive Bayes classifier considers that each feature independently contributes to the probability that
the fruit is an apple, regardless of any possible correlation between the color, roundness, and
diameter.
Ensemble Learning
Ensemble learning is a machine learning paradigm in which multiple learners are trained and combined to solve the same problem. When
multiple learners are used, the integrated generalization capability can be much stronger than that of a single learner.
If you ask a complex question to thousands of people at random and then summarize their answers, the summarized answer is better than
an expert's answer in most cases. This is the wisdom of the masses.
Training set
Large
Model
model
synthesis
Classification of Ensemble Learning
Ensemble Methods in Machine Learning (1)
Random forest = Bagging + CART decision tree
Random forests build multiple decision trees and merge them together to make predictions more accurate
and stable.
Random forests can be used for classification and regression problems.
(Figure: bootstrap sampling produces data subsets, a decision tree is built on each data subset to give a prediction, and the predictions are aggregated into the final result.)
Ensemble Methods in Machine Learning (2)
GBDT is a type of boosting algorithm.
In the combined model, the predicted value equals the sum of the results of all the basic learners. In essence, each subsequent basic
learner fits the residual left by the current prediction. (The residual is the error between the predicted value and the actual
value.)
During model training, GBDT requires that the sample loss for model prediction be as small as possible.
Example:
Actual value 30 years old, prediction 20 years old → residual 10 years old
Residual 10 years old, prediction 9 years old → residual 1 year old
Residual 1 year old, prediction 1 year old
Unsupervised Learning - K-means
K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest
mean, serving as a prototype of the cluster.
For the k-means algorithm, specify the final number of clusters (k). Then, divide n data objects into k clusters. The clusters obtained meet
the following conditions: (1) Objects in the same cluster are highly similar. (2) The similarity of objects in different clusters is
small.
x1 x1
K-means clustering
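The alternating "assign to the nearest mean, then recompute the means" loop can be sketched for one-dimensional data as follows. The fixed iteration count, the manually chosen initial centers, and the sample values are assumptions made to keep the example short and deterministic.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster 1-D values around len(centers) means by repeated reassignment."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each observation to the cluster with the nearest mean.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical observations forming two obvious groups; k = 2.
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])
```

In practice the initial centers are usually chosen at random, and the result can depend on that choice, so the algorithm is often run several times.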
Unsupervised Learning - Hierarchical Clustering
Hierarchical clustering divides a dataset at different layers and forms a tree-like clustering structure. The dataset
division may use a "bottom-up" aggregation policy or a "top-down" splitting policy. The hierarchy of clustering is
represented in a tree graph. The root is the single cluster that contains all samples, and each leaf is a cluster that contains only one
sample.
Comprehensive Case
Assume that there is a dataset containing the house areas and prices of 21,613 housing units sold in a city. Based
on this data, we can predict the prices of other houses in the city.
Problem Analysis
This case contains a large amount of data, including input x (house area), and output y (price), which is a continuous value. We can use
regression of supervised learning. Draw a scatter chart based on the data and use linear regression.
Our goal is to build a model function h(x) that approximates, as closely as possible, the function that expresses the true distribution of the dataset.
Then, use the model to predict unknown price data.
(Figure: scatter chart of price against house area. The dataset is fed to a learning algorithm, which produces h(x); the input x is the house area and the output y is the label, price.)
Goal of Linear Regression
Linear regression aims to find a straight line that best fits the dataset.
Linear regression is a parameter-based model. Here, we need to learn the parameters 𝑤0 and 𝑤1. When
these two parameters are found, the best model is obtained.
$$h(x) = w_0 + w_1 x$$
Which line has the best parameters?
(Figure: two scatter charts of price against house area with candidate fitted lines.)
Loss Function of Linear Regression
To find the optimal parameter, construct a loss function and find the parameter values when the loss function
becomes the minimum.
Loss function of linear regression:
$$J(w) = \frac{1}{2m} \sum \left( h(x) - y \right)^2$$
Goal:
$$\arg\min_{w} J(w) = \frac{1}{2m} \sum \left( h(x) - y \right)^2$$
• where m indicates the number of samples,
• h(x) indicates the predicted value, and y indicates the actual value.
(Figure: scatter chart of price against house area, with the error marked between each data point and the fitted line.)
Gradient Descent Method
The gradient descent algorithm finds the minimum value of a function through iteration.
It starts from a randomly initialized point on the loss function and then moves toward the global minimum of the loss function along the negative
gradient direction. The parameter value at the minimum is the optimal parameter value.
Point A: the position of 𝑤0 and 𝑤1 after random initialization.
𝑤0 and 𝑤1 are the required parameters.
A-B connection line: a path formed based on descents in
a negative gradient direction. Upon each descent, the values of 𝑤0
and 𝑤1 change, and the regression line also changes.
(Figure: cost surface.)
Iteration Example
The following is an example of a gradient descent iteration. We can see that as red points on the loss function
surface gradually approach a lowest point, fitting of the linear regression red line with data becomes better and
better. At this time, we can get the best parameters.
Model Debugging and Application
After the model is trained, test it with the test set to ensure the generalization capability.
If overfitting occurs, use Lasso regression or Ridge regression with regularization terms and tune the
hyperparameters.
such as GBDT.
The final model result is as follows:
$$h(x) = 280.62x - 43581$$
Note:
For real data, pay attention to the functions of data cleansing and
feature engineering.
(Figure: fitted line of price against house area.)
Summary
First, this course describes the definition and classification of machine learning, as well as the problems machine
learning solves. Then, it introduces key knowledge points of machine learning, including the overall procedure
(data collection, data cleansing, feature extraction, model training, model evaluation, and model
deployment), common algorithms (linear regression, logistic regression, decision tree, SVM, naive Bayes, KNN,
ensemble learning, K-means, etc.), the gradient descent algorithm, and parameters and hyperparameters.
Finally, a complete machine learning process is presented through a case that uses linear regression to predict house
prices.
Quiz
1. (True or false) Gradient descent iteration is the only method of machine learning algorithms. ( )
A. True
B. False
B. Decision tree
C. KNN
D. K-means
Recommendations