Machine Learning Diploma
Level3: Machine Learning
Session 5
Agenda
➔ Standard Deviation and Coefficient of Variation (CV)
➔ Decision Tree Regression
➔ Tuning Trees
➔ Sklearn and Decision Trees
1. Standard Deviation and CV
Definition:
Standard Deviation
The Standard Deviation is a measure of how spread out numbers are.
Its symbol is σ (the Greek letter sigma).
The formula is simple: it is the square root of the Variance.
Variance
The Variance is defined as:
The average of the squared differences from the Mean.
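As a quick illustration, here is a minimal sketch in Python using NumPy (the sample values are made up):

```python
import numpy as np

# Hypothetical sample values, for illustration only
x = np.array([4.0, 8.0, 6.0, 5.0, 3.0])

mean = x.mean()
variance = np.mean((x - mean) ** 2)  # average of the squared differences from the mean
std = np.sqrt(variance)              # standard deviation = square root of the variance

print(mean, variance, std)           # matches x.var() and x.std()
```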
Definition:
Coefficient of Variation (CV)
The coefficient of variation (CV) is a statistical measure of the dispersion
of data points in a data series around the mean. The coefficient of
variation represents the ratio of the standard deviation to the mean.
Coefficient of Variation (CV)
CV = σ / mean = 147 / 394 = 37.3%
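The same calculation in code, using the σ = 147 and mean = 394 from the example above:

```python
sigma = 147.0  # standard deviation from the example above
mean = 394.0   # mean from the example above

cv = sigma / mean
print(f"CV = {cv:.1%}")  # CV = 37.3%
```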
2. Decision Tree Regression
Definition:
➔ A decision tree builds regression or classification models in the form of a
tree structure.
➔ It breaks down a dataset into smaller and smaller subsets while at the
same time an associated decision tree is incrementally developed.
➔ The final result is a tree with Decision Nodes and Leaf Nodes.
Tree Nodes:
➔ The Root Node is the initial node; it represents the entire sample and may
be split into further nodes.
➔ The Interior Nodes represent the features of a data set, and the branches
represent the decision rules.
➔ Finally, the Leaf Nodes represent the outcome.
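As a rough sketch of this structure (a hypothetical node type of our own, not sklearn's internal representation):

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Node:
    feature: Optional[str] = None                 # root/interior node: feature to split on
    children: Optional[Dict[str, "Node"]] = None  # branch value -> child subtree
    prediction: Optional[float] = None            # leaf node: the outcome

    def is_leaf(self) -> bool:
        return self.prediction is not None
```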
Decision Tree Structure:
Decision Tree Example:
3. Tuning Trees
Building Decision Tree:
➔ A decision tree is built using a top-down greedy search through the space of
possible branches.
➔ We use Standard Deviation to measure the homogeneity of a numerical
sample: if the sample is completely homogeneous, its standard deviation is
zero (see the quick check below).
➔ A decision tree is built top-down from a root node and involves partitioning
the data into subsets that contain instances with similar values
(homogeneous).
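For instance, a quick check with NumPy:

```python
import numpy as np

# A completely homogeneous sample has zero standard deviation...
print(np.std([40.0, 40.0, 40.0]))  # 0.0
# ...while a more spread-out sample has a larger one
print(np.std([25.0, 40.0, 55.0]))  # ~12.25
```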
Building Decision Tree:
➔ The tree is constructed using Standard Deviation Reduction (SDR).
➔ Standard Deviation Reduction is based on the decrease in standard
deviation after a dataset is split on an attribute.
➔ Constructing a decision tree is all about finding the attribute that returns
the highest standard deviation reduction (i.e., the most homogeneous
branches); a sketch follows.
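A minimal sketch of SDR in Python, assuming pandas and a categorical split attribute (the helper name sdr is ours):

```python
import pandas as pd

def sdr(df: pd.DataFrame, attribute: str, target: str) -> float:
    """Standard deviation reduction from splitting df on a categorical attribute."""
    total_std = df[target].std(ddof=0)       # S(T): std of the target before the split
    weighted_std = 0.0
    for _, subset in df.groupby(attribute):  # one branch per attribute value
        weight = len(subset) / len(df)       # fraction of instances in this branch
        weighted_std += weight * subset[target].std(ddof=0)
    return total_std - weighted_std          # SDR = S(T) - S(T, X)
```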
Tuning Tree:
➔ Step 1: The standard deviation of the target is calculated.
Tuning Tree:
➔ Step 2: The dataset is then split on the different attributes. The standard
deviation for each branch is calculated, and the branch values are combined
as a weighted average (weighted by branch size). The resulting standard
deviation is subtracted from the standard deviation before the split; the
result is the standard deviation reduction.
Tuning Tree:
➔ Step 3: The attribute with the largest standard deviation reduction is
chosen for the decision node.
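Continuing the sketch, step 3 is just an argmax over the candidate attributes (the weather-style rows below are made-up values echoing the slide example):

```python
import pandas as pd

# Hypothetical dataset, for illustration only
df = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast"],
    "Windy":   [False, True, False, False, True, True],
    "Hours":   [26.0, 30.0, 46.0, 45.0, 52.0, 44.0],
})

candidates = ["Outlook", "Windy"]
best = max(candidates, key=lambda attr: sdr(df, attr, "Hours"))  # sdr() from above
print(best)  # the attribute with the largest standard deviation reduction
```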
Tuning Tree:
➔ Step 4-a: The dataset is divided based on the values of the selected
attribute. This process is run recursively on the non-leaf branches, until
all data is processed.
Tuning Tree:
➔ Step 4-a: We need some termination criteria. For example, stop when the
coefficient of variation (CV) for a branch becomes smaller than a certain
threshold (e.g., 10%) and/or when too few instances (n) remain in the
branch (e.g., n ≤ 3); a sketch of such a check follows.
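A sketch of such a termination check (the parameter names cv_threshold and min_samples are ours):

```python
import pandas as pd

def should_stop(subset: pd.DataFrame, target: str,
                cv_threshold: float = 0.10, min_samples: int = 3) -> bool:
    """Stop splitting when a branch is homogeneous enough or too small."""
    mean = subset[target].mean()
    cv = subset[target].std(ddof=0) / mean if mean != 0 else 0.0
    return cv < cv_threshold or len(subset) <= min_samples
```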
Tuning Tree:
➔ Step 4-b: "Overcast" subset does not need any further splitting because
its CV (8%) is less than the threshold (10%). The related leaf node gets
the average of the "Overcast" subset.
Tuning Tree:
➔ Step 4-c: However, the "Sunny" branch has a CV (28%) greater than the
threshold (10%), so it needs further splitting. We select "Windy" as the
best node after "Outlook" because it has the largest SDR.
Tuning Tree:
➔ Step 4-c: Because the number of data points in both branches (FALSE
and TRUE) is less than or equal to 3, we stop further branching and assign
the average of each branch to the related leaf node.
Tuning Tree:
➔ Step 4-d: Moreover, the "Rainy" branch has a CV (22%) greater than the
threshold (10%), so this branch needs further splitting. We select
"Temp" as the best node because it has the largest SDR.
Tuning Tree:
➔ Step 4-d: Because the number of data points in all three branches
(Cool, Hot, and Mild) is less than or equal to 3, we stop further branching
and assign the average of each branch to the related leaf node.
Decision Tree Algorithm Fitting:
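Putting the pieces together, a minimal recursive fitting sketch that reuses the Node, sdr, and should_stop helpers defined earlier (our own illustration, not sklearn's algorithm):

```python
def build_tree(df, attributes, target):
    """Recursively grow a regression tree using standard deviation reduction."""
    if should_stop(df, target) or not attributes:
        return Node(prediction=df[target].mean())  # leaf: average of the branch
    best = max(attributes, key=lambda a: sdr(df, a, target))
    remaining = [a for a in attributes if a != best]
    children = {value: build_tree(subset, remaining, target)
                for value, subset in df.groupby(best)}
    return Node(feature=best, children=children)
```

For example, build_tree(df, ["Outlook", "Windy"], "Hours") on the toy data above returns the root Node of a small regression tree.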
Decision Tree Advantages and Disadvantages:
Advantages:
1. It can be used for both classification and regression problems
2. Easy to understand, interpret, and visualize
3. Useful in data exploration
4. Requires less data preparation
5. Can capture nonlinear relationships
Disadvantages:
1. Prone to overfitting
2. Not well suited for continuous variables
3. Decision trees can be unstable: small changes in the data can produce a very different tree
4. Sklearn and Decision Trees
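A minimal fitting sketch with scikit-learn's DecisionTreeRegressor (the data is synthetic; min_samples_leaf=3 mirrors the n ≤ 3 stopping rule used above):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data for illustration: a noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# max_depth and min_samples_leaf limit tree growth to reduce overfitting
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=3)
tree.fit(X, y)

print(export_text(tree, feature_names=["x"]))  # text view of the learned splits
print(tree.predict([[2.5]]))                   # prediction for a new point
```

Note that sklearn's regressor greedily minimizes squared error (i.e., variance) at each split rather than standard deviation, but the top-down idea is the same as the SDR procedure above.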
THANK YOU!