Decision Trees
Decision Tree Overview
● The algorithm uses a tree structure to model relationships among the features and the potential outcomes
● It breaks the dataset down into smaller and smaller subsets as the depth of the tree increases
● The result is a flowchart for deciding how to classify a new observation
Decision Tree Overview
● Decision trees are simple and useful for interpretation
● They have a nice graphical representation
● They are typically not competitive with the best supervised learning approaches
● Random forests and boosting are ways of modifying the approach
● These modifications can sometimes result in a dramatic increase in prediction accuracy
● We will focus on basic Classification And Regression Trees (CART)
Terminology in CART
● Root node – the first decision node, starting from the original data
● Internal nodes – points along the tree where we split the data
● Branch – segment of the tree that connects two nodes
● Leaf/Terminal node – holds the final result of splitting the data
Decision Tree Overview
[Figure: an example tree showing the root node, branches, splits at internal nodes, and terminal nodes]
Classification vs. Regression
● Classification
○ Spam/ Not spam
○ Admit to ICU or not
○ Lend money/ deny
○ Intrusion detection
● Regression
○ Predict stock returns
○ Pricing a house or a car
○ Weather predictions (temperature, rainfall, etc.)
○ Economic growth predictions
Decision Tree Overview
● The decision tree algorithm learns (i.e., creates the decision tree from the data set) by optimizing an error function.
● CART is a technique that can be used to solve both regression and classification problems.
● In this lecture we will focus on solving a classification problem; the approach for a regression problem is similar (a sketch follows below).
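As a minimal sketch of what this looks like in practice, here is scikit-learn's DecisionTreeClassifier, an optimized CART implementation; the iris data set is only a stand-in for a generic data set:

```python
# A minimal sketch using scikit-learn, whose DecisionTreeClassifier is an
# optimized CART implementation; the iris data set is only a stand-in.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion="gini" is the error function optimized at each split;
# for a regression problem, DecisionTreeRegressor is used instead.
tree = DecisionTreeClassifier(criterion="gini", random_state=0)
tree.fit(X, y)
print(tree.predict(X[:5]))  # class predictions for the first five rows
```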
Visualizing a Classification tree
● Let’s look at a simple example
● How do we classify the observations into red or blue?
Steps in a decision tree
[Figure: the feature space is split step by step into rectangular regions labeled RED CLASS and BLUE CLASS]
Steps in a decision tree
● Once the final divisions are made, if a new data point comes in with value (0.2, 0.3), the classification tree predicts that the new observation is Red, as it falls in a region where the majority class is Red (see the sketch below)
[Figure: final partition of the feature space into RED CLASS and BLUE CLASS regions, with the new point falling in a Red-majority region]
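A sketch of this prediction in scikit-learn; the 2-D training points below are made up for illustration (the original figure's data is not available), but the new observation (0.2, 0.3) lands in a Red-majority region as described:

```python
# Made-up red/blue points standing in for the figure's data set.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.1, 0.2], [0.3, 0.4], [0.2, 0.1],   # Red corner
              [0.8, 0.7], [0.7, 0.9], [0.9, 0.8]])  # Blue corner
y = np.array(["Red", "Red", "Red", "Blue", "Blue", "Blue"])

tree = DecisionTreeClassifier().fit(X, y)

# The new point falls in the region where the majority class is Red.
print(tree.predict([[0.2, 0.3]]))  # -> ['Red']
```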
Representation as a tree
[Figure: the same partition drawn as a binary tree whose terminal nodes are labeled Red and Blue]
How do we know where to split?
● The earlier dataset was easy: the pattern was simple to visualize. What about complicated datasets where the patterns are more complex? How does the algorithm decide where to split?
[Figure: the partitioned feature space with regions labeled RED CLASS and BLUE CLASS]
How do we know where to split?
● We choose the variable and split point that result in the smallest (size-weighted) sum of the Gini indices of the two new regions
Maximum and minimum of Gini Index (2 classes)
● For two classes with proportions p1 and p2 = 1 − p1, the Gini index is G = p1(1 − p1) + p2(1 − p2)
● Minimum (pure region, p1 = 1): G = 1(1 − 1) + 0(1 − 0) = 0
● Maximum (even mix, p1 = p2 = 0.5): G = 0.5(1 − 0.5) + 0.5(1 − 0.5) = 0.5
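A tiny sketch checking these two extremes in Python:

```python
# Two-class Gini index: G = p1*(1 - p1) + p2*(1 - p2).
def gini(p1):
    p2 = 1 - p1
    return p1 * (1 - p1) + p2 * (1 - p2)

print(gini(1.0))  # pure region: 1(1-1) + 0(1-0) = 0.0
print(gini(0.5))  # even mix:    0.5(0.5) + 0.5(0.5) = 0.5
```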
Simple example
How do we know where to split?
[Figure: two candidate splits of the data, producing region pairs with Gini indices G1, G2 and G3, G4]
Let’s build a small tree for a real world problem
[Figure: root node split on Gender into MALE and FEMALE branches, producing Group 1 and Group 2]
Let’s build a small tree for a real world problem
● A computer calculates the Gini index for splits on different independent variables – Occupation, Age, Gender, etc. – and finds the optimal variable on which to split the data (a sketch of this search follows below)
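A sketch of that search, assuming a small made-up data set with Age and Gender columns: for every variable and every observed threshold, compute the size-weighted sum of the two regions' Gini indices and keep the smallest.

```python
import numpy as np

def gini(labels):
    # Gini index of one region: sum over classes of p * (1 - p)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

def best_split(X, y):
    # Greedy search: try every variable and every observed threshold.
    best = (None, None, np.inf)  # (feature index, threshold, score)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # Size-weighted sum of the two regions' Gini indices
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical data: columns are Age and Gender (0/1); labels are 0/1.
X = np.array([[25, 0], [30, 1], [45, 0], [50, 1]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # splits on Age (feature 0) at 30, weighted Gini 0.0
```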
Overfitting in Decision trees
● Splitting non-stop eventually leads to each point being its own region
● Such a decision tree model would perform very poorly on unseen test data
● Our model is overfitting the data!
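A small sketch that makes this visible, using synthetic data with label noise (make_classification's flip_y): the fully grown tree is perfect on the training set but noticeably worse on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no stopping rule
print(full.score(X_tr, y_tr))  # typically 1.0: each point ends in its own pure region
print(full.score(X_te, y_te))  # noticeably lower on unseen test data
```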
How do we know when to stop?
● One strategy is to stop once the decrease in Gini index from each split is smaller than some threshold
● This is not ideal, as we may miss some good splits later by stopping early at a seemingly poor split
● A better strategy is to grow a very large tree, then prune it back to obtain a subtree (a sketch follows below)
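A sketch of this grow-then-prune strategy using scikit-learn's cost-complexity pruning; the synthetic data set and the loop are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

big = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # very large tree
path = big.cost_complexity_pruning_path(X_tr, y_tr)           # candidate alphas

# Larger ccp_alpha prunes more aggressively; compare subtrees on test data.
for alpha in path.ccp_alphas:
    sub = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f}  leaves={sub.get_n_leaves()}  test acc={sub.score(X_te, y_te):.3f}")
```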
Pruning
● Pruning is not available in MS Azure
● How do we decide the optimal size of a tree?
● Answer – tuning (trial and error) and cross-validation
● Build different decision trees with different hyperparameter values, use cross-validation to observe their performance, and pick the best one (see the sketch below)
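A sketch of that tuning loop with scikit-learn's GridSearchCV; the grid values and iris data set are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 5, None],       # candidate tree sizes
                "min_samples_leaf": [1, 5, 10]},
    cv=5,  # 5-fold cross-validation
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)  # best tree and its CV accuracy
```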
Hyperparameters for Decision Tree
● Maximum depth – the length of the longest path from the root to a leaf; we can cap how deep the decision tree grows
● Minimum samples per leaf – the minimum number of samples we allow on each leaf
● Minimum samples to split – the minimum number of samples required to split an internal node
● Maximum features – the number of features considered when looking for each split
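These four hyperparameters map directly onto scikit-learn's DecisionTreeClassifier arguments; the values below are illustrative, not recommendations.

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(
    max_depth=4,           # maximum depth: longest root-to-leaf path
    min_samples_leaf=5,    # minimum number of samples per leaf
    min_samples_split=10,  # minimum samples required to split an internal node
    max_features="sqrt",   # number of features considered at each split
)
```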
Advantages vs. Disadvantages
Advantages
• Graphical representation of results: very easy to explain to people (probably even easier than linear regression)
• Interpretation: the logic of the model is transparent
• Handles missing data
• Handles numeric and categorical variables
• Captures nonlinear effects
Disadvantages
• Prediction accuracy is not as good as more complicated approaches
• Computational issues with large categorical variables
• The final tree is not very stable (a small change in the data can lead to a very different tree)