Notes - CH 5 Decision Trees and Linear Regression

Chapter 5 discusses classification and regression in supervised learning, outlining the data, model, cost function, and types of supervised learning such as regression for continuous outputs and classification for discrete outputs. It explains logistic regression, classification trees, and the concept of purity gain, along with methods for controlling tree complexity and model evaluation using confusion matrices. Additionally, it covers linear regression models and the transformation of features to improve accuracy while maintaining linearity.


CH 5 Classification and regression

Supervised learning

Data: input features and output labels, $\{\boldsymbol{x}_i, y_i\}_{i=1}^{N}$
Model: a function $f(\boldsymbol{x}; \boldsymbol{w})$ with parameters $\boldsymbol{w}$ that maps inputs to outputs
Cost function: a dissimilarity measure $d(y, f(\boldsymbol{x}; \boldsymbol{w}))$ between observation and prediction, used to determine whether a model is good or bad
Types of supervised learning: regression (continuous $y$) and classification (discrete $y$)
Classification
Logistic regression
$\pi_i = \dfrac{\exp(\boldsymbol{x}_i^{T}\boldsymbol{w})}{1+\exp(\boldsymbol{x}_i^{T}\boldsymbol{w})} = \dfrac{1}{1+\exp(-\boldsymbol{x}_i^{T}\boldsymbol{w})}$, where $\pi_i = P(y_i = 1 \mid \boldsymbol{x}_i)$

$\boldsymbol{w}$ can be found using the MLE approach: maximize the likelihood function $\prod_i \pi_i^{y_i}(1-\pi_i)^{1-y_i}$, i.e.
$\max_{\boldsymbol{w}} \sum_i \big(y_i \ln \pi_i + (1-y_i)\ln(1-\pi_i)\big)$
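As an illustration (not part of the original notes), a minimal NumPy sketch of maximizing this log-likelihood by gradient ascent, assuming the feature matrix already contains a column of ones for the intercept:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Maximize sum_i y_i ln(pi_i) + (1 - y_i) ln(1 - pi_i) by gradient ascent.

    X is assumed to already contain a column of ones for the intercept.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = sigmoid(X @ w)          # pi_i = P(y_i = 1 | x_i)
        grad = X.T @ (y - pi)        # gradient of the log-likelihood
        w += lr * grad / len(y)
    return w

# Toy data (hypothetical): one feature plus an intercept column.
X = np.array([[1, 0.5], [1, 1.5], [1, 2.5], [1, 3.5]])
y = np.array([0, 0, 1, 1])
w = fit_logistic(X, y)
print(w, sigmoid(X @ w))             # fitted weights and predicted probabilities
```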

Classification trees
If the stop criterion is met (e.g. the node contains only one type of element): add a leaf node which assigns every observation to the most prevalent class.
If the criterion is not met: partition the data into subsets. Ask a number of questions, partition the data accordingly, and select the question with the greatest purity gain.

Purity gain
A binary split creates three partitions: the root $r$ and the left and right branches $v_1, v_2$.
For each partition the impurity $I(r)$, $I(v_1)$, $I(v_2)$ is found. The impurity measure can be one of the following:
$\mathrm{Entropy}(v) = -\sum_{c=1}^{C} p(c \mid v)\,\log_2 p(c \mid v)$
$\mathrm{Gini}(v) = 1 - \sum_{c=1}^{C} p(c \mid v)^2$
$\mathrm{ClassError}(v) = 1 - \max_{c}\, p(c \mid v)$
where $p(c \mid v) = \dfrac{\text{no. of observations of class } c \text{ in branch } v}{N(v)}$

Purity gain is the weighted reduction in impurity:
$\Delta = I(r) - \sum_{k} \dfrac{N(v_k)}{N(r)}\, I(v_k)$

Example
                 v1        v2        Root
P(Mammal)        0.2       0.6       0.333
P(Non-mammal)    0.8       0.4       0.666

                 v1        v2        Root
Entropy          0.7219    0.9710    0.9183
Gini             0.3200    0.4800    0.4444
Class Error      0.2000    0.4000    0.3333

                 Entropy   Gini      Class Error
Purity gain Δ    0.03035   0.01778   0
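For reference, a small NumPy sketch (not from the notes) that reproduces the impurity values and purity gains above; the branch sizes are not stated in the notes, so N(v1) = 5 and N(v2) = 10 are assumed here:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # treat 0*log(0) as 0
    return -np.sum(p * np.log2(p))

def gini(p):
    return 1.0 - np.sum(np.asarray(p, dtype=float) ** 2)

def class_error(p):
    return 1.0 - np.max(np.asarray(p, dtype=float))

def purity_gain(impurity, p_root, branches):
    """Delta = I(r) - sum_k N(v_k)/N(r) * I(v_k); branches = [(N(v_k), p_k), ...]."""
    n_root = sum(n for n, _ in branches)
    weighted = sum(n / n_root * impurity(p) for n, p in branches)
    return impurity(p_root) - weighted

# Class probabilities from the mammal example above.
p_v1, p_v2, p_root = [0.2, 0.8], [0.6, 0.4], [1/3, 2/3]

for name, f in [("Entropy", entropy), ("Gini", gini), ("ClassError", class_error)]:
    values = [round(f(p), 4) for p in (p_v1, p_v2, p_root)]
    # Branch sizes N(v1)=5, N(v2)=10 are an assumption, not given in the notes.
    gain = purity_gain(f, p_root, [(5, p_v1), (10, p_v2)])
    print(f"{name:10s} {values}  purity gain = {gain:.4f}")
```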
Controlling tree complexity
Stop splitting when a branch contains fewer than a specific number of observations.
Stop splitting if a certain depth of the tree is reached.
Stop splitting if purity gain ∆ for the best split is below a certain value.
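These three stopping rules map directly onto common library options; for example, a sketch assuming scikit-learn's DecisionTreeClassifier, whose min_samples_split, max_depth and min_impurity_decrease parameters correspond to them:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each argument corresponds to one of the stopping rules listed above.
tree = DecisionTreeClassifier(
    criterion="entropy",          # impurity measure used for purity gain
    min_samples_split=5,          # stop when a branch has fewer than 5 observations
    max_depth=3,                  # stop when a depth of 3 is reached
    min_impurity_decrease=0.01,   # stop when the best split's gain is below 0.01
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```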

Model evaluation
Confusion matrix
                    Predicted positive    Predicted negative
Actually positive   True positive (TP)    False negative (FN)
Actually negative   False positive (FP)   True negative (TN)

Accuracy $= \dfrac{TP + TN}{N}$, error rate $= \dfrac{FP + FN}{N} = 1 - \text{Accuracy}$
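A quick sketch (assuming scikit-learn is available) of computing the confusion matrix, accuracy and error rate from labels and predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical labels and predictions for a binary problem.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])

# Rows: actual class, columns: predicted class (label order: 0 = negative, 1 = positive).
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / len(y_true)
error_rate = 1 - accuracy
print(cm, accuracy, error_rate, accuracy_score(y_true, y_pred))
```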
K-nearest neighbor
Choose the number of neighbors K and a distance measure.
When performing classification: 1) compute the distance to all other data objects → 2) find the K nearest data objects → 3) classify according to the majority class of the neighbors.
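A minimal NumPy sketch of these three steps, with hypothetical 2-D training data and Euclidean distance as the assumed distance measure:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1) Compute the distance from x_new to all training objects (Euclidean).
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # 2) Find the K nearest data objects.
    nearest = np.argsort(dists)[:k]
    # 3) Classify according to the majority class among the neighbors.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical 2-D training data with two classes.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=3))  # -> 1
```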

Nearest neighbor decision surface

Regression
Linear model for regression
$\vec{Y} = X\vec{w} + \vec{\varepsilon}$, where the least-squares estimate is $\vec{w} = (X^{T}X)^{-1} X^{T}\vec{y}$
(Figures: regression line in a 1-dimensional feature space; regression plane in a 2-dimensional feature space)
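A short NumPy sketch of this least-squares estimate on toy (assumed) data, with a column of ones supplying the intercept:

```python
import numpy as np

# Hypothetical 1-D data: y roughly 2 + 3x with some noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.8, 14.1])

# Design matrix with a column of ones for the intercept term.
X = np.column_stack([np.ones_like(x), x])

# w = (X^T X)^{-1} X^T y; np.linalg.solve avoids forming the explicit inverse.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)                      # approximately [intercept, slope]
print(X @ w)                  # fitted values Y_hat = X w
```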

Linear model after feature transformation


The features can be transformed into different forms to fit the data more accurately without affecting the linearity of the model in its parameters.
$\vec{Y} = \phi(\boldsymbol{x})^{T}\vec{w} + \vec{\varepsilon}$, where $\phi(\boldsymbol{x})$ is a vector of (basis) functions of the features.
Consider a model $y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3$. Using $x_1^2$, $\cos(x_2)$ or $\ln x_3$ instead, the model becomes $y = w_0 + w_1 x_1^2 + w_2 \cos x_2 + w_3 \ln x_3$, i.e. $\vec{Y} = \phi(\boldsymbol{x})^{T}\vec{w} + \vec{\varepsilon}$ with $\phi(\boldsymbol{x}) = [x_1^2, \cos x_2, \ln x_3]$.
(Figures: regression with $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$; regression with $y = w_0 + w_1 \cos(x) + w_2 \sin(2x)$)
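A sketch of fitting such a transformed model with the same least-squares estimate, here using the assumed basis $\phi(x) = [1, \cos x, \sin 2x]$ on synthetic data:

```python
import numpy as np

# Hypothetical noisy data generated from a cosine-like curve.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 50)
y = 1.0 + 2.0 * np.cos(x) + 0.5 * np.sin(2 * x) + 0.1 * rng.standard_normal(50)

# Transformed design matrix: phi(x) = [1, cos(x), sin(2x)].
Phi = np.column_stack([np.ones_like(x), np.cos(x), np.sin(2 * x)])

# The model is still linear in w, so the same least-squares estimate applies.
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)          # approximately [1.0, 2.0, 0.5]
```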
