Unit 7
Learning Deterministic Models
Deterministic Models
• Mathematical models in which outcomes are precisely
determined through known relationships among known data
values
• Make models of real world situations
• Model can be used to make predictions, test assumptions
• Model does not include randomness
• If initial conditions are same then output is same
irrespective of the number of times model is run
• Ex: Known chemical reaction
• Ex: In Japan trains run on time, expected travel time can be
estimated using scheduled time
Stochastic (Probabilistic) Models
• Includes element of randomness
• Every time model is run output is likely to be
different even if same initial condition is used
• Final output is confidence interval
• Input has range of values in the form of
probability distribution
• Ex: rand()+2
• Ex: In the USA trains are often late, so the expected
travel time is a probability distribution
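The contrast can be sketched in Python; the slide's rand()+2 maps to random.random() + 2, and the deterministic function below is a made-up example:

```python
import random

def deterministic_model(x):
    # Deterministic: the same input always produces the same output
    return 2 * x + 1

def stochastic_model():
    # Stochastic: rand() + 2 yields a different value in [2, 3) on every call
    return random.random() + 2

# Repeated runs with the same initial condition agree for the
# deterministic model, but generally differ for the stochastic one
print(deterministic_model(3) == deterministic_model(3))  # True
```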
Learning Methods to Generate Models
• Supervised
• Unsupervised
• Reinforcement
Supervised Learning
• Training data consists of pairs
• Each pair consists of input vector and a known output value
• Algorithm learns from training data and produces model
• Model should predict the correct output value for any new input
• It is a process in which
• Build a classifier based on input and output data
• Classifier is trained with the training set of data
• Classifier is tested with a test set of data
Supervised Learning
• Prediction is a numerical value for regression and a
category for classification
• Accuracy = Number of correct classifications/total number of test
cases
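The accuracy formula can be illustrated with a small hypothetical spam-filter test set:

```python
# Hypothetical test-set labels and classifier predictions
actual    = ["spam", "ham", "spam", "ham", "spam"]
predicted = ["spam", "ham", "ham",  "ham", "spam"]

# Accuracy = number of correct classifications / total number of test cases
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(accuracy)  # 4 correct out of 5 -> 0.8
```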
Classification
• Output variable is category
• Ex:
• Red or blue
• Disease or no disease
• Filtering email spam or not spam
• Transaction is authorised or not authorised
Regression
• A supervised learning technique
• Output is real or continuous
• Employs a model that describes a relationship between
dependent variables (output) and independent
variables (input)
• Ex: salary, weight
• Which of the following is regression task?
• Predicting
• age of a person
• Nationality of a person
• Whether stock price of a company will increase
tomorrow
Regression
[Figures: regression fits of degree 1 (straight line) and degree 2 (curve)]
• Single input and single output values
Examples: Regression
• Sale price of house in a particular locality in October 2020
• Next move by robot
• Next wicket in cricket match
• Helps investment and financial managers to value assets
and understand commodity prices
Linear Regression
• Based on the independent variable, predict dependent
variable
• Y = ax + b
• Determine a and b such that the estimated output is as
close as possible to the actual output
• The best line is the one whose predicted outputs are
closest to the actual outputs
Linear Regression
Experience    Salary (in thousands)
3             30
8             57
9             64
13            72
3             36
6             43
11            59
21            90
1             20
16            83

Model:
• Y = aX + b, with a = 1.2 and b = 16
• Y = 1.2X + 16
• Salary when experience is 5 yrs: Y = 1.2 x 5 + 16 = 22
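As a sketch of how a and b could be estimated from the table, using NumPy's least-squares polyfit. Note the slide's a = 1.2, b = 16 is treated here as a given illustrative model; a least-squares fit to this table yields different coefficients:

```python
import numpy as np

experience = np.array([3, 8, 9, 13, 3, 6, 11, 21, 1, 16], dtype=float)
salary     = np.array([30, 57, 64, 72, 36, 43, 59, 90, 20, 83], dtype=float)

# Least-squares straight line Y = aX + b through the table's points
a, b = np.polyfit(experience, salary, 1)

# Prediction with the slide's illustrative model (a = 1.2, b = 16)
y_slide = 1.2 * 5 + 16   # 22
```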
Multiple Linear Regression
• Independent variables can be continuous or
categorical
• Fits the linear model which best describes all individual
data points
• Ex: selling price of house depends on location,
number of bedrooms, the year house was built etc.
• Geometrically multiple regression is fitting a hyper
plane in the d-dimensional space
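A minimal sketch of fitting such a hyperplane with NumPy least squares; the data below are made up so that y = 1 + 2·x1 + 3·x2 exactly:

```python
import numpy as np

# Made-up data: y depends on two features x1 and x2
# Design matrix with a leading column of 1s for the intercept term
X = np.array([[1, 1, 1],
              [1, 2, 1],
              [1, 3, 2],
              [1, 4, 2],
              [1, 5, 3]], dtype=float)
y = np.array([6, 8, 13, 15, 20], dtype=float)  # exactly 1 + 2*x1 + 3*x2

# Least squares fits the hyperplane y = d + a*x1 + b*x2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # ~ [1. 2. 3.]
```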
Examples: Multiple Regression Analysis
• Example 2: Model with 3 independent variables
• For x = {x1=1, x2=2, x3=1}, the actual output is y = 12
• Linear regression: y = ax1 + bx2 + cx3 + d, with a=1, b=2, c=3, d=1
• For x = {x1=1, x2=2, x3=1}, it predicts y = 9
• Nonlinear regression: y = ax1³ + bx2² + cx3 + d, with a=1, b=2, c=3, d=1
• For x = {x1=1, x2=2, x3=1}, it predicts y = 13
• The nonlinear model's error (1) is smaller than the linear model's (3),
so it has better accuracy
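The two predictions can be checked directly:

```python
x1, x2, x3 = 1, 2, 1
a, b, c, d = 1, 2, 3, 1
y_actual = 12

y_linear = a * x1 + b * x2 + c * x3 + d            # 1 + 4 + 3 + 1 = 9
y_nonlinear = a * x1**3 + b * x2**2 + c * x3 + d   # 1 + 8 + 3 + 1 = 13

# The nonlinear model's absolute error is smaller: 1 versus 3
print(abs(y_actual - y_linear), abs(y_actual - y_nonlinear))  # 3 1
```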
Steps for Learning
1. Train
2. Validate (optional)
3. Test
Split for Train and Test data
• Randomly split the complete data into
training and test sets
• Perform the model training on the training
set
• Use the test set for evaluation of model
• Ideally split the data into 70:30 or 80:20
• With this approach there is a possibility of
high bias if we have limited data
• because we would miss some information
about the data which we have not used for
training
• If our data is huge and our test sample and
train sample has the same distribution then
this approach is acceptable
• Has low variance because the model does not fit each
individual training point
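A random 70:30 split can be sketched in plain Python; the seed is an arbitrary choice for reproducibility:

```python
import random

def train_test_split(data, test_ratio=0.3, seed=42):
    """Randomly split data into training and test sets."""
    rng = random.Random(seed)      # seeded only for reproducibility
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(range(10))
print(len(train), len(test))  # 7 3
```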
Validation for Model
Overfitting
• Has too many terms for the number of observations
• Fits the random noise in the training sample
• Occurs when the model is too complex
• Does not generalize to unknown inputs
Cross validation
• Generally results in a less biased model as
compared to other methods
• Because it ensures that every observation
from the original dataset has the chance of
appearing in training and test set
• Useful if we have a limited number of input data
samples
Cross validation
• Split the entire dataset randomly into K folds
• Train the model using (K-1) of the folds
• Validate the model using the remaining Kth fold
• Determine the error, called the cross-validation error
• Repeat this process until every fold has served as the test set
• Take the average of the cross-validation errors
• This average is the performance metric for the model
Cross validation
• The value of K shouldn’t be too small or too large
• Ideally we choose 5 to 10 depending on the
data size
• A higher value of K leads to a less biased model,
but the larger variance might lead to overfitting
• A lower value of K is similar to the train-test
split
Cross validation: Example
• Data sample is {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}
• Use k=3
• Shuffle data set and divide into 3 groups
• Fold1: 0.5, 0.2
• Model1: Fold1 as test data and remaining samples,
{0.1, 0.3, 0.4, 0.6} are used as training data
• Fold2: 0.1, 0.3
• Model2: Fold2 as test data and remaining samples,
{0.2, 0.4, 0.5, 0.6} are used as training data
• Fold3: 0.4, 0.6
• Model3: Fold3 as test data and remaining samples,
{0.1, 0.2, 0.3, 0.5} are used as training data
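The fold construction above can be sketched as follows; a fresh shuffle will generally produce different folds from the hand-picked ones listed here:

```python
import random

data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
k = 3

shuffled = list(data)
random.shuffle(shuffled)

# Deal the shuffled samples into k folds of equal size
folds = [shuffled[i::k] for i in range(k)]

for i, fold in enumerate(folds, start=1):
    test_set = fold
    train_set = [x for x in shuffled if x not in fold]
    # Model i: train on train_set, validate on test_set
```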
Cross validation: ideal procedure
• Divide the dataset into three sets: train, validation and test
• Use train data to generate model and fit parameters
• Use validation data to tune other parameters
• Evaluate model using test data
• Test data is used to assess how classifier generalizes to new data
Decision Tree
• Tree shaped diagram used to determine a course
of action
• Each branch of the tree represents a possible
decision, occurrence or reaction
Decision Tree
• Classification Tree: output is a category (True/False, Yes/No)
• Regression Tree: output is a predicted numerical value
Advantages of Decision tree
• Simple to understand, interpret and visualize
• Little effort is required for data preparation
• Can handle both numerical and categorical
data
• Nonlinear parameters do not affect its
performance
Disadvantages of Decision tree
Important Terms
How does a Decision Tree work?
• The condition that gives the highest gain is used to make the first split
• The condition with the maximum information gain is color = yellow
How does a Decision Tree work?
• The tree can predict all classes of animals with 100% accuracy
• Each leaf node has a single label type, therefore its entropy is zero
Decision Tree: Example
• Will John play tennis or not?
• New data D15: Outlook = Rain, Humidity = High, Wind = Weak, Play = ?
Decision Tree: Example
• New data D15: Outlook = Rain, Humidity = High, Wind = Weak
• Will John play tennis or not?
• When it rains with weak wind, irrespective of humidity, John plays tennis
Decision Tree: Example
• How to decide which item to split
1. Determine entropy
2. Determine information gain
3. Identify which item shows maximum
information gain
4. Split identified data
Decision Tree Weekend Example
10 training instances
Cinema: 6, Tennis: 2, Stay at home: 1, Go to shopping: 1
Decision Tree Weekend Example
• 10 training instances
Cinema: 6, Tennis: 2, Stay at home: 1, Go to shopping: 1
• Entropy(T)
= -(6/10)log2(6/10) –(2/10)log2(2/10) –(1/10)log2(1/10) –(1/10)log2(1/10)
= 1.571
• Gain(T, Weather) = ?
• Values of weather
• Sunny: 3 (1 cinema, 2 Tennis)
• Windy: 4 (3 Cinema, 1 Shopping)
• Rainy: 3 (2 Cinema, 1 Stay in)
• Entropy for each weather
• Entropy (TSunny) = -(1/3)log2(1/3)–(2/3)log2(2/3)
= 0.918
• Entropy (TWindy) = -(3/4)log2(3/4)–(1/4)log2(1/4)
= 0.811
• Entropy (TRainy) = -(2/3)log2(2/3)–(1/3)log2(1/3)
= 0.918
Decision Tree Weekend Example
• Entropy(T) = 1.571
• Entropy (TSunny) = 0.918
• Entropy (TWindy) = 0.811
• Entropy (TRainy) = 0.918
• Sunny: 3
• Windy: 4
• Rainy: 3
• Gain(T, Weather)
= Entropy(T) – P(Sunny)Entropy(TSunny) – P(Windy)Entropy(TWindy)
– P(Rainy)Entropy(TRainy)
= 1.571 – (3/10)x0.918 – (4/10)x0.811 – (3/10)x0.918
= 0.70
Decision Tree Weekend Example
• Gain(T, Parents) = ?
• Yes: 5 (5 cinema)
• No: 5 (2 Tennis, 1 Cinema, 1 shopping, 1 stay in)
• Entropy (Tyes) = -(5/5)log2(5/5) = 0
• Entropy (Tno)
= -(2/5)log2(2/5)–(1/5)log2(1/5) –(1/5)log2(1/5)–(1/5)log2(1/5)
= 1.922
• Gain(T, Parents)
= Entropy(T) – P(yes) Entropy (Tyes) – P(no) Entropy (Tno)
= 1.571 – (5/10)x0 – (5/10)x 1.922
= 0.61
Decision Tree Weekend Example
• Gain(T, money) = ?
• Rich: 7 (3 cinema, 2 Tennis, 1 shopping, 1 stay in)
• Poor: 3 (3 Cinema)
• Entropy (Trich)
= -(3/7)log2(3/7)–(2/7)log2(2/7) –(1/7)log2(1/7)–(1/7)log2(1/7)
= 1.842
• Entropy (Tpoor)
= -(3/3)log2(3/3) = 0
• Gain(T, Money)
= Entropy(T) – P(rich) Entropy (Trich) – P(poor) Entropy (Tpoor)
= 1.571 – (7/10)x1.842 – (3/10)x 0
= 0.2816
Decision Tree Weekend Example
• Gain(T, Weather) = 0.70
• Gain(T, Parents) = 0.61
• Gain(T, money) = 0.2816
• Information gain is maximum for weather
[Tree diagram: root node Weather with branches sunny, windy and rainy]
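The entropy and information-gain computations above can be reproduced with a short helper; the class counts are taken from the weekend data:

```python
import math

def entropy(counts):
    """Shannon entropy of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, child_counts):
    """Parent entropy minus the weighted entropies of the children."""
    total = sum(parent_counts)
    weighted = sum(sum(c) / total * entropy(c) for c in child_counts)
    return entropy(parent_counts) - weighted

# Weekend data: Cinema 6, Tennis 2, Stay in 1, Shopping 1
parent = [6, 2, 1, 1]
g_weather = info_gain(parent, [[1, 2], [3, 1], [2, 1]])  # sunny, windy, rainy
g_parents = info_gain(parent, [[5], [2, 1, 1, 1]])       # yes, no
g_money   = info_gain(parent, [[3, 2, 1, 1], [3]])       # rich, poor

# Information gain is maximum for weather, matching the slides
print(round(g_weather, 2), round(g_parents, 2), round(g_money, 2))
```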
Decision Tree Weekend Example
Decision Tree Weekend Example
• 3 training instances for weather: sunny
Cinema: 1, Tennis: 2
• Entropy(Sunny)
= -(1/3)log2(1/3) –(2/3)log2(2/3)
= 0.918
• Gain(Sunny, Parents) = ?
• yes: 1 (1 cinema)
• no: 2 (2 tennis)
• Gain(Sunny, parents)
= Entropy(Sunny) – P(Sunny_yes) Entropy (sunny_yes) –
P(sunny_no)Entropy (sunny_no)
= 0.918 – (1/3)x0 – (2/3)x 0
= 0.918
Decision Tree Weekend Example
• Gain(Sunny, money) = ?
• rich: 3 (1 cinema, 2 tennis)
• poor: 0
• Gain(Sunny, money)
= Entropy(Sunny) – P(Sunny_rich) Entropy (sunny_rich) –
P(sunny_poor) Entropy (sunny_poor)
= 0.918 – (3/3)x0.918 – (0/3)x 0
=0
• Gain(sunny, parents) is more than gain(sunny, money)
Decision Tree Weekend Example
[Tree diagram: Weather at the root with branches sunny, windy and rainy;
the sunny branch splits on Parents: yes → cinema, no → tennis]
Unsupervised Learning
• Find hidden structure in unlabelled data
• Since the data is not labeled, there is no error signal for
training the model
• It summarizes and produces key features of
the data
• It is a process in which
• Build an algorithm based on input data
• Test it to create classifiers
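As a sketch of finding hidden structure without labels: a minimal k-means clustering on made-up 1-D data with k = 2 (the data points and initial centres are assumptions for illustration):

```python
import numpy as np

# Made-up unlabelled data with two visible groups
data = np.array([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])

centers = np.array([0.0, 5.0])   # arbitrary initial centre guesses
for _ in range(10):
    # Assign each point to its nearest centre
    labels = np.abs(data[:, None] - centers[None, :]).argmin(axis=1)
    # Move each centre to the mean of its assigned points
    centers = np.array([data[labels == k].mean() for k in range(2)])

print(centers)  # the two group centres, found without any labels
```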
Reinforcement Learning
• Acts to maximize reward in a particular situation
• There is no output available
• Reinforcement decides what to do in the absence of
training dataset
• Learns from experience
• Process in which
• Build an algorithm based on input data
• State is dependent on the input data
• User rewards or punishes algorithm
• Algorithm learns from the reward or punishment and
updates itself
• Repeats this process for remaining inputs
• Learns from real time data
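The reward-driven loop above can be sketched as a tiny two-armed bandit; the payoff probabilities 0.3 and 0.8 are made up, and the reward function plays the role of the user rewarding or punishing the algorithm:

```python
import random

random.seed(0)  # for reproducibility

def reward(arm):
    """Environment pays 1 with a hidden probability per arm."""
    return 1.0 if random.random() < (0.3, 0.8)[arm] else 0.0

values = [0.0, 0.0]   # current reward estimate for each arm
counts = [0, 0]

for _ in range(1000):
    # Explore 10% of the time, otherwise exploit the best-known arm
    if random.random() < 0.1:
        arm = random.randrange(2)
    else:
        arm = values.index(max(values))
    r = reward(arm)
    counts[arm] += 1
    # Update the estimate incrementally from the reward signal
    values[arm] += (r - values[arm]) / counts[arm]

# The agent learns from rewards alone that arm 1 is better
print(values)
```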