Regression & Regularization
Botao Jiao
Review
• Last week:
• Binary classification applications
• Evaluating classification models
• Artificial neurons
Binary Classification
Distinguish 2 classes
For example, we might want to classify data into categories like yes/no, true/false, or
positive/negative. The key idea here is that there are only two possible outcomes, and our
task is to decide which one the data belongs to.
Evaluation Methods: Confusion Matrix
[Figure: 2x2 confusion matrix — Predicted (Spam, Trusted) vs. Actual (Spam, Trusted)]
• TP (True Positive): the model correctly predicts the positive class. For example, it correctly marks a spam email as spam.
• TN (True Negative): the model correctly predicts the negative class, like marking a non-spam email as not spam.
• FP (False Positive): the model wrongly predicts the positive class, like marking a non-spam email as spam.
• FN (False Negative): the model wrongly predicts the negative class, like marking a spam email as not spam.
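The four counts above can be sketched directly from paired labels; the example emails and labels below are made up (1 = spam/positive, 0 = trusted/negative).

```python
# Hypothetical labels: 1 = spam (positive class), 0 = trusted (negative class)
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # spam marked spam
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # trusted marked trusted
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # trusted marked spam
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # spam marked trusted

print(tp, tn, fp, fn)  # 2 2 1 1
```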
Artificial Neuron
Python Machine Learning; Raschka & Mirjalili
Perceptron: Model (Linear Threshold Unit)
By connecting several neurons, we create a network that can process more information and make more
accurate decisions. This is how we build powerful models that can solve difficult problems, like recognizing
images or understanding speech.
Frank Rosenblatt, The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory, 1957
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
Today’s Focus: Regression
Predict continuous value
Predict Price to Charge for Your Home
Predict Future Stock Price
Predict Credit Score for Loan Lenders
Demo: https://www.youtube.com/watch?time_continue=6&v=0bEJO4Twgu4&feature=emb_logo
https://emerj.com/ai-sector-overviews/artificial-intelligence-applications-lending-loan-management/
What Else to Predict?
• Insurance cost
• Public opinion
• Popularity of social media posts
• Factory analysis
• Call center complaints
• Class ratings
• Weather
• Animal behavior
Classification vs. Regression
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
Example:
Cost: $1,045,864 $918,000 $450,900 $725,000
1. Split data into a “training set” and a “test set”
Training Data Test Data
Example:
Cost: $1,045,864 $918,000 $450,900 $725,000
2. Train model on “training set” to try to minimize prediction error on it
Training Data
Example:
Cost: $1,045,864 $918,000 $450,900
3. Apply trained model on the “test set” to measure generalization error
Test Data
Example:
Prediction Model
Cost: $725,000
Predicted Cost: ?
Regression Evaluation Metrics
• Mean absolute error: the average of |predicted − actual| over all test samples
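Mean absolute error can be sketched directly from its definition; the true and predicted home prices below are hypothetical.

```python
# Hypothetical actual and predicted home prices
y_true = [1_045_864, 918_000, 450_900]
y_pred = [1_000_000, 950_000, 400_000]

# Mean absolute error: average of |actual - predicted|
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(mae, 2))  # average miss in dollars
```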
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
Matrices and Vectors
• X : each feature is in its own column and each sample is in its own row
• y : each row is the target value for the sample
Feature 1 Feature 2 Feature M Label
Sample 1: 0.7 100 0.81 0.8
Sample N: 0.5 121 0.3 0.1
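The X/y layout on the slide can be sketched in NumPy, using the two sample rows shown above.

```python
import numpy as np

# Rows are samples, columns are features (layout from the slide's table)
X = np.array([[0.7, 100, 0.81],   # sample 1
              [0.5, 121, 0.30]])  # sample N
y = np.array([0.8, 0.1])          # one target value per row of X

print(X.shape)  # (2, 3): 2 samples, 3 features
```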
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
Linear Regression Model
• General formula: ŷ = w[0]·x[0] + w[1]·x[1] + … + w[p]·x[p] + b
Feature vector: x = x[0], x[1], …, x[p]
• How many features are there?
• p+1
Parameter vector to learn: w = w[0], w[1], …, w[p], plus bias b
• How many parameters are there?
• p+2 (p+1 weights plus the bias b)
Predicted value: ŷ
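The general formula can be evaluated as a weighted sum plus a bias; the feature values, weights, and bias below are made-up numbers for illustration.

```python
# ŷ = w[0]*x[0] + ... + w[p]*x[p] + b, with p+1 = 3 features here
x = [0.7, 100.0, 0.81]   # hypothetical feature vector
w = [2.0, 0.01, -1.0]    # hypothetical weights (p+1 of them)
b = 0.5                  # bias, giving p+2 parameters in total

y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b
print(y_hat)  # 2*0.7 + 0.01*100 - 1*0.81 + 0.5 = 2.09
```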
“Simple” Linear Regression Model
• Formula (line): ŷ = w·x + b
Feature vector: a single feature x; target: y
• How many features are there?
• 1
Parameter vector to learn: w and b
• How many parameters are there?
• 2 (slope w and intercept b)
[Figure: predicted value ŷ plotted against feature x]
Figure Credit: http://sli.ics.uci.edu/Classes/2015W-273a?action=download&upname=04-linRegress.pdf
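A least-squares fit of this one-feature line can be sketched with NumPy's degree-1 `polyfit`; the data points below are hypothetical.

```python
import numpy as np

# Hypothetical points lying roughly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Degree-1 polynomial fit returns [slope, intercept] for ŷ = w*x + b
w, b = np.polyfit(x, y, deg=1)
print(w, b)  # slope 1.94, intercept 1.15
```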
“Multiple” Linear Regression Model
• Formula (plane): ŷ = w[0]·x[0] + w[1]·x[1] + b
Feature vector: x = x[0], x[1]
• How many features are there?
• 2
Parameter vector to learn: w = w[0], w[1], plus bias b
• How many parameters are there?
• 3
[Figure: predicted value ŷ as a plane over features x[0] and x[1]]
Figure Credit: http://sli.ics.uci.edu/Classes/2015W-273a?action=download&upname=04-linRegress.pdf
Linear Regression Model: What to Learn?
• Weight coefficients:
• Indicates how much the predicted value will vary when that feature varies
while holding all the other features constant
Linear Regression Model: Learning Parameters
• Great interactive demo:
https://www.nctm.org/Classroom-Resources/Illuminations/Interactives/Line-of-Best-Fit/
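The parameter learning the demo shows interactively can be sketched as gradient descent on mean squared error; the toy data, learning rate, and iteration count below are arbitrary choices.

```python
import numpy as np

# Toy data generated exactly from y = 2x + 1, so the fit should recover w=2, b=1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0
lr = 0.05
for _ in range(5000):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)  # d(MSE)/dw
    grad_b = 2 * np.mean(y_hat - y)        # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 3), round(b, 3))  # approaches 2.0 and 1.0
```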
Today’s Topics
• Regression applications
• Evaluating regression models
• Background: notation
• Linear regression
• Polynomial regression
• Regularization (Ridge regression and Lasso regression)
Linear Models: When They Are Not Good
Enough, Increase Representational Capacity
[Figure: fits of increasing capacity — linear equations (lowest capacity), polynomial equations (higher capacity), polynomial equations (highest capacity)]
Polynomial Regression: Transform Features
to Model Non-Linear Relationships
• e.g., (Recall) Formula: ŷ = w·x + b
• e.g., New Formula: ŷ = w[0]·x + w[1]·x² + b
• Still a linear model (linear in the parameters w and b)!
• But can now model more complex relationships!!
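The transformation can be sketched as least squares on expanded features; the quadratic data below is hypothetical, and the transform x → [x, x²] lets a linear solver capture it.

```python
import numpy as np

# Hypothetical data with an exact quadratic relationship y = x^2
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x**2

# Transform features, then solve an ordinary linear least-squares problem
A = np.column_stack([np.ones_like(x), x, x**2])  # columns: intercept, x, x^2
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # intercept ~0, weight on x ~0, weight on x^2 ~1
```

The model is still linear in `coef`; only the features were made non-linear.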
Polynomial Regression Model:
What Feature Transformation to Use?
• Why does train error shrink and test error grow?
• The higher the polynomial order, the more the model “overfits” the training data, since it can model noise. Models that capture noise generalize poorly to new test data.
• What polynomial order should you use?
Let’s watch a video on the Polynomial Regression
Model to help us understand it better.
Polynomial Regression for Machine Learning
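The train-error-shrinks effect described above can be sketched numerically; the sine data, noise level, and polynomial degrees below are arbitrary choices for illustration.

```python
import numpy as np

# Noisy samples of a smooth function; train and test points are disjoint
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0.05, 0.95, 10)
f = lambda x: np.sin(2 * np.pi * x)
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = f(x_test) + rng.normal(0, 0.2, x_test.size)

def errors(deg):
    """Train/test mean squared error for a degree-`deg` polynomial fit."""
    coefs = np.polyfit(x_train, y_train, deg)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_err, test_err

low = errors(3)   # moderate capacity
high = errors(9)  # degree 9 can interpolate all 10 training points
print(low, high)  # training error shrinks as degree grows; test error can grow
```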