
Lecture 2: Supervised Learning - MCQ Study Guide
Key Concepts Explained Simply
Linear Regression
What is Linear Regression? Linear regression finds the best straight line
that relates input features to a continuous output value. It’s like drawing the
best-fitting line through scattered points.

The Formula: y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε

Where:
• y is what we’re trying to predict
• x₁, x₂, etc. are the input features
• β₀ is the y-intercept (bias term)
• β₁, β₂, etc. are the coefficients (weights)
• ε is the error term

How It Works
1. Start with random coefficients
2. Calculate predictions using current coefficients
3. Measure the error (how far predictions are from actual values)
4. Adjust coefficients to reduce error
5. Repeat until error is minimized

Cost Function: Mean Squared Error (MSE)
MSE = (1/n) × Σ(y_actual - y_predicted)²
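
The five numbered steps and the MSE cost function come together in the short NumPy sketch below. This is illustrative only, not the lecture’s own code: the toy dataset, learning rate, and iteration count are all made-up choices.

```python
import numpy as np

# Toy data: y is roughly 3x + 5 plus a little noise (made up for illustration)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([8.1, 10.9, 14.2, 16.8, 20.1])

b0, b1 = 0.0, 0.0   # step 1: start with arbitrary coefficients
lr = 0.01           # learning rate (step size), chosen by hand

for _ in range(5000):
    y_pred = b0 + b1 * X              # step 2: predictions with current coefficients
    error = y_pred - y                # step 3: how far off the predictions are
    mse = np.mean(error ** 2)         # MSE = (1/n) × Σ(error)²
    b0 -= lr * 2 * np.mean(error)     # step 4: nudge coefficients against the gradient
    b1 -= lr * 2 * np.mean(error * X)

# step 5: after enough repetitions the coefficients settle near b0 ≈ 5, b1 ≈ 3
print(f"b0 ≈ {b0:.2f}, b1 ≈ {b1:.2f}, MSE ≈ {mse:.4f}")
```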

Types of Linear Regression


• Simple Linear Regression: One input feature
• Multiple Linear Regression: Multiple input features
• Polynomial Regression: Adds polynomial terms (x², x³, etc.)

Assumptions
1. Linearity: Relationship between X and Y is linear
2. Independence: Observations are independent
3. Homoscedasticity: Constant variance in errors
4. Normality: Errors are normally distributed

Logistic Regression
What is Logistic Regression? Despite its name, logistic regression is used
for classification, not regression. It predicts the probability that an instance
belongs to a particular class.

The Formula: P(y=1) = 1 / (1 + e^(-z))

Where:
• z = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ
• P(y=1) is the probability of the positive class

How It Works
1. Calculate z (linear combination of features)
2. Apply sigmoid function to get probability between 0 and 1
3. If probability > 0.5, predict class 1; otherwise, predict class 0
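
As a rough illustration of those three steps (assuming NumPy; the coefficients and example instances below are hypothetical):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, beta0, betas, threshold=0.5):
    z = X @ betas + beta0                    # step 1: linear combination of features
    p = sigmoid(z)                           # step 2: probability of the positive class
    return (p > threshold).astype(int), p    # step 3: threshold at 0.5

# Hypothetical coefficients and two example instances
labels, probs = predict(np.array([[1.0, 2.0], [-1.0, 0.5]]),
                        beta0=-0.5, betas=np.array([1.2, 0.8]))
print(labels, probs.round(3))   # [1 0] [0.909 0.214]
```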

Cost Function: Log Loss (Binary Cross-Entropy)
J(θ) = -(1/n) × Σ[y × log(p) + (1-y) × log(1-p)]

Types
• Binary Logistic Regression: Two classes
• Multinomial Logistic Regression: More than two classes

Decision Trees
What is a Decision Tree? A decision tree is a flowchart-like structure where
each internal node represents a decision based on a feature, each branch
represents an outcome, and each leaf node represents a class label or value.

How It Works
1. Select the best feature to split the data
2. Create branches based on feature values
3. Repeat for each branch until a stopping criterion is met

Splitting Criteria
• For Classification:
– Gini Impurity: Measures how often a randomly chosen element would
be incorrectly labeled
– Entropy: Measures the impurity or uncertainty
• For Regression:
– Variance Reduction: Minimize the variance in each node
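
Both classification criteria can be computed directly from the class proportions in a node, matching the Gini and entropy formulas listed at the end of this guide. A small NumPy sketch (the example labels are invented):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - Σ(p_i)² over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -Σ[p_i × log₂(p_i)] over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

node = ["cat", "cat", "dog", "dog", "dog"]   # a node with a 2/3 class split
print(gini(node), entropy(node))             # ≈ 0.48 and ≈ 0.971
```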

Advantages
• Easy to understand and interpret
• Requires little data preparation
• Can handle both numerical and categorical data

Disadvantages
• Can create overly complex trees that don’t generalize well
• Can be unstable (small changes in data can result in very different trees)

Random Forest
What is a Random Forest? A random forest is an ensemble of decision trees.
It builds multiple trees and merges their predictions to get a more accurate and
stable result.

How It Works
1. Create multiple decision trees using random subsets of data
2. Each tree also uses a random subset of features
3. For classification: Take majority vote from all trees
4. For regression: Average the predictions from all trees

Key Concepts
• Bagging (Bootstrap Aggregating): Training each tree on a random
sample of data
• Feature Randomness: Each tree considers only a random subset of
features
• Out-of-Bag Error: Error estimate using samples not used for training
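
As a sketch of how these pieces fit together in practice, the snippet below uses scikit-learn’s RandomForestClassifier on synthetic data (assuming scikit-learn is installed; the dataset and parameter values are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # build 100 trees on bootstrap samples (bagging)
    max_features="sqrt",   # each split considers a random subset of features
    oob_score=True,        # estimate error from out-of-bag samples
    random_state=0,
)
forest.fit(X, y)
print("Out-of-bag accuracy:", forest.oob_score_)
```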

Advantages
• More accurate than a single decision tree
• Less overfitting
• Can handle large datasets with higher dimensionality

Support Vector Machines (SVM)


What is SVM? SVM finds the optimal hyperplane that maximizes the
margin between different classes.

Key Concepts
• Hyperplane: Decision boundary that separates classes
• Margin: Distance between hyperplane and closest data points
• Support Vectors: Data points closest to the hyperplane
• Kernel Trick: Transforms data into higher dimensions to find separable
boundaries

Common Kernels
• Linear: For linearly separable data

• Polynomial: For curved boundaries
• Radial Basis Function (RBF): For complex, non-linear boundaries
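
A quick way to compare the kernels is to fit scikit-learn’s SVC with each one on a non-linear toy dataset (assumes scikit-learn; the dataset and C value are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two interleaving half-moons: not linearly separable
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, C=1.0)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{kernel:>6} kernel: mean CV accuracy = {score:.3f}")
```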

Advantages
• Effective in high-dimensional spaces
• Memory efficient (uses only a subset of training points)
• Versatile (different kernel functions for various decision boundaries)

K-Nearest Neighbors (KNN)


What is KNN? KNN classifies a data point based on how its neighbors are
classified. It’s like determining someone’s traits based on the people they hang
out with.

How It Works
1. Choose a value for K (number of neighbors)
2. Calculate distance between new point and all training points
3. Select K points with smallest distances
4. For classification: Take majority vote
5. For regression: Take average value
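
The steps above fit in a short from-scratch sketch (assuming NumPy; the training points and K value are made up for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Step 3: indices of the k closest points
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote among their labels
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [7, 7]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))  # -> 0
```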

Distance Metrics
• Euclidean Distance: Straight-line distance
• Manhattan Distance: Sum of absolute differences
• Minkowski Distance: Generalization of Euclidean and Manhattan

Choosing K
• Small K: More sensitive to noise
• Large K: Smoother decision boundaries but may miss important patterns

Advantages
• Simple to implement
• No training phase
• Works well for multi-class problems

Disadvantages
• Computationally expensive for large datasets
• Sensitive to irrelevant features
• Requires feature scaling

Model Evaluation for Classification
Confusion Matrix

                  Predicted Positive     Predicted Negative
Actual Positive   True Positive (TP)     False Negative (FN)
Actual Negative   False Positive (FP)    True Negative (TN)

Metrics
• Accuracy: (TP + TN) / (TP + TN + FP + FN)
• Precision: TP / (TP + FP) - How many selected items are relevant?
• Recall (Sensitivity): TP / (TP + FN) - How many relevant items are
selected?
• F1 Score: 2 × (Precision × Recall) / (Precision + Recall)
• Specificity: TN / (TN + FP)
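
A small helper like the one below (illustrative, not from the lecture) computes all of these from the four confusion-matrix counts; the example uses the same counts as Problem 3 later in this guide:

```python
def classification_metrics(tp, tn, fp, fn):
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1

print(classification_metrics(tp=80, tn=70, fp=20, fn=30))
# (0.75, 0.8, 0.727..., 0.777..., 0.761...)
```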

ROC Curve and AUC


• ROC Curve: Plots True Positive Rate vs. False Positive Rate
• AUC: Area Under the ROC Curve (higher is better)
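
If scikit-learn is available, both can be obtained from true labels and predicted probabilities; the labels and scores below are invented for illustration:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# True labels and predicted probabilities from some hypothetical classifier
y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points on the ROC curve
print("AUC =", roc_auc_score(y_true, y_score))       # 0.9375 for this toy example
```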

Cross-Validation
• K-fold: Split data into k subsets, train on k-1 and test on the remaining
one
• Stratified K-fold: Maintains the same class distribution in each fold
• Leave-One-Out: Special case where k equals the number of samples
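
For instance, stratified 5-fold cross-validation with scikit-learn might look like the sketch below (assumes scikit-learn; the iris dataset and logistic regression model are just convenient stand-ins):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5 folds, keeping the same class distribution in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores, scores.mean())
```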

MCQ Practice Questions


Question 1
Which of the following algorithms uses a hyperplane to separate
classes? - A) Decision Trees - B) K-Nearest Neighbors - C) Support Vector
Machines - D) Random Forest
Answer: C) Support Vector Machines
Explanation: SVMs work by finding the optimal hyperplane that maximizes
the margin between different classes.

Question 2
In logistic regression, what function transforms the linear combination
of features into a probability? - A) Exponential function - B) Sigmoid
function - C) Hyperbolic tangent - D) Identity function

Answer: B) Sigmoid function
Explanation: The sigmoid function (1/(1+e^(-z))) transforms any real number
into a value between 0 and 1, which can be interpreted as a probability.

Question 3
What is the main difference between a Random Forest and a single
Decision Tree? - A) Random Forest can only handle classification problems
- B) Random Forest uses multiple trees and aggregates their predictions - C)
Decision Trees are more computationally efficient - D) Random Forest doesn’t
require feature selection
Answer: B) Random Forest uses multiple trees and aggregates their predictions
Explanation: A Random Forest is an ensemble method that builds multiple
decision trees and combines their outputs for better accuracy and stability.

Question 4
Which metric is most appropriate when dealing with imbalanced
classes? - A) Accuracy - B) F1 Score - C) Mean Squared Error - D) R-squared
Answer: B) F1 Score
Explanation: F1 Score is the harmonic mean of precision and recall, making
it useful for imbalanced datasets where accuracy might be misleading.
Why Use F1 Score? The F1 Score balances two things: Precision (how many
predicted positives are correct?) and Recall (how many actual positives did
you catch?).
Question 5
In K-Nearest Neighbors, what happens as K increases? - A) The model
becomes more complex and prone to overfitting - B) The decision boundary
becomes smoother and less sensitive to noise - C) The computational cost decreases
- D) The model becomes more accurate regardless of the dataset
Answer: B) The decision boundary becomes smoother and less sensitive to
noise
Explanation: Larger K values consider more neighbors, resulting in smoother
decision boundaries that are less affected by individual noisy data points.

Question 6
What is the cost function commonly used in linear regression? - A)
Log Loss - B) Hinge Loss - C) Mean Squared Error - D) Cross-Entropy
Answer: C) Mean Squared Error
Explanation: Mean Squared Error (MSE) measures the average squared difference
between predicted and actual values, making it appropriate for regression
problems.

Other options (incorrect for linear regression):
A) Log Loss: Used in logistic regression (classification).
B) Hinge Loss: Used in SVM (Support Vector Machines).
D) Cross-Entropy (Log Loss): Also for classification, not regression.


Question 7
Which of the following is NOT an assumption of linear regression? -
A) Linearity - B) Independence of errors - C) Homoscedasticity - D) Categorical
target variable
Answer: D) Categorical target variable
Explanation: Linear regression assumes a continuous target variable. For
categorical targets, classification algorithms like logistic regression are more
appropriate.

Question 8
What is the “kernel trick” in Support Vector Machines? - A) A method
to reduce computational complexity - B) A technique to transform data into
higher dimensions without explicitly computing the transformation - C) A way
to combine multiple SVMs - D) A method to select the most important features
Answer: B) A technique to transform data into higher dimensions without
explicitly computing the transformation
Explanation: The kernel trick allows SVMs to operate in a high-dimensional
feature space without actually computing the coordinates of the data in that
space, making non-linear classification more efficient.

Calculation Problems
Problem 1: Linear Regression Prediction
Given a linear regression model with the equation y = 3x + 5, what
would be the predicted value for x = 4?
Solution:
y = 3x + 5
y = 3(4) + 5
y = 12 + 5
y = 17

Problem 2: Logistic Regression Probability


In a logistic regression model, if z = β₀ + β₁x₁ + β₂x₂ = 2, what is the
probability P(y=1)?
Solution:
P(y=1) = 1 / (1 + e^(-z))
P(y=1) = 1 / (1 + e^(-2))
P(y=1) = 1 / (1 + 0.135)
P(y=1) = 1 / 1.135
P(y=1) ≈ 0.881 or 88.1%
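
The same arithmetic can be checked with a couple of lines of Python (illustrative only):

```python
import math

z = 2
p = 1 / (1 + math.exp(-z))   # sigmoid of the linear combination
print(round(p, 3))           # 0.881
```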

Problem 3: Confusion Matrix Metrics


A classification model produces the following confusion matrix:
• True Positives (TP): 80
• False Positives (FP): 20
• False Negatives (FN): 30
• True Negatives (TN): 70
Calculate the accuracy, precision, recall, and F1 score.
Solution:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (80 + 70) / (80 + 70 + 20 + 30) = 150 / 200 = 0.75 or 75%
Precision = TP / (TP + FP) = 80 / (80 + 20) = 80 / 100 = 0.8 or 80%
Recall = TP / (TP + FN) = 80 / (80 + 30) = 80 / 110 = 0.727 or 72.7%
F1 Score = 2 × (Precision × Recall) / (Precision + Recall) = 2 × (0.8 × 0.727) / (0.8 + 0.727) = 1.164 / 1.527 = 0.762 or 76.2%

Problem 4: KNN Classification


In a K-Nearest Neighbors model with K=5, if the 5 nearest neighbors
to a new data point have classes [0, 1, 0, 1, 1], what class would be
assigned to the new point?
Solution: Count of class 0: 2 Count of class 1: 3 Since class 1 has the majority
(3 out of 5), the new point would be classified as class 1.

Key Formulas to Remember


1. Linear Regression: y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε
2. Mean Squared Error: MSE = (1/n) × Σ(y_actual - y_predicted)²
3. Logistic Regression: P(y=1) = 1 / (1 + e^(-z)) where z = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ
4. Log Loss: J(θ) = -(1/n) × Σ[y × log(p) + (1-y) × log(1-p)]
5. Accuracy: (TP + TN) / (TP + TN + FP + FN)
6. Precision: TP / (TP + FP)
7. Recall: TP / (TP + FN)
8. F1 Score: 2 × (Precision × Recall) / (Precision + Recall)
9. Euclidean Distance: √[(x₁ - x₂)² + (y₁ - y₂)² + …]
10. Gini Impurity: 1 - Σ(p_i)²
11. Entropy: -Σ[p_i × log₂(p_i)]

Tips for MCQ Questions


1. Understand algorithm strengths and weaknesses: Know when to
use each algorithm.
2. Know the formulas: Be able to calculate basic metrics.
3. Understand model assumptions: Know what conditions must be met
for each model.
4. Practice with confusion matrices: Be comfortable calculating all met-
rics from a confusion matrix.
5. Visualize decision boundaries: Understand how different algorithms
create different types of boundaries.
