
Machine Learning Notes

Module 1
Applications of Machine Learning:
 Virtual Personal Assistants
 Speech recognition
 Email spam and malware filtering
 Bioinformatics
 Natural language processing
 Online Transportation
 Social Media Services
 Product Recommendation
 Online Fraud Detection
 Traffic prediction
Advantages of ML:
 Fast, accurate, and efficient.
 Automation of most applications.
 Wide range of real-life applications.
 Enhanced cyber security and spam detection.
 No human intervention is needed.
 Handles multi-dimensional data.
Disadvantages of ML:
 It is very difficult to identify and rectify errors.
 Data acquisition is a challenge.
 Interpretation of results requires more time and space.
Artificial Intelligence is the concept of creating intelligent machines that simulate human behavior, whereas Machine Learning is a subset of Artificial Intelligence that allows machines to learn from data without being explicitly programmed.
Disadvantages of Supervised Learning: not suitable for handling complex tasks; cannot predict the correct output if the test data differ from the training dataset; training requires a lot of computation time; and we need sufficient knowledge about the classes of objects.
Advantages of Unsupervised Learning: can be used for more complex tasks, since we don't need labeled input data; preferable because unlabeled data are easier to obtain than labeled data.
Disadvantages of Unsupervised Learning: intrinsically more difficult than supervised learning, as there is no corresponding output. The result may be less accurate because the input data are not labeled and the algorithm does not know the exact output in advance.

In machine learning projects, we generally divide the original dataset into a training dataset and a testing dataset. We train our model on a subset of the original dataset, i.e., the training dataset (>= 60%), and then evaluate whether it generalizes well to the new or unseen dataset, i.e., the test set (20-25%).
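A minimal sketch of such a split, assuming scikit-learn is available; the data and the 75/25 ratio are illustrative:

```python
from sklearn.model_selection import train_test_split
import numpy as np

X = np.arange(20).reshape(10, 2)    # 10 samples, 2 features (illustrative)
y = np.arange(10)                   # 10 target values

# Hold out 25% of the data as the unseen test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```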

If the accuracy of the model on the training data is much greater than on the testing data, the model is said to be overfitted. On the other hand, the model is said to be underfitted when it is not able to capture the underlying trend of the data, meaning it shows poor performance even on the training dataset. In most cases, underfitting occurs when the model is not suitable for the problem we are trying to solve. To avoid the underfitting issue, we can either increase the training time of the model or increase the number of features in the dataset.
Basic steps of cross-validations are:
 Reserve a subset of the dataset as a validation set.
 Provide the training to the model using the training dataset.
 Now, evaluate model performance using the validation set. If the model performs well on the validation set, proceed to the next steps; otherwise, check for issues.
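A minimal k-fold sketch of these steps, assuming scikit-learn; the model and data are illustrative:

```python
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.random.rand(100, 3)           # illustrative feature matrix
y = X @ np.array([1.0, 2.0, 3.0])    # illustrative targets

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X):
    model = LinearRegression()
    # Train on the training folds.
    model.fit(X[train_idx], y[train_idx])
    # Evaluate on the reserved validation fold.
    scores.append(model.score(X[val_idx], y[val_idx]))
print(sum(scores) / len(scores))     # mean validation R^2 across folds
```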
Hypothesis space (H): Defined as the set of all possible legal hypotheses; hence it is also known as a hypothesis set. It is searched by supervised machine learning algorithms to determine the best possible hypothesis to describe the target function, i.e., the one that best maps inputs to outputs.
Hypothesis (h): Defined as the approximate function that best describes the target in supervised machine learning algorithms. It is based primarily on the data, as well as on the bias and restrictions applied to the data.
Steps to perform hypothesis test are as follows:
1) Formulate a Hypothesis
2) Determine the significance level
3) Determine the type of test
4) Calculate the test statistic and the p-value. The p-value is a probability (between 0 and 1), computed under the assumption that the null hypothesis is true, of obtaining results at least as extreme as those observed.
5) Make Decision
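A minimal sketch of these steps using a one-sample t-test, assuming SciPy; the sample values and the hypothesized mean of 50 are illustrative:

```python
from scipy import stats
import numpy as np

sample = np.array([51.2, 49.8, 50.5, 52.1, 48.9, 51.7, 50.3, 49.5])

alpha = 0.05                                      # 2) significance level
# 3)-4) one-sample t-test of H0: population mean == 50
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
# 5) decision
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```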
Accuracy = Number of correct predictions / Total number of predictions
Confusion Matrix:
              Predicted: NO         Predicted: YES
Actual: NO    True Negative (TN)    False Positive (FP)
Actual: YES   False Negative (FN)   True Positive (TP)
Precision (P) = TP / (TP + FP)
Recall (R) = TP / (TP + FN)
F1 score = (2 × P × R) / (P + R)
If we maximize precision, it will minimize the FP errors, and if we maximize recall, it will
minimize the FN error.
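A minimal sketch computing these metrics directly from the confusion-matrix counts; the counts are illustrative:

```python
# Illustrative confusion-matrix counts.
TP, FP, FN, TN = 40, 10, 5, 45

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)  # 0.85 0.8 0.888... 0.842...
```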
AUC-ROC curve: Probability curve that plots TPR against FPR at various threshold values
and separates signal from noise.
TPR (Sensitivity) = TP / (TP + FN)
FPR = FP / (FP + TN)
Specificity = TN / (TN + FP)
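A minimal sketch of the curve's ingredients, assuming scikit-learn; the labels and predicted scores are illustrative:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]                    # actual classes
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]   # predicted probabilities

# FPR and TPR at each threshold; the AUC summarizes the whole curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(roc_auc_score(y_true, y_score))
```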
Performance metrics for regression:
Mean Absolute Error (MAE) = (1/N) Σ |Y − Y′|
Mean Squared Error (MSE) = (1/N) Σ (Y − Y′)²
R² (coefficient of determination) = 1 − MSE(model) / MSE(baseline) = 1 − SS_res / SS_tot
SS_res = Σᵢ (yᵢ − ŷᵢ)²
SS_tot = Σᵢ (yᵢ − ȳ)²
where yᵢ = observed values, ŷᵢ = predicted values, ȳ = mean of the observed values.
 R² = 1: perfect fit (explains all variance)
 R² = 0: explains none of the variance
 R² < 0: performs worse than simply predicting the mean
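A minimal NumPy sketch of these three metrics; the arrays are illustrative:

```python
import numpy as np

y      = np.array([3.0, 5.0, 7.0, 9.0])   # observed values
y_pred = np.array([2.8, 5.3, 6.6, 9.4])   # predicted values

mae = np.mean(np.abs(y - y_pred))          # mean absolute error
mse = np.mean((y - y_pred) ** 2)           # mean squared error
ss_res = np.sum((y - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
r2 = 1 - ss_res / ss_tot                   # coefficient of determination
print(mae, mse, r2)
```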

Module 2
Simple linear regression is used when you want to predict values of one variable given values of another variable. It is the same as a bivariate correlation between the independent and dependent variables.
The purpose of regression analysis is to come up with the equation of a line that fits through the cluster of points with the minimal amount of deviation from the line.
Each data point's deviation dᵢ is the difference between the observed y-value and the predicted y-value for a given x-value on the line. These differences are called residuals.
The regression line (line of best fit) is the line for which the sum of the squares of the residuals is minimal.
ŷ = mx + b
RMSE = √MSE
R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²
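A minimal sketch of fitting the regression line with the closed-form least-squares formulas; the data are illustrative:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

# Slope and intercept that minimize the sum of squared residuals.
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

y_hat = m * x + b                          # predictions on the fitted line
rmse = np.sqrt(np.mean((y - y_hat) ** 2))  # RMSE = sqrt(MSE)
print(m, b, rmse)
```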
Multiple regression equation: ŷ = b + m1x1 + m2x2 + m3x3 + … + mkxk, where the xi are the independent variables.
Logistic regression equation: p = 1 / (1 + e^−(b + m1x1 + … + mkxk))
Logistic regression can't use MSE as its loss because it produces a wavy (non-convex) cost graph; a logarithmic function is used instead.
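A minimal sketch of the sigmoid and the logarithmic (log-loss) cost it is paired with; the labels and scores are illustrative:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p):
    # Logarithmic loss; convex in the parameters, unlike MSE here.
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])                     # true labels
p = sigmoid(np.array([2.0, -1.5, 0.8, 3.0]))   # predicted probabilities
print(log_loss(y, p))
```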

Gradient descent by differentiating:
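Differentiating the MSE loss of ŷ = mx + b gives ∂MSE/∂m = −(2/N) Σ x(y − ŷ) and ∂MSE/∂b = −(2/N) Σ (y − ŷ). A minimal sketch of the resulting update loop; the learning rate, iteration count, and data are illustrative:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])   # true relation: y = 2x + 1

m, b, lr = 0.0, 0.0, 0.01            # initial parameters, learning rate
for _ in range(5000):
    y_hat = m * x + b
    # Gradients of MSE with respect to m and b.
    dm = -2 * np.mean(x * (y - y_hat))
    db = -2 * np.mean(y - y_hat)
    m -= lr * dm                     # step opposite the gradient
    b -= lr * db
print(m, b)                          # approaches 2 and 1
```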

A loss function measures the performance of a model as the difference between the expected output and the actual output produced by the model.
The optimizer helps improve the model by adjusting its parameters to minimize the loss
function value.
General Optimization Algorithm Structure:
 Initialize variables or population.
 Evaluate the objective function for current solutions.
 Update parameters using search strategy (gradient step, mutation, exploration, etc.).
 Check stopping criteria (max iterations, convergence, tolerance).
 Return the best solution.
Steepest Descent: In this method, the search starts from an initial trial point X1, and
iteratively moves along the steepest descent directions until the optimum point is found.
Newton’s method: Based on a second-order Taylor series expansion of the objective function; each update uses both the gradient and the Hessian: xₖ₊₁ = xₖ − H⁻¹ ∇f(xₖ).
Derivative-free optimization algorithms are often used when it is difficult to find function derivatives, or when finding such derivatives is time-consuming.
Random Search: This method generates trial solutions for the optimization model using random number generators for the decision variables. Random search methods include the random jump method, the random walk method, and the random walk method with direction exploitation.
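A minimal random jump sketch that instantiates the general optimization structure above (initialize, evaluate, update the best, stop at a max iteration count); the objective and bounds are illustrative:

```python
import numpy as np

def f(x):
    # Illustrative objective: minimum 0 at x = (1, 1).
    return (x[0] - 1) ** 2 + (x[1] - 1) ** 2

rng = np.random.default_rng(0)
best_x, best_f = None, np.inf
for _ in range(10000):                 # stopping criterion: max iterations
    x = rng.uniform(-5, 5, size=2)     # random trial solution
    fx = f(x)                          # evaluate the objective
    if fx < best_f:                    # keep the best solution so far
        best_x, best_f = x, fx
print(best_x, best_f)
```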
Simplex Method: A conventional direct search algorithm in which the best solution lies at the vertices of a geometric figure in N-dimensional space made of a set of N+1 points.
The Nelder–Mead method (also downhill simplex method, amoeba method, or polytope method) is a numerical method used to find the minimum or maximum of an objective function in a multidimensional space. The simplex is a triangle in 2D, a tetrahedron in 3D, and a pentachoron in 4D.
Steps: Sort → Reflect → Expand → Contract → Shrink → Check convergence
Example:
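A minimal sketch using SciPy's Nelder–Mead implementation; the objective (the classic Rosenbrock test function) and the starting point are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Rosenbrock function: minimum 0 at (1, 1).
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2

result = minimize(f, x0=np.array([-1.2, 1.0]), method="Nelder-Mead")
print(result.x, result.fun)   # typically converges near [1, 1]
```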
