PSCS511 – Machine Learning

The document provides a comprehensive overview of Machine Learning (ML), covering its definition, importance, types, challenges, and various algorithms like linear regression, SVM, and decision trees. It also discusses deep learning as a subset of ML, highlighting its structure and applications. Each section includes definitions, explanations, examples, and types where applicable, making it suitable for M.Sc. CS exams.

Below is PSCS511 – Machine Learning made exam-ready: easy to understand while still detailed enough for M.Sc. CS exams. Each topic follows this structure:

Definition
Explanation
Example (Simple + Real-world)
Types (if applicable)

UNIT 1: The Fundamentals of Machine Learning

1. What is Machine Learning?

Definition:
Machine Learning (ML) is a method where computers learn from data to make decisions without
being directly programmed.

Explanation:
Instead of giving step-by-step instructions, we give the computer a lot of examples. It finds
patterns on its own and uses them to make future predictions.

Example:
If we show a computer many pictures of cats and dogs, it learns what a cat looks like vs a dog.
Later, it can tell if a new picture is a cat or dog without being told how.

Types:

1. Supervised Learning

2. Unsupervised Learning

3. Reinforcement Learning

2. Need and Importance of Machine Learning

Definition:
ML is needed to solve problems where writing exact rules is difficult, or where the system
should improve over time.

Explanation:
Some tasks are too complex for normal programming (like recognizing faces or driving cars). ML
learns automatically and adapts as it gets more data.

Example:
Google Maps learns traffic patterns to suggest faster routes, based on data from millions of
phones.
3. Types of Machine Learning

A. Supervised Learning

Definition:
Learning from data that already has the right answers (labels).

Explanation:
You teach the model with examples: each input has a correct output. The model tries to learn
the link.

Example:
Emails labeled “spam” or “not spam” help the model learn how to classify new emails.

Types:

• Classification (output is a category like Yes/No, Male/Female)

• Regression (output is a number like salary or price)

B. Unsupervised Learning

Definition:
Learning from data without any labels or correct answers.

Explanation:
The model explores the data and finds patterns or groups on its own.

Example:
Netflix groups users with similar movie tastes even if they never said what genre they like.

Types:

• Clustering (grouping similar data)

• Dimensionality Reduction (simplifying data without losing meaning)

C. Reinforcement Learning

Definition:
A model learns by doing actions and receiving rewards or penalties.

Explanation:
It improves step by step by trying, failing, and learning—like a child learning to ride a bike.

Example:
A robot learns to walk by getting points when it moves forward and losing points when it falls.

Types:

• Model-free (Q-learning)

• Model-based (plans using known environment)


4. Challenges of Machine Learning

Definition:
Things that make ML hard to build or use effectively.

Explanation:

• Overfitting: model remembers training data too well but fails on new data

• Underfitting: model is too simple and can’t learn well

• Bad data: missing, incorrect, or imbalanced data can confuse models

• Interpretability: some models are like “black boxes” (hard to understand)

Example:
If a face-recognition app only trains on pictures of light-skinned people, it may fail on dark-
skinned faces — that’s biased data.

5. Testing and Validation

Definition:
Steps to check how well the model is working on new (unseen) data.

Explanation:

• Training Set: to teach the model

• Validation Set: to tune the model (like testing before the final test)

• Test Set: to evaluate the final model’s real accuracy

Example:
A student studies from a textbook (training), solves a practice test (validation), and then gives a
real exam (testing).
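A minimal sketch of this three-way split using scikit-learn (assuming scikit-learn is installed; the data and the 60/20/20 ratio are toy choices for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 3)              # 100 samples, 3 features (toy data)
y = np.random.randint(0, 2, size=100)   # toy binary labels

# First split off the test set, then carve a validation set from the rest.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```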

6. Classification

Definition:
Predicting a category (label) based on input data.

Explanation:
The model puts things into classes based on features.

Example:
Medical diagnosis: Given symptoms, classify as “flu” or “not flu”.

Types:

• Binary (Yes/No)

• Multiclass (e.g., digits 0–9)


7. MNIST Dataset

Definition:
A famous dataset of handwritten digits (0 to 9) used for image classification tasks.

Explanation:
It contains 70,000 images of numbers written by different people. Each image is 28x28 pixels.

Example:
Used to train models that can read postal codes or numbers from scanned forms.

8. Performance Measures

Definition:
Metrics used to check how good or bad a model is.

Explanation:

• Accuracy = correct predictions ÷ total predictions

• Precision = how many predicted positives were correct

• Recall = how many actual positives were found

• F1 Score = balance between precision and recall

Example:
If your model says 100 people have COVID, but only 80 actually do, precision = 80%.
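A short sketch computing all four metrics with scikit-learn (the label arrays are invented for illustration):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))    # 0.75
print("Precision:", precision_score(y_true, y_pred))   # 0.75
print("Recall   :", recall_score(y_true, y_pred))      # 0.75
print("F1 Score :", f1_score(y_true, y_pred))          # 0.75
```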

9. Confusion Matrix

Definition:
A table that shows how many predictions were right and wrong.

Explanation:

• TP (True Positive) = correctly predicted positive

• TN (True Negative) = correctly predicted negative

• FP (False Positive) = wrongly predicted as positive

• FN (False Negative) = wrongly predicted as negative

Example:

                    Predicted Positive   Predicted Negative
Actual Positive     TP (e.g., 90)        FN (e.g., 10)
Actual Negative     FP (e.g., 20)        TN (e.g., 80)
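A minimal sketch producing such a table with scikit-learn (toy labels; note that scikit-learn orders the matrix as [[TN, FP], [FN, TP]] by default):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)   # TP: 3 TN: 3 FP: 1 FN: 1
```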


10. Precision & Recall

Definition:

• Precision: How many predicted positives are correct

• Recall: How many actual positives were found

Explanation:
High precision = few false alarms
High recall = few missed cases

Example:
In cancer testing:

• High recall = catch most sick patients

• High precision = avoid falsely telling healthy people they’re sick

11. Precision-Recall Tradeoff

Definition:
Improving one (precision or recall) often worsens the other.

Explanation:
Changing the threshold (cut-off score) can make the model stricter or more generous.

Example:
A strict spam filter has high precision (only spam marked as spam), but may miss some spam
(low recall).
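A minimal sketch of the tradeoff: raising the threshold makes the model stricter (precision tends up, recall tends down). The dataset and model here are toy choices, not a prescribed setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

X, y = make_classification(n_samples=500, random_state=0)
clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]          # probability of the positive class

for threshold in (0.3, 0.5, 0.9):
    preds = (proba >= threshold).astype(int)
    print(threshold, precision_score(y, preds), recall_score(y, preds))
```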

12. ROC Curve

Definition:
A graph that shows how well the model separates the classes.

Explanation:
ROC stands for Receiver Operating Characteristic. It plots the true positive rate (TPR) against the false positive rate (FPR) at different classification thresholds.
AUC (Area Under the Curve) shows overall quality: closer to 1 = better model.

Example:
If model A has AUC 0.95 and model B has 0.65, model A is better.
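A sketch of computing the ROC points and AUC with scikit-learn (reusing a toy classification setup; values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=500, random_state=0)
proba = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

fpr, tpr, thresholds = roc_curve(y, proba)   # points of the ROC curve
print("AUC:", roc_auc_score(y, proba))       # closer to 1 = better separation
```
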
13. Multiclass Classification

Definition:
Classifying data into more than two categories.

Explanation:
Instead of just Yes/No, the model decides among 3 or more options.

Example:
Digit recognition (0 to 9) = 10 classes.

Techniques:

• One-vs-Rest: One class vs all others

• Softmax: Outputs probabilities for each class

14. Error Analysis

Definition:
Carefully studying where and why the model made mistakes.

Explanation:
Helps improve the model by finding patterns in errors.

Example:
If your digit model often confuses 4 and 9, maybe it needs better features to tell them apart.

Unit 1: DONE
UNIT 2: Training Models

1. Linear Regression

Definition:
A method to predict a continuous value based on the relationship between input (X) and output
(Y) using a straight line.

Explanation:
It fits the best straight line (Y = mX + b) through the data points by minimizing the gap between
predicted and actual values.

Example:
Predicting house prices based on size: bigger house → higher price.

Type:

• Simple Linear Regression (1 feature)

• Multiple Linear Regression (many features)
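A minimal sketch of simple linear regression with scikit-learn; the house-size data is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

size  = np.array([[50], [80], [120], [200]])   # house size in sq. metres (toy data)
price = np.array([100, 155, 240, 410])         # price (toy values)

model = LinearRegression().fit(size, price)
print("slope m:", model.coef_[0], "intercept b:", model.intercept_)
print("predicted price for 150 sqm:", model.predict([[150]])[0])
```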

2. Gradient Descent

Definition:
An optimization algorithm used to minimize the error in a model by updating weights step by
step.

Explanation:
It checks how the output error changes with small changes in model weights and adjusts them
to reduce the error.

Example:
Like rolling a ball down a hill to reach the lowest point (minimum error).

Types:

• Batch GD – entire dataset per update (accurate but slow)

• Stochastic GD (SGD) – one sample per update (fast but noisy)

• Mini-batch GD – small batches (balance between speed and accuracy)
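A tiny batch gradient descent loop for Y = wX + b on toy data, assuming MSE as the loss; the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * X + 3                          # true relationship: w=2, b=3

w, b, alpha = 0.0, 0.0, 0.05
for _ in range(1000):
    y_hat = w * X + b
    dw = (2 / len(X)) * np.sum((y_hat - y) * X)   # gradient of MSE w.r.t. w
    db = (2 / len(X)) * np.sum(y_hat - y)         # gradient of MSE w.r.t. b
    w -= alpha * dw                               # step downhill
    b -= alpha * db

print(round(w, 2), round(b, 2))        # approaches 2.0 and 3.0
```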


3. Polynomial Regression

Definition:
A type of regression that fits a curved line by adding powers of input (like X², X³).

Explanation:
Used when the data doesn't follow a straight-line trend but we still want to use a regression
technique.

Example:
Predicting salary based on experience where salary increases faster after some years.

Type:

• Quadratic Regression (X²)

• Cubic Regression (X³)
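A minimal quadratic-regression sketch: PolynomialFeatures adds an X² term so an ordinary linear model can fit a curve (toy data; degree=2 is an assumed choice):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = np.arange(1, 11).reshape(-1, 1)    # years of experience (toy data)
y = 2 * X.ravel() ** 2 + 5             # salary grows faster than linearly

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[12]]))           # ≈ 2*144 + 5 = 293
```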

4. Learning Curves

Definition:
A graph showing the model’s performance (error) on training and validation sets over time or
data size.

Explanation:
Helps in checking whether the model is overfitting or underfitting.

Example:
If training error is low and validation error is high → overfitting.

5. Bias–Variance Tradeoff

Definition:
A balance between two types of model errors: bias (too simple) and variance (too complex).

Explanation:

• High bias: model can't learn well → underfitting

• High variance: model memorizes data too much → overfitting

Example:
A student who guesses answers (high bias) vs. one who memorizes the book but fails
application (high variance).
6. Ridge Regression

Definition:
A regularized version of linear regression that adds an L2 penalty (squares of weights) to reduce
overfitting.

Explanation:
Keeps the model weights small to prevent it from fitting noise in the data.

Example:
Useful when input features are many and related (multicollinearity).

7. Lasso Regression

Definition:
Similar to ridge regression but uses an L1 penalty (absolute values of weights).

Explanation:
Can shrink some weights to zero, effectively removing unnecessary features (feature selection).

Example:
If only 5 out of 20 features really matter, lasso keeps those 5.
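A short sketch contrasting ridge (L2) and lasso (L1) on the same toy data; alpha controls penalty strength, and lasso may zero out the weights of uninformative features:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# 20 features, only 5 of which actually matter
X, y = make_regression(n_samples=100, n_features=20, n_informative=5, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("ridge zero weights:", np.sum(ridge.coef_ == 0))   # usually 0: shrinks, never removes
print("lasso zero weights:", np.sum(lasso.coef_ == 0))   # many exact zeros: feature selection
```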

8. Early Stopping

Definition:
A technique to stop training when the model starts overfitting (validation error increases).

Explanation:
It monitors validation performance and stops training before the model starts to memorize the training data.

Example:
Like stopping cooking when the food is perfectly done, not overcooked.
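A manual early-stopping sketch (toy data; SGDRegressor with warm_start is one way to train one epoch at a time, and may emit convergence warnings): keep training while validation error improves, and keep the best model once it starts rising:

```python
import copy
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

X = np.random.rand(200, 1)
y = 3 * X.ravel() + np.random.randn(200) * 0.1
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = SGDRegressor(max_iter=1, tol=None, warm_start=True, random_state=0)
best_err, best_model = float("inf"), None
for epoch in range(500):
    model.fit(X_train, y_train)                        # one more epoch (warm_start continues)
    val_err = np.mean((model.predict(X_val) - y_val) ** 2)
    if val_err < best_err:                             # remember the best model so far
        best_err, best_model = val_err, copy.deepcopy(model)
```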

9. Logistic Regression

Definition:
A classification algorithm that predicts the probability of a binary outcome using the sigmoid
function.

Explanation:
Instead of predicting numbers, it predicts a value between 0 and 1, which can be interpreted as
probability.

Example:
Predicting whether an email is spam (1) or not (0).

Types:

1. Binary Logistic Regression

2. Multiclass via Softmax Regression
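A minimal spam-vs-not-spam sketch with scikit-learn; the feature (count of suspicious words per email) and its values are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0], [1], [2], [3], [4], [5]])   # suspicious words per email (toy data)
y = np.array([0, 0, 0, 1, 1, 1])               # 1 = spam

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[4]])[0, 1])          # probability the email is spam
```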


10. Decision Boundaries

Definition:
A line (in 2D) or surface (in higher dimensions) that separates classes in classification tasks.

Explanation:
Based on model predictions, this is the line where the model decides whether an input belongs
to class A or B.

Example:
In a plot of height vs. weight, the boundary line can separate "fit" and "overweight" categories.

11. Softmax Regression

Definition:
A generalization of logistic regression for multiclass classification problems.

Explanation:
It gives a probability for each class and the class with the highest probability is selected.

Example:
Handwriting recognition from digits 0–9: softmax gives probabilities for each digit.

12. Cross-Entropy

Definition:
A loss function used in classification to measure the difference between predicted probabilities
and actual labels.

Explanation:
Helps the model learn how far off its predictions are, especially for probability-based outputs
like softmax.

Example:
If the model predicts 0.9 for class A but actual is class B, cross-entropy gives a high error.
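A numpy sketch of softmax followed by cross-entropy for one sample with three classes (the raw scores are arbitrary). The model favours class A while the true class is B, so the loss comes out high, matching the example above:

```python
import numpy as np

z = np.array([2.0, 1.0, 0.1])            # raw scores for classes A, B, C
probs = np.exp(z) / np.sum(np.exp(z))    # softmax: probabilities sum to 1

y_true = np.array([0, 1, 0])             # actual class is B (one-hot)
loss = -np.sum(y_true * np.log(probs))   # cross-entropy: high when the wrong class is favoured
print(probs, loss)                       # loss ≈ 1.42
```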

Unit 2: DONE
UNIT 3: Support Vector Machines (SVM) & Decision Trees

1. Support Vector Machine (SVM)

Definition:
SVM is a supervised learning algorithm used for classification and regression. It finds the best
boundary (hyperplane) that separates classes with the maximum margin.

Explanation:
It selects data points (support vectors) closest to the decision boundary and tries to maximize
the distance between them. This helps the model generalize better.

Example:
In classifying emails as spam or not, SVM will find the best dividing line (or surface) based on
features like words or links.

Types:

• Linear SVM

• Non-Linear SVM (with kernels)

• SVM for regression (SVR)
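A minimal linear SVM classification sketch with scikit-learn on toy blob data; C controls how strictly misclassifications are punished (the soft margin discussed below):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("support vectors per class:", clf.n_support_)
print(clf.predict([[0.0, 2.0]]))   # classify a new point
```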

2. Linear SVM Classification

Definition:
A type of SVM that works when the data can be separated by a straight line (or plane in higher
dimensions).

Explanation:
It finds a hyperplane (line or surface) that best separates the data into classes with maximum
margin.

Example:
Classifying apples and oranges based on size and color, when they are clearly separable.

3. Soft Margin Classification

Definition:
A variation of SVM that allows some misclassifications to prevent overfitting.

Explanation:
Not all data can be perfectly separated. Soft margin lets some points be on the wrong side of
the line but controls it using a parameter C.

Example:
In noisy email data, a few wrong classifications are allowed so the model doesn't become too
rigid.
4. Non-Linear SVM Classification

Definition:
An SVM that can classify data which is not linearly separable by using kernel functions.

Explanation:
It transforms input data into a higher-dimensional space where it becomes linearly separable.

Example:
Separating concentric circles (like a donut shape) which cannot be done using a straight line.

5. Polynomial Kernel

Definition:
A kernel function that maps input data into a higher-dimensional space using polynomial
equations.

Explanation:
It allows the SVM to create curved boundaries instead of straight lines.

Example:
If a linear boundary fails, a polynomial kernel can draw a curved line to separate overlapping
classes.

6. Gaussian RBF Kernel (Radial Basis Function)

Definition:
A kernel function that uses distance between points to measure similarity. It maps input data
into infinite-dimensional space.

Explanation:
The RBF kernel creates a flexible boundary that fits complex patterns.

Example:
Used in face recognition systems to handle non-linear and varied facial features.
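A sketch of non-linear SVMs on donut-shaped data that no straight line can separate, covering the two kernels above; the kernel parameters are illustrative choices:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

poly_clf = SVC(kernel="poly", degree=2).fit(X, y)       # polynomial kernel (curved boundary)
rbf_clf  = SVC(kernel="rbf", gamma="scale").fit(X, y)   # Gaussian RBF kernel (flexible boundary)

print("poly accuracy:", poly_clf.score(X, y))
print("rbf accuracy :", rbf_clf.score(X, y))
```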

7. SVM Regression (SVR)

Definition:
An SVM model used for predicting continuous values, not classes.

Explanation:
It tries to fit a line within a margin and ignores small errors, focusing on the points that lie
outside the margin.

Example:
Predicting stock prices where small prediction errors are acceptable.
8. Decision Trees

Definition:
A tree-like model used for classification and regression. It splits data into subsets based on
feature values.

Explanation:
Each internal node tests a feature, each branch represents the result, and each leaf node gives
the prediction.

Example:
A decision tree to decide if a person should play tennis based on weather conditions.

9. Training and Visualizing a Decision Tree

Definition:
Creating a tree by choosing the best features to split the data at each level using a criterion like
Gini or Entropy.

Explanation:
The goal is to build a tree that separates data as cleanly as possible.

Example:
If the first best split is "weather = sunny", the tree creates branches for "yes" and "no".
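A minimal sketch of training and inspecting a decision tree on the iris dataset (a standard toy dataset); export_text prints the learned split rules as plain text:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(iris.data, iris.target)

# Print the tree as if/else rules over the features
print(export_text(tree, feature_names=iris.feature_names))
```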

10. Making Predictions (Decision Tree)

Definition:
To predict, the model follows the tree path based on input feature values.

Explanation:
It starts from the root, checks each condition, and moves down to a leaf that contains the final
answer.

Example:
Input: “Humidity = High” → “Wind = Strong” → “Play = No”.

11. CART Training Algorithm

Definition:
CART (Classification and Regression Tree) is a standard algorithm for building decision trees.

Explanation:
It uses Gini Impurity for classification and splits data to reduce impurity at each step.

Example:
Used to build simple binary trees for decision-making in applications like loan approval.
12. Gini Impurity vs Entropy

Definition:
Both are measures of impurity (mixing) in a dataset.

Explanation:

• Gini Impurity: Measures how often a randomly chosen element would be incorrectly
labeled.

• Entropy: Measures the disorder or unpredictability in the dataset.

Example:
If a node has 50% yes and 50% no, both Gini and Entropy are high (uncertain). If all are yes, both
are 0 (pure).
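Tiny numpy helpers computing Gini impurity and entropy from a node's class proportions, matching the worked example above:

```python
import numpy as np

def gini(p):
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                        # avoid log2(0)
    return -np.sum(p * np.log2(p))

print(gini([0.5, 0.5]), entropy([0.5, 0.5]))   # 0.5 and 1.0 (most mixed)
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))   # 0.0 and 0.0 (pure)
```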

13. Regularization Hyperparameters (Decision Trees)

Definition:
Settings that control the size and complexity of the tree to avoid overfitting.

Explanation:

• max_depth: limits how deep the tree can grow

• min_samples_split: sets minimum samples to make a split

• min_samples_leaf: sets minimum samples in each leaf

Example:
Setting max_depth=3 limits the tree to 3 levels → smaller and simpler model.
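A sketch comparing an unrestricted tree with a regularized one on the iris toy dataset; the hyperparameter values are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full_tree  = DecisionTreeClassifier(random_state=0).fit(X, y)
small_tree = DecisionTreeClassifier(max_depth=3, min_samples_split=4,
                                    min_samples_leaf=2, random_state=0).fit(X, y)

print("full depth :", full_tree.get_depth())    # grows until leaves are pure
print("small depth:", small_tree.get_depth())   # capped at 3 -> simpler model
```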

Unit 3: DONE
UNIT 4: Fundamentals of Deep Learning

1. What is Deep Learning?

Definition:
Deep Learning is a subset of Machine Learning that uses multi-layered neural networks to learn
from large amounts of data.

Explanation:
It mimics the way the human brain processes information using layers of "neurons" that
transform input data step by step into accurate predictions.

Example:
Face recognition in smartphones uses deep learning to identify your face from millions of pixels.

Types:
Deep Learning mainly uses:

• Artificial Neural Networks (ANN)

• Convolutional Neural Networks (CNN)

• Recurrent Neural Networks (RNN)

2. Need for Deep Learning

Definition:
Deep Learning is needed when traditional ML methods cannot handle the complexity or size of
the data.

Explanation:
DL can learn hidden patterns in high-dimensional data (like images, audio, video) and performs
better as the dataset grows.

Example:
Detecting diseases from X-ray images, translating speech to text, or self-driving car vision
systems—all use DL because simpler models fail to capture the complexity.
3. Artificial Neural Network (ANN)

Definition:
ANN is a network of interconnected nodes (neurons) inspired by the human brain, used to solve
tasks like classification, prediction, and pattern recognition.

Explanation:
Each neuron takes inputs, applies a weight, adds a bias, and passes the result through an
activation function. Layers of such neurons allow complex transformations.

Example:
An ANN predicting handwritten digits (0–9) by learning patterns in pixel values.

Types of Layers:

• Input Layer: Takes raw data

• Hidden Layers: Learn patterns

• Output Layer: Gives final result

4. Core Components of Neural Networks

Definition:
Basic building blocks that make up a neural network.

Explanation:
Includes:

• Neurons: processing units

• Weights: importance of inputs

• Bias: adjustable constant

• Activation Functions: decide output

• Loss Function: measures error

• Optimizer: improves learning

Example:
A single neuron gets 3 inputs with weights and bias, applies an activation function, and gives an
output.
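A one-neuron sketch in numpy matching the description above: weighted sum plus bias, passed through a sigmoid activation (all values are toy numbers):

```python
import numpy as np

inputs  = np.array([1.0, 2.0, 3.0])
weights = np.array([0.5, -0.2, 0.1])
bias    = 0.4

z = np.dot(inputs, weights) + bias   # weighted sum of inputs + bias
output = 1 / (1 + np.exp(-z))        # sigmoid activation
print(z, output)                     # 0.8, ≈0.69
```
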
5. Multi-Layer Perceptron (MLP)

Definition:
A type of ANN with multiple layers of neurons between input and output.

Explanation:
MLP is fully connected, meaning each neuron in one layer connects to all neurons in the next. It
is used for tasks like classification and regression.

Example:
An MLP trained on MNIST dataset learns to recognize digits by adjusting weights across layers.

Types:

• Shallow MLP: 1 hidden layer

• Deep MLP: multiple hidden layers

6. Activation Functions

Definition:
Functions that decide whether a neuron should be activated (i.e., pass information to the next
layer).

Explanation:
They introduce non-linearity, allowing the network to learn complex patterns.

Common Types:

• Sigmoid: outputs between 0 and 1

• ReLU (Rectified Linear Unit): outputs 0 if input < 0, else returns input

• Tanh: outputs between -1 and 1

Example:
ReLU is used in image models because it helps with faster training and avoids the vanishing
gradient problem.
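Quick numpy definitions of the three activations listed above:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # squashes to (0, 1)

def relu(x):
    return np.maximum(0, x)       # 0 for negatives, identity otherwise

def tanh(x):
    return np.tanh(x)             # squashes to (-1, 1)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), relu(x), tanh(x))
```
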
7. Tensors and Tensor Operations

Definition:
Tensors are multi-dimensional arrays used to store and operate on data in deep learning.

Explanation:
Like numbers (0D), vectors (1D), and matrices (2D), tensors can be extended to higher
dimensions. Operations like addition, multiplication, reshaping happen on tensors.

Example:
A colored image is stored as a 3D tensor: height × width × color_channels (e.g., 28×28×3).

Types:

• 0D: Scalar (single value)

• 1D: Vector

• 2D: Matrix

• 3D+: Tensor (e.g., batches of images)
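A numpy sketch of the tensor ranks in this list plus a couple of basic operations; the shapes mirror the image example above:

```python
import numpy as np

scalar = np.array(5.0)                # 0D: single value
vector = np.array([1.0, 2.0, 3.0])    # 1D
matrix = np.ones((2, 3))              # 2D
image  = np.zeros((28, 28, 3))        # 3D: height x width x colour channels

batch = np.stack([image, image])      # 4D: a batch of 2 images
print(scalar.ndim, vector.ndim, matrix.ndim, image.ndim, batch.shape)

doubled = matrix * 2                  # elementwise operation
flat = image.reshape(-1)              # reshaping: 28*28*3 = 2352 values
```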

8. TensorFlow Framework

Definition:
TensorFlow is an open-source deep learning library developed by Google for building and
training machine learning models.

Explanation:
It supports both simple and complex DL models using Python. TensorFlow uses computational
graphs and automatic differentiation to speed up training on CPU/GPU/TPU.

Example:
You can build a neural network in TensorFlow with just a few lines: define layers, loss function,
optimizer, and train using .fit().

Features:

• Easy to use with Keras

• Works on all platforms

• Supports visualization with TensorBoard
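A minimal Keras sketch of the few-lines workflow the section describes: define layers, compile with a loss function and optimizer, then train using .fit(). It assumes TensorFlow is installed; the layer sizes and epoch count are illustrative:

```python
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0                              # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),     # input layer: 28x28 image -> 784 values
    tf.keras.layers.Dense(128, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),   # output layer: 10 digit classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_split=0.1)
```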

UNIT 4: DONE
Important Machine Learning Formulas (Units 1–4)

UNIT 1: Fundamentals of ML

1. Accuracy

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

Explanation:
Tells how many total predictions were correct (both positives and negatives).

Example:
If 80 emails were classified correctly (spam or not) out of 100 total:

\text{Accuracy} = \frac{80}{100} = 0.8 = 80\%

2. Precision

\text{Precision} = \frac{TP}{TP + FP}

Explanation:
Out of all predicted positives, how many were actually correct.

Example:
Predicted 50 spam, and 40 were actually spam:

\text{Precision} = \frac{40}{50} = 0.8 = 80\%

3. Recall (Sensitivity)

\text{Recall} = \frac{TP}{TP + FN}

Explanation:
Out of all actual positives, how many the model correctly identified.

Example:
There were 50 spam emails, and model caught 40:

\text{Recall} = \frac{40}{50} = 0.8 = 80\%


4. F1 Score

\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

Explanation:
Harmonic mean of Precision and Recall – used when we want balance between them.

Example:
Precision = 0.8, Recall = 0.6:

F1 = 2 \times \frac{0.8 \times 0.6}{0.8 + 0.6} = 2 \times \frac{0.48}{1.4} \approx 0.6857

UNIT 2: Training Models

5. Linear Regression Equation

Y = wX + b

Explanation:

• Y: Predicted value

• X: Input feature

• w: Weight (slope)

• b: Bias (intercept)

Example:
If weight = 2 and bias = 3:

Y = 2X + 3

6. Mean Squared Error (MSE)

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Explanation:
Average squared difference between actual and predicted values. Used as loss function in
regression.

Example:
Actual: [4, 5], Predicted: [3, 6]
MSE = ((4−3)² + (5−6)²)/2 = (1 + 1)/2 = 1
7. Gradient Descent Update Rule

w := w - \alpha \cdot \frac{\partial J}{\partial w}

Explanation:

• Update weight by subtracting the gradient (slope of error)

• α (alpha) = learning rate

Example:
If gradient = 0.5 and learning rate = 0.1, then:

w := w - 0.1 \times 0.5 = w - 0.05

8. Logistic Regression (Sigmoid Function)

\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}, \quad z = wX + b

Explanation:
Outputs probability between 0 and 1. Used in binary classification.

Example:
If z = 2,

\sigma(2) = \frac{1}{1 + e^{-2}} \approx 0.88

9. Cross-Entropy Loss (Binary)

\text{Loss} = -[y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})]

Explanation:
Measures how far the prediction probability is from the actual label.

Example:
If y = 1 and prediction = 0.9,

\text{Loss} = -\log(0.9) \approx 0.105


UNIT 3: SVM & Decision Trees

10. SVM Optimization Objective (Hard Margin)

\text{Minimize: } \frac{1}{2} \|w\|^2

Subject to:

y_i (w \cdot x_i + b) \geq 1

Explanation:
Tries to find the hyperplane with the largest margin (distance between classes).

11. Gini Impurity (for Decision Trees)

G = 1 - \sum_{i=1}^{n} p_i^2

Explanation:
Measures how “mixed” the classes are. Lower Gini = purer node.

Example:
50% yes, 50% no → Gini = 1 - (0.5² + 0.5²) = 0.5
100% yes → Gini = 0

12. Entropy (Information Gain)

H = -\sum_{i=1}^{n} p_i \log_2(p_i)

Explanation:
Used to measure uncertainty. Used in ID3 decision tree algorithm.

Example:
p(Yes) = 0.8, p(No) = 0.2

H = -[0.8 \log_2(0.8) + 0.2 \log_2(0.2)] \approx 0.72


UNIT 4: Deep Learning

13. Neuron Output

z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b

\text{Output} = \text{Activation}(z)

Explanation:
A neuron computes weighted sum of inputs and passes it through an activation function.

Example:
For inputs x₁=1, x₂=2; weights w₁=0.5, w₂=0.3, b=0.2:

z = 0.5 \times 1 + 0.3 \times 2 + 0.2 = 1.3

14. ReLU Activation Function

f(x) = \max(0, x)

Explanation:
If input is negative, output is 0. Else, return the input. Used for speed and performance.

Example:
f(3) = 3, f(-5) = 0

15. Softmax Function (for Multiclass Output)

\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}

Explanation:
Gives probabilities for each class in multiclass classification. All outputs sum to 1.

Example:
If z = [2, 1, 0], softmax returns probabilities of about [0.67, 0.24, 0.09], which sum to 1.

16. Categorical Cross-Entropy (for Softmax)

\text{Loss} = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)

Explanation:
Used to measure how close predicted class probabilities are to actual class.

That's All the Important Formulas!
