
Machine Learning

Course Code: DAL 224


Section A

Introduction: Introduction to Machine Learning, types: supervised learning, unsupervised
learning, applications of machine learning, model representation.

Basic Concepts of Learning Models and its Performance Evaluation: Dimensionality reduction
using Principal Component Analysis, a general view of feature extraction, feature ranking,
validation techniques, confusion matrix and its related performance parameters.
Machine Learning

Machine learning allows computers to learn from experience on their own, using statistical
methods to improve performance and predict outputs without being explicitly programmed.
Machine learning is a branch of artificial intelligence (AI) and computer science which
focuses on the use of data and algorithms to imitate the way that humans learn, gradually
improving their accuracy.

The term machine learning was first introduced by Arthur Samuel in 1959.
How does Machine Learning work?

A Decision Process: In general, machine learning algorithms are used to make a prediction
or classification. Based on some input data, which can be labeled or unlabeled, your
algorithm will produce an estimate about a pattern in the data.
An Error Function: An error function evaluates the prediction of the model. If there are
known examples, an error function can make a comparison to assess the accuracy of the
model.
A Model Optimization Process: If the model can fit better to the data points in the training
set, then weights are adjusted to reduce the discrepancy between the known example and
the model estimate. The algorithm will repeat this “evaluate and optimize” process,
updating weights autonomously until a threshold of accuracy has been met.
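
The following is a minimal sketch of this "evaluate and optimize" loop, using gradient descent on a simple linear model. The data, learning rate, and stopping threshold are invented for the illustration; they are not from the slides.

# Sketch of the decision / error / optimization loop described above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)  # known examples

w, b = 0.0, 0.0          # model weights
lr = 0.1                 # learning rate

for step in range(200):
    y_hat = w * X[:, 0] + b                 # decision process: produce estimates
    error = y_hat - y
    mse = np.mean(error ** 2)               # error function: evaluate predictions
    if mse < 0.02:                          # stop once an accuracy threshold is met
        break
    w -= lr * np.mean(2 * error * X[:, 0])  # optimization: adjust the weights
    b -= lr * np.mean(2 * error)

print(f"learned w={w:.2f}, b={b:.2f}, mse={mse:.4f}")
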
Machine learning life cycle
Machine learning life cycle involves seven major steps, which are given below:
1. Gathering Data
2. Data preparation
3. Data Wrangling
4. Analyse Data
5. Train the model
6. Test the model
7. Deployment
The most important thing in the complete process is to understand the problem and to know its
purpose. Therefore, before starting the life cycle, we need to understand the problem, because a
good result depends on a good understanding of the problem.
How to get datasets for Machine Learning
A dataset is a collection of data in which the data is arranged in some order. A dataset can contain
anything from a simple array to a database table. To work with machine learning projects,
we need a huge amount of data, because without data one cannot train ML/AI models.

Popular Dataset Sources:


Kaggle Datasets: https://www.kaggle.com/datasets
UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/index.php
Datasets via AWS: https://registry.opendata.aws/
Google Dataset: https://datasetsearch.research.google.com/
Scikit Learn: https://scikit-learn.org/stable/datasets
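
As a small example, scikit-learn (one of the sources listed above) ships several ready-made datasets; the iris dataset below is just one arbitrary choice.

# Loading a built-in dataset with scikit-learn.
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
print(X.shape)   # (150, 4) -> 150 samples, 4 features
print(y[:5])     # class labels of the first five samples
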
Machine Learning Methods
Supervised learning, also known as supervised machine learning, is defined by its use of labeled
datasets to train algorithms to classify data or predict outcomes accurately. As input data is fed into
the model, the model adjusts its weights until it has been fitted appropriately. This occurs as part of
the cross validation process to ensure that the model avoids overfitting or underfitting. Supervised
learning helps organizations solve a variety of real-world problems at scale, such as classifying spam
in a separate folder from your inbox. Some methods used in supervised learning include neural
networks, naïve bayes, linear regression, logistic regression, random forest, and support vector
machine (SVM).
Algorithm Types:
● Regression
● Classification
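
A hedged sketch of supervised learning follows: a labeled dataset is used to fit a classifier, which is then evaluated on held-out data. The dataset and model choices are illustrative, not prescribed by the slides.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)          # labeled data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = make_pipeline(StandardScaler(), LogisticRegression())  # any supervised method would do
clf.fit(X_train, y_train)                            # learn from labeled examples
print("test accuracy:", clf.score(X_test, y_test))
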
Machine Learning Methods
Unsupervised learning, also known as unsupervised machine learning, uses machine learning
algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden patterns or
data groupings without the need for human intervention. This method’s ability to discover similarities
and differences in information makes it ideal for exploratory data analysis, cross-selling strategies,
customer segmentation, and image and pattern recognition. It’s also used to reduce the number of
features in a model through the process of dimensionality reduction. Principal component analysis
(PCA) and singular value decomposition (SVD) are two common approaches for this. Other algorithms
used in unsupervised learning include neural networks, k-means clustering, and probabilistic clustering
methods.
Algorithm types:
● Clustering
● Association
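
A minimal sketch of unsupervised learning: k-means clustering groups unlabeled points without any target values. The synthetic data and the choice of three clusters are assumptions made for the example.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels are ignored

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)            # discovered groupings
print(labels[:10])
print(kmeans.cluster_centers_)
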
Machine Learning Methods
Semi-supervised learning offers a happy medium between supervised and unsupervised
learning. During training, it uses a smaller labeled data set to guide classification and feature
extraction from a larger, unlabeled data set. Semi-supervised learning can solve the problem of
not having enough labeled data for a supervised learning algorithm. It also helps if it’s too costly
to label enough data.
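
A hedged sketch of this idea using scikit-learn's SelfTrainingClassifier: a small labeled subset guides learning over a larger unlabeled set (unlabeled samples are marked with -1). The dataset and the 70% "missing label" fraction are invented for the example.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.7] = -1     # pretend 70% of labels are missing

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)                      # trains on labeled + unlabeled data
print("accuracy on true labels:", accuracy_score(y, model.predict(X)))
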
Machine Learning Methods

Reinforcement machine learning is a machine learning model that is similar to supervised
learning, but the algorithm isn't trained using sample data. This model learns as it goes by
using trial and error. A sequence of successful outcomes will be reinforced to develop the
best recommendation or policy for a given problem.
Machine learning algorithms - Overview
https://towardsdatascience.com/machine-learning-algorithms-in-laymans-terms-part-1-d0368d769a7b
Common machine learning algorithms

Neural networks: Neural networks simulate the way the human brain works, with a huge
number of linked processing nodes. Neural networks are good at recognizing patterns and
play an important role in applications including natural language translation, image
recognition, speech recognition, and image creation.
Linear regression: This algorithm is used to predict numerical values, based on a linear
relationship between different values. For example, the technique could be used to predict
house prices based on historical data for the area.
Logistic regression: This supervised learning algorithm makes predictions for categorical
response variables, such as "yes/no" answers to questions. It can be used for applications
such as classifying spam and quality control on a production line.
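
A minimal sketch of the linear regression example above: predicting a numeric value (a hypothetical house price) from a linear relationship. The tiny dataset is invented purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

area = np.array([[50], [80], [120], [150]])      # square metres (made up)
price = np.array([100, 160, 235, 300])           # price in thousands (made up)

reg = LinearRegression().fit(area, price)        # fit the linear relationship
print(reg.predict([[100]]))                      # estimated price for 100 m^2
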
Common machine learning algorithms

Clustering: Using unsupervised learning, clustering algorithms can identify patterns in
data so that it can be grouped. Computers can help data scientists by identifying
differences between data items that humans have overlooked.
Decision trees: Decision trees can be used for both predicting numerical values
(regression) and classifying data into categories. Decision trees use a branching sequence
of linked decisions that can be represented with a tree diagram. One of the advantages of
decision trees is that they are easy to validate and audit, unlike the black box of the neural
network.
Random forests: In a random forest, the machine learning algorithm predicts a value or
category by combining the results from a number of decision trees.
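
A hedged sketch of a random forest: predictions from many decision trees are combined into a single classification. The dataset choice is arbitrary.

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 decision trees
forest.fit(X_train, y_train)
print("number of trees:", len(forest.estimators_))
print("test accuracy:", forest.score(X_test, y_test))
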
Real-world machine learning use cases

Speech recognition: It is also known as automatic speech recognition (ASR), computer
speech recognition, or speech-to-text, and it is a capability which uses natural language
processing (NLP) to translate human speech into a written format.
Customer service: Online chatbots are replacing human agents along the customer
journey, changing the way we think about customer engagement across websites and
social media platforms.
Computer vision: This AI technology enables computers to derive meaningful information
from digital images, videos, and other visual inputs, and then take the appropriate action.
Recommendation engines: Using past consumption behavior data, AI algorithms can help
to discover data trends that can be used to develop more effective cross-selling strategies.
Real-world machine learning use cases

Automated stock trading: Designed to optimize stock portfolios, AI-driven high-frequency
trading platforms make thousands or even millions of trades per day without human
intervention.
Fraud detection: Banks and other financial
institutions can use machine learning to spot
suspicious transactions. Supervised learning can
train a model using information about known
fraudulent transactions. Anomaly detection can
identify transactions that look atypical and deserve
further investigation.
Basic Concepts

In machine learning, it is often believed that the more features we have, the better our
prediction will be, but this is not always true. If we keep on increasing the number of
features, after a certain point the performance of our machine learning algorithm tends to
decrease.

If the number of training samples is fixed and we keep on increasing the number of
dimensions, then the predictive power of our machine learning model first increases, but
after a certain point it tends to decrease.
What is Dimensionality Reduction?

At a certain point, more features or dimensions can decrease a model’s accuracy since
there is more data that needs to be generalized — this is known as the curse of
dimensionality.
Dimensionality reduction is a way to reduce the complexity of a model and avoid overfitting.
The number of input features, variables, or columns present in a given dataset is known as
dimensionality, and the process to reduce these features is called dimensionality reduction.
There are two main categories of dimensionality reduction: feature selection and feature
extraction. Via feature selection, we select a subset of the original features, whereas in
feature extraction, we derive information from the feature set to construct a new feature
subspace.
Benefits of applying Dimensionality Reduction
● By reducing the dimensions of the features, the space required to store the dataset
also gets reduced.
● Less computation and training time is required with fewer feature dimensions.
● Reduced dimensions of features of the dataset help in visualizing the data quickly.
● It removes the redundant features (if present) by taking care of multicollinearity.
Disadvantages of dimensionality Reduction:
● Some data may be lost due to dimensionality reduction.
● In the PCA dimensionality reduction technique, the number of principal components to
retain is sometimes not known in advance.
Approaches of Dimension Reduction

Feature selection is the process of selecting the subset of the relevant features and leaving
out the irrelevant features present in a dataset to build a model of high accuracy. In other
words, it is a way of selecting the optimal features from the input dataset.
Feature extraction is the process of transforming the space containing many dimensions
into space with fewer dimensions. This approach is useful when we want to keep the whole
information but use fewer resources while processing the information.
Some common feature extraction techniques are:
● Principal Component Analysis
● Linear Discriminant Analysis
● Kernel PCA
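
The short sketch below contrasts the two approaches. SelectKBest is one possible feature-selection method (not named in the slides, used here only as an assumption), while PCA is the feature-extraction technique discussed next; both reduce 13 features to 5 dimensions.

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_wine(return_X_y=True)                              # 13 original features

X_selected = SelectKBest(f_classif, k=5).fit_transform(X, y)   # selection: keep 5 original features
X_extracted = PCA(n_components=5).fit_transform(X)             # extraction: build 5 new components

print(X_selected.shape, X_extracted.shape)                     # both (178, 5)
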
Principal Component Analysis (PCA)

Principal Component Analysis is a statistical process that converts the observations of


correlated features into a set of linearly uncorrelated features with the help of orthogonal
transformation. These new transformed features are called the Principal Components. It is
one of the popular tools that is used for exploratory data analysis and predictive modeling.
PCA works by considering the variance of each attribute, because an attribute with high
variance shows a good split between the classes, and hence it helps reduce the dimensionality.
In a nutshell, PCA aims to find the directions of maximum variance in high-dimensional data
and projects it onto a new subspace with equal or fewer dimensions than the original one.
Applications of PCA include image processing, movie recommendation systems, and optimizing the
power allocation in various communication channels.
Principal Component Analysis (PCA)

The orthogonal axes (principal components) of the new subspace can be interpreted as the
directions of maximum variance, given the constraint that the new feature axes are
orthogonal to each other, as illustrated in the figure linked below. In that figure, x1 and x2
are the original feature axes, and PC1 and PC2 are the principal components.

https://miro.medium.com/v2/resize:fit:720/1*T7CqlFV5aRm6MxO5nJt7Qw.gif
Some common terms used in PCA algorithm:
Dimensionality: It is the number of features or variables present in the given dataset. More easily,
it is the number of columns present in the dataset.
Correlation: It signifies how strongly two variables are related to each other, such that if one
changes, the other variable also changes. The correlation value ranges from -1 to +1. Here,
-1 occurs if the variables are inversely proportional to each other, and +1 indicates that the
variables are directly proportional to each other.
Orthogonal: It means that the variables are not correlated with each other, and hence the
correlation between the pair of variables is zero.
Eigenvectors: Given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv
is a scalar multiple of v, i.e. Mv = λv for some scalar λ (the corresponding eigenvalue).
Covariance Matrix: A matrix containing the covariance between the pair of variables is called the
Covariance Matrix.
Steps in Principal Component Analysis (PCA)
1. Standardize the d-dimensional dataset.
2. Construct the covariance matrix.
3. Decompose the covariance matrix into its eigenvectors and eigenvalues.
4. Sort the eigenvalues by decreasing order to rank the corresponding
eigenvectors.
5. Select k eigenvectors which correspond to the k largest eigenvalues, where k
is the dimensionality of the new feature subspace (k ≤ d).
6. Construct a projection matrix W from the “top” k eigenvectors.
7. Transform the d-dimensional input dataset X using the projection matrix W to
obtain the new k-dimensional feature subspace.
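
A minimal NumPy sketch of the steps listed above (standardize, covariance matrix, eigendecomposition, sort, select k, project). The dataset and the choice of k = 2 are arbitrary assumptions for the example.

import numpy as np
from sklearn.datasets import load_wine

X, _ = load_wine(return_X_y=True)                     # d = 13 features

X_std = (X - X.mean(axis=0)) / X.std(axis=0)          # 1. standardize
cov = np.cov(X_std.T)                                 # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)                # 3. eigenvectors and eigenvalues
order = np.argsort(eigvals)[::-1]                     # 4. sort eigenvalues, decreasing
k = 2
W = eigvecs[:, order[:k]]                             # 5-6. projection matrix W (d x k)
X_pca = X_std @ W                                     # 7. new k-dimensional subspace
print(X_pca.shape)                                    # (178, 2)
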
Validation Techniques
Validation techniques in machine learning are used to get the error rate of the ML model, which can be
considered as close to the true error rate of the population. If the data volume is large enough to be
representative of the population, you may not need the validation techniques. In machine learning, there
is always the need to test the stability of the model: we cannot judge a model based only on the
dataset it was trained on. For this purpose, we reserve a particular sample of the dataset which was
not part of the training dataset, and we test our model on that sample before deployment.
Refer: https://www.upgrad.com/blog/cross-validation-in-machine-learning/
Resubstitution: If all the data is used for training the model and the error rate is evaluated based on
outcome vs. actual value from the same training data set, this error is called the resubstitution error.
This technique is called the resubstitution validation technique.
Holdout: To avoid the resubstitution error, the data is split into two different datasets labeled as a
training and a testing dataset. This can be a 60/40 or 70/30 or 80/20 split. This technique is called the
hold-out validation technique. In this case, there is a likelihood that uneven distribution of different
classes of data is found in training and test dataset. To fix this, the training and test dataset is created
with equal distribution of different classes of data. This process is called stratification.
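
A small sketch of the hold-out technique with stratification: the data is split 80/20 while preserving the class distribution in both parts. The dataset choice is arbitrary.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)   # stratified 80/20 split

print("train class ratio:", np.bincount(y_train) / len(y_train))
print("test  class ratio:", np.bincount(y_test) / len(y_test))
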
Validation Techniques
K-Fold Cross-Validation: In this technique, the data is divided into k folds; in each
iteration, k-1 folds are used for training and the remaining fold is used for testing.
The advantage is that the entire data is used for training and testing.
The error rate of the model is the average of the error rates of each
iteration. This technique can also be seen as a form of the repeated
hold-out method. The error rate can be improved by using the
stratification technique.
Leave-One-Out Cross-Validation (LOOCV)
In this technique, all of the data except one record is used for
training and that one record is used for testing. This process is repeated
N times if there are N records. The advantage is that the entire data
is used for training and testing. The error rate of the model is the
average of the error rates of each iteration.
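
A hedged sketch of k-fold cross-validation and LOOCV using scikit-learn; the model and dataset are illustrative choices only.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)     # 5 folds: 4 train, 1 test
print("5-fold mean accuracy:", cross_val_score(model, X, y, cv=kfold).mean())

loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut()) # N iterations, one test record each
print("LOOCV mean accuracy:", loo_scores.mean())
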
Validation Techniques
Random Subsampling: In this technique, multiple sets of data
are randomly chosen from the dataset and combined to form a
test dataset. The remaining data forms the training dataset.
The error rate of the model is the average of the error rates of
each iteration.

Bootstrapping: In this technique, the training dataset is selected
randomly with replacement. The remaining examples that were
not selected for training are used for testing. Unlike k-fold
cross-validation, the size of the test set is likely to change from
iteration to iteration. The error rate of the model is the average
of the error rates of each iteration.
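
A minimal NumPy sketch of one bootstrap iteration: training indices are sampled with replacement, and the examples that were never selected ("out-of-bag") are used for testing. Dataset and model choices are assumptions for the example.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
n = len(y)

train_idx = rng.integers(0, n, size=n)            # sampled with replacement
test_idx = np.setdiff1d(np.arange(n), train_idx)  # examples never selected for training

model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
print("out-of-bag accuracy:", model.score(X[test_idx], y[test_idx]))
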
Confusion Matrix
The confusion matrix is a matrix used to determine the
performance of the classification models for a given set of test
data. It can only be determined if the true values for test data are
known. The matrix itself can be easily understood, but the related
terminology may be confusing. Since it shows the errors of the
model's performance in the form of a matrix, it is also known as
an error matrix.
Refer: https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5
● The matrix is divided into two dimensions, that are predicted values and actual values
along with the total number of predictions.
● Predicted values are those values, which are predicted by the model, and actual values
are the true values for the given observations.
Confusion Matrix
True Positives (TP): when the actual value is Positive and
predicted is also Positive.
True negatives (TN): when the actual value is Negative and
prediction is also Negative.
False positives (FP): When the actual is negative but
prediction is Positive. Also known as the Type 1 error
False negatives (FN): When the actual is Positive but the
prediction is Negative. Also known as the Type 2 error

● It evaluates the performance of the classification models, when they make predictions on
test data, and tells how good our classification model is.
● It not only tells us the errors made by the classifier but also the type of error, i.e.
whether it is a type-I or a type-II error.
Confusion Matrix Parameters
With the help of the confusion matrix, we can calculate the different parameters for the
model, such as:
Accuracy: It defines how often the model predicts the correct output. It can be calculated
as the ratio of the number of correct predictions made by the classifier to the total number
of predictions made by the classifier. The accuracy metric is not well suited for imbalanced
classes. The formula is given below:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Confusion Matrix Parameters
Precision: Out of all the classes that the model predicted as positive, it measures how many
were actually positive. In simple words, it tells us how many of the total positive predictions
are actually positive. It can be calculated using the below formula:
Precision = TP / (TP + FP)

Ex 1: In spam mail detection, we need to focus on precision.
Ex 2: Precision is important in music or video recommendation systems, e-commerce
websites, etc. Wrong results could lead to customer churn and be harmful to the business.
Confusion Matrix Parameters
Recall: Out of all the actual positive classes, it measures how many the model predicted
correctly. Recall should be as high as possible. It is a measure of the actual observations
that are predicted correctly, i.e. how many observations of the positive class are actually
predicted as positive. It is also known as Sensitivity. It can be calculated using the below
formula:
Recall = TP / (TP + FN)

Ex 1: Suppose we predict whether a person has cancer or not. The person is suffering from
cancer, but the model predicts that he is not suffering from cancer (a false negative).

Ex 2: Recall is important in medical cases, where it does not matter whether we raise a false
alarm, but the actual positive cases should not go undetected!
Confusion Matrix Parameters
F-measure / F1 Score: If one model has low precision and high recall, or vice versa, it is
difficult to compare the two models. For this purpose, we can use the F-score, which helps
us evaluate recall and precision at the same time. The F-score is maximum when recall is
equal to precision. The F1 score is the harmonic mean of Precision and Recall; compared to
the arithmetic mean, the harmonic mean punishes extreme values more. The F-score should
be high (ideally 1). It can be calculated using the below formula:
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Confusion Matrix using Python

Refer: https://onestopdataanalysis.com/confusion-matrix-python/
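
A hedged sketch of computing a confusion matrix and its related parameters with scikit-learn; the label vectors are invented for the example.

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual values
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # predicted values

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
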
