ML Module 1

The document provides an overview of Machine Learning, including its definition, types, and applications. It discusses various learning methods such as supervised, unsupervised, semi-supervised, and reinforcement learning, along with their pros and cons. Examples of applications in real-world scenarios, such as email spam filtering and recommendation systems, are also highlighted.

Uploaded by

covidgamer00

MACHINE LEARNING (CSC604)
AI&DS, Sem VI
Module 1: Introduction To Machine Learning
Dr. Himani Deshpande (TSEC)


Module 1.1

• Machine Learning
• Types of Machine Learning
• Issues in Machine Learning
• Application of Machine Learning
• Steps in developing a Machine Learning Application

What is Machine Learning?

⚫ Uses techniques that give machines the ability to “LEARN FROM DATA” without being explicitly programmed.
⚫ Machines make data-driven decisions rather than being explicitly programmed to carry out a certain task.
⚫ Examples: Netflix or Spotify recommendations, e-commerce, weather forecasting, targeted marketing, etc.

[Figures: Machines learn]


What is Machine Learning?

⚫ In 1959, Arthur Samuel defined machine learning as a “field of study that gives computers the ability to learn without being explicitly programmed”.
⚫ Arthur Samuel used to play checkers with an IBM 704 computer in Poughkeepsie, New York.

[Figure: Components of ML]

What is Machine Learning? (1)

Definition by Tom Mitchell (1997):
➢ “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
➢ A well-defined learning task is given by <P, T, E>.

⚫ Example 1: Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam.
1. Classifying emails as spam or not spam: T
2. Watching you label emails as spam or ham: E
3. The number of emails correctly classified as spam / ham: P

What is Machine Learning? (2)

⚫ Example 2: Handwriting Recognition Learning Problem
1. Recognizing handwritten words within images: T
2. Percent of words correctly classified: P
3. Database of human-labeled images of handwritten words: E

⚫ Example 3: Robot Driving Learning Problem
1. Driving on a public highway using vision sensors: T
2. Sequence of images and steering commands recorded while observing a human driver: E
3. Average distance travelled before an error: P

[Figure: Types of Machine Learning]

Supervised

A quiz to segregate cats from other animals: proper instructions on what a cat looks like are given before the quiz.

Supervised Learning

⚫ A learning algorithm for which both the input and the desired output are provided is known as a supervised learning algorithm.
⚫ In supervised machine learning, labeled data is used to train machines so that they learn the relationships between the given inputs and outputs.
⚫ In labeled data, a label is a known description or tag given to an object in the data.
⚫ Similar to teaching a child with flash cards.

Supervised Learning (1)

⚫ Working: training to achieve better testing results.
⚫ Data in the form of input-output pairs is fed to a learning algorithm one example at a time during training.
⚫ The algorithm predicts the output for each example and is given feedback on whether it predicted the right answer.
⚫ Over time, the algorithm learns to approximate the relationship between the inputs and outputs.
⚫ Once fully trained, the supervised learning algorithm can observe a new, never-before-seen example and predict a label/output for it.

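The predict-then-feedback loop described above can be sketched with a single perceptron. This is only an illustration, not the method used in the course: the perceptron, the AND-style toy data, and the names `train_perceptron` and `predict` are all assumptions made for the example.

```python
def predict(weights, bias, inputs):
    """Return 1 if the weighted sum is positive, otherwise 0."""
    score = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if score > 0 else 0


def train_perceptron(examples, epochs=100, learning_rate=1.0):
    """Train on (inputs, label) pairs, where every label is 0 or 1."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for inputs, label in examples:          # feed the pairs one by one
            error = label - predict(weights, bias, inputs)
            if error != 0:                      # feedback: the prediction was wrong
                mistakes += 1
                for i, x in enumerate(inputs):
                    weights[i] += learning_rate * error * x
                bias += learning_rate * error
        if mistakes == 0:                       # every training example is now correct
            break
    return weights, bias


# Hypothetical labeled data: the label is 1 only when both inputs are 1.
training_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(training_data)
print([predict(weights, bias, x) for x, _ in training_data])
```

The weights change only when the feedback says a prediction was wrong, which is the "learn from labeled examples" idea in its simplest form.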
Supervised Learning (2)

⚫ Pros:
  ⚫ Clear, specific objective.
  ⚫ Easy to measure accuracy: since the actual output is known, it is easy to design a performance metric for the system.
  ⚫ Controlled training process, which in return gives an outcome of a very specific behavior.
⚫ Cons:
  ⚫ Labor intensive: data requires labeling before the model is trained, which can take hours of human effort.
  ⚫ Needs a large amount of data.
  ⚫ Limited insights: no freedom for the machine to explore other possibilities.

[Figure: Types of Supervised Learning]

Classification

Classifies the input data into one of several predefined classes.
Is useful for providing categorical outcomes that fit within the predefined labels.

Example 1
• Spam Classifier
  • Text filter: checks for phrases like "lottery", "bitcoin" or "jackpot" and marks the email as spam.
  • Client filter: looks at the sender's identity, for example whether a person is sending a huge number of emails or other users have already marked their emails as spam.

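The text filter from Example 1 can be sketched as a plain keyword check. This is a hand-written rule rather than a trained classifier; a learned spam classifier would pick up such phrases from labeled emails. The phrase list and the function name `text_filter` are assumptions for illustration.

```python
# Phrases taken from the slide; a real filter would learn these from labeled data.
SPAM_PHRASES = ("lottery", "bitcoin", "bit coin", "jackpot")


def text_filter(message):
    """Return 'spam' if the message contains a flagged phrase, otherwise 'ham'."""
    text = message.lower()
    return "spam" if any(phrase in text for phrase in SPAM_PHRASES) else "ham"


print(text_filter("You won the LOTTERY jackpot!"))
print(text_filter("Meeting moved to 3 pm"))
```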
Classification

• Example 2
  • Classifying type of waste: multi-class classification.
• Example 3
  • Cancer detection: classifying from MRI images of the brain whether a tumor is malignant or benign.
  • If malignant, other attributes such as size, shape of cells and uniformity of cells are used to decide the stage of cancer, i.e. stage 1, 2 or 3.
• Other examples: fraud detection, recognizing handwritten digits or characters, face recognition (Facebook).

Regression

• Predictive algorithm that attempts to predict the output value when the input value is given.
• It deals with continuous numerical values. It estimates the relationship between variables.

Regression Examples

• Prediction of age of stray animals.
• Impact of product price on number of sales.
• House price prediction based on area, location, interior, etc.
• Impact of rainfall amount on the number of fruits yielded.

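As a rough sketch of regression, the snippet below fits a straight line to house area versus price with ordinary least squares and predicts the price of an unseen area. The data is made up so that price = 2 × area + 10 exactly, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: area of a house and its price.
areas = [50, 80, 100, 120, 150]
prices = [2 * a + 10 for a in areas]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 110 + intercept   # continuous output for an unseen area
print(slope, intercept, predicted_price)
```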
Unsupervised Behaviors

From a pile of clothes, decide what to keep and what to discard.

Unsupervised Learning

⚫ In this type of learning, data is not labeled.
⚫ Instead, the algorithm is fed a lot of data and given the tools to understand the properties of the data.
⚫ From there, it can learn to group, cluster, and/or organize the data in a way such that a human (or other intelligent algorithm) can come in and make sense of the newly organized data.

[Figure: Unsupervised Grouping]

Unsupervised Learning (1)

Working
⚫ The first step is to load the unlabeled data into the system.
⚫ Once the data is loaded, the algorithm analyzes it.
⚫ During the analysis, the algorithm looks for patterns based on the behavior or attributes of the dataset.
⚫ Once pattern identification and grouping are done, it gives the output.

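The load, analyze, group and output steps above can be illustrated with a tiny one-dimensional k-means. The slides do not name a specific algorithm here, so k-means, the starting centroids and the function name `k_means_1d` are assumptions for this sketch.

```python
def k_means_1d(points, k=2, iterations=10):
    """Group unlabeled numbers into k clusters (k >= 2) around moving centroids."""
    lo, hi = min(points), max(points)
    # Assumption: start the centroids evenly spread between the smallest and largest value.
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                        # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


values = [1, 2, 3, 10, 11, 12]                 # unlabeled data
centroids, clusters = k_means_1d(values)
print(centroids, clusters)
```

No labels are supplied at any point; the two groups emerge only from how close the values are to each other.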
Unsupervised Learning (2)

⚫ Pros:
  ⚫ Fast process: since no data labeling is required, fewer human resources are needed to perform tasks.
  ⚫ Unique insights: can surface unique, disruptive insights for a business to consider, as it interprets data on its own.
⚫ Cons:
  ⚫ Difficult to measure accuracy: there is no expected or desired outcome to compare against.
  ⚫ Data dimensionality: when the dimension of the data and the number of variables grow and need to be reduced in order to work on the data, human involvement becomes necessary to clean the data.

Unsupervised Learning (3) - Examples

⚫ Recommender Systems:
  ⚫ YouTube or Netflix uses a video recommendation system.
  ⚫ We know things about videos, such as their length and genre. We also know the watch history of many users. By taking into account users who have watched similar videos to you and then enjoyed other videos that you have yet to see, a recommender system can find this relationship in the data and prompt you with such a suggestion.
⚫ Grouping User Logs:
  ⚫ Unsupervised learning is used to cluster user logs and issues of similar type together.
  ⚫ This can help companies identify central themes in the issues their customers face and rectify them by improving a product or designing an FAQ to handle common issues.

[Figure: Supervised vs. Unsupervised]

Semi-Supervised Learning

⚫ The main difference between supervised and unsupervised machine learning is this:
  ⚫ Labeled versus unlabeled data.
  ⚫ In supervised machine learning, the data scientist decides which features are important, whereas in unsupervised machine learning, features are determined on their own based on inherent patterns in the data.
⚫ Semi-supervised learning algorithms are trained on a combination of labeled and unlabeled data.
⚫ Advantages:
  ⚫ Reduces the time required for labeling massive data.
  ⚫ Reduces the human bias that can be introduced by labeling.
  ⚫ Using lots of unlabeled data during training can improve the accuracy of the final model while reducing the time and cost spent building it.

Example of different Learning methods

• Using classification of web pages as an example, let’s compare how these three approaches work in practice:
• Supervised classification: The algorithm learns to assign labels to types of webpages based on the labels that were inputted by a human during the training process.
• Unsupervised clustering: The algorithm looks at inherent similarities between webpages to place them into groups.
• Semi-supervised classification: Labeled data is used to help identify that there are specific groups of webpage types present in the data and what they might be. The algorithm is then trained on unlabeled data to define the boundaries of those webpage types and may even identify new types of webpages that were unspecified in the existing human-inputted labels.

Reinforcement Learning

Reinforcement learning is very behavior driven. It has influences from the fields of neuroscience and psychology.

⚫ Reinforcement learning is learning from mistakes.
⚫ Place a reinforcement learning algorithm into any environment and it will initially make a lot of mistakes, until some sort of signal is provided.
⚫ By associating good behaviors with a positive signal (reward) and bad behaviors with a negative one (penalty), we can reinforce our algorithm to prefer good behaviors over bad ones. Over time, the learning algorithm learns to make fewer mistakes than it used to.
⚫ Reinforcement learning learns from its experience through trial and error.

Elements of Reinforcement Learning

• Any reinforcement learning problem requires
  • an agent
  • an environment
  • a way to connect them through a feedback loop.
• To connect the agent to the environment, we give it a set of actions that it can take that affect the environment. To connect the environment to the agent, we have it continually issue two signals to the agent: an updated state and a reward (our reinforcement signal for behavior).

[Figure: feedback loop in which the Agent sends an Action to the Environment, which returns a State and a Reward]

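The agent, environment and feedback loop can be sketched with tabular Q-learning on a five-cell corridor. Everything here is an assumption chosen for illustration (the corridor, the reward of 1 at the last cell, the random exploration, and the values of `alpha` and `gamma`); the slides do not prescribe a particular algorithm.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)      # move left or move right


def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0               # reward only when the goal is reached
    return next_state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Agent: learn action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(len(ACTIONS))     # explore with random actions
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy: the best-valued action in every non-goal cell.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

After training, the greedy policy moves right in every cell, which is the behavior the reward signal reinforces.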
Reinforcement Learning

Pros:
⚫ Reinforcement learning can be used to solve very complex problems that cannot be solved by conventional techniques.
⚫ This learning model is very similar to the way human beings learn. Hence, it has the potential to reach very high performance.
⚫ DeepMind’s AlphaGo program, a reinforcement learning model, beat world champion Lee Sedol at the game of Go in March 2016.

Cons:
⚫ Computation heavy and time consuming.
⚫ The curse of dimensionality heavily limits reinforcement learning for real physical systems. The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings.

Reinforcement Learning Example

Game Playing:
• Reinforcement learning can be used effectively to determine the best move to make in a game depending on a number of different factors.
• It is very handy in games like Chess, Go, etc. Using reinforcement learning, we can improve and personalize the gaming experience in real time. Reinforcement learning algorithms can solve different games and sometimes achieve super-human performance.

Training Robots:
• Robots are trained using the trial and error method with human supervision.
• Reinforcement learning teaches robots new tasks while retaining prior knowledge.

A variety of problems can be solved using reinforcement learning, such as:
➢ Self-driving cars
➢ Stock trading and price forecasting
➢ Optimizing chemical reactions
➢ Industrial automation
➢ News recommendations
➢ etc.

Steps in Developing a Machine Learning Application

• Gathering data: quantity and quality of data; web scraping; RSS feed, API; publicly available data.
• Data preparation: loading data; useable format; algorithm-specific formatting; randomize ordering; splitting into training and testing dataset.
• Choosing a model: different algorithms are for different tasks; choose the right one.
• Training: core step; clean data is fed to extract knowledge; store in a useable format for next step; one training step.
• Evaluation: testing dataset; if not satisfied, then change previous steps.
• Hyperparameter tuning: learning rate; initial condition.
• Prediction: actual output; if not proper, redo everything.

Steps in Developing a Machine Learning Application

• Step 1: Gathering Data
  • Once you know exactly what you want and the equipment is in hand, you reach the first real step of machine learning: gathering data.
  • This step is crucial, as the quality and quantity of data gathered will directly determine how good the predictive model turns out to be.
  • The data collected is then tabulated and used for training as well as testing.
  • Data can be gathered via APIs, web scraping, or public datasets.

• Step 2: Data preparation
  • Data preparation involves adjusting and manipulating the data, e.g. normalization, error correction, handling missing values, and converting the data into the usable format needed by the algorithm.
  • Once all data is ready, it is loaded into a suitable place and its order is randomized, as the order of data should not affect what is learned.
  • Lastly, the dataset is divided into training and testing datasets.

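A minimal sketch of the randomize-then-split part of data preparation, using only the standard library. The 80/20 ratio, the fixed seed and the name `shuffle_and_split` are assumptions for the example, not values given in the slides.

```python
import random


def shuffle_and_split(records, train_fraction=0.8, seed=42):
    """Randomize the order of the records, then cut them into train and test parts."""
    shuffled = list(records)                    # copy so the original order is untouched
    random.Random(seed).shuffle(shuffled)       # order should not affect what is learned
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(10))                       # stand-in for ten prepared data rows
train_set, test_set = shuffle_and_split(records)
print(len(train_set), len(test_set))
```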
• Step 3: Choosing a model
  • The next step is choosing an appropriate model as per the requirement. Researchers and data scientists have created many models over the years. Some are well suited for image data, others for sequences (like text or music), some for numerical data, others for text-based data.

• Step 4: Train the Algorithm
  • Training is the core step, often considered the bulk of machine learning, where the data is used to incrementally improve the model’s ability to predict.
  • The training process involves initializing some random values for, say, A and B of our model, predicting the output with those values, comparing the prediction with the actual output, and then adjusting the values so that the predictions move closer to the actual outputs. This process then repeats, and each cycle of updating is called one training step.

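The training cycle for the values A and B can be sketched as gradient descent on a line y = A·x + B. This is one possible way to "adjust the values", assumed here for illustration; the toy data follows y = 2x + 1, the starting values are zeros rather than random so that the run is repeatable, and `train_linear_model` is a hypothetical name.

```python
def train_linear_model(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = a * x + b by repeating predict, compare and adjust."""
    a, b = 0.0, 0.0                             # initial values (zeros for repeatability)
    n = len(xs)
    for _ in range(steps):                      # each pass is one training step
        predictions = [a * x + b for x in xs]
        errors = [p - y for p, y in zip(predictions, ys)]   # compare with actual output
        grad_a = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        a -= learning_rate * grad_a             # adjust the values to reduce the error
        b -= learning_rate * grad_b
    return a, b


xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]                    # made-up data with A = 2 and B = 1
a, b = train_linear_model(xs, ys)
print(round(a, 4), round(b, 4))
```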
• Step 5: Evaluation
  • The testing dataset that was kept aside is used to evaluate the model and measure its performance.
  • Evaluation tests the model against data that was never seen or used during training, and is meant to be representative of how the model might perform in the real world.

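One simple way to score a model on the held-out test set is accuracy, the fraction of test examples predicted correctly. The labels and the model output below are made up, and `accuracy` is a hypothetical helper.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


test_labels = ["spam", "ham", "spam", "ham"]    # actual labels of the unseen test data
model_output = ["spam", "ham", "ham", "ham"]    # what a hypothetical model predicted
score = accuracy(model_output, test_labels)
print(score)
```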
• Step 6: Hyperparameter Tuning
  • The learning rate defines how far the algorithm adjusts its values in each step, based on the information from the previous training step. This value affects how long the training will take.
  • For more complex models, initial conditions play a significant role in the outcome of training. Differences can be seen depending on whether a model starts off training with values initialized to zeroes versus some distribution of values, which then leads to the question of which distribution to use.

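The effect of the learning rate on training time can be seen on a one-variable toy problem: minimizing (w − 3)² by gradient descent and counting the steps needed to get close to 3. The function, the tolerance and the two learning rates are assumptions chosen only to make the comparison visible.

```python
def steps_to_converge(learning_rate, start=0.0, tolerance=1e-6, max_steps=10_000):
    """Count gradient-descent steps until w is within tolerance of the minimum at 3."""
    w = start
    for step in range(1, max_steps + 1):
        gradient = 2 * (w - 3.0)                # derivative of (w - 3) ** 2
        w -= learning_rate * gradient
        if abs(w - 3.0) < tolerance:
            return step
    return max_steps


slow = steps_to_converge(0.01)                  # small learning rate: many small steps
fast = steps_to_converge(0.1)                   # larger learning rate: fewer steps
print(slow, fast)
```

A learning rate that is too large can overshoot and fail to converge, which is why this value is tuned rather than simply maximized.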
• Step 7: Prediction / Use It
  • Finally, use the model to predict the outcome. If the outcome is not satisfactory, revisit the earlier steps.

Issues with Machine Learning Application

• Data is not free at all
• Talent Deficit
• Inadequate Infrastructure
• Specialization not Generalization
• Technology is very young

Data is not free at all

• ML algorithms require massive datasets. These systems don’t just require more information than humans to understand concepts or recognize features; they can require hundreds of thousands of times more.
• Storing data is not a concern since purchasing space is cheap, but buying a ready-made dataset is very expensive.
• Creating a dataset involves collecting data from different sources, organizing and formatting it as required, feature sampling, record sampling, etc.

Talent Deficit

• There is a shortage of skilled employees available to manage and develop analytical content for Machine Learning, i.e. to develop the technology.

Inadequate Infrastructure

• Machine Learning requires vast data-churning capabilities. Legacy systems often can’t handle the workload and buckle under pressure.
• An ML developer should check the infrastructure; if it is not up to specification, the system must be upgraded with hardware acceleration and flexible storage.

Specialization not Generalization

• There is no AI application that can do multiple tasks.
• ML algorithms are incredibly efficient at a specific task, e.g. recognizing cats or playing Atari games, but there is no neural network in the world that can, for example, identify objects in images, play Space Invaders, and listen to music at once.

Technology is very young

• The biggest tech corporations are building environments for ML applications: Alphabet Inc. (Google's parent company) offers TensorFlow, while Microsoft cooperates with Facebook in developing the Open Neural Network Exchange (ONNX). Since the technology is still new, it may not be production-ready, or may be only borderline production-ready.
• Other issues: planning, training and testing are time consuming; huge processing power is required; integration with existing legacy software is difficult; and it is hard to understand which processes need automation.

Applications of Machine Learning

1. Image Processing
• Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, etc. in digital images. A popular use case of image recognition and face detection is automatic friend tagging suggestion:
• Facebook provides a feature of auto friend tagging suggestion. Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names; the technology behind this is machine learning's face detection and recognition algorithm.
• It is based on the Facebook project named "DeepFace", which is responsible for face recognition and person identification in the picture.

2. Speech Recognition
• While using Google, we get the option "Search by voice". This comes under speech recognition, a popular application of machine learning.
• Speech recognition is the process of converting voice instructions into text; it is also known as "speech to text" or "computer speech recognition". At present, machine learning algorithms are widely used in speech recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.

3. Traffic Prediction
• If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.
• It predicts whether traffic is clear, slow-moving, or heavily congested in two ways:
  • Real-time location of vehicles from the Google Maps app and sensors.
  • Average time taken on past days at the same time of day.
• Everyone who uses Google Maps helps make the app better: it takes information from the user and sends it back to its database to improve performance.

4. Product recommendation
• Machine learning is widely used by e-commerce and entertainment companies such as Amazon and Netflix for product recommendation. Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet on the same browser; this is because of machine learning.
• Google understands user interests using various machine learning algorithms and suggests products accordingly.
• Similarly, when we use Netflix, we find recommendations for series, movies, etc., and this is also done with the help of machine learning.

5. Self-Driving Cars
• One of the most exciting applications of machine learning is self-driving cars, where machine learning plays a significant role. Tesla, a popular car manufacturing company, is working on self-driving cars. It reportedly uses unsupervised learning methods to train the car models to detect people and objects while driving.

6. Email Spam Filter
• Whenever we receive a new email, it is filtered automatically as important, normal, or spam. Important mail arrives in our inbox marked with the important symbol and spam goes to our spam box; the technology behind this is machine learning.

7. Virtual Personal Assistant
• We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using our voice instructions. These assistants can help in various ways just from voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.

8. Online Fraud Detection
• Machine learning makes online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, fraud can take place in various ways, such as fake accounts, fake IDs, and stealing money in the middle of a transaction. To detect this, a feed-forward neural network checks whether a transaction is genuine or fraudulent.
• For each genuine transaction, the output is converted into hash values, and these values become the input for the next round. Each genuine transaction follows a specific pattern, which changes for a fraudulent transaction; the network detects this change, making our online transactions more secure.

9. Stock Market Prediction
• Machine learning is widely used in stock market trading. Share prices always carry a risk of ups and downs, so long short-term memory (LSTM) neural networks are used to predict stock market trends.

10. Medical Diagnosis
• In medical science, machine learning is used for disease diagnosis. With this, medical technology is growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.
• It helps in finding brain tumors and other brain-related diseases easily.

11. Automatic Language Translation
• Nowadays, if we visit a new place and do not know the language, it is not a problem: machine learning helps by converting text into a language we know. Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine translation system that translates text into our familiar language, which is called automatic translation.
• The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used together with image recognition to translate text from one language to another.

Module 1.2: Introduction To Machine Learning

Topics: Supervised and Unsupervised Learning: concepts of classification, clustering and prediction; training, testing and validation datasets; cross-validation; overfitting and underfitting of a model.


Classification

• The classification algorithm is a supervised learning technique used to identify the category of new observations on the basis of training data.
• In classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.

[Figure: Classification Algorithms]

Clustering

• Clustering, or cluster analysis, is a machine learning technique that groups an unlabelled dataset.
• "A way of grouping the data points into different clusters, consisting of similar data points. The objects with the possible similarities remain in a group that has less or no similarities with another group."

[Figure: Clustering Algorithms]

Prediction

• “Prediction” refers to the output of an algorithm after it has been trained on a historical dataset and applied to new data when forecasting the likelihood of a particular outcome.
• Prediction essentially means predicting a future outcome, for example predicting a price.

Training, Testing and validation dataset

Training: Classroom Teaching
Validation: Periodic Test
Test: Semester Exam

Training Dataset

• Training data is the sub-dataset which we use to train a model.
• Algorithms study the hidden patterns and insights inside these observations and learn from them.
• The model is trained over and over again using the data in the training set and continues to learn the features of this data.

Testing Dataset
• In machine learning, test data is the sub-dataset that we use to
evaluate the performance of a model built using a training dataset.

• Although we extract both train and test data from the same dataset,
the test dataset should not contain any of the training data.

• The purpose of creating a model is to predict unknown results.

• The test data is used to check the performance, accuracy, and
precision of the model created using training data.
Train and Test Set
Validation Dataset
• Validation data are a sub-dataset separated from
the training data, used to validate the
model during the training process.

• The information from the validation process
helps us adjust the parameters or the choice of
classifier to get better results. So basically,
validation data helps us to optimize the model.
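The three sub-datasets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `train_val_test_split`, the 60/20/20 proportions, and the dataset of 100 record ids are all invented for the example and are not prescribed by the slides.

```python
import random

def train_val_test_split(records, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the records and cut them into three disjoint sub-datasets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                   # used once, for the final check
    val = shuffled[n_test:n_test + n_val]      # used while tuning the model
    train = shuffled[n_test + n_val:]          # used to fit the model
    return train, val, test

records = list(range(100))                     # hypothetical dataset of 100 record ids
train, val, test = train_val_test_split(records)
print(len(train), len(val), len(test))         # 60 20 20
```

Because the three lists are slices of one shuffled copy, no record can appear in more than one of them, which is the property the Testing Dataset slide asks for.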
Testing , Training, Validation sets

Cross Validation

Cross Validation Real life example

• In every round, a company gives a different
set/type of questions during the placement
test.

• This checks the caliber of the student from
multiple angles.
Cross Validation

Each time, use a different testing and training set to validate the model.
Cross Validation

• Cross-validation is a technique for evaluating ML models by
training several ML models on subsets of the available input data
and evaluating them on the complementary subset of the data.

• Use cross-validation to detect overfitting, i.e., failing to generalize a
pattern.
Cross Validation Methods

1. Validation Set Approach
2. Leave-P-out cross-validation
3. Leave one out cross-validation
4. K-fold cross-validation
5. Stratified k-fold cross-validation
Validation set Cross Validation
• In the validation set approach, we divide our input dataset into a
training set and a test (or validation) set. Each subset is given 50% of
the dataset.

• Its big disadvantage is that we use only 50% of the dataset to train
our model, so the model may fail to capture important information in
the dataset. It also tends to give an underfitted model.

Dataset: ½ Training Set, ½ Testing Set
K-fold cross-validation

• Split the input dataset into K groups.
• For each group:
    • Take that group as the reserve or test data set.
    • Use the remaining groups as the training dataset.
    • Fit the model on the training set and evaluate the performance of the model
      using the test set.
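The K-fold steps above can be sketched as follows. The function names (`k_fold_indices`, `cross_validate`) and the toy "predict the mean" model are assumptions made for illustration; any fit and score functions with the same shape could be plugged in.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k groups."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]                # the reserved group
        train_idx = indices[:start] + indices[start + size:]  # all remaining groups
        yield train_idx, test_idx
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Fit on each training part, score on the held-out group, average the scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)

# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mean_absolute_error(model, xs, ys):
    return sum(abs(model - y) for y in ys) / len(ys)

xs = list(range(10))
ys = [2 * x for x in xs]
print(cross_validate(xs, ys, 5, fit_mean, mean_absolute_error))
```

Every record is used for testing exactly once and for training K − 1 times, which is why K-fold wastes less data than the 50/50 validation set approach.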
Leave p out Cross Validation

• Leave-P-Out cross-validation leaves p sample points out of the
training set and uses them as the test set.
• The p points do not have to be contiguous.
• For a sample with n points, there can be C(n, p) = n! / (p! (n − p)!)
different test sets.
Leave one out cross validation

1 record for testing,
all others for training
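A minimal sketch of how Leave-P-Out and Leave-One-Out splits can be enumerated. The function name `leave_p_out_splits` is an assumption for this example; the count of splits is checked against the binomial coefficient C(n, p).

```python
from itertools import combinations
from math import comb

def leave_p_out_splits(n_samples, p):
    """Yield (train_indices, test_indices) for every way of holding out p points."""
    all_idx = range(n_samples)
    for test_idx in combinations(all_idx, p):   # the p points need not be contiguous
        held_out = set(test_idx)
        train_idx = [i for i in all_idx if i not in held_out]
        yield train_idx, list(test_idx)

lpo = list(leave_p_out_splits(5, 2))
print(len(lpo), comb(5, 2))                     # 10 10

# Leave-One-Out is the special case p = 1: one record tests, all others train.
loo = list(leave_p_out_splits(5, 1))
print(loo[0])                                   # ([1, 2, 3, 4], [0])
```

The number of splits grows very quickly with n and p, so Leave-P-Out is usually practical only for small datasets.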
ML Model Error

• Training Error
• Testing Error
Machine Learning Errors

• In machine learning, error measures how far an algorithm's predictions
are from the correct values on a previously unseen dataset.

• On the basis of these errors, we select the machine learning model that
performs best on the particular dataset.
Training Error
⚫ Training error is the error that you get when you run the trained model
back on the training data.

⚫ Remember that this data has already been used to train the model, yet
this does not necessarily mean that the trained model will
perform accurately when applied back to the training data itself.
Generalization/Test Error

⚫ Generalization error (also known as the out-of-sample error or
the risk) is a measure of how accurately an algorithm is able to predict
outcome values for previously unseen data.

⚫ Test error is the error that you get when you run the trained model
on a set of data that it has never been exposed to before.
This data is often used to measure the accuracy of the model before it
is shipped to production.
Machine Learning Errors

[Figure: Training Error and Testing Error]
Understanding Bias and Variance
Input Features: X1, X2, X3, X4, X5, ..., Xn
Output Label: y
Understanding Bias and Variance

• Assume you have a classification model, training data and testing data:

    x_train, y_train   // this is the training data
    x_test, y_test     // this is the testing data
    y_predicted        // the values predicted by the model given an input

• The error rate is the average error between the value predicted by the model
and the correct value.
Bias
• BIAS
Difference between predicted and actual value.

Bias is the error rate of y_predicted and y_train,
i.e. the error on the training data.

[Figure: actual value vs. predicted value, for Low Bias and High Bias]
Bias
• LOW BIAS: small difference between predicted and actual value
• HIGH BIAS: big difference between predicted and actual value
Bias

• While making predictions, a difference occurs between the values predicted
by the model and the actual/expected values. This difference
is known as bias error, or error due to bias.

• Low Bias: A low bias model makes fewer assumptions about the form of
the target function.

• High Bias: A model with a high bias makes more assumptions, and the model
becomes unable to capture the important features of our dataset.

• A high bias model also cannot perform well on new data.
Bias
• Let’s assume we have trained the model and are trying to predict values with
input ‘x_train’.

• The predicted values are y_predicted.

• Bias is the error rate of y_predicted and y_train.

• In simple terms, think of bias as the error rate on the training data.
• When the error rate is high, we call it High Bias, and when the error rate is low,
we call it Low Bias.
Variance

• How scattered the predicted values are from the actual values.

Error on the testing data.

[Figure: predicted values under High Variance and Low Variance]
Variance
• Variance specifies how much the prediction would change if
different training data were used. In simple words, variance tells how
much a random variable differs from its expected value.

• Ideally, a model should not vary too much from one training dataset to
another, which means the algorithm should be good at understanding the
hidden mapping between input and output variables.

• Variance errors are either low variance or high variance.
Variance

• Let’s assume we have trained the model and this time we are trying
to predict values with input ‘x_test’.

• Again, the predicted values are y_predicted.

• Variance is the error rate of y_predicted and y_test.
• In simple terms, think of variance as the error rate on the testing
data.

• When the error rate is high, we call it High Variance, and when the
error rate is low, we call it Low Variance.
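Using the simplified working definitions from these slides (bias as the training error rate, variance as the testing error rate), both quantities can be computed with one helper. The label lists below are invented purely for illustration, and `error_rate` is a hypothetical helper name.

```python
def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the correct label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

# Hypothetical labels and model outputs (not real data).
y_train           = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_predicted_train = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # 1 mistake out of 10
y_test            = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
y_predicted_test  = [1, 1, 0, 0, 1, 1, 1, 0, 1, 1]   # 4 mistakes out of 10

train_error = error_rate(y_train, y_predicted_train)
test_error = error_rate(y_test, y_predicted_test)
print(train_error, test_error)                        # 0.1 0.4
```

In the slides' terms, this hypothetical model shows a low error on the training data (Low Bias) but a noticeably higher error on the testing data (High Variance).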
Understanding Bias and Variance

IMP. Diagram: the center point is the bull's eye,
i.e. what the model targets.
Bias and Variance

[Figure: Training Set and Testing Set]
Bias and Variance

[Figure: Training Set and Testing Set]
Bias and Variance
[Figure: Training Set and Testing Set]
Bias and Variance

Over Fitting & Under fitting
Over Fitting, Under fitting

Overfitting
• Overfitting is an error that occurs in data modelling as a
result of a particular function aligning too closely to a
minimal set of data points.

• When a model has been compromised by overfitting, the
model may lose its value as a predictive tool.
Underfitting
• Underfitting is an error that occurs in data modelling as a
result of a particular function lying too far from the data
points.

• Underfitting is a scenario in data science where a data
model is unable to capture the relationship between the
input and output variables accurately, generating a high
error rate on both the training set and unseen data.
Over Fitting, Under fitting

[Figure: three fitted models, left to right: High Bias and High Variance; Low Bias and Low Variance; Low Bias and High Variance]
Understanding Bias and Variance

Overfitting
• When the model has a low error rate on the training data but a high error rate
on the testing data, we can say the model is overfitting.
• This usually occurs when the model is too complex for the amount of training
data available, or when the hyperparameters have been tuned to produce a low
error rate on the training data.
• A low error rate on training data implies Low Bias, whereas a high error
rate on testing data implies High Variance. In simple terms,
Low Bias and High Variance implies overfitting.
Understanding Bias and Variance
Underfitting
• When the model has a high error rate on the training data, we can say the
model is underfitting.
• This usually occurs when the model is too simple, or has not been trained
enough, to capture the pattern in the data.
• Since our model performs badly on the training data, it consequently
performs badly on the testing data as well.
• A high error rate on training data implies High Bias. In simple
terms, High Bias implies underfitting.
Overfitting, Underfitting in Regression

• In the first image, we try to fit the data using a linear equation. Due to the low flexibility of a
linear equation, it is not able to predict the samples (training data). The error rate
is therefore high, and it has a High Bias, which in turn means it’s underfitting. This model won’t
perform well on unseen data.
• In the second image, the model is flexible enough to predict most of the samples correctly
but rigid enough to avoid overfitting. In this case, our model will be able to do well on the
testing data, therefore this is an ideal model.
• In the third image, although it’s able to predict almost all the samples, it has too much
flexibility and will not be able to perform well on unseen data. As a result, it will have a
high error rate on testing data. Since it has a low error rate on training data (Low Bias) and a
high error rate on testing data (High Variance), it’s overfitting.
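The three regression cases can be reproduced on synthetic data. This is a sketch under stated assumptions, not the models shown in the slide images: the data (y = x² plus noise), the function names, and the use of a 1-nearest-neighbour "memorizer" as the overly flexible model are all choices made for this illustration.

```python
import random

def mse(model, xs, ys):
    """Mean squared error of a model (a function x -> prediction)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, feature=lambda x: x):
    """Least-squares fit of y = slope * feature(x) + intercept."""
    zs = [feature(x) for x in xs]
    n = len(zs)
    mean_z, mean_y = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    intercept = mean_y - slope * mean_z
    return lambda x: slope * feature(x) + intercept

def fit_memorizer(xs, ys):
    """1-nearest-neighbour: return the y of the closest training x."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda pt: abs(pt[0] - x))[1]

def make_data(n, rng):
    """Hypothetical data: y = x^2 plus Gaussian noise."""
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [x * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(0)
x_train, y_train = make_data(50, rng)
x_test, y_test = make_data(50, rng)

models = {
    "straight line (too rigid)": fit_line(x_train, y_train),
    "quadratic (matches the data)": fit_line(x_train, y_train, feature=lambda x: x * x),
    "memorizer (too flexible)": fit_memorizer(x_train, y_train),
}
for name, model in models.items():
    print(name,
          "train MSE:", round(mse(model, x_train, y_train), 2),
          "test MSE:", round(mse(model, x_test, y_test), 2))
```

The memorizer reproduces every training point exactly, so its training error is zero, yet it carries the training noise over to unseen points. The straight line cannot follow the curve, so its error stays high even on the data it was trained on.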
Overfitting, Underfitting in Classification

• For Model A, the error rate on training data is too high, as a result of which the error rate
on testing data is too high as well. It has a High Bias and a High Variance, therefore it’s
underfit. This model won’t perform well on unseen data.
• For Model B, the error rate on training data is low and the error rate on testing data is low
as well. It has a Low Bias and a Low Variance, therefore it’s an ideal model. This model
will perform well on unseen data.
• For Model C, the error rate on training data is very low. However, the error rate on testing
data is too high. It has a Low Bias and a High Variance, therefore it’s overfit. This
model won’t perform well on unseen data.


Over Fitting, Under fitting

Module 1.3 : Introduction To Machine Learning

Topics: Performance Measures: Measuring Quality of model - Confusion Matrix,
Accuracy, Recall, Precision, Specificity, F1 Score, RMSE


Performance Measures

TP, TN, FP , FN

• True Positives (TP):
When the actual value is Positive and the prediction is also Positive.
• True Negatives (TN):
When the actual value is Negative and the prediction is also Negative.
• False Positives (FP):
When the actual value is Negative but the prediction is Positive. Also known as
the Type 1 error.
• False Negatives (FN):
When the actual value is Positive but the prediction is Negative. Also known as
the Type 2 error.
TP, TN, FP , FN

We have a total of 20 cats and dogs and our
model predicts whether it is a cat or not.

Actual values =
[‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’,
‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’]

Predicted values =
[‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘cat’, ‘cat’, ‘cat’,
‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’]

True Positive (TP) = ?
True Negative (TN) = ?
False Positive (Type 1 Error) (FP) = ?
False Negative (Type 2 Error) (FN) = ?
TP, TN, FP , FN

• True Positive (TP) = 6
• True Negative (TN) = 11
• False Positive (Type 1 Error) (FP) = 2
• False Negative (Type 2 Error) (FN) = 1
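The four counts for the cat/dog example can be checked with a short script. The two lists are copied from the slide; the helper name `confusion_counts` and the choice of ‘cat’ as the positive class (the model predicts "cat or not") are the only assumptions.

```python
actual = ["dog", "cat", "dog", "cat", "dog", "dog", "cat", "dog", "cat", "dog",
          "dog", "dog", "dog", "cat", "dog", "dog", "cat", "dog", "dog", "cat"]
predicted = ["dog", "dog", "dog", "cat", "dog", "dog", "cat", "cat", "cat", "cat",
             "dog", "dog", "dog", "cat", "dog", "dog", "cat", "dog", "dog", "cat"]

def confusion_counts(actual, predicted, positive="cat"):
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # actual cat, predicted cat
        elif a != positive and p != positive:
            tn += 1          # actual dog, predicted dog
        elif a != positive and p == positive:
            fp += 1          # actual dog, predicted cat (Type 1 error)
        else:
            fn += 1          # actual cat, predicted dog (Type 2 error)
    return tp, tn, fp, fn

print(confusion_counts(actual, predicted))   # (6, 11, 2, 1)
```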
Confusion Matrix

• A Confusion matrix is an N x N matrix used for evaluating
the performance of a classification model, where N is the number
of target classes.

• The matrix compares the actual target values with those predicted by
the machine learning model.
Multi Class Confusion Matrix
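A small sketch of building an N x N confusion matrix for N classes. The three-class labels and predictions are invented for illustration, and the layout (rows are actual classes, columns are predicted classes) is an assumed convention, since the slide text does not fix one.

```python
def confusion_matrix(actual, predicted, labels):
    """Build an N x N matrix: row = actual class, column = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical three-class example (N = 3).
labels = ["cat", "dog", "bird"]
actual_3    = ["cat", "cat", "dog", "dog", "bird", "bird", "bird", "cat"]
predicted_3 = ["cat", "dog", "dog", "dog", "bird", "cat",  "bird", "cat"]

for label, row in zip(labels, confusion_matrix(actual_3, predicted_3, labels)):
    print(label, row)
# cat [2, 1, 0]
# dog [0, 2, 0]
# bird [1, 0, 2]
```

The diagonal holds the correct predictions for each class; every off-diagonal cell counts one specific kind of confusion, for example one actual cat predicted as a dog.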
Prediction Accuracy
Accuracy simply measures how often the classifier makes
the correct prediction. It’s the ratio between the number
of correct predictions and the total number of predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN).

Accuracy is a valid choice of evaluation metric for classification
problems that are well balanced, i.e. where there is no
skew or class imbalance.

The accuracy metric is not suited for imbalanced
classes. When a model predicts that every point belongs
to the majority class label, the accuracy will be high
even though the model has learned nothing about the
minority class.
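A quick numeric check of the accuracy ratio and of the imbalance problem described above. The first call uses the counts from the cat/dog example; the second uses an invented, heavily imbalanced dataset (95 negatives, 5 positives) where the model always predicts the majority class.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Counts from the cat/dog example: TP=6, TN=11, FP=2, FN=1.
print(accuracy(tp=6, tn=11, fp=2, fn=1))    # 0.85

# Hypothetical imbalanced data: the model predicts "negative" for everything.
print(accuracy(tp=0, tn=95, fp=0, fn=5))    # 0.95
```

The second model scores 95% accuracy while missing every single positive case, which is why accuracy alone can be misleading on imbalanced classes.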
Precision
It is a measure of the correctness achieved in positive predictions.
In simple words, it tells us how many predictions are actually
positive out of all those predicted positive.

Precision is defined as the number
of correctly classified positive cases divided by the total
number of predicted positive cases: Precision = TP / (TP + FP).
Or, out of all the predicted positives, how many we predicted
correctly. Precision should be high (ideally 1).

“Precision is a useful metric in cases where False Positive is a
higher concern than False Negatives”
Recall
It is a measure of the actual observations which are
predicted correctly, i.e. how many observations of the positive
class are actually predicted as positive.

It is also known as Sensitivity. Recall is a valid choice of
evaluation metric when we want to capture as many
positives as possible.

Recall is defined as the number of correctly
classified positive cases divided by the total number of actual
positive cases: Recall = TP / (TP + FN). Or, out of all the
actual positives, how many we have predicted correctly.
Recall should be high (ideally 1).

“Recall is a useful metric in cases where False
Negative trumps False Positive”
F1 Score

F1 score is the harmonic mean of Precision and Recall:
F1 = 2 × Precision × Recall / (Precision + Recall).
Compared to the Arithmetic Mean, the Harmonic Mean punishes
extreme values more. F1 score should be high (ideally 1).

F1 score maintains a balance between the precision and recall of your
classifier. If your precision is low, the F1 is low, and if the recall is low,
your F1 score is again low.

There will be cases where it is not clear whether Precision or Recall is
more important. In those cases we combine them!
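Precision, Recall and F1 can be computed for the cat/dog example from earlier (TP = 6, FP = 2, FN = 1). The function names are illustrative; the formulas are the standard definitions described on these slides.

```python
def precision(tp, fp):
    """Of everything predicted positive, the fraction that really is positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything actually positive, the fraction the model found."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)

p = precision(tp=6, fp=2)    # 6 / 8
r = recall(tp=6, fn=1)       # 6 / 7
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))   # 0.75 0.857 0.8

# The harmonic mean punishes an extreme value: perfect recall cannot rescue
# a very low precision, whereas the arithmetic mean would report 0.55.
print(round(f1_score(0.1, 1.0), 3))                          # 0.182
```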
Sensitivity & Specificity
Sensitivity & Specificity

• Sensitivity (Recall)
(true positive rate) refers to the probability of a
positive test, conditioned on truly being positive:
Sensitivity = TP / (TP + FN).

• Specificity
(true negative rate) refers to the probability of a
negative test, conditioned on truly being negative:
Specificity = TN / (TN + FP).

Sensitivity and specificity are typically used to evaluate the accuracy of
medical tests, while precision and recall are typically used to evaluate
model performance.
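Both rates can be computed from the same four counts. The numbers below reuse the cat/dog example (TP = 6, TN = 11, FP = 2, FN = 1); the function names are illustrative.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of actual negatives predicted negative."""
    return tn / (tn + fp)

print(round(sensitivity(tp=6, fn=1), 3))    # 0.857  (6 of the 7 cats found)
print(round(specificity(tn=11, fp=2), 3))   # 0.846  (11 of the 13 dogs found)
```

Sensitivity rises as FN falls, and specificity rises as FP falls, so each rate tracks one of the two error types.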
Sensitivity & Specificity

High sensitivity means High TP and Low FN.
High specificity means High TN and Low FP.
Sensitivity & Specificity
END OF UNIT - I
