MACHINE LEARNING
NOTES
MOST IMPORTANT QUESTIONS OF MACHINE LEARNING AKTU
-ENGINEER BEING
MODULE 1
PART-I
Learning is the process of acquiring new understanding, knowledge, behaviors,
skills, values, attitudes, and preferences. Learning is any process by which a
system improves its performance from experience.
Ques2. What is Machine Learning? 2020-21 2M
Ans. Machine learning (ML) is defined as a discipline of artificial intelligence (AI)
that provides machines the ability to automatically learn from data and past
experiences to identify patterns and make predictions with minimal human
intervention.
“Machine learning enables a machine to automatically learn from data, improve
performance from experiences, and predict things without being explicitly
programmed”.
Ques3. Difference between ML, AI, Deep Learning? 2020-21 2M
Artificial Intelligence: AI is the broadest concept of all, and gives a machine
the ability to imitate human behaviour.
Machine Learning: Machine Learning uses algorithms and techniques that enable
machines to learn from past experience/trends and predict the output based on
that data. Their performance improves as they are exposed to more data over time.
Deep Learning: A subset of machine learning in which multilayered neural
networks learn from vast amounts of data.
[Figure: Artificial Intelligence, Machine Learning, and Deep Learning]
The main difference between machine learning and deep learning technologies is
the way data is presented to the model. Machine learning learns from structured
or unstructured data using a variety of algorithms, while deep learning relies on
multilayered neural networks to learn its models.
applications of ML?
Ans. Machine learning is important because it gives enterprises a view of trends in
customer behavior and business operational patterns, as well as supports the
development of new products.
Many of today's leading companies, such as Facebook, Google and Uber, make
machine learning a central part of their operations. Machine learning has become a
significant competitive differentiator for many companies.
Applications of ML:
1. Image recognition:
a. Image recognition is the process of identifying and detecting an object or a
feature in a digital image or video.
b. This is used in many applications like systems for factory automation, toll booth
monitoring, and security surveillance.
2. Speech recognition:
a. Speech Recognition (SR) is the translation of spoken words into text.
b. It is also known as Automatic Speech Recognition (ASR), computer speech
recognition, or Speech To Text (STT).
c. In speech recognition, a software application recognizes spoken words.
3. Product recommendation:
Machine learning is widely used by various e-commerce and entertainment
companies such as Amazon, Netflix, etc., for product recommendation to the user.
Whenever we search for some product on Amazon, we start getting
advertisements for the same product while surfing the internet on the same browser,
and this is because of machine learning.
4. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal,
or spam.
We always receive an important mail in our inbox with the important symbol and
spam emails in our spam box, and the technology behind this is machine learning.
5. Stock Market trading:
Machine learning is widely used in stock market trading.
In the stock market, there is always a risk of ups and downs in share prices, so
machine learning's long short-term memory (LSTM) neural network is used for the
prediction of stock market trends.
Types of Machine Learning:
* Supervised Learning
* Unsupervised Learning
* Reinforcement Learning
Supervised learning is a type of machine learning in which machines are
trained using well “labelled” training data, and on the basis of that data, machines
predict the output.
The labelled data means some input data is already tagged with the correct output.
Ex: Risk Assessment, Image classification, Fraud Detection, spam filtering, etc.
Types of Supervised learning
* i. Classification: A classification problem is when the output variable is a
category, such as “red” or “blue”, “disease” and “no disease”, Yes-No,
Male-Female, True-False, etc.
* ii. Regression: A regression problem is when the output variable is a real
value, such as forecasting sales, weather forecasting, etc.
Unsupervised learning is a type of machine learning in which models are
trained using an unlabeled dataset and are allowed to act on that data without
any supervision.
The goal of unsupervised learning is to find the underlying structure of the
dataset, group the data according to similarities, and represent the dataset
in a compressed format.
* The output is dependent upon the coded algorithms.
* Clustering: A clustering problem is where you want to discover the
inherent groupings in the data, such as grouping customers by purchasing
behavior.
* Association: Association rule learning is a kind of unsupervised learning
technique that tests for the dependence of one data element on another
data element and maps them accordingly so that the result can be more
cost-effective. It tries to discover interesting relations or associations
between the variables of the dataset.
Semi Supervised learning is between the supervised and unsupervised learning
families. The semi-supervised models use both labeled and unlabeled data for
training.
Reinforcement Learning is a feedback-based Machine learning technique in
which an agent learns to behave in an environment by performing actions and
seeing the results of those actions. For each good action, the agent gets positive
feedback, and for each bad action, the agent gets negative feedback or a penalty.
The main elements of an RL system are:
* The agent or the learner
* The environment the agent interacts with
* The policy that the agent follows to take actions
* The reward signal that the agent observes upon taking action
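The four elements listed above can be seen together in a toy loop. This is only a sketch under assumptions: the two actions, the +1/-1 rewards, and the names `environment` and `run_agent` are all hypothetical and chosen for illustration.

```python
# Toy agent-environment loop (all names and rewards are hypothetical).

def environment(action):
    """Reward signal: +1 for the 'good' action, -1 (penalty) otherwise."""
    return 1 if action == "right" else -1

def run_agent(steps=10):
    actions = ["left", "right"]
    value = {a: 0.0 for a in actions}    # agent's running reward estimate
    counts = {a: 0 for a in actions}
    history = []
    for step in range(steps):
        # Policy: try every action once, then pick the best-valued action.
        if step < len(actions):
            action = actions[step]
        else:
            action = max(actions, key=lambda a: value[a])
        reward = environment(action)     # feedback from the environment
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        value[action] += (reward - value[action]) / counts[action]
        history.append((action, reward))
    return value, history

value, history = run_agent()
print(value)
print(history)
```

After trying each action once, the agent keeps choosing the action that earned positive feedback.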
GENETIC ALGORITHM | TRADITIONAL ALGORITHM
A genetic algorithm is a search-based algorithm used for solving optimization problems in machine learning. | Traditional algorithms are the general algorithms we use to solve problems. A traditional algorithm is a methodical procedure to solve a given problem. There can be several algorithms to solve a problem.
More advanced | Not as advanced
Used in ML, AI | Used in Programming, Math
1) Process Complexity of Machine Learning
The machine learning process is very complex, which is another major issue
faced by machine learning engineers and data scientists. Much of the work consists
of trial-and-error experiments; hence the probability of error is higher than expected.
Further, it also includes analyzing the data, removing data bias, training data,
applying complex mathematical calculations, etc., making the procedure more
complicated and quite tedious.
2) Getting bad recommendations
A machine learning model operates under a specific context, which can result in bad
recommendations and concept drift in the model. Suppose that at a specific time a
customer is looking for some gadgets, but the customer's requirements change over
time while the machine learning model keeps showing the same recommendations,
even though the customer's expectations have changed. This incident is called
Data Drift. However, we can overcome this by regularly monitoring and
updating the data according to the expectations.
3) Overfitting and Underfitting
Overfitting:
Overfitting is one of the most common issues faced by Machine Learning
engineers and data scientists. When a machine learning model fits its training data
too closely, for example when it is trained on a large amount of noisy data, it starts
capturing the noise and inaccurate entries in the training data set, which hurts its
performance on new data.
Underfitting:
Underfitting is just the opposite of overfitting. When a machine learning
model is trained with too little data, it cannot capture the underlying pattern; as a
result, it produces incomplete and inaccurate predictions, which destroys the
accuracy of the machine learning model.
4) Inadequate Training Data
The major issue that comes up while using machine learning algorithms is the lack of
quality as well as quantity of data. Although data plays a vital role in the
processing of machine learning algorithms, many data scientists claim that
inadequate data, noisy data, and unclean data severely hamper
machine learning algorithms.
For example, a simple task requires thousands of sample data points, and an advanced
task such as speech or image recognition needs millions of examples. Further,
data quality is also important for the algorithms to work ideally,
but poor data quality is also found in Machine Learning applications.
5) Monitoring and maintenance
A machine learning model must produce output that generalizes well. Hence,
regular monitoring and maintenance become compulsory. Different results for
different actions require the data to change; hence editing of code, as well as
resources for monitoring, also become necessary.
Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal
nodes represent the features of a dataset, branches represent the
decision rules and each leaf node represents the outcome.
In a Decision tree, there are two types of nodes: the Decision
Node and the Leaf Node. Decision nodes are used to make a decision and
have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branches.
* In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
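CART commonly scores candidate splits for classification with Gini impurity. The sketch below shows only that scoring step, not a full tree builder; the label lists and the helper names `gini` and `split_gini` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini impurity of a two-way split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = ["yes", "yes", "yes", "no", "no", "no"]
# A decision rule that separates the classes perfectly.
pure_split = split_gini(["yes", "yes", "yes"], ["no", "no", "no"])
# A decision rule that leaves both branches mixed.
mixed_split = split_gini(["yes", "no", "yes"], ["no", "yes", "no"])
print(gini(parent), pure_split, mixed_split)
```

A decision node would prefer the rule with the lower weighted impurity, here the pure split.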
representation:
Machine Learning
ANN
Clustering
Reinforcement Learning
Decision Tree Learning
Bayesian Networks
SVM (Support Vector Machine)
Genetic Algorithms
2020-21 10M
The term "Artificial Neural Network" is derived from the biological neural networks
that make up the structure of the human brain. Similar to the human brain, which has
interconnected neurons, an artificial neural network has neurons (nodes) that are
connected to one another.
Biological Network | Artificial Neural Network
Dendrites | Inputs
Synapse | Interconnections
Axon | Output
In a neural network, there are three essential layers:
Input Layer
The input layer is the first layer of an ANN that receives the input information in
the form of various texts, numbers, audio files, image pixels, etc.
Hidden Layers
In the middle of the ANN model are the hidden layers. There can be a single
hidden layer or several.
Output Layer
In the output layer, we obtain the result of the rigorous
computations performed by the middle layers.
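A forward pass through the three kinds of layers can be sketched as follows. The weights, biases, and the sigmoid activation are assumptions made for illustration; the notes do not specify them.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: sigmoid(w . x + b) for every neuron."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_w = [[0.5, -0.5], [0.3, 0.8]]
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0]]
output_b = [0.0]

x = [1.0, 2.0]                              # input layer: raw feature values
hidden = layer(x, hidden_w, hidden_b)       # hidden layer activations
output = layer(hidden, output_w, output_b)  # output layer result
print(hidden, output)
```

Training would adjust the weights and biases; this sketch only shows how information flows from input to output.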
Artificial Neural Networks: application problems
Following are the important Artificial Neural Networks applications:
Handwritten Character Recognition
ANNs are used for handwritten character recognition. Neural Networks are trained
to recognize handwritten characters, which can be in the form of letters or
digits.
Facial Recognition
In order to recognize the faces based on the identity of the person, we make use of
neural networks. They are most commonly used in areas where the users require
security access.
Speech Recognition
ANNs play an important role in speech recognition. The earlier models of Speech
Recognition were based on statistical models like Hidden Markov Models. With
the advent of deep learning, various types of neural networks have become the
preferred choice for obtaining an accurate classification.
2020-21 10M (UNIT 2)
SVM or Support Vector Machine is a linear model for classification and regression
problems. It can solve linear and non-linear problems and works well for many
practical problems.
According to the SVM algorithm, we find the points closest to the line from both the
classes. These points are called support vectors.
We compute the distance between the line and the support vectors. This distance is
called the margin. Our goal is to maximize the margin. The hyperplane for which
the margin is maximum is the optimal hyperplane.
Thus SVM tries to make a decision boundary in such a way that the separation
between the two classes is as wide as possible.
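The margin of one candidate line can be measured directly. The sketch below does not search for the optimal hyperplane; it only computes, for a hypothetical line and hypothetical 2-D points, the closest points (support vectors) and their distance (the margin as defined above).

```python
import math

def distance_to_line(point, w, b):
    """Perpendicular distance from a point to the line w.x + b = 0."""
    return abs(sum(wi * xi for wi, xi in zip(w, point)) + b) / math.hypot(*w)

# Hypothetical data and a candidate separating line x1 + x2 - 5 = 0.
w, b = (1.0, 1.0), -5.0
class_a = [(1.0, 1.0), (2.0, 2.0), (1.0, 3.0)]   # all give w.x + b < 0
class_b = [(4.0, 4.0), (3.0, 3.0), (5.0, 2.0)]   # all give w.x + b > 0

distances = {p: distance_to_line(p, w, b) for p in class_a + class_b}
margin = min(distances.values())
support_vectors = [p for p, d in distances.items() if math.isclose(d, margin)]
print(margin, support_vectors)
```

An SVM trainer would compare many such lines and keep the one whose margin is largest.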
202
1 10M
Clustering
* A way of grouping the data points into different clusters, consisting of
similar data points. The objects with possible similarities remain in a
group that has few or no similarities with another group.
* It is an unsupervised learning method, hence no supervision is provided to
the algorithm, and it deals with the unlabeled dataset.
* After applying this clustering technique, each cluster or group is provided
with a cluster-ID. An ML system can use this ID to simplify the processing of
large and complex datasets.
* The clustering technique is commonly used for statistical data analysis.
Example:
Clustering technique with the real-world example of a mall:
When we visit any shopping mall, we can observe that things with similar
usage are grouped together. For instance, t-shirts are grouped in one section and
trousers in another; similarly, in the fruit and vegetable section, apples, bananas,
mangoes, etc., are grouped in separate sections, so that we can easily find
things. The clustering technique also works in the same way.
Classification and Regression
Regression and Classification algorithms are Supervised Learning
algorithms. Both the algorithms are used for prediction in Machine learning
and work with the labeled datasets. But the difference between both is how
they are used for different machine learning problems.
Classification | Regression
Classification algorithms are used to predict/classify discrete values such as Male or Female, True or False, Spam or Not Spam, etc. | Regression algorithms are used to predict continuous values such as price, salary, age, etc.
The task of the classification algorithm is to map the input value (x) to the discrete output variable (y). | The task of the regression algorithm is to map the input value (x) to the continuous output variable (y).
Classification algorithms are used with discrete data. | Regression algorithms are used with continuous data.
Classification algorithms can be divided into Binary Classifier and Multi-class Classifier. | Regression algorithms can be further divided into Linear and Non-linear Regression.
Classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc. | Regression algorithms can be used to solve regression problems such as weather prediction, house price prediction, etc.
In Email Spam Detection, the model is trained on the basis of millions of emails on different parameters, and whenever it receives a new email, it identifies whether the email is spam or not. If the email is spam, then it is moved to the Spam folder. | Suppose we want to do weather forecasting, so for this we will use the Regression algorithm. In weather prediction, the model is trained on the past data, and once the training is completed, it can easily predict the weather for future days.
2021-22 2M
A learning problem is said to be well defined if it has three features: the class of
tasks, the measure of performance to be improved, and the source of experience.
Ex: A checkers learning problem
- Task T: playing checkers
- Performance measure P: percent of games won against opponents
- Training experience E: playing practice games against itself
"Data Science is a field of deep study of data that includes extracting useful
insights from the data, and processing that information using different tools,
statistical models, and Machine learning algorithms."
Machine Learning allows computers to learn from past experiences on their
own. It uses statistical methods to improve the performance and predict the
output without being explicitly programmed.
Or
Design the final design of Checkers Learning Program 2021-22 10M
Learning is the process of acquiring new understanding, knowledge, behaviors,
skills, values, attitudes, and preferences. Learning is any process by which a
system improves its performance from experience.
Designing a Learning System in Machine Learning:
Step 1) Choosing the Training Experience: The first and most important task is
to choose the training data or training experience which will be fed to the
Machine Learning Algorithm.
Three attributes are used:
1. Whether the training experience provides direct or indirect feedback
regarding the choices made by the performance system.
* Direct training examples in learning to play checkers consist of individual
checkers board states and the correct move for each.
* Indirect training examples in the same game consist of the move sequences
and final outcomes of various games played, in which information about the
correctness of specific moves early in the game must be inferred indirectly
from the fact that the game was eventually won or lost. This is the credit
assignment problem.
2. The degree to which the learner controls the sequence of training examples.
Example:
* The learner might rely on the teacher to select informative board states
and to provide the correct move for each.
* The learner might itself propose board states that it finds particularly
confusing and ask the teacher for the correct move.
* Or the learner may have complete control over the board states and (indirect)
classifications, as it does when it learns by playing against itself with no teacher
present.
3. How well the training experience represents the distribution of examples
over which performance will be tested.
This basically means that the more diverse the set of training experiences,
the better the performance can get.
Example: If the training experience in checkers consists only of games played
against itself, the learner might never encounter certain crucial board states that are
very likely to be played by the human checkers champion.
Step 2- Choosing target function: To determine what type of knowledge will be
learned and how this will be used by the performance program.
Example: In checkers, the program needs to learn to choose the best move among
the legal moves.
Step 3- Choosing Representation for Target function: Once the target function
is chosen, we have to choose a representation for it. When the machine learning
algorithm has a complete list of all permitted moves, it may pick the best one
using any representation, such as linear equations, hierarchical graph
representation, tabular form, and so on. Out of these moves, the NextMove
function selects the target move, the one that will increase the success rate.
For example, if a chess machine has four alternative moves, the computer will
select the most optimal move, the one that will lead to victory.
Step 4- Choosing Function Approximation Algorithm:
In this step, we choose a learning algorithm that can approximate the target
function chosen. This step further consists of two sub-steps: a. Estimating the
training value, and b. Adjusting the weights.
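The weight-adjustment sub-step can be illustrated with the least mean squares (LMS) rule, a common choice when the target function is represented as a linear combination of board features. The features, training value, and learning rate below are hypothetical, and LMS itself is an assumption here since the notes do not name the update rule.

```python
def v_hat(weights, features):
    """Linear evaluation function: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, lr):
    """One LMS step: w_i <- w_i + lr * (V_train - V_hat) * x_i, with x_0 = 1."""
    error = v_train - v_hat(weights, features)
    xs = [1.0] + list(features)
    return [w + lr * error * x for w, x in zip(weights, xs)]

weights = [0.0, 0.0, 0.0]
board = [3.0, 1.0]     # hypothetical board features, e.g. piece counts
v_train = 10.0         # hypothetical estimated training value for this board

before = abs(v_train - v_hat(weights, board))
for _ in range(20):
    weights = lms_update(weights, board, v_train, lr=0.05)
after = abs(v_train - v_hat(weights, board))
print(weights, before, after)
```

Each step moves the weights in the direction that shrinks the gap between the estimated training value and the current prediction.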
[Figure: final design of the checkers learning program, with labels including Hypothesis, Problem, Examples, and Trace]
The final design consists of four modules, as described in the picture.
1. The performance system: The performance system solves the given
performance task.
MODULE 2
PART-I
2020-21 10M
Or
Discuss Support vectors in SVM. 2020-21 2M
Or
2020-21 10M
SVM or Support Vector Machine is a linear model for classification and
regression problems. It can solve linear and non-linear problems and works well for
many practical problems.
It tries to classify data by a hyperplane that maximizes the margin
between the classes in the training data. Hence, SVM is an example of a large
margin classifier.
The idea of SVM is simple: The algorithm creates a line or a hyperplane which
separates the data into classes.
According to the SVM algorithm, we find the points closest to the line from both the
classes. These points are called support vectors.
We compute the distance between the line and the support vectors. This distance is
called the margin. Our goal is to maximize the margin. The hyperplane for which
the margin is maximum is the optimal hyperplane.
Thus SVM tries to make a decision boundary in such a way that the separation
between the two classes is as wide as possible.
SVM KERNELS
* SVM can work well in non-linear data cases using the kernel trick.
* The function of the kernel trick is to map the low-dimensional input space
into a higher-dimensional space.
* In simple words, a kernel converts non-separable problems into separable problems
by adding more dimensions.
* It makes SVM more powerful, flexible and accurate.
THREE TYPES OF KERNEL
1) Linear Kernel: A linear kernel is the normal dot product of any two
given observations. The equation for the kernel function:
K(x, xi) = sum(x * xi)
2) Polynomial kernel: It is a more generalized form of the linear kernel and can
distinguish curved or nonlinear input space.
It is popular in image processing.
Following is the formula for the polynomial kernel:
K(x, xi) = (1 + sum(x * xi))^d, where d is the degree of the polynomial
3) Gaussian Radial Basis Function (RBF) Kernel: The RBF kernel, mostly used in
SVM classification, maps the input space into an infinite-dimensional space.
It is a general-purpose kernel, used when there is no prior knowledge about
the data.
The following formula explains it mathematically:
K(x, xi) = exp(-gamma * sum((x - xi)^2))
Gamma: 1/(2σ^2)
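The three kernels can be written as small functions. This is a sketch: the polynomial kernel is assumed to have the common form (1 + x . xi)^d, and the sample vectors and default parameter values are hypothetical.

```python
import math

def linear_kernel(x, xi):
    """Dot product of two observations."""
    return sum(a * b for a, b in zip(x, xi))

def polynomial_kernel(x, xi, d=2):
    """Assumed form (1 + x . xi)^d, where d is the polynomial degree."""
    return (1 + linear_kernel(x, xi)) ** d

def rbf_kernel(x, xi, gamma=0.5):
    """exp(-gamma * squared Euclidean distance between x and xi)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xi)))

x, xi = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, xi))         # 1*3 + 2*4
print(polynomial_kernel(x, xi))     # (1 + 11)^2
print(rbf_kernel(x, xi))            # exp(-0.5 * 8)
```

Each function returns a similarity score; the RBF kernel equals 1 for identical points and decays towards 0 as points move apart.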
APPLICATIONS OF KERNEL
* Face detection: SVM classifies parts of the image as face and non-face
and creates a square boundary around the face.
* Handwriting recognition: SVMs are widely used to recognize handwritten
characters.
* Texture classification using SVM: In this SVM application, we use the
images of certain textures and use that data to classify whether the surface is
smooth or not.
* Steganography detection in digital images: Using SVM, we can find out if an
image is pure or adulterated. This could be used in security-based
organizations to uncover secret messages, since messages can be hidden in
high-resolution images. In high-resolution images there are more pixels, hence
the message is harder to find. We can segregate the pixels, store the data in
various datasets, and analyze those datasets using SVM.
PROPERTIES OF SVM:
1. Flexibility in choosing a similarity function
2. Sparseness of solution when dealing with large data sets: only support
vectors are used to specify the separating hyperplane
3. Ability to handle large feature spaces: complexity does not depend on the
dimensionality of the feature space
4. Overfitting can be controlled by the soft margin approach (we let some data
points enter our margin intentionally)
5. A simple convex optimization problem which is guaranteed to converge to a
single global solution
DISADVANTAGES OF SVM:
1. The SVM algorithm is not suitable for large data sets because the required
training time is higher.
2. SVM does not perform very well when the data set has more noise, i.e.
target classes are overlapping.
3. In cases where the number of features for each data point exceeds the
number of training data samples, the SVM will underperform.
4. SVMs with the "wrong" kernel: For SVMs nowadays, choosing the right
kernel function is key. As an example, using the linear kernel when the
data are not linearly separable results in the algorithm performing poorly.
2020-21 2M
Regression is a supervised learning technique which helps in finding the
correlation between variables and enables us to predict a continuous output
variable based on one or more predictor variables.
It is mainly used for prediction, forecasting, time series modeling, and
determining the cause-effect relationship between variables.
Some examples of regression are:
* Prediction of rain using temperature and other factors
* Determining market trends
* Prediction of road accidents due to rash driving
It is used to find the trends in data.
By performing regression, we can determine the most important
factor, the least important factor, and how each factor affects the other
factors.
Linear Regression | Logistic Regression
Linear Regression is a supervised regression model. | Logistic Regression is a supervised classification model.
In Linear Regression, we predict a continuous numeric value. | In Logistic Regression, we predict the value as 1 or 0.
It is based on least squares estimation. | It is based on maximum likelihood estimation.
Here, when we plot the training data, a straight line can be drawn that passes as close as possible to most of the points. | Any change in the coefficient leads to a change in both the direction and the steepness of the logistic function. Positive slopes result in an S-shaped curve and negative slopes result in a Z-shaped curve.
Linear regression is used to estimate the dependent variable in case of a change in independent variables. For example, predicting the price of houses. | Logistic regression is used to calculate the probability of an event. For example, classifying whether tissue is benign or malignant.
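The two estimators can be contrasted in a few lines. This is a minimal sketch: the data points are hypothetical, the least squares fit uses the standard closed form for one predictor, and the logistic part only evaluates the curve for a given coefficient rather than fitting it by maximum likelihood.

```python
import math

def least_squares_fit(xs, ys):
    """Slope and intercept that minimise the sum of squared errors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def logistic(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Linear regression on hypothetical points lying on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = least_squares_fit(xs, ys)
print(slope, intercept)

# Logistic curve: a positive coefficient rises with x, a negative one falls.
rising = [logistic(2.0 * x) for x in (-1.0, 0.0, 1.0)]
falling = [logistic(-2.0 * x) for x in (-1.0, 0.0, 1.0)]
print(rising, falling)
```

The linear model outputs a continuous value, while the logistic output is read as the probability of the positive class and thresholded to 1 or 0.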
MODULE 2
PART-II
2020-21 2M
The vertices and edges in a Bayesian Network have meaning: the network
structure itself gives you important information about the dependence
between the variables. With Neural Networks, the network structure
does not tell you anything of this kind.
A similarity between ANNs and Bayesian Networks is that they both use directed graphs.
[Figure: neural network with input, hidden, and output layers]
2021-22 10M
Or
2021-22 10M
Bayes theorem is one of the most popular machine learning concepts. It helps to
calculate the probability of one event occurring, under uncertain knowledge, given
that another event has already occurred.
Bayes Theorem is a way of finding a probability when we know certain other
probabilities.
P(X|Y) = P(Y|X) * P(X) / P(Y)
Which tells us: how often X happens given that Y happens, written P(X|Y),
When we know: how often Y happens given that X happens, written P(Y|X)
and how likely X is on its own, written P(X)
and how likely Y is on its own, written P(Y)
The above equation is called Bayes Rule or Bayes Theorem:
* P(X|Y) is called the posterior, which we need to calculate. It is defined as the
updated probability after considering the evidence.
* P(Y|X) is called the likelihood. It is the probability of the evidence when the
hypothesis is true.
* P(X) is called the prior probability, the probability of the hypothesis before
considering the evidence.
* P(Y) is called the marginal probability. It is defined as the probability of the
evidence under any consideration.
Hence, Bayes Theorem can be written as
posterior = likelihood * prior / evidence
EXAMPLE:
Dangerous fires are rare (1%)
But smoke is fairly common (10%) due to barbecues,
And 90% of dangerous fires make smoke.
We can then discover the probability of dangerous fire when there is smoke:
P(Fire|Smoke) = P(Fire) * P(Smoke|Fire) / P(Smoke)
= (1% * 90%) / 10%
= 9%
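The same arithmetic can be checked with a one-line function. The function name `bayes` and its parameter names are hypothetical; the numbers come from the example above.

```python
def bayes(prior, likelihood, evidence):
    """posterior = likelihood * prior / evidence"""
    return likelihood * prior / evidence

p_fire = 0.01              # P(Fire): dangerous fires are rare
p_smoke = 0.10             # P(Smoke): smoke is fairly common
p_smoke_given_fire = 0.90  # P(Smoke|Fire)

p_fire_given_smoke = bayes(p_fire, p_smoke_given_fire, p_smoke)
print(round(p_fire_given_smoke, 4))
```

Even though most dangerous fires produce smoke, seeing smoke only raises the probability of a dangerous fire to about 9%, because smoke is far more common than fire.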
Naive Bayes Classifier Algorithm
Naive Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.
It is mainly used in text classification, which involves a high-dimensional
training dataset.
It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
Some popular examples of the Naive Bayes Algorithm are spam filtration,
sentiment analysis, and classifying articles.
The distinction between Bayes theorem and Naive Bayes is that Naive Bayes
assumes conditional independence where Bayes theorem does not. This means
all input features are assumed to be independent of one another, given the class.
Working of Naive Bayes Classifier:
Working of the Naive Bayes Classifier can be understood with the help of the below
example:
Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". Using this dataset, we need to decide whether we should
play or not on a particular day according to the weather conditions. To solve this
problem, we need to follow the below steps:
1. Convert the given dataset into frequency tables.
2. Generate the Likelihood table by finding the probabilities of the given features.
3. Now, use Bayes theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the Player play or not?
   | Outlook  | Play
0  | Rainy    | Yes
1  | Sunny    | Yes
2  | Overcast | Yes
3  | Overcast | Yes
4  | Sunny    | No
5  | Rainy    | Yes
6  | Sunny    | Yes
7  | Overcast | Yes
8  | Rainy    | No
9  | Sunny    | No
10 | Sunny    | Yes
11 | Rainy    | No
12 | Overcast | Yes
13 | Overcast | Yes
Frequency Table:
Weather  | Yes | No
Overcast | 5   | 0
Rainy    | 2   | 2
Sunny    | 3   | 2
Total    | 10  | 4
Likelihood Table:
Weather  | No          | Yes          | P(Weather)
Overcast | 0           | 5            | 5/14 = 0.35
Rainy    | 2           | 2            | 4/14 = 0.29
Sunny    | 2           | 3            | 5/14 = 0.35
All      | 4/14 = 0.29 | 10/14 = 0.71 |
Applying Bayes theorem:
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
P(Sunny | Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes | Sunny) = 0.3 * 0.71 / 0.35 = 0.60
P(No | Sunny) = P(Sunny | No) * P(No) / P(Sunny)
P(Sunny | No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No | Sunny) = 0.5 * 0.29 / 0.35 = 0.41
As we can see from the above calculation, P(Yes | Sunny) > P(No | Sunny).
Hence on a Sunny day, the Player can play the game.
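The three steps can be reproduced by counting directly from the dataset. This is a sketch under one assumption: the last row of the dataset, which is not legible in the notes, is taken to be Overcast/Yes so that the totals match 10 Yes and 4 No. The function name `posterior` is hypothetical.

```python
# (Outlook, Play) pairs; the final row is an assumption (see note above).
data = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Sunny", "No"), ("Rainy", "Yes"),
    ("Sunny", "Yes"), ("Overcast", "Yes"), ("Rainy", "No"),
    ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]

def posterior(outlook, play):
    """P(play | outlook) = P(outlook | play) * P(play) / P(outlook)."""
    n = len(data)
    class_count = sum(1 for _, p in data if p == play)
    joint_count = sum(1 for o, p in data if o == outlook and p == play)
    outlook_count = sum(1 for o, _ in data if o == outlook)
    likelihood = joint_count / class_count   # from the likelihood table
    prior = class_count / n
    evidence = outlook_count / n
    return likelihood * prior / evidence

p_yes = posterior("Sunny", "Yes")
p_no = posterior("Sunny", "No")
print(round(p_yes, 2), round(p_no, 2))
print("Play" if p_yes > p_no else "Do not play")
```

With exact fractions the posteriors come out as 0.6 and 0.4; the 0.41 in the hand calculation comes from rounding the intermediate probabilities. The decision is the same either way.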
Ques 3) What problem does the EM algorithm solve? 10M 2021-22
Or
What are the tasks of the E-step in the EM Algorithm? 2M 2020-21
The Expectation-Maximization (EM) algorithm is an iterative method, used within
various unsupervised machine learning algorithms, to determine
local maximum likelihood estimates (MLE) or maximum a posteriori
(MAP) estimates in statistical models with unobservable variables.
It is a technique for finding maximum likelihood estimates when latent variables
are present, and is therefore associated with latent variable models.
A latent variable model consists of both observable and unobservable variables,
where the observable variables can be measured directly while the unobserved ones
are inferred from the observed variables. These unobservable variables are known
as latent variables.
Steps in EM Algorithm
The EM algorithm is completed mainly in 4 steps: the Initialization
Step, Expectation Step, Maximization Step, and Convergence Step.
[Figure: flowchart of the EM steps, starting from initial values]
1st Step: The very first step is to initialize the parameter values. Further, the
system is provided with incomplete observed data with the assumption
that the data is obtained from a specific model.
2nd Step: This step is known as Expectation or E-Step, which is used to
estimate or guess the values of the missing or incomplete data using the
observed data. Further, the E-step primarily updates the variables.
3rd Step: This step is known as Maximization or M-step, where we use the
complete data obtained from the 2nd step to update the parameter values.
Further, the M-step primarily updates the hypothesis.
4th Step: The last step is to check whether the values of the latent variables are
converging or not. If yes, then stop the process; else, repeat the
process from step 2 until convergence occurs.
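The four steps can be traced on a small mixture model. The sketch below assumes a mixture of two 1-D Gaussians with equal weights and unit variance, so that only the two means are estimated; the data points, initial means, and the function name `em_two_gaussians` are hypothetical.

```python
import math

def normal_pdf(x, mean, var=1.0):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(xs, init=(0.0, 6.0), max_iters=50, tol=1e-9):
    mu_a, mu_b = init                       # Step 1: initialise parameters
    for _ in range(max_iters):
        # Step 2 (E-step): probability that each point came from component A.
        resp = []
        for x in xs:
            pa, pb = normal_pdf(x, mu_a), normal_pdf(x, mu_b)
            resp.append(pa / (pa + pb))
        # Step 3 (M-step): re-estimate each mean as a weighted average.
        weight_a = sum(resp)
        weight_b = len(xs) - weight_a
        new_a = sum(r * x for r, x in zip(resp, xs)) / weight_a
        new_b = sum((1 - r) * x for r, x in zip(resp, xs)) / weight_b
        # Step 4: stop once the estimates no longer change noticeably.
        converged = abs(new_a - mu_a) < tol and abs(new_b - mu_b) < tol
        mu_a, mu_b = new_a, new_b
        if converged:
            break
    return mu_a, mu_b

xs = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]   # two visible groups of observations
mu_a, mu_b = em_two_gaussians(xs)
print(round(mu_a, 2), round(mu_b, 2))
```

Here the latent variable is which component generated each point; the E-step estimates it and the M-step uses those estimates to update the means.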
MODULE 3
PART-I
If we depend too much on the training data while building the decision tree, there
is a possibility that the tree will overfit. That is, a particular hypothesis
works well on the training data but does not work well on testing or real-world
data. Such a tree is said to be overfitting.
Underfitting occurs when our machine learning model is not able to capture the
underlying trend of the data. In the case of underfitting, the model is not able to
learn enough from the training data, and hence it reduces the accuracy and
produces unreliable predictions.
a. overfitting the data
b. handling continuous-valued attributes
c. handling missing attribute values
d. handling attributes with different costs
Ans: a. Overfitting the data
If we depend too much on the training data while building the decision tree, there
is a possibility that the tree will overfit. That is, a particular hypothesis
works well on the training data but does not work well on the test or
real-world data. Such a tree is said to be overfitting.
This overfitting can be addressed with two techniques:
reduced error pruning
rule post-pruning.
b. Handling continuous-valued attributes
The decision tree works well on problems where we have a fixed number of
attributes and a discrete number of possibilities for each attribute. If a particular
attribute has continuous values, then we cannot apply the decision tree directly.
First, we need to convert the attributes that have continuous
values into discrete possibilities. Only then can we apply decision tree learning.
c. Handling missing attribute values
If some attribute values are missing, we need to fill those missing
attributes with proper values; only then can we use this learning. If a
particular attribute does not have a value, we need to find some value or fill it with
a proper value.
d. Handling attributes with different costs
Whenever we apply the decision tree algorithm, each and every attribute is given
equal importance. But sometimes, in a given problem definition, there
is a possibility that a particular attribute may have more importance or be given
more weightage. In such a case we cannot use the core decision tree learning. We
need to handle this particular issue with some sort of calculation.
In a decision tree, the major challenge is choosing the attribute for the
root node at each level. This is known as attribute selection. Two
popular attribute selection measures are:
1. Information Gain
2. Gini Index
Information Gain
When we use a node in a decision tree to partition the training instances into
smaller subsets, the entropy changes. Information gain is a measure of this change
in entropy.
$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)$
$Entropy(S) = -P(yes) \log_2 P(yes) - P(no) \log_2 P(no)$
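The two formulas above can be checked numerically with a short Python sketch. The function names and the toy data are assumptions chosen for illustration; the entropy function is written for any number of classes, of which the yes/no case is a special case.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(S, A): entropy of S minus the weighted entropy of the subsets S_v."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no"]
perfect_split = information_gain(["a", "a", "b", "b"], labels)
useless_split = information_gain(["a", "b", "a", "b"], labels)
```

An attribute that separates the classes completely removes all of the entropy, while an attribute that leaves each subset as mixed as the original set gives no gain.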
MODULE 3
PART-II
different from radial basis function network? 2020-21 10M
Ans: Instance-based learning refers to a family of techniques for classification
and regression, which produce a class label/prediction based on the
similarity of the query to its nearest neighbor(s) in the training set.
Some of the instance-based learning algorithms are:
1. K Nearest Neighbor (KNN)
2. Locally Weighted Learning (LWL)
3. Case-Based Reasoning
Locally weighted regression
Locally weighted linear regression is a non-parametric algorithm, that is,
the model does not learn a fixed set of parameters as is done in ordinary
linear regression. Rather, parameters are computed individually for each
query point.
Locally weighted regression (LWR) is a memory-based method that
performs a regression around a point of interest using only training data that
are "local" to that point.
Locally weighted linear regression is a supervised learning algorithm.
There exists no training phase. All the work is done during the testing
phase, while making predictions.
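A minimal Python sketch of this idea for one input variable is given here. The Gaussian weighting kernel, the bandwidth parameter `tau`, and the function name are assumptions made for the example; the notes do not fix a particular kernel.

```python
import math

def lwr_predict(x_query, xs, ys, tau=1.0):
    """Fit a weighted straight line around x_query and evaluate it there."""
    # Training points close to the query get weights near 1, distant ones near 0.
    weights = [math.exp(-(x - x_query) ** 2 / (2 * tau ** 2)) for x in xs]
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    # Closed-form weighted least squares for y = intercept + slope * x.
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x_query

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
prediction = lwr_predict(2.5, xs, ys)
```

Nothing is fitted ahead of time: the line parameters are recomputed inside `lwr_predict` for every query point, which is why there is no separate training phase.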
Locally weighted regression methods are a generalization of k-Nearest
Neighbour.
* In Locally
constructed
Radial basis funct
RBF network on
layer, and an outpt
Input Layer
The input layer simply feeds the data to the hidden layers. As a result, the number
of neurons in the input layer should be equal to the dimensionality of the data.
Hidden Layer
Output Layer
The output layer uses a linear activatio
regression tasks.
In general, the case-based reasoning process entails:
1. Retrieve: Gathering from memory an experience closest to the current
problem.
2. Reuse: Suggesting a solution based on the experience and adapting it to
meet the demands of the new situation.
3. Revise: Evaluating the use of the solution in the new context.
4. Retain: Storing this new problem-solving method in the memory system.
The CADET system employs case-based reasoning to assist in the conceptual design
of simple mechanical devices such as water faucets.
It uses a library containing approximately 75 previous designs and design
fragments to suggest conceptual designs that meet the specifications of new
design problems.
o The function is represented in terms of qualitative relationships among the
water flow levels and temperatures at its inputs and outputs.
o In the functional description, an arrow labeled "+" indicates that the
variable at the arrow head increases with the variable at its tail. A "-" label
indicates that the variable at the head decreases with the variable at the tail.
o Here Qc refers to the input flow of cold water into the faucet, Qh to the
input flow of hot water, and Qm to the single mixed flow out of the faucet.
o Tc, Th, and Tm refer to the temperatures of the cold water, hot water and mixed
water respectively.
o The variable Ct denotes the control signal for temperature that is input to the
faucet, and Cf denotes the control signal for water flow.
o The controls Ct and Cf are to influence the water flows Qc and Qh, thereby
indirectly influencing the faucet output flow Qm and temperature Tm.
2021-22 10M
Ease of knowledge elicitation: Lazy methods can utilise easily available cases or
problem instances instead of rules that are difficult to extract.
Absence of problem-solving bias: Cases can be used for multiple problem-solving
purposes, because they are stored in a raw form. This is in contrast to eager
methods, which can be used merely for the purpose for which the knowledge has
already been compiled.
Incremental learning: A CBL system can be put into operation with a minimal
set of solved cases furnishing the case base. The case base will be filled with new
cases, increasing the system's problem-solving ability.
Ease of maintenance: This is particularly due to the fact that CBL systems can
adapt to many changes in the problem domain and the relevant environment,
merely by acquiring new cases.
Ease of explanation: The results of a CBL system can be justified based upon the
similarity of the current problem to the retrieved case. Because CBL results are
easily traceable to precedent cases, it is also easier to analyse failures of the system.
For example, CASEY for classification of auditory impairments, CASCADE for
classification of software failures.
2021-22
The inductive bias (also known as learning bias) of a learning algorithm is the set
of assumptions that the learner uses to predict outputs for given inputs that it
has not encountered. In machine learning, one aims to construct algorithms that
are able to learn to predict a certain target output.
Inductive learning methods require a certain number of training examples to
generalize accurately.
Analytical learning stems from the idea that when not enough training examples
are provided, it may be possible to "replace" the "missing" examples by prior
knowledge and deductive reasoning.
2021-22
request is made.
on of the target
provided training
+ Lazy learning
Perceptrons are the buildin
learning algorithm of binary
The perceptron consists of 4 parts.
1. Input values or One input layer
2. Weights and Bias
3. Net sum
4. Activation Function
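a. All the inputs are multiplied by their weights.
b. All the multiplied values are added to give the weighted sum.
c. Apply that weighted sum to the correct Activation Function.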
Weights show the strength of the particular node.
A bias value allows you to shift the activation function curve up or down.
In short, the activation functions are used to map the input between the
required values like (0, 1) or (-1, 1).
A perceptron is usually used to classify the data into two parts. Therefore, it is also
known as a Linear Binary Classifier.
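The four parts of the perceptron can be put together in a short Python sketch. The step activation, the perceptron learning rule used for training, the learning rate, and the AND-gate data are assumptions chosen to keep the example small.

```python
def predict(weights, bias, x):
    """Net sum of weighted inputs plus bias, passed through a step activation."""
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if net > 0 else 0

def train_perceptron(samples, n_inputs, learning_rate=1.0, epochs=20):
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Move the weights and bias only when the prediction is wrong.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Logical AND is linearly separable, so a single perceptron can learn it.
and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_gate, n_inputs=2)
outputs = [predict(weights, bias, x) for x, _ in and_gate]
```

The learned weights and bias define a straight line that separates the single positive example from the three negative ones, which is what makes the perceptron a linear binary classifier.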
Ques 2) What is Gradient descent?
2021-22 2M
Gradient descent is an optimization algorithm which is commonly used to train
machine learning models and neural networks. It finds a local minimum of a
given function by repeatedly stepping in the direction of the negative gradient.
This method is commonly used in machine learning (ML) and deep learning (DL)
to minimize a cost/loss function.
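As a small illustration, gradient descent can be written in a few lines of Python. The function f(x) = (x - 3)^2, the learning rate, and the number of steps are assumptions chosen so that the minimum is known in advance.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Each step shrinks the distance to the minimum by a constant factor here; a learning rate that is too large would make the iterates overshoot instead.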
2021-22 2M
In machine learning, the delta rule is a gradient descent learning rule for updating
the weights of the inputs to artificial neurons in a single-layer neural network. It is
a special case of the more general backpropagation algorithm.
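A sketch of the delta rule for a single linear neuron follows. The linear (identity) activation, the learning rate, and the toy samples are assumptions for illustration; each weight is moved by learning_rate × (target - output) × input.

```python
def train_delta_rule(samples, n_inputs, learning_rate=0.1, epochs=200):
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for x, target in samples:
            output = sum(w * xi for w, xi in zip(weights, x))
            error = target - output
            # Delta rule: change each weight in proportion to the error and its input.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights

# Targets generated by the line t = 2*x1 - 1*x2, which the neuron should recover.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
weights = train_delta_rule(samples, n_inputs=2)
```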
Ques 4) Describe BPN algorithm in ANN along with a suitable example.
2020-21 10M
Back-propagation is used for the training of neural networks.
The Backpropagation algorithm looks for the minimum value of the error function
in weight space using a technique called the delta rule or gradient descent.
In an artificial neural network, the values of weights and biases are randomly
initialized. Due to random initialization, the neural network probably has errors in
giving the correct output.
We need to reduce error values as much as possible. So, for reducing these error
values, we need a mechanism that can compare the desired output of the neural
network with the [...] and biases [...]. For this, we [...] and biases.
Backpropagation is short for "backward propagation of errors." It is a
standard method of training artificial neural networks.
Backpropagation Algorithm:
Step 1: Inputs X arrive through the preconnected path.
Step 2: The input is modeled using weights W. Weights are usually chosen
randomly.
Step 3: Calculate the output of each neuron from the input layer, through the hidden
layer, to the output layer.
Step 4: Calculate the error in the outputs:
Backpropagation Error = Actual Output - Desired Output
Step 5: From the output layer, go back to the hidden layer to adjust the weights to
reduce the error.
Step 6: Repeat the process until the desired output is achieved.
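The six steps can be traced in a small Python sketch of a network with two inputs, two hidden neurons and one output. The sigmoid activation, the squared-error loss, the fixed starting weights (used instead of random ones so the run is repeatable), and all names are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, target, w_hidden, b_hidden, w_out, b_out, lr=0.5):
    # Step 3: forward pass from the input layer through the hidden layer to the output.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_hidden, b_hidden)]
    y = sigmoid(sum(w * hj for w, hj in zip(w_out, h)) + b_out)
    # Step 4: error = actual output - desired output.
    error = y - target
    # Step 5: propagate the error backwards and adjust the weights.
    delta_out = error * y * (1 - y)
    delta_hidden = [delta_out * w_out[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    new_w_out = [w - lr * delta_out * hj for w, hj in zip(w_out, h)]
    new_b_out = b_out - lr * delta_out
    new_w_hidden = [[w - lr * delta_hidden[j] * xi for w, xi in zip(w_hidden[j], x)]
                    for j in range(len(h))]
    new_b_hidden = [b - lr * d for b, d in zip(b_hidden, delta_hidden)]
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out, 0.5 * error ** 2

# Steps 1 and 2: one input pattern and assumed starting weights.
x, target = [0.05, 0.10], 0.9
w_hidden, b_hidden = [[0.15, 0.20], [0.25, 0.30]], [0.35, 0.35]
w_out, b_out = [0.40, 0.45], 0.60
losses = []
# Step 6: repeat the forward and backward passes.
for _ in range(200):
    w_hidden, b_hidden, w_out, b_out, loss = train_step(
        x, target, w_hidden, b_hidden, w_out, b_out)
    losses.append(loss)
```

The list `losses` records the squared error before each update, so comparing its first and last entries shows how much the repeated weight adjustments reduced the error on this pattern.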
Why We Need Backpropagation?
The most prominent advantages of backpropagation are:
+ Backpropagation is fast, simple and easy to program
+ It is a flexible method as it does not require prior knowledge about the network
+ It is a standard method that generally works well
+ It does not need any special mention of the features of the function to be learned.
Types of Backpropagation Networks
The two types of backpropagation networks are:
+ Static Back-propagation
+ Recurrent Backpropagation
The output neurons of a neural network compete among themselves to become
active. In other forms of learning several output neurons may be active at once, but
in competitive learning only a single output neuron is active at one time.
2020-21 10M
Self Organizing
To determine the best matching unit, iterate through all the nodes
and calculate the Euclidean distance between each node's weight vector and the
current input vector. The node with the weight vector closest to the input vector is
tagged as the winning neuron.
Step 4: Find the new weight between the input vector sample and the winning output
neuron.
New Weights = Old Weights + Learning Rate × (Input Vector - Old Weights)
Step 5: Repeat the steps until the new weights
are similar to the old weights or the feature map stops changing.
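The winner search and the weight update can be sketched in Python as follows. Only the winning neuron is updated, as in the steps above; a full self-organizing map usually also updates the winner's neighbours, which is left out here. The function names and the numbers are assumptions for illustration.

```python
import math

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x (Euclidean distance)."""
    distances = [math.dist(w, x) for w in weights]
    return distances.index(min(distances))

def update_winner(weights, x, learning_rate):
    """New Weights = Old Weights + Learning Rate * (Input Vector - Old Weights)."""
    winner = best_matching_unit(weights, x)
    weights[winner] = [w + learning_rate * (xi - w) for w, xi in zip(weights[winner], x)]
    return winner

weights = [[0.0, 0.0], [1.0, 1.0]]
winner = update_winner(weights, [0.9, 0.8], learning_rate=0.5)
```

After the update, the winning node's weight vector has moved halfway towards the input vector, while the losing node is unchanged.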
Convolutional Neural Networks (CNNs) are specially designed to work
with images. An image consists of pixels. In deep learning, images are represented
as arrays of pixel values.
There are three main types of layers in a CNN:
• Convolutional layers
• Pooling layers
• Fully connected (dense) layers
In addition, [...] layer and fully connected [...].
There are four main types of operations in a CNN: Convolution
operation, Pooling operation, Flatten operation and Classification (or other
relevant) operation.
Convolutional layers and convolution operation: The first layer in a CNN is a
convolutional layer. It takes the images as the input and begins to process.
There are three elements in the convolutional layer: Input
image, Filters and Feature map.
[Figure: convolution operation between a 3x3 section of the 6x6 input image and the filter, producing a 4x4 feature map]
Filter: This is also called the Kernel or Feature Detector.
Image section: The size of the image section should be equal to the size of the
filter(s) we choose. The number of image sections depends on the Stride.
Feature map: The feature map stores the outputs of different convolution
operations between different image sections and the filter(s).
The number of steps (pixels) that we shift the filter over the input image is
called the Stride.
Padding adds additional pixels with zero values to
each side of the image. That helps to get a feature
map of the same size as the input.
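The effect of the filter size, stride and padding on the feature map can be seen in a small Python sketch. The function name and the all-ones image and filter are assumptions for illustration; as is usual in CNNs, the filter is slid over the image without being flipped.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over a 2D image and return the feature map."""
    if padding:
        width = len(image[0]) + 2 * padding
        # Surround the image with rows and columns of zero-valued pixels.
        image = ([[0] * width for _ in range(padding)]
                 + [[0] * padding + list(row) + [0] * padding for row in image]
                 + [[0] * width for _ in range(padding)])
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    return [[sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1] * 6 for _ in range(6)]   # 6x6 input image
kernel = [[1] * 3 for _ in range(3)]  # 3x3 filter
plain = conv2d(image, kernel)               # 4x4 feature map
padded = conv2d(image, kernel, padding=1)   # 6x6 feature map, same size as the input
strided = conv2d(image, kernel, stride=2)   # 2x2 feature map
```

With a 3x3 filter, a padding of 1 is what keeps the feature map the same size as the input; increasing the stride shrinks it.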
Pooling layers and pooling operation
Pooling layers are the second type of layer used in a
CNN. There can be multiple pooling layers in a
CNN. Each convolutional layer is followed by a
pooling layer. So, convolution and pooling layers are
used together as pairs.
A pooling layer reduces the dimensionality (number of pixels) of the output returned
from the previous convolutional layer.
There are three elements in the pooling layer: Feature map, Filter and Pooled
feature map.
There are two types of pooling operations.
• Max pooling: Get the maximum value in the area where the filter is
applied.
• Average pooling: Get the average of the values in the area where the
filter is applied.
Then, we can flatten a pooled feature map that contains multiple channels.
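Both pooling operations and the flatten operation can be sketched in Python for a single channel. The 2x2 window with a stride equal to the window size, the function names, and the sample feature map are assumptions for illustration.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Apply max or average pooling over non-overlapping size x size windows."""
    reduce = max if mode == "max" else (lambda vals: sum(vals) / len(vals))
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(reduce(window))
        pooled.append(row)
    return pooled

def flatten(matrix):
    """Turn a 2D pooled feature map into a 1D list for the dense layers."""
    return [value for row in matrix for value in row]

feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 0],
               [1, 8, 3, 4]]
max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
flat = flatten(max_pooled)
```

The 4x4 feature map is reduced to 2x2 in both cases, so the dense layers that follow receive 4 values instead of 16.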
Fully connected (dense) layers
2020-21 10M.
Size of kernel or filter is 3*3, hence the size of the image section is also 3*3.
MODULE 5
PART-I
REINFORCEMENT LEARNING — Introduction to Reinforcement Learning ,
Learning Task, Example of Reinforcement Learning in Practice, Learning Models for
Reinforcement — (Markov Decision process , Q Learning - Q Learning function, Q
Learning Algorithm ), Application of Reinforcement Learning, Introduction to Deep
Q Learning.
GENETIC ALGORITHMS: Introduction, Components, GA cycle of reproduction,
Crossover, Mutation, Genetic Programming, Models of Evolution and Learning,
Applications.
Reinforcement Learning is a feedback-based Machine learning technique in
which an agent learns to behave in an environment by performing the actions and
seeing the results of actions. For each good action, the agent gets positive
feedback, and for each bad action, the agent gets negative feedback or penalty.
The elements of reinforcement learning are: Agent, Environment, Action, State,
Policy, Reward.
Learning Models in RL:
• Markov Decision Process
• Q-Learning Algorithm
• Deep Q Learning
The Markov Property states that:
"The future is independent of the past given the present."
Mathematically, we can express this statement as:
$P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, \ldots, S_t]$
It says that "If the agent is present in the current state s1, performs an action a1,
and moves to the state s2, then the state transition from s1 to s2 depends only on the
current state and action; future actions and states do not depend on past actions,
rewards, or states".
MDP is a framework that can solve most Reinforcement Learning problems with
discrete actions.
With the Markov Decision Process, an agent can arrive at an optimal policy for
maximum rewards over time.
A Markov Process is a memoryless random process, i.e. a sequence of random
states S[1], S[2], ..., S[n] with the Markov Property.
A Markov decision process is a 5-tuple (S, A, P, R, γ):
• S is the set of states.
• A is the set of actions.
• P(S, A, S') is the probability that action A in the state S at time T will lead
to state S' at time T + 1.
• R(S, A, S') is the immediate reward received after a transition from state S
to S' due to action A.
• Discount factor (γ): It determines how much importance is given to
the immediate reward relative to future rewards. It has a value between 0 and 1.
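The role of the discount factor can be illustrated with a short Python sketch that adds up a sequence of rewards, weighting the reward received t steps in the future by γ^t. The function name and the reward sequence are assumptions for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0]
short_sighted = discounted_return(rewards, gamma=0.0)  # only the immediate reward counts
balanced = discounted_return(rewards, gamma=0.5)       # 1 + 0.5 + 0.25
far_sighted = discounted_return(rewards, gamma=1.0)    # all rewards count equally
```

A value of γ near 0 makes the agent care mostly about the immediate reward, while a value near 1 makes future rewards almost as important as the present one.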
Q-learning algorithm
• Q-learning is a popular model-free reinforcement learning algorithm based
on the Bellman equation.
• The main objective of Q-learning is to learn the policy which can inform the
agent what actions should be taken, and under what circumstances, for
maximizing the reward.
• The goal of the agent in Q-learning is to maximize the value of Q.
• Q stands for quality in Q-learning, which means it specifies the quality of
an action taken by the agent.
• A Q-Table is used to find the best action for each state in the environment.
We use the Bellman Equation at each state to get the expected future state
and reward and save it in a table to compare with other states.
Bellman Equation
$V(s) = \max_a [R(s, a) + \gamma V(s')]$
Where,
V(s) = value calculated at a particular point.
R(s, a) = reward at a particular state s by performing an action a.
γ = discount factor
The Q-Learning algorithm works like this:
1. Initialize all Q-values, e.g., with zeros.
2. Choose an action a in the current state s based on the current best Q-value.
3. Perform this action a and observe the outcome (new state s').
4. Measure the reward R after this action.
5. Update Q with an update formula that is called the Bellman Equation.
6. Repeat steps 2 to 5 until the learning no longer improves.
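These steps can be sketched in Python on a tiny environment. Everything specific here is an assumption made for the example: a corridor of four states with the goal at the right end, a reward of 1 for reaching the goal, an epsilon-greedy rule for choosing actions, and the values of the learning rate, discount factor and number of episodes.

```python
import random

def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    moves = [-1, +1]                           # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]  # step 1: initialize Q-values to zero
    goal = n_states - 1
    for _episode in range(episodes):
        s = 0
        for _step in range(100):
            # Step 2: mostly pick the best-known action, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])
            # Steps 3 and 4: perform the action, observe the new state and reward.
            s_next = min(max(s + moves[a], 0), goal)
            reward = 1.0 if s_next == goal else 0.0
            # Step 5: Bellman-style update towards reward + gamma * best future value.
            future = 0.0 if s_next == goal else gamma * max(q[s_next])
            q[s][a] += alpha * (reward + future - q[s][a])
            s = s_next
            if s == goal:
                break
    return q

q_table = q_learning()
greedy_policy = ["right" if row[1] > row[0] else "left" for row in q_table[:-1]]
```

After training, the Q-table prefers moving right in every non-goal state, and the learned values shrink by the discount factor with each extra step of distance from the goal.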
EXAMPLE: An example of Q-learning is an advertisement recommendation
system. In a normal ad recommendation system, the ads you get are based on your
previous purchases or websites you may have visited. If you've bought a TV, you
will get recommended TVs of different brands.
Using Q-learning, we can optimize the ad recommendation system to recommend
products that are frequently bought together. The reward will be if the user clicks
on the suggested product.
DEEP Q-LEARNING MODEL
• The Q-Learning approach is practical for very small environments and quickly
loses its feasibility when the number of states and actions in the
environment increases.
• The solution to this leads us to Deep Q Learning, which uses a deep
neural network to approximate the values.
• Deep Q Learning uses the Q-learning idea and takes it one step further.
Instead of using a Q-table, we use a neural network that takes a state and
approximates the Q-values for each action based on that state.
The basic working step for Deep Q-Learning is that the initial state is fed into the
neural network and it returns the Q-value of all possible actions as an output.
The difference between Q-Learning and Deep Q-Learning can be illustrated as
follows:
[Figure: in Q-Learning, a state and action are looked up in a Q-table to give a Q-value; in Deep Q-Learning, a state is fed to a deep neural network that outputs a Q-value for each action]
Ques 6) What are the applications of reinforcement learning?
Following are the applications of reinforcement learning:
1. Robotics for industrial automation.
2. Business strategy planning.
3. Machine learning and data processing.
4. It helps us to create training systems that provide custom instruction and
materials according to the requirements of students.
5. Aircraft control and robot motion control.
This algorithm reflects the process of natural selection where the fittest individuals
are selected in order to produce offspring of the next generation.
The process of natural selection starts with the selection of the fittest individuals
from a population.
• They produce offspring which inherit the characteristics of the parents and
will be added to the next generation.
• If parents have better fitness, their offspring will tend to be better than the parents
and have a better chance of surviving. This process keeps on iterating and, at the
end, a generation with the fittest individuals will be found.
• This notion can be applied to a search problem.
The genetic algorithm is a method for solving both constrained and
unconstrained optimization problems that is based on natural selection, the
process that drives biological evolution. The genetic algorithm repeatedly
modifies a population of individual solutions.
Five phases are considered in a genetic algorithm.
1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation
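The five phases can be sketched in Python on a toy problem. Everything specific here is an assumption for illustration: individuals are bit strings, fitness is the number of 1 bits, selection is a two-way tournament, crossover is single-point, and the best individual of each generation is copied unchanged into the next one (elitism) so that the best fitness never gets worse.

```python
import random

def fitness(individual):
    """Phase 2: fitness function (here, the number of 1 bits)."""
    return sum(individual)

def genetic_algorithm(n_bits=12, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Phase 1: initial population of random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_population = [population[0][:]]  # keep the current best (elitism)
        while len(next_population) < pop_size:
            # Phase 3: selection, the fitter of two random individuals becomes a parent.
            parent1 = max(rng.sample(population, 2), key=fitness)
            parent2 = max(rng.sample(population, 2), key=fitness)
            # Phase 4: single-point crossover.
            point = rng.randint(1, n_bits - 1)
            child = parent1[:point] + parent2[point:]
            # Phase 5: mutation, each bit flips with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_population.append(child)
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = genetic_algorithm()
```

The list `history` holds the best fitness of every generation, so it shows how the population improves as selection, crossover and mutation are repeated.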