MACHINE LEARNING NOTES
MOST IMPORTANT QUESTIONS OF MACHINE LEARNING AKTU
-ENGINEER BEING
MODULE1
PART-I
Learning is the process of acquiring new understanding, knowledge, behaviors,
skills, values, attitudes, and preferences. Learning is any process by which a
system improves its performance from experience.
Ques2. What is Machine Learning? 2020-21 2M
Ans. Machine learning (ML) is defined as a discipline of artificial intelligence (AI)
that provides machines the ability to automatically learn from data and past
experiences to identify patterns and make predictions with minimal human
intervention.
"Machine learning enables a machine to automatically learn from data, improve
performance from experiences, and predict things without being explicitly
programmed."
Ques3. Difference between ML, AI, Deep Learning? 2020-21 2M
Artificial Intelligence: AI is the broadest concept of all, and gives a machine
the ability to imitate human behaviour.
Machine Learning: Machine Learning uses algorithms and techniques that enable
machines to learn from past experience/trends and predict the output based on
that data. Their performance improves as they are exposed to more data over time.
Deep Learning: a subset of machine learning in which multilayered neural
networks learn from vast amounts of data.
The main difference between machine learning and deep learning technologies is
the presentation of data. Machine learning uses structured/unstructured data for
learning, while deep learning uses neural networks for learning models.
Ans. Machine learning is important because it gives enterprises a view of trends in
customer behavior and business operational patterns, as well as supports the
development of new products.
Many of today's leading companies, such as Facebook, Google and Uber, make
machine learning a central part of their operations. Machine learning has become a
significant competitive differentiator for many companies.
Applications of ML:
1. Image recognition:
a. Image recognition is the process of identifying and detecting an object or a
feature in a digital image or video.
b. This is used in many applications like systems for factory automation, toll booth
monitoring, and security surveillance.
2. Speech recognition:
a. Speech Recognition (SR) is the translation of spoken words into text.
b. It is also known as Automatic Speech Recognition (ASR), computer speech
recognition, or Speech To Text (STT).
c. In speech recognition, a software application recognizes spoken words.
3. Product recommendation:
Machine learning is widely used by various e-commerce and entertainment
companies such as Amazon, Netflix, etc., for product recommendation to the user.
Whenever we search for some product on Amazon, we start getting
advertisements for the same product while surfing the internet on the same browser,
and this is because of machine learning.
4. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal,
and spam.
We always receive an important mail in our inbox with the important symbol and
spam emails in our spam box, and the technology behind this is machine learning.
5. Stock Market trading:
Machine learning is widely used in stock market trading.
In the stock market, there is always a risk of ups and downs in shares, so
machine learning's long short-term memory (LSTM) neural networks are used for the
prediction of stock market trends.
Types of Machine Learning:
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised learning is the type of machine learning in which machines are
trained using well "labelled" training data, and on the basis of that data, machines
predict the output.
Labelled data means that some input data is already tagged with the correct output.
Ex: Risk Assessment, Image classification, Fraud Detection, spam filtering, etc.
Types of Supervised learning
i. Classification: A classification problem is when the output variable is a
category, such as "red" or "blue", "disease" or "no disease", Yes-No,
Male-Female, True-False, etc.
ii. Regression: A regression problem is when the output variable is a real
value, such as forecasting sales, weather forecasting, etc.
Unsupervised learning is a type of machine learning in which models are
trained using an unlabeled dataset and are allowed to act on that data without
any supervision.
• The goal of unsupervised learning is to find the underlying structure of the
dataset, group that data according to similarities, and represent that dataset
in a compressed format.
• The output is dependent upon the coded algorithms.
• Clustering: A clustering problem is where you want to discover the
inherent groupings in the data, such as grouping customers by purchasing
behavior.
• Association: Association rule learning is a kind of unsupervised learning
technique that checks for the dependency of one data element on another
data element and maps them accordingly so that it can be more cost-effective.
It tries to discover some interesting relations or associations
between the variables of the dataset.
Semi Supervised learning is between the supervised and unsupervised learning
families. The semi-supervised models use both labeled and unlabeled data for
training.
Reinforcement Learning is a feedback-based machine learning technique in
which an agent learns to behave in an environment by performing actions and
seeing the results of those actions. For each good action, the agent gets positive
feedback, and for each bad action, the agent gets negative feedback or a penalty.
The main elements of an RL system are:
• The agent or the learner
• The environment the agent interacts with
• The policy that the agent follows to take actions
• The reward signal that the agent observes upon taking action
GENETIC ALGORITHM | TRADITIONAL ALGORITHM
A genetic algorithm is a search-based algorithm used for solving optimization problems in machine learning. | Traditional algorithms refers to general algorithms we use to solve problems. It is a methodical procedure to solve a given problem. There can be several algorithms to solve a problem.
More Advanced | Not as Advanced
Used in ML, AI | Used in Programming, Math
1) Process Complexity of Machine Learning
The machine learning process is very complex, which is another major issue
faced by machine learning engineers and data scientists. Much of the work is
hit-and-trial experimentation; hence the probability of error is higher than expected.
Further, it also includes analyzing the data, removing data bias, training data,
applying complex mathematical calculations, etc., making the procedure more
complicated and quite tedious.
2) Getting bad recommendations
A machine learning model operates under a specific context, which can result in bad
recommendations and concept drift in the model. Suppose at a specific time a
customer is looking for some gadgets, but the customer's requirement changes over
time. The machine learning model still shows the same recommendations to the
customer even though the customer's expectation has changed. This incident is called
Data Drift. However, we can overcome this by regularly updating and
monitoring data according to the expectations.
3) Overfitting and Underfitting
Overfitting:
Overfitting is one of the most common issues faced by machine learning
engineers and data scientists. Whenever a machine learning model is fitted too
closely to its training data, it starts capturing the noise and inaccurate entries
in that data set, and its performance on new data suffers.
Underfitting:
Underfitting is just the opposite of overfitting. Whenever a machine learning
model is trained with too little data, it provides incomplete
and inaccurate results and destroys the accuracy of the machine learning model.
4) Inadequate Training Data
The major issue that comes up while using machine learning algorithms is the lack of
quality as well as quantity of data. Although data plays a vital role in the
processing of machine learning algorithms, many data scientists claim that
inadequate data, noisy data, and unclean data severely hamper
machine learning algorithms.
For example, a simple task requires thousands of sample data, and an advanced
task such as speech or image recognition needs millions of sample data
examples. Further, data quality is also important for the algorithms to work ideally,
but a lack of data quality is also found in machine learning applications.
5) Monitoring and maintenance
Generalized output data is mandatory for any machine learning
model. Hence, regular monitoring and maintenance become compulsory for the
same. Different results for different actions require data changes; hence editing
of code as well as resources for monitoring them also become necessary.
• Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but mostly it is preferred for solving
classification problems. It is a tree-structured classifier, where internal
nodes represent the features of a dataset, branches represent the
decision rules and each leaf node represents the outcome.
• In a decision tree, there are two types of nodes: the Decision
Node and the Leaf Node. Decision nodes are used to make a decision and
have multiple branches, whereas leaf nodes are the output of those
decisions and do not contain any further branches.
• In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
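The notes do not say how CART decides which split is best. A common choice for classification trees is Gini impurity, and that is the assumption made in this minimal sketch; the function names and the tiny label lists are illustrative only.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def split_impurity(left, right):
    """Weighted Gini impurity of a binary split (lower is better)."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# A mixed node is impure; a split that separates the classes is pure.
mixed = ["yes", "yes", "no", "no"]
print(gini_impurity(mixed))
print(split_impurity(["yes", "yes"], ["no", "no"]))
```

A decision node would try each candidate split and keep the one with the lowest weighted impurity.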
Machine Learning
• ANN
• Clustering
• Reinforcement Learning
• Decision Tree Learning
• Bayesian Networks
• SVM (Support Vector Machine)
• Genetic Algorithms
2020-21 1M
The term "Artificial Neural Network" is derived from biological neural networks
that develop the structure of a human brain. Similar to the human brain, which has
neurons, an ANN has artificial neurons arranged in layers.
In a neural network, there are three essential layers:
Input Layers
The input layer is the first layer of an ANN that receives the input information in
the form of various texts, numbers, audio files, image pixels, etc.
Hidden Layers
In the middle of the ANN model are the hidden layers. There can be a single
hidden layer or multiple hidden layers. The hidden layers perform computations
on the input data and recognize patterns in it.
Output Layer
In the output layer, we obtain the result of the rigorous
computations performed by the middle layers.
Artificial Neural Networks: problems to apply
Following are the important Artificial Neural Network applications:
Handwritten Character Recognition
ANNs are used for handwritten character recognition. Neural networks are trained
to recognize handwritten characters, which can be in the form of letters or
digits.
Facial Recognition
In order to recognize faces based on the identity of the person, we make use of
neural networks. They are most commonly used in areas where the users require
security access.
Speech Recognition
ANNs play an important role in speech recognition. The earlier models of speech
recognition were based on statistical models like Hidden Markov Models. With
the advent of deep learning, various types of neural networks have become the
preferred choice for obtaining an accurate classification.
2020-21 10M (UNIT2)
SVM or Support Vector Machine is a linear model for classification and regression
problems. It can solve linear and non-linear problems and works well for many
practical problems.
According to the SVM algorithm, we find the points closest to the line from both
classes. These points are called support vectors.
We compute the distance between the line and the support vectors. This distance is
called the margin. Our goal is to maximize the margin. The hyperplane for which
the margin is maximum is the optimal hyperplane.
Thus SVM tries to make a decision boundary in such a way that the separation
between the two classes is as wide as possible.
10M
Clustering
• A way of grouping the data points into different clusters, consisting of
similar data points. The objects with possible similarities remain in a
group that has few or no similarities with another group.
• It is an unsupervised learning method, hence no supervision is provided to
the algorithm, and it deals with the unlabeled dataset.
• After applying this clustering technique, each cluster or group is provided
with a cluster-ID. An ML system can use this ID to simplify the processing of
large and complex datasets.
• The clustering technique is commonly used for statistical data analysis.
Example:
Clustering technique with the real-world example of a mall:
When we visit any shopping mall, we can observe that things with similar
usage are grouped together. For example, the t-shirts are grouped in one section and
trousers in another. Similarly, in the vegetable section, apples, bananas,
mangoes, etc., are grouped in separate sections, so that we can easily find
things. The clustering technique works in the same way.
Classification and Regression
Regression and Classification algorithms are Supervised Learning
algorithms. Both are used for prediction in machine learning
and work with labeled datasets. The difference between them is how
they are used for different machine learning problems.
Classification | Regression
Classification algorithms are used to predict/classify discrete values such as Male or Female, True or False, Spam or Not Spam, etc. | Regression algorithms are used to predict continuous values such as price, salary, age, etc.
The task of the classification algorithm is to map the input value (x) to the discrete output variable (y). | The task of the regression algorithm is to map the input value (x) to the continuous output variable (y).
Classification algorithms are used with discrete data. | Regression algorithms are used with continuous data.
Classification algorithms can be divided into Binary Classifier and Multi-class Classifier. | Regression algorithms can be further divided into Linear and Non-linear Regression.
Classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc. | Regression algorithms can be used to solve regression problems such as weather prediction, house price prediction, etc.
In email spam detection, the model is trained on the basis of millions of emails on different parameters, and whenever it receives a new email, it identifies whether the email is spam or not. If the email is spam, then it is moved to the Spam folder. | Suppose we want to do weather forecasting; for this, we will use a regression algorithm. In weather prediction, the model is trained on the past data, and once the training is completed, it can predict the weather for future days.
2021-22 2M
A learning problem is said to be well defined if it has three features: the class of
tasks, the measure of performance to be improved, and the source of experience.
Ex: A checkers learning problem
- Task T: playing checkers
- Performance measure P: percent of games won against opponents
- Training experience E: playing practice games against itself
"Data Science is a field of deep study of data that includes extracting useful
insights from the data, and processing that information using different tools,
statistical models, and machine learning algorithms."
Machine Learning allows computers to learn from past experiences on
their own. It uses statistical methods to improve performance and predict the
output without being explicitly programmed.
Or
Design the final design of Checkers Learning Program 2021-22 10M
Learning is the process of acquiring new understanding, knowledge, behaviors,
skills, values, attitudes, and preferences. Learning is any process by which a
system improves its performance from experience.
Designing a Learning System in Machine Learning:
Step 1) Choosing the Training Experience: The first and very important task is
to choose the training data or training experience which will be fed to the
machine learning algorithm.
Three attributes are used:
1. Whether the training experience provides direct or indirect feedback
regarding the choices made by the performance system.
• Direct training examples in learning to play checkers consist of individual
checkers board states and the correct move for each.
• Indirect training examples in the same game consist of the move sequences
and final outcomes of various games played, in which information about the
correctness of specific moves early in the game must be inferred indirectly
from the fact that the game was eventually won or lost (the credit assignment
problem).
2. The degree to which the learner controls the sequence of training examples.
Example:
• The learner might rely on the teacher to select informative board states
and to provide the correct move for each.
• The learner might itself propose board states that it finds particularly
confusing and ask the teacher for the correct move.
• Or the learner may have complete control over the board states and (indirect)
classifications, as it does when it learns by playing against itself with no teacher
present.
3. How well the training experience represents the distribution of examples over
which performance will be tested.
This basically means the more diverse the set of training experiences,
the better the performance can get.
Example: If the training experience in playing checkers consists only of games played
against itself, the learner might never encounter certain crucial board states that are
very likely to be played by the human checkers champion.
Step 2- Choosing target function: To determine what type of knowledge will be
learned and how this will be used by the performance program.
Example: In playing checkers, it needs to learn to choose the best move among the
legal moves.
Step 3- Choosing Representation for Target function: Once done with
choosing the target function, we have to choose a representation of this
target function. When the machine algorithm has a complete list of all
permitted moves, it may pick the best one using any representation, such as
linear equations, hierarchical graph representation, tabular form, and so on.
Out of these moves, the NextMove function will select the target move,
which will increase the success rate. For example, if a chess machine has four
alternative moves, the computer will select the most optimal move that will
lead to victory.
Step 4- Choosing Function Approximation Algorithm:
In this step, we choose a learning algorithm that can approximate the target
function chosen. This step further consists of two sub-steps: a. Estimating the
training value, and b. Adjusting the weights.
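The weight-adjustment sub-step can be sketched as follows. This is a minimal sketch, assuming a linear evaluation function over hypothetical board features and the LMS (least mean squares) update rule, which the notes do not name. The feature values, the training value and the learning rate are made up for illustration, and the training value is simply taken as given rather than estimated from a successor board state.

```python
def evaluate(weights, features):
    """Linear evaluation V_hat(b) = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(weights, features, v_train, lr=0.1):
    """One LMS step: w_i <- w_i + lr * (V_train(b) - V_hat(b)) * x_i."""
    error = v_train - evaluate(weights, features)
    updated = [weights[0] + lr * error]  # bias weight, its input is fixed at 1
    updated += [w + lr * error * x for w, x in zip(weights[1:], features)]
    return updated


# Hypothetical board: 3 own pieces, 1 opponent piece, training value 1.0.
weights = [0.0, 0.0, 0.0]
features = [3, 1]
v_train = 1.0

for _ in range(50):
    weights = lms_update(weights, features, v_train)

print(evaluate(weights, features))
```

Each update nudges the weights so that the predicted value of the board moves toward its training value; repeating this over many boards is what "adjusting the weights" amounts to under these assumptions.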
[Figure: final design of the checkers learning program]
The final design consists of four modules, as described in the picture:
1. The performance system: The performance system solves the given
performance task.
MODULE2
PART-I
2020-21 10M
Or
Discuss Support vectors in SVM. 2020-21 2M
Or
2020-21 10M
SVM or Support Vector Machine is a linear model for classification and
regression problems. It can solve linear and non-linear problems and works well for
many practical problems.
It tries to classify data by finding a hyperplane that maximizes the margin
between the classes in the training data. Hence, SVM is an example of a large
margin classifier.
The idea of SVM is simple: the algorithm creates a line or a hyperplane which
separates the data into classes.
According to the SVM algorithm, we find the points closest to the line from both
classes. These points are called support vectors.
We compute the distance between the line and the support vectors. This distance is
called the margin. Our goal is to maximize the margin. The hyperplane for which
the margin is maximum is the optimal hyperplane.
Thus SVM tries to make a decision boundary in such a way that the separation
between the two classes is as wide as possible.
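The distance computation described above can be illustrated with a short sketch. It assumes a candidate line written as w.x + b = 0 and uses the standard point-to-hyperplane distance |w.x + b| / ||w||. The points and the line are made up, and no optimization is performed, so this only measures the margin of one given line rather than finding the optimal one.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))


def margin(w, b, points):
    """Smallest distance from any point to the hyperplane."""
    return min(distance_to_hyperplane(w, b, p) for p in points)


# Hypothetical 2-D data and a candidate separating line x1 + x2 = 6.
points = [(1.0, 1.0), (2.0, 3.0), (4.0, 4.0), (5.0, 6.0)]
w, b = (1.0, 1.0), -6.0

# The closest point plays the role of a support vector for this line.
closest = min(points, key=lambda p: distance_to_hyperplane(w, b, p))
print(closest, margin(w, b, points))
```

An SVM trainer searches over w and b for the line whose margin, measured this way, is as large as possible.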
SVM KERNELS
• SVM can work well in non-linear data cases using the kernel trick.
• The function of the kernel trick is to map the low-dimensional input space
into a higher-dimensional space.
• In simple words, a kernel converts non-separable problems into separable problems
by adding more dimensions to them.
• It makes SVM more powerful, flexible and accurate.
THREE TYPES OF KERNEL
1) Linear Kernel: A linear kernel can be used as the normal dot product of any two
given observations. The equation for the kernel function:
K(x, xi) = sum(x * xi)
2) Polynomial kernel: It is a more generalized form of the linear kernel and can
distinguish curved or nonlinear input space.
It is popular in image processing.
Following is the formula for the polynomial kernel:
K(x, xi) = (1 + sum(x * xi))^d, where d is the degree of the polynomial
3) Gaussian Radial Basis Function (RBF) Kernel: The RBF kernel, mostly used in
SVM classification, maps the input space into an infinite-dimensional space.
It is a general-purpose kernel, used when there is no prior knowledge about
the data.
The following formula explains it mathematically:
K(x, xi) = exp(-gamma * sum((x - xi)^2))
where gamma = 1/(2σ^2)
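The three kernels can be written directly as small functions. This is a minimal sketch using plain Python lists as vectors; the function names and the default values for d and gamma are illustrative choices, not values given in the notes.

```python
import math


def linear_kernel(x, xi):
    """Dot product of the two observations."""
    return sum(a * b for a, b in zip(x, xi))


def polynomial_kernel(x, xi, d=2):
    """(1 + x.xi) raised to the degree d."""
    return (1 + linear_kernel(x, xi)) ** d


def rbf_kernel(x, xi, gamma=0.5):
    """exp(-gamma * squared Euclidean distance between x and xi)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * squared_distance)


x, xi = [1, 2], [3, 4]
print(linear_kernel(x, xi))      # 1*3 + 2*4
print(polynomial_kernel(x, xi))  # (1 + 11) squared
print(rbf_kernel(x, xi))         # exp(-0.5 * 8)
```

Note that the RBF kernel equals 1 when the two points coincide and falls toward 0 as they move apart, which is why it behaves like a similarity measure.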
APPLICATIONS OF KERNEL
• Face detection: SVM classifies parts of the image as face and non-face
and creates a square boundary around the face.
• Handwriting recognition: We use SVMs to recognize widely used handwritten
characters.
• Texture classification using SVM: In this SVM application, we use the
images of certain textures and use that data to classify whether the surface is
smooth or not.
• Steganography detection in digital images:
Using SVM, we can find out if an image is pure or adulterated. This could
be used in security-based organizations to uncover secret messages, since
messages can be hidden in high-resolution images.
In high-resolution images, there are more pixels; hence, the message is
harder to find. We can segregate the pixels and store the data in various
datasets. We can analyze those datasets using SVM.
PROPERTIES OF SVM:
1. Flexibility in choosing a similarity function.
2. Sparseness of solution when dealing with large data sets: only support
vectors are used to specify the separating hyperplane.
3. Ability to handle large feature spaces: complexity does not depend on the
dimensionality of the feature space.
4. Overfitting can be controlled by the soft margin approach (we let some data
points enter our margin intentionally).
5. A simple convex optimization problem which is guaranteed to converge to a
single global solution.
DISADVANTAGES OF SVM:
1. SVM algorithm is not suitable for large data sets because the required
training time is higher.
2. SVM does not perform very well when the data set has more noise, i.e. when
target classes are overlapping.
3. In cases where the number of features for each data point exceeds the
number of training data samples, the SVM will underperform.
4. SVMs with the 'wrong' kernel: For SVMs nowadays, choosing the right
kernel function is key. As an example, using the linear kernel when the
data are not linearly separable results in the algorithm performing poorly.
2020-21 2M
Regression is a supervised learning technique which helps in finding the
correlation between variables and enables us to predict the continuous output
variable based on one or more predictor variables.
It is mainly used for prediction, forecasting, time series modeling, and
determining the cause-effect relationship between variables.
Some examples of regression are:
• Prediction of rain using temperature and other factors
• Determining market trends
• Prediction of road accidents due to rash driving
It is used to find the trends in data.
By performing regression, we can confidently determine the most important
factor, the least important factor, and how each factor is affecting the other
factors.
Linear Regression | Logistic Regression
Linear Regression is a supervised regression model. | Logistic Regression is a supervised classification model.
In Linear Regression, we predict the value as a continuous number. | In Logistic Regression, we predict the value as 1 or 0.
It is based on least squares estimation. | It is based on maximum likelihood estimation.
Here when we plot the training datasets, a straight line can be drawn that passes as close as possible to the plotted points. | Any change in the coefficient leads to a change in both the direction and the steepness of the logistic function. It means positive slopes result in an S-shaped curve and negative slopes result in a Z-shaped curve.
Linear regression is used to estimate the dependent variable in case of a change in independent variables. For example, predict the price of houses. | Whereas logistic regression is used to calculate the probability of an event. For example, classify if tissue is benign or malignant.
MODULE2
PART-II
2020-21 2M
The vertices and edges in a Bayesian Network have some sort of meaning. The
network structure itself gives you important information about the
dependence between the variables. With Neural Networks, the network structure
does not tell you anything comparable.
A similarity between ANN and Bayesian Network is that they both use directed graphs.
2021-22 10M
Or
2021-22 10M
Bayes theorem is one of the most popular machine learning concepts that helps to
calculate the probability of one event occurring, under uncertain knowledge, given
that another one has already occurred.
Bayes Theorem is a way of finding a probability when we know certain other
probabilities.
P(X|Y) = P(Y|X) * P(X) / P(Y)
Which tells us: how often X happens given that Y happens, written P(X|Y),
When we know: how often Y happens given that X happens, written P(Y|X)
and how likely X is on its own, written P(X)
and how likely Y is on its own, written P(Y)
The above equation is called Bayes Rule or Bayes Theorem:
• P(X|Y) is called the posterior, which we need to calculate. It is defined as
the updated probability after considering the evidence.
• P(Y|X) is called the likelihood. It is the probability of the evidence when the
hypothesis is true.
• P(X) is called the prior probability, the probability of the hypothesis before
considering the evidence.
• P(Y) is called the marginal probability. It is defined as the probability of the
evidence under any consideration.
Hence, Bayes Theorem can be written as:
posterior = likelihood * prior / evidence
EXAMPLE:
• Dangerous fires are rare (1%)
• But smoke is fairly common (10%) due to barbecues
• And 90% of dangerous fires make smoke
We can then discover the probability of dangerous fire when there is smoke:
P(Fire|Smoke) = P(Fire) * P(Smoke|Fire) / P(Smoke)
= (1% * 90%) / 10%
= 9%
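The fire and smoke calculation can be checked with a few lines of code. This is a minimal sketch; the function name bayes_posterior is an illustrative choice, and the three probabilities are the ones stated in the example.

```python
def bayes_posterior(prior, likelihood, evidence):
    """posterior = likelihood * prior / evidence"""
    return likelihood * prior / evidence


p_fire = 0.01              # P(Fire): dangerous fires are rare
p_smoke = 0.10             # P(Smoke): smoke is fairly common
p_smoke_given_fire = 0.90  # P(Smoke|Fire)

p_fire_given_smoke = bayes_posterior(p_fire, p_smoke_given_fire, p_smoke)
print(round(p_fire_given_smoke, 2))
```

Even though most dangerous fires produce smoke, the posterior stays low because fires themselves are rare.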
Naive Bayes Classifier Algorithm
• Naive Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional
training dataset.
• It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
• Some popular examples of Naive Bayes Algorithm are spam filtration,
sentiment analysis, and classifying articles.
The distinction between Bayes theorem and Naive Bayes is that Naive Bayes
assumes conditional independence where Bayes theorem does not. This means
all input features are assumed to be independent of one another given the class.
Working of Naive Bayes' Classifier:
Working of Naive Bayes' Classifier can be understood with the help of the below
example:
Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". Using this dataset, we need to decide whether we should
play or not on a particular day according to the weather conditions. To solve this
problem, we need to follow the below steps:
1. Convert the given dataset into frequency tables.
2. Generate the Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the Player play or not?
      Outlook    Play
0     Rainy      Yes
1     Sunny      Yes
2     Overcast   Yes
3     Overcast   Yes
4     Sunny      No
5     Rainy      Yes
6     Sunny      Yes
7     Overcast   Yes
8     Rainy      No
9     Sunny      No
10    Sunny      Yes
11    Rainy      No
12    Overcast   Yes
13    Overcast   Yes
Frequency Table:
Weather    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4
Likelihood Table:
Weather    No             Yes
Overcast   0              5              5/14 = 0.35
Rainy      2              2              4/14 = 0.29
Sunny      2              3              5/14 = 0.35
All        4/14 = 0.29    10/14 = 0.71
Applying Bayes theorem:
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
P(Sunny | Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes | Sunny) = 0.3 * 0.71 / 0.35 = 0.60
P(No | Sunny) = P(Sunny | No) * P(No) / P(Sunny)
P(Sunny | No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No | Sunny) = 0.5 * 0.29 / 0.35 = 0.41
As we can see from the above calculation, P(Yes | Sunny) > P(No | Sunny).
Hence on a Sunny day, the Player can play the game.
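The same calculation can be reproduced from the 14 rows of the dataset. This is a minimal sketch that counts frequencies directly instead of reading them from the tables; the function name posterior and the variable names are illustrative.

```python
# (Outlook, Play) pairs copied from the dataset above, rows 0 to 13.
rows = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]


def posterior(outlook, label, rows):
    """P(label | outlook) = P(outlook | label) * P(label) / P(outlook)."""
    n = len(rows)
    n_label = sum(1 for _, play in rows if play == label)
    n_outlook = sum(1 for weather, _ in rows if weather == outlook)
    n_both = sum(1 for weather, play in rows if weather == outlook and play == label)
    likelihood = n_both / n_label  # P(outlook | label)
    prior = n_label / n            # P(label)
    evidence = n_outlook / n       # P(outlook)
    return likelihood * prior / evidence


p_yes = posterior("Sunny", "Yes", rows)
p_no = posterior("Sunny", "No", rows)
print(round(p_yes, 2), round(p_no, 2))
print("Play" if p_yes > p_no else "Do not play")
```

Working from exact counts gives 0.60 and 0.40; the hand calculation arrives at 0.41 for the second value only because the intermediate probabilities were rounded to two decimals. The decision is the same either way.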
Ques 3) What problem does the EM algorithm solve? 10M 2021-22
Or
What are the tasks of the E-step in the EM algorithm? 2M 2020-21
The Expectation-Maximization (EM) algorithm is defined as the combination of
various unsupervised machine learning algorithms, which is used to determine
the local maximum likelihood estimates (MLE) or maximum a posteriori
estimates (MAP) for unobservable variables in statistical models.
It is a technique to find maximum likelihood estimates when latent variables
are present. It is also referred to as the latent variable model.
A latent variable model consists of both observable and unobservable variables,
where the observable variables can be measured while the unobserved ones are
inferred from the observed variables. These unobservable variables are known as
latent variables.
Steps in EM Algorithm
The EM algorithm is completed mainly in 4 steps, which include the Initialization
Step, Expectation Step, Maximization Step, and Convergence Step.
1st Step: The very first step is to initialize the parameter values. Further, the
system is provided with incomplete observed data with the assumption
that the data is obtained from a specific model.
2nd Step: This step is known as Expectation or E-step, which is used to
estimate or guess the values of the missing or incomplete data using the
observed data. Further, the E-step primarily updates the variables.
3rd Step: This step is known as Maximization or M-step, where we use the
complete data obtained from the 2nd step to update the parameter values.
Further, the M-step primarily updates the hypothesis.
4th Step: The last step is to check if the values of latent variables are
converging or not. If yes, then stop the process; else, repeat the
process from step 2 until convergence occurs.
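The four steps can be sketched on a toy problem. This is a minimal sketch, assuming a mixture of two 1-D Gaussians with equal mixing weights and a known, shared variance, so that only the two means are estimated. The data, starting values and tolerance are made up, and convergence is checked on the change in the means, which is a simplification of the check described in step 4.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, mu1, mu2, var=1.0, max_iter=100, tol=1e-6):
    """Estimate the two component means; mu1 and mu2 are the initial values (step 1)."""
    for _ in range(max_iter):
        # E-step: for each point, the probability that component 1 produced it.
        resp = []
        for x in data:
            p1 = gaussian_pdf(x, mu1, var)
            p2 = gaussian_pdf(x, mu2, var)
            resp.append(p1 / (p1 + p2))

        # M-step: re-estimate each mean as a responsibility-weighted average.
        weight1 = sum(resp)
        weight2 = len(data) - weight1
        new_mu1 = sum(r * x for r, x in zip(resp, data)) / weight1
        new_mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / weight2

        # Convergence step: stop once the means barely move.
        converged = abs(new_mu1 - mu1) < tol and abs(new_mu2 - mu2) < tol
        mu1, mu2 = new_mu1, new_mu2
        if converged:
            break
    return mu1, mu2


# Two visible clumps of points; which clump each point belongs to is the latent variable.
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
mu1, mu2 = em_two_gaussians(data, mu1=0.0, mu2=6.0)
print(round(mu1, 2), round(mu2, 2))
```

Here the responsibilities computed in the E-step are the guessed values of the missing data, and the M-step uses them to update the parameters, mirroring steps 2 and 3 above.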
MODULE 3
PART-I
If we depend too much on the training data while drawing the decision tree, there
is a possibility that the tree will overfit. That is, a particular hypothesis
will work well on the training data, but it doesn't work well on testing or
real-world data. Such a tree is said to be overfitting.
Underfitting occurs when our machine learning model is not able to capture the
underlying trend of the data. In the case of underfitting, the model is not able to
learn enough from the training data, and hence it reduces the accuracy and
produces unreliable predictions.
a. overfitting the data
b. handling continuous valued attribute
c. handling missing attribute values
d. handling attributes with different costs
Ans: a. overfitting the data
If we depend too much on the training data while drawing the decision tree, there
is a possibility that the tree will go into oyerfitting. That is, a particular hypothesis
will work good on the training data, but it doesn’t work good on Testing or the real
world data So such tree is called as a overfitting.
This overfitting can be addressed with two techniques:
1. reduced error pruning
2. rule post-pruning
b. Handling continuous valued attributes
The decision tree works well with problems where we have a fixed number of
attributes and a discrete number of possibilities for each attribute. If a particular
attribute has continuous values, then we cannot apply the decision tree directly.
First, we need to convert the attributes that have continuous
values into discrete possibilities. Only then can we apply decision tree learning.
c. Handling missing attribute values
If some attribute values are missing, we need to fill those missing
values with proper values; only then can we use this learning. If a
particular attribute does not have a value, we need to find some value or fill it with
a proper value.
d. Handling attributes with different costs
Whenever we apply the decision tree algorithm, each and every attribute is given
equal importance. But sometimes, in a given problem definition, there
is a possibility that a particular attribute has more importance or is given
more weightage. In such a case we cannot use the core decision tree learning. We
need to handle this particular issue with some sort of calculation.
PlayTennis: training examples
Day   Outlook    Temperature   Humidity   PlayTennis
D1    Sunny      Hot           High       No
D2    Sunny      Hot           High       No
D3    Overcast   Hot           High       Yes
D4    Rain       Mild          High       Yes
D5    Rain       Cool          Normal     Yes
D6    Rain       Cool          Normal     No
D7    Overcast   Cool          Normal     Yes
D8    Sunny      Mild          High       No
D9    Sunny      Cool          Normal     Yes
D10   Rain       Mild          Normal     Yes
D11   Sunny      Mild          Normal     Yes
D12   Overcast   Mild          High       Yes
      Overcast                            Yes
      Rain                                No
In a decision tree, the major challenge is the identification of the attribute for the
root node at each level. This process is known as attribute selection. We have two
popular attribute selection measures:
1. Information Gain
2. Gini Index
Information Gain
When we use a node in a decision tree to partition the training instances into
smaller subsets, the entropy changes. Information gain is a measure of this change
in entropy:
$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$
$Entropy(S) = -P(yes) \log_2 P(yes) - P(no) \log_2 P(no)$
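The two formulas can be checked on a handful of rows. The sketch below is illustrative only: the function names are hypothetical, and the four rows are rows D1, D2, D3 and D7 of the PlayTennis table, chosen to keep the example small.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum over classes of p * log2(p)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    gain = entropy([row[target] for row in rows])
    for value in set(row[attribute] for row in rows):
        subset = [row[target] for row in rows if row[attribute] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

rows = [
    {"Outlook": "Sunny", "Humidity": "High", "PlayTennis": "No"},
    {"Outlook": "Sunny", "Humidity": "High", "PlayTennis": "No"},
    {"Outlook": "Overcast", "Humidity": "High", "PlayTennis": "Yes"},
    {"Outlook": "Overcast", "Humidity": "Normal", "PlayTennis": "Yes"},
]
gain_outlook = information_gain(rows, "Outlook", "PlayTennis")
gain_humidity = information_gain(rows, "Humidity", "PlayTennis")
```

On these four rows Outlook separates the classes perfectly, so its gain is higher than that of Humidity and it would be chosen as the root attribute.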
MODULE 3
PART-II
2020-21 10M.
Ans: Instance-based learning refers to a family of techniques for classification
and regression, which produce a class label/prediction based on the
similarity of the query to its nearest neighbor(s) in the training set.
Some of the instance-based learning algorithms are:
1. K Nearest Neighbor (KNN)
2. Locally Weighted Learning (LWL)
3. Case-Based Reasoning
Locally weighted regression
Locally weighted linear regression is a non-parametric algorithm, that is,
the model does not learn a fixed set of parameters as is done in ordinary
linear regression. Rather, parameters are computed individually for each
query point.
Locally weighted regression (LWR) is a memory-based method that
performs a regression around a point of interest using only training data that
are "local" to that point.
Locally weighted linear regression is a supervised learning algorithm.
There exists no training phase. All the work is done during the testing
phase/while making predictions.
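A minimal sketch of the idea for one input variable follows. It assumes a Gaussian kernel for the "local" weights and a bandwidth parameter named tau; both are illustrative choices, not taken from the notes.

```python
import math

def lwr_predict(xs, ys, query, tau=0.5):
    """Fit a weighted straight line around `query` and evaluate it there."""
    # Points close to the query get weights near 1, distant points near 0.
    w = [math.exp(-(x - query) ** 2 / (2 * tau ** 2)) for x in xs]
    total = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, xs)) / total
    mean_y = sum(wi * y for wi, y in zip(w, ys)) / total
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return intercept + slope * query

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]
prediction = lwr_predict(xs, ys, query=2.5)
```

Note that the line is refitted for every query, which is why there is no separate training phase.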
Locally weighted regression methods are a generalization of k-Nearest
Neighbour.
RBF network on
layer, and an outp
Input Layer
The input layer simply feeds the data to the hidden layers. As a result, the number
of neurons in the input layer should be equal to the dimensionality of the data.
Hidden Layer
Output Layer
The output layer uses a linear acti
regression tasks.
In general, the case-based reasoning process entails:
1. Retrieve - Gathering from memory an experience closest to the current
problem.
2. Reuse - Suggesting a solution based on the experience and adapting it to
meet the demands of the new situation.
3. Revise - Evaluating the use of the solution in the new context.
4. Retain - Storing this new problem-solving method in the memory system.
The CADET system employs case-based reasoning to assist in the conceptual design
of simple mechanical devices such as water faucets.
It uses a library containing approximately 75 previous designs and design
fragments to suggest conceptual designs to meet the specifications of new
design problems.
[Figure: functional description of the water faucet]
o The function is represented in terms of qualitative relationships among the
water flow levels and temperatures at its inputs and outputs.
o In the functional description, an arrow with a "+" label indicates that the
variable at the arrow head increases with the variable at its tail. A "-" label
indicates that the variable at the head decreases with the variable at the tail.
o Here Qc refers to the flow of cold water into the faucet, Qh to the
input flow of hot water, and Qm to the single mixed flow out of the faucet.
o Tc, Th and Tm refer to the temperatures of the cold water, hot water and mixed
water respectively.
o The variable Ct denotes the control signal for temperature that is input to the
faucet, and Cf denotes the control signal for water flow.
o The controls Ct and Cf are to influence the water flows Qc and Qh, thereby
indirectly influencing the faucet output flow Qm and temperature Tm.
2021-22 10M
Ease of knowledge elicitation: Lazy methods can utilise easily available cases or
problem instances instead of rules that are difficult to extract.
Absence of problem-solving bias: Cases can be used for multiple problem-
solving purposes, because they are stored in a raw form. This is in contrast to eager
methods, which can be used merely for the purpose for which the knowledge has
already been compiled.
Incremental learning: A CBL system can be put into operation with a minimal
set of solved cases furnishing the case base. The case base will be filled with new
cases, increasing the system's problem-solving ability.
Ease of maintenance: This is particularly due to the fact that CBL systems can
adapt to many changes in the problem domain and the relevant environment,
merely by acquiring.
Ease of explanation: The results of a CBL system can be justified based upon the
similarity of the current problem to the retrieved case. Because solutions are easily
traceable to precedent cases, it is also easier to analyse failures of the system.
For example, CASEY for classification of auditory impairments, CASCADE for
classification of software failures.
2021-22
The inductive bias (also known as learning bias) of a learning algorithm is the set
of assumptions that the learner uses to predict outputs of given inputs that it
has not encountered. In machine learning, one aims to construct algorithms that
are able to learn to predict a certain target output.
Inductive learning methods require a certain number of training examples to
generalize accurately.
Analytical learning stems from the idea that when not enough training examples
are provided, it may be possible to "replace" the "missing" examples by prior
knowledge and deductive reasoning.
2021-22
Lazy learning | Eager learning
1. Lazy learning methods simply store the data, and generalizing beyond these data is postponed until an explicit request is made. | Eager learning methods construct a general (one fit all), explicit (input independent) description of the target function based on the provided training examples.
2. Lazy learning methods can construct a different approximation to the target function for each encountered query instance. | Eager learning methods use the same approximation to the target function, which must be learned based on training examples and before input queries are observed.
3. Lazy learning is very suitable for complex and incomplete problem domains. |
Perceptrons are the buildin
learning algorithm of binary cl
The perceptron consists of 4 parts.
1. Input values or One input layer
2. Weights and Bias
3. Net sum
4. Activation Function
a. All
b. Ad
c. Apply that weighted sum to the correct Activation Function.
Weights show the strength of the particular node.
A bias value allows you to shift the activation function curve up or down.
In short, the activation functions are used to map the input between the
required values like (0, 1) or (-1, 1).
The perceptron is usually used to classify the data into two parts. Therefore, it is also
known as a Linear Binary Classifier.
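The four parts (inputs, weights and bias, net sum, activation function) can be seen in a small sketch. It assumes a step activation and the classic perceptron learning rule, and trains on the AND gate as an illustrative linearly separable problem; the names below are hypothetical.

```python
def step(net):
    """Step activation: maps the net sum to 0 or 1."""
    return 1 if net >= 0 else 0

def predict(weights, bias, inputs):
    # Net sum = weighted sum of the inputs plus the bias.
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(net)

def train_perceptron(samples, lr=1, epochs=20):
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Perceptron rule: move the weights only when the output is wrong.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
outputs = [predict(weights, bias, x) for x, _ in and_gate]
```

Because AND is linearly separable, a single perceptron can classify all four inputs correctly; a function such as XOR cannot be learned this way.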
(Ques 2) What is Gradient descent?
2021-22 2M
Gradient descent is an optimization algorithm which is commonly used to train
machine learning models and neural networks. It finds a local
minimum of a given function by repeatedly stepping against the gradient (stepping
along the gradient instead, known as gradient ascent, finds a local maximum).
This method is commonly used in machine learning (ML) and deep learning (DL)
to minimize a cost/loss function.
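As a minimal sketch, the update rule x = x - learning_rate * gradient can be applied to a simple one-variable function. The function f(x) = (x - 3)^2, the learning rate and the step count are illustrative assumptions.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

# f(x) = (x - 3)^2 has derivative 2 * (x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

A learning rate that is too large makes the steps overshoot the minimum, while one that is too small makes convergence slow.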
2021-22 2M.
In machine learning, the delta rule is a gradient descent learning rule for updating
the weights of the inputs to artificial neurons in a single-layer neural network. It is
a special case of the more general backpropagation algorithm.
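A sketch of the delta rule for a single linear neuron follows, using the update w_i = w_i + lr * (target - output) * x_i. The linear (identity) activation, the absence of a bias term and the three training samples are assumptions made for illustration.

```python
def train_delta_rule(samples, lr=0.1, epochs=500):
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for inputs, target in samples:
            output = sum(w * x for w, x in zip(weights, inputs))
            # Delta rule: change each weight in proportion to the error
            # and to the input that fed that weight.
            weights = [w + lr * (target - output) * x for w, x in zip(weights, inputs)]
    return weights

# Targets generated by t = 2*x1 - 1*x2, so the ideal weights are [2, -1].
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
weights = train_delta_rule(samples)
```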
(Ques 4) Describe BPN algorithm in ANN along with a suitable example.
2020-21 10M
Back-propagation is used for the training of neural networks.
The Backpropagation algorithm looks for the minimum value of the error function
in weight space using a technique called the delta rule or gradient descent.
In an artificial neural network, the values of weights and biases are randomly
initialized. Due to random initialization, the neural network probably has errors in
giving the correct output.
We need to reduce error values as much as possible. So, for reducing these error
values, we need a mechanism that can compare the desired output of the neural
Backpropagation is a short form for "backward propagation of errors." It is a
standard method of training artificial neural networks.
Backpropagation Algorithm:
Step 1: Inputs X arrive through the preconnected path.
Step 2: The input is modeled using real weights W. Weights are usually chosen
randomly.
Step 3: Calculate the output of each neuron from the input layer to the hidden
layer to the output layer.
Step 4: Calculate the error in the outputs.
Backpropagation Error = Actual Output - Desired Output
Step 5: From the output layer, go back to the hidden layer to adjust the weights to
reduce the error.
Step 6: Repeat the process until the desired output is achieved.
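The six steps can be sketched for a tiny network. This is an illustrative sketch under stated assumptions, not a reference implementation: one hidden layer of two sigmoid neurons, one sigmoid output, squared error, and the OR gate as training data. All names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_network(samples, hidden=2, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 2: weights chosen randomly; the last entry of each list is the bias.
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        # Step 3: input layer -> hidden layer -> output layer.
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + ws[-1]) for ws in w_h]
        y = sigmoid(sum(w * hi for w, hi in zip(w_o, h)) + w_o[-1])
        return h, y

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    history = [mean_squared_error()]
    for _ in range(epochs):                      # Step 6: repeat
        for x, target in samples:                # Step 1: inputs arrive
            h, y = forward(x)
            error = y - target                   # Step 4: actual - desired
            # Step 5: propagate the error backwards and adjust the weights.
            delta_o = error * y * (1 - y)
            delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                w_o[j] -= lr * delta_o * h[j]
            w_o[-1] -= lr * delta_o
            for j in range(hidden):
                for i in range(n_in):
                    w_h[j][i] -= lr * delta_h[j] * x[i]
                w_h[j][-1] -= lr * delta_h[j]
        history.append(mean_squared_error())
    return forward, history

or_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
forward, history = train_network(or_gate)
```

The list `history` records the mean squared error after each epoch, so comparing its first and last entries shows how much the error was reduced.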
Why We Need Backpropagation?
Most prominent advantages of Backpropagation are:
o Backpropagation is fast, simple and easy to program.
o It is a flexible method as it does not require prior knowledge about the network.
o It is a standard method that generally works well.
o It does not need any special mention of the features of the function to be learned.
Types of Backpropagation Networks
Two types of backpropagation networks are:
o Static Backpropagation
o Recurrent Backpropagation
The output neurons of a neural network compete among themselves to become
active. Several output neurons may be active, but in competitive learning only a single
output neuron is active at one time.
2020-21 10M
Self Organizing Map
It follows an unsupervised lea
competitive leaming algorithm.
SOM is used for clustering and mapp
to map multidimensional data onto lo
reduce complex problems for easy interpretation.eh all the n¢
and calculate the Euclidean distar ight vector and the
current input vector. The node wit g tor closest to the input
tagged as the winning neuron.
Step 4: Find the new weight between input vector sample and winning output
Neuron
MODULE 4
PART-II
ARTIFICIAL NEURAL NETWORKS - Perceptrons, Multilayer perceptron,
Gradient descent and the Delta rule, Multilayer networks, Derivation of
Backpropagation Algorithm, Generalization, Unsupervised Learning - SOM
Algorithm and its variant; (Unit IV)
DEEP LEARNING - Introduction, concept of convolutional neural network, Types
of layers - (Convolutional Layers, Activation function, pooling, fully connected),
Concept of Convolution (1D and 2D) layers, Training of network, Case study of CNN
for eg on Diabetic Retinopathy, Building a smart speaker, Self-driving car etc.
Convolutional Neural Networks (CNNs) are specially designed to work with
images. An image consists of pixels. In deep learning, images are represented
as arrays of pixel values.
There are three main types of layers in a CNN:
o Convolutional layers
o Pooling layers
o Fully connected (dense) layers
In addition to that, activation layers are added after each convolutional layer and
fully connected layer.
There are four main types of operations in a CNN: Convolution
operation, Pooling operation, Flatten operation and Classification (or other
relevant) operation.
Convolutional layers and convolution operation: The first layer in a CNN is a
convolutional layer. It takes the images as the input and begins to process.
There are three elements in the convolutional layer: Input
image, Filters and Feature map
[Figure: convolution operation between an image section and the filter; input image (6x6), feature map (4x4)]
Filter: This is also called Kernel or Feature Detector.
Image section: The size of the image section should be equal to the size of the
filter(s) we choose. The number of image sections depends on the Stride.
Feature map: The feature map stores the outputs of different convolution
operations between different image sections and the filter(s).
The number of steps (pixels) that we shift the filter over the input image is
called Stride.
Padding adds additional pixels with zero values to
each side of the image. That helps to get the feature
map of the same size as the input.
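The roles of the filter, stride and padding can be seen in a short sketch. It assumes a square filter and computes the sliding sum of products without flipping the filter (the form commonly used in CNNs); the function name and the all-ones test data are illustrative.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over a 2-D image and return the feature map."""
    if padding:
        width = len(image[0]) + 2 * padding
        border = [[0] * width for _ in range(padding)]
        # Padding: add zero-valued pixels on every side of the image.
        image = (border
                 + [[0] * padding + list(row) + [0] * padding for row in image]
                 + [[0] * width for _ in range(padding)])
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply one image section element-wise with the kernel and sum.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [[1] * 6 for _ in range(6)]      # 6x6 input image
kernel = [[1] * 3 for _ in range(3)]     # 3x3 filter
feature_map = conv2d(image, kernel)               # 4x4 feature map
same_size = conv2d(image, kernel, padding=1)      # 6x6 feature map
```

With a 6x6 input, a 3x3 filter and stride 1 the feature map is 4x4; adding one pixel of zero padding keeps it at 6x6.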
Pooling layers and pooling operation
Pooling layers are the second type of layer used in a
CNN. There can be multiple pooling layers in a
CNN. Each convolutional layer is followed by a
pooling layer. So, convolution and pooling layers are
used together as pairs.
A pooling layer reduces the dimensionality (number of pixels) of the output returned
from the previous convolutional layers.
There are three elements in the pooling layer: Feature map, Filter and Pooled
feature map.
There are two types of pooling operations.
o Max pooling: Get the maximum value in the area where the filter is
applied.
o Average pooling: Get the average of the values in the area where the
filter is applied.
Then, we can flatten a pooled feature map that contains multiple channels.
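Both pooling operations and the flatten step can be sketched as follows. The 2x2 window, the stride of 2 and the sample feature map are illustrative assumptions, and the sketch handles a single channel only.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling over size x size windows."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled

def flatten(feature_map):
    """Turn a 2-D pooled feature map into a 1-D list for the dense layers."""
    return [value for row in feature_map for value in row]

feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
max_pooled = pool2d(feature_map, mode="max")       # [[6, 8], [14, 16]]
avg_pooled = pool2d(feature_map, mode="average")   # [[3.5, 5.5], [11.5, 13.5]]
flat = flatten(max_pooled)                         # [6, 8, 14, 16]
```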
Fully connected (dense) layers
2020-21 10M
Step I: to c
[Figure: handwritten worked example of the convolution operation]
Size of kernel or filter is 3*3, hence the size of the image section is also 3*3.
MODULE 5
PART-I
REINFORCEMENT LEARNING - Introduction to Reinforcement Learning,
Learning Task, Example of Reinforcement Learning in Practice, Learning Models for
Reinforcement - (Markov Decision process, Q Learning - Q Learning function, Q
Learning Algorithm), Application of Reinforcement Learning, Introduction to Deep
Q Learning
GENETIC ALGORITHMS: Introduction, Components, GA cycle of reproduction,
Crossover, Mutation, Genetic Programming, Models of Evolution and Learning,
Applications.
Reinforcement Learning is a feedback-based machine learning technique in
which an agent learns to behave in an environment by performing actions and
seeing the results of those actions. For each good action, the agent gets positive
feedback, and for each bad action, the agent gets negative feedback or a penalty.
The elements of reinforcement learning are: Agent, Environment, Action, State,
Policy, Reward.
Learning Models in RL:
o Markov Decision Process
o Q-Learning Algorithm
o Deep Q Learning
The Markov Property states that:
"Future is independent of the past given the present"
Mathematically we can express this statement as:
$P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, \ldots, S_t]$
It says that "If the agent is present in the current state s1, performs an action a1
and moves to the state s2, then the state transition from s1 to s2 only depends on the
current state, and future actions and states do not depend on past actions, rewards, or
states".
MDP is a framework that can solve most Reinforcement Learning problems with
discrete actions.
With the Markov Decision Process, an agent can arrive at an optimal policy for
maximum rewards over time.
A Markov Process is a memoryless random process, i.e. a sequence of random
states S[1], S[2], ..., S[n] with the Markov Property.
A Markov decision process is a 5-tuple (S, A, P, R, γ):
o S is the set of states.
o A is the set of actions.
o P(S, A, S') is the probability that action A in the state S at time T will lead
to state S' at time T + 1.
o R(S, A, S') is the immediate reward received after a transition from state S
to S' due to action A.
o Discount factor (γ): It determines how much importance is to be given to
the immediate reward and future rewards. It has a value between 0 and 1.
Q-learning algorithm
o Q-learning is a popular model-free reinforcement learning algorithm based
on the Bellman equation.
o The main objective of Q-learning is to learn the policy which can inform the
agent what actions should be taken for maximizing the reward
under what circumstances.
o The goal of the agent in Q-learning is to maximize the value of Q.
o Q stands for quality in Q-learning, which means it specifies the quality of
an action taken by the agent.
o A Q-Table is used to find the best action for each state in the environment.
We use the Bellman Equation at each state to get the expected future state
and reward and save it in a table to compare with other states.
Bellman Equation
$V(s) = \max_a [R(s, a) + \gamma V(s')]$
Where,
V(s) = value calculated at a particular point.
R(s, a) = reward at a particular state s by performing an action a.
γ = discount factor.
Q-Learning algorithm works like this:
1. Initialize all Q-values, e.g., with zeros.
2. Choose an action a in the current state s based on the current best Q-value.
3. Perform this action a and observe the outcome (new state s').
4. Measure the reward R after this action.
5. Update Q with an update formula that is called the Bellman Equation.
6. Repeat steps 2 to 5 until the learning no longer improves.
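The loop above can be sketched on a toy environment. Everything about the environment is an assumption made for illustration: five states in a row, actions 0 (left) and 1 (right), and a reward of 1 for reaching the last state. The learning rate alpha and the epsilon-greedy exploration are also illustrative additions to step 2.

```python
import random

N_STATES = 5          # states 0..4 in a row (hypothetical environment)
GOAL = 4              # reaching state 4 ends the episode
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Return (next_state, reward) for the toy environment."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]        # 1. initialise Q with zeros
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # 2. choose an action: mostly the best known one, sometimes random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            # 3. and 4. perform the action, observe the new state and reward.
            next_state, reward = step(state, action)
            # 5. update Q towards reward + gamma * best value of the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

After training, reading the highest Q-value in each row of the table gives the learned policy, which here is to move right in every state.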
EXAMPLE: An example of Q-learning is an Advertisement recommendation
system, In a normal ad recommendation system, the ads you get are based on your
previous purchases or websites you may have visited, If you've bought a TV, you
will get recommended TVs of different brands.
Using Q-learning, we can optimize the ad recommendation system to recommend
products that are frequently bought together. The reward will be if the user clicks
on the suggested product.
DEEP Q-LEARNING MODEL
o The Q-Learning approach is practical for very small environments and quickly
loses its feasibility when the number of states and actions in the
environment increases.
o The solution for the above leads us to Deep Q Learning, which uses a deep
neural network to approximate the values.
o Deep Q Learning uses the Q-learning idea and takes it one step further.
o Instead of using a Q-table, we use a Neural Network that takes a state and
approximates the Q-values for each action based on that state.
The basic working step for Deep Q-Learning is that the initial state is fed into the
neural network and it returns the Q-value of all possible actions as an output.
The difference between Q-Learning and Deep Q-Learning can be illustrated as
follows:
[Figure: in Q-Learning, a state and action are looked up in the Q-Table to give a Q-value; in Deep Q-Learning, the state is fed to a deep neural network that outputs a Q-value for each action]
Ques 6) What are the applications of reinforcement learning?
Following are the applications of reinforcement learning:
1. Robotics for industrial automation.
2. Business strategy planning.
3. Machine learning and data processing.
4. It helps us to create training systems that provide custom instruction and
materials according to the requirements of students.
5. Aircraft control and robot motion control.
This algorithm reflects the process of natural selection where the fittest individuals
are selected in order to produce offspring of the next generation.
The process of natural selection starts with the selection of fittest individuals from
a population.
o They produce offspring which inherit the characteristics of the parents and
will be added to the next generation.
o If parents have better fitness, their offspring will be better than the parents and
have a better chance of surviving. This process keeps on iterating and at the
end, a generation with the fittest individuals will be found.
o This notion can be applied to a search problem.
The genetic algorithm is a method for solving both constrained and
unconstrained optimization problems that is based on natural selection, the
process that drives biological evolution. The genetic algorithm repeatedly
modifies a population of individual solutions.
Five phases are considered in a genetic algorithm.
1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation
Initial Population
The process begii set of it Is is called
individual is a soh the problem you solve.
[Figure: population of individuals A1 to A4, each represented as a string of genes]
Fitness Function
The fitness function determines how fit an individual is (the ability of an individual
Selection
The idea o} the fittest indivi I ass their
genes to the reTwo pairs of individ
Individuals with hig!
Crossover
Crossover is the mo
parents to be mated
For example, consid
shown below.
Offspring are created by excha
parents among themselves until the cro
A1 | 0 | 0 | 0 | 0 | 0 | 0
Mutation
In certain new of
a mutation with a I lom probability.
bit string can be fli
© Be’
Mutation: Before and
Mutation occurs to maintain
within the population and
premature convergence.
Termination
The algorithm terminates if the population has converged (does not produce
offspring which are significantly different from the previous generation). Then it is
said that the genetic algorithm has provided a set of solutions to our problem.
Once the initial generation is created, the algorithm evolves the generation using
following
1) Selecti idea is to give prefe
fitness sco nes
2) Crossover Operator: This represents mating between individuals. Two
individuals are selected using the selection operator and crossover sites are chosen
randomly. Then the genes at these crossover sites are exchanged, thus creating a
completely new individual (offspring).
[Figure: example of crossover between two individuals]
3) Mutation Operator: The key idea is to insert random genes in offspring to
maintain the diversity in the population to avoid premature convergence.
[Figure: example of mutation in an offspring]
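The phases and operators can be put together in a short sketch. Several choices here are assumptions made for illustration and are not taken from the notes: individuals are bit strings, fitness is the number of 1 bits, selection is a two-way tournament, crossover uses a single random site, and the best individual is copied unchanged into the next generation (elitism).

```python
import random

def fitness(individual):
    """Toy fitness function: the number of 1 genes in the bit string."""
    return sum(individual)

def genetic_algorithm(n_genes=12, pop_size=10, generations=30,
                      mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Initial population: random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_history.append(fitness(population[0]))
        next_generation = [population[0][:]]          # elitism: keep the fittest
        while len(next_generation) < pop_size:
            # Selection: the fitter of two randomly drawn individuals wins.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            # Crossover: exchange genes after a randomly chosen site.
            site = rng.randint(1, n_genes - 1)
            child = parent_a[:site] + parent_b[site:]
            # Mutation: flip each gene with a low random probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_generation.append(child)
        population = next_generation
    best = max(population, key=fitness)
    best_history.append(fitness(best))
    return best, best_history

best, best_history = genetic_algorithm()
```

Because the fittest individual is always carried over, the best fitness recorded in `best_history` never decreases from one generation to the next.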
INTRODUCTION - Learning, Types of Learning, Well defined learning problems,
Designing a Learning System, History of ML, Introduction of Machine Learning
Approaches - (Artificial Neural Network, Clustering, Reinforcement Learning,
Decision Tree Learning, Bayesian networks, Support Vector Machine, Genetic
Algorithm), Issues in Machine Learning and Data Science Vs Machine Learning;
REGRESSION: Linear Regression and Logistic Regression
BAYESIAN LEARNING - Bayes theorem, Concept learning, Bayes Optimal
Classifier, Naive Bayes classifier, Bayesian belief networks, EM algorithm. (Unit II) SUPPORT
VECTOR MACHINE: Introduction, Types of support vector kernel - (Linear kernel,
polynomial kernel, and Gaussian kernel), Hyperplane - (Decision surface), Properties
of SVM, and Issues in SVM.
1. In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts
created a model of neurons using an electrical circuit, and thus the neural network
was created.
2. In 1952, Arthur Samuel created the first computer program which could learn as
it ran.
3. Frank Rosenblatt designed the first artificial neural network in 1958, called
Perceptron. The main goal of this was pattern and shape recognition.
4. Use of back propagation in neural networks came in 1986, when researchers
from the Stanford psychology department decided to extend an algorithm created
by Widrow and Hoff in 1962. This allowed multiple layers to be used in a neural
network, creating what are known as slow learners, which will learn over a long
period of time.
5. In 1997, the IBM computer Deep Blue, which was a chess-playing computer,
beat the world chess champion.
21st Century:
1. Since the start o
learning will increa
A Bayesian netwo
variables and their ¢
It is also called a Bayes
model.
Bayesian networks are probabil
a probability distribution
it consists of two parts:
© Directed Acyclic Graph
© Table of conditional probabilities.ind a variable can
lat ity, cl rh itl
rglary, nm juake occurred, 1 C2 be led .
Conditional probability
False   True    0.31    0.69
False   False   0.001   0.999
The Bayes Optimal Classifier is a probabilistic model that predicts the most
likely outcome for a new situation. It is based on Bayes theorem.
It is also related to Maximum a Posteriori (MAP), a probabilistic framework
for determining the most likely hypothesis for a training dataset.
Take a hypothesis space that has 3 hypotheses with posterior probabilities
P(h1|D) = 0.4, P(h2|D) = 0.3, and P(h3|D) = 0.3.
Hence, h1 is the MAP hypothesis.
Suppose a new instance x is encountered, which is classified negative by h2 and
h3 but positive by h1.
$P(v_j \mid D) = \sum_{h_i \in H} P(v_j \mid h_i) P(h_i \mid D)$
The most probable classification of the new instance is obtained by
combining the predictions of all hypotheses, weighted by their posterior
probabilities.
To illustrate in terms of the above example, the set of possible classifications
of the new instance is $V = \{\oplus, \ominus\}$, and
$P(h_1 \mid D) = 0.4, \quad P(\ominus \mid h_1) = 0, \quad P(\oplus \mid h_1) = 1$
$P(h_2 \mid D) = 0.3, \quad P(\ominus \mid h_2) = 1, \quad P(\oplus \mid h_2) = 0$
$P(h_3 \mid D) = 0.3, \quad P(\ominus \mid h_3) = 1, \quad P(\oplus \mid h_3) = 0$
therefore
$\sum_{h_i \in H} P(\oplus \mid h_i) P(h_i \mid D) = 0.4$
$\sum_{h_i \in H} P(\ominus \mid h_i) P(h_i \mid D) = 0.6$
and
$\arg\max_{v_j \in \{\oplus, \ominus\}} \sum_{h_i \in H} P(v_j \mid h_i) P(h_i \mid D) = \ominus$
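The same calculation can be written as a few lines of code. The dictionary layout and the labels "+" and "-" for the positive and negative classes are illustrative choices; the probabilities are the ones from the example.

```python
def bayes_optimal(posteriors, predictions, classes):
    """Weight every hypothesis' prediction by its posterior and pick the best class."""
    scores = {
        v: sum(predictions[h][v] * posteriors[h] for h in posteriors)
        for v in classes
    }
    return max(scores, key=scores.get), scores

posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}       # P(h | D)
predictions = {                                        # P(v | h)
    "h1": {"+": 1, "-": 0},
    "h2": {"+": 0, "-": 1},
    "h3": {"+": 0, "-": 1},
}
label, scores = bayes_optimal(posteriors, predictions, classes=("+", "-"))
```

Although h1 alone is the MAP hypothesis and predicts positive, the combined vote of h2 and h3 makes negative the most probable classification.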
148.0) helthim, Inductive bias,
on theory, Information
[NSTANCE-BASED
d Regression, Radial
In machine learning, #
assumptions made by a
generalize a finite set of obs
domain.
Inductive bias describes the basis
tree over all the possible decision tree
1D3 scarch in favor of shorter tree over the longer ones and Sclee
highest information gain as the root attribute.
That is to say, inductive inference is based on a generalization from a finite set of
past observations, extending the observed pattern or relation to other future
instances or instances occurring elsewhere.
It is logically true but it might not be realistically true.
i. ID3 is an algorithm used to generate a decision tree from a dataset.
ii. To construct a decision tree, ID3 uses a top-down, greedy search through the
given sets, where each attribute at every tree node is tested to select the attribute
that is best for classification of a given set.
iii. For constructing a decision tree, information gain is calculated for each and
every attribute, and the attribute with the highest information gain becomes the root
node.
i. C4.5 is an algorithm used to generate a decision tree. It is an extension of the ID3
algorithm.
ii. It is better than the ID3 algorithm because it deals with both continuous and
discrete attributes and also with missing values and pruning trees after
construction.
iii. C5.0 is the commercial successor of C4.5 because it is faster, memory efficient
and used for building smaller decision trees.
iv. C4.5 performs by default a tree pruning process.
o Gini index is a measure of impurity or purity used while creating a decision
tree in the CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred as compared to one with a
high Gini index.
o It only creates binary splits, and the CART algorithm uses the Gini index to
create binary splits.
o Gini index can be calculated using the below formula:
$Gini\ Index = 1 - \sum_j P_j^2$
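As a quick check of the formula, the sketch below computes the Gini index of a list of class labels; the function name and sample labels are illustrative.

```python
from collections import Counter

def gini_index(labels):
    """Gini index = 1 - sum over classes of (proportion of the class) squared."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

mixed = gini_index(["Yes", "Yes", "No", "No"])    # evenly mixed node -> 0.5
pure = gini_index(["Yes", "Yes", "Yes", "Yes"])   # pure node -> 0.0
```

A pure node has a Gini index of 0, which is why splits that give a lower Gini index are preferred.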
Decision trees can represent any boolean function of the input attributes. Let's
use decision trees to perform the function of three boolean gates AND, OR and
XOR.
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms
based on the Supervised Learning technique.
o The K-NN algorithm assumes the similarity between the new case/data and
available cases and puts the new case into the category that is most similar to
the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point
based on the similarity. This means when new data appears, it can be
easily classified into a well-suited category by using the K-NN algorithm.
[Figure: KNN Classifier; input value -> predicted output]
Need
With the help of K-NN, we can easily identify the category or class of a particular
dataset.
Working
The K-NN working can be explained on the basis of the below algorithm:
o Step-1: Select the number K of the neighbors.
o Step-2: Calculate the Euclidean distance from the new data point to the
existing data points.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean
distances.
o Step-4: Among these K neighbors, count the number of data points in
each category.
o Step-5: Assign the new data point to the category for which the number of
neighbors is maximum. Our model is ready.
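The five steps map directly onto a few lines of code. The training points, the labels "A" and "B" and the choice K = 3 are illustrative assumptions; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_classify(training_data, query, k=3):
    # Step 2: Euclidean distance from the query to every stored point.
    distances = sorted((math.dist(point, query), label) for point, label in training_data)
    # Step 3: keep the K nearest neighbours.
    nearest = [label for _, label in distances[:k]]
    # Steps 4 and 5: count the labels and return the most common one.
    return Counter(nearest).most_common(1)[0][0]

training_data = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
near_a = knn_classify(training_data, (2, 2))   # closest to the "A" group
near_b = knn_classify(training_data, (6, 5))   # closest to the "B" group
```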
ARTIFICIAL NEURAL NETWORKS - Perceptrons, Multilayer perceptron,
Gradient descent and the Delta rule, Multilayer networks, Derivation of
Backpropagation Algorithm, Generalization, Unsupervised Learning - SOM
Algorithm and its variant; (Unit IV)
DEEP LEARNING - Introduction, concept of convolutional neural network, Types
of layers - (Convolutional Layers, Activation function, pooling, fully connected),
Concept of Convolution (1D and 2D) layers, Training of network, Case study of CNN
for eg on Diabetic Retinopathy, Building a smart speaker, Self-driving car etc.
Convergence of neural networks is a point of training a model after which changes
in the learning rate become lower and the errors produced by the model in training
come to a minimum.
Convergence of the neural network helps in defining how many iterations of
training a neural network will require to produce minimum errors.
Most neural networks fail to converge for one of these reasons: the amount of
training data is low, inappropriate weight application in the network, or
implementation of not enough nodes.
There are various things that can help in avoiding this failure: a change in the
activation function can be helpful, as can reinitialization of the weights of the network.
A higher learning rate or number of epochs should be avoided to make the
neural network converge faster.
This essentially means how good our model is at learning from the given data and
applying the learnt information elsewhere.
When training a neural network, there’s going to be some data that the neural
network trains on, and there’s going to be some data reserved for checking the
performance of the neural network.
If the neural network performs well on the data which it has not trained on, we can
say it has generalized well.
Due to overfitting, NN fails to form a general understanding.
In neural networks, adding dropout neurons is one of the most popular and
effective ways to reduce overfitting.
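A minimal sketch of dropout on one layer's activations is shown below. It assumes the "inverted dropout" variant, where the surviving activations are scaled up during training so that nothing needs to change at prediction time; the function name and values are illustrative.

```python
import random

def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob and rescale the rest."""
    if not 0 <= drop_prob < 1:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
dropped = dropout([1.0] * 8, drop_prob=0.5, rng=rng)     # each value is 0.0 or 2.0
unchanged = dropout([1.0, 2.0], drop_prob=0.0, rng=rng)  # nothing is dropped
```

Because a different random subset of neurons is switched off at each training step, the network cannot rely too heavily on any single neuron.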
[Figure: dropout applied to a neural network at a given instant, and the equivalent neural network at this instant]
Self Driving car
CNN is the primary algorithm that these systems use to recognize and classify
different parts of the road, and to make appropriate decisions.
To understand the workings of self-driving cars, we need to examine the four main
parts:
1. Perception
2. Localization
3. Prediction
4. Decision Making
Perception
Perception helps the car see the world around itself, as well as recognize
and classify the things that it sees.
To achieve such a high level of perception, a self-driving car must have three
sensors:
1. Camera
2. LiDAR (Light Detection And Ranging)
3. RADAR (Radio Detection And Ranging)
Localization
Localization algorithms in self-driving cars calculate the position and orientation
of the vehicle as it navigates.
Prediction
The car has a 360-degree view of its environment that enables it to perceive and
capture all the information and process it. Prediction generates a number of
possible actions or moves based on the environment.
Decision-making
Decision-making is vital in self-driving cars. In order to make a decision, the car
should have enough information so that it can select the necessary set of actions.
Building a smart speaker
A smart speaker is a wireless electronic device that can respond to spoken
commands.
Hardware Components
• Raspberry Pi
• ReSpeaker 2-Mics HAT / USB mic / USB sound card
• SD card
• Speaker
• 3.5mm Aux cable / JST PH2.0 connector
Speech recognition is used: a Convolutional Neural Network (CNN) is applied as an
advanced deep neural network to classify each word from the pooled data set as a
multi-class classification task. The proposed deep neural network returned a word
classification accuracy of 97.06% on a completely unknown speech sample.
REINFORCEMENT LEARNING: Introduction to Reinforcement Learning,
Learning Task, Example of Reinforcement Learning in Practice, Learning Models for
Reinforcement (Markov Decision Process, Q Learning, Q Learning function,
Q Learning Algorithm), Application of Reinforcement Learning, Introduction to Deep
Q Learning.
GENETIC ALGORITHMS: Introduction, Components, GA cycle of reproduction,
Crossover, Mutation, Genetic Programming, Models of Evolution and Learning,
Applications.
1. RL in Robotics
Robotics without any doubt facilitates training a robot in such a way that a robot
can perform tasks just like a human being can. But still there is a bigger challenge: the robots aren't able to use
common sense [...]
2. Traffic Control
Reinforcement learning [...] and optimization for
traffic control [...]
3. Gaming
From creating [...]
efficient and relatively easy resource on [...]
4. Natural Language Processing
Predictive text, text [...]
are all examples of [...]
learning. By studying typical [...]
how people speak to each other [...]
Two evolution models.
Lamarckian evolution.
Baldwin effect.
Lamarckian evolution holds that an individual's genetic makeup is changed by its
lifetime experience. That is, an organism [...] during its life to adapt to the [...]
in terms [...]
1. Optimization - Genetic Algorithms are most commonly used in optimization
problems wherein we have to maximize or minimize a given objective function
value under a given set of constraints.
2. Traveling salesman problem (TSP)
The aim of this problem is to find an optimal route for the salesman to cover.
After each iteration, we can generate offspring solutions that inherit
the qualities of parent solutions.
3. Financial markets
In the financial market, using genetic optimization, we can solve a variety of issues
because genetic optimization helps in finding an optimal set or combination of
parameters that can affect the market rules and trades.
4. Manufacturing system
One of the major applications of genetic optimization is to minimize a cost
function using the optimized set of parameters.
5. Parametric Design of Aircraft - GAs have been used to design aircraft by
varying the parameters and evolving better solutions.
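The optimization use case in the list above can be sketched with a tiny genetic algorithm. Everything specific here is an assumption made for illustration: the toy objective (count of 1-bits in a bit string), tournament selection, single-point crossover, elitism, and all parameter values. The notes do not prescribe any of them.

```python
import random

def fitness(bits):
    """Toy objective to maximize: number of 1-bits in the chromosome."""
    return sum(bits)

def crossover(a, b, rng):
    """Single-point crossover: the child inherits a prefix of a and a suffix of b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def tournament(pop, rng, k=3):
    """Pick the fittest of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=fitness)

def run_ga(n_bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        children = [best]                     # elitism: keep the best unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, mutation_rate, rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = run_ga()
print("best fitness per generation:", history)
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases, while crossover and mutation supply new candidate solutions. For a constrained problem, one simple option is to have the fitness function penalize infeasible chromosomes.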