Sridevi Women's Engineering College
V.N. Pally, Hyderabad
Department of Computer Science & Engineering (AI & ML)
R22 Machine Learning Lecture Notes
UNIT-I: Machine Learning
Introduction: Learning, Types of Machine Learning, Supervised Learning, The Brain and the
Neuron, Design a Learning System, Perspectives and Issues in Machine Learning, Concept
Learning Task, Concept Learning as Search, Finding a Maximally Specific Hypothesis,
Version Spaces and the Candidate Elimination Algorithm, Linear Discriminants, Perceptron,
Linear Separability, Linear Regression
Learning:
Getting better at some task through practice is called learning.
Machine Learning:
Machine Learning is the study of computer algorithms that allow computer programs
to automatically improve through experience.
Machine Learning is about making computers modify or adapt their actions so that
these actions get more accurate.
Machine learning is a subset of AI, which enables the machine to automatically learn
from data, improve performance from past experiences, and make predictions.
Types of Machine Learning:
Supervised Learning:
In supervised learning, a training set of examples with correct responses is provided;
based on this, the algorithm generalises to respond correctly to all possible inputs.
Works on labelled data.
Unsupervised Learning:
Correct answers are not provided, but instead the algorithm tries to identify
similarities between the inputs so that inputs that have something in common are
categorised together.
Works on unlabelled data.
Reinforcement Learning:
Reinforcement learning works on a feedback-based process, in which an AI agent
automatically explores its surroundings by trial and error, taking actions, learning from
experience, and improving its performance.
The agent is rewarded for each good action and punished for each bad action; hence
the goal of a reinforcement learning agent is to maximize the rewards.
In reinforcement learning, there is no labelled data like supervised learning, and
agents learn from their experiences only.
Evolutionary Learning:
Biological evolution can be seen as a learning process.
Biological organisms adapt to improve their survival rates and chance of having
offspring in their environment.
Uses a fitness score, which measures how good the current solution is.
Supervised Learning:
The ability of an algorithm to produce sensible outputs for inputs that were not
encountered during learning is called generalisation.
Supervised machine learning can be classified into two types of problems
o Classification
o Regression
Classification algorithms are used to solve classification problems in which the
output variable is categorical, such as "Yes" or "No".
The classification algorithms predict the categories present in the dataset.
Example: Spam Detection, Email Filtering
Regression algorithms are used to solve regression problems, in which the output
variable is a continuous value that depends on the input variables.
These are used to predict continuous output variables, such as market trends, weather
prediction, etc.
The Machine Learning Process:
1. Data Collection and Preparation
Machine learning algorithms need significant amounts of data, preferably
without too much noise.
Data can be hard to collect because it exists in a variety of places and formats,
and merging it appropriately is difficult.
We should ensure the data is clean, that is, that it does not contain significant
errors or missing values.
2. Feature Selection
It consists of identifying the features that are most useful for the problem
3. Choose Appropriate Algorithm
The knowledge of the underlying principles of each algorithm and examples
of their use is required.
4. Parameter and Model Selection
For many of the algorithms there are parameters that have to be set manually,
or that require experimentation to identify appropriate values.
5. Training
Given the dataset, algorithm, and parameters, training should simply be the
use of computational resources in order to build a model.
6. Evaluation
Before a system can be deployed it needs to be tested and evaluated for
accuracy on data that it was not trained on.
The Brain and the Neuron:
In animals, learning occurs within the brain.
If we can understand how the brain works, then there might be things in there for us to
copy and use for our machine learning systems.
The brain is an impressively powerful and complicated system.
It deals with noisy and even inconsistent data, and produces answers that are usually
correct from very high dimensional data (such as images) very quickly.
It weighs about 1.5 kg and is losing parts of itself all the time (neurons die as you age
at impressive/depressing rates), but its performance does not degrade appreciably.
The processing unit of the brain is called the neuron (nerve cell).
Signals can be received from dendrites, and sent down the axon once enough signals
were received.
We can mimic most of this process by coming up with a function that receives a list
of weighted input signals and outputs some kind of signal if the sum of these
weighted inputs reaches a certain threshold.
Each neuron can be viewed as a separate processor, performing a very simple
computation: deciding whether or not to fire.
This makes the brain a massively parallel computer made up of about 10¹¹ processing
elements.
Fig.1 Biological Neuron Vs Artificial Neuron
Hebb’s Rule:
Hebb’s rule says that the changes in the strength of synaptic connections are
proportional to the correlation in the firing of the two connecting neurons.
If two neurons consistently fire simultaneously, then any connection between them
will change in strength, becoming stronger.
If the two neurons never fire simultaneously, the connection between them will die
away.
The idea is that if two neurons both respond to something, then they should be
connected.
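As a minimal sketch, Hebb's rule can be written as a weight update proportional to the product of the two neurons' activations (the function name and learning rate below are illustrative, not from the notes):

```python
def hebbian_update(weights, x, y, eta=0.1):
    """Hebb's rule: strengthen a connection in proportion to the
    correlation between pre-synaptic activity x and post-synaptic
    activity y (delta_w = eta * x_i * y)."""
    return [w + eta * xi * y for w, xi in zip(weights, x)]

w = [0.0, 0.0]
w = hebbian_update(w, x=[1, 1], y=1)  # both neurons fire together: weights grow
print(w)  # [0.1, 0.1]
```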
McCulloch and Pitts Neurons:
Fig.2 McCulloch and Pitts mathematical model of neuron
The inputs xi are multiplied by the weights wi, and the neurons sum their values.
If this sum is greater than the threshold then the neuron fires; otherwise it does not.
The model consists of:
a set of weighted inputs wi that correspond to the synapses,
an adder that sums the input signals (equivalent to the membrane of the cell
that collects electrical charge), and
an activation function (initially a threshold function) that decides whether the neuron
fires for the current inputs.
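A minimal Python sketch of such a neuron (the weights and threshold below are illustrative):

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: the adder sums the weighted inputs,
    and the threshold activation decides whether the neuron fires."""
    h = sum(w * x for w, x in zip(weights, inputs))
    return 1 if h > threshold else 0

# Example: with these settings the neuron behaves like a logical AND
print(mcp_neuron([1, 1], weights=[1, 1], threshold=1.5))  # 1 (fires)
print(mcp_neuron([1, 0], weights=[1, 1], threshold=1.5))  # 0 (does not fire)
```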
Limitations of the McCulloch and Pitts model:
Inability to handle non-boolean inputs.
The requirement to manually set thresholds.
All inputs are treated equally; no weighting mechanism.
Cannot handle functions that are not linearly separable
Design a Learning System:
1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
a. Estimating Training Values
b. Adjusting The Weights
5. The Final Design
Designing a Learning System:
Designing a learning system, as outlined in Tom Mitchell's work, involves several key
steps. These include defining the task, selecting training data, choosing a target function,
selecting a representation for that function, selecting a learning algorithm, and finally,
finalizing the design.
Here's a breakdown of these steps:
1. Defining the Task:
This involves specifying what the learning system is expected to do. What task will it
perform, and what will be the performance measure to evaluate its success?
For example, in a spam filter, the task is to classify emails as spam or not spam, and the
performance measure could be accuracy (the percentage of correctly classified emails).
2. Choosing the Training Experience:
This step involves selecting the data the system will learn from. The training experience
should be relevant to the task and provide enough information for the system to learn
effectively.
For the spam filter, the training experience would be a set of labeled emails (emails with
known spam/not spam classifications).
3. Choosing the Target Function:
This refers to the function that the learning system will approximate. It's the "ideal" function
that maps inputs to desired outputs.
In the spam filter example, the target function would be the mapping between email content
and its spam/not spam classification.
4. Choosing a Representation for the Target Function:
This step involves selecting a way to represent the target function that the learning system can
use. This could be a decision tree, a neural network, or another model.
For example, a decision tree could be used to represent the target function for a spam filter,
where each node represents a feature of the email (e.g., presence of certain words) and
branches represent decisions based on those features.
5. Choosing a Learning Algorithm:
This step involves selecting the algorithm that will be used to learn the target function from
the training data.
Various algorithms exist, such as decision tree learning, neural network training, or support
vector machines, and the choice depends on the task and representation.
6. Finalizing the Design:
This step involves evaluating the performance of the learned system and making adjustments
to the design as needed. This may involve tuning parameters of the learning algorithm or
gathering more training data.
Key Perspectives:
Definition of Learning:
Mitchell defines machine learning as a program learning from experience E, with respect to a
task T and performance measure P, if its performance on T improves with experience E.
Components of a Learning Problem:
A well-defined machine learning problem involves identifying the task (T), the experience
(E), and the performance measure (P).
Types of Learning:
Machine learning encompasses various approaches, including supervised learning (learning
from labeled data), unsupervised learning (discovering patterns in unlabeled data), and
reinforcement learning (learning through trial and error).
Importance of Data:
Machine learning heavily relies on data for training and generalization. The quality and
quantity of data significantly impact the performance of learning models.
Generalization:
A crucial aspect of machine learning is the ability of a model to generalize from training data
to unseen data. This involves avoiding overfitting, where the model learns the training data
too well and performs poorly on new data.
Common Issues in Machine Learning:
Data Acquisition and Preparation:
Obtaining sufficient, relevant, and clean data is often a major challenge.
Feature Engineering:
Selecting and creating appropriate features from raw data is crucial for effective learning.
Model Selection:
Choosing the right algorithm and architecture for a specific task can be complex.
Computational Resources:
Training complex models can require significant computational power and resources.
Interpretability and Explainability:
Understanding how a machine learning model arrives at its decisions is becoming
increasingly important, especially in sensitive applications.
Bias and Fairness:
Machine learning models can perpetuate and amplify existing biases in the data, leading to
unfair or discriminatory outcomes.
Ethical Considerations:
The deployment of machine learning systems raises ethical concerns about privacy, security,
and accountability.
Scalability:
Ensuring that machine learning models can handle large datasets and complex tasks
efficiently is an ongoing challenge.
These perspectives and issues highlight the multifaceted nature of machine learning,
emphasizing the need for careful consideration of data, algorithms, and ethical implications in
the design and deployment of these systems.
Perspectives and Issues in Machine Learning
Perspectives:
It involves searching a very large space of possible hypotheses to determine the one
that best fits the observed data and any prior knowledge held by the learner.
Search a hypothesis space defined by some underlying representations (linear
function, decision tree, ANN).
These different hypothesis representations are appropriate for learning different kinds
of target functions.
Viewing learning as a search problem, we analyse the relationship between the size of
the hypothesis space to be searched, the number of training examples available, and
the confidence we can have that a hypothesis consistent with the training data will
correctly generalize to unseen examples.
Generic Issues:
The generic issues in machine learning are:
What algorithms exist for learning general target functions from specific training
examples?
In what settings will particular algorithms converge to the desired function, given
sufficient training data?
Which algorithms perform best for which types of problems and representations?
How much training data is sufficient?
What general bounds can be found to relate the confidence in learned hypotheses to
the amount of training experience and the character of the learner's hypothesis space?
When and how can prior knowledge held by the learner guide the process of
generalizing from examples?
Can prior knowledge be helpful even when it is only approximately correct?
What is the best strategy for choosing a useful next training experience, and how does
the choice of this strategy alter the complexity of the learning problem?
What is the best way to reduce the learning task to one or more function
approximation problems?
What specific functions should the system attempt to learn? Can this process itself be
automated?
How can the learner automatically alter its representation to improve its ability to
represent and learn the target function?
The answers to the above questions will solve most of the issues in machine learning.
Specific issues in Machine Learning:
1. Inadequate Training Data
2. Poor quality of data
3. Non-representative training data
4. Overfitting and Underfitting
5. Monitoring and Maintenance
6. Getting Bad Recommendations
7. Lack of Skilled Resources
8. Customer Segmentation
9. Process Complexity of Machine Learning
10. Data Bias
Concept Learning Task:
Inferring a Boolean-valued function from training examples of its input and output is
called concept learning.
Example Problem: Days on which my friend enjoys his favourite water sport.
Table: Positive and Negative Training Examples for the target concept EnjoySport

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
The most general hypothesis, that every day is a positive example, is represented by
<?, ?, ?, ?, ?, ?>
The most specific possible hypothesis, that no day is a positive example, is represented by
<∅, ∅, ∅, ∅, ∅, ∅>
Table: The EnjoySport Concept Learning Task
Given:
Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity,
Wind, Water, and Forecast
Hypotheses H: each hypothesis is a conjunction of constraints on the attributes, where
each constraint may be ? (any value is acceptable), ∅ (no value is acceptable), or a
specific value
Target concept c: EnjoySport : X → {0, 1}
Training examples D: positive and negative examples of the target function
Determine:
A hypothesis h in H such that h(x) = c(x) for all x in X
Concept Learning as Search:
Concept learning can be viewed as the task of searching through a large space of
hypotheses implicitly defined by the hypothesis representation.
The goal of this search is to find the hypothesis that best fits the training examples
If we view learning as a search problem, then it is natural that our study of learning
algorithms will examine different strategies for searching the hypothesis space.
We will be particularly interested in algorithms capable of efficiently searching very
large or infinite hypothesis spaces, to find the hypotheses that best fit the training
data.
General-to-specific ordering of hypotheses:
Many algorithms for concept learning organize the search through the hypothesis
space by relying on a very useful structure that exists for any concept learning
problem: a general-to-specific ordering of hypotheses.
We can design learning algorithms that exhaustively search even infinite hypothesis
spaces without explicitly enumerating every hypothesis.
To illustrate the general-to-specific ordering, consider the two hypotheses:
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
Now consider the sets of instances that are classified positive by h1 and by h2.
Because h2 imposes fewer constraints on the instance, it classifies more
instances as positive.
In fact, any instance classified positive by h1 will also be classified positive by h2.
Therefore, we say that h2 is more general than h1.
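This ordering is easy to check mechanically. The sketch below (the helper name is our own) tests whether one conjunctive hypothesis is more general than or equal to another:

```python
def more_general_or_equal(h2, h1):
    """h2 is more general than or equal to h1 if, for every attribute,
    h2's constraint is '?' or matches h1's constraint."""
    return all(a == '?' or a == b for a, b in zip(h2, h1))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True: h2 imposes fewer constraints
print(more_general_or_equal(h1, h2))  # False
```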
FIND-S: Finding a Maximally Specific Hypothesis:
Example: EnjoySport
The first step of the FIND-S algorithm is to initialize h to the most specific hypothesis
in H:
h ← <∅, ∅, ∅, ∅, ∅, ∅>
Considering the first training example:
h ← <Sunny, Warm, Normal, Strong, Warm, Same>
Here h is still very specific.
Next, the second training example forces the algorithm to further generalize h:
h ← <Sunny, Warm, ?, Strong, Warm, Same>
Upon encountering the third training example, in this case a negative example, the
algorithm makes no change to h.
In fact, the FIND-S algorithm simply ignores every negative example.
Considering the fourth training example leads to a further generalization of h:
h ← <Sunny, Warm, ?, Strong, ?, ?>
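The trace above can be reproduced with a short Python sketch of FIND-S, using the training data from the EnjoySport table ('?' stands for any value and None stands for the all-∅ hypothesis):

```python
def find_s(examples):
    """FIND-S: start from the most specific hypothesis and minimally
    generalize it on each positive example; negatives are ignored."""
    h = None  # the most specific hypothesis <∅, ∅, ∅, ∅, ∅, ∅>
    for instance, label in examples:
        if label != 'Yes':
            continue                      # FIND-S ignores negative examples
        if h is None:
            h = list(instance)            # first positive example: copy it
        else:                             # generalize mismatching attributes
            h = [a if a == b else '?' for a, b in zip(h, instance)]
    return h

training = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 'Yes'),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 'Yes'),
]
print(find_s(training))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```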
Version Spaces:
The version space represents the set of all hypotheses that are consistent with the
training examples.
It includes all hypotheses that correctly classify all the training examples as belonging
to their respective categories.
It is possible to represent the version space by two sets of hypotheses:
(1) the most specific consistent hypotheses
(2) the most general consistent hypotheses
The Candidate Elimination Algorithm:
The key idea in the Candidate Elimination Algorithm is to output a description of the
set of all hypotheses consistent with the training examples.
Surprisingly, this algorithm computes the description of the set without explicitly
enumerating all of its members
Example: EnjoySport
Initially : G = [[?,?,?,?,?,?],[?,?,?,?,?,?],[?,?,?,?,?,?],[?,?,?,?,?,?],[?,?,?,?,?,?],[?,?,?,?,?,?]]
S = [Null, Null, Null, Null, Null, Null]
For instance 1: <'sunny','warm','normal','strong','warm ','same'> and positive output.
G1 = G
S1 = ['sunny','warm','normal','strong','warm ','same']
For instance 2 : <'sunny','warm','high','strong','warm ','same'> and positive output.
G2 = G
S2 = ['sunny','warm',?,'strong','warm ','same']
For instance 3 : <'rainy','cold','high','strong','warm ','change'> and negative output.
G3 = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?], [?, ?, ?, ?, ?, ?],
[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, 'same']]
S3 = S2
For instance 4 : <'sunny','warm','high','strong','cool','change'> and positive output.
G4 = G3
S4 = ['sunny','warm',?,'strong', ?, ?]
Finally, by combining G4 and S4, the algorithm produces the output:
G = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?]]
S = ['sunny','warm',?,'strong', ?, ?]
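A simplified Python sketch of the algorithm for conjunctive hypotheses (it omits some of the full algorithm's checks, such as pruning boundary members that become redundant, but it reproduces the trace above):

```python
def matches(h, x):
    return all(a == '?' or a == b for a, b in zip(h, x))

def candidate_elimination(examples):
    """Maintain the specific boundary S and the general boundary G."""
    S, G = None, [['?'] * 6]                  # S starts all-∅, G all-general
    for x, label in examples:
        if label == 'Yes':
            G = [g for g in G if matches(g, x)]          # drop inconsistent g
            if S is None:
                S = list(x)
            else:                                        # minimally generalize S
                S = [a if a == b else '?' for a, b in zip(S, x)]
        else:
            new_G = []
            for g in G:
                if not matches(g, x):                    # g already excludes x
                    new_G.append(g)
                    continue
                for i in range(len(g)):                  # minimally specialize g
                    if g[i] == '?' and S[i] != '?' and S[i] != x[i]:
                        spec = list(g)
                        spec[i] = S[i]
                        new_G.append(spec)
            G = new_G
    return S, G

training = [
    (('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), 'Yes'),
    (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), 'Yes'),
    (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), 'No'),
    (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), 'Yes'),
]
S, G = candidate_elimination(training)
print(S)  # ['sunny', 'warm', '?', 'strong', '?', '?']
print(G)  # [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]
```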
Linear Discriminants:
Linear discriminant analysis is a statistical method of dimensionality reduction that
provides the highest possible discrimination among various classes.
It is used in machine learning to find the linear combination of features, which can
separate two or more classes of objects with best performance.
It has been widely used in many applications, such as pattern recognition, image
retrieval, speech recognition, among others.
The method is based on discriminant functions that are estimated from a set of
data called the training set.
These discriminant functions are linear with respect to the feature vector, and
usually have the form
f(x) = wᵀx + b₀
where w represents the weight vector, x the input vector, and b₀ a threshold.
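For example, a two-class decision rule based on the sign of f(x) might look like the following sketch (the weight values are made up for illustration):

```python
import numpy as np

w = np.array([0.8, -0.4])   # weight vector (illustrative values)
b0 = 0.1                    # threshold

def f(x):
    """Linear discriminant f(x) = w^T x + b0."""
    return w @ x + b0

x = np.array([1.0, 2.0])
print('class 1' if f(x) >= 0 else 'class 2')  # the sign of f(x) decides the class
```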
Perceptron:
The Perceptron is nothing more than a collection of McCulloch and Pitts neurons
together with a set of inputs and some weights to fasten the inputs to the neurons
Notice that the neurons in the Perceptron are completely independent of each other:
It doesn’t matter to any neuron what the others are doing
It works out whether or not to fire by multiplying together its own weights and the
input, adding them together, and comparing the result to its own threshold.
The result is a pattern of firing and non-firing neurons, which looks like a vector of 0s
and 1s
Example: (0,1,0,0,1)
The Learning Rate:
The learning rate parameter η determines how fast or slow we will move towards the
optimal weights.
If the learning rate is very large we will skip the optimal solution.
If it is too small we will need too many iterations to converge to the best values. So
using a good learning rate is crucial.
We therefore use a moderate learning rate, typically 0.1 < η < 0.4, depending upon
how much error we expect in the inputs.
The Bias Input:
A bias term is added to the input layer to provide the perceptron with additional
flexibility in modeling complex patterns in the input data.
∑wi*xi = x1*w1 + x2*w2 + x3*w3 + x4*w4
Add a term called bias ‘b’ to this weighted sum to improve the model’s performance.
Y=f(∑wi*xi + b)
Where b is bias
There are three main reasons why we need to add bias:
1. It assists in achieving a better data fit and learning complex patterns
2. Handling zero inputs and mitigating the problem of vanishing gradient
3. Prevents underfitting and overfitting. Improves generalization
The Perceptron Learning Algorithm:
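A minimal Python sketch of the algorithm, assuming a threshold activation, a fixed extra bias input of -1, and the standard update rule w ← w + η(t − y)x applied to each misclassified example:

```python
import numpy as np

def train_perceptron(X, targets, eta=0.25, n_iterations=10):
    """Perceptron training: present each input, compute the firing
    decision, and nudge the weights whenever the output y differs
    from the target t."""
    X = np.concatenate([X, -np.ones((X.shape[0], 1))], axis=1)  # bias input
    weights = np.random.uniform(-0.05, 0.05, X.shape[1])
    for _ in range(n_iterations):
        for x, t in zip(X, targets):
            y = 1 if x @ weights > 0 else 0   # threshold activation
            weights += eta * (t - y) * x      # no change when y == t
    return weights

# Example: learning the logical OR function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([0, 1, 1, 1])
weights = train_perceptron(X, targets)
```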
Linear Separability:
Linear separability implies that if there are two classes then there will be a point, line,
plane, or hyperplane that splits the input features in such a way that all points of one
class are in one half-space and all points of the second class are in the other half-space.
For example, here is a case of selling a house based on area and price. We have got a
number of data points for that along with the class, which is house Sold/Not Sold:
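The data points themselves are not reproduced here, but as a hypothetical illustration, a separating line in the (area, price) plane gives a decision rule like the following (all numbers below are invented for the sketch):

```python
def predict_sold(area, price, w1=0.002, w2=-0.00001, b=-1.0):
    """Hypothetical separating line w1*area + w2*price + b = 0:
    points on one side are classified Sold, the other side Not Sold."""
    return 'Sold' if w1 * area + w2 * price + b > 0 else 'Not Sold'

print(predict_sold(area=1200, price=90000))  # 'Sold' with these toy weights
```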
Linear Regression:
Linear regression is a type of supervised machine learning algorithm that computes
the linear relationship between the dependent variable and one or more independent
features by fitting a linear equation to observed data.
When there is only one independent feature, it is known as Simple Linear Regression,
and when there are more than one feature, it is known as Multiple Linear Regression.
Similarly, when there is only one dependent variable, it is considered Univariate
Linear Regression, while when there is more than one dependent variable, it is
known as Multivariate Regression.
Types of Linear Regression:
There are two main types of linear regression
1. Simple Linear Regression
2. Multiple Linear Regression
Simple Linear Regression:
This is the simplest form of linear regression.
It involves only one independent variable and one dependent variable.
The equation for simple linear regression is:
y=β0+β1X
where:
Y is the dependent variable
X is the independent variable
β0 is the intercept
β1 is the slope
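A short sketch of fitting these two parameters by least squares (the data values are illustrative):

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Least-squares estimates:
    beta1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    beta0 = y_mean - beta1 * x_mean"""
    beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.1, 6.2, 8.1])           # roughly y = 2x
beta0, beta1 = fit_simple_linear_regression(x, y)
print(beta0, beta1)                           # intercept near 0, slope near 2
```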
Multiple Linear Regression:
This involves more than one independent variable and one dependent variable.
The equation for multiple linear regression is:
y = β0 + β1X1 + β2X2 + … + βnXn
where:
Y is the dependent variable
X1, X2, …, Xn are the independent variables
β0 is the intercept
β1, β2, …, βn are the slopes
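The same idea extends to several features; a sketch using ordinary least squares via NumPy (again with invented data):

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([5.1, 4.0, 11.1, 9.9])          # roughly y = X1 + 2*X2
A = np.column_stack([np.ones(len(X)), X])    # prepend a column for β0
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
beta0, beta1, beta2 = coeffs                 # intercept and the two slopes
print(beta0, beta1, beta2)
```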
The goal of the algorithm is to find the best Fit Line equation that can predict the values
based on the independent variables.
Best Fit Line:
The best Fit Line equation provides a straight line that represents the relationship between
the dependent and independent variables.
The slope of the line indicates how much the dependent variable changes for a unit
change in the independent variable(s).
Our primary objective while using linear regression is to locate the best-fit line, which
implies that the error between the predicted and actual values should be kept to a
minimum.
There will be the least error in the best-fit line.
Fig: Linear Regression
Cost function for Linear Regression:
The cost function or the loss function is nothing but the error or difference between
the predicted value Ŷ and the true value Y.
In Linear Regression, the Mean Squared Error (MSE) cost function is employed,
which calculates the average of the squared errors between the predicted values ŷi and
the actual values yi.
The purpose is to determine the optimal values for the intercept θ1 and the coefficient
of the input feature θ2, providing the best-fit line for the given data points.
The linear equation expressing this relationship is
ŷi = θ1+θ2xi
The MSE function can be calculated as:
MSE = (1/n) ∑ (yi − ŷi)²
where the sum runs over all n data points.
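Computed directly in Python (with illustrative values):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared differences
    between actual and predicted values."""
    return np.mean((y_true - y_pred) ** 2)

y = np.array([2.1, 4.1, 6.2, 8.1])
y_hat = np.array([2.0, 4.0, 6.0, 8.0])
print(mse(y, y_hat))  # 0.0175
```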