AS PER NEW SYLLABUS - GTU - SEM - VII (E&C/ELEX.) Professional Elective - IV
Introduction to Machine Learning
Subject Code : 3171114
* Simplified & Conceptual Approach
* Multiple Choice Questions with Answers
First Edition
I. A. Dhotre
TECHNICAL PUBLICATIONS - An Up-Thrust for Knowledge
SYLLABUS
Introduction to Machine Learning - (3171114)

Examination Marks :
Theory Marks : ESE (E) - 70, PA (M) - 30
Practical Marks : ESE (V) - 30, PA (I) - 20
1. Introduction to Machine Learning
Introduction, Different Types of Learning, Hypothesis Space, Inductive Bias, Evaluation and Cross Validation (Chapter - 1)
2. Basic Machine Learning Algorithms
Linear Regression, Decision Trees, Learning Decision Trees, K-nearest Neighbour, Collaborative
Filtering, Overfitting (Chapter - 2)
3. Dimensionality Reduction
Feature Selection, Feature Extraction (Chapter - 3)
4. Bayesian Concept of Learning
Bayesian Learning, Naive Bayes, Bayesian Network, Exercise on Naive Bayes (Chapter - 4)
5. Logistic Regression and Support Vector Machine
Logistic Regression, Introduction to Support Vector Machine, The Dual Formulation, Maximum Margin with Noise, Nonlinear SVM and Kernel Function, SVM : Solution to the Dual Problem (Chapter - 5)
6. Basics of Neural Network
Introduction to neural network, Multilayer Neural Network, Neural Network and Backpropagation
Algorithm, Deep Neural Network (Chapter - 6)
7. Computation and Ensemble Learning
Introduction to Computation Learning, Sample Complexity : Finite Hypothesis Space, VC
Dimension, Introduction to Ensembles, Bagging and Boosting (Chapter - 7)
8. Basic Concepts of Clustering
Introduction to Clustering, K-means Clustering, Agglomerative Hierarchical Clustering
(Chapter - 8)
TABLE OF CONTENTS
Chapter - 1 Introduction to Machine Learning
1.1 Introduction
1.1.1 How do Machines Learn?
1.1.2 Well Posed Learning Problem
1.2 Types of Machine Learning
1.2.1 Supervised Learning
1.2.1.1 Classification
1.2.1.2 Regression
1.2.2 Unsupervised Learning
1.2.2.1 Clustering
1.2.3 Reinforcement Learning
1.2.3.1 Elements of Reinforcement Learning
1.3 Application of Machine Learning
1.4 Hypothesis Space
1.5 Inductive Bias
1.6 Evaluation and Cross Validation
1.6.1 Evaluating Performance of Model
1.6.2 Concept Learning
1.6.3 Concept Learning as Search
1.6.4 Find-S Algorithm
1.7 Multiple Choice Questions with Answers
Chapter - 2 Basic Machine Learning Algorithms
2.1 Linear Regression
2.1.1 Simple Linear Regression
2.1.2 Multiple Linear Regression
2.1.3 Lasso and Ridge Regression
2.2 Decision Tree
2.2.1 Decision Tree Representation
2.2.2 Appropriate Problems for Decision Tree Learning
2.2.3 Advantages and Disadvantages of Decision Tree
2.3 Basic Decision Tree Learning Algorithm
2.3.1 Which Attribute is "Best"?
2.3.2 Information Gain
2.3.3 The ID3 Algorithm
2.4 K-nearest Neighbour
2.5 Collaborative Filtering
2.5.1 Types of Collaborative Filtering
2.5.2 Collaborative Filtering Algorithms
2.5.3 Advantages and Disadvantages of Collaborative Filtering
2.6 Overfitting
2.7 Multiple Choice Questions with Answers
Chapter - 3 Dimensionality Reduction
3.1 Introduction of Dimensionality Reduction
3.1.1 Advantages and Disadvantages
3.2 Feature and Feature Engineering
3.3 Feature Transformation
3.3.1 Feature Construction
3.3.2 Feature Extraction
3.4 Feature Subset Selection
3.4.1 Issues in High-Dimensional Data
3.4.2 Key Drivers
3.4.3 Measures of Feature Relevance and Redundancy
3.4.4 Overall Feature Selection Process
3.4.5 Feature Selection Approaches
3.4.6 Difference between Filter, Wrapper and Embedded Method
3.5 Multiple Choice Questions with Answers
Chapter - 4 Bayesian Concept of Learning
4.1 Importance of Bayesian Methods
4.2 Bayes Theorem
4.2.1 Prior and Posterior Probability
4.2.2 Maximum-Likelihood Estimation
4.3 Bayes' Theorem and Concept Learning
4.3.1 Consistent Learners
4.3.2 Bayes Optimal Classifier
4.3.3 Naive Bayes Classifier
4.4 Bayesian Belief Network
4.5 Fill in the Blanks with Answers
Chapter - 5 Logistic Regression and Support Vector Machine
5.1 Logistic Regression
5.2 Introduction to Support Vector Machine
5.2.1 Key Properties of Support Vector Machines
5.2.5 Comparison of SVM and Neural Networks
5.3 Kernel Methods for Non-linearity
Chapter - 6 Basics of Neural Network
6.1 Introduction to Neural Network
6.1.1 Advantages of Neural Network
6.1.2 Application of Neural Network
6.1.3 Difference between Digital Computer and Neural Networks
6.2 Biological Neurons
6.3 Architecture of Neural Network
6.3.1 Single Layer Feed Forward Network
6.3.2 Multi-Layer Feed Forward Network
6.3.3 Recurrent Neural Network
6.4 Implementation of ANN
6.4.1 McCulloch Pitts Neuron
6.4.2 Rosenblatt's Perceptron
6.4.3 ADALINE Network Model
6.5 Backpropagation Algorithm
6.5.1 Advantages and Disadvantages
6.6 Deep Learning
6.7 Multiple Choice Questions with Answers
Chapter - 7 Computation and Ensemble Learning
7.1 Introduction to Computation Learning
7.1.1 Probably Approximately Correct (PAC) Framework
7.1.2 Mistake Bound Framework
7.2 Sample Complexity : Finite Hypothesis Space
7.2.1 VC Dimension
7.2.2 VC for Neural Networks
7.3 Introduction to Ensembles
7.3.1 Bagging
7.3.2 Boosting
7.3.3 Randomization
Chapter - 8 Basic Concepts of Clustering
8.1 Introduction to Clustering
8.1.1 Partitioning Methods
8.1.1.1 K-means Clustering
8.1.1.2 k-Medoids
8.1.2 Hierarchical Methods
8.1.2.1 Difference between Clustering Vs Classification
8.2 Hierarchical Clustering
8.2.1 Agglomerative Hierarchical Clustering
8.2.2 Divisive Hierarchical Clustering
8.2.3 Dendrogram
8.2.4 Agglomerative Clustering in Scikit-learn
8.2.5 Connectivity Constraints
8.3 Multiple Choice Questions with Answers
Introduction to Machine Learning
Syllabus
Introduction, Different Types of Learning, Hypothesis Space, Inductive Bias, Evaluation and Cross Validation.

Contents
1.1 Introduction
1.2 Types of Machine Learning
1.3 Application of Machine Learning
1.4 Hypothesis Space
1.5 Inductive Bias
1.6 Evaluation and Cross Validation
1.7 Multiple Choice Questions
1.1 Introduction
* Machine Learning (ML) is a sub-field of Artificial Intelligence (AI) which is concerned with developing computational theories of learning and building learning machines.
* Learning is a phenomenon and process which has manifestations of various aspects. The learning process includes gaining new symbolic knowledge and developing cognitive skills through instruction and practice. It is also the discovery of new facts and theories through observation and experiment.
Machine Learning Definition : A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P, if
its performance at tasks in T, as measured by P, improves with experience E.
Machine learning is programming computers to optimize a performance criterion using example data or past experience. The application of machine learning methods to large databases is called data mining.
It is very hard to write programs that solve problems like recognizing a human face. We do not know what program to write because we don't know how our brain does it. Instead of writing a program by hand, it is possible to collect lots of examples that specify the correct output for a given input.
* A machine learning algorithm then takes these examples and produces a program that does the job. The program produced by the learning algorithm may look very different from a typical hand-written program. It may contain millions of numbers. If we do it right, the program works for new cases as well as the ones we trained it on.
* The main goal of machine learning is to devise learning algorithms that do the learning automatically without human intervention or assistance. The machine learning paradigm can be viewed as "programming by example." Another goal is to develop computational models of the human learning process and perform computer simulations.
* The goal of machine learning is to build computer systems that can adapt and
learn from their experience.
* An algorithm is used to solve a problem on a computer. An algorithm is a sequence of instructions carried out to transform the input to output. For example, the addition of four numbers is carried out by giving the four numbers as input to the algorithm; the output is their sum. For the same task, there may be various algorithms. We are interested in finding the most efficient one, requiring the least number of instructions or memory or both.
* For some tasks, however, we do not have an algorithm.
Why Is Machine Learning Important ?
Machine learning algorithms can figure out how to perform important tasks by
generalizing from examples.
Machine learning provides business insight and intelligence. Decision makers are provided with greater insights into their organizations. This adaptive technology is being used by global enterprises to gain a competitive edge.
Machine learning algorithms discover the relationships between the variables of a
system (input, output and hidden) from direct samples of the system.
Following are some of the reasons :
1. Some tasks cannot be defined well, except by examples. For example : recognizing people.
2. Relationships and correlations can be hidden within large amounts of data. To
solve these problems, machine learning and data mining may be able to find
these relationships.
3. Human designers often produce machines that do not work as well as desired in the environments in which they are used.
4. The amount of knowledge available about certain tasks might be too large for explicit encoding by humans.
5. Environments change from time to time.
6. New knowledge about tasks is constantly being discovered by humans.
Machine learning also helps us find solutions to many problems in computer vision, speech recognition and robotics. Machine learning uses the theory of statistics in building mathematical models, because the core task is making inference from a sample.
How Machines Learn ?
Machine learning typically follows three phases :
1. Training : A training set of examples of correct behavior is analyzed and some representation of the newly learned knowledge is stored. This is some form of rules.
2. Validation : The rules are checked and, if necessary, additional training is given. Sometimes additional test data are used, but instead, a human expert may validate the rules, or some other automatic knowledge-based component may be used. The role of the tester is often called the opponent.
3. Application : The rules are used in responding to some new situation.
1.1.1 How do Machines Learn ?
* The machine learning process is divided into three parts : data input, abstraction and generalization. Fig. 1.1.2 shows the machine learning process.
[Fig. 1.1.2 : Machine learning process]
* Data input : Information is used for future decision making.
* Abstraction : Input data is represented in a broader way through the underlying algorithm.
* An algorithm is used to solve a problem on a computer. An algorithm is a sequence of instructions carried out to transform the input to output. For example, the addition of four numbers is carried out by giving the four numbers as input to the algorithm; the output is their sum.
* For the same task, there may be various algorithms. We are interested in finding the most efficient one, requiring the least number of instructions or memory or both.
Abstraction
* During the machine learning process, knowledge is fed in the form of input data. Collected data is raw data. It cannot be used directly for processing.
* A model, in the machine learning paradigm, is a summarized knowledge representation of raw data. The model may be in any one of the following forms :
1. Mathematical equations.
2. Specific data structures like trees.
3. Logical groupings of similar observations.
4. Computational blocks.
* The choice of the model used to solve a specific learning problem is a human task. Some of the parameters considered are as follows :
a) Type of problem to be solved.
b) Nature of the input data.
c) Problem domain.
1.1.2 Well Posed Learning Problem
* Definition : A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its performance at tasks
in T, as measured by P, improves with experience E.
* A (machine learning) problem is well-posed if a solution to it exists, if that
solution is unique, and if that solution depends on the data / experience but it is
not sensitive to (reasonably small) changes in the data / experience.
* A well-posed learning problem requires identifying three features :
1. Class of tasks
2. Measure of performance to be improved
3. Source of experience
* What are T, P, E ? How do we formulate a machine learning problem ?
* A Robot Driving Learning Problem :
1. Task T : Driving on public four-lane highways using vision sensors.
2. Performance measure P : Average distance traveled before an error (as judged by a human overseer).
3. Training experience E : A sequence of images and steering commands recorded while observing a human driver.
* A Handwriting Recognition Learning Problem :
1. Task T : Recognizing and classifying handwritten words within images.
2. Performance measure P : Percent of words correctly classified.
3. Training experience E : A database of handwritten words with given classifications.
* A Text Categorization Problem :
1. Task T : Assign a document to its content category.
2. Performance measure P : Precision and Recall.
3. Training experience E : Example pre-classified documents.
1.2 Types of Machine Learning
Learning is constructing or modifying a representation of what is being experienced. To learn means to get knowledge by study, experience or being taught.
Machine learning is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data, such as data from sensors or databases.
Machine learning is usually divided into three types : supervised, unsupervised and reinforcement learning.
Why do machine learning ?
1. To understand and improve the efficiency of human learning.
2. Discover new things or structure that is unknown to humans.
3. Fill in skeletal or incomplete specifications about a domain.
[Fig. 1.2.1 : Types of machine learning - supervised learning (classification, regression) and unsupervised learning (clustering, association analysis)]
1.2.1 Supervised Learning
Supervised learning is the machine learning task of inferring a function from supervised training data. The training data consist of a set of training examples.
The task of the supervised learner is to predict the output behavior of a system for any set of input values, after an initial training phase.
Supervised learning is one in which the network is trained by providing it with input and matching output patterns. These input-output pairs are usually provided by an external teacher.
Human learning is based on past experiences. A computer does not have experiences.
A computer system learns from data, which represent some "past experiences" of an application domain.
The aim is to learn a target function that can be used to predict the values of a discrete class attribute, e.g., approved or not-approved, and high-risk or low-risk. The task is commonly called : supervised learning, classification or inductive learning.
Training data includes both the input and the desired results. For some examples the correct results (targets) are known and are given as input to the model during the learning process. The construction of proper training, validation and test sets is crucial. These methods are usually fast and accurate.
The model has to be able to generalize : give the correct results when new data are given as input without knowing a priori the target.
Supervised learning is the machine learning task of inferring a function from
supervised training data. The training data consist of a set of training examples. In
supervised learning, each example is a pair consisting of an input object and a
desired output value.
A supervised learning algorithm analyzes the training data and produces an inferred function, which is called a classifier or a regression function. Fig. 1.2.2 shows the supervised learning process.
[Fig. 1.2.2 : Supervised learning process - training data feeds a learning algorithm, which produces a model used in testing]
The learned model helps the system to perform the task better as compared to no learning.
* Each input vector requires a corresponding target vector.
Training Pair = (Input Vector, Target Vector)
* Supervised learning denotes a method in which some input vectors are collected and presented to the network. The output computed by the network is observed and the deviation from the expected answer is measured. The weights are corrected according to the magnitude of the error in the way defined by the learning algorithm.
* Supervised learning is further divided into methods which use reinforcement or error correction. The perceptron learning algorithm is an example of supervised learning with reinforcement.
* In order to solve a given problem of supervised learning, the following steps are performed :
1. Find out the type of training examples.
2. Collect a training set.
3. Determine the input feature representation of the learned function.
4. Determine the structure of the learned function and the corresponding learning algorithm.
5. Complete the design and then run the learning algorithm on the collected training set.
6. Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set. A minimal sketch of these steps is shown below.
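These steps can be illustrated with a brief scikit-learn sketch (a hypothetical example; the iris dataset and a decision tree stand in for whatever training set and learned function a given problem calls for) :

```python
# A minimal sketch of the supervised learning steps above, using scikit-learn.
# The dataset and classifier are illustrative choices, not prescribed by the text.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                 # step 2 : collect a training set

# step 6 requires a test set separate from the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier()                  # step 4 : choose the learned function's structure
model.fit(X_train, y_train)                       # step 5 : run the learning algorithm

print(accuracy_score(y_test, model.predict(X_test)))  # step 6 : evaluate accuracy
```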
1.2.1.1 Classification
* Classification predicts categorical labels (classes); prediction models continuous-valued functions. Classification is considered to be supervised learning.
* Classification classifies data based on the training set and the values in a classifying attribute and uses this in classifying new data. Prediction means modeling continuous-valued functions, i.e., it predicts unknown or missing values.
* Preprocessing of the data in preparation for classification and prediction can involve data cleaning to reduce noise or handle missing values, relevance analysis to remove irrelevant or redundant attributes, and data transformation, such as generalizing the data to higher level concepts or normalizing the data.
* Fig. 1.2.4 shows the classification.
[Fig. 1.2.4 : Classification]
* Aim : To predict categorical class labels for new samples.
* Input : Training set of samples, each with a class label.
* The classifier is built based on the training set and the class labels.
* Prediction is similar to classification. It constructs a model and uses the model to predict unknown or missing values.
* Classification is the process of finding a model that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data.
* Classification and prediction may need to be preceded by relevance analysis, which attempts to identify attributes that do not contribute to the classification or prediction process.
* Numeric prediction is the task of predicting continuous values for a given input. For example, we may wish to predict the salary of a college employee with 15 years of work experience, or the potential sales of a new product given its price.
* Some of the classification methods like backpropagation, support vector machines, and k-nearest-neighbour classifiers can be used for prediction.
1.2.1.2 Regression
* For an input x, if the output is continuous, this is called a regression problem. For example, based on historical information of demand for toothpaste in your supermarket, you are asked to predict the demand for the next month.
* Regression is concerned with the prediction of continuous quantities. Linear regression is the oldest and most widely used predictive model in the field of machine learning. The goal is to minimize the sum of the squared errors to fit a straight line to a set of data points.
* For regression tasks, the typical accuracy metrics are Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). These metrics measure the distance between the predicted numeric target and the actual numeric answer.
Regression Line
* Least squares : The least squares regression line is the line that makes the sum of squared residuals as small as possible. Linear means "straight line".
* The regression line is the line which gives the best estimate of one variable from the value of any other given variable.
* The regression line gives the average relationship between the two variables in mathematical form.
* For two variables X and Y, there are always two lines of regression.
* Regression line of X on Y : Gives the best estimate for the value of X for any specific given value of Y :
X = a + bY
where a = X-intercept, b = slope of the line, X = dependent variable and Y = independent variable.
* Regression line of Y on X : Gives the best estimate for the value of Y for any specific given value of X :
Y = a + bX
where a = Y-intercept, b = slope of the line, Y = dependent variable and X = independent variable.
* By using the least squares method (a procedure that minimizes the vertical deviations of plotted points surrounding a straight line) we are able to construct a best fitting straight line to the scatter diagram points and then formulate a regression equation in the form of :
ŷ = a + bx
ŷ = ȳ + b(x − x̄)
[Fig. 1.2.5 : Population regression line yᵢ = α + βxᵢ + εᵢ, with population y-intercept α, population slope β, random error ε, dependent variable y and independent variable x]
* Regression analysis is the art and science of fitting straight lines to patterns of data. In a linear regression model, the variable of interest (the "dependent" variable) is predicted from k other variables (the "independent" variables) using a linear equation. If Y denotes the dependent variable and X₁, ..., Xₖ are the independent variables, then the assumption is that the value of Y at time t in the data sample is determined by the linear equation :
Yₜ = β₀ + β₁X₁ₜ + β₂X₂ₜ + ... + βₖXₖₜ + εₜ
where the betas are constants and the epsilons are independent and identically distributed normal random variables with mean zero.
[Fig. 1.2.6 : Model with input vector x and bias term w₀]
* In a regression tree the idea is this : since the target variable does not have classes, we fit a regression model to the target variable using each of the independent variables. Then for each independent variable, the data is split at several split points.
* At each split point, the "error" between the predicted value and the actual value is squared to get a "Sum of Squared Errors (SSE)". The split point errors across the variables are compared and the variable/point yielding the lowest SSE is chosen as the root node/split point. This process is recursively continued, as sketched below.
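A toy sketch of this split-point search for a single variable (an illustration of the idea only; a real regression tree applies it recursively over all variables to grow the tree) :

```python
import numpy as np

def best_split(x, y):
    """Return the split point of a single feature x that minimizes the
    Sum of Squared Errors (SSE) of the two resulting groups."""
    best_sse, best_point = float("inf"), None
    for point in np.unique(x)[1:]:               # candidate split points
        left, right = y[x < point], y[x >= point]
        # each side is predicted by its own mean; SSE measures the error
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_point = sse, point
    return best_point, best_sse

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 0.9, 1.0, 3.9, 4.1, 4.0])
print(best_split(x, y))                          # chooses the split near x = 4.0
```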
* The error function measures how much our predictions deviate from the desired answers.
Mean-squared error : E = (1/n) Σᵢ₌₁ⁿ (yᵢ − f(xᵢ))²
* Multiple linear regression is an extension of linear regression, which allows a response variable, y, to be modeled as a linear function of two or more predictor variables. A short numpy sketch follows.
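A compact numpy sketch of fitting such a linear model by least squares and computing the mean-squared error defined above (the data here are synthetic and purely illustrative) :

```python
import numpy as np

# illustrative data : y depends on two predictor variables plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                  # independent variables X1, X2
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# prepend a column of ones so beta_0 acts as the intercept (bias term)
A = np.column_stack([np.ones(len(X)), X])
betas, *_ = np.linalg.lstsq(A, y, rcond=None)  # least squares estimate of beta_0..beta_k

y_hat = A @ betas
mse = np.mean((y - y_hat) ** 2)                # mean-squared error from the text
print(betas, mse)
```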
Evaluating a Regression Model
* Assume we want to predict a car's price using some features such as dimensions, horsepower, engine specification, mileage etc. This is a typical regression problem, where the target variable (price) is a continuous numeric value.
* We can fit a simple linear regression model that, given the feature values of a certain car, can predict the price of that car. This regression model can be used to score the same dataset we trained on. Once we have the predicted prices for all of the cars, we can evaluate the performance of the model by looking at how much the predictions deviate from the actual prices on average.
Advantages :
a. Training a linear regression model is usually much faster than methods such as neural networks.
b. Linear regression models are simple and require minimum memory to implement.
c. By examining the magnitude and sign of the regression coefficients you can infer how predictor variables affect the target outcome.
Assessing Performance of Regression - Error Measures
* The training error is the mean error over the training sample. The test error is the expected prediction error over an independent test sample.
* Fig. 1.2.7 shows the relationship between training set and test set.
* Unlike decision trees, regression trees and model trees are used for prediction. In regression trees, each leaf stores a continuous-valued prediction. In model trees, each leaf holds a regression model.
[Fig. 1.2.7 : Create a model on the training set, estimate accuracy on the test set]
1.2.2 Unsupervised Learning
The model is not provided with the correct results during the training. It can be used to cluster the input data in classes on the basis of their statistical properties only, followed by cluster significance analysis and labeling.
The labeling can be carried out even if the labels are only available for a small number of objects representative of the desired classes. All similar input patterns are grouped together as clusters.
If a matching pattern is not found, a new cluster is formed. There is no error feedback.
An external teacher is not used and learning is based upon only local information. It is also referred to as self-organization.
These methods are called unsupervised because they do not need a teacher or supervisor to label a set of training examples. Only the original data is required to start the analysis.
In contrast to supervised learning, unsupervised or self-organized learning does not require an external teacher. During the training session, the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories.
* Unsupervised learning algorithms aim to learn rapidly and can be used in real-time. Unsupervised learning is frequently used for data clustering, feature extraction etc.
* Another mode of learning, called recording learning by Zurada, is typically employed for associative memory networks. An associative memory network is designed by recording several ideal patterns into the network's stable states.
1.2.2.1 Clustering
* Clustering of data is a method by which large sets of data are grouped into clusters of smaller sets of similar data. Clustering can be considered the most important unsupervised learning problem.
* A cluster is therefore a collection of objects which are "similar" between them and are "dissimilar" to the objects belonging to other clusters. Fig. 1.2.8 shows a cluster.
[Fig. 1.2.8 : Cluster]
* In this case we easily identify the 4 clusters into which the data can be divided;
the similarity criterion is distance : two or more objects belong to the same cluster
if they are "close" according to a given distance (in this case geometrical distance).
This is called distance-based clustering.
* Clustering means grouping of data or dividing a large data set into smaller data sets of some similarity.
* A clustering algorithm attempts to find natural groups of components or data based on some similarity. Also, the clustering algorithm finds the centroid of a group of data sets.
* To determine cluster membership, most algorithms evaluate the distance between a point and the cluster centroids. The output from a clustering algorithm is basically a statistical description of the cluster centroids with the number of components in each cluster.
Scanned With CamScannertoMachine Learning Z
215
ft
Int to Machine Lemin
ering
Raw date foo
Shusters of data
« Cluster centroid : The centroid of a cluster is a point whose par
the mean of the parameter values ofall the points in the chistes, Each dans A
- luster has
2 well defined centroid.
Centraid
* Distance : The distance between two points is taken as a common metric to assess the similarity among the components of a population. The commonly used distance measure is the Euclidean metric, which defines the distance between two points p = (p₁, p₂, ...) and q = (q₁, q₂, ...) as :
d = √( Σᵢ (pᵢ − qᵢ)² )
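The distance measure and centroid ideas can be sketched as follows (a hypothetical illustration; scikit-learn's KMeans, which Chapter 8 covers in detail, performs the distance-to-centroid assignment internally) :

```python
import numpy as np
from sklearn.cluster import KMeans

def euclidean(p, q):
    """d = sqrt(sum_i (p_i - q_i)^2), the distance measure defined above."""
    return np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2))

# four well-separated groups of points, as in the 4-cluster illustration
points = np.array([[0, 0], [0, 1], [5, 5], [5, 6],
                   [0, 10], [1, 10], [6, 0], [6, 1]])
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(points)
print(km.cluster_centers_)   # centroid : mean of the points in each cluster
print(km.labels_)            # cluster membership by nearest centroid
print(euclidean(points[0], km.cluster_centers_[km.labels_[0]]))
```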
* The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how to decide what constitutes a good clustering ? It can be shown that there is no absolute "best" criterion which would be independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering will suit their needs.
* Clustering analysis helps construct meaningful partitioning of a large set of objects. Cluster analysis has been widely used in numerous applications, including pattern recognition, data analysis, image processing, etc.
* Clustering algorithms may be classified as listed below :
1. Exclusive clustering
2. Overlapping clustering
3. Hierarchical clustering
* A good clustering method will produce high quality clusters with high intra-class similarity and low inter-class similarity. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
Examples of Clustering Applications :
1. Marketing : Help marketers discover distinct groups in their customer bases and then use this knowledge to develop targeted marketing programs.
2. Land use : Identification of areas of similar land use in an earth observation database.
3. Insurance : Identifying groups of motor insurance policy holders with a high average claim cost.
4. Urban planning : Identifying groups of houses according to their house type, value, and geographical location.
5. Seismology : Observed earthquake epicenters should be clustered along continent faults.
1.2.3 Reinforcement Learning
* The user gets immediate feedback in supervised learning and no feedback in unsupervised learning; but in reinforcement learning, the feedback is a delayed scalar.
* Reinforcement learning is learning what to do and how to map situations to actions. The learner is not told which actions to take. Fig. 1.2.9 shows the concept of reinforcement learning.
[Fig. 1.2.9 : Reinforcement learning]
* Reinforcement learning deals with agents that must sense and act upon their environment. It combines artificial intelligence and machine learning techniques.
* It allows machines and software agents to automatically determine the ideal behavior within a specific context, in order to maximize performance. Simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal.
* The two most important distinguishing features of reinforcement learning are trial-and-error search and delayed reward.
* With reinforcement learning algorithms an agent can improve its performance by
using the feedback it gets from the environment. This environmental feedback is
called the reward signal.
* Based on accumulated experience, the agent needs to learn which action to take in a given situation in order to obtain a desired long term goal. Essentially, actions that lead to long term rewards need to be reinforced. Reinforcement learning has connections with control theory, Markov decision processes and game theory.
* Example of Reinforcement Learning : A mobile robot decides whether it should
enter a new room in search of more trash to collect or start trying to find its way
back to its battery recharging station. It makes its decision based on how quickly
and easily it has been able to find the recharger in the past.
1.2.3.1 Elements of Reinforcement Learning
* Reinforcement learning elements are as follows :
1. Policy 2. Reward function 3. Value function 4. Model of the environment
* Fig. 1.2.10 shows the elements of reinforcement learning.
[Fig. 1.2.10 : Elements of reinforcement learning]
* Policy : A policy defines the learning agent's behavior for a given time period. It is a mapping from perceived states of the environment to actions to be taken when in those states.
* Reward function : The reward function is used to define a goal in a reinforcement learning problem. It also maps each perceived state of the environment to a single number.
* Value function : Value functions specify what is good in the long run. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
* Model of the environment : Models are used for planning.
* Credit assignment problem : Reinforcement learning algorithms learn to generate an internal value for the intermediate states as to how good they are in leading to the goal.
* The learning decision maker is called the agent. The agent interacts with the environment, which includes everything outside the agent.
* The agent has sensors to decide on its state in the environment and takes an action that modifies its state.
* The reinforcement learning problem model is an agent continuously interacting with an environment. The agent and the environment interact in a sequence of time steps. At each time step t, the agent receives the state of the environment and a scalar numerical reward for the previous action, and then the agent selects an action.
* Reinforcement learning is a technique for solving Markov decision problems. Reinforcement learning uses a formal framework defining the interaction between a learning agent and its environment in terms of states, actions, and rewards. This framework is intended to be a simple way of representing essential features of the artificial intelligence problem, and can be sketched as a simple loop (below).
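The interaction loop just described can be sketched with a tiny tabular Q-learning example (a generic, hypothetical illustration of the agent-environment framework; Q-learning itself and the corridor environment are not from this chapter) :

```python
import random

# A corridor environment : states 0..4, reward 1 for reaching state 4.
# At each time step the agent receives a state and a delayed scalar reward,
# and updates its value estimates (tabular Q-learning).
N_STATES, ACTIONS = 5, (-1, +1)        # actions : move left or move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma = 0.5, 0.9                # learning rate and discount factor

for episode in range(200):
    s = 0
    while s < N_STATES - 1:
        a = random.choice(ACTIONS)     # explore by trial and error (off-policy)
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0      # delayed reward signal
        # value update : immediate reward plus discounted value of the next state
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# the learned policy : in every state the agent prefers moving right (+1)
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```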
Difference between Supervised, Unsupervised and Reinforcement Learning

| Supervised learning | Unsupervised learning | Reinforcement learning |
|---|---|---|
| Supervised learning requires that the target variable is well defined and that a sufficient number of its values are given. | For unsupervised learning, typically either the target variable is unknown or has only been recorded for too small a number of cases. | The learner is not told which actions to take. |
| Supervised learning deals with two main tasks : regression and classification. | Unsupervised learning deals with clustering and associative rule mining problems. | Reinforcement learning deals with exploitation or exploration, Markov decision processes, policy learning, deep learning and value learning. |
| The input data in supervised learning is labelled data. | Unsupervised learning uses unlabelled data. | The data is not predefined in reinforcement learning. |
| Learns by using labelled data. | Trained using unlabelled data without any guidance. | Works on interacting with the environment. |
| Maps the labelled inputs to the known outputs. | Understands patterns and discovers the output. | Follows the trial and error method. |
1.3 Application of Machine Learning
* Examples of successful applications of machine learning :
1. Learning to recognize spoken words.
2. Learning to drive an autonomous vehicle.
3. Learning to classify new astronomical structures.
4. Learning to play world-class backgammon.
5. Spoken language understanding : within the context of a limited domain, determine the meaning of something uttered by a speaker to the extent that it can be classified into one of a fixed set of categories.
Face Recognition
* We perform the face recognition task effortlessly; every day we recognize our friends, relatives and family members. We also recognize them by looking at photographs, in which they appear in different poses, hair styles, background light, with makeup and without makeup.
* We do it subconsciously and cannot explain how we do it. Because we can't explain how we do it, we can't write an algorithm.
* A face has some structure. It is not a random collection of pixels; it is a symmetric structure. It contains predefined components like nose, mouth, eyes, ears. Every person's face is a pattern composed of a particular combination of the features. By analyzing sample face images of a person, a learning program captures the pattern specific to that person and uses it to recognize if a new real face or new image belongs to this specific person or not.
* A machine learning algorithm creates an optimized model of the concept being learned based on data or past experience.
Healthcare :
* With the advent of wearable sensors and devices that use data to assess the health of a patient in real time, ML is becoming a fast-growing trend in healthcare.
* Sensors in wearables provide real-time patient information, such as overall health condition, heartbeat, blood pressure and other vital parameters.
Scanned With CamScannerieppeucnan ts Mantine Leeming 1-20 Introduction to Machina Leaning
and medical experts can use this information to analyse the health
Doctors 4
. al, draw a pattern from the patient history and predict the
condition of an individu
occurrence of any ailments in the future.
+ The technology also empowers medical experts to analyze data to identify trends
that facilitate better diagnoses and treatment.
Financial services :
* Companies in the financial sector are able to identify key insights in financial data as well as prevent any occurrences of financial fraud, with the help of machine learning technology.
* The technology is also used to identify opportunities for investments and trade.
* Usage of cyber surveillance helps in identifying those individuals or institutions which are prone to financial risk, so that necessary actions can be taken in time to prevent fraud.
1.4 Hypothesis Space
* A hypothesis represents a function approximation for the target function. It is used to associate, estimate or predict the target value Y, based on the input dataset X, model parameters, and hyperparameters. It is represented by the letter h.
* The hypothesis is also referred to as a model. The hypothesis can be represented as Y = h(X). Fig. 1.4.1 shows a diagram representing the hypothesis.
[Fig. 1.4.1 : Diagram representing the hypothesis]
* If H comprises all possible subsets of X, we cannot learn anything new beyond the training data in D, because the labels c(x) of instances x outside D can independently and arbitrarily be 0 or 1. That is, we have no inductive bias.
* The hypothesis space represents one or more hypotheses or function approximations or models which can be created using different training data sets derived from the population.
* The different hypotheses or models are created using a combination of different training data sets derived from the same population, features and hyperparameters. One or more hypotheses or functions can also be said to be part of what can be called a hypothesis class. Fig. 1.4.2 shows a hypothesis class.
[Fig. 1.4.2 : Hypothesis class - inputs X mapped through hypotheses h to outputs Y]
* A learning algorithm is not the same as the hypothesis or function approximation. A learning algorithm for a concept learning problem is given a set D of training examples, and it returns a hypothesis h.
* Search space : The space of all feasible solutions is called the search space. Each point in the search space represents one feasible solution. Each feasible solution can be "marked" by its value or fitness for the problem.
* If we are solving some problem, we are usually looking for some solution which will be the best among others.
* We are looking for our solution, which is one point or more among feasible solutions, that is, one point in the search space.
* Looking for a solution is then equal to looking for some extreme (minimum or maximum) in the search space. The whole search space may be known by the time of solving a problem, but usually we know only a few points from it and we generate other points as the process of finding the solution continues.
* A genetic algorithm employs a randomized beam search method to seek a maximally fit hypothesis.
Motivation :
* The solution(s) to machine learning tasks are often called hypotheses, because they can be expressed as a hypothesis that the observed positives and negatives for a categorization are explained by the concept learned for the solution.
* The hypotheses have to be represented in some representation scheme and, as usual with AI tasks, this choice will have a big effect on many aspects of the learning methods.
* General definition of a hypothesis : "A hypothesis is a statement of a relationship between two or more variables".
* Sometimes, it is necessary to evaluate the performance of learned hypotheses.
Reasons for using hypotheses :
* When learning from a limited-size database indicating the effectiveness of medical treatments, it is important to understand as precisely as possible the accuracy of the learned hypotheses.
* Evaluating hypotheses is an integral component of many learning methods.
* It is important to understand the likely errors inherent in estimating the accuracy of the pruned and unpruned tree.
* Estimating the accuracy of a hypothesis is relatively straightforward when data is plentiful.
* An estimator is any random variable used to estimate some parameter of the underlying population from which a sample is drawn.
1. The estimation bias of an estimator Y for an arbitrary parameter p is E[Y] − p. If the estimation bias is 0, then Y is an unbiased estimator for p.
2. The variance of an estimator Y for an arbitrary parameter p is simply the variance of Y.
1.5 Inductive Bias
* The Candidate-Elimination algorithm will converge toward the true target concept provided it is given accurate training examples and provided its initial hypothesis space contains the target concept.
* What if the target concept is not contained in the hypothesis space ?
* Can we avoid this difficulty by using a hypothesis space that includes every possible hypothesis ?
| Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport |
| 1 | Sunny | Warm | Normal | Strong | Cool | Change | Yes |
| 2 | Cloudy | Warm | Normal | Strong | Cool | Change | Yes |
| 3 | Rainy | Warm | Normal | Strong | Cool | Change | No |

* From the first two examples : S2 : <?, Warm, Normal, Strong, Cool, Change>
* This is inconsistent with the third example, and there are no hypotheses consistent with these three examples. Problem : We have biased the learner to consider only conjunctive hypotheses. We require a more expressive hypothesis space.
* The obvious solution to the problem of assuring that the target concept is in the hypothesis space H is to provide a hypothesis space capable of representing every teachable concept.
Inductive Bias - Fundamental Property of Inductive Inference :
* A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
* Inductive leap : A learner should be able to generalize training data using prior assumptions in order to classify unseen instances.
* The generalization is known as the inductive leap and our prior assumptions are the inductive bias of the learner.
* The inductive bias (prior assumptions) of the Candidate-Elimination algorithm is that the target concept can be represented by a conjunction of attribute values, the target concept is contained in the hypothesis space and the training examples are correct.
Inductive Bias - Formal Definition
* Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary concept defined over X, and let D_c = {<x, c(x)>} be an arbitrary set of training examples of c.
* Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on the data D_c.
* The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c the following formula holds :
(∀x_i ∈ X) [ (B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c) ]
Three Learning Algorithms :
* ROTE-LEARNER : Learning corresponds simply to storing each observed training example in memory. Subsequent instances are classified by looking them up in memory. If the instance is found in memory, the stored classification is returned; otherwise, the system refuses to classify the new instance. Inductive bias : no inductive bias.
* CANDIDATE-ELIMINATION : New instances are classified only in the case where all members of the current version space agree on the classification. Otherwise, the system refuses to classify the new instance. Inductive bias : the target concept can be represented in its hypothesis space.
* FIND-S : This algorithm, described earlier, finds the most specific hypothesis consistent with the training examples. It then uses this hypothesis to classify all subsequent instances. Inductive bias : the target concept can be represented in its hypothesis space, and all instances are negative instances unless the opposite is entailed by its other knowledge. A sketch of FIND-S follows.
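A minimal sketch of the FIND-S idea over conjunctive hypotheses (the attribute vectors follow the EnjoySport example used in this chapter; the encoding with "?" for "any value" is illustrative) :

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis consistent with
    the positive training examples ('?' = any value)."""
    h = None
    for x, label in examples:
        if label != "Yes":
            continue                  # FIND-S ignores negative examples
        if h is None:
            h = list(x)               # first positive example taken as-is
        else:
            # generalize each attribute just enough to cover x
            h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
    return h

data = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
        (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
        (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
        (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes")]
print(find_s(data))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```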
1.6 Evaluation and Cross Validation
* Cross-validation is a technique for evaluating model performance by training several machine learning models on subsets of the available input data and evaluating them on the complementary subset of the data. Use cross-validation to detect overfitting, i.e., failing to generalize a pattern.
* In general, machine learning involves deriving models from data, with the aim of achieving some kind of desired behaviour, e.g., prediction or classification.
* Fig. 1.6.1 shows cross-validation.
[Fig. 1.6.1 : Cross-validation - the dataset is partitioned into training, validation and testing subsets]
* But this generic task is broken down into a number of special cases. When training is done, the data that was removed can be used to test the performance of the learned model on "new" data. This is the basic idea for a whole class of model evaluation methods called cross validation.
* Types of cross validation methods are holdout, K-fold and leave-one-out.
* The holdout method is the simplest kind of cross validation. The data set is separated into two sets, called the training set and the testing set. The function approximator fits a function using the training set only.
* The K-fold cross validation is one way to improve over the holdout method. The data set is divided into k subsets, and the holdout method is repeated k times.
* Each time, one of the k subsets is used as the test set and the other k − 1 subsets are put together to form a training set. Then the average error across all k trials is computed.
* Leave-one-out cross validation is K-fold cross validation taken to its logical extreme, with K equal to N, the number of data points in the set.
* That means that N separate times, the function approximator is trained on all the data except for one point and a prediction is made for that point.
* Cross-validation ensures non-overlapping test sets.
K-fold cross-validation :
* In this technique, k − 1 folds are used for training and the remaining one is used for testing, as shown in Fig. 1.6.2.
[Fig. 1.6.2 : K-fold cross validation - the total number of examples is split into k folds; in each experiment a different fold is held out as test examples]
* The advantage is that the entire data is used for training and testing. The error rate of the model is the average of the error rates of the individual iterations.
* This technique can also be called a form of the repeated holdout method. The error rate could be improved by using a stratification technique. A short sketch of these methods follows.
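A brief scikit-learn sketch of the holdout and K-fold methods just described (the dataset and classifier are illustrative placeholders) :

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Holdout method : one split into a training set and a testing set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
print(model.fit(X_tr, y_tr).score(X_te, y_te))

# K-fold : k experiments, each holding out a different fold for testing;
# the reported accuracy is the average over all k iterations
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```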
1.6.1 Evaluating Performance of Model
* Classification is a major task of supervised learning. The responsibility of the classification model is to assign a class label to the target feature based on the values of the predictor features.
* When performing classification predictions, there are four types of outcomes that could occur. The evaluation measures in classification problems are defined from a matrix with the numbers of examples correctly and incorrectly classified for each class, named the confusion matrix.
* The confusion matrix is also called a contingency table.
1) True positives are when you predict an observation belongs to a class and it actually does belong to that class.
2) True negatives are when you predict an observation does not belong to a class
and it actually does not belong to that class.
3) False positives occur when you predict an observation belongs to a class when
in reality it does not.
4) False negatives occur when you predict an observation does not belong to a
class when in fact it does.
* The confusion matrix goes deeper than classification accuracy by showing the correct and incorrect (i.e. true or false) predictions on each class. In case of a binary classification task, the confusion matrix is a 2 × 2 matrix. If there are three different classes, it is a 3 × 3 matrix and so on.
* For any classification model, model accuracy is given by the total number of correct classifications (true positives or true negatives) divided by the total number of classifications done :
Accuracy rate = (|True positives| + |True negatives|) / (|True positives| + |True negatives| + |False positives| + |False negatives|)
* The complement of the accuracy rate is the error rate, which evaluates a classifier by its percentage of incorrect predictions.
Error rate = (|False negatives| + |False positives|) / (|False negatives| + |False positives| + |True negatives| + |True positives|)
Error rate = 1 − (Accuracy rate)
* The recall is the accuracy rate over the cases that are actually positive.
* The specificity is a statistical measure of how well a binary classification test correctly identifies the negative cases.
Recall (R) = |True positives| / (|True positives| + |False negatives|)
Specificity = |True negatives| / (|False positives| + |True negatives|)
* True Positive Rate (TPR) is also called sensitivity, hit rate and recall :
Sensitivity = Number of true positives / (Number of true positives + Number of false negatives)
* Precision measures how good our model is when the prediction is positive.
Precision = |True positives| / (|True positives| + |False positives|)
* The focus of precision is positive predictions. It indicates how many positive predictions are true.
* The F₁ score is the weighted average of precision and recall :
F₁ = 2 × (Precision × Recall) / (Precision + Recall)
* The F₁ score is a more useful measure than accuracy for problems with uneven class distribution because it takes into account both false positives and false negatives.
* The kappa value of a model indicates the adjusted model accuracy :
Kappa = (Total accuracy − Random accuracy) / (1 − Random accuracy)
* Total accuracy is simply the sum of true positives and true negatives, divided by the total number of items, that is :
Total accuracy = (TP + TN) / (TP + TN + FP + FN)
* Random accuracy is defined as the sum of the products of reference likelihood and result likelihood for each class. That is :
Random accuracy = (Actual False × Predicted False + Actual True × Predicted True) / (Total × Total)
* In terms of false positives etc., random accuracy can be written as :
Random accuracy = [(TN + FP) × (TN + FN) + (FN + TP) × (FP + TP)] / (Total × Total)
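These measures can be checked with scikit-learn on a confusion matrix (the labels here are hypothetical; cohen_kappa_score corresponds to the kappa value above) :

```python
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, cohen_kappa_score)

# hypothetical binary predictions (1 = positive, 0 = negative)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)                         # the four outcome counts
print(accuracy_score(y_true, y_pred))         # (TP + TN) / total
print(precision_score(y_true, y_pred))        # TP / (TP + FP)
print(recall_score(y_true, y_pred))           # TP / (TP + FN)  (sensitivity)
print(f1_score(y_true, y_pred))               # 2PR / (P + R)
print(cohen_kappa_score(y_true, y_pred))      # kappa-adjusted accuracy
```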
Example 1.6.1 : Consider the following three-class confusion matrix :

| Actual \ Predicted | Class 1 | Class 2 | Class 3 |
| Class 1 | 15 | 2 | 3 |
| Class 2 | 7 | 15 | 8 |
| Class 3 | 2 | 3 | 45 |

Calculate precision and recall per class. Also calculate the weighted average precision and recall for the classifier.
Solution : Calculate per-class precision and recall :
First class : precision = 15/24 = 0.63 and recall = 15/20 = 0.75
Second class : precision = 15/20 = 0.75 and recall = 15/30 = 0.50
Third class : precision = 45/56 = 0.80 and recall = 45/50 = 0.90
Weighting by class support (20, 30 and 50 examples) : weighted precision = (20 × 0.63 + 30 × 0.75 + 50 × 0.80)/100 ≈ 0.75 and weighted recall = (20 × 0.75 + 30 × 0.50 + 50 × 0.90)/100 = 0.75.
Example 1.6.2 : Calculate accuracy, precision and recall for the following :

| | Predicted + | Predicted − |
| Actual + | 6 | 15 |
| Actual − | 1 | 42 |

Solution :
Accuracy = (6 + 42)/(6 + 15 + 1 + 42) = 48/64 = 0.75 = 75 %
Precision = 6/(6 + 1) = 0.8571
Recall = 6/(6 + 15) = 0.2857
Example 1.6.3 : Calculate the true negative rate, accuracy and precision for the following :

| | Predicted + | Predicted − |
| Actual + | 50 | 20 |
| Actual − | 5 | 25 |

Solution :
Accuracy = (50 + 25)/(50 + 20 + 5 + 25) = 75/100 = 0.75 = 75 %
Precision = 50/(50 + 5) = 0.9090
* The true negative rate is also called specificity.
True negative rate = 25/(25 + 5) = 0.83
ROC Curve :
* Receiver Operating Characteristics (ROC) graphs have long been used in signal detection theory to depict the trade-off between hit rates and false alarm rates over a noisy channel. Recent years have seen an increase in the use of ROC graphs in the machine learning community.
* The ROC curve summarizes the performance of the model at different threshold values by combining confusion matrices at all threshold values. ROC curves are typically used in binary classification to study the output of a classifier.
* An ROC plot plots the true positive rate on the Y-axis against the false positive rate on the X-axis; a single contingency table corresponds to a single point in an ROC plot.
* The performance of a ranker can be assessed by drawing a piecewise linear curve in an ROC plot, known as an ROC curve. The curve starts in (0, 0), finishes in (1, 1), and is monotonically non-decreasing in both axes.
* ROC graphs are a useful technique for organizing classifiers and visualizing their performance, especially useful for domains with skewed class distribution and unequal classification error costs.
* It allows creating an ROC curve and a complete sensitivity/specificity report. The ROC curve is a fundamental tool for diagnostic test evaluation. In an ROC curve the true positive rate (sensitivity) is plotted as a function of the false positive rate for different cut-off points of a parameter.
* Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve is a measure of how well a parameter can distinguish between two diagnostic groups.
* Each point on an ROC curve connecting two segments corresponds to the true and
false positive rates achieved on the same test set by the classifier obtained from
the ranker by splitting the ranking between those two segments.
* An ROC curve is convex if the slopes are monotonically non-increasing when moving along the curve from (0, 0) to (1, 1). A concavity in an ROC curve, i.e., two or more adjacent segments with increasing slopes, indicates a locally worse than random ranking.
* True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows :

True Positive Rate TPR = TP / (TP + FN)

* False Positive Rate (FPR) is defined as follows :

False Positive Rate FPR = FP / (FP + TN)
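To make the threshold sweep behind an ROC curve concrete, the following is a small pure-Python sketch (the labels and scores are made up for illustration) that computes one (FPR, TPR) point per threshold :

    # Toy labels and classifier scores; sweep the decision threshold to
    # obtain the ROC points (FPR, TPR).
    y_true  = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
    y_score = [0.95, 0.9, 0.8, 0.7, 0.65, 0.5, 0.4, 0.35, 0.2, 0.1]

    P = sum(y_true)              # number of positives
    N = len(y_true) - P          # number of negatives

    points = []
    for threshold in sorted(set(y_score), reverse=True):
        pred = [1 if s >= threshold else 0 for s in y_score]
        tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
        points.append((fp / N, tp / P))      # one ROC point per threshold

    print(points)    # the curve runs from near (0, 0) towards (1, 1)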
1.6.2 Concept Learning
* Inducing general functions from specific training examples is a main issue of machine learning.
* Concept learning : Acquiring the definition of a general category from given sample positive and negative training examples of the category.
* Concept learning can be seen as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.
* The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be efficiently organized by taking advantage of a naturally occurring structure over the hypothesis space.
* Formal definition for concept learning : Inferring a boolean-valued function from
training examples of its input and output.
* An example of concept learning is the learning of the bird-concept from the given examples of birds (positive examples) and non-birds (negative examples).
* We are trying to learn the definition of a concept from given examples.
* Concept learning involves determining a mapping from a set of input variables to a Boolean value. Such methods are known as inductive learning methods.
* If a function can be found which maps training data to correct classifications, then
it will also work well for unseen data. This process is known as generalization.
* Example : Learn the "days on which my friend enjoys his favorite water sport".

    Example   Sky     AirTemp   Humidity   Wind     Water   Forecast   EnjoySport
       1      Sunny   Warm      Normal     Strong   Warm    Same       Yes
       2      Sunny   Warm      High       Strong   Warm    Same       Yes
       3      Rainy   Cold      High       Strong   Warm    Change     No
       4      Sunny   Warm      High       Strong   Cool    Change     Yes

* A set of example days, each described by six attributes. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its attributes.
* The inductive learning hypothesis : Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
* Although the learning task is to determine a hypothesis h identical to the target concept c over the entire set of instances X, the only information available about c is its value over the training examples.
* Inductive learning algorithms can at best guarantee that the output hypothesis fits the target concept over the training data.
* Lacking any further information, our assumption is that the best hypothesis regarding unseen instances is the hypothesis that best fits the observed training data. This is the fundamental assumption of inductive learning.
* Hypothesis representation (constraints on instance attributes) :
1. Any value is acceptable : represented by "?"
2. No value is acceptable : represented by "∅"
1.6.3 Concept Learning as Search
* Concept learning can be viewed as the task of searching through a large space of
hypotheses implicitly defined by the hypothesis representation.
* The goal of this search is to find the hypothesis that best fits the training
examples.
* By selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn.
+ A hypothesis is a vector of constraints for each attribute.
1. Indicate by a "?" that any value is acceptable for this attribute
2. Specify a single required value for the attribute
3. Indicate by a "∅" that no value is acceptable
* If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).
* The EnjoySport concept learning task :

Given :

Instances X : Possible days, each described by the attributes
    Sky (with possible values Sunny, Cloudy, and Rainy)
    AirTemp (with values Warm and Cold)
    Humidity (with values Normal and High)
    Wind (with values Strong and Weak)
    Water (with values Warm and Cool), and
    Forecast (with values Same and Change)

Hypotheses H : Each hypothesis is described by a conjunction of constraints on the attributes. The constraints may be "?", "∅", or a specific value.

Target concept c : EnjoySport : X → {0, 1}

Training Examples D : Positive or negative examples of the target function.

Determine : A hypothesis h in H such that h(x) = c(x) for all x in X.
* Search through a large space of hypotheses implicitly defined by the hypothesis representation. Find the hypothesis that best fits the training examples.
* By selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses the program can ever represent and therefore can ever learn.
* How big is the hypothesis space ? In EnjoySport there are six attributes : Sky has 3 values, and the rest have 2.

How many distinct instances ?

    Instances : 3 x 2 x 2 x 2 x 2 x 2 = 96

How many hypotheses ?

    Syntactically distinct hypotheses (each attribute may also be "?" or "∅") : 5 x 4 x 4 x 4 x 4 x 4 = 5120

    Semantically distinct hypotheses (every hypothesis containing "∅" classifies all instances negative) : 1 + 4 x 3 x 3 x 3 x 3 x 3 = 973

* This is a very simple learning task. Most practical learning tasks involve much larger, sometimes infinite, hypothesis spaces.
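These counts can be reproduced mechanically; a small Python sketch (illustrative only, not from the text) :

    from math import prod

    values = [3, 2, 2, 2, 2, 2]    # Sky has 3 values, the other five attributes have 2

    instances = prod(values)                      # 3*2*2*2*2*2 = 96
    syntactic = prod(v + 2 for v in values)       # add "?" and "∅" per attribute -> 5120
    semantic  = 1 + prod(v + 1 for v in values)   # one all-negative hypothesis + "?" -> 973

    print(instances, syntactic, semantic)         # 96 5120 973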
General-to-Specific Ordering of Hypotheses :
+ Many algorithms for concept learning organize the search through the hypothesis
spaces by relying on a general-to-specific ordering of hypotheses.
By taking advantage of this naturally occurring structure over the hypothesis
space, we can design learning algorithms that exhaustively search even infinite
hypothesis spaces without explicitly enumerating every hypothesis.
* Consider two hypotheses :

    h1 = (Sunny, ?, ?, Strong, ?, ?)
    h2 = (Sunny, ?, ?, ?, ?, ?)

* Now consider the sets of instances that are classified positive by h1 and by h2. Because h2 imposes fewer constraints on the instance, it classifies more instances as positive.
* In fact, any instance classified positive by h1 will also be classified positive by h2. Therefore, we say that h2 is more general than h1.
One learning method is to determine the most specific hypothesis that matches all
the training data.
* More-General-Than-Or-Equal Relation : Let hj and hk be two boolean-valued functions defined over X. Then hj is more-general-than-or-equal-to hk (written hj ≥ hk) if and only if any instance that satisfies hk also satisfies hj.
* hj is more-general-than hk (hj > hk) if and only if hj ≥ hk is true and hk ≥ hj is false. We also say hk is more-specific-than hj.
hj ≥ hk if and only if (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
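This relation is easy to check mechanically for conjunctive hypotheses; below is a small Python sketch (the function names are illustrative, not from the text), using "?" for any value and "0" for no acceptable value :

    def covers(h, x):
        # Does hypothesis h classify instance x as positive?
        return all(c == "?" or c == v for c, v in zip(h, x))

    def more_general_or_equal(h1, h2):
        # h1 >= h2 : every instance that satisfies h2 also satisfies h1.
        return all(c1 == "?" or c1 == c2 or c2 == "0"
                   for c1, c2 in zip(h1, h2))

    h1 = ("Sunny", "?", "?", "Strong", "?", "?")
    h2 = ("Sunny", "?", "?", "?", "?", "?")
    x  = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")

    print(covers(h1, x), covers(h2, x))    # True True
    print(more_general_or_equal(h2, h1))   # True : h2 is more general than h1
    print(more_general_or_equal(h1, h2))   # False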
[Figure : Lattice (partial ordering) of hypotheses from specific to general, e.g. h1 = (Sunny, ?, ?, Strong, ?, ?) is more specific than h2 = (Sunny, ?, ?, ?, ?, ?)]

* The largest concept (in C) may not be contained in H; |X| = 3 x 2 x 2 x 2 x 2 x 2 = 96.
Questions raised by this search :
1. Has the learner converged to the target concept, as there can be several consistent hypotheses (with both positive and negative training examples) ?
2. Why is the most specific hypothesis preferred ?
3. What if there are several maximally specific consistent hypotheses ?
4. What if the training examples are not correct ?
1.7 Multiple Choice Questions with Answers
Q.1  Machine learning is a sub-field of _____, which is concerned with developing computational theories of learning and building learning machines.
a) artificial intelligence    b) neural network    c) soft computing
Q.2  _____ learning is learning in which the network is trained by providing it with input and matching output patterns.
a) Unsupervised    b) Supervised    c) Semi-supervised    d) All of these
Q.3  Supervised learning and unsupervised learning are the types of _____.
a) human learning    b) model learning    c) machine learning    d) none of these
Q.4  Unsupervised learning uses _____ data.
a) labelled    b) labelled and unlabelled    c) unlabelled    d) test
Q.5  A computer program is said to learn from _____ E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
a) training    b) experience    c) testing    d) algorithm
Q.6  Unsupervised learning deals with _____ and _____ mining problems.
a) classification, regression    b) clustering, classification    c) clustering, associative rule    d) label, unlabelled data
Q.7  _____ learning deals with two main tasks : regression and classification.
a) Reinforcement    b) Deep    c) Unsupervised    d) Supervised
Q.8  The individual tuples making up the training set are referred to as _____ and are selected from the database under analysis.
a) learning tuples    b) training tuples    c) samples    d) database
Q.9  Machine learning is inherently a _____ field.
a) inter-disciplinary    b) multi-disciplinary    c) single    d) none of these
Q.10  _____ methods have been used to train computer-controlled vehicles to steer correctly when driving on a variety of road types.
a) Machine learning    b) Data mining    c) Neural networks    d) Robotics
Q.11  The individual tuples making up the training set are referred to as _____ and are selected from the database under analysis.
a) learning tuples    b) training tuples    c) samples    d) database
Q.12  Training perceptron is based on _____.
a) supervised learning technique    b) unsupervised learning    c) reinforced learning    d) stochastic learning
Q.13  List the elements of reinforcement learning.
a) Policy    b) Reward function    c) Value function    d) All of these
Answer Keys for Multiple Choice Questions :

Q.1   a     Q.2   b     Q.3   c     Q.4   c     Q.5   b
Q.6   c     Q.7   d     Q.8   b     Q.9   b     Q.10  a
Q.11  b     Q.12  a     Q.13  d
2. Basic Machine Learning Algorithms
Syllabus

Linear Regression, Decision Trees, Learning Decision Trees, K-nearest Neighbour, Collaborative Filtering, Overfitting
Contents

2.1 Linear Regression
2.2 Decision Tree
2.3 Basic Decision Tree Learning Algorithm
2.4 K-nearest Neighbour
2.5 Collaborative Filtering
2.6 Overfitting
2.7 Multiple Choice Questions
2.1 Linear Regression
* The most common regression algorithms are,
a) Simple linear regression
b) Multiple linear regression
c) Polynomial regression
d) Multivariate adaptive regression splines
e) Logistic regression
f) Maximum likelihood estimation (least squares)
2.1.1 Simple Linear Regression
* A regression model which involves only one predictor is called simple linear regression. Linear regression is a statistical method that allows us to summarize and study relationships between two continuous variables :
1. One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
2. The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
* Regression models predict a continuous variable, such as the sales made on a day, or predict the temperature of a city.
* Let's imagine that you fit a line with the training points you have. Imagine you want to add another data point, but to fit it, you need to change your existing model.
* This will happen with each data point that we add to the model; hence, linear regression isn't good for classification models.
* Regression estimates are used to explain the relationship between one dependent variable and one or more independent variables.
* Regression line of X on Y gives the best estimate for the value of X for specific given values of Y :

    X = a + bY

where a = X-intercept, b = slope of the line, X = dependent variable and Y = independent variable.
[Fig. 2.1.4 : Regression line, showing the slope (change in Y / change in X) and the Y-intercept]
«Regression analysis is the art and science of fitting straight lines to patterns of
data. In a linear regression model, the variable of interest ("dependent” variable) is
predicted from k other variables ("independent" variables) using a linear equation.
* If Y denotes the dependent variable and X1, ..., Xk are the independent variables, then the assumption is that the value of Y at time t in the data sample is determined by the linear equation :

    Yt = β0 + β1 X1t + β2 X2t + ... + βk Xkt + εt

where the betas are constants and the epsilons are independent and identically distributed normal random variables with mean zero.
* At each split point, the error between the predicted value and the actual values is squared to get a "Sum of Squared Errors (SSE)". The split point errors across the variables are compared and the variable/point yielding the lowest SSE is chosen as the root node/split point. This process is recursively continued.
* Error function measures how much our predictions deviate from the desired answers :

    Mean-squared error J_n = (1/n) Σ_{i=1..n} (y_i − ŷ_i)^2
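As an illustration, a least-squares fit of such a line with numpy (toy numbers; np.polyfit is just one convenient way to do this) :

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

    b, a = np.polyfit(x, y, deg=1)      # slope b and intercept a of y = a + b*x
    y_hat = a + b * x
    mse = np.mean((y - y_hat) ** 2)     # the mean-squared error J_n above

    print(a, b, mse)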
Advantages :
a. Training a linear regression model is usually much faster than methods such as neural networks.
b. Linear regression models are simple and require minimum memory to implement.
c. By examining the magnitude and sign of the regression coefficients you can infer how predictor variables affect the target outcome.
2.1.2 Multiple Linear Regression
* Multiple linear regression is an extension of linear regression, which allows a response variable, y, to be modelled as a linear function of two or more predictor variables.
* In a multiple regression model, two or more independent variables, i.e. predictors, are involved in the model. The simple linear regression model and the multiple regression model assume that the dependent variable is continuous.

Difference between simple and multiple regression :
Difference between simple and multiple regression }
Multiple regression
‘St, No. Simple vegresston
i One dependent variable ¥ predicted (rom One dependent variable ¥ predicted fy
fone independent varkable X, a net of Independent variables
CK Hae eee Me)
2 (One regression coefficient, One regression coefficient for each
independent variable,
F : Fropertion of yarlation In dependent —R® + Proportion of variation in depervtes:
variable Y predictable from X. varlable ¥ predictable by set of
independent varisbles (Xs
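A quick numerical sketch of fitting such a model by least squares (numpy only; the data are made up to follow roughly y = 1 + 2·x1 + 0.5·x2) :

    import numpy as np

    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
    y = np.array([4.1, 5.4, 9.0, 10.6, 13.4])

    A = np.column_stack([np.ones(len(X)), X])      # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # [b0, b1, b2]

    print(beta)    # roughly [1, 2, 0.5]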
2.1.3 Lasso and Ridge Regression
* Ridge regression and the lasso are two forms of regularized regression. These methods seek to improve the consequences of multicollinearity.
1. When variables are highly correlated, a large coefficient in one variable may be alleviated by a large coefficient in another variable that is correlated to the former.
2. Regularization imposes an upper threshold on the values taken by the coefficients, thereby producing a more parsimonious solution and a set of coefficients with smaller variance.
* Ridge estimation produces a biased estimator of the true parameter β :

    E[β̂ | X] = (X^T X + λI)^(-1) X^T X β
             = (X^T X + λI)^(-1) (X^T X + λI − λI) β
             = [I − λ (X^T X + λI)^(-1)] β
             = β − λ (X^T X + λI)^(-1) β
* Ridge regression shrinks the regression coefficients by imposing a penalty on their size. The ridge coefficients minimize a penalized residual sum of squares.
* Ridge regression protects against the potentially high variance of gradients estimated in the short directions.
Lasso :
* One significant problem of ridge regression is that the penalty term will never force any of the coefficients to be exactly zero. Thus, the final model will include all p predictors, which creates a challenge in model interpretation. A more modern machine learning alternative is the lasso.
* The lasso works in a similar way to ridge regression, except it uses a different penalty term that shrinks some of the coefficients exactly to zero.
* Lasso : Lasso is a regularized regression machine learning technique that avoids over-fitting of training data and is useful for feature selection.
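The contrast is easy to see numerically; a sketch using scikit-learn's Ridge and Lasso estimators (assuming scikit-learn is installed; the data are synthetic, with only the first two features informative) :

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

    ridge = Ridge(alpha=1.0).fit(X, y)
    lasso = Lasso(alpha=0.1).fit(X, y)

    print(ridge.coef_)   # all five coefficients shrunk, but none exactly zero
    print(lasso.coef_)   # the three uninformative coefficients driven exactly to zero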
2.2 Decision Tree
* A decision tree is a simple representation for classifying examples. Decision tree learning is one of the most successful techniques for supervised classification learning.
+ In decision analysis, a decision tree can be used to visually and explicitly represent
decisions and decision making. As the name goes, it uses a tree-like model of
decisions.
* Learned trees can also be represented as sets of if-then rules to improve human readability.
* A decision tree has two kinds of nodes :
1. Each leaf node has a class label, determined by majority vote of training examples reaching that leaf.
2. Each internal node is a question on features. It branches out according to the answers.
* Decision tree learning is a method for approximating discrete-valued target
functions. The learned function is represented by a decision tree.
* A learned decision tree can also be re-represented as a set of if-then rules.
Decision tree learning is one of the most widely used and practical methods for
inductive inference.
It is robust to noisy data and capable of learning disjunctive expressions.
* Decision tree learning method searches a completely expressive hypothesis space and thus avoids the difficulties of restricted hypothesis spaces.
2.2.1 Decision Tree Representation
* Goal : Build a decision tree for classifying examples as positive or negative instances of a concept.
* Supervised learning, batch processing of training examples, using a preference bias.
* A decision tree is a tree where :
a. Each non-leaf node has associated with it an attribute (feature).
b. Each leaf node has associated with it a classification (+ or −).
c. Each arc has associated with it one of the possible values of the attribute at the node from which the arc is directed.
* An internal node denotes a test on an attribute. A branch represents an outcome of the test. Leaf nodes represent class labels or class distribution.
* A decision tree is a flow-chart-like tree structure, where each node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions. Decision trees can easily be converted to classification rules.
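For instance, a minimal scikit-learn sketch (assuming scikit-learn is installed; the tiny boolean-OR dataset is made up) that learns a tree and prints it in if-then form :

    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]    # two boolean features
    y = [0, 1, 1, 1]                        # label = f0 OR f1

    tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
    print(export_text(tree, feature_names=["f0", "f1"]))   # if-then view of the tree
    print(tree.predict([[1, 0]]))                          # -> [1]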
2.2.2 Decision Tree Algorithm
* To generate a decision tree from the training tuples of data partition D.

Input :
1. Data partition (D)  2. Attribute list  3. Attribute selection method

Algorithm :
1. Create a node (N).
2. If tuples in D are all of the same class then
3. Return node (N) as a leaf node labeled with the class C.
4. If attribute list is empty then return N as a leaf node labeled with the majority class in D.
5. Apply attribute selection method(D, attribute list) to find the "best" splitting criterion.
6. Label node N with the splitting criterion.
7. If the splitting attribute is discrete-valued and multiway splits are allowed,
8. Then attribute list ← attribute list − splitting attribute.
9. For each outcome j of the splitting criterion :
10. Let Dj be the set of data tuples in D satisfying outcome j.
11. If Dj is empty then attach a leaf labeled with the majority class in D to node N.