AS PER NEW SYLLABUS - GTU - SEM - VII (E&C/ELEX.)
Professional Elective - IV

Introduction to Machine Learning
Subject Code: 3171114

* Simplified & Conceptual Approach
* Multiple Choice Questions with Answers

First Edition
I. A. Dhotre
TECHNICAL PUBLICATIONS - An Up-Thrust for Knowledge

SYLLABUS
Introduction to Machine Learning (3171114)

Examination Marks: Theory - ESE (E): 70, PA (M): 30; Practical - ESE (V): 30, PA (I): 20

1. Introduction to Machine Learning
   Introduction, Different Types of Learning, Hypothesis Space, Inductive Bias, Evaluation and Cross Validation. (Chapter - 1)
2. Basic Machine Learning Algorithms
   Linear Regression, Decision Trees, Learning Decision Trees, K-nearest Neighbour, Collaborative Filtering, Overfitting. (Chapter - 2)
3. Dimensionality Reduction
   Feature Selection, Feature Extraction. (Chapter - 3)
4. Bayesian Concept of Learning
   Bayesian Learning, Naive Bayes, Bayesian Network, Exercise on Naive Bayes. (Chapter - 4)
5. Logistic Regression and Support Vector Machine
   Logistic Regression, Introduction to Support Vector Machine, The Dual Formulation, Maximum Margin with Noise, Nonlinear SVM and Kernel Function, SVM: Solution to the Dual Problem. (Chapter - 5)
6. Basics of Neural Network
   Introduction to Neural Network, Multilayer Neural Network, Neural Network and Backpropagation Algorithm, Deep Neural Network. (Chapter - 6)
7. Computation and Ensemble Learning
   Introduction to Computation Learning, Sample Complexity: Finite Hypothesis Space, VC Dimension, Introduction to Ensembles, Bagging and Boosting. (Chapter - 7)
8. Basic Concepts of Clustering
   Introduction to Clustering, K-means Clustering, Agglomerative Hierarchical Clustering. (Chapter - 8)

TABLE OF CONTENTS

Chapter - 1  Introduction to Machine Learning
1.1 Introduction
    1.1.1 How do Machines Learn?
    1.1.2 Well Posed Learning Problem
1.2 Types of Machine Learning
    1.2.1 Supervised Learning
        1.2.1.1 Classification
        1.2.1.2 Regression
    1.2.2 Unsupervised Learning
        1.2.2.1 Clustering
    1.2.3 Reinforcement Learning
        1.2.3.1 Elements of Reinforcement Learning
1.3 Application of Machine Learning
1.4 Hypothesis Space
1.5 Inductive Bias
1.6 Evaluation and Cross Validation
    1.6.1 Evaluating the Performance of a Model
    1.6.2 Concept Learning
    1.6.3 Concept Learning as Search
    1.6.4 Find-S Algorithm
1.7 Multiple Choice Questions with Answers

Chapter - 2  Basic Machine Learning Algorithms
2.1 Linear Regression
    2.1.1 Simple Linear Regression
    2.1.2 Multiple Linear Regression
    2.1.3 Lasso and Ridge Regression
2.2 Decision Tree
    2.2.1 Decision Tree Representation
    2.2.2 Appropriate Problems for Decision Tree Learning
    2.2.3 Advantages and Disadvantages of Decision Trees
2.3 Basic Decision Tree Learning Algorithm
    2.3.1 Which Attribute is "Best"?
    2.3.2 Information Gain
    2.3.3 The ID3 Algorithm
2.4 K-nearest Neighbour
2.5 Collaborative Filtering
    2.5.1 Types of Collaborative Filtering
    2.5.2 Collaborative Filtering Algorithms
    2.5.3 Advantages and Disadvantages of Collaborative Filtering
2.6 Overfitting
2.7 Multiple Choice Questions with Answers

Chapter - 3  Dimensionality Reduction
3.1 Introduction to Dimensionality Reduction
    3.1.1 Advantages and Disadvantages
3.2 Feature and Feature Engineering
3.3 Feature Transformation
    3.3.1 Feature Construction
    3.3.2 Feature Extraction
3.4 Feature Subset Selection
    3.4.1 Issues in High-Dimensional Data
    3.4.2 Key Drivers
    3.4.3 Measures of Feature Relevance and Redundancy
    3.4.4 Overall Feature Selection Process
    3.4.5 Feature Selection Approaches
    3.4.6 Difference between Filter, Wrapper and Embedded Methods
3.5 Multiple Choice Questions with Answers

Chapter - 4  Bayesian Concept of Learning
4.1 Importance of Bayesian Methods
4.2 Bayes Theorem
    4.2.1 Prior and Posterior Probability
    4.2.2 Maximum-Likelihood Estimation
4.3 Bayes' Theorem and Concept Learning
    4.3.1 Consistent Learners
    4.3.2 Bayes Optimal Classifier
    4.3.3 Naive Bayes Classifier
4.4 Bayesian Belief Network
4.5 Fill in the Blanks with Answers

Chapter - 5  Logistic Regression and Support Vector Machine
5.1 Logistic Regression
5.2 Introduction to Support Vector Machine
    5.2.1 Key Properties of Support Vector Machines
    5.2.5 Comparison of SVM and Neural Networks
5.3 Kernel Methods for Non-linearity

Chapter - 6  Basics of Neural Network
6.1 Introduction to Neural Network
    6.1.1 Advantages of Neural Networks
    6.1.2 Applications of Neural Networks
    6.1.3 Difference between Digital Computers and Neural Networks
6.2 Biological Neurons
6.3 Architecture of Neural Network
    6.3.1 Single Layer Feed Forward Network
    6.3.2 Multi-Layer Feed Forward Network
    6.3.3 Recurrent Neural Network
6.4 Implementation of ANN
    6.4.1 McCulloch-Pitts Neuron
    6.4.2 Rosenblatt's Perceptron
    6.4.3 ADALINE Network Model
6.5 Backpropagation Algorithm
    6.5.1 Advantages and Disadvantages
6.6 Deep Learning
6.7 Multiple Choice Questions with Answers

Chapter - 7  Computation and Ensemble Learning
7.1 Introduction to Computation Learning
    7.1.1 Probably Approximately Correct (PAC) Framework
    7.1.2 Mistake Bound Framework
7.2 Sample Complexity: Finite Hypothesis Space
    7.2.1 VC Dimension
    7.2.2 VC for Neural Networks
7.3 Introduction to Ensembles
    7.3.1 Bagging
    7.3.2 Boosting
    7.3.3 Randomization

Chapter - 8  Basic Concepts of Clustering
8.1 Introduction to Clustering
    8.1.1 Partitioning Methods
        8.1.1.1 K-means Clustering
        8.1.1.2 k-Medoids
    8.1.2 Hierarchical Methods
        8.1.2.1 Difference between Clustering and Classification
8.2 Hierarchical Clustering
    8.2.1 Agglomerative Hierarchical Clustering
    8.2.2 Divisive Hierarchical Clustering
    8.2.3 Dendrogram
    8.2.4 Agglomerative Clustering in Scikit-learn
    8.2.5 Connectivity Constraints
8.3 Multiple Choice Questions with Answers

Chapter 1: Introduction to Machine Learning

Syllabus
Introduction, Different Types of Learning, Hypothesis Space, Inductive Bias, Evaluation and Cross Validation.

Contents
1.1 Introduction
1.2 Types of Machine Learning
1.3 Application of Machine Learning
1.4 Hypothesis Space
1.5 Inductive Bias
1.6 Evaluation and Cross Validation
1.7 Multiple Choice Questions

1.1 Introduction
* Machine Learning (ML) is a sub-field of Artificial Intelligence (AI) concerned with developing computational theories of learning and building learning machines.
* Learning is a phenomenon and a process that has manifestations in various aspects.
* The learning process includes acquiring new symbolic knowledge and developing cognitive skills through instruction and practice. It is also the discovery of new facts and theories through observation and experiment.
* Machine Learning definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
* Machine learning is programming computers to optimize a performance criterion using example data or past experience. The application of machine learning methods to large databases is called data mining.
* It is very hard to write programs that solve problems like recognizing a human face. We do not know what program to write because we don't know how our brain does it. Instead of writing a program by hand, it is possible to collect many examples that specify the correct output for a given input.
* A machine learning algorithm then takes these examples and produces a program that does the job. The program produced by the learning algorithm may look very different from a typical hand-written program; it may contain millions of numbers. If we do it right, the program works for new cases as well as for the ones we trained it on.
* The main goal of machine learning is to devise learning algorithms that learn automatically, without human intervention or assistance. The machine learning paradigm can be viewed as "programming by example." Another goal is to develop computational models of the human learning process and perform computer simulations.
* The goal of machine learning is to build computer systems that can adapt and learn from their experience.
* An algorithm is used to solve a problem on a computer. An algorithm is a sequence of instructions that transforms the input to the output. For example, the addition of four numbers is carried out by giving the four numbers as input to the algorithm; the output is their sum. For the same task there may be various algorithms, and we are interested in finding the most efficient one, requiring the least number of instructions, the least memory, or both.
* For some tasks, however, we do not have an algorithm.

Why is Machine Learning Important?
* Machine learning algorithms can figure out how to perform important tasks by generalizing from examples.
* Machine learning provides business insight and intelligence. Decision makers are provided with greater insight into their organizations. This adaptive technology is being used by global enterprises to gain a competitive edge.
* Machine learning algorithms discover the relationships between the variables of a system (input, output and hidden) from direct samples of the system.
* Following are some of the reasons:
  1. Some tasks cannot be defined well except by examples, e.g. recognizing people.
  2. Relationships and correlations can be hidden within large amounts of data. Machine learning and data mining may be able to find these relationships.
  3. Human designers often produce machines that do not work as well as desired in the environments in which they are used.
  4. The amount of knowledge available about certain tasks might be too large for explicit encoding by humans.
  5. Environments change over time.
  6. New knowledge about tasks is constantly being discovered by humans.
* Machine learning also helps us find solutions to many problems in computer vision, speech recognition and robotics. Machine learning uses the theory of statistics in building mathematical models, because the core task is making inferences from a sample.

How Do Machines Learn?
* Machine learning typically follows three phases:
  1. Training: A training set of examples of correct behavior is analyzed, and some representation of the newly learnt knowledge is stored. This is often some form of rules.
  2. Validation: The rules are checked and, if necessary, additional training is given. Sometimes additional test data are used; alternatively, a human expert or some other automatic knowledge-based component may validate the rules. The role of the tester is often called the opponent.
  3. Application: The rules are used in responding to some new situation.

1.1.1 How do Machines Learn?
* The machine learning process is divided into three parts: data input, abstraction and generalization. Fig. 1.1.2 shows the machine learning process.

[Fig. 1.1.2: Machine learning process - data input, abstraction and generalization]

* Data input: Information used for future decision making.
* Abstraction: Input data is represented in a broader way through the underlying algorithm.
* Generalization: The abstracted knowledge is applied beyond the training data, to new cases.
* An algorithm is used to solve a problem on a computer; it is a sequence of instructions that transforms the input to the output. For example, the addition of four numbers is carried out by giving the four numbers as input to the algorithm; the output is their sum.
* For the same task there may be various algorithms. We are interested in finding the most efficient one, requiring the least number of instructions, the least memory, or both.

Abstraction
* During the machine learning process, knowledge is fed in the form of input data. The collected data is raw data; it cannot be used directly for processing.
* A model, in the machine learning paradigm, is a summarized knowledge representation of the raw data. The model may be in any one of the following forms:
  1. Mathematical equations.
  2. Specific data structures like trees.
  3. Logical groupings of similar observations.
  4. Computational blocks.
* The choice of the model used to solve a specific learning problem is a human task. Some of the parameters considered are:
  a) Type of problem to be solved.
  b) Nature of the input data.
  c) Problem domain.

1.1.2 Well Posed Learning Problem
* Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
* A (machine learning) problem is well-posed if a solution to it exists, if that solution is unique, and if that solution depends on the data/experience but is not sensitive to (reasonably small) changes in the data/experience.
* Identify three features, as follows:
  1. The class of tasks.
  2. The measure of performance to be improved.
  3. The source of experience.
* What are T, P and E? How do we formulate a machine learning problem?
* A robot driving learning problem:
  1. Task T: Driving on public, four-lane highways using vision sensors.
  2. Performance measure P: Average distance traveled before an error (as judged by a human overseer).
  3. Training experience E: A sequence of images and steering commands recorded while observing a human driver.
* A handwriting recognition learning problem:
  1. Task T: Recognizing and classifying handwritten words within images.
  2. Performance measure P: Percent of words correctly classified.
  3. Training experience E: A database of handwritten words with given classifications.
* A text categorization problem:
  1. Task T: Assign a document to its content category.
  2. Performance measure P: Precision and recall.
  3. Training experience E: Example pre-classified documents.

1.2 Types of Machine Learning
* Learning is constructing or modifying representations of what is being experienced. To learn means to get knowledge of something by study, experience or being taught.
* Machine learning is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data, such as data from sensors or databases. Machine learning is usually divided into three types: supervised, unsupervised and reinforcement learning.
* Why do machine learning?
  1. To understand and improve the efficiency of human learning.
  2. To discover new things or structure that is unknown to humans.
  3. To fill in skeletal or incomplete specifications about a domain.

[Fig. 1.2.1: Types of machine learning - supervised learning (classification, regression), unsupervised learning (clustering, association analysis) and reinforcement learning]

1.2.1 Supervised Learning
* Supervised learning is the machine learning task of inferring a function from supervised training data. The training data consist of a set of training examples. The task of the supervised learner is to predict the output behaviour of a system for any set of input values, after an initial training phase.
* In supervised learning the network is trained by providing it with input and matching output patterns. These input-output pairs are usually provided by an external teacher.
* Human learning is based on past experiences. A computer does not have experiences; a computer system learns from data, which represent some "past experiences" of an application domain.
* The goal is to learn a target function that can be used to predict the values of a discrete class attribute, e.g. approved or not-approved, and high-risk or low-risk. The task is commonly called supervised learning, classification or inductive learning.
* Training data include both the input and the desired results. For some examples the correct results (targets) are known and are given as input to the model during the learning process. The construction of proper training, validation and test sets is crucial. These methods are usually fast and accurate.
* The model has to be able to generalize: give the correct results when new data are given as input, without knowing the target a priori.
* In supervised learning, each example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which is called a classifier or a regression function. Fig. 1.2.2 shows the supervised learning process.
[Fig. 1.2.2: Supervised learning process - training data and a learning algorithm produce a model, which is then tested on new data]

* The learned model helps the system perform the task better as compared to no learning.
* Each input vector requires a corresponding target vector:
  Training pair = (input vector, target vector)

[Fig. 1.2.3: A training pair presented to the network]

* Supervised learning denotes a method in which some input vectors are collected and presented to the network. The output computed by the network is observed, and the deviation from the expected answer is measured. The weights are corrected according to the magnitude of the error, in the way defined by the learning algorithm.
* Supervised learning is further divided into methods which use reinforcement or error correction. The perceptron learning algorithm is an example of supervised learning with reinforcement.
* In order to solve a given problem of supervised learning, the following steps are performed:
  1. Find out the type of training examples.
  2. Collect a training set.
  3. Determine the input feature representation of the learned function.
  4. Determine the structure of the learned function and the corresponding learning algorithm.
  5. Complete the design, then run the learning algorithm on the collected training set.
  6. Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set.

1.2.1.1 Classification
* Classification predicts categorical labels (classes); prediction models continuous-valued functions. Classification is considered to be supervised learning.
* Classification builds a model based on the training set and the values in a classifying attribute, and uses it in classifying new data. Prediction models continuous-valued functions, i.e. it predicts unknown or missing values.
* Preprocessing of the data in preparation for classification and prediction can involve data cleaning to reduce noise or handle missing values, relevance analysis to remove irrelevant or redundant attributes, and data transformation, such as generalizing the data to higher-level concepts or normalizing the data.
* Fig. 1.2.4 shows classification.

[Fig. 1.2.4: Classification - a classifier assigns class labels to new samples]

* Aim: To predict categorical class labels for new samples, based on the training set and the class labels.
* Input: A training set of samples, each with a class label.
* Prediction is similar to classification. It constructs a model and uses the model to predict unknown or missing values.
* Classification is the process of finding a model that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data.
* Classification and prediction may need to be preceded by relevance analysis, which attempts to identify attributes that do not contribute to the classification or prediction process.
* Numeric prediction is the task of predicting continuous values for a given input. For example, we may wish to predict the salary of a college employee with 15 years of work experience, or the potential sales of a new product given its price.
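The six-step procedure and the classification task above map naturally onto a library workflow. The following sketch is not from the original text; it is an illustration using scikit-learn, where the built-in Iris dataset and the k-nearest-neighbour model are choices made purely for the example:

```python
# A minimal supervised classification sketch with scikit-learn.
# The Iris dataset and the k-NN model are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)            # steps 1-3: labelled examples and features

# Hold out a separate test set so accuracy is measured on unseen data (step 6).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = KNeighborsClassifier(n_neighbors=5)  # step 4: structure of the learned function
model.fit(X_train, y_train)                  # step 5: run the learning algorithm

y_pred = model.predict(X_test)               # classify new samples
print("Test accuracy:", accuracy_score(y_test, y_pred))
```

The key design point the sketch illustrates is the separation of training and test sets: the accuracy printed at the end is computed only on examples the model never saw during fitting.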
* Some of the classification methods, like back-propagation, support vector machines and k-nearest-neighbour classifiers, can also be used for prediction.

1.2.1.2 Regression
* For an input x, if the output is continuous, this is called a regression problem. For example, based on historical information of demand for toothpaste in your supermarket, you are asked to predict the demand for the next month.
* Regression is concerned with the prediction of continuous quantities. Linear regression is the oldest and most widely used predictive model in the field of machine learning. The goal is to minimize the sum of the squared errors to fit a straight line to a set of data points.
* For regression tasks, the typical accuracy metrics are Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). These metrics measure the distance between the predicted numeric target and the actual numeric answer.

Regression Line
* Least squares: The least squares regression line is the line that makes the sum of squared residuals as small as possible. Linear means "straight line".
* The regression line is the line which gives the best estimate of one variable from the value of any other given variable. The regression line gives the average relationship between the two variables in mathematical form. For two variables X and Y, there are always two lines of regression.
* Regression line of X on Y: gives the best estimate for the value of X for any specific given value of Y:
  X = a + bY
  where a = X-intercept, b = slope of the line, X = dependent variable and Y = independent variable.
* Regression line of Y on X: gives the best estimate for the value of Y for any specific given value of X:
  Y = a + bX
  where a = Y-intercept, b = slope of the line, Y = dependent variable and X = independent variable.
* By using the least squares method (a procedure that minimizes the vertical deviations of plotted points surrounding a straight line) we are able to construct a best-fitting straight line through the scatter-diagram points, and then formulate a regression equation in the form:
  ŷ = a + bx,   ŷ = ȳ + b(x − x̄)

[Fig. 1.2.5: Simple linear regression - y_i = β₀ + β₁x_i + ε_i, with population y-intercept β₀, population slope β₁, random error ε, dependent variable y and independent variable x]

* Regression analysis is the art and science of fitting straight lines to patterns of data. In a linear regression model, the variable of interest (the "dependent" variable) is predicted from k other variables (the "independent" variables) using a linear equation. If Y denotes the dependent variable and X₁, ..., Xₖ are the independent variables, then the assumption is that the value of Y at time t in the data sample is determined by the linear equation:
  Y_t = β₀ + β₁X₁ₜ + β₂X₂ₜ + ... + βₖXₖₜ + ε_t
  where the betas are constants and the epsilons are independent and identically distributed normal random variables with mean zero.
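The least-squares coefficients a and b can be computed directly from the definitions above. The following NumPy sketch is an illustration (the data points are invented for the example); it uses the standard closed-form estimates b = cov(x, y)/var(x) and a = ȳ − b·x̄, and also reports the RMSE and MAPE metrics mentioned earlier:

```python
# Least-squares fit of the regression line y = a + b*x (illustrative data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates of slope and intercept.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x                                    # predictions on the line

rmse = np.sqrt(np.mean((y - y_hat) ** 2))            # Root Mean Square Error
mape = np.mean(np.abs((y - y_hat) / y)) * 100        # Mean Absolute Percentage Error
print(f"a = {a:.3f}, b = {b:.3f}, RMSE = {rmse:.3f}, MAPE = {mape:.2f}%")
```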
[Fig. 1.2.6: A linear model computed as a weighted sum of the input vector components plus a bias term w₀]

* In a regression tree the idea is this: since the target variable does not have classes, we fit a regression model to the target variable using each of the independent variables. Then, for each independent variable, the data is split at several split points.
* At each split point, the "error" between the predicted value and the actual values is squared to get a "Sum of Squared Errors (SSE)". The split-point errors across the variables are compared, and the variable/point yielding the lowest SSE is chosen as the root node/split point. This process is continued recursively.
* The error function measures how much our predictions deviate from the desired answers. The mean squared error is:
  J_n = (1/n) Σᵢ₌₁..ₙ (yᵢ − f(xᵢ))²
* Multiple linear regression is an extension of linear regression which allows a response variable, y, to be modeled as a linear function of two or more predictor variables.

Evaluating a Regression Model
* Assume we want to predict a car's price using some features such as dimensions, horsepower, engine specification, mileage etc. This is a typical regression problem, where the target variable (price) is a continuous numeric value.
* We can fit a simple linear regression model that, given the feature values of a certain car, can predict the price of that car. This regression model can be used to score the same dataset we trained on. Once we have the predicted prices for all of the cars, we can evaluate the performance of the model by looking at how much the predictions deviate from the actual prices on average.
* Advantages:
  a. Training a linear regression model is usually much faster than methods such as neural networks.
  b. Linear regression models are simple and require minimum memory to implement.
  c. By examining the magnitude and sign of the regression coefficients you can infer how predictor variables affect the target outcome.

Assessing Performance of Regression - Error Measures
* The training error is the mean error over the training sample. The test error is the expected prediction error over an independent test sample.
* Fig. 1.2.7 shows the relationship between the training set and the test set: the training set is used to create the model, and the test set is used to estimate its accuracy.
* Unlike decision trees, regression trees and model trees are used for prediction. In regression trees, each leaf stores a continuous-valued prediction. In model trees, each leaf holds a regression model.

1.2.2 Unsupervised Learning
* The model is not provided with the correct results during training. It can be used to cluster the input data into classes on the basis of their statistical properties only.
* Cluster significance and labeling: the labeling can be carried out even if the labels are only available for a small number of objects representative of the desired classes.
* All similar input patterns are grouped together as clusters. If a matching pattern is not found, a new cluster is formed. There is no error feedback.
* An external teacher is not used; learning is based only upon local information. It is also referred to as self-organization.
* These methods are called unsupervised because they do not need a teacher or supervisor to label a set of training examples; only the original data is required to start the analysis.
* In contrast to supervised learning, unsupervised or self-organized learning does not require an external teacher.
* During the training session, the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories.
* Unsupervised learning algorithms aim to learn rapidly and can be used in real time. Unsupervised learning is frequently used for data compression, feature extraction etc.
* Another mode of learning, called recording, described by Zurada, is typically employed for associative memory networks. An associative memory network is designed by recording several ideal patterns into the network's stable states.

1.2.2.1 Clustering
* Clustering of data is a method by which large sets of data are grouped into clusters of smaller sets of similar data. Clustering can be considered the most important unsupervised learning problem; it deals with finding a structure in a collection of unlabeled data.
* A cluster is therefore a collection of objects which are "similar" among themselves and are "dissimilar" to the objects belonging to other clusters. Fig. 1.2.8 shows clusters.

[Fig. 1.2.8: Clusters - raw data grouped into clusters of similar points]

* In this case we easily identify the four clusters into which the data can be divided; the similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance (in this case geometrical distance). This is called distance-based clustering.
* Clustering means grouping data, or dividing a large data set into smaller data sets with some similarity. A clustering algorithm attempts to find natural groups of components or data based on some similarity. The clustering algorithm also finds the centroid of each group of data.
* To determine cluster membership, most algorithms evaluate the distance between a point and the cluster centroids. The output from a clustering algorithm is basically a statistical description of the cluster centroids, with the number of components in each cluster.
* Cluster centroid: The centroid of a cluster is a point whose parameter values are the mean of the parameter values of all the points in the cluster. Each cluster has a well-defined centroid.
* Centroid distance: The distance between two points is taken as a common metric to assess the similarity among the components of a population. The commonly used distance measure is the Euclidean metric, which defines the distance between two points p = (p₁, p₂, ...) and q = (q₁, q₂, ...) as:
  d = sqrt( Σᵢ (pᵢ − qᵢ)² )
* The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how do we decide what constitutes a good clustering? It can be shown that there is no absolute "best" criterion which would be independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering will suit their needs.
* Clustering analysis helps construct meaningful partitionings of a large set of objects. Cluster analysis has been widely used in numerous applications, including pattern recognition, data analysis, image processing etc.
* Clustering algorithms may be classified as:
  1. Exclusive clustering
  2. Overlapping clustering
  3. Hierarchical clustering
* A good clustering method will produce high-quality clusters with high intra-class similarity and low inter-class similarity.
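A short sketch can make distance-based clustering concrete. The following is an illustration (the two-blob data and the choice of k = 2 are assumptions for the example); it computes the Euclidean metric defined above and finds cluster centroids with scikit-learn's k-means:

```python
# Distance-based clustering sketch: Euclidean distance and k-means centroids.
import numpy as np
from sklearn.cluster import KMeans

def euclidean(p, q):
    """d(p, q) = sqrt(sum_i (p_i - q_i)^2), the metric used in the text."""
    return np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2))

# Two illustrative blobs of 2-D points, far enough apart to form clear clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
               rng.normal(4.0, 0.5, (20, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Centroids:\n", km.cluster_centers_)
print("Distance of first point to its own centroid:",
      euclidean(X[0], km.cluster_centers_[km.labels_[0]]))
```

Cluster membership here is decided exactly as the text describes: each point is assigned to the centroid to which its Euclidean distance is smallest.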
* The quality of a clustering result depends on both the similarity measure used by the method and its implementation. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.

Examples of Clustering Applications
1. Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs.
2. Land use: Identification of areas of similar land use in an earth observation database.
3. Insurance: Identifying groups of motor insurance policy holders with a high average claim cost.
4. Urban planning: Identifying groups of houses according to their house type, value and geographical location.
5. Seismology: Observed earthquake epicenters should be clustered along continent faults.

1.2.3 Reinforcement Learning
* The user gets immediate feedback in supervised learning and no feedback in unsupervised learning; in reinforcement learning, the feedback is a delayed scalar reward. Reinforcement learning is learning what to do and how to map situations to actions. The learner is not told which actions to take. Fig. 1.2.9 shows the concept of reinforcement learning.

[Fig. 1.2.9: Reinforcement learning - an agent acts on its environment and receives state and reward feedback]

* Reinforcement learning deals with agents that must sense and act upon their environment. It combines artificial intelligence and machine learning techniques. It allows machines and software agents to automatically determine the ideal behaviour within a specific context, in order to maximize performance. Simple reward feedback is required for the agent to learn its behaviour; this is known as the reinforcement signal.
* The two most important distinguishing features of reinforcement learning are trial-and-error search and delayed reward.
* With reinforcement learning algorithms, an agent can improve its performance by using the feedback it gets from the environment. This environmental feedback is called the reward signal.
* Based on accumulated experience, the agent needs to learn which action to take in a given situation in order to obtain a desired long-term goal. Essentially, actions that lead to long-term rewards need to be reinforced. Reinforcement learning has connections with control theory, Markov decision processes and game theory.
* Example of reinforcement learning: A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on how quickly and easily it has been able to find the recharger in the past.

1.2.3.1 Elements of Reinforcement Learning
* Reinforcement learning elements are as follows (Fig. 1.2.10 shows how they fit together):
  1. Policy
  2. Reward function
  3. Value function
  4. Model of the environment

[Fig. 1.2.10: Elements of reinforcement learning - the agent's policy maps environment states to actions, guided by rewards and values]

* Policy: The policy defines the learning agent's behaviour for a given time period. It is a mapping from perceived states of the environment to the actions to be taken when in those states.
* Reward function: The reward function is used to define the goal in a reinforcement learning problem. It maps each perceived state of the environment to a single number.
* Value function: Value functions specify what is good in the long run. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
* Model of the environment: Models are used for planning.
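The trial-and-error idea can be made concrete with a tiny sketch. The following epsilon-greedy multi-armed bandit is an illustration written for this section, not part of the original text; the three reward probabilities are invented. It shows an agent improving its value estimates purely from scalar reward feedback:

```python
# Minimal trial-and-error learning: an epsilon-greedy multi-armed bandit.
# The reward probabilities are invented for illustration.
import random

true_reward_prob = [0.2, 0.5, 0.8]   # hidden from the agent
value = [0.0] * 3                    # agent's value estimate per action
count = [0] * 3
epsilon = 0.1                        # exploration rate

random.seed(0)
for step in range(10_000):
    # Explore with probability epsilon, otherwise exploit the best estimate.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: value[a])
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    count[action] += 1
    # Incremental average: the scalar reward updates the value estimate.
    value[action] += (reward - value[action]) / count[action]

print("Learned values:", [round(v, 2) for v in value])  # approaches [0.2, 0.5, 0.8]
```

The `value` list plays the role of the value function, the epsilon-greedy rule plays the role of the policy, and the environment is summarized by the hidden reward probabilities.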
* Credit assignment problem: Reinforcement learning algorithms learn to generate an internal value for the intermediate states, reflecting how good they are in leading to the goal.
* The learning decision maker is called the agent. The agent interacts with the environment, which includes everything outside the agent. The agent has sensors to decide on its state in the environment and takes an action that modifies its state.
* The reinforcement learning problem model is an agent continuously interacting with an environment. The agent and the environment interact in a sequence of time steps. At each time step t, the agent receives the state of the environment and a scalar numerical reward for the previous action, and then the agent selects an action.
* Reinforcement learning is a technique for solving Markov decision problems. Reinforcement learning uses a formal framework defining the interaction between a learning agent and its environment in terms of states, actions and rewards. This framework is intended to be a simple way of representing essential features of the artificial intelligence problem.

Difference between Supervised, Unsupervised and Reinforcement Learning

| Supervised learning | Unsupervised learning | Reinforcement learning |
| Requires that the target variable is well defined and that a sufficient number of its values are given. | Typically either the target variable is unknown, or it has only been recorded for too small a number of cases. | The learner is not told which actions to take. |
| Deals with two main tasks: regression and classification. | Deals with clustering and associative rule mining problems. | Deals with exploitation versus exploration, Markov decision processes, policy learning, deep learning and value learning. |
| The input data is labelled. | Uses unlabelled data. | The data is not predefined. |
| Learns by using labelled data. | Trained using unlabelled data, without any guidance. | Works by interacting with the environment. |
| Maps the labelled inputs to the known outputs. | Understands patterns and discovers the output. | Follows the trial-and-error method. |
It contains predefined components like nose, mouth, eye, ears. Every person face is a pattern composed of a particular combination of the features. By analyzing sample face images of a person, a learning program captures the pattern specific to that person and uses it to recognize if a new real face or new image belongs to this specific person or not. * Machine learning: algorithm creates an optimized model of the concept being Jearned based on data or past experience. Healthcare : * With the advent of wearable sensors and devices that use data to access health of a patient in real time, ML is becoming a fast-growing trend in healthcare. * Sensors in wearable provide real-time patient information, such as overall health condition, heartbeat, blood pressure and other vital parameters. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge Scanned With CamScanner ieppeucnan ts Mantine Leeming 1-20 Introduction to Machina Leaning and medical experts can use this information to analyse the health Doctors 4 . al, draw a pattern from the patient history and predict the condition of an individu occurrence of any ailments in the future. + The technology also empowers medical experts to analyze data to identify trends that facilitate better diagnoses and treatment. Financial services : = «+ Companies in the financial sector are able to identify key insights in financial data as well as prevent any occurrences of financial fraud, with the help of machine learning technology. © The technology is also used to identify opportunities for investments and trade, © Usage of cyber surveillance helps in identifying those individuals or institutions which are prone to financial risk and take necessary actions in time to prevent fraud. EEE hypothesis Space © Hypothesis represents a function approximation for the target function. It is used to associate/estimate or predict the target value Y, based on the input dataset, X, model parameters, and hyper-parameters. It is represented using the letter, h. The hypothesis is also referred to as a model. The hypothesis can be represented as Y = h(X). Fig: 14.1 shows diagram representing the hypothesis. Fig. 1.4.1 Diagram representing the hypothesis. If H comprises all possible subsets of X, we cannot lea anything new beyond the training data in D, because the labels ¢(x) of instances x outside D can independently and arbitrarily be 0 or 1. That is, we have no inductive bias. = Hypothesis space represents one or more hypothesis or function approximation or models which can be created using different training data set derived from the population. The different hypothesis or models is created using a combination of different training data set derived from the same population, features and hyper parameters. One or more hypothesis or functions can also be said to be part of what can be called as hypothesis class. Fig. 1.4.2 shows hypothesis class. TECHNICAL PUBLICATIONS® ~ an up-thrust for knowledge Scanned with CamScanner Inlroducfian to Machine Learning 4-24 Intreductian to Machine Leerning | Hypothesis . yoo [+ xh |- Y Fig. 1.4.2 Hypothesis class ‘Learning algorithm is not same as the hypothesis or function approximation. A learning algorithm for a concept learning problem is given a set D of training examples, and it returns a hypothesis h. * Search space : The space of all feasible solutions is called search space. Each point in the search space represent one feasible solution. Each feasible solution can be "marked" by its value or fitness for the problem. 
* If we are solving some problem, we are usually looking for some solution, which will be the best among others. * We are looking for our solution, which is one point or more among feasible solutions, that is one point in the search space. ‘The looking for a solution is then equal to a looking for some extreme (minimum or maximum) in the search space. The search space can be whole known by the time of solving a problem, but usually we know only a few points from it and we are generating other points as the process of finding solution continues. © Genetic algorithm employs a randomized beam search method to seek a maximally fit hypothesis. Motivation : © The solutions(s) to machine learning tasks are often called hypotheses, because they can be expressed as a hypothesis that the observed positives and negatives for a categorization is explained by the concept leamed for the solution. « The hypothesis have to be represented in some representation scheme and as usual with AI tasks, their will have a big effect on many aspects of the learning methods. * General definition of a hypothesis : "A hypothesis is a statement of a relationship between two or more variables”. * Sometimes, it is necessary to evaluate the performance of learned hypotheses. TECHNICAL PUBLICATIONS® - en up-thrust for knowledge Scanned with CamScanner Introauetion to Machine Leaming fom Introduction to Mactirns Reason for using hypotheses + Learning from a limited + size database indicating the effectiveness of medical treatments, it is important to understand as precisely as Possible accuracy of the leamed hypotheses. = © The evaluating hypotheses are an integral component of many learning methods + It is important to understand the likely errors inherent in estimating the of the pruned and unpnined tree. Estimating the accuracy of a hypothesis is relatively straightforward shen dais; plentiful. 7 « An estimator is any random variable used to estimate some parameter of underlying population from which a sample is drawn. — 1. The estimation bias of an estimator Y for an arbitrary parameter p is EY] _ If the estimation bias is 0, then Y is an unbiased estimator for p. 7 2 The variance of an estimator Y for an atbitrary parameter p is simply & variance of Y. Looe EEX inductive Bias . ce cream | Elimination algorthin will converge toward the ime ieee concept provided it is given accurate training examples provided fis ini ‘hypothesis space contains the target concept. an oo ‘What if the target concept is not contained in the hypothesis space ? * Can we avoid this difficuly 5 : Possible hypothesis 7 fy By using 2 hypothesis space that inchudes evey accuracy a Scanned with CamScanner Introduction to Machine Leaming introduction to Machine Leaming Forecast Enjoy Sport| | Bam Sky Ais Temp Humidity Wind Water be Sunny ‘Warm fish Strong, Cool Change YRS 4] % Cloudy Wem N ul Strong Cool Change YES 3 Doral secs: Coot Change NO | « From first two examples : $2: * This is inconsistent with third examples, and there are no hypotheses consistent with these three examples PROBLEM : We have biased the learner to consider only conjunctive hypotheses. We require a more expressive hypothesis space. + The obvious solution to the problem of assuring that the target concept is in the hypothesis space H is to provide a hypothesis space capable of representing every teachable concept. 
Inductive Bias - Fundamental Property of Inductive Inference
* A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
* Inductive leap: A learner should be able to generalize training data using prior assumptions in order to classify unseen instances.
* This generalization is known as an inductive leap, and our prior assumptions are the inductive bias of the learner.
* The inductive bias (prior assumptions) of the Candidate-Elimination algorithm is that the target concept can be represented by a conjunction of attribute values, the target concept is contained in the hypothesis space, and the training examples are correct.

Inductive Bias - Formal Definition
* Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary concept defined over X, and let Dc = {<x, c(x)>} be an arbitrary set of training examples of c.
* Let L(xᵢ, Dc) denote the classification assigned to the instance xᵢ by L after training on the data Dc.
* The inductive bias of L is any minimal set of assertions B such that, for any target concept c and corresponding training examples Dc, the following formula holds:
  (∀ xᵢ ∈ X) [ (B ∧ Dc ∧ xᵢ) ⊢ L(xᵢ, Dc) ]

Three Learning Algorithms
* ROTE-LEARNER: Learning corresponds simply to storing each observed training example in memory. Subsequent instances are classified by looking them up in memory: if the instance is found in memory, the stored classification is returned; otherwise, the system refuses to classify the new instance.
  Inductive bias: No inductive bias.
* CANDIDATE-ELIMINATION: New instances are classified only in the case where all members of the current version space agree on the classification. Otherwise, the system refuses to classify the new instance.
  Inductive bias: The target concept can be represented in its hypothesis space.
* FIND-S: This algorithm, described in Section 1.6.4, finds the most specific hypothesis consistent with the training examples. It then uses this hypothesis to classify all subsequent instances.
  Inductive bias: The target concept can be represented in its hypothesis space, and all instances are negative instances unless the opposite is entailed by its other knowledge.

1.6 Evaluation and Cross Validation
* Cross-validation is a technique for estimating performance by training several machine learning models on subsets of the available input data and evaluating them on the complementary subsets of the data. Use cross-validation to detect overfitting, i.e. failing to generalize a pattern.
* In general, machine learning involves deriving models from data, with the aim of achieving some kind of desired behaviour, e.g. prediction or classification. Fig. 1.6.1 shows cross-validation.

[Fig. 1.6.1: Cross-validation - the dataset is permuted and split into training, validation and testing portions]

* But this generic task is broken down into a number of special cases. When training is done, the data that was removed can be used to test the performance of the learned model on "new" data. This is the basic idea for a whole class of model evaluation methods called cross-validation.
* Types of cross-validation methods are holdout, k-fold and leave-one-out. The holdout method is the simplest kind of cross-validation.
* In the holdout method, the data set is separated into two sets, called the training set and the testing set. The function approximator fits a function using the training set only.
* K-fold cross-validation is one way to improve over the holdout method. The data set is divided into k subsets, and the holdout method is repeated k times.
* Each time, one of the k subsets is used as the test set and the other k − 1 subsets are put together to form a training set. Then the average error across all k trials is computed.
* Leave-one-out cross-validation is k-fold cross-validation taken to its logical extreme, with k equal to N, the number of data points in the set.
* That means that, N separate times, the function approximator is trained on all the data except for one point, and a prediction is made for that point.
* Cross-validation ensures non-overlapping test sets.

K-fold cross-validation
* In this technique, k − 1 folds are used for training and the remaining one is used for testing, as shown in Fig. 1.6.2.

[Fig. 1.6.2: K-fold cross-validation - in each of the k experiments, a different fold of the examples serves as the test set]

* The advantage is that the entire data set is used for both training and testing. The error rate of the model is the average of the error rates of the individual iterations.
* This technique can also be called a form of the repeated holdout method. The error rate could be improved by using a stratification technique.
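A k-fold cross-validation sketch with scikit-learn follows; the dataset and the decision-tree model are illustrative assumptions, and k = 5 is chosen arbitrarily for the example:

```python
# 5-fold cross-validation: each fold serves once as the test set,
# and the reported score is the average across folds.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print("Per-fold accuracy:", scores.round(3))
print("Mean accuracy:", scores.mean().round(3))
```

Setting `n_splits` equal to the number of samples would turn this into leave-one-out cross-validation, the extreme case described above.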
1.6.1 Evaluating the Performance of a Model
* Classification is a major task of supervised learning. The responsibility of the classification model is to assign a class label to the target feature based on the values of the predictor features. When performing classification predictions, there are four types of outcomes that can occur.
* The evaluation measures in classification problems are defined from a matrix with the numbers of examples correctly and incorrectly classified for each class, named the confusion matrix. The confusion matrix is also called a contingency table.
  1. True positives occur when you predict that an observation belongs to a class and it actually does belong to that class.
  2. True negatives occur when you predict that an observation does not belong to a class and it actually does not belong to that class.
  3. False positives occur when you predict that an observation belongs to a class when in reality it does not.
  4. False negatives occur when you predict that an observation does not belong to a class when in fact it does.
* The confusion matrix goes deeper than classification accuracy by showing the correct and incorrect (i.e. true or false) predictions for each class. In the case of a binary classification task, the confusion matrix is a 2 × 2 matrix. If there are three different classes, it is a 3 × 3 matrix, and so on.
* For any classification model, model accuracy is given by the total number of correct classifications (true positives or true negatives) divided by the total number of classifications done:
  Accuracy rate = (TP + TN) / (TP + TN + FP + FN)
* The complement of the accuracy rate is the error rate, which evaluates a classifier by its percentage of incorrect predictions:
  Error rate = (FN + FP) / (TP + TN + FP + FN)
  Error rate = 1 − Accuracy rate
* Recall measures how many of the actual positive cases are predicted as positive. The specificity is a statistical measure of how well a binary classification test correctly identifies the negative cases:
  Recall (R) = TP / (TP + FN)
  Specificity = TN / (TN + FP)
* The True Positive Rate (TPR) is also called sensitivity, hit rate and recall:
  Sensitivity = TP / (TP + FN)
* Precision measures how good our model is when the prediction is positive:
  Precision (P) = TP / (TP + FP)
* The focus of precision is positive predictions; it indicates how many positive predictions are true.
* The F1 score is the weighted (harmonic) average of precision and recall:
  F1 = 2PR / (P + R)
* The F1 score is a more useful measure than accuracy for problems with an uneven class distribution, because it takes into account both false positives and false negatives.
* The kappa value of a model indicates the adjusted model accuracy:
  Kappa = (Total accuracy − Random accuracy) / (1 − Random accuracy)
* Total accuracy is simply the sum of true positives and true negatives, divided by the total number of items:
  Total accuracy = (TP + TN) / (TP + TN + FP + FN)
* Random accuracy is defined as the sum of the products of reference likelihood and result likelihood for each class:
  Random accuracy = (Actual False × Predicted False + Actual True × Predicted True) / (Total × Total)
* In terms of true/false positives and negatives, random accuracy can be written as:
  Random accuracy = ((TN + FP)(TN + FN) + (FN + TP)(FP + TP)) / (Total × Total)

Example 1.6.1: Consider a three-class confusion matrix. Calculate the precision and recall per class; also calculate the weighted average precision and recall for the classifier.
Solution: Calculate the per-class precision and recall:
  First class:  precision = 15/24 = 0.63, recall = 15/20 = 0.75
  Second class: precision = 15/20 = 0.75, recall = 15/30 = 0.50
  Third class:  precision = 0.8, recall = 0.9

Example 1.6.2: Calculate accuracy, precision and recall for the following confusion matrix:

              Predicted + | Predicted −
  Actual + :       6      |     15
  Actual − :       1      |     42

Solution:
  Accuracy  = (6 + 42) / (6 + 15 + 1 + 42) = 48/64 = 75 %
  Precision = 6 / (6 + 1) = 0.8571
  Recall    = 6 / (6 + 15) = 0.2857

Example 1.6.3: Calculate the true negative rate, accuracy and precision for the following confusion matrix:

              Predicted + | Predicted −
  Actual + :      50      |     25
  Actual − :       5      |     20

Solution:
  Accuracy  = (50 + 20) / (50 + 25 + 5 + 20) = 70/100 = 0.70
  Precision = 50 / (50 + 5) = 0.9090
  The true negative rate is also called specificity: True negative rate = 20 / (20 + 5) = 0.8
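The formulas above can be checked with a few lines of Python. This sketch simply plugs in the counts from Example 1.6.2 (as reconstructed above) and recomputes each metric:

```python
# Verifying Example 1.6.2 numerically from its confusion-matrix counts.
tp, fn, fp, tn = 6, 15, 1, 42

accuracy    = (tp + tn) / (tp + tn + fp + fn)    # 48/64 = 0.75
precision   = tp / (tp + fp)                     # 6/7   = 0.8571
recall      = tp / (tp + fn)                     # 6/21  = 0.2857
specificity = tn / (tn + fp)                     # true negative rate
f1          = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} specificity={specificity:.4f} f1={f1:.4f}")
```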
• An ROC plot plots the true positive rate on the Y-axis against the false positive rate on the X-axis; a single contingency table corresponds to a single point in an ROC plot.
• The performance of a ranker can be assessed by drawing a piecewise linear curve in an ROC plot, known as an ROC curve. The curve starts in (0, 0), finishes in (1, 1) and is monotonically non-decreasing in both axes.
• ROC graphs are a useful technique for organizing classifiers and visualizing their performance. They are especially useful for domains with skewed class distribution and unequal classification error costs.
• They allow one to create an ROC curve and a complete sensitivity/specificity report. The ROC curve is a fundamental tool for diagnostic test evaluation.
• In an ROC curve the true positive rate (sensitivity) is plotted as a function of the false positive rate for different cut-off points of a parameter.
• Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve is a measure of how well a parameter can distinguish between two diagnostic groups.
• Each point on an ROC curve connecting two segments corresponds to the true and false positive rates achieved on the same test set by the classifier obtained from the ranker by splitting the ranking between those two segments.
• An ROC curve is convex if the slopes are monotonically non-increasing when moving along the curve from (0, 0) to (1, 1). A concavity in an ROC curve, i.e., two or more adjacent segments with increasing slopes, indicates a locally worse than random ranking.
• True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows :

True Positive Rate TPR = TP / (TP + FN)

• False Positive Rate (FPR) is defined as follows :

False Positive Rate FPR = FP / (FP + TN)

1.6.2 Concept Learning
• Inducing general functions from specific training examples is a main issue of machine learning.
• Concept learning : Acquiring the definition of a general category from given sample positive and negative training examples of the category.
• Concept learning can be seen as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.
• The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be efficiently organized by taking advantage of a naturally occurring structure over the hypothesis space.
• Formal definition for concept learning : Inferring a boolean-valued function from training examples of its input and output.
• An example of concept learning is the learning of the bird-concept from given examples of birds (positive examples) and non-birds (negative examples).
• We are trying to learn the definition of a concept from given examples.
• Concept learning involves determining a mapping from a set of input variables to a Boolean value. Such methods are known as inductive learning methods.
• If a function can be found which maps training data to correct classifications, then it will also work well for unseen data. This process is known as generalization.
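As a small illustration of such a Boolean-valued mapping (an added sketch, assuming the constraint-vector hypothesis representation used in the worked example that follows, with "?" accepting any value and "∅" accepting none) :

def matches(hypothesis, instance):
    # h(x) = 1 exactly when every attribute constraint is satisfied.
    for constraint, value in zip(hypothesis, instance):
        if constraint == "∅" or (constraint != "?" and constraint != value):
            return False
    return True

h = ("Sunny", "?", "?", "Strong", "?", "?")
x = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(matches(h, x))   # True : h classifies x as a positive example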
• Example : Learn the "days on which my friend enjoys his favorite water sport".

Example   Sky     AirTemp   Humidity   Wind     Water   Forecast   EnjoySport
   1      Sunny   Warm      Normal     Strong   Warm    Same       Yes
   2      Sunny   Warm      High       Strong   Warm    Same       Yes
   3      Rainy   Cold      High       Strong   Warm    Change     No
   4      Sunny   Warm      High       Strong   Cool    Change     Yes

• A set of example days, each described by six attributes. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its attributes.
• The inductive learning hypothesis : Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
• Although the learning task is to determine a hypothesis (h) identical to the target concept c over the entire set of instances (X), the only information available about c is its value over the training examples.
• Inductive learning algorithms can at best guarantee that the output hypothesis fits the target concept over the training data.
• Lacking any further information, our assumption is that the best hypothesis regarding unseen instances is the hypothesis that best fits the observed training data. This is the fundamental assumption of inductive learning.
• Hypothesis representation (constraints on instance attributes) :
1. Any value is acceptable is represented by "?".
2. No value is acceptable is represented by "∅".

1.6.3 Concept Learning as Search
• Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
• The goal of this search is to find the hypothesis that best fits the training examples.
• By selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn.
• A hypothesis is a vector of constraints for each attribute :
1. Indicate by a "?" that any value is acceptable for this attribute.
2. Specify a single required value for the attribute.
3. Indicate by a "∅" that no value is acceptable.
• If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).
• Example hypothesis for EnjoySport : (?, Cold, High, ?, ?, ?)

EnjoySport concept learning task :
Given :
Instances X : Possible days, each described by the attributes
  Sky (with possible values Sunny, Cloudy and Rainy),
  AirTemp (with values Warm and Cold),
  Humidity (with values Normal and High),
  Wind (with values Strong and Weak),
  Water (with values Warm and Cool), and
  Forecast (with values Same and Change).
Hypotheses H : Each hypothesis is described by a conjunction of constraints on the attributes. The constraints may be "?", "∅", or a specific value.
Target concept c : EnjoySport : X → {0, 1}
Training examples D : Positive or negative examples of the target function.
Determine : A hypothesis h in H such that h(x) = c(x) for all x in X.

• Search through a large space of hypotheses implicitly defined by the hypothesis representation. Find the hypothesis that best fits the training examples.
• How big is the hypothesis space ? In EnjoySport there are six attributes : Sky has 3 values, and the rest have 2. How many distinct instances are there ? How many hypotheses ?
• Since Sky has three possible values and each of the other five attributes has two, the instance space contains

Instances : 3 x 2 x 2 x 2 x 2 x 2 = 96 distinct instances.

• In a hypothesis, each attribute can take its possible values plus "?" and "∅", giving five choices for Sky and four for each of the others :

Hypotheses : 5 x 4 x 4 x 4 x 4 x 4 = 5120 syntactically distinct hypotheses.

• However, every hypothesis containing one or more "∅" constraints classifies every instance as negative, so all such hypotheses are semantically equivalent; counting them once gives

Hypotheses : 1 + (4 x 3 x 3 x 3 x 3 x 3) = 973 semantically distinct hypotheses.

• This is a very simple learning task. Most practical learning tasks involve much larger, sometimes infinite, hypothesis spaces.

General-to-Specific Ordering of Hypotheses :
• Many algorithms for concept learning organize the search through the hypothesis space by relying on a general-to-specific ordering of hypotheses. By taking advantage of this naturally occurring structure over the hypothesis space, we can design learning algorithms that exhaustively search even infinite hypothesis spaces without explicitly enumerating every hypothesis.
• Consider two hypotheses :

h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)

• Now consider the sets of instances that are classified positive by h1 and by h2. Because h2 imposes fewer constraints on the instance, it classifies more instances as positive. In fact, any instance classified positive by h1 will also be classified positive by h2. Therefore, we say that h2 is more general than h1.
• One learning method is to determine the most specific hypothesis that matches all the training data.
• More-general-than-or-equal relation : Let h_j and h_k be two boolean-valued functions defined over X. Then h_j is more-general-than-or-equal-to h_k (written h_j ≥g h_k) if and only if any instance that satisfies h_k also satisfies h_j :

h_j ≥g h_k  if and only if  (∀x ∈ X) [(h_k(x) = 1) → (h_j(x) = 1)]

• h_j is more-general-than h_k (h_j >g h_k) if and only if h_j ≥g h_k is true and h_k ≥g h_j is false. We also say h_k is more-specific-than h_j.

[Fig. : The instance space X and hypothesis space H, partially ordered from specific (e.g. h1 = (Sunny, ?, ?, Strong, ?, ?)) to general (e.g. (Sunny, ?, ?, ?, ?, ?)), forming a lattice under the more-general-than relation]

• Note that the target concept (in C) may not be contained in H.
• Some open questions for this approach :
1. Has the learner converged to the target concept ? There can be several hypotheses consistent with both the positive and negative training examples.
2. Why should the most specific hypothesis be preferred ?
3. What if there are several maximally specific consistent hypotheses ?
4. What if the training examples are not correct ?

1.7 Multiple Choice Questions with Answers

Q.1 Machine learning is a sub-field of ______ which concerns with developing computational theories of learning and building learning machines.
a) artificial intelligence  b) neural network  c) soft computing

Q.2 ______ learning is learning in which the network is trained by providing it with input and matching output patterns.
a) Unsupervised  b) Supervised  c) Semi-supervised  d) All of these

Q.3 Supervised learning and unsupervised learning are the types of ______.
a) human learning  b) model learning  c) machine learning  d) none of these

Q.4 Unsupervised learning uses ______ data.
a) labelled  b) unlabelled  c) labelled and unlabelled  d) test

Q.5 A computer program is said to learn from ______ E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
a) training  b) experience  c) testing  d) algorithm

Q.6 Unsupervised learning deals with ______ and ______ mining problems.
a) classification, regression  b) clustering, classification
c) clustering, associative rule  d) label, unlabelled data

Q.7 ______ learning deals with two main tasks : regression and classification.
a) Reinforcement  b) Deep  c) Unsupervised  d) Supervised

Q.8 The individual tuples making up the training set are referred to as ______ and are selected from the database under analysis.
a) learning tuples  b) training tuples  c) samples  d) database

Q.9 Machine learning is inherently a ______ field.
a) inter-disciplinary  b) multi-disciplinary  c) single  d) none of these

Q.10 ______ methods have been used to train computer-controlled vehicles to steer correctly when driving on a variety of road types.
a) Machine learning  b) Data mining  c) Neural networks  d) Robotics

Q.11 The individual tuples making up the training set are referred to as ______ and are selected from the database under analysis.
a) learning tuples  b) training tuples  c) samples  d) database

Q.12 Training a perceptron is based on ______.
a) supervised learning technique  b) unsupervised learning
c) reinforced learning  d) stochastic learning

Q.13 List the elements of reinforcement learning.
a) Policy  b) Reward function  c) Value function  d) All of these

Answer Keys for Multiple Choice Questions :

Q.1  a    Q.2  b    Q.3  c    Q.4  b    Q.5  b
Q.6  c    Q.7  d    Q.8  b    Q.9  b    Q.10 a
Q.11 b    Q.12 a    Q.13 d


2  Basic Machine Learning Algorithms

Syllabus
Linear Regression, Decision Trees, Learning Decision Trees, K-nearest Neighbour, Collaborative Filtering, Overfitting.

Contents
2.1 Linear Regression
2.2 Decision Tree
2.3 Basic Decision Tree Learning Algorithm
2.4 K-nearest Neighbour
2.5 Collaborative Filtering
2.6 Overfitting
2.7 Multiple Choice Questions with Answers

2.1 Linear Regression
• The most common regression algorithms are :
a) Simple linear regression
b) Multiple linear regression
c) Polynomial regression
d) Multivariate adaptive regression splines
e) Logistic regression
f) Maximum likelihood estimation (least squares)

2.1.1 Simple Linear Regression
• A regression model which involves only one predictor. Linear regression is a statistical method that allows us to summarize and study relationships between two continuous variables :
1. One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
2. The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
• Regression models predict a continuous variable, such as the sales made on a day or the temperature of a city.
• Let's imagine that you fit a line with the training points you have. Now imagine you want to add another data point; to fit it, you would need to change your existing model.
• This will happen with each data point that we add to the model; hence, linear regression isn't good for classification models.
• Regression estimates are used to explain the relationship between one dependent variable and one or more independent variables.
• The regression line of X on Y gives the best estimate for the value of X for specific given values of Y :

X = a + bY

where
a = X-intercept
b = Slope of the line
X = Dependent variable
Y = Independent variable

[Fig. 2.1.4 : Regression line, showing the slope (change in Y divided by change in X) and the Y-intercept]
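As a brief illustration of fitting such a line by ordinary least squares (an added sketch with made-up data points; the slope is estimated as the covariance of the two variables divided by the variance of the predictor) :

def fit_line(xs, ys):
    # Ordinary least squares for y = a + b * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    b = num / den                 # slope
    a = mean_y - b * mean_x       # intercept
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
a, b = fit_line(xs, ys)
print(a, b)   # intercept near 0.15, slope near 1.95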
• Regression analysis is the art and science of fitting straight lines to patterns of data. In a linear regression model, the variable of interest (the "dependent" variable) is predicted from k other variables (the "independent" variables) using a linear equation.
• If Y denotes the dependent variable and X1, ..., Xk are the independent variables, then the assumption is that the value of Y at time t in the data sample is determined by the linear equation :

Y_t = β0 + β1 X1t + β2 X2t + ... + βk Xkt + ε_t

where the betas are constants and the epsilons are independent and identically distributed normal random variables with mean zero.
• At each split point, the "error" between the predicted value and the actual values is squared to get a "Sum of Squared Errors (SSE)". The split point errors across the variables are compared and the variable/point yielding the lowest SSE is chosen as the root node/split point. This process is recursively continued.
• The error function measures how much our predictions deviate from the desired answers :

Mean-squared error : J_n = (1/n) Σ (from i = 1 to n) (f(x_i) - y_i)²

Advantages :
a. Training a linear regression model is usually much faster than methods such as neural networks.
b. Linear regression models are simple and require minimum memory to implement.
c. By examining the magnitude and sign of the regression coefficients you can infer how predictor variables affect the target outcome.

2.1.2 Multiple Linear Regression
• Multiple linear regression is an extension of linear regression, which allows a response variable y to be modelled as a linear function of two or more predictor variables.
• In a multiple regression model, two or more independent variables, i.e. predictors, are involved in the model. Both the simple linear regression model and the multiple regression model assume that the dependent variable is continuous.

Difference between simple and multiple regression :

Sr. No.  Simple regression                              Multiple regression
1        One dependent variable Y predicted from        One dependent variable Y predicted from a set of
         one independent variable X                     independent variables (X1, ..., Xk)
2        One regression coefficient                     One regression coefficient for each independent variable
3        r² : Proportion of variation in dependent      R² : Proportion of variation in dependent variable Y
         variable Y predictable from X                  predictable by the set of independent variables (X's)

2.1.3 Lasso and Ridge Regression
• Ridge regression and the lasso are two forms of regularized regression. These methods seek to alleviate the consequences of multicollinearity :
1. When variables are highly correlated, a large coefficient in one variable may be alleviated by a large coefficient in another variable which is negatively correlated to the former.
2. Regularization imposes an upper threshold on the values taken by the coefficients, thereby producing a more parsimonious solution and a set of coefficients with smaller variance.
• Ridge estimation produces a biased estimator of the true parameter β :

E[β̂_ridge | X] = (XᵀX + λI)⁻¹ XᵀX β
               = (XᵀX + λI)⁻¹ (XᵀX + λI - λI) β
               = β - λ(XᵀX + λI)⁻¹ β

• Ridge regression shrinks the regression coefficients by imposing a penalty on their size.
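A minimal numpy sketch of the ridge estimator discussed above, β̂ = (XᵀX + λI)⁻¹ Xᵀy (the data, the noise level and the value of λ are made up for illustration) :

import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge solution : solve (X^T X + lam * I) beta = X^T y.
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=50)
print(ridge_fit(X, y, lam=1.0))   # estimates shrunk towards zero, as the bias formula predicts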
• The ridge coefficients minimize a penalized residual sum of squares.
• Ridge regression protects against the potentially high variance of gradients estimated in the short directions.

Lasso :
• One significant problem of ridge regression is that the penalty term will never force any of the coefficients to be exactly zero. Thus, the final model will include all p predictors, which creates a challenge in model interpretation. A more modern machine learning alternative is the lasso.
• The lasso works in a similar way to ridge regression, except it uses a different penalty term that shrinks some of the coefficients exactly to zero.
• Lasso is a regularized regression machine learning technique that avoids over-fitting of training data and is useful for feature selection.
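The zeroing-out behaviour can be seen with scikit-learn's Lasso (an illustrative sketch; the data and the penalty strength alpha are made up). Only the first two features influence y, and the L1 penalty drives the remaining coefficients exactly to zero :

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually influence y.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

model = Lasso(alpha=0.5).fit(X, y)
print(model.coef_)   # coefficients of the three irrelevant features are 0.0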
2.2 Decision Tree
• A decision tree is a simple representation for classifying examples. Decision tree learning is one of the most successful techniques for supervised classification learning.
• In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. As the name goes, it uses a tree-like model of decisions.
• Learned trees can also be represented as sets of if-then rules to improve human readability.
• A decision tree has two kinds of nodes :
1. Each leaf node has a class label, determined by majority vote of training examples reaching that leaf.
2. Each internal node is a question on features. It branches out according to the answers.
• Decision tree learning is a method for approximating discrete-valued target functions. The learned function is represented by a decision tree. A learned decision tree can also be re-represented as a set of if-then rules.
• Decision tree learning is one of the most widely used and practical methods for inductive inference. It is robust to noisy data and capable of learning disjunctive expressions. The decision tree learning method searches a completely expressive hypothesis space.

2.2.1 Decision Tree Representation
• Goal : Build a decision tree for classifying examples as positive or negative instances of a concept.
• Supervised learning, batch processing of training examples, using a preference bias.
• A decision tree is a tree in which :
a. Each non-leaf node has associated with it an attribute (feature).
b. Each leaf node has associated with it a classification (+ or -).
c. Each arc has associated with it one of the possible values of the attribute at the node from which the arc is directed.
• An internal node represents a test on an attribute. A branch represents an outcome of the test. Leaf nodes represent class labels or class distributions.
• A decision tree is a flow-chart-like tree structure, where each node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions. Decision trees can easily be converted to classification rules.

Decision Tree Algorithm :
• To generate a decision tree from the training tuples of data partition D.
Input :
1. Data partition (D)
2. Attribute list
3. Attribute selection method
Algorithm :
1. Create a node (N).
2. If tuples in D are all of the same class C then
3. Return node (N) as a leaf node labeled with the class C.
4. If the attribute list is empty then return N as a leaf node labeled with the majority class in D.
5. Apply the attribute selection method (D, attribute list) to find the "best" splitting criterion.
6. Label node N with the splitting criterion.
7. If the splitting attribute is discrete-valued and multiway splits are allowed then
8. attribute list → attribute list - splitting attribute.
9. For (each outcome j of the splitting criterion)
10. Let Dj be the set of data tuples in D satisfying outcome j.
11. If Dj is empty then attach a leaf labeled with the majority class in D to node N;
12. Else attach the node returned by recursively applying the algorithm to (Dj, attribute list) to node N.
13. Return N.
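A compact sketch of the recursive procedure above (illustrative only : training tuples are represented here as (feature-dict, label) pairs, and best_attribute is a stub standing in for the attribute selection method, which Section 2.3 develops using information gain) :

from collections import Counter

def majority_class(rows):
    # The most common label among the tuples reaching this node.
    return Counter(label for _, label in rows).most_common(1)[0][0]

def best_attribute(rows, attributes):
    return attributes[0]   # stub for the attribute selection method

def grow_tree(rows, attributes):
    labels = {label for _, label in rows}
    if len(labels) == 1:                      # steps 2-3 : all of the same class
        return labels.pop()
    if not attributes:                        # step 4 : attribute list is empty
        return majority_class(rows)
    attr = best_attribute(rows, attributes)   # step 5 : choose the split
    remaining = [a for a in attributes if a != attr]   # step 8
    node = {}
    for value in {features[attr] for features, _ in rows}:   # steps 9-12
        subset = [(f, l) for f, l in rows if f[attr] == value]
        node[(attr, value)] = grow_tree(subset, remaining)
    return node                               # step 13

rows = [({"Outlook": "Sunny"}, "No"), ({"Outlook": "Rain"}, "Yes")]
print(grow_tree(rows, ["Outlook"]))   # maps each (attribute, value) branch to a leaf label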
