Engineer Being Machine Learning Notes

machine-learning-notes

Uploaded by

Kumar Sahu

MACHINE LEARNING NOTES
MOST IMPORTANT QUESTIONS OF MACHINE LEARNING AKTU - ENGINEER BEING
MODULE 1 PART-I

Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes, and preferences. Learning is any process by which a system improves its performance from experience.

Ques2. What is Machine Learning? 2020-21 2M
Ans. Machine learning (ML) is a discipline of artificial intelligence (AI) that gives machines the ability to learn automatically from data and past experience, so that they can identify patterns and make predictions with minimal human intervention.
"Machine learning enables a machine to automatically learn from data, improve performance from experiences, and predict things without being explicitly programmed."

Ques3. Difference between ML, AI, Deep Learning? 2020-21 2M
Artificial Intelligence: AI is the broadest concept of the three. It gives a machine the ability to imitate human behaviour.
Machine Learning: Machine learning uses algorithms and techniques that enable machines to learn from past experience and trends and to predict the output based on that data. Their performance improves as they are exposed to more data over time.
Deep Learning: a subset of machine learning in which multilayered neural networks learn from vast amounts of data.
The main difference between machine learning and deep learning lies in how the data is represented. Classical machine learning typically learns from structured data with prepared features, while deep learning uses multilayered neural networks to learn its models directly from large amounts of data, including unstructured data.

Ans. Machine learning is important because it gives enterprises a view of trends in customer behavior and business operational patterns, and it supports the development of new products. Many of today's leading companies, such as Facebook, Google and Uber, make machine learning a central part of their operations. Machine learning has become a significant competitive differentiator for many companies.
Applications of ML:
1. Image recognition:
a. Image recognition is the process of identifying and detecting an object or a feature in a digital image or video.
b. It is used in many applications, such as systems for factory automation, toll booth monitoring, and security surveillance.
2. Speech recognition:
a. Speech Recognition (SR) is the translation of spoken words into text.
b. It is also known as Automatic Speech Recognition (ASR), computer speech recognition, or Speech To Text (STT).
c. In speech recognition, a software application recognizes spoken words.
3. Product recommendation: Machine learning is widely used by e-commerce and entertainment companies such as Amazon and Netflix to recommend products to the user. When we search for a product on Amazon, we start seeing advertisements for the same product while browsing the internet in the same browser. This is because of machine learning.
4. Email spam and malware filtering: Whenever we receive a new email, it is filtered automatically as important, normal, or spam. Important mail arrives in the inbox marked with the important symbol, and spam goes to the spam box. The technology behind this is machine learning.
5. Stock market trading: Machine learning is widely used in stock market trading. Share prices always carry a risk of ups and downs, so long short-term memory (LSTM) neural networks are used to predict stock market trends.

Types of Machine Learning:
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

Supervised learning is the type of machine learning in which machines are trained using well "labelled" training data and, on the basis of that data, predict the output. Labelled data means that the input data is already tagged with the correct output.
Ex: Risk assessment, image classification, fraud detection, spam filtering, etc.
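The idea of learning from labelled data can be sketched in a few lines of Python. This is only an illustration, not a method taken from these notes: the feature values (number of links and number of ALL-CAPS words per email) are made up, and the 1-nearest-neighbour rule is just one simple way to predict an output from tagged examples.

```python
from math import dist

# Hypothetical labelled training data: each input (links, ALL-CAPS words)
# is already tagged with its correct output.
TRAINING_DATA = [
    ((8, 12), "spam"),
    ((6, 9), "spam"),
    ((7, 15), "spam"),
    ((0, 1), "not spam"),
    ((1, 0), "not spam"),
    ((2, 2), "not spam"),
]


def predict(features):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    _, nearest_label = min(
        TRAINING_DATA, key=lambda example: dist(example[0], features)
    )
    return nearest_label


print(predict((9, 10)))  # lies near the examples tagged "spam"
print(predict((1, 1)))   # lies near the examples tagged "not spam"
```

The important point is that the program is never told a rule for spam. It only sees inputs paired with correct outputs and predicts from them.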
Types of Supervised learning:
i. Classification: A classification problem is one where the output variable is a category, such as "red" or "blue", "disease" or "no disease", Yes/No, Male/Female, True/False, etc.
ii. Regression: A regression problem is one where the output variable is a real value, as in sales forecasting, weather forecasting, etc.

Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.
• The output depends on the coded algorithms.
• Clustering: A clustering problem is one where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
• Association: Association rule learning is an unsupervised learning technique that tests how one data element depends on another and maps those dependencies so that they can be used more cost-effectively. It tries to discover interesting relations or associations between the variables of the dataset.

Semi-supervised learning lies between the supervised and unsupervised learning families. Semi-supervised models use both labeled and unlabeled data for training.

Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
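A minimal sketch of this feedback loop, under stated assumptions: the two action names, the reward values of +1 and -1, the learning rate and the exploration probability are all hypothetical choices for illustration, and the update rule is a simple running-average value estimate rather than any particular algorithm from these notes.

```python
import random

# Hypothetical environment: +1 feedback for the good action,
# -1 (a penalty) for the bad one.
REWARDS = {"stay_on_path": 1.0, "step_off_path": -1.0}


def environment(action):
    """Return the feedback the agent receives for an action."""
    return REWARDS[action]


def train(episodes=100, learning_rate=0.1, explore_prob=0.2, seed=0):
    """Learn a value estimate for each action from rewards and penalties."""
    rng = random.Random(seed)
    actions = list(REWARDS)
    value = {action: 0.0 for action in actions}
    # Try every action once so each estimate reflects real feedback.
    for action in actions:
        value[action] += learning_rate * (environment(action) - value[action])
    for _ in range(episodes):
        if rng.random() < explore_prob:
            action = rng.choice(actions)        # explore a random action
        else:
            action = max(value, key=value.get)  # exploit the best estimate
        reward = environment(action)
        # Move the estimate a small step towards the observed feedback.
        value[action] += learning_rate * (reward - value[action])
    return value


values = train()
policy = max(values, key=values.get)  # the action the agent now prefers
print(values, policy)
```

After training, the estimate for the rewarded action is positive and the estimate for the penalised action is negative, so the agent's preferred action is the good one.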
The main elements of an RL system are:
• The agent or the learner
• The environment the agent interacts with
• The policy that the agent follows to take actions
• The reward signal that the agent observes upon taking action

GENETIC ALGORITHM | TRADITIONAL ALGORITHM
A genetic algorithm is a search-based algorithm used for solving optimization problems in machine learning. | Traditional algorithms are the general algorithms we use to solve problems. A traditional algorithm is a methodical procedure to solve a given problem, and there can be several algorithms for the same problem.
More advanced | Not as advanced
Used in ML, AI | Used in programming, math

1) Process Complexity of Machine Learning
The machine learning process is very complex, which is a major issue for machine learning engineers and data scientists. Much of the work consists of trial-and-error experiments, so the probability of error is higher than expected. The process also includes analyzing the data, removing data bias, training on the data, and applying complex mathematical calculations, which makes the procedure more complicated and quite tedious.

2) Getting bad recommendations
A machine learning model operates under a specific context, which can result in bad recommendations and concept drift in the model. Suppose a customer was looking for some gadgets at a specific time, but the customer's requirements have changed since then. If the model still shows the same recommendations although the customer's expectations have changed, this is called data drift. We can overcome it by regularly monitoring and updating the data according to the expectations.

3) Overfitting and Underfitting
Overfitting: Overfitting is one of the most common issues faced by machine learning engineers and data scientists. When a model fits a large, noisy training set too closely, it starts capturing the noise and inaccurate entries in the training data, so it performs poorly on new data.
Underfitting: Underfitting is just the opposite of overfitting. When a machine learning model is trained with too little data, it produces incomplete and inaccurate results, which destroys the accuracy of the model.

4) Inadequate Training Data
The major issue that comes up while using machine learning algorithms is the lack of both quality and quantity of data. Data plays a vital role in machine learning, and many data scientists report that inadequate, noisy, and unclean data severely degrade machine learning algorithms. For example, a simple task requires thousands of sample data points, and an advanced task such as speech or image recognition needs millions of examples. Data quality is also important for the algorithms to work ideally, but poor data quality is common in machine learning applications.

5) Monitoring and maintenance
Generalized output is mandatory for any machine learning model, so regular monitoring and maintenance are compulsory. Different results for different actions require changes to the data, so editing the code and providing resources to monitor it also become necessary.

• Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• In a decision tree, there are two kinds of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
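The node structure described above can be sketched as two small Python classes. This is an illustrative sketch only: the class names `DecisionNode` and `Leaf`, the `predict` helper, and the hand-built weather tree are assumptions made for the example. The tree is written by hand here, not learned from data.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds an outcome and has no further branches."""
    outcome: str


@dataclass
class DecisionNode:
    """Decision node: tests one feature and has one branch per feature value."""
    feature: str
    branches: Dict[str, Union["DecisionNode", Leaf]] = field(default_factory=dict)


def predict(node, sample):
    """Follow the branch matching the sample's feature value until a leaf is reached."""
    while isinstance(node, DecisionNode):
        node = node.branches[sample[node.feature]]
    return node.outcome


# Hypothetical hand-built tree, for illustration only.
tree = DecisionNode("outlook", {
    "overcast": Leaf("yes"),
    "rain": Leaf("yes"),
    "sunny": DecisionNode("humidity", {
        "high": Leaf("no"),
        "normal": Leaf("yes"),
    }),
})

print(predict(tree, {"outlook": "sunny", "humidity": "high"}))
print(predict(tree, {"outlook": "overcast", "humidity": "high"}))
```

Each branch key plays the role of a decision rule, and prediction is just a walk from the root to a leaf.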
In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm.

Machine Learning, ANN, Clustering, Reinforcement Learning, Decision Tree Learning, Bayesian Networks, SVM (Support Vector Machine), Genetic Algorithms 2020-21 1M

The term "Artificial Neural Network" is derived from biological neural networks, which form the structure of the human brain. Like the human brain, an artificial neural network is built from neurons.

In a neural network, there are three essential layers:
Input Layer: The input layer is the first layer of an ANN. It receives the input information in the form of text, numbers, audio files, image pixels, etc.
Hidden Layers: The hidden layers are in the middle of the ANN model. There can be a single hidden layer or multiple hidden layers, which process the input data and recognize patterns in it.
Output Layer: In the output layer, we obtain the result of the rigorous computations performed by the middle layers.

Artificial Neural Network applications: The following are the important applications of Artificial Neural Networks.
Handwritten Character Recognition: ANNs are used for handwritten character recognition. Neural networks are trained to recognize handwritten characters, which can be letters or digits.
Facial Recognition: To recognize faces based on the identity of the person, we make use of neural networks. They are most commonly used in areas where users require security access.
Speech Recognition: ANNs play an important role in speech recognition. Earlier speech recognition models were based on statistical models such as Hidden Markov Models. With the advent of deep learning, various types of neural networks have become the preferred choice for obtaining an accurate classification.

2020-21 10M (UNIT 2)
SVM or Support Vector Machine is a linear model for classification and regression problems.
It can solve linear and non-linear problems and works well for many practical problems. According to the SVM algorithm, we find the points closest to the line from both classes. These points are called support vectors. We compute the distance between the line and the support vectors. This distance is called the margin. Our goal is to maximize the margin. The hyperplane for which the margin is maximum is the optimal hyperplane. Thus SVM tries to draw a decision boundary in such a way that the separation between the two classes is as wide as possible.

10M
Clustering
• Clustering is a way of grouping the data points into different clusters, each consisting of similar data points. Objects with possible similarities remain in a group that has few or no similarities with another group.
• It is an unsupervised learning method, so no supervision is provided to the algorithm, and it deals with an unlabeled dataset.
After the clustering technique is applied, each cluster or group is given a cluster ID. An ML system can use this ID to simplify the processing of large and complex datasets. The clustering technique is commonly used for statistical data analysis.
Example: Consider the real-world example of a mall. When we visit a shopping mall, we can observe that things with similar usage are grouped together. T-shirts are grouped in one section and trousers in another. Similarly, in the fruit and vegetable section, apples, bananas, mangoes, etc. are grouped separately so that we can easily find things. The clustering technique works in the same way.

Classification and Regression
Regression and classification algorithms are supervised learning algorithms. Both are used for prediction in machine learning and work with labeled datasets.
But the difference between the two is how they are used for different machine learning problems.

Classification | Regression
Classification algorithms are used to predict or classify discrete values such as Male or Female, True or False, Spam or Not Spam, etc. | Regression algorithms are used to predict continuous values such as price, salary, age, etc.
The task of the classification algorithm is to map the input value (x) to the discrete output variable (y). | The task of the regression algorithm is to map the input value (x) to the continuous output variable (y).
Classification algorithms are used with discrete data. | Regression algorithms are used with continuous data.
Classification algorithms can be divided into Binary Classifier and Multi-class Classifier. | Regression algorithms can be further divided into Linear and Non-linear Regression.
Classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc. In email spam detection, the model is trained on millions of emails using different parameters, and whenever it receives a new email, it identifies whether the email is spam or not. If the email is spam, it is moved to the Spam folder. | Regression algorithms can be used to solve regression problems such as weather prediction, house price prediction, etc. In weather forecasting, the model is trained on past data, and once training is completed, it can predict the weather for future days.

2021-22 2M
A learning problem is said to be well defined if it has three features: the class of tasks, the measure of performance to be improved, and the source of experience.
Ex: A checkers learning problem
- Task T: playing checkers
- Performance measure P: percent of games won against opponents
- Training experience E:
playing practice games against itself

"Data Science is a field of deep study of data that includes extracting useful insights from the data, and processing that information using different tools, statistical models, and Machine learning algorithms."
Machine learning allows computers to learn from past experience on their own. It uses statistical methods to improve performance and predict the output without being explicitly programmed.

Or Design the final design of Checkers Learning Program 2021-22 10M
Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes, and preferences. Learning is any process by which a system improves its performance from experience.

Designing a Learning System in Machine Learning:
Step 1) Choosing the Training Experience: The first and very important task is to choose the training data or training experience that will be fed to the machine learning algorithm. Three attributes are used:
1. Whether the training experience provides direct or indirect feedback regarding the choices made by the performance system.
- Direct training examples in learning to play checkers consist of individual checkers board states and the correct move for each.
- Indirect training examples in the same game consist of the move sequences and final outcomes of various games played. Information about the correctness of specific moves early in the game must be inferred indirectly from the fact that the game was eventually won or lost. This is the credit assignment problem.
2. The degree to which the learner controls the sequence of training examples. Example:
- The learner might rely on the teacher to select informative board states and to provide the correct move for each.
- The learner might itself propose board states that it finds particularly confusing and ask the teacher for the correct move.
- Or the learner may have complete control over the board states and (indirect) classifications, as it does when it learns by playing against itself with no teacher present.
3. How well the training experience represents the distribution of examples over which performance will be tested. This basically means that the more diverse the set of training experience, the better the performance can get. Example: If the training experience in playing checkers consists only of games played against itself, the learner might never encounter certain crucial board states that are very likely to be played by the human checkers champion.

Step 2) Choosing the target function: Determine what type of knowledge will be learned and how it will be used by the performance program. Example: In playing checkers, the program needs to learn to choose the best move from among the legal moves.

Step 3) Choosing a representation for the target function: Once the target function is chosen, we have to choose a representation for it. When the algorithm has a complete list of all permitted moves, it may represent the choice in any format, such as linear equations, a hierarchical graph representation, or a table. Out of these moves, the NextMove function selects the target move, the one that increases the success rate. For example, if a chess machine has four alternative moves, the computer will select the most optimal move, the one that leads towards victory.

Step 4) Choosing a function approximation algorithm: In this step, we choose a learning algorithm that can approximate the chosen target function. This step consists of two sub-steps:
a. Estimating the training value, and
b. Adjusting the weights.

[Figure: final design of the checkers learning program]

The final design consists of four modules, as described in the picture:
1.
The performance system: The performance system solves the given performance task.

MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU - ENGINEER BEING
MODULE 2 PART-I

Or Discuss support vectors in SVM. 2020-21 2M

SVM or Support Vector Machine is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. It tries to classify data by finding a hyperplane that maximizes the margin between the classes in the training data. Hence, SVM is an example of a large margin classifier.
The idea of SVM is simple: the algorithm creates a line or a hyperplane which separates the data into classes. According to the SVM algorithm, we find the points closest to the line from both classes. These points are called support vectors. We compute the distance between the line and the support vectors. This distance is called the margin. Our goal is to maximize the margin. The hyperplane for which the margin is maximum is the optimal hyperplane. Thus SVM tries to draw a decision boundary in such a way that the separation between the two classes is as wide as possible.

SVM KERNELS
• SVM can work well on non-linear data using the kernel trick.
• The kernel trick maps the low-dimensional input space into a higher-dimensional space.
• In simple words, a kernel converts non-separable problems into separable problems by adding more dimensions.
• It makes SVM more powerful, flexible and accurate.

THREE TYPES OF KERNEL
1) Linear kernel: A linear kernel is the ordinary dot product of any two given observations. The equation for the kernel function is:
$K(x, x_i) = \sum (x \cdot x_i)$
2) Polynomial kernel: It is a more generalized form of the linear kernel and can distinguish curved or non-linear input spaces. It is popular in image processing.
The formula for the polynomial kernel is:
$K(x, x_i) = (1 + \sum (x \cdot x_i))^d$, where d is the degree of the polynomial.
3) Gaussian Radial Basis Function (RBF) kernel: The RBF kernel, mostly used in SVM classification, maps the input space into an infinite-dimensional space. It is a general-purpose kernel, used when there is no prior knowledge about the data. Mathematically:
$K(x, x_i) = \exp(-\gamma \sum (x - x_i)^2)$, where $\gamma = 1 / (2\sigma^2)$.

APPLICATIONS OF KERNEL
• Face detection: SVM classifies parts of the image as face and non-face and creates a square boundary around the face.
• Handwriting recognition: SVMs are widely used to recognize handwritten characters.
• Texture classification: In this SVM application, we use images of certain textures and use that data to classify whether the surface is smooth or not.
• Steganography detection in digital images: Using SVM, we can find out whether an image is pure or adulterated. This could be used in security-based organizations to uncover secret messages. Messages can be hidden in high-resolution images, and because these images have more pixels, the message is harder to find. We can segregate the pixels, store the data in various datasets, and analyze those datasets using SVM.

PROPERTIES OF SVM:
1. Flexibility in choosing a similarity function.
2. Sparseness of solution when dealing with large data sets: only support vectors are used to specify the separating hyperplane.
3. Ability to handle large feature spaces: complexity does not depend on the dimensionality of the feature space.
4. Overfitting can be controlled by the soft margin approach (we intentionally let some data points enter the margin).
5. A simple convex optimization problem which is guaranteed to converge to a single global solution.

DISADVANTAGES OF SVM:
1. The SVM algorithm is not suitable for large data sets because the required training time is higher.
2. SVM does not perform very well when the data set has more noise, i.e.
target classes are overlapping.
3. In cases where the number of features for each data point exceeds the number of training data samples, the SVM will underperform.
4. SVMs with the "wrong" kernel: For SVMs, choosing the right kernel function is key. For example, using the linear kernel when the data are not linearly separable results in the algorithm performing poorly.

2020-21 2M
Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining the cause-effect relationship between variables.
Some examples of regression are:
o Prediction of rain using temperature and other factors
o Determining market trends
o Prediction of road accidents due to rash driving
It is used to find the trends in data. By performing regression, we can estimate the most important factor, the least important factor, and how each factor affects the others.

Linear Regression | Logistic Regression
Linear regression is a supervised regression model. | Logistic regression is a supervised classification model.
In linear regression, we predict a continuous numeric value. | In logistic regression, we predict the value as 1 or 0.
It is based on least squares estimation. | It is based on maximum likelihood estimation.
When we plot the training data, a straight line can be drawn that touches the maximum number of points. | Any change in the coefficient changes both the direction and the steepness of the logistic function. Positive slopes result in an S-shaped curve and negative slopes result in a Z-shaped curve.

Linear regression is used to estimate the dependent variable when the independent variables change. For example, predict the price of houses.
Whereas logistic regression is used to calculate the probability of an event. For example, classify whether tissue is benign or malignant.

MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU - ENGINEER BEING
MODULE 2 PART-II

2020-21 2M
The vertices and edges in a Bayesian network have a meaning: the network structure itself gives you important information about the dependence between the variables. With neural networks, the network structure does not tell you anything in this way. The similarity between an ANN and a Bayesian network is that both use directed graphs.

2021-22 10M
Bayes theorem is one of the most popular machine learning concepts. It helps to calculate the probability of one event occurring, under uncertain knowledge, given that another event has already occurred. Bayes theorem is a way of finding a probability when we know certain other probabilities.
P(X|Y) = P(Y|X) * P(X) / P(Y)
This tells us how often X happens given that Y happens, written P(X|Y), when we know:
how often Y happens given that X happens, written P(Y|X),
how likely X is on its own, written P(X), and
how likely Y is on its own, written P(Y).
The above equation is called Bayes rule or Bayes theorem.
• P(X|Y) is called the posterior, which we need to calculate. It is the updated probability after considering the evidence.
• P(Y|X) is called the likelihood. It is the probability of the evidence when the hypothesis is true.
• P(X) is called the prior probability, the probability of the hypothesis before considering the evidence.
• P(Y) is called the marginal probability. It is the probability of the evidence under any consideration.
Hence, Bayes theorem can be written as:
posterior = likelihood * prior / evidence

EXAMPLE:
• Dangerous fires are rare (1%)
• But smoke is fairly common (10%) due to barbecues
• And 90% of dangerous fires
make smoke.
We can then find the probability of a dangerous fire when there is smoke:
P(Fire|Smoke) = P(Fire) * P(Smoke|Fire) / P(Smoke) = (1% * 90%) / 10% = 9%

Naive Bayes Classifier Algorithm
• The Naive Bayes algorithm is a supervised learning algorithm which is based on Bayes theorem and used for solving classification problems.
• It is mainly used in text classification, which involves a high-dimensional training dataset.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
• Some popular examples of the Naive Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
The distinction between Bayes theorem and Naive Bayes is that Naive Bayes assumes conditional independence, where Bayes theorem does not. This means that all input features are assumed to be independent of one another given the class.

Working of Naive Bayes' Classifier:
The working of the Naive Bayes classifier can be understood with the help of the example below. Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether or not we should play on a particular day according to the weather conditions. To solve this problem, we follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate the likelihood table by finding the probabilities of the given features.
3. Use Bayes theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
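The three steps can be sketched in Python as follows. This is a minimal sketch, not a full Naive Bayes implementation: it uses the 14 Outlook/Play records of this example's dataset as its only feature, and the names `RECORDS` and `posterior` are chosen for illustration. It works with exact fractions, so the result for "No" comes out as 0.40, where a hand calculation with rounded intermediate values gives about 0.41.

```python
from collections import Counter
from fractions import Fraction

# The 14 (Outlook, Play) records of the weather dataset.
RECORDS = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]

# Step 1: frequency tables.
pair_counts = Counter(RECORDS)
outlook_counts = Counter(outlook for outlook, _ in RECORDS)
play_counts = Counter(play for _, play in RECORDS)
total = len(RECORDS)


def posterior(play, outlook):
    """Steps 2 and 3: likelihood, prior and evidence combined by Bayes theorem."""
    likelihood = Fraction(pair_counts[(outlook, play)], play_counts[play])  # P(outlook | play)
    prior = Fraction(play_counts[play], total)                              # P(play)
    evidence = Fraction(outlook_counts[outlook], total)                     # P(outlook)
    return likelihood * prior / evidence


p_yes = posterior("Yes", "Sunny")
p_no = posterior("No", "Sunny")
print(float(p_yes), float(p_no))
print("Play" if p_yes > p_no else "Do not play")
```

Because the two posteriors share the same evidence term P(Sunny), comparing them only requires likelihood times prior. The division is kept here so the numbers match the formula.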
      Outlook    Play
0     Rainy      Yes
1     Sunny      Yes
2     Overcast   Yes
3     Overcast   Yes
4     Sunny      No
5     Rainy      Yes
6     Sunny      Yes
7     Overcast   Yes
8     Rainy      No
9     Sunny      No
10    Sunny      Yes
11    Rainy      No
12    Overcast   Yes
13    Overcast   Yes

Frequency Table:
Weather     Yes   No
Overcast    5     0
Rainy       2     2
Sunny       3     2
Total       10    4

Likelihood Table:
Weather     No           Yes           P(Weather)
Overcast    0            5             5/14 = 0.35
Rainy       2            2             4/14 = 0.29
Sunny       2            3             5/14 = 0.35
All         4/14 = 0.29  10/14 = 0.71

Applying Bayes theorem:
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
P(Sunny | Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes | Sunny) = 0.3 * 0.71 / 0.35 = 0.60

P(No | Sunny) = P(Sunny | No) * P(No) / P(Sunny)
P(Sunny | No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No | Sunny) = 0.5 * 0.29 / 0.35 = 0.41

From the above calculation, P(Yes | Sunny) > P(No | Sunny). Hence, on a sunny day, the player can play the game.

Ques 3) What problem does the EM algorithm solve? 10M 2021-22
Or What are the tasks of the E-step in the EM algorithm? 2M 2020-21
The Expectation-Maximization (EM) algorithm is defined as a combination of various unsupervised machine learning algorithms, which is used to determine the local maximum likelihood estimates (MLE) or maximum a posteriori estimates (MAP) for unobservable variables in statistical models. It is a technique for finding maximum likelihood estimates when latent variables are present. It is also referred to as the latent variable model.
A latent variable model consists of both observable and unobservable variables, where the observable variables can be measured while the unobserved ones are inferred from the observed variables. These unobservable variables are known as latent variables.

Steps in EM Algorithm
The EM algorithm is completed mainly in 4 steps: the Initialization Step, the Expectation Step, the Maximization Step, and the Convergence Step.
1st Step: The very first step is to initialize the parameter values. The system is provided with incomplete observed data, with the assumption that the data is obtained from a specific model.
2nd Step: This step is known as the Expectation or E-step. It is used to estimate or guess the values of the missing or incomplete data using the observed data. The E-step primarily updates the variables.
3rd Step: This step is known as the Maximization or M-step, where we use the complete data obtained from the 2nd step to update the parameter values. The M-step primarily updates the hypothesis.
4th Step: The last step is to check whether the values of the latent variables are converging. If yes, stop the process. Otherwise, repeat the process from step 2 until convergence occurs.

MOST IMPORTANT QUESTIONS MACHINE LEARNING AKTU - ENGINEER BEING
MODULE 3 PART-I

If we depend too much on the training data while drawing the decision tree, there is a possibility that the tree will overfit. That is, a particular hypothesis works well on the training data but does not work well on the testing or real-world data. Such a tree is said to be overfitting.
Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. In the case of underfitting, the model is not able to learn enough from the training data, so its accuracy drops and it produces unreliable predictions.

a. overfitting the data
b. handling continuous-valued attributes
c. handling missing attribute values
d. handling attributes with different costs
Ans:
a. Overfitting the data: If we depend too much on the training data while drawing the decision tree, there is a possibility that the tree will overfit. That is, a particular hypothesis works well on the training data but does not work well on the testing or real-world data. Such a tree is said to be overfitting. Overfitting can be addressed with two techniques: reduced-error pruning and rule post-pruning.
b. Handling continuous-valued attributes: The decision tree works well with problems where we have a fixed number of attributes and a discrete number of possibilities for each attribute.
If a particular attribute has continuous values, we cannot apply the decision tree directly. First, we need to convert the attributes that have continuous values into discrete possibilities; only then can we apply decision tree learning.

c. Handling missing attribute values: If some attribute values are missing, we need to fill them in with proper values before we can use this learning method.

d. Handling attributes with different costs: Whenever we apply the decision tree algorithm, each and every attribute is given equal importance. But sometimes, in a given problem definition, a particular attribute may have more importance or be given more weightage. In such a case we cannot use the core decision tree learning; we need to handle this issue with some additional calculation.

PlayTennis: training examples

Day   Outlook    Temperature   Humidity   PlayTennis
D1    Sunny      Hot           High       No
D2    Sunny      Hot           High       No
D3    Overcast   Hot           High       Yes
D4    Rain       Mild          High       Yes
D5    Rain       Cool          Normal     Yes
D6    Rain       Cool          Normal     No
D7    Overcast   Cool          Normal     Yes
D8    Sunny      Mild          High       No
D9    Sunny      Cool          Normal     Yes
D10   Rain       Mild          Normal     Yes
D11   Sunny      Mild          Normal     Yes
D12   Overcast   Mild          High       Yes
D13   Overcast   Hot           Normal     Yes
D14   Rain       Mild          High       No

In a decision tree, the major challenge is the identification of the attribute for the root node at each level. This process is known as attribute selection. We have two popular attribute selection measures:
1. Information Gain
2. Gini Index

Information Gain

When we use a node in a decision tree to partition the training instances into smaller subsets, the entropy changes.
Information gain is a measure of this change in entropy:

$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)$

$Entropy(S) = -P(yes) \log_2 P(yes) - P(no) \log_2 P(no)$

MODULE 3 PART-II

2020-21 10M

Ans: Instance-based learning refers to a family of techniques for classification and regression which produce a class label or prediction based on the similarity of the query to its nearest neighbor(s) in the training set.

Some of the instance-based learning algorithms are:
1. K Nearest Neighbor (KNN)
2. Locally Weighted Learning (LWL)
3. Case-Based Reasoning

Locally weighted regression

Locally weighted linear regression is a non-parametric algorithm; that is, the model does not learn a fixed set of parameters as is done in ordinary linear regression. Rather, the parameters are computed individually for each query point.

Locally weighted regression (LWR) is a memory-based method that performs a regression around a point of interest using only the training data that are "local" to that point.

Locally weighted linear regression is a supervised learning algorithm. There is no training phase: all the work is done during the testing phase, while making predictions.

Locally weighted regression methods are a generalization of k-Nearest Neighbour.

RBF network

An RBF network has an input layer, a hidden layer, and an output layer.

Input Layer: The input layer simply feeds the data to the hidden layer. As a result, the number of neurons in the input layer should be equal to the dimensionality of the data.

Hidden Layer: [description not legible in the source]

Output Layer: The output layer uses a linear activation function [...] regression tasks.

In general, the case-based reasoning process entails:
1. Retrieve - Gathering from memory an experience closest to the current problem.
2. Reuse - Suggesting a solution based on the experience and adapting it to meet the demands of the new situation.
3.
Revise - Evaluating the use of the solution in the new context.
4. Retain - Storing this new problem-solving method in the memory system.

The CADET system employs case-based reasoning to assist in the conceptual design of simple mechanical devices such as water faucets. It uses a library containing approximately 75 previous designs and design fragments to suggest conceptual designs that meet the specifications of new design problems.

[Figure: qualitative function graph of a water faucet]

The function is represented in terms of qualitative relationships among the water flow levels and temperatures at its inputs and outputs:
• In the functional description, an arrow labeled "+" indicates that the variable at the arrowhead increases with the variable at its tail. A "-" label indicates that the variable at the head decreases with the variable at the tail.
• Here Qc refers to the flow of cold water into the faucet, Qh to the input flow of hot water, and Qm to the single mixed flow out of the faucet.
• Tc, Th, and Tm refer to the temperatures of the cold water, hot water, and mixed water respectively.
• The variable Ct denotes the control signal for temperature that is input to the faucet, and Cf denotes the control signal for water flow.
• The controls Ct and Cf influence the water flows Qc and Qh, thereby indirectly influencing the faucet output flow Qm and temperature Tm.

2021-22 10M

Ease of knowledge elicitation: Lazy methods can utilise easily available cases or problem instances instead of rules that are difficult to extract.

Absence of problem-solving bias: Cases can be used for multiple problem-solving purposes, because they are stored in a raw form. This is in contrast to eager methods, which can be used only for the purpose for which the knowledge has already been compiled.
Incremental learning: A CBL system can be put into operation with a minimal set of solved cases furnishing the case base. The case base will be filled with new cases, increasing the system's problem-solving ability.

Ease of maintenance: This is particularly due to the fact that CBL systems can adapt to many changes in the problem domain and the relevant environment, merely by acquiring new cases.

Ease of explanation: The results of a CBL system can be justified based upon the similarity of the current problem to the retrieved case. Because CBL results are easily traceable to precedent cases, it is also easier to analyse failures of the system.

For example, CASEY for classification of auditory impairments, CASCADE for classification of software failures.

2021-22

The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs that it has not encountered. In machine learning, one aims to construct algorithms that are able to learn to predict a certain target output.

Inductive learning methods require a certain number of training examples to generalize accurately.

Analytical learning stems from the idea that when not enough training examples are provided, it may be possible to "replace" the "missing" examples with prior knowledge and deductive reasoning.

2021-22

Lazy learning vs. Eager learning

Lazy learning methods simply store the data; generalizing beyond these data is postponed until an explicit request is made.
Eager learning methods construct a general (one-fits-all), explicit (input-independent) description of the target function based on the provided training examples.

Lazy learning methods can construct a different approximation to the target function for each encountered query instance.
Eager learning methods use the same approximation to the target function, which must be learned from the training examples before input queries are observed.

Lazy learning is very suitable for complex and incomplete problem domains.

Perceptrons are the building blocks of neural networks; the perceptron is a learning algorithm for binary classifiers.

The perceptron consists of 4 parts:
1. Input values or one input layer
2. Weights and bias
3. Net sum
4. Activation function

a. All the inputs are multiplied by their weights.
b. Add the products to obtain the weighted sum.
c. Apply that weighted sum to the correct activation function.

Weights show the strength of the particular node. A bias value allows you to shift the activation function curve up or down. In short, the activation functions are used to map the input into the required range of values, such as (0, 1) or (-1, 1).

The perceptron is usually used to classify the data into two parts. Therefore, it is also known as a Linear Binary Classifier.

(Ques 2) What is Gradient descent? 2021-22 2M

Gradient descent is an optimization algorithm which is commonly used to train machine learning models and neural networks by finding a local minimum of a given function. This method is commonly used in machine learning (ML) and deep learning (DL) to minimize a cost/loss function.

2021-22 2M

In machine learning, the delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network. It is a special case of the more general backpropagation algorithm.

(Ques 4) Describe the BPN algorithm in ANN along with a suitable example. 2020-21 10M

Back-propagation is used for the training of neural networks. The backpropagation algorithm looks for the minimum value of the error function in weight space using a technique called the delta rule or gradient descent.

In an artificial neural network, the values of the weights and biases are randomly initialized. Due to random initialization, the neural network will probably make errors in giving the correct output. We need to reduce the error values as much as possible.
So, to reduce these error values, we need a mechanism that can compare the desired output of the neural network with its actual output and adjust the weights accordingly.

Backpropagation is a short form for "backward propagation of errors." It is a standard method of training artificial neural networks.

Backpropagation Algorithm:
Step 1: Inputs X arrive through the preconnected path.
Step 2: The input is modeled using real weights W. Weights are usually chosen randomly.
Step 3: Calculate the output of each neuron from the input layer, through the hidden layer, to the output layer.
Step 4: Calculate the error in the outputs.
        Backpropagation Error = Actual Output - Desired Output
Step 5: From the output layer, go back to the hidden layer to adjust the weights so as to reduce the error.
Step 6: Repeat the process until the desired output is achieved.

Why We Need Backpropagation?

The most prominent advantages of backpropagation are:
• Backpropagation is fast, simple, and easy to program.
• It is a flexible method, as it does not require prior knowledge about the network.
• It is a standard method that generally works well.
• It does not need any special mention of the features of the function to be learned.

Types of Backpropagation Networks

The two types of backpropagation networks are:
• Static backpropagation
• Recurrent backpropagation

The output neurons of a neural network compete among themselves to become active. In general several output neurons may be active, but in competitive learning only a single output neuron is active at one time.

2020-21 10M

Self Organizing Map

It follows an unsupervised learning approach [...] competitive learning algorithm. SOM is used for clustering and mapping [...] to map multidimensional data onto lower [...] to reduce complex problems for easy interpretation.

[Steps 1 to 3 are only partly legible:] ... for all the nodes, calculate the Euclidean distance between the weight vector and the current input vector. The node with the weight vector closest to the input is tagged as the winning neuron.
Step 4: Find the new weight between the input vector sample and the winning output neuron.

MODULE 4 PART-II

Unit IV: ARTIFICIAL NEURAL NETWORKS - Perceptrons, Multilayer perceptron, Gradient descent and the Delta rule, Multilayer networks, Derivation of Backpropagation Algorithm, Generalization, Unsupervised Learning - SOM Algorithm and its variant; DEEP LEARNING - Introduction, concept of convolutional neural network, Types of layers - (Convolutional Layers, Activation function, pooling, fully connected), Concept of Convolution (1D and 2D) layers, Training of network, Case study of CNN, e.g. on Diabetic Retinopathy, Building a smart speaker, Self-driving car etc.

Convolutional Neural Networks (CNNs) are specially designed to work with images. An image consists of pixels. In deep learning, images are represented as arrays of pixel values.

There are three main types of layers in a CNN:
• Convolutional layers
• Pooling layers
• Fully connected (dense) layers

In addition, activation layers are added after each convolutional layer and each fully connected layer.

There are four main types of operations in a CNN: the convolution operation, the pooling operation, the flatten operation, and the classification (or other relevant) operation.

Convolutional layers and convolution operation

The first layer in a CNN is a convolutional layer. It takes the images as input and begins to process them. There are three elements in the convolutional layer: the input image, the filters, and the feature map.

[Figure: convolution operation between an image section (3x3) of the input image (6x6) and the filter, producing the feature map (4x4)]

Filter: This is also called the Kernel or Feature Detector.

Image section: The size of the image section should be equal to the size of the filter(s) we choose.
The number of image sections depends on the stride.

Feature map: The feature map stores the outputs of the convolution operations between the different image sections and the filter(s).

Stride: The number of steps (pixels) by which we shift the filter over the input image is called the stride.

Padding: Padding adds additional pixels with zero values to each side of the image. That helps to get a feature map of the same size as the input.

Pooling layers and pooling operation

Pooling layers are the second type of layer used in a CNN. There can be multiple pooling layers in a CNN. Each convolutional layer is followed by a pooling layer, so convolution and pooling layers are used together as pairs. A pooling layer reduces the dimensionality (number of pixels) of the output returned from the previous convolutional layer.

There are three elements in the pooling layer: the feature map, the filter, and the pooled feature map.

There are two types of pooling operations:
• Max pooling: Get the maximum value in the area where the filter is applied.
• Average pooling: Get the average of the values in the area where the filter is applied.

Then, we can flatten a pooled feature map that contains multiple channels.

Fully connected (dense) layers

2020-21 10M

[Worked example, Step 1: input image and filter matrices; figure not legible in the source]
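The convolution, stride, padding, and max-pooling operations described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `conv2d` and `max_pool`, the 6x6 identity image, and the diagonal 3x3 filter are assumptions chosen for easy hand-checking, not values from the notes.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over a 2-D image (list of lists) and
    return the feature map. Names and defaults are illustrative."""
    if padding:
        # Zero padding: add `padding` rows/columns of zeros on every side.
        width = len(image[0]) + 2 * padding
        image = ([[0] * width for _ in range(padding)]
                 + [[0] * padding + list(row) + [0] * padding for row in image]
                 + [[0] * width for _ in range(padding)])
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride
            # Element-wise product of the image section and the filter, summed.
            row.append(sum(image[r + a][c + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        feature_map.append(row)
    return feature_map


def max_pool(feature_map, size=2, stride=2):
    """Keep the maximum value in each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


# Hypothetical 6x6 input (identity pattern) and 3x3 diagonal filter.
image = [[1 if i == j else 0 for j in range(6)] for i in range(6)]
kernel = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

feature_map = conv2d(image, kernel)           # 6x6 input, 3x3 filter -> 4x4
pooled = max_pool(feature_map)                # 4x4 -> 2x2
same_size = conv2d(image, kernel, padding=1)  # padding keeps the 6x6 size
```

With stride 1 and no padding, a 6x6 input and a 3x3 filter give a 4x4 feature map, matching the sizes quoted in the figure caption; with one pixel of zero padding the feature map keeps the 6x6 size of the input.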
The size of the kernel or filter is 3x3, hence the size of the image section is also 3x3.

[Worked calculation: each feature-map entry is the sum of the element-wise products of a 3x3 image section and the filter; the numbers are not legible in the source]

MODULE 5 PART-I

REINFORCEMENT LEARNING - Introduction to Reinforcement Learning, Learning Task, Example of Reinforcement Learning in Practice, Learning Models for Reinforcement - (Markov Decision process, Q Learning - Q Learning function, Q Learning Algorithm), Application of Reinforcement Learning, Introduction to Deep Q Learning. GENETIC ALGORITHMS: Introduction, Components, GA cycle of reproduction, Crossover, Mutation, Genetic Programming, Models of Evolution and Learning, Applications.

Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.

The elements of reinforcement learning are: Agent, Environment, Action, State, Policy, Reward.

Learning Models in RL:
• Markov Decision Process
• Q-Learning Algorithm
• Deep Q Learning

The Markov Property states that: "The future is independent of the past given the present."

Mathematically, we can express this statement as:

$P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, \ldots, S_t]$

It says that "If the agent is present in the current state s1, performs an action a1, and moves to the state s2, then the state transition from s1 to s2 depends only on the current state and action; future actions and states do not depend on past actions, rewards, or states."

MDP is a framework that can solve most reinforcement learning problems with discrete actions. With the Markov Decision Process, an agent can arrive at an optimal policy for maximum rewards over time.

A Markov Process is a memoryless random process, i.e.
a sequence of random states S[1], S[2], ..., S[n] with the Markov property.

A Markov decision process is a 5-tuple (S, A, P, R, γ):
• S is the set of states.
• A is the set of actions.
• P(S, A, S') is the probability that action A in state S at time T will lead to state S' at time T+1.
• R(S, A, S') is the immediate reward received after a transition from state S to state S' due to action A.
• Discount factor (γ): It determines how much importance is given to the immediate reward relative to future rewards. It has a value between 0 and 1.

Q-learning algorithm
• Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
• The main objective of Q-learning is to learn the policy which can inform the agent what actions should be taken, under what circumstances, to maximize the reward.
• The goal of the agent in Q-learning is to maximize the value of Q.
• Q stands for quality in Q-learning, which means it specifies the quality of an action taken by the agent.
• A Q-table is used to find the best action for each state in the environment. We use the Bellman equation at each state to get the expected future state and reward, and save it in a table to compare with other states.

Bellman Equation

$V(s) = \max_a \left[ R(s, a) + \gamma V(s') \right]$

Where,
V(s) = value calculated at a particular point.
R(s, a) = reward at a particular state s for performing an action a.
γ = discount factor.

The Q-learning algorithm works like this:
1. Initialize all Q-values, e.g., with zeros.
2. Choose an action a in the current state s based on the current best Q-value.
3. Perform this action a and observe the outcome (new state s').
4. Measure the reward R after this action.
5. Update Q with an update formula that is called the Bellman equation.
Repeat steps 2 to 5 until the learning no longer improves.

EXAMPLE: An example of Q-learning is an advertisement recommendation system. In a normal ad recommendation system, the ads you get are based on your previous purchases or websites you may have visited. If you have bought a TV, you will get recommended TVs of different brands. Using Q-learning, we can optimize the ad recommendation system to recommend products that are frequently bought together. The reward is given if the user clicks on the suggested product.

DEEP Q-LEARNING MODEL
• The Q-learning approach is practical only for very small environments and quickly loses its feasibility when the number of states and actions in the environment increases.
• The solution to this problem leads us to Deep Q-Learning, which uses a deep neural network to approximate the values.
• Deep Q-Learning uses the Q-learning idea and takes it one step further.
• Instead of using a Q-table, we use a neural network that takes a state and approximates the Q-values for each action based on that state.

The basic working step of Deep Q-Learning is that the initial state is fed into the neural network, which returns the Q-values of all possible actions as its output.

The difference between Q-Learning and Deep Q-Learning can be illustrated as follows:

[Figure: in Q-learning, a state and an action are looked up in a Q-table to give a Q-value; in Deep Q-Learning, a state is fed to a deep neural network that outputs one Q-value per action]

Ques 6) What are the applications of reinforcement learning?

Following are the applications of reinforcement learning:
1. Robotics for industrial automation.
2. Business strategy planning.
3. Machine learning and data processing.
4. It helps us to create training systems that provide custom instruction and materials according to the requirements of students.
5.
Aircraft control and robot motion control.

The genetic algorithm reflects the process of natural selection, where the fittest individuals are selected in order to produce the offspring of the next generation.

The process of natural selection starts with the selection of the fittest individuals from a population.
• They produce offspring which inherit the characteristics of the parents and are added to the next generation.
• If parents have better fitness, their offspring will be better than the parents and have a better chance of surviving.
• This process keeps on iterating, and at the end a generation with the fittest individuals will be found.
• This notion can be applied to a search problem.

The genetic algorithm is a method for solving both constrained and unconstrained optimization problems that is based on natural selection, the process that drives biological evolution. The genetic algorithm repeatedly modifies a population of individual solutions.

Five phases are considered in a genetic algorithm:
1. Initial population
2. Fitness function
3. Selection
4. Crossover
5. Mutation

Initial Population
The process begins with a set of individuals which is called a population. Each individual is a solution to the problem you want to solve.

[Figure: a population of individuals A1 to A4, each shown as a bit string]

Fitness Function
The fitness function determines how fit an individual is (the ability of an individual [...]).

Selection
The idea of the selection phase is to select the fittest individuals and let them pass their genes to the next generation. Two pairs of individuals [...]. Individuals with high fitness [...].

Crossover
Crossover is the most [...] parents to be mated [...]. For example, consider [...]. Offspring are created by exchanging the genes of the parents among themselves until the crossover point is reached.

[Figure: crossover of parent bit strings]

Mutation
In certain new offspring [...] a mutation with a low random probability. [...] bit string can be flipped.

[Figure: mutation, before and after]

Mutation occurs to maintain diversity within the population and to prevent premature convergence.

Termination
The algorithm terminates if the population has converged (does not produce offspring which are significantly different from the previous generation).
Then it is said that the genetic algorithm has provided a set of solutions to our problem.

Once the initial generation is created, the algorithm evolves the generation using the following operators:

1) Selection Operator: The idea is to give preference to the individuals with good fitness scores and allow them to pass on their genes.

2) Crossover Operator: This represents mating between individuals. Two individuals are selected using the selection operator, and crossover sites are chosen randomly. Then the genes at these crossover sites are exchanged, thus creating a completely new individual (offspring).
For example: [figure not legible in the source]

3) Mutation Operator: The key idea is to insert random genes in the offspring to maintain the diversity in the population and to avoid premature convergence.
For example: [figure not legible in the source]

Unit I: INTRODUCTION - Learning, Types of Learning, Well defined learning problems, Designing a Learning System, History of ML, Introduction of Machine Learning Approaches - (Artificial Neural Network, Clustering, Reinforcement Learning, Decision Tree Learning, Bayesian networks, Support Vector Machine, Genetic Algorithm), Issues in Machine Learning and Data Science Vs Machine Learning.

Unit II: REGRESSION: Linear Regression and Logistic Regression. BAYESIAN LEARNING - Bayes theorem, Concept learning, Bayes Optimal Classifier, Naive Bayes classifier, Bayesian belief networks, EM algorithm. SUPPORT VECTOR MACHINE: Introduction, Types of support vector kernel - (Linear kernel, polynomial kernel, and Gaussian kernel), Hyperplane - (Decision surface), Properties of SVM, and Issues in SVM.

1. In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts created a model of neurons using an electrical circuit, and thus the neural network was created.
2. In 1952, Arthur Samuel created the first computer program which could learn as it ran.
3. Frank Rosenblatt designed the first artificial neural network in 1958, called the Perceptron.
The main goal of this was pattern and shape recognition.
4. Use of back propagation in neural networks came in 1986, when researchers from the Stanford psychology department decided to extend an algorithm created by Widrow and Hoff in 1962. This allowed multiple layers to be used in a neural network, creating what are known as "slow learners", which learn over a long period of time.
5. In 1997, the IBM computer Deep Blue, which was a chess-playing computer, beat the world chess champion.

21st Century:
1. Since the start of [...] learning will increase [...].

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies. It is also called a Bayes [...] model. Bayesian networks are probabilistic [...] a probability distribution [...].

It consists of two parts:
• Directed Acyclic Graph
• Table of conditional probabilities

[Example, only partly legible in the source: the burglary/earthquake alarm network. Surviving rows of the conditional probability table: (False, True): 0.31, 0.69; (False, False): 0.001, 0.999. The accompanying worked calculation is not legible.]

The Bayes Optimal Classifier is a probabilistic model that predicts the most likely outcome for a new situation. It is based on Bayes theorem. It is also related to Maximum a Posteriori (MAP), a probabilistic framework for determining the most likely hypothesis for a training dataset.

Take a hypothesis space that has 3 hypotheses h1, h2, and h3 with posterior probabilities P(h1|D) = 0.4, P(h2|D) = 0.3, and P(h3|D) = 0.3. Hence, h1 is the MAP hypothesis. Suppose a new instance x is encountered which is classified negative by h2 and h3 but positive by h1.

$P(v_j \mid D) = \sum_{h_i \in H} P(v_j \mid h_i) \, P(h_i \mid D)$

The most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities.

To illustrate in terms of the above example, the set of possible classifications of the new instance is $V = \{\oplus, \ominus\}$,
and

$P(h_1 \mid D) = 0.4, \quad P(\ominus \mid h_1) = 0, \quad P(\oplus \mid h_1) = 1$
$P(h_2 \mid D) = 0.3, \quad P(\ominus \mid h_2) = 1, \quad P(\oplus \mid h_2) = 0$
$P(h_3 \mid D) = 0.3, \quad P(\ominus \mid h_3) = 1, \quad P(\oplus \mid h_3) = 0$

therefore

$\sum_{h_i \in H} P(\oplus \mid h_i) \, P(h_i \mid D) = 0.4$
$\sum_{h_i \in H} P(\ominus \mid h_i) \, P(h_i \mid D) = 0.6$

and

$\arg\max_{v_j \in \{\oplus, \ominus\}} \sum_{h_i \in H} P(v_j \mid h_i) \, P(h_i \mid D) = \ominus$

[Unit III syllabus, only partly legible in the source: ... algorithm, Inductive bias, ... theory, Information ...; INSTANCE-BASED ... Regression, Radial ...]

In machine learning, inductive bias refers to the assumptions made by a learning algorithm to generalize from a finite set of observations to the whole domain.

Inductive bias describes the basis on which one decision tree is preferred over all the other possible decision trees. ID3 searches in favor of shorter trees over longer ones, and selects the attribute with the highest information gain as the root attribute.

That is to say, inductive inference is based on a generalization from a finite set of past observations, extending the observed pattern or relation to other future instances or instances occurring elsewhere. It is logically true, but it might not be realistically true.

i. ID3 is an algorithm used to generate a decision tree from a dataset.
ii. To construct a decision tree, ID3 uses a top-down, greedy search through the given sets, where each attribute at every tree node is tested to select the attribute that is best for classification of a given set.
iii. For constructing a decision tree, information gain is calculated for each and every attribute, and the attribute with the highest information gain becomes the root node.

i. C4.5 is an algorithm used to generate a decision tree. It is an extension of the ID3 algorithm.
ii. It is better than the ID3 algorithm because it deals with both continuous and discrete attributes, handles missing values, and prunes trees after construction.
iii. C5.0 is the commercial successor of C4.5; it is faster, more memory-efficient, and builds smaller decision trees.
v. C4.5 performs a tree pruning process by default.

• Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred over one with a high Gini index.
• CART creates only binary splits, and it uses the Gini index to create them.
• The Gini index can be calculated using the formula below:

$Gini\ Index = 1 - \sum_j P_j^2$

Decision trees can represent any boolean function of the input attributes. Let us use decision trees to perform the function of three boolean gates: AND, OR, and XOR.

• K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the supervised learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category by using the K-NN algorithm.

[Figure: input value -> KNN classifier -> predicted output]

Need

With the help of K-NN, we can easily identify the category or class of a particular data point.
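The store-then-compare behaviour of K-NN described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the function name `knn_predict`, the toy 2-D points, and the labels "A" and "B" are hypothetical, and Euclidean distance with a simple majority vote is assumed.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    points (Euclidean distance). Illustrative sketch, not a library API."""
    # Distance from the query to every stored training point.
    distances = sorted(
        ((math.dist(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    # Labels of the k closest points.
    nearest_labels = [label for _, label in distances[:k]]
    # The category with the most neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(points, labels, (2, 2), k=3)
near_b = knn_predict(points, labels, (6, 5), k=3)
```

There is no training step in this sketch: all the data is kept, and the distance computation happens only when a query arrives, which is why K-NN is grouped with the lazy, instance-based learners.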
Working

The working of K-NN can be explained on the basis of the algorithm below:
• Step-1: Select the number K of neighbors.
• Step-2: Calculate the Euclidean distance from the new data point to the available data points.
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step-4: Among these K neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step-6: Our model is ready.

Unit IV: ARTIFICIAL NEURAL NETWORKS - Perceptrons, Multilayer perceptron, Gradient descent and the Delta rule, Multilayer networks, Derivation of Backpropagation Algorithm, Generalization, Unsupervised Learning - SOM Algorithm and its variant; DEEP LEARNING - Introduction, concept of convolutional neural network, Types of layers - (Convolutional Layers, Activation function, pooling, fully connected), Concept of Convolution (1D and 2D) layers, Training of network, Case study of CNN, e.g. on Diabetic Retinopathy, Building a smart speaker, Self-driving car etc.

Convergence of a neural network is the point in training a model after which changes in the learning rate become lower and the errors produced by the model in training come to a minimum. Convergence helps in defining how many training iterations a neural network will require to produce minimum errors.

Most neural networks fail to converge because:
• the amount of training data is low,
• weights are applied inappropriately in the network, or
• not enough nodes are implemented.

There are various things that can help in avoiding this failure:
• A change in the activation function can be helpful.
• Reinitialization of the weights of the network.
• A higher learning rate or number of epochs should be avoided to make the neural network converge faster.

Generalization essentially means how good our model is at learning from the given data and applying the learnt information elsewhere.
When training a neural network, some of the data is used for training, and some of the data is reserved for checking the performance of the neural network. If the neural network performs well on the data it has not trained on, we can say it has generalized well.

Due to overfitting, a neural network fails to form a general understanding. Adding dropout is one of the most popular and effective ways to reduce overfitting in neural networks.

[Figure: dropout applied to a neural network at a given instant, and the equivalent neural network at this instant]

Self Driving car

CNN is the primary algorithm that these systems use to recognize and classify different parts of the road and to make appropriate decisions.

To understand the workings of self-driving cars, we need to examine the four main parts:
1. Perception
2. Localization
3. Prediction
4. Decision Making

Perception
Perception helps the car see the world around itself, as well as recognize and classify the things that it sees. To achieve such a high level of perception, a self-driving car must have three sensors:
1. Camera
2. LiDAR (Light Detection And Ranging)
3. RADAR (Radio Detection And Ranging)

Localization
Localization algorithms in self-driving cars calculate the position and orientation of the vehicle as it navigates.

Prediction
The car has a 360-degree view of its environment that enables it to perceive and capture all the information and process it. Prediction creates n possible actions or moves based on the environment.

Decision-making
Decision-making is vital in self-driving cars.
In order to make a decision, the car should have enough information so that it can select the necessary set of actions.

Building a smart speaker
A smart speaker is a wireless electronic device that can respond to spoken commands.
Hardware Components
o Raspberry Pi
o ReSpeaker 2-Mics HAT / USB mic / USB sound card
o SD card
o Speaker
o 3.5mm Aux cable / JST PH2.0 connector
Speech recognition is used. A Convolutional Neural Network (CNN) is applied as an advanced deep neural network to classify each word from the pooled data set as a multi-class classification task. The proposed deep neural network returned 97.06% word classification accuracy on a completely unknown speech sample.

UNIT V: REINFORCEMENT LEARNING - Introduction to Reinforcement Learning, Learning Task, Example of Reinforcement Learning in Practice, Learning Models for Reinforcement (Markov Decision process, Q Learning - Q Learning function, Q Learning Algorithm), Application of Reinforcement Learning, Introduction to Deep Q Learning. GENETIC ALGORITHMS: Introduction, Components, GA cycle of reproduction, Crossover, Mutation, Genetic Programming, Models of Evolution and Learning, Applications.

1. RL in Robotics: Robotics without any doubt facilitates training a robot in such a way that a robot can perform tasks just like a human being can. But still there is a bigger challenge: the robots aren't able to use common sense [...]
2. Traffic Control: Reinforcement learning [...] and optimization for traffic control [...]
3. Gaming: [...]
4. Natural Language Processing: Predictive text, [...] are all examples of [...] learning. By studying [...] how people speak to each other [...]

Two evolution models:
o Lamarckian evolution
o Baldwin effect
Lamarckian evolution holds that an individual's genetic makeup is changed by its lifetime experience. That is, [...] organism [...] during its life to adapt [...]

1. Optimization: Genetic Algorithms are most commonly used in optimization problems wherein we have to maximize or minimize a given objective function value under a given set of constraints.
2. Traveling salesman problem (TSP): The aim of this problem is to find an optimal route for the salesman to cover. After each iteration, we can generate offspring solutions that inherit the qualities of the parent solutions.
3. Financial markets: In the financial market, genetic optimization can address a variety of issues because it helps in finding an optimal set or combination of parameters that can affect the market rules and trades.
4. Manufacturing system: One of the major applications of genetic optimization is to minimize a cost function using the optimized set of parameters.
5. Parametric Design of Aircraft: GAs have been used to design aircraft by varying the parameters and evolving better solutions.
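The optimization use case above can be sketched as a small genetic algorithm that uses the components named in the syllabus: selection, crossover and mutation. Everything specific here is an assumption chosen for illustration, not something taken from the notes: the toy objective (maximize the number of 1-bits in a bit string), tournament selection, keeping the best individual unchanged each generation (elitism), the parameter values, and the function names.

```python
import random

def fitness(chromosome):
    # Toy objective to maximize: the number of 1-bits in the chromosome.
    return sum(chromosome)

def tournament(population, rng, size=3):
    # Selection: return the fittest of a small random sample.
    return max(rng.sample(population, size), key=fitness)

def crossover(parent_a, parent_b, rng):
    # Single-point crossover: prefix from one parent, suffix from the other.
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rng, rate=0.02):
    # Mutation: flip each bit independently with probability `rate`.
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]

def run_ga(length=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    initial_best = max(population, key=fitness)
    for _ in range(generations):
        # Elitism (an assumption of this sketch): carry the best individual over.
        elite = max(population, key=fitness)
        children = []
        while len(children) < pop_size - 1:
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = [elite] + children
    return initial_best, max(population, key=fitness)

initial_best, best = run_ga()
print("best fitness in first generation:", fitness(initial_best))
print("best fitness in last generation: ", fitness(best))
```

Because the best individual is always carried over, the best fitness can never decrease from one generation to the next. For a problem such as TSP, the bit-string chromosome and the crossover operator would have to be replaced by a route encoding that keeps every city exactly once.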
