Machine Learning
23AI4PCMLG
UNIT -1
Machine Learning
Sem IV
Course Title: Machine Learning
Course Code: 23AI4PCMLG Total Contact Hours: 40 hours
L-T-P: 3-0-1 Total Credits: 4
UNIT WISE DETAILS
Unit 1 (8 hours): Machine Learning Landscape: Introduction, Types of Machine Learning, Challenges of Machine Learning, Testing and Validating. Supervised Learning. Decision Tree Learning: Decision tree representation, Appropriate problems for decision tree learning, Basic decision tree learning algorithm, Issues in Decision tree learning, CART Training algorithm.
Unit 2 (8 hours): Support Vector Machines: Linear SVM, Non Linear SVM, SVM Regression, Under the Hood. Instance Based Learning: Introduction, k-Nearest Neighbor learning.
Unit 3 (8 hours): Probabilistic Learning. Bayesian Learning: Bayes Theorem and Concept Learning, Maximum Likelihood, Minimum Description Length Principle, Bayes Optimal Classifier, Gibbs Algorithm, Naïve Bayes Classifier, Bayesian Belief Network, EM Algorithm.
Unit 4 (8 hours): Ensemble Learning and Random Forests: Voting Classifiers, Bagging and Pasting, Random Patches and Random Subspaces, Random Forests, Boosting, Stacking.
Unit 5 (8 hours): Unsupervised Learning Techniques: Clustering – Kmeans, DBSCAN, Other Clustering Algorithms, Gaussian Mixtures – Anomaly Detection, Selecting Clustering, Bayesian Gaussian Mixture Models, Other algorithms for anomaly and novelty detection. Reinforcement Learning: Markov Decision Process, Introduction, Learning Task, Q Learning.
TEXT BOOK DETAILS
Prescribed Text Book
1. Machine Learning, Tom M. Mitchell, First Edition, McGraw Hill Education, 2013.
2. Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, Aurelien Geron, Second Edition, O’Reilly, 2020.
TEXT BOOK DETAILS
Reference Text Book
1. Introduction to Machine Learning with Python, Andreas C. Muller & Sarah Guido, First Edition, Shroff Publishers, 2019.
2. Thoughtful Machine Learning, Matthew Kirk, First Edition, Shroff Publishers, 2019.
Course Outcomes
CO1 Apply different learning algorithms for various complex problems.
CO2 Analyze the learning techniques for a given dataset.
CO3 Design a model using machine learning to solve a problem.
CO4 Ability to conduct practical experiments to solve problems using appropriate machine learning techniques.
SEE Exam Question paper format
Unit-1: Mandatory. One question to be asked for 20 marks.
Unit-2: Mandatory. One question to be asked for 20 marks.
Unit-3: Internal choice. Two questions to be asked for 20 marks each.
Unit-4: Internal choice. Two questions to be asked for 20 marks each.
Unit-5: Mandatory. One question to be asked for 20 marks.
Lab Program
Lab Program 1 (Unit 1): Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
Lab Program 2 (Unit 2): Develop a program to construct a Support Vector Machine considering a sample dataset.
Lab Program 3 (Unit 2): Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
Lab Program 4 (Unit 3): Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
Lab Program 5 (Unit 3): Write a program to construct a Bayesian network considering training data. Use this model to make predictions.
Lab Program 6 (Unit 3): Apply the EM algorithm to cluster a set of data stored in a .CSV file. Compare the results of the k-Means algorithm and the EM algorithm.
Lab Program 7 (Unit 4): Implement the Boosting ensemble method on a given dataset.
Lab Program 8 (Unit 4): Write a program to construct a random forest for sample training data. Display model accuracy using various metrics.
Lab Program 9 (Unit 5): Implement tic-tac-toe using reinforcement learning.
Lab Program 10 (Unit 5): Consider a sample application. Deploy a machine learning model as a web service and make it available for users to predict a given instance.
Key Motivations for Machine Learning
Systems that support humans by either improving upon existing human capabilities or providing
new capabilities
Problems Solved by Machine Learning Today
Spam Detection
Information Retrieval
Recognition
Robotics
Recommendation Systems
(Figures: examples of recognition, information retrieval, robotics, recommendation systems, computer vision systems, and home virtual assistants.)
Machine Learning - Definition
Machine Learning is the science (and art) of programming computers so they can learn from
data.
Machine Learning is the field of study that gives computers the ability to learn without being
explicitly programmed. —Arthur Samuel, 1959
A computer program is said to learn from experience E with respect to some task T and some
performance measure P, if its performance on T, as measured by P, improves with experience E.
—Tom Mitchell, 1997
Machine Learning
▪ The examples that the system uses to learn are called the training set.
▪ Each training example is called a training instance (or sample).
▪ In this case, the task T is to flag spam for new emails, the experience E is the training data,
and the performance measure P needs to be defined; for example, you can use the ratio of
correctly classified emails.
▪ This particular performance measure is called accuracy and it is often used in classification
tasks.
Why Use Machine Learning?
Consider how you would write a spam filter using traditional programming techniques: you would first notice that some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to come up a lot in spam, and then write rules to flag emails containing them.
Why Use Machine Learning?
a spam filter based on Machine Learning techniques automatically learns which words and phrases are good
predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the
ham examples
If spammers notice that all their emails
containing “4U” are blocked, they might
start writing “For U” instead.
A spam filter using traditional programming
techniques would need to be updated to flag
“For U” emails.
If spammers keep working around your spam
filter, you will need to keep writing new rules
forever
Why Use Machine Learning?
In contrast, a spam filter based on Machine Learning techniques automatically notices that “For
U” has become unusually frequent in spam flagged by users, and it starts flagging them without
your intervention
Why Use Machine Learning?
Machine Learning can help humans learn
Applying ML techniques to dig into large amounts of data can help discover patterns that were
not immediately apparent. This is called data mining.
Types of Machine Learning Systems
Machine Learning systems can be classified according to the amount and type of supervision
they get during training.
1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Reinforcement Learning
5. Batch and Online Learning
6. Instance-Based Versus Model-Based Learning
Supervised
The training data fed to the algorithm includes the desired solutions, called labels.
Classification
▪ Spam filter
▪ Trained with many example emails along with their class (spam or ham)
▪ Model must learn how to classify new emails
Supervised
Regression
▪ Predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors.
▪ To train the system, you need to give it many examples of cars, including both their predictors
and their labels (i.e., their prices).
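A minimal sketch of this idea, assuming scikit-learn is available; the tiny car table (mileage, age, price) below is invented purely for illustration:

```python
# Minimal regression sketch (assumes scikit-learn; the tiny car dataset is invented).
import numpy as np
from sklearn.linear_model import LinearRegression

# Predictors: [mileage in km, age in years]; labels: price (made-up values).
X = np.array([[20000, 1], [45000, 2], [80000, 4], [120000, 6], [160000, 8]])
y = np.array([8.5, 7.2, 5.9, 4.1, 3.0])

model = LinearRegression().fit(X, y)   # learn from predictors and their labels
print(model.predict([[60000, 3]]))     # predict the price of an unseen car
```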
Supervised algorithms - Types
Supervised learning algorithms
• k-Nearest Neighbors
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks
Supervised – An Example for Classification
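The original slide shows a figure; as a stand-in, here is a minimal classification sketch, assuming scikit-learn and its bundled Iris dataset:

```python
# Minimal supervised-classification sketch (assumes scikit-learn and its Iris dataset).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)   # labeled examples: features + class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # learn from labeled examples
print("test accuracy:", clf.score(X_test, y_test))              # classify new (held-out) instances
```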
Unsupervised learning
▪ Data that is not associated with labels is called unlabelled data.
▪ The training data is unlabeled
▪ The system tries to learn without a teacher.
Unsupervised learning
• Clustering
— K-Means
— DBSCAN
— Hierarchical Cluster Analysis (HCA)
• Anomaly detection and novelty detection
— One-class SVM
— Isolation Forest
• Visualization and dimensionality reduction
— Principal Component Analysis (PCA)
— Kernel PCA
— Locally-Linear Embedding (LLE)
— t-distributed Stochastic Neighbor Embedding (t-SNE)
• Association rule learning
— Apriori
— Eclat
Unsupervised learning
▪ Clustering - detect groups of similar instances (e.g., groups of similar visitors).
▪ Example - a hierarchical clustering algorithm may also subdivide each group into smaller groups.
▪ This may help you target your posts for each group.
▪ Visualization algorithms are also good examples of unsupervised learning algorithms.
▪ The algorithm is fed a lot of complex, unlabeled data, and the output is a 2D or 3D representation of the data that can easily be plotted.
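A minimal clustering sketch along these lines, assuming scikit-learn; synthetic blob data stands in for real visitor data:

```python
# Minimal clustering sketch (assumes scikit-learn; blob data stands in for real, unlabeled data).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)     # unlabeled instances
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:10])   # group assigned to each of the first 10 instances
```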
Unsupervised learning
▪ Dimensionality reduction
▪ Simplify the data without losing too much information.
▪ One way to do this is to merge several correlated features into one.
▪ For example,
▪ a car’s mileage may be very correlated with its age
▪ Humidity and temperature
▪ Dimensionality reduction algorithm will merge them into one feature that represents the car’s
wear and tear. This is called feature extraction.
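A minimal feature-extraction sketch, assuming scikit-learn; the two correlated columns (mileage and age) are invented for illustration:

```python
# Minimal feature-extraction sketch: merge two correlated features into one with PCA
# (assumes scikit-learn; the small mileage/age array is invented for illustration).
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[20000, 1], [45000, 2], [80000, 4], [120000, 6], [160000, 8]], dtype=float)
X_wear = PCA(n_components=1).fit_transform(X)   # a single combined "wear and tear" feature
print(X_wear.ravel())
```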
Unsupervised learning
Anomaly detection
▪ For example, detecting unusual credit card transactions to prevent fraud
▪ Catching manufacturing defects
▪ Automatically removing outliers from a dataset before feeding it to another learning algorithm
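A minimal anomaly-detection sketch, assuming scikit-learn; Isolation Forest (one of the algorithms listed earlier) is used on synthetic data:

```python
# Minimal anomaly-detection sketch with Isolation Forest (assumes scikit-learn; data is synthetic).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),   # "normal" transactions
               [[8.0, 8.0]]])                     # one obvious outlier
detector = IsolationForest(random_state=42).fit(X)
print(detector.predict([[0.1, 0.2], [8.0, 8.0]]))  # +1 = normal, -1 = anomaly
```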
(Figure: supervised learning vs. unsupervised learning.)
Semisupervised learning
Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data
and a little bit of labeled data.
▪ Photo-hosting services, such as Google Photos
▪ Once you upload all your family photos to the service, it automatically recognizes that the same
person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and
7.
▪ This is the unsupervised part of the algorithm
(clustering).
▪ Now all the system needs is for you to tell it
who these people are.
▪ Just one label per person, and it is able to name everyone in every photo, which is useful for searching photos.
Reinforcement Learning
The learning system, called an agent, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards).
It must then learn by itself what is the best strategy, called a policy, to get the most reward over
time.
A policy defines what action the agent should choose when it is in a given situation.
Reinforcement Learning
DeepMind’s AlphaGo program
Made the headlines in May 2017 when it beat the world champion Ke Jie at the game of Go.
It learned its winning policy by analyzing millions of games, and then playing many games against itself
Batch and Online Learning
Batch Learning
▪ The system is incapable of learning incrementally from a stream of incoming data.
▪ It must be trained using all the available data.
▪ This will generally take a lot of time and computing resources, so it is typically done offline.
▪ Also called Offline learning
▪ System is trained and launched into production
▪ It requires no more learning; it just applies what it has learned.
▪ What happens when there is New data ??? (such as a new type of spam)
▪ Train a new version of the system from scratch on the full dataset (not just the new data, but also the
old data)
▪ Stop the old system and replace it with the new one.
Batch Learning
▪ The whole process can be automated fairly easily.
▪ Training on the full set of data requires a lot of computing resources (CPU, memory space, disk
space, disk I/O, network I/O, etc.).
▪ The cost of training is huge.
▪ This is a bad option for a system with limited resources.
Online learning
▪ Train the system incrementally by feeding it data instances
▪ Sequentially
▪ Individually
▪ By small groups called mini-batches.
▪ The algorithm loads part of the data, runs a training step on that data, and repeats the process
until it has run on all of the data
▪ Each learning step is fast and cheap, so the
system can learn about new data on the fly
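A minimal online-learning sketch, assuming scikit-learn; SGDClassifier supports incremental training on mini-batches through partial_fit, and the Iris data is split into chunks to imitate a stream:

```python
# Minimal online-learning sketch: incremental training on mini-batches
# (assumes scikit-learn; Iris split into chunks imitates a stream of incoming data).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.utils import shuffle

X, y = load_iris(return_X_y=True)
X, y = shuffle(X, y, random_state=42)   # simulate instances arriving in random order
classes = np.unique(y)                  # all classes must be declared up front for partial_fit

clf = SGDClassifier(random_state=42)
for X_batch, y_batch in zip(np.array_split(X, 10), np.array_split(y, 10)):
    clf.partial_fit(X_batch, y_batch, classes=classes)   # one fast, cheap learning step per mini-batch

print("accuracy on the data seen so far:", clf.score(X, y))
```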
Online learning
▪ Receive data as a continuous flow (e.g., stock prices) and need to adapt to change rapidly or
autonomously.
▪ It is also a good option if you have limited computing resources.
▪ Online learning can also be used to train systems on huge datasets that cannot fit in one machine’s main memory (this is called out-of-core learning).
Online learning
Learning rate
▪ How fast should the system adapt to changing data?
▪ If you set a high learning rate, the system will learn fast; it will rapidly adapt to new data, but it will also tend to quickly forget the old data.
▪ If you set a low learning rate, the system will learn more slowly.
▪ A big challenge with online learning is that if bad data is fed to the system, the system’s
performance will gradually decline.
Instance-based learning
▪ Flagging emails that are identical to known spam emails
▪ Alternative approach - flag emails that are very similar to known spam emails.
▪ Measure of similarity between two emails.
▪ Count the number of words they have in common.
▪ Steps
▪ Learns
▪ Generalizes to new cases by comparing them to the learned examples
▪ The new instance would be classified as a triangle because the majority of the most similar
instances belong to that class.
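A minimal instance-based learning sketch, assuming scikit-learn; k-Nearest Neighbors classifies a new case by the majority class among the most similar stored examples (it also mirrors Lab Program 3):

```python
# Minimal instance-based (k-NN) sketch on the Iris dataset (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)   # "learning" = storing the examples
pred = knn.predict(X_test)                                        # majority vote among 5 most similar
print("correct:", (pred == y_test).sum(), "wrong:", (pred != y_test).sum())
```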
Model-based learning
▪ Steps
▪ Study the data
▪ Select a model
▪ Train it on the training data (i.e., the learning algorithm searches for the model parameter values that minimize a cost function).
▪ Finally, apply the model to make predictions on new cases (this is called inference), hoping that the model will generalize well.
Main Challenges of Machine Learning
Insufficient Quantity of Training Data
❑ Very simple problems need thousands of examples.
❑ Complex problems such as image or speech recognition need millions of examples.
Nonrepresentative Training Data
❑ In order to generalize well, it is crucial that your training data be representative of the new
cases you want to generalize to
Poor-Quality Data
❑ If the training data is full of errors, outliers, and noise, the system finds it difficult to detect the underlying patterns.
❑ Hence it is less likely to perform well. Solution - clean up the training data.
Main Challenges of Machine Learning
Irrelevant Features
System will only be capable of learning if the training data contains enough relevant features
and not too many irrelevant ones
Feature engineering
Identify good set of features to train on
• Feature selection: selecting the most useful features to train on among existing features.
• Feature extraction: combining existing features to produce a more useful one (for example, via dimensionality reduction).
• Creating new features by gathering new data.
Main Challenges of Machine Learning
Overfitting the Training Data
Overfitting means that the model performs well on the training data, but it does not generalize well.
Solutions
▪ To simplify the model by selecting one with fewer parameters
▪ Reduce the number of attributes in the training data or by constraining the model
▪ To gather more training data
▪ To reduce the noise in the training data (e.g., fix data errors and remove outliers)
Main Challenges of Machine Learning
▪ Underfitting the Training Data
▪ Occurs when a model is too simple
▪ Model needs more training time, more input features
▪ The main options to fix this problem are:
• Selecting a more powerful model, with more parameters
• Feeding better features to the learning algorithm (feature engineering)
• Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)
Testing and Validating
The only way to know how well a model will generalize to new cases is to try it out on new cases.
▪ One option is to put your model in production and monitor how well it performs.
▪ This is not the best idea.
Solution
▪ Split your data into two sets: the training set and the test set.
▪ The error rate on new cases is called the generalization error
▪ Evaluate model on the test set, get an estimate of this error.
▪ This value tells you how well your model will perform on instances it has never seen before.
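A minimal sketch of this split, assuming scikit-learn; the error on the held-out test set estimates the generalization error:

```python
# Minimal train/test split sketch (assumes scikit-learn; Iris stands in for a real dataset).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
test_error = 1 - model.score(X_test, y_test)   # estimate of the generalization error
print("estimated generalization error:", test_error)
```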
Hyperparameter Tuning and Model
Selection
▪ How can you decide between two models (say a linear model and a polynomial model)
▪ One option is to train both and compare how well they generalize using the test set.
▪ The problem: by measuring the generalization error multiple times on the test set, you adapt the model and hyperparameters to produce the best model for that particular set.
▪ This means that the model is unlikely to perform as well on new data.
▪ A common solution to this problem is called holdout validation
(Figure: holdout validation.)
Hyperparameter Tuning and Model
Selection - holdout validation
However, if the validation set is too small, then model evaluations will be imprecise
We end up selecting a suboptimal model by mistake.
Conversely, if the validation set is too large, then the remaining training set will be much
smaller than the full training set.
Solution
Perform repeated cross-validation, using many small validation sets
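A minimal cross-validation sketch, assuming scikit-learn; cross_val_score repeatedly trains on part of the data and validates on the remaining fold, then the scores are averaged:

```python
# Minimal cross-validation sketch (assumes scikit-learn; Iris stands in for a real dataset).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)
print("validation accuracy per fold:", scores)
print("mean validation accuracy:", scores.mean())
```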
(Figure: cross-validation.)
Data Mismatch
▪ One solution is to hold out part of the training set in yet another set, called the train-dev set.
▪ Dataset is split into
o Training set is used for training.
o The Dev set (often called “validation set”) is used for adjusting the model
o The Test set is a final check of your completed model.
▪ After the model is trained, evaluate it on the train-dev set: if performance is good, the model is not overfitting the training set.
▪ In that case, if performance is poor on the validation set, the problem must come from the data mismatch.
Data Mismatch
Here is a way to split the data into three sets: 80% train, 10% dev and 10% test. OR 60% train,
20% dev and 20% test
Decision Trees
Decision Trees
The initial node is called the root node (colored in blue), the final nodes are called the leaf
nodes (colored in green) and the rest of the nodes are called intermediate or internal nodes.
Decision tree that is used to classify whether a person is Fit or Unfit.
Decision Trees
ID3 stands for Iterative Dichotomiser 3: it iteratively (repeatedly) dichotomizes (divides) the features into two or more groups at each step.
Decision Trees
Example instance: (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong)
The tree corresponds to the expression: (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
APPROPRIATE PROBLEMS FOR
DECISION TREE LEARNING
Decision tree learning is generally best suited to problems with the following characteristics:
1. Instances are represented by attribute-value pairs.
▪ Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot).
▪ Each attribute takes on a small number of disjoint possible values e.g., Hot, Mild, Cold
2. The target function has discrete output values.
▪ Assigns a boolean classification (e.g., yes or no) to each example
3. Disjunctive descriptions may be required - represent disjunctive expressions.
4. Decision tree learning methods are robust to errors
▪ Errors in classifications of the training examples
▪ Errors in the attribute values that describe these examples
THE BASIC DECISION TREE LEARNING
ALGORITHM
▪ Our basic algorithm, ID3, learns decision trees by constructing them top-down
▪ Which attribute should be tested at the root of the tree?
▪ Each instance attribute is evaluated using a statistical test to determine how well it alone classifies
the training examples.
▪ The best attribute is selected and used as the test at the root node of the tree.
▪ A descendant of the root node is then created for each possible value of this attribute, and the
training examples are sorted to the appropriate descendant node
THE BASIC DECISION TREE LEARNING
ALGORITHM
Which Attribute Is the Best Classifier?
What is a good quantitative measure of an attribute?
Statistical property - information gain and Gini Index
Measures how well a given attribute separates the training examples according to their target
classification.
ID3 uses this information gain measure to select among the candidate attributes at each step
while growing the tree.
THE BASIC DECISION TREE LEARNING
ALGORITHM
Information gain and Entropy
Entropy: Entropy is a measure of any sort of uncertainty that is present in data.
Information gain: Suggests how much information a particular feature or a particular
variable gives us about final outcomes.
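The formulas themselves appear only as images in the original slides; the sketch below is a minimal pure-Python version of entropy and information gain, checked against the standard 14-example PlayTennis data that these slides use:

```python
# Minimal sketch of entropy and information gain (pure Python; the 14 PlayTennis
# labels and Outlook values below follow the standard example used in these slides).
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum over classes of p_i * log2(p_i)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Gain(S, A) = Entropy(S) - sum over values v of A of |S_v|/|S| * Entropy(S_v)."""
    total = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]
print(round(entropy(play), 3))                    # about 0.940 for a 9+/5- sample
print(round(information_gain(play, outlook), 3))  # about 0.247 (the slides quote 0.246)
```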
THE BASIC DECISION TREE LEARNING
ALGORITHM
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
According to the information gain measure, the Outlook attribute provides the best prediction of the target attribute, PlayTennis, over the training examples.
Therefore, Outlook is selected as the decision attribute for the root node, and branches are
created below the root for each of its possible values (i.e.,Sunny, Overcast, and Rain).
Decision Trees
Building a Decision Tree
1. First test all attributes and select the one that would function as the best root;
2. Break-up the training set into subsets based on the branches of the root node;
3. Test the remaining attributes to see which ones fit best underneath the branches of the root
node;
4. Continue this process for all other branches until
a. all examples of a subset are of one type
b. there are no examples left (return majority classification of the parent)
c. there are no more attributes left (default value should be majority classification)
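A minimal sketch of growing such a tree with a library implementation, assuming scikit-learn; criterion="entropy" chooses splits by information gain, in the spirit of ID3, although scikit-learn itself builds binary CART trees:

```python
# Minimal decision-tree training sketch (assumes scikit-learn; Iris stands in for a real dataset).
# criterion="entropy" selects splits by information gain, in the spirit of ID3,
# although scikit-learn's implementation grows binary CART trees.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", random_state=42).fit(X, y)
print(export_text(tree, feature_names=load_iris().feature_names))   # inspect the learned tree
print(tree.predict([X[0]]))                                         # classify a new sample
```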
Decision Trees
Possible ways to partition tuples based on a splitting attribute A:
A is discrete-valued
A is continuous-valued
A is discrete-valued and a binary tree must be produced
Issues of Decision Trees - Overfitting
Overfitting occurs when a decision tree model tries to cover all the data points and, in doing so, also captures noise and irrelevant patterns in the training data instead of the underlying true patterns.
The model becomes specialized in the training data and fails to generalize well to
unseen data.
◦ This phenomenon is called Variance
◦ Variance: If the machine learning model performs well with the training dataset, but does
not perform well with the test dataset, then variance occurs.
Issues of Decision Trees
Classification Model must have
◦ Low Training Errors
◦ Low Generalisation Errors
Issues of Decision Trees- Underfitting
Model fails to capture the underlying patterns in the training data
Hence reduced accuracy and produces unreliable predictions.
How to avoid underfitting:
• By increasing the training time of the model.
• By increasing the number of features.
Issues of Decision Trees- Overfitting
What is pruning ?
In general, pruning is the process of removing selected parts of a plant, such as buds, branches, and roots.
In decision trees, pruning removes branches of the tree.
This is done to overcome the overfitting condition of the decision tree.
Types
Post Pruning
Pre-Pruning
Decision Trees
Since there is an outlier, the partition of the tree is not pure, so the algorithm adds more layers.
But on the test data the three blue dots get misclassified as orange (the tree overfits the training data, since it is not capable of generalizing to new orange and blue dots).
Solution – prune the tree.
Prepruning approach
A tree is “pruned” by halting its construction early
◦ Decide not to further split or partition the subset of training tuples at a given node.
◦ The tree should not get too deep → specify a maximum depth.
Upon halting, the node becomes a leaf.
The leaf may hold the most frequent class among the subset tuples or the probability distribution
of those tuples.
Prepruning approach
Setting max depth = 3 (pre-pruning).
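A minimal pre-pruning sketch, assuming scikit-learn; max_depth=3 halts construction early, as in the figure this slide described:

```python
# Minimal pre-pruning sketch: halt construction early with max_depth=3
# (assumes scikit-learn; Iris stands in for the two-class dot example in the slides).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)                  # unpruned
pruned = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)   # pre-pruned
print("full tree depth:", full.get_depth(), "test accuracy:", full.score(X_test, y_test))
print("pre-pruned depth:", pruned.get_depth(), "test accuracy:", pruned.score(X_test, y_test))
```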
Post Pruning
Removes subtrees from a “fully grown” tree.
A subtree at a given node is pruned by removing its branches and replacing it with a leaf.
The leaf is labeled with the most frequent class among the subtree being replaced.
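A minimal post-pruning sketch, assuming scikit-learn; cost-complexity pruning (ccp_alpha) removes subtrees from a fully grown tree, which is one concrete way to realize this idea (it is not the reduced-error procedure discussed later):

```python
# Minimal post-pruning sketch via cost-complexity pruning (assumes scikit-learn;
# this is one concrete form of post-pruning, not the exact reduced-error procedure).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)   # grow the full tree
path = full.cost_complexity_pruning_path(X_train, y_train)             # candidate pruning strengths

# Refit with a moderate alpha: a larger alpha means more aggressive pruning (a smaller tree).
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_train, y_train)
print("full leaves:", full.get_n_leaves(), "pruned leaves:", pruned.get_n_leaves())
print("full test acc:", full.score(X_test, y_test), "pruned test acc:", pruned.score(X_test, y_test))
```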
The tree has too many layers. Start at the deepest layer. Since the training data
has 3 blue and 1 orange the leaf would be blue
Post Pruning
Here the Most common class within this subtree is “class B.”
In the pruned version of the tree, the subtree in question is pruned by replacing it with the leaf “class B.”
Decision Trees
▪ During the late 1970s and early 1980s, J. Ross Quinlan, a researcher in machine learning,
developed a decision tree algorithm known as ID3 (Iterative Dichotomiser)
▪ Quinlan later presented C4.5 (a successor of ID3), which became a benchmark to which newer supervised learning algorithms are often compared.
▪ In 1984, a group of statisticians (L. Breiman, J. Friedman, R. Olshen, and C. Stone) published
the book Classification and Regression Trees (CART), which described the generation of
binary decision trees
CART (Classification And Regression Tree)
in Machine Learning
▪ Scikit-Learn uses the Classification And Regression Tree (CART) algorithm to train Decision Trees (also called “growing” trees).
▪ It works by recursively partitioning the data into smaller and smaller subsets based on
certain criteria.
▪ The goal is to create a tree structure that can accurately predict the target variable for new
data points.
CART
Gini Index
◦ The Gini index measures the impurity of D, a data partition or set of training tuples, as
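The formula itself appears on the slide only as an image; its standard form, with p_i the probability that a tuple in D belongs to class C_i (estimated as |C_{i,D}|/|D|) and m the number of classes, is

Gini(D) = 1 − \sum_{i=1}^{m} p_i^2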
CART
If A has v possible values, then there are 2^v possible subsets.
For example, if income has three possible values, namely {low, medium, high}, then the possible subsets are {low, medium, high}, {low, medium}, {low, high}, {medium, high}, {low}, {medium}, {high}, and {}.
Excluding the full set and the empty set (which do not represent a split), there are 2^v − 2 possible ways to form two partitions of the data, D, based on a binary split on A.
CART
If a binary split on A partitions D into D1 and D2, the Gini index of D given that partitioning is
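The formula here is likewise an image in the original; its standard form is

Gini_A(D) = (|D_1|/|D|) Gini(D_1) + (|D_2|/|D|) Gini(D_2)

and the reduction in impurity offered by a binary split on A is ΔGini(A) = Gini(D) − Gini_A(D).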
CART
Buys computer = YES/NO
To find the splitting criterion for the tuples in D, we need to compute the Gini index for each attribute.
CART
Income
◦ {low, high} and {medium}) → 0.458
◦ {medium, high} and {low} → 0.450
◦ {low, medium} (or {high}) → 0.443
◦ The best binary split for attribute income is {low, medium} (or {high}) because it minimizes the Gini
index.
Evaluating age
◦ {youth, senior} (or {middle aged}) → Gini index of 0.357
Student → Gini index 0.367
Credit rating → Gini index 0.429
The attribute age with splitting subset {youth, senior} gives the minimum Gini index overall, with a reduction in impurity of 0.459 − 0.357 = 0.102.
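A short sketch that reproduces these numbers, assuming the standard 14-tuple buys_computer example (9 yes / 5 no) that these slides follow:

```python
# Sketch reproducing the Gini values above, assuming the standard 14-tuple
# buys_computer example (9 yes / 5 no) that these slides follow.
def gini(yes, no):
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

def gini_split(part1, part2):
    """Weighted Gini of a binary split; each part is a (yes, no) count pair."""
    n1, n2 = sum(part1), sum(part2)
    n = n1 + n2
    return (n1 / n) * gini(*part1) + (n2 / n) * gini(*part2)

print(round(gini(9, 5), 3))                               # Gini(D) = 0.459
print(round(gini_split((7, 3), (2, 2)), 3))               # income {low, medium} vs {high}  -> 0.443
print(round(gini_split((5, 5), (4, 0)), 3))               # age {youth, senior} vs {middle} -> 0.357
print(round(gini(9, 5) - gini_split((5, 5), (4, 0)), 3))  # reduction in impurity -> 0.102
```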
ISSUES IN DECISION TREE LEARNING
1. Avoiding Overfitting the Data
◦ REDUCED ERROR PRUNING
◦ RULE POST-PRUNING
2. Incorporating Continuous-Valued Attributes
3. Alternative Measures for Selecting Attributes
4.Handling Training Examples with Missing Attribute Values
5.Handling Attributes with Differing Costs
Avoiding Overfitting the Data
Definition: Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h' ∈ H, such that h has smaller error than h' over the training examples, but h' has a smaller error than h over the entire distribution of instances.
1. Avoiding Overfitting the Data
Figure - Illustrates the impact of overfitting in a typical application of decision tree learning.
1. Avoiding Overfitting the Data
▪ ID3 algorithm is applied to the task of learning which medical patients have a form of diabetes.
▪ The horizontal axis of this plot indicates the total number of nodes in the decision tree, as the
tree is being constructed.
▪ The vertical axis indicates the accuracy of predictions made by the tree.
▪ The solid line shows the accuracy of the decision tree over the training examples, whereas the
broken line shows accuracy measured over an independent set of test examples (not included in
the training set).
▪ Predictably, the accuracy of the tree over the training examples increases monotonically as the
tree is grown.
▪ However, the accuracy measured over the independent test examples first increases, then
decreases.
▪ Once the tree size exceeds approximately 25 nodes, further elaboration of the tree decreases its
accuracy over the test examples despite increasing its accuracy on the training examples.
1. Avoiding Overfitting the Data
▪ There are several approaches to avoiding overfitting in decision tree learning.
▪ These can be grouped into two classes :
▪ Approaches that stop growing the tree earlier, before it reaches the point where it perfectly
classifies the training data
▪ Approaches that allow the tree to overfit the data, and then post-prune the tree.
1. Avoiding Overfitting the Data
Whether the correct tree size is found by stopping early or by post-pruning, what criterion is used to determine the correct final tree size?
1. Use a separate set of examples- distinct from the training examples, to evaluate the utility
of post-pruning nodes from the tree.
2. Apply a statistical test to estimate whether expanding (or pruning) a particular node is likely to produce an improvement (for example, a chi-square test).
3. Minimum Description Length principle - explicit measure of the complexity for encoding the
training examples and the decision tree, halting growth of the tree when this encoding size
is minimized.
1.1 REDUCED ERROR PRUNING
How exactly can we use a validation set to prevent overfitting?
reduced-error pruning
◦ Consider each of the decision nodes in the tree to be candidates for pruning.
◦ Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf
node,
◦ Assign it the most common classification of the training examples affiliated with that node.
◦ A node is removed only if the resulting pruned tree performs no worse than the original over the validation set.
1.1 REDUCED ERROR PRUNING
Figure - Shows the impact of reduced error pruning of the tree produced by ID3.
There is an increase in accuracy over the test set as nodes are pruned from the tree.
Here, the validation set used for pruning is distinct from both the training and test sets.
1.2 RULE POST-PRUNING
Rule post-pruning involves the following steps:
1. Infer the decision tree from the training set, growing the tree until the training
data is fit as well as possible and allowing overfitting to occur.
2. Convert the learned tree into an equivalent set of rules by creating one rule
for each path from the root node to a leaf node.
3. Prune (generalize) each rule by removing any preconditions that result in
improving its estimated accuracy.
4. Sort the pruned rules by their estimated accuracy, and consider them in this
sequence when classifying subsequent instances.
1.2 RULE POST-PRUNING
Each attribute test along the path from the root to the leaf becomes a rule antecedent (precondition), and the classification at the leaf node becomes the rule consequent (postcondition).
IF (Outlook = Sunny) ∧ (Humidity = High)
THEN PlayTennis = No
1.2 RULE POST-PRUNING
Rule post-pruning would consider removing the preconditions
(Outlook = Sunny) and (Humidity = High).
It would select whichever of these pruning steps produces the greatest improvement in estimated rule accuracy.
Then consider pruning the second precondition as a further pruning step.
No pruning step is performed if it reduces the estimated rule accuracy.
1.2 RULE POST-PRUNING
One method to estimate rule accuracy is to use a validation set of examples disjoint from the training set.
Another method, used by C4.5, is to evaluate performance based on the training set itself,
using a pessimistic estimate
◦ calculating the rule accuracy
◦ calculating the standard deviation
1.2 RULE POST-PRUNING
Why convert the decision tree to rules before pruning? There are three main advantages.
◦ Converting to rules allows distinguishing among the different contexts in which a decision node is
used.
◦ Each distinct path through the decision tree node produces a distinct rule,
◦ The pruning decision about that attribute test can be made differently for each path.
◦ In contrast, if the tree itself were pruned, the only two choices would be to remove the decision node completely or to retain it in its original form.
Converting to rules removes the distinction between attribute tests that occur near the root
of the tree and those that occur near the leaves.
Converting to rules improves readability - rules are often easier for people to understand.
2.Incorporating Continuous-Valued
Attributes
▪ ID3 is restricted to attributes that take on a discrete set of values.
▪ First, the target attribute whose value is predicted by the learned tree must be discrete
valued.
▪ Second, the attributes tested in the decision nodes of the tree must also be discrete valued.
▪ Second restriction can be removed by incorporating continuous-valued decision attributes
▪ Define new discrete valued attributes that partition the continuous attribute value into a
discrete set of intervals.
▪ In particular, for an attribute A that is continuous-valued, the algorithm can dynamically create a new boolean attribute A_c that is true if A < c and false otherwise.
▪ The only question is how to select the best value for the threshold c.
2. Incorporating Continuous-Valued
Attributes
▪ Generate a set of candidate thresholds midway between the corresponding values of A.
▪ computing the information gain for the Candidate thresholds
▪ In the example, there are two candidate thresholds, corresponding to the values of
Temperature at which the value of PlayTennis changes:
▪ (48 + 60)/2, and (80 + 90)/2.
▪ The information gain can then be computed for each of the candidate attributes, Temperature > 54 and Temperature > 85, and the best one selected.
3.Alternative Measures for Selecting
Attributes
Consider adding an attribute such as Date (e.g., March 4, 1979) to the data in Table 3.2.
Such an attribute would separate the training examples very well, yet it is a very poor predictor of the target function over unseen instances.
One fix is to select decision attributes based on some measure other than information gain (for example, the gain ratio).
4. Handling Training Examples with
Missing Attribute Values
▪ Assign it the value that is most common among training examples at node n.
▪ Assign a probability to each of the possible values of A rather than simply assigning the most
common value to A(x)
If node n contains six known examples with A = 1
four known examples with A = 0,
▪ probability that A(x) = 1 is 0.6, and
▪ probability that A(x) = 0 is 0.4.
▪ Fractional 0.6 of instance x is now distributed down the branch for A = 1 and
▪ Fractional 0.4 of x down the other tree branch.
5 Handling Attributes with Differing
Costs
Patients in terms of attributes such as Temperature, BiopsyResult, Pulse, BloodTestResults,
etc.
These attributes vary significantly in their costs
A cost term into the attribute selection measure
Tan and Schlimmer (1990) and Tan (1993) describe - robot perception task
Attribute cost is measured by the number of seconds required to obtain the attribute value
by positioning and operating the sonar.
5 Handling Attributes with Differing
Costs
Nunez (1988) describes an application to learning medical diagnosis rules,
where w ∈ [0, 1] is a constant that determines the relative importance of cost versus information gain.
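The measure itself appears only as an image in the original slide; in Mitchell's presentation, Nunez's cost-sensitive measure replaces information gain with

(2^{Gain(S, A)} − 1) / (Cost(A) + 1)^w

where w ∈ [0, 1] plays the role described above.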