MACHINE LEARNING
Decision Trees
We use decision trees in machine learning because they offer a simple yet powerful way to
make predictions and decisions. Here’s why they are needed:
1. Easy to understand and interpret:
They look like a flowchart — you can actually "see" how a decision is made.
2. Handle both types of data:
Decision trees can work with both categorical (like "yes" or "no") and numerical (like
prices or age) data.
3. Minimal data preparation:
They don't need data to be scaled or normalized, and missing values can often be
handled naturally.
CONSTRUCTION OF TREES
The important idea is to work out how much the entropy of the whole training set would decrease if we chose each particular feature for the next classification step. This is known as the information gain, and it is defined as

Gain(S, F) = Entropy(S) − Σ_{f ∈ values(F)} (|S_f| / |S|) × Entropy(S_f),

where S is the training set, F is the feature being considered, and S_f is the subset of S for which feature F takes the value f. The entropy itself is Entropy(S) = −Σ_c p_c log₂(p_c), where p_c is the proportion of examples in S belonging to class c.
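A small Python sketch of this calculation; the toy labels and feature values are made up purely for illustration:

```python
# Entropy and information gain for one categorical feature (toy data is assumed).
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()                     # class proportions
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    total = entropy(labels)                       # entropy of the whole set
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):                 # entropy left after splitting on the feature
        subset = [l for l, f in zip(labels, feature_values) if f == v]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

labels  = ["party", "party", "study", "tv", "party"]   # toy class labels
feature = ["yes",   "yes",   "no",    "no", "yes"]     # a candidate feature's values
print(information_gain(labels, feature))
```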
CLASSIFICATION AND REGRESSION TREES (CART)
Computing the information gain for each feature shows that the party feature gives the largest gain. Therefore, the root node will be the party feature, which has two feature values (‘yes’ and
‘no’), so it will have two branches coming out of it (see Figure 12.6). When we look at the
‘yes’ branch, we see that in all five cases where there was a party we went to it, so we just
put a leaf node there, saying ‘party’. For the ‘no’ branch, out of the five cases there are three
different outcomes, so now we need to choose another feature by repeating the information-gain
calculation on just these five cases (the ones where the answer to the party feature was ‘no’).
Ensemble Learning
Imagine you have a tough decision to make.
Instead of deciding alone, you ask a group of people (a committee) and combine their
opinions.
Ensemble learning in machine learning is based on this same idea:
Instead of using one model (learner), we use many models, and combine their outputs to
get a better, more accurate result.
Why use many models?
Each model may learn something different from the data.
Some models might be good at catching one type of pattern, others at different
patterns.
When we combine them smartly, the final answer is usually better than any single
model alone.
Simple Real-Life Example:
When you visit a doctor with a complicated illness:
If one test is not enough, the doctor orders multiple tests (blood test, scan, expert
opinions).
Then, she combines the information from all tests to make a correct diagnosis.
Similarly, ensemble learning gathers opinions from multiple machine learning models to
make a better decision.
How it works (Step-by-Step):
1. Use many simple models (like decision trees, or others).
2. Train each model differently, so that they see or focus on different parts of the data.
o Example: Split the data into parts and give different parts to different models.
3. Combine their outputs.
o A simple way: Majority voting (whichever class gets the most votes wins).
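A minimal sketch of these steps, assuming scikit-learn decision trees as the simple models and a synthetic dataset:

```python
# Train several simple models on different random parts of the data,
# then combine their predictions by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)

models = []
for _ in range(5):
    part = rng.choice(len(X), size=len(X) // 2, replace=False)   # each model sees a different part
    models.append(DecisionTreeClassifier(max_depth=3).fit(X[part], y[part]))

votes = np.array([m.predict(X) for m in models])      # one row of predictions per model
majority = (votes.mean(axis=0) > 0.5).astype(int)     # majority vote for 0/1 labels
print("ensemble accuracy:", (majority == y).mean())
```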
Important points:
If the models are diverse and strong individually, the ensemble becomes very
powerful.
Ensemble learning works well even with little data (because we reuse small pieces
of data cleverly).
Majority voting: If most models are correct, the final answer will also be correct.
Popular ensemble methods include Bagging and Boosting.
Boosting is a way of combining many weak learners (bad models that are only slightly better than
guessing) to build one strong learner (a model that makes very good predictions).
Even if each small model is bad, together they become strong and accurate!
🌟 Main Boosting Idea:
1. Train a model (say, a simple decision tree).
2. Check where it makes mistakes.
3. Give more importance (weight) to the wrong answers (hard examples).
4. Train a new model, focusing more on these difficult examples.
5. Repeat this process many times.
6. Finally, combine all the models smartly to make the final decision.
AdaBoost
AdaBoost stands for Adaptive Boosting.
It is the most popular boosting algorithm.
Here's how AdaBoost works, step-by-step:
1. Start by giving equal importance to all data points.
2. Train the first model.
3. Find out which points the model got wrong.
4. Increase the weight (importance) of those wrong points.
5. Train a second model — now it pays more attention to the difficult points.
6. Keep repeating this: each time focusing more on the points that were misclassified before.
7. Finally, combine all models — the better a model did, the more power (vote) it gets in the
final decision.
1. The weights are initially all set to the same value, 1/N, where N is the number of datapoints
in the training set.
2. Then, at each iteration, the error (ϵ) is computed as the sum of the weights of the
misclassified points, and the weights for incorrect examples are updated by being multiplied
by α = (1 − ϵ)/ϵ.
3. Weights for correct examples are left alone, and then the whole set is normalised so that it
sums to 1 (which is effectively a reduction in the importance of the correctly classified
datapoints).
4. Training terminates after a set number of iterations, or when either all of the datapoints are
classified correctly, or one point contains more than half of the available weight.
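A minimal NumPy/scikit-learn sketch of exactly this weighting scheme; the use of decision stumps as the weak learners and labels in {−1, +1} are assumptions, not part of the original text:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=20):
    # y is assumed to be in {-1, +1}
    N = len(y)
    w = np.full(N, 1.0 / N)               # 1. equal initial weights, 1/N each
    models, votes = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        wrong = stump.predict(X) != y
        eps = max(w[wrong].sum(), 1e-10)  # 2. error = total weight of the mistakes
        if eps >= 0.5:                    # weaker than chance: stop boosting
            break
        alpha = (1 - eps) / eps
        w[wrong] *= alpha                 # boost the weights of the misclassified points
        w /= w.sum()                      # 3. renormalise so the weights sum to 1
        models.append(stump)
        votes.append(np.log(alpha))       # better stumps get a bigger say in the final vote
    return models, votes

def adaboost_predict(models, votes, X):
    # 4. weighted vote over all of the weak learners
    scores = sum(v * m.predict(X) for v, m in zip(votes, models))
    return np.sign(scores)
```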
Example:
Imagine you're trying to teach a group of students to solve math problems.
Some students get easy questions right but fail on hard ones.
Next time, you spend more time teaching the harder questions.
Then again, you focus even more on the parts they still struggle with.
After a few rounds, the students get much better!
Similarly, AdaBoost focuses more and more on mistakes, so the overall learning keeps improving.
Stumping:
A stump is a tiny decision tree — it just asks one question and stops.
Each stump is very weak (bad alone).
But when we boost many stumps, they together become very strong!
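In practice a library call does this for you; a hedged example assuming scikit-learn, whose AdaBoostClassifier uses a depth-1 decision tree (a stump) as its default weak learner, and a synthetic dataset:

```python
# Boosting 100 stumps with scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)   # 100 boosted stumps
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```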
Bagging
Bagging is a way to improve the performance and stability of machine learning models by combining
multiple models trained on different random samples of the same dataset.
The Core Idea:
You take your original dataset.
You create many random samples from it — with replacement (this is called a bootstrap
sample).
You train a separate model on each sample (often decision trees).
When you want to predict something, you combine the predictions of all these models —
usually by voting (for classification) or averaging (for regression).
Why does this work?
Each model sees a slightly different version of the data.
That makes them behave differently (less correlated).
When you combine all these different models, you get a more stable and accurate
prediction.
What is a Bootstrap Sample?
It is created by randomly picking examples from the dataset — with replacement.
Some data points may appear multiple times, others not at all.
Each sample is the same size as the original dataset, but its composition differs because of the resampling.
Example :
Imagine you are predicting what a group of people want to do (like “Party”, “Study”, “TV”, etc.).
You:
1. Make 20 random samples of this data (with replacement).
2. Train 20 small decision trees (called stumps — they only ask one question).
3. Each stump gives its own prediction.
4. You take the majority vote of all 20 trees to get your final prediction.
Despite each stump being weak on its own, together they give very accurate results!
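A minimal sketch of this example, with a synthetic dataset standing in for the real one (an assumption): 20 bootstrap samples, 20 stumps, and a majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=1)
rng = np.random.default_rng(1)
n = len(X)

stumps = []
for _ in range(20):
    idx = rng.integers(0, n, size=n)          # bootstrap sample: with replacement, same size
    stumps.append(DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx]))

# Majority vote: each stump predicts, and the most common class wins.
all_preds = np.array([s.predict(X) for s in stumps])     # shape (20, n)
majority = (all_preds.mean(axis=0) > 0.5).astype(int)    # works for 0/1 labels
print("ensemble training accuracy:", (majority == y).mean())
```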
Subagging
Subagging = Subsampling + Bagging
The idea is almost the same as bagging, but samples are smaller than the original dataset.
Instead of taking full-size samples, you take half-size (or any fraction).
Sampling is usually done without replacement (like shuffling and picking the top few).
It’s faster and can still give very good results.
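A tiny sketch of the sampling difference, assuming NumPy and a half-size fraction:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
bootstrap = rng.choice(n, size=n, replace=True)        # bagging: full size, with replacement
subsample = rng.choice(n, size=n // 2, replace=False)  # subagging: half size, without replacement
```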
Different Ways to Combine Classifiers
In ensemble learning, we combine multiple classifiers to make better predictions.
But how we combine them matters a lot!
There are several strategies:
1. Bagging (Bootstrap Aggregating)
Each classifier sees different random samples of the training data.
Then, each classifier votes.
Final output: Class with majority vote wins.
Important: In bagging, each classifier has the same weight (no classifier is treated as more
important).
2. Boosting
All classifiers see the same data.
But, the importance (weight) of each data point changes.
Hard-to-classify points get higher weights for the next classifier.
Final output: A weighted vote — better classifiers have more say.
3. Majority Voting Variations
Normally, majority voting = the class chosen by most classifiers.
Variations:
o Only output if more than half of classifiers agree (otherwise, no prediction — useful
to avoid wrong answers for tough cases).
o Simple majority: just pick the most common output even if it’s not >50%.
📢 Important: If each classifier has a success rate p > 0.5 and the classifiers make their errors independently, then as you use more and more classifiers, the ensemble's success probability approaches 100%!
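A quick check of this claim under the independence assumption, using the binomial distribution:

```python
# Probability that a majority of M independent classifiers, each correct with
# probability p, gets the right answer (binomial tail).
from math import comb

def majority_correct(p, M):
    return sum(comb(M, k) * p**k * (1 - p)**(M - k) for k in range(M // 2 + 1, M + 1))

for M in (1, 11, 101):
    print(M, round(majority_correct(0.6, M), 3))   # grows towards 1 as M increases
```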
4. Regression Tasks (Numerical output, not categories)
Instead of voting, average the outputs.
Problem: Mean (average) is sensitive to outliers.
Solution: Use Median instead of Mean.
Using Median instead of Mean leads to the Bragging Algorithm ("robust bagging").
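A tiny illustration of the difference, with hypothetical ensemble outputs:

```python
# One outlier drags the mean far from the typical prediction; the median barely moves.
import numpy as np

predictions = np.array([2.1, 1.9, 2.0, 2.2, 25.0])   # hypothetical outputs; the last is an outlier
print("mean:  ", predictions.mean())                 # pulled towards the outlier
print("median:", np.median(predictions))             # stays near 2
```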
5. Mixture of Experts
Smart, learned way to combine classifiers.
Instead of just voting or averaging, the system learns:
o Which classifier to trust for each input.
Idea: Some classifiers are better for certain types of data.
🔹 How Mixture of Experts Works:
1. Each expert (classifier) gives an output (like probability).
2. A gating network decides how much to trust each expert (gives a weight).
3. Outputs are combined using those trust weights.
4. Final prediction is made.
Training: Use EM algorithm (Expectation Maximization) or gradient descent.
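A minimal NumPy sketch of the combination step; the random gating and expert parameters are placeholders for what would actually be learned (with EM or gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, n_features = 3, 4
W_gate = rng.normal(size=(n_experts, n_features))    # gating network parameters (placeholder)
W_expert = rng.normal(size=(n_experts, n_features))  # one linear expert per row (placeholder)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_predict(x):
    trust = softmax(W_gate @ x)        # how much to trust each expert for this input
    outputs = W_expert @ x             # each expert's (linear) output
    return trust @ outputs             # trust-weighted combination

x = rng.normal(size=n_features)
print(moe_predict(x))
```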
6. Other Views of Mixture of Experts
Like soft decision trees: not hard splits but probabilistic splits.
Similar to Radial Basis Function (RBF) networks, except that instead of constant outputs the nodes give linear approximations.
📋 In short:
Bagging: majority vote with equal weights; each classifier sees a different sample of the data.
Boosting: weighted vote; focuses on hard-to-classify data points.
Simple majority voting: the most common output wins; may produce no output if the classifiers disagree.
Averaging (regression): mean of the outputs; use the median for robustness.
Mixture of experts: learned weighted combination; uses a gating network for a smarter combination.
🎯 Key Takeaway:
Even if each individual classifier is weak, by combining them wisely, you can get very strong and
accurate predictions!
Gaussian Mixture Models (GMM)
In machine learning, Gaussian Mixture Models (GMM) are used for modelling a set of data that is
assumed to be generated from multiple Gaussian distributions (also known as normal distributions).
These models are a powerful way of handling unsupervised learning problems, especially when the
data can be described by multiple different distributions or "modes."
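A minimal sketch using scikit-learn's GaussianMixture on synthetic one-dimensional data drawn from two normal distributions (the data and the choice of two components are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two Gaussians (two "modes")
data = np.concatenate([rng.normal(0.0, 1.0, 300),
                       rng.normal(5.0, 0.5, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print("means:  ", gmm.means_.ravel())        # roughly 0 and 5
print("weights:", gmm.weights_)              # mixing proportions of the two components
print("responsibilities of the first point:", gmm.predict_proba(data[:1]))
```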
7.2 Nearest Neighbour Methods
Nearest Neighbour (NN) methods are non-parametric techniques used for classification and
regression. They are based on the idea that similar data points exist near each other in feature
space.
1. Intuition (Nightclub Analogy)
Imagine you're in a nightclub, unsure how to dance. You look at nearby people:
1-NN: You copy the closest person.
k-NN: You observe k closest people and follow the majority.
This is how k-NN works: look at the closest data points and make your prediction based on them.
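A minimal sketch of k-NN with Euclidean distance and a majority vote; the toy "dance" data is hypothetical:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)              # distance to every training point
    nearest = np.argsort(dists)[:k]                          # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]    # majority label among them

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["dance_A", "dance_A", "dance_B", "dance_B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))   # predicts "dance_A"

# scikit-learn equivalent:
# from sklearn.neighbors import KNeighborsClassifier
# KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict([[1.1, 0.9]])
```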
Unsupervised Learning
No labels or correct outputs are provided.
Goal: Find patterns, similarities, or clusters within the data.
You can't perform regression directly; instead, you group similar inputs.
2. Challenge
Unlike supervised learning (which minimizes a target-based error like sum-of-squares), in
unsupervised learning, you can’t use external targets for error measurement.
Instead, the algorithm must internally define an objective, like minimizing distances between
points and their cluster centers.
3. k-Means Clustering Algorithm
You pick a value for k = number of clusters you expect.
Steps:
1. Randomly place k cluster centers.
2. Assign each data point to the nearest cluster center.
3. Reposition each center to the mean of the points assigned to it.
4. Repeat until centers stop moving (convergence).
Distance measure: Typically Euclidean distance.
Problems:
o Local Minima: The result depends heavily on the initial center placement.
o Choosing k: If k is too small or too big, you underfit or overfit.
Solutions:
o Run multiple times with different initializations and pick the best solution.
o Try different values of k and evaluate carefully (but watch out for overfitting).
4. Dealing with Noise
Outliers can mislead the mean (because the mean is sensitive to extreme values).
Alternative: Use the median instead of the mean for cluster centers, but this is
computationally heavier.
5. k-Means as a Neural Network
Think of cluster centers as neurons.
Each neuron has weights corresponding to its location in space.
Activation of each neuron = similarity to input (higher is better).
Use a winner-takes-all strategy: the neuron closest to the input "fires."
Update Rule (for the winner only):
o Move the weights slightly toward the input:
Δw_ij = η x_j
(η is the learning rate.)
6. Importance of Normalization
Problem: If neurons have very different weight magnitudes, activations become
incomparable.
Solution: Normalize weights so that they all have the same overall magnitude (often unit
length).
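A small NumPy sketch of the winner-takes-all update with normalisation, as described above; the number of neurons, the learning rate, and the random data are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
k, d, eta = 3, 2, 0.1
W = rng.normal(size=(k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)      # start with unit-length weights

def train_step(x):
    activations = W @ x                            # similarity of each neuron to the input
    winner = np.argmax(activations)                # winner-takes-all: only this neuron fires
    W[winner] += eta * x                           # move the winner's weights toward the input
    W[winner] /= np.linalg.norm(W[winner])         # renormalise so magnitudes stay comparable

for x in rng.normal(size=(100, d)):
    train_step(x / np.linalg.norm(x))              # inputs also normalised to unit length
```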
Visual Summary
k-Means clustering is like:
Randomly throwing down some flags (cluster centers) in a crowd (data points).
Each person (data point) walks to the closest flag.
Each flag moves to the average position of the people around it.
Repeat until flags stop moving.
When adapted to neural networks, neurons play the role of cluster centers, and they compete to
recognize patterns.
K-Means Algorithm: Step-by-Step
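A minimal NumPy sketch that follows the steps listed earlier (random centres, nearest-centre assignment, move each centre to the mean of its points, repeat until the centres stop moving); Euclidean distance and the toy data are assumptions:

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]   # 1. random initial centres
    for _ in range(max_iters):
        # 2. assign each point to its nearest centre
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. move each centre to the mean of the points assigned to it
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        if np.allclose(new_centres, centres):                # 4. stop when centres stop moving
            break
        centres = new_centres
    return centres, labels

# Usage on toy 2-D data with two obvious clusters:
X = np.vstack([np.random.default_rng(1).normal(0, 0.5, (50, 2)),
               np.random.default_rng(2).normal(5, 0.5, (50, 2))])
centres, labels = kmeans(X, k=2)
print(centres)     # roughly [0, 0] and [5, 5]
```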