PART C
1 Stacking, or Stacked Generalization, is an ensemble learning technique that
combines multiple base models (also called level-0 models) and a meta-model (level-1
model) to improve prediction performance. It leverages the strengths of diverse algorithms
by training a meta-learner on their outputs. In this case, we design a stacking model using:
K-Nearest Neighbors (KNN)
Decision Tree
Naive Bayes
as base learners, and a Logistic Regression model as the meta-learner.
Architecture of the Stacking Model:
Level-0 Models (Base Learners):
o KNN: Instance-based, non-parametric learner, effective at capturing local decision
boundaries.
o Decision Tree: Captures complex decision rules and feature interactions.
o Naive Bayes: Assumes feature independence; fast and performs well with
categorical data.
Level-1 Model (Meta-Learner):
o Logistic Regression: A linear model well suited to binary classification tasks
(e.g., churn prediction). It generalizes well and combines the base models'
outputs into class probabilities.
Justification of Base Learner Selection
KNN:
o Captures local data patterns and performs well with well-separated classes.
o Brings diversity due to its lazy learning and distance-based approach.
Decision Tree:
o Handles non-linear relationships and feature importance.
o Robust to irrelevant features and outliers.
Naive Bayes:
o Simple and fast, ideal when feature independence assumption holds.
o Often performs surprisingly well on text or categorical data.
➡️These algorithms represent diverse learning biases, improving ensemble generalization
through diversity in their predictions.
Justification of Meta-Learner Selection
Logistic Regression is selected because:
o It can interpret and weigh the predictions of base models effectively.
o It is less prone to overfitting as a meta-learner.
o Provides probabilistic outputs, useful for classification thresholds and ROC
analysis.
Implementation Example in Python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier

# Define base learners (level-0 models)
base_learners = [
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('dt', DecisionTreeClassifier(max_depth=5)),
    ('nb', GaussianNB())
]

# Define meta-learner (level-1 model)
meta_learner = LogisticRegression()

# Build stacking model
stacking_model = StackingClassifier(estimators=base_learners,
                                    final_estimator=meta_learner)
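A brief usage sketch, assuming a feature matrix X and binary churn labels y have already been loaded (the variable names are illustrative):

from sklearn.model_selection import train_test_split

# X, y assumed to be the churn feature matrix and binary labels (illustrative)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the ensemble; StackingClassifier internally uses cross-validation to create
# out-of-fold base-model predictions as training features for the meta-learner
stacking_model.fit(X_train, y_train)
print("Test accuracy:", stacking_model.score(X_test, y_test))

# Probability of churn from the Logistic Regression meta-learner (useful for ROC analysis)
churn_probs = stacking_model.predict_proba(X_test)[:, 1]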
The designed stacking ensemble effectively combines the strengths of KNN (local
patterns), Decision Trees (rule-based learning), and Naive Bayes (probabilistic
reasoning). The Logistic Regression meta-learner efficiently integrates their predictions
for improved accuracy, robustness, and generalization. This approach is particularly
beneficial in complex classification problems like customer churn prediction, where no
single algorithm performs best across all scenarios.
2 Bagging (Bootstrap Aggregating) is an ensemble technique used to improve the accuracy
and stability of machine learning algorithms. It is especially effective with high-variance models
like Decision Trees. In customer churn prediction, where we aim to identify customers likely to
leave, bagging can improve classification performance by reducing overfitting.
Working of Bagging Ensemble
Bagging works as follows:
1. Bootstrap Sampling: Multiple subsets of the original training dataset are created by
random sampling with replacement.
2. Model Training: A Decision Tree is trained on each bootstrap sample.
3. Aggregation: For classification tasks, predictions from all trees are combined using
majority voting.
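To make these three steps concrete, here is a minimal hand-written sketch of bagging with bootstrap sampling and majority voting. It assumes X_train, y_train, and X_test are NumPy arrays with binary 0/1 labels (as created in the scikit-learn example below); in practice BaggingClassifier handles all of this internally.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, n_estimators=25, random_state=42):
    rng = np.random.RandomState(random_state)
    trees = []
    n = len(X_train)
    # 1. Bootstrap sampling and 2. train one tree per bootstrap sample
    for _ in range(n_estimators):
        idx = rng.choice(n, size=n, replace=True)   # sample with replacement
        trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))
    # 3. Aggregate by majority vote across the trees' predictions (0/1 labels assumed)
    votes = np.array([tree.predict(X_test) for tree in trees])
    return np.round(votes.mean(axis=0)).astype(int)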
Python Code Example
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Simulated data (replace with the churn dataset)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Bagging with Decision Trees ('estimator' replaces the older 'base_estimator'
# argument in recent scikit-learn versions)
bag_model = BaggingClassifier(estimator=DecisionTreeClassifier(),
                              n_estimators=50, random_state=42)
bag_model.fit(X_train, y_train)
y_pred = bag_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
Impact of Increasing Number of Base Learners
No. of Base Learners   | Effect on Model Performance
Low (e.g., 5–10)       | Faster training, but less robust; higher variance.
Moderate (e.g., 30–50) | Balanced performance; reduces overfitting; good accuracy.
High (e.g., 100+)      | Minor gains in accuracy; lower variance; stable predictions.
Too many               | Minimal gain; increased computation time and memory usage.
✅ Summary: More base learners reduce variance and increase model stability, but after a certain
point, the performance gain plateaus.
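A short sketch of how this plateau can be observed empirically, reusing the train/test split and imports from the code example above (the chosen values of n_estimators are illustrative):

# Sweep the number of base learners and watch accuracy plateau
for n in [5, 10, 30, 50, 100, 200]:
    model = BaggingClassifier(estimator=DecisionTreeClassifier(),
                              n_estimators=n, random_state=42)
    model.fit(X_train, y_train)
    print(f"n_estimators={n:>3}  accuracy={model.score(X_test, y_test):.3f}")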
Bagging with Decision Trees is a powerful ensemble method for predicting customer churn. By
aggregating multiple models trained on diverse bootstrap samples, it improves accuracy and
reduces overfitting. Increasing the number of base learners enhances model robustness, but the
improvement slows after a certain threshold due to the law of diminishing returns.
3 K-Means is a centroid-based, unsupervised clustering algorithm that partitions data into K
clusters by minimizing the within-cluster sum of squares (WCSS). It assumes that clusters are
spherical (convex) and equally sized.
Performance of K-Means on Non-Convex Clusters When applied to non-convex clusters (e.g.,
crescent-shaped or ring-shaped clusters), K-Means often performs poorly because:
It relies on Euclidean distance to assign points to the nearest centroid.
It cannot correctly represent clusters with irregular shapes, varying density, or overlaps.
It may incorrectly split a single non-convex cluster into multiple convex parts.
Example Diagram: K-Means Failure on Non-Convex Data
Original Clusters (Non-Convex) K-Means Clustering Output
( ) ( ) XXXXX OOOOO
( ) ( ) XXXXX OOOOO
( ) ( ) XXXXX OOOOO
Ring shapes K-Means splits into incorrect
convex parts
Left: True cluster shapes; Right: K-Means failing to capture the structure.
Alternative Clustering Method: DBSCAN
We recommend DBSCAN (Density-Based Spatial Clustering of Applications with Noise) as a
better alternative.
Why DBSCAN Works Better:
Captures arbitrary shapes: Clusters can be non-convex and irregular.
Handles noise: Can identify outliers as noise.
No need to specify K: Instead, uses parameters eps (neighborhood size) and
min_samples.
Key Advantages Over K-Means:
Feature              | K-Means               | DBSCAN
Cluster shape        | Convex only           | Arbitrary shapes
Sensitivity to noise | High                  | Low (can detect outliers)
Number of clusters   | Must be specified (K) | Determined automatically
Density sensitivity  | Poor                  | Very good
Python Example: DBSCAN vs. K-Means
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN
import matplotlib.pyplot as plt

# Create non-convex (crescent-shaped) data
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# K-Means clustering
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# DBSCAN clustering
dbscan = DBSCAN(eps=0.3, min_samples=5).fit(X)

# Plot results side by side
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_)
plt.title("K-Means Clustering")
plt.subplot(1, 2, 2)
plt.scatter(X[:, 0], X[:, 1], c=dbscan.labels_)
plt.title("DBSCAN Clustering")
plt.show()
K-Means clustering is efficient but assumes convex, equally sized clusters, leading to poor
performance on non-convex datasets. DBSCAN is a strong alternative: it identifies arbitrarily
shaped clusters and detects noise, making it a robust choice for real-world, non-linearly
separable data. This highlights the importance of choosing clustering methods based on data
geometry and distribution.
4 (a) KNN vs. K-Means for Fraud Detection
1. Nature of Algorithms
KNN (Supervised)                          | K-Means (Unsupervised)
Classification algorithm                  | Clustering algorithm
Requires labeled data                     | Does not require labels
Predicts based on similarity to neighbors | Groups data into K clusters
2. Use Case for Fraud Detection
KNN: Effective for fraud classification when historical fraud labels (fraud/non-fraud) are
available. It compares a new transaction to known patterns.
K-Means: Groups transactions into clusters, but does not know which are fraudulent.
Used for anomaly detection, not direct classification.
3. Suitability Justification (3 Marks)
Criteria           | KNN                        | K-Means
Label availability | ✅ Uses labels             | ❌ No labels
Fraud prediction   | ✅ Direct prediction       | ❌ Only detects groups
Interpretability   | ✅ Transparent             | ⚠ Depends on clusters
Suitability        | ✔ Best for classification | Used for anomaly grouping
KNN is more suitable for fraud detection when labeled data is available. It can predict directly
whether a transaction is fraudulent based on past examples, whereas K-Means is only useful for
unsupervised anomaly detection.
4. Diagram: KNN vs. K-Means Fraud Detection (1 Mark)
KNN (Supervised): Known labels → Train KNN → Predict Fraud / Legit
K-Means (Unsupervised): No labels → Run K-Means (k=2) → Cluster 1 / Cluster 2 (unknown which cluster is fraud)
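A minimal sketch contrasting the two approaches on simulated transaction data (the class imbalance, feature count, and labels are purely illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Simulated imbalanced transactions: 1 = fraud, 0 = legitimate (illustrative)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Supervised: KNN learns from labelled fraud examples and predicts labels directly
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("KNN test accuracy:", knn.score(X_test, y_test))

# Unsupervised: K-Means only returns cluster IDs; which cluster is "fraud" is unknown
clusters = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)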
(b) Computational Efficiency: 1000 samples, 50 features
1. KNN Complexity
Training Time: O(1) – No training required (lazy learner).
Prediction Time: O(n × d)
where:
o n = number of training samples = 1000
o d = number of features = 50
➡️Cost per prediction: Must compute distance to all 1000 points in 50D space.
➡️Scales poorly with high dimensionality (Curse of Dimensionality).
2. K-Means Complexity
Training Time: O(k × n × d × i)
where:
o k = number of clusters (e.g., 3–10)
o n = 1000
o d = 50
o i = number of iterations (typically <100)
➡️K-Means does not compare a new point to all training points; prediction only requires
distances to the k centroids, i.e., O(k × d) per point.
➡️It is more efficient at prediction time, but its initial training cost is higher than KNN's.
3. Conclusion on Efficiency
Algorithm | Training Efficiency   | Prediction Efficiency            | Overall (50D)
KNN       | ✅ Fast (no training) | ❌ Slow (distance to all points) | ❌ Less efficient
K-Means   | ❌ Slower (training)  | ✅ Fast (after training)         | ✅ More efficient
For high-dimensional data (50 features), K-Means is computationally more efficient overall.
KNN becomes slower during prediction due to distance calculations in high dimensions.
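A rough timing sketch for the stated setting (1000 samples, 50 features). Absolute times depend on hardware, but the relative cost of KNN prediction versus centroid assignment should be visible; the choice of k=5 clusters is an assumption:

import time
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=1000, n_features=50, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)   # "training" just stores the data
start = time.perf_counter()
knn.predict(X)                                        # distances to all 1000 points per query
print("KNN prediction time:", time.perf_counter() - start)

km = KMeans(n_clusters=5, n_init=10, random_state=42)
start = time.perf_counter()
km.fit(X)                                             # iterative training: O(k * n * d * i)
print("K-Means training time:", time.perf_counter() - start)
start = time.perf_counter()
km.predict(X)                                         # assignment to k centroids only: O(k * d) per point
print("K-Means prediction time:", time.perf_counter() - start)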
5 A Gaussian Mixture Model (GMM) is a probabilistic model that assumes the data is
generated from a mixture of several Gaussian distributions with unknown parameters.
Expectation-Maximization (EM) is an iterative algorithm used to estimate these
parameters:
E-Step (Expectation): Estimate the probability (responsibility) that each data point
belongs to each Gaussian.
M-Step (Maximization): Update the parameters (means, variances, and weights)
based on the responsibilities.
After the first EM iteration, the GMM starts adapting to the data structure by shifting means and
responsibilities. The log-likelihood improves in each iteration, moving toward convergence. This
iterative process allows GMMs to model complex multimodal distributions, making them
effective in applications like speaker recognition, image segmentation, and anomaly detection.
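A small sketch of fitting a GMM with EM in scikit-learn. Capping max_iter at 1 shows how the log-likelihood lower bound improves between the first iteration and convergence; the bimodal one-dimensional data and the two-component setting are illustrative assumptions:

import numpy as np
from sklearn.mixture import GaussianMixture

# Bimodal 1-D data drawn from two Gaussians (illustrative)
rng = np.random.RandomState(42)
X = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1.5, 500)]).reshape(-1, 1)

# Run EM for a single iteration, then to convergence, and compare the log-likelihood bound
gmm_one_step = GaussianMixture(n_components=2, max_iter=1, random_state=42).fit(X)
gmm_full = GaussianMixture(n_components=2, max_iter=100, random_state=42).fit(X)
print("Lower bound after 1 EM iteration:", gmm_one_step.lower_bound_)
print("Lower bound at convergence:", gmm_full.lower_bound_)

# Responsibilities from the E-step: probability of each point under each component
responsibilities = gmm_full.predict_proba(X)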
6 Boosting is an ensemble method that combines several weak learners (typically decision
trees) to form a strong classifier.
It trains models sequentially, where each new model focuses on correcting the errors of
the previous ones.
Effectiveness of Boosting on Noisy Data
🔹 Problem with Noisy Data:
Boosting gives higher weights to misclassified examples.
If noise causes some points to be consistently misclassified, Boosting overemphasizes
these, treating them as important patterns.
This leads to overfitting, where the model fits the noise instead of the true data
distribution.
🔹 Observed Effects:
Scenario        | Behavior of Boosting
Clean data      | Learns progressively better models
Noisy data      | Learns patterns in the noise → overfits
Imbalanced data | May overfocus on outliers
Conclusion: Boosting is sensitive to noise and can overfit significantly if not controlled.
Diagram: Boosting Overfitting on Noisy Data
Iteration 1: noisy samples are misclassified → Iteration 2: boosting re-weights and focuses on
those misclassified (noisy) points → Final boosted model: a complex decision boundary that
wraps around the noise.
Correct pattern ignored → fit to noise increases.
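A minimal sketch illustrating this sensitivity with AdaBoost by flipping a fraction of the training labels; the 10% noise rate and dataset parameters are assumptions for illustration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Inject 10% label noise into the training set only
rng = np.random.RandomState(0)
noisy = y_train.copy()
flip = rng.rand(len(noisy)) < 0.10
noisy[flip] = 1 - noisy[flip]

# Compare train/test accuracy with clean versus noisy training labels
for labels, name in [(y_train, "clean"), (noisy, "noisy")]:
    model = AdaBoostClassifier(n_estimators=200, random_state=42).fit(X_train, labels)
    print(f"{name}: train acc={model.score(X_train, labels):.3f}, "
          f"test acc={model.score(X_test, y_test):.3f}")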
Modifications to Reduce Overfitting
1. Use Early Stopping
Monitor validation loss.
Stop training when validation performance degrades (prevents overfitting on noise).
2. Limit Base Learner Complexity
Use shallow decision trees (depth = 1 or 2).
Prevents learning overly specific patterns caused by noise.
3. Use Robust Variants of Boosting
Gradient Boosting with Regularization (e.g., XGBoost):
o Adds penalties for model complexity.
o Regularizes trees using parameters like gamma, lambda.
Stochastic Boosting:
o Introduce randomness by subsampling data/features.
o Prevents the model from focusing too heavily on noise.
4. Label Cleaning or Outlier Removal
Preprocess data to remove mislabeled/noisy points using:
o Clustering, outlier detection (e.g., LOF)
o Manual review for small datasets
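A hedged sketch combining several of these ideas (shallow trees, stochastic subsampling, a small learning rate, and early stopping) using scikit-learn's GradientBoostingClassifier; the specific parameter values are illustrative assumptions, and X_train, y_train are as in the earlier noisy-data sketch:

from sklearn.ensemble import GradientBoostingClassifier

robust_boost = GradientBoostingClassifier(
    n_estimators=500,        # upper limit; early stopping usually halts well before this
    max_depth=2,             # shallow trees limit base-learner complexity
    learning_rate=0.05,      # smaller steps reduce the influence of any single noisy round
    subsample=0.8,           # stochastic boosting: each tree sees 80% of the training data
    validation_fraction=0.1, # hold out 10% of the training data for early stopping
    n_iter_no_change=10,     # stop if the validation score stalls for 10 rounds
    random_state=42,
)
robust_boost.fit(X_train, y_train)
print("Trees actually fitted:", robust_boost.n_estimators_)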
Boosting is highly effective on clean data but suffers in noisy environments due to its
error-focusing nature.
Using techniques like regularization, early stopping, and robust models (e.g., XGBoost,
LightGBM) significantly reduces overfitting.
These modifications make Boosting more reliable and generalizable even with noisy
datasets.