CSE472 Assignment 2

Uploaded by

rayhanahmed49

Document Version: 2024/09/12/02

Document version history

Version        Key changes                                     Changed on    Changed by
2024/09/12/01  First draft                                     12-SEP-2024   MDSR
2024/09/12/02  The code should be in a Jupyter notebook file   13-SEP-2024   MDSR

CSE472 (Machine Learning Sessional)

Assignment #2: Logistic Regression with Bagging and Stacking
Introduction
In ensemble learning, we combine decisions from multiple learners to solve a classification
problem. In this assignment, you will implement a Logistic Regression (LR) classifier with
bagging and stacking.
Programming Language/Platform
• Python 3 [Hard requirement]
Dataset preprocessing
You need to demonstrate the performance and efficiency of your implementation on the
following three datasets.
1. https://www.kaggle.com/blastchar/telco-customer-churn
2. https://archive.ics.uci.edu/ml/datasets/adult
3. https://www.kaggle.com/mlg-ulb/creditcardfraud
They differ in size, in the number and types of attributes, in data quality (missing attribute
values), and in how the data are described (whether train and test data are separate, the
attribute description format, etc.). Your core implementation of LR, bagging and stacking must
work for all three datasets without any modification. You should add a separate dataset-specific
preprocessing script/module/function that feeds your learning engine a standardized data file
in matrix format. On the evaluation day, you will be given a new dataset for which you must
create a preprocessor. Any lack of understanding of your own code will severely hinder your
chances of success. Here are some suggestions:
1. Design and develop your own code. You can consult the many materials available on
the web, but write the code yourself. This is the only way to ensure you know every subtle
issue that needs to be tweaked during customization.
2. Do not assume anything about your dataset. Keep an open mind. Deal with its subtleties
in preprocessing.
3. Use Python library functions for common preprocessing tasks such as normalization,
binarization, discretization, imputation, encoding categorical features, scaling, etc.
Visit http://scikit-learn.org/stable/modules/preprocessing.html for more information.
4. Go through the dataset description given in the link carefully. A misunderstanding will
lead to incorrect preprocessing.
5. For the third dataset, don’t worry if your implementation takes a long time. You can use a
smaller subset (20000 randomly selected negative samples + all positive samples) of that
dataset for demonstration purposes. Do not exclude any positive samples, as they are
scarce.
6. Split your preprocessed datasets into 80% training and 20% testing data when the dataset
is not already split. From the training set, set aside 20% of the data for validation. You can
use the Scikit-learn built-in function for the train-test split. For splitting guidelines, see
https://developers.google.com/machine-learning/crash-course/training-and-test-sets/splitting-data.
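Suggestions 5 and 6 can be sketched as follows. This is only an illustration under stated
assumptions: the helper names (subsample_negatives, split_off), the seed value 42, and the toy
data are hypothetical, and labels are assumed to be coded as 0 (negative) and 1 (positive) by
your preprocessor. It uses plain NumPy shuffling; Scikit-learn's train_test_split (with
random_state and, optionally, stratify) is the library route mentioned in suggestion 6.

```python
import numpy as np

SEED = 42  # assumed value; any fixed seed keeps the run reproducible


def subsample_negatives(X, y, n_negative, rng):
    """Keep every positive row and a random subset of the negative rows."""
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_keep = min(n_negative, neg_idx.size)
    keep_neg = rng.choice(neg_idx, size=n_keep, replace=False)
    keep = rng.permutation(np.concatenate([pos_idx, keep_neg]))
    return X[keep], y[keep]


def split_off(X, y, fraction, rng):
    """Shuffle the rows, then move `fraction` of them into a held-out part."""
    order = rng.permutation(len(y))
    n_held = int(round(fraction * len(y)))
    held, rest = order[:n_held], order[n_held:]
    return X[rest], y[rest], X[held], y[held]


rng = np.random.default_rng(SEED)

# Toy stand-in for a preprocessed matrix: 1000 rows, 5 features, few positives.
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.05).astype(int)

# Suggestion 5: all positives plus a fixed number of random negatives.
X_small, y_small = subsample_negatives(X, y, n_negative=200, rng=rng)

# Suggestion 6: 80% train / 20% test, then 20% of the training part for validation.
X_train, y_train, X_test, y_test = split_off(X_small, y_small, 0.2, rng)
X_fit, y_fit, X_val, y_val = split_off(X_train, y_train, 0.2, rng)
```

The split here is not stratified, so with very scarce positives the class ratio may differ
between the parts; a stratified split is one way to avoid that.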
Use bagging (sampling with replacement) to generate 9 training sets, and then train LR
models on each of them. These will be the base learners for the stacking ensemble. You
should use every tool in your arsenal to achieve good and generalizable results. Experiment
with the different regularizers discussed in class.
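The bagging step can be sketched as follows. This is a minimal illustration, not a reference
solution: the function names, the learning rate, the iteration count, and the toy data are
all assumptions, and an L2 penalty is shown only as one example of a regularizer (an L1
penalty would add l1 * sign(w) to the weight gradient instead).

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow in exp for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def fit_logistic_regression(X, y, lr=0.1, n_iter=500, l2=0.0):
    """Batch gradient descent on the mean log-loss with an optional L2 penalty."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / n + l2 * w  # the bias is not penalised
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(model, X):
    w, b = model
    return sigmoid(X @ w + b)


def train_bagged_models(X, y, n_models=9, seed=42, **fit_kwargs):
    """Fit one LR model per bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))  # same size as the input
        models.append(fit_logistic_regression(X[idx], y[idx], **fit_kwargs))
    return models


# Toy data with a linear decision boundary, used only to exercise the code.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

models = train_bagged_models(X, y, n_models=9, seed=42, l2=0.01)
accuracies = [np.mean((predict_proba(m, X) >= 0.5) == y) for m in models]
```

Because each bootstrap sample is drawn from a seeded generator, rerunning with the same seed
reproduces the same 9 models.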
Then train another LR model as the meta classifier to finalize the stacking ensemble. You
should also develop a simple majority-voting-based ensemble and make a comparative
analysis.
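One possible shape for the two ensembles is sketched here. The handout does not say which
inputs the meta classifier receives, so this sketch assumes the meta LR is trained on the
base learners' predicted probabilities for the validation split; adding the original
features to those inputs is another option. All function names and the toy data are
hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def fit_lr(X, y, lr=0.1, n_iter=500):
    """Plain gradient-descent LR; the bias is stored as the last weight."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return w


def predict_proba(w, X):
    return sigmoid(np.hstack([X, np.ones((len(X), 1))]) @ w)


def base_probabilities(models, X):
    """Column j holds base model j's predicted probability for each row of X."""
    return np.column_stack([predict_proba(w, X) for w in models])


def majority_vote(models, X, threshold=0.5):
    """Predict 1 when more than half of the base models vote 1."""
    votes = base_probabilities(models, X) >= threshold
    return (votes.sum(axis=1) > len(models) / 2).astype(int)


def fit_stacking(models, X_val, y_val):
    """Train the meta LR on base-model outputs for held-out validation rows."""
    return fit_lr(base_probabilities(models, X_val), y_val)


def stacking_predict(models, meta_w, X, threshold=0.5):
    meta_features = base_probabilities(models, X)
    return (predict_proba(meta_w, meta_features) >= threshold).astype(int)


# Toy data, split by position into fit / validation / test parts.
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)
X_fit, y_fit = X[:400], y[:400]
X_val, y_val = X[400:500], y[400:500]
X_test, y_test = X[500:], y[500:]

# 9 base learners, each trained on a bootstrap sample of the fit part.
models = []
for _ in range(9):
    idx = rng.integers(0, len(y_fit), size=len(y_fit))
    models.append(fit_lr(X_fit[idx], y_fit[idx]))

meta_w = fit_stacking(models, X_val, y_val)
vote_acc = np.mean(majority_vote(models, X_test) == y_test)
stack_acc = np.mean(stacking_predict(models, meta_w, X_test) == y_test)
```

Training the meta classifier on rows the base learners did not see keeps it from simply
learning their training-set overconfidence. With an odd number of base learners (9 here),
the vote can never tie.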
Write clean, modular code.
Performance Evaluation
1. Always use a constant seed for any random number generation so that your experiments
are reproducible.
2. Draw violin plots for each performance metric for the 9 bagging LR learners. (See
https://medium.com/analysts-corner/how-to-create-violine-chart-in-python-3517e5d4a652)
3. Make a comparative analysis table as follows.

Performance on Test set
                    Accuracy  Sensitivity  Specificity  Precision  F1-score  AUROC  AUPR
LR*
Voting ensemble
Stacking ensemble

* For LR, report average ± stdev for the 9 bagging LR learners
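The seven metrics in the table, and the average ± stdev row for LR, could be computed along
these lines. This is a sketch under assumptions: labels are coded 0/1, both classes are
present in the test set, AUPR is estimated by average precision (one common estimator),
AUROC is computed by comparing every positive score with every negative score (simple, but
quadratic in memory), and the sample standard deviation (ddof=1) is used. The function names
are hypothetical.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn


def safe_div(num, den):
    """Return 0.0 instead of failing when the denominator is zero."""
    return num / den if den else 0.0


def auroc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count 1/2)."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))


def average_precision(y_true, scores):
    """Mean of precision@k taken at the rank k of each positive sample."""
    order = np.argsort(-scores, kind="stable")
    hits = y_true[order] == 1
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())


def evaluate(y_true, scores, threshold=0.5):
    y_pred = (scores >= threshold).astype(int)
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "precision": precision,
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
        "auroc": auroc(y_true, scores),
        "aupr": average_precision(y_true, scores),
    }


def summarize(per_model_metrics):
    """Mean and sample standard deviation of each metric across learners."""
    return {
        name: (float(np.mean([m[name] for m in per_model_metrics])),
               float(np.std([m[name] for m in per_model_metrics], ddof=1)))
        for name in per_model_metrics[0]
    }


# Tiny hand-checkable example: three positives, three negatives.
y_true = np.array([1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.4, 0.35, 0.8, 0.7, 0.1])
metrics = evaluate(y_true, scores)
```

Calling evaluate once per bagging learner and passing the resulting list to summarize gives
the (mean, stdev) pairs for the LR row; the same per-learner values are what the violin
plots would display.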

Submission
1. Upload the code to Moodle by 20-SEP-2024 (Friday) 10:00 PM. (Strict deadline)
2. You need to submit a report file in PDF format containing the following items (No hard
copy is required.):
a. Clear instructions on how to run your script to train your model(s) and test them.
(For example, which part needs to be commented out when training on each dataset,
how to run the evaluation, etc.) We would like to run the script on our computers
before the sessional class.
b. The table(s) and plot(s) mentioned in the performance evaluation section with
your experimental results.
c. Any observations.
3. Write code in a Jupyter Notebook file (*.ipynb). Please provide a description of the
various code segments. Rename it with your student id. For example, if your student id
is 1905123, then your code file name should be “1905123.ipynb” and the report name
should be “1905123.pdf”.
4. Finally make a main folder, put the code and report in it, and rename the main folder as
your student id. Then zip it and upload it.
Evaluation
1. You have to reproduce your experiments during in-lab evaluation. Keep everything ready
to minimize delay.
2. You are likely to be given online tasks during evaluation which will require you to modify
your code.
3. You will be tested on your understanding through viva-voce.
4. If the evaluators like the performance, efficiency or modularity of a particular code, they
can give bonus marks. This will be completely at the discretion of the evaluators.
Warning
1. Don’t copy! We regularly use copy checkers.
2. First-time wrongdoers (whether the copier or the provider) will receive negative marking
because of dishonesty.
3. Repeated occurrence will lead to severe departmental action and jeopardize your
academic career. We expect fairness and honesty from you. Don’t disappoint us!
