PD Assignment

Uploaded by

hariharasudharsan10

PREDICTIVE MODELING
ASSIGNMENT: 2

BY
HARI HARA SUDHARSAN R
720722110011
3rd YEAR IT ‘A’

1. Customer Churn Prediction in Telecom

Objective:

Identify why customers leave and predict which ones are likely to churn.

1. Using Categorical vs. Continuous Field Analysis in SPSS Modeler

Categorical Fields (e.g., Plan Type, Complaints):

• Use the "Table" and "Distribution" nodes to view the frequency of each category.

• Cross-tabulate each field against churn status (for example, with the "Matrix" node) to see which groups have high churn rates.

o Example: a cross-tab might show prepaid users churning at 30% and postpaid users at only 10%.
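
The cross-tab idea can also be sketched outside SPSS Modeler. The short Python example below computes the churn rate per category; the customer records and the function name churn_rate_by_category are invented for illustration and are not part of the assignment data.

```python
from collections import defaultdict

# Hypothetical records: (plan_type, churned) pairs, invented for illustration.
customers = [
    ("Prepaid", True), ("Prepaid", False), ("Prepaid", False), ("Prepaid", True),
    ("Postpaid", False), ("Postpaid", False), ("Postpaid", True),
    ("Postpaid", False), ("Postpaid", False), ("Postpaid", False),
]

def churn_rate_by_category(records):
    """Return {category: share of customers in that category who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for category, did_churn in records:
        totals[category] += 1
        if did_churn:
            churned[category] += 1
    return {category: churned[category] / totals[category] for category in totals}

rates = churn_rate_by_category(customers)
for plan, rate in sorted(rates.items()):
    print(f"{plan}: {rate:.0%} churn")
```

A large gap between categories suggests the field is worth keeping as a predictor.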

Continuous Fields (e.g., Monthly Bill, Usage Time):

• Use the "Statistics" and "Histogram" nodes, plus a box plot (available through the "Graphboard" node), to examine:

o Average and median bill.

o Outliers (e.g., very high bills may cause dissatisfaction).

o Relationships with churn: compute the Pearson correlation between monthly bill and churn coded as a 0/1 flag (the point-biserial correlation) to see how strongly the bill is associated with churning.

These insights guide variable selection and feature engineering for modeling.
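
As a rough illustration of the continuous-field checks, the sketch below computes the mean and median bill, compares the average bill of churners and non-churners, and calculates the Pearson correlation between the bill and a 0/1 churn flag. All values are hypothetical, and the helper pearson is written here only for the example.

```python
import math
import statistics

# Hypothetical data: monthly bill and a 0/1 churn flag for eight customers.
monthly_bill = [20.0, 25.0, 30.0, 35.0, 60.0, 80.0, 90.0, 100.0]
churned = [0, 0, 0, 0, 1, 0, 1, 1]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

mean_bill = statistics.fmean(monthly_bill)
median_bill = statistics.median(monthly_bill)

# Average bill within each churn group.
churn_mean = statistics.fmean(b for b, c in zip(monthly_bill, churned) if c == 1)
stay_mean = statistics.fmean(b for b, c in zip(monthly_bill, churned) if c == 0)

# With a 0/1 flag, Pearson's r is the point-biserial correlation.
r = pearson(monthly_bill, churned)

print(f"mean={mean_bill}, median={median_bill}")
print(f"mean bill, churned={churn_mean:.2f}, stayed={stay_mean:.2f}")
print(f"correlation with churn flag: {r:.3f}")
```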

2. How CHAID Helps in Predicting Churn

CHAID (Chi-squared Automatic Interaction Detection):

• A decision tree algorithm that, at each node, splits on the predictor whose chi-square test against the target is most significant.

• Excellent at identifying:

o Interaction effects between variables.

o Subgroups with significantly different churn rates.

Benefits:

• Works well with categorical variables.

• Produces easy-to-interpret trees (e.g., "If usage < 100 mins and has 2+ complaints → 50% churn probability").

• Allows multiway splits, unlike binary-split trees such as CART.

You can add a CHAID node to an SPSS Modeler stream and let it display the churn-driving factors as a hierarchy.
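
To make the chi-square split selection concrete, here is a minimal sketch of a single step: it computes the Pearson chi-square statistic of each candidate predictor against churn and picks the one with the smallest p-value. This is a simplification, not the full CHAID procedure (category merging and Bonferroni adjustment are left out), the counts are invented, and the p-value helper only covers 1 or 2 degrees of freedom, where a closed form exists.

```python
import math
from collections import Counter

def expand(table):
    """Turn {category: (churned, stayed)} counts into (category, outcome) pairs."""
    pairs = []
    for category, (churned, stayed) in table.items():
        pairs += [(category, "churn")] * churned + [(category, "stay")] * stayed
    return pairs

def chi_square(pairs):
    """Return (Pearson chi-square statistic, degrees of freedom)."""
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(category for category, _ in pairs)
    col_totals = Counter(outcome for _, outcome in pairs)
    stat = 0.0
    for category, row_total in row_totals.items():
        for outcome, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(category, outcome)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

def p_value(stat, dof):
    """Upper-tail chi-square probability; closed forms for dof 1 and 2 only."""
    if dof == 1:
        return math.erfc(math.sqrt(stat / 2))
    if dof == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

# Hypothetical counts per category: (churned, stayed).
candidates = {
    "complaints": {"0": (1, 9), "1": (3, 7), "2+": (7, 3)},
    "region": {"North": (6, 9), "South": (5, 10)},
}

p_values = {name: p_value(*chi_square(expand(table)))
            for name, table in candidates.items()}
best = min(p_values, key=p_values.get)
print("split on:", best, p_values)
```

In this made-up data the number of complaints separates churners far better than region does, so it would be chosen for the first split.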

3. Handling Missing Values

Steps:

1. Identify missing values using the "Data Audit" node.

2. Handle them using:

o Automatic imputation, for example the "Missing Values" SuperNode that the Data Audit node can generate.

o Manual strategies:

▪ Categorical: Replace with "Unknown" or the most common category.

▪ Numerical: Impute with the mean, the median, or a regression on other fields.

3. Consider flag variables that indicate where data was missing; sometimes the missingness itself is predictive.
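
The manual strategies and the flag variables can be sketched in a few lines of Python. The records, the field names plan and monthly_bill, and the function impute are hypothetical; the sketch uses "Unknown" for the categorical field and the median for the numeric one.

```python
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"plan": "Prepaid", "monthly_bill": 30.0},
    {"plan": None, "monthly_bill": 45.0},
    {"plan": "Postpaid", "monthly_bill": None},
    {"plan": "Prepaid", "monthly_bill": 120.0},
]

def impute(rows):
    """Return new rows with missing values filled and *_missing flags added."""
    known_bills = [r["monthly_bill"] for r in rows if r["monthly_bill"] is not None]
    median_bill = statistics.median(known_bills)
    cleaned = []
    for r in rows:
        plan_missing = r["plan"] is None
        bill_missing = r["monthly_bill"] is None
        cleaned.append({
            "plan": "Unknown" if plan_missing else r["plan"],
            "plan_missing": plan_missing,
            "monthly_bill": median_bill if bill_missing else r["monthly_bill"],
            "bill_missing": bill_missing,
        })
    return cleaned

cleaned = impute(records)
for row in cleaned:
    print(row)
```

Keeping the flags alongside the imputed values lets a later model test whether missingness carries signal.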

2. Employee Salary Prediction Using Regression Models

Objective:

Predict salaries using experience, education, and department.

1. Why Data Partitioning is Important

Before building a model:

• Partition data into:

o Training set: e.g., 70% for building the model.

o Testing set: e.g., 30% for evaluating it.

Evaluating on held-out data exposes overfitting, where the model performs well on the data it was trained on but poorly on new cases.

Use SPSS Modeler’s "Partition" node to split data automatically.
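
A 70/30 split is simple to reproduce by hand. The sketch below shuffles the rows with a fixed seed and cuts the list; the function name partition, the seed, and the row ids are assumptions made for the example, and it is not meant to mirror how the Partition node works internally.

```python
import random

def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (training, testing) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

employees = list(range(100))  # hypothetical row ids
train_rows, test_rows = partition(employees)
print(len(train_rows), "training rows,", len(test_rows), "testing rows")
```

The model is then fitted on the training rows only, and its error is reported on the testing rows.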

2. Key Factors for Accurate Salary Prediction

• Predictor Quality: Use relevant, reliably measured predictors such as experience, education level (categorical), and department.

• Linearity Check: The relationship between salary and experience should be roughly linear; a scatterplot shows whether it is.

• Multicollinearity Check: Use a correlation matrix or VIF (Variance Inflation Factor) to ensure predictors aren't duplicating information.

• Outlier Management: Salary outliers (e.g., C-level execs) can distort regression; consider a log transformation or removing them.
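
For the multicollinearity check, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the others. With only two predictors, R² is just their squared Pearson correlation, which the sketch below uses. The predictor values and the names tenure_years and education_years are hypothetical; a common rule of thumb treats VIF values above roughly 5 to 10 as a warning sign.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictor values for six employees.
experience_years = [1, 3, 5, 7, 9, 11]
tenure_years = [1, 2, 5, 6, 9, 10]           # nearly duplicates experience
education_years = [16, 12, 18, 14, 16, 12]   # only weakly related to experience

vif_high = vif_two_predictors(experience_years, tenure_years)
vif_low = vif_two_predictors(experience_years, education_years)
print(f"experience vs tenure:    VIF = {vif_high:.1f}")
print(f"experience vs education: VIF = {vif_low:.2f}")
```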

3. Role of Central Tendency and Variability

Before modeling:

• Mean/Median: Show typical salary; useful as a baseline for comparison.

• Standard Deviation: Indicates how widely salaries vary around the mean.

• Skewness/Kurtosis: Reveal whether salary is roughly symmetric or skewed and heavy-tailed. This matters because linear regression assumes normally distributed residuals, and a strongly skewed target often violates that.

Use the "Statistics" node to get a statistical profile of salary and its predictors.
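
These summary measures are easy to compute directly. The sketch below uses an invented salary list with one very large value to show how an outlier pulls the mean above the median and produces positive skewness; the skewness function is the population form (the mean of the cubed z-scores), written here for the example.

```python
import statistics

# Hypothetical salaries; the last value stands in for an executive outlier.
salaries = [40_000, 42_000, 45_000, 48_000, 50_000, 55_000, 250_000]

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

mean_salary = statistics.fmean(salaries)
median_salary = statistics.median(salaries)
sd_salary = statistics.pstdev(salaries)
skew_salary = skewness(salaries)

print(f"mean={mean_salary:.0f}, median={median_salary}, sd={sd_salary:.0f}")
print(f"skewness={skew_salary:.2f}")
```

A mean far above the median together with a large positive skewness is a hint that a log transformation of salary may be worth trying.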

3. Credit Risk Analysis Using Neural Networks

Objective:

Predict if a loan applicant is a high or low credit risk.

1. Why Data Transformation Helps

Before modeling, you may need to:

• Bin continuous data (e.g., income into low/medium/high).

o Helps with nonlinear relationships and model interpretability.

• Reclassify categories (e.g., combine job types into “skilled” vs “unskilled”).

In SPSS Modeler:

• Use the "Binning" node or the "Reclassify" node.

• These transforms can reduce noise (for example, from sparse categories or extreme values) and can help model training converge.
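
Both transformations amount to simple lookups. In the sketch below, the income cut points (30,000 and 80,000), the job titles, and the fallback group "other" are all assumptions chosen for illustration.

```python
def bin_income(income, low_cut=30_000, high_cut=80_000):
    """Map a numeric income to 'low', 'medium', or 'high' (cut points are assumed)."""
    if income < low_cut:
        return "low"
    if income < high_cut:
        return "medium"
    return "high"

# Hypothetical mapping from detailed job titles to two broad groups.
JOB_GROUPS = {
    "engineer": "skilled",
    "electrician": "skilled",
    "labourer": "unskilled",
    "cleaner": "unskilled",
}

def reclassify_job(job):
    """Collapse a job title into its broad group; unseen titles become 'other'."""
    return JOB_GROUPS.get(job.strip().lower(), "other")

applicants = [(25_000, "Cleaner"), (52_000, "Electrician"), (95_000, "Engineer")]
transformed = [(bin_income(income), reclassify_job(job)) for income, job in applicants]
print(transformed)
```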

2. Why Choose Neural Network Over Logistic Regression

• Logistic Regression:

o Simple, explainable.

o Assumes a linear relationship between predictors and the log-odds of default.

• Neural Networks (MLP, RBF):

o Learn complex, nonlinear patterns (e.g., "high income + low repayment = risk").

o Multi-layer perceptron (MLP) models capture interactions and subtle signals in the data.

o Can achieve higher accuracy, especially with large datasets.

When to prefer NN:

• When the relationship is not obviously linear.

• When predictive power matters more than interpretability.
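
The difference between the two models can be written out as equations. The notation below (predictors x_j, coefficients beta, hidden units h_k, weights w and v, sigmoid sigma) is introduced here for illustration and assumes an MLP with one hidden layer and sigmoid activations.

```latex
% Logistic regression: the log-odds of default are linear in the predictors.
\log\frac{p}{1-p} = \beta_0 + \sum_{j} \beta_j x_j

% MLP with one hidden layer: each hidden unit applies a nonlinearity first.
h_k = \sigma\Big(w_{k0} + \sum_{j} w_{kj} x_j\Big), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}}

% The log-odds are linear in the hidden units h_k, and so nonlinear in x.
\log\frac{p}{1-p} = v_0 + \sum_{k} v_k h_k
```

Because each h_k depends on several predictors at once, the network can represent interactions such as "high income + low repayment" without those terms being specified by hand.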

3. Using Statistical Functions to Analyze Risk Trends

• Use descriptive statistics nodes to summarize key variables:

o Mean loan amount by risk category.

o Frequency of late payments.

• Use graphs (bar charts, time trends) to visualize:

o Risk categories by income.

o Changes in customer behavior over time.
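
The two summaries listed above can be sketched as a grouped mean and a frequency count. The loan records and the function name mean_amount_by_risk are hypothetical.

```python
import statistics
from collections import Counter, defaultdict

# Hypothetical loans: (risk_category, loan_amount, number_of_late_payments).
loans = [
    ("high", 12_000.0, 3),
    ("high", 18_000.0, 2),
    ("low", 8_000.0, 0),
    ("low", 10_000.0, 1),
    ("low", 6_000.0, 0),
]

def mean_amount_by_risk(rows):
    """Return {risk_category: mean loan amount}."""
    groups = defaultdict(list)
    for risk, amount, _ in rows:
        groups[risk].append(amount)
    return {risk: statistics.fmean(amounts) for risk, amounts in groups.items()}

mean_amounts = mean_amount_by_risk(loans)

# How many applicants had 0, 1, 2, ... late payments.
late_payment_counts = Counter(late for _, _, late in loans)

print(mean_amounts)
print(sorted(late_payment_counts.items()))
```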

Summary of Tools in SPSS Modeler

Task                     Node to Use

Field Distribution       Data Audit, Histogram, Boxplot

Missing Value Handling   Missing Values node

Partitioning             Partition node

Modeling                 CHAID, Regression, Neural Net

Binning/Reclassifying    Binning node, Reclassify node

Statistical Summary      Statistics node, Table node
