Exploratory Data Analysis

This report conducts an exploratory data analysis on a customer delinquency dataset to identify trends and risk indicators associated with loan delinquency. Key findings highlight that missed payments, low credit scores, and high credit utilization are strong indicators of delinquency. The report recommends steps for data auditing, cleaning, feature engineering, and model building to enhance predictive modeling efforts.

Uploaded by

divyagawas143


Exploratory Data Analysis (EDA) Summary Report

1. Introduction

The purpose of this report is to conduct an initial exploratory data analysis on the
provided customer delinquency dataset. The primary goal is to identify key
trends, patterns, and potential risk indicators associated with loan delinquency.
The insights gained from this analysis will be instrumental in preparing the data
for subsequent predictive modeling, aiming to forecast future customer behavior
and credit risk.

2. Dataset Overview

This section summarizes the dataset, including the number of records, key
variables, and data types. It also highlights any anomalies, duplicates, or
inconsistencies observed during the initial review of the available data sample.

■Key dataset attributes:

● Number of records:
The dataset contains records for a minimum of 494 distinct customers. A full
review of the dataset would be required to determine the exact total.

●Key variables:

●Customer_ID: A unique identifier for each customer.

●Age: The customer's age in years.

●Income: The customer's annual income.

●Credit_Score: A numerical score indicating creditworthiness.

●Credit_Utilization: The ratio of credit used to available credit, as defined in the
glossary.

●Missed_Payments: A count of the number of missed payments.

●Delinquent_Account: The target variable, indicating whether an account is
delinquent (1) or not (0).

●Loan_Balance: The outstanding loan balance.

●Debt_to_Income_Ratio: The ratio of a customer's total debt to their gross
income.

●Employment_Status: The customer's employment status (e.g., 'Employed',
'Self-employed', 'Unemployed').

●Account_Tenure: The duration of the customer's account in months.

●Credit_Card_Type: The type of credit card the customer holds.

●Location: The customer's geographical location.

●Month_1 to Month_6: The payment status for each of the last six months
('On-time', 'Late', 'Missed').

●Data types: The dataset contains a mix of numerical and categorical data. Age,
Income, Credit_Score, Credit_Utilization, Missed_Payments, Loan_Balance,
Debt_to_Income_Ratio, and Account_Tenure are numerical. The remaining
variables (Customer_ID, Employment_Status, Credit_Card_Type, Location,
Month_1 to Month_6, and Delinquent_Account) are categorical.
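The initial review described above (record count, duplicates, and data types) can be
sketched in a few lines of plain Python. This is only an illustration: the four sample
rows are invented apart from the Missed_Payments and Delinquent_Account values the
report quotes for CUST0002 and CUST0003, and the helper names (audit, is_number,
categorical_overrides) are hypothetical rather than part of the original analysis.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: column names follow the report, values are made up
# (except CUST0002 = 6 missed / delinquent, CUST0003 = 0 missed / not delinquent).
SAMPLE_CSV = """Customer_ID,Age,Income,Credit_Score,Missed_Payments,Employment_Status,Delinquent_Account
CUST0001,45,52000,610,1,Employed,0
CUST0002,38,41000,480,6,Unemployed,1
CUST0003,29,67000,720,0,Self-employed,0
CUST0003,29,67000,720,0,Self-employed,0
"""


def is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def audit(rows, id_column="Customer_ID",
          categorical_overrides=("Delinquent_Account",)):
    """Summarize record count, duplicate IDs, and a rough type per column."""
    id_counts = Counter(row[id_column] for row in rows)
    duplicate_ids = sorted(cid for cid, n in id_counts.items() if n > 1)

    column_types = {}
    for column in rows[0]:
        values = [row[column] for row in rows if row[column] != ""]
        numeric = bool(values) and all(is_number(v) for v in values)
        # A 0/1 flag parses as a number, so it is forced to categorical here.
        if column in categorical_overrides:
            numeric = False
        column_types[column] = "numerical" if numeric else "categorical"

    return {
        "records": len(rows),
        "distinct_ids": len(id_counts),
        "duplicate_ids": duplicate_ids,
        "column_types": column_types,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = audit(rows)
print(report)
```

Run against the full file instead of the inline sample, the same summary would give the
exact record count that the sample-based review could not.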

3. Missing Data Analysis

Identifying and addressing missing data is critical to ensuring model accuracy.

This section outlines missing values in the dataset, the approach taken to handle
them, and justifications for the chosen method.
●Key missing data findings: Based on the provided data sample, no missing
values were immediately apparent. A full-scale analysis of the complete dataset
is required to confirm the presence of any missing data.

●Missing data treatment: If missing data were identified, a strategy of
imputation would be recommended. For numerical features like Income or
Loan_Balance, an appropriate imputation method could involve using the mean
or median of the variable, or more advanced techniques like K-Nearest Neighbors
(KNN) imputation if correlations with other variables are high. For categorical
features like Employment_Status or Location, imputation with the mode or
creating a new 'Unknown' category would be a suitable approach.
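A minimal sketch of the simpler strategies named above (median for a numerical
feature, mode or an 'Unknown' category for a categorical one) is shown below. The
function names and the sample values are assumptions made for illustration; KNN
imputation is not covered here.

```python
from collections import Counter
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def impute_category(values, use_mode=True, placeholder="Unknown"):
    """Replace None entries with the mode, or with a placeholder category."""
    observed = [v for v in values if v is not None]
    if use_mode and observed:
        fill = Counter(observed).most_common(1)[0][0]
    else:
        fill = placeholder
    return [fill if v is None else v for v in values]


# Made-up values; None marks a missing entry.
income = [52000, None, 67000, 41000]
employment = ["Employed", None, "Employed", "Unemployed"]

income_filled = impute_median(income)
employment_mode = impute_category(employment)
employment_unknown = impute_category(employment, use_mode=False)
```

The median is used rather than the mean because it is less sensitive to a few very
high incomes or balances; which choice fits best depends on the real distribution.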

4. Key Findings and Risk Indicators

This section identifies trends and patterns that may indicate risk factors for
delinquency. Feature relationships and statistical correlations are explored to
uncover insights relevant to predictive modeling.

● Correlations observed between key variables:

●Missed Payments: The Missed_Payments variable appears to be a strong
indicator of delinquency. For example, CUST0002, with 6 missed payments, is
marked as a delinquent account (1), while CUST0003, with 0 missed payments, is
not (0). These two records are consistent with a positive correlation, which
should be confirmed on the full dataset.

●Credit Score: A lower Credit_Score seems to correlate with a higher likelihood of
delinquency.

●Credit Utilization: A higher Credit_Utilization ratio may also be associated with
delinquency, as it indicates a customer is using a large portion of their available
credit, which can be a sign of financial strain.

●Payment History (Month_1 to Month_6): The detailed payment status for the past
six months provides granular behavioral data. A high frequency of 'Late' or 'Missed'
payments in this period is a clear precursor to a delinquent status. The data for
CUST0002 corroborates this, showing 'Missed' and 'Late' payments across multiple
months.

●Employment Status: The data includes various employment statuses
(Employed, Self-employed, Unemployed). It is highly probable that unemployed
customers or those with less stable employment are at a higher risk of
delinquency.

●Unexpected anomalies: An interesting data point is CUST0494, who has a very
low Credit_Score of 306. This value is an extreme outlier and warrants further
investigation to determine if it is a data entry error or a legitimate score. Such
anomalies can significantly skew a predictive model and should be handled with
care.
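The relationships and the outlier described above can be checked numerically once
the full dataset is available. The sketch below computes a Pearson correlation
between Missed_Payments and the 0/1 Delinquent_Account flag, and flags credit
scores outside the usual 1.5 x IQR fences. All values are invented for illustration
except the score of 306 and the two (missed payments, delinquent) pairs quoted in
the report; whether 306 is actually flagged in the real data depends on the real
distribution.

```python
from math import sqrt
from statistics import quantiles


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical sample; the first two pairs mirror CUST0002 and CUST0003.
missed_payments = [6, 0, 1, 4, 0, 5]
delinquent = [1, 0, 0, 1, 0, 1]
r = pearson(missed_payments, delinquent)

# Hypothetical scores, with the reported value of 306 included.
credit_scores = [640, 655, 670, 306, 690, 700, 710, 725]
outliers = iqr_outliers(credit_scores)

print(f"correlation: {r:.2f}, outliers: {outliers}")
```

With a binary target, this Pearson coefficient is the point-biserial correlation, so
it measures how far the average number of missed payments differs between
delinquent and non-delinquent accounts.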

5. AI & GenAI Usage

Generative AI tools were used to summarize the dataset, infer data patterns, and
structure this report. The AI analyzed the provided data snippet, identified the
variables and their types, and extrapolated potential relationships and risk
indicators based on common data analysis practices.

Example AI prompts used:

●"Create an EDA report following the structure of the
EDA_SummaryReport_Template.docx file. Analyze the key variables, identify
potential risk factors for delinquency, and propose next steps for predictive
modeling."

●"Analyze the provided CSV snippet to identify data types, missing values, and
potential correlations between the target variable 'Delinquent_Account' and
other features."

6. Conclusion & Next Steps

The initial EDA has identified several key risk indicators for customer
delinquency, most notably a high number of missed payments, a low credit
score, and a high credit utilization ratio. The detailed payment history over the
last six months provides valuable behavioral data that can be used to build a
robust predictive model.

The recommended next steps are:

1. Full Data Audit: Conduct a comprehensive scan of the entire dataset to confirm
the findings, identify any other anomalies, and accurately quantify missing data.

2. Data Cleaning & Preprocessing: Address any missing values using the
proposed imputation strategies.

3. Feature Engineering: Create new features from the existing data, such as a
"total months with late payments" variable from the Month_1 to Month_6
columns, or categorical bins for numerical data.

4. Model Building: Use the cleaned and prepared data to train a predictive model
(e.g., Logistic Regression, Gradient Boosting Machines) to forecast customer
delinquency.

5. Model Evaluation: Evaluate the model's performance using appropriate metrics
such as AUC-ROC and a confusion matrix to ensure it can accurately identify
customers at risk.
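Steps 3 and 5 above can be sketched without any modeling library. The example
below derives a "months with late or missed payments" feature and a credit score
band, then computes a confusion matrix and AUC-ROC for a set of predicted
probabilities. The band cut-offs, the 0.5 threshold, the sample row, and the scores
are all assumptions chosen for illustration, not values from the dataset or from a
trained model.

```python
MONTH_COLUMNS = tuple(f"Month_{i}" for i in range(1, 7))


def late_or_missed_months(row, months=MONTH_COLUMNS):
    """Count the months whose payment status is 'Late' or 'Missed'."""
    return sum(1 for m in months if row[m] in ("Late", "Missed"))


def credit_score_band(score, low_cut=580, high_cut=700):
    """Bin a credit score into three bands (cut-offs are illustrative)."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def auc_roc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical customer row using the status labels from the dataset.
row = {"Month_1": "Missed", "Month_2": "Late", "Month_3": "On-time",
       "Month_4": "Missed", "Month_5": "Late", "Month_6": "On-time"}
late_months = late_or_missed_months(row)
band = credit_score_band(480)

# Hypothetical labels and predicted probabilities of delinquency.
y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.9, 0.2, 0.4, 0.6, 0.7, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

matrix = confusion_matrix(y_true, y_pred)
auc = auc_roc(y_true, y_score)
```

In a delinquency setting the false negatives (delinquent customers predicted as safe)
are usually the costlier error, so the threshold is worth tuning rather than leaving
at 0.5.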
