Predicting Credit Card Approvals using ML Techniques
Ravjot Singh
Oct 27, 2020
In this project, we'll be using the Credit Card Approval dataset from the UCI Machine Learning Repository. The structure of our project will be as follows:
Get a basic introduction to the project and the business problem associated with it.
Load and view the dataset.
Handle any missing entries in the dataset.
Perform exploratory data analysis (EDA) on our dataset.
Pre-process the data before applying a machine learning model to the dataset.
Apply a machine learning model that can predict whether an individual's application for a credit card will be accepted or not.
Credit Card Applications and the problems associated with them
Nowadays, banks receive a lot of applications for the issuance of credit cards. Many of them are rejected for reasons such as high loan balances, low income levels, or too many inquiries on an individual's credit report. Manually analyzing these applications is an error-prone and time-consuming process. Luckily, this task can be automated with the power of machine learning, and pretty much every bank does so nowadays. In this project, we will build an automatic credit card approval predictor using machine learning techniques, just like the real banks do.
FIRST TASK
Importing the pandas package and loading the dataset.
1. pandas: Pandas is used to read the dataset file and import it as a DataFrame, which is similar to a table with rows and columns.
Importing the pandas package and loading the dataset
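Below is a minimal sketch of this step (the original code is shown as a screenshot). The local file name crx.data and the DataFrame name cc_apps are assumptions; header=None is passed because the UCI file ships without a header row.

```python
# Import pandas and load the dataset (assumed local file name: crx.data)
import pandas as pd

cc_apps = pd.read_csv("crx.data", header=None)

# View the first few rows of the DataFrame
print(cc_apps.head())
```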
OUTPUT
On observing the above, the output appears a bit confusing at first sight, but let's try to figure out the most important features of a credit card application. We find that since the data is confidential, the contributor of this dataset has anonymized the feature names to protect privacy, but this blog gives us a pretty good overview of the probable features. The probable features in a typical credit card application are Gender, Age, Debt, Married, BankCustomer, EducationLevel, Ethnicity, YearsEmployed, PriorDefault, Employed, CreditScore, DriversLicense, Citizen, ZipCode, Income, and finally the ApprovalStatus.
SECOND TASK
As we can see from our first look at the data, the dataset has a mixture of numerical and non-numerical features. This can be fixed with some pre-processing, but before we do that, let's learn a bit more about the dataset to see if there are other issues that need to be fixed.
So, let's start by printing the summary statistics and the DataFrame information:
Data analysis part
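The following sketch shows what this step might look like, reusing the cc_apps DataFrame assumed in the earlier snippet:

```python
# Print summary statistics of the numeric columns
print(cc_apps.describe())

# Print DataFrame information: column dtypes and non-null counts
print(cc_apps.info())

# Inspect the last rows, where some of the '?' missing-value markers appear
print(cc_apps.tail(17))
```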
OUTPUT
THIRD TASK
Manipulating the data — Part -1
We’ve uncovered some issues that will affect the performance of our machine learning
model if they go unchanged:
Our dataset contains both numeric and non-numeric data (specifically, data of the float64, int64, and object types). Features 2, 7, 10, and 14 contain numeric values (of types float64, float64, int64, and int64 respectively), and all the other features contain non-numeric values (of type object).
The dataset also contains values from several ranges. Some features have a value range
of 0–28, some have a range of 2–67, and some have a range of 1017–100000. Apart
from these, we can get useful statistical information (like mean , max , and min ) about the
features that have numerical values.
Finally, the dataset has missing values, which we’ll take care of in this task. The missing
values in the dataset are labeled with ‘?’, which can be seen in the last cell’s output.
Now, let’s temporarily replace these missing value question marks with NaN.
numpy package: numpy enables us to work with arrays with great efficiency.
Importing numpy package and manipulating the dataset
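A sketch of this replacement, assuming the cc_apps DataFrame from the earlier snippets:

```python
# Import numpy and replace the '?' markers with NaN
import numpy as np

cc_apps = cc_apps.replace("?", np.nan)

# Verify the replacement by viewing the same rows as before
print(cc_apps.tail(17))
```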
OUTPUT
FOURTH TASK
Manipulating the data — Part -2
We have replaced all the question marks with NaNs. This is going to help us in the next
missing value treatment that we are going to perform in this task.
An important question that gets raised here is: why are we giving so much importance to missing values? Can't they just be ignored? Ignoring missing values can heavily affect the performance of a machine learning model. By ignoring the missing values, our machine learning model may miss out on information about the dataset that could be useful for its training. There are also many models that cannot handle missing values implicitly, such as LDA.
So, to avoid this problem, we are going to impute the missing values with a strategy
called mean imputation.
Imputing the missing values with mean
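A sketch of mean imputation on the numeric columns, again assuming the cc_apps DataFrame used above:

```python
# Fill missing values in the numeric columns with the mean of each column
cc_apps.fillna(cc_apps.mean(numeric_only=True), inplace=True)

# Count the remaining missing values per column; the non-numeric
# columns will still show NaNs at this point
print(cc_apps.isnull().sum())
```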
OUTPUT
FIFTH TASK
Manipulating the data — Part -3
We have successfully taken care of the missing values present in the numeric columns.
There are still some missing values to be imputed for columns 0, 1, 3, 4, 5, 6, and 13. All of these columns contain non-numeric data, and this is why the mean imputation strategy would not work here. This needs a different treatment.
We are going to impute these missing values with the most frequent value present in the respective column. This is good practice when it comes to imputing missing values for categorical data in general.
Imputing the missing values with the most frequent value in that column
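A sketch of most-frequent-value imputation for the non-numeric columns, under the same assumptions as above:

```python
# Impute missing values in the object (non-numeric) columns with
# the most frequent value of the respective column
for col in cc_apps.columns:
    if cc_apps[col].dtype == "object":
        cc_apps[col] = cc_apps[col].fillna(cc_apps[col].value_counts().index[0])

# Confirm that no missing values remain
print(cc_apps.isnull().sum())
```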
“OUTPUT” shows that there are no more missing values in the dataset
SIXTH TASK
Pre-processing the data — Part -1
The missing values are now successfully handled.
There is still some minor but essential data pre-processing needed before we proceed towards building our machine learning model. We are going to divide these remaining pre-processing steps into three main tasks:
1. Convert the non-numeric data into numeric.
2. Split the data into train and test sets.
3. Scale the feature values to a uniform range.
First, we will be converting all the non-numeric values into numeric ones. We do this not only because it results in faster computation, but also because many machine learning models (like XGBoost, and especially the ones developed using scikit-learn) require the data to be in a strictly numeric format. We will do this by using a technique called label encoding.
Converting the non-numeric values into numeric values
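A sketch of label encoding with scikit-learn, assuming the cc_apps DataFrame from the previous steps:

```python
from sklearn.preprocessing import LabelEncoder

# Label-encode every remaining non-numeric column so the whole
# DataFrame contains only numeric values
le = LabelEncoder()
for col in cc_apps.columns:
    if cc_apps[col].dtype == "object":
        cc_apps[col] = le.fit_transform(cc_apps[col])
```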
SEVENTH TASK
Splitting the dataset into training and test sets
We have successfully converted all the non-numeric values to numeric ones.
Now, we will split our data into a train set and a test set to prepare it for the two different phases of machine learning modeling: training and testing. Ideally, no information from the test data should be used to scale the training data or to direct the training process of a machine learning model. Hence, we first split the data and then apply the scaling.
Also, features like DriversLicense and ZipCode are not as important as the other features in
the dataset for predicting credit card approvals. We should drop them to design our
machine learning model with the best set of features. In Data Science literature, this is often
referred to as feature selection.
Splitting the data into training set (70%) and test set (30%)
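A sketch of the split. The column positions 11 (DriversLicense) and 13 (ZipCode), the assumption that the last column holds the ApprovalStatus label, and the random_state value are all inferred from the feature list above rather than taken from the original code:

```python
from sklearn.model_selection import train_test_split

# Drop DriversLicense (column 11) and ZipCode (column 13); positions assumed
cc_apps = cc_apps.drop([11, 13], axis=1)
cc_apps = cc_apps.values  # convert the DataFrame to a NumPy array

# Separate features (X) and label (y); the last column is ApprovalStatus
X, y = cc_apps[:, :-1], cc_apps[:, -1]

# Split into a 70% training set and a 30% test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)
```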
EIGHTH TASK
Pre-processing the data — Part -2
The data is now split into two separate sets — train and test sets respectively. We are only
left with one final pre-processing step of scaling before we can fit a machine learning model
to the data.
Now, let’s try to understand what these scaled values mean in the real world. Let’s
use CreditScore as an example. The credit score of a person is their creditworthiness based
on their credit history. The higher this number, the more financially trustworthy a person is
considered to be. So, a CreditScore of 1 is the highest since we're rescaling all the values to
the range of 0-1.
Scaling the feature values to a given range
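A sketch of the scaling step with MinMaxScaler, fitting on the training set only and reusing the split assumed above:

```python
from sklearn.preprocessing import MinMaxScaler

# Rescale every feature to the 0-1 range; fit the scaler on the training
# data only, then apply the same transformation to the test data
scaler = MinMaxScaler(feature_range=(0, 1))
rescaledX_train = scaler.fit_transform(X_train)
rescaledX_test = scaler.transform(X_test)
```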
NINTH TASK
Fitting a Logistic Regression Model to the training set
Essentially, predicting if a credit card application will be approved or not is a classification
task. According to UCI, our dataset contains more instances that correspond to “Denied”
status than instances corresponding to “Approved” status. Specifically, out of 690 instances,
there are 383 (55.5%) applications that got denied and 307 (44.5%) applications that got
approved.
This gives us a benchmark. A good machine learning model should be able to accurately
predict the status of the applications with respect to these statistics.
Which model should we pick? A question to ask is: are the features that affect the credit
card approval decision process correlated with each other? Although we can measure
correlation, that is outside the scope of this notebook, so we’ll rely on our intuition that they
indeed are correlated for now. Because of this correlation, we’ll take advantage of the fact
that generalized linear models perform well in these cases. Let’s start our machine learning
modeling with a Logistic Regression model (a generalized linear model).
sklearn package: This machine learning library includes numerous machine learning algorithms already built in, with sensible defaults for their parameters, so they work right out of the box.
Importing Logistic Regression classification model from sklearn package
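A sketch of fitting the classifier with its default parameters on the scaled training data from the previous step:

```python
from sklearn.linear_model import LogisticRegression

# Instantiate the logistic regression classifier with default parameters
# and fit it to the scaled training data
logreg = LogisticRegression()
logreg.fit(rescaledX_train, y_train)
```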
OUTPUT
TENTH TASK
Making predictions and evaluating the performance of the model
But how well does our model perform?
We will now evaluate our model on the test set with respect to classification accuracy. But we will also take a look at the model's confusion matrix. In the case of predicting credit card applications, it is equally important to see whether our machine learning model correctly labels as denied the applications that originally got denied. If our model is not performing well in this aspect, then it might end up approving applications that should have been denied. The confusion matrix helps us view our model's performance from these aspects.
Predicting the accuracy of model on the test set
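A sketch of the evaluation step, reusing the fitted logreg model and the scaled test data assumed above:

```python
from sklearn.metrics import confusion_matrix

# Predict approval status for the test set and measure classification accuracy
y_pred = logreg.predict(rescaledX_test)
print("Accuracy of logistic regression classifier:",
      logreg.score(rescaledX_test, y_test))

# Rows are the actual classes, columns are the predicted classes
print(confusion_matrix(y_test, y_pred))
```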
“OUTPUT” showing the accuracy of our classification model
ELEVENTH TASK
Grid Search and making the model perform better
Our model was pretty good! It was able to yield an accuracy score of almost 84%.
For the confusion matrix, the first element of the first row denotes the true negatives, meaning the number of negative instances (denied applications) predicted correctly by the model. The last element of the second row denotes the true positives, meaning the number of positive instances (approved applications) predicted correctly by the model.
Let’s see if we can do better. We can perform a grid search of the model parameters to
improve the model’s ability to predict credit card approvals.
scikit-learn's implementation of logistic regression has several hyperparameters, but we will grid search over the following two:
tol
max_iter
Applying Hyper-parameters to make the model perform better
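A sketch of defining the grid; the specific candidate values for tol and max_iter are illustrative assumptions, not necessarily the values used in the original notebook:

```python
# Candidate values for the two hyperparameters we will grid search over
tol = [0.01, 0.001, 0.0001]
max_iter = [100, 150, 200]

# GridSearchCV expects the grid as a dictionary mapping
# parameter names to lists of candidate values
param_grid = dict(tol=tol, max_iter=max_iter)
```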
TWELFTH TASK
We have defined the grid of hyperparameter values and converted them into a single
dictionary format which GridSearchCV() expects as one of its parameters. Now, we will
begin the grid search to see which values perform best.
We will instantiate GridSearchCV() with our earlier logreg model with all the data we have.
Instead of passing train and test sets separately, we will supply X (scaled version) and y .
We will also instruct GridSearchCV() to perform a cross-validation of five folds.
We’ll end the notebook by storing the best-achieved score and the respective best
parameters.
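A sketch of this final step, reusing the logreg model, the param_grid dictionary, and the scaler assumed in the earlier snippets:

```python
from sklearn.model_selection import GridSearchCV

# Scale the full feature matrix and run a 5-fold cross-validated grid search
rescaledX = scaler.fit_transform(X)
grid_model = GridSearchCV(estimator=logreg, param_grid=param_grid, cv=5)
grid_model_result = grid_model.fit(rescaledX, y)

# Store and report the best score and the hyperparameters that achieved it
best_score, best_params = grid_model_result.best_score_, grid_model_result.best_params_
print("Best: %f using %s" % (best_score, best_params))
```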
Best score after applying hyper-parameters
OUTPUT
CONCLUSION:
While building this credit card approval predictor, we tackled some of the most widely known pre-processing steps such as scaling, label encoding, and missing value imputation. We finished by building a machine learning model to predict whether a person's application for a credit card would be approved, given some information about that person.