Loan Approval Prediction Using Machine Learning
M.Shanmukha Priya5
1
[email protected],
2
[email protected],
3
[email protected],
4
[email protected],
5
[email protected]
1,2,3,4,5
Department of CSE,
Abstract:
The process of loan approval is crucial for financial institutions, as it involves assessing the
risk associated with lending funds. Traditional methods for loan approval are time-consuming
and often subjective, leading to delays and inconsistencies. To address this issue, machine
learning (ML) models have been increasingly employed to automate and enhance the
accuracy of loan predictions. This project explores and compares multiple ML models,
including XGBoost, Random Forest, Support Vector Machine (SVM), and Logistic
Regression, to determine the most effective approach for loan approval prediction. The
models are trained on historical loan data, considering various financial and demographic
features of applicants. A comparative analysis is conducted based on performance metrics
such as accuracy, precision, recall, and F1-score. Through this analysis, the study aims to
identify the most reliable model that minimizes errors in prediction while reducing processing
time. Preliminary results indicate that ensemble-based models, particularly XGBoost and
Random Forest, outperform other classifiers in terms of accuracy and robustness.
Implementing such predictive models can significantly streamline the loan approval process,
enhancing decision-making efficiency and reducing financial risks for lenders.
Keywords: Loan Approval Prediction, Machine Learning, Random Forest, Support Vector Machine
(SVM), Logistic Regression, XGBoost, Credit Scoring, Financial Risk Assessment.
1 Introduction:
In banking and financial institutions, loan approval is essential for institutional growth
and stability. Traditional evaluation of loan proposals depends on manual validation and
risk-based checks, which take time and are prone to error, resulting either in the approval
of default-prone applicants or in the rejection of creditworthy ones. Machine learning (ML)
models can reduce these delays and improve the predictive accuracy of loan approval,
supporting better decision making, fewer loan defaults, and more effective asset deployment.
Simple ML algorithms such as Logistic Regression, Decision Trees, and Support Vector
Machines (SVMs) have been applied to loan prediction systems with satisfactory outcomes.
As loan data grows more complex, however, more expressive models are needed to capture the
relationships among features and improve prediction accuracy. Class imbalance, where
approved loans far outnumber defaults, is one of the major challenges in loan prediction,
because it can skew predictions towards the majority class (approved loans). To address this,
models such as Random Forests, XGBoost, Neural Networks, and SVM have been used to improve
accuracy and manage class imbalance.
This paper examines the use of these machine learning models to improve loan approval
prediction, based on features such as credit score, income, loan amount, and employment
status. The system automates the decision-making process, helping financial institutions
reduce risk, make the approval process more efficient, and reach informed decisions with
less subjective bias.
The subsequent sections deal with the training dataset, the algorithms used, their evaluation
parameters, and results achieved.
2 Literature Review
1. Loan Approval Prediction based on Machine Learning Approach
Authors: Kumar Arun, Garg Ishan, Kaur Sanmeet
Year: 2023
This paper focuses on predicting whether granting a loan to a specific person is
safe. The work is divided into four parts: (i) Data Collection, (ii) Comparison of Machine
Learning Models, (iii) Training the Model, (iv) Testing.
2. Exploring Machine Learning Algorithm for Loan Sanctioning
Authors: E. Chandra Blessie, R. Rekha
Year: 2024
This paper addresses the challenge faced by banks and NBFCs in granting loans amidst
limited capital. It uses past customer data and a trained machine learning model to predict
loan repayment. The study shows Naïve Bayes as the most effective model for loan
forecasting.
3. Loan Prediction using Machine Learning Model
Year: 2024
The paper focuses on reducing the risk in loan approval by predicting whether it’s safe to
grant a loan. The methodology involves mining past loan data to train a machine learning
model. The paper compares models like Classification, Logistic Regression, Decision Trees,
and Gradient Boosting to predict loan safety.
4. Loan Prediction using Decision Tree and Random Forest
Authors: Kshitiz Gautam, Arun Pratap Singh, Keshav Tyagi, Mr. Suresh Kumar
Year: 2023
The paper addresses the growing number of loan applications in India and aims to predict
whether a customer will repay the loan. It uses exploratory data analysis techniques to
classify applicants as defaulters or non-defaulters, focusing on decision tree and random
forest models.
5. Loan Default Prediction Using Neural Networks and Random Forest
Authors: Kumar Ashish, Yadav Pooja
Year: 2024
This study focuses on classifying borrowers as defaulters or non-defaulters based on their
credit history. It compares the performance of Neural Networks and Random Forest models,
concluding that ensemble methods like Random Forest provide more reliable predictions than
standalone deep learning models.
6. Enhancing Loan Approval Decisions Using Machine Learning
Authors: Choudhary Deepak, Agrawal Simran
Year: 2023
The paper proposes a machine learning framework to improve loan approval decisions. It
evaluates models such as Logistic Regression, K-Nearest Neighbors (KNN), and XGBoost,
demonstrating that XGBoost achieves the best predictive accuracy with minimal overfitting.
7. A Hybrid Model for Loan Risk Assessment Using Machine Learning
Authors: Mishra Alok, Sharma Neha
Year: 2024
This research presents a hybrid machine learning model that combines Decision Trees with
Gradient Boosting for loan approval prediction. The study finds that hybrid models improve
classification performance, particularly in handling imbalanced datasets.
8. Loan Approval System Using Explainable AI Techniques
Authors: Desai Rakesh, Kapoor Ananya
Year: 2023
The paper discusses the implementation of explainable AI in loan approval systems. It applies
SHAP values and LIME to improve transparency in machine learning models such as
Random Forest and Neural Networks, ensuring better trust in automated decisions.
3 Loan Prediction Methodology:
Import the necessary libraries, such as scikit-learn, pandas, and numpy, to process data
and create a prediction model. Load the loan data into a pandas DataFrame and preprocess
it. Split the preprocessed data into two subsets: a training set and a testing set. The
predictive model is trained on the training set, and its performance is assessed on the
testing set. Select a suitable machine learning algorithm, such as random forests,
decision trees, or logistic regression, to predict whether a loan will be approved. Create
an instance of the selected model and set any required hyperparameters. Using the fit()
function, fit the model to the training data; during fitting, the model learns patterns
and relationships in the training data that it later uses to produce predictions. Based on
the features of each loan application, the model classifies it as approved or denied.
Finally, compare the testing set's actual loan approval labels with the predicted loan
approval labels. The overall workflow is represented in Fig. 1.
[Fig. 1: Workflow diagram. Recoverable stage labels: Collection of Data set, Result analysis]
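The methodology steps above can be sketched end to end as follows. This is a minimal
illustration, not the authors' program: the data is synthetic, the column names
(credit_score, income, loan_amount, employed, approved) and the labelling rule are
hypothetical, and Random Forest is used only as one of the algorithms the text mentions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical loan data (all columns are hypothetical).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "credit_score": rng.integers(300, 850, size=n),
    "income": rng.normal(50_000, 15_000, size=n).round(2),
    "loan_amount": rng.normal(150_000, 40_000, size=n).round(2),
    "employed": rng.integers(0, 2, size=n),
})
# Toy labelling rule so that the features carry some signal (an assumption).
df["approved"] = ((df["credit_score"] > 600) & (df["employed"] == 1)).astype(int)

X = df.drop(columns="approved")
y = df["approved"]

# Hold out 20% of the rows as the testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Create the model, set hyperparameters, and fit it to the training set.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Compare predicted labels with the actual labels of the testing set.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Swapping in LogisticRegression or DecisionTreeClassifier only changes the line that
creates the model; the split, fit, and comparison steps stay the same.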
Performance metrics
Machine learning models exhibit a diverse range of characteristics and behaviors, which
makes it challenging to identify the optimal model for a given task. It is therefore
crucial to have tools that assess the performance of machine learning models effectively.
Among the quality measures commonly used in machine learning, accuracy, precision, recall,
and F1-score are the most widely used for evaluating classifiers. All four are computed
from the confusion matrix, whose entries are defined below.
1. True Positives occur when the prediction is YES and the actual output is YES.
2. True Negatives occur when the prediction is NO and the actual output is NO.
3. False Positives occur when the prediction is YES but the actual output is NO.
4. False Negatives occur when the prediction is NO but the actual output is YES.
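As a small worked example, the four counts defined above can be turned into accuracy,
precision, recall, and F1-score. The label vectors below are hypothetical (1 = YES,
0 = NO); the formulas are the standard definitions, and the scikit-learn helpers are
imported so the hand-computed values can be checked against them.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 1 = loan approved (YES), 0 = loan denied (NO).
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]

# With labels=[0, 1], ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of correct predictions
precision = tp / (tp + fp)                   # predicted YES that were truly YES
recall = tp / (tp + fn)                      # actual YES that were predicted YES
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Here TP = 4, TN = 3, FP = 1, and FN = 2, so accuracy is 7/10, precision is 4/5, and
recall is 4/6.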
Saving and Deploying the Model
After training, the best model is saved for future predictions. The model can be deployed
through:
Web/Mobile Applications: Users can submit loan application details, and the model will
classify them immediately.
Cloud Deployment: It can be hosted on AWS, Google Cloud, or TensorFlow Serving.
Edge Devices: The model can be saved as TensorFlow Lite for mobile use, allowing
prediction on the device.
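Saving and reloading a trained model could look like the sketch below. It assumes a
scikit-learn model persisted with joblib, which is one common choice; the text does not
state which mechanism was used. The file name loan_model.joblib, the toy features, and
the sample application are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data; the two columns stand in for hypothetical applicant features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Persist the fitted model to disk (hypothetical file name, temporary directory).
model_path = os.path.join(tempfile.mkdtemp(), "loan_model.joblib")
joblib.dump(model, model_path)

# Later, for example inside a web service, reload the model and score a new application.
restored = joblib.load(model_path)
new_application = np.array([[0.5, 1.2]])
prediction = restored.predict(new_application)[0]
print("Predicted class:", prediction)
```

A file written by joblib should only be loaded from a trusted source and with a
compatible scikit-learn version.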
6. RESULTS AND DISCUSSION
We will go through each step of the program. First, the pandas method df.head() shows
the first few rows of a DataFrame, which gives a quick preview of the data held in df.
By default, df.head() returns the first five rows, and these are printed to the console
when the code is run. The head() method also accepts an integer if a different number of
rows is wanted. For instance, df.head(10) shows the DataFrame's first ten rows.
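A short illustration of head() on a small DataFrame follows. The columns applicant_id
and loan_amount and their values are hypothetical and serve only to show the row counts.

```python
import pandas as pd

# Hypothetical loan records used only to demonstrate head().
df = pd.DataFrame({
    "applicant_id": range(1, 13),
    "loan_amount": [120, 95, 150, 80, 200, 175, 60, 110, 130, 90, 210, 140],
})

preview = df.head()      # first five rows by default
top_ten = df.head(10)    # first ten rows

print(preview)
print(len(top_ten))
```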
A) Random Forest
Random Forest is robust to feature scaling because it selects split points by comparing
feature values against thresholds rather than by computing distances. Standardization
therefore has minimal impact on its performance, but it can help when Random Forest is
combined with other, scale-sensitive models. Random Forest also reduces overfitting by
averaging many decision trees, which improves generalization.
Feature Importance
Let us now find the feature importances, i.e. which features matter most for this
problem. We will use the feature_importances_ attribute of the fitted scikit-learn model,
which returns one score per feature (the higher the score, the more important the feature).
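Reading feature_importances_ from a fitted Random Forest can be sketched as below. The
data is synthetic and the column names (credit_score, income, noise) are hypothetical;
the label depends only on credit_score, so that column is expected to rank first.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: only credit_score determines the label (an assumption for the demo).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "credit_score": rng.integers(300, 850, size=n),
    "income": rng.normal(50_000, 15_000, size=n),
    "noise": rng.normal(size=n),
})
y = (X["credit_score"] > 600).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ holds one non-negative score per column, summing to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

These impurity-based scores are computed on the training data, so they are best read as
a rough ranking rather than as an exact measure of each feature's effect.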
C) XGBoost
XGBoost works only with numeric variables and we have already replaced the categorical
variables with numeric variables. Let’s have a look at the parameters that we are going to use
in our model.
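As a hedged illustration of the categorical-to-numeric replacement mentioned above, the
sketch below encodes a few text columns with pandas. The column names (Gender, Married,
Property_Area, LoanAmount), the values, and the choice of mapping versus one-hot encoding
are assumptions, not the encoding actually used in this work.

```python
import pandas as pd

# Hypothetical raw records containing categorical (text) columns.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Married": ["Yes", "No", "Yes"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
    "LoanAmount": [128.0, 66.0, 120.0],
})

encoded = df.copy()

# Two-valued categories: map each value to 0 or 1 explicitly.
encoded["Gender"] = encoded["Gender"].map({"Male": 1, "Female": 0})
encoded["Married"] = encoded["Married"].map({"Yes": 1, "No": 0})

# Multi-valued category: one indicator column per value (one-hot encoding).
encoded = pd.get_dummies(encoded, columns=["Property_Area"], dtype=int)

print(encoded.dtypes)
```

After this step every column is numeric, which is the form a gradient-boosting model
such as XGBoost expects as input.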
Logistic Regression
We will start with a logistic regression model and then move on to more complex models
like Random Forest and XGBoost.
7. Conclusion
The predictive models based on Logistic Regression, Decision Tree, and Random Forest give
accuracies of 80.945%, 93.648%, and 83.388%, whereas their cross-validation scores are
80.945%, 72.213%, and 80.130%, respectively. For the given dataset, the decision tree has
the highest accuracy, but its much lower cross-validation score indicates overfitting.
Random Forest generalizes better than the decision tree, although its cross-validation
score is slightly lower than that of Logistic Regression.
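The gap between accuracy and cross-validation score discussed above can be reproduced in
spirit with the sketch below. It assumes the reported accuracy is measured on the same
data used for fitting, which the text does not state; the dataset is synthetic, so the
printed numbers will not match the figures in this section.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy binary classification data (flip_y injects label noise).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # Accuracy on the data the model was fitted on.
    train_acc = model.fit(X, y).score(X, y)
    # Mean accuracy over 5 held-out folds (the estimator is cloned per fold).
    cv_acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    results[name] = (train_acc, cv_acc)
    print(f"{name}: train={train_acc:.3f} cv={cv_acc:.3f}")
```

A large drop from the first number to the second, as an unpruned decision tree typically
shows, is the usual sign that a model has memorized its training data.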
Future Work
Feature Engineering: Incorporate additional features like bank transaction history,
customer behavior, etc.
Deep Learning Models: Experiment with Neural Networks for improved predictions.
Explainability: Use SHAP values to explain model decisions for regulatory
compliance.
By integrating ML-based automation into financial services, institutions can achieve faster,
data-driven, and more reliable loan approval decisions.