Installment 3

The document is a project report from a group analyzing business statistics, focusing on loan repayment data and regression analysis. It discusses the least squares regression of amounts repaid at six months against total amounts to be repaid, evaluates the performance ratio at six months (PRSM), and compares different lending strategies. The analysis concludes that larger loans tend to overperform in terms of PRSM, but the data does not fully conform to simple regression model assumptions.

Uploaded by

Deepak


Business Statistics II

Project Report-Group 9

Submitted by:
Midhun Mohan
Braja Kishor Prida
Paras Kohli
Deepak Kannan
Mahesh Kumar Yadav
Ashutosh Tiwari
Project 1 (Instalment 3)

Q1: a. If the amounts garnished from each merchant’s credit card transactions is accumulating
at a rate that will pay off the loan at around 12 months, then approximately what should be
the slope and intercept of the least squares regression of the amount repaid at six months (y)
versus the total amount to be repaid (x)?

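Assuming repayments accumulate uniformly over the 12 months, half of the total amount is repaid at six months. The slope should therefore be approximately 0.5 and the intercept approximately 0, giving the equation y = 0.5x.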
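The expected coefficients can be checked with a minimal R sketch. The loan totals below are hypothetical values chosen for illustration, not the project data, and the variable names are assumptions.

```r
# Hypothetical totals to be repaid (illustrative values only)
total_due <- c(10000, 20000, 40000, 80000, 160000)

# Uniform repayment over 12 months: 6/12 of the total is repaid at six months
repaid_6m <- total_due * 6 / 12

# Least squares regression of amount repaid (y) on total amount due (x)
fit_ideal <- lm(repaid_6m ~ total_due)
coef(fit_ideal)  # intercept close to 0, slope close to 0.5
```

Real data will scatter around this line, so the estimated intercept and slope are only expected to be near 0 and 0.5.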
b. For your data set, what is the slope and intercept of the least squares regression of
the amount repaid at six months (y) versus the total amount to be repaid (x)?
Compare these estimates with the values anticipated in “a”? Explain any differences.


> library(readr)
> Project_dataset <- read_csv("F:/Term 2/Business Statistics 2/Project/Project_dataset.csv")
> View(Project_dataset)
> Project_dataset
# A tibble: 628 x 27
     Sno CaseNumber Amt.Repaid.at.6.Months Nominal.Loan.Amount Total.Amt.to.be.Repaid
   <int>      <int>                  <int>               <dbl>                  <dbl>
 1     1  344012260                   7879               19200               22656.00
 2     2  378114786                  19392               37000               43475.00
 3     3  351015636                  14253               26600               31654.00
 4     4   79719718                   6934               15600               18330.00
 5     5  104110253                   6199               18400               22816.00
 6     6  371413226                  21268               40000               49000.00
 7     7  253511876                  27233               44000               51700.00
 8     8  355114334                   4841                8900               10635.55
 9     9  214917684                   8040               14800               18352.00
10    10  161017385                   5885               15000               17850.00
# ... with 618 more rows, and 22 more variables: Repayment.Percentage <dbl>,
#   Commission.Upfront <dbl>, Validated.Monthly.Batch <dbl>,
#   Historical.Monthly.Credit.Card.Receipts <dbl>, Loan.Type <chr>, Loan.Size.Class <chr>,
#   FICO <int>, Years.In.Business <int>, Num.of.Credit.Lines <int>,
#   Num.of.Paid.off.Credit.Lines <int>, Current.Delinquent.Credit.Lines <int>,
#   Previous.Delinquent.Credit.Lines <int>, Business.Entity.Type <chr>, Num.of.Trade.Lines <int>,
#   Num.of.Derog.Legal.Item <int>, Two.Digit.SIC.Code <int>, Two.Digit.SIC.Description <chr>,
#   Population.in.Zip.Code <int>, Average.House.Value.in.Zip.Code <int>,
#   Income.Per.Household.in.Zip.... <int>, State <chr>, ISO.Name <chr>
> y<-Project_dataset$Amt.Repaid.at.6.Months
> y
 [1]  7879 19392 14253  6934  6199 21268 27233  4841  8040  5885 12249
[12]  1207 27408 10628 52687 46141 41531 69413  4253
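> x<-Project_dataset$Total.Amt.to.be.Repaid
> plot(y,x)
> plot(x,y)
> plot(x, y, xlab = "Total amount to be repaid", ylab = "Amount paid in six months", main="scatter plot of y and x")
> lm(x,y)
Error in formula.default(object, env = baseenv()) : invalid formula
> lm(x~y)

Call:
lm(formula = x ~ y)

Coefficients:
(Intercept)            y
  4.349e+04    4.516e-02

> model1<-lm(x~y)
> model1$fitted.values
       1        2        3        4        5        6        7        8        9
43841.32 44361.21 44129.15 43798.65 43765.46 44445.92 44715.27 43704.14 43848.59
      10       11       12       13       14       15       16       17

> yfit<-model1$fitted.values
> plot(x,y)
> lines(x,yfit)
> residual<-y-yfit
> plot(x,residual)
> cor(x,y)
[1] 0.1422644

Note: lm(x~y) regresses the total amount to be repaid (x) on the amount repaid at six months (y), which is the reverse of what the question asks. The intercept (4.349e+04) and slope (4.516e-02) above therefore describe x as a function of y and cannot be compared directly with the values anticipated in "a" (intercept 0, slope 0.5). The regression the question calls for is lm(y~x). For the same reason, yfit holds fitted values of x, so residual<-y-yfit does not give the residuals of either regression.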
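A minimal sketch of the regression in the orientation the question asks for (amount repaid on total amount due) might look as follows. The data frame `loans` and its columns are synthetic stand-ins invented for illustration, so the printed coefficients say nothing about the project data.

```r
set.seed(1)

# Synthetic stand-in for the project data (values are illustrative only)
loans <- data.frame(total_due = runif(200, 5000, 100000))
loans$repaid_6m <- 0.4 * loans$total_due + rnorm(200, sd = 3000)

# Response on the left of ~, predictor on the right
fit_yx <- lm(repaid_6m ~ total_due, data = loans)
coef(fit_yx)

# Fitted values and residuals are on the scale of the response (repaid_6m)
fitted_y <- fitted(fit_yx)
resid_y  <- resid(fit_yx)

# Optional plots, left commented out so the sketch runs without a graphics device:
# plot(loans$total_due, loans$repaid_6m); abline(fit_yx)
# plot(loans$total_due, resid_y); abline(h = 0)
```

Using `fitted()` and `resid()` on the fitted model avoids mixing fitted values from one regression with the response of another.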
(c) Use residuals to determine whether the data used to estimate the simple regression in
“b” conform to the assumptions of the simple regression model (SRM). If the data do
conform to the SRM, report and briefly interpret the value of R² and RMSE. If the
data are not consistent with the assumptions of the SRM, explain how the data deviate
from those assumptions.

Checklist for Assumptions in Linear Regression:

✓ Is the association between Y and X linear?

✓ Have we ruled out obvious lurking variables?
✓ Are the errors evidently independent?
✓ Are the variances of the residuals similar?
✓ Are the residuals nearly normal?

Linearity:
The variables Amt.Repaid.at.6.Months and Total.Amt.to.be.Repaid do not appear to be linearly related in the fitted model, as the scatter plot shows.

Residual plot:
Independence:
The residuals follow a clear pattern, which suggests that a straight line is being forced onto a relationship that the fitted model does not capture.
Variance:
The pattern in the residuals also shows that their variance is not constant.
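The constant-variance check can also be expressed numerically. The sketch below uses synthetic data that is deliberately heteroscedastic (an assumption made for illustration, not the project data) and compares the spread of the residuals for small and large values of x.

```r
set.seed(2)

# Synthetic data whose noise grows with x (deliberately heteroscedastic)
x <- runif(400, 5000, 100000)
y <- 0.4 * x + rnorm(400, sd = 0.05 * x)

fit <- lm(y ~ x)
res <- resid(fit)

# Compare residual spread above and below the median of x;
# a ratio far from 1 suggests the variance is not constant
upper <- x > median(x)
sd_ratio <- sd(res[upper]) / sd(res[!upper])
sd_ratio

# The usual visual checks, commented out so no graphics device is needed:
# plot(fitted(fit), res); abline(h = 0)   # pattern or fan shape?
# qqnorm(res); qqline(res)                # nearly normal?
```

A ratio close to 1 would be consistent with similar variances; the split at the median is just one simple choice of grouping.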

2. The lender uses a performance metric known as PRSM, performance ratio at six
months. In this question, you will form and describe this new variable. To construct
PRSM, define a new column in your data table using a formula. In the formula, divide
two times the amount repaid at six months by the total amount to be repaid:

PRSM = 2 * (Amt.Repaid.at.6.Months/Total.Amt.to.be.Repaid)

(a) If small loans and large loans are performing comparably and accumulating at a
rate that will pay off the loans at around 12 months, then approximately what
should be the slope and intercept of the least squares regression of PRSM (y)
versus the total amount to be repaid (x)?
Y = PRSM
X = total amount to be repaid
If loans are accumulating at a rate that pays them off in about 12 months, merchants are paying their instalments on schedule without defaulting, regardless of loan size.
In this scenario the amount repaid at six months is half of the total amount to be repaid (x), so PRSM = 2 * (0.5x)/x = 1 for every loan.
Equation: Y = 1
Intercept = 1, slope = 0
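As a quick check of this reasoning, the following R sketch builds PRSM for loans that are exactly on schedule and fits the regression. The loan totals are hypothetical values, not the project data.

```r
# Hypothetical totals to be repaid (illustrative values only)
total_due <- c(10000, 25000, 50000, 100000, 160000)

# On-schedule loans have repaid half of the total at six months
repaid_6m <- 0.5 * total_due

# PRSM = 2 * amount repaid at six months / total amount to be repaid
prsm <- 2 * repaid_6m / total_due

fit_prsm <- lm(prsm ~ total_due)
coef(fit_prsm)  # intercept close to 1, slope close to 0
```

Because PRSM is the same for every loan in this idealized case, the fitted line is flat at 1.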
(b) For your data set, what is the slope and intercept of the least squares regression
of PRSM (y) versus the total amount to be repaid (x)? Compare these
estimates with the values of the slope and intercept anticipated in “a”? Explain
any differences.
Reg4 <- lm(Treated_sample2$PRSM~Treated_sample2$Total.Amt.to.be.Repaid)
Call:
lm(formula = Treated_sample2$PRSM ~ Treated_sample2$Total.Amt.to.be.Repaid)

Coefficients:
(Intercept) 7.801e-01
Treated_sample2$Total.Amt.to.be.Repaid 7.279e-07

Equation: PRSM = 7.801e-01+ 7.279e-07 * Total.Amt.to.be.Repaid

summary(Reg4)

Residuals:
     Min       1Q   Median       3Q      Max
-0.67970 -0.14559  0.00587  0.15022  0.76540

Coefficients:
                                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                            7.801e-01  1.100e-02  70.908  < 2e-16 ***
Treated_sample2$Total.Amt.to.be.Repaid 7.279e-07  1.502e-07   4.847 1.58e-06 ***

Explanation:

1) In the ideal case of question 2(a) the intercept is 1; here it is 7.801e-01. The intercept is the fitted PRSM at a total amount of zero, which lies outside the range of the data, so it is an extrapolation.
2) In the ideal case the slope is zero; here it is 7.279e-07. This may be due to irregular or partial payment of instalments. The p-value of the slope is 1.58e-06, so at the 5% significance level we reject the null hypothesis that the slope is zero.
The slope in this equation is statistically significant.

(c) Use residuals to determine whether the data used to estimate the simple
regression in “b” conform to the assumptions of the simple regression model
(SRM). If the data do conform to the SRM, report and briefly interpret the
value of R² and RMSE. If the data are not consistent with the assumptions of
the SRM, explain how the data deviate from those assumptions.
Assumptions of SRM :
1) Linearity:
2) Residuals should be independent:

(d) Does your analysis of the data indicate that loans with larger total amount to be
repaid have smaller average PRSM (i.e., tend on average to underperform)
compared to those with smaller principal?
The analysis indicates the opposite: the estimated slope is positive, so loans with a larger total amount to be repaid have a larger average PRSM (i.e., tend on average to overperform) compared to those with a smaller principal. However, this inference is not fully reliable, because the residuals do not satisfy the SRM conditions (they show heteroscedasticity).

3. Suppose that the lender could lend sufficient money so that $160,000 has to be repaid,
but in one of two ways. It can lend the money to one merchant, who needs to pay back the
entire $160,000, or to 16 distinct merchants, each of which must repay $10,000. Due to its
own credit obligations, the lender must have at least $60,000 of its money paid back within
six months. Which approach would you recommend to management of the lender: lend to a
single merchant or spread the money over 16 merchants? Support your choice by supplying
an estimate of the probability that at least $60,000 will be paid back within six months
under both scenarios. Your answer should note any key assumptions that are necessary for
your analysis. The following questions will help guide your answer.

(a) Which of the two models considered in questions 1 and 2 better conforms to the Simple
Regression Model (SRM) assumptions? Briefly explain why.

Model 2 conforms better to the SRM assumptions because the second regression shows a more nearly linear relationship between the predictor and the response variable. Since each model has a single predictor, multicollinearity does not arise in either of them.

(b) Using the model from Q2, estimate the average PRSM of loans which must repay $10k
and of loans which must repay $160k.

Case 1 ($10k) : PRSM = 7.801e-01 + (7.279e-07)(10,000) = 0.787

Case 2 ($160k) : PRSM = 7.801e-01 + (7.279e-07)(160,000) = 0.897
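The same arithmetic can be reproduced in R from the rounded coefficients reported in Q2(b). The variable names are chosen for illustration, and the total amount to be repaid is entered in dollars, which is the unit used in the data set.

```r
# Rounded coefficients as reported in the Q2(b) output
b0 <- 7.801e-01   # intercept
b1 <- 7.279e-07   # slope per dollar of total amount to be repaid

# Total amounts to be repaid, in dollars
total_due <- c(small = 10000, large = 160000)

# Estimated average PRSM at each loan size
prsm_hat <- b0 + b1 * total_due
round(prsm_hat, 3)
```

With the fitted model object available, `predict()` with a `newdata` data frame would give the same estimates without retyping the coefficients, provided the model was fitted with a `data` argument rather than `$` references.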

(c) For each lending scheme, what is the smallest average PRSM that the lender must
observe for each lending scheme to ensure $60k has been paid back within 6 months?

PRSM = 2 * (Amt.Repaid.at.6.Months/Total.Amt.to.be.Repaid), with amounts in $ thousands below
Single merchant ($160k) : 2 * (60/(160*1)) = 0.75
16 merchants ($10k each) : 2 * (60/(10*16)) = 0.75

(d) What is the approximate distribution of the average PRSM score in each of the two
lending schemes?

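Case 1 ($10k loans, average over 16 merchants) : approximately normal with Mean = 7.801e-01 + (7.279e-07)(10,000) = 0.787 and standard deviation = RMSE/sqrt(16) = 0.218477/4 = 0.0546, assuming the 16 loans repay independently.

Case 2 ($160k loan, single merchant) : approximately normal with Mean = 7.801e-01 + (7.279e-07)(160,000) = 0.897 and standard deviation = RMSE = 0.218477.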
(e) For each lending scheme, determine the probability that at least $60k is paid back within
6 months. Which lending scheme is better – that is, which has the higher probability of
repayment?

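Case 1 ($10k loans, 16 merchants) : Mean = 0.787 ; standard deviation of the average = 0.218477/sqrt(16) = 0.0546 ; required PRSM = 0.75 (calculated above)
Z = (0.75 - 0.787)/0.0546 = -0.68, so P(average PRSM >= 0.75) = P(Z >= -0.68) = 0.75 approximately
Case 2 ($160k loan, single merchant) : Mean = 0.897 ; RMSE = 0.218477 ; required PRSM = 0.75 (calculated above)
Z = (0.75 - 0.897)/0.218477 = -0.67, so P(PRSM >= 0.75) = P(Z >= -0.67) = 0.75 approximately
Under these assumptions (the Q2 model holds, errors are approximately normal, and the 16 merchants repay independently) the two schemes have nearly the same probability, about 0.75, of returning at least $60k within six months.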
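A short R sketch of this probability calculation is given below. It takes the rounded Q2 coefficients and the reported RMSE as inputs and assumes approximately normal errors and independent repayment across the 16 merchants; the variable names are chosen for illustration.

```r
# Inputs taken from the report: Q2 coefficients and RMSE
b0   <- 7.801e-01
b1   <- 7.279e-07
rmse <- 0.218477

# Smallest average PRSM that returns $60k of $160k within six months
threshold <- 2 * 60000 / 160000

# Estimated mean PRSM for each loan size (amounts in dollars)
mean_single <- b0 + b1 * 160000   # one $160k loan
mean_small  <- b0 + b1 * 10000    # each $10k loan

# A single loan varies with the full RMSE; the average of 16
# independent loans varies with RMSE / sqrt(16)
sd_single <- rmse
sd_spread <- rmse / sqrt(16)

# Probability that the (average) PRSM is at least the threshold
p_single <- pnorm(threshold, mean_single, sd_single, lower.tail = FALSE)
p_spread <- pnorm(threshold, mean_small, sd_spread, lower.tail = FALSE)
round(c(single = p_single, spread = p_spread), 3)
```

The two probabilities come out very close to each other, so the comparison is sensitive to rounding of the coefficients and to the independence assumption.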

4. To gauge the risk of loans, the lender has available several easily-obtained variables: the
FICO score, the Years in Business, the number of Credit Lines, and the Income per
Household in the zip code. Choose the best one of these four variables as an explanatory
variable in a simple regression to explain variation in PRSM. You want to choose the single
explanatory variable that makes these predictions as accurate as possible. (Be aware that a
transformation of an explanatory variable may produce a better fitting, more predictive
model, but keep any transformations simple and do not combine the predictor variables).
Absolutely, do not transform the PRSM score.

library(data.table)
library(dplyr)
library(ggplot2)
setwd("C:/NIIT/Term 2/Business Stats II/proj/")
data <- fread("insdata.csv")
data$Amt.Repaid.at.6.Months <- as.integer(data$Amt.Repaid.at.6.Months)

# Drop rows with no six-month repayment figure, then construct PRSM
data <- data %>%
  filter(!is.na(Amt.Repaid.at.6.Months)) %>%
  mutate(prsm = 2*(Amt.Repaid.at.6.Months/Total.Amt.to.be.Repaid))

# Keep the four candidate predictors and prsm (selected by column position)
four <- data %>% select(c(12:14,25,28))

# One simple regression of prsm per candidate predictor
mod_fico <- lm(prsm~FICO, data = four)
summary(mod_fico)
mod_yib <- lm(prsm~Years.In.Business, data = four)
summary(mod_yib)
mod_ncl <- lm(prsm~Num.of.Credit.Lines, data = four)
summary(mod_ncl)
mod_iph <- lm(prsm~Income.Per.Household.in.Zip...., data = four)
summary(mod_iph)

(a) Explain briefly the reasoning behind your choice of the explanatory variable used
in your simple regression. Empirically, why choose this variable?

The variable ‘Years.In.Business’ explains more of the variation in PRSM than the other three candidates (FICO, Num.of.Credit.Lines and Income.Per.Household). This is based on comparing the summaries of the four fitted models.
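One way to make this comparison explicit is to tabulate R² and RMSE for each candidate model. The sketch below uses synthetic data in which Years.In.Business is built to drive PRSM; the column names mirror the report, but all values are invented for illustration and the fourth candidate is omitted for brevity.

```r
set.seed(3)
n <- 300

# Synthetic stand-in data; values are invented, not the project data
d <- data.frame(
  FICO                = round(rnorm(n, 650, 50)),
  Years.In.Business   = rpois(n, 8),
  Num.of.Credit.Lines = rpois(n, 5)
)
d$prsm <- 0.5 + 0.05 * d$Years.In.Business + rnorm(n, sd = 0.2)

candidates <- c("FICO", "Years.In.Business", "Num.of.Credit.Lines")

# Fit one simple regression per candidate and collect R-squared and RMSE
fit_stats <- t(sapply(candidates, function(v) {
  s <- summary(lm(reformulate(v, response = "prsm"), data = d))
  c(r_squared = s$r.squared, rmse = s$sigma)
}))
fit_stats

# Candidate with the highest R-squared (equivalently, the lowest RMSE here)
best <- rownames(fit_stats)[which.max(fit_stats[, "r_squared"])]
best
```

Since every model has the same response and one predictor, ranking by R² and ranking by RMSE give the same order.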

(b) Summarize the estimated simple regression, including R², RMSE, and the
estimated slope and intercept. Interpret the values of these estimates appropriately.
coefficients(mod_yib)
(Intercept) Years.In.Business
  0.8127613         0.1207524
summary(mod_yib)$sigma
[1] 28.03813
The intercept, 0.8127613, is the expected value of PRSM when Years.In.Business is zero. The slope indicates that for every one-year increase in Years.In.Business, the expected PRSM increases by 0.1207524. The residual standard error (RMSE) reported by the model is 28.03813.
