Data Analyst Interview Assignment

The document outlines a data analysis assignment for a Data Analyst interview, focusing on a dataset of loans issued to customers as of January 31st. It includes instructions for data loading, cleaning, imputation, and feature engineering, along with necessary Python imports and data descriptions. The assignment emphasizes visualizing key aspects of the dataset to derive insights on loan statuses and repayment behaviors.

Uploaded by

邓雯卿


6/4/25, 3:51 PM Data Analyst Interview Assignment

Data Analyst Interview Assignment

Data Analysis - Pilot report as at 31st Jan

Assignment Task:
Using the attached dataset (pilot report as at 31st jan.csv), analyse the data and visualize the most important
aspects using your preferred method. The dataset contains information on loans that have been issued to
customers and their status as at 31st January. The attached data dictionary (data dictionary.xlsx) explains
the different attributes.

# Necessary imports
import numpy as np
import pandas as pd
pd.set_option("display.max_columns", None)

import matplotlib.pyplot as plt

%matplotlib inline
import seaborn as sns
sns.set_style("darkgrid")

from tqdm import tqdm, trange

import warnings; warnings.filterwarnings("ignore")

https://deepnote.com/app/lyraxvincent/Data-Analyst-Interview-Assignment-7e8895fe-2ae6-45cf-9cd5-788364d86d1f 1/26

# Load data
data = pd.read_csv("data/pilot Report as at 31st jan.csv", parse_dates=['CreateDate', 'Inv
data
 

PartnerID CreditLimit SONumber Cleared Overdue CreditUsed Amo

0 36262 26,100 SO11705794 True False 1,464 1,4

1 36262 26,100 SO11705909 True True 146 148

2 36262 26,100 SO11780664 True False 1,650 1,6

3 36262 26,100 SO11833594 True False 8,220 8,2

4 36262 26,100 SO11909592 True False 2,080 2,0

... ... ... ... ... ... ... ...

2354 1298401 1,669 SO13572455 True True 870 875

2355 1298401 1,669 SO13572754 True True 220 222

2356 1298401 1,669 SO13810848 True False 1,344 1,3

2357 1298401 1,669 SO13810938 True False 236 236

2358 1298401 1,669 SO14061060 False False 1,320 0

2359 rows × 13 columns


# Data descriptions
data_dict = pd.read_excel("data/data dictionary.xlsx")
data_dict

Attribute Descriprion

0 PartnerID Customer Unique Identifier

1 CreditLimit Maximum amount a customer can borrow at a give...

2 SONumber Unique loan identifier

3 Cleared Loan Status\nTrue = Loan has been paid\nFalse ...

4 Overdue Loan Tenure Status\nTrue = Loan has exceeded i...

5 CreditUsed Total Amount borrowed

6 AmountRepaid Total Loan amount paid back

7 Balance CreditUsed - AmountRepaid

8 Fees Fees accrued from late repayment

9 DaysOverdue Number of days the loan is overdue by

10 CreatedDate Date order was placed

11 InvoiceDate Date order was delivered


# Correct column name and print descriptions in full
data_dict.columns = ['Attribute', 'Description']

for row in data_dict.index:
    print(f"{data_dict.loc[row, 'Attribute']} : {data_dict.loc[row, 'Description']}\n")

PartnerID : Customer Unique Identifier

CreditLimit : Maximum amount a customer can borrow at a given time

SONumber : Unique loan identifier

Cleared : Loan Status

True = Loan has been paid
False = Loan is still pending

Overdue : Loan Tenure Status

True = Loan has exceeded its repayment days
False = Loan is still within its repayment days

CreditUsed : Total Amount borrowed

AmountRepaid : Total Loan amount paid back

Balance : CreditUsed - AmountRepaid

Fees : Fees accrued from late repayment

DaysOverdue : Number of days the loan is overdue by

CreatedDate : Date order was placed

InvoiceDate : Date order was delivered
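The dictionary defines Balance as CreditUsed - AmountRepaid. The sketch below is one way to test that definition against the data. It assumes the three columns have already been converted to numbers, as the cleaning step below does. The helper name `balance_mismatches` and its `include_fees` switch are hypothetical. The toy frame reuses the first two rows of the head shown in the general stats output. In row 1, 148 was repaid on a 146 loan that carried a fee of 2, yet its Balance is 0. The literal formula therefore flags that row, and the flag clears once Fees is added to the expected balance.

```python
import pandas as pd

def balance_mismatches(df: pd.DataFrame, include_fees: bool = False, tol: float = 0.5) -> pd.DataFrame:
    """Rows whose Balance disagrees with CreditUsed (+ Fees) - AmountRepaid by more than tol."""
    expected = df["CreditUsed"] - df["AmountRepaid"]
    if include_fees:
        expected = expected + df["Fees"]
    # Comparisons with NaN evaluate to False, so rows with missing values are never flagged
    return df[(df["Balance"] - expected).abs() > tol]

# First two rows of the head shown in the general stats output (PartnerID 36262)
head = pd.DataFrame({
    "CreditUsed":   [1464, 146],
    "AmountRepaid": [1464, 148],
    "Balance":      [0, 0],
    "Fees":         [0.0, 2.0],
})
print(len(balance_mismatches(head)))                     # 1 (row 1 is flagged)
print(len(balance_mismatches(head, include_fees=True)))  # 0
```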

General Data Statistics

Preview general dataset statistics using the datastand package.
Source code: https://github.com/lyraxvincent/datastand/blob/master/datastand/datastand.py


from datastand.datastand import datastand

datastand(data)


General stats:
==================
Size of DataFrame: 30667
Shape of DataFrame: (2359, 13)
Number of unique data types : {dtype('float64'), dtype('O'), dtype('int64'), dtype('<M8[ns]')}
Number of numerical columns: 3
Number of non-numerical columns: 8

Head of DataFrame:
__________________
PartnerID CreditLimit SONumber Cleared Overdue CreditUsed AmountRepaid \
0 36262 26,100 SO11705794 True False 1,464 1,464
1 36262 26,100 SO11705909 True True 146 148
2 36262 26,100 SO11780664 True False 1,650 1,650
3 36262 26,100 SO11833594 True False 8,220 8,220
4 36262 26,100 SO11909592 True False 2,080 2,080

Balance Fees DaysOverdue CreateDate InvoiceDate group

0 0 0.0 0.0 2021-10-15 2021-10-18 Test
1 0 2.0 0.0 2021-10-15 2021-10-18 Test
2 0 0.0 0.0 2021-10-19 2021-10-21 Test
3 0 0.0 0.0 2021-10-22 2021-10-25 Test
4 0 0.0 0.0 2021-10-27 2021-10-29 Test

Tail of DataFrame:
__________________
PartnerID CreditLimit SONumber Cleared Overdue CreditUsed \
2354 1298401 1,669 SO13572455 True True 870
2355 1298401 1,669 SO13572754 True True 220
2356 1298401 1,669 SO13810848 True False 1,344 

Do you wish to long-list missing data statistics?(y/n): y

Column:
SONumber
_______________

Missing data points 94 out of total 2359.

Most occurring value: SO11554320, count: 1
Column:
Cleared
_______________


Missing data points 66 out of total 2359. 

Most occurring value: True, count: 400

Column:
Overdue
_______________

Missing data points 22 out of total 2359.

Most occurring value: False, count: 1351
Column:
CreditUsed
_______________

Missing data points 22 out of total 2359.

Most occurring value: 860, count: 61
Column:
AmountRepaid
_______________

You can visualize missing data automatically right away or you can use the
function plot_missing() after importing it from DataStand. Visualize now?(y/n): y


<datastand.datastand.datastand at 0x7fbc74280a30>

Data Cleaning and Imputation

Fix inconsistent data types
Deal with missing values:
Drop rows that have missing values in almost all columns
Fill other missing data points

# fix data types

def to_int(value):
    # strip thousands separators, e.g. "26,100" -> 26100; NaN values are returned unchanged
    if str(value) != 'nan':
        value = int(''.join(str(value).split(',')))
    return value

def fix_dtypes(df):
    df.CreditLimit = df.CreditLimit.apply(to_int)
    #df.Cleared = df.Cleared.astype(bool)
    #df.Overdue = df.Overdue.astype(bool)
    df.CreditUsed = df.CreditUsed.apply(to_int)
    df.AmountRepaid = df.AmountRepaid.apply(to_int)
    df.Balance = df.Balance.apply(to_int)
    return df

data = fix_dtypes(data)
print(data.dtypes)
print(data.dtypes)

PartnerID int64
CreditLimit int64
SONumber object
Cleared object
Overdue object
CreditUsed float64
AmountRepaid float64
Balance float64
Fees float64
DaysOverdue float64
CreateDate datetime64[ns]
InvoiceDate datetime64[ns]
group object
dtype: object
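The row-by-row `to_int` conversion can also be written in vectorized form. The sketch below strips the commas with the `.str` accessor and parses the result with `pd.to_numeric(..., errors="coerce")`, so missing values come back as NaN. The helper name `strip_thousands` is hypothetical. Another option is to pass `thousands=","` to `pd.read_csv` at load time, which avoids the conversion step altogether.

```python
import numpy as np
import pandas as pd

def strip_thousands(s: pd.Series) -> pd.Series:
    """Convert strings like "26,100" to numbers; unparseable or missing values become NaN."""
    cleaned = s.astype(str).str.replace(",", "", regex=False)
    # Missing values turn into the string "nan" above and are coerced back to NaN here
    return pd.to_numeric(cleaned, errors="coerce")

raw = pd.Series(["26,100", "1,464", np.nan, "146"])
print(strip_thousands(raw).tolist())
```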


Missing values imputation:

First, drop the rows that have missing values in several columns. These are the 22 rows that are missing every value
from the Overdue column through the CreateDate column, as shown in the general stats output above.

data.dropna(subset=['Overdue',
'CreditUsed', 'AmountRepaid', 'Balance', 'Fees', 'DaysOverdue',
'CreateDate'], axis=0, inplace=True)

# reset index
data.reset_index(drop=True, inplace=True)

data.isnull().sum()

PartnerID 0
CreditLimit 0
SONumber 72
Cleared 44
Overdue 0
CreditUsed 0
AmountRepaid 0
Balance 0
Fees 0
DaysOverdue 0
CreateDate 0
InvoiceDate 0
group 0
dtype: int64

data[data.Cleared.isna()].sample(5, random_state=101)

PartnerID CreditLimit SONumber Cleared Overdue CreditUsed Amoun

73 60592 13100 NaN NaN True 1050.0 1053.

1853 370338 3556 NaN NaN True 2910.0 3270.

1794 47288 1975 NaN NaN True 1875.0 1935.

916 363796 16000 NaN NaN True 860.0 869.0

2218 855202 1142 NaN NaN True 445.0 473.0


(data.Fees == 0).sum()  # number of loans with no late fees

1351

Two columns still have missing values. Because SONumber is an identifier column, only the Cleared column
needs to be filled.
We infer Cleared from the CreditUsed and AmountRepaid columns. If the amount repaid equals the credit used
(no fee charged) or exceeds it (a late fee was charged), the customer has repaid the loan.

for idx in tqdm(data[data.Cleared.isna()].index):
    if data.loc[idx, 'AmountRepaid'] >= data.loc[idx, 'CreditUsed']:
        data.loc[idx, 'Cleared'] = True
    else:
        data.loc[idx, 'Cleared'] = False

100%|████████████████████████████████████████████████████████████████████████████████████████████████████
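The loop above can also be written as a single vectorized assignment that applies the same rule: Cleared is True when AmountRepaid >= CreditUsed. The sketch below works on a copy and writes only into the rows where Cleared is missing. The helper name `impute_cleared` is hypothetical, and the toy values are illustrative.

```python
import numpy as np
import pandas as pd

def impute_cleared(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing Cleared values using AmountRepaid >= CreditUsed; known values are kept."""
    out = df.copy()
    missing = out["Cleared"].isna()
    # Compare only the rows that need filling and write the booleans back into them
    out.loc[missing, "Cleared"] = out.loc[missing, "AmountRepaid"] >= out.loc[missing, "CreditUsed"]
    return out

toy = pd.DataFrame({
    "Cleared":      [True, np.nan, np.nan],
    "CreditUsed":   [500.0, 1050.0, 860.0],
    "AmountRepaid": [500.0, 1053.0, 0.0],
})
print(impute_cleared(toy)["Cleared"].tolist())  # [True, True, False]
```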
 

# Check for duplicates

data[data.duplicated(subset=['CreditLimit', 'SONumber', 'Cleared', 'Overdue',
'CreditUsed', 'AmountRepaid', 'Balance', 'Fees', 'DaysOverdue',
'CreateDate', 'InvoiceDate', 'group'])]

PartnerID CreditLimit SONumber Cleared Overdue CreditUsed AmountRep

Feature Engineering
Derive new features from the existing columns:


def feature_eng(df):
    # datetime features
    df['CreateDate_year'] = df.CreateDate.dt.year.astype(int)
    df['CreateDate_month'] = df.CreateDate.dt.month.astype(int)
    df['CreateDate_day'] = df.CreateDate.dt.day.astype(int)
    df['CreateDate_dayname'] = df.CreateDate.dt.day_name()

    df['InvoiceDate_year'] = df.InvoiceDate.dt.year.astype(int)
    df['InvoiceDate_month'] = df.InvoiceDate.dt.month.astype(int)
    df['InvoiceDate_day'] = df.InvoiceDate.dt.day.astype(int)
    df['InvoiceDate_dayname'] = df.InvoiceDate.dt.day_name()

    # time difference in whole days between create date and invoice date
    df['CrtInv_dateDiff'] = (df.InvoiceDate - df.CreateDate).dt.days

    # binary category columns for cleared and overdue columns
    df['cleared_cat'] = df.Cleared.map({True: 'Cleared', False: 'Not Cleared'})
    df['overdue_cat'] = df.Overdue.map({True: 'Overdue', False: 'On time'})

    return df

data = feature_eng(data)
 

data.head()

PartnerID CreditLimit SONumber Cleared Overdue CreditUsed Amount

0 36262 26100 SO11705794 True False 1464.0 1464.0

1 36262 26100 SO11705909 True True 146.0 148.0

2 36262 26100 SO11780664 True False 1650.0 1650.0

3 36262 26100 SO11833594 True False 8220.0 8220.0

4 36262 26100 SO11909592 True False 2080.0 2080.0

Exploratory Data Analysis

Univariate Analysis
Studying selected columns one by one:


# Total number of customers spanned in this dataset

print(f"Total Customers: {data.PartnerID.nunique()}")

Total Customers: 190

# Distribution of Cleared column

print(data.Cleared.value_counts())

# as a percentage
print(data.Cleared.value_counts()*100 / len(data))
plt.figure(figsize=(10,6))
sns.countplot(x=data.Cleared)
plt.title("Distribution of Cleared status")

True 1937
False 400
Name: Cleared, dtype: int64
True 82.884039
False 17.115961
Name: Cleared, dtype: float64

Text(0.5, 1.0, 'Distribution of Cleared status')

Most loans (about 82.9%) have been cleared.
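`value_counts(normalize=True)` returns proportions directly, so there is no need to divide by `len(data)` by hand. The minimal sketch below rebuilds a Series from the Cleared counts reported above (1937 True, 400 False) to show that the two approaches give the same percentages.

```python
import pandas as pd

# Reconstructed from the counts printed above, not re-read from the CSV
cleared = pd.Series([True] * 1937 + [False] * 400, name="Cleared")
shares = cleared.value_counts(normalize=True).mul(100).round(2)
print(shares)
```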


# Distribution of Overdue column

print(data.Overdue.value_counts())

# as a percentage
print(data.Overdue.value_counts()*100 / len(data))
plt.figure(figsize=(10,6))
sns.countplot(x=data.Overdue)
plt.title("Distribution of Overdue status")

False 1351
True 986
Name: Overdue, dtype: int64
False 57.809157
True 42.190843
Name: Overdue, dtype: float64

Text(0.5, 1.0, 'Distribution of Overdue status')

Most loans (about 57.8%) have not exceeded their repayment period.

Top 10 customers by number of loans taken:


loan_times_dict = data.groupby('PartnerID')['SONumber'].count().to_dict()

# sort dictionary
marklist = sorted(loan_times_dict.items(), key=lambda x:x[1], reverse=True)
loan_times_dict = dict(marklist)

for key, val in list(loan_times_dict.items())[:10]:
    print(f"{key} : {val}")
 

388436 : 122
363796 : 116
548447 : 86
437063 : 84
400649 : 78
105975 : 73
303101 : 60
302148 : 54
410793 : 52
340828 : 44
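The sort-a-dict approach above can be replaced by `Series.nlargest`. The sketch below keeps the same count of non-null SONumber values. It assumes every row is one loan; if so, `groupby(...).size()` would also count the 72 loans whose SONumber is missing, which `count()` leaves out. The helper name `top_borrowers` and its `count_all_rows` flag are hypothetical, and the toy data is illustrative.

```python
import pandas as pd

def top_borrowers(df: pd.DataFrame, n: int = 10, count_all_rows: bool = False) -> pd.Series:
    """Loans per PartnerID, largest first."""
    grouped = df.groupby("PartnerID")
    # count() skips missing SONumber values; size() counts every row
    counts = grouped.size() if count_all_rows else grouped["SONumber"].count()
    return counts.nlargest(n)

toy = pd.DataFrame({
    "PartnerID": [1, 1, 1, 2, 2, 3],
    "SONumber":  ["SO1", "SO2", None, "SO4", "SO5", "SO6"],
})
print(top_borrowers(toy, n=2))
print(top_borrowers(toy, n=1, count_all_rows=True))
```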

Customers who still have uncleared loans with an outstanding balance:


defaulting_customers = pd.DataFrame(sorted(data[data.Cleared == False].groupby('PartnerID'

key=lambda x:x[1], reverse=True),
columns=['PartnerID', 'Total_balance'])
defaulting_customers = defaulting_customers[defaulting_customers.Total_balance > 0]
defaulting_customers
 

PartnerID Total_balance

0 398110 27739.0

1 274819 26048.0

2 805373 20759.0

3 105975 19337.0

4 548447 18940.0

... ... ...

103 360373 85.0

104 171681 67.0

105 309779 20.0

106 538014 18.0

107 65627 5.0

108 rows × 2 columns
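The cell above is cut off in this export, so the exact aggregation is not visible. The sketch below is one way to build the same table, under the assumption that Total_balance is the sum of Balance over each customer's uncleared loans. It also counts customers whose total exceeds KShs. 10,000. The helper name `outstanding_by_customer` and the toy values are hypothetical.

```python
import pandas as pd

def outstanding_by_customer(df: pd.DataFrame) -> pd.DataFrame:
    """Assumption: Total_balance = sum of Balance over each customer's uncleared loans."""
    pending = df[df["Cleared"] == False]  # Cleared holds True/False values after imputation
    totals = (pending.groupby("PartnerID", as_index=False)["Balance"].sum()
              .rename(columns={"Balance": "Total_balance"}))
    # keep only customers who still owe something, largest balance first
    totals = totals[totals["Total_balance"] > 0]
    return totals.sort_values("Total_balance", ascending=False).reset_index(drop=True)

toy = pd.DataFrame({
    "PartnerID": [10, 10, 20, 30, 40],
    "Cleared":   [False, False, True, False, False],
    "Balance":   [1500.0, 9000.0, 0.0, 0.0, 200.0],
})
res = outstanding_by_customer(toy)
print(res)
print((res["Total_balance"] > 10000).sum())  # customers owing more than 10,000
```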

defaulting_customers.head(10)

PartnerID Total_balance

0 398110 27739.0

1 274819 26048.0

2 805373 20759.0

3 105975 19337.0

4 548447 18940.0

5 400649 15790.0

6 437063 12970.0

7 668827 11477.0

8 351590 10022.0

9 174216 9802.0


Out of 190 customers, 108 still have uncleared loans with an outstanding balance, and 9 of them owe a total
of more than KShs. 10,000.

Loans that are long overdue (100 days or more):

data[data.DaysOverdue >= 100]

PartnerID CreditLimit SONumber Cleared Overdue CreditUsed Amo

63 58981 11200 SO11566125 False True 540.0 0.0

64 58981 11200 SO11566450 False True 324.0 0.0

288 174216 12700 SO11671068 False True 995.0 31.

646 334599 10500 SO11593867 False True 2626.0 0.0

647 334599 10500 SO11640698 False True 1081.0 0.0

1646 668827 18500 SO11640140 False True 2120.0 324

1718 964932 6200 SO11563593 False True 178.0 0.0

1719 964932 6200 SO11633125 False True 272.0 0.0

1720 964932 6200 SO11633163 False True 106.0 0.0

1721 964932 6200 SO11634061 False True 378.0 0.0

1722 964932 6200 SO11635044 False True 403.0 0.0

1723 964932 6200 SO11635099 False True 255.0 0.0

1724 964932 6200 SO11635347 False True 769.0 0.0

1821 309779 957 SO11667635 False True 220.0 220

1973 519688 1925 SO11664370 False True 89.0 89.

(data.DaysOverdue >= 100).sum()  # 15 loans, matching the rows listed above
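A single 100-day cut-off hides how the other overdue loans are spread out. The sketch below uses `pd.cut` to group DaysOverdue into aging bands. The band edges (0, 1-30, 31-60, 61-90, 91+ days) are an illustrative assumption and do not come from the source. The helper name and the toy values are hypothetical.

```python
import pandas as pd

def overdue_buckets(days: pd.Series) -> pd.Series:
    """Count loans per DaysOverdue band; the band edges are illustrative."""
    bins = [-1, 0, 30, 60, 90, float("inf")]
    labels = ["0", "1-30", "31-60", "61-90", "91+"]
    # right-closed intervals: (-1, 0], (0, 30], (30, 60], (60, 90], (90, inf]
    return pd.cut(days, bins=bins, labels=labels).value_counts().sort_index()

days = pd.Series([0, 0, 12, 45, 75, 100, 130])
print(overdue_buckets(days))
```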

Credit usage over time:


plt.figure(figsize=(15,8))
sns.lineplot(x='CreateDate', y='CreditUsed', data=data.sort_values(by='CreateDate'),
hue='Cleared')
plt.title("Credit usage over time")

Text(0.5, 1.0, 'Credit usage over time')

Investigate the spike in credit that has not yet been cleared:

data[(data.CreateDate > pd.to_datetime('2021-12-01')) & (data.CreateDate < pd.to_datetime(

(data.CreditUsed > 6000) & (data.Cleared == False)]
 

PartnerID CreditLimit SONumber Cleared Overdue CreditUsed Amou

990 384931 14900 SO12926220 False True 13152.0 1322

It has been partly repaid, leaving a small balance of KShs. 127.

Days when most loan orders are created and when they are invoiced:

plt.figure(figsize=(10,6))
sns.countplot(y=data.CreateDate_dayname,
order=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'])
plt.title("Number of Loan orders day-wise(CreateDate)")

Text(0.5, 1.0, 'Number of Loan orders day-wise(CreateDate)')


plt.figure(figsize=(10,6))
sns.countplot(y=data.InvoiceDate_dayname,
order=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'])
plt.title("Number of Invoices day-wise(InvoiceDate)")

Text(0.5, 1.0, 'Number of Invoices day-wise(InvoiceDate)')

Most loan orders are placed on Thursdays and Saturdays, while most invoices are issued on Wednesdays.
Wednesday, Thursday and Saturday are therefore the busiest days of the week.
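The `order` list passed to `countplot` above leaves out Sunday, so any Sunday orders would be silently dropped from the plot. One alternative is an ordered `pd.Categorical` over all seven days. It keeps calendar order and reports days with no orders as 0. The helper name and the toy dates below are hypothetical.

```python
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def orders_per_weekday(dates: pd.Series) -> pd.Series:
    """Orders per weekday in calendar order; days with no orders show up as 0."""
    names = pd.Categorical(dates.dt.day_name(), categories=DAYS, ordered=True)
    return pd.Series(names).value_counts().sort_index()

# 2021-10-15 is a Friday, 10-21 and 10-28 are Thursdays, 10-23 is a Saturday
dates = pd.to_datetime(pd.Series(["2021-10-15", "2021-10-21", "2021-10-23", "2021-10-28"]))
print(orders_per_weekday(dates))
```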

data.CreateDate_year.value_counts()

2021 2072
2022 265
Name: CreateDate_year, dtype: int64

data.CreateDate_month.value_counts()

11 790
10 772
12 510
1 265
Name: CreateDate_month, dtype: int64


data.InvoiceDate_month.value_counts()

11 783
10 700
12 577
1 277
Name: InvoiceDate_month, dtype: int64

The dataset spans four months: October, November and December 2021, and January 2022.
January orders (265) fell to roughly half of December's (510), and the most orders were placed in
November 2021 (790).
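Separate year and month columns can be ambiguous once a series crosses a year boundary, as this one does. Grouping on `dt.to_period("M")` keeps the months in chronological order. The sketch below shows this on toy dates. With the real counts, January over December is 265 / 510, or about 0.52. The helper name is hypothetical.

```python
import pandas as pd

def orders_per_month(dates: pd.Series) -> pd.Series:
    """Order counts per calendar month, in chronological order."""
    return dates.dt.to_period("M").value_counts().sort_index()

# Toy dates standing in for data.CreateDate
dates = pd.to_datetime(pd.Series(["2021-12-03", "2021-12-20", "2022-01-07", "2021-11-11"]))
monthly = orders_per_month(dates)
print(monthly)

jan_vs_dec = monthly[pd.Period("2022-01", freq="M")] / monthly[pd.Period("2021-12", freq="M")]
print(jan_vs_dec)  # 0.5
```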

Difference in days from create date to invoice date:

data.CrtInv_dateDiff.value_counts()

2 1452
3 540
4 266
6 30
7 19
9 16
5 14
Name: CrtInv_dateDiff, dtype: int64


fig, ax = plt.subplots(1,2, figsize=(14,6))

counts = data.CrtInv_dateDiff.value_counts()
ax[0].pie(counts, labels=[f"{d} days" for d in counts.index])
sns.countplot(y=data.CrtInv_dateDiff, ax=ax[1])
fig.suptitle("Difference in days from create date to invoice date")
 

Text(0.5, 0.98, 'Difference in days from create date to invoice date')

Every loan order takes at least 2 days to be invoiced.

In the 16 worst cases, orders took 9 days, and 35 orders in total took a week or longer (19 at 7 days and
16 at 9 days).
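The value counts printed above are enough to summarise the lag without going back to the raw data. The sketch below takes those counts as input. It computes the number and share of orders that took a week or longer, plus the count-weighted mean lag.

```python
import numpy as np
import pandas as pd

# Days from CreateDate to InvoiceDate -> number of orders, copied from the output above
lag_counts = pd.Series({2: 1452, 3: 540, 4: 266, 6: 30, 7: 19, 9: 16, 5: 14}, name="CrtInv_dateDiff")

at_least_a_week = lag_counts[lag_counts.index >= 7].sum()
share = at_least_a_week / lag_counts.sum()
mean_lag = np.average(lag_counts.index.to_numpy(), weights=lag_counts.to_numpy())

print(at_least_a_week, round(share * 100, 2), round(mean_lag, 2))
```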
Bivariate Analysis
Studying relationships between columns:
Boxplots to visualize quartiles and their ranges as well as detect outliers:


cols, rows = 3, 2

fig, axes = plt.subplots(rows, cols, figsize=(16,12))

columns = ['CreditLimit', 'CreditUsed', 'AmountRepaid', 'Balance', 'Fees', 'DaysOverdue']

for ax, col in zip(axes.flat, columns):
    # one boxplot per numerical column, split by cleared status
    sns.boxplot(x='cleared_cat', y=col, data=data, ax=ax)
    ax.set_title(col)

fig.suptitle("Numerical columns in relation to Cleared status")

plt.show()
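Boxplot whiskers follow the 1.5 × IQR rule by default in matplotlib and seaborn. The points drawn beyond the whiskers can be listed directly with the sketch below. The helper name `iqr_outliers` is hypothetical. The sample values are CreditUsed figures that appear in the tables above, and quartiles use pandas' default linear interpolation.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Values beyond the boxplot whiskers: below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]

credit_used = pd.Series([146, 220, 236, 870, 1320, 1344, 1464, 1650, 2080, 8220, 13152])
print(iqr_outliers(credit_used).tolist())  # [8220, 13152]
```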
 

Inspecting correlation of credit limit and credit used:


print(data[['CreditLimit', 'CreditUsed']].corr())
sns.heatmap(data[['CreditLimit', 'CreditUsed']].corr())

CreditLimit CreditUsed
CreditLimit 1.000000 0.360038
CreditUsed 0.360038 1.000000

<AxesSubplot:>

We would normally expect customers with higher credit limits to borrow more. The heatmap shows only a weak
positive correlation (r ≈ 0.36). Larger loans tend to come with higher credit limits, but a higher limit does
not reliably translate into more borrowing.
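`DataFrame.corr()` uses the Pearson coefficient by default. The sketch below spells out that formula and checks it against NumPy. An r of 0.36 means r² ≈ 0.13, so credit limit accounts for only about 13% of the variance in credit used. The correlation above is also computed per loan, not per customer. If a rank-based measure is preferred for these skewed amounts, `method="spearman"` is available on `corr()`. The sample values are taken from rows shown earlier.

```python
import numpy as np

def pearson_r(x, y) -> float:
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

limit = [26100, 26100, 1669, 1669, 13100]
used = [1464, 8220, 870, 220, 1050]
print(round(pearson_r(limit, used), 3))
```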

Further credit limit vs credit used analysis:


sns.jointplot(x='CreditLimit', y='CreditUsed', data=data, kind='reg')

<seaborn.axisgrid.JointGrid at 0x7fbc2e9d1bb0>

The lower the credit limit, the more small loans there are: most borrowers are small-scale.

Finish the EDA with a pairplot, to look for insights across more than two variables at once, and a pandas
profiling report that summarises everything.


sns.pairplot(data, vars=['CreditLimit', 'CreditUsed', 'AmountRepaid', 'Balance', 'Fees', '

 

<seaborn.axisgrid.PairGrid at 0x7fbc2e947d60>

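CreditUsed and Balance show a linear relationship wherever the balance is non-zero. This is largely expected
from the definition Balance = CreditUsed - AmountRepaid: for a loan with nothing repaid yet, the balance
equals the credit used.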

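# pandas-profiling has since been renamed ydata-profiling (from ydata_profiling import ProfileReport)
from pandas_profiling import ProfileReport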


ProfileReport(data)

Summarize dataset: 0%| | 0/5 [00:00<?, ?it/s]

Generate report structure: 0%| | 0/1 [00:00<?, ?it/s]

Render HTML: 0%| | 0/1 [00:00<?, ?it/s]

Overview

Dataset statistics
Number of variables 24

Number of observations 2337

Missing cells 72

Missing cells (%) 0.1%

Duplicate rows 0

Duplicate rows (%) 0.0%

Total size in memory 438.3 KiB

Average record size in memory 192.1 B

Variable types
Numeric 10

Categorical 10


Overall Data Analysis Report

There are 190 total customers in the dataset.
Most loans have been cleared (≈ 82.88%).
Most loans have not exceeded their repayment period (≈ 57.81%).
The customer who has taken the most loans is Partner ID 388436 with 122 loans, followed by Partner ID
363796 with 116.
Out of 190 customers, 108 still have uncleared loans with an outstanding balance, and 9 of them owe a total
of more than KShs. 10,000.
Several loans are long overdue: 15 uncleared loans are at least 100 days (more than 3 months) overdue.
Most loan orders are placed on Thursdays and Saturdays, while most invoices are issued on Wednesdays.
Wednesday, Thursday and Saturday are the busiest days of the week.
The dataset spans four months: October, November and December 2021, and January 2022.
January orders (265) fell to roughly half of December's (510), and the most orders were placed
in November 2021 (790).
Every loan order takes at least 2 days to be invoiced.
In the 16 worst cases, orders took 9 days, and 35 orders in total took a week or longer (7 or 9
days).
There is only a weak positive correlation (≈ 0.36) between credit limit and credit used. Larger loans tend
to come with higher limits, but a higher credit limit does not reliably make a customer borrow more.
The lower the credit limit, the more small loans there are: most borrowers are small-scale.
