Datanot

The document outlines a SQL query to analyze customer spending and savings data, connecting multiple tables to derive insights about customer behavior. It includes R code for data processing, model training using decision trees, and prediction evaluation through resampling methods. Additionally, it discusses the implementation of KNN for gender prediction based on financial metrics, emphasizing the importance of proper data handling and model validation.

Uploaded by zeynep

SELECT T2.*, T4.GENDER, T4.MSTAT, T4.CITY, T8.BALANCE
FROM
(SELECT T1.CREDT_CARD_NUM,
SUM(T1.TRANS_AMOUNT) TOT_SPEND,
MIN(T1.TRANS_DATE) FIRST_DATE,
MAX(T1.TRANS_DATE) LAST_DATE
FROM CARD_TRANS T1
WHERE T1.TRANS_TYPE = '-'
GROUP BY T1.CREDT_CARD_NUM) T2,
M_CREDIT_CARDS T3,
M_CUSTOMERS T4,
(SELECT T6.*,T7.BALANCE
FROM
(SELECT T5.CUSTOMER_ID,
MAX(T5.TRANSACTION_DATE) LAST_DATE_SAVE
FROM SAVINGS_TRANS T5
GROUP BY T5.CUSTOMER_ID) T6,
SAVINGS_TRANS T7
WHERE T6.CUSTOMER_ID = T7.CUSTOMER_ID
AND T6.LAST_DATE_SAVE = T7.TRANSACTION_DATE) T8
WHERE T2.CREDT_CARD_NUM = T3.CREDIT_CARD_ID
AND T3.CUSTOMER_ID = T4.CUSTOMER_ID
AND T8.CUSTOMER_ID = T3.CUSTOMER_ID

We need to connect the CARD_TRANS (T1) table to M_CUSTOMERS (T4). Although we do not select any columns from M_CREDIT_CARDS (T3), we need to include it as the bridge between the two tables.

We have two subqueries here:


(SELECT T1.CREDT_CARD_NUM,
SUM(T1.TRANS_AMOUNT) TOT_SPEND,
MIN(T1.TRANS_DATE) FIRST_DATE,
MAX(T1.TRANS_DATE) LAST_DATE
FROM CARD_TRANS T1
WHERE T1.TRANS_TYPE = '-'
GROUP BY T1.CREDT_CARD_NUM) T2
This is the first one: it computes the total spending, the first transaction date, and the last transaction date, because we need them to derive the average daily spending.

(SELECT T5.CUSTOMER_ID,
MAX(T5.TRANSACTION_DATE) LAST_DATE_SAVE
FROM SAVINGS_TRANS T5
GROUP BY T5.CUSTOMER_ID) T6
This is the second one: we constructed a table holding the maximum transaction date per customer, so that we can pick out the row that carries the current balance for each customer.
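The same "latest row per customer" pattern can be sketched in base R with `aggregate()` and `merge()`. The data frame below is a made-up stand-in for SAVINGS_TRANS, not the real table:

```r
# toy stand-in for SAVINGS_TRANS (hypothetical values)
savings <- data.frame(
  CUSTOMER_ID      = c(1, 1, 2, 2),
  TRANSACTION_DATE = as.Date(c("2020-01-01", "2020-03-01",
                               "2020-02-01", "2020-04-01")),
  BALANCE          = c(100, 250, 300, 120)
)
# like T6: maximum transaction date per customer
last.dates <- aggregate(TRANSACTION_DATE ~ CUSTOMER_ID, data = savings, FUN = max)
names(last.dates)[2] <- "LAST_DATE_SAVE"
# like T8: join back on (customer, date) to pick the row with the current balance
t8 <- merge(last.dates, savings,
            by.x = c("CUSTOMER_ID", "LAST_DATE_SAVE"),
            by.y = c("CUSTOMER_ID", "TRANSACTION_DATE"))
t8$BALANCE  # 250 and 120: each customer's balance on their last date
```

This mirrors the SQL logic: an aggregation to find the last date, then a self-join to recover the full row for that date.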

WHERE T6.CUSTOMER_ID = T7.CUSTOMER_ID


AND T6.LAST_DATE_SAVE = T7.TRANSACTION_DATE) T8
WHERE T2.CREDT_CARD_NUM = T3.CREDIT_CARD_ID
AND T3.CUSTOMER_ID = T4.CUSTOMER_ID
AND T8.CUSTOMER_ID = T3.CUSTOMER_ID
This is the last part; these conditions join the tables on their keys and restrict the result to matching rows, preventing a Cartesian product.

R tutorial:
#THIS CODE GETS THE DATA INTO R:
library(odbc)
library(DBI)
con<- DBI::dbConnect(
odbc::odbc(),
Driver = "SQL Server",
Server = "10.1.10.6",
Database = "SaversBankDb",
UID= "student",
PWD= "Khas2020!",
Port= 1433
) #WE HAVE THE CONNECTION

datasetQ1 <- dbGetQuery(con,
"SELECT T3.CUSTOMER_ID,T2.*, T4.GENDER,
T4.MSTAT,T4.CITY,T8.BALANCE
FROM
(SELECT T1.CREDT_CARD_NUM,
SUM(T1.TRANS_AMOUNT) TOT_SPEND,
MIN(T1.TRANS_DATE) FIRST_DATE,
MAX(T1.TRANS_DATE) LAST_DATE
FROM CARD_TRANS T1
WHERE T1.TRANS_TYPE = '-'
GROUP BY T1.CREDT_CARD_NUM) T2,
M_CREDIT_CARDS T3,
M_CUSTOMERS T4,
(SELECT T6.*,T7.BALANCE
FROM
(SELECT T5.CUSTOMER_ID,
MAX(T5.TRANSACTION_DATE) LAST_DATE_SAVE
FROM SAVINGS_TRANS T5
GROUP BY T5.CUSTOMER_ID) T6,
SAVINGS_TRANS T7
WHERE T6.CUSTOMER_ID = T7.CUSTOMER_ID
AND T6.LAST_DATE_SAVE = T7.TRANSACTION_DATE) T8
WHERE T2.CREDT_CARD_NUM = T3.CREDIT_CARD_ID
AND T3.CUSTOMER_ID = T4.CUSTOMER_ID
AND T3.CREDIT_CARD_ID = T4.CREDIT_CARD_ID
AND T8.CUSTOMER_ID = T3.CUSTOMER_ID")

#always check the summary to make sure the data came in correctly

summary(datasetQ1)

#the dates are in character format, as the summary shows, so we need to convert them
datasetQ1$FIRST_DATE <- as.Date(datasetQ1$FIRST_DATE)


datasetQ1$LAST_DATE <- as.Date(datasetQ1$LAST_DATE)
datasetQ1$day_diff <- as.numeric(datasetQ1$LAST_DATE - datasetQ1$FIRST_DATE)
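As a quick check of how the conversion and subtraction behave, a toy example (the dates are made up):

```r
d1 <- as.Date("2020-01-01")  # character string converted to Date
d2 <- as.Date("2020-01-10")
d2 - d1                      # a "difftime" object: Time difference of 9 days
as.numeric(d2 - d1)          # plain number 9, ready for division and arithmetic
```

Wrapping the difference in `as.numeric()` is what makes `day_diff` usable as a denominator in the next step.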

#what we need is to calculate average spending


datasetQ1$AVG_AMOUNT <- datasetQ1$TOT_SPEND / datasetQ1$day_diff

#LAB
#we have average daily spending and we are going to create a class for consumer
datasetQ1$Consumer <- (datasetQ1$AVG_AMOUNT > 150)*1
head(datasetQ1)
#this gives class 1 whenever the average is larger than 150, and 0 otherwise. Now we have the
class variable Consumer, and we will use city, marital status, gender, and savings balance as predictors
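The `(condition)*1` trick turns a logical vector into 0/1 class labels; a toy sketch with made-up values:

```r
avg <- c(90, 210, 150, 400)   # hypothetical average daily spendings
avg > 150                     # logical: FALSE TRUE FALSE TRUE
(avg > 150) * 1               # numeric class labels: 0 1 0 1
```

Multiplying by 1 coerces TRUE/FALSE to 1/0, which is exactly the class encoding rpart needs.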

#as we see, there is a problem in the GENDER column: it stores 1 for male and F for female, so recode
1 as M to prevent problems:
datasetQ1$GENDER[datasetQ1$GENDER=="1"] = "M"

#apply a decision tree model; to be able to do that we need to load the rpart library
library(rpart)

#our dependent variable is Consumer because it is the class variable


m1 <- rpart(Consumer ~ CITY + GENDER + MSTAT + BALANCE , data= datasetQ1)
m1
#read the results:

#1) we have the root node, and

#2) we have a split on CITY (ADN, ANK, ANTEP, ...); 3297 observations with a value of zero. That zero
represents:
[n_i (number of observations from class i)] / [sum(n_i) (all observations in the subset)] =
0/3297 = 0 (a perfectly homogeneous subset)
#so if this is zero, it means there are no consumers in those cities.
#3) In Istanbul, on the other hand, there are 499 observations and the proportion of consumers is
0.9458918
#(this is an estimate of the probability; apparently the model splits the data into two parts using only
the city, so the other variables should be insignificant)

#Question: give a prediction for a new female customer who lives in Istanbul, is married, and has a 5000
balance in her account.
#Just by looking at the previous table you can say that, with probability 0.9458918, she is a consumer
and in class 1 (because she is in Istanbul):

newdata = data.frame(CITY= "IST", GENDER ="F", MSTAT ="M", BALANCE=5000)


predict(m1, newdata=newdata)
#this gives the same result, 0.9458918
#we built a new data frame for the new customer and then generated the prediction

We are also interested in the reliability of predictions: resampling methods to validate our models. How do we do
resampling?
One approach is to divide the dataset into two parts, a training set and a test set. The split is done by taking a
random sample from the dataset: the sampled part becomes the training set and the rest is the test set.
For validation we will use a 50/50 split for now: half for training and half for testing.
How can we take a random sample from our dataset? We can control the row indices of the table. To do
this we need a special function: sample()
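A minimal sketch of sample() for index-based splitting; set.seed() is added here only to make the draw reproducible, it is not part of the lab code:

```r
set.seed(42)                  # fix the RNG so the example is reproducible
idx <- sample(c(1:10), 4)     # 4 distinct indices drawn from 1..10
idx
length(idx)                   # 4
all(idx %in% 1:10)            # TRUE: every index comes from the pool
anyDuplicated(idx) == 0       # TRUE: sampling is without replacement by default
```

Without set.seed(), each run gives a different random subset, which is exactly the behaviour the resampling loop below relies on.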

#RESAMPLING CODE:
c(1:10)
sample(c(1:10), 4)
#we create a vector from 1 to 10 and sample 4 observations from it. If you execute the
same code again you will get different results, because it is a random sampling method.
dim(datasetQ1)
#the function dim() gives you the dimensions of the table: the first element is the number of rows, the
second is the number of columns:

#the row count is 3796, so we will create a vector of 1 to 3796 and take a random sample
#constructing a random vector of row indices for the training set:
#we create a vector from 1 to 3796, take half of it at random, and call it the training set:
index.train <- sample(c(1:dim(datasetQ1)[1]), dim(datasetQ1)[1]/2)

datasetQ1.train <- datasetQ1[index.train,]


datasetQ1.test <- datasetQ1[-index.train,] #the minus sign before index.train removes those
elements, e.g. c(1:4)[-2] removes the 2nd element, leaving 1, 3, 4
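The negative index guarantees that the two parts are disjoint and together cover the whole dataset; a toy check on a made-up 6-row data frame:

```r
df <- data.frame(x = 1:6)                    # hypothetical 6-row dataset
idx <- sample(c(1:nrow(df)), nrow(df)/2)     # half the row indices at random
df.train <- df[idx, , drop = FALSE]
df.test  <- df[-idx, , drop = FALSE]         # "-" drops the sampled rows
nrow(df.train) + nrow(df.test) == nrow(df)   # TRUE: nothing lost
length(intersect(df.train$x, df.test$x)) == 0  # TRUE: no overlap
```

Because each row index goes to exactly one side, no observation is ever used for both training and testing.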

#now we will use training set to train our model and test set for calculating the accuracy:
m2 <- rpart(Consumer ~ CITY + GENDER + MSTAT + BALANCE , data= datasetQ1.train)
m2 #see the results; it is slightly different from m1 but the structure is the same
predictions <- predict(m2, newdata= datasetQ1.test)

datasetQ1.test$Predict = predictions

#The last column is the predicted value; the Consumer column holds our observations. The prediction is
a probability between 0 and 1, but how can we turn this probability into a class prediction? By using a
threshold; let's use 0.8. We do not know whether 0.8 is a good choice; it depends on the costs of false
negatives and false positives.

tau=0.8
datasetQ1.test$Predict.Class <- (predictions>tau)*1
#*1 means if it is larger than tau we are going to call this class 1
#result:
#now we need to build a confusion matrix to do that we should have TP,TN, FP,FN:
#to calculate true positive we need observations that are predicted positive and real observations
that are equal to positive at the same time:
TP = sum((datasetQ1.test$Predict.Class==1)&(datasetQ1.test$Consumer==1))
TN = sum((datasetQ1.test$Predict.Class==0)&(datasetQ1.test$Consumer==0))
FP = sum((datasetQ1.test$Predict.Class==1)&(datasetQ1.test$Consumer==0))
FN = sum((datasetQ1.test$Predict.Class==0)&(datasetQ1.test$Consumer==1))
#we took the sum of them because these are logical vectors; summing counts the TRUE values, which
gives the number of TPs, etc.

confusion.mat <- matrix(c(TN,FP,FN,TP),2,2) #be careful: matrix() fills column by column ->
TN, FP form the first column and FN, TP the second
confusion.mat
#result:
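The four counts can also be cross-tabulated in one call with R's built-in table(); the toy actual/predicted vectors below are made up for illustration:

```r
actual <- c(1, 0, 1, 1, 0, 0)   # hypothetical true classes
pred   <- c(1, 0, 0, 1, 1, 0)   # hypothetical predicted classes
# rows = actual class, columns = predicted class
table(actual, pred)
#       pred
# actual 0 1
#      0 2 1
#      1 1 2
```

The diagonal holds TN and TP, the off-diagonal FP and FN; this is equivalent to the manual matrix() construction above, just with labels attached.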

#precision = TP / (TP + FP); this is the precision value for tau = 0.8


Precision08 = TP / (TP+FP)

#the precision value depends on the random sample we took, so how can we say this is a reliable,
accurate estimate of precision? If it is not, how can we make it reliable? --> Do the sampling
over and over again and then take the average of the precision values. This averages out the
randomness. USE a FOR loop.
#for each replication we calculate the precision value and keep it in a vector, so that later on
we can take the average. The same code inside a for loop:

#initializing precision vectors (NOTE: in class we also calculated tau=0.2 here):

Precision08.vect <- 0
Precision02.vect <- 0
for(r in 1:100)
{
index.train <- sample(c(1:dim(datasetQ1)[1]), dim(datasetQ1)[1]/2)
datasetQ1.train <- datasetQ1[index.train,]
datasetQ1.test <- datasetQ1[-index.train,]
m2 <- rpart(Consumer ~ CITY + GENDER + MSTAT + BALANCE , data= datasetQ1.train)
m2
predictions <- predict(m2, newdata= datasetQ1.test)
datasetQ1.test$Predict = predictions
tau=0.8
datasetQ1.test$Predict.Class <- (predictions>tau)*1
TP = sum((datasetQ1.test$Predict.Class==1)&(datasetQ1.test$Consumer==1))
TN = sum((datasetQ1.test$Predict.Class==0)&(datasetQ1.test$Consumer==0))
FP = sum((datasetQ1.test$Predict.Class==1)&(datasetQ1.test$Consumer==0))
FN = sum((datasetQ1.test$Predict.Class==0)&(datasetQ1.test$Consumer==1))
confusion.mat <- matrix(c(TN,FP,FN,TP),2,2)
Precision08.vect[r] = TP / (TP+FP)

index.train <- sample(c(1:dim(datasetQ1)[1]), dim(datasetQ1)[1]/2)


datasetQ1.train <- datasetQ1[index.train,]
datasetQ1.test <- datasetQ1[-index.train,]
m2 <- rpart(Consumer ~ CITY + GENDER + MSTAT + BALANCE , data= datasetQ1.train)
m2
predictions <- predict(m2, newdata= datasetQ1.test)
datasetQ1.test$Predict = predictions
tau=0.2
datasetQ1.test$Predict.Class <- (predictions>tau)*1
TP = sum((datasetQ1.test$Predict.Class==1)&(datasetQ1.test$Consumer==1))
TN = sum((datasetQ1.test$Predict.Class==0)&(datasetQ1.test$Consumer==0))
FP = sum((datasetQ1.test$Predict.Class==1)&(datasetQ1.test$Consumer==0))
FN = sum((datasetQ1.test$Predict.Class==0)&(datasetQ1.test$Consumer==1))
confusion.mat <- matrix(c(TN,FP,FN,TP),2,2)
Precision02.vect[r] = TP / (TP+FP)
}

#then we will take their mean value (this is more reliable than we calculated for just one value)
mean(Precision02.vect)
mean(Precision08.vect)

NOTE: 2nd question of this lab is cancelled

#knn cross-validation
#QUESTION3
datasetQ3<- dbGetQuery(con,
"SELECT T2.*,T3.GENDER, T4.MONTHLY_INCOME
FROM
(SELECT T1.CUSTOMER_ID,T1.SAVINGS_ACCOUNT,
AVG(T1.BALANCE) AVG_BALANCE,AVG(T1.INVESTMENT) AVG_TRANS
FROM SAVINGS_TRANS T1
GROUP BY T1.CUSTOMER_ID,T1.SAVINGS_ACCOUNT) T2,
M_CUSTOMERS T3,
M_CREDIT_CARDS T4
WHERE T2.CUSTOMER_ID = T3.CUSTOMER_ID
AND T2.SAVINGS_ACCOUNT = T3.SAVING_ACCNT
AND T3.CREDIT_CARD_ID = T4.CREDIT_CARD_ID
AND T3.CUSTOMER_ID = T4.CUSTOMER_ID")
head(datasetQ3)
summary(datasetQ3)
datasetQ3$GENDER[datasetQ3$GENDER=="1"] = "M"
datasetQ3$GENDER.num<- (datasetQ3$GENDER=="M")*1
datasetQ3$MONTHLY_INCOME<- as.numeric(datasetQ3$MONTHLY_INCOME)

#we reorganized our table to make it ready for use and it is now ready for knn
#knn algorithm works with class library
library(class)
?knn

#the knn algorithm checks the k closest observations and chooses the most frequent class as the prediction. R
asks us to provide the k value = the number of neighbours considered.
#suppose we do not have observations for a test set; what we can do is validation. Let's take 80 percent
of the data as the training set and 20 percent as the test set, and make this split at random:
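A minimal sketch of class::knn on made-up 2D points, just to show the calling convention (training matrix, test matrix, class labels, k). All variable names with a .toy suffix are hypothetical and not part of the lab code:

```r
library(class)  # ships with base R as a recommended package

# two well-separated clusters of hypothetical training points
train.toy <- data.frame(x = c(0, 0.2, 5, 5.2), y = c(0, 0.1, 5, 5.1))
cl.toy    <- c(0, 0, 1, 1)               # class labels of the training points
test.toy  <- data.frame(x = c(0.1, 5.1), y = c(0.1, 5.0))

pred.toy <- knn(train.toy, test.toy, cl.toy, k = 1)  # nearest neighbour decides
pred.toy  # factor with levels 0 1: the first test point gets 0, the second gets 1
```

Note that knn() returns a factor, which is why the conversion step discussed later in these notes is needed before comparing predictions numerically.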

index.train <- sample( c( 1:dim(datasetQ3)[1] ), round( dim( datasetQ3 )[1]*0.80 ) )


#this is the vector of all row indices. The 1st element of dim() gives us the number of rows,
which is why we took just the first element of it:
# --> c(1:dim(datasetQ3)[1])

datasetQ3.train <- datasetQ3[index.train,]


datasetQ3.test <- datasetQ3[-index.train,]

#now we can run knn, but first we need to think about which columns should be used as
independent variables. We use the ones with predictive power over our class variable, which is gender in
this case (e.g. customer id is just a random number, but avg balance, avg trans and
monthly income may have predictive power over gender)

head(datasetQ3)

#so we will keep only these 3 columns (avg balance, avg trans and monthly income) as the training set and
provide only these columns to the knn algorithm. THIS IS SOMETHING YOU SHOULD BE
CAREFUL ABOUT IN KNN
train <- datasetQ3.train[,c(3,4,6)] #this assigns just the 3rd, 4th and 6th columns of datasetQ3.train
to the new dataset "train"; same idea for the test set:
test <- datasetQ3.test[,c(3,4,6)]
class <- datasetQ3.train$GENDER.num
pred1 <- knn(train, test, class, k=1)
pred1
#result:

#so far we have seen data formats like numbers, characters, dates, etc. Now we have another
format: factor (levels: 0 1, as shown in the result). It is R's format for categorical variables, and it
is awkward to work with directly, so we will convert it into numbers. To do so, first we change the
factor into character, and then change that into numeric:
datasetQ3.test$pred1 <- as.numeric(as.character(pred1))
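The order of the two conversions matters; a toy factor shows the pitfall (as.numeric applied directly to a factor returns the internal level codes, not the labels):

```r
f <- factor(c("0", "1", "1", "0"))
as.numeric(f)                 # 1 2 2 1 -- internal level codes, NOT what we want
as.numeric(as.character(f))   # 0 1 1 0 -- the actual labels, now as numbers
```

Going through as.character() first recovers the printed labels, which is why the two-step conversion is the safe pattern whenever you see a factor.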

#whenever you see a factor variable you need to make this transformation
#next step is to calculate true positive and true negatives rate.

head(datasetQ3.test)
#result (now we have predictions for each gender for k=1):

#only the last one is a true positive,

#rows 1 to 4 --> 4 false positives
#the 5th one --> a true negative
#to calculate all, you will compare GENDER.num and pred1 columns and count them
TP = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred1 ==1))
TN = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred1 ==0))
FP = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred1 ==1))
FN = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred1 ==0))
confuse.mat <- matrix(c(TN,FP,FN,TP),2,2)
confuse.mat
#result:

#with the confusion matrix we can calculate the precision (true positives over all predicted positives) for k=1:
precision1 = TP/ sum((datasetQ3.test$pred1==1))

#change k value to find the best result for precision:

pred3 <- knn(train, test, class, k=3)


datasetQ3.test$pred3 <- as.numeric(as.character(pred3))
TP = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred3 ==1))
TN = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred3 ==0))
FP = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred3 ==1))
FN = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred3 ==0))
confuse.mat <- matrix(c(TN,FP,FN,TP),2,2)
precision3 = TP/ sum((datasetQ3.test$pred3==1))

pred5 <- knn(train, test, class, k=5)


datasetQ3.test$pred5 <- as.numeric(as.character(pred5))
TP = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred5 ==1))
TN = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred5 ==0))
FP = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred5 ==1))
FN = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred5 ==0))
confuse.mat <- matrix(c(TN,FP,FN,TP),2,2)
precision5 = TP/ sum((datasetQ3.test$pred5==1))

#we calculated precision values for k= 1 , 3 and 5. compare them:


c(precision1,precision3,precision5)
#result (k=3 seems the best, but we should be careful: this result is based on a single random sample.
We need to replicate it a number of times and then determine the best value from the averages):

precision1.vect = precision3.vect = precision5.vect = 0
for( r in 1:100)
{
index.train <- sample( c( 1:dim(datasetQ3)[1] ), round( dim( datasetQ3 )[1]*0.80 ) )
datasetQ3.train <- datasetQ3[index.train,]
datasetQ3.test <- datasetQ3[-index.train,]
train <- datasetQ3.train[,c(3,4,6)]
test <- datasetQ3.test[,c(3,4,6)]
class <- datasetQ3.train$GENDER.num

pred1 <- knn(train, test, class, k=1)


datasetQ3.test$pred1 <- as.numeric(as.character(pred1))
TP = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred1 ==1))
TN = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred1 ==0))
FP = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred1 ==1))
FN = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred1 ==0))
confuse.mat <- matrix(c(TN,FP,FN,TP),2,2)
precision1.vect[r] = TP/ sum((datasetQ3.test$pred1==1))

pred3 <- knn(train, test, class, k=3)


datasetQ3.test$pred3 <- as.numeric(as.character(pred3))
TP = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred3 ==1))
TN = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred3 ==0))
FP = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred3 ==1))
FN = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred3 ==0))
confuse.mat <- matrix(c(TN,FP,FN,TP),2,2)
precision3.vect[r] = TP/ sum((datasetQ3.test$pred3==1))

pred5 <- knn(train, test, class, k=5)


datasetQ3.test$pred5 <- as.numeric(as.character(pred5))
TP = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred5 ==1))
TN = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred5 ==0))
FP = sum((datasetQ3.test$GENDER.num == 0)&(datasetQ3.test$pred5 ==1))
FN = sum((datasetQ3.test$GENDER.num == 1)&(datasetQ3.test$pred5 ==0))
confuse.mat <- matrix(c(TN,FP,FN,TP),2,2)
precision5.vect[r] = TP/ sum((datasetQ3.test$pred5==1))

}

#after putting this code inside a for loop we can take the mean values:
mean(precision1.vect)
mean(precision3.vect)
mean(precision5.vect)

#now the best value has changed. For one sample it was k=3, but over 100 replications the best k value
becomes k=1.
