
PE IV - Practical Machine Learning

in sample error = the error that results from applying your prediction algorithm to the dataset you built it
with
– also known as resubstitution error
– often optimistic (lower than on a new sample), as the model may be tuned to the noise of that sample

out of sample error = the error that results from applying your prediction algorithm to a new dataset
– also known as generalization error
– out of sample error matters most, as it better reflects how the model will perform on unseen data

in sample error < out of sample error (almost always; sketch below)


– the reason is over-fitting: the model is too adapted/optimized to the initial dataset
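
A minimal sketch of this gap, using only base R and a made-up noisy dataset: the same model scores better on the data it was fit to than on fresh data.

# illustrative sketch: in-sample vs. out-of-sample error on hypothetical data
set.seed(1)
df <- data.frame(x = rnorm(200))
df$y <- 2 * df$x + rnorm(200)
inTrain <- sample(1:200, 100)
training <- df[inTrain, ]; testing <- df[-inTrain, ]
fit <- lm(y ~ poly(x, 5), data = training)  # deliberately flexible model
# in-sample (resubstitution) RMSE
sqrt(mean((predict(fit, training) - training$y)^2))
# out-of-sample (generalization) RMSE - typically larger
sqrt(mean((predict(fit, testing) - testing$y)^2))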

when discussing the outcome decided on by the algorithm, positive = identified and negative =
rejected
– True positive = correctly identified (predicted positive when actually positive)
– False positive = incorrectly identified (predicted positive when actually negative)
– True negative = correctly rejected (predicted negative when actually negative)
– False negative = incorrectly rejected (predicted negative when actually positive)

example: medical testing


– True positive = Sick people correctly diagnosed as sick
– False positive = Healthy people incorrectly identified as sick
– True negative = Healthy people correctly identified as healthy
– False negative = Sick people incorrectly identified as healthy
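
From these four counts, the sensitivity and specificity used below follow directly; a quick sketch with hypothetical counts:

# hypothetical counts from a diagnostic test
TP <- 90; FP <- 15; TN <- 885; FN <- 10
sensitivity <- TP / (TP + FN)  # P(test positive | actually sick)
specificity <- TN / (TN + FP)  # P(test negative | actually healthy)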



Receiver Operating Characteristic (ROC) Curves:
a commonly used technique to measure the quality of a prediction algorithm.
predictions for binary classification are often quantitative (e.g. a probability, or a scale of 1 to 10)
different classification cutoffs/thresholds (e.g. > 0.8 → one outcome) yield different results/predictions
ROC curves are generated to compare the outcomes across all cutoffs

x-axis = 1 − specificity (i.e., probability of a false positive)


y-axis = sensitivity (i.e., probability of a true positive)
each plotted point = one cutoff
area under the curve (AUC) quantifies whether the prediction model is viable (0.5 = random guessing, 1.0 = perfect)
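
A minimal sketch using the pROC package (one common choice; an assumption, since the notes don't name a package) with made-up labels and predicted probabilities:

library(pROC)
labels <- c(0, 0, 1, 1, 0, 1, 1, 0, 1, 1)  # actual binary outcomes
probs <- c(0.1, 0.3, 0.7, 0.8, 0.2, 0.9, 0.6, 0.4, 0.85, 0.75)  # model scores
roc_obj <- roc(labels, probs)  # sweeps over all possible cutoffs
plot(roc_obj)                  # the ROC curve
auc(roc_obj)                   # area under the curve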

Cross Validation:
Random subsampling: a randomly sampled test set is subsetted out of the original training set; the predictor
is built on the remaining training data and applied to the test set
K-fold method: break the data into k subsets (folds), build a model on each set of k − 1 folds, test on the
held-out fold, and average the results. larger k = less bias, more variance; smaller k = more bias, less variance

library(caret)
k <- 5 # number of folds for cross-validation


folds <- createFolds(data$Species, k = k, list = TRUE, returnTrain = FALSE)

# Explanation of parameters:
# data$Species: the outcome variable or grouping factor used for stratified sampling
# k: number of folds for cross-validation
# list = TRUE: returns a list of indices; each list element represents one fold
# returnTrain = FALSE: only the indices for the test/validation sets are returned

Leave one out: hold out exactly one sample, train the model on all the other samples, and test on the single
sample left out; repeat so that every sample is held out once.
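
In caret, leave-one-out can be requested through trainControl; a sketch assuming the built-in iris data:

library(caret)
ctrl <- trainControl(method = "LOOCV")  # leave-one-out cross-validation
fit <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)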

Data Preprocessing: (Caret Package)


createDataPartition - splits the data into training and test partitions

library(caret)
createDataPartition(y=data$var, times=1, p=0.75, list=FALSE)

createFolds - splits the data into multiple folds

library(caret)
createFolds(y=data$var, k=10, list=TRUE, returnTrain=TRUE)

createResample - creates multiple bootstrap samples from the data

library(caret)
resamples <- createResample(y=spam$type,times=10,list=TRUE)



createTimeSlices - creates time slices using a training window (initialWindow) and a prediction horizon (horizon)

library(caret)
tme <- 1:1000
# create time slices
folds <- createTimeSlices(y=tme,initialWindow=20,horizon=10)

Training Options:
train() default method is "rf" (random forest)

train(y ~ x, data=df, method="glm")


# train() applies the chosen machine learning algorithm to construct a model from the training data

trainControl() creates an object that sets many options for how the model will be trained

method = "boot" → bootstrapping (drawing with replacement)

Plotting Predictors:

featurePlot(x=preds, y=outcomes, plot="pairs")


# plots pairwise relationships between the predictors and the outcome
qplot(age, wage, color=education, data=training)
# qplot can also color points by a factor variable
cut2(variable, g=3)
# from the Hmisc package: creates a new factor variable by cutting the specified variable into g groups (3 here) based on percentiles

Centering and scaling:

train(y~x, data=training, preProcess = c("center","scale"))


# other preProcess methods include BoxCox and knnImpute

Creating dummy variables by converting factor variables

inTrain <- createDataPartition(y=Wage$wage,p=0.7, list=FALSE)


training <- Wage[inTrain,]; testing <- Wage[-inTrain,]
# create a dummy variable object
dummies <- dummyVars(wage ~ jobclass,data=training)
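
The dummies object is then applied with predict() to generate the 0/1 indicator columns (sketch):

# convert the factor levels of jobclass into indicator (dummy) columns
head(predict(dummies, newdata = training))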

Removing near-zero covariates, since they have little variability and can hurt the model

nearZeroVar(training, saveMetrics = TRUE)


# returns variability statistics (frequency ratio, percent unique) for each variable

Creating splines with the splines package: bs() and poly() functions



bsBasis <- bs(training$age, df=3) # polynomial spline basis (3 degrees of freedom) for age
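
A sketch of using the basis in a model, fitting wage as a smooth function of age (assuming the Wage training set created above):

# fit wage on the spline basis of age
lm1 <- lm(wage ~ bsBasis, data = training)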

Multicore parallel processing:

doMC::registerDoMC(cores=4)
# registers 4 cores for data-intensive model training (doMC package)

PCA is used to reduce the number of predictors (data compression) while capturing most of the information; most useful for linear-type models

pr <- prcomp(data)
# performs PCA on the data
# note: expect RMSE on the test set to exceed RMSE on the training set

# Generate sample data


set.seed(123)
data <- matrix(rnorm(100), ncol = 5) # Creating a random data matrix

# Perform PCA
pca_result <- prcomp(data, scale. = TRUE) # scale. = TRUE scales the variables to have unit variance

# Standard deviation of each principal component


sd_pca <- pca_result$sdev

# Eigenvectors (loadings) of each principal component


eigenvectors <- pca_result$rotation

# Percentage of variance explained by each principal component


percentage_var <- (sd_pca^2) / sum(sd_pca^2) * 100

# create train and test sets (spam data from the kernlab package)


library(caret); library(kernlab); data(spam)
inTrain <- createDataPartition(y=spam$type, p=0.75, list=FALSE)
training <- spam[inTrain,]
testing <- spam[-inTrain,]
# create preprocess object
preProc <- preProcess(log10(training[,-58]+1),method="pca",pcaComp=2)
# calculate PCs for training data
trainPC <- predict(preProc,log10(training[,-58]+1))
# run the model on the outcome and principal components
modelFit <- train(training$type ~ .,method="glm",data=trainPC)
# calculate PCs for test data
testPC <- predict(preProc,log10(testing[,-58]+1))
# compare results
confusionMatrix(testing$type,predict(modelFit,testPC))

Prediction with Trees: iteratively split variables into groups, producing a nonlinear model

Measures of impurity (p̂mk = proportion of class k in leaf m; sketch below):
– Misclassification error: 1 − max(p̂mk); 0 = perfect purity, 0.5 = no purity
– Gini index: 1 − Σk p̂mk²; 0 = perfect purity, 0.5 = no purity



– Deviance: −Σk p̂mk loge(p̂mk); 0 = perfect purity (natural log)
– Information gain: −Σk p̂mk log2(p̂mk); 0 = perfect purity, 1 = no purity (base-2 log)
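
A sketch computing all four measures for a hypothetical leaf with class proportions p:

# hypothetical class proportions within one leaf
p <- c(0.1, 0.9)
misclass <- 1 - max(p)         # misclassification error
gini     <- 1 - sum(p^2)       # Gini index
dev      <- -sum(p * log(p))   # deviance (natural log)
info     <- -sum(p * log2(p))  # information gain (base-2 log)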

# Constructing a tree
train(y ~ ., data=train, method="rpart")

Bagging: bootstrap aggregating - resample the training dataset with replacement and refit the model


Average the predictions together or take a majority vote... caret bagging algorithms: bagEarth, treebag, bagFDA

bag(predictors, outcome, B=10, bagControl=bagControl(fit=ctreeBag$fit, predict=ctreeBag$pred, aggregate=ctreeBag$aggregate))

Random forest: an extension of bagging applied to classification trees


Process: take bootstrap samples, split on (and bootstrap) the variables at each node, grow multiple trees and vote
drawbacks: overfitting, slow, hard to interpret

# Constructing trees with the random forest method


rf <- train(outcome ~ ., data=train, method="rf", ntree=500)
# ntree=500 specifies the number of trees to construct

Boosting: (gradient boosting) take a group of weak predictors, weight them, and add them up to make a stronger
predictor... method="gbm" for boosting with trees

# Boosting
gbm <- train(outcome ~ variables, method="gbm", data=train, verbose=FALSE)

Model-based prediction: uses Bayes' theorem and makes certain distributional assumptions

linear discriminant analysis: assumes the same covariance matrix for every class


method = "lda"

lda <- train(Species ~ .,data=training,method="lda")


# predict test outcomes using LDA model
pred.lda <- predict(lda,testing)

quadratic discriminant analysis: assumes a different covariance matrix for each class

Naive Bayes: assumes the predictor variables are independent, so the class probability is proportional to the numerator of Bayes' theorem


method = "nb"

nb <- train(Species ~ ., data=training,method="nb")


# predict test outcomes using naive Bayes model
pred.nb <- predict(nb,testing)



Model Selection:
1) More predictors → less prediction error on the training set
2) More predictors → error decreases and then increases on the test set
3) Avoid overfitting and minimize the expected test error

How to?
1) Split the samples 60-20-20 into training, test, and validation sets (sketch below)
2) Reduce the expected prediction error
Problems: limited data, high computation/complexity
goal of a prediction model = minimize the overall expected prediction error
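
A sketch of the 60-20-20 split using createDataPartition twice (df and outcome y are placeholders):

library(caret)
inBuild <- createDataPartition(y = df$y, p = 0.6, list = FALSE)  # 60% training
training <- df[inBuild, ]
rest <- df[-inBuild, ]
inTest <- createDataPartition(y = rest$y, p = 0.5, list = FALSE)  # split remaining 40% in half
testing <- rest[inTest, ]
validation <- rest[-inTest, ]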

Regularized Regression Concept:


Shrink (regularize) the large coefficients: increases bias, decreases variance, can give less prediction error

Penalized residual sum of squares (PRSS):


PRSS(β) = Σi (yi − Σj xij βj)² + λ Σj βj²  (squared prediction error + penalty on coefficient size)
Higher PRSS = worse model under the penalty; λ is the tuning parameter

# as λ → 0, the result approaches the least-squares solution
# as λ → ∞, all of the coefficients receive large penalties and the ridge
# coefficients β̂ridge collectively approach zero

# Ridge regression (lm.ridge is in the MASS package)
library(MASS)
ridge <- lm.ridge(outcome ~ predictors, data=training, lambda=5)

Lasso regression:
similar to ridge regression,
controls the size of the coefficients and the amount of regularization
large values of λ will set some coefficients exactly equal to zero (built-in variable selection)

# lars package
library(lars)
lasso <- lars(as.matrix(x), y, type="lasso", trace=TRUE)

Combining Predictors:
combine classifiers by majority vote or by averaging (see the sketch below);
improves accuracy but reduces interpretability and increases computation
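
A sketch of stacking two models (assumptions: caret is loaded, and training/testing sets with outcome y already exist):

mod1 <- train(y ~ ., data = training, method = "glm")  # model 1
mod2 <- train(y ~ ., data = training, method = "rf")   # model 2
pred1 <- predict(mod1, testing); pred2 <- predict(mod2, testing)
# fit a combiner model on the two sets of predictions
predDF <- data.frame(pred1, pred2, y = testing$y)
combMod <- train(y ~ ., data = predDF, method = "gam")
combPred <- predict(combMod, predDF)  # combined prediction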

Forecasting:
predicting values using time series data;
subsampling is hard because the data depend on time;
specific patterns: trend, seasonal patterns, cycles
prediction through moving averages: SMA (simple) and EMA (exponential)



# forecast package
library(forecast)
ma(ts_data, order=3)
# calculates the simple moving average for the order specified
# order=3 = order of the moving average smoother, effectively the number of values
# used to calculate each moving average
ets(train, model="MMM")
# fits an exponential smoothing model to the training data
# model="MMM" = the method used for the exponential smoothing

Unsupervised prediction:
used when the classifications/labels are unknown in advance.
create clusters, label them, build a model to predict the cluster labels, predict clusters for new data
use kmeans()

kmeans(data, centers=3)
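
A sketch of the full cluster-then-predict workflow (assuming iris-style training/testing sets where the true label Species is ignored during clustering):

library(caret)
kms <- kmeans(subset(training, select = -Species), centers = 3)  # cluster without labels
training$clusters <- as.factor(kms$cluster)                      # label the clusters
# build a model that predicts cluster membership from the features
modFit <- train(clusters ~ ., data = subset(training, select = -Species), method = "rpart")
predict(modFit, testing)  # predict cluster labels on new data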

