Thanks to visit codestin.com
Credit goes to github.com

Skip to content

heech31/SEDA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SEmiparametric Discriminant Analysis (SEDA)

R codes to implement sparse SEmiparametric Discriminant Analysis (SEDA) for high-dimensional zero-inflated data.

Reference:

Chung, H.C., Ni, Y., & Gaynanova, I. Sparse semiparametric discriminant analysis for high-dimensional zero-inflated data. Journal of Machine Learning Research (to appear)

Yoon G., Carroll R.J. & Gaynanova I. (2020). Sparse semiparametric canonical correlation analysis for data of mixed types Biometrika, 107(3), 609-625.

Vandeputte, D., Kathagen, G., D’hoe, K., Vieira-Silva, S., Valles-Colomer, M., Sabino, J., … & Raes, J. (2017). Quantitative microbiome profiling links gut community variation to microbial load. Nature, 551(7681), 507-511.

To run

To implement, (1) clone this repository; (2) open SEDA.Rproj; (3) load packages and function located in the folder ./functions/ as follows

# Set Paths
dataPath   <- "./Data/atac/"
funcPath   <- "./functions/"

# Load packages and source functions
files.sources = list.files(path=funcPath)
extension <- substring(files.sources,nchar(files.sources)-1,nchar(files.sources))
lets.source <- paste(funcPath, files.sources[extension==".R"], sep="")
invisible( mapply(source, lets.source) )

# "checkPD.R" modifies tmvtnorm:::checkSymmetricPositiveDefinite
environment( checkSymmetricPositiveDefiniteCustom ) <- asNamespace("tmvtnorm")
assignInNamespace( "checkSymmetricPositiveDefinite", checkSymmetricPositiveDefiniteCustom, ns="tmvtnorm" )

Simple example

This example assumes that you have your own dataset xTrain, yTrain, xTest, and yTest. In this example, we use

# Training data sample size and dimension
dim(xTrain) 
## [1] 100 300
# Test data sample size and dimension 
dim(xTest)  
## [1] 200 300
# Class sizes (training)
table(yTrain)
## yTrain
##  0  1 
## 51 49
# Class sizes (testing)
table(yTest)
## yTest
##   0   1 
## 102  98

Training

The function kfold.cv_seda trains SEDA classifier using K-fold cross-validation.

# Monte-Carlo sample size for the posterior probability estimation
nmc  <- 100 
# Tuning grid for the intercept (Delta_y, the class label threshold).
delta1_grid <- seq(-1.5, 1.5, l=100)
# Number of tuning grid points for the sparsity parameter (lambda).
ngrid_lambda <- 100
# Number of folds
K <- 5

fit <- kfold.cv_seda(K=K, xTrain,yTrain, delta1_grid, ngrid_lambda = ngrid_lambda, nmc = nmc)

Prediction

Estimation of truncated variables of test data

Condition means are approximated with a Monte-Carlo sample of size nmc. The function zhat_cnd returns latent Gaussian level test data with truncated variables replaced with respective conditional means.

## nmc : MC sample size
## xTrain : training x
## xTest  : test x
## fit$Rx.train: sample correlation matrix based on training x
zhat.test_seda    <- zhat_cnd(nmc = nmc, xTrain = xTrain, xValid = xTest, Rx = fit$Rx.train )$zhat
Prediction of class labels of test data

The function test_seda returns the misclassification error rate comparing predicted class labels and true class label yTest. The function requires (1) tuned classification direction fit$tuned.beta, (2) intercept fit$tuned.delta1, (3) latent Gaussian level test data zhat_test_seda, and (4) class label of test data yTest.

## fit$tuned_beta: tuned classification direction 
## fit$tuned_delta1: tuned intercept
## zhat.test_seda: estimated latent Gaussian level test data, and 
## yTest: test data class label for misclassification rate calculation.
seda.test.error    <-  test_seda(fit$tuned.beta, fit$tuned.delta1, zhat.test_seda, yTest, plot=TRUE )

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages