LEA: An R Package for Landscape and Ecological
Association Studies
Eric Frichot and Olivier François
Université Grenoble-Alpes,
Centre National de la Recherche Scientifique,
TIMC-IMAG UMR 5525, Grenoble, 38042, France.
Contents
1 Overview
2 Introduction
  2.1 Input files
3 Analysis of population structure and imputation of missing data
  3.1 Principal Component Analysis
  3.2 Inference of individual admixture coefficients using snmf
  3.3 Population differentiation tests using snmf
  3.4 Missing genotype imputation using snmf
4 Ecological association tests using lfmm
1 Overview
LEA is an R package dedicated to landscape genomics and ecological associa-
tion tests (Frichot and Francois, 2015). LEA can run analyses of population
structure and genomewide tests for local adaptation. The package includes
statistical methods for estimating ancestry coefficients from large genotypic
matrices and for evaluating the number of ancestral populations (snmf, pca).
It performs statistical tests using latent factor mixed models for identifying
genetic polymorphisms that exhibit high correlation with environmental gra-
dients (lfmm). LEA is mainly based on optimized C programs that scale
efficiently to very large data sets.
2 Introduction
The goal of this tutorial is to give an overview of the main functionalities
of the R package LEA. It presents the main steps of an analysis, including
1) analysing population structure and preparing a genotypic matrix for
genomewide association studies, and 2) fitting latent factor mixed models
to the data and extracting candidate loci of interest.
As some functions may take a few hours to analyse very large data sets,
the results are written into text files that can be reloaded by LEA after each
batch of runs (called a 'project'). We advise creating a working directory
containing the genotypic data and the environmental covariables when starting
LEA. Note that two files with the same name but different extensions are
assumed to contain the same data in distinct formats.
# creation of a directory for LEA analyses
dir.create("LEA_analyses")
# set the created directory as the working directory
setwd("LEA_analyses")
This tutorial is based on a small dataset consisting of 400 SNPs genotyped
for 50 diploid individuals. The last 50 SNPs are correlated with an
environmental variable and represent the target loci for an association
analysis. Similar artificial data were analyzed in the computer note introducing
the R package LEA (Frichot and Francois, 2015).
library(LEA)
# Creation of the genotypic file: "genotypes.lfmm"
# The data include 400 SNPs for 50 individuals.
data("tutorial")
# Write genotypes in the lfmm format
write.lfmm(tutorial.R, "genotypes.lfmm")
# Write genotypes in the geno format
write.geno(tutorial.R, "genotypes.geno")
# creation of an environment gradient file: gradients.env.
# The .env file contains a single ecological variable
# for each individual.
write.env(tutorial.C, "gradients.env")
Note that the LEA package is designed to handle very large population
genetic data sets. Genomic data are processed using fast C codes wrapped
into the R code. Most LEA functions use character strings containing paths
to input files as arguments.
2.1 Input files
The R package LEA can handle several input file formats for genotypic ma-
trices. More specifically, the package uses the lfmm and geno formats, and
provides functions to convert from other formats such as the ped, vcf, and
ancestrymap formats. The program VCFTOOLS can be very useful for
producing one of those formats (ped is the closest to an lfmm matrix).
The lfmm and geno formats can also be used for coding multiallelic
marker data (e.g., microsatellites). For multiallelic marker data, the
conversion function struct2geno() converts files from the STRUCTURE format
into the geno or lfmm format. LEA can also process allele frequency data if they
are encoded in the lfmm format. In that case, the lfmm function will use
allele counts for populations in its model.
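For users starting from other formats, the conversions can be scripted directly
in R. The short sketch below assumes that the conversion functions vcf2geno()
and ped2lfmm() are available in the installed version of LEA; the input file
names are hypothetical.
# Hypothetical input files; adjust the names to your own data.
# vcf2geno() and ped2lfmm() are assumed to be available in LEA.
vcf2geno("mydata.vcf", "mydata.geno")
ped2lfmm("mydata.ped", "mydata.lfmm")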
Phenotypic traits and ecological predictors must be formatted in the
env format. This format corresponds to a matrix where each variable is
represented as a column (Frichot et al., 2013). It uses the .env extension.
When using ecological data, we often need to decide which variables
should be used among a large number of ecological indicators (e.g., climatic
variables). We suggest that users summarize their data using linear combinations
of those indicators. Performing a principal component analysis of the ecological
variables and using the first principal components as proxies for ecological
gradients linked to selective forces can be useful in this context.
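As an illustration, here is a minimal sketch of this approach using the base R
function prcomp(). The matrix climate.matrix is hypothetical, with one row per
individual and one column per climatic indicator.
# climate.matrix: hypothetical matrix of climatic indicators
# (one row per individual, one column per variable).
pc.env = prcomp(climate.matrix, scale. = TRUE)
# use the first two principal components as ecological proxies
proxies = pc.env$x[, 1:2]
# write them in the env format for later use with lfmm
write.env(proxies, "proxies.env")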
The LEA package can handle missing data in population structure analy-
ses. In association analyses, missing genotypes must be replaced by imputed
values using a missing data imputation method. We encourage users to
impute their missing genotypes by using the function impute(), which is based
on population structure analysis (see the next section). Note that specialized
genotype imputation programs such as BEAGLE, IMPUTE2 or MENDEL-
IMPUTE could provide better imputation results than LEA. Filtering out
rare variants (retaining variants with minor allele frequency greater than
5 percent) and removing regions in strong LD may also result in better
analyses with LEA.
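As an example of such filtering, the sketch below removes loci with minor
allele frequency below 5 percent from a diploid matrix coded 0/1/2 in the lfmm
format. It is applied to the tutorial data for illustration only and assumes no
missing values.
# Compute minor allele frequencies from a 0/1/2 diploid matrix
# (tutorial.R, no missing values assumed).
data("tutorial")
freq = colMeans(tutorial.R) / 2      # reference allele frequency per locus
maf = pmin(freq, 1 - freq)           # minor allele frequency per locus
# keep loci with MAF greater than 5 percent
write.lfmm(tutorial.R[, maf > 0.05], "genotypes_maf05.lfmm")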
3 Analysis of population structure and imputation
of missing data
The R package LEA implements two classical approaches for the estimation
of population genetic structure: principal component analysis (pca) and ad-
mixture analysis (Patterson et al., 2006; Pritchard et al., 2000) using sparse
nonnegative matrix factorization (snmf). The algorithms programmed in
LEA are improved versions of pca and admixture analysis, and they are able
to process large genotypic matrices efficiently.
3.1 Principal Component Analysis
The LEA function pca() computes the scores of a pca for a genotypic matrix,
and returns a screeplot for the eigenvalues of the sample covariance matrix.
Using pca, an object of class pcaProject is created. This object contains a
path to the files storing eigenvectors, eigenvalues and projections.
# run of pca
# Available options: K (the number of PCs), center and scale.
# Create files: genotypes.eigenvalues - eigenvalues,
# genotypes.eigenvectors - eigenvectors,
# genotypes.sdev - standard deviations,
# genotypes.projections - projections,
# Create a pcaProject object: pc.
pc = pca("genotypes.lfmm", scale = TRUE)
The number of "significant" components can be evaluated using graphical
methods based on the screeplot (Figure 1). The knee in the screeplot
indicates that there are around K = 4 major components in the data (≈ 5
genetic clusters). Following Patterson et al. (2006), the tracy.widom function
computes Tracy-Widom tests for each eigenvalue as follows.
# Perfom Tracy-Widom tests on all eigenvalues.
# create file: tuto.tracyWidom - tracy-widom test information.
tw = tracy.widom(pc)
# display p-values for the Tracy-Widom tests (first 5 pcs).
tw$pvalues[1:5]
[1] 8.000e-09 8.000e-09 8.000e-09 1.503e-04 3.152e-02
# plot the percentage of variance explained by each component
plot(tw$percentage)
Figure 1: Screeplot for the percentage of variance explained by each compo-
nent in a PCA of the genetic data. The knee at K = 4 indicates that there
are 5 major genetic clusters in the data.
3.2 Inference of individual admixture coefficients using snmf
LEA includes the R function snmf that estimates individual admixture
coefficients from the genotypic matrix and provides results very close to those
of Bayesian clustering programs (Pritchard et al., 2000; François and Durand, 2010). As-
suming K ancestral populations, the R function snmf provides least-squares
estimates of ancestry proportions (Frichot et al., 2014).
# main options
# K = number of ancestral populations
# entropy = TRUE: computes the cross-entropy criterion,
# CPU = 4 the number of CPUs.
project = NULL
project = snmf("genotypes.geno",
               K = 1:10,
               entropy = TRUE,
               repetitions = 10,
               project = "new")
The snmf function estimates an entropy criterion that evaluates the quality
of fit of the statistical model to the data using a cross-validation technique
(Figure 2). The entropy criterion can help choose the number of ancestral
populations that best explains the genotypic data (Alexander and Lange,
2011; Frichot et al., 2014). Here we have a clear minimum at K = 4,
suggesting 4 genetic clusters. Often, the plot shows a less clear pattern, and
choosing the "knee" is generally a good approach. The number of ancestral
populations is closely linked to the number of principal components that
explain variation in the genomic data. Both numbers can help determine
the number of latent factors to use when correcting for confounding effects
due to population structure in ecological association tests.
# plot cross-entropy criterion for all runs in the snmf project
plot(project, col = "blue", pch = 19, cex = 1.2)
Figure 2: Value of the cross-entropy criterion as a function of the number
of populations in snmf.
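To choose K programmatically rather than visually, one can compare the mean
cross-entropy across repetitions for each value of K. The sketch below relies on
the cross.entropy() accessor that is also used further below.
# mean cross-entropy over the 10 repetitions, for each K
mean.entropy = sapply(1:10, function(k)
    mean(cross.entropy(project, K = k)))
# value of K minimizing the mean cross-entropy
which.min(mean.entropy)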
The next step is to display a barplot for the Q-matrix. In Figure 3, the
Q() function of LEA is called and the output Q-matrix is converted into a
Qmatrix object. The conversion of the Q-matrix into a Qmatrix object is also
useful for running improved graphical functions from other packages such as
tess3r (Caye et al., 2016).
# select the best run for K = 4
best = which.min(cross.entropy(project, K = 4))
my.colors <- c("tomato", "lightblue",
"olivedrab", "gold")
barchart(project, K = 4, run = best,
border = NA, space = 0,
col = my.colors,
xlab = "Individuals",
ylab = "Ancestry proportions",
main = "Ancestry matrix") -> bp
axis(1, at = 1:length(bp$order),
labels = bp$order, las=1,
cex.axis = .3)
Figure 3: Ancestry coefficients obtained from snmf.
3.3 Population differentiation tests using snmf
The most common approaches to detecting outlier loci from the genomic
background focus on extreme values of the fixation index, Fst, across
loci. The snmf() function can compute fixation indices when the population
is genetically continuous, when predefining subpopulations is difficult, and in
the presence of admixed individuals in the sample (Martins et al., 2016). In
the snmf approach, population differentiation statistics are computed from
the ancestry coefficients obtained from an snmf project, and p-values are returned
for all loci. Figure 4 shows an example of outlier analysis with snmf.
# Population differentiation tests
p = snmf.pvalues(project,
entropy = TRUE,
ploidy = 2,
K = 4)
pvalues = p$pvalues
par(mfrow = c(2,1))
hist(pvalues, col = "orange")
plot(-log10(pvalues), pch = 19, col = "blue", cex = .7)
Figure 4: P-values for population differentiation tests.
3.4 Missing genotype imputation using snmf
Missing genotypes are a critical issue in genomewide association studies. Before
running an association study, an important step is to replace the missing
data, represented as '9' in the geno and lfmm files, with imputed values. To
provide an example of missing data imputation, let's start by removing 100
genotypes from the original data. The resulting matrix is saved in the file
genotypeM.lfmm.
# creation of a genotypic matrix with missing genotypes
dat = as.numeric(tutorial.R)
dat[sample(1:length(dat), 100)] <- 9
dat <- matrix(dat, nrow = 50, ncol = 400)
write.lfmm(dat, "genotypeM.lfmm")
## [1] "genotypeM.lfmm"
Next, snmf can be run on the data with missing genotypes as follows.
The genotypic matrix completion is based on estimated ancestry coefficients
and ancestral genotype frequencies.
project.missing = snmf("genotypeM.lfmm", K = 4,
entropy = TRUE, repetitions = 10,
project = "new")
Then the project data can be used to impute the missing data as follows.
# select the run with the lowest cross-entropy value
best = which.min(cross.entropy(project.missing, K = 4))
# Impute the missing genotypes
impute(project.missing, "genotypeM.lfmm",
method = 'mode', K = 4, run = best)
## Missing genotype imputation for K = 4
## Missing genotype imputation for run = 4
## Results are written in the file: genotypeM.lfmm_imputed.lfmm
# Proportion of correct imputation results
dat.imp = read.lfmm("genotypeM.lfmm_imputed.lfmm")
mean( tutorial.R[dat == 9] == dat.imp[dat == 9] )
## [1] 0.74
The results are saved in an output file whose name contains the string
"imputed".
4 Ecological association tests using lfmm
The R package LEA performs genomewide association analysis based on la-
tent factor mixed models using the lfmm function (Frichot et al., 2013). To
recall the model, let G denote the genotypic matrix, storing allele frequencies
for each individual at each locus, and let X denote a set of d ecological pre-
dictors or phenotypic traits. LFMMs consider the genotypic matrix entries
as response variables in a linear regression model
G_{iℓ} = μ_ℓ + β_ℓ^T X_i + U_i^T V_ℓ + ε_{iℓ},    (1)
where μ_ℓ is a locus-specific effect, β_ℓ is a d-dimensional vector of regression
coefficients, U_i contains K latent factors, and V_ℓ contains their corresponding
loadings (i stands for an individual and ℓ for a locus). The residual terms,
ε_{iℓ}, are statistically independent Gaussian variables with mean zero and
variance σ².
In latent factor models, association between predictors and allele fre-
quencies can be tested while estimating unobserved latent factors that model
confounding effects. In principle, the latent factors include levels of popu-
lation structure due to shared demographic history or background genetic
variation. After correction for confounding effects, association between al-
lele frequencies and an ecological predictor at a particular locus is often
interpreted as a signature of natural selection.
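To build intuition for model (1), the toy simulation below generates continuous
responses from one predictor, K latent factors, and Gaussian noise. All dimensions
and effect sizes are hypothetical, and real genotypes are discrete counts rather
than continuous values.
# Toy simulation under model (1): one predictor (d = 1), K latent factors,
# and the locus-specific effect mu set to zero.
set.seed(1)
n = 50; L = 100; K = 3                       # individuals, loci, factors
X = rnorm(n)                                 # ecological predictor
U = matrix(rnorm(n * K), n, K)               # latent factors
V = matrix(rnorm(L * K), L, K)               # factor loadings
beta = c(rep(0, 90), rep(1, 10))             # last 10 loci truly associated
G = outer(X, beta) + U %*% t(V) + matrix(rnorm(n * L, sd = 0.5), n, L)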
Running LFMM. The lfmm program is based on a stochastic (MCMC)
algorithm which does not provide exact results. We recommend using a large
number of cycles (e.g., -i 6000), and the burn-in period should be set to at
least one-half of the total number of cycles (-b 3000). We have noticed that
the program results are sensitive to the run-length parameters when data
sets have relatively small sizes (e.g., a few hundred individuals, a few
thousand loci). We recommend increasing the burn-in period and the
total number of cycles in this situation.
# main options:
# K: (the number of latent factors)
# Runs with K = 6 and 5 repetitions.
project = NULL
project = lfmm("genotypes.lfmm",
"gradients.env",
K = 6,
repetitions = 5,
project = "new")
Deciding the number of latent factors. Choosing an appropriate value
for the number of latent factors in the lfmm call can be based on the analysis
of histograms of test significance values. Ideally, histograms should be flat,
with a peak close to zero.
Since the objective is to control the false discovery rate (FDR) while
keeping reasonable power to reject the null hypothesis, we recommend using
several runs for each value of K and combining the resulting p-values (use
5 to 10 runs, see our script below). Choosing values of K for which the
histograms show the correct shape ensures that the FDR can be controlled
efficiently.
Testing all K values in a large range, from 1 to 20 for example, is generally
unnecessary. A careful analysis of population structure and an estimate of the
number of ancestral populations contributing to the genetic data indicate
the range of values to be explored. For example, if the snmf program estimates
4 ancestral populations, then running lfmm for K = 3-6 often provides
good results.
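A quick way to compare candidate values of K is to inspect the p-value
histograms side by side. The sketch below assumes that lfmm() runs were
performed for each K in 3 to 6 (the project created above only contains runs
for K = 6).
# p-value histograms for several values of K; assumes the project
# contains lfmm runs for each of these K values.
par(mfrow = c(2, 2))
for (k in 3:6) {
    pv = lfmm.pvalues(project, K = k)$pvalues
    hist(pv, col = "lightblue", main = paste("K =", k), xlab = "p-values")
}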
Combining z-scores obtained from multiple runs. We use the Fisher-
Stouffer method to combine z-scores from multiple runs. In practice, we
found that using the median z-scores of 5-10 runs and re-adjusting the p-
values afterwards can increase the power of lfmm tests. This procedure is
implemented in the LEA function lfmm.pvalues().
# compute adjusted p-values
p = lfmm.pvalues(project, K = 6)
pvalues = p$pvalues
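For readers who prefer to perform the combination manually, the sketch below
reproduces the main steps: take the median z-score across runs, estimate a
genomic inflation factor, and recompute p-values. It assumes that the z.scores()
accessor of LEA returns one column of z-scores per run for K = 6.
# Manual median z-score combination (roughly what lfmm.pvalues() does).
zs = z.scores(project, K = 6)                  # one column per run (assumed)
zs.median = apply(zs, MARGIN = 1, median)      # combine runs
# genomic inflation factor
lambda = median(zs.median^2) / qchisq(0.5, df = 1)
# re-adjusted p-values
adjusted.pvalues = pchisq(zs.median^2 / lambda, df = 1, lower.tail = FALSE)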
The results displayed in Figure 5 show that the null hypothesis is correctly
calibrated. The loci exhibiting significant associations are found on the right
of the Manhattan plot.
# GWAS significance test
par(mfrow = c(2,1))
hist(pvalues, col = "lightblue")
plot(-log10(pvalues), pch = 19, col = "blue", cex = .7)
Figure 5: P-values for LFMM tests. The loci showing significant associations
are on the right of the Manhattan plot.
To adjust the p-values for multiple testing, we use the Benjamini-Hochberg
procedure (Benjamini and Hochberg, 1995). We set the expected levels of FDR
to q = 5%, 10%, 15% and 20%, respectively. The lists of candidate loci are
given by the following script. Since the ground truth is known for the simulated
data, we can compare the expected FDR levels to their observed levels, and
compute the power (true positive rate, TPR) of the test.
for (alpha in c(.05, .1, .15, .2)) {
    # expected FDR
    print(paste("Expected FDR:", alpha))
    L = length(pvalues)
    # return a list of candidates with expected FDR alpha.
    # Benjamini-Hochberg's algorithm:
    w = which(sort(pvalues) < alpha * (1:L) / L)
    candidates = order(pvalues)[w]
    # estimated FDR and True Positive Rate
    Lc = length(candidates)
    estimated.FDR = sum(candidates <= 350)/Lc
    print(paste("Observed FDR:",
                round(estimated.FDR, digits = 2)))
    estimated.TPR = sum(candidates > 350)/50
    print(paste("Estimated TPR:",
                round(estimated.TPR, digits = 2)))
}
## [1] "Expected FDR: 0.05"
## [1] "Observed FDR: 0.08"
## [1] "Estimated TPR: 0.7"
## [1] "Expected FDR: 0.1"
## [1] "Observed FDR: 0.12"
## [1] "Estimated TPR: 0.86"
## [1] "Expected FDR: 0.15"
## [1] "Observed FDR: 0.13"
## [1] "Estimated TPR: 0.92"
## [1] "Expected FDR: 0.2"
## [1] "Observed FDR: 0.18"
## [1] "Estimated TPR: 0.92"
References
Alexander DH and Lange K. 2011. Enhancements to the ADMIXTURE
algorithm for individual ancestry estimation. BMC Bioinformatics 12:246.
Benjamini Y and Hochberg Y. 1995. Controlling the false discovery rate: a
practical and powerful approach to multiple testing. J R Stat Soc B Met.
pp. 289–300.
Caye K, Jay F, Michel O and Francois O. 2016. Fast inference of individual
admixture coefficients using geographic data. bioRxiv p. 080291.
François O and Durand E. 2010. Spatially explicit bayesian clustering models
in population genetics. Mol Ecol Resour. 10:773–784.
Frichot E and Francois O. 2015. LEA: an R package for Landscape and
Ecological Association studies. Methods in Ecology and Evolution 6:925–
929.
Frichot E, Mathieu F, Trouillon T, Bouchard G and François O. 2014.
Fast and efficient estimation of individual ancestry coefficients. Genet-
ics 196:973–983.
Frichot E, Schoville SD, Bouchard G and François O. 2013. Testing for
associations between loci and environmental gradients using latent factor
mixed models. Mol Biol Evol. 30:1687–1699.
Martins H, Caye K, Luu K, Blum MGB and Francois O. 2016. Identify-
ing outlier loci in admixed and in continuous populations using ancestral
population differentiation statistics. Molecular Ecology 25:5029–5042.
Patterson N, Price AL and Reich D. 2006. Population structure and eigen-
analysis. PLoS Genet. 2:20.
Pritchard JK, Stephens M and Donnelly P. 2000. Inference of population
structure using multilocus genotype data. Genetics 155:945–959.