Lecture Intro To Stat in R
- Rémy Beugnon
  https://remybeugnon.netlify.app
  @BeugnonRemy
- Christian Ristok
  @ChristianRistok
- Malte Jochum
  http://maltejochum.de/
  @MalteJochum
Summary
In this lecture:
2. Application
4. Conclusion
[Figure: response vs. explanatory variable]
Steps to analyze your data
Check your data structure
1. What are your variables?
   i. What is your response variable?
   ii. What is your explanatory variable?
2. How are your data distributed?
3. How do you expect your response variable to be distributed?
YES → Extract your results
2. Application
How to do this in RStudio
You need:
- RStudio
- ggplot2 for plotting (join Steph's course for more details)
- A dataset to analyze
Example: tree diversity effect on litterfall and decomposition
[Figure panels: litterfall biomass (g/m²) vs. litter species richness; C loss (%) and N loss (%) vs. litter species richness; C loss (%) and N loss (%) vs. litterfall]
Example: tree diversity effect on litterfall abundance
[Figure: litterfall abundance (g/m²)]
Check your data structure
[Figure: dataset overview showing the variables Litterfall, neigh.sp.rich and TSP]
Inspect the structure of the data frame (column names, types and first values):
str(df)
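A minimal sketch of this first check, assuming a small hypothetical data frame: the column names litterfall and neigh.sp.rich come from the lecture, but the plot column and every value are invented for illustration. summary() is added as a complement to str() because it shows each column's range, which makes impossible entries easy to spot.

```r
# Hypothetical example data: column names follow the lecture,
# all values are invented (one impossible value in each numeric column)
df <- data.frame(
  plot          = factor(paste0("P", 1:6)),
  neigh.sp.rich = c(1L, 2L, 4L, 8L, 12L, 16L),          # 16 is outside the design
  litterfall    = c(55.2, 80.1, 120.4, 160.8, -3.0, 175.5) # -3 g/m2 is impossible
)

# Column names, types (factor, int, num) and the first values of each column
str(df)

# Min, quartiles, max (and NA counts) per column: suspicious ranges stand out
summary(df)
```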
DANGER ZONE
Your data do not have to be normally distributed; your residuals should be!
Take people's height as an example ("drink your soup and you will grow up!"):
[Figure: distribution of people's heights, labelled NOT NORMAL]
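The height example can be sketched with simulated data, assuming (hypothetically) a population made of two age groups, children and adults, with invented means and spreads. Pooled heights are bimodal and fail a normality test, while the residuals of a model that includes the age group are normal by construction. This is an illustration of the rule, not the lecture's own example data.

```r
# Hypothetical simulation: two age groups with different mean heights (cm)
set.seed(1)
n <- 200
age_group <- factor(rep(c("child", "adult"), each = n))
height <- c(rnorm(n, mean = 120, sd = 8),   # children
            rnorm(n, mean = 175, sd = 8))   # adults

# Raw response: two well-separated groups give a bimodal, non-normal distribution
raw_test <- shapiro.test(height)

# Model the response with its explanatory variable, then test the residuals
mod <- lm(height ~ age_group)
res_test <- shapiro.test(residuals(mod))

c(raw_p = raw_test$p.value, residual_p = res_test$p.value)
```

The normality check belongs on residuals(mod), not on the raw column.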
Look for outliers and impossible values with a boxplot:
boxplot(df$litterfall)
[Figure: boxplots of Litterfall and neigh.sp.rich]
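boxplot() draws as individual points the values lying more than 1.5 × IQR beyond the box; boxplot.stats() returns those same values so they can be listed instead of read off the plot. A minimal sketch on a hypothetical vector of litterfall values; a flagged point is only a candidate outlier and still needs a biological or data-entry reason before removal.

```r
# Hypothetical litterfall values (g/m2) with one suspicious entry
litterfall <- c(110, 95, 130, 120, 105, 980)

# Same 1.5 x IQR rule that boxplot() uses to draw outlying points
bp <- boxplot.stats(litterfall)
bp$out                             # values flagged as outliers
which(litterfall %in% bp$out)      # their positions in the vector
```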
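Display the rows with implausible litterfall values (the condition on rows goes before the comma; leaving the part after the comma empty keeps all columns):
df[df$litterfall < 0 | df$litterfall > 500, ]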
Do the same for neighbourhood species richness (values outside 1 to 12):
df[df$neigh.sp.rich < 1 | df$neigh.sp.rich > 12, ]
Remove these rows from the dataset:
df = df[!(df$neigh.sp.rich < 1 | df$neigh.sp.rich > 12), ]
df = df[!(df$litterfall < 0 | df$litterfall > 500), ]
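One pitfall of bracket filtering: when a column contains NA, the condition is NA for that row and df[condition, ] returns a row filled with NA instead of dropping it. subset() treats an NA condition as FALSE. A minimal sketch on a hypothetical data frame, keeping the lecture's thresholds (0 to 500 g/m² for litterfall, 1 to 12 species); note that with subset() rows with missing values are dropped as well, which may or may not be what you want.

```r
# Hypothetical data: one impossible value per column and one missing value
df <- data.frame(
  neigh.sp.rich = c(1, 2, 4, 16, 8, NA),
  litterfall    = c(60, -5, 120, 150, 90, 110)
)

# Bracket indexing: the NA in neigh.sp.rich produces an all-NA row
bad <- df[!(df$neigh.sp.rich < 1 | df$neigh.sp.rich > 12), ]
nrow(bad)

# subset() keeps only rows where the whole condition is TRUE (NA counts as FALSE)
clean <- subset(df,
                litterfall >= 0 & litterfall <= 500 &
                neigh.sp.rich >= 1 & neigh.sp.rich <= 12)
clean
```

Wrapping the condition in which(), as in df[which(condition), ], avoids the NA row in the same way.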
[Figure: Litterfall and neigh.sp.rich after removing these rows]
Build your hypothesis
[Figure: litterfall abundance (g/m²)]
$H_0: \alpha = 0$, i.e. $\text{litterfall} = \mu + \varepsilon$
$\varepsilon \sim \mathcal{N}(0, \sigma)$
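Written out for one explanatory variable $x$ (for example neighbourhood species richness), the model and both hypotheses read as below. The observation index $i$ and the alternative hypothesis $H_1$ are added here for completeness; they are an assumption about the slide's full model, not text from it.

```latex
\begin{aligned}
\text{litterfall}_i &= \mu + \alpha\, x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma) \\
H_0 &: \alpha = 0 \;\Rightarrow\; \text{litterfall}_i = \mu + \varepsilon_i \\
H_1 &: \alpha \neq 0
\end{aligned}
```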
Assumptions of the model:
i. Independence
…
v. Linearity
Build your model in R
Formula: y ~ x (response ~ explanatory variable), passed to lm(), e.g. mod = lm(y ~ x, data = df)
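A minimal sketch of fitting the model with lm(), assuming hypothetical simulated data (not the lecture's dataset) in which litterfall increases with the log of neighbourhood species richness. The log version is included because the results later in the lecture are expressed per log(#species); transformations can be written directly inside the formula.

```r
# Hypothetical simulated data: litterfall grows with log(species richness) plus noise
set.seed(123)
n  <- 120
df <- data.frame(neigh.sp.rich = sample(c(1, 2, 4, 8, 12), n, replace = TRUE))
df$litterfall <- 50 + 25 * log(df$neigh.sp.rich) + rnorm(n, sd = 20)

# Formula syntax: response ~ explanatory variable
mod <- lm(litterfall ~ neigh.sp.rich, data = df)

# Same response, log-transformed explanatory variable inside the formula
mod_log <- lm(litterfall ~ log(neigh.sp.rich), data = df)

coef(mod)
coef(mod_log)
```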
Check the model fit
Check the model quality and the assumptions with the performance package:
library(performance)
check_model(mod)
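If the performance package is not available, base R gives comparable checks. A sketch under stated assumptions: the data are hypothetical and simulated, and the Cook's distance cut-off of 4/n is a common rule of thumb rather than a strict rule. plot() on an lm object draws residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage by default.

```r
# Hypothetical simulated data and model (not the lecture dataset)
set.seed(123)
n  <- 120
df <- data.frame(neigh.sp.rich = sample(c(1, 2, 4, 8, 12), n, replace = TRUE))
df$litterfall <- 50 + 25 * log(df$neigh.sp.rich) + rnorm(n, sd = 20)
mod <- lm(litterfall ~ log(neigh.sp.rich), data = df)

# Default diagnostic plots of plot.lm():
# residuals vs fitted (linearity, constant variance), normal Q-Q (normality),
# scale-location (constant variance), residuals vs leverage (influential points)
op <- par(mfrow = c(2, 2))
plot(mod)
par(op)

# Numeric complement: Shapiro-Wilk test on the residuals, not on the raw data
sw <- shapiro.test(residuals(mod))
sw$p.value

# Cook's distance: observations with a large influence on the fitted line
cd <- cooks.distance(mod)
which(cd > 4 / n)
```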
63
Check the model fit
Check the model quality and the assumptions: the performance package
check_model(mod)
64
Check the model fit
Check the model quality and the assumptions: the performance package
check_model(mod)
65
Check the model fit
Check the model quality and the assumptions: the performance package
check_model(mod)
66
Check the model fit
Check the model quality and the assumptions: the performance package
check_model(mod)
67
Check the model fit
Check the model quality and the assumptions: the performance package
check_model(mod)
68
Check the model fit
Check the model quality and the assumptions: the performance package
check_model(mod)
69
Data transformation and outliers
Barry et al. 2019
Model performance indices:
- Sigma: residual standard error
- AIC: fit quality penalized by the number of parameters (lower is better)
- BIC: fit quality penalized by the number of parameters and the sample size (lower is better)
- RMSE: root mean square error, i.e. the typical size of the residuals
- R²: fit quality, the proportion of variance explained
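These indices can be computed in base R to compare a model with an untransformed and a log-transformed explanatory variable. A minimal sketch on hypothetical simulated data; the helper fit_metrics() is an illustrative name, not a library function (the performance package's model_performance() reports similar indices). AIC and BIC comparisons are only meaningful between models fitted to the same response, so transforming the response itself breaks the comparison.

```r
# Hypothetical simulated data (not the lecture dataset)
set.seed(123)
n  <- 120
df <- data.frame(neigh.sp.rich = sample(c(1, 2, 4, 8, 12), n, replace = TRUE))
df$litterfall <- 50 + 25 * log(df$neigh.sp.rich) + rnorm(n, sd = 20)

# Same response, explanatory variable untransformed vs log-transformed
mod_raw <- lm(litterfall ~ neigh.sp.rich, data = df)
mod_log <- lm(litterfall ~ log(neigh.sp.rich), data = df)

# Hypothetical helper collecting the indices listed above
fit_metrics <- function(m) {
  res <- residuals(m)
  c(AIC   = AIC(m),                 # penalized by the number of parameters
    BIC   = BIC(m),                 # penalty also grows with sample size
    R2    = summary(m)$r.squared,   # proportion of variance explained
    RMSE  = sqrt(mean(res^2)),      # root mean square error (divides by n)
    Sigma = sigma(m))               # residual standard error (divides by n - p)
}

metrics <- rbind(raw = fit_metrics(mod_raw), log = fit_metrics(mod_log))
round(metrics, 3)
```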
YES → Extract your results
Extract your results
summary(mod)
Intercept (mean litterfall when log(#species) = 0, i.e. one species) = 50.852 ± 18.304 g/m² (Estimate ± 1.96 × SE)
Effect of species richness = 53.960 ± 15.958 g/m² per log(#species)
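The Estimate ± 1.96 × SE intervals can be computed from the coefficient table returned by summary(). A minimal sketch on hypothetical simulated data, compared with confint(), which uses the t distribution instead of 1.96 and therefore gives slightly wider intervals for small samples.

```r
# Hypothetical simulated data (not the lecture dataset)
set.seed(123)
n  <- 120
df <- data.frame(neigh.sp.rich = sample(c(1, 2, 4, 8, 12), n, replace = TRUE))
df$litterfall <- 50 + 25 * log(df$neigh.sp.rich) + rnorm(n, sd = 20)
mod <- lm(litterfall ~ log(neigh.sp.rich), data = df)

# Coefficient table of summary(): Estimate, Std. Error, t value, Pr(>|t|)
tab <- coef(summary(mod))
est <- tab[, "Estimate"]
se  <- tab[, "Std. Error"]

# Approximate 95% interval: Estimate +/- 1.96 x SE
approx_ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)

# t-based 95% interval
exact_ci <- confint(mod, level = 0.95)

approx_ci
exact_ci
```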
With a categorical explanatory variable (species A, B, C, D):
mod = lm(formula = litterfall ~ species, data = df)
summary(mod)
Each species other than the reference level A is coded as a dummy variable that is 0 or 1.
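The dummy coding can be made visible with model.matrix(). A minimal sketch with a hypothetical species factor (levels A to D) and invented litterfall values: with R's default treatment contrasts, A is the reference level, the intercept equals the mean of A, and each other coefficient equals that species' mean minus the mean of A.

```r
# Hypothetical data: four species, two observations each (invented values)
df <- data.frame(
  species    = factor(c("A", "A", "B", "B", "C", "C", "D", "D")),
  litterfall = c(100, 110, 130, 140, 90, 96, 150, 160)
)

# Design matrix used by lm(): intercept + one 0/1 column per non-reference level
X <- model.matrix(~ species, data = df)
X

# Coefficients vs group means: intercept = mean of A, others = difference to A
mod <- lm(litterfall ~ species, data = df)
group_means <- tapply(df$litterfall, df$species, mean)
coef(mod)
group_means - group_means["A"]
```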
$\alpha_A$
$\alpha_B - \alpha_A$
$\alpha_C - \alpha_A$
$\alpha_D - \alpha_A$
To test the differences between all pairs of factor levels, run an ANOVA followed by a Tukey post-hoc test.
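A minimal sketch of the ANOVA plus Tukey workflow in base R, on hypothetical simulated data with four species and invented mean litterfall values. aov() tests whether mean litterfall differs among species at all; TukeyHSD() then gives every pairwise difference with a confidence interval and a p-value adjusted for multiple comparisons.

```r
# Hypothetical simulated data: four species with invented mean litterfall (g/m2)
set.seed(7)
df <- data.frame(species = factor(rep(c("A", "B", "C", "D"), each = 10)))
true_means <- c(A = 100, B = 130, C = 95, D = 150)
df$litterfall <- true_means[as.character(df$species)] + rnorm(40, sd = 15)

# One-way ANOVA: global test of a species effect
fit <- aov(litterfall ~ species, data = df)
summary(fit)

# Tukey HSD: all pairwise differences with adjusted p-values
tk <- TukeyHSD(fit, "species", conf.level = 0.95)
tk
```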
Your time to play
[Figure panels: C loss (%) and N loss (%) vs. litter species richness; C loss (%) and N loss (%) vs. litterfall]