
Technical University Munich, Department of Mathematics
Prof. Donna Ankerst

MA4401 Applied Regression: Formula Sheet
Winter Term 2021/2022

1 Simple linear regression

• Model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with $\varepsilon_i \overset{\mathrm{iid}}{\sim} N(0, \sigma^2)$

• Least squares estimates: $b_1 = \frac{1}{s_{xx}^2}\sum_{i=1}^n (x_i - \bar{x})\,y_i = \frac{1}{s_{xx}^2}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$ and $b_0 = \bar{y} - b_1\bar{x}$,
  with $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$, $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$, $s_{xx}^2 = \sum_{i=1}^n (x_i - \bar{x})^2$, $s_{yy}^2 = \sum_{i=1}^n (y_i - \bar{y})^2$, and $s_{xy}^2 = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$

• $b_0 \sim N\left(\beta_0,\ \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}^2}\right)\right)$ and $b_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{s_{xx}^2}\right)$

• Estimate for $\sigma^2$: $s^2 = \frac{1}{n-2}\sum_{i=1}^n (y_i - \hat{y}_i)^2$ with $\hat{y}_i = b_0 + b_1 x_i$

• $100(1-\alpha)\%$ confidence interval
  – for the slope: $b_1 \pm t_{n-2,1-\alpha/2} \cdot \frac{s}{s_{xx}}$
  – at $x^*$ for the mean: $b_0 + b_1 x^* \pm t_{n-2,1-\alpha/2} \cdot \sqrt{s^2\left[\frac{1}{n} + \frac{(x^* - \bar{x})^2}{s_{xx}^2}\right]}$
  – at $x^*$ for a new observation: $b_0 + b_1 x^* \pm t_{n-2,1-\alpha/2} \cdot \sqrt{s^2\left[1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{s_{xx}^2}\right]}$

• t-test of $H_0: \beta_1 = \beta_1^*$ for the slope: $\frac{b_1 - \beta_1^*}{s/s_{xx}} \sim t_{n-2}$

• Two-sample t-test of $H_0: \mu_1 = \mu_2$: test statistic $T = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$
  with $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $s_1^2, s_2^2$ the sample variances of the two groups

• Sum of squares total: $SS_{total} = \sum_{i=1}^n (y_i - \bar{y})^2 = s_{yy}^2$

• Sum of squares regression: $SS_{regression} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$

• Sum of squares residual: $SS_{residual} = \sum_{i=1}^n (y_i - \hat{y}_i)^2$

• R-squared: $r^2 = \frac{SS_{regression}}{SS_{total}} = 1 - \frac{SS_{residual}}{SS_{total}} = \frac{s_{xy}^4}{s_{xx}^2\, s_{yy}^2}$

• Pearson correlation: $r = \frac{s_{xy}^2}{s_{xx}\, s_{yy}}$
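The following Python/NumPy sketch (not part of the original sheet) evaluates the estimates and intervals above on simulated toy data; the data and variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=30)

n = len(x)
sxx = np.sum((x - x.mean()) ** 2)                  # s_xx^2
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s2 = np.sum(resid ** 2) / (n - 2)                  # estimate of sigma^2

tq = stats.t.ppf(0.975, df=n - 2)                  # t_{n-2, 1-alpha/2}
ci_slope = (b1 - tq * np.sqrt(s2 / sxx), b1 + tq * np.sqrt(s2 / sxx))

xstar = 5.0                                        # prediction point x*
se_mean = np.sqrt(s2 * (1 / n + (xstar - x.mean()) ** 2 / sxx))
se_new = np.sqrt(s2 * (1 + 1 / n + (xstar - x.mean()) ** 2 / sxx))
print(ci_slope, b0 + b1 * xstar, se_mean, se_new)
```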

2 Multiple linear regression

• Model: $y_i = \beta_0 + \sum_{j=1}^k \beta_j x_{ij} + \varepsilon_i$ with $\varepsilon_i \overset{\mathrm{iid}}{\sim} N(0, \sigma^2)$, $k$ the number of covariates

• Least squares estimate: $b = (b_0, b_1, \ldots, b_k)' = (X'X)^{-1}X'y$, $b \sim N(\beta, \sigma^2(X'X)^{-1})$

• Residuals: $e = y - \hat{y} = (I_n - H)y$, $e \sim N(0, \sigma^2(I_n - H))$ with hat matrix $H = X(X'X)^{-1}X'$

• Estimate of $\sigma^2$: $s^2 = \frac{1}{n-k-1}\sum_{i=1}^n (y_i - \hat{y}_i)^2$ with $\hat{y}_i = b_0 + \sum_{j=1}^k b_j x_{ij}$

• Standard error of $b$: $se(b_j) = s\sqrt{(X'X)^{-1}_{j,j}}$, $j \in \{0, \ldots, k\}$

• t-test of $H_0: \beta_j = \beta^*$ for individual regression coefficients: $\frac{b_j - \beta^*}{se(b_j)} \sim t_{n-k-1}$

• $100(1-\alpha)\%$ confidence interval
  – for $\beta_j$: $b_j \pm t_{n-k-1,1-\alpha/2} \cdot se(b_j)$
  – at $a$ for the mean: $a'b \pm t_{n-k-1,1-\alpha/2} \cdot s\sqrt{a'(X'X)^{-1}a}$
  – at $a$ for a new observation: $a'b \pm t_{n-k-1,1-\alpha/2} \cdot s\sqrt{a'(X'X)^{-1}a + 1}$

• F-test of $H_0: \beta_1 = \ldots = \beta_k = 0$, test statistic: $F = \frac{SS_{regression}/k}{SS_{residual}/(n-k-1)} \sim F_{k,\,n-k-1}$

• F-test of sets of linear hypotheses $H_0: A\beta = 0$: $F = \frac{(SS_{residual}^{restrict} - SS_{residual}^{full})/a}{SS_{residual}^{full}/(n-k-1)} \sim F_{a,\,n-k-1}$,
  where $a$ is the rank of the matrix $A$ and $k$ is the number of predictors in the full model.

• $SS_{regression}$, $SS_{residual}$, $SS_{total}$, and $R^2 = r^2$: see Simple Linear Regression.

• For general models with or without an intercept, replace $n - k - 1$ with $n - p$, where $p$ is the total number of parameters.
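A matrix-form sketch of the formulas above, again assuming Python/NumPy/SciPy and a made-up design; nothing here comes from the sheet itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # intercept + k covariates
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(0, 1, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                      # b = (X'X)^{-1} X'y
e = y - X @ b                              # residuals e = (I - H)y
s2 = e @ e / (n - k - 1)                   # estimate of sigma^2
se = np.sqrt(s2 * np.diag(XtX_inv))        # se(b_j) = s * sqrt((X'X)^{-1}_jj)

t_stats = b / se                           # t-tests of H0: beta_j = 0
p_vals = 2 * stats.t.sf(np.abs(t_stats), df=n - k - 1)
print(np.round(b, 3), np.round(p_vals, 3))
```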

3 Specification

• One-way analysis of variance (ANOVA): F-test of $H_0$: the $k$ groups have equal means, with $n_i, \bar{y}_i$ the sample size and sample mean of group $i$, $i = 1, \ldots, k$, $n = \sum_{i=1}^k n_i$, $\bar{y} = \frac{1}{n}\sum_{i=1}^k n_i\bar{y}_i$, $SS_{regression} = \sum_{i=1}^k n_i(\bar{y}_i - \bar{y})^2$, $SS_{residual} = \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$, and
  $F = \frac{SS_{regression}/(k-1)}{SS_{residual}/(n-k)} \sim F_{k-1,\,n-k}$
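A minimal sketch of this F-test, assuming NumPy/SciPy and three simulated groups; it should match scipy.stats.f_oneway on the same data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
groups = [rng.normal(mu, 1.0, size=15) for mu in (0.0, 0.2, 1.0)]

n = sum(len(g) for g in groups)
k = len(groups)
ybar = np.concatenate(groups).mean()
ss_reg = sum(len(g) * (g.mean() - ybar) ** 2 for g in groups)   # SS_regression
ss_res = sum(((g - g.mean()) ** 2).sum() for g in groups)       # SS_residual

F = (ss_reg / (k - 1)) / (ss_res / (n - k))
print(F, stats.f.sf(F, k - 1, n - k))      # F statistic and p-value
```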

4 Model diagnostics

• Standardized residuals: $e_i^s = \frac{e_i}{s}$, with $s$ defined in Multiple Linear Regression

• Studentized residuals: $d_i = \frac{e_i}{s\sqrt{1 - h_{ii}}} \sim N(0, 1)$, where $h_{ii}$ is the $i$th diagonal element of the hat matrix, the leverage of case $i$

• Sample autocorrelation of residuals: $r_k = \frac{\sum_{t=k+1}^n e_t\, e_{t-k}}{\sum_{t=1}^n e_t^2}$, $k = 1, 2, \ldots$

• Durbin-Watson test statistic: $DW = \frac{\sum_{t=2}^n (e_t - e_{t-1})^2}{\sum_{t=1}^n e_t^2} \approx 2(1 - r_1)$

• High leverage if $h_{ii} > 2(k+1)/n$

• Cook's D-statistic: $D = \frac{(b - b_{(i)})'\, X'X\, (b - b_{(i)})}{(k+1)\,s^2}$, where $b_{(i)}$ is the estimate of $\beta$ with the $i$th case deleted and $b - b_{(i)} = (X'X)^{-1} x_i\, \frac{e_i}{1 - h_{ii}}$

• Prediction error sum of squares: $PRESS = \sum_{i=1}^n e_{(i)}^2$ with the PRESS residual $e_{(i)} = y_i - x_i' b_{(i)}$
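A diagnostics sketch under the same simulated design as the multiple-regression example (repeated so the block runs on its own). The closed forms used for Cook's D and the PRESS residual follow from the deletion identity above; they are standard OLS identities adopted for convenience here, not formulas stated on the sheet.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(0, 1, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages h_ii = diag(H)
e = y - X @ (XtX_inv @ X.T @ y)                 # residuals
s2 = e @ e / (n - k - 1)

d = e / np.sqrt(s2 * (1 - h))                   # studentized residuals
high_leverage = h > 2 * (k + 1) / n             # leverage rule of thumb
cooks_d = d ** 2 * h / ((k + 1) * (1 - h))      # Cook's D via the deletion identity
press = np.sum((e / (1 - h)) ** 2)              # PRESS with e_(i) = e_i/(1 - h_ii)
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # Durbin-Watson, approx. 2(1 - r_1)
print(dw, cooks_d.max(), press)
```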

5 Lack of fit, transformations

• Pure error sum of squares: $PESS = \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 = SS_{residual}^{full}$ with $k$ the number of groups

• Lack of fit sum of squares: $LFSS = \sum_{i=1}^k n_i(\bar{y}_i - \hat{y}_i)^2 = SS_{residual}^{restrict} - PESS$ for an assumed model

• Lack of fit test statistic: $\frac{LFSS/(k - \dim(\beta))}{PESS/(n - k)} \sim F_{k-\dim(\beta),\,n-k}$ for
  $H_{restricted}: \mu_{res} = \mu_{res}(x_i, \beta)$ vs. $H_{full}: \mu = \beta_1 I(x_i = x_1) + \ldots + \beta_k I(x_i = x_k)$

• Box-Cox transformations: find $\lambda$ such that $y_i^{(\lambda)} = \frac{y_i^\lambda - 1}{\lambda\, \tilde{y}_g^{\lambda-1}}$ minimizes $SS_{res}(\lambda)$, where $\tilde{y}_g$ is the geometric mean of $y$
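A grid-search sketch of this Box-Cox criterion, assuming NumPy, simulated positive responses, and the usual convention that $\lambda = 0$ means the log transform scaled by the geometric mean; the grid and data are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, size=40)
y = np.exp(0.3 * x + rng.normal(0, 0.3, size=40))   # positive, skewed response
X = np.column_stack([np.ones_like(x), x])
y_g = np.exp(np.mean(np.log(y)))                    # geometric mean of y

def ss_res(lam):
    # Normalized Box-Cox transform; lam -> 0 gives y_g * log(y).
    if abs(lam) < 1e-12:
        z = y_g * np.log(y)
    else:
        z = (y ** lam - 1) / (lam * y_g ** (lam - 1))
    b, *_ = np.linalg.lstsq(X, z, rcond=None)
    return np.sum((z - X @ b) ** 2)                 # SS_res(lambda)

grid = np.linspace(-2, 2, 81)
print(min(grid, key=ss_res))   # lambda near 0 suggests a log transform here
```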

6 Model selection

• Adjusted $R^2$: $R^2_{adj} = 1 - \frac{SS_{residual}/(n-k-1)}{SS_{total}/(n-1)}$

• Akaike Information Criterion: $AIC = n\log(SS_{residual}/n) + 2p$, where $p$ is the total number of parameters

• Bayesian Information Criterion: $BIC = n\log(SS_{residual}/n) + \log(n)\,p$, where $p$ is the total number of parameters
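A sketch comparing nested OLS fits with exactly these AIC/BIC forms (the additive Gaussian constant is dropped, as on the sheet); the design and candidate column subsets are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(0, 1, size=n)

for cols in ([0], [0, 1], [0, 1, 2], [0, 1, 2, 3]):     # intercept + subsets
    Xs = X[:, cols]
    b, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    ss = np.sum((y - Xs @ b) ** 2)                      # SS_residual
    p = len(cols)                                       # total parameter count
    aic = n * np.log(ss / n) + 2 * p
    bic = n * np.log(ss / n) + np.log(n) * p
    print(cols, round(aic, 1), round(bic, 1))
```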

7 Survival regression

• Survivor function: $S(t) = P(T \ge t) = 1 - F(t)$, with $T$ the survival time

• Hazard function: $h(t) = \lim_{\delta \to 0} \frac{P(t \le T < t + \delta \mid T \ge t)}{\delta} = \frac{f(t)}{S(t)}$, where $f(t) = F'(t)$

• Cumulative hazard: $H(t) = -\log(S(t))$

• Empirical survivor function: $S_e(t) = \frac{\text{number of individuals with survival times} \ge t}{\text{number of individuals in the data set}}$

• Average number of individuals at risk in interval $j$: $n_j' = n_j - c_j/2$, under the assumption that censored cases occur uniformly throughout the $j$th interval, with
  – $d_j$ = number of deaths,
  – $c_j$ = censored cases in the interval,
  – $n_j$ = number at risk (alive) at the start of the interval

• Probability of death in the $j$th interval: $d_j/n_j'$

• Probability of survival in the $j$th interval: $1 - d_j/n_j' = (n_j' - d_j)/n_j'$

• Life table estimator: $S_{life}(t) = \prod_{j=1}^k \frac{n_j' - d_j}{n_j'}$, $t \in [t_k, t_{k+1})$, $k = 1, \ldots, m$, with $[t_j, t_{j+1})$ the $j$th interval for $j = 1, \ldots, m$ intervals

• Kaplan-Meier estimator: $S_{KM}(t) = \prod_{j=1}^k \frac{n_j - d_j}{n_j}$, $t \in [t_{(k)}, t_{(k+1)})$, $k = 1, \ldots, r$

• Nelson-Aalen estimator: $S_{NA}(t) = \prod_{j=1}^k \exp(-d_j/n_j)$, $t \in [t_{(k)}, t_{(k+1)})$, $k = 1, \ldots, r$,
  with $t_{(1)} < t_{(2)} < \ldots < t_{(r)}$ the ordered, unique observed death times
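A step-by-step sketch of the Kaplan-Meier product (with the matching Nelson-Aalen factor), assuming NumPy and a small made-up right-censored sample where event = 1 marks a death.

```python
import numpy as np

time = np.array([3, 5, 5, 7, 8, 11, 12, 12, 15, 18])
event = np.array([1, 1, 0, 1, 0, 1, 1, 1, 0, 1])

S_km, S_na = 1.0, 1.0
for t in np.unique(time[event == 1]):              # t_(1) < ... < t_(r)
    n_j = np.sum(time >= t)                        # number at risk at t_(j)
    d_j = np.sum((time == t) & (event == 1))       # deaths at t_(j)
    S_km *= (n_j - d_j) / n_j                      # Kaplan-Meier factor
    S_na *= np.exp(-d_j / n_j)                     # Nelson-Aalen factor
    print(f"t={t}: S_KM={S_km:.3f}, S_NA={S_na:.3f}")
```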

• Greenwood's formula for a pointwise $(1-\alpha)$-CI for $S_{KM}(t)$: $S_{KM}(t) \pm z_{1-\alpha/2} \cdot se(S_{KM}(t))$, with
  $se(S_{KM}(t)) \approx S_{KM}(t)\left(\sum_{j=1}^k \frac{d_j}{n_j(n_j - d_j)}\right)^{1/2}$ for $t_{(k)} \le t < t_{(k+1)}$

• Median survival time $t(50)$: $t(50) = \min(\text{observed time} \mid S(\text{time}) \le 0.50)$

• Non-parametric test for differences in survival curves:
  $H_0$: Survival Group I $=$ Survival Group II vs. $H_A$: Survival Group I $\neq$ Survival Group II
  – $t_{(1)} < t_{(2)} < \ldots < t_{(r)}$ ordered, unique observed death times
  – $d_{1j}, d_{2j}$ number of deaths; $n_{1j}, n_{2j}$ number at risk at $t_{(j)}$ for Group I and Group II, respectively
  – Under $H_0$, $d_{1j}$ follows a hypergeometric distribution: $P(d_{1j} = d) = \frac{\binom{d_j}{d}\binom{n_j - d_j}{n_{1j} - d}}{\binom{n_j}{n_{1j}}}$ for $d = 0, \ldots, d_j$, with
    $e_{1j} = E[d_{1j}] = n_{1j}\,\frac{d_j}{n_j}$ and $v_{1j} = \mathrm{Var}(d_{1j}) = \frac{n_{1j}\, n_{2j}\, d_j (n_j - d_j)}{n_j^2 (n_j - 1)}$
  – Log-rank test: $U_L = \sum_{j=1}^r (d_{1j} - e_{1j})$, $\mathrm{Var}(U_L) = V_L = \sum_{j=1}^r v_{1j}$; under $H_0$: $\frac{U_L^2}{V_L} \sim \chi_1^2$
  – Wilcoxon test: $U_W = \sum_{j=1}^r n_j(d_{1j} - e_{1j})$, $\mathrm{Var}(U_W) = V_W = \sum_{j=1}^r n_j^2\, v_{1j}$; under $H_0$: $\frac{U_W^2}{V_W} \sim \chi_1^2$

• Cox proportional hazards model equation: $h(t|X) = P(T = t \mid T \ge t, X) = h_0(t)\exp(X\beta)$

• Hazard ratio for $X = 1$ vs. $X = 0$: $HR = \frac{h(t|X=1)}{h(t|X=0)} = \exp(\beta)$
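A sketch of the log-rank computation above, assuming NumPy/SciPy and a made-up two-group right-censored sample (group 0 plays the role of Group I); the Wilcoxon variant would weight each term by $n_j$.

```python
import numpy as np
from scipy import stats

time = np.array([3, 5, 7, 8, 11, 4, 6, 9, 13, 16])
event = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])
group = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

U, V = 0.0, 0.0
for t in np.unique(time[event == 1]):              # unique death times t_(j)
    at_risk = time >= t
    n_j = at_risk.sum()
    n1j = (at_risk & (group == 0)).sum()
    d_j = ((time == t) & (event == 1)).sum()
    d1j = ((time == t) & (event == 1) & (group == 0)).sum()
    U += d1j - n1j * d_j / n_j                     # d_1j - e_1j
    if n_j > 1:                                    # v_1j needs n_j - 1 > 0
        V += n1j * (n_j - n1j) * d_j * (n_j - d_j) / (n_j ** 2 * (n_j - 1))

chi2 = U ** 2 / V                                  # U_L^2 / V_L ~ chi2_1 under H0
print(chi2, stats.chi2.sf(chi2, df=1))
```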

8 Linear mixed models

• Model: $Y_i = X_i\beta + Z_i b_i + e_i$ with $b_i \sim N_q(0_q, D)$, $e_i \sim N_{n_i}(0_{n_i}, R_i)$ independent;
  $Y_i \sim N(X_i\beta, V_i(\theta))$ with $V_i(\theta) = Z_i D Z_i' + R_i$

• Likelihood: $L(\beta, \theta) = \prod_{i=1}^N (2\pi)^{-n_i/2}\, |V_i(\theta)|^{-1/2} \exp\left(-\tfrac{1}{2}(Y_i - X_i\beta)'\, V_i(\theta)^{-1}\, (Y_i - X_i\beta)\right)$

• Estimator for fixed $\theta$: $\hat{\beta}(\theta) = \left(\sum_{i=1}^N X_i' W_i X_i\right)^{-1} \sum_{i=1}^N X_i' W_i Y_i$ with $W_i = V_i(\theta)^{-1}$

• Best linear unbiased predictor: $\hat{Y}_i = X_i\beta + Z_i D Z_i' V_i^{-1}(Y_i - X_i\beta)$

• Raw residuals for individual $i$: $r_i = Y_i - X_i\hat{\beta}$, with $\hat{\beta}$ the MLE for $\beta$

• Random effect predictions: $\hat{b}_i = \hat{D} Z_i' \hat{V}_i^{-1}(Y_i - X_i\hat{\beta})$
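A random-intercept sketch of the fixed-$\theta$ estimator above, assuming NumPy, $q = 1$, $R_i = \sigma^2 I$, and made-up variance components; it is a toy instance, not a general mixed-model fitter.

```python
import numpy as np

rng = np.random.default_rng(8)
N, n_i = 20, 5                                   # subjects, observations each
sigma2, d = 1.0, 0.5                             # error / random-intercept variance
Z = np.ones((n_i, 1))                            # Z_i for a random intercept
V = d * (Z @ Z.T) + sigma2 * np.eye(n_i)         # V_i = Z_i D Z_i' + R_i
W = np.linalg.inv(V)                             # W_i = V_i(theta)^{-1}

A = np.zeros((2, 2))
c = np.zeros(2)
for i in range(N):
    Xi = np.column_stack([np.ones(n_i), rng.normal(size=n_i)])
    Yi = Xi @ np.array([2.0, 0.7]) + rng.normal(0, np.sqrt(d)) + rng.normal(0, 1, n_i)
    A += Xi.T @ W @ Xi                           # sum_i X_i' W_i X_i
    c += Xi.T @ W @ Yi                           # sum_i X_i' W_i Y_i

print(np.linalg.solve(A, c))                     # beta_hat(theta)
```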

9 Logistic and Poisson regression

Logistic regression: $Y_i \sim \mathrm{Ber}(\pi(x_i, \beta))$, with $\pi(x_i, \beta) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$ and the logistic link function $\log\left(\frac{P(Y=1)}{1 - P(Y=1)}\right) = x_i'\beta$ for $i = 1, \ldots, n$

• Odds of $Y = 1$ for $X$: $\mathrm{Odds}(X) = \frac{P(Y=1|X)}{P(Y=0|X)} = \frac{P(Y=1|X)}{1 - P(Y=1|X)}$

• Odds ratio: $OR = \frac{\mathrm{Odds}(X=1)}{\mathrm{Odds}(X=0)} = \frac{P(Y=1|X=1)/P(Y=0|X=1)}{P(Y=1|X=0)/P(Y=0|X=0)} = \exp(\beta)$, with $\beta$ the coefficient for $X$.

• $I(b) = E[-D^2 \log(L(b))] = \sum_{k=1}^m n_k\, \pi(x_k, \beta)[1 - \pi(x_k, \beta)]\, x_k x_k'$ and asymptotically $\mathrm{Var}(b) \approx I(b)^{-1}$, where $b$ is the MLE of $\beta$ and $L(b)$ the likelihood evaluated at $b$

• $se(b_k) = \sqrt{\mathrm{Var}(b)_{kk}}$

• Wald test for individual coefficients $H_0: \beta_k = 0$: $Z = \frac{b_k}{se(b_k)} \sim N(0, 1)$

• $(1-\alpha)$-CI for $OR$: $\exp(b_k \pm z_{1-\alpha/2}\, se(b_k))$

• Likelihood ratio test for restricted vs. full model: $LRT = -2\log\left(\frac{L(b_{res})}{L(b)}\right) \sim \chi^2_a$, where $a = \dim(b) - \dim(b_{res})$

• Pearson chi-square: $\sum_{k=1}^m \frac{[y_k - n_k\hat{\pi}_k]^2}{n_k\hat{\pi}_k(1 - \hat{\pi}_k)} \sim \chi^2_{m-p}$, where $m$ is the number of constellations and $\hat{\pi}_k$ the estimated probabilities from a model with $p$ parameters

• Hosmer-Lemeshow statistic: $HL = \sum_{k=1}^G \frac{(o_k - n_k\bar{\pi}_k)^2}{n_k\bar{\pi}_k(1 - \bar{\pi}_k)} \sim \chi^2_{G-2}$, where $G$ is the number of percentile groups, $o_k$ the observed frequencies in group $k$, $n_k$ the number of observations in group $k$, and $\bar{\pi}_k$ the average estimated probability for group $k$

• Hat matrix: $H = V^{-1/2} X (X'V^{-1}X)^{-1} X' V^{-1/2}$, where $V^{-1} = \mathrm{diag}(n_i\hat{\pi}_i(1 - \hat{\pi}_i))$

• Standardized Pearson residuals: $r_i^s = \frac{r_i}{\sqrt{1 - h_{ii}}} = \frac{y_i - n_i\hat{\pi}_i}{\sqrt{n_i\hat{\pi}_i(1 - \hat{\pi}_i)}\sqrt{1 - h_{ii}}}$

• Cook's influence measure: $D_i = (b - b_{(i)})'(X'V^{-1}X)(b - b_{(i)}) \approx (r_i^s)^2\, \frac{h_{ii}}{1 - h_{ii}}$

Poisson regression: $Y \sim \mathrm{Pois}(\mu)$ with $\log(\mu) = \beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k$, $k$ the number of covariates

• Poisson distribution $Y \sim \mathrm{Pois}(\mu)$: $P(Y = y) = \exp(-\mu)\,\frac{\mu^y}{y!}$, for $y = 0, 1, 2, 3, \ldots$ and $\mu > 0$

• $(1-\alpha)$-CI for the mean ratio $\exp(b_j)$ of covariate $j$: $\exp(b_j \pm z_{1-\alpha/2}\, se(b_j))$
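A Newton-Raphson sketch of the logistic MLE, Wald test, and OR interval above, assuming NumPy/SciPy and ungrouped simulated data ($n_i = 1$ per observation); the fixed iteration count is a simplification.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.0])))))

b = np.zeros(2)
for _ in range(25):                               # Newton-Raphson steps
    pi = 1 / (1 + np.exp(-(X @ b)))
    info = X.T @ ((pi * (1 - pi))[:, None] * X)   # I(b) = sum pi(1-pi) x x'
    b = b + np.linalg.solve(info, X.T @ (y - pi)) # score update

se = np.sqrt(np.diag(np.linalg.inv(info)))        # Var(b) approx. I(b)^{-1}
z = b / se                                        # Wald tests H0: beta_k = 0
or_ci = np.exp(b[1] + np.array([-1, 1]) * stats.norm.ppf(0.975) * se[1])
print(np.round(b, 3), np.round(z, 2), np.round(or_ci, 3))  # CI for OR = exp(beta_1)
```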

10 Spatio-temporal statistics

• Observations $\{Z(s_i; t_j)\}$ at spatial locations $\{s_i : i = 1, \ldots, m\}$ and times $\{t_j : j = 1, \ldots, T\}$

• At location $s_i$ and across all times, empirical mean: $\hat{\mu}_{z,s}(s_i) = \frac{1}{T}\sum_{j=1}^T Z(s_i; t_j)$ and covariance:
  $\hat{C}_z^{(0)}(s_i, s_k) = \frac{1}{T}\sum_{j=1}^T \left(Z(s_i; t_j) - \hat{\mu}_{z,s}(s_i)\right)\left(Z(s_k; t_j) - \hat{\mu}_{z,s}(s_k)\right)$

• At time $t_j$: $Z_{t_j} = (Z(s_1; t_j), \ldots, Z(s_m; t_j))'$, $\hat{\mu}_{z,s} = (\hat{\mu}_{z,s}(s_1), \ldots, \hat{\mu}_{z,s}(s_m))' = \frac{1}{T}\sum_{j=1}^T Z_{t_j} \in \mathbb{R}^m$

• For $\tau = 0, 1, \ldots, T-1$, empirical lag-$\tau$ covariance between two stations:
  $\hat{C}_z^{(\tau)}(s_i, s_k) = \frac{1}{T-\tau}\sum_{j=\tau+1}^T \left(Z(s_i; t_j) - \hat{\mu}_{z,s}(s_i)\right)\left(Z(s_k; t_{j-\tau}) - \hat{\mu}_{z,s}(s_k)\right)$; spatial covariance matrix:
  $\hat{C}_z^{(\tau)} = \{\hat{C}_z^{(\tau)}(s_i, s_k)\} = \frac{1}{T-\tau}\sum_{j=\tau+1}^T (Z_{t_j} - \hat{\mu}_{z,s})(Z_{t_{j-\tau}} - \hat{\mu}_{z,s})' \in \mathbb{R}^{m \times m}$

• Empirical lag-$\tau$ cross-covariance between outcomes $Z_{t_j} \in \mathbb{R}^m$ and $X_{t_j} \in \mathbb{R}^n$:
  $\hat{C}_{z,x}^{(\tau)} = \frac{1}{T-\tau}\sum_{j=\tau+1}^T (Z_{t_j} - \hat{\mu}_{z,s})(X_{t_{j-\tau}} - \hat{\mu}_{x,s})' \in \mathbb{R}^{m \times n}$

• Neighborhoods: $N_t(\tau)$ = pairs of time points within time lag $\tau$ of each other, $N_s(h)$ = pairs of locations within spatial lag $h$ of each other

• Empirical spatio-temporal covariogram:
  $\hat{C}_z(h; \tau) = \frac{1}{|N_s(h)|}\,\frac{1}{|N_t(\tau)|}\sum_{s_i, s_k \in N_s(h)}\ \sum_{t_j, t_l \in N_t(\tau)} \left(Z(s_i; t_j) - \hat{\mu}_{z,s}(s_i)\right)\left(Z(s_k; t_l) - \hat{\mu}_{z,s}(s_k)\right)$, with $|N(\cdot)|$ the number of elements (here pairs) in $N(\cdot)$

• Semivariogram: $\gamma_z(s_i, s_k; t_j, t_l) = \frac{1}{2}\,\mathrm{var}\left(Z(s_i; t_j) - Z(s_k; t_l)\right) \approx \hat{C}_z(0; 0) - \hat{C}_z(h; \tau)$

• Stationary semivariogram: $\gamma_z(h; \tau) = \frac{1}{2}\,\mathrm{var}(Z(s+h; t+\tau) - Z(s; t)) = \frac{1}{2}\,E\left[(Z(s+h; t+\tau) - Z(s; t))^2\right] \approx \hat{\gamma}_z(h; \tau) = \frac{1}{|N_s(h)|}\,\frac{1}{|N_t(\tau)|}\sum_{s_i, s_k \in N_s(h)}\ \sum_{t_j, t_l \in N_t(\tau)} (Z(s_i; t_j) - Z(s_k; t_l))^2$
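A sketch of the empirical mean and lag-$\tau$ covariance matrix above, assuming NumPy and a simulated data matrix Z whose rows are times $t_j$ and columns are stations $s_i$.

```python
import numpy as np

rng = np.random.default_rng(10)
T, m = 100, 4
Z = rng.normal(size=(T, m)) @ rng.normal(size=(m, m))   # correlated stations

mu = Z.mean(axis=0)                            # empirical means mu_hat_{z,s}

def C_hat(tau):
    # (1/(T - tau)) * sum_{j = tau+1}^T (Z_{t_j} - mu)(Z_{t_{j-tau}} - mu)'
    A = Z[tau:] - mu
    B = Z[:T - tau] - mu
    return A.T @ B / (T - tau)

print(np.round(C_hat(0), 2))                   # spatial covariance matrix
print(np.round(C_hat(1), 2))                   # lag-1 covariance matrix
```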

• Principal component analysis (PCA) decomposition: $\hat{C}_z^{(0)} = \Psi\Lambda\Psi'$, where $\Psi, \Lambda \in \mathbb{R}^{m \times m}$; empirical orthogonal functions (EOFs): the eigenvectors $\psi_k \in \mathbb{R}^m$ forming the columns of $\Psi$; eigenvalues: the diagonal elements $\lambda_k \in \mathbb{R}$ of $\Lambda$ for $k = 1, \ldots, m$

• $k$th principal component time series: $a_k = \{a_k(t_j) : j = 1, \ldots, T\}$ with $a_k(t_j) = \psi_k' Z_{t_j} \in \mathbb{R}$

• $k$th canonical correlation for $k = 1, \ldots, \min(m, n)$:
  $r_k = \mathrm{cor}(a_k, b_k) = \frac{\mathrm{cov}(a_k, b_k)}{\sqrt{\mathrm{var}(a_k)}\sqrt{\mathrm{var}(b_k)}} = \frac{\xi_k'\, \hat{C}_{z,x}^{(0)}\, \psi_k}{\left(\xi_k'\, \hat{C}_z^{(0)}\, \xi_k\right)^{1/2}\left(\psi_k'\, \hat{C}_x^{(0)}\, \psi_k\right)^{1/2}} \in \mathbb{R}$,
  where cov and var are calculated over the time domain, $a_k, b_k \in \mathbb{R}^T$ with elements $a_k(t_j) = \xi_k' Z_{t_j} \in \mathbb{R}$, $b_k(t_j) = \psi_k' X_{t_j} \in \mathbb{R}$, and $\xi_k, \psi_k$ are optimal weights
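An EOF sketch using the eigendecomposition above, assuming NumPy and the same kind of time-by-station matrix Z as in the covariance sketch; centering Z before projecting is a choice of this sketch.

```python
import numpy as np

rng = np.random.default_rng(10)
T, m = 200, 4
Z = rng.normal(size=(T, m)) @ rng.normal(size=(m, m))   # correlated stations
A = Z - Z.mean(axis=0)
C0 = A.T @ A / T                              # empirical covariance C_hat_z^(0)

lam, Psi = np.linalg.eigh(C0)                 # C0 = Psi diag(lam) Psi'
order = np.argsort(lam)[::-1]                 # eigenvalues in decreasing order
lam, Psi = lam[order], Psi[:, order]

a1 = A @ Psi[:, 0]                            # 1st PC time series a_1(t_j)
print(np.round(lam / lam.sum(), 3))           # variance fraction per EOF
```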
• Long data format: $Z(s_{11}; t_1), Z(s_{21}; t_1), \ldots, Z(s_{m_1 1}; t_1), \ldots, Z(s_{1T}; t_T), Z(s_{2T}; t_T), \ldots, Z(s_{m_T T}; t_T)$

• Inverse distance weighting:
  $\hat{Z}(s_0; t_0) = \sum_{j=1}^T \sum_{i=1}^{m_j} w_{ij}(s_0; t_0)\, Z(s_{ij}; t_j)$, $w_{ij}(s_0; t_0) = \frac{\tilde{w}_{ij}(s_0; t_0)}{\sum_{k=1}^T \sum_{l=1}^{m_k} \tilde{w}_{lk}(s_0; t_0)}$,
  $\tilde{w}_{ij}(s_0; t_0) = \frac{1}{d\left((s_{ij}; t_j), (s_0; t_0)\right)^\alpha}$ with $d(\cdot, \cdot)$ a distance and $\alpha$ a smoothing parameter

• Kernel predictors: $\tilde{w}_{ij}(s_0; t_0) = k((s_{ij}; t_j), (s_0; t_0); \theta)$ with $k$ a kernel function and $\theta$ a bandwidth parameter
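An inverse-distance-weighting sketch, assuming NumPy, a Euclidean distance over (x, y, t) triples (one possible choice of $d$), and made-up observations.

```python
import numpy as np

pts = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0],
                [0.0, 1.0, 2.0], [1.0, 1.0, 2.0]])   # (x, y, t) of Z(s_ij; t_j)
z = np.array([1.0, 2.0, 1.5, 3.0])                   # observed values
target = np.array([0.5, 0.5, 1.5])                   # prediction point (s_0; t_0)
alpha = 2.0                                          # smoothing parameter

d = np.linalg.norm(pts - target, axis=1)             # d((s_ij; t_j), (s_0; t_0))
w_tilde = 1.0 / d ** alpha                           # unnormalized weights
w = w_tilde / w_tilde.sum()                          # normalized weights w_ij
print(w @ z)                                         # prediction Z_hat(s_0; t_0)
```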
• Regression or trend surface estimation: $Z(s_i; t_j) = \beta_0 + \beta_1 X_1(s_i; t_j) + \cdots + \beta_p X_p(s_i; t_j) + e(s_i; t_j)$ with $e(s_i; t_j) \sim N(0, \sigma_e^2)$ independent

• Residual sum of squares: $RSS = \sum_{j=1}^T \sum_{i=1}^m \left(Z(s_i; t_j) - \hat{Z}(s_i; t_j)\right)^2$ with fitted values $\hat{Z}(s_i; t_j) = \hat{\beta}_0 + \hat{\beta}_1 X_1(s_i; t_j) + \cdots + \hat{\beta}_p X_p(s_i; t_j)$ and ordinary least squares estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$

• F-test of $H_0$: spatio-temporal independence: $F = \frac{\hat{\gamma}_e(\|h_1\|; \tau_1)}{\hat{\sigma}_e^2} - 1$, with $\hat{\gamma}_e(\|h_1\|; \tau_1)$ the empirical semivariogram estimate at the smallest spatial ($\|h_1\|$) and temporal ($\tau_1$) lags and $\hat{\sigma}_e^2$ the residual error estimate
• Data model: $Z = Y + \epsilon$ with $Y = (Y(s_{11}; t_1), \ldots, Y(s_{m_T T}; t_T))'$ and $\epsilon = (\epsilon(s_{11}; t_1), \ldots, \epsilon(s_{m_T T}; t_T))'$ random

• Process model: $Y = \mu + \eta$ with $\mu = (\mu(s_{11}; t_1), \ldots, \mu(s_{m_T T}; t_T))' = X\beta$ fixed and
  $\eta = (\eta(s_{11}; t_1), \ldots, \eta(s_{m_T T}; t_T))'$ random, independent of $\epsilon$

• Covariances: $\mathrm{Cov}(Y) = C_\eta$, $\mathrm{Cov}(Z) = C_Z = C_\eta + C_\epsilon$

• Estimate: $\hat{\beta}_{gls} = (X' C_Z^{-1} X)^{-1} X' C_Z^{-1} Z$

• Kriging at location $(s_0; t_0)$: $c_0' = \mathrm{cov}(Y(s_0; t_0), Z) \in \mathbb{R}^{1 \times \sum_{j=1}^T m_j}$, $c_{0,0} = \mathrm{var}(Y(s_0; t_0)) \in \mathbb{R}$,
  predictor: $\hat{Y}(s_0; t_0) = x(s_0; t_0)'\hat{\beta}_{gls} + c_0' C_Z^{-1}(Z - X\hat{\beta}_{gls})$, variance: $\sigma^2_{Y,uk}(s_0; t_0) = c_{0,0} - c_0' C_Z^{-1} c_0 + \kappa$
  with $\kappa = \left(x(s_0; t_0) - X' C_Z^{-1} c_0\right)' (X' C_Z^{-1} X)^{-1} \left(x(s_0; t_0) - X' C_Z^{-1} c_0\right)$, standard error: $\sigma_{Y,uk}(s_0; t_0)$
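A universal-kriging sketch of the predictor and variance above, assuming NumPy, a stationary exponential covariance over combined space-time distance (a modeling assumption), and simulated one-dimensional locations.

```python
import numpy as np

rng = np.random.default_rng(11)
pts = rng.uniform(0, 10, size=(30, 2))              # columns: location s, time t
X = np.column_stack([np.ones(30), pts])             # trend covariates x(s; t)

def cov_fn(a, b):
    return np.exp(-np.linalg.norm(a - b) / 2.0)     # assumed c(h; tau)

CZ = np.array([[cov_fn(p, q) for q in pts] for p in pts]) + 0.1 * np.eye(30)
Z = rng.multivariate_normal(X @ np.array([1.0, 0.2, 0.1]), CZ)

s0 = np.array([5.0, 5.0])                           # target (s_0; t_0)
x0 = np.array([1.0, *s0])
c0 = np.array([cov_fn(s0, p) for p in pts])
CZinv = np.linalg.inv(CZ)

beta_gls = np.linalg.solve(X.T @ CZinv @ X, X.T @ CZinv @ Z)
y_hat = x0 @ beta_gls + c0 @ CZinv @ (Z - X @ beta_gls)
kv = x0 - X.T @ CZinv @ c0                          # inner vector of kappa
var_uk = cov_fn(s0, s0) - c0 @ CZinv @ c0 + kv @ np.linalg.solve(X.T @ CZinv @ X, kv)
print(y_hat, np.sqrt(var_uk))                       # predictor and standard error
```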

• Spatio-temporal covariance function: $c^*(s, s'; t, t') = \mathrm{cov}(Y(s; t), Y(s'; t'))$

• Second-order or weakly stationary: constant expectation and $c^*(s, s'; t, t') = c^*(s - s'; t - t') = c(h; \tau)$ with $h = s - s'$, $\tau = t - t'$

• Spatio-temporal correlation function: $\rho(h; \tau) = \frac{c(h; \tau)}{c(0; 0)}$
• Dynamic spatio-temporal model (DSTM): $Z_t(\cdot) = \mathcal{H}_t(Y_t(\cdot), \theta_{d,t}, \epsilon_t(\cdot))$, $t = 1, \ldots, T$; $\cdot$ denotes a spatial location

• First-order Markov model: $Y_t(\cdot) = \mathcal{M}(Y_{t-1}(\cdot), \theta_{p,t}, \eta_t(\cdot))$

• Latent linear Gaussian DSTM: $Z_t = b_t + H_t Y_t + \epsilon_t$, $\epsilon_t \sim \mathrm{Gau}(0, C_{\epsilon,t})$

• Continuous first-order spatio-temporal integro-difference equation (IDE) process:
  $Y_t(s) = \int_{D_s} m(s, x; \theta_p)\, Y_{t-1}(x)\, dx + \eta_t(s)$

• Discretized IDE process: $Y_t(s_i) = \sum_{j=1}^n m_{ij}(\theta_p)\, Y_{t-1}(s_j) + \eta_t(s_i)$; matrix form: $Y_t = M Y_{t-1} + \eta_t$
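A simulation sketch of the discretized IDE / linear DSTM above, assuming NumPy, a Gaussian transition kernel for $m_{ij}$, and a one-dimensional grid; the spectral-radius rescaling keeps the toy process stable.

```python
import numpy as np

rng = np.random.default_rng(12)
s = np.linspace(0, 1, 50)                              # spatial grid s_1, ..., s_n
M = np.exp(-((s[:, None] - s[None, :]) ** 2) / 0.01)   # kernel m_ij(theta_p)
M *= 0.95 / np.abs(np.linalg.eigvals(M)).max()         # rescale for stability

Y = np.zeros(50)
for t in range(100):
    Y = M @ Y + rng.normal(0, 0.1, size=50)        # Y_t = M Y_{t-1} + eta_t
Z = Y + rng.normal(0, 0.05, size=50)               # data model Z_t = Y_t + eps_t
print(np.round(Z[:5], 3))
```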

• Posterior predictive distribution: $[Z_{ppd} \mid Z] = \int\!\!\int [Z_{ppd} \mid Y, \theta]\,[Y, \theta \mid Z]\, dY\, d\theta$

• Prior predictive distribution: $[Z_{pri}] = \int\!\!\int [Z_{pri} \mid Y, \theta]\,[Y \mid \theta]\,[\theta]\, dY\, d\theta$

• Empirical predictive distribution: $[Z_{epd} \mid Z] = \int [Z_{epd} \mid Y, \hat{\theta}]\,[Y \mid Z, \hat{\theta}]\, dY$

• Empirical marginal distribution: $[Z_{emp}] = \int [Z_{emp} \mid Y, \hat{\theta}]\,[Y \mid \hat{\theta}]\, dY$

• Mean squared prediction error: $MSPE = \frac{1}{Tm}\sum_{j=1}^T \sum_{i=1}^m \{Z_\nu(s_i; t_j) - \hat{Z}_\nu(s_i; t_j)\}^2$

• Mean absolute prediction error: $MAPE = \frac{1}{Tm}\sum_{j=1}^T \sum_{i=1}^m |Z_\nu(s_i; t_j) - \hat{Z}_\nu(s_i; t_j)|$
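A scoring sketch of MSPE and MAPE, assuming NumPy and made-up $T \times m$ arrays of held-out observations and predictions.

```python
import numpy as np

Z_val = np.array([[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]])   # Z_nu(s_i; t_j)
Z_hat = np.array([[1.1, 1.8], [1.4, 2.9], [2.2, 2.7]])   # predictions

print(np.mean((Z_val - Z_hat) ** 2))     # MSPE
print(np.mean(np.abs(Z_val - Z_hat)))    # MAPE
```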

• Model averaging: $[g \mid Z] = \sum_{l=1}^L [g \mid Z, M_l]\, P(M_l \mid Z)$, $P(M_l \mid Z) = \frac{[Z \mid M_l]\, P(M_l)}{\sum_{j=1}^L [Z \mid M_j]\, P(M_j)}$, with prior model probability $P(M_l)$ and marginal likelihood $[Z \mid M_l] = \int\!\!\int [Z \mid Y, \theta, M_l]\,[Y \mid \theta, M_l]\,[\theta \mid M_l]\, dY\, d\theta$

• Bayes factor: $B_{l,k}(Z) = \frac{[Z \mid M_l]}{[Z \mid M_k]}$

• Akaike information criterion: $AIC(M_l) = -2\log[Z \mid \hat{\theta}, M_l] + 2p_l$, where $p_l$ is the number of parameters

• Bayesian information criterion: $BIC(M_l) = -2\log[Z \mid \hat{\theta}, M_l] + \log(m^*)\, p_l$, where $m^*$ is the sample size
