The goal of the BayesGP
package is to efficiently implement
model-based smoothing with flexible GP priors, within a variety of
Bayesian hierarchical models.
You can install the development version of BayesGP from GitHub with:
# install.packages("devtools")
devtools::install_github("https://github.com/Bayes-GP/BayesGP/tree/development")
In this example, we will use the lynx
dataset as an example, which can
be accessed directly from R. Let’s load the dataset and visualize it:
library(BayesGP)
data <- data.frame(year = seq(1821, 1934, by = 1), y = as.numeric(lynx))
data$x <- data$year - min(data$year)
plot(lynx)
Based on a visual examination of the dataset, we can observe an obvious 10-year quasi-periodic behavior in the lynx count with evolving amplitudes over time. Therefore, we will consider fitting the following model:
$$
\begin{equation}
\begin{aligned}
y_i|\lambda_i &\sim \text{Poisson}(\lambda_i) ,\
\log(\lambda_i) &= \eta_i = \beta_0 + g(x_i) + \xi_i,\
g &\sim \text{sGP} \bigg(\alpha = \frac{2\pi}{10}, \sigma\bigg),\
\xi_i &\sim N(0,\sigma_\xi).
\end{aligned}
\end{equation}
$$ Here, each
To specify the priors for the sGP boundary conditions and the intercept
parameter, we assume independent normal priors with mean 0 and variance
1000. For the overdispersion parameter
To determine the prior for the standard deviation parameter
To make inference of the quasi-periodic seasonal effect through the
seasonal-B spline approach, we make use of the model_fit
function from
BayesGP
. The prior information of the sGP component should be
specified in f(.)
. The argument k = 30
will setup 30 equally spaced
seasonal B-spline basis functions for the approximation. A higher value
of k
will increase the approximation accuracy of the seasonal B-spline
approximation, at the cost of longer runtime.
By default, the posterior is obtained through an approximation using the
adaptive quadrature method described in the main paper. However, exact
method through MCMC sampling is also possible, by specifying
method = "MCMC"
in model_fit
.
mod <- model_fit(formula = y ~ f(x, model = "sgp", period = 10, k = 30,
sd.prior = list(param = list(u = 1, alpha = 0.01), h = 50),
boundary.prior = list(prec = 0.001)) +
f(year, model = "iid", sd.prior = list(param = list(u = 1, alpha = 0.01))),
family = "Poisson", data = data, method = "aghq")
The posterior summary of the fitted model can be examined through
summary
or plot
:
summary(mod)
#> Here are some posterior/prior summaries for the parameters:
#> name median q0.025 q0.975 prior prior:P1 prior:P2
#> 1 intercept 6.687 6.508 6.868 Normal 0 1e+03
#> 2 x (PSD) 2.841 2.230 3.591 Exponential 1 1e-02
#> 3 year (SD) 0.250 0.194 0.325 Exponential 1 1e-02
#> For Normal prior, P1 is its mean and P2 is its variance.
#> For Exponential prior, prior is specified as P(theta > P1) = P2.
plot(mod)
This is a basic example which shows you how to use BayesGP
to fit and
analyze some models, we consider the following data set of COVID-19
mortality in Canada, which is available in the package:
library(BayesGP)
## basic example code
head(covid_canada)
#> Date new_deaths t weekdays1 weekdays2 weekdays3 weekdays4
#> 1 2020-03-01 0 591.0323 -1 -1 -1 -1
#> 2 2020-03-02 0 591.0645 1 0 0 0
#> 3 2020-03-03 0 591.0968 0 1 0 0
#> 4 2020-03-04 0 591.1290 0 0 1 0
#> 5 2020-03-05 0 591.1613 0 0 0 1
#> 6 2020-03-06 0 591.1935 0 0 0 0
#> weekdays5 weekdays6 index
#> 1 -1 -1 1
#> 2 0 0 2
#> 3 0 0 3
#> 4 0 0 4
#> 5 0 0 5
#> 6 1 0 6
We can fit a model with model_fit
:
fit_result <- model_fit(new_deaths ~ weekdays1 + weekdays2 + weekdays3 + weekdays4 + weekdays5 + weekdays6 +
f(smoothing_var = t, model = "IWP", order = 3, k = 30),
data = covid_canada, method = "aghq", family = "Poisson")
We can take a look at the posterior summary of this model:
summary(fit_result)
#> Here are some posterior/prior summaries for the parameters:
#> name median q0.025 q0.975 prior prior:P1 prior:P2
#> 1 intercept 3.668 3.593 3.743 Normal 0 1e+03
#> 2 weekdays1 0.093 0.070 0.117 Normal 0 1e+03
#> 3 weekdays2 0.079 0.055 0.102 Normal 0 1e+03
#> 4 weekdays3 0.127 0.103 0.150 Normal 0 1e+03
#> 5 weekdays4 0.125 0.101 0.149 Normal 0 1e+03
#> 6 weekdays5 0.050 0.025 0.074 Normal 0 1e+03
#> 7 weekdays6 -0.152 -0.178 -0.126 Normal 0 1e+03
#> 8 t (SD) 5.175 4.009 6.940 Exponential 1 5e-01
#> For Normal prior, P1 is its mean and P2 is its variance.
#> For Exponential prior, prior is specified as P(theta > P1) = P2.
We can also see the inferred function
plot(fit_result)
We can use the predict
function to obtain the posterior summary of new_data
.
For the function
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.3 ✔ readr 2.1.4
#> ✔ forcats 1.0.0 ✔ stringr 1.5.0
#> ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
#> ✔ purrr 1.0.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
predict_f <- predict(fit_result, variable = "t", newdata = data.frame(t = seq(from = 605, to = 617, by = 0.1)))
predict_f %>% ggplot(aes(x = t)) + geom_line(aes(y = mean), lty = "solid") +
geom_line(aes(y = `q0.025`), lty = "dashed") +
geom_line(aes(y = `q0.975`), lty = "dashed") +
theme_classic()
For the first derivative:
predict_f1st <- predict(fit_result, variable = "t", newdata = data.frame(t = seq(from = 605, to = 617, by = 0.1)), deriv = 1)
predict_f1st %>% ggplot(aes(x = t)) + geom_line(aes(y = mean), lty = "solid") +
geom_line(aes(y = `q0.025`), lty = "dashed") +
geom_line(aes(y = `q0.975`), lty = "dashed") +
theme_classic()
For the second derivative:
predict_f2nd <- predict(fit_result, variable = "t", newdata = data.frame(t = seq(from = 605, to = 617, by = 0.1)), deriv = 2)
predict_f2nd %>% ggplot(aes(x = t)) + geom_line(aes(y = mean), lty = "solid") +
geom_line(aes(y = `q0.025`), lty = "dashed") +
geom_line(aes(y = `q0.975`), lty = "dashed") +
theme_classic()