gkwdist implements the Generalized Kumaraswamy (GKw) distribution
family and its seven nested sub-models for bounded continuous data on
Key Features: - Seven flexible distributions for proportions, rates,
and bounded data - Standard R distribution API: d*, p*, q*, r* -
Analytical log-likelihood, gradient, and Hessian functions - All
functions implemented in C++ for optimal performance - 10-50× faster
than equivalent R implementations - Numerically stable for extreme
parameter values
# Install from GitHub
# install.packages("devtools")
devtools::install_github("evandeilton/gkwdist")| Distribution | Code | Parameters | Functions |
|---|---|---|---|
| Generalized Kumaraswamy | gkw |
dgkw, pgkw, qgkw, rgkw, llgkw, grgkw, hsgkw
|
|
| Beta-Kumaraswamy | bkw |
dbkw, pbkw, qbkw, rbkw, llbkw, grbkw, hsbkw
|
|
| Kumaraswamy-Kumaraswamy | kkw |
dkkw, pkkw, qkkw, rkkw, llkkw, grkkw, hskkw
|
|
| Exponentiated Kumaraswamy | ekw |
dekw, pekw, qekw, rekw, llekw, grekw, hsekw
|
|
| McDonald (Beta Power) | mc |
dmc, pmc, qmc, rmc, llmc, grmc, hsmc
|
|
| Kumaraswamy | kw |
dkw, pkw, qkw, rkw, llkw, grkw, hskw
|
|
| Beta | beta_ |
dbeta_, pbeta_, qbeta_, rbeta_, llbeta, grbeta, hsbeta
|
Distribution Functions (C++ implementation with R interface):
d*()— Probability density function (PDF)p*()— Cumulative distribution function (CDF)q*()— Quantile function (inverse CDF)r*()— Random number generation
Analytical Functions for Maximum Likelihood (C++ implementation):
All analytical functions use the signature: function(par, data) where
par is a numeric vector of parameters.
ll*(par, data)— Negative log-likelihood:
gr*(par, data)— Negative gradient (negative score vector):
hs*(par, data)— Negative Hessian matrix:
Note: These functions return negative values to facilitate
direct use with optimization routines like optim(), which perform
minimization by default.
Notation: All parameters are strictly positive
(
Parameters:
PDF:
CDF:
Quantile: Numerical inversion of the CDF via root-finding algorithms.
Moments: Analytical expressions not available in closed form. Numerical integration or simulation methods required.
Relationship: Special case of GKw with
PDF:
Simplifying:
CDF:
Quantile: Numerical inversion via root-finding. For the inverse:
Solving for
Relationship: Special case of GKw with
PDF:
CDF:
Quantile (closed-form):
Moments: Analytical expressions not available in closed form.
Relationship: Special case of GKw with
PDF:
CDF:
Quantile (closed-form):
Moments:
where the binomial coefficient is generalized:
Relationship: Special case of GKw with
PDF:
CDF:
Quantile:
where
Moments:
which is valid for
Relationship: Special case of GKw with
PDF:
CDF:
Quantile (closed-form):
Moments:
which is valid for
Special Cases:
Relationship: Special case of GKw with
PDF:
CDF:
Quantile:
Moments:
GKw(α, β, γ, δ, λ)
/ \
λ = 1 γ = 1
/ \
BKw(α, β, γ, δ) KKw(α, β, δ, λ)
| |
α = β = 1 δ = 1
| |
MC(γ, δ, λ) EKw(α, β, λ)
| |
λ = 1 γ = δ = 1
| |
Beta(γ, δ) Kw(α, β)
Note: The Beta distribution is obtained from MC by setting
library(gkwdist)
# Set parameters
alpha <- 2; beta <- 3; gamma <- 1.5; delta <- 2; lambda <- 1.2
x <- seq(0.01, 0.99, length.out = 100)
# Density
dens <- dgkw(x, alpha, beta, gamma, delta, lambda)
# CDF
cdf <- pgkw(x, alpha, beta, gamma, delta, lambda)
# Quantiles
q <- qgkw(c(0.25, 0.5, 0.75), alpha, beta, gamma, delta, lambda)
print(q)
# Random generation
set.seed(123)
random_sample <- rgkw(1000, alpha, beta, gamma, delta, lambda)
# Visualization
par(mfrow = c(1, 2))
plot(x, dens, type = "l", lwd = 2, col = "blue",
main = "GKw PDF", xlab = "x", ylab = "Density")
grid()
plot(x, cdf, type = "l", lwd = 2, col = "red",
main = "GKw CDF", xlab = "x", ylab = "F(x)")
grid()library(gkwdist)
par(mfrow = c(1,1), mar = c(3,3,2,2))
x <- seq(0.001, 0.999, length.out = 500)
# Compute densities for all families
d_gkw <- dgkw(x, 2, 3, 1.5, 2, 1.2)
d_bkw <- dbkw(x, 2, 3, 1.5, 2)
d_kkw <- dkkw(x, 2, 3, 2, 1.2)
d_ekw <- dekw(x, 2, 3, 1.5)
d_mc <- dmc(x, 2, 3, 1.2)
d_kw <- dkw(x, 2, 5)
d_beta <- dbeta_(x, 2, 3)
# Plot comparison
plot(x, d_gkw, type = "l", lwd = 2, col = "black",
ylim = c(0, max(d_gkw, d_bkw, d_kkw, d_ekw, d_mc, d_kw, d_beta)),
main = "Distribution Family Comparison",
xlab = "x", ylab = "Density")
lines(x, d_bkw, lwd = 2, col = "red")
lines(x, d_kkw, lwd = 2, col = "blue")
lines(x, d_ekw, lwd = 2, col = "green")
lines(x, d_mc, lwd = 2, col = "purple")
lines(x, d_kw, lwd = 2, col = "orange")
lines(x, d_beta, lwd = 2, col = "brown")
legend("topright",
legend = c("GKw", "BKw", "KKw", "EKw", "MC", "Kw", "Beta"),
col = c("black", "red", "blue", "green", "purple", "orange", "brown"),
lwd = 2, cex = 0.85)
grid()library(gkwdist)
# Generate synthetic data from Kumaraswamy distribution
set.seed(2024)
n <- 2000
true_alpha <- 2.5
true_beta <- 3.5
data <- rkw(n, true_alpha, true_beta)
# Get starting values
par_ini <- gkwgetstartvalues(data, family = "kw", n_starts = 2)
# Optimization using BFGS with analytical gradient
fit <- optim(
par = par_ini,
fn = llkw, # Negative log-likelihood function
gr = grkw, # Negative gradient (negative score function)
data = data,
method = "BFGS",
hessian = TRUE
)
# Standard errors from observed information matrix
se <- sqrt(diag(solve(fit$hessian)))
# 95% Confidence intervals
ci <- cbind(
Lower = fit$par - 1.96 * se,
Estimate = fit$par,
Upper = fit$par + 1.96 * se
)
rownames(ci) <- c("alpha", "beta")
cat("True parameters: ", true_alpha, true_beta, "\n")
cat("Estimated parameters:", fit$par, "\n\n")
print(ci)library(gkwdist)
# Using fitted model from Example 3
x_grid <- seq(0.001, 0.999, length.out = 200)
# Fitted density
fitted_dens <- dkw(x_grid, fit$par[1], fit$par[2])
# True density (for comparison)
true_dens <- dkw(x_grid, true_alpha, true_beta)
# Diagnostic plot
hist(data, breaks = 30, probability = TRUE,
col = "lightgray", border = "white",
main = "Kumaraswamy Distribution Fit",
xlab = "Data", ylab = "Density")
lines(x_grid, fitted_dens, col = "red", lwd = 2, lty = 1)
lines(x_grid, true_dens, col = "blue", lwd = 2, lty = 2)
legend("topright",
legend = c("Observed Data", "Fitted Model", "True Model"),
col = c("gray", "red", "blue"),
lwd = c(10, 2, 2), lty = c(1, 1, 2))
grid()library(gkwdist)
# Generate data from Exponentiated Kumaraswamy
set.seed(456)
n <- 1500
data <- rekw(n, alpha = 2, beta = 3, lambda = 1.5)
# Define competing models
models <- list(
Beta = list(
nll = function(par) llbeta(par, data),
gr = function(par) grbeta(par, data),
start = gkwgetstartvalues(data, family = "beta"),
npar = 2
),
Kw = list(
nll = function(par) llkw(par, data),
gr = function(par) grkw(par, data),
start = gkwgetstartvalues(data, family = "kw"),
npar = 2
),
EKw = list(
nll = function(par) llekw(par, data),
gr = function(par) grekw(par, data),
start = gkwgetstartvalues(data, family = "ekw"),
npar = 3
),
MC = list(
nll = function(par) llmc(par, data),
gr = function(par) grmc(par, data),
start = gkwgetstartvalues(data, family = "mc"),
npar = 3
)
)
# Fit all models using optim with analytical gradients
fits <- lapply(models, function(m) {
optim(par = m$start, fn = m$nll, gr = m$gr, method = "BFGS")
})
# Extract log-likelihoods and compute information criteria
loglik <- sapply(fits, function(f) -f$value)
k <- sapply(models, `[[`, "npar")
results <- data.frame(
Model = names(models),
Coefs = sapply(fits, function(f) paste0(round(f$par, 3), collapse = "|")),
LogLik = round(loglik, 2),
nPar = k,
AIC = round(-2 * loglik + 2 * k, 2),
BIC = round(-2 * loglik + k * log(n), 2)
)
# Sort by AIC
results <- results[order(results$AIC), ]
print(results, row.names = FALSE)
cat("\nBest model by AIC:", results$Model[1], "\n")library(gkwdist)
# Generate data from EKw
set.seed(789)
n <- 1000
data <- rekw(n, alpha = 2, beta = 3, lambda = 1.5)
params <- c(2, 3, 1.5)
# Compute analytical functions at true parameters
nll <- llekw(params, data)
neg_score <- grekw(params, data)
neg_hess <- hsekw(params, data)
# Fisher information matrix (negative of negative Hessian = Hessian)
fisher <- -neg_hess
# Asymptotic standard errors
se <- sqrt(diag(solve(fisher)))
names(se) <- c("alpha", "beta", "lambda")
cat("Negative log-likelihood:", nll, "\n")
cat("\nNegative score vector (should be close to zero at true params):\n")
print(neg_score)
cat("\nNegative Hessian matrix:\n")
print(neg_hess)
cat("\nFisher Information matrix:\n")
print(fisher)
cat("\nAsymptotic standard errors:\n")
print(se)library(gkwdist)
# Generate data and fit model
set.seed(101)
n <- 2000
data <- rkw(n, alpha = 2, beta = 3)
# Fit using optim
fit <- optim(
par = c(1, 1),
fn = function(par) llkw(par, data),
gr = function(par) grkw(par, data),
method = "BFGS"
)
# Theoretical quantiles
p <- ppoints(n)
theoretical_q <- qkw(p, fit$par[1], fit$par[2])
# Empirical quantiles
empirical_q <- sort(data)
# Q-Q plot
plot(theoretical_q, empirical_q,
xlab = "Theoretical Quantiles",
ylab = "Empirical Quantiles",
main = "Q-Q Plot: Kumaraswamy Distribution",
pch = 19, col = rgb(0, 0, 1, 0.5))
abline(0, 1, col = "red", lwd = 2, lty = 2)
grid()library(gkwdist)
# Generate data from GKw
set.seed(2025)
n <- 10000
true_params <- c(alpha = 2, beta = 2.5, gamma = 1.5, delta = 2, lambda = 1.3)
data <- rgkw(n, true_params[1], true_params[2], true_params[3],
true_params[4], true_params[5])
# Get starting values
par_ini <- gkwgetstartvalues(data, family = "gkw", n_starts = 5)
# Fit using optim with analytical gradient
fit_gkw <- optim(
par = par_ini,
fn = llgkw, # Negative log-likelihood
gr = grgkw, # Negative gradient
data = data,
method = "BFGS",
hessian = TRUE,
control = list(maxit = 1000)
)
# Results
se <- sqrt(diag(solve(fit_gkw$hessian)))
estimates <- data.frame(
Parameter = names(true_params),
True = true_params,
Estimate = fit_gkw$par,
SE = se
)
print(estimates, digits = 4)library(gkwdist)
# Generate data
set.seed(999)
n <- 10000
data <- rkw(n, alpha = 2, beta = 3)
par <- c(2, 3)
# Negative analytical gradient
neg_grad_analytical <- grkw(par, data)
# Numerical gradient (using finite differences)
neg_grad_numerical <- numDeriv::grad(
func = function(p) llkw(p, data),
x = par
)
# Compare
comparison <- data.frame(
Parameter = c("alpha", "beta"),
Analytical = neg_grad_analytical,
Numerical = neg_grad_numerical,
Difference = abs(neg_grad_analytical - neg_grad_numerical)
)
print(comparison, digits = 8)The C++ implementation provides substantial performance gains:
library(microbenchmark)
# Generate large dataset
n <- 10000
data <- rkw(n, 2, 3)
# Compare log-likelihood computation
benchmark <- microbenchmark(
R_sum_log_d = -sum(log(dkw(data, 2, 3))), # Manual negative log-likelihood
Cpp_ll = llkw(c(2, 3), data), # C++ negative log-likelihood
times = 100
)
print(benchmark)
plot(benchmark)
# Typical results: C++ implementation is 10-50× fasterWhy C++ Implementation Matters:
- Speed: Critical for maximum likelihood estimation with large datasets
- Numerical Stability: Better handling of extreme parameter values and edge cases
- Memory Efficiency: Optimized memory allocation in tight loops
- Scalability: Linear scaling with sample size
- Precision: Analytical derivatives are exact (up to floating-point precision)
| Data Characteristics | Recommended Distribution | Rationale |
|---|---|---|
| Unimodal, symmetric | Beta | Parsimony; well-studied properties |
| Unimodal, asymmetric | Kumaraswamy | Closed-form CDF and quantile |
| Bimodal or U-shaped | GKw or BKw | Maximum flexibility |
| Extreme skewness | KKw or EKw | Fine control over tail behavior |
| J-shaped (monotonic) | Kw or Beta | With appropriate parameter values |
| Power transformations | McDonald | Explicit power parameter |
| Unknown shape | GKw | Fit and test nested models |
Model Selection Workflow:
- Start with Beta and Kumaraswamy (2 parameters)
- Check goodness-of-fit using Q-Q plots and formal tests
- If inadequate, try EKw or MC (3 parameters)
- For complex patterns, use BKw, KKw, or GKw (4-5 parameters)
- Use AIC/BIC to balance fit quality and parsimony
- Validate final model with residual diagnostics
-
Cordeiro, G. M., & de Castro, M. (2011). A new family of generalized distributions. Journal of Statistical Computation and Simulation, 81(7), 883-898. doi:10.1080/00949650903530745
-
Carrasco, J. M. F., Ferrari, S. L. P., & Cordeiro, G. M. (2010). A new generalized Kumaraswamy distribution. arXiv:1004.0911. arxiv.org/abs/1004.0911
-
Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1-2), 79-88. doi:10.1016/0022-1694(80)90036-0
-
Jones, M. C. (2009). Kumaraswamy’s distribution: A beta-type distribution with some tractability advantages. Statistical Methodology, 6(1), 70-81. doi:10.1016/j.stamet.2008.04.001
-
Lemonte, A. J., & Cordeiro, G. M. (2013). An extended Lomax distribution. Statistics, 47(4), 800-816. doi:10.1080/02331888.2011.568119
-
Cordeiro, G. M., & Lemonte, A. J. (2011). The β-Birnbaum–Saunders distribution: An improved distribution for fatigue life modeling. Computational Statistics & Data Analysis, 55(3), 1445-1461. doi:10.1016/j.csda.2010.10.007
-
McDonald, J. B. (1984). Some generalized functions for the size distribution of income. Econometrica, 52(3), 647-663. doi:10.2307/1913469
-
Cordeiro, G. M., & Brito, R. S. (2012). The beta power distribution. Brazilian Journal of Probability and Statistics, 26(1), 88-112. doi:10.1214/10-BJPS124
citation("gkwdist")@Manual{gkwdist2025,
title = {gkwdist: Generalized Kumaraswamy Distribution Family},
author = {J. E. Lopes},
year = {2025},
note = {R package},
url = {https://github.com/evandeilton/gkwdist}
}J. E. Lopes
LEG - Laboratory of Statistics and Geoinformation
PPGMNE - Graduate Program in Numerical Methods in Engineering
Federal University of Paraná (UFPR), Brazil
Email: [email protected]
MIT License. See LICENSE file for details.