For informative processes, the naive estimator of learning (the difference between post- and pre-process scores) underestimates actual learning. A heuristic account of why the naive estimator is negatively biased runs as follows: people know as much or more after being exposed to an informative process as before it. The less people know, the larger the number of items they don't know, and the greater the opportunity to guess.
Guessing, even when random, can only increase the proportion correct. Thus, the bias that guessing introduces into naive measures of knowledge is always positive. Because there is more room to guess before the process than after it, pre-process scores carry more positive bias on average than post-process scores. Subtracting pre-process scores from post-process scores therefore yields an attenuated estimate of actual learning. For a more complete treatment of the issue, read this paper by Ken Cor and Gaurav Sood.
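To see the bias concretely, here is a minimal simulation sketch in plain R (independent of the package); the knowledge rates and the four-option format are illustrative assumptions:

```r
# Minimal sketch: respondents guess at random on items they don't know,
# so the naive pre/post difference falls short of the amount actually learned.
set.seed(42)
n_items <- 100   # items in the battery
k <- 4           # response options per closed-ended item

observed_score <- function(p_known) {
  known <- rbinom(n_items, 1, p_known)             # items truly known
  lucky <- rbinom(n_items, 1, 1 / k) * (1 - known) # lucky guesses on the rest
  mean(known + lucky)                              # observed proportion correct
}

pre  <- replicate(1000, observed_score(0.30))  # 30% truly known before
post <- replicate(1000, observed_score(0.50))  # 20% truly learned
mean(post - pre)  # about 0.15, well below the true 0.20 learned
```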
The package provides a few different ways to adjust estimates of learning for guessing. For now, we limit our attention to cases where the same battery of closed-ended knowledge questions has been asked in both the pre- and the post-process waves. (Guessing is not a serious issue on open-ended items; for evidence, see DK Means DK by Robert Luskin and John Bullock.) More generally, the package implements the methods for adjusting learning for guessing discussed in this paper.
To get the current release version from CRAN:
```r
install.packages("guess")
```
To get the current development version from GitHub:
```r
# install.packages("devtools")
library(devtools)
devtools::install_github("finite-sample/guess", build_vignettes = TRUE)
```
To learn about how to use the package, see the vignette:
```r
# Overview of the package
vignette("using_guess", package = "guess")
```
The package provides several methods for adjusting learning estimates:
- `lca_cor()`: Latent Class Analysis correction using transition matrices
- `stnd_cor()`: Standard guessing correction based on the number of incorrect responses (see the sketch after this list)
- `group_adj()`: Group-level adjustment accounting for the propensity to guess
- `fit_model()`: Unified model fitting function with goodness-of-fit testing
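For intuition about the standard correction, here is the textbook version of the formula (shown as an illustration, not the package's internal code): on k-option items, a random guesser gets one item right for every k - 1 wrong, so each incorrect answer debits the score accordingly.

```r
# Textbook standard guessing correction (illustrative; not the package source):
# subtract incorrect / (k - 1) from the number correct.
std_correct <- function(n_correct, n_incorrect, k) {
  n_correct - n_incorrect / (k - 1)
}
std_correct(n_correct = 7, n_incorrect = 3, k = 4)  # 7 - 3/3 = 6
```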
**Important:** The `group_adj()` function handles different groups implicitly through their gamma values (propensity to guess), rather than requiring explicit group identifiers. Groups with different guessing behaviors should use different gamma parameters:
```r
# Example: different groups with different guessing rates
high_ability_gamma <- 0.15  # lower guessing rate
low_ability_gamma  <- 0.35  # higher guessing rate

# Apply adjustment with group-specific gamma
group_adj(pre_test, post_test, gamma = c(high_ability_gamma, low_ability_gamma, ...))
```
This design allows for flexible modeling where gamma can vary by item, respondent characteristics, or any other grouping structure.
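For instance, one hypothetical way to build a respondent-level gamma vector from a grouping variable (the labels and rates below are illustrative):

```r
# Hypothetical: map a grouping variable to respondent-level gamma values
group <- c("high", "low", "high", "low")      # one ability label per respondent
gamma_by_group <- c(high = 0.15, low = 0.35)  # rates from the example above
respondent_gamma <- unname(gamma_by_group[group])  # 0.15 0.35 0.15 0.35
```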
Scripts are released under the MIT License.
The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.