The moder package determines single or multiple modes (most frequent
values). By default, its functions check whether missing values make it
impossible to determine the mode, and return NA if so. The package has
no dependencies.

Mode functions fill a gap in measures of central tendency in R. mean()
and median() are built into the standard library, but there is a lack
of properly NA-sensitive functions for calculating the mode. Use moder
for this!
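
To illustrate the gap, here is a minimal base-R sketch. Base R's `mode()` reports how an object is stored rather than its most frequent value, and a common `table()`-based idiom silently drops missing values. `naive_mode()` is a hypothetical helper written only for this illustration; it is not part of moder.

```r
# Base R's mode() describes the storage mode, not the most frequent value
mode(c(1, 1, 2))
#> [1] "numeric"

# Hypothetical helper (not from moder): a common table()-based idiom
naive_mode <- function(x) {
  counts <- table(x)  # table() drops NAs by default (useNA = "no")
  names(counts)[counts == max(counts)][1]
}

# The NAs are silently ignored, and the result comes back as a string
naive_mode(c(1, 1, 2, NA, NA))
#> [1] "1"
```

The idiom gives no hint that the two missing values could change the answer.
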

You can install the development version of moder like so:

remotes::install_github("lhdjung/moder")

library(moder)

Everything is fine here:
mode_first(c(7, 8, 8, 9, 9, 9))
#> [1] 9

But what if some values are missing? There may be so many missing values
that it’s impossible to tell which value is the most frequent one. If
both NAs below are secretly 2, then 2 is the (first) mode.
Otherwise, 1 is. The mode is unclear, so the function returns NA:
mode_first(c(1, 1, 2, NA, NA))
#> [1] NA

Ignore NAs using na.rm = TRUE if there is a strong rationale for it:
mode_first(c(1, 1, 2, NA, NA), na.rm = TRUE)
#> [1] 1

The next example is different. Even if the NA stands in for 8, there
will only be three instances of 8 but four instances of 7. The mode
is 7, independent of the true value behind NA.
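
The counting argument above can be sketched in base R. This is a simplified sufficient condition, not moder's actual implementation, and `mode_is_certain()` is a hypothetical helper: it checks whether the lead of the most frequent known value over the runner-up is larger than the number of NAs.

```r
# Hypothetical helper, not part of moder: does the most frequent known
# value stay the unique mode no matter what the NAs stand for?
mode_is_certain <- function(x) {
  n_na <- sum(is.na(x))
  # table() drops NAs by default; keep only the sorted frequencies
  counts <- sort(as.integer(table(x)), decreasing = TRUE)
  if (length(counts) == 0L) {
    return(FALSE)  # no known values at all
  }
  top <- counts[1]
  runner_up <- if (length(counts) > 1L) counts[2] else 0L
  # Even if every NA hid the runner-up (or a new value), it could not catch up
  top - runner_up > n_na
}

mode_is_certain(c(7, 7, 7, 7, 8, 8, NA))
#> [1] TRUE
mode_is_certain(c(1, 1, 2, NA, NA))
#> [1] FALSE
```

For the vector discussed above, moder itself returns the known mode:
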
mode_first(c(7, 7, 7, 7, 8, 8, NA))
#> [1] 7

mode_all() captures multiple modes:
mode_all(c("a", "a", "b", "b", "c", "d", "e"))
#> [1] "a" "b"

If some values are missing but there would be multiple modes when
ignoring NAs, mode_all() returns NA. That’s because missings can
easily create an imbalance between the equally-frequent known values:
mode_all(c(1, 1, 2, 2, NA))
#> [1] NA

If NA masks either 1 or 2, that number is the (single) mode. As
before, if the mode depends on missing values, the function returns
NA.
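
The scenarios behind this result can be spelled out in base R. The sketch below is an illustration only, not moder's implementation; `all_modes()` is a hypothetical helper that assumes its input contains no NAs.

```r
# Hypothetical helper, not part of moder: all most frequent values of a
# vector without NAs, in order of first appearance
all_modes <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))  # frequency of each unique value
  ux[counts == max(counts)]
}

x <- c(1, 1, 2, 2, NA)
known <- x[!is.na(x)]

# Ignoring the NA: two modes
all_modes(known)
#> [1] 1 2

# If the NA is secretly 1, only 1 is the mode
all_modes(c(known, 1))
#> [1] 1

# If the NA is secretly 2, only 2 is the mode
all_modes(c(known, 2))
#> [1] 2
```

Each scenario yields a different set of modes, so the result hinges on the value behind the NA.
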
Yet na.rm = TRUE makes the function ignore missing values:
mode_all(c(1, 1, 2, 2, NA), na.rm = TRUE)
#> [1] 1 2

mode_single() is stricter than mode_first(): It returns NA by
default if there are multiple modes. Otherwise, it works the same way.
mode_single(c(3, 4, 4, 5, 5, 5))
#> [1] 5
mode_single(c("x", "x", "y", "y", "z"))
#> [1] NA

These minimal and maximal sets of modes are possible given the missing value:
mode_possible_min(c("a", "a", "a", "b", "b", "c", NA))
#> [1] "a"
mode_possible_max(c("a", "a", "a", "b", "b", "c", NA))
#> [1] "a" "b"

Ken Williams’ mode functions on Stack Overflow were pivotal to moder.