nrennie/messy

messy

When teaching examples using R, instructors often use nice datasets, but these aren't very realistic and aren't what students will later encounter in the real world. Real datasets have typos, missing values encoded in strange ways, and stray whitespace. The {messy} R package takes a clean dataset and randomly adds these problems, giving students the opportunity to practice their data cleaning and wrangling skills without instructors having to change all of their examples.

Read the preprint of the article associated with this package at nrennie.rbind.io/making-messy-data, and see the article source at github.com/nrennie/making-messy-data. The article gives more in-depth explanations of the motivation behind the package, and ideas of how to use it in teaching.

Installation

Install from CRAN using:

install.packages("messy")

Install the development version from GitHub using:

remotes::install_github("nrennie/messy")

Usage

For more in-depth usage instructions, see the package documentation at nrennie.rbind.io/messy, which has examples of each function.

The simplest way to use the {messy} package is applying the messy() function:

library(messy)
set.seed(1234)
messy(ToothGrowth[1:10, ])
    len supp dose
1   4.2   VC  0.5
2  11.5 <NA> <NA>
3  7.3    VC  0.5
4   5.8  (VC  0.5
5   6.4   VC <NA>
6    10   VC  0.5
7  11.2 <NA>  0.5
8  11.2   VC  0.5
9  5.2    VC  0.5
10    7   VC 0.5 
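
As a rough sketch of how the overall level of messiness might be controlled in a single call, the example below passes a `messiness` value to `messy()`. It assumes that `messy()` accepts a `messiness` argument between 0 and 1 in the same way as the individual functions; check the package documentation to confirm. The object names `lightly_messy` and `very_messy` are illustrative only.

```r
# Assumption: messy() takes a `messiness` argument (a probability between
# 0 and 1), mirroring the argument used by the individual functions.
library(messy)

clean <- ToothGrowth[1:10, ]

# Resetting the seed before each call makes each result reproducible
set.seed(1234)
lightly_messy <- messy(clean, messiness = 0.1)

set.seed(1234)
very_messy <- messy(clean, messiness = 0.5)

# Both results should keep the shape of the original data frame
dim(lightly_messy)
dim(very_messy)
```

Comparing the two printed data frames side by side is one way to choose a level of difficulty appropriate for a given class.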

You can vary the amount of messiness for each function, and chain together different functions to create customised messy data:

set.seed(1234)
ToothGrowth[1:10,] |> 
  make_missing(cols = "supp", missing = " ") |> 
  make_missing(cols = c("len", "dose"), missing = c(NA, 999)) |> 
  add_whitespace(cols = "supp", messiness = 0.5) |> 
  add_special_chars(cols = "supp")
    len supp dose
1   4.2   VC  0.5
2  11.5  VC    NA
3   7.3   VC  0.5
4   5.8 *VC   0.5
5   6.4  VC   0.5
6  10.0   VC  0.5
7  11.2       0.5
8  11.2  V#C   NA
9   5.2  !VC  0.5
10  7.0 VC*   0.5
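
To illustrate the kind of cleaning exercise this output sets up for students, here is a minimal base R sketch. The data frame `messy_df` is hand-built for illustration (it is not generated by {messy}), and treating `999` and blank strings as missing is an assumption based on the values used above.

```r
# Hand-built stand-in for a messy data frame (hypothetical values)
messy_df <- data.frame(
  len  = c("4.2", "11.5", "7.3 ", "999", NA),
  supp = c("VC", " VC", "*VC", " ", "V#C"),
  dose = c("0.5", NA, "0.5 ", "999", "0.5"),
  stringsAsFactors = FALSE
)

clean_df <- messy_df

# Strip special characters, then trim leading and trailing whitespace
clean_df$supp <- trimws(gsub("[^[:alnum:] ]", "", clean_df$supp))

# Treat strings that are now empty as missing values
clean_df$supp[clean_df$supp %in% ""] <- NA

# Convert the numeric columns, mapping the sentinel value 999 to NA
for (col in c("len", "dose")) {
  values <- as.numeric(trimws(clean_df[[col]]))
  values[!is.na(values) & values == 999] <- NA
  clean_df[[col]] <- values
}

clean_df
```

A tidyverse solution would work equally well; base R is used here only to keep the sketch free of extra dependencies.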

About

R package to make a data frame messy and untidy.
