epidict

The goal of {epidict} is to provide standardized data dictionaries for use in epidemiological data analysis templates. Currently, it supports standardized dictionaries from MSF OCA. This is a product of the R4EPIs project; learn more at https://r4epi.github.io/sitrep/

Installation

You can install {epidict} from CRAN:

install.packages("epidict")


If there is a bugfix or feature that is not yet on CRAN, you can install it via the {drat} package:

# install.packages("drat")
drat::addRepo("R4EPI")
install.packages("epidict")

You can also install the in-development version from GitHub using the {remotes} package (but there’s no guarantee that it will be stable):

# install.packages("remotes")
remotes::install_github("R4EPI/epidict")

Accessing dictionaries

Dictionaries are obtained with the msf_dict() function, which returns each variable together with its possible options (for categorical variables).
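To make the "variable plus options" layout concrete, here is a minimal base-R sketch. It assumes a hypothetical toy dictionary, `toy_dict`, whose column names mirror those printed by msf_dict() in the cleaning example, but whose codes and labels are invented. The helper `options_by_variable()` is also hypothetical and not part of {epidict}. In this long layout, a categorical variable such as `sex` takes one row per option, while a free-text variable such as `case_number` has no options.

```r
# Hypothetical toy dictionary in a long layout: one row per variable/option pair.
# Column names mirror msf_dict() output; the codes and labels are invented.
toy_dict <- data.frame(
  data_element_shortname = c("case_number", "sex", "sex", "sex"),
  option_code            = c(NA, "M", "F", "U"),
  option_name            = c(NA, "Male", "Female", "Unknown"),
  option_order_in_set    = c(NA, 1, 2, 3),
  stringsAsFactors       = FALSE
)

# Hypothetical helper: list the allowed labels of each categorical variable,
# in dictionary order. Variables without options (e.g. case_number) are dropped.
options_by_variable <- function(dict) {
  coded <- dict[!is.na(dict$option_code), ]
  coded <- coded[order(coded$data_element_shortname, coded$option_order_in_set), ]
  split(coded$option_name, coded$data_element_shortname)
}

options_by_variable(toy_dict)
```

The result is a named list with one entry per categorical variable, which is a handy shape for checking whether a value is allowed.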

{epidict} includes MSF intersectional outbreak dictionaries based on ODK exports.

It also includes MSF OCA outbreak dictionaries based on DHIS2 exports. You can read more about the outbreak dictionaries at https://r4epi.github.io/epidict/articles/Outbreaks.html

In addition, there are MSF survey dictionaries based on ODK exports. You can read more about the survey dictionaries at https://r4epi.github.io/epidict/articles/Surveys.html

You can also read in your own ODK dictionaries using read_dict().

Generating data

The gen_data() function generates simulated data from a dictionary. Its main arguments are the dictionary, the column that holds the variable names (varnames), and the number of rows to generate (numcases); the examples below also set org = "MSF".


library("epidict")
gen_data("Measles", varnames = "data_element_shortname", numcases = 100, org = "MSF")
#> # A tibble: 100 × 52
#>    case_number date_of_consultation_admis…¹ patient_facility_type patient_origin
#>    <chr>       <date>                       <fct>                 <chr>         
#>  1 A1          2018-04-23                   OP                    Village D     
#>  2 A2          2018-02-23                   OP                    Village C     
#>  3 A3          2018-04-15                   OP                    Village C     
#>  4 A4          2018-04-30                   OP                    Village A     
#>  5 A5          2018-01-09                   IP                    Village D     
#>  6 A6          2018-03-14                   OP                    Village D     
#>  7 A7          2018-03-20                   OP                    Village D     
#>  8 A8          2018-03-23                   OP                    Village A     
#>  9 A9          2018-01-23                   IP                    Village A     
#> 10 A10         2018-03-19                   OP                    Village D     
#> # ℹ 90 more rows
#> # ℹ abbreviated name: ¹date_of_consultation_admission
#> # ℹ 48 more variables: age_years <int>, age_months <int>, age_days <int>,
#> #   sex <fct>, pregnant <fct>, trimester <fct>,
#> #   foetus_alive_at_admission <fct>, exit_status <fct>, date_of_exit <date>,
#> #   time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> #   baby_born_with_complications <fct>, previously_vaccinated <fct>, …
gen_data("Vaccination_long", varnames = "name", numcases = 100, org = "MSF")
#> # A tibble: 100 × 120
#>    start end   today deviceid date       team_number village_name village_other
#>    <lgl> <lgl> <lgl> <lgl>    <date>     <lgl>       <fct>        <lgl>        
#>  1 NA    NA    NA    NA       2018-02-11 NA          village_4    NA           
#>  2 NA    NA    NA    NA       2018-03-24 NA          village_2    NA           
#>  3 NA    NA    NA    NA       2018-02-02 NA          village_9    NA           
#>  4 NA    NA    NA    NA       2018-02-20 NA          village_3    NA           
#>  5 NA    NA    NA    NA       2018-04-09 NA          village_5    NA           
#>  6 NA    NA    NA    NA       2018-01-27 NA          village_7    NA           
#>  7 NA    NA    NA    NA       2018-03-12 NA          village_6    NA           
#>  8 NA    NA    NA    NA       2018-04-11 NA          village_3    NA           
#>  9 NA    NA    NA    NA       2018-04-28 NA          village_3    NA           
#> 10 NA    NA    NA    NA       2018-03-09 NA          other        NA           
#> # ℹ 90 more rows
#> # ℹ 112 more variables: cluster_number <dbl>, household_number <int>,
#> #   households_building <int>, random_hh <int>, consent <chr>,
#> #   no_consent_reason <fct>, no_consent_other <lgl>, caretaker_relation <fct>,
#> #   caretaker_other <lgl>, number_children <dbl>, child_number <chr>,
#> #   sex <fct>, date_birth <date>, age_years <int>, age_months <int>,
#> #   any_vaccine <fct>, vaccine_card <fct>, hf_records <fct>, …
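The idea behind dictionary-driven simulation can be sketched in a few lines of base R. This is not how gen_data() is implemented; it is a minimal illustration under stated assumptions: a hypothetical toy dictionary, `toy_dict`, in the long layout (invented values), and a hypothetical `toy_gen_data()` that draws each categorical variable's option codes uniformly at random and leaves variables without options as NA.

```r
# Hypothetical toy dictionary (long layout, invented values).
toy_dict <- data.frame(
  data_element_shortname = c("case_number", "sex", "sex", "sex"),
  option_code            = c(NA, "M", "F", "U"),
  stringsAsFactors       = FALSE
)

# Hypothetical sketch of dictionary-driven simulation, NOT gen_data() itself:
# categorical variables get uniformly sampled option codes, others are left NA.
toy_gen_data <- function(dict, n) {
  vars <- unique(dict$data_element_shortname)
  cols <- lapply(vars, function(v) {
    codes <- dict$option_code[dict$data_element_shortname == v & !is.na(dict$option_code)]
    if (length(codes) == 0) {
      rep(NA, n)                                          # no options: leave empty
    } else {
      codes[sample.int(length(codes), n, replace = TRUE)] # draw codes with replacement
    }
  })
  names(cols) <- vars
  as.data.frame(cols, stringsAsFactors = FALSE)
}

set.seed(1)  # make the random draws reproducible
sim <- toy_gen_data(toy_dict, n = 5)
sim
```

Indexing with `sample.int()` rather than calling `sample()` on the codes directly avoids R's special case where a single numeric value is treated as a range.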

Cleaning data with the dictionaries

You can use the dictionaries to clean the data with the {matchmaker} package, which recodes each variable's option codes to their labels:


library("epidict")
library("matchmaker")
library("dplyr")

dat <- gen_data(dictionary = "Cholera", 
  varnames = "data_element_shortname",
  numcases = 20,
  org = "MSF"
)
print(dat)
#> # A tibble: 20 × 45
#>    case_number date_of_consultation_admiss…¹ patient_origin age_years age_months
#>    <chr>       <date>                        <chr>              <int>      <int>
#>  1 A1          2018-03-23                    Village D             18         NA
#>  2 A2          2018-01-29                    Village B             27         NA
#>  3 A3          2018-01-23                    Village D             40         NA
#>  4 A4          2018-02-02                    Village D             15         NA
#>  5 A5          2018-01-23                    Village B             28         NA
#>  6 A6          2018-01-15                    Village A             40         NA
#>  7 A7          2018-01-04                    Village B             34         NA
#>  8 A8          2018-04-25                    Village D             29         NA
#>  9 A9          2018-04-18                    Village D             45         NA
#> 10 A10         2018-01-22                    Village C             10         NA
#> 11 A11         2018-02-06                    Village A             19         NA
#> 12 A12         2018-03-03                    Village D             61         NA
#> 13 A13         2018-01-08                    Village B             20         NA
#> 14 A14         2018-03-08                    Village C             73         NA
#> 15 A15         2018-03-08                    Village C             66         NA
#> 16 A16         2018-03-18                    Village A             52         NA
#> 17 A17         2018-04-03                    Village A             59         NA
#> 18 A18         2018-01-12                    Village B             24         NA
#> 19 A19         2018-01-19                    Village D             53         NA
#> 20 A20         2018-03-07                    Village A              7         NA
#> # ℹ abbreviated name: ¹date_of_consultation_admission
#> # ℹ 40 more variables: age_days <int>, sex <fct>, pregnant <fct>,
#> #   trimester <fct>, foetus_alive_at_admission <fct>, exit_status <fct>,
#> #   date_of_exit <date>, time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> #   previously_vaccinated <fct>, previous_vaccine_doses_received <fct>,
#> #   readmission <fct>, msf_involvement <fct>,
#> #   cholera_treatment_facility_type <fct>, residential_status_brief <fct>, …

# We want the expanded dictionary, so we will select `compact = FALSE`
dict <- msf_dict(dictionary = "Cholera", 
  long    = TRUE,
  compact = FALSE,
  tibble  = TRUE
)
print(dict)
#> # A tibble: 182 × 11
#>    data_element_uid data_element_name                     data_element_shortname
#>    <chr>            <chr>                                 <chr>                 
#>  1 AafTlSwliVQ      egen_001_patient_case_number          case_number           
#>  2 OTGOtWBz39J      egen_004_date_of_consultation_admiss… date_of_consultation_…
#>  3 wnmMr2V3T3u      egen_006_patient_origin               patient_origin        
#>  4 sbgqjeVwtb8      egen_008_age_years                    age_years             
#>  5 eXYhovYyl61      egen_009_age_months                   age_months            
#>  6 UrYJSk2Wp46      egen_010_age_days                     age_days              
#>  7 D1Ky5K7pFN6      egen_011_sex                          sex                   
#>  8 D1Ky5K7pFN6      egen_011_sex                          sex                   
#>  9 D1Ky5K7pFN6      egen_011_sex                          sex                   
#> 10 dTm5R53YYXC      egen_012_pregnancy_status             pregnant              
#> # ℹ 172 more rows
#> # ℹ 8 more variables: data_element_description <chr>,
#> #   data_element_valuetype <chr>, data_element_formname <chr>,
#> #   used_optionset_uid <chr>, option_code <chr>, option_name <chr>,
#> #   option_uid <chr>, option_order_in_set <dbl>

# Now we can use matchmaker to recode the option codes to their labels
dat_clean <- matchmaker::match_df(dat, dict, 
  from  = "option_code",
  to    = "option_name",
  by    = "data_element_shortname",
  order = "option_order_in_set"
)
print(dat_clean)
#> # A tibble: 20 × 45
#>    case_number date_of_consultation_admiss…¹ patient_origin age_years age_months
#>    <chr>       <date>                        <chr>              <int>      <int>
#>  1 A1          2018-03-23                    Village D             18         NA
#>  2 A2          2018-01-29                    Village B             27         NA
#>  3 A3          2018-01-23                    Village D             40         NA
#>  4 A4          2018-02-02                    Village D             15         NA
#>  5 A5          2018-01-23                    Village B             28         NA
#>  6 A6          2018-01-15                    Village A             40         NA
#>  7 A7          2018-01-04                    Village B             34         NA
#>  8 A8          2018-04-25                    Village D             29         NA
#>  9 A9          2018-04-18                    Village D             45         NA
#> 10 A10         2018-01-22                    Village C             10         NA
#> 11 A11         2018-02-06                    Village A             19         NA
#> 12 A12         2018-03-03                    Village D             61         NA
#> 13 A13         2018-01-08                    Village B             20         NA
#> 14 A14         2018-03-08                    Village C             73         NA
#> 15 A15         2018-03-08                    Village C             66         NA
#> 16 A16         2018-03-18                    Village A             52         NA
#> 17 A17         2018-04-03                    Village A             59         NA
#> 18 A18         2018-01-12                    Village B             24         NA
#> 19 A19         2018-01-19                    Village D             53         NA
#> 20 A20         2018-03-07                    Village A              7         NA
#> # ℹ abbreviated name: ¹date_of_consultation_admission
#> # ℹ 40 more variables: age_days <int>, sex <fct>, pregnant <fct>,
#> #   trimester <fct>, foetus_alive_at_admission <fct>, exit_status <fct>,
#> #   date_of_exit <date>, time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> #   previously_vaccinated <fct>, previous_vaccine_doses_received <fct>,
#> #   readmission <fct>, msf_involvement <fct>,
#> #   cholera_treatment_facility_type <fct>, residential_status_brief <fct>, …
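The recoding performed by the match_df() call can be approximated in base R to show what it does. This is a simplified, hypothetical analogue rather than matchmaker's implementation: `toy_recode()`, `toy_dict` and `toy_dat` are invented for illustration. The columns it uses correspond to the from, to, by and order arguments above (option_code, option_name, data_element_shortname and option_order_in_set). In this sketch, any code not found in the dictionary simply becomes NA.

```r
# Hypothetical toy dictionary (long layout, invented codes and labels).
toy_dict <- data.frame(
  data_element_shortname = c("sex", "sex", "sex"),
  option_code            = c("M", "F", "U"),
  option_name            = c("Male", "Female", "Unknown"),
  option_order_in_set    = c(1, 2, 3),
  stringsAsFactors       = FALSE
)

# Hypothetical analogue of the match_df() call: for every column named in the
# dictionary, swap codes for labels and order the factor levels by the
# dictionary's option order. Codes missing from the dictionary become NA.
toy_recode <- function(dat, dict) {
  coded <- dict[!is.na(dict$option_code), ]
  for (v in intersect(names(dat), coded$data_element_shortname)) {
    key <- coded[coded$data_element_shortname == v, ]
    key <- key[order(key$option_order_in_set), ]
    labels <- key$option_name[match(as.character(dat[[v]]), key$option_code)]
    dat[[v]] <- factor(labels, levels = key$option_name)
  }
  dat
}

toy_dat <- data.frame(
  case_number = c("A1", "A2", "A3"),
  sex         = c("F", "U", "M"),
  stringsAsFactors = FALSE
)
clean <- toy_recode(toy_dat, toy_dict)
clean
```

Columns that do not appear in the dictionary (here `case_number`) are returned unchanged.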
