skimr

The goal of skimr is to provide a frictionless approach to working with summary statistics iteratively and interactively as part of a pipeline, one that conforms to the principle of least surprise.

skimr provides summary statistics that you can skim quickly to understand your data and see what may be missing. It handles different data types (numerics, factors, etc.) and returns a skimr object that can be piped or displayed nicely for the human reader.

See our blog post here.

Installation

# install.packages("devtools")
devtools::install_github("hadley/colformat")
devtools::install_github("ropenscilabs/skimr")

Skim statistics in the console

adds missing, complete, n, and sd
reports numeric/int/double separately from factor/chr
handles dates and logicals
uses Hadley Wickham's colformat package, specifically colformat::spark_bar()
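
The headline statistics in the list above can be approximated in base R. This is a minimal sketch and not skimr's own implementation: the helper name `basic_stats` is hypothetical, and it assumes that `n` counts every element, including missing ones.

```r
# Hypothetical helper: the four headline statistics for one numeric vector.
basic_stats <- function(x) {
  c(
    missing  = sum(is.na(x)),       # number of NA values
    complete = sum(!is.na(x)),      # number of non-NA values
    n        = length(x),           # total length, NAs included (assumption)
    sd       = sd(x, na.rm = TRUE)  # standard deviation of the non-NA values
  )
}

x <- c(1, 2, NA, 4)
basic_stats(x)
```

Returning a named numeric vector keeps the sketch small; skimr itself returns a richer object.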

Nicely separates numeric and factor variables:

Many numeric variables:

Another example:

skim_df object (long format)

By default skim prints beautifully in the console, but it also produces a long, tidy-format skim_df object that can be computed on.

a <-  skim(chickwts)
dim(a)
# [1] 22  5
View(a)
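
To illustrate what "long, tidy format" means here, the following base R sketch builds a similar one-row-per-statistic table by hand. It is an approximation under stated assumptions, not the way skimr constructs its skim_df: the helper name `long_stats` is hypothetical, only numeric columns are handled, and the `level` column is left out.

```r
# Hypothetical helper: one row per (variable, statistic) pair.
long_stats <- function(df) {
  # Keep only the numeric columns for this sketch.
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  rows <- lapply(numeric_cols, function(v) {
    x <- df[[v]]
    data.frame(
      var   = v,
      type  = class(x)[1],
      stat  = c("missing", "complete", "n", "sd"),
      value = c(sum(is.na(x)), sum(!is.na(x)), length(x), sd(x, na.rm = TRUE)),
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-variable pieces into a single long data frame.
  do.call(rbind, rows)
}

long <- long_stats(mtcars)
# The result is an ordinary data frame, so it can be subset like one.
long[long$stat == "sd", ]
```

Because every statistic lives in its own row, picking out one statistic across all variables is a plain row filter.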

Compute on the full skim_df object

> skim(mtcars) %>% filter(stat=="hist")
# A tibble: 11 × 5
     var    type  stat      level value
   <chr>   <chr> <chr>      <chr> <dbl>
1    mpg numeric  hist ▂▅▇▇▇▃▁▁▂▂     0
2    cyl numeric  hist ▆▁▁▁▃▁▁▁▁▇     0
3   disp numeric  hist ▇▇▅▁▁▇▃▂▁▃     0
4     hp numeric  hist ▆▆▇▂▇▂▃▁▁▁     0
5   drat numeric  hist ▃▇▂▂▃▆▅▁▁▁     0
6     wt numeric  hist ▂▂▂▂▇▆▁▁▁▂     0
7   qsec numeric  hist ▂▃▇▇▇▅▅▁▁▁     0
8     vs numeric  hist ▇▁▁▁▁▁▁▁▁▆     0
9     am numeric  hist ▇▁▁▁▁▁▁▁▁▆     0
10  gear numeric  hist ▇▁▁▁▆▁▁▁▁▂     0
11  carb numeric  hist ▆▇▂▁▇▁▁▁▁▁     0

Works with strings!

Specify your own statistics

funs <- list(iqr = IQR,
  quantile = purrr::partial(quantile, probs = .99))
skim_with(numeric = funs, append = FALSE)
skim_v(iris$Sepal.Length)

#  A tibble: 2 × 4
#      type     stat level value
#     <chr>    <chr> <chr> <dbl>
# 1 numeric      iqr  .all   1.3
# 2 numeric quantile   99%   7.7
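
The custom statistics passed to skim_with() are ordinary R functions, so they can be checked on their own before being registered. The sketch below is a base R stand-in that replaces purrr::partial() with an anonymous function; it does not call skimr at all.

```r
# The same two statistics, written without purrr.
funs <- list(
  iqr      = IQR,
  quantile = function(x) quantile(x, probs = 0.99)
)

# Apply each function to the column and drop quantile()'s "99%" name.
results <- vapply(funs, function(f) unname(f(iris$Sepal.Length)), numeric(1))
results
```

The two values should agree with the `value` column in the output shown above (1.3 and 7.7).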

Limitations of current version

Currently the print methods are still in early stages of development. Printing is limited to numeric, character, and factor data types. Therefore, although skim() and skim_v() support additional types, those types will not display with the default printing. To see them, view and manipulate the skim object directly.

At the moment, in addition to the three types with print support, the complex, logical, Date, POSIXct, and ts classes are supported with skim_v methods, and their results are in the skim object.

We are also aware that both print.skim and print.data.frame (used for the skim object) do not handle significant digits correctly.

Windows support for spark histograms

Windows cannot print the spark-histogram characters when printing a data frame. For example, "▂▅▇" is printed as "<U+2582><U+2585><U+2587>". This longstanding problem originates in the low-level code for printing data frames. One workaround for showing these characters on Windows is to set the CTYPE part of your locale to Chinese/Japanese/Korean with Sys.setlocale("LC_CTYPE", "Chinese"). The characters do display correctly by default when a data frame created by skim() is printed as a list (as.list()) or as a matrix (as.matrix()).
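
The workaround and the two alternative print paths can be tried together as follows. This is a sketch only: the `sparks` data frame is a hypothetical stand-in for a skim() result, and whether the characters render still depends on your console and font.

```r
# Only change the locale on Windows; elsewhere the characters print as-is.
if (.Platform$OS.type == "windows") {
  old_ctype <- Sys.getlocale("LC_CTYPE")  # remember the current setting
  Sys.setlocale("LC_CTYPE", "Chinese")    # workaround described above
}

# Hypothetical stand-in for a skim result holding spark-histogram characters.
sparks <- data.frame(hist = "\u2582\u2585\u2587", stringsAsFactors = FALSE)

print(as.list(sparks))    # list printing
print(as.matrix(sparks))  # matrix printing
```

On Windows, calling Sys.setlocale("LC_CTYPE", old_ctype) afterwards restores the previous setting.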

Contributing

We welcome issue reports and pull requests, including those that add support for different variable classes.
