grouping sets

Some keywords: GROUPING SETS, ROLLUP, CUBE, GROUPING
Some references: [postgres](http://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-GROUPING-SETS), [Oracle](https://oracle-base.com/articles/misc/rollup-cube-grouping-functions-and-grouping-sets), [SQL Server](https://technet.microsoft.com/en-us/library/bb510427%28v=sql.105%29.aspx), [groupings combined with arbitrary functions](http://www.databasejournal.com/features/mssql/using-the-rollup-cube-and-grouping-sets-operators.html)

_Grouping sets_ and related operators are useful for pre-calculating several aggregation levels at once, which is a common need. The current API for this in data.table is not very friendly; see [Aggregating sub totals and grand totals with data.table](http://stackoverflow.com/q/9315258/2490497).

For _rollup_, the aggregation levels are the provided `by` columns taken from the most detailed level (all columns) up to the grand total (no columns). See the description from the PostgreSQL manual, and the example code below.

```
ROLLUP ( e1, e2, e3, ... )
```

is equivalent to:

```
GROUPING SETS (
    ( e1, e2, e3, ... ),
    ...
    ( e1, e2 ),
    ( e1 ),
    ( )
)
```
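The expansion above can be sketched in a few lines of base R. `rollup_sets` is a hypothetical helper name used only for illustration; it is not part of data.table, and it only builds the list of grouping sets without computing any aggregates.

``` r
# Hypothetical helper: expand a ROLLUP over `by` into its grouping sets,
# from the full set of columns down to the empty set (grand total).
rollup_sets <- function(by) {
    lapply(length(by):0, function(n) by[seq_len(n)])
}

sets <- rollup_sets(c("e1", "e2", "e3"))
str(sets)
```

Each element of `sets` is one line of the `GROUPING SETS ( ... )` form, with `character(0)` standing in for `( )`.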

I wonder whether this process could be sped up cheaply, since it is a potentially heavy computation. It would be great to have the computation of _grouping sets_ implemented in C, so that _rollup_, _cube_ and other features could be built on top of _grouping sets_ more easily in R while still running at full speed.
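To make the layering concrete, here is a minimal base R sketch of a generic grouping-sets aggregation on which a rollup is just a particular choice of `sets`. The function name `grouping_sets_mean` and its arguments are assumptions made for illustration; this is a slow reference version, not the C implementation wished for above and not data.table's API.

``` r
# Hypothetical reference implementation: mean of `value` for each grouping set,
# stacked into one data.frame. Columns absent from a set are filled with NA.
grouping_sets_mean <- function(df, value, by, sets) {
    pieces <- lapply(sets, function(set) {
        if (length(set) == 0L) {
            # empty set: grand total
            out <- data.frame(agg = mean(df[[value]]))
        } else {
            out <- aggregate(df[value], by = df[set], FUN = mean)
            names(out)[names(out) == value] <- "agg"
        }
        for (col in setdiff(by, set)) out[[col]] <- NA
        out[c(by, "agg")]
    })
    do.call(rbind, pieces)
}

by <- c("vs", "am")
# a rollup over `by` is the sets (vs, am), (vs), ()
res <- grouping_sets_mean(mtcars, "mpg", by,
                          sets = list(c("vs", "am"), "vs", character(0)))
res
```

Passing all subsets of `by` as `sets` would give the _cube_ in the same way, which is why a single fast _grouping sets_ primitive is enough.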

---

Answers to update when closed:
- [ ] https://stackoverflow.com/questions/21366138/multi-level-aggregations-like-grouping-sets-via-ddply-or-other-r-function

``` r
library(plyr)
grp.cols <- c("vs", "am", "gear", "carb", "cyl")
# one ddply aggregation per prefix of grp.cols, stacked with NA fill
plyr.r = do.call(
    rbind.fill,
    lapply(seq_along(grp.cols), function(x) ddply(mtcars, grp.cols[1:x], summarize, agg=mean(mpg)))
)

library(data.table) # 1.9.7+
dt.r = rollup(as.data.table(mtcars), j = .(agg=mean(mpg)), by=grp.cols)
all.equal(
    as.data.table(plyr.r),
    dt.r[-.N], # exclude grand total, not present in BrodieG answer
    ignore.row.order = TRUE,
    ignore.col.order = TRUE
)
#[1] TRUE
# install.packages("data.table", type = "source", repos = "https://Rdatatable.github.io/data.table")
```
- [x] https://stackoverflow.com/questions/9315258/aggregating-sub-totals-and-grand-totals-with-data-table/

``` r
library(data.table)
set.seed(1)
DT = data.table(
    group=sample(letters[1:2],100,replace=TRUE), 
    year=sample(2010:2012,100,replace=TRUE),
    v=runif(100))

cube(DT, mean(v), by=c("group","year"))
#    group year        V1
#1:     a 2011 0.4176346
#2:     b 2010 0.5231845
#3:     b 2012 0.4306871
#4:     b 2011 0.4997119
#5:     a 2012 0.4227796
#6:     a 2010 0.2926945
#7:    NA 2011 0.4463616
#8:    NA 2010 0.4278093
#9:    NA 2012 0.4271160
#10:     a   NA 0.3901875
#11:     b   NA 0.4835788
#12:    NA   NA 0.4350153
cube(DT, mean(v), by=c("group","year"), id=TRUE)
#    grouping group year        V1
#1:        0     a 2011 0.4176346
#2:        0     b 2010 0.5231845
#3:        0     b 2012 0.4306871
#4:        0     b 2011 0.4997119
#5:        0     a 2012 0.4227796
#6:        0     a 2010 0.2926945
#7:        2    NA 2011 0.4463616
#8:        2    NA 2010 0.4278093
#9:        2    NA 2012 0.4271160
#10:        1     a   NA 0.3901875
#11:        1     b   NA 0.4835788
#12:        3    NA   NA 0.4350153

# install.packages("data.table", type = "source", repos = "https://Rdatatable.github.io/data.table")
```
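The `grouping` column in the `id=TRUE` output above can be read as a bit mask over the `by` columns: a bit is set when that column was aggregated away, with the first `by` column as the most significant bit. The sketch below reproduces the values 0, 1, 2 and 3 seen in that output; `grouping_id` is a hypothetical helper written for illustration and is an assumption about the encoding inferred from the output, not data.table's internal code.

``` r
# Hypothetical helper: bit mask for one grouping set.
# A column of `by` that is NOT in `set` contributes its bit.
grouping_id <- function(by, set) {
    aggregated <- !(by %in% set)
    weights <- 2^(rev(seq_along(by)) - 1L)  # first column = highest bit
    sum(aggregated * weights)
}

by <- c("group", "year")
grouping_id(by, c("group", "year"))  # both columns kept
grouping_id(by, "group")             # year aggregated away
grouping_id(by, "year")              # group aggregated away
grouping_id(by, character(0))        # grand total
```

The four calls correspond to the detail rows, the per-group rows, the per-year rows and the grand total row of the `cube` result.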

Some other questions could also get new answers:  
- [ ] https://stackoverflow.com/questions/20918619/nested-table-within-column-sub-group-totals-frequencies-and-percentages-using
- [ ] https://stackoverflow.com/questions/10956300/grouping-and-sorting-in-r/10956655#10956655
- [ ] https://stackoverflow.com/questions/5982546/r-calculating-column-sums-row-sums-as-an-aggregation-from-a-dataframe
- [ ] https://stackoverflow.com/questions/14242409/use-plyr-to-compute-margins
- [ ] https://stackoverflow.com/questions/2566766/margin-totals-in-xtabs
- [ ] https://stackoverflow.com/questions/12445574/subtotals-in-columns-using-reshape2
- [x] http://stackoverflow.com/questions/36169073/how-to-do-group-by-rollup-in-r-like-sql


grouping sets #1377
