option to "flatten" backends #942

@mb706

Apparently we get some overhead when mlr3pipelines builds tasks with many BackendCbinds. One way to fix this would be if there were an option to "flatten" cbinded tasks.
Suggested interface:

Task$flatten(force = FALSE)  # default

This creates a task with a single DataBackendDataTable, unless flattening is a bad idea for some reason, e.g. when one of the backends is a database backend. (A DataBackend class would need to report whether flattening is a "bad idea", possibly through an active binding; a database backend could, for example, report that flattening is okay if its size is less than X MB.)

Setting force = TRUE, on the other hand, should always flatten the task, which is equivalent to creating a new task from task$data().
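For reference, the force = TRUE behaviour can already be approximated by hand with the existing mlr3 API. The following is only a sketch of that equivalence, not the proposed $flatten() method: it uses the built-in iris task and two made-up extra columns (x1, x2), and it assumes that losing the original row ids and any non-feature, non-target column roles is acceptable, since task$data() does not carry them over.

```r
library(mlr3)

# Build a task whose backend is a nested DataBackendCbind by
# cbinding two extra (made-up) columns, one after the other.
task = tsk("iris")
task$cbind(data.frame(x1 = seq_len(task$nrow)))
task$cbind(data.frame(x2 = rev(seq_len(task$nrow))))
stopifnot(inherits(task$backend, "DataBackendCbind"))

# Manual "force = TRUE" flatten: materialise the data and build a fresh
# task on top of it. The new task has a single DataBackendDataTable.
# Note: task$data() only returns target and feature columns, and the
# new task gets fresh row ids.
flat = as_task_classif(task$data(), target = task$target_names, id = task$id)
stopifnot(inherits(flat$backend, "DataBackendDataTable"))

print(class(task$backend)[1])
print(class(flat$backend)[1])
```

Because this creates a new task instead of swapping the backend in place, it corresponds to only one of the two variants discussed further down.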

Example: a TaskClassif consisting of two cbinded data.tables that were in turn cbinded with a database backend
(abbreviating DataBackend as DB):

                TaskClassif
                  |
               DBCbind
              /       \
         DBCbind      DBDataBase
        /       \ 
 DBDataTable DBDataTable

$flatten(force = FALSE):

                TaskClassif
                  |
               DBCbind
              /       \
     DBDataTable      DBDataBase

$flatten(force = TRUE):

               TaskClassif
                  |
               DBDataTable

We could consider whether mlr3pipelines should do this to all of its output tasks by default.

Another question is whether this should be an in-place operation that swaps out a task's data backend, or whether it should create a new task.

A further question is what to do with columns that do not have any column role. A good default might be to drop backends that provide no columns with a role (and are therefore ignored in many cases).

Maybe we would want a DataBackendMultiCBind that can cbind multiple sources, so that even a task with many different database backends would be at most one level deep after flattening. The $flatten(force = FALSE) operation would have to check, for each column, whether it comes from a data backend that reports it does not want to be flattened. There should be a method in DataBackend that does this recursively. $flatten() would then construct the desired DataBackendMultiCBind.
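To make the recursive idea concrete, here is a toy model in base R. It is purely illustrative: plain lists stand in for DataBackend objects, and leaf(), cbind_node(), collect_leaves(), flatten_backend() and the flattenable flag are hypothetical names, not part of mlr3. It also ignores column roles and the question of which backend wins when two backends provide the same column.

```r
# Toy model only: plain lists stand in for DataBackend objects.
# All helpers below are hypothetical and not part of mlr3.
leaf = function(name, cols, flattenable = TRUE) {
  list(kind = "leaf", name = name, cols = cols, flattenable = flattenable)
}

cbind_node = function(left, right) {
  list(kind = "cbind", left = left, right = right)
}

# Recursively gather all leaf backends, left to right.
collect_leaves = function(node) {
  if (node$kind == "leaf") return(list(node))
  c(collect_leaves(node$left), collect_leaves(node$right))
}

# Merge every leaf that agrees to be flattened (or all leaves if
# force = TRUE) into one in-memory leaf; keep the rest as siblings
# in a single-level "multi_cbind" node.
flatten_backend = function(node, force = FALSE) {
  leaves = collect_leaves(node)
  merge_it = vapply(leaves, function(l) force || l$flattenable, logical(1))
  sources = list()
  if (any(merge_it)) {
    cols = unlist(lapply(leaves[merge_it], `[[`, "cols"))
    sources = list(leaf("DataBackendDataTable", cols))
  }
  sources = c(sources, leaves[!merge_it])
  if (length(sources) == 1L) return(sources[[1L]])
  list(kind = "multi_cbind", sources = sources)
}

# The tree from the example above: two data.table leaves plus a database leaf.
tree = cbind_node(
  cbind_node(leaf("dt1", c("a", "b")), leaf("dt2", "c")),
  leaf("db", c("d", "e"), flattenable = FALSE)
)
soft = flatten_backend(tree)                # merged data.table leaf + database leaf
hard = flatten_backend(tree, force = TRUE)  # one leaf holding all columns
```

In this model the result is never more than one level deep, regardless of how many database backends refuse to be flattened, which is the property a DataBackendMultiCBind would be meant to provide.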
