Closed
Description
When I train a model on a task containing a numeric column and then predict on new data where that column is an integer, $predict_newdata() internally creates a task that reports the feature as dbl (numeric), while the data returned by $data() is still integer. In most cases integers and numerics are interchangeable in R, but when calling external code (C, Python) the wrong type can break things. The discrepancy between the reported type and the actual data returned by $data() causes problems for code that relies on the Task's reported type to do feature conversion.
library(mlr3)
library(data.table)
ll <- lrn("classif.debug", save_tasks = TRUE)$train(tsk("iris"))
ll$predict_newdata(data.table(
Sepal.Length = 1L, Sepal.Width = 1L,
Petal.Length = 1L, Petal.Width = 1L
))
#> <PredictionClassif> for 1 observations:
#> row_ids truth response
#> 1 <NA> virginica
ll$model$task_predict
#> <TaskClassif:iris> (1 x 5)
#> * Target: Species
#> * Properties: multiclass
#> * Features (4):
#> - dbl (4): Petal.Length, Petal.Width, Sepal.Length, Sepal.Width
ll$model$task_predict$feature_types # prediction task types reported as 'dbl'
#> id type
#> 1: Petal.Length numeric
#> 2: Petal.Width numeric
#> 3: Sepal.Length numeric
#> 4: Sepal.Width numeric
str(ll$model$task_predict$data()) # actual data: integer type
#> Classes ‘data.table’ and 'data.frame': 1 obs. of 5 variables:
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: NA
#> $ Petal.Length: int 1
#> $ Petal.Width : int 1
#> $ Sepal.Length: int 1
#> $ Sepal.Width : int 1
#> - attr(*, ".internal.selfref")=<externalptr>

I'd say ideally $predict_newdata() should make sure that the given newdata is compatible and, in this case, convert the integer columns to numeric.
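Until something like that exists, the conversion can be sketched on the caller's side. The helper below is hypothetical (`coerce_to_declared_types` is not an mlr3 function), and it uses a plain data.frame as a stand-in for the `feature_types` table printed above, with the same `id` and `type` columns. It only illustrates the idea of aligning integer columns with a declared "numeric" type, not how mlr3 itself would or does implement it.

```r
# Hypothetical helper, not part of mlr3: convert integer columns to double
# wherever the declared feature type is "numeric".
coerce_to_declared_types <- function(newdata, feature_types) {
  numeric_ids <- feature_types$id[feature_types$type == "numeric"]
  for (id in intersect(numeric_ids, names(newdata))) {
    if (is.integer(newdata[[id]])) {
      newdata[[id]] <- as.numeric(newdata[[id]])
    }
  }
  newdata
}

# Stand-in for task$feature_types (assumed layout: columns `id` and `type`).
feature_types <- data.frame(
  id = c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width"),
  type = "numeric",
  stringsAsFactors = FALSE
)

# Same new data as in the example above, with integer columns.
newdata <- data.frame(
  Sepal.Length = 1L, Sepal.Width = 1L,
  Petal.Length = 1L, Petal.Width = 1L
)

fixed <- coerce_to_declared_types(newdata, feature_types)
str(fixed)
```

Passing `fixed` instead of `newdata` to $predict_newdata() would then give the learner double columns that match the reported feature types. The input `newdata` is left unchanged because the helper works on a copy.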