-
-
Notifications
You must be signed in to change notification settings - Fork 29
Description
problem: quite often, a learner breaks, because it sees SOME prediction in a larger table, which contains new, unseen factor levels. in such a case the predict of the underlying learner fails, completely.
see reprex here:
mlr-org/mlr3#97
this is really annoying. especially as this can happen on only a few observations, but we still 100% fail the complete prediction.
current options are: the mlr3 fallback learner. that does not really help. because this produces now fallback predictions on the complete test set.
here is MAYBE a better option.
PipOpUnseenLevels
before we go into the learner, we can on-training, store which levels are present in each factor.
PipOpUnseenLevels
train: task--stored-levels--->task
predict: task-->stored-levels-->task
train: simply stores a list, one element per factor feature, with the seen level
predict: does through all observations. for each observation where we see "unseen" levels, we create a random row, by sampling from the marginals of the columns.
that is a bit hacky, but should work?