Describe the workflow you want to enable
I'd like to propose native categorical support for linear models, similar to #18394, i.e. ordinal-encoded columns can be specified as "categorical". I see 3 main benefits:

- Possibly better user experience, because one-hot encoding becomes obsolete (though ordinal encoding is still required as long as `pandas.Categorical` is unsupported).
- A possible speed-up of fitting, as the orthogonal design of categoricals could be exploited.
- A possibly smaller memory footprint, in particular in combination with dense numerical features.
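To make the first benefit concrete, here is a sketch of the workflow this would enable. The "today" pipeline uses only existing scikit-learn API; the `categorical_features` parameter on `PoissonRegressor` in the commented-out part is hypothetical and only illustrates the proposed usage:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import PoissonRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

X = pd.DataFrame({"color": ["a", "b", "a", "c"], "x1": [1.0, 2.0, 3.0, 4.0]})
y = [1.0, 3.0, 2.0, 5.0]

# Today: the categorical column has to be one-hot-encoded explicitly.
current = make_pipeline(
    make_column_transformer((OneHotEncoder(), ["color"]), remainder="passthrough"),
    PoissonRegressor(),
).fit(X, y)

# Proposed (hypothetical parameter, mirroring HistGradientBoosting*):
# pass ordinal-encoded columns and mark them as categorical.
# proposed = make_pipeline(
#     make_column_transformer((OrdinalEncoder(), ["color"]), remainder="passthrough"),
#     PoissonRegressor(categorical_features=[0]),
# ).fit(X, y)
```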
Describe your proposed solution
Add a new parameter `categorical_features` indicating which columns to treat as categorical (ordinal-encoded), as in #18394.
Then, add this functionality via a coordinate descent and/or Newton-Cholesky solver:
Internally, proceed as if categoricals were one-hot-encoded (as is done for the multiclass targets in HGBT, cf. code here) and exploit the following structure:
For a feature (sub-)matrix X of a single one-hot-encoded feature and a diagonal weight matrix W, it holds that `X.T @ W @ X` is diagonal.
Thus, coordinate descent could loop in parallel over all levels/categories of this feature X, i.e. perform a parallelized block update (see the sketch below).
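A minimal NumPy sketch of this structure (purely illustrative, not tied to any scikit-learn internals): for a single one-hot-encoded feature, `X.T @ W @ X` reduces to a per-level weighted count.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_levels = 10, 4

# A single ordinal-encoded categorical feature with 4 levels.
codes = rng.integers(0, n_levels, size=n_samples)

# Internally, treat the feature as if it were one-hot-encoded.
X = np.eye(n_levels)[codes]

# A diagonal weight matrix W (e.g. GLM working weights or sample weights).
w = rng.uniform(0.5, 2.0, size=n_samples)
W = np.diag(w)

# Each row of X has exactly one non-zero entry, so different levels never
# interact and X.T @ W @ X is diagonal.
G = X.T @ W @ X
assert np.allclose(G, np.diag(np.diag(G)))

# The diagonal is just a weighted count per level, i.e. a bincount.
assert np.allclose(np.diag(G), np.bincount(codes, weights=w, minlength=n_levels))
```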
Estimators
If only the existing coordinate descent solver is modified, then only squared-error-based estimators, i.e. `ElasticNet` and `Lasso`, would profit.
If a new or extended coordinate descent solver is OK, then several GLMs would also get native categorical support, e.g. `LogisticRegression`, `PoissonRegressor`, `TweedieRegressor`, etc. See also #16637.
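To illustrate how the parallelized block update mentioned above could look in the squared-error case (`ElasticNet`/`Lasso`), here is a rough, standalone sketch of one block update under a weighted squared-error objective with an L1 penalty. The function and its signature are purely illustrative and not part of scikit-learn:

```python
import numpy as np

def lasso_block_update(codes, w, residual, beta_block, l1):
    """One coordinate-descent block update for a single one-hot-encoded
    categorical feature under a weighted squared-error objective with an
    L1 penalty (illustrative sketch only, not scikit-learn code).

    Because the levels of one feature occupy disjoint rows, the block
    Hessian X_block.T @ W @ X_block is diagonal and every level can be
    updated in one vectorized (or parallel) step.
    """
    n_levels = beta_block.shape[0]
    # Partial residual: add this block's current contribution back in.
    r = residual + beta_block[codes]
    # Diagonal of X_block.T @ W @ X_block: weighted count per level.
    h = np.bincount(codes, weights=w, minlength=n_levels)
    # X_block.T @ W @ r: weighted residual sum per level.
    g = np.bincount(codes, weights=w * r, minlength=n_levels)
    # Soft-thresholding, vectorized over all levels of this feature.
    beta_new = np.sign(g) * np.maximum(np.abs(g) - l1, 0.0) / np.maximum(h, 1e-12)
    # Refresh the residual with the updated block coefficients.
    return beta_new, r - beta_new[codes]
```

The simultaneous update over all levels is only valid because the levels of one feature never share a row; for a general dense feature block the Hessian is not diagonal, so this shortcut is specific to categoricals.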