Discretizer #5778
Comments
I think the one-hot encoding makes sense for cross-product features. Ordinal features would be useful to remove noise, but not for cross-product features.
Yes, but we provide tools to transform ordinal into one-hot.
Good point. I think I could live with a tool that produces ordinal features then :)
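A minimal sketch of that conversion, assuming NumPy's digitize for the ordinal binning step and the existing OneHotEncoder for the one-hot expansion (the bin edges here are illustrative, not part of any proposed API):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy continuous feature, binned into three ordinal levels with explicit edges.
X = np.array([[0.1], [0.5], [0.9], [0.4]])
edges = np.array([1 / 3, 2 / 3])      # internal thresholds for K = 3 bins (illustrative)
ordinal = np.digitize(X, edges)       # integer bin index per value, in {0, 1, 2}

# OneHotEncoder then expands the ordinal codes into one-of-K columns.
onehot = OneHotEncoder().fit_transform(ordinal)
print(onehot.toarray())
```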
Implementation at #4801.
#4801 is not an implementation of what I proposed. Maybe we could have a […]
MDLP is optionally supervised, IIRC.
I also opened up #5003 a long while ago.
That's fine, I understand, @jnothman. It seems that @mblondel's discretization description is very similar to a function in R called cut (cut is part of R's standard library). The difference is that the user has to designate where the breaks are placed. If the user passes in an integer […]
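For comparison, a small sketch of that cut-style behaviour in Python using pandas.cut; the break points below are only illustrative:

```python
import pandas as pd

x = pd.Series([0.1, 0.5, 0.9, 0.4])

# Explicit break points, analogous to R's cut(x, breaks = c(0, 0.33, 0.66, 1)).
print(pd.cut(x, bins=[0.0, 0.33, 0.66, 1.0]))

# Passing an integer instead asks for that many equal-width bins over the data range.
print(pd.cut(x, bins=3, labels=False))
```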
Is a PR welcome for this? We can discuss how we would like the class to be structured in the PR.
@hlin117 That would be nice, thanks!
Thanks for the support, @mblondel. I'll work on this PR.
I would start simple (only uniform binning) and add more strategies in other PRs.
Please check the PR in #5825. Thanks!
A discretizer has been merged to a branch (https://github.com/scikit-learn/scikit-learn/tree/discrete). It should be merged to master once some remaining features and an example are added.
Fixed in #9342.
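For later readers: assuming the estimator that came out of the linked PR is sklearn.preprocessing.KBinsDiscretizer (which current scikit-learn releases provide), a minimal usage sketch of the behaviour discussed in this thread looks like:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.array([[0.1], [0.5], [0.9], [0.4]])

# Three equal-width bins per feature, encoded as one-of-K sparse columns.
enc = KBinsDiscretizer(n_bins=3, encode="onehot", strategy="uniform")
print(enc.fit_transform(X).toarray())

# encode="ordinal" returns the integer bin index instead of one-hot columns.
ordi = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
print(ordi.fit_transform(X))
```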
Binarizer transforms continuous values to two states (0 or 1). It would be nice to generalize this to an arbitrary number of states, K. This preprocessor would produce a scipy sparse matrix of shape (n_samples, K * n_features) using one-of-K encoding. The K thresholds could be chosen uniformly between the min and max of each feature, or using the K-quantiles.
For example, using uniformly chosen thresholds, if min=0, max=1.0 and K=3, a feature value between 0 and 0.33 would be encoded as [1, 0, 0], a value between 0.33 and 0.66 as [0, 1, 0] and a value between 0.66 and 1.0 as [0, 0, 1].
My use case is that this encoding might be more meaningful than continuous values when using PolynomialFeatures. Possibly related to #1062.
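A minimal NumPy/SciPy sketch of the proposed encoding, assuming K equal-width bins between each feature's observed min and max (the function and variable names here are purely illustrative):

```python
import numpy as np
from scipy import sparse

def uniform_one_hot(X, K=3):
    """One-of-K encode each feature using K equal-width bins (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    rows, cols = [], []
    for j in range(n_features):
        lo, hi = X[:, j].min(), X[:, j].max()
        edges = np.linspace(lo, hi, K + 1)[1:-1]   # K - 1 internal thresholds
        bins = np.digitize(X[:, j], edges)         # bin index in {0, ..., K - 1}
        rows.extend(range(n_samples))
        cols.extend(j * K + bins)
    data = np.ones(len(rows))
    return sparse.csr_matrix((data, (rows, cols)), shape=(n_samples, K * n_features))

X = np.array([[0.0], [0.5], [1.0]])
print(uniform_one_hot(X, K=3).toarray())
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```

The quantile variant would only change how the edges are computed (e.g. np.percentile of each column instead of np.linspace between its min and max).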