Use _check_sample_weight to consistently validate sample_weight #15358
Comments
@NicolasHug I could give it a try. Furthermore, should …
I think for the above-mentioned estimators @NicolasHug intended this as an easier refactoring issue for new contributors, but if you want to look into it feel free to open PRs.
@lorentzenchr Your expertise would certainly be appreciated there. As you mention in #15438, there is definitely work to be done on improving …
There are some use cases where it is useful (see #12464 (comment)), but in most cases it would indeed make sense to error on them. In the linked issues it was suggested to maybe enable this check but then allow it to be disabled with a global flag in …
In that case, I will focus on …
Thanks @lorentzenchr. BTW @rth maybe we should add a …
Should sample_weights be made an array in all cases? I feel like we have shortcuts if it's … Also: negative sample weights used to be allowed in tree-based models, not sure if they still are.
No, hence my comment above.
Oh sorry, didn't see that ;)
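A hedged aside on the question above about making `sample_weight` an array in all cases: `_check_sample_weight` materializes an array of ones when `sample_weight` is None, whereas the common shortcut keeps a `None` fast path. A minimal sketch of the trade-off (the helper function below is hypothetical, just to show the pattern):

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight

X = np.arange(12.0).reshape(6, 2)

# _check_sample_weight always returns an array, even for None:
w = _check_sample_weight(None, X)
print(w)  # [1. 1. 1. 1. 1. 1.]

# whereas the shortcut skips the weighted code path entirely:
def column_mean(X, sample_weight=None):
    if sample_weight is None:
        return X[:, 0].mean()  # unweighted fast path, no ones array allocated
    sample_weight = _check_sample_weight(sample_weight, X)
    return np.average(X[:, 0], weights=sample_weight)
```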
@fbchow and I will pick up the fix for BaseDecisionTree for the scikit-learn sprint.
@fbchow we will do DBSCAN for the wimlds scikit-learn sprint (pair programming @akeshavan).
Working on BaseBagging for the wimlds sprint with Honglu Zhang (@ritalulu).
@MDouriez and I will work on the GaussianNB one.
Working on BaseGradientBoosting for the wimlds sprint (pair programming @akeshavan).
Working on BaseForest for the wimlds sprint (pair programming @lakrish).
Often the check is within a …
@fbchow and I will move on to DummyClassifier.
[MRG] Fixed DummyRegressor
Working on the IsotonicRegression one (notice that the isotonic_regression function doesn't have a validation step for sample_weight, so no need to work on that one).
Picking up KernelDensity with @fbchow.
In case people are wondering, the ones left up for grabs are …
We now need a GitHub feature to prevent an issue from being closed by PRs...
@NicolasHug I've updated the list of the estimators already generalized: I think the issue will be more readable if the list is at the top of the page. Do you think you could find a minute to update it? Thanks a lot!
Done, thanks @cmarmo
Can take KernelRidge.
We'd like to try to tackle this one for BaseDiscreteNB. @gelavizh1 @lschwetlick #ScikitLearnSprint
Now working with @gelavizh1 and @lschwetlick on IsotonicRegression.
@jeremiedbb it looks like everything has been addressed? We can open a specific issue for the linear models. BTW it looks like there was already an open PR for IsotonicRegression...
Yes, I guess it was overtaken a bit quickly. I'm closing it.
We recently introduced `utils.validation._check_sample_weight`, which returns a validated `sample_weight` array.

We should use it consistently throughout the code base, instead of relying on custom and ad hoc checks like `check_consistent_length` or `check_array` (which are now handled by `_check_sample_weight`).

Here's a list of the estimators/functions that could make use of it (mostly in `fit` or `partial_fit`):

(I left out the linear_model module because it seems more involved there.)
Could be a decent sprint issue @amueller ?
To know where a given class is defined, use e.g. `git grep -n "class DBSCAN"`.
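For reference, a rough sketch of the intended pattern inside an estimator's `fit`. The estimator and attribute names below are made up for illustration; only `_check_sample_weight`, `check_array`, and `check_consistent_length` are real helpers:

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight

class ToyWeightedMean:
    """Toy estimator illustrating the sample_weight validation pattern."""

    def fit(self, X, y, sample_weight=None):
        X, y = np.asarray(X), np.asarray(y)

        # Before: ad hoc checks scattered across estimators, e.g.
        #   if sample_weight is not None:
        #       sample_weight = check_array(sample_weight, ensure_2d=False)
        #       check_consistent_length(y, sample_weight)

        # After: one helper that validates shape and dtype, raises on a
        # length mismatch, and returns an array of ones when None.
        sample_weight = _check_sample_weight(sample_weight, X)

        self.mean_ = np.average(y, weights=sample_weight)
        return self
```

With this, `ToyWeightedMean().fit(X, y)` and `ToyWeightedMean().fit(X, y, sample_weight=w)` go through the same validated code path.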