An issue with an associated common check originally discussed in #15015
EDIT: here is a report of statistical worst offenders:
https://github.com/snath-xoc/sample-weight-audit-nondet/blob/main/reports/sklearn_estimators_sample_weight_audit_report.ipynb
This is a pretty simple sample_weight test that says that a weight of 0 is equivalent to not having the samples.
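As a rough illustration of the property described above, the following sketch compares a fit that uses zero weights with a fit on the data with those samples removed. The choice of `LinearRegression`, the synthetic data, and the variable names are assumptions made for this example only. This is not the actual common check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, well-conditioned regression data (assumption for illustration).
rng = np.random.RandomState(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=30)

# Give zero weight to the first 10 samples and unit weight to the rest.
sample_weight = np.ones(30)
sample_weight[:10] = 0.0

# Fit once with the zero weights, once with those samples dropped.
reg_zero_weight = LinearRegression().fit(X, y, sample_weight=sample_weight)
reg_dropped = LinearRegression().fit(X[10:], y[10:])

# Both fits should agree up to floating point error.
coef_match = np.allclose(reg_zero_weight.coef_, reg_dropped.coef_)
intercept_match = np.allclose(reg_zero_weight.intercept_, reg_dropped.intercept_)
print(coef_match, intercept_match)
```

An estimator for which this comparison fails on well-conditioned data would be a candidate for the kind of bug tracked here.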
I think every failure here should be considered a bug. The per-estimator status notes are listed further down.
Some estimators have a stochastic fit, so their handling of sample weights cannot be verified with `check_sample_weight_equivalence` and requires a statistical test instead
(some might have been fixed since, need to check).
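A statistical comparison for a stochastic estimator could be sketched as follows: fit many times with different seeds, once with weights and once with repeated rows, then compare the two distributions of a scalar prediction. Everything specific here is an assumption for illustration and not the methodology of the audit report linked above: the use of `RandomForestRegressor`, the two-sample Kolmogorov-Smirnov test, the single query point, the number of seeds, and the helper name `predictions_over_seeds`. No particular outcome is implied for this estimator.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(40, 3))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=40)

# Integer weights and the equivalent dataset with repeated rows.
sample_weight = rng.randint(0, 4, size=40)
X_rep = np.repeat(X, sample_weight, axis=0)
y_rep = np.repeat(y, sample_weight)

X_test = np.zeros((1, 3))  # single fixed query point (arbitrary choice)


def predictions_over_seeds(X_fit, y_fit, weights, seed_offset, n_seeds=30):
    """Hypothetical helper: one prediction at X_test per random seed."""
    preds = []
    for seed in range(seed_offset, seed_offset + n_seeds):
        est = RandomForestRegressor(n_estimators=5, random_state=seed)
        est.fit(X_fit, y_fit, sample_weight=weights)
        preds.append(est.predict(X_test)[0])
    return np.asarray(preds)


# Disjoint seed ranges keep the two sets of fits independent.
preds_weighted = predictions_over_seeds(X, y, sample_weight, seed_offset=0)
preds_repeated = predictions_over_seeds(X_rep, y_rep, None, seed_offset=1000)

# A small p-value suggests the two fitting procedures differ in distribution.
result = ks_2samp(preds_weighted, preds_repeated)
print(f"KS statistic={result.statistic:.3f}, p-value={result.pvalue:.3f}")
```

A single scalar prediction and a few dozen seeds give a weak test. A real audit would likely need more repetitions and several test statistics.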
The required sample weight invariance properties (including the behavior of sw=0) were also discussed in #15657
EDIT: the expected `sample_weight` semantics have since been stated more generally in the refactoring of `check_sample_weight_invariance` into `check_sample_weight_equivalence`, which checks that fitting with integer sample weights is equivalent to fitting with repeated data points, where the number of repetitions matches the original weights.
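The weight/repetition equivalence can be illustrated with a minimal sketch, assuming a deterministic estimator. `Ridge` with the `cholesky` solver, the synthetic data, and the variable names are choices made for this example and are not taken from the common check itself.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=30)

# Integer weights in {0, 1, 2, 3}; a weight of 0 drops the sample entirely.
sample_weight = rng.randint(0, 4, size=30)

# Build the dataset where each row is repeated as many times as its weight.
X_repeated = np.repeat(X, sample_weight, axis=0)
y_repeated = np.repeat(y, sample_weight)

ridge_weighted = Ridge(alpha=1.0, solver="cholesky").fit(
    X, y, sample_weight=sample_weight
)
ridge_repeated = Ridge(alpha=1.0, solver="cholesky").fit(X_repeated, y_repeated)

# Both fits should agree up to floating point error.
coef_match = np.allclose(ridge_weighted.coef_, ridge_repeated.coef_)
intercept_match = np.allclose(ridge_weighted.intercept_, ridge_repeated.intercept_)
print(coef_match, intercept_match)
```

Iterative solvers may need a tighter `tol` before such a comparison agrees to this precision.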
- `CategoricalNB().__sklearn_tags__.input_tags.categorical` to `True` #31556
- `fit` is deterministic or not with the config used in `check_sample_weight_equivalence`: to be investigated.
- `check_sample_weight_equivalence` fails even when subsampling for binning is not enabled.
- `sample_weight` aware but can then be only properly tested with a statistical test instead of `check_sample_weight_equivalence`
- `LinearRegression`'s numerical stability on rank deficient data by setting the `cond` parameter in the call to `scipy.linalg.lstsq` #30040
- `liblinear`
- `liblinear`
- `lbfgs` causes `check_sample_weight_equivalence` to fail (slightly)
- `liblinear` with `C=0.01` causes `check_sample_weight_equivalence` to fail (slightly)
- `check_sample_weight_equivalence` now passes for this estimator after lowering the `tol` value for `lsqr` and `sparse-cg` in the per-check params.
- `check_sample_weight_equivalence` now passes for this estimator after lowering the `tol` value for `lsqr` and `sparse-cg` in the per-check params.
- `cv` (which is the case in `check_sample_weight_equivalence`) or `scoring` params.
- `check_sample_weight_equivalence` fails with `probability=False`: to be investigated
- `probability=True` as the weights are not propagated to the internal CV implemented in libsvm
- `sample_weight` to their scorer by default: FIX Forward sample weight to the scorer in grid search #30743.
- `sample_weight` in general: SLEP006: default routing #26179

Estimators with a stochastic fit, which require a statistical test instead of `check_sample_weight_equivalence`:

- `KBinsDiscretizer` #29907
- `n_samples` and uniform or quantile strategies, the fit is deterministic.