Describe the issue linked to the documentation
The documentation page for the `fit` method of the `LinearRegression` class says that the `sample_weight` parameter must be of type `array_like` or `None`. However, this is not entirely accurate, because a `float` or an `int` can also be passed. A scalar is expanded into an array that repeats the same value `n_samples` times. Code snippet here:
`scikit-learn/sklearn/utils/validation.py`, lines 2000 to 2003 in f59c503:

```python
if sample_weight is None:
    sample_weight = np.ones(n_samples, dtype=dtype)
elif isinstance(sample_weight, numbers.Number):
    sample_weight = np.full(n_samples, sample_weight, dtype=dtype)
```
This means that a `float` or `int` sample weight is effectively equivalent to `None`, since every sample ends up with the same relative weight. (I may be overlooking something, but I could not think of any case where a `float` or `int` for `sample_weight` would be meaningful.)
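To illustrate why a constant weight has no effect, here is a minimal pure-Python sketch. It assumes a one-feature weighted least-squares fit computed in closed form, not `LinearRegression`'s actual solver, and `weighted_linear_fit` is a hypothetical helper. A constant weight `c` multiplies both the numerator and the denominator of the weighted means and of the slope, so (for any positive `c`) it cancels out, while a non-uniform weight array does change the fit.

```python
import math
import numbers


def weighted_linear_fit(x, y, sample_weight=None):
    """Hypothetical helper: weighted least-squares fit of y = slope * x + intercept."""
    n = len(x)
    # Expand sample_weight like the validation.py snippet:
    # None -> all ones, a scalar -> that value repeated n times.
    if sample_weight is None:
        w = [1.0] * n
    elif isinstance(sample_weight, numbers.Number):
        w = [float(sample_weight)] * n
    else:
        w = [float(v) for v in sample_weight]

    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    # A constant weight scales cov and var by the same factor, so it cancels in slope.
    cov = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    var = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return slope, intercept


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 8.8]

fit_none = weighted_linear_fit(x, y)
fit_scalar = weighted_linear_fit(x, y, sample_weight=2.5)
fit_array = weighted_linear_fit(x, y, sample_weight=[1, 1, 1, 1, 10])

# The scalar-weight fit matches the unweighted one up to floating-point rounding.
print(all(math.isclose(a, b) for a, b in zip(fit_none, fit_scalar)))
print(fit_none, fit_array)
```

Only the non-uniform weights in `fit_array` shift the slope and intercept.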
Suggest a potential alternative/fix
I see two possible fixes:
- Change the documentation to state that numbers are valid values for `sample_weight`, but that they have no effect because the relative weight of the samples does not change.
- Change the code so that an error or warning is raised if the `sample_weight` parameter is a `float` or an `int`.
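The second option could look roughly like the following sketch, assuming a warning by default and an opt-in error. `check_sample_weight_not_scalar` and its `error` flag are hypothetical names for illustration only, not part of scikit-learn; a real change would need to be wired into the existing validation path.

```python
import numbers
import warnings


def check_sample_weight_not_scalar(sample_weight, *, error=False):
    """Hypothetical guard: warn about (or reject) a scalar sample_weight."""
    # bool subclasses int, so True/False are also caught as scalars here.
    if isinstance(sample_weight, numbers.Number):
        msg = (
            f"sample_weight={sample_weight!r} is a scalar; every sample gets the "
            "same weight, which is equivalent to sample_weight=None."
        )
        if error:
            raise TypeError(msg)
        warnings.warn(msg, UserWarning)
    # Arrays, lists and None pass through unchanged.
    return sample_weight


# Usage: a scalar triggers a UserWarning; None and array-likes are untouched.
check_sample_weight_not_scalar(2.5)
check_sample_weight_not_scalar([1.0, 2.0, 3.0])
```

Raising an error is stricter but would break any existing code that passes a scalar, which is why a warning (possibly as a deprecation step) may be the gentler first move.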