Docs say parameter sample_weight of LinearRegression.fit must be array but number is also valid #28732
Comments
Indeed, PR welcome to fix the documentation.
Ok, will start working on it later today.
Given that providing the same value for all samples is a bit weird, does anyone remember why this is supported in the first place?
I guess we need to edit the code to raise an error and re-run the tests to check whether passing a scalar int or float is used anywhere in our code base, and in particular as part of the public-facing API. If it's not used anywhere, maybe we can deprecate this.
My guess is that [...] Or maybe we leave this undocumented and don't introduce a check. Nothing bad or incorrect happens if you pass an int/float as the weight, it is just unusual. Maybe we need to balance this with helping users who passed an int/float by mistake. For those it would be useful to raise an error.
Following @betatim's point that this technically works but was probably a mistake, an intermediate solution could be to just warn for now: `Warning: a single number {sample_weight} was provided as "sample_weight", each sample will receive the same weight of {sample_weight}.`
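A minimal sketch of how such a warning might look. The helper name `check_sample_weight_sketch` is hypothetical and this is not scikit-learn's actual validation code; it only illustrates expanding a scalar into per-sample weights while emitting the proposed message:

```python
import numbers
import warnings


def check_sample_weight_sketch(sample_weight, n_samples):
    """Hypothetical helper (not scikit-learn's real validation function).

    Returns a list with one weight per sample and warns when a single
    number is passed instead of an array-like.
    """
    if sample_weight is None:
        # No weights given: every sample gets weight 1.
        return [1.0] * n_samples
    if isinstance(sample_weight, numbers.Number):
        # A single number: still accepted, but flagged as unusual.
        warnings.warn(
            f"a single number {sample_weight} was provided as "
            f'"sample_weight", each sample will receive the same weight '
            f"of {sample_weight}.",
            UserWarning,
        )
        return [float(sample_weight)] * n_samples
    # Array-like: convert and check the length.
    weights = [float(w) for w in sample_weight]
    if len(weights) != n_samples:
        raise ValueError(
            f"expected {n_samples} sample weights, got {len(weights)}"
        )
    return weights
```

Warning rather than raising keeps existing code that passes a number working, while still surfacing the likely mistake to the user.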
Raising an error in [...] Some tests just check that passing a float works. Some check consistency, i.e. passing a float has the same effect as passing None.
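The consistency property mentioned here can be sketched with a toy weighted least-squares fit for a single feature. The function `fit_weighted_line` is a hypothetical, pure-Python stand-in and not scikit-learn's `LinearRegression`; it only illustrates why a constant weight gives the same fit as no weights at all:

```python
def fit_weighted_line(x, y, sample_weight=None):
    """Weighted least squares for y = slope * x + intercept (illustrative only)."""
    n = len(x)
    if sample_weight is None:
        w = [1.0] * n
    elif isinstance(sample_weight, (int, float)):
        # A scalar is expanded to the same weight for every sample.
        w = [float(sample_weight)] * n
    else:
        w = [float(v) for v in sample_weight]

    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    var = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return slope, intercept


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.9, 5.2, 6.8]

unweighted = fit_weighted_line(x, y)
constant = fit_weighted_line(x, y, sample_weight=3.5)
non_uniform = fit_weighted_line(x, y, sample_weight=[1.0, 1.0, 1.0, 10.0])
```

In this sketch the constant factor cancels out of both the slope and the intercept, so `unweighted` and `constant` agree up to floating-point error, while `non_uniform` gives a different line.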
To me, keeping support for float can be handy if we want to extend our tests for consistent sample weight behavior, as discussed in e.g. #15657. But it's not a big problem, since we can just pass an array with all equal elements. There is one use case I can think of where supporting a float is convenient: if you want to learn on minibatches where, for some reason, you want to apply the same weight to all elements in a batch but different weights between batches. Again, this is doable by passing an array with all equal elements, but less convenient.
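To make the minibatch case concrete, here is a toy sketch. `RunningWeightedMean` is a hypothetical incremental estimator written for illustration, not a scikit-learn class; it shows that a weight that is constant within a batch can still change the result when it differs between batches:

```python
class RunningWeightedMean:
    """Hypothetical incremental estimator, used only for illustration."""

    def __init__(self):
        self.weighted_sum = 0.0
        self.total_weight = 0.0

    def partial_fit(self, values, sample_weight=1.0):
        # Accept either one number for the whole batch or one weight per value.
        if isinstance(sample_weight, (int, float)):
            weights = [float(sample_weight)] * len(values)
        else:
            weights = [float(w) for w in sample_weight]
        for value, weight in zip(values, weights):
            self.weighted_sum += weight * value
            self.total_weight += weight
        return self

    @property
    def mean(self):
        return self.weighted_sum / self.total_weight


# Scalar weight per batch: the second batch counts three times as much.
scalar_est = RunningWeightedMean()
scalar_est.partial_fit([1.0, 1.0], sample_weight=1.0)
scalar_est.partial_fit([4.0, 4.0], sample_weight=3.0)

# Same thing spelled out with arrays of equal elements.
array_est = RunningWeightedMean()
array_est.partial_fit([1.0, 1.0], sample_weight=[1.0, 1.0])
array_est.partial_fit([4.0, 4.0], sample_weight=[3.0, 3.0])
```

Both estimators end up with the same mean, which differs from the unweighted mean of the four values; the scalar form just saves building the constant arrays.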
Is that something that should be done, i.e., use the [...] I think if this was being done from scratch, my inclination would be to only accept either the [...] Perhaps the only downside may be undetected bugs that could occur. Not sure if any of the maintainers have some intuition for how likely a bug of this kind would be to occur in someone's code? It doesn't seem too likely to me, but I'm quite new to open source.
Describe the issue linked to the documentation
The documentation page for the `fit` method of the `LinearRegression` class mentions that the `sample_weight` parameter must be of type `array_like` or `None` (docs). However, this is not entirely true, since we can also pass a `float` or an `int` for this parameter. Floats or ints get transformed into an array of that same value repeated n times. Code snippet here: scikit-learn/sklearn/utils/validation.py, lines 2000 to 2003 in f59c503.
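As a rough illustration of the transformation described above (not the referenced scikit-learn lines themselves), the hypothetical function `relative_weights` below expands a scalar to one value per sample and then normalizes, which shows that any positive constant ends up as the same relative weights as `None`:

```python
def relative_weights(sample_weight, n_samples):
    """Hypothetical helper: expand the weights, then normalize them to sum to 1."""
    if sample_weight is None:
        weights = [1.0] * n_samples
    elif isinstance(sample_weight, (int, float)):
        # A single number is repeated n_samples times.
        weights = [float(sample_weight)] * n_samples
    else:
        weights = [float(w) for w in sample_weight]
    total = sum(weights)
    return [w / total for w in weights]


from_none = relative_weights(None, 4)
from_int = relative_weights(7, 4)
from_float = relative_weights(0.5, 4)
from_array = relative_weights([1, 1, 1, 5], 4)
```

In this sketch `from_none`, `from_int`, and `from_float` all come out as four equal shares, whereas `from_array` does not.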
This means that a `sample_weight` given as a `float` or `int` is essentially equivalent to `None`, since all samples have the same relative weight (not sure if I'm overlooking something, but I could not think of any case where a float or int for `sample_weight` could be meaningful).
Suggest a potential alternative/fix
I see two possible fixes:
- [...] `sample_weight`, however they have no effect since there is no difference in the relative weight of the samples.
- [...] `sample_weight` parameter is a `float` or an `int`.