Description
Following the discussion on the mailing list about the random_state of the One-Class SVM, it might be good to clarify a few things.
The random_state parameter in OneClassSVM is defined by its docstring as:
"The seed of the pseudo random number generator to use when shuffling the data. [...]"
If this refers to the shuffling used for probability estimation, then I think it is incorrect, as there is no probability estimation for the One-Class SVM. If it refers to something else, it would be good to clarify this in the docstring or in the User Guide. From the Libsvm paper, though, it seems that the underlying SMO implementation is not random?
Same issue for the random_state parameter of LinearSVC? (LinearSVC does not seem to provide probability estimation.) Does random_state control the seed of the random number generator used in the underlying implementation? As written in the doc:
"The underlying LinearSVC implementation uses a random number generator to select features when fitting the model. It is thus not uncommon, to have slightly different results for the same input data. If that happens, try with a smaller tol parameter."
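For reference on the LinearSVC question, here is a toy sketch (not scikit-learn's or liblinear's code) of dual coordinate descent for a hinge-loss linear SVM without intercept. It assumes that the only randomness in such a solver is the order in which the dual variables (one per sample) are visited each epoch; the function name dual_cd_linear_svm and the data below are hypothetical.

```python
import random


def dual_cd_linear_svm(X, y, C=1.0, n_epochs=500, seed=0):
    """Toy dual coordinate descent for an L1-loss (hinge) linear SVM, no bias.

    Assumption: the seed only controls the visiting order of the dual
    variables; every update itself is deterministic.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    # Diagonal of Q, where Q_ij = y_i * y_j * (x_i . x_j); since y_i^2 = 1,
    # Q_ii is just the squared norm of x_i.
    q_diag = [sum(v * v for v in x) for x in X]
    order = list(range(n))
    for _ in range(n_epochs):
        rng.shuffle(order)  # the only place the seed is used
        for i in order:
            if q_diag[i] == 0.0:
                continue
            # Gradient of the dual objective w.r.t. alpha_i: y_i * (w . x_i) - 1
            g = y[i] * sum(wj * xj for wj, xj in zip(w, X[i])) - 1.0
            # Newton step on alpha_i, projected back onto the box [0, C]
            new_alpha = min(max(alpha[i] - g / q_diag[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                # Keep w = sum_j alpha_j * y_j * x_j in sync with alpha
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


# Hypothetical toy data: two clusters plus one point on the "wrong" side.
X = [[2.0, 1.0], [1.5, 2.0], [3.0, 2.5],
     [-1.0, -2.0], [-2.0, -1.5], [-2.5, -3.0], [0.5, -0.5]]
y = [1, 1, 1, -1, -1, -1, -1]

w_seed0 = dual_cd_linear_svm(X, y, seed=0)
w_seed1 = dual_cd_linear_svm(X, y, seed=1)
```

With the same seed the weights are bit-identical from run to run; with different seeds only the update order changes, so with too few iterations the results can differ slightly, and with enough iterations both runs approach the same optimum. That matches the doc's advice to try a smaller tol.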
Other comments
The way randomness is handled for SVC and NuSVC deserves a bit more explanation in the doc as well. The docstring of random_state is the same as the one used for OneClassSVM. However, the docstring of random_seed in sklearn.libsvm.fit clearly states that this parameter is used for probability estimation:
"Seed for the random number generator used for probability estimates. 0 by default".
We should maybe do the same and specify whether the underlying LibSVM implementation is random or not.
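To make the probability-estimation case concrete, here is a minimal sketch, assuming that LibSVM fits its Platt-scaling sigmoid on an internal 5-fold cross-validation and that the seed only decides which samples land in which fold (a random permutation split into contiguous blocks). The helper probability_cv_folds is hypothetical and only mirrors that idea; no SVM is trained here.

```python
import random


def probability_cv_folds(n_samples, n_folds=5, seed=0):
    """Hypothetical helper: shuffle sample indices with a seeded RNG and cut
    the permutation into n_folds contiguous blocks, one per CV fold."""
    rng = random.Random(seed)
    perm = list(range(n_samples))
    rng.shuffle(perm)  # the seed only affects this permutation
    folds = []
    for k in range(n_folds):
        start = k * n_samples // n_folds
        end = (k + 1) * n_samples // n_folds
        folds.append(sorted(perm[start:end]))
    return folds


folds = probability_cv_folds(10, n_folds=5, seed=0)
```

Under this assumption, a fit without probability estimation would never touch the RNG, which is exactly the distinction the docstrings could spell out.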