Issue 1: Allow the scoring function to be a function that does not need to return p-values
`SelectKBest` and `SelectPercentile` both require the scoring function to return a pair of arrays `(scores, pvalues)`. This rules out functions that return only a single array of scores, one per feature, such as:
- `mutual_info_regression`
- `mutual_info_classif`
- `np.var(X, axis=0)` (in this case, `y` is None; to be precise, `lambda X: np.var(X, axis=0)`)
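The behaviour requested in Issue 1 can be sketched without scikit-learn. The helpers `unpack_score_result` and `select_k_best_mask` below are hypothetical and not part of any library: the first accepts either a `(scores, pvalues)` tuple or a bare array of scores (reporting the p-values as `None`), and the second keeps the `k` highest-scoring columns. As an assumption, the variance scorer takes `(X, y)` and ignores `y`, so that every score function shares one call signature.

```python
import numpy as np


def unpack_score_result(result):
    """Normalize a score function's output to a (scores, pvalues) pair.

    Hypothetical helper: a tuple is treated as (scores, pvalues); anything
    else is treated as a bare array of scores with no p-values.
    """
    if isinstance(result, tuple):
        scores, pvalues = result
        if pvalues is not None:
            pvalues = np.asarray(pvalues, dtype=float)
        return np.asarray(scores, dtype=float), pvalues
    return np.asarray(result, dtype=float), None


def select_k_best_mask(score_func, X, y=None, k=1):
    """Boolean mask keeping the k highest-scoring columns of X (hypothetical)."""
    scores, _ = unpack_score_result(score_func(X, y))
    # argsort is ascending, so the last k indices hold the k largest scores
    top_k = np.argsort(scores, kind="mergesort")[-k:]
    mask = np.zeros(scores.shape[0], dtype=bool)
    mask[top_k] = True
    return mask


# A scores-only function: per-column variance; y is accepted but ignored.
def variance_scorer(X, y=None):
    return np.var(X, axis=0)


X = np.array([[0.0, 1.0, 10.0],
              [0.0, 3.0, 20.0],
              [0.0, 5.0, 30.0]])
print(select_k_best_mask(variance_scorer, X, k=2).tolist())  # [False, True, True]
```

The constant first column has zero variance, so it is the one dropped when `k=2`.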
Issue 2: Make a wrapper around functions that score an individual feature
Requested in this comment.
This would be useful for certain scipy correlation functions, such as
For example,
```python
def make_feature_scorer(feature_scorer):
    """Wrap a function that scores an individual feature so that it returns scores for each feature.

    `feature_scorer` should be a function that takes (X, y) as input, where
    y is allowed to be None.

    Examples
    --------
    >>> from sklearn.feature_selection import make_feature_scorer, SelectKBest
    >>> from scipy.stats import spearmanr
    >>> from sklearn.datasets import make_classification
    >>>
    >>> X, y = make_classification(random_state=0)
    >>> skb = SelectKBest(make_feature_scorer(spearmanr), k=10)
    >>> skb.fit(X, y)  # Calculates spearmanr for each feature in `X`
    >>> new_X = skb.transform(X)  # Selects the top k=10 features with the greatest spearmanr
    """
```
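One possible body for the proposed wrapper, sketched with NumPy only. It assumes `feature_scorer(x_j, y)` is called once per column `x_j` and returns either a scalar or a tuple whose first element is the score (SciPy's `spearmanr` and `kendalltau` return a tuple-like `(statistic, pvalue)` result). `make_feature_scorer` is the name proposed above, not an existing scikit-learn function. This sketch returns scores only, in the spirit of Issue 1, and `pearson` is a hypothetical stand-in built on `np.corrcoef` so the example runs without SciPy.

```python
import numpy as np


def make_feature_scorer(feature_scorer):
    """Sketch: turn a per-feature scorer into a whole-matrix score function.

    feature_scorer(x_j, y) is applied to each column x_j of X. It may return
    a scalar score or a tuple whose first element is the score (e.g. a
    (statistic, pvalue) result); any p-values are discarded here.
    """
    def score_func(X, y=None):
        X = np.asarray(X, dtype=float)
        scores = np.empty(X.shape[1])
        for j in range(X.shape[1]):
            result = feature_scorer(X[:, j], y)
            # Keep only the statistic when the scorer returns a tuple.
            scores[j] = result[0] if isinstance(result, tuple) else result
        return scores

    return score_func


def pearson(x, y):
    # Pearson correlation of one feature with the target, via NumPy.
    return np.corrcoef(x, y)[0, 1]


y = np.array([1.0, 2.0, 3.0, 4.0])
# Columns: perfectly correlated, perfectly anti-correlated, uncorrelated.
X = np.column_stack([y, -y, [1.0, -1.0, -1.0, 1.0]])
scores = make_feature_scorer(pearson)(X, y)
print(scores)
```

Looping in Python over columns is the simplest way to reuse an arbitrary one-feature function; it trades speed for generality compared with a vectorized scorer.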
Note that `make_feature_scorer` also needs to handle functions that return scores that might be negative. For example, `kendalltau` returns values in [-1, 1], where 0 means `X` has no correlation with the response `y`; a value near -1 indicates an association as strong as a value near +1.
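Selectors such as `SelectKBest` keep the features with the largest scores, so a raw correlation in [-1, 1] would rank a strong negative association below an irrelevant feature. One way to handle this, sketched here as an assumption rather than as part of the proposal, is to rank on the absolute value. `top_k_features` and its `use_abs` flag are hypothetical names, and the scores below are illustrative values, not real data.

```python
import numpy as np


def top_k_features(scores, k, use_abs=True):
    """Return the sorted indices of the k best features (hypothetical helper).

    With use_abs=True features are ranked by |score|, so a correlation near
    -1 counts as strongly as one near +1; otherwise the raw score is used.
    """
    scores = np.asarray(scores, dtype=float)
    key = np.abs(scores) if use_abs else scores
    # Sort by descending key; "stable" keeps the original order among ties.
    best = np.argsort(-key, kind="stable")[:k]
    return np.sort(best)


# Illustrative Kendall-tau-like scores for four features.
taus = [0.9, -0.95, 0.1, -0.05]
print(top_k_features(taus, k=2).tolist())                 # [0, 1]
print(top_k_features(taus, k=2, use_abs=False).tolist())  # [0, 2]
```

With raw scores the strongly anti-correlated feature 1 is discarded in favour of the nearly uncorrelated feature 2; ranking on `|score|` keeps it.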