More scoring flexibility for SelectKBest / SelectPercentile #6673

Closed
@hlin117

Description

Issue 1: Allow scoring function to be a function that does not need to return pvalues

SelectKBest and SelectPercentile both require the scoring function to return a pair of arrays, (scores, pvalues). This rules out functions that return only a single array of per-feature scores, such as
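One possible stopgap, sketched here under assumptions: wrap a scores-only function so that it returns the expected pair, with NaN placeholders in the p-value slot. `with_nan_pvalues` and `variance_score` are hypothetical names used only for illustration and are not part of scikit-learn. The sketch shows just the return-value adaptation and relies on NumPy alone.

```python
import numpy as np


def with_nan_pvalues(score_func):
    """Adapt a scores-only function to the (scores, pvalues) contract.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    def wrapped(X, y):
        scores = np.asarray(score_func(X, y), dtype=float)
        # No p-values are available, so fill that slot with NaN placeholders.
        pvalues = np.full(scores.shape, np.nan)
        return scores, pvalues
    return wrapped


def variance_score(X, y=None):
    """Toy scores-only function: the variance of each feature column."""
    return np.var(np.asarray(X, dtype=float), axis=0)


X = np.array([[0.0, 1.0, 5.0],
              [0.0, 2.0, 1.0],
              [0.0, 3.0, 9.0]])
y = np.array([0, 1, 0])

# The wrapped function now returns two arrays of length n_features.
scores, pvalues = with_nan_pvalues(variance_score)(X, y)
```

The wrapped callable could then be passed as the scoring function to SelectKBest or SelectPercentile, which rank features by the scores array. Whether a given selector tolerates NaN p-values is an assumption to check against the installed version.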

Issue 2: Make a wrapper around functions that score an individual feature

Requested from this comment.

This would be useful for certain scipy correlation functions, such as

For example,

def make_feature_scorer(feature_scorer):
    """A wrapper around functions that score an individual feature.

    `feature_scorer` should be a function that takes (X, y) as input, where
    y is allowed to be None.

    Examples
    --------
    >>> from sklearn.feature_selection import make_feature_scorer, SelectKBest
    >>> from scipy.stats import spearmanr
    >>> from sklearn.datasets import make_classification
    >>>
    >>> X, y = make_classification(random_state=0)
    >>> skb = SelectKBest(make_feature_scorer(spearmanr), k=10)
    >>> skb.fit(X, y)    # Computes spearmanr for each feature in `X`
    >>> new_X = skb.transform(X)    # Selects the k=10 features with the highest spearmanr
    """

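Note that make_feature_scorer also needs to handle functions whose score can be negative. For example, kendalltau returns values in the range [-1, 1], where 0 means the feature has no correlation with the response y. Because the selectors keep the features with the highest scores, a strongly negative correlation would otherwise be ranked below an uncorrelated feature.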
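A minimal sketch of how such a wrapper might look, not a definitive implementation. The `use_abs` parameter is hypothetical, and the handling of `(statistic, pvalue)` tuples is an assumption modeled on what scipy correlation functions return. A small NumPy-based `pearson` function stands in for a scipy scorer so the example needs only NumPy.

```python
import numpy as np


def make_feature_scorer(feature_scorer, use_abs=True):
    """Lift a single-feature scorer to a whole-matrix score function.

    Sketch only: `use_abs` is a hypothetical parameter, and the tuple
    handling assumes scipy-style (statistic, pvalue) return values.
    """
    def score_func(X, y):
        X = np.asarray(X, dtype=float)
        n_features = X.shape[1]
        scores = np.empty(n_features)
        pvalues = np.full(n_features, np.nan)
        for j in range(n_features):
            result = feature_scorer(X[:, j], y)
            if isinstance(result, tuple):
                # Scorer returned (statistic, pvalue): keep both.
                scores[j], pvalues[j] = result[0], result[1]
            else:
                # Scorer returned a bare score: p-value stays NaN.
                scores[j] = result
        if use_abs:
            # Rank by strength of association, so -0.9 beats 0.1.
            scores = np.abs(scores)
        return scores, pvalues
    return score_func


def pearson(x, y):
    """Toy single-feature scorer returning a value in [-1, 1]."""
    return np.corrcoef(x, y)[0, 1]


y = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([
    -y,                      # perfectly anti-correlated with y
    [1.0, -1.0, -1.0, 1.0],  # uncorrelated with y
    y,                       # perfectly correlated with y
])

scores, pvalues = make_feature_scorer(pearson)(X, y)
raw_scores, _ = make_feature_scorer(pearson, use_abs=False)(X, y)
```

With `use_abs=True` the anti-correlated and correlated columns receive the same score, while `raw_scores` keeps the sign. Taking the absolute value is only one option; a caller who cares about the direction of the correlation would want the signed scores instead.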
