
Use cross_validation.cross_val_score with metrics.precision_recall_fscore_support #1837


Closed
SolomonMg opened this issue Apr 3, 2013 · 19 comments

@SolomonMg

I'd like to use cross_validation.cross_val_score with metrics.precision_recall_fscore_support so that I can get all relevant cross-validation metrics without having to run my cross-validation once for accuracy, once for precision, once for recall, and once for f1. But when I try this I get a ValueError:

from sklearn.datasets import fetch_20newsgroups

from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import metrics
from sklearn import cross_validation
import numpy as np

data_train = fetch_20newsgroups(subset='train', #categories=categories,
                                shuffle=True, random_state=42)
clf = LinearSVC(loss='l1', penalty='l2')
vectorizer = TfidfVectorizer(
    sublinear_tf=False,
    max_df=0.5,
    min_df=2,
    ngram_range=(1, 1),
    use_idf=False,
    stop_words='english')

X_train = vectorizer.fit_transform(data_train.data)

# Cross-validate:
scores = cross_validation.cross_val_score(
  clf, X_train, data_train.target, cv=5, 
  scoring=metrics.precision_recall_fscore_support)
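
Until the scoring interface supports multiple values, one workaround is to loop over the folds by hand and call precision_recall_fscore_support once per fold. A minimal sketch, assuming a scikit-learn release that has sklearn.model_selection (the same loop can be written with the older cross_validation.StratifiedKFold), reusing X_train, data_train, metrics and np from the snippet above:

from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

y = data_train.target
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
per_fold = []
for train_idx, test_idx in skf.split(X_train, y):
    # A fresh classifier per fold; plain defaults here, since the loss
    # names used above belong to older releases.
    fold_clf = LinearSVC()
    fold_clf.fit(X_train[train_idx], y[train_idx])
    y_pred = fold_clf.predict(X_train[test_idx])
    p, r, f, _ = metrics.precision_recall_fscore_support(
        y[test_idx], y_pred, average='macro')
    per_fold.append((p, r, f))

# Mean macro-averaged precision, recall and F1 across the five folds.
print(np.mean(per_fold, axis=0))
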
@amueller
Member

amueller commented Apr 4, 2013

You would need to do scoring=AsScorer(metrics.precision_recall_fscore_support) but that also doesn't work. Currently the scoring interface only supports returning a single value. I gotta fix that. There is also some work by @jnothman that may help you.
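
To illustrate the single-value constraint: a scorer currently has to reduce each fold to one float, so wrapping an individual metric works, but it only yields one measure per cross-validation run. A sketch, assuming a release that ships metrics.make_scorer, reusing clf, X_train and data_train from the snippet above:

from sklearn.metrics import make_scorer, f1_score

# One float per fold satisfies the scoring contract, but only for one metric.
f1_macro = make_scorer(f1_score, average='macro')
f1_scores = cross_validation.cross_val_score(
    clf, X_train, data_train.target, cv=5, scoring=f1_macro)
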

@jnothman
Member

jnothman commented Apr 4, 2013

Yes, @solomonm, something like this was my initial motivation for #1768. Allowing scoring functions to return a tuple, as you have attempted here, would suffice for some cases, but not for *SearchCV where the scores need to be summable and comparable.

In that PR I implemented an approach in which Scorer objects may provide a collection of named scores, including a PRFScorer that would handle your use case; but I only made changes affecting GridSearchCV and RandomizedSearchCV.

I would still like something like this available, but it needs a fresh understanding of what a Scorer does and its API (see discussion at #1774); and to be useful in a parameter search, it requires extensible output from *SearchCV.fit(...) as proposed in #1787, which is awaiting feedback.
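
To make the named-scores idea concrete, the shape of such a scorer might look like the hypothetical sketch below (PRFScorer here is illustrative only; it is not an API that cross_val_score or *SearchCV currently accept):

from sklearn.metrics import precision_recall_fscore_support

class PRFScorer(object):
    """Hypothetical scorer that returns several named values per fold."""
    def __call__(self, estimator, X, y):
        y_pred = estimator.predict(X)
        p, r, f, _ = precision_recall_fscore_support(y, y_pred,
                                                     average='macro')
        return {'precision': p, 'recall': r, 'f1': f}
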

@amueller
Member

amueller commented Apr 4, 2013

I think for @solomonm a quick hack in cross_val_score would probably help (as I think he doesn't want to compare scores). I am really, really sorry I didn't have more time to look at your work; it is really high up in my sklearn priority queue.

@SolomonMg
Author

@andreas Totally, this is really only useful to help the analyst assess performance on multiple measures (very common use case). I doubt using this for param tuning will be a common use case. 


@amueller
Member

amueller commented Apr 5, 2013

@solomonm just a quick github reminder: if you use @, someone will get a ping / email. You just pinged some random guy ;)

Do you know what to change to make this work? I will try to do it on the weekend.

@arjoly
Member

arjoly commented Jul 25, 2013

It is addressed by the new scorer interface.

@arjoly arjoly closed this as completed Jul 25, 2013
@amueller
Member

Err, not by the one that was already merged, right?

@arjoly
Member

arjoly commented Jul 25, 2013

I closed this a bit too fast.

@arjoly arjoly reopened this Jul 25, 2013
@jnothman
Member

The multi-metric support was not merged, pending further discussion of what it should look like.

@mwjackson

What is the status of this?

@amueller
Member

@mwjackson It is part of @rvraghav93's Google Summer of Code project this year. Nothing is finished yet, though.

@arnavsharma93

Any update on this?

@amueller
Member

Same as in July ;) getting there.

@jnothman
Member

I think the elided "now that GSoC has concluded" might have been intended!


@raghavrv
Member

I take the full blame for not having finished this during my GSoC... :)

Now that #4294 is merged, I will start working on it shortly and hope to get this merged ASAP!!

@hlin117
Contributor

hlin117 commented Apr 7, 2016

Progress?

@jnothman
Member

jnothman commented Apr 7, 2016

See #2759 (comment) for the latest!

@jnothman
Member

jnothman commented Feb 7, 2017

Although #7388 appears close to being accepted for merge, I'd like to discuss the broader question of whether extending scoring is the right way to go about this, or whether a simpler, more generic callback for retrieving (or storing) experimental artefacts is appropriate.

When I suggested such a hook for diverse diagnostics years ago, someone (@GaelVaroquaux, I assume) responded that it's not so hard to roll your own CV/search, so why would such a callback be needed? Many features have since been added to *SearchCV (cv_results_ including timing and training score, safe indexing for different input types, on_error, potentially use_warm_start in #8230, etc.), which suggests to me that roll-your-own is no longer the right answer.

Advantages of extending scoring as in #7388:

  • intuitive specification of additional scorers.
  • results include mean, std, rank.
  • avoids user effort/error in terms of weighted averaging (wrt iid) and calling prediction methods. (I don't even think we provide the user with test_sample_counts to make that averaging easy to DIY.)
  • can still be hacked to store arbitrary data in global variables or outside of memory, as long as a float is returned (although X_test needs to be mapped back to a CV split index if associating the data with a particular split is important).
  • it's implemented; and it's still possible to implement a more general diagnostic callback.

Disadvantages:

  • restricted to a single float per scorer; even if we extend this to handle arrays or tuples, we will probably require their shape to be fixed across all calls.
  • scoring is no longer semantically appropriate, as we only use one metric to score and the others for diagnostics; refit becomes a bit awkward.
  • limited to being a function of (estimator, X_test, y_test).
  • I fear that if we want to start seamlessly returning per-class results or avoiding duplicated prediction work (particularly useful for things like KNN, kernel SVM), we will be engineering a beast.

What's the right compromise?
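
For a sense of what the per-scorer-single-float approach looks like from the user side, here is a sketch against the multi-metric cross_validate interface of later scikit-learn releases (assuming #7388 lands in roughly that form); each named scorer still reduces to one float per fold:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
scoring = {'precision': 'precision_macro',
           'recall': 'recall_macro',
           'f1': 'f1_macro'}
results = cross_validate(LinearSVC(), X, y, cv=5, scoring=scoring)
# One array of per-fold floats per scorer.
print(results['test_precision'], results['test_recall'], results['test_f1'])
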

@jnothman
Member

jnothman commented Feb 8, 2017

I've added a couple of points to my previous comment.
