
[MRG+1] Implements Multiclass hinge loss #3607


Closed · wants to merge 1 commit
94 changes: 62 additions & 32 deletions doc/modules/model_evaluation.rst
@@ -48,10 +48,10 @@ Common cases: predefined values

For the most common use cases, you can designate a scorer object with the
``scoring`` parameter; the table below shows all possible values.
All scorer objects follow the convention that higher return values are better
than lower return values. Thus the returns from mean_absolute_error
and mean_squared_error, which measure the distance between the model
and the data, are negated.
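For instance, a scorer built from a loss yields non-positive values in cross
validation; a minimal sketch, assuming the ``'mean_squared_error'`` scoring
string (output omitted, as it depends on the data)::

>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.linear_model import LinearRegression
>>> X = [[0.], [1.], [2.], [3.]]
>>> y = [0., 1., 2., 3.1]
>>> # the mean squared errors come back negated, so higher is better
>>> cross_val_score(LinearRegression(), X, y, scoring='mean_squared_error')  # doctest: +SKIP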


====================== ======================================= ==================================
@@ -60,7 +60,7 @@ Scoring Function
**Classification**
'accuracy' :func:`metrics.accuracy_score`
'average_precision' :func:`metrics.average_precision_score`
'f1' :func:`metrics.f1_score`
'log_loss' :func:`metrics.log_loss` requires ``predict_proba`` support
'precision' :func:`metrics.precision_score`
'recall' :func:`metrics.recall_score`
@@ -91,10 +91,10 @@ Usage examples:

.. note::

The values listed by the ValueError exception correspond to the functions measuring
prediction accuracy described in the following sections.
The scorer objects for those functions are stored in the dictionary
``sklearn.metrics.SCORERS``.
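A minimal sketch of inspecting that dictionary (the exact set of keys depends
on the scikit-learn version)::

>>> from sklearn.metrics import SCORERS
>>> sorted(SCORERS.keys())  # doctest: +SKIP
['accuracy', 'average_precision', 'f1', ...]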

.. currentmodule:: sklearn.metrics

@@ -112,8 +112,8 @@ measuring a prediction error given ground truth and prediction:
- functions ending with ``_error`` or ``_loss`` return a
value to minimize, the lower the better. When converting
into a scorer object using :func:`make_scorer`, set
the ``greater_is_better`` parameter to False (True by default; see the
parameter description below).

Metrics available for various machine learning tasks are detailed in sections
below.
@@ -136,33 +136,33 @@ the :func:`fbeta_score` function::
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)

The second use case is to build a completely custom scorer object
from a simple python function using :func:`make_scorer`, which can
take several parameters:

* the python function you want to use (``my_custom_loss_func``
in the example below)

* whether the python function returns a score (``greater_is_better=True``,
the default) or a loss (``greater_is_better=False``). If a loss, the output
of the python function is negated by the scorer object, conforming to
the cross validation convention that scorers return higher values for better models.

* for classification metrics only: whether the python function you provided requires continuous decision
certainties (``needs_threshold=True``). The default value is
False.

* any additional parameters, such as ``beta`` in :func:`fbeta_score`.

Here is an example of building custom scorers, and of using the
``greater_is_better`` parameter::

>>> import numpy as np
>>> def my_custom_loss_func(ground_truth, predictions):
... diff = np.abs(ground_truth - predictions).max()
... return np.log(1 + diff)
...
>>> # the scorer object `loss` will negate the return value of my_custom_loss_func,
>>> # which will be np.log(2), 0.693, given the values for ground_truth
>>> # and predictions defined below.
>>> loss = make_scorer(my_custom_loss_func, greater_is_better=False)
>>> score = make_scorer(my_custom_loss_func, greater_is_better=True)
@@ -175,7 +175,7 @@ Here is an example of building custom scorers, and of using the
-0.69...
>>> score(clf, ground_truth, predictions) # doctest: +ELLIPSIS
0.69...


.. _diy_scoring:

@@ -193,7 +193,7 @@ the following two rules:

- It returns a floating point number that quantifies the
``estimator`` prediction quality on ``X``, with reference to ``y``.
Again, by convention higher numbers are better, so if your scorer
returns loss, that value should be negated.
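As an illustration, a hand-rolled scorer obeying both rules might look like
the following (a minimal sketch; ``my_scorer`` is a hypothetical name, not
part of scikit-learn)::

>>> import numpy as np
>>> def my_scorer(estimator, X, y):
...     # called as (estimator, X, y), per the first rule
...     pred = estimator.predict(X)
...     # the mean absolute error is a loss, so it is negated to make
...     # higher values better, per the second rule
...     return -np.mean(np.abs(np.asarray(y) - pred))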


@@ -214,7 +214,6 @@ Some of these are restricted to the binary classification case:
.. autosummary::
:template: function.rst

matthews_corrcoef
precision_recall_curve
roc_curve
@@ -226,6 +225,7 @@ Others also work in the multiclass case:
:template: function.rst

confusion_matrix
hinge_loss


Some also work in the multilabel case:
@@ -307,7 +307,7 @@ The :func:`confusion_matrix` function evaluates
classification accuracy by computing the `confusion matrix
<http://en.wikipedia.org/wiki/Confusion_matrix>`_.

By definition, entry :math:`i, j` in a confusion matrix is
the number of observations actually in group :math:`i`, but
predicted to be in group :math:`j`. Here is an example::

@@ -330,7 +330,7 @@ from the :ref:`example_model_selection_plot_confusion_matrix.py` example):
.. topic:: Example:

* See :ref:`example_model_selection_plot_confusion_matrix.py`
for an example of using a confusion matrix to evaluate classifier output
quality.

* See :ref:`example_classification_plot_digits_classification.py`
@@ -661,11 +661,11 @@ Then the metrics are defined as:
(array([ 0.66..., 0. , 0. ]), array([ 1., 0., 0.]), array([ 0.71..., 0. , 0. ]), array([2, 2, 2]...))


Hinge loss
----------

The :func:`hinge_loss` function computes the average distance between
the model and the data using
`hinge loss <http://en.wikipedia.org/wiki/Hinge_loss>`_, a one-sided metric
that considers only prediction errors. (Hinge
loss is used in maximal margin classifiers such as support vector machines.)
@@ -678,8 +678,22 @@ value, and :math:`w` is the predicted decisions as output by

L_\text{Hinge}(y, w) = \max\left\{1 - wy, 0\right\} = \left|1 - wy\right|_+
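Spelled out in NumPy, the formula above amounts to the following (a minimal
sketch with illustrative labels and decision values)::

>>> import numpy as np
>>> y = np.array([-1, 1, 1])
>>> w = np.array([-2.18, 2.36, 0.09])  # illustrative decision values
>>> # the two confident, correct predictions contribute 0; the last one
>>> # lies inside the margin and contributes 1 - 0.09 = 0.91
>>> np.mean(np.maximum(1 - y * w, 0))  # doctest: +ELLIPSIS
0.30...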

If there are more than two labels, :func:`hinge_loss` uses a multiclass variant
due to Crammer & Singer.
[Review comments]
Member: due to -> by?
Contributor Author: I thought this is how you do it in scientific references. Does it matter much?
Member: ok

`Here <http://jmlr.csail.mit.edu/papers/volume2/crammer01a/crammer01a.pdf>`_ is
the paper describing it.

If :math:`y_w` is the predicted decision for the true label and :math:`y_t` is
the maximum of the predicted decisions for all other labels, where predicted
decisions are output by the decision function, then the multiclass hinge loss
is defined by:

.. math::

L_\text{Hinge}(y_w, y_t) = \max\left\{1 + y_t - y_w, 0\right\}
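In NumPy terms, for a single sample this reads (a minimal sketch, assuming
``decisions`` holds the decision values for all classes and ``true_idx`` is
the index of the true label)::

>>> import numpy as np
>>> decisions = np.array([0.5, 2.0, -0.3])  # illustrative decision values
>>> true_idx = 1
>>> y_w = decisions[true_idx]                     # decision for the true label
>>> y_t = np.max(np.delete(decisions, true_idx))  # best of the other labels
>>> max(1 + y_t - y_w, 0)
0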

Here is a small example demonstrating the use of the :func:`hinge_loss` function
with an svm classifier in a binary class problem::

>>> from sklearn import svm
>>> from sklearn.metrics import hinge_loss
@@ -696,6 +710,22 @@ with a svm classifier::
>>> hinge_loss([-1, 1, 1], pred_decision) # doctest: +ELLIPSIS
0.3...

Here is an example demonstrating the use of the :func:`hinge_loss` function
with an svm classifier in a multiclass problem::

>>> X = np.array([[0], [1], [2], [3]])
>>> Y = np.array([0, 1, 2, 3])
>>> labels = np.array([0, 1, 2, 3])
>>> est = svm.LinearSVC()
>>> est.fit(X, Y)
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='l2', max_iter=1000, multi_class='ovr',
penalty='l2', random_state=None, tol=0.0001, verbose=0)
>>> pred_decision = est.decision_function([[-1], [2], [3]])
>>> y_true = [0, 2, 3]
>>> hinge_loss(y_true, pred_decision, labels) # doctest: +ELLIPSIS
0.56...
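Note that ``labels`` is needed above because ``y_true`` does not contain label
``1``. Judging from the implementation, omitting it would raise an error (a
sketch, not run here)::

>>> hinge_loss(y_true, pred_decision)  # doctest: +SKIP
Traceback (most recent call last):
    ...
ValueError: Please include all labels in y_true or pass labels as third argument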


Log loss
--------
@@ -752,7 +782,7 @@ sample has label 0. The log loss is non-negative.
Matthews correlation coefficient
---------------------------------

The :func:`matthews_corrcoef` function computes the
`Matthews correlation coefficient (MCC) <http://en.wikipedia.org/wiki/Matthews_correlation_coefficient>`_
for binary classes. Quoting Wikipedia:

@@ -788,7 +818,7 @@ function:
Receiver operating characteristic (ROC)
---------------------------------------

The function :func:`roc_curve` computes the
`receiver operating characteristic curve, or ROC curve <http://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_.
Quoting Wikipedia:

6 changes: 6 additions & 0 deletions doc/whats_new.rst
@@ -64,6 +64,12 @@ Enhancements
to `Rohit Sivaprasad`_), as well as evaluation metrics (by
`Joel Nothman`_).

- Add ``sample_weight`` parameter to `metrics.jaccard_similarity_score`.
By `Jatin Shah`_.
[Review comments]
Member: indent this back to where it was, please.
Contributor Author: This new indentation is more in line with the whole file. Can you have a look at the source, please?
Member: The new indentation is correct. (The indentation of the whole file is a bit awkward, IMO)


- Add support for multiclass in `metrics.hinge_loss`. Added ``labels=None``
as optional parameter. By `Saurabh Jha`_.

- Add ``multi_class="multinomial"`` option in
:class:`linear_model.LogisticRegression` to implement a Logistic
Regression solver that minimizes the cross-entropy or multinomial loss
97 changes: 73 additions & 24 deletions sklearn/metrics/classification.py
@@ -16,6 +16,7 @@
# Joel Nothman <[email protected]>
# Noel Dawe <[email protected]>
# Jatin Shah <[email protected]>
# Saurabh Jha <[email protected]>
# License: BSD 3 clause

from __future__ import division
@@ -1376,14 +1377,20 @@ def log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None):
return _weighted_sum(loss, sample_weight, normalize)


def hinge_loss(y_true, pred_decision, labels=None):
"""Average hinge loss (non-regularized)
[Review comments]
Member: labels is undocumented.
Contributor Author: Isn't it a convention to not document optional parameters? For example, pos_label and neg_label are not documented here.
Member: I do not know why pos_label and neg_label are undocumented here. But all parameters should be documented. For example, in sklearn.linear_model.lasso_path almost all parameters are documented.
Member: There is no such convention, whereas pos_label and neg_label should not exist here.
Member: "pos_label and neg_label should not exist here." Yes! I did not notice that.
Contributor Author: Should I remove them?
Member: git blame points me to @arjoly. Maybe he can confirm.
Contributor Author: Okay, I will leave this as it is for now.
Member: It seems pos_label and neg_label should have been removed in deprecation, but were not. See 6736a21. Please remove them.

In the binary case, assuming labels in y_true are encoded with +1 and -1,
when a prediction mistake is made, ``margin = y_true * pred_decision`` is
always negative (since the signs disagree), implying ``1 - margin`` is
always greater than 1. The cumulated hinge loss is therefore an upper
bound of the number of mistakes made by the classifier.

In the multiclass case, the function expects that either all the labels are
included in y_true or an optional labels argument is provided which
contains all the labels. The multiclass margin is calculated according
to Crammer-Singer's method. As in the binary case, the cumulated hinge loss
is an upper bound of the number of mistakes made by the classifier.

[Review comment]
Member: Some description like this belongs in model_evaluation.rst

Parameters
----------
@@ -1394,6 +1401,9 @@ def hinge_loss(y_true, pred_decision, pos_label=None, neg_label=None):
pred_decision : array, shape = [n_samples] or [n_samples, n_classes]
Predicted decisions, as output by decision_function (floats).

labels : array, optional, default None
Contains all the labels for the problem. Used in multiclass hinge loss.

Returns
-------
loss : float
@@ -1403,6 +1413,16 @@ def hinge_loss(y_true, pred_decision, pos_label=None, neg_label=None):
.. [1] `Wikipedia entry on the Hinge loss
<http://en.wikipedia.org/wiki/Hinge_loss>`_

.. [2] Koby Crammer, Yoram Singer. On the Algorithmic
Implementation of Multiclass Kernel-based Vector
Machines. Journal of Machine Learning Research 2,
(2001), 265-292

.. [3] `L1 AND L2 Regularization for Multiclass Hinge Loss Models
by Robert C. Moore, John DeNero.
<http://www.ttic.edu/sigml/symposium2011/papers/Moore+DeNero_Regularization.pdf>`_

Examples
--------
>>> from sklearn import svm
@@ -1420,27 +1440,56 @@ def hinge_loss(y_true, pred_decision, pos_label=None, neg_label=None):
>>> hinge_loss([-1, 1, 1], pred_decision) # doctest: +ELLIPSIS
0.30...

In the multiclass case:
>>> X = np.array([[0], [1], [2], [3]])
[Review comment]
Member: nitpick: space after #
>>> Y = np.array([0, 1, 2, 3])
>>> labels = np.array([0, 1, 2, 3])
>>> est = svm.LinearSVC()
>>> est.fit(X, Y)
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='l2', max_iter=1000, multi_class='ovr',
penalty='l2', random_state=None, tol=0.0001, verbose=0)
>>> pred_decision = est.decision_function([[-1], [2], [3]])
>>> y_true = [0, 2, 3]
>>> hinge_loss(y_true, pred_decision, labels) # doctest: +ELLIPSIS
0.56...
"""
check_consistent_length(y_true, pred_decision)
pred_decision = check_array(pred_decision, ensure_2d=False)
y_true = column_or_1d(y_true)

y_true_unique = np.unique(y_true)
if y_true_unique.size > 2:
if (labels is None and pred_decision.ndim > 1 and
(np.size(y_true_unique) != pred_decision.shape[1])):
raise ValueError("Please include all labels in y_true "
"or pass labels as third argument")
if labels is None:
labels = y_true_unique
le = LabelEncoder()
le.fit(labels)
y_true = le.transform(y_true)
mask = np.ones_like(pred_decision, dtype=bool)
mask[np.arange(y_true.shape[0]), y_true] = False
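# pred_decision[~mask] picks, for each sample, the decision value of its
# true class; the remaining entries are reshaped back to
# (n_samples, n_classes - 1) so the row-wise maximum over the other
# classes can be subtracted (the Crammer-Singer margin)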
margin = pred_decision[~mask]
margin -= np.max(pred_decision[mask].reshape(y_true.shape[0], -1),
axis=1)

[Review comment]
Member: maybe a comment here to explain the reshape

else:
# Handles binary class case
# this code assumes that positive and negative labels
# are encoded as +1 and -1 respectively
pred_decision = column_or_1d(pred_decision)
pred_decision = np.ravel(pred_decision)

lbin = LabelBinarizer(neg_label=-1)
y_true = lbin.fit_transform(y_true)[:, 0]

try:
margin = y_true * pred_decision
except TypeError:
raise TypeError("pred_decision should be an array of floats.")

losses = 1 - margin
# The hinge_loss doesn't penalize good enough predictions.
losses[losses <= 0] = 0
return np.mean(losses)