add balanced accuracy metric #6747
Of course it is. Why didn't I think of that? I think creating an alias (and a scorer) is a good idea, with the constraint that it applies to binary problems. It could also be calculated per-label for multilabel problems (and then potentially macro-averaged...).
I think this is moderate, seeing as it involves data format checking and narrative docs.
Following from EpistasisLab/tpot#108: balanced accuracy is where you calculate accuracy on a per-class basis, then average all of those accuracies. Here is a paper that introduces it: http://onlinelibrary.wiley.com/doi/10.1002/gepi.20211/abstract
But by "accuracy" on a per-class basis, you must mean "recall"; and we're still only considering the binary classification case.
Here's the definition that we use: https://github.com/rhiever/tpot/blob/master/tpot/tpot.py#L1207
In the multiclass case, we simply consider the current class we're calculating accuracy for to be the "positive" class and all other classes to be the "negative" class. Indeed most of the papers that discuss balanced accuracy do so only in the context of binary classification, but it seems reasonable to expand it to the multiclass case in this manner.
At a first glance, I don't think that multiclass definition is really appropriate. But I'll think about it a little. I suspect macro-averaged recall better reflects the intentions of balanced accuracy.
I believe the same procedure is used with macro-averaged AUC. From my understanding, macro-averaged recall != balanced accuracy in the multiclass case, only in the binary classification case. Thus I don't think we should label macro-averaged recall as balanced accuracy. Balanced accuracy is a separate metric that places more importance on TNR (relative to macro-averaged recall) in multiclass classification problems.
macro-averaged AUC is explicitly for multilabel. Problem is that I'm not so …
Section 2.1.18 in the attached paper describes the mathematical formulation of balanced accuracy. The key part is that you calculate balanced accuracy using a one-vs-all configuration. So you start with the first class …

Urbanowicz 2015 ExSTraCS 2.0 description and evaluation of a scalable learning.pdf
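If I follow, a minimal sketch of that one-vs-all formulation would be something like this (my own function and variable names, not the paper's or tpot's code):

```python
import numpy as np

def one_vs_all_balanced_accuracy(y_true, y_pred):
    # For each class, binarize one-vs-rest, take (sensitivity + specificity) / 2,
    # then average the per-class values.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = []
    for k in np.unique(y_true):
        pos = y_true == k
        sensitivity = np.mean(y_pred[pos] == k)
        specificity = np.mean(y_pred[~pos] != k)
        per_class.append((sensitivity + specificity) / 2)
    return np.mean(per_class)
```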
I'm not persuaded that this is the right thing to do, but I am beginning to be persuaded that this is a logical extension that diverse people are assuming is legitimate.
We've worked through the math and logic behind it several times and it checks out for us, but that doesn't mean we're right. I'm very curious to hear why it may not be the right thing to do.
The redundancy of information inherent in including both one class's true positives and another's true negatives makes me a little uncomfortable. However, the multiclass case has some niceties: such a macro-average over a binary problem actually results in the same formula as the non-multiclass treatment; and empirically it seems that random class assignment (from a fixed distribution) in the multiclass case will still yield a score of 0.5, which is pretty neat. I'm coming to appreciate that this may be an appropriate extension.
Yes exactly. As with all metrics, balanced accuracy is just an indirect method of capturing what is "good" performance for our models. As a metric in the multiclass case, balanced accuracy puts a stronger emphasis on TN than TP, at least when compared to macro-averaged recall. And as you point out, balanced accuracy has the nice feature that 0.5 will consistently be "as good as random," with plenty of room for models to perform better (>0.5) or worse (<0.5) than random. It'd be great if we could get balanced accuracy added as a new sklearn metric for measuring a model's multiclass performance.
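To see both of those properties numerically, here is a rough check (a sketch; the helper name `ova_balanced_accuracy` is mine, and it relies on `roc_auc_score` over hard 0/1 predictions being exactly (sensitivity + specificity) / 2 per class):

```python
import numpy as np
from sklearn.metrics import recall_score, roc_auc_score

def ova_balanced_accuracy(y_true, y_pred):
    # Average over classes of the one-vs-rest (sensitivity + specificity) / 2;
    # on hard predictions this is exactly roc_auc_score per binarized class.
    classes = np.unique(np.concatenate([y_true, y_pred]))
    return np.mean([roc_auc_score(y_true == k, y_pred == k) for k in classes])

rng = np.random.RandomState(0)

# Property 1: in the binary case it coincides with macro-averaged recall.
y_true = rng.choice(2, size=50000, p=[0.8, 0.2])
y_pred = rng.randint(2, size=50000)
print(ova_balanced_accuracy(y_true, y_pred),
      recall_score(y_true, y_pred, average='macro'))

# Property 2: uniformly random predictions on an imbalanced multiclass
# problem still score about 0.5.
y_true = rng.choice(4, size=100000, p=[0.7, 0.2, 0.07, 0.03])
y_pred = rng.randint(4, size=100000)
print(ova_balanced_accuracy(y_true, y_pred))
```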
If this is the only paper using this definition, I'm not sure we should include it. Where did you get it from? That paper?
So this paper says …
The paper that @rhiever claims introduces the metric does so only for binary, right? As long as we can't come up with what the standard definition is (if any), I don't think we should add it under this name. We can add … You also first used a different definition of average precision in your code...
That was a bug that we fixed. :-) The definition of balanced accuracy for the multiclass case is in the Urbanowicz paper. The original paper I linked to was only for the binary case, yes. For the Master's thesis that you linked, that definition of balanced accuracy is under a section describing "Two-Class Evaluation Measures," i.e., binary or multilabel classification. I don't think that thesis discusses balanced accuracy in the multiclass case. It's valid to say that balanced accuracy is the macro-averaged recall in the binary case. In the binary case, it works out the same mathematically as calculating accuracy on a per-class basis then averaging those two accuracies. We're simply proposing an extension to the definition of balanced accuracy to also cover the multiclass case.
I'm happy for you to veto this @amueller, after my change of heart. I was persuaded by the features discussed above: that such a macro-average over a binary problem reduces to the usual binary formula, and that random multiclass predictions still score 0.5.
These properties are much more persuasively meaningful than any properties of macro-averaged P/R/F in the multiclass case! This extension has also been reinvented in a few places, suggesting it is sought-after and reasonable.
Here's another paper from the AutoML challenge that defines balanced accuracy for the multiclass case. They use a similar definition, with the only difference being the normalization procedure that they apply at the end (where they correct for the fact that "as good as random" accuracy is 1/N, with N the number of classes).
Thanks for the reference, though I think it muddies the water a bit (pending a look at their implementation). It's far from clear to me that the accuracies they are averaging class-wise in the multiclass case incorporate sensitivity and specificity. By default I assume they mean standard Rand accuracy over each binarization, although this seems a strange choice given that then the binary problem needs to be, as they say, a "special case". That correction for chance (under a uniform prior) allows for an "everything incorrect" response to score zero (assuming I'm correct about their use of Rand accuracy). I don't think your score allows 0, except in the case where the only predicted classes are not in the gold standard. Then classification at random in their measure does not yield a nice score that is invariant of the number of classes, nor one invariant to the distribution of those classes in the gold standard:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

N = 1000000
for K in range(3, 10):
    x = np.random.rand(N)
    y_true = (x[:, None] < np.random.rand(K - 1)).sum(axis=1)
    y_pred = np.random.randint(K, size=N)
    R = 1 / K
    classes = np.unique(np.concatenate([y_true, y_pred]))
    bac = np.mean([roc_auc_score(y_true == k, y_pred == k) for k in classes])
    chalearn_bac = np.mean([accuracy_score(y_true == k, y_pred == k) for k in classes])
    print('{:.2f}\t{:.2f}\t{:.2f}'.format(chalearn_bac, (chalearn_bac - R) / (1 - R), bac))
```

Same results if …
To make things murkier, that metric description is repeated here and hyperlinked to here, but I don't see the relevance of the latter! Ah. Now I see the relevance. They've actually implemented macro-averaged recall. Which means that, indeed, the chance correction they propose results in a score of 0 for random predictions. It also means that binary classification isn't actually a special case, despite what they say.

But they also have other nonsense in that paper, such as "We also normalize F1 with F1 := (F1-R)/(1-R), where R is the expected value of F1 for random predictions (i.e. R=0.5 for binary classification and R=(1/C) for C-class classification problems)." The expected value of F1 for random predictions is the prevalence of the positive class in the binary case, not 0.5. So while they attempt to throw a principled kitchen sink of evaluation metrics at the task, I'm not sure they are coming from a place of critical expertise, at least in their description of the measures. Still, the fact that they describe something different to your metric with the same name makes it a bit uncomfortable...
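A quick way to sanity-check the F1 claim above (assuming "random predictions" means predicting labels with the same class distribution as the truth; the snippet is only a sketch):

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.RandomState(0)
for prevalence in (0.1, 0.3, 0.5):
    y_true = (rng.rand(200000) < prevalence).astype(int)
    y_pred = rng.permutation(y_true)   # random predictions with the same label distribution
    print(prevalence, round(f1_score(y_true, y_pred), 3))
# F1 for random predictions tracks the prevalence of the positive class, not 0.5.
```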
we can always call it macro_average_accuracy (that's what we're talking about, right?) and say that "balanced accuracy" can mean "macro average accuracy" or "macro average recall" depending on who you ask.
Haven't followed this and I'm kinda busy, but this seems like a potential blocker, right?
@ledell I think @jnothman is concerned with what's a good metric because people use what's in sklearn. People use R^2 for regression because it's the default in sklearn. People use 10 trees in a random forest b/c it's the default in sklearn (we are changing the latter; it's hard to change the former). Honestly my conclusion from that would be that we force the user to pick, though. Maybe having an option as @adrinjalali suggests, and for the scorers/strings only have …

For the record, I think that log-loss is a terrible metric for multi-class classification: it can prefer one set of predicted probabilities over another even where the argmax gives a correct classification in the second case, but not the first.
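The concrete probability vectors from this comment did not survive, but a hypothetical pair with the property described (the numbers are illustrative only) would be:

```python
from sklearn.metrics import log_loss

y_true = [0]                      # one sample whose true class is 0 (of classes 0, 1, 2)
first = [[0.45, 0.50, 0.05]]      # argmax -> class 1 (wrong), p(true class) = 0.45
second = [[0.40, 0.30, 0.30]]     # argmax -> class 0 (correct), p(true class) = 0.40

print(log_loss(y_true, first, labels=[0, 1, 2]))   # ~0.80, preferred by log-loss
print(log_loss(y_true, second, labels=[0, 1, 2]))  # ~0.92
```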
Wasn't it the case that chalearn implemented something different in the code than what they said in the paper? At least one of the ml competitions used weighted macro-average recall.
Agreed... to clarify, I don't have a preference on what the default metric should be for multi-class problems -- my only concern is the use of a polysemous method name like `balanced_accuracy`.
It does not look like it: https://github.com/ch-imad/AutoMl_Challenge/blob/master/Starting_kit/scoring_program/libscores.py#L203
I have a preference there: in addition to the points raised by @jnothman in #6747 (comment), the macro-average recall falls back to the …
So you are interested in the accuracy and do not want to correct for the class imbalance. Actually, I was wondering why there is no …
I would find this option confusing. We all agree that the literature lacks clarity regarding the definition of the metric, and this option replicates the same fuzziness in the implementation. So, I am fine with the current behavior and naming of `balanced_accuracy_score`.
FWIW, an alternative metric used in the imbalanced classification literature is the geometric mean of the per-class recall.
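A minimal sketch of that metric (the helper name is mine; imbalanced-learn ships a fuller version as `geometric_mean_score`, if I recall correctly):

```python
import numpy as np
from sklearn.metrics import recall_score

def gmean_recall(y_true, y_pred):
    # Geometric mean of the per-class recalls: a single class with zero
    # recall drives the whole score to zero.
    recalls = recall_score(y_true, y_pred, average=None)
    return float(np.prod(recalls) ** (1 / len(recalls)))
```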
Why would you find this confusing? Indeed this would mean the implementation reflects the state of the literature and the understanding of the community. There's no fuzziness if there are two definitions for the same name. And there's lots of literature on multi-class metrics and we can go into that at some point. I think @ledell makes a good point in being clear about what we implement and allowing alternatives.
"Confusion" might not be the right term but returning completely different statistics would surprise me and I am not sure that we can advise to choose either implementation. In short, I am scared that users switch methods because the score obtained is higher. I am also concerned for the string style for the metric. Having Regarding the metric itself, alternative definitions which do not guarantee to obtain the same result than
I completely agree with this. I am sure that we can make the documentation better and we should be open to alternative methods, even if I have my concerns this time with the alternative.
That's what we do for the different averaging methods, right?
That's true.
@amueller is right here. You've referenced the binary case, @glemaitre. Chalearn AutoML indeed implements macro-average recall, adjusted so that random performance is 0: https://github.com/ch-imad/AutoMl_Challenge/blob/2353ec0/Starting_kit/scoring_program/libscores.py#L206-L208. This is equivalent to our `balanced_accuracy_score` with `adjusted=True`. I think we can safely eliminate Chalearn as a counter-example to our implementation preference, @ledell. But we could explicitly note in our docs that adjusted=True equates to Chalearn's. I think we may have indeed contacted the authors at some point (@amueller obviously has a better memory of all this history than I do). If we can presume that Chalearn's description was in error, and that some of the subsequent references to "averaged accuracy" are copying Chalearn's in error, can we let this go? Can we please lead the community and define the standard meaning of balanced accuracy, because we have identified many arguments for this definition (and several against alternatives), and put the discrepancy in the literature to rest?
We already do say in our documentation that adjusted=True equates to the Chalearn implementation. We do not note that their description is in error. Should we?
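That equivalence is easy to demonstrate numerically (a sketch with made-up data; R follows the Chalearn paper's notation for the chance level):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score

rng = np.random.RandomState(0)
y_true = rng.choice(3, size=10000, p=[0.6, 0.3, 0.1])
y_pred = rng.randint(3, size=10000)

macro_recall = recall_score(y_true, y_pred, average='macro')
R = 1 / len(np.unique(y_true))                      # chance level, as in the Chalearn paper
print((macro_recall - R) / (1 - R))                 # Chalearn-style normalization
print(balanced_accuracy_score(y_true, y_pred, adjusted=True))  # same value
```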
Yes, I think we can implement multiple definitions here.
+1. It's not so good, but maybe we need to do so if we keep the name `balanced_accuracy`. What's the definition of …? Also, I'm starting to wonder whether it's good to regard class balanced accuracy as a multiclass definition of balanced accuracy.
I'm too tired (in several ways ;) to make a decision on this but I think it's the last remaining blocker?
I guess a new option will not block the new release. What we need to consider now is whether we need to change the name of the current scorer. Also, if we decide to implement macro_average_accuracy as another option, we might need to provide some references in the user guide.
I'm, FWIW, -1 on a new option. I don't want to perpetuate the misreading of the Guyon et al (Chalearn AutoML) paper where they have inaccurately described their implementation.
I'm okay with having macro-average accuracy available, only I don't know what use it is.
So the only reference we have for the so-called macro-averaged accuracy is the Guyon et al. paper?
I'll vote +0 (maybe -1) on including it unless we can find some references which clearly define it.
The references to variant multiclass balanced accuracy are discussed in model_evaluation.rst.
@jnothman Which entry? Seems that class balanced accuracy and balanced accuracy from Urbanowicz et al. 2015 are not the so-called …
Sorry. This conversation has been thoroughly confused, partially because of the time passed since we solved it and wrote it up. No one refers to macro-averaged accuracy as balanced accuracy. That is only a misunderstanding due to Guyon et al.

We could provide an Urbanowicz-style implementation, but I think it's a really poor reuse of the name "balanced accuracy" for something that has nothing to do with it. Binary balanced accuracy does not incorporate precision, it incorporates specificity. They account for the same kind of error but are fundamentally different quantifications of that error. Notably, the denominator of specificity depends only on y_true, while the denominator of precision depends on y_pred. It behaves very differently depending on the biases of the classifier to particular classes.
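To make the denominator point concrete (toy labels, purely illustrative):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
conservative = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])   # rarely predicts positive
liberal      = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])   # often predicts positive

for y_pred in (conservative, liberal):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("specificity denominator (tn + fp):", tn + fp,   # 7 both times: fixed by y_true
          "| precision denominator (tp + fp):", tp + fp)   # 1 vs 7: moves with y_pred
```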
Agree. I don't think the definition from Urbanowicz et al. 2015 is widely accepted, unless provided with more references. @jnothman Close the issue?
I'm happy to have it closed.
No, @ledell cited macro-averaged accuracy as what Guyon et al call balanced accuracy, and as something offered in H2O, but as mean zero-one loss, not under the name "balanced accuracy". As far as I can glean from above, @rhiever has used the Urbanowicz definition, which I was incorrect above to say incorporates precision (that was Mosley et al.; I need this mess like I need a hole in the head!); rather, it is the average of binary balanced accuracies for each class. (Need I argue against this again?)
I've recently seen more people using "balanced accuracy" for imbalanced binary and multi-class problems. I think it is the same as macro average recall. If so, I think we might want to create an alias, because it is not super obvious, and maybe add a scorer.
Also see EpistasisLab/tpot#108
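A rough sketch of what such an alias and scorer could look like under that reading (the names are illustrative, not a final API):

```python
from sklearn.metrics import make_scorer, recall_score

def balanced_accuracy(y_true, y_pred):
    # Under this reading, balanced accuracy is just macro-averaged recall.
    return recall_score(y_true, y_pred, average='macro')

balanced_accuracy_scorer = make_scorer(balanced_accuracy)
```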