
@evernat evernat commented Jun 3, 2017

In my opinion, the current variance computation (and consequently the standard deviation) is imprecise when curSize is small (for example, when curSize < 10):

    variance = (sumSquares / curSize) - (mean * mean);

The computation I suggest in this pull request does no harm for large curSize and is more precise for small curSize:

    if (curSize == 1) {
      variance = 0d;
    } else {
      variance = (sumSquares - ((double) total * total / curSize)) / (curSize - 1);
    }

For reference:
http://web.archive.org/web/20050512031826/http://helios.bto.ed.ac.uk/bto/statistics/tress3.html
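To make the difference concrete, here is a minimal sketch that runs both formulas on a three-element sample. The class name, method names, and the types of `total` and `sumSquares` are assumptions made for illustration; they are not taken from the StatsTimer source.

```java
final class VarianceComparison {
    // Current formula: divides by curSize (population variance).
    // Parameter types are assumed for this sketch.
    static double populationVariance(long total, double sumSquares, int curSize) {
        double mean = (double) total / curSize;
        return (sumSquares / curSize) - (mean * mean);
    }

    // Formula proposed in this pull request: divides by curSize - 1 (sample variance).
    static double sampleVariance(long total, double sumSquares, int curSize) {
        if (curSize == 1) {
            return 0d;
        }
        return (sumSquares - ((double) total * total / curSize)) / (curSize - 1);
    }

    public static void main(String[] args) {
        long[] samples = {2, 4, 6};
        long total = 0;
        double sumSquares = 0;
        for (long s : samples) {
            total += s;
            sumSquares += (double) s * s;
        }
        // For this sample the first value is about 2.67 and the second is 4.0.
        System.out.println(populationVariance(total, sumSquares, samples.length));
        System.out.println(sampleVariance(total, sumSquares, samples.length));
    }
}
```

The two results differ by a factor of curSize / (curSize - 1), which is why the gap is visible for small samples and fades as curSize grows.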

evernat added 2 commits June 4, 2017 00:33

@brharrington brharrington left a comment


Thanks for the contribution. Looks good to me.

FYI, we do not use StatsTimer much internally. The computed stats are for a local node and do not work very well when slicing and dicing by various dimensions or when looking at aggregates across the cluster, which are the most common use cases for us.

The basic timers report 4 stats: total, totalOfSquares, count, and max. The first three allow us to compute std dev across an arbitrary grouping by computing the sum of those stats and then using them to compute std dev server side.
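As a rough sketch of that aggregation, the snippet below sums per-node total, totalOfSquares, and count, then derives a standard deviation from the sums. The `NodeStats` type and `stdDev` method are hypothetical, and the population formula is an assumption; the comment above does not say which formula is used server side.

```java
final class AggregateStdDev {
    /** Hypothetical per-node snapshot of three of the four reported stats. */
    static final class NodeStats {
        final double total;
        final double totalOfSquares;
        final long count;

        NodeStats(double total, double totalOfSquares, long count) {
            this.total = total;
            this.totalOfSquares = totalOfSquares;
            this.count = count;
        }
    }

    // Sums the stats across the group, then computes std dev from the sums.
    // Uses the population form (divide by count); this choice is an assumption.
    static double stdDev(NodeStats... nodes) {
        double total = 0;
        double totalOfSquares = 0;
        long count = 0;
        for (NodeStats n : nodes) {
            total += n.total;
            totalOfSquares += n.totalOfSquares;
            count += n.count;
        }
        if (count == 0) {
            return Double.NaN;
        }
        double mean = total / count;
        double variance = totalOfSquares / count - mean * mean;
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.sqrt(Math.max(variance, 0.0));
    }

    public static void main(String[] args) {
        NodeStats a = new NodeStats(6, 20, 2);  // samples 2 and 4
        NodeStats b = new NodeStats(6, 36, 1);  // sample 6
        System.out.println(stdDev(a, b));
    }
}
```

Because the three stats are plain sums, they can be added over any grouping of nodes before the final computation, which a precomputed per-node standard deviation does not allow.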

    }
    mean = (double) total / curSize;
    variance = (sumSquares / curSize) - (mean * mean);
    if (curSize == 1) {

It would be good to add a test case for a current size of 1. At first glance, none of the existing tests exercise this branch.
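A check along these lines could cover that branch. It is a plain-Java sketch rather than a test written against the project's test framework, and `variance` and `unguarded` are hypothetical standalone copies of the proposed expression, not StatsTimer methods.

```java
final class SingleSampleVarianceCheck {
    // Hypothetical standalone copy of the computation proposed in this pull request.
    static double variance(long total, double sumSquares, int curSize) {
        if (curSize == 1) {
            return 0d;
        }
        return (sumSquares - ((double) total * total / curSize)) / (curSize - 1);
    }

    // Same expression without the guard, to show what the branch protects against.
    static double unguarded(long total, double sumSquares, int curSize) {
        return (sumSquares - ((double) total * total / curSize)) / (curSize - 1);
    }

    public static void main(String[] args) {
        long sample = 5;
        double sumSquares = (double) sample * sample;

        // With a single sample the guarded version must report zero variance.
        double guarded = variance(sample, sumSquares, 1);
        if (guarded != 0d) {
            throw new AssertionError("expected 0 for curSize == 1, got " + guarded);
        }

        // Without the guard the expression is 0.0 / 0, which is NaN in double arithmetic.
        double raw = unguarded(sample, sumSquares, 1);
        if (!Double.isNaN(raw)) {
            throw new AssertionError("expected NaN without the guard, got " + raw);
        }
        System.out.println("ok");
    }
}
```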

@brharrington brharrington merged commit 243b8f3 into Netflix:master Jun 4, 2017