<h4 id="New-to-Plotly?">New to Plotly?<a class="anchor-link" href="#New-to-Plotly?">¶</a></h4><p>Plotly's Python library is free and open source! <a href="https://plot.ly/python/getting-started/">Get started</a> by downloading the client and <a href="https://plot.ly/python/getting-started/">reading the primer</a>.
<br>You can set up Plotly to work in <a href="https://plot.ly/python/getting-started/#initialization-for-online-plotting">online</a> or <a href="https://plot.ly/python/getting-started/#initialization-for-offline-plotting">offline</a> mode, or in <a href="https://plot.ly/python/getting-started/#start-plotting-online">Jupyter notebooks</a>.
@@ -26,16 +25,14 @@ <h4 id="New-to-Plotly?">New to Plotly?<a class="anchor-link" href="#New-to-Plotl
<p>In statistics, normality tests are used to determine whether a data set is well modeled by a normal (Gaussian) distribution. Many statistical methods require that a distribution be normal or nearly normal.</p>
<p>There are several methods of assessing whether data are normally distributed or not. They fall into two broad categories: <em>graphical</em> and <em>statistical</em>.
<p>We can see that the sample mean and standard deviation are reasonable but rough estimates of the true underlying population mean and standard deviation, given the small-ish sample size.</p>
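<p>As a minimal sketch of this point, the snippet below draws a small Gaussian sample and compares its summary statistics with the population values. The sample size (100), mean (50), standard deviation (5), and seed are assumptions chosen for illustration, not values taken from the text above.</p>

```python
import numpy as np

# Assumed setup: 100 draws from a Gaussian with mean 50 and standard deviation 5.
rng = np.random.default_rng(seed=1)
data = 5 * rng.standard_normal(100) + 50

sample_mean = data.mean()
sample_std = data.std(ddof=1)  # ddof=1 gives the sample (not population) standard deviation

print(f"mean={sample_mean:.3f} stdv={sample_std:.3f}")
```

<p>With only 100 observations the printed values should land near 50 and 5 but will rarely match them exactly.</p>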
<h3 id="Histogram-Plot">Histogram Plot<a class="anchor-link" href="#Histogram-Plot">¶</a></h3><p>A simple and commonly used plot to quickly check the distribution of a sample of data is the histogram.</p>
<p>In the histogram, the data is divided into a pre-specified number of groups called bins. The data is then sorted into each bin and the count of the number of observations in each bin is retained.</p>
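<p>The binning step can be sketched with <code>numpy.histogram</code>, which returns the per-bin counts and the bin edges without drawing anything. The sample and the choice of 10 bins are assumptions for illustration.</p>

```python
import numpy as np

# Assumed sample: 100 draws from a Gaussian with mean 50 and standard deviation 5.
rng = np.random.default_rng(seed=1)
data = 5 * rng.standard_normal(100) + 50

# Split the range of the data into 10 equal-width bins and count observations per bin.
counts, bin_edges = np.histogram(data, bins=10)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

<p>For a roughly Gaussian sample the counts should rise toward the middle bins and fall off on both sides.</p>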
<p>Another popular plot for checking the distribution of a data sample is the quantile-quantile plot, Q-Q plot, or QQ plot for short.</p>
<p>This plot generates its own sample of the idealized distribution that we are comparing with, in this case the Gaussian distribution. The idealized samples are divided into groups (e.g. 5), called quantiles. Each data point in the sample is paired with the member of the idealized distribution that sits at the same cumulative probability.</p>
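<p>The pairing described above can be sketched with <code>scipy.stats.probplot</code>, which returns the theoretical Gaussian quantiles alongside the sorted sample values, plus a least-squares line through them. The sample itself is an assumption for illustration, and no figure is drawn here.</p>

```python
import numpy as np
from scipy import stats

# Assumed sample: 100 draws from a Gaussian with mean 50 and standard deviation 5.
rng = np.random.default_rng(seed=1)
data = 5 * rng.standard_normal(100) + 50

# theoretical_q: quantiles of the standard normal distribution
# ordered_sample: the data sorted in ascending order
# slope, intercept, r: least-squares fit of ordered_sample against theoretical_q
(theoretical_q, ordered_sample), (slope, intercept, r) = stats.probplot(data, dist="norm")

print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f}")
```

<p>Plotting <code>ordered_sample</code> against <code>theoretical_q</code> as a scatter gives the QQ plot. A correlation <code>r</code> close to 1 corresponds to points that hug the diagonal.</p>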
@@ -237,7 +226,7 @@ <h3 id="Quantile-Quantile-Plot">Quantile-Quantile Plot<a class="anchor-link" hre
<p>Running the example creates the QQ plot showing the scatter plot of points in a diagonal line, closely fitting the expected diagonal pattern for a sample from a Gaussian distribution.</p>
<p>There are a few small deviations, especially at the bottom of the plot, which is to be expected given the small data sample.</p>
@@ -324,16 +312,14 @@ <h3 id="Quantile-Quantile-Plot">Quantile-Quantile Plot<a class="anchor-link" hre
<p>There are many statistical tests that we can use to quantify whether a sample of data looks as though it was drawn from a Gaussian distribution.</p>
<p>Each test makes different assumptions and considers different aspects of the data.</p>
@@ -358,16 +344,14 @@ <h4 id="Interpretation-of-a-Test">Interpretation of a Test<a class="anchor-link"
<p>The <a href="https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test">Shapiro-Wilk test</a>, named for Samuel Shapiro and Martin Wilk, evaluates a data sample and quantifies how likely it is that the data was drawn from a Gaussian distribution.</p>
<p>In practice, the Shapiro-Wilk test is believed to be a reliable test of normality, although there is some suggestion that it is best suited to smaller samples of data, e.g. thousands of observations or fewer.</p>
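<p>A minimal sketch of running the test with <code>scipy.stats.shapiro</code> follows. The sample and the significance level <code>alpha = 0.05</code> are assumptions for illustration.</p>

```python
import numpy as np
from scipy import stats

# Assumed sample: 100 draws from a Gaussian with mean 50 and standard deviation 5.
rng = np.random.default_rng(seed=1)
data = 5 * rng.standard_normal(100) + 50

stat, p = stats.shapiro(data)
print(f"statistic={stat:.3f} p={p:.3f}")

alpha = 0.05  # assumed significance level
if p > alpha:
    print("Sample looks Gaussian (fail to reject H0)")
else:
    print("Sample does not look Gaussian (reject H0)")
```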
<p>The <a href="https://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test">Anderson-Darling test</a>, named for Theodore Anderson and Donald Darling, is a statistical test that can be used to evaluate whether a data sample comes from one of several known distributions.</p>
<p>It can be used to check whether a data sample is normal. The test is a modified version of the <a href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test">Kolmogorov-Smirnov test</a>, a nonparametric goodness-of-fit test, that gives more weight to the tails of the distribution.</p>
<p>Running the example calculates the statistic on the test data set and the critical values are tabulated.</p>
<p>Critical values in a statistical test are a set of pre-defined significance boundaries: we fail to reject H0 at a given significance level if the calculated statistic is less than the corresponding critical value. Rather than just a single p-value, the test returns a critical value for each of a range of commonly used significance levels.</p>
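<p>The comparison against each critical value can be sketched with <code>scipy.stats.anderson</code>, whose result exposes <code>statistic</code>, <code>critical_values</code>, and <code>significance_level</code> (the latter in percent). The sample is an assumption for illustration.</p>

```python
import numpy as np
from scipy import stats

# Assumed sample: 100 draws from a Gaussian with mean 50 and standard deviation 5.
rng = np.random.default_rng(seed=1)
data = 5 * rng.standard_normal(100) + 50

result = stats.anderson(data, dist="norm")
print(f"statistic={result.statistic:.3f}")

# Compare the statistic with the critical value at each significance level (given in percent).
for level, critical in zip(result.significance_level, result.critical_values):
    if result.statistic < critical:
        verdict = "looks normal (fail to reject H0)"
    else:
        verdict = "does not look normal (reject H0)"
    print(f"{level:.1f}%: critical value {critical:.3f}, data {verdict}")
```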
<p>The <a href="https://en.wikipedia.org/wiki/D%27Agostino%27s_K-squared_test">D'Agostino's $K^{2}$ test</a>, named for Ralph D’Agostino, calculates summary statistics from the data, namely kurtosis and skewness, to determine if the data distribution departs from the normal distribution.</p>
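<p>In SciPy this test is available as <code>scipy.stats.normaltest</code>. The sketch below, on an assumed sample with an assumed <code>alpha = 0.05</code>, also shows that the $K^{2}$ statistic is the sum of the squared z-scores returned by <code>skewtest</code> and <code>kurtosistest</code>.</p>

```python
import numpy as np
from scipy import stats

# Assumed sample: 100 draws from a Gaussian with mean 50 and standard deviation 5.
rng = np.random.default_rng(seed=1)
data = 5 * rng.standard_normal(100) + 50

k2, p = stats.normaltest(data)

# The K^2 statistic combines the skewness and kurtosis z-scores.
z_skew = stats.skewtest(data).statistic
z_kurt = stats.kurtosistest(data).statistic
k2_manual = z_skew**2 + z_kurt**2

print(f"K2={k2:.3f} (manual {k2_manual:.3f}) p={p:.3f}")

alpha = 0.05  # assumed significance level
print("fail to reject H0" if p > alpha else "reject H0")
```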
<h4 id="Conclusion">Conclusion<a class="anchor-link" href="#Conclusion">¶</a></h4><p>We have covered a few normality tests, but these are not all of the tests that exist. It is recommended to apply every test that is appropriate for your data.</p>
<p><strong><em>How to interpret the results?</em></strong></p>
<ul>
-<li>Your data may not be normal for lots of different reasons. Each test looks at the question of whether a sample was drawn from a Gaussian distribution from a slightly different perspective.</li>
-<li>Investigate why your data is not normal and perhaps use data preparation techniques to make the data more normal.</li>
+<li>Your data may not be normal for many different reasons. Each test looks at the question of whether a sample was drawn from a Gaussian distribution from a slightly different perspective.</li>
+<li>Investigate why your data is not normal and perhaps use data preparation techniques to normalize the data.</li>
<li>Start looking into the use of nonparametric statistical methods instead of the parametric methods.</li>
<li>If some of the methods suggest that the sample is Gaussian and some not, then perhaps take this as an indication that your data is Gaussian-like.</li>
<li>In many situations, you can treat your data as though it is Gaussian and proceed with your chosen parametric statistical methods.</li>