
https://blog.minitab.com/blog/adventures-in-statistics-2/how-to-correctly-interpret-p-values

Understanding Hypothesis Tests: Why We Need to Use Hypothesis Tests in Statistics

Minitab Blog Editor 05 March, 2015


Hypothesis testing is an essential procedure in statistics. A hypothesis test
evaluates two mutually exclusive statements about a population to determine
which statement is best supported by the sample data. When we say that a
finding is statistically significant, it’s thanks to a hypothesis test. How do these
tests really work and what does statistical significance actually mean?

In this series of three posts, I’ll help you intuitively understand how hypothesis
tests work by focusing on concepts and graphs rather than equations and
numbers. After all, a key reason to use statistical software like Minitab is so you
don’t get bogged down in the calculations and can instead focus on
understanding your results.

To kick things off in this post, I highlight the rationale for using hypothesis tests
with an example.

The Scenario
An economist wants to determine whether the monthly energy cost for families
has changed from the previous year, when the mean cost per month was $260.
The economist randomly samples 25 families and records their energy costs for
the current year. (The data for this example is FamilyEnergyCost and it is just
one of the many data set examples that can be found in Minitab’s Data Set
Library.)

I’ll use the sample’s descriptive statistics (a mean of $330.6 across the 25
families, along with the sample’s variability) to create a probability
distribution plot that shows you the importance of hypothesis tests. Read on!

The Need for Hypothesis Tests


Why do we even need hypothesis tests? After all, we took a random sample and
our sample mean of 330.6 is different from 260. That is different, right?
Unfortunately, the picture is muddied because we’re looking at a sample rather
than the entire population.

Sampling error is the difference between a sample and the entire population.
Thanks to sampling error, it’s entirely possible that while our sample mean is
330.6, the population mean could still be 260. Or, to put it another way, if we
repeated the experiment, it’s possible that the second sample mean could be
close to 260. A hypothesis test helps assess the likelihood of this possibility!

Use the Sampling Distribution to See If Our Sample Mean Is Unlikely

For any given random sample, the mean of the sample almost certainly doesn’t
equal the true mean of the population due to sampling error. For our example, it’s
unlikely that the mean cost for the entire population is exactly 330.6. In fact, if
we took multiple random samples of the same size from the same population, we
could plot a distribution of the sample means.

A sampling distribution is the distribution of a statistic, such as the mean,
that is obtained by repeatedly drawing a large number of samples from a specific
population. This distribution allows you to determine the probability of obtaining
the sample statistic.
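To make this idea concrete, here is a minimal simulation sketch (in Python, which this post doesn’t use; Minitab itself is menu-driven, and the population below is invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical population of monthly energy costs (illustrative values,
# not the FamilyEnergyCost data set).
population = rng.normal(loc=260, scale=154, size=1_000_000)

# Draw many random samples of n = 25 and record each sample mean.
sample_means = [rng.choice(population, size=25).mean() for _ in range(10_000)]

# The distribution of these means is the sampling distribution of the mean.
print(f"mean of the sample means: {np.mean(sample_means):.1f}")    # near 260
print(f"spread of the sample means: {np.std(sample_means):.1f}")   # near 154 / sqrt(25)
```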

Fortunately, I can create a plot of sample means without collecting many


different random samples! Instead, I’ll create a probability distribution plot using
the t-distribution, the sample size, and the variability in our sample to graph the
sampling distribution.
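If you want to reproduce a plot like this outside Minitab, here is one way to sketch it with SciPy. The post doesn’t print the sample standard deviation, so the value below (about $154) is an assumption inferred from the P value and confidence interval quoted later in this series:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

n = 25               # sample size
null_mean = 260      # null hypothesis value
s = 154              # sample standard deviation (assumed; see note above)
se = s / np.sqrt(n)  # standard error of the mean, about 30.8

# Sampling distribution of the mean under the null, based on the t-distribution.
x = np.linspace(null_mean - 4 * se, null_mean + 4 * se, 500)
y = stats.t.pdf((x - null_mean) / se, df=n - 1) / se

plt.plot(x, y)
plt.axvline(330.6, linestyle="--", label="sample mean = 330.6")
plt.xlabel("sample mean of monthly energy cost ($)")
plt.ylabel("density")
plt.legend()
plt.show()
```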

Our goal is to determine whether our sample mean is significantly different from
the null hypothesis mean. Therefore, we’ll use the graph to see whether our
sample mean of 330.6 is unlikely assuming that the population mean is 260. The
graph below shows the expected distribution of sample means.

You can see that the most probable sample mean is 260, which makes sense
because we’re assuming that the null hypothesis is true. However, there is a
reasonable probability of obtaining a sample mean that ranges from 167 to 352,
and even beyond! The takeaway from this graph is that while our sample mean of
330.6 is not the most probable, it’s also not outside the realm of possibility.

The Role of Hypothesis Tests


We’ve placed our sample mean in the context of all possible sample means while
assuming that the null hypothesis is true. Are these results statistically
significant?

As you can see, there is no magic place on the distribution curve to make this
determination. Instead, we have a continual decrease in the probability of
obtaining sample means that are further from the null hypothesis value. Where
do we draw the line?

This is where hypothesis tests are useful. A hypothesis test allows us to quantify
the probability that our sample mean is unusual.
For this series of posts, I’ll continue to use this graphical framework and add in
the significance level, P value, and confidence interval to show how hypothesis
tests work and what statistical significance really means.

 Part Two: Significance Levels (alpha) and P values

 Part Three: Confidence Intervals and Confidence Levels

If you'd like to see how I made these graphs, please read: How to Create a
Graphical Version of the 1-sample t-Test.

Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics

Minitab Blog Editor 19 March, 2015


What do significance levels and P values mean in hypothesis tests?


What is statistical significance anyway? In this post, I’ll continue to focus on
concepts and graphs to help you gain a more intuitive understanding of how
hypothesis tests work in statistics.

To bring it to life, I’ll add the significance level and P value to the graph in my
previous post in order to perform a graphical version of the 1-sample t-test. It’s
easier to understand when you can see what statistical significance truly means!

Here’s where we left off in my last post. We want to determine whether our
sample mean (330.6) indicates that this year's average energy cost is
significantly different from last year’s average energy cost of $260.

The probability distribution plot above shows the distribution of sample
means we’d obtain under the assumption that the null hypothesis is true
(population mean = 260) and we repeatedly drew a large number of random
samples.

I left you with a question: where do we draw the line for statistical significance
on the graph? Now we'll add in the significance level and the P value, which are
the decision-making tools we'll need.

We'll use these tools to test the following hypotheses:

 Null hypothesis: The population mean equals the hypothesized mean (260).

 Alternative hypothesis: The population mean differs from the hypothesized mean (260).

What Is the Significance Level (Alpha)?


The significance level, also denoted as alpha or α, is the probability of rejecting
the null hypothesis when it is true. For example, a significance level of 0.05
indicates a 5% risk of concluding that a difference exists when there is no actual
difference.
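You can verify this definition directly with a quick simulation. The sketch below (Python, with an arbitrary normal population rather than the energy data) runs many experiments in which the null really is true and counts how often a 1-sample t-test rejects at alpha = 0.05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
alpha, n, trials = 0.05, 25, 20_000
rejections = 0

for _ in range(trials):
    # The null hypothesis is TRUE here: the population mean really is 260.
    sample = rng.normal(loc=260, scale=154, size=n)
    if stats.ttest_1samp(sample, popmean=260).pvalue <= alpha:
        rejections += 1

# Prints a value near 0.05: alpha is the long-run false-rejection rate.
print(f"rejection rate under a true null: {rejections / trials:.3f}")
```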

These types of definitions can be hard to understand because of their technical
nature. A picture makes the concepts much easier to comprehend!

The significance level determines how far out from the null hypothesis value
we'll draw that line on the graph. To graph a significance level of 0.05, we need to
shade the 5% of the distribution that is furthest away from the null hypothesis.

In the graph above, the two shaded areas are equidistant from the null
hypothesis value and each area has a probability of 0.025, for a total of 0.05. In
statistics, we call these shaded areas the critical region for a two-tailed test. If
the population mean is 260, we’d expect to obtain a sample mean that falls in the
critical region 5% of the time. The critical region defines how far away our
sample statistic must be from the null hypothesis value before we can say it is
unusual enough to reject the null hypothesis.

Our sample mean (330.6) falls within the critical region, which indicates it is
statistically significant at the 0.05 level.
We can also see if it is statistically significant using the other common
significance level of 0.01.

The two shaded areas each have a probability of 0.005, which adds up to a total
probability of 0.01. This time our sample mean does not fall within the critical
region and we fail to reject the null hypothesis. This comparison shows why you
need to choose your significance level before you begin your study. It protects
you from choosing a significance level because it conveniently gives you
significant results!
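Here is how you might compute those critical-region cutoffs numerically (a sketch; the standard error of about $30.8 is an assumption inferred from the figures quoted later in this post):

```python
from scipy import stats

null_mean, sample_mean, df, se = 260, 330.6, 24, 30.8  # df = n - 1; se assumed

for alpha in (0.05, 0.01):
    # Two-tailed test: put alpha / 2 in each tail.
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    lower, upper = null_mean - t_crit * se, null_mean + t_crit * se
    significant = not (lower <= sample_mean <= upper)
    print(f"alpha = {alpha}: critical region beyond [{lower:.1f}, {upper:.1f}]; "
          f"sample mean in critical region: {significant}")

# alpha = 0.05 -> cutoffs near [196, 324], so 330.6 is significant.
# alpha = 0.01 -> cutoffs near [174, 346], so 330.6 is not.
```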

Thanks to the graph, we were able to determine that our results are statistically
significant at the 0.05 level without using a P value. However, when you use the
numeric output produced by statistical software, you’ll need to compare the P
value to your significance level to make this determination.

What Are P values?


P values are the probability of obtaining an effect at least as extreme as the one
in your sample data, assuming the truth of the null hypothesis.
This definition of P values, while technically correct, is a bit convoluted. It’s
easier to understand with a graph!

To graph the P value for our example data set, we need to determine the distance
between the sample mean and the null hypothesis value (330.6 - 260 = 70.6).
Next, we can graph the probability of obtaining a sample mean that is at least as
extreme in both tails of the distribution (260 +/- 70.6).

In the graph above, the two shaded areas each have a probability of 0.01556, for
a total probability of 0.03112. This probability represents the likelihood of obtaining
a sample mean that is at least as extreme as our sample mean in both tails of
the distribution if the population mean is 260. That’s our P value!
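The same number falls out of a standard t computation. Here is a sketch (again using the sample standard deviation of about $154 inferred from the post’s figures):

```python
import numpy as np
from scipy import stats

n, null_mean, sample_mean, s = 25, 260, 330.6, 154  # s assumed (see earlier note)
se = s / np.sqrt(n)

t_stat = (sample_mean - null_mean) / se          # about 2.29
# Two-tailed P value: the probability in both tails beyond |t_stat|.
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(f"t = {t_stat:.2f}, P = {p_value:.4f}")    # P near 0.031
```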

When a P value is less than or equal to the significance level, you reject the null
hypothesis. If we take the P value for our example and compare it to the common
significance levels, it matches the previous graphical results. The P value of
0.03112 is statistically significant at an alpha level of 0.05, but not at the 0.01
level.

If we stick to a significance level of 0.05, we can conclude that the average
energy cost for the population differs from $260. Because our sample mean is
higher, the data point to an increase.

A common mistake is to interpret the P value as the probability that the null
hypothesis is true. To understand why this interpretation is incorrect, please read
my blog post How to Correctly Interpret P Values.

Discussion about Statistically Significant Results

A hypothesis test evaluates two mutually exclusive statements about a
population to determine which statement is best supported by the sample data.
A test result is statistically significant when the sample statistic is unusual
enough relative to the null hypothesis that we can reject the null hypothesis for
the entire population. “Unusual enough” in a hypothesis test is defined by:

 The assumption that the null hypothesis is true—the graphs are centered on the null hypothesis value.

 The significance level—how far out do we draw the line for the
critical region?

 Our sample statistic—does it fall in the critical region?

Keep in mind that there is no magic significance level that distinguishes
between the studies that have a true effect and those that don’t with 100%
accuracy. The common alpha values of 0.05 and 0.01 are simply based on
tradition. For a significance level of 0.05, expect to obtain sample means in the
critical region 5% of the time when the null hypothesis is true. In these cases,
you won’t know that the null hypothesis is true but you’ll reject it because the
sample mean falls in the critical region. That’s why the significance level is also
referred to as an error rate!

This type of error doesn’t imply that the experimenter did anything wrong or
require any other unusual explanation. The graphs show that when the null
hypothesis is true, it is possible to obtain these unusual sample means for no
reason other than random sampling error. It’s just luck of the draw.

Significance levels and P values are important tools that help you quantify and
control this type of error in a hypothesis test. Using these tools to decide when
to reject the null hypothesis increases your chance of making the correct
decision.

If you like this post, you might want to read the other posts in this series that
use the same graphical framework:

 Previous: Why We Need to Use Hypothesis Tests

 Next: Confidence Intervals and Confidence Levels

How to Correctly Interpret Confidence Intervals and Confidence Levels

A confidence interval is a range of
values that is likely to contain an unknown population parameter. If you draw a
random sample many times, a certain percentage of the confidence intervals will
contain the population mean. This percentage is the confidence level.
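A quick simulation makes this long-run interpretation concrete. The sketch below (with an invented, known population, purely for illustration) builds many 95% intervals and counts how many capture the true mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
true_mean, n, trials = 260, 25, 10_000
hits = 0

for _ in range(trials):
    sample = rng.normal(loc=true_mean, scale=154, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    margin = stats.t.ppf(0.975, df=n - 1) * se
    if sample.mean() - margin <= true_mean <= sample.mean() + margin:
        hits += 1

print(f"intervals containing the true mean: {hits / trials:.3f}")  # near 0.95
```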

Most frequently, you’ll use confidence intervals to bound the mean or standard
deviation, but you can also obtain them for regression coefficients, proportions,
rates of occurrence (Poisson), and for the differences between populations.

Just as there is a common misconception of how to interpret P values, there’s a
common misconception of how to interpret confidence intervals. In this case, the
confidence level is not the probability that a specific confidence interval
contains the population parameter.

The confidence level represents the theoretical ability of the analysis to produce
accurate intervals if you are able to assess many intervals and you know the
value of the population parameter. For a specific confidence interval from one
study, the interval either contains the population value or it does not—there’s no
room for probabilities other than 0 or 1. And you can't choose between these two
possibilities because you don’t know the value of the population parameter.

"The parameter is an unknown constant and no probability statement


concerning its value may be made."
—Jerzy Neyman, original developer of confidence intervals.

This will be easier to understand after we discuss the graph below . . .

With this in mind, how do you interpret confidence intervals?

Confidence intervals serve as good estimates of the population parameter
because the procedure tends to produce intervals that contain the parameter.
Confidence intervals consist of the point estimate (the most likely value)
and a margin of error around that point estimate. The margin of error indicates
the amount of uncertainty that surrounds the sample estimate of the population
parameter.

In this vein, you can use confidence intervals to assess the precision of the
sample estimate. For a specific variable, a narrower confidence interval [90 110]
suggests a more precise estimate of the population parameter than a wider
confidence interval [50 150].

Confidence Intervals and the Margin of Error

Let’s move on to see how confidence intervals account for that margin of error.
To do this, we’ll use the same tools that we’ve been using to understand
hypothesis tests. I’ll create a sampling distribution using probability distribution
plots, the t-distribution, and the variability in our data. We'll base our confidence
interval on the energy cost data set that we've been using.

When we looked at significance levels, the graphs displayed a sampling
distribution centered on the null hypothesis value, and the outer 5% of the
distribution was shaded. For confidence intervals, we need to shift the sampling
distribution so that it is centered on the sample mean and shade the middle 95%.
The shaded area shows the range of sample means that you’d obtain 95% of the
time using our sample mean as the point estimate of the population mean. This
range [267 394] is our 95% confidence interval.
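Computed directly, the interval comes out the same way (a sketch, with the sample standard deviation of about $154 inferred from the post’s figures):

```python
import numpy as np
from scipy import stats

n, sample_mean, s = 25, 330.6, 154           # s assumed (see earlier note)
se = s / np.sqrt(n)
margin = stats.t.ppf(0.975, df=n - 1) * se   # about 63.6

print(f"95% CI: [{sample_mean - margin:.0f}, {sample_mean + margin:.0f}]")
# Prints roughly [267, 394], matching the shaded range described above.
```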

Using the graph, it’s easier to understand how a specific confidence interval
represents the margin of error, or the amount of uncertainty, around the point
estimate. The sample mean is the most likely value for the population mean
given the information that we have. However, the graph shows it would not be
unusual at all for other random samples drawn from the same population to
obtain different sample means within the shaded area. These other likely sample
means all suggest different values for the population mean. Hence, the interval
represents the inherent uncertainty that comes with using sample data.

You can use these graphs to calculate probabilities for specific values. However,
notice that you can’t place the population mean on the graph because that value
is unknown. Consequently, you can’t calculate probabilities for the population
mean, just as Neyman said!

Why P Values and Confidence Intervals Always Agree About Statistical Significance

You can use either P values or confidence intervals to determine whether your
results are statistically significant. If a hypothesis test produces both, these
results will agree.

The confidence level is equivalent to 1 – the alpha level. So, if your significance
level is 0.05, the corresponding confidence level is 95%.

 If the P value is less than your significance (alpha) level, the hypothesis test is statistically significant.

 If the confidence interval does not contain the null hypothesis value,
the results are statistically significant.

 If the P value is less than alpha, the confidence interval will not
contain the null hypothesis value.

For our example, the P value (0.031) is less than the significance level (0.05),
which indicates that our results are statistically significant. Similarly, our 95%
confidence interval [267 394] does not include the null hypothesis mean of 260
and we draw the same conclusion.

To understand why the results always agree, let’s recall how both the
significance level and confidence level work.

 The significance level defines the distance the sample mean must
be from the null hypothesis to be considered statistically significant.

 The confidence level defines how far the confidence limits extend from the sample mean.

Both the significance level and the confidence level define a distance from a
limit to a mean. Guess what? The distances in both cases are exactly the same!

The distance equals the critical t-value * standard error of the mean. For our
energy cost example data, the distance works out to be $63.57.
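You can verify that both criteria reduce to the same comparison (a sketch using the inferred standard error of about $30.8):

```python
from scipy import stats

null_mean, sample_mean, df, se = 260, 330.6, 24, 30.8  # se assumed

distance = stats.t.ppf(0.975, df) * se       # about 63.57
observed = abs(sample_mean - null_mean)      # 70.6

# The same comparison, phrased two ways:
print(observed > distance)   # True: significant at the 0.05 level
print(not (sample_mean - distance <= null_mean <= sample_mean + distance))
# True: the 95% CI excludes 260. The two answers always match.
```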

Imagine this discussion between the null hypothesis mean and the sample mean:

Null hypothesis mean, hypothesis test representative: Hey buddy! I’ve found that
you’re statistically significant because you’re more than $63.57 away from me!

Sample mean, confidence interval representative: Actually, I’m significant
because you’re more than $63.57 away from me!

Very agreeable, aren’t they? And they always will agree as long as you compare
the correct pairs of P values and confidence intervals. If you compare the
incorrect pair, you can get conflicting results, as shown by common mistake #1
in this post.

Closing Thoughts
In statistical analyses, there tends to be a greater focus on P values and simply
detecting a significant effect or difference. However, a statistically significant
effect is not necessarily meaningful in the real world. For instance, the effect
might be too small to be of any practical value.

It’s important to pay attention to both the magnitude and the precision of the
estimated effect. That’s why I'm rather fond of confidence intervals. They allow
you to assess these important characteristics along with the statistical
significance. You'd like to see a narrow confidence interval where the entire
range represents an effect that is meaningful in the real world.

If you like this post, you might want to read the previous posts in this series that
use the same graphical framework:

 Part One: Why We Need to Use Hypothesis Tests

 Part Two: Significance Levels (alpha) and P values

For more about confidence intervals, read my post where I compare them to
tolerance intervals and prediction intervals.

How to Correctly Interpret P Values


Minitab Blog Editor 17 April, 2014


The P value is used all over statistics, from t-tests to regression
analysis. Everyone knows that you use P values to determine statistical
significance in a hypothesis test. In fact, P values often determine what studies
get published and what projects get funding.

Despite being so important, the P value is a slippery concept that people often
interpret incorrectly. How do you interpret P values?

In this post, I'll help you to understand P values in a more intuitive way and to
avoid a very common misinterpretation that can cost you money and credibility.

What Is the Null Hypothesis in Hypothesis Testing?

In order to understand P values, you must first understand the null hypothesis.

In every experiment, there is an effect or difference between groups that the
researchers are testing. It could be the effectiveness of a new drug, building
material, or other intervention that has benefits. Unfortunately for the
researchers, there is always the possibility that there is no effect, that is, that
there is no difference between the groups. This lack of a difference is called
the null hypothesis, which is essentially the position a devil’s advocate would
take when evaluating the results of an experiment.

To see why, let’s imagine an experiment for a drug that we know is totally
ineffective. The null hypothesis is true: there is no difference between the
experimental groups at the population level.

Despite the null being true, it’s entirely possible that there will be an effect in
the sample data due to random sampling error. In fact, it is extremely unlikely
that the sample groups will ever exactly equal the null hypothesis value.
Consequently, the devil’s advocate position is that the observed difference in
the sample does not reflect a true difference between populations.

What Are P Values?

P values evaluate how well the sample data support the devil’s advocate
argument that the null hypothesis is true. They measure how compatible your
data are with the null hypothesis. How likely is
the effect observed in your sample data if the null hypothesis is true?

 High P values: your data are likely with a true null.

 Low P values: your data are unlikely with a true null.

A low P value suggests that your sample provides enough evidence that you can
reject the null hypothesis for the entire population.

How Do You Interpret P Values?


In technical terms, a P value is the probability of obtaining an effect at least as
extreme as the one in your sample data, assuming the truth of the null
hypothesis.

For example, suppose that a vaccine study produced a P value of 0.04. This P
value indicates that if the vaccine had no effect, you’d obtain the observed
difference or more in 4% of studies due to random sampling error.
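You can watch this long-run interpretation play out in a simulation. The sketch below invents a null-true trial (every number in it is hypothetical; the observed difference is chosen so the answer lands near 0.04) and measures how often chance alone produces a difference at least that large:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n, observed_diff, trials = 100, 2.9, 20_000   # hypothetical study numbers
extreme = 0

for _ in range(trials):
    # The null is true: both groups come from the same population.
    treated = rng.normal(loc=50, scale=10, size=n)
    control = rng.normal(loc=50, scale=10, size=n)
    if abs(treated.mean() - control.mean()) >= observed_diff:
        extreme += 1

# The fraction of null-true studies at least as extreme as the observed result.
print(f"{extreme / trials:.3f}")   # near 0.04: the P value as a long-run frequency
```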

P values address only one question: how likely are your data, assuming a true
null hypothesis? They do not measure support for the alternative hypothesis. This
limitation leads us into the next section to cover a very common
misinterpretation of P values.

P Values Are NOT the Probability of Making a Mistake

Incorrect interpretations of P values are very common. The most common
mistake is to interpret a P value as the probability of making a mistake by
rejecting a true null hypothesis (a Type I error).

There are several reasons why P values can’t be the error rate.

First, P values are calculated based on the assumptions that the null is true for
the population and that the difference in the sample is caused entirely by
random chance. Consequently, P values can’t tell you the probability that the null
is true or false because the null is assumed to be 100% true from the perspective of the calculations.

Second, while a low P value indicates that your data are unlikely assuming a true
null, it can’t evaluate which of two competing cases is more likely:

 The null is true but your sample was unusual.

 The null is false.

Determining which case is more likely requires subject area knowledge and
replicate studies.

Let’s go back to the vaccine study and compare the correct and incorrect way to
interpret the P value of 0.04:

 Correct: Assuming that the vaccine had no effect, you’d obtain the
observed difference or more in 4% of studies due to random sampling
error.

 Incorrect: If you reject the null hypothesis, there’s a 4% chance that you’re making a mistake.

To see a graphical representation of how hypothesis tests work, see my
post: Understanding Hypothesis Tests: Significance Levels and P Values.

What Is the True Error Rate?


Think that this interpretation difference is simply a matter of semantics, and
only important to picky statisticians? Think again. It’s important to you.

If a P value is not the error rate, what the heck is the error rate? (Can you guess
which way this is heading now?)

Sellke et al.* have estimated the error rates associated with different P values.
While the precise error rate depends on various assumptions (which I
discuss here), the table summarizes them for middle-of-the-road assumptions.
P value    Probability of incorrectly rejecting a true null hypothesis
0.05       At least 23% (and typically close to 50%)
0.01       At least 7% (and typically close to 15%)

Do the higher error rates in this table surprise you? Unfortunately, the common
misinterpretation of P values as the error rate creates the illusion of
substantially more evidence against the null hypothesis than is justified. As you
can see, if you base a decision on a single study with a P value near 0.05, the
difference observed in the sample may not exist at the population level. That can
be costly!
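One way to see where numbers like these come from is to simulate a world of studies and ask: among results with a P value near 0.05, how often was the null actually true? The sketch below is only an illustration, and the answer depends heavily on its assumptions (here, half of all studied effects are truly null, and real effects are modest):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
n, trials = 25, 100_000
false_hits, true_hits = 0, 0

for _ in range(trials):
    null_true = rng.random() < 0.5          # half of all studied effects are zero
    effect = 0.0 if null_true else 0.5      # a modest standardized effect otherwise
    sample = rng.normal(loc=effect, scale=1.0, size=n)
    p = stats.ttest_1samp(sample, popmean=0.0).pvalue
    if 0.04 <= p <= 0.05:                   # results with P values "near 0.05"
        if null_true:
            false_hits += 1
        else:
            true_hits += 1

# The share of near-0.05 results where the null was actually true:
# far higher than 5%, in the spirit of the table above.
print(f"{false_hits / (false_hits + true_hits):.2f}")
```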

Now that you know how to interpret P values, read my five guidelines for how to
use P values and avoid mistakes.

You can also read my rebuttal to an academic journal that actually banned P
values!

An exciting study about the reproducibility of experimental results was published
in August 2015. This study highlights the importance of understanding the true
error rate. For more information, read my blog post: P Values and the Replication
of Experiments.

The American Statistical Association speaks out on how to use p-values!
