Hypothesis Testing
STC224
Inferential statistics
• Inferential statistics are statistical procedures that use
samples to make generalizations about populations.
• For example, consider trying to forecast the winner in a
presidential election.
• Using inferential statistics, with a relatively small sample
(e.g., 2000 people), statements about the population of
all U.S. voters can be made with a fairly high degree of
accuracy (assuming the sample is representative of the
population from which it comes).
• Inferential statistics are extremely useful because they
allow us to draw conclusions about populations based
on limited information (i.e., samples).
Hypothesis Testing
• When we use a sample to make an inference
about a population, we engage in a process
known as hypothesis testing.
Types of Hypothesis Testing
• In hypothesis testing, it is customary to state two hypotheses,
a null hypothesis and an alternative hypothesis.
• The null hypothesis typically states that a treatment did not
have an effect, while the alternative hypothesis states that
the treatment had an effect.
• Suppose, for example, that a new therapy is being evaluated
to determine if it helps people who are suffering from
depression. The null hypothesis would state that the therapy
did not have an effect on depression (i.e., it did not change
the way people felt), while the alternative hypothesis would
state that the therapy had an effect on depression (i.e., it
changed the way people felt).
Types of Hypothesis Testing cont’d
• In hypothesis testing, we evaluate the null
hypothesis with the purpose of either
rejecting it or not rejecting it.
• If, based on the results found in our study, the
null hypothesis seems reasonable, we do not
reject it.
• If, however, the null hypothesis seems
unreasonable, we reject it (and side with the
alternative hypothesis).
Types of tests
• One-Tailed and
• Two-Tailed Tests
One-Tailed and Two-Tailed Tests
• Hypotheses are evaluated using either a two-tailed
test or a one-tailed test.
• A two-tailed test is used when a treatment is
evaluated to see whether it has an impact in either
direction (to see if scores are higher or lower),
while a one-tailed test is used when the intent is to
investigate only a single direction (only higher or
only lower).
• Consider an example where two different therapies
are investigated to see which one is more effective
at helping people with anxiety (we’ll refer to the
therapies as therapy A and therapy B).
One-Tailed and Two-Tailed Tests cont’d
• A two-tailed test would allow for the possibility of either
therapy being more effective (i.e., either A < B or B < A),
while a one-tailed test would specify beforehand a
single direction to be investigated (e.g., A < B).
• The advantage of a one-tailed test is that it has a greater
chance of finding the effect (assuming it exists in the
hypothesized direction); the disadvantage of a one-tailed
test is that if the effect lies in the opposite direction
from what was anticipated, it cannot be declared significant.
• In practice, two-tailed tests are more common in
research, and for this reason they will be emphasized
here.
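A minimal sketch of how the two kinds of test treat the same result. It assumes the test produces a z statistic that follows a standard normal distribution when the null hypothesis is true; the function name p_values and the statistic 1.80 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def p_values(z):
    """Return one-tailed and two-tailed p-values for a z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    upper = 1 - std_normal.cdf(z)       # one-tailed, higher scores predicted
    lower = std_normal.cdf(z)           # one-tailed, lower scores predicted
    two_tailed = 2 * min(upper, lower)  # a difference in either direction
    return {"upper": upper, "lower": lower, "two_tailed": two_tailed}


# Hypothetical test statistic, not taken from any real study.
result = p_values(1.80)
for name, p in result.items():
    print(name, round(p, 3))
```

With this statistic, the one-tailed p-value in the predicted (upper) direction is about .036, the two-tailed p-value is twice that (about .072), and the one-tailed p-value in the wrong direction is about .964. The same result is easier to detect with a one-tailed test in the correct direction and cannot be detected at all with a one-tailed test in the wrong direction.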
Type I and Type II Errors
• Recall that in hypothesis testing samples are used to make
inferences about populations. Because samples are
incomplete “pictures” of populations, it is possible to make
a mistake in the hypothesis testing process.
• There are two different types of mistakes that can be made
—a Type I error and a Type II error.
• A Type I error occurs if the null hypothesis is rejected when
it is true (if it is true, it should not have been rejected).
• A Type II error occurs if the null hypothesis is not rejected
when it is false (if it is false, it should have been rejected).
• Both are errors in hypothesis testing since the conclusion
made from the hypothesis test is contrary to the true
situation.
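A Type I error can be illustrated with a small simulation, sketched here under stated assumptions: both groups are drawn from the same normal population (so the null hypothesis is true by construction), the population standard deviation is treated as known so that a simple z-test can stand in for the tests SPSS would run, and the null is rejected when the p-value is .05 or less. All names and numbers are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(group_a, group_b, sigma):
    """Two-tailed z-test p-value for a difference in means (sigma known)."""
    standard_error = sigma * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(n_studies=5000, n=10, cutoff=0.05, seed=1):
    """Share of simulated studies that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    mu, sigma = 12.0, 3.0  # hypothetical population of recall scores
    rejections = 0
    for _ in range(n_studies):
        # Both groups come from the same population: no real difference.
        a = [rng.gauss(mu, sigma) for _ in range(n)]
        b = [rng.gauss(mu, sigma) for _ in range(n)]
        if two_sided_p(a, b, sigma) <= cutoff:
            rejections += 1  # every rejection here is a Type I error
    return rejections / n_studies


rate = type_i_error_rate()
print(rate)
```

The printed rate should land close to the cutoff of .05: when the null hypothesis is true, roughly that share of studies rejects it by chance alone.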
Power
• While Type I and Type II errors are mistakes in hypothesis
testing, power is concerned with making a correct decision.
• Power is equal to the probability of rejecting the null
hypothesis when it is false (if the null hypothesis is false
and it is rejected, a correct decision has been made).
• Power ranges from 0 to 1, with higher values indicating
greater power.
• Power of .80, for example, means that, before the study is
conducted, there is an 80% chance of rejecting the null
hypothesis if it is in fact false.
• (Put another way, if the null hypothesis is false and the
study were conducted many times, power of .80 means that 80%
of the time the null hypothesis would be rejected, a correct
decision, and 20% of the time it would not be rejected, a
Type II error.)
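Power can be computed directly in simple cases. The sketch below assumes a two-tailed z-test comparing two equal-sized groups with a known population standard deviation; the function name z_test_power and the numbers (a true difference of 2 words, a standard deviation of 3) are hypothetical.

```python
from statistics import NormalDist


def z_test_power(true_difference, sigma, n_per_group, alpha=0.05):
    """Probability that a two-tailed z-test rejects the null hypothesis."""
    std_normal = NormalDist()
    standard_error = sigma * (2 / n_per_group) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # cutoff for rejection
    shift = true_difference / standard_error    # true effect in z units
    # Chance the statistic lands beyond either cutoff.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for n in (10, 36, 100):
    power = z_test_power(true_difference=2, sigma=3, n_per_group=n)
    # The probability of a Type II error is 1 minus power.
    print(n, round(power, 2), round(1 - power, 2))
```

Under these assumptions, power rises as the groups get larger, and the probability of a Type II error falls by the same amount.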
Sampling Error
• Sampling error is a very important concept in inferential
statistics and it’s one that, if understood, will make following
the logic of hypothesis testing easier.
• Consider a problem where two different strategies for
remembering words are compared.
• Suppose that there is no difference between the two strategies
so that whichever strategy is used, the number of words
recalled is the same in the population.
• Imagine that I took a sample of 10 people and gave them the
first strategy, and then I took a sample of another 10 people
and gave them the second strategy, and then compared the
number of words recalled for the two groups.
• While the strategies result in the same number of words
recalled in the population, it is extremely unlikely that the
number of words recalled for the two groups will be the same
in the samples.
Sampling Error cont’d
• This is because samples are subsets of the
population, and since only part of the population is
taken, it will not be represented perfectly.
• Generally speaking, the smaller the sample, the
larger the discrepancy between the sample and the
population.
• The discrepancy between the sample and the
population is known as sampling error.
• It is a normal part of statistics and it’s important to
keep it in mind in the following chapters.
• When different samples are drawn from the same
population, the samples will typically not be the
same (e.g., they will not have the same mean).
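Sampling error is easy to see in a simulation. The sketch below builds a hypothetical population of recall scores (the mean of 12 and standard deviation of 3 are assumptions, not values from the example), draws two samples of 10 from it, and then measures how far sample means typically fall from the population mean at several sample sizes.

```python
import random
from statistics import fmean

rng = random.Random(42)

# Hypothetical population: words recalled by 100,000 people.
population = [rng.gauss(12, 3) for _ in range(100_000)]
population_mean = fmean(population)


def average_sampling_error(sample_size, n_samples=500):
    """Average absolute gap between a sample mean and the population mean."""
    gaps = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        gaps.append(abs(fmean(sample) - population_mean))
    return fmean(gaps)


# Two samples of 10 from the same population rarely share a mean.
sample_1 = rng.sample(population, 10)
sample_2 = rng.sample(population, 10)
print(round(fmean(sample_1), 2), round(fmean(sample_2), 2))

# Smaller samples show larger discrepancies on average.
errors = {size: average_sampling_error(size) for size in (10, 100, 1000)}
for size, error in errors.items():
    print(size, round(error, 3))
```

The two sample means differ even though both samples come from the same population, and the average discrepancy shrinks as the sample size grows.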
p-Values
• Continuing with the word strategy example, if samples are
not typically the same when drawn from a population, how
can we determine whether there is really a meaningful
difference between the samples (i.e., one strategy is really
better than the other in the population), or if the difference
is just due to sampling error (i.e., the strategies are the
same in the population)? This decision is made based on the
p-value obtained from the output in SPSS.
• A p-value indicates the exact probability of obtaining the
specific results (or results even more extreme) if the null
hypothesis is true.
• For example, in the learning strategy problem described
previously, let’s assume there was a difference of two words
recalled between the strategies with a p-value of .03.
p-Values cont’d
• The p-value of .03 would indicate that there is only a
3% chance of getting a difference of two words (or
more) between the groups if the null hypothesis were
true.
• In hypothesis testing, the p-value for a test is compared
to a predetermined value, known as alpha (represented
by the symbol α), and based on that comparison, a
decision is made about the null hypothesis.
• In this text, we will use an alpha level of .05 (the most
commonly used value in the social and behavioral
sciences) and evaluate the p-value against that level.
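To make the idea of a p-value concrete, the sketch below estimates one with a permutation test. This is a stand-in technique, not the procedure SPSS uses: if the null hypothesis is true, the group labels are arbitrary, so the scores can be reshuffled between groups to see how often a difference at least as large as the observed one arises by chance. The recall scores are hypothetical and were chosen so that the groups differ by two words on average; the resulting p-value is not expected to equal the .03 used in the example.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-tailed p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical words-recalled scores for 10 people per strategy.
strategy_1 = [12, 14, 11, 15, 13, 16, 12, 14, 15, 13]
strategy_2 = [10, 12, 11, 13, 9, 12, 11, 10, 13, 14]

p = permutation_p_value(strategy_1, strategy_2)
print(fmean(strategy_1) - fmean(strategy_2), p)
```

The returned value is the share of reshuffles that produced a difference of two words or more, which is then compared against alpha in the same way as a p-value read from the SPSS output.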
p-Values cont’d
• The process of evaluating the p-value is as
follows:
– If the p-value is less than or equal to .05 (alpha),
the null hypothesis is rejected (a difference
between the strategies is assumed).
– If the p-value is greater than .05 (alpha), the null
hypothesis is not rejected (a difference between
the strategies is not assumed).
p-Values cont’d
• In SPSS the p-value is reported as “sig.”
• The decision process for hypothesis testing is
summarized in the following table
p-Values cont’d
• Consider the examples shown in the table below.
• The first example shows a p-value of .020.
• With an alpha of .05, what decision would be
made about the null hypothesis?
• Since the p-value of .02 is less than .05, the null
hypothesis is rejected. In examples 2 and 3, the
null hypothesis would not be rejected since both
p-values (.080 and .521) are greater than .05.
p-Values cont’d
• While an alpha of .05 will be used in this text, if you wish to
use another value (such as .01 or .001), all you have to do is
adjust your decision rule. If an alpha of .01 were used, for
example, all three p-values in the preceding table would lead
to the null hypothesis not being rejected since all values are
greater than .01.
• As is shown in this example, the value of alpha (.05 vs. .01)
can make a difference in the conclusion made about the null
hypothesis (i.e., whether it is rejected or not).
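The decision rule is short enough to write out as code. The sketch below applies it to the three p-values from the examples (.020, .080, and .521) at alpha levels of .05 and .01; the function name decide is hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value is at or below alpha."""
    return "reject" if p_value <= alpha else "do not reject"


p_values = [0.020, 0.080, 0.521]  # the three examples discussed above

for alpha in (0.05, 0.01):
    decisions = [decide(p, alpha) for p in p_values]
    print(alpha, decisions)
# 0.05 ['reject', 'do not reject', 'do not reject']
# 0.01 ['do not reject', 'do not reject', 'do not reject']
```

Only the first p-value changes its outcome, because .020 falls between the two alpha levels.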
Effect Sizes
• While hypothesis testing is a powerful tool for making
inferences about populations, it is important to recognize
what hypothesis testing does and does not tell us.
• Considering our word strategy problem again, hypothesis
testing allows us to conclude, with a reasonable degree of
assurance, whether or not the two strategies are different
in the population.
• What hypothesis testing fails to indicate, however, is how
different the groups are (i.e., hypothesis testing indicates
that the groups are different but doesn’t indicate whether
the difference is small, moderate, or large).
• One way to describe the degree of difference between the
groups is by calculating an effect size.
• Effect sizes indicate the magnitude of the results in our
study.
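The section does not name a specific effect size, so the sketch below uses Cohen's d, one common choice for comparing two group means: the difference between the means divided by the pooled standard deviation. The recall scores are hypothetical.

```python
from statistics import fmean, variance


def cohens_d(group_a, group_b):
    """Difference between group means in pooled standard deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (fmean(group_a) - fmean(group_b)) / pooled_variance ** 0.5


# Hypothetical words-recalled scores for 10 people per strategy.
strategy_1 = [12, 14, 11, 15, 13, 16, 12, 14, 15, 13]
strategy_2 = [10, 12, 11, 13, 9, 12, 11, 10, 13, 14]

d = cohens_d(strategy_1, strategy_2)
print(round(d, 2))
```

By Cohen's widely used rule of thumb, values near 0.2, 0.5, and 0.8 are read as small, medium, and large effects. Unlike a p-value, d describes how far apart the groups are rather than whether the difference is likely to be due to sampling error.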
Questions