What is Hypothesis Testing?
Hypothesis testing is a statistical tool used to determine whether the results of an
experiment are statistically significant, that is, unlikely to be explained by chance alone.
It involves setting up a null hypothesis and an alternative hypothesis. These two hypotheses
will always be mutually exclusive. This means that if the null hypothesis is true then the
alternative hypothesis is false and vice versa.
A hypothesis is an assumption or idea, specifically a statistical claim about an unknown
population parameter.
Types of Hypotheses:
There are two types of hypotheses used in hypothesis testing:
Null Hypothesis: The effect does not exist in the population
The Null Hypothesis assumes that no relationship exists between the variables
being studied (one variable does not affect the other). In other words, it assumes that
any observed effects in the sample are due to chance and are not real. The Null
Hypothesis is typically denoted as H0.
Alternative Hypothesis: The effect does exist in the population
It is the opposite of the Null Hypothesis. It assumes that a real relationship exists
between the variables. In other words, it assumes that the observed effects
in the sample are real and are not due to chance. The Alternative Hypothesis is
typically denoted as H1 or Ha.
Common Hypothesis Tests:
Z-Test: (If the population standard deviation is known, the Z-statistic is commonly
used.)
The Z-test is used to compare a sample mean with a population mean (or the means of two
groups) when the population standard deviation is known and the sample size is large.
It is calculated by dividing the difference between the two means by the standard error
of the mean. The resulting z-score is then compared to the standard normal distribution to
determine whether the difference is statistically significant.
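The z-score for comparing a sample mean with a population mean can be calculated using
the formula:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
where x̄ is the sample mean, μ is the population mean, σ is the population standard
deviation, and n is the sample size.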
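The z-score described here, z = (x̄ − μ) / (σ / √n), can be sketched in Python using only the standard library. The function name and the input numbers (n = 36, x̄ = 52, μ = 50, σ = 6) are hypothetical and chosen only for illustration. The two-sided p-value uses the identity P(|Z| ≥ |z|) = erfc(|z| / √2) for a standard normal Z.

```python
import math

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p-value) for H0: population mean == pop_mean.

    Assumes the population standard deviation pop_sd is known.
    """
    standard_error = pop_sd / math.sqrt(n)        # sigma / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: n = 36, sample mean 52, claimed population mean 50, known sigma 6
z, p = one_sample_z_test(sample_mean=52, pop_mean=50, pop_sd=6, n=36)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these numbers z = 2.00 and p ≈ 0.046. Because p is below a significance level of 0.05, H0 would be rejected at the 5% level.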
T-Test: (If the population standard deviation is unknown and the sample size is small,
typically n < 30, the t-statistic is more appropriate.)
The T-test is similar to the Z-test, but it is used when the population standard
deviation is unknown and the sample size is small.
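The t-test divides the difference between the two means by a standard error estimated
from the sample standard deviation. It then compares the result to a t-distribution with
the appropriate degrees of freedom to determine whether the difference is statistically
significant.
There are several types of t-tests:
- One-sample t-test: compares a sample mean against a hypothesized population mean.
- Independent two-sample t-test: compares the means of two unrelated groups.
- Paired t-test: compares two related measurements taken on the same subjects.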
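The one-sample t-test can be sketched in Python using only the standard library. The standard library has no t-distribution CDF, so this sketch does not compute a p-value. It compares |t| with a critical value taken from a t-table instead: 2.262 for a two-sided test at the 5% level with 9 degrees of freedom. The function name and the sample data are hypothetical. If SciPy is available, `scipy.stats.ttest_1samp`, `ttest_ind` and `ttest_rel` cover the one-sample, independent and paired cases and also return p-values.

```python
import math
import statistics

def one_sample_t_statistic(data, pop_mean):
    """Return (t, degrees of freedom) for H0: population mean == pop_mean.

    The unknown population SD is replaced by the sample SD (n - 1 denominator).
    """
    n = len(data)
    s = statistics.stdev(data)                # sample standard deviation
    standard_error = s / math.sqrt(n)
    t = (statistics.mean(data) - pop_mean) / standard_error
    return t, n - 1

# Hypothetical sample of 10 measurements; H0: mu = 50
sample = [51, 53, 49, 55, 52, 54, 50, 56, 53, 52]
t, df = one_sample_t_statistic(sample, 50)

# Two-sided critical value at the 5% level for df = 9, taken from a t-table
T_CRIT_DF9 = 2.262
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > T_CRIT_DF9}")
```

Here t ≈ 3.64, which exceeds 2.262, so H0: μ = 50 would be rejected at the 5% level.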
F-Test:
The F-test compares two variances by taking the ratio of the two sample variances. It is
also used in analysis of variance (ANOVA), where the F-statistic tests the equality of
means across multiple groups.
Chi-Square Test:
The Chi-Square test can be used to test a population variance against a known or
assumed value. It is also used to test how well a distribution fits the data (goodness
of fit) and whether two categorical variables are independent. The most commonly used
Chi-Square tests are the goodness-of-fit test and the test of independence. The test of
variance is another.
ANOVA:
Analysis of Variance, or ANOVA, compares the means of two or more groups (populations
or samples). It is similar in use to the t-test or the Z-test, but it can compare more
than two sample means at once. ANOVA tests whether an independent variable X (the
grouping factor) has a significant effect on a dependent variable Y. It does this with an
F-statistic: the ratio of the variance between the group means to the variance within
the groups.
Type I and Type II Error:
Errors can occur when performing hypothesis testing. Two types of errors are generally
encountered:
Type I Error:
A Type I error occurs when we reject the null hypothesis even though it is true. The
probability of making a Type I error is the significance level, alpha (α). It is a false
positive: an incorrect rejection of the null hypothesis.
Type II Error:
A Type II error occurs when we fail to reject the null hypothesis even though it is false.
The probability of making a Type II error is called beta (β). It is a false negative: an
incorrect failure to reject (often called "acceptance of") the null hypothesis.
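The meaning of α and β can be checked with a small Monte Carlo sketch in Python, using only the standard library. The sketch repeatedly runs a two-sided z-test of H0: μ = 0, with σ = 1 treated as known.

- When the data really have mean 0, the rejection rate estimates α.
- When the data have mean 0.5, one minus the rejection rate estimates β. The rejection rate itself is the power of the test, 1 − β.

The function name, the sample size n = 25, the alternative mean 0.5, the seed and the trial count are arbitrary illustrative choices.

```python
import math
import random

def rejection_rate(true_mean, n=25, z_crit=1.96, trials=20_000, seed=0):
    """Fraction of simulated samples in which a two-sided z-test rejects H0: mu = 0.

    Samples come from N(true_mean, 1), with sigma = 1 treated as known.
    If true_mean == 0, H0 is true and every rejection is a Type I error.
    If true_mean != 0, H0 is false and every non-rejection is a Type II error.
    """
    rng = random.Random(seed)
    se = 1.0 / math.sqrt(n)                   # sigma / sqrt(n) with sigma = 1
    rejections = 0
    for _ in range(trials):
        mean = sum(rng.gauss(true_mean, 1.0) for _ in range(n)) / n
        if abs(mean / se) > z_crit:           # z = (x_bar - 0) / se
            rejections += 1
    return rejections / trials

alpha_hat = rejection_rate(true_mean=0.0)     # estimated Type I error rate
beta_hat = 1 - rejection_rate(true_mean=0.5)  # estimated Type II error rate
print(f"estimated alpha = {alpha_hat:.3f}, estimated beta = {beta_hat:.3f}")
```

The first estimate should land near 0.05, the α implied by the 1.96 cutoff. For this particular setup, theory puts β at roughly 0.3.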
Large Sample Test
If the sample size n is 30 or more (n ≥ 30), the sample is known as a large sample. For
large samples, the sampling distributions of most statistics are approximately normal, so
the Z-test is used. Tests based on the sampling distributions of statistics for large
samples are known as large sample tests.
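A small simulation sketch in Python (standard library only) illustrates why large samples allow the Z-test. Even when the population is skewed, the standardized sample mean behaves approximately like a standard normal variable. The check is whether about 95% of standardized sample means fall within ±1.96, as they would under a normal distribution. The exponential population (mean 1, SD 1), n = 50, the seed and the trial count are illustrative assumptions.

```python
import math
import random

def coverage_of_sample_means(n=50, trials=10_000, seed=2):
    """Share of standardized sample means falling in [-1.96, 1.96].

    The population is exponential with mean 1 and SD 1 (strongly skewed).
    If the sampling distribution of the mean is close to normal, the share
    should be close to 0.95.
    """
    rng = random.Random(seed)
    se = 1.0 / math.sqrt(n)                   # sigma / sqrt(n), sigma = 1
    inside = 0
    for _ in range(trials):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs((mean - 1.0) / se) <= 1.96:
            inside += 1
    return inside / trials

print(f"coverage with n = 50: {coverage_of_sample_means():.3f}")
```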
Small Sample Test
If the sample size n is less than 30 (n < 30), the sample is known as a small sample. For
small samples drawn from a normal population, the relevant sampling distributions are the
t, F, and χ² distributions. The study of sampling distributions for small samples is
known as small sample theory.
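The following sketch (Python, standard library only) shows why small samples need the t distribution rather than the normal. It draws n = 5 values from a normal population and replaces σ with the sample SD s. It then counts how often a true H0: μ = 0 is rejected using the normal cutoff 1.96, compared with the t cutoff 2.776 (the two-sided 5% point for 4 degrees of freedom, from a t-table). The function name, sample size, seed and trial count are illustrative choices. Only the t case is shown here, not F or χ².

```python
import math
import random

# Two-sided 5% critical values (from standard tables):
Z_CRIT = 1.96          # standard normal
T_CRIT_DF4 = 2.776     # t distribution with 4 degrees of freedom (n = 5)

def false_rejection_rates(trials=20_000, seed=1):
    """Return (rate using the normal cutoff, rate using the t cutoff).

    Each sample has n = 5 draws from N(0, 1), so H0: mu = 0 is true and any
    rejection is a Type I error. sigma is treated as unknown and replaced by s.
    """
    n = 5
    rng = random.Random(seed)
    z_rejects = t_rejects = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))  # sample SD
        t = m / (s / math.sqrt(n))
        z_rejects += abs(t) > Z_CRIT          # bool counts as 0 or 1
        t_rejects += abs(t) > T_CRIT_DF4
    return z_rejects / trials, t_rejects / trials

normal_rate, t_rate = false_rejection_rates()
print(f"normal cutoff: {normal_rate:.3f}, t cutoff: {t_rate:.3f}")
```

The normal cutoff rejects the true H0 noticeably more than 5% of the time (roughly 12% in this setup), while the t cutoff stays near 5%.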