Data Analysis

The document provides an overview of descriptive and inferential statistics, detailing their definitions, types, and applications. Descriptive statistics summarize data characteristics using measures such as mean, median, and variance, while inferential statistics allow for predictions and generalizations about a population based on sample data. Additionally, it includes a sample questionnaire to illustrate the use of descriptive statistics in analyzing the effects of drug and alcohol abuse on academic performance.

STATISTICAL ANALYSIS

Descriptive and Inferential Statistics: Definitions

Descriptive statistics can be defined as a field of statistics that is used to
summarize the characteristics of a sample by utilizing certain quantitative
techniques. It helps to provide simple and precise summaries of the sample
and the observations using measures like mean, median, variance, graphs,
and charts. Univariate descriptive statistics are used to describe data
containing only one variable. On the other hand, bivariate and multivariate
descriptive statistics are used to describe data containing two or more
variables.

Types of Descriptive Statistics

Measures of central tendency and measures of dispersion are two types of
descriptive statistics that are used to quantitatively summarize the
characteristics of grouped and ungrouped data. When an experiment is
conducted, the raw data obtained is known as ungrouped data. When this
data is organized logically, it is known as grouped data. To visually represent
data, descriptive statistics uses graphs, charts, and tables.

Measures of Central Tendency


In descriptive statistics, the measures of central tendency are used to
describe data by determining a single representative central value. The
important measures of central tendency are given below:
Mean: The mean can be defined as the sum of all observations divided by
the total number of sample observations.
Median: The median can be defined as the center-most observation that is
obtained by arranging the data in ascending order.
Mode: The mode is the most frequently occurring observation in the data
set.
Measures of Dispersion
In descriptive statistics, the measures of dispersion are used to determine
how spread out a distribution is with respect to the central value. The
important measures of dispersion are given below:
Range: The range can be defined as the difference between the highest
value and the lowest value.
Variance: The variance gives the variability of the distribution with respect
to the mean.
Standard Deviation: The square root of the variance will result in
the standard deviation. It helps to analyze the variability in a data set in a
more effective manner as compared to the variance.
Mean Deviation: The mean deviation will give the average of the absolute
value of the data about the mean, median, or mode. It is also known as
absolute deviation.
Quartile Deviation: Half of the difference between the third and first
quartile gives the quartile deviation.

Inferential Statistics

Inferential statistics allows you to make predictions (“inferences”) from
sample data. With inferential statistics, you take data from samples and make
generalizations about a population.
There are two main areas of inferential statistics:
1. Estimating parameters. This means taking a statistic from your
sample data (for example the sample mean) and using it to say
something about a population parameter (i.e. the population mean).
2. Hypothesis tests. This is where you can use sample data to answer
research questions. For example, you might be interested in knowing if
a new cancer drug is effective. In this case a sample data from a small
number of people is taken to try to determine if the data can predict
whether the drug will work for everyone (i.e. the population).
There are various ways you can do this, from calculating a z-score (z-
scores are a way to show where your data would lie in a normal
distribution) to post-hoc (advanced) testing.
Other statistical models that help you compare your sample data to other
samples or to previous research include the generalized linear model,
Student’s t-tests, ANOVA (analysis of variance), and regression analysis,
which produces straight-line (“linear”) relationships and predictions.
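As a quick illustration of the z-score idea, here is a minimal sketch with hypothetical population parameters:

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies from the population mean mu."""
    return (x - mu) / sigma

# Hypothetical exam: population mean 70, population standard deviation 8
z = z_score(86, mu=70, sigma=8)
print(z)  # 2.0 -> the score sits two standard deviations above the mean
```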

Descriptive Statistics Representations

Descriptive statistics can also be used to summarize data visually before
quantitative methods of analysis are applied to them. Some important forms
of representations of descriptive statistics are as follows:
Frequency Distribution Tables: These can be either simple or
grouped frequency distribution tables. They are used to show the distribution
of values or classes along with the corresponding frequencies. Such tables
are very useful in making charts as well as catching patterns in data.
Graphs and Charts: Graphs and charts help to represent data in a
completely visual format. They can be used to display percentages,
distributions, and frequencies. Scatter plots, bar graphs, pie charts, etc., are
some graphs that are used in descriptive statistics.
Descriptive Statistics vs Inferential Statistics
Inferential and descriptive statistics are both used to analyze data.
Descriptive statistics helps to describe the data quantitatively while
inferential statistics uses these parameters to make inferences about the
population. The differences between descriptive statistics and inferential
statistics are given below.

 Descriptive statistics is used to describe the characteristics of either the
sample or the population by using quantitative tools; inferential statistics is
used to draw inferences about the population data from the sample data by
making use of analytical tools.
 The most important types of descriptive statistics are measures of central
tendency and measures of dispersion; the main types of inferential statistics
are hypothesis testing and regression analysis.
 Descriptive statistics describes the characteristics of a known dataset;
inferential statistics tries to make inferences about the population that go
beyond the known data.
 Measures of descriptive statistics are the mean, median, variance, range,
etc.; measures of inferential statistics are the z test, F test, linear regression,
ANOVA test, etc.

SAMPLE QUESTIONNAIRE TO ILLUSTRATE USE OF DESCRIPTIVE STATISTICS

Effects of Drugs and Alcohol Abuse on Academic Performance

QUESTIONNAIRE FOR STUDENT COUNCILS ON EFFECTS OF DRUG AND SUBSTANCE ABUSE ON PERFORMANCE
SECTION A: DEMOGRAPHIC INFORMATION (Tick Correctly)
1. Gender: Male [ ] Female [ ]
2. Age: 12-14 [ ] 15-17 [ ] 18 and above [ ]
3. Class: Form 1 [ ] Form 2 [ ] Form 3 [ ] Form 4 [ ]
4. Responsibility: Head girl[ ] Head boy[ ] Class prefect[ ] Dorm captain[ ]
Games captain [ ]
Specify other………………….

SECTION B: CAUSES OF DRUG AND ALCOHOL ABUSE


Using a Likert scale where 1 is Disagree, 2 Undecided/not sure, 3 Agree,
indicate your opinion on the causes of drugs and substance abuse in
secondary schools
Table 1: Causes of Drug and Alcohol Abuse

S/N  Causes of drug and alcohol abuse   1 Disagree   2 Undecided   3 Agree
1    Peer pressure/peer influence
2    Availability of drugs
3    Adolescence
4    Stress
5    Lack of parental guidance

SECTION C: SOURCES OF DRUG AND ALCOHOL


Using a Likert scale where 1 is Disagree, 2 Undecided/not sure, 3 Agree,
indicate your opinion on the sources of drugs and substance abuse in
secondary schools
Table 2: Sources of Drugs and Alcohol

S/N  Sources of drugs and alcohol   1 Disagree   2 Undecided   3 Agree
1    Fellow students
2    Chemist
3    Family members
4    Teachers
5    Workers

SECTION D: TYPES OF DRUGS COMMONLY ABUSED


Using a Likert scale where 1 is Disagree, 2 Undecided/not sure, 3 Agree,
indicate your opinion on the types of drugs and substance abuse in
secondary schools
Table 3: Types of Drugs Commonly Abused

S/N  Types of drugs and alcohol commonly abused   1 Disagree   2 Undecided   3 Agree
1    Tobacco/shisha
2    Alcohol (beer, changaa)
3    Kuber
4    Bhang
5    Cocaine/heroin

HOW OFTEN DO THEY TAKE DRUGS/ALCOHOL

Using a Likert scale where 1 is Disagree, 2 Undecided/not sure, 3 Agree,
indicate your opinion on how often students take drugs and alcohol in
secondary schools
Table 4: How Often Students Take Drugs/Alcohol

S/N  How often   1 Disagree   2 Undecided   3 Agree
1    Once in a day
2    Twice in a day
3    Not at all
4    Sometimes
5    Many times/frequently

DATA ANALYSIS

Effects of Drug and Alcohol Abuse on Academic Performance

1. USE OF DESCRIPTIVE STATISTICS


Data tabulation
The first thing after data collection is to tabulate the results for the different
variables in the data set. This process will give a comprehensive picture of
what the data looks like and assist in identifying patterns. The best ways to
do this are by constructing frequency and percentage distributions. A
frequency distribution is an organized tabulation of the number of individuals
or scores located in each category (see the table below).
This will help you determine:
 If scores are entered correctly
 If scores are high or low
 How many are in each category
 The spread of the scores
The table below shows responses from 140 student councils.
Data analysis for the questionnaire for students (student councils)
1. CAUSES OF DRUG AND ALCOHOL ABUSE
S/N  Causes of drug and alcohol abuse       D      U      A      Mean   Std. Dev
1    Peer pressure/peer influence     f     54     11     75
                                      %     38.6   7.8    53.6
2    Availability of drugs            f     60     15     65
                                      %     42.9   10.7   46.4
3    Adolescence                      f     41     19     80
                                      %     29.3   13.8   57.1
4    Stress                           f     72     22     46
                                      %     51.4   15.7   32.9
5    Lack of parental guidance        f     55     16     69
                                      %     39.3   11.4   49.3

KEY:
D- Disagree
U-Undecided
A- Agree
The results show that 53.6% of the respondents agreed that peer pressure
was the cause of drug and alcohol abuse, 38.6% disagreed and 7.8% were
not sure. Thus, peer pressure needs to be controlled in schools.

The study also indicated that 46.4% of the respondents agreed that
availability of drugs was the cause of drug and alcohol abuse, 42.9%
disagreed and 10.7% were not sure. Hence, there is a need to control
students' access to drugs and alcohol.
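The percentages in such a frequency distribution are simply each frequency divided by the number of respondents. A minimal sketch using the peer-pressure row of the table above (n = 140):

```python
n = 140  # total number of student-council respondents

# Frequencies for "peer pressure/peer influence" from the table above
frequencies = {"Disagree": 54, "Undecided": 11, "Agree": 75}

percentages = {k: round(f / n * 100, 1) for k, f in frequencies.items()}
print(percentages)  # {'Disagree': 38.6, 'Undecided': 7.9, 'Agree': 53.6}
```

Recomputing gives 7.9% for the undecided row, so the 7.8% in the table appears to reflect a different rounding.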

2. SOURCES OF DRUGS AND ALCOHOL


S/N  Sources of drugs and alcohol     D    U    A    Mean   Std. Dev
1    Fellow students            f
                                %
2    Chemist                    f
                                %
3    Family members             f
                                %
4    Teachers                   f
                                %
5    Workers                    f
                                %

The results show that ………. of the respondents …………

3. TYPES OF DRUGS COMMONLY ABUSED


S/N  Types of drugs and alcohol commonly abused   D    U    A    Mean   Std. Dev
1    Tobacco/shisha             f
                                %
2    Alcohol (beer, changaa)    f
                                %
3    Kuber                      f
                                %
4    Bhang                      f
                                %
5    Cocaine/heroin             f
                                %

The results show that ………. of the respondents …………

4. HOW OFTEN DO THEY TAKE DRUGS/ALCOHOL


S/N  How often                  D    U    A    Mean   Std. Dev
1    Once in a day              f
                                %
2    Twice in a day             f
                                %
3    Not at all                 f
                                %
4    Sometimes                  f
                                %
5    Many times/frequently      f
                                %

The results show that ………. of the respondents …………

2. USE OF INFERENTIAL STATISTICS


In addition to the basic methods described above there are a variety of more
complicated analytical procedures that you can perform with your data.
These include:

a) Analysis of variance (ANOVA)


b) Correlation
c) Regression
d) Parametric and non-parametric methods

These types of analyses generally require computer software (e.g., SPSS,


SAS, STATA, MINITAB) and a solid understanding of statistics to interpret the
results. Basic descriptions of each method are provided, but you are
encouraged to seek additional information.
a) ANALYSIS OF VARIANCE (ANOVA)
An analysis of variance (ANOVA) is used to determine whether the differences
in means (averages) for two or more groups are statistically significant. For
example, an analysis of variance will help you determine if the high school
grades of students who took a mathematics test with calculators are
significantly different from the grades of students who did not use
calculators. ANOVA compares these group means to find out if they are
statistically different or similar. More generally, analysis of variance is a
statistical method used to compare variances across the means (or
averages) of different groups, and it is used in a range of scenarios to
determine whether there is any difference between the group means.
ANOVA terminology
Dependent variable: This is the item being measured that is theorized to
be affected by the independent variables.
Independent variable/s: These are the items being measured that may
have an effect on the dependent variable.
A null hypothesis (H0): This states that there is no difference between the
groups or means. Depending on the result of the ANOVA test, the null
hypothesis will either be rejected or fail to be rejected.
An alternative hypothesis (H1): When it is theorized that there is a
difference between groups and means.
Factors and levels: In ANOVA terminology, an independent variable is
called a factor which affects the dependent variable. Level denotes the
different values of the independent variable that are used in an experiment.
Null Hypothesis, H0: μ1= μ2 = μ3= ... = μk
Alternative Hypothesis, H1: The means are not equal
Decision Rule: If the test statistic > critical value, then reject the null
hypothesis and conclude that the means of at least two groups are
significantly different.

The Two Types of ANOVA Test


There are two types of ANOVA: one-way and two-way. The two are used in
different scenarios, depending on the number of independent variables and
how they interact with each other.
1. One-way ANOVA
A one-way ANOVA test is used when there is one independent variable with
two or more groups. The objective is to determine whether a significant
difference exists between the means of different groups.

In our example, we can use one-way ANOVA to compare the effectiveness of
the three different teaching methods (lecture, workshop, and online learning)
on student exam scores. The teaching method is the independent variable
with three groups, and the exam score is the dependent variable.
 Null Hypothesis (H₀): The mean exam scores of students across the
three teaching methods are equal (no difference in means).
 Alternative Hypothesis (H₁): At least one group’s mean significantly
differs.


The one-way ANOVA test will tell us if the variation in student exam scores
can be attributed to the differences in teaching methods or if it’s likely due
to random chance.
One-way ANOVA is effective when analyzing the impact of a single factor
across multiple groups, making it simpler to interpret. However, it does not
account for the possibility of interaction between multiple independent
variables, in which case two-way ANOVA becomes necessary.

2. Two-way ANOVA
Two-way ANOVA is used when there are two independent variables, each
with two or more groups. The objective is to analyze how both independent
variables influence the dependent variable.
Let’s assume you are interested in the relationship between teaching
methods and study techniques and how they jointly affect student
performance. The two-way ANOVA is suitable for this scenario. Here we test
three hypotheses:
 The main effect of factor 1 (teaching method): Does the teaching
method influence student exam scores?
 The main effect of factor 2 (study technique): Does the study
technique affect exam scores?
 Interaction effect: Does the effectiveness of the teaching method
depend on the study technique used?
For example, two-way ANOVA could reveal that students using the lecture
method perform better in group study, and those using online learning might
perform better in individual study. Understanding these interactions gives a
deeper insight into how different factors together impact outcomes.
What is an ANOVA Test?
ANOVA stands for Analysis of Variance, a statistical test used to compare the
means of three or more groups. It analyzes the variance within each group
and between the groups. The primary objective is to assess whether the
observed variance between group means is greater than the variance within
the groups. If it is, this suggests that the differences between the group
means are meaningful.
Mathematically, ANOVA breaks down the total variability in the data into two
components:

 Within-Group Variability: Variability caused by differences within
individual groups, reflecting random fluctuations.
 Between-Group Variability: Variability caused by differences
between the means of the different groups.

F = (between-group variability) / (within-group variability)

The test produces an F-statistic, which shows the ratio between between-
group and within-group variability. If the F-statistic is sufficiently large, it
indicates that at least one of the group means is significantly different from
the others.
To understand this better, consider a scenario where you are asked to assess
a student’s performance (exam scores) based on three teaching methods:
lecture, interactive workshop, and online learning. ANOVA can help us assess
whether the teaching method statistically impacts the student’s exam
performance.

The steps to perform the one-way ANOVA test are given below:

 Step 1: Calculate the mean for each group.


 Step 2: Calculate the total (grand) mean. This is done by adding all the
observations and dividing by the total number of observations; when the
groups are the same size, this equals the average of the group means.
 Step 3: Calculate the SSB.
 Step 4: Calculate the between groups degrees of freedom.
 Step 5: Calculate the SSE.

 Step 6: Calculate the degrees of freedom of errors.
 Step 7: Determine the MSB and the MSE.
 Step 8: Find the f test statistic.
 Step 9: Using the F table for the specified level of significance, α,
find the critical value. This is given by F(α, df1, df2).
 Step 10: If f > F, then reject the null hypothesis.
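The ten steps above can be sketched as a small Python function; a minimal sketch, with hypothetical data at the end for illustration only:

```python
import statistics as st

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) following the steps above."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total observations
    means = [st.mean(g) for g in groups]             # Step 1: group means
    grand = sum(sum(g) for g in groups) / n          # Step 2: total (grand) mean
    ssb = sum(len(g) * (m - grand) ** 2              # Step 3: between-groups SS
              for g, m in zip(groups, means))
    df_between = k - 1                               # Step 4
    sse = sum((x - m) ** 2                           # Step 5: within-groups SS
              for g, m in zip(groups, means) for x in g)
    df_error = n - k                                 # Step 6
    msb, mse = ssb / df_between, sse / df_error      # Step 7: mean squares
    return msb / mse, df_between, df_error           # Step 8: F statistic

# Hypothetical samples; compare the returned F with F(alpha, df1, df2)
f, df1, df2 = one_way_anova_f([[12, 9, 12], [7, 10, 13], [5, 8, 11]])
```

Steps 9 and 10 are then a table lookup: reject the null hypothesis only if the returned F exceeds the critical value for the chosen significance level.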

EXAMPLE ONE

Performing an ANOVA Test

We will use the same example of comparing different teaching methods to


examine how they affect student exam scores. Let’s assume you are
provided with the following data showing exam scores (dependent variable)
based on the teaching method (independent variable).

Effects of teaching methods on student’s exam scores

Teaching Methods
Lecture Workshop Online Learning
80 55 70
85 34 65
78 43 74
83 54 77

Exam scores for each teaching method for four students each.

Step 1: Define the hypothesis

The first step in the process is defining the hypothesis. State the null and
alternative hypotheses:

 Null Hypothesis (H₀): The means of exam scores for students across
the three teaching methods are equal.
 Alternative Hypothesis (H₁): At least one teaching method has a
different mean exam score.

H0: μ1 = μ2 = … = μk

Where μ1, μ2, …, μk are the means of the samples.

Step 2: Check ANOVA assumptions

Before performing ANOVA, ensure that the assumptions are met: normality,
independence, and homogeneity of variances. For simplicity, let’s assume all
the assumptions are met.

Step 3: Calculate ANOVA

Once the assumptions are checked, calculate the ANOVA.

The formula for the F-statistic in one-way ANOVA is:

F = MS_between / MS_within

That is, the F-statistic is the ratio of the mean square between the groups
to the mean square within the groups.

To arrive at this, let’s go step-by-step.

1. Calculate the mean for each group and the overall mean.

Use the equation below to calculate the mean for each teaching method (Ai).
Divide the sum of the exam scores for each group by the number of students
in each group.

A_i = (sum of scores in group i) / n_i

A1 (Lecture) = 326/4 = 81.5; A2 (Workshop) = 186/4 = 46.5; A3 (Online Learning) = 286/4 = 71.5

Next, calculate the overall mean (G) by dividing the sum of all the instances
by the total number of students.

G = (326 + 186 + 286) / 12 = 798 / 12 = 66.5

2. Calculate the sum of squares for each group

The sum of squares for a group is the sum of the squared deviations of each
score from that group's mean:

SS_i = Σ (x − A_i)²

SS1 (Lecture) = 29; SS2 (Workshop) = 297; SS3 (Online Learning) = 81

After computing, fill this table with the values for easy access.

Group            n   Mean (A_i)   SS
Lecture          4   81.5         29
Workshop         4   46.5         297
Online Learning  4   71.5         81

3. Calculate the sum of squares between the groups, the sum of squares
within the groups, and the total sum of squares

Using the equation below, calculate the sum of squares between the groups.
In the equation,

 Ai: Mean of the group


 G: Overall mean
 ni: number of instances in each group

Make use of the values from the summary table for the calculation.

SSB = Σ n_i (A_i − G)² = 4(81.5 − 66.5)² + 4(46.5 − 66.5)² + 4(71.5 − 66.5)² = 4(225 + 400 + 25) = 2600

Next, calculate the sum of squares within the group. It is the summation of
the sum of squares (SS) for each group.

SSW = SS1 + SS2 + SS3 = 29 + 297 + 81 = 407

Use the equation below to calculate the total sum of squares

SST = Σ (x − G)² = 3007

Verify the calculation by checking that the total sum of squares equals the
sum of squares between the groups plus the sum of squares within the
groups. After verifying, move on to calculating the mean squares.

4. Calculate the mean squares

A mean square is the ratio of a sum of squares to its degrees of freedom.

The degrees of freedom between groups, df_between, equal the number of
groups minus one, and the degrees of freedom within groups, df_within,
equal the total number of participants minus the number of groups.

With the values calculated in the previous step, compute the mean squares:

MSB = SSB / df_between = 2600 / (3 − 1) = 1300
MSW = SSW / df_within = 407 / (12 − 3) ≈ 45.22

5. Calculate the F-statistic using the equation below

F-statistic is the ratio of the mean square between the group to the mean
square within the group.

F = MSB / MSW = 1300 / (407/9) ≈ 28.747

The computed value of the F-statistic is 28.747.

Finally, the p-value is computed using the F-statistic, the degrees of
freedom, and the F-distribution table.

In this example, the numerator df is 2, the denominator df is 9, and the F-


statistic is 28.747. Therefore, the p-value from the F-distribution table is
0.000123.

Step 4: Interpret the results

 F-statistic: The F-statistic measures the ratio of between-group


variation to within-group variation. A higher F-statistic indicates a more
significant difference between group means relative to random
variation.
 P-value: The p-value determines whether the differences between
group means are statistically significant. If the p-value is below a
predefined threshold (commonly 0.05), reject the null hypothesis and
conclude that at least one group has a significantly different mean.

The p-value is 0.000123, and we would reject the null hypothesis to conclude
that the teaching method significantly affects exam scores.
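The whole calculation above can be reproduced in a few lines of Python; a minimal sketch using the exam-score data from the table:

```python
import statistics as st

groups = {
    "Lecture":  [80, 85, 78, 83],
    "Workshop": [55, 34, 43, 54],
    "Online":   [70, 65, 74, 77],
}

scores = [x for g in groups.values() for x in g]
grand_mean = st.mean(scores)                      # 798 / 12 = 66.5

# Between-groups sum of squares, weighted by each group's size
ssb = sum(len(g) * (st.mean(g) - grand_mean) ** 2
          for g in groups.values())               # 2600.0

# Within-groups sum of squares: deviations from each group's own mean
ssw = sum((x - st.mean(g)) ** 2
          for g in groups.values() for x in g)    # 407.0

msb = ssb / (len(groups) - 1)            # 2600 / 2 = 1300.0
msw = ssw / (len(scores) - len(groups))  # 407 / 9  ≈ 45.22

print(round(msb / msw, 3))  # 28.747
```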

EXAMPLE TWO

One-Way ANOVA: Example

Suppose we want to know whether or not three different exam prep


programs lead to different mean scores on a certain exam. To test this, we
recruit 30 students to participate in a study and split them into three groups.

The students in each group are randomly assigned to use one of the three
exam prep programs for the next three weeks to prepare for an exam. At the
end of the three weeks, all of the students take the same exam.

The exam scores for each group are entered into the Statology One-Way
ANOVA Calculator. From the output table, we see that the F test statistic is
2.358 and the corresponding p-value is 0.11385.

Since this p-value is not less than 0.05, we fail to reject the null hypothesis.

This means we don’t have sufficient evidence to say that there is a


statistically significant difference between the mean exam scores of the
three groups.

EXAMPLE THREE

One factor analysis of variance, also known as ANOVA, gives us a way to


make multiple comparisons of several population means. Rather than doing
this in a pairwise manner, we can look simultaneously at all of the means
under consideration. To perform an ANOVA test, we need to compare two
kinds of variation, the variation between the sample means, as well as the
variation within each of our samples.

We combine all of this variation into a single statistic, called the F statistic
because it uses the F-distribution. We do this by dividing the variation
between samples by the variation within each sample. The way to do this is
typically handled by software, however, there is some value in seeing one
such calculation worked out.

Steps in One Way ANOVA

1. Calculate the sample means for each of our samples as well as the
mean for all of the sample data.
2. Calculate the sum of squares of error. Here within each sample, we
square the deviation of each data value from the sample mean. The
sum of all of the squared deviations is the sum of squares of error,
abbreviated SSE.
3. Calculate the sum of squares of treatment. We square the deviation of
   each sample mean from the overall mean, multiply each squared
   deviation by the number of observations in that sample, and add the
   results. This number is the sum of squares of treatment,
   abbreviated SST.
4. Calculate the degrees of freedom. The overall number of degrees of
freedom is one less than the total number of data points in our sample,
or n - 1. The number of degrees of freedom of treatment is one less
than the number of samples used, or m - 1. The number of degrees of
freedom of error is the total number of data points, minus the number
of samples, or n - m.
5. Calculate the mean square of error. This is denoted MSE = SSE/(n - m).
6. Calculate the mean square of treatment. This is denoted MST =
   SST/(m - 1).
7. Calculate the F statistic. This is the ratio of the two mean squares that
we calculated. So F = MST/MSE.

Software does all of this quite easily, but it is good to know what is
happening behind the scenes. In what follows we work out an example of
ANOVA following the steps as listed above.

Data and Sample Means

Suppose we have four independent populations that satisfy the conditions for
single factor ANOVA. We wish to test the null hypothesis H0: μ1 = μ2 = μ3 =
μ4. For purposes of this example, we will use a sample of size three from
each of the populations being studied. The data from our samples is:

 Sample from population #1: 12, 9, 12. This has a sample mean of 11.
 Sample from population #2: 7, 10, 13. This has a sample mean of 10.
 Sample from population #3: 5, 8, 11. This has a sample mean of 8.
 Sample from population #4: 5, 8, 8. This has a sample mean of 7.

The mean of all of the data is 9.

Sum of Squares of Error

We now calculate the sum of the squared deviations from each sample
mean. This is called the sum of squares of error.

 For the sample from population #1: (12 − 11)² + (9 − 11)² + (12 − 11)² = 6
 For the sample from population #2: (7 − 10)² + (10 − 10)² + (13 − 10)² = 18
 For the sample from population #3: (5 − 8)² + (8 − 8)² + (11 − 8)² = 18
 For the sample from population #4: (5 − 7)² + (8 − 7)² + (8 − 7)² = 6

We then add all of these sums of squared deviations and obtain 6 + 18 + 18
+ 6 = 48.

Sum of Squares of Treatment

Now we calculate the sum of squares of treatment. Here we look at the
squared deviations of each sample mean from the overall mean and multiply
each by the number of observations in that sample (three in each case):

3[(11 − 9)² + (10 − 9)² + (8 − 9)² + (7 − 9)²] = 3[4 + 1 + 1 + 4] = 30.

Degrees of Freedom

Before proceeding to the next step, we need the degrees of freedom. There
are 12 data values and four samples. Thus the number of degrees of
freedom of treatment is 4 – 1 = 3. The number of degrees of freedom of
error is 12 – 4 = 8.

Mean Squares

We now divide our sum of squares by the appropriate number of degrees of


freedom in order to obtain the mean squares.

 The mean square for treatment is 30 / 3 = 10.


 The mean square for error is 48 / 8 = 6.

The F-statistic

The final step of this is to divide the mean square for treatment by the mean
square for error. This is the F-statistic from the data. Thus, for our example F
= 10/6 = 5/3 = 1.667.

Tables of values or software can be used to determine how likely it is to


obtain a value of the F-statistic as extreme as this value by chance alone.
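The same arithmetic can be checked in Python; a minimal sketch of the four-sample calculation above:

```python
samples = [[12, 9, 12], [7, 10, 13], [5, 8, 11], [5, 8, 8]]
m = len(samples)                                 # 4 samples
n = sum(len(s) for s in samples)                 # 12 data values
means = [sum(s) / len(s) for s in samples]       # 11, 10, 8, 7
grand = sum(sum(s) for s in samples) / n         # overall mean, 9.0

# Sum of squares of error: squared deviations from each sample's own mean
sse = sum((x - mu) ** 2 for s, mu in zip(samples, means) for x in s)    # 48.0

# Sum of squares of treatment: each squared deviation of a sample mean
# from the overall mean, weighted by that sample's size (3 here)
sst = sum(len(s) * (mu - grand) ** 2 for s, mu in zip(samples, means))  # 30.0

mse = sse / (n - m)   # 48 / 8 = 6.0
mst = sst / (m - 1)   # 30 / 3 = 10.0
print(round(mst / mse, 3))  # 1.667
```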

EXAMPLE FOUR
Steps in One-Way ANOVA
One-way ANOVA (Analysis of Variance) is a statistical test used to compare
the means of three or more samples to determine if there are significant
differences among them. It is based on the assumption that the samples are
drawn from normally distributed populations with equal variances.
Steps in One-way ANOVA:

1. Specify the null and alternative hypotheses. The null
hypothesis is usually that there is no difference among the means of
the samples, while the alternative hypothesis is that there is at least
one difference among the means of the samples.
2. Select three or more samples from the populations and calculate
the sample means and sizes.
3. Calculate the overall mean of all the samples combined.
4. Calculate the sum of squares within groups (SSW), the sum
of squares between groups (SSB) and the degrees of freedom
for both SSW and SSB.
5. Calculate the F statistic as the ratio of the Mean SSB to the Mean
SSW.
6. Determine the critical value of the F statistic based on the test's
significance level (alpha) and the degrees of freedom for the
numerator and denominator. The degrees of freedom for the
numerator is the number of groups minus 1, while the degrees of
freedom for the denominator is the total sample size minus the
number of groups.
7. Compare the calculated F statistic to the critical value to
determine whether to reject or fail to reject the null hypothesis. If the
calculated F statistic exceeds the critical value, the null hypothesis is
rejected in favour of the alternative hypothesis.
Conditions for One-way ANOVA:
To conduct a valid one-way ANOVA, the following conditions must be met:
1. The samples must be drawn randomly from the populations.
2. Each observation in each sample must be independent of the others.
3. The population distributions must approximate a normal
distribution.
4. The population variances must be equal.
Typical Null and Alternate Hypotheses in One-way ANOVA:

The null hypothesis in a one-way ANOVA is that there is no difference among
the means of the samples. This can be expressed as:
H0: μ1 = μ2 = … = μk
Where μ1, μ2, …, μk are the means of the samples.
The alternate hypothesis is the opposite of the null hypothesis and is that
there is at least one difference among the means of the samples. This can be
expressed as:
H1: μi ≠ μj for at least one pair i ≠ j
Calculating SSW and SSB:
Sum of Square Within (SSW):
The Sum of Squares Within (SSW) measures the variance within the groups.
It is calculated as the sum of the squared differences between each
individual observation and its group mean.
The formula for calculating SSW is:
SSW = Σ (Xi − Mi)²
Where Xi is an individual observation and Mi is the mean of the group to
which Xi belongs.

Sum of Square Between (SSB):

The formula for calculating the Sum of Squares Between (also known as SSB)
is:
SSB = Σ ni(Mi − M)²
Where Mi is the mean of the ith group, M is the grand mean (the mean of all
observations), and ni is the number of observations in the ith group.
To calculate SSB, you would first need to find the mean of each group and
the grand mean. Then, for each group, you would subtract the grand mean
from the group mean and square the result. Finally, you would multiply this
squared difference by the number of observations in the group and sum the
results for all groups to get the SSB.
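The steps above can be sketched in Python. The groups here are hypothetical samples chosen only for illustration:

```python
# Sum of Squares Between (SSB): for each group, the squared difference
# between the group mean and the grand mean, weighted by group size.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [5.0, 6.0, 7.0],
]

all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / len(all_obs)   # M, the mean of all observations

# Σ ni(Mi − M)²: group means are 5, 8, and 6; grand mean is 57/9
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
print(ssb)  # 14.0
```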
Calculating F Statistic:
The F statistic in a one-way ANOVA (analysis of variance) is a measure of
how much variation in the data can be attributed to the different groups
(also known as "treatments") compared to the variation within the groups. It
is calculated as the ratio of the Mean SSB (mean sum of squares between
groups) to the Mean SSW (mean sum of squares within groups).
The Mean SSB is a measure of the variation between the group means, and
is calculated as:
Mean SSB = SSB / (k − 1)
Where SSB is the sum of squares between groups, and k is the number of
groups.
The Mean SSW is a measure of the variation within each group, and is
calculated as:
Mean SSW = SSW / (n − k)
Where SSW is the sum of squares within groups, and n is the total number of
observations.
The F statistic is then calculated as:
F = Mean SSB / Mean SSW = [SSB / (k − 1)] / [SSW / (n − k)]
A high F value indicates that there is a significant difference between the
group means, while a low F value indicates that the group means are not
significantly different. The F statistic is used to test the null hypothesis that
there is no significant difference between the group means. If the calculated
F value is greater than the critical F value, the null hypothesis is rejected,
and it is concluded that there is a significant difference between the group
means.
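Putting the pieces together, here is a self-contained sketch of the full F-statistic calculation on three hypothetical groups (made-up data, for illustration only):

```python
# One-way ANOVA F statistic for three hypothetical groups.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [5.0, 6.0, 7.0],
]
k = len(groups)                               # number of groups
n = sum(len(g) for g in groups)               # total observations
grand_mean = sum(x for g in groups for x in g) / n

# SSW: within-group squared deviations; SSB: size-weighted squared
# deviations of group means from the grand mean.
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

mean_ssb = ssb / (k - 1)                      # SSB / (k − 1)
mean_ssw = ssw / (n - k)                      # SSW / (n − k)
f_stat = mean_ssb / mean_ssw
print(f_stat)  # (14.0 / 2) / (6.0 / 6) = 7.0
```

A value this far above 1 suggests the between-group variation dominates the within-group variation; whether it is significant still depends on the critical value for df = (2, 6).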
Calculating Critical Values:
The critical values for the F statistic in a one-way ANOVA depend on the
significance level of the test and the degrees of freedom for the numerator
and denominator. The degrees of freedom for the numerator are the number
of groups minus 1 or (k-1), while the degrees of freedom for the denominator
are the total sample size minus the number of groups or (n-k). Using these
two values (significance level and degrees of freedom), you can find out the
value of the critical F statistic using an F Distribution Table.
In addition, you can use statistical software to find out the critical value. In
Excel, you can use =F.INV.RT(probability, deg_freedom1, deg_freedom2)
for the right tail value.
Remember: One-way ANOVA is always a one-tailed (right-tailed) test.
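As a small aside, in the common special case of three groups (numerator df1 = 2) the right-tail F critical value has a closed form, so it can be checked without a table. This sketch assumes that special case only; for other df1 values you would use an F table or software such as Excel's F.INV.RT:

```python
# Right-tail F critical value for the special case df1 = 2 (k = 3 groups).
# Here P(F > x) = (1 + 2x/df2) ** (-df2/2), which inverts to:
#   x_crit = (df2 / 2) * (alpha ** (-2 / df2) - 1)
def f_crit_df1_2(alpha, df2):
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

# 3 groups, 15 total observations: df = (k − 1, n − k) = (2, 12)
crit = f_crit_df1_2(0.05, 12)
print(round(crit, 3))  # ≈ 3.885, matching the tabled F(0.05; 2, 12)
```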
2. CORRELATION
A correlation is a statistical calculation that describes the nature of the
relationship between two variables (e.g., strong and negative, weak and
positive, statistically significant).
An important thing to remember when using correlations is that a correlation
does not explain causation. A correlation merely indicates that a
relationship or pattern exists, but it does not mean that one variable is
the cause of the other.
For example, you might see a strong positive correlation between
participation in group discussions and students’ grades during course
work; however, the correlation will not tell you whether that participation
is the reason students’ grades were higher.
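A Pearson correlation coefficient can be computed directly from its definition. The data pairs below are hypothetical (weeks of discussion participation vs. final grade), invented only to illustrate the calculation:

```python
import math

# Pearson r = covariance term / (product of deviation norms)
x = [1, 2, 3, 4, 5]           # weeks of participation (hypothetical)
y = [52, 58, 63, 70, 75]      # final grade, % (hypothetical)

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
r = cov / (sd_x * sd_y)
print(r)  # close to +1: a strong positive relationship
```

Even with r near +1 here, nothing in this number says participation caused the higher grades.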
3. REGRESSION
Regression is an extension of correlation and is used to determine whether
one variable is a predictor of another variable. A regression can be used to
determine how strong the relationship is between your intervention and your
outcome variables. More importantly, a regression will tell you whether a
variable (e.g., participation in your program) is a statistically significant
predictor of the outcome variable (e.g., 1st class, 2nd class upper, etc.). A
variable can have a positive or negative influence, and the strength of the
effect can be weak or strong. For example, a regression would help you
determine if the length of participation (number of weeks) within the
semester program is actually a predictor of students’ grades at the end of
the semester. As with correlation, causation cannot be inferred from
regression; it supports prediction only.
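A simple linear regression can be sketched with ordinary least squares. The data below are hypothetical (weeks of participation vs. end-of-semester grade), made up purely for illustration:

```python
# Ordinary least squares fit of y = intercept + slope * x.
x = [2, 4, 6, 8, 10]          # weeks of participation (hypothetical)
y = [55, 60, 64, 71, 74]      # end-of-semester grade (hypothetical)

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

print(slope, intercept)  # slope 2.45: each extra week predicts ~2.45 points
```

The slope quantifies the predicted change in the outcome per unit of the predictor; a significance test on the slope (not shown) would tell you whether the predictor is statistically significant.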
Other than the above, quantitative data analysis may also be done using
parametric and non-parametric statistics.
Parametric methods make assumptions about the population distribution,
typically that it is normal. Non-parametric methods, on the other hand, are
statistical techniques for which we do not have to make any assumption of
normality about the population we are studying; indeed, they do not depend
on the form of the population distribution at all.