MODULE-1 Introduction To Statistics
2021-2022
Summer Term
Instructor: Mary Jane Bugarin-Tolentino, LPT
Luke 2:1-4
In those days a decree went out from Caesar Augustus that all the world should be registered. 2 This was the first registration when Quirinius was governor of Syria. 3 And all went to be registered, each to his own town. 4 And Joseph also went up from Galilee, from the town of Nazareth, to Judea, to the city of David, which is called Bethlehem, because he was of the house and lineage of David…

It includes the study of relationships among variables.
If you have gathered data from a survey and have organized them in a systematic, easy-to-read manner, then you have succeeded in applying the basic principles of descriptive statistics.
Among the measurements falling under descriptive statistics are the measures of central tendency, measures of variability, skewness, kurtosis, minimum, maximum, summation, and other items which help in describing a data set.
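As a rough illustration of these descriptive measures, the sketch below summarizes a small data set with Python's standard `statistics` module. The data are the four shopper ages that appear in Example 2 of this module; the variable names and the choice of measures are assumptions made for the illustration only.

```python
import statistics

# Ages of the first 4 shoppers (the data set used in Example 2 of this module)
ages = [12, 24, 30, 45]

summary = {
    "count": len(ages),
    "summation": sum(ages),
    "minimum": min(ages),
    "maximum": max(ages),
    "range": max(ages) - min(ages),
    "mean": statistics.mean(ages),        # a measure of central tendency
    "median": statistics.median(ages),    # a measure of central tendency
    "sample_sd": statistics.stdev(ages),  # a measure of variability
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each printed value describes the data set itself; none of them makes a claim about a larger population.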
Examples of Descriptive statistics are as follows:
1. Based on the research conducted by the DOH, 63% of those found to have diabetes were not aware
that they have such disease.
2. According to the nationwide survey, the three highest responses to persons living with older persons
are: grandchild (61.8%), spouse (59%) and daughter (50.9%).
3. Cigarettes were associated with 29% of the 4,470 civilian fire deaths in 1989.
Inferential Statistics
is a statistical procedure used to draw inferences for the population on the basis of the
information obtained from the sample.
With inferential statistics, you try to arrive at conclusions that extend beyond the data alone.
You may use it to judge whether an observed difference between groups is dependable or merely happened by chance.
1. Is there a significant difference in the academic performance of students enrolled in an online and
modular class?
2. Is there a significant difference between the proportions of students who are interested in taking
statistics online and those who are not?
See CLMS for Activity 1
Sample – is a small portion or part of a population; a representative of the population in a research study.
Statistic – is any numerical value which describes a sample. Statistic (n)
Example: Out of the 7,592 students enrolled in a Marian Institution, 3,568 are female. n = 3,568
Constant – is a characteristic or property of a population or sample which makes the members similar
to each other.
Variable – is a characteristic or property of a population or sample which makes the members different
from each other.
Dependent Variable – a variable that is affected by another variable.
Independent Variable – a variable that affects the dependent variable.
Scales of Measurement
1. Nominal level of measurement classifies data into mutually exclusive categories in which no order
or ranking can be imposed on the data. Nominal numbers are just labels. e.g. SSS number
2. Ordinal level of measurement classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist. e.g. size of t-shirt.
3. Interval level of measurement ranks data, and precise differences between units of measure do exist;
however, there is no meaningful zero. e.g. temperature.
4. Ratio level of measurement possesses all the characteristics of interval measurement, and there exists
a true zero. In addition, true ratios exist when the same variable is measured on two different members
of the population. e.g. height
Sum It Up!
The study of statistics involves the collection of data or measurement. Thus, there is always a need to add
several numbers. The Greek capital letter sigma, Σ, is used in the process. The symbol Σ, read as "the sum of", tells you
to add certain numerical values.
The symbol
$$\sum_{i=1}^{10} x_i$$
is read as "the summation of $x_i$, $i$ from 1 to 10".
For a large number of observations, say 50, the summation will be expressed as:
$$\sum_{i=1}^{50} x_i = x_1 + x_2 + \cdots + x_{50}$$
In general,
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
If all the given values of a variable are to be used in finding the sum, the limits of the summation are usually
omitted, as
$$\sum x$$
Example 2: Given are the ages of the first 4 shoppers at a newly opened convenience store in the neighborhood:
12, 24, 30, 45.
a. What will x represent in the information given?
b. What will the subscript i represent?
c. Write an expression for the sum.
d. What are the lower and upper limits of the expression?
e. Write the formula for the summation and find the sum of the given information.
Answers:
a. x will represent the ages of the first 4 shoppers in the newly opened convenience store.
b. i will represent each of the first 4 shoppers in the newly opened convenience store (i = 1, 2, 3, 4).
c. The expression for the summation would be
$$\sum_{i=1}^{4} x_i$$
d. The lower limit is 1 and the upper limit is 4.
e.
$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 12 + 24 + 30 + 45 = \mathbf{111}$$
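The role of the lower and upper limits in Example 2 can be sketched in Python. The helper name `summation` is hypothetical, and the subscripts are treated as 1-based to match the notation, so the code shifts them by one when indexing the list.

```python
ages = [12, 24, 30, 45]  # x_1, x_2, x_3, x_4 from Example 2

def summation(x, lower, upper):
    """Add x_lower + ... + x_upper, using 1-based subscripts as in the sigma notation."""
    return sum(x[i - 1] for i in range(lower, upper + 1))

total = summation(ages, 1, 4)
print(total)  # 111
```

Changing the limits changes which terms are added; for instance, `summation(ages, 2, 3)` adds only the second and third ages.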
• The sum of five observations is represented as:
$$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5$$
• The sum of the squares of the five observations is represented as:
$$\sum_{i=1}^{5} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2$$
• The sum of the products of pairs of five observations is expressed as:
$$\sum_{i=1}^{5} a_i x_i = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + a_5 x_5$$
Solution: In the given data, there are 10 observations denoted as
$x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}$. Hence, the sum of all the observations is written as $\sum_{i=1}^{10} x_i$.
Example 3: Consider the first four multiples of 2: 2, 4, 6, 8. Use the corresponding summation formula to find the
following:
a. The sum of the first four multiples of 2.
b. The sum of the squares of the first four multiples of 2.
c. The sum of the products of pairs of values consisting of the first four counting numbers and the first
four multiples of 2.
Answers:
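a.
$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 2 + 4 + 6 + 8 = \mathbf{20}$$
b.
$$\sum_{i=1}^{4} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = 2^2 + 4^2 + 6^2 + 8^2 = 4 + 16 + 36 + 64 = \mathbf{120}$$
c. With $a_i$ as the first four counting numbers (1, 2, 3, 4):
$$\sum_{i=1}^{4} a_i x_i = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 = (1)(2) + (2)(4) + (3)(6) + (4)(8) = 2 + 8 + 18 + 32 = \mathbf{60}$$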
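The three sums asked for in Example 3 can be checked with a short Python sketch. The variable names are chosen for illustration only; `x` holds the first four multiples of 2 and `a` holds the first four counting numbers.

```python
x = [2, 4, 6, 8]  # first four multiples of 2
a = [1, 2, 3, 4]  # first four counting numbers

sum_x = sum(x)                                  # sum of the values
sum_x_squared = sum(xi ** 2 for xi in x)        # sum of the squares
sum_ax = sum(ai * xi for ai, xi in zip(a, x))   # sum of the pairwise products

print(sum_x, sum_x_squared, sum_ax)
```

Note that the sum of the squares (120) is not the same as the square of the sum (20 squared is 400), which is why the two are written differently in summation notation.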
Example 4: Find
$$\sum_{i=1}^{6} 3$$
Solution:
$$\sum_{i=1}^{6} 3 = 3 + 3 + 3 + 3 + 3 + 3 = 6(3) = \mathbf{18}$$
Observe that in Example 4, the summation of a constant c is the product of the constant and the number of
terms n in the summation, that is,
$$\sum_{i=1}^{n} c = nc$$
A. Simple random sampling. This is a procedure where a sample is selected in such a way that every element
is as likely to be selected as any other element in the population.
Example: Lottery: this needs a complete list of the population. You write the names or codes of each
member and place them in a container, then randomly draw the desired number of samples. This is easy if
the population is small.
B. Systematic random sampling. This method is a sampling procedure with a random start. Samples are
randomly chosen using the rules set by the researchers. This involves choosing every $k$th member of the
population, with $k = \frac{N}{n}$, where N is the population size and n is the sample size, but there should be a random start.
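The two procedures above can be sketched in Python with the standard `random` module. This is only an illustration under stated assumptions: the population is a hypothetical list of 100 numbered members, the function names are made up for this sketch, and the interval k is taken as the integer part of N/n.

```python
import random

def simple_random_sample(population, n, rng):
    """Lottery-style draw: every member is as likely to be chosen as any other."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Take every k-th member after a random start, with k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)  # random start within the first k members
    return [population[start + j * k] for j in range(n)]

rng = random.Random(2022)          # fixed seed so the draws can be repeated
population = list(range(1, 101))   # hypothetical population: members numbered 1 to 100

lottery = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)

print(sorted(lottery))
print(systematic)
```

With N = 100 and n = 10 the interval is k = 10, so the systematic sample consists of members spaced exactly 10 apart, beginning at a randomly chosen member among the first 10.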
Class A    1,000
Class B    2,500
Class C    1,500
N          5,000
1. Using the Raosoft calculator to find the sample size (n), with a 5% margin of error, a 95% confidence level, and a 50% response distribution,
n = 357
(Note: The margin of error and the confidence level are set separately in the calculator, and they do not have to add up to 100%. The sample size above pairs the 5% margin of error with a 95% confidence level.)
2. Using proportional allocation, how many from each group should be taken as the sample?
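One way to work through both items is sketched below. The sample-size function uses a common finite-population formula for a proportion with z = 1.96 for 95% confidence; treating this as the formula behind the calculator is an assumption, although it does reproduce n = 357. The allocation function gives each class its share n × (class size ÷ N) and hands any leftover unit to the class with the largest fractional part, which is one possible rounding rule among several.

```python
import math

def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Finite-population sample size for a proportion (assumed formula)."""
    x = z ** 2 * p * (1 - p)
    return math.ceil(population * x / ((population - 1) * margin ** 2 + x))

def proportional_allocation(strata, n):
    """Split a sample of size n across strata in proportion to their sizes."""
    total = sum(strata.values())
    exact = {name: n * size / total for name, size in strata.items()}
    allocation = {name: math.floor(value) for name, value in exact.items()}
    leftover = n - sum(allocation.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

classes = {"Class A": 1000, "Class B": 2500, "Class C": 1500}
n = sample_size(sum(classes.values()))
allocation = proportional_allocation(classes, n)

print(n)
print(allocation)
```

The exact shares are 71.4, 178.5, and 107.1, so under this rounding rule Class A contributes 71, Class B contributes 179, and Class C contributes 107, for a total of 357.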
Types of Data
1. Primary Data are data collected directly by the researcher himself. These are first-hand or
original sources. They can be collected through the following:
2. Secondary Data are information taken from published or unpublished materials previously
gathered by other researchers or agencies, such as books, newspapers, magazines, journals, and
published and unpublished theses and dissertations.
Example 1:
Table 1
Mahusay National High School Enrolment, SY 2005-2006
Year Level    Male    Female
First         216     267
Second        197     216
Third         187     227
Fourth        176     215
Total         776     925
You will observe that the table above shows clearly the enrolment data in Mahusay National High School for the
school year 2005-2006.
Another type of tabular presentation is the frequency table. It is an arrangement of the data that shows
the frequency of occurrence of different values of the variables.
A frequency table is constructed by listing the measurements from highest to lowest, then making tally marks
to record how often each number occurs. After tallying, count the marks and record them in the proper column.
17 20 15 18 19 16 11 10 15 16
12 12 13 14 11 10 14 13 12 11
13 15 14 10 15 16 17 17 18 20
20 18 19 19 18 17 16 15 12 12
13 14 15 19 20
Solution: To prepare a frequency table for the given set of scores, the scores are listed from highest to lowest,
tally marks are made and counted. The counted tally marks will then be recorded under the column
frequency. Notice that every 5th tally crosses the first four tallies. This is done to make counting
of marks easier especially if the number of cases is rather big.
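The tallying procedure described above can be sketched in Python using `collections.Counter`. The scores are the 45 values listed earlier; the helper `tally_marks` is a hypothetical name, and the text rendering of a crossed group of five as `||||/` is only a stand-in for handwritten tally marks.

```python
from collections import Counter

scores = [
    17, 20, 15, 18, 19, 16, 11, 10, 15, 16,
    12, 12, 13, 14, 11, 10, 14, 13, 12, 11,
    13, 15, 14, 10, 15, 16, 17, 17, 18, 20,
    20, 18, 19, 19, 18, 17, 16, 15, 12, 12,
    13, 14, 15, 19, 20,
]

def tally_marks(count):
    """Render a count as tally marks, writing each full group of five as '||||/'."""
    fives, ones = divmod(count, 5)
    groups = ["||||/"] * fives + (["|" * ones] if ones else [])
    return " ".join(groups)

frequency = Counter(scores)

print(f"{'Score':<7}{'Tally':<12}{'Frequency'}")
for score in sorted(frequency, reverse=True):  # highest to lowest
    print(f"{score:<7}{tally_marks(frequency[score]):<12}{frequency[score]}")
print(f"{'Total':<19}{sum(frequency.values())}")
```

The frequencies should add up to the number of scores, which is a quick check that no score was missed while tallying.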
A frequency distribution is a distribution of the total number of measures or frequencies over arbitrarily
defined categories or classes. The number of measures falling under a class is called class frequency.
Example 1.
The frequency distribution below shows the scores obtained by 300 students in an English test of 50 items.
Score    Number of Students
45-49    15
40-44    32
35-39    42
30-34    108
25-29    67
20-24    21
15-19    10
10-14    5
Total    300
➢ In the example above, the symbol 45-49 and the other symbols which follow up to 10-14 are called class
intervals. The end numbers are called class limits. For instance, in the class interval 45-49, 45 is called the
lower limit while 49 is called the upper limit.
➢ Each class interval also has a lower boundary and an upper boundary. For the class interval 45-49, the lower
boundary is 44.5 while the upper boundary is 49.5. Hence, for the class interval 45-49, 44.5 and 49.5 are
called the class boundaries.
➢ The size of the class interval, also called the class size, is the difference between the upper boundary and the
lower boundary. Hence, the class size in the given example is 5 (obtained from 49.5 – 44.5).
➢ A class interval also has a midpoint or class mark. It is obtained by taking half the sum of the lower and
upper class limits. For instance, the midpoint of the class interval 45-49 is $\frac{45+49}{2}$ or 47.
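The quantities defined above for a class interval can be computed with a small Python sketch. The function name `describe_class` is hypothetical, and the sketch assumes scores recorded in whole units, so each boundary lies half a unit beyond the corresponding limit.

```python
def describe_class(lower_limit, upper_limit, unit=1):
    """Return the boundaries, class size, and midpoint of a class interval."""
    lower_boundary = lower_limit - unit / 2
    upper_boundary = upper_limit + unit / 2
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "size": upper_boundary - lower_boundary,
        "midpoint": (lower_limit + upper_limit) / 2,
    }

info = describe_class(45, 49)
print(info)
```

For the class interval 45-49 this gives boundaries of 44.5 and 49.5, a class size of 5, and a midpoint of 47, matching the values worked out in the text.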
Range (R) is the difference of the Highest score (H) and the lowest score (L) in the given data set.
The following are the suggested steps on how to make a class interval:
1. Determine the desired number of classes (n) (number of rows).
2. Solve for the class width (i):
$$i = \frac{\text{Range}}{n}$$
3. Start the lowest class interval with the lowest value or score in the given data set (lowest score plus i).
Continue until the highest value in the distribution is reached.
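The steps above can be sketched in Python as follows. Several details are assumptions rather than rules stated in this module: the data are whole numbers, the class width is rounded up to the next whole number, and each class runs from its lower limit to the lower limit plus i minus 1 so that classes do not overlap. The scores are the 45 values used in the frequency table earlier, and the function name is hypothetical.

```python
import math

scores = [
    17, 20, 15, 18, 19, 16, 11, 10, 15, 16,
    12, 12, 13, 14, 11, 10, 14, 13, 12, 11,
    13, 15, 14, 10, 15, 16, 17, 17, 18, 20,
    20, 18, 19, 19, 18, 17, 16, 15, 12, 12,
    13, 14, 15, 19, 20,
]

def class_intervals(data, n_classes):
    """Build class intervals starting at the lowest score (integer data assumed)."""
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                       # R = H - L
    width = max(1, math.ceil(data_range / n_classes))   # i = R / n, rounded up
    intervals = []
    lower = lowest
    while lower <= highest:                             # continue until H is covered
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals

intervals = class_intervals(scores, 4)
counts = [sum(low <= s <= high for s in scores) for low, high in intervals]

for (low, high), count in zip(intervals, counts):
    print(f"{low}-{high}: {count}")
```

Here the range is 20 - 10 = 10, so with 4 classes the width is 3 after rounding up. Depending on how the range divides, this procedure can produce one more class than requested, so the desired number of classes is best treated as a guide.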
2. Histogram
• A histogram is a bar graph-like representation of data that buckets a range of outcomes into columns
along the x-axis.
• The y-axis represents the number count or percentage of occurrences in the data for each column and
can be used to visualize data distributions.
• It is constructed by drawing rectangles for each class of data. The height of each rectangle is the
frequency. The width of each rectangle is the same and the rectangles touch each other.
• It is a graph used to present quantitative data and is similar to the bar graph.
• It is used to organize continuous data.
3. Pie chart
• A Pie Chart (or Pie Graph) is a special chart that uses "pie slices" to show relative sizes of data.
• It is a circle divided into sectors. Each sector represents a category of data. The area of each sector is
proportional to the frequency of the category.
• Pie charts are typically used to present the relative frequency of qualitative data.
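Because each sector is proportional to the frequency of its category, the relative frequency and the central angle of each slice can be computed directly. The sketch below uses the male and female totals from Table 1 as the categories; the variable names are illustrative, and no chart is actually drawn.

```python
enrolment = {"Male": 776, "Female": 925}  # totals from Table 1

total = sum(enrolment.values())
relative_frequency = {category: count / total for category, count in enrolment.items()}
angles = {category: 360 * share for category, share in relative_frequency.items()}

for category in enrolment:
    print(f"{category}: {relative_frequency[category]:.1%} of the circle, "
          f"{angles[category]:.1f} degrees")
```

The angles of all sectors add up to 360 degrees, just as the relative frequencies add up to 100%.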
4. Line Graph
• A graph that shows information that is connected in some way (such as change over time).
• Line segments are then drawn connecting the points. It is used to organize continuous data.
• Very useful in identifying trends in the data over time.
Simple Line Graph
The simplest of line graphs is the single line graph, so called because it displays information concerning one
variable only, in terms of its frequencies.
Multiple Line Graph
Multiple line graphs illustrate information on several variables so that comparison is possible between them.
SUMMARY
Statistics is a branch of mathematics that deals with the collection, organization or
presentation, analysis, and interpretation of data. Its fundamental purpose is to
describe and draw inferences about the numerical properties of a population.
Frequency Distribution is a distribution of the total number of measures or frequencies over arbitrarily defined
categories or classes. The number of measures falling under a class is called class frequency.
Data can be presented in textual form, tabular form, or graphical form.