PCOA 022 – Statistical Analysis with Software Application S.Y. 2021-2022
Summer Term
Instructor: Mary Jane Bugarin-Tolentino, LPT

Module 1: Introduction to Statistics


MOTIVATION
The Recall Challenge
This is a recall challenge. Write at least 5 things that come to mind whenever you hear the word Statistics. They could be terms and definitions, statistical tools, or even ways statistics is applied in daily life.

CONTENT
Definition of Terms

Historical/Biblical Note
The origin of descriptive statistics can be traced to data collection methods used in censuses taken by the Babylonians and Egyptians between 4500 and 3000 BC. In the Roman Empire, between 27 BC and 17 AD, surveys were conducted on the births and deaths of its citizens, the number of livestock, and the crops harvested yearly.

Luke 2:1-4
In those days a decree went out from Caesar Augustus that all the world should be registered. This was the first registration when Quirinius was governor of Syria. And all went to be registered, each to his own town. And Joseph also went up from Galilee, from the town of Nazareth, to Judea, to the city of David, which is called Bethlehem, because he was of the house and lineage of David…

Descriptive Statistics
is a statistical procedure concerned with describing the characteristics and properties of a group of persons, places or things. It is based on easily verifiable facts. It organizes the presentation, description, and interpretation of data gathered. It includes the study of relationships among variables.

If you have gathered data from a survey and have organized them in a systematic, easy-to-read manner, then you have succeeded in applying the basic principles of descriptive statistics.

Among the measurements falling under descriptive statistics are the measures of central tendency, measures of variability, skewness, kurtosis, minimum, maximum, summation and other items which help in describing a data set.
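As an illustration of the measures listed above, the sketch below computes several of them with Python's standard `statistics` module (`mean`, `median`, `stdev`). It uses the ten test scores from Example 1 under "Sum It Up!" later in this module. The standard library has no skewness function, so the helper `moment_skewness` is a hypothetical name for the moment-based formula. That formula is one of several definitions in use, so other software may report a slightly different skewness value. Kurtosis could be computed the same way from the fourth moment.

```python
import statistics

# Scores of 10 students on a 50-item test (the data of Example 1 in "Sum It Up!")
scores = [35, 40, 29, 37, 25, 33, 49, 47, 28, 42]


def moment_skewness(data):
    """Moment-based (population) skewness, m3 / m2 ** 1.5.

    Other software may apply a small-sample correction and report a
    slightly different value.
    """
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


summary = {
    "sum": sum(scores),                     # summation
    "mean": statistics.mean(scores),        # central tendency
    "median": statistics.median(scores),    # central tendency
    "minimum": min(scores),
    "maximum": max(scores),
    "range": max(scores) - min(scores),     # variability
    "sample SD": statistics.stdev(scores),  # variability (sample standard deviation)
    "skewness": moment_skewness(scores),    # shape of the distribution
}

for name, value in summary.items():
    print(f"{name:>9}: {round(value, 2)}")
```

A positive skewness here means the scores trail off slightly more toward the high end than toward the low end.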
Examples of Descriptive statistics are as follows:
1. Based on research conducted by the DOH, 63% of those found to have diabetes were not aware that they had the disease.
2. According to a nationwide survey, the three most frequent responses regarding the persons living with older persons were: grandchild (61.8%), spouse (59%) and daughter (50.9%).
3. Cigarettes were associated with 29% of the 4,470 civilian fire deaths in 1989.

Descriptive statistics answers questions such as:

1. How many students are interested in taking online classes?
2. Which months had the highest and the lowest numbers of COVID-19 positive cases?
3. What are the most liked Netflix series among students?
4. Who performed better in the entrance examination?
5. What proportion of the ULS college students like online classes?


Inferential Statistics
is a statistical procedure used to draw inferences about the population on the basis of the information obtained from a sample.

With it, you try to arrive at conclusions that extend beyond the data alone.

You may use it to judge the probability that an observed difference between groups is a dependable one or one that happened merely by chance.

It is a matter of deciding between reality and coincidence.

Examples of Inferential statistics are as follows:


1. Eating garlic can lower blood pressure.
2. Drinking red wine may reduce the risk of heart disease by 12%.
3. Aspirin may lower the rate of heart attacks by 50%.
4. Carrot juice may strengthen the lungs.
5. Eating chili foods may cause shrinkage of the liver.

Inferential statistics can answer questions such as:

1. Is there a significant difference in the academic performance of students enrolled in online and modular classes?
2. Is there a significant difference between the proportion of students who are interested in taking statistics online and the proportion of those who are not?
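Questions like these are answered with a significance test. One simple option, not necessarily the test this course will use, is an exact permutation test. If the grouping made no difference, every way of splitting the pooled scores into two groups would be equally likely. The test therefore counts how often a split produces a difference in means at least as large as the one actually observed. The scores below are hypothetical, and `permutation_test` is an illustrative helper, not a library function.

```python
from itertools import combinations
from statistics import mean

# Hypothetical exam scores (illustrative only, not real data)
online = [85, 90, 78, 92, 88]
modular = [80, 75, 83, 79, 77]


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Returns the observed absolute difference and the p-value: the share of
    all possible regroupings whose difference is at least as large.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # A small tolerance guards against floating-point ties
        if abs(mean(a) - mean(b)) >= observed - 1e-9:
            extreme += 1
        total += 1
    return observed, extreme / total


diff, p_value = permutation_test(online, modular)
print(f"difference in means = {diff:.1f}, p-value = {p_value:.4f}")
```

A small p-value (0.05 is a common cut-off) suggests that the difference is unlikely to be mere coincidence. With only 10 scores there are 252 possible regroupings, so listing them all is feasible. Larger groups would need random resampling instead.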
See CLMS for Activity 1

Population – refers to the entire group of persons, objects, places or things under study, which is often large.


Parameter – is any numerical value which describes a population.
Example: There are 7,592 students enrolled in a certain Marian Institution.
N = 7,592 (parameter)

Sample – is a small portion or part of a population; a representative of the population in a research study.
Statistic – is any numerical value which describes a sample.

Example: Out of the 7,592 students enrolled in a Marian Institution, 3,568 are female.
n = 3,568 (statistic)
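To make the distinction concrete, the sketch below builds a hypothetical population of 7,592 students. The "F"/"M" labels are randomly generated, not real enrolment data. The sketch computes a parameter (the proportion of females in the whole population) and the corresponding statistic (the same proportion in a simple random sample). The sample size of 365 is an arbitrary choice for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: 7,592 students, each labelled "F" or "M" at random
N = 7592
population = [random.choice("FM") for _ in range(N)]

# Parameter: describes the whole population
p_population = population.count("F") / N

# Statistic: describes a sample drawn from that population
n = 365
sample = random.sample(population, n)
p_sample = sample.count("F") / n

print(f"parameter (N = {N}): proportion female = {p_population:.3f}")
print(f"statistic (n = {n}): proportion female = {p_sample:.3f}")
```

The two proportions are usually close but not identical. That gap is the sampling error discussed later in this module.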


Constant – is a characteristic or property of a population or sample which makes the members similar
to each other.
Variable – is a characteristic or property of a population or sample which makes the members different
from each other.
Dependent Variable – is a variable that is affected by another variable.
Independent Variable – is a variable which affects the dependent variable.
Scales of Measurement
1. Nominal level of measurement classifies data into mutually exclusive categories in which no order or ranking can be imposed on the data. Nominal numbers are just labels, e.g., an SSS number.
2. Ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist, e.g., t-shirt size.
3. Interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero, e.g., temperature in degrees Celsius.
4. Ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population, e.g., height.
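The difference between the interval and ratio levels can be checked numerically. On a ratio scale such as height, a ratio stays the same when the unit changes. A "ratio" of Celsius temperatures changes when the same temperatures are expressed in Fahrenheit, because neither temperature scale has a true zero. The temperatures and heights below are arbitrary examples.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def cm_to_inches(cm):
    return cm / 2.54


# Interval scale: temperature (no true zero)
t1_c, t2_c = 20.0, 10.0
ratio_c = t1_c / t2_c                                                 # 2.0
ratio_f = celsius_to_fahrenheit(t1_c) / celsius_to_fahrenheit(t2_c)  # 68 / 50 = 1.36

# Ratio scale: height (true zero)
h1_cm, h2_cm = 180.0, 90.0
ratio_cm = h1_cm / h2_cm                                # 2.0
ratio_in = cm_to_inches(h1_cm) / cm_to_inches(h2_cm)    # still 2.0

print(f"temperature ratio: {ratio_c:.2f} in Celsius vs {ratio_f:.2f} in Fahrenheit")
print(f"height ratio: {ratio_cm:.2f} in cm vs {ratio_in:.2f} in inches")
```

So one person can be twice as tall as another, but 20 °C is not "twice as hot" as 10 °C.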

See CLMS for Activity 2


Sum It Up!

The study of statistics involves the collection of data or measurements. Thus, there is always a need to add several numbers. The Greek capital letter sigma, Σ, is used in the process. The symbol Σ, read as "the sum of", tells you to add certain numerical values.

Example 1: Consider the scores obtained by 10 students in a 50-item statistics test.

Student No.   Score
1             35
2             40
3             29
4             37
5             25
6             33
7             49
8             47
9             28
10            42

For convenience, variables will be used to present the data.
Let x = score obtained by each student
$x_i$ = different values or observations of x
$x_i$ is read as "x sub i", where i is a subscript which indicates the position of each value in the series.
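Solution: In the given data, there are 10 observations denoted as $x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}$. Hence,

$$\sum_{i=1}^{10} x_i = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9 + x_{10}$$

The symbol $\sum_{i=1}^{10} x_i$ is read as "the summation of $x_i$, i from 1 to 10".

To substitute the data:

$$\sum_{i=1}^{10} x_i = 35 + 40 + 29 + 37 + 25 + 33 + 49 + 47 + 28 + 42 = 365$$

For a large number of observations, say 50, the summation will be expressed as:

$$\sum_{i=1}^{50} x_i = x_1 + x_2 + x_3 + \cdots + x_{50}$$

In general,

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$$

If all the given values of a variable are to be used in finding the sum, the limits of the summation are usually omitted, as $\sum x_i$.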
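In software, a summation is a loop over the indexed values, or a built-in such as Python's `sum`. The sketch below stores the ten scores of Example 1 in a list. Python counts positions from 0, so $x_1$ in the notation is `x[0]` in code.

```python
# Scores of the 10 students in Example 1
x = [35, 40, 29, 37, 25, 33, 49, 47, 28, 42]

# Summation with explicit limits: i runs from 1 to 10 (Python index i - 1)
total = 0
for i in range(1, 11):
    total += x[i - 1]

# Limits omitted: add all values of the variable
assert total == sum(x)
print(total)  # 365
```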


Example 2: Given are the ages of the first 4 shoppers at a newly opened convenience store in the neighborhood: 12, 24, 30, 45.
a. What will x represent in the information given?
b. What will the subscript i represent?
c. Write an expression for the sum.
d. What are the lower and upper limits of the expression?
e. Write the formula for the summation and find the sum of the given information.

Answers:
a. x will represent the ages of the first 4 shoppers in the newly opened convenience store.
b. i will represent the position of each of the first 4 shoppers in the newly opened convenience store (i = 1, 2, 3, 4).
c. The expression for the summation would be
$$\sum_{i=1}^{4} x_i$$
d. The lower limit is 1, and the upper limit is 4.
e.
$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 12 + 24 + 30 + 45 = \mathbf{111}$$

This time, consider 5 observations.

• The sum of five observations is written as:
$$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5$$
• The sum of the squares of the five observations is represented as:
$$\sum_{i=1}^{5} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2$$
• The sum of the products of five pairs of observations $(a_i, x_i)$ is expressed as:
$$\sum_{i=1}^{5} a_i x_i = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + a_5 x_5$$

Example 3: Consider the first four multiples of 2: 2, 4, 6, 8. Use the corresponding summation formula to find the following:
a. The sum of the first four multiples of 2.
b. The sum of the squares of the first four multiples of 2.
c. The sum of the products of pairs of values consisting of the first four counting numbers and the first four multiples of 2.


Answers:
a.
$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 2 + 4 + 6 + 8 = \mathbf{20}$$
b.
$$\sum_{i=1}^{4} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = 2^2 + 4^2 + 6^2 + 8^2 = \mathbf{120}$$
c.
$$\sum_{i=1}^{4} a_i x_i = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 = 1(2) + 2(4) + 3(6) + 4(8) = \mathbf{60}$$

Example 4: Find $\sum_{i=1}^{6} 3$.

Solution:
$$\sum_{i=1}^{6} 3 = 3 + 3 + 3 + 3 + 3 + 3 = 6(3) = \mathbf{18}$$

Observe that in Example 4, the summation of a constant c is the product of the constant and the number of terms n in the summation, that is,
$$\sum_{i=1}^{n} c = nc$$
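The three kinds of sums in Example 3 and the constant rule from Example 4 translate directly into code. The sketch below uses `zip` to pair each $a_i$ with its $x_i$. The names `a`, `x`, `c` and `n` simply mirror the notation.

```python
x = [2, 4, 6, 8]   # first four multiples of 2
a = [1, 2, 3, 4]   # first four counting numbers

sum_x = sum(x)                                        # Σ x_i
sum_x_squared = sum(xi ** 2 for xi in x)              # Σ x_i^2
sum_products = sum(ai * xi for ai, xi in zip(a, x))   # Σ a_i x_i

# Example 4: summing the constant c = 3 for i = 1 to n = 6 gives n * c
c, n = 3, 6
sum_constant = sum(c for _ in range(n))

print(sum_x, sum_x_squared, sum_products, sum_constant)  # 20 120 60 18
```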

See CLMS for Activity 3


Population and Samples

Note this: Sampling is the process of selecting samples from a given population. There are two types of sampling techniques:
(1) Probability Sampling: samples are chosen in such a way that each member of the population has a known, nonzero chance of being selected (in simple random sampling, an equal chance).
(2) Non-Probability Sampling: each member of the population does not have a known chance of being included in the sample. Hence, personal judgment plays an important role in the selection.

In doing research, if the population is too big, a scientifically determined sample size is acceptable. One way of determining the sample size is by using the Raosoft survey tool. You can use the Raosoft calculator online to compute the desired and substantial sample size:
http://www.raosoft.com/samplesize.html

Note that "e" is called the margin of error. It is a value which quantifies possible sampling errors. Usually the margin of error is set at 0.01 (1%), 0.05 (5%) or 0.10 (10%). Sampling error means that the results in the sample differ from those of the target population because of the "luck of the draw".

Since you already know what to use to compute the appropriate sample size, the next step is to select the samples from the population. This is referred to as sampling. We will only consider and discuss the probability sampling techniques.
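The Raosoft calculator does this computation for you. The function below sketches what such calculators compute, using the standard formula for estimating a proportion with a finite-population correction (Cochran's formula). It is an assumption that this matches Raosoft's internal computation exactly. For the stratified example later in this module (N = 5 000, 5% margin of error, 50% response), it gives the same n = 357 at a 95% confidence level. The function name and parameter names are illustrative. `NormalDist` from Python's standard `statistics` module supplies the z critical value.

```python
import math
from statistics import NormalDist


def sample_size(population_size, margin_of_error=0.05, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion, with finite-population correction.

    margin_of_error -- e, e.g. 0.05 for 5%
    confidence      -- confidence level, e.g. 0.95 for 95%
    p               -- expected proportion (0.5 is the most conservative choice)
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2     # size for a very large population
    n = n0 / (1 + (n0 - 1) / population_size)            # finite-population correction
    return math.ceil(n)                                  # round up to whole respondents


print(sample_size(5000))  # 357
```

A smaller margin of error or a higher confidence level both increase the required sample size.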

Probability Sampling Techniques

A. Simple random sampling. This is a procedure where a sample is selected in such a way that every element in the population is as likely to be selected as any other element.
Example: Lottery. This needs a complete list of the population. You write the names or codes of each member, place them in a container, then randomly draw the desired number of samples. This is easy if the population is small.
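A computer can play the role of the lottery container. The sketch below uses Python's `random.sample`, which draws without replacement, so no member is picked twice. The population of member codes 1 to 500 is hypothetical.

```python
import random

# Hypothetical complete list of the population (member codes 1 to 500)
population = list(range(1, 501))

random.seed(2022)  # fixed seed only so the draw can be reproduced
sample = random.sample(population, 10)  # draw 10 "names" from the container
print(sorted(sample))
```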

B. Systematic random sampling. This method is a sampling procedure with a random start. Samples are chosen using a rule set by the researcher: every kth member of the population is selected, where $k = \frac{N}{n}$, beginning from a member chosen at random among the first k.

Example: Choose a sample of size 10 from N = 500.

1. Choose a random start, say 10.
2. Determine the sampling interval k: $k = \frac{500}{10} = 50$, so every 50th member will be chosen, starting from member 10.
3. So the respondents will be member numbers 10, 60, 110, 160, 210, 260, 310, 360, 410, 460.
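The three steps can be written as a small function. This is a sketch assuming that members are numbered 1 to N and that N is a multiple of n, as in the example. When no start is given, it is drawn at random from 1 to k. The function name is illustrative.

```python
import random


def systematic_sample(N, n, start=None):
    """Return member numbers for a systematic random sample of size n from 1..N."""
    k = N // n                          # sampling interval, k = N / n
    if start is None:
        start = random.randint(1, k)    # random start within the first interval
    return [start + j * k for j in range(n)]


print(systematic_sample(500, 10, start=10))
# [10, 60, 110, 160, 210, 260, 310, 360, 410, 460]
```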

C. Stratified random sampling. This is used when the population can be naturally classified into groups or strata.
Example: A survey will be conducted to find out whether families living in a certain municipality are in favor of charter change. To ensure that all income groups are represented, respondents will be divided into high-income (Class A), middle-income (Class B) and low-income (Class C) groups. Below is the distribution of income groups.

Strata      Number of Families
Class A     1 000
Class B     2 500
Class C     1 500
N           5 000

1. Using the Raosoft calculator to find the sample size (n) with a 5% margin of error, a 95% confidence level and a 50% response distribution gives n = 357.
(Note: the margin of error and the confidence level are set independently in the calculator. A 5% margin of error at a 95% confidence level is a common choice, but the two values are not required to add up to 100%.)

2. Using proportional allocation, how many from each group should be taken as samples?

Strata      Number of Families   Percent                   Number of Samples (n)
Class A     1 000                1000/5000 = 0.2 = 20%     (0.2)(357) = 71.4 ≈ 71
Class B     2 500                2500/5000 = 0.5 = 50%     (0.5)(357) = 178.5 ≈ 179
Class C     1 500                1500/5000 = 0.3 = 30%     (0.3)(357) = 107.1 ≈ 107
Total       N = 5 000            100%                      n = 357
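The proportional allocation can be sketched as follows. The helper name is illustrative. Each share is rounded half up with integer arithmetic, so 178.5 becomes 179 as in the table. Python's built-in `round` would give 178, because it rounds halves to the nearest even number. With other data, the rounded shares may not add up exactly to n and may need a manual adjustment.

```python
def proportional_allocation(strata, n):
    """Split a total sample size n across strata in proportion to their sizes.

    strata -- dict mapping stratum name to its population size
    Returns a dict of stratum name -> number of samples, rounded half up.
    """
    N = sum(strata.values())
    # size * n / N rounded half up, using integers to avoid floating-point error
    return {name: (2 * size * n + N) // (2 * N) for name, size in strata.items()}


strata = {"Class A": 1000, "Class B": 2500, "Class C": 1500}
allocation = proportional_allocation(strata, 357)
print(allocation)                # {'Class A': 71, 'Class B': 179, 'Class C': 107}
print(sum(allocation.values()))  # 357
```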
So, 71 families should be taken as respondents from Class A, 179 from Class B and 107 from Class C, for a total of 357.

D. Cluster sampling. This can be done by subdividing the population into smaller units and then selecting only at random some primary units where the study would then be concentrated. This is sometimes referred to as "area sampling" because it is frequently applied on a geographical basis. Cluster sampling is used when the population is too large for simple random sampling. Instead, the population can be divided into clusters; for example, cities could be divided into districts or areas.
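Examples: In a large school district, all teachers from two buildings are interviewed for a study.
Nurses required details relating to 20 female patients with asthma between 30 and 50 years of age. They simply selected the first 20 individuals who presented themselves and fulfilled these criteria. (Strictly speaking, this second example is a non-probability technique, convenience or quota sampling, rather than cluster sampling, since the patients are not chosen at random.)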
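In cluster sampling, the random selection happens at the level of clusters, and every member of a selected cluster is included. The sketch below mirrors the school-district example. The building names and teacher codes are hypothetical.

```python
import random

# Hypothetical school district: each building (cluster) lists its teachers
district = {
    "Building 1": ["T01", "T02", "T03"],
    "Building 2": ["T04", "T05"],
    "Building 3": ["T06", "T07", "T08", "T09"],
    "Building 4": ["T10", "T11"],
    "Building 5": ["T12", "T13", "T14"],
}

random.seed(7)  # fixed seed only for reproducibility
chosen_buildings = random.sample(sorted(district), 2)  # randomly pick 2 clusters

# Interview ALL teachers in the chosen buildings
respondents = [t for b in chosen_buildings for t in district[b]]
print(chosen_buildings, respondents)
```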

See CLMS for Activity 4


Data Collection and Presentation


Data are needed whenever we undertake studies or research. They are used to address particular problems or to provide a basis on which certain decisions are made.

Types of Data

1. Primary Data are data collected directly by the researcher. These are first-hand or original sources. They can be collected through the following:

❖ Direct observation or measurement


❖ Interview using sets of questions called questionnaires or rating scales as guides in
collecting objective and measurable data
❖ Experimentation to find out cause and effect of a certain phenomenon
❖ Registration such as registry of births, deaths, marriages.

2. Secondary Data are information taken from published or unpublished materials previously gathered by other researchers or agencies, such as books, newspapers, magazines, journals, and published and unpublished theses and dissertations.

Organizing Data in a Table


The study of statistics begins with the collection of data or measurements. Data collected should be organized systematically for easier and faster interpretation. They may be presented in textual, tabular, or graphical form.
The textual form can be used if only a few data are to be presented.
The tabular and graphical forms are used when more detailed information about the data is to be presented.
A table is used when you want to present data in a systematic and organized manner so that reading and interpretation will be simpler and easier.

When a table is used, you must consider the following parts:

1. Table number
2. Table title
3. Column header
4. Row classifier
5. Body of the table
6. Source note

Table 3
Distribution of Students in Hogwarts School According to Year Level

Year Level    Number of Students
Freshman      350
Sophomore     300
Junior        250
Senior        200
Total         1 100

Source: Hogwarts Registrar


Example 1:
Table 1
Mahusay National High School Enrolment, SY 2005-2006

Year Level    Male    Female
First         216     267
Second        197     216
Third         187     227
Fourth        176     215
Total         776     925

You will observe that the table above shows clearly the enrolment data in Mahusay National High School for the
school year 2005-2006.
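When a table like Table 1 is prepared with software, the totals can be computed rather than added by hand, which avoids arithmetic slips. The sketch below uses plain Python string formatting. The column widths are arbitrary choices.

```python
# Enrolment data from Table 1 (Mahusay National High School, SY 2005-2006)
rows = [
    ("First", 216, 267),
    ("Second", 197, 216),
    ("Third", 187, 227),
    ("Fourth", 176, 215),
]

total_male = sum(male for _, male, _ in rows)
total_female = sum(female for _, _, female in rows)

print(f"{'Year Level':<12}{'Male':>6}{'Female':>8}")
for level, male, female in rows:
    print(f"{level:<12}{male:>6}{female:>8}")
print(f"{'Total':<12}{total_male:>6}{total_female:>8}")
```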

Another type of tabular presentation is the frequency table. It is an arrangement of the data that shows
the frequency of occurrence of different values of the variables.

A frequency table is constructed by listing the measurements from highest to lowest, then making tally marks
to record how often each number occurs. After tallying, count the marks and record them in the proper column.

Example 2: The scores of 45 students on a 20-point Science quiz are as follows:

17 20 15 18 19 16 11 10 15 16
12 12 13 14 11 10 14 13 12 11
13 15 14 10 15 16 17 17 18 20
20 18 19 19 18 17 16 15 12 12
13 14 15 19 20

Solution: To prepare a frequency table for the given set of scores, the scores are listed from highest to lowest,
tally marks are made and counted. The counted tally marks will then be recorded under the column
frequency. Notice that every 5th tally crosses the first four tallies. This is done to make counting
of marks easier especially if the number of cases is rather big.

Score   Tallies   Frequency
20      ////      4
19      ////      4
18      ////      4
17      ////      4
16      ////      4
15      //////    6
14      ////      4
13      ////      4
12      /////     5
11      ///       3
10      ///       3
Total             45
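Tallying is what Python's `collections.Counter` does. The sketch below counts the 45 quiz scores and prints them from highest to lowest. A run of `|` characters stands in for the tally marks, since plain text cannot show the crossed fifth mark.

```python
from collections import Counter

scores = [
    17, 20, 15, 18, 19, 16, 11, 10, 15, 16,
    12, 12, 13, 14, 11, 10, 14, 13, 12, 11,
    13, 15, 14, 10, 15, 16, 17, 17, 18, 20,
    20, 18, 19, 19, 18, 17, 16, 15, 12, 12,
    13, 14, 15, 19, 20,
]

freq = Counter(scores)  # score -> number of occurrences

print(f"{'Score':>5}  {'Tallies':<8}{'Frequency':>9}")
for score in sorted(freq, reverse=True):          # highest to lowest
    print(f"{score:>5}  {'|' * freq[score]:<8}{freq[score]:>9}")
print(f"{'Total':>5}  {'':<8}{sum(freq.values()):>9}")
```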


Frequency Distribution Table


If the number of measures under consideration is rather big, the presentation of data is further simplified by grouping the measures into class intervals. The resulting arrangement is called a frequency distribution.

A frequency distribution is a distribution of the total number of measures or frequencies over arbitrarily
defined categories or classes. The number of measures falling under a class is called class frequency.
Example 1.
The frequency distribution below shows the scores obtained by 300 students in an English test of 50 items.

Score    Number of Students
45-49    15
40-44    32
35-39    42
30-34    108
25-29    67
20-24    21
15-19    10
10-14    5
Total    300

➢ In the example above, the symbol 45-49 and the other symbols which follow up to 10-14 are called class
intervals. The end numbers are called class limits. For instance, in the class interval 45-49, 45 is called the
lower limit while 49 is called the upper limit.

➢ Each class interval also has a lower boundary and an upper boundary. For the class interval 45-49, the lower boundary is 44.5 while the upper boundary is 49.5. Hence, for the class interval 45-49, 44.5-49.5 are called the class boundaries.

➢ The size of the class interval, also called the class size, is the difference between the upper boundary and the lower boundary. Hence, the class size in the given example is 5 (obtained from 49.5 – 44.5).

➢ A class interval also has a midpoint or class mark. It is obtained by taking half the sum of the lower and upper class limits. For instance, the midpoint of the class interval 45-49 is $\frac{45+49}{2}$, or 47.
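The definitions above can be applied to every row of the English test distribution at once. This sketch assumes the scores are whole numbers, so each boundary lies 0.5 beyond its class limit, as in the 44.5-49.5 example. The helper name `describe_class` is illustrative.

```python
# English test distribution: ((lower limit, upper limit), frequency)
distribution = [
    ((45, 49), 15), ((40, 44), 32), ((35, 39), 42), ((30, 34), 108),
    ((25, 29), 67), ((20, 24), 21), ((15, 19), 10), ((10, 14), 5),
]


def describe_class(lower, upper):
    """Class boundaries, class size and midpoint for whole-number class limits."""
    lower_boundary = lower - 0.5   # half a unit below the lower limit
    upper_boundary = upper + 0.5   # half a unit above the upper limit
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "size": upper_boundary - lower_boundary,
        "midpoint": (lower + upper) / 2,   # class mark
    }


for (lower, upper), f in distribution:
    info = describe_class(lower, upper)
    lb, ub = info["boundaries"]
    print(f"{lower}-{upper}: boundaries {lb}-{ub}, size {info['size']}, "
          f"midpoint {info['midpoint']}, f = {f}")

print("Total:", sum(f for _, f in distribution))
```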

Range (R) is the difference between the highest score (H) and the lowest score (L) in the given data set, that is, R = H - L.
The following are the suggested steps on how to make class intervals:
1. Determine the desired number of classes (n), that is, the number of rows.
2. Solve for the class width (i):
$$i = \frac{R}{n}$$
If i is not a whole number, it is usually rounded up.
3. Start the lowest class interval with the lowest value or score in the given data set. The lower limit of the next class is the lowest score plus i, and so on. Continue until the highest value in the distribution is reached.
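The three steps can be sketched in Python using the 45 Science quiz scores from the frequency-table example. Two details are assumptions rather than rules stated above. First, the class width is rounded up to a whole number. Second, each upper limit is the lower limit plus i minus 1, which suits whole-number scores. Because classes are added until the highest score is covered, the result can have one class more than requested.

```python
import math


def build_classes(scores, num_classes):
    """Group whole-number scores into class intervals of equal width.

    Returns a list of (lower limit, upper limit, frequency), lowest class first.
    """
    low, high = min(scores), max(scores)
    value_range = high - low                                # R = H - L
    width = max(1, math.ceil(value_range / num_classes))   # i = R / n, rounded up
    classes = []
    lower = low                                 # start with the lowest score
    while lower <= high:                        # continue until H is reached
        upper = lower + width - 1
        freq = sum(1 for s in scores if lower <= s <= upper)
        classes.append((lower, upper, freq))
        lower += width                          # next lower limit: add i
    return classes


scores = [17, 20, 15, 18, 19, 16, 11, 10, 15, 16, 12, 12, 13, 14, 11,
          10, 14, 13, 12, 11, 13, 15, 14, 10, 15, 16, 17, 17, 18, 20,
          20, 18, 19, 19, 18, 17, 16, 15, 12, 12, 13, 14, 15, 19, 20]

for lower, upper, f in reversed(build_classes(scores, 5)):   # highest class first
    print(f"{lower}-{upper}: {f}")
```

Here R = 20 - 10 = 10 and i = 10 / 5 = 2. The score 20 therefore needs a sixth class, 20-21.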

See CLMS for Activity 5


Graphical Presentation of Data


1. Bar chart
• It is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency of the category on the other axis.
• Rectangles of equal width are drawn for each category.
• The height of each rectangle represents the category's frequency.
• It is used to organize discrete data.

Simple Bar Graph
The simple bar chart is used for the case of one variable only.

Multiple Bar Graph
The multiple bar chart is an extension of a simple bar chart when there are quantities of several variables to be displayed. The bars representing the quantities for the different variables are placed next to one another for each attribute.

2. Histogram

• A histogram is a bar graph-like representation of data that buckets a range of outcomes into columns
along the x-axis.
• The y-axis represents the number count or percentage of occurrences in the data for each column and
can be used to visualize data distributions.
• It is constructed by drawing rectangles for each class of data. The height of each rectangle is the
frequency. The width of each rectangle is the same and the rectangles touch each other.
• It is a graph used to present quantitative data and is similar to the bar graph.
• It is used to organize continuous data.
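A plotting library is the usual way to draw a histogram. To stay dependency-free, the sketch below draws a rough text histogram of the English test distribution instead. The bars are printed horizontally, adjacent classes sit on adjacent rows, and one `#` stands for about 4 students. That scale is an arbitrary choice, and `text_histogram` is an illustrative helper.

```python
# English test distribution (class interval, frequency), lowest class first
distribution = [
    ("10-14", 5), ("15-19", 10), ("20-24", 21), ("25-29", 67),
    ("30-34", 108), ("35-39", 42), ("40-44", 32), ("45-49", 15),
]

SCALE = 4  # one "#" represents about 4 students (arbitrary choice)


def text_histogram(dist, scale=SCALE):
    """Return one text bar per class; bar length is frequency / scale, rounded half up."""
    return {label: "#" * ((f * 2 + scale) // (2 * scale)) for label, f in dist}


bars = text_histogram(distribution)
for label, bar in bars.items():
    print(f"{label} | {bar}")
```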


3. Pie chart

• A Pie Chart (or Pie Graph) is a special chart that uses "pie slices" to show relative sizes of data.
• It is a circle divided into sectors. Each sector represents a category of data. The area of each sector is
proportional to the frequency of the category.
• Pie charts are typically used to present the relative frequency of qualitative data.
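Since the area of each sector is proportional to the frequency, each sector's central angle is its share of 360°. The sketch below illustrates this with the year-level counts from the Hogwarts table (Table 3) earlier in this module.

```python
# Year-level counts from the Hogwarts table (Table 3)
counts = {"Freshman": 350, "Sophomore": 300, "Junior": 250, "Senior": 200}
total = sum(counts.values())  # 1 100 students

# Each sector's central angle is its share of the full circle (360 degrees)
angles = {level: f / total * 360 for level, f in counts.items()}

for level, angle in angles.items():
    print(f"{level:<10} {counts[level] / total:6.1%}  {angle:6.2f} degrees")
```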

4. Line Graph

• A graph that shows information that is connected in some way (such as change over time).
• Data values are plotted as points, and line segments are then drawn connecting the points. It is used to organize continuous data.
• It is very useful in identifying trends in the data over time.

Simple Line Graph
The simplest of line graphs is the single line graph, so called because it displays information concerning one variable only, in terms of its frequencies.

Multiple Line Graph
Multiple line graphs illustrate information on several variables so that comparison is possible between them.

See CLMS for a reflection activity


SUMMARY
Statistics is a branch of mathematics that deals with the collection, organization or presentation, analysis, and interpretation of data. Its fundamental purpose is to describe and draw inferences about the numerical properties of a population.

Two main parts of Statistics: Descriptive Statistics and Inferential Statistics


Population – refers to the entire group of persons, objects, places or things under study, which is often large.
Parameter – is any numerical value which describes a population.
Sample – is a small portion or part of a population; a representative of the
population in a research study.
Sampling – is the process of selecting the elements of a sample from the population being studied. The methods of probability sampling include simple random sampling, systematic random sampling, stratified random sampling, and cluster sampling.
Statistic – is any numerical value which describes a sample.
Data – are facts, or a set of information gathered or under study.
Data can be qualitative or quantitative, and quantitative data are classified into two types: discrete and continuous.
Constant – is a characteristic or property of a population or sample which makes the members similar to each other.
Variable – is a characteristic or property of a population or sample which makes the members different from each
other. Variable can either be dependent or independent.
Scales of Measurement: Nominal level of measurement, Ordinal level of measurement, Interval level of
measurement, and ratio level of measurement.
A table is used to present data in a systematic and organized manner to make its reading and interpretation simple and easy.

Frequency Distribution is a distribution of the total number of measures or frequencies over arbitrarily defined
categories or classes. The number of measures falling under a class is called class frequency.
Data can be presented in textual form, tabular form, or graphical form.

