Statistics Lecture
Introduction
Statistics - has proved to be a very powerful tool in almost all fields of work.
- found in the fields of research, education, business, politics, psychology, and even in a
simple event that needs analysis.
- very useful in recording facts about people, objects, and events, and in making predictions
and decisions based on the available data.
- it demands answers to questions formulated from an existing situation or environment.
- the validity, reliability, and accuracy of these answers can only be ensured by the proper
conduct of the five processes in statistics: 1. collection, 2. organization, 3. presentation, 4.
analysis, and 5. interpretation of data.
Statistics can be traced back to Biblical times in ancient Egypt, Babylon, and Rome.
- as early as 3,500 years before the birth of Christ, statistics was used in Egypt in the form of
recording the number of sheep or cattle, the amount of grain produced, and the number of people
living in a particular city.
- in 3800 B.C., the Babylonian government used statistics to measure the number of men under a
king's rule and the vast territory that he occupied.
- in 700 B.C., the Roman Empire used statistics by conducting registrations to record population
for the purpose of collecting taxes.
- it was also used to assess opinions from polls and to unlock secret codes from games of chance.
Modern statistics is said to have begun with John Graunt (1620 – 1674), an English
tradesman.
- he published records called "bills of mortality" that included information about the
numbers and causes of deaths in the city of London.
- he analyzed more than 50 years of data and created the first mortality table, which shows how
long a person may be expected to live after reaching a certain age.
Karl Friedrich Gauss (1777 – 1855), a brilliant German mathematician, used statistics in
making predictions about the positions of the planets in our solar system.
Adolphe Quetelet (1796 – 1874), a Belgian astronomer, developed the idea of the "average
man" from his studies of the Belgian census. He is known as the "Father of Modern Statistics."
Karl Pearson (1857 – 1936), an English mathematician, made important links between
probability and statistics.
Sir Ronald Aylmer Fisher, a British statistician, developed the F-test in inferential
statistics. His tool was very useful in testing improvements of production from agricultural
experiments and in improving the precision of results from medical, biological, and industrial
experimentation.
George Gallup (1901 – 1984) was instrumental in making statistical polling a common
tool in political campaigns.
Statistics - is an art and a science that deals with the collection, organization, presentation,
analysis, and interpretation of data.
Uses of Statistics
1. Education – to assess students' performance and correlate factors affecting teaching and
learning processes in order to improve the quality of education.
3. Business and economics – to analyze a wide range of data such as sales, outputs, price
indices, revenues, costs, inventories, accounts, etc.
Fields of Statistics
1. Descriptive statistics – is the method of collecting and presenting data. It includes the
computation of measures of central tendency, measures of central location, and measures of
dispersion or variability. It also includes the construction of tables and graphs.
Constants – refer to fundamental quantities that do not change in value, e.g., fixed costs and
the acceleration due to gravity.
Variables – are quantities that may take any one of a specified set of values. They are
classified as follows:
1. Qualitative variables – variables that yield categorical responses.
E.g. Gender – a dichotomous variable, since an individual may take one of only two values
(male or female)
Smoking habits – (Always/Very often, Often, Seldom, Very seldom, or Never)
2. Quantitative variables – quantities that can be counted, measured with the use of some
measuring device, or calculated with the use of a mathematical formula.
The process of using statistics always begins with a question. “Who will probably become the next
president?”
When questions like this have been asked, the next step is to collect information about the
subject. The kind of information we get is called data, and the people who collect, organize, and
analyze the data are called researchers.
Data – refer to facts concerning things such as the status in life of people, the defectiveness of
objects, or the effect of an event on society.
Information – is a set of data that have been processed and presented in a form suitable for human
interpretation, usually for the purpose of revealing trends or patterns about the population.
Sources of Data
2. Secondary source – is taken from others' works, news reports, readings, and records that
are kept by the NSO, etc.
A good questionnaire should contain questions that are arranged in logical order and as
much as possible in a checklist type.
1. Nominal scale – classifies objects or people's responses so that all of those in a single
category are equal with respect to some attribute, and then each category is coded numerically. E.g.
marital status: single – 1, married – 2, separated – 3, or widowed – 4.
3. Interval scale – applies to quantitative measurements in which lower and upper limits are
adopted to classify the relative order and differences of item numbers or actual scores. E.g.
households' socioeconomic status is classified based on the income level and age bracket to which
they belong.
4. Ratio scale – takes into account the interval size and ratio of two related quantities,
which are usually based on a standard measurement. E.g. weights, time, height, rate of change in
production.
Advantage – organized data from an institution can serve as a ready reference for future
studies or for personal claims on people's records.
Disadvantage – a problem arises only when an agency doesn't have an MIS or if the system or
process of registration is not implemented well.
4. Observation method – a scientific method of investigation that makes use of all the
senses to measure or obtain outcomes/responses from the object of study.
Advantage – can be applied to respondents that cannot be asked questions or need not speak,
e.g., the culture of an organization
Disadvantage – the subjectivity of the information sought cannot be avoided
Population – is a finite or infinite collection of objects, events, or individuals with specified class
or characteristics under consideration, such as students in a certain school.
Slovin's formula in determining the sample size:

        n = N / (1 + Ne²)

where n is the sample size, N is the population size, and e is the margin of error.
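The formula above can be applied directly in code; a minimal Python sketch (the function name `slovin` and the default 5% margin of error are illustrative choices, not from the text):

```python
import math

def slovin(N, e=0.05):
    """Sample size n = N / (1 + N * e^2), rounded up to a whole respondent."""
    return math.ceil(N / (1 + N * e ** 2))

# e.g. a population of 800 at a 5% margin of error needs 267 respondents
n = slovin(800)   # 800 / (1 + 800 * 0.0025) = 800 / 3 = 266.67 → 267
```

Rounding up is the usual convention, since rounding down would fall short of the required precision.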
- "The larger the size of the sample, the more certain we can be that the sample mean
will be a good estimate of the population mean." The larger the size of the sample, the closer its
characteristics will be to the characteristics of the entire population.
Complete enumeration or census taking – used as benchmarks or reference points for current
statistics and as sampling frames for most current sample surveys.
Random sampling – is the most commonly used sampling technique, in which each member
of the population is given an equal chance of being selected in the sample.
- also called fair sampling
1. Equiprobability – means that each member of the population has an equal chance of being
selected and included in the sample.
2. Independence – means that the chance of one member being drawn does not affect the chance of
another member being drawn. E.g. in conducting a study on the product preference of customers,
the choice of one member of the family cannot be assumed to be the choice of the entire family.
2. Unrestricted random sampling – is considered the best random sampling design because
no restrictions are imposed and every member of the population has an equal chance of being
included in the sample.
Sampling Techniques
1. Lottery or fishbowl sampling – done by simply writing the names or numbers of all the
members of the population on small rolled pieces of paper, which are later placed in a container.
This is the way draws are usually done in a lottery.
2. Sampling with the use of Table of random numbers – if the population is large, a more
practical procedure is the use of Table of Random Numbers which contains rows and columns of
digits randomly ordered by a computer.
3. Systematic sampling – done by taking every kth element in the population. It applies to a
group of individuals arranged in a waiting line or in a methodical manner.

        k = N / n
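The every-kth-element rule can be sketched in a few lines of Python; the population of 100 numbered members and the sample size of 20 are made-up values for illustration:

```python
import random

def systematic_sample(population, n):
    """Take every k-th element, k = N // n, starting from a random index in [0, k)."""
    k = len(population) // n          # sampling interval
    start = random.randrange(k)       # random starting point keeps the draw fair
    return population[start::k][:n]

members = list(range(1, 101))         # hypothetical population of 100 members
sample = systematic_sample(members, 20)   # k = 100 // 20 = 5
```

The random starting point matters: without it, the first element would always be chosen and the sample would no longer be random.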
4. Stratified random sampling – when the population can be partitioned into several strata
or subgroups, it may be wiser to employ the stratified technique to ensure a representative of each
group in the sample. Random samples will be selected from each stratum.
1. Simple stratified random sampling – when the population is grouped into more or less
homogeneous classes, that is, different groups with a relatively common characteristic, each
can be sampled independently by taking an equal number of elements from each stratum.
Year level      Population      Sample
Fourth yr          185             50
Third yr           200             50
Second yr          215             50
First yr           200             50
Total           N = 800         n = 200
b. Stratified proportional random sampling – used when the proportions of the subgroups in
the population are unequal. The researcher may wish to maintain these proportions in the
sample with the use of the stratified proportional technique.
Year level      Population      Percent      Sample
Fourth yr          120            15.0           30
Third yr           200            25.0           50
Second yr          220            27.5           55
First yr           260            32.5           65
Total              800           100.0          200
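The proportional allocation in the table above can be reproduced mechanically; a short Python sketch using the stratum sizes from the table (the dictionary layout is just one convenient way to organize the computation):

```python
strata = {"Fourth yr": 120, "Third yr": 200, "Second yr": 220, "First yr": 260}
N = sum(strata.values())     # population size, 800
n = 200                      # desired total sample size

# each stratum contributes in proportion to its share of the population
allocation = {year: size * n // N for year, size in strata.items()}
# → {'Fourth yr': 30, 'Third yr': 50, 'Second yr': 55, 'First yr': 65}
```

Here every share happens to be a whole number; with awkward proportions the rounded allocations may not sum exactly to n, and one or two strata would need a one-unit adjustment.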
Multi-stage and multiple sampling – uses several stages or phases in getting the sample
from the population. However, selection of the sample is still done at random.
2. Quota sampling – a quick and inexpensive method, since the number of persons or
elements to be included in the sample is chosen at the researcher's own convenience or
preference and is not predetermined by some carefully operated randomizing plan.
4. Incidental sampling – applied to samples that are taken because they are the most
available. The investigator simply takes the nearest individuals as subjects of the study until
the sample reaches the desired size.
5. Convenience sampling – widely used in television and radio programs to find out
opinions of TV viewers and listeners regarding a controversial issue.
1. Textual – this form of presentation combines text and numerical facts in a statistical
report.
2. Tabular – form of presentation is better than the textual form because it provides
numerical facts in a more concise and systematic manner. Statistical tables are constructed to
facilitate the analysis of relationships.
2. It provides the reader a good grasp of the meaning of the quantitative relationship
indicated in the report.
3. It tells the whole story without the necessity of mixing textual matter with figures.
4. The systematic arrangement of columns and rows makes them easily read and readily
understood.
5. The column and rows make comparison easier.
1. Line graph – shows relationships between two sets of quantities. This is done by
plotting points of the X set of quantities along the horizontal axis against the Y set of quantities
along the vertical axis in a Cartesian coordinate plane. It is often used to predict growth trends
over a longer period of time.
2. Bar graph – consists of bars or rectangles of equal widths, either drawn vertically or
horizontally, segmented or non-segmented.
5. Map graph or cartogram – one of the best ways to present geographical data. This kind
of graph is always accompanied by a legend which tells us the meaning of the lines, colors, or
other symbols used and partitioned in a map.
6. Scatter point diagram – a graphical device to show the degree of relationship between
two quantitative variables. The plotted points for every pair of X and Y set of quantities are not
connected by line segments but are simply scattered on the Cartesian coordinate plane.
Frequency Distribution
E.g. 5 13 6 13 10
8 12 15 10 12
11 15 12 7 15
1. Class Limits / integral limits – groupings or categories defined by lower and upper
limits.
E.g. 16 – 20
21 – 25
26 – 30
Lower class limits are the smallest numbers that belong to the different classes.
Upper class limits are the highest numbers that belong to the different classes.
        L.L.    U.L.
        16      20   }  class size = 5
        21      25   }
3. Class boundaries/real limits – are the numbers used to separate classes without the gaps
created by class limits. The number to be added or subtracted is half the difference between the
upper limit of one class and the lower limit of the next class.
        C.L.            C.B.
        16 – 20         15.5 – 20.5
        21 – 25         20.5 – 25.5
        26 – 30         25.5 – 30.5
        31 – 35         30.5 – 35.5
4. Class marks – are the midpoints of the classes. They can be found by adding the lower and
upper limits and then dividing by 2.
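The class marks and boundaries just described can be computed mechanically; a small Python sketch using the 16 – 20, 21 – 25, … classes from the example above:

```python
classes = [(16, 20), (21, 25), (26, 30), (31, 35)]   # (lower, upper) class limits

# class marks: midpoint of each class
marks = [(lo + hi) / 2 for lo, hi in classes]        # [18.0, 23.0, 28.0, 33.0]

# class boundaries (real limits): extend each class by 0.5 on both sides
bounds = [(lo - 0.5, hi + 0.5) for lo, hi in classes]   # first is (15.5, 20.5)
```

The 0.5 is half the gap of 1 between an upper limit (20) and the next lower limit (21), exactly as the class-boundaries rule states.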
3. Set up the classes – in setting up the classes, we add c/2, where c is the class interval, to
the highest score to get the upper limit of the highest class, and subtract c/2 from the highest
score to get the lower limit of the highest class. For instance, if the highest score is 47, then 47
plus 3/2 or 1.5 equals 48.5, and 47 minus 1.5 equals 45.5. The highest class therefore runs from
45.5 to 48.5. This setting of classes is called real limits or exact limits, and these are sometimes
spoken of as class boundaries. Once the highest class is set, subtract the class interval of 3 to get
each succeeding class until you reach the lowest score.
There are two ways of setting classes, namely, real limits and integral limits. The latter is
obtained by adding 0.5 to the lower real limit of a class interval and subtracting 0.5 from the
upper real limit. For instance, the highest class is 45.5 to 48.5 in real limits and 46 to 48 in
integral limits.
E.g. Setting of Classes in Real & Integral Limits
Real limits/C.B.        Integral limits/C.L.
45.5 – 48.5             46 – 48
42.5 – 45.5             43 – 45
39.5 – 42.5             40 – 42
36.5 – 39.5             37 – 39
33.5 – 36.5             34 – 36
30.5 – 33.5             31 – 33
4. Tally the scores – having adopted a set of classes, we are ready to tally. Locate each score
within its proper class and tally it. After tallying, count the number of tallies in each class and
write it in the frequency column (f).
The tallies should be carefully checked to see that their sum equals the total number of scores in
the sample. If the frequencies do not add up to the sample size, tallying should be repeated. At the
bottom of column 4 write the symbol N or ∑f, in which ∑ (capital Greek sigma) stands for "the
sum of"; here it equals 35, the total number of cases (N).
The "less than" cumulative frequency distribution (<F) is obtained by adding frequencies
successively from the lowest to the highest class interval, while the "more than" cumulative
frequency distribution (>F) is obtained by adding frequencies from the highest class interval
down to the lowest.
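Both cumulative distributions can be built from the same list of class frequencies; a short Python sketch (the frequencies used here are those of the 34-score grouped-data example in this lecture, listed from the lowest class to the highest):

```python
from itertools import accumulate

freqs = [2, 2, 4, 10, 8, 5, 3]   # f for classes 5-9 up to 35-39

# "less than" cumulative frequency: running sum from the lowest class up
less_than = list(accumulate(freqs))               # [2, 4, 8, 18, 26, 31, 34]

# "more than" cumulative frequency: running sum from the highest class down
more_than = list(accumulate(freqs[::-1]))[::-1]   # [34, 32, 30, 26, 16, 8, 3]
```

Note that both lists end (or start) at n = 34, the total number of cases, which is a quick consistency check.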
Critical values of t (two-tailed), where v is the degrees of freedom:
v        0.05      0.01
1       12.706    63.657
2        4.303     9.925
3        3.182     5.841
4        2.776     4.604
5        2.571     4.032
6        2.447     3.707
7        2.365     3.499
8        2.306     3.355
9        2.262     3.250
10       2.228     3.169
11       2.201     3.106
12       2.179     3.055
13       2.160     3.012
14       2.145     2.977
15       2.131     2.947
16       2.120     2.921
17       2.110     2.898
18       2.101     2.878
19       2.093     2.861
20       2.086     2.845
21       2.080     2.831
22       2.074     2.819
23       2.069     2.807
24       2.064     2.797
25       2.060     2.787
26       2.056     2.779
27       2.052     2.771
28       2.048     2.763
29       2.045     2.756
30       2.042     2.750
THE Z-TEST
Critical values of F at the 0.05 level, where v1 and v2 are the degrees of freedom of the
numerator and denominator:
v2\v1     1      2      3      4      5      6      7      8      9
1      161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5
2      18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38
3      10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81
4       7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00
Measures of central tendency – are descriptive measures used to indicate where the
center, the middle point, or the most typical value of a set of data lies.
- any measure indicating the center of a set of data arranged in increasing or
decreasing order of magnitude.
Population mean – is the average of a set of data taken from the entire population. All the
values in the population are added together, then divided by the population size N.
Eg. The number of faculty members in 10 different colleges are: 16, 25, 40, 24, 15, 20, 50, 15, 35,
and 20 . Treating the data as a population, find the population mean of faculty members for the 10
colleges.
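Treating the ten faculty counts as a population, the computation is direct; a minimal Python check of the example:

```python
faculty = [16, 25, 40, 24, 15, 20, 50, 15, 35, 20]   # faculty members in 10 colleges

# population mean: sum of all values divided by the population size N
mu = sum(faculty) / len(faculty)   # 260 / 10 = 26.0
```

So the mean number of faculty members for the 10 colleges is 26.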
Sample mean – is the average of a set of data taken from a sample; the values are added
together, then divided by the sample size n.
Formula:  x̄ = ∑x / n
E.g. The following are the ages of a sample of 9 children in an urban area: 9, 8, 1, 3, 4, 5, 6, 7, 2
Example. 9, 8, 7, 6, 5, 4, 3, 2, 1
Scores (X)      f      d      fd
35 – 39         3     +3      +9
30 – 34         5     +2     +10
25 – 29         8     +1      +8
20 – 24        10      0       0
15 – 19         4     -1      -4
10 – 14         2     -2      -4
5 – 9           2     -3      -6
              n = 34        ∑fd = +13
Steps in getting the sample mean or average
1. Select the assumed mean from any of the 7 class intervals by adding the lower and the upper
limits and dividing their sum by two (2).
Am = (20 + 24)/2 = 44/2 = 22
2. Construct the 3rd column (d) for positive and negative deviations.
3. Put zero along column d where you select the assumed mean and +1, +2, +3 …, above zero and
-1, -2, -3 below zero.
4. Multiply the frequency (f) and the deviation (d) considering the signs, and write it in column fd.
5. Get the ∑fd algebraically
Solution:
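The five steps above stop short of the final computation. Combining them with the standard assumed-mean formula, x̄ = Am + (∑fd/n)·i, where i is the class interval of 5 (the formula itself is not written out in the text, but this is the usual form of the method), gives the following Python sketch:

```python
freq = [3, 5, 8, 10, 4, 2, 2]      # f, for classes 35-39 down to 5-9
dev  = [3, 2, 1, 0, -1, -2, -3]    # d, coded deviations from the assumed-mean class
Am, i = 22, 5                      # assumed mean and class interval

n = sum(freq)                                     # 34
sum_fd = sum(f * d for f, d in zip(freq, dev))    # +13

mean = Am + (sum_fd / n) * i       # 22 + (13/34) * 5 ≈ 23.91
```

The result, about 23.91, agrees with the midpoint method worked out in the next section (813/34).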
Midpoint method
A frequency distribution of the scores in Statistics 01 of 34 BSED students
Scores          f      Midpoint (x')      fx'
35 – 39         3           37            111
30 – 34         5           32            160
25 – 29         8           27            216
20 – 24        10           22            220
15 – 19         4           17             68
10 – 14         2           12             24
5 – 9           2            7             14
              n = 34               ∑fx' = 813
Steps in solving for the sample mean using the midpoint method.
1. Get the midpoint of every class interval by adding the lower and upper limits then dividing
the sum by 2. Place them in column x' (midpoint).
2. Multiply the values of f and x' and place the products under column fx'.
3. Find ∑fx' by adding the values in column fx', then use the formula x̄ = ∑fx' / n.
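The three steps can be verified directly in Python using the f and x' columns of the table:

```python
freq = [3, 5, 8, 10, 4, 2, 2]       # f, for classes 35-39 down to 5-9
mid  = [37, 32, 27, 22, 17, 12, 7]  # x', the class midpoints

# sample mean by the midpoint method: x̄ = ∑fx' / n
mean = sum(f * x for f, x in zip(freq, mid)) / sum(freq)   # 813 / 34 ≈ 23.91
```

This matches the assumed-mean method, as it should: both are exact rearrangements of the same grouped-data mean.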
Solution:
Scores (X) f
35 – 39 3
30 – 34 5
25 – 29 8
20 – 24 10
15 – 19 4
10 – 14 2
5–9 2
n= 34
Steps in solving for the median of grouped data:
1. Construct the cumulative frequency column (F) by copying the frequency of the lowest
class, which is 2.
2. Add the frequencies going up: 2 + 2 = 4, 4 + 4 = 8, 8 + 10 = 18, 18 + 8 = 26,
26 + 5 = 31, and 31 + 3 = 34.
3. Get half of n: n/2 = 34/2 = 17.
4. Subtract the cumulative frequency (F) below the median class from n/2. Take note that F
should not exceed n/2, so F = 8, and the small f is 10, one step higher than the cumulative
frequency F.
5. L is the true lower limit of the median class. Subtract 0.5 from 20: 20 - 0.5 = 19.5, then
L = 19.5
n/2 = 34/2 = 17
F=8
f = 10
i=5
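Substituting the values above into the standard grouped-data median formula, Median = L + ((n/2 − F)/f)·i (the formula is the usual one for this method, stated here since the text lists only the ingredients):

```python
L, n, F, f, i = 19.5, 34, 8, 10, 5   # values from steps 1-5 above

median = L + ((n / 2 - F) / f) * i   # 19.5 + ((17 - 8) / 10) * 5 = 24.0
```

So the median of the 34 grouped scores is 24.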
Scores (X)      f      F
35 – 39         3     34
30 – 34         5     31
25 – 29         8     26
20 – 24        10     18
15 – 19         4      8
10 – 14         2      4
5 – 9           2      2
              n = 34
Scores (X)      f
54 – 56         3
51 – 53         2
48 – 50         1
45 – 47         5
42 – 44         6
39 – 41         8
36 – 38         4
33 – 35         6
30 – 32         2
27 – 29         3