Chapter One
1. Introduction
This chapter introduces the subject matter of statistics, the art of learning from data. It describes the two
branches of statistics, descriptive and inferential, and the idea of learning about a population by sampling and
studying certain of its members. When faced with a complicated question, it is reasonable to start by thinking
about it, relating it to your own experiences, and talking it over with friends. However, if you want to convince
others and obtain a consensus, it is then necessary to gather some objective information. This is the modern
approach to learning about a complicated question: one must collect relevant information, or data, and these
data must then be described and analyzed in such a way that valid conclusions can be drawn. This is the subject
matter of statistics.
1.1 Definition and Classification of Statistics
The word “statistics” is commonly used with two meanings. In one sense, “statistics” is a
plural noun that refers to numerical facts and figures collected for a certain purpose. Statistics as
numerical data: in this sense, statistics is defined as aggregates of numerically expressed facts (figures)
collected in a systematic manner for a predetermined purpose (plural sense).
In the other sense, “statistics” refers to a field of study, a body of knowledge, or a subject that is
concerned with the systematic collection and interpretation of numerical data to make a decision. In this sense
the word statistics is singular. Statistics as a subject (field of study): in this sense, statistics is defined as
the science of collecting, organizing, presenting, analyzing and interpreting numerical data to make
decisions on the basis of such analysis (singular sense).
In this course, we shall be mainly concerned with statistics as a subject, that is, as a field of study.
Classification of statistics
Statistical techniques can be applied to virtually every branch of science and art. These techniques are
so diverse that statisticians commonly classify them into two broad categories: descriptive
statistics and inferential statistics.
Descriptive Statistics: an area of statistics mainly concerned with the methods and
techniques used in the collection, organization, presentation, and analysis of a set of data without drawing
any conclusions or inferences beyond the data. According to this definition, the activities in the area of
descriptive statistics include gathering data, editing and classifying data, presenting data in tables, drawing
diagrams and graphs, and calculating averages and measures of dispersion. Descriptive statistics
doesn’t go beyond describing the data themselves.
Examples: 1) Recording a student’s grades throughout the semester and then finding the average of
these grades.
2) Reporting that 40% of the employees in a sample expressed a positive attitude toward the management of the
organization.
3) Drawing graphs that show the difference in the scores of males and females.
All the above examples simply summarize and describe a given set of data. Nothing is inferred or concluded
beyond the data on the basis of these descriptions.
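The first two examples above can be sketched in a few lines of Python. The grades and the survey answers below are hypothetical values chosen only for illustration; they do not come from any real data set.

```python
from statistics import mean

# Hypothetical grades for one student over a semester (illustrative only).
grades = [72, 85, 90, 64, 79]

# Descriptive summary 1: the average grade.
average_grade = mean(grades)

# Hypothetical answers from a sample of 10 employees:
# True means a positive attitude toward management.
responses = [True, False, True, False, False, True, False, True, False, False]

# Descriptive summary 2: the percentage of positive answers in this sample.
percent_positive = 100 * sum(responses) / len(responses)

print(f"Average grade: {average_grade}")
print(f"Positive attitude: {percent_positive:.0f}% of the sample")
```

Both results only describe the data at hand: nothing is claimed about other students or about employees outside the sample.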
Inferential Statistics: an area of statistics which deals with methods of
inferring or drawing conclusions about the characteristics of a population based upon the results of
a sample. Statistics is concerned not only with the collection, organization, presentation and analysis
of data but also with the inferences which can be made after the analysis is completed. When collecting
data on the characteristics of a set of elements, the set may be very large or even infinite. Instead
of observing the entire set of objects, called the population, one observes a subset of the population called
a sample.
Inferential statistics uses sample data to make decisions about the entire population. Examples
of inferential statistics are:
1. Of 50 randomly selected students at the Sport department of Dire Dawa University, 28 are
female. An example of inferential statistics is the following statement: "56% (that is, (28/50)*100) of
students at the Sport department of Dire Dawa University are female." We have no information about
all students at the Sport department of Dire Dawa University, just about the 50. We have taken that
information and generalized it to talk about all students at the Sport department of Dire Dawa
University.
2. As a result of the recent reduction in oil production by oil-producing nations, we can expect
the price of gasoline to double in the next year. (This is an inference from a sample survey.)
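The generalization in the first example can be sketched numerically. The counts (28 out of 50) come from the example; the standard error and the approximate 95% interval are additions that are not part of the original text. They assume a simple random sample and a normal approximation, and are shown only to suggest that a sample-based statement about the population carries some uncertainty.

```python
import math

# Figures taken from the example above.
sample_size = 50
females_in_sample = 28

# Point estimate of the population proportion.
p_hat = females_in_sample / sample_size

# Assumption: simple random sample, normal approximation for the standard error.
standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)

# Approximate 95% interval (1.96 is the usual normal multiplier).
lower = p_hat - 1.96 * standard_error
upper = p_hat + 1.96 * standard_error

print(f"Estimated proportion: {p_hat:.0%}")
print(f"Approximate 95% interval: {lower:.1%} to {upper:.1%}")
```

The point estimate is the 56% quoted in the example; the interval is one common way of expressing how far the true population proportion might plausibly lie from it.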
1.2 Stages in Statistical Investigation
Before we deal with statistical investigation, let us see what statistical data mean. Numerical
data cannot be considered statistical data unless they satisfy the following criteria: 1) the data must be
an aggregate of facts; 2) they must be affected to a marked extent by a multiplicity of causes; 3) they must be
estimated according to reasonable standards of accuracy; 4) they must be collected in a systematic
manner for a predefined purpose; 5) they should be placed in relation to each other.
The stages of a statistical investigation include formulating the problem and then collecting,
organizing (classifying), presenting, analyzing and interpreting statistical data.
Data Collection: This is the stage where we gather information for our purpose.
o If data are needed and are not readily available, then they have to be collected.
o Data may be collected by the investigator directly, using methods like interviews, questionnaires,
and observation, or may be available from published or unpublished sources.
o Data gathering is the basis (foundation) of any statistical work.
o Valid conclusions can only result from properly collected data.
Data Organization: This is the stage where we edit our data. A large mass of figures collected from
surveys frequently needs organization. The collected data may contain irrelevant figures, incorrect facts,
omissions and mistakes. Errors introduced during collection have to be corrected. After
editing, we may classify (arrange) the data according to their common characteristics. Classification or
arrangement of data in some suitable order makes the information easier to present.
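A minimal sketch of editing and classifying is given below. The survey records, the validity rule (ages between 0 and 120) and the helper name `is_valid` are all hypothetical and serve only to illustrate the two steps.

```python
from collections import defaultdict

# Hypothetical raw survey records: (respondent id, sex, age). Illustrative only.
raw_records = [
    (1, "F", 21),
    (2, "M", 19),
    (3, "F", None),   # omission: the age is missing
    (4, "M", 230),    # mistake: an impossible age
    (5, "F", 20),
    (6, "M", 22),
]

def is_valid(record):
    """Keep a record only if the age is present and plausible (assumed range 0-120)."""
    _, _, age = record
    return age is not None and 0 <= age <= 120

# Editing: drop records with omissions or obvious mistakes.
edited = [r for r in raw_records if is_valid(r)]

# Classification: group the remaining ages by a common characteristic (sex).
ages_by_sex = defaultdict(list)
for _, sex, age in edited:
    ages_by_sex[sex].append(age)

print(dict(ages_by_sex))
```

In practice a doubtful record might be corrected from the original source rather than dropped; dropping is used here only to keep the sketch short.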
Data Presentation: The organized data can now be presented in the form of tables, diagrams and graphs.
At this stage, large data sets are presented in tables in a summarized and condensed manner. The main
purpose of data presentation is to facilitate statistical analysis. Graphs and diagrams may also be used to
make the meaning of the data clearer and the presentation more attractive.
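As a rough illustration of presentation, the sketch below turns a list of observations into a small frequency table with a text bar for each category. The blood-type observations are hypothetical, and the table layout is just one possible choice.

```python
from collections import Counter

# Hypothetical organized data: blood types of 12 people (illustrative only).
blood_types = ["A", "O", "B", "O", "A", "AB", "O", "A", "O", "B", "O", "A"]

# Count how often each category occurs.
frequency = Counter(blood_types)

# Build a simple frequency table with a text bar for each category.
table_lines = []
for category, count in sorted(frequency.items()):
    table_lines.append(f"{category:<4}{count:>3}  {'*' * count}")

print("Type  n  Bar")
print("\n".join(table_lines))
```

The table condenses twelve raw observations into four rows, which is the kind of summary that makes later analysis easier.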
Data Analysis: This is the stage where we critically study the data to draw conclusions about the population
parameter. The purpose of data analysis is to dig out information useful for decision making. Analysis
may involve highly complex and sophisticated mathematical techniques. However, this course covers only
the most commonly used methods of statistical analysis, such as the
calculation of averages, the computation of measures of dispersion, and regression and correlation analysis,
in the following chapters.
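The simplest of the analysis methods mentioned above, an average and a few measures of dispersion, can be sketched with Python's standard `statistics` module. The scores are hypothetical, and the population versions of the variance and standard deviation are used on the assumption that the five scores are the whole data set of interest.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical exam scores (illustrative only).
scores = [60, 70, 80, 90, 100]

average = mean(scores)                    # measure of central tendency
value_range = max(scores) - min(scores)   # simplest measure of dispersion
variance = pvariance(scores)              # population variance
std_dev = pstdev(scores)                  # population standard deviation

print(f"Mean: {average}, range: {value_range}")
print(f"Variance: {variance}, standard deviation: {std_dev:.2f}")
```

If the scores were instead a sample from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice.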
Data Interpretation: This is the stage where one draws valid conclusions from the results obtained through
data analysis. Interpretation means drawing conclusions from the data which form the basis for decision
making. The interpretation of data is a difficult task and requires a high degree of skill and experience. If
data that have been analyzed are not properly interpreted, the whole purpose of the investigation may be
defeated and fallacious conclusions may be drawn. Great care is therefore needed.
1.3 Definition of Some Basic Statistical Terms
In this section, we will define terms which will be used frequently.
Data: Data are a collection of related facts and figures from which conclusions may be drawn. In other
words, data is simply a scientific term for facts, figures, information and measurements.
Population: A population is the totality of things, objects, people, etc. about which information is being
collected. It is the totality of observations with which the researcher is concerned. The population
represents the target of an investigation, and the objective of the investigation is to draw conclusions
about the population; hence we sometimes call it the target population.
Example: population of trees under specified climatic conditions, population of animals fed a certain
type of diet, population of households, etc.
Census: A complete enumeration of the population. In most real problems it cannot be carried out,
hence we take a sample.
Sample: A sample is a subset or part of a population selected to draw conclusions about the
population.
Sampling: The process of selecting a sample from the population.
Sample size: The number of elements or observations to be included in the sample.
Statistic: A value computed from the sample, used to describe the sample.
Parameter: A descriptive measure (value) computed from the population. It is the population
measurement used to describe the population. Example: the population mean and the population standard deviation.
Sampling frame: A list of people, items or units from which the sample is taken.
Variable: A characteristic whose value changes from object to object or from time to time.
Census survey: It is the process of examining the entire population. It is the total count of the
population.
A census survey (studying the whole population without considering samples) requires a great deal of time,
money and energy. Trying to study the entire population is in most cases technically and economically not
feasible. To solve this problem, we take a representative sample out of the population, on the basis of which
we draw conclusions about the entire population.
Therefore, a sample survey
➢ Helps to estimate the parameters of a large population.
➢ Is cheaper, more practical, and more convenient.
➢ Saves time and energy and is easier to handle and analyze.
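The terms defined in this section (population, sample, sample size, parameter and statistic) can be tied together in a short sketch. The population of tree heights is simulated and entirely hypothetical, and the fixed seed is there only so that repeated runs give the same sample.

```python
import random
from statistics import mean

# Hypothetical population: heights (cm) of 1,000 trees, simulated for illustration.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [round(rng.gauss(500, 50), 1) for _ in range(1000)]

# Parameter: a value computed from the whole population (as in a census).
population_mean = mean(population)

# Sampling: select a simple random sample without replacement.
sample_size = 50
sample = rng.sample(population, sample_size)

# Statistic: a value computed from the sample, used to estimate the parameter.
sample_mean = mean(sample)

print(f"Parameter (population mean): {population_mean:.1f}")
print(f"Statistic (sample mean):     {sample_mean:.1f}")
```

The sample mean is computed from only 50 of the 1,000 values, so it will usually be close to, but not exactly equal to, the population mean.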
1.4 Applications, Uses and Limitations of Statistics
Application of Statistics
The scope of statistics is very vast, and it is applicable in almost all fields of human endeavor. Apart
from helping to elicit an intelligent assessment from a body of figures and facts, statistics is an indispensable
tool for any scientific enquiry, from the stage of planning the enquiry to the stage of conclusion. It applies to
almost all sciences: pure and applied, physical, natural, biological, medical, agricultural and engineering.
It also finds applications in the social and management sciences, and in commerce, business and industry.
Examples include the invention of certain drugs, the assessment of the extent of environmental pollution, and
quality control in industry.
Uses of statistics
Today the field of statistics is recognized as a highly useful tool in the decision-making process for managers
of modern business, industry, and frequently changing technology. It has many functions in everyday
activities. The following are some of the most important uses of statistics.
❖ Statistics condenses and summarizes complex data. The original set of data (raw data) is normally
voluminous and disorganized until it is summarized and expressed in a few numerical values.
❖ Statistics facilitates comparison of data. Measures obtained from different sets of data can be compared
to draw conclusions about those sets. Statistical values such as averages, percentages, ratios, etc. are the
tools that can be used for the purpose of comparing sets of data.
❖ Statistics helps in predicting future trends. Statistics is extremely useful for analyzing past and
present data and predicting future trends.
❖ Statistics influences the policies of government. The results of statistical studies in areas such as taxation,
the unemployment rate, the performance of military equipment, family planning, etc. may
convince a government to review its policies and plans with a view to meeting national needs.
❖ Statistical methods are very helpful in formulating and testing hypotheses and in developing new theories.
Limitations of Statistics
Even though statistics is widely used in various fields of the natural and social sciences that are closely related
to human life, it has its own limitations as far as its application is concerned. Some of these
limitations are:
❖ Statistics doesn’t deal with single (individual) values. Statistics deals only with aggregate values, although
in some situations a single individual is highly important to consider.
❖ Statistics can’t deal with qualitative characteristics. It only deals with data which can be quantified.
❖ Statistical conclusions are not universally true. Statistical conclusions are true only under certain
conditions or true only on average. The conclusions drawn from the analysis of the sample may, perhaps,
differ from the conclusions that would be drawn from the entire population. For this reason, statistics
is not an exact science.
❖ Statistical interpretations require a high degree of skill and understanding of the subject. It requires
extensive training to read and interpret statistics in its proper context. It may lead to wrong conclusions
if inexperienced people try to interpret statistical results.
❖ Statistics can be misused. Sometimes statistical figures can be misleading unless they are carefully
interpreted.
1.5 Types of variables and Scales of Measurements
Variables and Attributes
A variable in statistics is any characteristic that can take on different values for different elements when
data are collected. A quantitative or qualitative characteristic that varies from observation to observation
in the same group is called a variable. For quantitative variables, observations are made using interval or ratio
scales, whereas for qualitative variables nominal or ordinal scales are used. Conventionally, quantitative
variables are termed variables and qualitative variables are termed attributes. Example: gender, religion,
marital status, coding, etc. are attributes.
Types of Variables
A. Continuous Variables: are usually obtained by measurement, not by counting. These are variables
which can take any value, including decimal values, within a range. Variables like age, time, height, income,
price, temperature, length, volume, rate, amount of rainfall, etc. are all continuous since the
data collected from such variables can take decimal values.
B. Discrete Variables: are obtained by counting. A discrete variable always takes whole-number values
that are counted. Example: number of students, number of errors per page, number
of accidents on a traffic line, number of defective or non-defective items.
Scales of Measurements
Normally, when people hear the term measurement, they think of measuring the length of
something (e.g. the length of a piece of wood) or measuring a quantity of something (e.g. a cup of flour).
This represents a limited use of the term measurement. In statistics, the term measurement is used more
broadly and is more appropriately termed scales of measurement. A scale of measurement refers to the way
in which variables or numbers are defined and categorized, that is, the assignment of numbers to objects
or events in a systematic fashion. Each scale of measurement has certain properties which in turn determine
the appropriateness of certain statistical analyses. The various measurement scales result from the
fact that measurement may be carried out under different sets of rules. Four levels of measurement
are commonly distinguished: nominal, ordinal, interval, and ratio; each possesses different properties
of measurement systems.
i. Nominal Scale: consists of ‘naming’ observations or classifying them into various mutually exclusive
categories. Sometimes the variable under study is classified by some quality it possesses rather than by
an amount or quantity. In such cases, the variable is called an attribute. Example: Sex (Male, Female), Eye
color (brown, black, etc.), Blood type (A, B, AB and O), etc.
ii. Ordinal Scale: used whenever observations are not only different from category to category, but can also be
ranked according to some criterion. The variables deal with relative differences rather than with
quantitative differences. Ordinal data are data which can have meaningful inequalities. The inequality
signs < or > may assume any meaning like ‘stronger, softer, weaker, better than’, etc. Example: Patients
may be characterized as (unimproved, improved & much improved), individuals may be classified
according to socio-economic status as (low, medium & high), letter grades (A, B, C, D, F),
authority, career, etc.
Note: Qualitative variables can be either Nominal or Ordinal scales of measurements.
iii. Interval Scale: With this scale it is not only possible to order measurements, but the distance between
any two measurements is also known; quotients, however, are not meaningful. There is no true zero point,
only an arbitrary zero point. Interval data are the types of information in which an increase from one level
to the next always reflects the same increase. Interval data can be added or subtracted, but they may not be
multiplied or divided. Example: A temperature of zero degrees does not indicate a lack of heat. Consider the
two common temperature scales, Celsius (C) and Fahrenheit (F). The same difference exists between
10°C (50°F) and 20°C (68°F) as between 25°C (77°F) and 35°C (95°F), i.e. the measurement scale is
composed of equal-sized intervals. But we cannot say that a temperature of 20°C is twice as hot as a
temperature of 10°C because the zero point is arbitrary.
iv. Ratio Scale: Characterized by the fact that equality of ratios as well as equality of intervals may be
determined. Fundamental to ratio scales is a true zero point. All arithmetic operations can be applied to
values on a ratio scale. Most statistical data analysis procedures do not distinguish between the interval and
ratio properties of the measurement scales. Example: Variables such as age, height, length, volume, rate,
time, amount of rainfall, etc. require a ratio scale.
Note: Quantitative variables can be either Interval or Ratio scales of measurements.
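The difference between the interval and ratio scales can be checked with the temperature example used above. The sketch below uses the standard Celsius-to-Fahrenheit conversion; the Kelvin conversion is an addition not found in the text, included only because the Kelvin scale has a true zero and therefore behaves like a ratio scale.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (a scale with a true zero)."""
    return c + 273.15

# Equal Celsius differences stay equal after conversion to Fahrenheit.
diff_1 = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
diff_2 = celsius_to_fahrenheit(35) - celsius_to_fahrenheit(25)

# Ratios are not preserved, so "twice as hot" has no meaning on an interval scale.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(f"Differences in Fahrenheit: {diff_1} and {diff_2}")
print(f"Ratio in Celsius: {ratio_celsius}, in Fahrenheit: {ratio_fahrenheit:.2f}, "
      f"in Kelvin: {ratio_kelvin:.3f}")
```

The two differences agree, which is the interval property. The ratio, however, changes with the scale used, which is why 20°C cannot be called twice as hot as 10°C.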