
01.basic Statistics

The document provides an overview of statistics, defining it as the science of collecting, analyzing, and interpreting data, with applications across various fields such as medicine and business. It discusses the limitations and scope of statistics, types of data (quantitative and qualitative), and methods for collecting primary and secondary data. Additionally, it covers the classification of data and the rules for effective classification.

Uploaded by

Karthik

MATHEMATICAL STATISTICS

UNIT - I

What is Statistics?

The word 'Statistics' has its root in the Latin word 'Status', the Italian word 'Statista' or the
German word 'Statistik', each of which means a 'political state'.

The word 'Statistics' was primarily associated with the presentation of facts and figures
pertaining to the demographic, social and political situations prevailing in a state or government. Its
evolution over time formed the basis for most of the science and art disciplines.

Statistics is used in the developmental phases of both theoretical and applied areas,
encompassing the fields of Industry, Agriculture, Medicine, Sports and Business analytics.

Meaning of Statistics

Statistics is concerned with scientific methods for collecting, organizing, summarizing,
presenting and analyzing data, as well as deriving valid conclusions and making reasonable
decisions on the basis of this analysis.

Statistics is concerned with the systematic collection of numerical data and its
interpretation.

Definition of Statistics

Statistics has been defined by various statisticians.

• Statistics is the science of counting - A. L. Bowley
• Statistics is the science which deals with the collection, presentation, analysis and
  interpretation of numerical data - Croxton and Cowden
• Statistics is a body of methods for making decisions in the face of uncertainty - Wallis
  and Roberts
• Statistics is a method of decision making in the face of uncertainty on the basis of
  numerical data and calculated risk - Ya-Lun Chou

Dr.J.Kesavan, Biostatistician Research, VMRF(DU) Page 1



Limitations of Statistics

• Statistics does not deal with individual measurements.
• Statistics deals only with quantitative characteristics.
• Statistical results are true only on an average.
• Statistics is only one of the methods of studying a problem.
• Statistics can be misused.

Scope of Statistics

• Statistics and Actuarial Science
• Statistics and Commerce
• Statistics and Economics
• Statistics and Medicine
• Statistics and Agriculture
• Statistics and Industry
• Statistics and Information Technology
• Statistics and Government

Variable

Information, especially facts or numbers, collected for decision making is called data.
Data may be numerical or categorical. Data may also be generated through a variable.

Variable: A variable is an entity that varies from place to place, person to person,
trial to trial and so on. For instance, height is a variable and domicile is a variable, since they vary
from person to person.

A variable is said to be quantitative if it is measurable and can be expressed in specific
units of measurement (numbers).

A variable is said to be qualitative if it is not measurable and cannot be expressed in
specific units of measurement (numbers). Such a variable is also called a categorical variable.


Types of Data

There are two types of data:

1. Quantitative data
2. Qualitative data

Quantitative Data

Quantitative data (variables) are measurements that are collected or recorded as a number,
such as height and weight.

Qualitative Data

Qualitative data are observations that cannot be measured on a natural numerical scale.
For example, blood types are categorized as O, A, B along with the Rh factors. They can only
be classified into one of the pre-assigned or pre-designated categories.
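
The distinction can be illustrated with a small Python sketch. The sample values below are hypothetical and chosen only for illustration; the point is that a quantitative variable can be averaged, whereas a qualitative variable can only be counted category by category.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample values (assumed for illustration, not taken from the notes).
heights_cm = [162.5, 170.0, 158.2, 175.4]            # quantitative: measured in units
blood_types = ["O+", "A+", "B-", "O+", "A+", "O-"]   # qualitative: categories only

# A quantitative variable supports arithmetic such as an average.
average_height = mean(heights_cm)

# A qualitative variable can only be classified and counted per category.
blood_type_counts = Counter(blood_types)

print(f"Average height: {average_height:.1f} cm")
for blood_type, count in blood_type_counts.items():
    print(f"{blood_type}: {count}")
```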

Categories and Sources of Data (Methods of Collecting Data)

There are two categories of data, namely

1. Primary data
2. Secondary data

Primary data are information collected for the first time, from a survey, an
observational study or an experiment. For example:

• A survey is conducted among parents to identify their reasons for selecting a particular
  school for their children in a locality.
• Information is collected from the observations made by customers about the service
  they received.
• To test the efficacy of a drug, a randomized controlled trial is conducted using the
  drug and a placebo.


1. Methods of collecting primary data

Primary data come in the following three forms.

(i) Survey data: The investigator or his agency meets the respondents and gets the required data.

(ii) Experimental data (field/laboratory): The investigator conducts an experiment, controlling
the independent variables, and obtains the corresponding values of the dependent variable.

(iii) Observational data: In the case of a psychological study or in a medical situation, the
investigator simply observes and records the information about the respondent. In other words, the
investigator behaves like a spectator.

The various methods used to collect primary data are:

(i) Direct Method

(ii) Indirect Method

(iii) Questionnaire Method

(iv) Local Correspondents Method

(v) Enumeration Method

(i) Direct Method

There are four methods under the direct method

(a) Personal Contact Method

As the name says, the investigator himself goes to the field, meets the respondents and
gets the required information. In this method, the investigator personally interviews the
respondent either directly or through phone or through any electronic media. This method is
suitable when the scope of investigation is small and greater accuracy is needed.

Merits

• This method ensures accuracy because of personal interaction with the investigator.
• This method enables the interviewer to adjust suitably to the situation of the respondent.

Limitations

• When the field of enquiry is vast, this method is more expensive, time consuming and
  cumbersome.
• In this type of survey, there is a chance of personal bias by the investigator in terms of
  asking 'leading questions'.

(b) Telephone Interviewing

In the present age of communication explosion, telephones and mobile phones are
extensively used to collect data from the respondents. This saves the cost and time of collecting
the data with a good amount of accuracy.

(c) Computer Assisted Telephone Interviewing (CATI)

With the widespread use of computers, telephone interviewing can be combined with
immediate entry of the response into a data file by means of terminals, personal computers, or
voice data entry. Computer-Assisted Telephone Interviewing (CATI) is used in market research
organizations throughout the world.

(d) Computer Administered Telephone Survey

Another means of securing immediate response is the computer-administered telephone
survey. Unlike CATI, there is no interviewer. A computer calls the phone number, conducts the
interview, places data into a file for later tabulation, and terminates the contact. The questions are
voice synthesized and the respondent's answer and computer timing trigger continuation or
disconnect. The last three methods save time and cost, apart from minimizing the personal bias.

(ii) Indirect Method

The indirect method is used in cases where it is delicate or difficult to get the information
from the respondents due to unwillingness or indifference. The information about the respondent
is collected by interviewing the third party who knows the respondent well.


Instances of this type of data collection include information on addiction, marriage
proposals, economic status, witnesses in court, criminal proceedings etc. The shortcoming of this
method is that the genuineness and accuracy of the information cannot be assured, as they depend
completely on the third party.

(iii) Questionnaire Method

A questionnaire contains a sequence of questions relevant to the study, arranged in a
logical order. Preparing a questionnaire is an interesting and challenging job that requires
good experience and skill.

The general guidelines for a good questionnaire:

• The wording must be clear and relevant to the study.
• The ability of the respondents to answer the questions is to be considered.
• Avoid jargon.
• Ask only the necessary questions so that the questionnaire may not be lengthy.
• Arrange the questions in a logical order.
• Questions which hurt the feelings of the respondents should be avoided.
• Calculations are to be avoided.
• It must be accompanied by a covering letter stating the purpose of the survey and
  guaranteeing the confidentiality of the information provided.

Editing the preliminary questionnaire

Once a preliminary draft of the questionnaire has been designed, the researcher is obligated
to critically evaluate it and edit it, if needed. This phase may seem redundant, given all the careful
thought that went into each question. But recall the crucial role played by the questionnaire.

Pre Test

Once the rough draft of the questionnaire is ready, a pretest is to be conducted. This practice
of pretesting often reveals certain shortcomings in the questions, which can be corrected in the final
form of the questionnaire. Sometimes, the questionnaire is circulated among competent
investigators, who make suggestions for its improvement. Once this has been done and the suggestions
are incorporated, the final form of the questionnaire is ready for the collection of data.


Advantages

• In a short span of time, a vast geographical area can be covered.
• It involves less labor.

Limitations

• This method can be used only for the literate population.
• Some of the mailed questionnaires may not be returned.
• Some of the filled questionnaires may not be complete.
• The success of this method depends on the nature of the questions and the
  involvement of the respondents.

(iv) Local Correspondents Methods

In this method, the investigator appoints local agents or correspondents in different
places. They collect the information on behalf of the investigator in their locality and transmit the
data to the investigator or headquarters. This method is adopted by newspapers and government
agencies.

For instance, the Central Statistical Organization (CSO) of the Government of India obtains the
required data through its local correspondents (NSSO). Newspaper publishers appoint
agents to collect news for their dailies. These people collect data in their locality on behalf of the
publisher and transmit them to the head office.

This method is economical and provides timely information on a continuous basis. However, it
involves a high degree of personal bias of the correspondents.

(v) Enumeration method

In this method, the trained enumerators or interviewers take the schedules themselves,
contact the informants, get replies and fill them in in their own handwriting. Thus, schedules are
filled by the enumerator whereas questionnaires are filled by the respondents. The enumerators
are paid an honorarium. This method is suitable when the respondents include illiterates. The
success of this method depends on the training imparted to the enumerators. The preparation of
the voters' list, information on ration cards for public distribution in India, etc., follow this method of
data collection. The National Sample Survey Office (NSSO) collects information using schedules
depending on the theme.

2. Secondary data

Secondary data are collected and processed by some other agency, but the investigator uses
them for his study. They can be obtained from published sources such as government reports,
documents, newspapers and books written by economists, or from any other source, for example
websites. Use of secondary data saves time and cost. Before secondary data are used, they
must be scrutinized to assess their suitability, reliability, adequacy and accuracy.

Sources of Secondary Data

Secondary data come from two main sources, namely published and unpublished.

The published sources include:

• Government Publications - Reserve Bank of India (RBI) Bulletin, Statistical Abstracts of
  India by the Central Statistical Organization (CSO), Statistical Abstracts of Tamil Nadu by
  the Department of Economics and Statistics, Government of Tamil Nadu.
• International Publications - Publications of the World Health Organization, World Bank,
  International Labour Organization, United Nations Organization.
• Publications of Research Institutes - Indian Council of Medical Research (ICMR), Indian
  Council of Agricultural Research (ICAR).
• Journals, Magazines or Newspapers - Economic Times, Business Line.


Data which are not published are also available in the files and office records of
Government and private organizations. The different sources described above are shown
schematically below.

Comparison between Primary and Secondary data

Primary data                                  Secondary data
--------------------------------------------  --------------------------------------------
It is collected for the first time.           It is compiled from already existing sources.
It is collected directly by the               It is compiled by persons other than the
investigator or by his team.                  persons who collected the data.
It costs more.                                It costs less.
It requires more time.                        It requires considerably less time.
There is a possibility of personal bias.      Personal bias is minimized.

Classification of Data

Classification is the process of arranging the primary data in a definite pattern and
presenting them in a systematic form.

Definition: Classification is the process of arranging the data into sequences and groups
according to their common characteristics, or separating them into different but related parts.


Objectives of Classification

Classification of data has manifold objectives. The salient ones among them are the
following:

• It explains the features of the data.
• It facilitates comparison with similar data.
• It strikes a note of homogeneity in the heterogeneous elements of the collected
  information.
• It explains the similarities which may exist in the diversity of data points.
• It is required to condense the mass of data in such a manner that the similarities and
  dissimilarities are understood.
• It reduces the complexity of the data and makes the data easy to comprehend.
• It enables proper utilization of data for further statistical treatment.

Types of Classification

The general types of classification are:

(i) Classification by Time or Chronological Classification

(ii) Classification by Space or Spatial Classification

(iii) Classification by Attribute or Qualitative Classification and

(iv) Classification by Size or Quantitative Classification.

(i) Classification by Time or Chronological Classification

The method of classifying data according to the time component is known as classification
by time or chronological classification. In this type of classification, the groups or classes are
arranged either in ascending order or in descending order with reference to time such as
years, quarters, months, weeks, days, etc. Illustrations of statistical data to be classified under
this type are listed below:

• Number of new schools established in Tamil Nadu during 1995 – 2015
• Pass percentage of students in SSLC Board Examinations over the past 5 years
• Index of market prices in stock exchanges arranged day-wise
• Month-wise salary particulars of employees in an industry
• Particulars of outpatients in a Primary Health Centre presented day-wise

(ii) Classification by Space (Spatial) or Geographical Classification

The method of classifying data with reference to geographical location such as countries,
states, cities, districts, etc., is called classification by space or spatial classification. It is also
termed geographical classification. The following are some examples:

• Number of school students in rural and urban areas in a state
• Region-wise literacy rate in a state
• State-wise crop production in India
• Country-wise growth rate in South East Asia

(iii) Classification by Attributes or Qualitative classification

The method of classifying statistical data on the basis of an attribute is said to be
classification by attributes or qualitative classification.

Examples of attributes include nationality, religion, gender, marital status, literacy and so
on.

(iv) Classification by Size or Quantitative Classification

When the characteristics are measured on numerical scale, they may be classified on the
basis of their magnitude. Such a classification is known as classification by size or quantitative
classification.

For example, data relating to characteristics such as height, weight, age, income,
marks of students, production and consumption, etc., which are quantitative in nature, come
under this category.


Rules for Classification of Data

The following rules are to be observed when classifying data.

• The classes must be exhaustive, i.e., it should be possible to include each of the data
  points in one or the other group or class.
• The classes must be mutually exclusive, i.e., there should not be any overlapping.
• The number of classes should be neither too large nor too small.
  Generally, the number of classes may be fixed between 4 and 15.
• The magnitude or width of all the classes should be equal in the entire classification.
• The system of open-end classes may be avoided.
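
As an illustration of these rules, the following Python sketch groups a list of values into equal-width classes that are exhaustive and mutually exclusive. The function name `equal_width_classes`, the rounding of the class width up to a whole number, and the sample marks are all assumptions made for this example; they are not prescribed by the notes.

```python
import math

def equal_width_classes(values, num_classes):
    """Return (class limits, frequencies) for equal-width classes.

    Each class is half-open, [lower, upper), so no value can fall into two
    classes (mutually exclusive). The width is rounded up so that the classes
    cover the whole range (exhaustive); a value equal to the last upper limit
    is counted in the last class.
    """
    low, high = min(values), max(values)
    # Assumption: use a whole-number width, at least 1.
    width = max(1, math.ceil((high - low) / num_classes))
    limits = [(low + i * width, low + (i + 1) * width) for i in range(num_classes)]
    frequencies = [0] * num_classes
    for value in values:
        index = min(int((value - low) // width), num_classes - 1)
        frequencies[index] += 1
    return limits, frequencies

# Hypothetical marks of 14 students, grouped into 5 classes
# (within the suggested range of 4 to 15 classes).
marks = [12, 25, 37, 41, 48, 53, 59, 64, 66, 71, 78, 85, 90, 99]
limits, frequencies = equal_width_classes(marks, 5)

for (lower, upper), frequency in zip(limits, frequencies):
    print(f"{lower} - {upper}: {frequency}")
```

Because every value lands in exactly one class, the frequencies always add up to the number of observations, which is a quick check that the classification is both exhaustive and non-overlapping.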

Types of Tables

Statistical tables can be classified under two general categories, namely,

• General Tables
• Summary Tables

General tables contain a collection of detailed information including all that is relevant
to the subject or theme.

The main purpose of such tables is to present all the information available on a certain
problem in one place for easy reference, and they are usually placed in the appendices of reports.

Summary tables are designed to serve some specific purposes. They are smaller in size
than general tables, emphasize some aspect of the data and are generally incorporated within the
text.

The summary tables are also called derivative tables because they are derived from the
general tables. The information contained in the summary table aims at analysis and inference.
Hence, they are also known as interpretative tables.


The statistical tables may further be classified into two broad classes namely simple
tables and complex tables. A simple table summarizes information on a single characteristic and
is also called a univariate table.

Components of a Table

Generally, a table should comprise the following components:

• Table number and title
• Stub (the headings of rows)
• Caption (the headings of columns)
• Body of the table
• Foot notes
• Sources of data

(i) Table Number and Title: Each table should be identified by a number given at the
top. It should also have an appropriate short and self-explanatory title indicating what
exactly the table presents.
(ii) Stub: Stubs stand for brief and self-explanatory headings of rows.
(iii) Caption: Caption stands for brief and self-explanatory headings of columns. It may
involve headings and sub-headings as well.
(iv) Body of the Table: The body of the table should provide the numerical information
in different cells.
(v) Foot Note: The explanatory notes should be given as foot notes and must be
complete so that they can be understood at a later stage.
(vi) Source of Data: It is customary to provide the source of data to enable the user
to refer to the original data. The source of data may be provided in a foot note at the
bottom of the table.


A typical format of a table is given below:

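                    Table Number
                  Title of the Table

+----------------+---------------------+---------+
| Stub heading   | Caption             | Total   |
|                | (Column headings)   |         |
+----------------+---------------------+---------+
| Stub           | Body                |         |
| (Row entries)  |                     |         |
+----------------+---------------------+---------+
| Total          |                     |         |
+----------------+---------------------+---------+

Foot note (if any)
Source of Data (if any)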

General Precautions for Tabulation

The following points may be considered while constructing statistical tables:

• A table must be as precise as possible and easy to understand.
• It must be free from ambiguity so that the main characteristics of the data can be easily
  brought out.
• Presenting a mass of data in a single table should be avoided. Displaying the data in a
  single table would increase the chances of mistakes and would make the
  table unwieldy. Such data may be presented in more than one table such that each table
  is complete and serves its purpose.
• Figures presented in columns for comparison must be placed as near to each other as
  possible. Percentages, totals and averages must be kept close to each other. Totals to be
  compared may be given in bold type wherever necessary.
• Each table should have an appropriate short and self-explanatory title indicating what
  exactly the table presents.
• The main headings and subheadings must be properly placed.
• The source of the data must be indicated in the footnote.
• The explanatory notes should always be given as footnotes and must be complete so
  that they can be understood at a later stage.
• The column or row heads should indicate the units of measurement, such as monetary
  units like Rupees and other units such as meters, wherever necessary.
• Column headings may be numbered for comparison purposes. Items may be arranged
  in the order of their magnitude, in alphabetical, geographical or chronological order,
  or in any other suitable arrangement for meaningful presentation.
• Figures as accurate as possible are to be entered in a table. If the figures are
  approximate, the same may be properly indicated.

Meaning of Diagrams and Graphs

Diagrams

A diagram is a visual form of presenting statistical data that highlights the basic facts
and relationships inherent in the data. A diagrammatic presentation is more
understandable and is appreciated by everyone. It attracts attention and is a quicker way
of grasping the results, saving time. It is particularly required in presenting
qualitative data.

Graphs

The quantitative data is usually represented by graphs. Though it is not quite attractive
and understandable by a layman, the classification and tabulation techniques will reduce the
complexity of presenting the data using graphs. Statisticians have understood the importance of
graphical presentation to present the data in an interpretable way. The graphs are drawn
manually on graph papers.


Significance of Diagrams and Graphs:

Diagrams and graphs are extremely useful for the following reasons:

• They are attractive and impressive.
• They make data simpler and more intelligible.
• They are amenable to comparison.
• They save time and labour.
• They have a great memorizing effect.

Rules for Constructing Diagrams

While constructing diagrams for statistical data, the following guidelines are to be kept in
mind:

• A diagram should be neatly drawn in an attractive manner.
• Every diagram must have a precise and suitable heading.
• An appropriate scale has to be defined to present the diagram as per the size of the paper.
• The scale should be mentioned in the diagram.
• Mention the values of the independent variable along the X-axis and the values of the
  dependent variable along the Y-axis.
• False base line(s) may be used on the X-axis and Y-axis, if required.
• Legends should be given for the X-axis, the Y-axis and each category of the independent variable
  to show the difference.
• Foot notes can be given at the bottom of the diagram, if necessary.

Types of Diagrams

The most commonly used diagrams are

1) Simple Bar Diagram
2) Multiple Bar Diagram
3) Component Bar Diagram/Sub-divided Bar Diagram
4) Percentage Bar Diagram
5) Pie Diagram


1) Simple Bar Diagram

A simple bar diagram can be drawn on either a horizontal or a vertical base, but bars on a
vertical base are more common. Bars are erected along the axis with uniform width, and the space
between the bars must be equal. While constructing a simple bar diagram, the scale is determined
in proportion to the highest value of the variable. The bars can be coloured to make the
diagram attractive. This diagram is mostly drawn for a categorical variable. It is particularly useful for
presenting data related to the fields of Business and Economics.

Example 1

The production cost of the company in lakhs of rupees is given below.

(i) Construct a simple bar diagram.

(ii) Find in which year the production cost of the company is

(a) maximum (b) minimum (c) less than 40 lakhs.

(iii) What is the average production cost of the company?

(iv) What is the percentage increase from 2014 to 2015?

Year    Production Cost (in lakhs of rupees)
2010    55
2011    40
2012    30
2013    25
2014    35
2015    70

Solution:

(i) We represent the above data by simple bar diagram in the following manner:

Step-1: Years are marked along the X-axis and labelled as 'Year'.

Step-2: Values of Production Cost are marked along the Y-axis and labelled as 'Production Cost
(in lakhs of ₹)'.

Step-3: Vertical rectangular bars are erected on the years marked, with heights
proportional to the magnitude of the respective production cost.

Step-4: Vertical bars are filled with the same colour.

The simple bar diagram is presented in Figure 1.

(ii) (a) The maximum production cost of the company was in the year 2015.

(b) The minimum production cost of the company was in the year 2013.

(c) The production cost of the company during the period 2012- 2014 is less than 40
lakhs.

(iii) Average production cost of the company

= (55 + 40 + 30 + 25 + 35 + 70) / 6 = 255 / 6 = 42.5 lakhs

(iv) Percentage increase in the production cost of the company from 2014 to 2015 is

= (70 – 35) / 35 × 100 = 100%

That is, the production cost in 2015 is 70 / 35 × 100 = 200% of the cost in 2014, an
increase of 100%.

2) Multiple Bar Diagram

Multiple bar diagram is used for comparing two or more sets of statistical data. Bars with
equal width are placed adjacently for each cluster of values of the variable. There should be
equal space between clusters. In order to distinguish bars in each cluster, they may be either
differently coloured or shaded. Legends should be provided.

Example 2

The table given below shows the profit obtained before and after tax payment (in lakhs of
rupees) by a business man on selling cars from the year 2014 to 2017.

Year    Profit before tax    Profit after tax
2014    195                  80
2015    200                  87
2016    165                  45
2017    140                  32

(i) Construct a multiple bar diagram for the above data.

(ii) In which year, the company earned maximum profit before paying the tax?

(iii) In which year, the company earned minimum profit after paying the tax?

(iv) Find the difference between the average profit earned by the company before paying
the tax and after paying the tax.

Solution:

Since we are comparing the profit earned before and after paying the tax by the same
Company, the multiple bar diagram is drawn. The diagram is drawn following the procedure
presented below:


Step 1 : Years are marked along the X-axis and labeled as “Year”.

Step 2 : Values of Profit before and after paying the tax are marked along the Y-axis and labeled
as “Profit (in lakhs of ₹)”.

Step 3 : Vertical rectangular bars are erected on the years marked, whose heights are proportional
to the respective profit. The vertical bars corresponding to the profit earned before and
after paying the tax in each year are placed adjacently.

Step 4 : The vertical bars drawn corresponding to the profit earned before paying the tax are
filled with one colour. The vertical bars drawn corresponding to the profit
earned after paying the tax are filled with another colour. The colouring
procedure should be applied to all the years uniformly.

Step 5 : Legends are displayed to describe the different colours applied to the bars drawn for
profit earned before and after paying the tax.

The multiple bar diagram is presented in Figure 2

Figure 2: Multiple Bar Diagram for Profit by the Company earned before and after
paying the Tax

(ii) The company earned the maximum profit before paying the tax in the year 2015.

(iii) The company earned the minimum profit after paying the tax in the year 2017.

(iv) The average profit earned before paying the tax = (195 + 200 + 165 + 140) / 4 = 700 / 4 = 175 lakhs

The average profit earned after paying the tax = (80 + 87 + 45 + 32) / 4 = 244 / 4 = 61 lakhs

Hence, the difference between the average profit earned by the company before paying the
tax and after paying the tax is

= 175 – 61 = 114 lakhs.
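
The figures in this solution can be verified with a few lines of Python. This is only a checking sketch; the dictionary and variable names are chosen for illustration.

```python
# Profit in lakhs of rupees, from the table in Example 2.
profit_before_tax = {2014: 195, 2015: 200, 2016: 165, 2017: 140}
profit_after_tax = {2014: 80, 2015: 87, 2016: 45, 2017: 32}

# Year of maximum profit before tax and year of minimum profit after tax.
best_year_before = max(profit_before_tax, key=profit_before_tax.get)
worst_year_after = min(profit_after_tax, key=profit_after_tax.get)

# Average profit = total profit / number of years.
avg_before = sum(profit_before_tax.values()) / len(profit_before_tax)
avg_after = sum(profit_after_tax.values()) / len(profit_after_tax)
difference = avg_before - avg_after

print(best_year_before, worst_year_after, avg_before, avg_after, difference)
```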

3) Component Bar Diagram/Sub-divided Bar Diagram

A component bar diagram is used for comparing two or more sets of statistical data,
like a multiple bar diagram. But, unlike in a multiple bar diagram, the bars are stacked in a component
bar diagram. In the construction of a sub-divided bar diagram, bars are drawn with equal width
such that the heights of the bars are proportional to the magnitude of the total frequency. The
bars are positioned with equal space. Each bar is sub-divided into various parts in proportion to
the values of the components. The subdivisions are distinguished by different colours or shades.
If the number of clusters and the number of categories in the clusters are large, the multiple bar diagram is
not attractive because of the large number of bars. In such a situation, a component bar diagram is preferred.

Example 3

The total expenditure incurred on various heads by two schools in a year is given below.
Draw a suitable bar diagram.

Expenditure Head        Amount (in lakhs)
                        School I    School II
Construction/Repairs    80          90
Computers               35          50
Laboratory              30          25
Watering plants         45          40
Library books           40          30
Total                   230         235

Which school spent more on

(a) Construction/Repairs (b) Watering plants?

Solution


Since we are comparing the amount spent by two schools in a year towards various
expenditures with respect to their total expenditures, a component bar diagram is drawn.

Step 1 : Schools are marked along the X-axis and labeled as “School”.

Step 2 : Amounts spent are marked along the Y-axis and labeled as “Expenditure (Rs. in
lakhs)”.

Step 3 : Vertical rectangular bars are erected for each school, whose heights are proportional to
their respective total expenditure.

Step 4 : Each vertical bar is split into components in the order of the list of expenditure heads.
Since the bars have equal width, the height (and hence the area) of each rectangular box
is proportional to the amount of the respective expenditure head/component. Within a
bar, the boxes are given different colours; the same colour is used for the same
expenditure head in both schools.

Step 5 : Legends are displayed to describe the colours applied to the rectangular boxes drawn for
various expenditure heads.

The component bar diagram is presented in Figure 3

                               Amount (in lakhs)
Expenditure Head         School I   Cumulative     School II   Cumulative
                                    Amount Spent               Amount Spent
Construction/Repairs        80          80            90           90
Computers                   35         115            50          140
Laboratory                  30         145            25          165
Watering plants             45         190            40          205
Library books               40         230            30          235

Figure 3: Component Bar Diagram for expenditure of School I and School II

(i) School II spent more on Construction/Repairs.

(ii) School I spent more on Watering plants.
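The cumulative amounts used to stack the components can be reproduced with a short sketch. It uses the expenditure figures from the table in this example; the names `tops_1`, `bottoms_1` and so on are illustrative, and no plotting library is assumed.

```python
from itertools import accumulate

heads = ["Construction/Repairs", "Computers", "Laboratory",
         "Watering plants", "Library books"]
school_1 = [80, 35, 30, 45, 40]  # amounts in lakhs, from the table above
school_2 = [90, 50, 25, 40, 30]

# The top edge of each stacked box is the running (cumulative) total.
tops_1 = list(accumulate(school_1))
tops_2 = list(accumulate(school_2))

# Each box starts where the previous one ended; the first starts at zero.
bottoms_1 = [0] + tops_1[:-1]
bottoms_2 = [0] + tops_2[:-1]

# Heads on which one school spent more than the other.
more_in_school_1 = [h for h, a, b in zip(heads, school_1, school_2) if a > b]
more_in_school_2 = [h for h, a, b in zip(heads, school_1, school_2) if b > a]

print(tops_1)
print(tops_2)
print(more_in_school_2)
```

The last value in each list of tops is the total height of that school's bar.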

4) Percentage Bar Diagram

A percentage bar diagram is another form of component bar diagram. Here, the heights of the
components represent percentages rather than actual values. The main difference between the
sub-divided bar diagram and the percentage bar diagram is that, in the former, the height of each
bar corresponds to the magnitude of the total, whereas in the latter every bar corresponds to
100%. Thus, in the component bar diagram the heights of the bars differ, whereas in the percentage
bar diagram they are equal. Hence, the percentage bar diagram is often more appealing than the
sub-divided bar diagram, and comparison between components is much easier.

Example 4. The total expenditure incurred under various heads by two schools in a year is given
below. Draw the percentage sub-divided bar diagram.

                           Amount (in lakhs)
Expenditure Head           School I    School II
Construction/Repairs          80          90
Computers                     35          50
Laboratory                    30          25
Watering plants               45          40
Library books                 40          30
Total                        230         235

Also find (i) the percentage of the amount spent on computers in School I and (ii) the
expenditure heads on which School II spent more than School I.

Solution:

Since we are comparing the amount spent by two schools in a year towards various expenditures
with respect to their total expenditures in percentages, a percentage bar diagram is drawn.

Step 1 : Schools are marked along the X-axis and labeled as “School”.

Step 2 : Amounts spent, in percentages, are marked along the Y-axis and labeled as “Percentage of
Expenditure”.

Step 3 : Vertical rectangular bars are erected for each school, each of height 100 (per cent).

Step 4 : Each vertical bar is split into components in the order of the list of expenditure heads.
Since the bars have equal width, the height (and hence the area) of each rectangular box is
proportional to the percentage share of the respective expenditure head/component. Within a bar,
the boxes are given different colours; the same colour is used for the same expenditure head in
both schools.

Step 5 : Legends are displayed to describe the colours applied to the rectangular boxes drawn for
various expenditure heads.

The percentage bar diagram is presented in Figure 4.

                                   Amount (in lakhs)
Expenditure Head        School I  Percentage  Cumulative   School II  Percentage  Cumulative
                                  of Amount   Percentage              of Amount   Percentage
                                  spent                               spent
Construction/Repairs       80        35          35           90         38          38
Computers                  35        15          50           50         21          59
Laboratory                 30        13          63           25         11          70
Watering plants            45        20          83           40         17          87
Library books              40        17         100           30         13         100
Total                     230       100                      235        100

Figure 4. Percentage Bar Diagram for expenditures of School I and School II

(i) 15% of the amount was spent on computers in School I.

(ii) School II spent more than School I on Construction/Repairs (90 against 80 lakhs; 38% against
35%) and on Computers (50 against 35 lakhs; 21% against 15%).
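The percentage and cumulative-percentage columns can be recomputed with a minimal sketch. It assumes rounding to the nearest whole per cent, which matches the tabulated values; the helper name `percentage_shares` is illustrative.

```python
from itertools import accumulate

heads = ["Construction/Repairs", "Computers", "Laboratory",
         "Watering plants", "Library books"]
school_1 = [80, 35, 30, 45, 40]  # amounts in lakhs
school_2 = [90, 50, 25, 40, 30]


def percentage_shares(amounts):
    """Return each amount as a whole-number percentage of the total."""
    total = sum(amounts)
    # Note: independently rounded shares need not add up to exactly 100
    # in general; for this data they happen to.
    return [round(100 * a / total) for a in amounts]


pct_1 = percentage_shares(school_1)
pct_2 = percentage_shares(school_2)

# Cumulative percentages give the top edge of each box in the 100% bar.
cum_pct_1 = list(accumulate(pct_1))
cum_pct_2 = list(accumulate(pct_2))

# Share of School I's spending that went to computers.
computers_share_school_1 = pct_1[heads.index("Computers")]

print(pct_1, cum_pct_1)
print(pct_2, cum_pct_2)
```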

5) Pie Diagram

The Pie diagram is a circular diagram. As the diagram looks like a pie, it is given this
name. A circle, which subtends 360° at its center, is divided into different sectors. The angles
of the sectors, subtended at the center, are proportional to the frequencies of the components.

Procedure:

The following procedure can be followed to draw a Pie diagram for a given data:

(i) Calculate total frequency, say, N.

(ii) Compute the angle for each component using the formula
(Class frequency / N) × 360°.

(iii) Draw a circle of sufficient radius and draw one radius as a horizontal starting line.

(iv) Draw the first sector in the anti-clockwise direction at an angle calculated for the first
component.

(v) Draw the second sector adjacent to the first sector at an angle corresponding to the
second component.

(vi) This process may be continued for all the components.

(vii) Shade/colour each sector with different shades/colours.

(viii) Write legends to each component.
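Steps (ii) and (iv) to (vi) can be sketched as follows. The function names `sector_angles` and `sector_boundaries` are hypothetical, and the School I expenditures from the earlier bar-diagram examples are reused purely as an illustration.

```python
from itertools import accumulate


def sector_angles(frequencies):
    """Central angle in degrees of each sector: (frequency / N) * 360."""
    n = sum(frequencies)
    return [f * 360 / n for f in frequencies]


def sector_boundaries(frequencies):
    """(start, end) angles of successive sectors, measured anticlockwise
    from the horizontal starting radius."""
    ends = list(accumulate(sector_angles(frequencies)))
    starts = [0.0] + ends[:-1]
    return list(zip(starts, ends))


# Illustration only: School I expenditures (in lakhs) from the earlier examples.
school_1 = [80, 35, 30, 45, 40]

for start, end in sector_boundaries(school_1):
    print(f"{start:6.1f} to {end:6.1f} degrees")
```

Each sector begins where the previous one ends, so the last boundary closes the circle at 360 degrees (up to floating-point rounding).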

Example 5.

Draw a pie diagram for the following data (in hundreds) on the household expenditure of a
family.

Items            Expenditure
Food                 87
Clothing             24
Recreation           11
Education            13
Rent                 25
Miscellaneous        20

Also find

(i) The central angle of the sector corresponding to the expenditure incurred on Education.

(ii) By what percentage the Recreation cost is less than the Rent.

Solution

The following procedure is followed to draw a Pie diagram for the given data:

(i) Calculate the total expenditure, say, N.

(ii) Compute the angles for each component (food, clothing, recreation, education, rent and
miscellaneous) using the formula (Class frequency / N) × 360°.

Items            Expenditure    Angle of the sector
Food                 87         (87/180) × 360° = 174°
Clothing             24         (24/180) × 360° = 48°
Recreation           11         (11/180) × 360° = 22°
Education            13         (13/180) × 360° = 26°
Rent                 25         (25/180) × 360° = 50°
Miscellaneous        20         (20/180) × 360° = 40°
Total             N = 180       360°

(iii) Draw a circle of sufficient radius and draw one radius as a horizontal starting line.

(iv) Draw the first sector in the anti-clockwise direction at an angle calculated for the first
component food.

(v) Draw the second sector adjacent to the first sector at an angle corresponding to the
second component clothing.

(vi) This process is continued for all the components namely recreation, education, rent
and miscellaneous.

(vii) Shade/colour each sector with different shades/colours.

(viii) Write legends to each component.

The pie diagram is presented in Figure 5.

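(i) The central angle of the sector corresponding to the expenditure incurred on Education is
26°.

(ii) The Recreation sector (22°) is smaller than the Rent sector (50°) by 28°. In terms of
expenditure, the Recreation cost is less than the Rent by (25 - 11)/25 × 100 = 56%.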
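The two answers can be checked with a small sketch based on the household data of this example. The percentage is taken relative to Rent, which is one reasonable reading of the question; the variable names are illustrative.

```python
# Household expenditure (in hundreds) from the table in this example.
expenditure = {
    "Food": 87, "Clothing": 24, "Recreation": 11,
    "Education": 13, "Rent": 25, "Miscellaneous": 20,
}

total = sum(expenditure.values())  # N

# Central angle of each sector: (value / N) * 360 degrees.
angles = {item: amount * 360 / total for item, amount in expenditure.items()}

education_angle = angles["Education"]

# Gap between the Rent and Recreation sectors, in degrees.
angle_gap = angles["Rent"] - angles["Recreation"]

# Recreation cost expressed as a percentage shortfall relative to Rent.
percent_less = 100 * (expenditure["Rent"] - expenditure["Recreation"]) / expenditure["Rent"]

print(total, education_angle, angle_gap, percent_less)
```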

Types of Graphs

Graphical representation is useful for bringing out the statistical nature of the
frequency distribution of a quantitative variable, which may be discrete or continuous.

The most commonly used graphs are

1. Histogram
2. Frequency Polygon
3. Frequency Curve
4. Cumulative Frequency Curves (Ogives)

1. Histogram

A histogram is a chart of adjoining bars that displays a frequency distribution in visual
form. Take classes along the X-axis and the frequencies along the Y-axis. Corresponding to each
class interval, a vertical bar is drawn whose height is proportional to the class frequency.

Limitations:

We cannot construct a histogram for a distribution with open-ended classes. The histogram
is also quite misleading if the distribution has unequal class intervals, unless the heights of
the bars are adjusted for the class widths.

Example 1

The following table shows the time taken (in minutes) by 100 students to travel to school on a
particular day

Time               0-5    5-10    10-15    15-20    20-25
No. of Students     5      25      40       17       13

Draw the histogram. Also find:

(i) The number of students who travel to school within 15 minutes.

(ii) Number of students whose travelling time is more than 20 minutes

Solution

Since we are displaying the distribution of time taken (in minutes) by 100 students to
travel to school on a particular day in visual form, the histogram is drawn.

Step 1 : Time taken is marked along the X-axis and labeled as “Time (in minutes)”.

Step 2 : Number of students is marked along the Y-axis and labeled as “No. of students”.

Step 3 : Corresponding to each class interval of time, a vertical bar adjoining its neighbours
is drawn, whose height is proportional to the number of students.

Figure 1. Histogram for time taken by students to travel to school

The Histogram is presented in Figure 1.

(i) 5 + 25 + 40 = 70 students travel to school within 15 minutes.

(ii) 13 students have a travelling time of more than 20 minutes.
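The two counts can be read off the frequency table directly, as in the sketch below. It also prints a rough text rendering of the bars as a stand-in for the figure; the variable names are illustrative.

```python
# Class intervals (minutes) and frequencies from the table in this example.
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
students = [5, 25, 40, 17, 13]

# Students in classes that end at or before 15 minutes.
within_15 = sum(f for (lo, hi), f in zip(classes, students) if hi <= 15)

# Students in classes that start at or after 20 minutes.
over_20 = sum(f for (lo, hi), f in zip(classes, students) if lo >= 20)

# Text rendering: one '#' per student, one row per class interval.
for (lo, hi), f in zip(classes, students):
    print(f"{lo:>2}-{hi:<2} | {'#' * f}")

print(within_15, over_20)
```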

2. Frequency Polygon

A frequency polygon is drawn after drawing the histogram for a given frequency distribution.
With equal class widths, the area under the polygon equals the area of the histogram when the
polygon is closed on the X-axis at the midpoints of the empty classes on either side. The vertices
of the polygon represent the class frequencies. A frequency polygon helps to identify the classes
with higher frequencies and displays the tendency of the data. The following procedure can be
followed to draw a frequency polygon:

(i) Mark the midpoints at the top of each vertical bar in the histogram representing the
classes.

(ii) Connect the midpoints by line segments.
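The procedure can be sketched using the travel-time distribution from the histogram example. The sketch assumes the polygon is closed on the X-axis one class width beyond each end (a common convention that the steps above do not state) and checks that its area then matches the histogram's.

```python
# Travel-time distribution from the histogram example (equal class widths).
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
frequencies = [5, 25, 40, 17, 13]
width = 5

# Step (i): vertices sit at the midpoint of the top of each bar.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
vertices = list(zip(midpoints, frequencies))

# Assumption: close the polygon on the X-axis at the midpoints of the
# empty classes on either side.
closed = [(midpoints[0] - width, 0)] + vertices + [(midpoints[-1] + width, 0)]

# Area of the histogram: sum of frequency * class width.
histogram_area = sum(f * (hi - lo) for (lo, hi), f in zip(classes, frequencies))

# Step (ii): join consecutive vertices by line segments; the area under
# them is a sum of trapezoids.
polygon_area = sum(
    (x2 - x1) * (y1 + y2) / 2
    for (x1, y1), (x2, y2) in zip(closed, closed[1:])
)

print(vertices)
print(histogram_area, polygon_area)
```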

Example 2

A firm reported its Net Worth for the years 2011-2016 as follows:

Year                       2011-2012   2012-2013   2013-2014   2014-2015   2015-2016
Net Worth (Rs. in lakhs)      100         112         120         133         117

Draw the frequency polygon for the above data.

Solution:

Since we are displaying the distribution of Net Worth over the years 2011-2016, a
frequency polygon is drawn. It shows which years had the higher values and displays the
tendency of the data.

The following procedure can be followed to draw frequency polygon:

Step 1: Years are marked along the X-axis and labeled as “Year”.

Step 2: Net Worth is marked along the Y-axis and labeled as “Net Worth (Rs. in lakhs)”.

Step 3: Mark the midpoints at the top of each vertical bar in the histogram representing the year.

Step 4: Connect the midpoints by line segments. The Frequency polygon is presented in Figure 2.

Figure 2. Frequency polygon for Net Worth in the year 2011-16

3. Frequency Curve

A frequency curve is a smooth, free-hand curve drawn to represent a frequency
distribution. It is drawn by smoothing the vertices of the frequency polygon. A frequency curve
provides a better understanding of the properties of the data than a frequency polygon or a
histogram.

Example 3

The ages of a group of pensioners are given in the table below. Draw the frequency curve
for the following data.

Age                  65-70   70-75   75-80   80-85   85-90
No. of pensioners     38      45      24      10       8

Solution:

Since we are displaying the distribution of Age and Number of Pensioners, the Frequency
curve is drawn, to provide better understanding about the age and number of pensioners than
frequency polygon.

The following procedure can be followed to draw frequency curve:

Step 1 : Age is marked along the X-axis and labeled as “Age”.

Step 2 : Number of pensioners is marked along the Y-axis and labeled as “No. of Pensioners”.

Step 3 : Mark the midpoints at the top of each vertical bar in the histogram representing the age.

Step 4 : Join the midpoints with a smooth free-hand curve, that is, smooth the vertices of the
frequency polygon.

The Frequency curve is presented in Figure 3.

Figure 3. Frequency curve for Age and No. of pensioners

4. Cumulative Frequency Curve (Ogive)

A cumulative frequency curve (Ogive) is drawn to represent a cumulative frequency
distribution. There are two types of Ogives: the “less than Ogive curve” and the “more than Ogive
curve”. To draw these curves, we have to calculate the “less than” cumulative frequencies and
the “more than” cumulative frequencies. The following procedure can be followed to draw the
ogive curves:

Less than Ogive: Less than cumulative frequency of each class is marked against the
corresponding upper limit of the respective class. All the points are joined by a free-hand curve
to draw the less than ogive curve.

More than Ogive: More than cumulative frequency of each class is marked against the
corresponding lower limit of the respective class. All the points are joined by a free-hand curve
to draw the more than ogive curve.

Example 4.1

Draw the less than Ogive curve for the following data:

Daily Wages (in Rs.)   70-80   80-90   90-100   100-110   110-120   120-130   130-140   140-150
No. of workers          12      18      35        42        50        45        20         8

Also, find

(i) The Median

(ii) The number of workers whose daily wages are less than Rs. 125.

Solution:

Since we are displaying the distribution of Daily Wages and No. of workers, the Ogive
curve is drawn, to provide better understanding about the wages and No. of workers.

The following procedure can be followed to draw Less than Ogive curve:

Step 1 : Daily wages are marked along the X-axis and labeled as “Wages (in Rs.)”.

Step 2 : No. of Workers are marked along the Y-axis and labeled as “No. of workers”.

Step 3 : Find the less than cumulative frequency, by taking the upper class-limit of daily wages.
The cumulative frequency corresponding to any upper class-limit of daily wages is the
sum of the frequencies of all classes up to that limit.

Step 4 : The less than cumulative frequency of Number of workers are plotted as points against
the daily wages (upper-limit). These points are joined to form less than ogive curve.

The Less than Ogive curve is presented in Fig 4.12.

Daily wages      No. of workers
(less than)
     80                12
     90                30
    100                65
    110               107
    120               157
    130               202
    140               222
    150               230

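(i) Median = about Rs. 112, the wage at which the curve reaches N/2 = 115 workers (this lies
between Rs. 110, with 107 workers, and Rs. 120, with 157 workers).

(ii) About 183 workers get daily wages less than Rs. 125, as read from the curve.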
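Readings from the less than ogive can be approximated numerically, as in the sketch below. It assumes straight-line joins between the plotted points and a starting point of (70, 0); the helper names `cf_at` and `wage_at` are hypothetical. A free-hand curve read by eye can differ slightly from these values (the worked answer quotes 183 for the second reading).

```python
from itertools import accumulate

upper_limits = [80, 90, 100, 110, 120, 130, 140, 150]
workers = [12, 18, 35, 42, 50, 45, 20, 8]

# Less-than cumulative frequency at each upper class limit.
less_than_cf = list(accumulate(workers))

# Points of the ogive; (70, 0) is an assumed starting point at the lowest limit.
points = [(70, 0)] + list(zip(upper_limits, less_than_cf))


def cf_at(wage, pts):
    """Cumulative frequency at a given wage, by linear interpolation."""
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        if x1 <= wage <= x2:
            return y1 + (y2 - y1) * (wage - x1) / (x2 - x1)
    raise ValueError("wage outside the plotted range")


def wage_at(cf, pts):
    """Wage at which the rising ogive reaches a given cumulative frequency."""
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        if y1 <= cf <= y2 and y1 != y2:
            return x1 + (x2 - x1) * (cf - y1) / (y2 - y1)
    raise ValueError("cumulative frequency outside the plotted range")


n = less_than_cf[-1]            # total number of workers
median = wage_at(n / 2, points)  # wage at which the curve reaches N/2
below_125 = cf_at(125, points)   # workers earning less than Rs. 125

print(less_than_cf)
print(median, below_125)
```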

Example 4.2

The following table shows the marks obtained by 120 students of class IX in cycle test-I.
Draw the more than Ogive curve for the following data:

Marks             0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90  90-100
No. of students     2     6      8     20     30     22     18      8      4      2

Also, find

(i) The Median

(ii) The Number of students who get more than 75 marks.

Solution:

Since we are displaying the distribution of marks and No. of students, the more than Ogive
curve is drawn, to provide better understanding about the marks of the students and No. of
students.

The following procedure can be followed to draw More than Ogive curve:

Step 1 : Marks of the students are marked along the X-axis and labeled as “Marks”.

Step 2 : No. of students are marked along the Y-axis and labeled as “No. of students”.

Step 3 : Find the more than cumulative frequency, by taking the lower class-limit of marks. The
cumulative frequency corresponding to any lower class-limit of marks is the sum of the
frequencies of all classes from that limit upwards.

Step 4 : The more than cumulative frequencies of number of students are plotted as points against
the marks (lower-limit). These points are joined to form the more than ogive curve.

The More than Ogive curve is presented in Fig 4.2.

Marks (more than)    No. of Students
       0                  120
      10                  118
      20                  112
      30                  104
      40                   84
      50                   54
      60                   32
      70                   14
      80                    6
      90                    2

Figure 2. More than Ogive curve for Marks and No. of students

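(i) Median = about 48 marks, the mark at which the curve falls to N/2 = 60 students (this lies
between 40 marks, with 84 students, and 50 marks, with 54 students).

(ii) About 7 students get more than 75 marks, as read from the curve.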
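Readings from the more than ogive can be approximated in the same spirit. The sketch below assumes straight-line joins between the plotted points and an end point of (100, 0); the helper names `students_above` and `marks_at` are hypothetical. Values read by eye from a free-hand curve can differ from these linear estimates.

```python
from itertools import accumulate

lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
students = [2, 6, 8, 20, 30, 22, 18, 8, 4, 2]

# More-than cumulative frequency: sum of this class and all higher classes.
more_than_cf = list(accumulate(reversed(students)))[::-1]

# Points of the falling ogive; (100, 0) is an assumed end point.
points = list(zip(lower_limits, more_than_cf)) + [(100, 0)]


def students_above(mark, pts):
    """Number of students scoring more than `mark`, by linear interpolation."""
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        if x1 <= mark <= x2:
            return y1 + (y2 - y1) * (mark - x1) / (x2 - x1)
    raise ValueError("mark outside the plotted range")


def marks_at(cf, pts):
    """Mark at which the falling ogive drops to a given cumulative frequency."""
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        if y1 >= cf >= y2 and y1 != y2:
            return x1 + (x2 - x1) * (y1 - cf) / (y1 - y2)
    raise ValueError("cumulative frequency outside the plotted range")


n = more_than_cf[0]              # total number of students
median = marks_at(n / 2, points)  # mark at which the curve falls to N/2
above_75 = students_above(75, points)

print(more_than_cf)
print(median, above_75)
```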

Example 3

The yields of mangoes (in kg) recorded from a number of trees are given below:

Yield (in kg)   40-50   50-60   60-70   70-80   80-90   90-100   Total
No. of trees      10      15      17      14      12       2       70

Draw the Less than and More than Ogive curves. Graphically,

(i) find the number of trees which yield less than 55 kg of mangoes.

(ii) find the number of trees which yield more than 75 kg of mangoes.

(iii) find the median.

Solution:

Since we are displaying the distribution of Yield and No. of trees, the Ogive curve is
drawn, to provide better understanding about the Yield and No. of trees

The following procedure can be followed to draw Ogive curve:

Step 1 : Yield of mangoes is marked along the X-axis and labeled as “Yield (in Kg.)”.

Step 2 : No. of trees is marked along the Y-axis and labeled as “No. of trees”.

Step 3 : Find the less than cumulative frequency, by taking the upper class-limit of the yield of
mangoes. The cumulative frequency corresponding to any upper class-limit is the sum of
the frequencies of all classes up to that limit.

Step 4 : Find the more than cumulative frequency, by taking the lower class-limit of the yield of
mangoes. The cumulative frequency corresponding to any lower class-limit is the sum of
the frequencies of all classes from that limit upwards.

Step 5 : The less than cumulative frequencies of the number of trees are plotted as points against
the yield of mangoes (upper-limit). These points are joined to form the less than ogive curve.

Step 6 : The more than cumulative frequencies of the number of trees are plotted as points against
the yield of mangoes (lower-limit). These points are joined to form the more than ogive curve.

      Less than Ogive                    More than Ogive
Yield less than   No. of trees    Yield greater than   No. of trees
      50              10                  40                70
      60              25                  50                60
      70              42                  60                45
      80              56                  70                28
      90              68                  80                14
     100              70                  90                 2

The Ogive curve is presented in Fig 3

Figure 4. Ogive curve for yield of mangoes and number of trees

(i) 16 trees yield less than 55 kg

(ii) 20 trees yield more than 75 kg

(iii) Median = 66 kg
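When both ogives are drawn on the same axes, one common way to read the median is the X-coordinate of the point where they cross. The sketch below locates that crossing, assuming straight-line joins between plotted points; the function name `crossing` is hypothetical, and the end points (40, 0) and (100, 0) are added as assumptions.

```python
from itertools import accumulate

classes = [(40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
trees = [10, 15, 17, 14, 12, 2]

# Class boundaries 40, 50, ..., 100 serve as common X-values for both curves.
boundaries = [classes[0][0]] + [hi for _, hi in classes]
n = sum(trees)

# Trees yielding less than each boundary (0 at the lowest boundary).
less_than = [0] + list(accumulate(trees))
# Trees yielding more than each boundary (0 at the highest boundary).
more_than = [n - c for c in less_than]


def crossing(xs, rising, falling):
    """X-value where the rising curve meets the falling curve, assuming
    straight-line joins between consecutive points."""
    for i in range(len(xs) - 1):
        d1 = rising[i] - falling[i]
        d2 = rising[i + 1] - falling[i + 1]
        if d1 <= 0 <= d2 and d1 != d2:
            t = -d1 / (d2 - d1)  # fraction of the way across the interval
            return xs[i] + t * (xs[i + 1] - xs[i])
    raise ValueError("the curves do not cross")


median = crossing(boundaries, less_than, more_than)

print(less_than)
print(more_than)
print(round(median, 2))
```

At the crossing both curves equal N/2 = 35 trees, so the result agrees with reading the median off either ogive alone.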

Comparison of Tables, Diagrams and Graphs

Data may be presented in the form of tables as well as using diagrams and graphs. Tables
can be compared with graphs and diagrams on the basis of various characteristics as follows:

(i) Tables contain precise and accurate information, whereas graphs and diagrams give
only an approximate idea.

(ii) More information can be presented in tables than in graphs and diagrams.

(iii) Tables require careful reading and are difficult to interpret, whereas diagrams and
graphs are easily interpretable.

(iv) For the general reader, graphs and diagrams are more attractive and appealing than tables.

(v) Diagrams and graphs exhibit the inherent trends in a distribution, and allow
comparison, more readily than tables.

(vi) Graphs and diagrams can be misinterpreted more easily than tables.

Comparison between diagrams and graphs

(i) Diagrams can be drawn on plain paper, whereas graphs require graph paper.

(ii) Diagrams are appropriate and effective to present information about one or more
variables. Normally, it is difficult to draw graphs for more than one variable in the
same graph.

(iii) Graphs can be used for interpolation and/or extrapolation, but diagrams cannot be
used for this purpose.

(iv) Median can be determined using graphs, but not using diagrams.

(v) Diagrams can be used for comparison of data/variables, whereas graphs can be used
for determining the relationship between variables.
