STATISTICS AND ANALYTICS
UNIT – 1: STATISTICAL DATA COLLECTION AND TYPES
Data: Data can be defined as raw information such as numbers, words, images, facts, names,
observations, measurements etc., used for a specific purpose such as a survey, research or analysis.
Types of data:
Data
   Primary data
      Quantitative data
         Discrete data
         Continuous data
      Qualitative data
         Nominal data
         Ordinal data
   Secondary data
      Published data
      Unpublished data
Primary data: These are the data that are collected for the first time by an investigator for a specific
purpose.
Ex: The Census of India, the Reserve Bank of India (RBI) etc.
It is classified into two types:
1) Quantitative data &
2) Qualitative data
Quantitative data: These are numerical data that can be measured or counted and used in
mathematical calculations.
Ex: height, weight, length, age, area, volume, temperature etc.
There are two types of quantitative data:
1) Discrete data &
2) Continuous data
Discrete data: These are numerical data that can be counted in whole numbers (exact values, not
fractional values). They take only distinct, separate values.
Ex: → Number of students in the class
→ If a coin is tossed 3 times, the possible number of heads is 0, 1, 2 or 3. It cannot be 2.2 or any
other value
→ If 10 cars are parked in a ground, there cannot be 9.5 cars parked in the ground
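The coin example can be checked by listing every outcome. The following is a minimal Python sketch, not part of the original notes; the names outcomes and head_counts are chosen here for illustration only.

```python
from itertools import product

# Enumerate every outcome of tossing a coin 3 times ("H" = head, "T" = tail).
outcomes = list(product("HT", repeat=3))

# Count the heads in each outcome and keep the distinct counts.
head_counts = sorted({outcome.count("H") for outcome in outcomes})

print(len(outcomes))  # 8 possible outcomes
print(head_counts)    # [0, 1, 2, 3]
```

Only the whole numbers 0 to 3 appear, which is what makes the number of heads a discrete variable.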
Continuous data: These are numerical data that can be measured in fractional numbers. They can
take an infinite number of possible values within a given range (whole as well as fractional).
Ex: → The length measured on a 100 cm ruler can be any value between 0 and 100 cm, such as
30 cm or 30.11 cm
→ Temperature range
→ Distance between two cities is 100.32 km
Qualitative data: These are descriptive data based on observation. They do not involve any
mathematical calculations. They describe the quality of something or someone.
Ex: skin colour, eye colour, intelligence, honesty, wisdom etc
There are two types of qualitative data:
1) Nominal data &
2) Ordinal data
Nominal data: These data are grouped into categories that have no natural order. They are used
for naming or labelling variables, without any numerical value.
Ex: → letters, symbols, words, gender etc.
→ cars in the parking ground can be categorised based on their colour, brand name, brand symbol,
area of registration etc.
Ordinal data: These data are categories that have a natural order or rank, such as levels of
satisfaction or preference. The variables in ordinal data are listed in an ordered manner.
Ex: → How was your customer service experience?
o Good
o Neutral
o Bad
→ Rate your customer service experience on a scale of 1 – 5 (Lowest - Highest)
o 1
o 2
o 3
o 4
o 5
→ Answer to survey
o Strongly disagree
o Disagree
o Agree
o Strongly agree
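The ordering of an ordinal scale can be made explicit in code. The following is a minimal Python sketch, assuming the four-point scale from the last example; the responses list is hypothetical and made up for illustration.

```python
# Ordered response scale from the survey example (lowest to highest).
SCALE = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]

# Map each label to its rank; the ranks carry order only, not distance.
RANK = {label: position for position, label in enumerate(SCALE, start=1)}

# Hypothetical responses, made up for illustration.
responses = ["Agree", "Strongly disagree", "Strongly agree", "Disagree", "Agree"]

# Sorting by rank respects the scale; alphabetical sorting would not.
ordered = sorted(responses, key=RANK.get)

# With an odd number of responses, the middle one is the median response.
median_response = ordered[len(ordered) // 2]

print(ordered)
print(median_response)  # Agree
```

The ranks allow sorting and finding a median, but the gap between two neighbouring labels is not a measured quantity, so averages of the ranks should be read with care.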
Secondary data: Secondary data are second-hand data that have already been collected and recorded
by other researchers for their own purpose, not for the current research problem.
The information is already available and someone else analyses it.
Ex: magazines, newspapers, books, journals etc.
There are two types of secondary data:
1) Published data &
2) Unpublished data
Published data includes
Census report of India
Reserve Bank of India bulletin
Annual survey of industries
Five year plans
Stock exchanges
Trade unions
Unpublished data includes
Diaries
Letters
Unpublished biographies
Records maintained by business enterprises
Data collected by the central government, state governments and research institutes that are
not published for some reason
Data collection tools: There are many methods of gathering information. The main tools are:
Questionnaires
Survey
Interviews
Focus group discussion
Case studies
Portfolios
Questionnaires: Questionnaires are a simple, straightforward data collection method. Respondents
get a series of questions, either open-ended or closed-ended, related to the matter at hand. The
participants fill in the questionnaire and mail it back.
Merits:
1) This method is economical in respect of money, labour and time
2) This method is used for extensive enquiries covering a very wide area
Demerits:
1) This method is not useful when the respondents are illiterate or semi-literate
2) The collected data may not be very accurate because respondents may not answer
seriously
Survey:
Survey research is used to collect data from a sample of people and gather their thoughts, opinions
and feelings on a specific topic
Surveys can be divided into two types:
1) Paper surveys: Paper surveys are in the form of printed questionnaires
2) Online surveys: Online surveys are carried out in the form of a web survey developed
using software
Interviews:
The interview method of collecting data involves asking questions and getting answers from
participants. It can be carried out in two ways:
1) Personal interview: In this method, an interviewer asks questions face to face
to the other person
2) Telephonic interview: In this method, information is collected by contacting
respondents over the telephone
Focus group discussion: A focus group is a group interview of approximately six to twelve people
with similar characteristics, who discuss the common areas of a problem. The responses are captured
by video recording, voice recording or writing.
Advantage:
Information obtained is usually very detailed
Disadvantage:
The discussion can be dominated or sidetracked by a few individuals
Data cleaning:
Data cleaning is the process of detecting inaccurate, incomplete or unreasonable data and
then improving the quality through correction of the detected errors and omissions. It is the
process of maintaining high data quality.
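The detection step of data cleaning can be sketched in Python as follows. This is an illustration only: the records, the function name find_problems, the age limit of 120 and the 1 to 5 rating scale are all assumptions made for this example, not rules from the notes. The sketch only flags problems; correcting them still needs a decision by the investigator.

```python
# Hypothetical survey records; None marks a missing answer.
records = [
    {"name": "Asha", "age": 21, "rating": 4},
    {"name": "Ravi", "age": None, "rating": 5},   # incomplete: age missing
    {"name": "Meena", "age": 230, "rating": 3},   # unreasonable: age too high
    {"name": "John", "age": 35, "rating": 9},     # inaccurate: rating not in 1-5
]


def find_problems(record):
    """Return a list of problems found in one record (empty if clean)."""
    problems = []
    if any(value is None for value in record.values()):
        problems.append("incomplete")
    age = record["age"]
    if age is not None and not 0 <= age <= 120:   # assumed plausible age range
        problems.append("unreasonable age")
    rating = record["rating"]
    if rating is not None and not 1 <= rating <= 5:  # assumed 1-5 scale
        problems.append("rating outside 1-5")
    return problems


# Keep the clean records and list the problems of the others by name.
clean = [r for r in records if not find_problems(r)]
flagged = {r["name"]: find_problems(r) for r in records if find_problems(r)}

print([r["name"] for r in clean])
print(flagged)
```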
Applications/uses of MS Excel in data analysis:
Frequency
Relative frequency
Bar graph
Pie graph
Line graph
Boxplot
Stem-and-leaf plot etc.
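The first two items in the list, frequency and relative frequency, are simple counts and proportions. As a cross-check on a spreadsheet result, they can also be computed with a short Python sketch; the ratings below are hypothetical and made up for illustration.

```python
from collections import Counter

# Hypothetical customer ratings on a 1-5 scale, made up for illustration.
ratings = [4, 5, 3, 4, 4, 2, 5, 4, 3, 5]

# Frequency: how many times each value occurs.
frequency = Counter(ratings)

# Relative frequency: frequency divided by the total number of observations.
total = len(ratings)
relative_frequency = {
    value: count / total for value, count in sorted(frequency.items())
}

for value, share in relative_frequency.items():
    print(value, frequency[value], share)
```

The relative frequencies always add up to 1, which is a quick check that no observation was missed.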
For your knowledge:
Assume that you are collecting feedback from customers about your restaurant. Prepare a
sample questionnaire containing five questions for this purpose
While using MS Excel, what are the shortcut keys used for the following options:
1. Cut
2. Copy
3. Paste
4. Undo
5. Select all