Research Fundamentals and Terminology
3. Editing
Information gathered during data collection may lack uniformity. Example: Data
collected through questionnaires and schedules may have answers that are
not ticked at the proper places, or some questions may be left unanswered.
Sometimes information is given in a form that needs reconstruction into a
category designed for analysis, e.g., converting daily/monthly income into annual
income. The researcher has to decide how to edit it.
Editing also requires that the data are relevant and appropriate and that errors
are corrected. Occasionally, the investigator makes a mistake and records an
impossible answer. For the question “How much red chili do you use in a month?”
the answer is written as “4 kilos”. Can a family of three members use four kilos
of chilies in a month? The correct answer could be “0.4 kilo”.
Care should be taken in editing (re-arranging) answers to open-ended
questions. Example: Sometimes a “don’t know” answer is
edited as “no response”. This is wrong. “Don’t know” means that
the respondent is not sure and is in two minds about his
reaction, or considers the question personal and does not want to
answer it. “No response” means that the respondent is not familiar
with the situation/object/event/individual about which he is asked.
Editing is the process of checking and adjusting the data for
omissions, legibility and consistency and readying them for
coding and storage.
The reasons for editing are to ensure completeness and consistency and to
detect questions answered out of order.
Editing involves the following treatments of unsatisfactory responses.
Returning to the field: To ensure consistency in responses, the questionnaires
with unsatisfactory responses may be returned to the field, where the
interviewers re-contact the respondents.
Assigning missing values: To ensure completeness, if returning
the questionnaires to the field is not feasible, the editor
may assign missing values to unsatisfactory responses.
Discarding unsatisfactory respondents: If the questions are answered
out of order, the respondents with unsatisfactory responses are
simply discarded.
4. Coding
Coding is translating answers into numerical values or assigning
numbers to the various categories of a variable to be used in data
analysis. Coding is done by using a code book, a code sheet, and
a computer card, on the basis of the instructions
given in the code book. The code book gives a numerical code for
each variable.
Nowadays, codes are assigned before going to the field, while
constructing the questionnaire/schedule. After data collection,
pre-coded items are fed to the computer for processing and
analysis. For open-ended questions, however, post-coding is
necessary. In such cases, all answers to open-ended questions
are placed in categories and each category is assigned a
code.
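A small Python sketch may help to show the difference between pre-coding and post-coding. The code book contents, the missing-data code 9 and the function names are hypothetical. The post-coding step is deliberately simplified: it treats each distinct normalized answer as its own category, whereas in practice a researcher groups answers into categories by judgment.

```python
# Hypothetical code book: a numerical code for each category of each variable.
CODEBOOK = {
    "gender": {"Female": 1, "Male": 2},
    "handedness": {"Left-handed": 1, "Right-handed": 2},
}
MISSING_CODE = 9  # assumed code for answers not found in the code book


def code_answer(variable, answer):
    """Pre-coding: look up the code fixed before going to the field."""
    return CODEBOOK[variable].get(answer, MISSING_CODE)


def post_code(answers):
    """Post-coding: assign codes to open-ended answers after collection.

    Simplification: every distinct answer (ignoring case and surrounding
    spaces) becomes one category, numbered in order of first appearance.
    """
    codes = {}
    for answer in answers:
        category = answer.strip().lower()
        if category not in codes:
            codes[category] = len(codes) + 1
    return codes


open_ended_codes = post_code(["Price", "price ", "Quality"])
```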
Manual processing is employed when qualitative methods are
used, when a small sample is used in a quantitative study,
when the questionnaire/schedule has a large number of open-ended
questions, or when access to computers is difficult
or inappropriate. However, coding is also done in manual
processing.
• Coding is the process of identifying and assigning a numerical
score or other character symbol to previously edited data.
• Coding means assigning a code, usually a number, to each
possible response to each question asked.
• The code includes an indication of the column position (field) and
the data record it will occupy.
Coding Questions
• Fixed field codes, which mean that the number of records for
each respondent is the same and the same data appear in the
same column(s) for all respondents, are highly desirable.
• If possible, standard codes should be used for missing data.
• Coding of structured questions is relatively simple, since the
response options are predetermined.
• In questions that permit a large number of responses, each
possible response option should be assigned a separate column.
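The idea of fixed field codes can be illustrated with a short Python sketch in which every variable occupies the same columns in every record. The layout, the variable names and the use of a run of 9s as the standard missing-data code are assumptions made for illustration.

```python
# Hypothetical layout: (variable name, field width in columns), in column order.
LAYOUT = [("respondent_id", 3), ("record_no", 1), ("gender", 1), ("income_band", 2)]


def encode_record(values):
    """Write one respondent's codes as a fixed-width record."""
    fields = []
    for name, width in LAYOUT:
        value = values.get(name)
        # A run of 9s is the assumed standard code for missing data.
        text = "9" * width if value is None else str(value).zfill(width)
        if len(text) != width:
            raise ValueError(f"{name}={value!r} does not fit in {width} column(s)")
        fields.append(text)
    return "".join(fields)


def decode_record(line):
    """Read the fields back from their fixed column positions."""
    decoded, position = {}, 0
    for name, width in LAYOUT:
        decoded[name] = line[position:position + width]
        position += width
    return decoded


record = encode_record(
    {"respondent_id": 7, "record_no": 1, "gender": 2, "income_band": None}
)
```

Because the widths never change, the same slice of every record always holds the same variable, which is what makes the data easy to read back.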
Guidelines for coding unstructured questions:
• Category codes should be mutually exclusive and collectively
exhaustive.
• Only a few of the responses should fall into the “other” category.
• Category codes should be assigned for critical issues even if no
one has mentioned them.
• Data should be coded to retain as much detail as possible.
A codebook contains coding instructions and the necessary
information about variables in the data set. A codebook
generally contains the following information:
• Column number
• Variable number
• Question number
• Record number
• Variable name
• Instructions for coding
Coding Questionnaire:
The respondent code and the record number
appear on each record in the data.
The first record contains the additional codes: project code,
interview code, date and time codes, and validation code.
It is a good practice to insert blanks between parts.
Converting raw data into transcribed data
The raw data can be converted into transcribed data by using
various devices.
First, the raw data are input by using methods such as optical
scanning, mark sense forms, computerized sensory analysis, and key
punching, with verification to correct errors.
The input data are stored in computer memory, on hard disks and
magnetic tapes, from which they can be retrieved as transcribed data
as and when required.
Data Cleaning
Consistency checks identify data that are out of range, logically
inconsistent, or have extreme values.
Computer packages like MINITAB, SPSS, SAS, and EXCEL can
be programmed to identify out-of-range values for each variable
and print out respondent code, variable code, variable name,
record number, column number and out-of-range value.
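An out-of-range check of the kind described above can also be sketched in plain Python. The variables, their valid ranges and the sample records are hypothetical; the sketch reports only the respondent code, variable name and offending value, not the record and column numbers a statistical package would print.

```python
# Hypothetical valid ranges for each coded variable.
VALID_RANGES = {"satisfaction": (1, 5), "age": (18, 99)}


def find_out_of_range(records):
    """List (respondent code, variable name, value) for every out-of-range value."""
    problems = []
    for record in records:
        for variable, (low, high) in VALID_RANGES.items():
            value = record.get(variable)
            if value is not None and not (low <= value <= high):
                problems.append((record["respondent"], variable, value))
    return problems


# Hypothetical data: respondent 2 has an invalid rating, respondent 3 an invalid age.
records = [
    {"respondent": 1, "satisfaction": 4, "age": 34},
    {"respondent": 2, "satisfaction": 7, "age": 29},
    {"respondent": 3, "satisfaction": 2, "age": 130},
]
problems = find_out_of_range(records)
```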
Data Cleaning: Treatment of Missing Responses
• Substitute a Neutral Value: A neutral value, typically the mean
response to the variable, is substituted for the missing responses.
• Substitute an Imputed Response: The respondent’s pattern of
responses to other questions is used to impute or calculate a
suitable response to the missing questions.
• Casewise Deletion: Cases or respondents with any missing
responses are discarded from the analysis.
• Pairwise Deletion: Instead of discarding all cases with any
missing values, the researcher uses only the cases or
respondents with complete responses for each calculation.
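Three of these treatments can be compared on a small data set. The sketch below is illustrative only: the ratings are made up, None stands for a missing response, and the function names are assumptions. Imputation from the pattern of other responses is left out because it depends on a model chosen by the researcher.

```python
from statistics import mean

# Hypothetical ratings; None marks a missing response.
rows = [
    {"q1": 4, "q2": 3, "q3": 5},
    {"q1": None, "q2": 2, "q3": 4},
    {"q1": 2, "q2": None, "q3": 3},
    {"q1": 6, "q2": 4, "q3": 1},
]


def mean_substitute(rows, variable):
    """Neutral value: replace missing responses with the mean of observed ones."""
    observed = [row[variable] for row in rows if row[variable] is not None]
    fill = mean(observed)
    return [fill if row[variable] is None else row[variable] for row in rows]


def casewise_delete(rows):
    """Casewise deletion: keep only respondents with no missing response at all."""
    return [row for row in rows if all(value is not None for value in row.values())]


def pairwise_cases(rows, var_a, var_b):
    """Pairwise deletion: keep cases that are complete for this one calculation."""
    return [
        (row[var_a], row[var_b])
        for row in rows
        if row[var_a] is not None and row[var_b] is not None
    ]


q1_filled = mean_substitute(rows, "q1")
complete_cases = casewise_delete(rows)
q1_q3_pairs = pairwise_cases(rows, "q1", "q3")
```

Casewise deletion keeps two of the four respondents here, while pairwise deletion keeps three for the q1 and q3 calculation, which shows why pairwise deletion retains more data.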
Statistically adjusting the data
In weighting, each case or respondent in the database is assigned
a weight to reflect its importance relative to other cases or
respondents.
Weighting is most widely used to make the sample data more
representative of a target population on specific characteristics.
Yet another use of weighting is to adjust the sample so that greater
importance is attached to respondents with certain characteristics.
Statistically adjusting the data for variable re-specification.
Variable re-specification involves the transformation of data to
create new variables or modify existing variables.
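Both adjustments can be sketched briefly in Python. The weighting rule used here (population share divided by sample share for each group) is one common choice and is an assumption, as are the group counts, the income cut-offs and all function names.

```python
def representativeness_weights(sample_counts, population_shares):
    """Weight each group by its population share divided by its sample share."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }


def weighted_mean(values, weights):
    """Mean in which each case counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def income_band(annual_income):
    """Variable re-specification: collapse income into three assumed bands."""
    if annual_income < 300_000:
        return "low"
    if annual_income < 1_000_000:
        return "middle"
    return "high"


# Hypothetical sample of 60 males and 40 females from a 50/50 population.
weights = representativeness_weights(
    {"male": 60, "female": 40}, {"male": 0.5, "female": 0.5}
)
```

With these weights the over-represented group counts for less than one case each and the under-represented group for more, so the weighted sample matches the population split.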
Selecting data analysis strategy
The first data analysis strategy is the transformation of raw data
into a form that will make them easy to understand and
interpret; rearranging, ordering, and manipulating data
to generate descriptive information is the starting point of data
analysis strategy.
Second data analysis strategy is to perform basic data analysis by
using descriptive statistics.
Third data analysis strategy is to perform basic analysis using
univariate statistics.
Fourth data analysis strategy is to perform data analysis using
bivariate analysis such as testing the differences.
Fifth data analysis strategy is to perform data analysis
using measures of association such as regression and
correlation and nonparametric statistics.
Sixth data analysis strategy is to perform data
analysis using multivariate statistics.
Data Presentation Tools
This is the first category of Research Methodology tools and techniques
which are used to present data.
The list of data presentation tools is as below.
1. Bar chart
2. Pie chart
3. Frequency Distribution
4. Histogram
5. Frequency Polygon
6. Ogive
7. Stem and Leaf Plot
8. Cross Tabulation table
To summarize qualitative and quantitative data, various
charts and graphs are developed.
Summarizing Qualitative Data
Here we are using various charts and graphs.
1. Frequency distribution
2. Relative frequency distribution
3. Percent frequency distribution
4. Bar graphs
5. Pie charts
Summarizing Quantitative Data
1. Frequency distribution
2. Relative frequency distribution
3. Percent frequency distribution
4. Cumulative frequency distribution
5. Cumulative relative frequency
6. Cumulative percent frequency distribution
7. Histogram
8. Frequency Polygon
9. Ogive
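The first six summaries in this list can all be derived from one pass over the data. The following Python sketch builds them for a small set of made-up ratings; the data and the function name are assumptions for illustration.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Build frequency, relative, percent and cumulative distributions."""
    counts = Counter(values)
    n = len(values)
    categories = sorted(counts)
    frequencies = [counts[c] for c in categories]
    cumulative = list(accumulate(frequencies))
    return [
        {
            "value": c,
            "frequency": f,
            "relative": f / n,
            "percent": 100 * f / n,
            "cumulative": cf,
            "cumulative_relative": cf / n,
            "cumulative_percent": 100 * cf / n,
        }
        for c, f, cf in zip(categories, frequencies, cumulative)
    ]


# Hypothetical ratings on a 1 to 5 scale from ten respondents.
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
table = frequency_table(ratings)
```

The cumulative columns are the figures an ogive plots, and the frequency column is what a histogram or frequency polygon displays.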
Tabulation
• Tabulation is the final stage in collection and compilation
of data, and it is the stepping-stone to the analysis and
interpretation of data. In deciding about the type of
tabulation, one has to keep in mind the nature, scope
and the object of the research problem. Tabulation of data should
be done in such a form that it suits the nature and object of the
research investigation.
• Tabulation simply means presenting of data through tables. To
be more precise, “Tabulation is an orderly arrangement of data
in columns and rows”.
OBJECTIVES, IMPORTANCE AND
ADVANTAGES OF TABULATION
• Proper tabulation of data is of great importance because if the
tabulation of data is not satisfactory its analysis is bound to be
defective.
• Tabulation of data is done in order to achieve simplicity and
convenience in processing and interpretation of the data collected
for the research problem.
Advantages Of Tabulation
1. Ease in Understanding: It is easier to
understand tabulated data than the
unorganized data.
2. Time Savings: Tabulation of data leads to
immense saving of time during analysis.
3. Ease in Drawing Diagrams: Diagrammatic
representation of data is more convenient if
it is done on the basis of tabulated data.
4. Ease in Comparison: Through tabulated data, it
becomes easy to undertake a comparative study
because the data are systematically displayed.
5. Detection of Errors: Errors and omissions in data
are easily detected in tabulated data.
6. Space-Saving: In tabulation, the data is displayed
in column and rows and so it uses less space.
7. No Chances of Repetition: Repetition of data may
occur if it is displayed in an unorganized fashion.
Tabulation saves the data from being repeated.
Cross-Tabulation
• A cross tabulation is the merging of the frequency
distributions of two or more variables in a single table. It
helps to understand how one variable relates to another
variable.
• Example: how brand loyalty relates to gender
Example: Cross-tabulation
Sample  Gender  Handedness
1       Female  Right-handed
2       Male    Left-handed
3       Female  Right-handed
4       Male    Right-handed
5       Male    Left-handed
6       Male    Right-handed
7       Female  Right-handed
8       Female  Left-handed
9       Male    Right-handed
10      Female  Right-handed
         Left-handed  Right-handed  Total
Males    2            3             5
Females  1            4             5
Total    3            7             10
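The cross-tabulation above can be reproduced from the ten sample records with a short Python sketch. The data are taken from the example; the function name and the use of Counter are illustrative choices, not part of the original material.

```python
from collections import Counter

# The ten (gender, handedness) records from the example, in sample order.
sample = [
    ("Female", "Right-handed"),
    ("Male", "Left-handed"),
    ("Female", "Right-handed"),
    ("Male", "Right-handed"),
    ("Male", "Left-handed"),
    ("Male", "Right-handed"),
    ("Female", "Right-handed"),
    ("Female", "Left-handed"),
    ("Male", "Right-handed"),
    ("Female", "Right-handed"),
]


def cross_tabulate(pairs):
    """Count each (row, column) cell together with row and column totals."""
    cells = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    column_totals = Counter(column for _, column in pairs)
    return cells, row_totals, column_totals


cells, row_totals, column_totals = cross_tabulate(sample)
```

Each cell is the joint frequency of one gender with one handedness, and the totals are the two separate frequency distributions that the table merges.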
Advantages
1. Cross tabulation analysis and results can be
easily interpreted.
2. The clarity of interpretation provides a stronger
link between research results and managerial action.
3. It is simple.
4. It is easy for managers to understand because
it is not statistically oriented.