Research Methodologies
“Indian Institute of Materials Management (IIMM)”, with its headquarters at Navi Mumbai, is a
Professional Body of Materials Management classified under Engineering & Technology Group under
Apprenticeship Act, 1961 and is recognised by ISTE, MHRD.
Through its wide network of 56 branches and 19 chapters having around 9500 members drawn
from public and private sectors, IIMM is dedicated to the promotion of the profession of Materials
Management through its multifarious activities including Educational Programs approved by AICTE
(Post Graduate Diploma in Materials Management and Post Graduate Diploma in Supply Chain
Management & Logistics), Seminars, National Conferences, Regional Conferences, Workshops,
In-house training programs, Consultancy & Research Programs.
To have an effective global interaction, the Institute is a charter member of the International Federation
of Purchasing and Supply Management (IFPSM), Helsinki, Finland which has its roots in over
44 member countries.
In furtherance of its objectives, IIMM brings out a monthly journal, “Materials Management Review”
comprising the latest Articles and Research Papers in the field of Materials, Logistics, Purchase, Inventory,
Supply Chain Management and latest Technological Innovations like Artificial Intelligence,
Blockchain, Cloud Computing and Internet of Things.
The Institute has its Centre for Research in Materials Management (CRIMM) at Kolkata, which
is engaged in promotion of research activities in collaboration with industries for furthering the
advancement of the profession of Materials and Supply Chain Management.
The Institute is dedicated to societal and environmental considerations through Sustainable
Procurement, Green Purchasing and Life Cycle Consideration, which are part of our course curriculum.
The aim & objective of the Institution is to update & upgrade the skills & knowledge of professionals
so as to ensure inclusive and sustainable development.
RESEARCH METHODOLOGY
© Copyright 2024 Publisher
ISBN: 978-93-91540-95-1
This book may not be duplicated in any way without the express written consent of the
publisher, except in the form of brief excerpts or quotations for the purposes of review.
The information contained herein is for the personal use of the reader and may not be
incorporated in any commercial programs, other books, databases, or any kind of software
without written consent of the publisher. Making copies of this book or any portion,
for any purpose other than your own is a violation of copyright laws. The author and
publisher have used their best efforts in preparing this book and believe that the content is
reliable and correct to the best of their knowledge. The publisher makes no representation
or warranties with respect to the accuracy or completeness of the contents of this book.
Table of Contents
Chapter 1: Fundamentals of Research
Chapter 4: Sampling
Chapter 1: Fundamentals of Research
Table of Contents
1.1 Introduction
1.2 Concept of Research
1.1 INTRODUCTION
In simple words, the term ‘Research’ is associated with the act of seeking out
information and knowledge on a specific topic or subject. In other words, research
refers to the art of systematic and careful investigation into a specific field. This
systematic approach makes research an art of scientific investigation. Research
is of significant importance in various fields, such as business, economics and
politics. Research is regarded as a powerful and essential tool, which leads human
beings towards progress.
Research is conducted to serve a varied range of purposes, such as increasing the
knowledge of the researcher, developing and revising theories based on observed
facts, etc. For instance, organisations use research to take well-informed decisions
about the products and services they deal in or to devise new strategies. Significant
management decisions, such as pricing decisions, new product launches, undertaking
new projects, etc., require research to be conducted to find the probable state of the
circumstances and the most feasible and appropriate strategies that can be designed
This chapter will help you in understanding the concept of research. You will study
the characteristics of a good research and types of research. Further, various research
approaches and significance of research are also discussed. The latter section of this
chapter will describe problems encountered by a researcher and ethics in research.
Towards the end, you will learn about the research process.
new knowledge of the already existing researched facts. The term ‘research’ has
been defined by various authors in different ways. A few major definitions of research
are as follows:
In the words of Redman and Mory, “Research is a careful and systematised effort to
gain new knowledge.”
Basic research
Applied research
Descriptive research
Causal research
Conceptual research
Empirical research
Qualitative research
Quantitative research
the basic assumptions or the empirical content or the very validity of theory under
the given conditions. Applied research may explore ways to:
• Treat a disease
• Identify social, economic or political trends
• Improve agricultural productivity
• Curb or reduce carbon emissions
• Improve energy efficiency
• Reduce inflation
Descriptive research: Descriptive research aims to describe the characteristics of a
phenomenon. It includes different kinds of surveys and fact-finding
enquiries. In descriptive research, the researcher only describes the phenomenon.
Descriptive research can answer what, where, when and how questions, but not
why questions. Descriptive research may be conducted in the following situations:
• To describe the inflation rate in India in the past 20 years
• To know how India’s housing market changed over the past 10 years
• To know the most popular news channels among middle-aged people
Causal research: Causal research is also known as explanatory research. It is one
step ahead of descriptive research as it aims to investigate cause-effect
relationships. For instance, if, in a descriptive research study, the inflation rates of the
past 20 years in India are studied and described without explaining their negative
or positive impact on the Indian economy, a causal research study would thoroughly
investigate the causes of the same. In the cause-effect analysis, data can be analysed
in different ways, such as by comparing inflation rates of different years, giving
On the basis of this hypothesis, a researcher draws some predictions, such as the
brain development of children who play musical instruments is not affected. After
that, the researcher would conduct a suitable experiment to test predictions. On the
basis of the result of the experiment, the topic of observation would be supported or
revised. For example, if the researcher finds that the brain development of children
who play musical instruments is not affected by it, the topic of observation would
be supported; otherwise, it would be revised.
Qualitative research: Qualitative research is concerned with gaining a deep
understanding of a qualitative phenomenon. The phenomenon in this research
relates to quality or kind. For example, if the researcher wants to know the cause
of the rising disrespect of the youth towards elders, he/she would have to deeply
look at different aspects, such as changing lifestyle, increasing stress among the
youth and the attitude of people towards the nuclear family. This research tries
to find out why and how rather than what, when and where of a phenomenon.
The aim of a qualitative research is to discover the fundamental ideas, desires and
motives by using the method of in-depth interviews.
Quantitative research: Quantitative research aims to study a phenomenon that is
expressed in terms of quantity. Some examples of quantitative research are as
follows:
• A research study that shows that the average rainfall in the month of June in
Uttar Pradesh is more than that of July.
• A research study that aims to show the percentage of all components of the
earth’s atmosphere.
Other types of research: In addition to the types of research mentioned in the
preceding section, there are some other types of research, which are explained as
follows:
• One-time research: This refers to research that is carried out only once.
• Longitudinal research: This refers to observational research that is
performed for the same purpose repeatedly over a period of time on the same
group of subjects.
• Laboratory research: This refers to research that is done in a laboratory. It is
also known as simulation research. Research in the natural sciences,
such as Physics, Chemistry and Biology, is an example of laboratory
research. For instance, studying the reaction of one chemical with another is
laboratory research.
• Field-setting research: This refers to research that cannot be done in a
laboratory. Research conducted on topics of economics, such as demand,
supply, product and price, is an example of field research.
• Historical research: This refers to research in which the researcher either
takes the help of historical sources to conduct fresh research or studies past
events. For example, research on the outcome of the Revolt of 1857 may be
considered historical research.
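The two quantitative research examples given earlier (comparing average rainfall in two months and reporting the percentage share of components) reduce to simple arithmetic on measured quantities. The following is a minimal Python sketch of that arithmetic. All figures, the variable names and the helper percentage_shares are made up for illustration and are not real measurements.

```python
from statistics import mean

# Hypothetical June and July rainfall (mm) for five years; not real data.
june_rainfall_mm = [95.0, 110.5, 87.2, 120.3, 101.0]
july_rainfall_mm = [80.1, 92.4, 78.8, 99.5, 85.0]


def percentage_shares(quantities):
    """Return each component's share of the total, in percent."""
    total = sum(quantities.values())
    return {name: 100 * value / total for name, value in quantities.items()}


# Hypothetical measured volumes (litres) of three gases in an air sample.
sample_volumes = {"gas A": 78.0, "gas B": 21.0, "gas C": 1.0}

june_average = mean(june_rainfall_mm)
july_average = mean(july_rainfall_mm)
shares = percentage_shares(sample_volumes)

print(f"June average: {june_average:.1f} mm, July average: {july_average:.1f} mm")
print("June wetter than July on average:", june_average > july_average)
print("Percentage shares:", shares)
```

A difference between two sample averages does not by itself establish the claim. A statistical test is normally applied before drawing a conclusion.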
[Figure: Types of Research Approach]
Qualitative research approach: The qualitative research approach deals with the
subjective evaluation of attitudes, opinions and actions. This approach generates
results in a non-quantitative form. This research is based on the researcher’s
insights and impressions. Usually, the techniques used in qualitative research
include focus group interviews, projective techniques and depth interviews.
Pragmatic research approach: This research approach is also known as mixed
method. The research conducted in this approach involves collecting both
quantitative and qualitative data to conduct inquiries, integrating the two forms
of data, and using different designs that may involve philosophical assumptions
and theoretical frameworks. The central postulation of this form of research is
that by combining both qualitative and quantitative approaches, a more complete
understanding of a research problem is achieved than either approach alone.
Apart from the role of research in marketing, the significance of research in the field
of business is noteworthy for an organisation because it helps:
Identify and define opportunities
Define, monitor and refine strategies
Identify economic and business objectives
Identify policy objectives
Develop products
Identify objectives of human resource development
Identify promotional objectives
Identify market objectives
Identify customer satisfaction objectives
The significance of research for social scientists is reflected in studying social
relationships and in seeking answers to various social problems. The main purpose of
research in social sciences relates to two motives: first, knowledge
for its own sake and, second, knowledge for what it can contribute
to the practical concerns of society.
The significance of research can also be appreciated for the following purposes:
For students, writing a master’s or Ph.D. thesis may mean better career
opportunities. It also aids them to attain a high position in the social structure.
For professionals working in research methodology jobs, such as a survey
research assistant, research associate, research faculty, etc., research may mean a
source of livelihood.
Philosophers and thinkers bring new ideas and insights to light after conducting
research.
Literary men and women may use research as a means for the development of
new styles and creative work.
Analysts and intellectuals use the concepts of research for the generalisations of
new theories.
research also depends upon the developing or developed nations. In developing
nations, research is at an initial stage, while developed nations have sufficient
facilities and resources to carry out research. Researchers, particularly in developing
nations, face the following problems:
Lack of scientific training: The lack of scientific training in the methodology of
research is a great impediment for researchers in developing nations. There is a
scarcity of capable and experienced researchers. Numerous researchers conduct
research without any prior experience and without a sound grasp of research methods.
So, not all research done is methodologically appropriate.
Some researchers and their guides, due to a lack of specific research training, treat
research as a scissors-and-paste job without doing actual analysis of the collected
material. The outcome is that the results of such research do not
reflect reality. Thus, irresponsible research underlines the need for a systematic
study of research methodology. A researcher must be properly equipped with all
methodological aspects of research prior to starting any research project. As such,
efforts should be made to provide short-duration intensive courses for meeting the
requirement of a systematic research methodology.
Insufficient interaction: The lack of interaction among various research and
non-research organisations causes problems for researchers. There is inadequate
communication between individual researchers/university research departments
on one side and business organisations/government departments/research
institutions on the other side. Due to the lack of interaction and proper contacts among
researchers, a large amount of primary data remains unused.
In order to overcome this problem, efforts should be made to develop a satisfactory
link among all concerned (institutions, organisations, researchers, etc.) for better
and realistic researches. Certain systematic mechanisms, such as university-
industry interaction programmes, must be developed so that researchers can get
ideas and encouragement from the experienced practitioners.
Lack of secrecy: The lack of confidentiality about the usage of information often
creates problems for researchers. Business organisations in a developing country
do not have much confidence that the information they share with researchers
will not be misused. Information secrecy is very important for any
business organisation. This makes business organisations reluctant to share information
and proves a barrier to researchers. Consequently, researchers must first
build confidence among business organisations that the
information/data obtained from them will not be misused.
Identification of research problems: Researchers often face the problem of
appropriate identification of research problems. The absence of adequate
information often makes researchers choose research problems and conduct
research studies that overlap with previous research. This results in
duplication and wastes resources. The solution to this problem is creating
a proper list of subjects, with research problem topics, and the places where the
research is done. Such lists need to be revised and updated at regular intervals and
made available to all prospective researchers for appropriate identification of
research problems.
Lack of assistance: Researchers often face the problem of the absence of support
in terms of time, funds and proper direction for research. It leads to unnecessary
delays in the completion of the research studies. This difficulty can be lessened by
providing sufficient and timely assistance to researchers.
Lack of resources: Deficiency of resources leads to the wastage of energy and
efforts of researchers. At many places, the functioning of libraries is not satisfactory
and researchers have to spend a lot of their precious time searching for books,
journals and reports. In many libraries, especially those away from cities and
state capitals, it is difficult to obtain copies of old acts/rules, reports and other
government journals and publications. This creates a big obstacle in research
work.
Code of conduct: There is a lack of pre-defined code of conduct for researchers.
This, sometimes, results in inter-university and inter-departmental rivalries.
Thus, it is required to develop a code of conduct for researchers which, if obeyed
sincerely, can solve this problem.
a common sense, but it is also right to say that ethical norms vary according to
individuals. Hence, different people may interpret ethical norms in different ways.
For example, if an experimental research involves children as respondents, then
the parents of the children must be informed about the same and prior permission
should be taken. If parents are not informed and their consent is not gained, the
research would be deemed as unethical.
The objectives which underline the necessity of adhering to ethics in a research are
as follows:
Ethics in research should be obeyed to protect the interests of participants involved
in a research.
Ethics in research should be followed to make sure that research is carried out in
a manner that serves interests of individuals, groups and/or society as a whole.
Ethics in research helps scrutinise specific research for its ethical soundness keeping
in consideration issues, such as management of risk, protection of confidentiality
and process of informed consent.
The primary ethical values related to research are shown in Figure 3:
[Figure 3: Ethics in Research, comprising Honesty, Objectivity, Integrity, Confidentiality and Social Responsibility]
Confidentiality: It requires that secret information used in the research, such as military secrets,
papers and personnel records, should be kept
private.
Social responsibility: It implies that a researcher should try to increase social welfare
through his/her research study. In addition, the researcher should not harm
society and environment in any way while conducting research. For example, if
the research is related to animals, the researcher should give them proper care and
respect.
A researcher should adhere to the following five major principles of research ethics:
1. Do good (Beneficence)
2. Do no harm (Non-maleficence)
3. Obtain informed consent from research participants
4. Do not use deceptive practices
5. Research participants should have the right to withdraw from the research at any
point of time
As ethical norms and standards are important for research, many universities and
government organisations, such as the National Institutes of Health (NIH), the National
Science Foundation (NSF) and the Food and Drug Administration (FDA), have adopted
and implemented rules and procedures related to research ethics.
1.2.8 MANAGERS AND RESEARCH
Managers equipped with the basic knowledge of research are at an advantage as
compared to those managers who do not have any idea about it. Managers of an
Defining Research Problem → Reviewing Literature → Formulating Hypothesis → Designing Research → Collecting Data → Analysing Data → Preparing Reports
Figure 4: Fundamental Steps of a Research Process
The steps of a research process are closely interrelated. It is not essential to follow
the research steps in strict order. However, this order of steps provides a useful
guideline to the researcher. These steps are discussed as follows:
Step 1: Defining Research Problem: The first step refers to the identification
of a problem whose solution can be attained by research. In simple terms, a
research problem means the matter that the investigator/researcher wants
to investigate. At this stage, the researcher usually feels confused and doubtful.
Research comes into existence through the efforts made by the researcher to resolve
doubts and confusions. Basically, two steps are involved in defining a research
problem:
• Knowing the problem correctly
• Expressing the problem in meaningful terms
Step 2: Reviewing Literature: The second step refers to a way of developing a
proper understanding of the research problem. Usually, two types of literature
may be reviewed by a researcher, i.e., conceptual literature, which comprises
thoughts and presumptions, and empirical literature, which comprises empirical
studies done earlier on the same or a similar topic. It is important for a researcher to
review the literature properly to achieve the following:
• Develop and refine research ideas
• Improve subject knowledge
• Clarify study questions
• Focus on research possibilities that have been disregarded or overlooked
• Avoid repeating work that has been done previously
• Find out and provide an insight into research advances, tactics and methods
Step 3: Formulating Hypothesis: The third step relates to a tentative hypothesis
made by the researcher about the expected result of the research. It provides the crucial
point for the research and helps the researcher be on the right track.
Step 4: Designing Research: The fourth step is deciding the type of research
design that should be followed for conducting the research study. The research
design selected is based upon the type of research problem and the scope of the
research study undertaken. The preparation of the research design enables the
researcher to yield maximal information from the research conducted.
Step 5: Collecting Data: The fifth step relates to assembling information, which
is crucial for any research study. There are essentially two types of data: primary
data and secondary data. Primary data is collected by testing or investigations. In
case of investigations, data can be collected through:
• Observation
• Interviews
• Telephonic interviews
• Feedback forms
• Agenda
• Questionnaires
Secondary data relates to information that has already been collected by
some other researcher. Examples of such data are biographies, diaries, records and
published material. In order to complete a research study successfully, accurate and
suitable information is essential.
Step 6: Analysing Data: The sixth step of the research process is transforming
and refining data to highlight useful information. There are various
methods to present and analyse the data, such as tabulation, bar diagrams and pie charts.
Statistical techniques, such as correlation, regression and time series analysis, are also used
for data analysis. After the data analysis, the researcher is in a position to test the
hypothesis formulated in step three. The researcher can check the validity of
the hypothesis by using several statistical tests, such as the Chi-square test, t-test and
F-test.
Step 7: Preparing Reports: The seventh step of the research process is the last stage,
in which the researcher presents the complete work done in a report.
Report writing should be done with great care by keeping in
view the proper layout of report.
The main text of a report should include:
• Preface
• Summary of whatever the researcher has found
• Main report
• Conclusion
At the end, proof tables, questionnaires, and other documents used in the research
study should be given in the form of appendices. The research report also needs to
contain a bibliography, i.e., a list of literary material consulted by the researcher.
All the seven steps of a research process mentioned above are discussed in detail
in further chapters.
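Step 6 names correlation, the t-test and the Chi-square test without showing how they are computed. The following is a minimal Python sketch of the three statistics using only the standard library. The function names and the sample data are assumptions made for illustration. The t statistic shown is the pooled-variance (Student's) two-sample form, which is one of several variants of the t-test.

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t_statistic(a, b):
    """Two-sample t statistic, assuming equal population variances."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def chi_square_statistic(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (count - expected) ** 2 / expected
    return stat


# Hypothetical 2 x 2 survey table: rows are two groups, columns are yes/no.
table = [[20, 10], [10, 20]]
chi2 = chi_square_statistic(table)

# For a 2 x 2 table (1 degree of freedom) the 5% critical value is 3.841.
print("Chi-square:", round(chi2, 3), "reject H0 at 5%:", chi2 > 3.841)
print("t statistic:", round(pooled_t_statistic([1, 2, 3], [4, 5, 6]), 3))
print("r:", round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))
```

In practice, a researcher would usually rely on a statistical package to obtain the p-values as well. The sketch only shows what each statistic measures.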
Self Assessment Questions
8. ____________ provides the crucial point for the research and helps the
researcher be on the right track.
9. The preparation of a research design enables a researcher to yield minimal
information from the research conducted. (True/False)
10. Report writing should be done with great care by keeping in view the proper
__________ of report.
Activity
Make a list of 10 research companies running in India. Analyse their process of
research and prepare a short report with the information collected.
Objectivity: It implies that the researcher ought not to be biased in the research
plan, data collection, interpretation, analysis and other aspects of
research.
Hypothesis: It is a proposition made on the basis of limited evidence for further
investigation.
Simulation: It is a scientific modelling of natural systems with an aim to understand
their functioning.
Honesty: It refers to the truthfulness with which the researcher collects and
presents data.
In order to clearly and deeply understand the topic, the company surveys and
reviews already available research papers and theses, which include conceptual as
well as empirical literature.
It also takes help from various books and journals. This in-depth review and survey
of the available material enables Survey Limited to develop a clearer understanding for
formulating the research hypothesis. Survey Limited formulates its research hypothesis
as:
Westernisation and changing lifestyle are responsible for the increasing use of
alcoholic beverages.
After formulating its research hypothesis clearly, Survey Limited chalks out a complete
research design within which the research would be carried out. It collects primary
as well as secondary data from books, journals, observation and personal
interviews. The collected data is then analysed critically using various statistical
tools, such as bars, pie charts, tables, and time series. Survey Limited presents a final
report of its work that also includes strategies to reduce the effects of alcohol.
QUESTIONS
1. What did Survey Limited do for a clear and deep understanding of the research
topic?
(Hint: The company surveyed and reviewed already available research papers
and theses)
2. How did in-depth review and survey of the available material help Survey Limited
in conducting research?
(Hint: To develop a clearer understanding for formulating research hypothesis.)
3. What tools were used to analyse the collected data?
(Hint: Various statistical tools, such as bars, pie charts, tables, and time series.)
4. State the research process followed in Survey Limited.
(Hint: Defining the topic of research, reviewing literature to gain more
understanding about the topic, and so on)
5. What sources were used by Survey Limited for data collection?
(Hint: Primary as well as secondary sources)
1.7 EXERCISE
1. Explain the characteristics of a good research.
2. What are the objectives of research?
3. Discuss the importance of research.
4. What ethical norms are required to be followed while conducting research?
5. What are the problems encountered by a researcher in the research process?
6. What are the applications of research in various fields of management?
7. Explain the research process in detail.
8. Explain various forms of research.
2. observation
3. False
4. inferential
5. marketing
6. False
7. True
9. False
10. layout
Chapter 2: Defining and Formulating a Research Problem
Table of Contents
2.1 Introduction
2.2 Management Dilemma
2.1 INTRODUCTION
In the previous chapter, you studied the concept of research. The chapter
discussed the characteristics and types of research. The latter section of the chapter
described the problems encountered by a researcher. The chapter concluded with
the explanation of the research process.
answer to the problem, the researcher conducts a literature review to gain an idea of
previous research on a similar topic or area of study. A literature review helps in
developing the knowledge base, outlining the research questions and identifying the findings of
previous or existing research conducted by other researchers on the same topic
of study. It is important to find out the exact problem faced by the management before
conducting research, as it is correct to say that a problem rightly explained is half
solved. If the researcher has recognised more than one problem, then the selection
of the problem must be done on the basis of priority, financial condition and time limit.
Researchers must familiarise themselves with the selected problem by studying the
available literature.
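The selection rule mentioned above (priority, financial condition and time limit) can be sketched as a simple filter followed by a ranking. The following Python sketch is only an illustration. The CandidateProblem class, the select_problem function, the priority scale and all figures are assumptions introduced here and do not come from the text.

```python
from dataclasses import dataclass


@dataclass
class CandidateProblem:
    title: str
    priority: int   # 1 = most urgent (hypothetical scale)
    cost: float     # estimated cost of the study
    months: int     # estimated duration of the study


def select_problem(candidates, budget, deadline_months):
    """Pick the most urgent problem that fits the budget and the time limit."""
    feasible = [
        c for c in candidates
        if c.cost <= budget and c.months <= deadline_months
    ]
    if not feasible:
        return None
    # Rank by priority first, then prefer the cheaper and shorter study.
    return min(feasible, key=lambda c: (c.priority, c.cost, c.months))


# Hypothetical candidate problems recognised by a researcher.
candidates = [
    CandidateProblem("Declining sales in region A", priority=1, cost=900_000, months=10),
    CandidateProblem("Rising procurement costs", priority=2, cost=300_000, months=4),
    CandidateProblem("High staff turnover", priority=3, cost=200_000, months=3),
]

chosen = select_problem(candidates, budget=500_000, deadline_months=6)
print("Selected problem:", chosen.title if chosen else "none feasible")
```

Here the most urgent problem is set aside because it exceeds both the budget and the time limit, so the next most urgent feasible problem is chosen.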
In this chapter, you will study the management dilemma. Next, you will learn about
the literature review, its functions and process. Further, the chapter will describe
the concept of a research problem. Towards the end, the chapter will briefly discuss
formulating a research problem.
means ‘proposition’ or ‘subject’. Let us assume that Priyamvadha went to the Pacific
Mall. Now she has to choose between a red dress and a blue dress. Here, we cannot say
that she is in a dilemma, but if a fire breaks out on her floor of the residential building
and her cat and dog are inside the room, and she can save only one of them, then this
can be considered an awful dilemma.
Therefore, we can say that a management dilemma is a complex situation faced by
executives or managers when they have to achieve two or more goals at a particular
time. It becomes difficult for them to prioritise one goal over the others. Executives
and managers of an organisation are likely to face management
dilemmas on a regular basis. For example, the marketing head of XYZ Organisation
is in a dilemma because a few months ago, a competitor announced
that it will launch a new product very soon, which is now under the development
stage. XYZ Organisation has also publicly announced that it will launch the same type of
product. The launch of the new product is in 2 months, and the development team
informs the marketing head that its version of the product will not be up to
the standard of its competitor’s. It needs more than a year to create
a product matching the competitor’s standards, and it will be too late to start
afresh. This situation forces the marketing head to decide
whether to launch the product or postpone it. The field of business management
is full of such problems called management dilemmas that need detailed business
research and study.
A management dilemma is usually the symptom of a problem that requires a business
decision, which can be related to:
Increase in the overall costs
Decline in the sales
A good literature review provides a clear picture of the current knowledge on the
research subject. The objectives of a literature review are:
To conduct a survey in the area or subject of study
To synthesise the information into a summary
To perform a critical analysis of the collected information by recognising the gaps
in the existing knowledge
To show limitations of research theories and review controversial areas
To present the literature in a proper manner
Let us understand the importance of a literature review.
Literature review also helps broaden the knowledge base in the related research
area in which researchers wants to study.
Literature review helps in finding the researcher’s present knowledge for
conducting the study.
Literature review helps identify the experts on a researcher’s topic of study. For
example, if any person has written 20 articles on a research topic related to your
research subject, then he is likely to be knowledgeable about that topic. This
person’s written work could be a key resource for consultation in your research.
Literature review helps in avoiding duplication and plagiarism.
Literature review helps make a comparison between the findings of the researcher
and others.
Now, let us understand the functions of a literature review:
research subject and establishes the relation between what a researcher is proposing
to examine and what he/she has already studied.
Figure 1: Functions of a Literature Review
topic which helps them in narrowing down or limiting it and expressing it in
the form of a research question. The research hypothesis is constructed using the
research question. Therefore, a literature review helps in formulating the research
hypothesis.
Develop a theoretical
framework
review
e. Explain the recent advances and current trends in the field of research
f. Compare and evaluate findings based on:
i. Assumptions of research
ii. Theories related to the topic of research
iii. Hypotheses
iv. Research designs applied
v. Variables selected
vi. Potential future work speculated by the researchers
g. Acknowledge, cite and quote sources of research. Give credit to the works
of other researchers. Quote their work to show how research contradicts
or contributes to their work. This will make the literature review more
comprehensive and precise.
to write a literature review. How should a researcher go about it? Some strategies for
writing a literature review are shown in Figure 3:
[Figure 3: Find a Focus, State the Focus, Present Information]
State the focus: The researcher writes a simple statement in the literature review
that tells readers what to expect. Some examples are as follows:
• The current trend in treatment for cancer combines surgery, medicine and
natural healing.
• Popular media is acquiring academic consideration.
Present information: The researcher organises the information to present in the
following way:
• Cover the basic categories: A literature review contains the following three
basic categories:
  - Introduction: It gives a quick idea of the topic of literature review, such as
    central theme.
  - Body: It contains discussion of sources. It can be organised chronologically,
    thematically or methodologically (discussed further).
  - Conclusions/recommendations: It provides the conclusion the researcher
    has drawn from reviewing literature.
• Organise the body: Once the researcher has the basic categories in place,
consider how to organise the sources within the body of the review. Table 1
shows the ways to organise sources of a literature review:
a suitable evidence.
• Selectively highlight only the most important points in each source. Your
points must directly relate to the review’s focus.
• Avoid using any direct quotes. This is because the survey nature of the literature
review does not allow for in-depth discussion or detailed quotes from the text.
However, if you do want to use quotes to emphasise a point, then use short
quotes sparingly.
• Summarise and synthesise your sources within each paragraph and throughout
the review.
• Maintain your own voice by starting and ending a paragraph with your own
ideas and own words.
• When paraphrasing a source that is not your own, remember to represent the
author’s information/opinions accurately and in your own words.
Revise or review information: Finally, the researcher must revise the review.
Make sure that it follows the outline. Rewrite the language of review to present
information in the most concise manner possible. Avoid unnecessary jargon
or slang; use familiar terminology. The researcher must verify that sources are
documented and format the review appropriately.
Ensure efficient and focussed literature review and other studies
Keep the research centred around the problem
Usefulness and significance: The practical usefulness of a problem is also a major
motivation for a researcher to attend to it.
Timelines of the problem: Some problems take little time to be resolved, while
others take a considerable time. So, the time taken to complete research work is
also an important criterion to select a problem.
Data availability: A researcher would select a problem, which has sufficient and
relevant data available.
Novelty: If a problem is around a current topic of interest, then it is more likely
to be picked up for research. Any findings would invite immediate publicity and
funding for the researcher.
3. Evaluating the research problem: The third step in formulating a research problem
is to evaluate it in terms of originality, importance and feasibility. These factors are
discussed as follows:
• Originality: The research problem should be unique. Any topic on which a
lot of research has already been done should be avoided because it would be
difficult to highlight anything new in that topic. However, in some cases, you
may decide to research a previously researched topic to verify its conclusions,
explain and elaborate the conclusions in a more effective manner, and solve
some of the inconsistencies of the previous research.
• Importance: The research study should be significant enough to either become
the basis of any new theory or pose some problems for further research. In
addition, the research study should also have some practical applications.
• Feasibility: This refers to the chances of conducting successful research. You
should take up a problem that is feasible for you to research. A
research problem may not be feasible because of the following reasons:
  - Lack of skills and competencies of the researcher
  - Lack of interest and enthusiasm of the researcher
  - High cost involved in the research study
  - Time constraint
  - Administrative constraints, such as lack of cooperation from administrative
    authorities
Self Assessment Questions
7. A well-formulated __________ makes the research process easier and more
focussed.
8. The first step of formulating a research problem is to identify the variables.
(True/False)
9. __________ is the chief motivation to select a research problem.
10. Formulating a research problem is a __________-step process. Choose the
correct answer.
a. three b. four
c. five d. six
2.5 SUMMARY
A dilemma is referred to as a tough choice in a complicated situation where
managers have to choose between more than one alternative. The word ‘dilemma’
is created by combining prefix ‘di’ and suffix ‘lemma’, where ‘di’ means ‘double’
and ‘lemma’ means ‘proposition’ or ‘subject’.
A management dilemma is usually the symptom of a problem that requires a
business decision, which can be related to an increase in the overall costs, decline
in sales, conflicts among employees, etc.
A literature review is a document that is prepared after conducting search and
evaluation according to the subject or chosen topic area. It examines the published
information related to the particular subject area about which the writer is writing.
A good literature review provides a clear picture of the current knowledge on the
research subject. Conducting literature review is just like doing homework and
getting an idea about the topic in advance.
Literature review also helps broaden the knowledge base in the related research
area in which researchers want to study.
A literature review gives the theoretical background of the research subject.
The steps to conduct the literature review process include searching the existing
literature in your field of interest, reviewing the literature obtained, developing a
theoretical framework, and writing up the literature review.
Literature review sources can be divided into two categories: primary sources and
secondary sources.
The primary sources are original and provide the first-hand information. The
secondary sources are non-original and provide the second-hand information.
A research problem is referred to as a statement which is about an area of concern,
a condition that needs improvement, a difficulty to be eliminated, or a troublesome
query that exists in scholarly literature, in theory, or in practice that requires
meaningful understanding and deliberate investigation.
It is important to formulate a research problem carefully to indicate what you
intend to achieve through research.
faced a reduction in sales during the harsh economic times with 2.6 percent reduction
in store visits. Researchers used Management-Research Question Hierarchy
(MRQH) for finding the management’s dilemma. MRQH is a process of sequential
question formulation that helps researchers find solutions to a specific situation or
management dilemma.
It was found that during the initial 5 months, there was a drop of 82.8 million in
customer visits when Walmart’s competitors like the Dollar General Corp and the
Kroger Co. increased their sales. This was identified and defined as the existing
problem which was required to be solved promptly. Walmart was required to make
sure that its stores address all the existing demands of their customers for ensuring
customer retention.
Various solutions suggested by the researchers for Walmart stores were as follows:
Management must recreate the organisation’s leadership in terms of price and
delivery as per the customer’s needs.
Management must focus on delivering high-quality products at reasonable or
reduced prices in every season.
Management must emphasise on offering a different range of products to the
customer and offer more choices.
Source: https://ivypanda.com/essays/wal-marts-management-dilemma/
QUESTIONS
1. What dilemma was faced by the management of Walmart?
(Hint: Sales reduction, increase in competitor’s sales)
2.8 EXERCISE
1. What do you understand by the term ‘management dilemma’?
2. What is the importance of conducting a literature review in research?
3. Define research problem.
4. Explain the conditions and components of a research problem.
CHAPTER 3
Research Design
Table of Contents
3.1 Introduction
3.2 The Concept of Research Design
3.1 INTRODUCTION
In the previous chapter, the use of research for handling management dilemma has
been discussed. The chapter discussed the importance, functions and process of a
literature review. The chapter next described how to write a literature review and
the types of sources for review. Further, the need of defining a research problem,
conditions and components of a research problem have been discussed. The chapter
concluded with an explanation of formulating a research problem.
The preparation of the design of any research project, generally known as a research
design, is one of the crucial stages for the success of a research project. A research
design is a blueprint for the collection, measurement and analysis of data, and it is
followed as a guide during the complete research study. The overall purpose
of any research is to seek an answer to a research problem. The successful completion
of any research project depends on how well its research design fits its research
problem.
This chapter will help you in understanding the concept of research design. You will
study the need and features of a research design. Further, various types of research
design are also discussed. Towards the end, you will learn about the components of
research design.
research using a particular methodology. It combines various components and data
to arrive at a feasible outcome.
The decisions concerning what, where, when, how much, and by what means
regarding an investigation or a research study constitute a research design. Some
definitions of a research design by different experts are given as follows:
According to David J. Luck and Ronald S. Rubin, “A research design is the determination
and statement of the general research approach or strategy adopted for the particular project.
It is the heart of planning. If the design adheres to the research objective, it will ensure that
the client’s needs are served.”
According to Green and Tull, “A research design is the specification of methods and
procedures for acquiring the information needed. It is the overall operational pattern or
framework of the project that stipulates what information is to be collected from which source
by what procedures.”
In other words, a research design is a complete guide and provides answers to the
following questions:
What is the research all about?
Why is the research required?
and transparency of the collected data which is further analysed and interpreted
to get information.
Validity: A research design should specify a measuring device or
instrument that measures only what it is expected to measure. For instance,
an intelligence test conducted to measure the Intelligence Quotient (IQ) should
measure only the intelligence and nothing else. The questionnaire for IQ test shall
be framed accordingly.
Adequate information: A research design should provide adequate information
so that the research problem can be analysed on a wide perspective. A perfect
research design should consider the following important factors:
• The exact research problem to be studied
• The main purpose of the research
• The procedure of finding information
• The accessibility of adequate and skilled manpower
• The availability of enough financial resources for carrying research
Generalisability: This implies how best the data collected from the samples
can be utilised for drawing certain generalities, which will be relevant to a large
group from which the sample is drawn. Therefore, a research design helps
researchers generalise their findings provided that due care is taken in defining
the population, selecting the sample, deriving appropriate statistical analysis, etc.,
while preparing the research design. A research problem to be generalised should
have the following characteristics:
• The problem should be clearly formulated.
So, for an exploratory research, the research design should be flexible to accommodate
continuous changes. On the other hand, if a research is diagnostic, then the flexible
research design is not appropriate because this type of research demands precision,
accuracy, minimum bias and reliability. Therefore, the research design must be rigid
(not flexible) in this case.
Prior to deciding the research design of a particular type of research, the following
questions must be asked:
What is the nature of the problem of research to be conducted?
Which technique of data collection and analysis would be used in conducting the
research?
Which situations are required to be applied to the selected method of data collection
and analysis?
Depending on the type of research study to be conducted, the types of research
design are shown in Figure 1:
conducts an exploratory study when some facts are known about a problem or
situation and there is a need to know more about it. The key emphasis in such studies
is on the discovery of ideas and insights.
For example, a restaurant chain might undertake an exploratory study to find out
different ways that can be used to improve the quality of customer service in its
restaurant chain without making any major investments. The researcher, in this
case, initially will have only a little information regarding the current status of
quality of customer service for which the researcher wants to conduct a research.
Such information can be gained by exploratory study only. The researcher along
with the research team formed for the purpose may interview the existing customers
of the restaurant, review the available literature and consult experienced people in
the field.
Therefore, in the descriptive and diagnostic studies, the primary requirement of the
research design is the clarity of objectives. It means that the researcher should be
clear about the type of study undertaken and the reasons behind the study. After
that, the techniques of data collection should be selected.
There are various methods of data collection, such as interviews, observations and
questionnaires. The researcher should select any of these methods according to the
research study requirement, but the collected data should be free from any bias
and ambiguity. It is also good to ensure that the data collection method used
would result in the least number of errors.
The time and place of data collection should also be chosen carefully. For instance, if
the researcher wants to survey the effects of recession, the data of only the recession
period is to be considered. In the same manner, if the researcher wants to survey the
effects of water scarcity on the lives of people, then the researcher should approach
those areas that face acute water shortage. Thus, the time and place of data collection
require discretion on the researcher’s part.
The collected data must be properly analysed by using proper statistical and
software tools. Finally, the report of the study is presented in detail. The report must
be presented in a simple and planned manner to explain the findings to the people
concerned in an effective way.
Overall design: It is framed with rigidity to protect against bias and maximise
reliability. So, it has a rigid design.
Sampling design: It follows a probability sampling design.
Statistical design: A pre-planned design for analysis is used.
Observational design: Well-thought and structured data collection instruments
are used.
Operational design: In this, advanced decision about operational procedures is
taken.
Principle of local control: It aims to reduce the experimental error by conducting
the experiment more efficiently. As per this principle, the extraneous factor, a
known source of variability, is made to vary purposely over as wide a range as
needed. This is done so that the variability it causes can be measured and
eliminated from the experimental error.
There are multiple ways to categorise experimental research designs. A basic way to
categorise them is as follows:
Formal experimental research designs: These designs use comparatively more
refined and precise forms of data analysis.
Informal experimental research designs: These designs use less sophisticated
forms of data analysis.
Another common way in which they are categorised is:
Basic designs: Basic designs refer to those designs that include only one
independent variable. The main types of basic designs are shown in Figure 2:
• Pre-test–Post-test Control Group Design
• Post-test-Only Control Group Design
Figure 2: Types of Basic Designs
In such an experiment, the changes that are observed in the values of the
dependent variable in the experimental group (O2 – O1) arise as a result of the
treatment. Here, it might happen that there is a difference between the control
group’s score, i.e., (O4 – O3). The difference of O3 and O4 is the change in the
value of the dependent variable that may occur even in the absence of any
treatment.
• Post-test-only control group design: In a post-test-only control group design,
the researcher randomly assigns subjects to the experimental and control
groups. In such a design, the pre-test is not administered. The experimental
group is exposed to a treatment, whereas no treatment is administered to the
control group. Table 2 presents the symbolic representation of the post-test-
only control group design:
The post-test-only control group design is used for research where it is not
possible to administer a pre-test due to any (ethical/practical)
reason. The main benefit of this design is that it is very simple to implement
and has a low error propagation percentage. The main disadvantage of this
design is that it is highly vulnerable to threats to internal validity.
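The comparison made in the pre-test–post-test control group design can be sketched in a few lines of Python. This is only an illustration: the function name and the scores are hypothetical. It assumes the usual reading of the design, in which the effect of the treatment is the change in the experimental group (O2 – O1) minus the change in the control group (O4 – O3).

```python
def treatment_effect(o1, o2, o3, o4):
    """Estimate the treatment effect in a pre-test-post-test control group design.

    o1, o2: mean pre-test and post-test scores of the experimental group
    o3, o4: mean pre-test and post-test scores of the control group
    """
    experimental_change = o2 - o1  # change observed with the treatment
    control_change = o4 - o3       # change that occurs without any treatment
    return experimental_change - control_change


# Hypothetical mean scores, for illustration only
effect = treatment_effect(o1=50, o2=65, o3=50, o4=54)
print(effect)  # 11
```

Subtracting the control group's change removes the part of the change that would have happened even without the treatment.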
Statistical designs: Statistical designs refer to those experimental designs in which
there are two or more independent variables. The main types of statistical designs
are as follows:
• Completely randomised (C.R.) design: The C.R. design refers to the design
in which there is random assignment of subjects (experimental units) to
treatments. Out of the three basic principles of experimental design, this
design includes only two (the principle of randomisation and the principle
of replication). In complete randomisation, every subject carries an equal
probability to be assigned to any treatment. For example, if you wish to test
eight subjects under two treatments (A and B), there is an equal opportunity
of every subject to be assigned to any of the treatments. C.R. designs may
be analysed using ANOVA, independent t-test, or non-parametric tests
depending upon the number of treatments. A two-group randomised design is
the simplest form of C.R. design. In this design, two randomisations (selecting
the items randomly), namely random sampling and random assignment, take
place. Random sampling refers to selecting a sample from the population.
Random assignment refers to assigning subjects selected from the sample to
an experimental group and a control group. The diagrammatic representation
of the two-group simple randomised design is shown in Figure 3:
[Figure 3: Two-group simple randomised design. Labels: Population, Random Selection, Sample, Independent Variable, Experimental Group (Treatment A), Control Group (Treatment B)]
experimental group and control group. One group is given training, whereas,
the other group is not. Here, it can be assumed that the group that has received
the training (experimental group) is in a better position as compared to the
other group (control group). This assumption/hypothesis can be tested using a
two-group simple randomised design.
• Randomised block design: In this design, all three principles of experimental
designs can be applied. The randomised block design refers to the design that is
used when you want to eliminate uncontrolled variations. These variations are
caused by a variable called blocking variable or nuisance variable. For example,
a doctor wants to treat a patient with a specifically prepared medicine. In this
case, the nuisance factor may be the time of giving medicine to the patient
or room temperature. These factors affect the outcome but are not of prime
interest to the doctor.
Numerous nuisance variables exist in all experiments. One can eliminate their
effect on the research study by a technique called blocking. For example, in
the study of school students, one can expect homogeneity in the students of
the same class as compared to the students of the entire school in terms of
knowledge and skills. In this case, a class is a block that can help in reducing
variation in the research.
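The difference between a completely randomised design and a randomised block design can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names, the subject labels and the two classes used as blocks are hypothetical. In the first case all subjects are shuffled together, and in the second the shuffling is done separately inside each block.

```python
import random


def completely_randomised(subjects, treatments, rng):
    """Randomly assign subjects to treatments in (near-)equal groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # every ordering of the subjects is equally likely
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


def randomised_block(blocks, treatments, rng):
    """Randomise separately within each block (e.g., each class)."""
    return {
        name: completely_randomised(members, treatments, rng)
        for name, members in blocks.items()
    }


rng = random.Random(42)  # fixed seed only so that the sketch is repeatable

# Eight subjects under two treatments (A and B), as in the text
subjects = [f"S{i}" for i in range(1, 9)]
cr_groups = completely_randomised(subjects, ["A", "B"], rng)
print({t: len(members) for t, members in cr_groups.items()})  # {'A': 4, 'B': 4}

# Hypothetical blocks: two classes of four students each
blocks = {
    "Class 6": ["P1", "P2", "P3", "P4"],
    "Class 7": ["Q1", "Q2", "Q3", "Q4"],
}
rb_groups = randomised_block(blocks, ["A", "B"], rng)
for name, groups in rb_groups.items():
    print(name, {t: len(members) for t, members in groups.items()})
```

Because each class receives both treatments in equal numbers, differences between classes cannot be mistaken for differences between treatments.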
Operational design: This component of a research design deals with the techniques
of carrying out the procedures related to sampling design, observational design
and statistical design.
The important concepts related to a research design which are also useful in framing
various components of the research design are explained through the following
points:
Variable: It refers to a parameter that keeps changing with time and space. The
parameter or the variable can take on different quantitative values. Examples of
the variables are income, expenditure and weight that keep on fluctuating from
time to time. Various forms of variables are as follows:
• Dependent variable: It refers to the variable that can be measured by the
researcher. A dependent variable is affected by the changes in an independent
variable. Researchers measure dependent variables.
• Independent variable: It refers to the variable that causes a change in a
dependent variable. Independent variables can be controlled. Researchers
manipulate the independent variable to measure its impact on the dependent
variable(s).
Activity
Prepare a PowerPoint presentation on different types of research designs and
their usage in real world.
3.6 SUMMARY
A research design is framed with the purpose to ensure that the information
collected from research will enable the researcher to answer the research problem
satisfactorily.
A research design is needed because it enables the smooth functioning of various
research operations.
A research design is needed to plan in advance the availability of staff, time and
money.
A research design should be consistent throughout a series of measurements so as
to provide consistency or reliability.
A research design should provide for the use of a measuring device or instrument
that measures what it is expected to measure.
To conduct the study of reduction in price, the senior manager takes 24 routes and
randomly assigns 8 routes to treatment A (reduction of ₹5), 8 routes to treatment
B (reduction of ₹10) and 8 routes to treatment C (reduction of ₹15). The tabular
representation of the design is as follows:
Table A: Completely Randomised Design Table
Group                 Before treatment   Treatment   After treatment
Group 1 (8 routes)    X1                 A           X2
Group 2 (8 routes)    X3                 B           X4
Group 3 (8 routes)    X5                 C           X6
The preceding table shows the observations made by the researcher before the
treatment, which are termed as X1, X3 and X5 for the different fare reductions. It
also shows the observations made by the researcher after the treatment, which are
termed as X2, X4 and X6.
Thus, by comparing X2 and X1, X4 and X3, X6 and X5, the effect of fare reduction
will be clear to the manager.
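The design in this case can be sketched in Python as follows. The case does not state what is observed on each route, so the daily passenger counts below are hypothetical figures used only to show the before-and-after comparison. The variable and function names are also assumptions made for this sketch.

```python
import random

rng = random.Random(7)  # fixed seed only so that the sketch is repeatable

# Randomly assign the 24 routes to three treatments, 8 routes each
routes = [f"Route {i}" for i in range(1, 25)]
rng.shuffle(routes)
assignment = {"A": routes[0:8], "B": routes[8:16], "C": routes[16:24]}


def mean_change(before, after):
    """Average of (after - before) across the routes in one group."""
    return sum(a - b for b, a in zip(before, after)) / len(before)


# Hypothetical daily passenger counts for the 8 routes of Group 1
x1 = [200, 210, 190, 205, 195, 220, 180, 200]  # before treatment A
x2 = [210, 225, 200, 215, 205, 235, 190, 210]  # after treatment A
print(mean_change(x1, x2))  # 11.25
```

Repeating the same calculation for X3 and X4, and for X5 and X6, gives one average change per fare reduction, which the manager can then compare.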
QUESTIONS
1. How many bus routes were considered by bus transport organisation’s senior
manager for conducting bus fare reduction study?
(Hint: 24 routes)
2. Which of the designs should be considered better: randomised block design or
completely randomised design? Give reasons.
3.9 EXERCISE
1. What is a research design? List the questions asked to structure a complete research
design.
2. Explain the needs of a research design.
3. What are the features of a research design? Explain.
4. Explain the components of a research design.
5. Discuss the significant concepts used in framing various components of a research
design.
6. What do you mean by research design for exploratory research studies?
7. Explain the research design for descriptive studies.
8. Describe the research design for experimental studies.
2. True
8. laboratory
The Components of Research Design 9. Sampling design
10. False
CHAPTER 4
Sampling
Table of Contents
4.1 Introduction
4.2 Concept of Sampling
4.1 INTRODUCTION
In the previous chapter, you studied the concept of research design. The chapter
discussed the features and types of research design. The chapter concluded with the
components of research design.
Sampling is the process of choosing a subset of subject matter or units from the
whole population related to the area of study for the purpose of conducting research.
Researchers use the sampling method when it is not feasible to study every single
element of the target population. Population refers to the collection of elements,
individuals, items and objects about which the researcher desires to collect the
information. Population can be finite or infinite. The population is finite when it has
a fixed number of items or elements; for example, the number of people working
in an organisation or the number of students in a school. The population is infinite
when it has no fixed number of items or elements and the researcher has no clue or
idea regarding the number of items or elements, for example, the total number of
The researcher must make a methodological plan for obtaining a sample from the
target population. This plan is called sample design and the number of items or
elements in the sample is known as the sample size. The researcher
can use the census method or sample survey method for collecting information, but
the sample method may result in inaccuracy or error, which is called sampling error.
In this chapter, you will study the concept of sampling and how to determine a
sample size. The chapter will also describe the errors in measurement and sampling
errors. Towards the end, the chapter briefs about the methods of sampling.
Figure 1 shows how samples are taken from the population:
[Figure 1: A sample drawn from within the population]
According to Goode and Hatt, “A sample, as the name implies, is a smaller representation
of a larger whole.”
Let us understand the population and sample with the help of some examples:
1. Manyata is a professor of psychology in the University of Delhi. She is interested in
studying the level of stress that B. Tech. students encounter during finals. Manyata
is planning to conduct a sample survey and send it around the finals time to some
students for ranking their level of stress during finals on a scale of 1 to 5. Manyata
needs to select students for conducting her survey.
Once the students are selected for the survey, the final number of students is called
the survey sample.
2. Dr. Suyash, the chancellor of a university, wants to collect the feedback of students
on a grading system. It is not practically possible to take the feedback from each
and every student. A sample shall be chosen and based on it, the general feedback
would be considered. A representative sample results in improved
exactness and accuracy of results.
Researchers may use two primary methods of data collection, i.e., the census method
and sampling method. Let us understand these methods in detail.
the target population which is called sample. For selecting a sample, researchers
must determine the population. Once a researcher recognises the target population,
a sample must be selected. Table 1 shows the difference between census and sample
survey:
be sampled. For example, Geeta takes 4 schools near to her house in her sampling
frame for conducting her study.
3. Define sampling units: The next step is to define the sampling units. It is splitting
up the population in parts, for example, if a researcher wants to survey the entire
nation, the sampling unit will be states, districts, blocks and villages.
4. Specify the techniques of sampling: The next step is to choose the technique
of sampling. There are two types of sampling techniques, i.e., probability and
non-probability techniques. When the sampling frame is the same as the target
population (approximately), the researcher must use a random sampling technique
for choosing the sample. But when the sampling frame is not representing the
target population, the researcher must select a non-random sampling technique.
5. Determine the sample size: The number of elements/items that a sample has
is called the sample size. A researcher should decide the sample size carefully.
It should be neither too large nor too small. Before selecting a sample size, the
following points should be considered:
• Flexibility: The sample size should have the ability to adapt to changes to
some extent when required.
• Representativeness: The sample should represent the whole population.
• Precision: The sample should deliver the desired level of accuracy.
• Reliability: The sample should be free from errors.
• Population variance: The deviation in the items of the population is called
population variance.
Choose the sample size wisely to account for the population variance. For an extremely
diverse population, choose a large sample; a smaller sample is sufficient when there
is little diversity in the population.
6. Implement the sampling plan: The last step is to implement the sampling plan
after identifying the target population, choosing sampling frame, specifying
sampling technique and determining the sample size.
homogeneous, a small sample can be taken for representing the behaviour of the
entire universe or population. When the universe or population is heterogeneous
(dissimilar) in nature, samples must be selected from each heterogeneous unit.
Class intervals: If a large number of class intervals is to be created, then the
sample size should be larger, as it has to represent the whole population. With a
small sample, there is a chance that some classes will not be represented at all.
Research study nature: The sample size depends on the researcher’s study. For
an intensive study conducted for a long duration, large samples are selected. But
for technical study, the selection of a large number of respondents may result in
increasing complexity while collecting information.
Gross errors: These errors are physical errors in the analysis, calculation and
recording. Gross errors occur due to human errors that lead to inconsistencies in
the research data. When the researcher studies or records incorrect or different
values from the data, these errors occur. These errors are predictable in nature and
can be corrected by reviewing and revisiting the research report.
Random errors: This error is random and unpredictable in nature. Random errors
occur due to a large number of variables that are beyond the researcher’s control
and affect the outcome of the study. Random errors are of two types: sampling
errors and non-sampling errors.
The researcher must be able to recognise the sources of errors in the measurement
and should minimise them. Some important reasons that may cause errors are:
Errors due to the interviewer’s attitude: These errors occur because of the biased
attitude of the interviewer. The interviewer can encourage or discourage certain
viewpoints of respondents by rephrasing questions.
Errors due to respondent’s reluctance: These errors occur due to the reluctance of
respondents to respond to questions. The respondents may feel reluctant to answer
questions correctly because of fatigue, hunger or ill-health. The respondents may
also commit errors because of the lack of knowledge.
Errors due to ineffectiveness of the instrument: These errors occur because of
the ineffective measuring instrument, such as a questionnaire. If the questionnaire
It is a statistical error that occurs when a researcher does not select a sample that
represents the entire population of data and the results from the sample are not
applicable to the entire population. Sample statistics may have a value close to or
exactly the same as that of the entire population. For example, if a researcher wants
to analyse the average production of wheat in a village during a specific year, then
the researcher needs to find out the target population. In this case, the population
will be the farmers. There are 8,000 farmers in the village who produce wheat. Out
of these 8,000 farmers, the researcher selects 800 farmers and calculates the average
production of wheat based on the data (figures) given by 800 farmers. The average
obtained from the sample will almost certainly differ slightly from the true average
for all 8,000 farmers. This difference is commonly known as sampling error. Such
errors arise because only a section of the population has been selected as a sample.
A sample cannot be expected to reproduce the population value or the real trend
exactly, but these errors can be brought down with a better sampling design.
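The wheat example can be illustrated with a short simulation. The Python sketch below is only an illustration: the production figures are randomly generated (an assumption, not real data), and the names population, sample and sampling_error are hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical wheat output (in quintals) for each of the 8,000 farmers.
population = [random.gauss(50, 10) for _ in range(8000)]

# Select 800 farmers at random, without replacement.
sample = random.sample(population, 800)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# Sampling error: the gap between the sample estimate and the true value.
sampling_error = sample_mean - population_mean

print(f"Population average: {population_mean:.2f}")
print(f"Sample average:     {sample_mean:.2f}")
print(f"Sampling error:     {sampling_error:+.2f}")
```

Re-running the sketch with a different seed or a larger sample size shows how the sampling error changes from one sample to another.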
respondent error, and non-response error. Non-sampling errors may occur even if
all the elements of a given population are considered for a study. In other words,
non-sampling errors arise as a result of factors other than sample selection. Non-
sampling errors may also be caused because of selection bias, ambiguous population
specification, sampling frame error, processing error, respondent errors, non-
response errors, physical environment, inadequacy of enumerators, etc. For example,
a population contains 1,000 elements. The researcher intends to find the average
income of the population. Even if the researcher considers all 1,000 elements to find
the average income, he/she may get inaccurate results because of non-sampling
errors.
information about their income, age and current level of education and skills. The
incorrect information provided by respondents leads to erratic results, even if each
element of the population is considered.
Self Assessment Questions
7. ____________________ errors do not occur because of taking a sample for data
collection.
8. The poor response of respondents makes it easy for the researcher to derive
accurate results. (True/False)
9. Non-sampling errors may also be caused because of:
a. Selection bias
b. Ambiguous population specification
c. Sampling frame error
d. All of these
There are two methods of sampling, i.e., probability and non-probability sampling
methods. These sampling methods are shown in Figure 3:
The researchers select the sampling methods based on the requirement of their
chosen topic of the research study. Sampling methods must give:
Precision and accuracy
Extra information about the target population
Expenditure recognition
Let us understand the concept of probability and non-probability sampling.
4.5.1 PROBABILITY SAMPLING METHODS
Probability sampling is the method of sampling in which the probability of selecting
each item from the target population as a sample is equal. The probability sampling
method is alternatively known as random sampling. Examples of this sampling are
tossing a coin and selecting a chit out of five chits. Figure 4 shows the types of
probability sampling:
[Figure 4: Types of probability sampling (surviving labels: Systematic Sampling, Multi-stage Sampling)]
of sample. Here, the probability of selection is 1/50. Figure 5 shows simple random
sampling:
[Figure: strata labelled Stratum 1, Stratum 2 and Stratum 3]
Cluster sampling: In cluster sampling, the entire population is divided into groups
or clusters. After that, clusters are selected on the basis of random sampling.
All elements of the selected clusters are included in the sample, leaving out
all the elements of the non-selected clusters. For example, a population has been
divided into 10 clusters named a, b, c, d, e, f, g, h, i and j. The researcher requires
only 3 clusters for his/her sample out of 10 clusters. Suppose 3 clusters, namely a,
i and d, are selected randomly. In the sample, all elements from these 3 clusters
would be included. Cluster sampling looks similar to stratified sampling, but in
stratified sampling, elements or items are selected from all sub-groups. In cluster
sampling, however, the clusters themselves are selected, and all elements or items
of the selected clusters are included in the sample.
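The two steps of cluster sampling can be sketched in Python as follows. This is a minimal illustration: the cluster contents (five labelled elements per cluster) and the variable names are assumptions made for the example.

```python
import random

random.seed(7)  # fixed seed so the illustration is repeatable

# Hypothetical population: 10 clusters (a to j) with 5 elements each.
clusters = {
    name: [f"{name}{i}" for i in range(1, 6)]
    for name in "abcdefghij"
}

# Step 1: select 3 whole clusters at random.
selected = random.sample(sorted(clusters), 3)

# Step 2: include every element of the selected clusters, and nothing else.
sample = [element for name in selected for element in clusters[name]]

print("Selected clusters:", selected)
print("Sample size:", len(sample))
```

Note that randomness enters only in the choice of clusters; no selection takes place inside a chosen cluster.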
Systematic sampling: In systematic sampling, a sample is selected from the
sampling frame at regular intervals. In this type of sampling, the elements in
the sampling frame are numbered consecutively. After this, a random number is
chosen as the starting point. Thereafter, the sampling fraction is calculated as the
ratio of the actual sample size to the total population. Starting with the randomly
selected number, elements are drawn using a frequency (inverse of the sampling
fraction). For example, if we need to select 100 units out of available 1,000, the
sampling fraction is 1/10 and the frequency would be 10. It means that if a researcher
starts with a randomly selected element at number 3 (in the sampling frame), then
he/she would choose the 13th, 23rd and 33rd elements, and so on. Systematic sampling
is used in cases when the researcher has a complete list of all the members of the
population.
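The procedure can be sketched in Python as follows. This is a minimal illustration; the function name systematic_sample and its parameters are hypothetical, and the frame is simply the numbers 1 to 1,000.

```python
import random

def systematic_sample(frame, sample_size, start=None):
    """Pick every k-th element of a numbered sampling frame."""
    interval = len(frame) // sample_size      # frequency k = N / n
    if start is None:
        start = random.randint(1, interval)   # random start within the first interval
    # Element numbers are 1-based, as in the text.
    picks = [frame[i - 1] for i in range(start, len(frame) + 1, interval)]
    return picks[:sample_size]

frame = list(range(1, 1001))                   # units numbered 1 to 1,000
sample = systematic_sample(frame, 100, start=3)
print(sample[:4])   # first four selected units: 3, 13, 23, 33
```

Passing start=3 reproduces the example in the text; leaving it out lets the function choose the starting element at random.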
Multi-stage sampling: In multi-stage sampling, the population is partitioned
into various clusters and multiple clusters are again divided and grouped into
different strata on the basis of similarity. Clusters (one or more) can be randomly
selected by the researcher from each stratum. The researcher performs this process
continuously until the cluster cannot be divided any further. For instance, the
world can be divided into countries and countries into different states, states into
cities, cities into urban and rural areas, and these areas having similar features can
be merged for forming strata. Figure 7 shows multi-stage sampling:
[Figure 7: Multi-stage sampling. Legend: cluster, stratum, item. A population of items numbered 1 to 10 is regrouped, and items 1, 9, 6, 4, 7 and 10 are selected.]
because an unknown proportion of the entire population was not sampled. However,
the results obtained from the non-probability sampling cannot be generalised with
much confidence. Figure 8 shows the types of non-probability sampling methods:
[Figure 8: Types of non-probability sampling methods (surviving labels: Judgement Sampling, Snowball Sampling)]
of sample directly depends on the expert’s judgement. Quota sampling can also be
considered as a type of judgement sampling because the elements that are chosen
for each quota depend upon the judgement of the interviewer/researcher.
Snowball sampling: Snowball sampling is also known as chain sampling or
referral sampling. This type of sampling is used in research where it is difficult to
identify or locate the units or elements to be included in the sample. In the snowball
technique, the researcher first picks up one or more subjects (to be included in
sample) and then he/she asks them to recommend or refer to subjects that conform
to the criteria for being included in the sample. This process of referral is repeated
with the new subjects till the required number of subjects in the sample is fulfilled.
This method of sampling is called snowball sampling because the process is akin
to the process of rolling a snowball downhill. The initial snowball size (sample
subjects) keeps on increasing in size till the snowball reaches a flat surface (the
desired sample size is achieved). Snowball sampling is used in those cases where
there is no list of population of interest or when the subjects refrain from identifying
themselves socially or due to the secretiveness or illegality of the organisation for
which they work.
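The referral process described above can be sketched in Python as follows. This is a minimal illustration: the referral network, the subject labels (S1 to S7) and the function name snowball_sample are all hypothetical.

```python
from collections import deque

# Hypothetical referral network: each subject names others who fit the criteria.
referrals = {
    "S1": ["S2", "S3"],
    "S2": ["S4"],
    "S3": ["S4", "S5"],
    "S4": ["S6"],
    "S5": [],
    "S6": ["S7"],
    "S7": [],
}

def snowball_sample(seeds, referrals, target_size):
    """Grow the sample through referrals until the target size is reached."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        subject = queue.popleft()
        sample.append(subject)
        for referred in referrals.get(subject, []):
            if referred not in seen:   # do not recruit the same subject twice
                seen.add(referred)
                queue.append(referred)
    return sample

sample = snowball_sample(["S1"], referrals, target_size=5)
print(sample)
```

The sample starts from a single initial subject and stops growing as soon as the desired sample size is reached, or earlier if the referrals run out.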
Activity
Find out the advantages and disadvantages of probability and non-probability
sampling methods.
4.6 SUMMARY
A sample can be a group of people, items/elements or objects selected out of the
population for conducting research.
Sampling is the process of choosing a subset of subject matter or units from the
whole population for the purpose of conducting a research.
Researchers may use two primary methods of data collection, i.e., the Census
method and Sampling method.
Census method of data collection is the method in which researchers study all
elements or items of the population. In the sample survey method, a few elements
of the target population are studied, and not all.
A sample design is considered as a road map which provides the foundation or
basis for the selection of a survey sample.
A good sample design must be error-free and must produce a representative
sample.
An error is a fault or the disparity between the evaluated value and the correct
or exact value. An error may occur due to biased attitude of the interviewer,
respondent’s reluctance, ineffectiveness of instrument, etc.
Sampling error is the error or mistake that occurs in a data collection process as
an outcome of taking a sample from target population rather than using the entire
population.
Non-sampling errors may also be caused because of selection bias, ambiguous
population specification, sampling frame error, processing error, respondent
errors, non-response error, physical environment, inadequacy of enumerators, etc.
There are two types of sampling methods, i.e., probability and non-probability
sampling methods.
Probability sampling is the method of sampling in which the probability of selecting
each item from the target population as a sample is equal. In the non-probability
sampling method, samples are collected in such a manner that all individuals or
elements in the population do not get equal chances of being selected.
population
Sampling errors: The statistical error that occurs when a researcher does not select
a sample that represents the entire population of data
through 2007 (United States v. Brier, et al., pg. 5). Thus, the IRS was looking for damages
close to 85 million dollars.
There were two major sampling selection errors made by the IRS analyst:
The 345 tax returns were chosen from returns that had a Schedule C attached.
The statistical inference and finding were made by evaluating these 345 samples
only.
Therefore, any inferences from the study could not be generalised for the whole
population. The IRS made the basic mistake in sample selection and did wrong
calculation and, ultimately, provided inaccurate conclusions. This affected the
credibility of the IRS. This proves that any person having the basic knowledge of
statistics and research methodology can catch and highlight errors that are being
stated in the inferences made from inaccurate mathematical analysis and poor
sampling selection techniques.
QUESTIONS
1. Who is/are the plaintiff and defendant in the above case study?
(Hint: IRS and owner of a tax preparation service)
2. What type of errors were found in the inferences made by IRS?
(Hint: Mathematical, sampling errors)
3. According to you, what are the sampling selection errors?
(Hint: Small sample out of large population)
4. What are the reasons behind the occurrence of errors in sampling?
(Hint: Interviewer’s attitude, knowledge, sample selection techniques, respondent’s
skills)
5. Describe some ways that can help in reducing errors in sampling.
(Hint: Careful assessment of population, sample selection, sample size)
4.9 EXERCISE
1. What is sampling design?
2. Explain the steps involved in sampling design.
3. Describe the types of errors in measurement. How can these errors be minimised?
4. What is the difference between census and sample survey?
5. Explain the sampling errors in detail.
6. Describe the methods of sampling.
7. Write a short note on:
a. Stratified sampling
b. Cluster sampling
c. Snowball sampling
8. Why does non-sampling error occur?
2. False
3. Sample survey
Errors in Measurement and Sampling Errors
4. Random
5. True
6. a. Systematic
Non-Sampling Errors
7. Non-sampling
8. False
9. d. All of these
Methods of Sampling
10. multi-stage
11. b. Snowball sampling
12. True
4.11 SUGGESTED BOOKS AND E-REFERENCES
SUGGESTED BOOKS
Goddard, W. and Melville, S. (2011). Research Methodology. Kenwyn, South Africa:
Juta & Co.
Welman, J., Kruger, F. and Mitchell, B. (2005). Research Methodology. Cape Town:
E-REFERENCES
How Simple Random Samples Work. (2020). Retrieved 18 March 2020, from
https://www.investopedia.com/terms/s/simple-random-sample.asp
Types of sampling methods | Statistics (article) | Khan Academy. (2020). Retrieved
18 March 2020, from
https://www.khanacademy.org/math/statistics-probability/designing-studies/sampling-methods-stats/a/sampling-methods-review
Sampling in Statistics: Different Sampling Methods, Types & Error - Statistics
How To. (2020). Retrieved 18 March 2020, from
https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/sampling-in-statistics/
Skye Wills, T. (2020). Sampling Design. Retrieved 18 March 2020, from
http://ncss-tech.github.io/stats_for_soil_survey/chapters/3_sampling/3_sampling.html
CHAPTER 5
Measurement and Scaling
Table of Contents
5.1 Introduction
5.2 Concept of Measurement
5.1 INTRODUCTION
In the previous chapter, the concept of sampling has been explained. The chapter
discussed census versus sample survey, developing sample design/sampling process
and characteristics of a good sample design. The chapter also described errors in
measurement and sampling processes as well as non-sampling errors. The chapter
concluded with an explanation of methods of sampling.
Measurement and scaling both play an important role in the daily lives of all
human beings. Measurement allows physical objects to be quantified and expressed
in measuring units. The absence of measurement would make even daily activities
very difficult. For instance, construction of a house or an office cannot be imagined
without measuring its length, width or height. Even the purchase of rice, flour, milk
or any groceries is not possible in the absence of measurement units, such as
kilograms, grams and litres. Dimensions are required even for buying clothes.
Both measurement and scaling are essential parts of a research study. In any research,
data collection is not possible without measurement and scaling. The reliability and
authenticity of a research result are related to accurate and specific data collection.
It is said that ‘a research is as good as the data that is used for research’ and quality
data requires a well-defined measurement and scaling.
This chapter will help you in understanding the concept of measurement. You will
study the measurement scales, the development of measurement tools and the basic
criteria of a good measurement tool. Further, the concept of scaling techniques is
also discussed. The latter sections of the chapter will discuss the types of scaling
techniques and bases of scale classification. Towards the end, you will learn about
the techniques of scale construction.
• Nominal scale: Named variables
• Ordinal scale: Named and ordered variables
• Interval scale: Named and ordered variables having proportionate interval
between them
• Ratio scale: Named and ordered variables having proportionate interval
between them. It can also accommodate absolute zero.
In the above table, different ranks have been assigned to the company’s objectives.
It is clear that the company has preferred to increase sales as compared to the
number of customers. However, it cannot be said that the company’s preference
for the increase in sales is two times higher than the company’s preference for a
decrease in cost. Therefore, it can be inferred that an ordinal scale is an improvement
over the nominal scale. The main characteristics of the ordinal scale are as follows:
• The scale is not expressed in absolute terms.
• The scale ranks the things from the highest to the lowest.
• The differences between adjacent ranks are not necessarily equal.
• Central tendency is measured with the use of a median.
• Dispersion is measured by using percentile or quartile.
Despite the above-mentioned characteristics of the ordinal scale, the limitation
of the ordinal scale is that arithmetic operations, such as addition, subtraction,
division, etc., cannot be performed with the assigned ranks.
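The median and quartiles of ranked data can be computed as in the Python sketch below. This is a minimal illustration: the ranks are hypothetical, and the standard-library function statistics.quantiles interpolates between neighbouring ranks, so its cut points should be read as approximate positions rather than as exact ranks.

```python
import statistics

# Hypothetical ranks (1 = most preferred) given by nine respondents to one objective.
ranks = [1, 2, 2, 3, 3, 3, 4, 5, 5]

# Central tendency of ordinal data: the median rank.
median_rank = statistics.median(ranks)

# Dispersion of ordinal data: the three quartile cut points.
q1, q2, q3 = statistics.quantiles(ranks, n=4)

print("Median rank:", median_rank)
print("Quartile cut points:", q1, q2, q3)
```

The arithmetic mean is deliberately not computed here, since adding and dividing ranks is not meaningful on an ordinal scale.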
Interval scale: The interval scale is also known as the cardinal scale. It is based on
the principle of ‘equality of interval’, i.e., the intervals are assumed equal and are
used as the basis for making the units equal. In other words, in the interval scale,
the interval between successive positions is equal. The positions are separated by
equally spaced intervals or bases. For example, a researcher wants to measure the
level of happiness among youths along a scale rated from 1 to 10. With the use of
an interval scale, the following conclusions can be made:
• The most happy is represented by number 10 and the least happy is represented
by number 1.
• Number 7 represents a higher level of happiness than number 6.
• The difference in the level of happiness between 4 and 3 is the same as the
difference in the level of happiness between 7 and 8.
metre scale does not imply any measurement, such as length, breadth or height.
Arithmetical operations, such as multiplication and division can be easily carried
out with ratio scale.
Ratio scale is the most powerful measurement scale, as almost all statistical
operations are possible with its measurements, which cannot be performed with other
scales. For example, with the help of the ratio scale it can be stated that the weight
or height of Ram is twice that of Shyam, but such a statement is not possible
with any of the other scales. Some of the important characteristics of the ratio
scale are as follows:
• The ratio scale has an absolute zero measurement.
• The central tendency can be measured by using geometric and harmonic
means.
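The ratio comparison and the geometric and harmonic means can be computed as in the Python sketch below. This is a minimal illustration with hypothetical weights; it relies on the standard-library statistics module (geometric_mean requires Python 3.8 or later).

```python
import statistics

# Hypothetical weights in kilograms (a ratio scale: 0 kg means no weight at all).
weights = [40, 50, 80]

geometric = statistics.geometric_mean(weights)
harmonic = statistics.harmonic_mean(weights)
arithmetic = statistics.mean(weights)

# Ratios are meaningful on a ratio scale: 80 kg is twice 40 kg.
ratio = weights[2] / weights[0]

print(f"Geometric mean:  {geometric:.2f}")
print(f"Harmonic mean:   {harmonic:.2f}")
print(f"Arithmetic mean: {arithmetic:.2f}")
print("80 kg / 40 kg =", ratio)
```

For positive values, the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean.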
Development
Stage 3 – Selection of indicators: It is the third stage in the process of developing
measurement tools. Indicators are devices to measure knowledge, opinion, choices,
expectations, and feelings of respondents. Examples of indicators are scales and
questionnaires. As there is rarely a perfect measure of a concept, the researcher
should consider more than one indicator to have the stable scores and improved
validity.
Stage 4 – Formation of index: The last stage of the process of developing
measurement tools is based on the other three stages. A researcher takes into
account multiple dimensions of a particular concept and collects suitable indicators
for proper measurement. After that, these indicators are combined into a single
overall index. For example, the Body Mass Index (BMI) used by the National
Institutes of Health (NIH) combines individual indicators, such as weight and
height, to estimate body fat.
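The formation of an index from individual indicators can be sketched in Python using the standard BMI formula (weight in kilograms divided by the square of height in metres). The function name body_mass_index and the sample figures are hypothetical.

```python
def body_mass_index(weight_kg, height_m):
    """Combine two indicators (weight and height) into a single index."""
    return weight_kg / height_m ** 2

# Hypothetical respondent: 70 kg and 1.75 m.
bmi = body_mass_index(weight_kg=70, height_m=1.75)
print(round(bmi, 1))
```

Neither indicator alone describes the concept; the index becomes meaningful only when the two are combined.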
• Content validity: It is the extent to which the content of the measuring
instrument provides appropriate coverage of the topic. In case the measuring
instrument contains a representative sample of the universe, the content
validity is considered good. The determination of content validity is principally
judgemental and intuitive. A panel of persons can also determine the validity
by judging how well the measuring instrument meets the standards, but it
cannot be expressed numerically.
• Criterion-related validity: It is the situation in which some criterion is used
to judge the validity of the measuring instrument. In other words, it is the
ability to predict some outcome or estimate the existence of some current
condition. Generally, it is done by making a comparison of the instrument
with other instruments of the same type in which the researcher has more
confidence; for example, comparing the results of two IQ tests on a group of
four students.
The success of measures used for some empirical estimating purposes is
reflected by such validity. Criterion-related validity is a broad term that is
subdivided into the following:
  ◦ Predictive validity: It refers to the usefulness of a test in predicting some
  future performance.
  ◦ Concurrent validity: It refers to the usefulness of a new test as compared
  to a well-established test.
• Construct validity: It is one of the complex and abstract validity measurement
criteria. It implies that there should be compatibility between a theoretical
concept and a measuring instrument. Technically, a measure signifying a degree
conforming to predicted correlations with other theoretical propositions is said
concepts more precisely. A scale denotes a continuum consisting of the highest point
and lowest point along with numerous intermediate points between the extreme
points. The relation of scale-point positions is that when the first point appears
to be the highest point, the second point indicates a higher degree in terms of the
given characteristics as compared to the third point, and so on. Scaling defines the
measures that are to be used for assigning numbers to various degrees of opinion,
attitudes and other concepts. It may be described as a ‘procedure for the assignment
of numbers to a property of objects to impart some of the characteristics of numbers
to the properties in question’.
[Figure: Types of scaling techniques. Comparative scales: paired comparison, rank order scale. Non-comparative scales: summated scale (Likert), semantic differential scale, Guttman scale.]
F 4
In the given example, 6 items are shown. The respondent was asked to
rank the items as per his/her preferences. Item E is the most preferred and
item A is the least preferred by the respondent.
• Constant sum scale: Constant sum scale is viewed as an ordinal scale because
of its comparative nature. In this form of scaling, the respondents are asked
to rate the different characteristics of an object and assign some number of
units to each characteristic. The respondents have to rate each characteristic in
such a manner that the total number of units or points equals the total number
of units assigned by the researcher or the experimenter. Respondents assign
the number of units to each characteristic based on the importance of the
characteristics to them. If a characteristic holds no importance for the respondent,
he/she can assign zero units to it. For example, an HR professional
may create a constant sum scale that adds up to 100 marks, to know the relative
importance of different infrastructural features in an organisation, such
as drinking water, clean washroom, gymnasium, sports room, canteen, etc.
The respondents under study are instructed to assign the numbers to
infrastructural features in such a way that the sum of all the marks allocated
to infrastructural features of the organisation is equal to 100. After the
responses of all respondents have been noted, the number of points earned by
each attribute is counted. The values arrived at through the constant sum scale
method can be used to draw conclusions or to support further research.
Non-comparative scales: These are those scales wherein each object is measured
independently of the other objects under the same research study. Absolute
results are obtained for each object. Examples of non-comparative scales include
continuous rating scales, Likert scale, etc. They are generally divided into two
categories: continuous rating scale and itemised rating scale.
• Continuous rating scale: In a continuous rating scale, the respondents are
asked to rate different objects on a continuum according to a certain criterion.
A continuum is a line running from one extreme value to the other extreme
value of the criterion. The rating is given by respondents by marking a point
on the continuum.
• Itemised rating scale: In itemised rating scale, items are shown in the form
of ordered statements and the respondents are required to select the category
that best describes the concerned item. The respondents are asked to make
a choice according to their preferences or opinions. A brief description of
each category is associated with the itemised rating scales. The most common
itemised rating scales used by researchers include Likert scale (summated
scale), semantic differential scale, Thurstone and Guttman scale.
  ◦ Summated scale (Likert): Summated scales are constructed by using the
  item analysis approach. Such scales consist of a number of statements
  that express either positive or adverse feelings towards any topic or idea.
  The summated scale is most frequently used in studying social attitudes.
  It follows the pattern developed by Likert; thus, the summated scale is
  also termed as the Likert scale. Most commonly, a Likert scale contains
  five degrees of a statement. Let us know more about the Likert scale with
  the help of the following example (statement and options).
  Statement: The Internet is creating a positive impact on children.
  Response options:
  (a) Strongly Agree (1)
  (b) Agree (2)
  (c) Neutral (3)
  (d) Disagree (4)
  (e) Strongly Disagree (5)
  In the preceding example, there are five degrees of responses for the given
  statement. One extreme of the scale (Strongly Agree) shows the strongest
  approval of the statement, whereas the other extreme (Strongly Disagree)
  indicates the strongest disapproval of the statement. The middle points are
  between these two extremes. Each point on the scale has a numerical value.
  This example constitutes only one statement, but more than one statement
  can be used in a Likert scale. In the Likert scaling method, each response
  option is assigned a numerical value. The total score for each respondent is
  calculated by adding up the values of his/her responses to all the statements.
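The calculation of a respondent's total score can be sketched in Python as follows. This is a minimal illustration: the numerical values follow the coding of the example above (Strongly Agree = 1 to Strongly Disagree = 5), while the four answers and the function name total_score are hypothetical.

```python
# Coding taken from the example above: Strongly Agree = 1 ... Strongly Disagree = 5.
SCORES = {
    "Strongly Agree": 1,
    "Agree": 2,
    "Neutral": 3,
    "Disagree": 4,
    "Strongly Disagree": 5,
}

def total_score(answers):
    """Sum the numerical values of a respondent's answers to all statements."""
    return sum(SCORES[answer] for answer in answers)

# Hypothetical respondent answering four statements.
answers = ["Agree", "Strongly Agree", "Neutral", "Agree"]
score = total_score(answers)
print(score)   # 2 + 1 + 3 + 2 = 8
```

With this coding, a lower total score indicates stronger overall agreement with the statements.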
  ◦ Semantic differential scale: Factor scales are developed using the
  factor analysis approach. The semantic differential scale is an example
  of a factor scale and was developed by Charles E. Osgood, G.J. Suci
  and P.H. Tannenbaum. It measures the connotative meaning of objects,
  events and concepts. The semantic differential scale comprises bipolar
  adjectives, such as valuable-worthless and good-bad. The respondent
  is asked to select his/her position between these two adjectives. Let us
  understand the concept of the semantic differential scale with the help of
  the following example. A semantic differential scale analysing candidates
  for a managerial position is shown in Table 1. Here, two adjectives
  (successful and unsuccessful) are shown on two extremes. In between
  these two extremes, scores (3, 2, 1, 0, –1, –2, and –3) are mentioned to
  rate the candidates according to the level of traits possessed by them.
  Successful-unsuccessful, progressive-regressive, and true-false represent
  the evaluative factor. The potency factor is represented by the
  severe-lenient and strong-weak pairs. The rest of the adjectives shown in
  Table 1 represent the activity factor. The semantic differential scale is
  widely used in measuring the attitudes of different people. Table 1 shows
  the semantic differential scale for rating candidates for a managerial
  position on the basis of the given traits and scores:
Table 1: Semantic Differential Scale for Analysing Candidates for a Managerial Position
Successful      Unsuccessful
Progressive     Regressive
Active          Passive
Fast            Slow
Strong          Weak
Severe          Lenient
True            False
Scores: 3 2 1 0 −1 −2 −3
[Figure: Bases of scale classification: subject orientation, response form, degree of subjectivity, scale properties, number of dimensions]
Degree of subjectivity: It is the basis which classifies the scale either by measuring
personal preferences or non-preference judgements. In the first case, respondents
may be asked to exhibit their personal opinion. For example:
Which of the following organisations do you favour the most?
• Organisation A
• Organisation B
• Organisation C
• Organisation D
In the second case, the respondent may be simply asked to decide the most
profitable organisation. It is clear that in the second case, the scope of personal
opinion is not there.
Scale properties: This is the base of scale classification according to which scales
can be classified as nominal, ordinal, interval, and ratio scales. These are already
discussed in the previous sections.
Number of dimensions: It indicates the dimensions on the basis of which scales are
classified. Two types of scales are used: one-dimensional and multi-dimensional.
Scale construction approach: It indicates scale-classification on the basis of
different approaches used.
Factor analysis approach: According to this approach, the correlation between
different items is established on the basis of a common factor.
Cumulative scale approach: According to this approach, the approval of an item
representing an extreme position should also result in the approval of all items
indicating a less extreme position.
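The rule behind the cumulative scale approach can be checked mechanically. The sketch below is a minimal illustration, assuming that the items are ordered from the least extreme to the most extreme position and that each response is recorded as 1 (approval) or 0 (rejection). The function name is_cumulative and the response patterns are hypothetical.

```python
def is_cumulative(responses):
    """Return True if approvals never follow a rejection.

    responses: list of 0/1 values, ordered from the least extreme
    item to the most extreme item.
    """
    seen_rejection = False
    for approved in responses:
        if not approved:
            seen_rejection = True
        elif seen_rejection:
            # An extreme item is approved although a milder one was rejected.
            return False
    return True


# Hypothetical response patterns for a four-item scale.
consistent = is_cumulative([1, 1, 1, 0])
inconsistent = is_cumulative([1, 0, 1, 0])
```

A respondent whose pattern fails this check has approved an extreme item while rejecting a milder one, which the cumulative approach treats as an inconsistency.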
5.4 SUMMARY
The stages of developing measurement tools are concept development, specification
of concept dimensions, selection of indicators and formation of an index.
A measurement tool should clearly and accurately indicate what the researcher
intends to measure.
Validity denotes the ability of an instrument to measure, logically and reasonably,
what it is intended to measure in the sample under study.
Reliability is another important criterion of good measurement. The reliability
of a measuring instrument implies the consistent outcomes received through
measurement.
The practicality feature of instruments of measurement implies that a good
measurement should be economic, interpretable and convenient.
A scale denotes a continuum consisting of the highest point and lowest point along
with numerous intermediate points between the extreme points.
A tool or mechanism that is used to differentiate individuals from one another on
the variables of interest of research is known as a scale.
The types of scales include comparative scales and non-comparative scales.
Comparative scales include ranking scale and constant sum scale. Ranking scale
is divided into paired comparison and rank order scale. Non-comparative scales
include continuous rating scale and itemised rating scale. Itemised rating scale
includes summated scale (Likert), semantic differential scale and Guttman scale.
Arbitrary scales are easy to develop and provide specific information about a
particular topic.
The consensus approach implies that the items to be included in the scale are

toothpaste industry. The study of the existence of competitors in the market, desired
expectations of consumers and the preferences of customers will enable the company
to design its new product as per the market requirements.
For this purpose, CBA Ltd. conducted a small research study on a sample of 300 respondents,
using a questionnaire containing questions based on the rank order scale. The respondents
were presented with 10 toothpaste brands simultaneously and were asked to rank
or order them according to their own presumed criteria. The following form, along with
instructions, was given to the respondents:
• Continue with this process until all the brands have been ranked in order
  of your preference.
• The least preferred brand of toothpaste should be ranked 10.
• Also, no two brands should be given the same rank.
• The criteria of preference depend entirely on the respondent. There is
  no right or wrong answer.
By compiling and analysing the information received from the survey, the company
could make an assessment that the product characteristics present in Brand 5 (Close
Up) were most valued by customers, followed by Brand 3 (Ultra Bite) and Brand
9 (Pepsodent). The price, durability, quality, functionality, packaging and other
features of the topmost brands gave the required market information to the company
for deciding the desired specifications in the new product to be developed.
The level of competition prevailing in the toothpaste market could also be assessed
by the company. Although the survey gave details on the most and least favoured
brands, it could not reveal the distances between research objects
or the reasons for customers’ choices between different brands. It was felt that
the survey provided limited information about the criteria on
which consumers accept or reject a product. It could not reveal why a product was
important or unimportant to the respondents.
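The way rank order responses such as those in this case can be compiled is sketched below. This is a minimal illustration, not the procedure used by the company: it assumes each respondent supplies a complete ranking without ties and summarises each brand by its mean rank. The brand labels, the three responses and the function name mean_ranks are hypothetical.

```python
from collections import defaultdict


def mean_ranks(rankings):
    """Return (brand, mean rank) pairs sorted from most to least preferred."""
    totals = defaultdict(int)
    for ranking in rankings:
        # Each response must use the ranks 1..n exactly once (no ties).
        if sorted(ranking.values()) != list(range(1, len(ranking) + 1)):
            raise ValueError("each brand must receive a distinct rank from 1 to n")
        for brand, rank in ranking.items():
            totals[brand] += rank
    count = len(rankings)
    summary = [(brand, total / count) for brand, total in totals.items()]
    return sorted(summary, key=lambda item: item[1])


# Hypothetical responses from three respondents ranking three brands.
responses = [
    {"Brand A": 1, "Brand B": 2, "Brand C": 3},
    {"Brand A": 2, "Brand B": 1, "Brand C": 3},
    {"Brand A": 1, "Brand B": 3, "Brand C": 2},
]
result = mean_ranks(responses)
```

Because ranks are ordinal, the mean rank only orders the brands. As the case notes, it says nothing about how far apart the brands are in the respondents' minds or why one brand is preferred over another.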
Measurement and Scaling
QUESTIONS
5.7 EXERCISE
1. Discuss the concept of measurement.
2. Explain the types of scales.
3. Explain comparative scales.
4. Describe various non-comparative scales.
5. What are the different bases of scale classification?
CHAPTER 6
Data Collection Techniques
Table of Contents
6.1 Introduction
6.2 Data Collection
6.1 INTRODUCTION
In the previous chapter, you studied the construction of measurement scales
and different types of scaling techniques used for measurement of objects in research.
After completing this part of the research design, the next step is to collect data from
the respondents. This chapter focusses on methods of collection of data. Data can be
collected from two types of sources, i.e., primary or secondary.
suit the purpose of research, then the researcher must modify the tool accordingly
or construct some other tool. Reliability and accuracy must be maintained in the
process of data collection.
This chapter begins by defining primary data and secondary data. The primary data
refers to the information gathered firsthand by the researcher on the interest variables
for the specific purpose of the research study. On the other hand, the secondary
data refers to information gathered from the already existing sources like records of
companies, government publications, etc. In the latter part of the chapter, various
methods of primary data collection and secondary data collection are discussed in
detail. Factors affecting the selection of data collection methods are described at the
end of the chapter.
information from all the relevant sources for finding answers to the research problem,
for testing the hypothesis, and for evaluating the results.
No research can be carried out without sufficient, useful and relevant data. To obtain
accurate data, it is important for a researcher to approach the right resource. For
instance, if the researcher wants to conduct research on the most prevailing disease,
then he/she would approach doctors to collect data for a number of patients suffering
from different types of diseases. After collecting data, the researcher processes and
analyses the data to obtain meaningful information.
There are mainly two types of data which are explained below:
Primary data: Primary data are the data that are collected fresh and for the first
time. The researcher may himself collect this data directly from the respondents or
through his team. Since this data has not been published yet anywhere, it proves
to be more objective and authentic for research objectives. The relevance of this
data is higher than other data because it has not been altered. The primary data
can be collected through field observations, surveys, questionnaires or through
experiments. It can include a wide geographical coverage and a large population.
The degree of accuracy of primary data is very high because they are specific to
the researcher’s needs and relevant to the topic of the research study. Moreover,
since the primary data is current, it can provide a realistic view of the topic under
consideration to the researcher.
[Figure labels: Sociometric Method, Questionnaire Method]
• Direct method: In this method, the researcher waits for a particular experiment
  or behaviour to occur. This process takes a longer time to get a single response.
  For example, the calls between customer care executives and customers are
  recorded in various call centres for training and quality purposes.
• Structured method: In this method, the researcher knows what is to be
  observed.
  For example, if the researcher has to know about a particular brand of a car,
  he/she would observe only that brand of car and would not pay any attention
  to other car brands. The structured method consumes less time and makes it
  easier for the researcher to analyse the data. It is used when the researcher
  specifies in detail what has to be observed and how measurements have to be
  recorded.
  It offers the following advantages:
  - It simplifies and systematises the data-recording process.
  - It is likely to produce quantitative data beneficial for analysing and
    comparing information.
  It suffers from the following disadvantages:
  - Results are not detailed and in-depth.
  - It is useful for studying small-scale interactions only.
• Unstructured method: In this method, the researcher does not know what
  exactly he/she has to observe. The unstructured method is used in exploratory
  research. In this method, the researcher wants to search for all the aspects that
  can affect a particular problem.
  For example, the researcher observes the buying behaviour of people for
  different brands of the same product. He/she would study all factors that
  can affect the buying decision of people. After that, he/she would analyse the
  buying decision for a particular brand. Under this method, the researcher
  enters the research field with some idea of what might be important, but not of
  what exactly will be observed.
  It offers the following advantages:
  - The observer has the freedom to decide and observe everything that is
    relevant.
  - It is more explorative than the structured method.
  It suffers from the following disadvantages:
  - It is an unfocussed approach with the investigator documenting as much
    as possible.
  - It is more time-consuming than the structured approach.
• Mechanical method: In this method, the researcher uses some devices to
  observe people’s responses. Examples of these devices are video cameras and
  audiometers. This method has application in real-time scenarios, such as voice
  pitch meters for measuring emotional reactions, analysing traffic flows in an
  urban square, monitoring website traffic, etc.
For example, a structured interview can include questions such as:
1. How (as an HR professional) will you handle a situation of understaffing?
2. What were your major achievements in the previous job?
3. Which manager in your previous jobs was best according to you and
   why?
4. Which organisations do you dream of working in at some point in
   time and why?
It offers the following advantages:
- They are easy to replicate because a fixed set of questions is used.
- They are fairly quick to conduct and the results obtained are
  representative of a large population.
It suffers from the following disadvantages:
- This method is not flexible because a fixed interview schedule has
  to be followed.
- The answers lack detail, and closed questions generate only
  quantitative data.
- Unstructured interviews: In these interviews, questions are not
  predefined. The researcher asks questions according to the situation
  and environment of the interview. This method is used for probing
  more details of a participant so as to assess and judge his/her responses.
  Unstructured interviews are carried out when the researcher wants
  to explore detailed information about the thoughts or behaviour of
  interviewees.
is mostly used by government and big organisations. This type of data helps
in conducting research on a big scale.
Indirect method: In this method, the researcher observes the behaviours that have
occurred in the past using recordings, journals, magazines, industry publications,
etc. This method consumes less time and is less expensive as compared to the
other methods. Suppose a researcher needs to know the sale of a particular brand
in a store. In this case, data can be collected from registers showing the sale of
different products in the store.
Self Assessment Questions
3. The sociometric method/test enables the researcher to analyse a social group
or workgroup by studying attractions and repulsions among the group
members. (True/False)
4. The schedule method is the same as the sociometric method as both the methods
contain a set of questions in the written form. (True/False)
5. Which of the following provide(s) information in the form of balance sheets
and sales records?
a. Company records b. The Internet
c. Print media d. Census
METHODS
The selection of an appropriate method of data collection depends on a number of
factors which are as follows:
The objective of research: It plays an important role in determining the method of
data collection. It defines the motive of conducting research, which, in turn, helps
in knowing the type of data (quantitative or qualitative) that needs to be collected.
The time frame for research: This is the duration within which research needs
to be completed. If the time frame to complete the research is less, the researcher
would use data collection methods that are less time-consuming. However, if the
time to complete the research is more, then the researcher can use data collection
methods that take more time, but provide relatively authentic data such as an in-
depth interview used for exploratory study.
Availability of resources/funds: If the researcher has sufficient funds to conduct
the research, he/she can use expensive methods of data collection, otherwise, he/
she has to look for economical methods.
Precision: It refers to the measure of how close a result comes to its true value. If
the data collection is not done with precision, the findings of the research would
not be valid.
Skills of the researcher: This makes or destroys the whole effort of data collection.
Selection of a skilled researcher is necessary because if the researcher is unskilled,
he/she may not be able to select the right method of data collection.
Size of sample: Different types of data collection methods are suitable for different
sample sizes. Therefore, the researcher must select the type of data collection
method based on the sample size. For example, it would be highly inconvenient to
administer a questionnaire to the participants of a census survey.
Self Assessment Questions
6. Selection of a skilled researcher is necessary because if the researcher is
unskilled, he/she may not be able to select the right method of data collection.
(True/False)
7. A researcher having sufficient funds to conduct the research can use expensive
methods of data collection; otherwise, they should look for economical
methods. (True/False)
8. If the time period to complete the research is________, the researcher would
use data collection methods that are less time-consuming.
Activity
Suppose you are given the responsibility by your organisation for conducting
research on the popularity of baby food brands among consumers. Which data
collection method would you prefer to select for conducting the research?
BuyerSynthesis believes that consumers are the most important factor in
any business. Therefore, organisations must become consumer-oriented.
BuyerSynthesis helps in taking the voice of an organisation’s consumers to the
concerned organisations, which can then plan their marketing strategies accordingly.
The BuyerSynthesis team carries out primary research projects along with its clients’
in-house teams.
In order to carry out data collection for this, BuyerSynthesis started with an internal
audit of the marketing department of ABC so that they may assess the challenges and
the resources of ABC. This was essential in order to find out what aspects of marketing
required refurbishing and whether the recommendations of BuyerSynthesis would
be feasible for them or not.
The focus groups were moderated and they discussed the following aspects:
What did ABC mean to them?
What changes in the organisation would they like to see?
What could be the effect of innovations on them?
All the participants narrated their recent and memorable experiences.
Focus group research helped BuyerSynthesis in gaining information regarding who
ABC’s audience was and what attributes were important for them. BuyerSynthesis
also recognised that the organisational members felt a high degree of personal
attachment with ABC and they deeply appreciated it.
QUESTIONS
1. Describe the nature of BuyerSynthesis as an organisation.
(Hint: BuyerSynthesis is a marketing research organisation and it helps its clients
by creating more effective marketing strategies and plans by better understanding
their buyers.)
2. What were the major topics that were discussed within the focus groups created
by BuyerSynthesis for ABC?
(Hint: The major topics that were discussed within the focus groups included:
What ABC meant to them?; What changes in the organisation would they like to
see, etc.)
3. What was the first step adopted by BuyerSynthesis for collecting data for ABC?
(Hint: Data collection began with an internal audit of the marketing team of ABC.)
4. How did focus group research help BuyerSynthesis in gaining information about
ABC?
(Hint: To gain information about ABC’s audience and what attributes were
important for them.)
5. How did the recommendations made by BuyerSynthesis help ABC?
(Hint: Enhancing the relationship between the organisation and its clients and, at
the same time, keep the costs under control.)
6.8 EXERCISE
1. Define data collection and describe the different types of data collection in detail.
2. Explain the different methods of primary data collection.
3. How is data collected using the schedule method?
4. Explain the different methods of secondary data collection.
5. Which factors are to be considered while selecting the methods of data collection?
Collection Methods
7. True
8. less
CHAPTER 7
Introduction to Questionnaire Designing
Table of Contents
7.1 Introduction
7.2 Concept of Questionnaire Designing
7.1 INTRODUCTION
In the previous chapter, you studied the concept of data collection. The chapter
discussed the types of data. The latter section of the chapter described the methods
of data collection. The chapter concluded with the explanation of the factors affecting
the selection of data collection methods.
Businesses operate on facts and data. Without data, an organisation would have no
idea of where it stands and where it needs to go. One of the simplest, cheapest and
quickest ways to gather data is to create questionnaires. The design of a questionnaire
determines the success of data collection.
Creating a questionnaire is an art as well as a science. If it is well-designed, then it
will have better chances of inviting responses than a badly crafted questionnaire.
While creating a questionnaire, you must consider various factors, such as how
many questions to ask, whether to ask close-ended questions or open-ended ones,
how to keep the wording of questions simple and effective, how to create questions
that invite correct responses from respondents, how to place questions in the
In this chapter, you will study the concept of questionnaire designing. Next, you
will learn about the types of questions in questionnaire designing. Further, the chapter
will describe the steps of questionnaire designing. Towards the end, the chapter will
briefly discuss how to design an effective questionnaire.
Sample questionnaire items (from figure):
• Are you: Male / Female
• How old are you? ___ Years ___ Months
• Who do you live with at home? (be specific please)
• When was the last time you saw a film, and what was it?
• How many hours a day would you spend watching, reading or listening to:
  - TV
  - Radio
  - Internet
  - Print (magazines/newspapers)
Dos:
1. Clearly define target respondents, their age, education level, etc.
2. Decide if your questionnaire should be anonymous or not.
Don’ts:
1. Avoid leading questions, which subtly prompt the respondents to answer in a
   particular way. Such questions result in false or slanted information.
   Examples:
   Leading question: You are satisfied with our customer service, aren’t you?
   Non-leading question: How satisfied are you with our customer service?
   Leading question: Do you always consume fast food?
   Non-leading question: How frequently do you consume fast food?
2. Avoid technical terms or jargon.
   Jargon question: Which feature would you like baked into our new product?
   Non-jargon question: Which feature would you suggest to be included in our
   new product?
close-ended ones:
• No
Close-ended: How satisfied are you with your current job role?
• Very satisfied
• Somewhat satisfied
• Somewhat unsatisfied
• Very unsatisfied
Open-ended: What do you expect from this appraisal?
Close-ended: How satisfied are you with your manager?
• Very satisfied
• Somewhat satisfied
• Somewhat unsatisfied
• Very unsatisfied
Open-ended: How will you describe your relationship with your manager?
Fixed alternative or multiple choice questions: These questions provide multiple-choice
answers. These questions are usually asked when the possible responses
are limited and clear, such as age, gender, etc. For example:
1. How old are you?
   - 12 or younger
   - 13 to 19
   - 20 to 39
   - 40 to 59
   - 60 to 79
   - 80 or older
2. Which product would you like to see in the showroom?
   - Sports Utility Vehicle
   - Sedan
   - Hatchback
   - Convertible
   - All of these
Dichotomous questions: These are also close-ended questions which can be
answered as Yes/No, True/False or Agree/Disagree. Examples:
1. Have you ever purchased a product or service from our website?
   a. Yes b. No
2. Do you intend to buy a new car within the next six months?
   a. Yes b. No
Rating scale/continuum questions: These are close-ended questions where you
can assign weights to each answer choice on a scale. The commonly used rating
scales are:
• Likert rating scale: It is typically a five-, seven- or nine-point scale used to
1 2 3 4 5 6
• Itemised rating scale: This scale is similar to the graphic scale, except that
  there are a number of categories which can be marked. For example:
  Evaluate each of the following attributes of our product by checking the
  appropriate box.
‘project’ their own thoughts or attitude in the response. These questions can use
techniques, such as word associations or fill in the blanks. They are difficult to
analyse and are usually used in exploratory research. For example:
Complete the following sentences with the first word or phrase that comes into
your mind.
1. My father seldom ___________.
2. Most people don’t know that I am afraid of ___________.
3. When I was a child, I ___________.
4. When encountering frustration, I usually ___________.
1. Initial Considerations
2. Define the Target Audience
3. Identify the Data Required
4. Decide the Content and Format of the Question
5. Select the Type of Questions
6. Make a Plan of Statistical Analysis
7. Design the Sequence and Layout of the Question
8. Add Administrative Instructions
8. Add administrative instructions: Add instructions for the administrator. Also,
add definitions of keywords for the ease of participants.
9. Pilot test and revise: Conduct a pilot test and do revisions as necessary.
10. Finalise and implement: Finalise the questionnaire. Ensure that each question is
clear, simple and brief, and the layout is clear. Finally, launch it on the appropriate
media formats.
Self Assessment Questions
9. What is the first step to design a questionnaire?
a. Define the target audience
b. Decide the statistical tests to be used
c. Define the purpose
d. Select the type of questions to be used
10. You want to administer a questionnaire to different groups simultaneously.
Which design should you use for your questionnaire?
a. Latitudinal design b. Longitudinal design
c. Cross-purpose design d. Cross-sectional design
11. Where should you place sensitive questions in a questionnaire?
a. In the beginning b. In the middle
c. At the end d. They should not be placed
12. The most important questions should be asked at the end of a questionnaire.
(True/False)
Answer format: Arrange answers vertically under each question. If you need to
place any explanatory text or definition, then place it in parentheses immediately
after the question.
Logical: As far as possible, the questionnaire should reflect some natural flow
of thoughts, a sequence of events, or a logical conversation, depending upon the
subject matter.
Sensitive information: Sensitive topics, whether personal or societal, should be
explored appropriately through indirect questions and are best suited to be placed
at the end of the survey.
Pilot study: Always pilot the questionnaire either with some colleagues or people
from the target audience. This will help in detecting any flaws prior to the main
survey.
Grouping: Section headings may be used appropriately, and similar questions
related to a particular topic should be grouped together.
Neutral language: The terminology used should be such that it does not lead the
respondents to answer in one particular way, i.e., positive or negative.
Brevity: Make use of relevant, clear, concise and efficient questions. Clear and
concise questions are more likely to achieve the desired results than a large
number of questions.
Self Assessment Questions
13. What should not be used while designing a questionnaire?
a. Introduction stating the purpose of the questionnaire
b. Legible typeface
Activity
Develop a complete questionnaire for a survey that you will administer to fellow
students in your university. Develop a topic for your questionnaire, determine
the set of constructs you want to measure in the questionnaire and draft each
questionnaire item. Ensure that the following requirements are met:
The questionnaire will be administered to fellow students, so it should be
appropriate for this population.
The questionnaire must include the following:
• An introduction describing the purpose of the survey
• At least 15 questions
• At least three open-ended questions
• At least three close-ended questions
• At least one potentially sensitive question
7.6 SUMMARY
Questionnaires are used to collect statistical data from a group of respondents.
They are economical, quick, easily implementable and cover a wide range of
population. However, they have the disadvantage of inviting limited response.
Therefore, it is important to design effective questionnaires.
A well-designed questionnaire has the following features:
• It is visually appealing, intriguing, engaging and brief.
Pets is one such venture. It assists people who would like to take their pets along
while they go on a vacation. The founder Amit Kumar got the idea for the start-
up when he needed to step out of town for a break and could not find a suitable
boarding facility for his 5-month-old Labrador Lucy. So, he decided to set up a start-
up to help like-minded people.
Buddy-Pets helps pet owners plan their vacations by providing a 24x7 boarding and day care
facility for pets. It also helps find the right grooming and pet supplies services. It
even provides a pet-friendly environment where owners can come with pets to dine
in, socialise and play in a garden cafe.
Buddy-Pets faces the challenge of drawing out a strategic marketing plan to make
its venture fundable by the right target group. It wants to position itself in the
operational gap in the current pet care setup, which mostly consists of pet shops,
clinics and grooming centres with referral tie-ups for boarding establishments. It
also wants to study customer preferences regarding pet care facilities. Thus, Buddy-Pets
wants to conduct market research to:
• Analyse customer preferences for the desired pet care facility in major cities,
  including Delhi, Mumbai and Bangalore
• Identify and evaluate opportunities available in these cities
QUESTIONS
1. What considerations should you keep in mind while designing a questionnaire for
the market research?
(Hint: Initial considerations, target audience, type of design, data required, type of
questions, tips, etc.)
2. What steps will you take to design a questionnaire?
(Hint: Initial considerations, define the target audience, make a plan of statistical
analysis, etc.)
3. What challenge does Buddy-Pets face?
(Hint: The challenge of making a marketing strategic plan which attracts venture
fund by the right target group)
4. What was the purpose of the research in the case?
(Hint: To design a market strategy for Buddy-Pets)
5. How did Buddy-Pets want to position itself in the existing market?
(Hint: To position itself as an operational gap filler by studying customer
preferences for pet care facility.)
7.9 EXERCISE
CHAPTER 8
Data Processing and Analysis
Table of Contents
8.1 Introduction
8.2 Concept of Data Processing
8.2.1 Editing
8.2.2 Coding
8.2.3 Classification
8.2.4 Data Entry
8.2.5 Tabulation
Self Assessment Questions
8.3 Concept of Data Analysis
Self Assessment Questions
8.4 Measures of Central Tendency
8.4.1 Mean
8.4.2 Median
8.4.3 Mode
Self Assessment Questions
8.5 Measures of Dispersion
8.5.1 Range
8.5.2 Mean Deviation
8.5.3 Standard Deviation
Self Assessment Questions
8.6 Measure of Skewness
Self Assessment Questions
8.7 Measures of Relationship
8.7.1 Correlation Analysis
8.7.2 Regression Analysis
Self Assessment Questions
8.8 Different Charts Used in Data Analysis
Self Assessment Questions
8.9 Summary
8.10 Key Words
8.11 Case Study
8.12 Exercise
8.13 Answers for Self Assessment Questions
8.14 Suggested Books and e-References
8.1 INTRODUCTION
In the previous chapter, you studied questionnaire designing. Now, you will
learn the significance and ways of processing and analysing data retrieved from
such questionnaires.
Data in its raw form does not convey any useful information. It needs to be organised
properly to extract the relevant information and make it fit for research. This is
done with the help of data processing that involves various steps, including editing,
coding, classification, data entry and tabulation.
After processing data, you need to analyse it to find answers to the research
problem. You can use various statistical measures, such as the measures of central
tendency, dispersion, skewness and relationship to analyse data. The selection of a
measure depends upon the type of the research problem. For example, if you wish
to find out the average marks of students of class IX in English, then you would use
the measures of central tendency. However, if you want to know the relationship
between the eating habits of children and problems of obesity, then you would
use the measures of relationship. It is important to note that no single statistical
measure is complete in itself to analyse a data series. Therefore, you should use an
optimum combination of different measures to address the problem at hand in the
most effective manner. Any carelessness in data processing and data analysis can
result in erroneous research findings. Moreover, these data tasks form a major part
of research and consume considerable time and effort of the researcher. Therefore, it
is advisable to remain extra vigilant while processing and analysing data for making
the research as authentic as possible.
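The two examples above (average marks and the relationship between eating habits and obesity) can be illustrated with a short sketch. It is a minimal illustration only, using Python's standard statistics module for the measures of central tendency and dispersion and a hand-written Pearson correlation coefficient as one possible measure of relationship. All data values and the names marks, meals_per_week, bmi and pearson_r are hypothetical, and the population standard deviation (pstdev) is an assumed choice.

```python
from math import sqrt
from statistics import mean, median, mode, pstdev

# Hypothetical English marks of five class IX students.
marks = [62, 75, 75, 81, 90]

# Measures of central tendency and dispersion.
central = {"mean": mean(marks), "median": median(marks), "mode": mode(marks)}
spread = {"range": max(marks) - min(marks), "std_dev": pstdev(marks)}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


# Hypothetical data: fast-food meals per week and body mass index of five children.
meals_per_week = [1, 2, 3, 4, 5]
bmi = [19.0, 21.5, 21.0, 24.0, 26.5]
r = pearson_r(meals_per_week, bmi)
```

A value of r close to 1 would indicate a strong positive linear relationship in this made-up sample. It would not by itself show that one variable causes the other, which is one reason why no single measure is sufficient on its own.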
The chapter begins by explaining the concept of data processing and data analysis.
Next, it talks about the measures of central tendency, including mean, median
and mode. Information is also provided about the measures of dispersion and
the measures of skewness. It also explains the measures of relationship, including
correlation analysis, regression analysis and multiple regression. Towards the end,
the chapter discusses other statistical measures used for data analysis.
Let us now discuss each step of data processing in the following section.
8.2.1 EDITING
Editing refers to reviewing the collected data to check whether it is valid or not. Data
is examined to detect errors and omissions. Errors are corrected, omitted data is filled
in, and data is prepared for further processing. The data is retained for analysis.
The editor is responsible for ensuring that the data is accurate, uniform, as complete
as possible and acceptable for tabulation. Editing helps in filtering ambiguous
information that can create a problem at the time of data analysis. Ambiguous
information can be in the form of biased or incorrect responses in a questionnaire
and such information needs to be deleted.
8.2.2 CODING
Coding is the process of providing some codes to the data in the form of symbols,
characters and numbers. It helps the researcher in interpreting the data and deriving
accurate results. If the data is generated with the help of a questionnaire, it can be
coded either at the time of framing the questionnaire or after collecting the data.
The data that is already coded is known as precoded data.
The data that is coded at the time of data processing is known as postcoded data.
Generally, a questionnaire may contain the following types of questions:
Interval-scale questions: An interval scale is any range of values that have a
relevant mathematical difference but no true zero. Any question where the
respondent must enter a temperature value is an interval scale question because
degrees are interval measurements. The data collected through interval-scale and
closed-ended questions is an example of precoded data.
Closed-ended questions: These questions are those for which a researcher
provides respondents with options from which to choose a response.
Open-ended questions: These questions are those which require more thought
and more than a simple one-word answer. The data collected through open-ended
questions is an example of postcoded data. Apart from these, the questionnaire
can also include questions based on nominal scale, ordinal scale and ratio scale.
Data Processing and Analysis
It is easier to code.
It reduces the effort in data processing.
It leads to fewer chances of human error during data processing. Let us understand
the concept of coding with the help of an example.
The following questionnaire aims to measure the comfort level of women in a job after
marriage. Questions 1 to 5 are multiple-choice questions (closed-ended questions),
questions 6 to 14 are interval-scale questions and questions 15–16 are dichotomous
questions.
1. Age Group (years): a. 20-30  b. 30-40  c. 40-50  d. 50 and above
2. Marital Status: a. Married  b. Unmarried  c. Divorced  d. Please specify… (for example, engaged, widow or whatever)
3. Children: a. None  b. One  c. Two  d. More than two
4. Working Status: a. Working  b. Non-working  c. Retired from the job  d. Searching for the job  e. On leave
Questions 6–14: Give the ratings in the following questions as per your choice.
The rating of 1 means the lowest and 5 means the highest.
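The idea of precoding can be sketched in a few lines of Python. This is only an illustration: the numeric codes, the dictionary name CODEBOOK and the two sample respondents are assumptions made for the example and are not part of the questionnaire itself.

```python
# Hypothetical codebook for the first two questions of the sample questionnaire.
# The numeric codes (1, 2, 3, ...) are an assumption; any consistent scheme works.
CODEBOOK = {
    "age_group": {"a": 1, "b": 2, "c": 3, "d": 4},       # 20-30, 30-40, 40-50, 50 and above
    "marital_status": {"a": 1, "b": 2, "c": 3, "d": 4},  # Married, Unmarried, Divorced, Other
}


def code_response(response):
    """Convert one respondent's option letters into numeric codes."""
    return {question: CODEBOOK[question][option] for question, option in response.items()}


# Two made-up respondents, recorded as the option letters they ticked.
responses = [
    {"age_group": "b", "marital_status": "a"},
    {"age_group": "d", "marital_status": "c"},
]
coded = [code_response(r) for r in responses]
print(coded)
```

Because the codes are fixed before data collection, every questionnaire is converted in the same way, which is why precoded data is easier to process.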
8.2.3 CLASSIFICATION
Classification refers to categorising the coded questions into different segments as
per their relevance. This is done to simplify data processing and analysis to a great
extent. It is important to note that variables in a segment possess certain similar
characteristics. For example, demographic information is a segment that includes
variables, such as age, education and work experience of the respondents.
In Table 1, 10 respondents are in the age group of 25-35. Thus, 10 is the frequency
of the class interval 25-35. When class intervals and frequencies are represented in a
tabular form, as in Table 1, such a representation is known as frequency distribution.
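A frequency distribution of this kind can be built by counting how many observations fall into each class interval. The sketch below uses made-up ages and made-up class intervals (they are not the values of Table 1), and it assumes that the lower limit of a class is included and the upper limit is excluded.

```python
from collections import Counter

# Hypothetical class intervals and ages; lower bound inclusive, upper bound exclusive.
CLASS_INTERVALS = [(25, 35), (35, 45), (45, 55)]
ages = [26, 31, 38, 44, 29, 52, 47, 33, 36, 27]


def class_of(age):
    """Return the label of the class interval that contains the given age."""
    for low, high in CLASS_INTERVALS:
        if low <= age < high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} falls outside every class interval")


# Absolute frequency of each class interval.
frequency = Counter(class_of(a) for a in ages)
for label, count in frequency.items():
    print(label, count)
```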
After classifying data, the researcher enters data in the computer. If wrong data is
entered, then the result would be inaccurate. There are various statistical or database
management software for data entry, such as:
Bio-Medical Data Package (BMDP)
Statistical Programming Language (S-PLUS)
Statistical Analysis System (SAS)
Statistical Package for Social Sciences (SPSS)
Of these packages, SPSS is widely used by researchers for data entry.
8.2.5 TABULATION
Tabulation refers to presenting data in the form of a table so that it can be easily
analysed. In this stage, the frequencies of the dataset are also computed.
There are three types of frequencies, namely absolute frequency, relative frequency
and cumulative frequency.
Absolute frequency is the exact frequency given by the respondents.
Relative frequency is calculated with relation to the frequency of the other class
intervals. It is the percentage of all respondents who have given a particular
response.
Cumulative frequency is the percentage of all respondents who have given a
response equal to or less than a particular value.
There are two types of frequency distributions, which can be put into a tabular form:
1. Two-way frequency distribution: In this type of frequency distribution, two
variables can be analysed at a time. This frequency distribution is also known as
cross tabulation.
2. One-way frequency distribution: In this type of frequency distribution, a single
variable is analysed.
Table 2 shows an example of the one-way frequency distribution.
In Table 2, age group is taken as a variable and different types of frequencies are
calculated. As already discussed, absolute frequency is the precise frequency given
by the respondents. Relative frequency can be calculated by dividing the absolute
frequency by the total frequency and multiplying the result by 100. For example, in
case of the 20-30 age group, absolute frequency is 10 and the total frequency is 56;
therefore, the relative frequency is 17.86% (10/56 × 100). Cumulative frequency can be
calculated by adding the relative frequency of the present class interval (whose
cumulative frequency we are calculating) to the relative frequencies of all the
preceding class intervals. For example, in case of the 20-30 and 30-40 age groups, the
relative frequencies are 17.86 and 25.00, respectively. Therefore, the cumulative
frequency in the case of the 30-40 age group is 42.86 (17.86 + 25.00).
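The three types of frequency can be computed together, as in the following sketch. The counts for the 20-30 and 30-40 age groups follow from the figures quoted above (17.86% and 25.00% of 56 respondents); the counts for the last two groups are assumptions chosen only so that the total is 56.

```python
from itertools import accumulate

# 10 (20-30) and 14 (30-40) follow from the text; the other two counts are assumed.
absolute = {"20-30": 10, "30-40": 14, "40-50": 20, "50 and above": 12}

total = sum(absolute.values())

# Relative frequency: share of each class in the total, as a percentage.
relative = {label: round(count / total * 100, 2) for label, count in absolute.items()}

# Cumulative frequency: running total of the counts, as a percentage of the total.
running_totals = accumulate(absolute.values())
cumulative = {label: round(c / total * 100, 2) for label, c in zip(absolute, running_totals)}

for label in absolute:
    print(label, absolute[label], relative[label], cumulative[label])
```

The cumulative frequency of the last class interval is always 100%, which is a quick check on the arithmetic.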
[Figure: Types of data analysis: descriptive analysis (univariate, bivariate and multivariate analysis) and inferential analysis (parametric and non-parametric tests)]
Descriptive analysis: In this type of data analysis, the distribution patterns and
characteristics of different types of variables are analysed. There are three types of
descriptive analysis:
- Univariate analysis: This analysis studies a single variable. Examples include
  measures of central tendency, dispersion and skewness. However, sometimes
  these measures can also be used for bivariate and multivariate analysis.
- Bivariate analysis: In this analysis, two variables are studied. One variable can
  be classified as independent and the other as dependent. Examples are rank
  correlation, simple correlation and simple regression.
- Multivariate analysis: In this analysis, more than two variables are studied.
  Among the variables being studied, there can be more than two independent
  variables and more than one dependent variable. Examples include multiple
  correlations and regressions.
Inferential analysis: In this type of data analysis, significance tests are used to
check the validity of a hypothesis for studying a problem. There are two types of
significance tests:
- Parametric tests: These tests make assumptions about the parameters of the
  population from which a sample is derived. Examples of parametric tests
  include z-test and t-test.
- Non-parametric tests: These tests do not make any assumptions about the
  parameters of the population from which the sample is derived. An example
  of a non-parametric test is the Kruskal-Wallis test.
8.4.1 MEAN
Mean represents the value calculated after dividing the sum of observations by the
total number of observations (n) taken. It is also known as arithmetic mean.
The following formula is used to calculate mean:
X = ∑Xi/n
Where, ∑Xi = X1 + X2 +… + Xn
n = Number of observations
Let us understand the concept of arithmetic mean with the help of an example.
Suppose you want to find the average weight of a group of five friends. Table 3
shows the weight of each person in the group:
X = ∑Xi/n
n=5
X = (35 + 40 + 34 + 39 + 42)/5
X = 190/5
X = 38 kg
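The same calculation can be checked with a short Python sketch. The variable names are chosen only for illustration; the five weights are the ones used in the worked example above.

```python
import statistics

# Weights (in kg) of the five friends from the worked example.
weights_kg = [35, 40, 34, 39, 42]

# Arithmetic mean: sum of the observations divided by their number.
mean_weight = sum(weights_kg) / len(weights_kg)

# The standard library gives the same result.
assert mean_weight == statistics.mean(weights_kg)
print(mean_weight)
```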
You want to calculate the geometric mean of four observations: 10, 12, 10 and 11.
The calculation of geometric mean is shown as follows:
X1, X2, X3, X4 = 10, 12, 10, 11
n = 4
$X_g = \sqrt[4]{X_1 \times X_2 \times X_3 \times X_4}$
$X_g = \sqrt[4]{10 \times 12 \times 10 \times 11}$
$X_g = \sqrt[4]{13200} \approx 10.718$
Therefore, the geometric mean of four observations is 10.7 years.
Harmonic mean: Harmonic mean refers to reciprocal of the average of the
reciprocals of the values in a data series (or observations). The formula to calculate
harmonic mean is as follows:
Harmonic mean (XH) = Rec. [(Rec. X1 + Rec. X2 + … + Rec. Xn)/n]
Where, Rec. X1, Rec. X2 …. Rec. Xn = Reciprocal of observations 1, 2, …, n
n = Number of observations
Example of Harmonic Mean
Calculate the harmonic mean of four observations: 10, 12, 10 and 11.
Harmonic mean is calculated as:
XH = Rec. [(Rec. X1 + Rec. X2 + Rec. X3 + Rec. X4)/n]
Where, Rec. X1, Rec. X2, Rec. X3, Rec. X4 = 1/10, 1/12, 1/10, 1/11
n = 4
$X_H = \text{Rec.}\left[\left(\frac{1}{10} + \frac{1}{12} + \frac{1}{10} + \frac{1}{11}\right) \div 4\right]$
$X_H = \text{Rec.}\left[\frac{247}{660} \div 4\right] = \text{Rec.}\left[\frac{247}{660 \times 4}\right] = \frac{660 \times 4}{247} \approx 10.69$
Therefore, the harmonic mean of the four observations is 10.7 years. It is used
for units that add up as reciprocals in a sequence, such as speed, distance,
capacitance in series or resistance in parallel.
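The geometric and harmonic means of the four observations can be verified with Python's standard statistics module (version 3.8 or later is assumed for geometric_mean). The sketch also shows the usual ordering of the three means for positive data.

```python
import statistics

observations = [10, 12, 10, 11]

arithmetic = statistics.mean(observations)           # 10.75
geometric = statistics.geometric_mean(observations)  # fourth root of 13200
harmonic = statistics.harmonic_mean(observations)    # (660 x 4) / 247

# For positive observations: harmonic <= geometric <= arithmetic.
assert harmonic <= geometric <= arithmetic
print(round(geometric, 3), round(harmonic, 3))
```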
8.4.2 MEDIAN
Median is defined as a central or mid-value of a dataset. Median divides a dataset
into two halves: one half contains the values greater than the mid-value (or median)
and the other half contains the values less than the mid-value.
Before calculating median, you need to arrange the dataset in the ascending or
descending order. The formula to calculate median is as follows:
Median = Value of ((n + 1)/2)th observation
Where, n = Number of observations
A group of 17 people gave the following ratings to a book on a 5-point scale (where
1 is the lowest rating and 5 is the highest rating):
2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4
Now you want to calculate the average rating by using median. To do so, arrange the
data in the ascending order, as follows:
1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5
Since the number of observations is odd, the following formula will be used to
calculate median:
Median = Value of ((n + 1)/2)th observation
Median = Value of ((17 + 1)/2)th observation = Value of the 9th observation
Median = 3
Now, if n is an even number, then we calculate median as the simple average of the
middle two numbers. In other words, median is the simple average of the n/2th and
(n/2 + 1)th terms.
Now, if a group of 20 people gave their ratings to a movie on a 5-point scale as:
2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4, 1, 2, 3
Now, to calculate the average rating using median, all the 20 observations are
arranged in ascending order as:
1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5
Here, median is the average of middle two values, i.e., values at 10th and 11th
positions. This is calculated as:
Median = (3 + 3)/2 = 3
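Both cases (odd and even number of observations) can be reproduced with a small sketch. The function name median_of is an illustrative choice; the two lists are the book and movie ratings from the examples above.

```python
import statistics


def median_of(values):
    """Median by the textbook rule: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                         # ((n + 1)/2)th observation
    return (ordered[middle - 1] + ordered[middle]) / 2  # average of the two middle values


book_ratings = [2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4]            # n = 17
movie_ratings = [2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4, 1, 2, 3]  # n = 20

book_median = median_of(book_ratings)
movie_median = median_of(movie_ratings)

# The standard library agrees with the hand-written rule.
assert book_median == statistics.median(book_ratings)
assert movie_median == statistics.median(movie_ratings)
print(book_median, movie_median)
```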
According to Croxton and Cowden, the mode of a distribution is value at the point
around which the items tend to be most heavily concentrated. It may be regarded as the most
typical of a series of values.
Let us learn to calculate mode with the help of an example. Suppose the marks of
five friends in a science paper are 70, 90, 50, 70, and 30. You want to find the mode
of their marks.
You need to find the highest frequency of the present data to calculate mode. Here,
the number having the highest frequency is 70 as it occurs two times; therefore, the
mode of students’ marks is 70.
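The mode can be found by counting how often each value occurs. The sketch below uses statistics.multimode (Python 3.8 or later is assumed), which returns every value that shares the highest frequency, so it also handles a series with more than one mode. The second list is a made-up series used only to show the bimodal case.

```python
import statistics

marks = [70, 90, 50, 70, 30]
mode_of_marks = statistics.multimode(marks)   # a single mode

# Hypothetical series in which two values share the highest frequency.
bimodal_series = [17, 17, 18, 18, 19]
modes_of_series = statistics.multimode(bimodal_series)

print(mode_of_marks, modes_of_series)
```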
Mode is used as the most important statistic for nominal data where values are names
rather than numbers. In such cases, there is no concept of centre because there are no
numbers. In addition, when we are dealing with continuous variables, the probability
that all the observations in the data sample are different from one another is 1.
Therefore, mode cannot be used for continuous variables.
Mode is not considered a true measure of central tendency because of the following
reasons:
i. It is not necessary that one data series has only one mode because many numbers
in the data series can have the highest frequency.
ii. Mode does not consider all the frequencies to arrive at the central value of the data
series. Therefore, the results of mode are not reliable.
iii. It is possible that a series has observations that occur only once. In such cases,
[Figure: Measures of dispersion: range, mean deviation and standard deviation]
8.5.1 RANGE
Range represents the difference between the highest value and the lowest value in a
data series. It is considered as a rough measure of variability because it depends on
the size of the data series. When the highest (H) and/or the lowest (L) data point in a
data series changes, the range also changes.
Let us learn to calculate range with the help of the preceding example in which a
group of 17 people rated a book on a 5-point scale, where 1 is the lowest rating and
5 is the highest rating. The rating given by the 17 people is as follows:
2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4
Now, you want to calculate the range for the data series.
To do so, you need to find the highest and lowest values of the data series. In the
present case, the highest value (H) is 5 and the lowest value (L) is 1. Therefore,
Range = H – L
Range = (5 – 1)
Range = 4
According to Clark and Schkade, average deviation is the average amount of scatter of
the items in a distribution from either the mean or the median, ignoring the signs of the
deviations. The average that is taken of the scatter is an arithmetic mean, which accounts for
the fact that this measure is often called the mean deviation.
Mean Deviation (MD) is used to measure variability across a data series. It is calculated as:
MD = ∑|Xi – X|/n
Where,
Xi = ith observation
X = Mean/Median/Mode
n = Number of observations
With the help of MD, you can also calculate the coefficient of MD. The coefficient of
MD refers to the relative measure of dispersion that can be calculated by dividing
MD by mean/median/mode.
Coefficient of MD = MD/X
Where
X = Mean/Median/Mode
Let us understand the concept of MD and the coefficient of MD with the help of an
earlier example in which you calculated the average weight of five friends.
Table 5 shows the data used for calculating mean deviation:
X = (35 + 40 + 34 + 39 + 42)/5 = 38
Mean Deviation (M.D.) = ∑|Xi – X|/n
M.D. = 14/5
M.D. = 2.8
Coefficient of M.D. = 2.8/38
= 0.074
Therefore, the weights of the five friends deviate from the average weight by 2.8 kg
on average. The relative measure of dispersion is 0.074.
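The mean deviation and its coefficient can be recomputed as follows. This is a minimal sketch that takes deviations from the mean (the text also allows the median or mode as the reference value); the variable names are illustrative.

```python
# Weights (in kg) of the five friends from the worked example.
weights_kg = [35, 40, 34, 39, 42]

n = len(weights_kg)
mean_weight = sum(weights_kg) / n

# Absolute deviations from the mean, ignoring signs.
absolute_deviations = [abs(x - mean_weight) for x in weights_kg]

mean_deviation = sum(absolute_deviations) / n
coefficient_of_md = mean_deviation / mean_weight

print(mean_deviation, round(coefficient_of_md, 3))
```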
SD of population: $\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$
and σ = Parameter of the population
SD of sample: $S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$
If the observations are grouped into a frequency table, then the formulae for SD and
variance change as follows:
$\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{n}}$
and $\bar{X} = \frac{\sum Xf}{\sum f}$
n = ∑f
Therefore, $\sigma^2 = \frac{\sum f (X - \bar{X})^2}{n}$
The coefficient of SD can be calculated by dividing SD by the mean of the series. It
is a relative measure of dispersion.
Let us understand the concepts of SD, the coefficient of SD, and the coefficient of
variance with the help of an example.
Suppose you want to calculate the standard deviation of the weights of five friends
shown in the preceding example. Table 6 shows the data used to calculate the standard
deviation, the coefficient of standard deviation, and the coefficient of variance:
Name      Weight (Xi)   Xi – X   (Xi – X)²
Jenny     35            −3       9
Robert    40            2        4
Ella      34            −4       16
Andy      39            1        1
Eliza     42            4        16
Total     190                    46
X = (35 + 40 + 34 + 39 + 42)/5 = 38
σ = √(46/5) = √9.2
= 3.033
Coefficient of SD = 3.03/38
= 0.0798
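The figures above can be reproduced with a short sketch. It divides by n, as the worked example does, which corresponds to statistics.pstdev in Python's standard library; the variable names are illustrative.

```python
import math
import statistics

# Weights (in kg) of the five friends from the worked example.
weights_kg = [35, 40, 34, 39, 42]

n = len(weights_kg)
mean_weight = sum(weights_kg) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean_weight) ** 2 for x in weights_kg)

variance = squared_deviations / n          # 46 / 5 = 9.2
standard_deviation = math.sqrt(variance)   # about 3.033
coefficient_of_sd = standard_deviation / mean_weight

# The library function for the population standard deviation gives the same value.
assert abs(standard_deviation - statistics.pstdev(weights_kg)) < 1e-9
print(round(standard_deviation, 3), round(coefficient_of_sd, 4))
```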
As you have learned in the preceding sections, through a measure of central tendency,
you measure the concentration of values of a data series in the middle of a frequency
distribution. Through a measure of dispersion, you measure the scattering of values
near the middle value of the data series.
It may be possible that two data series, which are widely different in nature and
composition, have the same mean and standard deviation. However, when you plot
the data of such series on graphs, you obtain curves with different shapes. This
shows that the measures of central tendency and dispersion are not sufficient to
study the frequency distribution of a data series because they do not talk about the
shape of the frequency distribution curves. Therefore, you need skewness to gain
an understanding of the different shapes of various frequency distribution curves.
The measure of skewness is used when the concentration of values of a data series is
more on a single side that is either positive or negative.
Skewness can be classified as positive skewness and negative skewness. This is
shown in Figure 5:
[Figure 5: Frequency distribution curves, including a symmetric distribution, with the positions of the mean, median and mode marked]
Positive skewness implies that the longer tail of the curve lies on the right side, with
most values concentrated on the left, whereas negative skewness implies that the longer
tail lies on the left side, with most values concentrated on the right. Skewness is
calculated by taking the difference of mean and mode. In positive skewness, the values
of the three measures of central tendency are in the following order:
Mean > Median > Mode
However, in the case of negative skewness, the values of these three measures of
central tendency are in the following order:
Mean < Median < Mode
Skewness = X – Z
Where, X = Mean and Z = Mode
If the mode is not well defined, it can be estimated from the mean and the median (M) as:
Z = 3M – 2X
The coefficient of skewness is the relative measure of skewness that can be calculated
by dividing skewness by standard deviation.
Coefficient of skewness = Sk = (X – Z)/σ
Substituting Z = 3M – 2X gives:
Sk = (3 Mean – 3 Median)/σ
For a moderately skewed distribution, if there is more than one mode or if there is no
mode, then you need to calculate skewness and the coefficient of skewness using the
method of moments.
Let us now calculate skewness and the coefficient of skewness with the help of an
example. Suppose you want to calculate the skewness and the coefficient of skewness
of the data given in Table 7:
X = 89/5
X = 17.8
M = ((5 + 1)/2)th observation
M = 3rd observation = 18
Since the data contains two modes (17 and 18), you do not consider mode in this
case.
σ = √(∑(Xi – X)²/n)
= √(2.80/5) ≅ 0.75
Skewness = 3 Mean – 3 Median = 3 × 17.8 – 3 × 18 = –0.6
Sk = –0.6/0.75 = –0.8
The skewness in the ages of five friends is –0.6 and the relative measure of skewness
is –0.8. The negative sign indicates that the mean lies below the median.
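The median-based calculation can be sketched as follows. The five ages are an assumption: they are chosen only because they match the totals quoted in the example (sum 89, median 18, modes 17 and 18, sum of squared deviations 2.80) and they are not taken from Table 7 itself.

```python
import statistics

# Assumed ages, consistent with the totals quoted in the example.
ages = [17, 17, 18, 18, 19]

mean_age = statistics.mean(ages)       # 17.8
median_age = statistics.median(ages)   # 18
sigma = statistics.pstdev(ages)        # square root of 2.80 / 5

# Skewness with the mode replaced by Z = 3M - 2X, i.e. 3 x (mean - median).
skewness = 3 * (mean_age - median_age)
coefficient_of_skewness = skewness / sigma

print(round(skewness, 2), round(coefficient_of_skewness, 2))
```

With the mean below the median, both values are negative; the hand calculation differs slightly in the last decimal because it rounds σ to 0.75 first.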
[Figure: Measures of relationship: correlation analysis and regression analysis]
Different tools are used to study the correlation pattern between variables. These
include: Rank correlation and Simple correlation.
Rank correlation: Rank correlation refers to the correlation between two data
series in which the data is ranked. Generally, it is used when the data is qualitative
in nature. It was given by Charles Spearman. Therefore, it is also known as
Spearman’s coefficient of correlation. It calculates the degree of relationship
between two ranked variables.
The formula to calculate rank correlation is as follows:
$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$
Where, di = Difference between the ranks of the ith pair of observations
n = Number of pairs of observations
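The rank correlation formula can be applied directly, as in the sketch below. The two lists of ranks are hypothetical (for instance, the ranks given by two judges to five candidates), and the formula in this simple form assumes that there are no tied ranks.

```python
def rank_correlation(ranks_x, ranks_y):
    """Spearman's rank correlation for two equally long lists of ranks without ties."""
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to the same five candidates.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

rho = rank_correlation(judge_a, judge_b)
print(rho)
```

Here the squared rank differences add up to 4, so ρ = 1 – 24/120 = 0.8, which indicates a strong positive agreement between the two rankings.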
Simple correlation: Simple correlation is used to find the degree of linear
relationship between two variables. It is the most commonly used measure to
describe relationship between two linearly related variables. It was given by Karl
Pearson. Therefore, it is also known as Karl Pearson’s coefficient of correlation.
Simple correlation can be of three types, as given in Figure 7:
[Figure 7: Positive correlation, negative correlation and no correlation]
The strength of association between two variables depends on the calculated value
of the correlation coefficient and the sample size. The value of the correlation
coefficient lies between –1 and +1.
- If the value of the correlation coefficient is close to –1 and the sample size
  is sufficiently large, then there is a strong negative correlation between two
  variables. For example, if the coefficient of correlation is –0.8, then there is a
  strong negative association between variables.
- If the value of the correlation coefficient is close to +1 and the sample size
  is sufficiently large, then there is a strong positive correlation between two
  variables. For example, if the coefficient of correlation is 0.8, then there is a
  strong positive association between variables.
- If the correlation coefficient is not close to –1 or +1 and the sample size is
  sufficiently large, then there is weak correlation between two variables. For
  example, if the coefficient of correlation is 0.3 or –0.3, then the association
  between variables is weak.
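The formula to calculate simple correlation is as follows:
$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$
Equivalently, in terms of the means and standard deviations of the two variables:
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_x S_y}$
Where,
$\bar{X}$ = Mean of X variable
$\bar{Y}$ = Mean of Y variable
n = Number of pairs of observations
Sx = Standard deviation of X
Sy = Standard deviation of Y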
Let us learn to calculate simple correlation between two variables with the help of an
example. Suppose you want to study the correlation between the age and weight of a
group of people to find out the relation between the two. Table 8 shows the required
data:
Number of
Observations   Age (Xi)   Weight (Yi)   Xi²     Yi²     XiYi
1              18         35            324     1225    630
2              20         38            400     1444    760
3              25         50            625     2500    1250
4              30         65            900     4225    1950
5              35         70            1225    4900    2450
6              24         50            576     2500    1200
7              17         35            289     1225    595
8              16         39            256     1521    624
9              49         76            2401    5776    3724
10             45         72            2025    5184    3240
11             50         85            2500    7225    4250
12             18         32            324     1024    576
13             20         34            400     1156    680
14             25         57            625     3249    1425
15             24         50            576     2500    1200
16             17         35            289     1225    595
17             16         39            256     1521    624
18             23         44            529     1936    1012
19             22         45            484     2025    990
20             34         60            1156    3600    2040
21             36         65            1296    4225    2340
22             31         63            961     3969    1953
23             43         70            1849    4900    3010
24             44         72            1936    5184    3168
25             16         35            256     1225    560
Total          ∑Xi=698    ∑Yi=1316      ∑Xi²=22458   ∑Yi²=75464   ∑XiYi=40846
r = (25 × 40846 – 698 × 1316)/√[(25 × 22458 – 698 × 698) × (25 × 75464 – 1316 × 1316)]
r = 102582/√(74246 × 154744)
r = 0.96
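The calculation can be checked against the 25 pairs of observations in Table 8. The sketch below is a direct transcription of the computational formula for r; the function name pearson_r is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from paired observations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Age (Xi) and weight (Yi) of the 25 people listed in Table 8.
ages = [18, 20, 25, 30, 35, 24, 17, 16, 49, 45, 50, 18, 20,
        25, 24, 17, 16, 23, 22, 34, 36, 31, 43, 44, 16]
weights = [35, 38, 50, 65, 70, 50, 35, 39, 76, 72, 85, 32, 34,
           57, 50, 35, 39, 44, 45, 60, 65, 63, 70, 72, 35]

r = pearson_r(ages, weights)
print(round(r, 2))
```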
Cause and effect analysis is measured using simple regression or multiple regression.
Regression is generally used to predict the values of Y based on the values of X. However, it
cannot be rightly said that Y is caused by X. Before making such an interpretation,
it is extremely imperative for the researcher to thoroughly understand the variables
under study and the circumstances or context under which they operate.
Y = α + βX
Where,
$\alpha = \frac{1}{n}\left(\sum Y - \beta \sum X\right)$
Simple regression analysis is useful in a number of situations. For example, it is used
in analysing the relationship between the number of consumers (independent variable)
and the product sales of a month (dependent variable). The regression equation is
fitted to the data using the least squares method.
Let us take an example with data on the number of customers and monthly sales for 10
observations (N), as shown in Table 9:
Y = α + βX
Thus, the regression equation for the above data is given as:
Y = 12.485 + 1.1314X
With this equation, the values of Y (monthly sales) can be computed for any given
value of X (no. of customers), as depicted in Table 10.
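A least squares fit of this kind can be sketched in a few lines. The slope formula used here, β = (n∑XY – ∑X∑Y)/(n∑X² – (∑X)²), is the standard least squares result and is stated as an assumption because it is not written out above; the customer and sales figures are made up for the illustration and are not the data of Table 9. The last function simply evaluates the regression equation quoted in the text.

```python
def fit_simple_regression(xs, ys):
    """Least squares estimates (alpha, beta) for the line Y = alpha + beta * X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    beta = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    alpha = (sum_y - beta * sum_x) / n
    return alpha, beta


# Hypothetical data lying exactly on the line Y = 2 + 3X.
customers = [10, 20, 30, 40, 50]
sales = [32, 62, 92, 122, 152]
alpha, beta = fit_simple_regression(customers, sales)


def predicted_sales(no_of_customers, alpha=12.485, beta=1.1314):
    """Evaluate the regression equation quoted in the text, Y = 12.485 + 1.1314X."""
    return alpha + beta * no_of_customers


print(alpha, beta, predicted_sales(100))
```

For example, for 100 customers the quoted equation predicts monthly sales of 12.485 + 1.1314 × 100 = 125.625.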
If the research report contains many descriptive tables, it can be made more
readable and attractive if the most important tables are presented through graphs
and diagrams. In the graphical presentation, facts and figures are gathered first and
then they are depicted in the form of graphs and charts to present the statistical
information.
The most frequently used graphs and charts include the following:
Bar chart: A bar chart represents categorical data with the help of rectangular bars,
plotted vertically or horizontally. The heights or lengths of rectangular bars are
proportional to the values represented by them. The data can be in the form of
absolute frequencies or relative frequencies.
Figure 8 below shows a bar chart to depict the relative frequency/percentage of
shortages of anti-inflammatory medicines in the rural health organisations:
[Bar chart of shortages of anti-inflammatory medicines; x-axis: Percentage of health clinics (0 to 60); bars: Never 31, Occasionally 11, Frequently 3, Rarely 55]
Pie chart: A pie chart is a circular statistical graphic, segregated into different
segments to illustrate the numerical proportions/relative frequency of a number
of items. The arc length of each segment shows the proportionate quantity
represented by it. Pie charts provide a quick overview of the data presented to the
readers. All segments of the pie chart should add up to 100%.
Figure 9 shows a pie chart to depict the relative frequency/percentage of shortages
of anti-inflammatory medicines in the rural health organisations:
[Figure 9: Pie chart of shortages of anti-inflammatory medicines: Never 31%, Rarely 55%, Occasionally 11%, Frequently 3%]
[Histogram; x-axis: Sales Range, with classes (32, 182), (182, 332), (332, 482), (482, 632), (632, 782), (782, 932); y-axis: No. of Salespersons]
Figure 10: Absolute Frequency of Sales Effected by Different Salespersons in a Month (n=60)
Line graph: A line graph or a line chart is generally used to visualise the value
of a particular variable over time. Line graphs are useful to show the trend of
numerical data over a period of time. Two or more distributions (each depicted by a
separate line) can be shown in one graph as long as the difference between them is
easily distinguishable. They also make it possible to compare the distributions of
different groups, for example, age distribution between males and females. Figure 11
shows a line graph to depict the frequency of daily number of patients being treated
at the rural health organisations in District Y:
[Line chart; x-axis: Day Number (1 to 12); y-axis: Daily Number of Patients Undergoing Treatment]
Figure 11: Daily Number of Patients Being Treated at the Rural Health Organisations
in District Y in Line Chart
Box and whisker plot: This is a method of graphically representing different
groups of numerical data through their quartiles. The box plots can also have
vertical lines extending from the boxes (called whiskers) to indicate the variability
outside the upper and lower quartiles. For example, variability between
sales patterns effected in Area X and Area Y is shown through box plots in
Figure 12:
[Box plots; y-axis: Sales (0 to 900)]
Figure 12: Sales Patterns of Food Grains Effected in Area X and Area Y
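The quartiles on which a box plot is based can be computed with statistics.quantiles (Python 3.8 or later is assumed). The sales figures below are hypothetical and are not read from Figure 12; the "inclusive" method is one of several common conventions for locating quartiles.

```python
import statistics

# Hypothetical sales figures for one area.
sales_area_x = [100, 200, 300, 400, 500, 600, 700, 800, 900]

# Lower quartile, median and upper quartile (the edges and centre line of the box).
q1, q2, q3 = statistics.quantiles(sales_area_x, n=4, method="inclusive")

# The interquartile range is the height of the box.
interquartile_range = q3 - q1

print(q1, q2, q3, interquartile_range)
```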
8.9 SUMMARY
A researcher collects any type of data, quantitative and qualitative, in raw form.
After that, he/she needs to process the collected data to make it fit for analysis.
Editing refers to reviewing the collected data to check whether it is valid or not.
This helps in eliminating the extra information and retaining the relevant matter
for analysis.
When the data is generated with the help of a questionnaire, it can be coded either
at the time of framing the questionnaire or after collecting the data.
Classification refers to categorising the coded questions into different segments as
per their relevance.
Tabulation refers to presenting the data in the form of a table so that it can be
analysed easily.
Descriptive analysis is used to study the relationship pattern among variables.
Inferential analysis uses various types of tests of significance to check the validity
of a hypothesis for studying a problem.
The measures of central tendency are used to study the distribution pattern of a
dataset.
Mean represents the value received after dividing the sum of observations by the
total number of observations.
Median refers to the central value of the given dataset.
Mode refers to the value that has the highest frequency in a data series.
The measures of dispersion refer to the measures that are used to study the
dispersed value near the mean value.
Standard deviation is used to calculate the scattering of values in a given dataset.
The measure of skewness is used to study the shape of the curve that can be drawn
by plotting the data of a frequency distribution on a graph.
The measures of relationship study the relationship between two or more variables
in a given data series.
The consultants collected a large amount of data with the help of questionnaires,
interviews, and observations in the restaurants’ outlets. Then, they carefully
followed the data processing steps to analyse it and retrieve relevant and meaningful
information from it.
While processing the responses in the questionnaires, they found that quite a large
number of questions were left unanswered. Instead of ignoring such questions, they
After retrieving sufficient data from the questionnaires, they classified the collected
data. To do so, they combined customers’ responses from different cities and then
sub-grouped them according to their cities. Next, they formed a table to analyse the
relationship between customers’ satisfaction and the sales of the company:
Calculating the Correlation between Customer Satisfaction and Sales of the Company
Observation   Customer Satisfaction (Xi)   Sales (Yi)   Xi²    Yi²    XiYi
12            8                            8            64     64     64
13            7                            9            49     81     63
14            10                           11           100    121    110
15            6                            5            36     25     30
16            9                            12           81     144    108
17            8                            15           64     225    120
18            10                           12           100    144    120
19            9                            16           81     256    144
20            8                            20           64     400    160
21            10                           20           100    400    200
22            4                            6            16     36     24
23            5                            8            25     64     40
24            10                           14           100    196    140
25            10                           19           100    361    190
Total         185                          239          1525   2957   1973
The correlation between the customers’ satisfaction and the sales of company is as
follows:
$r = \frac{n\sum X_iY_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}}$
r = (25 × 1973 – 185 × 239) / √[(25 × 1525 – 185 × 185) × (25 × 2957 – 239 × 239)]
r = 5110/8095.41
r = 0.63
Since the correlation coefficient is positive and well above zero, it indicates that the
relationship between the customers’ satisfaction and the sales is positive and fairly strong.
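When only the column totals of such a table are available, the coefficient can still be computed from them. The sketch below does this for the totals of the case study table; the function name pearson_from_totals is an illustrative choice.

```python
import math


def pearson_from_totals(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Correlation coefficient computed from the column totals of a working table."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Totals of the case study table: n = 25, sum Xi = 185, sum Yi = 239,
# sum Xi^2 = 1525, sum Yi^2 = 2957, sum XiYi = 1973.
r = pearson_from_totals(25, 185, 239, 1525, 2957, 1973)
print(round(r, 2))
```

The numerator is 5110 and the denominator is about 8095.41, so r is roughly 0.63.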
Similarly, the consultants studied the relationship between different variables, such
as quality of service and customer satisfaction, quality of service and established
standards, and so on. Finally, they concluded that the satisfaction level of the
restaurant’s customers was positive and strong. However, the restaurant’s service
level was far behind the established quality standards.
QUESTIONS
1. What are the different steps of data processing used in the case study?
(Hint: The consultants used all the steps of data processing, that is, first they
extracted the relevant data. Then, they classified and organised the information
and studied the relationship between variables.)
2. Which type of measure is used in analysing the table and what type of analysis is
used?
(Hint: The measure of relationship is used to analyse the table.)
3. What was done to unanswered questions of the questionnaires filled by customers?
(Hint: Unanswered questions were not ignored and a systematic procedure was
followed to retrieve sufficient data.)
4. How was the data retrieved from questionnaire collected and classified?
(Hint: The customers’ responses from different cities were combined and then
sub-grouped according to their cities.)
5. How was the relationship between customers’ satisfaction and the sales of the
company derived?
(Hint: By forming a table and calculating correlation between customers’
satisfaction and the sales of the company)
8.12 EXERCISE
1. Explain the different steps of data processing.
2. What are the different types of data analysis?
3. What are the measures of central tendency? Why are they used?
4. What are the measures of dispersion? Why are they used?
5. What do you understand by ‘skewness’? What is the measure of skewness? What
does its calculated value indicate?
6. What is the purpose of causal analysis?
E-REFERENCES
What are Mean, Median, Mode and Range?. (2020). Retrieved 9 April 2020, from
https://searchdatacenter.techtarget.com/definition/statistical-mean-median-
mode-and-range
Introduction to Correlation and Regression Analysis. (2020). Retrieved 9 April 2020,
from http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Multivariable/
BS704_Multivariable5.html
CHAPTER 9
Concept of Hypothesis
Table of Contents
9.1 Introduction
9.2 Defining Hypothesis
9.1 INTRODUCTION
In the previous chapter, you studied the concept of data processing. The chapter
discussed the concept of data analysis. The latter sections of the chapter described
the measures of central tendency, measures of dispersion and measures of skewness.
The chapter concluded with the explanation of the different charts used in data
analysis.
has resulted into enhanced sales or not. In this case, the researcher would first form
the hypothesis that the new advertisement has no impact on the organisation’s sales.
This hypothesis is known as null hypothesis. After that, the researcher would form
another hypothesis, known as alternative hypothesis, which states that the new
advertisement has a positive impact on the organisation’s sales. Then, the researcher
would analyse the data to find the relationship between the new advertisement and
the organisation’s sales. If he/she finds a relationship between the new advertisement
and the sales, he/she would reject the null hypothesis and accept the alternative
hypothesis.
In the field of research, the concept of hypothesis and hypothesis testing hold a very
special place. The formation of hypothesis helps the researcher remain focussed on
the research problem. In addition, it gives direction to the research project by clearly
defining the scope of research. Hypothesis testing assists the researcher in deriving
realistic results, as it takes into consideration the errors due to sampling.
In this chapter, you will learn about the concept of hypothesis and explore the
characteristics and types of hypothesis. The chapter also provides information about
hypothesis testing, null and alternative hypotheses, decision rules, one-tailed test
and two-tailed test.
Clear topic: A hypothesis should clearly define its topic. The topic should also be
meaningful.
Precise: A hypothesis should be clear and specific to facilitate a deep and
comprehensive study, and enable researchers to draw reliable inferences on its
basis.
Testable: A hypothesis should be capable of being tested. Hypothesis is specific
and it may either agree or disagree with the research question.
Limited in scope: A hypothesis should be limited in scope, as narrower hypotheses
are generally more testable.
Consistent: A hypothesis should be based on previous research.
[Figure 1: Types of Hypotheses. On the basis of derivation: inductive hypothesis and deductive hypothesis. On the basis of formulation: directional hypothesis, non-directional hypothesis, null hypothesis and alternative hypothesis.]
On the basis of derivation, there are two types of hypotheses, which are explained
as follows:
Inductive hypothesis: In inductive hypothesis, you move from specific
observations to broad generalisations. First, you observe a phenomenon.
Then, you form a pattern from your observations. After that, you form a hypothesis
to study the pattern. Finally, you form a theory on the basis of your study of
the pattern. The inductive hypothesis is used to conduct qualitative studies of
subjective variables. In this type of hypothesis, you should ask open-ended and
process-oriented questions.
Deductive hypothesis: In this type of hypothesis, you move from a general
statement to a specific, logical conclusion. You start from a theory and, based on
it, you make a prediction of its consequences. In other words, you predict what
the observations should be if the theory were correct. Finally, analysis is done
to arrive at a conclusion whether the theory is rejected or accepted with respect
to the problem. In deductive hypothesis, a research goes from general theory to
specific observation. In this type of hypothesis, you should ask closed-ended and
outcome-oriented questions.
On the basis of formulation, there are four types of hypothesis which are explained
as follows:
Directional hypothesis: This hypothesis checks the direction of relationship
between two variables. In directional hypothesis, you use terms, such as more
than, less than, negative and positive. An example of the directional hypothesis is:
In an organisation, women are more productive than men.
The researchers initially state the null hypothesis and alternative hypothesis. After
this, they conduct certain specific tests and, at the end of the test, they make statements
regarding the likelihood that a research hypothesis is false.
It is true that the researchers make probability statements regarding the likelihood
of the hypothesis being false instead of it being true, i.e., researchers are interested in
rejecting the null hypothesis rather than accepting it because they
never know how much Type II error they might be making.
After the null hypothesis and alternative hypothesis have been stated, the researcher
sets the decision criteria for which he/she needs to state the level of significance of
test. If the null hypothesis is true, the sample mean will be equal to population mean
on average.
The most commonly used levels of significance in statistics are 1%, 5% and 10%.
[Figure 2: z-Values for the Levels of Significance. A normal curve centred on the mean value, marking 90%, 95% and 99% of the area.]
In Figure 2, you can see that the areas expressed in percentage and their values are
given on X-axis. Table 1 provides the levels of significance and their z-values:
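The z-values that correspond to the commonly used levels of significance can be reproduced with a short Python sketch. It assumes a standard normal distribution and uses only the standard library; the function name critical_z is illustrative and does not come from the text.

```python
from statistics import NormalDist


def critical_z(alpha, two_tailed=True):
    """Return the positive critical z-value for a level of significance alpha."""
    # In a two-tailed test the rejection area alpha is split between both tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.10, 0.05, 0.01):
    print(
        f"alpha = {alpha:.2f}: "
        f"two-tailed z = {critical_z(alpha):.3f}, "
        f"one-tailed z = {critical_z(alpha, two_tailed=False):.3f}"
    )
```

For a 5% level of significance this gives roughly 1.96 for a two-tailed test and 1.64 for a one-tailed test, the values used in the worked examples of this book.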
In hypothesis testing, the level of significance is very important as it helps you
in rejecting or accepting a null hypothesis. You should be careful while
determining the level of significance for a problem/topic. The reason is that you
may reject a true hypothesis on the basis of a level of significance. If the level of
significance is 5%, it implies that the probability of rejecting a true null hypothesis
is at most 0.05.
After the level of significance has been set, the researcher then proceeds to compute
the test statistic, which describes how far a sample mean is from the
population mean. The greater the absolute value of the test statistic, the farther the sample
mean is from the population mean described in the null hypothesis. Thereafter, on the
basis of the value of the test statistic, a decision is made.
If, assuming the null hypothesis is true, the probability of obtaining a sample mean at
least as extreme as the one observed is less than 5%, then we reject the null hypothesis.
On the contrary, if this probability is more than 5%, then the null
hypothesis is retained.
Example 1: Assume that if a patient takes physiotherapy sessions two times instead
of three times in a week post operation, then his/her recovery time would be greater.
Assume that the average recovery time after operation is 7 weeks:
H0: The average recovery time after operation is less than or equal to 7 weeks.
H1: The average recovery time after operation is greater than 7 weeks.
From the preceding two examples, it is clear that H0 is totally opposite of the statement
the researcher wants to study. The researchers always test H0 for significance, not H1
because they are usually interested in disproving H0.
H0 and H1 are in the descriptive form. The researcher must convert them into the
quantitative form to compute them.
H0: μ ≤ 7
H1: μ > 7
Where,
μ = Population mean
You can also formulate a hypothesis for testing with the help of a benchmark. This
benchmark is a numerical value with which you have to compare your results and
test the hypothesis. This is one of the finest and most widely used methods for framing
null and alternate hypotheses because it represents null and alternate hypotheses in
quantitative form. This makes hypothesis testing easier.
For example, in a school, the average weight of every class is 100 (population mean).
You consider all sections of class 10 as a sample (assume there are 5 sections of
class 10) and calculate their average weight (sample mean). Now, you want to check
whether the sample mean is equal to the population mean or not. In this case, H0 and
H1 would be as follows:
H0: X̄ = 100
H1: X̄ < 100
Where,
X̄ = Sample mean
The researchers assume that the null hypothesis is true and proceed further to find
out various methods/possibilities to solve the problem. They try to reject the null
hypothesis.
A hypothesis can never be right or wrong. Rather, it is judged by what you want to
analyse. If a hypothesis is framed in such a way that it can answer your problem,
then it would be right.
It is important to note that different types of errors may occur while testing a
hypothesis. Therefore, the researcher should take into consideration the possibilities
of these errors while taking decisions.
The decision grid helps the researcher in taking decisions, which is shown in
Figure 3:
[Figure 3: Decision grid with the columns "Accept H0" and "Reject H0"]
Type I errors: These errors occur when the researcher rejects a null hypothesis (H0)
when null hypothesis was true. In this case, the decision taken by the researcher
is wrong. Type I errors are also known as the first kind of error or false positive.
The probability of a Type I error is represented by α.
Type II errors: Type II errors occur when the researcher accepts a null hypothesis
(H0) that should have been rejected. In this case, the decision taken by the
researcher is wrong. Type II errors are also known as the second kind of error or
false negative. The probability of a Type II error is represented by β. The probability
of rejecting the null hypothesis when it is false = 1 – β and is called the power of the test.
If you minimise Type I errors, Type II errors would increase or vice versa. Therefore,
you have to be very careful while minimising one type of error. You must remember
that both the types of errors can be limited using an appropriate sample size.
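The trade-off between the two types of errors can be illustrated with a small Python sketch. It assumes a right-tailed z-test with a known population standard deviation; the numbers used (a true shift of 1 unit, σ = 4, samples of 25 and 100) are hypothetical and chosen only for illustration, and the function name is not from the text.

```python
from math import sqrt
from statistics import NormalDist


def type_two_error(alpha, effect, sigma, n):
    """Probability of a Type II error (beta) for a right-tailed z-test.

    alpha  : level of significance (probability of a Type I error)
    effect : assumed true difference between the actual and hypothesised mean
    sigma  : population standard deviation (assumed known)
    n      : sample size
    """
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha)
    # How many standard errors the true mean lies above the hypothesised mean.
    shift = effect / (sigma / sqrt(n))
    # H0 is wrongly retained when the test statistic stays below z_critical.
    return standard_normal.cdf(z_critical - shift)


for alpha in (0.10, 0.05, 0.01):
    for n in (25, 100):
        beta = type_two_error(alpha, effect=1, sigma=4, n=n)
        print(f"alpha = {alpha:.2f}, n = {n:3d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Lowering α raises β for a fixed sample size, while a larger sample reduces β at the same α.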
hourly basis and checked for ideal weight. In a given hour, 11 balls are checked
randomly and their mean is calculated as 55.006 grams and SD of 0.029 grams. If
the production line gets out of sync with more than 1% level of significance, the
production line is shut down. Let us see if the production line should be shut down
in this case.
Here,
μp = 55 g;
H0: μp = 55 g
H1: μp ≠ 55 g
α = 1% =0.01
p = 1 – (α/2) = 0.995
tp = 3.169
Now, calculate tc.
$$t_c = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
$$t_c = \frac{55.006 - 55}{0.029/\sqrt{10}}$$
$$t_c = \frac{0.006}{0.0091} = 0.659$$
[Figure 4: Two-tailed test. H0 is not rejected in the central region and is rejected in both tails.]
In Figure 4, at the 1% level of significance, the t value would be ±3.169. If the calculated
value of the test statistic lies within the range of –3.169 to +3.169, then H0 would
be accepted. However, if the calculated value of the test statistic lies outside this range,
H0 would be rejected. Here, the rejection region is equally divided between the two tails
of the distribution (an area of 0.005 in the lower tail and 0.005 in the upper tail). In this
example, the calculated value (0.659) lies within the range, so the null hypothesis is accepted.
For example, assume that the null hypothesis states that mean weight of people is 60
kg or more. In this case, the alternative hypothesis would be that the mean weight of
people is less than 60 kg. Here, the rejection region comprises the range of numbers
0 to 60 located on the left side of sampling distribution (set of numbers that are less
than 60).
[Figure: One-tailed test. The rejection region lies to the left of –1.64 (if the sample mean lies in this area, reject H0); the acceptance region lies to its right.]
The level of significance can be represented with the help of α and α/2 in one-tailed
test and two-tailed test, respectively. For example:
In one-tailed test, if the level of significance is 5%, then α is 5%. In this case, the
value of test statistics would be determined at 0.05.
In two-tailed test, if the level of significance is 5%, then the value of test statistics
would be determined at 0.025 (α/2).
6. Take a Decision
1. State H0 and H1: In this step, null hypothesis and the alternative hypothesis are
framed.
9. Which one of the following is the commonly used level of significance for
minimising Type I and Type II errors?
a. 10% b. 12%
c. 5% d. 7%
10. Which of the following are the types of test of significance?
a. t-test b. z-test
c. F-test d. All of these
Activity
Prepare a PowerPoint presentation on hypothesis and hypothesis testing.
9.5 SUMMARY
Hypothesis is a proposed explanation given for an observed situation.
Inductive hypothesis is a type of derivation hypothesis where you move from
specific observations to broad generalisations.
Deductive hypothesis is a type of derivation hypothesis in which you move from
a general statement to a specific conclusion.
Directional hypothesis refers to the formulation hypothesis that checks the
direction of relationship between two variables.
Non-directional hypothesis refers to the formulation hypothesis where the
direction of the relationship between two variables cannot be specified.
Alternative hypothesis: The hypothesis that finds out the relation between two
variables.
Deductive hypothesis: The type of hypothesis that moves from a general
observation to a specific conclusion.
Directional hypothesis: The hypothesis that checks the direction of relationship
between two variables under study.
Non-directional hypothesis: The hypothesis where the direction of relationship
between two variables under study cannot be specified.
Null hypothesis: The hypothesis that says there is no relationship between two
variables under study.
H0: μp ≤ 71.1°
p = 95% = 0.95
Thereafter, the researcher finds out the value of t-statistic at 95% confidence and df
at 6 using t-table, which comes out to be 1.943.
$$t_c = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
$$t_c = \frac{71.3 - 71.1}{0.214/\sqrt{7}}$$
$$t_c = \frac{0.2}{0.0809} = 2.47$$
The researcher draws a detailed graph to represent his research as:
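[Figure: One-tailed t-test. The fail-to-reject region (95%) lies to the left of tp = 1.943; the rejection region (5%) lies to its right and contains tc = 2.47.]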
It can be seen that the calculated t-value (tc) lies in the rejection region. Therefore, the
researcher rejects the null hypothesis. Rejecting the null hypothesis means that
the sample was not acceptable and it can be stated that there is some issue in the
production of alternators at AM Pvt. Ltd., which it must find out and resolve.
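The calculation above can be checked with a minimal Python sketch. The function name is illustrative, and the critical value 1.943 is simply taken from the t-table reading quoted in the case study rather than computed.

```python
from math import sqrt


def one_sample_t(sample_mean, hypothesised_mean, sample_sd, n):
    """t statistic for one sample: (X-bar - mu) / (s / sqrt(n))."""
    return (sample_mean - hypothesised_mean) / (sample_sd / sqrt(n))


# Figures from the case study: 7 alternators, mean 71.3, SD 0.214, benchmark 71.1.
t_calculated = one_sample_t(71.3, 71.1, 0.214, 7)
t_critical = 1.943  # t-table value for 95% confidence and 6 degrees of freedom

# Right-tailed test: reject H0 when the calculated value exceeds the critical value.
reject_h0 = t_calculated > t_critical
print(f"tc = {t_calculated:.2f}, reject H0: {reject_h0}")
```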
QUESTIONS
1. What would be the value of tc if the sample had 49 alternators in it?
(Hint:
$$t_c = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
$$t_c = \frac{71.3 - 71.1}{0.214/\sqrt{49}}$$
$$t_c = \frac{0.2}{0.0306} = 6.54$$
)
2. What would be the value of tc if the standard deviation was changed to 0.851?
(Hint:
$$t_c = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
$$t_c = \frac{71.3 - 71.1}{0.851/\sqrt{7}}$$
$$t_c = \frac{0.2}{0.321} = 0.623$$
In this case, the null hypothesis would have been accepted.)
3. State the policy of the Indian government for selling an alternator in the market.
(Hint: If an alternator can run at less than 71.1° C under stress test assuming 95%
confidence.)
4. How were the samples collected to test the quality?
(Hint: Samples were chosen randomly.)
5. What did the quality department of the company want to find out?
(Hint: The department wanted to find if there was any quality issue or not.)
9.8 EXERCISE
1. Describe the hypothesis and its types in detail.
2. What are the characteristics of a good hypothesis?
3. Explain the hypothesis testing in detail.
4. Explain the following terms:
a. Null hypothesis
b. Two-tailed test
c. Decision rule
E-REFERENCES
Hypothesis Testing - Statistics Solutions. (2020). Retrieved 9 April 2020, from
https://www.statisticssolutions.com/hypothesis-testing/
Chapter 11: Fundamentals of Hypothesis Testing - Statistics for LIS with Open
Source R. (2020). Retrieved 9 April 2020, from http://www.statisticsforlis.org/
chapter-11-fundamentals-of-hypothesis-testing/
CHAPTER 10
Parametric Tests
Table of Contents
10.1 Introduction
10.2 Types of Hypothesis Testing
10.1 INTRODUCTION
In the previous chapter, you studied how to test a hypothesis to find the solution of a
research problem. To check the validity of a hypothesis, you can use two main types
of tests, namely parametric tests and non-parametric tests.
Parametric tests are statistical measures used in the analysis phase of research to
draw inferences and conclusions to solve a research problem. There are various
types of parametric tests, such as z-test, t-test and F-test. Selection of a particular
test for a research depends upon various factors, such as the type of population,
sample size, Standard Deviation (SD) and variance of population. It is important for
a researcher to identify the appropriate test to maintain the authenticity and validity
of research results.
In this chapter, you will learn about the concept of parametric tests. You will learn
about one-sample and two-sample tests. You will also learn to apply z-test, t-test and
F-test in different conditions and scenarios for one-sample and two-sample tests.
[Figure: Types of Tests, divided into parametric tests and non-parametric tests]
Parametric tests: In these tests, the researcher makes assumptions about the
parameters of the population from which a sample is derived. An example of a
parametric test is z-test.
Non-parametric tests: These are distribution-free tests of hypotheses. Here, the
researcher does not make assumptions about the parameters of the population
from which a sample is derived. An example of a non-parametric test is the
Kruskal-Wallis test.
Self Assessment Questions
1. What do you call the hypotheses tests where the researcher makes assumptions
about the parameters of the population from which a sample is derived?
a. Non-parametric tests b. Parametric tests
c. Chi-Square test d. Distribution-free tests
For example, t-test assumes that the variable under study in population is normally
distributed. Researchers calculate the parameters of population using various
test statistics. Then, they test the hypothesis by comparing the calculated value
of parameters with the benchmark value given in the problem. The scale used for the
dependent variable in parametric tests is mostly the interval or ratio scale.
There are various types of parametric tests, as shown in Figure 2:
[Figure 2: Types of Parametric Tests: z-test, t-test and F-test]
problem. After comparison, the researcher may decide to reject or support the null
hypothesis. The z-test is used in the following cases:
• To compare the mean of a sample with the mean of a hypothesised population
when the sample size is large and the population variance is known
• To compare the significant difference between the means of two independent
samples in the case of large samples or when the population variance is known
• To compare the proportion of a sample with the proportion of the population
t-test: This test is used to study the mean of samples when the sample size is less
than 30 and/or the population variance is unknown. It is based on t-distribution. A
t-distribution is a type of probability distribution that is appropriate for estimating
the mean of a normally distributed population where the sample size is small and
population variance is unknown.
The t-value (test statistic) is calculated for the present data and compared with
the t-value at a specified level of significance for the relevant degrees of freedom
for accepting/rejecting the null hypothesis. The degree of freedom is calculated by
subtracting one from the number of observations. It is used to look up
the t-value in the t-distribution table.
Sometimes, the t-test is used to compare the means of two related samples when the
sample size is small and the population variance is unknown. In such a situation,
it is known as the paired t-test.
F-test: This test is used to compare the variances of two samples under study by
taking the ratio of the two sample variances. The F-distribution
is a right-skewed distribution that is used most commonly in Analysis of Variance
(ANOVA). Here, the test statistic has an F-distribution.
The F-value (test statistic) is calculated for the present data and compared with
the F-value at the level of significance decided earlier in the question/
problem. In an F-test, there are two independent degrees of freedom, one for the numerator
and one for the denominator. The degrees of freedom (d.f.) of the two samples are
calculated separately by subtracting one from the number of observations in each sample.
After that, the critical F-value is read from the F-distribution table.
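The F-value and its two degrees of freedom can be computed as in the following Python sketch. The two samples are hypothetical, and placing the larger variance in the numerator is a common convention assumed here, not something stated in the text.

```python
from statistics import variance


def f_statistic(sample_a, sample_b):
    """Return (F, numerator d.f., denominator d.f.) for two samples.

    The larger sample variance is placed in the numerator so that F >= 1.
    """
    var_a = variance(sample_a)  # sample variance, divisor n - 1
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical observations, used only to illustrate the calculation.
sample_1 = [12, 15, 11, 14, 13]
sample_2 = [10, 18, 9, 16, 12, 13]

f_value, df_numerator, df_denominator = f_statistic(sample_1, sample_2)
print(f"F = {f_value:.2f} with ({df_numerator}, {df_denominator}) degrees of freedom")
```

The calculated F-value would then be compared with the F-table value for the same pair of degrees of freedom at the chosen level of significance.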
Parametric tests are further divided into two parts – one-sample tests and two-
sample tests. You will learn more about them in the next sections.
ASSUMPTIONS OF F-TEST
F-distribution is usually asymmetric with minimum value of zero. However, the
maximum value is infinity.
The degrees of freedom for different tests are calculated in different ways as
follows:
3. The ___________ is used to compare the mean of samples when the sample
size is less than 30 and the population variance is unknown.
4. Which test is used to compare the significant difference between the variances
of two samples under study?
a. z-test
b. Chi-square test
c. t-test
d. F-test
5. The degree of freedom is calculated by subtracting ___________ from the
________ for t-test.
t = [(X̄ – μ)/(s/√n)] × √((N – n)/(N – 1))
Where,
μ = Population mean
N = Population size
n = Sample size
X̄ = Sample mean
z = [(X̄ – μ)/(σp/√n)] × √((N – n)/(N – 1))
Where,
μ = Population mean
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The average of production of product A is the same as the overall production of
all products combined.
H1: The average of production of product A is more than the overall production of
all products combined.
Or,
H0: μs = 8 cm
H1: μs > 8 cm
Where, μs = Sample mean, that is, the average amount of production of product A
Assumed Population mean (μ) = 8 cm
Since the population is finite, the researcher uses the following formula for z-test to
test the hypothesis for significance:
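$$z = \frac{\bar{X} - \mu}{\sigma_p/\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$$
$$z = \frac{10 - 8}{2.5/\sqrt{35}} \times \sqrt{\frac{50 - 35}{50 - 1}}$$
$$z = \frac{2}{2.5/5.91} \times \sqrt{\frac{15}{49}}$$
$$z = 2.61$$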
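The z-value for the 5% level of significance for one-tailed test is + 1.64.
[Figure 3: One-tailed z-test. The acceptance region lies to the left of +1.64 and the rejection region to its right; the calculated value +2.61 falls in the rejection region.]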
In Figure 3, it can be observed that the calculated value of z lies in the rejection
region; therefore, H0 is rejected. This implies that the average diameter production
of product A is more than the overall production.
10.4.3 EXPLORING CASE-III
In Case-III, the population is normal and infinite, the sample size is large and the
population variance is unknown. In this case, the following test statistic is used:
t = (X̄ – μ)/(σs/√n)
Where,
μ = Population mean
n = Sample size
X̄ = Sample mean
The marketers have the average rating from the whole city as 7.5. Now, the
organisation wants to know whether the south part also has the same rating. Use 5%
as the level of significance.
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The average rating of the south part of the city is the same as the average rating
of the city
H1: The average rating of the south part of the city is not the same as the average
rating of the city
Or,
H0: μp = 7.5
H1: μp ≠ 7.5
Where, μp = Sample mean, that is, the rating given by the customers in the south part
of the city
The data and the calculation part of the previous problem are shown in Table 2:
7 5 –2.4 5.76
8 8 0.6 0.36
9 8 0.6 0.36
10 9 1.6 2.56
11 7 –0.4 0.16
12 9 1.6 2.56
13 10 2.6 6.76
14 4 –3.4 11.56
15 8 0.6 0.36
16 5 –2.4 5.76
17 10 2.6 6.76
18 8 0.6 0.36
19 9 1.6 2.56
20 6 –1.4 1.96
21 6 –1.4 1.96
22 6 –1.4 1.96
23 8 0.6 0.36
X̄ = 7.38
Population mean (μp) = 7.5
Since the standard deviation for the population is not given, the researcher needs to
calculate the SD for the sample.
σs = √(106.56/35)
σs = √3.044 = 1.745
The population is infinite; therefore, the researcher uses the following formula for
t-test to test the hypothesis for significance:
t = (X̄ – μ)/(σs/√n)
t = (7.38 − 7.5)/(1.745/√36)
t = (−0.12 × 6)/1.745 = −0.413
The t-value for the 5% level of significance for two-tailed test is ± 2.03.
After checking the t-value for significance, the researcher applies two-tailed test.
The graphical representation of the preceding solution is shown in Figure 4:
[Figure 4: Two-tailed test showing the acceptance region]
In Figure 4, it can be observed that the calculated t-value lies in the acceptance
region; therefore, H0 is accepted. This implies that the average rating of the south
part of the city is the same as the average rating of the city.
10.4.4 EXPLORING CASE-IV
In Case-IV, the observed sample proportions are known. In such a situation, the
researcher uses the following test statistic:
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
Where,
$$\hat{p} = \frac{x}{n}$$
n = Sample size
x = value to be standardised
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The proportion of girl students observed in the survey is the same as in the
college record.
H1: The proportion of girl students observed in the survey is different from their
proportion in the college record.
Or,
H0: p = 0.40
H1: p ≠ 0.40
Where,
p = Probability of success, that is, the actual proportion of girls in the college
p = 0.40
q = 1 – 0.40
q = 0.60
( p̂ ) = 0.4833
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
z = 0.0833/0.009
z = 9.26
The z-value for the 5% level of significance for two-tailed test is ± 1.96. The graphical
representation of the preceding solution is shown in Figure 5:
[Figure 5: Calculated z-Value When the Proportion of Population and Sample Means are Given. The acceptance region lies in the centre and the rejection region in the tails.]
In Figure 5, it can be observed that the z-value lies in the rejection region; therefore,
H0 is rejected. This implies that the proportion of girl students observed in the survey
is different from their proportion in the college record. It can be interpreted from
the calculated z-value that the proportion of girls in the college has increased.
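The test statistic for a proportion can be sketched in Python as follows. The counts used here (58 girls in a survey of 120 students) are hypothetical and are not the figures of the example above; only the hypothesised proportion of 0.40 is taken from it. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """z statistic for a sample proportion against a hypothesised proportion p0."""
    p_hat = successes / n                    # observed sample proportion (x / n)
    standard_error = sqrt(p0 * (1 - p0) / n)  # sqrt(pq / n) with q = 1 - p
    return (p_hat - p0) / standard_error


# Hypothetical survey: 58 girls among 120 students, college record p = 0.40.
z_value = proportion_z(58, 120, 0.40)

# Two-tailed p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z_value)))
print(f"z = {z_value:.3f}, two-tailed p-value = {p_value:.4f}")
```

With these illustrative counts the z-value falls inside ±1.96, so H0 would be retained at the 5% level; a larger sample with the same observed proportion would give a larger z-value.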
t = (X̄ – μ)/(σs/√n)
Where,
μ = Population mean
n = Sample size
X̄ = Sample mean
3 2
4 2
5 1.9
6 2
7 1.9
8 2
9 1.9
10 2
11 2
12 1.9
13 2
14 1.9
15 2
16 2
17 1.8
18 2
19 2
20 2
21 2
22 1.9
23 2
24 1.9
25 2
Total
The average recorded package for the marketing executive post is ₹ 2 lakhs. The
researcher wants to know whether the average recorded package is valid for this
group or not. Use 5% as the level of significance.
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: The average recorded package and the sample average income of group are the
same.
H1: The average recorded package and the sample average income of group are
different.
H0: μs = 2,00,000
H1: μs ≠ 2,00,000
Where,
μs = Sample mean, that is, the sample mean for the income of the group
The data and the calculation part for this example are shown in Table 3:
No. of Observations   Income (Lakhs)   Xi – X̄   (Xi – X̄)²
1                     2                0.04      0.0016
2                     1.9              –0.06     0.0036
3                     2                0.04      0.0016
4                     2                0.04      0.0016
5                     1.9              –0.06     0.0036
6                     2                0.04      0.0016
7                     1.9              –0.06     0.0036
8                     2                0.04      0.0016
9                     1.9              –0.06     0.0036
10                    2                0.04      0.0016
11                    2                0.04      0.0016
12                    1.9              –0.06     0.0036
13                    2                0.04      0.0016
X̄ = 1.96
Since the standard deviation for the population is unknown, the researcher needs to
calculate the standard deviation for the sample as follows:
σs = √(0.08/24)
σs = 0.058
The population is infinite; therefore, the researcher uses the following formula for
t-test to test the hypothesis for significance:
t = (X̄ – μ)/(σs/√n)
t = – 0.04/0.0116
t = – 3.45
Degree of freedom (d.f.) = n – 1
= 25 – 1
= 24
The t-value for the 5% level of significance for two-tailed test and 24 d.f. is ±2.064.
The graphical representation of the preceding solution is shown in Figure 6:
[Figure 6: Two-tailed t-test showing the acceptance region and the rejection region]
In Figure 6, it can be observed that the calculated t-value lies in the rejection region;
therefore, H0 is rejected. This implies that the average recorded package and the
sample average of the income of the group are different. It can be interpreted that
the average income for the marketing executive post has decreased in the market.
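The same result can be obtained directly from the 25 income observations listed in the example. The following Python sketch uses the standard library; the critical value 2.064 is taken from the t-table reading quoted above, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Income (in lakhs) of the 25 observations listed in the example.
incomes = [2, 1.9, 2, 2, 1.9, 2, 1.9, 2, 1.9, 2, 2, 1.9, 2,
           1.9, 2, 2, 1.8, 2, 2, 2, 2, 1.9, 2, 1.9, 2]

recorded_package = 2            # hypothesised mean (in lakhs)
n = len(incomes)
sample_mean = mean(incomes)
sample_sd = stdev(incomes)      # sample SD, divisor n - 1

t_calculated = (sample_mean - recorded_package) / (sample_sd / sqrt(n))
degrees_of_freedom = n - 1
t_critical = 2.064              # two-tailed, 5% level, 24 d.f. (from the t-table)

print(f"mean = {sample_mean:.2f}, SD = {sample_sd:.3f}, t = {t_calculated:.2f}")
print("Reject H0" if abs(t_calculated) > t_critical else "Fail to reject H0")
```

Without rounding the intermediate values, the statistic comes out at about –3.46 rather than –3.45; the decision to reject H0 is unchanged.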
10.4.6 EXPLORING CASE-VI
In Case-VI, the population is normal and finite, the sample size is small, and the
population variance is unknown. In this case, the researcher uses the following test
statistic:
$$t = \frac{\bar{X} - \mu}{\sigma_s/\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$$
Where,
μ = Population mean
n = Sample size
X̄ = Sample mean
Situation       Population   Sample Size   Population Variance   Test
Situation I     Normal       Large         Unknown               One-tailed or two-tailed
Situation II    Normal       Large         Known                 One-tailed or two-tailed
Situation III   Normal       Small         Unknown               One-tailed or two-tailed
These different situations with examples are discussed in the following sections.
Situation-I
In this situation, the population is normal, the sample size is large and the population
variance is unknown. The researcher can use either two-tailed test or one-tailed test
depending on the alternate hypothesis of the research. If the researcher wants to
compare the two samples drawn from two different populations, then he/she would
use the following test statistic:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_{s1}^2/n_1 + \sigma_{s2}^2/n_2}}$$
Where,
When, in any research problem, the value of population variance is unknown, then the
researcher should use t-statistic.
Solution: The null hypothesis and the alternative hypothesis are as follows:
Or,
H0: μ1 = μ2
H1: μ1 ≠ μ2
Where,
The data and the calculation part of the preceding problem are shown in Table 5:
X1 = 315/35
X1 = 9
X2 = 328/35
X2 = 9.37 ≈ 9.4
$$\sigma_{s1} = \sqrt{\frac{\sum (X_{1i} - \bar{X}_1)^2}{n_1 - 1}} = \sqrt{\frac{36}{34}} = \sqrt{1.058} = 1.028$$
$$\sigma_{s2} = \sqrt{\frac{\sum (X_{2i} - \bar{X}_2)^2}{n_2 - 1}} = \sqrt{\frac{8.2}{34}} = \sqrt{0.2411} = 0.491$$
Since the sample size is more than 30 and two samples are under study, the researcher
applies the following t-test:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_{s1}^2/n_1 + \sigma_{s2}^2/n_2}}$$
$$t = \frac{9 - 9.4}{\sqrt{\frac{(1.028)^2}{35} + \frac{(0.491)^2}{35}}}$$
$$t = \frac{-0.4}{\sqrt{\frac{1.056 + 0.241}{35}}}$$
$$t = \frac{-0.4}{\sqrt{\frac{1.297}{35}}} = \frac{-0.4}{\sqrt{0.037}}$$
$$t = \frac{-0.4}{0.193} = -2.08$$
The t-value for the 5% level of significance for two-tailed test is ± 2.032 (degree of
freedom = 35 – 1 = 34).
[Figure 7: Two-tailed test showing the acceptance region and the rejection region]
In Figure 7, it can be observed that the t-value lies in the rejection region; therefore,
H0 is rejected. The popularity of Brand A is not the same as the popularity of
Brand B.
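A minimal Python sketch of the two-sample statistic for Situation-I follows. It assumes that the standard error is the square root of the sum of the two sample variances, each divided by its sample size; the function name is illustrative.

```python
from math import sqrt


def two_sample_t(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """t statistic for two independent samples with unknown population variance."""
    standard_error = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error


# Summary figures of the two brands: means 9 and 9.4, SDs 1.028 and 0.491, n = 35 each.
t_calculated = two_sample_t(9, 9.4, 1.028, 0.491, 35, 35)
t_critical = 2.032  # two-tailed, 5% level, as quoted in the text

print(f"t = {t_calculated:.2f}")
print("Reject H0" if abs(t_calculated) > t_critical else "Fail to reject H0")
```

Evaluated with the square root applied to the whole variance term, the statistic is about –2.08, which lies just beyond the critical value of –2.032.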
Situation-II
In this situation, the population is normal, the sample size is large, and the population
variance is known. The populations are equal. The researcher can use either two-
tailed test or one-tailed test depending on the alternate hypothesis of the research. If
the researcher wants to compare two samples drawn from the same population, then
he/she would use the following test statistic:
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_p^2 \left(1/n_1 + 1/n_2\right)}}$$
Where,
Example 6: A researcher has collected two samples from various production houses
of an organisation. He has taken a sample of Product P from 500 production houses.
He has found that the average production of Product P is equal to 1,000 pieces/
month with a standard deviation of 13 pieces. He has also taken a sample of Product
Q from 400 production houses. He finds that the average production of Product Q is
1,200 pieces/month with a standard deviation of 15 pieces. The standard deviation
of the production houses of the organisation is 14. Is this the same organisation from
where the researcher has collected the samples? Use 5% as the level of significance.
Or,
H0: μ1 = μ2
H1: μ1 ≠ μ2
Since the sample size is more than 30, the population variance is known, and two
samples are under study, the researcher would apply the following z-test:
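$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_p^2 \left(1/n_1 + 1/n_2\right)}}$$
$$z = \frac{1000 - 1200}{\sqrt{(14)^2 \left(\frac{1}{500} + \frac{1}{400}\right)}}$$
$$z = \frac{-200}{\sqrt{196 \times \frac{4 + 5}{2000}}}$$
$$z = \frac{-200}{0.939}$$
$$z = -212.99$$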
The z-value for the 5% level of significance for two-tailed test is ± 1.96. The graphical
representation of the preceding solution is shown in Figure 8:
[Figure 8: Two-tailed z-test showing the acceptance region and the rejection region]
In Figure 8, it can be observed that the z-value lies in the rejection region; therefore,
H0 is rejected. This implies that the population means of products P and Q are
different. It can be interpreted that the calculated z-value showing the difference
between means of two samples is statistically significant.
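The calculation in Example 6 can be checked with a short script. This is only a minimal sketch: the function name `two_sample_z` is an illustrative choice, and the critical value 1.96 is the one quoted in the text rather than computed. Because the script does not round the denominator to 0.939, it gives about –212.96 instead of –212.99.

```python
import math


def two_sample_z(mean1, mean2, sigma, n1, n2):
    """z statistic for two sample means when the population SD (sigma) is known."""
    standard_error = math.sqrt(sigma ** 2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Figures from Example 6: Product P (n = 500) and Product Q (n = 400), sigma = 14
z = two_sample_z(1000, 1200, 14, 500, 400)

critical_value = 1.96  # two-tailed 5% value quoted in the text (not computed here)
reject_h0 = abs(z) > critical_value

print(f"z = {z:.2f}, reject H0: {reject_h0}")
```

The decision is the same as in the text: the calculated value lies far outside ± 1.96, so H0 is rejected.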
Situation-III
In this situation, the population is normal, the sample size is small, and the population
variance is unknown. The researcher can use either two-tailed test or one-tailed test
on the basis of research problem and the alternative hypothesis. If the researcher
wants to compare two samples drawn from two different populations, then he/she
would use the following test statistic:
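$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$
Where,
SE = Standard Error
$$SE = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$S_p = \sqrt{\frac{\sigma_1^2 (n_1 - 1) + \sigma_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)}}$$
$$\therefore t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2 (n_1 - 1) + \sigma_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Where
$\bar{X}_1$, $\bar{X}_2$ = Means of the first and second samples
$\sigma_1$, $\sigma_2$ = Standard deviations of the first and second samples
$n_1$, $n_2$ = Sizes of the first and second samples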
Example 7: The average sales volumes of an organisation in two cities, A and B, each
with 10 retail outlets, are 100 and 200, respectively. The standard deviation for A is 5.5
and for B is 6.5. Test the hypothesis for the difference in sales of the two cities by using
5% as the level of significance.
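Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: Average sale of City A is equal to the average sale of City B.
H1: Average sale of City A is not equal to the average sale of City B.
Or,
H0: μ1 = μ2
H1: μ1 ≠ μ2
Where,
μ1 = Average sale of City A
μ2 = Average sale of City B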
Since the sample size is less than 30 and two samples are under study, the researcher
would apply the following t-test:
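$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2 (n_1 - 1) + \sigma_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)} \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
$$t = \frac{100 - 200}{\sqrt{\frac{(5.5)^2 (10 - 1) + (6.5)^2 (10 - 1)}{(10 - 1) + (10 - 1)} \left(\frac{1}{10} + \frac{1}{10}\right)}}$$
$$t = \frac{-100}{\sqrt{\frac{9 (30.25 + 42.25)}{18} \times \frac{1}{5}}} = \frac{-100}{\sqrt{7.25}}$$
$$= \frac{-100}{2.692} \approx \frac{-100}{2.7}$$
$$= -37.03$$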
The t-value for the 5% level of significance for two-tailed test with 18 as degree of
freedom is ± 2.101. The graphical representation of the preceding solution is shown
in Figure 9:
[Figure 9: graph showing the acceptance region and the rejection regions]
In Figure 9, it can be observed that the t-value lies in the rejection region; therefore,
H0 is rejected. This implies that the average sales volume of City A is not equal to
the average sales volume of city B. It can be interpreted from the calculated t-value
that the difference between the means of the two samples is statistically significant.
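The pooled t-test of Example 7 can be sketched as follows. The function name `pooled_t` is illustrative, and the critical value 2.101 is taken from the text. The script keeps the standard error at full precision (2.692...) instead of rounding it to 2.7, so it reports about –37.14 rather than –37.03; the conclusion does not change.

```python
import math


def pooled_t(mean1, mean2, sd1, sd2, n1, n2):
    """t statistic and degrees of freedom for two small samples with pooled variance."""
    pooled_variance = (sd1 ** 2 * (n1 - 1) + sd2 ** 2 * (n2 - 1)) / ((n1 - 1) + (n2 - 1))
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


# Figures from Example 7: City A and City B, 10 retail outlets each
t, df = pooled_t(100, 200, 5.5, 6.5, 10, 10)

critical_value = 2.101  # two-tailed 5% value for 18 d.f., as quoted in the text
reject_h0 = abs(t) > critical_value

print(f"t = {t:.2f} with {df} d.f., reject H0: {reject_h0}")
```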
Where,
p1 = Proportion of success of the first sample
Example 8: In a college, there are two streams: science and commerce. The college
management wants to find out whether there is a significant difference between
the proportions of average students (students who are neither toppers nor laggards
with respect to study) of the two streams. Therefore, the management conducts a
survey and finds out that 350 out of 500 students of the science stream are under
the category of average students. In the case of the commerce stream, 300 students
out of 600 students are under the category of average students. Use 5% as a level of
significance.
Solution: The null hypothesis and the alternative hypothesis are as follows:
H0: There is no difference between the proportions of average students of the science
and commerce streams in the college.
H1: There is a significant difference between the proportions of average students of
the science and commerce streams in the college.
Or,
H0: p1 = p2
H1: p1 ≠ p2
Where,
p1 = 0.7
q1 = 0.3
p2 = 0.5
q2 = 0.5
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
$$z = \frac{0.7 - 0.5}{\sqrt{\frac{(0.7)(0.3)}{500} + \frac{(0.5)(0.5)}{600}}}$$
z = 0.2/0.029
z = 6.9
The z-value for the 5% level of significance for two-tailed test is ± 1.96.
The graphical representation of the preceding solution is shown in Figure 10:
[Figure 10: graph showing the acceptance region and the rejection regions]
In Figure 10, it can be observed that the z-value lies in the rejection region; therefore,
H0 is rejected. This implies that there is a significant difference between the proportions
of average students of the science and commerce streams in the college. It can be
interpreted from the calculated z-value that the difference between the proportions of
the two samples is statistically significant.
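The z-test for two proportions used in Example 8 can be sketched as below. The function name `two_proportion_z` is illustrative; the proportions 0.7 and 0.5 and the sample sizes 500 and 600 are the values substituted in the worked solution, and 1.96 is the critical value quoted in the text.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two sample proportions (unpooled form)."""
    q1, q2 = 1 - p1, 1 - p2
    standard_error = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    return (p1 - p2) / standard_error


# Values substituted in the worked solution of Example 8
z = two_proportion_z(0.7, 500, 0.5, 600)

critical_value = 1.96  # two-tailed 5% value quoted in the text
reject_h0 = abs(z) > critical_value

print(f"z = {z:.2f}, reject H0: {reject_h0}")
```

The same function could be applied to the figures of Example 9 (0.71 with n = 700 and 0.625 with n = 800), compared against ± 2.58 at the 1% level.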
Example 9: In a sample of 700 engineering colleges from a state, littering by first
year students was prevalent in 500 colleges. After the ban on littering in the same
state, it was found that 500 colleges out of 800 colleges were involved in littering.
Was the decrease in the proportion of colleges involved in littering significant?
Test the hypothesis at the 1% level of significance.
Solution: The null hypothesis and the alternative hypothesis are as follows:
Or,
H0: p1 = p2
H1: p1 ≠ p2
Where,
p1 = 0.71
Proportion of failure in sample one, q1 = 1 – p1 = 1 – 0.71
q1 = 0.29
p2 = 0.625
q2 = 0.375
The two samples are taken from the same population; therefore, you can calculate
the best estimate for proportion, which is the common value of proportion. The best
estimate for proportion (p0) for the two samples of colleges involved in littering can
be calculated as follows:
p0 = 0.66
q0 = 1 – 0.66
q0 = 0.34
The test of significance used is as follows:
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$$
z = 0.085/0.024
z = 3.54
The z-value for the 1% level of significance for two-tailed test is ± 2.58. The graphical
representation of the preceding solution is shown in Figure 11:
[Figure 11: graph showing the acceptance region and the rejection regions]
In Figure 11, it can be observed that the z-value lies in the rejection region; therefore,
H0 is rejected. This implies that there is a significant difference between the proportions
of engineering colleges involved in littering before and after the ban. It can be
interpreted from the calculated z-value that the difference between the proportions of
two samples is statistically significant.
If a researcher wants to compare two related samples, then he/she can use the
following test statistic:
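$$t = \frac{\bar{D}}{S_D / \sqrt{n}}$$
Where,
$\bar{D}$ = Mean of the differences (D) between the paired observations
n = Sample size
$$S_D = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n - 1}}$$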
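The paired-sample statistic above can be sketched in a few lines. The text gives no worked data for this test, so the `before` and `after` scores below are hypothetical values invented purely for illustration, and the function name `paired_t` is likewise an assumption.

```python
import math


def paired_t(first, second):
    """t statistic and degrees of freedom for two related (paired) samples."""
    diffs = [b - a for a, b in zip(first, second)]  # D for each pair
    n = len(diffs)
    mean_d = sum(diffs) / n
    # S_D = sqrt((sum(D^2) - (sum(D))^2 / n) / (n - 1))
    sd_d = math.sqrt((sum(d * d for d in diffs) - sum(diffs) ** 2 / n) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical paired scores (not from the text)
before = [60, 65, 70, 58, 62]
after = [64, 66, 75, 60, 65]

t, df = paired_t(before, after)
print(f"t = {t:.4f} with {df} d.f.")
```

The calculated t would then be compared with the tabulated t-value for n – 1 degrees of freedom at the chosen level of significance.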
$$F = \frac{s_1^2}{s_2^2}$$
Where, $s_1^2$ is the larger of the two variances
Variance of the two samples can be calculated using the following formula:
$$s_1^2 = \text{Variance of the first sample} = \frac{\sum (X_{1i} - \bar{X}_1)^2}{n_1 - 1}$$
$$s_2^2 = \text{Variance of the second sample} = \frac{\sum (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$$
$n_1$ = Size of the first sample
$n_2$ = Size of the second sample
Example 10: A researcher studied two samples of a type of wheat produced from the
north region and the south region of a state. He took two samples of wheat – type A
(north region) and type B (south region). The sample size of type A wheat is 10 cities
and the sample size of type B wheat is 13 cities. The variances for two samples with
respect to gluten content are 5 and 4, respectively. The researcher wants to find out
whether the two populations have the same variance. Test this at the 5% significance
level.
Solution: The null hypothesis and the alternative hypothesis are as follows:
Or,
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
Where,
$\sigma_1^2$ = Population variance from sample A
$\sigma_2^2$ = Population variance from sample B
We are given that $\sigma_1^2 = 5$ and $\sigma_2^2 = 4$
Therefore,
$$F = \frac{\sigma_1^2}{\sigma_2^2}$$
F = 5/4
F = 1.25
Degrees of freedom for sample A = 10 – 1 = 9
Degrees of freedom for sample B = 13 – 1 = 12
The variance of sample A is the larger of the two and forms the numerator of the
F-ratio; therefore, v1 = 9 and v2 = 12. In this case, the F-values for the two-tailed
test are calculated as:
[Figure 12: graph showing the Accept H0 region between the two Reject H0 regions, with the calculated F-value of 1.25 marked]
In Figure 12, it can be observed that the calculated F-value lies in the acceptance
region; therefore, H0 is accepted and H1 is rejected. This implies that there is no
difference between the variances in gluten content of the two populations. It can be
interpreted from the calculated F-value that the difference between the sample
variances is statistically insignificant, that is, the variances of the two populations
are equal.
ANOVA is used to study and explain the amount of variation in more than two
samples or data sets. In a data set, two main types of variations can occur. One type
of variation occurs due to chance, while the other type of variation occurs due to
specific reasons. These variations are studied separately in ANOVA to identify the
actual cause of variation and help the researcher take effective decisions. There are
two main types of ANOVA. Let us learn about these in detail.
$X$ = Individual observation,
$\bar{X}_j$ = Sample mean of the jth treatment (or group),
$\bar{\bar{X}}$ = Overall sample mean,
k = The number of treatments OR independent comparison groups
and
N = Total number of observations or total sample size
3. Calculate the variation between samples, known as SS between, with the help
of the following formula:
$$SS_{\text{between}} = n_1(\bar{X}_1 - \bar{\bar{X}})^2 + n_2(\bar{X}_2 - \bar{\bar{X}})^2 + n_3(\bar{X}_3 - \bar{\bar{X}})^2$$
Where, $n_1, n_2, \ldots$ = sample size of sample 1, sample 2, and so on
SS between is the sum of the squared deviations of the sample means from the
overall mean, each weighted by its sample size. It measures the variation between samples.
4. Divide SS between by its d.f., k – 1, to get the mean square between (MS between),
which is the mean variation between samples. The following formula is
used to calculate MS between:
MS between = SS between/(k – 1)
5. Calculate variation within samples, known as SS within, with the help of the following
formula:
$$SS_{\text{within}} = \sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2 + \sum (X_{3i} - \bar{X}_3)^2$$
Where, $X_{1i}, X_{2i}, X_{3i}$ = observed values in a sample
$\bar{X}_1, \bar{X}_2, \bar{X}_3$ = means of corresponding samples
SS within is the sum of the squared deviations of the observed values from the
means of their own samples. It measures the variation within samples.
6. Divide SS within by its d.f., n – k, to get the mean square within (MS within),
which is the mean variation within samples. The following
formula is used to calculate the MS within:
MS within = SS within/(n – k)
Where, n = total of the sample size of all the samples, that is, n1 + n2 +…..
7. Add the squared deviations to get the total variation in samples. The following
formula is used to calculate the total variation:
$$\text{Total variation} = SST = \sum\sum (X - \bar{\bar{X}})^2$$
To calculate total SS, first subtract the overall mean from each individual
observation. After that, square these deviations and sum them up to obtain the
result. The d.f. used in this case is n – 1.
8. Calculate the F-ratio with the help of the following formula: F-ratio = MS between/
MS within.
The calculated value of F-ratio is tested against the tabulated value of F-ratio
(determined at a specified level of significance). If the value of F-ratio lies under
the limits of acceptance region, the null hypothesis is accepted and the alternate
hypothesis is rejected.
Let us understand the application of one-way ANOVA with the help of the following
example.
Example 11: The researcher observed the sale of a product of a particular brand in
six big retail houses in three cities. He/She wants to determine whether the mean
sale is the same across cities. Use the data shown in Table 7 to perform one-way
ANOVA:
Retail Houses City A (in Lakhs) City B (in Lakhs) City C (in Lakhs)
1 3 6 9
2 8 9 8
3 4 8 6
4 9 5 7
5 6 7 5
6 7 4 7
H1: The mean sale of at least one city is different from the rest of the two cities
Overall mean, $\bar{\bar{X}}$ = 6.6
SS between = 2.1
SS within = (10.05 + 3.35 + 4.71 + 8.01 + 0.03 + 0.7 + 0.25 + 6.25 + 2.25 + 2.25 + 0.25 + 6.25 + 4 + 1
+ 1 + 0 + 4 + 0) = 54.34
Total variation = $[(3 - 6.6)^2 + (8 - 6.6)^2 + (4 - 6.6)^2 + (9 - 6.6)^2 + (6 - 6.6)^2 + (7 - 6.6)^2 + (6 - 6.6)^2 + (9 - 6.6)^2 + (8 - 6.6)^2 + (5 - 6.6)^2 + (7 - 6.6)^2 + (4 - 6.6)^2 + (9 - 6.6)^2 + (8 - 6.6)^2 + (6 - 6.6)^2 + (7 - 6.6)^2 + (5 - 6.6)^2 + (7 - 6.6)^2]$ = 56.48
ANOVA table created after completing the preceding calculation is shown in Table 8:
You can check the F-table for significance with the help of one-tailed test. The
graphical representation of the preceding solution is shown in Figure 13:
[Figure 13: Graph Showing the Position of the Calculated F-Value (calculated F = 0.29, critical F = 3.68)]
Figure 13 shows that the calculated F-value lies in the acceptance region; therefore,
H0 is accepted and H1 is rejected. The value implies that the product's sale is
almost the same in the three cities. You can also use another method of ANOVA, which
is performed with the help of a correction factor. It is also termed the shortcut
method. It is more convenient in case of non-integer values. The steps involved in
this method are mentioned as follows:
1. Calculate the correction factor with the help of the following formula:
$$\text{Correction Factor} = \frac{(T)^2}{n}$$
Where, T = Summation of all the observed values in the samples
n = Total number of observations
2. Compute SS between by first taking the sum of observed values in each sample.
Thereafter, obtain the square of the sum of the observed values and divide the
number by the respective size of the sample. Then, add the resultant values and
take the difference between the added value and the correction factor to obtain the
variation between samples. The following formula is used to calculate the variation:
$$SS_{\text{between}} = \sum \frac{(T_j)^2}{n_j} - \frac{(T)^2}{n}$$
Where, $T_j$ = Sum of the observed values of a sample = $T_1, T_2, \ldots$
$n_j$ = Sample size of a sample = $n_1, n_2, \ldots$
n = Sum of the sample size of different samples
3. Divide SS between with d.f. k – 1 to get MS between. The following formula is
used to calculate MS between:
MS between = SS between/(k – 1)
4. Calculate and add the squares of all individual values in samples. From this sum,
subtract the sum of the squared sample totals divided by their respective sample
sizes; the value obtained is termed as SS within or variation within the samples.
The following formula is used to calculate SS within:
$$SS_{\text{within}} = \sum X_{ij}^2 - \sum \frac{(T_j)^2}{n_j}$$
Where, $X_{ij}^2$ = Squares of all individual values in samples
5. Divide SS within with d.f. n – k to get MS within. The following formula is used to
calculate MS within:
MS within = SS within/(n – k)
Where, n = Total of the sample size of all the samples, that is, n1 + n2 + …..
6. Calculate total variation by taking the sum of squares of all individual values
in the samples. After that, subtract the correction factor from this sum. The
following formula is used to calculate the variation:
$$\text{Total SS} = \sum X_{ij}^2 - \frac{(T)^2}{n}$$
7. Calculate the F-ratio with the help of the following formula:
F-ratio = MS between/MS within
The calculated value of F-ratio is tested against the tabulated F-value that is
determined at a specified level of significance. If the calculated value of F-ratio
lies under the limits of acceptance region, the null hypothesis is accepted and the
alternate hypothesis is rejected.
Let us learn the application of one-way ANOVA with the help of correction factor
using Example 12.
Example 12: First calculate the correction factor and then various components of
ANOVA table.
Correction Factor = $(T)^2/n$
Where, T = summation of all the observed values in the three cities collectively
= 773.6
SS between = 2.1
$SS_{\text{within}} = \sum X_{ij}^2 - \sum (T_j)^2/n_j$
= $(3)^2 + (8)^2 + (4)^2 + (9)^2 + (6)^2 + (7)^2 + (6)^2 + (9)^2 + (8)^2 + (5)^2 + (7)^2 + (4)^2 + (9)^2 + (8)^2 + (6)^2$ + …
= 54.34
Total SS = 56.4
The values of total SS, SS between and SS within are same in both the cases used for
the calculation of ANOVA. Therefore, the ANOVA table would also be the same.
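Both routes to the one-way ANOVA figures (the direct method of Example 11 and the correction-factor method of Example 12) can be checked with the sketch below, which uses the Table 7 data. The variable names are illustrative. The script keeps the overall mean at full precision (6.556) rather than rounding it to 6.6, so the totals differ slightly from the rounded values printed in the text (for example, total SS of about 56.44 instead of 56.48).

```python
# Sales (in lakhs) of the six retail houses in each city, from Table 7
sales = {
    "City A": [3, 8, 4, 9, 6, 7],
    "City B": [6, 9, 8, 5, 7, 4],
    "City C": [9, 8, 6, 7, 5, 7],
}

groups = list(sales.values())
all_values = [x for group in groups for x in group]
n = len(all_values)   # total number of observations
k = len(groups)       # number of samples (cities)
grand_mean = sum(all_values) / n

# Direct method: deviations from the sample means and the overall mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# Shortcut method: correction factor (T)^2 / n
correction_factor = sum(all_values) ** 2 / n
sum_of_squares = sum(x * x for x in all_values)
sum_group_terms = sum(sum(g) ** 2 / len(g) for g in groups)
ss_between_short = sum_group_terms - correction_factor
ss_within_short = sum_of_squares - sum_group_terms
total_ss = sum_of_squares - correction_factor

f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))

print(f"Correction factor = {correction_factor:.1f}")
print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}, Total SS = {total_ss:.2f}")
print(f"F-ratio = {f_ratio:.2f}")
```

The two methods give identical sums of squares, and the F-ratio of about 0.29 lies below the 5% critical value of 3.68 quoted for Figure 13.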
2. Compute SS between rows. To do so, first take the sum of observed values in each
row. Thereafter, take the square of the sum of observed values and divide the
number by the respective sample size of the row. Then, add the resultant values and
take the difference between the added value and the correction factor to
obtain the variation between rows. The following formula is used to calculate
SS between rows:
$$SS_{\text{between rows}} = \sum \frac{(T_j)^2}{n_j} - \frac{(T)^2}{n}$$
Where, $T_j$ = Sum of the observed values of a row = $T_1, T_2, \ldots$
$n_j$ = Sample size of a row = $n_1, n_2, \ldots$
n = Sum of the sample size of different samples
In two-way ANOVA, there are three possible null hypotheses. These are as follows:
1. There is no difference in the means of the first factor.
2. There is no difference in the means of the second factor.
3. There is no interaction between first and second factors.
For null hypotheses 1 and 2, the alternative hypothesis is: The means of first factor
and second factor are not equal.
For null hypothesis 3, the alternative hypothesis is: There is an interaction between
first factor and second factor.
3. Divide SS between rows by its d.f., r – 1, to get MS between rows, which is the mean
variation between rows. Similarly, MS between rows for
other attributes can also be calculated.
The following formula is used to calculate MS between rows: MS between rows =
SS between rows/(r – 1)
Where, r = number of rows
4. Calculate SS between columns. To do so, first take the sum of observed values in
each column. Thereafter, take the square of the sum of observed values and divide
the number by the respective sample size of the column. Then, add the resultant values
and take the difference between the added value and the correction factor
to obtain the variation between columns. Similarly, SS between columns for other
attributes can also be calculated. The following formula is used to calculate SS
between columns:
$$SS_{\text{between columns}} = \sum \frac{(T_j)^2}{n_j} - \frac{(T)^2}{n}$$
Where, $T_j$ = Sum of the observed values of a column = $T_1, T_2, \ldots$
$n_j$ = Sample size of a column = $n_1, n_2, \ldots$
5. Divide SS between columns by its d.f., c – 1, to get MS between columns, which is
the mean variation between columns. Similarly, MS between columns for other
attributes can also be calculated. The following formula is used to calculate MS
between columns:
MS between columns = SS between columns/(c – 1)
Where, c = number of columns
6. Calculate total variation by first taking the sum of squares of all individual values
in the samples. After that, subtract the correction factor from this sum of squares.
Similarly, total variation for other attributes can also be calculated. The following
formula is used to calculate variation:
$$\text{Total SS} = \sum X_{ij}^2 - \frac{(T)^2}{n}$$
7. Compute residual variation by first adding SS between rows and SS between columns,
and then subtracting this sum from total SS. Similarly, residual variation for other
attributes can also be calculated. The following formula is used to calculate residual
variation:
Residual variation = Total SS – (SS between rows + SS between columns)
8. Calculate the F-ratio for rows and for columns with the help of the following
formula: F-ratio = MS between rows (or MS between columns)/MS residual
The calculated value of F-ratio is tested against the tabulated F-value that is
determined at a specified level of significance. If the calculated value of F-ratio
lies under the limits of acceptance region, the null hypothesis is accepted and the
alternate hypothesis is rejected.
Let us understand the application of two-way ANOVA with the help of an example.
Example 13: Three respondents have rated three small cars of different brands on a
five-point scale (5 being the highest) with respect to their features. The ratings and
features are provided in Table 9:
Table 9: Ratings Given by Customers to Different Brands of Cars with Respect to their Features
1 Zen 3 2 4 3 5
i10 4 4 4 5 4
Alto 4 3 5 2 4
2 Zen 2 4 3 1 4
i10 4 5 3 4 4
Alto 3 1 2 5 3
3 Zen 4 5 3 2 4
i10 3 2 4 5 3
Alto 4 5 4 5 5
The researcher wants to know the difference between the brands in terms of features.
H0: There is no difference in the means of the five features of the cars.
Correction factor = $(T)^2/n$ = (162 × 162)/45
= 583.2
SS between columns (i.e., between variables) = (31 × 31)/9 + (31 × 31)/9 + (32 × 32)/9 +
(32 × 32)/9 + (36 × 36)/9 – 583.2
= 585.2 – 583.2
=2
SS between rows (i.e., between respondents) = (56 × 56)/15 + (48 × 48)/15 + (58 × 58)/15 – 583.2
= 587 – 583.2
= 3.8
Total SS = $(3)^2 + (4)^2 + (4)^2 + (2)^2 + (4)^2 + (3)^2 + (4)^2 + (4)^2 + (5)^2 + (3)^2 + (5)^2 + (2)^2 + (5)^2 + (4)^2 + (4)^2 + (2)^2 + (4)^2 + (3)^2 + (4)^2 + (5)^2 + (1)^2 + (3)^2 + (3)^2 + (2)^2 + (1)^2 + (4)^2 + (5)^2 + (4)^2 + (4)^2 + (3)^2 + (4)^2 + (3)^2 + (4)^2 + (5)^2 + (2)^2 + (5)^2 + (3)^2 + (4)^2 + (4)^2 + (2)^2 + (5)^2 + (5)^2 + (4)^2 + (3)^2 + (5)^2 - 583.2$
= 638 – 583.2
= 54.8
Residual variation = 54.8 – (2 + 3.8)
= 49
Source of Variation   SS   d.f.                    MS             F-ratio            5% F limit
Between columns       2    (5 – 1) = 4             2/4 = 0.5      0.5/6.125 = 0.08   F(4,8) = 3.84
Residual              49   (5 – 1) × (3 – 1) = 8   49/8 = 6.125
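You can check the F-value for significance with the help of one-tailed test. The
graphical representation of the preceding solution for the F-value with v1 = 4 and
v2 = 8 is shown in Figure 14:
[Figure 14: graph showing the acceptance and rejection regions, with the calculated F-value of 0.08 and the critical value of 3.84]
[Figure 15: graph showing the acceptance and rejection regions, with the calculated F-value of 0.31 and the critical value of 4.46]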
Figures 14 and 15 show that the calculated F-value lies in the acceptance region.
Therefore, H0 is accepted and H1 is rejected. The value implies that the cars have the
same features.
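The sums of squares in Example 13 can be reproduced from Table 9 with the sketch below. The variable names are illustrative, and the five rating columns are simply numbered because the feature names are not given here. The row totals of 56, 48 and 58 used in the worked solution are the totals of the three respondents, so the script groups rows by respondent. The degrees of freedom (4, 2 and 8) are the ones tabulated in the text and are taken as given, not derived. Unrounded intermediate values are kept, so the results differ slightly from the rounded figures in the text (1.91 against 2, 3.73 against 3.8, and 0.30 against 0.31 for the rows F-ratio).

```python
# Ratings from Table 9: (respondent, brand, ratings of five features)
ratings = [
    (1, "Zen", [3, 2, 4, 3, 5]), (1, "i10", [4, 4, 4, 5, 4]), (1, "Alto", [4, 3, 5, 2, 4]),
    (2, "Zen", [2, 4, 3, 1, 4]), (2, "i10", [4, 5, 3, 4, 4]), (2, "Alto", [3, 1, 2, 5, 3]),
    (3, "Zen", [4, 5, 3, 2, 4]), (3, "i10", [3, 2, 4, 5, 3]), (3, "Alto", [4, 5, 4, 5, 5]),
]

all_values = [x for _, _, row in ratings for x in row]
n = len(all_values)
correction_factor = sum(all_values) ** 2 / n

# Column totals: one total per feature, 9 ratings each
column_totals = [sum(row[j] for _, _, row in ratings) for j in range(5)]
ss_columns = sum(t ** 2 / 9 for t in column_totals) - correction_factor

# Row totals: one total per respondent, 15 ratings each
respondent_totals = {}
for respondent, _, row in ratings:
    respondent_totals[respondent] = respondent_totals.get(respondent, 0) + sum(row)
ss_rows = sum(t ** 2 / 15 for t in respondent_totals.values()) - correction_factor

total_ss = sum(x * x for x in all_values) - correction_factor
residual = total_ss - (ss_columns + ss_rows)

# Degrees of freedom as tabulated in the text (assumed, not derived here)
df_columns, df_rows, df_residual = 4, 2, 8
f_columns = (ss_columns / df_columns) / (residual / df_residual)
f_rows = (ss_rows / df_rows) / (residual / df_residual)

print(f"Correction factor = {correction_factor:.1f}, Total SS = {total_ss:.1f}")
print(f"SS columns = {ss_columns:.2f}, SS rows = {ss_rows:.2f}, Residual = {residual:.2f}")
print(f"F (columns) = {f_columns:.2f}, F (rows) = {f_rows:.2f}")
```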
Activity
Search the Internet for more information on parametric tests and prepare a note of
1,000 words on them.
10.7 SUMMARY
A hypothesis can be tested by using a large number of tests and these tests are
connected with each other in one way or another.
In parametric tests, researchers make some assumptions about some properties of
the parent population from which samples are drawn. In non-parametric tests, no
assumptions are made.
The different types of parametric tests are z-test, t-test, Chi-square test and F-test.
In a one-sample test, you study the relationship between a sample and the
population.
In a two-sample test, you study the relationship between two samples drawn from
two different or same populations.
ANOVA is used to study and explain more than two samples or data sets. It helps
in explaining the amount of variation in two data sets.
10.8 KEY WORDS
Distribution pattern: A probability distribution pattern that is similar to normal
distribution and is used for testing hypothesis.
F-test: A test that is used to compare the significant difference between the
of the company develops a questionnaire in a summated scale form to collect data
about customer satisfaction with and without the warranty card. The department
mails the questionnaire to a random sample of customers after they have received
warranty cards. The same questionnaire is then sent to the same set of customers
after their warranty cards are expired. The company also sends the questionnaire to
dealers who have provided their customers with warranty cards.
The customers and dealers have provided marks out of 100 for their satisfaction level.
The data collected by the marketing research department for customer satisfaction
and dealer satisfaction is given in Table A as follows:
10 46 75 65
After conducting the research, the company comes to the conclusion that warranty
cards do not have much impact on the customers’ and dealers’ satisfaction. A reason
behind this can be that the same type of warranty is given by the competitors of
National Motors Inc.
QUESTIONS
1. Find out the effect of warranty cards on the satisfaction of customers with the help
of data provided in the case study. Use 5% as the level of significance to test the
hypothesis.
(Hint: H0: The customer satisfaction before and after returning the warranty card
is the same.)
2. What should National Motors do to overcome this problem?
(Hint: The company can conduct a survey regarding the available warranty cards
in the entire motor scooters industry.)
3. Why did the marketing research department of National Motors Inc. develop a
questionnaire?
(Hint: In order to determine whether the customers’ and dealers’ satisfaction
depends on warranty cards or not.)
4. What was the base of providing marks to the questionnaire?
(Hint: Satisfaction level)
5. Was the questionnaire mailed to all customers and dealers?
(Hint: The department mailed the questionnaire to a random sample of customers
and to dealers who have provided their customers with warranty cards.)
10.10 EXERCISE
1. What are the two types of hypotheses tests?
2. Explain the different types of parametric tests.
3. Explore the following cases of one-sample tests:
a. Normal and infinite population, large sample size, known population variance
and two-tailed test or one-tailed test.
b. Normal and infinite population, small sample size, unknown population
variance and two-tailed test or one-tailed test.
4. Explain any two-sample tests along with examples.
5. Explain the concept of ANOVA in detail.
CHAPTER 11
Non-Parametric Tests
Table of Contents
11.1 Introduction
11.2 Concept of Non-Parametric Tests
11.1 INTRODUCTION
In the previous chapter on Parametric Tests, you have learned about different types
of parametric tests used to check the validity of a hypothesis. You have also studied
that parametric tests can only be applied if you know population type and population
parameters, such as mean and variance. However, if this information is unavailable,
you cannot use parametric tests. In such a situation, you need non-parametric tests
to check the validity of a hypothesis and draw inferences.
Non-parametric tests are used when you do not have adequate information about
population type and parameters. These tests are widely used to study data given in
the form of ranks. Examples of non-parametric tests are sign tests, rank correlation,
rank sum test, Wilcoxon matched pairs and chi-square test. The selection of the test
depends on problem type, sample size and data. For example, rank correlation is
used to establish correlation between two ranked data sets. Researchers should
observe caution while selecting a non-parametric test to ensure accurate and precise
results.
This chapter covers non-parametric tests and their types. It provides information
about one and two sample sign tests. It also elaborates on rank correlation and rank
sum tests, including the Mann-Whitney and Kruskal-Wallis tests. In addition, it
explains the Wilcoxon matched pairs test/signed rank test. Finally, the chapter also
sheds light on chi-square test for goodness of fit and chi-square test for independence.
Let us understand the reason behind choosing a non-parametric test over a parametric
test with the help of a simple example. Suppose, a researcher wants to find out the
preference of customers about the different brands of toothpaste available in the
market. He/she would ask customers to rank different brands according to their
preferences. The data collected would be in the rank form on which parametric tests
cannot be performed. This is because a parametric test requires numeric values, such
as mean and variance, to test a hypothesis. Therefore, in this case, the researcher
would use a non-parametric test.
[Figure 1: Non-Parametric Tests (branches shown: Sign Test, Rank Correlation, Chi-Square Test)]
Sign Test
One sample sign test is applied on a sample where the researcher does not assume
that the data is normally distributed. In this test, the probability of getting a sample
value of less or greater than median value is equal. This implies that the proportion
of success (p) and failure (q) is equal, which means that p = q = 0.50. Therefore, it
is called binomial sign test. In one sample sign test, the researcher provides sample
values with positive (+) and negative (–) signs to test the hypothesis.
Here, the researcher usually tests the null hypothesis: M = M0 against an appropriate
alternate hypothesis.
Sign test is a hypothesis test for population median and not for population mean.
The mean and standard deviation of the binomial distribution, which are used when
it is approximated by the normal distribution, are given as follows:
Mean, $\mu = np$
SD, $\sigma = \sqrt{npq}$
Let us understand the application of one sample sign test with the help of an example.
Example 1: The scores of 15 students in a class test of 20 marks are as follows: 19, 17,
16, 18, 17, 19, 20, 16, 16, 18, 11, 13, 14, 09 and 13.
Test the hypothesis that the median score of all the students is equal to 15 against
the hypothesis that the median score of 15 students is greater than 15. Use 5% level of
significance.
The researcher assigns minus (–) sign to values of less than 15 and plus (+) sign to
values of greater than 15.
Observation 19 17 16 18 17 19 20 16 16 18 11 13 14 09 13
Sign + + + + + + + + + + – – – – –
No. of + signs = 10
No. of – signs = 5
Number of observations = 15
It must be remembered that the test statistic is the larger of the number of + signs and
the number of – signs.
Now, we need to check whether the 10 plus signs observed in the given 15 trials support
the null hypothesis that p = 0.5 or the alternative hypothesis that p > 0.5.
Now, we use the binomial probability table to find the probability of 10 or more successes
as follows:
⇒ P (10 or more successes (X ≥ 10) | n = 15, p = 0.5) = P(X = 10) + P(X = 11) +
…….. + P(X = 15) = 4944/32768 ≈ 0.151
Since the value of one-tailed p is greater than α = 0.05, null hypothesis is accepted.
Note that here np = 15 (0.5) = 7.5.
$$Z = \frac{X - np}{\sqrt{npq}}$$
$$= \frac{10 - 7.5}{\sqrt{\frac{15}{4}}} = \frac{2.5}{1.93} = 1.295$$
The value of Z at 0.05 level of significance is +1.645. Since Z = 1.295 lies in the
acceptance region, the null hypothesis is accepted.
[Figure 3: graph showing the acceptance region and the rejection region]
Figure 3 shows that the calculated binomial value lies in the acceptance region.
Therefore, H0 is accepted. This implies that the median marks scored by 15 students
are 15.
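The one sample sign test of Example 1 can be sketched as follows, using the observations listed in the sign table. The variable names are illustrative, and the critical value 1.645 is the one quoted in the text. The exact binomial tail probability replaces the lookup in a binomial probability table; the normal approximation is computed without rounding the standard deviation to 1.93, so Z comes out as about 1.291 instead of 1.295.

```python
import math

# Observations as listed in the sign table of Example 1
observations = [19, 17, 16, 18, 17, 19, 20, 16, 16, 18, 11, 13, 14, 9, 13]
hypothesised_median = 15

plus_signs = sum(1 for x in observations if x > hypothesised_median)
minus_signs = sum(1 for x in observations if x < hypothesised_median)
n = plus_signs + minus_signs  # values equal to the median would be dropped

# Exact one-tailed probability P(X >= plus_signs) for a binomial with p = 0.5
p_value = sum(math.comb(n, x) for x in range(plus_signs, n + 1)) / 2 ** n

# Normal approximation: Z = (X - np) / sqrt(npq) with p = q = 0.5
z = (plus_signs - n * 0.5) / math.sqrt(n * 0.5 * 0.5)

critical_value = 1.645  # one-tailed 5% value quoted in the text
accept_h0 = z < critical_value and p_value > 0.05

print(f"+ signs = {plus_signs}, - signs = {minus_signs}")
print(f"P(X >= {plus_signs}) = {p_value:.4f}, Z = {z:.3f}, accept H0: {accept_h0}")
```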
Thereafter, the researcher calculates the total plus and minus signs and divides
the number by the sample size. Then, standard error is calculated and limits are
determined. Finally, the hypothesis is tested against the calculated value of limit.
Let us understand the application of the two sample sign test with the help of an
example.
The researcher wants to find out whether the first employee is the better performer.
Use 5% as the level of significance.
Solution: Null hypothesis (H0) and alternate hypothesis (H1) are as follows:
H0: p = 1/2
Or
H1: Sales done by the first employee is more than that of the second employee.
The researcher assigns the plus (+) and minus (–) signs to the data shown in Table 3:
Month   Employee 1 (in Lakhs) X   Employee 2 (in Lakhs) Y   Sign (X–Y)
1       2                         1.5                       +
2       2                         2.5                       –
3       4                         3                         +
4       1                         1                         0
5       1                         1.5                       –
6       2.5                       2.75                      –
7       3                         2.5                       +
8       3.5                       1                         +
9       4                         3                         +
10      1.5                       1.4                       +
11      2                         4                         –
12      3                         3                         0
No. of – signs = 4
Now, we use binomial probability table to find the probability of 6 or more successes
as follows:
$$Z = \frac{6 - 5}{\sqrt{\frac{10}{4}}}$$
$$Z = \frac{1}{1.581} = 0.632$$
The value of Z at 0.05 level of significance is +1.645. Since Z = 0.632 and it lies in
the acceptance region, null hypothesis is accepted. This implies that the median sale
done by two employees is equal.
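The two sample sign test above can be sketched as below, using the monthly sales from Table 3. The variable names are illustrative, and 1.645 is the critical value quoted in the text. Months with equal sales (sign 0) are dropped before counting, which is why n = 10.

```python
import math

# Monthly sales (in lakhs) of Employee 1 (X) and Employee 2 (Y), from Table 3
sales = [
    (2, 1.5), (2, 2.5), (4, 3), (1, 1), (1, 1.5), (2.5, 2.75),
    (3, 2.5), (3.5, 1), (4, 3), (1.5, 1.4), (2, 4), (3, 3),
]

plus_signs = sum(1 for x, y in sales if x > y)
minus_signs = sum(1 for x, y in sales if x < y)
n = plus_signs + minus_signs  # ties (sign 0) are excluded

# Normal approximation with p = q = 0.5
z = (plus_signs - n * 0.5) / math.sqrt(n * 0.5 * 0.5)

critical_value = 1.645  # one-tailed 5% value quoted in the text
accept_h0 = z < critical_value

print(f"+ signs = {plus_signs}, - signs = {minus_signs}, n = {n}")
print(f"Z = {z:.3f}, accept H0: {accept_h0}")
```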
11.3.3 WILCOXON MATCHED PAIRS TEST/SIGNED RANK TEST
The Wilcoxon matched pairs test/Signed rank test is a combination of sign and rank
tests and is used to compare a paired sample. It is used in place of paired t-test
when the distribution is not normal. The Wilcoxon matched pairs test is used when
the researcher wants to determine the direction and magnitude of difference in the
matched values. Steps to perform the test are mentioned below:
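Mean, $\mu_T = \frac{n(n + 1)}{4}$
Standard deviation, $\sigma_T = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}$
$$Z = \frac{T - \mu_T}{\sigma_T}$$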
If the calculated z-value lies under the limits of acceptance region, the null hypothesis
is accepted and the alternate hypothesis is rejected.
Let us understand the application of the Wilcoxon matched pairs test/signed rank
test with the help of an example.
Example 3: Two brands are ranked on a five-point scale (five being the highest).
The researcher wants to determine the difference between the satisfaction levels of
customers for two brands. The data for Brand A and Brand B and their difference is
provided in Table 4:
10 5 4 1
11 2 4 –2
12 4 3 1
No. of        Brand   Brand   Difference   |di|   Sign   Rank    Rank with
Respondents   A       B       (di)                       |di|    Signs
1             2       2       0            0      0      –       –
2             3       4       –1           1      –      4.5     –4.5
3             4       3       1            1      +      4.5     +4.5
4             1       2       –1           1      –      4.5     –4.5
5             2       5       –3           3      –      11      –11
6             5       4       1            1      +      4.5     +4.5
7             4       2       2            2      +      9.5     +9.5
8             3       4       –1           1      –      4.5     –4.5
9             4       3       1            1      +      4.5     +4.5
10            5       4       1            1      +      4.5     +4.5
11            2       4       –2           2      –      9.5     –9.5
12            4       3       1            1      +      4.5     +4.5
Total: Sum of Positive Ranks (W+) = 32
       Sum of Negative Ranks (W–) = 34
       T (smaller of W+ and W–) = 32
In this case, the researcher has neglected the first observation, as its difference
is 0, which leaves n = 11. The differences are ranked from the smallest to the
largest absolute value. If there is a tie, the mean of the tied ranks is assigned to
the identical values. The T statistic is equal to 32, which is the smaller of the
sums of ranks with positive and negative signs. The critical Z-value, with 5% level
of significance and two-tailed test, is ± 1.96.
Value of z-statistic is calculated as:
$$Z = \frac{T - \mu_T}{\sigma_T} = \frac{32 - \frac{11(11 + 1)}{4}}{\sqrt{\frac{11(11 + 1)[2(11) + 1]}{24}}} = \frac{32 - 33}{\sqrt{\frac{11 \times 12 \times 23}{24}}} = \frac{-1}{11.25} = -0.089$$
[Figure 4: Position of the Calculated Value]
Figure 4 shows that the calculated Z-value lies in the acceptance region; therefore,
H0 is accepted. This implies that customer satisfaction for two brands is the same.
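As a cross-check of Example 3, the following Python sketch recomputes the signed ranks and the z-statistic from the Brand A and Brand B scores in the ranking table. The helper names `average_ranks` and `wilcoxon_signed_rank` are illustrative, and the sketch assumes the same conventions as the example: zero differences are dropped and tied values receive the mean of their ranks.

```python
import math

# Scores given by the 12 respondents, as listed in the ranking table
brand_a = [2, 3, 4, 1, 2, 5, 4, 3, 4, 5, 2, 4]
brand_b = [2, 4, 3, 2, 5, 4, 2, 4, 3, 4, 4, 3]


def average_ranks(values):
    """Rank values from smallest to largest; ties get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1      # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, z) using the normal approximation for T."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    n = len(diffs)
    t = min(w_plus, w_minus)
    mu_t = n * (n + 1) / 4
    sigma_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, w_minus, (t - mu_t) / sigma_t


w_plus, w_minus, z = wilcoxon_signed_rank(brand_a, brand_b)
print(w_plus, w_minus, round(z, 3))  # 32.0 34.0 -0.089
```

The z-value lies well inside ± 1.96, which agrees with the decision to accept H0.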
present or not. He/She forms a null hypothesis that there is no correlation between
the two data sets and tests it at 5% level of significance using two-tailed test. The
researcher checks the critical value for ρ in the table showing values of Spearman’s
rank correlation coefficient. The critical value of ρ is – 0.5179 (lower limit) and + 0.5179
(upper limit). The given value of ρ = 0.6364 is outside the acceptance region; therefore,
the researcher rejects the null hypothesis and concludes that there is a correlation
between two data sets.
Example 4: A researcher wants to test correlation between the IQ level and hours
spent on reading a newspaper per week. The data is provided in Table 6:
9 112 10
10 110 17
11 94 16
12 110 8
13 112 9
Use rank correlation to find out correlation between the IQ level and hours spent on
reading a newspaper, with 5% level of significance.
H0: There is no correlation between the IQ level and hours spent on reading the
newspaper every week.
H1: There is correlation between the IQ level and hours spent on reading a newspaper
every week.
Or
H0: ρ = 0
H1: ρ ≠ 0
ρ = 1 – [2697/2184]
ρ = 1 – 1.235 = –0.235
The rank correlation value at 5% level of significance with a degree of freedom (d.f.)
13 and two-tailed test is ± 0.484. The researcher can check the rank correlation value
for significance with the help of two-tailed test.
The calculated rank correlation value lies in the acceptance region; therefore, H0 is
accepted. This implies that there is no correlation between the IQ level and number
of hours spent on reading newspaper in a week. It can be interpreted that reading a
newspaper cannot increase your IQ level unless you analyse news.
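The arithmetic of Example 4 can be sketched in Python as follows. The sketch assumes the usual Spearman formula ρ = 1 – 6Σd²/[n(n² – 1)]; the value Σd² = 449.5 is not stated in the text and is inferred from the numerator 2697 = 6Σd² shown in the example, with n = 13.

```python
def spearman_rho(sum_d_squared, n):
    """Spearman's rank correlation from the sum of squared rank differences."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Assumption: sum of d^2 = 449.5, inferred from 6 * sum(d^2) = 2697 in the example
rho = spearman_rho(449.5, 13)

critical_value = 0.484          # two-tailed, 5% level, as quoted in the example
reject_h0 = abs(rho) > critical_value

print(round(rho, 3), reject_h0)  # -0.235 False
```

Since |ρ| is smaller than the critical value, the sketch agrees with the conclusion that H0 is accepted.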
ascending order of value. Thereafter, these observations are ranked and the sum of
ranked observations is calculated. Finally, the sum is tested against the specified test
statistic value to test the hypothesis. There are two types of rank sum tests, as shown
in Figure 5:
The mean and SD are determined to calculate the limits of acceptance region. They
can be calculated with the help of the following formulas:
$\mu_U = \frac{n_1 n_2}{2}$
$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
If the value of U test lies under the limits of the acceptance region, the null hypothesis
is accepted. However, if the calculated U value lies outside the limits of the acceptance
region, the null hypothesis is rejected and the alternate hypothesis is accepted. Let
us take an example to understand the application of the Mann-Whitney test.
12 25 49
The researcher wants to find out whether the two products are from the same
production house. Use the Mann-Whitney test (or U test) with 10% significance level.
The researcher merges the data of two products and arranges it in the increasing
order. Thereafter, he/she calculates R1 and R2 for Products A and B, respectively, as
shown in Table 9:
$$\mu_U = \frac{n_1 n_2}{2} = \frac{12 \times 12}{2} = 72$$
$$SD = \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{12 \times 12 (12 + 12 + 1)}{12}} = 5\sqrt{12} = 17.3$$
Uα = 42
Since U is greater than Uα, the researcher rejects H0. This implies that Products A and
B are from different production houses.
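The mean and SD used in this example can be reproduced with a small Python sketch. It covers only the two formulas for µU and σU given above; the function name `mann_whitney_moments` is illustrative, and the U statistic itself is not recomputed because the ranked data are not reproduced here.

```python
import math


def mann_whitney_moments(n1, n2):
    """Return the mean and standard deviation of U for sample sizes n1 and n2."""
    mu_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return mu_u, sigma_u


mu_u, sigma_u = mann_whitney_moments(12, 12)
print(mu_u, round(sigma_u, 1))  # 72.0 17.3
```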
Where,
n = Sample size
Ri = Sum of the ranks of each sample taken separately, that is, R1, R2, ..., Rk
ni = Number of observations in the ith sample
Chi-square value is determined at d.f. k–1 and the specified level of significance
and the calculated H value is tested against it. If the H value lies under the limits of
acceptance region, the researcher accepts the null hypothesis and rejects the alternate
hypothesis. However, if the H value lies outside the limits of acceptance region, the
researcher rejects the null hypothesis and accepts the alternate hypothesis.
Let us understand the application of the Kruskal-Wallis test with the help of an
example.
Machine 1   Machine 2   Machine 3   Machine 4
24          28          26          33
23          34          28          37
26          29          31          36
27          32          25          35
29          30          21          38
Perform the Kruskal-Wallis test to establish whether all four machines are equally
good. Use 5% level of significance.
H0: All four machines are equally good. (This implies that Median1 = Median2 =
Median3 = Median4)
First, the researcher merges performance data for the four machines and arranges it
in an increasing order. Thereafter, he/she ranks the data and classifies ranks as R1,
R2, R3 and R4 for machines 1, 2, 3 and 4, respectively. Finally, the researcher takes out
the total of ranks in R1, R2, R3 and R4. The calculation is shown in Table 11:
5 26 5.5
6 26 5.5
7 27 7
8 28 8.5
9 28 8.5
10 29 10.5
11 29 10.5
12 30 12
13 31 13
14 32 14
15 33 15
16 34 16
17 35 17
18 36 18
19 37 19
20 38 20
After that, ranks are classified as R1, R2, R3 and R4 for machines 1, 2, 3 and 4,
respectively, as shown in Table 12:
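Where, n = 20
R1 = 28 R2 = 61 R3 = 32 R4 = 89
n1 = n2 = n3 = n4 = 5
$$H = \frac{12}{n(n + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n + 1) = \frac{12}{20 \times 21} \times \frac{28^2 + 61^2 + 32^2 + 89^2}{5} - 3(21) = 13.85$$
d.f. = k – 1
= 4 – 1
= 3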
Chi-square value at 5% level of significance and 3 d.f. is 7.815. You can check the
value for significance with the help of one-tailed test. The graphical representation
of the preceding solution is given in Figure 6:
[Figure 6: Acceptance region up to the critical value 7.815; the calculated value 13.85 falls in the rejection region]
Figure 6 shows that the calculated H value lies in the rejection region; therefore,
H0 is rejected and H1 is accepted. This implies that not all four machines are
equally good. It can be interpreted that the machines have different capabilities
and machine number 4 is the best, as its rank sum (89) is the highest.
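The Kruskal-Wallis calculation for the four machines can be retraced with the Python sketch below. It pools the observations, assigns mean ranks to ties and applies the standard H formula without a tie correction, which is an assumption that matches the value of about 13.85 obtained in the example. The names `average_ranks` and `kruskal_wallis` are illustrative.

```python
# Performance data for the four machines, as given in the example
machines = {
    "Machine 1": [24, 23, 26, 27, 29],
    "Machine 2": [28, 34, 29, 32, 30],
    "Machine 3": [26, 28, 31, 25, 21],
    "Machine 4": [33, 37, 36, 35, 38],
}


def average_ranks(values):
    """Rank values from smallest to largest; ties get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1      # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (rank sums per group, H statistic) without a tie correction."""
    pooled = [v for values in groups.values() for v in values]
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums = {}
    start = 0
    for name, values in groups.items():
        rank_sums[name] = sum(ranks[start:start + len(values)])
        start += len(values)
    total = sum(rank_sums[name] ** 2 / len(groups[name]) for name in groups)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return rank_sums, h


rank_sums, h = kruskal_wallis(machines)
print(rank_sums)
print(round(h, 2), h > 7.815)
```

The rank sums come out as 28, 61, 32 and 89, and H exceeds the critical chi-square value of 7.815 at 3 d.f., so the sketch supports rejecting H0.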
Self Assessment Questions
7. _________________ is used to determine whether two independent samples
are drawn from the same population.
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Ei = Expected frequency
Expected frequency can be calculated with the help of the following formula:
If the value of chi-square is greater than the critical value of χ2, the null hypothesis is rejected.
Figure 7 shows two types of chi-square tests that are mainly used to find out the
association between variables:
[Figure 7: Chi-square Tests]
Let us discuss Chi-square test for goodness of fit and chi-square test for independence
in detail.
Thereafter, he/she calculates chi-square value with the formula used to calculate
chi-square. In this chi-square test, d.f. used is k–1, where k is the number of
categories. Chi-square value is determined at the specified level of significance
and d.f. If the calculated chi-square value lies under the limits of acceptance
region, the researcher accepts the null hypothesis and rejects the alternate
hypothesis.
Let us understand the application of chi-square test with the help of an example.
Test the hypothesis that the customers have no preference for any particular product.
Use 5% level of significance.
Solution: H0: Customers have no preference for any particular product.
The expected frequency and observed frequency of customers’ responses are shown
in Table 14 as follows:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
$$\chi^2 = \frac{50^2 + 30^2 + 30^2 + 50^2}{250} = \frac{6800}{250} = 27.2$$
Critical value of χ2 at 5% level of significance with 3 degrees of freedom (k–1 = 3) is
7.815. Since our χ2 is greater than the critical value, we reject H0.
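The goodness-of-fit arithmetic above can be checked with a few lines of Python. The sketch uses only the figures visible in the calculation: an expected frequency of 250 for each of the four products and absolute deviations |Oi – Ei| of 50, 30, 30 and 50. The observed frequencies themselves are not reproduced, so the deviations are entered directly.

```python
expected = 250                      # expected frequency for each product
deviations = [50, 30, 30, 50]       # |O_i - E_i| taken from the calculation above

# Chi-square statistic: sum of squared deviations divided by expected frequency
chi_square = sum(d ** 2 / expected for d in deviations)

critical_value = 7.815              # 5% level of significance, 3 d.f.
reject_h0 = chi_square > critical_value

print(round(chi_square, 1), reject_h0)  # 27.2 True
```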
are associated with each other. For example, the researcher wants to know whether
the introduction of better/unique services helps increase sales of an organisation.
In this case, the researcher is trying to establish a relation between two attributes−
better services and sales. In chi-square test, first expected frequency is calculated
and then the value of chi-square is ascertained. The d.f. used in this case is (r–1)
(c–1), where r equals the number of levels for one category of variable and c equals
the number of levels for the second category of variable. The chi-square value is
determined at the specified level of significance and d.f. If the calculated chi-square
value lies under the limits of acceptance region, the null hypothesis is accepted and
the alternate hypothesis is rejected.
Let us understand the application of chi-square test with the help of an example.
Example 8: The researcher has the data for the preferences of men and women
regarding the joint and nuclear families, as shown in Table 15:
Table 15: Data for Preferences of Men and Women for Joint and Nuclear Families
The researcher wants to find out whether the opinion of men and women about the
type of family is the same. Use 5% level of significance.
H0: The opinion of men and women about the type of family is the same.
H1: The opinion of men and women about the type of family is different.
The test statistic used for this data is chi-square test for independence. The following
equation is used for calculation:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Ei = Expected frequency
Expected frequency can be calculated with the help of the following equation:
$$E = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$
In the current scenario, expected frequency can be calculated using the following
method:
After calculating the expected frequency and the square of differences between the
observed and expected frequency, Table 16 is created:
d.f. = (r – 1) (c – 1)
= (2 – 1) (2 – 1) = 1
Chi-square value at 5% level of significance and 1 d.f. is 3.841. You can check the
chi-square value for significance with the help of one-tailed test. The graphical
representation of the preceding solution is shown in Figure 8:
[Figure 8: Acceptance region up to the critical value 3.841; the calculated value 74.16 falls in the rejection region]
Figure 8 shows that the value lies in the rejection region; therefore, H0 is rejected.
The value implies that there is a vast difference between the opinions of men and
women about the type of family.
Self Assessment Questions
10. Expected frequency can be calculated using the formula ______________.
11. Chi-square test for independence refers to a test in which two attributes are
tested to find out whether they are associated with each other. (True/False)
Activity
Prepare a PowerPoint presentation on ‘Non-Parametric Test’.
11.7 SUMMARY
A researcher can use non-parametric tests without taking into consideration
population distribution and sample type. Non-parametric tests are also known as
distribution-free tests.
Sign test is considered as one of the easiest non-parametric tests because it takes
into account only the plus and minus signs of observations in a sample.
One sample sign test is applied on a single sample taken from a symmetrical
population.
Two sample sign test is used to check whether two samples are related to each
other. It is also known as paired sign test.
Rank correlation, also known as Spearman’s rank correlation coefficient, is used to
establish correlation between two data sets that can be ranked.
Rank sum test is used to analyse ordinal data (in the rank form) and calculate the
value of rank sum statistics. To conduct this test, observations need to be arranged
in the ascending order.
The Mann-Whitney test (or U test) is used to determine whether two independent
samples are drawn from the same population. It is applied in general conditions
and does not have any specific requirement.
The Kruskal-Wallis test is similar to one-way ANOVA with only one difference
that the former is based on ranks, while the latter is based on numerical values.
The Wilcoxon matched pairs test/signed rank test is a combination of sign and
rank tests. It is used to compare two paired samples.
Chi-square test is used to find out dependency between two types of data. It can
also be used to make comparisons between theoretical population (expected data)
and actual data (observed data).
of the portable generator industry was 2.5 lakh units a month due to huge demand
from customers. However, this demand was short-lived and, by 1987, many units
closed down the production of generators. For example, Kirloskar Group withdrew
its 0.5 KVA portable generator from the market. Lombardini also disappeared from
the market. Two major competitors, Sri Ram Honda and Birla Yamaha, were engaged
in a price war.
2 35 30
3 24 35
4 36 40
5 22 45
6 26 41
7 45 46
8 50 52
9 44 49
10 47 44
11 48 50
12 25 42
The following table shows the preferences of rural and urban customers for two types
(branded and local) of generators:
Preferences of Rural and Urban People for Local and Branded Generators
               Top Competitors   Local Marketers   Total
Rural Market   100               150               250
Urban Market   120               99                219
Total          220               249               469
The researchers concluded that the rural market is widely different from the urban
market. In addition, the efficiency of generators produced by top competitors is
almost same as those produced by local companies. Therefore, the generators
produced by top competitors to fulfil demand from urban marketers, can also be
introduced in the rural market. The two market leaders should market their products
very effectively in the rural market to capture the market share. Local marketers have
the first mover advantage in the rural market. The market leaders can make slight
changes in their generators to improve their capacity and promote their products as
specifically designed for the rural market.
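The case study does not show how the preference table was analysed. Assuming a chi-square test for independence at the 5% level of significance, as described earlier in the chapter, the check could be sketched in Python as follows. The function name `chi_square_independence` is illustrative; expected frequencies are computed as (row total × column total)/grand total.

```python
# Observed preferences: rows are rural and urban markets,
# columns are top competitors and local marketers
observed = [
    [100, 150],
    [120, 99],
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
            chi_square += (o - e) ** 2 / e
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, dof


chi_square, dof = chi_square_independence(observed)
print(round(chi_square, 2), dof, chi_square > 3.841)
```

Under this assumption, the statistic is well above the critical value of 3.841 at 1 d.f., which is consistent with the researchers' conclusion that the rural market differs from the urban market.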
QUESTIONS
1. What are the two research topics identified by researchers?
(Hint: Studying the requirements of rural market in terms of technical feasibility
and consumer preferences.)
2. What is the conclusion given by researchers in the case study?
(Hint: The researchers concluded that the rural market is widely different from the
urban market.)
3. How did the two market leaders (Sri Ram Honda and Birla Yamaha) conduct the
market research?
(Hint: Sri Ram Honda and Birla Yamaha hired researchers to study the changing
market scenario.)
4. What strategy was followed by researchers to conduct research?
(Hint: The researchers divided the problem into two research topics.)
5. How was the study of the first research topic conducted by researchers?
(Hint: Researchers collected two samples: one from a market leader and another
from a local marketer. Rural technicians have assigned scores to two types of
generators according to efficiency.)
11.10 EXERCISE
1. Explain the concept of non-parametric test.
2. Describe rank correlation with the help of an example.
3. Explain the types of sign tests.
Biddle, J., & Emmett, R. Research in the History of Economic Thought and Methodology
National Academies Press. (2009). Partnerships for Emerging Research Institutions.
Washington, D.C.
E-REFERENCES
Nonparametric Tests - Overview, Reasons to Use, Types. (2020). Retrieved 9 April 2020, from https://corporatefinanceinstitute.com/resources/knowledge/other/nonparametric-tests/
Using Chi-Square Statistic in Research - Statistics Solutions. (2020). Retrieved 9 April 2020, from https://www.statisticssolutions.com/using-chi-square-statistic-in-research/
CHAPTER 12
Report Writing
Table of Contents
12.1 Introduction
12.2 Research Proposal
12.1 INTRODUCTION
In the previous chapter, you studied the non-parametric tests. The chapter discussed
the sign test and its types. The latter sections of the chapter described rank correlation
and rank sum test. The chapter concluded with the explanation of chi-square test.
Report writing is a process to document each and every step involved in the
research process. These steps are Introduction, Literature Review, Methodology,
Data Analysis and Interpretation, Conclusion and Recommendations. It helps the
researcher in checking whether the research is progressing in the right direction
or not. A research report serves as a future reference for the findings and
recommendations of the research. The research report consists of a written report
and an oral presentation. The written report states objectives, data, research
methodology and findings. The oral presentation helps the target audience in
judging whether research recommendations are feasible to address the research
problem or not.
may misinterpret the research findings and take wrong decisions, which may prove
disastrous for the organisation. Therefore, the researcher should observe utmost care
and adopt a predetermined structure while writing a report to prevent the creeping
in of ambiguities in the report.
The chapter begins by explaining the concept of research proposal. Next, it provides
in-depth information about research report. Then the following topics are explained:
written report, audience of a report, types of reports and steps in writing a report.
The integral parts of a report are also discussed. Towards the end, the concept of oral
presentations is discussed.
Submitted to
Sales Manager: Vikas Kumar
Submitted by
Manali Batra, Senior Researcher
MSD Research Institute
Location of the Work: Max New York Life, Elegance Tower, Jasola Vihar
This study covers data analysis of MNYL and HDFC for only a limited period of
time from the financial years 2014–15 to 2018–19. Hence, the results are comparable
and representative for this period only.
The written report is an official document giving the facts and information to the
interested readers in a presentable manner. The facts must be accurate, complete
and interpreted. The oral report, on the other hand, is a piece of face-to-face
communication presenting one’s research work in a seminar, workshop, etc. It helps
the researcher present his/her views more clearly in front of research stakeholders.
Since the reporter has to interact directly with the audience, any faltering during oral
presentation can leave a negative impact on the audience. However, an oral report
helps the researcher gather valuable suggestions and feedback from the research
stakeholders. As compared to an oral report, a written report is a permanent record
that can be used for reference again and again. Let us discuss the written reports and
the oral presentations in detail.
time duration, the methodology adopted for research, and so on. In addition, a
written report helps in identifying alternative solutions to address a problem by
presenting the present and past findings and recommendations.
Audience of a Report
As already discussed, there are two parties involved in a research – the first party
wants the research to be conducted and the second party conducts the research. The
first party is called audience. The researcher should tailor the writing of research
report towards the specific requirements of the target audience. The length and
composition of a research report and the details provided in it vary as per the target
audience. This happens because organisations differ from one another in significant
ways.
The researcher should adapt his/her writing to three types of audiences, each of
which requires different techniques:
High-tech peers: The research report should make use of the most professional/
complex resources, along with jargon and technical terms, keeping in mind the
expert-level knowledge of the audience.
Low-tech peers: The research report should provide proper definitions for all the
abbreviations/acronyms/technical terms used throughout the writing. This would
enhance understanding where the audience is a mixture of laymen and professionals.
Lay readers: The research report should use simple terms that are easy to
understand and interpret. There should be no use of abbreviations/acronyms.
of development patterns are mostly used in research reports: logical development
and chronological development. In logical development, the researcher arranges the
subject matter by reasoning out the links between one topic and the next. This
reasoning is mostly based on the study that the researcher has done during the
research work. In logical development, the subject matter moves from simple to
complex. In chronological development, the subject matter is structured in order
of time.
2. Drawing the outline of the report: At this stage of report writing, the researcher
makes a structure or outline of the report. It consists of a brief description of the
topic to be covered in the report. This helps the researcher not to miss out any topic
to be studied in the report. The outline is also considered as the framework of the
report.
3. Preparing the rough draft: At this stage, the researcher starts writing the report.
The researcher organises his/her thoughts and mentions methods to be used for
data collection, analysis techniques, major findings of the research and limitations
faced by him/her during the study. The recommendations of the study are also
described in the rough draft.
4. Reviewing the rough draft: The researcher checks whether the report conveys
the intent of the research work to be carried out. In addition, at this stage, the
researcher also checks whether the report is apt for the target audience.
5. Preparing bibliography: Bibliography is a section of the report that contains sources
of secondary data collection. It includes names of books, journals, magazines and
other sources of print media from where the data is collected. It also contains the
Internet links used in the preparation of the research report. There is a proper
pattern to write the name of the source from where the data is collected.
Multiple styles of referencing can be used, such as APA citation, Harvard
referencing and MLA format, each having its unique rules for the structure of
references with respect to author name, book title, date, publisher name, etc. Let us
understand the pattern of mentioning data sources in bibliography with the help
of the following examples:
For books and pamphlets, the order of writing in APA referencing is as follows:
Last name of the author, initials of the first name (year). Title of the book (edition).
Place: Publisher name.
For example,
Sekaran, U., & Bougie, R. (2016). Research Methods for Business (4th ed.). New
York: Wiley.
For websites, the order of writing in APA referencing is as follows: Article title.
(year). Retrieved from URL.
For example,
4 Types of Research Methods For Start-Ups. (2019). Retrieved from https://www.bl.uk/business-and-ip-centre/articles/4-basic-research-methods-for-business-start-ups.
6. Making the final draft: At this stage of report writing, the researcher gives a final
touch to his/her report. The final report is prepared keeping in mind the objective
of the research. It should be simple, concise and convincing. At this stage, it is
checked whether all the portions of the research are covered or not.
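The ordering of elements for a book reference described in step 5 can be illustrated with a small Python sketch. The function `format_apa_book` and its parameters are hypothetical and cover only the simplified pattern given in this chapter (authors, year, title, edition, place and publisher), not the full APA rules.

```python
def format_apa_book(authors, year, title, edition, place, publisher):
    """Build a book reference in the order described in step 5.

    authors is a list of (last_name, initials) tuples.
    """
    names = [f"{last}, {initials}" for last, initials in authors]
    if len(names) > 1:
        # Join all but the last author with commas, then add "& last author"
        author_text = ", ".join(names[:-1]) + ", & " + names[-1]
    else:
        author_text = names[0]
    return f"{author_text} ({year}). {title} ({edition}). {place}: {publisher}."


reference = format_apa_book(
    [("Sekaran", "U."), ("Bougie", "R.")],
    2016,
    "Research Methods for Business",
    "4th ed.",
    "New York",
    "Wiley",
)
print(reference)
# Sekaran, U., & Bougie, R. (2016). Research Methods for Business (4th ed.). New York: Wiley.
```

Keeping the reference elements as separate fields makes it easier to switch to another style, such as Harvard or MLA, by changing only the formatting function.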
researcher to conduct the research. It also contains the name of the people who
have contributed to the research. Preface talks about the subject matter of the
report.
Executive summary: It contains a brief account of the introduction, body and
conclusion of the research. It gives an idea of every segment in the report. The
summary can come at the start or end of the report. It depends on the type of
report and the way of report-writing.
Introduction and objective: It contains the detailed background of the research
topic and the purpose of conducting the research. For example, if the research
is carried on an organisation’s product, the introduction would include product
features and the background, profile, market and future plans of the organisation.
It can also involve the industry background, which includes information regarding
main players and the level of competition in the market.
Body of the report: This part contains a detailed description of the research topic.
It also contains methodology used in the research and analysis of the collected
data.
Findings, conclusion and recommendations: This part contains major findings of
the research.
Bibliography and appendices: This part lists sources from where the research data
is collected. Bibliography contains sources of secondary data while appendices
contain the sources of primary data or some extra information about the research
topic. Appendices also contain the questionnaire or other sources of acquiring
data.
Self Assessment Questions
6. The _______________ contains the name of the sponsor of the research, the
name of the researcher and duration of the research.
Activity
Prepare a PowerPoint presentation on research report-writing techniques.
12.5 SUMMARY
A research proposal is an agreement between two parties. The first party
wants the research to be conducted and the second party actually conducts the
research.
The research proposal includes purpose, population, research design, methods
of data collection, tests of significance, time frame and budget.
A research report is a crucial part of a research as it includes solutions and
actionable recommendations of the research problem.
A research report can be of two types, namely written report and oral presentation.
Broadly, reports are classified into two types – technical report and popular report.
Technical report lays emphasis on the method employed in conducting research,
assumptions made during the research, details about the research topic and the
research findings and recommendations.
Popular report is non-technical in nature and is less comprehensive as the audience
of this report is interested in knowing the results of the research, not the entire
analysis.
The report writing process involves sequential steps, which are analysing the
subject matter, drawing the outline of the report, preparing the rough draft,
reviewing the rough draft, preparing bibliography and making the final draft.
Oral presentations are given with the help of PowerPoint software, which facilitates
data presentation in the form of graphs and charts.
A research report contains many sections that provide segregated research
information, which are title page, preliminary pages, executive summary,
introduction and objective, body of the report, findings, conclusion,
recommendations and bibliography and appendices.
RESEARCH PROPOSAL
Submitted to
Manager S.R. Dicosta
Submitted by
Veera Malhotra
Senior Researcher
RPS Research Institute
• Secondary data: The Internet and articles in different sources (print media)
3. Sample
• Sampling: Purposive and convenience sampling
• Sample size: 200
• Sample population: Customers of similar product in the market
4. Tools
• Excel
• SPSS
• TABLEAU
Importance of the Work
This study will help in knowing the competition level in the new market and how
the company will beat this competition and enter the market.
Expected Outcomes
After sending the report proposal, the researcher starts researching and writes a
report after the completion of the research. The report is as follows:
Research Report
Study of Paint Industry in Delhi
Findings of the Study
The researcher tries to explain the results of the research with the help of SWOT
(Strengths, Weaknesses, Opportunities and Threats) analysis, which is presented
in the following table:
Strengths                        Weaknesses
• Strong brand image             • Centralised structure
• Dedicated sales team           • Rigid department heads
• Value-added services
Opportunities                    Threats
• Large untapped market          • Presence of very strong competitors
• Distinguishable product        • Aggressive marketing by competitors
  (such as PR Paint)             • Various paints of good quality
• Unsatisfied customer
• New area of expansion
Table: SWOT Analysis
QUESTIONS
1. Which type of report is used in the case study?
Biddle, J. & Emmett, R. Research in the History of Economic Thought and Methodology.
Chandra, S. & Sharma, M. Research Methodology.
National Academies Press. (2009). Partnerships for Emerging Research Institutions.
Washington, D.C.
E-REFERENCES
How to Write a Research Proposal | Guide and Template. (2020). Retrieved 10 April 2020, from https://www.scribbr.com/research-process/research-proposal/
Research Reports: Definition and How to Write Them | QuestionPro. (2020). Retrieved 10 April 2020, from https://www.questionpro.com/blog/research-reports/