BUSINESS RESEARCH METHODS
KMBN/A
UNIT-3
Scaling & measurement techniques: Concept of Measurement:
Need of Measurement; Problems in measurement in
management research – Validity and Reliability.
Levels of measurement – Nominal, Ordinal, Interval, Ratio.
Attitude Scaling Techniques: Concept of Scale – Rating Scales
viz. Likert Scales, Semantic Differential Scales, Constant Sum
Scales, Graphic Rating Scales – Ranking Scales – Paired
comparison & Forced Ranking – Concept and Application.
Validity and Reliability – Criteria for Good Measurement
Reliability is about the consistency of a measure, and validity is about the
accuracy of a measure.
Reliability-
• Measurement is said to be reliable when it gives consistent results, i.e. when
repeated measurements of the same thing give constant results.
• Reliability is the extent to which the same finding will be obtained if the
research is repeated at another time by another researcher. If the same finding
can be obtained again, the instrument is consistent or reliable.
Reliability refers to how consistently a method measures something. If the
same result can be consistently achieved by using the same methods under the
same circumstances, the measurement is considered reliable.
Ex- You measure the temperature of a liquid sample several times under identical
conditions. The thermometer displays the same temperature every time, so the
results are reliable.
Example: If you weigh yourself on a weighing scale throughout the day, you’ll get
the same results. These are considered reliable results obtained through repeated
measures.
• Two dimensions underlie the concept of reliability:
1. Repeatability
2. Internal Consistency
Types of reliability
1.Test-retest: The consistency of a measure across time: do you get the same results when
you repeat the measurement? Ex- A group of participants complete a questionnaire
designed to measure personality traits. If they repeat the questionnaire days, weeks or
months apart and give the same answers, this indicates high test-retest reliability.
2. Inter-item (split-half): It measures the internal consistency of the measurement. Example: The
results of the same test are split into two halves and compared with each other. If there is a lot of
difference between the two halves, the inter-item (split-half) reliability of the test is low.
3. Inter-rater: It measures the consistency of results obtained at the same time by different
raters (researchers). Example: Suppose five researchers assess the academic performance
of the same student using questions drawn from all the academic subjects and arrive at
widely different results. This shows that the questionnaire has low inter-rater reliability.
4.Parallel Forms: It measures Equivalence. It includes different forms of the same test
performed on the same participants. Example: Suppose the same researcher conducts the
two different forms of tests on the same topic and the same students. The tests could be
written and oral tests on the same topic. If results are the same, then the parallel-forms
reliability of the test is high; otherwise, it’ll be low if the results are different.
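In practice these reliability checks are usually computed as correlations. The sketch below is a minimal illustration with assumed questionnaire scores (not data from these notes): test-retest reliability as the correlation between two administrations, and split-half (inter-item) reliability corrected to full-test length with the Spearman-Brown formula.

```python
# Minimal sketch with assumed data: reliability estimated as a correlation
import numpy as np

# Test-retest: the same six respondents complete the same questionnaire twice
time_1 = np.array([12, 15, 9, 20, 14, 17])
time_2 = np.array([13, 14, 9, 19, 15, 18])
r_test_retest = np.corrcoef(time_1, time_2)[0, 1]   # close to 1 => stable over time

# Split-half (inter-item): each respondent's total on the odd vs. even items of one test
odd_items = np.array([6, 8, 4, 10, 7, 9])
even_items = np.array([6, 7, 5, 10, 7, 8])
r_halves = np.corrcoef(odd_items, even_items)[0, 1]

# Spearman-Brown correction: full-test reliability estimated from the half-test correlation
spearman_brown = (2 * r_halves) / (1 + r_halves)

print(round(r_test_retest, 2), round(spearman_brown, 2))
```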
Validity
Validity refers to how accurately a method measures what it is intended to
measure. If research has high validity that means it produces results that
correspond to real properties, characteristics, and variations in the physical or
social world.
• High reliability is one indicator that a measurement is valid. If a method is not
reliable, it probably isn’t valid.
Validity refers to the accuracy of the measurement. Validity shows how a specific
test is suitable for a particular situation. If the results are accurate according to the
researcher’s situation, explanation, and prediction, then the research is valid.
• Example: Suppose a questionnaire is distributed among a group of people to
check the quality of a skincare product, and the same questionnaire is then repeated with
many other groups. If the responses from the various participants are consistent, the
questionnaire has high reliability, which supports (though does not by itself prove) its validity.
• Example: Your weighing scale shows different results each time you weigh
yourself within a day, even though you handle it carefully and weigh yourself under
similar conditions. The weighing machine might be malfunctioning. This means your
method has low reliability, and hence you are getting inaccurate or inconsistent results
that are not valid.
For example, using a variable like "behaviour of employees" to measure consumer satisfaction
in a big shopping mall raises a validity issue. Employee behaviour is not the only
determinant of consumer satisfaction; various other factors such as pricing
policy, discount policy, parking facilities and others may also be responsible for
generating consumer satisfaction. Hence, a tool designed to measure
consumer satisfaction from "employee behaviour" alone may not be a valid measurement
tool. Researchers are therefore always concerned about the validity of their measuring
instruments.
Validity is discussed in the context of two terms, viz. internal and external validity.
External validity refers to the generalizability of research findings to the external
environment, such as other populations, settings and variables. In other words, the external
validity of research findings is the data's ability to be generalized across the universe of interest.
On the other hand, internal validity is the ability of a research instrument to
measure what it is purported (supposed) to measure.
Types of Validity
1. Content validity: It shows whether all aspects of the construct being measured are
covered by the test/measurement. Example: A language test designed to measure writing,
reading, listening and speaking skills covers all the relevant aspects of language ability and
therefore has high content validity.
2. Face validity: It concerns whether, on the face of it, the test and its procedure appear
to measure what they are supposed to measure. Example: The type of questions included
in the question paper, the time and marks allotted, and the number of questions and their
categories. Does it look like a good question paper for measuring the academic performance
of students?
3. Construct validity: It shows whether the test is measuring the correct construct
(ability, attribute, trait or skill). Example: A self-esteem questionnaire could be assessed
by measuring other traits known or assumed to be related to the concept of self-
esteem (such as social skills and optimism). Strong correlation between the scores
for self-esteem and associated traits would indicate high construct validity.
4. Criterion validity: Refers to how well the measurement of one variable can
predict the outcome of another variable.
A job applicant takes a performance test during the interview process. If this test
accurately predicts how well the employee will perform on the job, the test is said to
have criterion validity.
Any measurement tool should have the ability to measure a particular variable
accurately, and it must measure what it is supposed to measure. A good instrument
enhances the quality of research results. Hence it becomes necessary to
assess the 'goodness' of the measures developed. Any instrument that meets the tests
of reliability, validity and practicality is said to possess 'goodness' of measurement.
These tests of sound measurement are reliability, validity, and practicality.
How to Increase Reliability
• Use an appropriate questionnaire to measure the competency level.
• Ensure a consistent environment for participants
• Make the participants familiar with the criteria of assessment.
• Train the participants appropriately.
• Analyse the research items regularly to avoid poor performance.
How to Increase Validity
• The respondents should be motivated.
• The intervals between the pre-test and post-test should not be lengthy.
• Dropout rates should be minimized.
• The inter-rater reliability should be ensured.
• Control and experimental groups should be matched with each other.
Attitude Scaling Techniques
Marketers are interested in measuring consumers’ attitudes toward their products. An
attitude scale involves a series of phrases, adjectives, or sentences about the attitude
object.
Attitude measurement may help marketers in several ways. The attitudes consumers
hold toward a particular firm and its products greatly influence the success or failure
of the firm's marketing strategy.
If consumers hold negative attitudes about one or more aspects of a firm’s marketing
practices, they may stop buying the firm’s products and influence others not to buy
the same. As consumers’ attitudes play an important role in determining consumer
behavior, marketers should measure consumers’ attitudes.
Scaling
Concept of Scale – Scaling is a technique used for measuring qualitative responses of
respondents such as those related to their feelings, perception, likes, dislikes, interests and
preferences.
Scaling is the branch of measurement that involves the construction of an instrument that
associates qualitative constructs with quantitative metric units.
• Several scale formats have been developed to enable a researcher to collect
appropriate data for conducting a study. The scales are broadly divided into two
categories, viz.
• Conventional scaling
• Unconventional scaling
The conventional scales are used in the questionnaire format and are most common. The
unconventional scales are used for unconventional collection of data through games,
puzzles, etc.
Comparative scales
Comparative scales include scales wherein the researchers ask the
respondents for their relative preference between two or more objects.
For example, “Do you prefer Colgate or Babool?”
Examples of comparative scales include paired comparison, rank order,
and constant sum scale.
1. Paired comparison:
• This technique is a widely used comparative scaling technique.
• In this technique, the respondent is asked to pick one object among the
two objects with the help of some criterion.
• The respondent makes a series of judgements between objects.
• The data obtained is ordinal in nature.
• With n brands, [n(n-1)/2] paired comparisons are required (a worked sketch follows the example below).
Example:
• A survey was conducted to find out consumers' preference for dark chocolate
versus white chocolate. The outcome was as follows:
• Dark chocolate = 30%
• White chocolate = 70%
• Thus, it is evident that consumers prefer white chocolate over dark
chocolate.
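The sketch below is only an illustration (the brand names and vote counts are assumed): it generates the n(n-1)/2 pairs and tallies how often each brand is preferred, producing an ordinal preference order.

```python
# Hypothetical sketch: paired comparison with n brands -> n(n-1)/2 pairs
from itertools import combinations

brands = ["Dark chocolate", "White chocolate", "Milk chocolate"]   # n = 3
pairs = list(combinations(brands, 2))
print(len(pairs))                      # 3 * (3 - 1) / 2 = 3 paired comparisons

# Assumed outcome of each paired comparison from 10 respondents (votes per pair)
votes = {
    ("Dark chocolate", "White chocolate"): (3, 7),   # 3 chose dark, 7 chose white
    ("Dark chocolate", "Milk chocolate"):  (4, 6),
    ("White chocolate", "Milk chocolate"): (8, 2),
}

# Total number of times each brand was preferred across all pairs (ordinal data)
totals = {brand: 0 for brand in brands}
for (a, b), (votes_a, votes_b) in votes.items():
    totals[a] += votes_a
    totals[b] += votes_b

for brand in sorted(totals, key=totals.get, reverse=True):
    print(brand, totals[brand])        # White 15, Milk 8, Dark 7 -> preference order
```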
2. Rank order:
A ranking question is a type of survey question that asks respondents to compare a
list of items with each other and arrange them in order of preference. It is used by
market researchers to understand the order of importance of items from multiple
items.
A ranking scale is a close-ended scale that allows respondents to evaluate multiple
row items in relation to one column item or a question in a ranking survey and then
rank the row items. It is the scale used by market researchers to ask ranking
questions.
On a ranking scale, the question may be in terms of product features, needs, wants,
etc. It can be used for both online and offline surveys.
• For example: A respondent is asked to rank a list of soft drinks in order of preference.
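As a small illustrative sketch (the soft-drink brands and rankings are assumed, since the original example table is not reproduced here), rank-order answers from several respondents can be aggregated by mean rank to obtain an overall preference order.

```python
# Hypothetical sketch: aggregating rank-order data by mean rank (1 = most preferred)
from statistics import mean

# Each respondent ranks the same three soft drinks (assumed data)
rankings = [
    {"Brand A": 1, "Brand B": 2, "Brand C": 3},
    {"Brand A": 2, "Brand B": 1, "Brand C": 3},
    {"Brand A": 1, "Brand B": 3, "Brand C": 2},
]

mean_rank = {brand: mean(r[brand] for r in rankings) for brand in rankings[0]}

# A lower mean rank means a higher overall preference
for brand in sorted(mean_rank, key=mean_rank.get):
    print(brand, round(mean_rank[brand], 2))   # Brand A 1.33, Brand B 2.0, Brand C 2.67
```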
3. Constant sum scaling:
• In this technique, the respondent is asked to allocate a constant sum of units, such
as 100 points, among the attributes of a product to reflect their importance.
• If the attribute is not important, the respondent assigns it 0 or no points.
• If an attribute is twice as important as another attribute, it receives twice as many
points.
• The sum of all points is 100, that is, constant. Hence, the name of the scale.
A constant sum scale is a type of question used in a market research survey in
which respondents are required to divide a specific number of points or percentages
so that they add up to a fixed total. The allocation of points reveals the relative
weight the respondent attaches to each category.
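A brief sketch (attribute names and point allocations assumed) of how constant sum responses can be checked against the fixed 100-point total and averaged across respondents to obtain relative importance weights.

```python
# Hypothetical sketch: checking and averaging constant sum allocations (total = 100)
from statistics import mean

# Each respondent splits 100 points across three product attributes (assumed data)
allocations = [
    {"Price": 50, "Fragrance": 20, "Packaging": 30},
    {"Price": 40, "Fragrance": 40, "Packaging": 20},
    {"Price": 60, "Fragrance": 10, "Packaging": 30},
]

for response in allocations:
    assert sum(response.values()) == 100, "each allocation must sum to the constant 100"

# Mean points per attribute give its relative importance weight
weights = {attr: mean(r[attr] for r in allocations) for attr in allocations[0]}
print(weights)   # e.g. Price 50.0, Fragrance ~23.3, Packaging ~26.7
```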
4. Q sort:
• It is a sophisticated form of rank order.
• In this technique, a set of objects is given to an individual to sort into
piles according to specified rating categories.
For example: A respondent is given 10 brands of shampoo and asked
to place them in 2 piles, ranging from "most preferred" to "least
preferred".
Pile 1 – Most preferred; Pile 2 – Least preferred
Non-comparative scales:
In non-comparative scales, each object of the stimulus set is scaled
independently of the others. The resulting data are generally assumed to
be interval or ratio scaled.
1. Continuous rating scales:
A continuous rating scale is a type of scale wherein the respondents are
asked to rate different objects on a continuum according to certain
criterion. The rating is given by respondents by marking a point on the
continuum.
• For example: A respondent is asked to rate the service of Domino's by marking a point on the line. Two formats of the continuous scale are commonly used:
• Type 1
• Type 2
2. Itemised rating scales:
In an itemised rating scale, items are presented in the form of ordered
statements or categories, and the respondents are required to select the category
that best describes the concerned item. The respondents are asked to
select one of the choices according to their preferences or opinions.
a. Likert scale:
• This scale requires the respondent to indicate a degree of agreement
or disagreement with each of a series of statements about the stimulus object.
• The analysis is often conducted on an item-by-item basis, or a total
score can be calculated.
• When arriving at a total score, the categories assigned by the respondent to
negatively worded statements are scored by reversing the scale (a scoring sketch
follows the example below).
• For example: A well-known shampoo brand used the Likert scaling
technique to measure agreement or disagreement with statements about its
ayurvedic shampoo.
• Summated scales are constructed by using the item analysis approach.
Such scales consist of a number of statements that express either positive
or adverse feelings toward any topic or idea. The summated scale is most
frequently used in studying social attitudes. It follows the pattern
developed by Likert; thus, the summated scale is also termed as Likert
scale. Most commonly, a Likert scale contains five degrees of a statement.
Statement: The Internet is creating a positive impact on Children.
• Strongly Agree (1)
• Agree (2)
• Neutral (3)
• Disagree (4)
• Strongly Disagree (5)
In the preceding example, there are five degrees of response to the
given statement. One extreme of the scale indicates the strongest
approval of the statement, whereas the other extreme indicates the
strongest disapproval; the middle points lie between these two extremes.
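A minimal scoring sketch (the statements, coding and responses below are assumed, not taken from the notes): assuming items are coded 1 = Strongly Disagree through 5 = Strongly Agree, negatively worded statements are reverse-scored before the summated total is computed.

```python
# Hypothetical sketch: summated Likert score with reverse-coded negative statements
# Assumed coding: 1 = Strongly Disagree ... 5 = Strongly Agree for every item as stored
responses = {
    "The Internet is creating a positive impact on children": 4,   # positively worded
    "The Internet wastes children's study time": 2,                # negatively worded
    "The Internet helps children learn new skills": 5,             # positively worded
}
negatively_worded = {"The Internet wastes children's study time"}

def item_score(statement: str, raw: int) -> int:
    # Reverse the 5-point scale (1<->5, 2<->4) for negative statements so that a
    # higher score always means a more favourable attitude toward the topic.
    return 6 - raw if statement in negatively_worded else raw

total = sum(item_score(s, r) for s, r in responses.items())
print(total)   # summated attitude score: 4 + 4 + 5 = 13 out of a possible 15
```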
b. Semantic differential scale:
It measures the connotative meaning of objects, events and concepts.
The semantic differential scale consists of bipolar adjectives, such as
good–bad and valuable–worthless. The respondent is asked to select
his/her position between these two adjectives. Let us understand the
concept of the semantic differential scale with the help of the following
example.
A semantic differential scale for analysing candidates for a managerial
position is shown in the table below:
STATEMENT: How do you rate yourself on the following traits?
Successful    3 2 1 0 –1 –2 –3    Unsuccessful
Progressive   3 2 1 0 –1 –2 –3    Regressive
Active        3 2 1 0 –1 –2 –3    Passive
Fast          3 2 1 0 –1 –2 –3    Slow
Strong        3 2 1 0 –1 –2 –3    Weak
Severe        3 2 1 0 –1 –2 –3    Lenient
True          3 2 1 0 –1 –2 –3    False
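As an illustrative sketch (the evaluators' ratings are assumed), semantic differential responses on the +3 to –3 scale can be averaged per bipolar pair to build a profile of a candidate.

```python
# Hypothetical sketch: mean semantic differential ratings (+3 ... -3 per bipolar pair)
from statistics import mean

# Ratings given by three evaluators for one managerial candidate (assumed data)
ratings = {
    "Successful-Unsuccessful": [3, 2, 2],
    "Progressive-Regressive":  [1, 2, 1],
    "Active-Passive":          [-1, 0, 1],
}

profile = {pair: mean(scores) for pair, scores in ratings.items()}
for pair, avg in profile.items():
    # Positive means lean toward the left adjective, negative toward the right one
    print(f"{pair:25s} mean = {avg:+.2f}")
```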
c. Stapel scale:
A Stapel scale is an itemised rating scale that measures the
response, perception or attitude of respondents toward a
particular object through a unipolar rating. The range of a Stapel
scale is from +5 to –5, excluding 0, and thus comprises 10
units.
• For example, a tours and travel company asked respondents to rate
its holiday package on "value for money" and "user-friendly interface"
using such a scale.
With the help of the ratings obtained, we can say that the company
needs to improve its package in terms of value for money. On the other
hand, the ratings indicate that the interface is quite user-friendly for the
customers.
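The short sketch below (with assumed ratings) shows how Stapel-scale responses, which run from +5 to –5 with no zero, might be validated and averaged per attribute; a negative mean flags an attribute that needs improvement, echoing the example above.

```python
# Hypothetical sketch: summarising Stapel scale ratings (+5 ... +1, -1 ... -5; no 0)
from statistics import mean

responses = {
    "value for money":         [-2, -3, -1, -2],   # mostly negative ratings
    "user-friendly interface": [4, 3, 5, 4],       # mostly positive ratings
}

for attribute, scores in responses.items():
    assert all(-5 <= s <= 5 and s != 0 for s in scores), "invalid Stapel rating"
    print(f"{attribute:24s} mean = {mean(scores):+.2f}")
# A negative mean (value for money) signals an attribute needing improvement,
# while a positive mean (interface) signals a strength.
```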