
Validity & Reliability

Dr. P. Routray
Context
 The main objective of a questionnaire in research is to obtain relevant information in the most reliable and valid manner.
 Thus the accuracy and consistency of a survey/questionnaire form a significant aspect of research methodology, known as validity and reliability.
Types of Validity
 Face Validity
 Content Validity
 Construct Validity
   Convergent Validity
   Discriminant Validity
 Criterion Validity
   Predictive Validity
   Concurrent Validity
   Postdictive Validity
VALIDITY
 Research validity in surveys relates to the extent to which the survey measures the right elements, that is, the elements that need to be measured.
 In simple terms, validity refers to how well an instrument measures what it is intended to measure.
Face Validity
 Face validity is a subjective judgement on the operationalization of a construct.
 Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories.
 To examine face validity, a dichotomous scale can be used with the categorical options "Yes" and "No", which indicate a favourable and an unfavourable item respectively. The collected data are then analysed using Cohen's Kappa Index (CKI) to determine the face validity of the instrument.
 Face validity is considered by some as a basic and very minimum index of content validity.
 The value of kappa can range from 0 to 1. A score of 0 means that agreement among raters is no better than chance, whereas a score of 1 means complete agreement between the raters. DM et al. (1975) recommended a minimally acceptable kappa of 0.60 for inter-rater agreement.
In the 2 × 2 agreement table for two raters:
A => the total number of instances that both raters said were correct. The raters are in agreement.
B => the total number of instances that Rater 1 said were incorrect, but Rater 2 said were correct. This is a disagreement.
C => the total number of instances that Rater 2 said were incorrect, but Rater 1 said were correct. This is also a disagreement.
D => the total number of instances that both raters said were incorrect. The raters are in agreement.
The observed agreement is P_o = (A + D)/(A + B + C + D), and kappa compares it with the agreement expected by chance: kappa = (P_o − P_e)/(1 − P_e).
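A minimal Python sketch of this computation, using the table cells defined above; the counts in the usage example are hypothetical.

```python
# Minimal sketch: Cohen's kappa from the 2x2 agreement table above.
# a, b, c, d follow the definitions of A, B, C, D on this slide;
# the example counts are hypothetical.

def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    n = a + b + c + d
    p_observed = (a + d) / n  # (A + D)/(A + B + C + D)
    # Chance agreement from each rater's marginal proportions.
    p_chance = ((a + c) / n) * ((a + b) / n) + ((b + d) / n) * ((c + d) / n)
    return (p_observed - p_chance) / (1 - p_chance)

# Example: 20 items rated "Yes"/"No" by two experts.
print(round(cohens_kappa(a=12, b=2, c=1, d=5), 2))  # ~0.66, above the 0.60 threshold
```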
Content validity
 Content validity is defined as "the degree to which items in an instrument reflect the content universe to which the instrument will be generalized" (Straub, Boudreau et al., 2004).
 It is highly recommended to assess content validity when a new instrument is developed.
 In general, content validity involves evaluating a new survey instrument to ensure that it includes all the items that are essential to a particular construct domain and eliminates undesirable items (Lewis et al., 1995; Boudreau et al., 2001).
Steps involved in Content Validity
 An exhaustive literature review is conducted to extract the related items.
 A content validity survey is generated; each item is assessed on a three-point scale (not necessary, useful but not essential, and essential).
 The survey is sent to experts in the same field as the research.
 The content validity ratio (CVR) is then calculated for each item using Lawshe's (1975) method, as sketched below.
 Items that are not significant at the critical level are eliminated.
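A minimal Python sketch of Lawshe's CVR under the definitions above; the panel counts in the example are hypothetical.

```python
# Minimal sketch of Lawshe's (1975) content validity ratio:
# CVR = (n_e - N/2) / (N/2), where n_e is the number of experts
# rating the item "essential" and N is the panel size.
# CVR ranges from -1 to +1; the example counts are hypothetical.

def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    half = n_panelists / 2
    return (n_essential - half) / half

# Example: 9 of 10 experts rate an item "essential".
cvr = content_validity_ratio(9, 10)
print(cvr)  # 0.8; items below the critical CVR for the panel size are eliminated
```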
Construct validity
 Construct validity refers to how well you translated or transformed a concept, idea, or behaviour (that is, a construct) into a functioning and operating reality: the operationalization.
 Construct validity testifies to how well the results obtained from the use of the measure fit the theories around which the test is designed.
 Construct validity has two components: convergent and discriminant (divergent) validity.
 Discriminant validity is the extent to which latent variable A discriminates from other latent variables (e.g., B, C, D).
 Discriminant validity means that a latent variable is able to account for more variance in the observed variables associated with it than (a) measurement error or similar external, unmeasured influences, or (b) other constructs within the conceptual framework. If this is not the case, then the validity of the individual indicators and of the construct is questionable.
 In brief, discriminant validity (or divergent validity) tests that constructs that should have no relationship do, in fact, not have any relationship.
Construct validity
 Construct validity refers to how well a test or tool measures the construct that it was designed to measure. There are two types of construct validity: convergent and discriminant validity.
 Convergent validity tests that constructs that are expected to be related are, in fact, related. Convergent validity is established when the scores obtained with two different instruments measuring the same concept are highly correlated.
 To verify construct validity (discriminant and convergent validity), a factor analysis can be conducted using principal component analysis (PCA) with the varimax rotation method. Items loading above 0.40, the minimum recommended value in research, are considered for further analysis.
 Also, items cross-loading above 0.40 should be deleted. The factor analysis results will then satisfy the criteria of construct validity, including both discriminant validity (loadings of at least 0.40, no cross-loading of items above 0.40) and convergent validity (eigenvalues of at least 1, loadings of at least 0.40, items loading on their posited constructs).
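A minimal sketch of this loading-based screening, assuming the third-party factor_analyzer package for the rotated factor solution; the response data, factor count, and cut-off usage are hypothetical illustrations of the 0.40 rule above.

```python
# Minimal sketch: screen survey items by rotated factor loadings,
# keeping items that load >= 0.40 on exactly one factor (i.e. no
# cross-loading >= 0.40). Assumes the factor_analyzer package;
# the response data are hypothetical.
import numpy as np
from factor_analyzer import FactorAnalyzer

def screen_items(data: np.ndarray, n_factors: int, cutoff: float = 0.40) -> list:
    fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax", method="principal")
    fa.fit(data)
    loadings = np.abs(fa.loadings_)  # items x factors
    # Keep items with exactly one loading at or above the cut-off.
    return [i for i, row in enumerate(loadings) if (row >= cutoff).sum() == 1]

# Usage: 200 hypothetical respondents x 12 survey items.
rng = np.random.default_rng(0)
responses = rng.normal(size=(200, 12))
print(screen_items(responses, n_factors=3))
```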
Criterion-related validity
 Criterion-related validity is established when the measure differentiates individuals on a criterion it is expected to predict. A test has this type of validity if it is useful for predicting performance or behaviour in another situation (past, present, or future).
 Criterion validity reflects the use of a criterion, a well-established measurement procedure, to create a new measurement procedure to measure the construct you are interested in. The criterion and the new measurement procedure must be theoretically related. The measurement procedures could include a range of research methods (e.g., surveys, structured observation, or structured interviews), provided that they yield quantitative data.
 This can be done by establishing postdictive validity, concurrent validity, or predictive validity, as explained below.
 Postdictive validity: established if the test is a valid measure of something that happened before.
 Concurrent validity: established when the same construct is measured by two different instruments at the same time and the results are consistent.
 Predictive validity: indicates the ability of the measuring instrument (e.g. a job knowledge test) to differentiate among individuals with reference to a future criterion (e.g. job performance). A strong, consistent relationship indicates high predictive validity.
Case in Point
For market researchers, criterion validity is crucial and can make or break a product. One famous example is when Coca-Cola decided to change the flavor of its trademark drink. Diligently, they researched whether people liked the new flavor, performing taste tests and giving out questionnaires. People loved the new flavor, so Coca-Cola rushed New Coke into production, where it was a titanic flop.
The mistake that Coke made was to forget about criterion validity and to omit one important question from the survey. People were not asked whether they preferred the new flavor to the old, a failure to establish concurrent validity. The old Coke, known to be popular, was the perfect benchmark, but it was never used. A simple blind taste test, asking people which of the two flavors they preferred, would have saved Coca-Cola millions of dollars.
Ultimately, the predictive validity was also poor, because the good survey results did not correlate with the poor sales. By then, it was too late!
External vs Internal Validity
 External validity refers to the extent of generalizability of
the results of a causal study to other settings, people, or
events.
 Internal validity refers to the degree of our confidence in
the causal effects (i.e., that variable X causes variable Y).
 Field experiments have more external validity (i.e., the
results are more generalizable to other similar
organizational settings), but less internal validity (i.e., we
cannot be certain of the extent to which variable X alone
causes variable Y).
 Note that in the lab experiment, the reverse is true. The
internal validity is high but the external validity is rather
low.
Factors affecting internal validity
 The seven major threats to internal validity are the effects of history, maturation, testing, instrumentation, selection, statistical regression, and mortality.
 Certain events or factors that have an impact on the independent variable–dependent variable relationship might unexpectedly occur while the experiment is in progress, and this history of events would confound the cause-and-effect relationship between the two variables, thus affecting the internal validity.
 Cause-and-effect inferences can also be contaminated by the effects of the passage of time, another uncontrollable variable. Such contamination is called maturation effects.
 Frequently, to test the effects of a treatment, subjects are given what is called a pretest (say, a short questionnaire eliciting their feelings and attitudes). That is, first a measure of the dependent variable is taken (the pretest), then the treatment is given, and after that a second test, called the posttest, is administered. The difference between the posttest and the pretest scores is then attributed to the treatment. The testing threat arises because exposure to the pretest itself can influence responses on the posttest, independently of the treatment.
 Instrumentation effects are yet another source of threat to internal validity. These might arise because of a change in the measuring instrument between pretest and posttest, and not because of the treatment's differential impact at the end.
 The threat to internal validity could also come from improper or unmatched selection of subjects for the experimental and control groups.
 The effects of statistical regression are brought about when the members chosen for the experimental group have extreme scores on the dependent variable to begin with.
 Another confounding factor in the cause-and-effect relationship is the mortality or attrition of members in the experimental or control group (or both) as the experiment progresses. When the group composition changes over time across the groups, comparison between the groups becomes difficult, because those who dropped out of the experiment may confound the results.
Factors affecting External Validity
 Whereas internal validity raises questions about whether it is the treatment alone or some additional extraneous factor that causes the effects, external validity raises issues about the generalizability of the findings to other settings.
 Thus, subject selection and its interaction with the treatment can also pose a threat to external validity.
 Maximum external validity can be obtained by ensuring that, as far as possible, the lab experimental conditions are as close to and compatible with the real-world situation.
Validity in Lab Experiment
 Internal validity refers to the confidence we place in the cause-and-effect relationship. In other words, it addresses the question, "To what extent does the research design permit us to say that independent variable A causes a change in dependent variable B?"
 As Kidder and Judd (1986) note, in research with high internal validity, we are relatively better able to argue that the relationship is causal, whereas in studies with low internal validity, causality cannot be inferred at all. In lab experiments, where cause-and-effect relationships are substantiated, internal validity can be said to be high.
 To what extent would the results found in the lab setting be transferable or generalizable to actual organizational or field settings? In other words, if we do find a cause-and-effect relationship after conducting a lab experiment, can we then confidently say that the same cause-and-effect relationship will also hold true in the organizational setting?
Reliability
 The reliability of a measure is an indication of the stability and consistency with which the instrument measures the concept, and helps to assess the "goodness" of a measure.
 The ability of a measure to remain the same over time, despite uncontrollable testing conditions or the state of the respondents themselves, is indicative of its stability and low vulnerability to changes in the situation. Two tests of stability are test–retest reliability and parallel-form reliability.
 The internal consistency of measures is indicative of the homogeneity of the items in the measure that tap the construct. Inter-item consistency reliability and split-half reliability are two tests of consistency.
Reliability
Reliability is stability and consistency
 across time (test–retest reliability)
 across instruments (parallel-form reliability)
 across researchers (inter-rater reliability)
 across items (internal consistency)
Test-Retest Reliability
 Used to assess the consistency of a measure from one time to another.
 This approach assumes that there is no substantial change in the construct being measured between the two occasions. The amount of time allowed between measures is critical.
 The shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation.
 This is because the two observations are related over time: the closer in time they are, the more similar the factors that contribute to error.
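In practice the test–retest estimate is usually computed as the correlation between the two administrations; a minimal Python sketch with hypothetical scores:

```python
# Minimal sketch: test-retest reliability as the Pearson correlation
# between the same respondents' scores at time 1 and time 2.
# The scores are hypothetical.
import numpy as np

time1 = np.array([22, 30, 27, 18, 25, 29])  # scores at first administration
time2 = np.array([24, 31, 26, 17, 27, 28])  # same respondents, some time later
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))  # closer to 1 indicates a more stable measure
```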
Parallel-Forms Reliability
 Used to assess the consistency of the results of two tests constructed in the same way from the same content domain.
 In parallel-forms reliability you first have to create two parallel forms. One way to accomplish this is to create a large set of questions that address the same construct and then randomly divide the questions into two sets.
 This approach makes the assumption that the randomly divided halves are parallel or equivalent.
Inter-Rater or Inter-Observer Reliability
 Used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon.
 Inter-rater reliability is especially relevant when the data are obtained through observations, projective tests, or unstructured interviews, all of which are liable to be subjectively interpreted.
 There are two major ways to estimate inter-rater reliability:
 when the measure is on a category scale (e.g., Cohen's kappa, as above);
 when the measure is a continuous one (e.g., a correlation between the raters' scores).
Internal Consistency Reliability
 Used to assess the consistency of results across items within a test.
 In internal consistency reliability estimation, a single measurement instrument is administered to a group of people on one occasion to estimate reliability.
 In effect, we judge the reliability of the instrument by estimating how well the items that reflect the same construct yield similar results.
 We look at how consistent the results are for different items for the same construct within the measure.
 There is a wide variety of internal consistency measures that can be used, such as:
 Average Inter-item Correlation
 Average Item-total Correlation
 Split-Half Reliability
 Cronbach's Alpha (α)
Average Inter-item Correlation
 The average inter-item correlation uses all of the items on our instrument that are designed to measure the same construct.
 First, the correlation between each pair of items is computed, as illustrated in the figure.
 The average inter-item correlation is simply the average (mean) of all these correlations.
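A minimal Python sketch with hypothetical responses:

```python
# Minimal sketch: average inter-item correlation = mean of the
# off-diagonal entries of the item correlation matrix.
# Responses are hypothetical (respondents x items).
import numpy as np

responses = np.array([[4, 5, 4, 4],
                      [3, 3, 2, 3],
                      [5, 5, 5, 4],
                      [2, 2, 3, 2],
                      [4, 4, 4, 5]])
corr = np.corrcoef(responses, rowvar=False)        # items x items matrix
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
print(round(off_diag.mean(), 2))                   # average inter-item correlation
```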
Average Item-total Correlation
 This approach also uses the inter-item correlations.
 In addition, a total score across the six items is computed and used as a seventh variable in the analysis.
 The figure shows the six item-to-total correlations at the bottom of the correlation matrix. They range from .82 to .88 in this sample analysis, with an average of .85.
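A minimal Python sketch of the same idea, using a hypothetical four-item example rather than the six-item figure:

```python
# Minimal sketch: item-to-total correlations, using the total score
# as an extra variable. Responses are hypothetical (respondents x items).
import numpy as np

responses = np.array([[4, 5, 4, 4],
                      [3, 3, 2, 3],
                      [5, 5, 5, 4],
                      [2, 2, 3, 2],
                      [4, 4, 4, 5]])
total = responses.sum(axis=1)                      # total score per respondent
item_total = [np.corrcoef(responses[:, j], total)[0, 1]
              for j in range(responses.shape[1])]
print([round(r, 2) for r in item_total])           # one correlation per item
print(round(float(np.mean(item_total)), 2))        # average item-total correlation
```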
Split-Half Reliability
 In split-half reliability we randomly divide all items that purport to measure the same construct into two sets.
 We administer the entire instrument to a sample of people and calculate the total score for each randomly divided half.
 The split-half reliability estimate, as shown in the figure, is simply the correlation between these two total scores. In the example it is .87.
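A minimal Python sketch with hypothetical responses:

```python
# Minimal sketch: split-half reliability. Items are randomly divided
# into two halves; the estimate is the correlation between the two
# half total scores. Responses are hypothetical (respondents x items).
import numpy as np

rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(50, 10))      # 50 respondents, 10 items
order = rng.permutation(responses.shape[1])        # random item order
half1 = responses[:, order[:5]].sum(axis=1)        # total score, first half
half2 = responses[:, order[5:]].sum(axis=1)        # total score, second half
print(round(np.corrcoef(half1, half2)[0, 1], 2))   # split-half estimate
```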
Cronbach's Alpha (α)
 Cronbach's alpha is mathematically equivalent to the average of all possible split-half estimates, although that is not how we compute it; a sketch of the usual computation follows.
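A minimal Python sketch of the standard variance-based formula, alpha = k/(k−1) · (1 − Σ item variances / variance of total score), with hypothetical responses:

```python
# Minimal sketch of Cronbach's alpha:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)),
# where k is the number of items. Responses are hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Usage: 5 respondents answering 4 Likert-type items.
scores = np.array([[4, 5, 4, 4],
                   [3, 3, 2, 3],
                   [5, 5, 5, 4],
                   [2, 2, 3, 2],
                   [4, 4, 4, 5]])
print(round(cronbach_alpha(scores), 2))
```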
Comparison of Reliability Estimators
 Each of the reliability estimators has certain advantages and disadvantages.
 Inter-rater reliability is one of the best ways to estimate reliability when your measure is an observation.
 However, it requires multiple raters or observers.
 As an alternative, one could look at the correlation of ratings of the same single observer repeated on two different occasions.
Validity and Reliability of Secondary Data
 For data that do not measure latent constructs, factor analysis is not required to determine validity.
 Reliability is required to be measured when the same construct is measured through different indicators.