Psychon Bull Rev (2018) 25:1968–1972

DOI 10.3758/s13423-017-1348-y

BRIEF REPORT

Participant Nonnaiveté and the reproducibility of cognitive psychology
Rolf A. Zwaan 1 & Diane Pecher 1 & Gabriele Paolacci 2 & Samantha Bouwmeester 1 &
Peter Verkoeijen 1,3 & Katinka Dijkstra 1 & René Zeelenberg 1

Published online: 25 July 2017


© The Author(s) 2017. This article is an open access publication

Abstract Many argue that there is a reproducibility crisis in psychology. We investigated nine well-known effects from the cognitive psychology literature (three each from the domains of perception/action, memory, and language) and found that they are highly reproducible. Not only can they be reproduced in online environments, but they also can be reproduced with nonnaïve participants with no reduction of effect size. Apparently, some cognitive tasks are so constraining that they encapsulate behavior from external influences, such as testing situation and prior recent experience with the experiment, to yield highly robust effects.

Keywords Replication · Reproducibility · Perception · Memory · Language

Electronic supplementary material The online version of this article (doi:10.3758/s13423-017-1348-y) contains supplementary material, which is available to authorized users.

* Rolf A. Zwaan
[email protected]

1 Department of Psychology, Educational, and Child Sciences, Erasmus University Rotterdam, Burgemeester Oudlaan 50, 3000 DR Rotterdam, Netherlands
2 Rotterdam School of Management, Erasmus University Rotterdam, Rotterdam, Netherlands
3 Learning and Innovation Center, Avans University of Applied Sciences, Breda, The Netherlands

A hallmark of science is reproducibility. A finding is promoted from anecdote to scientific evidence if it can be reproduced (Lykken, 1968; Popper, 1959). There is growing awareness that problems exist with reproducibility in psychology. A recent estimate is that fewer than half of the findings in cognitive and social psychology are reproducible (Open Science Collaboration, 2015). In addition, there have been several high-profile, preregistered, multi-lab failures to replicate well-known effects in psychology (Eerland et al., 2016; Hagger et al., 2016; Wagenmakers et al., 2016). A similar multi-lab replication in psychology that was considered successful yielded an effect size that was much smaller than the original (Alogna et al., 2014). These findings have engendered pessimism about reproducibility.

Coincident with the start of the reproducibility debate was the advent of online experimentation. Crowd-sourcing websites, such as Amazon Mechanical Turk, offered the prospect of more efficient, powerful, and generalizable ways of testing psychological theories (Buhrmester, Kwang, & Gosling, 2011). The lower monetary costs and the more time-efficient way of conducting experiments online rather than in a physical lab allowed researchers to recruit larger numbers of participants, spanning broader geographical, age, and educational ranges than undergraduate samples (Paolacci & Chandler, 2014). However, online experimentation presents challenges, typically associated with the loss of control over the testing environment and conditions (Bohannon, 2016). Most relevant to the reproducibility debate, online participant pools are large but not infinite, and hundreds of studies are conducted on the same participant pool every day, familiarizing participants with study materials and procedures (Chandler, Mueller, & Paolacci, 2014; Stewart et al., 2015). Of particular concern for reproducibility, participants may participate in studies in which they have participated before. A recent preregistered study found sizable reductions in decision-making effects among participants who had previously participated in the same studies, suggesting that nonnaïve participants may pose a threat to reproducibility (Chandler et al., 2015). Indeed, nonnaïve participants have
been implicated in failures to replicate and declining effect sizes (DeVoe & House, 2016; Rand et al., 2014).

Although concerns with reproducibility span the entire field of psychology and beyond, results in cognitive psychology are typically conceived as comparatively robust (Open Science Collaboration, 2015). We put a sample of these findings to a particularly stringent test by running them under circumstances that are increasingly representative of current practices of data collection but also are documented as challenging for reproducibility. In particular, we conducted the first preregistered replication of a large set of cognitive psychological effects in the most popular online participant pool (see Crump, McDonnell, & Gureckis, 2013, and Zwaan & Pecher, 2012, for non-preregistered replications on MTurk). Most importantly, we examined whether reproducibility depends on participant nonnaïveté by conducting the same experiments twice on the same participants a few days apart.

Research suggests that access to knowledge obtained from previous participation (e.g., from alternative conditions or elaboration) can affect people’s responses and may reduce effect sizes when participants accordingly adjust their intuitive responses towards what is perceived as normatively correct (Chandler et al., 2015). However, studies in cognitive psychology typically have nontransparent research goals, making memory of previous experiences irrelevant. Accordingly, a reduction of effect size due to repeated participation should be close to zero.

We tested the hypothesis that cognitive psychology is relatively immune to nonnaïveté effects in a series of nine preregistered experiments (https://osf.io/shej3/wiki/home/; see Table 1 for descriptions of each experiment). We selected these experiments for the following reasons. First, we wanted a broad coverage of cognitive psychology. Therefore, we selected three experiments each from the domains of perception/action, memory, and language, arguably the major areas in the field of cognitive psychology. Second, we selected findings that are both well known and known to be robust. After all, testing immunity to nonnaïveté effects presupposes that one finds effects in the first place. Third, we selected tasks that lend themselves to online testing. And fourth, we selected tasks that our team had experience with.

Table 1 Brief descriptions of and references to all replicated experiments

| Number | Task | Description | Reference |
|---|---|---|---|
| 1 | Simon task | Choice-reaction time task that measures spatial compatibility. Responses are faster when a visual target (e.g., a red square presented on the left of the screen) is spatially compatible with the response (pressing the left button) than when the target is spatially incompatible with the response (presented on the right of the screen). | Craft and Simon (1970) |
| 2 | Flanker task | Response inhibition task in which relevant information is selected and inappropriate responses in a certain context are suppressed. Responses are faster for congruent trials in which compatible distractors flank a central target (AAAAA) than for incongruent trials in which incompatible distractors flank a central target (AAEAA). | Eriksen and Eriksen (1974) |
| 3 | Motor priming (a = masked, b = unmasked) | A task with a priming procedure in which responses to stimuli (arrow probes <<) are required that are primed by presented compatible (<<) or incompatible (>>) items. Responses are slower for compatible items when primes are masked but faster when primes are visible. | Forster and Davis (1984) |
| 4 | Spacing effect | Learning task in which learning (of words) is spaced over time. Recall of words is higher for spaced item repetitions with intervening items than for massed items immediately repeated after their first presentation. | Greene (1989) |
| 5 | False memories | Memory task that assesses false recognition of items that have not been presented before in a word list but tend to be recognized as presented before because they are semantically related to the words in the list. | Roediger and McDermott (1995) |
| 6 | Serial position (a = primacy, b = recency) | Memory task that examines recall probability based on a word’s position in a list. Recall is higher for the first and last words in the list and lowest for items in the middle of the list. | Murdock (1962) |
| 7 | Associative priming | Implicit memory task that requires a response to a target word that is preceded by a prime word. Responses are faster when the prime is related than when the prime is unrelated. | Meyer and Schvaneveldt (1971) |
| 8 | Repetition priming (a = low frequency, b = high frequency) | Implicit memory task in which speed of response depends on previous exposure to an item and the word frequency of that item. Responses are faster for repeated than for new items. This repetition effect is larger for low frequency words than high frequency words. | Forster and Davis (1984) |
| 9 | Shape simulation | Sentence-verification task that requires a response on whether the object in a picture was present in the previous sentence. Yes responses are faster when the picture matches the shape implied by the sentence than when it mismatches. | Zwaan, Yaxley, and Stanfield (2002) |
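
As a rough illustration of how the reaction-time effects in Table 1 can be quantified, the sketch below computes one participant's compatibility effect (mean RT in the incompatible condition minus mean RT in the compatible condition) after dropping trials that deviate more than 3 SD from that participant's mean, as described under Exclusion criteria. It then standardizes the per-participant effects as a Cohen's d. The function names, the data layout, the use of the sample SD, and the definition of d as the mean effect divided by the SD of the effects are all assumptions made for this example. They are not taken from the authors' Inquisit scripts or JASP analyses, which remain the authoritative source.

```python
from statistics import mean, stdev


def trim_trials(trials, n_sd=3.0):
    """Drop trials whose RT deviates more than n_sd SDs from the subject mean.

    `trials` is a list of (condition, rt_ms) tuples for ONE participant
    (hypothetical layout). The sample SD is used, which is an assumption.
    """
    rts = [rt for _, rt in trials]
    if len(rts) < 2:
        return list(trials)
    m, sd = mean(rts), stdev(rts)
    return [(cond, rt) for cond, rt in trials if abs(rt - m) <= n_sd * sd]


def rt_effect(trials, slow="incompatible", fast="compatible"):
    """Per-participant effect: mean RT (slow condition) - mean RT (fast condition)."""
    kept = trim_trials(trials)
    slow_rts = [rt for cond, rt in kept if cond == slow]
    fast_rts = [rt for cond, rt in kept if cond == fast]
    return mean(slow_rts) - mean(fast_rts)


def cohens_d(effects):
    """Standardized effect across participants: mean effect / SD of effects."""
    return mean(effects) / stdev(effects)


if __name__ == "__main__":
    # Made-up data: one participant with a single extreme trial.
    participant = (
        [("compatible", 400.0)] * 10
        + [("incompatible", 450.0)] * 10
        + [("incompatible", 5000.0)]
    )
    # The 5000 ms trial is trimmed before the condition means are computed.
    print(rt_effect(participant))
    # Made-up per-participant effects (ms) for a small sample.
    print(cohens_d([50.0, 40.0, 60.0, 50.0]))
```

Note that with a sample SD, a single extreme trial can only exceed the 3 SD cutoff when the participant has at least 11 trials, so the trimming rule has no effect on very short trial lists.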

Although these findings have proven to be highly reproducible in the laboratory, their robustness in an online environment has not yet been established in preregistered experiments. More importantly, it is unknown whether these findings are robust to the presence of nonnaïve participants. We tested this hypothesis by replicating each study in the most conservative case, in which all participants had encountered the study before.

General method

Detailed method descriptions for each experiment can be found in the Supplementary Materials. Participants were tested in two waves using the Mechanical Turk platform. Approval for data collection was obtained from the Institutional Review Board in the Department of Psychology at Erasmus University Rotterdam. All experiments were programmed in Inquisit. The Inquisit scripts used for collecting the data can be found at https://osf.io/ghv6m/. At the end of wave 1 of each experimental task, participants were asked to provide the following information: age, gender, native language, education. At the end of both waves, we asked the following questions, all of which could be responded to by selecting one of the alternatives “not at all,” “somewhat,” or “very much”: “I’m in a noisy environment”; “There are a lot of distractions here”; “I’m in a busy environment”; “All instructions were clear”; “I found the experiment interesting”; “I followed the instructions closely”; “The experiment was difficult”; “I did my best on the task at hand”; “I was distracted during the experiment.”

In all experiments, different versions of materials and, in some cases, key assignments were created. Different versions ensured counterbalancing of stimulus materials and key assignments. Participants were randomly assigned to one of the versions when they participated in wave 1. Then, upon return 3 or 4 days later for wave 2, half of the participants were assigned to the exact same version of the experiment and the other half were assigned to a different version such that there was zero overlap between the stimuli in the first and second wave. Participants who had participated in one of the experiments were not prohibited from participating in the other experiments.

Sampling plan

For each experiment, we started with recruiting 200 participants: 100 on Monday and 100 on Thursday. Three or four days after the first participation, each participant was invited to participate again. Our goal was to have a final sample size of 80 participants per condition (same items or different items on the second occasion), taking into account nonresponses and the exclusion criteria below. Whenever we ended up with fewer than 80 participants per condition, we recruited another batch. Because we expected a null effect for the crucial interactions, power analyses could not be used to determine our sample sizes, because these analyses require that one predicts an effect and that one has strong arguments for its magnitude. Hence, we decided to obtain more observations than is typically done in previous experiments examining the same effects. By doing so, our parameter estimates are relatively precise.

Exclusion criteria

Data from participants with an accuracy <80% in reaction time (RT) tasks, an accuracy <10% in memory tasks, or a mean RT longer than the group M + 3 SD were excluded. Data from each participant in the RT tasks were trimmed by excluding trials where the trial RT deviated more than 3 SD from the subject M. From the remainder, participants were excluded (starting with those who participated last) to create equal numbers of participants per counterbalancing version.

Participants were recruited via Amazon Mechanical Turk. The subjects participated in two waves, held approximately 3 days apart. In the second wave, half of the subjects participated in an exact copy of the experiment they had participated in before; the other half participated in a version that had an identical instruction and procedure but used different stimuli. A recent study demonstrated that certain findings replicated with the same but not with a different set of (similar) stimuli (Bahník & Vranka, 2017). Our manipulation allowed us to examine whether changing the surface features of an experiment (i.e., the stimuli) affects the reproducibility of its effect in the same sample of subjects. Each experiment had a sample size of 80 per between-subjects condition (same stimuli vs. different stimuli).

General results

Detailed results per experiment are described in the Supplementary Materials. Data for all experiments can be found here: https://osf.io/b27fd/. The results can be summarized as follows. First, the first wave yielded highly significant effects for all nine experiments, with in each case Bayes factors in excess of 10,000 in support of the prediction. Second, each effect was replicated during the second wave. Third, effect size did not vary as a function of wave; Bayes factors showed moderate to very strong support for the null hypothesis. Fourth, it did not matter whether subjects had previously participated in the exact same experiment or one with different stimuli. The main results are summarized in Fig. 1. The x-axis displays the wave-1 effect sizes and the y-axis the wave-2 effect sizes. The blue dots indicate the same-stimuli condition and the red dots the different-stimuli condition. The numbers indicate the specific experiment (e.g., 5 = false memory).

[Figure 1: scatterplot; x-axis: Effect size Wave 1 (0.0 to 3.0); y-axis: Effect size Wave 2 (0.0 to 3.0); legend: same, different; points labeled with experiment numbers 1 to 9]
Fig. 1 Wave 1 effect size versus wave 2 effect size (Cohen’s d). Effect sizes were computed in JASP (JASP Team, 2017). Diagonal line represents equal effect sizes. For each experiment separate effect sizes are plotted for same materials between sessions (blue solid dots) and different materials between sessions (red striped dots). Labels correspond to the different experiments listed in Table 1.

In the preregistration, we stated that “Bayesian analysis will be used to determine whether the effect size difference
between waves 1 and 2 better fits a 0% reduction model or a 25% reduction model.” However, the absence of a reduction in effect sizes from wave 1 to wave 2 (the wave 2 effect sizes were, if anything, larger than the wave 1 effect sizes) rendered the planned analysis meaningless. We therefore did not conduct this analysis.

General discussion

Overall, these results present good news for the field of psychology. In contrast to findings in other parts of the field (Chandler et al., 2015), the effects we studied were reproducible in samples of nonnaïve participants, which are increasingly becoming the staple of psychological research. What the tasks used in this research have in common is that they (1) use within-subjects designs and (2) have opaque goals. Although it is clear that participants may learn something from their previous experience with the experiments (e.g., response times were often faster in wave 2 than in wave 1), this learning did not extend to the nature of the manipulation. We should note that it is not impossible that some of our participants had previously participated in similar experiments. For these participants, wave 1 would actually be wave N+1 and wave 2 would be wave N+2. Nevertheless, it appears that the tasks used in this study are so constraining that they encapsulate behavior from contextual variation and even from recent relevant experiences to yield highly reproducible effects.

We should add a note of caution. What we have examined are the basic effects with each of these paradigms. In the literature, one often finds variations that are designed to examine how the basic effect varies as a function of some other factor, such as manipulations of instructions, stimulus materials (e.g., emotional vs. neutral stimuli), subject population (patients vs. controls), or the addition of a secondary task. The jury is still out on whether such secondary findings are as robust as the more basic findings we have presented here.

Author contributions R.A. Zwaan developed the study concept. All authors contributed to the study design. Testing and data collection were performed by D. Pecher and S. Bouwmeester. D. Pecher and R.A. Zwaan performed the data analysis. R.A. Zwaan, D. Pecher, and G. Paolacci drafted the manuscript, and all other authors provided critical revisions.
All authors approved the final version of the manuscript for submission. We thank Frederick Verbruggen and Hal Pashler for helpful feedback on a previous version of this paper.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Alogna, V. K., Attaya, M. K., Aucoin, P., Bahník, S., Birch, S., Birt, A. R., & Zwaan, R. A. (2014). Registered replication report: Schooler & Engstler-Schooler (1990). Perspectives on Psychological Science, 9, 556–578.

Bahník, S., & Vranka, M. A. (2017). If it’s difficult to pronounce, it might not be risky. The effect of fluency on judgment of risk does not generalize to new stimuli. Psychological Science, 28, 427–436.

Bohannon, J. (2016). Mechanical Turk upends social sciences. Science, 352, 1263–1264.

Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5.

Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46, 112–130.

Chandler, J., Paolacci, G., Peer, E., Mueller, P., & Ratliff, K. A. (2015). Using nonnaive participants can reduce effect sizes. Psychological Science, 26, 1131–1139.

Craft, J. L., & Simon, J. R. (1970). Processing symbolic information from a visual display: Interference from an irrelevant directional cue. Journal of Experimental Psychology, 83, 415–420.

Crump, M. J. C., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PLoS One, 8, e57410.

DeVoe, S. E., & House, J. (2016). Replications with MTurkers who are naïve versus experienced with academic studies: A comment on Connors, Khamitov, Moroz, Campbell, and Henderson (2015). Journal of Experimental Social Psychology, 67, 65–67.

Eerland, A., Sherrill, A. M., Magliano, J. P., Zwaan, R. A., Arnal, J. D., Aucoin, P., & Prenoveau, J. M. (2016). Registered replication report: Hart & Albarracín (2011). Perspectives on Psychological Science, 11, 158–171.

Eriksen, B. A., & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics, 16, 143–149.

Forster, K. I., & Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 680–698.

Greene, R. L. (1989). Spacing effects in memory: Evidence for a two-process account. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 371–377.

Hagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., ... Zwienenberg, M. (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11, 546–573.

JASP Team (2017). JASP (Version 0.8.1.2) [Computer software].

Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70, 151–159.

Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90, 227–234.

Murdock, B. B., Jr. (1962). The serial position effect of free recall. Journal of Experimental Psychology, 64, 482–488.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. doi:10.1126/science.aac4716

Paolacci, G., & Chandler, J. (2014). Inside the Turk: Understanding Mechanical Turk as a participant pool. Current Directions in Psychological Science, 23, 184–188.

Popper, K. R. (1959). The Logic of Scientific Discovery, translation of Logik der Forschung. Oxford: Routledge.

Rand, D. G., Peysakhovich, A., Kraft-Todd, G. T., Newman, G. E., Wurzbacher, O., Nowak, A. W., & Greene, J. D. (2014). Social heuristics shape intuitive cooperation. Nature Communications, 5, 4677.

Roediger, H. L., & McDermott, K. B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–814.

Stewart, N., Ungemach, C., Harris, A. J. L., Bartels, D. M., Newell, B. R., Paolacci, G., & Chandler, J. (2015). The average laboratory samples a population of 7,300 Amazon Mechanical Turk workers. Judgment and Decision Making, 10, 479–491.

Wagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A., Adams, R. B., Jr., & Zwaan, R. A. (2016). Registered Replication Report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11, 917–928.

Zwaan, R. A., & Pecher, D. (2012). Revisiting mental simulation in language comprehension: Six replication attempts. PLoS One, 7, e51382.

Zwaan, R. A., Yaxley, R., & Stanfield, R. (2002). Language comprehenders mentally represent the shape of objects. Psychological Science, 13, 168–171. Experiment 1.
