Bilingual Lexicon Study

Evidence for a Cascade Model of Lexical Access in Speech Production
Ezequiel Morsella and Michele Miozzo

Columbia University
How word production unfolds remains controversial. Serial models posit that phonological encoding
begins only after lexical node selection, whereas cascade models hold that it can occur before selection.
Both models were evaluated by testing whether unselected lexical nodes influence phonological encoding
in the picture-picture interference paradigm. English speakers were shown pairs of superimposed
pictures and were instructed to name one picture and ignore another. Naming was faster when target
pictures were paired with phonologically related (bed-bell) than with unrelated (bed-pin) distractors.
This suggests that the unspoken distractors exerted a phonological influence on production. This finding
is inconsistent with serial models but in line with cascade ones. The facilitation effect was not replicated
in Italian with the same pictures, supporting the view that the effect found in English was caused by the
phonological properties of the stimuli.
Converging evidence from reaction-time experiments (e.g.,
Schriefers, Meyer, & Levelt, 1990), error analyses (e.g., Fromkin,
1971; Garrett, 1980), and brain-lesion studies (e.g., Badecker,
Miozzo, & Zanuttini, 1995; Goodglass, Kaplan, Weintraub, &
Ackerman, 1976; Kay & Ellis, 1987) suggests that there are at least
two levels of representation at play during lexical processing in
word production. At one level of representation, a node pointing to
the word's syntactic features is assumed to exist for each word
known by a speaker. The lexical node cat, for instance, points to
the features of number (singular vs. plural) and grammatical class
(noun), among other syntactic features. Another level of representation
encodes information about a word's phonology: for example,
that the word cat (a) is composed of the phonemes /k/, /æ/, and
/t/; (b) is monosyllabic; (c) has only one vowel; and so forth.
Consistent with this form of lexical organization, lexical retrieval
in word production appears to involve two distinct stages: One
stage is devoted to the selection of a word's lexical node and its
syntactic features, and the other stage is aimed at retrieving word
phonology (Butterworth, 1989; Caramazza, 1997; Dell, 1986; Gar-
rett, 1980; Levelt, 1989; MacKay, 1987; Stemberger, 1985). We
refer to these stages as lexical node selection (see Footnote 1) and phonological
encoding. Furthermore, there is little disagreement about two
additional assumptions concerning lexical retrieval in word pro-
duction. First, the semantic system activates a cohort of related
lexical nodes. If the speaker wants to say cat, for instance, the
lexical nodes for tiger, dog, and whisker receive activation
along with cat. In normal circumstances the target lexical
node (cat, in our example) is selected because it reaches the
highest level of activation. Second, selection proceeds in a fixed
order: The selection of lexical nodes precedes that of word pho-
nology. But the agreement ends here, and relevant issues concern-
ing the architecture of the lexical system and the dynamics of
lexical access remain controversial.
An issue that is a matter of debate among theories of word
production is how activation flows between the two levels of
lexical representation. Is it the case that word phonology is acti-
vated only after a lexical node has been selected, or can activation
from lexical nodes flow onto the phonological level before lexical
node selection has taken place? One major view of speech pro-
duction contends that speech production is strictly a serial process,
with phonological encoding beginning only after a lexical node has
been selected (Butterworth, 1992; Garrett, 1980; Levelt, Roelofs,
& Meyer, 1999; Roelofs, 1992; Schriefers et al., 1990). From this
point of view, only the phonological representations of the selected
lexical node will be activated. Another widely held view proposes
that although phonological forms can only be activated after lex-
ical nodes, the activation at the lexical level can flow onto the
phonological level before lexical selection has taken place (e.g.,
Caramazza, 1997; Dell, 1986; Harley, 1993; Humphreys, Riddoch,
& Quinlan, 1988; MacKay, 1987; Stemberger, 1985). These two
views have been referred to as the serial and cascade hypotheses
of speech production.
A crucial distinction between the two hypotheses is whether
unselected lexical nodes activate phonological representations:
Cascade models posit that unselected lexical nodes do activate
phonology, whereas serial models claim that only the selected
node can activate phonology. Evidence that the phonology of
unselected words is activated would falsify serial models. Cascade
models, on the other hand, are more difficult to test because when
faced with the potentially problematic finding of a lack of such
activation, they could hypothesize that the activation exists but is
too weak to be detected. In the literature there is mounting evidence
consistent with the cascade hypothesis (e.g., Costa, Caramazza,
& Sebastián-Gallés, 2000; Cutting & Ferreira, 1999;
Jescheniak & Schriefers, 1998; Martin, Dell, Saffran, & Schwartz,
1994; Peterson & Savoy, 1998), but there is also some evidence
that is apparently consistent with the serial hypothesis (Levelt et
al., 1991; Schriefers et al., 1990). For reasons outlined below, the
evidence at hand has left the serial-cascade question still a matter
of great contention. In this article we address the controversy by
first examining some of the evidence cited in support of the
cascade hypothesis and then by reporting an experiment we conducted
to test a crucial distinction between the two models:
whether unselected lexical nodes activate phonology.
Footnote 1. In psycholinguistics, this representation has traditionally been referred
to as the lemma (Kempen & Huijbers, 1983; Levelt, 1989), but we avoid
this term because it is associated with specific theoretical views. Instead,
we use the more neutral term lexical node.
Ezequiel Morsella and Michele Miozzo, Department of Psychology,
Columbia University.
The research reported here was done by Ezequiel Morsella in partial
completion of the doctoral program at Columbia University under the
supervision of Michele Miozzo. The work was supported by a Keck
Foundation grant. We gratefully acknowledge the comments of Robert M.
Krauss. We thank the Department of Psychology of the University of
Padua, Padua, Italy, for providing the space and equipment for running the
control study. We also thank three anonymous reviewers for their comments
and suggestions.
Correspondence concerning this article should be addressed to Michele
Miozzo, Department of Psychology, Columbia University, 401 Schermerhorn
Hall, 1190 Amsterdam Avenue, Mail Code 5501, New York, New
York 10027. E-mail: [email protected]
Journal of Experimental Psychology: Learning, Memory, and Cognition, 2002, Vol. 28, No. 3, 555-563
Copyright 2002 by the American Psychological Association, Inc.
0278-7393/02/$5.00 DOI: 10.1037//0278-7393.28.3.555
Historically, one of the first sets of evidence favoring cascade
models came from speech-error analyses. Errors can come in
various guises, and they differ as to how the erroneous word is
related to the intended word. Some errors are semantic (e.g.,
saying tiger instead of dog), and some are phonological (e.g.,
saying dof instead of dog). Because some of the semantic
competitors may also happen to be phonologically related to the
intended word, sometimes a speech error is semantically and
phonologically related to the intended word (e.g., saying rat
instead of cat). These are known as mixed errors. According to
serial accounts, all other things being equal, a mixed error should
be no more likely to occur than a regular semantic error, because
its phonological relation to the intended word is purely incidental.
Contrary to what is predicted by serial models, analyses of spon-
taneous and experimental speech, from normal and brain-lesioned
speakers, show that this is not the case: Mixed errors occur more
frequently than they would if they were selected by chance (Dell
& Reich, 1981; Martin et al., 1994). Cascade theories explain
mixed errors as a product of the phonological activation of un-
selected lexical nodes, which is a natural consequence of the
cascade architecture (Dell, 1986; Stemberger, 1985). Specifically,
these errors result from the phonological activation of lexical
nodes that are semantically related to the target word. For instance,
when intending to say cat, the semantically related lexical nodes
rat and pig also send activation to the phonological level.
Because rat happens to be phonologically related to cat, its pho-
nological features will also receive activation from the lexical node
cat. A lower activation level is reached by the phonological
features of pig, because they are activated only by the lexical node
pig. If the probability of producing an erroneous word is in part
proportional to the activation level reached by the word's phonological
features, one is more likely to say rat instead of cat
than pig instead of cat. Though mixed errors seem to be
incompatible with serial models, by incorporating some additional
assumptions such errors can be explained by these models. For
example, it is possible to consider mixed errors as reflecting the
functioning of a postencoding, speech production editor, whose
task it is to monitor the production system (Levelt et al., 1991).
The editor is purportedly less accurate in detecting mixed errors
because of these errors' semantic and phonological resemblance to
the desired target, which makes them less salient and thus less
detectable as errors (see Footnote 2). On these grounds, Levelt et al. (1999)
dismissed the speech-error data, claiming that it cannot serve as
unequivocal support for a cascade model of speech production;
instead, they suggested that stronger evidence would come from
reaction-time experiments, which more closely elucidate the nor-
mal performance of the speech-production process.
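The cascade account of mixed errors given above can be made concrete with a toy calculation. The following is a minimal sketch, not a model taken from the cited literature: the activation values, the ASCII phoneme codes, and the function names are all illustrative assumptions. It only shows why, if every activated lexical node passes activation to its phonemes, the phonemes of rat (which overlap with cat) end up more active than those of pig.

```python
# Toy cascade sketch: every activated lexical node passes activation to its phonemes.
# All activation values and phoneme codes below are illustrative assumptions.

LEXICAL_ACTIVATION = {"cat": 1.0, "rat": 0.4, "pig": 0.4}  # target plus two semantic neighbors
PHONEMES = {
    "cat": ("k", "ae", "t"),
    "rat": ("r", "ae", "t"),
    "pig": ("p", "ih", "g"),
}


def phoneme_activation(lexical_activation, phonemes):
    """Sum, for each phoneme, the activation sent by every word that contains it."""
    totals = {}
    for word, activation in lexical_activation.items():
        for phoneme in phonemes[word]:
            totals[phoneme] = totals.get(phoneme, 0.0) + activation
    return totals


def phonological_support(word, totals, phonemes):
    """Mean activation of the phonemes that make up a word."""
    segments = phonemes[word]
    return sum(totals[s] for s in segments) / len(segments)


totals = phoneme_activation(LEXICAL_ACTIVATION, PHONEMES)
support = {word: phonological_support(word, totals, PHONEMES) for word in PHONEMES}
print(support)
```

With these assumed values, the ordering of phonological support is cat, then rat, then pig, which mirrors the claim that a mixed error (rat for cat) is more likely than a purely semantic one (pig for cat). The sketch says nothing about the editor-based serial account, which locates the effect in error detection rather than in activation.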
Of the reaction-time experiments bearing on the serial-cascade
debate (see Rapp & Goldrick, 2000, for a review), we focus on the
experiments that attempted to determine whether unselected lexical
nodes activate phonology, the issue directly examined by our
experiment. Despite extensive investigations, this remains an issue
of contention, in part because serial models have been capable of
explaining data that, at first, appeared to be inconsistent with their
fundamental tenets. This was the fate, for instance, of the findings
obtained by Starreveld and La Heij (1995) with the picture-word
interference paradigm. In the picture-word interference paradigm,
participants are instructed to name a picture and ignore a written
word distractor that is presented along with the picture. Word
distractors disrupt picture naming, which takes significantly longer
when distractors are shown. However, interference varies as a
function of the relation between the picture and the distractor word
(for reviews, see Dyer, 1973; MacLeod, 1991). If one considers
the interference obtained with unrelated picture-word pairs (e.g.,
cat-sky) as baseline, interference is larger with semantically related
pairs (e.g., cat-horse) and significantly reduced with phonologically
similar pairs (e.g., cat-mat). Starreveld and La Heij
(1995) were interested in the interference produced by picture-word
pairs that were both semantically and phonologically related
(e.g., cat-rat). According to the serial hypothesis, the semantic
characteristics of a distractor should affect lexical node selection
only, and the phonological features of a distractor should affect
phonological encoding only. This is because, from the serial point
of view, lexical node selection and phonological encoding are
independent processes. Hence, Starreveld and La Heij (1995)
reasoned that the effects from a distractor that is both semantically
and phonologically related to the target should be additive; that is,
no evidence of statistical interaction should be found. Contrary to
these predictions, Starreveld and La Heij (1995) observed a sta-
tistical interaction for semantically and phonologically related
distractors. This interaction was interpreted as problematic for the
serial hypothesis and as support for cascade models, but not
without controversy.
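The additivity argument can be expressed as a difference of differences over the four cells of a 2 x 2 design (semantic relatedness by phonological relatedness). The sketch below is an illustration under stated assumptions: the function name is hypothetical and the latencies are invented for the example; they are not data from Starreveld and La Heij (1995) or any other study.

```python
# Additive-factors check for a 2 x 2 design
# (semantic relatedness x phonological relatedness).
# The latencies used below are invented for illustration only.

def interaction_contrast(unrelated, semantic, phonological, both):
    """Return the difference of differences among four mean naming latencies (ms).

    A value of 0 means the two effects combine additively; any other value
    indicates a statistical interaction (before considering sampling noise).
    """
    phonological_effect_alone = phonological - unrelated
    phonological_effect_with_semantic = both - semantic
    return phonological_effect_with_semantic - phonological_effect_alone


# Hypothetical pattern in which the two effects simply add up.
additive = interaction_contrast(unrelated=700, semantic=740, phonological=680, both=720)

# Hypothetical pattern in which the combined condition departs from additivity.
non_additive = interaction_contrast(unrelated=700, semantic=740, phonological=680, both=690)

print(additive, non_additive)
```

Under the serial reasoning described above, the contrast should be near zero; a reliably nonzero contrast is what was taken as problematic for that hypothesis. In practice the contrast would be evaluated with an inferential test rather than by inspecting means.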
In response to Starreveld and La Heij's (1995) study, Roelofs,
Meyer, and Levelt (1995) pointed out that whether the interaction
found with semantically and phonologically related distractors is at
odds with serial models depends on the interpretation given to the
phonological effects generated by written-word distractors. It is
possible that written-word distractors activate phonology through
mechanisms that bypass the lexical nodes and reach phonology
directly from orthography (for a thorough exposition, see Roelofs
et al., 1995). The point is that the data generated by the picture-word
interference paradigm could be interpreted in ways that are
consistent with serial models. In short, the arguments raised by
Roelofs et al. (1995) reveal that written words are ill-suited as
stimuli to adjudicate between serial and cascade models of lexical
access, an issue to which we return later.
Footnote 2. It is far from clear how the editor would work within the framework of
serial models. According to serial models, the phonology of the intended
word is not available until it is selected. Hence, it is not obvious how the
editor would deem the sound of an erroneous word as incorrect if the
phonology of the intended (albeit unselected) word cannot be retrieved. For
a detailed discussion of the notion of an editor, see Santiago and MacKay
(1999).
Another, more recent set of evidence apparently in favor of
cascade models comes from research on bilingual individuals.
Costa et al. (2000) compared naming performance on two groups
of pictures: cognate pictures, the names of which are phonologically
similar in Catalan and Spanish (e.g., gat-gato [cat]), and
noncognate pictures, the names of which sound different in Catalan
and Spanish (e.g., taula-mesa [table]). When proficient bilingual
individuals named pictures in Spanish, response latencies
were faster for pictures with cognate names than for those with
noncognate names. Costa et al. offered an explanation of this
finding that is in line with the cascade hypothesis: The selected
node (from the language being used, Spanish) and the unselected
node (from the language not being used, Catalan) both send
activation to the phonological level. Because the phonemes corre-
sponding to cognate words receive activation from two sources,
they reach a higher activation level than the phonemes of noncog-
nate words. The confluence of activation from the two sources
makes cognate words easier to say, all other things being equal.
Yet, it is possible to account for the cognate advantage within
the framework of the serial architecture, as admitted by Costa et al.
(2000). The advantage for cognate pictures might reflect a frequency
effect: for a speaker, the combinations of phonemes constituting
cognates tend to be very frequent because they happen to
occur in both languages, as in the case of /ga/, which is found in
the Catalan-Spanish cognates gat-gato [cat]. If naming latencies
are in part a function of the frequency of phoneme combination,
one expects cognate words to be named faster because, on average,
they tend to be composed of frequently occurring phoneme com-
binations. This account, though logically plausible, remains hypo-
thetical, for there is a lack of data showing that the frequency of
phoneme combination affects naming latencies (for a discussion
on these points, see Costa et al., 2000, and Levelt et al., 1999).
The reaction-time data reviewed thus far are naturally accounted
for by cascade models, and serial models can accommodate them
only after making ad hoc assumptions about experimental para-
digms or about certain aspects of lexical access. If these assump-
tions hold, the serial hypothesis is saved. It is not so with the data
obtained by Peterson and Savoy (1998) and Jescheniak and
Schriefers (1998), data that demand a revision of the serial hypothesis.
For reasons of brevity, we illustrate only the finding of
Peterson and Savoy here. They used a complex dual-task para-
digm. In each trial, participants named a picture after a given cue
appeared. On some trials, however, a word appeared instead of the
cue, and in those cases participants were instructed to read the
word aloud as fast as possible. The critical stimuli in this experi-
ment were pictures with near-synonym names, that is, pictures
with more than one acceptable name (e.g., couch and sofa). Participants
were instructed to utter one of the two acceptable
names, typically the one used more frequently (e.g., couch). The
crucial finding was that when presented with the picture couch, for
instance, participants were faster at reading the word soda, which
is phonologically related to the unselected near-synonym of the
picture (i.e., sofa) than the unrelated word fork. Similar findings
were reported by Jescheniak and Schriefers (1998).
Are the findings with near-synonyms reconcilable with the
serial hypothesis? According to Levelt et al. (1999), they are
incompatible with a strong version of the serial hypothesis, a
version which restricts phonological activation to words that are
effectively produced. In an alternative account, Levelt et al. (1999)
proposed what one could call a weak version of the serial hypoth-
esis, according to which there is not only the activation but also the
selection of multiple lexical nodes in the special case of near-
synonyms. Multiple selection occurs because each of the near-
synonyms satisfies the conditions for lexical node selection, and
hence, the lexical system inadvertently licenses the selection of
more than one lexical node. When faced with the picture couch, for
example, both couch and sofa are selected. In these circum-
stances, the phonological features of both near-synonyms are ac-
tivated, leading to the facilitatory effects observed experimentally by
Peterson and Savoy (1998) and by Jescheniak and Schriefers (1998).
If the weak serial hypothesis explains the phenomena of pho-
nological facilitation associated with near-synonyms, then evi-
dence for the multiple selection of other types of words raises
problems for the hypothesis. This is indeed the case for the data
obtained by Cutting and Ferreira (1999) with homophones, words
that sound the same but differ in meaning. The words ball, as in
toy, and ball, as in dance, are an example of homophones. In the
version of the picture-word interference paradigm adopted by
Cutting and Ferreira (1999), pictures appeared along with auditory
distractors. Of interest here are picture-word pairs like (toy) ball-dance,
in which the distractor is semantically related to (dance)
ball, a homophone of the picture (toy) ball. Picture-naming latencies
were faster for picture-word pairs like (toy) ball-dance.
Phonological activation from unselected lexical nodes explains
this facilitatory effect. At the semantic level, the word dance
activates the related concept ball (as in a dance). Activation then
flows from the concept (dance) ball to its corresponding lexical
nodes and then to the phonemes that constitute the word ball. If the
target picture is a (toy) ball, the phonological activation from
(dance) ball makes it easier to say ball. Cascade models seem to
provide a natural explanation of the facilitatory effect reported by
Cutting and Ferreira (1999). In contrast, it is not obvious how
serial models can account for this effect, which may turn out to
create serious, if not insurmountable, problems for them (see Footnote 3).
In summary, there is no shortage of findings claiming to show
that unselected lexical nodes can activate their phonological fea-
tures. These findings have been cited in support of cascade models.
The problem, as we have seen, is that these very findings, perhaps
with few exceptions, are not completely incompatible with an
alternative serial hypothesis that restricts phonological activation
to selected lexical nodes. Obviously, attempting to account for this
evidence comes with a cost for serial models, which are forced to
make additional, severely constraining assumptions. Unfortunately,
without evidence clearly demonstrating that unselected
lexical nodes can activate phonology, deciding between a weak serial
hypothesis and a cascade architecture in speech production remains a
matter of taste. Our experiment represents a further effort to show that
unselected lexical nodes indeed activate phonology and, as such,
attempts to contribute to a resolution of this ongoing debate
between serial and cascade models of lexical processing in speech
production. Below, we describe the paradigm used in our experiment
and how it addresses the issues under investigation.
Footnote 3. Levelt et al. (1999) attempted to provide an explanation for the findings
of Cutting and Ferreira (1999) within the framework of their serial
model. Levelt et al. (1999) proposed that the distractor dance will
semantically and phonologically coactivate its associate ball in the perceptual
network (p. 17). It was further assumed that the perceptual
network will directly activate the corresponding phonological features in
the production lexicon. However, it remains mysterious to us how the word
dance might activate the semantically related word ball at the perceptual
level, in which semantic information is not specified.
Picture-Picture Interference Paradigm
In this paradigm, participants were instructed to name only one
of two colored pictures that were presented simultaneously, with
one superimposed upon the other (see the examples in Figure 1).
The paradigm can be considered a picture-picture variant of the
Stroop task (Stroop, 1935), and participants are required to name
pictures of a given color (green) and ignore pictures of another
color (red). Slightly different versions of the picture-picture paradigm
have been used in past studies. Glaser and Glaser (1989)
showed picture pairs presented sequentially, and participants were
instructed to name the picture that appeared first or second. With
this paradigm, slower responses were found with semantically
related pairs (e.g., cat-dog), whereas facilitation occurred with
identical pairs (e.g., cat-cat). Tipper (1985) used overlapping
composites like ours in a task in which a distractor picture became
the target picture of the following trial. Tipper (1985) used seman-
tically related composites to examine the extent to which the
processing of the distractor picture was inhibited. In short, picture-picture
stimuli had never been previously used in the manner
described below for the purpose at hand.
Our experiment examined the effect of pairs formed by pictures
whose names are phonologically related, as in BED-bell and PIG-pin.
Henceforth, we refer to these pairs as phonologically related
composites, and the target object is presented in capital letters. The
reaction times for such cases were compared with those of phonologically
unrelated composites (e.g., BED-pin and PIG-bell).
We were operating under the assumption that because speakers did
not name the distractor picture, the lexical node corresponding to
the name of the distractor picture was not selected. This is the same
assumption underlying the picture-word version of the interference
paradigm (see Levelt et al., 1991; Schriefers et al., 1990).
Because serial models of word production posit that unselected
lexical nodes should not influence phonological processing, phonologically
related distractors should not produce any effect whatsoever,
neither facilitation nor inhibition. Conversely, any effect
found would be incompatible with serial models. Such an effect,
however, would be in line with cascade models of word
production.
Although an effect in any direction would bear on the serial-cascade
issue, we predicted that phonologically related distractors
would facilitate naming, because this is what is found in the
picture-word interference analogue of this experiment (Klein,
1964; Lupker, 1982; Rayner & Posnansky, 1978; Underwood &
Briggs, 1984). It should be noted that in using the picture-picture
paradigm we were avoiding the problems inherent in the interpretation
of the phonological effects observed in the picture-word
paradigm, problems that we briefly discussed above. Indeed, the
only possibility for picture distractors to activate their phonology
seems to be through the prior activation of their concepts and
lexical nodes.
To make sure that, in truth, any effect found would be related to
phonology and not to some other property of the composites (e.g.,
the identifiability of the drawings), a control study was carried out
in Italian, a language in which the composites selected for English
do not possess a phonological relationship. If the effect found in
English is truly due to phonology, then it should not be found in
Italian. Presumably, Italian speakers would be as sensitive as
English speakers to any visual or semantic properties of the com-
posites. If in Italian, too, responses are faster for related compos-
ites, then any effect found in English is probably not due to the
phonological relationship between the targets and their distractors.
On the other hand, if in Italian we do not replicate the difference
observed in English, then phonology is the only feasible variable
explaining an effect in English. In sum, the replication in Italian
provides an adequate control for possible artifacts stemming from
material selection and preparation.
Experiment
Picture Naming in English and Italian
The experiment had two parts: In the first part, English native
speakers named a set of picture-picture composites, and in the
second part, Italian native speakers named the same set of com-
posites. In some trials, the two pictures had phonologically related
names in English (as in BED-bell). In another condition, the
pictures were also paired with distractors that were phonologically
unrelated in English. Phonologically unrelated composites consisted
of the same pictures forming the related condition but
re-paired in such a way that there was no phonological relation
between the target and distractor in English. By using the same
items for both lists, we controlled for any effects that could arise
from having lists composed of different materials. In Italian, the
pictures forming the composites have phonologically unrelated
names. Two procedures were adopted to maximize the probability
of detecting a phonological effect in English. First, the
pictures of the phonologically related composites have English
names sharing the largest number of phonemes without being
homophones (as in PIG-pin or BED-bell; see Footnote 4). Second, we selected
only pictures that English speakers named consistently. In a preparatory
study, 10 native English speakers named a set of pictures
at a leisurely rate (none of these speakers took part in the experiment
proper). Only pictures that received the same name from all 10
speakers were retained for the experiment.
Figure 1. Sample stimulus of a phonologically related composite, BED-bell
(with the lighter drawing [BED] being the green target and the darker
drawing [BELL] being the red distractor), and an unrelated composite
(BED-hat).
We had a large number of unrelated filler composites (74%), so
that only in a small percentage of the trials (13%) were the names
of the target and distractor pictures phonologically related. This
measure was aimed at discouraging participants from building
strong expectations about the nature of the test materials. Finally,
it is important to note that distractor pictures were never named
during the experiment. This rules out that the phonological effect
comes from having recently named the distractor picture or from
having its name as one of the possible responses.
Method
Participants. Thirty-nine native English-speaking students from Co-
lumbia University were paid for their participation. Another group of 32
native Italian speakers and students at the University of Padua, Padua,
Italy, volunteered their participation in the control study. None of the
Italian-speaking participants reported being a native Italian-English bilingual
or having participated in any English courses besides the one-semester
course that is mandatory at the university.
Materials. A total of 152 composites of superimposed red and green
line drawings served as the stimuli. Composites were created with the
program Aldus Superpaint (1993). The target was green, and the distractor
was red. Composites occupied the center of the screen, covering
roughly 3.5 in. (8.89 cm) in diameter. Of the 152 composites, 19 (13%)
depicted objects that were phonologically related in English (as in BED-bell
and PIG-pin). The pictures of these 19 composites were then randomly
paired again with one another to form the unrelated condition, the items of
which were not phonologically related in English (e.g., BED-pin and
PIG-bell; in four cases, different pictures served as phonologically related
and control distractors). The phonologically related composites and their
controls formed the experimental set (see the Appendix for a complete list).
Pictures of the phonologically related composites have monosyllabic
names sharing the consonant onset and, in most cases, the vowel. Care was
taken to make sure that neither the phonologically related nor the unrelated
items had a semantic relationship. The 19 pictures were also paired with a
new set of 38 distractors, and these composites served as one third of the
fillers. We included these 19 target pictures with filler composites so that
participants would have a reasonable number of targets to name in the
practice phase (see below) and during the experimental session. Additional
fillers were obtained by pairing each picture of a new target set (n = 19)
with 76 distractor pictures. Filler composites were not phonologically
related in any apparent way. A filler and an experimental composite were
in all respects undistinguishable and named an equal number of times
during the experiment (four times). To summarize, of the 152 composites
that formed the experimental session, 19 (13%) were phonologically re-
lated, 19 served as their control, and 114 were filler items.
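The counts in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the Materials text; the variable names are illustrative. It confirms that the parts sum to 152 composites, that each of the 38 targets is named four times, and that four blocks hold 38 composites each. The related share works out to 12.5% (reported as 13%), and the filler share works out to 75%, one point above the 74% quoted earlier; the source of that small difference is not stated in the text.

```python
# Stimulus-set bookkeeping, using only counts stated in the Materials section.
RELATED = 19                        # phonologically related composites
CONTROLS = 19                       # the same pictures re-paired as unrelated controls
FILLERS_EXPERIMENTAL_TARGETS = 38   # the 19 experimental targets with new distractors
FILLERS_NEW_TARGETS = 76            # 19 new targets paired with 76 distractors
N_TARGETS = 19 + 19                 # experimental targets plus new filler targets
N_BLOCKS = 4

fillers = FILLERS_EXPERIMENTAL_TARGETS + FILLERS_NEW_TARGETS
total = RELATED + CONTROLS + fillers

summary = {
    "total_composites": total,
    "fillers": fillers,
    "related_share": RELATED / total,
    "filler_share": fillers / total,
    "first_filler_set_share": FILLERS_EXPERIMENTAL_TARGETS / fillers,
    "namings_per_target": total / N_TARGETS,
    "composites_per_block": total / N_BLOCKS,
}

for name, value in summary.items():
    print(f"{name}: {value}")
```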
The same materials and procedures used with English speakers were
used in the second part of the experiment carried out in Italian. The Italian
names of the composites are given in the Appendix. Related and unrelated
composites do not, in their translation, bear a phonological relationship in
Italian. The only exception is the unrelated pair CANE-campana [DOG-bell],
in which the words share the initial consonant and vowel.
Each participant was run through four experimental blocks, each with 38
composites. Within each block, the presentation of the composites was in
a pseudorandom order, according to the following criteria: (a) The first and
last three composites of each block consisted of filler items; (b) each
experimental target was never named more than once per block, and
experimental targets never appeared contiguously within the blocks; and
(c) a composite was never preceded by a picture-word pair that contained
items that were semantically or phonologically related to it. Two lists were
prepared that differed in terms of the order in which items were shown
within the blocks. Half of the participants were presented with one list; the
other half were presented with the other list. The order of presentation of
the four blocks was randomly determined for each participant.
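The ordering criteria lend themselves to a mechanical check. The following is a minimal sketch of a validator for criteria (a) and (b) only; the `Trial` structure, the `block_is_valid` function, and the example items are hypothetical and are not taken from the authors' materials. Criterion (c) is left out because it requires relatedness judgments that the text does not specify, and criterion (b) is read here as applying to trials from the experimental set, which is an assumption.

```python
from typing import NamedTuple, Sequence


class Trial(NamedTuple):
    """One composite; a hypothetical structure used only for this sketch."""
    target: str
    distractor: str
    experimental: bool  # True for related/control composites, False for fillers


def block_is_valid(block: Sequence[Trial], edge: int = 3) -> bool:
    """Check ordering criteria (a) and (b) for one block; (c) is not modeled."""
    if len(block) < 2 * edge:
        return False
    # (a) The first and last `edge` composites must be fillers.
    edges = list(block[:edge]) + list(block[-edge:])
    if any(trial.experimental for trial in edges):
        return False
    # (b) No experimental target is named more than once in the block.
    experimental_targets = [t.target for t in block if t.experimental]
    if len(experimental_targets) != len(set(experimental_targets)):
        return False
    # (b) Experimental trials never appear back to back.
    return not any(a.experimental and b.experimental for a, b in zip(block, block[1:]))


filler = Trial("HAT", "cup", False)        # hypothetical filler composite
related = Trial("BED", "bell", True)
control = Trial("PIG", "bell", True)

good_block = [filler, filler, filler, related, filler, control, filler, filler, filler]
adjacent_block = [filler, filler, filler, related, control, filler, filler, filler]

print(block_is_valid(good_block), block_is_valid(adjacent_block))
```

A list generator could draw random permutations and keep the first one that passes such a check; whether the original lists were built that way is not stated.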
Procedure. Participants were run individually. The session began with
a familiarization-training phase, in which participants named all of the 38
target pictures (of the experimental and filler pairs) twice at a leisurely rate.
The pictures were presented as black-on-white drawings. The experimenter
corrected participants whenever they referred to a picture by a name other
than what we had selected for our list (e.g., saying lips instead of
mouth). Next, participants practiced the picturepicture naming task.
Participants were instructed to name the green picture of the composites,
and not the red picture, as fast and as accurately as possible. In the practice
block, each picture appeared twice with distractor pictures that were not
re-presented during the experimental session. We opted for a rather long
training session for two reasons. One reason, put forward in other studies
(e.g., Caramazza & Costa, 2001; Starreveld & La Heij, 1995), was to
ensure that participants would name the pictures without hesitation. The
other reason was to allow participants to master the task of perceptually
segregating two overlapping pictures.
In the practice phase and in the actual experiment, the picture–picture
trial went as follows. A ready prompt (a question mark) appeared, and as
soon as participants pressed the space bar, a fixation point was shown
at the center of the screen for 700 ms. The fixation point was replaced by
the composite, which remained on the screen until participants responded
or for up to 3,000 ms. Stimulus presentation was controlled by the PsyScope experiment software (Cohen, MacWhinney, Flatt, & Provost, 1993)
on an iMac Macintosh computer. Naming responses were measured with a
microphone (Model 33-3014; Radio Shack; Fort Worth, TX) and a PsyScope button box (Model 2.0.2; New Micros; Dallas, TX). Participant
responses were manually coded by the experimenter. Responses in which
participants hesitated or used names that were different from those expected, as well as responses accompanied by a microphone malfunction,
were classified as errors and therefore excluded from analyses. Exceedingly long responses (more than 2 s) were also excluded. A trimming
procedure removed outliers (responses more than three standard deviations
above the mean of a participant's experimental items) from the remaining
data.
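The two exclusion steps applied to correct responses can be sketched as follows. This is a minimal illustration under stated assumptions: the function name trim_latencies is hypothetical, latencies are assumed to be in milliseconds, the standard deviation is assumed to be the sample standard deviation computed once over one participant's experimental items, and the trimming is assumed to be a single pass rather than iterative.

```python
from statistics import mean, stdev

def trim_latencies(latencies_ms, ceiling_ms=2000, n_sd=3):
    """Drop responses longer than the ceiling, then drop high outliers."""
    # Step 1: exclude exceedingly long responses (more than 2 s).
    kept = [rt for rt in latencies_ms if rt <= ceiling_ms]
    if len(kept) < 2:
        return kept  # a standard deviation cannot be computed from fewer than two values
    # Step 2: exclude responses more than n_sd standard deviations above the mean.
    cutoff = mean(kept) + n_sd * stdev(kept)
    return [rt for rt in kept if rt <= cutoff]
```

Only the upper tail is trimmed, which matches the description of outliers as responses above the participant's mean.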
⁴ In a pilot study with items that shared considerably less phonological
overlap (e.g., just the onset, as in SKIRT–skull), we found negligible
effects. This finding led us to use pairs with larger phonological overlap.
Results
English. Within the experimental conditions (phonologically
related and their controls), erroneous responses were relatively
infrequent, accounting for less than 2% of the data points (see
Table 1). Responses longer than 2 s and outliers were observed
in 2.2% of the responses to the experimental items.
As shown in Table 1, English speakers were faster at naming
pictures paired with phonologically related than with unrelated
distractors. This conclusion was confirmed by paired t tests that
analyzed participants' means, t(38) = 3.89, p < .0004, and items'
means, t(18) = 2.11, p < .05. Error rates did not differ reliably
between the related and unrelated conditions in a by-subject analysis, t(38) = 1.78, p = .08, or in a by-item analysis, t(18) = 1.83,
p = .08, although the statistics approach significance.⁵
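For readers who want to see how a paired t statistic of this kind is obtained, the following is a minimal sketch using only the Python standard library. The function name paired_t and the sign convention (unrelated minus related, so that facilitation yields a positive t) are assumptions of this sketch, and any latencies passed to it are hypothetical, since the individual participant and item means are not reported here.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(related, unrelated):
    """Paired t statistic for unrelated minus related latencies; returns (t, df)."""
    if len(related) != len(unrelated):
        raise ValueError("paired samples must have equal length")
    # Each participant (or item) contributes one difference score.
    diffs = [u - r for r, u in zip(related, unrelated)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1
```

The degrees of freedom equal the number of pairs minus one, so t(38) corresponds to 39 participants and t(18) to the 19 experimental items.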
Italian. Data were analyzed following the same procedure as
in the English condition. The responses of 1 participant were
excluded because they were excessively slow. Error rates were less
than 1% for both conditions, and the difference in error rates
between the two conditions was not statistically reliable (ts < 1).
Outliers and responses longer than 2 s were observed in 3.1% of
the data. Regarding response latencies, no difference was found in
Italian between the composites that constituted the phonologically
related and unrelated conditions in English (ts < 1).
Discussion
In English, we found a sizable phonological effect of 22 ms that
was characterized, as we expected, by faster responses for phono-
logically related than unrelated composites. We have independent
evidence that the distractor pictures are spontaneously named with
the nouns that we selected. This renders more plausible the claim
that the facilitatory effect reflects the phonological overlap
between the composite names. Moreover, by including the same
pictures in both the related and unrelated conditions, we ruled out
the possibility that the phonological effect arose because we selected distinct lists
for the two conditions.
The effect obtained in English was not replicated in Italian, and
these contrasting results suggest two conclusions. On the one hand,
it is unlikely that the effect found in English is attributable to
variations in the visual characteristics of related and unrelated
composites. If this were the case, then identical results should have
been obtained in English and Italian. On the other hand, it can be
concluded convincingly that the phonological relation between the
target and distractors was responsible for the facilitatory effect
manifest in English. In fact, word phonology is the only factor
varying between the English and Italian conditions, and hence, it is
the only possible cause for the discrepant findings. The implica-
tions of this experiment for theories of lexical access in speech
production are examined in the General Discussion.
General Discussion
In the picture–picture version of the Stroop task, we found that
participants were faster at naming pictures with distractors that are
phonologically related. We also showed that there is good reason
to believe that this effect is phonological and not an artifact by (a)
using the same pictures in the phonologically related and unrelated
condition, (b) making sure that distractor pictures were named
accordingly in English, and (c) running the experiment with non-
English speakers. No effect was found in Italian, in which the
picture names of the experimental composites do not bear any
phonological similarity. This result is best accounted for by cas-
cade models of speech production, which posit that unselected
lexical nodes can influence phonology. In contrast, the effect
seems to be problematic for serial models. We chose the picture–picture paradigm because it is not open to the type of
criticisms that were aimed at the speech error data and the picture–word interference data. However, it should be noted that our data
are limited in at least one sense: The effect was found with a
relatively small number of items. This small number was due to the
difficulty of finding picture pairs with names that share a high
degree of phonological overlap and the pictures of which have
high naming consistency. It is likely that we have
exhausted the pool of such picture pairs in English, making a
replication in English with different items impractical. A test of the
reliability of the phonological effect should then come from replications in other languages.
Our finding is predicted by the cascade hypothesis of the lexical
system, and within its framework, the phonological activation
from unselected lexical items could be instantiated in many ways.
It could be that activation cascades from the activated lexical
nodes onto the phonological level, a type of model assuming a
strictly feed-forward propagation of activation (Humphreys et al.,
1988). In our experiment, participants may have been faster with
related items because the phonological features of these items have
been activated by both the target and the distractor. The following
example illustrates this point. Suppose that the target picture BED
appears along with the distractor bell. By assumption, the distrac-
tor, bell, activates the phonological representations /b/, /e/, and /l/.
At the moment of selecting the phonology of the target, BED, the
onset and vowel of the target word have already received activa-
tion from the distractor lexical node. The activation sent by the
distractor lexical node primes the phonemes of the target, facili-
tating their selection and ultimately the production of the target
word. In another scenario, activation not only proceeds from the
lexical nodes to the phonological features but it also bounces back
from the phonological features to the lexical nodes (Dell, 1986;
Stemberger, 1985). Feedback models can account for the facilita-
tory effect observed in our experiment by positing that phonolog-
ically related distractors not only facilitate the selection of the
word phonemes but also the selection of lexical nodes. This second
effect emerges as follows. Phonologically related distractors activate some of the target features, which in turn send activation back
to their corresponding lexical nodes, including the lexical node of
the target word. The extra activation reaching the target lexical
node from the lower phonological level facilitates the selection of
the lexical node, and in part explains why phonologically related
distractors speed up picture naming. In sum, although both classes
of cascading models can account for the phonological effect obtained in the picture–picture interference paradigm, their accounts
presuppose slightly different mechanisms underlying the phonological effects observed in the task.
⁵ This trend may lead one to suspect that there is the possibility of a
speed–accuracy tradeoff. It should be noted, however, that in the vast
majority of the participants (n = 30) there is no difference in the error rates
between the related and unrelated conditions (27 made no errors at all and 3
made one error in each condition). In any event, the difference in the total
number of errors in both conditions is minuscule (6 errors).
Table 1
Picture Naming Latencies (M and SEM) and Percentage Errors
Observed in English and Italian
            Related pairs                     Unrelated pairs
            Response lat. (ms)                Response lat. (ms)
Language    M      SEM    % errors            M      SEM    % errors
English     672    11     1.50                694    12     0.70
Italian     700    12     0.18                707    12     0.54
Note. lat. = latencies.
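The feed-forward cascade account discussed in the General Discussion can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the simplified phoneme inventories, the activation values (1.0 for the target node, 0.3 for the unselected distractor node), and the function names. It is not the authors' model or any published implementation, and it does not include the feedback from phonemes to lexical nodes that interactive models add.

```python
# Toy phoneme inventories (simplified, hypothetical transcriptions).
PHONEMES = {
    "bed": ["b", "e", "d"],
    "bell": ["b", "e", "l"],
    "pin": ["p", "i", "n"],
}

def phoneme_activation(target, distractor, target_act=1.0,
                       distractor_act=0.3, cascade=True):
    """Sum the activation each phoneme receives from the lexical nodes."""
    sources = [(target, target_act)]
    if cascade:
        # Cascade assumption: the unselected distractor node also feeds phonology.
        # With cascade=False (serial assumption) it sends nothing.
        sources.append((distractor, distractor_act))
    activation = {}
    for word, amount in sources:
        for phoneme in PHONEMES[word]:
            activation[phoneme] = activation.get(phoneme, 0.0) + amount
    return activation

def target_support(target, distractor, **kwargs):
    """Total activation available to the target's own phonemes."""
    activation = phoneme_activation(target, distractor, **kwargs)
    return sum(activation[phoneme] for phoneme in PHONEMES[target])
```

With these assumed values, target_support("bed", "bell") exceeds target_support("bed", "pin") because the shared onset and vowel receive activation from both nodes, whereas with cascade=False the two composites yield the same support. This mirrors the contrast between the cascade and serial predictions without making any claim about actual activation magnitudes.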
A serial explanation for our findings may appeal to a variant of
the multiple-selection hypothesis discussed in the introduction. It
could be proposed that the nature of our Stroop interference task is
so demanding and unnatural that both the target and distractor
lexical nodes are selected for production. Thus, in the case of the
composite BED–bell, the lexical nodes of both words would have
been selected. It could be added that speakers utter only one word
because of the functions of a postencoding editor. This multiple-selection account is similar to that proposed by Levelt et al. (1999)
to explain the data with near-synonyms (e.g., saying "couch"
versus "sofa"; Jescheniak & Schriefers, 1998; Peterson & Savoy,
1998), wherein both lexical nodes are selected but somehow only
one is produced. The first problem with this account of our
findings is that considering the decision processes involved, it is
not clear how the resolution of the multiple selection would lead to
faster instead of slower response times for phonologically related
composites such as BED–bell. If the postencoding editor cannot
resolve mixed errors (Levelt, 1989; Levelt et al., 1999), then it is
reasonable to expect that it would have more difficulties detecting
and correcting similar items than dissimilar ones. A system based
on functional principles of this sort seems at odds with our find-
ings. The second problem is empirical and relates to the contrast-
ing results observed between tasks requiring the production of
single versus multiple phonologically related words. Stroop tasks
are a prototypical example of the first class of tasks, and facilitation is the finding typically associated with phonologically related
distractors (Lupker, 1982; Rayner & Posnansky, 1978; Underwood
& Briggs, 1984). An example of the second class of tasks is the
task devised by Sevald and Dell (1994), in which speakers are
required to utter the largest possible number of four-word sequences within 8 s. Having to produce strings composed of onset-related words (pick–pin) hurts performance, which is less efficient
compared with strings formed by unrelated words. If in the
picture–picture paradigm there is multiple selection, one would
expect that, in accord with other multiple-selection tasks, phonologically related distractors would lead to interference rather than
facilitation. Finally, it would seem unlikely that proponents of the
serial hypothesis suppose multiple selection in the picture–picture
interference paradigm. Multiple selection was not proposed for the
picture–word variant of the task. On the contrary, data from the
picture–word interference paradigm were cited in support of the
serial hypothesis (Levelt et al., 1999).
Our results contribute to the increasing series of findings in
favor of a cascade model of lexical selection. As we have seen,
proponents of the serial hypothesis have been able to accommo-
date some of the findings that initially appeared to be irreconcil-
able with their hypothesis. It is not immediately obvious how the
serial account could accommodate our data. A major challenge for
proponents of the serial account would be to provide evidence that
not only supports a serial account but that would also be incom-
patible with cascade models.
In conclusion, we have found that words unselected for produc-
tion, that is, words that will not be spoken, can nonetheless activate
phonology. Although it is difficult to generalize from our experi-
mental setting to everyday contexts, it would be interesting to
discover whether phonology is always activated for all the things
that happen to fall on the perceptual system, or whether this
unintentional activation of phonology occurs only in the context of
speech tasks. Beyond its implications for models of speech production, such a finding would bear on theories about the manner in
which output programs are activated and selected in all actions, not
just linguistic ones.
References
Aldus Superpaint [Computer software]. (1993). Aldus: San Diego, CA.
Badecker, W., Miozzo, M., & Zanuttini, R. (1995). The two-stage model of lexical retrieval: Evidence from a case of anomia with selective preservation of grammatical gender. Cognition, 57, 193–216.
Butterworth, B. (1989). Lexical access in speech production. In W. Marslen-Wilson (Ed.), Lexical representation and process (pp. 108–135). Cambridge, MA: MIT Press.
Butterworth, B. (1992). Disorders of phonological encoding. Cognition, 42, 261–286.
Caramazza, A. (1997). How many levels of processing are there in lexical access? Cognitive Neuropsychology, 14, 177–208.
Caramazza, A., & Costa, A. (2001). Set size and repetition in the picture–word interference paradigm: Implications for models of naming. Cognition, 80, 215–222.
Cohen, J. D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavior Research Methods, Instruments, & Computers, 25, 257–271.
Costa, A., Caramazza, A., & Sebastian-Galles, N. (2000). The cognate facilitation effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1283–1296.
Cutting, J. C., & Ferreira, V. S. (1999). Semantic and phonological information flow in the production lexicon. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 318–344.
Dell, G. S. (1986). A spreading activation theory of retrieval in sentence production. Psychological Review, 93, 283–321.
Dell, G. S., & Reich, P. A. (1981). Stages in sentence production: An analysis of speech error data. Journal of Verbal Learning and Verbal Behavior, 20, 611–629.
Dyer, F. N. (1973). The Stroop phenomenon and its use in the study of perceptual, cognitive, and response processes. Memory & Cognition, 1, 106–120.
Fromkin, V. A. (1971). The non-anomalous nature of anomalous utterances. Language, 47, 27–52.
Garrett, M. F. (1980). Levels of processing in sentence production. In B. Butterworth (Ed.), Language Production. Vol. 1: Speech and Talk (pp. 177–220). London: Academic Press.
Glaser, W. R., & Glaser, M. O. (1989). Context effects in Stroop-like word and picture processing. Journal of Experimental Psychology: General, 118, 13–42.
Goodglass, H., Kaplan, E., Weintraub, S., & Ackerman, N. (1976). The tip-of-the-tongue phenomenon in aphasia. Cortex, 12, 145–153.
Harley, T. A. (1993). Phonological activation of semantic competitors during lexical access in speech production. Language and Cognitive Processes, 8, 291–309.
Humphreys, G. W., Riddoch, M. J., & Quinlan, P. T. (1988). Cascade processes in picture identification. Cognitive Neuropsychology, 5, 67–104.
Jescheniak, J., & Schriefers, H. (1998). Discrete serial versus cascaded processing in lexical access in speech production: Further evidence from the coactivation of near-synonyms. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1256–1274.
Kay, J., & Ellis, A. (1987). A cognitive neuropsychological case study of anomia. Brain, 110, 613–629.
Kempen, G., & Huijbers, P. (1983). The lexicalization process in sentence production and naming: Indirect election of words. Cognition, 14, 185–209.
Klein, G. S. (1964). Semantic power measured through the interference of words with color-naming. American Journal of Psychology, 77, 576–588.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1–38.
Levelt, W. J. M., Schriefers, H., Vorberg, D., Meyer, A. S., Pechmann, T., & Havinga, J. (1991). The time course of lexical access in speech production: A study of picture naming. Psychological Review, 98, 122–142.
Lupker, S. J. (1982). The role of phonetic and orthographic similarity in picture–word interference. Canadian Journal of Psychology, 36, 349–367.
MacKay, D. G. (1987). The organization of perception and action: A theory for language and other cognitive skills. New York: Springer-Verlag.
MacLeod, C. M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163–203.
Martin, N., Dell, G. S., Saffran, E. M., & Schwartz, M. F. (1994). Origins of paraphasias in deep dysphasia: Testing the consequences of a decay impairment to an interactive spreading activation model of lexical retrieval. Brain and Language, 47, 609–660.
Peterson, R. R., & Savoy, P. (1998). Lexical selection and phonological encoding during language production: Evidence for cascaded processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 539–557.
Rapp, B., & Goldrick, M. (2000). Discreteness and interactivity in spoken word production. Psychological Review, 107, 460–499.
Rayner, K., & Posnansky, C. J. (1978). Stages of processing in word identification. Journal of Experimental Psychology: General, 107, 64–80.
Roelofs, A. (1992). A spreading-activation theory of lemma retrieval in speaking. Cognition, 42, 107–142.
Roelofs, A., Meyer, A. S., & Levelt, W. J. M. (1995). Interaction between semantic and orthographic factors in conceptually driven naming: Comment on Starreveld and La Heij. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 246–251.
Santiago, J., & MacKay, D. G. (1999). Constraining production theories: Principled motivation, consistency, homunculi, underspecification, failed predictions, and contrary data. Behavioral and Brain Sciences, 22, 55–56.
Schriefers, H., Meyer, A. S., & Levelt, W. J. M. (1990). Exploring the time-course of lexical access in production: Picture–word interference studies. Journal of Memory and Language, 29, 86–102.
Sevald, C. A., & Dell, G. S. (1994). The sequential cuing effect in speech production. Cognition, 53, 91–127.
Starreveld, P. A., & La Heij, W. (1995). Semantic interference, orthographic facilitation, and their interaction in naming tasks. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 686–698.
Stemberger, J. P. (1985). Bound morpheme errors in normal and agrammatic speech: One mechanism or two? Brain and Language, 25, 246–256.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–662.
Tipper, S. P. (1985). The negative priming effect: Inhibitory priming by ignored objects. The Quarterly Journal of Experimental Psychology, 37A, 571–590.
Underwood, G., & Briggs, P. (1984). The development of word recognition processes. British Journal of Psychology, 75, 243–255.
Received February 26, 2001
Revision received September 14, 2001
Accepted September 14, 2001
Appendix
Picture Stimulus Lists
English names                  Italian names
Target   Distractors           Target       Distractors
         Related   Unrelated                Related       Unrelated
BAG      bat       cage        BORSA        pipistrello   gabbia
BED      bell      hat         LETTO        campana       cappello
BOAT     bone      mouse       BARCA        osso          topo
BOW      bowl      cork        FIOCCO       scodella      tappo
CAN      cat       pin         BARATTOLO    gatto         spilla
CANE     cage      heart       BASTONE      gabbia        cuore
CHAIR    chain     bat         SEDIA        catena        pipistrello
CLOUD    clown     bone        NUVOLA       pagliaccio    osso
CORN     cork      fire        PANNOCCHIA   tappo         fuoco
COW      couch     bowl        MUCCA        divano        scodella
DOG      doll      bell        CANE         bambola       campana
FILE     fire      rain        LIMA         fuoco         pioggia
HAND     hat       clown       MANO         cappello      pagliaccio
HARP     heart     crib        ARPA         cuore         culla
MOUTH    mouse     nest        BOCCA        topo          nido
NET      nest      doll        RETE         nido          bambola
PIG      pin       towel       MAIALE       spilla        asciugamano
RAKE     rain      cat         RASTRELLO    pioggia       gatto
WHIP     witch     harp        FRUSTA       strega        arpa