PRINTED FROM OXFORD HANDBOOKS ONLINE ( © Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

date: 26 April 2019

The Phonetics and Phonology of Bilingualism

Abstract and Keywords

This chapter provides a selective overview of recent research on the phonetics and phonology of bilingualism. The central idea put forth in the chapter is that, in bilingualism and second-language learning, cross-language categories are involved in complex interactions that can take many forms, including assimilations and dissimilations. The sound categories of the two languages of a bilingual seem to coexist in a common representational network and appear to be activated simultaneously in the processing of speech in real time, but some degree of specificity is attested. The chapter then goes on to explore some of the characteristics of cross-language sound interactions, including the fact that these interactions are pliable and appear to be mediated by the structure of the lexicon.

Keywords: phonetics, phonology, bilingualism, cross-language, phonemes, contrasts, acoustics, lexicon, speech perception, spoken-word recognition


People who learn a second language in adulthood often struggle to acquire new sound categories (i.e., sounds that exist only in the target language), new sound contrasts (i.e., phonological distinctions present only in the target language), and/or new detailed articulatory and perceptual habits. These difficulties are, for the most part, predictable from the structure of the first or native language of the learners (Lado, 1957). Even proficient bilinguals often retain a “foreign” accent after many years of experience with their non-native language (e.g., Oyama, 1976; Flege, Yeni-Komshian, and Liu, 1999). A reasonable conclusion is that the first language of bilinguals interacts with the second one. This view, which has by now been corroborated by manifold empirical studies, was articulated as early as 1939 by N. Trubetzkoy, who claimed that bilinguals’ difficulties with the sounds of their non-native language are due to their “incorrect evaluations” of the non-native sounds, which in turn are caused by “the differences between the phonological structure of the foreign language and the mother tongue of the speaker” (Trubetzkoy [1939] 1969: 55). Interlingual interactions are operationalized in all modern theories of second-language phonetics and phonology, such as the Speech Learning Model (Flege, 1995, 2007), the extension of the Perceptual Assimilation Model to include second-language learning (Best and Tyler, 2007), and the Second Language Linguistic Perception model (Escudero, 2005). For Trubetzkoy, native phonology serves as a “sieve” with which listeners evaluate perceived sounds. Second-language learners and bilinguals seem to use the “sieve” of their native language to process the sounds of their second language. 
It may thus be said that the first language interferes with the second one.1 Interlingual interactions are the focus of the present chapter, and these have been detected in early proficient bilinguals as well as in adult foreign-language learners, in children as well as in adults, in speech production data as well as in speech perception data.

The majority of empirical studies on bilingualism support a view according to which the two languages of a bilingual are integrated. The processing of a bilingual’s two languages is also nonselective. Research on visual and spoken-word recognition in bilinguals, for instance, suggests that the two lexicons are stored in a common space or network. The process of finding words during reading or auditory comprehension does not proceed in a language-selective manner, but activates words from both languages simultaneously (cf. Dijkstra, 2007). The following example from the literature on bilingual spoken-word recognition is very illustrative. In a series of studies V. Marian and M. Spivey (Spivey and Marian, 1999; Marian and Spivey, 2003a, 2003b) tested Russian‒English bilinguals on their auditory comprehension of English words using the visual world paradigm. Participants were asked to interact with a collection of objects placed on a table while wearing a head-mounted eye tracker. The participants’ looks to the objects were captured upon their hearing of a particular instruction such as: Put the marker below the cross. The participants had simultaneous access to four objects. One of the objects was the target object (the one named in the instructions: marker), two were fillers, and a fourth object was an interlingual competitor (an object whose English name bore no phonological similarity with the target item, a stamp, but whose Russian name did: marka is Russian for “stamp”). The results showed that, upon hearing [mark]-, bilinguals were momentarily distracted by the object representing a stamp before settling for the target, a marker. Apparently, not only same-language but also different-language words compete for selection during spoken-word recognition in bilinguals, even in unilingual settings (i.e., when the task requires the use of only one language) (cf. Weber and Cutler, 2004). 
These findings show that the robust nonselective effects found in the visual word recognition literature (cf. Dijkstra, 2007) were not due to orthographic overlap, but to bona fide processes of nonselective lexical searches. This further suggests that the phonological forms of words are partially shared across the two languages of the bilingual—phonological categories are reutilized.

For the phonological categories that may not be simply “copied” from the native language and “pasted” into the non-native language because they are phonetically different, the evidence suggests that bilinguals develop links between similar sounds across the two languages. These connections are generally revealed by cross-language assimilations, but sometimes they may be reflected in the form of dissimilations (e.g., Flege, 1995, 2007). Flege (1987) provided a relatively early example of cross-language phonetic category assimilation in two groups of French‒English bilinguals. One of the bilingual groups consisted of native speakers of English who immigrated to France as adults and the other consisted of native French speakers who immigrated to the United States as adults. The study examined the acoustics of their voiceless stops. Phonetic analyses revealed that the bilinguals’ productions were not identical to those of the monolingual controls in either the target (non-native) language or the source (native) language. In other words, this study, as well as many others, suggests that bilinguals establish formal connections between the phonetic categories of their non-native language and the closest phonetic categories of their native language. These cross-linguistic connections may lead to difficulties in the establishment of novel phonetic categories (i.e., categories specific to the non-native language). Even in cases in which novel categories have been formed, these do not seem to easily become independent from the closest ones in the native language as they are consistently revealed to be assimilated, at some level, to native ones.

An example of dissimilation is provided in Flege et al. (2003). This study measures the acoustic characteristics of the English /eɪ/ vowel, as in day, as produced by two groups of bilinguals from Canada, early and late learners of English whose native language is Italian. It would seem that Italian‒English bilinguals establish a link between English /eɪ/ and Italian /e/, its closest vowel category. These two phonetic categories, however, are not acoustically identical—while the Italian vowel is a monophthong, the English one is a diphthong. The study revealed that early bilinguals produced English /eɪ/ with more formant movement than English native controls. The authors propose that this was due to the fact that the bilinguals had formed a novel category for English /eɪ/, separate from Italian /e/, and this had led to an acoustic dissimilation of the two sounds—a process akin to phonetic category dispersion. A dissimilation still reveals the existence of a connection of some sort between a native and a non-native phonetic category.

The studies discussed in this section reveal that the two languages of a bilingual occupy a single representational network and share cognitive resources during processing. It is important to approach the study of bilingual phonetics and phonology from this standpoint. How are the phonological networks of the two languages of a bilingual connected with each other? How selective (or not) are the (phonetic) processing mechanisms of a bilingual? This section has laid the ground regarding cross-linguistic category interactions by discussing some basic concepts, such as those of category assimilations and category dissimilations.

Phonetics, Phonology, and the Lexicon

The present chapter is concerned with phonological knowledge and phonetic behavior in bilinguals. In my view, this means that the chapter is about mnemonic representations of language sounds (phonology) as well as patterns of sound production and perceptual processing (phonetics) as manifested in people who speak two languages.

According to some schools of thought, phonological knowledge consists of an inventory of sound categories, some of them called phonemes, and a set of rules to combine them to form lawful sequences. The sounds of a language differ in the nature of their status within the system. According to one view, one can distinguish between contrastive and non-contrastive sounds; the former are called phonemes. Whether a sound is contrastive or not is revealed by comparing words. A phoneme is a phoneme by virtue of being associated with a set of words and excluded from another set of words. For instance, Spanish [k] and [g] are contrastive because they may be exploited to distinguish a word pair such as casa [ˈkasa] “house” and gasa [ˈgasa] “gauze,” while [g] and [ɣ] are not contrastive because they are in complementary distribution in this language. It follows that knowledge of a phonological category, or phoneme, consists of (i) knowledge of a phonetic category (perhaps an abstract, symbolic representation including phonetic-substance information or a cloud of detailed statistical information) and (ii) knowledge of the set of words that contain this category. If a second-language learner manages to acquire a new phonetic category, as revealed by identification and discrimination tasks with isolated sounds, but does not learn which words possess this category and which words do not, the learner may not be said to have acquired a phoneme or a phonemic contrast. An important goal of phonetic and phonological research on bilingualism, in addition to exploring new category formation and the development of language-specific production and perceptual habits and strategies, is to obtain an understanding of the storage and retrieval of the phonological forms of words (e.g., Bybee, 2001).
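The word-comparison test for contrastiveness described above can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: the toy lexicon, the segment-tuple transcriptions (stress omitted), and the function names `minimal_pairs` and `is_contrastive` are hypothetical and do not come from any of the studies cited. The words casa and gasa are the chapter's own example; lago and gato are added here purely for illustration.

```python
from itertools import combinations

# Toy Spanish lexicon (illustrative assumption): each word maps to a
# tuple of segments, with stress marks omitted for simplicity.
LEXICON = {
    "casa": ("k", "a", "s", "a"),
    "gasa": ("g", "a", "s", "a"),
    "lago": ("l", "a", "ɣ", "o"),
    "gato": ("g", "a", "t", "o"),
}


def minimal_pairs(lexicon, sound_a, sound_b):
    """Return word pairs whose transcriptions differ in exactly one
    position, where that position holds sound_a in one word and
    sound_b in the other."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        if len(t1) != len(t2):
            continue  # this sketch only compares equal-length words
        diffs = [(x, y) for x, y in zip(t1, t2) if x != y]
        if len(diffs) == 1 and set(diffs[0]) == {sound_a, sound_b}:
            pairs.append((w1, w2))
    return pairs


def is_contrastive(lexicon, sound_a, sound_b):
    """Treat two sounds as contrastive if at least one minimal pair
    in the lexicon distinguishes them."""
    return len(minimal_pairs(lexicon, sound_a, sound_b)) > 0


print(minimal_pairs(LEXICON, "k", "g"))
print(is_contrastive(LEXICON, "g", "ɣ"))
```

In a lexicon this small, the absence of a minimal pair is of course not evidence of complementary distribution; the sketch only shows the logic of tying a phonetic category to the sets of words that contain it.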

The effects of the structure of the lexicon on second-language phonetics and phonology have been investigated in manifold studies. Some of these are reviewed in this chapter. Consider the following finding. Amengual (2011) measured the acoustics of Spanish word-initial /t/ as produced by several groups of Spanish‒English bilinguals differing in linguistic experience. Among other phonetic features, utterance-initial Spanish /t/ differs from English /t/ in that, when pronounced, the former is not aspirated while the latter is. Proficient Spanish‒English bilinguals presumably use aspirated [tʰ] in English, but not in Spanish (Magloire and Green, 1999), but bilinguals have been found to use voiceless stops with aspiration values intermediate between those of their native language and those of the second language (e.g., Flege, 1987; Fowler et al., 2008). The latter finding is generally attributed to having established a cross-language interaction between the voiceless stops of the two languages. But just what interacts with what? Amengual’s study focused exclusively on the production of Spanish words; some of the Spanish words in the study had English cognates (teléfono “telephone,” total “total”) while others did not (teclado “keyboard,” tejado “roof”). Even in an entirely unilingual environment, [t]s in Spanish–English cognates displayed slightly longer latencies between the time of articulatory release and the onset of voicing (i.e., a small period of aspiration) than [t]s in Spanish words with no English cognates. This shows that Spanish /t/ is more similar to English /t/ in words that are arguably densely connected in the lexicon, or that share a semantic representation across the two languages, and more different in words that are not so densely connected. (Further research on lexical connections is needed, but cf. Mora and Nadeu, 2012.)
Amengual’s findings strongly suggest that cross-language interactions take place in the lexicon, or at least are mediated by lexical connections. This study suggests that a thorough understanding of the phonetics and phonology of bilingualism needs to incorporate an understanding of the structure of the bilingual lexicon. Therefore, theoretical models of the phonetics and phonology of bilinguals should be able to account for findings in the bilingual speech production and perception literatures as well as for findings in the literature on lexical processing. This position justifies, in my view, the discussion of spoken word recognition studies in a chapter on the phonetics and phonology of bilingualism.
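The acoustic measure at issue in the cognate comparison, the latency between articulatory release and the onset of voicing (commonly called voice onset time), is simple to compute once the two landmarks have been annotated. The sketch below shows the arithmetic and a by-group mean. All timestamps are hypothetical values invented for illustration; they are not Amengual's (2011) data and should not be read as his results. The function names and the token format are likewise assumptions.

```python
from statistics import mean


def vot_ms(release_ms, voicing_onset_ms):
    """Voice onset time in milliseconds: the interval from the stop
    release to the onset of voicing."""
    return voicing_onset_ms - release_ms


def mean_vot_by_group(tokens):
    """Group tokens by cognate status and return the mean VOT of each
    group, keyed by True (cognate) or False (non-cognate)."""
    groups = {}
    for tok in tokens:
        vot = vot_ms(tok["release_ms"], tok["voicing_ms"])
        groups.setdefault(tok["cognate"], []).append(vot)
    return {status: mean(values) for status, values in groups.items()}


# HYPOTHETICAL annotations (milliseconds from the start of each
# recording); invented to illustrate the computation only.
TOKENS = [
    {"word": "teléfono", "cognate": True, "release_ms": 100, "voicing_ms": 120},
    {"word": "total", "cognate": True, "release_ms": 200, "voicing_ms": 218},
    {"word": "teclado", "cognate": False, "release_ms": 150, "voicing_ms": 165},
    {"word": "tejado", "cognate": False, "release_ms": 300, "voicing_ms": 313},
]

print(mean_vot_by_group(TOKENS))
```

A real analysis would involve many tokens per speaker and an inferential model with speaker and word as grouping factors; the point here is only how the dependent variable is derived from the two annotated landmarks.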

If a bilingual shows evidence of having acquired a second-language phonetic category or having developed accurate discrimination abilities of a second-language contrast, this is still not proof that a phoneme has been established. The linguistic, structural consequences of this feat, therefore, would still need to be proven with different experimental tasks. In a recent study of Catalan‒Spanish bilinguals, Bosch and Ramon-Casas (2011) reported on data that raise this issue, even if only indirectly. Bosch and Ramon-Casas measured the acoustics of two Catalan mid-front vowels produced by a group of Catalan‒Spanish bilinguals differing in their language-dominance patterns. Catalan contrasts /e/ with /ɛ/ while Spanish possesses /e/; it is reasonable to assume, therefore, that both Catalan vowels are assimilated to (or interact with) Spanish /e/. Unsurprisingly, Spanish-dominant bilinguals were found to produce a merged mid-front vowel rather than two acoustic distributions for the two Catalan phonemes. Catalan-dominant bilinguals produced both vowel phonemes in different acoustic distributions; that is, when the acoustics of the target vowels of words with /ɛ/ and those with /e/ were compared, these were found to be different. Spanish-dominant bilinguals did not distinguish between /ɛ/- and /e/-words in terms of the acoustics of their target vowels. This suggests that Spanish-dominant bilinguals had experienced a merger due to interactions between these vowels and their own Spanish /e/ (cf. Simonet, 2011, for a similar account regarding Catalan /o/‒/ɔ/). It is interesting to note, however, that a phonetically trained native speaker of Catalan heard the vowels produced by the Spanish-dominant bilinguals in terms of two categories, [e] and [ɛ], and not in terms of one merged category, which is what the first statistical analysis had revealed.
When these vowel realizations were statistically compared following a “phonetic” (auditory) rather than “phonological” (lexical, etymological) classification, two statistical distributions arose. This could suggest that Spanish-dominant bilinguals had not failed to establish separate phonetic categories but had failed to associate these with words or word lists. Perhaps what they lacked were stable lexical representations of these sounds. According to the definition of “phoneme” we employed earlier, what these bilinguals had failed to do was to form two new phonemes or a new phonemic contrast—they may have learned [e] and [ɛ], but not /e/ and /ɛ/. This study illustrates that, before we can claim that a second-language learner has been able to acquire a phonemic contrast of her/his second language, in addition to proving that the learner perceives two categories as categorically distinct, one would need to show that s/he has memorized the list of words that have one category versus the other.

Non-native Contrasts and (Asymmetric) Lexical Access

There exists a literature on the process of spoken-word recognition in bilingualism. In my view, these studies investigate the processing of phonemes, phonological patterns, and phonological entries by virtue of the fact that they investigate the link between sounds and words. This is why some of the findings in this literature are discussed here.

Pallier et al. (2001) tested two groups of Catalan‒Spanish bilinguals on their lexical processing of several Catalan contrasts, such as the /e/‒/ɛ/ contrast. The experimental paradigm used in this study relied on the phenomenon of priming in an auditory lexical decision task. It is a recurrent finding that listeners recognize a word faster if they have recently been exposed to the word. Pallier and colleagues selected a set of minimal pairs in Catalan that contrasted on the basis of sound categories that do not exist in Spanish and, presumably, would pose difficulties for Spanish-dominant bilinguals, such as the /e/‒/ɛ/ contrast mentioned earlier. The hypothesis was that, upon hearing the second member of a minimal pair in a list ([ˈne.tə] “granddaughter,” [ˈnɛ.tə] “clean, fem.”) Catalan-dominant listeners would not experience priming or lexical facilitation; on the other hand, Spanish-dominant listeners would experience priming or facilitation on the grounds that they would process the word as a repetition of the other member of the minimal pair: instead of having heard a different word they would have heard a second rendering of the same word. The findings showed that (i) both groups of bilinguals experienced repetition priming for words that were actual repetitions of previous items, (ii) they did not experience any repetition priming for words involved in minimal pairs made up of phonemic categories shared by both languages, and, importantly, (iii) the Catalan-dominant bilinguals did not experience repetition priming for words involved in minimal pairs involving contrasts specific to Catalan while the Spanish-dominant bilinguals did. The authors interpreted this finding as follows: Spanish-dominant bilinguals had not only presumably failed to establish a phonetic category for the novel Catalan sounds, they had in effect stored minimal pairs as homophones.
Phonetic category interactions had had an effect on lexical storage, and therefore on phonological knowledge.

This interpretation has not remained unchallenged. Hayes-Harb and Masuda (2008: 8), among other scholars (cf. Darcy et al., 2013), have pointed out that the findings of Pallier et al. (2001) do not necessarily confirm that minimal pairs are stored as homophones in the Catalan lexicon of Spanish-dominant speakers. Alternatively, it could be that these bilinguals have represented these words as minimal pairs (i.e., as heterophones), and thus have associated two lexical sets with two contrastive sounds, but fail to access lexical entries efficiently during lexical retrieval due to performance, perceptual-encoding limitations (rather than representational differences). Although this is certainly a possibility, at present we lack detailed models of how language learners would be able to store phonemic contrasts in the lexicon if they experience perceptual difficulties with non-native sounds. Some proposals have been made (one includes the role of orthography, for instance), but in my opinion we are far from having a full grasp of the process, and more research is needed in this regard.

The fact that bilinguals may be able to store lexical (phonological) contrasts for non-native phonemes while experiencing perception difficulties with the phonetic realizations of these phonemes has been corroborated in a number of recent studies. Cutler et al. (2006), following up on an unexpected finding in Weber and Cutler (2004), investigated the process of English spoken-word recognition in Japanese-speaking English learners, who are known for their difficulties with the English /l/‒/ɹ/ contrast. This study used the visual world paradigm and the eye-tracking technique (e.g., Spivey and Marian, 1999). The design showed four pictures on a computer screen. One of the pictures was the target (the one named in the instructions: locker), two were fillers, and a fourth object was a minimally paired competitor (an object whose name began with the opposite member of the /l/‒/ɹ/ contrast: rocket). A revealing finding was that competition between /l/ and /ɹ/ words was asymmetric. Upon hearing /ɹ/-words, /l/-words were activated (i.e., locker competed for attention with rocket upon hearing rocket). On the other hand, upon hearing /l/-words, /ɹ/-words were not simultaneously activated (i.e., rocket did not compete for attention when locker was played). The most significant aspect of this finding is that, for the Japanese learners of English, rocket and locker did not begin with some undefined, unspecified, identical consonant. The fact that confusion was asymmetric indicates that, at some representational level, the phonological contrast between /l/ and /ɹ/ words is maintained by these learners, even though they experience serious difficulties when processing [l] and [ɹ]. This finding suggests that the learners do not store rock and lock as homophones. In other words, they have problems with English [l] and [ɹ], but not so much with English /l/ and /ɹ/. 
It could be said, for instance, that their representation of /l/ includes [l] and [ɹ] while their representation of /ɹ/ includes only [ɹ] or something similar (crucially, not [l]).

How could a second-language learner master a second-language phonemic contrast if the phonetic nature of the distinction is problematic for her/him? I am not aware that anyone has provided a compelling answer to this question yet. I believe, however, that one direction in which future research could go to explore this question is this: We could not only recognize, but actively and explicitly investigate the separate levels of encoding possibly involved in auditory speech comprehension. Consider the following finding, one of many related findings. It is sometimes claimed that the difficulties experienced by bilinguals with certain non-native sounds occur at some relatively high encoding level, not at a very basic acoustic encoding level. For instance, Mann (1986) reported that Japanese-speaking learners of English who were unable to identify the English [l]‒[ɹ] distinction showed, nonetheless, a pattern of compensation-for-coarticulation of [al] versus [aɹ] on a [da]‒[ga] continuum, just as native English speakers do. In other words, the acoustic or articulatory difference between [aɹ] and [al] did not go unnoticed by Japanese-speaking learners, at least at some basic, pre-lexical level of encoding. How do the levels of encoding (in case there are indeed separate, sequentially ordered levels) interact in bilinguals? Can early stages of encoding produce representational entries (e.g., have an effect on lexical representations) after significant exposure while later (post-lexical, meta-lexical) stages of processing continue to manifest categorization difficulties?2

Cross-Linguistic Category Interactions Are Dynamic

Communicative environments affect phonetic behavior—speech production and perception. In turn, phonetic adaptations to a communicative environment may have transient but, sometimes, long-lasting impact on linguistic representations. Pardo (2006) showed that speakers tend to converge in the acoustics of their speech even after a brief period of conversation. Clarke and Garrett (2004) found that English-speaking listeners improve in their perceptual processing of foreign-accented speech after a brief experimental training phase. Norris et al. (2003) demonstrated that phonological categories (i.e., phonetic categories as represented in the lexicon) are “updated” during speech performance. In this study, listeners were exposed to sounds that had been manipulated to be ambiguous between [s] and [f]. Some listeners were exposed to this ambiguous sound in words containing /f/ (in Dutch words such as witlof “chicory”) and others heard this sound in words containing /s/ (in Dutch words such as naaldbos “pine forest”). Subsequently, these listeners categorized fricative stimuli on an [f]‒[s] continuum; those who heard the ambiguous sound in the /s/ context labeled more of the stimuli as “s” and those who heard it in the /f/ context labeled more of them as “f.” Among other things, these data indicate that categorical entities such as /s/ and /f/ are constantly “updated.” McQueen et al. (2006) verified that perceptual recalibration (category “updates”) affects lexical recognition; that is, training tasks of the sort in Norris et al. (2003) were found to modulate patterns of spoken word recognition, as measured by a priming paradigm, and not only patterns of labeling of sound continua. In other words, /f/ and /s/ are “updated” as a result of exposure to variable speech.

The fact that there is evidence that sound categories are “updated” and that phonological representations in the lexicon are also involved in this phonetic recalibration is important for our understanding of second-language speech learning. A case that has received ample attention in the literature is that of Japanese-speaking learners of English. English contrasts /l/ and /ɹ/, such as in lock and rock. Native speakers of Japanese are well known for the difficulties they seem to experience when learning this English contrast. Apparently, both English sounds are assimilated to Japanese /ɾ/, albeit poorly (Aoyama et al., 2004). Logan et al. (1991) and Lively et al. (1993) trained a group of Japanese speakers on the English contrast and found that, if the training consisted of exposing the learners to the items of multiple English talkers, its results were “better” (i.e., listeners showed more perceptual improvement) than if training had only used the speech of one single talker. Exposure to more variability led to the Japanese speakers generalizing their learning to new words and to new speakers (cf. Iverson et al., 2005). Further research showed that the “positive” effects of perceptual training also included production (Bradlow et al., 1997), and that improvements remained three months after training and, albeit more modestly, also six months after training (Lively et al., 1994). These findings suggest, among other things, that cross-linguistic interactions are dynamic; that is, whatever cross-linguistic connections Japanese-speaking learners of English form between the two English phonetic categories and their closest Japanese category at the earliest stages of exposure, these are modifiable and they are indeed modified as a result of exposure to variable speech.3 Cross-linguistic phonetic category interactions (not only the categories themselves, but the connections between the categories) are, therefore, malleable—they are plastic.

The plasticity of cross-linguistic phonetic category interactions does not end there. A number of research studies have provided evidence to the effect that these interactions can be revised as a function of the immediate communicative environment in which a bilingual finds herself/himself. While Flege (1987) and Fowler et al. (2008), among many others, have found that bilinguals behave as if they have developed intermediate phonetic categories between the sounds of the first language and the sounds of their second language, other studies have reported that these findings could be due, at least in part, to an experimental artifact. Antoniou et al. (2010) measured the acoustics of stops produced by Greek‒English bilinguals; English voiceless stops are aspirated whereas Greek ones are not. These bilinguals were heritage Greek speakers who were brought up in an English-speaking country (Australia) and who seemed to be dominant in English even though their first language was Greek. The bilinguals were randomly assigned to two groups: one group provided materials only in English and the other provided materials only in Greek. All instructions and materials were also in the appropriate language. In these controlled environments, the bilinguals’ English stops did not differ from those produced by English monolinguals and their Greek stops were not different from those produced by a group of monolingual Greek speakers from Athens, Greece. While this is a between-speakers finding, a similar result was obtained by Magloire and Green (1999) using a within-speakers design. It would seem, therefore, that the strength of cross-linguistic phonetic category interactions can be inhibited, at least by some bilinguals, in unilingual contexts. It can even be reduced to nothing.

Further proof of the malleability of cross-linguistic category interactions comes from a case study of a Portuguese‒English speaker from Brazil (Sancier and Fowler, 1997). Production data were collected from the same bilingual speaker in two experimental sessions: one of the sessions took place in the United States and the other in Brazil. Both Portuguese and English materials were collected on both occasions, and the bilingual speaker distinguished between both English and Portuguese materials in terms of the acoustics of her stop consonants. An interesting finding was that the acoustic characteristics of the English stops recorded in the Brazilian session resembled those of Portuguese and the acoustic characteristics of the Portuguese stops produced in the American session resembled those of English; recall, however, that these assimilations did not lead to cross-language merger of categories. By hypothesis, the Portuguese‒English bilingual had been speaking mostly in English prior to providing the English and Portuguese materials in the American session, while she had been speaking mostly Portuguese before she participated in the Brazilian session of the study. In other words, a reasonable conclusion is that recent experience matters: it affects the strength and direction of cross-linguistic interactions.

The finding that cross-linguistic category interactions are pliable extends to very dynamic or transient situations. Olson (2013) and Simonet (2014) examined the effects of transient, dynamic cross-linguistic category interactions in two production studies. Olson tested a group of Spanish‒English bilinguals participating in a language-switching, cued picture-naming task. One of the experimental manipulations in the study was as follows: In one session (unilingual), most of the words pronounced by the bilinguals were in one language (95%) while only a few were in the other language (5%); in a second session (bilingual), the number of words in the two languages was balanced (50%). The acoustic characteristics of the English and Spanish stop consonants produced by the bilingual participants were more different from each other when data from the two unilingual sessions were compared than when data collected in the bilingual session were compared. In other words, a balance in the presence of words from both languages led to an increase in cross-language phonetic similarity, which suggests that the cross-language interactions established by the speakers between native and non-native stop categories had been circumstantially, transiently strengthened. Simonet (2014) explored the productions of a group of Catalan‒Spanish bilinguals with a general experimental design similar to the one used by Olson. In Simonet’s study, three groups of Catalan‒Spanish bilinguals differing in their linguistic experiences were recorded while producing Catalan words implementing the Catalan /o/‒/ɔ/ contrast. (Spanish has one mid-back vowel, /o/, and Spanish‒Catalan bilinguals presumably assimilate the two Catalan mid-back vowels to the one Spanish mid-back vowel.) The bilinguals participated in two sessions. 
In a unilingual session they exclusively pronounced words in Catalan; in a bilingual session they produced both Spanish and Catalan words, the Catalan words being the same ones produced in the unilingual session. Speakers were asked to shadow speech recordings collected from native speakers of the target languages, three Catalan-dominant and three Spanish-dominant “talkers.” The participants thus had an immediate acoustic model for their production, and the Catalan acoustic models were the same in the two sessions. An analysis of the two Catalan mid-back vowels indicated that, in the bilingual session, both vowels had become acoustically more similar to the Spanish vowel than they were in the unilingual session. In sum, both Olson (2013) and Simonet (2014) reported findings that indicate that cross-language phonetic category interactions are measurably strengthened in communicative contexts that lead to the use of the two languages of a bilingual. Studies of the phonetic effects of code-switching take the pliability of cross-language phonetic interactions one step further (Antoniou et al., 2011).

The evidence discussed in this section suggests the following. (i) Bilinguals develop links or connections of one sort or another between the phonetic categories in their native language and the closest ones in their non-native language. (ii) Moreover, these connections are pliable or plastic; that is, they are transiently strengthened in contexts that induce the activation of both languages and inhibited in contexts that favor the use of only one of the languages. This interpretation agrees with Grosjean’s Bilingual Language Modes model (e.g., Grosjean, 2001). The extent to which the inhibition of cross-language category interactions is absolute, even on the most unilingual of occasions, remains controversial. It would appear less controversial that increased activation of both languages leads to increased effects of cross-language interactions, but further research is needed, especially on whether (or how much) different bilingual populations show this pattern, and on what occasions this occurs.

Language-Specific Perceptual Strategies: Now You See Them, Now You Don’t

Several studies have examined whether bilinguals make use of language-specific perceptual strategies when perceiving speech. The studies reviewed in this section examine whether listeners’ expectations regarding the language they “believe” they are processing alter their categorization of the speech signal. They examine whether listeners can control their perceptual processing mechanisms or whether, at the very least, effects of communicative context lead to transient changes in perceptual strategies. If this is attested in an experimental task, it could be suggested that bilinguals possess language-specific perceptual systems and that these systems alter perceptual processing selectively, in real time. This would be robust proof of pliability in cross-language phonetic category interactions. This evidence would be extremely relevant since most evidence from other domains, including spoken word recognition (Spivey and Marian, 1999; Marian and Spivey, 2003a, 2003b), seems to suggest that language activation is not language-selective.

Before we discuss the literature on language-specific perceptual strategies, let us ponder over the role of language-selective perceptual processing during spoken word recognition. Consider the findings of Ju and Luce (2004), who presented Spanish‒English bilinguals with spoken Spanish words beginning with voiceless stops. Recall that Spanish word-initial voiceless stops are not aspirated, unlike their English “counterparts.” The Spanish spoken words played to the Spanish‒English bilinguals in this study were manipulated so that their word-initial consonant could present aspiration (or not). For instance, the word playa “beach” was presented without aspiration, as is typical in Spanish, or with aspiration, more typical of the English word pliers. Bilinguals wore a head-mounted eye-tracker, and they were asked to mouse-click on one of four line drawings appearing on a computer screen. One of the items was the target item (a line drawing of a beach when hearing the Spanish word playa “beach”), two were fillers, and a fourth was a cross-language competitor (a line drawing of a pair of pliers, alicates in Spanish). When Spanish auditory stimuli were presented with English-like phonetic characteristics (aspiration), looks to cross-language competitors (pliers) were significantly more frequent than when Spanish auditory stimuli had Spanish phonetic characteristics. In other words, the activation strength of playa was stronger when /pl/- was not aspirated than when it was. Interestingly, even though the experiment induced a unilingual mode, in that the entire task was done in Spanish, and the bilinguals were exclusively asked to respond to Spanish lexical items, English-like phonetic characteristics present in Spanish auditory stimuli had the power to increase the activation strength of English lexical competitors. 
An important aspect of the findings of this study is that they suggest that the use of language-specific phonetic properties in bilinguals has the effect of reducing cross-language competition during lexical access and turning the process into a rather language-selective one. It would follow that actively manipulating perceptual strategies so that they reflect language-specific behavior aids bilinguals in the process of finding words in a certain language while rejecting words from the non-intended language.

In the studies aiming to determine whether bilinguals exploit language-specific perceptual strategies, bilinguals are asked to categorize stimuli drawn from an acoustic continuum into two phonetic categories, such as /p/ and /b/. Crucially, the two languages of the bilingual are hypothesized to possess (at least) these two phonological entities, although their phonetic implementation is thought to differ as a function of language. This is the case with the Spanish and the English /p/‒/b/ contrast: both languages contrast voiceless with voiced bilabial stops but the phonetic realization of these categories differs as a function of language. On the one hand, Spanish /b/ is pre-voiced ([b]), Spanish /p/ is not ([p]); on the other hand, English /b/ is not generally prevoiced but displays a brief lag between the release of the articulatory closure and the onset of voicing ([p]), and English /p/ is aspirated ([ph]) (Lisker and Abramson, 1964; Abramson and Lisker, 1970, 1973). In a two-alternative, forced-choice identification task conducted on a [b]‒[ph] acoustic continuum one expects the responses of native Spanish listeners and those of native English listeners to result in sigmoids with different horizontal 50% cross-over points. In particular, the point at which the Spanish sigmoid crosses 50% should be earlier (overall more “p” responses) than that of the English sigmoid function. These studies ask bilinguals to categorize the same stimuli in two languages or, more appropriately, language contexts or modes. The hypothesis is that, if bilinguals possess language-specific perceptual strategies, their behavior will result in language-specific perceptual boundaries detected on the same acoustic continuum. This would support a view according to which speech perception in bilinguals is language-selective, at least to some extent.
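The logic of the double-boundary prediction can be illustrated with a minimal sketch. It assumes that identification responses follow a logistic function of voice onset time (VOT); the continuum steps, the slope, and the two boundary values are hypothetical numbers chosen only for illustration and are not taken from any of the studies reviewed here.

```python
import math

def prob_p(vot_ms, boundary_ms, slope=0.5):
    """Logistic identification function: probability of a "p" response."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

def crossover(vots, proportions):
    """Linearly interpolate the VOT at which "p" responses reach 50%."""
    points = list(zip(vots, proportions))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= 0.5 <= y1 and y0 != y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("proportions never cross 50%")

# Hypothetical continuum steps (VOT in ms), from prevoiced to long-lag.
VOTS = list(range(-30, 61, 10))

# Hypothetical category boundaries, assumed for illustration only.
SPANISH_BOUNDARY_MS = 0.0
ENGLISH_BOUNDARY_MS = 25.0

# Simulated proportion of "p" responses at each continuum step.
spanish_curve = [prob_p(v, SPANISH_BOUNDARY_MS) for v in VOTS]
english_curve = [prob_p(v, ENGLISH_BOUNDARY_MS) for v in VOTS]

spanish_cross = crossover(VOTS, spanish_curve)
english_cross = crossover(VOTS, english_curve)

print(f"Spanish-mode cross-over: {spanish_cross:.1f} ms")
print(f"English-mode cross-over: {english_cross:.1f} ms")
```

Under these assumptions the Spanish-mode curve crosses 50% at a lower VOT value than the English-mode curve, so the same ambiguous short-lag token attracts more “p” responses in the Spanish mode. With real data the cross-over would normally be estimated by fitting a psychometric function to each listener’s responses rather than by interpolation over simulated proportions.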

The first studies to test this hypothesis generally resulted in null findings: bilinguals were able to produce language-specific phonetic categories but did not show two different identification boundaries for the same acoustic continuum in two language modes (Caramazza et al., 1974; Williams, 1977). The question arose as to what may lead bilinguals to acquire language-specific phonetic categories for use in pronunciation and yet not adapt their perceptual system to make it more efficient during speech comprehension.

Elman et al. (1977) offered the possibility that the null findings of Caramazza et al. (1974) and Williams (1977) could be due to the methods used to induce language contexts or environments. Elman et al. suggested that conversations and instructions prior to the identification of synthetic stimuli did not necessarily guarantee that a participant would remain in a particular language mode throughout the task; furthermore, the use of synthetic stimuli might not be entirely conducive to the activation of language- or even speech-specific perceptual strategies. Elman et al. tested Spanish‒English bilinguals on a ranked natural set of stimuli recorded by a bilingual speaker; each stimulus was preceded by an auditory language-appropriate instruction, such as Write the word or Escriba la palabra. An assortment of language-appropriate filler words was randomly played during the task—bilinguals were thus “reminded” of the language environments throughout the experiment. Language-specific categorization was found—bilinguals switched their identification of ambiguous stimuli, stimuli that would be classified as “p” in a Spanish mode and as “b” in an English mode. Other studies that were able to capture language-specific perceptual strategies by means of different methods were those of Hazan and Boulakia (1993) and Flege and Eefting (1987).

One perceptual phenomenon raised serious doubts regarding Elman et al.’s conclusions. The authors of this study later published an experimental report the findings of which suggested that the original effects could have been triggered by the acoustic characteristics of the precursor sentences rather than the language modes (Diehl et al., 1978). Thus, instead of inducing a Spanish-specific perceptual strategy in bilinguals, sentences such as Escriba la palabra may have affected the categorization of the [b]‒[p] continuum by the mere presence of the [p] (and not [ph], for instance) in palabra. Effects of precursor materials on stimuli categorization are well known (cf. Eimas and Corbit, 1973), and Diehl et al. (1978) showed that listeners tended to classify ambiguous [b]‒[p] sounds more as “p” after hearing a clear [b] and more as “b” after hearing a clear [p]. In fact, Bohn and Flege (1993) were able to detect “language-specific” responses even in monolinguals, a puzzling result. Bohn and Flege tested Spanish‒English bilinguals and monolinguals and collected data in two language-specific sessions. Precursor sentences were presented throughout the study to ensure language specificity in perceptual behavior. The fact that the classifications of both bilinguals and monolinguals resulted in “language-specific” identification boundaries suggested that the effect obtained for the bilinguals was not due to language-specific perceptual strategies but rather to something else, most likely the acoustic properties of the precursor sentences.

Note that both Elman et al. (1977) and Hazan and Boulakia (1993) found that, when bilinguals presented two perceptual boundaries, the difference between these two boundaries was larger for early, proficient bilinguals than for less proficient bilinguals. Therefore, the possibility remained that both factors, language-specific perceptual behavior and the presence of precursor sentences, contributed to the effects. García-Sierra et al. (2009) set out to find whether “the level of confidence in using English and Spanish (reading, writing, speaking and comprehension)” was correlated with the size of the boundary effect (García-Sierra et al., 2009: 369). They tested both English monolinguals and Spanish‒English bilinguals on a [g]‒[kh] continuum and expected to find a correlation for the bilinguals (but not for the monolinguals). Both bilinguals and monolinguals participated in two “language-specific” sessions. García-Sierra et al. found significant language-context effects for both bilinguals and monolinguals. Interestingly, bilingual proficiency correlated with the size of the perceptual boundary effect for the bilinguals but the monolingual effect was random. This study replicated previous findings with a more sophisticated and careful methodology. The authors recommended that subsequent research on the double perceptual boundary effect should (i) ensure that the language environment or context is kept constant throughout the perceptual task and (ii) try to “influence participant’s phonetic judgements” not by means of acoustic precursor sentences but by means of other experimental manipulations (García-Sierra et al., 2009: 378).

Two studies have managed to do just that. García-Sierra et al. (2012) investigated the early stages of speech perception by analyzing event-related brain potentials (ERPs) obtained from Spanish‒English bilinguals processing auditory stimuli from an acoustic stop continuum. The researchers selected stimuli that could be termed ambiguous (i.e., stimuli that would be classified as “g” by an English speaker and as “k” by a Spanish speaker). Event-related potentials are able to reflect pre-attentive perception, and the participants are not required to respond to the stimuli actively. Participants read a language-appropriate magazine (Spanish in the Spanish setting and English in the English setting) while the synthetic stimuli were played in the background. This experiment, therefore, did not use precursor sentences. In this type of paradigm, one stimulus token is played repeatedly and, occasionally, a different stimulus token is played. Event-related potential data capture whether the occasional changes in the stimuli trigger any sort of pre-attentive reaction in the listeners. The hypothesis in this study was that Spanish‒English bilinguals would react differently as a function of two different language environments. In particular, the difference between [k] and [kh] would provoke a strong reaction in the English environment (as the stimuli belong to two different categories in this language, /g/ and /k/, respectively) but not in the Spanish environment (as they belong to the same category in this language, /k/). On the other hand, the difference between [g] and [k] would trigger a robust reaction in the Spanish environment (as these stimuli belong to two different categories in this language, /g/ and /k/, respectively) but not in the English environment (as they belong to the same category in this language, /g/). The results of the study confirmed the expectations of the researchers.
It appears, therefore, that bilinguals have language-specific expectations and these influence speech perception even at pre-attentive stages.
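The predictions tested in this kind of design can be laid out schematically. The sketch below is a deliberately categorical simplification: it assumes a single voicing boundary per language context and predicts a robust pre-attentive reaction only when the repeated token and the occasional deviant fall on opposite sides of that boundary. The VOT values and boundaries are hypothetical and are not the ones used by García-Sierra et al. (2012); actual ERP responses are graded rather than all-or-none.

```python
def category(vot_ms, boundary_ms):
    """Label a velar stop as "g" or "k" given a voicing boundary (VOT in ms)."""
    return "k" if vot_ms >= boundary_ms else "g"

def crosses_boundary(standard_ms, deviant_ms, boundary_ms):
    """True when the standard and the deviant belong to different categories."""
    return category(standard_ms, boundary_ms) != category(deviant_ms, boundary_ms)

# Hypothetical boundaries for each language context (assumed values).
BOUNDARY_MS = {"Spanish": 0.0, "English": 30.0}

# Hypothetical VOT values for prevoiced, short-lag, and long-lag tokens.
STIMULI_MS = {"[g]": -60.0, "[k]": 15.0, "[kh]": 70.0}

# The two contrasts discussed in the text (standard, deviant).
PAIRS = [("[g]", "[k]"), ("[k]", "[kh]")]

predictions = {
    (language, pair): crosses_boundary(
        STIMULI_MS[pair[0]], STIMULI_MS[pair[1]], boundary
    )
    for language, boundary in BOUNDARY_MS.items()
    for pair in PAIRS
}

for (language, pair), robust in predictions.items():
    outcome = "robust reaction expected" if robust else "weak or no reaction expected"
    print(f"{language} context, {pair[0]} vs. {pair[1]}: {outcome}")
```

Under these assumptions, the [g]‒[k] change crosses a category boundary only in the Spanish context and the [k]‒[kh] change only in the English context, which is the pattern of predictions described above.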

Finally, consider the findings in Gonzales and Lotto (2013). This study also tested Spanish–English bilinguals on an acoustic continuum. The crucial aspect of this experiment is that bilinguals were presumably induced into two different language modes (Spanish, English) solely by the stimuli to be categorized. Interestingly, all conversations and instructions with the participants were in English (in both language contexts), so as to completely avoid any potential effects of acoustic precursors. The key was in the nature of the stimuli. Gonzales and Lotto selected two items that were pseudo-words in both Spanish and English: bafri and pafri. The <r> letter is pronounced as a tap in Spanish and as a retroflex or bunched approximant in English. An acoustic [b]‒[ph] continuum was then used to draw tokens to replace the initial consonant of the pseudo-words so that two experimental continua were created, one from bafri to pafri in which the “r” was a tap (“Spanish” version) and one from bafri to pafri in which the “r” was an approximant (“English” version). By hypothesis, upon hearing a tap and seeing an <r> on the screen, bilingual participants would be set in Spanish mode; upon hearing a retroflex approximant and seeing an <r> on the screen, bilinguals would be set in English mode. The findings corroborated the authors’ expectations: bilinguals labeling the “Spanish” version of the acoustic continuum had a perceptual boundary different from that of the bilinguals labeling the “English” version, supporting the existence of a double perceptual boundary.

The study by Gonzales and Lotto (2013) raises a number of questions the authors themselves do not address. What do the two variants (the tap and the approximant) actually activate? Do they activate language-specific lexicons? Do they activate language-specific inventories, sound patterns, or networks of phonetic categories? How and why do they create language-specific expectations? On the one hand, [p]s and taps coexist in Spanish words, only seldom in English words,4 and [ph]s and retroflex approximants coexist in English words, never in Spanish words. However, it is not reasonable to suggest that, in this specific study, taps activated Spanish lexical entries (which ones?) and approximants activated English lexical entries (which ones?). An explanation based on the structure of the lexicon would not need to make any reference to the existence of connections between phonetic categories (devoid of lexical information) in language-specific networks, but it would be problematic because rhotics and word-initial bilabial stops may not be connected through lexical entries in the items used in the study, since neither pafri nor bafri is a real word in either Spanish or English. Are we to posit a language-specific link between [b] and [ɾ]? On the basis of what? This question deserves further theoretical modeling and it opens the door to a multiplicity of studies examining the nature of inter-category connections, within and across languages, in bilingualism as well as in monolingualism.

Can bilinguals switch between language-appropriate perceptual strategies as a function of the linguistic environment in which they find themselves? It would appear that, although triggering language-specific perceptual systems is difficult (switching off the “other” system requires some particular conditions), it is indeed possible. Therefore, some degree of language-selectivity is available during the perception of speech.


The present chapter has provided a very selective overview of the research on the phonetics and phonology of bilingualism. In selecting the points to be discussed, I have favored studies that illustrate the fact that, in bilingualism and adult second-language acquisition, speakers seem to develop complex interactions between the sounds in the native language and the sounds in the non-native language (e.g., Flege, 1987; Flege et al., 2003). Two important points discussed in the chapter were as follows: (i) cross-language sound interactions appear to be mediated by the structure of the lexicon (but further research and theorization are needed), and (ii) these interactions are pliable or plastic (but their nature deserves further exploration). On the one hand, lexical associations appear to impact the strength of the cross-language sound interactions created by a bilingual (Amengual, 2011; Mora and Nadeu, 2012). On the other hand, bilinguals may learn to perceive and produce novel phonetic categories without learning to associate these with different lexical sets (Bosch and Ramon-Casas, 2011; cf. Simonet, 2014), and interestingly they may acquire the skill to process specific phonological contrasts as they are instantiated in the lexicon without fully absorbing the physical nature of the phonetic categories that implement these contrasts in the speech of monolingual speakers (Weber and Cutler, 2004; Cutler et al., 2006). An accurate understanding of bilingual cross-language sound associations can only advance in tandem with an understanding of lexical storage and processing in bilingualism. Otherwise, research on the phonetics of bilingualism and second-language speech learning runs the risk of being irrelevant for linguistic theory.

Regarding the malleability of cross-language interactions, it was discussed that the requirements of different communicative situations seem to trigger modifications in the strength of cross-language sound interactions: some situations inhibit interactions while others maximize them (Magloire and Green, 1999; Antoniou et al., 2010, 2011; Olson, 2013; Simonet, 2014). The chapter concluded with a review of the literature on language-specificity in bilingual speech perception (Caramazza et al., 1974; Elman et al., 1977; Hazan and Boulakia, 1993; Ju and Luce, 2004; García-Sierra et al., 2009, 2012; Gonzales and Lotto, 2013). While this body of literature serves to illustrate that cross-language category interactions are pliable, it also opens new questions regarding the nature of the connections or networks created by bilinguals not only across language (sub-)systems but also within them.


Abramson, A., and Lisker, L. (1970). “Discriminability along the Voicing Continuum: Cross-Language Tests.” In Proceedings of the 6th International Congress of Phonetic Sciences, pp. 569–573. Prague: Academia.

Abramson, A., and Lisker, L. (1973). “Voice Timing Perception in Spanish Word-Initial Stops.” Journal of Phonetics 1: 1–8.

Amengual, M. (2011). “Interlingual Influence in Bilingual Speech: Cognate Status Effect in a Continuum of Bilingualism.” Bilingualism: Language and Cognition 15: 517–530.

Antoniou, M., Best, C., Tyler, M., and Kroos, C. (2011). “Interlanguage Interference in VOT Production by L2-Dominant Bilinguals: Asymmetries in Phonetic Code-Switching.” Journal of Phonetics 39: 558–570.

Antoniou, M., Tyler, M., Best, C., and Kroos, C. (2010). “Language Context Elicits Native-like Stop Voicing in Early Bilinguals’ Productions in Both L1 and L2.” Journal of Phonetics 38: 640–653.

Aoyama, K., Flege, J., Guion, S., Akahane-Yamada, R., and Yamada, T. (2004). “Perceived Phonetic Dissimilarity and L2 Speech Learning.” Journal of Phonetics 32: 233–250.

Best, C., and Tyler, M. (2007). “Nonnative and Second-Language Speech Perception: Commonalities and Complementarities.” In Language Experience in Second Language Speech Learning: In Honor of James Emil Flege, edited by O.-S. Bohn and M. Munro, pp. 13–34. Amsterdam: John Benjamins.

Bohn, O.-S., and Flege, J. (1993). “Perceptual Switching in Spanish‒English Bilinguals.” Journal of Phonetics 21: 267–290.

Bosch, L., and Ramon-Casas, M. (2011). “Variability in Vowel Production by Bilingual Speakers: Can Input Properties Hinder the Early Stabilization of Contrastive Categories?” Journal of Phonetics 39: 514–526.

Bradlow, A., Pisoni, D., Akahane-Yamada, R., and Tohkura, Y. (1997). “Training Japanese Listeners to Identify English /r/ and /l/: Some Effects of Perceptual Learning on Speech Production.” Journal of the Acoustical Society of America 101: 2299–2310.

Bybee, J. (2001). Phonology and Language Use (Cambridge: Cambridge University Press).

Caramazza, A., Yeni-Komshian, G., and Zurif, E. (1974). “Bilingual Switching: The Phonological Level.” Canadian Journal of Psychology 28: 310–318.

Chang, C. (2012). “Rapid and Multifaceted Effects of Second Language Learning on First-Language Speech Production.” Journal of Phonetics 40: 249–268.

Clarke, C., and Garrett, M. (2004). “Rapid Adaptation to Foreign Accented English.” Journal of the Acoustical Society of America 116: 3647–3658.

Cutler, A., Weber, A., and Otake, T. (2006). “Asymmetric Mapping from Phonetic to Lexical Representations in Second-Language Listening.” Journal of Phonetics 34: 269–284.

Darcy, I., Daidone, D., and Kojima, C. (2013). “Asymmetric Lexical Access and Fuzzy Lexical Representations.” The Mental Lexicon 8: 372–420.

Darcy, I., Dekydtspotter, L., Sprouse, R., Glover, J., Kaden, C., McGuire, M., and Scott, J. (2012). “Direct Mapping of Acoustics to Phonology: On the Lexical Encoding of Front Rounded Vowels in L1 English‒L2 French Acquisition.” Second Language Research 28: 5–40.

Diehl, R., Elman, J., and McCusker, S. (1978). “Contrast Effects on Stop Consonant Identification.” Journal of Experimental Psychology: Human Perception and Performance 4: 599–609.

Dijkstra, T. (2007). “The Multilingual Lexicon.” In Handbook of Psycholinguistics, edited by G. Gaskell, pp. 251–265. Oxford: Oxford University Press.

Eimas, P., and Corbit, J. (1973). “Selective Adaptation of Linguistic Feature Detectors.” Cognitive Psychology 4: 99–109.

Elman, J., Diehl, R., and Buchwald, S. (1977). “Perceptual Switching in Bilinguals.” Journal of the Acoustical Society of America 62: 971–974.

Escudero, P. (2005). “Linguistic Perception and Second Language Acquisition: Explaining the Attainment of Optimal Phonological Categorization.” PhD dissertation, University of Utrecht, The Netherlands. LOT Dissertation Series, 131.

Flege, J. (1987). “The Production of ‘New’ and ‘Similar’ Sounds in a Foreign Language: Evidence for the Effect of Bilingual Classification.” Journal of Phonetics 15: 47–65.

Flege, J. (1995). “Second Language Speech Learning: Theory, Findings and Problems.” In Speech Perception and Linguistic Experience: Issues in Cross-Language Research, edited by W. Strange, pp. 229–273. Timonium, MD: York Press.

Flege, J. (2007). “Language Contact in Bilingualism: Phonetic System Interactions.” In Laboratory Phonology 9, edited by J. Cole and J. I. Hualde, pp. 353–380. Berlin: Mouton de Gruyter.

Flege, J., and Eefting, W. (1987). “Cross-Language Switching in Stop Consonant Perception and Production by Dutch Speakers of English.” Speech Communication 6: 185–202.

Flege, J., Schirru, C., and MacKay, I. (2003). “Interaction between the Native and Second Language Phonetic Subsystems.” Speech Communication 40: 467–491.

Flege, J., Yeni-Komshian, G., and Liu, S. (1999). “Age Constraints on Second Language Acquisition.” Journal of Memory and Language 41: 78–104.

Fowler, C., Sramko, V., Ostry, D., Rowland, S., and Hallé, P. (2008). “Cross-Language Phonetic Influences on the Speech of French‒English Bilinguals.” Journal of Phonetics 36: 649–663.

García-Sierra, A., Diehl, R., and Champlin, C. (2009). “Testing the Double Phonemic Boundary in Bilinguals.” Speech Communication 51: 369–378.

García-Sierra, A., Ramírez-Esparza, N., Silva-Pereyra, J., Siard, J., and Champlin, C. (2012). “Assessing the Double Phonemic Representation in Bilingual Spanish and English Speakers: An Electrophysiological Study.” Brain & Language 121: 194–205.

Gonzales, K., and Lotto, A. (2013). “A Bafri, un Pafri: Bilinguals’ Pseudoword Identifications Support Language-Specific Phonetic Systems.” Psychological Science 24: 2135–2142.

Grosjean, F. (2001). “The Bilinguals’ Language Modes.” In One Mind, Two Languages: Bilingual Language Processing, edited by J. Nicol, pp. 1–22. Oxford: Blackwell Publishing.

Hayes-Harb, R., and Masuda, K. (2008). “Development of the Ability to Lexically Encode Novel Second Language Phonemic Contrasts.” Second Language Research 24: 5–33.

Hazan, V., and Boulakia, G. (1993). “Perception and Production of a Voicing Contrast by French‒English Bilinguals.” Language & Speech 36: 17–38.

Iverson, P., Hazan, V., and Bannister, K. (2005). “Phonetic Training with Acoustic Cue Manipulations: A Comparison of Methods for Teaching English /r/‒/l/ to Japanese Adults.” Journal of the Acoustical Society of America 118: 3267–3278.

Ju, M., and Luce, P. (2004). “Falling on Sensitive Ears: Constraints on Bilingual Lexical Activation.” Psychological Science 15: 314–318.

Lado, R. (1957). Linguistics across Cultures: Applied Linguistics for Language Teachers (Ann Arbor: University of Michigan Press).

Lisker, L., and Abramson, A. (1964). “A Cross-Language Study of Voicing in Initial Stops: Acoustical Measurements.” Word 20: 384–422.

Lively, S., Logan, J., and Pisoni, D. (1993). “Training Japanese Listeners to Identify English /r/ and /l/, II: The Role of Phonetic Environment and Talker Variability in Learning New Perceptual Categories.” Journal of the Acoustical Society of America 94: 1242–1255.

Lively, S., Pisoni, D., Akahane-Yamada, R., Tohkura, Y., and Yamada, T. (1994). “Training Japanese Listeners to Identify English /r/ and /l/, III: Long-Term Retention of New Phonetic Categories.” Journal of the Acoustical Society of America 96: 2076–2087.

Logan, J., Lively, S., and Pisoni, D. (1991). “Training Japanese Listeners to Identify English /r/ and /l/, I: A First Report.” Journal of the Acoustical Society of America 89: 874–886.

Magloire, J., and Green, K. (1999). “A Cross-Language Comparison of Speaking Rate Effects on the Production of Voice Onset Times in English and Spanish.” Phonetica 56: 158–185.

Mann, V. (1986). “Distinguishing Universal and Language-Dependent Levels of Speech Perception: Evidence from Japanese Listeners’ Perception of English ‘l’ and ‘r’.” Cognition 24: 169–196.

Marian, V., and Spivey, M. (2003a). “Bilingual and Monolingual Processing of Competing Lexical Items.” Applied Psycholinguistics 24: 173–193.

Marian, V., and Spivey, M. (2003b). “Competing Activation in Bilingual Language Processing: Within- and Between-Language Competition.” Bilingualism: Language and Cognition 6: 97–115.

McQueen, J., Cutler, A., and Norris, D. (2006). “Phonological Abstraction in the Mental Lexicon.” Cognitive Science 30: 1113–1126.

Mora, J., and Nadeu, M. (2012). “L2 Effects on the Perception and Production of a Native Vowel Contrast in Early Bilinguals.” International Journal of Bilingualism 16: 484–499.

Norris, D., McQueen, J., and Cutler, A. (2003). “Perceptual Learning in Speech.” Cognitive Psychology 47: 204–238.

Olson, D. (2013). “Bilingual Language Switching and Selection at the Phonetic Level: Asymmetrical Transfer in VOT Production.” Journal of Phonetics 41: 407–420.

Oyama, S. (1976). “A Sensitive Period for the Acquisition of a Phonological System.” Journal of Psycholinguistic Research 5: 261–283.

Pallier, C., Colomé, A., and Sebastián-Gallés, N. (2001). “The Influence of Native-Language Phonology on Lexical Access: Exemplar-Based versus Abstract Lexical Entries.” Psychological Science 12: 445–449.

Pardo, J. (2006). “On Phonetic Convergence during Conversational Interaction.” Journal of the Acoustical Society of America 119: 2382–2393.

Sancier, M., and Fowler, C. (1997). “Gestural Drift in a Bilingual Speaker of Portuguese and English.” Journal of Phonetics 25: 421–436.

Simonet, M. (2011). “Production of a Catalan-Specific Vowel Contrast by Early Spanish‒Catalan Bilinguals.” Phonetica 68: 88–110.

Simonet, M. (2014). “Phonetic Consequences of Dynamic Cross-Linguistic Interference in Proficient Bilinguals.” Journal of Phonetics 43: 26–37.

Spivey, M., and Marian, V. (1999). “Cross Talk between Native and Second Languages: Partial Activation of an Irrelevant Lexicon.” Psychological Science 10: 281–284.

Trubetzkoy, N. ([1939] 1969). “Grundzüge der Phonologie.” In Principles of Phonology, Travaux du cercle linguistique de Prague 7, translated by C. Baltaxe. Berkeley: University of California Press.

Weber, A., and Cutler, A. (2004). “Lexical Competition in Non-native Spoken-Word Recognition.” Journal of Memory and Language 50: 1–25.

Williams, L. (1977). “The Perception of Stop Consonant Voicing by Spanish‒English Bilinguals.” Perception & Psychophysics 21: 289–297.


(1) Interestingly, recent studies have found that, under at least some circumstances, cross-language interactions in bilinguals are mutual, so that the sounds of the first language may be impacted upon learning the sounds of the second language (e.g., Chang, 2012). The phonology of the native language is, therefore, not immune to cross-language interactions.

(2) For further theorization, see Darcy et al., 2012, 2013.

(3) An alternative interpretation might be that training provided listeners with a sufficiently large set of /r/s and /l/s produced by different speakers in different phonetic contexts and words, which made the bimodal (rather than seeming unimodal) distribution of /r/ and /l/ become apparent. This view does not require “updating” the link between native and non-native categories.

(4) Speakers of some English dialects pronounce <r> with a tap rather than an approximant and thus this would not be true of those speakers. The speakers in Gonzales and Lotto (2013), who live in Arizona, pronounce <r> as an approximant, not a tap. These speakers flap some of their /t/s and /d/s, but in this experiment the tap/flap was associated with <r>, not <t> or <d>.