Phonotactics and Syllable Structure in Infant Speech Perception
Abstract and Keywords
Phonotactics and syllable structure form an integral part of phonological competence and may be used to discover other aspects of language. Given the importance of such knowledge to the process of language acquisition, numerous studies have investigated the development of phonotactic and syllabic knowledge in order to determine when infants become sensitive to these sound patterns and how they may use this knowledge in language processing. Considering that infants’ first exposure to linguistic structures comes from speech perception, we provide an overview of the perception-related issues that have been investigated experimentally and point out issues that have not yet been addressed in the literature. We begin with phonotactic development, examining a wide range of sound patterns, followed by a discussion of the acquisition of syllable structure and a brief summary of various outstanding issues that may be of interest to the reader, including production-related investigations and phonological modeling studies.
Phonotactics and syllable structure form an integral part of phonological competence and may be used to discover other aspects of language. Learners who are equipped with knowledge of which phonotactic structures are allowed in the native language may apply this knowledge during online language processing to locate potential word boundaries, which in turn helps them build the lexicon (Mattys and Jusczyk 2001b). Learners may also simultaneously use their knowledge about distributions of speech segments to classify lexical items into different word categories, such as nouns and verbs (Onnis and Christiansen 2008; Christiansen et al. 2009; Lany and Saffran 2010). Hence, given the importance of such knowledge to the process of language acquisition, numerous studies have investigated the development of phonotactic and syllabic knowledge in infancy and attempted to determine the point in development at which infants become sensitive to sound patterns or how knowledge of phonotactics and the syllable could be used as a strategy to solve the word segmentation problem (among others, Friederici and Wessels 1993; Jusczyk et al. 1993a, 1994; Chambers et al. 2003, 2011; Zamuner 2003, 2006, 2009a, 2009b; Seidl and Buckley 2005; White et al. 2009).
The present chapter aims to introduce the reader to the relevant literature on the development of phonotactics and syllable structure. Considering that infants’ first exposure to linguistic structures comes from speech perception, our main goal is to provide an overview of the perception-related issues that have been investigated experimentally and to point out those questions that have not yet been addressed in the literature. We begin with phonotactic development, examining a wide range of sound patterns, followed by a discussion of the acquisition of syllable structure and a brief summary of various outstanding issues that may be of interest to the reader, including production-related investigations and phonological modeling studies.
The term phonotactics refers to language-specific restrictions on the sequencing of speech sounds (Haugen 1956a; Hill 1958). For example, while English words are allowed to end in ŋ (as in ‘sing’), the ŋ is phonotactically illegal in word-initial position, as no English word may begin with this sound (Whorf 1940). The same constraint is not found in other languages where words are allowed to begin with ŋ. Hopi, for example, has the words ŋɨmni ‘flour’ and ŋɨhɨ ‘medicine’ (Jeanne 1992). Such phonotactic patterns are often described by referring to the notion of the syllable (e.g. prohibition against specific onsets; Haugen 1956b; Fudge 1969) and can be classified as “absolute” versus “probabilistic” or “first-order” versus “second-order” (Chambers et al. 2011).
Absolute phonotactic patterns (also called “categorical phonotactic constraints”) refer to those restrictions that are never violated in a given language. Absolute patterns may involve either individual segments (e.g. the constraint against the word-initial ŋ in English) or sequences of sounds (e.g. the constraint against the word-initial sequence bn in English). Cross-linguistic differences in the distribution of segments have long received attention in the literature (Haugen 1951; Saporta and Olson 1958), and distributional analyses of where phonemes can and cannot occur have been used as a tool and diagnostic for identifying phonemic inventories (Hill 1958). In contrast to the absolute phonotactic constraints, probabilistic phonotactic patterns refer to the statistical likelihood of a sound or a sequence of sounds occurring in a specific environment (e.g. in languages such as Hopi, the ŋ may have different likelihoods of occurring word-initially versus word-finally). Such descriptions of the relative frequency of segments have a long tradition in linguistic literature, dating back to the Prague school of linguistics (Zipf 1935; Trubetzkoy 1939/1969; Saporta 1955; Keller and Saporta 1957). With respect to the “order” factor, first-order patterns typically involve restrictions on the position of a segment or a feature within a syllabic frame (e.g. a [+voiced] segment cannot be in a coda regardless of its preceding or following environment). Second-order patterns refer to positional restrictions that are dependent on another property, such as the type of preceding or following segment (e.g. a fricative can only be in coda position if preceded by a high vowel). As first-order patterns do not depend on the appearance of any other segment or feature, they can be viewed as less complex than second-order patterns (Chambers et al. 2011). In the following sections, we review the relevant literature that has examined the acquisition of these types of phonotactic knowledge.
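The first-order versus second-order distinction can be made concrete with a small sketch. The Python below is illustrative only: the toy segment classes (VOICED, FRICATIVES, HIGH_VOWELS), the representation of syllables as ASCII CVC strings, and the function names are assumptions introduced here, not part of any study cited above. It simply encodes the two hypothetical restrictions given as examples in the text.

```python
# Toy segment classes, assumed for illustration (not a full feature system).
VOICED = set("bdgvz")
FRICATIVES = set("fvsz")
HIGH_VOWELS = set("iu")


def violates_first_order(syllable: str) -> bool:
    """First-order restriction: a [+voiced] segment may not be a coda,
    regardless of what precedes it."""
    coda = syllable[-1]
    return coda in VOICED


def violates_second_order(syllable: str) -> bool:
    """Second-order restriction: a fricative coda is allowed only
    when the preceding vowel is high."""
    vowel, coda = syllable[-2], syllable[-1]
    return coda in FRICATIVES and vowel not in HIGH_VOWELS


for syl in ["pat", "pad", "pis", "pas"]:
    print(syl, violates_first_order(syl), violates_second_order(syl))
```

Note that the first-order check inspects a single position, whereas the second-order check must consult two positions at once, which is one way of seeing why second-order patterns can be regarded as more complex.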
3.2.1 Absolute Phonotactics
As described above, absolute phonotactic patterns are compulsory restrictions on individual segments or sequences of speech sounds. One of the first studies looking at the acquisition of such patterns was Friederici and Wessels (1993), who tested Dutch infants’ sensitivity to legal versus illegal onset and offset sequences in Dutch (e.g. sɡ, rt). The sound sequences were embedded in non-words, with phonotactically illegal stimuli created by reversing legal onsets and offsets. In a series of studies using the head-turn preference procedure (HPP), 9-month-old Dutch-learning infants preferred to listen to those lists that had legal onsets and offsets, whereas 4- and 6-month-old infants showed no significant listening preference for either the legal or the illegal list. Crucially, 9-month-olds also showed no preference for either list when the stimuli were low-pass filtered to remove the segmental content from the signal, demonstrating that the listening preference for the phonotactically legal stimuli was driven by their segmental properties. Friederici and Wessels offered two possible accounts of their results. One interpretation of the findings was that they reflected infants’ knowledge about the frequency of segmental sequences in different prosodic positions. The other possibility was that the observed results reflected more abstract or rule-governed phonotactic knowledge. Although these alternative interpretations concern the issue of the level of abstraction in learners’ phonotactic knowledge and Friederici and Wessels’ study was not designed to address this question, the authors suggested that “frequency of occurrence is the first ground upon which to build up initial language-specific knowledge” (1993: 297) and that more abstract or rule-governed knowledge emerges from exposure to such patterns.
Friederici and Wessels demonstrated that there were differences in sensitivity to phonotactic regularities at different ages (no effect at 4 and 6 months, significant preference for legal phonotactic patterns at 9 months). This finding indicates that some type of learning of phonotactic patterns has taken place. However, when interpreting these results and considering how they might generalize to learners of other languages, it is important to keep in mind that phonotactic learning is traditionally thought to be dependent on language-specific knowledge of the native phonemic inventory and how these phonemes are allowed to combine, yet learners may also display sensitivity to certain types of phonotactic patterns without necessarily experiencing them in the ambient language. In other words, infants may prefer one pattern over another based on more language-general processing abilities (where sensitivity to different structures may emerge at different stages in development) or innate linguistic knowledge (which may reflect factors such as phonetic knowledge and/or knowledge of how speech is perceived; Hayes and Steriade 2004). Crucially, an important limitation of Friederici and Wessels’ stimuli is that through the juxtaposing of legal onset and offset clusters they created syllables that were ill-formed not only for Dutch but also cross-linguistically. For example, while the onset rt is phonotactically illegal in Dutch, it also violates the Sonority Sequencing Principle (Steriade 1982; Selkirk 1984; Clements 1990) and it goes against the tendencies that are found cross-linguistically (Kawasaki-Fukumori 1992). Hence, the stimuli were not only phonotactically illegal for Dutch but also marked cross-linguistically and the observed preference for the legal phonotactic patterns could have been driven by language-general knowledge rather than learning of the patterns in the ambient language. 
As such, the case for language-specific phonotactic learning would be much stronger if differences in learners’ performance were found depending on the learners’ input. For example, it would be revealing to examine the development of phonotactic knowledge in a group of infants acquiring a language like Russian, which has syllables that violate the Sonority Sequencing Principle (e.g. rta ‘mouth,’ bobr ‘beaver’). However, very few studies have looked at phonotactic acquisition and directly compared infants’ performances across languages. These investigations, described in this chapter, have generally found that a wide range of factors comes into play in the acquisition of phonotactics.
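To illustrate why reversed clusters such as onset rt are marked independently of Dutch, the following sketch checks a monosyllable against a simplified version of the Sonority Sequencing Principle. The six-step sonority scale, the ASCII segment inventory, and the function name obeys_ssp are assumptions made for this example; published sonority scales differ in their details, and the sketch treats sonority plateaus as violations and ignores well-known exceptions such as s+stop clusters.

```python
# Assumed toy sonority scale: stops < fricatives < nasals < liquids < glides < vowels.
SONORITY = {}
for segments, rank in [("ptkbdg", 1), ("fvsz", 2), ("mn", 3),
                       ("lr", 4), ("wj", 5), ("aeiou", 6)]:
    for seg in segments:
        SONORITY[seg] = rank


def obeys_ssp(syllable: str) -> bool:
    """Return True if sonority rises strictly up to the peak (the vowel)
    and falls strictly after it. Assumes a single-vowel syllable."""
    values = [SONORITY[seg] for seg in syllable]
    peak = values.index(max(values))
    rising = all(a < b for a, b in zip(values[:peak], values[1:peak + 1]))
    falling = all(a > b for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling


for form in ["tra", "rta", "art", "bobr"]:
    print(form, obeys_ssp(form))
```

Under these assumptions the Russian forms rta and bobr are flagged as violations, while the same clusters in the opposite syllable position (tra, art) are not.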
Almost ten years after the original study, Sebastian-Galles and Bosch (2002) extended Friederici and Wessels’ (1993) results to a group of 10-month-old infants from monolingual Catalan families, monolingual Spanish families, and Catalan-Spanish bilingual households. Catalan and Spanish have different phonotactic restrictions on word-final consonants. Catalan permits word-final consonant clusters (e.g. ve[rt] ‘green (masculine)’; Wheeler 2005), while Spanish is more restrictive and does not allow final clusters. Sebastian-Galles and Bosch tested both monolingual Catalan and monolingual Spanish infants on their preference for words that ended in consonant clusters that were either legal or illegal according to Catalan phonotactics (legal: birt, dort, gurt, nast; illegal: ketr, datl, bitl, bepf). If the two lists were discriminated on the basis of language-general knowledge or perceptual saliency rather than experience with the ambient language, both groups of infants may be expected to show a preference for the legal phonotactic lists. However, Sebastian-Galles and Bosch found that only the Catalan-learning infants showed a listening preference for the legal list over the illegal list and that Spanish monolingual infants did not prefer either list (note, however, that the exact interpretation of the Catalan results remains a matter of debate since the illegal clusters violated both Catalan phonotactics and the Sonority Sequencing Principle).
Jusczyk et al. (1993a) is another central study on phonotactic acquisition, which set out to establish when learners show evidence of knowing language-specific, absolute phonotactic patterns. English- and Dutch-learning infants were presented with lists of either English or Dutch words that contained phonemes found in both languages, but the legality of sequencing of these phonemes differed across English and Dutch. For example, the Dutch word zweten ‘to sweat’ begins with a zw cluster. Although both z and w are phonemes of English and Dutch alike, only Dutch allows zw word-initially. Jusczyk and his colleagues found that infants at 9 months of age, but not at 6 months, listened longer to word lists from their ambient language. When the segmental information was removed by using low-pass filtered stimuli, infants no longer showed any preference for the stimuli from their native language. The authors concluded that knowledge of language-specific phonotactics emerges around 9 months of age. Note, however, that although the lists were controlled to contain phonemes that are found in both languages, the frequency and acoustic patterns of the phonemes in question differ across English and Dutch. This opens the possibility that the lists may also have been distinguished on the basis of factors other than phonotactics (see Zamuner 2006: 81). Furthermore, one question the authors raise is how infants acquire this phonotactic knowledge. Although phonotactic knowledge is characterized as patterns that are abstracted across the learner’s lexicon, this does not seem plausible given that infants at 9 months have small lexicons. Instead, infants’ phonotactic knowledge likely reflects sublexical levels of representation (Jusczyk et al. 1993a; Mattys et al. 1999), which is also supported by the findings of the studies on the acquisition of phonotactic probabilities that are described in the next section.
A recent study by Archer and Curtin (2011) explored whether 6- and 9-month-old English-learning infants showed sensitivity to the type and token frequency of legal versus illegal onset clusters (e.g. kla, bla, tla). The study revealed that infants at 6 months of age were not sensitive to either the type or the token frequency of onset clusters. In contrast, 9-month-old infants showed sensitivity to type but not token frequencies of onset clusters. Archer and Curtin proposed that the lack of sensitivity to phonotactic patterns at 6 months is related to the lack of sensitivity to type frequencies and that the emergence of phonotactic knowledge at 9 months reflects a newly developed sensitivity to type frequencies, which underpins the learner’s ability to determine which sound combinations are legal and illegal in the language the learner is acquiring. The importance of type but not token frequency is also consistent with usage-based theories of language (e.g. Bybee 2001) which use type frequency rather than token frequency to predict the learnability of a given pattern.
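The difference between type and token frequency can be shown with a short worked example. The mini-corpus below, its occurrence counts, and the helper onset_cluster are all invented for illustration and are not taken from Archer and Curtin (2011) or from any real corpus. Type frequency counts how many distinct words contain an onset cluster; token frequency counts how often the cluster is heard overall.

```python
from collections import Counter

# Hypothetical mini-corpus: (word, number of occurrences). Counts are invented.
corpus = [("klin", 2), ("klap", 1), ("klot", 1), ("blik", 50)]


def onset_cluster(word: str, vowels: str = "aeiou") -> str:
    """Return the consonants that precede the first vowel of the word."""
    for i, ch in enumerate(word):
        if ch in vowels:
            return word[:i]
    return word


type_freq, token_freq = Counter(), Counter()
for word, count in corpus:
    onset = onset_cluster(word)
    type_freq[onset] += 1        # each distinct word counts once
    token_freq[onset] += count   # every occurrence counts

print("type frequency:", dict(type_freq))
print("token frequency:", dict(token_freq))
```

In this toy data kl has the higher type frequency (three words versus one), whereas bl has the higher token frequency (fifty occurrences versus four), so the two measures rank the clusters in opposite orders. An unattested cluster such as tl has a count of zero on both measures.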
Other factors, such as the acoustic saliency of a phonotactic pattern, may also interact with the acquisition of phonotactics. In a study by Narayan et al. (2010), English-learning and Filipino-learning infants were tested on two contrasts which differed in their acoustic saliency and their cross-linguistic phonotactic legality. The first contrast was ma ~ na, which is both acoustically salient and phonotactically legal in English and Filipino alike (Narayan 2008). The second contrast was na ~ ŋa, which is less salient and also involves an important cross-linguistic difference. While English has both n and ŋ in its sound inventory, only n can occur word-initially. In Filipino, both sounds are phonotactically legal in word-initial position. Narayan and colleagues found that English-learning infants showed discrimination of ma ~ na at 6–8 and 10–12 months of age (Filipino infants were not tested on this contrast). At the same time, both English and Filipino infants failed to discriminate the na ~ ŋa contrast at 6–8 months, and only Filipino-learning infants discriminated the na ~ ŋa contrast at 10–12 months. Although one would expect the contrast to be discriminated by both groups of infants at 6–8 months (based on studies showing that language-specific phonemic contrasts are acquired later; e.g. Werker and Tees 1984), the less salient contrast appears to be discriminated only with linguistic experience. Thus, the learning of a phonotactic pattern may also vary depending on its acoustic saliency. Other potential limitations on the types of phonotactic patterns that learners can acquire are discussed in sections 3.2.2–3.2.6.
3.2.2 Probabilistic Phonotactics
The notion of phonotactic probability refers to the statistical likelihood that a sound or a sequence of sounds will occur in a given environment (e.g. Vitevitch and Luce 2004). In a study analogous to the works looking at absolute phonotactics, Jusczyk et al. (1994) examined whether infants are sensitive to the statistical frequency of sound patterns that are all legal in the ambient language. English-learning infants were tested on their listening preferences for non-words which contained either frequent or infrequent sound combinations. Jusczyk and colleagues found that infants at 9 months of age listened longer to a list of non-words of high phonotactic probability (fʌl, kit, mep) than to lists of non-words of low probability (ðʌʃ, zidʒ, θeθ). In contrast, 6-month-old infants displayed no preference for either list (also, see Zamuner 2003 who extended Jusczyk et al.’s findings to 7-month-old English-learning infants and found a similar preference for high phonotactic probabilities). These findings indicate that infants in the second half of their first year of life are sensitive to the distribution of sound patterns in the ambient language. This type of knowledge may then be used in word segmentation and the “development of word recognition abilities and the organization of the mental lexicon” (Jusczyk et al. 1994: 642). As mentioned earlier, studies on phonotactic acquisition have not necessarily addressed the question of whether phonotactic knowledge reflects specific knowledge about the frequency of phonotactic patterns in a language or, alternatively, more abstract or rule-governed phonotactic knowledge. Given that Jusczyk and colleagues demonstrated that infants are in fact sensitive to the statistical frequency of sound patterns, their results suggest that, at this stage in development, phonotactic knowledge is likely to be more specific than abstract (however, see Chambers et al. 2011, who showed that infants are able to generalize certain first-order patterns and may therefore have access to abstract knowledge).
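One simple way to operationalize phonotactic probability is to sum, over the positions of a word, the proportion of lexical items that have the same segment in the same position. The sketch below follows that idea under strong simplifying assumptions: the five-word lexicon is invented, words are ASCII strings with one character per segment, and the functions positional_probs and score are hypothetical names. It is not the procedure used by Jusczyk et al. (1994) or the calculator described by Vitevitch and Luce (2004), which also take biphone probabilities and frequency weighting into account.

```python
from collections import Counter, defaultdict

# Invented toy lexicon; one character stands for one segment.
lexicon = ["kat", "kit", "kap", "mat", "zug"]


def positional_probs(words):
    """For each position, the proportion of words with a given segment there."""
    counts = defaultdict(Counter)
    totals = Counter()
    for word in words:
        for pos, seg in enumerate(word):
            counts[pos][seg] += 1
            totals[pos] += 1
    return {pos: {seg: n / totals[pos] for seg, n in segs.items()}
            for pos, segs in counts.items()}


def score(word, probs):
    """Sum of positional segment probabilities; unattested segments add 0."""
    return sum(probs.get(pos, {}).get(seg, 0.0) for pos, seg in enumerate(word))


probs = positional_probs(lexicon)
for item in ["kat", "zug", "zip"]:
    print(item, round(score(item, probs), 2))
```

In this toy lexicon a form built from frequent positional segments (kat) receives a higher score than one built from rare ones (zug), mirroring the high- versus low-probability contrast in the stimulus lists described above.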
Another study by Mattys et al. (1999) found that infants are sensitive to the occurrence of consonant+consonant (CC) sequences within and across word boundaries. The CC clusters in their stimuli all had roughly the same type and token frequency in English (e.g. ŋk, ft, ŋt, fh). However, the frequency of the consonantal sequences varied within and across word boundaries. Some clusters had a high probability of occurrence within word boundaries but a low probability of occurrence across word boundaries (e.g. nɔŋ.kʌθ, zuf.tʌdʒ). Other clusters had a low probability within word boundaries but a high probability across word boundaries (e.g. nɔŋ.tʌθ, zuf.hʌdʒ). Infants listened longer to non-words containing high probability within-word clusters. Thus, not only do infants demonstrate knowledge about the frequency of sound patterns, but they are also sensitive to how sound combinations are distributed both within and across word boundaries (see also Gonzalez-Gomez and Nazzi 2016 for evidence that infants are capable of tracking probabilistic patterns that involve non-adjacent segments). To explain the results for between-word patterns, Mattys et al. argued that learners’ phonotactic knowledge cannot be based exclusively on individual, stored lexical items, since infants also showed sensitivity to between-word sequences.
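The within-word versus across-word distinction can be sketched by counting consonant clusters separately inside words and at word junctures. Everything in the example below is an assumption made for illustration: the four toy utterances, the ASCII transcription (with N standing in for ŋ), and the function within_word_share. The counts bear no relation to real English statistics or to the materials of Mattys et al. (1999).

```python
from collections import Counter

VOWELS = set("aeiou")

# Invented utterances, each a list of toy transcribed words ("N" stands for ŋ).
utterances = [["taNk", "lift"], ["gaf", "haus"],
              ["taN", "tip"], ["siNk", "gaf", "hat"]]

within, across = Counter(), Counter()
for words in utterances:
    for word in words:
        # Consonant+consonant pairs inside a single word.
        for a, b in zip(word, word[1:]):
            if a not in VOWELS and b not in VOWELS:
                within[a + b] += 1
    for left, right in zip(words, words[1:]):
        # Consonant+consonant pairs that straddle a word boundary.
        a, b = left[-1], right[0]
        if a not in VOWELS and b not in VOWELS:
            across[a + b] += 1


def within_word_share(cluster):
    """Proportion of a cluster's occurrences that fall inside a word,
    or None if the cluster is unattested in the toy data."""
    total = within[cluster] + across[cluster]
    return within[cluster] / total if total else None


for cluster in ["Nk", "ft", "Nt", "fh"]:
    print(cluster, within_word_share(cluster))
```

A learner with access to this kind of statistic could treat clusters with a low within-word share as likely word boundaries, which is the kind of cue discussed in section 3.2.4.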
3.2.3 Acquisition of Novel Patterns
In more recent years, research on phonotactic acquisition has looked beyond infants’ sensitivity to phonotactic patterns that are legal versus illegal or frequent versus infrequent in the ambient language and has started investigating the learning mechanisms that are involved in the acquisition of phonotactics, the type of information that is relevant for learners, and the way learners might begin to analyze, represent, and generalize phonotactic knowledge. Such studies aim to determine what kinds of novel first- and second-order phonotactic patterns can be learned by infants during a brief familiarization stage and how the learnability of such patterns is affected by phonetic naturalness, the phonemic status of segments, and infants’ age.
Using the head-turn preference procedure (HPP), Chambers et al. (2003) explored infants’ ability to track an arbitrary pattern involving positional restrictions on initial and final consonants. English-learning 16.5-month-olds were familiarized with C1VC2 non-word items in which initial and final consonants were drawn from two different segmental sets that could not be characterized using a single phonetic feature (e.g. b, k, m, t, f as C1; p, g, n, tʃ, s as C2). During testing, infants were presented with novel words that followed either the familiarized pattern (C1VC2) or a new pattern in which initial and final consonants were reversed (C2VC1). Infants showed a preference for the juxtaposed non-words, confirming that they were capable of tracking an arbitrary first-order phonotactic pattern that was not present in their ambient language (for comparable findings for 10.5- and 16.5-month-olds, see Chambers et al. 2011). However, it remained to be determined whether infants acquired the complex restrictions on both word-initial and word-final consonants or whether the observed effect was due primarily to their ability to detect word-initial or word-final patterns alone (see, among others, Saffran and Thiessen 2003; Weitzman 2007).
Further evidence for infants’ ability to learn first- and second-order patterns came from Seidl and Buckley (2005), who tested whether 8.5- to 9.5-month-old infants could learn phonetically natural and phonetically unnatural, arbitrary phonotactic patterns involving manner and place of articulation. Infants were exposed to sound patterns that were either common (unmarked) or rare (marked) in the world’s languages. For manner of articulation, the unmarked pattern involved words containing intervocalic fricatives and affricates but no intervocalic stops (e.g. pasat, mitʃa). The marked pattern involved items with word-initial fricatives and affricates and intervocalic stops (e.g. sapat). For place of articulation, the unmarked pattern was the co-occurrence of labial consonants with round vowels and coronal consonants with front vowels (e.g. vogo, sike). The marked pattern involved co-occurrence of labial consonants with high vowels and coronal consonants with mid vowels (e.g. vigo, soke). Infants did not show any significant preference for phonetically natural sequences for either the first-order pattern involving manner of articulation or the second-order pattern involving place of articulation. Seidl and Buckley concluded that infants were capable of acquiring both marked and unmarked patterns and that there was no specific preference for phonetic naturalness in the grammars of 9-month-old learners. However, the proposed interpretation of the findings is based on the assumption that intervocalic affrication (i.e. stops becoming affricates in intervocalic positions) is phonetically natural and that infants define both fricatives and affricates as [+continuant] and oral and nasal stops as [–continuant], which may or may not be the case. Seidl and Buckley also acknowledged that, for consonantal manner, infants could have learned that word-initial segments must be stops rather than learning that word-medial segments must be fricatives and affricates.
For place features, potentially conflicting cues that could have affected infants’ ability to recognize the pattern were present in the data as phonetic naturalness was only manipulated in the first syllable of bisyllabic items (e.g. infants were exposed to items such as sidu in which coronal segments in the second syllable were not followed by front vowels).
In contrast to the studies discussed earlier in this section, some previous investigations showed that infants were not able to learn certain types of phonotactic patterns. Saffran and Thiessen (2003), for example, found that English-learning 9-month-old infants failed to acquire abstract phonotactic regularities involving phonetically unnatural classes of segments. Saffran and Thiessen examined whether infants could acquire first-order restrictions on the occurrence of voiced and voiceless consonants in disyllables (voiceless p, t, k in onsets, voiced b, d, ɡ in codas, and vice versa; e.g. todkad and dakdot) and whether infants could also learn a similar pattern that did not involve sets of sounds that could be defined as a natural class using a single phonological feature (p, d, k in onsets, b, t, ɡ in codas, and vice versa; e.g. dotkat and taktod). Infants acquiring the pattern involving a single feature listened longer to the lists which did not follow the familiarized pattern, which is a novelty preference. Given the complementary distribution of voiced and voiceless consonants in the single feature pattern, it remained unclear what kind of regularities the infants were actually tracking (e.g. they could have learned that word-initial segments had to be voiceless and could have disregarded the voicing of coda consonants and/or word-medial onsets). However, those infants who were familiarized with arbitrary sets of segments that could not be characterized with a single feature failed to acquire the pattern. Saffran and Thiessen also noted the role of age, since their findings contrasted with the results obtained by Chambers et al. (2003) for 16.5-month-olds who showed successful acquisition of an arbitrary pattern (note, however, that recent work by Chambers et al. 2011 extended their 2003 findings to 10.5-month-old infants).
The suggestion that age may have an important effect on infants’ ability to acquire phonotactic patterns has also been supported by White et al. (2009). White and colleagues explored the ability of English-learning 8.5- and 12-month-old infants to acquire phonotactic patterns involving obstruent voicing. Infants were familiarized with datasets involving stop or fricative voicing alternations at word boundaries (e.g. initial consonants were voiceless after voiceless segments, and voiced elsewhere, as in rot#pevi ~ na#bevi). While both 8.5- and 12-month-olds could learn the voicing pattern, younger infants appeared to rely on transitional probabilities of segments, whereas older infants seemed to group segments into functional categories (although the relevant interaction was not statistically significant). Similar developmental changes in the acquisition of phonotactic patterns have also been reported in the work by Cristià and colleagues (Cristià and Seidl 2008; Cristià et al. 2011). Cristià and Seidl (2008) trained 7-month-old infants on a phonotactic pattern involving nasals and oral stops or a pattern involving nasals and fricatives. Only the infants trained on the pattern involving nasals and stops were able to generalize the phonotactic regularity to novel test items. At the same time, Cristià et al. (2011) showed that 4-month-old infants were able to generalize both patterns when tested on the same stimuli. This suggests that infants’ sensitivity to phonotactic regularities becomes more focused as their exposure to the ambient language continues to increase.
Finally, Seidl et al. (2009) investigated whether the phonemic status of segments could also affect the acquisition of phonotactic patterns. Seidl and colleagues familiarized 11-month-old infants from French Canadian families and 4- and 11-month-old infants from English-speaking households with a second-order phonotactic pattern in which the selection of C2 in C1VC2 items was dependent on the type of the preceding vowel (e.g. oral vowels were followed by stops, nasal vowels were followed by fricatives). During testing, French Canadian infants (for whom vowel nasality was phonemically salient) showed a head-turn preference for the items that did not follow the familiarized pattern and were also capable of transferring the familiarized pattern to a novel set of vowels. In contrast, English-learning 11-month-olds (for whom vowel nasality was not phonemically relevant) did not acquire the phonotactic regularity. English-learning 4-month-olds (who have had limited experience with the phonological system of English) patterned like the French-learning 11-month-old infants and learned the phonotactic pattern, displaying a familiarity preference for items that followed the original pattern. Thus, by 11 months, phonotactic learning also appears to be constrained by the phonemic status of segments, with less attention given to contrasts that are allophonic in the ambient language.
3.2.4 Phonotactics and Word Segmentation
In the past few years, several works have looked at learner’s ability to use phonotactic knowledge in word segmentation in order to determine whether infants are able to apply statistical knowledge of a language’s phoneme distributions and phonemic transitions to discover word boundaries. Mattys and Jusczyk (2001b), for example, tested 9-month-old infants on their ability to segment a CVC word from fluent speech. The target words varied on whether they were preceded and/or followed by good versus poor phonotactic cues to a word boundary. The word ‘gaffe,’ for example, creates the clusters nɡ and fh when embedded in the sequence ‘The old pine gaffe house tends to break too often.’ When ‘gaffe’ is embedded in ‘The old tong gaffe tends to break too often,’ this creates the clusters ŋɡ and ft. While the frequency of these four clusters is approximately the same in English, they vary in how often they occur across versus within word boundaries. The clusters nɡ and fh are considered good phonotactic cues to a word boundary because nɡ and fh do not tend to occur in the same word in English. In contrast, ŋɡ and ft are poor phonotactic cues to a word boundary because in English these clusters tend to be found within a word. Mattys and Jusczyk found that infants segmented the words when they were preceded by good phonotactic cues that aligned with across-word boundaries. Similarly, at the level of syllables, work using artificial languages (Saffran et al. 1996a, 1996b) found that infants are more likely to posit a word boundary at the location of a low probability transition (indicative of a word boundary) than a high probability transition (indicative of within-word sequences).
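The logic of segmenting at low-probability transitions can be sketched as follows. The example is a toy reconstruction under stated assumptions rather than the procedure or materials of Saffran et al. (1996a, 1996b): the three trisyllabic words, the fixed word order, the 0.75 threshold, and the function names tp and segment are all chosen here for illustration. The transitional probability of syllable y given syllable x is estimated as the number of times x is followed by y divided by the number of times x is followed by anything.

```python
from collections import Counter

# Three toy trisyllabic words, concatenated without pauses in a fixed order.
WORDS = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
stream = [syl for w in "ABCACBA" for syl in WORDS[w]]

pair_counts = Counter(zip(stream, stream[1:]))   # counts of adjacent pairs (x, y)
first_counts = Counter(stream[:-1])              # how often x starts a pair


def tp(x, y):
    """Transitional probability P(y | x) estimated from the stream."""
    return pair_counts[(x, y)] / first_counts[x]


def segment(syllables, threshold=0.75):
    """Posit a word boundary wherever the transitional probability
    between two adjacent syllables falls below the (assumed) threshold."""
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tp(x, y) < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words


print(segment(stream))
```

In this stream every within-word transition has a probability of 1.0 and every across-word transition a probability of 0.5, so the threshold cleanly recovers the three words. Natural speech offers no such clean separation, which is one reason why the combination of phonotactic information with other cues has received so much attention.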
While a full review of the literature on word segmentation is beyond the aims of this chapter, it is worthwhile to note that a large amount of research has been dedicated to understanding how distributional information may be used in the word segmentation task (Church 1987; Brent 1999b) and how much can be gained on the basis of phonotactic cues alone versus when phonotactic information is combined with other cues, including knowledge of lexical stress (Christiansen et al. 1998), phonological cues (Onnis et al. 2005) or universal phonotactic constraints, such as the prior knowledge that well-formed words consist of a syllabic sound or a nucleus (Blanchard et al. 2010). For an in-depth discussion on word segmentation, the reader is referred to Chapter 8 by Louise Goyet, Séverine Millotte, Anne Christophe, and Thierry Nazzi in this volume.
3.2.5 Phonotactics and Lexical Acquisition
Beyond word segmentation, phonotactic knowledge is also known to be applied to the learning of lexical items (Saffran and Graf Estes 2006; Onnis and Christiansen 2008; Christiansen et al. 2009; Graf Estes 2009; Lany and Saffran 2010). Specifically, researchers have asked whether the phonological shape of a word has an impact on whether or not that word is acquired, that is, whether already established phonological knowledge can facilitate the acquisition of new lexical items. To address this question, studies have manipulated the phonotactic patterns of novel words and tested whether learners showed an advantage in acquisition depending on the type of phonotactic patterns involved. While most of this research has looked at older children (e.g. Messer 1967), some studies have examined the potential role of phonotactic patterns for lexical acquisition in infancy. Graf Estes et al. (2011), for example, showed that word learning is impacted by the phonological patterning of novel words, for infants as young as 19 months of age. In their study, infants learned non-words that conformed to English phonotactics (dref, sloob), but did not learn non-words with illegal phonotactic patterns (dlef, sroob). Friedrich and Friederici (2005) reported neurophysiological data that largely parallel the behavioral findings of Graf Estes and colleagues. Friedrich and Friederici compared the processing of phonotactically legal versus illegal non-words by 19-month-old German-learning infants. They found that infants showed an ERP component (N400) for phonotactically legal non-words, but not for illegal tokens. The N400 is known to be an indicator of semantic processing or semantic integration (Kutas and Hillyard 1980). Because an N400 was only found in the processing of non-words containing legal phonotactic patterns, this suggests that the degree of semantic integration of a word is influenced by the word’s phonological shape (Friederici 2005).
Furthermore, other work that has manipulated the phonotactic probabilities of non-word stimuli items found that young children are also better at learning non-words with frequent versus infrequent phonological patterns (e.g. Storkel 2001). Together, these studies demonstrate that the phonological properties of words impact lexical acquisition and provide a connection between phonotactics and lexical acquisition.
Experimental findings can vary depending on the methodology adopted by the researchers. For example, the role of phonotactic probabilities and neighborhood densities in the acquisition of real words and non-word stimuli has been investigated either by looking at corpora of child-directed speech or children’s first word productions (Coady and Aslin 2003; Storkel 2009; Zamuner 2009b) or by subjecting learners to various experimental procedures (Hollich et al. 2002; Swingley and Aslin 2007). The corpus studies find a benefit for new words that overlap with already acquired words. In contrast, experimental studies find that the same words are at a disadvantage (see discussion on these issues by Saffran and Graf Estes 2006; Graf Estes 2009). As pointed out by Saffran and Graf Estes (2006), examinations of the relationship between phonological knowledge and lexical acquisition are still relatively recent, and many research questions are yet to be addressed in the literature (also see Stoel-Gammon 2011). For example, although work on infant speech perception has already examined learners’ sensitivities to phonotactically legal versus illegal patterns, experimental work on the production of phonotactically illegal structures is so far limited to a few older studies (e.g. Messer 1967).
3.2.6 Phonotactics and Prosodic/Word Domains
What many of the different types of phonotactic patterns discussed in the previous section have in common is that they refer to specific prosodic or word domains. For example, the restriction in English against ŋ in word-initial position can only be learned if ŋ is perceived correctly and its distribution is tracked across different environments. Therefore, any studies investigating the acquisition of phonotactics must consider potential differences in infants’ discrimination abilities as well as their ability to learn patterns that occur in different prosodic positions. Most studies have focused on the discrimination of contrasts in word-initial position, starting with some of the earlier work on infant speech perception (Eimas et al. 1971). While our knowledge of the types of contrasts that infants can discriminate in word-initial position is vast, there are only a handful of studies that have examined infants’ discrimination abilities in positions beyond the word-initial environment, such as word-finally (e.g. Jusczyk 1977; Zamuner 2006; Fais et al. 2009). These latter studies have varied in their experimental research aims, the types of contrasts that they tested, and the methodologies they used (for a summary of the relevant literature, see table 1 in Fais et al. 2009: 290).
Generally, the discrimination of contrasts in final position varies depending on the nature of the stimuli. Jusczyk (1977b) found that 2-month-old infants are able to discriminate a word-final d~ɡ contrast (bad ~ baɡ) and an m~ɡ contrast (bam ~ baɡ). Fais et al. (2009) reported that English-learning infants at 6, 12, and 18 months of age are able to discriminate between word-final singleton consonants and word-final consonant clusters (as in neek ~ neeks). Similar effects have also been found at the cross-linguistic level, with infants’ reaction to phonotactic patterns being affected by the phonotactic legality of the sequences in question in the ambient language (e.g. English versus Japanese; Kajikawa et al. 2006; Mugitani et al. 2007). Not all contrasts, however, appear to be equally salient. Zamuner (2006) found that 10- and 16-month-old Dutch-learning infants can discriminate place of articulation-based contrasts in the word-final position (kep ~ ket) but they do not discriminate between legal and illegal voicing phonotactics in the same word-final environment (ked ~ ket).
Another general finding is that infants’ performance depends on the nature of the experimental task. Discrimination studies tend to find that infants are able to perceive final contrasts (e.g. Eilers et al. 1977; Jusczyk 1977b; Hayes et al. 2000, 2009; Fais et al. 2009). Similarly, sensitivity to phonotactic patterns in final position has also been found in preference studies (Friederici and Wessels 1993; Mattys and Jusczyk 2001b; Sebastián Gallés and Bosch 2002), word-segmentation tasks (Mattys and Jusczyk 2001b; Tincoff and Jusczyk 2003), and studies looking at word-recognition and word-learning (Nazzi and Bertoncini 2009; Swingley 2009b). Swingley (2009b), for example, found that 14- to 22-month-olds are equally sensitive to mispronunciations in word-initial and word-final positions (e.g. “book” pronounced as “[d]ook” or “boo[p]”). Nazzi and Bertoncini (2009) showed that French-learning infants at 20 months of age are able to learn minimal pair non-words that differ in a single consonant in either the initial position (but ~ put) or the final position (pid ~ pit). In contrast, studies that have required infants to categorize stimuli items, such as finding commonalities across experimental tokens, have shown that infants are less sensitive to contrasts that occur word-finally (Jusczyk et al. 1999c; Zamuner 2006; Fais et al. 2009). Jusczyk et al. (1999c), for example, compared infants’ abilities to detect similarities in different word positions. At 9 months of age, infants preferred to listen to lists of words that shared the initial segment (fɛt, fɛm, fɛt, fɛɡ), but not to lists of words that had the same sound in word-final position (bad, pad, mad, tad). Zamuner (2006) found that Dutch-learning infants at 9- and 11 months did not prefer to listen to lists of non-words ending in legal voicing phonotactics over non-words ending in illegal voicing phonotactics. 
In sum, findings vary considerably across studies, which have tested infants of different ages and used different methodologies. Some evidence suggests that contrasts in different prosodic positions are equally learnable. Other studies find that learners are more sensitive to initial position than medial or final position (Karzon 1985; Walley et al. 1986; Jusczyk et al. 1999c; Swingley 2005a; Zamuner 2006; Levelt 2012) and, as in the case of the discrimination studies described at the start of this section, it remains to be explored whether the observed effects may stem primarily from the differences in the acoustic salience of the contrast under investigation.
3.3 Syllable Structure
The notion of the syllable has a long tradition in the linguistic literature. Syllable structure, for example, is often used to describe phonotactic patterns that are found in the world’s languages (see, among others, Fudge 1969). The syllable has also been argued to function as a unit of processing that plays a role in both the production and perception of speech (Spoehr and Smith 1973; Mehler et al. 1981; Levelt and Wheeldon 1994; Ferrand et al. 1996; Cholin et al. 2006). Not surprisingly, the notion of the syllable is also frequently encountered in infant studies. However, very few works have attempted to directly address the question of what role (if any) the syllable plays in infants’ ability to perceive speech inputs and, so far, no work from infant studies has been specifically dedicated to the question of how syllable structure is acquired.
Most research on the role of the syllable has concentrated on the issue of early representations. Specifically, these studies have attempted to address the question of whether infants perceive speech inputs as decomposable sequences (strings of phonemes, bundles of phonetic features, etc.) or, alternatively, as non-decomposable syllable-sized chunks. Jusczyk and Derrah (1987), for example, used a modified high-amplitude sucking (HAS) procedure to habituate 2-month-old infants to a set of monosyllables bi, bo, ba, and bər. During testing, infants were exposed to du and bu. Jusczyk and Derrah assumed that a stronger reaction to du (which involves a new consonant and a new vowel) than bu (which involves a new vowel but has the same consonant as in the familiarization set) would indicate that infants’ early representations are decomposable into individual segments. Infants reacted to both changes in the same way. In the absence of any positive evidence for segmental representations, Jusczyk and Derrah concluded that perceptual inputs are not represented as phonemic sequences but rather as non-decomposable ‘global’ units reminiscent of the syllable. This finding was in contrast to some of the earlier claims of infants’ sensitivity to individual phones (e.g. Miller and Eimas 1979; Eimas and Miller 1981; Hillenbrand 1983, 1985), but was in line with a number of studies that advocated for syllable-sized units in infant speech perception (Bertoncini et al. 1988; Jusczyk et al. 1995a, 1995b; Houston et al. 2003).
Another question seen in the literature on infant speech perception is whether the presence of syllable structure in the inputs facilitates perceptual processing. Using the HAS and a habituation-dishabituation paradigm, Bertoncini and Mehler (1981) tested 2-month-old infants’ ability to detect consonantal metathesis in CVC, CCC, and VCCCV sequences (e.g. tap ~ pat, tʃp ~ pʃt, or utʃpu ~ upʃtu). Bertoncini and Mehler found a significant discrimination rate for syllable-like CVC stimuli, a weaker but still significant rate for seemingly bisyllabic VCCCV items, and no significant differences for the non-syllabic CCC control group. On the basis of these results, Bertoncini and Mehler argued that the syllable is a unit of speech processing in infants. However, as pointed out in Bijeljac-Babic et al. (1993), infants’ ability to better discriminate a given contrast in a syllable-like environment does not necessarily entail that the syllable is a unit of speech perception. Furthermore, consonantal place differences are known to be most perceptible in pre-vocalic environments, less perceptible in post-vocalic environments and least perceptible when adjacent to other consonantal segments (see, among others, Steriade 1999; Côté 2000). Hence, no explicit reference to syllable structure is needed to account for the fact that infants are quite good at perceiving the transition from ta- to pa- (which involves a pre-vocalic position), slightly worse at discriminating the ut- versus up- contrast (which involves a post-vocalic contrast), and have the most difficulty discriminating tʃ- from pʃ- (when t, p are pre-consonantal). It is also not known whether infants tracked both consonants simultaneously or whether they paid attention to the initial or the final consonantal segment only, which would implicate a unit smaller than the syllable (e.g. tracking a rhyme or a coda but disregarding the onset).
Lastly, a few studies have aimed to determine whether infants are equally capable of noticing phonotactic patterns involving differences in abstract syllable structures and those involving differences in segmental composition or moraicity. Bijeljac-Babic et al. (1993), for example, used the HAS to test whether 4-day-old newborns could discriminate between two stimuli lists on the basis of (i) syllabicity (bisyllabic versus trisyllabic; e.g. rifo, zuti ~ mazopu, kesopa) and (ii) the number of segments (4 versus 6; e.g. rifo, iblo, gria ~ suldri, treklu, alprim). Infants noticed the difference between bisyllabic and trisyllabic sets even when the duration of stimuli items was modified to create a substantial overlap in the two distributions. At the same time, infants did not show any evidence of sensitivity to differences in the number of segments. Similarly, Bertoncini et al. (1995) tested whether 3-day-old newborns were capable of discriminating speech inputs on the basis of syllable structure and moraicity. Results revealed that while newborns register the difference between bisyllabic and trisyllabic sets (e.g. iga, tema ~ hekiga, temari), they fail to discriminate between bimoraic and trimoraic lists (e.g. kago, tomi, seki, buke ~ kango, tomin, sekki, buuke). Bertoncini and colleagues concluded that syllables are salient units even for newborns and that neonates use global representations and do not perceive syllable-internal complexity. Saffran and Thiessen (2003, Experiment 1) also showed that 9-month-olds can be sensitive to syllabic differences. Using HPP, Saffran and Thiessen familiarized infants with either CVCV or CVCCVC sequences (e.g. boga ~ bikrub). During testing, infants listened longer to those items that conformed to the familiarized pattern, which led Saffran and Thiessen to conclude that infants are capable of acquiring knowledge about syllabic structure.
However, the conclusions presented here were based on stimuli items that were not necessarily controlled for exogenous variability, such as the presence of durational differences between preshift and postshift sets. In addition, instead of noticing differences in syllable structures, infants could have perceived differences in the number of vowel peaks or presence versus absence of consonant clusters within stimuli items. In the case of Saffran and Thiessen’s study, the observed preference for a familiar syllabic frame has also been explained in terms of priming rather than learning (Seidl and Buckley 2005). Thus, the investigations described in this section provide at best limited evidence for the syllable playing a decisive role in perceptual processing. Furthermore, these studies often used monosyllabic items and did not attempt to de-correlate different levels of representation, which leaves open the possibility of another unit of processing (e.g. a word, a rhyme, a nucleus) being responsible for the observed effects. As such, the role of the syllable in infant speech processing and the question of how syllable structure is acquired (or whether it is acquired at all) remain to be explored, especially in light of a growing body of psycholinguistic literature challenging the status of the syllable as a unit of processing in adult speakers (among others, Jared and Seidenberg 1990; Schiller 1998, 1999, 2000; Perret et al. 2006).
3.4 Outstanding Issues
While the focus of the present chapter is on speech perception, it is important to point out that there exists a body of literature on other aspects of the acquisition of phonotactics and syllable structure, such as production-related issues and phonological modeling of the learning process. The literature on the production of phonotactic patterns and the acquisition of syllable structure in children’s early speech outputs is very limited, especially with children under the age of three (for general reviews, see Bernhardt and Stemberger 1998; Fikkert 2007; Demuth 2011; Stoel-Gammon 2011). The central themes of this research have been the nature of children’s phonological and lexical representations and the role of frequency in phonological acquisition (e.g. Beckman and Edwards 2000; Munson 2001; Coady and Aslin 2004; Edwards et al. 2004; Zamuner et al. 2004; Munson et al. 2005a; Munson et al. 2005b; Stokes et al. 2006; Coady and Evans 2008; Zamuner 2009b; Munson et al. 2012), the influence of the knowledge about the syllable and its organization (Jakobson 1941/1968; Demuth 1995b; Ohala 1999), and the emergence of different syllable structures and the developmental paths followed by the speakers (Moskowitz 1973; Fikkert 1994; Levelt et al. 1999). Several studies have also used production data to examine how different types of segmental patterns interact with prosodic knowledge (Fudge 1969; Fikkert and Freitas 2004) or to investigate the structure of the learner’s syllables, arguing for specific linguistic representations (Fikkert 1994; Goad 2002; Ning 2005). Issues of phonological processing and representations are also discussed in this volume in Chapters 4 by Heather Goad and 33 by Daniel Dinnsen, Jessica Barlow, and Judith Gierut.
In addition to production-related issues, a growing number of works have looked into the question of modeling the acquisition process. Many earlier studies relied on Optimality Theory (Prince and Smolensky 2004) to show how phonotactics, syllable structure, and other similar phonological regularities may be acquired in the process of constraint (re)ranking (among others, Jusczyk et al. 2002, 2003; Hayes 2004). More recently, various computational models have also been proposed, such as the learning algorithm for acquiring full phonotactic systems in Hayes and Wilson (2008) and the model of learning sonority-based regularities in Daland et al. (2011). For further discussion on the phonological and computational modeling of language acquisition, we refer the reader to Chapter 28 on word segmentation by Pearl and Goldwater in this volume and the general reviews of the modeling literature in Albright and Hayes (2011) and Moreton and Pater (2011).
The review in this chapter focused on the development of phonotactics and syllable structure in infancy. In sum, at around 9 months of age infants begin to demonstrate sensitivity to the legality of the distribution of phonemes and the usage frequency of individual sounds and sound combinations in the ambient language. While infant research has not centered on the question of how this learning takes place, it is generally accepted that phonotactic awareness starts with non-abstract, language-specific knowledge (knowledge of sound inventories, frequency counts, etc.) and that more abstract or rule-governed knowledge emerges at later stages in development. This phonotactic knowledge is also thought to reflect sublexical levels of representation. Beyond that, a number of more recent studies have begun to explore a range of factors that may come into play in the acquisition of phonotactic knowledge. These factors include acoustic saliency and legality of phonotactic patterns, phonemic status of segments in infants’ ambient language, and the role of prosodic/word domains. Other studies have started investigating how learners analyze and represent phonotactic knowledge by exploring the types of novel first- and second-order phonotactic patterns that can be learned by infants. These studies have found that infants are capable of learning arbitrary first-order phonotactic patterns and that learners appear to have no specific preference for phonetic naturalness. This ability might be limited to older infants, as younger learners appear to rely more on statistical patterns in the language, whereas older infants seem to have access to more abstract phonotactic knowledge. Studies have also begun to examine the relationship between phonotactics and lexical acquisition, examining the potential connection between already established phonological knowledge and the acquisition of new lexical items.
Lastly, work on the syllable in infant speech perception has focused on issues of early representations of speech, arguing for syllable-sized units playing a role already at the early stages of infant speech perception. However, this type of work is relatively sparse, and the role of the syllable in infant speech processing and the development of syllable structure largely remains to be explored in more detail.
Notably, the many findings from developmental speech production studies looking at the acquisition of phonotactics and syllable structure are yet to be integrated with those coming from the domain of speech perception. More detailed investigation is also needed into how phonotactic and syllable structure knowledge develop in learners acquiring more than one language and what kinds of differences are found cross-linguistically. It is also important to understand what it means to acquire a phonological inventory in the first place, a question that is discussed in the current volume in Chapter 2 by Ewan Dunbar and William Idsardi. Future research into these and other areas pointed out in this chapter will undoubtedly help us to better our understanding of how phonotactics, syllable structure, and linguistic knowledge in general are represented and processed in the human brain.
Thank you to Kyle Chambers, Alejandrina Cristià, Amanda Seidl, and an anonymous reviewer for providing helpful feedback on earlier drafts of this chapter.