This article details human speech production, which involves a range of physical features that may have evolved as specific adaptations for this purpose. All mammalian vocalizations are produced similarly, using features that primarily evolved for respiration or ingestion. Sounds are produced on a flow of air inhaled through the nose or mouth, or expelled from the lungs. Unvoiced sounds are produced without the involvement of the vocal folds of the larynx. Mammalian vocalizations require that the articulation of the supralaryngeal vocal tract be coordinated with the flow of air, in or out. For phonated sounds, an extensive series of harmonics above a fundamental frequency (F0) is produced by resonance. This series is filtered by the shape and size of the vocal tract, so that some parts of the series are retained in the emitted vocalization while others are diminished or deleted. Human sound sequences are also much more rapid than those of non-human primates, except for very simple sequences such as repetitive trills or quavers. Human vocal tract articulation is much faster, and humans can produce multiple sounds on a single breath movement, whether inhalation or exhalation. The unique form of the tongue within the human vocal tract is considered a key factor in the speech-related flexibility of the supralaryngeal vocal tract.
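The source-filter relationship described above can be illustrated with a minimal numerical sketch. The simple rolloff rate, the Lorentzian resonance shape, and the schwa-like formant values are all hypothetical simplifications, not a model from the article:

```python
import math

def source_spectrum(f0, n_harmonics, rolloff_db_per_octave=12.0):
    """Glottal source: harmonics at multiples of F0, with amplitude
    falling off by a fixed amount per octave (a common simplification)."""
    return {k * f0: -rolloff_db_per_octave * math.log2(k)
            for k in range(1, n_harmonics + 1)}

def vocal_tract_gain(freq, formants, bandwidth=100.0):
    """Filter: sum of simple resonance peaks centred on the formants
    (hypothetical Lorentzian shape, purely illustrative)."""
    return sum(1.0 / (1.0 + ((freq - fc) / bandwidth) ** 2) for fc in formants)

def filtered_spectrum(f0, formants, n_harmonics=30):
    """Source-filter combination: harmonics near a formant are retained;
    harmonics far from every formant are diminished in the output."""
    return {f: amp + 20.0 * math.log10(max(vocal_tract_gain(f, formants), 1e-6))
            for f, amp in source_spectrum(f0, n_harmonics).items()}

# A 120 Hz voice filtered by schwa-like formants near 500, 1500, 2500 Hz
spec = filtered_spectrum(120.0, [500.0, 1500.0, 2500.0])
```

In this sketch, the harmonic at 480 Hz (near the first formant) survives with little attenuation, while the harmonic at 960 Hz (between formants) is strongly diminished, mirroring the retention-and-deletion pattern the abstract describes.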
Bob McMurray and Ashley Farris-Trimble
This article presents three case studies that explain mechanisms involved in production-perception coupling. The first addresses online processing mechanisms, the second considers statistical learning mechanisms, and the third argues for the role of parsing as a unifying mechanism. Information-level coupling may arise as a consequence of a common processing principle, interactive activation. This commonality may lead perception to reflect the distributional properties of production, even if coupling is not an organizing principle for either system. A fundamental issue in spoken-word recognition is time: the acoustic material comprising a word unfolds over time, and at early points there is ambiguity. In interactive activation models, a small set of units is activated corresponding to the perceptual input; activation then spreads to phonemes and words, resulting in the parallel activation of multiple interpretations of the signal at each level. Distributional learning accounts for many developmental patterns, describing how infants acquire categories, and with some simple assumptions it explains discrimination performance. Distributional learning also gives rise to informational coupling. Computing cues relative to expectations (C-CuRE) offers a way to identify individual tokens and clusters in the input by progressively accounting for sources of variance in the signal.
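The incremental ambiguity that interactive activation models address can be sketched with a toy lexicon. This is not TRACE or any model from the article — just an illustration, with made-up match/mismatch values, of how multiple word candidates stay active in parallel until the input disambiguates:

```python
def incremental_activation(lexicon, input_phonemes):
    """Toy incremental word activation: each arriving phoneme boosts words
    whose form matches at that position and inhibits words that mismatch,
    so several interpretations are active at early points in the input."""
    activation = {word: 0.0 for word in lexicon}
    history = []
    for t, phoneme in enumerate(input_phonemes):
        for word, form in lexicon.items():
            if t < len(form) and form[t] == phoneme:
                activation[word] += 1.0   # hypothetical match boost
            else:
                activation[word] -= 0.5   # hypothetical mismatch inhibition
        history.append(dict(activation))
    return history

lexicon = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"]}
history = incremental_activation(lexicon, ["k", "ae", "t"])
```

After "k ae", both candidates are equally active (the early ambiguity the abstract describes); only the final phoneme resolves the competition in favor of "cat".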
Insights From Perception and Comprehension: How Perceptual and Cognitive Constraints Affect Learning of Speech Categories
Representations of Speech Sound Patterns in the Speaker's Brain: Insights From Perception Studies
Lori L. Holt and Noël Nguyen
This article provides detailed information on category learning and the role of speech perception in the formation of phonological representations. The experience-dependent change in speech perception reflects the influence of native-language speech category learning and has been described as a ‘warping’ of perceptual space. The mapping from acoustics to perceptual space is closely related to the raw acoustic differences among speech sounds, and infants' speech discrimination is largely independent of the native-language environment. Artificial languages composed of speech tokens manipulated to have special characteristics have been widely used as a tool for understanding infant language acquisition. Discrimination training and categorization training warp listeners' perception of non-speech stimuli in different ways: discrimination training increases listeners' sensitivity to small distinctions among stimuli, thereby working against categorization. One characteristic of speech categories is their multidimensionality. Relative perceptual cue weighting develops across childhood and is native-language specific. Words can show substantial variation in their surface form under the influence of a variety of phonological phenomena such as assimilation or deletion. The featurally underspecified lexicon (FUL) model of word recognition holds that each word is associated in the mental lexicon with a highly abstract phonological representation, which is underspecified for certain features such as coronal.
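The multidimensionality of speech categories and the development of relative cue weights can be illustrated with a small sketch. The prototypes, cue values, and weights below are all hypothetical, chosen only to show how the same acoustic token can be categorized differently under different cue weightings:

```python
def categorize(token, prototypes, weights):
    """Weighted-cue categorization sketch: assign a token to the category
    whose prototype is nearest under a weighted distance over the cues."""
    def dist(a, b):
        return sum(w * (a[cue] - b[cue]) ** 2 for cue, w in weights.items())
    return min(prototypes, key=lambda cat: dist(token, prototypes[cat]))

# Hypothetical /b/-/p/ prototypes on two cues: VOT (ms) and onset f0 (Hz)
prototypes = {"b": {"vot": 5, "f0": 100}, "p": {"vot": 60, "f0": 120}}
adult = {"vot": 1.0, "f0": 0.05}   # mature weighting: VOT dominates
child = {"vot": 0.1, "f0": 1.0}    # earlier weighting: f0 weighted heavily

token = {"vot": 30, "f0": 118}     # ambiguous VOT, /p/-like f0
```

Under the adult-like weighting the ambiguous token falls with "b"; under the child-like weighting the same token falls with "p", illustrating how cue weights, not raw acoustics alone, determine category membership.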
This chapter provides a selective overview of recent research on the phonetics and phonology of bilingualism. The central idea put forth in the chapter is that, in bilingualism and second-language learning, cross-language categories are involved in complex interactions that can take many forms, including assimilations and dissimilations. The sound categories of the two languages of a bilingual seem to coexist in a common representational network and appear to be activated simultaneously in the processing of speech in real time, but some degree of specificity is attested. The chapter then goes on to explore some of the characteristics of cross-language sound interactions, including the fact that these interactions are pliable and appear to be mediated by the structure of the lexicon.
Brett Miller, Neil Myler, and Bert Vaux
This chapter draws a distinction between Universal Grammar (the initial state of the computational system that underwrites the human capacity for language) and the Language Acquisition Device (the complex of components of the mind/brain involved in constructing grammar+lexicon pairs upon exposure to primary linguistic data). It then considers whether there are any substantive phonological components of Universal Grammar stricto sensu. Two of the strongest empirical arguments for the existence of such phonological content in UG have been (i) apparent constraints on the space of variation induced from the typological record, and (ii) apparently universal dispreferences against certain phonological configurations (known as markedness). The chapter examines these arguments in the light of recent literature, concluding that the phenomena submit at least as well to historical, phonetic, or other non-UG explanations. We suggest that language acquisition experiments, involving natural and artificial languages, may be a more fruitful domain for future research into these questions.