
Timbre, Pitch, and Music

Abstract and Keywords

The perception of a sound’s timbre and pitch may be related to the more basic auditory function of sound recognition. Timbre may be related to the sensory experience (or memory) by which we recognize the source or meaning of a sound, while pitch may involve the recognition and mapping of timbres along a cognitive spatial dimension. Musical dissonance may then result from failure of sound recognition mechanisms, resulting in poor integration of pitch information and heightened arousal in musicians. Neurobiological models of auditory processing that include cortico-ponto-cerebellar and limbic pathways provide an account of the neural plasticity that underpins sound recognition and more complex human musical behaviors.

Keywords: Timbre, pitch, music, sound recognition, dissonance, auditory processing, memory, cortico-cerebellar circuits, limbic system, neuroplasticity

This chapter presents a historical overview of theories of pitch and dissonance, starting with early mechanistic models based on the vibration of strings, progressing through algorithmic signal processing models, and ending with more recent neuropsychological models. Neurobiological mechanisms are described that could underpin the neuroplasticity of sound recognition and a range of animal behaviors that such neuroplasticity supports. Finally, these behaviors are compared with human music perception and cognition.

Definitions and Traditions

Pitch is a psychological percept that does not map simply onto physical properties of sound. The American National Standards Institute (ANSI) defines pitch as the auditory attribute of sound according to which sounds can be ordered on a scale from low to high (American National Standards Institute, 1973; Walker et al., 2011). In support of this definition, it has long been observed that people tend to assign high frequencies to high positions in vertical space and low frequencies to low positions (Mudd, 1963; Pratt, 1930; Roffler & Butler, 1968). Furthermore, some people make faster and more accurate judgments of pitch height when response options are spatially coherent with perceived changes in pitch—a phenomenon that has been labeled the spatial–musical association of response codes (SMARC) effect (Beecham et al., 2009; Rusconi et al., 2006). But not all people naturally associate pitch along a vertical spatial dimension. Antovic (2009) found that many musically untrained Serbian and Romani children described pitch differences as changes in size or thickness, and Rusconi et al. (2006) found that some musicians associated pitch height with a horizontal spatial dimension consistent with the layout of the piano keyboard instead of a vertical spatial dimension.

All these pitch mappings are generally consistent with physical properties that affect the frequency of vibration of structures (N. H. Fletcher & Rossing, 1991). However, mappings of pitch to physical vibrations may become ambiguous when different types of sound source are being compared. For example, a large bell may simultaneously produce a range of audible frequencies (partials or overtones) that are incongruent with the pitch of a human voice when its frequency is matched to one of the bell’s partials. So the ANSI definition of pitch operates successfully for instruments that reliably produce a sequence of sounds that could be recognized as belonging to the same source but that vary systematically in frequency. Such instruments provide an external scale that enables the association of audible frequencies with degrees along a linear dimension of the instrument.

Many simple musical instruments produce sound by the vibration of a string or a column of air. Such linear vibrating systems vibrate in integer fractions of their length (1/2, 1/3, 1/4, etc.) and in so doing produce a complex tone of many partials at integer multiples of the lowest, or fundamental, frequency (at 2, 3, 4, etc. times the fundamental). These sounds are called harmonic complex tones. Since the human voice also produces harmonic complex tones, most people can readily match the frequencies of linearly vibrating instruments to their own voices, thereby generating an abstracted musical scale that can be applied across a range of instruments and harmonic timbres.
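To make the structure of such tones concrete, the following minimal Python sketch synthesizes a harmonic complex tone with partials at integer multiples of a fundamental. The number of partials and the 1/k amplitude rolloff are illustrative choices, not values drawn from any study cited here.

```python
import numpy as np

def harmonic_complex(f0, n_partials=8, fs=44100, dur=1.0):
    """Synthesize a harmonic complex tone: partials at f0, 2*f0, 3*f0, ..."""
    t = np.arange(int(fs * dur)) / fs
    tone = sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t)  # 1/k rolloff (illustrative)
               for k in range(1, n_partials + 1))
    return tone / np.max(np.abs(tone))                     # normalize to avoid clipping

# A 220 Hz complex tone: partials at 220, 440, 660, ... Hz all share the 220 Hz period.
signal = harmonic_complex(220.0)
```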

In Africa and Asia sophisticated musical cultures emerged that were based on percussion instruments (e.g. xylophones, bells, and gongs). Percussion instruments vibrate in two or three dimensions, and so their partials are not usually integer multiples of the lowest-frequency component. The resulting sounds are known as inharmonic complex tones (N. H. Fletcher & Rossing, 1991) and produce ambiguous pitch for people who are unfamiliar with these instruments (McLachlan et al., 2013c). The pitch of inharmonic instruments will be discussed in more detail in the section below entitled “From Mechanistic to Psychological Models of Pitch.”
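The contrast with harmonic tones can be illustrated with the transverse vibration modes of an ideal free bar, a simplified model of a xylophone key, whose mode frequencies follow from beam theory rather than from integer multiples (N. H. Fletcher & Rossing, 1991). The sketch below continues the previous example using the standard ideal-bar ratios; real tuned percussion instruments deviate from these values.

```python
import numpy as np

# Mode-frequency ratios for an ideal free bar (transverse vibration): f_n is
# proportional to beta_n^2, with beta_n * L ≈ 4.730, 7.853, 10.996, 14.137.
BAR_RATIOS = [1.0, 2.756, 5.404, 8.933]

def bar_tone(f1, fs=44100, dur=1.0, decay=4.0):
    """Inharmonic complex: partials at BAR_RATIOS * f1 with exponential decay."""
    t = np.arange(int(fs * dur)) / fs
    tone = sum(np.exp(-decay * (k + 1) * t) * np.sin(2 * np.pi * r * f1 * t)
               for k, r in enumerate(BAR_RATIOS))
    return tone / np.max(np.abs(tone))

# Partials at 220.0, 606.3, 1188.9, and 1965.3 Hz share no common fundamental,
# so the waveform has no single repeating period to support a periodicity pitch.
tone = bar_tone(220.0)
```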

The ANSI definition of timbre is the attribute of an auditory sensation that allows it to be distinguished from other sounds at the same pitch and loudness (American National Standards Institute, 1973). However, this definition fails when the pitches of different sound sources are incomparable because of widely differing patterns of component frequencies (as with the earlier example of inharmonic bells and harmonic voices). Another definition of timbre might emerge from the consideration that the ability to distinguish differences between sounds is essential for sound classification and recognition. Sound recognition likely starts early in auditory processing (McLachlan & Wilson, 2010, 2016), so we could define timbre as the time-varying pattern of spectral components by which a sound may be recognized (Handel, 1995; Handel & Erickson, 2004). This definition aligns with common descriptions of the timbre of a sound according to its similarity with a remembered sound identity (e.g. it sounded like a gunshot or a trumpet) or by onomatopoeia (e.g. it sounded like a bang or a toot).

People can vary the frequency of their voices, but prior to the invention of musical instruments these variations could not be associated with a linear dimension such as a series of air holes in a flute or frets along a string. Rather, variations in pitch initially may have been associated with emotional meaning, as is common in primate calls, or with symbolic meaning in human protolanguages (Mithen, 2005). So today, people may associate the pitch of a sound with both a symbolic meaning and with a spatial (or musical) scale. For example, many people can recognize the frequency of their phone’s dial tone (N. A. Smith & Schmuckler, 2008), and most musicians could also reproduce this frequency on an instrument. Other examples of the association of a stimulus frequency with a symbolic meaning include absolute (or perfect) pitch, in which musicians learn to directly associate a stimulus frequency with a note name in the Western musical scale (Levitin & Rogers, 2005; McLachlan et al., 2013b; Wilson et al., 2012); formant frequencies that define vowel sounds (Deterding, 1997); and tonal languages such as Mandarin, in which changes in the pitch of a vowel can change the meaning of a word.

Most music employs temporal patterns of pitch. The ancient Chinese, and later Pythagoras, discovered that a tone produced by 2/3 the length of a linear vibrator blended smoothly with the tone of the full-length vibrator and that sequential repetition of this interval eventually produced a tone close to the initial pitch some octaves (or doublings of frequency) higher (Partch, 1974). Much later, Helmholtz (1863/1954) described harmonics of a string or air column in physical terms and explained that two harmonic complex tones tuned to a frequency ratio of 3/2 (the Western perfect 5th interval, the fifth degree of a major scale) produced many pairs of harmonics tuned to the same frequencies. Integer-ratio tunings (3/2, 4/3, 5/4, etc.) avoid the rapid fluctuations in loudness called beating or roughness that are caused by destructive interference when harmonics are slightly mistuned (Helmholtz, 1863/1954). By the time Helmholtz wrote his theory, many music tuning systems and scales had evolved in Europe and Asia that were based on integer ratios of the vibrator length to minimize the roughness of chords (Partch, 1974; Chalmers, 1990); as a consequence many elaborate cultural concepts emerged around beliefs in a mathematical basis of musical harmony. However, empirical investigations have revealed that familiarity and training with a musical system better account for ratings of dissonance and musical preference than does roughness (Guernsey, 1928; McLachlan et al., 2013a). This research will be discussed in more detail near the end of this article, in the section entitled “Emotional and Cognitive Processing of Music.”
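Two of the numerical claims above—the near-closure of stacked 3/2 intervals, and beating between mistuned harmonics—are easy to verify, as in the short sketch below; the specific frequencies are illustrative.

```python
from fractions import Fraction

# Twelve stacked perfect 5ths (3/2) versus seven octaves (2/1):
twelve_fifths = Fraction(3, 2) ** 12         # 531441/4096 ≈ 129.746
seven_octaves = Fraction(2, 1) ** 7          # 128
print(float(twelve_fifths / seven_octaves))  # 1.0136... = the Pythagorean comma

# Beating: near-coincident harmonics beat at their frequency difference.
# The 3rd harmonic of 220 Hz (660 Hz) against the 2nd harmonic of a slightly
# mistuned "fifth" at 331.5 Hz (663 Hz) produces 3 beats per second.
print(abs(3 * 220.0 - 2 * 331.5))            # 3.0
```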

Initial Western conceptualizations of pitch held that pitch was a mechanical feature of sound-source vibrations (de Cheveigné, 2005). As it became clear that a wide variety of complex sounds could produce pitch percepts, information processing models of pitch based on neural properties of the auditory periphery emerged (de Cheveigné, 2005). The remainder of this chapter outlines the development and limitations of these pitch concepts and then describes a proposed neuropsychological model of pitch perception that addresses both the cultural basis of this human musical behavior and the auditory system’s plasticity to cultural and environmental contexts (Weinberger, 2012). Finally, brain mechanisms will be presented that may account for many of the cognitive and emotional processes involved in music perception.

From Mechanistic to Psychological Models of Pitch

Although ancient Chinese, Greeks, and Persians developed geometric methods of producing ordered scales of pitch using ratios of vibrator length (Partch, 1974), they did not develop physical theories that related vibrations to pitch (de Cheveigné, 2005). Early in the 17th century Mersenne determined that frequency varied inversely with the length of a string and as the square root of its tension (in modern form, see the expression below). Over the next three centuries many scientists explored mathematical relationships of pitch to varying periods of string vibration or to the frequencies of harmonics (de Cheveigné, 2005). In 1683 Du Verney suggested that the cochlea resonated at specific locations along its length for different frequencies (de Cheveigné, 2005), and much later Helmholtz (1863/1954) refined this idea by suggesting that the auditory nerve (AN) responds selectively to vibrations at different frequencies along the basilar membrane within the cochlea. Consistent with that suggestion, H. Fletcher and Munson (1937) defined critical bandwidth auditory filters in the early 20th century. But the concept of auditory filters made it harder to explain why harmonic complexes are associated with only one pitch, since harmonic complexes comprise multiple tone components that independently excite multiple auditory filter channels. In other words, there is no one-to-one correspondence between the pitch perceived by the listener and the mechanical vibrations either of strings or of the basilar membrane of the cochlea.
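Mersenne's observations correspond to the standard textbook formula for an ideal string (cf. N. H. Fletcher & Rossing, 1991):

```latex
% Fundamental frequency of an ideal string of length L, tension T,
% and linear mass density \mu, with harmonics at integer multiples:
f_1 = \frac{1}{2L}\sqrt{\frac{T}{\mu}}, \qquad f_n = n f_1 .
```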

Around the mid-20th century two algorithmic signal processing theories were developed to address the problem that pitches are perceived at the fundamental frequency of harmonic complexes even when this component is absent from the stimulus—a phenomenon known as virtual pitch (Schouten, 1938). The first of these was by Licklider (1951), who proposed that the auditory system employs a network of time delay circuits with coincidence-detecting neurons in each auditory filter band to detect the period of vibrations on the basilar membrane. This approach could explain pitch perception of harmonic complexes with a missing fundamental frequency if neural responses at time delays corresponding to multiple periods of the harmonics were summed to produce a peak at the fundamental frequency (e.g. time delays at twice the period of the second harmonic or at three times the period of the third harmonic both equal the period of the fundamental). Many researchers further developed this idea by applying autocorrelation and similar algorithms to measurements and models of AN responses to successfully model many basic features of pitch perception (Cariani, 1999; Delgutte, 1984; Meddis & Hewitt, 1991a, 1991b). These periodicity-based models of pitch were strongly supported by the observation of consistent pitch perceptions for iterated rippled noise, a class of noise stimuli that contains repeating sound features at fixed temporal lags without providing strong spectral frequency cues (Yost, 1996). However, neurophysiological measurements of the auditory system have not found neural arrays with systematically varying latencies, which would be required for pitch perception based on autocorrelation-like mechanisms (Shamma & Klein, 2000).
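The core of the periodicity idea can be illustrated with a minimal sketch: the autocorrelation of a three-component complex whose 200 Hz fundamental is absent still peaks at the 5 ms fundamental period. Applying the autocorrelation directly to the raw waveform is a drastic simplification of Licklider's per-channel neural model, and all parameter values here are illustrative.

```python
import numpy as np

fs = 16000
t = np.arange(int(fs * 0.1)) / fs

# Harmonic complex with a missing fundamental: components at 400, 600, and
# 800 Hz only; the 200 Hz fundamental is absent from the stimulus.
x = sum(np.sin(2 * np.pi * f * t) for f in (400.0, 600.0, 800.0))

# Summary autocorrelation (computed on the raw waveform, a drastic
# simplification of per-channel neural autocorrelation in Licklider's model).
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
min_lag = int(fs / 1000)                  # skip lags under 1 ms (pitches > 1 kHz)
best_lag = min_lag + int(np.argmax(ac[min_lag:]))
print(fs / best_lag)                      # 200.0 -> the missing fundamental
```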

A second algorithmic approach to pitch perception was proposed independently by three researchers in the early 1970s (Goldstein, 1973; Terhardt, 1974; Wightman, 1973). These researchers applied various algorithmic methods to match harmonic templates to modeled AN responses for harmonic complexes. In these harmonic template matching models, excitation of the AN at harmonic frequencies was associated with pitch salience at the fundamental frequency. Terhardt (1974) also introduced perceptual weightings to his model to account for people’s ability to isolate the pitches of individual harmonics (referred to as analytical listening; Plomp, 1967; Ritsma, 1967) as well as to perceive a virtual pitch at the fundamental frequency. Terhardt (1974) proposed that harmonic templates may be learned implicitly from exposure to the human voice, but in the absence of any neurophysiological evidence for harmonic templates in the auditory pathways, Shamma and Klein (2000) suggested that these templates may arise spontaneously from harmonic distortion within the auditory system itself.
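A toy version of the template idea is sketched below: spectral peaks are extracted from the same missing-fundamental stimulus used above, candidate fundamentals are taken as subharmonics of the lowest peak (loosely following Terhardt's subharmonic approach), and the highest candidate whose harmonic template accounts for the most peaks wins. This illustrates the general principle only; it is not a reconstruction of any of the cited models, and the tolerance and candidate range are assumptions.

```python
import numpy as np

fs = 16000
t = np.arange(int(fs * 0.1)) / fs
x = sum(np.sin(2 * np.pi * f * t) for f in (400.0, 600.0, 800.0))  # 200 Hz absent

# Locate prominent spectral peaks (stand-ins for resolved harmonics in AN channels).
mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), 1 / fs)
is_peak = (mag > 0.3 * mag.max()) & (mag >= np.roll(mag, 1)) & (mag >= np.roll(mag, -1))
peaks = freqs[is_peak]                          # -> [400., 600., 800.]

def matched(f0, peaks, tol_hz=2.0):
    """Count peaks lying within tol_hz of some integer multiple of f0."""
    return sum(abs(p - f0 * round(p / f0)) < tol_hz for p in peaks)

# Candidate fundamentals: subharmonics of the lowest peak (Terhardt-style).
candidates = [peaks.min() / n for n in range(1, 11)]
score, best_f0 = max((matched(f0, peaks), f0) for f0 in candidates)
print(best_f0)   # 200.0: the highest candidate whose template covers all peaks
```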

While many researchers selectively favored either periodicity- (autocorrelation-) based models or harmonic template–based models of pitch, a few researchers suggested that the auditory system may use both mechanisms. Moore (2003) proposed that periodicity cues may be used at frequencies below around 4 kHz, for which AN spike rates can phase-lock to the stimulus, while spectral cues are used for higher frequencies, where pitch perception is less precise. Alternatively, Carlyon and Shackleton (1994) suggested that periodicity cues may be more useful than spectral cues in frequency regions where harmonics are not resolved by auditory filter channels. However, more recently, Oxenham et al. (2011) challenged the idea that the accuracy of pitch perception at high frequencies is constrained by phase-locking on the AN and suggested instead that it may be due to lack of exposure to high-frequency stimuli, and so less familiarity. Consistent with this, McLachlan (2009) proposed that spectral recognition mechanisms may initially prime a pitch array in the auditory cortex so that only periodicity information that is consistent with the initial spectral estimate (usually at the lowest-frequency partial) contributes to refining the pitch percept over subsequent stimulus periods. This suggests that pitch associations are learned and so pitch accuracy should improve with increasing stimulus familiarity.

Smith et al. (2002) developed an intriguing demonstration that supports the proposition that pitch perception relies on both periodicity and spectral cues. They systematically altered the spike rate modulations that would normally occur in auditory filter channels in response to melodic stimuli. In other words, they altered the waveform periodicity cues provided in each auditory filter channel without changing spectral cues. Listeners sometimes reported hearing two melodies in the altered stimuli, one based on the periodicity and the other based on the spectral amplitudes of the filter channels. The finer the spectral resolution of the filter bands, the more likely listeners were to use spectral information to hear the melody. For example, listeners always heard the melody based on the periodicity when the stimuli were filtered with up to 32 frequency bands but identified the spectral melody more often for stimuli filtered with 64 frequency bands.
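Smith et al. (2002) built such stimuli (which they called auditory chimeras) by exchanging, within each filter band, one sound's slowly varying envelope with another sound's rapidly varying temporal fine structure. The sketch below shows the core Hilbert-transform decomposition for a single band using two synthetic "melodies"; the full stimuli applied this operation in every band of a multi-channel filter bank, and the signals here are illustrative stand-ins.

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(int(fs * 0.5)) / fs
# Two synthetic "melodies": carriers with different slow amplitude envelopes.
a = np.sin(2 * np.pi * 300 * t) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))
b = np.sin(2 * np.pi * 420 * t) * (1 + 0.8 * np.sin(2 * np.pi * 7 * t))

def chimera(env_src, fine_src):
    """Pair one signal's Hilbert envelope with the other's temporal fine structure."""
    env = np.abs(hilbert(env_src))               # slowly varying envelope
    fine = np.cos(np.angle(hilbert(fine_src)))   # unit-amplitude fine structure
    return env * fine

# Within-band chimera: a's envelope carried on b's fine structure. Smith et al.
# applied this operation separately in every band of a multi-channel filter bank.
x = chimera(a, b)
```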

There remain problems with both harmonic template–matching and periodicity-based pitch models. Behavioral data show that pure tones are unambiguously perceived at just one pitch, with a similar pitch salience to harmonic complexes with fundamentals tuned to the same frequency (Fastl & Hesse, 1984; Moore, 1973; Zwicker & Fastl, 1999). However, harmonic template–matching models predict that pure-tone stimuli will produce multiple pitches, since multiple templates would contain tonal information at the frequency of the stimulus. They also predict lower pitch salience for pure tones than harmonic complex tones, since the former should provide less neural stimulation of harmonic templates than the latter (McLachlan, 2009). Furthermore, the highly consistent tuning of inharmonic percussion instruments at the frequencies of their first partials in gamelan ensembles from Indonesia provides a substantial challenge to both periodicity-based and harmonic template–matching models of pitch, since both types of models predict that each overtone of an inharmonic complex will generate a different pitch estimate (McLachlan et al., 2003; McLachlan et al., 2013c; Parncutt, 1989; Terhardt, 1974).

McLachlan et al. (2013c) investigated the pitch-matching accuracy of Western musicians who had spent at least one year learning to play inharmonic gamelan instruments. They found that the gamelan-trained musicians matched the pitch of a pure tone to the lowest-frequency partials of both inharmonic gamelan and harmonic Western instruments with similar accuracy. In contrast, musicians with similar levels of training who had never played gamelan instruments were much less accurate for those instruments than for Western instruments. This finding is consistent with the proposition that spectral recognition mechanisms allow different timbres such as pure tones and harmonic and inharmonic complexes to be associated with the same pitch, as first proposed by McLachlan (2009). It is also consistent with a psychological definition of pitch as the systematic association of a sound-source timbre that may occur at various frequencies with a spatial scale of pitch. Taken together these findings suggest that pitch perception involves learning to recognize a sound timbre over a range of frequencies and then associating changes in frequency with visuospatial and kinesthetic dimensions of an instrument, a visual notation, and eventually a more abstracted pitch scale.

Since we learn to recognize sound sources at all the frequencies that they usually produce, timbre is often independent of pitch, as posited by the ANSI definition (American National Standards Institute, 1973). This perceptual phenomenon would require the formation of sets of long-term memory templates of the spectral components associated with a particular sound source at all the frequencies at which the sound naturally occurs (McLachlan, 2009, 2011; McLachlan & Wilson, 2010; McLachlan et al., 2013c). These templates may comprise any reproducible pattern of component frequencies, allowing musicians to learn unambiguous pitch associations for inharmonic percussion instruments as well as for harmonic complexes (McLachlan et al., 2013c). Furthermore, the learned independence of pitch and timbre allows humans to produce a wide variety of vocal timbres that are associated with phonemes while independently varying their vocal pitch.

Pitch-matching accuracy is greatly reduced when more than one pitch is presented simultaneously (Assmann & Paschall, 1998). A number of researchers have applied harmonic template–matching and autocorrelation models to explain human pitch and vowel segregation (Assmann & Summerfield, 1990; de Cheveigné & Kawahara, 1999; Meddis & Hewitt, 1992). Autocorrelation-like models were able to outperform humans at vowel segregation (Assmann & Summerfield, 1990; de Cheveigné & Kawahara, 1999); however, the algorithms required longer stimulus durations than usually occur in speech (Robinson & Patterson, 1995), suggesting that speech segregation is unlikely to use periodicity-based pitch estimates. Given the proposal that pitch perception may be based on timbre recognition, McLachlan et al. (2013a) investigated pitch-matching performance for common and uncommon music chords in people with various levels of music training. They found that pitch-matching accuracy for pitches in chords increased with music training and with chord usage and familiarity. Furthermore, pitch-matching accuracy was consistently better for the highest pitch of chords. These data suggest that the timbre of the chord itself was recognized by musicians and associated with the fundamental frequency of the highest pitch in the chord. Recognition of the chord timbre then allows musicians to identify the musical intervals and prime their pitches in auditory working memory (McLachlan et al., 2013a).

The idea that pitch perception involves the association of a sound timbre with a cognitive spatial scale is further supported by relationships of performance on tasks requiring symmetrical transformations of melodic information with performance on spatial tasks (Cupchik et al., 2001) and with the spatial coherence of music notations (McLachlan et al., 2011). Dual-pathway models of sensory processing in the brain, in which a ventral stream processes “what” (or symbolic) information and a dorsal stream processes “where” (or spatial) information, have been influential in both the visual and auditory literature (Arnott et al., 2004; Rauschecker & Tian, 2000; Schneider, 1969). The Object-Attribute Model (OAM), developed by McLachlan and Wilson (2010), extended existing models of auditory “what” and “where” pathways to suggest that recognition mechanisms start subcortically and subserve the integration of auditory features such as pitch and loudness. The model explains how a sound source that varies in frequency may be associated with symbolic information such as a verbal label in the “what” pathway or with the spatial dimension of pitch height in the “where” pathway. A dissociation between symbolic and spatial processing of frequency information is consistent with findings that some people are able to discriminate stimuli with frequency differences of around two semitones but are unable to determine the direction of the pitch change (Johnsrude et al., 2000; Semal & Demany, 2006; Tramo et al., 2000). In other words, people can sometimes recognize different sounds, but poor mapping to a pitch scale prevents them from determining the direction of the pitch difference.

The following section outlines in more detail the OAM and related neurobiological research on pitch processing and sound recognition mechanisms.

Neurobiological Theories of Pitch

Since the mid-20th century, models and measurements of AN responses have been used to extend algorithmic models of pitch processing to a broad range of stimuli. For example, psychoacoustic measurements of the perceptual masking of tones have provided detailed information about the shape of auditory filter channels in humans (Zwicker & Fastl, 1999) and enabled the design of digital filter banks that could be used to model the spectral resolution of the AN (Moore & Glasberg, 1983). Meddis (1986) published a mathematical model of the cellular dynamics of auditory hair cells that innervate the AN, which could be used in conjunction with a digital filter bank to model AN phase-locking to stimulus waveforms; more recent models of AN function capture a wide range of the nerve’s physiological properties (Sumner et al., 2002; Zilany & Bruce, 2006).
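A common building block of the digital filter banks mentioned above is the gammatone filter. The sketch below hand-rolls a fourth-order gammatone impulse response using the widely used equivalent-rectangular-bandwidth (ERB) formula of Glasberg and Moore (1990); the centre frequencies and other parameter values are illustrative rather than taken from the models cited here.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Hz) at centre frequency f (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs=16000, dur=0.05, order=4):
    """Gammatone impulse response: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(fs * dur)) / fs
    b = 1.019 * erb(fc)                  # bandwidth factor from the gammatone literature
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))   # unit-energy normalization

# Decompose a harmonic complex into channel waveforms, as AN models do before
# periodicity analysis; the channel centre frequencies here are purely illustrative.
fs = 16000
t = np.arange(fs) / fs
x = sum(np.sin(2 * np.pi * k * 220.0 * t) for k in range(1, 6))
centres = [125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0]
channels = [np.convolve(x, gammatone_ir(fc, fs), mode="same") for fc in centres]
```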

Cariani (1999) used measurements of spike trains in the AN of cats to show that interspike temporal intervals provide information about stimulus periodicity that could be used to estimate the pitches of harmonic complexes with missing fundamentals. While these findings were used to support periodicity models of pitch, they suffered from the lack of physiological evidence for a network of time delay circuits in the auditory pathways. This deficiency prompted researchers to look for other neurobiological mechanisms that could extract pitch information from the temporal structure of AN spike trains. The discovery of a class of neurons in the cochlear nucleus (CN) known as chopper neurons (Oertel, 1985) provided a new avenue of research in pitch modeling.

Chopper neurons inherit their frequency tuning (or characteristic frequency) from the ANs that innervate them. When excited near this frequency, they exhibit highly regular firing patterns (otherwise described as bursting or chopping) with a chopping frequency that is independent of the stimulus frequency (Blackburn & Sachs, 1991; Oertel, 1985). Based on these observations Meddis and O’Mard (2006) proposed a synchronization model of pitch in which coincidence-detecting neurons in the inferior colliculus (IC) receive innervation from groups of chopper cells with similar innate chopping frequencies. The regularity of chopper cell firing increases when the modulation frequency of the AN spike trains that innervate the cells coincides with their innate chopping frequency (Wiegrebe & Winter, 2001). Meddis and O’Mard proposed that this behavior would increase coincident spiking and enhance spike rates of IC neurons at this best modulation frequency. An array of chopper cells with varying innate chopping frequencies could therefore lead to a place–rate code for periodicity in the IC in each auditory filter channel.

The Meddis and O’Mard (2006) model was consistent with physiological measurements of the IC. Neurons in the central nucleus of the IC are arranged in a three-dimensional map with fine and coarse frequency resolution in two dimensions and minimum-intensity thresholds in the third dimension (Ehret & Schreiner, 2005). Frequency-band laminae contain neurons that are tuned to characteristic frequencies within the critical bandwidths of auditory filter channels, so the array of laminae constitutes the coarse, tonotopic frequency dimension. Around the circumference of each lamina, the neurons are arranged according to their best modulation frequencies, which range from around 10 Hz to 600 Hz and represent a place–rate code for periodicity (Langner et al., 2002). Each lamina also exhibits concentrically arranged regions of neurons with similar minimum-intensity thresholds. These thresholds decrease toward the center of the lamina by around 60 dB (Ehret & Schreiner, 2005).

Neurobiological models of pitch based on the temporal synchronicity of sustained chopper spike trains have replicated the independence of pitch height and loudness (Zwicker & Fastl, 1999) and a range of other attributes of pitch perception (McLachlan, 2009; Meddis & O’Mard, 2006). However, since the spike rates of sustained chopper cells rapidly saturate (Wiegrebe & Winter, 2001), these models also predict that pitch strength (or salience) is largely independent of loudness (McLachlan & Grayden, 2014), which is in stark contrast to the linear dependence of pitch strength on loudness reported by Fastl (1989). Furthermore, it is difficult to justify the evolution of innate synchronization mechanisms for pitch processing in mammals such as rodents, whose behavior does not require such fine frequency discrimination. Finally, if fine pitch processing relies on an innate mechanism evolved in early animals, then it is difficult to explain why people have such large individual differences in the pitch discrimination of pure tones and why this would improve with training (McLachlan et al., 2013a).


Figure 1. Schematic representation of a neurobiological model of periodicity processing in the auditory brainstem. Synchronization of chopper cell inputs with stimulus-locked inputs to the inferior colliculus leads to enhanced spike rates at a specific best modulation frequency (BMF). CB = cerebellum; CF = characteristic frequency.

Redrawn from McLachlan and Grayden (2014).

It is well known that the auditory system is more sensitive to tonal sounds, such as speech or animal vocalizations, than to noise (Kidd et al., 1989; Kryter & Pearsons, 1965; Soeta et al., 2007). To account for this, McLachlan and Grayden (2014) published a neurobiologically inspired algorithm for enhancing the salience of tonal sounds based on the synchronization of chopper cell spike trains with stimulus-driven spike trains in the IC. In this algorithm, IC neural spike rates are enhanced by the coincidence of chopper spike inputs with spike train inputs that are phase-locked to the stimulus waveform, thereby increasing the specific loudness of any auditory filter channel that contains a periodic component (Figure 1). Octopus and bushy cells in the CN have very short spike latencies, so a single cell can phase-lock to acoustic waveforms at frequencies up to around 800 Hz and integrate information over many auditory filter bands (Ferragamo & Oertel, 2002; Oertel, 1997; Oertel et al., 2000). Octopus and bushy cells in the CN often project to neurons in the ventral lateral lemniscus that also have short temporal integration times (around 5 ms) and broad tuning (Covey & Casseday, 1991), so the combined excitatory and inhibitory input to the IC from octopus and bushy cells is likely to contain precise waveform timing information that is superimposed on inputs from the CN chopper cells (Pickles, 2008; Riquelme et al., 2001; Schofield, 2005). This model raises the possibility that innate synchronization mechanisms in animals evolved to enhance the salience of vocal communications in noisy environments rather than to produce pitch perception at a frequency resolution of less than a critical bandwidth. Music training in humans might then involve learning to associate a spatial dimension of pitch with elevated spike rates at best modulation frequencies in the IC (McLachlan, 2011; McLachlan et al., 2013a, 2013c).
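The coincidence principle at the core of this algorithm can be illustrated with idealized spike trains: a stimulus-locked train at 200 Hz coincides, within a brief window, with a chopper train far more often when the innate chopping frequency matches the stimulus periodicity. The sketch below is a highly simplified illustration of this principle, not an implementation of the McLachlan and Grayden (2014) model; the spike generation, jitter, and window values are all assumptions.

```python
import numpy as np

fs = 10000                                   # 0.1 ms time resolution
dur = 0.5
stim_f = 200.0                               # stimulus periodicity (5 ms period)

def spike_train(freq, fs, dur, jitter_ms=0.2, seed=0):
    """Idealized periodic spike train with slight timing jitter."""
    rng = np.random.default_rng(seed)
    times = np.arange(0.0, dur, 1.0 / freq)
    times = times + rng.normal(0.0, jitter_ms / 1000.0, times.shape)
    train = np.zeros(int(fs * dur))
    idx = np.clip((times * fs).astype(int), 0, len(train) - 1)
    train[idx] = 1.0
    return train

locked = spike_train(stim_f, fs, dur, seed=1)   # stimulus-locked input (octopus/bushy path)
window = int(0.0005 * fs)                       # 0.5 ms coincidence window
near_locked = np.convolve(locked, np.ones(2 * window + 1), "same") > 0

# "IC" coincidence counts for chopper inputs with different innate chopping rates:
for chop_f in (120.0, 150.0, 200.0, 250.0, 300.0):
    chopper = spike_train(chop_f, fs, dur, seed=2)
    print(f"{chop_f:5.0f} Hz chopper: {np.sum(chopper * near_locked):4.0f} coincidences")
# The 200 Hz chopper fires within the window on nearly every stimulus cycle,
# yielding the highest coincidence count (an enhanced spike rate at its BMF).
```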

For pitch perception of harmonic complex tones, a single value of pitch must be associated with a wide range of neural activation peaks that occur at harmonic and subharmonic frequencies in multiple auditory filter channels (McLachlan, 2009). The proposition that auditory recognition mechanisms make this association (McLachlan, 2009, 2011; McLachlan et al., 2013a) requires that the brainstem contain long-term memory templates to enable rapid integration of afferent auditory information.

Sound recognition subserves a wide range of animal behaviors, such as tone-conditioned motor reflexes and autonomic arousal to threats. For example, startle responses to acoustic stimuli are common in terrestrial vertebrates, and in rats these responses have very short latencies of about 5 ms (Fleshler, 1965). Habituation of startle responses is frequency specific (Lingenhöhl & Friauf, 1994), indicating the presence of a rapid neural pathway with the capacity to learn spectral information. Lingenhöhl and Friauf (1994) found that this pathway comprises only three synapses: the CN innervates giant neurons in the pons region of the brainstem, which in turn innervate cranial and spinal motor neurons. Consistent with this, Aitkin and Boyd (1978) described a subset of cells (15%) in the dorsolateral pontine nucleus, with onset latencies to auditory stimuli as short as 3 ms, that likely receive direct input from the CN (Huang et al., 1982). Learned fear responses to acoustic stimuli are also rapid and frequency specific (Sacchetti et al., 2005). The thalamus is innervated by the pons and the deep nucleus of the cerebellum (Pare et al., 1990; Reese et al., 1995), and it projects to the amygdala, which is associated with autonomic arousal. These connections form a neural circuit that can rapidly learn to recognize auditory stimuli paired with pain or threat, generate autonomic arousal, and adapt these learned associations (McLachlan & Wilson, 2016; Sacchetti et al., 2005; Weinberger, 2012).

Auditory conditioning of eye-blinks in rabbits can be achieved by repeated presentation of puffs of air on the eye paired with an initially neutral stimulus such as a tone (Medina et al., 2000; Ohyama et al., 2003; Perrett et al., 1993; Thompson & Steinmetz, 2009). Rabbits can be taught to accurately blink at any time between 100 and 600 ms after the onset of a tone of a specific frequency, with the accuracy and number of blink responses diminishing at longer time intervals up to about 3 s (Ohyama et al., 2003). The minimum onset latency of an eye-blink in rabbits is about 25–40 ms (Thompson & Steinmetz, 2009). Lesions of the cerebellar cortex permanently abolish the adaptive timing of eye-blinks, leaving only frequency-specific reflex responses with short and relatively fixed delays (Ohyama et al., 2003; Perrett et al., 1993). This finding indicates that the cerebellar cortex plays a role in learning accurate temporal relationships between acoustic and tactile stimuli. The preservation of the frequency selectivity of blink responses after cerebellar lesioning suggests that the learning of frequency-specific reflexes occurs prior to cerebellar cortical processing (presumably in the CN–pons circuit described above for startle and fear responses).


Figure 2. A schematic diagram of the neural pathways involved in auditory processing described in this chapter. Blue and red arrows show “what” and “where” auditory pathways, respectively, including cortico-ponto-cerebellar pathways associated with sound recognition. The pathways marked by a ‘minus’ sign are inhibitory. Green arrows show limbic pathways associated with emotional responses to music and other conditioned stimuli. Black arrows show, for completeness, other brain pathways involved in broader motor, cognitive, and memory functions.

The cerebellar cortex receives innervation from the IC (Aitkin & Boyd, 1975), providing a network that can learn finer-resolution pitch associations as the neural responses at best modulation frequencies increase over multiple waveform periods (McLachlan, 2009; McLachlan & Grayden, 2014; McLachlan & Wilson, 2016). Figure 2 shows this network along with additional neural circuitry discussed below. The substantial interconnection of cerebellar networks with temporal and parietal cortices allows rapid association between auditory information and kinesthetic and visual information (Ackermann et al., 2007), as is required to generate spatial pitch height representations. Consistent with cerebellar involvement in pitch processing, Parsons et al. (2009) found that pitch discrimination thresholds of high-functioning patients afflicted with varying degrees of global cerebellar degeneration were on average over five times poorer than those of controls and were proportional to the degree of cerebellar ataxia. Furthermore, Holcomb et al. (1998) found increased blood flow in the left lateral cerebellum associated with the decision components of a pitch recognition task. Finally, Hutchinson et al. (2003) reported that male musicians had larger cerebellar volumes relative to their total brain volume than male nonmusicians, and Abdul-Kareem et al. (2011) reported increased white matter volumes in the middle and superior cerebellar peduncles of musicians, suggesting structural adaptation to extended periods of musical practice for enhanced motor control and better pitch discrimination (Holcomb et al., 1998).

To summarize what we have covered so far, theorizing about pitch began in the early stages of the scientific revolution, when mechanistic models of hearing led to the assumption that pitch perception is a primitive and automatic auditory process that occurs early in the auditory pathways. However, individual differences in pitch abilities associated with the type and level of musical training suggest that pitch perception involves learned associations of sound timbre to a spatial dimension (such as the keys on a piano). These spatial associations are distinct from more typical associations of timbre with symbolic meaning, as occur in speech and animal communication. It is possible that associations of sounds with pitch height can use any salient and consistent change across the multidimensional neural representations of stimuli in the IC. So participants in psychoacoustic experiments may learn to associate highly artificial stimuli with a pitch scale based on any feature of the IC’s stimulus representation—a possibility that might explain the many competing claims about neural mechanisms that underpin pitch perception in the literature (de Cheveigné, 2005).

Emotional and Cognitive Processing of Music

The invention of string and wind instruments created the opportunity to accurately reproduce a musical scale based on the absence of beating in intervals such as the Pythagorean perfect 5th. The association of other simple ratio tunings with the perceived smoothness of chords was first made by Persian musicians in the 13th century AD (Partch, 1974). For the remainder of the second millennium AD music theorists debated the benefits of various tuning strategies in terms of the relative roughness of intervals created at different degrees of musical scales and of the possibilities that musical scales afforded for musical modulation, that is, for shifting melodic patterns across different starting notes (Chalmers, 1990; Partch, 1974). In the mid-19th century roughness became largely synonymous with dissonance, after Helmholtz (1863/1954) proposed that dissonance was created by the beating of closely tuned harmonics in harmonic complex tones that were mistuned from simple integer ratios.

Dissonance is a strong negative emotional response to stimuli, and during the 20th century many researchers produced behavioral and neurophysiological evidence supporting the Helmholtz roughness model of dissonance. A particularly influential paper by Plomp and Levelt (1965) showed that dissonance ratings for pairs of pure tones were greater for intervals of less than a critical bandwidth. This finding was taken as evidence that beating (or comodulation) of tones that are not resolved by auditory filter channels causes dissonance. However, the same data could also be interpreted as showing that failure to spectrally resolve auditory information causes dissonance, regardless of whether it also causes roughness (McLachlan et al., 2013a). Another influential stream of research attempted to show that animals and human infants have an innate preference for music intervals tuned to simple integer ratios (Chiandetti & Vallortigara, 2011; Izumi, 2000; Trainor et al., 2002). The relevant studies found differences in behavioral responses to one-semitone intervals that are not resolved by the auditory system, to octave (2/1) intervals, and to perfect 5th (3/2) intervals. However, rather than showing a preference for simple integer tunings, these studies could also be interpreted as showing that dissonance is greater for the stimuli with spectrally unresolved, one-semitone intervals.
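The roughness component at issue in these studies is straightforward to model. The sketch below computes the pairwise roughness of two pure tones using Sethares' (1993) parameterization of the Plomp and Levelt (1965) curve, which peaks at a fraction of a critical bandwidth and decays for wider intervals. Note that, on the account developed in this chapter, such a curve captures roughness but not dissonance as a whole.

```python
import numpy as np

def roughness(f1, f2, a1=1.0, a2=1.0):
    """Pairwise roughness of two pure tones (Sethares' 1993 parameterization
    of the Plomp & Levelt 1965 curve)."""
    fmin = min(f1, f2)
    s = 0.24 / (0.0207 * fmin + 18.96)   # scales the curve to the local critical band
    x = abs(f2 - f1)
    return a1 * a2 * (np.exp(-3.51 * s * x) - np.exp(-5.75 * s * x))

# Roughness rises to a maximum within a critical bandwidth, then decays:
for df in (0, 20, 40, 80, 160, 320):
    print(f"440 Hz + {df:3d} Hz: roughness = {roughness(440.0, 440.0 + df):.3f}")
```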

Early in the 20th century Guernsey (1928) performed a series of experiments that presented substantial challenges for the roughness theory of dissonance. The roughness theory proposes that consonance is the absence of dissonance and should increase when there are fewer mistuned harmonics in the stimulus. Guernsey found no evidence that reducing the number of harmonics in the stimulus increased consonance and instead found strong effects of music training, leading her to suggest that consonance was associated with the familiarity of commonly used musical chords. Given the importance of Guernsey’s work for music theory, it is remarkable how little attention her findings received over the remainder of the 20th century.

Guernsey’s findings are consistent with the Object-Attribute Model (OAM) of auditory processing developed almost a century later by McLachlan and Wilson (2010, 2016). According to the OAM, recognition mechanisms start early in auditory processing pathways and prime fine pitch processing based on waveform periodicity. Music training leads to increased familiarity for common chords, so that successful recognition occurs more often and activates more accurate long-term memory representations that enhance pitch processing (McLachlan, 2011). McLachlan et al. (2013a) investigated pitch-matching accuracy and familiarity and dissonance ratings for a range of common and uncommon music chords in musicians with various levels of music training. They found that dissonance rating was inversely proportional to pitch-matching accuracy (moderated by the level of music training). This result led McLachlan et al. to propose the cognitive incongruence model of dissonance. In this model, failure of spectral recognition mechanisms for unfamiliar chords causes incongruence between spectral and periodicity-based pitch estimates, leading to strong negative affect (or dissonance) in musicians. In contrast, nonmusicians with poor pitch-matching accuracy (who presumably had not learned to associate periodicity cues with pitch) reported no differences in the dissonance of chords with intervals greater than a critical bandwidth (McLachlan et al., 2013a).

Having been persuaded by Helmholtz’s roughness model of dissonance, Terhardt (1974) proposed a distinction between sensory dissonance for musical chords presented independently of a musical context and musical dissonance experienced when music deviates from an expected pattern of chords or notes. In the Western music tradition, melodic and harmonic expectations are based on the use of functional harmony, a system of hierarchical tonal relationships based on Pythagorean tuning (Piston, 1948). Deutsch and Feroe (1981) proposed that long-term memory templates for music scales are hierarchically encoded, with greater emphasis placed on more common musical intervals; McLachlan and Wilson (2010) subsequently proposed that memory templates for scales could prime pitch processing in primary auditory cortex to create music expectancies. So in contrast to Terhardt (1974), the cognitive incongruence model of dissonance suggests that sensory and musical dissonance may both arise from negative affect generated by the failure of pitch-priming mechanisms. Under this model, sensory dissonance is generated by the failure of recognition mechanisms for stimulus timbre, whereas musical dissonance is generated by the failure of recognition mechanisms for music melody and scales (McLachlan et al., 2013a).

In the OAM, sound recognition occurs in brainstem and cerebellar pathways (McLachlan & Wilson, 2016). The pons and the deep nucleus of the cerebellum project to the thalamus, which in turn projects to the amygdala (Figure 2; Ramnani, 2006; Strick et al., 2009; Wolpert et al., 1998). The amygdala regulates autonomic arousal and, in conjunction with the hippocampal and parahippocampal cortices, the integration of sensory, semantic, and mnemonic operations (Critchley, 2005). Neuroimaging research has shown increased amygdala, hippocampal, and parahippocampal activity associated with the experience of dissonance (Blood et al., 1999; Koelsch et al., 2006; Wieser & Mazzola, 1986). This is consistent with autonomic arousal occurring in association with increased stimulus ambiguity due to the failure of recognition mechanisms in the pons and cerebellum.

Zatorre and colleagues reported activation of the dopaminergic brainstem pathways when listeners reported feeling pleasure while listening to music (Blood et al., 1999; Salimpoor et al., 2012). Furthermore, Mitterschiffthaler et al. (2007) reported increased activation of autonomic arousal pathways involving the amygdala for sad music, presumably associated with greater dissonance, and of the dopaminergic reward network for happy music, presumably associated with greater consonance (Figure 2). The release of dopamine following rewarded (or successful) behavior is a well-detailed neurobiological mechanism (Berridge & Kringelbach, 2013) that has been shown to be recruited by cognitive feedback such as occurs in problem solving without explicit external rewards (Tricomi et al., 2006). Successful prediction of environmental events generates confidence in an individual’s model of their immediate environment, and so successful prediction of musical events may cause pleasure and further engagement with music, although too much predictability can lead to boredom and even irritation (Meyer, 1956; Narmour, 1992).

Taken together, these data suggest that rather than consonance being merely the absence of dissonance (Helmholtz, 1863/1954), consonance involves activation of reward networks in the limbic system due to the successful prediction of musical patterns. Naturally, consonance will occur only in the absence of dissonance, since dissonance is associated with failure of recognition mechanisms. Note that this definition of consonance could apply to the successful prediction of any musical pattern, such as rhythm or orchestration. So the experience of consonance could occur within music traditions that do not use hierarchically defined pitch scales such as those found in Western music.

Classical music has long been conceived of as high art in Western culture, associated with high levels of intellectual sophistication and higher cognitive abilities (Seashore & Mount, 1918). However, primitive vertebrates without neocortex display behaviors very similar to human responses to music. For example, young chicks prefer stimuli comprising frequency intervals larger than one semitone (Chiandetti & Vallortigara, 2011), possibly because of aversive reactions to sounds that the auditory system resolves poorly (McLachlan et al., 2013a). Primitive animals such as frogs respond preferentially to specific spectrotemporal structures in mating calls (Castellano et al., 2009; Ryan et al., 1990), and as in humans, the auditory brainstem neurons in frogs contain a wide diversity of spectrotemporal response fields (Hall, 1994) that provide them with highly specific sound recognition mechanisms (Ryan et al., 1990). Just as human pleasure in musical patterns involves dopaminergic neural pathways (Blood et al., 1999; Salimpoor et al., 2012), so too do the mating behaviors of frogs (O’Connell et al., 2010). Finally, marine iguanas are capable of distinguishing the predator alarm calls of mockingbirds from other mockingbird songs for the purposes of initiating escape and alert behaviors (Vitousek et al., 2007). Such behaviors are generally associated with autonomic arousal to familiar aversive stimuli, such as the fear displayed by rats when they hear a tone that has been paired with a subsequent electrical shock (Quirk et al., 1995).

Berridge and Kringelbach (2013) describe the affective keyboard, a neural mechanism by which modulation of sensitivity over an array of sites in the nucleus accumbens by the frontal cortex may initiate either intense dread or intense desire in response to the same stimulus. Aversive responses to familiar (or recognized) sounds likely engage the nucleus accumbens and amygdala. This brain network is different from that for dissonance, which, as described above, involves failure to recognize the stimulus and likely activates amygdala and hippocampal brain regions via the thalamus (Blood et al., 1999; Koelsch et al., 2006; Wieser & Mazzola, 1986). Autonomic arousal coupled with feelings of empowerment also occurs in animals when they make aggressive territorial displays and vocalizations, which may explain why some people enjoy making loud and aggressive music (Hsu et al., 2014).

Taken together, the findings discussed above suggest that emotional responses to music in humans arise primarily from ancient brain networks that link the limbic system with recognition mechanisms in the brainstem and cerebellum and with prefrontal cortical regions associated with emotional regulation (Figure 2).

Birds, frogs, and reptiles lack neocortex and the higher cerebral auditory processing centers found in humans. Brainstem and cerebellar networks can learn implicitly by forming neural templates for stimuli (Fiez et al., 1992; Gebhart et al., 2002; Ravizza et al., 2006). Templates for stimuli from one sensory modality may be paired with templates for stimuli from other modalities that co-occur with them with high statistical reliability—in other words, multiple sensory inputs become associated with the same object, event, or behavior. For example, people learn speech and music implicitly (Mahon & Caramazza, 2008). Their learning rates are enhanced by pairing perception with production (Kotz & Schwartze, 2010), likely as a result of the added structure that motor mapping can provide for forming sensory neural templates in the cerebellum and of the opportunity to use voluntary actions to aid perceptual learning. In particular, people who learn musical instruments have better pitch perception (McLachlan et al., 2013a, 2013c), have better auditory brainstem phase-locking to the sounds of their instruments (Strait et al., 2012), and often align the psychological pitch dimension with motor maps for their instrument (Rusconi et al., 2006). More generally, this phenomenon, known as embodied cognition (Mahon & Caramazza, 2008), allows musicians to automatize many music perception and production skills, thereby releasing cerebral processing resources for analysis of higher-level features of musical expression.

In recent human evolution, speech has likely driven the rapid enlargement of the ventrolateral portion of the cerebellum in conjunction with the inferior frontal region of the cortex (Leiner et al., 1989; Ramnani, 2006). Cortico-cerebellar pathways to and from inferior frontal cerebral regions in humans could support symbolic associations with long-term sensory memory and the priming of sensory memories by contextual information in working memory (McLachlan & Wilson, 2016; Strick et al., 2009). By enabling humans to process complex patterns of auditory stimuli in cerebral working memory in conjunction with primitive brainstem networks, this circuitry leads to diverse and dynamic music cultures that evoke powerful emotional responses. Similarly, cortico-cerebellar pathways to and from parietal cerebral regions can support associations of implicitly learned templates with visuospatial information (McLachlan et al., 2011) and cerebral motor planning. These associations can support spatial representations of pitch height, as suggested by the ANSI pitch definition (American National Standards Institute, 1973).

In Western music the frequency of the starting or reference pitch in these spatial scales can be shifted (or modulated), leading to a complex spatial language of pitch relationships known as relative pitch, which generates strong musical expectancies. Relative pitch is quite distinct from the absolute pitches (or fixed frequencies) that are used to convey meaning in animal communications. So indeed music is a high art, in which complex expressions are shaped in the cerebrum in response to implicitly learned grammars that are likely stored in cortico-ponto-cerebellar pathways. Successful prediction of musical gestures results in activation of reward networks in the limbic system, reinforcing learning (Blood et al., 1999; Salimpoor et al., 2012). In contrast, music can also generate high autonomic arousal by transgressing implicitly learned musical grammars (Blood et al., 1999) or by being loud enough to evoke dominance responses associated with a social identity (Hsu et al., 2014).

Conclusion

Music appears to be common to all human societies, but humans are the only animals that make music. That is, humans are the only animals that spontaneously exhibit fine pitch discrimination and rhythmic synchronization (Merker et al., 2009). While musical behaviors appear to be innate in humans, an individual’s musical perceptual skills develop through training in relation to his or her particular cultural tradition. For example, people who learn to play inharmonic tuned percussion instruments learn to accurately associate pitch with the lowest-frequency partials of those instruments (McLachlan et al., 2013c), and people who learn complex non-Western rhythms have more accurate temporal discrimination for those rhythms than do people unfamiliar with those rhythms (Hannon et al., 2012). These observations are consistent with the proposition that learning music involves neuroplastic adaptation of brainstem and cerebellar pathways that generates long-term memory representations of auditory information, which in turn become associated with memory for musical identities and patterns represented in the “what” and “where” pathways of the cerebral cortex (McLachlan & Wilson, 2010, 2016). The development of musical skill through extensive and repetitive training is consistent with cerebellar function (McPherson, 2005), in that the cerebellum implicitly learns to automate perceptual and motor skills, thereby reducing the conscious effort required to understand complex information and perform complex tasks (Ramnani, 2006). Finally, the arousing, pleasurable, and often prosocial experiences that music affords (in addition to exercising and sharpening auditory, cognitive, and motor acuity) may have conferred an evolutionary advantage for musicality in humans that has contributed to its widespread distribution across human societies.

Acknowledgement

I would like to acknowledge the substantial intellectual input by my colleague Professor Sarah Wilson to the development of the new cerebellar model of auditory processing outlined in this chapter.

References

Abdul-Kareem, I. A., Stancak, A., Parkes, L. M., Al-Ameen, M., AlGhamdi, J., Aldhafeeri, F. M., Embleton, K., Morris, D., & Sluming, V. (2011). Plasticity of the superior and middle cerebellar peduncles in musicians revealed by quantitative analysis of volume and number of streamlines based on diffusion tensor tractography. Cerebellum, 10, 611–623.Find this resource:

Ackermann, H., Mathiak, K., & Riecker, A. (2007). The contribution of the cerebellum to speech production and speech perception: Clinical and functional imaging data. Cerebellum, 6, 202–213.Find this resource:

Aitkin, L. M., & Boyd, J. (1975). Responses of single units in cerebellar vermis of the cat to monaural and binaural stimuli. Journal of Neurophysiology, 38, 418–429.Find this resource:

Aitkin, L. M., & Boyd, J. (1978). Acoustic input to the lateral pontine nuclei. Hearing Research, 1, 67–77.Find this resource:

American National Standards Institute (1973). American National Psychoacoustical Terminology. S3.20. New York: American National Standards Association.Find this resource:

Antovic, M. (2009). Musical metaphors in Serbian and Romani children: An empirical study. Metaphor and Symbol, 24, 184–202.Find this resource:

Arnott, S. R., Binns, M. A., Grady, C. L., & Alain, C. (2004). Assessing the auditory dual-pathway model in humans. NeuroImage, 22, 401–408.Find this resource:

Assmann, P. F., & Paschall, D. D. (1998). Pitches of concurrent vowels, Journal of the Acoustical Society of America, 103, 1150–1160.Find this resource:

Assmann, P. F., & Summerfield, Q. (1990). Modeling the perception of concurrent vowels: Vowels with different fundamental frequencies. Journal of the Acoustical Society of America, 88, 680–697.

Beecham, R., Reeve, R. A., & Wilson, S. J. (2009). Spatial representations are specific to different domains of knowledge. PLoS ONE, 4, e5543.

Berridge, K. C., & Kringelbach, M. L. (2013). Neuroscience of affect: Brain mechanisms of pleasure and displeasure. Current Opinion in Neurobiology, 23, 294–303.

Blackburn, C. C., & Sachs, M. B. (1991). Regularity analysis in a compartmental model of chopper units in the anteroventral cochlear nucleus. Journal of Neurophysiology, 65, 606–629.

Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience, 2, 382–387.

Cariani, P. (1999). Temporal coding of periodicity pitch in the auditory system: An overview. Neural Plasticity, 6, 147–172.

Carlyon, R. P., & Shackleton, T. M. (1994). Comparing the fundamental frequencies of resolved and unresolved harmonics: Evidence for two pitch mechanisms? Journal of the Acoustical Society of America, 95, 3541–3554.

Castellano, S., Zanollo, V., Marconi, V., & Berto, G. (2009). The mechanisms of sexual selection in a lek-breeding anuran, Hyla intermedia. Animal Behaviour, 77, 213–224.

Chalmers, J. (1990). Divisions of the tetrachord. Lebanon, NH: Frog Peak Music.

Chiandetti, C., & Vallortigara, G. (2011). Chicks like consonant music. Psychological Science, 22, 1270–1273.

Covey, E., & Casseday, J. H. (1991). The monaural nuclei of the lateral lemniscus in an echolocating bat: Parallel pathways for analyzing temporal features of sound. Journal of Neuroscience, 11, 3456–3470.

Critchley, H. D. (2005). Neural mechanisms of autonomic, affective, and cognitive integration. Journal of Comparative Neurology, 493, 154–166.

Cupchik, G. C., Phillips, K., & Hill, D. S. (2001). Shared processes in spatial rotation and music permutation. Brain and Cognition, 46, 373–382.

de Cheveigné, A. (2005). Pitch perception models. In C. Plack, A. Oxenham, R. R. Fay, & A. N. Popper (Eds.), Pitch—neural coding and perception (pp. 169–233). New York: Springer-Verlag.

de Cheveigné, A., & Kawahara, H. (1999). Multiple period estimation and pitch perception model. Speech Communication, 27, 175–185.

Delgutte, B. (1984). Speech coding in the auditory nerve: II. Processing schemes for vowel-like sounds. Journal of the Acoustical Society of America, 75, 879–886.

Deterding, D. (1997). The formants of monophthong vowels in Standard Southern British English pronunciation. Journal of the International Phonetic Association, 27, 47–55.

Deutsch, D., & Feroe, J. (1981). The internal representation of pitch sequences in tonal music. Psychological Review, 88, 503–522.

Ehret, G., & Schreiner, C. E. (2005). Spectral and intensity coding. In J. A. Winer & C. E. Schreiner (Eds.), The inferior colliculus (pp. 319–345). New York: Springer.

Fastl, H. (1989). Pitch strength of pure tones. In Proceedings of the 13th International Congress on Acoustics (pp. 11–14). Belgrade, Serbia: SAVA CENTAR.

Fastl, H., & Hesse, A. (1984). Frequency discrimination of pure tones at short durations. Acustica, 56, 41–47.

Ferragamo, M. J., & Oertel, D. (2002). Octopus cells of the mammalian ventral cochlear nucleus sense the rate of depolarization. Journal of Neurophysiology, 87, 2262–2270.

Fiez, J. A., Petersen, S. E., Cheney, M. K., & Raichle, M. E. (1992). Impaired non-motor learning and error detection associated with cerebellar damage. Brain, 115, 155–178.

Fleshler, M. (1965). Adequate acoustic stimulus for startle reaction in the rat. Journal of Comparative and Physiological Psychology, 60, 200–207.

Fletcher, H., & Munson, W. A. (1937). Relation between loudness and masking. Journal of the Acoustical Society of America, 9, 1–10.

Fletcher, N. H., & Rossing, T. (1991). The physics of musical instruments (chapter 3). New York: Springer-Verlag.

Gebhart, A. G., Petersen, S. E., & Thach, W. T. (2002). The role of the cerebellum in language. In S. M. Highstein & W. T. Thach (Eds.), Recent developments in cerebellar research (pp. 318–333). New York: New York Academy of Sciences.

Goldstein, J. L. (1973). An optimum processor theory for the central formation of the pitch of complex tones. Journal of the Acoustical Society of America, 54, 1496–1516.

Guernsey, M. (1928). The role of consonance and dissonance in music. American Journal of Psychology, 40, 173–204.

Hall, J. C. (1994). Central processing of communication sounds in the anuran auditory system. American Zoologist, 34, 670–684.

Handel, S. (1995). Timbre perception and auditory object identification. In B. C. J. Moore (Ed.), Hearing (pp. 425–461). New York: Academic Press.

Handel, S., & Erickson, M. L. (2004). Sound source identification: The possible role of timbre transformations. Music Perception, 21, 587–610.

Hannon, E. E., Soley, G., & Ullal, S. (2012). Familiarity overrides complexity in rhythm perception: A cross-cultural comparison of American and Turkish listeners. Journal of Experimental Psychology: Human Perception and Performance, 38, 543–548.

Helmholtz, H. L. F. (1863/1954). On the sensations of tone as a physiological basis for the theory of music (2nd ed.; A. J. Ellis, Trans.). New York: Dover.

Holcomb, H. H., Medoff, D. R., Caudill, P. J., Zhao, Z., Lahti, A. C., Dannals, R. F., & Tamminga, C. A. (1998). Cerebral blood flow relationships associated with a difficult tone recognition task in trained normal volunteers. Cerebral Cortex, 8, 534–542.

Hsu, D. Y., Huang, L., Nordgren, L. F., Rucker, D. D., & Galinsky, A. D. (2014). The music of power: Perceptual and behavioral consequences of powerful music. Social Psychological and Personality Science, 6, 75–83. doi:10.1177/1948550614542345

Huang, C., Liu, G., & Huang, R. (1982). Projections from the cochlear nucleus to the cerebellum. Brain Research, 244, 1–8.

Hutchinson, S., Lee, L. H., Gaab, N., & Schlaug, G. (2003). Cerebellar volume of musicians. Cerebral Cortex, 13, 943–949.

Izumi, A. (2000). Japanese monkeys perceive sensory consonance of chords. Journal of the Acoustical Society of America, 108, 3073–3078.

Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. (2000). Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123, 155–163.

Kidd, G., Jr., Mason, C. R., Brantley, M. A., & Owen, G. A. (1989). Roving-level tone-in-noise detection. Journal of the Acoustical Society of America, 86, 1310–1317.

Koelsch, S., Fritz, T., von Cramon, D. Y., Müller, K., & Friederici, A. D. (2006). Investigating emotion with music: An fMRI study. Human Brain Mapping, 27, 239–250.

Kotz, S. A., & Schwartze, M. (2010). Cortical speech processing unplugged: A timely subcortico-cortical framework. Trends in Cognitive Sciences, 14, 392–399.

Kryter, K. D., & Pearsons, K. S. (1965). Judged noisiness of a band of random noise containing an audible pure tone. Journal of the Acoustical Society of America, 38, 106–112.

Langner, G., Albert, M., & Briede, T. (2002). Temporal and spatial coding of periodicity information in the inferior colliculus of awake chinchilla (Chinchilla laniger). Hearing Research, 168, 110–130.

Leiner, H., Leiner, A., & Dow, R. (1989). Reappraising the cerebellum: What does the hindbrain contribute to the forebrain? Behavioral Neuroscience, 103, 998–1008.

Levitin, D. J., & Rogers, S. E. (2005). Absolute pitch: Perception, coding, and controversies. Trends in Cognitive Sciences, 9, 26–33.

Licklider, J. C. R. (1951). A duplex theory of pitch perception. Experientia, 7, 128–134.

Lingenhöhl, K., & Friauf, E. (1994). Giant neurons in the rat reticular formation: A sensorimotor interface in the elementary acoustic startle circuit? Journal of Neuroscience, 14, 1176–1194.

Mahon, B. Z., & Caramazza, A. (2008). A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. Journal of Physiology-Paris, 102, 59–70.

McLachlan, N. M. (2009). A model of pitch strength and just noticeable difference. Hearing Research, 249, 23–35.

McLachlan, N. M. (2011). A neurocognitive model of recognition and pitch segregation. Journal of the Acoustical Society of America, 130, 2845–2854.

McLachlan, N. M., & Grayden, D. B. (2014). Enhancement of speech perception in noise by periodicity processing: A neurobiological model and signal processing algorithm. Speech Communication, 57C, 114–125.

McLachlan, N. M., & Wilson, S. J. (2010). The central role of recognition in auditory perception: A neurobiological model. Psychological Review, 117, 175–196.

McLachlan, N. M., & Wilson, S. J. (2016). Auditory recognition in brain stem and cerebellar pathways. Forthcoming.

McLachlan, N. M., Keramati Nigjeh, B., & Hasell, A. (2003). The design of bells with harmonic overtones. Journal of the Acoustical Society of America, 114, 505–511.

McLachlan, N. M., Greco, L., Toner, E., & Wilson, S. J. (2011). Using spatial manipulation to examine interactions between visual and auditory encoding of pitch and time. Frontiers in Psychology, 1, 233. doi:10.3389/fpsyg.2010.00233

McLachlan, N. M., Marco, D. J. T., & Wilson, S. J. (2013a). Consonance and pitch. Journal of Experimental Psychology: General, 142, 1142–1158. doi:10.1037/a0030830

McLachlan, N. M., Marco, D. J. T., & Wilson, S. J. (2013b). Pitch and plasticity: Insights from the pitch matching of chords by musicians with absolute and relative pitch. Brain Sciences, 3, 1615–1634. doi:10.3390/brainsci3041615

McLachlan, N. M., Marco, D. J. T., & Wilson, S. J. (2013c). The musical environment and auditory plasticity: Hearing the pitch of percussion. Frontiers in Psychology, 4, 1–6. doi:10.3389/fpsyg.2013.00768

McPherson, G. (2005). From child to musician: Skill development during the beginning stages of learning an instrument. Psychology of Music, 33, 5–35.

Meddis, R. (1986). Simulation of mechanical to neural transduction in the auditory receptor. Journal of the Acoustical Society of America, 79, 702–711.

Meddis, R., & Hewitt, M. J. (1991a). Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification. Journal of the Acoustical Society of America, 89, 2866–2882.

Meddis, R., & Hewitt, M. J. (1991b). Virtual pitch and phase sensitivity of a computer model of the auditory periphery. II: Phase sensitivity. Journal of the Acoustical Society of America, 89, 2883–2894.

Meddis, R., & Hewitt, M. J. (1992). Modeling the identification of concurrent vowels with different fundamental frequencies. Journal of the Acoustical Society of America, 91, 233–245.

Meddis, R., & O’Mard, L. P. (2006). Virtual pitch in a computational physiological model. Journal of the Acoustical Society of America, 120, 3861–3869.

Medina, J. F., Garcia, K. S., Nores, W. L., Taylor, N. M., & Mauk, M. D. (2000). Timing mechanisms in the cerebellum: Testing predictions of a large-scale computer simulation. Journal of Neuroscience, 20, 5516–5525.

Merker, B. H., Madison, G. S., & Eckerdal, P. (2009). On the role and origin of isochrony in human rhythmic entrainment. Cortex, 45, 4–17.

Meyer, L. B. (1956). Emotion and meaning in music. Chicago: University of Chicago Press.

Mithen, S. (2005). The singing Neanderthals: The origins of music, language, mind and body. London: Weidenfeld and Nicolson.

Mitterschiffthaler, M. T., Fu, C. H. Y., Dalton, J. A., Andrew, C. M., & Williams, S. C. R. (2007). A functional MRI study of happy and sad affective states induced by classical music. Human Brain Mapping, 28, 1150–1162.

Moore, B. C. J. (1973). Frequency difference limens for short duration tones. Journal of the Acoustical Society of America, 54, 610–619.

Moore, B. C. J. (2003). An introduction to the psychology of hearing (5th ed.). London: Academic Press.

Moore, B. C. J., & Glasberg, B. R. (1983). Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. Journal of the Acoustical Society of America, 74, 750–753.

Mudd, S. A. (1963). Spatial stereotypes of four dimensions of pure tone. Journal of Experimental Psychology, 66, 347–352.

Narmour, E. (1992). The analysis and cognition of melodic complexity: The implication-realization model. Chicago: University of Chicago Press.

O’Connell, L. A., Matthews, B. J., Ryan, M. J., & Hofmann, H. A. (2010). Characterization of the dopamine system in the brain of the túngara frog, Physalaemus pustulosus. Brain, Behavior and Evolution, 76, 211–225.

Oertel, D. (1985). Use of brain slices in the study of the auditory system: Spatial and temporal summation of synaptic inputs in cells in the anteroventral cochlear nucleus of the mouse. Journal of the Acoustical Society of America, 78, 329–333.

Oertel, D. (1997). Encoding of timing in the brain stem auditory nuclei of vertebrates. Neuron, 19, 959–962.

Oertel, D., Bal, R., Gardner, S. M., Smith, P. H., & Joris, P. X. (2000). Detection of synchrony in the activity of auditory nerve fibers by octopus cells of the mammalian cochlear nucleus. Proceedings of the National Academy of Sciences of the USA, 97, 11773–11779.

Ohyama, T., Nores, W. L., Murphy, M., & Mauk, M. D. (2003). What the cerebellum computes. Trends in Neurosciences, 26, 222–227.

Oxenham, A. J., Micheyl, C., Keebler, M. V., Loper, A., & Santurette, S. (2011). Pitch perception beyond the traditional existence region of pitch. Proceedings of the National Academy of Sciences of the USA, 108, 7629–7634.

Pare, D., Steriade, M., Deschenes, M., & Bouhassira, D. (1990). Prolonged enhancement of anterior thalamic synaptic responsiveness by stimulation of a brain-stem cholinergic group. Journal of Neuroscience, 10, 20–33.

Parncutt, R. (1989). Harmony: A psychoacoustical approach. Berlin: Springer-Verlag.

Parsons, L. M., Petacchi, A., Schmahmann, J. D., & Bower, J. M. (2009). Pitch discrimination in cerebellar patients: Evidence for a sensory deficit. Brain Research, 1303, 84–96.

Partch, H. (1974). Genesis of a music (2nd ed.). New York: Da Capo.

Perrett, S. P., Ruiz, B. P., & Mauk, M. D. (1993). Cerebellar cortex lesions disrupt learning-dependent timing of conditioned eyelid responses. Journal of Neuroscience, 13, 1708–1718.

Pickles, J. O. (2008). An introduction to the physiology of hearing (3rd ed., pp. 155–183). Bingley, UK: Emerald.

Piston, W. (1948). Harmony (5th ed.). London: Gollancz.

Plomp, R. (1967). Pitch of complex tones. Journal of the Acoustical Society of America, 41, 1526–1533.

Plomp, R., & Levelt, W. J. M. (1965). Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America, 38, 548–560.

Pratt, C. C. (1930). The spatial character of high and low tones. Journal of Experimental Psychology, 13, 278–285.

Quirk, G. J., Repa, J. C., & LeDoux, J. E. (1995). Fear conditioning enhances short-latency auditory responses of lateral amygdala neurons: Parallel recordings in the freely behaving rat. Neuron, 15, 1029–1039.

Ramnani, N. (2006). The primate cortico-cerebellar system: Anatomy and function. Nature Reviews Neuroscience, 7, 511–522.

Rauschecker, J. P., & Tian, B. (2000). Mechanisms and streams for processing “what” and “where” in the auditory cortex. Proceedings of the National Academy of Sciences of the USA, 97, 11800–11806.

Ravizza, S. M., McCormick, C. A., Schlerf, J. E., Justus, T., Ivry, R. B., & Fiez, J. A. (2006). Cerebellar damage produces selective deficits in verbal working memory. Brain, 129, 306–320.

Reese, N. B., Garcia-Rill, E., & Skinner, R. D. (1995). Auditory input to the pedunculopontine nucleus: I. Evoked potentials. Brain Research Bulletin, 37, 257–264.

Riquelme, R., Saldaña, E., Osen, K. K., Ottersen, O. P., & Merchán, M. A. (2001). Colocalization of GABA and glycine in the ventral nucleus of the lateral lemniscus in rat: An in situ hybridization and semiquantitative immunocytochemical study. Journal of Comparative Neurology, 432, 409–424.

Ritsma, R. J. (1967). Frequencies dominant in the perception of the pitch of complex tones. Journal of the Acoustical Society of America, 42, 191–198.

Robinson, K., & Patterson, R. D. (1995). The stimulus duration required to identify vowels, their octave and their pitch chroma. Journal of the Acoustical Society of America, 98, 1858–1865.

Roffler, S. K., & Butler, R. A. (1968). Localization of tonal stimuli in the vertical plane. Journal of the Acoustical Society of America, 43, 1260–1266.

Rusconi, E., Kwan, B., Giordano, B. L., Umiltà, C., & Butterworth, B. (2006). Spatial representation of pitch height: The SMARC effect. Cognition, 99, 113–129.

Ryan, M. J., Fox, J. H., Wilczynski, W., & Rand, A. S. (1990). Sexual selection for sensory exploitation in the frog Physalaemus pustulosus. Nature, 343, 66–67.

Sacchetti, B., Scelfo, B., & Strata, P. (2005). The cerebellum: Synaptic changes and fear conditioning. Neuroscientist, 11, 217–227.

Salimpoor, V. N., van den Bosch, I., Kovacevic, N., McIntosh, A. R., Dagher, A., & Zatorre, R. J. (2012). Interactions between the nucleus accumbens and auditory cortices predict music reward value. Science, 340, 216–219.

Schneider, G. E. (1969). Two visual systems. Science, 163, 895–902.

Schofield, B. R. (2005). Olivary and lemniscal projections. In J. A. Winer & C. E. Schreiner (Eds.), The inferior colliculus (pp. 132–154). New York: Springer.

Schouten, J. F. (1938). The perception of subjective tones. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen, 41, 1086–1093.

Seashore, C. E., & Mount, G. H. (1918). Correlation of factors in musical talent and training. Psychological Monographs, 25, 83.

Semal, C., & Demany, L. (2006). Individual differences in the sensitivity to pitch direction. Journal of the Acoustical Society of America, 120, 3907–3915.

Shamma, S., & Klein, D. (2000). The case of the missing pitch templates: How harmonic templates emerge in the early auditory system. Journal of the Acoustical Society of America, 107, 2631–2644.

Smith, N. A., & Schmuckler, M. A. (2008). Dial A440 for absolute pitch: Absolute pitch memory by non–absolute pitch possessors. Journal of the Acoustical Society of America, 123, EL77–EL84.

Smith, Z. M., Delgutte, B., & Oxenham, A. J. (2002). Chimaeric sounds reveal dichotomies in auditory perception. Nature, 416, 87–90.

Soeta, Y., Yanai, K., Nakagawa, S., Kotani, K., & Horii, K. (2007). Loudness in relation to iterated rippled noise. Journal of Sound and Vibration, 304, 415–419.

Strait, D. L., Chan, K., Ashley, R., & Kraus, N. (2012). Specialization among the specialized: Auditory brainstem function is tuned in to timbre. Cortex, 48, 360–362.

Strick, P. L., Dum, R. P., & Fiez, J. A. (2009). Cerebellum and non-motor function. Annual Review of Neuroscience, 32, 413–434.

Sumner, C. J., Lopez-Poveda, E. A., & Meddis, R. (2002). A revised model of the inner-hair cell and auditory-nerve complex. Journal of the Acoustical Society of America, 111, 2178–2188.

Terhardt, E. (1974). Pitch, consonance, and harmony. Journal of the Acoustical Society of America, 55, 1061–1069.

Thompson, R. F., & Steinmetz, J. E. (2009). The role of the cerebellum in classical conditioning of discrete behavioural responses. Neuroscience, 162, 732–755.

Trainor, L. J., Tsang, C. D., & Cheung, V. H. W. (2002). Preference for sensory consonance in 2- and 4-month-old infants. Music Perception, 20, 187–194.

Tramo, M. J., Shah, G. D., & Braida, L. D. (2000). Functional role of auditory cortex in frequency processing and pitch perception. Journal of Neurophysiology, 87, 122–139.

Tricomi, E., Delgado, M. R., McCandliss, B. D., McClelland, J. L., & Fiez, J. A. (2006). Performance feedback drives caudate activation in a phonological learning task. Journal of Cognitive Neuroscience, 18, 1029–1043.

Vitousek, M. N., Adelman, J. S., Gregory, N. C., & St. Clair, J. J. H. (2007). Heterospecific alarm call recognition in a non-vocal reptile. Biology Letters, 3, 632–634.

Walker, K. M. M., Bizley, J. K., King, A. J., & Schnupp, J. W. H. (2011). Cortical encoding of pitch: Recent results and open questions. Hearing Research, 271, 74–87.

Weinberger, N. M. (2012). Plasticity in the primary auditory cortex, not what you think it is: Implications for basic and clinical auditory neuroscience. Otolaryngology, S3, 002. doi:10.4172/2161-119X.S3-002

Wiegrebe, L., & Winter, I. M. (2001). Temporal representation of iterated rippled noise as a function of delay and sound level in the ventral cochlear nucleus. Journal of Neurophysiology, 85, 1206–1219.

Wieser, H., & Mazzola, G. (1986). Musical consonances and dissonances: Are they distinguished independently by the right and left hippocampi? Neuropsychologia, 24, 805–812.

Wightman, F. L. (1973). The pattern-transformation model of pitch. Journal of the Acoustical Society of America, 54, 407–416.

Wilson, S. J., Lusher, D., Martin, C. L., Rayner, G., & McLachlan, N. M. (2012). Intersecting factors lead to absolute pitch acquisition that is maintained in a “fixed do” environment. Music Perception, 29, 285–296.

Wolpert, D. M., Miall, R. C., & Kawato, M. (1998). Internal models in the cerebellum. Trends in Cognitive Sciences, 2, 338–347.

Yost, W. A. (1996). Pitch strength of iterated rippled noise. Journal of the Acoustical Society of America, 100, 3329–3335.

Zilany, M. S. A., & Bruce, I. C. (2006). Modeling auditory-nerve responses for high sound pressure levels in the normal and impaired auditory periphery. Journal of the Acoustical Society of America, 120, 1446–1466.

Zwicker, E., & Fastl, H. (1999). Psychoacoustics: Facts and models (2nd ed.). Berlin: Springer-Verlag.