PRINTED FROM OXFORD HANDBOOKS ONLINE. (c) Oxford University Press, 2015. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy).

date: 18 November 2017

Musical Grammar

Abstract and Keywords

This article discusses musical grammar and the factors that shape it, including the psychological abilities and constraints that determine what humans can learn, remember, and reproduce. It illustrates the notion of musical grammar by imagining how music works and is explained in a fictional world—the land of Bijou. Bijouan music has similarities to an artificial music used in a recent empirical study of how listeners can learn a musical grammar through repeated exposure to an initially unfamiliar type of music. The article examines the laws of harmony, the principle of chordal inversion, the many meanings of musical grammar, the operations of syntax, and the importance of memory before concluding with a description of how musical grammar was taught in the conservatories of eighteenth- and nineteenth-century Europe.

Keywords: musical grammar, Psyche Loui, David Wessel, Carla Hudson Kam, laws of harmony, chordal inversion, chord morphology, syntax, memory, Leonard B. Meyer

Music has been called the “universal language.” That might be true within areas that share similar musical traditions. A Norwegian, for instance, could enjoy Italian instrumental music without the need for any kind of translation. But were a French military band to provide the music for a traditional Bedouin ceremony in Qatar, the limits of music’s universality would quickly become apparent. Even within a single family, teenagers might enjoy listening to things that the parents or grandparents totally reject as being music. Given this diversity it seems wise not to discuss musical grammar solely in terms of any one style. True statements about heavy metal in 1980s Los Angeles might be false for Latin masses in 1680s Rome. A great deal of the following discussion thus concerns a rare and beautiful kind of music that is foreign to everyone.

Musical Grammar in Bijou

Let us begin by imagining how music works in the fictional land of Bijou. Like many real musics, the music of Bijou is highly prized for its beauty and emotional power. Brides insist on it for their weddings, bereaved families want it for a dignified funeral, and even everyday entertainments seem more entertaining when accompanied by it. The styles of this music are famously complex, with elaborate melodies and subtle rhythms that have developed over the centuries. The musicians of Bijou nevertheless manage the complexity through a unique system known as the “Rules of the Jewels,” and musicians think of these rules as summarizing the grammar of Bijouan music.

Figure 1. Two proper sequences of jewels (rows 1 and 2) and two improper sequences (rows 3 and 4). Sequence 4 violates the rule for a beryl at stage C. The grammatical problem with sequence 3 is unknown.

Image courtesy of the American Gem Trade Association.

The first rule states that phrases must end with a musical gesture known as beryl, after the pale yellow-green jewel. Gemstones are so abundant in Bijou that musicians teach the rules by aligning jewels in rows, with a left-to-right arrangement representing order in time. Four such rows are shown in Figure 1. Rows 1 and 2 show popular three-jewel sequences. Rows 3 and 4, flagged with asterisks, show sequences that sound wrong and are actually offensive to local musicians. Though we are outsiders to Bijou, we can still guess that sequence 4 is wrong because it ends with a blue sapphire, violating the first rule. Sequence 3, however, does follow the rule, so its problem is more of a mystery. Maybe a ruby at stage B is forbidden, or perhaps a pink morganite at stage A signals an entirely different sequence. How could we make educated guesses to explain the problem with sequence 3?

Many music lovers, in the long history of music in Bijou, have asked similar questions. The explanations given them by teachers and scholars have varied widely over the ages. In early times appeals were made to religious and cultural ideals. Thus sequence 1 was said to be especially fine because the jewels moved “toward the light,” “becoming ever brighter.” Those ideals did not, however, fit sequence 2 nearly as well, even though it was a favorite of musicians and audiences. In the era of its first natural scientists, the Bijouan Academy of Science considered a theory that the proper sequence of jewels should exhibit an increasing index of refraction. But again, the approach from physics made distinctions between sequences that did not fully match the behaviors and preferences of the best musicians. More recently, Bijouan scholars have begun to wonder if better explanations might come from studies in psychology and learning.

An Experiment

At this point in our discussion curiosity about music in Bijou happens to align with curiosity about music in the real world of the twenty-first century. In 2008, for instance, three North American researchers—Psyche Loui, David Wessel, and Carla Hudson Kam—began to seek a new way to study how listeners learn a musical grammar. They wanted to start with a clean slate, but any real musical style that they chose would already be known by some people. So instead they created a completely new kind of music unfamiliar to everyone. It has a radically different kind of scale that is not used by any ethnic or social group in the real world (the notes are stretched apart so that the scale becomes strangely wider). One version of the so-called “Bohlen-Pierce” scale can be heard by clicking Audio Example 1.
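
The "stretched" quality of this scale can be put in numbers. The sketch below assumes the equal-tempered version of the Bohlen-Pierce scale, which divides a 3:1 frequency ratio into 13 equal steps, and compares its step size with the familiar semitone. The 220 Hz starting pitch and the function names are illustrative choices, not details taken from the study.

```python
import math

def bp_frequency(step, base_hz=220.0):
    """Frequency of a step in equal-tempered Bohlen-Pierce tuning.

    Assumption: 13 equal steps span a 3:1 ratio; base_hz is arbitrary.
    """
    return base_hz * 3 ** (step / 13)

def step_size_in_cents(period_ratio, steps_per_period):
    """Size of one equal step in cents (1200 cents = one 2:1 octave)."""
    return 1200 * math.log2(period_ratio) / steps_per_period

bp_step = step_size_in_cents(3, 13)       # Bohlen-Pierce step
western_step = step_size_in_cents(2, 12)  # ordinary semitone

print(f"Bohlen-Pierce step: {bp_step:.1f} cents")
print(f"Western semitone:   {western_step:.1f} cents")
print([round(bp_frequency(n), 1) for n in range(14)])
```

Under these assumptions each step is nearly half again as wide as a semitone, which is one way to understand why the scale sounds "wider."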

Figure 2. Two different grammars for four-jewel sequences. They differ in that sequence 5 goes “sapphire to garnet” while sequence 6 goes “garnet to sapphire.”

Image courtesy of the American Gem Trade Association.

The grammar invented for this music is not unlike musical grammar in Bijou, so we can use patterns of jewels to explain it. The four-jewel patterns shown in Figure 2 have the same beginnings and endings: beryls. The only difference is that for sequence 5 the middle jewels are “sapphire, garnet” while for 6 they are “garnet, sapphire.” Each jewel represents a different “chord” of three tones from the scale. A grammatically correct melody sounds two tones chosen from each of the jewels for a total of eight tones. As a melody progresses through each jewel, left to right, it can sound two different tones or the same tone twice. Depending on which chord tones are chosen, the first beryl, for example, may not sound the same as the last one.
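
The procedure just described can be sketched in a few lines. The tone numbers assigned to each jewel below are hypothetical (the text does not list the actual chord tones), as are the names `CHORDS` and `make_melody`; only the rule of "two tones per jewel, repeats allowed" comes from the description above.

```python
import random

# Hypothetical chord tones (scale-step numbers); not taken from the study.
CHORDS = {
    "beryl": (0, 6, 10),
    "sapphire": (4, 7, 13),
    "garnet": (3, 9, 12),
}
SEQUENCE_5 = ["beryl", "sapphire", "garnet", "beryl"]
SEQUENCE_6 = ["beryl", "garnet", "sapphire", "beryl"]

def make_melody(sequence, rng):
    """Pick two tones (repeats allowed) from each jewel's chord, in order."""
    melody = []
    for jewel in sequence:
        melody.extend(rng.choice(CHORDS[jewel]) for _ in range(2))
    return melody

rng = random.Random(0)  # seeded so the example is repeatable
melody = make_melody(SEQUENCE_5, rng)
print(melody)  # eight scale-step numbers
```

If the order of the two tones matters and repeats are allowed, each jewel offers 3 × 3 = 9 possibilities, so a four-jewel sequence would permit 9 × 9 × 9 × 9 = 6,561 distinct melodies.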

A melody from sequence 5, one used in the experiment, can be heard by clicking Audio Example 2. A test melody from sequence 6 can also be heard by clicking Audio Example 3.

In a controlled experiment reported in the journal Music Perception (2010), our researchers asked each participant to listen to 400 different melodies that conformed to one but not the other sequence of jewels. It took about 30 minutes to listen to all the melodies. Afterward participants were first tested to see if they could recognize the melodies that they had already heard; they were about 65% accurate, which is very good for having heard 400 different short melodies just once. Then participants were played several pairs of new melodies, where for each pair one melody followed the grammar of sequence 5 and one followed sequence 6. They were asked to choose which melody was more familiar and like the ones they had learned. Because each participant had listened to melodies from only one sequence, this was like distinguishing proper tunes in a recently learned style from tunes that were somewhat similar but not quite right. In effect, the choice was between grammatically correct and incorrect melodies. Participants made the correct choice about 75% of the time.

To most people, the two sample melodies provided here sound a little strange and may seem more alike than different. Certainly neither one is easy to hum or whistle. So how were people able to tell a new sapphire-to-garnet melody from a garnet-to-sapphire one? Every physical aspect of the melodies was balanced between the two sequences, and the participants were unlikely to have cultural preferences that might bear on such unusual tunes. One is left with the likelihood that “statistical” learning is the best explanation. Students trained on sequence 5 learned the regularities in its patterns of tones (the statistics or probabilities of what followed what). Later, when tested, they remembered the gist of the patterns learned earlier and were able to generalize their learning so that they could identify new melodies that followed the same “rules.” Participants trained on sequence 6 did the same thing, and each group later heard the other group’s melodies as sounding wrong or at least not quite right. It all came down to a learned sensitivity to usage, whichever usage the participants had chanced to learn.
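
A toy simulation may help make "statistical learning" less abstract. The sketch below is not the researchers' analysis. It is a simple tally of which tone follows which (a so-called bigram model), trained on 400 randomly generated melodies from sequence 5 and then given the same forced choice as the participants. The chord tones are hypothetical and deliberately non-overlapping, and all names are invented for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical chord tones, chosen not to overlap so the toy stays simple.
CHORDS = {"beryl": (0, 6, 10), "sapphire": (4, 7, 13), "garnet": (3, 9, 12)}
SEQUENCE_5 = ["beryl", "sapphire", "garnet", "beryl"]
SEQUENCE_6 = ["beryl", "garnet", "sapphire", "beryl"]
TONES = sorted({tone for chord in CHORDS.values() for tone in chord})

def make_melody(sequence, rng):
    """Two tones per jewel, repeats allowed, for eight tones in all."""
    return [rng.choice(CHORDS[jewel]) for jewel in sequence for _ in range(2)]

def train(melodies):
    """Count how often each tone is followed by each other tone."""
    pair_counts = Counter()
    first_counts = Counter()
    for melody in melodies:
        for a, b in zip(melody, melody[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts

def log_prob(melody, model):
    """Log-probability of a melody's transitions (add-one smoothing)."""
    pair_counts, first_counts = model
    total = 0.0
    for a, b in zip(melody, melody[1:]):
        p = (pair_counts[(a, b)] + 1) / (first_counts[a] + len(TONES))
        total += math.log(p)
    return total

rng = random.Random(1)
model = train([make_melody(SEQUENCE_5, rng) for _ in range(400)])

# Forced choice: which melody of each new pair seems "more familiar"?
trials = 100
correct = 0
for _ in range(trials):
    familiar = make_melody(SEQUENCE_5, rng)
    foreign = make_melody(SEQUENCE_6, rng)
    if log_prob(familiar, model) > log_prob(foreign, model):
        correct += 1
accuracy = correct / trials
print(f"chose the trained grammar in {accuracy:.0%} of pairs")
```

Because the invented chords share no tones, this model finds the task far easier than human listeners did. The point is only that sensitivity to "what followed what" is enough to separate the two grammars without any explicit rule being stated.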

Usage ≈ Rules

Think back now to the problem of sequence 3 (cf. Figure 1). If we knew more rules (more grammar), we might be able to explain why the sequence is wrong. But where would the extra rules have come from? Notice that the prior sentence ended with a preposition: “from.” For centuries now many teachers of English grammar have proclaimed such usage ungrammatical. The topic was raised in an influential eighteenth-century grammar book by Robert Lowth (1763, p. 141).

The Preposition is often separated from the Relative which it governs, and joined to the Verb at the end of the Sentence, or some member of it: as, “Horace is an author, whom I am much delighted with.” “The world is too well bred to shock authors with a truth, which generally their booksellers are the first that inform them of” [a quote from Alexander Pope]. This is an Idiom which our language is strongly inclined to; it prevails in common conversation, and suits very well with the familiar style in writing; but the placing of the Preposition before the Relative is more graceful, as well as more perspicuous; and agrees much better with the solemn and elevated Style.

Lowth (1710–1787), a high churchman and Oxford professor, was the very sort of person who might have been expected to issue strict rules like “Thou shalt not dangle Prepositions!” Yet instead he treats usage as the deciding factor: he believes the idiom is fine for all but the more formal styles. If usage determines rules, then rules are like small stories told about usage. Even if Bijouan musicians pronounced a rule that applied to the problem of sequence 3, it is likely that the rule would itself be a generalization derived from many individual musical utterances that shared similar patterns of usage. In fact, if we could examine the statistics of enough patterns of jewels, or learn those statistics implicitly through years of training and performance under the guidance of Bijouan master musicians, we might find explicit rules to be needless oversimplifications.

The Laws of Harmony

Today many musicians are uncomfortable with the suggestion that the rules of a music—its grammar—are just the norms of its usage or that mastering a music’s grammar is much the same as developing a sensitivity to those norms. They may ask, “What about the laws of harmony?” or “Doesn’t the overtone series determine how tones go together?” Questions like these often confuse correlation with causation. It is true, for instance, that faint higher pitches—“overtones”—are produced when a string is plucked or when a narrow column of air is set vibrating. It is also true that most of the world’s musics contain intervals of a fourth, fifth, or octave that match, in their simple frequency ratios, the ratios among the strongest of those overtones. But the art of those musics is no more determined by that coincidence than are automobile drivers in the northern hemisphere forced to drive northward because of the influence of the magnetic pole on the iron in their vehicles. As Isaac Newton revealed long ago, the earth and the automobiles do attract each other, but that fact of physics does little to determine patterns of driving.
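
The match between overtones and common intervals is easy to verify. The sketch below assumes an idealized string whose overtones are exact whole-number multiples of the fundamental; the 110 Hz fundamental is an arbitrary example.

```python
from fractions import Fraction

def overtone_series(fundamental_hz, count):
    """Frequencies of the first `count` harmonics of an idealized string."""
    return [fundamental_hz * n for n in range(1, count + 1)]

# Ratios between neighboring harmonics give the simplest intervals.
INTERVAL_NAMES = {
    Fraction(2, 1): "octave",
    Fraction(3, 2): "fifth",
    Fraction(4, 3): "fourth",
}

harmonics = overtone_series(110, 4)  # 110 Hz is an arbitrary example
for lower, upper in zip(harmonics, harmonics[1:]):
    ratio = Fraction(upper, lower)
    print(f"{lower} Hz -> {upper} Hz  ratio {ratio}  ({INTERVAL_NAMES[ratio]})")
```

The arithmetic confirms the correlation, and that is all it confirms. Nothing in these ratios says which interval should follow which in a given style.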

Some of the assumed “laws of harmony” come from the period when Europe was beginning to industrialize. Images of machines with their fixed actions and reactions may have inspired musicians to transform their knowledge of normal successions of chords into a more rigid and mechanistically prescriptive “chord grammar,” with attendant “part-writing rules.” Especially in Protestant lands during the Victorian era, the connections were only too obvious between the strictures set up for a “proper” musical grammar and the strictures of propriety in “good society.” Not surprisingly, a better fit to such a grammar was to be found in earnest, upright hymns than in sensuous art songs, virtuoso chamber music, or scandalous opera. More esoteric ideas about harmony came from a somewhat later time when European scientists were discovering all sorts of new and unseen phenomena. The early discoveries of overtones and principles of electromagnetism were followed by even more unexpected phenomena like radio waves, invisible gases like helium, and secret worlds within the atom. Musicians began to wonder if tones and chords could also have fundamental, previously unrecognized “functions” (Hugo Riemann 1877), spiritual “wills” (Schenker 1935), or dynamic “energies” (Kurth 1917) that could explain the underlying causes of music’s observable surface. In hindsight, it now seems apparent that these writers were transferring their deep feelings about music into beliefs about deep, incorporeal causes, thereby mistaking effects for causes.

The Principle of Chordal Inversion

Jean-Philippe Rameau (1683–1764), one of the greatest composers to have ever written about musical grammar, embarked on a conscious attempt to uncover the “natural principles” of music just as Newton had done for physics (Rameau 1722). Rameau tried for decades to bring his new theory of harmony into line with what working musicians of his day called the “Rule of the Octave” (a practical guide to which chord to play on each step of the ascending and descending scale), never quite getting the two to match. In the course of writing several books on harmony, he popularized the notion of chordal inversion, whereby a chord retains the same grammatical meaning whichever of its several tones happens to be the lowest one. His contemporary J. S. Bach, according to the testimony of Bach’s son Emanuel, did not believe this was true, but later generations of music students have nonetheless taken the principle of chordal inversion to be a verified rule of musical grammar.

Figure 3. Jewels used to represent different sequences of chords. Both beginning and ending chords are the same, but the inner chords have opposite sequences.

Image courtesy of the American Gem Trade Association.

Let us take two patterns of Bijouan music and reinterpret them in terms of chords from styles of music in Europe and America. As shown in Figure 3, the jewel sequences 5 and 6 imply that the beginning and ending chords are always the same but that the middle chords reverse their order. If the theory of inversion is correct, any note of a “sapphire” chord could serve as the lowest tone, as could any note of a “garnet” chord, without changing the grammar. That is, merely switching the octaves in which we place any chord’s tones should not change the identity of the sequence or cause a grammatical mistake.
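
What the theory of inversion claims can be stated compactly: a chord's identity is the set of its note names, with octave placement ignored. The sketch below illustrates this with an F-major triad (the "sapphire" or IV chord in the key of C) written as MIDI note numbers, a modern convention adopted here purely for convenience.

```python
def pitch_classes(midi_notes):
    """Reduce notes to pitch classes (0 = C ... 11 = B), ignoring octave."""
    return frozenset(note % 12 for note in midi_notes)

# One F-major triad in three octave arrangements (MIDI numbers, 60 = middle C).
root_position = [53, 57, 60]     # F  A  C
first_inversion = [57, 60, 65]   # A  C  F
second_inversion = [60, 65, 69]  # C  F  A

voicings = [root_position, first_inversion, second_inversion]
same_chord = len({pitch_classes(v) for v in voicings}) == 1
bass_notes = [min(v) % 12 for v in voicings]

print(same_chord)  # do all three reduce to one pitch-class set?
print(bass_notes)  # pitch class of the lowest tone in each voicing
```

On this view the three voicings are grammatically interchangeable, even though each has a different lowest tone. Whether listeners and composers actually treat them that way is the question at issue.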

Figure 4. Four-chord progressions like the jewels of Figure 3. The text describes points of interest (marked by arrows) for each of these six small cadences.

Source: Robert O. Gjerdingen.

The music notation shown in Figure 4 represents six variants of a small cadence, all of which can be heard by clicking Audio Example 4.

Let us arbitrarily assign cadence 1 to the first four-jewel pattern, the “sapphire-to-garnet” order of sequence 5 (cf. Figure 3). Cadence 1 is the most classical sounding, with its introduction and “preparation” at stage A of a treble G that will become dissonant over an F-chord at stage B (see the arrow on Figure 4) and then resolve downward by step during stage C on its way to a final consonance at stage D (Audio Example 5). For the grammarian Lowth, this would be the “graceful and perspicuous” usage appropriate for “the solemn and elevated Style.” From innumerable cadences like this we get the “rule” that “sapphire progresses to garnet and garnet progresses to beryl,” though this is better known as “IV progresses to V, and V progresses to I.” The roman numerals stand for the scale steps of the major scale viewed as “roots” of chords.

Cadence 2 is more casual in doing away with the niceties of preparation and resolution, though it still sounds like a “sapphire-to-garnet” sequence. Lowth might say that it “suits very well with the familiar style,” especially where, as indicated by the arrow, it adds a jazzy seventh (a B) to the final chord (Audio Example 6). Note that this chord of closure and resolution is acoustically the most dissonant of the whole progression, suggesting that dissonance and consonance are not the most important factors in this grammar. Cadence 3 takes the principle of inversion literally and places the seventh of the C-chord in the bass. The asterisk on Figure 4 marks an ungrammatical utterance: this low B sounds like a wrong note, whatever the theory of inversion might claim (Audio Example 7).

Cadence 4 again sounds somewhat like the “sapphire-to-garnet” patterns of cadences 1 and 2 (Audio Example 8). Yet observe that cadence 4 switches the inner voices of stages B and C. In cadences 1 and 2, the second stage had the notes F–A–C–G and the third stage had G–G–B–F. Now, in cadence 4, the second stage has the notes G–B–D–F and the third stage has F–A–C–G. These combinations of notes have reversed (“sapphire” becomes “garnet” and IV becomes V), but the perceived grammar has not changed. The important outer voices stayed the same as in cadences 1 and 2, and listeners familiar with styles of popular music in the later 1970s and 1980s (e.g., Steely Dan) know how to interpret the upper voices (consonant among themselves) as a collective dissonance that, like the simpler dissonance of cadence 1, will ultimately resolve downward as the cadence concludes. If simple enumerations of note-name aggregates adequately defined the grammar of chords, then cadence 4 should not sound at all like the other “sapphire-to-garnet” cadences because its collections of tones objectively read “garnet to sapphire.”
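
The mismatch can be demonstrated with a deliberately naive labeling procedure. The sketch below names each middle stage after whichever of the three triads shares the most note names with it. That scoring rule is an assumption made for illustration, a stand-in for any "simple enumeration of note-name aggregates." The note lists are those given in the paragraph above.

```python
TRIADS = {
    "I (beryl)": {"C", "E", "G"},
    "IV (sapphire)": {"F", "A", "C"},
    "V (garnet)": {"G", "B", "D"},
}

def label_by_aggregate(notes):
    """Name the triad that shares the most note names with `notes`."""
    names = set(notes)
    return max(TRIADS, key=lambda label: len(TRIADS[label] & names))

# Stages B and C of the cadences, as listed in the text.
cadences = {
    "cadences 1 and 2": (["F", "A", "C", "G"], ["G", "G", "B", "F"]),
    "cadence 4": (["G", "B", "D", "F"], ["F", "A", "C", "G"]),
}
labels = {
    name: (label_by_aggregate(stage_b), label_by_aggregate(stage_c))
    for name, (stage_b, stage_c) in cadences.items()
}
for name, (second, third) in labels.items():
    print(f"{name}: {second} then {third}")
```

The procedure labels cadence 4 as "garnet to sapphire," the reverse of cadences 1 and 2, although the ear groups all three together.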

Cadence 5 completely reverses the middle stages of cadence 2. It does sound different from prior cadences and could be said to match sequence 6—beryl, garnet, sapphire, beryl (Audio Example 9). Stylistically, the “garnet-to-sapphire,” V-to-IV progression is a better match to the usage of some cadences in light popular styles from the 1950s and 1960s, especially those influenced by African American genres like the blues. The seventh in the chord at stage D is, however, uncommon in that usage, so replacing the seventh with the plain tonic (a C) at the last stage of cadence 6 better aligns the choice of sonority (“chord morphology”) with the associated genres and their norms of usage (Audio Example 10).

Are the C-major chords—the beryls—that begin and end all these cadences interchangeable, as would be expected from the theory of inversion? The answer is “Partially but not fully.” Concluding a cadence with the third of the chord in the bass (the low E in stage A replacing the low C in stage D) was perceived in the time of Mozart as an “imperfect” kind of ending, while beginning a cadence with the low C in the bass was less common than with the low E.

In sum, the theory of inversion is not very successful as a theory of chord morphology, leaving aside its broad dissemination in pedagogy. Without a workable theory of inversion, the putative laws of harmony applied to art music quickly devolve into observations about usage, observations that of necessity must include factors like melody, rhythm, and counterpoint. That is hardly surprising since European art music was founded on the coordinated movements of two or more voices—a counterpoint of melodies. In early European art music, chords were less like primary elements of the musical grammar and more like secondary phenomena—byproducts of a contrapuntal and melodic grammar. These resultant sonorities might appear with similar shapes (e.g., C-major triads) when actually caused by different melodic and contrapuntal designs.

The dependency of harmony on melody and counterpoint did not end with Bach and Handel. Even in the early twentieth century the French composer Vincent d’Indy (1851–1931), a master of harmony, declared that “Musically, chords do not exist, and harmony is not the science of chords. The study of chords per se is, from a musical point of view, completely in error esthetically, for harmony comes from melody and ought never to be separated from it in practice” (1903, 91). His polemical statement, influenced perhaps by his location in the upper echelons of European art music, may seem extreme. His opinions would not, for instance, be shared by guitarists in many folk-music traditions. D’Indy’s view, nevertheless, serves as a useful caution against thinking of the laws of harmony as being solely about chords.

The Many Meanings of Grammar

In reference to language, the term grammar covers a variety of usages. There are informal meanings in everyday speech and highly formal meanings with special vocabularies in linguistics or computer programming. The meanings of musical grammar inherit this range of usage and extend from the general sense of “correctness” to the highly specialized and technical meanings of advanced music analysis.

Informally, musical grammar can mean “the basics of the art” just as grammar school refers to the place where one learns the basics of literacy and numeracy. Young performers are expected to learn scales, arpeggios, simple chords, and cadences, and how to read basic notation or, in the case of jazz, chord symbols. These skills are often referred to as “fundamentals.” In this domain of musical grammar, rules predominate and the focus is on the prescriptive and proscriptive—“do this, and don’t do that!” As with language, the clarity of such rules does not always conform to the realities of usage. Many music students, for instance, believe the “natural minor” scale must in some sense be natural or fundamental, when in reality it is rare and artificial within the European classical tradition.

Chord grammar refers on the one hand to the proper spelling of chords (“chord morphology”) and, on the other hand, to the arrangement of chords in series (“chordal syntax”). Theoretically, chord morphology depends on chordal syntax in the sense that a particular musical grammar can determine whether a tone is “in” or “out” of a chord. Practically speaking, a small number of “rules of thumb” operate within each musical tradition, and such rules may be quite rigid. In the music of nineteenth-century Europe, chords were treated as simple “stacks of thirds” like C–E–G or C–E–G–B. One could, in theory, continue this stacking to eventually include every note of the scale. In practice, however, triads and some seventh chords were treated as unitary chordal objects, whereas other sevenths, and all ninths and elevenths above a “root” were treated as “nonharmonic” (i.e., extraneous to the chord). By contrast, in twentieth-century jazz and popular traditions the categories of harmonic objects included many note combinations that the art-music tradition would view as composites of harmonic and nonharmonic tones. In this sense the chord morphology in the classical tradition was more theoretically driven and prescriptive while that of jazz and popular traditions was more descriptive and inclusive.
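
The "stack of thirds" idea is mechanical enough to sketch directly: starting from a root, take every other note of the scale. The function name and the restriction to the C-major scale are illustrative simplifications.

```python
SCALE = ["C", "D", "E", "F", "G", "A", "B"]  # C major

def stack_of_thirds(root, size):
    """Build a chord by taking every other scale note above `root`."""
    start = SCALE.index(root)
    return [SCALE[(start + 2 * i) % len(SCALE)] for i in range(size)]

print(stack_of_thirds("C", 3))  # triad
print(stack_of_thirds("C", 4))  # seventh chord
print(stack_of_thirds("C", 7))  # stacking continued through the whole scale
```

Nothing in the stacking procedure itself says where a "chord" stops and "nonharmonic" tones begin. That boundary is drawn by each tradition's usage.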

The informal grammars just mentioned have a practical orientation toward reading music notation or discussing basic musical objects. One could learn these things well but still know very little of the art of music. So music scholars have pursued the creation of formal grammars that offer more insight into how a music might be created or how one could describe all the interrelationships of its tones, rhythms, and phrases. In his 1977 book Early Downhome Blues, for example, the ethnomusicologist Jeff Todd Titon outlined a “song-producing model” for that style. He described his model as a “generative grammar,” referring to the work of the linguist Noam Chomsky. A grammar is “generative” if it is capable of producing a large set of correct utterances and no incorrect ones. Titon’s work coincided with the high-water mark of Chomsky’s influence on music studies. There were “generative” studies of Indian ragas (Cooper 1977) and of the European art-music tradition (Keiler 1977, Lerdahl & Jackendoff 1977), culminating in Lerdahl and Jackendoff’s 1983 book A Generative Theory of Tonal Music, one of the most complete grammars of classical European music ever published. Its only competitor in that regard would be Heinrich Schenker’s 1935 Free Composition, which analyzes entire movements of symphonies and other large works as a complex hierarchy of various contrapuntal combinations.

While Noam Chomsky (1928–) is without doubt the most famous linguistic theorist of the past fifty years, his highly abstract theories have long been opposed by important “functionalists,” which is to say linguists who see language as rooted in the real situations of human communication and social interaction. Over the decades, data from experiments in psycholinguistics and statistics from computer analyses of large collections of texts or speech have been increasingly favoring the functionalists, as have studies in child development. In Chomskyan linguistics, it was argued that children could never learn a grammar solely from exposure to the speech of adults. The so-called “argument from the poverty of the stimulus” held that sequences of syllables do not provide sufficient information from which to infer a grammar. Hence certain aspects of language must be innate.

Functionalists have argued to the contrary that children have much more information available to them than just a sequence of syllables (Tomasello 2003). “More-ap-ple-sauce?” when viewed as four abstract sounds may seem like an impoverished stimulus. But for a hungry baby looking at a spoon of tasty food held by a smiling mother who raises her eyebrows as she intones “More-ap-ple-sauce?” with a rising inflection, the stimulus is rather rich and inferentially productive, especially when quickly reinforced by the reward of applesauce. Even without all the helpful cues provided by human interactions, it has been experimentally confirmed that infants can learn words based solely on the statistics of the order of syllables (Saffran 2001). Infants are amazing learners; the similarities among the grammars of the world’s many languages may have less to do with any special genes for grammar and more to do with our genes for general learning. And in the absence of any known genes for music, it seems likely that the similarities that exist across all the world’s musics may be attributable to similarities in how we learn, compare, and remember sounds.
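
The kind of statistic at work in such word-learning experiments can be illustrated with a small simulation. The three nonsense words below are hypothetical and do not reproduce the stimuli of any particular study. The sketch strings them together at random into an unbroken stream of syllables and then estimates how reliably each syllable predicts the next.

```python
import random
from collections import Counter

# Hypothetical three-syllable nonsense words; no syllable is shared.
WORDS = [("bi", "da", "ku"), ("pa", "do", "ti"), ("go", "la", "bu")]

rng = random.Random(0)  # seeded so the example is repeatable
stream = [syllable for _ in range(300) for syllable in rng.choice(WORDS)]

pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])

def transitional_probability(a, b):
    """Estimated probability that syllable b immediately follows a."""
    return pair_counts[(a, b)] / first_counts[a]

inside_word = transitional_probability("bi", "da")
across_boundary = max(
    transitional_probability("ku", word[0]) for word in WORDS
)
print(inside_word, round(across_boundary, 2))
```

Within a word the next syllable is fully predictable, while across a word boundary it is not. A learner tracking only these probabilities could, in principle, find the edges of words.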

The Operations of Syntax

Figure 5. Four versions of one Bijouan sequence: the basic pattern (no. 5), two extensions by repetition (nos. 7 and 8), and a further extension (no. 9) by two repetitions and one slight variation (garnet to altered garnet).

Image courtesy of the American Gem Trade Association.

What linguists call coordination is the linking of similar items, often by conjunctions (common words like “and”). In music, simple repetition serves as a cue to coordinate one statement of a pattern with its restatement. The same scheme works for a statement and a subsequent minor variation. Figure 5 shows a now-familiar Bijouan pattern and three longer versions of it. For musicians in Bijou, all four sequences are said to have the same structure: a basic form (no. 5), an extended form with repeated garnets (no. 7), a different extension with repeated sapphires (no. 8), and the longest version (no. 9) with repeated sapphires, varied garnets, and repeated beryls.
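
The claim that all four sequences share one structure can be checked by collapsing every run of repeated or varied jewels into a single jewel. In the sketch below the exact contents of the longer sequences are reconstructed from the Figure 5 caption, the position of the repeated beryl is a guess, and the function names are invented for illustration.

```python
from itertools import groupby

def base_jewel(jewel):
    """Treat a variant such as 'altered garnet' as a kind of garnet."""
    return jewel.split()[-1]

def basic_form(sequence):
    """Collapse each run of repeated or varied jewels to a single jewel."""
    return [kind for kind, _run in groupby(sequence, key=base_jewel)]

BASIC = ["beryl", "sapphire", "garnet", "beryl"]
EXTENSIONS = {
    "repeated garnets": ["beryl", "sapphire", "garnet", "garnet", "beryl"],
    "repeated sapphires": ["beryl", "sapphire", "sapphire", "garnet", "beryl"],
    # Assumption: the repeated beryl is the final one.
    "longest": ["beryl", "sapphire", "sapphire", "garnet", "altered garnet",
                "beryl", "beryl"],
}
for name, sequence in EXTENSIONS.items():
    print(name, basic_form(sequence) == BASIC)
```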

Simple coordination of similar items is quite limiting in language (e.g., “He and she went here or there”). One of the hallmarks of a developed human grammar is the ability to coordinate hierarchical relationships among different items. This typically involves understanding how small patterns fit within and modify the meaning of larger patterns. Linguists call this subordination. In English, the basic subject–verb–object pattern (“The boy kicked the ball”) is easily elaborated and modified by subordinate adjectives (“The young boy kicked the big red ball”) or by subordinate phrases (“The boy who mows the Smiths’ lawn kicked the ball that he found behind their garage”).

The American music theorist Leonard B. Meyer (1973) made the distinction between features of music that can be used to construct a syntax and those that cannot (or at least are not presently so used). Examples of the former would be distinct tones (not sirens or noises) and distinct durations (times that we can compare, as in “this is twice as long as that”). Examples of nonsyntactic features would be tone color, loudness, or texture. Meyer’s ideas suggest that syntax in music is easiest to create and perceive when it involves things we can remember as distinct, countable objects (a melody goes “up two steps and then down one step”) or relate to clear reference points (the trumpet’s licks “always start on the downbeat”).

Figure 6. The subordination of three-jewel sequence 1 to the beryls in four-jewel sequence 5.

Image courtesy of the American Gem Trade Association.

By using simple patterns of scale tones and durations, Bijouan musicians are able to create a syntax that subordinates smaller patterns within larger ones. In particular, the grammar allows for the subordination of three-jewel sequences into longer patterns of four-jewel sequences. We already know that many three-jewel sequences end with a beryl (e.g., nos. 1, 2) and that many four-jewel sequences begin and end with a beryl (nos. 5, 6, 7, 8, and 9). So the jewel-pattern diagram shown in Figure 6, while more complex than anything shown previously, is still considered grammatical in Bijou. Musicians, of course, do not perform diagrams. When they perform this structure, they play the jewels in the order “Q–R–S–B–C–Q–R–S,” with the subordinated patterns replacing beryls A and D.

Figure 7. A simple three-jewel sequence of emerald, blue iolite, and emerald. This pattern occurs only in subordinated roles.

Image courtesy of the American Gem Trade Association.

Some patterns of jewels only occur at lower levels of subordination. For example, sequence 10 shown in Figure 7 is never a top-level pattern. Its simple scheme of an emerald, a blue iolite, and a second emerald is always associated with embellishing a ruby.

Figure 8. Two levels of subordination. Sequence 10 is subordinated to the rubies in sequence 1, which in turn begin a three-jewel sequence subordinated to the beryls of sequence 5.

Image courtesy of the American Gem Trade Association.

Since rubies can form the first jewel in a three-jewel pattern, and since three-jewel patterns can be subordinated in four-jewel patterns, the result can be a three-level hierarchy (see Figure 8). When performed, this structure will produce the long sequence of jewels “L–M–N–R–S–B–C–L–M–N–R–S.” Here the “Qs” of Figure 6 have been replaced by the “L–M–N” pattern of sequence 10. And due to the two levels of subordination, the top-level A has now been replaced by five jewels (L–M–N–R–S), as has the top-level D.
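
Subordination of this kind amounts to replacing a jewel with a smaller pattern, and then possibly replacing a jewel within that pattern in turn. The sketch below uses the stage letters from the figures; the rule tables and the function name `perform` are illustrative inventions.

```python
TOP_LEVEL = ["A", "B", "C", "D"]  # sequence 5: beryls at A and D

# One level: three-jewel sequence 1 (Q-R-S) stands in for each beryl.
ONE_LEVEL = {"A": ["Q", "R", "S"], "D": ["Q", "R", "S"]}

# Two levels: sequence 10 (L-M-N) additionally stands in for the ruby Q.
TWO_LEVELS = {**ONE_LEVEL, "Q": ["L", "M", "N"]}

def perform(sequence, rules):
    """Replace each jewel that has a subordinate pattern, recursively."""
    performed = []
    for jewel in sequence:
        if jewel in rules:
            performed.extend(perform(rules[jewel], rules))
        else:
            performed.append(jewel)
    return performed

print("–".join(perform(TOP_LEVEL, ONE_LEVEL)))
print("–".join(perform(TOP_LEVEL, TWO_LEVELS)))
```

The first line printed reproduces the eight-jewel order of the two-level structure and the second the twelve-jewel order of the three-level structure. Further levels would require only further entries in the rule table.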

Sapphires and garnets in a top-level pattern (e.g., sequence 5) could be repeated and/or varied. They also have their own traditions of subordinate patterns, so very elaborate structures can easily be constructed. Music of this complexity is typical of performances at the royal court of Bijou, where young girls train intensively for a decade or more before they attempt to perform any of the courtly genres. In lighter genres of urban entertainment music or rural folk music, this degree of complexity is generally avoided. Music for children often uses little or no subordination, while art music for Bijouan connoisseurs revels in it.

With graphs of the type shown above in Figure 8, one could create hierarchies of any degree of complexity. The eye sees the whole picture of how the brackets and arrows relate subordinate patterns to a top-level sequence. When patterns are aural rather than visual, humans face limitations because we hear only one “jewel” at a time. Take language, for instance. When we hear a sentence being read or spoken we must construct for ourselves an idea of the meaning as the sentence unfolds. Many languages help us to do this by providing words or special word forms that serve as clues to the intended structure. In the sentence presented earlier to illustrate subordination (“The boy who mows the Smiths’ lawn kicked the ball that he found behind their garage”), helpful cues are provided by “who,” “Smiths’,” “that,” “behind,” and “their.” The presence of such cues, called “predictive dependencies” (Saffran 2003), facilitates statistical learning. If one leaves out too many cues, a listener may be “led down the garden path,” meaning led to a wrong conclusion about sentence structure. A “garden path sentence” like “The old man the boat” is intended to mean something like “The boat is manned by the old people.” But when we hear or read “the old man” we tend to assume that an “old man” is the subject of what comes next, an assumption that the rest of the sentence proves false.

To illustrate aurally the limitations but also the potential of musical hierarchy, we first turn the graph of Figure 8 into a very simple type of Bijouan music with obvious cues for subordination. We can translate jewels into single tones on the Bohlen-Pierce scale, and we can translate subordination into duration, with higher-level jewels having longer durations. For the top-level sequence 5 played alone, click Audio Example 11. Notice that the beryls are the same pitch, the sapphire is higher, and the garnet is lower. In Audio Example 12 we hear the middle-level sequence 1 played alone as a rising sequence of pitches. In Audio Example 13, sequence 1 is subordinated to the top-level sequence 5 by fitting into the time allotted for each beryl. Finally, in Audio Example 14, a further level of subordination is added by the neighbor-tone pattern of sequence 10 played rapidly as an ornamentation of the first tone (the ruby) of sequence 1.
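The translation of subordination into duration can be sketched in a few lines. The sketch assumes that a subordinated pattern divides the time of the jewel it replaces into equal shares. The text says only that higher-level jewels last longer and that a subordinated pattern fits into the time allotted to its parent, so the equal division, the function name `durations`, and the unit of one beat per top-level jewel are illustrative assumptions.

```python
from fractions import Fraction


def durations(sequence, subordination, allotted=Fraction(1)):
    """Return (jewel, duration) pairs in performance order.

    Assumption: a subordinated pattern splits the time of the jewel it
    replaces into equal shares, so each lower level sounds faster.
    """
    events = []
    for jewel in sequence:
        pattern = subordination.get(jewel)
        if pattern:
            share = allotted / len(pattern)
            events.extend(durations(pattern, subordination, share))
        else:
            events.append((jewel, allotted))
    return events


# Same hypothetical encoding as the jewel labels in the text: A and D are
# replaced by Q-R-S, and each Q is replaced by the neighbor pattern L-M-N.
hierarchy = {"A": ["Q", "R", "S"], "D": ["Q", "R", "S"], "Q": ["L", "M", "N"]}

for jewel, length in durations(["A", "B", "C", "D"], hierarchy):
    print(jewel, length)
```

Under these assumptions the untouched top-level jewels B and C each last one full beat, R and S last a third of a beat, and the ornamental L, M, and N last a ninth of a beat each, so the total length of the top-level pattern is unchanged.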

The structure of this simple type of Bijouan music was aided at every turn by “outsider-friendly” choices and clear cues to the syntax. The contours of each component pattern were simple and easy to comprehend. The direct translation of the level of subordination into duration made it possible to hear the full version as a straightforward enriching and ornamentation of a largely unchanged top-level pattern.

If we instead adopt outsider-unfriendly choices, our ability as outsiders to hear the intended structure can be greatly diminished. In Audio Example 15 we hear the same structure shown in Figure 8. The top-level sequence 5, however, is now realized as a series of 10-note melodies, one per jewel. Each component melody is significant to Bijouan musicians because it replicates a theme from the sacred repertory. The subordinated sequence 1 is played as a nine-note melody, three tones per jewel, interleaved between the tones of the melodies for the beryls of top-level sequence 5, and the doubly subordinated sequence 10 is performed as a three-tone pattern that sounds simultaneously with the tones of the ruby from sequence 1. Comprehension of this performance is difficult for outsiders because we lack the memories of Bijouan musicians and because most of us are not used to subordination working in quite these ways. We may get the overall form, where rapid events from the beginning reoccur near the end, but we are likely oblivious to the details and the intended content (Bijouans, by contrast, hear this as a lament). Were more levels of subordination added, and were more complex components devised, we could quickly lose any connection between the structure intended and the structure perceived. The syntax of a musical grammar may in principle allow for unlimited complexity, but listeners do have limits. The composer and music theorist Fred Lerdahl described this situation as reflecting “cognitive constraints on compositional systems” (1992).

The Importance of Memory

As mentioned, ideas about technology have often had an effect on ideas about grammar, whether in language or in music. In the late 1950s, when Chomsky published his first book on language and Meyer his first book on music, the “artificial intelligence” of room-sized computers depended on elaborate programs (algorithms) because such machines had almost no memory. The first Apple computers sold in the 1970s still could store only a few pages of text in active memory. Today, by contrast, an ordinary home computer can store and almost instantaneously access the equivalent of a whole library, and the Internet collectively constitutes a digital memory of unfathomable depth.

Over the same decades the estimated storage capacity of the human brain has grown enormously. Psychologists once thought that information in the brain had to be reduced to a small essence before it could be squeezed into our limited long-term memories. Today researchers strongly suspect that there is no practical limit to the capacity of human long-term memory.

The shift from algorithms to memory has major implications for thinking about grammar. The “generative” or “transformational” grammars of Chomsky and his many brilliant students had strong similarities to algorithms, both in their mathematical formalisms and in their reliance on “processing.” Functionalist accounts of grammar, by contrast, assume relatively little processing but a huge reliance on massive memories of individual utterances, grouped by similarity. In recent formulations, often called “usage-based grammars” or “construction grammars” (Bybee 2006, Goldberg 2006), the statistics of usage determine much of the grammar. Those statistics identify “constructions,” which act as matchmakers between incoming sounds and stored meanings. Instead of having an abstract syntax acting on a separate lexicon (i.e., looking up words in a mental dictionary), construction grammars create dynamic combinations of words and word patterns that function like holistic Gestalts or schemas.

A usage-based grammar treats language as it is, without fixed notions about what the grammar ought to be. Imagine hearing for the first time a construction like “That’s so ’90s!” (Wee and Tan 2008). In terms of anything Robert Lowth might have imagined in the 1700s, the phrase is ungrammatical. Yet today a large percentage of English speakers know that it means “That [thing in view] is so [characteristic and sadly reminiscent of similar things once common in] the 1990s.” Construction grammars assume that a contextually important meaning is learned along with the syntactical form. One could look up each individual word of “That’s so ’90s!” and never find out that the phrase is often used disparagingly. Yet people who have learned the construction know this meaning and will use the construction accordingly. In construction grammars, idioms like “That’s so ’90s!” are fully part of the grammar, not strange exceptions.

The idea of constructions recognizes the fact that learning and memory affect perception and expectation. If we hear someone say “That’s so ….” we will already have matched those sounds to the beginning of the appropriate construction, and we will be expecting a date or other word associated with style to fill the missing slot. This is a very efficient way to navigate an ever-changing world of language, and it resembles in many ways the strategies used to teach music in the old conservatories of Europe.
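As a rough illustration of how a stored construction can generate an expectation, the sketch below pairs a fixed opening with an open slot and a remembered meaning. The data structure, the function name `expectations`, and the single-entry inventory are all hypothetical; usage-based accounts assume a very large, statistically weighted memory rather than a hand-written list.

```python
# Hypothetical, hand-made inventory of constructions: a fixed opening, the kind
# of word expected in the open slot, and the meaning stored with the whole form.
CONSTRUCTIONS = [
    {
        "opening": ("that's", "so"),
        "slot": "a date or other word associated with style",
        "meaning": "characteristic and sadly reminiscent of that period",
    },
]


def expectations(heard):
    """Return the slot expectations of every construction whose opening was just heard."""
    words = tuple(word.lower() for word in heard.split())
    return [
        c["slot"]
        for c in CONSTRUCTIONS
        if words[-len(c["opening"]):] == c["opening"]
    ]


print(expectations("That's so"))  # the slot description for the matched construction
print(expectations("The boy"))    # no stored construction matches, so an empty list
```

The point of the sketch is that the match happens on the opening alone: the listener's expectation for the slot, and the disparaging meaning, are retrieved before the final word arrives.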

Teaching and Learning a Musical Grammar

In Europe during the 1700s and 1800s, music was a trade like carpentry or jewelry making. Young children were apprenticed to masters who taught them to imitate the proper shapes and designs of their trade. For carpentry we can look at the pattern books used by apprentices. For music, we can look at workbooks and exercises. Few of these documents survive from individual apprenticeships, but, first in Naples and later in Paris, hundreds of young apprentices were gathered in large urban conservatories. Many documents survive from these institutions, and their pages tell a story about how a musical grammar was learned (Gjerdingen 2007, Sanguinetti 2012).

In Naples, students first learned some basic “rules,” although the manuscripts from that period (1730s–1790s) reveal that the rules were really musical exemplars—small encapsulations of real music. The Rule of the Octave, for instance, was a scale harmonized in a certain way. The children learned the way it was done. They did not learn verbal rules explaining why. The great Neapolitan master Francesco Durante, whose music was once copied by J. S. Bach, is reputed to have told his students, “My dears, do it this way because this is the way it is done.”

After learning a few exemplars, the students learned to play them at the keyboard in response to the matching patterns in basses called partimenti. A partimento bass would mix cues for the recently learned patterns with cues for cadences, and it would modulate to various keys in the course of an exercise. Unlike thorough-bass, usually intended for the role of harmonic background in an ensemble, partimenti were meant to be self-standing improvisations where a student’s evolving repertory of constructions was rehearsed and refined.

The Paris Conservatory (1795–) chose the Italian tradition as the classic model for its instruction. The young students in Paris were taught Italian exemplars and practiced them by playing “realizations” of partimenti. Unique to conservatory life in Paris were annual contests in harmony and counterpoint. For the contest in harmony, the students were given an unfigured bass before being sent to small cubicles where, in a few hours and without a keyboard, they were expected to add three more melodically elegant parts to the bass, parts that would employ imitative counterpoint and collectively conform to the approved usage of each construction. This was not composition in the sense of a unique artistic expression. This was more like the presentation of a “masterpiece” to a craft guild, where the masters of the trade would inspect the journeyman’s product for any defects or failures to understand the approved methods.

Figure 9. Measures 1 through 16 of the unfigured bass (basse donnée) given to contestants in the harmony contest of 1857 at the Paris Conservatory.

Source: Robert O. Gjerdingen.

Figure 9 presents the first sixteen measures of the bass given to contestants for the harmony contest of 1857. Click on Audio Example 16 to hear this bass.

Figure 10. Four constructions taken from the harmony treatise of 1857 by François Bazin, harmony teacher at the Paris Conservatory.

Source: Robert O. Gjerdingen.

This excerpt from the complete bass contains thirty-three tones and thirty-two intervals. The more skilled contestants would see through that forest of tones and recognize a simpler scheme: the cadence in measures 15 and 16 is preceded by just three constructions. The student contestants had previously been taught to memorize a large repertory of constructions, all written out in four parts, with indications of where there were opportunities for imitative counterpoint. The four constructions shown in Figure 10 come from the treatise of François Bazin (1857), a harmony teacher at the conservatory. They can be heard by clicking Audio Example 17. Bazin taught many variations of each construction, but the four exemplars of Figure 10 would have been sufficient to guide a four-voice realization of the contest bass of Figure 9.

The first of these, construction “A,” treats a neighbor-tone figure in the bass (Audio Example 18). The second, B, presents the bass of Pachelbel’s Canon, sometimes called a Romanesca bass (Audio Example 19). The third, C, involves rising semitones in the bass (Audio Example 20), and the fourth, D, has a sequence of falling fourths and rising fifths (Audio Example 21). For each of these basses the upper voices indicate preferred counterpoints. Both C and D, for example, have a pair of upper voices in canon with each other (tenor and soprano for C; alto and soprano for D). To say that these constructions are chord progressions is to oversimplify what was being taught and learned.

Figure 11. A summary syntax of the bass from Figure 9. The four jewels stand for Romanesca (citrine), Rising Semitones (sapphire), Rising Fifths (garnet), and Cadence (beryl).

Image courtesy of the American Gem Trade Association.

One can diagram the basic syntax of the contest’s bass with a simple pattern of jewels. As shown by sequence 11 (see Figure 11), a beryl can stand for the cadence, and the three jewels to the left of it can stand for the three constructions: Romanesca (citrine), Rising Semitones or Monte (sapphire), and Rising Fifths (garnet). The Monte construction was named in the eighteenth century by Joseph Riepel (1755), who associated its ascending sequence with climbing a mountain (Italian: monte).

This simple syntax of coordination (Romanesca and Rising Semitones and Rising Fifths) is made more complex by levels of subordination. As shown in Figure 12, each jewel of Figure 11 summarizes sequential transpositions. Each of these is in turn a composite of two bass notes, the first of which can be replaced by a subordinated neighbor-note construction. These neighbor notes (emerald–iolite–emerald) act like the “predictive dependencies” in language by aurally marking each stage of the two longest constructions.

Figure 12. A more detailed look at the syntax of the bass from Figure 9. Coordinate patterns predominate at the high levels (the ascending and descending sequences), but the Romanesca (yellow citrines) and Rising Fifths (garnets) both feature two levels of subordination: pairs of jewels (pink morganite to yellow topaz) can replace single upper-level jewels, and three-jewel neighbor-note figures (emerald to blue iolite to emerald) can replace the first jewel of each middle-level pair.

Image courtesy of the American Gem Trade Association.

Two technical points about Figure 12 are worth noting. First, component jewels for each separate stage of the Romanesca and Rising Fifths constructions could involve identical note names (two bass tones, the second one being a fourth lower or a fifth higher). For that reason Gjerdingen (2007) named the Rising-Fifths construction a Monte Romanesca. Yet because the two constructions differ in their successive transpositions and their associated counterpoints, they sound quite different. It is the whole pattern that counts. Second, if the jewel pattern of Figure 12 were a legitimate sequence of Bijouan music, we could now explain why sequence 3 (cf. Figure 1) was wrong; from the six identical exemplars of Figure 12 we could say with some confidence that a morganite leads to a topaz and not, as in Figure 1, to a ruby. Sequence 3 was thus ungrammatical based on the statistics of Bijouan usage and the expectations that were formed from experiencing that usage.
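The appeal to “the statistics of usage” can be made concrete with a simple tally of which jewel follows which. The sketch below takes the six morganite-to-topaz pairs mentioned above as its only data; the function name `relative_frequency` and the use of raw relative frequencies are assumptions, and models of statistical learning in the research literature are considerably richer.

```python
from collections import Counter, defaultdict

# Six middle-level pairs as described for Figure 12: each is morganite, then topaz.
exemplars = [("morganite", "topaz")] * 6

# Tally, for every first jewel, how often each second jewel followed it.
successors = defaultdict(Counter)
for first, second in exemplars:
    successors[first][second] += 1


def relative_frequency(first, second):
    """Share of the observed continuations of `first` that were `second`."""
    total = sum(successors[first].values())
    return successors[first][second] / total if total else 0.0


print(relative_frequency("morganite", "topaz"))  # 1.0
print(relative_frequency("morganite", "ruby"))   # 0.0
```

With so few exemplars a frequency of zero is a hunch rather than a rule, which fits the text's wording that one could speak only “with some confidence” about what a morganite leads to.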

Figure 13. A first-prize-winning realization of the bass from the harmony contest of 1857 (see Figure 9) by the thirteen-year-old Henri Fissot, a student of François Bazin at the Paris Conservatory. The markings of contrapuntal imitations are original. Jewels have been added for comparison with Figures 11 and 12.

Image courtesy of the American Gem Trade Association.

The patterns of Figure 12 are so simple to grasp that they could legitimately be called “child’s play,” especially because one of the actual winners of the contest of 1857, Henri Fissot, was only thirteen years old (see Figure 13). Click on Audio Example 22 to hear his realization of the contest bass, complete with its many approved patterns of imitation. It is worth noting that this thoroughly contrapuntal and quite sophisticated realization was something a student was expected to complete before entering the class on counterpoint. At the Paris Conservatory even the work specifically focused on harmony was primarily concerned with the coordination of independent melodic lines.


A musical grammar describes regularities in a particular musical style. Those regularities are largely determined by the behaviors of musicians and the preferences of their audiences. Nonmusicians can learn a musical grammar from mere exposure to music, though it helps if the exposure occurs in situations (concerts, dances, movies, theater, songs with lyrics) where additional cues add meaning to the patterns of sound. Musicians learn a large repertory of constructions that help them organize and conceptually simplify the complex patterns that they will need to perform or compose. In the past, conservatories managed this learning through partimenti and textbooks designed to guide the imitation and improvisation of a repertory of constructions. Today, in homes and garages, young musicians in popular genres accomplish much the same thing through the careful imitation of recordings and participation in improvisational “jam sessions.” The physics of sound plays a limited role in shaping a musical grammar. The biggest factors are the psychological abilities and constraints that determine what humans can learn, remember, and reproduce. As we learn new works and experience new patterns, we relate them to previous experiences and update our ideas about usage. Each evolving hunch about usage—a part of our personal musical grammar—guides our future expectations and helps to make our next musical experience just a little bit richer.

References

Bazin, François. 1857. Cours d’harmonie théorique et pratique. Paris: Escudier.

Bybee, Joan. 2006. “From Usage to Grammar: The Mind’s Response to Repetition.” Language 82(4): 711–733.

Cooper, Robin. 1977. “Abstract Structure and the Indian Rāga System.” Ethnomusicology 21: 1–32.

d’Indy, Vincent. 1903. Cours de composition musicale. Paris: Durand.

Gjerdingen, Robert. 2007. Music in the Galant Style. New York: Oxford University Press.

Goldberg, Adele. 2006. Constructions at Work: The Nature of Generalization in Language. Oxford: Oxford University Press.

Keiler, Alan. 1977. “The Syntax of Prolongation (I).” In Theory Only 3(5): 3–27.

Kurth, Ernst. 1917. Grundlagen des linearen Kontrapunkts: Einführung in Stil und Technik von Bachs melodischer Polyphonie. Bern: M. Drechsel.

Lerdahl, Fred, and Ray Jackendoff. 1977. “Toward a Formal Theory of Tonal Music.” Journal of Music Theory 21: 111–171.

Lerdahl, Fred, and Ray Jackendoff. 1983. A Generative Theory of Tonal Music. Cambridge: MIT Press.

Lerdahl, Fred. 1992. “Cognitive Constraints on Compositional Systems.” Contemporary Music Review 6(2): 97–121.

Loui, Psyche, David L. Wessel, and Carla L. Hudson Kam. 2010. “Humans Rapidly Learn Grammatical Structure in a New Musical Scale.” Music Perception 27(5): 377–388.

Lowth, Robert. 1763. A Short Introduction to English Grammar: With Critical Notes, 2d ed. London: Millar and Dodsley.

Meyer, Leonard B. 1973. Explaining Music: Essays and Explorations. Berkeley: University of California Press.

Rameau, Jean-Philippe. 1722. Traité de l’harmonie reduite à ses principes naturels. Paris: 1722; Eng. trans., 1971, New York: Dover.

Riemann, Hugo. 1877. Musikalische Syntaxis: Grundriss einer harmonischen Satzbildungslehre. Leipzig: Breitkopf and Härtel.

Riepel, Joseph. 1755. Grundregeln zur Tonordnung insgemein. Frankfurt and Leipzig.

Saffran, Jenny R. 2001. “Words in a Sea of Sounds: The Output of Statistical Learning.” Cognition 81: 149–169.

Saffran, Jenny R. 2003. “Statistical Language Learning: Mechanisms and Constraints.” Current Directions in Psychological Science 12(4): 110–114.

Sanguinetti, Giorgio. 2012. The Art of Partimento: History, Theory, and Practice. New York: Oxford University Press.

Schenker, Heinrich. 1935. Der freie Satz (Free Composition). Vienna: Universal; Eng. trans., 1979, New York: Longman.

Titon, Jeff Todd. 1977. Early Downhome Blues: A Musical and Cultural Analysis. Urbana: University of Illinois Press.

Tomasello, Michael. 2003. Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press.

Wee, Lionel, and Ying Tan. 2008. “That’s So Last Year! Constructions in a Socio-Cultural Context.” Journal of Pragmatics 40: 2100–2113.