Prominence Marking in the Japanese Intonation System
Abstract and Keywords
This article explains that Japanese uses a variety of prosodic mechanisms to mark focal prominence, including local pitch range expansion, prosodic restructuring to set off the focal constituent, postfocal subordination, and prominence-lending boundary pitch movements, but (notably) not manipulation of accent. It also discusses the Japanese intonation system within the Autosegmental-Metrical (AM) model of intonational phonology. The article then describes four phenomena that are the locus of lively discussion and controversy in the further development of this AM framework account, before addressing the larger implications that these phenomena have for the development of a tenable general theory of the role of prosody in the marking of discourse prominence. There is a rich variety of prominence-marking mechanisms even when morphosyntactic mechanisms such as scrambling are ignored. The generalization across English and Japanese predicts that there should also be complementary patterns of focus projection within the VP in transitive clauses.
Keywords: Japanese intonation system, Autosegmental-Metrical model, intonational phonology, pitch movements, range expansion, focal constituent, postfocal subordination, focus projection, prominence marking
Beginning at least as early as Trubetskoi 1939, Arisaka 1941, and Hattori 1961, Japanese has played an important role in developing our understanding of prosodic structure and its function in the grammar of all human languages. In much of this literature, the primary consideration has been to account for apparent similarities between the culminative distribution of lexical stress in the lexicon of languages such as English and the distribution of pitch accents in simple and derived words of Japanese. For example, McCawley (1970) and Hyman (2001) both cite Japanese in rejecting an older typology of word-level prosodic systems in which languages are categorized by a simple dichotomy between stress languages (such as English or Catalan) and tone languages (such as Yoruba or Cantonese). More recently, the analysis of Japanese has played a critical role in the development of computationally explicit compositional theories of the elements of intonation contours and their relationship to the prosodic organization of utterances. For example, Fujisaki and Sudo (1971) and Pierrehumbert and Beckman (1988) provide two different accounts of Japanese intonation patterns in which phrase-level pitch range is specified by continuous phonological control parameters that are independent of the categorical specification of more local tone shapes in the lexicon. This incorporation of continuous (or “paralinguistic”) specifications of phrasal pitch range directly into the phonological description was a critical element in the development of what (p. 457) Ladd (1996) named the Autosegmental-Metrical (AM) framework. Beginning with Pierrehumbert's (1980) model of the grammar of English intonation contours, the AM framework has been widely adopted in descriptions of intonation systems and in comparisons of prosodic organization across languages. (See Jun 2005 for a recent collection of such descriptions and Gussenhoven 2004 for a review of work in this framework since Ladd 1996.)
Such cross-language comparisons, in turn, are critical for our understanding of prominence marking and of the ways in which prosody is used in the articulation of discourse-level categories such as topic and focus. Japanese is widely cited as a language that has a morphological marker for topic. Work on other languages, including English, Catalan, and Hungarian (e.g., Vallduví 1992, Roberts 1996), strongly suggests that the morphosyntactic articulation of topic and focus is related, at least in part, to phonological constraints on the prosodic marking of prominence contrasts. For Japanese, too, we see reference to contrasting degrees of prosodic prominence as a necessary condition on some interpretations of phrases marked for topic (see, e.g., Kuno 1973; Nakanishi 2007; Heycock, this volume). To fully understand how such categories as topic and focus are realized in Japanese, therefore, it is critical to understand how the prosodic marking of contrasts between prominent and nonprominent elements works in the language.
In this chapter, we review the currently standard AM framework account of how the prosodic marking of prominence works in the Japanese system (section 17.3). We show that, just as in the English prosodic system, there are several different prosodic mechanisms in Japanese for marking some constituents as having focal prominence and others as being relatively reduced and out of focus. In section 17.4, we then describe four phenomena that are the locus of lively discussion and controversy in the further development of this AM framework account, before discussing the larger implications that these phenomena have for the development of a tenable general theory of the role of prosody in the marking of discourse prominence (section 17.5). Before we begin this description, however, we need to make explicit some assumptions about what prosody is and about the nature of phonetic representations that we use to study it. An understanding of these assumptions is essential background for understanding the AM framework description of Japanese. Section 17.2 lays out this background.
17.2 Elements of an AM Description of Japanese Intonation
In describing the prominence-marking mechanisms of Japanese, we use the account of the Japanese intonation system that is encoded in the X-JToBI labeling conventions (Maekawa et al. 2002). These conventions build on the earlier J_ToBI (p. 458) conventions described by Venditti (1997, 2005), adding tags for elements of intonation contours that have not been noted in the previous literature in English, although they are observed even in relatively formal spontaneous speech such as the conference presentations in the Corpus of Spontaneous Japanese (henceforth CSJ; Maekawa 2003, Kokuritsu Kokugo Kenkyuujo [National Institute for Japanese Language] 2006). Both the original and expanded tag sets are intended to describe intonation contours for standard (Tokyo) Japanese. The intonation systems of other dialects are known to differ, some rather dramatically. However, they either have not been studied at all (most regional dialects) or have been studied less extensively (e.g., the Osaka system, as described by Kori and others). Both tag sets also assume a description couched in the AM framework.
17.2.1 The AM framework
As already noted, the term Autosegmental-Metrical was coined by Ladd (1996) to refer to a class of models of phonological structure that emerged in work on intonation in the 1980s, beginning with Pierrehumbert's (1980) incorporation of Bruce's (1977) seminal insights about the composition of Swedish tone patterns into a grammar for English intonation contours. All AM models are deeply compositional, analyzing sound patterns into different types of elements at several different levels. The most fundamental level is the separation of autosegmental content features from the metrical positions that license them. This is roughly comparable to the separation of lexis from phrase structure in a description of the rest of the language's grammar.
More specifically, the “A” (Autosegmental) part of an AM description refers to the specification of content features that are autonomously segmented—that is, that project as strings of discrete categories specified on independent tiers rather than being bundled together at different positions in the word-forms that they contrast. For example, in Japanese, changing specifications of stricture degree for different articulators can define as many as six different manner autosegments within a disyllabic word-form, as in the word sanpun [sampun] ‘3 minutes’, where there is a sequence of stricture gestures for sibilant airflow, for vowel resonances, for nasal airflow, for a complete stopping of air, and then a vowel and a nasal again. However, there are constraints on which oral articulators can be involved in making these different airflow conditions, so that only one place feature can be specified for the middle two stricture specifications in a word of this shape. Therefore, in the grammar of Japanese (as in many other languages), place features should be projected onto a different autosegmental tier from the manner features. This projection allows for a maximally general description of the alternation among labial nasal [m] in sanpun, dental nasal [n] in sandan ‘3 steps’, velar nasal [ŋ] in sangai ‘3rd floor’, and uvular nasal [ɴ] in the citation form san ‘3’. That is, the projection of consonant place features onto a separate autosegmental tier from the manner features accounts for the derivational alternation in these Sino-Japanese forms in terms of the same general phonological constraints that govern how postvocalic nasals are adapted into the language in monomorphemic loanwords such as konpyuuta (p. 459) ‘computer’ or hamu ‘ham’ (where the [m] corresponds to a postvocalic [m] in the English source word), handoru ‘steering wheel’ (where the [n] corresponds to a postvocalic [n]), hankachi ‘handkerchief’ (where the [ŋ] corresponds to a postvocalic [n]), and koon ‘corn’ (where the [ɴ] corresponds to a postvocalic [n]).
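The place alternation just described can be sketched computationally. The following Python fragment is our own illustrative sketch, not part of any X-JToBI tooling: the function name, the romanized consonant classes, and the fallback for unlisted consonants are all assumptions made for exposition. It shows how a single placeless nasal autosegment on the manner tier can pick up its place feature from the following context.

```python
from typing import Optional

# Romanized consonant classes for the examples in the text (our own
# simplification; these sets are illustrative, not exhaustive).
LABIAL = set("pbm")
CORONAL = set("tdszrn")
VELAR = set("kg")

def moraic_nasal_place(next_consonant: Optional[str]) -> str:
    """Fill in the place feature of the placeless moraic nasal /N/
    from the onset of the following syllable (None = word-final)."""
    if next_consonant is None:      # citation-form final: san -> [saɴ]
        return "uvular"
    if next_consonant in LABIAL:    # sanpun -> [sampun]
        return "labial"
    if next_consonant in CORONAL:   # sandan -> [sandan]
        return "dental"
    if next_consonant in VELAR:     # sangai -> [saŋɡai]
        return "velar"
    return "unspecified"
```

The same function covers both the Sino-Japanese numeral compounds and the loanword adaptations cited above, which is the point of projecting place onto its own tier.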
A key insight of Firth (1957), which has been translated into Autosegmental Phonology by North American interpreters such as Langendoen (1968), Haraguchi (1977), and Goldsmith (1979), is that the traditional consonant and vowel “segments” that are named by symbols such as the [s], [a], [m], [p], [u], and [n] in sanpun are not primitive elements in the phonological grammar. Rather, they are complex categories defined by the intersection of place and manner specifications for the word-form when the array of parallel streams of autosegments on different tiers is collapsed into a one-dimensional string.
These traditional “segments” are also the names for the C or V leaf nodes in the other part of an AM description of a spoken utterance. More generally, the “M” (Metrical) part specifies the hierarchy of prosodic constituents that defines the meter or rhythm of an utterance containing the word-form. For example, in Japanese, there is a low-level prosodic constituent syllable (typically abbreviated σ) that coordinates the place and manner features of words to alternate between more sonorant manners licensed to occur at the head and less sonorant manners licensed to occur only at the edges. In an utterance of sanpun, this syllable-level meter is CVC.CVC. In many other words, there is a more regular rhythmic alternation between C and V leaf-node types, as illustrated by the utterances in figure 17.1.
All but five of the thirty-three syllables in these utterances have a perfectly alternating CV structure. The exceptional cases are of two types. The first is the bare V at the beginning of the verbs oyoideru and utatteru in figures 17.1a and 17.1c. These two V-shape syllables, like the twenty-eight CV-shape syllables, are short. That is, each contains just one mora (abbreviated μ), a constituent between the syllable and the CV leaf nodes that licenses the specification of vowel place and height features as the head of the syllable.
The other three exceptional syllables are long. A long syllable contains at least one extra mora after the head V that is the sole obligatory leaf constituent. This following mora can be either a V or a C. If it is a V, it can be the second part of a geminate vowel or it can be a less sonorous vowel than the head vowel, as in figure 17.1a, where the [i] in the second syllable of oyoideru has lesser sonority than the preceding head vowel [o]. If the following mora is a C, it can be a moraic nasal, as in each of the two syllables of sanpun, or it can be the first part of a geminate obstruent, as in the second syllables of the verbs hasitteru and utatteru in figures 17.1b and 17.1c.
As sanpun shows, a long syllable that is closed by the moraic nasal can be word final. By contrast, a long syllable that is closed by an obstruent must have a following syllable to provide the second C position for the necessarily geminate consonant. This constraint is one of the motivations for positing prosodic constituents above the syllable as well as below. That is, the place and manner autosegments for a geminate consonant necessarily associate to a sequence of C nodes that is medial to a prosodic word (abbreviated ω). (p. 460)
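As a rough illustration of the mora inventory just described, here is a toy segmenter for the romanized example words. It is our own sketch under stated assumptions, not an established algorithm: the romanization, the digraph list, and the use of `N` and `Q` as labels for the moraic nasal and the first half of a geminate obstruent are simplifying conventions introduced here.

```python
from typing import List

VOWELS = set("aiueo")
# Two-letter onsets in this romanization (an illustrative, partial list).
DIGRAPH_ONSETS = {"sh", "ch", "ts", "ky", "gy", "ny", "hy", "by", "py", "my", "ry"}

def morae(word: str) -> List[str]:
    """Segment a romanized Tokyo Japanese word into morae:
    (C)V syllable heads, plus non-head morae for the moraic nasal (N),
    the first half of a geminate (Q), and a second vowel in a long syllable."""
    out, i = [], 0
    while i < len(word):
        # moraic nasal: 'n' not followed by a vowel (or word-final, as in san)
        if word[i] == "n" and (i + 1 == len(word) or word[i + 1] not in VOWELS):
            out.append("N"); i += 1; continue
        # geminate obstruent: first of two identical consonants is its own mora
        if word[i] not in VOWELS and i + 1 < len(word) and word[i + 1] == word[i]:
            out.append("Q"); i += 1; continue
        # otherwise consume an optional onset (digraph or single C) plus a vowel
        if word[i:i + 2] in DIGRAPH_ONSETS:
            out.append(word[i:i + 3]); i += 3
        elif word[i] in VOWELS:
            out.append(word[i]); i += 1
        else:
            out.append(word[i:i + 2]); i += 2
    return out
```

For the examples in the text, `morae("sanpun")` yields four morae (two long CVC syllables), and `morae("hashitteru")` yields five, with `Q` standing in for the consonant that must be completed by a following word-medial syllable.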
To recap, then, the hierarchy of prosodic constituents in the phonological description of an utterance is analogous to the hierarchy of syntactic constituents in its syntactic description. By this analogy, the prosodic constraint that each syllable must have a V (vowel) category autosegment positioned to be its head is comparable to the syntactic constraint that a verb phrase must have a V (verb) category word positioned to be its head. It is in this sense that we say that the inventory of segments for a language is analogous to the lexis of the language. One argument in favor of this analogy is that the prosodic category of a segment such as [i] is inherently ambiguous in the same way that the syntactic category of a word such as happyoo is inherently ambiguous. That is, happyoo can be parsed either as the verb ‘to announce, present’ or as the noun ‘announcement, presentation’ in different morphosyntactic contexts. Similarly, the palatal constriction that gives rise to the low first formant and high second formant on either side of the medial [o] in the word oyoideru in figure 17.1 can be parsed either as a C (written ‘y’) in the syllable onset or as a V (written ‘i’) in the (p. 461) second mora of the syllable. A second argument in favor of this analogy is the way that empty structural positions are interpreted. That is, the implicit but unrealized subject noun phrase in the sentences shown in figure 17.2 can be recovered from the syntactic parse of the rest of the sentence.
Similarly, the implicit head vowel in the second syllable of hashitteru in figure 17.1b and third syllable of kurashiteru in figure 17.1d can be recovered from the prosodic parse of the syllable structure even though there is no interval containing the usual acoustic traces of the vowel manner specification in the signal. These two (p. 462) tokens of the segment are “devoiced” and so cannot show the vowel formant values that cue the explicit [i] in oyoideru. Nonetheless, the listener parses the presence of some implicit head V for the second syllable of hashitteru and third syllable of kurashiteru from the fact that [ʃt:] and [ʃt] are not grammatical sequences of segments for any position in a prosodic word of the language. A good grasp of how listeners resolve prosodic ambiguities and of how they recover the intended autosegmental features of an implicit prosodic node in phonological ellipsis is critical for understanding the ways in which phonological structures above the word are manipulated to mark discourse prominence.
In introducing the prosodic constituents mora, syllable, and prosodic word in this section, we have defined them primarily in terms of the distribution of place and manner autosegments for consonants and vowels. In describing what listeners do to parse the prosodic categories of ambiguous segments or to recover elided segments, we have made passing reference to their phonetic interpretation, using the spectrograms in figure 17.1 as a convenient representation of the place and manner autosegments. We use the fundamental frequency (F0) contours that are displayed below the spectrograms as a convenient phonetic representation of another type of autosegment that is critical for understanding prosodic structures above the word.
17.2.2 Tone Features and Prosodic Constituents above the Word
In addition to word-internal constituents such as the syllable, Japanese also has larger constituents above the word: various types of prosodic phrases. The X-JToBI model describes two levels of prosodic phrasing: the accentual phrase (AP) and the intonation phrase (IP). When our speaker produced the fluent utterances of the sentences in figures 17.1a and 17.1b, he grouped the prosodic words in each utterance into two APs, which in turn were grouped together to form a single IP for the utterance as a whole. To define these two larger groupings, we must refer to the distribution of another type of autosegment besides the place and manner autosegments that defined the C and V nodes of the prosodic tree. This third autosegmental tier is the string of low (L) and high (H) tones that correspond to the target pitch values that the speaker produces at different prosodic positions in the utterance.
The F0 contours below the spectrograms in figure 17.1 are a convenient phonetic representation of these target pitch values, and we have annotated each of them for some of the C and V autosegmental targets that also affect the pitch pattern in ways that can make F0 contours difficult for the uninitiated to read. In particular, there is a discontinuity in the F0 representation at each interval where the [−voice] autosegment is in effect. For example, there is a break in the F0 contour during the [ʃ] of Kashino and during the [h] and the [ʃt:] sequence (p. 463) in the first three syllables of hashitteru in figure 17.1b. Also, [+voice] consonants, such as the [d] and [r] in oyoideru in figure 17.1a, can cause sharp dips in the pitch, because the speed of opening and closing of the glottis depends in part on the difference in air pressure below and above the glottis, which in turn depends on a steady venting of air through the mouth. Experienced readers of F0 contours automatically parse these effects as cues to the consonant or vowel segments rather than cues to the tones and see that the tone sequences in the utterances in figures 17.1a and 17.1b are essentially identical. There is a high tone target near the beginning of the subject NP, then a sudden fall to a low tone target, then a rise to a second high target on the second syllable of the VP, followed by another fall to a low target. The pitch rise at the beginning of the VP, which is especially apparent in figure 17.1a, marks the boundary between the first and the second AP.
In the previous section, we described how constriction targets can be called by their metrical functions as well as by their actual content, so that a close palatal constriction is symbolized differently depending on whether it marks a syllable boundary (C = [y]) or a syllable head (V = [i]). All tone targets, too, can be called by their metrical functions as well as by their pitch levels. For example, in the X-JToBI model, the low target at the AP boundary between the NP and the VP in figures 17.1a and 17.1b is analyzed as a boundary tone, which is aligned to the phrase edge. Following the usual AM conventions for indicating the demarcative prosodic function directly in the symbol for the category, this boundary tone is symbolized as [L%] in an X-JToBI tagging of the utterance. Given that the start of the utterance as a whole is also necessarily the start of a new AP, there is another instance of this low boundary tone marking the beginning of these two utterances as well as the beginning of the utterances in figures 17.1c and 17.1d. Following common practice in the AM literature, the X-JToBI conventions tag this boundary tone as [%L], using the function marker as a prefix to indicate that this boundary tone marks an initial boundary after a silent pause. Of course, this utterance-initial [%L] target is explicitly realized only when the text of the phrase begins with a sequence of [+voice] consonant and vowel autosegments on which the F0 can be realized, as in figures 17.1a and 17.1c. However, even when the phrase begins with a [−voice] segment, as in figures 17.1b and 17.1d, the [%L] can be recovered from the prosodic parse in the same way that the implicit [i] in the second syllable of hashitteru is recovered in figure 17.1b. In both figures 17.1a and 17.1c, the explicit low tone defines the start of a rise in F0.
The high tone target that defines the end of this rise is also explicitly realized in figures 17.1b and 17.1d, providing part of the context for positing an elided [%L]. In figures 17.1a and 17.1b, this high tone is the first tone target in a [H⋆+L] falling sequence that is specified in the lexicon, whereas in figures 17.1c and 17.1d it is an AP-level phrasal tone, symbolized with [H−]. These tone sequences and their metrical affiliations are shown in (1a,b) for the utterances in figures 17.1a and 17.1c.1
(1) Metrical structure and associated tone targets for (a) Yaʼmano-ga | oyoʼideru (see figure 17.1a), (b) Yamada-ga utatteru (see figure 17.1c), and (c) Yaʼmano-wa ║ oyoʼideru (see figure 17.3a, black line).
Each of the utterances in figure 17.1 constitutes a single IP, and the second [L%] boundary tone at the end of the utterance marks the final boundary of this IP as well as the final boundary of the second AP. By contrast, the utterance of Yaʼmano-wa oyoʼideru in figure 17.3a (black line) is two IPs, so that the medial [L%] boundary tone here marks an IP boundary as well as an AP boundary. (The phonetic markers of the strength of this medial disjuncture include the noticeable phrase-final lengthening on the -wa as well as other less well-documented hiatus markers, such as a possible change in voice quality or glottal stop. However, they do not include a silent pause, so there is no separate initial [%L] boundary tone marking the beginning of the second IP.) There are many other tone sequences besides simple [L%] that can mark the end of a prosodic phrase, giving rise to a variety of boundary pitch movements (BPMs) with contrasting pragmatic functions.
We discuss the inventory of BPMs that can mark the end of a phrase in the next section, but first we must describe one more element that can occur in the tone sequence for some APs. This other AP-level element is the akusento-kaku, a sequence of tones that is tagged in Japanese ToBI as [H⋆+L]. This tone sequence is what results in the steep fall in each AP in figure 17.1a and the first AP in figure 17.1b. The APs Yaʼmano-ga and Kaʼshino-ga contrast with Yamada-ga and Kushida-ga in that each of the former has an akusento-kaku on its first syllable. The APs oyoʼideru and hashiʼtteru similarly contrast with utatteru and kurashiteru in that each of the former has an akusento-kaku on its second syllable, and there would be a steep fall after the second syllable in hashiʼtteru in figure 17.1b if the vowel in this syllable were not effectively deleted. That is, the listener parses the implicit high target for a [H⋆+L] from the surrounding pitch dynamics and from the relationship between the low-tone pitch targets on the first and the third syllables, even though the syllable bearing the high tone of the akusento-kaku is completely voiceless.
This term akusento-kaku is usually rendered in English as ‘pitch accent’ or ‘accent’, but we introduce it first by the Japanese term to try to decouple this use in translation from the way these two English terms are used in AM models of several other well-studied languages. In descriptions of English, for example, accent refers to any of a number of prominence-lending pragmatic morphemes that can be associated with the metrically strongest syllables in the most prominent content words of an utterance. In English, the location of these prominence-lending morphemes is determined by prosodic constraints such as ObligatoryHead (the principle that the intonation contour for each intonational phrase must contain at least one pitch accent associated to the metrically strongest prosodic position; see Hyman 2005) in (p. 465) interaction with the information structure of the utterance in its discourse context. For example, a native speaker of English may choose to place a particular rising-falling configuration of accents on each of the nouns and verbs in the sentences in (2) to invoke the relevant contrast sets for each person and each activity.
(p. 466) (2) [context: Who's doing what?]
Yamano is swimming. Kashino is running. Yamada's just standing there singing to ʼem.
When speakers of English begin to learn Japanese, they may interpret the rises and falls in figures 17.1a and 17.1b in terms of such head-marking tonal morphemes in their native language. But this is a misparsing of the prosodic function of the tone shapes in these two Japanese utterances. The steep fall in Yaʼmano in figure 17.1a does reflect a pitch accent on the first syllable, but that accent is a property of the word itself, present regardless of the context in which it is uttered. The specification of a [H⋆+L] fall at this point is part of the phonological contrast between accented Yaʼmano and unaccented Yamada.
Japanese is like conservative varieties of Basque in having a lexical contrast between words specified for an accentual fall and words without this specification. (See Hualde 1991 and Hualde et al. 2002 for descriptions of the conservative Basque system and of the influence of contact with Spanish in other varieties.) Tokyo Japanese is crucially different from Basque, however, in that this lexical specification completely constrains the distribution of [H⋆+L] sequences in the surface intonation pattern. That is, in Basque, an unaccented word must be grouped together with a following accented word, and if a sentence contains only lexically unaccented words, an accent is inserted somewhere—either in the last argument before the sentence-final verb or earlier if there is narrow focus on an earlier constituent (see Ito 2002a). By contrast, there is no postlexical insertion of accent in Japanese. When a Japanese sentence contains no lexically accented words, the intonation contour of an utterance of that sentence does not contain any accents, whatever the focus pattern. Instead, it is composed only of the same tones as the contour in figure 17.1c (see also (3b)); there are the rises for the [%L H−] or [L% H−] sequences that mark the beginning of every AP, and there may be additional boundary tones marking the end of some phrases, but there are no other tones. Any fall in pitch within an AP is not the sharp localized fall of the accent [H⋆+L] but some more or less gradual interpolation between the high pitch target for the phrasal high and some following low tone. When these words are all grouped together into one long AP, as in figure 17.2c, this gradually falling F0 can describe a fairly long stretch of the contour from the high target at the end of the rise for the initial [%L H−] to the IP-level boundary [L%] at the end.
Note that this account of the long gradual fall in figure 17.2c assumes underspecification of tones relative to potential tone-bearing units. That is, none of the morae between the demarcative rise in omiyage and the boundary tone at the end of the utterance is described as having an associated tone target of any kind. This differs from older accounts within the Autosegmental Phonology framework, such as Haraguchi (1977), who specifies a [H] tone target for each of the ten sonorant morae after the demarcative rise from the first to the second V in omiyage. In such older accounts, the description of the fall that is actually observed over this purported sequence of ten [H] tone targets is relegated to declination—an “extragrammatical” (p. 467) or “paralinguistic” manipulation of global pitch trends. Pierrehumbert and Beckman (1988) argue for their underspecified AM account (which X-JToBI follows in this regard) on the basis of slope measurements for unaccented APs containing varying numbers of syllables. In their data, the slope of the fall is inversely correlated with its length, just as predicted in an account that models the fall as an interpolation from the [H−] phrase tone to some later low target. In this AM account, then, the fall in figure 17.2c is attributed to the transition between different target values assigned to the [H−] and the [L%], just as the slightly rising plateau-like interval in the first AP in figure 17.2a is attributed to a transition between only slightly different target values for the phrasal [H−] at the beginning and the higher tone target of the [H⋆+L] accentual fall at the syllable that is lexically associated to the akusento-kaku.
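The slope argument can be made concrete with a small sketch. If the fall is a straight-line interpolation from the [H−] target to a later low target, then for fixed target values its slope is inversely proportional to the duration of the fall, which is the relationship the slope measurements support. The target values and durations below are invented for illustration, not taken from Pierrehumbert and Beckman's data.

```python
def interpolated_f0(h_value: float, l_value: float,
                    duration: float, t: float) -> float:
    """Linear interpolation from the [H-] target (at t = 0)
    to the low target (at t = duration)."""
    return h_value + (l_value - h_value) * (t / duration)

def slope(h_value: float, l_value: float, duration: float) -> float:
    """Slope (Hz/s) of the straight-line fall between the two targets."""
    return (l_value - h_value) / duration

# Same tone targets, one short and one long unaccented AP (invented values):
short_fall = slope(180.0, 120.0, 0.5)   # steep fall over 0.5 s
long_fall = slope(180.0, 120.0, 2.0)    # shallow fall over 2.0 s
assert abs(short_fall) == 4 * abs(long_fall)   # slope varies as 1/duration
```

A fully specified account with a [H] on every mora would instead predict a plateau whose slope is independent of phrase length, so the inverse correlation is diagnostic.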
The F0 contour for figure 17.2a also illustrates another phenomenon that has been treated differently in at least one older account in the Autosegmental Phonology framework. The [H⋆] target at the accent in taʼbeta is considerably lower than that in chiʼizu. The same relationship holds between the accent on the VP and the accent on the preceding NP in figures 17.1a and 17.1b. McCawley (1968) accounts for this relationship by positing a mid (M) tone for Japanese—that is, a tone target intermediate between [H] and [L] such as the [M] tone necessary to describe the lexical tone contrasts on monosyllabic words in Yoruba and Cantonese. In the AM account, by contrast, the lowering of the tone target on the verbs in figures 17.1a, 17.1b, and 17.2a is attributed instead to the component of the grammar that specifies parameters of the backdrop pitch range for each AP and for each IP.
Pierrehumbert and Beckman (1988) argue for the AM treatment on the grounds that there are many more intermediate levels than just one. For example, the medial AP-level [L%] boundary tone in figure 17.1a (see also example (1a)) is certainly lower than either the preceding or the following accent peak. Yet it is higher than an IP-level [L%] boundary tone at the end of the utterance. This can be seen by contrasting the AP-level [L%] to the utterance-final [L%] in that example or by contrasting the lower IP-level [L%] after oyogeʼru-ga in the utterance in figure 17.3b to the merely AP-level [L%] after the following Kaʼshino-wa. This difference between the truly low target for the [L] at an IP boundary versus the intermediate [L] at a mere AP boundary is a fairly reliable cue to the level of the disjuncture. It seems to be the same kind of boundary strengthening effect noted for consonant features in many languages by researchers such as Pierrehumbert and Talkin (1992), Fougeron and Keating (1997), and Jun (1998). Other types of systematic difference among tones are illustrated in the F0 contours of the two example sentences that are overlaid in figure 17.4.
One difference in figure 17.4 is contained just within the Akai yaʼne-no phrase (gray line). Here the high target for the phrasal [H−] is lower than the high for the following accent peak, and this is true for all three speakers. The same relationship also accounts for the rise in pitch over the plateau-like interval in the Omiyage-no chiʼizu-o phrase in figure 17.2a. We have also seen flat or slightly falling shapes for (p. 468) longer accented phrases, such as the one in figure 17.2b. In read speech, however, a small rise seems typical for short phrases such as the Akai yaʼne-no phrase. That is, if [H−] is typically lower than the high target of [H⋆+L], then we can explain the difference in peak heights between unaccented Yamada-ga in figure 17.1c and accented Yaʼmano-ga in figure 17.1a and between comparable minimal pairs in prior experiments by many researchers, including Pierrehumbert and Beckman (1988). This difference may be related to the same dissimilative pre-L raising seen in many other languages, including Yoruba (Laniran and Clements 2003), Thai (Gandour, Potisuk, and Dechongkit 1994), and Taiwanese (Peng 1997).
Another systematic difference seen in figure 17.4 is between the yaʼne-no phrase in the two overlaid utterances. The accent peak on yaʼne is lower after accented aoʼi than it is after unaccented akai, as predicted by McCawley's rule lowering [H] to [M] in this context. However, each of the subsequent accent peaks in both sentences (p. 469) is also lower than its immediately preceding accent peak. Moreover, in the female speaker's utterances in figure 17.4b, where the medial [L%] AP-boundary tones are all clearly visible as local F0 minima, there is the same successive lowering of F0 values across the two or three utterance-medial AP boundaries.
Which of these intermediate levels should be singled out as the target level [M]? The X-JToBI model follows Pierrehumbert and Beckman (1988) in calling none of them [M]. Instead, all tone levels in Japanese are either [H] or [L], with the intermediate Fo values attributed to a different type of phonological representation from the categorical contrast between these two tones—namely, a representation of backdrop reference pitch levels. For example, beginning with Poser (1984), all accounts in the AM framework model the systematically lower value for the [H] tone in yaʼne after the accented adjective in figure 17.4 in terms of downstep, defined as a compression of the pitch range that begins at the akusento-kaku and affects all tone targets up to the following IP boundary. Other current formal models of Japanese intonation patterns account for some of the systematic Fo target differences between and within utterances in subtly different ways from the account in the X-JToBI model. However, despite their differences, all current formal models of the Japanese intonation system share one property that distinguishes them from older Autosegmental Phonology accounts such as McCawley 1968 or Haraguchi 1977. All provide some way of representing the kind of continuous variation discussed in these examples directly in the grammar. Understanding this aspect of the AM framework model and other compatible models is critical for understanding one of the basic mechanisms for focus marking in the language. We expand on this aspect in the next section.
17.2.3 The Relationship between Tone Features and Pitch Range Features
Pierrehumbert and Beckman (1988) describe an intonation synthesis program that models the differences in Fo value among the various [H] or [L] targets in terms of prominence relationships at two different levels of specification. One level involves tone scaling within the local pitch range. For example, in figure 17.3b the more prominent IP-level [L%] target after the oyogeʼru-ga is scaled lower in the pitch range than the merely AP-level [L%] targets at the ends of the preceding and following phrases. Similarly, the more prominent [H] target of the accent is scaled higher in the pitch range than the phrasal [H−], as described above. In the synthesis program, these two local prominence relationships are specified in terms of two constants indicating tone-scaling ratios. The first constant positions an IP-final [L%] proportionally closer to the bottomline value for the local pitch range, and the second one positions any accent [H⋆] target proportionally closer to the topline for the local pitch range. Such variation in tone-scaling changes the target position within the pitch range, but leaves the pitch range intact.
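To make the tone-scaling arithmetic concrete, the two ratio constants can be illustrated with a minimal sketch. The ratio and hertz values below are invented for illustration; they are not the constants of Pierrehumbert and Beckman's actual program.

```python
def scale_tone(ratio, bottomline, topline):
    """Interpolate a tone target within the local pitch range.

    ratio = 0.0 places the target at the bottomline, 1.0 at the topline.
    """
    return bottomline + ratio * (topline - bottomline)

# Illustrative values only (not the published parameter settings):
BOTTOM, TOP = 100.0, 250.0  # Hz, local pitch range for one IP

ap_L = scale_tone(0.25, BOTTOM, TOP)       # merely AP-level [L%]
ip_L = scale_tone(0.05, BOTTOM, TOP)       # IP-level [L%], proportionally nearer the bottomline
phrasal_H = scale_tone(0.80, BOTTOM, TOP)  # phrasal [H-]
accent_H = scale_tone(0.95, BOTTOM, TOP)   # accent [H*], proportionally nearer the topline

print(ap_L, ip_L, phrasal_H, accent_H)
```

The point of the sketch is simply that such scaling moves a target up or down within the local range without altering the range itself, which is what distinguishes this level of specification from the pitch range manipulations discussed next.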
By contrast, other prominence relationships are modeled by varying the pitch range. The synthesis program provides for a paradigmatic choice of hertz values (p. 470) for the bottomline and for the topline of the pitch range of each IP. It also provides for the paradigmatic choice of an “AP prominence” factor—a ratio value that determines the topline for each AP relative to the pitch range of the IP that dominates it. For example, the accent peak on /Yaʼmano-wa/ in figure 17.3a (black line) is lower than the accent peak on /Yaʼmano-ga/ in figure 17.1a (reproduced as the gray line in figure 17.3a), because the topic IP /Yaʼmano-wa/ is not in focus relative to the following predicate. In the synthesis program, this relationship is generated by specifying a lower topline value for the first IP /Yaʼmano-wa/ in figure 17.3a (black line). In a similar way, the accent peak on the Akai yaʼne-no phrase is very much higher than the peaks on ieʼ and mieʼru in all three speakers' renditions of this sentence in figure 17.4, because the initial AP is produced in a pitch range that is somewhat expanded relative to a more neutral reading. This reflects the focus structure of the sentence in the kinds of discourse contexts that the speakers could imagine for it. The distinctive roof color identifies which house it is that is visible. In the synthesis program, this is specified by an AP prominence ratio greater than 1.0 so that the topline for the first AP is above the topline for the IP as a whole.
The discourse-level considerations just described are not the only ones at play in determining the topline value for an AP. There is also the general phonological constraint that AP-level high tone targets are lower in APs that occur after an accented AP within the same IP. This constraint is reflected in the lower value for the accent peak on oyoʼideru in figure 17.1a, where the subject and predicate are grouped together into one IP, relative to the accent peak on oyoʼderu in figure 17.3a (black line), where the predicate stands alone in an independent IP. The same relationship can be seen also on yaʼne in Aoʼi yaʼne-no as compared to the accent peak in the Akai yaʼne-no phrase in figure 17.4. In the synthesis program, these differences are modeled by a downstep ratio that proportionally reduces the distance of the IP topline above the bottomline immediately after each accent [H⋆] target. The very much lower values of the peaks on ieʼ-ga and miʼeru in figure 17.4 then reflect the combined effects of focus subordination and of downstep.
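The joint effect of the AP prominence ratio and the downstep ratio can be made concrete with a toy reimplementation. This is a sketch under invented parameter values, not the actual synthesis program.

```python
def ap_toplines(ip_bottom, ip_top, prominences, accented, downstep_ratio=0.6):
    """Toy model of AP topline computation within one IP.

    prominences: one ratio per AP; > 1.0 models focal expansion of that AP,
                 < 1.0 models subordination (e.g., a topic phrase not in focus).
    accented:    one flag per AP; each accent [H*] triggers downstep, which
                 proportionally compresses the range for every following AP
                 up to the end of the IP.
    """
    toplines = []
    span = ip_top - ip_bottom
    for prom, has_accent in zip(prominences, accented):
        toplines.append(ip_bottom + prom * span)
        if has_accent:
            span *= downstep_ratio  # proportional reduction after the accent
    return toplines

# Three accented APs, the first focally expanded (prominence ratio > 1.0):
peaks = ap_toplines(100.0, 250.0, [1.2, 1.0, 1.0], [True, True, True])
# Each successive topline is lower: focal expansion on the first AP
# combines with cumulative downstep on the later ones.
```

Under these made-up settings the toplines decline monotonically across the IP, mirroring the combined effects of focus subordination and downstep on the peaks of ieʼ-ga and miʼeru described above.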
The types of prominence relationships that are modeled in Pierrehumbert and Beckman's synthesis program are also modeled in a comparably direct way in Hiroya Fujisaki's model, first described in Fujisaki and Sudo 1971 and since adopted in an impressive body of later studies that became the basis for the intonation synthesis modules in several leading Japanese text-to-speech and concept-to-speech systems (e.g., Fujisaki and Hirose 1993; Fujisaki et al. 1994; Hirai, Higuchi, and Sagisaka 1997; Kiriyama, Hirose, and Minematsu 2002). In the Fujisaki framework, the prominence relationship among different intonation phrases is modeled by specifying different amplitudes for their phrase commands. The phrase command is a local pulse aligned near the beginning of the phrase and then convolved with a decaying declination function. The prominence relationship among different APs, in contrast, is modeled by the specification of different amplitudes for the accent commands. The accent command is a square wave that generates the rise at the beginning of each AP and the subsequent steep fall after the accent peak (p. 471) in an accented AP. The differences between the X-JToBI AM framework model and Fujisaki's model primarily involve the treatment of unaccented phrases. The accent command in Fujisaki's model implies a [H] target at the end of an unaccented AP. In other respects the two frameworks provide comparable accounts of the many intermediate Fo target levels observed within and across utterances. The different frameworks are comparable enough that observations couched in one set of synthesis parameters can be translated readily into the parameters of the other model.
That is, this kind of formal analysis-by-synthesis yields data that can be compared across frameworks in a way that is not possible with the informal pretheoretic observations reported in studies such as Selkirk and Tateishi 1991, which assumes a direct and deterministic mapping between syntactic and prosodic structures instead of building an independent formal model of the prosodic structure on its own terms. Both synthesis models also provide an explicit representation of continuous variation in the alignment of tone targets relative to the consonant and vowel segments of the associated text. This explicit representation of tone-text alignment is critical for describing the contrasting boundary pitch movements at the ends of intonational phrases, which we describe in the next section.
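Fujisaki's superpositional scheme has a standard closed-form statement that makes the convolution concrete. The sketch below follows that textbook formulation; the α, β, γ values and the command timings are illustrative placeholders, not parameters fitted to any of the utterances discussed here.

```python
import math

def Gp(t, alpha=3.0):
    """Impulse response of the phrase-control mechanism: the decaying
    declination function with which each phrase-command pulse is convolved."""
    return alpha ** 2 * t * math.exp(-alpha * t) if t >= 0 else 0.0

def Ga(t, beta=20.0, gamma=0.9):
    """Step response of the accent-control mechanism, ceilinged at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def ln_f0(t, fb, phrases, accents):
    """ln F0(t) as a baseline plus superposed phrase and accent components.

    fb:      baseline (asymptotic minimum) F0 in Hz
    phrases: (onset T0, amplitude Ap) pairs -- one pulse per intonation phrase
    accents: (onset T1, offset T2, amplitude Aa) triples -- the square-wave
             accent command spanning each accented stretch
    """
    total = math.log(fb)
    total += sum(Ap * Gp(t - T0) for T0, Ap in phrases)
    total += sum(Aa * (Ga(t - T1) - Ga(t - T2)) for T1, T2, Aa in accents)
    return total

# One phrase command at t = 0 s and one accent command over 0.2-0.5 s:
f0_at = lambda t: math.exp(ln_f0(t, 120.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)]))
```

In this formulation the prominence relationships described above fall out directly: raising a phrase-command amplitude Ap raises the whole IP, raising an accent-command amplitude Aa raises a single AP, and the offset T2 of the accent command produces the steep fall after the accent peak.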
17.2.4 Boundary Pitch Movements
The preceding two sections have discussed the relevant metrical structure at both the AP and IP levels, as well as the tonal autosegments that mark the lower level of the AP and the pitch range compression that is triggered by the accent tones and affects all following tones until a new pitch range is chosen for the following IP. The literature in English focuses almost exclusively on these aspects of the Japanese intonation system, and all have been encoded even in the earlier J_ToBI conventions (Venditti 1997). By contrast, the last major element of Japanese intonation—the inventory of rises and more complicated tonal events that are licensed to occur after a phrase-final [L%] boundary tone—is rarely given more than passing mention.
In the X-JToBI conventions, these events are called boundary pitch movements (BPMs). A BPM occurs when the AP-level boundary [L%] is followed by one or more other tone targets, including at least one [H], to create a simple rise, a concave rise, a rise-fall, or some even more complicated Fo movement. There is a long-standing consensus that BPMs are pragmatic morphemes that are licensed to occur at the end of a phrase to contribute to the pragmatic interpretation of the phrase (see Kindaichi 1951, Ohishi 1959, Kawakami 1963, Miyaji 1963, Kori 1989b, Fujisaki and Hirose 1993, Muranaka and Hara 1994, Venditti, Maeda, and van Santen 1998, Katagiri 1999, among many others). Despite this long-standing characterization of their general function, however, we do not yet have either an exhaustive inventory of BPMs or an exact description of the contrasting discourse meanings that they convey. In this section, we discuss the inventory of BPM categories in the X-JToBI scheme, which were used to tag all 178,000+ phrase boundaries in the forty-five-hour Core portion of the CSJ (henceforth CSJCore), yielding the counts in table 17.1. (p. 472) We compare this list of BPMs with the inventory proposed by Kawakami in his insightful 1963 paper on phrase-final rises.
Table 17.1 Distribution of phrase-final tones in CSJCore
L% + H%
L% + HL%
L% + LH%
L% + HLH%
(a) APS (Academic Presentation Speech) refers to CSJ monologues collected from live academic presentations.
(b) SPS (Simulated Public Speaking) refers to monologues collected from lay subjects who were asked to give short presentations about everyday topics.
(c) D refers to dialogue recordings.
(d) R refers to read speech of (i) passages from popular science books or (ii) “reproduced” transcriptions of the APS/SPS spontaneous speech (see Maekawa 2003 for details).
As table 17.1 shows, many prosodic phrases in the CSJCore end simply in [L%], without any additional boundary pitch events. This is true also of all of the prosodic phrases in our read example utterances in figures 17.2–17.4 and figure 17.5a. However, phrases ending in other boundary configurations are not rare even in fairly scripted spontaneous speech. For example, nearly a quarter of the phrases in the academic presentations (APS) in the CSJCore are tagged with some kind of BPM after the [L%].
The most common BPM tag in the CSJCore is [H%]. This corresponds to the shape that Kawakami (1963) termed the futsuu no jooshoochoo ‘normal rise’. The pitch rises sharply from the onset of the final syllable of the phrase, as shown in figure 17.5b. The [H%] BPM should not be confused with the superficially similar ‘question rise’ illustrated in figure 17.5d. Its function is instead more closely related to that of [HL%], another BPM with which it also shares an alignment pattern for the [H] tone target. In the [HL%] BPM, as in [H%], the Fo rises sharply at the onset of the final syllable of the phrase, before it falls again. The final syllable is usually lengthened as well, as in the example utterance in figure 17.5c.
In each of the examples in figure 17.5, the BPM occurs on a sentence-final pragmatic particle ne. Katagiri (1999) describes ne as occurring only in dialogue, where it indicates lack of commitment to the proposition just uttered (in contrast to yo, which indicates the speaker's acceptance of the proposition into the mutual belief space). Katagiri also describes how this core meaning of ne interacts with the meaning of a rising versus a falling phrase-final intonation pattern to induce a variety of conversational implicatures, such as confirmation question, polite assertion, and the invoking of external authority. It is important to note, however, that Katagiri does not distinguish between the two types of rising BPMs illustrated (p. 473) in figure 17.5b versus 17.5d. Thus, it is possible that some of the meaning contrasts he shows are associated with tonal differences.
Katagiri is not alone in his failure to distinguish more than one rising BPM. Indeed, in the literature in English, this is the rule rather than the exception. For example, Pierrehumbert and Beckman (1988) also recognize only one rising BPM, which they analyze as a [H%] boundary tone. Moreover, they elicit this rise only in interrogative uses of the sentence-final particles ka and no. The occurrence of a rising BPM in association with an utterance-final interrogative ka or no marks a very salient, easily grasped speech act, so it is not surprising that all researchers have recognized at least one rising BPM in at least this one discourse context.
Pierrehumbert and Beckman also follow McCawley (1968), Haraguchi (1977), and Poser (1984) in assuming that this rise occurs only in sentence-final position. Based on this (mistaken) assumption, they posit a prosodic constituent above the IP (termed the utterance) as the domain that licenses the [H%] target. However, as figure 17.6a shows, the BPM that is tagged as [H%] in X-JToBI readily occurs sentence-medially in spontaneous speech (see also Nagahara 1994 and Nagahara and Iwasaki 1994 for a similar insight). Often the phrase-final syllable on which [H%] is realized is a postposition or case particle, although this need not be the case. For example, there are three instances of [H%] in figure 17.6a, but only one of these is associated to a case particle. Moreover, as the counts in table 17.1 show, this BPM occurs frequently in all genres represented in CSJCore. That is, unlike the particles ne and yo, the simple rise BPM is by no means limited to dialogue. (p. 474)
(p. 475) In the (mostly non-English) literature that does distinguish at least two different rising BPMs, the function of the rise that is tagged as [H%] in X-JToBI is described as one of imparting prominence to the constituent to which it associates (see Ohishi 1959; Kori 1989b, 1997; Venditti, Maeda, and van Santen 1998; Taniguchi and Maruyama 2001; Oshima 2007; among many others). This function contrasts with the function of the question rise in figure 17.5d, and it relates [H%] instead to [HL%], the rise-fall BPM illustrated in figure 17.5c. The rise-fall BPM in figure 17.5c is not included in Kawakami's (1963) inventory, but it is among the shapes examined by Venditti and her colleagues in a perception experiment (Venditti, Maeda, and van Santen 1998) matched with an extensive set of acoustic measurements (Maeda and Venditti 1998). In the perception study, Venditti, Maeda, and van Santen (1998) asked listeners to rate each stimulus on several semantic scales. They observed that listeners perceive [HL%] to be highly ‘explanatory’ and ‘emphatic’. That is, Japanese listeners expect speakers to use [HL%] in contexts where they are explaining some point to their interlocutor and want to focus attention on a particular phrase in this explanation. Although it is not as frequent as [H%] and is markedly rare in read speech, [HL%] occurs at 4–8 percent of the phrase boundaries in the different genres of spontaneous speech in the CSJCore.
The next most frequent BPM in the CSJCore is much rarer, occurring at only 0.2 percent of phrase boundaries. This is the [LH%] rise illustrated in figure 17.5d. This BPM may occur sentence finally and is observed most often at the ends of utterances expressing questions. As noted already, this BPM is superficially similar in form to [H%] in that the Fo rises on the last syllable of the phrase. However, unlike with [H%], where the onset of the rise aligns with the onset of the final syllable, the rise of [LH%] occurs later within that syllable, making the overall shape of [L% LH%] more “scooped” or “concave” than that of [L% H%]. In some cases, especially in cases expressing incredulity, the final syllable is lengthened as well. In the Venditti, Maeda, and van Santen (1998) perception study, stimuli carrying [LH%] were perceived as ‘seeking confirmation’ from the interlocutor, and those that also had a lengthened final syllable yielded a high ‘disbelief’ rating (i.e., incredulity questions).
The X-JToBI [LH%] category corresponds to two different rise shapes in Kawakami (1963), who describes simple information-seeking questions such as kaʼeru? or kaeru no? ‘Are you returning?’ as having a futsuu no jooshoochoo ‘normal rise’ and distinguishes this shape from interrogatives indicating ‘deep’ questioning, which he describes as having a hanmon no jooshoochoo ‘return rise’. In their acoustic analysis of BPM contrasts, Venditti, Maeda, and van Santen (1998) show that both the alignment and overall Fo curve shape categorically distinguish all questions from prominence-lending rises: in both information-seeking and incredulity questions ([LH%]), the rise onset occurs later in the syllable and the rise shape is more concave than in the prominence-lending rise ([H%]). By contrast, the phonetic distinction between the two question types is fuzzier. The duration of the final syllable varies continuously from the shortest most clearly information-seeking question (e.g., kaʼeru no?) to the most drawn-out, clearest examples of an (p. 476) incredulity question (e.g., kaʼeru no?!!). Additionally, the location of the rise onset (and other points during the rise) is correlated with the varying durations of the C and V autosegments on which the rise is realized. Given these results, we now understand Kawakami's futsuu no jooshoochoo and hanmon no jooshoochoo to be descriptions of the extreme endpoints of a continuum that includes many intermediate degrees of emphatic lengthening. That is, the gradient nature of the relationship between the phonetic dimensions and the continuum of contrasting degrees of incredulity suggests an analysis akin to the one that Hirschberg and Ward (1992) propose for the uncertainty versus incredulity interpretations of the English fall-rise contour.
The rarest BPM distinguished by X-JToBI is [HLH%], a very complex shape in which the Fo rises, then falls, then rises again. This BPM also is not included in Kawakami's inventory of rises. It seems characteristic particularly of child-directed speech, where it can give a wheedling or cajoling quality to the utterance to which it attaches. The counts in table 17.1 show that [HLH%] occurs a total of only 14 times on the 178,000+ boundaries of the CSJCore, one of which is shown in figure 17.6b. This is likely because the speaking genres that CSJ includes are for the most part not the types in which [HLH%] readily occurs; we would expect to count more instances of [HLH%] in interactions between mothers and their young children or in conversations between young female friends.
In addition to the BPM tags in table 17.1, the X-JToBI conventions provide a number of diacritics that mark distinctions within categories. Several of these could well be phonologically distinct tonal sequences. They correspond to categories that Kawakami (1963) discusses (e.g., ‘floating rise’, ‘hooked rise’) that do not seem to have any kind of gradient relationship between the phonetic dimensions of contrast and a continuum of pragmatic interpretation. However, research on these distinctions is even less well-developed than research on the differences between the two types of [LH%]. Moreover, the functions that have been suggested for them are not directly relevant to the theme of this chapter. Therefore, the only X-JToBI diacritic that we mention here is for an Fo rise that Kawakami does not mention as a distinct BPM type.
The shape of this rise is convex, like that of the [H%] BPM, with the [L%] target that begins the rise aligned early in the syllable to which it associates, and it has been described by Ohishi (1959), Kori (1989b, 1997), and others as having a similar prominence-lending function. This rise contrasts with the [H%] BPM, however, in that it occurs on the penultimate syllable of the phrase rather than the last. In the X-JToBI conventions, this earlier rise is termed the PNLP (for penultimate non-lexical prominence). Because the Fo typically also falls immediately after the PNLP to a [L] target at the following phrase boundary, the X-JToBI convention is to mark the PNLP using the [HL%] rise-fall BPM label on the tones tier, with an additional tag (‘PNLP’) on the comments tier. A total of 1,162 cases of PNLP were tagged in CSJCore (62.7 percent in APS, 33.7 percent in SPS, 1.9 percent in dialogue, and 1.6 percent in read speech). In the example in figure 17.6c, the PNLP tagged rise-fall is indicated in bold.
(p. 477) 17.3 The Prosodic Marking of Prominence
Having provided an overview description of the Japanese intonation system, we can now turn to the various ways in which intonation patterns are manipulated in the marking of focal prominence in the language. Given that this is an area of research where the literature has been muddled considerably by drawing analogies to pretheoretical intuitions about English focus-marking mechanisms, it is useful to begin by describing one way in which focal prominence is not marked in Japanese.
17.3.1 Distinguishing Accent from Akusento
Much of the theoretical literature on focus marking in English, German, and Dutch invokes a structural property that is variously called accent, nuclear pitch accent, sentence stress, or nuclear stress. As this jumble of terms suggests, focus marking in these languages involves a complex interplay of prominence markers at several levels of the prosodic hierarchy. We cannot do justice here to the richness of these systems, however, and instead we discuss just one example from English that we hope expresses as clearly as possible what we mean when we say that standard Japanese has no analog to the notion “accent” when it is used as a synonym for “nuclear stress” in these Germanic languages. In particular, we must emphasize that the notion “accent” in that sense is neither a formal nor a functional analog to “accent” as a translation of akusento-kaku.
In section 17.2.2, we alluded to a common cross-language misparsing of the akusento-kaku in words such as Yaʼmano, oyoʼideru, and hashiʼtteru in figures 17.1a and 17.1b. The utterances in figure 17.7 illustrate how the English intonation system contributes to this misparsing. In figure 17.7a the first syllable of running is associated to the nuclear pitch accent in the intonation contour. By nuclear pitch accent, we mean a configuration of tones that marks the word (or some larger constituent) as the focus of the utterance and simultaneously invites the listener to make some inference about the pragmatic relationship between the focused constituent and the shared model of the discourse context. In figure 17.7a, this configuration is the fall in pitch defined by the transition from the [H⋆] pitch accent on the first syllable of running to the following [L−] phrase accent. In figure 17.7b, the second syllable of the (English) word Yamano is associated to the same kind of fall that makes running focally prominent in figure 17.7a.
The tone labels in figure 17.7 follow the MAE_ToBI tagging conventions described by Beckman, Hirschberg, and Shattuck-Hufnagel (2005). Other AM models of English offer different analyses of the falling pitch that is aligned earlier in figure 17.7b as compared with figure 17.7a. For example, Gussenhoven (2004) labels the fall as [H⋆+L], because he disputes the relevance of the notion “phrase accent” for English. Despite these differences across frameworks, however, there is (p. 478) a broad consensus about the focus-marking function of the fall and about its culminative role. That is, any complete, well-formed intonation contour of English has exactly one nuclear pitch accent per intonation phrase. In figure 17.7c, then, the sentence is divided into two intonation phrases so that both Yamano and running can be associated to a nuclear pitch accent. That is, the restructuring of the sentence into two intonation phrases in figure 17.7c preserves the culminative distribution of nuclear pitch accents, with the result that there is exactly one word in each IP that is marked as bearing the one obligatory pitch accent in the intonation contour for the IP (what Liberman and Prince (1977) termed the “designated terminal element” of the phrase).
A noteworthy property of the English prosodic system is that this culminative distribution of nuclear pitch accents relative to IP boundaries in connected speech mirrors the culminative distribution of potential association sites for the nuclear pitch accent in polysyllabic words in the lexicon. In the words Yamáno and rúnning in figure 17.7, as in rélay, dóing, átom, atómic, rétail, ráttle, legislátion, and many other polysyllabic words of English, this potential association site is the penultimate syllable. In átomize, ráttlesnake, and législature, it is the first syllable. The terms (p. 479) primary stress and lexical stress refer to this potential for association to the nuclear pitch accent. As Hyman (2005) and many others before him have pointed out, the obligatory designation of exactly one primary stress per word imparts a sense of wordhood. Thus, compound words such as ráttlesnake and láw degree can be distinguished from phrases such as ráttle it óff and láw and órder because of the Compound Stress Rule—a morphological process whereby all but one of the primary stresses in a compound word lose the potential to associate to accent. Also, function words such as the, of, and but (which associate to pitch accents only in unusual circumstances) are commonly understood to be prosodically cliticized onto a neighboring content word. This link between the culminative distribution of primary stresses in words in the lexicon and the culminative distribution of nuclear pitch accents in the intonation phrases of a discourse gives rise to the terms sentence stress and nuclear stress to designate the stressed syllable in the word that associates to the nuclear pitch accent in an IP.
Another noteworthy property of the English intonation system is that association to the nuclear pitch accent is often accompanied by an exaggeration of other phonetic markers of prominence that together constitute the formal properties that define the stress foot, a lower-level constituent in the English prosodic hierarchy. These include, for example, (a) the aspiration of the foot-initial /t/ in atomic and retail as opposed to the lenited pronunciations typical of the foot-medial /t/ in atom and rattler, and (b) the constrained distribution of reduced/elided vowels that yields the morphological alternation between a full long /ɑ/ in the foot-initial stressed syllable of atómic and the reduced short /ə/ or deleted vowel (making the [m] syllabic) in the corresponding unstressed syllable in átom. Thus, the aspiration of foot-initial voiceless stops can be exaggerated when the word is accented (e.g., Docherty 1992, among many others), whereas foot-medial /t/ can be weakened further and even deleted in unaccented contexts in running speech (Raymond, Dautricourt, and Hume 2006). Similarly, the morphological alternation between full versus reduced/deleted vowels in related forms such as atomic and atom is echoed by the optional prosodic alternation between disyllabic pronunciations of words such as rattler ([ræʔ.lər]) in fast, unaccented contexts, and trisyllabic pronunciations ([ræ.ɾə.lər] or even [ræ.tʌ.lər]) when the stressed syllables in these words are associated with accents in the intonation contour (as noted by Beckman 1996a, among many others). Such segmental prominence (“stress proper”) at the foot level is linked to the prosodic marking of focus in English because lexical stress constrains where pitch accents can and cannot go: any pitch accent in the intonation contour must associate to the head of a stress foot. This is the source of the unequivocally syllabic “strong” variants of words such as of, the, and but in contexts where they receive a pitch accent.
It is important to keep in mind that these links between syllable stress (in the lexicon) and pitch accent (in the intonation system) are particular to English and the handful of other languages that are like English. The prosodic organization of Japanese words and intonation phrases is quite different from that of English, and researchers should not be misled by the superficial phonetic resemblances that lead native speakers of Japanese to misparse the distribution of nuclear pitch accents in (p. 480) an utterance such as figure 17.7c in terms of the distribution of akusento-kaku in an utterance such as figure 17.1a, and vice versa.
The differences can be summarized most succinctly by contrasting the applicability of the notion “stress” in describing the two prosodic systems. In descriptions of English, this term is an indispensable shorthand way of capturing the many profound implicational dependencies among categorical markers of segmental prominence at the foot level and categorical markers of tonal prominence at the IP level. In Japanese, by contrast, there are no such fundamental links between segmental prominence and tonal prominence, and the term “stress” is at best a somewhat misleading translation of technical terms for pragmatic functions such as kyoochoo (better translated as ‘emphasis’ or ‘focal prominence’). For example, where vowel lenition in English is constrained to occur in weak unaccented syllables, vowel lenition readily occurs in accented syllables in Japanese even in very deliberate read speech. The deletion of the [i] in the accented second syllable of hasiʼtteru in figure 17.1b is by no means anomalous, as can be seen by the counts that Maekawa and Kikuchi (2005) report in all of the different genres in the CSJ. Also, where the typically unaccented (and lexically unstressed) pronunciation of function words in English is often described in terms of the notion “clitic”—that is, the prosodic grouping of function words onto adjacent content words in running speech—many function words in Japanese, such as maʼde ‘until’, yoʼri ‘from’, sura ‘even’, and noʼmi ‘only’, are lexically accented, and although these words are almost always realized as clitics (i.e., produced without the [L% H–] rise that would demarcate them from the preceding noun phrase), the accented syllable nonetheless is often produced with an associated Fo fall for the akusento-kaku, as noted by Sagisaka and Sato (1983) and Kubozono (1993), among others. (See Maekawa and Igarashi 2006 for counts in the CSJ.) 
Conversely, content words that are lexically unaccented (i.e., lexically marked as having no possible association to an akusento-kaku) do not surface with an associated pitch accent even under focus. This fact is illustrated by the utterance in figure 17.8, in which the focally prominent word mannaka ‘smack in the middle’ is lexically unaccented and surfaces without any accent, whereas the following verb okimaʼsu is lexically accented and (p. 481) surfaces with an accent but with none of the intonational hallmarks that make the preceding locative phrase the focus of the sentence. We turn to these hallmarks in the next section.
17.3.2 Phrasal Pitch Range and Focal Prominence
It has been widely noted in the literature that Japanese uses local pitch range expansion to mark focal prominence (see Kindaichi 1951; Kawakami 1957; Fujisaki and Hirose 1984; Fujisaki and Kawai 1988, 1994; Poser 1984; Pierrehumbert and Beckman 1988; Kori 1989a, 1989b, 1997; Takeda and Ichikawa 1990; Maekawa 1997; Nagao and Amano 2000; Ito 2002a, among others). Figures 17.3a and 17.8 illustrate this. In figure 17.3a, the Fo contour for an utterance of Yaʼmano-wa oyoʼideru is overlaid on the Fo contour of an utterance of Yaʼmano-ga oyoʼideru. The rendition of the Yaʼmano-wa … sentence (black line) invokes a garden-variety “thematic” interpretation of wa (see Kuno 1973; Heycock, this volume), which makes the following verb the focal constituent in the utterance. By contrast, the rendition of the Yaʼmano-ga … sentence (gray line) is an “out-of-the-blue” broad-focus utterance. In keeping with these contrasting focus patterns, the pitch range is expanded on oyoʼideru in the Yaʼmano-wa … sentence relative to that on the same word in the Yaʼmano-ga … sentence. Conversely, the pitch range on Yaʼmano-ga … is expanded relative to that on Yaʼmano-wa …, marking the subject as part of the focus constituent, which is the whole sentence in the broad-focus case. Because both Yaʼmano and oyoʼideru are lexically accented, this pitch range expansion is most readily observed around the [H] tone of the [H⋆+L] akusento-kaku, which shows both a more extreme rise from the preceding [%L] or [L%] boundary tone and a more extreme fall to the [L] of the akusento-kaku. The utterance in figure 17.8, by contrast, shows that a word need not be accented to be a target of pitch range expansion.
This utterance is from a spontaneous monologue narrative that Venditti and Swerts (1996) elicit using a puzzle task in which the speaker is asked to describe the moves chosen to build a house from colored pieces of paper. In the just-completed move, the speaker places a triangular piece for the roof (saʼnkaku-no yaʼne), and in the utterance just before figure 17.8 in the monologue, she states that she will next place a pink rectangular piece for a window. In figure 17.8, then, she describes where exactly on the roof she will put the window piece. She produces prosodic prominence on the locative phrase mannaka-ni ‘smack in the middle’, which is the focal constituent. Note that this focused phrase is unaccented. That is, it does not contain the sharp fall in Fo that is present in APs containing words that are lexically specified as accented. Nonetheless, it is clearly marked for focal prominence by the expanded local pitch range. As noted above, this is a key difference from prominence marking in English, where a pitch accent is inserted on a rhythmically strong syllable postlexically to cue the pragmatic function of focal prominence. English speakers also employ pitch range expansion to mark focus, but it is localized to the tone targets of the nuclear pitch accent. Japanese is different in that accents cannot (p. 482) be inserted in phrases that do not contain lexically accented words. Therefore, when unaccented phrases such as mannaka-ni in figure 17.8 receive focal prominence, the local pitch range expansion must target tones other than the [H⋆+L] of the akusento-kaku. Specifically, the pitch range expansion is most easily observed on the [%L H−] boundary-marking sequence at the beginning of the focused unaccented AP.
This kind of dramatic rise at the beginning of the focus constituent is often termed reset or Fo reset. In both unaccented and accented phrases, the percept of reset can be enhanced in a gradient way by a contrast effect, as Maekawa (1997) notes in an experimental analysis of Fo heights of all phrases in an utterance containing a focused constituent. He shows that the pitch range of the prefocal phrase varies in inverse relationship to the degree of focal emphasis; more extreme pitch range expansion on the focused phrase is accompanied by more extreme pitch range compression on the immediately preceding phrase. Ito (2002a) similarly observes that the Fo values of prefocal peaks are lower than those in a baseline (broad focus) condition.
One topic of lively discussion in the early AM literature on Japanese is the exact extent of the local pitch range expansion after reset. This debate centers on the notion of the “domain” of pitch range specification and the “restructuring” of the prosodic constituent hierarchy that would license a local pitch range expansion such as that observed on mannaka-ni in figure 17.8. In Fujisaki's synthesis system (Fujisaki and Sudo 1971, Fujisaki and Hirose 1984), as in other models from that era when the paradigmatic specification of continuous variation in pitch range was first incorporated directly into the phonological formalism, there are two ways in which the extreme rise in Fo at the beginning of mannaka in this utterance can be formalized. First, it can be generated by specifying an unexpectedly large amplitude for the accent command on mannaka-ni.2 (In the Pierrehumbert and Beckman (1988) synthesis model, this corresponds to the specification of an AP prominence ratio value that is much higher than 1.0 for the mannaka-ni phrase.) Second, it can be generated by inserting an unexpected new phrase command with an especially large amplitude around the beginning of mannaka-ni. (In the Pierrehumbert and Beckman model, this corresponds to an unexpected IP boundary after yaʼne-no, with an unusually high topline value for the new IP that begins at mannaka-ni.)
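The difference between these two formalizations can be made concrete with a minimal numeric sketch of the Fujisaki command-response model. All parameter values below (base frequency, time constants, command times and amplitudes) are illustrative assumptions, not values fitted to any real utterance; the sketch shows only that both a boosted accent command and an inserted phrase command raise the local Fo in the focused region.

```python
import math

# Hedged sketch of the Fujisaki command-response model (Fujisaki & Sudo 1971):
# ln F0(t) = ln Fb + sum of phrase-command responses + sum of accent-command
# responses. All numbers here are invented for illustration.

ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9  # assumed time constants and accent ceiling

def Gp(t):
    """Phrase control: impulse response of a critically damped 2nd-order filter."""
    return (ALPHA ** 2) * t * math.exp(-ALPHA * t) if t >= 0 else 0.0

def Ga(t):
    """Accent control: step response of a similar filter, clipped at GAMMA."""
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), GAMMA) if t >= 0 else 0.0

def f0(t, phrase_cmds, accent_cmds, fb=80.0):
    """phrase_cmds: (amplitude, onset) pairs; accent_cmds: (amplitude, onset, offset)."""
    ln_f0 = math.log(fb)
    ln_f0 += sum(ap * Gp(t - t0) for ap, t0 in phrase_cmds)
    ln_f0 += sum(aa * (Ga(t - t1) - Ga(t - t2)) for aa, t1, t2 in accent_cmds)
    return math.exp(ln_f0)

# Baseline: one phrase command at t=0 and two modest accent commands.
base = f0(1.1, [(0.5, 0.0)], [(0.3, 0.2, 0.5), (0.3, 0.9, 1.2)])

# Account 1: an unexpectedly large amplitude for the second accent command.
boost = f0(1.1, [(0.5, 0.0)], [(0.3, 0.2, 0.5), (0.6, 0.9, 1.2)])

# Account 2: an unexpected new phrase command inserted just before the focus.
reset = f0(1.1, [(0.5, 0.0), (0.6, 0.85)], [(0.3, 0.2, 0.5), (0.3, 0.9, 1.2)])

assert boost > base and reset > base  # both mechanisms raise the local F0
```

Both mechanisms thus generate an extreme local rise; they differ in their side effects elsewhere in the contour, for example in whether downstep applies across the focus boundary.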
To try to decide between these two alternative accounts (i.e., a local AP prominence or an IP break), Pierrehumbert and Beckman (1988) analyze utterances containing adjective-noun sequences such as umaʼi mameʼ-wa ‘tasty beans-top’ that differ minimally by which word receives a corrective narrow focus, comparing these utterances with baseline utterances with broad focus on the noun phrase as a whole. In one analysis, they have a fluent speaker of Japanese listen to each utterance and tag the perceived degree of disjuncture between the two words. In this test, focused nouns are usually perceived as being set off from the preceding adjective by an IP boundary. Sometimes this percept could be attributed to a visible pause or glottalization. Another analysis exploits the fact that application of downstep (p. 483) is blocked across an IP boundary: Pierrehumbert and Beckman use materials that systematically vary the adjective between accented umaʼi ‘tasty’ and unaccented amai ‘sweet’, and they observe whether downstep has applied by comparing the peak Fo in the following focused noun in the accented context versus the unaccented context. They find that the Fo height of the focused noun is not significantly lower when following an accented word. They interpret this nonapplication of downstep as further evidence indicating that an IP boundary is inserted immediately preceding a focused phrase. In other words, they conclude that focus marking involves the second of the two mechanisms described above—that is, the insertion of an unexpected high-amplitude phrase command in the Fujisaki model, or insertion of an IP boundary in the Pierrehumbert and Beckman model.
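The logic of the downstep diagnostic can be sketched numerically. The only modeling assumption taken from the Pierrehumbert and Beckman account is that downstep compresses the pitch range multiplicatively within an IP and is blocked across an IP boundary; the specific Hz values and the compression factor below are invented for illustration.

```python
# Hedged sketch of the downstep diagnostic for adjective-noun sequences.
REF = 120.0      # assumed bottom of the speaker's range, Hz
TOPLINE = 260.0  # assumed IP topline, Hz
D = 0.7          # assumed downstep factor

def noun_peak(adjective_accented, ip_boundary_before_noun):
    """Predicted F0 peak of the focused noun's [H] target."""
    pitch_range = TOPLINE - REF
    if adjective_accented and not ip_boundary_before_noun:
        pitch_range *= D  # downstep applies only within the IP
    return REF + pitch_range

# No restructuring: the noun's peak is lower after an accented adjective.
assert noun_peak(True, False) < noun_peak(False, False)

# IP boundary inserted at the focus: downstep is blocked, so the noun's peak
# does not depend on the adjective's accentedness (the pattern P&B report).
assert noun_peak(True, True) == noun_peak(False, True)
```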
Kubozono (1993, 2007), by contrast, argues for the first of the two mechanisms, which corresponds to the unexpectedly large accent command in the Fujisaki model (or a local AP prominence in the Pierrehumbert and Beckman model). He shows summary statistics suggesting that downstep is not always blocked and thus concludes that late narrow focus within an adjective-noun sequence (or any other syntactic constituent that normally is produced as one IP) does not involve a restructuring of the prosodic hierarchy. Rather, the focused AP is produced with a “metrical boost” realized on the [H] tone targets. We return to this point in section 17.4.2 after describing other aspects of focus marking that also have been described in terms of restructuring, albeit at a lower level of the prosodic hierarchy.
17.3.3 Prosodic Subordination after Focal Prominence
Another older observation reconfirmed by Pierrehumbert and Beckman's study is that focal prominence has measurable effects on following material as well as on the focused constituent itself. That is, putting focal prominence on a word in an utterance induces some kind of prosodic subordination of all following words in the intonation phrase. For example, in the utterance in figure 17.8, the Fo on the verb okimaʼsu is much lower than the [H−] target in the preceding mannaka. In figure 17.3b, similarly, the accent peak on the verb oyogeʼru is very much lower than the accent peak on the preceding Yaʼmano-wa, in keeping with the “contrastive” interpretation of the wa in this utterance. That is, the degree of pitch range reduction on the verb in this utterance is more than can be accounted for simply by downstep, as can be gauged by comparing the peak Fo on oyogeʼru in figure 17.3b with that on oyoʼideru in the broad-focus utterance of Yaʼmano-ga oyoʼideru in figure 17.3a (gray line).
This postfocal prosodic subordination is realized in various ways that can be modeled in terms of more or less elaborate manipulations of the prosodic structure. In the simplest case, the only feature that is affected is the pitch range; there is an extreme pitch range reduction on all words in the postfocal region, but all tone targets are realized in the Fo contour. An example of such a realization is the small but observable rise-fall of the [L% H⋆+L L%] on oyogeʼru in figure 17.3b. In these (p. 484) cases, the demarcative rise from the AP-level [L%] to a following [H] target (the phrasal [H−] or the [H⋆] of an early accent) is preserved on the postfocal constituent, although the manifestation of this rise can be very subtle because of the substantially reduced range.
Quite often, however, the AP-initial rise is not realized after a focused word. Pierrehumbert and Beckman (1988) describe this latter type of postfocal prosodic subordination as the process of dephrasing, which they define as a total deletion of the [L% H−] demarcative AP-initial rise and (in cases where this deletion would yield two akusento-kaku within the resulting merged AP) deletion of all but the first [H⋆+L] accent fall. That is, they posit a restructuring under focus that not only inserts a new IP boundary to set off the beginning of the focused constituent but also erases any following AP boundary within the IP. They relate this restructuring to the morphological processes that determine the accent patterns of compound words.
Pierrehumbert and Beckman illustrate dephrasing with figures of utterances that place narrow focus on the adjective in their target adjective-noun sequences. The figures show an extreme Fo rise marking the beginning of the adjective but no Fo rise at the beginning of the following noun. The lexical accents of the component words interact with dephrasing to yield three distinct contours. When the adjective is accented, the purported dephrasing is realized as a very steep fall at the akusento-kaku, followed by a gradual fall located in the lower part of the speaker's pitch range. This combination of steep fall followed by gradual fall resembles the Fo contour on the chiʼizu-o taʼbeta portion of figure 17.2a. Notice there is no Fo rise onto taʼbeta in this case. When the adjective is unaccented, in contrast, dephrasing results in a more or less gradual decline in the upper part of the pitch range. In an unaccented adjective + accented noun sequence (e.g., amai mameʼ in Pierrehumbert and Beckman's materials3), such dephrasing resembles the gradually declining interpolation between the phrase-initial [H−] and the [H⋆+L] on the verb shown in figure 17.2b. In the unaccented adjective + unaccented noun sequence (e.g., amai ame), the shape resembles the interpolation between the initial [H−] and the utterance-final [L%] shown in figure 17.2c. Pierrehumbert and Beckman interpret all three of these patterns as indicating a prosodic restructuring that groups the postfocal noun into the same AP as the focused adjective. They identify this dephrasing with the morphological process of prosodic cliticization that distinguishes auxiliary verbs such as iru and miʼru from their full clause counterparts, making the single AP yoʼnde-miʼru ‘try reading’ distinct from the two AP sequence yoʼnde miʼru ‘read and then see’ (as described in Poser 1984, for example).
Later work by Kori (1997) gives further support for this account and also clarifies the exact nature of the prosodic restructuring. Specifically, Kori's examples show that postfocal prosodic subordination effectively deletes the [L% H−] sequence that normally marks the boundary between two AP, but it need not delete the tone targets of the subsequent akusento-kaku when this dephrasing occurs within a sequence of two or more accented words. Kori describes cases of prosodic subordination after a focused accented word where postfocal accents are in fact maintained, noting they are prosodically weakened due to the fact that they (a) are (p. 485) realized in a reduced pitch range and (b) are lacking the phrase-initial rise. This is shown by the “shoulder” of the postfocal accent on ikimaʼshita ‘went’ in figure 17.9a. In contrast, Kori notes that prosodic subordination after unaccented focused words is realized differently, although it can be modeled in terms of the very same process of dephrasing at the AP level. That is, after an unaccented focused word, the deletion of the AP-initial [L% H−] rise before any postfocal word results in the Fo remaining relatively high in the speaker's range, as illustrated in figure 17.9b. As Kori points out, despite the striking superficial difference between low Fo after an accented focused word and higher Fo after an unaccented focused word, there is a common underlying prosodic restructuring. As he puts it, “At first glance, the behavior under focus seems to differ depending on the accentuation, but actually you can think of both of these realizations as ‘weakening the autonomy of postfocal words’” (Kori 1997:178). That is, prosodic subordination can be accomplished by dephrasing (deletion of the demarcative [L% H−] rise) in both accentual contexts.
To understand the significance of examples such as figure 17.9a, where a focused accented word is followed by another accented word within the same AP, recall that Pierrehumbert and Beckman (1988) follow Fujisaki and Kawai (1988) (p. 486) and others in assuming that dephrasing deletes the tones of the second akusento-kaku as well as the demarcative tones at the AP boundary itself. Under this account, the culminative distribution of accents in compound words (as described by McCawley 1968 and Kubozono 1993, among many others) is mirrored in the culminative distribution of accents in the postlexical process of dephrasing for focus marking. Of course, in Fujisaki's model this linkage between dephrasing and accent deletion is unavoidable, given that the square wave of the accent command generates both the phrase-initial rise and the later fall at the accented syllable. In the AM framework, by contrast, there is no necessary coupling between the demarcative rise and the accent fall. This means that dephrasing can delete the [L% H−] sequence that marks the boundary between two accented words without deleting [H⋆+L] in the second word. Thus, as noted previously, function words such as yoʼri and suʼra can be cliticized onto the preceding content word and still surface with the lexically specified fall at the accent.4 A related process of subordination of following accents seems to be involved in the contrast that Poser (1984) notes between yoʼnde-miʼru ‘try reading’ (where the auxiliary verb is cliticized onto the main verb) and yoʼnde miʼru ‘read and see’ (where the two verbs form separate APs). Because he assumes a full specification account, Poser describes this contrast in terms of a prosodic grouping that necessarily involves a complete deletion of all [H] tone targets in the auxiliary verb miʼru, making the auxiliary reading lose its accent to be yoʼnde-miru.
However, Maekawa's (1994) statistical analysis of Fo slopes in postfocal yoʼnde-miʼru versus yoʼnde-iru ‘is reading’ shows that this account is not tenable. That is, the contrast between accented miʼru and unaccented iru is preserved even for the auxiliary verb readings of these two verbs. And, as figure 17.9a shows, narrow focus on the first word in a sequence of accented words can be realized by a prosodic subordination of postfocal material that drastically lowers the [H⋆] target without completely eliminating the accent fall in the second word.
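The contrast between the original dephrasing account and this revised one can be summarized schematically. The flat tone-label lists below (with ASCII H*+L and H- standing in for [H⋆+L] and [H−]) are our own illustrative representation, not a formalism used by any of the authors cited.

```python
# Two hedged formalizations of postfocal dephrasing over tone-label lists.

def dephrase_pb(focused_ap, postfocal_ap):
    """Pierrehumbert & Beckman (1988): delete the demarcative L% H- rise AND
    all but the first accent fall in the merged AP (culminativity enforced)."""
    merged = focused_ap + [t for t in postfocal_ap if t not in ("L%", "H-")]
    out, seen_accent = [], False
    for tone in merged:
        if tone == "H*+L":
            if seen_accent:
                continue  # drop every accent fall after the first
            seen_accent = True
        out.append(tone)
    return out

def dephrase_revised(focused_ap, postfocal_ap):
    """Revised account (cf. Maekawa 1994; Kori 1997): delete only the boundary
    rise; later akusento-kaku survive, realized in a reduced pitch range."""
    return focused_ap + [t for t in postfocal_ap if t not in ("L%", "H-")]

focused = ["%L", "H*+L"]          # focused accented word, e.g. an accented verb
postfocal = ["L%", "H-", "H*+L"]  # following accented word

assert dephrase_pb(focused, postfocal).count("H*+L") == 1       # one accent survives
assert dephrase_revised(focused, postfocal).count("H*+L") == 2  # both accents survive
```

On the revised account, the merged AP in figure 17.9a contains two akusento-kaku, the second realized only as the reduced “shoulder” in a compressed pitch range.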
Note that by this account (in contrast to Pierrehumbert and Beckman's original formulation), dephrasing after the focus constituent is a “postlexical” intonational process that is qualitatively different from the morphological processes that determine the distribution of accents in compound words. As described at length by McCawley (1968), Poser (1984), and Kubozono (1993), among many others, when two accented words are conjoined in compound word formation, the output is a word with at most one accent (e.g., shaʼkai ‘society’ + seʼido ‘system’ → shakaiseʼido ‘social system’).5 By contrast, by the account of postfocal prosodic subordination presented here, when two or more accented words are conjoined by the deletion of the boundary-marking rise, the output is a single AP that can contain two or even more akusento-kaku. Note, too, that although his account is somewhat different from the one presented here, Kubozono also differentiates the morphological processes that enforce a culminative distribution of accents in compound word formation from the postlexical grouping of words into phrases to reflect the syntactic organization of sentences in a discourse. He reserves the term accentual phrase for the output of the compound accentuation rules and terms the postlexical prosodic grouping the minor phrase. Although Kubozono does not discuss postfocus (p. 487) subordination in these terms, we can say that his “accentual phrase formation” is roughly analogous to the English Compound Stress Rule in the same way that his “minor phrase formation” is roughly analogous to the application of the Nuclear Stress Rule to larger constituents such as noun phrases in the older literature on the syntax-prosody mapping in English.
This recalls the analogy that Pierrehumbert and Beckman (1988) make between postfocal dephrasing in Japanese and deaccenting in English—that is, the constraint in the English intonation system against the occurrence of any further pitch accents within an IP after a word that bears the nuclear pitch accent. These analogies serve to highlight both the profound formal differences and the deeper functional similarities between the two intonation systems. Where Japanese uses prosodic grouping to mark prosodic subordination, English uses the culminative distribution of stress markers such as association to a nuclear pitch accent. Nonetheless, in both languages, there is a formal relationship between the mechanisms that mark morphological subordination in compound word formation and higher-level subordination of postfocal material within the IP. We return to this point in section 17.4.3.
17.3.4 Prominence-lending Boundary Pitch Movements
In the two previous sections we described how speakers can use prosody to cue focal prominence in Japanese (a) by inserting an IP boundary and manipulating the pitch range before and after that boundary to make for a dramatic reset at the beginning of the focus constituent, and (b) by processes of prosodic subordination after the focus constituent that include the deletion of the rise at the beginning of following APs as well as an extreme pitch range compression that can reduce subsequent accent peaks to the same level as the [L] target of the first [H⋆+L] when the focus constituent itself is lexically accented. We now turn to a different kind of restructuring to cue focus whereby the speaker sets off the focus constituent from adjacent material on both sides. That is, in addition to the reset that marks the beginning of the focused word, there is some salient marking of the end of the focus constituent to separate it prosodically also from following material. One way that speakers can mark the end of the focus constituent is by inserting a silent pause. When this pause is inserted after an unaccented word, it simply interrupts the interpolation from the phrasal [H−] to the next tone target in an attention-getting way. In figure 17.8, for example, the downward-sloping interpolation from the [H−] in mannaka-ni toward the lowered [H⋆+L] in okimaʼsu suddenly stops for 200 ms before resuming again at the level that it would have reached if the two words had been separated instead by a filled pause. Another way of marking the end of the focal constituent to set it off from following material is by producing a BPM. As noted earlier, two of the four BPMs that are tagged as tonally distinct categories in the X-JToBI scheme—namely, [H%] and [HL%]—have been linked to a prominence-lending function. Both of these are illustrated in figure 17.10, and we describe each of them in turn. (p. 488)
17.3.4.1 The prominence-lending rise
The [H%] prominence-lending rise (PLR) is the BPM most commonly associated with a prominence-lending function. For example, focal prominence can be marked on the phrase Naʼoya-mo ‘Naoya-also’ in the sentence Kinoo Naʼoya-mo oyoʼida ‘Naoya also swam yesterday’ by inserting an intonation phrase break before Naʼoya accompanied by a large pitch reset, as described above and shown in figure 17.10a, or by putting a PLR at the end of the focused phrase, as shown in figures 17.10b and 17.10c. This second mechanism is noted by Ohishi (1959), Kawakami (1963), Miyaji (1963), Kori (1989b, 1997), Muranaka and Hara (1994), and Venditti (1997), among others, and more recently has been investigated experimentally by Venditti, Maeda, and van Santen (1998), Taniguchi and Maruyama (2001), Maruyama and Taniguchi (2002), Mesbur (2005), and Oshima (2007).
Figure 17.10a shows the use of expanded pitch range to cue focal prominence on Naʼoya-mo ‘Naoya-also’. Note the prosodic subordination of the following verb (p. 489) oyoʼida ‘swam’, similar to figure 17.9a. Figures 17.10b–e illustrate the use of the various BPMs to cue this same function: (b) shows [H%] marking Naʼoya-mo, with no pause following; (c) shows [H%] with a pause following; and (d) gives an example of the penultimate non-lexical prominence (PNLP), which we discuss later. These three examples show prosodic subordination of the verb after the BPM, similar to figure 17.10a. The final contour, figure 17.10e, gives an example of the [HL%] ‘explanatory’ BPM, which was also illustrated in figure 17.5c.
Given that the PLR often occurs on phrases ending with a particle or postposition, many accounts of its form and function use terms such as “prominent particles,” which might be misinterpreted as suggesting that the particle itself is the focus constituent. The [H%] does make the final syllable in a phrase stand out in the intonation contour, and its occurrence on a phrase-final particle can mark narrow focus on the particle itself, as in Ohishi's (1959) example of metalinguistic contrast in (3a).6 More typically, however, the focus constituent is the entire phrase ending with the particle that is aligned with the PLR, as in Ohishi's example in (3b). This fact has been pointed out by a number of scholars, including Ohishi himself, who says: “When prominence is placed on a particle, the thing that is focused is not just the particle, but it is the word that that particle is affiliated or packaged with” (Ohishi 1959:95). Maruyama and Taniguchi (2002:18) also highlight this point: “Prominence on particles prosodically marks the right edge of the phrase which includes the focused element, and has the function of prosodically indicating the focus structure of an utterance which may not be represented by its syntax.”
A further point to make is that whereas [H%] can occur on a particle, it need not do so, as is shown by the examples from various authors in (3d–f) and by the contour in figure 17.6a. Indeed, it appears that the only constraint on prominence-lending BPM placement (besides the need to be at or near the constituent edge) is that a prominence-lending BPM cannot occur sentence finally.7 In this position, focal prominence can only be cued by reset—that is, by having a phrase boundary before the focus constituent where there can be a pronounced pitch-range expansion relative to prior material.
This constraint against marking focus on utterance-final phrases with PLR complements the constraints on marking focus by the mechanisms identified in the previous section. That is, on utterance-initial words, it is difficult to mark focus by inserting an IP boundary and expanding the pitch range, because the beginning of an utterance must be the beginning of a new IP in any case, and because there is no preceding material to produce in a relatively reduced pitch range to contrastively enhance the phrase-initial rise.8 This functional motivation for the constraint is reflected in Ohishi's (1959) discussion of two of his examples, given here as (3b,c). Specifically, he suggests that although local pitch range expansion can indicate narrow focus on aʼni ‘older brother’ in (3c), it is impossible to use pitch-range expansion to focus the entire noun phrase watashi-no aʼni ‘my older brother’. However, the speaker can still mark that phrase as the focus constituent by producing a [H%] PLR on the final syllable, as in (3b).
(p. 490) Another related point is that some speakers find it difficult to cue focus by expanding the range of an unaccented phrase, without setting off the focus constituent somehow from following material. That is, it is generally acknowledged among Japanese researchers (with suggestive instrumental support in an experimental study by Ito [2002a]) that it is difficult to expand the range of an unaccented phrase to mark focus. However, other experimental studies by Kori (1989a) and Pierrehumbert and Beckman (1988) have observed pitch range expansion on unaccented APs, and figure 17.8 shows that it can occur in spontaneous speech, too. The feeling of awkwardness probably originates in the lesser salience of the AP-initial boundary rise when there is no downstep to reduce the Fo values of the [L%] and any following [H⋆+L]. In such cases, the use of the [H%] PLR is a likely alternative to inserting a pause in the middle of the AP, as in figure 17.8.
(3) a. A: Gakkoo-no↑ tooroʼnkai? (Ohishi 1959)
‘Did you say it's a debate about the school?’
B: Iya, gakkoo-de↑ yatta n deʼsu.
‘No, it's that they held it at the school.’
b. Watashi-no aʼni-wa↑ otonashiʼi.
‘My older brother is shy.’
c. Watashi-no aʼni-wa otonashiʼi. [with pitch range reset on aʼni-wa]
‘My older brother is shy.’
d. San-senchi-guʼrai akete↑ shita-ni okimaʼsu. (Venditti 1997)
‘I will open up about 3 cm and put it below there.’
e. [Context: ‘Is it the Mr. Yamada who lives in Miyajima?’]
Iie. Aoʼmori-ni sunde-iru↑ Yamada-san deʼsu. (Maruyama and Taniguchi 2002)
‘No. (It's) the Mr. Yamada who lives in Aomori.’
f. Kaʼzuya-wa biʼiru-o saʼnbon↑ chuumon shimaʼshita. (Oshima 2007)
‘Kazuya ordered three bottles of beer.’
Using PLR to mark focal prominence is optional, just as pitch range expansion on the focal constituent is optional. In fact, Kori (1997) states that the only obligatory prosodic marker of prominence is the postfocal prosodic subordination discussed in section 17.3.3. In a naturalness-rating perception study, Maruyama and Taniguchi (2002) find that [H%] does indeed cue focus but that [H%] alone is not sufficient when the postfocal region is not prosodically subordinated as well. That is, whereas their intonation contour with PLR followed by a compressed pitch range (their pattern B) is judged by listeners as a highly natural way of focusing that phrase, the contour with PLR and no compression of the following pitch range (their pattern D) is judged as unnatural. They conclude, as Kori does, that postfocal prosodic subordination is crucial for cueing focus naturally.9
That said, postfocal pitch-range compression does not occur in all cases where apparently focal material is set off from following phrases by the insertion of [H%]. The utterance in figure 17.6a illustrates this point, representing a very deliberate (p. 491) speaking style that is common in monologue narratives such as the academic talks and prepared presentations in the CSJ; the speaker produces a string of phrases each deliberately set off (and given a kind of focal prominence) by [H%], with no obvious postfocal subordination. This use of [H%] to set off each phrase as a separate focal constituent is reminiscent of the short intonation phrases and frequent use of [L+H⋆] nuclear pitch accents in American radio and television news broadcast style, where prosody is used to break up the densely information-packed paragraphs typical of news stories. That is, examples such as figure 17.6a may point to another function of [H%] in discourse—a use that shares the foregrounding function of PLR in (3) but without the concomitant backgrounding of all material outside of the focus constituent. In the information-packed paragraphs of this kind of narrative monologue, the [H%] helps speakers and listeners chunk information in the speech stream. This “Here, parse this!” function is alluded to in Muranaka and Hara's (1994:396) descriptions of the PLR on particles as “making clear the syntactic structure” or “uttered with higher Fo when the preceding noun phrase is long, [indicating] the end of the phrase.”
17.3.4.2 The rise-fall BPM and PNLP
The [HL%] BPM is also said to have a prominence-lending function, and some of the remarks just made about when and why a speaker chooses to mark focus with PLR also apply to the rise-fall. In the discourse in (4), for example, speaker B uses [HL%] to put prominence on anoʼhito ‘that person’. At the same time, however, [HL%] also expresses the speaker's irritation at A's misunderstanding. In their perception study using ratings on various semantic scales, Venditti, Maeda, and van Santen (1998) find that listeners perceive [HL%] as highly ‘emphatic’ (kyoochoo shite iru) and ‘explanatory’ (joohoo-o setsumei shite iru)—that is, the rise-fall BPM imparts a very strong sense of the speaker forcefully explaining some point to the listener. In fact, [HL%] is rated higher on these two scales than [H%] is. However, in this study listeners do not rate [HL%] as indicating the speaker's irritation, as it seems to do in the context in (4).
(4) A: (talking to B) Anaʼta-ga Yamada-san deʼsu ka?
‘Are you Mr. Yamada?’
B: Iie, watashi-wa Yoshida deʼsu.
‘No, I'm Yoshida.’
C: Soshite watashi-ga↑ Yamada deʼsu. (H%)
‘And I am Yamada.’
A: (talking to B) Wakarimaʼshita. Anaʼta-ga Yamada-san deʼsu.
‘I see. You are Mr. Yamada.’
B: Iie, anoʼhito-ga^ Yamada-san naʼn deʼsu. (HL%)
‘No, that person is Mr. Yamada.’
The chunking function described for [H%] and illustrated in figure 17.6a is also characteristic of the rise-fall BPM; in many utterances in the CSJ presentation speech narratives, [HL%] sets off nearly every phrase in an especially deliberate way. This use of the rise-fall BPM seems especially common in dialogues among (p. 492) young speakers, particularly young girls. In fact, some speakers use [HL%] so frequently that it seems to lose any local focalizing function and becomes more of a marker of the speech style and of the speaker's social identity in choosing the style. That is, the rise-fall BPM in these speakers and this style may be like the frequent use of the English high-rise contour by the younger speakers of American English that McLemore (1991) describes, where the frequency of use effectively weakens the meaning of this contour that is described by Hirschberg and Ward (1995), among others, and makes it instead a marker of solidarity (with the speaker's fellow sorority members, in the case of McLemore's study).
As noted earlier, X-JToBI uses the [HL%] sequence of the rise-fall BPM to mark another distinct prominence-lending shape—namely, the penultimate non-lexical prominence, or PNLP, illustrated in figure 17.6c. In that utterance, the rise is localized on the penultimate syllable of the phrase soo itta yoʼo-na taʼipu no rensoogo-o ‘the aforementioned type of semantic associate’ (indicated here by underlining). The PNLP puts narrow focus on the newly introduced term rensoogo ‘semantic associate’, which is being defined in this appositional use of no.
We describe the PNLP together with the rise-fall BPM in this subsection because it is tagged in the CSJCore using the [HL%] tone label (in conjunction with the PNLP comment label). However, this choice of the rise-fall BPM label in the tagging conventions should not be interpreted as indicating a commitment to the implied phonological analysis on the part of any of the researchers who developed the X-JToBI tagging conventions. Rather, the [HL%] tag was chosen as a quick-and-dirty first approximation to the superficially similar rise-fall shape that PNLP typically shows. For example, in figure 17.6c, the Fo falls over the duration of the accusative case particle o onto the first syllable of the following phrase, the verb kotaeʼru ‘respond’. However, there is an alternative account available for this fall. Given that the beginning of the phrase immediately following a medial PLR [H%] (like the beginning of a phrase after a silent pause) is marked by an initial [%L] boundary tone, the Fo fall after a PNLP could be the transition to this independently motivated [%L] target. The PNLP rise, then, could be a leftward-displaced instance of the PLR [H%].
This alternative account is appealing to us, because the PNLP is often cited in the literature (e.g., Ohishi 1959; Kori 1989b, 1997) as a variant of the “prominent particle” phenomenon we attribute to the PLR [H%]. Moreover, our impression from the instances that we have seen is that the PNLP is more similar to [H%] than it is to [HL%] in both form and function. Compare, for example, the slope of the [L% H%] rise in figure 17.10b with that of the PNLP in figure 17.10d. Additionally, the typical lengthening of the syllable that is aligned with the [HL%] (as seen in figure 17.10e) is not present in cases of either PNLP or [H%]. However, differentiating between these two (and several other more controversial) alternative accounts is a research question in its own right. Moreover, it is a question that requires more extensive corpus work than would have been possible before the development of resources such as the CSJ. We return to this point again briefly in section 17.4.4.
(p. 493) 17.4 Ambiguities of the Prosodic Parse
At various points in our description of focus-marking mechanisms in section 17.3 we alluded to points of disagreement or potential disagreement among linguists currently working on Japanese intonational phonology. Many of these are points where the phonetic evidence that might differentiate between competing prosodic parses is inherently lacking or ambiguous. In this section, we briefly describe four of these points.
17.4.1 Postfocal Dephrasing
In section 17.3.3, we described the prosodic subordination of postfocal material in terms of the notion “dephrasing,” which we defined as the deletion (or non-realization) of the [L% H−] sequence that marks the beginning of every well-formed accentual phrase. By this definition, the junctural cues are definitive. When no [L% H−] sequence is realized between two accents and there are no segmental cues to support an AP boundary in the absence of this tonal cue, dephrasing has occurred. When two accented words are phrased together into a single AP, both accent falls can be realized, although the second is typically reduced in prominence, so that its [H⋆] target is at the same Fo level as, or somewhat lower than, the [L] target of the first accent, as in taʼbeta in figure 17.2a and ikimaʼshita in figure 17.9a.
Note that by this account, for sequences of accented words, the line between dephrasing and extreme pitch-range compression is a fuzzy one, particularly when there are only one or two syllables separating the two accents. The Fo contour over the sequence of words yoʼo-na taʼipu in figure 17.6c illustrates this source of ambiguity in the prosodic parse. The inflection point at the beginning of the small rise to the accent peak on taʼipu could be parsed as the [L] of the accent on yoʼo-na or as an [L%] boundary tone at the word boundary. The X-JToBI transcriber must pay close attention to subtle cues such as the length of the word-final vowel in yoʼo-na to distinguish between these two parses. This means that there are many different degrees of clarity to the phrasal disjuncture between two accented APs. Moreover, this continuum of degrees of perceived disjuncture is in a gradient relationship to the many different degrees of subordination that are also possible. The postfocal dephrasing evident in figure 17.9a, then, is toward one end of a continuum that includes (at the other end) the much less compressed pitch range of the following verb phrases relative to the Yaʼmano-wa and Kaʼshino-wa phrases in figure 17.3b.
By our account, in other words, sequences of two accented words are vulnerable to the kind of prosodic parsing ambiguity Beckman (1996b) describes, because the Fo pattern in the middle of a sequence of two accents within a single AP is phonetically very similar to the rise that marks the boundary between two APs. By contrast, sequences of two unaccented words are not phonetically (p. 494) susceptible to this type of misparsing of the intended prosodic structure, because a rise at the boundary between two unaccented words can only be parsed as [L% H−]. Of course, sequences of unaccented words are vulnerable to the opposite misparsing, because the [L%] is not subject to downstep and because (other things being equal) the phrasal [H−] is scaled lower in the pitch range than the peak in the accent [H⋆+L]. These two facts together mean that the demarcative rise is much smaller for sequences of two unaccented words as compared with sequences of two accented words for the same degree of prosodic subordination of the second word, so that listeners may misinterpret an intended sequence of two unaccented APs as having undergone postlexical dephrasing.
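A back-of-the-envelope calculation may help make this asymmetry concrete. The scaling model and all constants below are our own hypothetical illustration, not the chapter's formalism or any fitted model:

```python
# Hypothetical illustration of why the demarcative rise is smaller between
# two unaccented words than between two accented words, given the same
# degree of prosodic subordination of the second word. All numbers invented.

REF_HZ = 100.0  # speaker reference line (invented)


def demarcative_rise_hz(pitch_range_hz, accented):
    """Rise from the [L%] valley to the next peak within the given
    (possibly subordinated) pitch range. The accentual [H*] peak is
    scaled higher in the local range than the phrasal [H-] peak
    (scaling factors invented for illustration)."""
    l_valley = REF_HZ + 0.1 * pitch_range_hz
    peak_scale = 0.8 if accented else 0.5  # [H*] vs. lower phrasal [H-]
    return (REF_HZ + peak_scale * pitch_range_hz) - l_valley
```

With a subordinated 40 Hz range, this sketch gives a 28 Hz rise to an accentual peak but only a 16 Hz rise to a phrasal [H−]: the same degree of subordination leaves a much weaker boundary cue in the unaccented case, inviting the misparse as dephrasing.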
Note that this account of dephrasing as one endpoint of a continuum of subordination differs from the more categorical distinction assumed in some other accounts of postfocal subordination. That is, there is a broad consensus in the AM literature that dephrasing does occur. However, many other phonologists have assumed a slightly different account of dephrasing that does not go as far as our definition in differentiating the postlexical intonational process from the morphological processes that account for the accent patterns of compound words. Specifically, in most previous accounts other than Kori 1997, the output of postlexical dephrasing is assumed to preserve the (almost perfectly) culminative distribution of akusento-kaku in the lexicon. When two accented words are phrased together into a single AP, these accounts say, the second akusento-kaku must be deleted along with the demarcative rise.10 In applying these models in analyses of actual utterances, then, the presence of a second [H⋆+L] fall in a sequence of two accented words is interpreted as unequivocal evidence that there is an intervening AP boundary even when the second peak is reduced to a target level at or even below the level of the inflection point for the [L] of the first accent. By these other accounts, then, the fall on taʼbeta in figure 17.2a must indicate an AP break after the object NP, making the prosodic structure of this utterance different from that in the syntactically identical sentences in figures 17.2b and 17.2c. Moreover, the different shapes in the postfocal region in figures 17.9a and 17.9b,c are interpreted as indicating different focus-marking mechanisms depending on the lexical accentuation of the word in focus.
For example, Sugahara (2002) observes what she takes to be evidence of an effect of lexical accentuation in determining prosodic phrasing in the postfocal region. That is, like Kori (1997), she describes a pattern of no AP-initial rise and continuous gradual fall on postfocal (contextually “given”) words in unaccented sequences and a contrasting pattern with a “bumpier” transition when the focal word and the following word are accented. To account for this difference, she posits a dependency between accentual phrasing and lexical accentuation status. That is, she notes that postfocal material is not subject to dephrasing when it is contextually “new” information or at the left edge of a major syntactic constituent (XP), as might be expected if accentual phrasing tends to reflect both the discourse context and the syntactic structure of an utterance, but she also concludes that postfocal material is not subject to dephrasing when it is lexically accented. This is a (p. 495) somewhat surprising effect of the lexical tone pattern if we assume the strictly modular division of classical generative phonology between the phonological grammar that generates phonological structures and the phonetic processes that implement them.
Other researchers also have observed apparent interactions of dephrasing with lexical accentuation. Most notably, Ito and her colleagues (Ito, Speer, and Beckman 2003; Ito and Speer 2006) document such an interaction in spontaneous narratives, where the contrast between “given” and “new” information arises naturally from the nonlinguistic task that naive subjects were asked to perform. We can be confident, therefore, that it is not an artifact of eliciting the contrasting information structures in the laboratory using contrived dialogue scripts. More extensive corpus work, in conjunction with analysis-by-synthesis methods, needs to be done before we can say whether the apparent interaction can be explained away by the different, complementary phonetic susceptibilities to ambiguity in the parsing of tonal cues to disjuncture in accented-accented sequences as compared to unaccented-unaccented sequences.
17.4.2 Prosodic Restructuring or Local Boost?
In section 17.3.2, we described pitch range expansion under focal prominence in terms of a prosodic restructuring, such that an IP boundary is inserted before the focal constituent. As Kubozono (1989, 1993, 2007) and others have pointed out, however, there is an alternative account available, because the reset at the beginning of the focus constituent is potentially ambiguous: it could signal the start of a new IP, or it may be a more local boost that raises the topline of the current AP only, without affecting the prosodic organization of the utterance at the IP level.
To determine locations of IP boundaries, researchers sometimes compare the height of the target peak to that of the previous one: if the target is the same height or higher than the previous peak, they assume that there is an IP boundary between the two phrases. Kubozono (2007) refers to this as the “syntagmatic approach.” Beckman and Pierrehumbert (1992) suggest that, as a research methodology, this approach is circular; typical peak height relationships can vary to contrast two categorically different levels of prosodic disjuncture (as in the contrast between IP and AP in the Japanese prosodic hierarchy), but peak height relationships can also vary in a continuous, iconic way to reflect discourse structure, including many different degrees of relative foregrounding versus backgrounding of constituents. They therefore suggest that the only way to definitively determine whether an IP boundary exists between two phrases is to systematically vary the accentuation of the first phrase and observe the downstep effects (if any) on the second phrase. Kubozono (2007) refers to this as the “paradigmatic approach” and he, too, suggests that it is a better method for deciding whether an IP boundary has been inserted before a focus constituent.
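The logic behind these two diagnostics can be sketched as a toy model. Everything below is our own illustration under simplified assumptions (a single multiplicative downstep factor, no declination or token-to-token variability); the constants are invented, not fitted to any speaker data:

```python
# Toy sketch of the "paradigmatic" test for an IP boundary before a target
# phrase. Hypothetical model: within an IP, a preceding accent compresses
# the register (downstep); an IP boundary resets it. All numbers invented.

DOWNSTEP = 0.6  # hypothetical register compression after an accent


def predicted_peak(base_peak_hz, preceding_accented, ip_boundary):
    """Predict the target phrase's Fo peak from its context."""
    if preceding_accented and not ip_boundary:
        return base_peak_hz * DOWNSTEP  # downstepped within the same IP
    return base_peak_hz  # reset across an IP boundary, or no trigger


def paradigmatic_test(peak_after_accented, peak_after_unaccented, tol=0.1):
    """Infer an IP boundary by varying the accentuation of the preceding
    phrase: if the target peak is about the same either way, downstep was
    blocked, which points to an IP boundary before the target."""
    return abs(peak_after_accented / peak_after_unaccented - 1.0) < tol
```

On this sketch, a focused phrase that fully resets (same peak whether or not the preceding phrase is accented) passes the test, whereas a merely "boosted" phrase that still carries downstep from a preceding accent fails it, which is precisely the distinction at issue between the restructuring and local-boost accounts.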
(p. 496) Using this paradigmatic approach, Pierrehumbert and Beckman (1988) find no significant effect of previous accentuation on focused peaks in their data. Specifically, although they find a small (albeit insignificant) mean difference for the peak height value in a focused word depending on the accent of a preceding word, a scatterplot showing the peak height relationship for each token in the data set suggests that this is due to variability in the phrasing, with a few tokens showing a dramatic syntagmatic reduction after accent, and all the others showing reset. This pattern leads them to propose that focal prominence typically introduces an IP break before the focused element. This proposal has been widely adopted in subsequent analyses of focus-related effects on phrasing in Japanese (Nagahara 1994, Ito 2002a, among others).
In contrast to Pierrehumbert and Beckman's findings, Kubozono (2007) finds a significant difference in mean values for Fo peaks in focused wh-phrases such as the naʼni-o ‘what-acc’ in (5). Here, the accentuation of the two phrases preceding the target word naʼni is systematically varied, yielding a somewhat lower average peak value on naʼni in (5a) relative to the mean in (5b) for five of seven speakers. This small difference in mean values appears even though the pitch range of the wh-word appears to be expanded under focus in every token. Following his earlier work on local pitch range expansion at major syntactic boundaries (see Kubozono 1989, 1993), Kubozono posits a local Fo “boost” that occurs on focused phrases without undoing the effects of downstep from preceding accents, as would be predicted if reset were associated with an unexpected IP boundary just before the target word. Kubozono concludes that pitch range expansion for focus marking can occur without the prosodic restructuring posited by Pierrehumbert and Beckman.
(5) a. ‘What did you see with Naoko in Aomori?’
b. ‘What did you see with Naomi in Oomori?’
How can we account for the discrepancies between these two studies? Kubozono points to one possible source of the difference in results—namely, a difference between his materials and the materials used in most other experiments, including Pierrehumbert and Beckman's. Kubozono examines wh-phrases, which are (under most accounts of focus in questions) natural targets of focal prominence.11 By contrast, most other laboratory studies of focus use scripted dialogues that are designed to prompt the readers to perform a contrastive or even corrective focal prominence. For example, the materials in Pierrehumbert and Beckman 1988 use the “contrastive wa” construction illustrated in figure 17.3b. As Prince (1981) and many others have pointed out, the discourse phenomena that have been termed “focus” are no more homogeneous than the prominence-lending (p. 497) mechanisms that languages use to mark focus. Perhaps the differentiation of the contrastive use of wa from the ordinary thematic use of wa requires a more dramatic setting off of the focus constituent than does the “inherent” focus of the wh-word.
Another important point that emerges from comparing these studies is that the potential for misparsing the degree of prosodic disjuncture before a focus constituent interacts with the interpretation of focal prominence, and this interaction depends on the lexical accentuation in a way that is reminiscent of the ambiguities regarding dephrasing after the focus constituent discussed in section 17.4.1. That is, when preceding material is unaccented, the listener can directly compare the Fo before and after the [L% H−] rise to gauge the degree of relative pitch range expansion on the focal constituent—as in Kubozono's syntagmatic approach—and the speaker can induce the percept of focal prominence even with a fairly small reset at the beginning of the focused element. When the preceding material is accented, however, the listener must gauge the intended pitch-range relationship against the backdrop of Bayesian expectations regarding downstep—taking a necessarily paradigmatic approach by evaluating the reset relative to an internal representation of the typical effects of being after an accented word within the same IP versus being after an accented word across an IP boundary.
We suspect that this difference between the two cases is what gives rise to another interaction with lexical accentuation that Kawakami (1957) observes. He points out that after an accented word, pitch range expansion for focus marking must at least mimic the effect of blocking the downstep, creating syntagmatic cues that suggest an IP boundary, as in (6a). After an unaccented word, however, the speaker can put focal prominence on a word simply by producing the AP-initial rise within a syntactic constituent that is typically produced as one AP, as in (6b).
(6) a. ‘(She)'s his older sister.’
b. ‘(She)'s my older sister.’
17.4.3 Scaling of High Tones within the AP
In section 17.4.1, we defined dephrasing as a postlexical process that differs from the morphological processes involved in compound-word formation by targeting the (p. 498) cues to the AP boundary alone, without deleting one of the accents when two accented words are conjoined. We suggested that this difference between the two processes may help to explain an otherwise puzzling dependency between dephrasing and the lexical accentuation of the focused and postfocal constituents. If we are correct in saying that the conjoining of two morphemes into a compound word differs prosodically from the conjoining of two words into a single AP, then we might also expect different behavior in cases of metalinguistic contrast on one of the elements in a compound word, as in (7).
(7) a. Otogi-baʼnashi janaʼkute, mukashi-baʼnashi da yo. (Gussenhoven 2004)
‘It's not a nursery tale, but a legend (Lit. oldentimes-tale, with -baʼnashi from hanashiʼ).’
b. Mukashi monogaʼtari janaʼkute, mukashi-baʼnashi da yo.
‘It's not an old-time story, but a legend (Lit. oldentimes-tale).’
c. Watashi-ga miʼtai no wa eiga janaʼkute, mukashi-baʼnashi da yo.
‘What I want to see is not a film, but a legend (Lit. oldentimes-tale).’
d. Hoʼn-dana deʼwa nakute hoʼn-bako deʼsu. (Kawakami 1957)
‘It's not a bookshelf, it's a book-bin.’
e. Koogyoo-daʼigaku daʼ ka koogyoo-gaʼkkoo daʼ ka …
‘Whether it's an industrial college, or an industrial school …’
The sentences in (7a–c) are based on work by Gussenhoven (2004), who cites Mariko Sugahara (pers. comm., 2000) as saying that the two elements of a compound word cannot be focused separately. More specifically, he says that a compound word such as mukashi-baʼnashi ‘legend’ (from mukashi ‘oldentimes’ plus a sandhi form of hanashiʼ ‘tale’) in (7a–c) cannot be divided into two APs to have different pitch ranges to express the difference between narrow focus on mukashi in (7a) and narrow focus on hanashiʼ in (7b), and he shows Fo contours of elicited utterances of (7a–c) that “are all basically the same” (Gussenhoven 2004:205). His figure suggests two further pieces of evidence for the purported prosodic indivisibility of the compound word: (a) the initial consonant in the second morpheme is [b] rather than the underlying [h], showing that the morphological process of rendaku (“sequential voicing”) has applied, and (b) the location of the accent fall in the Fo contour reflects the compound accent pattern rather than the underlying final accent of hanashiʼ ‘tale’.
By contrast, Kawakami (1957) says that corrective focal prominence on the second element in compound words such as hoʼn-dana versus hoʼn-bako in (7d) and koogyoo-daʼigaku versus koogyoo-gaʼkkoo in (7e) can be expressed by having the same kind of AP-initial rise in the middle of the compound word that he observes in sequences of words such as (6b). At the same time, he points out that even when this prominence-lending AP-initial rise occurs, the forms show the other prosodic hallmarks of being compound words. Specifically, rendaku applies to both the underlying tana ‘shelf’ and hako ‘box’ in (7d), and the compound accent pattern applies to put an akusento-kaku on the first syllable of daigaku ‘college’ and gakkoo ‘school’ (both of which are unaccented as simplex forms) in (7e). (p. 499)
The conflicting cues to the nature of the word-internal juncture that Kawakami describes for examples (7d,e) are reminiscent of the conflicting cues to the lexical status of the initial morpheme in English words such as illiterate and impolite when corrective focus associates the nuclear pitch accent to the prefix, as in (8).
(8) He's not polite. In fact, he's downright impolite.
(p. 500) At the same time, it is important to point out that Gussenhoven's identification of a common source of difficulty for expressing the narrow focus in (7a,b) assumes that focus marking on the first element of a compound should be the mirror image of focus marking on the second element. There is no logical necessity to this assumption. As we noted in section 17.3, there are complementary constraints on the availability of different focus-marking mechanisms at different positions in an utterance. Prosodic restructuring to begin the focus constituent with an unexpected IP boundary and reset is not a possible way to express narrow focus on an utterance-initial word, whereas postfocal dephrasing is not a possible way to express narrow focus on an utterance-final word. Analogously, the production of unexpected tonal cues to an AP boundary that Kawakami describes for (7d,e) should not be able to express the early narrow focus of (7a) as distinct from the whole-word focus of (7c), and if there is an analog to postfocal dephrasing, we might expect it to be constrained to occur on nonfinal elements of a compound word. The utterance in figure 17.11 suggests a possible analog to postfocal dephrasing within a compound word form, so long as it is accented and shows the late accent placement that is a typical output of the compound accentuation rules.
We mentioned in section 17.2.2 that the [H⋆] target of the accent is inherently scaled higher in the local pitch range for the AP than is the phrasal [H−]. In section 17.4.1, we discussed the implications of this difference for the interpretation of rises in the middle of sequences of accented words where the second accent is early in the word. Here we discuss another implication of this relationship of [H−] < [H⋆]. When an AP groups together an unaccented word with a following accented word, this difference in inherent tone scaling often results in an upward-sloping interpolation from the initial [H−] to the following accent [H⋆]. This is the pattern over the initial AP Akai yaʼne ‘red roof’ in all three panels of figure 17.4 and in the first two words Omiyage-no chiʼizu ‘souvenir cheese’ in figure 17.2a. The productions of mukashi-baʼnashi that Gussenhoven shows for the three focus patterns in (7a–c) also have this [H−] < [H⋆] tone-scaling relationship, and he follows most earlier AM accounts in interpreting this as evidence that focal prominence on a word can only be marked by changing the pitch range specification for the AP containing the word as a whole, without affecting the inherent tone-scaling relationship of [H−] < [H⋆] within the phrase. As Pierrehumbert and Beckman (1988:108) put it, “focus is a property of the AP as a whole, and is therefore reflected in the realization of the accent H as well as the phrasal H.”
Figure 17.8 already shows that this may be an oversimplification. The [H−] target on the focused mannaka-ni is very much higher than the [H⋆] target at the following accent in okimaʼsu. Figures 17.9b and 17.11a show that this unexpected relationship of [H−] ≫ [H⋆] can be produced to mark narrow focus on the first element in an AP even when there is no silent pause interrupting the interpolation from the [H−] to the [H⋆], as is the case in figure 17.8. Figure 17.11b shows that this unexpected relationship of [H−] ≫ [H⋆] can even be produced in a compound (p. 501) word. That is, because the sequence emmentaaru-chiʼizu is marked as a morphologically conjoined sequence by the compound accentuation pattern (emmentaʼaru being underlyingly accented), the downsloping pattern in figure 17.11b is comparable to the patterns that Kawakami observes in (7d,e)—an instance where focal prominence is marked on just one of the elements in a compound, by reversing the expected relationship between tones and the prosodic structures that they mark. Examples such as (7d,e) and figure 17.11b thus exemplify another way in which the prosodic parse can be ambiguous; the segmental cues and compound accentuation pattern tell the listener that these forms are very closely conjoined morphologically, but the postlexical AP-boundary tones in (7d,e) and the independent scaling of the two [H] targets that allows for the [H−] ≫ [H⋆] relationship in figure 17.11b tell the listener that the two subelements of the compound form are independent enough that the second or the first of them can be marked separately as the focus constituent.
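The scaling relationships at issue here can be made concrete with a small sketch. This is our own hypothetical parameterization, in the spirit of target-scaling models but with invented constants; it is not the X-JToBI analysis itself:

```python
# Hypothetical sketch: tone targets scaled within an AP-local pitch range.
# Normally the phrasal [H-] sits lower in the range than the accentual
# [H*]; boosting the [H-] independently (to mark narrow focus on the
# AP-initial element) can reverse the relationship. Constants invented.

REF_HZ = 100.0   # speaker reference line (invented)
H_PHRASE = 0.6   # inherent scaling of the phrasal [H-] (invented)
H_ACCENT = 0.8   # inherent scaling of the accentual [H*] (invented)


def ap_targets(pitch_range_hz, h_phrase_boost=1.0):
    """Return (phrasal [H-], accentual [H*]) targets for one AP.

    h_phrase_boost > 1 models independent boosting of the [H-] under
    narrow focus on the AP-initial element."""
    h_minus = REF_HZ + H_PHRASE * h_phrase_boost * pitch_range_hz
    h_star = REF_HZ + H_ACCENT * pitch_range_hz
    return h_minus, h_star
```

With no boost the sketch yields the usual upward interpolation (the [H−] below the [H⋆]); doubling the [H−] scaling yields a downsloping contour with the [H−] well above the [H⋆], while leaving the [H⋆] target itself unchanged.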
17.4.4 Prominence-lending Edge Markers
A final point of potential ambiguity that we briefly address here concerns the phonological analysis of prominence-lending pitch movements. At first glance, the rise-fall Fo movements located on or near the particle mo for the [L% H%] sequence in figure 17.10b, the PNLP in 17.10d, and the [L% HL%] in 17.10e are all strikingly similar in shape to the [%L H⋆+L] sequence for the AP-initial rise and lexical accent on the first syllable of Naʼoya in figure 17.10a. This surface similarity has prompted some researchers to wonder whether the movements in figures 17.10b–e are in fact pitch accents (as opposed to boundary tones), and some researchers do indeed model them that way. Fujisaki and Hirose (1993), for example, generate all of these rises (including the more scooped question rise) by inserting an extra accent command. Oshima (2007) similarly argues that all particles are underlyingly accented, not just bimoraic ones such as maʼde and suʼra. This allows him to analyze the rise-fall shape on the particle mo in figures 17.10b–e as the resurfacing of the accent tones under narrow focus. That is, Oshima would interpret the PLR and other prominence-lending BPMs in these examples as an exaggeration of an underlying [H⋆+L] fall on mo, marking the particle as the focus constituent.
The X-JToBI analysis that we adopt is different. Although the contour on mo in figures 17.10b–e does superficially resemble the shape around the lexical accent on the first syllable of Naʼoya in figure 17.10a, we do not consider these BPMs to be the realization of an underlying accent on the monomoraic particle mo. This X-JToBI analysis is supported by Mesbur's (2005) experimental observation that multimoraic particles that have lexically specified accent on some nonfinal mora (e.g., guʼrai ‘about, as much as’) can be uttered with both the rise-fall of the lexical accent and a final [H%] PLR under focus, as in (9).
(p. 502) (9) Omame guʼrai↑ eiyoo no aʼru mono wa naʼi n da kara.
‘(You'd better eat them) since there's nothing with as much nutrition as beans.’
At the same time, we can see clearly why Oshima (2007) and Fujisaki and Hirose (1993) were attracted to this account of prominence-lending BPMs as accents instead of as boundary tones. And an analysis that equates the prominence-lending properties of these BPMs with focalizing mechanisms that expand the accent fall in figure 17.10a is especially attractive for the BPM that X-JToBI calls PNLP (penultimate non-lexical prominence). The rise-fall movement of PNLP is transcribed as a [HL%] boundary tone, even though it is not realized at the phrase edge, as is the canonical ‘explanatory’ [HL%] (with which it shares its X-JToBI label) and the [H%] PLR (with which it is allied in alignment and duration characteristics). Thus, PNLP is inherently ambiguous. It looks and acts like the [H%] boundary tone of the PLR, but it violates our expectation that the phonetic realization of a boundary-marking tonal morpheme should be localized at the edge to which it is phonologically affiliated.
However, other languages have similarly ambiguous tones that can be analyzed as boundary-marking events despite their dislocation from the phrase edge. Compare, for example, the behavior of English phrase accents, which are associated with the phrase edge but spread leftward to fill up the space between the nuclear accent and the phrase edge. (The [L−] tones labeled in each panel of figure 17.7 are an example of such phrase accents.) Grice, Ladd, and Arvaniti (2000) review a number of languages that have prominence-marking tones with such an ambiguous or dual affiliation, including Greek and Hungarian. The prominence-lending function of the [H%] and [HL%] in figures 17.10b–d already suggests an affinity between these boundary tones and the phrase accents of languages such as English, Greek, and Hungarian. This prominence-marking function makes them susceptible to a diachronic reanalysis such as the one that may have given rise to the possibility of postlexical insertion of accents in Basque. The PNLP shares with the phrase accents of languages such as English, Greek, and Hungarian an even more inherently ambiguous parse because of its displacement from the edge to which it is phonologically associated. It will be interesting to see whether this displacement is the harbinger of a sound change that will make Tokyo Japanese more similar to Basque.
(p. 503) 17.5 Theoretical Implications
We began this chapter by reviewing aspects of the Japanese intonation system that are especially relevant for understanding the interplay between information structure and prosodic organization. We then reviewed the literature that describes the ways in which prosody is used by the language to mark focal prominence on constituents in running speech. The literature shows that there is a rich variety of prominence-marking mechanisms even when morphosyntactic mechanisms such as scrambling are ignored.
In reviewing this literature, we have tried to highlight points of consensus that emerge from comparing how a particular phenomenon or particular set of phenomena is treated in models that make different assumptions about such aspects of the intonation system as whether the [H–] tone at the beginning of an accentual phrase is an independent target from the [H⋆] tone at the accent fall. When different models make analogous generalizations about older observations leading to similar predictions about future observations, this makes us confident that the generalizations are robust.
One generalization that emerges from comparing treatments across frameworks in this way is that pitch range specification plays an integral role in the intonational phonology. That is, as we saw in section 17.4.2, it is possible to disagree about whether the reset that marks the beginning of the focus constituent in examples such as (5) involves the insertion of an unexpected IP boundary or a boost to the pitch range after a mere AP boundary. But there is no disagreement that speakers have a flexibly continuous control of the pitch range on either side of this boundary and that they can use this control to produce a continuum of degrees of reset to indicate different relationships of foregrounding the focus constituent against the background of reduced pitch range on the preceding constituent. Moreover, although the exact details of the analysis differ across different frameworks, there is strong agreement that this continuous control of pitch range interacts with the control of categorical contrasts in lexical tone specification to produce superficially different patterns, as shown in figure 17.9. Nonetheless, listeners can parse this variation in terms of their expectations about how words and syntactic phrases will be reflected in the prosodic organization of an utterance to recover the intended focus pattern.
A closely related (and overarching) generalization concerns the importance of prosodic grouping and the role of constituent edges in the marking of focal prominence. The exaggerated reset in utterances such as figure 17.8 targets the beginning of the focus constituent, as does the unexpected AP boundary in examples such as (6b), whereas the prominence-lending boundary pitch movements discussed in section 17.3.4 target the end of the focus constituent. This focus on edges contrasts with the importance of head positions in the English intonation system. That is, prominence-marking in Japanese, unlike prominence-marking in English, does not target any such position as the syllable with primary stress in the focused word or the “designated terminal element” of the intonation phrase.
(p. 504) Despite this profound formal difference between the two languages, however, there is still a similarity concerning the associated types of phonetic properties that mark some constituents as more prominent and others as less prominent. For example, just as pitch range is manipulated to produce a more extreme rise to mark the beginning of a focused word in Japanese, pitch range is expanded to produce a more extreme rise or fall around the culminative pitch accent associated to the syllable that is the obligatory head of a focused word in English. If we understand the phonology-phonetics interface as being a counterpart to the syntax-semantics interface at the other end of the grammar, then this speaks to a deeper similarity in the ways in which the grammatical patterns are interpreted. However, this deeper “semantic” similarity can only be described accurately once the analysis moves beyond the fixation on more superficial similarities that lead to the cross-language misparsings described in section 17.3.1. That is, one has to first analyze each language on its own terms (and not just look at surface similarities) before one can see how the two languages are fundamentally the same.
Moreover, once each prosodic system is analyzed fully in its own terms, another related deeper similarity emerges. In each language, when there is no narrow focus within an utterance, some of the prominence markers are associated with constituents near the edge of the utterance. Looking at English first, we can say that the generalization about association with the obligatory head means that there must be a nuclear pitch accent somewhere in every well-formed IP. When there is no narrow focus, the nuclear pitch accent goes on the last content word (by the Nuclear Stress Rule). In Japanese, similarly, there has to be at least one IP-initial rise at the beginning of every well-formed utterance, and when there is no narrower focus prompting an IP break and reset later on, the rise from the utterance-initial [%L] makes the immediately following [H] target (whether a phrasal [H–] or the [H] of a [H⋆+L]) the highest (most prominent) peak in the utterance. Thus, it is difficult to distinguish broad focus on the utterance as a whole from narrow focus on the relevant edge constituent in both languages, even though the edge is different—namely, last in English, first in Japanese.
This generalization across the two languages predicts that there should also be complementary patterns of focus projection within the VP in transitive clauses. That is, given that objects follow verbs in English, placing the nuclear pitch accent on the object NP should be ambiguous between narrow focus on the NP and broader focus on the VP, as indeed it is, as Gussenhoven (1983) and others have shown experimentally. Conversely, because objects come before verbs in Japanese, producing a prominent reset before the object and dephrasing the following verb should be ambiguous between narrow focus on the NP and broader focus on the VP, as indeed it is, as Ito (2002a, b) has shown experimentally.
Comparing across languages in this way also highlights important lacunae in our understanding of one or the other prosodic system. For example, Ito (2002a, b) tests her predictions about focus projection only with sequences of lexically accented object followed by lexically accented verb. Much previous work on English focus projection, comparably, had not examined the effects of varying aspects of (p. 505) the tune other than the location of the nuclear pitch accent, but more recent work shows, for example, that the broad focus reading of late nuclear accent placement is less readily available when the pitch accent on the object is [L+H⋆] rather than [H⋆] (e.g., Welby 2003). Are there comparable interactions in Japanese, such that focus projection is easier or more difficult in other types of sequences besides accented object followed by accented verb?
Comparing across models also highlights other substantial gaps in our knowledge. For example, one lesson to draw from the discrepancy between Kubozono's and Pierrehumbert and Beckman's results described in section 17.4.2 is that we need more work on the prosody-pragmatics interface, work that applies additional methods drawn from other fields such as sentence processing.
Another, equally important lesson that we glean from this comparison across frameworks is the reminder that phonological relationships such as the contrast between an IP boundary and a mere AP boundary are grammatical abstractions that native speakers acquire in the course of extensive exposure to a rich variety of language-specific cues. Some of these cues come from parsing the F0 contour in terms of tone targets and pitch range specifications. Others come from parsing the spectral patterns in terms of the dynamic flow of consonant constrictions and vowel resonance patterns. A frequently encountered congruence of cues from multiple sources in the signal induces a stronger abstraction, which allows adult native speakers to produce conflicting cues when necessary to convey the intended information structure, as in the mannaka-ni okimaʼsu sequence in figure 17.8, where the F0 contour suggests that the adverbial phrase and the verb are grouped together into one AP, but the long pause suggests that an IP boundary separates them. The native listener, conversely, can accommodate to such conflicting cues to recover both the intended focus pattern and the syntactic grouping that the prosodic phrasing also cues. In a similar way, although rendaku and the compound accent pattern of hoʼn-dana typically occur on sequences of morphemes that are grouped together into a single AP, the speaker can produce the tonal cues to an AP boundary to convey narrow focus on the tana and can do so without “undoing” the other prosodic cues to the morphological grouping. Moreover, the speaker can produce these conflicting cues in full expectation that the listener will recover both the intended information structure and the intended morphological grouping.
The disagreements described in sections 17.4.1 and 17.4.2 come about because the same kinds of statistical dependencies drive the various interactions between the preferred focus-marking mechanism and the lexical accent pattern, making for inherent ambiguities. What this lesson means for the theoretical linguist, then, is that the next generation of models of intonational phonology needs to do better justice to these complex interactions among discrete contrasts (e.g., between accented and unaccented words) and continuous variation (e.g., more versus less extreme degrees of pitch range expansion or compression at the edges of focus constituents) and the ways that native speakers and listeners take advantage of the statistical dependencies among different patterns.
Aoyagi, Seizô. 1969. A demarcative pitch of some prefix-stem sequences in Japanese. Onsei no Kenkyuu 14:241–247.
Arisaka, Hideyo. 1941. Akusento no kata no honshitsu ni tsuite [On the nature of accent pattern]. Gengo Kenkyuu 7:83–92.
Arvaniti, Amalia, D. Robert Ladd, and Ineke Mennen. 1998. Stability of tonal alignment: The case of Greek prenuclear accents. Journal of Phonetics 26:3–25.
Beckman, Mary E. 1996a. When is a syllable not a syllable? In Phonological structure and language processing: Cross-linguistic studies, ed. Takashi Otake and Anne Cutler, 95–123. Berlin: Mouton de Gruyter.
Beckman, Mary E. 1996b. The parsing of prosody. Language and Cognitive Processes 11:17–67.
Beckman, Mary E., Julia Hirschberg, and Stefanie Shattuck-Hufnagel. 2005. The original ToBI system and the evolution of the ToBI framework. In Prosodic typology: The phonology of intonation and phrasing, ed. Sun-Ah Jun, 9–54. New York: Oxford University Press.
Beckman, Mary E., and Janet Pierrehumbert. 1992. Comments on chapters 13 and 14 [original title: Strategies and tactics for thinking about F0 variation]. In Papers in laboratory phonology II: Segment, gesture, prosody, ed. Gerard J. Docherty and D. Robert Ladd, 387–397. Cambridge: Cambridge University Press.
Bruce, Gösta. 1977. Swedish word accents in sentence perspective. Lund: Gleerup.
Docherty, Gerard J. 1992. The timing of voicing in British English obstruents. Dordrecht: Foris.
Firth, J. R. (John Rupert). 1957. Papers in linguistics, 1934–1951. London: Oxford University Press.
Fougeron, Cécile, and Patricia A. Keating. 1997. Articulatory strengthening at the edges of prosodic domains. Journal of the Acoustical Society of America 101:3728–3740.
Fujisaki, Hiroya, and Keikichi Hirose. 1984. Analysis of voice fundamental frequency contours for declarative sentences of Japanese. Journal of the Acoustical Society of Japan (English) 5(4):233–242.
Fujisaki, Hiroya, and Keikichi Hirose. 1993. Analysis and perception of intonation expressing paralinguistic information in spoken Japanese. Lund Linguistics Working Papers 41:254–257.
Fujisaki, Hiroya, and Hisashi Kawai. 1988. Realization of linguistic information in the voice fundamental frequency contour of the spoken Japanese. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics (RILP) 22:183–191.
Fujisaki, Hiroya, Sumio Ohno, Masafumi Osame, Mayumi Sakata, and Keikichi Hirose. 1994. Prosodic characteristics of a spoken dialogue for information query. In Proceedings (p. 508) of the International Conference on Spoken Language Processing (ICSLP), 1103–1106. Tokyo: The Acoustical Society of Japan.
Fujisaki, Hiroya, and H. Sudo. 1971. Synthesis by rule of prosodic features of connected Japanese. In Proceedings of the 7th International Congress on Acoustics (ICA), 133–136. Budapest: Akadémiai Kiadó.
Gandour, Jack, S. Potisuk, and S. Dechongkit. 1994. Tonal coarticulation in Thai. Journal of Phonetics 22:477–492.
Goldsmith, John A. 1979. Autosegmental phonology. New York: Garland.
Grice, Martine, D. Robert Ladd, and Amalia Arvaniti. 2000. On the place of phrase accents in intonational phonology. Phonology 17:143–185.
Gussenhoven, Carlos. 1983. Testing the reality of focus domains. Language and Speech 26:61–80.
Gussenhoven, Carlos. 2004. The phonology of tone and intonation. Cambridge: Cambridge University Press.
Haraguchi, Shôsuke. 1977. The tone pattern of Japanese: An autosegmental theory of tonology. Tokyo: Kaitakusha.
Hattori, Shiro. 1961. Prosodeme, syllable structure, and laryngeal phonemes. Bulletin of the Summer Institute of Linguistics: Studies in Descriptive and Applied Linguistics 1:1–27. Tokyo: International Christian University.
Hay, Jennifer. 2003. Causes and consequences of word structure. London: Routledge.
Hirai, Toshio, Norio Higuchi, and Yoshinori Sagisaka. 1997. Comparison of F0 control rules derived from multiple speech databases. In Computing prosody: Computational models for processing spontaneous speech, ed. Yoshinori Sagisaka, Nick Campbell, and Norio Higuchi, 211–223. Berlin: Springer-Verlag.
Hirschberg, Julia, and Gregory Ward. 1992. The influence of pitch range, duration, amplitude and spectral features on the interpretation of the rise-fall-rise intonation contour in English. Journal of Phonetics 20:241–251.
Hirschberg, Julia, and Gregory Ward. 1995. The interpretation of the high-rise question contour in English. Journal of Pragmatics 24:407–412.
Hualde, José Ignacio. 1991. Basque phonology. London: Routledge.
Hualde, José Ignacio, Gorka Elordieta, Iñaki Gaminde, and Rajka Smiljanić. 2002. From pitch-accent to stress-accent in Basque. In Laboratory phonology 7, ed. Carlos Gussenhoven and Natasha Warner, 547–584. Berlin: Mouton de Gruyter.
Hyman, Larry. 2001. Tone systems. In Language typology and language universals: An international handbook (vol. 2), ed. Martin Haspelmath, Ekkehard König, Wulf Oesterreicher, and Wolfgang Raible, 1367–1380. Berlin: Walter de Gruyter.
Hyman, Larry. 2005. Word-prosodic typology. Paper presented at the Between Stress and Tone conference, Leiden, June.
Ito, Kiwako. 2002a. The interaction of focus and lexical pitch accent in speech production and dialogue comprehension: Evidence from Japanese and Basque. Doctoral dissertation, University of Illinois, Urbana-Champaign.
Ito, Kiwako. 2002b. Ambiguity in broad focus and narrow focus interpretation in Japanese. In Proceedings of Speech Prosody, ed. Bernard Bel and Isabelle Marlien, 411–414. Aix-en-Provence, France: Laboratoire Parole et Langage, Université de Provence.
Ito, Kiwako, and Shari R. Speer. 2006. Using interactive tasks to elicit natural dialogue. In Methods in empirical prosody research, ed. Stefan Sudhoff, Denisa Lenertova, Roland Meyer, Sandra Pappert, Petra Augurzky, Ina Mleinek, Nicole Richter, and Johannes Schliesser, 229–257. Berlin: Mouton de Gruyter.
(p. 509) Ito, Kiwako, Shari R. Speer, and Mary Beckman. 2003. The influence of given-new status and lexical accent on intonation in Japanese spontaneous speech. Poster presented at the 16th annual CUNY Conference on Human Sentence Processing, 34. Boston, March.
Jun, Sun-Ah. 1998. The accentual phrase in the Korean prosodic hierarchy. Phonology 15:189–226.
Jun, Sun-Ah. 2005. Prosodic typology: The phonology of intonation and phrasing. New York: Oxford University Press.
Katagiri, Yasuhiro. 1999. Dialogue functions of Japanese sentence-final particles. In Proceedings of LPʼ98 (Linguistics and Phonetics ʼ98), ed. Osamu Fujimura, Brian D. Joseph, and Bohumil Palek, 77–90. Charles University in Prague: The Karolinum Press.
Kawakami, Shin. 1957. Tookyoogo no takuritsu kyoochoo no onchoo [Tonal prominence in Tokyo Japanese]. Reprinted 1995 in Nihongo akusento ronshuu [A collection of papers on Japanese accent], ed. Shin Kawakami, 76–91. Tokyo: Kyûko Shoin.
Kawakami, Shin. 1963. Bunmatsu nado no jooshoochoo ni tsuite [On phrase-final rises]. Reprinted 1995 in Nihongo akusento ronshuu [A collection of papers on Japanese accent], ed. Shin Kawakami, 274–298. Tokyo: Kyûko Shoin.
Kindaichi, Haruhiko. 1951. Kotoba no senritsu [The melody of language]. Kokugogaku 5(2):37–59.
Kiriyama, Shinya, Keikichi Hirose, and Nobuaki Minematsu. 2002. Control of prosodic focuses for reply speech generation in a spoken dialogue system for information retrieval on academic documents. In Proceedings of Speech Prosody, ed. Bernard Bel and Isabelle Marlien, 431–434. Aix-en-Provence, France: Laboratoire Parole et Langage, Université de Provence.
Kokuritsu Kokugo Kenkyuujo [National Institute for Japanese Language (NIJL, formerly NRLI)]. 2006. Nihongo hanashi kotoba koopasu no koochikuhoo [Construction of the Corpus of Spontaneous Japanese]. NIJL report 124. Tokyo. http://www.kokken.go.jp/katsudo/seika/corpus/public/index.html.
Kori, Shiro. 1987. The tonal behavior of Osaka Japanese: An interim report. Ohio State Working Papers in Linguistics (OSUWPL): Papers from the Linguistics Laboratory 36:31–61.
Kori, Shiro. 1989a. Fookasu jitsugen ni okeru onsei no tsuyosa, jizokujikan, F0 no eikyoo [Acoustic manifestation of focus in Tokyo Japanese: The role of intensity, duration and F0]. Onsei Gengo 3:29–38.
Kori, Shiro. 1989b. Kyooshoo to intoneeshon [Emphasis and intonation]. In Nihongo no onsei/onʼin (joo) [Japanese phonetics/phonology (1)], vol. 2 of Koza Nihongo to Nihongo Kyooiku, ed. Miyoko Sugito, 316–342. Tokyo: Meiji Shoin.
Kori, Shiro. 1997. Nihongo no intoneeshon: kata to kinoo [Japanese intonation: Form and function]. In Akusento, intoneeshon, rizumu to poozu [Accent, intonation, rhythm, and pause], vol. 2 of Nihongo Onsei, ed. Tetsuya Kunihiro, 169–202. Tokyo: Sanseido.
Kubozono, Haruo. 1989. Syntactic and rhythmic effects on downstep in Japanese. Phonology 6:39–67.
Kubozono, Haruo. 1993. The organization of Japanese prosody. Tokyo: Kuroshio.
Kubozono, Haruo. 2007. Focus and intonation in Japanese: Does focus trigger pitch reset? In Working papers of the SFB632: Interdisciplinary studies on information structure (ISIS) 9, ed. Shinichiro Ishihara, 1–27. Potsdam, Germany: University of Potsdam.
Kuno, Susumu. 1973. The structure of the Japanese language. Cambridge, Mass: MIT Press.
Ladd, D. Robert. 1996. Intonational phonology. Cambridge: Cambridge University Press.
(p. 510) Langendoen, D. Terence. 1968. The London school of linguistics: A study of the linguistic theories of B. Malinowski and J. R. Firth. Cambridge, Mass: MIT Press.
Laniran, Yetunde, and G. N. Clements. 2003. Downstep and high tone raising: Interacting factors in Yoruba tone production. Journal of Phonetics 31(2):203–250.
Liberman, Mark, and Alan Prince. 1977. On stress and linguistic rhythm. Linguistic Inquiry 8(2):249–336.
Maeda, Kazuaki, and Jennifer J. Venditti. 1998. Phonetic investigation of boundary pitch movements in Japanese. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), 631–634. Rundle Mall, Australia: Casual Productions.
Maekawa, Kikuo. 1991. Perception of intonational characteristics of wh and non-wh questions in Tokyo Japanese. In Proceedings of the International Congress of Phonetic Sciences (ICPhS), 4/5:202–205. Aix-en-Provence, France: Université de Provence.
Maekawa, Kikuo. 1994. Is there “dephrasing” of the accentual phrase in Japanese? Ohio State University Working Papers in Linguistics 44:146–165.
Maekawa, Kikuo. 1997. Effects of focus on duration and vowel formant frequency in Japanese. In Computing prosody: Computational models for processing spontaneous speech, ed. Yoshinori Sagisaka, Nick Campbell, and Norio Higuchi, 129–153. New York: Springer Verlag.
Maekawa, Kikuo. 2003. Corpus of Spontaneous Japanese: Its design and evaluation. In Proceedings of the ISCA and IEEE Workshop on Spontaneous Speech Processing and Recognition (SSPR2003), 7–12. Tokyo.
Maekawa, Kikuo, and Yosuke Igarashi. 2006. 2-moora yuukaku joshi no inritsujoo no dokuritsusei: Nihongo hanashi kotoba koopasu no bunseki [Prosodic independence of bimoraic accented particles: Analysis of the Corpus of Spontaneous Japanese]. Journal of the Phonetic Society of Japan 10(2):33–42.
Maekawa, Kikuo, and Hideaki Kikuchi. 2005. Corpus-based analysis of vowel devoicing in spontaneous Japanese: An interim report. In Voicing in Japanese, ed. Jeroen van de Weijer, Kensuke Nanjo, and Tetsuo Nishihara, 205–228. Berlin: Mouton de Gruyter.
Maekawa, Kikuo, Hideaki Kikuchi, Yosuke Igarashi, and Jennifer J. Venditti. 2002. X-JToBI: An extended J_ToBI for spontaneous speech. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), 1545–1548. Boulder, Colo.: Center for Spoken Language Research.
Maruyama, Takehiko, and Miki Taniguchi. 2002. Bun no shooten koozoo to kyokushoteki takuritsu [Focus structure of sentences and local prominence]. Kansai Linguistics Society 22:18–28.
McCawley, James D. 1968. The phonological component of a grammar of Japanese. The Hague: Mouton.
McCawley, James D. 1970. Some tonal systems that come close to being pitch accent systems but don't quite make it. Papers from the 6th Regional Meeting of the Chicago Linguistic Society (CLS) 6:526–532.
McLemore, Cynthia Ann. 1991. The pragmatic interpretation of English intonation: Sorority speech. Doctoral dissertation, University of Texas, Austin.
Mesbur, James. 2005. Evidence for prominence-lending rise in focus phrases in Japanese. Ms., University of Pennsylvania, Philadelphia.
Miyaji, Yutaka. 1963. IV Intoneeshon [Intonation]. Kokuritsu Kokugo Kenkyuujo [National Institute for Japanese Language (NIJL, formerly NRLI)] Report 23, 178–208. Tokyo: Kokuritsu Kokugo Kenkyuujo.
(p. 511) Muranaka, Toshiko, and Noriyo Hara. 1994. Features of prominent particles in Japanese discourse. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), 395–398. Tokyo: The Acoustical Society of Japan.
Nagahara, Hiroyuki. 1994. Phonological phrasing in Japanese. Doctoral dissertation, University of California, Los Angeles.
Nagahara, Hiroyuki, and Shoichi Iwasaki. 1994. Tail pitch movement and the intermediate phrase in Japanese. Paper presented at the Linguistic Society of America (LSA) annual meeting, Washington, D.C., January.
Nagao, Kyoko, and Shigeaki Amano. 2000. A role of fundamental frequencies in the perception of emphasized words: Conference presentation abstract. Journal of the Acoustical Society of America 108(5):2465.
Nakanishi, Kimiko. 2007. Prosody and scope interpretations of the topic marker wa in Japanese. In Topic and focus: Cross-linguistic perspectives on intonation and meaning, ed. Chungmin Lee, Matthew Gordon, and Daniel Büring, 177–194. Dordrecht: Springer Verlag.
Ohishi, Shotaro. 1959. Purominensu ni tsuite: Tookyoogo no kansatsu ni motozuku oboegaki [Prominence: Observation of the Tokyo Dialect]. Kotoba no kenkyuu [Papers of the National Institute for Japanese Language], 87–102. Tokyo: Kokuritsu Kokugo Kenkyuujo.
Oshima, David Y. 2007. Boundary tones or prominent particles? Variation in Japanese focus-marking contours. Berkeley Linguistics Society (BLS) 31:453–464.
Peng, Shu-hui. 1997. Production and perception of Taiwanese tones in different tonal and prosodic contexts. Journal of Phonetics 25:371–400.
Pierrehumbert, Janet B. 1980. The phonology and phonetics of English intonation. Doctoral dissertation, MIT, Cambridge, Mass.
Pierrehumbert, Janet B., and Mary E. Beckman. 1988. Japanese tone structure. Cambridge, Mass: MIT Press.
Pierrehumbert, Janet, and David Talkin. 1992. Lenition of /h/ and glottal stop. In Papers in laboratory phonology II: Segment, gesture, prosody, ed. Gerard J. Docherty and D. Robert Ladd, 90–116. Cambridge: Cambridge University Press.
Poser, William J. 1984. The phonetics and phonology of tone and intonation in Japanese. Doctoral dissertation, MIT, Cambridge, Mass.
Prince, Ellen F. 1981. Toward a taxonomy of given/new information. In Radical pragmatics, ed. Peter Cole, 223–255. San Diego, Calif.: Academic Press.
Raymond, William D., Robin Dautricourt, and Elizabeth Hume. 2006. Word-internal /t, d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. Language Variation and Change 18:55–97.
Roberts, Craige. 1996. Information structure in discourse: Towards an integrated formal theory of pragmatics. OSU Working Papers in Linguistics 49:91–136.
Sagisaka, Yoshinori, and Hirokazu Sato. 1983. Secondary accent analysis in Japanese stem-affix concatenations. Acoustical Society of Japan Transactions of the Committee on Speech Research S83–05:31–37.
Selkirk, Elisabeth O., and Koichi Tateishi. 1991. Syntax and downstep in Japanese. In Interdisciplinary approaches to language: Essays in honor of S.-Y. Kuroda, ed. Carol Georgopoulos and Roberta Ishihara, 519–543. Dordrecht: Kluwer.
Sugahara, Mariko. 2002. Conditions on post-FOCUS dephrasing in Tokyo Japanese. In Proceedings of Speech Prosody, ed. Bernard Bel and Isabelle Marlien, 655–658. Aix-en-Provence, France: Laboratoire Parole et Langage, Université de Provence.
(p. 512) Takeda, Shoichi, and Akira Ichikawa. 1990. Analysis of prosodic features of prominence in spoken Japanese sentences. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), 493–496. Kobe, Japan: The Acoustical Society of Japan.
Taniguchi, Miki, and Takehiko Maruyama. 2001. Shooten koozoo to zyosi no takuritu [Focus structure and prominence of particles]. Kansai Linguistics Society 21:56–66.
Trubetskôi, Nikolâi Sergeevich. 1939. Grundzüge der Phonologie. (Travaux du Cercle linguistique de Prague No. 7.) Prague: Cercle linguistique de Prague. Translated 1969 by Christiane A. M. Baltaxe as Principles of phonology. Berkeley: University of California Press.
Vallduví, Enric. 1992. The informational component. New York: Garland.
Venditti, Jennifer J. 1997. Japanese ToBI labelling guidelines. Ohio State University Working Papers in Linguistics 50:127–162. (First distributed in 1995.)
Venditti, Jennifer J. 2000. Discourse structure and attentional salience effects on Japanese intonation. Doctoral dissertation, Ohio State University, Columbus.
Venditti, Jennifer J. 2005. The J_ToBI model of Japanese intonation. In Prosodic typology: The phonology of intonation and phrasing, ed. Sun-Ah Jun, 172–200. Oxford: Oxford University Press.
Venditti, Jennifer J., Kazuaki Maeda, and Jan P. H. van Santen. 1998. Modeling Japanese boundary pitch movements for speech synthesis. In Proceedings of the 3rd ESCA Workshop on Speech Synthesis, ed. Mike Edgington, 317–322. Jenolan Caves, Australia. New York: Institute of Electrical and Electronics Engineers.
Venditti, Jennifer J., and Marc Swerts. 1996. Intonational cues to discourse structure in Japanese. Proceedings of the International Conference on Spoken Language Processing (ICSLP) 2:725–728.
Welby, Pauline. 2003. Effects of pitch accent position, type, and status on focus projection. Language and Speech 46:53–82.
(1.) Following standard practice in the literature on Japanese phonology, we use an apostrophe following the vowel to mark the location of the akusento-kaku in the roman transliteration of any word that is lexically specified as containing a syllable that associates to this [H⋆+L] tone sequence. Also, we use | to indicate an AP boundary and || to indicate an IP boundary, when we need to highlight the distinction.
(2.) Recall that the Fujisaki model uses a square wave “accent command” to model both accented and unaccented phrases; in the unaccented case, the fall at the end of the square wave is obscured either by a following accent command or by cessation of voicing for a following silent pause.
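For readers unfamiliar with the Fujisaki model's mechanics, the following is a rough illustrative sketch only: the function names and parameter values are ours, not from any cited implementation, and the published model includes details omitted here. The model superimposes phrase-command impulse responses and square-wave accent commands in the log-F0 domain; the fall at the end of an accent command corresponds to the subtraction of a second, delayed step response.

```python
import math

# Illustrative constants (arbitrary; real analyses fit these per speaker).
ALPHA = 3.0   # phrase-control time constant (1/s)
BETA = 20.0   # accent-control time constant (1/s)
CEIL = 0.9    # ceiling on the accent step response

def phrase_response(t):
    """Impulse response of the phrase-control mechanism (zero before onset)."""
    return ALPHA ** 2 * t * math.exp(-ALPHA * t) if t > 0 else 0.0

def accent_step(t):
    """Step response of the accent-control mechanism, clipped at CEIL."""
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), CEIL) if t > 0 else 0.0

def log_f0(t, fb, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + phrase components + square-wave accent components.

    phrase_cmds: list of (onset_time, magnitude) impulses
    accent_cmds: list of (onset, offset, amplitude) square waves; the fall
    at the offset is the part that a following command or a pause can obscure.
    """
    val = math.log(fb)
    for t0, ap in phrase_cmds:
        val += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        val += aa * (accent_step(t - t1) - accent_step(t - t2))
    return val
```

Because the commands are summed in the log domain, an accented phrase and an unaccented one differ only in whether the onset-to-offset span of the square wave covers one accent fall or the whole phrase, which is why the two cases can look alike on the surface.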
(3.) Following the usual conventions in the literature on focus marking, we sometimes adopt the shorthand notation of writing the focus constituent in small caps in lieu of explicating the context.
(5.) The exceptions to this culminative distribution involve a handful of productive prefixes such as juʼn ‘quasi’ and moʼto ‘former’ (the so-called Aoyagi prefixes; see Aoyagi 1969), which retain their accent in addition to the accent on the base noun, as in moʼto-daʼijin ‘former minister’.
(6.) In text-only examples, we indicate a prominence-lending BPM by underlining the phrase-final syllable of the focus constituent, followed by the symbol ↑ (to represent the F0 rise that is realized on that syllable for [H%]) or ^ (to represent the rise-fall shape for [HL%]). The symbol ^ is also inserted after the penultimate syllable to represent a PNLP rise.
(8.) Compare the asymmetry in English between marking narrow focus on nonfinal constituents with an early nuclear accent versus the difficulty in marking narrow focus by nuclear pitch accent placement in utterance-final position (although it is possible to set off the final word with a preceding silence and by choosing L+H⋆ as the nuclear accent so as to produce a steep rise onto the peak that is associated with the accented syllable).
(10.) That is, even though AM accounts such as Pierrehumbert and Beckman's (1988) differ from Fujisaki's account (Fujisaki and Kawai 1988) in modeling the phrasal [H–] as an independent target from the [H⋆] of the accent fall, most previous AM accounts effectively model dephrasing as the formal equivalent of deletion of the whole accent command in the Fujisaki model.