Arabic Intonation

Abstract and Keywords

This overview of intonation in Arabic compares the intonational systems of selected Arabic dialects from Morocco in the West to Kuwait in the East. The formal comparison will mainly be carried out within the framework of autosegmental-metrical (AM) theory, taking the phonetic micro-prosody of the identified pitch accents as a tertium comparationis. Furthermore, the intonation systems will be compared with respect to prosodic phrasing. The second part of the overview is devoted to the functions of intonation in Arabic. In this section, the comparison will be based on a wider range of descriptions, including work carried out within other theoretical frameworks. The section will identify the role of metrical and tonal structures and the way they interact with syntax, information structure, and sentence mode in different varieties of Arabic. The concluding section will provide a preliminary typological picture of Arabic prosody with respect to the macro-rhythmic properties of Arabic.

Keywords: Arabic intonation, intonation, Arabic varieties, prosody, macro-rhythm, autosegmental-metrical theory

1. What Is Intonation and How Is It Described?

Intonation was described by Dwight Bolinger (1978) as a “half-tamed savage.” Contrary to the arbitrariness of segmental phonology, intonation is a largely iconic system that mostly follows natural mechanisms of speech production and perception (cf. Gussenhoven 2004). Its main functions are the expression of emotions and attitude. However, intonation also has an undeniable structural linguistic function. Probably the most important linguistic function is to structure the information conveyed by a speaker to facilitate cognitive processing on the part of the listener. Thus, phrasing breaks up the speech flow into manageable chunks that group together what belongs together semantically. Second, it is possible to foreground newsworthy or otherwise important information by accentuation (i.e., by rendering parts of the speech flow more prominent than others). A third frequently noted linguistic function of intonation is distinguishing between sentence types, such as statements, questions, and so on. This distinction is predominantly coded tonally, exploiting the up-and-down movements of pitch.

Similar to musical tunes in song, intonation constitutes the rhythm and melody of spoken language. The term intonation has been used to refer only to the tonal aspect with its acoustic manifestation of fundamental frequency variation (f0), or in a wider sense, covering rhythmic or accentual properties as well. Accents are frequently signaled by pitch, especially pitch change, and additionally by a range of other (supra-)segmental features such as duration, intensity, and vowel quality. In the present overview, the term “accent” will be used to refer to an actual prosodic prominence (cf. Fox 2000). In the currently prevalent approach to intonational phonology, the autosegmental-metrical (AM) approach (cf. Ladd 2008), word-level prominence, referred to as stress, is distinguished from sentence-level prominence referred to as pitch accent. Within that framework, these pitch accents are regarded as the building blocks of intonational contours.1

The present overview of Arabic intonation will predominantly refer to more recent work that has been conducted within an AM framework. In line with the basic tenets of Autosegmental Phonology (Goldsmith 1976), the AM framework of intonation assumes tonal events to be independent of the linear sequence of segments and formally represents them on a separate tier. Similarly, metrical structure (cf. Liberman and Prince 1977) is a non-linear representation of prosodic units into which the segmental material is grouped, constituting a prosodic hierarchy from the syllable up to higher levels such as the prosodic word and the prosodic phrase. Metrical structure also indicates strength relations between syllables, identifying, for example, the metrically strong syllable of a word (word stress) or of a phrase (sentence stress). As indicated by the term “AM,” an AM model consists of these two components. The tonal contour is analyzed as a sequence of pitch accents consisting of one or more tones, which can only have the values high (H) or low (L) (Pierrehumbert 1980). Pitch accents are associated with metrically strong syllables. In addition, there are edge or boundary tones that are usually not associated with metrically strong positions but with the edges of prosodic constituents. It is common practice in intonational descriptions to dispense with the separate notation of the metrical component and use the tonal description with reference to metrical structure. This is done by marking the tone related to a metrical prominence with a star (H*). The starred tone is the one associated with the metrically prominent syllable. Pitch accents can be monotonal or polytonal, usually bitonal, with a leading tone (LH*) or a trailing tone (H*L), depending on whether the unstarred tone precedes the stressed syllable or follows it. The cited pitch accents represent a rise and a fall, respectively. 
However, we will see later that neither are rises necessarily marked by a leading tone, nor falls by a trailing tone, nor is it necessarily the high tone that is marked by the star. This raises the question of what a starred tone really is (Arvaniti et al. 2000 and later work)—which again is an ongoing debate. Generally, notational conventions within the ToBI system (see note 2) are a language-specific matter. One way of identifying the starred tone is the exact alignment within the stressed syllable. Figure 1 is a schematic illustration of pitch accents in German as labeled according to the notational conventions of GToBI,2 the transcription system developed for German (Grice, Baumann, and Benzmüller 2005). A starred tone is assumed if it aligns within the stressed syllable, whereas a leading L tone usually aligns shortly before the beginning of the syllable (cf. especially 1c vs. 1d).3 The exact alignment of tones is again language-specific. In Egyptian Arabic (EA) the leading L tone consistently aligns with the beginning of the stressed syllable (cf. Hellmuth 2006).


Figure 1 Alignment of tones as the basis for pitch accent identification. The shaded area indicates the stressed syllable (from GToBI,
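The star notation described above can be made concrete with a small sketch. The following Python snippet is an illustration only and not part of any ToBI toolkit: the names `PitchAccent` and `parse_pitch_accent` are hypothetical, and the parser assumes simple labels built from H and L, with an optional `+` between tones, `!` as a downstep mark, and exactly one starred tone. Bracketed labels such as (LH)* are deliberately rejected.

```python
import re
from dataclasses import dataclass
from typing import List

# One tone: optional downstep mark "!", tone value H or L, optional star.
TONE = re.compile(r"(!?)([HL])(\*?)")


@dataclass
class PitchAccent:
    leading: List[str]   # unstarred tones before the starred tone
    starred: str         # tone associated with the stressed syllable
    trailing: List[str]  # unstarred tones after the starred tone
    downstepped: bool    # True if any tone carries the "!" mark


def parse_pitch_accent(label: str) -> PitchAccent:
    """Split a label such as 'LH*', 'H*L' or 'L+!H*' into its tones."""
    cleaned = label.replace(" ", "").replace("+", "")
    tones = TONE.findall(cleaned)
    # Reject anything the simple pattern does not fully cover, e.g. '(LH)*'.
    if not tones or "".join("".join(t) for t in tones) != cleaned:
        raise ValueError(f"unsupported label: {label!r}")
    starred = [i for i, (_, _, star) in enumerate(tones) if star]
    if len(starred) != 1:
        raise ValueError(f"expected exactly one starred tone in {label!r}")
    i = starred[0]
    return PitchAccent(
        leading=[tone for _, tone, _ in tones[:i]],
        starred=tones[i][1],
        trailing=[tone for _, tone, _ in tones[i + 1:]],
        downstepped=any(mark for mark, _, _ in tones),
    )


rise = parse_pitch_accent("LH*")  # leading L, starred H
fall = parse_pitch_accent("H*L")  # starred H, trailing L
```

Under these assumptions, LH* comes out with a leading L and H*L with a trailing L, which mirrors the rise and fall readings given in the text. The sketch says nothing about phonetic alignment, which the text treats as language-specific.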

In older pre-AM work on intonation, dealing almost exclusively with Indo-European languages, it has been observed that the final part of intonation contours has a high functional load. The British school of intonation (cf. Cruttenden 1997) identifies this part as the nucleus, which is regarded as the main element responsible for signaling the meaning of the contour (see section 3.3). This part of an intonational unit is described in most AM models in terms of the last pitch accent and the following edge tones: phrase tones/accents (L-, H-) and boundary tones (L%, H%). The intonational unit itself is called intonation phrase (IP). While the end of an IP is characterized by the combination of phrase and boundary tones, there is also a smaller unit recognized in standard AM models, the intermediate phrase (ip) (Beckman and Pierrehumbert 1986), which only has phrase tones, but no boundary tones. While the nucleus did not have a special status in Pierrehumbert’s (1980) original model, more recent developments of the Standard Theory have come to integrate this notion via the distinction between pre-nuclear and nuclear accents, ascribing to the nuclear accent a higher functional load (cf. Ladd 2008: 131–134).
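The division of labor between pitch accents, phrase tones, and boundary tones can be sketched in a few lines of Python. The function names below are hypothetical, and the example tune is made up for illustration rather than taken from any corpus. The sketch assumes that labels are distinguished purely by their diacritics (`*`, `-`, `%`), that the nuclear accent is simply the last pitch accent of the phrase, and that an IP ends in a phrase tone followed by a boundary tone while an ip ends in a phrase tone only.

```python
from typing import List, Optional


def classify_label(label: str) -> str:
    """Classify an AM label by its diacritic: '*', '-' or '%'."""
    if label.endswith("%"):
        return "boundary tone"
    if label.endswith("-"):
        return "phrase tone"
    if "*" in label:
        return "pitch accent"
    raise ValueError(f"not a recognised AM label: {label!r}")


def nuclear_accent(tune: List[str]) -> Optional[str]:
    """Return the last pitch accent of a phrase, i.e. the nuclear accent."""
    accents = [lab for lab in tune if classify_label(lab) == "pitch accent"]
    return accents[-1] if accents else None


def phrase_level(tune: List[str]) -> str:
    """'IP' if the tune ends in phrase tone + boundary tone, 'ip' if in a phrase tone only."""
    kinds = [classify_label(lab) for lab in tune]
    if kinds[-2:] == ["phrase tone", "boundary tone"]:
        return "IP"
    if kinds[-1:] == ["phrase tone"]:
        return "ip"
    return "unknown"


# A made-up tune for illustration only.
tune = ["LH*", "LH*", "H*", "L-", "L%"]
nucleus = nuclear_accent(tune)
level = phrase_level(tune)
```

In this toy tune the two LH* accents are pre-nuclear, H* is the nuclear accent, and the final L- L% sequence marks the unit as an IP.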

AM theory today is by no means monolithic (cf. Ladd 2008: 109). Not all AM models abide by the same tenets as the Standard Theory just outlined. Models differ regarding the level of phrasing assumed and the structure of pitch accents and edge tones. However, all AM approaches assume some version of tune-text association (Liberman 1979), associating tonal events with metrical positions. An example of a nonstandard approach is the ToDI4 annotation system (Gussenhoven 2005 and other work), which has neither an intermediate phrase nor, for that matter, phrase tones. Many pitch accents that are analyzed as LH* in the Standard Theory (in English, for instance) are described in ToDI as H*L, with a trailing tone instead of a leading tone, on perceptual grounds, which brings the analysis closer to the British tradition (Gussenhoven, p.c.). A standard model has, for instance, been used by Hellmuth (2006, and other work) for EA, by Chahal (2001) for Lebanese Arabic (LA), and by de Jong and Zawaydeh (1999) for Jordanian Arabic (JA). A nonstandard model for Arabic intonation has been developed by El Zarka (2011, 2013a) for the description of EA intonation.

The essay will consider the components of intonation in a bottom-up manner. The first part of this overview (section 2) deals with the phonology of intonation, comparing the proposed pitch accent inventories of Arabic varieties and discussing the micro-prosody of the accents, that is, the alignment of the individual tones (2.1). Section 2.2 will be devoted to prosodic structure. In the second part of this overview (section 3), the typical functions of intonation will be compared across Arabic varieties. Section 3.1 will discuss the syntactic functions of phrasing, and section 3.2 will deal with prominence patterns, pitch accent types, and their information-structural function. Finally, section 3.3 will deal with tunes and their linguistic correlates, such as sentence mode and theme-rheme.

2. The Form of Intonation in Arabic

The comparison is based on descriptions available in the literature carried out within different theoretical frameworks or different AM models. Comparing such heterogeneous material is by no means an easy task. The only way to provide a reasonably reliable comparison is to use the phonetic descriptions as a tertium comparationis. Of course, this is only possible when the authors make reference to the phonetic details of the phonological categories they identify. Furthermore, the descriptions available differ regarding the depth of the analysis: some are the result of comprehensive corpus studies, sometimes involving various methodologies, while others are only preliminary investigations. A third difficulty is that most of the works only cover some aspects of intonation. Consequently, it is not possible to choose a certain number of varieties and carry out a comprehensive comparison. Instead, the individual aspects will have to rely on different descriptions. For the micro-prosodic characteristics, we will predominantly carry out the comparison based on autosegmental analyses. For the functional comparison, analyses conducted in other frameworks will be considered as well.

2.1 Accent Type and Micro-Prosodic Variation

AM intonational analyses usually arrive at an intonational grammar of the language consisting of the types of pitch accents and edge tones identified. A simple comparison of the different grammars would, however, be seriously misleading as already mentioned. We will nevertheless start with a table of a number of pitch accent inventories suggested in the literature (Table 1), but then discuss the micro-prosody of the pitch accents in some detail and see what this tells us about the commonalities and differences between the Arabic varieties.

Table 1 Pitch accents and edge tones in six varieties of Arabic: Egyptian (EA), Jordanian Arabic (JA), Lebanese (LA), Emirati (EmA), Sanaani (SA), and Hijazi Arabic (HA)



| Variety | Source | Pitch Accents | Phrase Tones | Boundary Tones |
|---|---|---|---|---|
| EA | Rastegar-El Zarka (1997) (a) | H*L (with meaningful modification) | | |
| EA | Rifaat (2005) (a) | H, LH, HL, L | | |
| EA | Hellmuth (2006) | | H-, L- | L%, H%, |
| EA | El Zarka (2013a) | LHL (default accent with meaningful modification) | | |
| JA | de Jong and Zawaydeh (1999) | H*, L*, H+L* | H-, L- | H%, L% |
| LA | Chahal (2001) | H*, !H*, L*, L+H*, L+!H*, H+!H* | L-, H-, !H- | L%, H% |
| EmA | Blodgett et al. (2007) | H*/!H*, (LH)* | H-, L- | L%, H% |
| SA | Hellmuth (2014) | H*, L*, L+H*, L*+H, ?LH*L, ?H+H* | L-, H- | L%, H% |
| HA | Alzaidi (2014) | H*, L+H*, L* | | |



(a) The variety described is a formal register, that is, MSA (probably ESA) as pronounced by Egyptian speakers.

One look at the table reveals that EA is the variety of Arabic that has been treated most extensively within the AM paradigm. The studies of EA investigate different registers: a very formal register, Modern Standard Arabic (MSA) or Educated Spoken Arabic (ESA) according to Badawi’s (1973) classification, and colloquial Arabic (EA). Nevertheless, it can be assumed that the data do not differ significantly regarding the alignment properties of the (default) pitch accent, as shown by El Zarka and Hellmuth (2008).

Another look at Table 1 reveals the paucity of pitch accents identified for EA, especially as opposed to LA and SA. As already observed by Mitchell (1993), EA contours are characterized by a constant up and down in pitch, or in AM terminology, a dense pitch accent distribution, that is, a pitch accent on every (content) word (cf. esp. Hellmuth 2007b). Whereas the frequency of pitch events may partly explain the paucity of accent shapes (cf. Jun 2005, 2014), the fact that what is obviously phonetically alike5 is given diverse or even opposite analyses by different scholars is yet to be explained. Hellmuth’s (2006) analysis of the EA pitch accent refers to the tonal contour on the metrically strong syllable, based on the alignment of the tones in the vicinity of the stressed syllable. Hellmuth found that the L tone is stably aligned at the beginning of the stressed syllable, whereas the alignment of the H tone is more variable, but usually within a stressed heavy syllable and within the vowel of the second syllable in a sequence of light syllables, suggesting the second mora of the accentual foot as the landing site of the H tone (Hellmuth 2007a). Rastegar-El Zarka’s analysis was carried out in a Gussenhoven-style model, identifying the main pitch accent as falling (H*L) with the f0 peak stably aligned within the vowel of the lexically stressed syllable (Rastegar-El Zarka 1997: 236) and a more variable final L that is regarded as a kind of edge tone delimiting the tonal domain (Tondomäne). The observation of two L tones surrounding the peak (El Zarka and Hellmuth 2008) was the basis for El Zarka’s (2011, 2013b) assumption of a tri-tonal default accent of EA (cf. ex. 1, Figure 2) that accommodates grammatical material, such as function words, under the falling part of the contour. The latter may, however, occasionally project a tonal domain of their own.




Figure 2 A typical intonation contour of a declarative sentence read in a neutral style with a large number of inter-accentual syllables. The relevant portions containing the low level stretches are indicated by two-sided arrows.

(From El Zarka 2013a; the example is taken from the experimental data used in El Zarka and Hellmuth 2008).

Hellmuth’s (2006) study arrives at a phonological representation positing only one type of pitch accent, which is subject to variation before a boundary. Her account analyzes final downstep and pitch range compression as !LH*, assuming it to be an effect of final lowering (Hellmuth 2006: 70f.). The earlier peak of a final falling gesture and the non-realization of an L tone are regarded as tonal repulsion or undershoot effected by an upcoming pitch event or boundary (Hellmuth 2006: 78; Chahal and Hellmuth 2014: 392).

In Rifaat’s (2005) analysis of Egyptian MSA, the recurring pitch event is labeled H. In doing so, Rifaat seems to treat the flanking L tones as a kind of default tone like Gussenhoven’s analysis of Nubi word prosody (Gussenhoven 2006). In phrase-final position the typical pitch accent is analyzed as HL. LH and L are said to be only marginal. The L is used for the very rare cases of deaccenting. Boundary tones are said to be redundant. In its essence, Rifaat’s description of formal Arabic is very similar to Rastegar-El Zarka’s (1997).

El Zarka (2013a) suggests that the main function of the rising part across the stressed syllable is to lend prominence to the accented word and thus help single out the lexical items in the speech flow. It is thus an instance of word-level prosody rather than of post-lexical pitch that signals discoursal meanings. She therefore assumes the part that follows the accented syllable to be subject to meaningful variation and thus the most important part of the EA pitch accent, being the part that carries the functional load attributed to the edge tones in Hellmuth’s account.7 In her model, boundary tones are optional and the variation in accent shapes is attributed to phonological features (Gussenhoven 1983; Ladd 1983) like raised and lowered tones, earlier and later alignment, downstep and upstep. Regarding tonal variation, it turned out that the formal register (Rastegar-El Zarka 1997) shows less variation than the colloquial (El Zarka 2013a). The main difference between the standard AM models and El Zarka’s account, however, lies in the theoretical assumptions. Similar to Rifaat’s (2005) account, her model does not identify pitch accents as phonemes, but rather uses the two-tone notation as a type of broad phonetic transcription. El Zarka (2011, 2013a) assumes tonal gestures as intonational primes. Following Bolinger’s proposal that accents are themselves meaningful (cf. Bolinger 1986), she makes a distinction between leading and closing contours (similar to Bolinger’s A profiles and B profiles) and linking contours, thereby offering a unified treatment of individual pitch accents and phrasal contours as, for instance, in Figures 3 and 4. Despite the uncontroversial cases of tonal crowding that may be responsible for tonal undershoot in certain contexts, she proposes that the lack of a leading L may be due to two other sources, either the intention to produce a fall with an early peak or a high linking contour consisting of adjacent identically scaled H*s. 
Both instances are related to specific meanings, as shown in sections 3.2 and 3.3 (see Figures 3 and 4).


Figure 3 The EA utterances il-miʕza bitaʕit kamaal/ʔakalit il-fuul “Kamal’s goat ate the beans” as a topic-comment structure (panel a), with two successive rising accents associated with the topical subject, and with a rhematic subject (panel b), signaled by two falling accents (from El Zarka 2011).


Figure 4 The utterances il-bajjaʕiin hinaak/bijġallu kullǝ ḥaaga “The vendors there sell everything expensive” as a topic-comment structure with a rising topic phrase and a falling rheme (from El Zarka 2011).

(Soundfile: ElZarka-Audio 1)

In sum, we may conclude that the seemingly different analyses of EA pitch accents are predominantly the result of different theoretical assumptions. However, all accounts unanimously state that almost every content word carries an accent in EA and that the pitch accent inventory is fairly small. The analyses also suggest that there is variation at phrase boundaries or, if El Zarka’s observations are correct, even within phrases. Based on the EA facts we may now proceed with the comparison between EA and other Arabic varieties.

As can be seen in Table 1, the pitch accent inventory proposed by Chahal (2001) for LA is significantly larger than that of EA, suggesting more intonational variation within phrases. The model adopted by Chahal could be classified as a standard model, and thus may be safely compared to that of Hellmuth (cf. Chahal and Hellmuth 2014 for such a comparison). One specific difference is the distinction between monotonal high accents H* and bitonal LH*/L*H (cf. Figure 1b–d for an illustration) in LA. While Chahal (2001) argues that the distinction between the monotonal and the bitonal accent is in fact categorical, she treats the two rising accents as allophonic. Another conspicuous difference between the systems of the two varieties is the assumption of downstepped variants of both monotonal and bitonal accents and a downstepped phrase tone in LA. Finally, the LA tone inventory also includes an L*. Some of the surface differences between EA and LA vanish if we only look at the final accents, even under Hellmuth’s analysis. For example, the frequent occurrences of final downstep could be attributed to final lowering in a Hellmuth-style analysis. However, Chahal specifically mentions that all her pitch accent types may occur in nuclear and pre-nuclear positions. Comparing the LA pitch accent system to the results of El Zarka’s (2013a) study further diminishes the discrepancy in tonal “inventory.” In addition to the prevalent feature of downdrifting ups and downs, El Zarka (2013a) notices the occurrence of terraced downstepped accents in EA as well. She also identifies phrase-medial high linking contours (cf. Figure 5), which can be analyzed as involving H* accents. Finally, L* tones occur in her data, however as a level configuration only phrase-finally, analyzed as a type of downstep. The second occurrence of L* is in a rising configuration in final or pre-final position. 
A low flat contour at the beginning of an IP that is usually associated with function words is also attested in EA. Complete deaccenting also occurs after narrow focus and occasionally on given items (Figure 6). Considering that most tonal choices identified for LA also occur in EA, the difference may not be a difference in pitch accent inventory in the first place, but rather a difference in the frequency of use of the pitch accents. The obvious difference between LA and EA pertains to the frequency of flat linking contours, especially when accompanied by deaccentuation, which is conspicuously higher in LA than in EA. The high frequency of bitonal pitch accents is what lends EA its characteristic strong macro-rhythm8 (Jun 2014). Juba Arabic, an Arabic-based creole analyzed as a pitch accent language by Manfredi and Tosco (2014), also exhibits that kind of strong macro-rhythm. Similarly, Alzaidi (2014: 337) notes that Hijazi Arabic (HA) has a strong macro-rhythm, since all content words have a pitch accent and the number of pitch accents is small.


Figure 5 A high linking contour in EA (corpus El Zarka, natural conversation).


Figure 6 Deaccenting of a given expression after narrow focus in EA (43ARZ-B07FCT-02F1000 from the experimental data of the D2 project (SFB 632 Information structure)).

Emirati Arabic (EmA), although it is described as having a pitch accent on every word, is characterized by flat contours (Blodgett et al. 2007), suggesting that the typical pitch accent is monotonal (Figure 7). The bitonal accent suggested for EmA is (provisionally) labeled (LH)* by the authors because the alignment of tones with the stressed syllable is variable. These facts are reminiscent of the situation in LA as noted by Chahal (2001), suggesting that there is most probably no meaningful difference between the two rising accents.


Figure 7 Intonation contour of Emirati Arabic (from Blodgett et al. 2007: fig. 1).

(Soundfile ElZarka-Audio 2)

A comparison with the pitch accent inventory of Sanaani Arabic proposed by Hellmuth (2014) shows a higher variability in pitch accents, especially when compared to EA. Hellmuth (2014: 76) analyzes LH* and L*H as independent phonological choices, the first of which is observed only in nuclear position, arguing that LH* systematically occurs in polar questions, although she notes that the first could be analyzed as a distributional variant of the latter in pre-boundary position. These facts suggest that SA differs from EA in the default (most frequent) rising accent, which is L*H rather than LH* as in EA. Thus, SA may be classified as an L*-language and EA as an H*-language according to Hedberg and Sosa’s proposal (2008: 119). Within nuclear accents, the default seems to be H* or !H*, which are not taken to be distinct phonological categories. Hellmuth does not mention, however, whether the downstep variant is like the final-lowering case she described as !LH* in EA.

The analyses of HA (Alzaidi 2014) and of JA (de Jong and Zawaydeh 1999) have only been carried out on very limited experimental data.9 Despite the differences in phonological analysis, the main pitch accent shape in HA and JA seems largely the same as the one described for EA.

A conspicuous difference between SA and all other varieties discussed so far is the frequent use of low level contours after initial strong prominences (e.g., on a question word) analyzed as a sequence of L* tones. The use of L* tones for the low level stretch implies that SA is less likely than LA to deaccent longer sequences. Hellmuth (2014: 80) notes that in read speech a pitch accent may be found on every content word, but that, contrary to EA (Hellmuth 2011), repeated words are routinely deaccented in narratives and conversational speech. Obviously, the difference may once again be a statistical tendency rather than a categorical phenomenon. Even though other studies do not identify a default accent, it is possible to extract similar alignment facts from the accent descriptions of various other varieties.

All varieties of Arabic described so far are stress-timed (Watson 2011) and head-prominent (i.e., the lexically stressed syllable is the one that carries the pitch accent), which also lends some prominence to the syllable and the whole word it is associated with. In an interesting paper, Rammuny (1989) calls rhythm in JA “word-stress timed,” thus acknowledging that JA tends to have a pitch accent on every word. Let us now look at a variety with a different type of prosody. In Moroccan Arabic (MA), as described by Benkirane (1998, 1999–2000), for instance, the different cues to prominence are strangely dissociated. According to Benkirane (1998: 352, 1999–2000), a word-final sequence CVC(V) is the domain for stress in MA, yielding ’CVC and ’CVCV. Moroccan Arabic intonation contours crucially lack the peaks and troughs characteristic of other varieties, especially EA. It is the phrase-final syllable or the last two syllables that carry the single noteworthy pitch event of an IP (Benkirane 1998: 361). In declarative utterances, the pitch contour is a rise-fall that could be notated as LHL in AM notation and that stretches over the final CVC(V) sequence. Continuity is signaled by a high rise covering the final syllable only. As the spontaneous speech data analyzed by Maas (ms.) suggest, a final word may frequently be associated with the falling or low part of the contour, whereas the rise or high tone is associated with the syllable before it (i.e., the final syllable of the penultimate word as in Figure 8). This is what lends Maghrebi Arabic its peculiar rhythmic and tonal characteristics that are strange to the ears of speakers of Eastern varieties.


Figure 8 Intonation contour of an MA declarative sentence (F.92.01: 207 from Maas, ms.).

(Soundfile ElZarka-Audio 3)

Benkirane (1999–2000: 90) notes that word accent and intonational pitch are separate phenomena in MA, the first one being on the penultimate syllable and the second one on the final syllable. This suggests that MA probably lacks word stress altogether (Maas, ms.). Pitch in this variety seems to be purely post-lexical, occurring at the right edge of a phrase. If this analysis is correct, MA is an edge-marking language (Jun 2005). In any case, the characteristic off-beat rhythm (if we assume penultimate stress with Benkirane) together with extremely high final rises is what makes Moroccan intonation strikingly different from Eastern Arabic.

2.2 Prosodic Structure

As noted in the introductory section,10 AM accounts of intonation usually assume a prosodic structure that is built up hierarchically in a bottom-up manner: syllable, foot, prosodic word (PW), prosodic (or intermediate) phrase (PP or ip), and intonation phrase (IP). In a revision of the original theory (Pierrehumbert 1980), Beckman and Pierrehumbert (1986) introduced the intermediate phrase (ip) to account for weaker intonational boundaries and for tonal phenomena that posed problems for the theory, such as the suspension of downstep.11 When two levels of intonational phrasing are assumed, tonal contours following the pitch accent are attributed to phrase tones. AM descriptions of Arabic, in line with the revised Standard Theory, usually posit two levels of intonational phrasing (de Jong and Zawaydeh 1999 for JA; Chahal 2001 for LA; Hellmuth 2006 for EA). The prosodic constituent hierarchy as assumed in Chahal’s work is depicted in Figure 9. In LA and EA, as in stress-accent languages in general, the PW is generally assumed to be the domain of stress assignment. Hellmuth (2007b) additionally assumes the PW to be the level of pitch accent assignment. In LA, the ip is the level of nucleus placement, and the distinction between ip and IP is thus based on prominence relations and, as noted earlier, tonal phenomena. Chahal (2001) also notes different amounts of pre-boundary lengthening as further evidence for her analysis. For EA, Hellmuth (2011) also proposes two levels of phrasing above the PW,12 MaP (equivalent to ip) and IP, both of which are said to be the domain of downstep. Contrary to LA, Hellmuth neither found consistent lengthening effects nor obligatory edge tones in the MaP of EA. 
Earlier, Hellmuth (2004) had proposed an additional constituent, the Minor Phrase (MiP)13, between the PW and the MaP for rhythmic reasons, suggesting that a MaP is minimally composed of two MiPs, which in turn are minimally composed of two PWs.14 The lack of obligatory phrase tones suggests that these two constituents constitute the domains of prominence distribution and rhythmic adjustment rather than tonal specification by edge tones, thus being comparable to Rastegar-El Zarka’s (1997) Prosodic Phrase (PP). The approach employed by Rastegar-El Zarka (1997) and El Zarka (2013a) separates metrical and tonal structure by assuming a PP as the domain of rhythmic adjustment, however without phrase tones, and an IP with optional boundary tones. Like Hellmuth’s MiP and MaP, the PP exhibits partial reset between phrases, that is, the height of the first peak is approximately equivalent to the height of the final peak in the preceding PP, with the option of full reset (to the level of the utterance-initial peak) in the final PP. Rastegar-El Zarka also observes that PPs frequently consist of two prosodic words and that in highly rhythmical speech an IP may be split up into several PPs. As a matter of fact, Hellmuth’s analyses were based on experimental laboratory speech consisting of highly unnatural long sentences, and Rastegar-El Zarka’s observations were made in a corpus of formal pre-planned or read speech (MSA/ESA) from television broadcasts. Both types of data invite a very rhythmical, mechanistic articulation, promoting the occurrence of shorter rhythmical phrases. As El Zarka (2013a) notes, the PP only plays a minor role in spontaneous conversational data, where IPs are often significantly shorter and definitely more expressive and therefore clearly less rhythmic, although long IPs may also be found in conversational data (Hellmuth 2006, 2011).
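The minimality condition attributed to Hellmuth (2004), under which a Major Phrase contains at least two Minor Phrases and each Minor Phrase at least two prosodic words, can be expressed as a simple check on a nested structure. The Python sketch below is an illustration under stated assumptions: phrases are modeled as plain nested lists, prosodic words are placeholder strings (`w1`, `w2`, and so on), and the function names are hypothetical. It models only the size condition, not downstep, reset, or edge tones.

```python
from typing import List

PW = str         # a prosodic word, represented here only by a placeholder label
MiP = List[PW]   # Minor Phrase: a sequence of prosodic words
MaP = List[MiP]  # Major Phrase: a sequence of Minor Phrases


def is_minimal_mip(mip: MiP) -> bool:
    """A Minor Phrase must contain at least two prosodic words."""
    return len(mip) >= 2


def is_minimal_map(major: MaP) -> bool:
    """A Major Phrase must contain at least two well-sized Minor Phrases."""
    return len(major) >= 2 and all(is_minimal_mip(mip) for mip in major)


# Placeholder structures for illustration only.
well_formed = [["w1", "w2"], ["w3", "w4"]]  # two MiPs of two PWs each
too_small = [["w1", "w2"], ["w3"]]          # second MiP has a single PW
```

Under these assumptions the smallest well-formed Major Phrase contains four prosodic words, which is consistent with the observation that such phrasing emerges mainly in long, rhythmically delivered sentences.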

One important assumption about the prosodic hierarchy concerns segmental sandhi phenomena that are ascribed to the different prosodic domains. For EA and SA, Watson (2002) describes a number of prosodic processes which are sensitive to different prosodic domains, noticing the optionality of many of them, also in terms of the domain of their application. The well-known rule of phrasal epenthesis in EA that inserts a vowel between C2 and C3 in a sequence of three consonants obviously applies across PP boundaries (Rastegar-El Zarka 1997; Hellmuth 2011).

As regards prominence relations within a phrase, it has mostly been assumed that an IP in Arabic has a strong right edge, manifested in the nuclear accent (Chahal and Hellmuth 2014). Chahal ascribes this property to the ip in LA, noting that prominence relationships between individual nuclear accents within an IP are unclear (Chahal and Hellmuth 2014: 368). Similarly, in Hellmuth’s account of EA, phrases are analyzed as right-headed. Hellmuth however remarks that the nuclear accent (i.e., the main prominence) is not easily characterized phonetically (372) in broad focus cases. As already noted, the main difference between EA and LA is that non-final nuclear accents in cases of narrow focus are followed by postnuclear accents in EA, whereas post-nuclear stretches within the ip are deaccented in LA. Furthermore Chahal and Hellmuth (2014) note that while LA distinguishes three levels of prominence (lexical stress, pitch accent, and nuclear accent), lexical stress and pitch accent are conflated in EA, yielding only two levels of prominence.

Figure 9 Prosodic hierarchy of LA (from Chahal and Hellmuth 2014: 371, fig. 13.1).

Another view is expressed by Rastegar-El Zarka (1997) who suggests that the main prominence of a phrase (PP and IP) may be either left or right, the right edge being the more frequent option. In El Zarka (2013a, b), she abandons the idea of an obligatory nucleus altogether, suggesting that marking edges by prominence is optional in EA. Differentiating between the metrical level (PW, PP) and the tonal level, El Zarka (2013a) suggests that pitch accents in EA may indeed be analyzed as Accentual Phrases (AP)15 as suggested for many (pitch accent) languages (cf. Jun 2014 for examples). In the default case, an AP covers all syllables starting from the lexically stressed syllable of a PW to the beginning of the following PW.

As this small-scale comparison between two varieties shows, difficulty in comparing phrasing across Arabic varieties arises even when the descriptions have been carried out within more or less the same framework. This is due to different assumptions about what constitutes a prosodic constituent, which frequently have theory-internal motivation. We will therefore not try to compare prosodic constituency among other varieties that have been described in a completely different framework. We will nevertheless refer to them in section 3.1, which discusses the functions of prosodic phrasing.

3. The Function of Intonation in Arabic

3.1 Prosodic and Syntactic Phrases

The fact that prosodic boundaries frequently coincide with syntactic constituent boundaries is a truism, syntactic phrases being the formal expression of semantic constituents. In generative accounts, prosodic phrasing is usually regarded as the “spell-out” of syntactic structure. It is, however, acknowledged that prosodic structure and syntactic structure are not always isomorphic. Since generative work is concerned with speaker competence and not with actual performance, most generative linguists predominantly base their analyses on individual (read) sentences. This facilitates equating syntactic phrases with prosodic phrases. Thus, Chahal and Hellmuth (2014) state that the IP is co-extensive with a “syntactic sentence” in LA, and a root clause in EA, whereas the ip (MaP in EA) corresponds to a syntactic phrase or XP. Hellmuth (2016) summarizes the few experimental studies that deal with the syntax-prosody interface in Arabic, all of them on EA. She states that phrasing is variable across speakers, but that general patterns can be observed. For instance, there is a general tendency for most EA speakers not to produce a phrase boundary after the subject in SVO sentences, unless the subject is heavy. Interestingly, in a pilot experiment with a parallel corpus of EA and JA, Hellmuth found that Jordanian speakers routinely produced a phrase break, even after light subjects (cf. Chafe 1994).

An informal count in a spontaneous speech corpus (El Zarka 2013a) yielded only a handful of cases where the subject was separated from the following verb by a phrase break, which supports Hellmuth’s experimental findings for EA.16 Most cases were found in the narratives revolving around a number of different characters, and the phrase breaks were used as a means of highlighting topic shifts.

The only empirical study we are aware of that specifically investigates prosody-syntax correspondences is Alharbi’s (1991) study of conversational urban Kuwaiti Arabic (KA), framed in a Hallidayan approach. Alharbi notes that an IP most frequently corresponds to sub-clausal elements/constituents, followed by single-clause constructions, and less frequently to more than one clause. The final category also contains a combination of one clause plus elements of a second one. Some noteworthy generalizations emerge from the study concerning the preferred phrasing in relation to the syntactic constituents: (i) over 95% of IPs are co-extensive with a grammatical constituent (Alharbi 1991: 180); (ii) there is a tendency of verbs to be IP-initial, subjects IP-medial, and objects and adverbs IP-final;17 (iii) phrases with only one syntactic constituent are most frequent among the sub-clausal phrases; and (iv) adverbial phrases are most likely to constitute a prosodic phrase of their own, followed by objects, whereas subjects are more likely to be phrased together with the verb.

Based on Ingham’s (1994) description of Najdi Arabic, Ingham (2010) and Holes (2010) distinguish two basic sentence types, the “uninodal” type which is predominantly verb-initial and the “binodal” noun-initial type for Najdi Arabic and Gulf Arabic, respectively. The distinction is found to be related to information structure: the uninodal sentence is assumed to express new information, and the first part of the binodal sentence can be identified with the topic of a topic-comment structure. While the uninodal sentence is articulated in one IP, the binodal type is usually divided into two phrases, frequently separated by a pause.

Obviously, phrasing is caused by multiple factors, syntax being only one of them. Other factors are speech rate and information structure (see section 3.2), and more general rhetorical preferences of individual speakers. It would nevertheless be interesting to see whether such differences as observed by Hellmuth (forthcoming) between EA and JA can robustly be identified between Arabic varieties.

3.2 Prominence Patterns, Pitch Accent Type, and Information Structure

One of the main linguistic functions of intonation is discoursal. In this section, we will look at the intonational correlates of information structure, assuming two basic dichotomies, following El Zarka (2013a): focus and background and theme and rheme. While focus and background serve to highlight or downgrade parts of the information, theme or topic and rheme18 are pragmatic relations (Halliday 1967; Lambrecht 1994; El Zarka 2013a, b), that is, the theme is what the sentence is about or serves as an anchoring point for the proposition, whereas the rheme is the newsworthy, assertive part of the information. A special case where rheme and focus coincide is called narrow focus. Under narrow focus only one constituent constitutes the new information, as, for example, JOHN ate the cookies, answering the question: Who has eaten the cookies? In spontaneous speech, such answers usually do not repeat the presupposed information. If they do, however, the linguistic constituents are frequently deaccented (i.e., the only pitch accent is used on the focal constituent).

It has been proposed that only some languages mark narrow focus by intonation, so-called “plastic-accent languages,” whereas other languages, the “non-plastic accent languages,” make use of word order instead (Vallduví 1992). Although Arabic varieties clearly have flexible word order (clefting, right and left dislocation are certainly commonplace in all Arabic varieties), it seems that narrow focus may also be realized in situ like in English. Mitchell (1993: 230) notes that Arabic (obviously in general) shares with English the possibility of changing the location of the nucleus in an unchanged word order. Mitchell’s suggestions are clearly supported by Rammuny (1989) for JA, by Chahal (2001) for LA, by Alharbi (1991) for KA, and by El Zarka (2013a, b) for EA. However, while Chahal’s and Rammuny’s studies suggest that there is complete deaccenting after the early narrow focus in LA and JA, post-focal accents are not completely suppressed in EA, but only downgraded. Thus, downgrading seems to be the EA equivalent (probably also that of other Arabic varieties, e.g., Hijazi Arabic) of complete deaccenting in West-Germanic languages, since in EA, word prominence (almost) always involves a certain measure of pitch change (El Zarka 2013a, b).

As regards information status (i.e., whether the denotatum of a word is given or new in discourse), it has been claimed by Hellmuth (2010) that the given/new distinction does not have prosodic reflexes in EA due to the non-deaccenting phonology of the language. El Zarka (2013a) argues that given items may be downgraded and even deaccented as long as they are in the presupposition, but not if they are part of the rheme. Deaccenting of given concepts within the rheme is a typical feature of English (cf. Ladd 2008). Mitchell (1993: 240), citing El Hassan (1990), gives an example from JA that suggests that this variety allows post-focal deaccenting within the rheme: Ɂana BAḤIBB ilfawaakih “I LIKE fruits” as an answer to taakul tuffaaḥ? “Would you like (to eat) some apples?” with a flat monotone on fawaakih, being semantically given due to its being a hyperonym of “apple.” For EA, El Zarka (2013b) claims that there is a difference in prominence between final downstepped accents on time adverbials which are auditorily less prominent than the preceding accent, if they are not narrowly focused. Thus, the difference between EA and other varieties seems to be gradient rather than categorical. El Zarka (2013a) also cites many examples that suggest that the location of the main prominence is not fully predictable. A similar claim has been made by Alharbi (1991) for KA.

It is commonly assumed that the focused constituent carries the nucleus19 of a contour. While this is probably uncontroversial in narrow focus, it is not so clear when the focus is broad, constituting a larger rhematic domain. Thus, El Zarka (2013a, b) claims that in many cases a “neutral” all-new sentence will not exhibit any special prominence in EA. As Alzaidi (2014) demonstrates, neutral contours in Hijazi Arabic show a clear prominence peak on the penultimate word,20 but he also noticed a fair number of declining pitch accents with no clear prominence distinctions among them, suggesting that a nucleus is optional also in HA.

Concerning the phonetic realization of this stronger prominence, it is usually assumed that the accent of a narrow focus (at least if contrastive) is marked by expanded pitch range (Rammuny 1989; Chahal 2001; Hellmuth 2006, 2010; Yeou et al. 2007; Alzaidi 2014). However, the acoustic correlates of the focus accent may vary, both between languages and between speakers of the same language. Thus, not all speakers use pitch range excursion as a strategy in EA (El Zarka 2013a; Cangemi et al. 2016). In a cross-dialectal study of contrastive focus in three vernaculars (MA, Yemeni Arabic (YA), KA), Yeou et al. (2007) found that in MA focus accents exhibited very large pitch excursions, followed by KA, whereas the difference in excursion size between contrastive and non-contrastive focus was not significant in YA. While in all varieties, post-focal material was downgraded, only MA also exhibited a totally flat contour in the pre-focal syllables, suggesting that MA resorts to complete deaccenting. Durational effects of focus have also been reported for all varieties, however, with differing magnitude (Yeou et al. 2007). Also in HA, postfocal pitch accents are not deleted, but only compressed (Alzaidi 2014).

Finally, El Zarka (2011, 2013a,b) suggests that in addition to prominence distinctions brought about by compression of post-focal pitch range, EA uses a special accent shape to convey narrow focus. While she ascribes prominence differences to the focus-background partition, she proposes that the rhematic contour is typically falling (see section 3.3). She identifies two types of focal accents: the closing accent ((rising-)falling) signifying assertion and the leading accent (high rising), used pre-pausally, signifying focus plus continuity. Yeou et al. (2007) report a high rising variant of the focus accent for KA.

El Zarka (2013a) demonstrates that under narrow focus, EA LHL-accents exhibit an earlier alignment of the second L-tone and, perhaps due to tonal repulsion, an earlier peak as well (Figure 4b). For Moroccan Arabic, Benkirane (1998) reports an LHL contour associated with the final word of an utterance, with a low level stretch on unstressed syllables and a rise-fall starting from the stressed syllable to the end of the word. He observes that the peak is at the end of the stressed vowel if the stressed syllable is followed by another open syllable and within the vowel if the stressed syllable is final, in which case the duration of the syllable is increased (352). These facts have been corroborated in an experimental study by Yeou (2005) who found that not only syllable structure and duration but also focus affect the alignment of the peak.

3.3 Tunes, Sentence Mode, and Information Structure

One especially salient intonational difference is the distinction between declarative and question intonation. In this section, we will give an overview of a number of sentence mode types and their preferred intonational realization in Arabic varieties, as described in the literature.21

3.3.1 Declarative Tune

The most common declarative tune in most varieties is a gradually declining contour involving pitch accents on every content word (Alharbi 1991; Rifaat 1991; Mitchell 1993; Rastegar-El Zarka 1997; Chahal 2001; Hellmuth 2006; Alzaidi 2014), exhibiting what Mitchell (1993: 222) called the “see-saw effect.” An example from EA is shown in Figure 2. The “neutral” declarative sentence is mostly assumed to be “all-new,” uttered out-of-the-blue. However, as sentences usually occur in context, the typical declarative may also be a topic-comment structure, if the topic is highlighted constituting a theme (El Zarka 2013a) as in Figure 4. Such statements have a rising-falling shape, encoding the bipartiteness of the information structure (cf. Ingham’s “binodal” sentence type). Alharbi (1991) also notices the rare occurrence of such contours in KA. In MA, the rising-falling articulation seems to be quite frequent (Benkirane 1998), usually involving two phrases. According to Benkirane (1998: 355), the declarative utterance, however, consists of a level stretch until the nuclear syllable which is characterized by a rise-fall. In both cases, the MA declarative lacks the conspicuous downtrend of the other varieties (355). The rising-falling pattern, a pointed hat, has been found in LA short declaratives. Probably due to the stronger deaccenting nature of LA, declaratives in this variety may also be characterized by the “hat pattern” typical of West-Germanic languages (Chahal and Hellmuth 2014: 377), that is, a rise-level-fall contour (Figure 10).22 The hat pattern is probably also characteristic for Syrian Arabic (Ghazali et al. 2007).

Figure 10 A typical hat pattern contour in LA (from Chahal and Hellmuth 2014: 378, fig. 13.3).

3.3.2 Continuity

Incompleteness, as indicated by rises or level contours, may not only characterize the topical or initial part of a sentence, but multiple parallel clauses or constituents. Whereas the falling declarative contour is typical for a statement in isolation, indicating finality, actual discourse often comprises a series of IPs with a final rise followed by an IP with a final fall.

Continuity may be signaled in Arabic varieties by a final rise or a rising-plateau contour. The first contour has been identified in EA (Rifaat 2005; Hellmuth 2006; El Zarka 2013a), LA (Chahal 2001; Chahal and Hellmuth 2014), JA (Rammuny 1989), and MA (Benkirane 1998, 2000); the (rise-)level occurs in EA (Rastegar-El Zarka 1997; El Zarka 2013a), LA (Chahal 2001; Chahal and Hellmuth 2014), JA (Rammuny 1989; de Jong and Zawaydeh 1999), and MA (Benkirane 1998). Rastegar-El Zarka (1997) notes that in EA the phrase-final level contour has a slightly falling variant, a rise-plateau-slump contour, which is especially common in lists.

El Zarka (2013a) identifies high rises in continuation contexts. She argues, however, that this rise not only indicates continuation, but at the same time involves a focus accent, the large pitch excursion being a correlate of prominence (section 3.2). More generally, the rise as opposed to the rise-plateau seems to be more marked and to indicate some measure of contrastiveness as well.

3.3.3 Questions

It is well established that questions do not necessarily have rising intonation. A major distinction has to be drawn between questions that are morphosyntactically marked, in the first place wh-questions, and unmarked, so-called declarative questions. Unsurprisingly, only the latter type has to be marked differently from statements by intonation. Polar questions with a question particle like hal (MSA) or waʃ (MA) have also been reported to be of the rising type in Arabic.

Wh-questions

In most Arabic varieties, the question word occupies the initial slot of a sentence. As the question word is also commonly regarded as the focus of the utterance, it is usually associated with a strong prominence, frequently the nuclear accent of the whole phrase. Most descriptions of Arabic note a falling contour over the whole utterance. Thus, the whole tune is similar to the English wh-question tune. The interesting question, however, concerns the prominence of the question word and the declination of the trendline of the whole phrase. Ibrahim et al. (2001) suggest a difference in upper and lower trendlines (connecting the peaks and troughs of the tune) between declaratives and wh-questions. According to their calculations the upper and lower trendlines decline in parallel in statements, whereas in wh-questions the upper trendline declines and the lower trendline rises. This observation may account for a compressed pitch range in post-focal position, if the question word expresses narrow focus. Additionally, the upper trendline falls from a higher level, exhibiting a steeper fall than in declarative utterances. This last observation accords with Rastegar-El Zarka’s (1997) findings in MSA/ESA. This author, however, claims that in “unemphatic”23 wh-questions the first accent is realized in a higher register with a gradual fall to the end of the utterance. A nuclear focus accent on the question word is indicative of an echo-question, with special emphasis on the question word. A theoretical question connected with these observations is whether the different trendlines observed require a descriptive phrasal component as suggested in the so-called overlay models that treat accents as local events “riding” on a pre-defined phrase (cf. Ladd 2008: ch. 1.3.6 for a discussion). El Zarka (2013a) suggests that the tone-sequence notation of AM theory can readily account for the EA facts if the notation includes a device for describing the scaling of the individual tones in relation to the preceding tone and a certain default.
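
The trendline comparison described in this paragraph can be illustrated with a minimal sketch. It assumes that the upper trendline is a least-squares line through the F0 peaks and the lower trendline a least-squares line through the F0 troughs; the cited studies may have computed their trendlines differently. The function names, the category labels, and all F0 values are hypothetical and serve illustration only.

```python
def fit_slope(points):
    """Least-squares slope (Hz per second) through (time, f0) points.

    Expects at least two points with distinct time values.
    """
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_f = sum(f for _, f in points) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den


def trendline_pattern(peaks, troughs):
    """Label the joint behavior of the upper (peak) and lower (trough) trendlines.

    The three labels are illustrative categories, not terms from the literature.
    """
    upper = fit_slope(peaks)
    lower = fit_slope(troughs)
    if upper < 0 and lower < 0:
        return "both declining"   # pattern reported for statements
    if upper < 0 and lower > 0:
        return "converging"       # pattern reported for wh-questions
    return "other"


# Invented (time in s, F0 in Hz) measurements, for illustration only.
statement_peaks = [(0.2, 220), (0.8, 200), (1.4, 180)]
statement_troughs = [(0.5, 160), (1.1, 140), (1.7, 120)]
wh_peaks = [(0.2, 260), (0.8, 210), (1.4, 170)]
wh_troughs = [(0.5, 140), (1.1, 150), (1.7, 160)]

print(trendline_pattern(statement_peaks, statement_troughs))
print(trendline_pattern(wh_peaks, wh_troughs))
```

In this sketch a rising lower trendline combined with a falling upper trendline narrows the pitch range toward the end of the utterance, which is one way of picturing the post-focal compression mentioned above.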

EA differs from other Arabic vernaculars in that the question word frequently occurs in situ (Woidich 2006: 358). Thus, if the constituent asked for is in object position, the neutral position of the question word is non-initial, but even wh-words asking for a subject usually occur post-verbally as, for example, in ḥaṣal ’ē? “What happened?” (lit. happened what),24 which is invariably realized with a falling contour, with the question word being highly prominent, but associated with a low flat contour. Echo questions, on the other hand, may be articulated with a final rise (i.e., high pitch on the question word), thus adding to its prominence. Similar facts are reported for JA by Rammuny (1989: 31f.) who gives falling intonation for “normal questions” and a falling contour with a final rise for “echo questions.” Thus, EA and JA do not differ in contour type, although the question word is final in EA and initial in JA. Interestingly, Alharbi (1991) reports over 90% of wh-questions with a rising final contour in KA. He also notes that the question word always carries the nuclear rising accent followed by a slight rise to the end of the contour. The KA tune is probably similar to the high register use Mitchell (1993: 247) reports for Syrian Arabic.

Polar Questions

Polar questions may be signaled in Arabic by a question particle, like MA waš, or be morphologically covert. In the latter case, the only way to disambiguate question and statement is by intonation. Consequently, rising intonation in polar questions has been observed for EA (Rastegar-El Zarka 1997; Chahal and Hellmuth 2014), JA (Rammuny 1989; de Jong and Zawaydeh 1999), LA (Chahal 2001; Chahal and Hellmuth 2014), and KA (Alharbi 1991). The interesting question is where this rise is located and how exactly it is associated with the text. Question tunes may either be high or rising altogether, involving a specific trendline, or the rise may occur at the right boundary. The first analysis has been chosen by Ibrahim et al. (2001) who note that “yes-no” questions exhibit a rising upper and lower trendline in EA; a similar observation was made by Norlin (1989). Rastegar-El Zarka (1997) reports a falling contour with a rising final pitch accent in polar questions in an MSA/ESA register. Whether this difference is due to the difference in register or whether both options are available in EA cannot be decided yet. El Zarka (2011, 2013a) proposes the equality of phrasal contour and accent contour for the thematic function of the rising contour (see section 3.3.6 below). If this is correct, we may equally find rising question IPs and declining IPs with rising final accents in EA.

If the rise is localized in the final accent, it may be rising from low in the accented syllable (L*H) or the accented syllable may be associated with a rise (LH*). While EA opts for the second variant (Rifaat 1991; Rastegar-El Zarka 1997; Hellmuth 2006), JA opts for the first one (Rammuny 1989; de Jong and Zawaydeh 199925). In LA, low and rising pitch accents occur before the final rise, however, without being contrastive. Contrary to EA, LA contours may exhibit an early pitch accent with the rest of the phrase deaccented, followed by a rising boundary configuration (Chahal and Hellmuth 2014: 377, 379). This fact brings LA closer to West-European languages. However, Barrett and Hata (2006) note that in their Arabic data26 (spoken by a Lebanese speaker), the final question rise always starts on the final stressed syllable as opposed to the English data where the question rise is clearly a boundary phenomenon. Alharbi (1991) reports for KA a high rising pitch on the question word (casā mā), if there is one, whereafter the pitch remains high until the end of the question. In the varieties described, the authors mostly note that the question rise seems to differ from the continuation contour. In EA and LA, rising continuation tunes and rising question tunes differ in excursion size (Rastegar-El Zarka 1997; Chahal and Hellmuth 2014).

A special type of final rise is typical of Syrian Arabic, where a polar question is characterized by a high rise with a concomitant pre-pausal “drawl” (Mitchell 1993: 245ff.; Kulk et al. 2003), which is not only common in questions.

The interesting odd-man-out among Arabic varieties is once again MA. Benkirane’s (1999–2000) experimental study clearly shows a falling ending for polar questions in MA. Declarative and polar question intonation can nevertheless be distinguished, as MA questions are characterized by a higher register in general and an even larger pitch excursion on the final nuclear accent (Benkirane 1998; Benkirane 1999–2000: ch. 6).

3.3.4 Other Illocutionary Types

Commands and exclamations have been said to be characterized by a low falling contour in EA (Rastegar-El Zarka 1997), KA (Alharbi 1991), and MA (Benkirane 1998; Maas, ms.). Benkirane (1998: 356) notes that the intonational contour of commands resembles that of wh-questions, exhibiting a higher onset than statements and a declining contour. For both utterance types, high level contours have been reported in KA (Alharbi 1991); Rastegar-El Zarka (1997: 366) notes that, contrary to the matter-of-fact flavor of downstep in neutral statements, the final accent in commands and wishes is frequently a rise-fall or a high level tone, making the utterance more insistent and forceful.

Calling contours have been reported for EA to be rising-falling. While the fall may optionally reach a low level, it most frequently is only half-completed (Rastegar-El Zarka 1997); the half-completed fall also occurs in MA (Benkirane 1999–2000: 173).

3.3.5 The Intonation of Negation

Arabic varieties differ in the form of verbal negation. Most varieties east of Egypt, as well as the Sudanic region, only use a negative prefix maa, while Western varieties (including Egypt) have a discontinuous morpheme ma…š to express negation. This difference is accompanied by a difference in prominence. The Eastern negative prefix typically carries the nuclear accent, which is falling in statements and rising in questions. In Syrian question intonation, the rise is continued across the verb, accompanied by the characteristic drawl (Mitchell 1993: 239). In EA, however, the negative particle is not prominent at all, but the main accent is on the verb itself (Mitchell 1993). The same is true for Tripoli Arabic (Caron et al. 2015).

3.3.6 The Intonation of Theme and Rheme

In section 3.2, we have been concerned with the prominence-related notions of information structure such as focus and background. This section will look into the tonal shape of thematic and rhematic domains in Arabic.

Earlier (section 3.1), we already pointed out the strong effect of information structure on phrasing. Ingham (2010) and Holes (2010) also note that the uninodal sentence type is characterized by a falling intonation, whereas the binodal sentence comprises a rising and a falling part.

El Zarka (2013a) offers a detailed study of the information structure–prosody interface in EA. She identifies an intonational construction in EA called integration I consisting of a rising and a falling contour. This contour is associated with various syntactic constructions. Its major function, however, is the partition of a sentence into a thematic and a rhematic part. The rising or rising-level “thematic” contour may be associated with a number of thematic constituents, such as aboutness topics like in the example in Figure 4, or (mostly temporal) adverbial phrases that serve as a frame for the predication, but also other syntactic constructions such as the protasis of a conditional construction. The falling tunes are said to be associated with rhematic parts of sentences. In general, El Zarka (2013a) claims that those semantic components that are asserted are associated with a fall. Very similar facts have been described for Juba Arabic by Manfredi and Tosco (2014).

Maas’s (ms.) use of the term integration also characterizes the concatenation of rising and falling prosodic domains to express the relation between syntactic clauses in MA. He, however, claims that prosodic integration is only observed if this relation is syntactically asyndetic. Again, the conspicuous difference between EA and the other Eastern Arabic varieties, on the one hand, and MA, on the other hand, is the fact that in MA the rises and falls are only associated with the ends of phrases, whereas at least in EA it is frequently, but not obligatorily, whole phrases that have rising and falling trendlines.

4. Conclusion and Outlook

This short overview of tones and tunes has identified a number of commonalities and differences among Arabic varieties. A three-way classification of these may be discerned: (a) near-universal commonalities, (b) areal commonalities, (c) inter- and intra-varietal differences.

The near-universal commonalities concern the typically falling contours of declaratives and the high component in questions, as well as highlighting the most interesting and newsworthy information by prosodic prominence in cases of narrow focus. In all Arabic vernaculars reviewed except KA, wh-questions have a high onset, usually with a strong prominence on the question word (if in initial position) and a steeply falling trendline. Polar questions are more varied, but the familiar final rise occurs in many varieties, alongside a suspension of downdrift. Only MA polar questions have been found to exhibit falling contours and to use high register of the whole utterance.

The overview suggests that in all Arabic varieties reviewed it is possible to express narrow focus in situ. This is a feature usually ascribed to West-Germanic languages that are characterized by pervasive deaccentuation of given constituents. It has been demonstrated that other languages that have a more flexible syntax do not express focus in situ, but use different word orders.27 Interestingly, Arabic, despite its quite flexible syntax (which is also used pervasively to signal information structure) and its avoidance of deaccenting, also shows in situ narrow focus.

As far as accent distribution and rhythm are concerned, there seems to be a major split among Arabic vernaculars into a Western type, as exemplified by MA, and an Eastern type from Egypt to the East. While MA intonation contours are characterized by pervasive deaccenting (or rather lack of accenting), the Eastern varieties all have a tendency to accent all content words, however with differences in degree. EA has been found to be the variety that most strongly resists deaccenting and that has little variation in pitch accent shape, whereas LA and SA, for instance, are said to have deaccentuation in the environment of narrow focus and more variation among pitch accents. Based on our present knowledge, we tentatively draw a map depicting the macro-rhythm observed in Arabic varieties, applying the criteria suggested by Jun (2014), which shows a cline from African Arabic-based creoles and EA as the varieties with the strongest macro-rhythm to MA with the weakest macro-rhythm (Figure 11). As soon as more robust data is available for the individual Arabic varieties, the map may be modified, refined, and completed. This rhythmic distinction is partly in line with the findings of Ghazali et al. (2007) concerning micro-rhythmic variation, which also group Western Arabic varieties together against varieties in the East.28 The strong macro-rhythm among Arabic varieties in Eastern Africa is probably an areal phenomenon.

The high pitch accent frequency goes hand in hand with a rather small number of pitch accent types in the investigated varieties. Specifically, complex tones of English, such as the fall-rise and the rise-fall-rise “scooped” intonation are typically absent in Arabic intonation.29 Micro-prosodic variation in pitch accents can be discerned, especially concerning the alignment of the L tone at the beginning of a rise. While EA clearly favors alignment at the beginning of the stressed syllable, yielding LH*, JA seems to favor alignment within the vowel, yielding an L*H accent. LA seems to use both variants in free variation.
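
The alignment contrast just described can be made concrete with a small sketch. It assumes that the L turning point of a rise has already been located in time and that the onset of the stressed syllable and the boundaries of its vowel are known. The function name, the tolerance value, and the decision rule are illustrative assumptions rather than criteria taken from the cited studies.

```python
def label_rise(l_time, syll_onset, vowel_onset, vowel_offset, tolerance=0.02):
    """Label a rising accent by the alignment of its initial L tone.

    All times are in seconds. `tolerance` (hypothetical, 20 ms by default)
    is the window around the stressed-syllable onset that still counts as
    alignment "at the beginning of the stressed syllable".
    """
    # L at (or very near) the onset of the stressed syllable: the rise
    # occupies the accented syllable, here labeled LH*.
    if abs(l_time - syll_onset) <= tolerance:
        return "LH*"
    # L inside the stressed vowel: the accented syllable starts low and
    # the rise follows, here labeled L*H.
    if vowel_onset <= l_time <= vowel_offset:
        return "L*H"
    # Anything else is left undecided in this sketch.
    return "unclear"


# Invented measurements: syllable onset 0.50 s, vowel from 0.56 s to 0.70 s.
print(label_rise(0.50, 0.50, 0.56, 0.70))
print(label_rise(0.60, 0.50, 0.56, 0.70))
```

Because the syllable-onset check comes first, an L located near the onset of an onsetless syllable is labeled LH* even though it also falls inside the vowel; a fuller treatment would need to decide how to handle that overlap.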

At present, however, it is not yet clear that the differences suggested by the pitch accent inventories proposed in the works of individual scholars are real. Since the proposed inventories strongly depend on the theoretical premises of the descriptions and the type of data investigated, it is premature to take the suggested differences at face value. It is important to look at which accent types occur most frequently and which of the accents are used in what contexts to be able to establish robust differences within the Arabic language family.

Thus, future research will have to look at comparable data in different Arabic varieties from one and the same theoretical viewpoint. Fortunately, such a project is already underway. The Intonational Variation in Arabic (IVAr)30 project at the University of York, U.K., directed by Sam Hellmuth, is based on a multi-level corpus ranging from fully controlled read data to spontaneous speech (cf. Hellmuth 2014). The results of the project will certainly constitute genuine progress in the comparative study of Arabic intonation. In addition, we are clearly in need of more large-scale descriptive studies of spoken corpora. Work of this kind is currently being done at the CNRS in Paris in various laboratories, such as the laboratory for the study of Languages and Cultures of Sub-Saharan Africa (LLACAN).31 Only the collaboration of the descriptive linguist who collects a large corpus of socio-linguistically stratified naturally occurring data and the prosody expert who applies various methodologies involving experimental design, acoustic investigation, and statistical methods will help to close the knowledge gap and complete the picture of intonational variation in Arabic.

Figure 11 Macro-rhythmic typology across Arabic varieties: (a) strong macro-rhythm (Juba Arabic, EA, HA), (b) medium macro-rhythm (LA, SyrA, EmA), (c) weak macro-rhythm (MA).


(1) However, this distinction is not without problems and the debate about stress versus accent is not yet settled. Note that in some AM models low level “pitch-accents” are identified, in which case prominence cannot be assumed to be coded by pitch in any way. Furthermore, what is called stress is a cover term for “admittedly not well understood” phonetic cues to prominence (Ladd 2008: 54). To avoid these problems, we will use the term stressed only in the sense of an abstract property of a syllable within a word as having the potential of being accented (i.e., actually made prominent) (cf. Bolinger 1986 and other work) interchangeably with lexically stressed or metrically strong. For the sake of simplicity, the terms accent and pitch accent will also be used without a difference here. This minor terminological fuzziness seems to be justified as phonetic differences between stress (in the phonetic sense) and pitch accent may be identifiable in Germanic languages, but probably not in Arabic. In Arabic varieties, pitch change is a major prominence-lending feature.

(3) However, this assumption is a strong idealization. In actual speech, the tones are not always stably aligned. While alignment seems to be a good basis for the identification of L*, it is less apt for the identification of H*. In fact, in many languages H*s occur outside the realm of the stressed syllables. Therefore, it has also been proposed to label the starred tone according to perception (cf. the discussion in Face [2006] for more details).

(4) ToDI is the annotation system developed for Dutch by Gussenhoven and colleagues as opposed to the standard ToBI (tone-and-break-indices) system (Beckman et al. 2005) for English. In recent years, many ToBI-style annotation conventions have been developed for individual languages.

(5) Remember that accent shapes do not necessarily differ between EA and MSA/ESA as noted above.

(6) L tones and syllables with which they are associated are italicized; capital letters indicate stressed syllables.

(7) In section 2.2, there are examples for such modifications.

(8) Jun (2014) proposes a new Prosodic Typology based on what she calls macro-rhythm, meaning the rhythm created by pitch events. This macro-rhythm is essentially equal to El Zarka’s (2013a) high-level rhythm.

(9) However, our knowledge of JA intonation is enriched by Rammuny’s (1989) description, which will be discussed later.

(10) In the introductory section, we ascribed this hierarchy to the metrical structure. The terms metrical structure and prosodic structure are usually used interchangeably.

(11) Cf. Gussenhoven (2004: 125f.) for a discussion.

(12) The prosodic word is not included in the present comparison, being the level of stress assignment and, strictly speaking, not an intonational domain. Note, however, that in Hellmuth’s work the PW is assumed as the domain headed by a pitch accent.

(13) The MiP was reanalyzed as a compound PW in Hellmuth (2011).

(14) The second constraint was said to be only true for faster speech rates.

(15) Such an AP is the equivalent of the “Tondomäne” in Rastegar-El Zarka (1997).

(16) Note, however, that the phrasing in the spontaneous speech corpus is at IP-level, while Hellmuth’s experiment investigates PPs.

(17) Alharbi (1991: 166) notes that this fact may indicate that VS is the most commonly used word order in KA. He points out, however, that such a claim should be based on detailed syntactic investigation. That VS is the preferred word order in KA is indeed doubtful (Jonathan Owens, p.c.; cf. also Holes 1990).

(18) What is called rheme here is frequently referred to as focus in the more recent literature, for example, in Lambrecht (1994), and as FOCUS in El Zarka (2013a).

(19) Nucleus is defined here as the most prominent pitch accent, and the focused constituent in the sense it is used here is identical with the rheme.

(20) The penultimate word in his test sentences is the object of a VSO sentence followed by an adverb in most cases.

(21) Cf. also Mitchell (1993) for an excellent overview of some tunes in Arabic vernaculars.

(22) Chahal (2001), however, notes that in naturalistic speech, declaratives also typically involve a number of declining peaks.

(23) Perhaps a better term would be neutral or unmarked wh-question as opposed to echo questions, for instance.

(24) A high number of such questions occur in the EA corpus of the D2 project “Typology of Information Structure” of SFB 632 “Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts,” University of Potsdam,

(25) Although Rammuny’s description is conducted in the traditional American framework using register targets (1–3) for JA, his annotations can be easily translated into an AM notation, yielding L*H for the final rise.

(26) The sentences are in Classical Arabic with full case endings and tanwin.

(27) Examples are Catalan (Vallduvi 1992) or Hungarian (e.g., É. Kiss 1998) among many others.

(28) Ghazali et al. (2007) investigate the vowel-consonant ratios in Arabic varieties and find a cline from West (lowest vowel intervals) to East (highest vowel intervals). They interpret their results in terms of more or less stress-timing. In this suggested cline, Egypt and Tunisia are located in the middle of the continuum, however.

(29) Some instances of complex configurations have been described for EA and LA, however only as marked tunes with special functions, such as irony or astonishment (EA) or as possible borrowings from English (LA).