
This update includes a significantly extended bibliography.

Updated on 6 September 2017.

PRINTED FROM OXFORD HANDBOOKS ONLINE (www.oxfordhandbooks.com). © Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

date: 20 April 2019

Issues in Arabic Computational Linguistics

Abstract and Keywords

This article focuses on the current state of affairs in the field of Arabic computational linguistics. It begins by briefly monitoring relevant trends in phonetics and phonology, morphology, syntax, lexicology, semantics, stylistics, and pragmatics. Then, the chapter describes changes or special accents within formal Arabic syntax. After some evaluative remarks about the approach opted for, it continues with a linguistic description of literary Arabic for analysis purposes as well as an introduction to a formal description, pointing to some early results. The article hints at further perspectives for ongoing research and possible spinoffs such as a formalized description of Arabic syntax in formalized dependency rules as well as a subset thereof for information retrieval purposes.

Keywords: phonetics, phonology, morphology, syntax, lexicology, semantics, stylistics, pragmatics

9.1 Introduction

At a meeting in Doha (Qatar 2011), experts discussed the challenges for Natural Language Processing (NLP)1 applied to (and, if possible, in) Arabic, concerning technologies, resources, and applications in cultural, social, educational, medical, and touristic areas, in the region concerned, for the near future. Interestingly enough, there was a consensus (by majority of votes)2 about more focus on large-scale capture of written Arabic (OCR) in view of the preservation and accessibility of the Arabic and Islamic cultural heritage; on the spoken varieties of Arabic in view of the development of all kinds of conversion and answering systems (AS) to and from a standard, speech-to-speech3 (STS) as well as speech-to-text (STT) and text-to-speech (TTS)4 conversion; and finally, on multisimultaneous signal processing (subtitling, visualization, and instant translation),5 if possible, with event (EE) or factoid (FE) extraction for information retrieval (IR), document routing (DR), archiving purposes, mass storage, Aboutness suggestions, and different forms of retracing facilities.

NLP overlaps to a large degree with computational linguistics (CL), especially when both are applied to standard Arabic or spoken varieties. The former (NLP) usually centers on the interaction between man and machine, deals with “processing” and “automated processes,” and is as exact as possible in nature. It therefore remains measurable and verifiable (NLP is eager for applications). The latter (CL) concentrates exclusively on linguistic theory and language modeling, while using any computational means it can exploit, for an adequate, coherent, and consistent linguistic description or language model.

CL is usually characterized as a subsection of artificial intelligence (AI). However, I would like to underline its communicative (action and reaction) perspective as against the purely cognitive environment of AI. Moreover, the communicative aspect of CL points to a reference to reality. Even when formalized and in its most abstract and logically implemented form, semantics still remains an open domain. I would also like to underline two complementary aspects of CL concerning Arabic: one is the application of computer science to Arabic; the other is Arabic linguistics making as much use as possible of computational as well as linguistic means and techniques. The former is more striking, and the latter is more basic.

For information on relevant trends in Arabic NLP and CL, we do not need to start from scratch. General trends in NLP are adequately described in Manning and Schütze (2003) and Jurafsky and Martin (2009);6 for Arabic NLP, see Habash (2010) and the references therein. For CL in general, one should certainly consult Bunt et al. (2004, 2010).7 Soudi et al. (2007) and Farghaly (2010) offer valuable contributions in the field of Arabic CL but also deal with Arabic NLP. Levelt (1974) is a must for formal grammars (and psycholinguistics). On (more linguistically oriented) main and subentries, there is valuable information in Versteegh (2006–2009).8 Needless to say, the Internet is always a good, if not the best, starting point for a literature search.

The main topic of interest here is the current state of affairs in the field of Arabic CL. Relevant trends in phonetics and phonology, morphology, syntax, lexicology, semantics, and stylistics and pragmatics will be briefly examined. Then changes or special accents within the field of interest, namely, formal Arabic syntax, will be noted. After some evaluative remarks about the approach of this chapter, it continues with a linguistic description of Modern Standard Arabic (MSA) for analysis purposes as well as an introduction to a formal description. Some early results will be highlighted. Further perspectives are then offered for ongoing research and possible spinoffs such as a formalized description of Arabic syntax in formalized dependency rules as well as a subset thereof for IR purposes. Appendix 1 contains a list of acronyms frequently encountered in NLP and CL. Appendix 2, found only in the online version, gives a glossary of frequently used terms in NLP and CL.

9.2 Arabic CL

Defined as a statistical or rule-based modeling of natural language from a computational point of view,9 CL should always contain a linguistic dimension in the form of a specific theory combined with a descriptive model, together with a formal implementation in which that linguistic theory about a natural language, or its adequate, coherent, and consistent description, is entered in a processing environment for analysis or synthesis purposes. Jurafsky and Martin (2009) adequately describe a modern “toolkit” for CL, but we limit ourselves here mainly to rule-based modeling by means of a relational programming algorithm using a nondeterministic formalism of two interwoven context-free grammars, resulting in a bottom-up unification-based parser for Arabic.10 Levelt (1974) provides a descriptively and didactically sound introduction to the field of formal grammars and psycholinguistics.
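To make the notion of a unification-based, bottom-up step more concrete, the following minimal sketch in Python may help. It is not the parser referred to above: the flat feature structures, the feature names, and the functions unify and reduce_np are assumptions introduced purely for illustration.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two flat feature structures; return None if any value clashes."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None  # incompatible values: unification fails
        merged[key] = value
    return merged


def reduce_np(noun: Features, adjective: Features) -> Optional[Features]:
    """Bottom-up step: combine N + ADJ into an NP if their features unify."""
    agreement = unify(noun, adjective)
    if agreement is None:
        return None
    return {**agreement, "cat": "NP"}


# Hypothetical agreement features for a noun and a following adjective.
ok = reduce_np({"gender": "f", "number": "sg"}, {"gender": "f", "definite": "yes"})
clash = reduce_np({"gender": "f"}, {"gender": "m"})
```

In this sketch an underspecified feature structure simply inherits the missing values from its partner, whereas conflicting values block the reduction, which is the core idea behind unification in such parsers.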

Arabic CL thus combines a linguistic and a computational part. The linguistic part exploits the most recent developments in the field of adequate, coherent, and consistent language description. The formal part tests the (natural or programming) language description concerned for computational viability. Nowadays, the testing of linguistic descriptions usually takes place in the framework of corpus linguistics (CoL), using large collections of authentic language data that serve as a reliable test bed and learning model for refining the linguistic description. The formal part of such a linguistic implementation can be tested using personal computers.

Authentic text data contain both stylistic and pragmatic elements. We are then dealing with a form of semantics, hidden in structured combinations (syntax) of lexical elements (lexicon). Relations and dependencies between elements may be underlined with morphemes (morphology) that should be accounted for in a description of the language concerned. Such a description comprises the inventory of the smallest unit of linguistic description, which is called the phoneme (phonology), or its orthographic counterpart, the grapheme (orthography). In this way, authentic (Arabic) data represent a single multiple-layer syntax, divided into modules only for practical reasons.

9.2.1 Computational Phonetics and Phonology

Here the term phonetics indicates the study of the physical properties of the smallest units of linguistic description in the Arabic language (i.e., phonemes), whether for analysis (recognition) or synthesis (generation) purposes.11 At an early stage, this study was extended with the study of its graphic counterpart, the grapheme. Later, research started on the development of all kinds of remedial support such as Arabic Braille (AB), text-to-speech, and speech-to-text conversion, or combinations thereof (for blind, deaf, and mute users).

Phonology, on the other hand, is more concerned with the generalized, grammatical characterization of the Arabic phoneme and grapheme inventory of the language system. Computational phonology is the use of computational models in phonological theory. For the description of the typical nonconcatenative root and pattern system of Semitic languages in general and Arabic phonology and morphology in particular, McCarthy (1981) proposes a representation of different layers, further developed by Kay (1987), Beesley (1996), Kiraz (1994, 1997), Kiraz and Anton (2000, 2001), and Ratcliffe (1998) [Ratcliffe, “Morphology”]. More recent developments go in the direction of optimality theory (OT; Prince and Smolensky 2004) and syllabification (Kiraz and Möbius 1998). There already are some specialized studies in this field on Arabic phonology [Hellmuth, “Phonology”].
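The root and pattern principle can be illustrated with a minimal sketch in Python. It assumes a simplified template notation in which C marks a slot for a root consonant; the function name interdigitate and the Latin transliteration are illustrative choices and do not reproduce the multi-tier formalism of the studies cited.

```python
def interdigitate(root: str, pattern: str) -> str:
    """Fill the C slots of a pattern with the consonants of a root, in order."""
    consonants = iter(root)
    out = []
    for symbol in pattern:
        if symbol == "C":
            consonant = next(consonants, None)
            if consonant is None:
                raise ValueError("pattern has more C slots than the root has consonants")
            out.append(consonant)
        else:
            out.append(symbol)  # vowels and affixal material are copied as-is
    if next(consonants, None) is not None:
        raise ValueError("root has more consonants than the pattern has C slots")
    return "".join(out)


# The root k-t-b combined with four simplified templates.
forms = {p: interdigitate("ktb", p) for p in ("CaCaC", "CuCiC", "CaaCiC", "maCCuuC")}
```

With the root k-t-b, the four templates yield katab, kutib, kaatib, and maktuub. The sketch keeps the consonantal root and the vocalic pattern apart as two separate inputs, which is the essential point of a layered representation.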

Müller (2002) adds an NLP flavor with his probabilistic context-free grammars for phonology.12 Computational phonology is the basis of many NLP applications, such as the previously mentioned AS systems and STT, TTS, and STS conversion and others such as Arabic speech recognition (SR), OCR, and text-to-text conversion (TTT) or machine translation (MT).13

In computational phonetics and phonology, we are confronted with terms such as tiers, distinct layers of representation (two-four, three), finite-state automata (FSA or FSM), transducers, programming languages, tables, tagging, ± deterministic, and a few other technical terms. Sometimes, a decisive discussion about progression in the field of research concerned is worded using general (Arabic) linguistic terms, but for certain entries in Appendix 2 it was necessary to employ less frequently used linguistic terms.

9.2.2 Arabic Computational Morphology

As Richard Sproat correctly mentioned (Soudi et al. 2007: viii), Kaplan and Kay (1981) and Kay (1987), in line with Koskenniemi (1983), paved the way for Kenneth Beesley’s (1989) research on Arabic computational morphology, which led to the work of many others as well as to the development of applications in the field of Arabic morphology.

One of the pioneers in computational Arabic morphology, Tim Buckwalter, developed BAMA, an Arabic morphological analyzer (Buckwalter 2010). Initially, his research was oriented toward automated corpus-based Arabic lexicology. Later, three lexicons, compatibility scripts, and an algorithm in the feature-rich dynamically typed14 programming language called Perl were combined in a software package for the morphological analysis of Arabic words (Buckwalter 2002, 2004), used, inter alia, for morphological and part-of-speech (POS) tagging as well as for syllabification of authentic data in existing Arabic Treebanks15 (Maamouri and Bies 2010; Smrž and Hajič 2010) for morphological or syntactic annotation.
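The general idea of combining three lexicons with compatibility checks can be sketched as follows. This is a toy illustration in Python rather than Perl, and it is not Buckwalter's code: the lexicon entries, the category labels, and the assumption that compatibility is stored as three pairwise tables are all introduced here for the sake of the example.

```python
# Toy lexicons in a Buckwalter-style transliteration. All entries and
# category labels are illustrative assumptions, not the contents of BAMA.
PREFIXES = {"": "Pref-0", "w": "Pref-Wa", "Al": "NPref-Al"}
STEMS = {"ktb": ["PV", "N"], "ktAb": ["N"]}
SUFFIXES = {"": "Suff-0", "t": "PVSuff-t"}

# Assumed pairwise compatibility tables: which categories may co-occur.
PREFIX_STEM = {("Pref-0", "PV"), ("Pref-0", "N"), ("Pref-Wa", "PV"),
               ("Pref-Wa", "N"), ("NPref-Al", "N")}
PREFIX_SUFFIX = {("Pref-0", "Suff-0"), ("Pref-0", "PVSuff-t"),
                 ("Pref-Wa", "Suff-0"), ("Pref-Wa", "PVSuff-t"),
                 ("NPref-Al", "Suff-0")}
STEM_SUFFIX = {("PV", "Suff-0"), ("PV", "PVSuff-t"), ("N", "Suff-0")}


def analyze(word):
    """Return every (prefix, stem, stem category, suffix) split licensed by the tables."""
    analyses = []
    for i in range(len(word) + 1):
        for j in range(i, len(word) + 1):
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            if prefix not in PREFIXES or stem not in STEMS or suffix not in SUFFIXES:
                continue
            p_cat, x_cat = PREFIXES[prefix], SUFFIXES[suffix]
            for s_cat in STEMS[stem]:
                if ((p_cat, s_cat) in PREFIX_STEM
                        and (p_cat, x_cat) in PREFIX_SUFFIX
                        and (s_cat, x_cat) in STEM_SUFFIX):
                    analyses.append((prefix, stem, s_cat, suffix))
    return analyses
```

In this toy setting, analyze("ktb") returns two analyses (a verbal and a nominal reading of the same unvocalized stem), whereas analyze("Alktbt") returns none, because the article-like prefix and the verbal suffix are not compatible with any reading of the stem.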

It is not surprising that research on Arabic computational morphology is easily adopted, adapted, and incorporated into general approaches to computational phonetics, phonology, and morphophonemics. Al-Sughaiyer and Al-Kharashi (2004) classify a number of Arabic morphological analyzers and synthesizers (generators) according to the approach employed: table lookup, linguistic (two-level, FSA or FSM, traditional applications), combinatorial, and pattern-based approaches. As Köprü and Miller (2009) point out, “Very few of the available systems are evaluated using systematic and scientific procedures.” This is perhaps a bit too harsh a criticism. However, it is always worthwhile to scrutinize and evaluate the advantages and disadvantages as well as the adequacy, coherency, and consistency of a chosen approach.

Evaluating 20-odd Arabic morphological analyzers and synthesizers, Al-Sughaiyer and Al-Kharashi (2004: 198, Table 4) mention their algorithm name and type: some “brand” names and even one “Sebawai”;16 many “linguistics”; and one “rule based.” Smrž (2007: 5–6) qualifies Beesley (2001), Ramsay and Mansur (2001), and Buckwalter (2002) as “lexical” in nature. Habash (2004) calls his own work “lexical-realizational” in nature. Finally, Cavalli-Sforza et al. (2000), Habash et al. (2005), Dada and Ranta (2006), and Forsberg and Ranta (2004) are rather “inferential-realizational.”

For his ElixirFM, Smrž (2007: 2) emphasizes its implementation within the Prague framework of Functional Generative Description (FGD) in functional programming (Haskell), contrasting with the dynamic programming (Perl) of Buckwalter (2002) and resulting in “a yet more refined linguistic model.”

Partly based on the operational tagging system of Buckwalter’s BAMA morphological analyzer for Arabic, Otakar Smrž developed a “description of [Arabic] surface syntax in the dependency framework” (Smrž and Hajič 2010). This brings us to the threshold between Arabic phonology–morphology and Arabic computational syntax, at least as far as the representation of the analysis results in the form of dependency trees is concerned. These results are obtained on the basis of a pretagged corpus. The Prague linguists opted for a functional dependency grammar approach. Nonetheless, also for the computational description of Arabic morphology and syntax, a programming language, Haskell,17 is being used.18 There is an important difference between the use of a programming language and a formalism for implementable and operational descriptions of a natural language.19

9.2.3 Arabic Computational Syntax

Syntax is the description of the overall organization of a natural language in which different complementary building blocks such as phonology, morphology, lexicology, semantics, stylistics and pragmatics come together to convey a particular message between an A and a B. To describe the general structure of this organization for natural languages in general or for a specific language such as Arabic is the objective of the linguistic part of the description. To find an implementable formal model for such a description is the objective of the computational part of that description.

9.2.3.1 Linguistic part

For a historical overview of language description, I refer to HSK 18.3 (2006). Here we limit ourselves to the century of the dominance of immediate constituency (IC) and the rise of many other linguistic theories and descriptive models such as dependency grammar, of importance or used for the linguistic description of Standard and spoken Arabic varieties.

It is evident that the splitting up of a natural language system into its largest and smallest units of linguistic description, as well as the description of mutual relationships and dependencies between these units, forms an excellent starting point for any research on the fundamentals of human communication in general, and of the organization of a specific natural language system in particular (Habash 2010: chapter 6). Computational linguistics (cf. Winograd 1983) started with the annotation (POS tagging) of formal elements (e.g., parts of speech; word and phrasal categories, sentences, sections, chapters, volumes, and the marking of nontextual insertions)20 and functional elements (e.g., cases, clitics, determiners, heads and modifiers, slots and fillers) in authentic text data (CoL), and continued later with the presentation of derivation trees or labeled bracketing, extracted from this (earlier inserted) information.

9.2.3.2 Formal part

For a historical overview of computational language description in general, I refer to Winograd (1983). Here we speak about the current state of syntactic parsing of Arabic text data wherein different steps can be distinguished. Usually, they are labeled with terms such as tokenization, diacritization, POS tagging, morphological disambiguation (Marton et al. 2010), base phrase chunking, semantic role labeling, lemmatization, stemming, and the like (cf. Appendix 2; cf. also Mesfar 2010). Most of these processes have been automated by now, but all the existing collections of syntactically analyzed Arabic text data (Habash 2010: section 6.2) such as the Penn Arabic Treebank (Maamouri et al. 2004), the Prague Arabic Dependency Treebank (Hajič et al. 2001), and the Columbia Arabic Treebank (Habash and Roth 2009) have been manually checked. This “forest of treebanks” (Habash 2010: 111) can now be used as learning models for the development of new statistical parsers, evaluating parsers and general Arabic parsers.

9.2.4 Arabic Computational Lexicology

Computational lexicology is the branch of linguistics concerned with the use of computers in the study of machine-readable dictionaries (lexicon). Sometimes this term is used synonymously with computational lexicography, though the latter refers more specifically to the use of computers in the construction of dictionaries (Al-Shalabi and Kanaan 2004).21

Piek Vossen, a well-known computational lexicologist, founder and president of the Global Wordnet Association, worked on the first WordNet project (Fellbaum 1998) and supervised parallel projects such as EuroWordNet (Vossen 1998) and Arabic WordNet (Black et al. 2006; Elkateb et al. 2006). He is thinking in terms of (multi)lingual lexical databases with lexical semantic networks. We come close to a distinction in form, function, meaning, and contextual realization of a lexical entry. Besides this distinction we always have the linguistic and the formal part.

9.2.4.1 Linguistic part

Relevant here are studies such as those of Dévényi et al. (1994) on Arabic lexicology and lexicography as well as other research with valuable bibliographical references (Bohas 1997; Bohas and Dat 2008; Hassanein 2008; Hoogland 2008; Seidensticker 2008). Moreover, one should include the studies about affixes, features (Dichy 2005; Ditters 2007), or parameters hinting at theta, thematic, or semantic roles.

9.2.4.2 Formal part

On the formal side I would like to mention the tag sets (Habash 2010: 79–85; Maamouri et al. 2009; Diab et al. 2004; Diab 2007; Kulick et al. 2006; Habash and Roth 2009; Khoja et al. 2001; Hajič et al. 2005) used for the annotation of the corpora of Arabic text data as well as the by then enriched corpora (treebanks) from which all kinds of relevant information can be extracted. Here should also be included studies on Arabic semantic labeling (Diab et al. 2007) and Arabic semantic roles (Diab et al. 2008).

9.2.5 Arabic Computational Semantics, Stylistics, and Pragmatics

Computational syntax, at the academic level, is still not common practice. Computational semantics, stylistics, and pragmatics are at an even more rudimentary stage,22 not only as far as the Arabic language is concerned but also for more intensively studied natural languages. It is worthwhile here to refer to the HSK volumes on dependency and valency (HSK 25, 2003–2006), and in particular to contributions of interest for our discussion23 (Owens 2003; Msellek 2006; Bielický and Smrž 2008, 2009).

According to Wikipedia:24

Computational semantics is the study of how to automate the process of constructing and reasoning with meaning representations of natural language expressions. It consequently plays an important role in natural language processing and computational linguistics. Some traditional topics of interest are: semantic analysis, semantic underspecification, anaphora resolution, presupposition projection, and quantifier scope resolution. Methods employed usually draw from formal semantics or statistical semantics.


Figure 9.1 An example from the Arabic Propbank (Habash 2010, 115).

In a note on Arabic semantics, Habash (2010: chapter 7) mentions the Arabic Proposition Bank (Propbank) and Arabic WordNet. A propbank (Palmer et al. 2005) is, in contrast with a treebank, a semantically annotated corpus. On the basis of predicate–argument information about the complement structure of a verb, so-called frameset definitions (Baker et al. 1998) are associated with each entry of the verbal lexicon (Palmer et al. 2008). The description of the nature, number, and role of the arguments can be as detailed and specific as a linguistic description of the semantics of a language allows. It should be clear that here lies the greatest challenge in the development of adequate, coherent, and consistent parsers for any natural language text data, MSA and spoken varieties of Arabic included.

Figure 9.1 presents information about the linguistic unit of description (S), the type of sentence (NP-TPC1), with a verb phrase (VP) as comment. The topic (ARG0) is realized by a noun phrase (NP). The comment, which may be termed “predicate,” is realized by a finite transitive verb (PRED) with an implicit subject (ARG0), a direct object noun phrase (ARG1), and a prepositional time adverbial (ARGM-TMP). There is a form of subcategorization at the phrasal level (NP-TPC1, NP, NP-SBJ1, NP-OBJ). Nouns are divided by subscripts into common nouns and subcategories. There is some form of description in terms of functions and categories, but it is not maintained in a consistent and coherent way until the final lexical entries have been reached.
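The relation between a frameset definition and an annotated proposition can be sketched in Python as follows. The class names, the role glosses, and the convention that labels beginning with ARGM- are adjunct-like are assumptions made for this illustration; the phrasal labels are taken from the description of Figure 9.1, but the Arabic sentence itself is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frameset:
    """A frameset: the numbered arguments a predicate licenses (glosses are hypothetical)."""
    predicate: str
    roles: Dict[str, str]


@dataclass
class Proposition:
    """One annotated predicate with its argument labels mapped to phrasal categories."""
    frameset: Frameset
    arguments: Dict[str, str] = field(default_factory=dict)

    def unlicensed(self) -> List[str]:
        """Labels that are neither defined in the frameset nor adjunct-like (ARGM-*)."""
        return [label for label in self.arguments
                if label not in self.frameset.roles and not label.startswith("ARGM-")]


frameset = Frameset("PRED", {"ARG0": "agent-like argument", "ARG1": "patient-like argument"})
prop = Proposition(frameset, {"ARG0": "NP-TPC1", "ARG1": "NP-OBJ", "ARGM-TMP": "PP"})
bad = Proposition(frameset, {"ARG0": "NP-TPC1", "ARG2": "NP"})
```

Here prop.unlicensed() is empty, while bad.unlicensed() reports ARG2, which the toy frameset does not define. A consistency check of this kind is one simple way in which a frameset lexicon can constrain semantic annotation.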

Arabic WordNet has been mentioned earlier in relation to computational lexicology (§9.2.4). Here I want to explicitly underline the importance of electronically available collections of text data, and nowadays also parallel corpora, for linguistic research (CoL). In the following section I defend a purely linguistic approach to Arabic language description, exploiting as many computational means as possible on the basis of authentic data and within a long-standing linguistic Arabic grammatical tradition.

Other studies in the fields of semantics,25 dialogue (Hafez 1991), discourse,26 stylistics (Mitchell 1985; Mohammad 1985; Somekh, 1979, 1981), and pragmatics27 remain mainly theoretical. Little can be found on heuristics. Something like a dialogue act annotation system allowing the ranking of communicative functions of utterances in terms of their subjective importance (Włodarczak 2010) for Arabic has still to be written. Computational semantics has points of contact with the areas of lexical semantics (word sense disambiguation and semantic role labeling), discourse semantics, knowledge representation, and automated reasoning (in particular, automated theorem proving).

9.3 A Formalized Linguistic Description of Arabic Syntax

There are, in my opinion, some basic concepts and rules important for long-term linguistic research. The first point is that syntax encompasses a number of subfields, including phonology, morphology, lexicology, semantics, stylistics, pragmatics, and heuristics, together with their respective branches, including the use of computational tools. Moreover, linguistic research should positively and negatively improve the field: positively in the sense of enriching the discipline as well as being socially relevant; and negatively in the sense of convincingly demonstrating that a specific approach did not and will not lead to any useful results or meaningful research.

A rather important rule is that any account of linguistic research, with some additional information and footnotes, should be readable and understandable for, as well as verifiable by, colleagues. Finally, scientific linguistic research should not be presented encoded in machine language or a programming language printout or even in PDF form and, moreover, should not be superficial, as are many of the presentations of commercial researchers and product developers.

The description of the syntactic structure28 of Standard Arabic, readable and understandable for, as well as verifiable by, colleagues, may have the form of a hypothesis, to be tested against authentic language data. After refining and renewed testing, this leads to a theory about the syntactic structure of the same layer of data of Standard Arabic as tested in the data. The same approach can be followed by further research on other Arabic text data.

Earlier, a listing was made of useful and (moreover) operational computational instruments, including machine-readable resources of all kinds, for the automated linguistic research on Arabic. We discussed the difference between Arabic NLP and Arabic CL, underlining the independence of the linguistic and formal parts in this research, while acknowledging a bias in favor of the linguistic part. Here I will defend an approach to an adequate, consistent, and coherent description of, in this case, MSA for the automated analysis29 of authentic Arabic text data.

First, we position this section in Arabic NLP history (9.3.1). Then I say something about linguistic and formal concepts for language description within the Arabic grammatical tradition (9.3.2). Next, I present a sample of a linguistic (9.3.3) and a formal part (9.3.4) of a description of MSA. Finally (9.4), I say something about perspectives on the basis of options chosen.

9.3.1 Evaluating Remarks About the Approach Opted for

When Smrž (2007: 68) says, “The tokens30 with their disambiguated grammatical information enter the annotation of analytical syntax,” we are in the linguistic part of our discussion about computational Arabic syntax. The same is the case with Topologische Dependenzgrammatik fürs Arabische (Odeh 2004). In both, the results of the analysis of some interesting syntactic peculiarities of the Arabic language, processed in a language-independent dependency-oriented environment (Debusmann 2006), are presented in the form of unambiguous and rather elegant tectogrammatical dependency trees on the basis of an analytical representation in the case of Smrž (2007), and, except for the labeling, in almost identical ID (immediate dominance) and LP (linear precedence) representations (Odeh 2004, see Figure 9.2a).31


Figure 9.2a Tectogrammatic representation of the analysis of a sentence (Smrž 2007b, 73).

The sentence can be interpreted as containing a predicate (PRED) with an agent (ACT), no expressed addressee (ADDR), and an object (PAT). This object consists of two coordinated topics (ID), both further specified by an attributive modifier (RSTR). The second modifier does not have an agent but does govern an object (PAT). A positional apposition (LOC) plays the role of sentence adverbial. The second part of the figure (Figure 9.2b) lists the Arabic word and its English translation as well as the tags used for the analytical representation (column 1). Column 2 lists the values for some variables used in the analysis. Column 3 gives the values (in upper case) and the representation of the variables (in lower case) that I use in my approach to a two-level description of the same target language.

Odeh (2004: Figure 9.3) presents the ID and the LP representation of a sentence with a finite verb form in first position. The abbreviations in Figure 9.3a are self-evident. Those in Figure 9.3b refer to the topological fields: sentence field (sf) and article field (artf).32

Notwithstanding the vagueness of the dismissal (Smrž 2007: 6), we would like to comment on the following:

The outline of formal grammar (Ditters, 2001), for instance, works with grammatical categories like number, gender, humanness, definiteness, but one cannot see which of the existing systems could provide for this information correctly, as they misinterpret some morphs for bearing a category, and underdetermine lexical morphemes in general as to their intrinsic morphological functions.33


Figure 9.2b The analyzed sentence: ‘and in the section on literature, the magazine presented the issue on the Arabic language and the dangers that threaten it.’ (Smrž 2007b, 72–73).

This is a correct remark, as far as Ditters (2001) is concerned. I am working with a description in terms of grammatical functions and categories, final lexical entries, dependency relations, and, additionally, an opening toward a description of semantic features as well.34

Grammatical in this context involves, as said earlier, the phonological, morphological, structural, and lexical modules for language description, with rudimentary extensions to semantics, stylistics, and pragmatics as well. It is necessary to remain understandable for and verifiable by colleagues. Serious semantic extensions are awaiting further computationally more coordinated research under supervision of the linguistic twin part in this kind of research. Let us continue with some words about the outline of formal grammar.


Figure 9.3a Immediate dominance (ID) representation.


Figure 9.3b Linear precedence (LP) representation.

Computational means can be used to test a hypothesis about the linguistic structure of MSA sentences35 in an efficient but linguistically understandable wording of nonterminals and terminals, applying only a context-free grammar formalism (with room for some additional context-free layers in the second level of description for semantics) but considering the availability of compilers for one, two, or more levels of context-free attribute grammar formalisms. Until now this approach has proved to be promising.

9.3.2 Linguistic and Formal Concepts About Language Description in the Arabic Grammatical Tradition

Immediate constituency (IC) has dominated descriptive linguistics for a long time. Dependency grammar (DG) concepts were already (according to Carter 1973, 1980; Owens 1988) familiar to Arab and Arabic grammarians and became a welcome, and needed, addition to an implementable descriptive power for natural languages in general and MSA in particular. Moreover, one can always explore other valuable suggestions.

The basis for the description of parts of speech, functions, and categories in phrasal categories in MSA can be found in the Kitāb of Sībawayhi (d. 798).36 Carter (1973: 146)37 calls it “a type of structuralist analysis unknown to the West until the 20th century.” He ends his abstract with:

Utterances are analysed not into eight Greek-style “parts” but into more than seventy function classes. Each function is normally realized as a binary unit containing one active ‘operator’ (the speaker himself or an element of his utterance) and one passive component operated on (not “governed”) by the active member of the unit. Because every utterance is reduced to binary units, Sībawayhi’s method is remarkably similar to immediate constituent Analysis, with which it shares both common techniques and inadequacies, as is shown.

As a first example of such a class of functions, Carter (1973: 151) presents the triad of “grammatical effect” (ʿamal), comprising a “grammatically affecting” (ʿāmil) and a “grammatically affected” (maʿmūl) component. Similar triads (possibly considered as dependency rules) can be drafted from the other function classes listed. Moreover, the function classes could be arranged in subcategories, for example, accounting for relationships between constituents, sentence types, or word formation. Other function classes deal with phonology and morphology, have discourse functions, or are related to stylistics. Finally, on the basis of Sībawayhi’s comprehensive (exhaustive?) description of Arabic syntax, the function classes could easily be extended with more of this kind of “dependency rules,” for example, with triads accounting for transitivity, or the subject–object–predicate relations.
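How such triads might be written down as dependency rules can be sketched in Python. The three rules below rest on well-known case government in Arabic (a preposition takes the genitive; a verb takes a nominative subject and an accusative object); the class name Triad, the English field names, and the selection of rules are illustrative assumptions and are not taken from Carter's list of function classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triad:
    operator: str  # ʿāmil: the grammatically affecting element
    operand: str   # maʿmūl: the grammatically affected element
    effect: str    # ʿamal: the grammatical effect on the operand


# Illustrative rules only; a full description would need many more.
RULES = [
    Triad("preposition", "noun", "genitive"),
    Triad("verb", "subject noun", "nominative"),
    Triad("verb", "object noun", "accusative"),
]


def effect_of(operator: str, operand: str) -> Optional[str]:
    """Look up the effect an operator has on an operand, if a rule covers the pair."""
    for rule in RULES:
        if (rule.operator, rule.operand) == (operator, operand):
            return rule.effect
    return None
```

For instance, effect_of("preposition", "noun") gives "genitive", while a pair not covered by any rule gives None. Each rule is a binary relation between two elements plus a label, which is why the triads lend themselves to a dependency-style formalization.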

Owens’ (1988) historical overview, with an apparent DG-related perspective, is broader. In section 2, all the relevant issues are discussed: constituency (IC) in Arabic theory (§2.9); dependency (DG) (§2.9.2); and dependency in Sībawayhi (§2.9.3). There are also chapters about markedness in Arabic theory (§8) and syntax, semantics, and pragmatics (§9).

9.3.3 Linguistic Description of MSA for Analysis Purposes38

An ideal compromise between IC and DG seems to be, for me, a context-free grammar description of MSA in IC terms accounting for the horizontal sequential order, enriched with a second, context-free grammar level attached to the nonterminals of the first level, accounting for relationships and dependencies (DG) in the vertical relational order between elements of a constituent or between constituents of one of the two sentence types (nominal and verbal). A third context-free grammar layer, only (p. 228) describing semantic extensions and properties, proved to be locatable (for the moment) within the two-layer context-free grammar frame.39

Working with two semantically and therefore also syntactically different sentence types in MSA, for both types one can distinguish categorial and functional nonterminals, categorial and functional terminals, lexical terminals, dependencies, and relationships between elements of a constituent and between constituents at sentence level. The syntactic consequences of a semantic property of a lexical entry will, of course, be of importance for the analysis and generation of text data of the language concerned.40

The syntax of sentence types in MSA, Sn, and Sv is described by means of alternating layers of categories, functions, and categories until terminals (here: lexical entries) have been reached. In the Sn, we distinguish two obligatory slots: a topic and a comment. The filler for a topic function characteristically belongs to the class of N’s (here: head) or to the category NP’s (optionally also containing modifiers to the head). In the Sv, we are dealing with a single obligatory slot, a predicate. The filler of a predicate function typically belongs to the class of V’s (head) and is usually realized by a VP (optionally also containing modifiers to the head). The comment function in the Sn as well as optional slots, in both the Sn and the Sv, may be filled with entries of the different classes or with different phrasal categories.
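The slot-and-filler organization just described can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `OBLIGATORY_SLOTS`, `TYPICAL_FILLERS`, `missing_slots`, and `is_typical_filler` are hypothetical, and the comment slot is left unrestricted because the text allows it to be filled by different classes and phrasal categories.

```python
# Obligatory slots per sentence type, as stated in the text.
OBLIGATORY_SLOTS = {
    "Sn": ("topic", "comment"),   # nominal sentence
    "Sv": ("predicate",),         # verbal sentence
}

# Characteristic fillers; slots not listed here are treated as unrestricted.
TYPICAL_FILLERS = {
    "topic": {"N", "NP"},
    "predicate": {"V", "VP"},
}


def missing_slots(sentence_type, filled):
    """Return the obligatory slots of a sentence type that are not yet filled."""
    return [slot for slot in OBLIGATORY_SLOTS[sentence_type] if slot not in filled]


def is_typical_filler(slot, category):
    """Check a category against the characteristic fillers of a slot."""
    allowed = TYPICAL_FILLERS.get(slot)
    return True if allowed is None else category in allowed


print(missing_slots("Sn", {"topic": "NP"}))
print(is_typical_filler("topic", "VP"))
```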

In line with the first words of the Kitāb of Sībawayhi, as parts of speech we distinguish elements of the open word classes nouns (N) and verbs (V) and the closed class of particles (P).41 Elements of these classes realize the head function in phrasal constituents such as noun phrases (NP), verb phrases (VP), and particle phrases (PP) as well as properties of elements of the word classes (N, V, and P), including morphological, syntactic, and semantic features (valency indications) and some pragmatic ones as well.42 The following section introduces a simple sample implementation.

9.3.4 The Formal Description of MSA

For an adequate, coherent, and consistent description of MSA I examined different linguistic theories and models (Ditters 1992) for the best products, implementable in a processing environment for analysis purposes. For example, generalizations in the form of transformations in a transformational generative framework (TG) are too powerful a tool for an overall linguistic description. When a simple machine like a computer failed to decide in a finite time whether or not a certain TG structure was described in the formal implementation of that linguistic description, I opted for a different formalism (AGFL)43 to implement a description (Ditters 2001, 2007, 2011) in terms of a combination of IC and DG.

(p. 229) In contrast to a programming language (dynamic, functional, or relational in nature), a formalism (Ditters 1992: 134) is exclusively for the static, nondeterministic, and declarative description of structures such as programming languages (e.g., Algol-68, Affix- Attribute-Feature- Logical Grammars), also suited for the description of natural languages (e.g., Arabic, Dutch, English, Hebrew, Latin, Spanish). The objective is to test such a hypothetical description of, in our case, MSA syntax structure on real data, for example, an Arabic text corpus. It is the machine-readable text data that determine whether a match should occur or not. Once tested, corrected, and refined, the hypothesis has become a theory, certainly for the language structure represented in the test bed in the form of a new hypothesis to be tested on other Arabic text corpora. Briefly, AGFL is an interwoven two-level context-free grammar formalism44 with almost context-sensitive properties.45

On the scale of the Chomskian hierarchy of grammars (Levelt 1974), context-free grammars are, for the description and automated testing of natural language descriptions, relatively unproblematic. As a matter of fact, Chomsky qualified a context-free grammar as an inadequate descriptive tool for natural languages. However, he never showed, as far as I know, any interest in a combination of two (or even more) context-free grammars (with an almost context-sensitive descriptive power),46 enough to describe, for example, most of Standard Arabic, including at least parts of its semantic richness. Furthermore, he never tested, as far as I know, his linguistic hypotheses and theoretical models practically by computational means.

A rule-based context-free grammar rewrites one single nonterminal at the left-hand side into one or more nonterminals or lexical entries on the right-hand side. AGFL, successor of EAG (Appendix 1), is a formalism for the description of programming languages or natural languages in which large context-free grammars can be described in a compact way. Along with attribute grammars47 and DCGs, AGFLs belong to the family of two-level grammars: the first, context-free level, which accounts for the description of the sequential word order of surface natural language elements, is augmented with set-valued features for expressing agreement between constituents and between elements of a constituent as well as linguistic properties (including semantic features). AGFL is implemented in CDL3 and C.48
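The role of set-valued features in expressing agreement can be illustrated by a rough analogy in Python. The sketch below is not the AGFL implementation: the function `unify` and the feature bundles are hypothetical, and agreement is simply modelled as a non-empty intersection of the value sets that two constituents carry for the same feature.

```python
def unify(left, right):
    """Intersect two set-valued feature bundles.

    Returns the merged bundle, or None when a shared feature has no
    value in common (i.e., the constituents do not agree).
    """
    merged = dict(left)
    for name, values in right.items():
        common = merged.get(name, values) & values
        if not common:
            return None
        merged[name] = common
    return merged


# Hypothetical bundles for a topic and two candidate comments.
topic = {"GENDER": {"fem"}, "NUMBER": {"sing"}}
comment_ok = {"GENDER": {"fem", "masc"}, "NUMBER": {"sing"}}
comment_bad = {"GENDER": {"masc"}, "NUMBER": {"sing"}}

print(unify(topic, comment_ok))
print(unify(topic, comment_bad))
```

An underspecified value set such as `{"fem", "masc"}` narrows to the value required by the other constituent, whereas a clash yields no analysis at all.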

Notational AGFL conventions include the rewrite symbol (:), the marking of alternatives (;), the separation of sequences (,), the end of a rule (.), the layout of nonterminals and terminals of the first level and of the nonterminals and final values of variables of the second level of description in terms of lower- and uppercase representation. (p. 230) Besides that, there is no longer any capacity problem for the storage of electronic data. Therefore, the choice of names and terminal values for the elements of the first and second level of description of, for example, MSA, may be as linguistically recognizable as one prefers.

We use four types of rules within the AGFL formalism: the so-called hyperrules; metarules; predicate rules; and lexical rules:49

  • Hyperrules formally describe the occurrence of elements of word classes, in single or phrasal form. Variation in the sequence of those elements is dealt with by the formalism.

  • Metarules define the nonterminals of the second level of description to a finite set of terminal values.

  • Predicate rules describe and, if needed, condition relationships and dependencies between phrasal constituents or between elements of a constituent.

  • Lexical rules describe final or terminal values of the first level of description, if possible with semantic features, some collocational information (including additional remarks about nonregular and unexpected occurrences in compositional semantics).
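The metarule idea, a second-level nonterminal defined over a finite set of terminal values, can be sketched as follows. The domains are copied from the sample grammar given in this section; the dictionary `META_RULES`, the function `terminal_values`, and the reading that an uppercase alternative names another metarule are assumptions made for the illustration, and the sketch presupposes that the metarules contain no cycles.

```python
# A few metarules from the sample grammar, written as name -> alternatives.
META_RULES = {
    "CASE": ["acc", "gen", "nom", "invar"],
    "NUMBER": ["coll", "dual", "PLUR", "sing"],
    "PLUR": ["explu", "inplu"],
    "MOOD": ["imper", "indic", "juss", "subj"],
    "TENSE": ["perfect", "MOOD"],
}


def terminal_values(name, rules=META_RULES):
    """Expand a second-level nonterminal to its finite list of terminal values."""
    values = []
    for alternative in rules[name]:
        if alternative in rules:
            # Nested nonterminal, e.g. PLUR inside NUMBER: expand recursively.
            values.extend(terminal_values(alternative, rules))
        else:
            values.append(alternative)
    return values


print(terminal_values("NUMBER"))
print(terminal_values("TENSE"))
```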

However, as is well-known, natural languages go further than the addition of the meaning of individual elements for capturing the real meaning of the linguistic data concerned.50

In Figure 9.4, Jaszczolt (2005) illustrates the process of utterance interpretation within the compositional theory of default semantics. It may be clear that the meaning of an utterance does not equal the sum of the meaning of its constituents.

The following presents a sample grammar51 of a two-level context-free rewrite AGFL52 for Modern Standard Arabic:53 (p. 231)

Figure 9.4 Utterance interpretation in default semantics (Jaszczolt 2005, 73).

  • GRAMMAR sentence.
  • ROOT sentence(topic).
  • # 1 Meta rules
  • # These define the finite domain of values for non-terminals of the second level of description.
  • # This second level enables accounting for relationships and dependencies.
  • CASE::acc|gen|nom|invar.
  • DECLEN::defec|dipt|CASE.
  • DEFNESS::def|indef.
  • GENDER::fem|masc.
  • HEADREAL::com|pers|prop.
  • MODE::nominal|verbal.
  • MOOD::imper|indic|juss|subj.
  • NUMBER::coll|dual|PLUR|sing.
  • ORDER::topic|elliptic_topic.
  • PERSON::first|second|third.
  • PLUR::explu|inplu.
  • TENSE::perfect|MOOD.
  • TYPES::direc|finalintr|place|timeprep.
  • VOICE::active|passive.
  • # 2 Hyper rules
  • # They describe functions and categories of non-terminals at the first level of description,
  • (p. 232) # until lexical values have been reached. This level enables accounting, in an efficient way,
  • # for the generalization of word order and sentence structure.54
  • sentence(topic):
  • topic(GENDER,NUMBER),
  • topic comp(GENDER,NUMBER).
  • topic(GENDER,NUMBER):
  • nounphrase(def,GENDER,NUMBER,PERSON,nom|invar);
  • prep(finalintr),
  • np(HEADREAL,def,GENDER,NUMBER,PERSON,gen);
  • bound prep(finalintr) +
  • np(HEADREAL,def,GENDER,NUMBER,PERSON,gen).
  • topic comp(GENDER,NUMBER):
  • predicate(MODE,DEFNESS,third,GENDER,NUMBER);
  • np(HEADREAL,DEFNESS,GENDER,NUMBER,PERSON,nom);
  • adjp(DEFNESS,GENDER,NUMBER,CASE);
  • ap;
  • pp.
  • nounphrase(def,GENDER,NUMBER,PERSON,nom|invar):
  • np(HEADREAL,def,GENDER,NUMBER,PERSON, nom|invar).
  • np(HEADREAL,def,GENDER,NUMBER,PERSON,CASE):
  • head(HEADREAL,DEFNESS,GENDER,NUMBER,PERSON,CASE), DEF is(HEADREAL,DEFNESS).
  • predicate(verbal,DEFNESS,third,GENDER,NUMBER):
  • vp(TENSE,PERSON,GENDER,NUMBER).
  • predicate(nominal,DEFNESS,PERSON,GENDER,NUMBER):
  • np(HEADREAL,DEFNESS,GENDER,NUMBER,PERSON,nom),
  • headreal is(HEADREAL).
  • head(HEADREAL,DEFNESS,GENDER,NUMBER,PERSON,CASE):
  • noun(DECLEN,GENDER,NUMBER).
  • noun(DECLEN,GENDER,NUMBER):
  • common noun(DECLEN,GENDER,NUMBER);
  • pers pronoun(GENDER,NUMBER,PERSON,CASE);
  • proper noun(DECLEN,DEFNESS,GENDER,NUMBER).
  • vp(TENSE,PERSON,GENDER,NUMBER):
  • verb(TENSE,VOICE,PERSON,GENDER,NUMBER).
  • # 3 Predicate rules
  • # They describe, or even determine, relations and dependencies between values of elements
  • (p. 233) # of the second level of description by means of the conditioning of specific values.
  • # This type of rule is sometimes called “empty rules.”
  • # DEF is(HEADREAL,DEFNESS).
  • DEF is(com,DEFNESS):.
  • DEF is(pers,def):.
  • DEF is(prop,def):.
  • # Headreal is(HEADREAL).
  • headreal is(com):.
  • headreal is(pers):.
  • headreal is(prop):.
  • # 4 Lexical rules
  • # They describe the final or lexical value of entries in the lexicon.
  • # Adjp(DEFNESS,GENDER,NUMBER,CASE) and ap.
  • “adjp” adjp(DEFNESS,GENDER,NUMBER,CASE)
  • “ap”    ap
  • # Common noun(norm,masc,sing), including pronouns and proper nouns.

  • “raǧul” common noun(norm,masc,sing)
  • “riǧāl” common noun(norm,masc,inplu)
  • “bint” common noun(norm,fem,sing)
  • “banāt” common noun(norm,fem,inplu)
  • # Pers pronoun(fem|masc,sing,first,nom).
  • “ʾanā” pers pronoun(fem|masc,sing,first,nom)
  • “naḥnu” pers pronoun(fem|masc,inplu,first,nom)
  • “ī” pers pronoun(fem|masc,sing,first,gen|acc)
  • “nī” pers pronoun(fem|masc,sing,first,gen|acc)
  • “nā” pers pronoun(fem|masc,inplu,first,gen|acc)
  • # Proper noun(dipt,def,masc,sing).
  • “ʾaḥmad” proper noun(dipt,def,masc,sing)
  • “muḥammad” proper noun(norm,def,masc,sing)
  • “fātima” proper noun(norm,def,fem,sing)
  • # Prep(TYPES).
  • “la” bound prep(finalintr)
  • “la” prep(finalintr)
  • “pp” pp
  • # Verb(TENSE,VOICE,PERSON,GENDER,NUMBER).
  • “kataba” verb(perfect,active,third,masc,sing)
  • “kutiba” verb(perfect,passive,third,masc,sing)
  • “yaktubu” verb(indic,active,third,masc,sing)
  • “yuktabu” verb(indic,passive,third,masc,sing)

  • # Remark 55
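How the predicate rules and lexical rules of the sample grammar interact can be approximated in a few lines of Python. This is a simplified sketch, not an AGFL parser: the table `DEF_IS` mirrors the three alternatives of the predicate rule DEF is, the lexicon entries are taken from the lexical rules above, and the function `accepts`, the dictionary layout, and the mapping of proper nouns to the head realization `prop` are assumptions made for the illustration. Only definiteness of the topic and gender and number agreement with a verbal predicate are checked.

```python
# Predicate rule DEF is(HEADREAL, DEFNESS): common nouns may be definite or
# indefinite; personal pronouns and proper nouns are definite.
DEF_IS = {"com": {"def", "indef"}, "pers": {"def"}, "prop": {"def"}}


def def_is(headreal, defness):
    """True when the definiteness value is allowed for this head realization."""
    return defness in DEF_IS.get(headreal, set())


# Small subset of the sample lexicon (feature names are hypothetical).
NOUNS = {
    "muḥammad": {"headreal": "prop", "defness": "def", "gender": "masc", "number": "sing"},
    "fātima": {"headreal": "prop", "defness": "def", "gender": "fem", "number": "sing"},
}
VERBS = {
    "kataba": {"tense": "perfect", "voice": "active", "person": "third",
               "gender": "masc", "number": "sing"},
}


def accepts(topic, verb):
    """Check a topic plus verbal comment for definiteness and agreement."""
    n, v = NOUNS[topic], VERBS[verb]
    return (
        n["defness"] == "def"
        and def_is(n["headreal"], n["defness"])
        and v["person"] == "third"
        and n["gender"] == v["gender"]
        and n["number"] == v["number"]
    )


print(accepts("muḥammad", "kataba"))
print(accepts("fātima", "kataba"))
```

The second call fails on gender agreement, which is the kind of dependency the second level of description is meant to capture.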

9.4 Perspectives for Further Linguistic and Formal Research on Arabic Syntax

In a joint contribution to the Nemlar conference (Ditters and Koster 2004),56 we explored the potential of the existing approach to MSA syntax for other, socially equally relevant, spinoffs of my description of Arabic for corpus-linguistic analysis purposes. First, I prefer to finish testing the current implementation hypothesis about MSA syntax on Arabic text data. Second, I should like to refine the verified theory by means of a formal description of MSA syntax for generative purposes. An implementation worded in dependency grammar terms could be extracted from such research. Finally, I would like to make a subset of DG rules, implementable within the AGFL processing environment, for research on aboutness57 in Arabic text data.

Appendices

(p. 234) (p. 235) (p. 236)

Appendix-1 Frequently used Abbreviations and their Meaning

Symbol  Meaning
A       Aboutness
AB      Arabic Braille
ABP     Arabic proposition bank
AC      Automatic Correction
AGFL    Affix Grammar over Finite Lattices
AI      Artificial Intelligence
AS      Answering System
ASL     Arabic Sign Language
ASR     Automatic Speech Recognition
BP      Base phrase
CALL    Computer Assisted Language Learning
CFG     Context Free Grammar
CL      Computational Linguistics
CoL     Corpus Linguistics
DM      Data Mining
DR      Document Routing
EAG     Extended Affix Grammar
EBL     Example-based Learning, cf. MBL
EE      Event Extraction
FE      Factoid Extraction
FGD     Function Generative Dependency
FSA     Finite State Automaton
FSM     Finite State Machine
FST     Finite State Transducer
HR      Handwriting Recognition
IBL     Instance-based Learning, cf. MBL
IC      Immediate Constituency
ID      Immediate Dominancy
IR      Information Retrieval
LL      Lazy Learning, cf. MBL
LM      Language Modeling
LP      Linear Precedence
MBC     Morphological Behavior Class
MBL     Memory-based Learning
MFH     Morphological Form Harmony
MLA     Machine Learning Approach
MSA     Modern Standard Arabic
MT      Machine Translation
(S)MT   (Statistical) MT
NER     Named Entity Recognition
NET     Named Entity Translation
NL      Natural language
NLP     Natural Language Processing
OCR     Optical Character Recognition
OT      Optimality Theory
ON      Orthographic Normalization
POS     Parts of Speech
POS-T   POS-tagging
QAS     Question Answering System
RBA     Rule Based Approach
SA      Statistical Approach
SBA     Stem Based Approach
SC      Spelling Correction
SG      Speech Generation
SLA     Supervised Learning Approach
SP      Signal Processing
SP      Speech Processing
SR      Speech Recognition
STS     Speech-to-Speech
STT     Speech-to-Text
SVM     Support Vector Machines
TC      Text Categorization
TDT     Topic Detection and Tagging
TG      Text Generation
TM      Text Mining
TP      Text Processing
TTS     Text-to-Speech
TTT     Text-to-Text
ULA     Unsupervised Learning Approach

Appendix-2 Relevant Notions and their Global (mostly linguistically oriented) Meaning

Notion                      Description
Affix                       Here: a technical term for a position and/or its finite value as part of a meta rule in the framework of a formalized two-level language description; a linguistic term for any pre-, in- and/or postposed element of a word class to a central element of a word class in the same language description
Analysis                    Decomposition of a linguistic descriptive unit in its smaller formal and/or functional constituents
Chunking                    Base Phrase (BP) chunking is the process of creating non-recursive base phrases such as adjectival-, determiner-, noun-, particle-, prepositional-, verb phrases, and the like
Cliticization               Concatenation of a function word with an entry of another word category (for instance definite article in Arabic)
Compositional semantics     The Principle of Compositionality is, in mathematics, semantics, and philosophy of language, the principle that the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them (Wikipedia)
Computer language           The means by which instructions and data are transmitted to computers
Concatenative               In a root- and pattern language system two consecutive constituents of the language basic inventory
Context-free                Any production of the form A → B, where |B| ≠ 0, is allowed, B being any string of terminal and non-terminal elements (Levelt, 1974, Vol. 1: 17)
Data (Text) Mining          High quality information extraction
Declarative                 In this context: only static structures are defined or described
Decliticization             Separation of a function word from an entry of another word category (for instance definite article in Arabic)
Dependent                   Here, an element in a dependency relation from/with another element of description
Deterministic               In this context: from the beginning the machine is explicitly being told: how to start, where to find what it needs for the execution of the program, what to do with it, what the next step will be, and when its activity will come to an end
Detokenization              Reconstructing tokens into a stream of text
Diacritization              The explicit addition of all and/or almost all of the implicitly understood scriptural- or meaning markers in a textual representation of a communication
Disambiguation              Here, the reduction of a multiple result in an automated linguistic analysis of authentic text data (morphologically, syntactically and/or semantically) to a unique most probable result
Dynamic                     In this context: there is a beginning, a series of steps to be taken, leading to an end
Dynamic programming         A man-machine medium, accentuating the process of a beginning, and the execution of a finite set of steps to be taken
Filler                      Here, the possible content (category) of an obligatory or optional function within a formalized language description
Formal                      Here, a symbolic as well as an automated element in an implemented (IT) language application
Formal description          A, usually rule-based, linguistic description of, in our case, the structure of a natural language for automated processing
Formal language             A set of words, i.e. finite strings of letters, symbols, or tokens, in computer science used, among other things, for the precise definition of data formats and the syntax of programming languages (Wikipedia)
Formalism                   A, usually rule-based, medium between man and machine for computer processing purposes
Frameset                    Meaning label of a verb with information about the number and function of its arguments
Functional programming      A man-machine medium, accentuating functional elements in the process of a beginning, the execution of a set of steps to be taken inevitably leading to an end
Generation                  Production of a greater whole from smaller constituents
Head                        Central element in a linguistic unit
Language Modeling           Statistical estimation of the distribution of natural language as accurately as possible
Lemmatization               Grouping together the different inflected forms of a word so they can be analysed as a single item
Light stemming              Splitting off of post-clitics
Modifier                    Linguistic element added to further specify a Head
Morphotactics               The syntax of morphemes
Non-concatenative           In a root- and pattern language system two non-consecutive constituents of the language basic inventory
Non-declarative             In this context: static as well as dynamic structures are defined or described
Non-deterministic           In this context: only the machine-readable input data condition a match with preprogrammed information
Non-terminal                Here, a symbolic name of an element of the first (word order) or the second level (relationships and/or dependencies) of language description
Orthographic normalization  Mapping variant orthographic forms to a single canonical form
Phrase chunking             The separation and segmentation of sentences into their sub-constituents, such as noun, verb, and prepositional phrases
POS-tagging                 Assigning a contextually appropriate morpho-syntactic annotation to every word in a sentence
Programming language        An artificial language designed to express computations that can be performed by a machine, particularly a computer
Propbank                    A semantically annotated text corpus
Recall                      The number of correct analyses/generations of the system divided by the number of all analyses/generations in the evaluation reference
Recognition                 Match of something with something already known, or acknowledgement of something as valid
Relational programming      A man-machine medium, accentuating relational elements, for instance dependencies, in the process of a beginning, the execution of a set of steps to be taken inevitably leading to an end
Slot                        Here, an optional or obligatory function or position within a linguistic unit
Static                      In this context: a description of structures with a beginning, a series of steps to be taken, and an end conditioned by the end of the input to be structurally matched
Stem                        The root or roots of a word, together with any derivational affixes, to which inflectional affixes are added
Stemming                    Splitting off of clitics and other non-core morphemes resulting in the representation of the pure stem (stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form)
Subcategorization           Finer grained restrictions on the arguments a head can take
Synthesis                   Composition of a linguistic descriptive unit from its smaller formal and/or functional constituents
Tag                         A keyword or term associated with or assigned to a piece of information
Tagging                     Assignment of a part-of-speech or another (sub-) category marker to each word in a corpus (of natural language text)
Tagset                      A list of keywords or terms to be associated with or assigned to a piece of information
Terminal                    Here, a final value of the first (lexicon) or the second level (meta rules) of language description
Text Categorization         The task of assigning predefined categories to free-text documents
Text Mining                 High-quality information extraction
Tier                        A layer or ranking or classification-group in any real or imagined hierarchy
Tokenization                Breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens (Tokenization is the process of segmenting clitics from stems)
Tokenization Schemes        Specification of the form of preprocessed output
Tokenization Techniques     Specification of how to preprocess output
Transducer                  A transducer is a device that converts one type of, originally, energy to another. In this context the energy = bit of information
Treebank                    A morphologically and syntactically annotated corpus

Complete Bibliography

Abu Ghazaleh, Ilham N. 1983. Theme and the function of the verb in Palestinian Arabic narrative discourse, University of Florida. Dissertation Abstracts International 45:1738A.

Abu Libdeh, As’ad Abu Jabr. 1991. A discourse perspective on figurative expression in literary works with reference to English/Arabic translation, Heriot-Watt University, United Kingdom. Dissertation Abstracts International 52:3258A.

Alfalahi, Hussain Ali. 1981. The relationship between discourse universals and discourse structure of English and Arabic, University of South Carolina. Dissertation Abstracts International 42:3135A.

Ali, Nabil. 2003. The second wave of Arabic natural language processing: A content perspective. Economic and Social Commission for Western Asia. Expert Group Meeting on Promotion of Digital Arabic Content. Beirut, June 3–5, 2003. United Nations: Economic and Social Council, E/ESCWA/ICTD/2003/WG.2/18, 5.

Al-Jubouri, Adnan J. R. 1984. The role of repetition in Arabic argumentative discourse. In English for specific purposes in the Arab world, ed. J. Swales and H. Mustafa, 99–117. Birmingham: Language Studies, University of Aston.

Al-Najjar, Balkees. 1984. The syntax and semantics of verbal aspect in Kuwaiti Arabic. Ph.D. diss., University of Utah. Dissertation Abstracts International 44:3673A.

Al-Shabab, Omar S. 1986. Organizational and textual structuring of radio news discourse in English and Arabic. Ph.D. diss., University of Aston, Birmingham (United Kingdom). Dissertation Abstracts International 49:348C.

Al-Shalabi, Riyad, and Ghassan Kanaan. 2004. Constructing an automatic lexicon for Arabic language. International Journal of Computing and Information Sciences 2: 114–128.

Al-Sughaiyer, Imad A., and Ibrahim A. Al-Kharashi. 2004. Arabic morphological analysis techniques: A comprehensive survey. Journal of the American Society for Information Science and Technology 55(3): 189–213.

Al-Tarouti, Ahmed Fathalla. 1992. Temporality in Arabic grammar and discourse. Ph.D. diss., University of California, Los Angeles.

Baalbaki, Ramzi. 1979. Some aspects of harmony and hierarchy in Sībawayhi’s grammatical analysis. ZAL 2: 7–22.

———. forthcoming. Naḥw and ṣarf. In The Oxford handbook of Arabic linguistics, ed. Jonathan Owens. Oxford: Oxford University Press.

Bahloul, Maher. 1994. The syntax and semantics of taxis, aspect, tense and modality in standard Arabic. Ph.D. diss., Cornell University.

Baker, Collin, Charles Fillmore, and John Lowe. 1998. The Berkeley FrameNet project (COLING-ACL’98). Proceedings of the University of Montréal Conference, 86–90.

Bangalore, Srinivas, Aravind Joshi, and Owen Rambow. 2003. Dependency and valency in other theories. HSK 25(1): 669–678.

Bar-Lev, Zev. 1986. Discourse theory and “contrastive rhetoric.” Discourse Processes 9: 235–246.

Beesley, Kenneth R. 1989. Computer analysis of Arabic morphology: A two-level approach with Detours. In Perspectives on Arabic linguistics, vol. III, Papers from the 3rd annual symposium on Arabic linguistics, ed. Bernard Comrie and Mushira Eid, 155–172. Amsterdam: John Benjamins.

———. 1996. Arabic finite-state morphological analysis and generation. Proceedings of the 16th international conference on computational linguistics (COLING-96), Copenhagen, Denmark, 89–94.

———. 2001. Finite-state morphological analysis and generation of Arabic at Xerox research: Status and plans in 2001. Proceedings of the EACL workshop on language processing: Status and prospects. Toulouse, France, 1–8.

Bielický, Viktor, and Otakar Smrž. 2008. Building the valency lexicon of Arabic verbs. Proceedings of the 6th international conference on language resources and evaluation (LREC), Marrakech, Morocco.

———. 2009. Enhancing the ElixirFM lexicon with verbal valency frames. In Proceedings of the 2nd international conference on Arabic language resources and tools, ed. Khalid Choukri and Bente Maegaard. Cairo: The MEDAR Consortium.

Black, William, et al. 2006. Introducing the Arabic WordNet project. Proceedings of the 3rd international WordNet conference, Jeju Island, Korea.

Blohm, Dieter. 1989. Arabic verbs of horizontal movement: The action type and semantics of verb forms. Linguistische Studien 189: 1–16.

Bohas, Georges. 1997. Matrices, étymons, racines: Éléments d’une théorie lexicographique du vocabulaire arabe. Leuven: Peeters.

(p. 237) Bohas, Georges, and Mihai Dat. 2008. Lexicon: Matrix and etymon model. In EALL III, ed. Kees Versteegh et al., 45–52. Leiden: Brill.

Bouillon, Pierrette, et al. 2007. Adapting a medical speech to speech translation system (MedSLT) to Arabic. Proceedings of the 5th ACL workshop on computational approaches to Semitic languages, Prague, Czech Republic, 41–48.

Bröker, Norbert. 2003. Formal foundations of dependency grammar. HSK 25(1): 294–310.

Buckwalter, Timothy. 2002. Arabic morphological analyzer version 1.0. Philadelphia: Linguistic Data Consortium.

———. 2004. Issues in Arabic orthography and morphology analysis. In Proceedings of the workshop on computational approaches to Arabic script-based languages (COLING 2004), Geneva, Switzerland, ed. Ali Farghaly and Karine Megerdoomian, 31–34. Stroudsburg, PA: Association for Computational Linguistics.

———. 2010. The Buckwalter Arabic morphological analyzer, ed. Ali Farghaly, 85–101. Philadelphia: Linguistic Data Consortium.

Bunt, Harry, John Carroll, and Giorgio Satta, eds. 2004. New developments in parsing technology. Dordrecht: Kluwer Academic Publishers.

Bunt, Harry, Paolo Merlo, and Joakim Nivre, eds. 2010. Trends in parsing technology. Berlin: Springer.

Busse, Winfried. 2006. Valenzlexika in anderen Sprachen. HSK 25(2): 1424–1435.

Carter, Michael G. 1973. An Arab grammarian of the eighth century A.D.: A contribution to the history of linguistics. JAOS 93: 146–157.

———. 1980. Sibawayhi and modern linguistics. Histoire Épistémologique 2(1980): 21–26.

———. 2007. Pragmatics and contractual language in early Arabic grammar and legal theory. In Approaches to Arabic linguistics: Presented to Kees Versteegh on the occasion of his sixtieth birthday, ed. Everhard Ditters and Harald Motzki, 25–44. Leiden: Brill.

Cavalli-Sforza, Violetta, Abdelhadi Soudi, and Teruko Mitamura. 2000. Arabic morphology generation using a concatenative strategy. Proceedings of the 1st meeting of the North American Chapter of the Association for computational linguistics (NAACL 2000), Seattle, WA, 86–93.

Chenfour, Noureddine. 2006. Automatic language processing. In EALL I, ed. Kees Versteegh et al., 206–216. Leiden: Brill.

Coleman, John, and Janet Pierrehumbert. 1997. Stochastic phonological grammars and acceptability. Proceedings of the third meeting of the ACL special interest group in computational phonology. SIGPHON 3, ed. John Coleman. Stroudsburg, PA: Association for Computational Linguistics.

Dada, Ali, and Aarne Ranta. 2006. Arabic resource grammar. In Proceedings of the Arabic language processing conference (JETALA). Rabat, Morocco: IERA.

Dahl, Östen, and Fathi Talmoudi. 1979. Qad and laqad: Tense/aspect and pragmatics in Arabic. In Aspectology: Papers from the 5th Scandinavian conference of linguistics, Frostavallen, ed. Thore Pettersson, 51–68. Stockholm: Almqvist and Wiksell.

Daoud, Mohamed. 1991. The processing of EST discourse: Arabic and French native speakers’ recognition of rhetorical relationships in engineering texts. Ph.D. diss., University of California, Los Angeles. Dissertation Abstracts International 52:2905A.

Darwish, Kareem, and Douglas W. Oard. 2002a. Building a shallow morphological analyzer in one day. Proceedings of the workshop on computational approaches to Semitic languages in the 40th annual meeting of the Association for computational linguistics (ACL-02), Philadelphia, 47–54.

———. 2002b. CLIR experiments at Maryland for TREC-2002: Evidence combination for Arabic-English retrieval. The 11th text retrieval conference (TREC11).

———. 2002c. Term selection for searching printed Arabic. SIGIR’02: Proceedings of the 25th annual international ACM SIGIR Conference on research and development in information retrieval, New York, 261–268.

———. 2007. Adapting morphology for Arabic information retrieval. In Arabic computational morphology: Knowledge-based and empirical methods, ed. Abdelhadi Soudi, Antal van den Bosch, and Günter Neumann, 245–262. New York: Springer.

Debusmann, Ralph. 2004. Extensible dependency grammar: A new methodology. In Proceedings of the Coling 2004 workshop on recent advances in dependency grammar, Geneva, Switzerland, ed. Denys Duchier and Geert-Jan M. Kruijff, 70–76. Geneva: Coling.

———. 2006. Extensible dependency grammar: A modular grammar formalism based on multigraph description. Ph.D. diss., Saarland University.

DeMiller, Anna L. 1988. Syntax and semantics of the form II modern standard Arabic verb. Al-’Arabiyya 21: 19–50.

Dévényi, Kinga, Tamás Iványi, and Avihai Shivtiel, eds. 1994. Proceedings of the colloquium on Arabic lexicology and lexicography (C.A.L.L.): Budapest 1–7 September. Part one: Papers in European languages. Part two: Papers in Arabic. Budapest: Eötvös Loránd University Chair for Arabic Studies.

Diab, Mona T. 2004. An unsupervised approach for bootstrapping Arabic sense tagging. In Proceedings of the workshop on computational approaches to Arabic script-based languages (COLING 2004), Geneva, Switzerland, ed. Ali Farghaly and Karine Megerdoomian, 43–50. Stroudsburg, PA: Association for Computational Linguistics.

———. 2007. Towards an optimal POS tag set for modern standard Arabic processing. Proceedings of recent advances in natural language processing (RANLP), Borovets, Bulgaria.

Diab, Mona, Kadri Hacioglu, and Daniel Jurafsky. 2004. Automatic tagging of Arabic text: From raw text to base phrase chunks. Proceedings of the 5th meeting of the North American chapter of the Association for computational linguistics/human language technologies conference (HLT-NAACL04), Boston, MA, 149–152.

Diab, Mona, Musa Alkhalifa, Sabry ElKateb, Christiane Fellbaum, Aous Mansouri, and Martha Palmer. 2007. Arabic semantic labeling. Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007, Task 18), Prague, Czech Republic, 93–98.

Diab, Mona, Alessandro Moschitti, and Daniele Pighin. 2008. Semantic role labeling systems for Arabic language using kernel methods. Proceedings of ACL-08: HLT, Columbus, Ohio, 798–806.

Dichy, Joseph. 2005. Spécificateurs engendrés par les traits [±animé], [±humain], [±concret] et structures d’arguments en arabe et en français. In De la mesure dans les termes. Actes du colloque en hommage à Philippe Thoiron, ed. Henri Béjoint and François Maniez, 151–181. Travaux CRTT, Lyon: Presses Universitaires de Lyon.

Ditters, Everhard. 1992. A formal approach to Arabic syntax: The noun phrase and the verb phrase. Ph.D. diss., Nijmegen University.

———. 2001. A formal grammar for the description of sentence structure in modern standard Arabic. In EACL 2001 Workshop Proceedings on Arabic language processing: Status and prospects, 31–37. Toulouse, France. http://www.elsnet.org/acl2001-arabic.html

———. 2006. Computational linguistics. In EALL I, ed. Kees Versteegh et al., 511–518. Leiden: Brill.

———. 2007. Featuring as a disambiguation tool in Arabic NLP. In Approaches to Arabic linguistics: Presented to Kees Versteegh on the occasion of his sixtieth birthday (Studies in Semitic Languages and Linguistics), ed. Everhard Ditters and Harald Motzki, 367–402. Leiden: Brill.

———. 2011. A formal description of sentences in Modern Standard Arabic. In Studies in Semitic languages and linguistics, ed. T. Muraoka, A. Rubin, and C. Versteegh, 511–551. Leiden: Brill.

Ditters, Everhard, and Cornelis H. A. Koster. 2004. Transducing Arabic phrases into head-modifier (HM) pairs for Arabic information retrieval. In Proceedings of the NEMLAR 2004 international conference on Arabic language resources and tools, September 22–23, 148–154. Cairo.

Duchier, Denys, and Ralph Debusmann. 2001. Topological dependency trees: A constraint-based account of linear precedence. 39th Annual meeting of the Association for computational linguistics. Toulouse, France.

Eijck, J. van, and Christina Unger. 2010. Computational semantics with functional programming. Cambridge: Cambridge University Press.

Eisele, John. 1988. The syntax and semantics of tense, aspect and time reference in Cairene colloquial Arabic. Ph.D. diss., University of Chicago.

———. 1990. Aspectual classification of verbs and predicates in Cairene Arabic. In Perspectives on Arabic linguistics II, ed. Mushira Eid and John McCarthy, 193–233. Amsterdam: John Benjamins.

———. 1990. Time reference, tense and formal aspect in Cairene Arabic. In Perspectives on Arabic linguistics I, ed. Mushira Eid, 173–212. Amsterdam: John Benjamins.

———. 1992. Egyptian/Cairene Arabic auxiliaries and the category of AUX. In Perspectives on Arabic linguistics IV, ed. Ellen Broselow, Mushira Eid, and John McCarthy, 143–165. Amsterdam: John Benjamins.

Elkateb, Sabri, et al. 2006. Building a WordNet for Arabic. Proceedings of the 5th international conference on language resources and evaluation (LREC 2006), Genoa, Italy.

Fakhri, Ahmed. 1995. Topic continuity in Arabic narrative discourse. In Perspectives on Arabic linguistics, vol. VII, Papers from the annual symposium on Arabic linguistics, ed. Mushira Eid, 141–155. Amsterdam: John Benjamins.

———. 1998. Reported speech in Arabic journalistic discourse. In Perspectives on Arabic linguistics, vol. XI, Papers from the annual symposium on Arabic linguistics, ed. Elabbas Benmamoun, Mushira Eid, and Niloofar Haeri, 167–182. Amsterdam/Philadelphia: John Benjamins.

———. 2002. Borrowing discourse patterns: French rhetoric in Arabic legal texts. In Perspectives on Arabic linguistics, ed. Dilworth Parkinson and Elabbas Benmamoun, 155–170. Amsterdam: John Benjamins.

Fareh, Shehdeh I. 1988. Paragraph structure in Arabic and English expository discourse. Ph.D. diss., University of Kansas. Dissertation Abstracts International 50:1292A.

Farghaly, Ali, ed. 2010. Arabic computational linguistics. Stanford, CA: CSLI Publications.

Farghaly, Ali, and Karine Megerdoomian, eds. 2004. Proceedings of the workshop on computational approaches to Arabic script-based languages (COLING 2004), Geneva, Switzerland. Stroudsburg, PA: Association for Computational Linguistics.

Fellbaum, Christiane, ed. 1998. WordNet: An electronic lexical database. Cambridge, MA: MIT Press. <http://wordnet.princeton.edu/>.

Fillmore, Charles J. 1968. The case for case. In Universals in linguistic theory, ed. Emmon Bach and Robert T. Harms, 1–88. New York: Holt, Rinehart, and Winston.

———. 2003. Valency and semantic roles: The concept of deep structure case. HSK 25(1): 457–475.

Forsberg, Markus, and Aarne Ranta. 2004. Functional morphology. In Proceedings of the 9th ACM SIGPLAN international conference on functional programming, ICFP 2004, 213–223. New York: ACM Press.

Ghobrial, Atef N. 1993. Discourse markers in colloquial Cairene Egyptian Arabic: A pragmatic perspective. Ph.D. diss., Boston University. Dissertation Abstracts International 53:4299A.

Gully, Adrian J. 1992. Aspects of semantics, grammatical categories and other linguistic considerations in Ibn-Hisham’s Mughni al-Labib. Ph.D. diss., University of Exeter.

Habash, Nizar. 2004. Large scale lexeme-based Arabic morphological generation. Proceedings of Traitement Automatique des Langues Naturelles (TALN-04), Fez, Morocco, 271–276.

———. 2010. Introduction to Arabic natural language processing. San Rafael, CA: Morgan & Claypool Publishers.

Habash, Nizar, and Ryan Roth. 2009. CATiB: The Columbia Arabic treebank. Proceedings of the ACL-IJCNLP 2009 conference Short Papers, Suntec, Singapore, 221–224.

Habash, Nizar, Owen Rambow, and George Kiraz. 2005. Morphological analysis and generation for Arabic dialects. Proceedings of the Workshop on computational approaches to Semitic languages at 43rd meeting of the Association for computational linguistics (ACL’05), Ann Arbor, Michigan, 17–24.

Hafez, Ola M. 1991. Turn-taking in Egyptian Arabic: Spontaneous speech vs. drama dialogue. Journal of Pragmatics 15: 59–81.

Hajič, Jan, Jan Hric, and Vladislav Kuboň. 2000. Machine translation of very close languages. Proceedings of the 6th applied natural language processing conference (ANLP’2000), Seattle, Washington, 7–12.

Hajič, Jan, Barbora Hladká, and Petr Pajas. 2001. The Prague dependency treebank: Annotation structure and support. In Proceedings of the IRCS workshop on linguistic databases, 105–114. Philadelphia: University of Pennsylvania.

Hajič, Jan et al. 2004a. Prague Arabic dependency treebank: Development in data and tools. NEMLAR international conference on Arabic language resources and tools, ELDA, 110–117.

Hajič, Jan et al. 2004b. Prague Arabic dependency treebank 1.0. Catalog number LDC2004T23. LDC, Philadelphia.

Hajič, Jan et al. 2005. Feature-based tagger approximations of functional Arabic morphology. Proceedings of the 4th workshop on treebanks and linguistic theories (TLT 2005), Barcelona, Spain, 53–64.

Hajičová, Eva, and Petr Sgall. 2003. Dependency syntax in functional generative description. HSK 25(1): 570–592.

———. 2004. Degrees of contrast and the topic-focus articulation. In Information structure: Theoretical and empirical aspects, vol. 1, Language, context, and cognition, ed. Anita Steube, 1–13. Berlin: Walter de Gruyter.

Hassanein, Ahmed T. 2008. Lexicography: Monolingual dictionaries. In EALL III, ed. Kees Versteegh et al., 37–45. Leiden: Brill.

Hatim, Basil. 1987a. Discourse texture in translation: Towards a text-typological redefinition of theme and rheme. In Proceedings of the conference, held at Heriot-Watt University, Edinburgh, January 5–7, 1986. Translation in the modern languages degree, ed. Hugh Keith and Ian Mason, 52–62. London: Regent’s College.

———. 1987b. A text linguistic model for the analysis of discourse errors: Contributions from Arabic linguistics. In Grammar in the construction of texts, ed. James Monaghan, 102–113. London: Pinter.

———. 1989. Text linguistics in the didactics of translation: The case of the verbal and nominal clause types in Arabic. Department of Languages, Heriot-Watt University 27: 137–144.

Hellwig, Peter. 1986. Dependency unification grammar. Proceedings of the 11th international conference on computational linguistics (Coling 1986), University of Bonn, Bonn, 195–198.

———. 2003. Dependency unification grammar. HSK 25(1): 593–635.

———. 2006. Parsing with dependency grammars. HSK 25(2): 1081–1109.

Hoogland, Jan. 2008. Lexicography: Bilingual dictionaries. In EALL III, ed. Kees Versteegh et al., 21–30. Leiden: Brill.

Horacek, Helmut. 2006. Generation with dependency grammars. HSK 25(2): 1109–1129.

Hudson, Richard A. 1976. Arguments for a non-transformational grammar. Chicago: University of Chicago Press.

———. 1984. Word grammar. Oxford: Blackwell.

———. 1990. English word grammar. Oxford: Blackwell.

———. 2003. Word grammar. HSK 25(1): 508–526.

Jaszczolt, Katarzyna M. 2005. Default semantics: Foundations of a compositional theory of acts of communication. Oxford: Oxford University Press.

Johnstone, Barbara. 1990. “Orality” and discourse structure in modern standard Arabic. In Perspectives on Arabic linguistics, vol. I, Papers from the annual symposium on Arabic linguistics, ed. Mushira Eid, 215–233. Amsterdam/Philadelphia: John Benjamins.

———. 1991. Repetition in Arabic discourse: Paradigms, syntagms, and the ecology of language. Amsterdam: John Benjamins.

Jurafsky, Daniel, and James H. Martin. 2009. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Upper Saddle River, NJ: Prentice Hall.

Justice, David B. 1981. The semantics of form in Arabic in the mirror of European languages. Ph.D. diss., University of California, Berkeley. Dissertation Abstracts International 42:5107A.

———. 1987. The semantics of form in Arabic in the mirror of European languages. Amsterdam: John Benjamins.

Kahane, Sylvain. 2003. The meaning-text theory. HSK 25(1): 546–570.

Kaplan, Ronald, and Martin Kay. 1981. Phonological rules and finite-state transducers. Linguistic Society of America meeting handbook, 56th annual meeting, New York.

Kay, Martin. 1987. Nonconcatenative finite-state morphology. Workshop on Arabic morphology. Stanford, CA: Stanford University Press, 2–10.

Khalil, Esam N. 1985. News discourse: A strategy of recasting. Journal of Pragmatics 9(5): 621–643.

Khoja, Shereen, Roger Garside, and Gerry Knowles. 2001. A tagset for the morphosyntactic tagging of Arabic. Proceedings of corpus linguistics 2001. Lancaster, UK, 341–353.

Kiraz, George Anton. 1994. Multi-tape two-level morphology: A case study in Semitic non-linear morphology. Proceedings of 15th international conference on computational linguistics (COLING-94), Kyoto, Japan, 180–186.

———. 1997. Compiling regular formalisms with rule features into finite-state automata. ACL/EACL-97, Madrid, Spain, 329–336.

———. 2000. Multi-tiered nonlinear morphology using multi-tape finite automata: A case study on Syriac and Arabic. Computational Linguistics 26: 77–105.

———. 2001. Computational nonlinear morphology with emphasis on Semitic languages. Studies in natural language processing. New York: Cambridge University Press.

Kiraz, George Anton, and Bernd Möbius. 1998. Multilingual syllabification using weighted finite-state transducers. Proceedings of the 3rd ESCA workshop on speech synthesis, Jenolan Caves, Australia, 71–76.

Köprü, Selçuk, and Jude Miller. 2009. A unification based approach to the morphological analysis and generation of Arabic. 3rd Workshop on computational approaches to Arabic script-based languages (CAASL3).

Koch, Barbara J. 1981. Repetition in discourse: Cohesion and persuasion in Arabic argumentative prose. Ph.D. diss., University of Michigan. Dissertation Abstracts International 42:3983A.

———. 1983a. Arabic lexical couplets and the evolution of synonymy. General Linguistics 23(1): 51–61.

———. 1983b. Presentation as proof: The language of Arabic rhetoric. Anthropological Linguistics 25: 47–60.

———. 1984. Rhetorical style in Arabic: An addendum to “The least you should know about Arabic.” TESOL Quarterly 18(3): 542–545.

Koskenniemi, Kimmo. 1983. Two-level morphology: A general computational model for word-form recognition and production. Publications, no. 11. Helsinki: Department of General Linguistics, University of Helsinki.

Koster, Kees. 1971. Affix grammars. In Algol 68 Implementation, ed. J. Peck, 95–105. Amsterdam: North Holland.

———. 1991. Affix grammars for natural languages. Technical Report (1991): 7–12. Nijmegen University, Department of Informatics.

———. 2011. An aboutness-based dependency parser for Dutch. Coling’11.

Kulick, Seth, Ryan Gabbard, and Mitch Marcus. 2006. Parsing the Arabic treebank: Analysis and improvements. Proceedings of the Treebanks and Linguistic Theories Conference. Prague, Czech Republic, 31–42.

Larcher, Pierre. 1990. Eléments pragmatiques dans la théorie grammaticale arabe post-classique. In Studies in the history of Arabic grammar II, ed. Kees Versteegh and Michael G. Carter, 195–214. Amsterdam: John Benjamins.

Levelt, Willem J. M. 1974. Formal grammars in linguistics and psycholinguistics: An introduction to the theory of formal languages. 3 vols. Den Haag/Paris: Mouton.

Maamouri, Mohamed, and Ann Bies. 2010. The Penn Arabic treebank. In Arabic computational linguistics, ed. Ali Farghaly, 103–135. Stanford, CA: CSLI Publications.

Maamouri, Mohamed et al. 2004. The Penn Arabic treebank: Building a large-scale annotated Arabic corpus. Paper presented at the NEMLAR International Conference on Arabic Language Resources and Tools. Cairo, September 22–23.

Maamouri, Mohamed, Ann Bies, and Seth Kulick. 2009. Creating a methodology for large-scale correction of treebank annotation: The case of the Arabic treebank. In Proceedings of MEDAR international conference on Arabic language resources and tools. Cairo: Medar.

Mahmoud, Reda A. H. 2008. A text-pragmatic approach to moot questions in Arabic. In Perspectives on Arabic linguistics, ed. Dilworth B. Parkinson, 43–67. Amsterdam: John Benjamins.

Manning, Christopher D., and Hinrich Schütze. 2003 [1999]. Foundations of statistical natural language processing. Cambridge, MA: MIT Press (6th printing).

Marton, Yuval, Nizar Habash, and Owen Rambow. 2010. Improving Arabic dependency parsing with lexical and inflectional morphological features. Proceedings of the NAACL HLT 2010 1st workshop on statistical parsing of morphologically-rich languages. Los Angeles, CA, 13–21.

Maxwell, Dan. 1995. Unification dependency grammar. http://linguistlist.org/issues/6/6-792.html#4.

———. 2003. The concept of dependency in morphology. HSK 25(1): 678–684.

McCarthy, John. 1981. A prosodic theory of non-concatenative morphology. Linguistic Inquiry 12: 373–418.

Mel’čuk, Igor. 1988. Meaning-text theory (MTT).

———. 1988. Dependency syntax: Theory and practice. Albany: State University of New York Press.

———. 2003. Levels of dependency description: Concepts and problems. HSK 25(1): 188–229.

Mesfar, Slim. 2010. Towards a cascade of morpho-syntactic tools for Arabic natural language processing. In Computational linguistics and intelligent text processing. Proceedings of the 11th international conference CICLing 2010, Iaşi, Romania. LNCS 6008. New York: Springer.

Mitchell, Terence F. 1985. Sociolinguistic and stylistic dimensions of the educated spoken Arabic of Egypt and the Levant. In Language standards and their codification: Process and application, ed. J. Douglas Woods, 42–57. Exeter: University of Exeter.

Mohammad, Mahmud D. 1985. Stylistic rules in classical Arabic and the levels of grammar. Studies in African Linguistics Supplement 9: 228–232.

Moutaouakil, Ahmed. 1985. Topic in Arabic: Towards a functional analysis. In Syntax and pragmatics in functional grammar, ed. A. Machtelt Bolkestein, Casper de Groot, and J. Lachlan Mackenzie. Dordrecht: Foris.

———. 1987. Pragmatic functions in functional grammar of Arabic. Providence: Foris.

———. 1989. Pragmatic functions in a functional grammar of Arabic (Functional Grammar Series 8). Dordrecht/Providence: Foris.

Msellek, Abderrazaq. 2006. Kontrastive Fallstudie: Deutsch–Arabisch. HSK 25(2): 1287–1292.

Mughazy, Mustafa A. 2003. Discourse particles revisited: The case of Wallahi in Egyptian Arabic. In Perspectives on Arabic linguistics XV, ed. Dilworth B. Parkinson and Samira Farwaneh, 3–17. Amsterdam: John Benjamins.

———. 2005. Rethinking lexical aspect in Egyptian Arabic. In Perspectives on Arabic linguistics XVII–XVIII: Papers from the annual symposium on Arabic linguistics, ed. Mohammad T. Alhawary and Elabbas Benmamoun, 133–172. Amsterdam: John Benjamins.

———, ed. 2007. Perspectives on Arabic linguistics XX: Papers from the annual symposium on Arabic linguistics. Amsterdam/Philadelphia: John Benjamins.

———. 2007. Introduction. In Mustafa A. Mughazy (ed.), ix–xii.

———. 2008. The pragmatics of denial: An information structure analysis of so-called “emphatic negation” in Egyptian Arabic. In Perspectives on Arabic linguistics, ed. Dilworth B. Parkinson, 69–81. Amsterdam: John Benjamins.

Müller, Karin. 2002. Probabilistic context-free grammars for phonology. In Proceedings of ACL SIGPHON, 70–80. Philadelphia: Association for Computational Linguistics.

Odeh, Marwan. 2004. Topologische Dependenzgrammatik fürs Arabische. Abschlussbericht FR6.2-Informatik, Universität des Saarlandes.

Ojeda, Almerindo. 1992. The semantics of number in Arabic. In Proceedings of the second conference on linguistics and semantic theory, ed. Chris Barker and David Dowty, 303–325. Columbus: Ohio State University.

Oliva, Karel. 2003. Dependency, valency and head-driven phrase-structure grammar. HSK 25(1): 660–668.

Owens, Jonathan. 1988. The foundations of grammar: An introduction to medieval Arabic grammatical theory. Amsterdam: John Benjamins.

———. 2003. Valency-like concepts in the Arabic grammatical tradition. HSK 25(1): 26–32.

Palmer, Martha, Dan Gildea, and Paul Kingsbury. 2005. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics 31: 71–106.

Palmer, Martha, et al. 2008. A pilot Arabic propbank. Proceedings of LREC. Marrakech, Morocco, 3467–3472. Proceedings of EMNLP 2004, Barcelona, Spain, 88–94.

Prince, Alan, and Paul Smolensky. 2004. Optimality theory: Constraint interaction in generative grammar. Oxford: Blackwell.

Ramsay, Alan, and Hanady Mansur. 2001. Arabic morphology: A categorial approach. EACL workshop proceedings on Arabic language processing: Status and prospects. Toulouse, France, 17–22.

Ratcliffe, Robert R. 1998. The “broken” plural problem in Arabic and comparative Semitic: Allomorphy and analogy in non-concatenative morphology. Amsterdam: John Benjamins.

Russell, Robert A. 1977. Word order and discourse function in Arabic. Boston: Harvard University Press.

Ryding, Karin. 1992. Morphosyntactic analysis in Al-Jumal fii l-naḥw: Discourse structure and metalanguage. In Perspectives on Arabic linguistics IV: Papers from the annual symposium on Arabic linguistics, ed. Ellen Broselow, Mushira Eid, and John McCarthy, 263–278. Amsterdam/Philadelphia: John Benjamins.

Salib, Maurice B. 1979. Spoken literary Arabic: Oral approximation of literary Arabic in Egyptian formal discourse. Ph.D. diss., University of California, Berkeley. Dissertation Abstracts International 40:4010A.

Sawaie, Mohammed. 1980. Discourse reference and pronominalization in Arabic. Ph.D. diss., Ohio State University. Dissertation Abstracts International 41:3089A.

Schubert, Klaus. 1987. Metataxis. Dordrecht/Providence: Foris.

———. 2006. Maschinelle Übersetzungen mit Dependenzgrammatiken. HSK 25(2): 1129–1158.

Seidensticker, Tilman. 2008. Lexicography: Classical Arabic. In EALL III, ed. Kees Versteegh et al., 30–37. Leiden: Brill.

Sezgin, Fuat. 1967–1984. Geschichte des arabischen Schrifttums. 9 vols. Leiden: Brill.

———. 1995–2000. Geschichte des arabischen Schrifttums. Frankfurt am Main: Institut für Geschichte der Arabisch-Islamischen Wissenschaften, Johann Wolfgang Goethe Universität, Gesamtindices zu Band I–IX, and Volumes: X–XII.

Sgall, Petr, Eva Hajičová, and Jarmila Panevová. 1986. The meaning of the sentence in its semantic and pragmatic aspects. Dordrecht: Reidel.

Sībawayhi (d. 798). Al-Kitāb. 2 vols. Būlāq 1316 A.H. Reprint Baghdad: al-Muṯannā.

Smrž, Otakar. 2007. Functional Arabic morphology: Formal system and implementation. Ph.D. diss., Charles University, Prague, Czech Republic.

Smrž, Otakar, and Jan Hajič. 2010. The other Arabic treebank: Prague dependencies and functions. In Arabic computational linguistics, ed. Ali Farghaly, 137–168. Stanford, CA: CSLI Publications.

Somekh, Sasson. 1979. The emergence of two sets of stylistic norms in the early literary translation into modern Arabic prose. Ha-Sifrut-Literature 8(28): 52–57.

———. 1981. The emergence of two sets of stylistic norms in the early literary translation into modern Arabic prose. Poetics Today 2: 193–200.

Soudi, Abdelhadi, Antal van den Bosch, and Günter Neumann, eds. 2007. Arabic computational morphology: Knowledge-based and empirical methods. Dordrecht: Springer.

Starosta, Stanley. 2003. Lexicase grammar. HSK 25(1): 526–545.

Suleiman, Saleh M. 1989. On the pragmatics of subject preposing in standard Arabic. Language Sciences 11: 215–235.

———. 1990. The semantic functions of object deletion in classical Arabic. Language Sciences 12: 255–266.

Vossen, Piek. 1998. EuroWordNet: A multilingual database with lexical semantic networks. Dordrecht: Kluwer.

Winograd, Terry. 1983. Language as a cognitive process, Volume 1, Syntax. Reading, MA: Addison-Wesley.

Włodarczak, Marcin. 2010. Ranked multidimensional dialogue act annotation. Language and Computation, 162–173. (New Directions in Logic, Language and Computation, ESSLLI 2010 and ESSLLI 2011). Berlin: Springer.

Zabbal, Youri. 2002. The semantics of number in the Arabic noun phrase. M.A. thesis, University of Calgary.

Notes:

(1) Appendix 1 at the end of the chapter lists common abbreviations and technical terms frequently used in this field in general and in this contribution in particular, together with paraphrases of some of these terms.

(2) I would have liked to see more attention paid to equally socially relevant matters such as pure linguistics.

(4) On the program of the annual ALS symposium on Arabic linguistics (2011), more than half of the presentations (17 of 31) dealt with Arabic colloquials, the diglossia situation, and the application of general linguistic theories to the description of Arabic colloquials. Beginning in 1990, this trend can be found in all the issues of Perspectives on Arabic Linguistics. For a while, the Moroccan Linguistic Society showed a similar development.

(5) If possible, together with a deaf window as well as a form of simultaneous (Arabic) Braille output.

(6) See also Ali (2003).

(7) The end of this chapter offers suggestions for further reading.

(8) For the main topics of this chapter, see Chenfour (2006) and Ditters (2006). Subtopics and references will be referred to in the body of the text.

(9) The source here is Wikipedia; see also Ditters (2006).

(10) For a paraphrase of descriptive terms, see Appendix 2.

(11) There is a difference in interest between the computational subword (phonetics) and the computational word (phonology) level and beyond. This chapter is concerned with remedial and commercial applications.

(12) See also Coleman and Pierrehumbert (1997) on stochastic phonological grammars and acceptability.

(13) Cf. Farghaly (2010: chapters 3 and 4) and Habash (2010: chapter 8 and Appendix 2).

(14) Here in contrast with statically typed. Computer science presently distinguishes four main families of programming languages: imperative, functional, logic, and object-oriented. For our purposes this information will be enough.

(15) Wikipedia paraphrases treebank as a text corpus in which each sentence has been parsed, that is, annotated with syntactic structure, which is commonly represented as a tree.

(16) I appreciate Darwish’s (2002a) reference to Sībawayhi in his account of “a one-day construction of a shallow Arabic morphological analyzer.”

(17) Wikipedia paraphrases: Haskell is a standardized, general-purpose purely functional programming language, with nonstrict semantics and strong static typing.

(18) Quoting Smrž and Hajič (2010, 140): “these systems misinterpret some morphs for bearing a category, and underspecify lexical morphemes in general as to their intrinsic morphological functions.” I return to this point when discussing the automated linguistic description of Arabic by means of programming languages or computational formalisms.

(19) A programming language describes a dynamic and deterministic process. It is dynamic because there is a beginning and a series of steps to be taken, leading inevitably to an end. It is deterministic because the computer is explicitly told from the very beginning how to start, where to find what it needs for the execution of the program, what to do with it, what the next step will be, and when its activity will come to an end. A formalism is also an artificial, formal language but is designed as a medium for the definition or the description of static structures. Such an approach is declarative because in the formal grammar only structures are defined and described. There is a beginning, a series of rules, and an end, but there is no logical link between beginning and end. It is not the computer but the machine-readable data that determine whether a match should occur or not; that is, the parser depends on the input string for deciding whether or not its structure can be recognized as defined or described in the formal grammar.

(20) For Arabic, Sībawayhi (d. 798, kitāb) described nouns (N), verbs (V) and non-noun non-verb particles (-N-V) as the basic word categories. He also hinted at greater constituents with an element of one of those categories as head, but the labeling into NPs, VPs, and PaPs here is mine.

(21) See, for example, also the objectives of The Arabic Language Computing Research Group (ALCRG), King Saud University (http://ccis.ksu.edu.sa/ar/en/cs/research/ALCRG).

(22) Carter (2007: 27) discusses an earlier form of pragmatics in Larcher’s approach to ʾinšāʾ (ibid., 28). See also Larcher (1990).

(26) Most references are a bit dated but concern colloquial varieties as well as Standard Arabic: Abu Ghazaleh (1983), Abu Libdeh (1991), Alfalahi (1981), Al-Jubouri (1984), Al-Shabab (1986), Al-Tarouti (1992), Bar-Lev (1986), Daoud (1991), Fakhri (1995, 1998, 2002), Fareh (1988), Ghobrial (1993), Hatim (1987, 1989), Johnstone (1990, 1991), Khalil (1985), Koch (1981), Mughazy (2003), Russell (1977), Ryding (1992), Salib (1979), and Sawaie (1980).

(28) Accounting for all the aforementioned branches of syntax, including an opening to a semantic description of language properties.

(29) A similar linguistic (and not heuristic, thus pragmatic) description for generation purposes is not yet within reach.

(30) The results of his formal system and the implementation of functional Arabic morphology (Smrž 2007b: 69) are presented in the form of unambiguous dependency trees.

(31) This is not the most appropriate place to open a discussion about processing a nondeterministic formal description of a natural language (for instance, in CFG terms) as opposed to processing a deterministic formal description of a natural language (as in any programming language), whether or not the results are presented in IC, DG, HPSG, or any other form.

(32) Duchier and Debusmann (2001) describe a new framework for dependency grammar with a modular decomposition of immediate dependency and linear precedence. Their approach distinguishes two orthogonal yet mutually constraining structures: a syntactic dependency tree; and a topological dependency tree. The former is nonprojective and even nonordered, while the latter is projective and partially ordered.
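
To make the notion of projectivity in this note concrete, here is a minimal sketch under the usual definition that a dependency tree is projective when every subtree covers a contiguous span of word positions. The encoding (a list of head positions, with -1 for the root) is an assumption chosen for illustration and is not Duchier and Debusmann's own representation; the input is assumed to be a well-formed tree.

```python
def subtree_positions(heads, node):
    """Collect the positions of `node` and all of its descendants."""
    positions = {node}
    for child, head in enumerate(heads):
        if head == node:
            positions |= subtree_positions(heads, child)
    return positions


def is_projective(heads):
    """heads[i] is the position of the head of word i; the root has head -1.

    Assumes `heads` encodes a well-formed tree (no cycles).
    """
    for node in range(len(heads)):
        span = subtree_positions(heads, node)
        # A contiguous span has exactly max - min + 1 members.
        if max(span) - min(span) + 1 != len(span):
            return False
    return True
```

With this encoding, `is_projective([1, -1, 1])` holds, whereas `is_projective([2, 3, -1, 2])` does not, because the subtree headed by word 3 contains word 1 but skips word 2.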

(33) It is, unmistakably, my fault not to have explained clearly enough the basic principles of my approach to language description: the analysis of a linguistic unit in terms of alternating layers of functions and categories until final (lexical) entries have been reached.

(34) For an example, see §2.3.

(35) Maybe, for Arabic, we are not yet ready to think in terms of a linguistic and implementable paragraph, section, text, volume, and, generally applicable, syntax description of MSA or colloquial varieties of Arabic.

(36) For an overview of Arabic literature in general, see, among others, Sezgin (1967–2000), in particular volumes 8 (lexicology) and 9 (syntax).

(37) Cf. in this perspective also Baalbaki (1979).

(38) See Appendix 2 for a paraphrase of technical terms used.

(39) See §2.3.

(40) I am exclusively working in an analysis perspective of authentic MSA text data.

(41) Superscripts to a nonterminal symbol, such as S, N, V, P, NP, VP, and PP point to categorial distinctions at the first level of description. Subscripts point to categorial, functional, or semantic characteristics at the second level of description.

(42) See §2.3.

(44) The main advantage of a formal description over a formalism, a computer language, or a programming language is that a linguist is able, after a simple introduction, to read, understand, and comment on the description without first becoming a mathematical, logical, or computational linguist or scientist.

(45) It is important to repeat that I exclusively use the formalism for analysis purposes. This means that the description may really be “liberal.” For synthesis objectives it is quite a different story.

(46) 1 context-free grammar + 1 context-free grammar ≡ 2 context-free grammars ≠ 1 context-sensitive grammar.

(47) Koster, 1971, 1991.

(48) CDL refers to Compiler Description Language. CDL and C are both imperative programming languages, but in CDL3 the notational conventions are better suited to testing AGFL-formatted natural language descriptions.

(49) For detailed information about probabilistic and frequency accounting properties of the AGFL formalism, I refer to the aforementioned AGFL site.

(50) For more details see the aforementioned references.

(51) A formal two-level context-free rewrite grammar means that one describes one and only one nonterminal on the left-hand side and rewrites it into one or more nonterminals or terminal values from the lexicon on the right-hand side. However, this rewriting takes place at both the first and the second level of description. The second level of description is enclosed in parentheses (()) and attached, where this is considered desirable, to nonterminals of the first level.
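
The interplay of the two levels can be sketched as follows. This is an illustration only: it assumes hypothetical second-level nonterminals (GENDER, NUMBER) with finite value sets and one hypothetical first-level rule, and it is written in Python rather than in AGFL notation. Under these assumptions, consistent substitution of second-level values turns a single two-level rule into a finite set of ordinary context-free rules.

```python
from itertools import product

# Second level: hypothetical nonterminals with their finite sets of final values.
METARULES = {
    "GENDER": ["masc", "fem"],
    "NUMBER": ["sing", "plur"],
}

# First level: one nonterminal on the left-hand side, a sequence on the right.
# Each first-level symbol carries its second-level nonterminals as a tuple.
RULE = (
    ("NP", ("GENDER", "NUMBER")),
    [("N", ("GENDER", "NUMBER")), ("ADJ", ("GENDER", "NUMBER"))],
)


def expand(rule):
    """Yield the plain context-free rules obtained by consistent substitution."""
    (lhs, lhs_affixes), rhs = rule
    symbols = [(lhs, lhs_affixes)] + rhs
    names = sorted({name for _, affixes in symbols for name in affixes})
    for values in product(*(METARULES[name] for name in names)):
        # The same value is substituted for every occurrence of a name.
        binding = dict(zip(names, values))
        filled = [
            f"{symbol}({','.join(binding[name] for name in affixes)})"
            for symbol, affixes in symbols
        ]
        yield filled[0], filled[1:]
```

Calling `list(expand(RULE))` yields four rules, one per combination of gender and number, each enforcing agreement between the noun and the adjective because the same value replaces every occurrence of a second-level nonterminal.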

(52) AGFL stands for affix grammar over finite lattices. Cf. www.agfl.cs.ru.nl; Koster (1971, 1991).

(53) Notational conventions: a hash (#) introduces a comment line; a double colon (::) rewrites the nonterminal on the left-hand side of the second level of description into one or more final values or another nonterminal of the second level of description; a vertical bar (|) separates alternatives on the right-hand side of the second level of description; a colon (:) rewrites the nonterminal on the left-hand side of the first level of description into one or more nonterminals or terminal values; a comma (,) separates successive elements on the right-hand side; a semicolon (;) separates alternatives on the right-hand side; an addition sign (+) tells the machine to ignore all spaces; a dot (.) ends each rule, with the exception of lexical rules. Nonterminals of the second level are written in uppercase. Terminal values of the second level are written in lowercase. Terminal values of the first level are enclosed in double quotes ("").

(54) The listing of meta-, hyper-, predicate, or empty and lexical rules is, in reality, slightly longer.

(55) In this sample grammar some alternatives of metarule rewritings of nonterminals at the second level of description occur only for elucidating purposes.

(56) Listed on the AGFL site as AP4IR (Arabic [Dependency] Pairs for Information Retrieval).

(57) Words from the open categories (nouns, verbs, and to a lesser extent adjectives and modifiers) carry the aboutness of a text; the others are in fact stop words. Similarly, only triples whose head or modifier is from an open category carry the aboutness of a text, and any other triples can be discarded as stop words (Koster, 2011).
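
The filtering criterion in this note can be sketched as follows. The data structures, category labels, and sample forms are assumptions made for illustration and do not reproduce the AP4IR implementation; the sketch only encodes the rule that a triple is kept when its head or its modifier belongs to an open category.

```python
from typing import NamedTuple

# Hypothetical labels for the open categories named in the note.
OPEN_CATEGORIES = {"N", "V", "ADJ", "MOD"}


class Word(NamedTuple):
    form: str
    category: str


class Triple(NamedTuple):
    head: Word
    relation: str
    modifier: Word


def carries_aboutness(triple):
    """A triple is kept if its head or its modifier is from an open category."""
    return (
        triple.head.category in OPEN_CATEGORIES
        or triple.modifier.category in OPEN_CATEGORIES
    )


def filter_triples(triples):
    """Discard triples in which neither member is from an open category."""
    return [t for t in triples if carries_aboutness(t)]


# Hypothetical sample data; forms and relation names are placeholders.
SAMPLE = [
    Triple(Word("qaraa", "V"), "SUBJ", Word("alwaladu", "N")),
    Triple(Word("fi", "P"), "DET", Word("al", "DET")),
    Triple(Word("fi", "P"), "OBJ", Word("albayti", "N")),
]
```

Applied to `SAMPLE`, `filter_triples` keeps the first and third triples and discards the second, in which both members come from closed categories.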