The ability to communicate in writing is an essential skill in modern society. But ability in writing varies considerably; and no matter what their existing level of competence, most writers would acknowledge that what they write could often be improved. Given that the output of the writing process is natural language, it seems plausible that natural language processing techniques might be used to analyse this output and to suggest ways to improve it. In various guises, this has indeed been an application of NLP at least since the 1960s. In this chapter, we survey the different kinds of assistance to authors that NLP makes possible; we describe what can be done today, and explore what might be possible in the future.
This chapter surveys methods of analysing phonological change that rely on computers because they require lengthy operations, mathematical precision, and reproducibility. Applications include techniques for discovering and verifying sound correspondences, modelling the course of sound change, computing the most likely genetic tree consistent with a set of innovations, testing the significance of the phonetic evidence for genetic relationship between languages, and exploring the relationships between dialects via quantification of phonetic and phonological differences.
Temporality in computational linguistics and natural language processing can be considered from two aspects. One concerns the use of linguistic and philosophical theories of temporality in computational applications. The other concerns the use of computational theory in its own right to define new kinds of theories of dynamical systems, including natural language and its temporal semantics. As in the case of nominal expressions in natural language, we should be careful to distinguish temporal semantics, or the question of what kinds of objects and relations temporal categories denote, from the question of temporal reference to particular times or events that the discourse context affords. It is useful to draw a further distinction within the semantics between three things: temporal ontology, or the types of temporal entity that the theory entertains, such as instants, intervals, events, and states; temporal quantification over such entities; and the temporal relations over them that the theory countenances, such as priority or posteriority, causal dependence, and the like. This article examines temporality in computational linguistics, focusing on temporal semantics, and also considers ontologies, quantifiers, relations, and temporal reference.
Computational linguistics grew out of early projects in machine translation. Initially it was conceived of as a branch of artificial intelligence with the goal of complete human-like language understanding, and was concerned with symbolic methods of parsing and semantic analysis. In recent years, because of more powerful computers, the development of machine-learning algorithms, and the rise of the World Wide Web, computational linguistics has taken an empiricist view of language processing that is based on corpora and statistical methods. It emphasizes practical applications with a tolerance for some degree of error.
This chapter introduces two fields: Computational Linguistics (CL), the computational modelling of linguistic representations and theories, and Natural Language Processing (NLP), the design and implementation of tools for automated language understanding and production. It also discusses some of the existing tensions between the formal approach to linguistics and the current state of research and development in CL and NLP. The chapter goes on to explain the specific challenges faced by CL and NLP for Persian, many of them arising from the intricacies that the Perso-Arabic script presents for automatically identifying word and phrase boundaries in text, as well as from difficulties in the automatic processing of compound words and light verb constructions. The chapter then provides an overview of the state of the art in current and recent CL and NLP for Persian. It concludes with areas for improvement and suggestions for future directions.
Sentence comprehension draws on multiple levels of linguistic knowledge, including the phonological, orthographic, lexical, syntactic, and discoursal. This article focuses on computational models of second language sentence processing. Understanding the computational mechanisms responsible for using this knowledge in real time provides basic insights into how language and the mind work. For a cognitive theory of second language acquisition, a better understanding of how the second language learner develops the capacity to process sentences fluently also has important implications for theories of acquisition and instruction. This article examines two perspectives on written sentence comprehension in the second language: the syntax-based approach and the constraint-based approach. The approaches make fundamentally different assumptions concerning the nature of linguistic representation and how the human speech processing mechanism uses this knowledge in online comprehension. The two perspectives also represent a basic division between formalist and functionalist/usage-based approaches to second language learning and use.
Edward P. Stabler
While research in the ‘principles and parameters’ tradition can be regarded as attributing as much as possible to universal grammar (UG) in order to understand how language acquisition is possible, Chomsky characterizes the ‘minimalist program’ as an effort to attribute as little as possible to UG while still accounting for the apparent diversity of human languages. These two research strategies aim to be compatible, and ultimately should converge. Several of Chomsky's own early contributions to the minimalist program have been fundamental and simple enough to allow easy mathematical and computational study. Among these are (i) the characterization of ‘bare phrase structure’; and (ii) the definition of a structure building operation Merge which applies freely to lexical material, with constraints that ‘filter’ the results only at the phonetic form and logical form interfaces. The first studies inspired by (i) and (ii) are ‘stripped down’ to such a degree that they may seem unrelated to minimalist proposals, but this article shows how some easy steps begin to bridge the gap. It briefly surveys some proposals about (iii) syntactic features that license structure building; (iv) ‘locality’, the domain over which structure building functions operate; (v) ‘linearization’, determining the order of pronounced forms; and (vi) the proposal that Merge involves copying.
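As a rough illustration of the Merge operation mentioned in the abstract above, the following Python sketch treats a syntactic object as either a lexical item or an unordered two-member set of syntactic objects. This is only one simplified, unlabeled reading of bare phrase structure, not the formalization used in the article itself; the names `LexicalItem`, `merge`, and `lexical_items` are hypothetical, and the rejection of self-merge is an assumption made purely to keep every structure binary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalItem:
    """A lexical item, identified here only by its pronounced form (an assumption)."""
    form: str


def merge(a, b):
    """Combine two syntactic objects into the unordered set {a, b}.

    No linear order and no label are recorded; this sketch rejects
    merging an object with itself so that every result has two members.
    """
    if a == b:
        raise ValueError("this sketch does not allow merging an object with itself")
    return frozenset({a, b})


def lexical_items(obj):
    """Collect the forms of all lexical items contained in a syntactic object."""
    if isinstance(obj, LexicalItem):
        return {obj.form}
    forms = set()
    for part in obj:
        forms |= lexical_items(part)
    return forms


# Build {read, {the, book}} bottom-up from three lexical items.
the, book, read = LexicalItem("the"), LexicalItem("book"), LexicalItem("read")
noun_phrase = merge(the, book)
verb_phrase = merge(read, noun_phrase)
```

Because the result is a set, `merge(the, book)` and `merge(book, the)` are the same object; word order would have to come from a separate linearization step, which this sketch does not attempt.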
This chapter presents a characterisation of the field of computational pragmatics, discusses some of the fundamental issues in the field, and provides a survey of recent developments. Central to computational pragmatics is the development and use of computational tools and models for studying the relations between utterances and their context of use. Essential for understanding these relations are the use of inference and the description of language use as actions inspired by the context, and intended to influence the context. The chapter therefore focuses on recent work in the use of inference for utterance interpretation and in dialogue modelling in terms of dialogue acts, viewed as context-changing actions. The chapter concludes with a survey of recent activities concerning the construction and use of resources in computational pragmatics, in particular annotation schemes, annotated corpora, and tools for corpus construction and use.
Carlos Ramisch and Aline Villavicencio
In natural-language processing, multiword expressions (MWEs) have been the focus of much attention in their many forms, including idioms, nominal compounds, verbal expressions, and collocations. In addition to their relevance for lexicographic and terminographic work, their ubiquity in language affects the performance of tasks like parsing, word sense disambiguation, and natural-language generation. They lend a mark of naturalness and fluency to applications that can deal with them, ranging from machine translation to information retrieval. This chapter presents an overview of their linguistic characteristics and discusses a variety of proposals for incorporating them into language technology, covering type-based discovery, token-based identification, and MWE-aware language technology applications.
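To make the idea of type-based discovery more concrete, here is a minimal Python sketch that ranks adjacent word pairs by pointwise mutual information (PMI). The abstract above does not name any particular association measure, so PMI is an assumption chosen for illustration; the function name `pmi_bigrams`, the `min_count` threshold, and the toy sentence are likewise hypothetical.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information.

    Pairs seen fewer than `min_count` times are skipped, since PMI
    is unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first: these are the MWE candidates.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpus, invented for this example.
tokens = ("he kicked the bucket and she kicked the bucket "
          "then the dog saw the cat").split()
candidates = pmi_bigrams(tokens)
```

On this toy input only the pairs ("kicked", "the") and ("the", "bucket") occur often enough to be scored. A realistic pipeline would work on a large corpus and would typically add part-of-speech filtering and support for longer or non-contiguous expressions, none of which is attempted here.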
Carol A. Chapelle
Computer-assisted language learning (CALL), defined as “the search for and study of applications of the computer in language teaching and learning”, covers a broad spectrum of concerns, but the central issues are the pedagogies implemented through technology and their evaluation. In view of the range of complex materials included under the umbrella of CALL, research and practice in this area draw on other areas within and beyond applied linguistics for conceptual and technical tools to develop practices and evaluate success. Like technologies for language learning, theories of instructed SLA have evolved dramatically over the past twenty years. One change is the evolution in the input theory that Underwood drew upon. Whereas that theory asserts that the second language is acquired unconsciously, Schmidt claims the opposite: that subliminal language learning is impossible, and that intake is what learners consciously notice. This requirement of noticing is meant to apply equally to all aspects of language.