The ability to communicate in writing is an essential skill in modern society. But ability in writing varies considerably; and no matter what their existing level of competence, most writers would acknowledge that what they write could often be improved. Given that the output of the writing process is natural language, it seems plausible that natural language processing techniques might be used to analyse this output and to suggest ways to improve it. In various guises, this has indeed been an application of NLP at least since the 1960s. In this chapter, we survey the different kinds of assistance to authors that NLP makes possible; we describe what can be done today, and explore what might be possible in the future.
This chapter surveys methods of analysing phonological change that rely on computers because they require lengthy operations, mathematical precision, and reproducibility. Applications include techniques for discovering and verifying sound correspondences, modelling the course of sound change, computing the most likely genetic tree consistent with a set of innovations, testing the significance of the phonetic evidence for genetic relationship between languages, and exploring the relationships between dialects via quantification of phonetic and phonological differences.
Computational linguistics grew out of early projects in machine translation. Initially it was conceived of as a branch of artificial intelligence with the goal of complete human-like language understanding, and was concerned with symbolic methods of parsing and semantic analysis. In recent years, because of more powerful computers, the development of machine-learning algorithms, and the rise of the World Wide Web, computational linguistics has taken an empiricist view of language processing that is based on corpora and statistical methods. It emphasizes practical applications with a tolerance for some degree of error.
Temporality in computational linguistics and natural language processing can be considered from two aspects. One concerns the use of linguistic and philosophical theories of temporality in computational applications. The other concerns the use of computational theory in its own right to define new kinds of theories of dynamical systems including natural language and its temporal semantics. As in the case of nominal expressions in natural language, we should be careful to distinguish temporal semantics, or the question of what kinds of objects and relations temporal categories denote, from the question of temporal reference to particular times or events that the discourse context affords. It is useful to draw a further distinction within the semantics between temporal ontology, or the types of temporal entity that the theory entertains, such as instants, intervals, events, states, or whatever, temporal quantification over such entities, and the temporal relations over them which it countenances, such as priority or posteriority, causal dependence, and the like. This article examines computational linguistics, focusing on temporal semantics, and also considers ontologies, quantifiers, relations, and temporal reference.
Sentence comprehension draws on multiple levels of linguistic knowledge, including the phonological, orthographic, lexical, syntactic, and discoursal. This article focuses on computational models of second language sentence processing. Understanding the computational mechanisms responsible for using this knowledge in real time provides basic insights into how language and the mind work. For a cognitive theory of second language acquisition, a better understanding of how the second language learner develops the capacity to process sentences fluently also has important implications for theories of acquisition and instruction. This article examines two perspectives on written sentence comprehension in the second language. The two approaches considered are syntax based and constraint based. The approaches make fundamentally different assumptions concerning the nature of linguistic representation and how the human speech processing mechanism uses this knowledge in online comprehension. The two perspectives also represent a basic division between formalist and functionalist/usage based approaches to second language learning and use.
Edward P. Stabler
While research in the ‘principles and parameters’ tradition can be regarded as attributing as much as possible to universal grammar (UG) in order to understand how language acquisition is possible, Chomsky characterizes the ‘minimalist program’ as an effort to attribute as little as possible to UG while still accounting for the apparent diversity of human languages. These two research strategies aim to be compatible, and ultimately should converge. Several of Chomsky's own early contributions to the minimalist program have been fundamental and simple enough to allow easy mathematical and computational study. Among these are (i) the characterization of ‘bare phrase structure’; and (ii) the definition of a structure building operation Merge which applies freely to lexical material, with constraints that ‘filter’ the results only at the phonetic form and logical form interfaces. The first studies inspired by (i) and (ii) are ‘stripped down’ to such a degree that they may seem unrelated to minimalist proposals, but this article shows how some easy steps begin to bridge the gap. It briefly surveys some proposals about (iii) syntactic features that license structure building; (iv) ‘locality’, the domain over which structure building functions operate; (v) ‘linearization’, determining the order of pronounced forms; and (vi) the proposal that Merge involves copying.
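Items (iii) and (v) can be made concrete in a few lines. The following is a minimal sketch, assuming a simplified version of the feature-checking Merge used in Stabler-style minimalist grammars: a selector feature written `=x` must match a category feature `x` on the argument, and both are then consumed. Unlike the free Merge of (ii), structure building here is licensed by features. The names `Expr` and `merge` are hypothetical, and the sketch deliberately omits Move, locality, and copying.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical simplified expression: pronounced words, unchecked features,
    and whether it is still a bare lexical item."""
    words: Tuple[str, ...]
    features: Tuple[str, ...]
    lexical: bool = True


def merge(head: Expr, arg: Expr) -> Expr:
    """Merge arg into head when head's first feature '=x' selects arg's category 'x'."""
    if not head.features or not head.features[0].startswith("="):
        raise ValueError("head has no selector feature")
    if not arg.features or arg.features[0] != head.features[0][1:]:
        raise ValueError("selector does not match the argument's category")
    if len(arg.features) != 1:
        # Leftover features on the argument would require Move, which this sketch omits.
        raise ValueError("argument has unchecked features (Move not modelled)")
    # Linearization: a lexical head takes its complement on the right;
    # a derived head takes its specifier on the left.
    words = head.words + arg.words if head.lexical else arg.words + head.words
    # The result is labelled by the head's remaining features (no separate node labels).
    return Expr(words, head.features[1:], lexical=False)


the_ = Expr(("the",), ("=n", "d"))
cat = Expr(("cat",), ("n",))
likes = Expr(("likes",), ("=d", "=d", "v"))
mary = Expr(("Mary",), ("d",))

dp = merge(the_, cat)       # complement to the right: ('the', 'cat')
vp = merge(likes, dp)       # complement to the right: ('likes', 'the', 'cat')
clause = merge(vp, mary)    # specifier to the left
print(" ".join(clause.words), clause.features)  # Mary likes the cat ('v',)
```

Because the result carries only the head's remaining features, there are no separate phrasal labels, which loosely echoes the idea of ‘bare phrase structure’ in (i).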
This chapter presents a characterisation of the field of computational pragmatics, discusses some of the fundamental issues in the field, and provides a survey of recent developments. Central to computational pragmatics is the development and use of computational tools and models for studying the relations between utterances and their context of use. Essential for understanding these relations are the use of inference and the description of language use as actions inspired by the context, and intended to influence the context. The chapter therefore focuses on recent work in the use of inference for utterance interpretation and in dialogue modeling in terms of dialogue acts, viewed as context-changing actions. The chapter concludes with a survey of recent activities concerning the construction and use of resources in computational pragmatics, in particular annotation schemes, annotated corpora, and tools for corpus construction and use.
Carlos Ramisch and Aline Villavicencio
In natural-language processing, multiword expressions (MWEs) have been the focus of much attention in their many forms, including idioms, nominal compounds, verbal expressions, and collocations. In addition to their relevance for lexicographic and terminographic work, their ubiquity in language affects the performance of tasks like parsing, word sense disambiguation, and natural-language generation. They lend a mark of naturalness and fluency to applications that can deal with them, ranging from machine translation to information retrieval. This chapter presents an overview of their linguistic characteristics and discusses a variety of proposals for incorporating them into language technology, covering type-based discovery, token-based identification, and MWE-aware language technology applications.
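Type-based discovery typically ranks candidate word combinations by how strongly their parts are associated in a corpus. The sketch below uses pointwise mutual information (PMI) over adjacent word pairs. PMI is one common association measure, assumed here for illustration rather than taken from the chapter, and the function name `pmi_bigrams` is hypothetical. The `min_count` threshold guards against a known weakness of PMI, which tends to overrate rare pairs.

```python
import math
from collections import Counter
from typing import List, Tuple


def pmi_bigrams(tokens: List[str], min_count: int = 2) -> List[Tuple[Tuple[str, str], float]]:
    """Rank adjacent word pairs by PMI = log2(P(x, y) / (P(x) * P(y))).

    P(x, y) is estimated over the n - 1 bigram positions and P(x), P(y)
    over the n tokens. Pairs seen fewer than min_count times are skipped.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores.append(((w1, w2), math.log2(p_xy / (p_x * p_y))))
    # Highest association first: these are the candidate MWE types.
    return sorted(scores, key=lambda item: item[1], reverse=True)


toy = "the red tape slowed the project and the red tape annoyed the team".split()
ranked = pmi_bigrams(toy)
print(ranked[0][0])  # ('red', 'tape') ranks first
```

On this toy input, "red tape" outranks "the red" because "the" is frequent on its own. Token-based identification, by contrast, would decide for each occurrence in running text whether the pair is actually used as an MWE.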
Carol A. Chapelle
Computer-assisted language learning, defined as “the search for and study of applications of the computer in language teaching and learning”, covers a broad spectrum of concerns, but the central issues are the pedagogies implemented through technology and their evaluation. In view of the range of complex materials included under the umbrella of CALL, research and practice in this area draw from other areas within and beyond applied linguistics for conceptual and technical tools to develop practices and evaluate success. Like technologies for language learning, theories of instructed SLA have evolved dramatically over the past twenty years. One change is the evolution in the input theory that Underwood drew upon. Whereas that theory asserts that the second language is acquired unconsciously, Schmidt claims the opposite: that subliminal language learning is impossible, and that intake is what learners consciously notice. This requirement of noticing is meant to apply equally to all aspects of language.
This chapter gives an overview of the use of corpora in natural-language processing (NLP). The chapter begins by defining what a corpus is. In doing so it introduces different types of corpora such as monolingual, parallel and comparable corpora. It also discusses key issues in corpus design, notably balance and representativeness. The chapter then outlines the history of corpus linguistics, from its early beginnings in the pre-computer age to its current digital form. Following this there is a brief survey of the current state of corpora, taking into account recent innovations in corpus construction, notably the development of the notion of the ‘Web as corpus’. The chapter concludes by briefly considering the use of corpora in a range of NLP systems.