Abstract and Keywords
This chapter gives an overview of the use of corpora in natural-language processing (NLP). It begins by defining what a corpus is, introducing different types of corpora, such as monolingual, parallel and comparable corpora, and discussing key issues in corpus design, notably balance and representativeness. The chapter then traces the history of corpus linguistics, from its early beginnings in the pre-computer age to its current digital form. A brief survey of the current state of corpora follows, taking into account recent innovations in corpus construction, notably the development of the notion of the ‘Web as corpus’. The chapter concludes by briefly considering the use of corpora in a range of NLP systems.