PRINTED FROM OXFORD HANDBOOKS ONLINE. © Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

date: 13 June 2021

Abstract and Keywords

This chapter gives an overview of the use of corpora in natural-language processing (NLP). It begins by defining what a corpus is, introducing different types of corpora such as monolingual, parallel, and comparable corpora. It also discusses key issues in corpus design, notably balance and representativeness. The chapter then surveys the history of corpus linguistics, from its beginnings in the pre-computer age to its current digital form. A brief survey of the current state of corpora follows, taking into account recent innovations in corpus construction, notably the development of the notion of the ‘Web as corpus’. The chapter concludes by briefly considering the use of corpora in a range of NLP systems.

Keywords: corpus data, corpora, empirical linguistics, comparable corpora, monolingual corpora, annotation
