- Oxford Handbooks in Linguistics
- List of Contributors
- Corpus Design
- Data Collection
- Corpus Annotation: Methodology and Transcription Systems
- On Automatic Phonological Transcription of Speech Corpora
- Statistical Corpus Exploitation
- Corpus Archiving and Dissemination
- Metadata Formats
- Data Formats for Phonological Corpora
- Corpus and Research in Phonetics and Phonology: Methodological and Formal Considerations
- A Corpus-Based Study of Apicalization of /s/ before /l/ in Oslo Norwegian
- Corpora, Variation, and Phonology: An Illustration from French Liaison
- Corpus-Based Investigations of Child Phonological Development: Formal and Practical Considerations
- Corpus Phonology and Second Language Acquisition
- ELAN: Multimedia Annotation Application
- The Use of Praat in Corpus Research
- Praat Scripting
- The PhonBank Project: Data and Software-Assisted Methods for the Study of Phonology and Phonological Development
- ANVIL: The Video Annotation Research Tool
- Web-Based Archiving and Sharing of Phonological Corpora
- The IViE Corpus
- French Phonology from a Corpus Perspective: The PFC Programme
- Two Norwegian Speech Corpora: NoTa-Oslo and TAUS
- The LeaP Corpus
- The Diachronic Electronic Corpus of Tyneside English: Annotation Practices and Dissemination Strategies
- The Lanchart Corpus
- Phonological and Phonetic Databases at the Meertens Institute
- The VALIBEL Speech Database
- Prosody and Discourse in the Australian Map Task Corpus
- A Phonological Corpus of L1 Acquisition of Taiwan Southern Min
Abstract and Keywords
This chapter explores the possibility of providing the research and industrial communities that commonly use spoken corpora with a stable portfolio of well-documented, standardized formats. Such formats would allow a high reuse rate of annotated spoken resources and, as a consequence, better interoperability across the tools used to produce or exploit them.
Laurent Romary is Directeur de Recherche at INRIA, France, and a guest scientist at Humboldt University in Berlin, Germany. He carries out research on the modelling of semi-structured documents, with a specific emphasis on texts and linguistic resources. He is the chairman of ISO committee TC 37/SC 4 on Language Resource Management, and has been active as a member (2001-2007) and then chair (2008-2011) of the TEI (Text Encoding Initiative) Council. He currently contributes to the establishment and coordination of the European DARIAH infrastructure for the arts and humanities.
Andreas Witt received his Ph.D. in Computational Linguistics and Text Technology from Bielefeld University in 2002, and continued there for the next four years as an instructor and researcher in those fields. In 2006 he moved to Tübingen University, where he participated in a project on “Sustainability of Linguistic Resources” and in projects on the interoperability of language data. Since 2009 he has headed the Research Infrastructure group at the Institute for the German Language in Mannheim.