- The Oxford Handbook of Computational Linguistics
- Text-to-Speech Synthesis
Abstract and Keywords
This article gives an introduction to state-of-the-art text-to-speech (TTS) synthesis systems, showing both the natural language processing and the digital signal processing problems involved. TTS synthesis is the art of designing talking machines. The article begins with a brief user-oriented description of a general TTS system and comments on its commercial applications. It then gives a functional diagram of a modern TTS system, highlighting its components, and describes its morphosyntactic module. Furthermore, it examines why sentence-level phonetization cannot be achieved by a sequence of dictionary look-ups, and describes possible implementations of the phonetizer. The article then describes prosody generation, outlining how intonation and duration can be approximately computed from text. Prosody refers to certain properties of the speech signal that are related to audible changes in pitch, loudness, and syllable length. Finally, the article introduces the two main existing categories of techniques for waveform generation: synthesis by rule and concatenative synthesis.
Thierry Dutoit is a Professor in the Department of Electrical Engineering at the Faculté Polytechnique de Mons, Belgium, where he obtained his Ph.D. in 1993. He worked for 16 months as a consultant for AT&T Labs Research in Murray Hill and Florham Park, NJ, in 1996 and 1998. Prof. Dutoit is the author of two books and more than 60 reviewed papers on speech processing and text-to-speech synthesis, and is the coordinator of the MBROLA and EULER projects for free multilingual speech synthesis. He is an Associate Editor of the IEEE Transactions on Speech and Audio Processing and a member of the IEEE Speech Technical Committee. His main interests are in speech synthesis and software engineering.
Yannis Stylianou is an Associate Professor of Computer Science at the University of Crete, Greece. After completing his Ph.D. at the École Nationale Supérieure des Télécommunications (Telecom Paris) in 1996, he joined AT&T Labs Research, NJ, where he worked on Next Generation TTS and on speech coders. From February 2001 until February 2002, he was a Member of Technical Staff in the Language Modelling and Research Department at Bell Labs, Murray Hill, NJ. His current interests are in various aspects of speech analysis for speech synthesis and coding.