- Mathematical Foundations: Formal Grammars and Languages
- Finite-State Technology
- Statistical Models for Natural Language Processing
- Machine Learning
- Word Representation
- Deep Learning
- Sublanguages and Controlled Languages
- Corpus Annotation
- Text Segmentation
- Part-of-Speech Tagging
- Semantic Role Labelling
- Word Sense Disambiguation
- Computational Treatment of Multiword Expressions
- Textual Entailment
- Natural Language Generation
- Speech Recognition
- Temporal Processing
- Text-to-Speech Synthesis
- Machine Translation
- Translation Technology
- Information Retrieval
- Information Extraction
- Question Answering
- Text Summarization
- Term Extraction
- Web Text Mining
- Opinion Mining and Sentiment Analysis
- Spoken Language Dialogue Systems
- Multimodal Systems
- Natural Language Processing for Educational Applications
- Automated Writing Assistance
- Text Simplification
- Author Profiling and Related Applications
Abstract and Keywords
Text-to-speech (TTS) synthesis is the art of designing talking machines. Seen from this functional perspective, the task looks simple, but this chapter shows that delivering intelligible, natural-sounding, and expressive speech, while also keeping engineering costs in check, is a real challenge. Speech synthesis has come a long way since the great controversy of the 1980s between MIT’s formant synthesis and Bell Labs’ diphone-based concatenative synthesis. While the unit-selection technology that appeared in the mid-1990s can be seen as an extension of diphone-based approaches, the arrival of Hidden Markov Model (HMM) synthesis around 2005 marked a major shift back to models. More recently, statistical approaches, supported by advanced deep learning architectures, have been shown to advance text analysis and normalization as well as waveform generation. Important recent milestones have been Google’s WaveNet (September 2016) and the sequence-to-sequence models referred to as Tacotron (I and II).
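The unit-selection paradigm mentioned above can be summarized as a search problem: for each target in the phonetic specification, pick one candidate unit from a recorded inventory so that the accumulated target costs (mismatch with the desired context) and join costs (discontinuity at concatenation points) are minimal. The following is a minimal, illustrative Python sketch of that search, using made-up feature names and cost functions rather than the chapter's own formulation.

```python
# Illustrative sketch of unit selection as a Viterbi search over candidate units.
# Feature names ("pitch", "dur", "spec_start", "spec_end") and the cost
# definitions are assumptions for the example, not a reference implementation.

def target_cost(target, unit):
    # Penalize mismatch between the desired context and the candidate unit.
    return abs(target["pitch"] - unit["pitch"]) + abs(target["dur"] - unit["dur"])

def join_cost(left, right):
    # Penalize a spectral jump at the concatenation point between two units.
    return abs(left["spec_end"] - right["spec_start"])

def select_units(targets, candidates):
    """Return, for each target, the index of the selected candidate unit."""
    # best[i][j] = (cost of best path ending in candidate j of target i, backpointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            prev = min(range(len(candidates[i - 1])),
                       key=lambda k: best[i - 1][k][0] + join_cost(candidates[i - 1][k], u))
            cost = (best[i - 1][prev][0] + join_cost(candidates[i - 1][prev], u)
                    + target_cost(targets[i], u))
            row.append((cost, prev))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(candidates[-1])), key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(len(targets) - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    return list(reversed(path))

# Toy usage with two targets and two candidate units per target:
targets = [{"pitch": 120, "dur": 80}, {"pitch": 130, "dur": 60}]
candidates = [
    [{"pitch": 118, "dur": 85, "spec_start": 0.2, "spec_end": 0.3},
     {"pitch": 140, "dur": 70, "spec_start": 0.1, "spec_end": 0.9}],
    [{"pitch": 131, "dur": 58, "spec_start": 0.35, "spec_end": 0.5},
     {"pitch": 100, "dur": 90, "spec_start": 0.8, "spec_end": 0.6}],
]
print(select_units(targets, candidates))  # -> [0, 0]
```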
Thierry Dutoit received his electrical engineering degree and Ph.D. in 1988 and 1993, respectively, from the Faculté Polytechnique de Mons (now UMONS), Belgium, where he teaches Circuit Theory, Signal Processing, and Speech Processing. In 1995, he initiated the MBROLA project for free multilingual speech synthesis. Between 1996 and 1998, he spent 16 months at AT&T Bell Labs in Murray Hill (NJ) and Florham Park (NJ). He is the author of several books on speech synthesis and applied signal processing, and he wrote or co-wrote more than 20 journal papers and more than 120 papers on speech processing, biomedical signal processing, and digital art technology. He was an Associate Editor of the IEEE Transactions on Speech and Audio Processing (2003-2006) and president of ISCA’s SynSIG interest group on speech synthesis from 2007 to 2010. In 2005, he initiated the eNTERFACE four-week summer workshops on multimodal interfaces. He was also part of the organizing committee of INTERSPEECH'07 in Antwerp. T. Dutoit is a member of the IEEE Signal Processing and Biomedical Engineering societies, and was a member of the Speech and Language Technical Committee of the IEEE (2009-2011). He was one of the founders of ACAPELA-GROUP, a European leader in TTS products. More recently, he founded the NUMEDIART Institute for Media Art Technology, of which he is the director.
Yannis Stylianou is Professor of Speech Processing at the Department of Computer Science, University of Crete, Greece, and since 2013 he has also been Group Leader of the Speech Technology Group at the Toshiba Cambridge Research Lab, UK. From 1996 until 2001 he was with AT&T Labs Research (Murray Hill and Florham Park, NJ, USA) as a Senior Technical Staff Member. In 2001 he joined Bell Labs, Lucent Technologies, in Murray Hill, NJ, USA (now Alcatel-Lucent). He holds an MSc and a PhD in Signal Processing from ENST Paris and studied Electrical Engineering at NTUA, Athens, Greece (1991). He is an IEEE Fellow. His current research focuses on speech signal processing algorithms for speech analysis, statistical signal processing (detection and estimation), and time-series analysis/modeling using deep learning and signal processing. He has (co-)authored more than 200 scientific publications and 30 US and UK patents, which have received more than 5,500 citations (excluding self-citations), with an H-index of 36. He is the P.I. and scientific director of several European and Greek research programs and has participated as a leader in US research programs. He has given tutorials at various Interspeech conferences, has organized special sessions at ICASSP and Interspeech, organized the 1st IEEE SPS Winter School on Speech and Audio Processing for Immersive Environments and Future Interfaces (2012) and the 1st ISCA Summer School on Speech Processing (2014), and since 2014 he has organized an annual Summer School on Speech Processing in Crete, Greece. He was on the Board of the International Speech Communication Association (ISCA) (2009-2013). He is currently a member of the IEEE Speech and Language Technical Committee (2017-2019, and previously 2007-2010). He is on the Editorial Boards of the Digital Signal Processing journal (Elsevier) and of the Journal of Electrical and Computer Engineering (Hindawi, JECE), and is Associate Editor of the EURASIP Journal on Audio, Speech, and Music Processing (ASMP) and of the EURASIP Research Letters in Signal Processing (RLSP). He was Associate Editor for the IEEE Signal Processing Letters, Vice-Chair of COST Action 2103 "Advanced Voice Function Assessment" (VOICE), and on the Management Committee of COST Action 277 "Nonlinear Speech Processing".