Abstract and Keywords
This article gives an introduction to state-of-the-art text-to-speech (TTS) synthesis systems, showing both the natural language processing and the digital signal processing problems involved. Text-to-speech (TTS) synthesis is the art of designing talking machines. The article begins with brief user-oriented description of a general TTS system and comments on its commercial applications. It then gives a functional diagram of a modern TTS system, highlighting its components. It describes its morphosyntactic module. Furthermore, it examines why sentence-level phonetization cannot be achieved by a sequence of dictionary look-ups, and describes possible implementations of the phonetizer. Finally, the article describes prosody generation, outlining how intonation and duration can approximately be computed from text. Prosody refers to certain properties of the speech signal, which are related to audible changes in pitch, loudness, and syllable length. This article also introduces the two main existing categories of techniques for waveform generation: synthesis by rule and concatenative synthesis.
Access to the complete content on Oxford Handbooks Online requires a subscription or purchase. Public users are able to search the site and view the abstracts and keywords for each book and chapter without a subscription.
If you have purchased a print title that contains an access token, please see the token for information about how to register your code.