Corpus-based approaches to speech synthesis have been advocated to overcome the limitations of concatenative synthesis from a fixed acoustic unit inventory.

Abstract: In speech synthesis, concatenative data-driven synthesis methods prevail.

This paper presents a new approach to speech synthesis in which a set of cross-word, decision-tree state-clustered, context-dependent hidden Markov models is used to define the subphone units for a concatenation synthesizer.

Models of segmental coarticulation and other phonetic factors are an important part of a text-to-speech system. The control part of a synthesis system calculates the parameter values at each time frame. Two main types of approaches can be distinguished: rule-based methods that use an explicit formulation of existing knowledge and library-based methods that replace rules by a collection of segment combinations. Clearly, each approach has its advantages. If the data are coded in terms of targets and slopes, we need methods to calculate the parameter tracks. The efforts of Holmes et al. (1964) and the filtered square wave approach by Liljencrants (1969) provide some classical examples in this context.
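As a rough illustration of calculating a parameter track from targets and slopes, the sketch below moves a single parameter toward each target at a limited rate and then holds it. The segment coding `(target, slope_per_ms, duration_ms)`, the function name `parameter_track`, the 10 ms frame step, and the numeric values are all assumptions made for this example; it is not the method of Holmes et al. (1964) or the filtered square wave approach of Liljencrants (1969).

```python
def parameter_track(start, segments, frame_ms=10.0):
    """Return one parameter value per time frame.

    segments is a list of (target, slope_per_ms, duration_ms) tuples.
    This coding is hypothetical and chosen only for illustration.
    """
    track = []
    value = start
    for target, slope_per_ms, duration_ms in segments:
        # Largest change allowed within a single frame.
        max_step = abs(slope_per_ms) * frame_ms
        for _ in range(round(duration_ms / frame_ms)):
            delta = target - value
            # Move toward the target, but no faster than the slope allows.
            value += max(-max_step, min(max_step, delta))
            track.append(value)
    return track


# Hypothetical formant targets (Hz), slopes (Hz/ms) and durations (ms).
segments = [(500.0, 20.0, 50.0), (1500.0, 25.0, 60.0)]
track = parameter_track(500.0, segments)
```

With these values the track holds the first target for five frames, then rises linearly to the second target and stays there for the rest of the segment. A library-based system would instead read such transitions from stored segment combinations.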

