DEPARTMENT OF LINGUISTICS

SIDGWICK AVENUE
CAMBRIDGE CB3 9DA
UNITED KINGDOM

TEL: +44 (0)1223 335010
FAX: +44 (0)1223 335053



ProSynth homepage

An Integrated Prosodic Approach to Device-Independent, Natural-Sounding Speech Synthesis


This grant runs from October 1997 to May 2000. The award holders are Sarah Hawkins at Cambridge, John Local and Richard Ogden at the University of York, and Jill House and Mark Huckvale at University College London. The £268,000 project is funded by EPSRC grants GR/L53069 (Cambridge), GR/L51829 (York) and GR/L52109 (UCL).


Objectives
This project explores the viability of a phonological model that rectifies some of the phonetic weaknesses of current concatenative and formant-based text-to-speech systems. The new model integrates timing, intonation and systematic segmental variation. For the selected linguistic structures modelled, the result should be high-quality, natural-sounding synthetic speech that is robust in noise. Our objectives are:

  1. Demonstration of selected parts of a text-to-speech system constructed on linguistically-motivated, declarative computational principles.
  2. A system-independent description of the linguistic structures developed.
  3. Perceptual test results using criteria of naturalness and robustness.


Summary
Current text-to-speech systems, both concatenative and formant-based, share some shortcomings. The speech often sounds unnatural because the rhythm, intonation and fine phonetic detail reflecting coarticulatory patterns are poor. As a result, although intelligibility rates may be good, listeners experience increased cognitive load and poorer perception in noise. These shortcomings restrict the applications for which synthetic speech is useful. This collaborative project aims to integrate and extend existing knowledge to produce the core of a new model of computational phonology and phonetic interpretation which will deliver high-quality speech synthesis. The complete model will comprise a unified, language- and accent-independent linguistic representation. The current project is developing a partial model, using representative linguistic structures which test the viability of our approach, applied initially to Southern British English. The three focal areas of research are intonation, morphological structure, and systematic segmental variation. The common factor is a temporal model that systematically structures information from all three areas and governs the output of synthesizer parameters. The signal generation component is based on time-domain modification of natural speech signals, supplemented by formant-based synthesis, and is adaptable to concatenative and formant-based methods. Evaluation includes perceptual tests for naturalness, intelligibility and communicative success under conditions of high cognitive load.

Progress
General information on the status of ProSynth can be obtained from the ProSynth page and the ProSynth newsletter.
Cambridge's contribution to ProSynth is to model acoustic-phonetic fine detail and its control in the overall structure of the synthesizer, and to assess the intelligibility and naturalness of the synthesis. The following describes our progress.

Related research issues
There is scope for research on grammatical and phonological determinants of perceptually-relevant allophonic variation, including the perceptual role of systematic long-domain cues to phonemic identity, and on developing software for conducting intelligibility and naturalness tests. These tests assess the perceptual salience of particular acoustic properties in various types of adverse listening conditions, including noise and the high cognitive load that results from carrying out simultaneous linguistic and non-linguistic tasks.

email Sarah Hawkins
email Sebastian Heid

last updated: 29 March 2000