IPOX Demos: All-Prosodic Speech Synthesis
Arthur Dirksen
John Coleman
This document presents audio demos complementing our paper "All-prosodic speech synthesis", which appears in:
J.P.H. van Santen, R.W. Sproat, J.P. Olive, & J. Hirschberg (eds.) Progress in Text-to-Speech Synthesis. New York: Springer Verlag, 91-108.
The audio demos are 16-bit .wav files, sampled at 11025 Hz, generated by IPOX.
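For readers who want to confirm that a downloaded demo matches the format stated above, the following is a minimal sketch using Python's standard `wave` module. The helper names `describe_wav` and `make_silent_wav` are hypothetical, and the stand-in file is assumed to be mono, which the text above does not state; the sketch is not part of IPOX.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, bits_per_sample, sample_rate_hz, duration_s) of a PCM WAV.

    `source` may be a file name or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8      # sample width is reported in bytes
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, bits, rate, duration


def make_silent_wav(seconds=0.5, rate=11025):
    """Build an in-memory 16-bit WAV of silence as a stand-in for a demo file.

    Mono is an assumption made for this sketch only.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)                # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


channels, bits, rate, duration = describe_wav(make_silent_wav())
print(channels, bits, rate, round(duration, 2))
```

To check a real demo, pass its file name to `describe_wav` instead of the in-memory stand-in and compare the reported bit depth and sample rate with the values given above.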
Demo 1 - Coarticulation
Spreading of vocalic place features in the phonology is reflected in phonetic interpretation by subtle differences in frication spectra associated with /s/:
Demo 2 - Syllable overlap
Three versions of /bot@l/ bottle, with ambisyllabic /t/, generated with different amounts of syllable overlap:
- very short closure ("flapping")
- normal intervocalic closure
- long closure ("gemination")
Demo 3 - Ambisyllabicity
Intervocalic clusters are parsed with maximal ambisyllabicity. In the following words, the bracketed clusters are ambisyllabic:
- /win[t]@r/ winter
- /si[st]@m/ system
- /a[sp]rin/ aspirin
Note that /t/ is aspirated in winter, but not in system.
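One way to read "maximal ambisyllabicity" is that a consonant in an intervocalic cluster is shared whenever it falls inside both the longest legal coda and the longest legal onset of that cluster. The sketch below illustrates that reading only. The onset and coda inventories are hypothetical toy sets covering just the three example words, the function names are invented for illustration, and none of this is claimed to be the grammar or the parser used by IPOX.

```python
# Hypothetical toy phonotactics: just enough for the three example clusters.
LEGAL_ONSETS = {"", "t", "p", "r", "s", "st", "sp", "pr", "spr"}
LEGAL_CODAS = {"", "n", "s", "t", "nt", "st", "sp"}


def parse_cluster(cluster):
    """Split an intervocalic cluster into (coda_only, ambisyllabic, onset_only).

    Each character is treated as one segment.
    """
    positions = range(len(cluster) + 1)
    # End of the longest prefix that is a legal coda of the first syllable.
    coda_end = max(i for i in positions if cluster[:i] in LEGAL_CODAS)
    # Start of the longest suffix that is a legal onset of the second syllable.
    onset_start = min(i for i in positions if cluster[i:] in LEGAL_ONSETS)
    if onset_start > coda_end:
        # Some segment would belong to neither syllable.
        raise ValueError(f"cluster {cluster!r} cannot be parsed")
    return cluster[:onset_start], cluster[onset_start:coda_end], cluster[coda_end:]


def bracket(cluster):
    """Render a cluster with its ambisyllabic part in square brackets."""
    coda_only, shared, onset_only = parse_cluster(cluster)
    return f"{coda_only}[{shared}]{onset_only}"


for cluster in ("nt", "st", "spr"):
    print(cluster, "->", bracket(cluster))
# nt -> n[t]
# st -> [st]
# spr -> [sp]r
```

Under these toy inventories the output matches the bracketing in the list above: /n/ belongs only to the first syllable of winter, /st/ is fully shared in system, and /r/ belongs only to the second syllable of aspirin.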
Demo 4 - Syllable compression I
Again, three versions of /bot@l/ bottle, this time with different amounts of compression for the first and second syllable:
Note that when the second syllable /t@l/ is compressed to 62%, the vowel is almost fully eclipsed, creating the impression of a syllabic sonorant.
Demo 5 - Syllable compression II
Two versions of /s^powz/ suppose, with different amounts of compression for the unstressed prefix /s^p/:
Demo 6 - Syllable compression III
A segmental analysis of vowel elision would seem to predict that s'pport is phonetically identical to sport. Our analysis in terms of syllable compression correctly predicts subtle (and less subtle) differences:
Note that /p/ is aspirated in s'pport, but not in sport.
Demo 7 - Syllable compression IV
The three words below have been synthesized using full vowels (/ow/ as in blow, /o/ as in pot, and /a/ as in sad) in analysis as well as phonetic interpretation. By varying syllable compression in accordance with metrical-prosodic structure, we obtain the expected alternations between full and reduced vowels.
- /fowtograf/ photograph
- /fowtografi/ photography
- /fowtografikal/ photographical
Demo 8 - Connected speech
Our first attempt at generation of a full sentence with IPOX.
- /Disiz@n@dapt@b@lsist@m/ This is an adaptable system
February 1995, revised June 2000