In the last decade or so, there have been a number of
pioneering studies of vocal tract shapes using Magnetic Resonance
Imaging (MRI). (For a
page of links to other sites click
here.)
To begin with, conventional static images were acquired. Since the
"exposure time" required in these early studies was rather long (many
seconds), only "prolongable" articulations - vowels and continuant
consonants (e.g. nasals, liquids and fricatives) - could be examined.
More recently, a number of research groups have attempted to acquire
dynamic MRI image sequences, i.e. MRI "movies". One very successful
technique involves the acquisition of single images from an utterance
that is repeated over
and over again. The single images, from different times in the
production
of the utterance, can be put together to form an animation, showing the
movements of speech organs with a very fine degree of temporal
resolution. You can
see a clip
here
(9.7 MB QuickTime movie) that is made from an MR image sequence
obtained by a member of the Phonetics Laboratory, Greg Kochanski, at
the University of Oxford Centre for Clinical Magnetic Resonance
Research. It is of repeated utterances of "I'm a spotted chicken".
(Note that the clip begins about half-way through the cycle, so that
the sequence is actually "chicken ... I'm a spotted".) If the images
are displayed too large, it might be worthwhile downloading the file
to your disk and then playing it at a smaller scale using e.g. the
Apple QuickTime player.
However, the animation technique is just that - animation, not a
real-time record of the movements of a single speech event. It assumes
that the movements
produced on each repetition are identical, or at least do not vary
significantly.
Therefore, other research groups (including ours) are investigating the
use of very fast MR image acquisition ("real time" MRI).
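As a rough illustration of the repeated-utterance technique described above, the following Python sketch orders single images by their time offset within the utterance, regardless of which repetition each came from. It is only a sketch of the general idea, not the acquisition software used for the clip: the `Snapshot` record, its fields and the timing values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    repetition: int    # which repetition of the utterance the image came from
    offset_ms: float   # time since the start of that repetition (hypothetical)
    image_id: str      # placeholder standing in for the image data


def assemble_animation(snapshots):
    """Order single images by their time offset within the utterance."""
    return sorted(snapshots, key=lambda s: s.offset_ms)


# Hypothetical data: one image per repetition, each at a different offset.
snapshots = [
    Snapshot(repetition=0, offset_ms=300.0, image_id="rep0"),
    Snapshot(repetition=1, offset_ms=100.0, image_id="rep1"),
    Snapshot(repetition=2, offset_ms=200.0, image_id="rep2"),
    Snapshot(repetition=3, offset_ms=0.0, image_id="rep3"),
]

animation = assemble_animation(snapshots)
print([s.image_id for s in animation])
```

The ordering only yields a meaningful movie if every repetition follows the same trajectory, which is exactly the assumption noted above.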
Our first real time investigation, illustrated in the following movie
clip, shows the movements of the speech organs in a phonetically
untrained
speaker who repeats the word "Elgar" at a moderate rate several times
in
succession. The images were acquired at a rate of 6 frames per second.
Although
this is fast enough to give the visual impression of continuous, fluid
movement,
at about 167 ms per frame it is incapable of capturing some very
quickly-changing
articulatory details, such as the rapid movements associated with
transitions
between consonants and vowels.
(Click on the image to view or download the movie [2.1 MByte
QuickTime].)
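The per-frame interval quoted above follows directly from the frame rate. The short Python sketch below works it out for the two frame rates mentioned on this page (6 and 3 frames per second) and estimates how many frames would fall within a brief articulatory event. The 50 ms event duration is an illustrative assumption, not a measurement from our data.

```python
def frame_period_ms(frames_per_second):
    """Time between successive frames, in milliseconds."""
    return 1000.0 / frames_per_second


def frames_spanning(event_ms, frames_per_second):
    """Approximate number of frames acquired during an event of given duration."""
    return event_ms / frame_period_ms(frames_per_second)


# Assumed duration of a fast consonant-vowel transition (illustrative only).
ASSUMED_TRANSITION_MS = 50.0

for fps in (6, 3):
    period = frame_period_ms(fps)
    covered = frames_spanning(ASSUMED_TRANSITION_MS, fps)
    print(f"{fps} frames/s: {period:.1f} ms per frame, "
          f"{covered:.2f} frames per {ASSUMED_TRANSITION_MS:.0f} ms event")
```

Whenever the event is shorter than one frame period, as under this assumption, it may be missed altogether or appear only as a blur within a single frame.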
In order to attain the high frame rate of this movie (high for MRI,
that is), we acquire data in only a single plane, and we have to accept
a relatively poor level of contrast between different types of tissue,
cartilage, bone, fat etc. Nevertheless, the contrast between tissue and
air is quite clear, which makes such images quite adequate for studying
certain aspects of articulation.
In a recent study, we assessed the suitability of this method
for studying articulations of three degrees of complexity: i)
single vowels, ii) CVC syllables, and iii) whole sentences. The second
movie clip is of the syllable sequence "peat, pit, pet, pat, part":
Click on one of the following to view or download the movie: 1.3 MByte AVI clip (for Windows Media Player)
or
5 MByte QuickTime clip.
This sequence was acquired on a newer scanner, at a
lower frame rate (3 frames per second), with a corresponding
improvement in the resolution of tissue contrasts. At this frame rate,
vowels are easily discriminated from one another, and some details of
the consonants can be made out: the raising of the tongue tip for the
final [t]'s, for instance, but not the lip movements, which are not
quite in the frame.
A table of tongue positions for British English vowels (as uttered by
one speaker, at least) is available
here.
Larynx position in vowels
Even though 3 frames per second is quite a slow frame rate, some
aspects of speech articulation are slow enough to be perfectly visible
at this rate. In acquiring image sequences of vowel articulations, we
were particularly interested in the way in which the position of the
larynx is altered for
vowels of different pitch. It has been known for some time that the
whole
larynx moves as part of the process of pitch control, but it is
difficult
to observe these movements with other instruments. With dynamic MRI,
however,
it is relatively straightforward to observe and measure them. For
example,
in the 1.7 MByte QuickTime video clip associated with the figure below,
it
is easy to see the fall and rise of the larynx during the production of
a
vowel [i], spoken with a falling-rising pitch contour:
By using the "frame forward" button on the QuickTime player, it is
possible to observe the fall and rise of the larynx, frame-by-frame.
(To view the
sequence to best effect, it may be helpful to set the QuickTime player
to
display the images at half normal size.) Two blue lines have been
superimposed on each image. The upper, horizontal line segment marks
the base of the
first cervical vertebra. The lower end of the sloping line is positioned
at the base of the epiglottis, which moves up and down (and forwards
and
backwards) along with the thyroid cartilage and other tissues of the
larynx.
By making measurements of all the frames in a number of such sequences,
as spoken by a single phonetically-trained subject, we have determined
that, for this chap at least, his larynx is significantly higher and
more advanced for high-pitched vowels than for the same vowels spoken
with low pitch.
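To give an idea of the kind of measurement involved, here is a minimal Python sketch that takes, for each frame, the vertical position of the reference line and of the base of the epiglottis, and compares the average larynx height in high-pitched and low-pitched vowels. The pixel coordinates are invented for illustration and the function name is hypothetical. The sketch covers only the vertical dimension (not larynx advancement) and does not perform a significance test.

```python
from statistics import mean


def larynx_drop(reference_y, epiglottis_base_y):
    """Vertical distance (pixels) of the epiglottis base below the reference line.

    Image rows are assumed to increase downwards, so a smaller value
    means a higher larynx.
    """
    return epiglottis_base_y - reference_y


# Hypothetical per-frame measurements in pixels: (reference_y, epiglottis_base_y)
high_pitch_frames = [(40, 88), (40, 86), (41, 88)]
low_pitch_frames = [(40, 97), (41, 99), (40, 98)]

high = mean(larynx_drop(r, e) for r, e in high_pitch_frames)
low = mean(larynx_drop(r, e) for r, e in low_pitch_frames)

print(f"mean drop, high pitch: {high:.1f} px")
print(f"mean drop, low pitch:  {low:.1f} px")
print("larynx higher for high pitch:", high < low)
```

Measuring relative to the vertebra line rather than to the image edge means that small shifts of the head between frames do not masquerade as larynx movement.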
We have recently completed a study of such larynx movements in 6
phonetically
untrained subjects that corroborates this pattern of larynx raising and
lowering
for high vs. low-pitched vowels. A summary of the project report can be
inspected
here.
(This study was supported by British Academy grant SG-36269.)