From ms20@u.washington.edu Sat Sep 3 22:48:53 1994
Date: Sat, 3 Sep 1994 19:19:44 -0700 (PDT)
From: HIgH TeCH <ms20@u.washington.edu>
To: analogue <analogue@magnus.acs.ohio-state.edu>
Subject: Talking Machines (Long!)

This is an excerpt taken from J.L.Flanagan's Speech Analysis, Synthesis,
and Perception, Second Edition Pages 204-211

The text is reproduced as it is in the book, except where references to
illustrations were made.

I mainly wanted to expose the readers to the history of speech synthesis
preceding the Vocoder, so anything actually involving the Vocoder is not
included in this text. Don't let that discourage you. This is good reading!

Enjoy,

Romeo Fahl
++++++++++
ms20@u.washington.edu

-------------------------------------------------------------------------


SPEECH SYNTHESIS
----------------


Ancient man often took his ability of speech as a symbol of divine
origin. Not unnaturally, he sometimes ascribed the same ability to his gods.
Pagan priests, eager to fulfill great expectations, frequently tried to make
their idols speak directly to the people. Talking statues, miraculous
voices and oracles were well known in the Greek and Roman civilizations -
the voice usually coming to the artificial mouth via cleverly concealed
speaking tubes. Throughout early times the capacity of "artificial speech"
to amaze, amuse and influence its listeners was remarkably well appreciated
and exploited.

As the civilized world entered the Renaissance, scientific
curiosity developed and expanded. Man began to inquire more seriously
into the nature of things. Human life and physiological functions were
fair targets of study, and the physiological mechanism of speech belonged
in this sphere. Not surprisingly, the relatively complex vocal mechanism
was often considered in terms of more tractable models. These early models
were invariably mechanical contrivances, and some were exceedingly clever
in design.

MECHANICAL SPEAKING MACHINES: HISTORICAL EFFORTS
------------------------------------------------

One of the earliest documented efforts at speech synthesis was by
Kratzenstein in 1779. The Imperial Academy of St. Petersburg offered its
annual prize for explaining the physiological differences between five
vowels, and for making apparatus to produce them artificially. As the
winning solution, Kratzenstein constructed acoustic resonators with
vibrating reeds which, in a manner analogous to the human vocal cords,
interrupted an air stream.

A few years later (1791), Von Kempelen constructed and demonstrated
a more elaborate machine for generating connected utterances. [Apparently
Von Kempelen's efforts antedate Kratzenstein's, since Von Kempelen
purportedly began work on his device in 1769 (Von Kempelen; Dudley and
Tarnoczy).] Although his machine received considerable publicity, it was
not taken as seriously as it should have been. Von Kempelen had earlier
perpetrated a deception in the form of a mechanical chess-playing machine.
The main "mechanism" of the machine was a concealed, legless man - an
expert chess player.

The speaking machine, however, was a completely legitimate device.
It used a bellows to supply air to a reed which, in turn, excited a single,
hand-varied resonator for producing voiced sounds. Consonants, including
nasals, were simulated by four separate constricted passages, controlled by
the fingers of the other hand. An improved version of the machine was
built from Von Kempelen's description by Sir Charles Wheatstone (of the
Wheatstone Bridge, and who is credited in Britain with the invention of the
telegraph).

Briefly, the device was operated in the following manner. The
right arm rested on the main bellows and expelled air through a vibrating
reed to produce voiced sounds. The fingers of the right hand controlled
the air passages for the fricatives /ʃ/ and /s/, as well as the "nostril"
openings and the reed on-off control. For vowel sounds, all the passages
were closed and the reed turned on. Control of vowel resonances was
effected with the left hand by suitably deforming the leather resonator at
the front of the device. Unvoiced sounds were produced with the reed off,
and by a turbulent flow through a suitable passage. In the original work,
Von Kempelen claimed that approximately 19 consonant sounds could be made
passably well.

Von Kempelen's efforts probably had a more far-reaching influence
than is generally appreciated. During Alexander Graham Bell's boyhood in
Edinburgh, Scotland (latter 1800's), Bell had an opportunity to see the
reproduction of Von Kempelen's machine which had been constructed by
Wheatstone. He was greatly impressed with the device. With stimulation
from his father (Alexander Melville Bell, an elocutionist like his own
father), and his brother Melville's assistance, Bell set out to construct a
speaking automaton of his own.

Following their father's advice, the boys attempted to copy the
vocal organs by making a cast from a human skull and molding the vocal
parts in gutta-percha. The lips, tongue, palate, teeth, pharynx, and
velum were represented. The lips were a framework of wire, covered with
rubber which had been stuffed with cotton batting. Rubber cheeks
enclosed the mouth cavity, and the tongue was simulated by
wooden sections - likewise covered by a rubber skin and stuffed with
batting. The parts were actuated by levers controlled from a keyboard. A
larynx "box" was constructed of tin and had a flexible tube for a windpipe.
A vocal cord orifice was made by stretching a slotted rubber sheet over tin
supports.

Bell says the device could be made to say vowels and nasals and
could be manipulated to produce a few simple utterances (apparently well
enough to attract the neighbors). It is tempting to speculate how this
boyhood interest may have been decisive in leading to U.S. patent No.
174,465, filed February 14, 1876 - describing the telephone, and which has
been perhaps one of the most valuable patents in history.

Bell's youthful interest in speech production also led him to
experiment with his pet Skye terrier. He taught the dog to sit up on his
hind legs and growl continuously. At the same time, Bell manipulated the
dog's vocal tract by hand. The dog's repertoire of sounds finally
consisted of the vowels /a/ and /u/, the diphthong /ou/ and the syllables
/ma/ and /ga/. His greatest linguistic accomplishment consisted of the
sentence, "How are you Grandmamma?" The dog apparently started taking a
"bread and butter" interest in the project and would try to talk by
himself. But on his own, he could never do better than the usual growl.
This, according to Bell, is the only foundation to the rumor that he once
taught a dog to speak.

Interest in mechanical analogs of the vocal system continued to the
twentieth century. Among those who developed a penetrating understanding
of the nature of human speech was Sir Richard Paget. Besides making
accurate plaster tube models of the vocal tract, he was also adept at
simulating vocal configurations with his hands. He could literally "talk
with his hands" by cupping them and exciting the cavities either with a
reed, or with the lips made to vibrate after the fashion of blowing a
trumpet.

Around the same time, a different approach to artificial speech was
taken by people like Helmholtz, D.C. Miller, Stumpf, and Koenig. Their
view was more from the point of perception than from production. Helmholtz
synthesized vowel sounds by causing a sufficient number of tuning forks to
vibrate at selected frequencies and with prescribed amplitudes. Miller and
Stumpf, on the other hand, accomplished the same thing by sounding organ
pipes. Still different, Koenig synthesized vowel spectra from a siren in
which air jets were directed at rotating, toothed wheels.

At least one more-recent design for a mechanical talker has been
put forward (Riesz, unpublished, 1937). Air under pressure is brought from
a reservoir at the right. Two valves control the flow. The first valve
admits air into a chamber in which a reed is fixed. The reed vibrates and
interrupts the air flow much like the vocal cords. A spring-loaded slider
varies the effective length of the reed and changes its fundamental
frequency. Unvoiced sounds are produced by admitting air through the
second valve. The configuration of the vocal tract is varied by means of
nine movable members representing the lips, teeth, tongue, pharynx, and
velar coupling.

To simplify the control, Riesz constructed the mechanical talker
with finger keys to control the configuration, but with only one control
each for lips and teeth (which worked in opposition to each other). The
different members were covered with a soft rubber lining to accomplish
realistic closures and dampings. Two keys (4 and 5) operate excitation
valves (V4 and V5), arranged somewhat differently than the first two.
Valve V4 admits air through a hole forward in the tract for producing
unvoiced sounds. Valve V5 supplies air to the reed chamber for voiced
excitation. In this case pitch is controlled by the amount of air passed
by the valve V5. When operated by a skilled person, the machine could be
made to simulate connected speech. One of its particularly good utterances
was reported to be "cigarette".


ELECTRICAL METHODS FOR SPEECH SYNTHESIS
---------------------------------------


With the evolution of electrical technology, interest in speech
synthesis assumed a broader basis. Academic interest in the physiology and
acoustics of the signal-producing mechanism was supplemented by the
potential for communicating at a distance. Although "facsimile waveform"
transmission of speech was the first method to be applied successfully
(i.e. in the telephone), many early inventors appreciated the resonance
nature of the vocal system and the importance to intelligibility of
preserving the short-time amplitude spectrum *. Analytical formulation and
practical application of this knowledge were longer in coming.

SPECTRUM RECONSTRUCTION TECHNIQUES
----------------------------------

Investigators such as Helmholtz, D.C. Miller, R. Koenig and Stumpf
had earlier noted that speech-like sounds could be generated by producing
an harmonic spectrum with the correct fundamental frequency and relative
amplitudes. In other words, the signal could be synthesized with no
compelling effort at duplicating the vocal system, but mainly with the
objective of producing the desired percept. Among the first to demonstrate
the principle electrically was Stewart, who excited two coupled resonant
electrical circuits by a current interrupted at a rate analogous to the
voice fundamental. By adjusting the circuit tuning, sustained vowels could
be simulated. The apparatus was not elaborate enough to produce connected
utterances. Somewhat later, Wagner devised a similar set of four
electrical resonators, connected in parallel, and excited by a buzz-like
source. The outputs of the four resonators were combined in the proper
amplitudes to produce vowel spectra.

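[Not part of the book excerpt.] As a rough illustration of building a sound
from a harmonic spectrum with a chosen fundamental and relative amplitudes,
here is a minimal Python sketch. The function name, the 8000 Hz sample rate
and the amplitude values are all assumptions picked for illustration; the
amplitudes are not measured vowel data.

```python
import math


def harmonic_tone(fundamental_hz, amplitudes, n_samples, sample_rate=8000):
    """Sum harmonics of fundamental_hz; amplitudes[k] scales harmonic k + 1."""
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = 0.0
        for k, amplitude in enumerate(amplitudes):
            harmonic_hz = (k + 1) * fundamental_hz
            value += amplitude * math.sin(2.0 * math.pi * harmonic_hz * t)
        samples.append(value)
    return samples


# Hypothetical relative amplitudes for the first eight harmonics of 100 Hz.
# They only mimic the idea of a spectral peak; they are not from the book.
ILLUSTRATIVE_AMPLITUDES = [0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 0.9, 0.5]

tone = harmonic_tone(100.0, ILLUSTRATIVE_AMPLITUDES, n_samples=400)
```

Changing the amplitude list while keeping the fundamental fixed changes the
timbre but not the pitch, which is the essence of the tuning-fork and
organ-pipe experiments mentioned earlier.
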
Probably the first electrical synthesizer which attempted to
produce connected speech was the Voder (Dudley, Riesz, and Watkins). It was
basically a spectrum-synthesis device operated from a finger keyboard. It
did, however, duplicate one important physiological characteristic of the
vocal system, namely, that the excitation can be voiced or unvoiced.

The "resonance control" box of the device contains 10 contiguous
band-pass filters which span the speech frequency range and are connected
in parallel. All the filters receive excitation from either the noise
source or the buzz (relaxation) oscillator. The wrist bar selects the
excitation source, and a foot pedal controls the pitch of the buzz
oscillator. The outputs of the band-pass filters pass through
potentiometer gain controls and are added. Ten finger keys operate the
potentiometers. Three additional keys provide a transient excitation of
selected filters to simulate stop-consonant sounds.

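[Not part of the book excerpt.] The signal flow just described (one source,
ten parallel filters, ten gains, one sum) can be sketched in Python as below.
This is only a sketch under stated assumptions: the centre frequencies,
bandwidth and sample rate are invented for illustration, a simple two-pole
resonator stands in for each band-pass filter, and the wrist bar, foot pedal
and finger keys are reduced to function arguments. The three stop-consonant
keys are not modelled.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed for illustration

# Hypothetical centre frequencies (Hz) for the ten bands; the text gives none.
BAND_CENTRES = [150, 350, 600, 900, 1250, 1650, 2100, 2600, 3100, 3600]


def buzz(n, pitch_hz):
    """Impulse train standing in for the buzz (relaxation) oscillator."""
    period = max(1, round(SAMPLE_RATE / pitch_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def hiss(n, seed=0):
    """Seeded uniform noise standing in for the noise source."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def resonator(signal, centre_hz, bandwidth_hz):
    """Two-pole resonator used here as a crude band-pass stand-in."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / SAMPLE_RATE)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (1.0 - r) * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def voder_frame(key_gains, voiced, pitch_hz=100.0, n=400):
    """Filter one source through all bands, scale by key gains, and add."""
    if len(key_gains) != len(BAND_CENTRES):
        raise ValueError("expected one gain per band")
    # The 'wrist bar' choice: buzz for voiced sounds, noise for unvoiced.
    source = buzz(n, pitch_hz) if voiced else hiss(n)
    mixed = [0.0] * n
    for gain, centre in zip(key_gains, BAND_CENTRES):
        if gain == 0.0:
            continue  # key not pressed, band contributes nothing
        band = resonator(source, centre, bandwidth_hz=200.0)
        for i, value in enumerate(band):
            mixed[i] += gain * value
    return mixed
```

A call such as voder_frame([1.0] * 10, voiced=True) corresponds to holding
all ten keys fully down with the buzz source selected; setting voiced=False
switches the same filter bank to noise excitation.
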
This speaking machine was demonstrated by trained operators at the
World's Fairs of 1939 (New York) and 1940 (San Francisco). Although the
training required was quite long (on the order of a year or more), the
operators were able to "play" the machines - literally as though they were
organs or pianos - and to produce intelligible speech **. More recently,
further research studies based upon the Voder principle have been carried
out (Oizumi and Kubo).

----
*

Prominent among this group was Alexander Graham Bell. The events - in
connection with experiments on the "harmonic telegraph" - that led Bell, in
March of 1876, to apply the facsimile waveform principle are familiar to
most students of communication. Less known, perhaps, is Bell's conception
of a spectral transmission method remarkably similar to the channel
vocoder.

Bell called the idea the "harp telephone". It consisted of an
elongated electromagnet with a row of steel reeds in the magnetic circuit.
The reeds were to be arranged to vibrate in proximity to the pole of the
magnet, and were to be tuned successively to different frequencies. Bell
suggested that "-they might be considered analogous to the rods in the harp
of Corti in the human ear". Sound uttered near the reeds would cause to
vibrate those reeds corresponding to the spectral structure of the sound.
Each reed would induce in the magnet an electrical current which would
combine with the currents produced by other reeds into a resultant complex
wave. The total current passing through a similar instrument at the
receiver would, Bell thought, set identical reeds into motion and reproduce
the original sound (Watson).

The device was never constructed. The reason, Watson says, was the
prohibitive expense! Also, because of the lack of means for amplification,
Bell thought the currents generated by such a device might be too feeble to
be practicable. (Bell found with his harmonic telegraph, however, that a
magnetic transducer with a diaphragm attached to the armature could, in
fact, produce audible sound from such feeble currents.)

The principle of the "harp telephone" carries the implication that
speech intelligibility is retained by preserving the short-time amplitude
spectrum. Each reed of the device might be considered a combined
electro-acoustic transducer and bandpass filter. Except for the mixing of
the "filter" signals in a common conductor, and the absence of rectifying
and smoothing means, the spectrum reconstruction principle bears a striking
resemblance to that of the channel Vocoder.

**

H.W. Dudley retired from Bell Laboratories in October 1961. On the
completion of his more than 40 years in speech research, one of the Voder
machines was retrieved from storage and refurbished. In addition, one of
the original operators was invited to return and perform for the occasion.
Amazingly, after an interlude of twenty years, the lady was able to sit
down to the console and make the machine speak.