From ms20@u.washington.edu Sat Sep 3 22:48:53 1994
Date: Sat, 3 Sep 1994 19:19:44 -0700 (PDT)
From: HIgH TeCH <ms20@u.washington.edu>
To: analogue <analogue@magnus.acs.ohio-state.edu>
Subject: Talking Machines (Long!)

This is an excerpt taken from J. L. Flanagan's Speech Analysis, Synthesis,
and Perception, Second Edition, pages 204-211.

The text is reproduced as it is in the book, except where it referred to
illustrations.

I mainly wanted to expose the readers to the history of speech synthesis
preceding the Vocoder, so anything actually involving the Vocoder is not
included in this text. Don't let that discourage you. This is good reading!

Enjoy,

Romeo Fahl
++++++++++
ms20@u.washington.edu

-------------------------------------------------------------------------

SPEECH SYNTHESIS
----------------

Ancient man often took his ability of speech as a symbol of divine
origin. Not unnaturally, he sometimes ascribed the same ability to his gods.
Pagan priests, eager to fulfill great expectations, frequently tried to make
their idols speak directly to the people. Talking statues, miraculous
voices and oracles were well known in the Greek and Roman civilizations -
the voice usually coming to the artificial mouth via cleverly concealed
speaking tubes. Throughout early times the capacity of "artificial speech"
to amaze, amuse and influence its listeners was remarkably well appreciated
and exploited.

As the civilized world entered the Renaissance, scientific
curiosity developed and expanded. Man began to inquire more seriously
into the nature of things. Human life and physiological functions were
fair targets of study, and the physiological mechanism of speech belonged
in this sphere. Not surprisingly, the relatively complex vocal mechanism
was often considered in terms of more tractable models. These early models
were invariably mechanical contrivances, and some were exceedingly clever
in design.

MECHANICAL SPEAKING MACHINES: HISTORICAL EFFORTS
------------------------------------------------

One of the earliest documented efforts at speech synthesis was by
Kratzenstein in 1779. The Imperial Academy of St. Petersburg offered its
annual prize for explaining the physiological differences between five
vowels, and for making apparatus to produce them artificially. As the
winning solution, Kratzenstein constructed acoustic resonators with
vibrating reeds which, in a manner analogous to the human vocal cords,
interrupted an air stream.

A few years later (1791), Von Kempelen constructed and demonstrated
a more elaborate machine for generating connected utterances. [Apparently
Von Kempelen's efforts antedate Kratzenstein's, since Von Kempelen
purportedly began work on his device in 1769 (Von Kempelen; Dudley and
Tarnoczy).] Although his machine received considerable publicity, it was
not taken as seriously as it should have been. Von Kempelen had earlier
perpetrated a deception in the form of a mechanical chess-playing machine.
The main "mechanism" of the machine was a concealed, legless man - an
expert chess player.

The speaking machine, however, was a completely legitimate device.
It used a bellows to supply air to a reed which, in turn, excited a single,
hand-varied resonator for producing voiced sounds. Consonants, including
nasals, were simulated by four separate constricted passages, controlled by
the fingers of the other hand. An improved version of the machine was
built from Von Kempelen's description by Sir Charles Wheatstone (of
Wheatstone Bridge fame, who is also credited in Britain with the invention
of the telegraph).

Briefly, the device was operated in the following manner. The
right arm rested on the main bellows and expelled air through a vibrating
reed to produce voiced sounds. The fingers of the right hand controlled
the air passages for the fricatives /ʃ/ and /s/, as well as the "nostril"
openings and the reed on-off control. For vowel sounds, all the passages
were closed and the reed turned on. Control of vowel resonances was
effected with the left hand by suitably deforming the leather resonator at
the front of the device. Unvoiced sounds were produced with the reed off,
and by a turbulent flow through a suitable passage. In the original work,
Von Kempelen claimed that approximately 19 consonant sounds could be made
passably well.

Von Kempelen's efforts probably had a more far-reaching influence
than is generally appreciated. During Alexander Graham Bell's boyhood in
Edinburgh, Scotland (latter 1800's), Bell had an opportunity to see the
reproduction of Von Kempelen's machine which had been constructed by
Wheatstone. He was greatly impressed with the device. With stimulation
from his father (Alexander Melville Bell, an elocutionist like his own
father), and his brother Melville's assistance, Bell set out to construct a
speaking automaton of his own.

Following their father's advice, the boys attempted to copy the
vocal organs by making a cast from a human skull and molding the vocal
parts in gutta-percha. The lips, tongue, palate, teeth, pharynx, and
velum were represented. The lips were a framework of wire, covered with
rubber which had been stuffed with cotton batting. Rubber cheeks enclosed
the mouth cavity, and the tongue was simulated by wooden sections -
likewise covered by a rubber skin and stuffed with batting. The parts
were actuated by levers controlled from a keyboard. A larynx "box" was
constructed of tin and had a flexible tube for a windpipe. A vocal cord
orifice was made by stretching a slotted rubber sheet over tin supports.

Bell says the device could be made to say vowels and nasals and
could be manipulated to produce a few simple utterances (apparently well
enough to attract the neighbors). It is tempting to speculate how this
boyhood interest may have been decisive in leading to U.S. patent No.
174,465, dated February 14, 1876 - describing the telephone, and which has
been perhaps one of the most valuable patents in history.

Bell's youthful interest in speech production also led him to
experiment with his pet Skye terrier. He taught the dog to sit up on his
hind legs and growl continuously. At the same time, Bell manipulated the
dog's vocal tract by hand. The dog's repertoire of sounds finally
consisted of the vowels /a/ and /u/, the diphthong /ou/ and the syllables
/ma/ and /ga/. His greatest linguistic accomplishment consisted of the
sentence, "How are you Grandmamma?" The dog apparently started taking a
"bread and butter" interest in the project and would try to talk by
himself. But on his own, he could never do better than the usual growl.
This, according to Bell, is the only foundation to the rumor that he once
taught a dog to speak.

Interest in mechanical analogs of the vocal system continued into the
twentieth century. Among those who developed a penetrating understanding
of the nature of human speech was Sir Richard Paget. Besides making
accurate plaster tube models of the vocal tract, he was also adept at
simulating vocal configurations with his hands. He could literally "talk
with his hands" by cupping them and exciting the cavities either with a
reed, or with the lips made to vibrate after the fashion of blowing a
trumpet.

Around the same time, a different approach to artificial speech was
taken by people like Helmholtz, D.C. Miller, Stumpf, and Koenig. They
viewed the problem more from the standpoint of perception than of
production. Helmholtz synthesized vowel sounds by causing a sufficient
number of tuning forks to vibrate at selected frequencies and with
prescribed amplitudes. Miller and Stumpf, on the other hand, accomplished
the same thing by sounding organ pipes. Still different, Koenig
synthesized vowel spectra from a siren in which air jets were directed at
rotating, toothed wheels.
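
The tuning-fork method amounts to what is now called additive synthesis:
a vowel-like sound is built as a sum of sinusoids at multiples of a
fundamental frequency, each with its own amplitude. The Python below is a
minimal digital sketch under assumed parameters; the function name
`harmonic_tone`, the 8 kHz sample rate, and the amplitude list `AMPS` are
hypothetical illustrations, not values from Helmholtz's apparatus. It
specifies only the spectrum and makes no attempt to model the vocal tract.

```python
import math


def harmonic_tone(f0, amplitudes, fs=8000, duration=0.5):
    """Sum sinusoids at k * f0 (k = 1, 2, ...) weighted by `amplitudes`.

    Each term plays the role of one tuning fork sounding at a selected
    frequency with a prescribed amplitude. Names and defaults are
    illustrative assumptions.
    """
    n = int(fs * duration)
    samples = []
    for i in range(n):
        t = i / fs
        s = 0.0
        for k, a in enumerate(amplitudes, start=1):
            f = k * f0
            if f >= fs / 2:
                # Harmonics at or above the Nyquist frequency cannot be
                # represented at this sample rate, so stop adding terms.
                break
            s += a * math.sin(2 * math.pi * f * t)
        samples.append(s)
    return samples


# Hypothetical relative amplitudes for harmonics 1..12 of a 125 Hz
# fundamental, peaking near 750 Hz and 1250 Hz; chosen for illustration,
# not measured from any real vowel.
AMPS = [0.2, 0.3, 0.4, 0.6, 0.8, 1.0, 0.7, 0.4, 0.5, 0.7, 0.3, 0.1]

if __name__ == "__main__":
    tone = harmonic_tone(125.0, AMPS, duration=0.1)
    print(len(tone), max(abs(v) for v in tone))
```

Keeping `f0` fixed and changing only the amplitude list alters the
spectral envelope, which, on the perceptual view these investigators took,
is what distinguishes one vowel from another.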

At least one more-recent design for a mechanical talker has been
put forward (Riesz, unpublished, 1937). Air under pressure is brought from
a reservoir at the right. Two valves control the flow. The first valve
admits air into a chamber in which a reed is fixed. The reed vibrates and
interrupts the air flow much like the vocal cords. A spring-loaded slider
varies the effective length of the reed and changes its fundamental
frequency. Unvoiced sounds are produced by admitting air through the
second valve. The configuration of the vocal tract is varied by means of
nine movable members representing the lips, teeth, tongue, pharynx, and
velar coupling.

To simplify the control, Riesz constructed the mechanical talker
with finger keys to control the configuration, but with only one control
each for lips and teeth (which worked in opposition to each other). The
different members were covered with a soft rubber lining to accomplish
realistic closures and dampings. Two keys (4 and 5) operate excitation
valves (V4 and V5), arranged somewhat differently than the first two.
Valve V4 admits air through a hole forward in the tract for producing
unvoiced sounds. Valve V5 supplies air to the reed chamber for voiced
excitation. In this case pitch is controlled by the amount of air passed
by the valve V5. When operated by a skilled person, the machine could be
made to simulate connected speech. One of its particularly good utterances
was reported to be "cigarette".


ELECTRICAL METHODS FOR SPEECH SYNTHESIS
---------------------------------------

With the evolution of electrical technology, interest in speech
synthesis assumed a broader basis. Academic interest in the physiology and
acoustics of the signal-producing mechanism was supplemented by the
potential for communicating at a distance. Although "facsimile waveform"
transmission of speech was the first method to be applied successfully
(i.e. in the telephone), many early inventors appreciated the resonance
nature of the vocal system and the importance to intelligibility of
preserving the short-time amplitude spectrum *. Analytical formulation and
practical application of this knowledge were longer in coming.

SPECTRUM RECONSTRUCTION TECHNIQUES
----------------------------------

Investigators such as Helmholtz, D.C. Miller, R. Koenig and Stumpf
had earlier noted that speech-like sounds could be generated by producing
an harmonic spectrum with the correct fundamental frequency and relative
amplitudes. In other words, the signal could be synthesized with no
compelling effort at duplicating the vocal system, but mainly with the
objective of producing the desired percept. Among the first to demonstrate
the principle electrically was Stewart, who excited two coupled resonant
electrical circuits by a current interrupted at a rate analogous to the
voice fundamental. By adjusting the circuit tuning, sustained vowels could
be simulated. The apparatus was not elaborate enough to produce connected
utterances. Somewhat later, Wagner devised a similar set of four
electrical resonators, connected in parallel, and excited by a buzz-like
source. The outputs of the four resonators were combined in the proper
amplitudes to produce vowel spectra.

Probably the first electrical synthesizer which attempted to
produce connected speech was the Voder (Dudley, Riesz, and Watkins). It was
basically a spectrum-synthesis device operated from a finger keyboard. It
did, however, duplicate one important physiological characteristic of the
vocal system, namely, that the excitation can be voiced or unvoiced.

The "resonance control" box of the device contains 10 contiguous
band-pass filters which span the speech frequency range and are connected
in parallel. All the filters receive excitation from either the noise
source or the buzz (relaxation) oscillator. The wrist bar selects the
excitation source, and a foot pedal controls the pitch of the buzz
oscillator. The outputs of the band-pass filters pass through
potentiometer gain controls and are added. Ten finger keys operate the
potentiometers. Three additional keys provide a transient excitation of
selected filters to simulate stop-consonant sounds.
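
The signal path just described (one shared excitation, ten parallel
band-pass filters, a gain per filter, and a summing output) can be
sketched digitally. The Python below is a minimal model under stated
assumptions: each analog band-pass filter is replaced by a two-pole digital
resonator, the relaxation oscillator by an impulse train, and the noise
source by uniform random samples. The band centres, bandwidth, and all
function names are hypothetical and are not the Voder's actual values; the
three stop-consonant keys are not modelled.

```python
import math
import random


def resonator(x, fc, bw, fs):
    """Two-pole digital resonator centred on fc Hz with bandwidth bw Hz.

    Used here as a stand-in for one analog band-pass filter.
    """
    r = math.exp(-math.pi * bw / fs)  # pole radius set by the bandwidth
    c1 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    c2 = -r * r
    g = 1.0 - r  # crude input scaling; peak gain still varies with fc
    y1 = y2 = 0.0
    out = []
    for v in x:
        y = g * v + c1 * y1 + c2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def buzz(f0, n, fs):
    """Impulse train at pitch f0: a crude model of the relaxation oscillator."""
    out = [0.0] * n
    period = fs / f0
    t = 0.0
    while t < n:
        out[int(t)] = 1.0
        t += period
    return out


def noise(n, seed=0):
    """Uniform white noise standing in for the noise source."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def voder_frame(key_gains, voiced, f0=120.0, fs=8000, n=800, bw=300.0):
    """Excite all bands from one source, scale each by its key, and add."""
    # Ten roughly contiguous bands from 200 Hz to 3350 Hz (hypothetical).
    centres = [200.0 + 350.0 * i for i in range(10)]
    source = buzz(f0, n, fs) if voiced else noise(n)
    out = [0.0] * n
    for fc, gain in zip(centres, key_gains):
        if gain == 0.0:
            continue  # a released key contributes nothing
        band = resonator(source, fc, bw, fs)
        for i, v in enumerate(band):
            out[i] += gain * v
    return out


if __name__ == "__main__":
    keys = [0.0, 0.2, 1.0, 0.6, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0]
    frame = voder_frame(keys, voiced=True, f0=110.0)
    print(len(frame))
```

In this sketch `voiced` stands in for the wrist bar, `f0` for the foot
pedal, and `key_gains` for the ten finger keys; a running utterance would
mean recomputing the frame as the keys change, which the operator did by
hand.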

This speaking machine was demonstrated by trained operators at the
World's Fairs of 1939 (New York) and 1940 (San Francisco). Although the
training required was quite long (on the order of a year or more), the
operators were able to "play" the machines - literally as though they were
organs or pianos - and to produce intelligible speech **. More recently,
further research studies based upon the Voder principle have been carried
out (Oizumi and Kubo).

----

*

Prominent among this group was Alexander Graham Bell. The events - in
connection with experiments on the "harmonic telegraph" - that led Bell, in
March of 1876, to apply the facsimile waveform principle are familiar to
most students of communication. Less known, perhaps, is Bell's conception
of a spectral transmission method remarkably similar to the channel
vocoder.

Bell called the idea the "harp telephone". It consisted of an
elongated electromagnet with a row of steel reeds in the magnetic circuit.
The reeds were to be arranged to vibrate in proximity to the pole of the
magnet, and were to be tuned successively to different frequencies. Bell
suggested that "...they might be considered analogous to the rods in the
harp of Corti in the human ear". Sound uttered near the reeds would cause
those reeds corresponding to the spectral structure of the sound to
vibrate. Each reed would induce in the magnet an electrical current which
would combine with the currents produced by other reeds into a resultant
complex wave. The total current passing through a similar instrument at
the receiver would, Bell thought, set identical reeds into motion and
reproduce the original sound (Watson).

The device was never constructed. The reason, Watson says, was the
prohibitive expense! Also, because of the lack of means for amplification,
Bell thought the currents generated by such a device might be too feeble to
be practicable. (Bell found with his harmonic telegraph, however, that a
magnetic transducer with a diaphragm attached to the armature could, in
fact, produce audible sound from such feeble currents.)

The principle of the "harp telephone" carries the implication that
speech intelligibility is retained by preserving the short-time amplitude
spectrum. Each reed of the device might be considered a combined
electroacoustic transducer and bandpass filter. Except for the mixing of
the "filter" signals in a common conductor, and the absence of rectifying
and smoothing means, the spectrum reconstruction principle bears a striking
resemblance to that of the channel Vocoder.
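
The two missing ingredients named here, rectification and smoothing, are
what turn a bank of band-pass filters into a measure of the short-time
amplitude spectrum. The Python below sketches one analysis channel per band
under assumed parameters: a two-pole digital resonator replaces each tuned
reed, full-wave rectification is `abs()`, and a one-pole low-pass filter
does the smoothing. The function names, the 150 Hz bandwidth, and the
25 Hz smoothing cutoff are hypothetical choices, not values from Bell's
proposal or from any particular vocoder.

```python
import math


def bandpass(x, fc, bw, fs):
    """Two-pole resonator: a digital stand-in for one tuned reed."""
    r = math.exp(-math.pi * bw / fs)
    c1 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    c2 = -r * r
    y1 = y2 = 0.0
    out = []
    for v in x:
        y = (1.0 - r) * v + c1 * y1 + c2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def smooth(x, cutoff, fs):
    """One-pole low-pass filter playing the part of the smoothing means."""
    a = math.exp(-2.0 * math.pi * cutoff / fs)
    y = 0.0
    out = []
    for v in x:
        y = (1.0 - a) * v + a * y
        out.append(y)
    return out


def channel_envelopes(x, centres, bw=150.0, fs=8000, cutoff=25.0):
    """Per-band envelopes: band-pass, full-wave rectify, then smooth."""
    envelopes = []
    for fc in centres:
        band = bandpass(x, fc, bw, fs)
        rectified = [abs(v) for v in band]
        envelopes.append(smooth(rectified, cutoff, fs))
    return envelopes
```

Feeding a steady 1 kHz tone through channels centred at 1000 Hz and
2500 Hz gives a large, slowly varying envelope in the first and a much
smaller one in the second. Skipping the `abs()` step and smoothing the raw
band signal instead leaves an output that averages to roughly zero, which
is one way to see why rectification matters for recovering a slowly
varying amplitude per band.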

**

H.W. Dudley retired from Bell Laboratories in October 1961. To mark the
completion of his more than 40 years in speech research, one of the Voder
machines was retrieved from storage and refurbished. In addition, one of
the original operators was invited to return and perform for the occasion.
Amazingly, after an interlude of twenty years, the lady was able to sit
down to the console and make the machine speak.