textfiles/music/mpdl.t

1493 lines
73 KiB
Perl

[To be published in Computer Music Journal 18:4 (Winter 94)]
The ZIPI Music Parameter Description Language
Keith McMillen*, David L. Wessel&, and Matthew Wright*&
*Gibson Western Innovation Zone
2560 9th St. Suite 212
Berkeley, California 94710 USA
&Center for New Music and Audio Technologies (CNMAT)
Department of Music, University of California, Berkeley
1750 Arch St.
Berkeley, California 94720 USA
McMillen, DLW, Matt@CNMAT.Berkeley.edu
ZIPI's Music Parameter Description Language ("MPDL") is a new language for
describing music. It delivers musical parameters (such as articulation,
brightness, etc.) to notes or groups of notes. It includes parameters that
are well understood and universally implemented, such as loudness and pitch,
and supports parameters such as brightness and noise amount that should be
more common in the future. A large number of parameters remain unspecified,
thus assuring expandability and flexibility.
The MPDL is just one of ZIPI's application layers; others will include MIDI,
data dumps, and digital audio.
This article does not address any of the low level networking issues
associated with ZIPI. We will assume that ZIPI's lower levels deliver
arbitrary-sized data packets from any device on the network to any other
device on the network; this document describes how to transmit music data
via those packets. This application layer could run equally well on some
other lower network layer such as Ethernet or FDDI.
ZIPI's Music Parameter Description Language was designed by Keith A.
McMillen, David Wessel, and Matthew Wright.
The Shape of MPDL Packets
=========================
[Figure 1 would go here if this weren't the ASCII version]
Figure 1 shows the format of MPDL packets. The low-level network that
carries the MPDL packets will impose some overhead bytes; this will include
a network address indicating which ZIPI device should receive the packet.
The other overhead bytes depend on the particular network that carries MPDL;
we will not discuss the details here. Note that overhead bytes are at the
beginning and/or end of a network packet, but the MPDL data is contiguous.
The MPDL data itself consists of a *note address* followed by an arbitrary
number of *note descriptors*; these are described in the sections below.
Address Space
=============
When you send a message such as "become louder," you also need to specify
what it is that becomes louder. We call this "what" the *address* of a
message.
In MIDI, you might send continuous controller number 7 (which typically
means volume) to channel number 3, so the address of the message is "channel
three." If you want to send a MIDI message to a single note, rather than an
entire channel, you must name that note by giving its pitch as well as its
channel, as in, "apply after-touch to middle C on channel 2; release the G
above middle C on channel 1."
One weakness of MIDI is that there are many musical situations that are
awkward to express when a note's address corresponds directly to its pitch.
For example, a note's pitch might change over time, or there might be two
notes played on the same instrument with the same pitch. Therefore, in MPDL,
notes have individual addresses that are unrelated to their pitch. An MPDL
note number is simply a number used to identify that note; any MPDL note
number can have any pitch.
MIDI's address space is organized as a two-level hierarchy---notes within
channels. It is sometimes useful, however, to control groups of groups of
notes rather than just groups of notes, so MPDL's note address space uses a
three-level hierarchy. Our names for the layers of the hierarchy fit an
orchestral metaphor; there are *notes* within *instruments*, and the
instruments are grouped into *families*. (One might ask why we do not have a
four-level hierarchy, or even a general n-level hierarchy. Such schemes
take up synthesizer resources, so we've tried to balance generality with
ease of implementation.)
Another weakness of MIDI is that each kind of message must be addressed
either to an entire channel or to a single note, not both. For example, it
is impossible to individually pitch-bend one of the notes of a MIDI channel;
any pitch bend information will affect all of the notes sounding on that
channel. Pan, volume, and any other "continuous controller" data must also
apply to an entire channel. Likewise, other MIDI messages, such as note-on,
are always per-note, so there's no way to articulate an entire chord with a
single message. In general, one would like to be able to send any control
signal either to a particular note or to a group of notes. Therefore, in
MPDL, any message can be addressed to any level of the hierarchy. For
example, a pitch message could go to a single note, while a note release
message could go to all of the notes of an instrument, and a loudness
message could go to an entire family.
There are 63 MPDL families, each of which have 127 instruments. Each of
these 8001 MPDL instruments has 127 notes, for a total of 1,016,127 MPDL
note addresses. This hierarchical organization is shown in Figure 2.
Families, instruments, and notes are numbered from one, not zero.
[Figure 2 would go here if this weren't the ASCII version]
There is also a way to send a message to all families. This is not a fourth
level of the hierarchy, but rather an abbreviation for sending the same
message to each of the 63 families; its effect is exactly the same as if one
had sent the message 63 times.
An instrument belongs to exactly one family and cannot change its family.
The orchestral metaphor is just a metaphor; the purpose of an MPDL address
is to uniquely specify the note or group of notes to which a message applies.
The device sending the note information must keep track of which notes are
sounding and which are not, so that it can update parameters of already
sounding notes. The device receiving the note information, of course, will
not be capable of 1016127 note polyphony, so it must manage the allocation
of the available synthesis resources. (It will also not be capable of
storing parameters for 1016127 different notes. It is expected that ZIPI
controllers will in practice only use a small subset of the address space;
algorithms can take advantage of that expectation to store and record
parameter values very quickly and without using very much memory.)
Each ZIPI device has its own address space; all of the families,
instruments, and notes just discussed are within a single ZIPI device. In
addition to the note address contained in the MPDL packet, ZIPI packets also
include a *network address* that indicates which device should receive the
MPDL message. Delivery of the MPDL packet to the appropriate device is
handled by the network levels below MPDL.
Controlling Musical Instruments
-------------------------------
Musicians will typically set up each MPDL instrument as 127 voices of a
particular timbre. (In a sense, an MPDL instrument configured this way is
like a MIDI channel, which is always associated with a single "patch" or
"preset.") Sending messages to this instrument will influence all the notes
played by the instrument, yet other messages can be addressed to individual
notes within the instrument. For example, one could "pitch bend" all of the
notes played by a given instrument by sending a pitch message to the
instrument, yet an individual note can be bent by sending a pitch message to
that note.
Furthermore, it sometimes makes sense to group collections of instruments to
be controlled together. To create a ZIPI orchestra, we might put all of the
strings in one family, the brass in another, and woodwinds and percussion in
two more. Inside the string family we would have instruments for first
violins, second violins, violas, cellos, and basses. Each string of each of
the violas would be a note. Table 1 shows how one might set this up.
Family 1: Strings
Instrument 1: First violins
Instrument 2: Second violins
Instrument 3: Violas
Note 1: C string
Note 2: G string
Note 3: D string
Note 4: A string
Instrument 4: 'Cellos
Instrument 5: Basses
Family 2: Brass
. . .
Family 3: Woodwinds
Instrument 1: Flutes
Instrument 2: Clarinets
Instrument 3: Oboes
Instrument 4: Bassoons
Family 4: Percussion
. . .
Table 1: The Addresss Space for a ZIPI Orchestra
Since commands can be issued to control each level of the hierarchy, this
setup gives the user a conductor's control over the orchestra. The string
family can be made louder and brighter. The woodwinds can all be panned
slightly to the left. Within the woodwind family, individual instruments can
be addressed; for example, the oboes can get quieter while the clarinets
over-blow.
Musical Control Parameters
==========================
After the note address has been chosen, a ZIPI packet may contain any number
of *note descriptors* intended for that address. A note descriptor gives a
new value for a parameter, e.g., "pitch is B flat 2" or "pan hard left." The
note descriptor consists of a note descriptor identifier (ID), which
indicates which parameter is being updated, and some number of data bytes,
which give the new value for that parameter.
Tables 2 through 6 list the currently defined note descriptors. We expect to
define a few more note descriptors before completing the specification of
the MPDL, but we will leave at least half of them explicitly undefined
(Zicarelli 1991). The interpretations of these messages are described in the
sections below. The last column of each table, the "combining rule,"
indicates the way that values of these parameters interact when sent to
different levels of the address hierarchy; this is described in the section
on "how the levels interact" below. Note descriptor ID zero is illegal.
#bytes ID(hex) Meaning Default Combining rule
------ ------- ------- ------- --------------
1 01 Articulation see below ``and''
2 40 Pitch Middle C add (see below)
4 80 Frequency in Hz Middle C overwrite
2 41 Loudness mezzo-forte multiply
2 42 Amplitude mid-scale multiply
1 02 Brightness mid-scale multiply
1 03 Even/Odd Harmonic Balance mid-scale multiply
1 04 Pitched/Unpitched Balance mid-scale multiply
1 05 Roughness mid-scale multiply
1 06 Attack character mid-scale multiply
1 07 Inharmonicity (signed) zero multiply
1 08 Pan Left/Right center multiply
1 09 Pan Up/Down center multiply
1 0A Pan Front/Back center multiply
2 43 Spatialization distance 10 meters multiply
1 0B Spatialization azimuth angle forward add
1 0C Spatialization elevation angle zero add
2 44 Multiple output levels mid-scale multiply
2 45 Program Change Immediately 0 (silence) overwrite
2 46 Program Change Future Notes 0 (silence) overwrite
1 0D Timbre space X dimension 0 add
1 0E Timbre space Y dimension 0 add
1 0F Timbre space Z dimension 0 add
Table 2: Synthesizer Control Parameters
#bytes ID(hex) Meaning Default Combining rule
------ ------- ------- ------- --------------
11 C0 Modulation info block none see below
3 81 Modulation rate zero see below
2 47 Modulation depth zero see below
n C1 Modulation table n/a n/a
n C2 Segment info block none see below
n C3 Segment table n/a n/a
Table 3: Higher-Order Messages
#bytes ID(hex) Meaning Default Combining rule
------ ------- ------- ------- --------------
1 10 Allocation priority zero multiply
3 82 New Address n/a n/a
n C4 Overwrite n/a n/a
n C5 Query n/a n/a
n C6 Query response n/a n/a
n C7 Text/Comment n/a n/a
Table 4: Housekeeping Messages
#bytes ID(hex) Meaning Default Combining rule
------ ------- ------- ------- --------------
4 83 Time tag n/a n/a
4 84 Desired minimum latency 0 n/a
Table 5: Time Tagging Messages
#bytes ID(hex) Meaning Default Combining rule
------ ------- ------- ------- --------------
1 11-1F Undefined 1-byte controllers undefined undefined
2 48-5F Undefined 2-byte controllers undefined undefined
3-4 85-9F Undefined 4-byte controllers undefined undefined
n C8-DF Undefined n-byte controllers undefined undefined
Table 6: Undefined MPDL Synthesizer Control Parameters
ZIPI instruments are not required to respond to every one of these messages,
although they are encouraged to respond to as many of them as possible. A
note descriptor's byte length is encoded as part of the ID number, so
receiving synthesizers can ignore note descriptors that they do not
implement, skip the correct number of bytes, and then examine the next note
descriptor in the packet.
Logically, all of the note descriptors in a single MPDL packet apply to the
same instant of time. This means the order of note descriptors within a MPDL
packet does not matter.
Articulation
If the note descriptor ID is *articulation*, the two high-order bits of the
data byte specify one of the three articulation types that are defined in
table 7.
High-order bits Articulation type
--------------- -----------------
11 Trigger
10 (not used)
01 Reconfirm
00 Release
Table 7: MPDL Articulation Types
*Trigger* messages start a note; the new note will have any parameters that
were set for that note before the trigger message was sent. Pitch and
loudness are not part of the trigger message, so before sending the trigger
message, or in the same MPDL packet as the trigger message, you should set
pitch and loudness to the desired levels. Remember that the order of note
descriptors within an MPDL packet does not matter, so pitch, loudness, and
trigger could come at any position in the packet.
The note then sounds until a *release* message is received. If a new trigger
message comes before the release message comes, the note re-attacks with no
release. This is useful for legato phrasing. Figure 3 shows what might
happen to the amplitude of a tone as a note receives two trigger messages
and then a release message.
[Figure 3 would go here if this weren't the ASCII version]
A note retains its parameters after a release message; receipt of a new
trigger message will articulate a new note with the same parameters as
before. (The section below on the "allocation priority" message describes
when a note loses its parameters.)
Keep in mind that the default timbre is silence, so unless you've sent a
program change message that affects a note, triggering it will have no
effect. The default pitch for a note, i.e., the pitch of a note that has
never had its pitch set, is middle C, and the default loudness is
*mezzo-forte*, so if you set the timbre of a note and trigger it, the
synthesizer will play a *mezzo-forte* middle C.
The *reconfirm* message is a reminder from the controller that it thinks the
note should still be sounding. It is not needed under most circumstances,
because notes that have been triggered but not yet released are assumed to
still be sounding. However, in cases of network failures, this message can
be used to reestablish the notes that should still sound. If a network
failure occurs, all controllers should reconfirm all sounding notes to the
synthesizers. Synthesizers that do not receive a reconfirmation within a
certain amount of time should shut off those notes, assuming that the
release message was lost.
These three messages can be understood in terms of the gate and trigger bits
that were used to articulate synthesizers in the "old days." The high-order
bit is like the trigger bit, and the next bit is like the gate bit. So
trigger means asserting both the gate and trigger bits, release means
de-asserting the gate bit, and reconfirm is like asserting the gate bit but
not the trigger bit.
The remaining 6 bits of the articulation data byte specify exactly what kind
of articulation occurs. In music, "articulation" can mean a lot more than
"on and off." There are a large number of instrument-specific articulation
styles, e.g., hammer-ons for guitar, lip slurs for brass instruments, and
heavily tongued attacks for reed instruments (Blatter 1980; Piston 1955).
The problem with encoding these articulation types is that they are
meaningful only in the context of certain instruments; it is difficult to
say how to implement a hammer-on on a clarinet. So we are working to define
abstract articulation categories, expressed in a way that does not refer to
a particular instrument. We hope that these will be in a future version of
the MPDL as the possible values for the remaining six bits.
For the release message, we've specified three possible behaviors, as shown
in Table 8.
Low-order bits Behavior
-------------- --------
000001 Release the note naturally
000010 Instantly silence the note
000011 Release the note naturally, unless it's still in the
attack portion of the tone, in which case complete
the attack portion and then release naturally
Table 8: Types of Release Messages
Controlling Pitch
In MIDI, pitch is specified by key number, which in practice almost always
maps to equal-tempered semitones. For finer resolution, e.g., to convey
vibrato or pitch bend, or to use alternate tunings, it is necessary to use a
pitch bend controller, which provides an additional seven or 14 bits of
precision. Attempts have been made to retrofit a better tuning system onto
MIDI (Scholz 1991), but nothing has been officially adopted (Rona 1991).
In MPDL, pitch data is given by a 16-bit log pitch word. The first seven
bits are nearest MIDI note number and the remaining nine bits are fractional
semitonal values in units of about 0.2 cents. The binary word *nnnn nnn*1
0000 0000 is equal to MIDI note number *nnn nnnn*. The word *nnnn nnn*0 00
00 0000 is MIDI note *nnnn nnn*, a quarter tone flat, and *nnnn nnn*1 1111
1111 is MIDI note *nnnn nnn*, a quarter tone sharp. (The rationale is that
MPDL to MIDI pitch conversion thus requires truncation instead of rounding.)
*Frequency in Hz* is an alternate way to specify pitch, as a 32-bit
fixed-point number. The first 16 bits are the number of Hertz, from 0 to
65535, and the last eight bits are the fractional part, giving a resolution
of better than 0.000016 Hertz. Receipt of a Hz-frequency message overwrites
the previous pitch message and vice versa; mixing the two kinds of messages
is discouraged.
Loudness
The *loudness* parameter corresponds to our subjective impression of
intensity. The units of loudness are musical dynamic markings; the
interpretations of loudness are given in Table 9.
MPDL loudness value (hex) musical dynamic
------------------------- ---------------
0000 pppp
1000 ppp
2000 pp
4000 p
6000 mp
8000 mf
A000 f
C000 ff
E000 fff
FFFF ffff
Table 9: Interpretations of MPDL Loudness Values
Loudness is influenced not only by amplitude but also by the temporal and
spectral characteristics of the sound (Moore 1989; Pierce 1992). Performing
musicians are usually quite skilled in trading off the various acoustic
parameters that contribute to the perception of loudness and are able, for
example, to adjust a bright oboe note to be same musical dynamic or loudness
as a mellow French horn tone. The key idea behind the MPDL loudness
parameter is that if one sends the same value to notes played on different
instruments they will sound at the same subjective level. In order for the
loudness parameter to function properly, the various instruments on a given
synthesizer must be carefully matched throughout their pitch and dynamic
ranges so that a given loudness value consistently produces the same musical
dynamic. Admittedly, as loudness is defined in terms of a subjective
impression, there will be some differences of opinion among different
listeners. What we are asking for is a good approximation to matched
loudnesses. Good voicing practice on MIDI synthesizers already points in
this general direction in that different voices are adjusted so that a given
MIDI velocity produces comparable loudness impressions.
Amplitude
*Amplitude* describes the overall gain of a sound. Increasing it is like
turning up the level on a mixing board. Changing a sound's amplitude does
not change its timbre.
We might describe here the result of changing the amplitude of instrument
timbres already adjusted for loudness. High amplitude with low loudness on a
piano sound would give the effect of a pianist playing very softly through a
loud amplification system. Conversely, low amplitude with high loudness
would sound like someone banging hard on a piano, but played very quietly
through speakers.
Even/Odd Harmonic Balance
Most pitched sounds can be thought of as a collection of harmonics or
overtones, which are sine waves spaced equally in frequency. The first
harmonic is defined to be at the fundamental frequency, the second harmonic
at twice the fundamental frequency, the third harmonic at three times, etc.
(Pierce 1992 contains a good explanation of this.) The *even/odd harmonic
balance* parameter is a measure of the overall amplitude of the first,
third, fifth, etc., harmonics versus the overall amplitude of the second,
fourth, sixth, etc. Listening to only the odd harmonics of a tone gives a
sound something like a square wave; listening to only the even harmonics
sounds somewhat similar to a note played an octave higher with the same
spectrum.
This may seem like a strange parameter, but it is actually quite meaningful.
Many acoustic musical instruments have different balances of odd and even
harmonics, and this balance can fluctuate dramatically over the time course
of the note. This even/odd balance and its variation over time have potent
effects on tone quality (Krimphoff, McAdams, and Winsberg 1994). For
example, the timbral difference that comes from picking a guitar at
different points on the string has a lot to do with this balance.
Most synthesis algorithms make it easy to manipulate this balance. In
physical modeling synthesis one can make models of open and closed tubes or
strings plucked or bowed at various critical points along the string. FM
synthesis provides a natural mechanism by mixing simple FM patches with
differing carrier-to-modulator ratios. Wave-shaping synthesis has its odd
and even distortion function components, and additive synthesis affords
direct control over the spectral content. Even subtractive synthesis with
poles and zeros allows for tight control over the even/odd balance (Smith
1993).
Pitched/Unpitched Balance
Many sounds can be thought of as containing a pitched portion and an
unpitched or noise portion (Serra and Smith 1990). The sound of a piano, for
example, consists of a "thud" made by the sound of the hammer hitting the
string, along with the pitched sound of the vibrating strings. The
*pitched/unpitched balance* parameter measures the relative volume of these
two portions of a tone.
Most pitched musical sounds are harmonic or nearly harmonic, that is, the
partial frequencies are nearly exact integer multiples of the fundamental.
For some instruments, however, these ratios are inexact. Pianos, bells,
tympani, and other instruments have partials whose frequencies are not
always integer multiples of the fundamental, and are sometime nowhere near
integer multiples.
Inharmonicity
The *inharmonicity* parameter describes the amount that partials deviate
from perfect harmonicity. Inharmonicity is signed; zero, the center value,
means "the usual inharmonicity of the sound." As there are many ways to
produce a deviation from the harmonic series, the interpretation of this
parameter may vary when it is non-zero. Some synthesizers might interpret
negative inharmonicity to mean "more perfectly harmonic than the original
sound" and positive inharmonicity mean more inharmonic. Others might
interpret negative inharmonicity as "squeeze," causing the partial
frequencies to be spaced closer together than usual, with positive
inharmonicity as "stretch," making the partial frequencies spaced further
apart than usual. Other synthesizers might take the absolute value of this
parameter, ignoring the sign.
Controlling a Note's Position in Space
A sound's position in 3-D space can be described in either rectangular or
polar form. For rectangular form, there are three *pan* variables:
left/right, up/down and front/back. A value of hexadecimal 00 means "panned
all the way to one direction," which would be left, up, or front.
Hexadecimal FF means "panned all the way to the other direction," and
hexadecimal 80 means "center." Synthesizers should implement equal energy
panning.
For polar coordinates, there's *spatialization*, which describes the
distance of the produced sound from the listener and the direction from
which the sound comes. Psychoacousticians have noted that human spatial
perception is described by separate mechanisms for angular orientation and
distance (Blauert 1982), so spatialization is perceptually more meaningful
than pan. The azimuth is the angle between the sound source and
"backwards," in a horizontal plane. Hexadecimal 0000 means "from behind";
4000 means "to the left"; 8000 is "directly ahead," and C000 is "to the
right." The elevation is the remaining dimension in polar coordinates, going
from hexadecimal 0000, meaning "down", to 8000 for "on the same level", to
FFFF for "up."
Finally, there's a way to directly control the amplitude that a note has out
of each of the outputs of a ZIPI timbre module. In the most general case,
the synthesizer has up to 256 separate outputs, and any note can be directed
to any output with any volume. A particular note might be coming mostly out
of outputs 5 and 8, but also a little bit out of 1, 2, and 4, for example.
The *multiple output levels* note descriptor lets you set these amplitudes.
It has two data bytes; the first selects one of the synthesizer's outputs
and the second sets a level for the given sound at that output.
ZIPI controllers should not mix these three types of positioning
information; they're meant to be three separate systems to specify the same
thing. (That is to say, the low-level effect of pan or spatialization
messages will be to control how much of each sound comes out of each of the
synthesizer's outputs.)
Program Change
*Program Change* determines which program or preset the synthesizer should
use as an instrument---trumpet, bag pipes, timpani, etc. Patch number zero,
the default, is defined to be silence. Presets 1 through 127 should follow
the General MIDI assignments (MMA 1991). Since the program is specified by
two bytes (giving 65536 possible values), the default set can be largely
expanded while keeping a large segment free for arbitrary use. This message
might cause a synthesizer to choose a set of sample files, load FM synthesis
parameters, or anything else.
It is expected that synthesizers will have some restrictions on the use of
this parameter. For example, since each active patch will probably take up a
certain amount of memory, processing power, or other resources, there might
be a maximum number of different patches that can be selected at once. Also,
it might take a certain amount of time to set up a new timbre, so this
message should be sent before the new timbre actually has to sound.
There are two reasonable behaviors for a program change message with regards
to the timbre of the notes sounding at the time the message is received. The
*program change immediately* message requests that all currently sounding
notes change their program to the new one. The *program change future notes*
message requests that currently sounding notes retain their programs, but
that newly articulated notes use the new program. For example, if note 3 of
some instrument is sounding a flute, and that instrument receives a program
change future notes message to trumpet, note 3 will continue to play a flute
until it is released. When note 3 receives another trigger message, it will
sound as a trumpet.
Controlling Timbre
Though timbre is a complex and subjective attribute of musical tones, there
are some aspects of timbre about which there is considerable agreement among
both musicians and psychoacousticians (Risset and Wessel 1994). We have
chosen to specify these more agreed upon aspects of timbre in MPDL. They are
brightness, roughness, and attack character.
The impression we have of a tone's *brightness* corresponds strongly to the
amount of high-frequency content in the spectrum of the sound. One good
measure of this is the "spectral centroid," which is the average frequency
of the components of a sound, weighted by amplitude. At a given pitch and
loudness an oboe sounds brighter than a French horn, and a look and the
spectrum of each tone shows the oboe to have more high frequency components
than the French horn. Computing the spectral centroid would show the oboe to
have a higher value than the French horn. Different synthesis algorithms
employ different procedures to manipulate a tone's brightness but most all
provide a rather direct path to control this feature. In FM increasing the
modulation index increases the high frequency content. Moving the cutoff
frequency of a low-pass filter upwards has a similar effect. With additive
synthesis, detailed control of the spectral envelope is provided by direct
specification the amplitudes of the partials.
*Roughness* has a direct intuitive meaning. Low values would correspond to
very smooth tones, while high values would be rough. It might, for example,
correspond to over-blow on a saxophone. A considerable body of
psychoacoustics research shows it to be related to amplitude fluctuations in
the tone's envelope. When the envelope of a tone fluctuates at a rate of 25
to 75 Hz and when the depth of this amplitude fluctuation approaches 10% of
the overall amplitude the sound quality becomes very rough. Beats among
partials of a complex tones can produce such fluctuations and give a
roughness to the sound. As with other timbral parameters, there are a
variety of ways to implement roughness control in different synthesis
algorithms.
*Attack Character* describes, intuitively, how strong of an attack a note
should have. As a first approximation, it might correspond to the attack
rate in a traditional attack-decay-sustain-release envelope. It might also
correspond to a louder maximum volume value during the attack, a noisier
attack, a brighter attack, etc. The value for this parameter is "sampled" at
a note's trigger time---whatever value this parameter has when a note is
triggered specifies the attack character for the note. Changing attack
character in the middle of a note does not require the synthesizer to change
anything about that note.
Moving in a Timbre Space
A timbre space (Wessel 1985) is a geometric model wherein different sound
qualities or timbres are represented as points. Similar sounding timbres are
proximate and dissimilar ones distant from each other. A timbre space is a
fairly general model for representing the important perceptual relationships
among different timbres and provides an intuitive control scheme based on
interpolation. Timbral control is exercised by making trajectories in the
space.
MPDL provides for up to three-dimensional timbre spaces. For higher-
dimensioned timbre spaces, it is always possible to use MPDL's undefined
note descriptors.
The timbre space *x*, *y*, and *z* coordinate controls are like other
continuous controller note descriptors. The contents of the space and the
scheme for interpolation are part of the patch that is selected by the
"program change" note descriptor.
Higher-order Messages
---------------------
MPDL provides for modulation messages. Vibrato is an example of pitch
modulation, where the pitch of a note varies around the central pitch of the
note. You can think of the vibrato as a function of time (e.g., a triangle
or sine wave) modifying a default value, i.e., the underlying pitch that the
vibrato is around. In the MPDL, modulation means that an additive offset or
a multiplicative scale factor can be applied to a parameter value as a
function of time. Any parameter, not just pitch, can vary.
It would be possible to modulate any parameter explicitly, by sending a
stream of explicit parameter values. For example, a good ZIPI violin would
have fine-grained enough pitch detection to notice the vibrato played by the
musician, and send it to a ZIPI synthesizer as a series of very accurate
pitches all close to a central value.
On the other hand, consider a computer program playing a symphony on a
collection of ZIPI timbre modules. Individually specifying the vibrato of
each stringed instrument via frequent updates of the pitch parameter would
be impractical in terms of the amount of data transmitted. Instead, the MPDL
has a way to specify a table or function to give values for a parameter over
time. A single message says, for example, "start a triangle-wave shaped
modulation of the pitch parameter with depth ten cents and frequency six
Hz." After that message is sent, the receiving device computes the
subsequent values of that parameter with no additional ZIPI messages
required. Furthermore, as the depth and frequency of a modulation are MPDL
parameters like any other, they can be updated by a stream of control values.
Segments provide a similar high-level control, but have semantics more like
that of decrescendo. With segments, one can say, for example, "start an
exponential decay of loudness that will go to pianissimo in 1.6 sec." Here,
what is specified is a parameter (loudness), a target value (pianissimo), a
time to reach that target (1.6 sec), and a shape for the function to use on
the way to that value (exponential).
Modulation is for signed indefinite repeating functions moving back and
forth around a center value; the possible modulation functions are given in
Table 10. Segments are for functions with a "goal"; they are used to get
from one parameter value to another in a set amount of time. The possible
segment functions are given in Table 11.
Number Function
------ --------
0 No modulation (i.e., f(t) = constant)
1 sine wave
2 square wave
3 saw-tooth wave
4 triangle wave
5 random
6-255 User-defined tables
Table 10: MPDL Modulation Functions
Number Function
------ --------
0 Linear
1 Exponential concave
2 Exponential convex
3 S-shaped (e.g., sigmoid or logistic)
4-255 User-defined tables
Table 11: MPDL Segment Functions
Synthesizer manufacturers may implement these functions in a variety of
ways, as long as the functions behave as specified. (Except for "random,"
these functions could be computed by table lookup from a 256 by 1-byte
table, sample values for which will be made available.)
There should also be user-settable tables that can be loaded with any values
over the network. Tables can contain 1, 2, 3, or 4 byte numbers, and must
have a size that is a power of two. Tables should be able to be at least 256
by 1-byte, but synthesizer manufacturers are encouraged to provide larger
ones. Synthesizers are encouraged to have as many user-settable tables as
possible. Synthesizers should interpolate the values in these tables when
necessary.
To modulate a note descriptor, you must specify six things. The MPDL's
*modulation info block* message consists of the values for these six
parameters, all in a single message. The format of this message is shown in
Table 12.
Byte # Contents
------ --------
1 Note descriptor ID of parameter being modulated
2 Which function (from table 10)
3+4 Modulation rate, from -255 to 255 Hz, with .008 Hz resolution
5+6 Modulation depth
7+8 Loop begin point
9+10 Loop end point
Table 12: The Format of the Modulation Info Block Message
When the modulation rate is negative, it means that the synthesizer is to
read backwards through the table, or the equivalent if the function is
implemented in a manner other than a table. This is useful for changing the
shape of a saw-tooth wave, for example.
The loop begin and end points specify which portion of the table is read
through during the course of modulation. Typically they will specify the
entire table, but they can be used, for example, to alter the duty cycle of
a square wave.
Instead of sending a new modulation info block, you can send a *modulation
rate* or *modulation depth* message to specify the rate or depth of the
modulation for a particular note descriptor without updating any of the
other parameters.
To send your own modulation table as an MPDL message, use a *modulation
table* message. The first data byte is the number of the table you are
setting, which should not conflict with any of the pre-defined functions.
The second byte gives the size of the values in the table, in bytes, e.g.,
1, 2, or 4. The remaining data bytes are the actual contents of the table.
(Since MPDL note descriptors include their own lengths, the size of a
transmitted table is unambiguous.) It does not matter what note address a
modulation table message is sent to.
Function zero, "no modulation," stops modulation of a given parameter,
freeing all resources used for the modulation. ZIPI controllers should
explicitly turn off modulation of a parameter rather than just setting the
depth to zero, so that the synthesizer will be able to free these resources.
Modulation scales whatever other changes happen to a particular parameter,
using the same combining rule as with messages sent to various levels of the
address space hierarchy (see below). Imposing sinusoidal vibrato on a patch
with built-in vibrato will simply combine through. Therefore, users who want
to explicitly modulate pitch should probably turn off their synthesizer's
built-in vibrato. (Of course, if the vibrato is part of a sample, this is
not so easy.)
The *segment info block* message gives all the parameters necessary to apply
a segment to a parameter, as shown in Table 13.
Byte # Contents
------ --------
1 Note descriptor ID of parameter being manipulated
2 Which function (from table 11)
3 Segment chaining byte
4+5 Time to reach desired value, in msec
6+7 Function begin point
8+9 Function end point
10-n Target value
Table 13: The Format of the Segment Info Block Message
The starting value for the segment is not specified. Whatever value the
given parameter currently has is the start point. This makes it much easier
to chain segments together. If you would like the value of the given
parameter to jump to a new value, then go gradually to a second value, just
precede the segment info block message with an explicit value for that
parameter.
The function begin and end points are like the loop begin and end points for
modulation---they specify a portion of the function to use. The target value
must be a legal value for the given note descriptor; the byte length depends
on the note descriptor.
The segment chaining byte indicates how the given segment fits in with other
segments. If the value is zero, it means that this segment should override
any other segments that are affecting the given parameter for the given
address, "taking over" control of that parameter. If the value is one, it
means that the synthesizer should put this segment at the end of a queue of
segments for this parameter, to take effect after the current segment
finishes. This allows complex envelopes to be built out of these segments.
The *segment table* allows you to specify your own table for use by a
segment. Its syntax is exactly like that of the modulation table message.
The segment mechanism is exactly equivalent to sending a stream of values
for a parameter, so the old value of that parameter is overwritten with
values produced by the segment, and when the segment is done, the last value
produced by the segment is the current value of that parameter. What happens
when an explicit value is set for the parameter while a segment is still
running? That segment (and all segments in the queue) terminate, and the
parameter gets the explicitly set value. In other words, sending a
parameter change message is equivalent to killing a segment.
Housekeeping
------------
*Allocation priority* describes the importance of a note, in case a
synthesizer runs out of resources and has to choose a note to turn off. As
an example, important melodic notes might have a higher priority than
sustaining inner voices in thick chords. In MIDI, there's no way to tell the
synthesizer which notes are important, so MIDI synthesizers typically just
turn off the oldest sounding note, which is not always desired (Loy 1985).
Zero is the default priority for a note, but sending any message to a note
with zero priority automatically increases that note's priority to one.
Setting a note's priority to zero means "reset." If the note is sounding,
receipt of a zero priority silences it immediately. It also resets all of
the parameters associated with the note, as if no message had ever been sent
to the note. This allows a synthesizer to de-allocate the memory used to
remember parameter values that had been sent to that note. Therefore,
devices that send MPDL information, e.g., controllers and sequencers, should
send zero values for allocation priority when they determine that a note's
parameters will no longer need to be stored.
Setting an instrument's priority to zero resets the entire instrument; all
the notes inside the instrument shut off, all the parameters associated with
the instrument are reset, and all the parameters associated with all the
notes inside the instrument are reset. Likewise, allocation priority of zero
for a family shuts off all notes in all instruments of the family, resets
all of the parameters associated with that family, and resets all parameters
of all instruments and notes inside the family.
*New Address* is not a message addressed to a note; it is just a way to use
ZIPI bandwidth more efficiently. Imagine that you would like to update
certain timbral parameters for a whole group of notes. For example, a ZIPI
guitar might provide continuous information about the pitch, loudness, and
brightness of each of the six strings. One way to do this would be to send
six ZIPI packets, one for each of the strings. But this is wasteful of
network bandwidth, because each separate ZIPI packet has seven bytes of
overhead.
Instead, it would be better to use the new address message. The MPDL portion
of a ZIPI message must start with an address to which further note
descriptors apply. However, if one of the note descriptors is "new address,"
it specifies a different address to which subsequent note descriptors apply.
For example, for the packet shown in Figure 4, the pitch and pan messages
are for note 1 of instrument 3 of family 1, while the amplitude message is
for note 2 of instrument 2 of family 1 and the brightness message is for
instrument 1 of family 1.
[Figure 4 would go here if this weren't the ASCII version]
*Overwrite* has to do with the interaction of the three levels of the
address hierarchy. See the section below, "how the levels interact," for an
explanation of this message.
Querying a Synthesizer
----------------------
Since ZIPI communication can always be two-way, there is a mechanism for
asking questions of a synthesizer. The *query* message asks some question;
it is a request for the synthesizer to respond with a *query response*
message answering the question.
The first data byte of the query message is the question ID, as given in
Table 14. The remaining data bytes qualify the question, e.g., for question
4 ("Do you respond to the given note descriptor?") there is one additional
data byte, a note descriptor ID. (We also rely on the data link layer to
include the network address of the querying device, as the return address
for the response.)
ID Meaning
-- -------
1 What's the value of this MPDL parameter at the given level of
the address space?
2 What's the combined value of this MPDL parameter for the given
note?
3 Please send me a menu of all patch names.
4 Do you respond to the given note descriptor?
5 How many voices of polyphony do you have?
6 How many voices of polyphony do you have left?
... Undefined
255 (Indicates that the next two bytes specify the question ID.)
Table 14: MPDL Query Question IDs
The first byte of the query response message is the ID of the question being
answered, and the next bytes are the qualifying data that was asked in the
question. The remaining bytes are the actual response to the question.
For example, a sequencer program might ask a synthesizer whether note two of
instrument one of family one is sounding. It would select the given address
as the address of an MPDL packet, then include a query note descriptor. The
first data byte would be two ("What's the combined value of this MPDL
parameter for the given note?"), and the second data byte would be one, the
note descriptor ID for articulation.
The synthesizer's response would first select note two of instrument one of
family one as the address, then include a query response message. The data
bytes of the query response message would be 2, the question ID, one, the
note descriptor ID for articulation, and then something like 11000000 (an
articulation value for "trigger") or 00000000 (an articulation value for
"release").
Comments
The comment note descriptor's data bytes are ASCII characters; they have no
meaning to a synthesizer. In recorded files of MPDL control information (see
Appendix B), one might want to add comments to certain note descriptors,
e.g., "Start of second movement." In order for these comments to be
incorporated consistently with the other messages, they're part of the MPDL
itself. (This implies that when an MPDL file is transmitted via ZIPI, the
comments will remain intact.)
Time Tags
No network can provide instantaneous transmission of information. In ZIPI,
network latency (the amount of time it takes a packet to be delivered) will
be very small under normal conditions, usually in the range of 0.5 to 5
msec. Variability in network latency, sometimes called "jitter," can be
worse than the latency itself if music is involved, however. It is less
annoying to play a synthesizer that delays every attack by exactly 10 msec
than one that delays every attack by a random amount of time between 3 and 9
msec (Moore 1988.)
Therefore, the MPDL has a way to impose a fixed minimum delay on each
packet. If it arrives earlier than expected, the receiving synthesizer can
wait a short amount of time before carrying out the packet. (Anderson and
Kuivila [1986] discuss this principle, but in the context of algorithmic
composition rather than networking.)
To accomplish this, there's a way to put a time tag in each MPDL packet. The
*time tag* note descriptor indicates the exact time that the packet applies
to, according to the sender's clock. The *desired minimum latency* note
descriptor tells a ZIPI device what minimum latency to impose on all
time-tagged network packets.
What does the receiving device do with this information? For now, let's
assume that the receiving device has its own clock, which has been
synchronized very closely to the sender's clock. (ZIPI's data link layer can
do this synchronization.) The receiving device can compare an incoming
message's time stamp to the current value of its own clock to see how much
delay has occurred already. If this is less than the requested minimum
latency, the receiving device simply buffers the message for the appropriate
amount of time until the minimum latency has been reached. (Dannenberg
[1989] gives a variety of efficient algorithms for this.) If the delay
already incurred is greater than the requested minimum, the message is
already too late, so the receiving device should deal with it immediately.
What if the two clocks are not synchronized? For example, if the MPDL runs
over FDDI or Ethernet, no system-wide clock synchronization will be
provided. In that case, there are still algorithms to impose a minimum
latency on network packets, although they are more complicated and slightly
less effective.
These time tags are useful in other situations besides real-time control of
a synthesizer. For example, take the case of recording into a ZIPI
sequencer. MIDI sequencers must record the time that each message was
received, so as to store timing information in the recorded file. The time
that the sequencer receives the information is the time that it was played,
plus some network delay. In ZIPI, if a controller time tags outgoing
messages, the network delay will have no effect on the recorded sequence.
Sequencer playback benefits from time tags also. If the sequencer program
uses the time stamps stored in the file and requests a sufficiently large
minimum delay, all of the delays incurred by the sequencer, including disk
latency, processing time, and network delay, can be eliminated as well.
As another example, imagine the data from some ZIPI controller being fed
into a computer, which applies some transformation to the data and then
gives it to a synthesizer. The computer program does not have to wait for a
fixed latency before processing each controller message; it can start
manipulating the data as soon as it arrives, but if it preserves the time
stamps produced by the controller, rather than producing new ones, the
jitter introduced by the transformation process can be eliminated at the
synthesis end.
Time stamps are 4 bytes, using 50 msec units, giving a range of 2.5 days
expressible in an MPDL time stamp. The desired minimum latency note
descriptor has the same data format.
Controller Measurements
=======================
Real-time control of an electronic musical instrument involves three stages:
measuring the musician's gestures---which key was struck, how much air
pressure was there, where on the fingerboard were the violinist's fingers,
etc.; deciding how these gestures will translate into the electronic sounds
produced; and synthesizing a sound
Figure 5 demonstrates these stages. In the MPDL we draw a distinction
between the first arrow, "measurements of musician's gestures," and the
second arrow, "synthesizer control signals." All of the parameters listed in
Table 2 above are in the second category; they are descriptions of sound
that tell a synthesizer what to do.
[Figure 5 would go here if this weren't the ASCII version]
Typically, ZIPI controllers will provide both the measurements of the
gestures and a way to map those gestures onto parameters required to produce
a sound. For example, a ZIPI violin might measure the bow's distance from
the bridge and use it to determine brightness. That would divide the above
picture according to Figure 6. In this setup, the ZIPI instrument sends
synthesizer control parameters, the ones described above.
[Figure 6 would go here if this weren't the ASCII version]
People will not always want to use the mapping capabilities provided by
their controller, however. For example, some people will want to write their
own computer programs, e.g., in the Max language (Puckette 1991), to
determine complex mappings. One might want to control the loudnesses of four
families by finger position on the neck of a ZIPI violin. To support
user-defined mappings, we recommend that ZIPI controllers be able to send
their raw physical measurements directly, without mapping them onto
synthesizer control parameters. In other words, it should be possible to
turn off the software in the controller that is mapping the physical
gestures into control information, sending the measurements of those
gestures as uninterpreted data. That would divide the picture as shown in
Figure 7. ZIPI controllers' user interfaces should provide a way to switch
between these two modes; sometimes the controller should do the mapping
itself and sometime the controller should send out the "raw" data.
[Figure 7 would go here if this weren't the ASCII version]
Controller measurements are just another kind of note descriptor, listed in
Table 15. Note that these measurements take up the higher ID numbers, while
synthesizer control parameters take up lower ID numbers and the middle
numbers are undefined. (As with the note descriptors for synthesizer
control, we expect to define a few more and leave most of them free for
future specification.)
#bytes ID(hex) Meaning
------ ------- -------
1 3f Key Velocity
1 3e Key Number
2 7f Key Pressure
2 7e Pitch Bend Wheel
2 7d Mod Wheel 1
2 7c Mod Wheel 2
2 7b Mod Wheel 3
1 3d Switch pedal 1 (Sustain)
1 3c Switch pedal 2 (Soft pedal)
1 3b Switch pedal 3
1 3a Switch pedal 4
2 7a Continuous pedal 1 (Volume)
2 79 Continuous pedal 2
2 78 Continuous pedal 3
2 77 Continuous pedal 4
1 39 Pick/Bow Velocity (signed)
1 38 Pick Pressure
1 37 Pick/bow position
2 76 Fret/fingerboard position
1 36 Fret/fingerboard pressure
1 35 Wind flow or pressure (breath controller)
1 34 embouchure (bite)
2 75 Wind controller keypads
1 33 Lip pressure
2 74 Lip frequency (buzz frequency for brass)
1 32 Drum head striking point X position (rectangular coordinates)
1 31 Drum head striking point Y position (rectangular coordinates)
1 30 Drum head striking point distance from center (polar form)
1 2f Drum head striking point angle from center (polar form)
2 73 X position in space
2 72 Y position in space
2 71 Z position in space
2 70 Velocity in X dimension
2 6f Velocity in Y dimension
2 6e Velocity in Z dimension
2 6d Acceleration in X dimension
2 6c Acceleration in Y dimension
2 6b Acceleration in Z dimension
Table 15: Controller Measurement Parameters
Note that not all ZIPI controllers will work by physically measuring a
musician's gestures. Another class of ZIPI controllers consists of acoustic
instruments whose sound output is measured and analyzed by a computer and
converted into control information. In this kind of instrument, for example,
a real-time pitch tracker would examine the sound produced by a flute and
convert it to MPDL pitch messages. Digital signal analysis could be used to
compute the spectral centroid of the flute's sound, which would produce MPDL
brightness messages. In this case the measurements of the musician's
gestures are done with the same musical control parameters that the mapping
function produces to control a synthesizer. (But in this case, the identity
function is a perfectly good mapping function; it will cause the synthesizer
to mimic timbrally what the musician is playing, which will probably be the
most commonly desired situation.)
How the Levels Interact
=======================
What happens if you send an amplitude of one to a note, then an amplitude of
ten to the instrument containing that note, then an amplitude of 100 to the
family containing that note? What is the actual amplitude of the sound
produced?
There are four different ways to combine parameter values passed to
different levels of the hierarchy. Each parameter uses one of these four
rules. They are: "and," "multiply," "add," and "overwrite." The "combine"
column in the tables of note descriptors (Tables 2 through 6) tells which of
these four rules each parameter uses.
Only articulation uses the "and" rule, which is described in the next
section.
Most parameters use the "multiply" rule, meaning that each level of the
hierarchy (notes, instruments, and families) stores its most recent value
for the parameter, and the actual value that comes out is the product of
these three numbers.
Amplitude is an example of a parameter with this rule. If two notes of an
instrument have amplitudes of 20 and 10, they will have a relative amplitude
ratio of 2:1 no matter how high or low the instrument's amplitude gets.
Note that what are being multiplied are offsets from a base value, and that
the "base value" depends on the particular patch being played on the
synthesizer. A flugelhorn will naturally be less bright than an oboe, so the
mid-scale brightness value for a flugelhorn will produce a much less bright
sound than the mid-scale brightness value for an oboe.
The "add" rule is just like the "multiply" rule, except that the three
values for the parameter are added together instead of multiplied together.
Coordinates in a timbre space are combined in the MPDL with this rule.
The combining rule for pitch is a special case of the "add" rule; pitch is
taken as an offset from middle C, and the offsets accumulate additively. If
a family receives a pitch message of hexadecimal 7F00, which would be middle
C#, the effect will be to transpose everything played by that family up a
half step.
The last rule is "overwrite." For these parameters, the instrument and
family do not store their own values for the parameter. Instead, a message
sent to an instrument or family overwrites the values of that parameter for
each active note of the instrument or family. Program change is a parameter
with this rule. Since there is no easy way to combine three synthesizer
programs into a single patch, a program change message received by a family
sets the program of all the notes of that family.
The "And" Rule for Articulation
The name "and" comes from Boolean logic. In this context, it means that a
note only sounds if it has been triggered at the note level, the instrument
level, and the family level. By default, everything is triggered at the
family and instrument levels, so sending a trigger message to any note turns
on that note. That is the normal case that people will use most of the time.
It is possible to turn off all the notes in a family just by sending a
release message to a family. If this happens, the previously sounding notes
still remember that they are turned on at the note and instrument levels, so
if you re-trigger the family, those notes will sound again.
Here is an example of how to take advantage of this. First, send a release
message to an instrument, preventing any notes from sounding on that
instrument. Then, send pitch and trigger messages to a group of notes in
that instrument, to form a chord. Those notes do not sound yet, because the
instrument is switched off. Finally, you can send a trigger message to the
instrument, which will trigger all of the notes of that instrument, playing
the chord you set up earlier. Now you can turn the chord on and off by
sending articulation messages to the instrument. If you want to add or
delete notes from the chord, send articulation messages at the note level.
Sending a Message to All Families
Sending a message to the "all families" address is an abbreviation for
sending the message repeatedly to families 1 through 63. It is not a fourth
level of the hierarchy in the sense of storing yet another parameter value
that must be added or multiplied through. Instead, it changes the values
stored for each of the 63 families.
Overwriting a Large Group of Values
Usually the multiply or add rule does what one would want; it makes sense to
have the oboes louder than the flutes by the same relative amount no matter
how quiet or loud the wind section gets. Occasionally, however, the
overwrite rule is desired even for parameters that typically use the
multiply or add rules.
For example, suppose all of the instruments in a family are playing
different pitches, but now you want them to play in unison. If you send a
pitch message to the family, it will transpose all of the instruments of the
family, leaving their relative pitches the same. Instead, you want a way to
say "individually set the pitch of each note of instrument in this family to
440 Hz."
The "overwrite" note descriptor handles cases like these. Its data bytes
consist of a note descriptor ID, which specifies a parameter to be
overwritten, and some data for that parameter, which specifies a new value
for it. If you send overwrite to a family, it sets the values for every
instrument in that family, throwing away each instrument's old value for the
parameter. If you send an overwrite message to an instrument, it sets the
values for each note in that instrument. If you send an overwrite message to
a family, and as the first data byte, repeat the ID for overwrite, the next
data byte gives a parameter that should be reset for every note of every
instrument of the family.
Not In This Layer
=================
It is important to mention some of the information that will be present in
ZIPI's lower-level network layers, and in application layers other than the
MPDL.
There will be separate application layers for sample dumps, patch dumps, and
raw binary data, which can be used analogously to MIDI's "system exclusive"
messages, sending data that does not fit the MPDL.
ZIPI's data link layer will provide a way for all of the devices on a ZIPI
network to synchronize their clocks to within 50 msec, to give a common time
base to the time stamp messages described above. Strictly, system-wide clock
synchronization is not required to benefit from time-tagged data. There are
algorithms to reduce network delay jitter even if the sending and receiving
devices have different time bases, but implementations of the MPDL over the
ZIPI data link layer will have the advantage of synchronized system clocks.
There is a way, via the data link layer, to request that certain packets be
confirmed upon receipt, to ensure that they arrive intact. Any packet sent
by the MPDL layer can request this confirmation from the data link layer, so
highly-critical messages such as "all notes off" could be guaranteed to
arrive.
A lower network layer provides a way for ZIPI devices to identify their
characteristics to other devices on the network, to query devices about
their characteristics, and to look for devices with certain characteristics.
These characteristics include instrument name, manufacturer, possible ZIPI
speeds, etc.
ZIPI's data link layer provides a way to send a packet to all devices on the
network, and also a way for a device to listen to all packets on the
network, regardless of the device for which the packet is intended. Both of
these features are available for MPDL data.
A separate application layer for machine control will handle issues of
synchronization (complying with SMPTE and other standards) and sequencer
control.
There will be an application layer for MIDI messages, carried over a ZIPI
network.
There will be an application layer for error messages. ZIPI devices with
limited user-interfaces can send ASCII encoded error messages, which will be
picked up and displayed to the user by another device, e.g., a computer.
References
==========
Anderson, D., and R. Kuivila. 1986. "Accurately Timed Generation of Discrete
Musical Events." *Computer Music Journal* 10(3): 48-56.
Blatter, A. 1980. *Instrumentation/Orchestration*. New York: Schirmer Books.
Blauert, J. 1982. *Spatial Hearing. *Trans. John S. Allen. Cambridge, MA:
MIT Press.
Dannenberg, R. 1989. "Real-Time Scheduling and Computer Accompaniment." In
M. V. Mathews, and J. R. Pierce, eds. 1989. *Current Directions in Computer
Music Research.* Cambridge, Massachusetts: MIT Press, pp. 225-261.
Krimphoff, J., S. McAdams, and S. Winsberg. 1994. "Caract risation du
timbre des sons complexes. II: Analyses acoustiques et quantification
psychophysique." Proceedings of the Third French Congress of Acoustics,
Toulouse.
Loy, D. G. 1985. "Musicians Make a Standard: The MIDI Phenomenon." *Computer
Music Journal* 9(4): 8-26.
Moore, B. 1989. *Introduction to the Psychology of Hearing,* 3rd edition.
London: Academic Press.
Moore, F. R. 1988. "The Dysfunctions of MIDI." *Computer Music Journal*
12(1): 19-28.
MIDI Manufacturers Association (MMA). 1991. *General MIDI System - Level 1.*
Los Angeles, California: International MIDI Association (IMA).
Pierce, J. R. 1992. *The Science of Musical Sound,* Revised edition. New
York: W. H. Freeman and Company.
Piston, W. 1955. *Orchestration*. New York: W. W. Norton and Co.
Puckette, M. 1991. "Combining Event and Signal Processing in the MAX
Graphical Programming Environment." *Computer Music Journal* 15(3): 58-67.
Risset, J. C. and D. Wessel. 1994. "Analysis-Synthesis Methods for Sound
Synthesis and the Study of Timbre." In D. Deutsch, ed. 1994. *The Psychology
of Music*, 2nd Edition. London: Academic Press.
Rona, J. 1991. "Proposed MIDI Extension" (letter). *Computer Music Journal*
15(3): 15.
Scholz, C. 1991. "A proposed extension to the MIDI specification concerning
tuning." *Computer Music Journal* 15(1): 49-54.
Serra, X., and J. Smith. 1990. "Spectral Modeling Synthesis: a Sound
Analysis Synthesis System Based on a Deterministic Plus Stochastic
Decomposition." *Computer Music Journal* 14(4): 12-24.
Smith, R. 1993. *Morpheus User's Manual*. Scotts Valley, California: E-Mu
Systems.
Wessel, D. L. 1985. "Timbre space as a musical control structure." In C.
Roads, and J. Strawn, eds. 1985. *Foundations of Computer Music*. Cambridge,
Massachusetts: MIT Press, pp. 640-657.
Zicarelli, D. 1991. "Communicating with Meaningless Numbers." *Computer
Music Journal* 15(4): 74-77.
Appendix A: ZIPI MPDL Byte format
=================================
In ZIPI's current low-level protocol, there are seven bytes of overhead in
each ZIPI packet that are not part of the application layer. (The first
three bytes say "this is a new ZIPI packet", the fourth byte is the network
address of the device that the packet is intended for, and the fifth byte
says "this packet contains application layer information." At the end of the
packet are two more bytes for the CRC error detection checksum.) It is
important that one of these bytes selects a particular ZIPI device by
number; therefore, a ZIPI device is not interrupted by packets that are
intended for other devices.
The first byte of the application layer data indicates the application layer
to which the given packet applies. If the first four bits are 0000 (i.e., if
the number represented by the byte is less than 16), it means that the
packet is for the Music Parameter Description Language, regardless of the
value of the next four bits. The other 240 possible one byte values indicate
different application layers.
The remainder of the first byte, plus the next two bytes, make up a 20-bit
note address. The format of these 20 bits is described below.
After the note address is selected, a packet consists of any number of note
descriptors. A note descriptor consists of a one-byte ID (as given in Tables
2 through 6 and 15), and some number of data bytes. (This is similar to a
MIDI message; what they call "status byte" we call "ID") See Figure 8 for an
illustration of this.
[Figure 8 would go here if this weren't the ASCII version]
A packet can contain multiple note descriptors to cut down on network
overhead; multiple parameters can be updated in a single ZIPI MPDL packet
while only incurring the seven-byte overhead once. Furthermore, with the
"new address" message described above, a single ZIPI MPDL packet can contain
note descriptors for an arbitrary number of different addresses. Figure 9
shows the structure of a ZIPI MPDL packet, including the seven overhead
bytes required for the lower network layers.
[Figure 9 would go here if this weren't the ASCII version]
Byte Format of ZIPI MPDL Addresses
----------------------------------
The 20-bit address is interpreted as a 6-bit family, a 7-bit instrument, and
a 7-bit note, as shown in Figure 10.
[Figure 10 would go here if this weren't the ASCII version]
For example, the binary address 000111 0011001 0010010 means that the
information in the packet is addressed to note 18 (binary 10010) of
instrument 25 (binary 11001) of family 7 (binary 111).
Note 0 of any instrument means "the entire instrument," so the address
000111 0011001 0000000 means "instrument 25 of family 7." Likewise,
instrument 0 of any family means "the entire family." So the address 000111
0000000 0000000 means "family 7." (If the instrument bits are zero, it does
not matter what the note bits are; any address whose first 13 bits are
000111 0000000 means family 7.) Finally, for messages that affect the entire
address space, family zero means "all families." As mentioned above, this is
just an abbreviation for a message sent to each of the 63 families. If the
first six bits of an address are zero, it does not matter what the other 14
bits are.
The new address message has three data bytes, but MPDL addresses are only 20
bits. So the high-order four bits of the first byte must be 0000, to be
consistent with the 0000 at the beginning of MPDL portion of a ZIPI packet.
Note Descriptor Length
----------------------
The high-order two bits of the note descriptor ID say how many data bytes
the note descriptor has, according to Table 16.
High-order bits of ID Length
--------------------- ------
00 1 byte
01 2 bytes
10 4 bytes
11 other
Table 16: Number of Data Bytes of Note Descriptors, Based on ID
Thus, there are 64 note descriptors that have one data byte, 64 note
descriptors that have two data bytes, etc. Note descriptors that only
require three data bytes, e.g., new address, have IDs that begin with binary
10. These note descriptors actually have four data bytes; the fourth is
simply ignored. (So the 4 bytes for new address are 0000, followed by the
20-bit address, followed by 8 more bits to be ignored.)
When the high-order bits are binary 11, i.e., "other," the message has more
than three data bytes. In this case, the two bytes after the note descriptor
ID are not data bytes; instead they form a 16-bit unsigned integer that
tells the number of data bytes. Figure 11 shows this pictorially. Figure 12
shows an example---since the note descriptor begins with "11", the length is
given by the next two bytes. The second and third bytes form the number 5,
meaning that there are 5 data bytes, for a total of 8 bytes altogether in
the note descriptor. The four data bytes are decimal 1, 2, 3, 4, and 5.
[Figure 11 would go here if this weren't the ASCII version]
[Figure 12 would go here if this weren't the ASCII version]
Complete example
----------------
Figure 13 shows every single byte of the MPDL portion of a typical ZIPI
packet. This packet corresponds to playing a close voiced root position C
major triad on a keyboard, with all three notes being sent simultaneously.
[Figure 13 would go here if this weren't the ASCII version]
Appendix B: ZIPI MPDL file format
=================================
Files containing ZIPI MPDL data are logically sequences of MPDL frames. The
file must also include a time stamp for each frame; these time stamps have
the same format as MPDL time stamps (an unsigned four byte integer, in units
of 50 msec). The file must also include the number of bytes of each frame.
(Remember that there can be multiple note descriptors in a single MPDL
packet; we know the length of the MPDL data only because the lower network
levels know when the entire packet ends.) This count is an unsigned two byte
integer.
A ZIPI MPDL file thus consists of an arbitrary number of repetitions of:
- a 4-byte time tag;
- a 2-byte count; and
- an MPDL packet consisting of the given number of data bytes
Figure 14 demonstrates this format pictorially.
[Figure 14 would go here if this weren't the ASCII version]
We have intentionally used the same format for MPDL files' time tags as for
the MPDL time tag note descriptor; this makes it trivial to convert
sequences of time-tagged MPDL packets into MPDL files. In this case, there's
no reason to store the time tag note descriptor in the file, since that
information would be redundant. When storing non-time-tagged MPDL data into
a file, the process creating the file will have to supply its own time
stamps.