[To be published in Computer Music Journal 18:4 (Winter 94)]
|
|
|
|
A Comparison of MIDI and ZIPI
|
|
|
|
Matthew Wright
|
|
Center for New Music and Audio Technologies (CNMAT)
|
|
Department of Music, University of California, Berkeley
|
|
1750 Arch St.
|
|
Berkeley, California 94720 USA
|
|
Matt@CNMAT.Berkeley.edu
|
|
|
|
A main factor in the design of ZIPI was frustration with MIDI, the
well-established standard for communication among electronic musical
instruments. This article lists some of those frustrations and explains how
ZIPI overcomes them. Basic knowledge of both MIDI and ZIPI is assumed
(e.g., [IMA 1988] and the related articles on ZIPI in this issue).
|
|
|
|
|
|
Real Networks
|
|
=============
|
|
|
|
Each MIDI device has separate MIDI plugs labeled "in," "out," or "thru." A
|
|
MIDI user must think carefully about which devices will be sending
|
|
information to which other devices, and arrange that the "MIDI-out" of the
|
|
device sending the information is connected with its own cable to the
|
|
"MIDI-in" of the device receiving the information, or that the signal is
|
|
properly daisy-chained via the "MIDI-thru" of intermediate devices.
|
|
|
|
Computer networks, in contrast, have the characteristic that any device
|
|
connected to the network can send and receive packets to and from any other
|
|
device connected to the network. ([Tanenbaum 1989] is an excellent
|
|
introduction to computer networks.) In Ethernet, for example, every device
|
|
has only one plug, and a single cable connects the computer to the entire
|
|
network, allowing it to talk to any device.
|
|
|
|
In this respect, ZIPI is more like a computer network than like MIDI. Each
|
|
ZIPI device has only one ZIPI plug, and a single ZIPI cable connects it to a
|
|
hub. Any devices connected to the same hub can send packets to each other.
|
|
If multiple hubs are connected with ZIPI cables, then any device connected
|
|
to any hub can talk to any other device on any hub.
|
|
|
|
This means that musicians will never have to rewire their ZIPI studios,
|
|
unless they add or remove devices. If you want to use your ZIPI synthesizer
|
|
as a keyboard controller one day and as a timbre module the next day, no
|
|
wiring has to change. You never have to worry about the number of ZIPI
|
|
outputs your computer has, because connecting your computer's one ZIPI port
|
|
to the hub will allow the computer to control any number of ZIPI
|
|
synthesizers. You never have to build complicated wiring structures with
|
|
in/out/thru, because you don't have to think about which data will flow
|
|
through which wire.
|
|
|
|
This also means that all ZIPI communication can be two-way instead of
|
|
one-way. A ZIPI controller can ask questions of a ZIPI synthesizer, e.g., to
|
|
find out its capabilities, and the synthesizer can respond by sending a
|
|
message back to the controller (Loy 1985).
|
|
|
|
|
|
Bandwidth and Efficiency
|
|
========================
|
|
|
|
MIDI has a data rate of 31.25 kBaud, 80 percent of which carries data.
(Each MIDI byte takes 10 bits on the wire: a start bit, 8 data bits, and a
stop bit.) This is more than enough for note on and note off events;
consider the extreme case of a keyboard player playing 10-voice 16th-note
chords at 120 beats per minute. That is 80 notes per second. A MIDI note on
and note off each take only 2 bytes to transmit (using running status), so
that's 320 bytes, or 3,200 bits, per second, which is just over a tenth of
MIDI's bandwidth.
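
The chord arithmetic above can be checked with a few lines of Python. This
is only a sketch; the function and parameter names are made up for
illustration, and the 2-byte message size assumes running status, as in the
text.

```python
MIDI_BAUD = 31_250          # bits per second on the wire
BITS_PER_MIDI_BYTE = 10     # start bit + 8 data bits + stop bit

def note_traffic_bits_per_sec(voices, notes_per_beat, bpm, bytes_per_message=2):
    """Wire bits per second for repeated chords, counting note on and note off."""
    chords_per_sec = notes_per_beat * bpm / 60
    notes_per_sec = voices * chords_per_sec
    messages_per_sec = 2 * notes_per_sec      # one note on plus one note off
    return messages_per_sec * bytes_per_message * BITS_PER_MIDI_BYTE

# 10-voice 16th-note chords (4 per beat) at 120 beats per minute
bits = note_traffic_bits_per_sec(voices=10, notes_per_beat=4, bpm=120)
fraction = bits / MIDI_BAUD
print(f"{bits:.0f} bits/sec, {fraction:.1%} of MIDI's data rate")
```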
|
|
|
|
MIDI cannot keep up, however, with continuous controllers. A guitar
|
|
controller soon to be released by Zeta Music tracks pitch, loudness,
|
|
brightness, even/odd ratio, and noise amount on each of six strings,
|
|
updating each parameter every 8-10 msec. Pitch and loudness are 2-byte
|
|
values; the other three are 1-byte, so to send these over MIDI we would have
|
|
to use seven continuous controllers. (Even this is generous, since MIDI
continuous controllers carry only 7 bits and we are counting 8-bit bytes.)
At the slower update period of 10 msec, this results in 4200 control updates
per second:
|
|
|
|
7 control updates * 6 strings / 10 msec = 4200 control updates/sec
|
|
|
|
Sending continuous controllers on separate channels rules out MIDI running
|
|
status, so we assume each of these would take three bytes. How much
|
|
bandwidth does this require?
|
|
|
|
4200 control updates/sec * 3 bytes/update * 10 bits/byte = 126 kBaud.
|
|
|
|
This is over four times MIDI's data rate, without even considering note on
|
|
and note off messages.
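
The same calculation, as a small Python sketch. The function name and its
parameters are illustrative assumptions; the numbers come from the text. The
last line also tries the faster 8 msec update period mentioned above, which
the article's own figure does not use.

```python
MIDI_BAUD = 31_250          # bits per second on the wire
BITS_PER_MIDI_BYTE = 10     # start bit + 8 data bits + stop bit

def controller_stream_baud(controllers_per_string, strings, update_period_ms,
                           bytes_per_message=3):
    """Wire bits per second for a stream of continuous-controller updates."""
    updates_per_sec = controllers_per_string * strings * 1000 / update_period_ms
    return updates_per_sec * bytes_per_message * BITS_PER_MIDI_BYTE

baud_10ms = controller_stream_baud(7, 6, update_period_ms=10)
baud_8ms = controller_stream_baud(7, 6, update_period_ms=8)

print(baud_10ms, baud_10ms / MIDI_BAUD)   # required rate, multiple of MIDI's rate
print(baud_8ms, baud_8ms / MIDI_BAUD)     # same, at the faster update period
```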
|
|
|
|
ZIPI's data rate is variable, with no maximum, so as technology improves and
|
|
data rates increase, ZIPI will never be a bottleneck. ZIPI's minimum data
|
|
rate is 250 kBaud, eight times MIDI's rate, which is a comfortable speed
|
|
even for this kind of continuous information. Currently available
|
|
communication chips allow a maximum data rate of 20 MBaud. (ZIPI includes a
|
|
mechanism for automatically picking the fastest speed that all connected
|
|
devices can handle, so it's no problem to mix ZIPI devices with different
|
|
data rates.)
|
|
|
|
Also, ZIPI's data format allows it to transmit high-bandwidth information
|
|
more efficiently than MIDI. For example, the information produced by the
|
|
guitar controller mentioned above would require 126 kBaud to transmit via
|
|
MIDI continuous controllers. Via ZIPI, the same controller could transmit
|
|
all the same data, with slightly higher resolution, using only 85.6 kBaud.
|
|
(See the derivation for this in the "Examples of ZIPI Applications" article
|
|
in this issue.) Thus, in addition to being faster than MIDI, ZIPI uses its
|
|
bandwidth more efficiently.
|
|
|
|
|
|
Flexibility in Message Addressing
|
|
=================================
|
|
|
|
MIDI messages fall into two categories. The first category consists of the
|
|
messages whose first data byte specifies a particular note number: note on,
|
|
note off, and polyphonic after-touch. All other MIDI controller messages,
|
|
such as pitch bend, pan, and modulation, apply to an entire channel, not a
|
|
single note.
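
To make the two categories concrete, here is a sketch of how a few standard
MIDI messages are laid out as bytes. The helper names are invented for this
example; the status values (0x90, 0xA0, 0xE0) and the 14-bit pitch-bend
encoding are those of the MIDI 1.0 specification. Channels are numbered 0-15
here.

```python
def note_on(channel, key, velocity):
    """Note-addressed: the first data byte names a particular key."""
    return bytes([0x90 | channel, key, velocity])

def poly_aftertouch(channel, key, pressure):
    """Note-addressed: pressure for one key on the channel."""
    return bytes([0xA0 | channel, key, pressure])

def pitch_bend(channel, value14):
    """Channel-addressed: there is no key byte, so every note on the channel bends."""
    return bytes([0xE0 | channel, value14 & 0x7F, (value14 >> 7) & 0x7F])

print(note_on(0, 60, 100).hex(" "))
print(pitch_bend(0, 8192).hex(" "))   # 8192 is the centered (no bend) position
```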
|
|
|
|
Imagine that you are controlling a synthesizer from a guitar via MIDI. Each
|
|
of the six guitar strings might be bent by the guitarist by different
|
|
amounts, so to have individual pitch-bend control of six voices, you'd have
|
|
to put them on six different MIDI channels; all MIDI guitars do this. That
|
|
is awkward and needlessly complicated, and it uses up over a third of the
|
|
MIDI channels for one instrument. In ZIPI, it is possible to address a pitch
|
|
message to a single note instead of an entire channel. In fact, any message
|
|
can be sent to a single note, so this entire category of problem can never
|
|
arise.
|
|
|
|
MIDI has the opposite problem too. It would be nice to turn off all of the
|
|
notes on a channel all at once, but since note off commands cannot be sent
|
|
to a channel, this is impossible. Every note off message has to be sent to
|
|
only one note. There is a separate "All Notes Off" message, but it has a
|
|
decidedly second-class status; "In no case should [all notes off messages]
|
|
be used in lieu of note off commands to turn off notes which have been
|
|
previously turned on. Therefore any all notes off command (123-127) may be
|
|
ignored by receiver with no possibility of notes staying on, since any note
|
|
on command must have a corresponding specific note off command" (IMA 1988).
|
|
|
|
For after-touch, there are also two separate messages: polyphonic
|
|
after-touch, applicable only to a single note, and channel after-touch,
|
|
applicable only to an entire channel. The MIDI standard doesn't explicitly
|
|
discourage either of these messages, but in practice the channel version of
|
|
the message is generally favored---few MIDI controllers send polyphonic
|
|
after-touch. Again, MIDI has separate controllers that mean the same thing,
|
|
except for their addressability.
|
|
|
|
The last note-addressed MIDI message is note on. It would be nice to be able
|
|
to articulate an entire chord in one message, avoiding temporal "smearing"
|
|
of the onsets of the notes in the chord (Moore 1988) and saving bandwidth.
|
|
This is impossible in MIDI. There isn't even a second-class channel message
|
|
for note-on, because MIDI has no way to specify what notes the chord would
|
|
contain. In ZIPI, every message can be sent either to a single note or to a
|
|
group of notes. Anything you can tell a note to do you can also tell a group
|
|
of notes to do.
|
|
|
|
|
|
Address Space
|
|
=============
|
|
|
|
In MIDI, a note's address is the same as the note's pitch. If you want to
|
|
specify which note to apply after-touch to, or which note to release, you
|
|
have to name that note by giving its pitch. You cannot say "note number 55"
|
|
without it meaning "the note whose pitch is G below middle C."
|
|
|
|
In real life, though, a note's pitch might change over time, or there might
|
|
be two notes played on the same instrument with the same pitch. Both of
|
|
these situations are awkward to express in MIDI. You can't say "that note
|
|
that is G below middle C; slide it up a whole step to A below middle C." You
|
|
can send a pitch-bend message to the channel containing that note, but then
|
|
when you want to release the A you still have to call it a G, because the
|
|
note number is the name as well as the pitch.
|
|
|
|
Similarly, imagine a MIDI guitar controller in which the guitarist is
|
|
fretting an E on the fifth fret of the B string, and also letting the open E
|
|
ring on the high E string. The guitar is playing two notes at the same time,
|
|
with the same pitch. But the note on the E string might be a lot quieter
|
|
than the note on the B string, or the note on the B string might be bent up
|
|
a half step, or one of them could end while the other keeps sounding. When
|
|
you send a typical MIDI synthesizer two note-on messages with the same
|
|
pitch, it plays two copies of the same note. But then it's hard to send
|
|
messages to a particular one of the two notes. If you send polyphonic
after-touch to MIDI note number 64 (the E being played by two strings), it might
|
|
affect both the sounding notes, or just one of them, but there is no way to
|
|
specify which one. If you send a note-off to note number 64, either note
|
|
might release, even if one is much louder than the other. It is possible to
|
|
get around these problems by using separate MIDI channels for each note.
|
|
Then you could have a loud E on channel 1, and a quieter E, with
after-touch, on channel 2. But this solution is inelegant and awkward, and it soon
|
|
leads to running out of MIDI channels.
|
|
|
|
In ZIPI, the notions of address and pitch are separate. ZIPI note number 64
|
|
doesn't have to be the E above middle C; it is simply a number. When you
|
|
want a note to sound, you pick an address, give it a pitch, loudness, etc.,
|
|
and tell it to start. Then whenever you want to make changes to this note,
|
|
you send the address of this note and the note descriptors that change it.
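
The idea of separating address from pitch can be modeled in a few lines of
Python. This is a hypothetical model only: the class, the functions, and the
loudness scale are invented for illustration and are not ZIPI's actual
message format. It replays the two-string guitar example from above.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: float          # semitone number; fractions allowed (assumed scale)
    loudness: float       # 0.0 to 1.0 (assumed scale)
    sounding: bool = False

notes = {}                # address -> Note; the address says nothing about pitch

def start(address, pitch, loudness):
    notes[address] = Note(pitch, loudness, sounding=True)

def set_pitch(address, pitch):
    notes[address].pitch = pitch

def release(address):
    notes[address].sounding = False

start(1, 64.0, 0.9)       # E fretted on the B string, loud
start(2, 64.0, 0.3)       # open high E string, quiet, same pitch
set_pitch(1, 65.0)        # bend only the B-string note up a half step
release(2)                # release only the open string
```

Because each note is named by its address, the two E's never collide, and
the bent note can still be released later under the same address.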
|
|
|
|
|
|
Distinguishing Between Controller and Synthesizer Messages
|
|
==========================================================
|
|
|
|
When a musician controls a synthesizer, there are four steps: (1) the
musician performs some action, like blowing into a mouthpiece or pressing
keys; (2) these gestures are somehow measured, producing parameters such as
"how fast the key was going" and "which fret was fingered"; (3) these
measurements are translated into parameters to control a synthesizer (for
example, key velocity might map to amplitude and brightness, and fret
position would map to pitch); and (4) a synthesizer takes these control
parameters and produces sound. Figure 1 illustrates these steps. Note that
there are two streams of information. One is a stream of measurements about
the musician's gestures; the other is a stream of control parameters for a
synthesizer.
|
|
|
|
[Figure 1 would go here if this weren't the ASCII version]
|
|
|
|
In MIDI, these two streams are confused. There is no way to directly set the
|
|
pitch of a note in MIDI. You can say which key was pressed, and what the
|
|
position of the pitch bend wheel is, but those are both descriptions of what
|
|
the musician's hands are doing, not measurements of pitch. In other words,
|
|
MIDI's notion of pitch only goes as far as describing the gestures produced
|
|
by a keyboard player, not explicitly controlling a synthesizer.
|
|
|
|
Obviously, failing to make a distinction between these two ideas does not
|
|
prevent music from being made with MIDI. For example, MIDI users understand
|
|
that the way to send pitch via MIDI is to pretend that a keyboard player is
|
|
pressing a certain key and holding the pitch bend wheel in a certain
|
|
position, even if they would rather control pitch directly. (Non-keyboard
|
|
MIDI controllers start by knowing the desired pitch; then they have to go
|
|
through extra steps to translate the desired pitch into a MIDI key number
|
|
plus a pitch bend amount.) Likewise, people use the term "velocity," which
|
|
is a measure of how fast a key is pressed, to mean loudness or amplitude.
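
The extra translation step that a non-keyboard controller goes through might
look like the following sketch. The function name is invented, and two things
are assumptions rather than facts from the text: A above middle C is tuned to
440 Hz, and the receiving synthesizer's pitch-bend range is plus or minus 2
semitones (a common setting, but one the controller cannot verify).

```python
import math

def pitch_to_key_and_bend(freq_hz, bend_range_semitones=2.0):
    """Translate a desired frequency into a MIDI key number plus a 14-bit bend."""
    semitones = 69 + 12 * math.log2(freq_hz / 440.0)   # key 69 is A at 440 Hz
    key = int(round(semitones))                        # nearest key
    offset = semitones - key                           # leftover, in semitones
    bend = 8192 + int(round(offset / bend_range_semitones * 8192))
    bend = max(0, min(16383, bend))                    # clamp to 14 bits
    return key, bend

print(pitch_to_key_and_bend(440.0))                    # exactly on a key
print(pitch_to_key_and_bend(440.0 * 2 ** (0.25 / 12))) # a quarter semitone sharp
```

If the synthesizer's bend range differs from the assumed one, the resulting
pitch is wrong, which is one practical cost of describing pitch as a gesture.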
|
|
|
|
ZIPI has a distinction between these two kinds of information. Standard
|
|
messages, which ZIPI synthesizers expect to see, are descriptions of sounds
|
|
that should be produced, not descriptions of gestures that the musician is
|
|
producing. So instead of having "key number" and "velocity," ZIPI has
|
|
"pitch" and "loudness." But ZIPI also has a second set of parameters
|
|
explicitly for describing musicians' gestures. These include keyboard
|
|
measurements like key number and velocity, but also parameters that come
|
|
from other controllers, e.g., bow position, wind pressure, and striking
|
|
position on a drum head.
|
|
|
|
|
|
Controlling Drum Machines
|
|
=========================
|
|
|
|
Many MIDI drum machines and drum timbre modules allow the user to
|
|
pitch-shift and pan drum samples. This can be useful to create what seems
|
|
like a large number of instruments out of one single sample. But since
|
|
MIDI's pitch is the same as its address, it is common for each key number to
|
|
be assigned to a different sound altogether, as in "middle C is ride
cymbal, C# above that is closed hi-hat..." With this scheme, it's
|
|
impossible to use MIDI's pitch mechanism to specify the pitch of a drum
|
|
sound. Some MIDI drum machines get around this by letting the user assign
|
|
the same sample, with different pitches and pan locations, to multiple MIDI
|
|
note numbers (Kawai 1986, Smith 1990), but that easily results in running
|
|
out of note numbers.
|
|
|
|
Furthermore, this mapping from MIDI note numbers to various drum sounds
|
|
isn't standard, and can't be set via MIDI. This makes it difficult for two
|
|
drum machines to communicate via MIDI, because MIDI note number 37 might
|
|
mean snare drum to one instrument and crash cymbal to another. Using
|
|
different pitch and pan values for the same sound on different MIDI key
|
|
numbers just makes this worse, because even if MIDI note 68 is a crash
|
|
cymbal on both drum machines, it might be pitch shifted up on one of them
|
|
and down on the other.
|
|
|
|
This can even be a nuisance when sequencing drum tracks from the same drum
|
|
machine that will play them back. For example, suppose your drum machine
|
|
lets you specify the pitch and pan of each note as you add it to a drum
|
|
pattern. Once your pattern is complete you want to load it into your
|
|
sequencer along with the keyboard parts. But on many drum machines, the
|
|
MIDI note numbers chosen for outgoing MIDI data are determined only by the
|
|
instrument being played, not by the pitch of that instrument. So
|
|
translating a drum sequence to MIDI loses the work spent specifying the
|
|
pitches.
|
|
|
|
Drums under ZIPI would be much easier, because pitch and address are
|
|
separate concepts, and because each note can have its own pitch, program
|
|
change, and pan. A typical configuration would be to think of a drum kit as
|
|
a family, with instruments like snare drum, timpani, cowbell, etc. Each of
|
|
these instruments could be sent a program change message selecting the
|
|
appropriate percussion timbre, so there is no ambiguity about the mapping of
|
|
instrument numbers to drum sounds. A percussion sound could be selected by
|
|
choosing an instrument, and pitch or pan could be changed by sending a pitch
|
|
or pan message to a note in that instrument.
|
|
|
|
This means that a ZIPI drum machine wouldn't have to provide so much
|
|
structure for assigning sounds, pitches, and pans to each key number.
|
|
Instead, all of the setup can be done over ZIPI. To get a new set of
|
|
sounds, your controller or sequencer can just send program change, pan, and
|
|
pitch messages to each instrument of the drum kit.
|
|
|
|
ZIPI's Music Parameter Description Language (MPDL) also has note descriptors
reserved for drum-specific control parameters such as position on the drum
head, velocity, and acceleration. Continuous hi-hat pedal position, varying
from fully depressed to fully open, would be encoded in "continuous pedal"
messages. Hopefully, the
|
|
next generation of drum pads and drum machines will take advantage of these
|
|
parameters to give electronic drums a level of expressivity closer to that
|
|
of acoustic drums.
|
|
|
|
|
|
Data Resolution
|
|
===============
|
|
|
|
Each MIDI byte begins with a bit that tells whether it is a data byte or a
status byte, so each byte really only has seven user-settable bits.
|
|
Seven bits is not enough resolution for a variety of applications, and it is
|
|
awkward to send larger amounts of information. It is possible to partition a
|
|
14-bit quantity into two separate MIDI controllers, but this is messy and
|
|
rarely done. Also, even 14 bits is not enough for many applications; it
|
|
would take 3 MIDI bytes (30 bits transmitted) to send a 16-bit word. ZIPI
|
|
parameters can have any number of 8-bit data bytes; there is no per-byte
|
|
overhead in ZIPI.
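
The cost of MIDI's 7-bit data bytes can be seen by splitting values into
7-bit groups. The helper names below are invented for this sketch; the only
facts used are that a MIDI data byte carries 7 bits and occupies 10 bits on
the wire.

```python
BITS_PER_MIDI_BYTE = 10   # start bit + 8 data bits + stop bit

def split_7bit(value, n_bytes):
    """Split an unsigned integer into n_bytes 7-bit groups, most significant first."""
    if not 0 <= value < 1 << (7 * n_bytes):
        raise ValueError("value does not fit in the requested number of data bytes")
    return [(value >> (7 * i)) & 0x7F for i in reversed(range(n_bytes))]

def join_7bit(groups):
    """Inverse of split_7bit."""
    value = 0
    for group in groups:
        value = (value << 7) | group
    return value

fourteen_bit = split_7bit(0x3FFF, 2)   # largest value two data bytes can hold
sixteen_bit = split_7bit(0xFFFF, 3)    # a 16-bit word needs a third data byte
wire_bits = len(sixteen_bit) * BITS_PER_MIDI_BYTE
print(fourteen_bit, sixteen_bit, wire_bits)
```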
|
|
|
|
MIDI uses only four bits to encode a channel, giving 16 channels. This major
|
|
weakness has given rise to kludges like multiple MIDI outputs on a computer,
|
|
each with an associated letter. This would give, e.g., 32 MIDI channels,
|
|
which could be referred to by special software as A1-A16 and B1-B16 (Roberts
|
|
1992). ZIPI addresses are 20 bits, giving over a million possible addresses.
|
|
|
|
|
|
High-Level Parameter Control
|
|
============================
|
|
|
|
Suppose you are playing something on a multi-timbral synthesizer via MIDI,
|
|
and that you want to turn down the entire output of the synthesizer via
|
|
MIDI. The only way to do it is to send continuous controller 7, volume, to
|
|
all 16 MIDI channels. In ZIPI, messages can be sent to any level of the
|
|
address space hierarchy, so it would be possible to turn down a group of
|
|
instruments all at once (and with only one network message) by sending a
|
|
loudness message to the family that contains those instruments. It is even
|
|
possible to send a message to all families at once. This should make it
|
|
unnecessary to duplicate the same ZIPI message many times to control
|
|
different notes.
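
On the MIDI side, the "turn everything down" case can be sketched as
follows. The function name is invented; controller number 7 and the
control-change status value 0xB0 are standard MIDI. Running status does not
help here, because the status byte changes with every channel.

```python
MIDI_BAUD = 31_250          # bits per second on the wire
BITS_PER_MIDI_BYTE = 10     # start bit + 8 data bits + stop bit
VOLUME_CONTROLLER = 7

def master_volume_messages(value):
    """One control-change message per channel; MIDI has no wider address."""
    return [bytes([0xB0 | channel, VOLUME_CONTROLLER, value])
            for channel in range(16)]

messages = master_volume_messages(64)
total_bytes = sum(len(m) for m in messages)
wire_time_sec = total_bytes * BITS_PER_MIDI_BYTE / MIDI_BAUD
print(len(messages), "messages,", total_bytes, "bytes,",
      round(wire_time_sec * 1000, 2), "msec on the wire")
```

So a single change of overall volume occupies the MIDI cable for roughly 15
msec, during which no other message can be sent.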
|
|
|
|
MIDI also requires a large number of messages to apply a simple function to
|
|
a parameter. For example, suppose you would like to exponentially decrease
|
|
the volume of a MIDI channel. The only way to do this is to send a stream of
|
|
volume controller messages. In ZIPI, it is possible to request that a
|
|
certain function modulate a parameter. You could say, for example, "begin an
|
|
exponential decay of loudness that takes 2.3 seconds to go to silence" in a
|
|
single message, and the decrescendo would then happen without any further
|
|
messages. There are some useful pre-defined functions in ZIPI, and a way for
|
|
you to send your own tables over the network if you would like to make up
|
|
your own functions "on the fly."
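
For comparison, here is a sketch of the stream of messages that MIDI needs
for the same 2.3-second decrescendo. Everything about the schedule is an
assumption made for illustration: the 10 msec update interval, the shape of
the curve, and the function name. Only the controller number and message
layout are standard MIDI.

```python
import math

VOLUME_CONTROLLER = 7

def volume_decay_stream(duration_ms, step_ms=10, start=127, channel=0):
    """Return (time_ms, message) pairs approximating an exponential decay."""
    # Rate chosen so the curve falls just below 1 (and truncates to 0)
    # at duration_ms.
    rate = math.log(start + 1) / duration_ms
    stream = []
    for t in range(0, duration_ms + 1, step_ms):
        value = int(start * math.exp(-rate * t))   # truncate to a 7-bit step
        stream.append((t, bytes([0xB0 | channel, VOLUME_CONTROLLER, value])))
    return stream

stream = volume_decay_stream(2300)
print(len(stream), "messages instead of one")
```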
|
|
|
|
|
|
Support for Pitch Trackers
|
|
==========================
|
|
|
|
The theoretical lower bound to find the pitch of an arbitrary signal is one
|
|
period. The lowest note of a 5-string bass guitar, the B three octaves and a
|
|
half step below middle C, is 30.9 Hz. One period at 30.9 Hz is 32 msec. A
|
|
MIDI bass guitar can know that the musician is playing a note in well under
one msec, just from looking at the amplitude of the signal coming from the
|
|
pickup. But it can't know the pitch for at least 32 msec, probably more.
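
The one-period bound is easy to compute. This sketch assumes equal
temperament with A above middle C at 440 Hz, and uses MIDI-style key numbers
(middle C is 60, so the low B is 23); the function name is invented.

```python
def midi_key_to_hz(key, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI-style key number (key 69 is A 440)."""
    return a4_hz * 2 ** ((key - 69) / 12)

LOW_B = 60 - 36 - 1                 # three octaves and a half step below middle C
freq = midi_key_to_hz(LOW_B)
period_ms = 1000 / freq             # shortest time in which pitch could be known
print(round(freq, 1), "Hz,", round(period_ms, 1), "msec per period")
```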
|
|
|
|
In MIDI, it is impossible to start a note without a commitment to the note's
|
|
pitch, since pitch (i.e., key number) is part of a note-on message. The
|
|
synthesized note cannot start for quite a long time after the musician plays
|
|
it on the bass. A 30 msec delay here is very easily detected by the ear;
|
|
that is why most MIDI bass and guitar controllers feel "spongy" or
|
|
unresponsive to many musicians.
|
|
|
|
What can the synthesizer do for the 30 msec between when the note starts and
|
|
the pitch tracker knows the pitch? The ear is very forgiving about exactly
|
|
what it hears for those 30 msec. Many non-electronic timbres begin with lots
|
|
of noise-like sound for at least 30 msec, for example, the hammer noise on a
|
|
piano or the wind turbulence on a flute. The pitch can sometimes vary a
|
|
great deal during the onset of a note. An examination of brass tones, for
|
|
example, shows that there is often an extensive glissando during the attack,
|
|
yet we hear the note as having a definite, fixed pitch (Risset and Wessel
|
|
1994). It is not that the glissando is imperceptible; it is just that the
|
|
glissando is heard as part of the attack characteristic of the tone rather
|
|
than as part of the pitch.
|
|
|
|
The solution therefore would be for the bass guitar controller to send a
|
|
note-on message as soon as it knows there is a note. The synthesizer can
|
|
play mostly noise, or the wrong pitch, for 30 msec or so, while the pitch
|
|
tracker is waiting to find the pitch. When the pitch is determined, the
|
|
controller can update the synthesizer, and from then on the synthesizer will
|
|
play the right pitch.
|
|
|
|
This is easy in ZIPI, since it is possible to articulate a note and then
|
|
later correct the pitch of that note. ZIPI also has a way to set the balance
|
|
of a sound's pitched and noise portions.
|
|
|
|
|
|
References
|
|
==========
|
|
|
|
International MIDI Association (IMA). 1988. *MIDI 1.0 Detailed
|
|
Specification, Document Version 4.0*. Los Angeles, California: IMA.
|
|
|
|
Kawai. 1986. *R-100 Digital Drum Machine Owner's Manual*. Tokyo, Japan:
|
|
Kawai Corp.
|
|
|
|
Loy, D. G. 1985. "Musicians Make a Standard: The MIDI Phenomenon." *Computer
|
|
Music Journal* 9(4): 8-26.
|
|
|
|
Moore, F. R. 1988. "The Dysfunctions of MIDI." *Computer Music Journal*
|
|
12(1): 19-28.
|
|
|
|
Risset, J. C., and D. Wessel. 1994. "Analysis-Synthesis Methods for Sound
|
|
Synthesis and the Study of Timbre." In D. Deutsch, ed. 1994. *The Psychology
|
|
of Music*, 2nd Edition. London: Academic Press.
|
|
|
|
Roberts, A. 1992. "Devices for Increasing the Number of MIDI Channels."
|
|
*Computer Music Journal* 16(4): 101-104.
|
|
|
|
Smith, R. 1990. *PROCUSSION 16 bit Percussion Sound Module Operation
|
|
Manual.* Scotts Valley, California: E-Mu Systems.
|
|
|
|
Tanenbaum, A. S. 1989. *Computer Networks*. Englewood Cliffs, New Jersey:
|
|
Prentice Hall.
|