Subject: FAQ: Audio File Formats (version 2.3)

Submitted-by: Guido van Rossum <guido@cwi.nl>
Last-modified: 9-Jul-1992
Lines: 1365
|
|
|
|
|
|
FAQ: Audio File Formats (version 2.3)
|
|
=====================================
|
|
|
|
Table of contents
|
|
-----------------
|
|
|
|
Introduction
|
|
Device characteristics
|
|
Popular sampling rates
|
|
Compression schemes
|
|
Current hardware
|
|
File formats
|
|
File conversions
|
|
Playing audio files on UNIX
|
|
Playing audio files on micros
|
|
The Sound Site Newsletter
|
|
Posting sounds
|
|
|
|
Appendices:
|
|
|
|
FTP access for non-internet sites
|
|
AIFF Format (Audio IFF)
|
|
The NeXT/Sun audio file format
|
|
IFF/8SVX Format
|
|
Playing sound on a PC
|
|
The EA-IFF-85 documentation
|
|
US Federal Standard 1016 availability
|
|
Creative Voice (VOC) file format
|
|
RIFF WAVE (.WAV) file format
|
|
|
|
|
|
Introduction
|
|
------------
|
|
|
|
This is version 2 of this FAQ, which I started in November 1991 under
|
|
the name "The audio formats guide". I bumped the major version number
|
|
since the Subject and Newsgroups headers have changed to make the
|
|
subject more informative and give the guide a wider audience. I also
|
|
added a Table of contents section at the top.
|
|
|
|
I am posting this about once a fortnight, either unchanged (just to
|
|
inform new readers), or updated (if I learn more or when new hardware
|
|
or software becomes popular). I post to alt.binaries.sounds.{misc,d}
|
|
and to comp.dsp, for maximal coverage of people interested in audio,
|
|
and to news.answers, for easy reference.
|
|
|
|
A companion posting with subject "Change to: ..." is occasionally
|
|
posted listing the diffs between a new version and the last. This is
|
|
not reposted, and it is suppressed when the diffs are bigger than the
|
|
new version.
|
|
|
|
Send updates, comments and questions to <guido@cwi.nl>; flames to
|
|
/dev/null.
|
|
|
|
I'd like to thank everyone who sent me mail with updates for previous
|
|
versions. The list of names is really too long to include here...
|
|
|
|
--Guido van Rossum, CWI, Amsterdam <guido@cwi.nl>
|
|
"Fear, surprise, ruthless efficiency, a fanatical devotion to the
|
|
pope, and nice red uniforms."
|
|
|
|
|
|
Device characteristics
|
|
----------------------
|
|
|
|
In this text, I will only use the term "sample" to refer to a single
|
|
output value from an A/D converter, i.e., a small integer number
|
|
(usually 8 or 16 bits).
|
|
|
|
Audio data is characterized by the following parameters, which
|
|
correspond to settings of the A/D converter when the data was
|
|
recorded. Naturally, the same settings must be used to play the data.
|
|
|
|
- sampling rate (in samples per second), e.g. 8000 or 44100
|
|
|
|
- number of bits per sample, e.g. 8 or 16
|
|
|
|
- number of channels (1 for mono, 2 for stereo, etc.)
|
|
|
|
Approximate sampling rates are often quoted in Hz or kHz ([kilo-]
Hertz); however, the politically correct term is samples per second
|
|
(samples/sec). Sampling rates are always measured per channel, so for
|
|
stereo data recorded at 8000 samples/sec, there are actually 16000
|
|
samples in a second. I will sometimes write 8 k as a shorthand for
|
|
8000 samples/sec.
|
|
|
|
Multi-channel samples are generally interleaved on a frame-by-frame
|
|
basis: if there are N channels, the data is a sequence of frames,
|
|
where each frame contains N samples, one from each channel. (Thus,
|
|
the sampling rate is really the number of *frames* per second.) For
|
|
stereo, the left channel usually comes first.
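
As a rough illustration of the frame layout described above, here is a
small Python sketch. The function names are made up for this example
and do not come from any package mentioned in this guide.

```python
def split_channels(samples, n_channels):
    """Split an interleaved sample sequence into one list per channel."""
    if n_channels < 1:
        raise ValueError("need at least one channel")
    if len(samples) % n_channels != 0:
        raise ValueError("data does not contain a whole number of frames")
    # Channel ch owns every n_channels-th sample, starting at index ch.
    return [list(samples[ch::n_channels]) for ch in range(n_channels)]


def bytes_per_second(rate, bits_per_sample, n_channels):
    """Raw data rate of linear samples: frames/sec times bytes per frame."""
    bytes_per_sample = (bits_per_sample + 7) // 8
    return rate * n_channels * bytes_per_sample


# Three stereo frames, left channel first: L0 R0 L1 R1 L2 R2
left, right = split_channels([10, -10, 11, -11, 12, -12], 2)
```

With these definitions, bytes_per_second(8000, 8, 2) gives the 16000
samples (here also bytes) per second quoted above for 8 k stereo data.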
|
|
|
|
The specification of the number of bits for U-LAW (pronounced mu-law
|
|
-- the u really stands for the Greek letter mu) samples is somewhat
|
|
problematic. These samples are logarithmically encoded in 8 bits,
|
|
like a tiny floating point number; however, their dynamic range is
|
|
that of 14 bit linear data. Source for converting to/from U-LAW
|
|
(written by Jef Poskanzer) is distributed as part of the SOX package
|
|
mentioned below; it can easily be ripped apart to serve in other
|
|
applications. The official definition is the CCITT standard G.711.
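
To make the "tiny floating point number" idea concrete, here is a
Python sketch of one widely circulated U-LAW formulation: a sign bit, a
3-bit exponent and a 4-bit mantissa, stored with all bits inverted. It
is an assumption-laden illustration, not the SOX source and not checked
against the G.711 text; it takes 16-bit linear input, as many C
implementations do, and the names are invented for this example.

```python
BIAS = 0x84   # 132, added to the magnitude before the segment search
CLIP = 32635  # largest magnitude that still fits in 15 bits after BIAS


def linear_to_ulaw(sample):
    """Encode one 16-bit signed linear sample as an 8-bit U-LAW byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The exponent is the position of the highest set bit above bit 7.
    exponent = (magnitude >> 7).bit_length() - 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # U-LAW bytes are stored inverted, so silence becomes 0xFF.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte):
    """Decode one 8-bit U-LAW byte to a 16-bit signed linear sample."""
    byte = ~byte & 0xFF
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if byte & 0x80 else magnitude
```

Decoding and re-encoding returns the original byte for every value
except the "negative zero" byte 0x7F, which decodes to 0.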
|
|
|
|
(There exists another encoding similar to U-LAW, called A-LAW, which
|
|
is used as a European telephony standard. I don't know how it differs
|
|
from U-LAW. There is less support for it in UNIX workstations.)
|
|
|
|
|
|
Popular sampling rates
|
|
----------------------
|
|
|
|
Some sampling rates are more popular than others, for various reasons.
|
|
Some recording hardware is restricted to (approximations of) some of
|
|
these rates, some playback hardware has direct support for some. The
|
|
popularity of divisors of common rates can be explained by the
|
|
simplicity of clock frequency dividing circuits :-).
|
|
|
|
Samples/sec     Description

5500            One fourth of the Mac sampling rate (rarely seen).

7333            One third of the Mac sampling rate (rarely seen).

8000            Exactly 8000 samples/sec is a telephony standard that
                goes together with U-LAW (and also A-LAW) encoding.
                Some systems use an approximation; in particular, the
                NeXT workstation uses 8012.8210513 samples/sec.
                (Can anyone explain why? SGI software calls this rate
                "codec-rate".)

11 k            Either 11025, a quarter of the CD sampling rate,
                or half the Mac sampling rate (perhaps the most
                popular rate on the Mac).

16000           Used by, e.g. the G.722 compression standard.

22 k            Either 22050, half the CD sampling rate, or the Mac
                rate; the latter is precisely 22254.545454545454 but
                usually misquoted as 22000.

32000           Used in digital radio, NICAM (Nearly-Instantaneous
                Companded Audio Multiplex [IBA/BREMA/BBC]) and other
                TV work, at least in the UK; also some DAT machines
                can do it. It is also the standard long play speed
                for DAT non linear encoding.

44056           This weird rate is used by professional audio
                equipment to fit an integral number of samples in a
                video frame.

44100           The CD sampling rate. (Professional DAT also supports
                this rate.)

48000           The DAT (Digital Audio Tape) sampling rate for
                domestic use.
|
|
|
|
Files sampled on SoundBlaster hardware have sampling rates that are
divisors of 1000000.
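
A small Python sketch of what "divisors of 1000000" means in practice:
given a wanted rate, find the nearest rate of the form 1000000/n. The
function name and the idea of treating 1000000 as a clock value are
assumptions made for this example only.

```python
import math


def nearest_divisor_rate(target_rate, clock=1_000_000):
    """Return (n, clock / n) for the integer n whose rate is closest."""
    exact = clock / target_rate
    # Only the integers on either side of the exact quotient can win.
    candidates = {max(1, math.floor(exact)), max(1, math.ceil(exact))}
    n = min(candidates, key=lambda d: abs(clock / d - target_rate))
    return n, clock / n


for wanted in (8000, 11025, 22050):
    divisor, actual = nearest_divisor_rate(wanted)
    print(wanted, divisor, round(actual, 1))
```

Only 8000 of these three is hit exactly (n = 125); the other two come
out slightly off, which ties in with the next paragraph.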
|
|
|
|
While professional musicians disagree, most people don't have a problem
|
|
if recorded sound is played at a slightly different rate, say, 1-2%.
|
|
On the other hand, if recorded data is being fed into a playback
|
|
device in real time (say, over a network), even the smallest
|
|
difference in sampling rate can frustrate the buffering scheme used...
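
To put a number on this, here is a back-of-the-envelope Python sketch.
It assumes a simple model in which the sender produces samples at one
fixed rate, the playback device consumes them at another, and the
buffer has a given amount of slack; the function name is hypothetical.

```python
def seconds_until_slip(source_rate, sink_rate, slack_samples):
    """Time until a buffer with the given slack overflows or runs dry."""
    drift = abs(source_rate - sink_rate)  # surplus samples per second
    if drift == 0:
        return float("inf")
    return slack_samples / drift


# Data recorded at the NeXT rate, played on an exact 8000 samples/sec
# device, with half a second (4000 samples) of slack in the buffer.
print(seconds_until_slip(8012.8210513, 8000, 4000))
```

Under these assumptions the half-second buffer is used up after a bit
more than five minutes, even though the rates differ by under 0.2%.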
|
|
|
|
There may be an emerging tendency to standardize on only a few
|
|
sampling rates and encoding styles, even if the file formats may
|
|
differ. The suggested rates and styles are:
|
|
|
|
rate (samp/sec)   style                    mono/stereo

8000              8-bit U-LAW              mono
22050             8-bit linear unsigned    mono and stereo
44100             16-bit linear signed     mono and stereo
|
|
|
|
|
|
Compression schemes
|
|
-------------------
|
|
|
|
Strange though it seems, audio data is remarkably hard to compress
|
|
effectively. For 8-bit data, a Huffman encoding of the deltas between
|
|
successive samples is relatively successful. For 16-bit data,
|
|
companies like Sony and Philips have spent millions to develop
|
|
proprietary schemes.
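
The reason deltas help can be seen with a short Python sketch. It does
not build a Huffman code; it only compares the zeroth-order entropy
(a lower bound on the average Huffman code length per symbol) of the
raw bytes and of the deltas. All names are invented for this example.

```python
import math
from collections import Counter


def deltas(samples):
    """First sample as-is, then differences between neighbours, mod 256."""
    if not samples:
        return []
    out = [samples[0] & 0xFF]
    for prev, cur in zip(samples, samples[1:]):
        out.append((cur - prev) & 0xFF)
    return out


def undeltas(diffs):
    """Inverse of deltas(): running sum modulo 256."""
    out, value = [], 0
    for d in diffs:
        value = (value + d) & 0xFF
        out.append(value)
    return out


def entropy_bits(symbols):
    """Zeroth-order entropy in bits per symbol."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

For a slowly varying signal the deltas cluster around zero, so their
entropy is much lower than that of the raw samples; for noisy data the
gain is small.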
|
|
|
|
Public standards for voice compression are slowly gaining popularity,
|
|
e.g. CCITT G.721 and G.723 (ADPCM at 32 and 24 kbits/sec). (ADPCM ==
|
|
Adaptive Differential Pulse Code Modulation.) Free source code for a *fast*
|
|
32 kbits/sec ADPCM algorithm is available by ftp from ftp.cwi.nl as
|
|
/pub/adpcm.shar.
|
|
|
|
There are also two US federal standards, 1016 (Code excited linear
|
|
prediction (CELP), 4800 bits/s) and 1015 (LPC-10E, 2400 bits/s). See
|
|
also the appendix for 1016.
|
|
|
|
(Note that U-LAW and silence detection can also be considered
|
|
compression schemes.)
|
|
|
|
Finally, the comp.compression FAQ has some text on the 6:1 audio
|
|
compression scheme used by MPEG (a video compression standard-to-be).
|
|
It's interesting to note that video compression reaches much higher
|
|
ratios (like 26:1).
|
|
|
|
|
|
Current hardware
|
|
----------------
|
|
|
|
I am aware of the following computer systems that can play back and
|
|
(sometimes) record audio data, with their characteristics. Note that
|
|
for most systems you can also buy "professional" sampling hardware,
|
|
which supports much better quality, e.g. >= 44.1 k 16 bits stereo.
|
|
The characteristics listed here are a rough estimate of the
|
|
capabilities of the basic hardware only (and even here I am on thin
|
|
ice, with systems becoming ever more powerful).
|
|
|
|
machine             bits          max sampling rate   #output channels

Mac                 8             22k                 1
Apple IIgs          8             32k / >70k          8(st)
PC/Soundblaster     8             13k / 22k           1
Atari ST            8             22k                 1
Atari STe,TT        8             50k                 2
Amiga               8             ~29k                4(st)
Sun Sparc           U-LAW         8k                  1
NeXT                U-LAW,8,16    44.1k               1(st)
SGI Indigo          8,16          48k                 4(st)
Acorn Archimedes    ~U-LAW        ~180k               8(st)
Sony RISC-NEWS      8, 16         37.8k               ?(st)
VAXstation 4000     U-LAW         8k                  1
Tandy 1000/[TS]L    8-bit         22k                 3
|
|
|
|
4(st) means "four voices, stereo"; sampling rates xx/yy are
|
|
different recording/playback rates.
|
|
|
|
All these machines can play back sound without additional hardware,
|
|
although the needed software is not always standard; only the Sun,
|
|
NeXT and SGI come with standard sampling hardware (the NeXT only
|
|
samples U-LAW at 8000 samples/sec from the built-in microphone port;
|
|
you need a separate board for other rates).
|
|
|
|
The new VAXstation 4000 series lets you PLAY audio (.au) files, and
|
|
the as-of-yet-unreleased package, DECsound, will let you do the
|
|
recording.
|
|
|
|
The SGI Personal IRIS 4D/30 and 4D/35 have the same capabilities as
|
|
the Indigo.
|
|
|
|
The new Apple Macs have more powerful audio hardware; the latest
|
|
models have built-in microphones.
|
|
|
|
Software exists for the PC that can play sound on its 1-bit speaker
|
|
using pulse width modulation (see appendix); the Soundblaster board
|
|
records at rates up to 13 k and plays back up to 22 k (weird
|
|
combination, but that's the way it is).
|
|
|
|
On the NeXT, the Motorola 56001 DSP chip is programmable and you can
|
|
(in principle) do what you want. The SGI uses the same DSP chip but
|
|
it can't be programmed by users -- SGI prefers to offer it as a shared
|
|
system resource to multiple applications, thus enabling developers to
|
|
program audio with their Audio Library and avoid code modifications
|
|
for execution on future machines with different audio hardware, i.e. a
|
|
different DSP.
|
|
|
|
The Amiga also has a 6-bit volume, which can be used to produce
|
|
something like a 14-bit output for each voice. The hardware can also
|
|
use one of each voice-pair to modulate the other in FM (period) or AM
|
|
(volume, 6-bits).
|
|
|
|
The Acorn Archimedes uses a variation on U-LAW with the bit order
|
|
reversed and the sign bit in bit 0. Being a 'minority' architecture,
|
|
Arc owners are quite adept at converting sound/image formats from
|
|
other machines, and it is unlikely that you'll ever encounter sound in
|
|
one of the Arc's own formats (there are several).
|
|
|
|
CD-I machines form a special category. The following formats are used:
|
|
|
|
- PCM 44.1 kHz standard CD format
- ADPCM - Adaptive Differential PCM
  - Level A 37.8 kHz 8-bit
  - Level B 37.8 kHz 4-bit
  - Level C 18.9 kHz 4-bit
|
|
|
|
|
|
File formats
|
|
------------
|
|
|
|
Historically, almost every type of machine used its own file format
|
|
for audio data, but some file formats are more generally applicable,
|
|
and in general it is possible to define conversions between almost any
|
|
pair of file formats -- sometimes losing information, however.
|
|
|
|
File formats are a separate issue from device characteristics. There
|
|
are two types of file formats: self-describing formats, where the
|
|
device parameters and encoding are made explicit in some form of
|
|
header, and "raw" formats, where the device parameters and encoding
|
|
are fixed.
|
|
|
|
Self-describing file formats generally define a family of data
|
|
encodings, where a header field indicates the particular encoding
variant used. Headerless formats define a single encoding and usually
allow no variation in device parameters (except sometimes sampling
|
|
rate, which can be a pain to figure out other than by listening to the
|
|
sample).
|
|
|
|
The header of self-describing formats contains the parameters of the
|
|
sampling device and sometimes other information (e.g. a
|
|
human-readable description of the sound, or a copyright notice). Most
|
|
headers begin with a simple "magic word". (Some formats do not simply
|
|
define a header format, but may contain chunks of data intermingled
|
|
with chunks of encoding info.) The data encoding defines how the
|
|
actual samples are stored in the file, e.g. signed or unsigned, as
|
|
bytes or short integers, in little-endian or big-endian byte order,
|
|
etc. Strictly speaking, channel interleaving is also part of the
|
|
encoding, although so far I have seen little variation in this area.
|
|
|
|
Some file formats apply some kind of compression to the data, e.g.
|
|
Huffman encoding, or simple silence deletion.
|
|
|
|
Here's an overview of popular file formats.
|
|
|
|
Self-describing file formats
|
|
----------------------------
|
|
|
|
extension, name   origin        variable parameters (fixed; comments)

.au or .snd       NeXT, Sun     rate, #channels, encoding, info string
.aif(f), AIFF     Apple, SGI    rate, #channels, sample width, lots of info
.aif(f), AIFC     Apple, SGI    same (extension of AIFF with compression)
.iff, IFF/8SVX    Amiga         rate, #channels, instrument info (8 bits)
.voc              Soundblaster  rate (8 bits/1 ch; can use silence deletion)
.wav, WAVE        Microsoft     rate, #channels, sample width, lots of info
.sf               IRCAM         rate, #channels, encoding, info
none, HCOM        Mac           rate (8 bits/1 ch; uses Huffman compression)
none, MIME        Internet      (see below)
.mod or .nst      Amiga         (see below)
|
|
|
|
Note that the filename extension ".snd" is ambiguous: it can be either
|
|
the self-describing NeXT format or the headerless Mac/PC format, or
|
|
even a headerless Amiga format.
|
|
|
|
I know nothing for sure about the origin of HCOM files, only that
|
|
there are a lot of them floating around on our system and probably at
|
|
FTP sites over the world. The filenames usually don't have a ".hcom"
|
|
extension, but this is what SOX (see below) uses. The file format
|
|
recognized by SOX includes a MacBinary header, where the file
|
|
type field is "FSSD". The data fork begins with the magic word "HCOM"
|
|
and contains Huffman compressed data; after decompression it is 8
|
|
bits unsigned data.
|
|
|
|
IFF/8SVX allows for amplitude contours for sounds (attack/decay/etc).
|
|
Compression is optional (and extensible); volume is variable; author,
|
|
notes and copyright properties; etc.
|
|
|
|
AIFF, AIFC and WAVE are similar in spirit but allow more freedom in
|
|
encoding style (other than 8 bit/sample), amongst others.
|
|
|
|
There are other sound formats in use on Amiga by digitizers and music
|
|
programs, such as IFF/SMUS.
|
|
|
|
The appendices describe the NeXT and VOC formats; pointers to more info
about AIFF, AIFC, 8SVX and WAVE (which are too complex to describe
here) are also in the appendices.
|
|
|
|
DEC systems (e.g. DECstation 5000) use a variant of the NeXT format
|
|
that uses little-endian encoding and has a different magic number
|
|
(0x0064732E in little-endian encoding).
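
Since most of these headers begin with a magic word, a file can often
be identified from its first bytes. The Python sketch below shows the
idea. Only the HCOM/FSSD markers and the DEC magic number come from
the text above; the other magic words (".snd", "FORM", "RIFF",
"Creative Voice File") and the MacBinary offsets 65 and 128 are
supplied from general knowledge of these formats and should be treated
as assumptions. The function name is hypothetical.

```python
def sniff_audio_format(header):
    """Guess the format from the first 132 or so bytes of a file."""
    magic = header[:4]
    if magic == b".snd":
        return "NeXT/Sun"
    # 0x0064732E stored little-endian is the byte string ".sd\0".
    if magic == (0x0064732E).to_bytes(4, "little"):
        return "DEC variant of NeXT/Sun"
    if magic == b"FORM" and header[8:12] in (b"AIFF", b"AIFC", b"8SVX"):
        return "IFF/" + header[8:12].decode("ascii")
    if magic == b"RIFF" and header[8:12] == b"WAVE":
        return "RIFF WAVE"
    if header.startswith(b"Creative Voice File"):
        return "VOC"
    # MacBinary header: file type at offset 65, data fork at offset 128.
    if header[65:69] == b"FSSD" and header[128:132] == b"HCOM":
        return "HCOM"
    return "unknown (possibly headerless)"
```

A typical use would be to read the first 132 bytes of a file in binary
mode and pass them in; anything reported as unknown then has to be
tried as a headerless format.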
|
|
|
|
Standard file formats used in the CD-I world are IFF but on the disc
|
|
they're in realtime files.
|
|
|
|
An interesting "interchange format" for audio data is described in the
|
|
proposed Internet Standard "MIME", which describes a family of
|
|
transport encodings and structuring devices for electronic mail. This
|
|
is an extensible format, and initially standardizes a type of audio
|
|
data dubbed "audio/basic", which is 8-bit U-LAW data sampled at 8000
|
|
samples/sec.
|
|
|
|
Finally, one format that doesn't really belong here is "MOD",
usually with extension ".mod" or ".nst" (on PCs, that is -- on Amigas
the files have a *prefix* of "mod."). These files contain short clips
of sounds plus sequencing information. This makes for fairly compact
files but limits them to music built from samples of a piano, a
trumpet, etc.
|
|
|
|
Headerless file formats
|
|
-----------------------
|
|
|
|
extension      origin          parameters
or name

.snd, .fssd    Mac, PC         variable rate, 1 channel, 8 bits unsigned
.ul            US telephony    8 k, 1 channel, 8 bit "U-LAW" encoding
.snd?          Amiga           variable rate, 1 channel, 8 bits signed
|
|
|
|
It is usually easy to distinguish 8-bit signed formats from unsigned
|
|
by looking at the beginning of the data with 'od -b <file | head';
|
|
since most sounds start with a little bit of silence containing small
|
|
amounts of background noise, the signed formats will have an abundance
|
|
of bytes with values 0376, 0377, 0, 1, 2, while the unsigned formats
|
|
will have 0176, 0177, 0200, 0201, 0202 instead. (Using "od -c" will
|
|
also show any headers that are tacked in front of the file.)
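
The same eyeball test can be automated. The Python sketch below counts
how many bytes near the start of the data lie close to 0 (or wrap
around to 0377) versus close to 0200. The window size, the tolerance
and the function name are arbitrary choices for this example, and the
heuristic fails on files that do not start with near-silence.

```python
def guess_8bit_signedness(data, window=512, spread=8):
    """Guess 'signed' or 'unsigned' for headerless 8-bit sample data."""
    head = data[:window]
    if not head:
        return "unknown"
    # Signed silence hovers around 0, wrapping to 0376, 0377 below zero.
    near_zero = sum(1 for b in head if b <= spread or b >= 256 - spread)
    # Unsigned silence hovers around 0200 (128).
    near_mid = sum(1 for b in head if abs(b - 128) <= spread)
    if near_zero > near_mid:
        return "signed"
    if near_mid > near_zero:
        return "unsigned"
    return "unknown"
```

The two byte patterns quoted above come out as "signed" and "unsigned"
respectively.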
|
|
|
|
The Apple IIgs records raw data in the same format as the Mac, but
|
|
uses a 0 byte as a terminator; samples with value 0 are replaced by 1.
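
In other words, converting between Mac and Apple IIgs raw data is a
matter of handling that one reserved byte value. A minimal Python
sketch, with function names invented for this example:

```python
def mac_raw_to_iigs(data):
    """Replace 0 samples by 1 and append the 0 terminator."""
    return bytes(1 if b == 0 else b for b in data) + b"\x00"


def iigs_to_mac_raw(data):
    """Cut the data at the first 0 byte (the terminator), if any."""
    end = data.find(b"\x00")
    return data if end < 0 else data[:end]
```

The conversion to IIgs format is slightly lossy: samples that were 0
come back as 1.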
|
|
|
|
|
|
File conversions
|
|
----------------
|
|
|
|
SOX
|
|
---
|
|
|
|
The most versatile tool for converting between various audio formats
|
|
is SOX ("Sound Exchange"). It can read and write various types of
|
|
audio files, and optionally applies some special effects (e.g. echo,
|
|
channel averaging, or rate conversion).
|
|
|
|
SOX recognizes all filename extensions listed above except ".snd",
|
|
which would be ambiguous anyway, and ".wav" (but there's a patch, see
|
|
below). Use type ".au" for NeXT ".snd" files. Mac and PC ".snd"
|
|
files are completely described by these parameters:
|
|
|
|
-t raw -b -u -r 11000
|
|
|
|
(or -r 22000 or -r 7333 or -r 5500; 11000 seems to be the most common
|
|
rate).
|
|
|
|
The source for SOX, version 5, was posted to alt.sources, and should
|
|
be widely archived. To save you the trouble of hunting it down, it
|
|
can be gotten by anonymous ftp from wuarchive.wustl.edu, in the
|
|
directory usenet/alt.sources/articles, files 5581.Z through 5585.Z.
|
|
(These files are compressed news articles containing shar files, if
|
|
you hadn't guessed.) I am sure many sites have similar archives; I'm
|
|
just listing one that I know of and which carries a lot of this kind
|
|
of stuff. (Also see the appendix if you don't have Internet access.)
|
|
|
|
A compressed tar file containing the same version of SOX is available
|
|
by anonymous ftp from ftp.cwi.nl [192.16.184.180], in /pub/sox*.tar.Z.
|
|
You may be able to locate a nearer version using archie!
|
|
|
|
Ports of SOX:
|
|
|
|
- The source as posted should compile on any UNIX system with 4-byte
|
|
integers.
|
|
|
|
- A PC version is available by ftp from ftp.cwi.nl (see above) as
|
|
pub/sox4*.zip; also available from the garbo mail server.
|
|
|
|
- The latest Amiga SOX (corresponding to version 5) is available via
|
|
anonymous ftp to wuarchive.wustl.edu, files
|
|
systems/amiga/audio/utils/amisox*. (See below for a non-SOX
|
|
solution.)
|
|
|
|
- Work is currently in progress to get SOX ported to VMS (watch
|
|
comp.os.vms for announcements).
|
|
|
|
SOX usage hints:
|
|
|
|
- Often, the filename extension of sound files posted on the net is
|
|
wrong. Don't give up, try a few other possibilities using the
|
|
"-t <type>" option. Remember that the most common file type is
|
|
unsigned bytes, which can be indicated with "-t ub". You'll have to
|
|
guess the proper sampling rate, but often it's 11k or 22k.
|
|
|
|
- In particular, with SOX version 4 (or earlier), you have to
|
|
specify "-t 8svx" for files with an .iff extension.
|
|
|
|
- When converting linear samples to U-LAW using the .au type for the
|
|
output file, you must specify "-U" for the output file, otherwise
|
|
you will end up with a file containing a NeXT/Sun header but linear
|
|
samples -- only the NeXT will play such files correctly. Also, you
|
|
must explicitly specify an output sampling rate with "-r 8000".
|
|
(This may seem fixed for most cases in version 5, but it is still
|
|
occasionally necessary, so I'm keeping this warning in.)
|
|
|
|
Sun Sparc
|
|
---------
|
|
|
|
On Sun Sparcs, starting at SunOS 4.1, a program "raw2audio" is
|
|
provided by Sun (in /usr/demo/SOUND -- see below) which takes a raw
|
|
U-LAW file and turns it into a ".au" file by prefixing it with an
|
|
appropriate header.
|
|
|
|
NeXT
|
|
----
|
|
|
|
On NeXTs, you can usually rename .au files to .snd and it'll work like
|
|
a charm, but some .au files lack header info that the NeXT needs.
|
|
This can be fixed by using sndconvert:
|
|
|
|
sndconvert -c 1 -f 1 -s 8012.8210513 -o nextfile.snd sunfile.au
|
|
|
|
SGI Indigo and Personal IRIS
|
|
----------------------------
|
|
|
|
SGI supports a program sfconvert, similar in spirit to SOX (in
|
|
/usr/sbin in IRIX version 4.0). Also note that the sfplay program
|
|
(see the next section) can do on-the-fly conversion for several
|
|
popular formats.
|
|
|
|
Amiga
|
|
-----
|
|
|
|
Mike Cramer's SoundZAP can do no effects except rate change and it
|
|
only does conversions to IFF, but it is generally much faster than
|
|
SOX. (Ftp'able from the same directory as amisox above.)
|
|
|
|
Tandy
|
|
-----
|
|
|
|
The Tandy 1000 uses a (proprietary?) compressed format. There is a
|
|
PD Mac to Tandy conversion program called CONVERT.
|
|
|
|
|
|
Playing audio files on UNIX
|
|
---------------------------
|
|
|
|
The commands needed to play an audio file depend on the file format
|
|
and the available hardware and software. Most systems can only
|
|
directly play sound in their native format; use a conversion program
|
|
(see above) to play other formats.
|
|
|
|
Sun Sparc
|
|
---------
|
|
|
|
Raw U-LAW files can be played using "cat file >/dev/audio".
|
|
|
|
A whole package for dealing with ".au" files is provided by Sun on an
|
|
experimental basis, in /usr/demo/SOUND. You may have to compile the
|
|
programs first. (If you can't find this directory, either you are not
|
|
running SunOS 4.1 yet, or your system administrator hasn't installed
|
|
it -- go ask him for it, not me!) The program "play" in this
|
|
directory recognizes all files in Sun/NeXT format, but can play only
|
|
those using U-LAW encoding at 8 k.
|
|
|
|
You can also cat a ".au" file to /dev/audio, if it uses U-LAW; the
|
|
header will sound like a short burst of noise but the rest of the data
|
|
will sound OK (really, the only difference in this case between raw
|
|
U-LAW and ".au" files is the header; the U-LAW data is exactly the
|
|
same).
|
|
|
|
Finally, OpenWindows 3.0 has a full-fledged audio tool. You can drop
|
|
audio file icons into it, edit them, etc.
|
|
|
|
NeXT
|
|
----
|
|
|
|
On NeXT machines, the standard "sndplay" program can play all NeXT
|
|
format files (this includes Sun ".au" files). It supports at least
|
|
U-LAW at 8 k and 16 bits samples at 22 or 44.1 k. It attempts
|
|
on-the-fly conversions for other formats.
|
|
|
|
Sound files are also played if you double-click on them in the file
|
|
browser.
|
|
|
|
SGI Indigo and Personal IRIS
|
|
----------------------------
|
|
|
|
On SGI Indigo and the 4D/30 and /35 Personal IRIS workstations, the
|
|
program "sfplay" (in /usr/sbin) plays AIFF files, if the sampling rate
|
|
is one of 8000, 11025, 16000, 22050, 32000, 44100, or 48000 (the
|
|
library interface to the hardware doesn't support other rates -- I
|
|
don't know what the hardware is actually capable of). On the Personal
|
|
IRIS, you need to have the audio board installed (check the output
|
|
from hinv) and you must run IRIX 3.3.2 or 4.0 or higher.
|
|
|
|
There is no simple /dev/audio interface on these SGI machines. (There
|
|
was one on 4D/25 machines, reading and writing signed linear 8-bit
|
|
samples at rates of 8, 16 and 32 k; unfortunately the board design
|
|
caused a lot of noise from the CPU board to clutter the audio signals.)
|
|
|
|
A program "playulaw" was posted as part of the "radio 1.0" release
|
|
that I posted to alt.sources recently; it plays raw U-LAW files on the
|
|
Indigo or Personal IRIS audio hardware.
|
|
|
|
Sony NEWS
|
|
---------
|
|
|
|
The Sony RISC-NEWS line (NWS-3250 laptop, NWS-37xx desktop, NWS-38xx
|
|
desktop w/ IOP) also has builtin sound capabilities. You can also buy
|
|
external boards for the older NEWS machines or to add extra channels
|
|
to the new machines. In the default mode (8k/8-bit), Sun .au files
|
|
are directly supported (you can 'cat' .au files to /dev/sb and have
|
|
them play).
|
|
|
|
Vaxstation 4000
|
|
---------------
|
|
|
|
".au" files can be played by COPYING them to device "SOA0:". This
|
|
device is set up by enabling the driver SODRIVER, as described below:
|
|
|
|
DEC's sound stuff is like most other new toys: hardware first, THEN the
|
|
software. DEC will soon be releasing a layered product called DECsound,
|
|
which will let you record, play, and (possibly) manipulate sound files.
|
|
Third party product(s) have ALREADY hit the market.
|
|
|
|
Enabling SODRIVER: (you can use the following command file)
|
|
|
|
$!---------------- cut here -------------------------------
|
|
$! sound_setup.com enable SOUND driver
|
|
$ run sys$system:sysgen
|
|
connect soa0 /adapter=0 /csr=%x0e00 /vector=%o304 /driver=sodriver
|
|
exit
|
|
$ exit
|
|
$!----------------- cut here ------------------------------------
|
|
|
|
The external audio port comes with a telephone-jack-like port. For
|
|
starters, you can plug a telephone RECEIVER right into this port to
|
|
hear your first sound files. After that, you can use the adapter
|
|
(that came with the VaxStation), and plug in a small set of stereo
|
|
speakers (the kind you'd plug into a WALKMAN, for example), for more
|
|
volume.
|
|
|
|
Others
|
|
------
|
|
|
|
Most other UNIX boxes don't have audio hardware and thus can't play
|
|
audio data.
|
|
|
|
|
|
Playing audio files on micros
|
|
-----------------------------
|
|
|
|
Most micros have at least a speaker built in, so theoretically all you
|
|
need is the right software. Unfortunately most systems don't come
|
|
bundled with sound-playing software, so there are many public domain
|
|
or shareware software packages, each with their own bugs and features.
|
|
Most separate sound recording hardware also comes with playing
|
|
software, most of which can play sound (in the file format used by
|
|
that hardware) even on machines that don't have that hardware
|
|
installed.
|
|
|
|
Chris S. Craig announces the following software for PCs:
|
|
|
|
ScopeTrax This is a complete PC sound player/editor package. Sounds
|
|
can be played back at ANY rate between 1 kHz and 65 kHz through
|
|
the PC speaker or the Sound Blaster. It supports several
|
|
file formats including VOC, IFF/8SVX, raw signed and raw
|
|
unsigned. A separate executable is provided to convert
|
|
.au and mu-law to raw format. ScopeTrax requires EGA/VGA
|
|
graphics for editing and displaying sounds on a REALTIME
|
|
oscilloscope. The package also includes:
|
|
* An expanded memory player which can play sounds
|
|
larger than 640K in size.
|
|
* Basic (rough) sound compression/uncompression
|
|
utilities.
|
|
* Complete documentation.
|
|
The package is FREEWARE! It is available on SIMTEL in the
|
|
PD1:[MSDOS.SOUND] directory.
|
|
|
|
One of the appendices below contains a list of more programs to play
|
|
sound on the PC.
|
|
|
|
For sounds on Atari STs - programs are in the atari/sound/players
|
|
directory on atari.archive.umich.edu (141.211.164.8).
|
|
|
|
Malcolm Slaney from Apple writes:
|
|
|
|
"We do have tools to play sound back on most of our Unix hosts. We wrote
|
|
a program called TcpPlay that lets us read a sound file on a Unix host,
|
|
open a TCP/IP connection to the Mac on my desk, and plays the file. We
|
|
think of it as X windows for sound (at least a step in that direction.)
|
|
|
|
This software is available for anonymous FTP from ftp.apple.com.
|
|
Look for ~ftp/pub/TcpPlay/TcpPlay.sit.hqx.
|
|
|
|
Finally, there are MANY tools for working with sound on the Macintosh. Three
|
|
applications that come to mind immediately are SoundEdit (formerly by
|
|
Farallon and now by MacroMind/Paracomp), Alchemy and Eric Keller's Signalyze.
|
|
There are lots of other tools available for sound editing (including some
|
|
of the QuickTime Movie tools.)"
|
|
|
|
On a Tandy 1000, sounds can be played and recorded with DeskMate Sound
|
|
(SOUND.PDM), or, if they are not stored in compressed format, they can
also be played by a program called PLAYSND. There is no indication of
whether PLAYSND is PD. It hasn't been updated since March of 89.
|
|
|
|
The Sound Site Newsletter
|
|
-------------------------
|
|
|
|
An electronic publication with lots of info about digitised sound and
|
|
sound formats, albeit mostly on micros, is "The Sound Site
|
|
Newsletter". So far, 8 issues have appeared, the last in January
|
|
1992. Issues can be ftp'ed from saffron.inset.com, directory
pub/rogue/newsletters, or from ccb.ucsf.edu, directory
Pub/Sound_list/Sound.Newsletters.
|
|
|
|
|
|
Posting sounds
|
|
--------------
|
|
|
|
The newsgroup alt.binaries.sounds.misc is dedicated to postings
|
|
containing sound. (Discussions related to such postings belong in
|
|
alt.binaries.sounds.d.)
|
|
|
|
There is no set standard for posting sounds; uuencoded files in most
|
|
popular formats are welcome, if split in parts under 50 kBytes. To
|
|
accommodate automatic decoding software (such as the ":decode" command
of the nn newsreader), please place a part indicator of the form
(mm/nn) at the end of your subject, meaning this is part mm of a
total of nn parts.
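
For those who script their postings, here is a rough Python sketch of
the splitting step. It uuencodes the data with the standard binascii
module, cuts the result into parts that stay under 50 kBytes, and
builds subject lines with the part indicator. The function names, the
file mode "644" and the subject layout beyond the "(mm/nn)" suffix are
assumptions made for this example.

```python
import binascii


def uuencode_lines(data, name, mode="644"):
    """Return the uuencoded file as a list of byte-string lines."""
    lines = [("begin %s %s\n" % (mode, name)).encode("ascii")]
    for i in range(0, len(data), 45):      # 45 input bytes per line
        lines.append(binascii.b2a_uu(data[i:i + 45]))
    lines.append(binascii.b2a_uu(b""))     # zero-length line ends the data
    lines.append(b"end\n")
    return lines


def split_parts(lines, max_bytes=50000):
    """Group whole lines into parts that each stay under max_bytes."""
    parts, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) >= max_bytes:
            parts.append(b"".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        parts.append(b"".join(current))
    return parts


def subjects(title, n_parts):
    """Subject lines ending in the (mm/nn) part indicator."""
    return ["%s (%d/%d)" % (title, i, n_parts) for i in range(1, n_parts + 1)]
```

Splitting on line boundaries keeps every part decodable line by line
once the parts are concatenated in order.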
|
|
|
|
It is recommended to post sounds in the format that was used for the
|
|
original recording; conversions to other formats often lose
|
|
information and would do people with identical hardware as the poster
|
|
no favor. For instance, converting 8-bit linear sound to U-LAW loses
|
|
the lower few bits of the data, and rate changing conversions almost
|
|
always add noise. Converting from U-LAW to linear requires expansion
|
|
to 16 bit samples if no information loss is allowed!
|
|
|
|
U-LAW data is best posted with a NeXT/Sun header.
|
|
|
|
If you have to post a file in a headerless format (usually 8-bit
|
|
linear, like ".snd"), please add a description giving at least the
|
|
sampling rate and whether the bytes are signed (zero at 0) or unsigned
|
|
(zero at 0200). However, it is highly recommended to add a header
|
|
that indicates the sampling rate and encoding scheme; if necessary you
|
|
can use SOX to add a header of your choice to raw data.
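
If SOX is not at hand, a header can also be added with a few lines of
code. The Python sketch below swaps in a different tool: it uses the
standard "wave" module to wrap raw 8-bit mono data in a RIFF WAVE
header, which stores 8-bit samples unsigned. The function names are
invented for this example, and the sampling rate still has to be known
or guessed by the caller.

```python
import io
import wave


def signed8_to_unsigned8(raw):
    """Shift signed bytes (zero at 0) to unsigned bytes (zero at 0200)."""
    return bytes((b + 128) & 0xFF for b in raw)


def wrap_raw_unsigned8_as_wave(raw, rate, out):
    """Write raw unsigned 8-bit mono samples to 'out' with a WAVE header.

    'out' may be a file name or a seekable binary file object.
    """
    with wave.open(out, "wb") as w:
        w.setnchannels(1)    # headerless formats listed here are mono
        w.setsampwidth(1)    # one byte per sample
        w.setframerate(rate)
        w.writeframes(raw)


# Example: one hundred samples of silence at 11000 samples/sec.
buf = io.BytesIO()
wrap_raw_unsigned8_as_wave(bytes([128] * 100), 11000, buf)
```

Signed data (such as the headerless Amiga format) should be passed
through signed8_to_unsigned8() first.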
|
|
|
|
Compression of sound files usually isn't worth it; the standard
|
|
"compress" algorithm doesn't save much when applied to sound data
|
|
(typically at most 10-20 percent), and compression algorithms
|
|
specifically designed for sound (e.g. NeXT's) are usually
|
|
proprietary. (See also the section "Compression schemes" earlier.)
|
|
|
|
|
|
Appendices
|
|
==========
|
|
|
|
Here are some more detailed pieces of info that I received by e-mail.
|
|
They are reproduced here with virtually no editing.
|
|
|
|
------------------------------------------------------------------------
|
|
FTP access for non-internet sites
|
|
---------------------------------
|
|
|
|
From the sci.space FAQ:
|
|
|
|
Sites not connected to the Internet cannot use FTP directly, but
|
|
there are a few automated FTP servers which operate via email.
|
|
Send mail containing only the word HELP to ftpmail@decwrl.dec.com
|
|
or bitftp@pucc.princeton.edu, and the servers will send you
|
|
instructions on how to make requests.
|
|
|
|
Also:
|
|
|
|
FAQ lists are available by anonymous FTP from pit-manager.mit.edu
|
|
(18.72.1.58) and by email from mail-server@pit-manager.mit.edu (send
|
|
a message containing "help" for instructions about the mail server).
|
|
|
|
|
|
------------------------------------------------------------------------
|
|
AIFF Format (Audio IFF) and AIFC
|
|
--------------------------------
|
|
|
|
This format was developed by Apple for storing high-quality sampled
|
|
sound and musical instrument info; it is also used by SGI and several
|
|
professional audio packages (sorry, I know no names). An extension,
|
|
called AIFC or AIFF-C, supports compression (see the last item below).
|
|
|
|
I've made a BinHex'ed MacWrite version of the AIFF spec (no idea if
|
|
it's the same text as mentioned below) available by anonymous ftp from
|
|
ftp.cwi.nl [192.16.184.180]; the file is /pub/AudioIFF1.2.hqx. But
|
|
you may be better off with the AIFF-C specs, see below.
|
|
|
|
Mike Brindley (brindley@ece.orst.edu) writes:
|
|
|
|
"The complete AIFF spec by Steve Milne, Matt Deatherage (Apple) is
|
|
available in 'AMIGA ROM Kernel Reference Manual: Devices (3rd Edition)'
|
|
1991 by Commodore-Amiga, Inc.; Addison-Wesley Publishing Co.;
|
|
ISBN 0-201-56775-X, starting on page 435 (this edition has a charcoal
|
|
grey cover). It is available in most bookstores, and soon in many
|
|
good libraries."
|
|
|
|
Finally, Mark Callow writes (in comp.sys.sgi):
|
|
|
|
"I have placed a PostScript version of the AIFF-C specification on
|
|
sgi.sgi.com for public ftp. It is in the file sgi/aiff-c.9.26.91.ps.
|
|
|
|
sgi.sgi.com's internet host number is (I think) 192.48.153.1."
|
|
|
|
------------------------------------------------------------------------
|
|
The NeXT/Sun audio file format
|
|
------------------------------
|
|
|
|
Here's the complete story on the file format, from the NeXT
|
|
documentation. (Note that the "magic" number is ((int)0x2e736e64),
|
|
which equals ".snd".) Also, at the end, I've added a little document
|
|
that someone posted to the net a couple of years ago, that describes
|
|
the format in a bit-by-bit fashion rather than from C.
|
|
|
|
I received this from Doug Keislar, NeXT Computer. This is also the
|
|
Sun format, except that Sun doesn't recognize as many format codes. I
|
|
added the numeric codes to the table of formats and sorted it.
|
|
|
|
|
|
SNDSoundStruct: How a NeXT Computer Represents Sound
|
|
|
|
The NeXT sound software defines the SNDSoundStruct structure to
|
|
represent sound. This structure defines the soundfile and Mach-O
|
|
sound segment formats and the sound pasteboard type. It's also used
|
|
to describe sounds in Interface Builder. In addition, each instance
|
|
of the Sound Kit's Sound class encapsulates a SNDSoundStruct and
|
|
provides methods to access and modify its attributes.
|
|
|
|
Basic sound operations, such as playing, recording, and cut-and-paste
|
|
editing, are most easily performed by a Sound object. In many cases,
|
|
the Sound Kit obviates the need for in-depth understanding of the
|
|
SNDSoundStruct architecture. For example, if you simply want to
|
|
incorporate sound effects into an application, or to provide a simple
|
|
graphic sound editor (such as the one in the Mail application), you
|
|
needn't be aware of the details of the SNDSoundStruct. However, if
|
|
you want to closely examine or manipulate sound data you should be
|
|
familiar with this structure.
|
|
|
|
The SNDSoundStruct contains a header, information that describes the
|
|
attributes of a sound, followed by the data (usually samples) that
|
|
represents the sound. The structure is defined (in
|
|
sound/soundstruct.h) as:
|
|
|
|
    typedef struct {
        int magic;          /* magic number SND_MAGIC */
        int dataLocation;   /* offset or pointer to the data */
        int dataSize;       /* number of bytes of data */
        int dataFormat;     /* the data format code */
        int samplingRate;   /* the sampling rate */
        int channelCount;   /* the number of channels */
        char info[4];       /* optional text information */
    } SNDSoundStruct;
|
|
|
|
|
|
|
|
|
|
SNDSoundStruct Fields
|
|
|
|
|
|
|
|
magic
|
|
|
|
magic is a magic number that's used to identify the structure as a
|
|
SNDSoundStruct. Keep in mind that the structure also defines the
|
|
soundfile and Mach-O sound segment formats, so the magic number is
|
|
also used to identify these entities as containing a sound.
|
|
|
|
|
|
|
|
|
|
|
|
dataLocation
|
|
|
|
It was mentioned above that the SNDSoundStruct contains a header
|
|
followed by sound data. In reality, the structure only contains the
|
|
header; the data itself is external to, although usually contiguous
|
|
with, the structure. (Nonetheless, it's often useful to speak of the
|
|
SNDSoundStruct as the header and the data.) dataLocation is used to
|
|
point to the data. Usually, this value is an offset (in bytes) from
|
|
the beginning of the SNDSoundStruct to the first byte of sound data.
|
|
The data, in this case, immediately follows the structure, so
|
|
dataLocation can also be thought of as the size of the structure's
|
|
header. The other use of dataLocation, as an address that locates
|
|
data that isn't contiguous with the structure, is described in
|
|
"Format Codes," below.
|
|
|
|
|
|
|
|
|
|
|
|
dataSize, dataFormat, samplingRate, and channelCount
|
|
|
|
These fields describe the sound data.
|
|
|
|
dataSize is its size in bytes (not including the size of the
|
|
SNDSoundStruct).
|
|
|
|
dataFormat is a code that identifies the type of sound. For sampled
|
|
sounds, this is the quantization format. However, the data can also
|
|
be instructions for synthesizing a sound on the DSP. The codes are
|
|
listed and explained in "Format Codes," below.
|
|
|
|
samplingRate is the sampling rate (if the data is samples). Three
|
|
sampling rates, represented as integer constants, are supported by
|
|
the hardware:
|
|
|
|
    Constant          Sampling Rate (samples/sec)

    SND_RATE_CODEC    8012.821   (CODEC input)
    SND_RATE_LOW      22050.0    (low sampling rate output)
    SND_RATE_HIGH     44100.0    (high sampling rate output)
|
|
|
|
channelCount is the number of channels of sampled sound.
|
|
|
|
|
|
|
|
|
|
|
|
info
|
|
|
|
info is a NULL-terminated string that you can supply to provide a
|
|
textual description of the sound. The size of the info field is set
|
|
when the structure is created and thereafter can't be enlarged. It's
|
|
at least four bytes long (even if it's unused).
|
|
|
|
|
|
|
|
|
|
|
|
Format Codes
|
|
|
|
A sound's format is represented as a positive 32-bit integer. NeXT
|
|
reserves the integers 0 through 255; you can define your own format
|
|
and represent it with an integer greater than 255. Most of the
|
|
formats defined by NeXT describe the amplitude quantization of
|
|
sampled sound data:
|
|
|
|
    Value  Code                              Format

    0      SND_FORMAT_UNSPECIFIED            unspecified format
    1      SND_FORMAT_MULAW_8                8-bit mu-law samples
    2      SND_FORMAT_LINEAR_8               8-bit linear samples
    3      SND_FORMAT_LINEAR_16              16-bit linear samples
    4      SND_FORMAT_LINEAR_24              24-bit linear samples
    5      SND_FORMAT_LINEAR_32              32-bit linear samples
    6      SND_FORMAT_FLOAT                  floating-point samples
    7      SND_FORMAT_DOUBLE                 double-precision float samples
    8      SND_FORMAT_INDIRECT               fragmented sampled data
    9      SND_FORMAT_NESTED                 ?
    10     SND_FORMAT_DSP_CORE               DSP program
    11     SND_FORMAT_DSP_DATA_8             8-bit fixed-point samples
    12     SND_FORMAT_DSP_DATA_16            16-bit fixed-point samples
    13     SND_FORMAT_DSP_DATA_24            24-bit fixed-point samples
    14     SND_FORMAT_DSP_DATA_32            32-bit fixed-point samples
    15     ?
    16     SND_FORMAT_DISPLAY                non-audio display data
    17     SND_FORMAT_MULAW_SQUELCH          ?
    18     SND_FORMAT_EMPHASIZED             16-bit linear with emphasis
    19     SND_FORMAT_COMPRESSED             16-bit linear with compression
    20     SND_FORMAT_COMPRESSED_EMPHASIZED  A combination of the two above
    21     SND_FORMAT_DSP_COMMANDS           Music Kit DSP commands
    22     SND_FORMAT_DSP_COMMANDS_SAMPLES   ?
|
|
|
|
|
|
Most formats identify different sizes and types of
|
|
sampled data. Some deserve special note:
|
|
|
|
|
|
-- SND_FORMAT_DSP_CORE format contains data that represents a
|
|
loadable DSP core program. Sounds in this format are required by the
|
|
SNDBootDSP() and SNDRunDSP() functions. You create a
|
|
SND_FORMAT_DSP_CORE sound by reading a DSP load file (extension
|
|
".lod") with the SNDReadDSPfile() function.
|
|
|
|
-- SND_FORMAT_DSP_COMMANDS is used to distinguish sounds that
|
|
contain DSP commands created by the Music Kit. Sounds in this format
|
|
can only be created through the Music Kit's Orchestra class, but can
|
|
be played back through the SNDStartPlaying() function.
|
|
|
|
-- SND_FORMAT_DISPLAY format is used by the Sound Kit's
|
|
SoundView class. Such sounds can't be played.
|
|
|
|
|
|
-- SND_FORMAT_INDIRECT indicates data that has become
|
|
fragmented, as described in a separate section, below.
|
|
|
|
|
|
-- SND_FORMAT_UNSPECIFIED is used for unrecognized formats.
|
|
|
|
|
|
|
|
|
|
|
|
Fragmented Sound Data
|
|
|
|
Sound data is usually stored in a contiguous block of memory.
|
|
However, when sampled sound data is edited (such that a portion of
|
|
the sound is deleted or a portion inserted), the data may become
|
|
discontiguous, or fragmented. Each fragment of data is given its own
|
|
SNDSoundStruct header; thus, each fragment becomes a separate
|
|
SNDSoundStruct structure. The addresses of these new structures are
|
|
collected into a contiguous, NULL-terminated block; the dataLocation
|
|
field of the original SNDSoundStruct is set to the address of this
|
|
block, while the original format, sampling rate, and channel count
|
|
are copied into the new SNDSoundStructs.
|
|
|
|
|
|
Fragmentation serves one purpose: It avoids the high cost of moving
|
|
data when the sound is edited. Playback of a fragmented sound is
|
|
transparent; you never need to know whether the sound is fragmented
|
|
before playing it. However, playback of a heavily fragmented sound
|
|
is less efficient than that of a contiguous sound. The
|
|
SNDCompactSamples() C function can be used to compact fragmented
|
|
sound data.
|
|
|
|
Sampled sound data is naturally unfragmented. A sound that's freshly
|
|
recorded or retrieved from a soundfile, the Mach-O segment, or the
|
|
pasteboard won't be fragmented. Keep in mind that only sampled data
|
|
can become fragmented.
|
|
|
|
|
|
|
|
_________________________
|
|
From mentor.cc.purdue.edu!purdue!decwrl!ucbvax!ziploc!eps Wed Apr 4
|
|
23:56:23 EST 1990
|
|
Article 5779 of comp.sys.next:
|
|
Path: mentor.cc.purdue.edu!purdue!decwrl!ucbvax!ziploc!eps
|
|
From: eps@toaster.SFSU.EDU (Eric P. Scott)
|
|
Newsgroups: comp.sys.next
|
|
Subject: Re: Format of NeXT sndfile headers?
|
|
Message-ID: <445@toaster.SFSU.EDU>
|
|
Date: 31 Mar 90 21:36:17 GMT
|
|
References: <14978@phoenix.Princeton.EDU>
|
|
Reply-To: eps@cs.SFSU.EDU (Eric P. Scott)
|
|
Organization: San Francisco State University
|
|
Lines: 42
|
|
|
|
In article <14978@phoenix.Princeton.EDU>
|
|
bskendig@phoenix.Princeton.EDU (Brian Kendig) writes:
|
|
>I'd like to take a program I have that converts Macintosh sound
|
|
>files
|
|
>to NeXT sndfiles and polish it up a bit to go the other direction as
|
|
>well.
|
|
|
|
Two people have already submitted programs that do this
|
|
(Christopher Lane and Robert Hood); check the various
|
|
NeXT archive sites.
|
|
|
|
> Could someone please give me the format of a NeXT sndfile
|
|
>header?
|
|
|
                "big-endian"
        0       1       2       3
    +-------+-------+-------+-------+
 0  | 0x2e  | 0x73  | 0x6e  | 0x64  |   "magic" number
    +-------+-------+-------+-------+
 4  |                               |   data location
    +-------+-------+-------+-------+
 8  |                               |   data size
    +-------+-------+-------+-------+
12  |                               |   data format (enum)
    +-------+-------+-------+-------+
16  |                               |   sampling rate (int)
    +-------+-------+-------+-------+
20  |                               |   channel count
    +-------+-------+-------+-------+
24  |       |       |       |       |   (optional) info string
|
|
|
|
28 = minimum value for data location
|
|
|
|
data format values can be found in /usr/include/sound/soundstruct.h
|
|
|
|
Most common combinations:
|
|
|
|
                   sampling  channel  data
                   rate      count    format
    voice file     8012      1        1 =  8-bit mu-law
    system beep    22050     2        3 = 16-bit linear
    CD-quality     44100     2        3 = 16-bit linear
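The header layout described in this appendix can be read with a few lines of C. This is a sketch under stated assumptions rather than NeXT's or Sun's code: the type snd_header and the functions snd_be32 and snd_parse_header are hypothetical names, the header is assumed to be in memory already, and the fields are decoded byte by byte so the code does not depend on the byte order of the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SND_MAGIC        0x2e736e64u  /* the bytes ".snd" */
#define SND_FIXED_FIELDS 24u          /* six 32-bit fields before the info string */

/* Decoded copy of the fixed header fields (hypothetical type). */
typedef struct {
    uint32_t data_location;  /* offset of the first byte of sound data */
    uint32_t data_size;      /* number of bytes of data */
    uint32_t data_format;    /* format code, e.g. 1 = 8-bit mu-law */
    uint32_t sampling_rate;  /* samples per second */
    uint32_t channel_count;  /* number of channels */
} snd_header;

/* Read a 32-bit big-endian value. */
static uint32_t snd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer does not look like a sound file. */
int snd_parse_header(const uint8_t *buf, size_t len, snd_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < SND_FIXED_FIELDS || snd_be32(buf) != SND_MAGIC)
        return -1;
    out->data_location = snd_be32(buf + 4);
    out->data_size     = snd_be32(buf + 8);
    out->data_format   = snd_be32(buf + 12);
    out->sampling_rate = snd_be32(buf + 16);
    out->channel_count = snd_be32(buf + 20);
    /* The text gives 28 as the minimum data location; this sketch only
     * rejects offsets that would land inside the fixed fields. */
    return out->data_location >= SND_FIXED_FIELDS ? 0 : -1;
}
```

A caller would then seek to data_location and read data_size bytes, interpreting them according to data_format.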
|
|
|
|
------------------------------------------------------------------------
|
|
IFF/8SVX Format
|
|
---------------
|
|
|
|
Newsgroups: alt.binaries.sounds.d,alt.sex.sounds
|
|
Subject: Format of the IFF header (Amiga sounds)
|
|
Message-ID: <2509@tardis.Tymnet.COM>
|
|
From: jms@tardis.Tymnet.COM (Joe Smith)
|
|
Date: 23 Oct 91 23:54:38 GMT
|
|
Followup-To: alt.binaries.sounds.d
|
|
Organization: BT North America (Tymnet)
|
|
|
|
The first 12 bytes of an IFF file are used to distinguish between an Amiga
|
|
picture (FORM-ILBM), an Amiga sound sample (FORM-8SVX), or other file
|
|
conforming to the IFF specification. The middle 4 bytes is the count of
|
|
bytes that follow the "FORM" and byte count longwords. (Numbers are stored
|
|
in M68000 form, high order byte first.)
|
|
|
|
------------------------------------------
|
|
|
|
FutureSound audio file, 15000 samples at 10.000KHz, file is 15048 bytes long.
|
|
|
|
0000: 464F524D 00003AC0 38535658 56484452   FORM..:.8SVXVHDR
      F O R M     15040 8 S V X  V H D R
0010: 00000014 00003A98 00000000 00000000   ......:.........
            20    15000        0        0
0020: 27100100 00010000 424F4459 00003A98   '.......BODY..:.
      10000 1 0     1.0 B O D Y     15000
|
|
|
|
0000000..03 = "FORM", identifies this as an IFF format file.
|
|
FORM+00..03 (ULONG) = number of bytes that follow. (Unsigned long int.)
|
|
FORM+04..07 = "8SVX", identifies this as an 8-bit sampled voice.
|
|
|
|
????+00..03 = "VHDR", Voice8Header, describes the parameters for the BODY.
|
|
VHDR+00..03 (ULONG) = number of bytes to follow.
|
|
VHDR+04..07 (ULONG) = samples in the high octave 1-shot part.
|
|
VHDR+08..0B (ULONG) = samples in the high octave repeat part.
|
|
VHDR+0C..0F (ULONG) = samples per cycle in high octave (if repeating), else 0.
|
|
VHDR+10..11 (UWORD) = samples per second. (Unsigned 16-bit quantity.)
|
|
VHDR+12 (UBYTE) = number of octaves of waveforms in sample.
|
|
VHDR+13 (UBYTE) = data compression (0=none, 1=Fibonacci-delta encoding).
|
|
VHDR+14..17 (FIXED) = volume. (The number 65536 means 1.0 or full volume.)
|
|
|
|
????+00..03 = "BODY", identifies the start of the audio data.
|
|
BODY+00..03 (ULONG) = number of bytes to follow.
|
|
BODY+04..NNNNN = Data, signed bytes, from -128 to +127.
|
|
|
|
0030: 04030201 02030303 04050605 05060605
|
|
0040: 06080806 07060505 04020202 01FF0000
|
|
0050: 00000000 FF00FFFF FFFEFDFD FDFEFFFF
|
|
0060: FDFDFF00 00FFFFFF 00000000 00FFFF00
|
|
0070: 00000000 00FF0000 00FFFEFF 00000000
|
|
0080: 00010000 000101FF FF0000FE FEFFFFFE
|
|
0090: FDFDFEFD FDFFFFFC FDFEFDFD FEFFFEFE
|
|
00A0: FFFEFEFE FEFEFEFF FFFFFEFF 00FFFF01
|
|
|
|
This small section of the audio sample shows values ranging from -4 (0xFC)
|
|
to +8 (0x08). Warning: Do not assume that the BODY starts 48 bytes into the
|
|
file. In addition to "VHDR", chunks labeled "NAME", "AUTH", "ANNO", or
|
|
"(c) " may be present, and may be in any order. You will have to check the
|
|
byte count in each chunk to determine how many bytes to skip.
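Skipping chunks by their byte counts, as the warning above recommends, can be sketched in C as follows. The names iff_be32 and svx_find_chunk are hypothetical, and the file is assumed to be fully loaded in memory. The sketch also pads odd-sized chunks to an even length; that rule comes from the EA IFF 85 specification, not from the article quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value stored high-order byte first (M68000 order). */
static uint32_t iff_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Find the chunk named id (e.g. "VHDR" or "BODY") in a FORM-8SVX file.
 * Returns a pointer to the chunk's data and stores its byte count in
 * *size_out, or returns NULL if the chunk is missing or the file is
 * malformed. */
const uint8_t *svx_find_chunk(const uint8_t *buf, size_t len,
                              const char *id, uint32_t *size_out)
{
    assert(buf != NULL && id != NULL && size_out != NULL);
    if (len < 12 || memcmp(buf, "FORM", 4) != 0 ||
        memcmp(buf + 8, "8SVX", 4) != 0)
        return NULL;

    size_t pos = 12;                        /* first chunk after "8SVX" */
    while (pos <= len && len - pos >= 8) {
        uint32_t size = iff_be32(buf + pos + 4);
        if (size > len - pos - 8)
            return NULL;                    /* chunk runs past end of file */
        if (memcmp(buf + pos, id, 4) == 0) {
            *size_out = size;
            return buf + pos + 8;
        }
        /* Skip tag, byte count, data, and one pad byte if size is odd. */
        pos += 8 + (size_t)size + (size & 1u);
    }
    return NULL;
}
```

Looking up "VHDR" first and then "BODY" this way works regardless of the order of the chunks or of any "NAME", "AUTH", "ANNO", or "(c) " chunks in between.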
|
|
|
|
------------------------------------------------------------------------
|
|
Playing sound on a PC
|
|
---------------------
|
|
|
|
From: Eric A Rasmussen
|
|
|
|
Any turbo PC (8088 at 8 MHz or greater)/286/386/486/etc. can produce a quality
|
|
playback of single channel 8 bit sounds on the internal (1 bit, 1 channel)
|
|
speaker by utilizing Pulse-Width-Modulation, which toggles the speaker faster
|
|
than it can physically move to simulate positions between fully on and fully
|
|
off. There are several PD programs of this nature that I know of:
|
|
|
|
REMAC  - Plays MAC format sound files.  Files on the Macintosh, at least the
         sound files that I've ripped apart, seem to contain 3 parts.  The
         first two are info like what the file icon looks like and other
         header type info.  The third part contains the raw sample data, and
         it is this portion of the file which is saved to a separate file,
         often named with the .snd extension by PC users.  Personally, I like
         to name the files .s1, .s2, .s3, or .s4 to indicate the sampling rate
         of the file.  (-s# is how to specify the playback rate in REMAC.)
         REMAC provides playback rates of 5550 Hz, 7333 Hz, 11 kHz, & 22 kHz.
REMAC2 - Same as REMAC, but sounds better on higher speed machines.
REPLAY - Basically same as REMAC, but for playback of Atari ST sounds.
         Apparently, the Atari has two sound formats, one of which sounds like
         garbage if played by REMAC or REPLAY in the incorrect mode.  The
         other file format works fine with REMAC and so appears to be 'normal'
         unsigned 8-bit data.  REPLAY provides playback rates of 11.5 kHz,
         12.5 kHz, 14 kHz, 16 kHz, 18.5 kHz, 22 kHz, & 27 kHz.
|
|
|
|
These three programs are all by the same author, Richard E. Zobell who does
|
|
not have an internet mail address to my knowledge, but does have a GEnie email
|
|
address of R.ZOBELL.
|
|
|
|
Additionally, there are various stand-alone demos which use the internal
|
|
speaker, of which there is one called mushroom which plays a 30 second
|
|
advertising jingle for magic mushroom room deodorizers which is pretty
|
|
humorous. I've used this player to playback samples that I ripped out of the
|
|
commercial game program Mean Streets, which uses something they call RealSound
|
|
(tm) to playback digital samples on the internal speaker. (Of course, I only do
|
|
this on my own system, and since I own the game, I see no problems with it.)
|
|
|
|
For owners of 8 MHz 286's and above, the option to play 4 channel 8 bit sounds
|
|
(with decent quality) on the internal speaker is also a reality. Quite a
|
|
number of PD programs exist to do this, including, but not limited to:
|
|
|
|
ModEdit, ModPlay, ScreamTracker, STM, Star Trekker, Tetra, and probably a few
|
|
more.
|
|
|
|
All these programs basically make use of various sound formats used by the
|
|
Amiga line of computers. These include .stm files, .mod files
|
|
[a.k.a. mod. files], and .nst files [really the same thing]. Also,
|
|
these programs pretty much all have the option to playback the
|
|
sound to add-on hardware such as the SoundBlaster card, the Covox series of
|
|
devices, and also to direct the data to either one or two (for stereo)
|
|
parallel ports, which you could attach your own D/A's to. (From what I have
|
|
seen, the Covox is basically a small amplified speaker with a D/A which plugs
|
|
into the parallel port. This sounds very similar to the Disney Sound System
|
|
(DSS) which people have been talking about recently.)
|
|
|
|
------------------------------------------------------------------------
|
|
The EA-IFF-85 documentation
|
|
---------------------------
|
|
|
|
From: dgc3@midway.uchicago.edu
|
|
|
|
As promised, here's an ftp location for the EA-IFF-85 documentation. It's
|
|
the November 1988 release as revised by Commodore (the last public release),
|
|
with specifications for IFF FORMs for graphics, sound, formatted text, and
|
|
more. IFF FORMS now exist for other media, including structured drawing, and
|
|
new documentation is now available only from Commodore.
|
|
|
|
The documentation is at grind.isca.uiowa.edu [128.255.19.233], in the
|
|
directory /amiga/f1/ff185. The complete file list is as follows:
|
|
|
|
DOCUMENTS.zoo
|
|
EXAMPLES.zoo
|
|
EXECUTABLE.zoo
|
|
INCLUDE.zoo
|
|
LINKER_INFO.zoo
|
|
OBJECT.zoo
|
|
SOURCE.zoo
|
|
TP_IFF_Specs.zoo
|
|
|
|
All files except DOCUMENTS.zoo are Amiga-specific, but may be used as a basis
|
|
for conversion to other platforms. Well, I take that tentatively back. I
|
|
don't know what TP_IFF_Specs.zoo contains, so it might be non-Amiga-specific.
|
|
|
|
------------------------------------------------------------------------
|
|
US Federal Standard 1016 availability
|
|
-------------------------------------
|
|
|
|
From: Joe Campbell N3JBC jpcampb@afterlife.ncsc.mil 74040.305@compuserve.com
|
|
|
|
The U.S. DoD's Federal-Standard-1016 4800 bps code excited linear prediction
|
|
voice coder version 3.2 (CELP 3.2) Fortran and C simulation source codes are
|
|
now available for worldwide distribution at no charge (on DOS diskettes,
|
|
but configured to compile on Sun SPARC stations) from:
|
|
|
|
Bob Fenichel
|
|
National Communications System
|
|
Washington, D.C. 20305
|
|
1-703-692-2124
|
|
1-703-746-4960 (fax)
|
|
|
|
In addition to the source codes, example input and processed speech files
|
|
are included along with a technical information bulletin to assist in
|
|
implementation of FS-1016 CELP. (An anonymous ftp site is being considered
|
|
for future releases.)
|
|
|
|
Copies of the FS-1016 document are available for $2.50 each from:
|
|
|
|
GSA Rm 6654
|
|
7th & D St SW
|
|
Washington, D.C. 20407
|
|
1-202-708-9205
|
|
|
|
The following articles describe the Federal-Standard-1016 4.8-kbps CELP
|
|
coder (it's unnecessary to read more than one):
|
|
|
|
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
|
|
"The Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal
|
|
Processing, Academic Press, 1991, Vol. 1, No. 3, p. 145-155.
|
|
|
|
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
|
|
"The DoD 4.8 kbps Standard (Proposed Federal Standard 1016),"
|
|
in Advances in Speech Coding, ed. Atal, Cuperman and Gersho,
|
|
Kluwer Academic Publishers, 1991, Chapter 12, p. 121-133.
|
|
|
|
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The
|
|
Proposed Federal Standard 1016 4800 bps Voice Coder: CELP," Speech
|
|
Technology Magazine, April/May 1990, p. 58-64.
|
|
|
|
|
|
------------------------------------------------------------------------
|
|
Creative Voice (VOC) file format
|
|
--------------------------------
|
|
|
|
From: galt@dsd.es.com
|
|
|
|
(byte numbers are hex!)
|
|
|
|
HEADER (bytes 00-19)
|
|
Series of DATA BLOCKS (bytes 1A+) [Must end w/ Terminator Block]
|
|
|
|
---------------------------------------------------------------
|
|
|
|
HEADER:
|
|
=======
|
|
byte #     Description
------     ------------------------------------------
00-12      "Creative Voice File"
13         1A (eof to abort printing of file)
14-15      Offset of first datablock in .voc file (std 1A 00
           in Intel Notation)
16-17      Version number (minor,major) (VOC-HDR puts 0A 01)
18-19      1's Comp of Ver. # + 1234h (VOC-HDR puts 29 11)
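The header fields above can be checked with a short C sketch. The function name voc_check_header is hypothetical and the header is assumed to be in memory. The check value is computed as the bitwise complement of the version plus 1234h, truncated to 16 bits, because that is what reproduces the example bytes in the table (version 0A 01 gives 29 11); treat that as an inference from those bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Validate a VOC header held in memory.
 * Returns the offset of the first data block, or -1 if the header
 * does not match the layout in the table above. */
long voc_check_header(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 0x1A)
        return -1;
    /* Bytes 00-12 are the identification text, byte 13 is 1A. */
    if (memcmp(buf, "Creative Voice File\x1a", 20) != 0)
        return -1;

    /* 16-bit fields are stored low byte first ("Intel Notation"). */
    uint16_t offset  = (uint16_t)(buf[0x14] | (buf[0x15] << 8));
    uint16_t version = (uint16_t)(buf[0x16] | (buf[0x17] << 8));
    uint16_t check   = (uint16_t)(buf[0x18] | (buf[0x19] << 8));

    /* Bitwise complement of the version plus 1234h, modulo 2^16. */
    if ((uint16_t)(~version + 0x1234) != check)
        return -1;
    if (offset < 0x1A || offset > len)
        return -1;
    return (long)offset;
}
```

For the standard header the function returns 1Ah (26), the position of the first data block.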
|
|
|
|
---------------------------------------------------------------
|
|
|
|
DATA BLOCK:
|
|
===========
|
|
|
|
Data Block: TYPE(1-byte), SIZE(3-bytes), INFO(0+ bytes)
|
|
NOTE: Terminator Block is an exception -- it has only the TYPE byte.
|
|
|
|
TYPE  Description     Size (3-byte int)   Info
----  -----------     -----------------   -----------------------
00    Terminator      (NONE)              (NONE)
01    Sound data      2+length of data    *
02    Sound continue  length of data      Voice Data
03    Silence         3                   **
04    Marker          2                   Marker# (2 bytes)
05    ASCII           length of string    null terminated string
06    Repeat          2                   Count# (2 bytes)
07    End repeat      0                   (NONE)
|
|
|
|
*Sound Info Format:         **Silence Info Format:
---------------------       ----------------------------
00   Sample Rate            00-01  Length of silence - 1
01   Compression Type       02     Sample Rate
02+  Voice Data
|
|
|
|
|
|
Marker#           -- Driver keeps the most recent marker in a status byte
Count#            -- Number of repetitions + 1
                     Count# may be 1 to FFFE for 0 - FFFD repetitions
                     or FFFF for endless repetitions
Sample Rate       -- SR byte = 256-(1000000/sample_rate)
Length of silence -- in units of sampling cycle
Compression Type  -- of voice data
                     8-bits    = 0
                     4-bits    = 1
                     2.6-bits  = 2
                     2-bits    = 3
                     Multi DAC = 3+(# of channels) [interesting--
                               this isn't in the developer's manual]
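Two details of the data blocks lend themselves to a short C sketch: the sample rate byte, and stepping from block to block using the 3-byte size. The names voc_rate_byte, voc_rate_from_byte and voc_count_blocks are hypothetical. The 3-byte size is assumed to be stored low byte first, like the 16-bit header fields, and the division is assumed to truncate; neither is stated explicitly above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SR byte = 256 - (1000000 / sample_rate), using integer division.
 * Returns -1 if the rate cannot be represented in one byte. */
int voc_rate_byte(unsigned long sample_rate)
{
    if (sample_rate == 0)
        return -1;
    unsigned long d = 1000000UL / sample_rate;
    if (d < 1 || d > 256)
        return -1;
    return (int)(256 - d);
}

/* Inverse of the formula above: the rate a given SR byte stands for. */
unsigned long voc_rate_from_byte(uint8_t sr)
{
    return 1000000UL / (unsigned long)(256 - sr);
}

/* Walk the data blocks starting at offset pos and count them.
 * Returns the number of blocks before the Terminator Block, or -1 if
 * a block runs past the end of the buffer or no terminator is found. */
int voc_count_blocks(const uint8_t *buf, size_t len, size_t pos)
{
    assert(buf != NULL);
    int count = 0;
    while (pos < len) {
        if (buf[pos] == 0x00)
            return count;               /* terminator: TYPE byte only */
        if (len - pos < 4)
            return -1;                  /* truncated TYPE + SIZE */
        size_t size = (size_t)buf[pos + 1] |
                      ((size_t)buf[pos + 2] << 8) |
                      ((size_t)buf[pos + 3] << 16);
        if (size > len - pos - 4)
            return -1;                  /* block runs past end of data */
        pos += 4 + size;
        count++;
    }
    return -1;
}
```

Because the rate is squeezed into one byte, the round trip is not exact: 11025 samples/sec encodes as 166, which decodes as 11111.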
|
|
|
|
------------------------------------------------------------------------
|
|
RIFF WAVE (.WAV) file format
|
|
----------------------------
|
|
|
|
RIFF is a format by Microsoft and IBM which is similar in spirit and
|
|
functionality to EA-IFF-85, but not compatible (and it's in
|
|
little-endian byte order, of course :-). WAVE is RIFF's equivalent of
|
|
AIFF, and its inclusion in Microsoft Windows 3.1 has suddenly made it
|
|
important to know about.
|
|
|
|
Rob Ryan was kind enough to send me a description of the RIFF format.
|
|
Unfortunately, it is too big to include here (27 k), but I've made it
|
|
available for anonymous ftp as ftp.cwi.nl:/pub/RIFF-format.
|
|
|
|
And here's a pointer to the official description from Matt Saettler,
|
|
Microsoft Multimedia:
|
|
|
|
"The complete definition of the WAVE file format as defined by
|
|
IBM/Microsoft is available for anon. FTP from ftp.uu.net in the
|
|
vendor/microsoft/multimedia directory."
|
|
|
|
(Rob Ryan's version may actually be an extract from one of the files
|
|
stored there.)
|
|
|
|
------------------------------------------------------------------------
|
|
|