RIFF WAVE (.WAV) file format
----------------------------

From: Rob Ryan <ST802200@brownvm.brown.edu>
Organization: Brown University

I found the following lengthy excerpt in a document rmrtf.zrt (it is actually
a .zip file) in the vendor/microsoft/multimedia subdirectory at the ftp.uu.net
ftp site. It is presumably beyond the scope (in terms of the amount of
detail) of your document, but nevertheless, I thought that it may help you
in including references to the Windows .WAV format in the future.

Let me know if you have any questions/comments. Again, thank you for your
helpful summary. Keep it up!


The following is taken from RIFFMCI.RTF, "Multimedia Programming Interface
and Data Specification v1.0", a Windows RTF (Rich Text Format) file contained
in the .zip file, RMRTF.ZRT. The original document is quite long and this
constitutes pages 83-95 of the text format version (starting on roughly
page 58 of the RTF version). If you would like a PostScript version, let
me know and I can make one up for you.


Waveform Audio File Format (WAVE)

This section describes the Waveform format, which is used to
represent digitized sound.

The WAVE form is defined as follows. Programs must expect
(and ignore) any unknown chunks encountered, as with all
RIFF forms. However, <fmt-ck> must always occur before
<wave-data>, and both of these chunks are mandatory in a
WAVE file.

    <WAVE-form> ->
        RIFF( 'WAVE'
              <fmt-ck>               // Format
              [<fact-ck>]            // Fact chunk
              [<cue-ck>]             // Cue points
              [<playlist-ck>]        // Playlist
              [<assoc-data-list>]    // Associated data list
              <wave-data>   )        // Wave data

The WAVE chunks are described in the following sections.

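The rule about expecting and ignoring unknown chunks can be
illustrated with a small chunk walker. This is a minimal sketch in
Python, not part of the specification. It assumes the general RIFF
conventions that every chunk starts with a four-byte ID and a
little-endian DWORD size, and that chunk data is padded to an even
length. The names iter_chunks, read_wave_chunks and KNOWN_IDS are
hypothetical. LIST chunks are treated as opaque, and only the first
chunk of each ID is kept, for brevity.

```python
import struct

# Chunk IDs named in this document; 'fmt ' and 'cue ' are padded to four bytes.
KNOWN_IDS = {b"fmt ", b"fact", b"cue ", b"plst", b"LIST", b"data"}

def iter_chunks(buf, start, end):
    """Yield (chunk_id, data_offset, data_size) for each chunk in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        ck_id, ck_size = struct.unpack_from("<4sI", buf, pos)
        yield ck_id, pos + 8, ck_size
        # Assumption: chunk data is padded to an even length (general RIFF rule).
        pos += 8 + ck_size + (ck_size & 1)

def read_wave_chunks(buf):
    """Return {chunk_id: data} for known chunks; unknown chunks are skipped."""
    if len(buf) < 12:
        raise ValueError("too short to be a RIFF form")
    riff_id, riff_size, form_type = struct.unpack_from("<4sI4s", buf, 0)
    if riff_id != b"RIFF" or form_type != b"WAVE":
        raise ValueError("not a RIFF WAVE form")
    end = min(len(buf), 8 + riff_size)
    found = {}
    for ck_id, offset, size in iter_chunks(buf, 12, end):
        if ck_id not in KNOWN_IDS:
            continue  # expect and ignore unknown chunks
        if ck_id == b"data" and b"fmt " not in found:
            raise ValueError("'data' chunk appears before 'fmt ' chunk")
        # Keep only the first chunk of each ID (a simplification).
        found.setdefault(ck_id, buf[offset:offset + size])
    if b"fmt " not in found:
        raise ValueError("mandatory 'fmt ' chunk is missing")
    return found
```

A caller would read the whole file into memory and pass the bytes to
read_wave_chunks; a streaming reader would follow the same loop with
seek calls instead of slicing.
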

WAVE Format Chunk

The WAVE format chunk <fmt-ck> specifies the format of the
<wave-data>. The <fmt-ck> is defined as follows:

    <fmt-ck> ->  fmt( <common-fields>
                      <format-specific-fields> )

    <common-fields> ->
        struct
        {
            WORD  wFormatTag;        // Format category
            WORD  wChannels;         // Number of channels
            DWORD dwSamplesPerSec;   // Sampling rate
            DWORD dwAvgBytesPerSec;  // For buffer estimation
            WORD  wBlockAlign;       // Data block size
        }

The fields in the <common-fields> chunk are as follows:

Field              Description

wFormatTag         A number indicating the WAVE format
                   category of the file. The content of
                   the <format-specific-fields> portion
                   of the `fmt' chunk, and the
                   interpretation of the waveform data,
                   depend on this value.

                   You must register any new WAVE format
                   categories. See ``Registering
                   Multimedia Formats'' in Chapter 1,
                   ``Overview of Multimedia
                   Specifications,'' for information on
                   registering WAVE format categories.

                   ``WAVE Format Categories,'' following
                   this section, lists the currently
                   defined WAVE format categories.

wChannels          The number of channels represented in
                   the waveform data, such as 1 for mono
                   or 2 for stereo.

dwSamplesPerSec    The sampling rate (in samples per
                   second) at which each channel should
                   be played.

dwAvgBytesPerSec   The average number of bytes per second
                   at which the waveform data should be
                   transferred. Playback software can
                   estimate the buffer size using this
                   value.

wBlockAlign        The block alignment (in bytes) of the
                   waveform data. Playback software needs
                   to process a multiple of wBlockAlign
                   bytes of data at a time, so the value
                   of wBlockAlign can be used for buffer
                   alignment.

The <format-specific-fields> consists of zero or more bytes
of parameters. Which parameters occur depends on the WAVE
format category; see the following section for details.
Playback software should be written to allow for (and
ignore) any unknown <format-specific-fields> parameters that
occur at the end of this field.

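As an illustration of the <common-fields> layout, the following
Python sketch unpacks the five fields and returns whatever bytes
follow them as the format-specific part. It assumes WORD is 16 bits,
DWORD is 32 bits, and both are stored little-endian with no padding
between fields. The names CommonFields and parse_fmt are
hypothetical.

```python
import struct
from collections import namedtuple

CommonFields = namedtuple(
    "CommonFields",
    "wFormatTag wChannels dwSamplesPerSec dwAvgBytesPerSec wBlockAlign",
)

# WORD, WORD, DWORD, DWORD, WORD; "<" means little-endian, no padding.
_COMMON = struct.Struct("<HHIIH")

def parse_fmt(data):
    """Split 'fmt ' chunk data into (CommonFields, format_specific_bytes)."""
    if len(data) < _COMMON.size:
        raise ValueError("'fmt ' chunk is shorter than <common-fields>")
    common = CommonFields(*_COMMON.unpack_from(data, 0))
    # The remainder is interpreted according to wFormatTag; trailing
    # parameters that the caller does not recognize can simply be ignored.
    return common, data[_COMMON.size:]
```

Returning the format-specific bytes unparsed keeps the function
independent of the format category.
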

WAVE Format Categories

The format category of a WAVE file is specified by the value
of the wFormatTag field of the `fmt' chunk. The
representation of data in <wave-data>, and the content of
the <format-specific-fields> of the `fmt' chunk, depend on
the format category.

The currently defined open non-proprietary WAVE format
categories are as follows:

wFormatTag Value             Format Category

WAVE_FORMAT_PCM (0x0001)     Microsoft Pulse Code
                             Modulation (PCM) format

The following are the registered proprietary WAVE format
categories:

wFormatTag Value             Format Category

IBM_FORMAT_MULAW (0x0101)    IBM mu-law format

IBM_FORMAT_ALAW (0x0102)     IBM a-law format

IBM_FORMAT_ADPCM (0x0103)    IBM AVC Adaptive
                             Differential Pulse Code
                             Modulation format

The following sections describe the Microsoft
WAVE_FORMAT_PCM format.


Pulse Code Modulation (PCM) Format

If the wFormatTag field of the <fmt-ck> is set to
WAVE_FORMAT_PCM, then the waveform data consists of samples
represented in pulse code modulation (PCM) format. For PCM
waveform data, the <format-specific-fields> is defined as
follows:

    <PCM-format-specific> ->
        struct
        {
            WORD wBitsPerSample;    // Sample size
        }

The wBitsPerSample field specifies the number of bits of
data used to represent each sample of each channel. If there
are multiple channels, the sample size is the same for each
channel.

For PCM data, the dwAvgBytesPerSec field of the `fmt' chunk
should be equal to the following formula rounded up to the
next whole number:

                                      wBitsPerSample
        wChannels x dwSamplesPerSec x --------------
                                            8

The wBlockAlign field should be equal to the following
formula, rounded up to the next whole number:

                    wBitsPerSample
        wChannels x --------------
                          8

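The two formulas can be written out as follows. This is a sketch in
Python using integer arithmetic for the rounding up; the function
names are hypothetical. For sample sizes that are not a multiple of
8 bits, the first formula and the product dwSamplesPerSec x
wBlockAlign give different results: for 44100 Hz, mono, 20 bits the
formula yields 110250, while the 20-bit example later in this
document lists 132300, which equals 44100 x 3.

```python
def pcm_block_align(channels, bits_per_sample):
    """wChannels x wBitsPerSample / 8, rounded up to a whole number."""
    return (channels * bits_per_sample + 7) // 8

def pcm_avg_bytes_per_sec(channels, samples_per_sec, bits_per_sample):
    """wChannels x dwSamplesPerSec x wBitsPerSample / 8, rounded up."""
    return (channels * samples_per_sec * bits_per_sample + 7) // 8
```
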
Data Packing for PCM WAVE Files

In a single-channel WAVE file, samples are stored
consecutively. For stereo WAVE files, channel 0 represents
the left channel, and channel 1 represents the right
channel. The speaker position mapping for more than two
channels is currently undefined. In multiple-channel WAVE
files, samples are interleaved.

The following diagrams show the data packing for 8-bit
mono and stereo WAVE files:

    Sample 1     Sample 2     Sample 3     Sample 4
    ---------    ---------    ---------    ---------
    Channel 0    Channel 0    Channel 0    Channel 0

    Data Packing for 8-Bit Mono PCM

            Sample 1                  Sample 2
    ----------------------    ----------------------
    Channel 0    Channel 1    Channel 0    Channel 1
    (left)       (right)      (left)       (right)

    Data Packing for 8-Bit Stereo PCM

The following diagrams show the data packing for 16-bit mono
and stereo WAVE files:

            Sample 1                  Sample 2
    ----------------------    ----------------------
    Channel 0    Channel 0    Channel 0    Channel 0
    low-order    high-order   low-order    high-order
    byte         byte         byte         byte

    Data Packing for 16-Bit Mono PCM

                        Sample 1
    ------------------------------------------------
    Channel 0    Channel 0    Channel 1    Channel 1
    (left)       (left)       (right)      (right)
    low-order    high-order   low-order    high-order
    byte         byte         byte         byte

    Data Packing for 16-Bit Stereo PCM

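The interleaving shown in the diagrams can be undone with a strided
slice. The following Python sketch handles 16-bit PCM only and
returns one list of samples per channel. The function name is
hypothetical, and a trailing partial frame, if any, is dropped,
which is a choice of this sketch rather than a rule from the text.

```python
import struct

def split_channels_16bit(data, channels):
    """Return one list of signed 16-bit samples per channel."""
    frame_size = 2 * channels                     # bytes per sample frame
    usable = len(data) - len(data) % frame_size   # drop a partial last frame
    values = struct.unpack("<%dh" % (usable // 2), data[:usable])
    # Samples are interleaved: ch0, ch1, ..., ch0, ch1, ...
    return [list(values[ch::channels]) for ch in range(channels)]
```

For stereo data the first returned list is channel 0 (left) and the
second is channel 1 (right).
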
Data Format of the Samples

Each sample is contained in an integer i. The size of i is
the smallest number of bytes required to contain the
specified sample size. The least significant byte is stored
first. The bits that represent the sample amplitude are
stored in the most significant bits of i, and the remaining
bits are set to zero.

For example, if the sample size (recorded in wBitsPerSample)
is 12 bits, then each sample is stored in a two-byte
integer. The least significant four bits of the first (least
significant) byte are set to zero.

The data format and maximum and minimum values for PCM
waveform samples of various sizes are as follows:

Sample Size          Data Format         Maximum Value       Minimum Value

One to eight bits    Unsigned integer    255 (0xFF)          0

Nine or more bits    Signed integer i    Largest positive    Most negative
                                         value of i          value of i

For example, the maximum, minimum, and midpoint values for
8-bit and 16-bit PCM waveform data are as follows:

Format        Maximum Value      Minimum Value       Midpoint Value

8-bit PCM     255 (0xFF)         0                   128 (0x80)

16-bit PCM    32767 (0x7FFF)     -32768 (-0x8000)    0

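The container rules above can be sketched in Python as follows. The
function name is hypothetical. The sketch shifts the zero padding
out, so a 12-bit sample comes back in the range -2048 to 2047; that
scaling is a choice made here, and a player could equally keep the
value in its full container range.

```python
def decode_sample(raw, bits_per_sample):
    """Decode one little-endian PCM sample container to its amplitude."""
    container_bytes = (bits_per_sample + 7) // 8   # smallest size that fits
    if bits_per_sample < 1 or len(raw) != container_bytes:
        raise ValueError("container size does not match the sample size")
    # One to eight bits: unsigned. Nine or more bits: signed.
    value = int.from_bytes(raw, "little", signed=bits_per_sample > 8)
    # Amplitude bits sit in the most significant bits; the low bits are zero.
    return value >> (8 * container_bytes - bits_per_sample)
```
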
Examples of PCM WAVE Files

Example of a PCM WAVE file with 11.025 kHz sampling rate,
mono, 8 bits per sample:

    RIFF( 'WAVE'  fmt(1, 1, 11025, 11025, 1, 8)
                  data( <wave-data> ) )

Example of a PCM WAVE file with 22.05 kHz sampling rate,
stereo, 8 bits per sample:

    RIFF( 'WAVE'  fmt(1, 2, 22050, 44100, 2, 8)
                  data( <wave-data> ) )

Example of a PCM WAVE file with 44.1 kHz sampling rate,
mono, 20 bits per sample:

    RIFF( 'WAVE'  INFO(INAM("O Canada"Z))
                  fmt(1, 1, 44100, 132300, 3, 20)
                  data( <wave-data> ) )

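As a cross-check, the first example can be assembled byte by byte
and read back with the `wave` module from the Python standard
library. This is a sketch under the assumptions that all header
fields are little-endian and that chunk data is padded to an even
length. The name build_pcm_wave is hypothetical, and the four
midpoint samples are arbitrary filler.

```python
import io
import struct
import wave

def build_pcm_wave(channels, samples_per_sec, avg_bytes_per_sec,
                   block_align, bits_per_sample, sample_data):
    """Assemble RIFF('WAVE' fmt(...) data(...)) as a bytes object."""
    def chunk(ck_id, data):
        pad = b"\x00" if len(data) & 1 else b""   # pad to an even length
        return ck_id + struct.pack("<I", len(data)) + data + pad

    fmt_data = struct.pack("<HHIIHH", 1, channels, samples_per_sec,
                           avg_bytes_per_sec, block_align, bits_per_sample)
    body = b"WAVE" + chunk(b"fmt ", fmt_data) + chunk(b"data", sample_data)
    return b"RIFF" + struct.pack("<I", len(body)) + body

# fmt(1, 1, 11025, 11025, 1, 8) with four samples at the 8-bit midpoint.
wav_bytes = build_pcm_wave(1, 11025, 11025, 1, 8, bytes([128] * 4))

with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    params = (reader.getnchannels(), reader.getframerate(),
              reader.getsampwidth(), reader.getnframes())
```
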

Storage of WAVE Data

The <wave-data> contains the waveform data. It is defined as
follows:

    <wave-data>  ->  { <data-ck> | <wave-list> }

    <data-ck>    ->  data( <wave-data> )

    <wave-list>  ->  LIST( 'wavl' { <data-ck> |          // Wave samples
                                    <silence-ck> }... )  // Silence

    <silence-ck> ->  slnt( <dwSamples:DWORD> )   // Count of
                                                 // silent samples

Note: The `slnt' chunk represents silence, not necessarily
a repeated zero volume or baseline sample. In 16-bit PCM
data, if the last sample value played before the silence
section is 10000, then if data is still output to the
digital-to-analog (D/A) converter, it must maintain the
10000 value. If a zero value is used, a click may be heard
at the start and end of the silence section. If play begins
at a silence section, then a zero value might be used since
no other information is available. A click might be created
if the data following the silent section starts with a
nonzero value.

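The note about holding the last sample value through a `slnt'
section can be sketched as follows, for 16-bit mono PCM only. The
function name and the (chunk ID, chunk data) input format are
hypothetical; the input is assumed to come from a chunk walker that
has already split the `wavl' LIST into its subchunks.

```python
import struct

def expand_wavl_16bit_mono(chunks):
    """Expand (chunk_id, data) pairs from a 'wavl' LIST into sample values."""
    samples = []
    last = 0  # play starts with no prior sample, so zero is used
    for ck_id, data in chunks:
        if ck_id == b"data":
            count = len(data) // 2
            samples.extend(struct.unpack("<%dh" % count, data[:2 * count]))
            if count:
                last = samples[-1]
        elif ck_id == b"slnt":
            (silent_count,) = struct.unpack_from("<I", data, 0)
            # Hold the last played value instead of jumping to zero.
            samples.extend([last] * silent_count)
    return samples
```
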

FACT Chunk

The <fact-ck> fact chunk stores important information about
the contents of the WAVE file. This chunk is defined as
follows:

    <fact-ck> ->  fact( <dwFileSize:DWORD> )   // Number of samples

The `fact' chunk is required if the waveform data is
contained in a `wavl' LIST chunk and for all compressed
audio formats. The chunk is not required for PCM files using
the `data' chunk format.

The `fact' chunk will be expanded to include any other
information required by future WAVE formats. Added fields
will appear following the <dwFileSize> field. Applications
can use the chunk size field to determine which fields are
present.

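A reader that tolerates future additions to the `fact' chunk might
look like the following Python sketch. The function name is
hypothetical and the DWORD is assumed to be little-endian. Any bytes
after <dwFileSize> are returned untouched, since their layout is not
defined here.

```python
import struct

def parse_fact(data):
    """Return (dwFileSize, extra_bytes) from 'fact' chunk data."""
    if len(data) < 4:
        raise ValueError("'fact' chunk is shorter than one DWORD")
    (sample_count,) = struct.unpack_from("<I", data, 0)
    # The chunk size tells the caller whether later fields are present.
    return sample_count, data[4:]
```
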

Cue-Points Chunk

The <cue-ck> cue-points chunk identifies a series of
positions in the waveform data stream. The <cue-ck> is
defined as follows:

    <cue-ck> ->  cue( <dwCuePoints:DWORD>    // Count of cue points
                      <cue-point>... )       // Cue-point table

    <cue-point> ->  struct {
                        DWORD  dwName;
                        DWORD  dwPosition;
                        FOURCC fccChunk;
                        DWORD  dwChunkStart;
                        DWORD  dwBlockStart;
                        DWORD  dwSampleOffset;
                    }

The <cue-point> fields are as follows:

Field              Description

dwName             Specifies the cue point name. Each
                   <cue-point> record must have a unique
                   dwName field.

dwPosition         Specifies the sample position of the
                   cue point. This is the sequential
                   sample number within the play order.
                   See ``Playlist Chunk,'' later in this
                   document, for a discussion of the play
                   order.

fccChunk           Specifies the name or chunk ID of the
                   chunk containing the cue point.

dwChunkStart       Specifies the file position of the
                   start of the chunk containing the cue
                   point. This is a byte offset relative
                   to the start of the data section of
                   the `wavl' LIST chunk.

dwBlockStart       Specifies the file position of the
                   start of the block containing the
                   position. This is a byte offset
                   relative to the start of the data
                   section of the `wavl' LIST chunk.

dwSampleOffset     Specifies the sample offset of the cue
                   point relative to the start of the
                   block.

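The cue-point table is a count followed by fixed-size records, so
it can be read with a single record layout. The following Python
sketch assumes little-endian fields, a four-byte FOURCC, and no
padding, which gives 24 bytes per record. The names CuePoint and
parse_cue are hypothetical.

```python
import struct
from collections import namedtuple

CuePoint = namedtuple(
    "CuePoint",
    "dwName dwPosition fccChunk dwChunkStart dwBlockStart dwSampleOffset",
)

# DWORD, DWORD, FOURCC, DWORD, DWORD, DWORD = 24 bytes per record.
_CUE_POINT = struct.Struct("<II4sIII")

def parse_cue(data):
    """Return the list of CuePoint records in 'cue ' chunk data."""
    (count,) = struct.unpack_from("<I", data, 0)
    if len(data) < 4 + count * _CUE_POINT.size:
        raise ValueError("cue-point table is truncated")
    points = [
        CuePoint(*_CUE_POINT.unpack_from(data, 4 + i * _CUE_POINT.size))
        for i in range(count)
    ]
    # Each <cue-point> record must have a unique dwName.
    if len({p.dwName for p in points}) != len(points):
        raise ValueError("duplicate dwName in cue-point table")
    return points
```
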
Examples of File Position Values

The following table describes the <cue-point> field values
for a WAVE file containing multiple `data' and `slnt' chunks
enclosed in a `wavl' LIST chunk:

Cue Point         Field             Value
Location

In a `slnt'       fccChunk          FOURCC value `slnt'.
chunk
                  dwChunkStart      File position of the
                                    `slnt' chunk relative to
                                    the start of the data
                                    section in the `wavl' LIST
                                    chunk.

                  dwBlockStart      File position of the data
                                    section of the `slnt'
                                    chunk relative to the
                                    start of the data section
                                    of the `wavl' LIST chunk.

                  dwSampleOffset    Sample position of the cue
                                    point relative to the
                                    start of the `slnt' chunk.

In a PCM          fccChunk          FOURCC value `data'.
`data' chunk
                  dwChunkStart      File position of the
                                    `data' chunk relative to
                                    the start of the data
                                    section in the `wavl' LIST
                                    chunk.

                  dwBlockStart      File position of the cue
                                    point relative to the
                                    start of the data section
                                    of the `wavl' LIST chunk.

                  dwSampleOffset    Zero value.

In a              fccChunk          FOURCC value `data'.
compressed
`data' chunk      dwChunkStart      File position of the start
                                    of the `data' chunk
                                    relative to the start of
                                    the data section of the
                                    `wavl' LIST chunk.

                  dwBlockStart      File position of the
                                    enclosing block relative
                                    to the start of the data
                                    section of the `wavl' LIST
                                    chunk. The software can
                                    begin the decompression at
                                    this point.

                  dwSampleOffset    Sample position of the cue
                                    point relative to the
                                    start of the block.

The following table describes the <cue-point> field values
for a WAVE file containing a single `data' chunk:

Cue Point         Field             Value
Location

Within PCM        fccChunk          FOURCC value `data'.
data
                  dwChunkStart      Zero value.

                  dwBlockStart      Zero value.

                  dwSampleOffset    Sample position of the cue
                                    point relative to the
                                    start of the `data' chunk.

In a              fccChunk          FOURCC value `data'.
compressed
`data' chunk      dwChunkStart      Zero value.

                  dwBlockStart      File position of the
                                    enclosing block relative
                                    to the start of the `data'
                                    chunk. The software can
                                    begin the decompression at
                                    this point.

                  dwSampleOffset    Sample position of the cue
                                    point relative to the
                                    start of the block.


Playlist Chunk

The <playlist-ck> playlist chunk specifies a play order for
a series of cue points. The <playlist-ck> is defined as
follows:

    <playlist-ck> ->  plst(
                          <dwSegments:DWORD>     // Count of play segments
                          <play-segment>... )    // Play-segment table

    <play-segment> ->  struct {
                           DWORD dwName;
                           DWORD dwLength;
                           DWORD dwLoops;
                       }

The <play-segment> fields are as follows:

Field              Description

dwName             Specifies the cue point name. This
                   value must match one of the names
                   listed in the <cue-ck> cue-point
                   table.

dwLength           Specifies the length of the section in
                   samples.

dwLoops            Specifies the number of times to play
                   the section.

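The play-segment table can be read and unrolled into a flat play
order as in the following Python sketch. Fields are assumed to be
little-endian, and the function names are hypothetical. The sketch
identifies each section only by its cue point name and length;
locating the section in the data is left to the <cue-point> fields.

```python
import struct

_SEGMENT = struct.Struct("<III")  # dwName, dwLength, dwLoops

def parse_plst(data):
    """Return a list of (dwName, dwLength, dwLoops) from 'plst' chunk data."""
    (count,) = struct.unpack_from("<I", data, 0)
    if len(data) < 4 + count * _SEGMENT.size:
        raise ValueError("play-segment table is truncated")
    return [_SEGMENT.unpack_from(data, 4 + i * _SEGMENT.size)
            for i in range(count)]

def play_order(segments, cue_names):
    """Unroll dwLoops into a flat list of (dwName, dwLength) sections."""
    order = []
    for name, length, loops in segments:
        if name not in cue_names:
            raise ValueError("dwName %d is not in the cue-point table" % name)
        order.extend([(name, length)] * loops)
    return order
```
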

Associated Data Chunk

The <assoc-data-list> associated data list provides the
ability to attach information like labels to sections of the
waveform data stream. The <assoc-data-list> is defined as
follows:

    <assoc-data-list> ->  LIST('adtl'
                               <labl-ck>     // Label
                               <note-ck>     // Note
                               <ltxt-ck>     // Text with data length
                               <file-ck> )   // Media file

    <labl-ck> ->  labl( <dwName:DWORD>
                        <data:ZSTR> )

    <note-ck> ->  note( <dwName:DWORD>
                        <data:ZSTR> )

    <ltxt-ck> ->  ltxt( <dwName:DWORD>
                        <dwSampleLength:DWORD>
                        <dwPurpose:DWORD>
                        <wCountry:WORD>
                        <wLanguage:WORD>
                        <wDialect:WORD>
                        <wCodePage:WORD>
                        <data:BYTE>... )

    <file-ck> ->  file( <dwName:DWORD>
                        <dwMedType:DWORD>
                        <fileData:BYTE>... )

Label and Note Information

The `labl' and `note' chunks have similar fields. The `labl'
chunk contains a label, or title, to associate with a cue
point. The `note' chunk contains comment text for a cue
point. The fields are as follows:

Field              Description

dwName             Specifies the cue point name. This
                   value must match one of the names
                   listed in the <cue-ck> cue-point
                   table.

data               Specifies a NULL-terminated string
                   containing a text label (for the
                   `labl' chunk) or comment text (for the
                   `note' chunk).

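Since `labl' and `note' share one layout, a single reader can serve
both. The following Python sketch assumes a little-endian dwName.
The text encoding is not stated in this document, so the latin-1
default below is only a placeholder assumption, and the function
name is hypothetical.

```python
import struct

def parse_label_or_note(data, encoding="latin-1"):
    """Return (dwName, text) from 'labl' or 'note' chunk data."""
    if len(data) < 4:
        raise ValueError("chunk is shorter than the dwName field")
    (name,) = struct.unpack_from("<I", data, 0)
    text_bytes = data[4:]
    end = text_bytes.find(b"\x00")   # the string is NULL-terminated
    if end != -1:
        text_bytes = text_bytes[:end]
    return name, text_bytes.decode(encoding)
```
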
Text with Data Length Information

The `ltxt' chunk contains text that is associated with a
data segment of specific length. The chunk fields are as
follows:

Field              Description

dwName             Specifies the cue point name. This
                   value must match one of the names
                   listed in the <cue-ck> cue-point
                   table.

dwSampleLength     Specifies the number of samples in the
                   segment of waveform data.

dwPurpose          Specifies the type or purpose of the
                   text. For example, dwPurpose can
                   specify a FOURCC code like `scrp' for
                   script text or `capt' for closed-caption
                   text.

wCountry           Specifies the country code for the
                   text. See ``Country Codes'' in Chapter
                   2, ``Resource Interchange File
                   Format,'' for a current list of
                   country codes.

wLanguage,         Specify the language and dialect codes
wDialect           for the text. See ``Language and
                   Dialect Codes'' in Chapter 2,
                   ``Resource Interchange File Format,''
                   for a current list of language and
                   dialect codes.

wCodePage          Specifies the code page for the text.

Embedded File Information

The `file' chunk contains information described in other
file formats (for example, an `RDIB' file or an ASCII text
file). The chunk fields are as follows:

Field              Description

dwName             Specifies the cue point name. This
                   value must match one of the names
                   listed in the <cue-ck> cue-point
                   table.

dwMedType          Specifies the file type contained in
                   the fileData field. If the fileData
                   section contains a RIFF form, the
                   dwMedType field is the same as the
                   RIFF form type for the file.

                   This field can contain a zero value.

fileData           Contains the media file.