Simple Offline USENET Packet Format (SOUP) Version 1.2

Copyright (c) 1992-1993 Rhys Weatherley

rhys@cs.uq.oz.au

Last Update: 14 August 1993

DISTRIBUTION

Permission to use, copy, and distribute this material for any purpose
and without fee is hereby granted, provided that the above copyright notice
and this permission notice appear in all copies, and that the name of Rhys
Weatherley not be used in advertising or publicity pertaining to this material
without specific, prior written permission. RHYS WEATHERLEY MAKES NO
REPRESENTATIONS ABOUT THE ACCURACY OR SUITABILITY OF THIS MATERIAL FOR ANY
PURPOSE. IT IS PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.

NOTE: This document is NOT in the public domain. It is copyrighted.
However, the free distribution of this document is unlimited.

If you create a product which uses this packet format, it is suggested
that you include an UNMODIFIED copy of this document to inform your users
as to the packet format. All queries about this format, or requests for
the latest version should be directed to Rhys Weatherley at the above
e-mail address.

INTRODUCTION

For many years, the FidoNet community has been using QWK and other formats to
enable users to download their mail and conferences to be read while off-line.
This not only saves phone charges and prevents tying up BBS lines for long
periods of time; it also allows a user to use much more powerful tools on
their own machine to process the downloaded "packets" than what can be made
available in an on-line environment.

To date, however, very little work has been done in the USENET and dial-in Unix
community to facilitate the same user operations. Some attempts have been
made to use QWK, but due to QWK's limitations and unsuitability for the USENET
message formats, such efforts have not been very successful.

Within USENET, the tendency seems to be either "dial-in to some other machine
and put up with it", or "set up your own USENET site". The former keeps the
user at the mercy of whatever user interfaces the admin of the other machine
sees fit to install, and the latter requires far more computing knowledge than
the average computer user is expected to have. Both of these can serve to
lock out large portions of the computer-literate public from experiencing
USENET. The latter option can also give rise to security problems in the form
of forged USENET messages, which a more controlled dial-in system avoids.

The purpose of this document is to define a new packet format which is aware
of the conventions used in the USENET community, forming a middle ground
between dial-in user interfaces and full USENET connectivity. It is not
limited to downloading USENET news, however. The same format could be used
to enable a Unix user to package up their Unix mailbox and download it for
later perusal. The format is extensible to other kinds of news or conference
systems, so it is feasible, although not yet defined, that QWK or FidoNet
messages could be accommodated within the same packet as USENET messages.

REVISION HISTORY

    1.2     Add COMMANDS and ERRORS files. Renamed to "Simple Offline USENET
            Packet Format". A few extra fields and type codes for the AREAS and
            LIST files. Message area summaries.

    1.1     Add description of the LIST file. Everything else is identical to 1.0.

    1.0     Original version of the document.

Previously, this document was known as the "Helldiver Packet Format" (HDPF).
A variant of HDPF, called the "Simple Local News Packet format" (SLNP) was
created by Philippe Goujard (ppg@oasis.icl.co.uk). This document combines
the features of both previous formats and the name was changed to make it
less product-oriented.

TERMINOLOGY

Packet: a set of files, collected into a compressed archive.

Message packet: the primary kind of packet which contains messages for
the user to read.

Reply packet: a special kind of packet which contains replies composed by
the user, usually in response to the messages in a message packet.

Packet generator: a program which generates packets to be downloaded and
read, and which processes uploaded reply packets.

Packet reader: a program which reads packets, usually by presenting the
messages in a packet to the user, and which generates reply packets.

Packet processor: either a packet generator or a packet reader.

Generating host: the computer on which the packet generator executes.

Reading host: the computer on which the packet reader executes.

Download: the transfer of a packet from the generating host to the reading
host. This transfer may take place in any fashion, although the
most common method is through the use of a file transfer protocol
such as Zmodem or Kermit.

Upload: the transfer of a packet from the reading host to the generating host.

Packet stream: a logical link between the generating and reading hosts over
which downloads and uploads of packets take place.

Message area: a collection of messages which are related by a common topic
or purpose. Examples of message areas include USENET newsgroups,
Unix mailboxes, and FidoNet conferences.

Reply message area: a special kind of message area which contains replies
being uploaded to a generating host.

Text file: an ASCII file consisting of lines terminated by linefeed characters
(LF, 10 decimal). Some operating systems terminate lines in a text
file by CRLF pairs: such files must be converted to LF-terminated
lines for transmission in a packet.

ANATOMY OF A PACKET

A packet is a group of files, collected into a compressed archive. The
standard compression technique defined by this document is ZIP. Other
techniques such as ARJ, ZOO, ARC, LZH, etc. can also be used. It is also
possible for Unix's tar.Z format to be used to transmit packets. The minimum
requirement is a method to collect a group of files into a single packet,
and a method to expand the packet back into the original files. ZIP is
specified to provide a common compression format for packet processors.
Each of the filenames in a packet should be stored in upper case on those
systems where case matters (e.g. Unix).

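As an illustration only, the following Python sketch packs and unpacks a
ZIP-compressed packet in memory. The function names build_packet and
read_packet are hypothetical and not part of this specification; the sketch
assumes the standard ZIP technique and forces filenames to upper case as
suggested above.

```python
import io
import zipfile


def build_packet(files):
    """Collect {filename: bytes} into a ZIP packet, upper-casing the names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, contents in files.items():
            archive.writestr(name.upper(), contents)
    return buffer.getvalue()


def read_packet(data):
    """Expand a ZIP packet back into {FILENAME: bytes}."""
    files = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Names should already be upper case; normalise defensively.
            files[info.filename.upper()] = archive.read(info)
    return files
```

A reader would typically look up "AREAS" or "REPLIES" in the returned
mapping first and then fetch the message and index files they name.
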
The following file specifications may appear in a packet:

    INFO        Optional textual information.
    LIST        List of message areas on the generating host.
    AREAS       Index of the message areas within the packet.
    REPLIES     Index of the reply message areas from the reading host.
    *.MSG       Text of the messages in a particular message area.
    *.IDX       Index information for messages in a message area.
    COMMANDS    Extra commands sent along with a packet.
    ERRORS      Errors that occurred during the execution of commands.

Other filenames may also appear in the packet, but are not defined by this
specification, so they should be avoided by generating software, and ignored
by receiving software.

The INFO file is an optional text file which may contain any kind of textual
information from the generating system. Typically this file would only be
present if there is some kind of urgent message that must be sent to the
receiving user. Use of this file to store the name of the generating host
and other such static information is possible, but discouraged to save space
and transmission time. If such information is required, then the COMMANDS
file can be used to transfer it.

The LIST file is an optional text file which contains a list of all message
areas that are available on the generating host, together with the format of
the messages. It is specified further in the section "LIST FILE".

The AREAS file is a text file which contains an index of the message areas
present within the packet, specifying the name of the message area, the
filename the messages may be found in, and the message format. This is
specified further in the next section.

The REPLIES file is a text file which contains an index of the message areas
present within the packet that contain replies from the user which should
be mailed or posted on the generating host. In most cases, a packet will
contain either an AREAS file or a REPLIES file, but both may be present.
See the section "REPLIES FILE" below for more information.

The *.MSG files contain the text of the messages from a single message area.
The actual format of this file depends on the type of message area specified
in the AREAS file. See the section "MESSAGE FILES" below for more information.

The *.IDX files provide an index into the *.MSG files, usually specifying
where each message starts and the contents of some of the common message
header fields. These files are intended for use by reading software on the
recipient's system to quickly display an overview of the messages present in
a message area. See the section "INDEX FILES" below for more information.

The COMMANDS file is a text file which contains commands to be executed on
the reading or generating hosts to change the behaviour of the hosts at
each end of a packet stream. The ERRORS file contains textual error messages
to report to a human at the host the packet is destined for. These two files
are explained further in the section "SENDING COMMANDS BETWEEN SYSTEMS" below.

AREAS FILE

The AREAS file is a text file containing zero or more lines, each of which
specifies a single message area, its encoding and the name of the message/index
file pair in which the messages appear. In particular, each line has the
following form:

    prefix<TAB>area name<TAB>encoding[<TAB>description[<TAB>number]]

where "prefix" specifies the name of the message/index file pair, "area name"
is the name of the message area, "encoding" specifies the formats of the
message and index files and the type of message area, "description" is a
descriptive name for the message area, and "number" is the number of messages
in the message file. The last two fields are optional. Additional fields may
be added in a future version of this specification.

The message and index files corresponding to the message area have the names
"prefix.MSG" and "prefix.IDX" respectively. If "prefix" contains alphabetic
characters, they must be upper case.

The message area name may be any sequence of printable ASCII characters (space
through tilde). Under USENET, this is typically a dotted name like
"comp.lang.c". Other networks may include spaces or other unusual characters
in the area names, so the receiving software must be aware of this fact,
and act accordingly. Also, receiving software must deal gracefully with
characters that have the high bit set, or names that contain control
characters, since people in other countries that speak a language other than
English may wish to use their country's native encoding for the message area
name. The only hard rule is that the name may not contain TAB, CR or LF.
Receiving software should treat the name as an indivisible string to be
displayed to the user.

The encoding field consists of two or three ASCII characters (usually
alphabetic). The first specifies the format of the message file, the second
specifies the format of the index file, and the optional third specifies the
kind of area (private or public). The following message file formats are
currently defined (case is significant):

    u    USENET news articles
    m    Unix mailbox articles
    M    Mailbox articles in the MMDF format
    b    Binary 8-bit clean mail format
    B    Binary 8-bit clean news format
    i    Index file only

The individual message file encodings are explained further in the next
section. The format 'i' indicates that no message file is present, and
the index file should be used as a summary of the messages in the message
area. This is explained further in the section "MESSAGE AREA SUMMARIES".
The following index file formats are currently defined (again, case is
significant):

    n    No index file
    c    C-news overview database format
    C    Shorter C-news overview database format
    i    Offset/length pairs delineating the messages

These types are explained further in the section "INDEX FILES" below.

See the section "MINIMAL CONFORMANCE" for information on the minimal number
of message and index formats that should be supported by packet generators
and packet readers.

The following kinds of message areas are currently defined (again, case is
significant):

    m    The message area contains private mail
    n    The message area contains public messages, or news
    u    The message area kind is unknown (the default)

This third letter is optional. If it is not present or unknown, the kind
of area depends on the message file type. Message types 'm', 'M', and 'b'
default to kind 'm', and message types 'u', 'B' and 'i' default to kind 'n'.
It is not recommended that the value 'u' for this third letter be used,
although future versions of this specification may add additional letters,
necessitating 'u' to be placed in the third letter if the kind is unknown.
If the message area kind can be solely determined from the message file
type, it is recommended that the third letter be omitted to save space and
transmission time.

Further types may be defined in future versions of this specification. If
the packet processor does not recognise a message file type, it should ignore
the corresponding message and index files. If the packet processor does
not recognise an index file type, it can either ignore the message file, or
attempt to break down the message file into separate messages by some other
means. If the packet processor does not recognise a message area kind,
the kind should be treated as unknown. The user should be warned if a message
area has been ignored.

The optional message area description in the AREAS file consists of any
sequence of printable ASCII characters. This may be used to insert a
"readable" name for the message area. It may not contain TAB, CR or LF.

A message area may appear more than once in the AREAS file, each time with a
different prefix, but this is discouraged. This could be used to split large
message areas across more than one message file, but this is more conveniently
handled by generating a separate packet containing the area continuation.

The following examples demonstrate the capabilities of the AREAS file:

    0000000    Email                   mn
    0000001    comp.lang.c             uc     C Programming Language Discussions    125
    0000002    news.future             Bc     Future of USENET                      38

    EMAIL      /usr/spool/mail/fred    unm    Private e-mail for fred
    U000001    comp.bbs.misc           MCn
    U000002    comp.bbs.waffle         ui

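The rules of this section can be sketched in Python as follows. This is a
minimal sketch rather than a reference implementation: the names Area,
DEFAULT_KIND and parse_areas_line are hypothetical, and the sketch assumes
that fields are separated by single TAB characters, as in the line form
given at the start of this section.

```python
from typing import NamedTuple, Optional

# Default area kind implied by the message file type when the third
# encoding letter is missing, 'u', or not recognised.
DEFAULT_KIND = {"m": "m", "M": "m", "b": "m", "u": "n", "B": "n", "i": "n"}


class Area(NamedTuple):
    prefix: str
    name: str
    msg_format: str
    idx_format: str
    kind: str
    description: str
    count: Optional[int]


def parse_areas_line(line):
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or len(fields[2]) < 2:
        raise ValueError("malformed AREAS line: %r" % line)
    prefix, name, encoding = fields[0], fields[1], fields[2]
    msg_format, idx_format = encoding[0], encoding[1]
    kind = encoding[2] if len(encoding) > 2 else "u"
    if kind not in ("m", "n"):
        kind = DEFAULT_KIND.get(msg_format, "u")
    description = fields[3] if len(fields) > 3 else ""
    has_count = len(fields) > 4 and fields[4].isdigit()
    count = int(fields[4]) if has_count else None
    # Any fields beyond the fifth are ignored, for forward compatibility.
    return Area(prefix, name, msg_format, idx_format, kind, description, count)
```

The message and index files for a parsed area are then prefix + ".MSG" and
prefix + ".IDX". A caller should still check msg_format and idx_format
against the types it supports and warn the user about any area it skips.
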
MESSAGE FILES

The format of the message file depends on the message file format specified in
the AREAS file. This version of the specification defines three formats,
which are in common use in the USENET and Unix community, and two additional
binary formats which permit messages to be stored with no modification or
assumptions about line lengths and byte values.

For each of these formats, lines are terminated with LF characters. Any CR
characters in the messages should be considered as data characters, or ignored
on receipt. In particular, MS-DOS systems should strip CR characters from
text messages before writing them to a packet.

A 'u' (USENET) message file is a text file consisting of one or more messages
prefixed with an rnews header. This header has the form "#! rnews n" where
"n" is the number of bytes in the message that follows the header, excluding
the line-feed character which terminates the header. If the number in the
header is followed by white space and other characters, these other characters
should be ignored, until the terminating LF character is encountered.

A note about the rnews header: although a terser separator could be used, the
rnews header has the following advantages: (a) the messages can be extracted
in the absence of index files, or where the index files have an unknown type,
and (b) the message files can be imported into a USENET system as standard
rnews batches. Thus, if the user wishes to set up a real USENET site, or
simply use dedicated USENET software to read packets, they can use their
existing packet provider as a convenient read-only newsfeed, with no extra
burden placed on the system administrator of the generating system.

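Extracting messages from a 'u' message file without an index, as described
above, might look like the following Python sketch. The function name
split_rnews is hypothetical. The sketch works on bytes because the header
counts bytes, and it ignores anything that follows the number on the header
line.

```python
RNEWS_PREFIX = b"#! rnews "


def split_rnews(data):
    """Return the list of messages stored in a 'u' format message file."""
    messages = []
    pos = 0
    while pos < len(data):
        end_of_header = data.find(b"\n", pos)
        if end_of_header < 0:
            raise ValueError("unterminated rnews header")
        header = data[pos:end_of_header]
        if not header.startswith(RNEWS_PREFIX):
            raise ValueError("expected rnews header at offset %d" % pos)
        # Only the first token after the prefix is the byte count.
        length = int(header[len(RNEWS_PREFIX):].split()[0])
        start = end_of_header + 1
        if start + length > len(data):
            raise ValueError("message at offset %d is truncated" % start)
        messages.append(data[start:start + length])
        pos = start + length
    return messages
```
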
A 'm' (Unix mailbox) message file is a text file consisting of one or more
messages. The first line of each message must start with the character
sequence "From ". Any remaining lines in the message which start with
"From " should have the character '>' prepended. Thus the "From " lines
delimit the message file into separate messages.

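The quoting and splitting rules for the 'm' format can be sketched in Python
as below. The helper names are hypothetical. The sketch splits on LF only,
so that CR characters are left alone as data, and it does not attempt to
undo the '>' quoting when reading.

```python
def split_lines(text):
    """Split on LF only, keeping the terminators."""
    parts = text.split("\n")
    lines = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        lines.append(parts[-1])
    return lines


def quote_from_lines(lines):
    """Prepend '>' to every line after the first that starts with "From "."""
    return [">" + line if line.startswith("From ") else line for line in lines]


def split_mailbox(text):
    """Break an 'm' format message file into its separate messages."""
    messages = []
    for line in split_lines(text):
        if line.startswith("From "):
            messages.append([line])
        elif messages:
            messages[-1].append(line)
        else:
            raise ValueError('mailbox does not start with a "From " line')
    return ["".join(message) for message in messages]
```

A generator would apply quote_from_lines to the lines that follow the
leading "From " line of each message before writing it to the message file.
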
A 'M' (MMDF mailbox) message file is a sequence of one or more messages,
separated by at least 4 Control-A characters. The message file may optionally
start and end with a sequence of such characters. If a sequence of 4 or more
Control-A characters occurs in a message, it should be "adjusted" by the
insertion of spaces to split the sequence. The use of Control-A characters
within a message is discouraged.

The 'm' and 'M' formats were chosen for mail because of their common
occurrence in the Unix community. The generating system may elect to instead
convert a mailbox into the USENET format if it wishes, and set the area kind
to 'm' to inform the packet reader that the message area contains private
e-mail rather than news.

The 'b' (binary mail) and 'B' (binary news) formats are identical. The
contents of each message must conform to RFC-822/1036 and may contain content
information compatible with RFC-1341 (MIME). The only difference between
the messages of these formats and the preceding formats is that no assumption
is made about line lengths, and any of the 256 values for a byte may be used
in any position. Each message is preceded by a 4-byte value which indicates
the length of the message in bytes, stored in big-endian order (i.e. high
byte first, low byte last). The difference between 'b' and 'B' is a semantic
one: message files of type 'b' are expected to contain mail messages, and
message files of type 'B' are expected to contain news messages. Thus, reader
software can make a distinction between the two if it desires.

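The length-prefixed layout of the 'b' and 'B' formats can be sketched in
Python as follows. The function names are hypothetical, and the sketch
assumes the 4-byte length is unsigned, which this document does not state
explicitly.

```python
import struct


def pack_binary_messages(messages):
    """Prefix each message with its length as a 4-byte big-endian value."""
    return b"".join(struct.pack(">I", len(m)) + m for m in messages)


def unpack_binary_messages(data):
    """Return the list of messages in a 'b' or 'B' format message file."""
    messages = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated length at offset %d" % pos)
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if pos + length > len(data):
            raise ValueError("truncated message at offset %d" % pos)
        messages.append(data[pos:pos + length])
        pos += length
    return messages
```

Because the message bytes are copied untouched, CR, NUL and high-bit
characters survive the round trip.
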
For most practical purposes, 'u', 'm' and 'M' should be sufficient. The binary
'b' and 'B' types should be used for articles that contain 8-bit binary data.
It is possible to use type 'u' for binary data as well, but 'm' and 'M'
cannot be because the message contents may be modified. When MIME becomes
more wide-spread, it is expected that binary messages containing programs,
sound, pictures and video will become popular, necessitating these binary
types.

Note that MIME messages can be stored in 'u', 'm' and 'M' message files, but
any binary components should be encoded with quoted-printable or base64 (which
is expected to be the most common usage of MIME in the near future). It is
not required that 'b' or 'B' be used for MIME messages: only those containing
raw unencoded binary data (as indicated by the Content-transfer-encoding
header value "binary").

INDEX FILES

This specification defines four index file types, which provide varying
degrees of support for packet readers.

Type 'n' indicates that no index file is present, and it is up to the packet
reader to extract messages from the message file. It is useful where the
generating system is providing a USENET newsfeed using packets, and the
receiving system is not interested in the index information.

A type 'c' index file is a text file (LF terminated lines), with one line per
message that occurs in the message file. The lines in the index file should
be in the same order as the corresponding messages. Each line has the
following form:

    offset<TAB>subject<TAB>author<TAB>date<TAB>mesgid<TAB>
        refs<TAB>bytes<TAB>lines[<TAB>selector]

[Note: the line-wrapping here is for document-formatting purposes only. No
line-wrapping occurs in the index files]. The fields have the following
semantics:

    offset      Seek position in the message file of where the corresponding
                message starts. The first seek position is 0. For the 'u'
                format, this indicates the start of the line following the
                rnews header line. For the 'm' format, this indicates the
                start of the "From " line and for the 'M' format, this
                indicates the start of the article after the Control-A
                sequence. For the 'b' and 'B' formats, this indicates the
                first byte of the message after the 4-byte message length.

    subject     The "Subject:" line from the message.

    author      The "From:" line from the message.

    date        The "Date:" line from the message.

    mesgid      The "Message-Id:" line from the message.

    refs        The "References:" line from the message.

    bytes       The number of bytes in the message. If this field is zero,
                then it indicates that there is no corresponding message
                in the message file. This is used for summaries: see the
                section "MESSAGE AREA SUMMARIES" for more details.

    lines       The "Lines:" line from the message. Note that this field
                is pretty useless these days on USENET, but is still popular.
                It is meant to indicate the number of lines in the body of
                the message. Generating software may elect to re-generate
                this value if it is not present in the original message,
                but this is not required.

    selector    A string used for summaries to request that a message be
                sent in a future packet. See the section "MESSAGE AREA
                SUMMARIES" for more details. This string will usually be
                a number, but other values such as Message-IDs could be
                used. Packet readers should treat this string as an
                indivisible string to be sent in a "sendme" command in the
                COMMANDS file. A zero-length string indicates that there
                is no selector string.

If any of these fields contained TABs, newlines or other white space in the
original articles, they should be converted into single spaces. All fields
must be present, but some may be empty. The "bytes" field must not be empty,
since it provides necessary information for packet readers. Each field must
conform to the Internet RFC documents RFC-822 or RFC-1036.

Optionally, a header line may end with one or more extra TAB-separated fields
for other RFC-compliant header fields, together with the header field names,
e.g. "Supersedes: <1234@foovax>". These fields are not defined by this
version of the specification, and are by arrangement between the generating
host and the reading host only.

This format is compatible with the news overview (NOV) database format of
C-news, the only differences being the substitution of an offset for the
article number used by C-news, and the addition of the "selector" field.
The C-news format was designed to assist threading newsreaders, so this packet
format should provide similar assistance to threading packet readers.

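Reading a type 'c' index line and using it to pull a message out of the
message file might be sketched in Python as follows. The names IndexEntry,
parse_c_index_line and extract_message are hypothetical. The sketch keeps
"lines" as a string, since that field may be empty, and collects any extra
trailing fields without interpreting them.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    offset: int
    subject: str
    author: str
    date: str
    mesgid: str
    refs: str
    nbytes: int
    lines: str
    selector: str
    extra: List[str]


def parse_c_index_line(line):
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("malformed index line: %r" % line)
    offset, subject, author, date, mesgid, refs, nbytes, lines = fields[:8]
    selector = fields[8] if len(fields) > 8 else ""
    return IndexEntry(int(offset), subject, author, date, mesgid, refs,
                      int(nbytes), lines, selector, fields[9:])


def extract_message(msg_data, entry):
    """Return the message bytes, or None for a summary-only entry."""
    if entry.nbytes == 0:
        return None
    return msg_data[entry.offset:entry.offset + entry.nbytes]
```
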
The 'C' format is similar to 'c', except that the "mesgid" and "refs" fields
are dropped. These fields can commonly be quite long and are mainly of use to
packet readers which perform Message-ID based message threading. Packet
readers which perform subject threading (i.e. sort on the subject line and
then on the date and/or arrival order) do not require such information. The
format of the header lines in this case is as follows:

    offset<TAB>subject<TAB>author<TAB>date<TAB>bytes<TAB>lines[<TAB>selector]

Further TAB-separated fields may be added in future versions of this
specification.

The "author" field is slightly different to the 'c' format. Instead of
an RFC-822 format address, it is just the author's name, extracted from the
"From:" line of the message. Most RFC-822 and RFC-1036 "From:" lines have one
of the following forms:

    address
    address (name)
    name <address>

Names may sometimes be surrounded by double-quote characters, have embedded
"(...)" sequences, or contain "useless" information after a comma (",") or
slash ("/"). The main requirement is that the generating software produce
some kind of (more or less) meaningful string for the name of the author which
can be displayed to the user by a packet reader. See RFC-822 and RFC-1036
for more information on the syntax of the "From:" line in messages.

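One possible way for a generator to derive the 'C' format "author" field is
sketched below in Python, using the standard library's address parser. The
function name author_name is hypothetical. Falling back to the address when
no name is present is an assumption of this sketch, not a requirement of
this specification, and no attempt is made to trim "useless" information
after a comma or slash.

```python
from email.utils import parseaddr


def author_name(from_value):
    """Return a displayable author name for a "From:" header value."""
    name, address = parseaddr(from_value)
    # Collapse runs of white space so the result is safe in a TAB-separated
    # index line.
    name = " ".join(name.split())
    return name or address or " ".join(from_value.split())
```
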
The 'i' index format is purely binary, using 8 bytes for each message in the
corresponding message file. The first 4 bytes specify the offset into the
message file of the message and the remaining 4 bytes specify the number of
bytes in the message. Each 4-byte quantity is stored in big-endian order
(high byte first). This format is supplied to provide a trade-off between
transmission time and easy extraction of messages from a message file.

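A minimal Python sketch of reading and writing the 'i' index format follows.
The function names are hypothetical, and both 4-byte quantities are assumed
to be unsigned.

```python
import struct

# One index entry: 4-byte offset then 4-byte length, both big-endian.
ENTRY = struct.Struct(">II")


def build_offset_index(entries):
    """Encode a sequence of (offset, length) pairs as an 'i' index file."""
    return b"".join(ENTRY.pack(offset, length) for offset, length in entries)


def read_offset_index(data):
    """Decode an 'i' index file into a list of (offset, length) pairs."""
    if len(data) % ENTRY.size:
        raise ValueError("index file length is not a multiple of 8")
    return [ENTRY.unpack_from(data, pos)
            for pos in range(0, len(data), ENTRY.size)]
```

Given the message file contents in msg_data, message k is then
msg_data[offset:offset + length] for the k-th pair.
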
REPLIES FILE

One of the requirements for an off-line reading system is a mechanism for a
user to upload replies or new messages to a generating system for mailing or
posting. While it is possible to re-use the AREAS file for this purpose,
keeping the download and upload sections separate will help prevent messages
being fed back into a network erroneously.

The REPLIES file has a similar format to the AREAS file. Each line has the
following form:

    prefix<TAB>reply kind<TAB>encoding

The "prefix" and "encoding" fields are as before. The "reply kind" field
indicates the mechanism to use when transmitting the messages in the message
file. The following values are currently defined:

    mail    Transmit an RFC-822 compliant personal mail message
    news    Transmit an RFC-1036 compliant USENET news posting

On a Unix system, transmission of mail and news is usually performed with the
"sendmail" and "inews" programs respectively. Additional kinds may be
specified in a future version of this specification for other message formats.
Note: it is discouraged that the kinds "mail" and "news" be used for anything
other than RFC-compliant messages. In particular, FidoNet or QWK messages
should use a different reply kind. Messages of the same reply kind can be
placed in the same message file, or in separate message files.

Further TAB-separated fields may be added to the lines in the REPLIES file
in a future version of this specification.

It is recommended that a message file type of 'b' or 'B' be used for sending
replies to minimise the chance of message corruption. The recommended index
file types for replies are 'i' and 'n'. The index types 'c' and 'C' are
discouraged because they do not provide useful information for reply purposes.

The format of the messages in the message files should follow the relevant
RFC standards, with the following restriction: any "From:", "Sender:",
"Control:", "Approved:" or other similar "dangerous" header lines should be
ignored by the system transmitting the replies to prevent forgeries from
occurring. In particular, the "From:" header should be determined from the
user's login name, or some other similar means, rather than from any data
supplied in the user's message.

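The header restriction above might be applied by a generating host along the
lines of the following Python sketch. The names DANGEROUS_HEADERS and
strip_dangerous_headers are hypothetical, the set contains only the four
headers named above (a real system may well need more), and trusted_from
stands for an address the host has derived itself, for example from the
user's login name.

```python
DANGEROUS_HEADERS = {"from", "sender", "control", "approved"}


def strip_dangerous_headers(message, trusted_from):
    """Drop user-supplied dangerous headers and add a trusted "From:"."""
    head, separator, body = message.partition("\n\n")
    kept = []
    skipping = False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):
            # Continuation line: shares the fate of the header it continues.
            if not skipping:
                kept.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        skipping = name in DANGEROUS_HEADERS
        if not skipping:
            kept.append(line)
    kept.insert(0, "From: " + trusted_from)
    # Only the header block is filtered; the body is passed through as is.
    return "\n".join(kept) + separator + body
```
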
In most cases, mail messages will contain "To:", "Subject:", "Cc:", "Bcc:"
and "Reply-To:" header lines, and news messages will contain "Newsgroups:",
"Subject:", "Followup-To:", "Keywords:", "Summary:" and "Reply-To:" header
lines. Other optional headers (especially MIME content headers) may also
be present.

The automatic addition of a signature by the generating host which receives
the reply packet is discouraged. Signatures should be added by the user's
packet reading software instead, if desired.

A method for allowing replies from more than one person to be stored in the
same packet was considered, but was rejected for security reasons.

The following example demonstrates the capabilities of the REPLIES file:

    R001    mail    bn
    R002    mail    bi
    R003    news    Bn
    R004    news    Bi

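A generating host might read the REPLIES file with something like the Python
sketch below. The names are hypothetical. Fields are assumed to be separated
by single TAB characters, and lines with an unrecognised reply kind are
flagged rather than rejected so that the caller can decide how to report
them.

```python
KNOWN_REPLY_KINDS = ("mail", "news")


def parse_replies_line(line):
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or len(fields[2]) < 2:
        raise ValueError("malformed REPLIES line: %r" % line)
    prefix, kind, encoding = fields[:3]
    return {
        "prefix": prefix,
        "kind": kind,
        "known_kind": kind in KNOWN_REPLY_KINDS,
        "msg_format": encoding[0],
        "idx_format": encoding[1],
        "msg_file": prefix + ".MSG",
        # Index type 'n' means that no index file accompanies the messages.
        "idx_file": None if encoding[1] == "n" else prefix + ".IDX",
    }
```
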
LIST FILE

The LIST file may be used to send a list of available message areas to the
receiving system. Its format is similar to the AREAS file, with the prefix
field deleted. Each line has the following form:

    area name<TAB>encoding[<TAB>description]

where "area name" is the name of the message area, "encoding" is a 2, 3 or 4
letter message, index, area kind, and subscription code, and "description"
is an optional message area description. Further optional fields may be
added in a future version of this specification.

The message, index, and area kind codes are the same as for the AREAS file.
The subscription code has one of the following values:

    y    The user is subscribed to the message area
    n    The user is not subscribed to the message area

If this field is not present, it defaults to 'n'.

Note that the message areas in the LIST file should only be those that can
be subscribed to or unsubscribed from using a request in the COMMANDS file.
Private e-mail message areas will normally not appear in the list.

The following example demonstrates the capabilities of the LIST file:

    alt.flame          ucnn
    comp.bbs.misc      ucny
    comp.bbs.waffle    ucny
    comp.lang.c        ucnn    C Programming Language Discussions
    news.future        ucny    Future of USENET

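Decoding a LIST line, including the optional fourth subscription letter, can
be sketched in Python as follows. The function name parse_list_line is
hypothetical, and fields are assumed to be separated by single TAB
characters.

```python
def parse_list_line(line):
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2 or len(fields[1]) < 2:
        raise ValueError("malformed LIST line: %r" % line)
    name, encoding = fields[0], fields[1]
    return {
        "name": name,
        "msg_format": encoding[0],
        "idx_format": encoding[1],
        "kind": encoding[2] if len(encoding) > 2 else "u",
        # The subscription code defaults to 'n' when it is absent.
        "subscribed": len(encoding) > 3 and encoding[3] == "y",
        "description": fields[2] if len(fields) > 2 else "",
    }
```
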
SENDING COMMANDS BETWEEN SYSTEMS

The COMMANDS and ERRORS files contain information for changing the behaviour
of each end of a packet stream, or for reporting errors in the execution of
commands or the generation of packets. Each is a text file with LF-terminated
lines.

The ERRORS file is the simplest: it consists of error messages from the
program which generated the packet to report on the progress of previously
executed commands. The format of these error messages is not defined, but
they should be human readable so that packet readers may present the errors
to the user for perusal.

The COMMANDS file consists of a sequence of commands, one per line, which
modify the behaviour of the packet processor at the other end of the
packet stream. Usually these commands are sent from the packet reader
to the packet generator to change the subscribed message areas, send
files, etc. The names of the commands are NOT case significant, but SHOULD
be sent in lower case. Any commands that are not understood by a program
should be ignored.

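The general handling of a COMMANDS file (one command per line, names
compared without regard to case, unknown commands ignored) can be sketched
in Python as below. The function name parse_commands and the "known"
parameter are hypothetical. The sketch assumes the command name is separated
from its argument by white space and leaves interpretation of the argument
to the caller.

```python
def parse_commands(text, known):
    """Return (name, argument) pairs for the commands this program knows."""
    commands = []
    for line in text.split("\n"):
        if not line.strip():
            continue
        parts = line.split(None, 1)
        name = parts[0].lower()
        argument = parts[1].rstrip() if len(parts) > 1 else ""
        if name not in known:
            # Commands that are not understood are silently ignored.
            continue
        commands.append((name, argument))
    return commands
```
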
version n.m
|
|
|
|
The command specifies the version of this specification that the
|
|
packet conforms to. For this document the version is "1.2".
|
|
|
|
date dd mmm ccyy hh:mm:ss [zone]
|
|
|
|
The date and time when the packet was created. To prevent confusion
|
|
with different countries' date formats, the date MUST always appear
|
|
as "dd mmm ccyy". For example, "25 Jul 1993". This date format can
|
|
be converted to local conventions if desired. "hh:mm:ss" is a
|
|
24-hour clock time value. The "zone" field is the number of hours
|
|
and minutes that the timezone is offset from Greenwich Mean Time as
|
|
"+HHMM" or "-HHMM". For example, US Eastern Standard Time (EST) is
|
|
"-0500", and Australian Eastern Standard Time is "+1000". If the
|
|
zone is omitted, it defaults to "local time"; however, the zone should
|
|
only be omitted if there is no way to determine it.
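
As an illustration, the argument of a "date" command could be read and
written as in the sketch below. It assumes that month names are the
three-letter English abbreviations (as in "Jul") and that the program runs
in an English or "C" locale; the function names are hypothetical.

```python
from datetime import datetime

WITH_ZONE = "%d %b %Y %H:%M:%S %z"
WITHOUT_ZONE = "%d %b %Y %H:%M:%S"


def parse_date_argument(argument):
    """Parse 'dd mmm ccyy hh:mm:ss [zone]'.

    Returns a timezone-aware datetime when a zone is present, and a naive
    datetime (to be read as local time) when the zone is omitted.
    """
    argument = argument.strip()
    try:
        return datetime.strptime(argument, WITH_ZONE)
    except ValueError:
        return datetime.strptime(argument, WITHOUT_ZONE)


def format_date_argument(moment):
    """Format a datetime as a 'date' argument; no zone is written if naive."""
    return moment.strftime(WITH_ZONE).strip()
```

The naive result for a missing zone leaves the decision about what
"local time" means to the caller.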
|
|
|
|
subscribe name
|
|
|
|
This command requests the packet generating program to subscribe to
|
|
a new message area. The area name may contain spaces, but not TABs.
|
|
Additional fields may be added in a future version of this
|
|
specification after a separating TAB. For now, ignore anything after
|
|
a TAB. This command may generate an error message if the message area
|
|
does not exist, or cannot be subscribed to.
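
A minimal sketch of the TAB rule, assuming the argument text has already
been separated from the command name (the function name is hypothetical):

```python
def area_name(argument):
    """Return the area name from a subscribe/unsubscribe argument.

    Spaces are part of the name; anything after the first TAB is reserved
    for future fields and is dropped.
    """
    return argument.split("\t", 1)[0]
```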
|
|
|
|
unsubscribe name
|
|
|
|
This command requests the packet generating program to unsubscribe
|
|
from a message area. The same remarks about TABs and errors above
|
|
also apply to this command.
|
|
|
|
catchup [name]
|
|
|
|
This command requests the packet generating program to catch up on
|
|
the nominated message area. That is, to mark all messages in the
|
|
area as read and continue batching from the next message received.
|
|
If the area name is not present, the packet generating program
|
|
should catch up on all message areas.
|
|
|
|
list [always|never]
|
|
|
|
This command requests the packet generating program to send a
|
|
full list of all available message areas as a LIST file in
|
|
the next packet. If the argument "always" is present, then
|
|
the LIST file should be sent in every packet. The argument
|
|
value "never" reverses this. For minimal compliance,
|
|
"list always" should be treated as "list", and "list never"
|
|
should be ignored.
|
|
|
|
hostname string
|
|
|
|
This command specifies the name of the host or BBS the packet was
|
|
generated on. It serves an informational role only. The string
|
|
can be any sequence of printable ASCII characters.
|
|
|
|
software string
|
|
|
|
This command specifies the name and version of the software which
|
|
generated the packet. It serves an informational role only. The
|
|
string can be any sequence of printable ASCII characters.
|
|
|
|
sendme<TAB>area<TAB>selector[<TAB>selector[...]]
|
|
|
|
This command requests that the packet generator send a number of
|
|
messages from the nominated message area. The "selector" arguments
|
|
are taken from the "selector" fields in a 'c' or 'C' index file.
|
|
Multiple "sendme" commands for the same message area may be present
|
|
in a COMMANDS file. The maximum length for this command is 500
|
|
characters. Note that other commands use spaces to separate
|
|
arguments, but this command uses TABs.
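
Because several "sendme" commands may name the same area, a packet reader
can stay under the length limit by splitting a long selector list across
lines. The sketch below is one way to do this. It assumes the 500-character
limit excludes the terminating LF, which this specification does not state,
and the function name is hypothetical.

```python
MAX_COMMAND_LENGTH = 500


def sendme_lines(area, selectors, limit=MAX_COMMAND_LENGTH):
    """Build TAB-separated "sendme" lines, each at most `limit` characters."""
    prefix = "sendme\t" + area
    lines = []
    current = prefix
    for selector in selectors:
        candidate = current + "\t" + selector
        if len(candidate) > limit and current != prefix:
            # The line is full: emit it and start a new one for this area.
            lines.append(current)
            candidate = prefix + "\t" + selector
        current = candidate
    if current != prefix:
        lines.append(current)
    return lines
```

A single selector that is too long to fit on a line by itself is still
emitted on its own line; this sketch does not attempt to reject it.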
|
|
|
|
mail y
|
|
mail n
|
|
|
|
This command changes whether or not private e-mail should be sent
|
|
in generated packets.
|
|
|
|
deletemail y
|
|
deletemail n
|
|
|
|
This command changes whether or not the user's private mailbox should
|
|
be deleted after being batched into a packet.
|
|
|
|
mailindex x
|
|
|
|
Set the preferred mail index format, where 'x' is one of the values
|
|
'n', 'c', 'C' or 'i'.
|
|
|
|
newsindex x
|
|
|
|
Set the preferred news index format, where 'x' is one of the values
|
|
'n', 'c', 'C' or 'i'.
|
|
|
|
get filename [putname]
|
|
|
|
Request that a file on the generating side be placed into a packet
|
|
and sent to the packet reader. "putname" specifies the "filename"
|
|
argument for the corresponding "put" command. If "putname" is
|
|
not specified, the default is to use the base name of "filename".
|
|
If directory paths are specified, the separator must be '/'. It
|
|
should be noted that security could be breached through the use
|
|
of this command, so programs which support this command should be
|
|
very careful, preferably restricting requests to a particular
|
|
directory tree.
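
One possible way to restrict "get" requests to a directory tree is sketched
below. The policy shown (treat every request as relative to a public root
and refuse any ".." component) is an assumption of this example, not a
requirement of this specification, and the function names are hypothetical.
The sketch does not defend against symbolic links inside the tree.

```python
import posixpath


def safe_get_path(root, requested):
    """Map a "get" filename onto a path under `root`, or return None."""
    # '/' is the only separator; empty and "." components carry no meaning.
    parts = [p for p in requested.split("/") if p not in ("", ".")]
    if not parts or ".." in parts:
        return None  # refuse empty requests and attempts to climb upwards
    return posixpath.join(root, *parts)


def default_putname(filename):
    """Default "putname": the base name of the requested file."""
    return posixpath.basename(filename)
```

Under this policy, "get /usr/local/lib/fubar.txt" is looked up beneath the
public root rather than at the real absolute path.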
|
|
|
|
put pktname filename
|
|
|
|
This command is usually sent in response to a "get" command, although
|
|
it can be sent on its own. "pktname" specifies the name of the file
|
|
in the packet which contains the requested file's contents. The
|
|
"filename" argument specifies the destination file to write the contents
|
|
to. Note that security could be breached with this command, so
|
|
the destination filename should be checked, or restricted to a
|
|
particular directory tree. It is also recommended that the user
|
|
be prompted for confirmation before writing the file. If directory
|
|
paths are specified in "filename", the separator must be '/'. It
|
|
is recommended that the extension "FIL" be used for files in a
|
|
packet which contain data sent with this command. For example,
|
|
"put 001.FIL abc.zip"
|
|
|
|
supported cmd ...
|
|
|
|
This command is usually sent from a packet generator to inform a
|
|
packet reader as to which commands are supported by the generating
|
|
program. The argument is a space-separated list of command names.
|
|
For example, "supported subscribe unsubscribe list", or "supported
|
|
subscribe unsubscribe catchup list mail deletemail".
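
A packet reader might collect the supported command names as in the sketch
below. When no "supported" line is found, it falls back to "subscribe",
"unsubscribe" and "list", which are the commands this specification says to
assume in that case. The function name is hypothetical.

```python
DEFAULT_SUPPORTED = frozenset({"subscribe", "unsubscribe", "list"})


def supported_commands(commands_text):
    """Return the set of command names the packet generator supports."""
    found = None
    for line in commands_text.split("\n"):
        words = line.split()
        if words and words[0].lower() == "supported":
            # A later "supported" line replaces an earlier one.
            found = frozenset(word.lower() for word in words[1:])
    return DEFAULT_SUPPORTED if found is None else found
```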
|
|
|
|
It is recommended that at least "subscribe", "unsubscribe" and "list" (with
|
|
no arguments) be supported. Packet generators are recommended to add a
|
|
"supported" line to all packets generated to inform the packet reader
|
|
which commands can be used. In the absence of a "supported" line, only
|
|
"subscribe", "unsubscribe" and "list" should be assumed to be supported.
|
|
|
|
If more than one command is received for the same item (e.g. "subscribe",
|
|
"unsubscribe", "list", "mail", ...), then the last command in the COMMANDS
|
|
file takes precedence over any previous commands.
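
For subscription requests, the precedence rule can be sketched as follows.
The input is assumed to be a list of (command name, area name) pairs in file
order; the function name is hypothetical.

```python
def final_subscription_requests(commands):
    """Reduce subscribe/unsubscribe commands to one request per area."""
    requests = {}
    for name, area in commands:
        if name in ("subscribe", "unsubscribe"):
            # A later command for the same area overrides an earlier one.
            requests[area] = name
    return requests
```

Given "subscribe comp.lang.c" followed later by "unsubscribe comp.lang.c",
only the "unsubscribe" request remains for that area.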
|
|
|
|
The following example demonstrates a typical COMMANDS file sent from a
|
|
packet generator:
|
|
|
|
version 1.2
|
|
date 25 Jul 1993 12:34:38 +1000
|
|
hostname frobozz.domain.com
|
|
software Fubar 1.3
|
|
supported subscribe unsubscribe catchup list sendme get
|
|
put 001.FIL abc.zip
|
|
put 002.FIL def.txt
|
|
|
|
The following example demonstrates a typical COMMANDS file sent from a
|
|
packet reader:
|
|
|
|
subscribe comp.lang.c
|
|
subscribe comp.lang.misc
|
|
unsubscribe alt.swedish.chef.bork.bork.bork
|
|
list
|
|
get xyzzy.zip
|
|
get /usr/local/lib/fubar.txt frobozz.txt
|
|
|
|
MESSAGE AREA SUMMARIES
|
|
|
|
The preceding sections have described a number of features for supporting
|
|
message area summaries. This section provides greater detail.
|
|
|
|
Since some message areas, notably USENET newsgroups, can get quite large,
|
|
the user may want to download a summary of a message area instead of all
|
|
of the messages, and then request that messages of interest be sent at
|
|
some later time for reading. Usually the summary will list the messages'
|
|
subjects, authors, and other similar "header information". Optionally,
|
|
the user may request that the first few lines of the messages also be
|
|
sent so that the user may peruse the beginning of the message and decide
|
|
whether to retrieve the rest of the message.
|
|
|
|
This activity is supported in the following fashion in this packet format:
|
|
summary information is sent in an index file of type 'c' or 'C', usually
|
|
with no accompanying message file. Therefore, the message file format in
|
|
the AREAS file will be set to 'i'. Each line in the index file has its
|
|
"bytes" field set to 0 to indicate that the message is not present in
|
|
the message file, and the "selector" field is set to some string that can
|
|
be used to request the message by way of a "sendme" command. Usually this
|
|
selection string will be the message number of the message on the generating
|
|
host, but other values such as Message-IDs are allowable.
|
|
|
|
If the first few lines of each message are also desired, the message file
|
|
format is set to something other than 'i', and the "offset" and "bytes" fields
|
|
in the index file may be used to extract the trimmed-down messages for
|
|
perusal. The "selector" field is once again used to request that an entire
|
|
message be sent at some later time, by way of a "sendme" command.
|
|
|
|
It is possible to create a message area which contains both ordinary messages
|
|
and summary messages. If the "selector" field is not present, or is
|
|
zero-length, then the message should be processed in the usual way, and if
|
|
the "selector" field is present and not zero-length, then it is a summary
|
|
message and the "bytes" field can be used to determine if the first few
|
|
lines of a message exist in the message file or not. This mixture can be
|
|
useful in some situations where the user wishes to download all messages
|
|
less than a certain length, and download the larger messages as summaries,
|
|
so that the larger messages can be explicitly requested only if the user
|
|
really wants them.
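
The decision described above can be sketched as a small classifier over the
"selector" and "bytes" fields of one index entry. The function name and the
three labels are hypothetical, and the entry is assumed to have been parsed
already, with a missing "selector" represented as None or an empty string.

```python
def classify_index_entry(selector, nbytes):
    """Classify one 'c' or 'C' index entry in a mixed message area."""
    if not selector:
        # No selector (absent or zero-length): an ordinary message.
        return "ordinary"
    if nbytes > 0:
        # Summary whose first few lines are present in the message file.
        return "summary-with-preview"
    # Summary only; the text must be requested with a "sendme" command.
    return "summary-only"
```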
|
|
|
|
MINIMAL CONFORMANCE
|
|
|
|
This section describes the minimal amount of work that a packet processor
|
|
must do to be compliant with this specification.
|
|
|
|
Packet generators should be able to generate message areas for the 'b'
|
|
and 'u' message formats for private and public message areas respectively,
|
|
and process replies for the 'b' and 'B' message formats. For minimal
|
|
conformance, index format 'n' must be supported, and if message area
|
|
summaries are required, one of index formats 'c' or 'C' should be supported.
|
|
It is recommended that either 'c' or 'C' be supported in all packet
|
|
generators, even when message summaries are not required. If message
|
|
summaries are supported, the minimal requirement is to send an index file
|
|
with the message file format set to 'i'. Packet generators should support
|
|
the "subscribe", "unsubscribe" and "list" commands, and also the "sendme"
|
|
command if message area summaries are required.
|
|
|
|
Packet readers should be able to read all message and index formats, and
|
|
generate replies for the 'b' and 'B' message formats. If message area
|
|
summaries are not supported, all areas with message format 'i' should be
|
|
flagged to the user as not understood. Packet readers should also be
|
|
able to display the INFO and LIST files if they are present in a packet
|
|
and be able to prompt the user for "subscribe" and "unsubscribe" requests
|
|
to be sent to the packet generator.
|
|
|
|
FUTURE ENHANCEMENTS
|
|
|
|
The obvious enhancement that can be made is to support other message formats,
|
|
especially FidoNet formats. Currently the message area file code 'q' is
|
|
reserved for QWK-format messages. This will be defined in a future version
|
|
of this specification if demand warrants.
|
|
|
|
Experimentation with other formats and auxiliary files is encouraged, but
|
|
please contact the author first to prevent double-ups from occurring.
|
|
The author may be contacted via e-mail at rhys@cs.uq.oz.au.
|