What is a USENET domain?
------------------------

What is a Domain?

Mark R. Horton
Bell Laboratories
Columbus, Ohio 43213

ABSTRACT

In the past, electronic mail has used many different kinds of
syntax, naming a computer and a login name on that computer. A
new system, called ``domains'', is becoming widely used, based
on a hierarchical naming scheme. This paper is intended as a
quick introduction to domains. For more details, you should
read some of the documents referenced at the end.

1. Introduction

What exactly are domains? Basically, they are a way of looking
at the world as a hierarchy (tree structure). You're already
used to two tree-structured world models that work pretty well:
the telephone system and the post office. Domains form a
similar hierarchy for the electronic mail community.

The post office divides the world up geographically, first into
countries; then each country divides itself up, those units
subdivide, and so on. One such country, the USA, divides into
states, which divide into counties (except for certain states,
like Louisiana, which divide into things like parishes). The
counties subdivide into cities, towns, and townships, which
typically divide into streets; the streets divide into lots with
addresses, possibly containing room and apartment numbers, and
then into the individual people at that address. So you have an
address like

        Mark Horton
        Room 2C-249
        6200 E. Broad St.
        Columbus, Ohio, USA

(I'm ignoring the name ``AT&T Bell Laboratories'' and the zip
code, which are redundant information.) Other countries may
subdivide differently; for example, many small countries do not
have states.

The telephone system is similar. Your full phone number might
look like 1-614-860-1234 x234. This contains, from left to
right, your country code (Surprise! The USA has country code
``1''!), area code 614 (Central Ohio), 860 (a prefix in the
Reynoldsburg C.O.), 1234 (individual phone number), and
extension 234. Some phone numbers do not have extensions, but
the phone system in the USA has standardized on a 3 digit area
code, 3 digit prefix, and 4 digit phone number. Other countries
don't use this standard: for example, in Sweden a number might
be +46 8 7821234 (country code 46, city code 8, number 7821234),
in Germany +49 231 7551234, in the Netherlands +31 80 551234, and
in Britain +44 227 61234 or +44 506 411234. Note that the country
codes, city codes, and telephone numbers are not all the same
length, and the punctuation is different from our North American
notation. Within a country, the length of the telephone number
might depend on the city code. Even within the USA, the length
of extensions is not standardized: some places use the last 4
digits of the telephone number for the extension, and some use
2, 3, or 4 digit extensions you must ask an operator for. Each
country has established local conventions. But the numbers are
unambiguous when dialed from left to right, so as long as there
is a way to indicate when you are done dialing, there is no
problem.
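
To make the left-to-right property concrete, here is a minimal
Python sketch of how a switch might peel the country code off a
dialed number. It assumes a small, hypothetical table holding only
the country codes mentioned above; a real dialing plan is much
larger and would continue with per-country city-code tables. The
sketch works because no code in the table is a prefix of another,
so the first match found while reading left to right is the only
possible one.

```python
# Hypothetical excerpt of a country-code table: only the codes cited above.
COUNTRY_CODES = {
    "1": "USA",
    "31": "Netherlands",
    "44": "Britain",
    "46": "Sweden",
    "49": "Germany",
}


def split_country_code(number: str) -> tuple[str, str]:
    """Read digits left to right until they match a country code.

    Because no code in the table is a prefix of another (the codes are
    "prefix-free"), the first match is the only match, so the switch never
    has to look ahead. Returns (country, remaining digits).
    """
    digits = "".join(ch for ch in number if ch.isdigit())  # drop +, -, spaces
    for end in range(1, len(digits) + 1):
        prefix = digits[:end]
        if prefix in COUNTRY_CODES:
            return COUNTRY_CODES[prefix], digits[end:]
    raise ValueError(f"no known country code at the start of {number!r}")
```

For example, `split_country_code("+46 8 7821234")` returns
`("Sweden", "87821234")`, leaving the city code and number for the
next level of the network to interpret.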

A key difference in philosophy between the two systems is
evident from the way addresses and telephone numbers are
written. With an address, the most specific information comes
first and the least specific last. (The ``root of the tree'' is
at the right.) With telephones, the least specific information
(the root) is at the left. The telephone system was designed for
machinery that looks at the first few digits, does something
with them, and passes the remainder through to the next level.
Thus, in effect, you are routing your call through the telephone
network. Of course, the exact sequence you dial depends on where
you are dialing from: sometimes you must dial 9 or 8 first; to
get an international dialtone you must dial 011; if you are
calling locally you can (and sometimes must) leave off the 1 and
the area code. (This makes life very interesting for people who
must design a box to call their home office from any phone in
the world.) This type of address is called a ``relative
address'', since the actual address used depends on the location
of the sender.

The postal system, on the other hand, allows you to write the
same address no matter where the sender is. The address above
will get to me from anywhere in the world, even from private
company mail systems. Yet some optional abbreviations are
possible: I can leave off the USA if I'm mailing within the USA,
and if I'm in the same city as the address, I can usually just
write ``city'' in place of the last line. This type of address
is called an ``absolute address'', since the unabbreviated form
does not depend on the location of the sender.

The ARPANET has evolved with a system of absolute addresses:
``user@host'' works from any machine. The UUCP network has
evolved with a system of relative addresses: ``host!user'' works
from any machine with a direct link to ``host'', and you have to
route your mail through the network to find such a machine. In
fact, the ``user@host'' syntax has become so popular that many
sites run mail software that accepts this syntax, looks up
``host'' in a table, and sends it to the appropriate network for
``host''. This is a very nice user interface, but it only works
well in a small network. Once the set of allowed hosts grows
past about 1000 hosts, you run into all sorts of administrative
problems.

One problem is that it becomes nearly impossible to keep a table
of host names up to date. New machines are being added somewhere
in the world every day, and nobody tells you about them. When
you try to send mail to a host that isn't in your table (say,
replying to mail you just got from a new host), your mailing
software might try to route it to a smarter machine, but without
knowing which network the host is on, it can't guess which
smarter machine to forward to. Another problem is name space
collision: there is nothing to prevent a host on one network
from choosing the same name as a host on another network. For
example, DEC's ENET has a ``vortex'' machine, and there is also
one on UUCP. Both had their names long before the two networks
could talk to each other, and neither had to ask the other
network for permission to use the name. The problem is
compounded when you consider how many computer centers name
their machines ``A'', ``B'', ``C'', and so on.

In recognition of this problem, ARPA has established a new way
to name computers based on domains. The ARPANET is pioneering
the domain convention, and many other computer networks are
falling in line, since it is the first naming convention that
looks like it really stands a chance of working. The MILNET
portion of ARPANET has a domain, CSNET has one, and it appears
that Digital, AT&T, and UUCP will be using domains as well.
Domains look a lot like postal addresses, with a simple syntax
that fits on one line, is easy to type, and is easy for
computers to handle. To illustrate, an old routed UUCP address
might read ``sdcsvax!ucbvax!allegra!cbosgd!mark''. The domain
version of this might read ``mark@d.osg.cb.att.uucp''. The
machine is named d.osg.cb.att.uucp (UUCP domain, AT&T company,
Columbus site, Operating System Group project, fourth machine).
Of course, this example is somewhat verbose and contrived; it
illustrates the hierarchy well, but most people would rather
type something like ``cbosgd.att.uucp'' or even
``cbosgd.uucp'', and actual domains are usually set up so that
you don't have to type very much.

You may wonder why the single @ sign is present, that is, why
the above address does not read ``mark.d.osg.cb.att.uucp''. In
fact, it was originally proposed in this form, and some of the
examples in RFC819 do not contain an @ sign. The @ sign is
present because some ARPANET sites felt a strong need for a
divider between the domain, which names one or more computers,
and the left-hand side, which is subject to whatever
interpretation the domain chooses. For example, if the ATT
domain chooses to address people by full name rather than by
their login, an address like ``Mark.Horton@ATT.UUCP'' makes it
clear that some machine in the ATT domain should interpret the
string ``Mark.Horton'', but if the address were
``Mark.Horton.ATT.UUCP'', routing software might try to find a
machine named ``Horton'' or ``Mark.Horton''. (By the way, case
is ignored in domains, so that ``ATT.UUCP'' is the same as
``att.uucp''. To the left of the @ sign, however, a domain can
interpret the text any way it wants; case can be ignored or it
can be significant.)
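
A mailer that follows these rules has to split an address at the
@ and treat the two halves differently. The following Python
sketch assumes the simple ``local@domain'' form discussed here
(no explicit routes, no quoted strings); the function name
`split_address` is hypothetical. It folds only the domain to lower
case and leaves the left-hand side untouched for the destination
domain to interpret.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split a "local@domain" address at its last @ sign.

    The domain half is case-insensitive, so it is folded to lower case.
    The local half is returned exactly as written, because only the
    destination domain knows whether case matters there.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not of the form local@domain: {address!r}")
    return local, domain.lower()
```

For example, `split_address("Mark.Horton@ATT.UUCP")` gives
`("Mark.Horton", "att.uucp")`, while `split_address("Mark.Horton.ATT.UUCP")`
raises `ValueError`: without the @ there is no way to tell where
the name ends and the domain begins.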

It is important to note that domains are not routes. Some people
look at the number of !'s in the first example and the number of
.'s in the second, and assume the latter is being routed from a
machine called ``uucp'' to another called ``att'' to another
called ``cb'' and so on. While it is possible to set up mail
routing software to do this (indeed, in the worst case, even
without a reasonable set of tables, this method will always
work), the intent is that ``d.osg.cb.att.uucp'' is the name of a
machine, not a path to get there. In particular, domains are
absolute addresses, while routes depend on the location of the
sender. Some subroutine is charged with figuring out what to do
with a given domain-based machine name. In a high-quality
environment like the ARPA Internet, it can query a table or a
name server, come up with a 32 bit host number, and connect you
directly to that machine. In the UUCP environment, we don't have
the concept of two processes on arbitrary machines talking
directly, so we forward mail one hop at a time until it gets to
the appropriate destination. In this case, the subroutine
decides if the name represents the local machine, and if not,
decides which of its neighbors to forward the message to.
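
The UUCP case of that subroutine can be sketched in Python. This
is one plausible strategy (try the full name, then each enclosing
domain, and use the most specific entry found), not necessarily
what any particular mailer does; the local host name, the table
contents, and the neighbor name ``cbgw'' are all hypothetical.

```python
from typing import Optional

# Hypothetical configuration for one UUCP host.
LOCAL_NAME = "d.osg.cb.att.uucp"
NEXT_HOP = {
    "cb.att.uucp": "cbgw",   # hypothetical neighbor inside the cb site
    "att.uucp": "ihnp4",
    "uucp": "ihnp4",
    "arpa": "seismo",
}


def next_hop(host: str) -> Optional[str]:
    """Return None if host names this machine, else the neighbor to forward to.

    Tries the full name first and then each enclosing domain, so the most
    specific matching table entry wins.
    """
    host = host.lower()                 # domain names ignore case
    if host == LOCAL_NAME:
        return None                     # deliver locally
    labels = host.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])   # full name, then each parent domain
        if suffix in NEXT_HOP:
            return NEXT_HOP[suffix]
    raise LookupError(f"no route to {host!r}")
```

With this table, mail for ``e.osg.cb.att.uucp'' goes to cbgw, mail
for ``NIC.ARPA'' goes to seismo, and mail for ``D.OSG.CB.ATT.UUCP''
is delivered locally. The table names destinations, not paths:
each host along the way repeats the same lookup with its own table.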

2. What is a Domain?

So, after all this background, we still haven't said what a
domain is. The answer (I hope it's been worth the wait) is that
a domain is a subtree of the world tree. For example, ``uucp''
is a top level domain (that is, a subtree of the ``root'') and
represents all names and machines beneath it in the tree.
``att.uucp'' is a subdomain of ``uucp'', representing all names,
machines, and subdomains beneath ``att'' in the tree. The same
holds for ``cb.att.uucp'', ``osg.cb.att.uucp'', and even
``d.osg.cb.att.uucp'' (although ``d.osg.cb.att.uucp'' is a
``leaf'' domain, representing only the one machine).
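
Because a domain is a subtree, whether a name falls inside a
domain depends only on its trailing labels. A short Python sketch
(both function names are hypothetical) that lists every domain
enclosing a name and tests membership label by label:

```python
def enclosing_domains(name: str) -> list[str]:
    """Return name and every domain above it, most specific first."""
    labels = name.lower().split(".")          # case is ignored in domains
    return [".".join(labels[i:]) for i in range(len(labels))]


def in_domain(name: str, domain: str) -> bool:
    """True if name lies in the subtree rooted at domain."""
    return domain.lower() in enclosing_domains(name)
```

Here `enclosing_domains("d.osg.cb.att.uucp")` returns
`["d.osg.cb.att.uucp", "osg.cb.att.uucp", "cb.att.uucp", "att.uucp", "uucp"]`.
Comparing whole labels rather than raw string suffixes matters:
``xatt.uucp'' ends with the characters ``att.uucp'' but is not in
that domain, and `in_domain("xatt.uucp", "att.uucp")` returns `False`.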

A domain has certain properties. The key property is that it has
a ``registry''. That is, the domain has a list of the names of
all immediate subdomains, plus information about how to get to
each one. There is also a contact person for the domain. This
person is responsible for the domain, keeping the registry
up-to-date, serving as a point of contact for outside queries,
and setting policy requirements for subdomains. Each subdomain
can decide who it will allow to have subdomains, and establish
requirements that all subdomains must meet to be included in the
registry. For example, the ``cb'' domain might require all
subdomains to be physically located in the AT&T building in
Columbus.
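
A registry can be pictured as one small table per node of the
tree. The Python sketch below is an illustration under stated
assumptions, not a description of any real registry: the `Domain`
class, its fields, and the contact strings are hypothetical, and
policy checks are reduced to rejecting duplicate names. Because a
node is just an object, the same subdomain can be registered under
two parents, since nothing in the structure requires a single
parent.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """One node of the name tree, carrying its own registry."""
    name: str                    # label relative to the parent, e.g. "att"
    contact: str                 # person responsible for this domain
    how_to_reach: str = ""       # free-form note on how to get there
    subdomains: dict[str, "Domain"] = field(default_factory=dict)

    def register(self, child: "Domain") -> "Domain":
        """Add an immediate subdomain; real policy checks would go here."""
        key = child.name.lower()  # domain labels ignore case
        if key in self.subdomains:
            raise ValueError(f"{child.name!r} is already registered")
        self.subdomains[key] = child
        return child

    def lookup(self, full_name: str) -> "Domain":
        """Walk down from this node, taking the rightmost label first."""
        node = self
        for label in reversed(full_name.lower().split(".")):
            node = node.subdomains[label]  # KeyError if not registered
        return node


# Hypothetical tree: "cb" is registered under both "att" and "ohio".
root = Domain("", "root contact")
uucp = root.register(Domain("uucp", "uucp contact"))
att = uucp.register(Domain("att", "att contact"))
ohio = uucp.register(Domain("ohio", "ohio contact"))
cb = att.register(Domain("cb", "cb contact"))
ohio.register(cb)
```

With this tree, `root.lookup("cb.att.uucp")` and
`root.lookup("cb.ohio.uucp")` return the very same `cb` node, and
its `contact` field says who is responsible for it.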

ARPA has established certain requirements for top level domains.
These requirements specify that there must be a list of all
subdomains and contact persons for them, a responsible person
who is an authority for the domain (so that if some site does
something bad, it can be made to stop), a minimum size (to
prevent small domains from being top level), and a pair of
nameservers (for redundancy) to provide a directory-assistance
facility. Domains can be more lax about the requirements they
place on their subdomains, making it harder to be a top level
domain than somewhere lower in the tree. Of course, if you are a
subdomain, your parent is responsible for you.

One requirement that is NOT present is for unique parents. That
is, a machine (or an entire subdomain) need not appear in only
one place in the tree. Thus, ``cb'' might appear both in the
``att'' domain and in the ``ohio'' domain. This allows domains to
be structured more flexibly than just the simple geography used
by the postal service and the telephone company; organizations
or topography can be used in parallel. (Actually, there are a few
instances where this is done in the postal service [overseas
military mail] and the telephone system [prefixes can appear in
more than one area code, e.g. near Washington D.C., and Silicon
Valley].) It also allows domains to split or join up, while
remaining upward compatible with their old addresses.

Do all domains represent specific machines? Not necessarily.
It's pretty obvious that a full path like ``d.cbosg.att.uucp''
refers to exactly one machine. The OSG domain might decide that
``cbosg.att.uucp'' represents a particular gateway machine. Or it
might decide that it represents a set of machines, several of
which might be gateways. The ``att.uucp'' domain might decide
that several machines, ``ihnp4.uucp'', ``whgwj.uucp'', and
``hogtw.uucp'', are all entry points into ``att.uucp''. Or it
might decide that it just represents a spot in the name space,
not a machine. For example, there is no machine corresponding to
``arpa'' or ``uucp'', or to the root. Each domain decides for
itself. The naming space and the algorithm for getting mail from
one machine to another are not closely linked: routing is up to
the mail system to figure out, with or without help from the
structure of the names.

The domain syntax does allow explicit routes, in case you want to
exercise a particular route or some gateway is balking. The
syntax is ``@dom1,@dom2,...,@domn:user@domain''; for example,
``@ihnp4.UUCP,@ucbvax.UUCP:joe@NIC.ARPA''. This forces the mail
to be routed through dom1, dom2, ..., domn, and from domn sent to
the final address. This behaves exactly like the UUCP ! routing
syntax, although it is somewhat more verbose.
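
Splitting such an explicit route into its hops is mechanical. A
minimal Python sketch, assuming the route is written exactly in
the form shown (each hop starts with @, hops are separated by
commas, and a single colon precedes the final address); the
function name `parse_route` is hypothetical:

```python
def parse_route(address: str) -> tuple[list[str], str]:
    """Split "@dom1,@dom2,...,@domn:user@domain" into hops and final address.

    Returns (["dom1", ..., "domn"], "user@domain"). An address without a
    route part comes back with an empty hop list.
    """
    if not address.startswith("@"):
        return [], address                   # plain user@domain, no route
    route, sep, final = address.partition(":")
    if not sep:
        raise ValueError(f"route has no ':' before the final address: {address!r}")
    hops = []
    for hop in route.split(","):
        if not hop.startswith("@") or len(hop) < 2:
            raise ValueError(f"bad route element {hop!r}")
        hops.append(hop[1:])                 # strip the leading @
    return hops, final
```

For example, `parse_route("@ihnp4.UUCP,@ucbvax.UUCP:joe@NIC.ARPA")`
returns `(["ihnp4.UUCP", "ucbvax.UUCP"], "joe@NIC.ARPA")`: the mail
goes to ihnp4.UUCP first, then to ucbvax.UUCP, which sends it on to
joe@NIC.ARPA. A stray comma before the colon produces an empty hop,
which the sketch rejects with `ValueError`.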

By the way, you've no doubt noticed that some forms of electronic
addresses read from left to right (cbosgd!mark), while others
read from right to left (mark@Berkeley). Which is better? The
real answer here is that it's a religious issue, and it doesn't
make much difference. Left-to-right is probably a bit easier for
a computer to deal with, because it can understand something on
the left and ignore the remainder of the address. (While it's
almost as easy for the program to read from right to left, the
ease of going from left to right was probably in the back of the
minds of the designers who invented host:user and host!user.)

On the other hand, I claim that user@host is easier for humans to
read, since people tend to start reading from the left and quit
as soon as they recognize the login name of the person. Also, a
mail program that prints a table of headers may have to truncate
the sender's address to make it fit in a fixed number of columns,
and it's probably more useful to read ``mark@d.osg.a'' than
``ucbvax!sdcsv''.

These are pretty minor issues; after all, humans can adapt to
skip to the end of an address, and programs can truncate on the
left. But the real problem is that if the world contains BOTH
left-to-right and right-to-left syntax, you have ambiguous
addresses like x!y@z to consider. This single problem turns out
to be a killer, and is the best single reason to try to stamp out
one in favor of the other.
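
The ambiguity is easy to demonstrate. This Python sketch (the
function name is hypothetical) splits a mixed address under each
convention and reports, for each, the first host the mail would
go to and the address that host is asked to handle:

```python
def parse_both_ways(address: str) -> dict[str, tuple[str, str]]:
    """Split a mixed address such as "x!y@z" under both conventions.

    Each result is (first host, address handed to that host).
    """
    if "!" not in address or "@" not in address:
        raise ValueError(f"not a mixed ! and @ address: {address!r}")
    # ARPANET reading: split at the last @, so the mail goes to z,
    # which must interpret "x!y".
    local, _, host = address.rpartition("@")
    # UUCP reading: split at the first !, so the mail goes to x,
    # which must interpret "y@z".
    first, _, rest = address.partition("!")
    return {"arpanet": (host, local), "uucp": (first, rest)}
```

Here `parse_both_ways("x!y@z")` returns
`{"arpanet": ("z", "x!y"), "uucp": ("x", "y@z")}`. Both answers are
well-formed, so no amount of parsing cleverness can tell which one
the sender meant; only agreeing on a single convention removes the
ambiguity.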

3. So why are we doing this, anyway?

The current world is full of lots of interesting kinds of mail
syntax. The old ARPA ``user@host'' is still used on the ARPANET
by many systems. Explicit routing can sometimes be done with an
address like ``user@host2@host1'', which sends the mail to host1
and lets host1 interpret ``user@host2''. Addresses with more than
one @ were made illegal a few years ago, but many ARPANET hosts
depended on them, and the syntax is still being used. UUCP uses
``h1!h2!h3!user'', requiring the user to route the mail. Berknets
use ``host:user'' and do not allow explicit routing.

To get mail from one host to another, it had to be routed
through gateways. Thus, the address ``csvax:mark@Berkeley'' from
the ARPANET would send the mail to Berkeley, which would forward
it to the Berknet address csvax:mark. To send mail to the ARPANET
from UUCP, an address such as ``ihnp4!ucbvax!sam@foo-unix'' would
route it through ihnp4 to ucbvax, which would interpret
``sam@foo-unix'' as an ARPANET address and pass it along. When
the Berknet-UUCP gateway and Berknet-ARPANET gateway were on
different machines, addresses such as
``csvax:ihnp4!ihnss!warren@Berkeley'' were common.

As you can see, the combination of left-to-right UUCP syntax and
right-to-left ARPANET syntax makes things pretty complex.
Berknets are gone now, but there are lots of gateways between
UUCP and the ARPANET and ARPANET-like mail networks. Sending mail
to an address for which you only know a path from the ARPANET
onto UUCP is even harder. Suppose the address you have is
ihnp4!ihnss!warren@Berkeley, and you are on host rlgvax, which
uses seismo as an ARPANET gateway. You must send to
seismo!ihnp4!ihnss!warren@Berkeley, which is not only pretty hard
to read, but when the recipient tries to reply, it will have no
idea where the break in the address between the two UUCP pieces
occurs. An ARPANET site routing across the UUCP world to
somebody's Ethernet using domains locally will have to send an
address something like ``xxx@Berkeley.ARPA'' to get it to UUCP,
then ``ihnp4!decvax!island!yyy'' to get it to the other
Ethernet, then ``sam@csvax.ISLAND'' to get it across their
Ethernet. The single address would therefore be
ihnp4!decvax!island!sam@csvax.ISLAND@Berkeley.ARPA, which is too
much to ask any person or mailer to understand. It's even worse:
gateways have to deal with ambiguous names like
ihnp4!mark@Berkeley, which can be parsed either as
``(ihnp4!mark)@Berkeley'' in accordance with the ARPANET
conventions, or as ``ihnp4!(mark@Berkeley)'' as the old UUCP
would.

Another very important reason for using domains is that your
mailing address becomes absolute instead of relative. It becomes
possible to put your electronic address on your business card or
in your signature file without worrying about writing six
different forms and fifteen hosts that know how to get to yours.
It drastically simplifies the job of the reply command in your
mail program, and the automatic reply code in the netnews
software.

4. Further Information

For further information, some of the basic ARPANET reference
documents are in order. These can often be found posted to
Usenet, or available nearby. They are all available on the
ARPANET on host NIC via FTP with login ANONYMOUS, if you have an
ARPANET login. They can also be ordered from the Network
Information Center, SRI International, Menlo Park, California,
94025.

RFC819  The Domain Naming Convention for Internet User Applications
RFC821  Simple Mail Transfer Protocol
RFC822  Standard for the Format of ARPA Internet Text Messages
RFC881  The Domain Names Plan and Schedule

#
# @(#)domain.mm 2.1 smail 12/14/86
#