What is a USENET domain?
------------------------
What is a Domain?
Mark R. Horton
Bell Laboratories
Columbus, Ohio 43213
ABSTRACT
In the past, electronic mail has used
many different kinds of syntax, naming a
computer and a login name on that
computer. A new system, called
``domains'', is becoming widely used,
based on a hierarchical naming scheme.
This paper is intended as a quick
introduction to domains. For more
details, you should read some of the
documents referenced at the end.
1. Introduction
What exactly are domains? Basically, they are a way of
looking at the world as a hierarchy (tree structure).
You're already used to using two tree world models that work
pretty well: the telephone system and the post office.
Domains form a similar hierarchy for the electronic mail
community.
The post office divides the world up geographically, first
into countries, then each country divides itself up, those
units subdivide, and so on. One such country, the USA,
divides into states, which divide into counties (except for
certain states, like Louisiana, which divide into things
like parishes), the counties subdivide into cities, towns,
and townships, which typically divide into streets, the
streets divide into lots with addresses, possibly containing
room and apartment numbers, and then the individual people at
that address. So you have an address like
Mark Horton
Room 2C-249
6200 E. Broad St.
Columbus, Ohio, USA
(I'm ignoring the name ``AT&T Bell Laboratories'' and the
zip code, which are redundant information.) Other countries
may subdivide differently, for example many small countries
do not have states.
The telephone system is similar. Your full phone number
might look like 1-614-860-1234 x234. This contains, from left
to right, your country code (Surprise! The USA has country
code ``1''!), area code 614 (Central Ohio), 860 (a prefix in
the Reynoldsburg C.O.), 1234 (individual phone number), and
extension 234. Some phone numbers do not have extensions,
but the phone system in the USA has standardized on a 3
digit area code, 3 digit prefix, and 4 digit phone number.
Other countries don't use this standard; for example, in
Sweden a number might be +46 8 7821234 (country code
46, city code 8, number 7821234), in Germany +49 231
7551234, in the Netherlands +31 80 551234, in Britain +44 227 61234
or +44 506 411234. Note that the country and city codes and
telephone numbers are not all the same length, and the
punctuation is different from our North American notation.
Within a country, the length of the telephone number might
depend on the city code. Even within the USA, the length of
extensions is not standardized: some places use the last 4
digits of the telephone number for the extension, some use 2
or 3 or 4 digit extensions you must ask an operator for.
Each country has established local conventions. But the
numbers are unambiguous when dialed from left-to-right, so as
long as there is a way to indicate when you are done
dialing, there is no problem.
A key difference in philosophy between the two systems is
evident from the way addresses and telephone numbers are
written. With an address, the most specific information
comes first, the least specific last. (The ``root of the
tree'' is at the right.) With telephones, the least
specific information (root) is at the left. The telephone
system was designed for machinery that looks at the first
few digits, does something with it, and passes the remainder
through to the next level. Thus, in effect, you are routing
your call through the telephone network. Of course, the
exact sequence you dial depends on where you are dialing
from: sometimes you must dial 9 or 8 first; to get an
international dialtone you must dial 011; and if you are calling
locally you can (and sometimes must) leave off the 1 and the
area code. (This makes life very interesting for people who
must design a box to call their home office from any phone
in the world.) This type of address is called a ``relative
address'', since the actual address used depends on the
location of the sender.
The postal system, on the other hand, allows you to write
the same address no matter where the sender is. The address
above will get to me from anywhere in the world, even
private company mail systems. Yet, some optional
abbreviations are possible - I can leave off the USA if I'm
mailing within the USA; if I'm in the same city as the
address, I can usually just say ``city'' in place of the
last line. This type of address is called an ``absolute
address'', since the unabbreviated form does not depend on
the location of the sender.
The ARPANET has evolved with a system of absolute addresses:
``user@host'' works from any machine. The UUCP network has
evolved with a system of relative addresses: ``host!user''
works from any machine with a direct link to ``host'', and
you have to route your mail through the network to find such
a machine. In fact, the ``user@host'' syntax has become so
popular that many sites run mail software that accepts this
syntax, looks up ``host'' in a table, and sends it to the
appropriate network for ``host''. This is a very nice user
interface, but it only works well in a small network. Once
the set of allowed hosts grows past about 1000 hosts, you
run into all sorts of administrative problems.
One problem is that it becomes nearly impossible to keep a
table of host names up to date. New machines are being
added somewhere in the world every day, and nobody tells you
about them. When you try to send mail to a host that isn't
in your table (replying to mail you just got from a new
host), your mailing software might try to route it to a
smarter machine, but without knowing which network to send
it to, it can't guess which smarter machine to forward to.
Another problem is name space collision - there is nothing
to prevent a host on one network from choosing the same name
as a host on another network. For example, DEC's ENET has a
``vortex'' machine, and there is also one on UUCP. Both had
their names long before the two networks could talk to each
other, and neither had to ask the other network for
permission to use the name. The problem is compounded when
you consider how many computer centers name their machines
``A'', ``B'', ``C'', and so on.
In recognition of this problem, ARPA has established a new
way to name computers based on domains. The ARPANET is
pioneering the domain convention, and many other computer
networks are falling in line, since it is the first naming
convention that looks like it really stands a chance of
working. The MILNET portion of ARPANET has a domain, CSNET
has one, and it appears that Digital, AT&T, and UUCP will be
using domains as well. Domains look a lot like postal
addresses, with a simple syntax that fits on one line, is
easy to type, and is easy for computers to handle. To
illustrate, an old routed UUCP address might read
``sdcsvax!ucbvax!allegra!cbosgd!mark''. The domain version
of this might read ``mark@d.osg.cb.att.uucp''. The machine
is named d.osg.cb.att.uucp (UUCP domain, AT&T company,
Columbus site, Operating System Group project, fourth
machine.) Of course, this example is somewhat verbose and
contrived; it illustrates the hierarchy well, but most
people would rather type something like ``cbosgd.att.uucp''
or even ``cbosgd.uucp'', and actual domains are usually set
up so that you don't have to type very much.
You may wonder why the single @ sign is present, that is,
why the above address does not read
``mark.d.osg.cb.att.uucp''. In fact, it was originally
proposed in this form, and some of the examples in RFC819 do
not contain an @ sign. The @ sign is present because some
ARPANET sites felt the strong need for a divider between the
domain, which names one or more computers, and the left hand
side, which is subject to whatever interpretation the domain
chooses. For example, if the ATT domain chooses to address
people by full name rather than by their login, an address
like ``Mark.Horton@ATT.UUCP'' makes it clear that some
machine in the ATT domain should interpret the string
``Mark.Horton'', but if the address were
``Mark.Horton.ATT.UUCP'', routing software might try to find
a machine named ``Horton'' or ``Mark.Horton''. (By the way,
case is ignored in domains, so that ``ATT.UUCP'' is the same
as ``att.uucp''. To the left of the @ sign, however, a
domain can interpret the text any way it wants; case can be
ignored or it can be significant.)
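
As a rough illustration of the rule just described, the sketch below
splits an address at the @ sign and folds only the domain part to
lower case. The function names are hypothetical, and splitting at the
last @ is an assumption made here for addresses that contain more
than one.

```python
def split_address(address):
    """Split 'local@domain' at the last @ and normalize the domain's case."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a user@domain address: {address!r}")
    # Case is ignored in domains. The left-hand side is returned untouched,
    # because only the destination domain knows how to interpret it.
    return local, domain.lower()


def same_domain(a, b):
    """Compare two domain names, ignoring case."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    print(split_address("Mark.Horton@ATT.UUCP"))
    print(same_domain("ATT.UUCP", "att.uucp"))
```

Note that ``Mark.Horton'' keeps its capitalization: whether case
matters there is up to the ATT domain, not the sender's software.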
It is important to note that domains are not routes. Some
people look at the number of !'s in the first example and
the number of .'s in the second, and assume the latter is
being routed from a machine called ``uucp'' to another
called ``att'' to another called ``cb'' and so on. While it
is possible to set up mail routing software to do this, and
indeed in the worst case, even without a reasonable set of
tables, this method will always work, the intent is that
``d.osg.cb.att.uucp'' is the name of a machine, not a path
to get there. In particular, domains are absolute
addresses, while routes depend on the location of the
sender. Some subroutine is charged with figuring out, given
a domain based machine name, what to do with it. In a high
quality environment like the ARPA Internet, it can query a
table or a name server, come up with a 32 bit host number,
and connect you directly to that machine. In the UUCP
environment, we don't have the concept of two processes on
arbitrary machines talking directly, so we forward mail one
hop at a time until it gets to the appropriate destination.
In this case, the subroutine decides if the name represents
the local machine, and if not, decides which of its
neighbors to forward the message to.
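
A minimal sketch of such a subroutine for the UUCP case follows. The
local machine name, the forwarding table, and the default neighbor
are all made-up values for illustration; a real site would have its
own tables, and trying the longest matching suffix first is one
plausible policy, not a prescribed one.

```python
LOCAL_NAME = "d.osg.cb.att.uucp"  # hypothetical name of this machine

# Hypothetical forwarding table: domain suffix -> neighbor to hand mail to.
NEIGHBORS = {
    "osg.cb.att.uucp": "cbosg",
    "att.uucp": "ihnp4",
    "arpa": "ucbvax",
}
DEFAULT_NEIGHBOR = "ihnp4"  # assumed "smarter machine" for unknown names


def next_hop(domain):
    """Return None for local delivery, else the neighbor to forward to."""
    name = domain.lower()
    if name == LOCAL_NAME:
        return None
    labels = name.split(".")
    # Try the most specific suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in NEIGHBORS:
            return NEIGHBORS[suffix]
    return DEFAULT_NEIGHBOR


if __name__ == "__main__":
    for name in ("D.OSG.CB.ATT.UUCP", "a.osg.cb.att.uucp", "nic.arpa"):
        print(name, "->", next_hop(name))
```

The name itself never changes as the message moves; only the choice
of neighbor differs from one machine to the next.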
2. What is a Domain?
So, after all this background, we still haven't said what a
domain is. The answer (I hope it's been worth the wait) is
that a domain is a subtree of the world tree. For example,
``uucp'' is a top level domain (that is, a subtree of the
``root'') and represents all names and machines beneath it
in the tree. ``att.uucp'' is a subdomain of ``uucp'',
representing all names, machines, and subdomains beneath
``att'' in the tree. Similarly for ``cb.att.uucp'',
``osg.cb.att.uucp'', and even ``d.osg.cb.att.uucp''
(although ``d.osg.cb.att.uucp'' is a ``leaf'' domain,
representing only the one machine).
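
The subtree relationship can be sketched in a few lines. The helper
names below are hypothetical; the point is only that each domain
enclosing a name is obtained by dropping labels from the left, and
that membership is tested on whole labels rather than on raw string
suffixes.

```python
def enclosing_domains(name):
    """List a name and every domain above it, most specific first."""
    labels = name.lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def is_within(name, domain):
    """True if `name` is `domain` itself or lies beneath it in the tree."""
    return domain.lower() in enclosing_domains(name)


if __name__ == "__main__":
    for d in enclosing_domains("d.osg.cb.att.uucp"):
        print(d)
    print(is_within("d.osg.cb.att.uucp", "ATT.UUCP"))
```

Comparing whole labels matters: a machine named ``batt.uucp'' is not
inside ``att.uucp'', even though one string ends with the other.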
A domain has certain properties. The key property is that
it has a ``registry''. That is, the domain has a list of
the names of all immediate subdomains, plus information
about how to get to each one. There is also a contact
person for the domain. This person is responsible for the
domain, keeping the registry up-to-date, serving as a point
of contact for outside queries, and setting policy
requirements for subdomains. Each subdomain can decide who
it will allow to have subdomains, and establish requirements
that all subdomains must meet to be included in the
registry. For example, the ``cb'' domain might require all
subdomains to be physically located in the AT&T building in
Columbus.
ARPA has established certain requirements for top level
domains. These requirements specify that there must be a
list of all subdomains and contact persons for them, a
responsible person who is an authority for the domain (so
that if some site does something bad, it can be made to
stop), a minimum size (to prevent small domains from being
top level), and a pair of nameservers (for redundancy) to
provide a directory-assistance facility. Domains can be
more lax about the requirements they place on their
subdomains, making it harder to be a top level domain than
somewhere lower in the tree. Of course, if you are a
subdomain, your parent is responsible for you.
One requirement that is NOT present is for unique parents.
That is, a machine (or an entire subdomain) need not appear
in only one place in the tree. Thus, ``cb'' might appear
both in the ``att'' domain, and in the ``ohio'' domain.
This allows domains to be structured more flexibly than just
the simple geography used by the postal service and the
telephone company; organizations or topography can be used
in parallel. (Actually, there are a few instances where
this is done in the postal service [overseas military mail]
and the telephone system [prefixes can appear in more than
one area code, e.g. near Washington D.C., and Silicon
Valley].) It also allows domains to split or join up, while
remaining upward compatible with their old addresses.
Do all domains represent specific machines? Not
necessarily. It's pretty obvious that a full path like
``d.cbosg.att.uucp'' refers to exactly one machine. The OSG
domain might decide that ``cbosg.att.uucp'' represents a
particular gateway machine. Or it might decide that it
represents a set of machines, several of which might be
gateways. The ``att.uucp'' domain might decide that several
machines, ``ihnp4.uucp'', ``whgwj.uucp'', and ``hogtw.uucp''
are all entry points into ``att.uucp''. Or it might decide
that it just represents a spot in the name space, not a
machine. For example, there is no machine corresponding to
``arpa'' or ``uucp'', or to the root. Each domain decides
for itself. The naming space and the algorithm for getting
mail from one machine to another are not closely linked -
routing is up to the mail system to figure out, with or
without help from the structure of the names.
The domain syntax does allow explicit routes, in case you
want to exercise a particular route or some gateway is
balking. The syntax is
``@dom1,@dom2,...,@domn:user@domain'', for example,
@ihnp4.UUCP,@ucbvax.UUCP:joe@NIC.ARPA, forcing it to be
routed through dom1, dom2, ..., domn, and from domn sent to
the final address. This behaves exactly like the UUCP !
routing syntax, although it is somewhat more verbose.
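
To make the explicit-route syntax concrete, here is a small sketch of
a parser for it. The function name is hypothetical and the error
handling is an assumption; it accepts only the
``@dom1,@dom2,...,@domn:user@domain'' form given above and treats an
address without a leading @ as having no route.

```python
def parse_route(address):
    """Split '@dom1,@dom2:user@domain' into (hops, final_address)."""
    if not address.startswith("@"):
        return [], address  # no explicit route given
    route, sep, final = address.partition(":")
    if not sep or not final:
        raise ValueError(f"route has no final address: {address!r}")
    hops = []
    for part in route.split(","):
        # Every element of the route must look like '@domain'.
        if not part.startswith("@") or len(part) == 1:
            raise ValueError(f"bad route element: {part!r}")
        hops.append(part[1:])
    return hops, final


if __name__ == "__main__":
    print(parse_route("@ihnp4.UUCP,@ucbvax.UUCP:joe@NIC.ARPA"))
```

The hops come back in the order the message visits them, the same
order a UUCP ! path would list them.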
By the way, you've no doubt noticed that some forms of
electronic addresses read from left-to-right (cbosgd!mark),
others read from right-to-left (mark@Berkeley). Which is
better? The real answer here is that it's a religious
issue, and it doesn't make much difference. Left-to-right
is probably a bit easier for a computer to deal with because
it can understand something on the left and ignore the
remainder of the address. (While it's almost as easy for
the program to read from right-to-left, the ease of going
from left-to-right was probably in the backs of the minds of
the designers who invented host:user and host!user.)
On the other hand, I claim that user@host is easier for
humans to read, since people tend to start reading from the
left and quit as soon as they recognize the login name of
the person. Also, a mail program that prints a table of
headers may have to truncate the sender's address to make it
fit in a fixed number of columns, and it's probably more
useful to read ``mark@d.osg.a'' than ``ucbvax!sdcsv''.
These are pretty minor issues; after all, humans can adapt
to skip to the end of an address, and programs can truncate
on the left. But the real problem is that if the world
contains BOTH left-to-right and right-to-left syntax, you
have ambiguous addresses like x!y@z to consider. This
single problem turns out to be a killer, and is the best
single reason to try to stamp out one in favor of the other.
3. So why are we doing this, anyway?
The current world is full of lots of interesting kinds of
mail syntax. The old ARPA ``user@host'' is still used on
the ARPANET by many systems. Explicit routing can sometimes
be done with an address like ``user@host2@host1'', which
sends the mail to host1 and lets host1 interpret
``user@host2''. Addresses with more than one @ were made
illegal a few years ago, but many ARPANET hosts depended on
them, and the syntax is still being used. UUCP uses
``h1!h2!h3!user'', requiring the user to route the mail.
Berknets use ``host:user'' and do not allow explicit
routing.
To get mail from one host to another, it had to be routed
through gateways. Thus, the address ``csvax:mark@Berkeley''
from the ARPANET would send the mail to Berkeley, which
would forward it to the Berknet address csvax:mark. To send
mail to the ARPANET from UUCP, an address such as
``ihnp4!ucbvax!sam@foo-unix'' would route it through ihnp4
to ucbvax, which would interpret ``sam@foo-unix'' as an
ARPANET address and pass it along. When the Berknet-UUCP
gateway and Berknet-ARPANET gateway were on different
machines, addresses such as
``csvax:ihnp4!ihnss!warren@Berkeley'' were common.
As you can see, the combination of left-to-right UUCP syntax
and right-to-left ARPANET syntax makes things pretty
complex. Berknets are gone now, but there are lots of
gateways between UUCP and the ARPANET and ARPANET-like mail
networks. Sending mail to an address for which you only
know a path from the ARPANET onto UUCP is even harder -
suppose the address you have is ihnp4!ihnss!warren@Berkeley,
and you are on host rlgvax which uses seismo as an ARPANET
gateway. You must send to
seismo!ihnp4!ihnss!warren@Berkeley, which is not only pretty
hard to read, but when the recipient tries to reply, it will
have no idea where the break in the address between the two
UUCP pieces occurs. An ARPANET site routing across the UUCP
world to somebody's Ethernet using domains locally will have
to send an address something like ``xxx@Berkeley.ARPA'' to
get it to UUCP, then ``ihnp4!decvax!island!yyy'' to get it
to the other Ethernet, then ``sam@csvax.ISLAND'' to get it
across their Ethernet. The single address would therefore
be ihnp4!decvax!island!sam@csvax.ISLAND@Berkeley.ARPA, which
is too much to ask any person or mailer to understand. It's
even worse: gateways have to deal with ambiguous names like
ihnp4!mark@Berkeley, which can be parsed either
``(ihnp4!mark)@Berkeley'' in accordance with the ARPANET
conventions, or ``ihnp4!(mark@Berkeley)'' as the old UUCP
would.
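
The ambiguity is easy to see in code. The two hypothetical functions
below each return the first machine the message would be handed to
and the remainder of the address left for that machine to interpret;
they are a sketch of the two precedence rules, not of any actual
mailer.

```python
def parse_arpanet_style(address):
    """'@' binds loosest: (ihnp4!mark)@Berkeley goes to the host after @."""
    local, _, host = address.rpartition("@")
    return host, local


def parse_uucp_style(address):
    """'!' binds loosest: ihnp4!(mark@Berkeley) goes to the host before !."""
    host, _, rest = address.partition("!")
    return host, rest


if __name__ == "__main__":
    address = "ihnp4!mark@Berkeley"
    print(parse_arpanet_style(address))
    print(parse_uucp_style(address))
```

Both readings are internally consistent, yet they send the same
string to different machines first, which is why a gateway cannot
resolve the address without knowing which convention the sender had
in mind.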
Another very important reason for using domains is that your
mailing address becomes absolute instead of relative. It
becomes possible to put your electronic address on your
business card or in your signature file without worrying
about writing six different forms and fifteen hosts that
know how to get to yours. It drastically simplifies the job
of the reply command in your mail program, and automatic
reply code in the netnews software.
4. Further Information
For further information, some of the basic ARPANET reference
documents are in order. These can often be found posted to
Usenet, or available nearby. They are all available on the
ARPANET on host NIC via FTP with login ANONYMOUS, if you
have an ARPANET login. They can also be ordered from the
Network Information Center, SRI International, Menlo Park,
California, 94025.
RFC819 The Domain Naming Convention for Internet User Applications
RFC821 Simple Mail Transfer Protocol
RFC822 Standard for the Format of ARPA Internet Text Messages
RFC881 The Domain Names Plan and Schedule
#
# @(#)domain.mm 2.1 smail 12/14/86
#