What is a USENET domain?
------------------------
                            What is a Domain?

                              Mark R. Horton

                            Bell Laboratories
                           Columbus, Ohio 43213

                                 ABSTRACT


                 In the past, electronic mail has used
                 many different kinds of syntax, naming a
                 computer and a login name on that
                 computer.  A new system, called
                 ``domains'', is becoming widely used,
                 based on a hierarchical naming scheme.
                 This paper is intended as a quick
                 introduction to domains.  For more
                 details, you should read some of the
                 documents referenced at the end.

       1.  Introduction

       What exactly are domains?  Basically, they are a way of
       looking at the world as a hierarchy (tree structure).
       You're already used to two tree-structured models of the
       world that work pretty well: the telephone system and the
       post office.  Domains form a similar hierarchy for the
       electronic mail community.

       The post office divides the world up geographically, first
       into countries; then each country divides itself up, those
       units subdivide, and so on.  One such country, the USA,
       divides into states, which divide into counties (except for
       certain states, like Louisiana, which divide into things
       like parishes).  The counties subdivide into cities, towns,
       and townships, which typically divide into streets; the
       streets divide into lots with addresses, possibly containing
       room and apartment numbers, and then the individual people
       at that address.  So you have an address like

               Mark Horton
               Room 2C-249
               6200 E. Broad St.
               Columbus, Ohio, USA

       (I'm ignoring the name ``AT&T Bell Laboratories'' and the
       zip code, which are redundant information.)  Other countries
       may subdivide differently; for example, many small countries
       do not have states.

       The telephone system is similar.  Your full phone number
       might look like 1-614-860-1234 x234.  This contains, from left
       to right, your country code (Surprise!  The USA has country
       code ``1''!), area code 614 (Central Ohio), 860 (a prefix in
       the Reynoldsburg C.O.), 1234 (individual phone number), and
       extension 234.  Some phone numbers do not have extensions,
       but the phone system in the USA has standardized on a 3
       digit area code, 3 digit prefix, and 4 digit phone number.
       Other countries don't use this standard; for example, in
       Sweden a number might be +46 8 7821234 (country code 46,
       city code 8, number 7821234), in Germany +49 231 7551234,
       in the Netherlands +31 80 551234, in Britain +44 227 61234
       or +44 506 411234.  Note that the country and city codes and
       telephone numbers are not all the same length, and the
       punctuation is different from our North American notation.
       Within a country, the length of the telephone number might
       depend on the city code.  Even within the USA, the length of
       extensions is not standardized: some places use the last 4
       digits of the telephone number for the extension, some use 2
       or 3 or 4 digit extensions you must ask an operator for.
       Each country has established local conventions.  But the
       numbers are unambiguous when dialed from left-to-right, so as
       long as there is a way to indicate when you are done
       dialing, there is no problem.
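
       The left-to-right property can be illustrated with a short
       sketch.  The table holds only the country codes of the five
       countries mentioned above, and the function name is made up
       for this example; it is not a description of how any real
       exchange is programmed.

```python
# Country codes for the five countries mentioned in the text.  No code
# in this table is a prefix of another, which is what makes decoding
# from left to right unambiguous.
COUNTRY_CODES = {
    "1": "USA",
    "31": "Netherlands",
    "44": "Britain",
    "46": "Sweden",
    "49": "Germany",
}

def split_country(digits):
    """Peel the country code off the front of a string of dialed digits."""
    for length in range(1, 4):          # try 1, 2, then 3 leading digits
        prefix = digits[:length]
        if prefix in COUNTRY_CODES:
            return COUNTRY_CODES[prefix], digits[length:]
    raise ValueError("unknown country code in " + digits)
```

       The remainder returned by split_country is what would be
       passed through to the next level, which can apply the same
       idea to city codes without knowing anything about
       countries.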

       A key difference in philosophy between the two systems is
       evident from the way addresses and telephone numbers are
       written.  With an address, the most specific information
       comes first, the least specific last.  (The ``root of the
       tree'' is at the right.)  With telephones, the least
       specific information (root) is at the left.  The telephone
       system was designed for machinery that looks at the first
       few digits, does something with it, and passes the remainder
       through to the next level.  Thus, in effect, you are routing
       your call through the telephone network.  Of course, the
       exact sequence you dial depends on where you are dialing
       from: sometimes you must dial 9 or 8 first; to get an
       international dialtone you must dial 011; if you are calling
       locally you can (and sometimes must) leave off the 1 and the
       area code.  (This makes life very interesting for people who
       must design a box to call their home office from any phone
       in the world.)  This type of address is called a ``relative
       address'', since the actual address used depends on the
       location of the sender.

       The postal system, on the other hand, allows you to write
       the same address no matter where the sender is.  The address
       above will get to me from anywhere in the world, even
       private company mail systems.  Yet, some optional
       abbreviations are possible - I can leave off the USA if I'm
       mailing within the USA; if I'm in the same city as the
       address, I can usually just say ``city'' in place of the
       last line.  This type of address is called an ``absolute
       address'', since the unabbreviated form does not depend on
       the location of the sender.

       The ARPANET has evolved with a system of absolute addresses:
       ``user@host'' works from any machine.  The UUCP network has
       evolved with a system of relative addresses: ``host!user''
       works from any machine with a direct link to ``host'', and
       you have to route your mail through the network to find such
       a machine.  In fact, the ``user@host'' syntax has become so
       popular that many sites run mail software that accepts this
       syntax, looks up ``host'' in a table, and sends it to the
       appropriate network for ``host''.  This is a very nice user
       interface, but it only works well in a small network.  Once
       the set of allowed hosts grows past about 1000 hosts, you
       run into all sorts of administrative problems.
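
       The table-driven approach can be sketched roughly as
       follows.  The table contents and the function name are
       hypothetical, chosen only for illustration; real mailers
       keep such tables in their own configuration formats.

```python
# Hypothetical host table: bare host name -> network that reaches it.
HOST_TABLE = {
    "berkeley": "arpanet",
    "cbosgd": "uucp",
}

def network_for(address):
    """Return (network, user, host) for a ``user@host'' address.

    Raises KeyError when the host is missing from the table, which is
    the failure that makes one flat table hard to live with once the
    set of hosts grows.
    """
    user, host = address.rsplit("@", 1)
    return HOST_TABLE[host.lower()], user, host
```

       Every site running such software has to keep its own copy
       of HOST_TABLE current, and nothing in the address itself
       says which network an unknown host belongs to.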

       One problem is that it becomes nearly impossible to keep a
       table of host names up to date.  New machines are being
       added somewhere in the world every day, and nobody tells you
       about them.  When you try to send mail to a host that isn't
       in your table (replying to mail you just got from a new
       host), your mailing software might try to route it to a
       smarter machine, but without knowing which network to send
       it to, it can't guess which smarter machine to forward to.
       Another problem is name space collision - there is nothing
       to prevent a host on one network from choosing the same name
       as a host on another network.  For example, DEC's ENET has a
       ``vortex'' machine; there is also one on UUCP.  Both had
       their names long before the two networks could talk to each
       other, and neither had to ask the other network for
       permission to use the name.  The problem is compounded when
       you consider how many computer centers name their machines
       ``A'', ``B'', ``C'', and so on.

       In recognition of this problem, ARPA has established a new
       way to name computers based on domains.  The ARPANET is
       pioneering the domain convention, and many other computer
       networks are falling in line, since it is the first naming
       convention that looks like it really stands a chance of
       working.  The MILNET portion of ARPANET has a domain, CSNET
       has one, and it appears that Digital, AT&T, and UUCP will be
       using domains as well.  Domains look a lot like postal
       addresses, with a simple syntax that fits on one line, is
       easy to type, and is easy for computers to handle.  To
       illustrate, an old routed UUCP address might read
       ``sdcsvax!ucbvax!allegra!cbosgd!mark''.  The domain version
       of this might read ``mark@d.osg.cb.att.uucp''.  The machine
       is named d.osg.cb.att.uucp (UUCP domain, AT&T company,
       Columbus site, Operating System Group project, fourth
       machine).  Of course, this example is somewhat verbose and
       contrived; it illustrates the hierarchy well, but most
       people would rather type something like ``cbosgd.att.uucp''
       or even ``cbosgd.uucp'', and actual domains are usually set
       up so that you don't have to type very much.
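
       To make the structure of such an address concrete, here is
       a minimal sketch that takes one apart.  The function name
       is an invention of this example, not part of any mailer.

```python
def split_domain_address(address):
    """Split ``user@domain'' into the user part and the domain labels.

    Labels are returned root-first (least specific first), which is
    the reverse of the order in which they are written.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected user@domain: %r" % address)
    return local, list(reversed(domain.split(".")))

user, labels = split_domain_address("mark@d.osg.cb.att.uucp")
# user   == "mark"
# labels == ["uucp", "att", "cb", "osg", "d"]
```

       Read root-first, the labels line up with the description
       above: UUCP domain, AT&T, Columbus, Operating System Group,
       machine d.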

       You may wonder why the single @ sign is present, that is,
       why the above address does not read
       ``mark.d.osg.cb.att.uucp''.  In fact, it was originally
       proposed in this form, and some of the examples in RFC819 do
       not contain an @ sign.  The @ sign is present because some
       ARPANET sites felt a strong need for a divider between the
       domain, which names one or more computers, and the left hand
       side, which is subject to whatever interpretation the domain
       chooses.  For example, if the ATT domain chooses to address
       people by full name rather than by their login, an address
       like ``Mark.Horton@ATT.UUCP'' makes it clear that some
       machine in the ATT domain should interpret the string
       ``Mark.Horton'', but if the address were
       ``Mark.Horton.ATT.UUCP'', routing software might try to find
       a machine named ``Horton'' or ``Mark.Horton''.  (By the way,
       case is ignored in domains, so that ``ATT.UUCP'' is the same
       as ``att.uucp''.  To the left of the @ sign, however, a
       domain can interpret the text any way it wants; case can be
       ignored or it can be significant.)
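
       The case rule can be sketched as a comparison function.
       Since the treatment of the left hand side is up to the
       domain, it is a parameter here; that parameter and the
       function name are assumptions of this sketch.

```python
def same_mailbox(a, b, local_case_matters=True):
    """Compare two ``user@domain'' addresses.

    The domain is always compared without regard to case.  How the
    left hand side is compared is the receiving domain's choice.
    """
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    if domain_a.lower() != domain_b.lower():
        return False
    if local_case_matters:
        return local_a == local_b
    return local_a.lower() == local_b.lower()
```

       Splitting at the last @ keeps the whole left hand side
       intact, dots and all, for the domain to interpret.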

       It is important to note that domains are not routes.  Some
       people look at the number of !'s in the first example and
       the number of .'s in the second, and assume the latter is
       being routed from a machine called ``uucp'' to another
       called ``att'' to another called ``cb'' and so on.  It is
       possible to set up mail routing software to do this, and
       indeed in the worst case, even without a reasonable set of
       tables, this method will always work.  But the intent is that
       ``d.osg.cb.att.uucp'' is the name of a machine, not a path
       to get there.  In particular, domains are absolute
       addresses, while routes depend on the location of the
       sender.  Some subroutine is charged with figuring out, given
       a domain based machine name, what to do with it.  In a high
       quality environment like the ARPA Internet, it can query a
       table or a name server, come up with a 32 bit host number,
       and connect you directly to that machine.  In the UUCP
       environment, we don't have the concept of two processes on
       arbitrary machines talking directly, so we forward mail one
       hop at a time until it gets to the appropriate destination.
       In this case, the subroutine decides if the name represents
       the local machine, and if not, decides which of its
       neighbors to forward the message to.
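
       One possible shape for that subroutine in the UUCP case is
       sketched below.  The local name, the table entries, and
       the choice of ``most specific suffix wins'' are all
       assumptions made for illustration; the text above does not
       prescribe a particular lookup rule.

```python
LOCAL_NAME = "d.osg.cb.att.uucp"    # hypothetical: this machine's name

# Hypothetical forwarding table: domain suffix -> neighbor to hand to.
NEXT_HOP = {
    "att.uucp": "ihnp4",
    "uucp": "allegra",
    "arpa": "ucbvax",
}

def next_hop(domain):
    """Return None for local delivery, else the neighbor to forward to.

    Tries the full name first, then successively shorter suffixes, so
    the most specific matching entry wins.
    """
    domain = domain.lower()
    if domain == LOCAL_NAME:
        return None
    labels = domain.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in NEXT_HOP:
            return NEXT_HOP[suffix]
    raise LookupError("no route to " + domain)
```

       Note that the name is only a key into the table.  The
       neighbor chosen need not have anything to do with the
       labels that appear in the name.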


       2.  What is a Domain?

       So, after all this background, we still haven't said what a
       domain is.  The answer (I hope it's been worth the wait) is
       that a domain is a subtree of the world tree.  For example,
       ``uucp'' is a top level domain (that is, a subtree of the
       ``root'') and represents all names and machines beneath it
       in the tree.  ``att.uucp'' is a subdomain of ``uucp'',
       representing all names, machines, and subdomains beneath
       ``att'' in the tree.  Similarly for ``cb.att.uucp'',
       ``osg.cb.att.uucp'', and even ``d.osg.cb.att.uucp''
       (although ``d.osg.cb.att.uucp'' is a ``leaf'' domain,
       representing only the one machine).
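
       The subtree idea reduces to a simple test on names, shown
       here as a sketch.  The function name is made up for this
       example.

```python
def in_domain(name, domain):
    """True if ``name'' lies in the subtree rooted at ``domain''."""
    name_labels = name.lower().split(".")
    domain_labels = domain.lower().split(".")
    # A subtree match means the domain's labels are a suffix of the
    # name's labels.  Comparing whole labels, not raw characters,
    # keeps ``batt.uucp'' out of ``att.uucp''.
    return name_labels[-len(domain_labels):] == domain_labels
```

       A domain contains itself under this test, which matches
       the remark that a leaf domain represents only the one
       machine.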

       A domain has certain properties.  The key property is that
       it has a ``registry''.  That is, the domain has a list of
       the names of all immediate subdomains, plus information
       about how to get to each one.  There is also a contact
       person for the domain.  This person is responsible for the
       domain, keeping the registry up-to-date, serving as a point
       of contact for outside queries, and setting policy
       requirements for subdomains.  Each subdomain can decide who
       it will allow to have subdomains, and establish requirements
       that all subdomains must meet to be included in the
       registry.  For example, the ``cb'' domain might require all
       subdomains to be physically located in the AT&T building in
       Columbus.
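
       As a rough picture of these properties, a domain can be
       modeled as a small record.  The class, its field names,
       the contact name, and the form of the policy check are all
       hypothetical; the text only says that a registry, a
       contact person, and some policy exist.

```python
class Domain:
    """One node of the tree: a name, a contact person, and a registry."""

    def __init__(self, name, contact, policy=None):
        self.name = name
        self.contact = contact
        # The policy is a predicate applied to would-be subdomains;
        # the default accepts everything.
        self.policy = policy or (lambda info: True)
        self.registry = {}   # immediate subdomain -> how to reach it

    def register(self, label, info):
        """Add an immediate subdomain if it meets this domain's policy."""
        if not self.policy(info):
            raise ValueError("%s.%s rejected by policy" % (label, self.name))
        self.registry[label.lower()] = info

# The ``cb'' example from the text: subdomains must be in Columbus.
cb = Domain("cb.att.uucp", contact="postmaster",
            policy=lambda info: info.get("city") == "Columbus")
cb.register("osg", {"city": "Columbus", "via": "cbosgd"})
```

       Only immediate subdomains are listed; anything deeper in
       the tree is the business of the subdomain's own registry.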

       ARPA has established certain requirements for top level
       domains.  These requirements specify that there must be a
       list of all subdomains and contact persons for them, a
       responsible person who is an authority for the domain (so
       that if some site does something bad, it can be made to
       stop), a minimum size (to prevent small domains from being
       top level), and a pair of nameservers (for redundancy) to
       provide a directory-assistance facility.  Domains can be
       more lax about the requirements they place on their
       subdomains, making it harder to be a top level domain than
       somewhere lower in the tree.  Of course, if you are a
       subdomain, your parent is responsible for you.

       One requirement that is NOT present is for unique parents.
       That is, a machine (or an entire subdomain) need not appear
       in only one place in the tree.  Thus, ``cb'' might appear
       both in the ``att'' domain, and in the ``ohio'' domain.
       This allows domains to be structured more flexibly than just
       the simple geography used by the postal service and the
       telephone company; organizations or topography can be used
       in parallel.  (Actually, there are a few instances where
       this is done in the postal service [overseas military mail]
       and the telephone system [prefixes can appear in more than
       one area code, e.g. near Washington D.C., and Silicon
       Valley].)  It also allows domains to split or join up, while
       remaining upward compatible with their old addresses.

       Do all domains represent specific machines?  Not
       necessarily.  It's pretty obvious that a full path like
       ``d.cbosg.att.uucp'' refers to exactly one machine.  The OSG
       domain might decide that ``cbosg.att.uucp'' represents a
       particular gateway machine.  Or it might decide that it
       represents a set of machines, several of which might be
       gateways.  The ``att.uucp'' domain might decide that several
       machines, ``ihnp4.uucp'', ``whgwj.uucp'', and ``hogtw.uucp''
       are all entry points into ``att.uucp''.  Or it might decide
       that it just represents a spot in the name space, not a
       machine.  For example, there is no machine corresponding to
       ``arpa'' or ``uucp'', or to the root.  Each domain decides
       for itself.  The naming space and the algorithm for getting
       mail from one machine to another are not closely linked -
       routing is up to the mail system to figure out, with or
       without help from the structure of the names.

       The domain syntax does allow explicit routes, in case you
       want to exercise a particular route or some gateway is
       balking.  The syntax is
       ``@dom1,@dom2,...,@domn:user@domain'', for example,
       ``@ihnp4.UUCP,@ucbvax.UUCP:joe@NIC.ARPA'', forcing it to be
       routed through dom1, dom2, ..., domn, and from domn sent to
       the final address.  This behaves exactly like the UUCP !
       routing syntax, although it is somewhat more verbose.
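
       A minimal parser for the explicit route form might look
       like this.  It is a sketch of the syntax as described
       here, with an invented function name, and does not attempt
       the full grammar of RFC822.

```python
def parse_route(address):
    """Split ``@dom1,@dom2:user@domain'' into (hops, final_address).

    An address with no leading ``@'' carries no explicit route, so
    the hop list is empty.
    """
    if not address.startswith("@"):
        return [], address
    route, sep, final = address.partition(":")
    if not sep or not final:
        raise ValueError("route without final address: %r" % address)
    hops = []
    for hop in route.split(","):
        if not hop.startswith("@") or len(hop) == 1:
            raise ValueError("bad hop %r in %r" % (hop, address))
        hops.append(hop[1:])
    return hops, final

hops, final = parse_route("@ihnp4.UUCP,@ucbvax.UUCP:joe@NIC.ARPA")
# hops  == ["ihnp4.UUCP", "ucbvax.UUCP"]
# final == "joe@NIC.ARPA"
```

       The hops come out in the order in which the mail visits
       them, the same order a ! path would list them.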

       By the way, you've no doubt noticed that some forms of
       electronic addresses read from left-to-right (cbosgd!mark),
       others read from right-to-left (mark@Berkeley).  Which is
       better?  The real answer here is that it's a religious
       issue, and it doesn't make much difference.  Left-to-right
       is probably a bit easier for a computer to deal with because
       it can understand something on the left and ignore the
       remainder of the address.  (While it's almost as easy for
       the program to read from right-to-left, the ease of going
       from left-to-right was probably in the backs of the minds of
       the designers who invented host:user and host!user.)

       On the other hand, I claim that user@host is easier for
       humans to read, since people tend to start reading from the
       left and quit as soon as they recognize the login name of
       the person.  Also, a mail program that prints a table of
       headers may have to truncate the sender's address to make it
       fit in a fixed number of columns, and it's probably more
       useful to read ``mark@d.osg.a'' than ``ucbvax!sdcsv''.

       These are pretty minor issues; after all, humans can adapt
       to skip to the end of an address, and programs can truncate
       on the left.  But the real problem is that if the world
       contains BOTH left-to-right and right-to-left syntax, you
       have ambiguous addresses like x!y@z to consider.  This
       single problem turns out to be a killer, and is the best
       single reason to try to stamp out one in favor of the other.


       3.  So why are we doing this, anyway?

       The current world is full of lots of interesting kinds of
       mail syntax.  The old ARPA ``user@host'' is still used on
       the ARPANET by many systems.  Explicit routing can sometimes
       be done with an address like ``user@host2@host1'' which
       sends the mail to host1 and lets host1 interpret
       ``user@host2''.  Addresses with more than one @ were made
       illegal a few years ago, but many ARPANET hosts depended on
       them, and the syntax is still being used.  UUCP uses
       ``h1!h2!h3!user'', requiring the user to route the mail.
       Berknets use ``host:user'' and do not allow explicit
       routing.

       To get mail from one host to another, it had to be routed
       through gateways.  Thus, the address ``csvax:mark@Berkeley''
       from the ARPANET would send the mail to Berkeley, which
       would forward it to the Berknet address csvax:mark.  To send
       mail to the ARPANET from UUCP, an address such as
       ``ihnp4!ucbvax!sam@foo-unix'' would route it through ihnp4
       to ucbvax, which would interpret ``sam@foo-unix'' as an
       ARPANET address and pass it along.  When the Berknet-UUCP
       gateway and Berknet-ARPANET gateway were on different
       machines, addresses such as
       ``csvax:ihnp4!ihnss!warren@Berkeley'' were common.

       As you can see, the combination of left-to-right UUCP syntax
       and right-to-left ARPANET syntax makes things pretty
       complex.  Berknets are gone now, but there are lots of
       gateways between UUCP and the ARPANET and ARPANET-like mail
       networks.  Sending mail to an address for which you only
       know a path from the ARPANET onto UUCP is even harder -
       suppose the address you have is ihnp4!ihnss!warren@Berkeley,
       and you are on host rlgvax which uses seismo as an ARPANET
       gateway.  You must send to
       seismo!ihnp4!ihnss!warren@Berkeley, which is not only pretty
       hard to read, but when the recipient tries to reply, it will
       have no idea where the break in the address between the two
       UUCP pieces occurs.  An ARPANET site routing across the UUCP
       world to somebody's Ethernet using domains locally will have
       to send an address something like ``xxx@Berkeley.ARPA'' to
       get it to UUCP, then ``ihnp4!decvax!island!yyy'' to get it
       to the other Ethernet, then ``sam@csvax.ISLAND'' to get it
       across their Ethernet.  The single address would therefore
       be ihnp4!decvax!island!sam@csvax.ISLAND@Berkeley.ARPA, which
       is too much to ask any person or mailer to understand.  It's
       even worse: gateways have to deal with ambiguous names like
       ihnp4!mark@Berkeley, which can be parsed either
       ``(ihnp4!mark)@Berkeley'' in accordance with the ARPANET
       conventions, or ``ihnp4!(mark@Berkeley)'' as the old UUCP
       would.
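
       The two readings can be put side by side in a short
       sketch.  Both function names are invented for this
       example; each returns the machine the mail would be handed
       to first and the address that machine would then be asked
       to interpret.

```python
def first_hop_arpanet(address):
    """ARPANET reading: everything left of the last ``@'' is handed,
    uninterpreted, to the host on the right."""
    rest, _, host = address.rpartition("@")
    return host, rest

def first_hop_old_uucp(address):
    """Old UUCP reading: the name before the first ``!'' is the next
    machine, and it receives everything after the ``!''."""
    host, _, rest = address.partition("!")
    return host, rest

addr = "ihnp4!mark@Berkeley"
print(first_hop_arpanet(addr))     # ('Berkeley', 'ihnp4!mark')
print(first_hop_old_uucp(addr))    # ('ihnp4', 'mark@Berkeley')
```

       The same string sends the message to two different
       machines depending on which convention the gateway
       follows, and nothing in the address says which was meant.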

       Another very important reason for using domains is that your
       mailing address becomes absolute instead of relative.  It
       becomes possible to put your electronic address on your
       business card or in your signature file without worrying
       about writing six different forms and fifteen hosts that
       know how to get to yours.  It drastically simplifies the job
       of the reply command in your mail program, and automatic
       reply code in the netnews software.


       4.  Further Information

       For further information, some of the basic ARPANET reference
       documents are in order.  These can often be found posted to
       Usenet, or available nearby.  They are all available on the
       ARPANET on host NIC via FTP with login ANONYMOUS, if you
       have an ARPANET login.  They can also be ordered from the
       Network Information Center, SRI International, Menlo Park,
       California, 94025.

       RFC819  The Domain Naming Convention for Internet User Applications
       RFC821  Simple Mail Transfer Protocol
       RFC822  Standard for the Format of ARPA Internet Text Messages
       RFC881  The Domain Names Plan and Schedule

       #
       # @(#)domain.mm 2.1 smail 12/14/86
       #