Virus Verification and Removal -- Tools and Techniques

David M. Chess
High Integrity Computing Laboratory
IBM Thomas J. Watson Research Center
Yorktown Heights, NY

Nov. 18, 1991


HISTORY

This is an updated version of a paper that originally
appeared in the November 1991 issue of Virus Bulletin.
Since this sort of technology is continually evolving, it
seemed reasonable to make an update available on the net; in
particular, the virus-removal language has been considerably
enhanced since the paper was originally written. Comments
are welcome, on VIRUS-L (comp.virus), or directly to the
author (chess at watson.ibm.com).


INTRODUCTION

The first line of defense against computer viruses consists
of programs that detect that something is probably wrong.
These include modification detectors, integrity shells,
known-virus scanners, access-control programs, and similar
things. Their main function is to alert the user of a
machine that a virus, some virus, is probably present. The
important thing is the alert; since something is likely to
be wrong, the user should stop what he is doing, and take
action to correct the problem. It doesn't matter much at
this stage what the alert says; a first-line anti-virus
system that always said simply "Something virus-like may be
going on!" would be sufficient for most environments, if it
were usually right.

Once the alert has been given, and the infected system taken
out of immediate contact with other systems, other kinds of
software become important. Before we can decide how to
clean up an infected system, and even where else to look for
infection, we need to know exactly what the infection
consists of. Once that has been determined, we can take
steps to restore the infected parts of the system to an
uninfected state, and to recover from any other damage the
virus may have caused. This paper is a description of one
part of the second-line toolbox, the virus verifier and
remover.


VIRUS VERIFIERS

A virus verifier is a program that, given a file or disk
that is probably infected with a given virus, determines
with a high degree of certainty whether the virus is a known
strain, or a new variant. This is, of course, important to
know: if the virus is different from any known strain, it
will have to be analyzed for new effects before we can be
confident that we know just what to do to clean up after it.
On the other hand, if the virus is identical to a known
strain, we already know what to do. It is particularly
important to perform verification in a program that attempts
to automatically remove the virus infection from an object,
restoring it to its original uninfected form.

Abstractly, a verifier is a program that, given another
program as input, determines whether or not the given
program is part of the set of possible "offspring" of a
particular virus. For many classes of viruses, including
all the viruses actually widespread at the moment, this is
easy to do. Almost all known viruses consist almost
entirely of code that does not change from infection to
infection, except perhaps for a simple XOR-type garbling,
and data areas that are either constant, or change in simple
ways (or that can be ignored entirely for the purposes of
verification). Given a suspect file F and a known virus V
of this kind, it is therefore relatively simple to answer
the question "is F a file that could have been produced by
infection with virus V?". It is an open question of some
theoretical interest whether or not some future virus might
make this harder to do! Reliably determining whether a file
is infected with any virus at all is of course known to be
impossible, but we have no similar result about determining
the presence of a specific virus.

There are various concrete decisions and tradeoffs involved
in writing a virus verifier; this section will list a few of
them, and the next sections will describe the
verifier/remover currently being developed and used at the
High Integrity Computing Lab at IBM's Watson Research
Center.

A verifier may be an independent tool, or it may be
integrated into a virus detector. An integrated
detector/verifier can be quicker and more convenient, since
there's no need for a user to find and run a verifier once
the detector goes off. On the other hand, since most copies
of any virus detector will never in fact detect a virus
(most of the world's computers are not infected, after all),
integrating a verifier along with the detector is in some
sense inefficient, in that it adds significant code to the
detector that may never be used. Given how much more
expensive human time is than CPU time and disk space these
days, integrated tools are likely to be more cost-effective
in the long run. Even in an integrated tool, though,
detection and verification will always be two different
activities, because it is very desirable for a detector to
detect small variants of known viruses as viruses, whereas a
verifier must be able to identify any variation as a
variation. Detection algorithms are typically run very
often, and must be fast. Verification algorithms, on the
other hand, are run rarely (only when a virus is detected),
and speed is typically not a major issue.

To determine whether or not a given object is infected with
a known strain of a virus, a verifier must know what the
known strain looks like. This may be done either with an
actual copy of the code of the known strain of the virus, or
by using a CRC or similar modification-detection algorithm.
It's not generally desirable to include the entire code of a
virus with widely-distributed tools, for obvious reasons!
On the other hand, even a good difficult-to-invert digital
signature algorithm is not as reliable as a byte-for-byte
comparison, and it is vulnerable to a virus author
intentionally creating a variant that looks to the verifier
like a known strain. (This can be made arbitrarily hard
through the use of cryptographic checksums and related
technologies, at some increase in runtime and complexity.)

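The two ways of recognizing a known strain can be sketched
in a few lines of Python. This is only an illustration, not
VERV's code: the function names are hypothetical, the sample
bytes are arbitrary, and zlib's standard CRC-32 stands in for
whatever checksum a real verifier would use.

```python
import zlib

def matches_by_bytes(region: bytes, reference: bytes) -> bool:
    """Byte-for-byte comparison; needs a stored copy of the known strain."""
    return region == reference

def matches_by_crc(region: bytes, expected_crc: int) -> bool:
    """Checksum comparison; needs only the expected value, not the code.
    zlib.crc32 is an assumption standing in for the verifier's own CRC."""
    return (zlib.crc32(region) & 0xFFFFFFFF) == expected_crc

# Arbitrary stand-in bytes for one invariant region of a known strain.
reference = bytes.fromhex("b80000cd21")
expected = zlib.crc32(reference) & 0xFFFFFFFF

# A variant that differs from the reference in a single bit.
variant = bytes.fromhex("b80100cd21")

print(matches_by_crc(reference, expected))   # known strain
print(matches_by_crc(variant, expected))     # variant
print(matches_by_bytes(variant, reference))  # variant
```

The checksum form can be shipped without shipping the virus
itself; the byte-for-byte form cannot be fooled by a
deliberately constructed checksum collision.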
Lastly, a verifier may use either special-purpose code, with
one or more routines being written in some compiled language
for each new strain discovered, or it may be written as an
interpreter for a high-level virus-description language. A
high-level language is generally simpler to program in
reliably; on the other hand, this is only true because it is
less expressive, which implies that there will be cases
(viruses that are exotically self-garbling, for instance) in
which it will be necessary to drop into the lower-level
programming language again.


VERV - A PROTOTYPE VIRUS VERIFIER AND REMOVER

At HICL, we are currently using and developing a virus
verifier and remover called "VERV" for PC-DOS viruses. The
current version can verify over 40 different viruses and
variants, which accounts for nearly all of the actual
infections that we see in day-to-day operation. It has
recently been enhanced to attempt to remove about a dozen of
the most common file-infecting viruses (we have other tools,
which will eventually be integrated, for removing
boot-sector-infecting viruses). As well as being used in
the lab, and as a research prototype, VERV is used by IBM's
internal Computer Emergency Response Teams (CERTs), as part
of routine incident handling.

It is an independent tool at the moment; in the long run, we
expect to integrate it with our other anti-virus programs.
It can use either a CRC algorithm or a byte-for-byte
comparison to verify the identity of a virus. In the
laboratory, we use the byte-for-byte compare to test new
samples against old ones. In the field, our users use the
CRC algorithm to verify the virus in infected objects before
applying cleanup measures.

VERV includes an interpreter for a small virus-description
language. Virus-description languages, for this and other
purposes, have been around for some time; Christoph Fischer
at the University of Karlsruhe, Morton Swimmer in Hamburg,
Alan Solomon in the UK, and no doubt many others in the
field, have worked on similar things (personal
correspondence; one motivation for this paper is to
encourage others, who have perhaps done it better, to
publish their work). VERV's language is very simple, and
provides for lower-level hooks (instructions to call
special-purpose C routines) when a virus requires actions
that cannot be described in the high-level language. We
will describe the language in some detail, not because it is
particularly interesting as a language, or because we think
we have it all correct and optimal, but rather so that other
people working on the same sorts of things can benefit from
both our ideas and our mistakes. We hope this will help
inspire continued discussion and exchange.


VERV'S VIRUS-DESCRIPTION LANGUAGE

The file from which VERV reads virus descriptions consists
of a number of virus-description blocks. Each block has the
following structure:

    One or more VIRUS records
    A NAME record
    One or more LOAD records
    Zero or more DEGARBLE and related records
    Zero or more ZERO records
    One or more check records
    Zero or more REPAIR blocks

For instance, the block for the Slow-1721 virus currently
looks like this:

    VIRUS slow slow-1721
    NAME the Slow-1721 virus
    LOAD P-COM 0 6B4
    LOAD S-EXE 0 6B4
    DEXOR1 001E 06AD 0012 0000 ; Degarble the code
    DEXOR1 00EB 0159 0061 0001 ; and the data area
    ZERO 0012 1                ; Zero the one-byte code-garble key
    ZERO 0061 1                ; and the data-garble key
    CODE 0000 00EA 38d5dc08    ; Code up to first data area
    CONST 0144 014E 0ff22ad9   ; COMMAND.COM
    CODE 015A 063C 74e00962    ; Code between data areas
    CODE 0657 06AD ad3b0b41    ; After the second data area

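A record-oriented format like this one is easy to read
mechanically. The following Python sketch splits a block
into records; it is not VERV's parser. The Record class and
parse_block function are hypothetical names, and two
conventions are inferred from the examples in this paper
rather than from any specification: ";" starts a trailing
comment, and a line beginning with "*" is a whole-line
comment.

```python
from dataclasses import dataclass

@dataclass
class Record:
    keyword: str      # e.g. "LOAD", "CODE"
    args: list        # remaining tokens, kept as strings
    comment: str = "" # text after ';', if any

def parse_block(text: str) -> list:
    """Split a virus-description block into Record objects."""
    records = []
    for line in text.splitlines():
        if line.lstrip().startswith("*"):   # assumed whole-line comment
            continue
        body, _, comment = line.partition(";")
        tokens = body.split()
        if not tokens:                      # blank or comment-only line
            continue
        records.append(Record(tokens[0].upper(), tokens[1:], comment.strip()))
    return records

SLOW_1721 = """\
VIRUS slow slow-1721
NAME the Slow-1721 virus
LOAD P-COM 0 6B4
LOAD S-EXE 0 6B4
DEXOR1 001E 06AD 0012 0000 ; Degarble the code
DEXOR1 00EB 0159 0061 0001 ; and the data area
ZERO 0012 1 ; Zero the one-byte code-garble key
ZERO 0061 1 ; and the data-garble key
CODE 0000 00EA 38d5dc08 ; Code up to first data area
CONST 0144 014E 0ff22ad9 ; COMMAND.COM
CODE 015A 063C 74e00962 ; Code between data areas
CODE 0657 06AD ad3b0b41 ; After the second data area
"""

records = parse_block(SLOW_1721)
for r in records:
    print(r.keyword, r.args)
```

Keeping the arguments as strings leaves the interpretation
of each field (hex offset, length, object type) to the code
that handles that particular record type.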
The VIRUS records simply give a list of one-word aliases for
the virus, which are used on the command line to tell VERV
which virus to look for. These aliases are not the full
primary name of the virus (that is given on the NAME
record); they are just short abbreviations that the user can
type.

A very useful extension here would be for VERV to support
virus families, so that a single command would cause testing
for all members of the Jerusalem family, or the Flip family,
and so on. When integrated into the virus detector, of
course, the detector will directly inform the verifier which
virus or viruses to test for.

The LOAD records describe where in an infected object of a
given type the virus can be found. The tokens on a LOAD
record are an object type, followed by either an offset and
a length, or the word SPECIAL and a number. The offset
tells VERV where, relative to the effective entry-point of
that sort of object, to start loading; the length tells how
many bytes to load. For viruses that are not always at a
fixed offset from the initial entrypoint, the SPECIAL
keyword causes VERV to invoke an internal routine, coded in
C, to perform the loading.

The Slow virus is an EXE-infector, and a prepending COM
infector; the LOAD records in this example tell VERV to load
the first 06B4 bytes of a COM-format file, and the first
06B4 bytes after the entry point of an EXE-format file.
(EXE-format files are those that begin with the letters
"MZ"; DOS loads these differently from COM-format files,
which begin with any other bytes.) Other object types
supported include:

 o E9-COM, for viruses that infect COM files by changing
   the first three bytes to a long jump to the virus (E9 is
   the hex code for a long jump),
 o E8-COM, for viruses that infect COM files by changing
   the first three bytes to a long CALL to the virus (E8 is
   a long call),
 o MBR, for viruses that infect hard disk master boot
   records and diskette boot records, and fit in a single
   sector,
 o DISKETTE, for other sorts of diskette infectors (those
   that do not fit in a single sector),
 o HARDDISK, for other sorts of hard disk infectors (those
   that infect system boot records, and/or occupy more than
   one sector).

A description block will have as many LOAD records as there
are types of object that the virus can infect.

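To make "relative to the effective entry-point" concrete,
here is a Python sketch of how an entry point might be
located for three of the file object types, and how an
offset/length pair is then applied. It is not VERV's code.
The function names are hypothetical, and treating the E9/E8
jump target as the effective entry point is an assumption.
The arithmetic itself follows the standard DOS conventions:
an E9 or E8 instruction carries a 16-bit displacement
relative to the next instruction, and an EXE header stores
its size in paragraphs at offset 08 and the initial IP and
CS at offsets 14 and 16.

```python
import struct

def com_entry(image: bytes) -> int:
    """Plain COM file: execution starts at the first byte."""
    return 0

def jump_com_entry(image: bytes) -> int:
    """COM file starting with E9 (jump) or E8 (call): follow the
    three-byte instruction to its target within the file."""
    if image[0] not in (0xE9, 0xE8):
        raise ValueError("file does not start with E9 or E8")
    (disp,) = struct.unpack_from("<H", image, 1)
    return (3 + disp) & 0xFFFF           # displacement wraps at 64K

def exe_entry(image: bytes) -> int:
    """EXE file: header size plus the initial CS:IP, as a file offset."""
    if image[:2] != b"MZ":
        raise ValueError("not an EXE-format file")
    (header_paras,) = struct.unpack_from("<H", image, 0x08)
    ip, cs = struct.unpack_from("<HH", image, 0x14)
    return header_paras * 16 + ((cs * 16 + ip) & 0xFFFFF)

def load_region(image: bytes, entry: int, offset: int, length: int) -> bytes:
    """Apply a LOAD-style offset and length relative to the entry point."""
    start = entry + offset
    return image[start:start + length]

# A toy COM file: JMP over 16 filler bytes to a 5-byte "virus".
toy_com = bytes([0xE9, 0x10, 0x00]) + bytes(0x10) + b"VIRUS"
entry = jump_com_entry(toy_com)
print(hex(entry), load_region(toy_com, entry, 0, 5))

# A toy EXE header: 2 paragraphs of header, CS:IP = 0001:0004.
toy_hdr = bytearray(32)
toy_hdr[0:2] = b"MZ"
struct.pack_into("<H", toy_hdr, 0x08, 2)
struct.pack_into("<HH", toy_hdr, 0x14, 4, 1)
print(exe_entry(bytes(toy_hdr)))
```

A negative offset, as in a record that starts loading one
byte before the entry point, works with the same
load_region call.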
The DEXOR1 records tell VERV to perform a certain common
type of degarbling: a one-byte XOR with data to be found at
a fixed offset in the virus. The details are not terribly
important here. A more general record, consisting of just
the word DEGARBLE followed by a number, causes VERV to
invoke an internal C-language routine to perform degarbling.

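A one-byte XOR degarbling of this general kind can be
sketched as follows. This is an illustration only, not a
definition of DEXOR1: the function name is hypothetical, the
range is assumed to be inclusive, only the start offset, end
offset, and key offset are modeled, and the fourth field
that appears on the DEXOR1 records in the Slow-1721 block is
not interpreted at all.

```python
def dexor1(buf: bytearray, start: int, end: int, key_offset: int) -> None:
    """XOR every byte in buf[start..end] (inclusive, by assumption)
    with the single key byte stored at buf[key_offset]."""
    key = buf[key_offset]
    for i in range(start, end + 1):
        buf[i] ^= key

# Toy image: key byte 5A at offset 0, followed by five "code" bytes.
plain = bytearray(b"\x5aHELLO")

garbled = bytearray(plain)
dexor1(garbled, 1, 5, 0)      # garbling and degarbling are the same operation

restored = bytearray(garbled)
dexor1(restored, 1, 5, 0)

print(garbled.hex(), restored)
```

Because XOR is its own inverse, the same routine that the
virus uses to garble itself also recovers the constant code,
provided the key byte lies outside the garbled range.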
Once the loading and degarbling have been done, VERV has a
complete "virus image" in its internal buffer. A
command-line switch (described later) can instruct VERV to
save the contents of this buffer to a file, for later
examination.

The ZERO records describe variable areas within the virus
that should be set to zero before checks are done. This is
really just a convenience, to reduce the number of
check-type records needed.

There are three basic types of check records, describing
different tests to be done on the degarbled and zeroed virus
image now in the buffer:

 o CODE records describe areas of virus code. The numbers
   given are the start and end offsets of the area, and the
   expected CRC value of the data there. VERV uses a
   31-bit CRC, with a custom polynomial. This is not
   strongly resistant to intentional reverse engineering; a
   more difficult-to-invert algorithm may be desirable
   later on. If any CODE areas are found to be different
   from what is expected, VERV will report that this is not
   the usual strain of the virus.
 o CONST records describe constant areas that should not
   change, and whose values affect the actual running of
   the virus. CONST areas are currently treated exactly
   like CODE areas.
 o TEXT records describe areas of the virus that are not
   expected to change, but do not significantly affect the
   operation of the virus. If a sample differs from the
   given description only in one or more TEXT areas, VERV
   will report a "text variant" of the virus. This is
   useful for message areas within a virus that are not
   actually used, or that are simply displayed to the user.
   These areas can be interesting in tracking how the virus
   is spreading, by correlating incidents that involve the
   same "text variant", but they do not affect cleanup or
   prevention.

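The zero-then-check sequence and its three possible outcomes
can be sketched in Python. This is a simplified model, not
VERV's implementation: the verify function and its result
strings are hypothetical, end offsets are assumed to be
inclusive, and zlib's CRC-32 is used in place of VERV's own
CRC.

```python
import zlib

def crc(data: bytes) -> int:
    """Stand-in checksum (assumption); VERV's CRC is a different one."""
    return zlib.crc32(data) & 0xFFFFFFFF

def verify(image: bytes, zeros, checks) -> str:
    """zeros: (offset, length) pairs; checks: (kind, start, end, expected)."""
    buf = bytearray(image)
    for offset, length in zeros:             # blank out the variable areas
        buf[offset:offset + length] = bytes(length)
    text_differs = False
    for kind, start, end, expected in checks:
        if crc(bytes(buf[start:end + 1])) == expected:
            continue
        if kind in ("CODE", "CONST"):         # treated identically
            return "not the usual strain"
        text_differs = True                   # a TEXT area differs
    return "text variant" if text_differs else "exact match"

# Toy description: key byte at 0, code at 1..8, message text at 9..13.
zeros = [(0, 1)]
checks = [("CODE", 1, 8, crc(b"CODECODE")),
          ("TEXT", 9, 13, crc(b"hello"))]

print(verify(b"\x7fCODECODEhello", zeros, checks))  # only the key differs
print(verify(b"\x7fCODECODEhellO", zeros, checks))  # message changed
print(verify(b"\x7fCODEC0DEhello", zeros, checks))  # code changed
```

A mismatch in any CODE or CONST area decides the outcome
immediately; TEXT mismatches are only remembered, so a
sample is reported as a text variant only when everything
else matches.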
Normally, VERV performs its CRC calculation on each area
within the virus, and compares the results to the expected
values. A command-line switch (described in more detail
below) can be used to tell VERV to read a standard copy of
the virus from another file instead, and do byte-by-byte
comparison between the two. This is more reliable, but of
course it requires having a sample of the usual strain of
the virus present to verify against.

Another example, illustrating the use of special C routines,
is the block for the 1701 virus:

    VIRUS 1701
    NAME the 1701 virus
    LOAD E9-COM -1 06A5
    DEGARBLE 1
    CODE 0001 0026 19989c7e ; Degarble, MOV, jmp-in
    CODE 0076 06A4 c03a91c5 ; Main code

Here, the "DEGARBLE 1" record causes VERV to invoke an
internal routine to degarble the data in the buffer, using
the 1701's own algorithm. It would be possible to enhance
the virus-description language enough that the 1701's
degarbling algorithm could be expressed in it directly.
This would complicate the language considerably, though, and
would somewhat lessen the advantage that a special
high-level language has over native C code; so far, we have
decided against such enhancements.


REPAIR

For many viruses and many infected objects, it's possible to
restore the object to what it looked like before it was
infected, or at least to a state in which it will function
in just the same way. Unfortunately, this isn't always
possible; the classic example is the 1813 (Jerusalem) virus
infecting an EXE-format file. While it's usually possible
to undo the infection, sometimes the resulting file is
missing data that was in the uninfected original, and it's
not always possible to tell that this has happened. The
best an 1813-remover can do on the EXE file, therefore, is
something that is quite likely to work, but might not. In
most cases, though, sufficiently-reliable repair is
possible, and particularly in large infections of
non-critical machines, repair is sometimes a cost-effective
option.

A description of a virus in VERV's language includes one
repair block for every type of object that the virus may
infect. Each repair block consists of a header record
"REPAIR <object type>", followed by one or more
repair-operation records. Currently defined repair
operations include:

 o an FCOPY_TO record, that copies bytes from the start of
   the infected file up to a given number of bytes from the
   virus entry point (this is used to remove appending
   viruses),
 o an FCOPY_FROM record, that copies bytes from the infected
   file, starting a given number of bytes from the virus
   entry point, and ending a given number of bytes before
   the end of the file (this is used to remove prepending
   viruses),
 o a BWRITE record, that copies a given number of bytes from
   a given offset in VERV's internal buffer (which initially
   holds an image of the virus) to a given offset in the
   file being repaired (this is used, for instance, to
   repair the first few bytes of an infected COM file, or
   the header of an infected EXE file),
 o a BREAD record, that loads a given number of bytes from
   a given offset (relative to the start of the infected
   file) into VERV's buffer,
 o an EXE_LENGTH_BUG record, that tells VERV that this
   particular virus has the common bug that it assumes that
   the image length in the header of an EXE file is the
   same as the file's length, and therefore damages (by
   overlaying some data) any EXE file that contains data
   after the EXE image,
 o a 64K_COM_BUG record, that tells VERV that this virus
   has the common bug that it assumes that any file it
   thinks of as a COM file must be less than 64K bytes
   long,
 o an EXE_LENGTH_ADJUST record, that treats two words
   within VERV's buffer as the "page count" and "last page
   length" fields from a DOS EXE-file header, and subtracts
   a given constant value, adjusting them accordingly,
 o an R_SPECIAL record, to cause VERV to invoke an internal
   C routine to perform some function not directly
   implemented in the language.

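The arithmetic behind an EXE_LENGTH_ADJUST-style operation
is worth spelling out, since the two header fields do not
hold the length directly. In a DOS EXE header the page count
is the number of 512-byte pages including a final partial
page, and the last-page length is the number of bytes used
in that final page, with zero meaning the page is full. The
Python sketch below is an illustration under those
conventions, not VERV's routine; the function name and its
argument order are assumptions.

```python
import struct

PAGE = 512  # DOS EXE headers count the image in 512-byte pages

def exe_length_adjust(buf: bytearray, pages_off: int, last_off: int,
                      delta: int) -> None:
    """Shrink the image length encoded by two little-endian words in buf."""
    (pages,) = struct.unpack_from("<H", buf, pages_off)
    (last,) = struct.unpack_from("<H", buf, last_off)
    # Decode: a last-page length of zero means the final page is full.
    length = pages * PAGE if last == 0 else (pages - 1) * PAGE + last
    length -= delta
    if length < 0:
        raise ValueError("adjustment is larger than the image")
    # Re-encode the shorter length in the same two-field form.
    struct.pack_into("<H", buf, pages_off, (length + PAGE - 1) // PAGE)
    struct.pack_into("<H", buf, last_off, length % PAGE)

# Toy buffer: last-page length at offset 0, page count at offset 2,
# describing a 10000-byte image (20 pages, 272 bytes in the last one).
buf = bytearray(struct.pack("<HH", 272, 20))
exe_length_adjust(buf, pages_off=2, last_off=0, delta=0x710)
print(struct.unpack("<HH", buf))
```

Subtracting hex 710 (1808) bytes from 10000 leaves 8192,
which is exactly 16 full pages, so the last-page field
becomes zero rather than 512.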
For instance, the repair block for the usual 1813 or
Jerusalem virus currently looks like this:

    REPAIR S-EXE
    EXE_LENGTH_BUG
    FCOPY_TO -0C5
    EXE_LENGTH_ADJUST 0053 0051 0710
    BWRITE 0043 0010 2 ; Fix SP
    BWRITE 0045 000E 2 ; Fix SS
    BWRITE 0047 0014 2 ; Fix IP
    BWRITE 0049 0016 2 ; Fix CS
    BWRITE 0051 0002 4 ; Fix image length
    * Fixing COM files
    REPAIR P-COM
    64K_COM_BUG
    FCOPY_FROM 0710 -5

The two BUG records cause VERV to print warnings to the user
that some files may not function correctly, and to refuse to
repair (later versions may offer to erase) any files that
are obviously not correctly repairable. The FCOPY records
pick out just the part of the file that does not contain the
virus, and the EXE_LENGTH_ADJUST and BWRITE records
reconstruct an approximation of the original EXE file header
and write it back. EXE files that are successfully repaired
will differ from the original file only in having been
rounded up to a multiple of sixteen bytes (and the
corresponding change in the EXE file header).

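The copy and write-back operations used in a repair block
amount to simple slicing. The Python sketch below models
them on in-memory byte strings; it is not VERV's code. The
function names mirror the record names but their signatures
are assumptions, as is the sign convention (negative values
count backwards, as the example records suggest), and the
toy data is invented for illustration.

```python
def fcopy_to(infected: bytes, entry: int, offset: int) -> bytearray:
    """Appending virus: keep the file from its start up to entry + offset."""
    return bytearray(infected[:entry + offset])

def fcopy_from(infected: bytes, entry: int, start: int, tail: int) -> bytearray:
    """Prepending virus: keep bytes from entry + start up to `tail` bytes
    from the end of the file (tail is zero or negative, by assumption)."""
    return bytearray(infected[entry + start:len(infected) + tail])

def bwrite(out: bytearray, buf: bytes, buf_off: int, file_off: int,
           count: int) -> None:
    """Copy count bytes from the virus-image buffer into the repaired file."""
    out[file_off:file_off + count] = buf[buf_off:buf_off + count]

host = b"ORIGINAL PROGRAM"

# Prepending case: 0x710 bytes of virus, the host, then a 5-byte trailer.
prepended = bytes(0x710) + host + b"#####"
print(fcopy_from(prepended, 0, 0x710, -5))

# Appending case: the virus entry point lies 0xC5 bytes into the virus body.
appended = host + bytes(0x200)
entry = len(host) + 0xC5
print(fcopy_to(appended, entry, -0xC5))

# Write-back: restore two saved bytes from buffer offset 2 to file offset 1.
out = bytearray(4)
bwrite(out, b"..AB", 2, 1, 2)
print(out)
```

In a real repair the saved header values come out of the
degarbled virus image, which is why BWRITE reads from the
buffer rather than from the infected file.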
After repair is completed, VERV restarts processing on the
repaired file, to ensure that there is not another instance
of the virus present. If the virus is present in the file
multiple times, all will be removed. Once VERV is
integrated with a virus scanner, the repaired file will be
automatically re-scanned for all viruses, and any found will
be re-verified and removed.

Repair processing is only performed if the user has
requested it on the command line, and if VERV finds that the
virus is indeed exactly the known strain of the virus. In
small infections, or in situations where correct operation
of the objects involved is particularly crucial, we continue
to recommend that infected objects be destroyed (files
erased, diskettes formatted, and so on), and replaced from
uninfected sources.


VERV OPTIONS

The functions of VERV's command-line switches include:

 o Reading the virus to be tested from an image file,
   instead of from a normally-infected object; this can be
   useful, for instance, in testing a boot-sector infector
   that has been received as a binary dump of the boot
   sector rather than on diskette.
 o Overriding the default virus-description file (contained
   within VERV.EXE itself), allowing easy testing of new or
   experimental descriptions.
 o Producing detailed progress messages and data displays
   during processing, to help pinpoint differences found or
   errors encountered.
 o Specifying that, rather than using a CRC, VERV should
   compare the relevant parts of the object to be tested
   with a standard sample of the virus stored in an image
   file, or a standard infected specimen.
 o Producing a dump of the virus image, after all
   degarbling, but before any zeroing has been done. This
   image can then be used for storage, analysis, or
   transmission, or for later use as input to VERV for
   byte-by-byte comparisons.


STATUS AND FUTURE GOALS

VERV is currently in use by a small number of people within
IBM who deal with virus infections. Its availability has
greatly reduced the time spent by technical people in doing
semi-manual verification, and has therefore sped up the
response time to virus incidents. Adding a typical
newly-analyzed virus to VERV is generally quite simple,
involving a few lines in the VERV language, and sometimes a
small piece of C code to handle a new garbling algorithm.

The virus-removal language has just recently been
implemented, and is not yet in wide use.

Our near-term plans for VERV include support for families of
viruses, and the ability to verify a virus in a number of
objects at once. This will ease integration with our virus
detectors; when a detector detects a signature that
corresponds to a virus, or a family of viruses, in a number
of files, it will be able to verify the identity of the
virus with a single call to VERV.

If transmission bandwidth, CPU cycles, and disk space were
free, and programming were easy, every workstation would be
protected by a seamless "immune system". Objects infected
with existing viruses would be detected automatically, the
identity of the virus verified and reported to a central
location, and the object destroyed or repaired, with minimal
user intervention. New viruses would be detected
automatically with some high degree of confidence, and
first-pass signature patterns would be extracted
automatically where possible and communicated to a central
clearinghouse, along with a sample of the suspicious object.
Viruses would very rarely, if at all, spread widely.

One of our main focuses at HICL is studying what part of
that ideal scenario is feasible, in both current and future
systems. The prototype VERV is a small part of our
experimentation with parts of that system that are also
immediately useful to users in the near term. We would
welcome similar descriptions by others in the field, of work
that they are doing in similar directions.