340 lines
20 KiB
Plaintext
340 lines
20 KiB
Plaintext
The following document is copyrighted, 1989, by Tim Sankary -
|
|
all rights reserved. It may be copied and distributed freely as long
|
|
as no changes are made and as long as this copyright notice remains
|
|
with the document
|
|
|
|
I want to preface this document with a personal statement. I
|
|
am aware that Jim Goodwin has published a partial list of his virus
|
|
disassemblies and I can imagine the controversy that will result. I
|
|
do not have an inside track to the "truth" of this Distribute/Don't
|
|
Distribute issue, and I can frankly see both sides of the argument. I
|
|
find it hard, however, to censure a colleague who has performed such
|
|
excellent and dedicated work as Jim has, and I have to admire his
|
|
courage in taking such a controversial step. For those of you who
|
|
anticipate writing or designing Identification and Removal programs
|
|
(CVIA Class III programs) for viruses, I hope you will find something
|
|
of value in the following study that will be useful. If you have
|
|
access to disassemblies, this document may provide some insights into
|
|
designing your own disinfectant.
|
|
I would like to thank "Doc" John McAfee for his guidance and
|
|
help in developing this paper, and the Computer Virus Industry
|
|
Association for the outstanding visual aids that they contributed.
|
|
These figures have been referenced in the paper but I have been unable
|
|
to create ASCII representations of them for BBS distribution. If you
|
|
obtained this document from an electronic source and would like a copy
|
|
of the figures, they can be obtained by sending a stamped, self
|
|
addressed envelope to the CVIA, 4423 Cheeney Street, Santa Clara, CA.
|
|
95054. - Tim Sankary
|
|
From the Homebase BBS
|
|
408 988 4004
|
|
|
|
|
|
|
|
DEVELOPING VIRUS IDENTIFICATION PRODUCTS
|
|
|
|
|
|
In January of 1986, the world's first computer virus was
|
|
unleashed upon an unsuspecting and largely defenseless population of
|
|
global IBM personal computers users. The virus originated in Lahore,
|
|
Pakistan, and spread rapidly from country to country through Europe
|
|
and across to the North American Continent. In less than twelve
|
|
months it had infected nearly a half-million computers and was causing
|
|
minor havoc in hundreds of universities, corporations and government
|
|
agencies.
|
|
This virus, later dubbed the "Pakistani Brain", caught the
|
|
user community unawares and the problems resulting from its many
|
|
infections demonstrated how unprepared we were for this phenomenon.
|
|
The computer systems targeted by the virus contained no specific
|
|
hardware or software elements that could prevent or even slow its
|
|
spread, and few utilities could even detect its presence after an
|
|
infection occurrence. Fortunately, the virus was not destructive, and
|
|
it limited its infections to floppy diskettes; avoiding hard disks
|
|
entirely.
|
|
The first defensive procedure developed to counteract this
|
|
virus involved a simple visual inspection of a suspected diskette's
|
|
volume serial label. The virus erased every infected diskette's
|
|
volume label and replaced it with the character string - "@BRAIN".
|
|
Thus, any inspection of the volume label, such as performing a simple
|
|
DIRECTORY command, would indicate the presence or absence of the
|
|
virus. An infected diskette could then be reformatted, or the virus
|
|
could be removed by replacing the boot sector. This manual procedure
|
|
is a typical, if somewhat rudimentary, example of the type of
|
|
functions performed by a class of antiviral utilities commonly called
|
|
Infection Identification products.
|
|
Infection identification products generally employ "passive"
|
|
techniques for virus detection. That is; they work by examining the
|
|
virus in its inert state. This contrasts with active detection
|
|
products which look for specific actions employed by a virus. For
|
|
example, looking for a Format instruction within a segment of code on
|
|
a disk would be a passive method of detecting a potentially
|
|
destructive program. If we detected the Format attempt during program
|
|
execution, however, we would be performing an active detection.
|
|
Passive methods concern themselves with the static attributes of
|
|
viruses, active methods concern themselves with the results of virus
|
|
execution.
|
|
Example active indicators are: the attempted erasure of
|
|
critical files, destruction of the FAT table, re-direction of system
|
|
interrupt vectors, general slowdown of the system, or an attempt to
|
|
modify an executable program. These indicators are generic; that is,
|
|
they are common to a large class of viruses. Because so many viruses
|
|
perform these common activities, however, they are of little use in
|
|
identifying individual virus strains. It is the passive virus
|
|
indicators that prove most useful to a positive identification: The
|
|
characteristic text imbedded within the virus, specific flags,
|
|
singular filenames or a distinctive sequence of instructions that are
|
|
unique to the virus. These and other similar indicators can best be
|
|
ascertained by scanning system storage and examining the program files
|
|
and other inert data.
|
|
|
|
History
|
|
Virus identification products have their genesis in the
|
|
utility programs first developed in 1982 and 1983 to check public
|
|
domain software for bombs or trojans before they were executed. These
|
|
utility programs initially checked for questionable instructions in
|
|
the suspect program's object code. Direct input/output instructions,
|
|
interrupt calls, format sequences and like instructions, if found,
|
|
were flagged and the user was notified. Later versions included tests
|
|
for imbedded data strings that were typically used by trojan
|
|
designers. Suspect programs were scanned for profanity, for keywords
|
|
like "gotcha" or "sucker", and for data strings that had been found in
|
|
specific trojan programs. Some programs looked also for specific
|
|
names of files that were frequently used by trojans and bombs.
|
|
These products, however, were seldom able to identify a
|
|
specific bomb or trojan. Rather, they indicated that the suspect
|
|
program contained instructions or messages of a questionable nature -
|
|
implying that the program might be a generic trojan. This, however,
|
|
is not sufficient for dealing with viruses.
|
|
Viruses create entirely different problems than bombs or
|
|
trojans. Viruses replicate, and can infect hundreds or even thousands
|
|
of programs within an installation. They remain invisible for long
|
|
periods of time before they activate and cause damage. And, they are
|
|
difficult to remove because they imbed themselves within critical
|
|
segments of the system. It is not sufficient to know that a virus is
|
|
present, it is necessary to know which virus is present. We must know
|
|
how it infects, what actions it takes, and, most importantly, what
|
|
must be done to de-activate and remove the virus.
|
|
Thus, when the first virus identification products emerged in
|
|
1986 they didn't just look for generic code or messages, they looked
|
|
for specific indications that could identify the individual virus
|
|
strain. This allowed the user to verify a specific infection
|
|
occurrence and take appropriate action. Later versions of these
|
|
products went a step further. They actually removed the virus when an
|
|
infection was identified.
|
|
|
|
Techniques
|
|
Before we discuss the techniques used by identification
|
|
products, we need to look briefly at how viruses insert themselves
|
|
into programs. As shown in Figure 1, viruses actually modify the
|
|
structure of the programs that they infect. Generally, the virus
|
|
replaces the program's start-up segment with a routine that passes
|
|
control to the main body of the virus. This main body code may be
|
|
inserted within the program in a buffer area, or it may be added to
|
|
the beginning or the end of the program. After execution of the
|
|
virus, the program's original start-up sequence is replaced and
|
|
control is passed to the program.
|
|
When removing a virus from an infected program, it is crucial
|
|
to determine exactly how the virus modified the program. Each virus
|
|
differs from other viruses in size, segmentation and technique. Each
|
|
virus chooses a different area for infection, stores the start-up
|
|
sequence in a different location. and return control in a different
|
|
manner. We must know exactly what the virus did during the infection
|
|
process in order to reverse the steps for removal.
|
|
Thus, it should be clear that in order to develop an antidote
|
|
for a specific virus, we must first obtain a copy of the virus for
|
|
analysis. A thorough analysis of the structure and design of the
|
|
virus will provide the answers to all of the above questions.
|
|
When a virus has been disassembled and analyzed, we in theory
|
|
know all there is to know about the virus. We are then able to create
|
|
an "attribute file" for the virus. This file contains all of
|
|
characteristics of the virus that can be uniquely assigned to the
|
|
virus. For example, we may find imbedded data within the virus that
|
|
we would not reasonably expect to find in any other program or data
|
|
file. Or we may find an instruction sequence that is sufficiently
|
|
unusual that we would not expect any other program to use the exact
|
|
same sequence. Figure 2 shows two virus examples that contain unique
|
|
imbedded data. In the Pakistani Brain example, it is clear that we
|
|
would not expect to find the exact same name, address and telephone
|
|
number in any other program.
|
|
In addition to "identification" attributes, the attribute file
|
|
contains all information necessary to reverse the virus infection
|
|
process. Common elements of an attribute file might be:
|
|
- Executable code signatures
|
|
- Volume label flags
|
|
- Hidden file names
|
|
- Absolute sector address contents
|
|
- Key data at specific file offsets
|
|
- Specific interrupt vector modifications
|
|
- ASCII data content
|
|
- Specific increases in bad sector counts
|
|
When the attribute file has been created, it is inputted into
|
|
a program that scans all of the appropriate areas of system storage
|
|
looking for combinations of the attributes. As more attributes are
|
|
discovered, the degree of assurance that the virus is present
|
|
increases. For example, the character string "sUMsDOS" is common to
|
|
all versions of the Israeli virus. It is conceivable, however, that
|
|
the same string could appear randomly in any text file. Therefore,
|
|
the identification program will look for verification attributes, such
|
|
as the file offset where the character string was located, or a
|
|
sequence of instructions following the data.
|
|
When the virus has been identified, the removal phase begins.
|
|
Since the infection attributes of the virus are known, the removal
|
|
process is fairly straightforward. Usually it involves locating the
|
|
main body of the virus and all segments of the original program that
|
|
had been re-located by the virus. The virus is erased, and the
|
|
program is then re-constructed.
|
|
Clearly, multiple attribute files can be used by a single
|
|
program. Thus, single identification products are able to identify
|
|
multiple strains of viruses (see Figure 3).
|
|
|
|
Product Advantages
|
|
Infection identification products have a major advantage over
|
|
other types of virus protection products: They are able to determine
|
|
whether or not a system is already infected. This is a serious
|
|
concern in many organizations. Other classes of virus protection
|
|
products must assume that a given system is uninfected at the time the
|
|
products are installed. They log the state of the system at the time
|
|
they are installed and periodically compare the current state to the
|
|
original state. If a virus has infected the system in the interim,
|
|
the change will be detected. If a virus has already infected the
|
|
system before such products are installed, however, the virus will be
|
|
logged as part of the original system, and no change will be detected.
|
|
Infection identification products, on the other hand, are
|
|
specifically designed to look for and identify pre-existing
|
|
infections. This ability to identify an existing infection is in many
|
|
cases crucial to the success of implementing antiviral measures.
|
|
Since a virus may remain dormant for months or even years before it
|
|
activates and damages the system, pre-existing infections could cause
|
|
widespread destruction in spite of our best efforts at implementing
|
|
protection programs.
|
|
Automatic removal is the second advantage of identification
|
|
products. Virus infections can sometimes involve hundreds or
|
|
thousands of programs within an organization. When the virus is
|
|
discovered, the task of tracking down and disinfecting all of the
|
|
infected programs can become monumental. In many cases, multiple
|
|
versions of a single program may be infected, or the original source
|
|
diskettes may have been lost or misplaced. In some cases, infected
|
|
programs may be overlooked or incorrectly replaced, so that re-
|
|
infection becomes a problem. These and other issues invariably cause
|
|
problems. The identification products, however, automatically find,
|
|
identify and remove the infection, normally at a rate of a few seconds
|
|
per infected program. The time savings alone can be enormous.
|
|
A third advantage to identification programs is that they
|
|
cannot be circumvented by a known virus. Other types of products that
|
|
use active methods for infection prevention or detection can be
|
|
specifically targeted by viruses. The virus can seek out and destroy
|
|
or disable the active element of such products. For example, if the
|
|
product is a filter type product that monitors all system I/O, the
|
|
virus can steal the interrupts from the monitor and thus bypass the
|
|
program's checking function. Likewise, if a protection program uses a
|
|
checksum or other method to look for change within a program, the
|
|
virus can modify the program's checksum routine so that the change
|
|
caused by an infection will not be detected. These and other
|
|
techniques have been used by many viruses to avoid interference by
|
|
antiviral programs that use active detection methods.
|
|
Identification products, on the other hand, cannot be so
|
|
easily circumvented. Since these products use passive techniques, the
|
|
virus has no control over the products' functions. Keep in mind that
|
|
the virus and its resultant system modifications are merely a sequence
|
|
of inert bits as far as the identification product is concerned. Also
|
|
the virus is not active at the time the product is being used (all
|
|
such products come with their own boot diskettes, and they run
|
|
stand-alone). Thus, the virus can in no way affect the product's
|
|
operation, or even be aware of its presence.
|
|
|
|
Problem areas
|
|
There are some drawbacks to identification products however.
|
|
The first problem is that these products only work for known viruses.
|
|
That is, a virus that has been around long enough to be noticed,
|
|
isolated, sampled, disassembled and analyzed. This may take a
|
|
considerable time if the virus is unobtrusive and slow to activate.
|
|
When the virus has been discovered and analyzed, the identification
|
|
product must be designed, implemented, packaged, marketed and
|
|
distributed - a process that could take considerably more time. Thus
|
|
identification utilities will lag new virus developments by months, or
|
|
in some cases, even years. This time lag implies that there will
|
|
always be new viruses, and thus new dangers, against which no
|
|
identification utility will be effective.
|
|
The second problem with these products is more thorny, and
|
|
requires a high level of product sophistication in order to resolve. At
|
|
issue is a phenomenon that might be called the Uncertainty Factor, and
|
|
it is caused by the increasing tendency of hackers to collect existing
|
|
viruses, modify them and return them to the public domain. These
|
|
modifications sometimes cause viruses to react differently from the ways
|
|
in which they were originally designed, yet they may leave key
|
|
identification attributes unchanged.
|
|
For example, the Jerusalem virus was originally designed to slow
|
|
down the infected machine's processor one-half hour after an infected
|
|
program was executed. This slowdown was a nuisance to the user of the
|
|
infected machine, but it severely limited the spread of the virus,
|
|
because the virus made itself known early in the infection process and
|
|
had limited time to replicate before being removed. In the summer of
|
|
1988, an unknown hacker modified the virus by changing just one
|
|
instruction (see Figure 4). This modification disabled the routine that
|
|
caused the system to slowdown, and as a result, the virus became many
|
|
times more infectious.
|
|
Modifications like this, and other more substantial
|
|
modifications, are made almost daily to existing viruses. The danger
|
|
that these modifications pose to identification products is substantial.
|
|
If an identification product is attempting to remove a virus that has
|
|
infected a program differently than the way in which the product
|
|
expects, then the results of the disinfection will be unpredictable.
|
|
Damage to the system may result, the program may be destroyed or, in the
|
|
worst case, the virus will still be active even though the product
|
|
thinks it has removed it.
|
|
In order to minimize the risks posed by this problem,
|
|
identification products must be designed to cross reference as many
|
|
virus attributes as possible prior to attempting removal. If any one of
|
|
the expected attributes has been changed, or is missing, the product
|
|
should notify the user of the potential problem and manual intervention
|
|
will be required.
|
|
|
|
Future Prospects
|
|
Identification products clearly must play a major role in the
|
|
battle against computer viruses. As viruses become more widespread and
|
|
as infections become more common, the need for utilities able to
|
|
identify and help remove viruses will become apparent. It is probable
|
|
that these products will become the dominant form of virus protection in
|
|
the future. A few technical advances, however would greatly aid their
|
|
general acceptance.
|
|
One of the problems facing identification products is the time
|
|
required to fully scan attached storage devices when searching for a
|
|
virus. For example, as many as ten or more minutes can be required to
|
|
fully scan a 40 megabyte drive while looking for just one virus.
|
|
Multiple virus checks require more time. Because of this, it is
|
|
impractical to perform frequent scans of the system. This is
|
|
unfortunate because it would be advantageous to perform a complete
|
|
identification check of a system each time the system was booted. This
|
|
would provide a high degree of system security, assuming that the
|
|
identification product was kept up to date. More sophisticated
|
|
algorithms for searching attached storage and creative techniques for
|
|
multiple virus scans could alleviate the time scan problem.
|
|
A second desirable advance in the technology of these products
|
|
would be the development of techniques that could identify variations of
|
|
known viruses and still provide the capability to remove the modified
|
|
virus. This advance would remove a major limitation of the current
|
|
products and would greatly increase their reliability. Techniques for
|
|
removing variations have already been developed for a few root viruses,
|
|
but there currently exists no generic technique that is effective for a
|
|
large class of viruses. I anticipate that this hurdle will be overcome
|
|
within a year or two.
|
|
A final enhancement would be the ability to fully or partially
|
|
re- structure data that has been corrupted by a virus after it has
|
|
activated. Currently, infection identification products are only useful
|
|
if they are used before a virus begins its destructive phase. When the
|
|
destructive phase begins, the virus may destroy critical control tables,
|
|
data files, programs or even itself. At this point all current virus
|
|
products have limited usefulness.
|
|
It is possible in some cases, however, to reverse much of the
|
|
destruction caused by a virus provided: 1) We know the details of the
|
|
destruction process, and 2) The destructive phase has not gone on too
|
|
long. For example, one of the common PC viruses scrambles the File
|
|
Allocation Table by reversing a number of the entries. Since we know
|
|
the exact way in which the virus scrambles the information, we can
|
|
easily unscramble it. However, after a few days of data scrambling, the
|
|
virus initiates a low level format of the hard disk. At this point, no
|
|
recovery is possible.
|
|
I anticipate that future products will incorporate recovery
|
|
capabilities for a large number of virus destructive acts. This
|
|
capability, and others described above, should provide the best virus
|
|
protection that we can hope to achieve.
|