textfiles/virus/yetiser_

272 lines
15 KiB
Plaintext

Date: Fri, 30 Apr 1993 19:43:34 -0400
From: tyetiser@umbc.edu (Mr. Tarkan Yetiser)
Subject: Polymorphic Viruses
Polymorphic Viruses: Implementation, Detection, and Protection
Copyright (c) 1993
by
VDS Advanced Research Group
P.O. Box 9393
Baltimore, MD 21228, U.S.A.
prepared by
Tarkan Yetiser
e-mail: tyetiser@umbc5.umbc.edu
Jan 24, 1993
PA, U.S.A.
Summary
This paper discusses the subject of polymorphic engines and viruses. It looks
at general characteristics of polymorphism as currently implemented. It tries
to maintain a practical presentation of the subject matter rather than an
academic and abstract approach that would confuse many people. Basic
knowledge of the Intel 80x86 instruction set will be highly useful in
understanding the material presented. A very detailed discussion is avoided
not to have the side effect of "teaching" how to create polymorphic engines
or viruses. The purpose is to help computer professionals understand this
trend of virus development and the threats it poses. It should serve as a
starting point for individuals who would like to get an idea about the
polymorphic viruses and how they are implemented. Long gone are the days of
innocence, when any schoolboy could write a virus scanner using a few
signatures extracted from captured virus samples.
The subject of polymorphism can be extended to other areas such as
anti-reverse-engineering or anti-direct-attacks, and it can be argued to be
useful in that context. This paper only looks at the use of polymorphism in
PC viruses to avoid simple detection techniques.
1. Introduction
In the Spring of 1992, we analyzed a polymorphic engine called MtE, and
provided a report to satisfy the curiosity of the public. We also provided a
freeware program called CatchMtE. At the end of 1992, we received reports
of a new polymorphic engine, called TridenT Polymorphic Engine, or TPE for
short. It was released in Europe by someone who calls himself Masud Khafir.
Both MtE and TPE are distributed as object modules that can be linked to
programs allowing them to create different-looking decryptors. One notorious
use of such a module is for writing so-called "polymorphic" viruses. Prior
to the appearance of TPE, several viruses using MtE (Mutation Engine) have
been seen. The claimed author of TPE pays tribute to the author of MtE in the
documentation that comes with TPE.
MtE is about 2.4 kilobytes in size. It can generate decyptors using based
or indexed addressing modes with word-size displacement. The decryptor steps
through the code a word at a time. It uses 4 variations of the based or
indexed addressing modes. The structure of the decryptors is constant.
TPE is about 1.5 kilobytes in size. It can generate decryptors using based
or indexed addressing modes with or without displacement. Unlike MtE, TPE
can also create byte-at-a-time decryptors, as well as word-at-a-time. It also
uses more addressing modes available on the 80x86; 6 variations of the based
or indexed addressing modes are used. Its more general nature makes TPE less
predictable, and complicates the task of recognizing TPE-based viruses. Many
encryptive viruses can be considered a subset of TPE-based decryptors, and
may be flagged as such. To overcome this problem, one has to check for other
viruses before performing check for the presence of a TPE decryptor.
2. Polymorphism and Its Common Use
We will now present some preliminary information on polymorphism in
general, and discuss certain features of the Intel 80x86 instruction set.
Although polymorphism is independent of encryption, it is easier to use
encryption to hide the main body of the virus and implement a polymorphic
decryptor. Viruses aim to keep their size as small as possible and it is
impractical to make the main virus body polymorphic. One could attempt to
rearrange the instructions in the main virus body or even use different
instructions (to defy recognition techniques based on checksumming). Such an
effort would not be as helpful or as easy as the common approach of using
encryption in combination with polymorphism. In summary, current polymorphic
viruses keep the main virus body encrypted, and implement a polymorphic
decryption routine in plaintext. Since the decryptor is comparatively small,
and performs one specific task, the amount of time and effort needed to
craft a polymorphic virus is significantly reduced. Pictorially, a generic
polymorphic virus would be structured as follows:
virus entry point:
Polymorphic Decryptor
*****************************
** encrypted ****************
** main virus body **********
*****************************
*****************************
3. Implementation of Polymorphism on the Intel 80x86
In almost every case we have examined, the polymorphic engine exploits the
fact that certain computations can be performed using different registers and
instructions. To step through encrypted portion of code, for example, one
can use DI, SI, or BX registers. To increment or to decrement the index value,
one can ADD to the index register, INC it, or use an implicit instruction that
increments it (CMPSB is used in TPE for example).
Polymorphic engines can also rely on the availability of instructions that
are coded using the same opcodes. On the 80x86, there are 11 opcodes used for
several different instructions: 80, 81, 83, D0, D1, D2, D3, F6, F7, FE, and
FF. When it is necessary to encode information about one operand, the middle
three bits of the ModRM byte are used to distinguish operations. The ModRM
byte follows some opcodes and can either extend the opcode or contain
information about the operand and addressing modes of an instruction. It
contains three fields: Mod (2 bits), Reg (3 bits), and R/M (3 bits). For
example, for the opcode 80, the middle three bits of the ModRM byte, which
make up the Reg field, have the following meanings:
ADD 000
OR 001
ADC 010
SBB 011
AND 100
SUB 101
XOR 110
CMP 111
The second operand of each instruction with an opcode of 80 is an
immediate byte, so the ModRM fields in the second byte of machine code encode
the first register or memory operand.
By flipping a few bits, it is possible to generate code achieving many
different operations easily. Combined with a rich set of addressing modes,
and a good random number generator, one can create very different-looking
decryptors.
4. Common Characteristics of Polymorphic Viruses
Polymorphic viruses of varying degrees of complexity have appeared in the
past. In the anti-virus community, polymorphism is still an active area of
discussion and much disagreement. Most researchers would agree that certain
viruses are simply encryptive with a variable key. We do not consider such
beasts polymorphic since they can be recognized using simple wild card scan
strings. The number of such viruses significantly increased over the past few
years as a reaction to the worldwide availability of good signature-based
virus scanners. Similarly, scanners also evolved to handle such beasts using
more flexible signatures that can include wild card or don't-care values to
accommodate the variable parts of the decryptors.
There are other viruses that exhibit some polymorphic behavior, though
they remain unsophisticated. Classification of such beasts is a gray area.
For example, some viruses can generate a limited number of different-looking
decryptors. These do not implement a polymorphic code generation engine, but
rather pick from a pre-computed set of code fragments that they carry along.
Writing detection routines for such viruses is not too difficult.
To aid in implementing polymorphism, a random-number generator is often
used. A random-number generator provides selection of a part of the decryptor.
This selection could be for "noise" instructions that are inserted in the
decryptor to render signature-based scanning ineffective. It could also be
used to select a certain addressing mode and appropriate registers. The
obvious use of random number generators is to get a seed value for encryption.
Another aspect of a polymorphic engine is the choice of instructions that
actually perform the decryption. XOR is a favorite choice. The chosen
operation modifies the code to be decrypted using an addressing mode that
allows external memory access. By external, we mean outside the processor
registers. By using several instructions and many different addressing modes,
a polymorphic engine can achieve a large number of combinations of decryptors.
Usually, a loop construct is set up to step through the encrypted code. On
the 80x86, many instructions can be used to accomplish looping. The loop
counter can be modified implicitly using LOOP instruction. Or it could be
more explicit using a DEC CX, JNZ ?? combination. Several such possibilities
exist. Again, availability of different approaches increases the number of
appearances a decryptor can have. In the MtE, the loop construct was very
predictable not only because it used a specific instruction but also because
it had a characteristic that made it very simple to recognize MtE-based
decryptors. Although "appearance-based" analysis on the MtE left many
anti-virus developers in despair, a structural analysis proved to be very
effective. We doubt even the developer of MtE noticed how predictable his
polymorphic engine is.
The goal of a polymorphic engine is not to leave any predictable sequence
of instructions that a virus scanner can use to simplify or optimize an
appropriate detection algorithm. For example, using the same instruction to
increment the index register is too predictable. A mixture of instructions
that achieve the same result explicitly or implicitly (as in TPE) would make
detection a lot harder.
5. Detection of Polymorphic Viruses
There are several problems that must be addressed to develop a reliable
detection routine for a given polymorphic engine. The most challenging one
appears to be avoiding false positives. Many programs include a run-time
decompressor to reduce space requirements. When loaded in memory, the
decompressor takes charge and produces an expanded copy of the code that can
be run on the CPU. Furthermore, some programs are actually encrypted on disk,
and they are decrypted on the fly when they are loaded. The purpose of using
such a scheme is to make reverse-engineering efforts difficult. Such programs
are more likely to trigger a false positive.
To reduce false positive rate, one can limit the search to a small area of
the suspect file. This would be helpful in many cases; however, the compressed
and encrypted files carry their decompressor/decryptor at the entry point of
the program, just where many viruses plant their code. Another trick a few
viruses employ is to hide the virus entry point where the decryptor is
located. Of course, scattering the virus code while keeping the host
functional complicates the virus.
A structural analysis of TPE reveals some information that can be used to
recognize a TPE-based decryptor, although it is not as useful as it is in the
case of MtE. Note that structural analysis does not rely on presence of
specific byte sequences, and therefore offers a powerful tool that can be
used in developing recognition routine for many polymorphic engines.
6. Protection Mechanisms and Solutions
Anti-virus field is full of solutions to almost every existing virus
problem, and sometimes solutions to non-existing problems as well. Among
other things, one solution prevailed as the most popular: scanning. The
inherent flaw in this solution is that it cannot cope with viruses that it
does not have signatures for. Therefore ugrades are issued frequently.
Deployment of such upgrades in large installations requires tremendous
effort, and quickly translates into having a very old version of a scanner
installed; even then not necessarily used. Aside from the false sense of
security scanning gives, some companies make less-than-honest claims about
the capabilities of their scanners to promote their products. Testing such
claims is beyond the resources of most organizations. Those who could conduct
such tests are not always impartial.
Scanning has its place, and it is very useful when used properly. The best
protection against computer viruses is user awareness and education.
Unfortunately, a well-written virus can be so transparent that even the most
observant user may not notice a thing. Even worse, many users lack the
technical knowledge to understand how their computers work. Most people simply
do not have the time or desire to learn more than how to use a mouse. The
proof of this is the proliferation of products that make excessive hardware
demands without offering improved functionality. The bulk of the code takes
care of the user interface. This trend is unlikely to change.
More powerful techniques such as integrity checking can deal with viruses
of different kinds, including polymorphic viruses. Even a simple integrity
checker should be able to find if a polymorphic virus is spreading all over
your disk. In other words, solution to polymorphic viral spread has been
available for quite some time. Note that there are some viruses that target
integrity-based products and they must be dealt with accordingly. A detailed
discussion of such viral methods can be found in a paper written by anti-virus
researcher Mr. Vesselin Bontchev of Virus Test Center at the University of
Hamburg. His paper is titled Attacks Against Integrity Checkers, and it is
available via anonymous FTP. Interested individuals are encouraged to read
his excellent paper.
The point is that an integrity-based anti-viral solution should be made
part of your arsenal against viruses. Such a solution could easily provide
you with early detection of viruses before they spread. Once the viral spread
is controlled, viruses become nothing more than another "computer glitch" that
can haunt only those without timely backups.
We believe that neither harsh legislation nor emphasis on responsible
computing can stop virus development, although they may slow it down. It is
necessary to take matters into your own hands and protect your computers
adequately.