textfiles/programming/CRYPTOGRAPHY/theory.des

John A. Thomas
CompuServe: 75236,3536
101 N.W. Eighth St.
Grand Prairie, TX 75050


		SURVEY OF DATA ENCRYPTION
		    By John A. Thomas


Introduction
------------

        The following article is a survey of data encryption.
It is intended to provoke discussion among the members of this
forum and perhaps lead to a creative exchange of ideas.
Although the basics of the subject seem to be known to few
programmers, it embraces many interesting and challenging
programming problems, ranging from the optimization of machine
code for maximum throughput to the integration of encryption
routines into editors, communications packages, and perhaps
products as yet not invented.   Governments have dominated this
technology up until the last few years, but now the need for
privacy and secrecy in the affairs of a computer-using public
has made it essential that programmers understand and apply
the fundamentals of data encryption.

Some Cryptographic Basics
-------------------------

        A few definitions are appropriate first.  We use the
term "encryption" to refer to the general process of making
plain information secret and making secret information plain.
To "encipher" a file is to transform the information in the file
so that it is no longer directly intelligible.  The file is then
said to be in "ciphertext".  To "decipher" a file is to
transform it so that it is directly intelligible; that is, to
recover the "plaintext."

        The two general devices of encryption are "ciphers" and
"codes".  A cipher works on the individual letters of an
alphabet, while a code operates on some higher semantic level,
such as whole words or phrases.  Cipher systems may work by
transposition (shuffling the characters in a message into some
new order), or by substitution (exchanging each character in the
message for a different character according to some rule), or a
combination of both.  In modern usage, transposition is often
called "permutation".  A cipher which employs both transposition
and substitution is called a "product" cipher.  In general,
product ciphers are stronger than those using transposition or
substitution alone.  Shannon referred to substitution as
"confusion", because the output is a non-linear function of the
input, thus creating confusion as to the set of input
characters.  He referred to transposition as "diffusion" because
it spreads the dependence of the output from a small number of
input positions to a larger number.

        Every encryption system has two essential parts: an algorithm
for enciphering and deciphering, and a "key", which consists of
information to be combined with the plaintext according to the
dictates of the algorithm.  In any modern encryption system, the
algorithm is assumed to be known to an opponent, and the security of
the system rests entirely in the secrecy of the key.

        Our goal is to translate the language of the plaintext to a
new "language" which cannot convey meaning without the additional
information in the key.  Those familiar with the concept of
"entropy" in physics may be surprised to learn that it is also
useful in information theory and cryptography.  Entropy is a
measure of the amount of disorder in a physical system, or the
relative absence of information in a communication system.  A
natural language such as English has a low entropy because of
its redundancies and statistical regularities. Even if many of
the characters in a sentence are missing or garbled, we can
usually make a good guess as to its meaning.  Conversely, we
want the language of our ciphertext to have as high an entropy
as possible; ideally, it should be utterly random. Our guiding
principle is that we must increase the uncertainty of the
cryptanalyst as much as possible. His uncertainty should be so
great that he cannot make any meaningful statement about the
plaintext after examining the ciphertext; also, he must be just
as uncertain about the key, even if he has the plaintext itself
and the corresponding ciphertext (In practice, it is impossible
to keep all plaintext out of his hands).

        A prime consideration in the security of an encryption
system is the length of the key. If a short key (i.e., short
compared with the length of the plaintext) is used, then the
statistical properties of the language will begin to "show
through" in the ciphertext as the key is used over and over, and
a cryptanalyst will be able to derive the key if he has enough
ciphertext to work with.  On the other hand, we want a
relatively short key, so that it can be easily stored or even
remembered by a human.  The government or a large corporation
may have the means to generate and store long binary keys, but
we cannot assume that the personal computer user will be able to
do so.

        The other important fact about the keys is that there must be
very many of them. If our system allows only 10,000 different keys,
for example, it is not secure, because our opponent could try every
possible key in a reasonable amount of time.  This introduces
the concept of the "work factor" required to break an encryption
system.  We may not have a system unbreakable in principle, but
if we can make the work factor for breaking so high it is not
practical for our opponent to do so, then it is irrelevant that the
system may be less strong than the ideal.  What constitutes an
adequate work factor depends essentially on the number of
uncertainties the cryptanalyst must resolve before he can derive
plaintext or a key. In these days of constantly improving computers,
that number should probably exceed 2**128. It is easy to quantify the
work factor if we are talking about exhaustive key trial, but few
modern ciphers are likely to be broken by key trial, since it is too
easy to make the key space very large. Most likely they will be
broken because of internal periodicities and subtle dependency of
output on input which give the cryptanalyst enough information to
reduce his uncertainty by orders of magnitude.

        A corollary to work factor is the rule that a system need
only be strong enough to protect the information for however
long it has value.  If a system can be broken in a week, but
not sooner, then it may be good enough, if the information has
no value to an opponent after a week.


Cryptanalysis
-------------

        Cryptanalysis is the science of deriving plaintext
without the key information.  Anyone intending to design an
encryption system must acquaint himself to some degree with
cryptanalytic methods.  The methods of attack may range from
sophisticated statistical analysis of ciphertext to breaking
into the opponent's office and stealing his keys ("practical
cryptanalysis").  There are no rules of fair play.   The
cryptanalyist is free to use his puzzle-solving ingenuity
to the utmost, even to the point of applying the knowledge that
your dog's name is "Pascal", and that you might be lazy enough
to use that as your key for the day.

        The cryptanalyst may have only ciphertext to work with,
or he may have both ciphertext and the corresponding plaintext,
or he may be able to obtain the encipherment of chosen
plaintext.  Some cryptographic systems are fairly strong if the
analyst is limited to ciphertext, but fail completely if he has
corresponding plaintext.  Your system should be strong enough to
resist attack even if your opponent has both plaintext and
ciphertext.

        Computer power can greatly aid cryptanalysis, but
many systems that appear strong can be broken with pencil-and-
paper methods.  For example, the Vigenere family of
polyalphabetic ciphers was generally believed to be unbreakable
up until the late nineteenth century. A polyalphabetic cipher is
a substitution cipher in which a different alphabet is used for
each character of plaintext.  In these systems, the key
determines the order of the substitution alphabets, and the
cycle repeats with a period equal to the length of the key. This
periodicity is a fatal weakness, since fairly often a repeated
letter or word of plaintext will be enciphered with the same key
letters, giving identical blocks of ciphertext.  This exposes
the length of the key.  Once we have the length of the key, we
use the known letter frequencies of the language to gradually
build and test hypotheses about the key.  Vigenere ciphers can
be easily implemented on computers, but they are worthless
today.  A designer without knowledge of cryptanalysis however,
might be just as ignorant of this fact as his colleagues of
the last century.  Please see the references at the end of
this article for information on cryptanalytic technique.

A Survey of Cryptographic systems
---------------------------------

        We now review some representative encryption schemes,
starting with traditional ones and proceeding to the systems
which are only feasible to implement on computers.

        The infinite-key cipher, also known as the "one time
pad," is simple in concept. We first generate a key which
is random and at least the same length as our message.  Then,
for each character of plaintext, we add the corresponding
character of the key, to give the ciphertext.  By "addition," we
mean some reversible operation; the usual choice is the
exclusive-or. A little reflection will show that given a random
key at least the size of the plaintext (i.e., "infinite" with
respect to the plaintext because it is never repeated), then the
resulting cipher is unbreakable, even in principle.  This scheme
is in use today for the most secret government communications,
but it presents a serious practical problem with its requirement
for a long random key for each message and the need to somehow
send the lengthy key to the recipient. Thus the ideal infinite
key system is not practical for large volumes of message
traffic.  It is certainly not practical for file encryption on
computers, since where would the key be stored?  Be wary of
schemes which use software random-number generators to supply
the "infinite" key.  Typical random-number algorithms use the
preceeding random number to generate the succeeding number, and
can thus be solved if only one number in the sequence is found.

        Some ciphers have been built to approximate the
infinite-key system by expanding a short key.  The Vernam system
for telegraph transmission used long paper tapes containing
random binary digits (Baudot code, actually) which were
exclusively-or'ed with the message digits.  To achieve a long
key stream, Vernam and others used two or more key tapes of
relatively prime lengths, giving a composite key equal to their
product.  The system is still not ideal, since eventually the
key stream will repeat, allowing the analyst to derive the
length and composition of the keys, given enough ciphertext.
There are other ways to approach the infinite-key ideal, some of
which are suggested in the author's article (with Joan
Thersites) in the August '84 issue of DDJ.

        The "rotor" systems take their name from the
electromechanical devices of World War II, the best known being
perhaps the German ENIGMA.  The rotors are wheels with
characters inscribed on their edges, and with electrical
contacts corresponding to the letters on both sides.  A
plaintext letter enters on one side of the rotor and is mapped
to a different letter on the other side before passing to
the next rotor, and so on. All of the rotors (and there may be
few or many) are then stepped, so that the next substitution is
different.  The key is the arrangement and initial setting of
the rotor disks. These devices are easy to implement in software
and are fairly strong. They can be broken however; the British
solution of the ENIGMA is an interesting story outside the
scope of this note.  If you implement a rotor system, consider
having it operate on bits or nybbles instead of bytes, consider
adding permutation stages, and consider how you are going to
generate the rotor tables, since you must assume these will
become known to an opponent.

        In 1977 the National Bureau of Standards promulgated the
Data Encryption Standard (DES) as the encryption system to be
used by all federal agencies (except for those enciphering data
classified under any of the national security acts).  The
standard is available in a government publication and also in a
number of books.  The DES was intended to be implemented only in
hardware, probably because its designers did not want users to
make changes to its internal tables.  However, DES has been
implemented in software and is available in several
microcomputer products (such as Borland's Superkey or IBM's
Data Encoder).

        The DES is a product cipher using 16 stages of
permutation and substitution on blocks of 64 bits each. The
permutation tables are fixed, and the substitutions are
determined by bits from a 56-bit key and the message block.
This short key has caused some experts to question the security
of DES. Controversy also exists regarding the involvement of the
NSA in parts of the DES design.  The issues are interesting, but
beyond the scope of this note.

        Since DES was intended for hardware implementation, it
is relatively slow in software.  Software implementations of DES
are challenging because of the bit-manipulation required in the
key scheduling and permutation routines of the algorithm.  Some
implementations gain speed at the expense of code size by using
large pre-computed tables.

        The public key cipher is an interesting new development
which shows potential for making other encryption systems
obsolete. It takes its name from the fact that the key
information is divided into two parts, one of which can be made
public. A person with the public key can encipher messages, but
only one with the private key can decipher them. All of the
public key systems rely on the existence of certain functions
for which the inverse is very difficult to compute without the
information in the private key. These schemes do not appear to
be practical for microcomputers if their strength is fully
exploited, at least for eight-bit machines.  One variety of
public key system (the "knap-sack") has been broken by solution
of its enciphering function, but this is no reflection on other
systems, such as the RSA scheme, which use different enciphering
functions.  All public-key systems proposed to date require
heavy computation, such as the exponentiation and division of
very large numbers (200 decimal digits for the RSA scheme).  On
the other hand, a public-key system that worked at only 10
bytes/sec might be useful if all we are sending are the keys for
some other system, such as the DES.


Some random thoughts
--------------------

        To wrap up this too-lengthy exposition, I append a few
questions for the readers:

        Must we operate on blocks instead of bytes?  Block
ciphers seem stronger, since they allow for permutation.  On the
other hand, they make life difficult when file size is not
an integral multiple of the block size.

        Can we make a file encryption system OS-independent?
This is related to the question above on blocks vs bits.  How do
we define the end-of-file if the plaintext is ascii and the
ciphertext can be any 8-bit value?

        Can we find an efficient way to generate and store a
random key for the infinite-key system?  Hardware random-number
generators are not hard to build, but would they be of any use?

        Bit-fiddling is expensive.  Can it be avoided and still
leave a secure system?  What are the relative costs of
manipulating bits on the Z80 vs the 68000, for example?

        No file-encryption system can erase a file logically and
be considered secure.  The information can be recovered until it
is overwritten.  Overwriting files adds to processing time.  I
am informed that it is possible to reliably extract information
even from sectors that HAVE been overwritten.  Is this so?
If it is, what is the solution?

        How do we integrate encryption systems into different
tools?  Should a telecommunications program transparently
encrypt data if the correspondent is compatible?  What about an
editor-encryption system wherein plaintext would never exist on
the disk, only on the screen?  How would we manage to
encipher/decipher text as we scroll through it and make changes,
and still get acceptable performance?

        By their nature, encryption schemes are difficult to
test.  In practice, we can only have confidence that a system is
strong after it has been subjected to repeated attack and
remained unbroken.  What test might we subject a system to that
would increase our confidence in it?


References
----------

Here are a few useful books and articles.  This is by no means
a complete bibliography of the subject:

Kahn, David.  "The Code Breakers".  The basic reference for the
        history of cryptography and cryptanalysis.  Use it to
        learn where others have gone wrong.

Konheim, Alan G.  "Cryptography, A Primer".  Survey of
        cryptographic systems from a mathematical perspective.
        Discusses rotor systems and the DES in great detail.

Sinkov, Abraham.  "Elementary Cryptanalysis".  Very basic, but
        very useful, introduction to the mathematical concepts
        of cryptanalysis.

Foster, Caxton C.  "Cryptanalysis for Microcomputers".  Covers
        the cryptanalysis of simple systems, but still a good
        introduction to cryptanalytic technique.  Describes the
        operation of many traditional systems in detail.

Shannon, Claude.  "Communication Theory of Secrecy Systems".
        Bell System Technical Journal (October 1949) : 656-715.
        Discusses secrecy systems from viewpoint of information
        theory.  No practical tips, but useful orientation.

Rivest, R. et al. "A method for Obtaining Digital Signatures and
        Public Key Cryptosystems".  Comm. of the ACM, Vol. 21,
        No. 2, (February 1978) : 120-126.  This article
        describes what has come to be known as the RSA
        public-key system.

"Data Encryption Standard", Federal Information Processing
        Standard (FIPS), Publication No. 46, National Bureau of
        Standards, U.S. Dept. of Commerce, January, 1977.

---

To start off, I'll discuss some *good* points of the Data Encryption
Standard promulgated by the federal government (DES).  We all know the key is
too small - the NSA almost certainly has the means to exhaust it.  But it
would be a pity not to examine the DES just for that reason.  It uses a
brilliant cryptographic technique that once understood can be used by us in
families of other crypto-systems, with much larger keys.

The DES shows us how to use one-way functions in cryptography.  At first
sight, a one-way function would seem to be useless - if we encrypt 'A' and get
'R', we have to be able to decrypt 'R' and get back 'A'.  However, a one-way
function, if it could be used, allows very complex transformations of text
that are practically impossible to undo without knowledge of the key.
However, the DES is as complicated as it is complex, so for a beginning I'm
going to explain how to use a one-way function cryptographically without
reference to the DES.  If there's enough interest, we can later relate this to
the DES.

Perhaps the simplest way to define a one-way function is with a table, such
as:

             0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
            -----------------------------------------------
            14 05 01 14 10 04 08 00 03 02 15 08 09 11 07 15

The top numbers are indexes into the one-way table.  Given an index, you can
get a value by table look up, but given a table value there's no guarantee
you'll get the index you started with because both 0 and 3 map to 14, 6 and 11
map to 8, and so on.  BTW, the table values were generated by a random
process.

Now, let's use this cryptographically.  Take an ASCII letter, say 'a' and
split it into two nibbles, left and right

                       LEFT  RIGHT
                61h ->  6      1

The DES trick is to use one half as an argument of a one-way function to
obtain a value with which to encrypt the other half, so let's do it.  Using
RIGHT as the index into the table, we obtain the value 5.  Now, we need a
primitive encryption function that is, and must be, invertible.  The DES uses
XOR, but we will use ADD, and add 5 to LEFT, mod 16, obtaining 11.  We have

                       LEFT  RIGHT
                        11     1

This encrypts LEFT.  Notice that even though we used a one-way function, the
encryption is completely reversible for two reasons.  First, RIGHT, the
argument of the table lookup is unchanged, so getting the correct value from
the table for decryption is assured.  Second, the encryption primitive is
invertible; we need only to subtract the table value mod 16 from encrypted
left.

But there seems to be a problem.  RIGHT isn't encrypted, and if that's how we
left matters, we wouldn't have much of a cipher.  The answer suggests itself -
use the new value of LEFT in the same way to encrypt RIGHT.  Let's do it.
Using 11 as an index into the table gives us 8, which we add to RIGHT giving

                       LEFT  RIGHT
                        11     9

which on the IBM PC is a funny graphics character.  Now, is this reversible?
Of course it is, the argument into the table that encrypted RIGHT is
completely unchanged, and if used we get 8 which we subtract from 9 giving 1.
And we have already shown that 11  1 will undo properly to give us 6 1.

So far, we have nothing but a curious simple substitution cipher.  Repeating
the process isn't encouraging either.  It clearly must cycle if we continue
to alternately encrypt LEFT and RIGHT, using the values from the previous
encryption as input.  In fact, there are a number of cycles of different
lengths, but it's interesting that some cycles don't cycle back to the
starting value.  For example, starting with 01, we get 01, 55, 99, BB, 33,
EE, 55...  The reason is that our table is not a permutation of the integers
0 through 15.

To make our sample cipher more than a curiosity, we shall do what the DES
does, use another one-way function (that is, a table) in a second *round*, and
repeat this process with the new table.

             0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
            -----------------------------------------------
            04 13 12 06 13 07 03 15 13 15 06 09 09 09 07 10 07

                       LEFT  RIGHT
                        11     9 -> 15 + 11 = 6
        11 = 9 + 3  <-   6     9
                         6    11

This is already a considerably more complex cipher.  It is still reversible,
but we must remember to *decrypt* it starting with the last table, not the
first.  This cipher, like the DES, is sensitive to direction, encrypt or
decrypt.

It still is a simple substitution cipher with no strength.  You will notice
however, that the length of the second table is one more than that of the
first.  Obviously, I intend to rotate both tables one position before
encrypting the next letter of a message.  Since the lengths are relatively
prime, I can encipher 16x17=272 letters before my tables repeat.  This is no
longer a simple substitution cipher, but one far more complex.  Still not good
enough by far.  But if I had only one message not too long to protect, this
would be practically unbreakable.  It might be good for hundreds of letters
before repetitions would enable cryptanalysis.

Of course, it isn't strong enough to protect against an attack based on known
plaintext.  With enough known plaintext, both tables can be reconstructed.  If
the DES used only two rounds, it too could be cryptanalyzed.  It can be broken
because with only two rounds, there aren't enough addends for known input and
output that the addends can't be exhausted.  Several more rounds are necessary
to make exhaustion of possible addends impractical.  It is quite important in
designing a crypto-system that you design it against *known* plaintext.  In
practice, it is impossible to prevent known plaintext from falling into the
cryptanalyst's hands.

Later, I will develop this sample into a very tough cipher indeed, though I
won't make any claims for its ultimate strength (I haven't examined it that
well yet).  But for now, let's sum up what we have done.

1. We used a one-way function for a crypto-system.  To give credit where it is
due, this technique was invented by Horst Feistel of IBM, originally for the
Lucifer.  It is in my opinion an absolutely brilliant cryptographic
innovation.  The technique is fundamental to the DES.

2. We cascaded one-way functions for complexity.  The DES uses cascading
similarly for strength.

3. Unlike the DES, we have developed the germ of a byte cipher, rather than a
block cipher.  Very great complexity may be had with a block cipher, but there
is still doubt that block ciphers are 'better'.

It is the trick of alternately enciphering 'left' and 'right' that permits the
use of a one-way function.  In practice, it is easier to swap or exchange
right and left after each round.  That is, we use RIGHT to encrypt LEFT, then
exchange RIGHT and LEFT, and repeat encryption for the next round.  This
simplifies code and circuitry, though it may confuse us as we try to follow
what happens in the DES.  Therefore, to avoid confusion, we will always keep
RIGHT right, and LEFT left.

---

To begin, we shall consider the DES encipherment a little abstractly, then
later in more detail.

The heart of the DES one-way function consists of the eight S-boxes, each
of which transforms six bits of the input to four bits of output.  We
can understand the function of the S-boxes better if we first consider
the transformation they effect as they act in concert upon the entire
input block of 48 bits.  Imagine one table consisting of 2^48 numbers of 32
bits each. Each 32 bit number is repeated 65,536 times, in a more or less
random order. Also imagine a 'letter', not of one byte, but of 64 bits divided
into a LEFT of 32 bits and a RIGHT of 32 bits.  RIGHT is combined with 48 key
bits selected out of 56 in a way that will be described later, without,
however, changing RIGHT, and this combined value is used as an argument into
the huge table to return a pseudo-random 32 bit value.  This returned value is
XORed with LEFT to encrypt LEFT.  The same table lookup is repeated in the next
round using LEFT as the argument and encrypting RIGHT.  Except that a different
arrangement of 48 key bits out of the 56 is used.  The DES repeats this process
16 times, that is, encrypts RIGHT eight times, and LEFT eight times.  There is
a reason for eight that we will discuss later.  Each iteration is called a
'round'.

It is clear that this huge table is a one-way function, and that the
encryption technique is almost exactly what we described for the byte cipher
discussed in the previous upload.  There is an important difference - we are
now using a key in addition to a table.  In our byte cipher, the key is the
table itself. Also, the DES uses a different permutation of the original key
for every round in a clever way that ensures that every bit of final
ciphertext depends complexly on every bit of the key.  This is important.
A sure-fire cryptanalytic technique is to suppose a few key bits - not
too many possibilities to exhaust - then to test the supposition against
sample ciphertext compared with known plaintext. In this way, although a key
space may be too large to exhaust by brute force, the key is gradually
recovered.  A good example is the simple substitution where the key space is
26!  (about 2^88). But, by forcing all ciphertext bits to depend on all key
bits, this sure-fire attack is inhibited if not made impossible.  You can't
solve for a few key bits, you have to solve for all or none.

Why a key?  The chief reason is that the one-way table is published.  It is no
secret as we suppose our byte cipher's tables are.  And, to be a standard,
we aren't supposed to change the DES tables.  Further, the tables are not
random; the inventors state that the DES's particular tables have been worked
out for cryptographic strength, and that to their own surprise discovered that
random tables, which they had naively believed to be sufficient, aren't so
good.  And, nobody will say what criterion should be used to design DES
tables, and nobody has figured them out (at least, not publicly).  Naturally,
this has bred suspicion, and the fact that the NSA helped design the tables
hasn't helped.  Later, I would like to offer my own speculation on what this
criterion might be.  To summarize, the key serves as the secret ingredient
because the tables can't be secret.

Obviously, a table of this size is a practical impossibility.  So, how does
the DES achieve a 'virtual' table of this size?  Basically, nibble by nibble.
It uses the nibbles of RIGHT, in order, as indexes into relatively small
matrixes to get eight encryption values per round.  But the encryption values
don't encrypt the corresponding nibbles of LEFT.  Oh no, that would be fatally
simple.  Each result nibble of the one-way lookup encrypts scattered bits of
LEFT so that each bit of the value impacts the most nibbles possible in LEFT.
Now, when LEFT becomes the argument, the nibble values returned from the table
have dependencies on many bits of RIGHT.  Within five rounds, all ciphertext
bits become complex dependents of all plaintext bits and all key bits.  The
dependency is extremely violent.  Believe me, a single bit of difference in
either plaintext or key gives you an unrecognizably different ciphertext.

We should be ready now to see how a nibble of RIGHT (actually, three will be
involved), and some key bits, are used as table arguments, and how this
encrypts four bits of LEFT in a single round.  Let's ignore how the different
permutations of the key bits are generated for now.  Just imagine there are
somehow six key bits.  The first nibble one-way function is this matrix:

                            Box S1
            0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
          |-----------------------------------------------
        0 |14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
        1 | 0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
        2 | 4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
        3 |15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13

Recall that the input block of 32 bits is first expanded to 48 bits by
the "selection operation".  If you consult this table in the standard,
you will see that the first six bits of the result are the first RIGHT
nibble, plus the last bit of RIGHT, plus the first bit of the next nibble
to form six bits:

                32  1  2  3  4  5

This is one argument into the one-way function.  Also, six bits of the key are
an argument, and the six key bits and these RIGHT bits are XORed to form the
actual table argument. The first and last bits of this XOR-result index the row
of the S1 box.  The middle four bits index the column.  The intersection of row
and column gives us a four-bit value, which, after passing through the
permutation operation P, is used to encrypt LEFT bits 9 17 23 and 31.

This is surely one-way.  You can't determine the row and column indexes from
the table value because there are four row and column indexes that map to the
same value.  It should also be clear that LEFT bits 9 17 23 31 are decryptable
because RIGHT was never changed.  We have only to combine the appropriate key
bits with RIGHT's 32 1 2 3 4 5, and we'll get the same value out of the S box,
which, XORed with LEFT, will yield our original 9 17 23 and 31.

Notice the scatter.  By encrypting these particular bits, we are encrypting
the 3rd, 5th, 6th, and 8th nibbles which will be immediately used in the next
round as arguments.  Because of their positions, they will form part of the
2nd, 3rd, 4th, 5th, 6th, and 8th nibble arguments of the future LEFT.  And
thus, when RIGHT gets encrypted, its old first nibble will be quite
spread out.  The scatter makes far more bits dependent on very few bits than
is at first apparent.  The only unaffected nibbles are the 1st and 7th, but
this is only one round, and we tracked only one nibble of RIGHT.

This is getting lengthy, so let me list the RIGHT bit arguments corresponding
to the eight S boxes and the LEFT bits that get encrypted

          RIGHT bits         box         LEFT bits
      32  1  2  3  4  5 ----> S1 ---->  9 17 23 31
       4  5  6  7  8  9 ----> S2 ----> 13 28  2 18
       8  9 10 11 12 13 ----> S3 ----> 24 16 30  6
      12 13 14 15 16 17 ----> S4 ----> 26 20 10  1
      16 17 18 19 20 21 ----> S5 ---->  8 14 25  3
      20 21 22 23 24 25 ----> S6 ---->  4 29 11 19
      24 25 26 27 28 29 ----> S7 ----> 32 12 22  7
      28 29 30 31 32  1 ----> S8 ---->  5 27 15 21

In this manner, a nibble by nibble encryption is spread out so that a virtual
table of 2^48 elements is achieved.  Note that we have not really considered
the key yet.  And note that I have shown the contents of only one box.  The
boxes are listed in FIPS Pub 46, which you should use if you ever implement
the DES, because other sources usually have typos, the slightest of which is
fatal.

Also, pay attention to how RIGHT's bits are expanded to 48.  The last bit of
the previous nibble plus the first bit of the next are tacked onto each nibble
of the argument fore and aft.  This builds an inter-nibble dependency into the
DES.  Even more important, one encrypting bit from the table can do a lot of
'damage'.  Look at the nibble value coming out of the second S-box; its first
bit will encrypt the 13th bit of LEFT.  But look where the 13th bit occurs in
expanded RIGHT!  The result of this encryption is that when LEFT becomes the
table argument, the third and fourth table nibbles are dependent on just one
bit.  As a matter of fact, the value out of any S-box encrypts exactly two
nibble boundary bits and two nibble inner bits.

We saw how the DES uses one-way functions, each specific to a nibble, and how
the results of the one-way functions are carefully scattered.  The purpose of
the scattering is to build up complex dependencies for the final result on all
bits of message and key as fast as possible.  The scatter is achieved by a
permutation, called P, that the standard describes as occurring after
gathering the eight table values.  For software implementation, there is a
great savings in speed by replacing all S-box values with 32-bit entries
pre-permuted by the inverse of P - that is, by the encrypting bit positions
we already listed, and doing away with P entirely.

To illustrate, the first value of the first S-box is 14, in binary, 1 1 1 0.
Therefore, to do away with P, we replace this value with its equivalent

                   1                   2                   3
 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
 ---------------------------------------------------------------
 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
                 1               1           1               0

The gain in speed can't be overemphasized.  There are more implementation
tricks like this that hopefully we can get into later.  On an 8088, at 4.77
mHz, encryption speeds of 2500 bytes/sec are possible in pure software.  On a
68000, we think that software speeds of 6000, or better, bytes/sec are
achievable.  Perhaps even as high as 8000.  The important lesson is that you
don't have to code in the dumb plodding way the standard implies.  It seems
that the standard was written to be as uninstructive as possible.

We should now discuss the key.  We won't go into a bit by bit description, for
that, see FIPS 46.  The main idea is to generate 16 48-bit keys out of the
original 56 that

  1.  are simply generated
  2.  vary one from the other as much as possible
  3.  fit the 'scatter' to involve all key bits in all cipher bits

For this, two (actually, three) permutations are used, called PC1 and PC2 in
the standard, awkward names, but we'll use them.  This 'key scheduling' is not
perfect; it permits weak keys (keys that are their own inverse) and semi-weak
keys (keys that result in alternating patterns so that all encryptions of LEFT
are as if encrypted by an invariant key, and a different invariant key for
RIGHT).  The key scheduling just isn't good enough to avoid these weaknesses.

PC1 is a geometric permutation that doesn't seem to have deep reason.  It
discards the so-called parity bits, thus reducing 64 bits to 56, and divides
the key bits into two halves.  The halving *is* important.  This is a picture
of the PC1 transformation of the original key bits with their 'parity' bits

               1   2   3        4   5   6   7            8
               9  10  11       12  13  14  15           16
              17  18  19       20  21  22  23           24
              25  26  27       28  29  30  31           32
              33  34  35  36       37  38  39           40
           ^  41  42  43  44       45  46  47 ^         48
           |  49  50  51  52       53  54  55 |         56
           |  57  58  59  60       61  62  63 |         64
                 C half           D half            discarded

C = 57 49 .. 44 36
D = 63 55 .. 4 12 20 28

Very geometric, which, combined with the geometric shifts used with PC2, cause
a larger number of weak keys than would be strictly necessary.  Now, PC2,
which is actually two permutations, one that operates on circular shifts of C,
and one that operates on circular shifts of D, has an odd property; it
re-arranges 24 bits out of C and 24 bits out of D so that the C bits directly
combine only with the first 16 bits of RIGHT (or LEFT), and the D bits only
with the last 16 bits of RIGHT (or LEFT).  (I'm not considering the nibble
boundary bits).  Bit 41, in other words, will never combine with bits 17
through 32 of the one-way argument.  Similarly, bit 47 will never combine with
bits 1 through 16 of the one-way argument.

Indirect combination, due to the scatter, is another story.  My guess is, that
cutting the key bits into halves is deliberate to ensure that all key bits
encrypt all cipher bits as quickly and simply as possible.  If you carefully
examine which bits get encrypted by the table values, you will see that the
scatter also repositions substitution bits by halves.  That is, in our
illustration, the first 16 bits plus the nibble boundary bits, of RIGHT,
encrypt half of the first 16 bits of LEFT and half of the last 16 bits of LEFT
(almost).  Also, the last 16 bits of RIGHT cause encryption of the remaining
first and second halves of LEFT.  I think this completely explains the purpose
of the P permutation described (described? just listed!) in the standard.

For PC2, see FIPS 46.  Let me just mention that since PC2 selects 24 bits out
of each half, in any of the key schedules there are always 4 bits out of the
56 missing.  This makes it harder for the cryptanalyst to work the cipher
backwards by supposing and testing assumptions about the key.

Finally, let me add an implementation note.  You don't have to go through the
key scheduling algorithm for every block like the standard describes.
Instead, you program an initialization call that generates the 16 key
schedules one time.  You can do this because the schedules are invariant from
block to block.  Such an initialization call makes the difference between
encryption rates of 80 bytes/sec and 700 bytes/sec for an otherwise uninspired
implementation, or 2500 bytes/sec for what is in my opinion the very finest
ever for the IBM PC, the Data Encoder.

To make all cipher bits depend on all key bits is no mean accomplishment.  The
attempt is fraught with booby traps.  Most ways you can think of to force this
dependency have the unfortunate result of transforming the actual key into a
simpler and much smaller equivalent key.  This danger the designers of the DES
managed to avoid.  I mention this because we have seen some weaknesses in the
DES's key scheduling, but these weaknesses should not blind us to the DES's
good points if we want to learn how to design ciphers.

The DES has another weakness, the complementary property, that effectively
halves the key space.  This property is caused by the two uses of XOR in the
algorithm.  Let's use '~' to indicate boolean 'not', 'm' to indicate
'message', and 'k' to indicate 'key'.

The complementary property is as follows:

          DES(k,m) = ~DES(~k,~m)

Read this as: the DES encryption of a message, m, under a key, k, is IDENTICAL
to the complement of the DES encryption of ~m under key ~k.

Remember that the key bits are combined with the message bits by a XOR to look
up an encrypting value in an S-box.  Because of this XOR, m and k, or ~m and
~k, map to the identical value in an S-box because of the boolean identity

                 (m XOR k) = (~m XOR ~k)

Remember also that LEFT is encrypted by XORing it with the looked-up value.
Due to another boolean identity

               (LEFT XOR VALUE) = ~(~LEFT XOR VALUE)

we have the complementary property.  The result is that for known plaintext,
the DES's key space is 2^55, not 2^56.  And cryptographers must assume that
plaintext is known when analyzing a cipher; that simply isn't unreasonable.

This weakness would have been easy to avoid.  We have only to *add* the key to
RIGHT (or LEFT) mod 2^48, and we would not map to the same S-box values for
complements of message and key; and to *add* the looked-up value to LEFT (or
RIGHT) mod 2^32.  And as a general rule, XOR is not a good primitive for
ciphers.  True, it is convenient because XOR is its own inverse, while if we
used ADD to encrypt, we would have to use SUB to decrypt.  But in cryptography
there are more important things than convenience, for example, keeping the
actual key space close to the potential.

Why this weakness is not corrected is hard to understand.  DES defenders claim
that the complementary property is useful for verifying the correctness of
implementations.  Basically, you code the DES then test it to see if the
complementary property holds for randomly chosen key and plaintext, and if it
does, you are supposed to get a 'warm' feeling.  But this argument can't be
valid.  Errors in key scheduling and in S-box values can't be caught by this
test.  Matter of fact, most errors in coding the DES can't be caught by the
complementary property, so long as XOR is used to combine RIGHT and key, and
to encrypt LEFT.  It only proves that XOR is used both places, not that things
are right!

DES defenders also claim that XOR is necessary to keep decryption like
encryption, that is, to avoid sensitivity to direction.   However,
the DES, whether it uses XOR or not, *is* sensitive to direction.  To encrypt,
you must start with the *first* key schedule, and work your way to the 16th.
To decrypt, you must start with the 16th key schedule, and work your way
backwards to the first.  Since the DES is sensitive to direction anyhow, it
wouldn't hurt a thing to use ADD one way, and SUB the other.

My opinion is that XOR is a mistake realized too late, and that correction is
resisted because too much is now invested in this mistake, and that defense of
the XOR are rationalizations, not reasons.  And yes, it does matter.  The key
space of 2^56 isn't enough anyhow.  If the NSA can exhaust all 2^56 keys in a
day (and on average, it need only to exhaust half that, or 2^55), then for
known plaintext, which is very common, it can exhaust all possible 2^55 keys
in half a day (or on average 2^54 keys in one quarter of a day).

But our interest in the DES is not its defense, but to learn good ciphering
from both its good and bad points.  The lesson is clear; avoid XOR for
ciphers, because it halves the key space.  If you implement the DES and
don't need to maintain the standard, I recommend using ADD and SUB instead
of XOR.  Software implementation of the DES is frowned on, not because it
is slow, but because it permits a knowledgeable person to tamper with the DES
to the NSA's distress.  The NSA prefers hardware implementations only (ROM
qualifies as 'hardware') and will more readily grant export licenses for
hardware than software.

Here is one suggestion that practically increases the key space to
120 bits.  Begin with a seed of 64 bits, the seed being an extension
of your 56-bit key.  Encrypt the seed by your key and use the resulting
16 nibbles of the ciphertext to shuffle the nibbles of the first row of
the first S-box.  Encrypt the ciphertext (not the seed) again with
the altered S-box.  Use the second ciphertext block to shuffle the
nibbles of the next row of the S-box.  Repeatedly encrypt blocks
and shuffle rows until all 32 rows have been altered.  Thereafter,
use your 56-bit key and the altered DES tables to encrypt and decrypt
your messages.  In all probability, the resulting ciphertext won't
be as nice cryptographically with these randomized tables as with
the contrived ones, but this scheme will thwart a brute-force attack
based on exhaustive key trial.  A key search on a special-purpose
machine is possible over a 55-bit key space (given known plaintext),
but it is not possible over a 120-bit key space.  We may take up
later other pros and cons of modifying the DES tables.

The inventors of the DES state that five rounds are required for all
ciphertext bits to become dependent on all key bits and all message bits.  Yet
the DES uses 16 rounds.  How come?  Wouldn't it be better to limit the rounds
to five for a great increase in throughput?  Or would it?  In examining a
cipher it is very good not to take anything for granted, to ask oneself 'why
this, instead of that?' and attempt to find a coherent reason for it.

I haven't been able to answer this question satisfactorily for myself.
However, the DES is too carefully crafted for me to lightly believe that it
uses 16 rounds just because that is a nice round number in binary arithmetic.
Perhaps my current thinking will help others to puzzle it out.

The DES is breakable, not just by brute force but by cryptanalysis, if it used
two rounds instead of 16, and if plaintext is known.  To see this, let's list
knowns and unknowns for two rounds.  We need a notation now for LEFT and RIGHT
to show rounds, and we will use L[i] and R[i].  When i=0, LEFT and RIGHT are
plaintext; when i=n, LEFT and RIGHT are ciphertext.  Also, we will designate
key schedules by K[j].  This is the picture when n=1.

        L[0]     R[0]    known by assuming known plaintext
        L[1]     R[1]    known by intercepting ciphertext
        K[1]             unknown
        K[2]             unknown

Remember that encryption means a 32-bit number was picked out of the S-boxes
as a function of R[0] and K[1], and this number encrypted L[0] to produce
L[1].  Similarly, another 32-bit number was picked out of the S-boxes as a
function of L[1] and K[2], and the second number encrypted R[0] to produce
R[1].

There is another way to look at this (the poet Wallace Stevens listed thirteen
ways of looking at a blackbird).  One nibble is picked out of the first S-box
as a function of the bits 32 1 2 3 4 5 of R[0] and the first six bits of K[1]
(bits 10 51 34 60 49 17 of the key), and encrypts bits 9 17 23 and 31 of L[0]
by XOR.  And so on.

Now, if we have known plaintext, it is simply no problem to determine what the
S-box values were that encrypted both LEFT and RIGHT.  We just take bits 9 17
23 and 31 of L[0] and XOR them against the same bits of L[1].  And so on.
Let's call these nibbles the encrypting *sums*, and because there is only one
round per side, the sums are also direct S-box values.

For each S-box value, there are four possible 6-bit keys that will map to
the S-box value.  For all the nibbles that encrypted left, the possible key
space used with R[0] is reduced from 2^48 to (2^2)^8, or 2^16.  This key space
is easily exhausted, and we can now attack K[2].  By the same procedure, the
key space for K[2] is also reduced to 2^16, however, K[2] must be consistent
with K[1].  That is, the bits of K[2] are practically the same as for K[1],
but rearranged.  So, in fact, the key space for K[2] is far smaller than 2^16,
I think it is only 2^4 (but I haven't counted it yet).  This breaks a
two-round DES.

Now suppose four rounds; two for LEFT and two for RIGHT, and list the knowns
and unknowns again

        L[0]   R[0]                  known  (plaintext)
        L[1]   R[1]   K[1]  K[2]     unknown!  (intermediate ciphertext)
        L[2]   R[2]                  known  (ciphertext)
                      K[3]  K[4]     unknown

Now when we derive the encrypting sums we no longer have the S-box values, but
two S-box *addends*, and for any sum, there must be 16 pairs of addends.  We
simply don't know what L[1] and R[1], the intermediate ciphertext, are
anymore.  The possible keys for K[1] increase from 2^16 to 2^32.  This begins
to look computationally difficult.  Instead of four possible 6-bit keys per
nibble, we now have 16.  However, let us remember the consistency requirement
of the key bits.  If we do suppose a particular key bit is 1, it must be 1 in
all the schedules in which it occurs, regardless of position.  So, the
combinatorics aren't quite as daunting as they first appear.  It seems that
this is still solvable.

To work this out accurately, you would have to construct tables of key bits
according to the key schedules, then note the bits repeated (they will be in
different positions) from round to round.

For six rounds, the number of possible keys used with R[0] jumps to 2^48.
With a large machine, it should be possible to solve.  But any more rounds, we
would be solving for 2^56 possible keys; that is, we are back to a brute force
attack.

With 16 rounds, there are seven intermediate unknown LEFTs and RIGHTs
and the combinatorics become too great to backsolve for the key.  I suspect
that if the math is worked out, 16 rounds make it a practical certainty that
backsolving is infeasible, whether we attack all bits used in one round, or a
few bits used in all rounds.  It would be nice to know the exact number of
rounds that reach this infeasibility; the math, however, escapes me.

We must remember this when we return to our byte cipher (discussed in
the file "DES"), because we should determine how many rounds are
required to make backsolving with known plaintext infeasible.  The
number of rounds is important; too many is inefficient, too few is fatal.

---

Experts have studied the S-boxes.  To a casual eye each row of
the S-boxes appears to be a random permutation of the nibbles 0
through 15.  But they aren't random.  So far, nobody has figured
out their ordering or why.

I haven't figured out the S-boxes, and it is unlikely I ever
will.  Nevertheless, I am inclined to believe the inventors when
they say there is no trap door, and the computed boxes really
are 'better' cryptographically than random boxes.  Keep in mind
that many mathematicians, computer scientists, and gifted
amateur cryptanalysts who owe nothing to the NSA have pored over
these boxes.  Also, the inventors' tone in their defense of the
DES ('Cryptography:  A New Dimension in Computer Security'; Carl
Meyer and Stephan Matyas, John Wiley and Sons) strikes me as
sincere.  They seem to me genuinely surprised to discover that
non-random boxes are better than random.

I can't explain the S-boxes, but I do have an idea why
non-random boxes might be 'better'.  It turns out that the
ENIGMA rotors also were not randomly wired.  Rather, they were
wired to ensure 'distance'.  The idea is that similar plaintext
should not encrypt under a similar key to a similar ciphertext.
I haven't studied the ENIGMA in detail yet, so I don't know what
threat 'closeness' poses to the key that the German inventors
were trying to avoid.  Yet, it must have been a threat, because
the German cryptographers deliberately avoided random rotor
wiring, in spite of the weaknesses 'distance' wiring introduced.

There is the same idea in symbol table hashing algorithms.
Here, one wants similar variables, perhaps differing only by one
character to hash to far different places in a symbol table.
The reason is, we want to avoid 'collisions' - different
variable names accidentally hashing to the same symbol table
location, causing extra effort to find an unused slot for each
variable name.  Most hashing schemes have the effect of
discarding the high order character, so without 'distance', in
FORTRAN, for example, collision is practically assured because
FORTRAN uses the initial letter of a variable to classify the
data type, and we tend to name variables systematically.

Might this not be the secret of the S-box design criterion?  It
seems to me that the message space mapping of a cipher is not in
principle different from symbol table hashing.  And if
'distance' is *not* a criterion of the mapping, maybe it ought
to be.

What I'm saying, strictly as speculation, is that very similar
plaintext, differing perhaps by only a bit, should map to wildly
different points in the message space, that it may be impossible
to guarantee distance with random table values, and that the
S-box values were carefully computed to ensure distance.
Exactly why the lack of distance should be weak, I haven't
figured out.  In support of my guess, the DES does indeed
exhibit the property that similar input maps to violently
different results.

But even if my guess is right, this leaves unanswered the
important question of how to design the boxes.  Still, it's a
start, and it raises very interesting questions in math and
cryptography.

The plaintext is not simply divided into LEFT and RIGHT like we
described.  It is permuted first by a so-called Initial
Permutation, IP, that transposes the rows and columns of a
bit-byte matrix in an odd way.

          1  2  3  4  5  6  7  8
          9 10 11 12 13 14 15 16
         17 18 19 20 21 22 23 24
         25 26 27 28 29 30 31 32
         33 34 35 36 37 38 39 40
         41 42 43 44 45 46 47 48
         49 50 51 52 53 54 55 56
         57 58 59 60 61 62 63 64
         -----------------------
             1     2     3     4   take out order for LEFT
          5     6     7     8      take out order for RIGHT

In other words, transcribe the bytes in row order, then take
bits out in the column order indicated, from bottom to top.

This permutation is bemusing.  There is no apparent reason for
it, and many cryptanalysts believe IP has no cryptographic
significance.  The American Cryptogram Association, an
organization of amateur cryptanalysts, has considered IP without
being able to reach a conclusion about it.

This unexplained permutation does something distinctive,
however.  Let's look at eight consecutive bytes in a different
way:

            x..x  x..x
            x..x  x..x
            x..x  x..x
            x..x  x..x
            x..x  x..x
            x..x  x..x
            x..x  x..x
            x..x  x..x

What we are doing is distinguishing between nibble boundary bits
and nibble inner bits.  Now, it is a fact that IP rearranges
boundary and inner bits like this

            .... ....
            x..x x..x
            .... ....
            x..x x..x
            x..x x..x
            .... ....
            x..x x..x
            .... ....

In other words, half the normal boundary bits are exchanged with
inner bits, and the original boundary bits that still remain
boundary bits are relocated.  The reason for this is obvious -
the boundary bits have quite pronounced statistics in normal
natural language text, especially in EBCDIC.  For example, a
blank is 0 1 0 0  0 0 0 0.  The result is that 20 percent of all
boundary bits are 0 because blanks make up 20 percent of all
text.  By continuing to separate the boundary nibbles by their
natural language statistics, you can derive a frequency table,
in addition to 0..0, for

          0..1
          1..0
          1..1

Do this as an exercise, and you will see that the frequencies
are quite pronounced.  Use EBCDIC; remember, the inventors of
the DES were thinking IBM, not micros and ASCII.

The effect of IP is to even out these frequencies for the
boundary bits so that they are more nearly uniform.  It is
impossible to believe that something as remarkable as this was
not intended.  This smoothing out of boundary bit statistics
must be the purpose of IP.

How come?  Why are the boundary bits so important that they are
evened out at the expense of the inner bits?  The answer I
suppose is to compensate for the reduction of the DES key from
128 bits to 56.

It doesn't take too long studying the DES to realize that its
dimensions are wrong.  It is very lopsided.  There are eight
boxes of four rows and 16 columns each.  In short time, you get
the feeling that 16 boxes of 16 rows and columns each would have
been more natural.  The key should have been 128 bits; LEFT and
RIGHT should have consisted of 64 bits each.  The inter-nibble
dependency should have been TWO bits from the previous nibble
(not one) concatenated to the current nibble, concatenated to
TWO bits of the following nibble.  In other words, instead of 32
1 2 3 4 5, it should have been 31 32 1 2 3 4 5 6.   Bits 31 32 5
6 should have been the row coordinate into the first S-box, not
32 and 5.

The result of this brutal reduction is that there are far less
rows per nibble to select values out of than there are columns.
Guessing which row is selected, rather than guessing which
value, there are

               (2^2)^8 = 2^16

choices per round, or for one nibble for all rounds.  This is
not a formidable number.

For known input, correctly guessing a row gives you two bits of
the key.  Because of the key bit consistency requirement, it
also helps determine other key bits.  Also, guessing the row
helps determine the column as well, since the selfsame message
row bits participate as column selection bits.  It looks like
row selection is the weakest part of the DES, and if a
cryptanalytic attack is achievable, it should be the spot to
concentrate on.  My guess is that a strong bias of nibble
boundary bits might help such an attack, and that IP is a
stop-gap measure to thwart it.  If the rows *can* be determined,
32 key bits are recovered, and the DES falls.  The remaining 24
bits could not resist even a microcomputer.

Perhaps IP is not a bemusing oddity.  Perhaps it is symptomatic
of a deep weakness born of the DES's butchering.  By the way,
this is exactly how cryptanalysis works - find a weak point and
hammer it.

---

This note will describe one more implementation trick for very
fast software DES, then recap implementation tricks.

You will remember how the 32 bits of RIGHT (or LEFT) must be
expanded before they can be combined with the key schedule for a
round.  The following bits

               1  2  3  4
               5  6  7  8
               9 10 11 12
              13 14 15 16
              17 18 19 20
              21 22 23 24
              25 26 27 28
              29 30 31 32

must be expanded to

           32  1  2  3  4  5
            4  5  6  7  8  9
            8  9 10 11 12 13
           12 13 14 15 16 17
           16 17 18 19 20 21
           20 21 22 23 24 25
           24 25 26 27 28 26
           28 29 30 31 32  1

This expansion is called E in FIPS 46.  You will also remember
that IP arranges the bits of the plaintext into these 32 bits
for RIGHT and 32 bits for LEFT.  Now to perform this expansion
in code for every round happens to be quite a bit of work, even
using a micro's fastest instruction forms.  However, IP may be
modified to generate LEFT and RIGHT in *expanded* form.  Then,
if the S-box values are adjusted to compensate for an expanded
LEFT and RIGHT, the same encryption is achieved, but without the
expense of performing E for every round.  Including the S-box
adjustment to do away with P, the cost is 48-bit S-box values
for every original 4-bit element in the S-boxes.

To see how this works, remember that *all* values in the first
S-box encrypt only bits 9 17 23 and 31 of either LEFT or RIGHT,
and the first S-box's first value is 1 1 1 0, in binary.  We
therefore replace this value with the following

           0  0  0  0  0  0
           0  0  0  0  0  1
           0  1  0  0  0  0
           0  0  0  0  0  1
           0  1  0  0  0  0
           0  0  0  1  0  0
           0  0  0  0  0  0
           0  0  0  0  0  0

Let's assume the 8088 chip and a clock rate of 4.77 MHz, and
recap.  Setting the key schedules in a one-time initialization
call increases software speed from about 80 bytes per second to
close to 800.  Getting rid of P by precomputing it in the
S-boxes increases the speed from about 800 bytes/second to about
1700.  Making E a one-time expense by putting it into IP, and
adjusting the S-boxes accordingly increases the speed from about
1700 to 2500 bytes/second.

There is one more trick if you can live with a c * 64K table,
where 'c' is the length of one table entry. You can collapse two
S-boxes into a single table, thereby halving the number of S-box
lookups. This should bring the speed up to about 5000
bytes/second. However, this large a table isn't practical for
some encryption applications.

---

    The files permf.s and permg.s are the 68000 versions of the
permutation routines described in Z80 code in the article
"Designing a File Encryption System" in the Aug '84 issue of
DDJ.  Permf performs the forward permutation of the bits in a
256-bit block as specified by a table of bytes.  Permg performs
the inverse permutation.

    The file cycle.s is the 68000 version of the corresponding
Z80 code for the routine which "cycles" the permutation tables
to the next position.

For example, if the permutation table has the values:

        1  15  115  57 .... 0

then the forward permutation means to put the 1st bit of the
block in the 0th place, the 15th bit in the 1st place, the 115th
bit in the 2nd place, and so on until the 0th bit goes in the
255th place.

The inverse permutation with the same table means to place the
0th bit of the block in the 1st place, the 1st bit in the 15th
place, the 2nd bit in the 115th place, and so on until the 255th
bit goes in the 0th place.

The routines address bits in the block by deriving a bit index
from the byte value of the permutation table.  The upper five
bits of that value index to the particular byte in the block,
and the lower three bits then index to the particular bit within
that byte.

In the original cryptographic use, the permutation table was
assumed to be cycled to its next permutation after the
encryption of each block.  The idea is to use the same thing in
as many different ways as possible to get a long period before
it repeats.  Consider the following permutation list of 10
elements:

element:    7 4 1 3 5 9 8 6 2 0
position:   0 1 2 3 4 5 6 7 8 9

This is the table permf and permg use.  In cyclic notation this
list becomes:

        (0 7 6 8 2 1 4 5 9) (3)

That is, there are two cycles, one of degree 9 and one
singleton.  It means that 7 goes to 0, 6 goes to 7, 8 goes to 6,
and so on.  If we "rotate" the cycles to the second position, we
get this list:

element:   6 5 4 3 9 0 2 8 1 7
position:  0 1 2 3 4 5 6 7 8 9

Thus from one cycle table we can get 9 different permutation
lists.  The cycle routine constructs these lists for permf and
permg, given a table in cyclic notation as above.  It can handle
tables of 256 elements, each of which may contain a number of
cycles.  For example, if we had two tables, the first containing
two cycles, of degrees 83 and 173, and the second containing
cycles of degrees 197 and 59, the composite cycle would not
repeat until 83 * 173 * 197 * 59 = 1.669 x 10^8 blocks had been
processed.

The routines now run about four times faster that the Z80
versions.

*
* PERMF for the 68000.      Permute a 256-bit bit vector, BLOCK
* by table BITPERM.  On call:
*
*       a0 -> BLOCK
*       a1 -> BITPERM
*
* On return, a0 -> permuted BLOCK.
*
* Register usage:
*
*       a2 -> WORK, a 32-byte temporary work area
*       d0 =  byte from BITPERM, shifted to bit index
*       d1 =  index to byte of BLOCK
*       d2 =  rotated bit for masking and inner loop control
*       d3 =  #31, outer loop control
*       d4 =  #$07 immediate masking value
*
* 5/19/86.  Execution time 4.5 ms.
*
        .globl  permf,work

        .text

permf:
        movem.l d0-d4/a0-a3,-(a7)
        moveq   #7,d0           clear work area
        lea     work,a2         -> work
        move.l  a2,a3           copy ptr for later use
clrloop:
        clr.l   (a2)+
        dbra    d0,clrloop
        move.l  a3,a2           retrieve work pointer
        move    #$80,d2         masking bit and inner loop control
        moveq   #31,d3          outer loop control
        moveq   #7,d4           masking value
        clr     d0              keep word clear
permloop:
        move.b  (a1)+,d0        get byte from BITPERM
        move    d0,d1           we will need it twice
        lsr     #3,d1           compute byte index in BLOCK
        and     d4,d0           save lower 3 bits for bit index
        eor     d4,d0           reverse bit order for btst
        btst    d0,(a0,d1)      is bit on in BLOCK?
        beq     permf1          if so, we must set bit in WORK
        or.b    d2,(a2)         set bit in WORK
permf1:
        ror.b   #1,d2           shift masking bit
        bcc     permloop        next bit of work?
        addq    #1,a2           else, next byte of work
        dbra    d3,permloop     do for 32 bytes

movloop:
        move.l  (a3)+,(a0)+     move data from WORK to BLOCK
        dbra    d4,movloop      use #7 already in d4
        movem.l (a7)+,d0-d4/a0-a3
        rts                     all done

        .end

*
* PERMG for the 68000.      Inversely permute a 256-bit bit
* vector by table BITPERM.  On call:
*
*       a0 -> BLOCK
*       a1 -> BITPERM
*
* On return, a0 -> permuted BLOCK.
*
* Register usage:
*
*       a2 -> WORK, a 32-byte temporary work area
*       a3 -> BLOCK
*       d0 =  outer loop control
*       d1 =  inner loop control
*       d2 =  bit counter
*       d3 =  longword from block
*       d4 =  byte from BITPERM
*       d5 =  temporary
*       d6 =  #7 masking value
*
* 5/19/86.  Execution time is 3.3 ms.
*
        .globl  permg,work

        .text

permg:
        movem.l d0-d6/a0-a3,-(a7)
        moveq   #7,d0           clear work area
        lea     work,a2
        move.l  a2,a3           copy ptr for later use
clrloop:
        clr.l   (a2)+
        dbra    d0,clrloop
        move.l  a0,a3           save a0 ptr for later
        moveq   #7,d0           outer loop control
        moveq   #0,d2           count of bits
        move    d2,d4           need word clear
        moveq   #7,d6           masking value
permg1:
        moveq   #31,d1          inner loop control and bit to test
        move.l  (a0)+,d3        get longword from BLOCK
bitloop:
        btst    d1,d3           check for bit on
        beq     permg2          if on, set BITPERM[d2] bit in WORK
        move.b  (a1,d2),d4      get byte BITPERM[COUNT]
        move    d4,d5           save for reuse
        lsr     #3,d4           index to byte of WORK
        and     d6,d5           compute bit # in that byte
        eor     d6,d5           reverse bit order
        bset    d5,(a2,d4)      set the bit in WORK
permg2:
        addq    #1,d2           bump bit count
        dbra    d1,bitloop      and do for all bits in this word
        dbra    d0,permg1       do for all words of BLOCK

movloop:
        move.l  (a2)+,(a3)+     move WORK to BLOCK
        dbra    d6,movloop      use #7 already in d6
        movem.l (a7)+,d0-d6/a0-a3
        rts                     all done

        .end

*
* CYCLE:  Convert a table of permutation cycles to a
* permutation list at the Nth permutation.  This version
* of cycle can deal with a table having any number of
* cycles (up to word value) of various degrees.  The cyclic
* permutation table is RANDP and the permutation list
* table is BITPERM.  BITPERM may then be used to permute
* a block of elements.  The procedure is:
*
* consult the cycle table structure to obtain the number
*  and degree of the cycles and the rotation to be applied
*  to each
*
* k <- 0       /* index into RANDP */
* for i = 1 to number_of_cycles do
*   get degree_of_this_cycle from structure
*   cycle_base <- k
*   for j = 1 to degree_of_this_cycle do
*     BITPERM[RANDP[k]] <- RANDP[RANDP[(k - cycle_base + rotation)
*                               mod (degree_of_this_cycle)]]
*     k <- k + 1
*   end for
* end for
*
* On call:
*       a0 -> RANDP
*       a1 -> BITPERM
*       a2 -> cycle structure
*
* On return:
*       all registers saved
*       BITPERM is established from RANDP
*
* The cycle structure is built as follows for each set of tables:
*
*       dc.w    xx      number of cycles in this table
*       dc.w    yy      degree of first cycle
*       dc.w    zz      rotation of first cycle
*       .       .       ..and so forth for each cycle
*       .       .
* CAUTION:  This structure holds words, but this implementation assumes
* that RANDP and BITPERM are tables of BYTES.  See indirect addressing
* inside degloop below.
*
* Version of 4/6/86.

        .text
        .globl  cycle

cycle:
        movem.l d0-d7/a0-a2,-(a7)
        move    (a2)+,d0        get # of cycles
        subq    #1,d0           adjust for dbra
        clr     d3              clear index into RANDP ("k" above)
        clr     d7              clear scratch register
cycloop:
        move    (a2)+,d1        get degree of this cycle
        move    (a2)+,d2        get rotation for this cycle
        move    d1,d5           use degree for loop control
        subq    #1,d5           adjust for dbra
        move    d3,d4           set current cycle base = k
degloop:
        move    d3,d6           first, see if outside cycle
        sub     d4,d6           RANDP index - current cycle base
        add     d2,d6           add in rotation
        cmp     d1,d6           does that put us outside this cycle?
        bcs     degok           branch if not
        sub     d1,d6           else, adjust mod degree
degok:
        add     d4,d6           add cycle base back to index + rotation
        move.b  (a0,d6),d6      get RANDP[index + rotation]
        move.b  (a0,d3),d7      get RANDP[index]
        move.b  (a0,d6),(a1,d7) put byte in BITPERM
        addq    #1,d3           bump RANDP index
        dbra    d5,degloop      loop until this cycle done
        dbra    d0,cycloop      loop until all cycles done

        movem.l (a7)+,d0-d7/a0-a2
        rts

        .end

/*EOF*/