John A. Thomas
CompuServe: 75236,3536
101 N.W. Eighth St.
Grand Prairie, TX 75050


SURVEY OF DATA ENCRYPTION
By John A. Thomas


Introduction
------------

The following article is a survey of data encryption.
It is intended to provoke discussion among the members of this
forum and perhaps lead to a creative exchange of ideas.
Although the basics of the subject seem to be known to few
programmers, it embraces many interesting and challenging
programming problems, ranging from the optimization of machine
code for maximum throughput to the integration of encryption
routines into editors, communications packages, and perhaps
products not yet invented. Governments have dominated this
technology until the last few years, but now the need for
privacy and secrecy in the affairs of a computer-using public
has made it essential that programmers understand and apply
the fundamentals of data encryption.

Some Cryptographic Basics
-------------------------

A few definitions are appropriate first. We use the
term "encryption" to refer to the general process of making
plain information secret and making secret information plain.
To "encipher" a file is to transform the information in the file
so that it is no longer directly intelligible. The file is then
said to be in "ciphertext". To "decipher" a file is to
transform it so that it is directly intelligible; that is, to
recover the "plaintext."

The two general devices of encryption are "ciphers" and
"codes". A cipher works on the individual letters of an
alphabet, while a code operates on some higher semantic level,
such as whole words or phrases. Cipher systems may work by
transposition (shuffling the characters in a message into some
new order), or by substitution (exchanging each character in the
message for a different character according to some rule), or a
combination of both. In modern usage, transposition is often
called "permutation". A cipher which employs both transposition
and substitution is called a "product" cipher. In general,
product ciphers are stronger than those using transposition or
substitution alone. Shannon referred to substitution as
"confusion", because the output is a non-linear function of the
input, thus creating confusion as to the set of input
characters. He referred to transposition as "diffusion" because
it spreads the dependence of the output from a small number of
input positions to a larger number.

Every encryption system has two essential parts: an algorithm
for enciphering and deciphering, and a "key", which consists of
information to be combined with the plaintext according to the
dictates of the algorithm. In any modern encryption system, the
algorithm is assumed to be known to an opponent, and the security of
the system rests entirely in the secrecy of the key.

Our goal is to translate the language of the plaintext to a
new "language" which cannot convey meaning without the additional
information in the key. Those familiar with the concept of
"entropy" in physics may be surprised to learn that it is also
useful in information theory and cryptography. Entropy is a
measure of the amount of disorder in a physical system, or the
relative absence of information in a communication system. A
natural language such as English has a low entropy because of
its redundancies and statistical regularities. Even if many of
the characters in a sentence are missing or garbled, we can
usually make a good guess as to its meaning. Conversely, we
want the language of our ciphertext to have as high an entropy
as possible; ideally, it should be utterly random. Our guiding
principle is that we must increase the uncertainty of the
cryptanalyst as much as possible. His uncertainty should be so
great that he cannot make any meaningful statement about the
plaintext after examining the ciphertext; also, he must be just
as uncertain about the key, even if he has the plaintext itself
and the corresponding ciphertext (in practice, it is impossible
to keep all plaintext out of his hands).

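To make the entropy idea concrete, here is a minimal Python sketch that
measures the entropy of single-byte frequencies. It is only an illustration:
the function name and the sample text are my own, and counting single bytes
ignores the longer-range redundancy of English, so it understates how
predictable real text is.

```python
import math
import os
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte frequencies in data, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Sample English text (hypothetical) and the same amount of random data.
english = (b"even if many of the characters in a sentence are missing or "
           b"garbled we can usually make a good guess as to its meaning ") * 20
random_bytes = os.urandom(len(english))

print(f"english text : {entropy_bits_per_byte(english):.2f} bits/byte")
print(f"random bytes : {entropy_bits_per_byte(random_bytes):.2f} bits/byte")
```

Good ciphertext should score close to the random figure (the maximum is 8
bits per byte), while the English sample scores far lower.
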
A prime consideration in the security of an encryption
system is the length of the key. If a short key (i.e., short
compared with the length of the plaintext) is used, then the
statistical properties of the language will begin to "show
through" in the ciphertext as the key is used over and over, and
a cryptanalyst will be able to derive the key if he has enough
ciphertext to work with. On the other hand, we want a
relatively short key, so that it can be easily stored or even
remembered by a human. The government or a large corporation
may have the means to generate and store long binary keys, but
we cannot assume that the personal computer user will be able to
do so.

The other important fact about the keys is that there must be
very many of them. If our system allows only 10,000 different keys,
for example, it is not secure, because our opponent could try every
possible key in a reasonable amount of time. This introduces
the concept of the "work factor" required to break an encryption
system. We may not have a system unbreakable in principle, but
if we can make the work factor for breaking so high that it is not
practical for our opponent to do so, then it is irrelevant that the
system may be less strong than the ideal. What constitutes an
adequate work factor depends essentially on the number of
uncertainties the cryptanalyst must resolve before he can derive
plaintext or a key. In these days of constantly improving computers,
that number should probably exceed 2**128. It is easy to quantify the
work factor if we are talking about exhaustive key trial, but few
modern ciphers are likely to be broken by key trial, since it is too
easy to make the key space very large. Most likely they will be
broken because of internal periodicities and subtle dependency of
output on input which give the cryptanalyst enough information to
reduce his uncertainty by orders of magnitude.

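For exhaustive key trial the work factor is simple arithmetic. The sketch
below assumes a purely hypothetical opponent who can test one billion keys
per second; the rate and the function name are illustrative assumptions, not
measurements of any real machine.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(key_count: int, trials_per_second: float) -> float:
    """Worst-case time, in years, to try every key in the key space."""
    return key_count / trials_per_second / SECONDS_PER_YEAR


RATE = 1e9  # hypothetical: one billion key trials per second

for label, key_count in [("10,000 keys", 10_000),
                         ("2**56 keys", 2 ** 56),
                         ("2**128 keys", 2 ** 128)]:
    print(f"{label:12s}: {years_to_exhaust(key_count, RATE):.3g} years")
```

On average the right key turns up after half the key space has been tried,
so halve these figures for the expected case.
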
A corollary to work factor is the rule that a system need
only be strong enough to protect the information for however
long it has value. If a system can be broken in a week, but
not sooner, then it may be good enough, if the information has
no value to an opponent after a week.


Cryptanalysis
-------------

Cryptanalysis is the science of deriving plaintext
without the key information. Anyone intending to design an
encryption system must acquaint himself to some degree with
cryptanalytic methods. The methods of attack may range from
sophisticated statistical analysis of ciphertext to breaking
into the opponent's office and stealing his keys ("practical
cryptanalysis"). There are no rules of fair play. The
cryptanalyst is free to use his puzzle-solving ingenuity
to the utmost, even to the point of applying the knowledge that
your dog's name is "Pascal", and that you might be lazy enough
to use that as your key for the day.

The cryptanalyst may have only ciphertext to work with,
or he may have both ciphertext and the corresponding plaintext,
or he may be able to obtain the encipherment of chosen
plaintext. Some cryptographic systems are fairly strong if the
analyst is limited to ciphertext, but fail completely if he has
corresponding plaintext. Your system should be strong enough to
resist attack even if your opponent has both plaintext and
ciphertext.

Computer power can greatly aid cryptanalysis, but
many systems that appear strong can be broken with
pencil-and-paper methods. For example, the Vigenere family of
polyalphabetic ciphers was generally believed to be unbreakable
up until the late nineteenth century. A polyalphabetic cipher is
a substitution cipher in which the substitution alphabet changes
from one character of plaintext to the next. In these systems, the key
determines the order of the substitution alphabets, and the
cycle repeats with a period equal to the length of the key. This
periodicity is a fatal weakness, since fairly often a repeated
letter or word of plaintext will be enciphered with the same key
letters, giving identical blocks of ciphertext. This exposes
the length of the key. Once we have the length of the key, we
use the known letter frequencies of the language to gradually
build and test hypotheses about the key. Vigenere ciphers can
be easily implemented on computers, but they are worthless
today. A designer without knowledge of cryptanalysis, however,
might be just as ignorant of this fact as his colleagues of
the last century. Please see the references at the end of
this article for information on cryptanalytic technique.

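The periodicity weakness is easy to demonstrate. The following Python sketch
enciphers a short message with a repeating key and lists the distances
between repeated three-letter groups of ciphertext. The key, the message,
and the function names are all invented for the illustration; the message
was chosen so that a phrase recurs at a multiple of the key length.

```python
def vigenere_encipher(plaintext: str, key: str) -> str:
    """Encipher uppercase A-Z text with a repeating-key Vigenere cipher."""
    out = []
    for i, ch in enumerate(plaintext):
        shift = ord(key[i % len(key)]) - ord("A")
        out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)


def repeated_trigram_distances(text: str) -> list:
    """Distances between successive occurrences of each repeated trigram."""
    last_seen = {}
    distances = []
    for i in range(len(text) - 2):
        gram = text[i:i + 3]
        if gram in last_seen:
            distances.append(i - last_seen[gram])
        last_seen[gram] = i
    return distances


key = "PASCAL"                          # hypothetical six-letter key
plain = "ATTACKATDAWNATTACKATDUSK"      # "ATTACKATD" recurs 12 letters later
cipher = vigenere_encipher(plain, key)

print(cipher)
print(repeated_trigram_distances(cipher))
```

Every distance printed is a multiple of the key length, which is exactly the
clue the analyst needs before turning to letter frequencies.
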
A Survey of Cryptographic Systems
---------------------------------

We now review some representative encryption schemes,
starting with traditional ones and proceeding to the systems
which are only feasible to implement on computers.

The infinite-key cipher, also known as the "one time
pad," is simple in concept. We first generate a key which
is random and at least the same length as our message. Then,
for each character of plaintext, we add the corresponding
character of the key, to give the ciphertext. By "addition," we
mean some reversible operation; the usual choice is the
exclusive-or. A little reflection will show that given a random
key at least the size of the plaintext (i.e., "infinite" with
respect to the plaintext because it is never repeated), then the
resulting cipher is unbreakable, even in principle. This scheme
is in use today for the most secret government communications,
but it presents a serious practical problem with its requirement
for a long random key for each message and the need to somehow
send the lengthy key to the recipient. Thus the ideal infinite
key system is not practical for large volumes of message
traffic. It is certainly not practical for file encryption on
computers, since where would the key be stored? Be wary of
schemes which use software random-number generators to supply
the "infinite" key. Typical random-number algorithms use the
preceding random number to generate the succeeding number, and
can thus be solved if only one number in the sequence is found.

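As a minimal sketch of the infinite-key idea, the Python fragment below
exclusive-ors a message with a random pad of the same length. It assumes the
operating system's random source (reached through the standard `secrets`
module) is good enough to stand in for a truly random key; the function name
and message are illustrative only.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Exclusive-or each byte of data with the corresponding key byte."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"ATTACK AT DAWN"
pad = secrets.token_bytes(len(message))   # used once, then destroyed

ciphertext = xor_bytes(message, pad)
recovered = xor_bytes(ciphertext, pad)    # the same operation deciphers

print(ciphertext.hex())
print(recovered)
```

The sketch does nothing about the hard part described above: getting the
pad to the recipient and never using it twice.
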
Some ciphers have been built to approximate the
infinite-key system by expanding a short key. The Vernam system
for telegraph transmission used long paper tapes containing
random binary digits (Baudot code, actually) which were
exclusive-or'ed with the message digits. To achieve a long
key stream, Vernam and others used two or more key tapes of
relatively prime lengths, giving a composite key whose length
equals the product of the tape lengths. The system is still not
ideal, since eventually the
key stream will repeat, allowing the analyst to derive the
length and composition of the keys, given enough ciphertext.
There are other ways to approach the infinite-key ideal, some of
which are suggested in the author's article (with Joan
Thersites) in the August '84 issue of DDJ.

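The two-tape trick can be sketched in a few lines of Python. The tape
lengths (7 and 11) and the names below are arbitrary choices for the
illustration; real tapes were of course far longer.

```python
import secrets
from math import gcd


def combined_keystream(tape_a: bytes, tape_b: bytes, length: int) -> bytes:
    """XOR two endlessly looping key tapes to form one longer key stream."""
    return bytes(tape_a[i % len(tape_a)] ^ tape_b[i % len(tape_b)]
                 for i in range(length))


tape_a = secrets.token_bytes(7)    # hypothetical tape lengths,
tape_b = secrets.token_bytes(11)   # chosen to be relatively prime

# The combined stream repeats after lcm(7, 11) = 77 positions.
period = len(tape_a) * len(tape_b) // gcd(len(tape_a), len(tape_b))
stream = combined_keystream(tape_a, tape_b, 2 * period)

print(period)
print(stream[:period] == stream[period:])
```

Eighteen bytes of tape yield a 77-byte key stream, but the second comparison
shows the catch: the stream then repeats exactly.
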
The "rotor" systems take their name from the
electromechanical devices of World War II, the best known being
perhaps the German ENIGMA. The rotors are wheels with
characters inscribed on their edges, and with electrical
contacts corresponding to the letters on both sides. A
plaintext letter enters on one side of the rotor and is mapped
to a different letter on the other side before passing to
the next rotor, and so on. All of the rotors (and there may be
few or many) are then stepped, so that the next substitution is
different. The key is the arrangement and initial setting of
the rotor disks. These devices are easy to implement in software
and are fairly strong. They can be broken, however; the British
solution of the ENIGMA is an interesting story outside the
scope of this note. If you implement a rotor system, consider
having it operate on bits or nybbles instead of bytes, consider
adding permutation stages, and consider how you are going to
generate the rotor tables, since you must assume these will
become known to an opponent.

In 1977 the National Bureau of Standards promulgated the
Data Encryption Standard (DES) as the encryption system to be
used by all federal agencies (except for those enciphering data
classified under any of the national security acts). The
standard is available in a government publication and also in a
number of books. The DES was intended to be implemented only in
hardware, probably because its designers did not want users to
make changes to its internal tables. However, DES has been
implemented in software and is available in several
microcomputer products (such as Borland's Superkey or IBM's
Data Encoder).

The DES is a product cipher using 16 stages of
permutation and substitution on blocks of 64 bits each. The
permutation tables are fixed, and the substitutions are
determined by bits from a 56-bit key and the message block.
This short key has caused some experts to question the security
of DES. Controversy also exists regarding the involvement of the
NSA in parts of the DES design. The issues are interesting, but
beyond the scope of this note.

Since DES was intended for hardware implementation, it
is relatively slow in software. Software implementations of DES
are challenging because of the bit-manipulation required in the
key scheduling and permutation routines of the algorithm. Some
implementations gain speed at the expense of code size by using
large pre-computed tables.

The public key cipher is an interesting new development
which shows potential for making other encryption systems
obsolete. It takes its name from the fact that the key
information is divided into two parts, one of which can be made
public. A person with the public key can encipher messages, but
only one with the private key can decipher them. All of the
public key systems rely on the existence of certain functions
for which the inverse is very difficult to compute without the
information in the private key. These schemes do not appear to
be practical for microcomputers if their strength is fully
exploited, at least for eight-bit machines. One variety of
public key system (the "knapsack") has been broken by solution
of its enciphering function, but this is no reflection on other
systems, such as the RSA scheme, which use different enciphering
functions. All public-key systems proposed to date require
heavy computation, such as the exponentiation and division of
very large numbers (200 decimal digits for the RSA scheme). On
the other hand, a public-key system that worked at only 10
bytes/sec might be useful if all we are sending are the keys for
some other system, such as the DES.

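To show the shape of the RSA computation, here is a toy Python sketch with
tiny textbook primes. The numbers are far too small to be secure (the text
above speaks of 200 decimal digits), the function names are my own, and
`pow(e, -1, phi)` for the modular inverse needs Python 3.8 or later.

```python
# Toy parameters only; real RSA primes are hundreds of digits long.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, relatively prime to phi
d = pow(e, -1, phi)          # private exponent: inverse of e mod phi


def encipher(m: int, public_key: tuple) -> int:
    """Raise the message to the public exponent, modulo n."""
    exponent, modulus = public_key
    return pow(m, exponent, modulus)


def decipher(c: int, private_key: tuple) -> int:
    """Raise the ciphertext to the private exponent, modulo n."""
    exponent, modulus = private_key
    return pow(c, exponent, modulus)


message = 65                 # must be smaller than n
ciphertext = encipher(message, (e, n))
recovered = decipher(ciphertext, (d, n))

print(n, d, ciphertext, recovered)
```

Anyone may know (e, n); recovering d from them requires factoring n, which
is trivial here and believed impractical at full size.
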

Some random thoughts
--------------------

To wrap up this too-lengthy exposition, I append a few
questions for the readers:

Must we operate on blocks instead of bytes? Block
ciphers seem stronger, since they allow for permutation. On the
other hand, they make life difficult when file size is not
an integral multiple of the block size.

Can we make a file encryption system OS-independent?
This is related to the question above on blocks vs bits. How do
we define the end-of-file if the plaintext is ASCII and the
ciphertext can be any 8-bit value?

Can we find an efficient way to generate and store a
random key for the infinite-key system? Hardware random-number
generators are not hard to build, but would they be of any use?

Bit-fiddling is expensive. Can it be avoided and still
leave a secure system? What are the relative costs of
manipulating bits on the Z80 vs the 68000, for example?

No file-encryption system can erase a file logically and
be considered secure. The information can be recovered until it
is overwritten. Overwriting files adds to processing time. I
am informed that it is possible to reliably extract information
even from sectors that HAVE been overwritten. Is this so?
If it is, what is the solution?

How do we integrate encryption systems into different
tools? Should a telecommunications program transparently
encrypt data if the correspondent is compatible? What about an
editor-encryption system wherein plaintext would never exist on
the disk, only on the screen? How would we manage to
encipher/decipher text as we scroll through it and make changes,
and still get acceptable performance?

By their nature, encryption schemes are difficult to
test. In practice, we can only have confidence that a system is
strong after it has been subjected to repeated attack and
remained unbroken. What test might we subject a system to that
would increase our confidence in it?


References
----------

Here are a few useful books and articles. This is by no means
a complete bibliography of the subject:

Kahn, David. "The Code Breakers". The basic reference for the
    history of cryptography and cryptanalysis. Use it to
    learn where others have gone wrong.

Konheim, Alan G. "Cryptography, A Primer". Survey of
    cryptographic systems from a mathematical perspective.
    Discusses rotor systems and the DES in great detail.

Sinkov, Abraham. "Elementary Cryptanalysis". Very basic, but
    very useful, introduction to the mathematical concepts
    of cryptanalysis.

Foster, Caxton C. "Cryptanalysis for Microcomputers". Covers
    the cryptanalysis of simple systems, but still a good
    introduction to cryptanalytic technique. Describes the
    operation of many traditional systems in detail.

Shannon, Claude. "Communication Theory of Secrecy Systems".
    Bell System Technical Journal (October 1949) : 656-715.
    Discusses secrecy systems from viewpoint of information
    theory. No practical tips, but useful orientation.

Rivest, R. et al. "A Method for Obtaining Digital Signatures and
    Public Key Cryptosystems". Comm. of the ACM, Vol. 21,
    No. 2, (February 1978) : 120-126. This article
    describes what has come to be known as the RSA
    public-key system.

"Data Encryption Standard", Federal Information Processing
    Standard (FIPS), Publication No. 46, National Bureau of
    Standards, U.S. Dept. of Commerce, January, 1977.

---

To start off, I'll discuss some *good* points of the Data Encryption
Standard (DES) promulgated by the federal government. We all know the key is
too small - the NSA almost certainly has the means to exhaust it. But it
would be a pity not to examine the DES just for that reason. It uses a
brilliant cryptographic technique that, once understood, can be used by us in
families of other crypto-systems, with much larger keys.

The DES shows us how to use one-way functions in cryptography. At first
sight, a one-way function would seem to be useless - if we encrypt 'A' and get
'R', we have to be able to decrypt 'R' and get back 'A'. However, a one-way
function, if it could be used, allows very complex transformations of text
that are practically impossible to undo without knowledge of the key.
However, the DES is as complicated as it is complex, so for a beginning I'm
going to explain how to use a one-way function cryptographically without
reference to the DES. If there's enough interest, we can later relate this to
the DES.

Perhaps the simplest way to define a one-way function is with a table, such
as:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
-----------------------------------------------
14 05 01 14 10 04 08 00 03 02 15 08 09 11 07 15

The top numbers are indexes into the one-way table. Given an index, you can
get a value by table look up, but given a table value there's no guarantee
you'll get the index you started with because both 0 and 3 map to 14, 6 and 11
map to 8, and so on. BTW, the table values were generated by a random
process.

Now, let's use this cryptographically. Take an ASCII letter, say 'a', and
split it into two nibbles, left and right:

         LEFT  RIGHT
61h  ->    6     1

The DES trick is to use one half as an argument of a one-way function to
obtain a value with which to encrypt the other half, so let's do it. Using
RIGHT as the index into the table, we obtain the value 5. Now, we need a
primitive encryption function that is, and must be, invertible. The DES uses
XOR, but we will use ADD, and add 5 to LEFT, mod 16, obtaining 11. We have

         LEFT  RIGHT
          11     1

This encrypts LEFT. Notice that even though we used a one-way function, the
encryption is completely reversible for two reasons. First, RIGHT, the
argument of the table lookup, is unchanged, so getting the correct value from
the table for decryption is assured. Second, the encryption primitive is
invertible; we need only to subtract the table value mod 16 from encrypted
LEFT.

But there seems to be a problem. RIGHT isn't encrypted, and if that's how we
left matters, we wouldn't have much of a cipher. The answer suggests itself -
use the new value of LEFT in the same way to encrypt RIGHT. Let's do it.
Using 11 as an index into the table gives us 8, which we add to RIGHT giving

         LEFT  RIGHT
          11     9

which (B9h) on the IBM PC is a funny graphics character. Now, is this
reversible? Of course it is: the argument into the table that encrypted RIGHT
is completely unchanged, and if used we get 8 which we subtract from 9 giving 1.
And we have already shown that 11 1 will undo properly to give us 6 1.

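The steps just described can be sketched in Python as follows. The table is
the one given above; the function names are my own, and the code is only an
illustration of the left/right trick, not a usable cipher.

```python
# The 16-entry one-way table from the text (several values repeat).
TABLE = [14, 5, 1, 14, 10, 4, 8, 0, 3, 2, 15, 8, 9, 11, 7, 15]


def encrypt_byte(byte: int, table=TABLE) -> int:
    """Encrypt LEFT using RIGHT, then RIGHT using the new LEFT."""
    left, right = byte >> 4, byte & 0x0F
    left = (left + table[right]) % 16
    right = (right + table[left]) % 16
    return (left << 4) | right


def decrypt_byte(byte: int, table=TABLE) -> int:
    """Undo the two steps in the opposite order, subtracting mod 16."""
    left, right = byte >> 4, byte & 0x0F
    right = (right - table[left]) % 16
    left = (left - table[right]) % 16
    return (left << 4) | right


cipher = encrypt_byte(ord("a"))
print(f"{ord('a'):02X}h -> {cipher:02X}h -> {decrypt_byte(cipher):02X}h")
```

Decryption never inverts the table; it only looks up the same entries again
and subtracts, which is why a many-to-one table does no harm.
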
So far, we have nothing but a curious simple substitution cipher. Repeating
the process isn't encouraging either. It clearly must cycle if we continue
to alternately encrypt LEFT and RIGHT, using the values from the previous
encryption as input. In fact, there are a number of cycles of different
lengths, but it's interesting that some cycles don't cycle back to the
starting value. For example, starting with 01, we get 01, 55, 99, BB, 33,
EE, 55... The reason is that our table is not a permutation of the integers
0 through 15.

To make our sample cipher more than a curiosity, we shall do what the DES
does, use another one-way function (that is, a table) in a second *round*, and
repeat this process with the new table.

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
--------------------------------------------------
04 13 12 06 13 07 03 15 13 15 06 09 09 09 07 10 07

LEFT  RIGHT
 11     9    ->  table[9]  = 15;  LEFT  = (11 + 15) mod 16 = 10
 10     9    ->  table[10] =  6;  RIGHT = ( 9 +  6) mod 16 = 15
 10    15

This is already a considerably more complex cipher. It is still reversible,
but we must remember to *decrypt* it starting with the last table, not the
first. This cipher, like the DES, is sensitive to direction, encrypt or
decrypt.

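A two-round version can be sketched like this in Python, using the two
tables given in the text. The helper names are hypothetical; the point of
the sketch is only that decryption must walk the tables in reverse order.

```python
TABLE1 = [14, 5, 1, 14, 10, 4, 8, 0, 3, 2, 15, 8, 9, 11, 7, 15]
TABLE2 = [4, 13, 12, 6, 13, 7, 3, 15, 13, 15, 6, 9, 9, 9, 7, 10, 7]


def round_forward(left: int, right: int, table) -> tuple:
    """One round: encrypt LEFT from RIGHT, then RIGHT from the new LEFT."""
    left = (left + table[right]) % 16
    right = (right + table[left]) % 16
    return left, right


def round_backward(left: int, right: int, table) -> tuple:
    """Undo one round by subtracting in the opposite order."""
    right = (right - table[left]) % 16
    left = (left - table[right]) % 16
    return left, right


def encrypt_byte(byte: int, tables=(TABLE1, TABLE2)) -> int:
    left, right = byte >> 4, byte & 0x0F
    for table in tables:
        left, right = round_forward(left, right, table)
    return (left << 4) | right


def decrypt_byte(byte: int, tables=(TABLE1, TABLE2)) -> int:
    left, right = byte >> 4, byte & 0x0F
    for table in reversed(tables):          # last table first
        left, right = round_backward(left, right, table)
    return (left << 4) | right


cipher = encrypt_byte(ord("a"))
print(f"{cipher:02X}h", chr(decrypt_byte(cipher)))
```

Passing the tables to `decrypt_byte` in the wrong order (so that the first
table is undone first) generally fails to recover the original byte.
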
It still is a simple substitution cipher with no strength. You will notice,
however, that the length of the second table is one more than that of the
first. Obviously, I intend to rotate both tables one position before
encrypting the next letter of a message. Since the lengths are relatively
prime, I can encipher 16x17=272 letters before my tables repeat. This is no
longer a simple substitution cipher, but one far more complex. Still not good
enough by far. But if I had only one message, not too long, to protect, this
would be practically unbreakable. It might be good for hundreds of letters
before repetitions would enable cryptanalysis.

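Here is a hedged Python sketch of the rotating-table idea. The text does not
say which way the tables rotate, so the sketch assumes a rotation of one
position to the left after every letter; the function names are my own.

```python
from collections import deque

TABLE1 = [14, 5, 1, 14, 10, 4, 8, 0, 3, 2, 15, 8, 9, 11, 7, 15]
TABLE2 = [4, 13, 12, 6, 13, 7, 3, 15, 13, 15, 6, 9, 9, 9, 7, 10, 7]


def crypt(data: bytes, decrypt: bool = False) -> bytes:
    """Two-round byte cipher; both tables rotate left after each byte."""
    t1, t2 = deque(TABLE1), deque(TABLE2)
    out = bytearray()
    for byte in data:
        left, right = byte >> 4, byte & 0x0F
        if decrypt:
            for table in (t2, t1):              # last table first
                right = (right - table[left]) % 16
                left = (left - table[right]) % 16
        else:
            for table in (t1, t2):
                left = (left + table[right]) % 16
                right = (right + table[left]) % 16
        out.append((left << 4) | right)
        t1.rotate(-1)                           # assumed direction: left
        t2.rotate(-1)
    return bytes(out)


def table_period() -> int:
    """Count rotations until both tables return to their starting state."""
    t1, t2 = deque(TABLE1), deque(TABLE2)
    steps = 0
    while True:
        t1.rotate(-1)
        t2.rotate(-1)
        steps += 1
        if list(t1) == TABLE1 and list(t2) == TABLE2:
            return steps


print(table_period())
print(crypt(b"aaaa").hex())
print(crypt(crypt(b"attack at dawn"), decrypt=True))
```

The same plaintext letter now enciphers differently at each position, and
the table state does not recur until 272 letters have gone by.
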
Of course, it isn't strong enough to protect against an attack based on known
plaintext. With enough known plaintext, both tables can be reconstructed. If
the DES used only two rounds, it too could be cryptanalyzed. It can be broken
because, with only two rounds, there are so few possible addends consistent
with a known input and output that they can all be tried. Several more rounds
are necessary to make exhaustion of possible addends impractical. It is quite
important in designing a crypto-system that you design it against *known*
plaintext. In practice, it is impossible to prevent known plaintext from
falling into the cryptanalyst's hands.

Later, I will develop this sample into a very tough cipher indeed, though I
won't make any claims for its ultimate strength (I haven't examined it that
well yet). But for now, let's sum up what we have done.

1. We used a one-way function for a crypto-system. To give credit where it is
due, this technique was invented by Horst Feistel of IBM, originally for the
Lucifer cipher. It is in my opinion an absolutely brilliant cryptographic
innovation. The technique is fundamental to the DES.

2. We cascaded one-way functions for complexity. The DES uses cascading
similarly for strength.

3. Unlike the DES, we have developed the germ of a byte cipher, rather than a
block cipher. Very great complexity may be had with a block cipher, but there
is still doubt that block ciphers are 'better'.

It is the trick of alternately enciphering 'left' and 'right' that permits the
use of a one-way function. In practice, it is easier to swap or exchange
right and left after each round. That is, we use RIGHT to encrypt LEFT, then
exchange RIGHT and LEFT, and repeat encryption for the next round. This
simplifies code and circuitry, though it may confuse us as we try to follow
what happens in the DES. Therefore, to avoid confusion, we will always keep
RIGHT right, and LEFT left.

---

To begin, we shall consider the DES encipherment a little abstractly, then
later in more detail.

The heart of the DES one-way function consists of the eight S-boxes, each
of which transforms six bits of the input to four bits of output. We
can understand the function of the S-boxes better if we first consider
the transformation they effect as they act in concert upon the entire
input block of 48 bits. Imagine one table consisting of 2^48 numbers of 32
bits each. Each 32-bit number is repeated 65,536 times, in a more or less
random order. Also imagine a 'letter', not of one byte, but of 64 bits divided
into a LEFT of 32 bits and a RIGHT of 32 bits. RIGHT is combined with 48 key
bits selected out of 56 in a way that will be described later, without,
however, changing RIGHT, and this combined value is used as an argument into
the huge table to return a pseudo-random 32-bit value. This returned value is
XORed with LEFT to encrypt LEFT. The same table lookup is repeated in the next
round using LEFT as the argument and encrypting RIGHT, except that a different
arrangement of 48 key bits out of the 56 is used. The DES repeats this process
16 times, that is, encrypts RIGHT eight times, and LEFT eight times. There is
a reason for eight that we will discuss later. Each iteration is called a
'round'.

It is clear that this huge table is a one-way function, and that the
encryption technique is almost exactly what we described for the byte cipher
discussed in the previous upload. There is an important difference - we are
now using a key in addition to a table. In our byte cipher, the key is the
table itself. Also, the DES uses a different permutation of the original key
for every round in a clever way that ensures that every bit of final
ciphertext depends complexly on every bit of the key. This is important.
A sure-fire cryptanalytic technique is to suppose a few key bits - not
too many possibilities to exhaust - then to test the supposition against
sample ciphertext compared with known plaintext. In this way, although a key
space may be too large to exhaust by brute force, the key is gradually
recovered. A good example is the simple substitution where the key space is
26! (about 2^88). But, by forcing all ciphertext bits to depend on all key
bits, this sure-fire attack is inhibited if not made impossible. You can't
solve for a few key bits, you have to solve for all or none.

Why a key? The chief reason is that the one-way table is published. It is no
secret as we suppose our byte cipher's tables are. And, to be a standard,
we aren't supposed to change the DES tables. Further, the tables are not
random; the inventors state that the DES's particular tables were worked
out for cryptographic strength, and that they discovered, to their own
surprise, that random tables, which they had naively believed to be
sufficient, aren't so good. And, nobody will say what criterion should be
used to design DES tables, and nobody has figured them out (at least, not
publicly). Naturally, this has bred suspicion, and the fact that the NSA
helped design the tables hasn't helped. Later, I would like to offer my own
speculation on what this criterion might be. To summarize, the key serves as
the secret ingredient because the tables can't be secret.

Obviously, a table of this size is a practical impossibility. So, how does
the DES achieve a 'virtual' table of this size? Basically, nibble by nibble.
It uses the nibbles of RIGHT, in order, as indexes into relatively small
matrixes to get eight encryption values per round. But the encryption values
don't encrypt the corresponding nibbles of LEFT. Oh no, that would be fatally
simple. Each result nibble of the one-way lookup encrypts scattered bits of
LEFT so that each bit of the value impacts the most nibbles possible in LEFT.
Now, when LEFT becomes the argument, the nibble values returned from the table
have dependencies on many bits of RIGHT. Within five rounds, all ciphertext
bits become complex dependents of all plaintext bits and all key bits. The
dependency is extremely violent. Believe me, a single bit of difference in
either plaintext or key gives you an unrecognizably different ciphertext.

We should be ready now to see how a nibble of RIGHT (actually, three will be
involved), and some key bits, are used as table arguments, and how this
encrypts four bits of LEFT in a single round. Let's ignore how the different
permutations of the key bits are generated for now. Just imagine there are
somehow six key bits. The first nibble one-way function is this matrix:

                      Box S1
     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
   |-----------------------------------------------
 0 |14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
 1 | 0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
 2 | 4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
 3 |15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13

Recall that the input block of 32 bits is first expanded to 48 bits by
|
||
the "selection operation". If you consult this table in the standard,
|
||
you will see that the first six bits of the result are the first RIGHT
|
||
nibble, plus the last bit of RIGHT, plus the first bit of the next nibble
|
||
to form six bits:
|
||
|
||
32 1 2 3 4 5
|
||
|
||
This is one argument into the one-way function. Also, six bits of the key are
|
||
an argument, and the six key bits and these RIGHT bits are XORed to form the
|
||
actual table argument. The first and last bits of this XOR-result index the row
|
||
of the S1 box. The middle four bits index the column. The intersection of row
|
||
and column gives us a four-bit value, which, after passing through the
|
||
permutation operation P, is used to encrypt LEFT bits 9 17 23 and 31.
|
||
|
||
This is surely one-way. You can't determine the row and column indexes from
|
||
the table value because there are four row and column indexes that map to the
|
||
same value. It should also be clear that LEFT bits 9 17 23 31 are decryptable
|
||
because RIGHT was never changed. We have only to combine the appropriate key
|
||
bits with RIGHT's 32 1 2 3 4 5, and we'll get the same value out of the S box,
|
||
which, XORed with LEFT, will yield our original 9 17 23 and 31.
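
The row-and-column lookup, and the two properties just claimed for it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names s1_lookup and preimages are mine, the first bit of the six-bit argument is taken as the most significant, the sample bit patterns are arbitrary, and the four LEFT bits are gathered into one nibble for convenience (in the real cipher P scatters them).

```python
# S1 as printed above: 4 rows of 16 nibbles.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def s1_lookup(six_bits):
    """Row = first and last bit, column = middle four bits."""
    row = (((six_bits >> 5) & 1) << 1) | (six_bits & 1)
    col = (six_bits >> 1) & 0xF
    return S1[row][col]

def preimages(value):
    """All six-bit arguments that map to the given nibble."""
    return [x for x in range(64) if s1_lookup(x) == value]

# Arbitrary sample data (assumptions, not taken from the standard).
right_bits = 0b101100    # RIGHT bits 32 1 2 3 4 5
key_bits = 0b011010      # six key bits for this box
left_nibble = 0b1001     # LEFT bits 9 17 23 31, gathered into one nibble

value = s1_lookup(right_bits ^ key_bits)
cipher_nibble = left_nibble ^ value     # encrypt
recovered = cipher_nibble ^ value       # decrypt: RIGHT and key are unchanged
```

Calling preimages on any nibble returns four arguments, one per row, which is why the table value alone cannot be worked backwards to its index.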
|
||
|
||
Notice the scatter. By encrypting these particular bits, we are encrypting
|
||
the 3rd, 5th, 6th, and 8th nibbles which will be immediately used in the next
|
||
round as arguments. Because of their positions, they will form part of the
|
||
2nd, 3rd, 4th, 5th, 6th, and 8th nibble arguments of the future LEFT. And
|
||
thus, when RIGHT gets encrypted, its old first nibble will be quite
|
||
spread out. The scatter makes far more bits dependent on very few bits than
|
||
is at first apparent. The only unaffected nibbles are the 1st and 7th, but
|
||
this is only one round, and we tracked only one nibble of RIGHT.
|
||
|
||
This is getting lengthy, so let me list the RIGHT bit arguments corresponding
|
||
to the eight S boxes and the LEFT bits that get encrypted
|
||
|
||
RIGHT bits box LEFT bits
|
||
32 1 2 3 4 5 ----> S1 ----> 9 17 23 31
|
||
4 5 6 7 8 9 ----> S2 ----> 13 28 2 18
|
||
8 9 10 11 12 13 ----> S3 ----> 24 16 30 6
|
||
12 13 14 15 16 17 ----> S4 ----> 26 20 10 1
|
||
16 17 18 19 20 21 ----> S5 ----> 8 14 25 3
|
||
20 21 22 23 24 25 ----> S6 ----> 4 29 11 19
|
||
24 25 26 27 28 29 ----> S7 ----> 32 12 22 7
|
||
28 29 30 31 32 1 ----> S8 ----> 5 27 15 21
|
||
|
||
In this manner, a nibble by nibble encryption is spread out so that a virtual
|
||
table of 2^48 elements is achieved. Note that we have not really considered
|
||
the key yet. And note that I have shown the contents of only one box. The
|
||
boxes are listed in FIPS Pub 46, which you should use if you ever implement
|
||
the DES, because other sources usually have typos, the slightest of which is
|
||
fatal.
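
The table above need not be taken on faith; it falls out of the E and P tables. The sketch below regenerates it. The P list is typed in from FIPS 46 as I understand it (verify it against the standard before relying on it), and the function names right_bits and left_bits are my own.

```python
# Permutation P as listed in FIPS 46: output bit i is input bit P[i-1].
P = [16, 7, 20, 21, 29, 12, 28, 17, 1, 15, 23, 26, 5, 18, 31, 10,
     2, 8, 24, 14, 32, 27, 3, 9, 19, 13, 30, 6, 22, 11, 4, 25]

def right_bits(box):
    """The six RIGHT bits (numbered from 1) that feed S-box `box` (1..8)."""
    start = 4 * (box - 1)
    return [(start + j - 1) % 32 + 1 for j in range(6)]

def left_bits(box):
    """The LEFT positions where P delivers the four output bits of `box`."""
    outputs = range(4 * (box - 1) + 1, 4 * box + 1)
    return [P.index(bit) + 1 for bit in outputs]

for box in range(1, 9):
    print(right_bits(box), "----> S%d ---->" % box, left_bits(box))
```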
|
||
|
||
Also, pay attention to how RIGHT's bits are expanded to 48. The last bit of
|
||
the previous nibble plus the first bit of the next are tacked onto each nibble
|
||
of the argument fore and aft. This builds an inter-nibble dependency into the
|
||
DES. Even more important, one encrypting bit from the table can do a lot of
|
||
'damage'. Look at the nibble value coming out of the second S-box; its first
|
||
bit will encrypt the 13th bit of LEFT. But look where the 13th bit occurs in
|
||
expanded RIGHT! The result of this encryption is that when LEFT becomes the
|
||
table argument, the third and fourth table nibbles are dependent on just one
|
||
bit. As a matter of fact, the value out of any S-box encrypts exactly two
|
||
nibble boundary bits and two nibble inner bits.
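
Both observations (one bit feeding two table arguments, and the two-boundary, two-inner split) are easy to check by machine. A small sketch, using the LEFT bit list from the table above; the helper names are my own invention.

```python
# LEFT bits encrypted by each S-box, copied from the table in the text.
LEFT_BITS = {
    1: [9, 17, 23, 31], 2: [13, 28, 2, 18], 3: [24, 16, 30, 6],
    4: [26, 20, 10, 1], 5: [8, 14, 25, 3], 6: [4, 29, 11, 19],
    7: [32, 12, 22, 7], 8: [5, 27, 15, 21],
}

def arguments_containing(bit):
    """Which of the eight six-bit table arguments include this bit."""
    found = []
    for box in range(1, 9):
        start = 4 * (box - 1)
        if bit in [(start + j - 1) % 32 + 1 for j in range(6)]:
            found.append(box)
    return found

def is_boundary(bit):
    """Boundary bits are the first and last bit of each nibble."""
    return bit % 4 in (0, 1)

# Arguments touched next round by the four bits that S1 encrypts.
spread = sorted({arg for bit in LEFT_BITS[1] for arg in arguments_containing(bit)})

# Number of boundary bits among the four bits each box encrypts.
boundary_counts = [sum(is_boundary(b) for b in LEFT_BITS[box]) for box in range(1, 9)]
```

Here arguments_containing(13) names the third and fourth arguments, and boundary_counts comes out as two for every box.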
|
||
|
||
We saw how the DES uses one-way functions, each specific to a nibble, and how
|
||
the results of the one-way functions are carefully scattered. The purpose of
|
||
the scattering is to build up complex dependencies for the final result on all
|
||
bits of message and key as fast as possible. The scatter is achieved by a
|
||
permutation, called P, that the standard describes as occurring after
|
||
gathering the eight table values. For software implementation, there is a
|
||
great savings in speed by replacing all S-box values with 32-bit entries
|
||
pre-permuted by the inverse of P - that is, by the encrypting bit positions
|
||
we already listed, and doing away with P entirely.
|
||
|
||
To illustrate, the first value of the first S-box is 14, in binary, 1 1 1 0.
|
||
Therefore, to do away with P, we replace this value with its equivalent
|
||
|
||
                  1                   2                   3
1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
---------------------------------------------------------------
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
                1               1           1               0
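
A sketch of how such a pre-permuted entry might be computed. The P list is typed in from FIPS 46 as I understand it (check it against the standard), bit 1 is treated as the leftmost bit of a 32-bit word, and the function name prepermuted is mine.

```python
P = [16, 7, 20, 21, 29, 12, 28, 17, 1, 15, 23, 26, 5, 18, 31, 10,
     2, 8, 24, 14, 32, 27, 3, 9, 19, 13, 30, 6, 22, 11, 4, 25]

def prepermuted(box, value):
    """32-bit word holding nibble `value` of S-box `box` (1..8), with its
    bits already moved to the places P would send them."""
    word = 0
    for k in range(4):                      # k-th output bit of this box
        if (value >> (3 - k)) & 1:
            before_p = 4 * (box - 1) + k + 1
            after_p = P.index(before_p) + 1
            word |= 1 << (32 - after_p)     # bit 1 is the leftmost bit
    return word

entry = prepermuted(1, 14)
print(format(entry, "032b"))
```

With all eight tables built this way, a round reduces to eight lookups whose results are ORed together and combined with LEFT; P never runs at encryption time.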
|
||
|
||
The gain in speed can't be overemphasized. There are more implementation
|
||
tricks like this that hopefully we can get into later. On an 8088, at 4.77
|
||
MHz, encryption speeds of 2500 bytes/sec are possible in pure software. On a
|
||
68000, we think that software speeds of 6000, or better, bytes/sec are
|
||
achievable. Perhaps even as high as 8000. The important lesson is that you
|
||
don't have to code in the dumb plodding way the standard implies. It seems
|
||
that the standard was written to be as uninstructive as possible.
|
||
|
||
We should now discuss the key. We won't go into a bit-by-bit description; for
|
||
that, see FIPS 46. The main idea is to generate 16 48-bit keys out of the
|
||
original 56 that
|
||
|
||
1. are simply generated
|
||
2. vary one from the other as much as possible
|
||
3. fit the 'scatter' to involve all key bits in all cipher bits
|
||
|
||
For this, two (actually, three) permutations are used, called PC1 and PC2 in
|
||
the standard, awkward names, but we'll use them. This 'key scheduling' is not
|
||
perfect; it permits weak keys (keys that are their own inverse) and semi-weak
|
||
keys (keys that result in alternating patterns so that all encryptions of LEFT
|
||
are as if encrypted by an invariant key, and a different invariant key for
|
||
RIGHT). The key scheduling just isn't good enough to avoid these weaknesses.
|
||
|
||
PC1 is a geometric permutation that doesn't seem to have deep reason. It
|
||
discards the so-called parity bits, thus reducing 64 bits to 56, and divides
|
||
the key bits into two halves. The halving *is* important. This is a picture
|
||
of the PC1 transformation of the original key bits with their 'parity' bits
|
||
|
||
       1   2   3   4   5   6   7        8
       9  10  11  12  13  14  15       16
      17  18  19  20  21  22  23       24
      25  26  27  28  29  30  31       32
      33  34  35  36  37  38  39       40
   ^  41  42  43  44  45  46  47   ^   48
   |  49  50  51  52  53  54  55   |   56
   |  57  58  59  60  61  62  63   |   64
 C half                      D half   discarded
|
||
|
||
C = 57 49 .. 44 36
|
||
D = 63 55 .. 28 20 12 4
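
The geometry can be expressed directly: read the square column by column, bottom to top. The sketch below does that; the function name pc1_halves is mine, and the column order is my reading of the PC1 table in FIPS 46.

```python
def pc1_halves():
    """Return (C, D) as lists of original key bit numbers (1..64)."""
    def column(col, rows):
        return [8 * (row - 1) + col for row in rows]

    bottom_up = range(8, 0, -1)
    # C: columns 1, 2, 3 bottom to top, then the lower half of column 4.
    c_half = (column(1, bottom_up) + column(2, bottom_up) +
              column(3, bottom_up) + column(4, range(8, 4, -1)))
    # D: columns 7, 6, 5 bottom to top, then the upper half of column 4.
    d_half = (column(7, bottom_up) + column(6, bottom_up) +
              column(5, bottom_up) + column(4, range(4, 0, -1)))
    return c_half, d_half

c_half, d_half = pc1_halves()
print("C =", c_half)
print("D =", d_half)
```

Column 8, the 'parity' column, is never read, which is how 64 bits become 56.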
|
||
|
||
Very geometric, which, combined with the geometric shifts used with PC2, causes
|
||
a larger number of weak keys than would be strictly necessary. Now, PC2,
|
||
which is actually two permutations, one that operates on circular shifts of C,
|
||
and one that operates on circular shifts of D, has an odd property; it
|
||
re-arranges 24 bits out of C and 24 bits out of D so that the C bits directly
|
||
combine only with the first 16 bits of RIGHT (or LEFT), and the D bits only
|
||
with the last 16 bits of RIGHT (or LEFT). (I'm not considering the nibble
|
||
boundary bits). Bit 41, in other words, will never combine with bits 17
|
||
through 32 of the one-way argument. Similarly, bit 47 will never combine with
|
||
bits 1 through 16 of the one-way argument.
|
||
|
||
Indirect combination, due to the scatter, is another story. My guess is that
|
||
cutting the key bits into halves is deliberate to ensure that all key bits
|
||
encrypt all cipher bits as quickly and simply as possible. If you carefully
|
||
examine which bits get encrypted by the table values, you will see that the
|
||
scatter also repositions substitution bits by halves. That is, in our
|
||
illustration, the first 16 bits plus the nibble boundary bits, of RIGHT,
|
||
encrypt half of the first 16 bits of LEFT and half of the last 16 bits of LEFT
|
||
(almost). Also, the last 16 bits of RIGHT cause encryption of the remaining
|
||
first and second halves of LEFT. I think this completely explains the purpose
|
||
of the P permutation described (described? just listed!) in the standard.
|
||
|
||
For PC2, see FIPS 46. Let me just mention that since PC2 selects 24 bits out
|
||
of each half, in any of the key schedules there are always 8 bits (4 from each half) out of the
|
||
56 missing. This makes it harder for the cryptanalyst to work the cipher
|
||
backwards by supposing and testing assumptions about the key.
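
Both PC2 properties, the split by halves and the omitted bits, can be read off the table itself. In this sketch the PC2 list is typed in from FIPS 46 as I understand it (verify before use). Its entries are positions in the shifted C and D registers taken together, 1 to 28 for C and 29 to 56 for D, not original key bit numbers.

```python
PC2 = [14, 17, 11, 24, 1, 5, 3, 28, 15, 6, 21, 10,
       23, 19, 12, 4, 26, 8, 16, 7, 27, 20, 13, 2,
       41, 52, 31, 37, 47, 55, 30, 40, 51, 45, 33, 48,
       44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32]

# The first 24 schedule bits (S-boxes 1-4) come only from C,
# the last 24 (S-boxes 5-8) only from D.
c_only = all(pos <= 28 for pos in PC2[:24])
d_only = all(pos > 28 for pos in PC2[24:])

# Register positions that PC2 never selects.
missing = sorted(set(range(1, 57)) - set(PC2))
print(c_only, d_only, missing)
```

Four positions of each 28-bit half go unselected in any one schedule; because the registers rotate between rounds, different key bits sit in those positions from round to round.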
|
||
|
||
Finally, let me add an implementation note. You don't have to go through the
|
||
key scheduling algorithm for every block as the standard describes.
|
||
Instead, you program an initialization call that generates the 16 key
|
||
schedules one time. You can do this because the schedules are invariant from
|
||
block to block. Such an initialization call makes the difference between
|
||
encryption rates of 80 bytes/sec and 700 bytes/sec for an otherwise uninspired
|
||
implementation, or 2500 bytes/sec for what is in my opinion the very finest
|
||
ever for the IBM PC, the Data Encoder.
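
What such an initialization call might look like, as a sketch rather than anyone's production code. PC1, PC2, and the shift list are typed in from FIPS 46 as I understand them (check them against the standard; one typo is fatal). The class name KeySchedule and the sample key are my own choices.

```python
PC1 = [57, 49, 41, 33, 25, 17, 9, 1, 58, 50, 42, 34, 26, 18,
       10, 2, 59, 51, 43, 35, 27, 19, 11, 3, 60, 52, 44, 36,
       63, 55, 47, 39, 31, 23, 15, 7, 62, 54, 46, 38, 30, 22,
       14, 6, 61, 53, 45, 37, 29, 21, 13, 5, 28, 20, 12, 4]
PC2 = [14, 17, 11, 24, 1, 5, 3, 28, 15, 6, 21, 10,
       23, 19, 12, 4, 26, 8, 16, 7, 27, 20, 13, 2,
       41, 52, 31, 37, 47, 55, 30, 40, 51, 45, 33, 48,
       44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32]
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]

def permute(value, width, table):
    """Pick bits of `value` (bit 1 = leftmost of `width` bits) in table order."""
    out = 0
    for pos in table:
        out = (out << 1) | ((value >> (width - pos)) & 1)
    return out

def rotl28(half, n):
    """Circular left shift of a 28-bit register."""
    return ((half << n) | (half >> (28 - n))) & 0xFFFFFFF

class KeySchedule:
    """Computes the sixteen 48-bit round keys once, at initialization."""
    def __init__(self, key64):
        cd = permute(key64, 64, PC1)
        c, d = cd >> 28, cd & 0xFFFFFFF
        self.round_keys = []
        for n in SHIFTS:
            c, d = rotl28(c, n), rotl28(d, n)
            self.round_keys.append(permute((c << 28) | d, 56, PC2))

schedule = KeySchedule(0x133457799BBCDFF1)    # arbitrary sample key
for i, k in enumerate(schedule.round_keys, 1):
    print("K[%2d] = %012X" % (i, k))
```

Encryption walks round_keys first to last and decryption walks the same list last to first, so the block routine never touches PC1 or PC2.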
|
||
|
||
To make all cipher bits depend on all key bits is no mean accomplishment. The
|
||
attempt is fraught with booby traps. Most ways you can think of to force this
|
||
dependency have the unfortunate result of transforming the actual key into a
|
||
simpler and much smaller equivalent key. This danger the designers of the DES
|
||
managed to avoid. I mention this because we have seen some weaknesses in the
|
||
DES's key scheduling, but these weaknesses should not blind us to the DES's
|
||
good points if we want to learn how to design ciphers.
|
||
|
||
The DES has another weakness, the complementary property, that effectively
|
||
halves the key space. This property is caused by the two uses of XOR in the
|
||
algorithm. Let's use '~' to indicate boolean 'not', 'm' to indicate
|
||
'message', and 'k' to indicate 'key'.
|
||
|
||
The complementary property is as follows:
|
||
|
||
DES(k,m) = ~DES(~k,~m)
|
||
|
||
Read this as: the DES encryption of a message, m, under a key, k, is IDENTICAL
|
||
to the complement of the DES encryption of ~m under key ~k.
|
||
|
||
Remember that the key bits are combined with the message bits by a XOR to look
|
||
up an encrypting value in an S-box. Because of this XOR, m and k, or ~m and
|
||
~k, map to the identical value in an S-box because of the boolean identity
|
||
|
||
(m XOR k) = (~m XOR ~k)
|
||
|
||
Remember also that LEFT is encrypted by XORing it with the looked-up value.
|
||
Due to another boolean identity
|
||
|
||
(LEFT XOR VALUE) = ~(~LEFT XOR VALUE)
|
||
|
||
we have the complementary property. The result is that for known plaintext,
|
||
the DES's key space is 2^55, not 2^56. And cryptographers must assume that
|
||
plaintext is known when analyzing a cipher; that simply isn't unreasonable.
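
The property does not depend on the S-box contents or on the details of key scheduling, only on the two XORs, and a toy cipher shows it. The sketch below is NOT the DES. It is a 16-round cipher of the same shape, using S1 in all eight positions and a made-up key schedule (rotate the 56-bit key, keep 48 bits). All names are mine, and the P list is typed in from FIPS 46 as I understand it.

```python
MASK32, MASK48, MASK56, MASK64 = (1 << 32) - 1, (1 << 48) - 1, (1 << 56) - 1, (1 << 64) - 1
S1 = [[14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
      [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
      [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
      [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13]]
P = [16, 7, 20, 21, 29, 12, 28, 17, 1, 15, 23, 26, 5, 18, 31, 10,
     2, 8, 24, 14, 32, 27, 3, 9, 19, 13, 30, 6, 22, 11, 4, 25]

def expand(r):
    """32 -> 48 bits: each nibble plus its two neighbouring bits."""
    out = 0
    for i in range(8):
        for j in range(6):
            src = (4 * i + j - 1) % 32 + 1
            out = (out << 1) | ((r >> (32 - src)) & 1)
    return out

def f(r, k, combine):
    x = combine(expand(r), k) & MASK48
    nibbles = 0
    for i in range(8):
        six = (x >> (42 - 6 * i)) & 0x3F
        row, col = ((six >> 5) << 1) | (six & 1), (six >> 1) & 0xF
        nibbles = (nibbles << 4) | S1[row][col]   # toy: S1 in every position
    out = 0
    for src in P:
        out = (out << 1) | ((nibbles >> (32 - src)) & 1)
    return out

def toy_encrypt(block, key56, combine):
    l, r = block >> 32, block & MASK32
    for _ in range(16):
        key56 = ((key56 << 3) | (key56 >> 53)) & MASK56   # made-up schedule
        l, r = r, l ^ f(r, key56 & MASK48, combine)
    return (r << 32) | l

def complement_holds(block, key56, combine):
    plain = toy_encrypt(block, key56, combine)
    twin = toy_encrypt(block ^ MASK64, key56 ^ MASK56, combine)
    return plain == twin ^ MASK64

xor_combine = lambda a, b: a ^ b
add_combine = lambda a, b: (a + b) & MASK48   # the ADD alternative
```

Calling complement_holds with xor_combine succeeds for any block and key; swapping in add_combine for the key step alone is enough to make it fail for essentially every input.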
|
||
|
||
This weakness would have been easy to avoid. We have only to *add* the key to
|
||
RIGHT (or LEFT) mod 2^48, and we would not map to the same S-box values for
|
||
complements of message and key; and to *add* the looked-up value to LEFT (or
|
||
RIGHT) mod 2^32. And as a general rule, XOR is not a good primitive for
|
||
ciphers. True, it is convenient because XOR is its own inverse, while if we
|
||
used ADD to encrypt, we would have to use SUB to decrypt. But in cryptography
|
||
there are more important things than convenience, for example, keeping the
|
||
actual key space close to the potential.
|
||
|
||
Why this weakness is not corrected is hard to understand. DES defenders claim
|
||
that the complementary property is useful for verifying the correctness of
|
||
implementations. Basically, you code the DES then test it to see if the
|
||
complementary property holds for randomly chosen key and plaintext, and if it
|
||
does, you are supposed to get a 'warm' feeling. But this argument can't be
|
||
valid. Errors in key scheduling and in S-box values can't be caught by this
|
||
test. Matter of fact, most errors in coding the DES can't be caught by the
|
||
complementary property, so long as XOR is used to combine RIGHT and key, and
|
||
to encrypt LEFT. It only proves that XOR is used both places, not that things
|
||
are right!
|
||
|
||
DES defenders also claim that XOR is necessary to keep decryption like
|
||
encryption, that is, to avoid sensitivity to direction. However,
|
||
the DES, whether it uses XOR or not, *is* sensitive to direction. To encrypt,
|
||
you must start with the *first* key schedule, and work your way to the 16th.
|
||
To decrypt, you must start with the 16th key schedule, and work your way
|
||
backwards to the first. Since the DES is sensitive to direction anyhow, it
|
||
wouldn't hurt a thing to use ADD one way, and SUB the other.
|
||
|
||
My opinion is that XOR is a mistake realized too late, and that correction is
|
||
resisted because too much is now invested in this mistake, and that defenses of
|
||
the XOR are rationalizations, not reasons. And yes, it does matter. The key
|
||
space of 2^56 isn't enough anyhow. If the NSA can exhaust all 2^56 keys in a
|
||
day (and on average, it need only exhaust half that, or 2^55), then for
|
||
known plaintext, which is very common, it can exhaust all possible 2^55 keys
|
||
in half a day (or on average 2^54 keys in one quarter of a day).
|
||
|
||
But our interest in the DES is not its defense, but to learn good ciphering
|
||
from both its good and bad points. The lesson is clear; avoid XOR for
|
||
ciphers, because it halves the key space. If you implement the DES and
|
||
don't need to maintain the standard, I recommend using ADD and SUB instead
|
||
of XOR. Software implementation of the DES is frowned on, not because it
|
||
is slow, but because it permits a knowledgeable person to tamper with the DES
|
||
to the NSA's distress. The NSA prefers hardware implementations only (ROM
|
||
qualifies as 'hardware') and will more readily grant export licenses for
|
||
hardware than software.
|
||
|
||
Here is one suggestion that practically increases the key space to
|
||
120 bits. Begin with a seed of 64 bits, the seed being an extension
|
||
of your 56-bit key. Encrypt the seed by your key and use the resulting
|
||
16 nibbles of the ciphertext to shuffle the nibbles of the first row of
|
||
the first S-box. Encrypt the ciphertext (not the seed) again with
|
||
the altered S-box. Use the second ciphertext block to shuffle the
|
||
nibbles of the next row of the S-box. Repeatedly encrypt blocks
|
||
and shuffle rows until all 32 rows have been altered. Thereafter,
|
||
use your 56-bit key and the altered DES tables to encrypt and decrypt
|
||
your messages. In all probability, the resulting ciphertext won't
|
||
be as nice cryptographically with these randomized tables as with
|
||
the contrived ones, but this scheme will thwart a brute-force attack
|
||
based on exhaustive key trial. A key search on a special-purpose
|
||
machine is possible over a 55-bit key space (given known plaintext),
|
||
but it is not possible over a 120-bit key space. We may take up
|
||
later other pros and cons of modifying the DES tables.
|
||
|
||
The inventors of the DES state that five rounds are required for all
|
||
ciphertext bits to become dependent on all key bits and all message bits. Yet
|
||
the DES uses 16 rounds. How come? Wouldn't it be better to limit the rounds
|
||
to five for a great increase in throughput? Or would it? In examining a
|
||
cipher it is very good not to take anything for granted, to ask oneself 'why
|
||
this, instead of that?' and attempt to find a coherent reason for it.
|
||
|
||
I haven't been able to answer this question satisfactorily for myself.
|
||
However, the DES is too carefully crafted for me to lightly believe that it
|
||
uses 16 rounds just because that is a nice round number in binary arithmetic.
|
||
Perhaps my current thinking will help others to puzzle it out.
|
||
|
||
The DES would be breakable, not just by brute force but by cryptanalysis, if it used
|
||
two rounds instead of 16, and if plaintext is known. To see this, let's list
|
||
knowns and unknowns for two rounds. We need a notation now for LEFT and RIGHT
|
||
to show rounds, and we will use L[i] and R[i]. When i=0, LEFT and RIGHT are
|
||
plaintext; when i=n, LEFT and RIGHT are ciphertext. Also, we will designate
|
||
key schedules by K[j]. This is the picture when n=1.
|
||
|
||
L[0] R[0] known by assuming known plaintext
|
||
L[1] R[1] known by intercepting ciphertext
|
||
K[1] unknown
|
||
K[2] unknown
|
||
|
||
Remember that encryption means a 32-bit number was picked out of the S-boxes
|
||
as a function of R[0] and K[1], and this number encrypted L[0] to produce
|
||
L[1]. Similarly, another 32-bit number was picked out of the S-boxes as a
|
||
function of L[1] and K[2], and the second number encrypted R[0] to produce
|
||
R[1].
|
||
|
||
There is another way to look at this (the poet Wallace Stevens listed thirteen
|
||
ways of looking at a blackbird). One nibble is picked out of the first S-box
|
||
as a function of the bits 32 1 2 3 4 5 of R[0] and the first six bits of K[1]
|
||
(bits 10 51 34 60 49 17 of the key), and encrypts bits 9 17 23 and 31 of L[0]
|
||
by XOR. And so on.
|
||
|
||
Now, if we have known plaintext, it is simply no problem to determine what the
|
||
S-box values were that encrypted both LEFT and RIGHT. We just take bits 9 17
|
||
23 and 31 of L[0] and XOR them against the same bits of L[1]. And so on.
|
||
Let's call these nibbles the encrypting *sums*, and because there is only one
|
||
round per side, the sums are also direct S-box values.
|
||
|
||
For each S-box value, there are four possible 6-bit keys that will map to
|
||
the S-box value. For all the nibbles that encrypted LEFT, the possible key
|
||
space used with R[0] is reduced from 2^48 to (2^2)^8, or 2^16. This key space
|
||
is easily exhausted, and we can now attack K[2]. By the same procedure, the
|
||
key space for K[2] is also reduced to 2^16; however, K[2] must be consistent
|
||
with K[1]. That is, the bits of K[2] are practically the same as for K[1],
|
||
but rearranged. So, in fact, the key space for K[2] is far smaller than 2^16;
|
||
I think it is only 2^4 (but I haven't counted it yet). This breaks a
|
||
two-round DES.
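
For one S-box the step from known plaintext to four candidate keys looks like this. It is a sketch with arbitrary made-up bit patterns, the four LEFT bits are gathered into one nibble for convenience, and the names are my own.

```python
S1 = [[14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
      [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
      [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
      [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13]]

def s1_lookup(six_bits):
    row = (((six_bits >> 5) & 1) << 1) | (six_bits & 1)
    return S1[row][(six_bits >> 1) & 0xF]

def candidate_keys(right_six, sbox_value):
    """Six-bit key chunks consistent with one observed S1 output."""
    return [arg ^ right_six for arg in range(64) if s1_lookup(arg) == sbox_value]

# Simulate one box of one round with a secret key chunk (made-up values).
secret = 0b110101
right_six = 0b011011       # known: bits 32 1 2 3 4 5 of R[0]
left_before = 0b0110       # known: bits 9 17 23 31 of L[0]
left_after = left_before ^ s1_lookup(right_six ^ secret)   # seen in L[1]

encrypting_sum = left_before ^ left_after     # the S-box value itself
candidates = candidate_keys(right_six, encrypting_sum)
print([format(k, "06b") for k in candidates])
```

Four candidates for each of eight boxes gives 4^8 = 2^16 possibilities for K[1], and the secret chunk is always among the four.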
|
||
|
||
Now suppose four rounds; two for LEFT and two for RIGHT, and list the knowns
|
||
and unknowns again
|
||
|
||
L[0] R[0] known (plaintext)
|
||
L[1] R[1] K[1] K[2] unknown! (intermediate ciphertext)
|
||
L[2] R[2] known (ciphertext)
|
||
K[3] K[4] unknown
|
||
|
||
Now when we derive the encrypting sums we no longer have the S-box values, but
|
||
two S-box *addends*, and for any sum, there must be 16 pairs of addends. We
|
||
simply don't know what L[1] and R[1], the intermediate ciphertext, are
|
||
anymore. The possible keys for K[1] increase from 2^16 to 2^32. This begins
|
||
to look computationally difficult. Instead of four possible 6-bit keys per
|
||
nibble, we now have 16. However, let us remember the consistency requirement
|
||
of the key bits. If we do suppose a particular key bit is 1, it must be 1 in
|
||
all the schedules in which it occurs, regardless of position. So, the
|
||
combinatorics aren't quite as daunting as they first appear. It seems that
|
||
this is still solvable.
|
||
|
||
To work this out accurately, you would have to construct tables of key bits
|
||
according to the key schedules, then note the bits repeated (they will be in
|
||
different positions) from round to round.
|
||
|
||
For six rounds, the number of possible keys used with R[0] jumps to 2^48.
|
||
With a large machine, it should be possible to solve. But any more rounds, we
|
||
would be solving for 2^56 possible keys; that is, we are back to a brute force
|
||
attack.
|
||
|
||
With 16 rounds, there are seven intermediate unknown LEFTs and RIGHTs
|
||
and the combinatorics become too great to backsolve for the key. I suspect
|
||
that if the math is worked out, 16 rounds make it a practical certainty that
|
||
backsolving is infeasible, whether we attack all bits used in one round, or a
|
||
few bits used in all rounds. It would be nice to know the exact number of
|
||
rounds that reach this infeasibility; the math, however, escapes me.
|
||
|
||
We must remember this when we return to our byte cipher (discussed in
|
||
the file "DES"), because we should determine how many rounds are
|
||
required to make backsolving with known plaintext infeasible. The
|
||
number of rounds is important; too many is inefficient, too few is fatal.
|
||
|
||
---
|
||
|
||
Experts have studied the S-boxes. To a casual eye each row of
|
||
the S-boxes appears to be a random permutation of the nibbles 0
|
||
through 15. But they aren't random. So far, nobody has figured
|
||
out their ordering or why.
|
||
|
||
I haven't figured out the S-boxes, and it is unlikely I ever
|
||
will. Nevertheless, I am inclined to believe the inventors when
|
||
they say there is no trap door, and the computed boxes really
|
||
are 'better' cryptographically than random boxes. Keep in mind
|
||
that many mathematicians, computer scientists, and gifted
|
||
amateur cryptanalysts who owe nothing to the NSA have pored over
|
||
these boxes. Also, the inventors' tone in their defense of the
|
||
DES ('Cryptography: A New Dimension in Computer Data Security'; Carl
|
||
Meyer and Stephen Matyas, John Wiley and Sons) strikes me as
|
||
sincere. They seem to me genuinely surprised to discover that
|
||
non-random boxes are better than random.
|
||
|
||
I can't explain the S-boxes, but I do have an idea why
|
||
non-random boxes might be 'better'. It turns out that the
|
||
ENIGMA rotors also were not randomly wired. Rather, they were
|
||
wired to ensure 'distance'. The idea is that similar plaintext
|
||
should not encrypt under a similar key to a similar ciphertext.
|
||
I haven't studied the ENIGMA in detail yet, so I don't know what
|
||
threat 'closeness' poses to the key that the German inventors
|
||
were trying to avoid. Yet, it must have been a threat, because
|
||
the German cryptographers deliberately avoided random rotor
|
||
wiring, in spite of the weaknesses 'distance' wiring introduced.
|
||
|
||
There is the same idea in symbol table hashing algorithms.
|
||
Here, one wants similar variables, perhaps differing only by one
|
||
character, to hash to far different places in a symbol table.
|
||
The reason is, we want to avoid 'collisions' - different
|
||
variable names accidentally hashing to the same symbol table
|
||
location, causing extra effort to find an unused slot for each
|
||
variable name. Most hashing schemes have the effect of
|
||
discarding the high order character, so without 'distance', in
|
||
FORTRAN, for example, collision is practically assured because
|
||
FORTRAN uses the initial letter of a variable to classify the
|
||
data type, and we tend to name variables systematically.
|
||
|
||
Might this not be the secret of the S-box design criterion? It
|
||
seems to me that the message space mapping of a cipher is not in
|
||
principle different from symbol table hashing. And if
|
||
'distance' is *not* a criterion of the mapping, maybe it ought
|
||
to be.
|
||
|
||
What I'm saying, strictly as speculation, is that very similar
|
||
plaintext, differing perhaps by only a bit, should map to wildly
|
||
different points in the message space, that it may be impossible
|
||
to guarantee distance with random table values, and that the
|
||
S-box values were carefully computed to ensure distance.
|
||
Exactly why the lack of distance should be weak, I haven't
|
||
figured out. In support of my guess, the DES does indeed
|
||
exhibit the property that similar input maps to violently
|
||
different results.
|
||
|
||
But even if my guess is right, this leaves unanswered the
|
||
important question of how to design the boxes. Still, it's a
|
||
start, and it raises very interesting questions in math and
|
||
cryptography.
|
||
|
||
The plaintext is not simply divided into LEFT and RIGHT as we
|
||
described. It is permuted first by a so-called Initial
|
||
Permutation, IP, that transposes the rows and columns of a
|
||
bit-byte matrix in an odd way.
|
||
|
||
 1  2  3  4  5  6  7  8
 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64
-----------------------
    1     2     3     4   take out order for LEFT
 5     6     7     8      take out order for RIGHT
|
||
|
||
In other words, transcribe the bytes in row order, then take
|
||
bits out in the column order indicated, from bottom to top.
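
Stated as code, the rule generates the whole IP table. This sketch assumes the take-out order is columns 2, 4, 6, 8 for LEFT and columns 1, 3, 5, 7 for RIGHT, which is my reading of the IP table in FIPS 46; the function name is mine.

```python
def initial_permutation():
    """Bit numbers (1..64) in the order IP emits them: LEFT, then RIGHT."""
    def column_bottom_up(col):
        return [8 * (row - 1) + col for row in range(8, 0, -1)]

    left = [b for col in (2, 4, 6, 8) for b in column_bottom_up(col)]
    right = [b for col in (1, 3, 5, 7) for b in column_bottom_up(col)]
    return left + right

IP = initial_permutation()
for start in range(0, 64, 8):
    print(*("%2d" % b for b in IP[start:start + 8]))
```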
|
||
|
||
This permutation is bemusing. There is no apparent reason for
|
||
it, and many cryptanalysts believe IP has no cryptographic
|
||
significance. The American Cryptogram Association, an
|
||
organization of amateur cryptanalysts, has considered IP without
|
||
being able to reach a conclusion about it.
|
||
|
||
This unexplained permutation does something distinctive,
|
||
however. Let's look at eight consecutive bytes in a different
|
||
way:
|
||
|
||
x..x x..x
|
||
x..x x..x
|
||
x..x x..x
|
||
x..x x..x
|
||
x..x x..x
|
||
x..x x..x
|
||
x..x x..x
|
||
x..x x..x
|
||
|
||
What we are doing is distinguishing between nibble boundary bits
|
||
and nibble inner bits. Now, it is a fact that IP rearranges
|
||
boundary and inner bits like this
|
||
|
||
.... ....
|
||
x..x x..x
|
||
.... ....
|
||
x..x x..x
|
||
x..x x..x
|
||
.... ....
|
||
x..x x..x
|
||
.... ....
|
||
|
||
In other words, half the normal boundary bits are exchanged with
|
||
inner bits, and the original boundary bits that still remain
|
||
boundary bits are relocated. The reason for this is obvious -
|
||
the boundary bits have quite pronounced statistics in normal
|
||
natural language text, especially in EBCDIC. For example, a
|
||
blank is 0 1 0 0 0 0 0 0. The result is that 20 percent of all
|
||
boundary-bit pairs are 0..0 because blanks make up 20 percent of all
|
||
text. By continuing to separate the boundary nibbles by their
|
||
natural language statistics, you can derive a frequency table,
|
||
in addition to 0..0, for
|
||
|
||
0..1
|
||
1..0
|
||
1..1
|
||
|
||
Do this as an exercise, and you will see that the frequencies
|
||
are quite pronounced. Use EBCDIC; remember, the inventors of
|
||
the DES were thinking IBM, not micros and ASCII.
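
One way to run the exercise, as a sketch. It assumes Python's cp037 codec is an acceptable stand-in for the EBCDIC code page the designers had in mind, and the sample sentence is only a placeholder; use a long stretch of real text for meaningful figures. The function name is mine.

```python
from collections import Counter

def boundary_pair_frequencies(text):
    """Relative frequency of the (first bit, last bit) pattern of every
    nibble in the EBCDIC encoding of `text`."""
    counts = Counter()
    for byte in text.encode("cp037"):           # cp037: an EBCDIC code page
        for nibble in (byte >> 4, byte & 0xF):
            first, last = (nibble >> 3) & 1, nibble & 1
            counts["%d..%d" % (first, last)] += 1
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}

sample = "the quick brown fox jumps over the lazy dog"   # placeholder text
for pattern, share in sorted(boundary_pair_frequencies(sample).items()):
    print(pattern, round(share, 3))
```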
|
||
|
||
The effect of IP is to even out these frequencies for the
|
||
boundary bits so that they are more nearly uniform. It is
|
||
impossible to believe that something as remarkable as this was
|
||
not intended. This smoothing out of boundary bit statistics
|
||
must be the purpose of IP.
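
The rearrangement pictured earlier can be reproduced mechanically. The sketch marks with 'x' every boundary position of the permuted block that still holds an original boundary bit. It assumes IP takes columns 2, 4, 6, 8 and then 1, 3, 5, 7, each bottom to top, which is my reading of FIPS 46; the names are mine.

```python
def initial_permutation():
    def column_bottom_up(col):
        return [8 * (row - 1) + col for row in range(8, 0, -1)]
    order = (2, 4, 6, 8, 1, 3, 5, 7)
    return [b for col in order for b in column_bottom_up(col)]

def is_boundary(bit):
    """First or last bit of a nibble, for bits numbered from 1."""
    return bit % 4 in (0, 1)

def boundary_map():
    ip = initial_permutation()
    lines = []
    for row in range(8):
        marks = ""
        for col in range(8):
            position = 8 * row + col + 1          # place after IP
            source = ip[position - 1]             # original bit number
            keeps = is_boundary(position) and is_boundary(source)
            marks += "x" if keeps else "."
        lines.append(marks[:4] + " " + marks[4:])
    return lines

print("\n".join(boundary_map()))
```

Four of the eight rows come out blank: their boundary positions are now filled with what used to be inner bits.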
|
||
|
||
How come? Why are the boundary bits so important that they are
|
||
evened out at the expense of the inner bits? The answer I
|
||
suppose is to compensate for the reduction of the DES key from
|
||
128 bits to 56.
|
||
|
||
It doesn't take too long studying the DES to realize that its
|
||
dimensions are wrong. It is very lopsided. There are eight
|
||
boxes of four rows and 16 columns each. In short time, you get
|
||
the feeling that 16 boxes of 16 rows and columns each would have
|
||
been more natural. The key should have been 128 bits; LEFT and
|
||
RIGHT should have consisted of 64 bits each. The inter-nibble
|
||
dependency should have been TWO bits from the previous nibble
|
||
(not one) concatenated to the current nibble, concatenated to
|
||
TWO bits of the following nibble. In other words, instead of 32
|
||
1 2 3 4 5, it should have been 31 32 1 2 3 4 5 6. Bits 31 32 5
|
||
6 should have been the row coordinate into the first S-box, not
|
||
32 and 5.
|
||
|
||
The result of this brutal reduction is that there are far fewer
|
||
rows per nibble to select values out of than there are columns.
|
||
Guessing which row is selected, rather than guessing which
|
||
value, there are
|
||
|
||
(2^2)^8 = 2^16
|
||
|
||
choices per round, or for one nibble for all rounds. This is
|
||
not a formidable number.
|
||
|
||
For known input, correctly guessing a row gives you two bits of
|
||
the key. Because of the key bit consistency requirement, it
|
||
also helps determine other key bits. Also, guessing the row
|
||
helps determine the column as well, since the selfsame message
|
||
row bits participate as column selection bits. It looks like
|
||
row selection is the weakest part of the DES, and if a
|
||
cryptanalytic attack is achievable, it should be the spot to
|
||
concentrate on. My guess is that a strong bias of nibble
|
||
boundary bits might help such an attack, and that IP is a
|
||
stop-gap measure to thwart it. If the rows *can* be determined,
|
||
32 key bits are recovered, and the DES falls. The remaining 24
|
||
bits could not resist even a microcomputer.
|
||
|
||
Perhaps IP is not a bemusing oddity. Perhaps it is symptomatic
|
||
of a deep weakness born of the DES's butchering. By the way,
|
||
this is exactly how cryptanalysis works - find a weak point and
|
||
hammer it.
|
||
|
||
---
|
||
|
||
This note will describe one more implementation trick for very
|
||
fast software DES, then recap implementation tricks.
|
||
|
||
You will remember how the 32 bits of RIGHT (or LEFT) must be
|
||
expanded before they can be combined with the key schedule for a
|
||
round. The following bits
|
||
|
||
1 2 3 4
|
||
5 6 7 8
|
||
9 10 11 12
|
||
13 14 15 16
|
||
17 18 19 20
|
||
21 22 23 24
|
||
25 26 27 28
|
||
29 30 31 32
|
||
|
||
must be expanded to
|
||
|
||
32 1 2 3 4 5
|
||
4 5 6 7 8 9
|
||
8 9 10 11 12 13
|
||
12 13 14 15 16 17
|
||
16 17 18 19 20 21
|
||
20 21 22 23 24 25
|
||
24 25 26 27 28 29
|
||
28 29 30 31 32 1
|
||
|
||
This expansion is called E in FIPS 46. You will also remember
|
||
that IP arranges the bits of the plaintext into these 32 bits
|
||
for RIGHT and 32 bits for LEFT. Now to perform this expansion
|
||
in code for every round happens to be quite a bit of work, even
|
||
using a micro's fastest instruction forms. However, IP may be
|
||
modified to generate LEFT and RIGHT in *expanded* form. Then,
|
||
if the S-box values are adjusted to compensate for an expanded
|
||
LEFT and RIGHT, the same encryption is achieved, but without the
|
||
expense of performing E for every round. Including the S-box
|
||
adjustment to do away with P, the cost is 48-bit S-box values
|
||
for every original 4-bit element in the S-boxes.
|
||
|
||
To see how this works, remember that *all* values in the first
|
||
S-box encrypt only bits 9 17 23 and 31 of either LEFT or RIGHT,
|
||
and the first S-box's first value is 1 1 1 0, in binary. We
|
||
therefore replace this value with the following
|
||
|
||
0 0 0 0 0 0
|
||
0 0 0 0 0 1
|
||
0 1 0 0 0 0
|
||
0 0 0 0 0 1
|
||
0 1 0 0 0 0
|
||
0 0 0 1 0 0
|
||
0 0 0 0 0 0
|
||
0 0 0 0 0 0
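
The 48-bit replacement value can be generated instead of worked out by hand. A sketch: the P list is typed in from FIPS 46 as I understand it (verify against the standard), and the function names are my own.

```python
P = [16, 7, 20, 21, 29, 12, 28, 17, 1, 15, 23, 26, 5, 18, 31, 10,
     2, 8, 24, 14, 32, 27, 3, 9, 19, 13, 30, 6, 22, 11, 4, 25]

def e_row(box):
    """The six bit numbers that make up expanded row `box` (1..8)."""
    start = 4 * (box - 1)
    return [(start + j - 1) % 32 + 1 for j in range(6)]

def expanded_entry(box, value):
    """Nibble `value` of S-box `box`, pushed through P and then through E,
    as eight rows of six 0/1 flags."""
    set_bits = set()
    for k in range(4):
        if (value >> (3 - k)) & 1:
            set_bits.add(P.index(4 * (box - 1) + k + 1) + 1)
    return [[1 if bit in set_bits else 0 for bit in e_row(row)]
            for row in range(1, 9)]

for row in expanded_entry(1, 14):
    print(*row)
```

Bits 9 and 17 each show up twice because they are nibble boundary bits and E copies them into two rows.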
|
||
|
||
Let's assume the 8088 chip and a clock rate of 4.77 MHz, and
|
||
recap. Setting the key schedules in a one-time initialization
|
||
call increases software speed from about 80 bytes per second to
|
||
close to 800. Getting rid of P by precomputing it in the
|
||
S-boxes increases the speed from about 800 bytes/second to about
|
||
1700. Making E a one-time expense by putting it into IP, and
|
||
adjusting the S-boxes accordingly increases the speed from about
|
||
1700 to 2500 bytes/second.
|
||
|
||
There is one more trick if you can live with a c * 64K table,
where 'c' is the length of one table entry. You can collapse two
S-boxes into a single table, thereby halving the number of S-box
lookups. This should bring the speed up to about 5000
bytes/second. However, this large a table isn't practical for
some encryption applications.

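As a rough illustration of collapsing two S-boxes, here is a
minimal Python sketch. It assumes each S-box has already been
flattened to a 64-entry list indexed directly by its 6-bit
input, and it packs the two inputs densely into a 12-bit index.
The index layout, the function name combine_boxes, and the two
toy tables are all assumptions for illustration; they are not
the real DES S-boxes and not the layout used by any particular
implementation.

```python
def combine_boxes(box_a, box_b):
    """Merge two 64-entry lookup tables into one 4096-entry table.

    The combined index is (a << 6) | b, and each combined entry
    holds box_a's 4-bit result in the high nibble and box_b's in
    the low nibble.
    """
    combined = [0] * (64 * 64)
    for a in range(64):
        for b in range(64):
            combined[(a << 6) | b] = (box_a[a] << 4) | box_b[b]
    return combined


# Toy tables (hypothetical values, NOT the DES S-boxes).
toy_a = [(7 * i + 3) % 16 for i in range(64)]
toy_b = [(5 * i + 1) % 16 for i in range(64)]
table = combine_boxes(toy_a, toy_b)

# One lookup now replaces two.
a_in, b_in = 0b101100, 0b000111
both = table[(a_in << 6) | b_in]
print(both >> 4 == toy_a[a_in], both & 0x0F == toy_b[b_in])
```

If the entries held the precomputed 48-bit values instead of
4-bit ones, each combined entry would simply be the OR of the
two expanded values, at the price of a much wider table.
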
---

The files permf.s and permg.s are the 68000 versions of the
permutation routines described in Z80 code in the article
"Designing a File Encryption System" in the Aug '84 issue of
DDJ. Permf performs the forward permutation of the bits in a
256-bit block as specified by a table of bytes. Permg performs
the inverse permutation.

The file cycle.s is the 68000 version of the corresponding
Z80 code for the routine which "cycles" the permutation tables
to the next position.

For example, if the permutation table has the values:

1 15 115 57 .... 0

then the forward permutation means to put the 1st bit of the
block in the 0th place, the 15th bit in the 1st place, the 115th
bit in the 2nd place, and so on until the 0th bit goes in the
255th place.

The inverse permutation with the same table means to place the
0th bit of the block in the 1st place, the 1st bit in the 15th
place, the 2nd bit in the 115th place, and so on until the 255th
bit goes in the 0th place.

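The two directions can be stated compactly in a few lines of
Python. This is only a sketch of the definitions given above,
working on a list of bits rather than on packed bytes; the
function names are assumptions and the 4-element table is a toy
example, not one of the 256-entry tables.

```python
def permute_forward(block, table):
    """Bit table[i] of the input goes to place i of the output."""
    return [block[src] for src in table]


def permute_inverse(block, table):
    """Bit i of the input goes to place table[i] of the output."""
    out = [0] * len(block)
    for i, dst in enumerate(table):
        out[dst] = block[i]
    return out


table = [1, 3, 0, 2]          # toy table; must be a permutation
plain = [1, 0, 0, 1]
mixed = permute_forward(plain, table)
print(permute_inverse(mixed, table) == plain)
```

Because the inverse undoes the forward permutation for the same
table, one table serves for both enciphering and deciphering.
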
The routines address bits in the block by deriving a bit index
from the byte value of the permutation table. The upper five
bits of that value index to the particular byte in the block,
and the lower three bits then index to the particular bit within
that byte.

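A minimal Python sketch of this addressing follows. It assumes,
as the 68000 listings do when they reverse the bit order before
BTST and BSET, that bit 0 of the block is the most significant
bit of byte 0. The function names are assumptions made for
illustration.

```python
def locate(value):
    """Split a table byte into (byte index, bit within byte)."""
    return value >> 3, value & 0x07


def get_bit(block, value):
    """Fetch the bit of a 32-byte block selected by a table byte."""
    byte_index, bit = locate(value)
    # Bit 0 within a byte is taken to be the most significant bit.
    return (block[byte_index] >> (7 - bit)) & 1


block = bytearray(32)
block[14] = 0x10              # sets bit 3 (from the left) of byte 14
print(locate(115), get_bit(block, 115))
```
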
In the original cryptographic use, the permutation table was
assumed to be cycled to its next permutation after the
encryption of each block. The idea is to use the same thing in
as many different ways as possible to get a long period before
it repeats. Consider the following permutation list of 10
elements:

element:   7 4 1 3 5 9 8 6 2 0
position:  0 1 2 3 4 5 6 7 8 9

This is the table permf and permg use. In cyclic notation this
list becomes:

(0 7 6 8 2 1 4 5 9) (3)

That is, there are two cycles, one of degree 9 and one
singleton. It means that 7 goes to 0, 6 goes to 7, 8 goes to 6,
and so on. If we "rotate" the cycles to the second position, we
get this list:

element:   6 5 4 3 9 0 2 8 1 7
position:  0 1 2 3 4 5 6 7 8 9

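The conversion from cyclic notation to a permutation list, as
described in the text, can be sketched in Python as follows.
This is an illustration of the 10-element example only; the
function name cycles_to_list is an assumption, and the sketch
follows the prose description rather than transcribing the
68000 CYCLE routine.

```python
def cycles_to_list(cycles, rotation):
    """Build a permutation list from cycles at a given rotation.

    For each cycle c, the element placed at position c[i] is
    c[(i + rotation) mod len(c)].
    """
    size = sum(len(c) for c in cycles)
    perm = [0] * size
    for c in cycles:
        for i, position in enumerate(c):
            perm[position] = c[(i + rotation) % len(c)]
    return perm


cycles = [[0, 7, 6, 8, 2, 1, 4, 5, 9], [3]]
print(cycles_to_list(cycles, 1))   # first position
print(cycles_to_list(cycles, 2))   # second position
```

With rotation 1 this yields the first list shown above, and
with rotation 2 the second.
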
Thus from one cycle table we can get 9 different permutation
lists. The cycle routine constructs these lists for permf and
permg, given a table in cyclic notation as above. It can handle
tables of 256 elements, each of which may contain a number of
cycles. For example, if we had two tables, the first containing
two cycles, of degrees 83 and 173, and the second containing
cycles of degrees 197 and 59, the composite cycle would not
repeat until 83 * 173 * 197 * 59 = 1.669 x 10^8 blocks had been
processed.

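The period arithmetic is easy to confirm. The sketch below
assumes every cycle is rotated by one position per block, so
the combined arrangement repeats after the least common
multiple of the cycle degrees; for the four prime degrees in
the example that is simply their product. The function name
period is an assumption made for illustration.

```python
from functools import reduce
from math import gcd


def period(degrees):
    """Least common multiple of the cycle degrees."""
    return reduce(lambda a, b: a * b // gcd(a, b), degrees, 1)


print(period([83, 173, 197, 59]))
print(period([9, 1]))          # the 10-element example
```

Degrees that share a common factor would give a shorter period
than their product, which is presumably why the example uses
primes.
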
The routines now run about four times faster than the Z80
versions.

*
* PERMF for the 68000.  Permute a 256-bit bit vector, BLOCK,
* by table BITPERM.  On call:
*
*       a0 -> BLOCK
*       a1 -> BITPERM
*
* On return, a0 -> permuted BLOCK.
*
* Register usage:
*
*       a2 -> WORK, a 32-byte temporary work area
*       a3 -> start of WORK, kept for the final copy
*       d0 = byte from BITPERM, shifted to bit index
*       d1 = index to byte of BLOCK
*       d2 = rotated bit for masking and inner loop control
*       d3 = #31, outer loop control
*       d4 = #$07 immediate masking value
*
* 5/19/86.  Execution time 4.5 ms.
*
        .globl  permf,work

        .text

permf:
        movem.l d0-d4/a0-a3,-(a7)
        moveq   #7,d0           clear work area
        lea     work,a2         -> work
        move.l  a2,a3           copy ptr for later use
clrloop:
        clr.l   (a2)+
        dbra    d0,clrloop
        move.l  a3,a2           retrieve work pointer
        move    #$80,d2         masking bit and inner loop control
        moveq   #31,d3          outer loop control
        moveq   #7,d4           masking value
        clr     d0              keep word clear
permloop:
        move.b  (a1)+,d0        get byte from BITPERM
        move    d0,d1           we will need it twice
        lsr     #3,d1           compute byte index in BLOCK
        and     d4,d0           save lower 3 bits for bit index
        eor     d4,d0           reverse bit order for btst
        btst    d0,(a0,d1)      is bit on in BLOCK?
        beq     permf1          if not, leave bit in WORK clear
        or.b    d2,(a2)         else set bit in WORK
permf1:
        ror.b   #1,d2           shift masking bit
        bcc     permloop        no carry: next bit of same WORK byte
        addq    #1,a2           else, next byte of work
        dbra    d3,permloop     do for 32 bytes

movloop:
        move.l  (a3)+,(a0)+     move data from WORK to BLOCK
        dbra    d4,movloop      use #7 already in d4
        movem.l (a7)+,d0-d4/a0-a3
        rts                     all done

        .end

*
* PERMG for the 68000.  Inversely permute a 256-bit bit
* vector by table BITPERM.  On call:
*
*       a0 -> BLOCK
*       a1 -> BITPERM
*
* On return, a0 -> permuted BLOCK.
*
* Register usage:
*
*       a2 -> WORK, a 32-byte temporary work area
*       a3 -> BLOCK
*       d0 = outer loop control
*       d1 = inner loop control
*       d2 = bit counter
*       d3 = longword from block
*       d4 = byte from BITPERM
*       d5 = temporary
*       d6 = #7 masking value
*
* 5/19/86.  Execution time is 3.3 ms.
*
        .globl  permg,work

        .text

permg:
        movem.l d0-d6/a0-a3,-(a7)
        moveq   #7,d0           clear work area
        lea     work,a2
        move.l  a2,a3           copy ptr for later use
clrloop:
        clr.l   (a2)+
        dbra    d0,clrloop
        move.l  a3,a2           retrieve work pointer (clrloop advanced a2)
        move.l  a0,a3           save a0 ptr for later
        moveq   #7,d0           outer loop control
        moveq   #0,d2           count of bits
        move    d2,d4           need word clear
        moveq   #7,d6           masking value
permg1:
        moveq   #31,d1          inner loop control and bit to test
        move.l  (a0)+,d3        get longword from BLOCK
bitloop:
        btst    d1,d3           check for bit on
        beq     permg2          if off, skip; else set BITPERM[d2] bit in WORK
        move.b  (a1,d2),d4      get byte BITPERM[COUNT]
        move    d4,d5           save for reuse
        lsr     #3,d4           index to byte of WORK
        and     d6,d5           compute bit # in that byte
        eor     d6,d5           reverse bit order
        bset    d5,(a2,d4)      set the bit in WORK
permg2:
        addq    #1,d2           bump bit count
        dbra    d1,bitloop      and do for all bits in this longword
        dbra    d0,permg1       do for all longwords of BLOCK

movloop:
        move.l  (a2)+,(a3)+     move WORK to BLOCK
        dbra    d6,movloop      use #7 already in d6
        movem.l (a7)+,d0-d6/a0-a3
        rts                     all done

        .end

*
* CYCLE: Convert a table of permutation cycles to a
* permutation list at the Nth permutation.  This version
* of cycle can deal with a table having any number of
* cycles (up to word value) of various degrees.  The cyclic
* permutation table is RANDP and the permutation list
* table is BITPERM.  BITPERM may then be used to permute
* a block of elements.  The procedure is:
*
*   consult the cycle table structure to obtain the number
*   and degree of the cycles and the rotation to be applied
*   to each
*
*   k <- 0      /* index into RANDP */
*   for i = 1 to number_of_cycles do
*     get degree_of_this_cycle from structure
*     cycle_base <- k
*     for j = 1 to degree_of_this_cycle do
*       BITPERM[RANDP[k]] <- RANDP[RANDP[cycle_base +
*           (k - cycle_base + rotation) mod (degree_of_this_cycle)]]
*       k <- k + 1
*     end for
*   end for
*
* On call:
*       a0 -> RANDP
*       a1 -> BITPERM
*       a2 -> cycle structure
*
* On return:
*       all registers saved
*       BITPERM is established from RANDP
*
* The cycle structure is built as follows for each set of tables:
*
*       dc.w    xx      number of cycles in this table
*       dc.w    yy      degree of first cycle
*       dc.w    zz      rotation of first cycle
*       .       .       ..and so forth for each cycle
*       .       .
* CAUTION: This structure holds words, but this implementation assumes
* that RANDP and BITPERM are tables of BYTES.  See indirect addressing
* inside degloop below.
*
* Version of 4/6/86.

        .text
        .globl  cycle

cycle:
        movem.l d0-d7/a0-a2,-(a7)
        move    (a2)+,d0        get # of cycles
        subq    #1,d0           adjust for dbra
        clr     d3              clear index into RANDP ("k" above)
        clr     d7              clear scratch register
cycloop:
        move    (a2)+,d1        get degree of this cycle
        move    (a2)+,d2        get rotation for this cycle
        move    d1,d5           use degree for loop control
        subq    #1,d5           adjust for dbra
        move    d3,d4           set current cycle base = k
degloop:
        move    d3,d6           first, see if outside cycle
        sub     d4,d6           RANDP index - current cycle base
        add     d2,d6           add in rotation
        cmp     d1,d6           does that put us outside this cycle?
        bcs     degok           branch if not
        sub     d1,d6           else, adjust mod degree (assumes rotation <= degree)
degok:
        add     d4,d6           add cycle base back to index + rotation
        move.b  (a0,d6),d6      get RANDP[index + rotation]
        move.b  (a0,d3),d7      get RANDP[index]
        move.b  (a0,d6),(a1,d7) put byte RANDP[d6] in BITPERM[d7]
        addq    #1,d3           bump RANDP index
        dbra    d5,degloop      loop until this cycle done
        dbra    d0,cycloop      loop until all cycles done

        movem.l (a7)+,d0-d7/a0-a2
        rts

        .end

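For readers who would rather trace CYCLE in a high-level
language, here is a minimal Python model of the listing above.
It is a sketch for checking one's understanding, not a
replacement for the routine. The cycle structure is passed as a
list of (degree, rotation) pairs instead of a table of words,
and the function name cycle_model is an assumption. Note that
the routine as listed looks up RANDP a second time before
storing into BITPERM, so its output is not the plain rotated
list of the 10-element example in the text, although it is
still a permutation.

```python
def cycle_model(randp, structure):
    """Model of CYCLE: build BITPERM from RANDP and the cycle structure.

    randp     -- the cycles laid end to end, as a list of small integers
    structure -- list of (degree, rotation) pairs, one per cycle
    """
    bitperm = [0] * len(randp)
    k = 0                                  # index into RANDP
    for degree, rotation in structure:
        cycle_base = k
        for _ in range(degree):
            # The listing does one conditional subtract; % is equivalent
            # as long as rotation does not exceed degree.
            idx = cycle_base + (k - cycle_base + rotation) % degree
            bitperm[randp[k]] = randp[randp[idx]]
            k += 1
    return bitperm


randp = [0, 7, 6, 8, 2, 1, 4, 5, 9, 3]     # (0 7 6 8 2 1 4 5 9) (3)
result = cycle_model(randp, [(9, 1), (1, 0)])
print(sorted(result) == list(range(10)))   # still a permutation
```
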
/*EOF*/
|
||
|