Article 4959 of comp.misc:
Path: ucdavis!ucbvax!pasteur!ames!mailrus!iuvax!bobmon
From: bobmon@iuvax.cs.indiana.edu (RAMontante)
Newsgroups: comp.misc
Subject: Re: "big endian" and "little endian" - first usage for computers
Summary: Danny Cohen's article
Message-ID: <16070@iuvax.cs.indiana.edu>
Date: 31 Dec 88 06:22:06 GMT
Reply-To: bobmon@iuvax.UUCP (RAMontante)
Organization: malkaryotic
Lines: 837
|
||
|
||
Since this appears to be a fresh topic for lotsa people, I'll repost
|
||
Danny Cohen's article in which he (originated or) popularized the terms
|
||
Big-Endian and Little-Endian. It's pretty big; I don't know who typed
|
||
it all in (maybe John Gilmore, info at the end), but I'm glad they did.
|
||
I don't know about the 1 April 1980 date shown, but I've seen the IEEE
|
||
version and it has a similar date.
|
||
|
||
|
||
IEN 137 Danny Cohen
|
||
U S C/I S I
|
||
1 April 1980
|
||
|
||
|
||
ON HOLY WARS AND A PLEA FOR PEACE
|
||
|
||
|
||
|
||
INTRODUCTION
|
||
|
||
|
||
This is an attempt to stop a war. I hope it is not too late and that
|
||
somehow, magically perhaps, peace will prevail again.
|
||
|
||
The latecomers into the arena believe that the issue is: "What is the
|
||
proper byte order in messages?".
|
||
|
||
The root of the conflict lies much deeper than that. It is the question
|
||
of which bit should travel first, the bit from the little end of the
|
||
word, or the bit from the big end of the word? The followers of the
|
||
former approach are called the Little-Endians, and the followers of the
|
||
latter are called the Big-Endians. The details of the holy war between
|
||
the Little-Endians and the Big-Endians are documented in [6] and
|
||
described, in brief, in the Appendix. I recommend that you read it at
|
||
this point. [I have inserted it -- gnu]
|
||
|
||
|
||
|
||
A P P E N D I X
|
||
|
||
|
||
|
||
Some notes on Swift's Gulliver's Travels:
|
||
|
||
|
||
Gulliver finds out that there is a law, proclaimed by the grandfather of
|
||
the present ruler, requiring all citizens of Lilliput to break their
|
||
eggs only at the little ends. Of course, all those citizens who broke
|
||
their eggs at the big ends were angered by the proclamation. Civil war
|
||
broke out between the Little-Endians and the Big-Endians, resulting in
|
||
the Big-Endians taking refuge on a nearby island, the kingdom of
|
||
Blefuscu.
|
||
|
||
Using Gulliver's unquestioning point of view, Swift satirizes religious
|
||
wars. For 11,000 Lilliputian rebels to die over a controversy as
|
||
trivial as at which end eggs have to be broken seems not only cruel but
|
||
also absurd, since Gulliver is sufficiently gullible to believe in the
|
||
significance of the egg question. The controversy is important
|
||
ethically and politically for the Lilliputians. The reader may think
|
||
the issue is silly, but he should consider that Swift is making fun of
|
||
the actual causes of religious- or holy-wars.
|
||
|
||
In political terms, Lilliput represents England and Blefuscu France.
|
||
The religious controversy over egg-breaking parallels the struggle
|
||
between the Protestant Church of England and the Catholic Church of
|
||
France, possibly referring to some differences about what the Sacraments
|
||
really mean. More specifically, the quarrel about egg-breaking may
|
||
allude to the different ways that the Anglican and Catholic Churches
|
||
distribute communion, bread and wine for the Anglican, but bread alone
|
||
for the Catholic. The French and English struggled over more mundane
|
||
questions as well, but in this part of Gulliver's Travels, Swift points
|
||
up the symbolic difference between the churches to ridicule any
|
||
religious war.
|
||
|
||
|
||
For ease of reference please note that Lilliput and Little-Endians
|
||
both start with an "L", and that both Blefuscu and Big-Endians start
|
||
with a "B". This is handy while reading this note.
|
||
|
||
[End of appendix -- gnu]
|
||
|
||
The above question arises from the serialization process which is
|
||
performed on messages in order to send them through communication media.
|
||
If the communication unit is a message, these problems have no meaning.
|
||
If the units are computer "words" then one may ask in which order these
|
||
words are sent, what is their size, but not in which order the elements
|
||
of these words are sent, since they are sent virtually "at-once". If
|
||
the unit of transmission is an 8-bit byte, similar questions about bytes
|
||
are meaningful, but not the order of the elementary particles which
|
||
constitute these bytes.
|
||
|
||
If the units of communication are bits, the "atoms" ("quarks"?) of
|
||
computation, then the only meaningful question is the order in which
|
||
bits are sent.
|
||
|
||
Obviously, this is actually the case for serial transmission. Most
|
||
modern communication is based on a single stream of information
|
||
("bit-stream"). Hence, bits, rather than bytes or words, are the units
|
||
of information which are actually transmitted over the communication
|
||
channels such as wires and satellite connections.
|
||
|
||
Even though a great deal of effort, in both hardware and software, is
|
||
dedicated to giving the appearance of byte or word communication, the
|
||
basic fact remains: bits are communicated.
|
||
|
||
Computer memory may be viewed as a linear sequence of bits, divided into
|
||
bytes, words, pages and so on. Each unit is a subunit of the next
|
||
level. This is, obviously, a hierarchical organization.
|
||
|
||
|
||
If the order is consistent, then such a sequence may be communicated
|
||
successfully while both parties maintain their freedom to treat the bits
|
||
as a set of groups of any arbitrary size. One party may treat a message
|
||
as a "page", another as so many "words", or so many "bytes" or so many
|
||
bits. If a consistent bit order is used, the "chunk-size" is of no
|
||
consequence.
|
||
|
||
If an inconsistent bit order is used, the chunk size must be understood
|
||
and agreed upon by all parties. We will demonstrate some popular but
|
||
inconsistent orders later.
|
||
|
||
In a consistent order, the bit-order, the byte-order, the word-order,
|
||
the page-order, and all the other higher level orders are all the same.
|
||
Hence, when considering a serial bit-stream, along a communication line
|
||
for example, the "chunk" size which the originator of that stream has in
|
||
mind is not important.
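
The claim about chunk size can be checked with a minimal Python sketch. The sample value 0x12345678 and the helper names are illustrative assumptions, not anything taken from a real protocol: a 32-bit quantity is serialized as one word, as two 16-bit halves, and as four bytes, under each consistent order and under one inconsistent order.

```python
def bits_wide_end_first(value, width):
    """Bits of `value`, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def bits_narrow_end_first(value, width):
    """Bits of `value`, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def serialize(chunks, width, bit_order):
    """Send the chunks in the given sequence, each with the given bit order."""
    stream = []
    for chunk in chunks:
        stream.extend(bit_order(chunk, width))
    return stream


# Consistent Big-Endian order: wide end first at every level.
big_word = serialize([0x12345678], 32, bits_wide_end_first)
big_halves = serialize([0x1234, 0x5678], 16, bits_wide_end_first)
big_bytes = serialize([0x12, 0x34, 0x56, 0x78], 8, bits_wide_end_first)

# Consistent Little-Endian order: narrow end first at every level.
little_word = serialize([0x12345678], 32, bits_narrow_end_first)
little_bytes = serialize([0x78, 0x56, 0x34, 0x12], 8, bits_narrow_end_first)

# Inconsistent order: bytes wide end first, bits narrow end first.
mixed_bytes = serialize([0x12, 0x34, 0x56, 0x78], 8, bits_narrow_end_first)

print(big_word == big_halves == big_bytes)      # chunk size does not matter
print(little_word == little_bytes)              # likewise for the other camp
print(mixed_bytes in (big_word, little_word))   # the mixed stream matches neither
```

Under either consistent order the receiver may regroup the bits however it likes; only the mixed stream forces both sides to agree on the 8-bit chunk size.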
|
||
|
||
There are two possible consistent orders. One is starting with the
|
||
narrow end of each word (aka "LSB") as the Little-Endians do, or
|
||
starting with the wide end (aka "MSB") as their rivals, the Big-Endians,
|
||
do.
|
||
|
||
In this note we usually use the following sample numbers: a "word" is a
|
||
32-bit quantity and is designated by a "W", and a "byte" is an 8-bit
|
||
quantity which is designated by a "C" (for "Character", not to be
|
||
confused with "B" for "Bit").
|
||
|
||
|
||
|
||
|
||
MEMORY ORDER
|
||
|
||
The first word in memory is designated as W0, by both regimes.
|
||
Unfortunately, the harmony goes no further.
|
||
|
||
The Little-Endians assign B0 to the LSB of the words and B31 is the MSB.
|
||
The Big-Endians do just the opposite, B0 is the MSB and B31 is the LSB.
|
||
|
||
By the way, if mathematicians had their way, every sequence would be
|
||
numbered from ZERO up, not from ONE, as is traditionally done. If so,
|
||
the first item would be called the "zeroth"....
|
||
|
||
Since most computers are not built by mathematicians, it is no wonder
|
||
that some computers designate bits from B1 to B32, in either the
|
||
Little-Endians' or the Big-Endians' order. These people probably would
|
||
like to number their words from W1 up, just to be consistent.
|
||
|
||
Back to the main theme. We would like to illustrate the hierarchically
|
||
consistent order graphically, but first we have to decide about the
|
||
order in which computer words are written on paper. Do they go from
|
||
left to right, or from right to left?
|
||
|
||
|
||
The English language, like most modern languages, suggests that we lay
|
||
these computer words on paper from left to right, like this:
|
||
|
||
|---word0---|---word1---|---word2---|....
|
||
|
||
In order to be consistent, B0 should be to the left of B31. If the
|
||
bytes in a word are designated as C0 through C3 then C0 is also to the
|
||
left of C3. Hence we get:
|
||
|
||
|---word0---|---word1---|---word2---|....
|C0,C1,C2,C3|C0,C1,C2,C3|C0,C1,C2,C3|.....
|B0......B31|B0......B31|B0......B31|......
|
||
|
||
If we also use the traditional convention, as introduced by our
|
||
numbering system, the wide-end is on the left and the narrow-end is on
|
||
the right.
|
||
|
||
Hence, the above is a perfectly consistent view of the world as depicted
|
||
by the Big-Endians. Significance consistently decreases as the item
|
||
numbers (addresses) increase.
|
||
|
||
Many computers share with the Big-Endians this view about order. In
|
||
many of their diagrams the registers are connected such that when the
|
||
word W(n) is shifted right, its LSB moves into the MSB of word W(n+1).
|
||
|
||
English text strings are stored in the same order, with the first
|
||
character in C0 of W0, the next in C1 of W0, and so on.
|
||
|
||
This order is very consistent with itself and with the English language.
|
||
|
||
On the other hand, the Little-Endians have their view, which is
|
||
different but also self-consistent.
|
||
|
||
They believe that one should start with the narrow end of every word,
|
||
and that low addresses are of lower order than high addresses.
|
||
Therefore they put their words on paper as if they were written in
|
||
Hebrew, like this:
|
||
|
||
...|---word2---|---word1---|---word0---|
|
||
|
||
When they add the bit order and the byte order they get:
|
||
|
||
  ...|---word2---|---word1---|---word0---|
 ....|C3,C2,C1,C0|C3,C2,C1,C0|C3,C2,C1,C0|
.....|B31......B0|B31......B0|B31......B0|
|
||
|
||
In this regime, when word W(n) is shifted right, its LSB moves into the
|
||
MSB of word W(n-1).
|
||
|
||
|
||
English text strings are stored in the same order, with the first
|
||
character in C0 of W0, the next in C1 of W0, and so on.
|
||
|
||
This order is very consistent with itself, with the Hebrew language, and
|
||
(more importantly) with mathematics, because significance increases with
|
||
increasing item numbers (address).
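
As a small sketch of the two memory orders, Python's built-in int.to_bytes can lay out the same 32-bit quantity both ways. The sample value 0x0A0B0C0D is an arbitrary assumption; index 0 of each result plays the role of C0, the lowest byte address.

```python
value = 0x0A0B0C0D

# Big-Endians: the wide end sits at the lowest address (C0).
big = value.to_bytes(4, "big")

# Little-Endians: the narrow end sits at the lowest address (C0).
little = value.to_bytes(4, "little")

for address in range(4):
    print(f"C{address}: big={big[address]:02X} little={little[address]:02X}")

# A character string is laid out the same way by both camps:
# first character in C0, the next in C1, and so on.
text = "JOHN".encode("ascii")
print([chr(byte) for byte in text])
```

Reading the bytes back with the matching order (int.from_bytes) recovers the original value; reading them with the other order does not.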
|
||
|
||
It has the disadvantage that English character streams appear to be
|
||
written backwards; this is only an aesthetic problem but, admittedly, it
|
||
looks funny, especially to speakers of English.
|
||
|
||
In order to avoid receiving strange comments about this order, the
|
||
Little-Endians pretend that they are Chinese, and write the bytes, not
|
||
right-to-left but top-to-bottom, like:
|
||
|
||
C0: "J"
C1: "O"
C2: "H"
C3: "N"
..etc..
|
||
|
||
Note that there is absolutely no specific significance whatsoever to the
|
||
notion of "left" and "right" in bit order in a computer memory. One
|
||
could think about it as "up" and "down" for example, or mirror it by
|
||
systematically interchanging all the "left"s and "right"s. However,
|
||
this notion stems from the concept that computer words represent
|
||
numbers, and from the old mathematical tradition that the wide-end of a
|
||
number (aka the MSB) is called "left" and the narrow-end of a number is
|
||
called "right".
|
||
|
||
This mathematical convention is the point of reference for the notion of
|
||
"left" and "right".
|
||
|
||
It is easy to determine whether any given computer system was designed
|
||
by Little-Endians or by Big-Endians. This is done by watching the way
|
||
the registers are connected for the "COMBINED-SHIFT" operation and for
|
||
multiple-precision arithmetic like integer products; also by watching
|
||
how these quantities are stored in memory; and obviously also by the
|
||
order in which bytes are stored within words. Don't let the B0-to-B31
|
||
direction fool you!! Most computers were designed by Big-Endians, who
|
||
under the threat of criminal prosecution pretended to be Little-Endians,
|
||
rather than seeking exile in Blefuscu. They did it by using the
|
||
B0-to-B31 convention of the Little-Endians, while keeping the
|
||
Big-Endians' conventions for bytes and words.
|
||
|
||
The PDP10 and the 360, for example, were designed by Big-Endians: their
|
||
bit order, byte-order, word-order and page-order are the same. The same
|
||
order also applies to long (multi-word) character strings and to
|
||
multiple precision numbers.
|
||
|
||
|
||
|
||
Next, let's consider the new M68000 microprocessor. Its way of storing
|
||
a 32-bit number, xy, a 16-bit number, z, and the string "JOHN" in its
|
||
16-bit words is shown below (S = sign bit, M = MSB, L = LSB):
|
||
|
||
 SMxxxxxxx yyyyyyyyL SMzzzzzzL "J"  "O"  "H"  "N"
|--word0--|--word1--|--word2--|--word3--|--word4--|....
|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|.....
|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|......
|
||
|
||
The M68000 always has on the left (i.e., LOWER byte- or word-address)
|
||
the wide-end of numbers in any of the various sizes which it may use: 4
|
||
(BCD), 8, 16 or 32 bits.
|
||
|
||
Hence, the M68000 is a consistent Big-Endian, except for its bit
|
||
designation, which is used to camouflage its true identity. Remember:
|
||
the Big-Endians were the outlaws.
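
The same layout can be imitated with Python's standard struct module, whose ">" prefix requests big-endian byte order with no padding. This is only a sketch of the byte arrangement in the diagram; the values chosen for xy and z are made up for illustration.

```python
import struct

xy = 0x12345678   # hypothetical 32-bit number
z = 0x0ABC        # hypothetical 16-bit number

# ">" = big-endian, I = 32-bit unsigned, H = 16-bit unsigned, 4s = 4 raw bytes.
record = struct.pack(">IH4s", xy, z, b"JOHN")

# Split the ten bytes into five 16-bit words, lowest address first.
words = [record[i:i + 2] for i in range(0, len(record), 2)]
for number, word in enumerate(words):
    print(f"word{number}: C0={word[0]:02X} C1={word[1]:02X}")
```

The wide end of each number lands in the lower-addressed byte and word, and the characters of the string follow in reading order.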
|
||
|
||
Let's look next at the PDP11 order, since this is the first computer to
|
||
claim to be a Little-Endian. Let's again look at the way data is stored
|
||
in memory:
|
||
|
       "N"  "H"  "O"  "J"  SMzzzzzzL SMyyyyyyL SMxxxxxxL
  ....|--word4--|--word3--|--word2--|--word1--|--word0--|
 .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|
|
||
|
||
The PDP11 does not have an instruction to move 32-bit numbers. Its
|
||
multiplication products are 32-bit quantities created only in the
|
||
registers, and may be stored in memory in any way. Therefore, the
|
||
32-bit quantity, xy, was not shown in the above diagram.
|
||
|
||
Hence, the above order is a Little-Endians' consistent order. The PDP11
|
||
always stores on the left (i.e., HIGHER bit- or byte-address) the
|
||
wide-end of numbers of any of the sizes which it may use: 8 or 16 bits.
|
||
|
||
However, due to some infiltration from the other camp, the registers of
|
||
this Little-Endian's marvel are treated in the Big-Endians' way: a
|
||
double length operand (32-bit) is placed with its MSB in the lower
|
||
address register and the LSB in the higher address register. Hence,
|
||
when depicted on paper, the registers have to be put from left to right,
|
||
with the wide end of numbers in the LOWER-address register. This
|
||
affects the integer multiplication and division, the combined-shifts and
|
||
more. Admittedly, Blefuscu scores on this one.
|
||
|
||
Later, floating-point hardware was introduced for the PDP11/45.
|
||
|
||
Floating-point numbers are represented by either 32- or 64-bit
|
||
quantities, which are 2 or 4 PDP11 words. The wide end is the one with
|
||
the sign bit(s), the exponent and the MSB of the fraction. The narrow
|
||
end is the one with the LSB of the fraction. On paper these formats are
|
||
clearly shown with the wide end on the left and the narrow on the right,
|
||
according to the centuries old mathematical conventions. On page 12-3
|
||
|
||
|
||
of the PDP11/45 processor handbook, [3], there is a cute graphical
|
||
demonstration of this order, with the word "FRACTION" split over all the
|
||
2 or the 4 words which are used to store it.
|
||
|
||
However, due to some oversights in the security screening process, the
|
||
Blefuscuians took over, again. They assigned, as they always do, the
|
||
wide end to the LOWer addresses in memory, and the narrow to the HIGHer
|
||
addresses.
|
||
|
||
Let "xy" and "abcd" be 32- and 64-bit floating-point numbers,
|
||
respectively. Let's look how these numbers are stored in memory:
|
||
|
||
       ddddddddL ccccccccc bbbbbbbbb SMaaaaaaa yyyyyyyyL SMxxxxxxx
  ....|--word5--|--word4--|--word3--|--word2--|--word1--|--word0--|
 .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|
|
||
|
||
Well, Blefuscu scores many points for this. The above reference in [3]
|
||
does not even try to camouflage it by any Chinese notation.
|
||
|
||
Encouraged by this success, as minor as it is, the Blefuscuians tried to
|
||
pull another fast one. This time it was on the VAX, the sacred machine
|
||
which all the Little-Endians worship.
|
||
|
||
Let's look at the VAX order. Again, we look at the way the above data
|
||
(with xy being a 32-bit integer) is stored in memory:
|
||
|
       "N"  "H"  "O"  "J"  SMzzzzzzL SMxxxxxxx yyyyyyyyL
   ...ng2-------|-------long1-------|-------long0-------|
  ....|--word4--|--word3--|--word2--|--word1--|--word0--|
 .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|
|
||
|
||
What a beautifully consistent Little-Endians' order this is !!!
|
||
|
||
So, what about the infiltrators? Did they completely fail in carrying
|
||
out their mission? Since the integer arithmetic was closely guarded
|
||
they attacked the floating point and the double-floating which were
|
||
already known to be easy prey.
|
||
|
||
|
||
Let's look, again, at the way the above data is stored, except that now
|
||
the 32-bit quantity xy is a floating point number: now this data is
|
||
organized in memory in the following Blefuscuian way:
|
||
|
       "N"  "H"  "O"  "J"  SMzzzzzzL yyyyyyyyL SMxxxxxxx
   ...ng2-------|-------long1-------|-------long0-------|
  ....|--word4--|--word3--|--word2--|--word1--|--word0--|
 .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|
|
||
|
||
Blefuscu scores again. The VAX is found guilty, however with the
|
||
explanation that it tries to be compatible with the PDP11.
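
The mixed order in the diagram above can be sketched in Python at the level of raw bytes. The function name and the sample bit pattern are assumptions for illustration, and no attempt is made to encode an actual floating-point format; the sketch only shows where the four bytes of a 32-bit pattern end up when the wide 16-bit word goes to the lower address while the bytes inside each word stay narrow end first.

```python
def word_swapped_bytes(value32):
    """Lay out a 32-bit pattern with its wide 16-bit word at the lower
    address, each word itself stored narrow end (low byte) first."""
    high_word = (value32 >> 16) & 0xFFFF
    low_word = value32 & 0xFFFF
    return high_word.to_bytes(2, "little") + low_word.to_bytes(2, "little")


pattern = 0x0A0B0C0D   # hypothetical 32-bit pattern, wide end is 0x0A

layouts = {
    "consistent little": pattern.to_bytes(4, "little"),
    "consistent big": pattern.to_bytes(4, "big"),
    "mixed (words wide first)": word_swapped_bytes(pattern),
}
for name, layout in layouts.items():
    print(f"{name:26s}", " ".join(f"{byte:02X}" for byte in layout))
```

The mixed layout agrees with neither consistent order, which is why a receiver has to know the 16-bit chunk size to undo it.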
|
||
|
||
Having found themselves there, the VAXians found a way around this
|
||
unaesthetic appearance: the VAX literature (e.g., p. 10 of [4])
|
||
describes this order by using the Chinese top-to-bottom notation, rather
|
||
than an embarrassing left-to-right or right-to-left one. This page is a
|
||
marvel. One has to admire the skillful way in which some quantities are
|
||
shown in columns 8-bit wide, some in 16 and others in 32, all in order to
|
||
avoid the egg-on-the-face problem.....
|
||
|
||
By the way, some engineering-type people complain about the "Chinese"
|
||
(vertical) notation because usually the top (aka "up") of the diagrams
|
||
corresponds to "low"-memory (low addresses). However, anyone who was
|
||
brought up by computer scientists, rather than by botanists, knows that
|
||
trees grow downward, having their roots at the top of the page and their
|
||
leaves down below. Computer scientists seldom remember which way "up"
|
||
really is (see 2.3 of [5], pp. 305-309).
|
||
|
||
Having scored so easily in the floating point department, the
|
||
Blefuscuians moved to new territories: Packed-Decimal. The VAX is also
|
||
capable of using 4-bit-chunk decimal arithmetic, which is similar to the
|
||
well known BCD format.
|
||
|
||
The Big-Endians struck again, and without any resistance got their way.
|
||
The decimal number 12345678 is stored in the VAX memory in this order:
|
||
|
||
       7 8  5 6  3 4  1 2
   ...|-------long0-------|
  ....|--word1--|--word0--|
 .....|-C1-|-C0-|-C1-|-C0-|
......|B15....B0|B15....B0|
|
||
|
||
This ugliness cannot be hidden even by the standard Chinese trick.
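
A short Python sketch reproduces the digit order shown in the diagram. The helper pack_decimal is hypothetical; it packs two decimal digits per byte with the leading digits at the lowest address, and it ignores any sign encoding a real packed-decimal format may carry, as the example above does.

```python
def pack_decimal(digits):
    """Pack an even-length string of decimal digits, two 4-bit digits per
    byte, with the leading (wide-end) digits at the lowest address."""
    if len(digits) % 2 != 0 or any(d not in "0123456789" for d in digits):
        raise ValueError("expected an even number of decimal digits")
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


packed = pack_decimal("12345678")

# In address order the digits read naturally ...
print(packed.hex())

# ... but drawn the Little-Endians' way (highest address on the left),
# the same four bytes show the digit pairs in reverse.
as_long = int.from_bytes(packed, "little")
print(f"{as_long:08X}")
```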
|
||
|
||
|
||
|
||
|
||
|
||
|
||
SUMMARY (of the Memory-Order section)
|
||
|
||
|
||
To the best of my knowledge only the Big-Endians of Blefuscu have built
|
||
systems with a consistent order which works across chunk-boundaries,
|
||
registers, instructions and memories. I failed to find a
|
||
Little-Endians' system which is totally consistent.
|
||
|
||
|
||
|
||
|
||
TRANSMISSION ORDER
|
||
|
||
|
||
In either of the consistent orders the first bit (B0) of the first byte
|
||
(C0) of the first word (W0) is sent first, then the rest of the bits of
|
||
this byte, then (in the same order) the rest of the bytes of this word,
|
||
and so on.
|
||
|
||
Such a sequence of 8 32-bit words, for example, may be viewed as either
|
||
4 long-words, 8 words, 32 bytes or 256 bits.
|
||
|
||
For example, some people treat the ARPA-internet-datagrams as a sequence
|
||
of 16-bit words whereas others treat them as either 8-bit byte streams
|
||
or sequences of 32-bit words. This has never been a source of
|
||
confusion, because the Big-Endians' consistent order has been assumed.
|
||
|
||
There are many ways to devise inconsistent orders. The two most popular
|
||
ones are the following and its mirror image. Under this order the first
|
||
bit to be sent is the LEAST significant bit (B0) of the MOST significant
|
||
byte (C0) of the first word, followed by the rest of the bits of this
|
||
byte, then the same right-to-left bit order inside the left-to-right
|
||
byte order.
|
||
|
||
Figure 1 shows the transmission order for the 4 orders which were
|
||
discussed above, the 2 consistent and the 2 inconsistent ones.
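
The four orders can also be enumerated in a few lines of Python. This is a sketch under stated assumptions: a 16-bit word made of two bytes, bit positions numbered by significance (0 is the LSB of the whole word), and descriptive names rather than the figure's numbering, which is not assumed here.

```python
def transmission_order(byte_first, bit_first, width=16):
    """List bit positions (0 = LSB of the word) in the order they are sent.

    byte_first and bit_first are each "wide" or "narrow", naming which
    end leads at the byte level and at the bit level.
    """
    byte_count = width // 8
    byte_indices = range(byte_count) if byte_first == "narrow" else range(byte_count - 1, -1, -1)
    order = []
    for byte_index in byte_indices:
        bit_indices = range(8) if bit_first == "narrow" else range(7, -1, -1)
        order.extend(8 * byte_index + bit for bit in bit_indices)
    return order


def is_consistent(order):
    """An order is consistent when significance only rises or only falls."""
    return order == sorted(order) or order == sorted(order, reverse=True)


for byte_first in ("wide", "narrow"):
    for bit_first in ("wide", "narrow"):
        order = transmission_order(byte_first, bit_first)
        label = "consistent" if is_consistent(order) else "inconsistent"
        print(f"bytes {byte_first}-end first, bits {bit_first}-end first: {label}")
        print("   ", order)
```

In the inconsistent order described above, the first position sent is 8: the least significant bit of the most significant byte.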
|
||
|
||
Those who use such an inconsistent order (or any other), and only those,
|
||
have to be concerned with the famous byte-order problem. If they can
|
||
pretend that their communication medium is really a byte-oriented link
|
||
then this inconsistency can be safely hidden under the rug.
|
||
|
||
A few years ago 8-bit microprocessors appeared and changed drastically
|
||
the way we do business. A few years later a wide variety of 8-bit
|
||
communication hardware (e.g., Z80-SIO and 2652) followed, all of which
|
||
operate in the Little-Endians' order.
|
||
|
||
|
||
Now a wave of 16-bit microprocessors has arrived. It is not
|
||
inconceivable that 16-bit communication hardware will become a reality
|
||
relatively soon.
|
||
|
||
Since the 16-bit communication gear will be provided by the same folks
|
||
who brought us the 8-bit communication gear, it is safe to expect these
|
||
two modes to be compatible with each other.
|
||
|
||
The only way to achieve this is by using the consistent Little-Endians
|
||
order, since all the existing gear is already in Little-Endians order.
|
||
|
||
We have already observed that the Little-Endians do not have consistent
|
||
memory orders for intra-computer organization.
|
||
|
||
IF the 16-bit communication link could be made to operate in any order,
|
||
consistent or not, which would give it the appearance of being a byte-
|
||
oriented link, THEN the Big-Endians could push (ask? hope? pray?) for an
|
||
order which transmits the bytes in left-to-right (i.e., wide-end first)
|
||
and use that as a basis for transmitting all quantities (except BCD) in
|
||
the more convenient Big-Endians format, with the most significant
|
||
portions leading the least significant, maintaining compatibility
|
||
between 16- and 32-bit communication, and more.
|
||
|
||
However, this is a big "IF".
|
||
|
||
Wouldn't it be nice if we could encapsulate the byte-communication and
|
||
forget all about the idiosyncrasies of the past, introduced by RS232 and
|
||
TELEX, of sending the narrow-end first?
|
||
|
||
I believe that it would be nice, but nice things do not necessarily
|
||
occur, especially if there is so much silicon against them.
|
||
|
||
Hence, our choice now is between (1) Big-Endians' computer-convenience
|
||
and (2) future compatibility between communication gear of different
|
||
chunk size.
|
||
|
||
I believe that this is the question, and we should address it as such.
|
||
|
||
Short term convenience considerations are in favor of the former, and
|
||
the long term ones are in favor of the latter.
|
||
|
||
Since the war between the Little-Endians and the Big-Endians is
|
||
imminent, let's count who is in whose camp.
|
||
|
||
The founders of the Little-Endians party are RS232 and TELEX, who stated
|
||
that the narrow-end is sent first. So do the HDLC and the SDLC
|
||
protocols, the Z80-SIO, Signetics-2652, Intel-8251, Motorola-6850 and
|
||
all the rest of the existing communication devices. In addition to
|
||
these protocols and chips the PDP11s and the VAXes have already pledged
|
||
their allegiance to this camp, and deserve to be on this roster.
|
||
|
||
|
||
The HDLC protocol is a full fledged member of this camp because it sends
|
||
all of its fields with the narrow end first, as is specifically defined
|
||
in Table 1/X.25 (Frame formats) in section 2.2.1 of Recommendation X.25
|
||
(see [2]). A close examination of this table reveals that the bit order
|
||
of transmission is always 1-to-8. Always, except the FCS (checksum)
|
||
field, which is the only 16-bit quantity in the byte-oriented protocol.
|
||
|
||
The FCS is sent in the 16-to-1 order. How did the Blefuscuians manage
|
||
to pull off such a fiasco?! The answer is beyond me. Anyway, anyone
|
||
who designates bits as 1-to-8 (instead of 0-to-7) must be gullible to
|
||
such tricks.
|
||
|
||
The Big-Endians have the PDP10's, 370's, ALTO's and Dorado's...
|
||
|
||
An interesting creature is the ARPANet-IMP. The documentation of its
|
||
standard host interface (aka "LH/DH") states that "The high order bit of
|
||
each word is transmitted first" (p. 4-4 of [1]), hence, it is a
|
||
Big-Endian. This is very convenient, and causes no confusion between
|
||
diagrams which are either 32- (e.g., on p. 3-25) or 16-bit wide (e.g.,
|
||
on p. 5-14).
|
||
|
||
However, the IMP's Very Distant Host (VDH) interface is a Little-Endian.
|
||
|
||
The same document ([1], again, p. F-18), states that the data "must
|
||
consist of an even number of 8-bit bytes. Further, considering each pair
|
||
of bytes as a 16-bit word, the less significant (right) byte is sent
|
||
first".
|
||
|
||
In order to make this even more clear, p. F-23 states "All bytes (data
|
||
bytes too) are transmitted least significant (rightmost) bit first".
|
||
|
||
Hence, both camps may claim to have this schizophrenic double-agent in
|
||
their camp.
|
||
|
||
Note that the Lilliputians' camp includes all the who's-who of the
|
||
communication world, unlike the Blefuscuians' camp which is very much
|
||
oriented toward the computing world.
|
||
|
||
Both camps have already adopted the slogan "We'd rather fight than
|
||
switch!".
|
||
|
||
I believe they mean it.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
SUMMARY (of the Transmission-Order section)
|
||
|
||
|
||
There are two camps each with its own language. These languages are as
|
||
compatible with each other as any Semitic and Latin languages are.
|
||
|
||
All Big-Endians can talk to each other with relative ease.
|
||
|
||
So can all the Little-Endians, even though there are some differences
|
||
among the dialects used by different tribes.
|
||
|
||
There is no middle ground. Only one end can go first.
|
||
|
||
|
||
|
||
|
||
CONCLUSION
|
||
|
||
|
||
Each camp tries to convert the other. Like all the religious wars of
|
||
the past, logic is not the decisive tool. Power is. This holy war is
|
||
not the first one, and probably will not be the last one either.
|
||
|
||
The "Be reasonable, do it my way" approach does not work. Neither does
|
||
the Esperanto approach of "let's all switch to yet a new language".
|
||
|
||
Our communication world may split according to the language used. A
|
||
certain book (which is NOT mentioned in the references list) has an
|
||
interesting story about a similar phenomenon, the Tower of Babel.
|
||
|
||
Little-Endians are Little-Endians and Big-Endians are Big-Endians and
|
||
never the twain shall meet.
|
||
|
||
We would like to see some Gulliver standing up between the two islands,
|
||
forcing a unified communication regime on all of us. I do hope that my
|
||
way will be chosen, but I believe that, after all, which way is chosen
|
||
does not make too much difference. It is more important to agree upon
|
||
an order than which order is agreed upon.
|
||
|
||
How about tossing a coin ???
|
||
|
||
|
||
|
||
[ASCII diagram, layout lost: four panels labeled order (1) through order (4), each plotting time against bit position along an MSB-to-LSB axis.]
|
||
|
||
|
||
|
||
|
||
Figure 1: Possible orders, consistent: (1)+(2), inconsistent: (3)+(4).
|
||
|
||
|
||
|
||
R E F E R E N C E S
|
||
|
||
|
||
|
||
[1] Bolt Beranek & Newman.
|
||
Report No. 1822: Interface Message Processor.
|
||
Technical Report, BB&N, May, 1978.
|
||
|
||
[2] CCITT.
|
||
Orange Book. Volume VIII.2: Public Data Networks.
|
||
International Telecommunication Union, Geneva, 1977.
|
||
|
||
[3] DEC.
|
||
PDP11 04/05/10/35/40/45 processor handbook.
|
||
Digital Equipment Corp., 1975.
|
||
|
||
[4] DEC.
|
||
VAX11 - Architecture Handbook.
|
||
Digital Equipment Corp., 1979.
|
||
|
||
[5] Knuth, D. E.
|
||
The Art of Computer Programming. Volume I: Fundamental
|
||
Algorithms.
|
||
Addison-Wesley, 1968.
|
||
|
||
[6] Swift, Jonathan.
|
||
Gulliver's Travels.
|
||
Unknown publisher, 1726.
|
||
|
||
|
||
OTHER SLIGHTLY RELATED TOPICS (IF AT ALL)
|
||
|
||
|
||
not necessarily for inclusion in this note
|
||
|
||
|
||
|
||
|
||
|
||
Who's on first? Zero or One ??
|
||
|
||
People start counting from the number ONE. The very word FIRST is
|
||
abbreviated into the symbol "1st" which indicates ONE, but this is a
|
||
very modern notation. The older notions do not necessarily support this
|
||
relationship.
|
||
|
||
In English and French, the word "first" is not derived from the word
|
||
"one" but from an old word for "prince" (which means "foremost").
|
||
Similarly, the English word "second" is not derived from the number
|
||
"two" but from an old word which means "to follow". Obviously there is
|
||
a close relation between "third" and "three", "fourth" and "four" and
|
||
so on.
|
||
|
||
Similarly, in Hebrew, for example, the word "first" is derived from the
|
||
word "head", meaning "the foremost", but not specifically No. 1. The
|
||
Hebrew word for "second" is specifically derived from the word "two".
|
||
The same for three, four and all the other numbers.
|
||
|
||
However, people have, for a very long time, counted from the number One,
|
||
not from Zero. As a matter of fact, the inclusion of Zero as a
|
||
full-fledged member of the set of all numbers is a relatively modern
|
||
concept.
|
||
|
||
Zero is one of the most important numbers mathematically. It has many
|
||
important properties, such as being a multiple of any integer.
|
||
|
||
A nice mathematical theorem states that for any basis, b, the first b^N
|
||
(b to the Nth power) non-negative integers are represented by exactly N
|
||
digits (leading zeros included). This is true if and only if the count
|
||
starts with Zero (hence, 0 through b^N-1), not with One (for 1 through
|
||
b^N).
|
||
|
||
This theorem is the basis of computer memory addressing. Typically, 2^N
|
||
cells are addressed by an N-bit addressing scheme. Starting the count
|
||
from One, rather than Zero, would cause either the loss of one memory
|
||
cell, or an additional address line. Since either price is too
|
||
expensive, computer engineers agree to use the mathematical notation of
|
||
starting with Zero. Good for them!
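
As a rough illustration of the cost, the following Python sketch counts
the address lines needed for 2^16 cells under each convention. It
assumes addresses are plain unsigned binary numbers; the function name
address_lines is hypothetical.

```python
def address_lines(highest_address):
    """Number of address bits needed to express the highest address used."""
    return max(1, highest_address.bit_length())


cells = 2 ** 16

lines_from_zero = address_lines(cells - 1)   # addresses 0 through 65535
lines_from_one = address_lines(cells)        # addresses 1 through 65536

# With only 16 lines and counting from One, the cell numbered 65536
# cannot be reached, so one cell is lost.
reachable_from_one = (2 ** 16) - 1
```

Under that assumption, counting from Zero needs 16 lines, while counting
from One needs 17 lines or gives up one cell.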
|
||
|
||
The designers of the 1401 were probably ashamed to have address-0 and
|
||
hid it from the users, pretending that the memory started at address-1.
|
||
|
||
This is probably the reason that all memories start at address-0, even
|
||
those of systems which count bits from B1 up.
|
||
|
||
Communication engineers, like most "normal" people, start counting from
|
||
the number One. They never suffer by having to lose a memory cell, for
|
||
example. Therefore, they are happily counting 1-to-8, and not 0-to-7 as
|
||
computer people learn to do.
|
||
|
||
|
||
|
||
ORDER OF NUMBERS.
|
||
|
||
In English, we write numbers in Big-Endians' left-to-right order. I
|
||
believe that this is because we SAY numbers in the Big-Endians' order,
|
||
and because we WRITE English in Left-to-right order.
|
||
|
||
Mathematically there is a lot to be said for the Little-Endians' order.
|
||
|
||
Serial comparators and dividers prefer the Big-Endians' order. Serial
|
||
adders and multipliers prefer the Little-Endians' order.
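
A minimal Python sketch of why serial hardware cares about the order.
It assumes unsigned values and equal-length bit streams; all function
names are illustrative. The adder consumes bits least significant first
(the Little-Endians' order) because the carry moves upward; the
comparator consumes bits most significant first (the Big-Endians' order)
because the first differing bit settles the answer.

```python
def to_bits_lsb_first(value, width):
    """Split an unsigned value into bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits_lsb_first(bits):
    """Rebuild an unsigned value from bits given least significant first."""
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add(a_bits, b_bits):
    """Add two streams arriving LSB first, keeping only a one-bit carry."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        out.append(total & 1)
        carry = total >> 1
    out.append(carry)          # final carry becomes the top bit
    return out


def serial_compare(a_bits, b_bits):
    """Compare two streams arriving MSB first; returns -1, 0 or 1."""
    for a, b in zip(a_bits, b_bits):
        if a != b:
            return 1 if a > b else -1   # first difference decides
    return 0


total = from_bits_lsb_first(
    serial_add(to_bits_lsb_first(13, 4), to_bits_lsb_first(11, 4)))
order = serial_compare(to_bits_lsb_first(13, 4)[::-1],
                       to_bits_lsb_first(11, 4)[::-1])
```

Fed in the other order, the adder would need to see every bit before it
could commit to any output bit, and the comparator could not stop early.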
|
||
|
||
When was the common Big-Endians' order adopted by most modern languages?
|
||
|
||
In the Bible, numbers are described in words (like "seven") not by
|
||
digits (like "7") which were "invented" nearly a thousand years after
|
||
the Bible was written. In the old Hebrew Bible many numbers are
|
||
expressed in the Little-Endians' order (like "Seven and Twenty and
|
||
Hundred") but many are in the Big-Endians' order as well.
|
||
|
||
Whenever the Bible is translated into English the contemporary English
|
||
order is used. For example, the above number appears in that order in
|
||
the Hebrew source of The Book of Esther (1:1). In the King James
|
||
Version it is (in English) "Hundred and Seven and Twenty". In the
|
||
modern Revised American Standard Version of the Bible this number is
|
||
simply "One Hundred and Twenty-Seven".
|
||
|
||
|
||
|
||
INTEGERS vs. FRACTIONS
|
||
|
||
Computer designers treat fixed-point multiplication in one of two ways, as
|
||
an integer-multiplication or as a fractional-multiplication.
|
||
|
||
The reason is that when two 16-bit numbers, for example, are multiplied,
|
||
the result is a 31-bit number in a 32-bit field. Integers are right
|
||
justified; fractions are left justified. The entire difference is only
|
||
a single 1-bit shift. As small as it is, this is an important
|
||
difference.
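
The one-bit shift can be seen in a short Python sketch. It assumes
16-bit two's-complement operands, with fractions scaled by 2^15 and the
32-bit fractional result scaled by 2^31; the function names are
illustrative only.

```python
def multiply_as_integers(a, b):
    """Integer view: the raw product sits right-justified in 32 bits."""
    return a * b


def multiply_as_fractions(a, b):
    """Fraction view: the same raw product, shifted left by one bit so
    that it is left-justified (scaled by 2**31 instead of 2**30)."""
    return (a * b) << 1


half = 1 << 14                                # 0.5 when scaled by 2**15

raw = multiply_as_integers(half, half)        # 2**28
frac = multiply_as_fractions(half, half)      # 2**29
quarter = frac / 2 ** 31                      # 0.5 * 0.5 as a fraction

largest = multiply_as_integers(32767, 32767)  # needs 30 magnitude bits
```

Under these assumptions the only product that does not fit in 31 bits is
(-32768) * (-32768), which equals 2^30.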
|
||
|
||
Hence, computers are wired differently for these kinds of
|
||
multiplications. The addition/subtraction operation is the same for
|
||
either integer/fraction operation.
|
||
|
||
If the LSB is B0 then the value of a number is SIGMA<B(i)*[(2)^i]>,
|
||
for i=0,15, in the above example. This is, obviously, an integer.
|
||
|
||
If the MSB is B0 then the value of a number is SIGMA<B(i)*[(1/2)^i]>,
|
||
for i=0,15. This is, obviously, a fraction.
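
Both sums can be evaluated directly. This Python sketch follows the two
formulas as printed, treating the bits as unsigned (sign handling is
deliberately ignored); the function names and the sample bit pattern are
assumptions made for illustration.

```python
from fractions import Fraction


def value_lsb_is_b0(bits):
    """SIGMA<B(i)*[(2)^i]>: bits[0] is the least significant bit."""
    return sum(bit * 2 ** i for i, bit in enumerate(bits))


def value_msb_is_b0(bits):
    """SIGMA<B(i)*[(1/2)^i]>: bits[0] is the most significant bit."""
    return sum(bit * Fraction(1, 2) ** i for i, bit in enumerate(bits))


# The same 16 bits B0..B15, read under each convention.
bits = [1, 1, 0, 1] + [0] * 12

as_integer = value_lsb_is_b0(bits)    # 1 + 2 + 8
as_fraction = value_msb_is_b0(bits)   # 1 + 1/2 + 1/8
```

The same bit pattern is the integer 11 under the first reading and the
fraction 13/8 under the second.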
|
||
|
||
Hence, after multiplication the Integerites would typically keep B0-B15,
|
||
the LSH (Least Significant Half), and discard the MSH, after verifying
|
||
that there is no overflow into it. The Fractionites would also keep
|
||
B0-B15, which is the MSH, and discard the LSH.
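
A hedged Python sketch of the two habits, again assuming 16-bit
two's-complement operands and fractions scaled by 2^15. The function
names are hypothetical, and raising an exception on overflow is just one
possible way to "verify".

```python
def keep_integer_half(a, b):
    """Integerite: keep the least significant half of the product,
    after checking that nothing overflowed into the other half."""
    product = a * b
    if not -(1 << 15) <= product < (1 << 15):
        raise OverflowError("product does not fit in 16 bits")
    return product


def keep_fraction_half(a, b):
    """Fractionite: left-justify the product (<< 1), then keep the
    most significant half by dropping the low 16 bits (>> 16)."""
    return ((a * b) << 1) >> 16


small = keep_integer_half(300, 100)            # 30000 still fits

half = 1 << 14                                 # 0.5 when scaled by 2**15
quarter = keep_fraction_half(half, half)       # 0.25 when scaled by 2**15
```

The Fractionite loses only low-order precision, while the Integerite
must treat a non-empty upper half as an error.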
|
||
|
||
One could expect Integerites to be Little-Endians, and Fractionites to
|
||
be Big-Endians. I do not believe that the world is that consistent.
|
||
|
||
|
||
|
||
SWIFT's POINT
|
||
|
||
It may be interesting to notice that the point which Jonathan Swift
|
||
tried to convey in Gulliver's Travels is exactly the opposite of the
|
||
point of this note.
|
||
|
||
Swift's point is that the difference between breaking the egg at the
|
||
little-end and breaking it at the big-end is trivial. Therefore, he
|
||
suggests that everyone do it in his own preferred way.
|
||
|
||
We agree that the difference between sending eggs with the little- or
|
||
the big-end first is trivial, but we insist that everyone must do it in
|
||
the same way, to avoid anarchy. Since the difference is trivial we may
|
||
choose either way, but a decision must be made.
|
||
|
||
|
||
*****
|
||
|
||
An edited version of this note appears in Computer Magazine (IEEE)
|
||
of October 1981.
|
||
|
||
|
||
*****
|
||
Meta-historical note: This was previously posted with the following headers:
|
||
|
||
From: gnu@hoptoad.uucp (John Gilmore)
|
||
Newsgroups: comp.sys.m68k,comp.arch,comp.sys.intel
|
||
Subject: Byte Order: On Holy Wars and a Plea for Peace
|
||
Date: 30 Nov 86 01:29:46 GMT
|
||
|
||
--
|
||
-- bob, mon (bobmon@iuvax.cs.indiana.edu)
|
||
-- RAMontante, Computer Science Dept., Indiana University, Bloomington
|
||
-- "In this position, the skier is flying in a complete stall..."
|
||
|
||
|
||
|