


               CISC  :  The Intel 80486 vs. The Motorola MC68040
              ---------------------------------------------------

               Source : Advanced Microprocessors by Daniel Tabak

               Scribe : X-> Mike <-X  -  July '92

              ---------------------------------------------------


System Comparison

        Most of the space in this text is dedicated to the most recent
advanced CISC microprocessors, the top current products within their families:
the Intel 80486 and the Motorola MC68040.   Both belong to the latest
generation, with 1.2 million transistors per chip.   It therefore makes sense
to compare the two.   It would be unfair to compare the NS32532 with them,
since the NS32532 belongs to an earlier generation and it is not in the same
class as the 80486 and MC68040.

        A selection of points of comparison between the 80486 and the MC68040
is listed in Table 1.1.   Looking carefully at the table, one can perceive
only a single line identically marked in both columns: both chips have an
on-chip FPU, conforming to the IEEE 754-1985 standard.   All other data are
different, although quite close in some instances.   The points of difference
between the 80486 and the MC68040 will be discussed next in some detail.


         Table 1.1   Comparison of Intel 80486 and Motorola MC68040
-------------------------------------------------------------------------------
       Feature                         Intel 80486       Motorola MC68040
-------------------------------------------------------------------------------
FPU on Chip                            Yes (IEEE)        Yes (IEEE)
CPU General-Purpose 32-bit Registers   8                 16; 8 Data/8 Address
FPU 80-bit Registers                   8 (stack)         8
MMU on Chip                            Yes               Yes; Dual: Data, Code
Cache on Chip                          8k Mixed          4k Data + 4k Code
Segmentation                           Yes               No
Paging                                 Yes; 4k/page      Yes; 4k or 8k/page
TLB (or ATC) size                      32 entries        64 entries in each:
                                                           Data, Code ATC
Levels of protection                   4                 2
Instruction pipeline stages            5                 6
Pins                                   168               179
-------------------------------------------------------------------------------


CPU General-Purpose Registers

        Both systems have 32-bit general-purpose registers; the 80486 has 8,
while the 68040 has double that number, namely 16.   There are advantages
(and disadvantages) to having a large register file.   The register file of
the 80486 is definitely too small to enjoy those advantages.   The problem
is exacerbated by the fact that the CPU registers of the 80486 are not
really as general purpose as one might wish.   In fact, all of them are
dedicated to certain special tasks, such as:

EAX, EDX    Dedicated to multiplication/division operations
EDX         Dedicated to some I/O operations
EBX, EBP    Dedicated to serve as base registers for some addressing modes
ECX         Dedicated to serve as a counter in LOOP instructions
ESP         Dedicated to serve as a stack pointer
ESI, EDI    Dedicated to serve as pointers in string instructions and as
            index registers in some addressing modes

On the other hand, on the MC68040 the eight 32-bit data registers D0 to D7
are genuinely general purpose without any restrictions or specific tasks
imposed on them.   Of the eight 32-bit address registers A0 to A7, only A7
is dedicated as a stack pointer.   The user is free to use the other seven
registers A0 to A6 in any possible way.

        From the point of view of the CPU register file, the MC68040 has a
very clear advantage.   It is much better equipped to retain intermediate
results during a program run, thus reducing CPU-memory traffic.   From this
standpoint, the MC68040 even has a slight edge over the VAX architecture.
The VAX (any VAX model) also has sixteen 32-bit general-purpose registers.
However, only 12 of those (as opposed to the 68040's 15) can be used freely
by the programmer.   Of the four VAX dedicated registers, one is used as a
program counter and another as a stack pointer.   The program counter is
completely separate on both the MC68040 and the 80486 and is not included in
the general-purpose registers.


FPU General-Purpose Registers

        Both systems have eight 80-bit registers, providing a large range for
floating-point number representation and a high level of precision.   The only
difference between the two is that the 80486 FPU registers are organized as a
stack, while those of the MC68040 are accessed directly, like its integer CPU
registers.   Because of the stack organization the 80486 might have a slight
edge from the standpoint of compiler generation (for that part of the compiler
dealing with floating-point operations).
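
        The compiler-side appeal of a register stack can be sketched with a
toy code generator.   This is only an illustration: the operation names
("push", "add", "mul") and the tuple encoding of expressions are hypothetical
and are not the actual 80486 FPU mnemonics.   The point is that a post-order
walk of the expression tree yields working stack code with no register
allocation step at all.

```python
def emit_stack_code(expr):
    """Emit toy stack-machine ops for an expression tree.

    expr is either a variable name (str) or a tuple (op, left, right).
    """
    if isinstance(expr, str):
        return [("push", expr)]
    op, left, right = expr
    # Post-order walk: operands first, then the operator.
    return emit_stack_code(left) + emit_stack_code(right) + [(op,)]


def run_stack_code(code, env):
    """Execute the toy ops on a Python list used as the register stack."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(env[instr[1]])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if instr[0] == "add" else a * b)
    return stack.pop()


# (a * b) + c
expr = ("add", ("mul", "a", "b"), "c")
code = emit_stack_code(expr)
value = run_stack_code(code, {"a": 2.0, "b": 3.0, "c": 4.0})
print(code)
print(value)
```

With directly addressed registers, as on the MC68040, the same compiler would
additionally have to decide which register holds each intermediate result.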


MMU on Chip

        The 80486 has a regular MMU on chip for the control and management of
its memory.   The MC68040 has two MMUs: one for code and one for data.   This
duality, supported by a separate operand data bus, allows the control unit to
fetch instructions and operands in parallel and enhances the handling of the
instruction pipeline.   Of course, there is only a single external bus leading
to the off-chip main memory (32-bit data, 32-bit address), and it is shared
by instructions and data operands.   With a reasonable on-chip cache hit
ratio, the off-chip bus would be used less often.


Cache on Chip

        The total on-chip cache of both systems is 8 kbytes.   Interestingly
enough, they have the same parameters: both are four-way set-associative with
16 bytes per line.   The difference is that while the 80486 on-chip 8k cache
is mixed, storing both code and data, the MC68040 cache is subdivided into two
equal parts: a 4-kbyte data cache and a 4-kbyte code cache.   Each cache is
controlled by the respective MMU, mentioned above.   The advantage, as in the
MMU case, is the provision of two parallel paths for code and data, resulting
in an overall speedup of operation.
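
        The cache parameters quoted above determine how many sets each cache
has.   A minimal sketch of that arithmetic follows; the function name is an
illustrative choice, and the sizes are the ones given in the text.

```python
KB = 1024


def cache_sets(total_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return total_bytes // (ways * line_bytes)


# 80486: one mixed 8-kbyte cache, four-way, 16 bytes per line.
sets_486 = cache_sets(8 * KB, ways=4, line_bytes=16)

# MC68040: each of the two 4-kbyte caches, four-way, 16 bytes per line.
sets_040 = cache_sets(4 * KB, ways=4, line_bytes=16)

print(sets_486, sets_040)
```

This gives 128 sets for the 80486 cache and 64 sets for each of the two
MC68040 caches.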


Segmentation

        The Intel 80x86 family implements segmentation, while the M68000 family
does not.   The earlier Intel systems (8086, 80286) were plagued by the upper
64-kbyte segment size limit.   Starting with the 80386, the segment size
can be made as high as 4 Gbytes (maximum size of the physical memory),
effectively removing the segmentation feature by the decision of the user.
Therefore, as far as segmentation is concerned, the 80486 and MC68040 are
comparable.   The 80486 has some edge, since it allows the user to implement
segmentation if needed and to take advantage of its benefits.


Paging

        The MMUs of both systems feature paged virtual memory management.
The 80486 offers a single standard page size of 4 kbytes.   This page size
is implemented in many other systems.   With a 4-kbyte page size, one can
arrange an address mapping where the page directory and the page tables also
have the standard page size of 4 kbytes (1024 = 2^10 entries, 4 bytes each).
Thus, the page directory and the page tables can be treated as entire pages
and placed within page frames in the memory.   This results in reduced
complexity in the MMU hardware and in the OS software, one of whose tasks is
to support the management of virtual memory.   The MC68040 offers two page
sizes, selectable by the user: 4 kbytes and 8 kbytes.   This tends to
complicate the MMU logic and the OS.   It is a good thing that Motorola got
rid of the other page size options available with its MC68851 paged MMU:
8 sizes ranging from 256 bytes to 32 kbytes, stepped by a factor of 2.   On the
other hand, the 8-kbyte per page option could be useful to a programmer dealing
with large modules of code exceeding 4 kbytes.
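
        The 80486 arrangement described above, with 1024 four-byte entries in
the page directory and in each page table, implies that a 32-bit address splits
into a 10-bit directory index, a 10-bit table index, and a 12-bit offset within
the 4-kbyte page.   The sketch below only illustrates that split; the function
and constant names are illustrative, and no actual table lookup is performed.

```python
PAGE_SIZE = 4096                    # 4-kbyte page
ENTRY_SIZE = 4                      # bytes per directory or table entry
ENTRIES = PAGE_SIZE // ENTRY_SIZE   # 1024 = 2**10 entries fill one page


def split_address(addr):
    """Split a 32-bit address into (directory index, table index, offset)."""
    offset = addr & 0xFFF                # low 12 bits: byte within the page
    table_index = (addr >> 12) & 0x3FF   # next 10 bits: entry in page table
    dir_index = (addr >> 22) & 0x3FF     # top 10 bits: entry in directory
    return dir_index, table_index, offset


print(ENTRIES)
print(split_address(0x00403004))
```

For the sample address 0x00403004 the split is directory entry 1, table
entry 3, byte offset 4.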


TLB (or ATC) Size

        The 80486 MMU has a 32-entry TLB.   With a 4-kbyte page it covers
32 x 4 kbytes = 128 kbytes of memory.   The MC68040 offers much more.   The TLB
is called address translation cache (ATC) by Motorola, but it does the same
job: it translates virtual into physical addresses.   The name given by
Motorola is more self-explanatory, although the term TLB predominates in the
computer literature.   Each of the two MC68040 MMUs has a 64-entry ATC, for a
total of 128 entries on the chip.   For a 4-kbyte page, a total of 128 x 4
kbytes = 512 kbytes of memory is covered (4 times that of the 80486), and for
an 8-kbyte page, 1 Mbyte (8 times that of 80486).   In this case, a strong
advantage of the MC68040 is obvious.   Since the ATCs encompass much more
memory, the ATC miss probability is considerably smaller.   Thus, less time
will be wasted in accessing page tables in memory, resulting in faster overall
operation.
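
        The coverage figures above are simply the number of entries multiplied
by the page size.   A short sketch of the arithmetic, with an illustrative
function name:

```python
KB = 1024


def reach(entries, page_bytes):
    """Bytes of memory covered by a translation cache."""
    return entries * page_bytes


reach_486 = reach(32, 4 * KB)           # single 32-entry TLB
reach_040_4k = reach(2 * 64, 4 * KB)    # two 64-entry ATCs, 4-kbyte pages
reach_040_8k = reach(2 * 64, 8 * KB)    # two 64-entry ATCs, 8-kbyte pages

for name, r in [("80486", reach_486),
                ("MC68040, 4k pages", reach_040_4k),
                ("MC68040, 8k pages", reach_040_8k)]:
    print(name, r // KB, "kbytes,", r // reach_486, "x the 80486")
```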


Levels of Protection

        The 80486 offers four levels of protection, while the MC68040 has only
two - the supervisor and user, as does the whole M68000 family.   While the
protection mechanism of the 80486 is much more sophisticated and, with the
segmentation encapsulation of information, offers more reliable protection,
it also results in more complicated on-chip logic.   More time is taken up with
protection checks on the 80486.


Instruction Pipeline Stages

        The 80486 instruction pipeline has five stages, while that of the
MC68040 has six.   This means that the 80486 pipeline can handle five
instructions simultaneously and the MC68040 can handle six.   This certainly
gives an edge in favor of the MC68040, although the duality of its MMUs,
caches, and internal buses is a much stronger contributor to its enhanced
speed of operation.
The above comments are valid if the instructions are executed sequentially,
without any taken branches.   In the case of a taken branch, the subsequent
prefetched instructions are flushed from the pipeline hardware.   Unlike most
of the RISC-type systems, neither the 80486 nor the MC68040 employs the
delayed branch feature.   The MC68040 designers investigated the possibility
of featuring a delayed branch or other techniques to alleviate the problem of
lost cycles in case of a flushed pipeline.   After a number of simulations,
they came to the conclusion that the gain in performance was not worth the
extra hardware expenditure incurred in the implementation of any of the methods
considered.   In RISC-type systems, on the other hand, due to reduced control
circuitry there is extra space for features such as the delayed branch, which
alleviates the pipeline management problem in case of a taken branch.   Indeed,
Intel's RISC 80860 and Motorola's RISC M88000 both implement the delayed branch
technique as an option, selectable by the user.
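
        The cost of flushing on taken branches can be illustrated with an
idealized textbook model: the pipeline fills once, then completes one
instruction per clock, and every taken branch throws away a fixed number of
clocks.   The workload and the penalty of 2 clocks below are purely
hypothetical; the text gives no measured flush penalty for either chip, and
the model says nothing about clock rate or other stalls.

```python
def cycles_with_flushes(n_instructions, stages,
                        taken_branches=0, flush_penalty=0):
    """Idealized clock count for a simple instruction pipeline."""
    fill = stages - 1                       # clocks before the first result
    lost = taken_branches * flush_penalty   # clocks discarded by flushes
    return fill + n_instructions + lost


# Hypothetical workload: 100 instructions, 20 taken branches, 2 clocks each.
straight_line = cycles_with_flushes(100, stages=5)
with_branches = cycles_with_flushes(100, stages=5,
                                    taken_branches=20, flush_penalty=2)
print(straight_line, with_branches)
```

Under these assumed numbers the branch-free run takes 104 clocks and the run
with taken branches takes 144, which is the kind of loss a delayed branch is
meant to reduce.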


Performance Benchmarks


Dhrystone Benchmark Version 2.1   (Integer Performance Test -- ALU)
-----------------------------------------------------------------------------
     System               Results - Kdhrystones/s            Relative
-----------------------------------------------------------------------------
VAX 11/780                        1.6                           1.0
Motorola MC68030 (50 MHz,1ws)    20.0                          12.5
Intel 80486 (25 MHz)             24.0                          15.0
SPARC (25 MHz)                   27.0                          16.8
Motorola M88000 (20 MHz)         33.3                          20.1
MIPS M/2000, R3000 (25 MHz)      39.4                          23.8
Motorola MC68040 (25 MHz)        40.0                          24.3
Intel 80860 (33.3 MHz)           67.3                          40.6
-----------------------------------------------------------------------------


        As one can see, the MC68040 Dhrystone integer performance considerably
exceeds that of the 80486.   It should also be noted that the MC68040
outperforms its predecessor, the MC68030, by a factor of 2, even though the
MC68030 operates at twice the clock frequency.
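
        Dividing each Dhrystone result by the clock frequency makes the
comparison with the MC68030 explicit.   The sketch below uses only the
Kdhrystones/s and clock figures from the table; the per-MHz normalization
itself is an added illustration, not a figure from the source.

```python
# (Kdhrystones/s, clock in MHz), taken from the Dhrystone table.
results = {
    "MC68030": (20.0, 50),
    "80486": (24.0, 25),
    "MC68040": (40.0, 25),
}

# Kdhrystones/s delivered per MHz of clock.
per_mhz = {name: score / mhz for name, (score, mhz) in results.items()}

raw_ratio = results["MC68040"][0] / results["MC68030"][0]
per_clock_ratio = per_mhz["MC68040"] / per_mhz["MC68030"]

print(per_mhz)
print(raw_ratio, per_clock_ratio)
```

On a per-clock basis the MC68040 therefore delivers about four times the
Dhrystone throughput of the MC68030, twice the raw factor of 2.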


Linpack Benchmark (Double-Precision, 100x100)   (F-P Performance Test -- FPU)
-----------------------------------------------------------------------------
     System               Results - MFLOPS
-----------------------------------------------------------------------------
VAX 11/780                        0.14
NS32532 + NS32381                 0.17
Intel 80386 + 80387 (20 MHz)      0.20
VAX 8600                          0.49
Intel 80486 (25 MHz)              1.0
Motorola M88000 (20 MHz)          1.2
Sun SPARCstation 1                1.3
DECstation 3100 (MIPS R2000)      1.6
Sun 4/200 (SPARC)                 1.6
Am29000 (25 MHz)                  1.71
IBM 3081                          2.1
Motorola MC68040 (25 MHz)         3.0
R3000/R3010 (25 MHz)              3.9
Intel 80860                       4.5
RS/6000 (25 MHz)                 10.9
Cray 1S                          12.0
Cray X-MP                        56.0
-----------------------------------------------------------------------------


        Here, the MC68040 outperforms the 80486 by a factor of 3.   This
performance ratio is well supported by the discussion given for the data
in Table 1.1.

        The fact that the RISC-type processors tested above outperform the
80486 CISC should not escape notice either.   This is particularly significant
for floating-point performance where the 80486 has an on-chip FPU, while the
R3000 and the SPARC use off-chip coprocessors.

        A comparison of memory access clock cycles needed for the execution of
ADD instructions is reported in the following:


Memory Access Clock Counts
-----------------------------------------------------------------------------
    Source        Destination         MC68040        80486        M88000
-----------------------------------------------------------------------------
ADD reg           reg                    1              1            1
ADD mem           reg (cache hit)        1              2            3*
ADD reg           mem (cache hit)        1              1            3*
ADD mem           reg (cache miss)       3              4           15*
-----------------------------------------------------------------------------
 --"reg" represents a CPU register and "mem" represents a location in memory.
  *Includes time to load register plus one clock for the ADD operation.


        The superior performance of the MC68040 fits the discussion given
earlier in this text.   It should also be noted that both the MC68040 and
80486 have an on-chip cache, while the M88000 cache is on a separate CMMU
chip (MC88200).
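
        The hit and miss rows of the table can be combined into an average
clock count for "ADD mem, reg" once a cache hit ratio is chosen.   The sketch
below does this; the hit ratio of 0.9 is a hypothetical value picked only for
illustration and does not come from the source.

```python
# (clocks on cache hit, clocks on cache miss) for ADD mem -> reg,
# taken from the table above.
ADD_MEM_TO_REG = {
    "MC68040": (1, 3),
    "80486": (2, 4),
    "M88000": (3, 15),
}


def expected_clocks(hit_clocks, miss_clocks, hit_ratio):
    """Average clocks per instruction for a given cache hit ratio."""
    return hit_ratio * hit_clocks + (1.0 - hit_ratio) * miss_clocks


HIT_RATIO = 0.9   # assumed, not a measured figure

average = {cpu: expected_clocks(hit, miss, HIT_RATIO)
           for cpu, (hit, miss) in ADD_MEM_TO_REG.items()}

for cpu, clocks in average.items():
    print(cpu, round(clocks, 2))
```

A lower hit ratio widens the gap, since the miss penalty weighs more heavily
on the M88000 figures than on those of the two CISC chips.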


        It should be noted that all of the above comparisons were conducted
with artificial benchmark programs such as Dhrystone.   It is quite possible
that for some "real-life" programs the performance ordering might be quite
different.   It is no accident that when company A conducts benchmark
experiments, its products come out ahead of others.   It is quite possible
that when another company, say B, publishes its own benchmark results, the
performance ordering may look different.   Therefore, the sample of benchmark
comparison results presented should be regarded as a tentative indication.
They are certainly not conclusive.