296 lines
16 KiB
Plaintext
296 lines
16 KiB
Plaintext
|
||
|
||
|
||
CISC : The Intel 80486 vs. The Motorola MC68040
|
||
---------------------------------------------------
|
||
|
||
Source : Advanced Microprocessors by Daniel Tabak
|
||
|
||
Scribe : X-> Mike <-X - July '92
|
||
|
||
---------------------------------------------------
|
||
|
||
|
||
System Comparison
|
||
|
||
Most of the space in this text is dedicated to the most recent
|
||
advanced CISC microprocessors, the top current products within their families;
|
||
the Intel 80486 and the Motorola MC68040. They both belong to the latest
|
||
1.2 million transistors per chip generation. It therefore makes sense
|
||
to compare the two. It would be unfair to compare the NS32532 with them,
|
||
since the NS32532 belongs to an earlier generation and it is not in the same
|
||
class as the 80486 and MC68040.
|
||
|
||
A selection of points of comparison between the 80486 and the MC68040
|
||
is listed in Table 1.1. Looking carefully at the table, one can perceive
|
||
only a single line indentically marked in both columns: both chips have an
|
||
on-chip FPU, conforming to the IEEE 754-1985 standard. All other data are
|
||
different, although quite close in some instances. The points of difference
|
||
between the 80486 and the MC68040 will be discussed next in some detail.
|
||
|
||
|
||
Table 1.1 Comparison of Intel 80486 and Motorola MC68040
|
||
-------------------------------------------------------------------------------
|
||
Feature Intel 80486 Motorola MC68040
|
||
-------------------------------------------------------------------------------
|
||
FPU on Chip Yes (IEEE) Yes (IEEE)
|
||
CPU General-Purpose 32-bit Registers 8 16; 8 Data/8 Address
|
||
FPU 80-bit Registers 8 (stack) 8
|
||
MMU on Chip Yes Yes; Dual: Data, Code
|
||
Cache on Chip 8k Mixed 4k Data + 4k Code
|
||
Segmentation Yes No
|
||
Paging Yes; 4k/page Yes; 4k or 8k/page
|
||
TLB (or ATC) size 32 entries 64 entries in each:
|
||
Data, Code ATC
|
||
Levels of protection 4 2
|
||
Instruction pipeline stages 5 6
|
||
Pins 168 179
|
||
-------------------------------------------------------------------------------
|
||
|
||
|
||
CPU General-Purpose Registers
|
||
|
||
Both systems have 32-bit general-purpose registers; the 80486 has 8,
|
||
while the 68040 has double that number, namely 16. There are advantages
|
||
(and disadvantages) to having a large register file. The register file of
|
||
the 80486 is definitely too small to avail itself to the advantages. This
|
||
is particularly exacerbated by the fact that the CPU registers of the 80486
|
||
are not really quite as general purpose as one might wish. In fact, all of
|
||
them are dedicated to certain special tasks, such as:
|
||
|
||
EAX, EDX Dedicated to multiplication/division operations
|
||
EDX Dedicated to some I/O operations
|
||
EBX, EBP Dedicated to serve as base registers for some addressing modes
|
||
ECX Dedicated to serve as a counter in LOOP instructions
|
||
ESP Dedicated to serve as a stack pointer
|
||
ESI, EDI Dedicated to serve as pointers in string instructions and as
|
||
index registers in some addressing modes
|
||
|
||
On the other hand, on the MC68040 the eight 32-bit data registers D0 to D7
|
||
are genuinely general purpose without any restrictions or specific tasks
|
||
imposed on them. Of the eight 32-bit address registers A0 to A7, only A7
|
||
is dedicated as a stack pointer. The user is free to use the other seven
|
||
resgisters A0 to A6 in any possible way.
|
||
|
||
From the point of view of the CPU register file, the MC68040 has a
|
||
very clear advantage. It is much better equipped to retain intermediate
|
||
results during a program run, thus reducing CPU-memory traffic. From this
|
||
standpoint, the MC68040 even has a slight edge over the VAX architecture.
|
||
The VAX (any VAX model) also has sixteen 32-bit general-purpose registers.
|
||
However, only 12 of those (as opposed to the 68040's 15) can be used freely
|
||
by the programmer. Of the four VAX dedicated registers, one is used as a
|
||
program counter and another as a stack pointer. The program counter is
|
||
completely separate on both the MC68040 and the 80486 and is not included in
|
||
the general-purpose registers.
|
||
|
||
|
||
FPU General-Purpose Registers
|
||
|
||
Both systems have eight 80-bit registers, providing a large range for
|
||
floating-point number representation and a high level of precision. The only
|
||
differnce between the two is that the 80486 FPU registers are organized as a
|
||
stack, while those of the MC68040 are accessed directly, as its integer CPU
|
||
registers. Because of the stack organization the 80486 might have a slight
|
||
edge from the standpoint of compiler generation (for that part of the compiler
|
||
dealing with floating-point operations).
|
||
|
||
|
||
MMU on Chip
|
||
|
||
The 80486 has a regular MMU on chip for the control and management of
|
||
its memory. The MC68040 has two MMUs: one for code and one for data. This
|
||
duality, supported by a separate operand data bus, allows the control unit to
|
||
handle instruction and operand fetching simultaneously in parallel and enhances
|
||
the handling of the instruction pipeline. Of course, the external bus leading
|
||
to the off-chip main memory is single (32-bit data, 32-bit address), and it is
|
||
shared by instructions and data operands. With a reasonable on-chip cache hit
|
||
ratio, the off-chip bus would be used less often.
|
||
|
||
|
||
Cache on Chip
|
||
|
||
The total on-chip cache of both systems is 8 kbytes. Interestingly
|
||
enough, they have the same parameters: both are four-way set-associative with
|
||
16 bytes per line. The difference is that while the 80486 on-chip 8k cache
|
||
is mixed, storing both code and data the MC68040 cache is subdivided into two
|
||
equal parts: a 4-kbyte data cache and a 4-kbyte code cache. Each cache is
|
||
controlled by the respective MMU, mentioned above. The advantage, as in the
|
||
MMU case, is the provision of two parallel paths for code and data, resulting
|
||
in an overall speedup of operation.
|
||
|
||
|
||
Segmentation
|
||
|
||
The Intel 80x86 family implements segmentation, while the M68000 family
|
||
does not. The earlier Intel systems (8086, 80286) were plagued with the upper
|
||
64-kbyte segment size limit, starting with the 80386 and so on, the segment sizecan be made as high as 4 Gbytes (maximum size of the physical memory),
|
||
effectively removing the segmentation feature by the decision of the user.
|
||
Therefore, as far as segmentation is concerned, the 80486 and MC68040 are
|
||
comparable. The 80486 has some edge, since it allows the user to implement
|
||
segmentation if needed and avail oneself to its advantages.
|
||
|
||
|
||
Paging
|
||
|
||
The MMUs of both systems feature paged virtual memory management.
|
||
The 80486 offers a single standard page size of 4 kbytes. This page size
|
||
is implemented in many other systems. With a 4-kbyte page size, one can
|
||
arrange an address mapping where the page directory and the page tables also
|
||
have the standard page size of 4 kbytes (1024 = 2^10 entries, 4 bytes each).
|
||
Thus, the page directory and the page tables can be treated as entire pages
|
||
and placed within page frames in the memory. This results in reduced
|
||
complexity in the MMU hardware and in the OS software, one of whose tasks is
|
||
to support the management of virtual memory. The MC68040 offers two page
|
||
sizes, selectable by the user: 4 kbytes and 8 kbytes. This tends to
|
||
complicate the MMU logic and the OS. It is a good thing that Motorola got
|
||
rid of the other page size options available with its MC68851 paged MMU:
|
||
8 sizes ranging from 256 bytes to 32 kbytes, stepped by a factor of 2. On the
|
||
other hand, the 8-kbyte per page option could be useful to a programmer dealing
|
||
with large modules of code exceeding 4 kbytes.
|
||
|
||
|
||
TLB (or ATC) Size
|
||
|
||
The 80486 MMU has a 32-entry TLB. With a 4-kbyte page it covers
|
||
32 x 4 kbytes = 128 kbytes of memory. The MC68040 offers much more. The TLB
|
||
is called address translation cache (ATC) by Motorola, but it does the same:
|
||
it translates virtual into physical addresses. The name given by Motorola is
|
||
simpler to perceive, although the TLB term is predominately used in the
|
||
computer literature. Each of the two MC68040 MMUs has a 64-entry ATC, for a
|
||
total of 128 entries on the chip. For a 4-kbyte page, a total of 128 x 4
|
||
kbytes = 512 kbytes of memory is covered (4 times that of the 80486), and for
|
||
an 8-kbyte page, 1 Mbyte (8 times that of 80486). In this case, a strong
|
||
advantage of the MC68040 is obvious. Since the ATCs encompass much more
|
||
memory, the ATC miss probability is considerably smaller. Thus, less time
|
||
will be wasted in accessing page tables in memory, resulting in faster overall
|
||
operation.
|
||
|
||
|
||
Levels of Protection
|
||
|
||
The 80486 offers four levels of protection, while the MC68040 has only
|
||
two - the supervisor and user, as does the whole M68000 family. While the
|
||
protection mechanism of the 80486 is much more sophisticated and, with the
|
||
segmentation encapsulation of information, offers more reliable protection,
|
||
it also results in more complicated on-chip logic. More time is taken up with
|
||
protection checks on the 80486.
|
||
|
||
|
||
Instruction Pipeline Stages
|
||
|
||
The 80486 instruction pipeline has five stages, while that of the
|
||
MC68040 has six. This means that the 80486 pipeline can handle five
|
||
instructions simultaneously and the MC68040 can handle six. This certainly
|
||
gives an edge in favor of the MC68040, although its MMU-cache-internal buses
|
||
duality is a much stronger contributor to its enhanced speed of operation.
|
||
The above comments are valid if the instructions are executed sequentially,
|
||
without any taken branches. In the case of the taken branch, the subsequent
|
||
prefetched instructions are flushed from the pipeline hardware. Neither
|
||
the 80486 nor the MC68040 employ the delayed branch feature, as do most of
|
||
the RISC-type systems. The MC68040 designers have investigated the possibilityof featuring a delayed branch or other techniques to alleviate the problem of
|
||
lost cycles in case of a flushed pipeline. After a number of simulations,
|
||
they came to the conclusion that the gain in performance was not worth the
|
||
extra hardware expenditure incurred in the implementation of any of the methods
|
||
considered. In RISC-type systems, on the other hand, due to reduced control
|
||
circuitry there is extra space for features such as the delayed branch which
|
||
alleviates the pipeline management problem in case of a taken branch. Indeed,
|
||
Intel's RISC 80860 and Motorola's RISC M88000 both implement the delayed branch
|
||
technique as an option, selectable by the user.
|
||
|
||
|
||
Performance Benchmarks
|
||
|
||
|
||
Dhrystone Benchmark Version 2.1 (Integer Performance Test -- ALU)
|
||
-----------------------------------------------------------------------------
|
||
System Results - Kdhrystones/s Relative
|
||
-----------------------------------------------------------------------------
|
||
VAX 11/780 1.6 1.0
|
||
Motorola MC68030 (50 Mhz,1ws) 20.0 12.5
|
||
Intel 80486 (25 Mhz) 24.0 15.0
|
||
SPARC (25 Mhz) 27.0 16.8
|
||
Motorola M88000 (20 Mhz) 33.3 20.1
|
||
MIPS M/2000, R3000 (25 Mhz) 39.4 23.8
|
||
Motorola MC68040 (25 Mhz) 40.0 24.3
|
||
Intel 80860 (33.3 Mhz) 67.3 40.6
|
||
-----------------------------------------------------------------------------
|
||
|
||
|
||
As one can see, the MC68040 Dhrystone integer performance considerably
|
||
exceeds that of the 80486. It should also be noted that the MC68040
|
||
outperforms its predecessor MC68030 by a factor of 2, while the MC68030
|
||
operates at a double frequency.
|
||
|
||
|
||
Linpack Benchmark (Double-Precision, 100x100) (F-P Performance Test -- FPU)
|
||
-----------------------------------------------------------------------------
|
||
System Results - MFLOPS
|
||
-----------------------------------------------------------------------------
|
||
VAX 11/780 0.14
|
||
NS32532 + NS32381 0.17
|
||
Intel 80386 + 80387 (20 Mhz) 0.20
|
||
VAX 8600 0.49
|
||
Intel 80486 (25 Mhz) 1.0
|
||
Motorola M88000 (20 Mhz) 1.2
|
||
Sun SPARCstation 1 1.3
|
||
Decstation 3100 (MIPS R2000) 1.6
|
||
Sun 4/200 (SPARC) 1.6
|
||
Am29000 (25 Mhz) 1.71
|
||
IBM 3081 2.1
|
||
Motorola MC68040 (25 Mhz) 3.0
|
||
R3000/R3010 (25 Mhz) 3.9
|
||
Intel 80860 4.5
|
||
RS/6000 (25 Mhz) 10.9
|
||
Cray 1S 12.0
|
||
Cray X-MP 56.0
|
||
-----------------------------------------------------------------------------
|
||
|
||
|
||
Here, the MC68040 outperforms the 80486 by a factor of 3. This
|
||
performance ratio is well supported by the discussion given for the data
|
||
in Table 1.1.
|
||
|
||
The fact that more RISC-type processors, tested above, outperform the
|
||
80486 CISC should not escape notice either. This is particularly significant
|
||
for floating-point performance where the 80486 has an on-chip FPU, while the
|
||
R3000 and the SPARC use off-chip coprocessors.
|
||
|
||
A comparison of memory access clock cycles needed for the execution of
|
||
ADD instructions is reported in the following:
|
||
|
||
|
||
Memory Access Clock Counts
|
||
-----------------------------------------------------------------------------
|
||
Source Destination MC68040 80486 M88000
|
||
-----------------------------------------------------------------------------
|
||
ADD reg reg 1 1 1
|
||
ADD mem reg (cache hit) 1 2 3*
|
||
ADD reg mem (cache hit) 1 1 3*
|
||
ADD mem reg (cache miss) 3 4 15*
|
||
-----------------------------------------------------------------------------
|
||
--"reg" represents a CPU register and "mem" represents a location in memory.
|
||
*Includes time to load register plus one clock for the ADD operation.
|
||
|
||
|
||
The superior performance of the MC68040 fits the discussion given
|
||
earlier in this text. It should also be noted that both the MC68040 and
|
||
80486 have an on-chip cache, while the M88000 cache is on a separate CMMU
|
||
chip (MC88200).
|
||
|
||
|
||
It should be noted that all of the above comparisons were conducted
|
||
with artificial benchmark programs such as Dhrystone. It is quite possible
|
||
that for some "real-life" programs the performance ordering might be quite
|
||
different. It is no accident that when company A conducts benchmark
|
||
experiments, its products come out ahead of others. It is quite possible
|
||
that when another company, say B, publishes its own benchmark results, the
|
||
performance ordering may look different. Therefore, the sample of benchmark
|
||
comparison results presented should be regarded as a tentative indication.
|
||
They are certainly not conclusive. |