CISC : The Intel 80486 vs. The Motorola MC68040
---------------------------------------------------

Source : Advanced Microprocessors by Daniel Tabak

Scribe : X-> Mike <-X - July '92

---------------------------------------------------


System Comparison

Most of the space in this text is dedicated to the most recent advanced
CISC microprocessors, the top current products within their families: the
Intel 80486 and the Motorola MC68040. Both belong to the latest generation
of chips with 1.2 million transistors, so it makes sense to compare the
two. It would be unfair to compare the NS32532 with them, since the NS32532
belongs to an earlier generation and is not in the same class as the 80486
and MC68040.

A selection of points of comparison between the 80486 and the MC68040 is
listed in Table 1.1. A careful look at the table shows only a single line
marked identically in both columns: both chips have an on-chip FPU
conforming to the IEEE 754-1985 standard. All other entries differ,
although some are quite close. These points of difference between the
80486 and the MC68040 are discussed next in some detail.


Table 1.1 Comparison of Intel 80486 and Motorola MC68040
-------------------------------------------------------------------------------
Feature                               Intel 80486        Motorola MC68040
-------------------------------------------------------------------------------
FPU on Chip                           Yes (IEEE)         Yes (IEEE)
CPU General-Purpose 32-bit Registers  8                  16; 8 Data/8 Address
FPU 80-bit Registers                  8 (stack)          8
MMU on Chip                           Yes                Yes; Dual: Data, Code
Cache on Chip                         8k Mixed           4k Data + 4k Code
Segmentation                          Yes                No
Paging                                Yes; 4k/page       Yes; 4k or 8k/page
TLB (or ATC) size                     32 entries         64 entries in each:
                                                         Data, Code ATC
Levels of protection                  4                  2
Instruction pipeline stages           5                  6
Pins                                  168                179
-------------------------------------------------------------------------------


CPU General-Purpose Registers

Both systems have 32-bit general-purpose registers. The 80486 has 8, while
the MC68040 has twice as many, namely 16. There are advantages (and
disadvantages) to having a large register file. The register file of the
80486 is definitely too small to reap those advantages. The problem is
made worse by the fact that the CPU registers of the 80486 are not as
general purpose as one might wish. In fact, every one of them is dedicated
to certain special tasks, such as:

EAX, EDX   Dedicated to multiplication/division operations
EDX        Dedicated to some I/O operations
EBX, EBP   Dedicated to serve as base registers for some addressing modes
ECX        Dedicated to serve as a counter in LOOP instructions
ESP        Dedicated to serve as a stack pointer
ESI, EDI   Dedicated to serve as pointers in string instructions and as
           index registers in some addressing modes

On the MC68040, on the other hand, the eight 32-bit data registers D0 to D7
are genuinely general purpose, without any restrictions or specific tasks
imposed on them. Of the eight 32-bit address registers A0 to A7, only A7 is
dedicated as a stack pointer. The user is free to use the other seven
registers, A0 to A6, in any way.

From the point of view of the CPU register file, the MC68040 has a very
clear advantage. It is much better equipped to retain intermediate results
during a program run, thus reducing CPU-memory traffic. From this
standpoint, the MC68040 even has a slight edge over the VAX architecture.
The VAX (any VAX model) also has sixteen 32-bit general-purpose registers.
However, only 12 of those (as opposed to the MC68040's 15) can be used
freely by the programmer. Of the four dedicated VAX registers, one is used
as a program counter and another as a stack pointer. On both the MC68040
and the 80486, by contrast, the program counter is completely separate and
is not counted among the general-purpose registers.


FPU General-Purpose Registers

Both systems have eight 80-bit registers, providing a large range for
floating-point number representation and a high level of precision. The
only difference between the two is that the 80486 FPU registers are
organized as a stack, while those of the MC68040 are accessed directly,
like its integer CPU registers. Because of the stack organization, the
80486 might have a slight edge from the standpoint of compiler design (for
the part of the compiler dealing with floating-point operations).
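A toy Python model can show why a register stack suits code generation. If a compiler walks an expression tree in postfix order, each operand becomes a push and each operator acts on the top of the stack, so no register allocation is needed. This is a hedged sketch and does not reproduce the 80486's actual FPU instruction set. The class and function names are hypothetical, and the tag word, precision control and exception handling are all ignored.

```python
# Toy model of an 8-register floating-point stack, loosely patterned on the
# 80486 FPU (hypothetical names; tag word, rounding and exceptions ignored).
STACK_DEPTH = 8

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}

class FPStack:
    def __init__(self):
        self.regs = []  # self.regs[-1] plays the role of the stack top

    def push(self, value):
        if len(self.regs) == STACK_DEPTH:
            raise OverflowError("all eight stack registers are in use")
        self.regs.append(float(value))

    def apply(self, op):
        # Pop the two topmost values, combine them, and push the result.
        right = self.regs.pop()
        left = self.regs.pop()
        self.push(OPS[op](left, right))

def to_postfix(tree):
    """Flatten a binary expression tree (op, left, right) into postfix order."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return to_postfix(left) + to_postfix(right) + [op]
    return [tree]

def evaluate_on_stack(tree):
    fpu = FPStack()
    for item in to_postfix(tree):
        if isinstance(item, str):
            fpu.apply(item)   # an operator works on the top of the stack
        else:
            fpu.push(item)    # an operand is simply pushed
    return fpu.regs[-1]

expr = ("*", ("+", 1.5, 2.5), ("-", 10.0, 4.0))   # (1.5 + 2.5) * (10 - 4)
print(to_postfix(expr))         # [1.5, 2.5, '+', 10.0, 4.0, '-', '*']
print(evaluate_on_stack(expr))  # 24.0
```

With directly addressed registers, as on the MC68040, the compiler would instead have to choose explicit registers among FP0 to FP7 for the intermediate values 4.0 and 6.0.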


MMU on Chip

The 80486 has a conventional, single MMU on chip for the control and
management of its memory. The MC68040 has two MMUs: one for code and one
for data. This duality, supported by a separate operand data bus, allows
the control unit to handle instruction and operand fetching in parallel and
improves the handling of the instruction pipeline. Of course, the external
bus leading to the off-chip main memory is single (32-bit data, 32-bit
address) and is shared by instructions and data operands. With a reasonable
on-chip cache hit ratio, however, the off-chip bus is used less often.


Cache on Chip

The total on-chip cache of both systems is 8 kbytes. Interestingly enough,
the caches have the same parameters: both are four-way set-associative with
16 bytes per line. The difference is that the 80486 on-chip 8-kbyte cache is
mixed, storing both code and data, while the MC68040 cache is subdivided
into two equal parts: a 4-kbyte data cache and a 4-kbyte code cache. Each
cache is controlled by the respective MMU mentioned above. The advantage,
as in the MMU case, is the provision of two parallel paths for code and
data, resulting in an overall speedup of operation.
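The shared parameters (four-way set-associative, 16-byte lines) determine how many sets each cache has and how an address is split into tag, set index and byte offset. The Python sketch below works this out. It assumes 32-bit addresses and treats each cache as a plain set-associative array. It also ignores details such as which address (logical or physical) each chip uses to index its cache, and the function name is hypothetical.

```python
def cache_geometry(total_bytes, ways, line_bytes, address_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache.

    Assumes all sizes are powers of two.
    """
    lines = total_bytes // line_bytes          # total number of cache lines
    sets = lines // ways                       # lines grouped into sets of `ways`
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits

i486 = cache_geometry(8 * 1024, 4, 16)    # 80486: one 8-kbyte mixed cache
m68040 = cache_geometry(4 * 1024, 4, 16)  # MC68040: each of its two 4-kbyte caches
print(i486)    # (128, 4, 7, 21)
print(m68040)  # (64, 4, 6, 22)
```

Each MC68040 cache therefore has half as many sets as the 80486 cache, but the code cache and the data cache are separate structures on their own paths.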


Segmentation

The Intel 80x86 family implements segmentation, while the M68000 family
does not. The earlier Intel systems (8086, 80286) were plagued by a 64-kbyte
upper limit on segment size. Starting with the 80386, the segment size can
be made as large as 4 Gbytes (the maximum size of the physical memory),
which lets the user effectively remove segmentation altogether. Therefore,
as far as segmentation is concerned, the 80486 and MC68040 are comparable.
The 80486 has some edge, since it allows the user to implement segmentation
when needed and take advantage of it.
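The following minimal Python sketch shows why a single 4-Gbyte segment makes segmentation effectively disappear. A linear address is formed as segment base plus offset, and the offset is checked against the segment size. The descriptor here is simplified and hypothetical: real 80386/80486 descriptors also carry granularity, type and privilege fields, and the base of the 64-kbyte segment is chosen arbitrarily.

```python
from dataclasses import dataclass

FOUR_GB = 2 ** 32  # size of the 32-bit address space

@dataclass
class Segment:
    """Hypothetical, simplified descriptor: only base and size are modeled."""
    base: int
    size: int  # segment size in bytes (limit + 1)

def linear_address(seg: Segment, offset: int) -> int:
    """Translate segment:offset into a 32-bit linear address, checking the limit."""
    if not 0 <= offset < seg.size:
        raise ValueError(f"offset {offset:#x} is outside a {seg.size:#x}-byte segment")
    return (seg.base + offset) % FOUR_GB  # linear addresses wrap at 32 bits

# 8086/80286 style: no segment can exceed 64 kbytes
small = Segment(base=0x20000, size=64 * 1024)
# 80386/80486 "flat" style: one segment covering the whole 4-Gbyte space
flat = Segment(base=0, size=FOUR_GB)

print(hex(linear_address(small, 0x1234)))     # 0x21234
print(hex(linear_address(flat, 0xDEADBEEF)))  # 0xdeadbeef
try:
    linear_address(small, 0x10000)            # first byte past the 64-kbyte limit
except ValueError as err:
    print("limit violation:", err)
```

With base 0 and a 4-Gbyte size, the offset is the linear address. A system that wants segment-based protection can still give each module its own smaller segment.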


Paging

The MMUs of both systems feature paged virtual memory management. The
80486 offers a single standard page size of 4 kbytes, a size implemented in
many other systems as well. With a 4-kbyte page size, one can arrange the
address mapping so that the page directory and each page table also occupy
exactly one standard 4-kbyte page (1024 = 2^10 entries of 4 bytes each).
Thus, the page directory and the page tables can be treated as entire pages
and placed within page frames in memory. This reduces complexity both in
the MMU hardware and in the OS software, one of whose tasks is to support
the management of virtual memory. The MC68040 offers two page sizes,
selectable by the user: 4 kbytes and 8 kbytes. This tends to complicate the
MMU logic and the OS. It is a good thing that Motorola got rid of the other
page-size options available with its MC68851 paged MMU: eight sizes ranging
from 256 bytes to 32 kbytes, each step a factor of 2 larger. On the other
hand, the 8-kbyte page option could be useful to a programmer dealing with
large code modules exceeding 4 kbytes.
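The arithmetic behind the 4-kbyte choice can be checked with a short Python sketch of an 80486-style two-level split of a 32-bit linear address. In this split, 10 bits select a page-directory entry, 10 bits select a page-table entry, and 12 bits give the byte offset within the 4-kbyte page. The mapping below is a made-up example, and nested dictionaries stand in for the in-memory tables. Present and permission bits are ignored.

```python
PAGE_SIZE = 4 * 1024     # 4-kbyte pages
ENTRIES = 1024           # 2**10 entries per directory or page table
ENTRY_BYTES = 4          # 32-bit entries

# A page directory or page table fills exactly one page frame:
assert ENTRIES * ENTRY_BYTES == PAGE_SIZE

def split_linear(addr: int):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory = (addr >> 22) & 0x3FF   # top 10 bits
    table = (addr >> 12) & 0x3FF       # middle 10 bits
    offset = addr & 0xFFF              # low 12 bits
    return directory, table, offset

def translate(addr: int, page_directory: dict) -> int:
    """Walk a toy two-level structure: directory -> page table -> frame number."""
    d, t, off = split_linear(addr)
    page_table = page_directory[d]     # a KeyError stands in for a page fault
    frame = page_table[t]
    return frame * PAGE_SIZE + off

# Hypothetical mapping: linear page 0x00401 -> physical frame 0x1A2
page_directory = {0x001: {0x001: 0x1A2}}
addr = 0x00401ABC
print(split_linear(addr))                    # (1, 1, 2748)
print(hex(translate(addr, page_directory)))  # 0x1a2abc
```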


TLB (or ATC) Size

The 80486 MMU has a 32-entry TLB. With a 4-kbyte page it covers
32 x 4 kbytes = 128 kbytes of memory. The MC68040 offers much more. Motorola
calls its TLB an address translation cache (ATC), but it does the same job:
it translates virtual addresses into physical addresses. Motorola's name is
easier to understand, although the term TLB predominates in the computer
literature. Each of the two MC68040 MMUs has a 64-entry ATC, for a total of
128 entries on the chip. For a 4-kbyte page, a total of 128 x 4 kbytes =
512 kbytes of memory is covered (4 times that of the 80486), and for an
8-kbyte page, 1 Mbyte (8 times that of the 80486). Here the advantage of
the MC68040 is obvious and strong. Since the ATCs encompass much more
memory, the probability of an ATC miss is considerably smaller. Thus less
time is wasted accessing page tables in memory, resulting in faster overall
operation.
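The coverage figures are simply entries times page size, and a few lines of Python reproduce them. The sketch follows the text in counting all 128 ATC entries together. In practice, though, the code ATC maps only instruction pages and the data ATC maps only operand pages.

```python
KB = 1024

def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of memory mapped by a TLB/ATC that holds one page per entry."""
    return entries * page_bytes

i486 = tlb_reach(32, 4 * KB)           # one 32-entry TLB, 4-kbyte pages
m68040_4k = tlb_reach(2 * 64, 4 * KB)  # two 64-entry ATCs, 4-kbyte pages
m68040_8k = tlb_reach(2 * 64, 8 * KB)  # two 64-entry ATCs, 8-kbyte pages

print(i486 // KB, m68040_4k // KB, m68040_8k // KB)  # 128 512 1024
print(m68040_4k // i486, m68040_8k // i486)          # 4 8
```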


Levels of Protection

The 80486 offers four levels of protection, while the MC68040 has only two,
supervisor and user, as does the whole M68000 family. The protection
mechanism of the 80486 is much more sophisticated and, combined with the
encapsulation of information in segments, offers more reliable protection.
However, it also results in more complicated on-chip logic, and more time
is taken up with protection checks on the 80486.


Instruction Pipeline Stages

The 80486 instruction pipeline has five stages, while that of the MC68040
has six. This means that the 80486 pipeline can handle five instructions
simultaneously and the MC68040 can handle six. This certainly gives an edge
to the MC68040, although the duality of its MMUs, caches and internal buses
contributes much more strongly to its speed of operation. These comments
are valid if the instructions are executed sequentially, without any taken
branches. In the case of a taken branch, the subsequent prefetched
instructions are flushed from the pipeline. Neither the 80486 nor the
MC68040 employs the delayed branch feature used by most RISC-type systems.
The MC68040 designers investigated the possibility of a delayed branch or
other techniques to alleviate the problem of cycles lost when the pipeline
is flushed. After a number of simulations, they concluded that the gain in
performance was not worth the extra hardware needed to implement any of the
methods considered. In RISC-type systems, on the other hand, the reduced
control circuitry leaves extra room for features such as the delayed
branch, which alleviates the pipeline management problem in case of a
taken branch. Indeed, Intel's RISC 80860 and Motorola's RISC M88000 both
implement the delayed branch technique as an option, selectable by the user.
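The following idealized Python model shows what a taken-branch flush costs. It does not describe the timing of either chip. The model makes three assumptions:

- One instruction enters the pipeline per clock.
- A k-stage pipeline needs k + n - 1 clocks for n instructions.
- Each taken branch is resolved only in the last stage, so it discards the k - 1 instructions fetched behind it.

Real branch penalties on the 80486 and MC68040 depend on where each chip resolves the branch. The 10% taken-branch fraction used below is made up.

```python
def pipeline_clocks(n_instr: int, stages: int, taken_branches: int = 0) -> int:
    """Idealized clock count for n_instr instructions on a `stages`-deep pipeline.

    Assumptions (not chip data): one instruction enters per clock, the only
    stalls are taken branches, and each taken branch is resolved in the last
    stage, so it flushes the (stages - 1) instructions fetched behind it.
    """
    fill_and_drain = stages + n_instr - 1          # ideal pipelined execution
    flush_penalty = taken_branches * (stages - 1)  # clocks lost to flushes
    return fill_and_drain + flush_penalty

n = 1000
taken = n // 10  # hypothetical: 10% of instructions are taken branches
for label, k in (("5 stages (80486-like)", 5), ("6 stages (MC68040-like)", 6)):
    ideal = pipeline_clocks(n, k)
    with_branches = pipeline_clocks(n, k, taken)
    print(f"{label}: {ideal} clocks without branches, "
          f"{with_branches} with {taken} taken branches")
```

The model counts clocks rather than clock period, so it captures only the cost side of a deeper pipeline. In this model, each extra stage adds one lost clock per taken branch, which is the kind of loss a delayed branch is meant to recover.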


Performance Benchmarks


Dhrystone Benchmark Version 2.1 (Integer Performance Test -- ALU)
-----------------------------------------------------------------------------
System                                    Kdhrystones/s     Relative
-----------------------------------------------------------------------------
VAX 11/780                                1.6               1.0
Motorola MC68030 (50 MHz, 1 wait state)   20.0              12.5
Intel 80486 (25 MHz)                      24.0              15.0
SPARC (25 MHz)                            27.0              16.8
Motorola M88000 (20 MHz)                  33.3              20.1
MIPS M/2000, R3000 (25 MHz)               39.4              23.8
Motorola MC68040 (25 MHz)                 40.0              24.3
Intel 80860 (33.3 MHz)                    67.3              40.6
-----------------------------------------------------------------------------


As one can see, the MC68040's Dhrystone integer performance considerably
exceeds that of the 80486. It should also be noted that the MC68040
outperforms its predecessor, the MC68030, by a factor of 2, even though the
MC68030 operates at twice the clock frequency.
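Normalizing by clock frequency sharpens the last point. Twice the throughput at half the clock rate means about four times as much work per clock cycle. The Python sketch below uses only figures from the Dhrystone table. It is a rough normalization that ignores memory wait states and compiler differences.

```python
# (Kdhrystones/s, clock in MHz) taken from the Dhrystone table
results = {
    "MC68030": (20.0, 50.0),
    "80486": (24.0, 25.0),
    "MC68040": (40.0, 25.0),
}

def per_mhz(name: str) -> float:
    """Dhrystone throughput per MHz of clock, in Kdhrystones/s per MHz."""
    kdhry, mhz = results[name]
    return kdhry / mhz

for name in results:
    print(f"{name}: {per_mhz(name):.2f} Kdhrystones/s per MHz")

print(f"MC68040 vs MC68030, per clock: {per_mhz('MC68040') / per_mhz('MC68030'):.1f}x")  # 4.0x
print(f"MC68040 vs 80486, per clock: {per_mhz('MC68040') / per_mhz('80486'):.2f}x")      # 1.67x
```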


Linpack Benchmark (Double-Precision, 100x100) (F-P Performance Test -- FPU)
-----------------------------------------------------------------------------
System                                    MFLOPS
-----------------------------------------------------------------------------
VAX 11/780                                0.14
NS32532 + NS32381                         0.17
Intel 80386 + 80387 (20 MHz)              0.20
VAX 8600                                  0.49
Intel 80486 (25 MHz)                      1.0
Motorola M88000 (20 MHz)                  1.2
Sun SPARCstation 1                        1.3
DECstation 3100 (MIPS R2000)              1.6
Sun 4/200 (SPARC)                         1.6
Am29000 (25 MHz)                          1.71
IBM 3081                                  2.1
Motorola MC68040 (25 MHz)                 3.0
R3000/R3010 (25 MHz)                      3.9
Intel 80860                               4.5
RS/6000 (25 MHz)                          10.9
Cray 1S                                   12.0
Cray X-MP                                 56.0
-----------------------------------------------------------------------------


Here, the MC68040 outperforms the 80486 by a factor of 3. This performance
ratio is well supported by the discussion of the data in Table 1.1.

The fact that several of the RISC-type processors tested above outperform
the 80486 CISC should not escape notice either. This is particularly
significant for floating-point performance, where the 80486 has an on-chip
FPU, while the R3000 and the SPARC use off-chip coprocessors.

A comparison of the memory access clock cycles needed to execute ADD
instructions is given below:


Memory Access Clock Counts
-----------------------------------------------------------------------------
Instruction  Source  Destination  Condition     MC68040    80486    M88000
-----------------------------------------------------------------------------
ADD          reg     reg                        1          1        1
ADD          mem     reg          cache hit     1          2        3*
ADD          reg     mem          cache hit     1          1        3*
ADD          mem     reg          cache miss    3          4        15*
-----------------------------------------------------------------------------
"reg" represents a CPU register and "mem" represents a location in memory.
*Includes the time to load the register plus one clock for the ADD operation.


The superior performance of the MC68040 fits the discussion given earlier
in this text. It should also be noted that both the MC68040 and the 80486
have an on-chip cache, while the M88000 cache is on a separate CMMU chip
(MC88200).
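Once a cache hit ratio h is assumed, the hit and miss rows for ADD mem,reg can be combined into an average cost per instruction. The Python sketch below uses the standard weighted average, average = h * hit + (1 - h) * miss. The hit ratios are hypothetical values chosen for illustration and are not measurements from the source.

```python
# (cache hit clocks, cache miss clocks) for ADD mem,reg, from the clock-count table
ADD_MEM_REG = {
    "MC68040": (1, 3),
    "80486": (2, 4),
    "M88000": (3, 15),
}

def average_clocks(hit: float, miss: float, hit_ratio: float) -> float:
    """Expected clocks per access for a given cache hit ratio (0..1)."""
    return hit_ratio * hit + (1.0 - hit_ratio) * miss

for h in (0.90, 0.95, 0.99):  # hypothetical hit ratios
    row = ", ".join(
        f"{cpu} {average_clocks(hit, miss, h):.2f}"
        for cpu, (hit, miss) in ADD_MEM_REG.items()
    )
    print(f"hit ratio {h:.2f}: {row}")
```

The large M88000 miss figure dominates its average even at high hit ratios. This is consistent with its cache sitting on a separate chip.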


It should be noted that all of the above comparisons were conducted with
artificial benchmark programs such as Dhrystone. It is quite possible that
for some "real-life" programs the performance ordering would be quite
different. It is no accident that when company A conducts benchmark
experiments, its products come out ahead of others. When another company,
say B, publishes its own benchmark results, the performance ordering may
well look different. Therefore, the sample of benchmark results presented
here should be regarded only as a tentative indication. The results are
certainly not conclusive.