68040 Info:

----------------------------
This new CISC microprocessor
offers RISC performance
----------------------------

Motorola has officially unwrapped its newest 32-bit
microprocessor, the 68040. Manufactured with 0.8-micron
high-speed CMOS technology, the 68040 packs 1.2 million
transistors on a single silicon die. With 900,000 more
transistors to work with than the 300,000 in a 68030
processor, the 68040's designers added new features and boosted
performance. New features include the following:

-- Optimised 68030 integer unit (IU). While retaining object-code
compatibility with previous 68000-family processors, the IU has
been optimised to execute instructions in fewer clock cycles
(i.e., run faster). The claimed boost in performance is three
times that of a 68030.

-- Integral FPU. The 68020 and 68030 require external FPU
coprocessor chips to handle floating-point math. The 68040,
however, has an FPU built into it, giving it the power to do
serious number crunching. The FPU's data types are compatible
with the ANSI/IEEE 754 standard for binary floating-point math,
and its instruction set is object code-compatible with Motorola's
68881/68882 FPUs. Like the IU, the 68040's on-chip FPU has been
optimised to execute frequently used instructions using fewer
clock cycles. The claimed performance boost is 10 times that of a
68882.

-- Large caches. Processor accesses to the system bus are
minimised by storing the most recently used instructions and
data in two on-chip 4K-byte caches, one for each. The two caches
operate independently and can be accessed at the same time. Bus
snoop logic is used to maintain cache coherence (i.e., it ensures
that the cache's contents match those parts of memory
corresponding to the cache). The bus snooper's design is
fine-tuned to support multiprocessor systems where one or more
bus masters or 68040s might share the same section of memory.

-- Separate memory units for instructions and data. Each memory
unit consists of a memory management unit (MMU), a cache
controller, and bus snoop logic. The MMUs use a subset of the
68030's MMU instruction set. The two memory units function
independently of each other to improve processor throughput.

The 68040 ships with an initial clock speed of 25MHz; higher
speeds are to be available in the future, Motorola says. The
68040 comes in a 179-pin grid-array package. With the elimination
of coprocessor function lines (now that the MMU and FPU are
consolidated onto the processor) and the addition of snoop
control lines, the 68040 is not pin-compatible with the 68030.

Because of the 68040's software compatibility with its
predecessors, it can tap into the existing software base of 680x0
applications. It does this not only while eliminating a component
(the FPU) from a computer's design, but also while improving
performance. In fact, the 68040 executes, on average, nearly one
instruction per clock cycle -- the same as a RISC processor.

Fine-Tuned for Performance

The 68040 was built on the firm foundation of its
predecessors. The design team used the experience garnered from
developing earlier processors to aid in optimising the throughput
of the 040.

The 040 was designed from the ground up, Motorola engineers
said. It incorporates a high degree of parallelism using a number
of internal buses. An internal Harvard architecture gives the
processor simultaneous access to both instructions and data. Both
the IU and FPU have separate pipelines and can operate
concurrently. For example, the FPU can perform floating-point
instructions independently of the IU. Each stream (instructions
or data) has its own dedicated cache and memory unit (MU) that
function independently of each other. A smart bus controller
assigns priorities to bus traffic to and from the caches.

There were several key areas where Motorola was able to
boost performance. The first was reducing the clock cycles
needed to execute certain instructions. The next was ensuring
that the processor funnels instructions and data into itself
quickly and constantly, lest it stall while waiting on
information. The processor then gets its results back into the
system without interfering with incoming information. Finally, as
if this weren't enough, the processor stays off the system bus to
a greater extent than is the case with other processor designs.
This leaves the bus free for DMA transfers and other bus masters.

CISC with the Speed of RISC

The IU was optimised so that high-usage instructions execute
in fewer clock cycles, particularly branch instructions. Motorola
said it performed thousands of code traces using real-world
applications to determine which instructions were used most
often. The IU pipeline consists of six stages: instruction
prefetch, decode, effective address calculation, operand fetch,
execution, and writeback (i.e., the result is written to either a
register or to memory). The stages work concurrently, each on a
different instruction in the pipeline. Dual prefetch and decode
units deal with the branch instructions: One set processes the
instruction taken on the branch, and another processes the
instruction not taken. In this way, no matter what the outcome,
the IU has the next instruction decoded and ready to go without
seriously disrupting the pipeline. This complex design has a big
pay-off: Motorola has determined that the average instruction
takes 1.3 clock cycles to execute. The ability to execute one
instruction per clock cycle is the performance edge of RISC
processors -- yet the 68040's IU comes close to the same goal
while executing complex-instruction-set computer (CISC)
instructions.

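The benefit of overlapping the six stages can be illustrated
with a small calculation. The sketch below assumes an idealised
pipeline in which every instruction spends exactly one clock in
each stage and nothing ever stalls. The function name and the
no-stall model are illustrative assumptions, not Motorola's
figures.

```python
# Stage names follow the article; the timing model is an assumption.
STAGES = ["prefetch", "decode", "ea_calc",
          "operand_fetch", "execute", "writeback"]


def cycles_needed(n_instructions, n_stages=len(STAGES), pipelined=True):
    """Clocks to finish n instructions in an idealised, stall-free model."""
    if n_instructions == 0:
        return 0
    if pipelined:
        # The first instruction needs n_stages clocks to pass through;
        # each later one completes one clock after its predecessor.
        return n_stages + n_instructions - 1
    # Without overlap, every instruction occupies all stages in turn.
    return n_stages * n_instructions


n = 100
print(cycles_needed(n, pipelined=False))  # 600
print(cycles_needed(n, pipelined=True))   # 105
print(cycles_needed(n) / n)               # 1.05
```

In this ideal model the cost per instruction approaches one clock
as the instruction stream gets longer. Stalls from branches and
memory waits are what push a real average above that.
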
The FPU adds 11 registers to the 68040 register set: Eight
of them are 80-bit floating-point registers, and three are
status, control, and instruction address registers. The FPU has a
three-stage execution unit, and, like the IU, each stage operates
concurrently. Load and store instructions (FMOVE) can be
performed during other arithmetic operations, and a 64- by 8-bit
hardware multiplication unit speeds many calculations. However,
the FPU implements only a subset of the 68882 instructions
on-chip. The transcendental (trigonometric and exponential)
functions are emulated in software via a software trap. But
Motorola claims that even these instructions should execute 25%
to 100% faster on a 25MHz 68040 than on a 33MHz 68882 FPU.

Boosting Throughput

In the area of throughput, each stream (instructions or data)
is managed by a separate memory unit that uses an MMU for
logical-to-physical address translations during bus accesses.
These MMUs support demand-paged virtual memory. Both MMUs have a
four-way set-associative address translation cache (ATC) with 64
entries (versus 22 entries for the 68030). The ATCs reduce
processor overhead by storing the most recent address
translations. When an address translation is required, the ATC is
searched, and if it contains the address, it is used immediately.
Otherwise, a combination of high-speed hardware logic and
microcode searches the translation tables located in main memory.

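The lookup-then-table-search behaviour described above can be
sketched as follows. This is a simplified model, not the chip's
actual logic: the sizes (16 sets of four entries, 4K-byte pages),
the oldest-first replacement, and the dictionary standing in for
the translation tables are all assumptions chosen for
illustration.

```python
PAGE_SIZE = 4096   # assumed page size for this sketch
ATC_SETS = 16      # assumed: 16 sets of four entries
ATC_WAYS = 4


class ATC:
    """Toy four-way set-associative address translation cache."""

    def __init__(self, page_table):
        # page_table is a dict standing in for the tables in main memory.
        self.page_table = page_table
        self.sets = [[] for _ in range(ATC_SETS)]  # lists of (vpn, pfn)
        self.hits = 0
        self.misses = 0

    def translate(self, logical):
        vpn, offset = divmod(logical, PAGE_SIZE)
        ways = self.sets[vpn % ATC_SETS]
        for entry_vpn, pfn in ways:
            if entry_vpn == vpn:           # hit: use the stored translation
                self.hits += 1
                return pfn * PAGE_SIZE + offset
        self.misses += 1                   # miss: search the tables
        pfn = self.page_table[vpn]
        if len(ways) == ATC_WAYS:
            ways.pop(0)                    # assumed policy: evict the oldest
        ways.append((vpn, pfn))
        return pfn * PAGE_SIZE + offset


atc = ATC({0: 7, 1: 3})
print(hex(atc.translate(0x0010)))  # 0x7010 (miss, table searched)
print(hex(atc.translate(0x0014)))  # 0x7014 (hit, same page)
print(atc.hits, atc.misses)        # 1 1
```

The second access lands in the same page as the first, so it is
served from the ATC without touching the tables again.
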
Like the FPU, these MMUs implement a subset of their
predecessor's instruction set, in this case the 68030's MMU
instructions. Gone are the PLOAD and PMOVE instructions, because
enhanced existing instructions made them superfluous. Also, only
two memory page sizes are supported, 4K and 8K bytes, whereas the
68030 MMU supported eight page sizes ranging from 256 bytes to
32K bytes. A design tradeoff was made here: A performance gain
was possible by supporting only the two most common page sizes.
In any case, this change affects only operating-system code,
since MMU instructions aren't normally used by applications.

The two on-chip 4K caches improve processor throughput in two
ways: They keep the pipelines filled and minimise system bus
accesses. To see how this is done, you must examine the structure
of the cache. Each is a four-way set-associative cache composed
of 64 sets of four lines. A line consists of four longwords, or
16 bytes. Cache lines are read or written rapidly using
burst-mode access (a type of bus transfer that moves 16 bytes in
a minimum number of clock cycles). For read operations, this
fills the cache efficiently and, at the same time, loads adjacent
instructions or data into the cache that could be used in the
near future.

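The cache geometry above implies a particular way of carving up
an address. The sketch below checks that the numbers add up to 4K
bytes and splits an address into tag, set index, and byte offset.
It assumes the conventional layout in which the low-order bits
select the byte within a line and the next bits select the set;
the function name is illustrative.

```python
LINE_BYTES = 16   # four longwords per line
NUM_SETS = 64
WAYS = 4

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 4 bits pick a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1      # 6 bits pick one of 64 sets


def split_address(addr):
    """Return (tag, set_index, byte_offset) for a physical address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(NUM_SETS * WAYS * LINE_BYTES)  # 4096
print(split_address(0x12345))        # (72, 52, 5)
```

Two addresses with the same set index but different tags compete
for the same four lines, which is why the cache is described as
four-way set-associative.
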
Zen and the Art of Cache Maintenance

As the cache is accessed and data modified, cache-mode bits
in the ATC determine, on a page-by-page basis, the method by
which the information is handled. That is, the ATC entry that
corresponds to the address in main memory whose contents were
copied into the cache decides how the data will be updated. The
modes are cacheable write-through, cacheable copyback,
noncacheable, and noncacheable I/O.

In the cacheable write-through mode, an update to the data
cache forces a write to main memory. While this generates
additional bus activity, this mode is required when working with
a portion of memory that other processors share. The copyback
mode updates the cache line without updating main memory. The
modified (or "dirty") cache line is copied back into main memory
only when absolutely necessary. "Noncacheable" indicates that the
data shouldn't be cached, which is typically the situation for
shared data structures or for locked accesses (e.g., an operand
access or a translation table entry update). Noncacheable I/O
indicates that the data can't be cached and must be read or
written in the exact order of instruction execution. This mode is
for memory-mapped I/O devices (typically a serial device) where
the information's order is crucial.

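The difference in bus traffic between the two cacheable modes can
be shown with a toy model of a single cache line. This is a
sketch under simple assumptions: every store hits the same line,
the line is evicted once at the end, and the class and function
names are made up for illustration. It counts write transactions,
not bytes.

```python
class CachedLine:
    """One cache line that counts the memory writes it causes."""

    def __init__(self, mode):
        if mode not in ("write-through", "copyback"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.dirty = False
        self.memory_writes = 0

    def store(self):
        if self.mode == "write-through":
            self.memory_writes += 1   # every store also goes to memory
        else:
            self.dirty = True         # copyback: just mark the line dirty

    def evict(self):
        if self.dirty:
            self.memory_writes += 1   # dirty line is written back once
            self.dirty = False


def writes_for(mode, n_stores):
    line = CachedLine(mode)
    for _ in range(n_stores):
        line.store()
    line.evict()
    return line.memory_writes


print(writes_for("write-through", 10))  # 10
print(writes_for("copyback", 10))       # 1
```

Repeated stores to the same line cost one memory write in
copyback mode, at the price of main memory being out of date
until the line is written back.
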
The bus snooper is used in multiple bus master situations
where a noncaching bus master, such as a DMA controller, might
modify the memory that is mapped into the 68040's cache. The bus
snooper monitors the external bus and updates the cache as
required.

Cache validity is handled on a line-by-line basis (i.e., a
cache miss triggers a burst-mode access that updates 16 bytes
either in the cache or main memory). The copyback mode minimises
writes to main memory, and the bus controller prioritises each
cache's external memory requests. Read requests take priority
over writes to ensure that the pipelines remain filled.

The caches are critical to the 040's overall throughput.
They keep instructions and data moving into the processor while
satisfying the apparently contradictory role of minimising system
bus accesses. Motorola estimates that the cache hit rate is about
93 percent for instruction and data reads and about 94 percent
for data writes.

A Processor for the 1990s

It is perhaps appropriate that Motorola has introduced the
68040 in the first month of the 1990s. The 040 has the power to
tackle the jobs involving large amounts of information that we
will be dealing with regularly in the next ten or so years.

Preliminary results have a 68040 weighing in at 20 million
instructions per second versus the SPARC's 18 MIPS and the
80486's 15 MIPS, all clocked at 25MHz. On floating-point
operations, the 68040 antes up 3.5 million floating-point
operations per second versus the SPARC's 2.6 MFLOPS and the
80486's 1 MFLOPS. If these numbers are accurate, then the 68040
already outperforms one RISC processor.

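As a rough cross-check, the 20 MIPS figure can be compared with
the 1.3 clock cycles per instruction quoted earlier. The sketch
below simply divides the clock rate by the cycles per
instruction; it ignores how the benchmark was actually measured,
and the function name is illustrative.

```python
def native_mips(clock_mhz, cycles_per_instruction):
    """Millions of instructions per second implied by clock rate and CPI."""
    return clock_mhz / cycles_per_instruction


print(round(native_mips(25, 1.3), 1))  # 19.2
```

The result is in the same range as the quoted 20 MIPS, so the two
figures are at least consistent with each other.
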
But the computer industry doesn't stand still. As we move
into the new decade, we can expect new RISC processors to once
again take the lead in performance. Still, the 68040 shows that
owners of CISC systems can have their cake and eat it, too. They
don't have to forsake their software base or settle for mediocre
performance.

And Motorola is already working on the 68050.