328 lines
16 KiB
Plaintext
328 lines
16 KiB
Plaintext
Intel Pentium(TM) Processor Technical Backgrounder
|
|
|
|
Intel's Pentium(TM) Processor: The Next Generation of Power
|
|
|
|
The Pentium(TM) processor is the newest and most powerful member
|
|
of Intel'sx86 family of microprocessors. While incorporating new
|
|
features and improvements made possible by advances in
|
|
semiconductor technology, the Pentium processor is 100% code
|
|
compatible with previous members of x86 family, preserving the
|
|
value of user's software investments.
|
|
|
|
The Pentium processor incorporates a superscalar architecture,
|
|
improved floating point unit, separate on-chip code and
|
|
write-back data caches, 64- bit external data bus, and other
|
|
features designed to provide a platform for high-performance
|
|
computing.
|
|
|
|
The State of Processor Design Art
|
|
|
|
In recent years, developments in the art of semiconductor design
|
|
and manufacturing have made it possible to produce increasingly
|
|
more powerful microprocessors in smaller and smaller packages.
|
|
Chief among these developments has been the decreasing size of
|
|
components. Microprocessor designers are now working with CMOS
|
|
(complementary metal-oxide semiconductor) process technology
|
|
with features of less than a micron (one- millionth of a meter)
|
|
in size.
|
|
|
|
The use of sub-micron components allows designers to fit more of
|
|
them on a chip. The number of transistors in each member of the
|
|
x86 family has continued to grow, culminating in the Pentium
|
|
processor, which is implemented in 0.8 micron CMOS technology
|
|
and has 3.1 million transistors.
|
|
|
|
The increase in transistors has made it possible to integrate
|
|
components that were previously external to the CPU (such as
|
|
math coprocessors and caches) and place them on-board the chip.
|
|
Placing components on-board decreases the time required to
|
|
access them and increases performance dramatically. Another way
|
|
to decrease the distance between components (and therefore
|
|
increase the speed of communication between them) is to provide
|
|
multiple levels of metal for interconnection. Intel's current
|
|
microprocessor technology utilizes a 3 metal layer, the layout
|
|
of which requires special computer-aided design tools.
|
|
|
|
The Pentium processor utilizes the latest in microprocessor
|
|
design technology to provide performance comparable to that of
|
|
alternative architectures used in scientific and engineering
|
|
workstations, while maintaining compatibility with the immense
|
|
installed base of software now available for the x86 family of
|
|
microprocessors.
|
|
|
|
Intel's x86 family
|
|
|
|
The history of the personal computer industry is intimately
|
|
associated with the history of Intel's x86 chip family. In 1985,
|
|
Intel introduced the ground-breaking Intel386TM DX CPU, a 32-bit
|
|
microprocessor that executed 3 to 4 million instructions per
|
|
second (MIPS). Available in speeds ranging from 16 MHz up to 33
|
|
MHz, the 80386 addresses up to 4 gigabytes of physical memory,
|
|
and up to 64 terabytes of "virtual memory" (a technology
|
|
borrowed from mainframe computers that allows systems to work
|
|
with programs and data larger than their actual physical memory.)
|
|
|
|
The 80386 provided for true, robust multitasking and the ability
|
|
to create "virtual 8086" systems, each running securely in its
|
|
own 1-megabyte address space. Like its predecessors, the i386 DX
|
|
microprocessor spawned a new generation of personal computers,
|
|
which had the ability to run 32-bit operating systems and
|
|
ever-more complicated applications, all the while maintaining
|
|
compatibility with previous members of the x86 family.
|
|
|
|
In 1989, Intel shipped the Intel486TM DX microprocessor, which
|
|
incorporated an enhanced 386-compatible core, math coprocessor,
|
|
cache memory, and cache controller--a total of 1.2 million
|
|
transistors--all on a single chip. Operating at an initial speed
|
|
of 25MHz, the Intel486 DX CPU processed up to 20 MIPS. At its
|
|
current peak speed of 50 MHz, the Intel486 DX CPU processes up
|
|
to 41 MIPS. By incorporating RISC principles in its CPU core
|
|
(specifically, instruction pipelining), the Intel486 DX CPU is
|
|
able to execute most instructions in a single clock cycle. In
|
|
spite of these powerful new features, the Intel486 DX
|
|
microprocessor maintains 100% compatibility with previous
|
|
members of the x86 family, thereby preserving customers'
|
|
investment in software.
|
|
|
|
With the 1992 introduction of the Intel486TM DX2 microprocessor,
|
|
Intel
|
|
|
|
increased the speed of the 486 family by as much as 70 percent.
|
|
The DX2 family features a technology called "clock doubling,"
|
|
which allows the processor to operate twice as fast internally
|
|
as externally. The Intel486 DX2 CPU is nevertheless
|
|
pin-compatible with the Intel486 DX processor. At its current
|
|
peak speed of 66 MHz, the Intel486 DX2 CPU executes up to 54
|
|
MIPS.
|
|
|
|
The Pentium processor is the next step in Intel's commitment to
|
|
provide the highest possible performance at the best price,
|
|
while maintaining compatibility with previous Intel processors.
|
|
|
|
First Superscalar x86 Compatible Processor
|
|
|
|
The heart of the Pentium processor is its superscalar design,
|
|
built around two instruction pipelines, each capable of
|
|
performing independently. These pipelines allow the Pentium
|
|
processor to execute two integer instructions in a single clock
|
|
cycle, nearly doubling the chip's performance relative to a
|
|
Intel486 chip at equal frequency.
|
|
|
|
The Pentium processor's pipelines are similar to the single
|
|
pipeline of the Intel486 CPU, but they have been optimized to
|
|
provide increased performance. Like the Intel486 CPU's pipeline,
|
|
the pipelines in the Pentium processor execute integer
|
|
instructions in 5 stages: Prefetch, Instruction Decode, Address
|
|
Generate, Execute, and Write Back. When an instruction passes
|
|
from Prefetch to Instruction Decode, the pipeline is then free
|
|
to begin another operation.
|
|
|
|
In many instances, the Pentium processor can issue two
|
|
instructions at once, one to each of the pipelines, in a process
|
|
known as " instruction pairing." In this case, the instructions
|
|
must both be "simple", and the v- pipe always receives the next
|
|
sequential instruction after the one issued to the u-pipe. Each
|
|
pipeline has its own ALU (arithmetic logic unit), address
|
|
generation circuitry, and interface to the data cache.
|
|
|
|
While the Intel486 microprocessor incorporated a single 8 Kbyte
|
|
cache, the Pentium processor features two 8K caches, one for
|
|
instructions and one for data. These caches act as temporary
|
|
storage places for instructions and data obtained from slower,
|
|
main memory; when a system uses data, it will likely use it
|
|
again, and getting it from an on-chip cache is much faster than
|
|
getting it from main memory.
|
|
|
|
The Pentium processor's caches are 2-way set-associative caches,
|
|
an improvement over simpler, direct-mapped designs. They are
|
|
organized with 32-byte lines, which allows the cache circuitry
|
|
to search only 2 32-byte lines rather than the entire cache. The
|
|
use of 32-byte lines (up from 16- byte lines on the 486 DX) is a
|
|
good match of the Pentium processor's bus width (64 bits) with
|
|
burst length (4 chunks.)
|
|
|
|
When the circuitry needs to store instructions or data in a
|
|
cache that is already filled, it discards the least recently
|
|
used information (according to an "LRU" algorithm) and replaces
|
|
it with the information at hand.
|
|
|
|
The data cache has two interfaces, one to each of the pipelines,
|
|
which allows it to provide data for two separate operations in a
|
|
single clock cycle. When data is removed from the data cache
|
|
(and only then), it is written into main memory, a technique
|
|
known as write-back caching. Write- back caching provides better
|
|
performance than simpler write-through caching, in which the
|
|
processor writes data to the cache and main memory at the same
|
|
time (though the Pentium processor can be dynamically configured
|
|
to support write-through caching).
|
|
|
|
To ensure that the data in the cache and in main memory are
|
|
consistent with one another (especially a concern with
|
|
multiprocessor systems), the data cache implements a cache
|
|
consistency protocol known as MESI. This protocol defines four
|
|
states, which are assigned to each line of the cache based on
|
|
actions performed on that line by a CPU. By obeying the rules of
|
|
the protocol during memory read/writes, the Pentium processor
|
|
maintains cache consistency and circumvents problems that might
|
|
be caused by multiple processors using the same data.
|
|
|
|
The use of separate caches for instructions and data works in
|
|
conjunction with other elements of the Pentium processor's
|
|
design to provide increased performance and faster throughput
|
|
compared to the Intel486 microprocessor. For example, the first
|
|
stage of the pipeline is Prefetch, during which instructions are
|
|
obtained from the instruction cache. With a single cache,
|
|
conflicts might occur between instruction prefetches and data
|
|
accesses. Providing separate caches for instructions and data
|
|
precludes such conflicts and allows both operations to take
|
|
place simultaneously.
|
|
|
|
The Pentium processor also increases performance by using a
|
|
small cache known as the Branch Target Buffer (BTB) to provide
|
|
dynamic branch prediction. When an instruction leads to a
|
|
branch, the BTB "remembers" the instruction and the address of
|
|
the branch taken. The BTB uses this information to predict which
|
|
way the instruction will branch the next time it is used,
|
|
thereby saving time that would otherwise be required to retrieve
|
|
the desired branch target. When the BTB makes a correct
|
|
prediction, the branch is executed without delay, which enhances
|
|
performance.
|
|
|
|
The combination of instruction pairing and dynamic branch
|
|
prediction can speed operations considerably. For example, a
|
|
single iteration of the classic Sieve of Eratosthenes benchmark
|
|
requires 6 clock cycles to execute on the Intel486
|
|
microprocessor. The same code executes in only 2 clock cycles on
|
|
the Pentium processor.
|
|
|
|
Improved Floating Point Unit
|
|
|
|
The floating point unit in the Pentium processor has been
|
|
completely redesigned over that in the Intel486 microprocessor.
|
|
It incorporates an 8- stage pipeline, which can execute one
|
|
floating point operation every clock cycle. (In some instances,
|
|
it can execute two floating point operations per clock--when the
|
|
second instruction is an Exchange.)
|
|
|
|
The first four stages of the FPU pipeline are the same as that
|
|
of the integer pipelines. The final four stages consist of a
|
|
two-stage Floating Point Execute, rounding and writing of the
|
|
result to the register file, and Error Reporting. The FPU
|
|
incorporates new algorithms that increase the speed of common
|
|
operations (such as ADD, MUL, and LOAD) by a factor of 3 times.
|
|
|
|
Performance Improvements
|
|
|
|
The Pentium processor's new architectural features--its
|
|
superscalar design, separate instruction and data caches,
|
|
write-back data caching, branch prediction, and redesigned
|
|
FPU--will enable the development of new applications software,
|
|
in addition to improving the performance of current applications
|
|
in a manner that is completely transparent to the end user.
|
|
|
|
Internally, the Pentium processor uses a 32-bit bus, like that
|
|
of the Intel486. However, the external data bus to memory is
|
|
64-bits wide, doubling the amount of data that may be
|
|
transferred in a single bus cycle. The Pentium processor
|
|
supports several types of bus cycles, including burst mode,
|
|
which loads large (256-bit) portions of data into the data cache
|
|
in a single bus cycle. The 64-bit data bus allows the Pentium
|
|
processor to transfer data to and from memory at rates up to 528
|
|
Mbyte/sec, a more than 3-fold increase over the peak transfer
|
|
rate of the 50 MHz Intel486 (160 Mbyte/sec).
|
|
|
|
Several instructions (such as MOV and ALU operations) have been
|
|
hardwired into the Pentium processor, which allows them to
|
|
operate more quickly. In addition, numerous microcode
|
|
instructions execute more quickly due to the Pentium processor's
|
|
dual pipelines. Finally, the Pentium processor features an
|
|
increased page size, which results in less page swapping in
|
|
larger applications.
|
|
|
|
The result of the Pentium processor's new architectural features
|
|
and enhancements to the 486 architecture is performance
|
|
improvement ranging from 3 to 5 times (5 to 10 times for
|
|
floating point intensive applications) when compared to a 33 MHz
|
|
486 DX and 2.5 times when compared to the 66 MHz Intel486 DX2
|
|
CPU.
|
|
|
|
Such dramatic performance improvements will meet the demands of
|
|
computing in a number of areas: advanced multitasking operating
|
|
systems that support graphical user interfaces, such as Windows
|
|
NT, OS/2, and new Unix implementations; compute-intensive
|
|
graphics applications such as 3-D modeling, computer-aided
|
|
design/engineering (CAD/CAE), large-scale financial analysis,
|
|
high-throughput client/server; handwriting and voice
|
|
recognition; network applications; virtual reality; electronic
|
|
mail that combines many of the above areas; and new applications
|
|
yet to be developed.
|
|
|
|
Data Integrity
|
|
|
|
The Pentium processor employs a number of techniques to maintain
|
|
the integrity of the data with which it is working. Error
|
|
detection is performed on two levels: via parity checking at the
|
|
external pins; and internally, on the on-chip memory structures
|
|
(cache, buffers, and microcode ROM.)
|
|
|
|
For situations where data integrity is especially crucial, the
|
|
Pentium processor supports Functional Redundancy Checking (FRC).
|
|
FRC requires the use of two Pentium chips, one acting as the
|
|
master and the other as the "checker". The two chips run in
|
|
tandem, and the checker compares its output with that of the
|
|
master Pentium processor to assure that errors have not
|
|
occurred. The use of FRC results in an error detection rate that
|
|
is greater than 99 percent.
|
|
|
|
The Pentium processor includes a number of built-in features for
|
|
testing the reliability of the chip. These include: a Built-In
|
|
Self Test that tests 70% of the Pentium processor's components
|
|
upon resetting the chip; an implementation of the IEEE 1149.1
|
|
standard (Test Access Port and Boundary Scan Architecture),
|
|
which provides a standard interface for manufacturers to test
|
|
the external connections to the Pentium processor; and Probe
|
|
Mode which provides access to the software visible Pentium
|
|
processor registers for the purpose of determining the current
|
|
state of the processor.
|
|
|
|
The Pentium processor also provides performance monitoring
|
|
features that will make it easier for developers to take fullest
|
|
advantage of the Pentium processor's superscalar architecture.
|
|
System developers will be able to monitor the "hit rates" of the
|
|
instruction and data caches, as well as the length of time the
|
|
Pentium processor spends waiting for the external bus, which
|
|
will help in the optimal design of external memory. The ability
|
|
to measure address generation interlocks and parallelism will
|
|
help compiler authors develop the most effective methods for
|
|
instruction scheduling.
|
|
|
|
So that system developers may design systems with different
|
|
features for specific applications, the Pentium processor
|
|
incorporates a System Management Mode (SMM) similar to that of
|
|
the Intel386 SL architecture. Power management and security
|
|
features are two areas for which SMM is useful.
|
|
|
|
High Performance While Maintaining Compatibility
|
|
|
|
The Pentium processor is a high-performance microprocessor that
|
|
incorporates the latest state-of-the-art design principles to
|
|
meet the needs of newly developing areas of applications
|
|
software, while nevertheless maintaining complete compatibility
|
|
with the $50 billion installed base of software currently
|
|
running on members of the x86 family.
|
|
|
|
Users will experience dramatic performance improvements while
|
|
running their current software, and can anticipate new
|
|
applications that take advantage of the Pentium processor's
|
|
high-performance features.
|
|
|
|
|
|
|
|
Intel486, i486, Intel386 and Pentium are trademarks of Intel
|
|
Corporation.
|
|
|
|
|
|
|
|
|
|
|