textfiles/computers/pentium.txt

328 lines
16 KiB
Plaintext

Intel Pentium(TM) Processor Technical Backgrounder
Intel's Pentium(TM) Processor: The Next Generation of Power
The Pentium(TM) processor is the newest and most powerful member
of Intel'sx86 family of microprocessors. While incorporating new
features and improvements made possible by advances in
semiconductor technology, the Pentium processor is 100% code
compatible with previous members of x86 family, preserving the
value of user's software investments.
The Pentium processor incorporates a superscalar architecture,
improved floating point unit, separate on-chip code and
write-back data caches, 64- bit external data bus, and other
features designed to provide a platform for high-performance
computing.
The State of Processor Design Art
In recent years, developments in the art of semiconductor design
and manufacturing have made it possible to produce increasingly
more powerful microprocessors in smaller and smaller packages.
Chief among these developments has been the decreasing size of
components. Microprocessor designers are now working with CMOS
(complementary metal-oxide semiconductor) process technology
with features of less than a micron (one- millionth of a meter)
in size.
The use of sub-micron components allows designers to fit more of
them on a chip. The number of transistors in each member of the
x86 family has continued to grow, culminating in the Pentium
processor, which is implemented in 0.8 micron CMOS technology
and has 3.1 million transistors.
The increase in transistors has made it possible to integrate
components that were previously external to the CPU (such as
math coprocessors and caches) and place them on-board the chip.
Placing components on-board decreases the time required to
access them and increases performance dramatically. Another way
to decrease the distance between components (and therefore
increase the speed of communication between them) is to provide
multiple levels of metal for interconnection. Intel's current
microprocessor technology utilizes a 3 metal layer, the layout
of which requires special computer-aided design tools.
The Pentium processor utilizes the latest in microprocessor
design technology to provide performance comparable to that of
alternative architectures used in scientific and engineering
workstations, while maintaining compatibility with the immense
installed base of software now available for the x86 family of
microprocessors.
Intel's x86 family
The history of the personal computer industry is intimately
associated with the history of Intel's x86 chip family. In 1985,
Intel introduced the ground-breaking Intel386TM DX CPU, a 32-bit
microprocessor that executed 3 to 4 million instructions per
second (MIPS). Available in speeds ranging from 16 MHz up to 33
MHz, the 80386 addresses up to 4 gigabytes of physical memory,
and up to 64 terabytes of "virtual memory" (a technology
borrowed from mainframe computers that allows systems to work
with programs and data larger than their actual physical memory.)
The 80386 provided for true, robust multitasking and the ability
to create "virtual 8086" systems, each running securely in its
own 1-megabyte address space. Like its predecessors, the i386 DX
microprocessor spawned a new generation of personal computers,
which had the ability to run 32-bit operating systems and
ever-more complicated applications, all the while maintaining
compatibility with previous members of the x86 family.
In 1989, Intel shipped the Intel486TM DX microprocessor, which
incorporated an enhanced 386-compatible core, math coprocessor,
cache memory, and cache controller--a total of 1.2 million
transistors--all on a single chip. Operating at an initial speed
of 25MHz, the Intel486 DX CPU processed up to 20 MIPS. At its
current peak speed of 50 MHz, the Intel486 DX CPU processes up
to 41 MIPS. By incorporating RISC principles in its CPU core
(specifically, instruction pipelining), the Intel486 DX CPU is
able to execute most instructions in a single clock cycle. In
spite of these powerful new features, the Intel486 DX
microprocessor maintains 100% compatibility with previous
members of the x86 family, thereby preserving customers'
investment in software.
With the 1992 introduction of the Intel486TM DX2 microprocessor,
Intel
increased the speed of the 486 family by as much as 70 percent.
The DX2 family features a technology called "clock doubling,"
which allows the processor to operate twice as fast internally
as externally. The Intel486 DX2 CPU is nevertheless
pin-compatible with the Intel486 DX processor. At its current
peak speed of 66 MHz, the Intel486 DX2 CPU executes up to 54
MIPS.
The Pentium processor is the next step in Intel's commitment to
provide the highest possible performance at the best price,
while maintaining compatibility with previous Intel processors.
First Superscalar x86 Compatible Processor
The heart of the Pentium processor is its superscalar design,
built around two instruction pipelines, each capable of
performing independently. These pipelines allow the Pentium
processor to execute two integer instructions in a single clock
cycle, nearly doubling the chip's performance relative to a
Intel486 chip at equal frequency.
The Pentium processor's pipelines are similar to the single
pipeline of the Intel486 CPU, but they have been optimized to
provide increased performance. Like the Intel486 CPU's pipeline,
the pipelines in the Pentium processor execute integer
instructions in 5 stages: Prefetch, Instruction Decode, Address
Generate, Execute, and Write Back. When an instruction passes
from Prefetch to Instruction Decode, the pipeline is then free
to begin another operation.
In many instances, the Pentium processor can issue two
instructions at once, one to each of the pipelines, in a process
known as " instruction pairing." In this case, the instructions
must both be "simple", and the v- pipe always receives the next
sequential instruction after the one issued to the u-pipe. Each
pipeline has its own ALU (arithmetic logic unit), address
generation circuitry, and interface to the data cache.
While the Intel486 microprocessor incorporated a single 8 Kbyte
cache, the Pentium processor features two 8K caches, one for
instructions and one for data. These caches act as temporary
storage places for instructions and data obtained from slower,
main memory; when a system uses data, it will likely use it
again, and getting it from an on-chip cache is much faster than
getting it from main memory.
The Pentium processor's caches are 2-way set-associative caches,
an improvement over simpler, direct-mapped designs. They are
organized with 32-byte lines, which allows the cache circuitry
to search only 2 32-byte lines rather than the entire cache. The
use of 32-byte lines (up from 16- byte lines on the 486 DX) is a
good match of the Pentium processor's bus width (64 bits) with
burst length (4 chunks.)
When the circuitry needs to store instructions or data in a
cache that is already filled, it discards the least recently
used information (according to an "LRU" algorithm) and replaces
it with the information at hand.
The data cache has two interfaces, one to each of the pipelines,
which allows it to provide data for two separate operations in a
single clock cycle. When data is removed from the data cache
(and only then), it is written into main memory, a technique
known as write-back caching. Write- back caching provides better
performance than simpler write-through caching, in which the
processor writes data to the cache and main memory at the same
time (though the Pentium processor can be dynamically configured
to support write-through caching).
To ensure that the data in the cache and in main memory are
consistent with one another (especially a concern with
multiprocessor systems), the data cache implements a cache
consistency protocol known as MESI. This protocol defines four
states, which are assigned to each line of the cache based on
actions performed on that line by a CPU. By obeying the rules of
the protocol during memory read/writes, the Pentium processor
maintains cache consistency and circumvents problems that might
be caused by multiple processors using the same data.
The use of separate caches for instructions and data works in
conjunction with other elements of the Pentium processor's
design to provide increased performance and faster throughput
compared to the Intel486 microprocessor. For example, the first
stage of the pipeline is Prefetch, during which instructions are
obtained from the instruction cache. With a single cache,
conflicts might occur between instruction prefetches and data
accesses. Providing separate caches for instructions and data
precludes such conflicts and allows both operations to take
place simultaneously.
The Pentium processor also increases performance by using a
small cache known as the Branch Target Buffer (BTB) to provide
dynamic branch prediction. When an instruction leads to a
branch, the BTB "remembers" the instruction and the address of
the branch taken. The BTB uses this information to predict which
way the instruction will branch the next time it is used,
thereby saving time that would otherwise be required to retrieve
the desired branch target. When the BTB makes a correct
prediction, the branch is executed without delay, which enhances
performance.
The combination of instruction pairing and dynamic branch
prediction can speed operations considerably. For example, a
single iteration of the classic Sieve of Eratosthenes benchmark
requires 6 clock cycles to execute on the Intel486
microprocessor. The same code executes in only 2 clock cycles on
the Pentium processor.
Improved Floating Point Unit
The floating point unit in the Pentium processor has been
completely redesigned over that in the Intel486 microprocessor.
It incorporates an 8- stage pipeline, which can execute one
floating point operation every clock cycle. (In some instances,
it can execute two floating point operations per clock--when the
second instruction is an Exchange.)
The first four stages of the FPU pipeline are the same as that
of the integer pipelines. The final four stages consist of a
two-stage Floating Point Execute, rounding and writing of the
result to the register file, and Error Reporting. The FPU
incorporates new algorithms that increase the speed of common
operations (such as ADD, MUL, and LOAD) by a factor of 3 times.
Performance Improvements
The Pentium processor's new architectural features--its
superscalar design, separate instruction and data caches,
write-back data caching, branch prediction, and redesigned
FPU--will enable the development of new applications software,
in addition to improving the performance of current applications
in a manner that is completely transparent to the end user.
Internally, the Pentium processor uses a 32-bit bus, like that
of the Intel486. However, the external data bus to memory is
64-bits wide, doubling the amount of data that may be
transferred in a single bus cycle. The Pentium processor
supports several types of bus cycles, including burst mode,
which loads large (256-bit) portions of data into the data cache
in a single bus cycle. The 64-bit data bus allows the Pentium
processor to transfer data to and from memory at rates up to 528
Mbyte/sec, a more than 3-fold increase over the peak transfer
rate of the 50 MHz Intel486 (160 Mbyte/sec).
Several instructions (such as MOV and ALU operations) have been
hardwired into the Pentium processor, which allows them to
operate more quickly. In addition, numerous microcode
instructions execute more quickly due to the Pentium processor's
dual pipelines. Finally, the Pentium processor features an
increased page size, which results in less page swapping in
larger applications.
The result of the Pentium processor's new architectural features
and enhancements to the 486 architecture is performance
improvement ranging from 3 to 5 times (5 to 10 times for
floating point intensive applications) when compared to a 33 MHz
486 DX and 2.5 times when compared to the 66 MHz Intel486 DX2
CPU.
Such dramatic performance improvements will meet the demands of
computing in a number of areas: advanced multitasking operating
systems that support graphical user interfaces, such as Windows
NT, OS/2, and new Unix implementations; compute-intensive
graphics applications such as 3-D modeling, computer-aided
design/engineering (CAD/CAE), large-scale financial analysis,
high-throughput client/server; handwriting and voice
recognition; network applications; virtual reality; electronic
mail that combines many of the above areas; and new applications
yet to be developed.
Data Integrity
The Pentium processor employs a number of techniques to maintain
the integrity of the data with which it is working. Error
detection is performed on two levels: via parity checking at the
external pins; and internally, on the on-chip memory structures
(cache, buffers, and microcode ROM.)
For situations where data integrity is especially crucial, the
Pentium processor supports Functional Redundancy Checking (FRC).
FRC requires the use of two Pentium chips, one acting as the
master and the other as the "checker". The two chips run in
tandem, and the checker compares its output with that of the
master Pentium processor to assure that errors have not
occurred. The use of FRC results in an error detection rate that
is greater than 99 percent.
The Pentium processor includes a number of built-in features for
testing the reliability of the chip. These include: a Built-In
Self Test that tests 70% of the Pentium processor's components
upon resetting the chip; an implementation of the IEEE 1149.1
standard (Test Access Port and Boundary Scan Architecture),
which provides a standard interface for manufacturers to test
the external connections to the Pentium processor; and Probe
Mode which provides access to the software visible Pentium
processor registers for the purpose of determining the current
state of the processor.
The Pentium processor also provides performance monitoring
features that will make it easier for developers to take fullest
advantage of the Pentium processor's superscalar architecture.
System developers will be able to monitor the "hit rates" of the
instruction and data caches, as well as the length of time the
Pentium processor spends waiting for the external bus, which
will help in the optimal design of external memory. The ability
to measure address generation interlocks and parallelism will
help compiler authors develop the most effective methods for
instruction scheduling.
So that system developers may design systems with different
features for specific applications, the Pentium processor
incorporates a System Management Mode (SMM) similar to that of
the Intel386 SL architecture. Power management and security
features are two areas for which SMM is useful.
High Performance While Maintaining Compatibility
The Pentium processor is a high-performance microprocessor that
incorporates the latest state-of-the-art design principles to
meet the needs of newly developing areas of applications
software, while nevertheless maintaining complete compatibility
with the $50 billion installed base of software currently
running on members of the x86 family.
Users will experience dramatic performance improvements while
running their current software, and can anticipate new
applications that take advantage of the Pentium processor's
high-performance features.
Intel486, i486, Intel386 and Pentium are trademarks of Intel
Corporation.