textfiles/computers/r4300i.txt

624 lines
24 KiB
Plaintext

R4300i MICROPROCESSOR
MIPS R4300i Microprocessor
Technical Backgrounder -Preliminary
Chapter 1. R4300i Technical Summary
-=- [PREFORMATTED] =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Performance: SPECint92 60
SPECfp92 45
ISA Compatibility MIPS-I, MIPS-II, AND MIPS-III
Master Clock Frequency 10 Mhz min/ 67 Mhz max
Pipeline Clock 10 Mhz min/ 100 Mhz max (1x, 1.5x, 2x, or 3x of master clock)
System Interface clock 67 Mhz max
Caches 16 KB I-cache and 8 KB D-cache
TLB 32 double entries; Variable Page size (4 KB to 16 MB in 4x increments)
Power dissipation: 1.8 watts (typ.) at max. operating frequency
Supply voltage min 3.0 V typ. 3.3 V max 3.6 V
Packaging: 120-pin Plastic Quad Flat Pack (PQFP)
Die size: 44 mm2
Process Technology: 0.35 micron, 3-level-metal CMOS
-=- [PREFORMATTED] =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Chapter 2. Overview
This paper introduces the RISC R4300i microprocessor from MIPS Technologies,
Inc. (MTI). The information presented in this paper discusses how
the R4300i differs from previous microprocessors from MTI.
This chapter provides general information on the R4300i, including:
ÿÿÿ[*]Introduction
ÿÿÿ[*]The R4300i microprocessor
ÿÿÿ[*]Packaging and design support
ÿÿÿ[*]Future upgrades
Introduction
Reduced instruction-set computer (RISC) architectures differ from
older complex instruction-set computer (CISC) architectures by optimizing
performance for the available silicon area. The MIPS architecture,
developed by MTI, is firmly established as the leading RISC architecture
today.
The MIPS R4300i microprocessor extends the benefits of RISC's performance
to consumer electronics. The R4300i microprocessor also delivers
high performance to existing embedded and computing applications
at a low cost. The low cost and high performance provided by the
R4300i are needed for the latest consumer applications such as interactive
television and games.
In the beginning, RISC microprocessors were typically used for high
performance applications. Lately, these processors have found their
way into the embedded systems market as well. Today, MIPS RISC processors
are used in network controllers, laser printers, and X-terminals
among other applications. The migration of MIPS RISC processors to
these applications has been facilitated by lower costs as well as
high integration of various functional blocks into a single die.
The R4300i can deliver up to 60 times the integer performance of
a VAX 11/780 (60 SPECint '92) at a cost approaching less than $1
per SPECint. The R4300i is also designed for low power so that it
can be offered in a low cost plastic package. This makes the R4300i
a strong candidate for consumer and embedded applications.
Between 1985 and 1994, three generations of the MIPS architecture
have been introduced and widely adopted. The first commercial MIPS
processor, the R2000, ran at 8-MHz and used a 32-bit architecture.
The R3000 family raised system speed to 40 MHz. The R4000 family
uses a 64-bit architecture to boost instruction throughput and increase
the available address space. It also adds multilevel cache and multiprocessor
capabilities. The R4000 family (R4400PC, R4400SC, R4400MC, R4000PC
and R4000SC) currently work at pipeline speeds of up to 200 MHz.
Recently MTI announced the MIPS R10000 microprocessor that offers
industry leading performance for scientific and database applications.
MTI's semiconductor partners have successfully implemented MIPS standard
processors in a variety of semiconductor processes and introduced
numerous derivative products. MTI semiconductor partners include
Integrated Device Technology, Inc., LSI Logic Corp., NEC Corporation,
Performance Semiconductor, Inc., Siemens and Toshiba Corporation.
Users of the MIPS architecture include AT&T, Cisco, Control Data,
NEC, Network Computing Devices, Pyramid Technology, QMS, Siemens-
Nixdorf, Silicon Graphics, Sony, Texas Instruments and Tektronix.
R4300i Microprocessor
The R4300i uses a variety of techniques to provide high performance
at low cost and low power consumption. These techniques include power-
reduction features, power management features, cost-reduction features,
and architectural optimizations.
Major R4300i characteristics include 64-bit processing, 100-MHz internal
pipeline clock frequency, low-voltage operation, power-saving modes,
plastic packaging, and a single data path for integer and floating-
point operations. The R4300i implements the MIPS-III instruction
set architecture and is fully software compatible with all existing
MIPS processors. Chapter 3 provides a complete description of architectural
enhancements in the R4300i over previous MIPS microprocessor families
and other R4000 family members.
R4300i Family
The R4300i is the first member of a family of microprocessors. The
R4300i family uses high integration, power management and virtual
memory implementation to bring high performance and low cost to the
consumer market. Future R4300i family members are planned to increase
the options available to systems developers.
The R4300i has been designed in well-defined basic blocks to simplify
implementation of the R4300i core logic in new products. For example,
by removing the caches, memory-management unit, and system interface,
a high-performance RISC core is available for integration in a derivative
processor or ASIC.
Packaging and Design Support
The R4300i will be made available in a single 120-pin PQFP package.
The 120-pin plastic quad flat pack (PQFP) offers a low-cost package
for surface-mount assembly, decreasing processor cost further to
benefit embedded applications, and with a low profile suitable for
consumer applications.
Future Upgrades
It is anticipated that higher frequency versions of the R4300i will
become available in the future. These will provide an extra performance
boost at the same low price points as the current R4300i.
Chapter 3. Implementation
The R4300i differs from the R4000 family in four main categories.
These categories, discussed in the next sections, are:-
ÿÿÿ[*]Power reduction features
ÿÿÿ[*]Power management features
ÿÿÿ[*]Cost reduction features
ÿÿÿ[*]Architectural optimization
ÿÿÿ[*]System bus interface
These features combine to achieve typical power dissipation of 1.8
watts, a reduced power mode dissipation of 0.4 watts, and a power-
down mode where the processor is turned off. These features also
allow a die size of less than 7mm on a side, while maintaining full
64-bit operation.
Power Reduction Features
The R4300i is designed using low-power design techniques. These are
techniques that reduce power dissipation while running standard tasks.
Examples of low-power design techniques, discussed below, are:
ÿÿÿ[*]3.3-Volt operation
ÿÿÿ[*]Dynamic logic design
ÿÿÿ[*]Cache bank partitioning
ÿÿÿ[*]Write-back data cache
ÿÿÿ[*]Cache prefetching
ÿÿÿ[*]Micro TLB
3.3-Volt operation
The R4300i was designed for operation at 3.3V to reduce power consumption.
CMOS power dissipation increases with the square of potential difference
between power (VDD) and ground (VSS). At lower voltage levels, however,
the threshold voltage at which a logic signal switches between zero
to one changes. Redesign of the gate's physical width-to-length ratio
is necessary to maintain logic speed at lower voltages; that slightly
increases total power dissipation. In sum, reducing the rail-to-rail
voltage difference by 1.7 volts reduces overall comparative power
dissipation to about 70%.
A lower rail-to-rail (VDD to VSS) voltage difference also increases
the chip's sensitivity to electrical noise, as there is a smaller
noise margin around the threshold voltage. MTI designed the R4300i
interface using low-voltage CMOS (LVCMOS) characterization to ensure
noise immunity and reliable signal operation.
Dynamic logic design
The R4300i uses dynamic rather than static logic design to reduce
transistor count and power dissipation. The R4300i 's power management
functions make the dynamic logic design approach suitable for use
in consumer applications.
Cache bank partitioning
Most instruction and data cache accesses exhibit spatial and temporal
locality of reference. In the R4300i, the instruction and data caches
are each split into four banks; only one of the four banks in the
instruction or data caches is powered up at any one time. This saves
power on every cache access cycle.
There is no performance degradation on cache bank misses; enabling
of cache banks is entirely transparent to system operation.
Write-back cache
The R4300i uses a write-back policy for write operations. In a write-
back policy, data in the cache is written to the main memory only
when the cache line is replaced. Cache lines can be replaced whenever
new data from a different address is to be loaded into the cache
or if the cache line is invalidated.
A write-back cache policy reduces store activity on the system bus,
which improves system performance and simplifies memory subsystem
design.
Cache prefetching
As instruction reads are usually sequential, the R4300i fetches two
consecutive 32-bit words (instructions) every time it reads the instruction
cache. Dual instruction access from the cache reduces the frequency
of cache enabling for instruction access, which reduces power dissipation.
TLB and Micro Instruction-TLB
The memory management unit (MMU) translates virtual addresses to
physical addresses by looking up address correspondences in a "page
table". The processor maintains a complete page table in main
memory, but accesses to main memory are slow. The translation lookaside
buffer (TLB) keeps copies of page table entries on-chip, which accelerates
virtual-to-physical address translation.
The TLB structure is large and consumes power when enabled. The R4300i
therefore includes a "micro TLB" on chip which contains
page table entries for the two most recently referenced instruction
pages.
Power Management Features
The R4300i also has built-in power management features in addition
to power reduction features. Power management is used when peak performance
is not required and it allows the processor to operate in different
modes. These modes require less power and therefore reduce average
power consumption over a period of time. The power management modes
, discussed below, are:-
ÿÿÿ[*]Standard operating mode
ÿÿÿ[*]Sleep mode
ÿÿÿ[*]Power-down mode
Standard operating mode
In this mode the processor operates at a maximum of 100 MHz pipeline
speed and 50 MHz external interface speed. Power dissipation at maximum
frequency in this mode is estimated at 1.8 watts.
Sleep mode
This feature allows the processor to change dynamically to one quarter
of the normal speed. For example, if the pipeline operates normally
at 100 Mhz, it would operate at 25 Mhz in sleep mode. Typically,
chipset logic triggers this mode when it detects no user activity
over some pre-determined amount of time(such as between keystrokes
or mouse movements). In reduced power mode, power dissipation falls
to 0.4 Watts, one quarter of the normal.
Power-down mode
In the R4300i, all variable registers are both readable and writable.
On power-down, the state of the processor can therefore be written
to non-volatile RAM. On power-up, the registers can be restored to
the same state.
This "instant-on" capability allows the processor to be
activated in milliseconds, instead of the many seconds normally required
to boot an operating system. This not only reduces power consumption
but is a power saving benefit for consumers.
Cost Reduction Features
The R4300i is designed for low cost. The main areas contributing
to microprocessor cost are packaging, test and assembly, and die
costs.
Packaging cost reduction
For low cost applications, the R4300i will be available in a 120-
pin plastic quad flat package (PQFP). High-performance microprocessors
normally require expensive ceramic or metal packages with enhanced
thermal dissipation because of their higher power requirements.
The reduction in the system bus width from 64 bits to 32 bits and
elimination of some control signals resulted in a low pin-count package
offering. Lower power dissipation facilitated the use of plastic
packages.
Test & assembly cost reduction
The R4300i implements column redundancy in both instruction and data
caches. Column redundancy reduces the chip's sensitivity to defects
in the caches.
The test process first tests the integrity of each bit column in
the cache; polysilicon fuses in the chip are then blown to swap redundant
bit columns for defective bit columns.
Column redundancy increases the die yield at the test stage, which
in turn decreases testing costs for each good die.
Die cost reduction
The die area was reduced by the following techniques:
ÿÿÿHigh-density 0.35 micron design rules
ÿÿÿUse of 4-transistor RAM cells in the caches
ÿÿÿUnified CPU/FPU data path
ÿÿÿReduced configurability
ÿÿÿOptimized cache and TLB size
The last three cost reduction strategies fall in the category of
architectural optimizations, discussed below.
Architectural Optimizations
The R4300i fully implements the current ISA, MIPS-III standard.
As in all the MIPS R3000 and R4000 processors, an on-chip CP0 coprocessor
contains an MMU for virtual address translation and exception processing
control. The system address/data bus interface is different from
the R4000 series processors. The R4300i has a 32-bit multiplexed
address/data bus and a 5-bit command bus. A flush buffer, which
was added to the R4400 for increased graphics performance, is retained
in the R4300i.
The R4300i also includes on-chip clock frequency division circuitry
to support internal 100-MHz operation from an external 50-MHz clock.
The R4300i has the option of operating internally at 1, 1.5, 2 or
3 times the frequency of the external clock. In addition, the R4300i
eliminates RClock that existed in the R4000. The output clocks SyncOut
and TClock can be turned off for power savings. This allows high
microprocessor performance and also simplifies system design.
The differences from the R4000, discussed below, are:
ÿÿÿ[*]Combined integer/floating-point data path
ÿÿÿ[*]Optimized 5-stage pipeline
ÿÿÿ[*]Optimized cache and TLB size
ÿÿÿ[*]Reduced physical address space
ÿÿÿ[*]Additional instruction trace support
ÿÿÿ[*]Simplified processor initialization
Unified integer/floating-point data path
The R4300i's integer unit shares its data path with the FPU unit.
CPU and floating-point instructions are both executed in the same
5-stage pipeline. This represents a considerable saving in die area
and corresponding power dissipation.
Optimized pipeline
Pipelining allows multiple instructions to overlap during execution
for greater throughput. All processor operations require five basic
operations: instruction fetch, instruction decode, instruction execution,
accessing data and writing of the results. These operations can be
split up over a pipeline so that several instructions can be treated
concurrently: while one instruction is being decoded, another can
be fetched, and so on.
The R4300i uses a five-stage pipeline instead of the eight-stage
pipeline found in the R4000 series processors. The eight-stage pipeline
allows the R4000 series processors to reach higher speeds. However,
the shorter pipeline achieves greater efficiency at a given speed
than its longer counterpart. Loads, stores, jumps and branches are
resolved in fewer cycles, and exception processing is simplified.
Less control logic means reduced die area.
Optimized cache and TLB size
The R4300i uses separate instruction and data caches. Instruction
cache size impacts performance to a greater extent owing to locality
of instruction code. The combination of 16-Kbytes instruction cache
and 8-Kbytes data cache gives optimum performance for a fixed total
cache size. The on-chip TLB is also reduced from the 48 entry-pairs
in the R4000 to 32 entry-pairs. These sizes have been selected after
extensive simulation to give the R4300i the best trade-off between
high performance and small die area. TLBs are very critical in implementing
virtual memory systems. In consumer applications such as settops,
the existence of TLBs allow for implementation of security as well
as fast context switching times when running multiple processes.
In addition, virtual page sizes from 4KB, 16KB, 64KB, 256KB, 1MB,
4MB, and 16MB are supported just as in the R4000 series processors.
Reduced physical address space
The physical address space has been reduced from 36 bits to 32 bits.
This still supports a physical address range of 4 Gbytes, more than
enough for consumer applications.
Additional instruction trace support
The R4300i includes an additional debugging mode called instruction
trace support. This mode lets the user find out the physical address
to which the CPU has branched or jumped whenever a branch, jump,
or exception is taken.
Simplified processor initialization
Processor initialization has been simplified, reflecting the lower
degree of configurability. The R4300i has two hardware pins for configuration
during the reset initialization sequence and a configuration register
in CP0 is used to set other options such as data rate or endianess.
Chapter 4. Benefit Summary
The main benefits of the R4300i, discussed below, are:
ÿÿÿ[*]Price/Performance
ÿÿÿ[*]Low Power
ÿÿÿ[*]Low Cost
ÿÿÿ[*]Compatibility
Price/Performance
The R4300i dramatically improves price/performance for both system
designers and end-users.
The R4300i achieves price/performance over ten times better than
existing microprocessors. The R4300i will be the first commercially
available processor to deliver less than $1/SPECint mark in 1995.
Low Power
The R4300i was specifically designed for high-performance consumer
applications by including reduced power and power management features.
ÿÿÿReduced-power features are those that perform traditional tasks
ÿÿÿin a way that reduces power consumption.
ÿÿÿPower management features are architectural enhancements that
ÿÿÿfurther reduce internal and system-wide power consumption through
ÿÿÿswitching modes.
The combination of reduced-power and power-management features allows
the R4300i to fit in an inexpensive plastic package.
Low Cost
In practice, good price/performance usually means increased performance
at a given price point. The R4300i, in contrast, brings existing
high-performance levels to an unprecedented low price-point.
System manufacturers can decrease component costs, thereby increasing
margins, or offer their products at a lower price, thereby stimulating
demand.
Price-sensitive consumer applications such as games and interactive
television benefit particularly from a low cost, high-speed microprocessor.
Factory automation and robotics are also likely applications for
the R4300i.
Compatibility
Software compatibility is a fundamental requirement to preserve investments
in software over time.
All MIPS processors maintain software compatibility. A program compiled
and linked to run on the R3000 processor will run on the R4300i.
Appendix A. Glossary
Cache. An on-chip temporary storage area containing a copy of main
memory fragments. Cache access is much faster than main memory access.
CISC (Complex Instruction Set Computing). A design approach that
attempts to achieve performance gains with complex instruction and
data types and hardware controlled memory management.
CPU (Central Processing Unit). The part of a microprocessor where
the majority of the instructions are executed.
Die. The silicon chip after it has been cut from a wafer and before
it has been packaged.
Flush Buffer (also called a write buffer). The flush buffer is a
temporary storage location for data that is being written from the
pipeline or cache to main memory. The flush buffer allows the processor
to continue executing instructions while data is being written to
main memory.
FPU (Floating-Point Unit). Dedicated logic to accelerate calculations
using floating-point numbers.
IU (Integer Unit). The part of a CPU that performs calculations using
integer arithmetic.
LVCMOS (Low-voltage CMOS). An IEEE standard for low-voltage logic
design.
MMU (Memory Management Unit). That part of a microprocessor which
implements virtual-to-physical address translation and the memory
system hierarchy including cache memory.
MTI (MIPS Technologies, Inc.). The developer of the MIPS RISC architecture,
the leading RISC architecture worldwide.
Page Table. An area of main memory containing sets of virtual addresses
with their corresponding physical addresses and protection data.
RISC (Reduced Instruction Set Computing). A design philosophy that
avoids implementing complex functions in silicon but realizes large
performance increases through executing simpler, standardized instructions
at faster, more efficient rates.
Pipeline. A mechanism to allow multiple instructions to overlap during
execution for greater throughput. A five-stage pipeline offers peak
performance five times that of a non-pipelined processor.
PQFP (Plastic Quad Flat Pack). A plastic package with pins on the
four edges, cheaper than a CPGA.
TLB (Translation Lookaside Buffer). An on-chip "page table"
cache containing copies of the page tables used by the MMU for virtual-
to-physical address translation.
64-bit Processor. A processor in which all address and data paths
are 64 bits wide. Leading-edge applications require 64-bit processors
today. 32-bit capability in a 64-bit processor is important to manage
the smooth transition from 32 to 64 bits. The MIPS R4000, R4400 and
R4300i MPUs can run in either 32-bit or 64-bit mode.
Appendix B. Bibliography
The following related documents are available from MIPS Technologies,
Inc.
MIPS R4400 Microprocessor, Technology Backgrounder.
MIPS Technologies, Inc., Corporate Backgrounder.
NEC Corporation, Corporate Backgrounder.
R4000 / R4400 User's Manual, [Prentice Hall].
MIPS RISC Architecture, Kane & Hall [Prentice Hall].
Contact MIPS Technologies, Inc. for a more complete list of publications
available on RISC technology and the MIPS architecture.
MIPS Technologies, Inc. Information Service:
1-800- I GO MIPS or 1-800-446-6477 Inside the US
415-688-4321 Outside the U.S
World Wide Web: URL to [*]http://www.mips.com
MIPS, the MIPS Technologies logo, and R3000 are registered trademarks,
and R2000, R4000, R4400, and R6000 are trademarks of MIPS Technologies,
Inc.
Windows NT is a trademark of Microsoft Corp.
---------------------------------------------------------------------------
HTMLCon Conversion Statistics and Information
---------------------------------------------------------------------------
Total bytes processed .......... 26394
Total lines processed .......... 665
Total links processed .......... 29
Total images processed ......... 0
Total strings replaced ......... 0
Total filters used ............. 10
Were references preserved ...... FALSE
Rough formatting preserved ..... FALSE
Space compressed used .......... TRUE
Rough line break used .......... 72