From: S_JUFFA@IRAV1.ira.uka.de (|S| Norbert Juffa)
Newsgroups: comp.sys.ibm.pc.hardware,comp.sys.intel
Subject: Compatibility issues Cyrix 486DLC/Intel 486SX w/ regard to NeXTstep OS
Message-ID: <1j699uINN8qg@iraul1.ira.uka.de>
Date: 15 Jan 1993 12:05:18 GMT
Organization: University of Karlsruhe, FRG
Lines: 550
Compatibility issues Cyrix Cx486SLC/DLC as compared to the Intel 80486SX
There has been quite a bit of discussion here recently about compatibility
issues involving the Cyrix Cx486SLC and Cx486DLC processors, in particular
about the fact that the NextStep operating system doesn't run on the Cyrix
processors for some reason. During the course of this discussion, we have
heard *a lot of opinions* (e.g. "Intel sucks", "Cyrix sucks") but only *few
facts*. So I thought it might be a good idea to throw in a bit of the latter.
I'll try to give the facts as accurately as possible, drawing from personal
experience and Intel's and Cyrix' literature on the 80486DX/SX and 486DLC/SLC.
If you think you have found erroneous information, feel free to contact me:
S_JUFFA@IRAVCL.IRA.UKA.DE (Norbert Juffa)
NOTE: I have no affiliation whatsoever with either Intel or Cyrix!
The Cyrix 486DLC is a replacement chip for the Intel/AMD 80386DX. The Cyrix
486SLC is a replacement chip for the Intel/AMD 386SX. While the internals of
the Cyrix 486SLC/DLC are roughly equivalent to those in the Intel 80486SX,
the bus interface of these chips is identical to that of the Intel 80386DX
and 80386SX CPUs, respectively, to allow easy replacement of the Intel CPUs
by the Cyrix chips. This also means that the Cx486SLC, as a replacement for
the Intel/AMD 80386SX, can only address 16 MB of memory.
The 486SLC/DLC CPUs have a register set that is identical to that found on
the Intel 80486SX. However, there are a few subtle differences in the
meaning of certain bits in some system registers (e.g. cache test registers).
These are covered in more detail below. The instruction sets of the Intel
486SX and the Cyrix Cx486SLC/DLC are identical. The execution times of specific
instructions differ between the two chips, but the overall execution speed
(measured in CPI = clocks per instruction) seems to be about the same.
On both the Intel 80486SX and the Cyrix 486SLC/DLC, there is *no* on-chip
FPU (floating point unit). To add floating point capabilities to a 486SX
based system, one would install a 487 'coprocessor', which is basically a
486DX with a slightly different pin-out, or replace the 486SX with an OverDrive
processor, a clock-doubled 486DX with the 486SX pinout. With the 486SLC/DLC,
one buys a 387 compatible coprocessor to add floating-point capabilities. It
is recommended to get a Cyrix coprocessor for this purpose, since these are
the fastest 387 compatible coprocessors available. Also, Cyrix sells kits
consisting of a 486SLC/DLC and a coprocessor that have a favourable value
for money ratio. The floating-point performance of a Cyrix 486DLC + Cyrix
83D87 combination is about 50% of that of an Intel 486DX running at the same
frequency.
The Cyrix 486SLC/DLC have a RISC-like execution unit with a flexible five
stage pipeline, just as the 80486SX has. Unlike the Intel 80486, which has
an 8 kB, 4-way associative cache on chip, the Cx486SLC/DLC only have a 1
kB, 2-way associative cache (the cache on the Cyrix chips can also be
configured to be of the direct mapped type). The 486DLC provides up to 80%
more integer performance than a 386DX at the same clock frequency, with the
average performance gain being about 35%. With the 1 kB on-chip cache enabled,
the 486DLC provides about 75% of the integer performance of a 486SX at the
same clock frequency. With the cache disabled, the 486DLC provides about 65%
of the integer performance of a 486SX. The lower performance of the Cyrix
486DLC as compared to the Intel 80486SX is mostly due to the slow 386DX bus
interface the 486DLC uses, which is up to 2 times slower than the 486 bus
interface. Some additional performance penalty is imposed by the smaller
cache on the 486SLC/DLC, which provides significantly lower hit rates than
the 8 kB cache of the 80486SX.
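The "up to 2 times slower" figure can be checked against the clock counts quoted in the implementation summary further down (486 bus: two clocks for the first access of a burst and one for each subsequent access; 386 bus: two clocks per access, no burst mode). The following is only a rough sketch under the assumption of zero wait states; the function names are made up for this illustration.

```python
def clocks_486_burst(n_accesses):
    """Clocks for one burst of n accesses on the 486 bus:
    2 clocks for the first access, 1 clock for each further one."""
    if n_accesses <= 0:
        return 0
    return 2 + (n_accesses - 1)


def clocks_386_bus(n_accesses):
    """Clocks for n accesses on the 386 bus: 2 clocks each, no burst mode."""
    return 2 * max(n_accesses, 0)


if __name__ == "__main__":
    # Assumption: 4 accesses corresponds to filling one 16-byte 486 cache
    # line over a 32-bit data bus.
    for n in (1, 2, 4):
        fast = clocks_486_burst(n)
        slow = clocks_386_bus(n)
        print(f"{n} accesses: 486 bus {fast} clocks, 386 bus {slow} clocks, "
              f"ratio {slow / fast:.2f}")
```

For a single access both buses need two clocks. The ratio grows with the length of the burst and only approaches 2 as a limit, which is consistent with the "up to" in the text.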
I have personally used the Cyrix 486DLC with my 33.3/40 MHz 386 motherboard,
which uses the Forex chip set. I have also used the Intel RapidCAD and the
C&T 38600DX with this board. These are also replacement chips for the 386DX.
Replacing the 386DX is very easy: Just pull out the AMD/Intel 386DX, then
plug in the replacement chip (here: the Cyrix 486DLC). I haven't had *major*
problems with any of the available replacement chips. The problems
encountered using the Cyrix 486DLC were:
1) When a Cyrix EMC87, Cyrix 83D87 (chips produced prior to November 1991),
or IIT 3C87 coprocessor is used with the 486DLC, the computer locks up
completely at times, especially when running protected mode multitasked
operating systems, such as Windows 3.1 in enhanced mode. This is caused
by problems with the FSAVE and FRSTOR instructions when using these
coprocessors. Cyrix tells me that this problem only occurs with first
generation 486DLCs (read: sample chips like the one I have) and that the
bug is fixed on the chips that are now available to OEMs and end users.
2) When using the DBOS 1.0 DOS-extender delivered with the Salford FTN/386
Fortran compiler, the executable of the DODUC benchmark produced by that
compiler aborts with a general protection fault. The DODUC executable
runs fine with the DBOS 1.0 DOS-extender on the Intel 386DX, C&T 38600DX,
Intel RapidCAD, and Intel 80486DX. I have informed Cyrix of the problem.
As for the problems with NextStep on the 486DLC, I have no idea what causes
them. I can think of the following possibilities:
1) NextStep has been tailored extremely close to the 486 programming model,
not allowing for even slight changes in the architecture (e.g. smaller
cache), so that the subtle changes needed to adapt the different HW of
the Cyrix 486SLC/DLC to the 486 programming model can not be accommodated.
2) NextStep includes code that only runs because it uses officially undocumented
features of the 80486 that have not been disclosed by Intel to other vendors.
3) NextStep includes code that only runs correctly on the 80486 by accident.
E.g. it could mask the contents of a system register and erroneously
include a bit that is undefined as per Intel's documentation. This undefined
bit could then be '1' on the 80486 and '0' on the 486SLC/DLC, for example,
thus leading to corruption of the system further down the execution path.
4) For correct execution, NextStep relies on the timing of certain instructions
that execute slower or faster on the Cx486SLC/DLC than they do on the Intel
80486SX (a chip that reportedly runs NextStep).
5) NEXT Corporation used an early and possibly buggy sample chip to do their
compatibility testing.
6) There is a bug in the Cyrix 486SLC/DLC that only crops up if protected
mode system level programs are used, similar to the problem I encountered
with the DBOS 1.0 DOS-extender that is described above. However, it is
interesting to note that several 32-bit operating systems have been
successfully tested on the 486SLC/DLC (see below).
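Possibility 3 can be made concrete with a small sketch. The register layout, the bit positions and all names below are invented purely for illustration; nothing here is taken from NextStep or from either vendor's documentation. The sketch only shows how a mask that accidentally includes an undefined bit can pass on one chip and fail on another.

```python
# Hypothetical register layout, for illustration only.
DEFINED_BIT = 1 << 4     # a documented status bit
UNDEFINED_BIT = 1 << 5   # a bit documented as "undefined"


def read_register(undefined_reads_as_one):
    """Simulated register read: the documented bit is set, the undefined
    bit reads as 1 on one chip and as 0 on the other."""
    value = DEFINED_BIT
    if undefined_reads_as_one:
        value |= UNDEFINED_BIT
    return value


# The pattern the buggy code happened to see on the chip it was written on.
EXPECTED_BY_BUGGY_CODE = DEFINED_BIT | UNDEFINED_BIT


def buggy_check(value):
    """Mask erroneously includes the undefined bit."""
    return (value & (DEFINED_BIT | UNDEFINED_BIT)) == EXPECTED_BY_BUGGY_CODE


def correct_check(value):
    """Mask covers only the documented bit."""
    return (value & DEFINED_BIT) != 0


if __name__ == "__main__":
    for name, reads_one in (("undefined bit reads 1", True),
                            ("undefined bit reads 0", False)):
        v = read_register(reads_one)
        print(name, "buggy:", buggy_check(v), "correct:", correct_check(v))
```

The buggy check succeeds only where the undefined bit happens to read as 1, while the correct check gives the same answer on both simulated chips.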
Summary of Intel 486SX / Cyrix 486SLC/DLC implementation details
Intel 486SX
bus interface: supports burst mode memory accesses with the first
memory access taking two clock cycles and subsequent
accesses taking only one clock cycle.
prefetch queue: 32 bytes
on-chip cache: 8 KByte unified (code and data) write-through cache.
The cache is 4-way set-associative, with 128 sets
consisting of four cache lines each. Every cache line
consists of 16 bytes. Four write buffers. Hit rate: ~95%
Invalidation of cache lines: total cache line
execution unit: RISC-like execution unit with five stage pipeline. Barrel
shifter. Conditional jump taken/not taken: 3/1 clock cycles.
Instructions that can be executed in 1 clock cycle if the
destination is a register and the source is either a register
or an immediate value:
ADC,ADD,AND,BSWAP,CMP,DEC,INC,MOV,NEG,NOT,OR,POP,PUSH,SBB,
SUB,TEST,XOR
Cyrix 486DLC
bus interface: Cx486SLC/DLC uses the same bus interface as the
Intel 386DX/386SX. Highest speed at which memory is
accessed is two clock cycles per memory access, there
is *no* burst mode. Seven additional signals have been
assigned to pins that are not connected on the 386DX/
386SX. After power-on or reset, these pins are also
electrically disabled on the Cx486SLC/DLC and must be
specifically enabled by software. Signals added are used
for cache management (KEN#, FLUSH#, RPLSET and RPLVAL#),
power management (SUSP#, SUSPA#), and A20 control (A20M#).
Each signal can be enabled/disabled independently of the
enable/disable status of the other signals.
instruction set: complete Intel 486SX instruction set, including *all*
486 specific instructions: WBINVD (write back and
invalidate data cache), XADD (exchange and add), CMPXCHG
(compare and exchange), BSWAP (Byte Swap), INVLPG
(Invalidate TLB entry), INVD (Invalidate Data Cache)
prefetch queue: 16 bytes
on-chip cache: 1 KByte unified (code and data) write-through cache.
The cache is 2-way set-associative, with 128 sets
consisting of two cache lines each. Every cache line
consists of 4 bytes. Two write buffers. Hit rate: ~65%
Invalidation of cache lines: single bytes in cache line
The cache is disabled after power-on or reset for
compatibility reasons and must be enabled by software.
Under DOS, you can use a program provided by Cyrix for
this purpose. As far as I know, there are no programs
available yet for OS/2 and Unix that enable the cache.
execution unit: RISC-like execution unit with five stage pipeline. Barrel
shifter. 16x16 bit hardware multiplier (16x16 bit multiply:
3 cycles, 32x32 bit multiply: 7 cycles, AAD: 4 cycles).
Conditional jump taken/not taken: 6/1 clock cycles.
Instructions that can be executed in 1 clock cycle
if the destination is a register and the source is
either a register or an immediate value:
ADC,ADD,AND,CDQ,CLC,CLD,CMC,CMP,CWD,DEC,INC,MOV,MOVSX,
NEG, NOT,OR,SBB,SHLD,SHRD,STC,STD,SUB,TEST,XOR
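The cache parameters in the two summaries above can be cross-checked with a little arithmetic: sets x lines per set x bytes per line should give the stated cache size, and the remaining address bits form the tag. The following is a minimal sketch assuming 32-bit physical addresses and power-of-two parameters; the function name is made up for this illustration.

```python
def cache_geometry(sets, ways, line_bytes, address_bits=32):
    """Derive total size and address split for a set-associative cache.
    Assumes sets and line_bytes are powers of two."""
    offset_bits = (line_bytes - 1).bit_length()   # byte within the line
    index_bits = (sets - 1).bit_length()          # which set
    return {
        "size_bytes": sets * ways * line_bytes,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": address_bits - index_bits - offset_bits,
    }


if __name__ == "__main__":
    # Figures taken from the summaries above.
    print("Intel 486SX :", cache_geometry(sets=128, ways=4, line_bytes=16))
    print("Cyrix 486DLC:", cache_geometry(sets=128, ways=2, line_bytes=4))
```

Under these assumptions the Intel cache comes out at 8192 bytes with a 21-bit tag and the Cyrix cache at 1024 bytes with a 23-bit tag, which agrees with the tag fields (bits 31..11 and bits 31..9) shown for TR4 later in this post.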
Summary of known compatibility issues
The following is an extract from the Cx486SLC and Cx486DLC Compatibility
Report, Cyrix Corporation 1992, Order No. 94074-00, with some additional
information added by me that has been taken from the Cyrix Cx486SLC
Microprocessor Data Sheet, Cyrix Corporation 1991, Order No. 94073-00,
the i486 Microprocessor Hardware Reference Manual, Intel Corporation,
Order No. 240552-001, and the i486 Microprocessor Programmer's Reference
Manual, Order No. 240486-001.
SUBSTANTIVE DIFFERENCES - (SOFTWARE)
SS-1 Description
The TR4 cache test register holds the cache tag address, valid bits
and LRU bits for the current cache test operation. The TR5 cache test
register defines the cache line, cache set and control bits for the
cache test operation. Since the cache size and organization differ
between the Cx486SLC/DLC and the 80486, TR4 and TR5 have similar but
not identical bit definitions on the Cx486SLC/DLC and the 80486.
Analysis
Cache test and diagnostic software - if written to explicitly depend
on the cache size and organization of the 80486 - may produce unexpected
results when run on a Cx486SLC/DLC. The results of the programs typically
have no effect on operating systems or applications software. For proper
test or diagnosis of the Cx486SLC/DLC cache, software should be used
which is specifically written to comprehend the Cx486SLC/DLC.
80486SX
      31                                      11 10  9      7 6     3 2      0
     +------------------------------------------+---+--------+-------+--------+
TR4  |                   Tag                    | V |  LRU   | Valid | Unused |
     +------------------------------------------+---+--------+-------+--------+
V This is the valid bit for the particular cache line which was
accessed. On a cache lookup, it is a copy of one of the bits
reported in bits 3..6, which are the valid bits for all four
cache lines in the selected set. On a cache write, it becomes
the new valid bit for the particular cache line selected within
the selected set.
LRU On a cache lookup, these are the three LRU bits of the set which
was accessed. On a cache write, these bits are ignored; the LRU
bits in the cache are updated by the pseudo-LRU cache replacement
algorithm. LRU bit 0 (TR4 bit 7) indicates which group of two
cache lines in the set contains the cache line that has been least
recently used. The bit is clear when the least recently used line
is either line 0 or line 1, and is set when the least recently
used line in the set is either line 2 or line 3. LRU bit 1 (TR4
bit 8) and LRU bit 2 (TR4 bit 9) indicate which of the two lines
in the group of lines selected by LRU bit 0 is the least recently
used, where LRU bit 1 indicates either line 0 (bit=0) or line 1
(bit=1) and LRU bit 2 indicates either line 2 (bit=0) or line 3
(bit=1) has been least recently used. A real LRU replacement
algorithm would have to use 5 bits.
Valid On a cache lookup, these are the four Valid bits of the set which
was accessed, where each bit corresponds to one of the four cache
lines in the set.
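The pseudo-LRU description above can be turned into a few lines of code. This is only a sketch of the decoding as described in this post, applied to a TR4 value read back after a cache lookup; the function names are made up. The second function illustrates the remark about 5 bits: four lines can be ordered in 4! = 24 ways, and distinguishing 24 orderings takes 5 bits.

```python
import math


def lru_line_486(tr4):
    """Least recently used line (0..3) of the accessed set, decoded from
    the three LRU bits in TR4 bits 7..9 as described above."""
    lru0 = (tr4 >> 7) & 1   # 0: line 0 or 1 is LRU, 1: line 2 or 3 is LRU
    lru1 = (tr4 >> 8) & 1   # chooses between line 0 (0) and line 1 (1)
    lru2 = (tr4 >> 9) & 1   # chooses between line 2 (0) and line 3 (1)
    if lru0 == 0:
        return lru1
    return 2 + lru2


def true_lru_bits(ways):
    """Bits needed to encode every possible age ordering of `ways` lines."""
    return math.ceil(math.log2(math.factorial(ways)))


if __name__ == "__main__":
    for bits in range(8):
        print(f"LRU bits {bits:03b} -> line {lru_line_486(bits << 7)}")
    print("bits for true LRU, 4 lines:", true_lru_bits(4))
```

Note that for each value of LRU bit 0 one of the other two bits does not influence the result, which is why three bits suffice for the pseudo-LRU scheme.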
486SLC/DLC
      31                                             9 8   7   6     3 2     0
     +------------------------------------------------+-+-----+-------+-------+
TR4  |                      Tag                       |U| LRU | Valid | 0 0 0 |
     +------------------------------------------------+-+-----+-------+-------+
U bit 8 is unused.
LRU On a cache lookup, this is the LRU bit associated with the cache
set. On a cache write, this bit is ignored. Bit=0 means line 0
in the selected set has been least recently used, bit=1 means line
1 in the selected set has been least recently used.
Valid On a cache lookup, these are the four valid bits for the particular
cache line accessed (one bit per byte in the cache line). On a cache write
these are the valid bits written into the line.
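To see why diagnostic software written for one TR4 layout misbehaves on the other chip, the two layouts above can be decoded side by side. This is a minimal sketch using the bit positions given in this post; the function names and the sample value are made up.

```python
def decode_tr4_486(tr4):
    """Split a TR4 value according to the 80486 layout."""
    return {
        "tag": (tr4 >> 11) & 0x1FFFFF,   # bits 31..11
        "v": (tr4 >> 10) & 0x1,          # bit 10
        "lru": (tr4 >> 7) & 0x7,         # bits 9..7
        "valid": (tr4 >> 3) & 0xF,       # bits 6..3, one per line in the set
    }


def decode_tr4_cx486(tr4):
    """Split a TR4 value according to the Cx486SLC/DLC layout."""
    return {
        "tag": (tr4 >> 9) & 0x7FFFFF,    # bits 31..9
        "lru": (tr4 >> 7) & 0x1,         # bit 7 (bit 8 is unused)
        "valid": (tr4 >> 3) & 0xF,       # bits 6..3, one per byte in the line
    }


if __name__ == "__main__":
    raw = 0x00000400   # arbitrary sample value with only bit 10 set
    print("80486 view:", decode_tr4_486(raw))
    print("Cx486 view:", decode_tr4_cx486(raw))
```

The same raw value means "valid bit set, tag 0" in the 80486 layout but "tag 2" in the Cyrix layout. Even where the fields line up (bits 6..3), the meaning differs: per line in the set on the 80486, per byte in the line on the Cx486SLC/DLC.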
80486SX
      31                                  11 10               4 3     2 1    0
     +--------------------------------------+------------------+-------+------+
TR5  |                Unused                |    Set Select    | Entry | Ctrl |
     +--------------------------------------+------------------+-------+------+
Set Select Selects one of the 128 sets of the cache.
Entry Selects one of the four cache lines within the selected set.
Ctrl 00 write to cache fill buffer, or read from cache read buffer
01 perform cache write
10 perform cache read
11 flush cache (mark all entries invalid)
486SLC/DLC
      31                                  11 10               4 3   2   1    0
     +--------------------------------------+------------------+-+-----+------+
TR5  |                Unused                |    Set Select    |U| Ent | Ctrl |
     +--------------------------------------+------------------+-+-----+------+
Set Select Selects one of the 128 sets of the cache.
U bit 3 is unused
Entry Selects one of the two cache lines within the selected set.
Ctrl 00 ignored
01 perform cache write
10 perform cache read
11 flush cache (mark all entries invalid)
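The TR5 layouts above differ only in the Entry field and in the meaning of control code 00. A small sketch of how a TR5 command word could be assembled for each chip follows; the constants and function names are made up, and the code only builds integers, it does not touch any hardware.

```python
CTRL_WRITE = 0b01   # perform cache write
CTRL_READ = 0b10    # perform cache read
CTRL_FLUSH = 0b11   # flush cache (mark all entries invalid)


def tr5_486(set_index, entry, ctrl):
    """TR5 value for the 80486: set in bits 10..4, entry in bits 3..2."""
    if not 0 <= set_index < 128:
        raise ValueError("set_index must be 0..127")
    if not 0 <= entry < 4:
        raise ValueError("80486 has four lines per set")
    return (set_index << 4) | (entry << 2) | (ctrl & 0b11)


def tr5_cx486(set_index, entry, ctrl):
    """TR5 value for the Cx486SLC/DLC: set in bits 10..4, entry in bit 2,
    bit 3 unused."""
    if not 0 <= set_index < 128:
        raise ValueError("set_index must be 0..127")
    if not 0 <= entry < 2:
        raise ValueError("Cx486SLC/DLC has two lines per set")
    return (set_index << 4) | (entry << 2) | (ctrl & 0b11)


if __name__ == "__main__":
    print(hex(tr5_486(5, 3, CTRL_READ)))
    print(hex(tr5_cx486(5, 1, CTRL_READ)))
```

A test program that walks all four entries of a set, as is appropriate for the 80486, would set bit 3 for entries 2 and 3, a bit that the Cyrix layout marks as unused.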
SS-2 Description
The 80486 NW (not write-through) bit in CR0 disables 80486 write-through
capability. If the cache disabled bit is on, a write occurs to a cache-hit
location, and NW is a 1, then the 80486 does not perform an external write
bus cycle. This bit is not available on the Cx486SLC/DLC and is fixed at
zero.
Analysis
The NW bit on the 80486 allows for a capability of self-contained
processing once a program has been loaded into the cache and the cache
disabled. Programs that use this feature will work on the Cx486SLC/DLC
with writes happening on external write bus cycles.
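The effect of the missing NW bit can be modelled in a few lines. This is a sketch only: the bit positions used (CD = CR0 bit 30, NW = CR0 bit 29) are taken from Intel's 486 documentation rather than from this post, the chip names are just labels, and the functions are made up for illustration.

```python
CR0_NW = 1 << 29   # not write-through (assumed position, per Intel docs)
CR0_CD = 1 << 30   # cache disable (assumed position, per Intel docs)


def cr0_after_write(value, chip):
    """CR0 contents after software writes `value` (only NW is modelled)."""
    value &= 0xFFFFFFFF
    if chip == "cx486":
        return value & ~CR0_NW      # NW is fixed at zero
    if chip == "80486":
        return value
    raise ValueError("unknown chip label")


def external_write_cycle(cr0, cache_hit):
    """True if a memory write produces an external write bus cycle.
    Suppressed only when CD=1, NW=1 and the write hits the cache."""
    suppressed = bool(cr0 & CR0_CD) and bool(cr0 & CR0_NW) and cache_hit
    return not suppressed


if __name__ == "__main__":
    requested = CR0_CD | CR0_NW
    for chip in ("80486", "cx486"):
        cr0 = cr0_after_write(requested, chip)
        print(chip, "external write on cache hit:",
              external_write_cycle(cr0, cache_hit=True))
```

In this model the same program runs on both chips; only the bus traffic differs, as the analysis above states.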
SS-3 Description
On systems with hardware FPUs, whose FPU ERROR signal is routed to the
CPU ERROR signal (NE bit set on the 80486DX), a floating point error is
normally acknowledged by the CPU upon execution of the next floating
point instruction. If the next floating point instruction is a load single
or load double precision that would have generated a General Protection
(GP) fault, it is possible for the Cx486SLC/DLC to acknowledge the GP
fault before the coprocessor error fault. The 80486 acknowledges the
coprocessor error first.
Analysis
This condition (FPU ERROR connected to CPU ERROR) does not occur in PC
compatible designs.
INFORMATIONAL DIFFERENCES - (SOFTWARE)
IS-1 Description
Certain 80486 flag bits in the flags register are documented by Intel
as undefined after execution of certain instructions. Testing at Cyrix
has shown that the final states of these flag bits are in fact
unpredictable. The Cx486SLC/DLC leaves the flag bit values unmodified
after execution of the same instructions.
Analysis
Since the flag bits are documented by Intel to be undefined after certain
operations, software can not reliably use the resulting flag bit values.
IS-2 Description
Early revision 80486SX CPUs have a programmable Numeric Exception control
bit in control register CR0 (bit 5). This bit was intended to control
whether numeric exceptions are handled internally (NE=1) or driven
externally on a discrete CPU pin (NE=0). On these 80486SXs, the NE bit
can be set to a one even though numeric exceptions can not be handled
internally due to the fact that no coprocessor exists. Reading the NE
bit on the Cx486SLC/DLC always returns a zero indicating that numeric
exceptions are always handled externally.
Analysis
Since the Cx486SLC/DLC does not have an on-board floating point unit, the
coprocessor interface (including numeric exception signaling) operates in
a fashion compatible with the 80386. The Cx486SLC/DLC and 80386 use an
external coprocessor which generates the numeric exception and always
return zero when the NE bit is read.
IS-3 Description
When trying to reference CR1 in protected mode while not at the highest
privilege level (level 0), the 80486 generates an Invalid Opcode fault,
whereas the Cx486SLC/DLC generate a General Protection (GP) fault.
Analysis
The Cx486SLC/DLC and 80486 do not define the bits in the CR1 register.
Since there are no valid bits in the CR1 register, any exception taken,
whether it is a GP fault or Invalid Opcode fault, will signal that an
invalid operation has taken place.
IS-4 Description
When using the Translation Lookaside Buffer (TLB) test registers, the
undefined bits in TR7 may differ between the 80486 and the Cx486SLC/DLC
when a look-up miss (TR7 bit 4 is clear) occurs. This includes the REP
field (bits 2-3).
Analysis
The majority of the bits in TR7 are documented by Intel to be undefined
after a TLB look-up miss. Therefore, software programs can not reliably
use the resulting values of these undefined bits.
IS-5 Description
Cx486SLC/DLC reads and writes to Debug Register 4 (DR4) and Debug
Register 5 (DR5) result in accesses to Debug Register 6 (DR6) and
Debug Register 7 (DR7), respectively. Accessing DR4 and DR5 on the
80486 produces an Invalid Opcode fault.
Analysis
DR4 and DR5 are documented as undefined by Intel on the 80486. Since
the results are undefined, software programs can not reliably use the
register results.
IS-6 Description
Writing duplicate TLB tags using the TLB test registers generates
different results on the Cx486SLC/DLC than on the 80486 when the
duplicate address is looked up. The result of writing duplicate
TLB tags is documented as undefined by Intel.
Analysis
Writing duplicate TLB tags using the TLB test registers is an unsupported
operation. The Cx486SLC/DLC and 80486 return undefined results when
looking up the resulting address. Since the results are undefined,
software programs can not reliably use the register results.
IS-7 Description
The 80486 imposes a performance penalty in order to report debug faults
precisely. The Cx486SLC/DLC reports debug faults precisely without a
performance penalty (except for a repeated MOVS instruction).
Analysis
The Cx486SLC/DLC provides superior debugging capability.
IS-8 Description
The 80486 writes zeroes to the destination register when executing a
Bit-Scan Forward (BSF) instruction if all zeroes are found in the
specified bit map. The Cx486SLC/DLC leaves the destination register
unchanged under this condition.
Analysis
The value in the destination register of a BSF instruction is specified
by Intel to be undefined when a one bit is not found in the source
operand. Since the results are undefined, software programs can not
reliably use the register results.
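The BSF difference can be modelled as follows. This is a behavioural sketch based on the description above plus the architecturally defined rule that BSF sets the zero flag when the source operand is zero; the function name and the chip labels are made up.

```python
def bsf(source, dest, chip):
    """Model of a 32-bit BSF. Returns (new_dest, zf).
    For a non-zero source both chips return the index of the lowest set
    bit with ZF=0. For a zero source ZF=1 and the destination differs."""
    if chip not in ("80486", "cx486"):
        raise ValueError("unknown chip label")
    source &= 0xFFFFFFFF
    if source == 0:
        if chip == "80486":
            return 0, 1          # destination overwritten with zero
        return dest, 1           # destination left unchanged
    lowest_set_bit = (source & -source).bit_length() - 1
    return lowest_set_bit, 0


if __name__ == "__main__":
    for chip in ("80486", "cx486"):
        print(chip, bsf(0, 0x1234, chip), bsf(0b1000, 0x1234, chip))
```

Portable code should test the zero flag after BSF and ignore the destination when it is set; code written that way behaves identically on both chips in this model.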
IS-9 Description
Memory versions of the instructions ADC, ADD, AND, DEC, INC, MOVS, NEG,
NOT, OR, RCL, ROL, ROR, SAL, SAR, SBB, SUB, SHL, SHLD, SHR, SHRD, XCHG,
and XOR read the destination memory, operate on it, and write it back to
memory. The Cx486SLC/DLC checks the writability of the destination before
performing these instructions. On non-writable locations, the Cx486SLC/
DLC faults before starting the instruction. The 80486 performs the read,
sets the read location accessed bit, and modifies the flags before
faulting.
Analysis
By checking the writability first prior to execution of the instruction
(at no performance penalty), the Cx486SLC/DLC avoids unnecessary
operations. Leaving the accessed bit and flag contents in their original
state is preferred if the instruction is restarted.
IS-10 Description
In the case above, if the read location is also not present, the 80486
will attempt the read, take a page fault, reload the page, restart the
instruction, and then take a GP fault. The Cx486SLC/DLC will take a GP
fault.
Analysis
The 80486 wastes time loading the requested page before taking the
required GP fault. The GP fault is eventually detected by both the 80486
and the Cx486SLC/DLC.
IS-11 Description
If a locked instruction accesses a memory page marked as not present, the
80486 reports in the error code that the access type was a write while
the Cx486SLC/DLC reports that the access type was a read.
Analysis
Since the page is not present in either case (read or write), the same
page fault is taken by both the Cx486SLC/DLC and the 80486.
IS-12 Description
When alignment checking is enabled and an ENTER instruction that misaligns
the stack is executed, the 80486 generates an alignment check fault even
though the misaligned stack has not been accessed. The Cx486SLC/DLC
generates the alignment check fault only when the misaligned stack is
accessed.
Analysis
The Cx486SLC/DLC correctly generates an alignment check fault only when
a misaligned stack is accessed. The 80486 unnecessarily takes the fault
in the case described.
IS-13 Description
When executing a REP LOOPE (repeated loop while equal) instruction, the
80486 does not perform the "if equal" function of the instruction. The
Cx486SLC/DLC does perform the "if equal" check under the same
circumstances.
Analysis
The 80486 execution should be considered incorrect. The Cx486SLC/DLC
correctly executes this instruction sequence.
IS-14 Description
The 80486 incorrectly asserts the LOCK# pin while entering the illegal
instruction exception handler when using the LOCK prefix on instructions
other than those allowed (Only BTS, BTR, BTC, XCHG, INC, DEC, NOT, NEG,
ADD, ADC, SUB, SBB, AND, OR, XOR are allowed). The Cx486SLC/DLC correctly
does not assert LOCK# in this case.
Analysis
When using the 80486 in a multi-processor environment, the bus may be
locked unnecessarily causing performance degradation.
Operating systems/operating environments tested with the Cx486SLC/DLC:
Digital Research: Concurrent DOS 386 5.0, DR-DOS 6.0
Ergo: OS/386
IBM: IBM DOS 3.3, IBM DOS 4.0, OS/2 2.0, OS/2 SE 1.3
IGC: VM/386 2.01
Interactive: Interactive Unix 3.2
Mark Williams: Coherent 3.1, Coherent 3.2
Microsoft: MS-DOS 3.3, MS-DOS 4.01, MS-DOS 5.0, Windows 3.0, Windows 3.1
Pharlap: DOS-Extender 286, DOS-Extender 386
Quarterdeck: Desqview 386 2.32
Rational: DOS/4G
SCO: SCO Open Desktop, SCO Unix, SCO Xenix 2.3.2c
Symantec: Norton Desktop for Windows 1.0
UHC: Developers Environment, Network Module, X11R4/Motif Windowing
Module, UNIX Release 4.0 Ver. 3.6