Last Change 7/17/93. Please send updates directly to Harald.


86BUGS.LST revision 1.0
By Harald Feldmann (harald.feldmann@almac.co.uk), mail address:
Hamarsoft, p.o. box 91, 6114 ZH Susteren, The Netherlands.
(Please retain my name and address in the document)

This file lists undocumented and buggy instructions of the Intel 80x86
family of processors. Some of the information was obtained from the book
"Programmer's Technical Reference: The Processor and Coprocessor" by
Robert L. Hummel, Ziff-Davis Press, ISBN 1-56276-016-5, which is highly
recommended. Note that Intel does not support the special features and
may decide to drop opcode variants and instructions in future products.

All mentioned trademarks and/or tradenames are owned by the respective
owners and are acknowledged.

Undocumented instructions and undocumented features of Intel and IIT
processors:

AAD:    OPCODE: d5,0a                                   OPCODE VARIANT

        This instruction normally performs the following action:
        - unpacked BCD in AX                    example (AX = 0104h)
        - AL = AH * 10d + AL                            (AL = 0eh  )
        - AH = 00                                       (AH = 00h  )

        The normal opcode is d5,0a. The instruction is really an
        opcode byte (d5) followed by an operand byte (0a). By
        replacing the second byte with any number in the range 00 -
        ff we can build our own AAD instruction for other number
        bases. For example, coding d5,10 gives an instruction that
        performs: AL = AH * 16d + AL.

        Note: the variant is not supported on all 80x86-compatible
        CPUs, notably the NEC V-series, because some hard-code the
        multiplier at 0Ah.
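
The generalised AAD can be modelled in a few lines. The sketch below is
Python rather than assembly; the function name aad and its base
parameter are illustrative only. It assumes a CPU that honours the
operand byte and it does not model the flags.

```python
def aad(ax: int, base: int = 0x0A) -> int:
    """Model of AAD with an arbitrary operand byte (opcode d5,base).

    Returns the new AX: AL = (AH * base + AL) truncated to 8 bits, AH = 0.
    Flags are not modelled.
    """
    ah = (ax >> 8) & 0xFF
    al = ax & 0xFF
    al = (ah * base + al) & 0xFF   # the result is kept in 8 bits
    return al                      # AH is cleared, so AX equals AL


print(hex(aad(0x0104)))        # d5,0a: 1 * 10 + 4
print(hex(aad(0x0104, 0x10)))  # d5,10: 1 * 16 + 4
```

With base 10h the instruction turns two hex digits held in AH and AL
into one byte, which is a common reason for using the variant.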

AAM:    OPCODE: d4,0a                                   OPCODE VARIANT

        This instruction normally performs the following action:
        - binary number in AL
        - AH = AL / 10d
        - AL = AL MOD 10d

        Thus creating an unpacked BCD in AX.
        The normal opcode is d4,0a. The instruction is really an
        opcode byte (d4) followed by an operand byte (0a). By
        replacing the second byte with any number in the range 00 -
        ff we can build our own AAM instruction for other number
        bases. For example, coding d4,07 gives an instruction that
        performs: AH = AL / 07d, AL = AL MOD 07d.

        The AAD and AAM opcode variants have been found in Future
        Domain SCSI controller ROMs.
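
The generalised AAM described above can be modelled the same way. This
is a sketch in Python, with the hypothetical name aam; flags are not
modelled. An operand byte of 00 is a division by zero, which on the
processor raises the divide error exception (INT 0) and in this model
surfaces as Python's ZeroDivisionError.

```python
def aam(al: int, base: int = 0x0A) -> int:
    """Model of AAM with an arbitrary operand byte (opcode d4,base).

    Returns the new AX: AH = AL / base, AL = AL MOD base.
    base == 0 raises ZeroDivisionError, standing in for INT 0.
    """
    ah, al = divmod(al & 0xFF, base)
    return (ah << 8) | al


print(hex(aam(14)))           # d4,0a: 14 = 1 * 10 + 4
print(hex(aam(14, 7)))        # d4,07: 14 = 2 * 7 + 0
print(hex(aam(0xFF, 0x10)))   # d4,10: split a byte into two nibbles
```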


LOADALL: OPCODE: 0f,05 (i80286) & 0f,07 (i80386 & i80486)
                                                        UNDOCUMENTED

        Load _ALL_ processor registers. Does exactly what the name
        suggests; separate versions exist for the i80286 and the
        i80386. The i80286 LOADALL instruction reads a block of 102
        bytes into the chip, starting at address 000800 hex. The
        i80286 LOADALL takes 195 clocks to execute.
        The sequence is as follows (Hex address, Bytes, Register):

        0800:    6   N/A
        0806:    2   MSW        (Machine Status Word)
        0808:   14   N/A
        0816:    2   TR         (Task Register)
        0818:    2   FLAGS      (Flags)
        081a:    2   IP         (Instruction Pointer)
        081c:    2   LDT        (Local Descriptor Table)
        081e:    2   DS         (Data Segment)
        0820:    2   SS         (Stack Segment)
        0822:    2   CS         (Code Segment)
        0824:    2   ES         (Extra Segment)
        0826:    2   DI         (Destination Index)
        0828:    2   SI         (Source Index)
        082a:    2   BP         (Base Pointer)
        082c:    2   SP         (Stack Pointer)
        082e:    2   BX         (BX register)
        0830:    2   DX         (DX register)
        0832:    2   CX         (CX register)
        0834:    2   AX         (AX register)
        0836:    6   ES cache   (ES descriptor _cache_)
        083c:    6   CS cache   (CS descriptor _cache_)
        0842:    6   SS cache   (SS descriptor _cache_)
        0848:    6   DS cache   (DS descriptor _cache_)
        084e:    6   GDTR       (Global Descriptor Table)
        0854:    6   LDT cache  (Local Descriptor _cache_)
        085a:    6   IDTR       (Interrupt Descriptor Table)
        0860:    6   TSS cache  (Task State Segment _cache_)
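
As a cross-check of the table above, the sketch below rebuilds the
addresses from the field sizes alone and confirms that the block is 102
bytes long. It is Python, and the names LOADALL_286_FIELDS and
field_offsets are made up for this illustration.

```python
# (size in bytes, name) in the order listed above; "N/A" marks unused bytes.
LOADALL_286_FIELDS = [
    (6, "N/A"), (2, "MSW"), (14, "N/A"), (2, "TR"), (2, "FLAGS"),
    (2, "IP"), (2, "LDT"), (2, "DS"), (2, "SS"), (2, "CS"), (2, "ES"),
    (2, "DI"), (2, "SI"), (2, "BP"), (2, "SP"), (2, "BX"), (2, "DX"),
    (2, "CX"), (2, "AX"), (6, "ES cache"), (6, "CS cache"),
    (6, "SS cache"), (6, "DS cache"), (6, "GDTR"), (6, "LDT cache"),
    (6, "IDTR"), (6, "TSS cache"),
]
LOADALL_286_BASE = 0x0800  # fixed address read by the i80286 LOADALL


def field_offsets(fields=LOADALL_286_FIELDS, base=LOADALL_286_BASE):
    """Return ({name: address}, total size); the unused gaps are skipped."""
    offsets, addr = {}, base
    for size, name in fields:
        if name != "N/A":
            offsets[name] = addr
        addr += size
    return offsets, addr - base


offsets, total = field_offsets()
print(total)                      # size of the whole block in bytes
print(hex(offsets["TSS cache"]))  # address of the last field
```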

        Descriptor cache entries are internal copies of the
        original registers (the LDT cache is normally a copy of the
        last regularly _loaded_ LDT). Note that after executing
        LOADALL, the chip will use the _cache_ registers without
        re-checking the caches against the regular registers. That
        means that cache and register do not have to be the same.
        A cache is updated when its original register is loaded
        again in the regular way. Both will then contain the same
        value.

        Descriptor cache layout:
        3 bytes   24 bit physical address of segment
        1 byte    access rights byte, mapped like the access rights
                  byte in a regular descriptor. The present bit
                  now acts as a valid bit: if this bit is cleared
                  (zero) the segment is invalid and accessing it
                  will trigger exception 0dh. The DPL (Descriptor
                  Privilege Level) fields of the CS and SS
                  descriptor caches determine the CPL (Current
                  Privilege Level).
        2 bytes   16 bit segment limit.
        This layout is the same for the GDTR and IDTR registers,
        except that the access rights byte must be zero.
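
To make the 6-byte layout concrete, here is a small Python sketch that
packs one i80286 descriptor cache entry. The name pack_cache_286 is
hypothetical, and the little-endian byte order within each field is an
assumption based on the usual 80x86 convention. The example values
(base 100000h, access rights 93h, limit ffffh) are only an illustration
of a present, writable data segment.

```python
import struct


def pack_cache_286(base: int, access: int, limit: int) -> bytes:
    """Pack one 6-byte i80286 descriptor cache entry.

    Layout as listed above: 3 bytes base, 1 byte access rights,
    2 bytes limit. Little-endian byte order is assumed.
    """
    if not 0 <= base < 1 << 24:
        raise ValueError("base must fit in 24 bits")
    # "<I" packs 4 bytes; drop the (zero) top byte to keep a 24 bit base.
    return struct.pack("<I", base)[:3] + struct.pack("<BH", access, limit)


entry = pack_cache_286(0x100000, 0x93, 0xFFFF)
print(entry.hex())
```

For the GDTR and IDTR entries the same helper applies with the access
rights byte set to zero.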


i80386 LOADALL:
        The i80386 variant loads 204 (dec) bytes from the address at
        ES:EDI and resumes execution in the specified state.
        No timing information available.

        relative offset:  Bytes:  Registers:
        0000:              4      CR0
        0004:              4      EFLAGS
        0008:              4      EIP
        000c:              4      EDI
        0010:              4      ESI
        0014:              4      EBP
        0018:              4      ESP
        001c:              4      EBX
        0020:              4      EDX
        0024:              4      ECX
        0028:              4      EAX
        002c:              4      DR6
        0030:              4      DR7
        0034:              4      TR
        0038:              4      LDT
        003c:              4      GS (zero extended)
        0040:              4      FS (zero extended)
        0044:              4      DS (zero extended)
        0048:              4      SS (zero extended)
        004c:              4      CS (zero extended)
        0050:              4      ES (zero extended)
        0054:             12      TSS descriptor cache
        0060:             12      IDT descriptor cache
        006c:             12      GDT descriptor cache
        0078:             12      LDT descriptor cache
        0084:             12      GS descriptor cache
        0090:             12      FS descriptor cache
        009c:             12      DS descriptor cache
        00a8:             12      SS descriptor cache
        00b4:             12      CS descriptor cache
        00c0:             12      ES descriptor cache

        Descriptor cache layout:
        1 byte    zero
        1 byte    access rights byte, same as i80286
        2 bytes   zero
        4 bytes   32 bit physical base address of segment
        4 bytes   32 bit segment limit
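
The i80386 figures can be checked the same way. The Python sketch below
packs one 12-byte descriptor cache entry and recomputes the 204 byte
total from the table. The names CACHE_386 and pack_cache_386 are
invented for this example, and little-endian byte order is assumed.

```python
import struct

# One i80386 descriptor cache entry, little-endian (an assumption):
# zero byte, access rights byte, zero word, 32 bit base, 32 bit limit.
CACHE_386 = struct.Struct("<BBHII")

DWORD_FIELDS = 21   # CR0 through ES in the table above, 4 bytes each
CACHE_FIELDS = 10   # TSS through ES descriptor caches, 12 bytes each
LOADALL_386_SIZE = DWORD_FIELDS * 4 + CACHE_FIELDS * CACHE_386.size


def pack_cache_386(base: int, access: int, limit: int) -> bytes:
    """Pack one 12-byte descriptor cache entry for the LOADALL table."""
    return CACHE_386.pack(0, access, 0, base, limit)


entry = pack_cache_386(0x00100000, 0x93, 0xFFFFFFFF)
print(len(entry), LOADALL_386_SIZE)
```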


UNKNOWN: OPCODE: 0f,04                                  UNDOCUMENTED

        This instruction is likely to be an alias for LOADALL on
        the i80286. It is not documented and is even marked as
        unused in the 'Programmer's technical reference'. Still, it
        executes on the i80286.                 >> info wanted <<


SETALC: OPCODE: d6                                      UNDOCUMENTED

        This instruction copies the Carry Flag to the AL register.
        If the Carry Flag is set (CY), AL becomes ffh. If the Carry
        Flag is cleared (NC), AL becomes 00h.
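
A minimal Python model of SETALC follows, next to a model of the
documented SBB AL,AL, which leaves the same value in AL. Both function
names are hypothetical. The models cover only the AL result; as far as
I know SBB AL,AL also rewrites the flags while SETALC leaves them
alone, and that difference is not modelled here.

```python
def setalc(carry_flag: int) -> int:
    """Model of SETALC (opcode d6): new AL from the Carry Flag (0 or 1)."""
    return 0xFF if carry_flag else 0x00


def sbb_al_al(al: int, carry_flag: int) -> int:
    """AL after SBB AL,AL: AL - AL - CF, truncated to 8 bits."""
    return (al - al - carry_flag) & 0xFF


# The two agree for every starting AL and both Carry Flag states.
print(all(setalc(cf) == sbb_al_al(al, cf)
          for al in range(256) for cf in (0, 1)))
```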


Floating Point special instructions:

FMUL4X4: OPCODE: db,f1                                  IIT ONLY

        This instruction is available only on the IIT (Integrated
        Information Technology Inc.) math processors.
        Takes 242 clocks.
        The instruction performs a 4x4 matrix multiply in one
        instruction using four banks of 8 floating point registers.
        The operands must be loaded to a specific bank in a specific
        order. The equation solved can be represented by:

        Xn = (A00 * Xo) + (A01 * Yo) + (A02 * Zo) + (A03 * Vo)
        Yn = (A10 * Xo) + (A11 * Yo) + (A12 * Zo) + (A13 * Vo)
        Zn = (A20 * Xo) + (A21 * Yo) + (A22 * Zo) + (A23 * Vo)
        Vn = (A30 * Xo) + (A31 * Yo) + (A32 * Zo) + (A33 * Vo)

        Where Xo stands for the original X value and Xn for the
        result. Operands must be loaded to the following registers
        in the specified banks in the specified order.

                        Before FMUL4X4          After FMUL4X4

                        bank                    bank
        Register:       0       1       2       0

        ST(0)           Xo      A33     A31     Xn
        ST(1)           Yo      A23     A21     Yn
        ST(2)           Zo      A13     A11     Zn
        ST(3)           Vo      A03     A01     Vn
        ST(4)                   A32     A30     ?
        ST(5)                   A22     A20     ?
        ST(6)                   A12     A10     ?
        ST(7)                   A02     A00     ?
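
The register table is easier to follow with a small model. The Python
sketch below assumes that FMUL4X4 computes the ordinary product of a
4x4 matrix with the vector (Xo, Yo, Zo, Vo); that reading is my
assumption, and the names fmul4x4 and matrix_to_banks are hypothetical.
matrix_to_banks only reproduces the ST(0)..ST(7) order from the table
and says nothing about how the values get onto the FPU stack.

```python
def fmul4x4(a, v):
    """4x4 matrix times 4-vector: result[i] = sum of a[i][j] * v[j]."""
    return [sum(a[i][j] * v[j] for j in range(4)) for i in range(4)]


def matrix_to_banks(a):
    """Arrange a[row][col] as the ST(0)..ST(7) contents of bank 1 and
    bank 2, in the order given by the table above."""
    bank1 = [a[3 - k][3] for k in range(4)] + [a[3 - k][2] for k in range(4)]
    bank2 = [a[3 - k][1] for k in range(4)] + [a[3 - k][0] for k in range(4)]
    return bank1, bank2


# Label each element by its indices so the bank order is easy to read:
# element A23 becomes the number 23, A00 becomes 0, and so on.
labels = [[10 * row + col for col in range(4)] for row in range(4)]
print(matrix_to_banks(labels))

identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
print(fmul4x4(identity, [1.0, 2.0, 3.0, 4.0]))
```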



        All four banks can be selected by using the bankswitching
        instructions, but only banks 0, 1 and 2 make sense, since
        bank 3 is an internal scratchpad. Each bank can hold 8
        floating point values and may be re-used with normal
        instructions. Each bank acts like an independent i80287,
        except across a bank switch, in which case the status that
        was active before the switch is not maintained.

        Pseudo-multichip operation can be performed in each bank
        and even in multiple banks at the same time (although only
        one instruction will operate on one register at any given
        time), provided that the active register and top register
        are not changed after switching from bank to bank.


EXAMPLE:
        FINIT                           ; reset control word
        FSBP1                           ; select bank 1
        FLD     DWORD PTR es:[si]       ; first original
        FLD     DWORD PTR es:[si+4]     ; second original
        FLD     DWORD PTR es:[si+8]     ; third original
        FSTCW   WORD PTR [bx]           ; save FPU control status
        FSBP2                           ; NOTE ! you will see three
                                        ; active registers in this
                                        ; bank when using a
                                        ; debugger
        FINIT                           ; nothing visible
        FLD     DWORD PTR [si]          ; new value
        FLD     DWORD PTR [si+4]        ; second new value
        FADD    ST,ST(1)                ; two values visible
        FSTP    DWORD PTR [si+8]        ; one value visible
        FSBP1                           ; one original visible
        FLDCW   WORD PTR [bx]           ; restore FPU status to the
                                        ; one active in bank 1,
                                        ; causing original three
                                        ; values to be visible
                                        ; again in correct
                                        ; sequence

        ; ... simply continue with what you wanted to do with
        ; those numbers from es:[si], they are still there.

        FLD     DWORD PTR [si+8]        ; for instance...


        This feature of the IIT chips can be used to perform complex
        operations in registers when many operands remain the same
        for a large dataset: save the intermediate result to ONE
        memory location, bankswitch to the next series of operands,
        load that ONE operand and continue the calculation with the
        next set of operands already in that bank. This does
        require another read into the new bank, but may save time
        and memory space compared to memory-based operands or
        multiple-pass algorithms with multiple arrays of
        intermediate results.



BANKSWITCH INSTRUCTIONS:

FSBP0:  OPCODE: db,e8                                   IIT ONLY
        Selects the original bank. (default)            (6 clocks)


FSBP1:  OPCODE: db,eb                                   IIT ONLY

        Selects bank 1 from FMUL4X4 instruction diagram (6 clocks)


FSBP2:  OPCODE: db,ea                                   IIT ONLY

        Selects bank 2 from FMUL4X4 instruction diagram (6 clocks)

FSBP3:  OPCODE: db,e9                       IIT ONLY    UNDOCUMENTED
        Selects the scratchpad bank 3 used internally by FMUL4X4.
        Not very useful but fun to look at... How-to: load
        any value into bank 0, 1 or 2 until you have a full 8
        registers, then execute this bankswitch. Using a
        debugger like CodeView you are now able to inspect the
        bank 3 registers. (most likely takes 6 clocks)
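
An assembler that lacks the IIT mnemonics can still emit these
instructions as raw bytes. The Python sketch below turns the opcodes
listed in this file into MASM-style DB lines. The table name
IIT_OPCODES and the helper as_db are invented for this example, and the
output syntax is only one plausible choice.

```python
# Opcode bytes exactly as listed in this file.
IIT_OPCODES = {
    "FSBP0":   (0xDB, 0xE8),
    "FSBP1":   (0xDB, 0xEB),
    "FSBP2":   (0xDB, 0xEA),
    "FSBP3":   (0xDB, 0xE9),
    "FMUL4X4": (0xDB, 0xF1),
}


def as_db(mnemonic: str) -> str:
    """Return a MASM-style DB line that emits the opcode bytes."""
    data = ", ".join(f"0{byte:02X}h" for byte in IIT_OPCODES[mnemonic])
    return f"DB {data}    ; {mnemonic}"


for name in IIT_OPCODES:
    print(as_db(name))
```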



TRIGONOMETRIC FUNCTIONS:

        Apparently the IIT 2c87 recognises and executes some
        i80387 trigonometric functions.                 UNDOCUMENTED
        FSIN (sine) and FCOS (cosine) have been tested and function
        according to the Intel 80387 specifications. FSINCOS
        (available on the Intel 80287XL, 80387 and up) does not
        work.

FSIN:   OPCODE: d9,fe   IIT 2c87+ (also Intel 80387+)   UNDOCUMENTED
        Calculates the sine of the input in radians in ST(0). After
        calculation, ST(0) contains the sine. Takes approximately
        120 clocks.

FCOS:   OPCODE: d9,ff   IIT 2c87+ (also Intel 80387+)   UNDOCUMENTED
        Calculates the cosine of the input in radians in ST(0).
        After calculation, ST(0) contains the cosine. Takes
        approximately 120 clocks.


... CUT HERE FOR FIRST REVISION, next part is to be revised ...



Instructions by mnemonic:

mnemonic:       opcode: processor:    remark & remedy:

AAA                     i80286 & i80386 & i80486

CMPS                    i80286
CMPXCHG                 i80486
FINIT
FSTSW
FSTCW


INS                     i80286 &
                        i80386 &
                        i80486

INVD                    i80486



MOV to SS       n/a     early 8088    Some early 8088s do not properly
                                      disable interrupts after a move to
                                      the SS register. The workaround is
                                      to disable interrupts explicitly
                                      (CLI), update SS and SP, and then
                                      re-enable interrupts (STI).
                                      Typically this occurs when
                                      relocating a stack in memory more
                                      than 64Kb away from the original
                                      one, updating both SS and SP as in:
                                      MOV SS,AX ; should disable
                                                ; interrupts
                                                ; automatically during
                                                ; this and the next
                                                ; instruction.
                                      MOV SP,DX ; interrupts disabled
                                      ...       ; interrupts enabled.


multiple prefixes
with REPx               8088 & 8086   When more than one prefix is used,
                                      e.g. LOCK REP MOVSW CS:[BX], these
                                      processors do not properly restart
                                      at the first prefix byte after an
                                      interrupt. Because of the CS
                                      override, the REP and LOCK prefixes
                                      are not recognised as part of the
                                      instruction, and the REP MOVSW is
                                      aborted. A workaround is to test
                                      for CX==0 after the instruction
                                      and to repeat it if CX is not zero:
                                      here: LOCK REP MOVSW CS:[BX]
                                            OR CX,CX
                                            JNZ here
                                      This also seems to be the case for
                                      a REP MOVSW CS:[BX]. Note that this
                                      also implies that REPZ and REPNZ
                                      are affected, in SCASW for
                                      instance.