textfiles/computers/DOCUMENTATION/a07.txt

CHAPTER 7   THE FLOATING-POINT PROCESSOR


In this chapter, we'll refer to the various Central Processing
Units (CPUs) as the "86".  Thus "86" refers to either the 8088,
8086, 80186, 80286, etc.  We'll refer to the various coprocessors
as the "87".  Thus "87" refers to either the 8087, the 287, the
387, or the special IIT-2C87 processor.


The 8087 and 287 Coprocessors

All IBM-PC's, and most clones, contain a socket for a floating
point coprocessor.  If you shell out between $80 and $300, and
plug the appropriate chip into that socket, then a host of
floating point instructions is added to the assembly language
instruction set.

The original IBM-PC, and the XT, accept the original floating
point chip, the 8087.  The AT accepts a later update, the 287.
From a programming standpoint, the two chips are nearly
identical: the 287 adds the instructions FSETPM and FSTSW AX, and
ignores the instructions FENI and FDISI.  There is, however, a
rather nasty design flaw in the 8087, that was corrected in the
287.

To understand the flaw, you must understand how the 86 and 87
work as coprocessors. Whenever the 86 sees a floating point
instruction, it communicates the instruction, and any associated
memory operands, to the 87.  Then the 86 goes on to its next
instruction, operating in parallel with the 87.  That's OK, so
long as the following instructions don't do one of the following:

  1. Execute another floating point instruction; or

  2. Try to read the results of the still-executing floating
     point instruction.

If they do, then you must provide an instruction called WAIT (or
synonymously FWAIT), which halts the 86 until the 87 is finished.
For almost all floating point instructions, it should not be
necessary to provide an explicit FWAIT; the 86 ought to know that
it should wait.  For the 8087, it IS necessary to give an
explicit FWAIT before each floating point instruction: that is
the flaw.

Because of the flaw, all assemblers supporting the 8087 will
silently insert an FWAIT code (hex 9B) before all 87
instructions, except those few (the FN instructions other than
FNOP) not requiring the FWAIT.  A86 provides the switch +F (the F
must be capitalized), to signal that the 287 is the target
processor.  A86 also provides the directive ".287", compatible
with Microsoft's assembler, that you can insert into your
programs to accomplish the same thing as +F. However, the actions
taken by A86 and Microsoft when seeing .287 are completely
disjoint!  To wit:
                                                              7-2

* A86 ceases outputting FWAIT directives that are unnecessary for
  the 287.  For reasons beyond my comprehension, Microsoft
  continues to put them out.  Can someone enlighten me as to why
  Microsoft is putting out those codes?

* A86 ignores the instructions FENI, FDISI, FNENI, and FNDISI
  after it sees a .287 directive.  Microsoft continues to
  assemble these instructions.

* Microsoft recognizes the new 287 instructions, if and only if
  it sees the .287 directive.  A86 recognizes them even if .287
  is not given.  In general, I don't attempt to police your
  instruction usage-- if you use an instruction available on a
  limited number of processors, I trust that you are programming
  for one of those processors.

In summary, if your program will be running only on machines with
a 287, you can give ".287" directive. Your programs will be
significantly shorter than if they were assembled by Microsoft.
If you want your programs to run on all machines containing a
floating point chip, you should refrain from specifying .287.

WARNING: The most common mistake 87 programmers make is to try to
read the results of an 87 operation in 86 memory, before the
results are ready.  At least on my AT, the system often crashes
when you do this!  If your program runs correctly when single
stepped, but crashes when set loose, then chances are you need an
extra explicit FWAIT somewhere.


Extra Coprocessor Support

A86 now supports two additional coprocessors available for
PC-compatibles: the 80387, available for 386-based machines, and
the IIT-2C87, a 287-plug-compatible chip that adds a couple of
unique instructions. The IIT-2C87 has two extra banks of on-chip
8-number stacks, that can be switched in with the FBANK
instruction, and a matrix multiply instrction that uses all three
banks as input.  (For details contact Specialty Software
Development Corp., 110 Wild Basin Road, Austin TX 78746.) Both
chips incorporate the correction to the 8087's FWAIT design flaw,
so you can assemble with the .287 directive.  The extra
instructions for these chips are marked by "387 only:" and "IIT
only:" in the chart at the end of this chapter.


Emulating the 8087 by Software

There is a software package provided with many compilers
(Borland's Turbo C and most Microsoft compilers, for example)
that emulates the 8087 instruction set.  The emulator is very
cleverly implemented so that the programmer need not know whether
a floating point chip will be available, or whether emulation
will be necessary.  This is done by having the linker replace all
floating point machine instructions with INT calls to certain
interrupts, dedicated to emulation.  The interrupt handlers
interpret the operands to the instructions, and emulate the 8087.
                                                              7-3

You can tell A86 that the emulator might be used, by providing a
+f switch in the invocation line, or in the A86 environment
variable (make sure the f is lower case).  Since your program
will be linked to the emulator, you must be producing an OBJ
file, not a COM file, for emulation support to take effect.
Whenever a floating point instruction is assembled, A86 will
generate an external reference at the opcode for the instruction.
Then, if the emulation package is linked with your program, the
opcodes will be replaced by the INT calls. If a special
non-emulation module is linked, the opcodes will be left alone,
and the floating point instructions will be executed directly.


The Floating Point Stack

The 87 has its own register set, of 8 floating point numbers
occupying 10 bytes each, plus 14 bytes of status and control
information.  Many of the 87's instructions cause the numbers to
act like a stack, much like a Hewlett-Packard calculator.  For
this reason, the numbers are called the floating point stack.

The standard name for the top element of the floating point stack
is either ST or ST(0); the others are named ST(1) through ST(7).
Thus, for example, the instruction to add stack element number 3
into the top stack element is usually coded FADD ST,ST(3).

I find this notation painfully verbose.  Especially bad are the
parentheses, which are hard to type, and which add visual clutter
to the program.  To alleviate this problem while retaining
language compatibility, I name my stack elements simply 0 through
7.  I recognize ST as a synonym for 0.  I allow expression
elements to be concatenated; concatenation is the same as
addition.  Thus, when A86 sees ST(3), it computes 0+3 = 3.  So
you can code the old way, FADD ST,ST(3), or you can code the
concise way, FADD 0,3 or simply FADD 3.


Floating Point Initializations

In general, you use the 87 by loading numbers from 86 memory to
the 87 stack (using FLD instructions), calculating on the 87
stack, and storing the results back to 86 memory (using FST and
FSTP instructions).  There are seven constant numbers built into
the 87 instruction set: zero, one, Pi, and four logarithmic
conversion constants.  These can be loaded using the FLD0, FLD1,
FLDPI, FLDL2T, FLDL2E, FLDLG2, and FLDLN2 instructions.  All
other constants must be declared in, then loaded from, 86 memory.
Integer constant words and doublewords can be loaded via FILD.
Non-integer constant doubleword, quadwords, and ten-byte numbers
can be loaded via FLD.
                                                              7-4

A86 allows you to declare constants loaded via FLD as floating
point numbers, using scientific notation if you like.  As an
exclusive feature, A86 allows you to use any of the 4 arithmetic
functions +, -, *, / in expressions involving floating point
numbers.  A86 will even do type conversion if one of the two
operands is given as an integer; though for clarity I recommend
that you always give floating point constants with their decimal
point.


Built-In Constant Names

A86 offers another exclusive feature: the built-in symbols

    PI   ratio of circumference to diameter of a circle

    L2T  log base 2 of 10

    L2E  log base 2 of the calculus constant e = 2.71828...

    LG2  log base 10 of 2

    LN2  natural log (base e) of 2

You can use these symbols in expressions, to declare useful
constants.  For example, you can declare the degrees-to-radians
conversion constant:

    DEG_TO_RAD  DT  PI/180.


Special Immediate FLD Form

Yet another exclusive A86 feature is the instruction form FLD
constant.  This form is intended primarily to facilitate "fooling
around" with the 87 when using D86; but it is also useful for
quick-and-dirty programs.  For example, the instruction FLD 12.3
generates the following sequence of code bytes (without
explicitly using the local labels given):

    CS FLD T[M1]
    JMP >M2
  M1  DT 12.3
  M2:

Obviously, this form is not terrifically efficient: you can
always save the JMP by placing the constant outside of the
instruction stream; and the CS override might not be needed.  But
the form is very, very convenient!

NOTE that the preceding 2 sections imply that you can get
careless and code, for example, FLD PI when you intended FLDPI.
Though the two are functionally equivalent, the first form takes
a whopping 17 bytes; and second, only 2 bytes.  Be careful!
                                                              7-5

Floating Point Operand Types

The list of floating point instructions contains a variety of
operand types.  Here is a brief explanation of those types:

0        stands for the top element of the floating point stack.
         A synonym for 0 is ST or ST(0).

i        stands for element number i of the floating point stack.
         i can range from 0 through 7.  A synonym for i is ST(i).

mem10r   is a 10-byte memory quantity (typically declared with a
         DT directive) containing a full precision floating point
         number. Intel recommends that you NOT store your numbers
         in full precision; that you use the following double
         precision format instead.  Full precision numbers are
         intended for storage of intermediate results (on the
         stack); they exist to insure maximum accuracy for
         calculations on double precision numbers, which is the
         official external format of 87 numbers.

mem8r    is an 8-byte memory quantity (typically declared with a
         DQ directive) containing a double precision floating
         point number.  This is the best format for floating
         point numbers on the 87.  The 87 takes the same amount
         of time on double precision calculations as it does on
         single precision.  The only extra time is the memory
         access of 4 more bytes; negligible in comparison to the
         calculation time.

mem4r    is a 4-byte quantity (typically defined with a DD
         directive) containing a single precision floating point
         number.

mem10d   is a 10-byte quantity (also defined via DT) containing a
         special Binary Coded Decimal format recognized by the
         FBLD and FBSTP instructions.  This format is useful for
         input and output of floating point numbers.

mem4i    is a 4-byte quantity representing a signed integer in
         two's-complement notation.

mem2i    is a 2-byte quantity representing a signed integer in
         two's-complement notation.

mem14    and mem94 are 14- and 94-byte buffers containing the 87
         machine state.
                                                              7-6

Operand Choices in A86

In the "standard" assembly language, the choice of operands for
floating point instructions seems inconsistent to me.  For
example, to subtract stack i from 0, you must provide two
operands; to do the equivalent comparison, you must provide only
one operand.  A86 smooths out these inconsistencies by allowing
more choices for operands: FADD i is equivalent to FADD 0,i. FCOM
0,i is equivalent to FCOM i.  The same holds for the other main
arithmetic instructions.  FXCH 0,i and FXCH i,0 are allowed. So
if you wish to retain compatibility with other assemblers, you
should use their more restrictive instruction list, not the
following one.


The 87 Instruction Set

Following is the 87 instruction set.  The "w" in the opcode field
is the FWAIT opcode, hex 9B, which is suppressed if .287 is
selected.  Again, "0", "1", and "i" stand for the associated
floating point stack registers, not constant numbers!  Constant
numbers in the descriptions are given with decimal points: 0.0,
1.0, 2.0, 10.0.


    Opcode    Instruction     Description

 w  D9 F0     F2XM1           0 := (2.0 ** 0) - 1.0
 w  DB F1     F4X4            IIT only: 4 by 4 matrix multiply
 w  D9 E1     FABS            0 := |0|
 w  DE C1     FADD            1 := 1 + 0, pop
 w  D8 C0+i   FADD i          0 := i + 0
 w  DC C0+i   FADD i,0        i := i + 0
 w  D8 C0+i   FADD 0,i        0 := i + 0
 w  D8 /0     FADD mem4r      0 := 0 + mem4r
 w  DC /0     FADD mem8r      0 := 0 + mem8r
 w  DE C0+i   FADDP i,0       i := i + 0, pop
 w  DB E8     FBANK 0         IIT only: set bank pointer to default
 w  DB EB     FBANK 1         IIT only: set bank pointer to bank 1
 w  DB EA     FBANK 2         IIT only: set bank pointer to bank 2
 w  DF /4     FBLD mem10d     push, 0 := mem10d
 w  DF /6     FBSTP mem10d    mem10d := 0, pop
                                                              7-7

 w  D9 E0     FCHS            0 := -0
9B  DB E2     FCLEX           clear exceptions
 w  D8 D1     FCOM            compare 0 - 1
 w  D8 D0+i   FCOM 0,i        compare 0 - i
 w  D8 D0+i   FCOM i          compare 0 - i
 w  D8 /2     FCOM mem4r      compare 0 - mem4r
 w  DC /2     FCOM mem8r      compare 0 - mem8r
 w  D8 D9     FCOMP           compare 0 - 1, pop
 w  D8 D8+i   FCOMP 0,i       compare 0 - i, pop
 w  D8 D8+i   FCOMP i         compare 0 - i, pop
 w  D8 /3     FCOMP mem4r     compare 0 - mem4r, pop
 w  DC /3     FCOMP mem8r     compare 0 - mem8r, pop
 w  DE D9     FCOMPP          compare 0 - 1, pop both
 w  D9 FF     FCOS            387 only: push, 1/0 := cosine(old 0)

 w  D9 F6     FDECSTP         decrement stack pointer
 w  DB E1     FDISI           disable interrupts (.287 ignore)

 w  DE F9     FDIV            1 := 1 / 0, pop
 w  D8 F0+i   FDIV i          0 := 0 / i
 w  DC F8+i   FDIV i,0        i := i / 0
 w  D8 F0+i   FDIV 0,i        0 := 0 / i
 w  D8 /6     FDIV mem4r      0 := 0 / mem4r
 w  DC /6     FDIV mem8r      0 := 0 / mem8r

 w  DE F8+i   FDIVP i,0       i := i / 0, pop
 w  DE F1     FDIVR           1 := 0 / 1, pop
 w  D8 F8+i   FDIVR i         0 := i / 0
 w  DC F0+i   FDIVR i,0       i := 0 / i
 w  D8 F8+i   FDIVR 0,i       0 := i / 0
 w  D8 /7     FDIVR mem4r     0 := mem4r / 0
 w  DC /7     FDIVR mem8r     0 := mem8r / 0
 w  DE F0+i   FDIVRP i,0      i := 0 / i, pop

 w  DB E0     FENI            enable interrupts (.287 ignore)
 w  DD C0+i   FFREE i         empty i
 w  DE /0     FIADD mem2i     0 := 0 + mem4i
 w  DA /0     FIADD mem4i     0 := 0 + mem2i
 w  DE /2     FICOM mem2i     compare 0 - mem2i
 w  DA /2     FICOM mem4i     compare 0 - mem4i
 w  DE /3     FICOMP mem2i    compare 0 - mem2i, pop
 w  DA /3     FICOMP mem4i    compare 0 - mem4i, pop

 w  DE /6     FIDIV mem2i     0 := 0 / mem2i
 w  DA /6     FIDIV mem4i     0 := 0 / mem4i
 w  DE /7     FIDIVR mem2i    0 := mem2i / 0
 w  DA /7     FIDIVR mem4i    0 := mem4i / 0
 w  DF /0     FILD mem2i      push, 0 := mem2i
 w  DB /0     FILD mem4i      push, 0 := mem4i
 w  DF /5     FILD mem8i      push, 0 := mem8i
                                                              7-8

 w  DE /1     FIMUL mem2i     0 := 0 * mem2i
 w  DA /1     FIMUL mem4i     0 := 0 * mem4i
 w  D9 F7     FINCSTP         increment stack pointer
9B  DB E3     FINIT           initialize 87
 w  DF /2     FIST mem2i      mem2i := 0
 w  DB /2     FIST mem4i      mem4i := 0
 w  DF /3     FISTP mem2i     mem2i := 0, pop
 w  DB /3     FISTP mem4i     mem4i := 0, pop
 w  DF /7     FISTP mem8i     mem8i := 0, pop

 w  DE /4     FISUB mem2i     0 := 0 - mem2i
 w  DA /4     FISUB mem4i     0 := 0 - mem4i
 w  DE /5     FISUBR mem2i    0 := mem2i - 0
 w  DA /5     FISUBR mem4i    0 := mem4i - 0


 w  D9 C0+i   FLD i           push, 0 := old i
 w  DB /5     FLD mem10r      push, 0 := mem10r
 w  D9 /0     FLD mem4r       push, 0 := mem4r
 w  DD /0     FLD mem8r       push, 0 := mem8r
 w  D9 E8     FLD1            push, 0 := 1.0
 w  D9 /5     FLDCW mem2i     control word := mem2i
 w  D9 /4     FLDENV mem14    environment := mem14
 w  D9 EA     FLDL2E          push, 0 := log base 2.0 of e
 w  D9 E9     FLDL2T          push, 0 := log base 2.0 of 10.0
 w  D9 EC     FLDLG2          push, 0 := log base 10.0 of 2.0
 w  D9 ED     FLDLN2          push, 0 := log base e of 2.0
 w  D9 EB     FLDPI           push, 0 := Pi
 w  D9 EE     FLDZ            push, 0 := +0.0

 w  DE C9     FMUL            1 := 1 * 0, pop
 w  D8 C8+i   FMUL i          0 := 0 * i
 w  DC C8+i   FMUL i,0        i := i * 0
 w  D8 C8+i   FMUL 0,i        0 := 0 * i
 w  D8 /1     FMUL mem4r      0 := 0 * mem4r
 w  DC /1     FMUL mem8r      0 := 0 * mem8r
 w  DE C8+i   FMULP i,0       i := i * 0, pop

    DB E2     FNCLEX          nowait clear exceptions
    DB E1     FNDISI          disable interrupts (.287 ignore)
    DB E0     FNENI           enable interrupts (.287 ignore)
    DB E3     FNINIT          nowait initialize 87
 w  D9 D0     FNOP            no operation

    DD /6     FNSAVE mem94    mem94 := 87 state
    D9 /7     FNSTCW mem2i    mem2i := control word
    D9 /6     FNSTENV mem14   mem14 := environment
    DF E0     FNSTSW AX       AX := status word
    DD /7     FNSTSW mem2i    mem2i := status word
 w  D9 F3     FPATAN          0 := arctan(1/0), pop
 w  D9 F8     FPREM           0 := REPEAT(0 - 1)
 w  D9 F5     FPREM1          387 only: 0 := REPEAT(0 - 1) IEEE compat.
 w  D9 F2     FPTAN           push, 1/0 := tan(old 0)
                                                              7-9

 w  D9 FC     FRNDINT         0 := round(0)
 w  DD /4     FRSTOR mem94    87 state := mem94
 w  DD /6     FSAVE mem94     mem94 := 87 state
 w  D9 FD     FSCALE          0 := 0 * 2.0 ** 1
9B  DB E4     FSETPM          set protection mode
 w  D9 FE     FSIN            387 only: push, 1/0 := sine(old 0)
 w  D9 FB     FSINCOS         387 only: push, 1 := sine, 0 := cos(old 0)
 w  D9 FA     FSQRT           0 := square root of 0

 w  DD D0+i   FST i           i := 0
 w  D9 /2     FST mem4r       mem4r := 0
 w  DD /2     FST mem8r       mem8r := 0
 w  D9 /7     FSTCW mem2i     mem2i := control word
 w  D9 /6     FSTENV mem14    mem14 := environment
 w  DD D8+i   FSTP i          i := 0, pop
 w  DB /7     FSTP mem10r     mem10r := 0, pop
 w  D9 /3     FSTP mem4r      mem4r := 0, pop
 w  DD /3     FSTP mem8r      mem8r := 0, pop
 w  DF E0     FSTSW AX        AX := status word
 w  DD /7     FSTSW mem2i     mem2i := status word

 w  DE E9     FSUB            1 := 1 - 0, pop
 w  D8 E0+i   FSUB i          0 := 0 - i
 w  DC E8+i   FSUB i,0        i := i - 0
 w  D8 E0+i   FSUB 0,i        0 := 0 - i
 w  D8 /4     FSUB mem4r      0 := 0 - mem4r
 w  DC /4     FSUB mem8r      0 := 0 - mem8r
 w  DE E8+i   FSUBP i,0       i := i - 0, pop
 w  DE E1     FSUBR           1 := 0 - 1, pop
 w  D8 E8+i   FSUBR i         0 := i - 0
 w  DC E0+i   FSUBR i,0       i := 0 - i
 w  D8 E8+i   FSUBR 0,i       0 := i - 0
 w  D8 /5     FSUBR mem4r     0 := mem4r - 0
 w  DC /5     FSUBR mem8r     0 := mem8r - 0
 w  DE E0+i   FSUBRP i,0      i := 0 - i, pop

 w  D9 E4     FTST            compare 0 - 0.0
 w  DD E0+i   FUCOM i         387 only: unordered compare 0 - i
 w  DD E1     FUCOM           387 only: unordered compare 0 - 1
 w  DD E8+i   FUCOMP i        387 only: unordered compare 0 - i, pop
 w  DD E9     FUCOMP          387 only: unordered compare 0 - 1, pop
 w  DA E9     FUCOMPP         387 only: unordered compare 0 - 1, pop both
9B            FWAIT           wait for 87 ready
 w  D9 E5     FXAM            C3 -- C0 := type of 0
 w  D9 C9     FXCH            exchange 0 and 1
 w  D9 C8+i   FXCH 0,i        exchange 0 and i
 w  D9 C8+i   FXCH i          exchange 0 and i
 w  D9 C8+i   FXCH i,0        exchange 0 and i
 w  D9 F4     FXTRACT         push, 1 := expo, 0 := sig
 w  D9 F1     FYL2X           0 := 1 * log base 2.0 of 0, pop
 w  D9 F9     FYL2XP1         0 := 1 * log base 2.0 of (0+1.0), pop