
CHAPTER 8   NUMBERS AND EXPRESSIONS


Numbers and Bases

A86 supports a variety of formats for numbers.  In non-computer
life, we write numbers in a decimal format.  There are ten
digits, 0 through 9, that we use to describe numbers; and each
digit position is ten times as significant as the position to its
right.   The number ten is called the "base" of the decimal
format.  Computer programmers often find it convenient to use
other bases to specify numbers used in their programs.  The most
commonly-used bases are two (binary format), sixteen (hexadecimal
format), and eight (octal format).

The hexadecimal format requires sixteen digits.  The extra six
digits beyond 0 through 9 are denoted by the first six letters of
the alphabet: A for ten, B for eleven, C for twelve, D for
thirteen, E for fourteen, and F for fifteen.

In A86, a number must always begin with a digit from 0 through 9,
even if the base is hexadecimal.  This is so that A86 can
distinguish between a number and a symbol that happens to have
digits in its name.  If a hexadecimal number would begin with a
letter, you precede the letter with a zero.  For example, hex A0,
which is the same as decimal 160, would be written 0A0.

Because it is necessary for you to prepend leading zeroes to many
hex numbers, and because you never have to do so for decimal
numbers, I decided to make hexadecimal the default base for
numbers with leading zeroes.  Decimal is still the default base
for numbers beginning with 1 through 9.

Large numbers can be given as the operands to DD, DQ, or DT
directives.  For readability, you may freely intersperse
underscore characters anywhere within your numbers.
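
As a small sketch of the rules so far (the values in the comments
are hand-computed from the rules above, not taken from an
assembler run), the leading digit selects the base, and
underscores merely group digits:

```asm
  DB 0A0          ; leading zero, so hex: decimal 160
  DB 100          ; leading digit 1-9, so decimal: 100
  DW 0100         ; leading zero, so hex: decimal 256
  DD 1_000_000    ; underscores ignored: decimal one million
  DD 0FFFF_0000   ; underscores ignored: hex FFFF0000
```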

The default base can be overridden, with a letter or letters at
the end of the number: B or xB for binary, O or Q for octal, H
for hexadecimal, and D or xD for decimal.  Examples:

 077Q       octal, value is 8*7 + 7 = 63 in decimal notation
 123O       octal if the "O" is a letter: 64 + 2*8 + 3 = 83 decimal
 1230       decimal 1230: shows why you should use "Q" for octal!!
 01234567H  large constant
 0001_0000_0000_0000_0003R real number specified in hexadecimal
 100D       superfluous D indicates decimal base
 0100D      hex number 100D, which is 4096 + 13 = 4109 in decimal
 0100xD     decimal 100, since xD overrides the default hex format
 0110B      hex 110B, which is 4096 + 256 + 11 = 4363 in decimal
 0110xB     binary 4+2 = 6 in decimal notation
 110B       also binary 4+2 = 6, since "B" is not a decimal digit

The last five examples above illustrate why an "x" is sometimes
necessary before the base-override letter "B" or "D".  If that
letter can be interpreted as a hex digit, it is; the "x" forces
an override interpretation for the "B" or "D".  By the way, the
usage of lower case for x and upper case for the following
override letter is simply a recommendation; A86 treats upper- and
lower-case letters equivalently.


The RADIX Directive

The above-mentioned set of defaults (hex if leading zero, decimal
otherwise) can be overridden with the RADIX directive.  The RADIX
directive consists of the word RADIX followed by a number from 2
to 16.  That number is ALWAYS interpreted as decimal, regardless
of any (or no) previous RADIX commands.  It gives the default
base for ALL subsequent numbers, up to (but not including) the
next RADIX command.  If there is no number
following RADIX, then A86 returns to its initial mixed default of
hex for leading zeroes, decimal for other leading digits.

For compatibility with IBM's assembler, RADIX can appear with a
leading period; although I curse the pinhead designer who put
that period into IBM's language.

As an alternative to the RADIX directive, I provide the D switch,
which causes A86 to start with decimal defaults.  You can put +D
into the A86 command invocation, or into the A86 environment
variable.  The first RADIX command in the program will override
the D switch setting.

Following are examples of radix usage.  The numbers in the
comments are all in decimal notation.

  DB 10,010     ; produces 10,16 if RADIX was not seen yet
                ;   and +D switch was not specified
RADIX 10
  DB 10,010     ; produces 10,10
RADIX 16
  DB 10,010     ; produces 16,16
RADIX 2
  DB 10,01010   ; produces 2,10
RADIX 3         ; for Martian programmers in Heinlein novels
  DB 10,100     ; produces 3,9
RADIX
  DB 10,010     ; produces 10,16

Floating Point Initializations

A86 allows floating point numbers as the operands to DD, DQ, and
DT directives.  The numbers are encoded according to the IEEE
standard that the 8087 and 287 coprocessors follow.  The format
for floating point constants is as follows: First, there is a
decimal number containing a decimal point.  There must be a
decimal point, or else the number is interpreted as an integer.
There must also be at least one decimal digit, either to the left
or right of the decimal point, or else the decimal point is
interpreted as an addition (structure element) operator.
Optionally, there may follow immediately after the decimal number
the letter E followed by a decimal number.  The E stands for
"exponent", and means "times 10 raised to the power of".  You may
provide a + or - between the E and its number.  Examples:

  0.1             constant one-tenth
  .1              the same
  300.            floating point three hundred
  30.E1           30 * 10**1; i.e., three hundred
  30.E+1          the same
  30.E-1          30 * 10**-1; i.e., three
  30E1            not floating point: hex integer 030E1
  1.234E20        scientific notation: 1.234 times 10 to the 20th
  1.234E-20       a tiny number: 1.234 divided by 10 to the 20th
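
The constants above might appear in data directives as sketched
below.  The sizes follow from the directives themselves (DD is 4
bytes, DQ is 8, DT is 10); the choice of values is only
illustrative:

```asm
  DD 1.0          ; 4-byte floating point one
  DQ 0.1          ; 8-byte floating point one-tenth
  DT 1.234E20     ; 10-byte floating point constant
  DD 300.         ; decimal point present: floating point 300
  DD 300          ; no decimal point: the integer 300
```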


Overview of Expressions

Most of the operands that you code into your instructions and
data initializations will be simple register names, variable
names, or constants.  However, you will regularly wish to code
operands that are the results of arithmetic calculations,
performed either by the machine when the program is running (for
indexing), or by the assembler (to determine the value to
assemble into the program).  A86 has a full set of operators that
you can use to create expressions to cover these cases:

* Arithmetic Operators
             byte isolation and combination (HIGH, LOW, BY)
             addition and subtraction (+,-)
             multiplication and division (* , /, MOD)
             shifting operators (SHR, SHL, BIT)

* Logical Operators
             (AND, OR, XOR, NOT)

* Boolean Negation Operator
             (!)

* Relational Operators
             (EQ, LE, LT, GE, GT, NE)

* String Comparison Operators
             (EQ, NE, =)

* Attribute Operators/Specifiers
             size specifiers (B=BYTE,W=WORD,F=FAR,SHORT,LONG)
             attribute specifiers (OFFSET,NEAR,brackets)
             segment addressing specifier (:)
             compatibility operators (PTR,ST)
             built-in value specifiers (TYPE,THIS,$)

* Special Data Duplication Operator
             (DUP)  --see Chapter 9 for a description


Types of Expression Operands

Numbers and Label Addresses

A number or constant (16-bit number) can be used in most
expressions.   A label (defined with a colon) is also treated as
a constant and so can be used in expressions, except when it is a
forward reference.

Variables

A variable stands for a byte- or word-memory location.   You may
add or subtract constants from variables; when you do so, the
constant is added to the address of the variable.  You typically
do this when the variable is the name of a memory array.
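
A minimal sketch of array access by constant offset follows.
TABLE is a hypothetical name; note that the constant counts
bytes, so successive words are 2 apart:

```asm
TABLE DW 10,20,30   ; hypothetical array of three words
  MOV AX,TABLE      ; AX = first word (10)
  MOV AX,TABLE+2    ; address moves up 2 bytes: AX = second word (20)
  MOV AX,TABLE+4    ; AX = third word (30)
```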

Index Expressions

An index expression consists of a combination of a base register
[BX] or [BP], and/or an index register [SI] or [DI], with an
optional constant added or subtracted.   You will usually want to
precede the bracketed expression with B, W, or F; to specify the
kind of memory unit (byte, word, or far pointer) you are
referring to.  The expression stands for the memory unit whose
address is the run-time value(s) of the base and/or index
registers added to the constant.  See the Effective Address
section and the beginning of this chapter for more details on
indexed memory.
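
The following lines sketch a few index expressions built from the
pieces just described.  The constants and register choices are
arbitrary:

```asm
  MOV AL,B[BX]        ; byte at the address held in BX
  MOV AX,W[BX+SI]     ; word at address BX+SI
  MOV AX,W[BP+DI-2]   ; word at address BP+DI-2
  MOV B[SI+4],0       ; B tells A86 to store a byte, not a word
```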


Arithmetic Operators


HIGH/LOW

Syntax:  HIGH operand
         LOW operand

These operators are called the "byte isolation" operators.  The
operand  must evaluate to a 16-bit number.   HIGH returns the
high order byte of the number; LOW the low order byte.

For example,

  MOV AL,HIGH(01234)     ; AL = 012
  TENHEX EQU LOW(0FF10)  ; TENHEX = 010

These operators can be applied to each other.   The following
identities apply:

LOW LOW Q = LOW Q
LOW HIGH Q = HIGH Q
HIGH LOW Q = 0
HIGH HIGH Q = 0
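
A worked instance of these identities, using the hypothetical
symbol VAL and the parenthesized form shown earlier (results are
hand-computed):

```asm
VAL EQU 01234
  MOV AL,LOW(VAL)        ; AL = 034
  MOV AL,HIGH(VAL)       ; AL = 012
  MOV AL,LOW(HIGH(VAL))  ; AL = 012, same as HIGH(VAL)
  MOV AL,HIGH(LOW(VAL))  ; AL = 0
```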


BY

Syntax:  operand BY operand

This operator is a "byte combination" operator.  It returns the
word whose high byte is the left operand, and whose low byte is
the right operand.  For example, the expression 3 BY 5 is the
same as hexadecimal 0305.  The BY operator is exclusive to A86. I
added it to cover the following situation: Suppose you are
initializing your registers to immediate values.  Suppose you
want to initialize AH to the ASCII value 'A', and AL to decimal
10.  You could code this as two instructions MOV AH,'A' and MOV
AL,10; but you realize that a single load into the AX register
would save both program space and execution time.  Without the BY
operator, you would have to code MOV AX,0410A, which disguises
the types of the individual byte operands you were thinking
about.  With BY, you can code it properly: MOV AX,'A' BY 10.
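
Side by side, the two approaches described above look like this:

```asm
  MOV AH,'A'         ; two instructions...
  MOV AL,10
  MOV AX,'A' BY 10   ; ...or one, with the same result: AX = 0410A
```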


Addition (combination)

Syntax:  operand + operand
         operand.operand
         operand PTR operand
         operand operand

As shown in the above syntax, addition can be accomplished in
four ways: with a plus sign, with a dot operator, with a PTR
operator, and simply by juxtaposing two operands.  The dot and
PTR operators are provided for compatibility
with Intel/IBM assemblers.  The dot is used in structure field
notation; PTR is used in expressions such as BYTE PTR 0.  (See
Chapter 12 for recommendations concerning PTR.)

If either operand is a constant, the answer is an expression with
the typing of the other operand, with the offsets added.  For
example, if BVAR is a byte variable, then BVAR + 100 is the byte
variable 100 bytes beyond BVAR.

Other examples:

   DB 100+17         ; simple addition
   CTRL EQU -040
   MOV AL,CTRL'D'    ; a nice notation for control-D!
   MOV DX,[BP].SMEM  ; --where SMEM was in an unindexed structure
   DQ  10.0 + 7.0    ; floating point addition

Subtraction

Syntax:  operand - operand

The subtraction operator may have operands that are:

  a. both absolute numbers

  b. variable names that have the same type

The result is an absolute number; the difference between the two
operands.

Subtraction is also allowed between floating point numbers; the
answer is the floating point difference.
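
One common use of case (b) is computing the length of a data area
at assembly time.  In this sketch, MSG, MSG_END, and MSG_LEN are
hypothetical names; both operands are byte variables, so the
difference is an absolute number:

```asm
MSG     DB 'Hello'          ; hypothetical 5-byte string
MSG_END DB 0                ; byte variable just past the string
MSG_LEN EQU MSG_END - MSG   ; absolute number 5
  MOV CX,MSG_LEN            ; CX = 5
```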


Multiplication and Division

Syntax:   operand * operand     (multiplication)
          operand / operand     (division)
          operand MOD operand   (modulo)

You may only use these operators with absolute or floating point
numbers, and the result is always the same type.  Either operand
may be a numeric expression, as long as the expression evaluates
to an absolute or floating point number.  Examples:

CMP AL,2 * 4    ; compare AL to 8
MOV BX,0123/16  ; BX = 012
DT  1.0 / 7.0
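
A few more hand-computed cases, including MOD.  The division
result assumes the remainder is discarded, as in the 0123/16
example above:

```asm
  MOV AL,17 MOD 5     ; AL = 2
  MOV AL,17 / 5       ; AL = 3, remainder discarded
  MOV CX,80 * 25      ; CX = 2000
```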


Shifting Operators

Syntax:  operand SHR count   (shift right)
         operand SHL count   (shift left)
         BIT count           (bit number)

The shift operators will perform a "bit-wise" shift of the
operand.  The operand will be shifted "count" bits either to the
right or the left.  Bits shifted into the operand will be set to
0.

The expression "BIT count" is equivalent to "1 SHL count"; i.e.,
BIT returns the mask of the single bit whose number is "count".
The operands must be numeric expressions that evaluate to
absolute numbers.  Examples:

MOV BX, 0FACBH SHR 4   ; BX = 0FACH
OR AL,BIT 6            ; AL = AL OR 040; 040 is the mask for bit 6
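
Further hand-computed examples.  The last line relies on BIT
having higher precedence than OR, as listed under Operator
Precedence:

```asm
  MOV AX,1 SHL 4           ; AX = 16 (010)
  MOV AX,0FF00 SHR 8       ; AX = 0FF
  TEST AL,BIT 0 OR BIT 7   ; mask is 1 OR 080 = 081
```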

Logical Operators

Syntax:  operand OR operand
         operand XOR operand
         operand AND operand
         NOT operand

The logical operators may only be used with absolute numbers.
They always return an absolute number.

Logical operators operate on individual bits.   Each bit of the
answer depends only on the corresponding bit in the operand(s).

The functions performed are as follows:

1.  OR: An answer bit is 1 if either or both of the operand bits
    is 1.   An answer bit is 0 only if both operand bits are 0.

Example:

11110000xB OR 00110011xB = 11110011xB


2.  XOR: This is "exclusive OR."  An answer bit is 1 if the
    operand bits are different; an answer bit is 0 if the operand
    bits are the same.   Example:

11110000xB XOR 00110011xB = 11000011xB


3.  AND: An answer bit is 1 only if both operand bits are 1.   An
    answer bit is 0 if either or both operand bits are 0.
    Example:

11110000xB AND 00110011xB = 00110000xB

4.  NOT: An answer bit is the opposite of the operand bit.   It
    is 1 if the operand bit is 0; 0 if the operand bit is 1.
    Example:

NOT 00110011xB = 11001100xB
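
The same operators applied to 16-bit constants in instructions.
The values are hand-computed, and the last line assumes that NOT
acts on all 16 bits of the constant:

```asm
  MOV AX,01234 AND 0FF     ; AX = 034
  MOV AX,01200 OR 034      ; AX = 01234
  MOV AX,01234 XOR 0FFFF   ; AX = 0EDCB
  MOV AX,NOT 01234         ; AX = 0EDCB, assuming a 16-bit NOT
```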


Boolean Negation Operator

Syntax:  ! operand

The exclamation-point operator, rather than reversing each
individual bit of the operand, considers the entire operand as a
boolean variable to be negated.  If the operand is non-zero (any
of the bits are 1), the answer is 0.  If the operand is zero, the
answer is 0FFFF.

Because ! is intended to be used in conditional assembly
expressions (described in Chapter 11), there is also a special
action when ! is applied to an undefined name: the answer is the
defined value 0FFFF, meaning it is TRUE that the symbol is
undefined.  Similarly, when ! is applied to some defined quantity
other than an absolute constant, the answer is 0, meaning it is
FALSE that the operand is undefined.
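
For absolute constants, the contrast between ! and NOT can be
sketched as follows (values hand-computed from the rules above,
again assuming a 16-bit NOT):

```asm
  MOV AX,!0        ; AX = 0FFFF (operand is zero)
  MOV AX,!5        ; AX = 0 (operand is non-zero)
  MOV AX,NOT 5     ; AX = 0FFFA, for contrast: NOT flips each bit
```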


Relational Operators

Syntax:    operand EQ operand    (equal)
           operand NE operand    (not equal)
           operand LT operand    (less than)
           operand LE operand    (less or equal)
           operand GT operand    (greater than)
           operand GE operand    (greater or equal)

The relational operators may have operands that are:

  a.  both absolute numbers

  b.  variable names that have the same type

The result of a relational operation is always an absolute
number: an 8- or 16-bit value of all 1's for TRUE and all 0's
for FALSE.  Examples:

MOV AL, 3 EQ 0     ; AL = 0 (false)
MOV AX, 2 LE 15    ; AX = 0FFFFH (true)


String Comparison Operators

Syntax:    string EQ string    (equal)
           string NE string    (not equal)
           string = string     (equal ignoring case)

In order to subsume the string comparison facilities offered by
That Other Assembler's special conditional-assembly directives
IFIDN and IFDIF, A86 allows the relational operators EQ and NE to
accept string arguments.  For this syntax to be accepted by A86,
both strings must be bounded using the same delimiter (either
single quotes for both strings, or double quotes for both
strings).  For a match (EQ returns TRUE or NE returns FALSE), the
strings must be the same length, and every character must match
exactly.

An additional A86-exclusive feature is the = operator, which
returns TRUE if the characters of the strings differ only in the
bit masked by the value 020.  Thus you may use = to compare a
macro parameter to a string containing nothing but letters.  The
comparison will be TRUE whether the macro parameter is upper-case
or lower-case.  No checking is made to detect non-letters, so if
you use = on strings containing non-letters, you may get some
false TRUE results.  Also, = is accepted when it is applied to
non-strings as well-- the corresponding values are interpreted as
two-byte strings, with the 020 bits masked away before
comparison.


Attribute Operators/Specifiers


B,W,D,Q,T memory variable specifiers

Syntax:  B operand          Q operand
         operand B          operand Q
         W operand          T operand
         operand W          operand T
         D operand
         operand D

B, W, D, F, Q, and T convert the operand into a byte, word,
doubleword, far, quadword, and ten-byte variable, respectively.
The operand can be a constant, or a variable of another type.
Examples:

ARRAY_PTR:
  DB 100 DUP (?)
WVAR  DW ?
  MOV AL,ARRAY_PTR B  ; load first byte of ARRAY_PTR array into AL
  MOV AL,WVAR B       ; load the low byte of WVAR into AL
  MOV AX,W[01000]     ; load AX with the memory word at loc. 01000
  LDS BX,D[01000]     ; load DS:BX with the doubleword at loc. 01000
  JMP F[01000]        ; jump far to the 4-byte location at 01000
  FLD T[BX]           ; load ten-byte number at [BX] to 87 stack


For compatibility with Intel/IBM assemblers, A86 accepts the more
verbose synonyms BYTE, WORD, DWORD, FAR, QWORD, and TBYTE for
B,W,D,F,Q,T, respectively.
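
A short sketch pairing each A86 form with its verbose equivalent.
It assumes only what is stated in this chapter: BYTE and WORD are
synonyms for B and W, and PTR acts as addition:

```asm
  MOV AL,B[BX]          ; A86 short form
  MOV AL,BYTE PTR [BX]  ; Intel/IBM form of the line above
  INC W[SI]             ; A86 short form
  INC WORD PTR [SI]     ; Intel/IBM form of the line above
```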


SHORT and LONG Operators

Syntax:    SHORT label
           LONG label

The SHORT operator is used to specify that the label referenced
by a JMP instruction is within 127 bytes of the end of the
instruction.  The LONG operator specifies the opposite: that the
label is not within 127 bytes.  The appropriate operator can (and
sometimes must) be used if the label is forward referenced in the
instruction.

When a non-local label is forward referenced, the assembler
assumes that it will require two bytes to represent the relative
offset of the label (so the instruction including the opcode byte
will be three bytes).  By correctly using the SHORT operator, you
can save a byte of code when you use a forward reference. If the
label is not within the specified range, an error will occur. The
following example illustrates the use of the SHORT operator.

JMP FWDLAB        ; three byte instruction
JMP SHORT FWDLAB  ; two byte instruction
JMP >L1           ; two byte instruction assumed for a local label

Because the assembler assumes that a forward-referenced local
label is SHORT, you may sometimes be forced to override this
assumption if the label is in fact not within 127 bytes of the
JMP.  This is why LONG is provided:

JMP LONG >L9      ; three byte instruction

If you are bothered by this possibility, you can specify the +L
switch, which causes A86 to pessimistically generate the three
byte JMP for all forward references, unless specifically told not
to with SHORT.

NOTE that LONG has an effect only on the operand of an
unconditional JMP instruction, not on conditional jumps.  This is
because the conditional jumps don't have 3-byte forms; the only
conditional jumps are short ones.  If you run into this problem,
then chances are your code is getting out of control--time to
rearrange, or to break off some of the intervening code into
separate procedures.  If you insist upon leaving the code intact,
you can replace the conditional jump with an "IF cond JMP".
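
A sketch of that replacement follows.  NEARLAB and FARLAB are
hypothetical labels, and the condition name Z is an assumption
made by analogy with JZ:

```asm
  JZ NEARLAB          ; fine when NEARLAB is within short range
  IF Z JMP FARLAB     ; for a distant label: jump only when Z is set
```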


OFFSET Operator

Syntax:  OFFSET var-name

OFFSET is used to convert a variable into the constant pointer to
the variable.  For example, if you have declared  XX DW ?, and
you want to load SI with the pointer to the variable XX, you can
code: MOV SI,OFFSET XX.  The simpler instruction MOV SI,XX moves
the variable contents of XX into SI, not the constant pointer to
XX.
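
The difference can be sketched with the same variable XX:

```asm
XX DW ?
  MOV SI,OFFSET XX    ; SI = address of XX (a constant)
  MOV AX,[SI]         ; AX = contents of XX, reached through SI
  MOV AX,XX           ; AX = contents of XX, reached directly
```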

NEAR Operator

Syntax:  NEAR operand

NEAR converts the operand to have the type of a code label, as if
it were defined by appearing at the beginning of a program line
with a colon after it.  NEAR is provided mainly for compatibility
with Intel/IBM assemblers.


Square Brackets Operator

Syntax:  [operand]

Square brackets around an operand give the operand a memory
variable type.  Square brackets are generally used to enclose the
names of base and index registers: BX, BP, SI, and DI.  When the
size of the memory variable can be deduced from the context of
the expression, square brackets are also used to turn numeric
constants into memory variables.  Examples:

  MOV B[BX+50],047  ; move imm value 047 into mem byte at BX+50
  MOV AL,[050]      ; move byte at memory location 050 into AL
  MOV AL,050        ; move immediate value 050 into AL


Colon Operator

Syntax:   constant:operand
          segreg:operand
          seg_or_group_name:operand

The colon operator is used to attach a segment register value to
an operand.  The segment register value appears to the left of
the colon; the rest of the operand appears to the right of the
colon.

There are three forms to the colon operator.  The first form has
a constant as the segment register value.  This form is used to
create an operand to a long (inter-segment) JMP or CALL
instruction.  An example of this is the instruction JMP 0FFFF:0,
which jumps to the cold-boot reset location of the 86 processor.

The only context other than JMP or CALL in which this first form
is legal is as the operand to a DD directive or an EQU
directive.  The EQU case has a further restriction: the offset
(the part to the right of the colon) must have a value less than
256.  This is because there simply isn't room in a symbol table
entry for a segment register value AND a 2-byte offset.  I don't
think you will be hurt by this restriction, since references to
other segments are usually to jump tables at the beginning of
those segments.

The second form has a segment register name to the left of the
colon.  This is the segment override form, provided for
compatibility with Intel/IBM assemblers.  A86 will generate a
segment override byte when it sees this form, unless the operand
to the right of the colon already has a default segment register
that is the same as the given override.

I prefer the more explicit method of overrides, exclusive to A86:
simply place the segment register name before the instruction
mnemonic.  For example, I prefer ES MOV AL,[BX] to MOV
AL,ES:[BX].

The third form has a segment or group name before the colon. This
form is ignored by A86; it is provided for compatibility with
Turbo C, which likes to include spurious DGROUP: overrides, to
satisfy MASM's ASSUME-checking.


ST Operator

ST is ignored whenever it occurs in an expression.  It is
provided for compatibility with Intel and IBM assemblers. For
example, you can code FLD ST(0),ST(1), which will be taken by A86
as FLD 0,1.


TYPE Operator

Syntax:  TYPE operand

The TYPE operator returns 1 if the operand is a byte variable; 2
if the operand is a word variable; 4 if the operand is a
doubleword variable; 8 if the operand is a quadword variable; 10
if the operand is a ten-byte variable; and the number of bytes
allocated by the structure if the operand is a structure name
(see STRUC in the next chapter).

A common usage of the TYPE operator is to represent the number of
bytes of a named structure.  For example, if you have declared a
structure named LINE (as described in the next chapter) that
defines 82 bytes of storage, then two ways you might refer to the
value symbolically are as follows:

  MOV CX,TYPE LINE     ; loads the size of LINE into CX
  DB TYPE LINE DUP ?   ; allocates an area of memory for a LINE


THIS and $ Specifiers

THIS returns the value of the current location counter.  It is
provided for compatibility with Intel/IBM assemblers.  The dollar
sign $ is the more standard and familiar specifier for this
purpose; it is equivalent to THIS NEAR.  THIS is typically used
with the BYTE and WORD specifiers to create alternate-typed
symbols at the same memory location:

     BVAR EQU THIS BYTE
     WVAR  DW ?

I don't recommend the use of THIS.  If you wish to retain Intel
compatibility, you can use the less verbose LABEL directive:

      BVAR LABEL BYTE
      WVAR  DW ?

If you are not concerned with compatibility to lesser assemblers,
A86 offers a variety of less verbose forms.  The most concise is
DB without an operand:

      BVAR DB
      WVAR DW ?

If this is too cryptic for you, there is always BVAR EQU B[$].


Operator Precedence

Consider the expression 1 + 2 * 3.  When A86 sees this
expression, it could perform the multiplication first, giving an
answer of 1+6 = 7; or it could do the addition first, giving an
answer of 3*3 = 9.  In fact, A86 does the multiplication first,
because A86 assigns a higher precedence to multiplication than it
does addition.

The following list specifies the order of precedence A86 assigns
to expression operators. All expressions are evaluated from left
to right following the precedence rules.  You may override this
order of evaluation and precedence through the use of parentheses
( ).  In the example above, you could override the precedence by
parenthesizing the addition: (1+2) * 3.

Some symbols that we have referred to as operators are treated
by the assembler as operands having built-in values.  These
include B, W, F, $, and ST.  In a similar vein, a segment
override term (a segment register name followed by a colon) is
recorded when it is scanned, but not acted upon until the entire
containing expression is scanned and evaluated.

If two operators are adjacent, the rightmost operator must have
precedence; otherwise, parentheses must be used.  For example,
the expression BIT ! 1 is illegal because the leftmost operator
BIT has the higher precedence of the two adjacent operators BIT
and "!".  You can code BIT (! 1).

--Highest Precedence--

1.  Parenthesized expressions
2.  Period
3.  OFFSET, SEG, TYPE, and PTR
4.  HIGH, LOW, and BIT
5.  Multiplication and division: *, /, MOD, SHR, SHL
6.  Addition and subtraction: +,-
       a. unary
       b. binary
7.  Relational: EQ, NE, LT, LE, GT, GE, =
8.  Logical NOT and !
9.  Logical AND
10. Logical OR and XOR
11. Colon for long pointer, SHORT, LONG, and BY
12. DUP

--Lowest Precedence--
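
A few hand-worked expressions showing how the list above might be
applied.  The results are derived from the list, not from an
assembler run:

```asm
  DB 1 + 2 * 3        ; * before +: 1 + 6 = 7
  DB (1 + 2) * 3      ; parentheses first: 3 * 3 = 9
  DB 1 SHL 2 + 1      ; SHL before +: 4 + 1 = 5
  DB 4 OR 2 AND 1     ; AND before OR: 4 OR 0 = 4
  DW 3 BY 2 + 1       ; + before BY: 3 BY 3 = 0303
```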