CHAPTER 4 ELEMENTS OF THE A86 LANGUAGE

This chapter begins the description of the A86 language. It's a
bit more tutorial in nature than the rest of the manual. I'll
start by describing the elementary building blocks of the
language.


The A86 Language and the A86 Program

First, let's establish what we mean when we say A86. On one
hand, A86 is the name for my assembly language for the Intel 86
family of (IBM-PC and compatible) computers. Statements written
in this language are used to specify machine instructions for the
86 family and to allocate memory space for program data. On the
other hand, A86 is the name for a program, called an assembler,
that translates these human-readable statements into a
machine-readable form. The input to the assembler is a source
file (or a list of source files) containing assembly language
statements. The output of the assembler is a file containing
binary program code that can either be run as a program on the
PC, or combined with other modules (using a linker) to make a
program.


General Categories of A86 Elements

The statements in an A86 source file can be classified in three
general categories: instruction statements, data allocation
statements, and assembler directives. An instruction statement
uses an easily remembered name (a mnemonic) and possibly one or
more operands to specify a machine instruction to be generated. A
data allocation statement reserves, and optionally initializes,
memory space for program data. An assembler directive is a
statement that gives special instructions to the assembler.
Directives are unlike the instruction and data allocation
statements in that they do not specify the actual contents of
memory. Examples of the three types of A86 statements are given
below. These are provided to give you a general idea of what the
different kinds of statements look like.

Instruction Statements

  MOV AX,BX
  CALL SORT_PROCEDURE
  ADD AL,7

Data Allocation Statements

  A_VARIABLE DW 0
  DB 'HELLO'

Assembler Directives

  CODE SEGMENT
  ITEM_COUNT EQU 5

The statements in an A86 source file are made up of keywords,
identifiers, numbers, strings, special characters, and comments.
A keyword is a symbol that has special meaning to the assembler,
such as an instruction mnemonic (MOV, CALL) or some other
reserved word in the assembly language (DB, SEGMENT, EQU).
Identifiers are programmer-defined symbols, used to represent
such things as variables, labels in the code, and numerical
constants. Identifiers may contain letters, digits, and the
characters _, @, $, and ?, but must begin with a letter, _, or @.
Only the first 127 characters of an identifier are significant,
although the name itself may be up to 255 characters long.
Examples of identifiers are: COUNT, L1, and A_BYTE.
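
As an illustration of the naming rules above, here is a short
sketch. The symbol names are made up for this example and are not
part of A86; the rule to watch is the one about the first
character.

```asm
COUNT DW 0           ; begins with a letter
_SCRATCH DB 0        ; begins with an underscore
@LIMIT EQU 100       ; begins with @
READY? DB 0          ; ? is allowed, but not as the first character
; 2ND_PASS would not be an identifier: it begins with a digit
```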

Numbers in A86 may be expressed as decimal, hexadecimal, octal,
or binary. These must begin with a decimal digit and, except in
the case of a decimal or hexadecimal number, must end with "x"
followed by a letter identifying the base of the number. A
number without an identifying base is hexadecimal if the first
digit is 0; decimal if the first digit is 1 through 9. Examples
of A86 numbers are: 123 (decimal), 0ABC (hexadecimal), 1776xQ
(octal), and 10100110xB (binary).
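
The rule for numbers with no base suffix is easy to trip over, so
here is a sketch of how such numbers would be read under the
rules just stated. The decimal values in the comments are worked
out by hand from those rules, not taken from assembler output.

```asm
MOV AX,123           ; decimal: first digit is 1 through 9
MOV AX,0ABC          ; hexadecimal: first digit is 0 (2748 decimal)
MOV AX,010           ; also hexadecimal: 16 decimal, not 10
MOV AX,1776xQ        ; octal (1022 decimal)
MOV AL,10100110xB    ; binary (166 decimal)
```

The third line is the one to remember: a leading zero changes the
base, so a zero-padded decimal number does not mean what it would
in most other languages.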

Strings are characters enclosed in single quotes. Examples of
strings are: '1st string' and 'SIGN-ON MESSAGE, V1.0'. The
single quote is one of many special characters used in the
assembly language. Others, run together in a list, are: ! $ ? ;
: = , [ ] . + - ( ) * / > ". The space and tab characters are
also special characters, used as separators in the assembly
language.

For compatibility with other assemblers, I now also accept double
quotes for strings.

A comment is a sequence of characters used for program
documentation only; it is ignored by the assembler. Comments
begin with a semicolon (;) and run to the end of the line on
which they are started. Examples of lines with comments are
shown below:

  ; This entire line is a comment.
  MOV AX,BX ; This is a comment next to an instruction statement.

Alternatively, for compatibility with other assemblers, I provide
the COMMENT directive. The next non-blank character after
COMMENT is a delimiter to a comment that can run across many
lines; all text is ignored, until a second instance of the
delimiter is seen. For example,

  COMMENT 'This comment
  runs across two lines'

I don't like COMMENT, because I think it's very dangerous. If,
for example, you have two COMMENTs in your program, and you
forget to close the first one, the assembler will happily ignore
all source code between the comments. If that source code does
not happen to contain any labels referenced elsewhere, the error
may not be detected until your program blows up. For multiline
comments, I urge you to simply start each line with a semicolon.

Statements in A86 are line-oriented, which means that
statements may not be broken across line boundaries. A86 source
lines may be entered in a free-form fashion; that is, without
regard to the column orientation of the symbols and special
characters.

PLEASE NOTE: Because an A86 line is free formatted, there is no
need for you to put the operands to your instructions in a
separate column. You organize things into columns when you want
to visually scan down the column; and you practically never scan
operands separate from their opcodes. The only reason that 99%
of the assembly-language programs out there in the world have
operands in a separate column is that some IBM assembler written
back in 1953 required it. It makes no sense to have operands in
a separate column, so STOP DOING IT!


Operand Typing and Code Generation

A86 is a strongly typed assembly language. What this means is
that operands to instructions (registers, variables, labels,
constants) have a type attribute associated with them which tells
the assembler something about them. For example, the operand 4
has type number, which tells the assembler that it is a numerical
constant, rather than a register or an address in the code or
data. The following discussion explains the types associated
with instruction operands and how this type information is used
to generate particular machine opcodes from general purpose
instruction mnemonics.

Registers

The 8086 has 8 general purpose word (two-byte) registers:
AX,BX,CX,DX,SI,DI,BP, and SP. The first four of those registers
are subdivided into 8 general purpose one-byte registers
AH,AL,BH,BL,CH,CL,DH, and DL. There are also 4 16-bit segment
registers CS,DS,ES, and SS, used for addressing memory; and the
implicit instruction-pointer register (referred to as IP,
although "IP" is not part of the A86 assembly language).
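
A short sketch of how the byte registers overlay the word
registers. This is standard 8086 behavior; the constants are
written in the leading-zero hexadecimal form described earlier in
this chapter.

```asm
MOV AX,01234         ; AX = 1234 hexadecimal
                     ; AH (high byte) is now 12 hex, AL (low byte) is 34 hex
MOV AL,0FF           ; changes only the low byte: AX = 12FF hexadecimal
```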

Variables

A variable is a unit of program data with a symbolic name,
residing at a specific location in 8086 memory. A variable is
given a type at the time it is defined, which indicates the
number of bytes associated with its symbol. Variables defined
with a DB statement are given type BYTE (one byte), and those
defined with the DW statement are given type WORD (two bytes).
Examples:

  BYTE_VAR DB 0 ; A byte variable.
  WORD_VAR DW 0 ; A word variable.

Labels

A label is a symbol referring to a location in the program code.
It is defined as an identifier, followed by a colon (:), used to
represent the location of a particular instruction or data
structure. Such a label may be on a line by itself or it may
immediately precede an instruction statement (on the same line).
In the following example, LABEL_1 and LABEL_2 are both labels for
the MOV AL,BL instruction.

  LABEL_1:
  LABEL_2: MOV AL,BL

In the A86 assembly language, labels have a type identical to
that of constants. Thus, the instruction MOV BX,LABEL_2 is
accepted, and code is generated to move the immediate constant
address of LABEL_2 into BX.

IMPORTANT: you must understand the distinction between a label
and a variable, because you may generate a different instruction
than you intended if you confuse them. For example, if you
declare X: DW ?, the colon following the X means that X is a
label; the instruction MOV SI,X moves the immediate constant
address of X into the SI register. On the other hand, if you
declare X DW ?, with no colon, then X is a word variable; the
same instruction MOV SI,X now does something different: it loads
the run-time value of the memory word X into the SI register.
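
Side by side, the two declarations just described look like
this. X_LAB and X_VAR are hypothetical names, chosen only so that
both forms can appear in one fragment.

```asm
X_LAB: DW ?          ; colon: X_LAB is a label
X_VAR DW ?           ; no colon: X_VAR is a word variable

MOV SI,X_LAB         ; moves the immediate constant address of X_LAB into SI
MOV SI,X_VAR         ; loads the run-time value of the memory word X_VAR into SI
```

The two MOV lines are spelled identically apart from the symbol
name; only the presence or absence of the colon at the point of
declaration decides which instruction is generated.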

Constants

A constant is a numerical value computed from an assembly-time
expression. For example, 123 and 3 + 2 - 1 both represent
constants. A constant differs from a variable in that it
specifies a pure number, known by the assembler before the
program is run, rather than a number fetched from memory when the
program is running.
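
As a sketch of that difference: ITEM_COUNT below reuses the EQU
example from earlier in this chapter, while TOTAL is a
hypothetical word variable added for contrast.

```asm
ITEM_COUNT EQU 5     ; a named constant
TOTAL DW 0           ; a word variable

MOV CX,3 + 2 - 1     ; evaluated at assembly time: same as MOV CX,4
MOV CX,ITEM_COUNT    ; moves the pure number 5 into CX
MOV CX,TOTAL         ; fetches the word stored at TOTAL when the program runs
```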


Generating Opcodes from General Purpose Mnemonics

My A86 assembly language is modeled after Intel's ASM86 language,
which uses general purpose mnemonics to represent classes of
machine instructions rather than having a different mnemonic for
each opcode. For example, the MOV mnemonic is used for all of
the following: move byte register to byte register, load word
register from memory, load byte register with constant, move word
register to memory, move immediate value to word register, move
immediate value to memory, etc. This feature saves you from
having to distinguish "move" from "load," "move constant" from
"move memory," "move byte" from "move word," etc.

Because the same general purpose mnemonic can apply to several
different machine opcodes, A86 uses the type information
associated with an instruction's operands in determining the
particular opcode to produce. The type information associated
with instruction operands is also used to discover programmer
errors, such as attempting to move a word register to a byte
register.

The examples that follow illustrate the use of operand types in
generating machine opcodes and discovering programmer errors. In
each of the examples, the MOV instruction produces a different
8086 opcode, or an error. The symbols used in the examples are
assumed to be defined as follows: BVAR is a byte variable, WVAR
is a word variable, and LAB is a label. As you examine these MOV
instructions, notice that, in each case, the operand on the right
is considered to be the source and the operand on the left is the
destination. This is a general rule that applies to all
two-operand instruction statements.

  MOV AX,BX     ; (8B) Move word register to word register.
  MOV AX,BL     ; ERROR: Type conflict (word,byte).
  MOV CX,5      ; (B9) Move constant to word register.
  MOV BVAR,AL   ; (A2) Move AL register to byte in memory.
  MOV AL,WVAR   ; ERROR: Type conflict (byte,word).
  MOV LAB,5     ; ERROR: Can't use label/constant as dest. to MOV.
  MOV WVAR,SI   ; (89) Move word register to word in memory.
  MOV BL,1024   ; ERROR: Constant is too large to fit in a byte.
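
For reference, here is a sketch of definitions that would give
BVAR, WVAR, and LAB the types assumed above, followed by variants
of the two type-conflict lines and the too-large constant with
the operand sizes made to agree. The rewrites are illustrations
derived from the typing rules in this chapter; no opcodes are
claimed for them.

```asm
BVAR DB 0            ; byte variable
WVAR DW 0            ; word variable
LAB:                 ; label

MOV AL,BL            ; byte register to byte register
MOV AX,WVAR          ; word in memory to word register
MOV BX,1024          ; 1024 fits in a word register
```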