CHAPTER 10   RELOCATION AND LINKAGE


A86 allows you to produce either .COM files, which can be run
immediately as standalone programs, or .OBJ files, to be fed to
the MS-DOS LINK program. In this chapter I'll discuss .OBJ mode
of A86.


.OBJ Production Made Easy

I'll start by giving you the minimum amount of information you
need to know to produce .OBJ files. If you are writing short
interface routines, and do not want to concern yourself with the
esoterica of .OBJ files (segments, groups, publics, etc.), you
can survive quite nicely by reading only this section.

There are two ways you can cause A86 to produce a .OBJ file as
its object output. One way is to explicitly give .OBJ as the
output file name: for example, you can assemble the source file
FOO.8 by giving the command "A86 FOO.8 FOO.OBJ". The other way
is to specify the switch +O (letter O not digit 0). This is
illustrated by the invocation "A86 +O FOO.8", which will have the
same effect as the first invocation.

My design philosophy for .OBJ production is to accommodate two
types of user. The first type of user is writing new code, to
link to other (usually high level language) modules. That person
should be able to write the module with a minimum of red tape,
and have A86 do the right thing. The second type of user has
existing modules written for Intel/IBM assemblers, and wants to
port them to A86. A86 should recognize and act upon all the
relocation directives (SEGMENT, GROUP, PUBLIC, EXTRN, NAME, END)
given. The assembly should work even if several files, assembled
separately under the Intel/IBM assembler, are fed to a single A86
assembly. You'll see if you read on through this entire chapter
that the multiple-files requirement causes A86 to interpret some
of the relocation directives a little differently (while
achieving compatible results).

Let's suppose you're writing new code: for example, an interface
routine to the "C" language, that multiplies a 16-bit number by
10. "C" pushes the input number onto the stack, before calling
your routine. Your code needs to get the number, multiply it by
10, and return the answer in the AX register. You can code it:

_MUL10:           ; "C" expects all public names to start with "_"
  PUSH BP         ; "C" expects BP to be preserved
  MOV BP,SP       ; we use BP to address the stack
  MOV AX,[BP+4]   ; fetch the number N, beyond BP and the ret addr
  ADD AX,AX       ; 2N
  MOV BX,AX       ; 2N is saved in BX
  ADD AX,AX       ; 4N
  ADD AX,AX       ; 8N
  ADD AX,BX       ; 8N + 2N = 10N
  POP BP          ; BP is restored
  RET             ; go back to caller

These 11 lines can be your entire source file! If you name the
file MUL10.8, A86 will create an object file MUL10.OBJ, that
conforms to the standard SMALL model of computation for high
level languages. If you use RETF instead of RET (thus, by the
way, getting the operand from BP+6 instead of BP+4), the object
module will conform to the standard LARGE model of computation.
All the red tape information required by the high level language
is provided implicitly by A86. I'll go through this information
in detail later, but you should need to read about it only if
you're curious.
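
As a sketch of the LARGE-model variant just mentioned, the same
routine might be written as follows. Only the operand offset and
the return instruction change. This assumes the "C" compiler
reaches _MUL10 with a far call, so that a 4-byte return address
sits between the saved BP and the argument.

```asm
_MUL10:           ; same public name as the SMALL-model version
  PUSH BP         ; BP is preserved, as before
  MOV BP,SP
  MOV AX,[BP+6]   ; N lies beyond BP (2 bytes) and a far ret addr (4 bytes)
  ADD AX,AX       ; 2N
  MOV BX,AX       ; 2N is saved in BX
  ADD AX,AX       ; 4N
  ADD AX,AX       ; 8N
  ADD AX,BX       ; 8N + 2N = 10N
  POP BP
  RETF            ; far return to the caller
```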

What happens if you need to access symbols outside the module
you're assembling? If the type of the symbol is correctly
guessed from the instruction that refers to it, then you can
simply refer to it, and leave it undefined within the module. For
example, if A86 sees the instruction CALL PRINT with PRINT
undefined, it will assume that PRINT is a NEAR procedure. If
PRINT is never defined within the module, A86 will act as if you
declared PRINT via the directive EXTRN PRINT:NEAR. The address
of PRINT will be plugged into your instruction by LINK when it
combines A86's .OBJ file with the high level language's .OBJ
files, to make the final program.

In general, the undefined operand to any CALL or JMP instruction
is assumed to be NEAR. The second (source) operand to a MOV or
arithmetic instruction is assumed to be ABS (i.e., an immediate
constant). An undefined first (destination) operand is assumed
to be a simple memory variable, of the same size (BYTE or WORD)
as the register given in the second operand. If your external
symbol does not comply with these guidelines, you need to declare
it with an EXTRN before you use it. (You can also use EXTRN to
declare types of non-complying forward references within your
module, as you'll see later.)
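
The guessing rules above can be illustrated with a short
fragment. PRINT, LIMIT, COUNT and SHUTDOWN are hypothetical names,
assumed to be defined in other modules and nowhere in this one.

```asm
  EXTRN SHUTDOWN:FAR  ; a FAR procedure does not fit the CALL guess,
                      ;   so it is declared before it is used

  CALL PRINT          ; undefined CALL operand: assumed NEAR
  MOV AX,LIMIT        ; undefined source operand: assumed ABS (immediate)
  MOV COUNT,AX        ; undefined destination: assumed a WORD variable,
                      ;   because AX is a word register
  CALL SHUTDOWN       ; assembled as a far call, thanks to the EXTRN
  RET
```

Only SHUTDOWN needs a declaration here; the other three symbols
are typed by the instructions that use them.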

If you'd like to link the MUL10 procedure to Turbo Pascal V4.0 or
later, you need to add the line CODE SEGMENT PUBLIC at the top
of the program, to name the program segment according to Turbo
Pascal's expectations. You may dispense with the leading
underscore in the name MUL10-- Turbo Pascal does not require or
expect it.

At this point, if you're a casual user, I think you've read
enough to get going! Read further only if you wish; or if you
get stuck, and need to master the esoterica.


Overview of Relocation and Linkage

When you assemble a program directly into a .COM file, the
program has just two forms: the source program, that you can
understand, and the .COM file, that the computer can "understand"
(i.e., execute). A .OBJ file is an intermediate format: neither
you nor the (executing) computer can make sense out of a .OBJ
file; only programs like LINK interpret .OBJ files. The purpose
of a .OBJ file is to allow you to assemble or compile just a part
of a program. The other parts (also in the form of .OBJ files)
can be produced at a different time; often by a different
assembler or compiler, whose source files are in a different
language. It's easy to see where the word "linkage" comes from:
the LINK program puts the pieces of a program together. The
"relocation" comes because the assembler or compiler that makes a
given program piece doesn't know how many other pieces will come
before it, or how big the other pieces will be. Each piece is
constructed as if it started at location 0 within the program;
then LINK "relocates" the piece to its true location.

Many of the relocation features of 86 assembly language are
couched in terms of LINK's point of view, so we must look at the
way LINK sees things. LINK calls a .OBJ file an "object module",
or just "module". Each module has a NAME, that can be referred
to when LINK issues diagnostic messages, such as error messages
and symbol maps. If a program symbol is used only within a
single module, it does not need to be given to LINK, except
possibly to pass along to a symbolic debugger. On the other
hand, if a program symbol is defined in one module and referenced
in other modules, then LINK needs to know the name of the symbol,
so it can resolve the references. Such a symbol is PUBLIC in the
module in which it is defined; it is "external" in the other
modules, containing references to it. Finally, exactly one
module in a program must contain the starting location for the
program; that module is called the "main module", and it must
supply the starting address (which is not necessarily at the
beginning of the module).

In the 86 family of microprocessors, the LINK system also does
much to manage the memory segments that a program will fit into,
and get its data from. The (grotesquely ornate) level of support
for segmentation was dictated by Intel when it specified (and IBM
and the compiler makers accepted) the format that .OBJ files will
have. I attended the fateful meeting at Intel, in which the
crucial design decisions were made. I regret to say that I sat
quietly, while engineers more senior than I applied their fertile
imaginations to construct fanciful scenarios which they felt had
to be supported by LINK. Let's now review the resulting
segmentation model.

The parts of a program, as viewed by LINK, come in three
different sizes: they can be (1) pieces of a single segment, (2)
an entire single segment, or (3) a sequence of consecutive
segments in 86 memory. Size (1) should have been called
something like FRAGMENT, but is instead called SEGMENT. Size (2)
should have been called SEGMENT, but is instead called GROUP.
Size (3) should have been called "group", but is instead called
"class". Let me cling to the sensible terminology for one more
paragraph, while I describe the worst scenario Intel wanted to
support; then when I discuss individual directives, I'll
regretfully revert to the official terminology.

The scenario is as follows: suppose you have a program that
occupies about 100K bytes of memory. The program contains a core
of 20K bytes of utility routines that every part of the program
calls. You'd like every part of the program to be able to call
these routines, using the NEAR form to save memory. By gum, you
can do it! You simply(!) slice the program into three fragments:
the utility routines will go into fragment U, and the rest of the
program will be split into equal-sized 40K-byte fragments A and
B. Now you arrange the fragments in 8086 memory in the order
A,U,B. The fragments A and U form a 60K-byte block, addressed by
a segment register value G1, that points to the beginning of A.
The fragments U and B form another 60K-byte block addressed by a
segment register value G2, that points to the beginning of U. If
you set the CS register to G1 when A is executing, and G2 when B
is executing, the U fragment is accessible at all times. Since
all direct JMPs and CALLs are encoded as relative offsets, the
U-code will execute direct jumps correctly whether addressed by
G1 with a huge offset, or G2 with a small offset. Of course, if
U contains any absolute pointers referring to itself (such as an
indirect near JMP or CALL), you're in trouble.

It's now been over a decade since the fateful design meeting took
place, and I can report that the above scenario has never taken
place in the real world. And I can state with some authority
that it never will. The reason is that the only programs that
exceed 64K bytes in size are coded in high level language, not
assembly language. High level language compilers follow a very,
very restricted segmentation model-- no existing model comes
remotely close to supporting the scheme suggested by the
scenario. But the 86 assembly language can support it-- the
directives "G1 GROUP A,U" and "G2 GROUP B,U", followed by chunks
of code of the appropriate object size, headed by directives "A
SEGMENT", "B SEGMENT", and "U SEGMENT". The LINK program is
supposed to sort things out according to the scenario; but I
can't say (and I have my doubts) whether it actually succeeds in
doing so.
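
Laid out as a source file, the scenario's directives might look
like the sketch below. The segment names A_CODE, U_CODE and B_CODE
are hypothetical stand-ins for the fragments A, U and B; longer
names are used here because single letters such as B also serve
as type abbreviations in A86. Whether LINK arranges the result as
the scenario intends is, as noted, uncertain.

```asm
G1 GROUP A_CODE,U_CODE    ; the 60K block addressed while A is executing
G2 GROUP B_CODE,U_CODE    ; the 60K block addressed while B is executing

A_CODE SEGMENT
  ; about 40K bytes of code, run with CS = G1

U_CODE SEGMENT
  ; 20K bytes of utility routines, reached by NEAR calls from both

B_CODE SEGMENT
  ; about 40K bytes of code, run with CS = G2
```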

The concept of "class" was added as an afterthought, to implement
the more sensible and usable features that outsiders thought
GROUPs were implementing; namely, the ability to specify that
different (and disjoint!) segments occur consecutively in memory.
This allows programs to be arranged in a consistent manner-- for
example, with all program code followed by all static data
segments followed by all dynamically allocated memory.


The NAME Directive

Syntax:  NAME module_name

The NAME directive specifies that "module_name" be given to LINK
as the name of the module produced by this assembly. The symbol
"module_name" can be used elsewhere in your program without
conflict: it can even, if you like, be a built-in assembler
mnemonic (e.g. "NAME MOV" is acceptable)!

If you do not provide a NAME directive, A86 will use the name of
the output object file, without the .OBJ extension. If you
provide more than one NAME directive, A86 will use the last one
given, with no error reported.


The PUBLIC Directive

Syntax:  PUBLIC sym1, sym2, sym3, ...
         PUBLIC

The PUBLIC directive allows you to explicitly list the symbols
defined in this assembly, that can be used by other modules. If
you do not give any PUBLIC directives in your program, A86 will
use every relocatable label and variable name in your program,
except local labels (the redefinable labels consisting of a
letter followed by digits: L7, M1, Q234, etc.). Symbols EQUated
to constants, and symbols defined within structures and DATA
SEGMENTs, are not implicitly declared PUBLIC: you have to
explicitly include them in a PUBLIC directive.

A86 maintains an internal flag, telling it whether to figure out
for itself which symbols are PUBLIC, or to let the program
explicitly declare them. The flag starts out "implicit", and is
set to "explicit" only if A86 sees a PUBLIC directive with no
names at all, or a PUBLIC directive containing at least one name
that would have been implicitly made PUBLIC.

If you are writing new code, you'll probably want to keep the
flag "implicit". You use the PUBLIC directive only for those
symbols which have the form of local labels, but aren't (e.g., a
memory variable I1987 for 1987 income); and for absolute values
that are globally accessed -- e.g. specify "PUBLIC
OPEN_FILES_LIMIT" for a symbol defined as "OPEN_FILES_LIMIT EQU
20".
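
The two cases just named can be combined in one small sketch.
TOTAL_INCOME is a hypothetical extra variable, added only to show
that ordinary names stay implicitly PUBLIC: neither name listed
in the PUBLIC directives would have been PUBLIC on its own, so,
by the rule above, the flag remains "implicit".

```asm
OPEN_FILES_LIMIT EQU 20    ; an absolute value, accessed globally
I1987 DW ?                 ; 1987 income: has the form of a local label
TOTAL_INCOME DW ?          ; ordinary variable: PUBLIC without being listed

PUBLIC OPEN_FILES_LIMIT    ; EQUated constants must be listed explicitly
PUBLIC I1987               ; so must names that look like local labels
```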

If you are porting existing code, that code will already have
PUBLIC directives in it, and A86 will go to "explicit" mode,
duplicating the functionality of other assemblers.

The PUBLIC directive with no names is used to force "explicit"
mode, thus causing (if there are no further PUBLICs with names)
the .OBJ file to declare no symbols PUBLIC.

There is another side effect to the PUBLIC directive: if a symbol
is declared PUBLIC in a module, it had better be defined in that
module. If it isn't, then A86 includes it in the .ERR listing of
undefined symbols in the module, and suppresses output of the
object file.


The EXTRN Directive

Syntax:  EXTRN sym1:type, sym2:type, ...

  where "type" is one of:  BYTE  WORD  DWORD  QWORD  TBYTE  FAR
        or synonymously:   B     W     D      Q      T      F
                     or:   NEAR  ABS

The EXTRN directive allows you to attach a type to a symbol that
may not yet be defined (and may never be defined) within your
program. This is often necessary for the assembler to generate
the correct instruction form when the symbol is used as an
operand. All the possible types except ABS are defined elsewhere
in the A86 language, but I list them again here for convenience:

  B or BYTE:   byte-sized memory variable
  W or WORD:   word (2-byte) sized memory variable
  D or DWORD:  doubleword (4-byte) sized memory variable
  Q or QWORD:  quadword (8-byte) sized memory variable
  T or TBYTE:  10-byte-sized memory variable
  NEAR:        program label accessed within a segment
  F or FAR:    program label accessed from outside this segment
  ABS:         an absolute number (i.e., an immediate constant)

An example of EXTRN usage is as follows: suppose there is a word
memory variable IFARK in your program. The variable might be
declared at the end of the program; or it might be defined in a
module completely outside of this program. Without an EXTRN
directive, A86 will assemble an instruction such as "MOV
AX,IFARK" as the loading of an immediate constant IFARK into the
AX register. If you place the directive "EXTRN IFARK:W" at the
top of your program, you'll get the correct instruction form for
MOV AX,IFARK-- moving a word memory variable into the AX
register.
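
Written out, the IFARK example is only a few lines. The routine
label GET_IFARK is a hypothetical name, added so the fragment has
somewhere to live.

```asm
  EXTRN IFARK:W     ; IFARK is a word variable, defined later or elsewhere

GET_IFARK:
  MOV AX,IFARK      ; loads the word stored at IFARK; without the EXTRN,
                    ;   this would load IFARK's value as an immediate
  RET
```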

A86 will allow more than one EXTRN directive for a given symbol,
as long as the same type is given every time. A86 will even
allow an EXTRN directive for a symbol that has already been
defined, as long as the type declared is consistent with the
symbol's definition. These allowances exist so that you can
assemble multiple files written for another assembler, that had
been fed separately to that assembler.

Note that EXTRN is viewed quite differently by A86 than by other
assemblers. In fact, if it weren't for those other assemblers,
I'd use the mnemonic DECLARE instead of EXTRN. A86 doesn't
really use EXTRN to determine which symbols are external-- it
uses those symbols that are undefined at the end of assembly. As
I stated earlier in the chapter, an undefined symbol can be
referenced without being declared via EXTRN. Conversely, a
defined symbol can be declared (and redeclared) via EXTRN; being
defined, such a symbol will not be specified "external" in the
.OBJ file.

Because EXTRN is useful in forward reference situations, it is
now recognized even when A86 is assembling a .COM file.

For those of you who are accustomed to the more traditional use
of EXTRN, and who do not like external records to be created
"behind your back", A86 offers the "+x" option. If you include
"+x" in the program invocation, A86 will require that all
undefined symbols be explicitly declared via an EXTRN. Any
undefined, undeclared symbols will be included in the .ERR
listing of undefined symbols, and object file output will be
suppressed.


MAIN: The Starting Location for a Program

I've already stated that exactly one module in a program is the
"main" module, containing the starting address of the entire
program. In A86 when assembling .OBJ files, the starting address
is given by the label MAIN. You simply provide the label "MAIN:"
where you want the program to start. The module containing MAIN
is the main module.
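
A minimal main module might therefore look like the sketch below.
It assumes the linked program runs under MS-DOS, and simply
returns to DOS through the standard terminate function.

```asm
MAIN:               ; LINK receives this address as the starting location
  MOV AX,4C00H      ; DOS function 4CH (terminate), return code 0
  INT 21H
```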


The END Directive

Syntax:  END
         END start_addr

The END directive is used by other assemblers for two purposes,
both of which are now a little silly. The first purpose is to
signal the end of assembly. This was necessary back in the days
when source files were input on media such as paper tape: you had
to tell the assembler explicitly that the content of the tape had
ended. Today the operating system can tell you when you've
reached the end of the file, so this function is an anachronism.

The second purpose of END is, nonsensically, to allow you to
specify the starting location of the program. I suppose the
person who wrote the first assembler back in the 1950's was too
short on memory to implement a separate START directive, or a
MAIN label like A86 has, and decided to let END do double duty.
I've always considered the example "END START" to have an
Alice-in-Wonderland quality; it is fuel for the
high-level-language snobs who like to attack assembly language.
Please defeat the snobs, and use "MAIN:" if you are writing new
code.

For compatibility, A86 treats "END start_addr" exactly the same
as if you had coded "MAIN EQU start_addr". Note that if you want
your program to assemble under both A86 and that other assembler,
you can specify "END MAIN"-- A86 treats MAIN EQU MAIN as a legal
redefinition of the symbol MAIN.
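
For ported code, the traditional form keeps working. In this
sketch START is a hypothetical label of the kind found in files
written for other assemblers; the program body again assumes
MS-DOS.

```asm
START:
  MOV AX,4C00H      ; DOS function 4CH (terminate), return code 0
  INT 21H

  END START         ; A86 acts as if "MAIN EQU START" had been coded
```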

A86 ignores END when there is no starting-address operand, thus
allowing assembly of multiple files written for other assemblers.


The SEGMENT Directive

Syntax:  seg_name SEGMENT [align] [combine] ['class_name']

  where "align" is one of:  BYTE  WORD  PARA  PAGE
     "combine" is one of:   PUBLIC  STACK  COMMON  MEMORY
                            AT number

The SEGMENT directive says that assembled object code will
henceforth go to a block of code whose name is "seg_name".
"seg_name" is a symbol that represents a value that can be loaded
into a segment register. If "seg_name" is not declared in a
GROUP directive, then its value should in fact be loaded into a
segment register, in order to address the code. If "seg_name" is
declared in a GROUP directive, then the code is a part of the
segment addressed by the name of the group.

A program can consist of any number of named segments, to be
combined in numerous exotic ways to produce the final program.
You can redirect your object output from one segment to another
in your assembly, by providing a SEGMENT directive before each
piece of code. You can even return to a segment you started
earlier, by repeating a SEGMENT with the same name-- the
assembler just picks up where it left off, subject to some
possible skipping for memory alignment, that I'll describe
shortly.

The specifications following the word SEGMENT help to describe
how the code in this module's part of the segment will be
combined with code for the same segment name given in other
modules; and also how this named segment will be grouped with
other named segments. Other assemblers require the
specifications to be given in the order indicated. A86 will
accept any order, and will accept commas between the
specifications if you want to provide them. The only restriction
is that "AT number" must be followed by a comma if it is not the
last specification on the line.
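
As a sketch of the ordering freedom just described, the two
headers below should be equivalent to A86. VAR_DATA, COUNT and
TOTAL are hypothetical names; the second header reopens the same
segment, with the same specifications given in a different order
and separated by commas.

```asm
VAR_DATA SEGMENT WORD PUBLIC 'DATA'     ; the order other assemblers require
COUNT DW ?

VAR_DATA SEGMENT 'DATA', PUBLIC, WORD   ; any order, commas optional, in A86
TOTAL DW ?
```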

The "align" specification tells if each piece of code within the
segment should be aligned so that its starting address is an
exact multiple of some number. BYTE alignment means there is no
requirement; WORD alignment requires each piece to start at a
multiple of 2; PARA alignment, at a multiple of 16; PAGE
alignment, at a multiple of 256. For example, suppose you have a
segment containing memory variables. You can declare the segment
with the statement "VAR_DATA SEGMENT WORD", which ensures that
the segment is aligned to an even memory address. That way you
can ensure that all 16-bit and bigger memory quantities in the
segment are aligned to even addresses, for faster access on the
16-bit machines of the 86 family.

There are special rules governing alignment for multiple pieces
of the same named segment within the same program module. Other
assemblers outlaw conflicting alignment specifications in this
situation; A86 accepts them, and uses the strictest specification
given. Furthermore, the alignment given for any specification
beyond the first will control the alignment for that piece of
code within this module's chunk. For example, if a program
contains two pieces of code headed by "VAR_DATA SEGMENT WORD",
A86 will insert a byte between the pieces if the first piece has
an odd number of bytes. This ensures correct assembly for
multiple files written for another assembler.
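
The two-piece example can be sketched as follows. FLAG and COUNT
are hypothetical variables, and the offsets in the comments are
relative to the start of this module's chunk of VAR_DATA.

```asm
VAR_DATA SEGMENT WORD
FLAG DB ?             ; offset 0; this piece is 1 byte long (odd)
VAR_DATA ENDS

VAR_DATA SEGMENT WORD
COUNT DW ?            ; offset 2: A86 skips offset 1 to keep WORD alignment
VAR_DATA ENDS
```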

If no "align" type is given for any of the pieces of a named
segment, an alignment of PARA is assumed.


The "combine" specification tells how the chunk of code from this
module will be combined with the chunks of the same named
segment, that come from other modules. Yes, I know, that sounds
like what "align" does; but "combine" takes a different, more
major point of view:

* PUBLIC is the kind of combination we've been talking about all
  along: each piece of the segment is located off the end of the
  previously linked piece, subject to possible gaps for
  alignment. The size of the segment is the sum of the sizes of
  the pieces, plus the sizes of the gaps.

* STACK is a combination type reserved for the system's stack
  segment. To illustrate how STACK segment chunks are combined,
  let's describe the only way a stack segment should ever be
  used. We'll call the segment STACK; and we declare it as
  follows:

    STACK SEGMENT WORD STACK
      DW 100 DUP (?)
    TOP_OF_STACK:

  The code just given declares a stack area of 200 bytes (100
  words) for this module. If identical code occurs in each of
  three modules which are then linked together, the resulting
  STACK segment will have 600 bytes (the sizes are added), but
  TOP_OF_STACK will be the same address (600) for each module
  (each piece is overlaid at the top of the segment). That way,
  every module can declare and access the top of the stack, which
  is the only static part of the stack that any code should ever
  refer to.

* COMMON is a type of memory area supported by FORTRAN. Each
  module's chunk of a COMMON segment starts at location 0, and
  overlaps (usually duplicates) the pieces from all the other
  modules. The size of a COMMON segment is the size of the
  largest chunk.

* MEMORY is supposed to be another kind of COMMON segment, that
  is distinguished by automatically being located beyond all
  other segments in memory. The MS-DOS LINK program, however,
  does not implement MEMORY segments, and instead treats them
  identically to PUBLIC (not COMMON!) segments. I can see no
  useful purpose to the MEMORY combine type, since the
  functionality can be achieved by putting a COMMON segment into
  a 'class' by itself, that goes above all the other classes. So
  don't use MEMORY.

  Sorry, I don't support the assembly of multiple files written
  for other assemblers, that contain STACK, COMMON, or MEMORY
  segments. If I did, I would have to detect the file breaks,
  and duplicate the overlapping functionality of these segment
  types. Since I don't think anybody out there is using these
  esoteric types, I didn't bother to support them to that extent.
  Objections, anyone?

* "AT number" defines a non-combinable segment at the absolute
  memory location whose segment register value is "number". This
  form is useful for initializing data in fixed locations, such
  as the 86 interrupt vector (IVECTOR SEGMENT AT 0 followed by
  ORG 4 * INT_NUMBER), or for reading fixed memory locations,
  such as the BIOS variables area (BIOS_DATA SEGMENT AT 040).
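
As a sketch of the "AT number" form, the fragment below reads the
BIOS timer tick count, whose low word sits at offset 06C of the
BIOS variables area on IBM-compatible machines. TIMER_TICKS and
READ_TICKS are hypothetical names; the numbers with a leading 0
are hexadecimal, as in the "SEGMENT AT 040" of the text; and the
ES override is written in A86's prefix style.

```asm
BIOS_DATA SEGMENT AT 040
  ORG 06C
TIMER_TICKS DW ?          ; low word of the tick count, at 0040:006C
BIOS_DATA ENDS            ; back to the surrounding code segment

READ_TICKS:
  MOV AX,BIOS_DATA        ; the segment name is a loadable value
  MOV ES,AX
  ES MOV AX,TIMER_TICKS   ; fetch the word through ES
  RET
```

No object code is emitted for the AT segment itself; it only
gives TIMER_TICKS its fixed address.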

The combine type specification can be repeated in subsequent
pieces of a given segment, but if it is, it must be the same in
all pieces.

Finally, if no combine type is ever given for a named segment in
a module, that segment is non-combinable-- no other modules may
define that segment; the code given in the one module constitutes
the entire segment.

The last specification available on a SEGMENT line is the class
name, which is identified by being enclosed in single quotes.
Unlike a segment name, which can be used as an instruction
operand and hence cannot conflict with other assembler symbols, a
class name can be assigned without regard to its usage elsewhere
in the program. It can even be a built-in A86 mnemonic. In
fact, both the SMALL and LARGE high-level-language models specify
the class name 'CODE' for code segments, and the SMALL model
specifies the class name 'DATA'.

If no class name is given for a segment, A86 specifies the null
(zero length) string as the class name.


DATA SEGMENT, STRUC and CODE SEGMENT Directives

The DATA SEGMENT and STRUC directives work in .OBJ mode exactly
as they do in .COM mode-- they define a special assembly mode, in
which declarations are made, but no object code is output.
Offsets within DATA segments and structures are absolute, as in
.COM mode. Assembly resumes as before when an ENDS or CODE
SEGMENT directive is encountered.

For MASM compatibility (especially in modules written to link to
Turbo Pascal V4.0 programs), I now recognize the keywords CODE,
DATA, and STACK as ordinary relocatable segment names. The
ordinary functionality takes effect whenever a SEGMENT directive
is given with CODE, DATA or STACK as the segment name, and with
one or more relocatable parameters (e.g., PUBLIC) given after
SEGMENT.


The ENDS Directive

Syntax:  [seg_name] ENDS

The ENDS directive closes out the segment currently being
assembled, and returns assembly to the segment being assembled
before the last SEGMENT directive. The "seg_name", if given,
must match the name in that last SEGMENT directive. ENDS allows
you to "nest" segments inside one another. For example, you can
declare some static data variables that are specific to a certain
section of code at the top of that section:

  _DATA SEGMENT BYTE PUBLIC 'DATA'
  VAR1 DB ?
  VAR2 DB ?
  _DATA ENDS

These four lines can be inserted inside any other segment being
assembled. They will cause the two variable allocations to be
tacked onto the segment _DATA; and assembly will then continue in
whatever segment surrounded the four lines. Observe that the
"nesting" does not occur in the final program; only the
presentation of the source code is nested.
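
In context, such a nested declaration might look like the sketch
below. CALLS and COUNT_CALL are hypothetical names, and the code
assumes that DS addresses _DATA when the routine runs.

```asm
_TEXT SEGMENT BYTE PUBLIC 'CODE'

_DATA SEGMENT BYTE PUBLIC 'DATA'
CALLS DB ?              ; a counter used only by the routine below
_DATA ENDS              ; assembly resumes in _TEXT

COUNT_CALL:
  INC CALLS             ; CALLS was allocated in _DATA, not in _TEXT
  RET
```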

If you are not nesting segments inside one another, then the ENDS
directive serves only to lend a clean, "block-structured"
appearance to your source code. It does not assist A86 in any
particular way; in fact, it consumes a bit more object output
memory (slightly reducing object output capacity) if you have
ENDSs, rather than just starting up new segments with SEGMENT
directives.


Default Outer SEGMENT

Other assemblers outlaw any code outside of a SEGMENT
declaration, forcing you to give a SEGMENT declaration before you
can assemble anything. A86 lets you assemble just your code; you
don't have to worry about SEGMENTs if you don't want to.

If you do provide code outside of all SEGMENT declarations, A86
performs the following steps, to find a reasonable place to put
the code:

1. If there are any segments explicitly declared whose name is or
   ends with "_TEXT", then the first such segment declared is
   used. It is as if the SEGMENT declaration appeared at the top
   of, rather than within, the program.

2. If there is no such explicit segment, A86 creates a BYTE
   PUBLIC segment of class 'CODE', and proceeds to construct a
   name for the segment. If there are no RETF instructions in
   the outer segment, the name chosen is "_TEXT", conforming to
   the SMALL model of computation. If there is a RETF
   instruction, the name chosen is "modulename_TEXT", where
   "modulename" is the name of this module. Recall that
   "modulename" comes from the NAME directive if there is one;
   from the name of the .OBJ file if there isn't.
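
Put another way, a file containing nothing but a routine that
ends in RET is treated roughly as if it had been written with the
explicit header shown in this sketch. _DOUBLE is a hypothetical
routine in the style of the earlier _MUL10.

```asm
_TEXT SEGMENT BYTE PUBLIC 'CODE'   ; what step 2 supplies when there is no RETF

_DOUBLE:
  PUSH BP
  MOV BP,SP
  MOV AX,[BP+4]     ; fetch the argument N
  ADD AX,AX         ; 2N
  POP BP
  RET               ; no RETF, so the default name is "_TEXT"
```

Had the routine ended in RETF in a module named DOUBLE, step 2
would have chosen the name "DOUBLE_TEXT" instead.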


The GROUP Directive

Syntax:  group_name GROUP seg_name1, seg_name2, ...

The GROUP directive causes A86 to tell LINK that all the listed
segments can fit into a single 64K-byte block of memory, and
instruct LINK to make that fit. (If they won't fit, LINK will
issue an error message.) Having declared the group, you can then
use "group_name" as the segment register value that will allow
simultaneous access to all the named segments. The order of
names given in the list does not necessarily determine the order
in which the segments will finally appear within the group.

The most useful application of the GROUP directive is to allow
you to structure the pieces of a program, all of whose code and
data will fit into a single 64K segment. You organize the pieces
into SEGMENTs, and declare all the SEGMENTs to be within the same
GROUP. When the program starts, all segment registers are set to
point to the GROUP, and you never have to worry about segment
registers again in the program.
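
A minimal sketch of that arrangement follows. PROG, MY_DATA,
MY_CODE and COUNT are hypothetical names, and the program is
assumed to run under MS-DOS as a linked .EXE, where the code
itself must load DS and ES on entry.

```asm
PROG GROUP MY_CODE,MY_DATA    ; everything fits in one 64K block

MY_DATA SEGMENT WORD PUBLIC
COUNT DW ?

MY_CODE SEGMENT BYTE PUBLIC
MAIN:
  MOV AX,PROG         ; the group name is the segment register value
  MOV DS,AX
  MOV ES,AX
  INC COUNT           ; addressed relative to the start of PROG
  MOV AX,4C00H        ; DOS function 4CH (terminate), return code 0
  INT 21H
```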

WARNING: If your segments will be GROUPed in the final program,
you should have the appropriate GROUP directive in every module
assembled. If you don't, then any memory pointers generated will
be relative to the beginning of the individual named segments,
not to the beginning of the whole group.

Because of the obscure scenario I described in the Overview
section, Intel does not prohibit more than one GROUP from
containing some of the same segments; so neither does A86. Any
pointers within a segment will be calculated from the beginning
of the last GROUP that the segment was declared within. But
again, I have my doubts as to whether LINK will handle this
correctly.


The SEG Operator

Syntax:  SEG operand

The SEG operator returns the segment containing its operand-- a
value suitable for loading into one of the segment registers. If
the operand is an explicit far constant such as 01811:0100, the
value returned is the lefthand component of the constant (01811
in this example). Otherwise, the result depends on A86's output
mode:

When A86 is assembling to a .OBJ file, the result is the named
relocatable segment containing the operand. SEG is most useful
when the operand is not defined in this A86 module: in that case,
the segment value will be plugged in by LINK.

When A86 is assembling to a .COM file, SEG always returns the CS
register, with one exception: symbols declared within a SEGMENT
AT structure return the value of the containing segment. .COM
files have no facility for explicitly specifying relocatable
segments, so for compatibility A86 assumes that all non-absolute
segment references are to the program's segment itself.
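
In .OBJ mode, a typical use might look like the sketch below.
BUFFER is a hypothetical byte variable defined in some other
module, and POINT_AT_BUFFER is a hypothetical routine name.

```asm
  EXTRN BUFFER:B          ; defined elsewhere; its segment is unknown here

POINT_AT_BUFFER:
  MOV AX,SEG BUFFER       ; LINK plugs in the segment containing BUFFER
  MOV ES,AX
  MOV DI,OFFSET BUFFER    ; ES:DI now addresses BUFFER
  RET
```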