
William L. Peavy
Revised: August 11, 1990
This document provides a revised report on researches into
the structure and content of Unit (.TPU) files produced by
Turbo Pascal (version 5.5) from Borland International. No
assurances are possible regarding when (if ever) further
updates will be available so the material is released to the
Turbo Pascal user community in its admittedly incomplete
state since very little of consequence really remains to be
discovered.
Comments and feed-back are welcome -- especially new
contributions. I can be reached via the following services:
CompuServ (70042,2310)
HalPC Telecom-1 (William;Peavy)
HalPC Telecom-2 (Wm;Peavy)

Table Of Contents

Introduction
1. Gross File Structure
   1.1 User Units
2. Locators
   2.1 Local Links
   2.2 Global Links
   2.3 Table Offsets
3. Unit Header
   3.1 Description
   3.2 File Size
4. Symbol Dictionaries
   4.1 Organization
   4.2 Interface Dictionary
   4.3 DEBUG Dictionary
   4.4 Dictionary Elements
       4.4.1 Hash Tables
             Size
             Scope
             Special Cases
       4.4.2 Dictionary Headers
       4.4.3 Dictionary Stubs
             Label Declaratives ("O")
             Un-Typed Constants ("P")
             Named Types ("Q")
             Variables, Fields, Typed Cons ("R")
             Subprograms & Methods ("S")
             Turbo Std Procedures ("T")
             Turbo Std Functions ("U")
             Turbo Std "NEW" Routine ("V")
             Turbo Std Port Arrays ("W")
             Turbo Std External Variables ("X")
             Units ("Y")
       4.4.4 Type Descriptors
             Scope
             Prefix Part
             Suffix Parts
                Un-Typed
                Structured Types
                   ARRAY Types
                   RECORD Types
                   OBJECT Types
                   FILE (non-TEXT) Types
                   TEXT File Types
                   SET Types
                   POINTER Types
                   STRING Types
                Floating-Point Types
                Ordinal Types
                   "Integers"
                   BOOLEANs
                   CHARs
                   ENUMERATions
                SUBPROGRAM Types
5. Maps and Lists
   5.1 PROC Map
   5.2 CSeg Map
   5.3 Typed CONST DSeg Map
   5.4 Global VAR DSeg Map
   5.5 Donor Unit List
   5.6 Source File List
   5.7 DEBUG Trace Table
6. Code, Data, Relocation Info
   6.1 Object CSegs
   6.2 CONST DSegs
   6.3 Relocation Data Table
7. Supplied Program
   7.1 TPUNEW
   7.2 TPURPT1
   7.3 TPUAMS1
   7.4 TPUUNA1
   7.5 Modifications
   7.6 Notes on Program Logic
       7.6.1 Formatting the Dictionary
       7.6.2 The Disassembler
8. Unit Libraries
   8.1 Library Structure
   8.2 The TPUMOVER Utility
9. Application Notes
10. Acknowledgements
11. References

Inside TURBO Pascal 5.5 Units

INTRODUCTION

This document is the outcome of an inquiry conducted into the
structure and content of Borland Turbo Pascal (Version 5.5) Unit
files. The original purpose of the inquiry was to provide a body of
theory enabling Cross-Reference programs to resolve references to
symbols defined in .TPU files where qualification was not explicitly
provided. As is so often the case, one thing led to another and the
scope of the inquiry was expanded dramatically. While this document
should not be regarded as definitive, the author feels that the entire
Turbo Pascal User community might gain from the information extracted
from these files at the cost of so much time and effort.
The material contained herein represents the findings and
interpretations of the author. A great deal of guess-work was
required and no assurances are given as to the accuracy of either the
findings of fact or the inferences contained herein which are the sole
work-product of the author. In particular, the author had access only
to materials or information that any normal Borland customer has
access to. Further, no Borland source-codes were available as the
Library Routine source is not licensed to the author. In short, there
was nothing irregular about how these findings were achieved.
The material contained herein is placed in the public domain free of
copyright for use of the general public at its own risk. The author
assumes no liability for any damages arising from the use of this
material by others. If you make use of this information and you get
burned, TOUGH! The author accepts no obligation to correct any such
errors as may exist in the supplied programs or in the findings of
fact or opinion contained herein. On the other hand, this is not a
"complete" work in that a great many questions remain open, especially
as regards fine details. (The author is not a practitioner of Intel
80xxx Assembly Language and several open questions might best be
addressed by persons competent in this area.) The author welcomes the
input of interested readers who might be able to "flesh-out" some of
these open questions with "hard" answers.

1. GROSS FILE STRUCTURE

A Turbo Pascal Unit file (Version 5.5 only) consists of an array of
bytes that is some exact multiple of sixteen (16). "Signature"
information allows the compiler to verify that the .TPU file was
compiled with the correct compiler version and to verify that the file
is of the correct size. The fine structure of the file will be
addressed in later sections at ever increasing levels of detail.
Rev: August 11, 1990 Page 3
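Graphically, the file may be regarded as having the following general
layout:

   +------------------+
   | Unit Header      |  Main Index to Unit File
   +------------------+
   | Dictionaries:    |
   |  a) Interface    |
   |  b) Debugger   * |  For Local Symbol Access
   +------------------+
   | PROC Map         |
   +------------------+
   | CSeg Map       * |  May be Empty
   +------------------+
   | CONST DSeg Map * |  May be Empty
   +------------------+
   | VAR DSeg Map   * |  May be Empty
   +------------------+
   | Donor Units    * |  May be Empty
   +------------------+
   | Source Files     |
   +------------------+
   | Trace Table    * |  May be Empty
   +------------------+
   | CODE Segment(s)* |  May be Empty
   +------------------+
   | DATA Segment(s)* |  May be Empty
   +------------------+
   | RELO Data      * |  May be Empty
   +------------------+

1.1 USER UNITS
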
Units prepared by the compiler available to ordinary users have a very
straight-forward appearance and content. There may even be a little
"wasted" space that might be removed if the compiler were just a
little cleverer. The SYSTEM.TPU file is quite another thing however.
The SYSTEM.TPU file (found in TURBO.TPL) is extraordinary in that
great pains seem to have been taken to compact it. Further, it
contains a great many types of entries that just don't seem to be
achievable by ordinary users and I suspect that much (if not all) of
it was "hand-coded" in Assembler Language.
In the following sections, the details of these optimizations will be
explained in the context of the structural element then under
discussion.

2. LOCATORS

The data in these files has need of structure and organization to
support efficient access by the various programs such as the compiler,
the linker and the debugger. This organization is built on a solid
foundation of locators employed in the unit's data structures.

2.1 LOCAL LINKS

Local Links (LL's) are items of type WORD (2 bytes) which contain an
offset which is relative to the origin of the unit file itself. This
implies that a unit must be somewhat less than 64K bytes in size. If
the .TPU file is loaded into the heap, then LL's can be used to locate
any byte in the segment beginning with the load point of the file.

2.2 GLOBAL LINKS

Global Links (LG's) are used to locate type descriptors which may
reside in other Units (i.e., units external to the present unit).
LG's are structured items consisting of two (2) words. The first of
these is an LL that is relative to the origin of the (possibly)
external unit. The second word is an LL which locates the stub of the
unit entry in the current unit dictionary for the (possibly) external
unit. This dictionary entry provides the name of the unit that
contains the item the LG points to.
This provides a handy mechanism for locating type descriptors which
are defined in other separately compiled units.
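
As a rough illustration of the two locator kinds described so far, the following Python sketch reads an LL and an LG from an in-memory copy of a unit. The helper names (read_ll, read_lg) and the sample bytes are hypothetical, and little-endian byte order is an assumption based on these being Intel WORD values.

```python
import struct


def read_ll(image, pos):
    """Return the 2-byte Local Link stored at `pos` (little-endian assumed)."""
    return struct.unpack_from("<H", image, pos)[0]


def read_lg(image, pos):
    """Return a Global Link as (offset in target unit, LL of unit stub)."""
    target_offset, unit_stub_ll = struct.unpack_from("<HH", image, pos)
    return target_offset, unit_stub_ll


# Invented 8-byte image: an LL of 0040h, then an LG of (0123h, 0050h).
image = struct.pack("<HHH", 0x0040, 0x0123, 0x0050) + b"\x00\x00"
ll = read_ll(image, 0)
lg = read_lg(image, 2)
```

Resolving the LG fully would mean following the second word to the unit entry in the current dictionary, reading the unit name there, and then applying the first word as an offset within that unit's file.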

2.3 TABLE OFFSETS

Finally, various data-structures within a .TPU file are organized as
arrays of fixed-length records or as lists of variable-length records.
Efficient access to such records is achieved by means of offsets
rather than subscripts (an addressing technique denied Pascal). These
offsets are relative to the origin of the array or list being
referenced rather than the origin of the unit.

3. UNIT HEADER

The Unit Header comprises the first 64 bytes of the .TPU file. It
contains LL's that effectively locate all other sections of the .TPU
file plus statistics that enable a little cross-checking to be
performed. Some parts of the Unit Header appear to be reserved for
future use since no unit examined by this author has ever contained
non-zero data in these apparently reserved fields.

3.1 DESCRIPTION

The Unit Header provides a high-level locator table whereby each major
structure in the unit file can be addressed. The following provides a
Pascal-like explanation of the layout of the header followed by
further narrative discussion of the contents of the individual fields
in the Unit Header.

   Type HdrAry = Array[0..3] of Char;
        LL     = Word;

        UnitHeader = Record
          FilHd : HdrAry;  { +00 : = 'TPU6'                         }
          Fillr : HdrAry;  { +04 : = $00000000                      }
          UDirE : LL;      { +08 : to Dictionary Head-This Unit     }
          UGHsh : LL;      { +0A : to Interface Hash Header         }
          UHPrc : LL;      { +0C : to PROC Map                      }
          UHCsg : LL;      { +0E : to CSeg Map                      }
          UHDsT : LL;      { +10 : to DSeg Map-Typed CONST's        }
          UHDsV : LL;      { +12 : to DSeg Map-GLOBAL Variables     }
          URULt : LL;      { +14 : to Donor Unit List               }
          USRCF : LL;      { +16 : to Source file List              }
          UDBTS : LL;      { +18 : to Debug Trace Step Controls     }
          UndNC : LL;      { +1A : to end non-code part of Unit     }
          ULCod : Word;    { +1C : Size of Code                     }
          ULTCon: Word;    { +1E : Size of Typed Constant Data      }
          ULPtch: Word;    { +20 : Size of Relo Patch List          }
          Unknx : Word;    { +22 : Number of Virtual Objects???     }
          ULVars: Word;    { +24 : Size of GLOBAL VAR Data          }
          UHash2: LL;      { +26 : to Debug Hash Header             }
          UOvrly: Word;    { +28 : Number of Procs to Overlay??     }
          UVTPad: Array[0..10]
                  of Word; { +2A : Reserved for Future Expansion?   }
        End; { UnitHeader }

FilHd contains the characters "TPU6" in that order. This is
clear evidence that this unit was compiled by Turbo Pascal
Version 5.5.
Fillr is apparently reserved and contains binary zeros.
UDirE contains an LL (WORD) which points to the Dictionary
Header in which the name of this unit is found.
UGHsh contains an LL (WORD) which points to a Hash table that is
the root of the Interface Dictionary tree.
UHPrc contains an LL (WORD) which points to the PROC Map for
this unit. The PROC Map contains an entry for each
Procedure or Function declared in the unit (except for
INLINE types), plus an entry for the Unit Initialization
section. The length of the PROC Map (in bytes) is
determined by subtracting this LL (at 000C) from the LL at
offset 000E.
UHCsg contains an LL (WORD) which points to the CSeg (CODE
Segment) Map for this unit. The CSeg Map contains an
entry for each CODE Segment produced by the compiler plus
an entry for each of the CODE Segments included via the
{$L filename.OBJ} compiler directive. The length of this
Map (in bytes) is obtained by subtracting this LL (at
000E) from the word at 0010. The result may be zero in
which case the CSeg Map is empty.
UHDsT contains an LL (WORD) which points to the DSeg (DATA
Segment) Map that maps the initializing data for Typed
CONST items plus templates for VMT's (Virtual Method
Tables) that are associated with OBJECTS which employ
Virtual Methods. The length of this Map (in bytes) is
obtained by subtracting this LL (at 0010) from the word at
0012. The result may be zero in which case this DSeg Map
is empty.
UHDsV contains an LL (WORD) which points to the DSeg (DATA
Segment) Map that contains the specifications for DSeg
storage required by VARiables whose scope is GLOBAL. The
length of this Map (in bytes) is obtained by subtracting
this LL (at 0012) from the word at 0014. The result may
be zero in which case this DSeg Map is empty.
URULt contains an LL (WORD) which points to a table of units
which contribute either CODE or DATA Segments to the .EXE
file for a program using this Unit. This is called the
"Donor Unit Table". The length of this table (in bytes)
is obtained by subtracting this LL (at 0014) from the word
at 0016. The result may be zero in which case this table
is empty.
USRCF contains an LL (WORD) which points to a list of "source"
files. These are the files whose CODE or DATA Segments
are included in this Unit by the compiler. Examples are
the Pascal Source for the Unit itself, plus the .OBJ files
included via the {$L filename.OBJ} compiler directive.
The length of this table (in bytes) is obtained by
subtracting this LL (at 0016) from the word at 0018. The
result may be zero in which case this table is empty.
UDBTS contains an LL (WORD) which points to a Trace Table used
by the DEBUGGER for "stepping" through a Function or
Procedure contained in this Unit. The length of this
table (in bytes) is obtained by subtracting this LL (at
0018) from the word at 001A. The result may be zero in
which case this table is empty.
UndNC contains an LL (WORD) which points to the first free byte
which follows the Trace Table (if any). It serves as a
delimiter for determining the size of the Trace Table.
This LL (when rounded up to the next integral multiple of
16) serves to locate the start of the code/data segments.
ULCod is a WORD that contains the total byte count of all CODE
Segments compiled into this Unit.
ULTCon is a WORD that contains the total byte count of all Typed
CONST and VMT DATA Segments compiled into this unit.
ULPtch is a WORD that contains the total byte count of the
Relocation Data Table for this unit.
Unknx is a WORD whose usage is poorly understood. It appears
always to be zero except when the Unit contains OBJECTs
which employ Virtual Methods.
ULVars is a WORD that contains the total byte count of all GLOBAL
VAR DATA Segments compiled into this unit.
UHash2 contains an LL (WORD) which points to a Hash Table which
is the root of the DEBUGGER Dictionary. If Local Symbols
were generated by the compiler (directive {$L+}) then ALL
symbols declared in the unit can be accessed from this
Hash Table. In the SYSTEM.TPU file, there is no such
Dictionary and the LL stored here points to the INTERFACE
Dictionary. This is an example of Hash Table "Folding" to
save space which has been observed only in SYSTEM.TPU.
UOvrly is a WORD whose usage is poorly understood. This word is
usually zero unless the Unit was compiled with the Overlay
Directive {$O+}.
UVTPad begins a series of eleven (11) words that are apparently
reserved for future use. Nothing but zeros have ever been
seen here by this author.
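
The header layout and the "subtract adjacent LLs" rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names come from the UnitHeader record, but the struct format string, the helper names (parse_header, map_lengths) and the sample values are this sketch's own, and little-endian WORDs are assumed.

```python
import struct
from collections import namedtuple

# Field names are taken from the UnitHeader record; the format string is
# this sketch's reading of it (little-endian WORDs are an assumption).
UnitHeader = namedtuple(
    "UnitHeader",
    "FilHd Fillr UDirE UGHsh UHPrc UHCsg UHDsT UHDsV URULt USRCF "
    "UDBTS UndNC ULCod ULTCon ULPtch Unknx ULVars UHash2 UOvrly",
)
HEADER_FORMAT = "<4s4s17H"  # 42 bytes; the 22 reserved bytes (UVTPad) follow
HEADER_SIZE = 64


def parse_header(image):
    """Decode the first 64 bytes of a unit image into a UnitHeader."""
    if len(image) < HEADER_SIZE:
        raise ValueError("image is shorter than the 64-byte Unit Header")
    header = UnitHeader._make(struct.unpack_from(HEADER_FORMAT, image, 0))
    if header.FilHd != b"TPU6":
        raise ValueError("signature is not 'TPU6'")
    return header


def map_lengths(h):
    """Length in bytes of each map, found by subtracting adjacent LLs."""
    return {
        "PROC Map": h.UHCsg - h.UHPrc,
        "CSeg Map": h.UHDsT - h.UHCsg,
        "CONST DSeg Map": h.UHDsV - h.UHDsT,
        "VAR DSeg Map": h.URULt - h.UHDsV,
        "Donor Unit List": h.USRCF - h.URULt,
        "Source File List": h.UDBTS - h.USRCF,
        "Trace Table": h.UndNC - h.UDBTS,
    }


# Invented sample values, for demonstration only.
sample = struct.pack(
    HEADER_FORMAT, b"TPU6", bytes(4),
    0x0040, 0x0060, 0x0100, 0x0108, 0x0110, 0x0110, 0x0118, 0x0120,
    0x0130, 0x0130, 0x0025, 0x0000, 0x0010, 0x0000, 0x0004, 0x0060, 0x0000,
) + bytes(22)
header = parse_header(sample)
lengths = map_lengths(header)
```

A length of zero in the resulting table corresponds to a map that the text describes as empty.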

3.2 FILE SIZE

An independent check on the size of the .TPU file is available using
information contained in the Unit Header. This is also important for
.TPL (Unit Library) organization. To compute the file size, refer to
the four (4) words at offsets 001A, 001C, 001E and 0020. Round the
contents of each of these words to the lowest multiple of 16 that is
greater than or equal to the content of that word. Then form the sum
of the rounded words. This is the .TPU file size in bytes.
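
The size check just described amounts to a few lines of arithmetic. The Python sketch below uses hypothetical helper names (round_up_16, expected_file_size) and invented header values; it only restates the rounding-and-summing rule given above.

```python
def round_up_16(value):
    """Lowest multiple of 16 that is greater than or equal to `value`."""
    return (value + 15) & ~15


def expected_file_size(und_nc, ul_cod, ul_tcon, ul_ptch):
    """Sum of the four header words (at 001A, 001C, 001E, 0020), each rounded."""
    return sum(round_up_16(v) for v in (und_nc, ul_cod, ul_tcon, ul_ptch))


# Invented values: non-code part 0130h, code 0025h, no typed constants,
# relocation data 0010h.
size = expected_file_size(0x0130, 0x0025, 0x0000, 0x0010)
code_start = round_up_16(0x0130)  # start of the code/data segments (see UndNC)
```

Comparing the computed size with the actual length of the file on disk gives the independent check mentioned above.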

4. SYMBOL DICTIONARIES

This area contains all available documentation of declared symbols and
procedure blocks defined within the unit. Depending on compiler
options in effect when the unit was compiled, this section will
contain at a minimum, the INTERFACE declarations, and at a maximum,
ALL declarations. The information stored in the dictionary is highly
dependent on the context of the symbol declared. We defer further
explanation to the appropriate section which follows.

4.1 ORGANIZATION

The dictionary is organized with a Hash Table as its root. The hash
table is used to provide rapid access to arbitrary symbols. Since
Turbo Pascal compiles very rapidly, I presume the hash function to be
worthwhile to say the least.
The dictionary itself may be thought of as an n-way tree. Each
subtree has its roots in a hash table. There may be a great many hash
tables in a given unit and their number depends on unit complexity as
well as the options chosen when the unit was compiled. Use of the
{$L+} directive produces the densest trees. The hash tables are
explained in detail a few sections further on.
Hash tables point to Dictionary Headers. When two or more symbols
produce the same hash function result, a collision is said to occur.
Collisions are resolved by the time-honored method of chaining
together the Dictionary Headers of those symbols having the same hash
function result. Dictionary supersetting is accomplished using these
chains.

4.2 INTERFACE DICTIONARY

The INTERFACE dictionary contains all symbols and the necessary
explanatory data for the INTERFACE section of a Unit. Symbols get
added to the Unit using increasing storage addresses until the
IMPLEMENTATION section is encountered.

4.3 DEBUG DICTIONARY

The DEBUG dictionary (if present) is a superset of the INTERFACE
dictionary. It is used by the Turbo Debugger to support its many
features when tracing through a unit. If present, this dictionary is
rooted in its own hash table. The hash table is effectively
initialized when the IMPLEMENTATION keyword is processed by the
compiler. This takes the form (initially) of an unmodified copy of
the INTERFACE hash table, to which symbols are added in the usual
fashion. Thus, the hash chains constructed or extended at this time
lead naturally to the INTERFACE chains and this is how the superset is
effectively implemented.

4.4 DICTIONARY ELEMENTS

The dictionary contains four major elements. These are: hash tables,
Dictionary Headers, Dictionary Stubs and Type Descriptors. The
distinction between Dictionary Headers and Stubs is essentially
arbitrary and is made in this document to assist in exposition. They
might just as easily be regarded as a single element (such as a symbol
entry).

4.4.1 HASH TABLES

As has been intimated, Hash Tables are the glue that binds the
dictionary entries together and gives the dictionary its "shape".
They effectively implement the scope rules of the language and speed
access to essential information.
Each Hash table begins with a 2-byte size descriptor. This descriptor
contains the number of bytes in the table proper (less 2). Thus, the
descriptor directly points to the last bucket in the hash table. For
a hash table of 128 bytes, the size descriptor contains 126. The
first bucket in the table immediately follows the size descriptor.

SIZE

So far, three different hash table sizes have been observed. The
INTERFACE and DEBUG hash tables are usually 128 bytes (64 entries) in
size plus 2 bytes of size description, but the SYSTEM.TPU unit is a
special case, containing only 16 entries. Hash tables which anchor
subtrees whose scope is relatively local usually contain four (4)
entries (8 bytes).
Graphically, a Hash Table with four slots has the following layout:

   +----------+
   |  0006h   |  Size Descriptor
   +----------+
   |  slot 0  |  an LL or zero
   +----------+
   |  slot 1  |  an LL or zero
   +----------+
   |  slot 2  |  an LL or zero
   +----------+
   |  slot 3  |  an LL or zero
   +----------+

It should be noted that the Size Descriptor furnishes an upper bound
for the hash function itself. Thus, it seems possible that a single
hash function is used for all hash tables and that its result is ANDed
with the Size Descriptor to get the final result. Because the sizes
are chosen as they are (powers of 2) this is feasible. Note that in
the above example, 6 = 2 * (n - 1) where n = 4 {slot count}. All of
the hash tables observed so far have this property. What you get is a
really efficient MOD function.
Suppose that the hash of a given symbol is 13 and the proper slot must
be located for a hash table of four entries. If we let "h" be the raw
result of 13, then our final hash is (h SHL 1) AND ((4-1) SHL 1) or
(13 SHL 1) AND 6 = 2 !
One final note on this subject. Given these properties, "Folding" of
sparse hash tables is a rather trivial exercise so long as the new
hash table also contains a number of slots that is a power of 2. This
point is intriguing when one recalls that the SYSTEM.TPU hash table
has only 16 slots rather than the usual 64.
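
The masking arithmetic above can be tried out with a short Python sketch. The raw hash function itself is not known, so the sketch starts from an already-computed raw value; the helper names and the sample table are hypothetical, and little-endian WORDs are assumed.

```python
import struct


def slot_count(size_descriptor):
    """Number of 2-byte slots implied by the size descriptor."""
    return (size_descriptor >> 1) + 1


def slot_offset(raw_hash, size_descriptor):
    """Byte offset of the bucket, measured from the first slot."""
    return (raw_hash << 1) & size_descriptor


def read_hash_table(image, pos):
    """Return the slots (LLs or zero) of the hash table located at `pos`."""
    (size_descriptor,) = struct.unpack_from("<H", image, pos)
    count = slot_count(size_descriptor)
    return list(struct.unpack_from("<%dH" % count, image, pos + 2))


# Invented four-slot table: descriptor 0006h, slots 1 and 3 occupied.
table = struct.pack("<5H", 0x0006, 0x0000, 0x0080, 0x0000, 0x0090)
slots = read_hash_table(table, 0)
offset = slot_offset(13, 0x0006)  # the worked example from the text
```

With the same raw hash, a 64-slot table (descriptor 126) and a 16-slot table (descriptor 30) select slots that agree modulo 16, which is what makes folding a larger table into a smaller one straightforward.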

SCOPE

The INTERFACE and DEBUG dictionary hash tables are Global in Scope
even though the symbols accessed directly via the DEBUG hash table may
be private. On the other hand, other hash tables are purely local in
scope. For example, the fields declared within a record are reached
via a small local hash table, as are the parameters and local
variables declared within procedures and functions. Even OBJECTS use
this technique to provide access to Methods and Object Fields.
Access to such local scope fields/methods requires use of qualified
names which ensures conformity to Pascal scope rules. The method is
truly simple and elegant.

SPECIAL CASES

The SYSTEM.TPU Unit is a special case. Its INTERFACE and DEBUG hash
tables have apparently been "hand-tuned" for small size. Each
contains only sixteen (16) entries. In addition, the DEBUG hash table
is empty since there is no local symbol generation in this unit.
Therefore, the DEBUG hash table does not exist as a separate entity,
its function being served by the INTERFACE hash table. The pointer to
the DEBUG hash table (in the Unit Header) has the same value as the
pointer to the INTERFACE hash table (SYSTEM unit ONLY).

4.4.2 DICTIONARY HEADERS

This is the structure that anchors all information known by the
compiler about any symbol. The format is as follows:
+00: An LL which points to the next (previous) symbol in the
same scope which had the same hash function value.
+02: A character that defines the category the symbol belongs
to and defines the format of the Dictionary Stub which
follows the Dictionary Header.
+03: A String (in the Pascal sense) of variable size that
contains the text of the symbol (in UPPER-CASE letters
only). The SizeOf function is not defined for these
strings since they are truncated to match the symbol size.
The "value" of the SizeOf function can be determined by
adding 1 to the first byte in the string. Thus,
Ord(Symbol[0])+1 is the expression that defines the Size
of the symbol string. Turbo Pascal defines a symbol as a
string of relatively arbitrary size, the most significant
63 characters of which will be stored in the dictionary.
Thus, we conclude that the maximum size of such a string
is 64 bytes.
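
A Dictionary Header can be decoded, and a collision chain followed, along the lines of the Python sketch below. The function names and the two-entry sample image are invented for illustration; the sketch assumes little-endian LLs and assumes that an LL of zero marks the end of a chain.

```python
import struct


def read_dictionary_header(image, pos):
    """Return (next_ll, category, name, stub_pos) for the header at `pos`."""
    next_ll, category = struct.unpack_from("<Hc", image, pos)
    name_len = image[pos + 3]                      # Pascal string length byte
    name = image[pos + 4:pos + 4 + name_len].decode("ascii")
    stub_pos = pos + 4 + name_len                  # stub follows the string
    return next_ll, category.decode("ascii"), name, stub_pos


def find_symbol(image, first_ll, wanted):
    """Walk one hash chain; return (category, stub_pos) or None."""
    pos = first_ll
    while pos != 0:                                # assumed end-of-chain mark
        next_ll, category, name, stub_pos = read_dictionary_header(image, pos)
        if name == wanted.upper():                 # symbols are stored upper-case
            return category, stub_pos
        pos = next_ll
    return None


# Invented image: "FOO" (category Q) at 0010h chained to "BAR" (R) at 0020h.
buffer = bytearray(0x30)
buffer[0x10:0x17] = struct.pack("<Hc", 0x0020, b"Q") + b"\x03FOO"
buffer[0x20:0x27] = struct.pack("<Hc", 0x0000, b"R") + b"\x03BAR"
image = bytes(buffer)
found = find_symbol(image, 0x0010, "bar")
missing = find_symbol(image, 0x0010, "baz")
```

The starting LL would normally come from the hash table slot selected for the symbol; the category character returned here decides how the stub at stub_pos is to be read.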

4.4.3 DICTIONARY STUBS

Dictionary Stubs immediately follow their respective headers and their
format is determined by the category character in the Dictionary
Header. The function of the stub is to organize the information
appropriate to the symbol and provide a means of accessing additional
information such as type descriptors, constant values, parameter lists
and nested scopes. The format of each Stub is presented in the
following sub-sections.

LABEL DECLARATIVES ("O")

This Stub consists of a WORD whose function is (as yet) unknown.

UN-TYPED CONSTANTS ("P")

This Stub consists of (2) two fields:
+00: An LG which points to a Type Descriptor (usually in
SYSTEM.TPU). This establishes the minimum storage
requirement for the constant. The rules vary with the
type, but the size of the constant data field (which
follows) is defined using the Type Descriptor(s).
+04: The value of the constant. For ordinal types, this value
is stored as a LONGINT (size=4 bytes). For Floating-Point
types, the size is implicit in the type itself. For
String types, the size is determined from the length of
the string which is stored in the initial byte of the
constant.

NAMED TYPES ("Q")

This Stub consists of an LG (4-bytes) that points to the Type
Descriptor for this symbol.

VARIABLES, FIELDS, TYPED CONS ("R")

This Stub contains information required to allocate and describe these
types of entities. The format and content is as follows:
+00: A one-byte flag that precisely identifies the class of the
item being described. The known values and their proper
interpretations are as follows:
0 -> Global Variables Allocated in DS;
1 -> Typed Constants Allocated in DS;
2 -> LOCAL Variables & VALUE Parameters on STACK;
6 -> ADDRESS Parameters allocated on STACK;
8 -> Fields suballocated in RECORDS and OBJECTS, plus
METHODS declared for OBJECTS.
+01: A WORD containing the allocation offset in bytes;
+03: A WORD whose content depends on the one-byte flag that
this stub begins with. The context-dependent values
observed thus far are:
If the flag is 0, 2 or 6, then this word is an LL that
locates the containing scope or zero if none;
If the flag is 8, then this word is an LL that locates the
Dictionary Header for the next field or method defined
within the Record or Object;
If the flag is 1, then this word is an offset within the
CONST DSeg Map that locates the text of the Typed Constant.
+05: An LG that locates the proper Type Descriptor for this
symbol.
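
The nine bytes of an "R" stub could be unpacked as in the following Python sketch. The function name, the dictionary keys and the sample stub are hypothetical; the class descriptions paraphrase the flag list above, and little-endian WORDs are assumed.

```python
import struct

# Flag values as listed in the text; the descriptions are paraphrased.
R_CLASSES = {
    0: "global variable (DS)",
    1: "typed constant (DS)",
    2: "local variable or value parameter (stack)",
    6: "address parameter (stack)",
    8: "record/object field or method",
}


def read_r_stub(image, pos):
    """Decode the stub of a category 'R' symbol located at `pos`."""
    flag, offset, context, type_offset, type_unit_ll = struct.unpack_from(
        "<BHHHH", image, pos
    )
    return {
        "class": R_CLASSES.get(flag, "unknown"),
        "offset": offset,          # allocation offset in bytes (+01)
        "context": context,        # meaning depends on the flag (+03)
        "type_lg": (type_offset, type_unit_ll),  # LG to Type Descriptor (+05)
    }


# Invented stub: a field at offset 4 whose next field's header is at 0123h.
stub = struct.pack("<BHHHH", 8, 4, 0x0123, 0x0046, 0x0050)
entry = read_r_stub(stub, 0)
```

Flag values other than those listed are reported as "unknown" rather than guessed at.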

SUBPROGRAMS & METHODS ("S")

Subprograms, especially since Object Methods are supported, have a
rather involved stub. Its format is as follows:
+00: A byte that contains bit-switches. These bit switches
have a great deal to do with the size of this stub and
with the proper interpretation of what follows. The
observed values of the bit-switches are as follows:
xxxxxxx1 -> Symbol declared in INTERFACE;
xxxxxx1x -> Symbol is an INLINE Declarative;
xxxx1x0x -> Symbol has EXTERNAL attribute;
x001xxxx -> Symbol is an ordinary Object Method;
x011xxxx -> Symbol is a CONSTRUCTOR Method;
x101xxxx -> Symbol is a DESTRUCTOR Method;
+01: A Word whose interpretation depends on whether we have an
INLINE Declarative Subprogram or not. If this is an
INLINE Declarative Subprogram, then this word contains the
byte-count of the INLINE code text at the end of this
stub. Otherwise, this word is the offset within the PROC
Map that locates the object code for this Subprogram.
+03: A Word that contains an LL which locates the containing
scope in the dictionary, or zero if none.
+05: A Word that contains an LL which locates the local Hash
Table for this scope. A local hash table provides access
to all formal parameters of the Subprogram as well as all
Symbols whose declarations are local to the scope of this
Subprogram.
+07: A Word that is zero unless the symbol is a Virtual Method.
In this case, then the content is the offset within the
VMT for the owning object that defines where the FAR
POINTER to this Virtual Method is stored.
+09: A Word that is zero unless the symbol is a Method. In
this case, then the content is an LL which locates the
next METHOD for this Object.
+0B: A complete Type-Descriptor for this Subprogram. The
length is variable and depends upon the number of Formal
Parameters declared in the header. A complete description
of this subfield is found in a later section.
+??: If this Symbol represents an INLINE Declarative
Subprogram, then the object-code text begins here. The
byte-count of the text occurs at offset 0001h in this
stub.
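
The bit-switches and the fixed part of an "S" stub can be decoded as sketched below in Python. The names used (decode_s_flags, read_s_stub_fixed, the dictionary keys) and the sample stub are invented for illustration; only the bit patterns and offsets come from the description above, and little-endian WORDs are assumed.

```python
import struct

# Bits 6..4 of the flag byte, per the observed patterns in the text.
METHOD_KINDS = {0b001: "method", 0b011: "constructor", 0b101: "destructor"}


def decode_s_flags(flags):
    """Interpret the bit-switch byte at offset +00 of an 'S' stub."""
    return {
        "interface": bool(flags & 0x01),
        "inline": bool(flags & 0x02),
        "external": (flags & 0x0A) == 0x08,  # bit 3 set, bit 1 clear
        "method_kind": METHOD_KINDS.get((flags >> 4) & 0x07),
    }


def read_s_stub_fixed(image, pos):
    """Decode the 11 fixed bytes that precede the Type Descriptor."""
    flags, word1, scope_ll, hash_ll, vmt_offset, next_method_ll = (
        struct.unpack_from("<B5H", image, pos)
    )
    info = decode_s_flags(flags)
    # The word at +01 is a byte count for INLINE code, else a PROC Map offset.
    key = "inline_byte_count" if info["inline"] else "proc_map_offset"
    info[key] = word1
    info["scope_ll"] = scope_ll
    info["local_hash_ll"] = hash_ll
    info["vmt_offset"] = vmt_offset
    info["next_method_ll"] = next_method_ll
    info["type_descriptor_pos"] = pos + 0x0B
    return info


# Invented stub: an INTERFACE constructor with PROC Map offset 0010h.
stub = struct.pack("<B5H", 0x31, 0x0010, 0x0200, 0x0240, 0x0000, 0x0000)
info = read_s_stub_fixed(stub, 0)
```

Flag combinations not listed in the text simply yield a method_kind of None here.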

TURBO STD PROCEDURES ("T")

This Stub consists of two bytes, the first of which is unique for each
procedure and increments by 4. I have found nothing in the SYSTEM
unit (which is where this entry appears) that this seems directly
related to. The second byte is always zero.

TURBO STD FUNCTIONS ("U")

This Stub consists of two bytes, the first of which is unique for each
function and increments by 4. I have found nothing in the SYSTEM unit
(which is where this entry appears) that this seems directly related
to. I wouldn't be surprised if this byte were an index into a TURBO
compiler table that points to specialized parse tables/action routines
for handling these functions and their non-standard parameter lists.
The second byte seems to be a flag having the values $00, $40 and $C0.
I strongly suspect that the flag $C0 marks exactly those functions
which may be evaluated at compile-time. The meaning behind the other
values is not known to me.

TURBO STD "NEW" ROUTINE ("V")

This Stub consists of a WORD whose function is (as yet) unknown. This
is the only Standard Turbo routine that can behave as a procedure as
well as a function (returning a pointer value).

TURBO STD PORT ARRAYS ("W")

This Stub consists of a byte whose value is 0 for byte arrays, and 1
for word arrays.

TURBO STD EXTERNAL VARIABLES ("X")

This Stub consists of an LG (4-bytes) that points to the Type
Descriptor for this symbol.

UNITS ("Y")

Unit Stubs have the following content:
+00: A Word that is apparently reserved for use by the Compiler
or Linker.
+02: A Word that seems to contain some kind of "signature" used
to detect inconsistent Unit Versions. This author
suspects that this consists of some kind of sum-check or
hash total but has not yet identified the algorithm which
computes the value stored in this word.
+04: A Word that contains an LL which locates the Successor
Unit in the "Uses" list. In fact, the "Uses" lists of
both the INTERFACE and IMPLEMENTATION sections of the Unit
are merged by this Word into a single list. A value of
zero is used to indicate no successor.
+06: A Word that contains an LL which locates the Predecessor
Unit in the "Uses" list. For the SYSTEM unit entry, this
value is always zero to indicate no predecessor. For the
Unit being compiled, this LL locates the final Unit in the
combined "Uses" list.
In effect, the two LL's at offsets 0004 and 0006 organize the units
into both forward and backward linked chains. The entry for the unit
being compiled is effectively the head of both the forward and the
backward chains. The final unit in the merged "Uses" list is the tail
of the forward chain, and the SYSTEM unit is the tail of the backward
chain.
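
As an illustration only (this is not taken from the supplied program),
the Unit Stub layout and the forward chain described above might be
read with the Python sketch below. It assumes the four Words are stored
little-endian, as is usual on the 8086. The names UnitStub,
parse_unit_stub and forward_chain are hypothetical, and the "stubs"
lookup table stands in for whatever logic resolves an LL to the stub it
locates.

```python
import struct
from typing import Dict, List, NamedTuple


class UnitStub(NamedTuple):
    reserved: int      # +00: apparently reserved for the Compiler or Linker
    signature: int     # +02: version "signature" (algorithm not identified)
    successor: int     # +04: LL of the next unit in the merged "Uses" list
    predecessor: int   # +06: LL of the previous unit in the list


def parse_unit_stub(data: bytes, offset: int = 0) -> UnitStub:
    """Decode the four little-endian Words of a Unit ("Y") stub."""
    return UnitStub(*struct.unpack_from("<4H", data, offset))


def forward_chain(stubs: Dict[int, UnitStub], head: int) -> List[int]:
    """Follow successor links from `head` until a zero link ends the chain.

    `stubs` is a hypothetical lookup table keyed by LL value; how an LL is
    resolved to the stub it locates is not shown here.
    """
    order, ll = [], head
    while ll != 0 and ll not in order:   # second test guards against a loop
        order.append(ll)
        ll = stubs[ll].successor
    return order
```

Walking the predecessor field in the same way, starting from the unit
being compiled, would give the backward chain that ends at SYSTEM.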
Type Descriptors store much of the semantic information that applies
to the symbols declared in the unit. Implementation details can be
managed using high-level abstractions and these abstractions can be
shared.

SCOPE

Type Descriptor sharing can occur across the boundaries which are
implicit in unit modules. Thus, a type defined in one unit may be
"imported" by some other module. Also, the pre-defined Pascal Types
(plus the Turbo Pascal extensions) are defined in the SYSTEM.TPU unit
and there needs to be a means of "importing" such Type Descriptors
during compilation. This is precisely the objective of the LG locator
which was described in section 2.2 (above). Type Descriptors are
NEVER copied between units. The binding always occurs by reference at
compile time and this helps support the technique of modifying a unit
and compiling it to a .TPU file, then re-compiling all units/programs
that "USE" it.
Type Descriptors have many roles so their format varies. We have
divided these structures into two parts: the PREFIX Part, which is
always present and whose format is fairly constant, and the SUFFIX
Part, whose content and format depend on the attributes that are part
of the type definition.

PREFIX PART

The Prefix Part of every Type Descriptor consists of four (4) bytes.
The usage is consistent for all types observed by this author and the
format is as follows:
+00: A Byte that identifies the format of the Suffix part.
This is essentially based on several high-level categories
which the Suffix Parts support directly. The observed set
of values is as follows:
00h -> an un-typed entity;
01h -> an ARRAY type;
02h -> a RECORD type;
03h -> an OBJECT type;
04h -> a FILE type (other than TEXT);
05h -> a TEXT File type;
06h -> a SUBPROGRAM type;
07h -> a SET type;
08h -> a POINTER type;
09h -> a STRING type;
0Ah -> an 8087 Floating-Point type;
0Bh -> a REAL type;
0Ch -> a Fixed-Point ordinal type;
0Dh -> a BOOLEAN type;
0Eh -> a CHAR type;
0Fh -> an Enumerated ordinal type.
+01: A Byte used as a modifier. Since the above scheme is too
general for machine-dependent details such as storage
width and sign control, this modifier byte supplies
additional data as required. The author has identified
several cases in which this information is vital but has
not spent very much time on the subject. The chief areas
of importance seem to be in the 8087 Floating-Point types,
and the Fixed-Point ordinal types. The semantics seem to
be as follows:
0A 00 -> The type "SINGLE"
0A 02 -> The type "EXTENDED"
0A 04 -> The type "DOUBLE"
0A 06 -> The type "COMP"
0C 00 -> an un-named BYTE integer
0C 01 -> The type "SHORTINT"
0C 02 -> The type "BYTE"
0C 04 -> an un-named WORD integer
0C 05 -> The type "INTEGER"
0C 06 -> The type "WORD"
0C 0C -> an un-named double-word integer
0C 0D -> The type "LONGINT"
One important feature of the above semantics is the fact
that an un-typed CONST declaration refers to the above two
bytes to determine the storage space needed in the
dictionary for the data value of the constant. This can
be a little involved however as the constant may contain
its own length descriptor (as in the case of a character
string) in which case it may be sufficient to identify
the high-level type category without any modifier byte.
+02: A Word that contains the number of bytes of storage that
are required to contain an object/entity of this type.
For types that represent variable-length objects/entities
such as strings, this word may define the value returned
by the SIZEOF function as applied to the type.

SUFFIX PARTS

Suffix Parts further refine the implementation details of the type and
also provide subrange constraints where appropriate. In some cases
the Suffix part is empty since all semantic data for the type is
contained in the Prefix part.

UN-TYPED

This Suffix Part is empty. Nothing is known about an un-typed entity.
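
Since every Type Descriptor, including the un-typed one, begins with
the four-byte Prefix Part, a reader might decode that prefix along the
lines of the Python sketch below. The tables simply restate the values
listed earlier in this chapter. The function name parse_prefix and the
dictionary keys are hypothetical, and little-endian storage of the size
Word is assumed.

```python
import struct

# +00: suffix format codes, as observed in this report
SUFFIX_FORMATS = {
    0x00: "un-typed", 0x01: "ARRAY", 0x02: "RECORD", 0x03: "OBJECT",
    0x04: "FILE", 0x05: "TEXT", 0x06: "SUBPROGRAM", 0x07: "SET",
    0x08: "POINTER", 0x09: "STRING", 0x0A: "8087 floating-point",
    0x0B: "REAL", 0x0C: "fixed-point ordinal", 0x0D: "BOOLEAN",
    0x0E: "CHAR", 0x0F: "enumerated",
}

# (+00, +01) pairs whose meaning has been identified
MODIFIERS = {
    (0x0A, 0x00): "SINGLE", (0x0A, 0x02): "EXTENDED",
    (0x0A, 0x04): "DOUBLE", (0x0A, 0x06): "COMP",
    (0x0C, 0x00): "un-named BYTE integer", (0x0C, 0x01): "SHORTINT",
    (0x0C, 0x02): "BYTE", (0x0C, 0x04): "un-named WORD integer",
    (0x0C, 0x05): "INTEGER", (0x0C, 0x06): "WORD",
    (0x0C, 0x0C): "un-named double-word integer", (0x0C, 0x0D): "LONGINT",
}


def parse_prefix(data: bytes, offset: int = 0) -> dict:
    """Decode the 4-byte Prefix Part: format byte, modifier byte, size Word."""
    fmt, modifier, size = struct.unpack_from("<BBH", data, offset)
    return {
        "format": SUFFIX_FORMATS.get(fmt, "unknown (%02Xh)" % fmt),
        "meaning": MODIFIERS.get((fmt, modifier)),   # None when not tabulated
        "size": size,
    }
```

The "format" value then selects which of the Suffix Part layouts in the
following sections applies.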

STRUCTURED TYPES

The structured types represent aggregates of lower-level types. We
types in this category.

ARRAY TYPES

The Suffix Part of the ARRAY type is so constructed as to be able to
support recursive or nested definition of arrays. The suffix format
is as follows:
+00: An LG that locates the Type Descriptor for the "base-type"
of the array. This is the type of the entity being
arrayed and may itself be an array.
+04: An LG that locates the Type Descriptor for the array
bounds which is a constrained ordinal type or subrange.

RECORD TYPES

RECORD types have nested scopes. The Suffix part provides a base
structure by which to locate the fields local to the scope of the
Record type itself. The format is as follows:
+00: A Word containing an LL which locates the local Hash Table
that provides access to the fields in the nested scope.
+02: A Word containing an LL which locates the Dictionary
Header of the initial field in the nested scope. This
supports a "left-to-right" traversal of the fields in a
Record.

OBJECT TYPES

OBJECT types also have nested scopes. The Suffix part provides a base
structure by which to locate the fields and METHODS local to the scope
of the OBJECT type itself. In addition, inheritance and VMT
particulars are stored. The format is as follows:
+00: A Word containing an LL which locates the local Hash Table
that provides access to the fields and METHODS local to
the nested scope.
+02: A Word containing an LL which locates the Dictionary
Header of the initial field or METHOD in the nested scope.
This supports a "left-to-right" traversal of the fields
and METHODS.
+04: An LG which locates the Type Descriptor of the Parent
Object. This field is zero if there is no such Parent.
+08: A Word which contains the size in bytes of the VMT for
this Object. This field is zero if the object employs no
Virtual Methods.
+0A: A Word which contains the offset within the CONST DSeg Map
that locates the VMT skeleton or template segment. This
field equals FFFFh if the object employs no Virtual
Methods.
+0C: A Word which contains the offset within an Object instance
where the NEAR POINTER to the VMT for the object is stored
(within the DATA SEGMENT). This field equals FFFFh if the
object employs no Virtual Methods.
+0E: A Word which contains an LL which locates the Dictionary
Header for the name of the OBJECT itself.

FILE (NON-TEXT) TYPES

This Suffix consists of an LG that locates the Type Descriptor of the
base type of the file. Note that the Type Descriptor may be that of
an un-typed entity (for un-typed files).

TEXT FILE TYPES

This Suffix consists of an LG that locates the Type Descriptor of the
base type of the file -- in this case SYSTEM.CHAR.

SET TYPES

This Suffix consists of an LG that locates the base-type of the set
itself. Pascal limits such entities to simple ordinals whose
cardinality is limited to 256.

POINTER TYPES

This Suffix consists of an LG that locates the base-type of the entity
pointed at.

STRING TYPES

This is a special case of an ARRAY type. The format is as follows:
+00: An LG to the Type Descriptor SYSTEM.CHAR which is the base
type of all Turbo Pascal Strings.
+04: An LG to the Type Descriptor for the array bounds
constraints for the string.

FLOATING-POINT TYPES

The Suffix part for all Floating-Point types is EMPTY. All data
needed to specify these approximate number types is contained in the
Prefix part. The Types included in this class are SINGLE, DOUBLE,
EXTENDED and COMP.

ORDINAL TYPES

The Ordinal Types consist of the various "integer" types plus the
BOOLEAN, CHAR and Enumerated types.

"INTEGERS"

These types include BYTE, SHORTINT, WORD, INTEGER and LONGINT. Their
Suffix parts are identical in format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Type Descriptor of the largest
upward compatible type. This is the Type Descriptor that
is used to control the width of an un-typed constant in
the dictionary stub. For the "integer" types, this is an
LG that locates the Type Descriptor SYSTEM.LONGINT.

BOOLEANS

This type Suffix has the following format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Type Descriptor SYSTEM.BOOLEAN.
There is no "upward compatible" type.

CHARS

This type Suffix has the following format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Type Descriptor SYSTEM.CHAR. There
is no "upward compatible" type.
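
The "integer", BOOLEAN and CHAR Suffix Parts described above share one
twelve-byte layout, which might be read as in the Python sketch below.
Two assumptions are made that this report does not confirm: the bounds
are treated as signed 32-bit values (so that a negative lower bound
reads correctly), and the LG is kept as four raw bytes rather than
being interpreted. The name parse_ordinal_suffix is hypothetical.

```python
import struct
from typing import Tuple


def parse_ordinal_suffix(data: bytes, offset: int = 0) -> Tuple[int, int, bytes]:
    """Return (lower bound, upper bound, raw LG) of an ordinal Suffix Part.

    +00: double-word lower bound (assumed signed)
    +04: double-word upper bound (assumed signed)
    +08: LG of the upward compatible type, or of the type itself
    """
    lower, upper, lg = struct.unpack_from("<ll4s", data, offset)
    return lower, upper, lg


def cardinality(lower: int, upper: int) -> int:
    """Number of distinct values allowed by the subrange constraint."""
    return upper - lower + 1
```

For example, bounds of -128 and 127 give a cardinality of 256, which is
what one would expect for SHORTINT.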

ENUMERATIONS

This type Suffix is unusual and has the following format:
+00: A double-word containing the LOWER bound of the subrange
constraint on the type;
+04: A double-word containing the UPPER bound of the subrange
constraint on the type;
+08: An LG that locates the Prefix of the current Type
Descriptor. There is no upward compatible type.
What follows is a full-fledged SET Type Descriptor whose base type is
the Type Descriptor of the Enumerated Type itself. The author has not
yet discovered the reason for this.

SUBPROGRAM TYPES

The length of this Suffix is variable. The format is as follows:
+00: An LG that locates the Type Descriptor of the FUNCTION
result returned by the Subprogram. This field is zero if
the Subprogram is a PROCEDURE.
+04: A Word that contains the number of Formal Parameters in
the Function/Procedure header. If non-zero, then this
word is followed by the parameter list itself as a simple
array of parameter descriptors.
The format of a parameter descriptor is as follows:
+00: An LG that locates the Type Descriptor of the
corresponding parameter;
+04: A Byte that identifies the parameter passing
mechanism used for this entry as follows:
02h -> VALUE of parameter is passed on STACK,
06h -> ADDRESS of parameter is passed on STACK.
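
A variable-length Suffix of this kind might be read as in the Python
sketch below. It assumes that the parameter descriptors are packed with
no padding, so that each one occupies exactly five bytes (a four-byte
LG plus the mechanism Byte); this report does not state the descriptor
size explicitly. The LG values are kept as raw bytes, and the name
parse_subprogram_suffix is hypothetical.

```python
import struct
from typing import List, Tuple

PASSING = {0x02: "value", 0x06: "address"}   # mechanisms observed so far


def parse_subprogram_suffix(data: bytes, offset: int = 0
                            ) -> Tuple[bool, List[Tuple[bytes, str]]]:
    """Return (is_procedure, [(parameter type LG, passing mechanism), ...])."""
    result_lg, count = struct.unpack_from("<4sH", data, offset)
    offset += 6                                   # skip the LG and the count
    params = []
    for _ in range(count):
        type_lg, mode = struct.unpack_from("<4sB", data, offset)
        params.append((type_lg, PASSING.get(mode, "unknown (%02Xh)" % mode)))
        offset += 5                               # assumed descriptor size
    # A zero result LG marks a PROCEDURE rather than a FUNCTION.
    return result_lg == b"\x00\x00\x00\x00", params
```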
The "MAPS and LISTS" are not part of the symbol dictionary. Rather,
these structures provide access to the Code and Data Segments produced
by the compiler or included via the {$L name.OBJ} directive. The
format and purpose (as understood by this author) of each of these
tables is explained in the following sections.
The PROC Map provides a means of associating the various Function and
Procedure declarations with the Code Segments. There is some evidence
that the Compiler produces CODE (and DATA) Segments for EACH of the
Subprograms defined in the Unit as well as for the un-named Unit
Initialization code block. There is also evidence that EXTERNAL PROCs |
must be assembled separately in order to exploit fully the Turbo
"Smart Linker" since Turbo Pascal places some significant restrictions
on EXTERNAL routines in the area of Segment Names and Types.
Specifically, only code segments named "CODE" and data segments named
"DATA" will be used by the "Smart Linker" as sources of code and data
for inclusion in a Turbo Pascal .EXE file.
The first entry in the PROC Map is reserved for the Unit Initialization
block. If there is no Unit Initialization block, this entry will be |
filled with $FF. In addition, each and every PROC in the Unit has an |
entry in this table.
If an EXTERNAL routine is included, then ALL PUBLIC PROC definitions
in that routine must be declared in the Unit Source Code with the
EXTERNAL attribute.
The size of the PROC Map Table (in Bytes) is implied in the Unit
Header by the LL's that occur at offsets +0C and +0E.
The Format of a single PROC Map Entry is as follows:
+00: A Word that contains an offset within the CSeg Map. This
is used to locate the code segment containing the PROC.
+02: A Word that contains an offset within the CODE Segment
that defines the PROC entry point relative to the load
point of the referenced CODE Segment.
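
To make the entry layout concrete, the Python sketch below walks a PROC
Map that has already been extracted from the unit as a byte string. It
assumes little-endian Words and treats an entry filled with $FF as
"absent", following the remark above about a missing Unit
Initialization block. The name parse_proc_map is hypothetical.

```python
import struct
from typing import List, Optional, Tuple


def parse_proc_map(table: bytes) -> List[Optional[Tuple[int, int]]]:
    """Return one (CSeg Map offset, entry-point offset) pair per PROC.

    An entry filled with $FF is reported as None. The table length must be
    a multiple of four bytes, otherwise struct.iter_unpack raises an error.
    """
    entries = []
    for cseg_offset, entry_point in struct.iter_unpack("<HH", table):
        if cseg_offset == 0xFFFF and entry_point == 0xFFFF:
            entries.append(None)
        else:
            entries.append((cseg_offset, entry_point))
    return entries
```

The first element of the result corresponds to the Unit Initialization
block; the remaining elements correspond to the PROCs in the Unit.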
The CSeg Map provides a convenient descriptor table for each CODE
Segment present in the Unit and serves to relate these segments with
the Segment Relocation Data and the Segment Trace Table. It seems
reasonable to infer that the "Smart Linker" is able to include/exclude
code/data at the SEGMENT level only.
The CSeg Map is an array of fixed-length records whose format is as
follows:
+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes).
+04: A Word that contains the Length of the Relocation Data
Table for this Code Segment (in bytes).
+06: A Word that contains the offset of the Trace Table Entry
for this Segment (if it was compiled with DEBUG Support).
If there is no Trace Table for this segment, then this
Word contains FFFFh.
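
The fixed-length records of the CSeg Map might be read as in the Python
sketch below, which assumes little-endian Words and reports a missing
Trace Table entry (FFFFh) as None. The names CSegEntry and
parse_cseg_map are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional


class CSegEntry(NamedTuple):
    reserved: int                 # +00: apparently reserved for use by TURBO
    length: int                   # +02: Segment Length in bytes
    reloc_length: int             # +04: Relocation Data length in bytes
    trace_offset: Optional[int]   # +06: Trace Table offset, None if FFFFh


def parse_cseg_map(table: bytes) -> List[CSegEntry]:
    """Split the CSeg Map into its 8-byte records."""
    entries = []
    for reserved, length, reloc, trace in struct.iter_unpack("<4H", table):
        entries.append(
            CSegEntry(reserved, length, reloc, None if trace == 0xFFFF else trace))
    return entries
```

Since each record is eight bytes long, the "offset within the CSeg Map"
stored in a PROC Map entry can be divided by eight to index the list
returned here.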
The CONST DSeg Map provides a convenient descriptor table for each
DATA Segment present in the Unit which was spawned by the presence of
Typed Constants or VMT's in the Pascal Code. It serves to relate
these segments with the Segment Relocation Data and with the Code
Segments that refer to these DATA elements.
The CONST DSeg Map is an array of fixed-length records whose format is
as follows:
+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes).
+04: A Word that contains the Length of the Relocation Data
Table for this DATA Segment (in bytes).
+06: A Word that contains an LL which locates the OBJECT that
owns this VMT skeleton or zero if the segment is not a VMT
skeleton.
It is possible to determine the containing scope for a Typed Constant
declaration but -- unless it is for a VMT -- the job is a bit tedious.
Essentially, one has to search the Symbol Dictionary for a declaration
whose offset points to a given entry and the complete path to that
symbol must be recorded. Our program doesn't do this but it can be
done if the required dictionary entries are present.
The VAR DSeg Map provides a convenient descriptor table for each DATA
Segment present in the Unit.
One entry exists for each CODE segment which refers to GLOBAL VAR's
allocated in the DATA Segment. These references may be seen in the
Relocation Data Table. Each EXTERNAL CSeg having a segment named DATA
also spawns an entry in this table. Only the Code Segments that meet
these criteria cause entries to be generated in the VAR Dseg Map.
The VAR DSeg Map is an array of fixed-length records whose format is
as follows:
+00: A Word apparently reserved for use by TURBO.
+02: A Word that contains the Segment Length (in bytes). This
may be zero, especially if the EXTERNAL routine contains a
DATA segment whose sole purpose is to declare one or more
EXTRN symbols that are defined in some DATA segment
external to the Assembly.
+04: A Word apparently reserved for use by TURBO.
+06: A Word apparently reserved for use by TURBO.
To determine the identity of the CSeg that owns some particular entry
in this table, examine the Relocation Data for ALL CSegs. Each CSeg
which makes reference to a DATA segment has an entry in this table.
This list contains an entry for each Unit (taken from the "USES" list)
which MAY contribute either CODE or DATA to the executable file. Not
all units do make such a contribution as some exist merely to define a
collection of Types, etc. A Unit gets into this list if there exists
at least one Relocation Data Entry that references CODE or DATA in that
Unit.

The list consists of elements whose SIZE is variable and whose
format is as follows:
+00: A WORD apparently reserved for use by TURBO.
+02: A variable-length String containing the unit name.
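
Because the entries vary in size, the list has to be walked from the
front. The Python sketch below does so on the assumption that the unit
name is stored as an ordinary Turbo Pascal string, that is, a length
byte followed by that many characters. The starting offset of each
entry is recorded as well, since other tables refer to entries in this
list by offset. The name parse_donor_units is hypothetical.

```python
import struct
from typing import List, Tuple


def parse_donor_units(table: bytes) -> List[Tuple[int, int, str]]:
    """Return (entry offset, reserved Word, unit name) for each list element."""
    units, pos = [], 0
    while pos < len(table):
        (reserved,) = struct.unpack_from("<H", table, pos)
        name_len = table[pos + 2]                  # assumed length byte
        name = table[pos + 3:pos + 3 + name_len].decode("ascii")
        units.append((pos, reserved, name))
        pos += 3 + name_len                        # Word + length byte + name
    return units
```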
This list contains an entry for each "source" file used to compile the
Unit. This includes the Primary Pascal file, files containing Pascal
code included by means of the {$I filename.xxx} compiler directive,
and .OBJ files included by the {$L filename.OBJ} compiler directive.
The order of entries in this list is critical since it maps the CODE
segments stored in the unit. The order of the entries is as follows:
1) The Primary Pascal file;
2) All Included Pascal files;
3) All Included .OBJ files.
Mapping of CSegs to files is done as follows:
a) Each .OBJ file contributes a SINGLE Code Segment (if any).
Note that this author has not observed an .OBJ module that
contains only a DATA Segment (but that seems a distinct
possibility).
b) The Primary Pascal file (augmented by all included Pascal
Files) contributes zero or more CODE Segments.
Therefore, there are at least as many CSeg entries as .OBJ files. If
more, then the excess entries (those at the front of the list) belong
to the Pascal files that make up the Pascal source for the unit.
The format of an entry in this list is as follows:
+00: A flag byte that indicates the type of file represented;
04h -> the Primary Pascal Source File,
03h -> an Included Pascal Source File,
05h -> an .OBJ file that contains a CODE segment.
+01: A Word apparently reserved for use by the Compiler/Linker.
+03: A Word that is zero for .OBJ files and which contains the
file directory time-stamp for Pascal Files.
+05: A Word that is zero for .OBJ files and which contains the
file directory date-stamp for Pascal Files.
+07: A variable-sized string containing the filename and
extension of the file used during compilation.
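
One entry of this list might be decoded as in the Python sketch below.
Two assumptions are made: the filename is a length-prefixed Turbo
Pascal string, and the time and date Words use the usual packed DOS
directory format (hours, minutes and two-second units in the time Word;
years since 1980, month and day in the date Word). The names
decode_dos_stamp and parse_source_entry are hypothetical.

```python
import struct

FILE_KINDS = {
    0x04: "primary Pascal source",
    0x03: "included Pascal source",
    0x05: ".OBJ file",
}


def decode_dos_stamp(date_word: int, time_word: int) -> tuple:
    """Unpack DOS directory date/time Words (assumed format) into a tuple."""
    year = 1980 + (date_word >> 9)
    month = (date_word >> 5) & 0x0F
    day = date_word & 0x1F
    hour = time_word >> 11
    minute = (time_word >> 5) & 0x3F
    second = (time_word & 0x1F) * 2        # stored in 2-second units
    return (year, month, day, hour, minute, second)


def parse_source_entry(data: bytes, offset: int = 0) -> tuple:
    """Return (kind, filename, time-stamp or None, offset of next entry)."""
    flag, _reserved, time_word, date_word, name_len = struct.unpack_from(
        "<BHHHB", data, offset)
    name = data[offset + 8:offset + 8 + name_len].decode("ascii")
    # The stamp Words are zero for .OBJ files, so no stamp is reported.
    stamp = None if flag == 0x05 else decode_dos_stamp(date_word, time_word)
    kind = FILE_KINDS.get(flag, "unknown (%02Xh)" % flag)
    return kind, name, stamp, offset + 8 + name_len
```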
If Debug support was selected at compile time, then all Pascal code
which supports Debugging produces an entry in this table. The table
entries themselves are variable in size and have the following format:
+00: A Word which contains an LL that locates the Dictionary
Header of the Symbol (a PROC name) this entry represents.
+02: A Word which contains the offset (within the Source File
List) of the entry that names the file that generated the
CSeg being traced. This allows the file included by means
of the {$I filename} directive to be identified for DEBUG
purposes, as well as code produced from the Primary File.
+04: A Word containing the number of bytes of data that precede
the BEGIN statement code in the segment. For Pascal PROCS
these bytes consist of literal constants, un-typed |
constants, and other data such as range-checking limits. |
+06: A Word containing the Line Number of the BEGIN statement
for the PROC.
+08: A Word containing the number of lines of Source Code to
Trace in this Segment.
+0A: An array of bytes whose size is at least the number of
source code lines in the PROC. Each byte contains the
number of bytes of object code in the corresponding source
line. This appears to be an array of SHORTINT since if a
"line" contains more than 127 bytes, then a single byte of
$80 precedes the actual byte count as a sort of "escape"
and the next byte records up to 255 bytes for the |
line. This situation has not yet been fully explored. We |
do not yet know what happens in the event a line is |
credited with spawning more than 255 bytes of code. |
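
The byte array at +0A, including the $80 "escape", might be decoded as
in the Python sketch below. The second function turns the per-line byte
counts into code offsets; it assumes that the first array element
belongs to the line of the BEGIN statement and that code starts
immediately after the preamble counted at +04. Neither assumption has
been verified beyond the description above, and both function names
are hypothetical.

```python
from typing import List, Tuple


def decode_line_sizes(raw: bytes, line_count: int) -> Tuple[List[int], int]:
    """Return (bytes of code per source line, number of array bytes used)."""
    sizes, pos = [], 0
    while len(sizes) < line_count:
        value = raw[pos]
        pos += 1
        if value == 0x80:          # escape: the real count is in the next byte
            value = raw[pos]
            pos += 1
        sizes.append(value)
    return sizes, pos


def line_offsets(sizes: List[int], preamble: int, first_line: int
                 ) -> List[Tuple[int, int]]:
    """Pair each source line number with the segment offset of its code."""
    table, offset = [], preamble
    for index, size in enumerate(sizes):
        table.append((first_line + index, offset))
        offset += size
    return table
```

The second value returned by decode_line_sizes shows why the array can
be longer than the line count: every escaped line consumes two bytes.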
This area begins at the start of the next free PARAGRAPH. This means
that its offset from the beginning of the Unit ALWAYS ends in the
digit zero.
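
A paragraph is sixteen bytes, so "the next free PARAGRAPH" amounts to
rounding an offset up to a multiple of sixteen. A one-line Python
illustration follows; the name next_paragraph is hypothetical.

```python
def next_paragraph(offset: int) -> int:
    """Round `offset` up to the next 16-byte (paragraph) boundary."""
    return (offset + 15) & ~15
```

An offset that is already aligned is returned unchanged, and the result
always ends in the hexadecimal digit zero.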
This area contains the CODE segments, CONST DATA segments, and the
Relocation Data required for linking.
Each CODE segment included in the unit appears here as specified by
the CSeg Map Table. Depending on usage, these segments may appear in
the executable file. There are no filler bytes between segments.
This section begins at the start of the first free PARAGRAPH following
the end of the Object CSegs. This means that its offset from the
beginning of the Unit ALWAYS ends in the digit zero.
A DATA segment fragment appears here for each CSeg that declares a
typed constant, and for each OBJECT which employs Virtual Methods.
There are no filler bytes between segments.
If local symbols were generated, there is always enough information to
allow documenting the scope of the declaration as well as interpreting
the data in the display since the needed type declarations would also
be available. Our program doesn't go to this extreme however.
This table begins at the start of the first free PARAGRAPH following
the end of the CONST DSegs. This means that its offset from the
beginning of the Unit ALWAYS ends in the digit zero. There are two |
sections in this table: one for code, and one for data. Both |
sections are aligned on paragraph boundaries. This may result in a |
"slack" entry between the code and data sub-sections, but this entry |
is included in the byte tally for the section stored in the Unit |
Header Table at ULPtch (offset +20). |
The table begins with entries for the CSeg Map and ends with entries
for the CONST DSeg Map. The appropriate Map entry specifies the
number of bytes of Relocation Data for the corresponding segment.
This number may be zero in which case there is no Relocation Data for
the given segment. |
The Table consists of an array of eight (8) byte entries whose format
is as follows:
+00: A Byte containing the offset within the Donor Unit List of
the Unit name that this entry refers to. This can be the
compiled Unit or some previously compiled external unit.
+01: A Byte that defines the type of reference being made and
implies the size of the pointer needed (WORD or DWORD).
The known and/or observed values are as follows:
00h -> a WORD refers to a PROC Map.
10h -> a WORD refers to a PROC Map.
20h -> a WORD refers to a PROC Map.
30h -> a DWORD pointer refers to a PROC Map.
50h -> a WORD refers to a CSeg Map.
60h -> a WORD refers to an unknown Map.
70h -> a DWORD pointer refers to a CSeg Map.
90h -> a WORD refers to a VAR DSeg Map.
A0h -> a WORD refers to a DSeg Map for SEG address. |
D0h -> a WORD refers to a CONST DSeg Map.
+02: A Word containing the offset within the Map table
referenced according to the above code scheme.
+04: A Word containing an offset within the target segment
which will be added to the effective address. For
example, a reference to the VAR DSeg Map will require a
final offset to locate the item (variable) within the DATA
SEGMENT being referenced here. This may also be needed
for references to LITERAL DATA embedded in a CODE SEGMENT.
+06: A Word containing the offset within the CODE or DATA
segment owning this entry that contains the area to be |
patched with the value of the final effective address. |
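
The eight-byte entries described above might be unpacked as in the
Python sketch below, which assumes little-endian Words and takes the
Relocation Data for one segment as a byte string. The names RelocEntry
and parse_relocations are hypothetical.

```python
import struct
from typing import List, NamedTuple


class RelocEntry(NamedTuple):
    unit_offset: int    # +00: offset within the Donor Unit List
    ref_type: int       # +01: type of reference (00h, 10h, ... D0h)
    map_offset: int     # +02: offset within the referenced Map table
    adjust: int         # +04: offset added to the effective address
    patch_offset: int   # +06: offset of the area to be patched


def parse_relocations(table: bytes) -> List[RelocEntry]:
    """Split one segment's Relocation Data into 8-byte entries."""
    return [RelocEntry(*fields)
            for fields in struct.iter_unpack("<BBHHH", table)]
```

The length of the byte string passed in would come from the +04 Word of
the corresponding CSeg Map or CONST DSeg Map entry.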
For some truly wild guessing about the flag byte above, the following |
pattern seems to be emerging. Look at bits 7-4 of this byte. It |
appears that the type of Map reference may be coded into bits 7-6 and |
that the size or type of reference may be coded into bits 5-4. Note |
that bits 7-6 are "00" for PROC Map items, "01" for CSeg Map items, |
"10" for Global DSeg Map items, and "11" for Const DSeg Map items. |
Note also that all FAR (DWORD) pointer references show bits 5-4 as |
"11", that a SEGMENT Register value appears as "10", and that WORD |
values otherwise appear as "01" or "00". Further, no type 00h item |
has been seen which has a non-zero effective address adjustment. |
This all seems to suggest the following code structure: |
7654 3210 (bits 3-0 don't seem to be used) |
00-- ---- Locate item via a PROC Map, |
01-- ---- Locate item via a CSeg Map, |
10-- ---- Locate item via a Global DSeg Map, |
11-- ---- Locate item via a Const DSeg Map, |
--00 ---- WORD offset has NO effective address adjustment, |
--01 ---- WORD offset HAS an effective address adjustment, |
--10 ---- WORD is content of a SEGMENT Register such as DS |
or CS. |
--11 ---- DWORD (FAR) pointer is supplied with possible |
effective address adjustment. |
The evidence in support of this conjecture is both slim and vast. It |
all depends on how much data one looks at. I have looked at a lot of |
data from the Borland supplied units and I haven't found anything to |
refute the above. Accordingly, the supplied program interprets this |
flag byte according to this scheme. |
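
The conjectured bit layout can be expressed in a few lines of Python.
This is only a restatement of the guess above, not a confirmed format,
and the name decode_reference_flag is hypothetical.

```python
from typing import Tuple

MAPS = ("PROC Map", "CSeg Map", "Global DSeg Map", "Const DSeg Map")
KINDS = (
    "WORD offset, no effective address adjustment",
    "WORD offset with effective address adjustment",
    "WORD is content of a SEGMENT Register",
    "DWORD (FAR) pointer",
)


def decode_reference_flag(flag: int) -> Tuple[str, str]:
    """Split the flag byte into (Map selected by bits 7-6, kind from bits 5-4)."""
    return MAPS[(flag >> 6) & 0x03], KINDS[(flag >> 4) & 0x03]
```

Applied to the observed values, 30h decodes as a DWORD pointer via the
PROC Map and D0h as a WORD offset with adjustment via the Const DSeg
Map, in agreement with the table of known values.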
In order that the above information be made constructively useful, the
author has designed a program that automates the process of discovery.
It is not a "handsome" program and it is not a work of art. It does
give useful results provided your PC has enough available memory.
It should be obvious that the program was not designed "top-down".
Rather, it just evolved as each new discovery was made. Later on, it
seemed reasonable to try to document some of the relations between the
various lists and tables and the program tries to make some of these
relations clear, albeit with varying degrees of success.
7.1 TPUNEW |
This is the main program. It will ask for the name of the unit to be
documented. Reply with the unit name only. The program will append
the ".TPU" extension and will search for the proper file.
The program will then ask if Dis-Assembly is desired and will require
a "y" or "n" answer.
The current directory will be searched first, followed by all
directories in the current PATH. The program will NOT search a ".TPL"
(Turbo Pascal Library) file.
If the desired unit is found, the program will write a report to the
current directory named "unitname.lst" which contains its analysis.
The format of the report is such that it may be copied to a printer if
that printer supports TTY control codes with form-feeds. Be judicious
in doing this however since there can be a lot of information. The
Turbo SYSTEM.TPU unit file produces almost ninety (90) pages without |
the disassembly option. When disassembly is requested for the SYSTEM |
unit, the size of the output file exceeds 700K bytes. |
This is a Unit that contains the text-file output primitives required
by the main program. It's not very pretty but it does work.
This Unit contains all Type Definitions, Structures, and "Canned"
Functions and Procedures required by the main program. All structures
documented in this report are also documented in TPUAMS1 by means of
the TYPE mechanism. Some of the structures are difficult if not
impossible to handle using ISO Pascal but Turbo Pascal provides the
means for getting the job done.
This unit is a rudimentary disassembler. The output will not assemble
and may look strange to a real assembler programmer since this author
is not so-qualified. However, the basis for support of 80286, 80386
etc. processors is present as well as coprocessor support. Of perhaps
the greatest interest is that it does appear to decode the emulated
coprocessor instructions that are implemented via INT 34-3D.
Be warned however. The output is not guaranteed since this was coded
by myself and I am perhaps the rankest amateur that ever approached
this quite awful assembler language. For convenience, the operand
coding mimics TASM "Ideal" mode.
As is usual with programs of this type, error-recovery is minimal and
no context checking is performed. If the operation code is found to
be valid, then a valid instruction is assumed -- even if invalid
operands are present.
The only positives that apply to this program are that it doesn't slow
the cpu down (although a lot more output is produced), and it does let
one "tune" code for compactness by letting one view the results of the
coding directly. Also, incomplete instructions are handled as data |
rather than overrunning into the next proc. |
It was intended from the beginning that this program should be able to
be enhanced to permit external units to be referenced during the
analysis of any given unit, even if they were library components. The
author hopes that users so-inclined will find the code pliable enough
to engineer such enhancements. No small amount of care was expended
to make pointer references flexible enough so that more than one unit
could be addressed at one time. However, none of the references to
external units are resolved by the program as it now stands.
This program was NOT intended as a pilot for some future product. It |
WAS intended as a rather "ersatz" tool for myself. |
The following sections discuss a few of the methods employed by the
supplied program.
Printing the unit dictionary area in a way that exposes its underlying |
semantics is no small task. The unit dictionary area itself is a |
rather amorphous-looking mass of data composed of hash tables, |
dictionary headers and stubs, type descriptors, etc. In order to |
present all this information in a meaningful way, we have to reveal |
its structure and this cannot be done by means of a sequential |
"browse" technique. Rather, we have to visit all nodes in the |
dictionary area so that each may be formatted in a way that exposes |
its function and meaning. This is made necessary by the fact that |
items are added to the dictionary as encountered and no convenient |
ordering of entry types exists. What we have here is the problem of |
finding a minimal "cover" for the dictionary area that properly |
exposes the content and structure of the dictionary area. |
To do this, we construct (in the heap) a stack and a queue, both of |
which are initially empty. The entries we put in the stack identify |
the class of entry (Hash Table, Dictionary Header, Type Descriptor or |
In-Line Code group), the location of the structure, and the location |
of its immediate "owner" or "parent" dictionary entry (which allows |
some limited information about scope to be printed). |
To the empty stack, we add an entry for the unit name dictionary |
entry, the INTERFACE hash table, and the DEBUG hash table. All these |
are located via direct pointers (LL's) in the Unit Header Table. We |
then pop one entry off the stack and begin our analysis. |
a) If the entry we popped off the stack is not present in the |
queue, we add it and call a routine that can interpret the entry |
(aka, "cover") for a Dictionary Header, Hash Table, or Type |
Descriptor. (This may lead to additional entries being added to |
the stack such as nested-scope hash tables, Dictionary Headers, |
Type Descriptors or In-Line Code group entries.) |
b) While the stack is not empty, we pop another entry and repeat |
step "a" (above) until no more entries are available. |
The result is a queue containing one entry for each structure in the |
unit dictionary area that is identifiable via traversal. (In |
practice, the method we use is similar to a "breadth-first" traversal |
of an n-way tree that is implemented in non-recursive fashion.) Each |
entry in the queue contains the information described above and the |
queue itself thus forms a set of descriptors that drive the process of |
formatting the dictionary area for display. The process may be |
likened to "painting by the numbers" or to finding a way to lay tile |
on a flat surface using tiles of four different irregular shapes until |
the floor is exactly covered. |
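The stack-and-queue procedure described above can be sketched in a few
lines. This is only an illustration under stated assumptions: it is
written in Python rather than Turbo Pascal, the Entry record and the
interpret callback are hypothetical stand-ins for the real descriptors
and formatting routines, and a set is used for the "already in the
queue" test purely for convenience.

```python
from collections import namedtuple

# Hypothetical descriptor: class of entry, its location, and its owner.
Entry = namedtuple("Entry", "kind offset owner")

def build_cover(roots, interpret):
    """Return the queue of entries reachable from `roots`.

    `interpret(entry)` stands in for the routine that formats one
    structure and returns any further entries it refers to.
    """
    stack = list(roots)
    queue = []
    seen = set()          # (kind, offset) pairs already in the queue
    while stack:
        entry = stack.pop()
        key = (entry.kind, entry.offset)
        if key in seen:
            continue      # already covered; do not interpret it twice
        seen.add(key)
        queue.append(entry)
        stack.extend(interpret(entry))
    return queue

# Toy dictionary area: offsets and links are invented for illustration.
links = {
    ("hash", 0x40): [Entry("header", 0x80, 0x40),
                     Entry("header", 0xA0, 0x40)],
    ("header", 0x80): [Entry("type", 0xC0, 0x80)],
    ("header", 0xA0): [Entry("type", 0xC0, 0xA0)],   # shared descriptor
    ("type", 0xC0): [],
}
cover = build_cover([Entry("hash", 0x40, None)],
                    lambda e: links[(e.kind, e.offset)])
```

In this toy case the shared type descriptor is recorded with whichever
header happened to reach it first, so the "owner" it carries depends on
the order of traversal.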
There is one significant limitation that needs to be pointed out. It |
is not always possible to determine the "parent" or "owner" of a node |
with certainty. The following discussion illustrates the problem of |
finding the "real" parent of a Type Descriptor. |
Almost every "type" in Pascal is actually derived from the basic types |
that are (in Turbo Pascal) defined in the SYSTEM.TPU unit -- e.g. |
"INTEGER", "BYTE", etc. In addition, several of the Type Descriptors |
in the SYSTEM unit are referenced by more than one Dictionary Entry. |
Thus, we find that a "many-to-one" relationship may exist between |
Dictionary Entries and Type Descriptors. How does one find out which |
is the entry that actually gave rise to the Type Descriptor? |
The Dictionary Area of a unit has some special properties, one of |
which is the fact that the Dictionary Entries for named Types are |
often located quite near their primary type descriptors. The |
Dictionary Area seems to be treated as an upward growing heap with the |
various structures being added by Turbo as needed. This makes it |
likely that the Type "Q" header which gives rise to a type descriptor |
occurs earlier in the Dictionary Area than any other header which |
refers to the same descriptor. We take advantage of this |
property to allocate "ownership" but it may not be "fool-proof". Some |
type descriptors are spawned by other type descriptors, especially for |
structured types. We don't attempt to allocate "ownership" to these |
"lower-level" descriptors. |
To start with, I apologize up front for mistakes which are bound to be |
present in this routine. I am not a MASM or TASM programmer and I |
will not pretend otherwise. This being the case, the formatting I |
have chosen for the operands may be erroneous or misleading and might |
(if submitted to one of the "real" assemblers) produce object code |
quite different from what is expected. I hope not, but I have to |
admit it's possible. |
My intention in adding this unit was to make tuning of object code |
possible. With practice and some effort, one can observe |
the effect on the object module caused by specific Pascal coding. |
Thus, where compactness is an issue of paramount importance, TPUUNA1 |
can be of help. In some cases, a simple re-arrangement of the local |
variable declarations in a procedure can have a significant effect on |
the size of the code if it means the difference between 1 and 2-byte |
displacements for each instruction that references a specific local |
variable. Potential applications along these lines seem almost |
unlimited. |
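The point about 1 and 2-byte displacements can be made concrete with a
little arithmetic. On the 8086 a BP-relative operand takes a 1-byte
displacement when the offset fits in a signed byte (-128..127) and a
2-byte displacement otherwise. The sketch below (Python, for
illustration only) assumes that locals are allocated downward from BP
in declaration order; that layout rule and the sample sizes are
assumptions made for the example, not a documented rule of the
compiler.

```python
def displacement_bytes(bp_offset):
    """Bytes of displacement needed for a [BP+offset] operand."""
    return 1 if -128 <= bp_offset <= 127 else 2

def total_displacement_bytes(local_sizes, reference_counts):
    """Total displacement bytes for locals laid out in the given order.

    Each local is assumed to sit immediately below the previous one,
    starting just under BP (an illustrative layout only).
    """
    total, offset = 0, 0
    for size, refs in zip(local_sizes, reference_counts):
        offset -= size
        total += refs * displacement_bytes(offset)
    return total

# A 200-byte buffer referenced once and a 2-byte counter referenced
# ten times, declared in the two possible orders.
buffer_first = total_displacement_bytes([200, 2], [1, 10])
counter_first = total_displacement_bytes([2, 200], [10, 1])
```

Under these assumptions, declaring the heavily used counter first
costs 12 displacement bytes against 22 for the other order.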
I adopted an operand format not unlike that of TASM "Ideal" mode since |
it was more convenient to do so and looked more readable to me. I |
relied on several reference books for guidance in decoding the entire |
mess and I found that there were several flaws (read ERRORS) in some |
of them which made the job that much more difficult. I then |
compounded my problems by attempting to handle 80286 and 80386 |
specific code even though Turbo Pascal does not generate code specific |
to these processors. I simply felt that the effort involved in |
writing any sort of Dis-Assembly program for Turbo Pascal units was an |
effort best experienced not more than once. With all this self- |
flagellation out of my system once and for all, I will try to show the |
basic strategy of the program and to explain the limitations and some |
of the discoveries I made. |
The routine is intended to be idiotically simple - i.e., no smarter |
than the DEBUG command in principle. The basic idea is: pass some |
text to the routine and get back ONE line derived from some prefix of |
that text. Repeat as necessary until all text is gone. Thus, there |
is no attempt to check the context of the text being processed. Also, |
some configurations of the "modR/M" byte may be invalid for selected |
instructions. I don't try to screen these out since the intent was to |
look at the presumably correct code produced by TURBO Pascal -- not |
devious assembly language. Also, this program regards WAIT operations |
as "stand-alone" -- i.e., it doesn't check to see if a coprocessor |
operation follows for which the WAIT might be regarded as a prefix. |
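The "one line per call" idea can be sketched as a simple driver loop.
This is a Python illustration, not the TPUUNA code: decode_one is a
hypothetical stand-in for the routine, and the toy decoder knows only
three instructions so that the loop has something to call.

```python
def disassemble(text, decode_one):
    """Yield one listing line per call to `decode_one` until text is gone.

    `decode_one(text)` is a hypothetical stand-in for the routine: it
    returns (line, bytes_consumed) for some prefix of `text`.
    """
    position = 0
    while position < len(text):
        line, used = decode_one(text[position:])
        yield line
        position += used

def toy_decode(text):
    """Toy decoder for NOP, RET and INT n; anything else is rejected."""
    if text[0] == 0x90:
        return "NOP", 1
    if text[0] == 0xC3:
        return "RET", 1
    if text[0] == 0xCD and len(text) >= 2:
        return "INT %02Xh" % text[1], 2
    raise ValueError("byte %02X not known to the toy decoder" % text[0])

listing = list(disassemble(bytes([0x90, 0xCD, 0x21, 0xC3]), toy_decode))
```

Note that the loop keeps no state between calls other than the
position in the text, which is why no context checking is possible.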
One area of real difficulty was figuring out the Floating-Point |
emulations used by Turbo Pascal that are implemented by means of |
interrupts $34 through $3D. I don't know if I got it right, but the |
results seem reasonable and consistent. In the listing, the Interrupt |
is produced on one line, followed by its parameters on the next line. |
The parameter line is given the op-code "EMU_xxxx" where "xxxx" is the |
coprocessor op-code I felt was being emulated. Interrupt $3C was a |
real puzzler but after seeing a lot of code in context, I think that |
the segment override is communicated to the emulator by means of the |
first byte after the $3C. |
Normally, in a non-emulator environment, all coprocessor operations |
(ignoring any WAIT prefixes) begin with $D8-$DF. What Borland (and |
maybe Microsoft) seem to have done here is to change the $D8-$DF so |
that bits 7 and 6 of this byte are replaced with the one's complement |
of the 2-bit segment register number found in various 8086 |
instructions. This seems to be how an override for the DS register is |
passed to the emulator. I don't KNOW this to be the correct |
interpretation, but the code I have examined in context seems to work |
under this scheme, so TPUUNA uses it to interpret the operand. |
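The interpretation just described amounts to a small bit
manipulation, sketched here in Python. It follows the reading given
in the text and shares its uncertainty; the function name and the
validity check on the low bits are assumptions added for the example.

```python
SEGMENT_NAMES = ("ES", "CS", "SS", "DS")   # 8086 segment numbers 0-3

def decode_emulated_escape(byte):
    """Split the byte after INT $3C into (segment name, ESC opcode).

    Bits 7-6 are taken to hold the one's complement of the segment
    register number; the low six bits are those of the original
    $D8-$DF opcode.
    """
    if (byte & 0x38) != 0x18:
        raise ValueError("low bits do not match a $D8-$DF opcode")
    segment_number = ~(byte >> 6) & 0b11
    return SEGMENT_NAMES[segment_number], 0xC0 | (byte & 0x3F)
```

For instance, a byte of $19 would be read as opcode $D9 with a DS
override, and $9B as opcode $DB with a CS override.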
For 80x86 machines, the problem was somewhat simpler. TPUUNA takes a |
quick look at the first byte of the text. Almost any byte is valid as |
the initial byte of an instruction, but some instructions require more |
than one byte to hold the complete operation code. Thus, step 1 |
classifies bytes in several ways that lead to efficient recognition of |
valid operation codes. |
Once the instruction has been identified in this way, it is more or |
less easy to link to supplemental information that provides operand |
editing guidance, etc. |
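A first-byte classifier of the general kind described might look like
the sketch below (Python, illustration only). The category names and
the choice of classes are assumptions made for this example and are
not taken from the TPUUNA tables; the group list is deliberately not
exhaustive.

```python
SEGMENT_PREFIXES = {0x26, 0x2E, 0x36, 0x3E}     # ES:, CS:, SS:, DS:
OTHER_PREFIXES = {0xF0, 0xF2, 0xF3}             # LOCK, REPNE, REP
# Opcodes whose operation is completed by the reg field of modR/M.
GROUP_OPCODES = {0x80, 0x81, 0x82, 0x83, 0xD0, 0xD1, 0xD2, 0xD3,
                 0xF6, 0xF7, 0xFE, 0xFF}

def classify_first_byte(byte):
    """Return a coarse class for the first byte of an instruction."""
    if byte in SEGMENT_PREFIXES or byte in OTHER_PREFIXES:
        return "prefix"
    if byte == 0x0F:
        # Two-byte opcode escape on the 80286 and later
        # (the same byte is POP CS on the 8086).
        return "two-byte escape"
    if 0xD8 <= byte <= 0xDF:
        return "coprocessor escape"
    if byte in GROUP_OPCODES:
        return "group"
    return "single byte"
```

Each class would then select a different follow-up table: a second
opcode byte, the reg field of the modR/M byte, or nothing further.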
The tables that embody the recognition scheme were constructed using |
PARADOX 3.0 (another fine Borland product) and suitably coded queries |
were used to generate the actual Turbo Pascal code for compilation. |
For those that are interested, TPUUNA supports the address-size and |
operand-size prefixes of the 80386 as well as 32-bit operands and |
addresses but remember that Turbo Pascal doesn't generate these. A |
trivial change, for which provision is made, allows segments which |
default to 32-bit mode to be handled as well. |
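The effect of the two 80386 size prefixes is easy to state in code:
each one switches the corresponding size away from the segment
default. The following Python sketch is an illustration only; the
function and parameter names are assumptions and do not come from
TPUUNA.

```python
OPERAND_SIZE_PREFIX = 0x66
ADDRESS_SIZE_PREFIX = 0x67

def effective_sizes(prefixes, default_bits=16):
    """Return (operand_bits, address_bits) after applying size prefixes.

    `default_bits` is the default for the segment: 16 for the code
    Turbo Pascal produces, 32 for a segment that defaults to 32-bit
    mode.
    """
    other = 32 if default_bits == 16 else 16
    operand = other if OPERAND_SIZE_PREFIX in prefixes else default_bits
    address = other if ADDRESS_SIZE_PREFIX in prefixes else default_bits
    return operand, address
```

Handling a 32-bit default segment then comes down to passing a
different default, which is in keeping with the "trivial change"
mentioned above.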
There is a simple mode variable that gets passed to TPUUNA by its |
caller which specifies the most-capable processor whose code is to be |
handled. Codes are provided for the 8086 (8088 is the same), 80186 |
(same as 80286 except no protected mode instructions), 80286 (80186 |
plus protected mode operation), and 80386. |
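One way to picture the mode variable is as an ordered set of processor
levels, with each instruction tagged by the earliest processor that
supports it. The Python sketch below is an assumption about how such a
check could be arranged, not a description of the actual TPUUNA
tables; the four mnemonics are merely examples.

```python
from enum import IntEnum

class Cpu(IntEnum):
    I8086 = 0      # also covers the 8088
    I80186 = 1
    I80286 = 2
    I80386 = 3

# Earliest processor for a few mnemonics; a tiny illustrative table.
MIN_CPU = {
    "MOV": Cpu.I8086,
    "PUSHA": Cpu.I80186,
    "LGDT": Cpu.I80286,    # protected-mode instruction
    "MOVZX": Cpu.I80386,
}

def is_supported(mnemonic, mode):
    """True if `mnemonic` exists on the most-capable processor `mode`."""
    return MIN_CPU[mnemonic] <= mode
```

With the mode set to the 80186, for example, PUSHA would be accepted
while LGDT and MOVZX would be rejected.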
No such specifier is provided for coprocessor support. What is there |
is what I think an 80387 supports. I don't think that this is really |
a problem if you don't try to use TPUUNA for anything but Turbo Pascal |
code. |
Error recovery is predictably simple. The initial text byte is output |
as the operand of a DB pseudo-op and provision is made to resume work |
at the next byte of text. |
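The recovery rule can be sketched as a thin wrapper around the
decoder. This is a Python illustration under assumptions: decode_one
is a hypothetical decoder that raises ValueError when it cannot
recognize the text, and the spacing and hex notation of the DB line
are invented for the example.

```python
def decode_or_db(text, decode_one):
    """Decode one instruction, or emit a DB line for the first byte.

    `decode_one` is a hypothetical decoder that returns
    (line, bytes_consumed) or raises ValueError when it cannot
    recognize the text.
    """
    try:
        return decode_one(text)
    except ValueError:
        # Consume exactly one byte so work resumes at the next byte.
        return "DB 0%02Xh" % text[0], 1

def strict_decode(text):
    """Toy decoder that recognizes NOP and nothing else."""
    if text[0] == 0x90:
        return "NOP", 1
    raise ValueError("unrecognized byte")

recovered = decode_or_db(bytes([0xFF, 0x90]), strict_decode)
```

Because only one byte is consumed on failure, a run of unrecognized
text simply produces one DB line per byte until decoding succeeds
again.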
I hope this program is found to be useful in spite of the errors it |
must surely contain. I have yet to make much sense of the rules for |
MASM or TASM operand coding and I found very little of value in many |
of the so-called "texts" on the subject. I found myself in the |
position of that legendary American watching a Cricket match in |
England for the first time ("You mean it has RULES?"). |
This author has examined .TPL files in passing and concludes that
their structure is trivial in the extreme. The following notes should
be of some help.
A Turbo Pascal Library (.TPL) file appears to be a simple catenation
of Turbo Pascal Unit (.TPU) files. Since the length of a Unit may be
determined from the Unit Header (see section 3.2), it is simple to see
that one may "browse" through a .TPL file looking for an external unit
such as SYSTEM.TPU. If this seems to be too much effort, then there
is always the TPUMOVER Utility program supplied by Borland.
Quite simply, this Utility allows one to extract units from .TPL files
in order to subject them to the analysis performed by TPUMAIN. Read
your Turbo Pascal User's Guide for instructions on the operation and
use of this utility.
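For those who would rather browse a .TPL file directly, the
catenation structure suggests a loop like the following Python
sketch. It is an illustration only: unit_length is a hypothetical
callback that must compute the size of a unit from its Unit Header
(see section 3.2), and any padding between units is not accounted
for.

```python
def split_library(library, unit_length):
    """Split the bytes of a .TPL file into its component units.

    `unit_length(data, start)` is a hypothetical callback that returns
    the total size of the unit beginning at `start`, as computed from
    its Unit Header.
    """
    units, start = [], 0
    while start < len(library):
        size = unit_length(library, start)
        if size <= 0 or start + size > len(library):
            raise ValueError("bad unit length at offset %d" % start)
        units.append(library[start:start + size])
        start += size
    return units

# Toy data in which the first byte of each "unit" is its own length.
toy_library = bytes([3, 1, 1, 2, 9])
toy_units = split_library(toy_library, lambda data, start: data[start])
```

The length check guards against running off the end of the file when
a header is misread.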
One of the more obvious applications of this information would seem to
be in the area of a Cross-Reference Generator.
There is a very fine example of such a program in the public domain
that was written by Mr. R. N. Wisan called "PXL". This program has
been around since the days of Turbo Pascal Version 1. The program has
been continually enhanced by the author in the way of features and for
support of the newer Turbo Pascal versions. It does not however solve
the problem of telling one which unit contains the definition of a
given symbol. In fairness to "PXL" however, this is no small problem
since the format of .TPU files keeps changing (Turbo 5.5 Units are
not object-code compatible with Turbo 5.0 Units, and so on...) and
Mr. Wisan probably has more than enough other projects to keep himself
busy.
However, for the user who is willing to work a little (maybe a lot?),
this document would seem to provide the information needed to add such
a function to his own pet cross-reference generator.
This project would have been totally infeasible without the aid of
some very fine tools. As it was, several hundred man hours have been
expended on it and as you can see, there are a few unresolved issues
that have been (graciously) left for others to address. The tools
used by this author consisted of:
1) Turbo Pascal 5.5 Professional by Borland International
2) Microsoft WORD (version 5.0)
3) LIST (version 6.4a) by Vernon D. Buerg
4) the DEBUG utility in MS-DOS Version 3.3.
5) PARADOX 3.0 by Borland International |
6) QUATTRO PRO by Borland International |
7) TURBO ASSEMBLER 1.1 by Borland International |
(PARADOX and QUATTRO PRO were used for data collection and analysis in |
the course of coding the recognizer tables for the disassembler unit.) |
The references listed were of great value in this project. [Intel85] |
was a valuable source of information about coprocessor instructions as |
well as offering hints about the differences between the 8086/8088 and |
the 80286. The [Borland] TASM manuals offered further info on the |
80186. [Nelson] provided presentations of well-organized data |
directed at the problem of disassembly but the tables were flawed by a |
number of errors which crept into my databases and which caused much |
of the extra debugging effort. [Intel89] offered valuable insights on |
the 80386 addressing schemes as well as the 32-bit data extensions. |
Finally, [Brown] provided valuable clues on the Floating-Point |
emulators used by Borland (and Microsoft?). As you can see, the |
amount of hard information available to me on this project was quite |
limited since I am unaware of any other existing body of literature on |
this subject. |
That's it folks. Does anyone wonder why it took several hundred man
hours to get to this point? It took a lot of hard (and at times
tedious) work coupled with a great many lucky guesses to achieve what
you see here.
[Bor88a], TURBO ASSEMBLER REFERENCE GUIDE, Borland International, |
1988. |
[Bor88b], TURBO ASSEMBLER USER'S GUIDE, Borland International, 1988. |
[Bor88c], TURBO PASCAL REFERENCE GUIDE Version 5.0, Borland |
International, 1988. |
[Bor88d], TURBO PASCAL USER'S GUIDE Version 5.0, Borland |
International, 1988. |
International, 1989. |
[Brown], INTER489.ARC, Ralf Brown, 1989 |
286 NUMERIC SUPPLEMENT, Intel Corporation, 1985, (order |
number 210498-003). |
Corporation, 1989, (order number 240331-001). |
THE 80386, Ross P. Nelson, Microsoft Press, 1988. |
Scanlon, Brady 1986. |