-----------------------------
INSIDE TURBO PASCAL 5.5 UNITS
-----------------------------

by

William L. Peavy
-----------------
Revised: August 11, 1990


ABSTRACT

This document provides a revised report on research into the
structure and content of Unit (.TPU) files produced by Turbo
Pascal (version 5.5) from Borland International. No assurances
are possible regarding when (if ever) further updates will be
available, so the material is released to the Turbo Pascal user
community in its admittedly incomplete state, since very little
of consequence really remains to be done.


COMMENTS

Comments and feed-back are welcome -- especially new
contributions. I can be reached via the following services:

   CompuServe (70042,2310)
   HalPC Telecom-1 (William;Peavy)
   HalPC Telecom-2 (Wm;Peavy)


Table Of Contents

Introduction

1. Gross File Structure
   1.1 User Units

2. Locators
   2.1 Local Links
   2.2 Global Links
   2.3 Table Offsets

3. Unit Header
   3.1 Description
   3.2 File Size

4. Symbol Dictionaries
   4.1 Organization
   4.2 Interface Dictionary
   4.3 DEBUG Dictionary
   4.4 Dictionary Elements
       4.4.1 Hash Tables
             4.4.1.1 Size
             4.4.1.2 Scope
             4.4.1.3 Special Cases
       4.4.2 Dictionary Headers
       4.4.3 Dictionary Stubs
             4.4.3.1 Label Declaratives ("O")
             4.4.3.2 Un-Typed Constants ("P")
             4.4.3.3 Named Types ("Q")
             4.4.3.4 Variables, Fields, Typed Cons ("R")
             4.4.3.5 Subprograms & Methods ("S")
             4.4.3.6 Turbo Std Procedures ("T")
             4.4.3.7 Turbo Std Functions ("U")
             4.4.3.8 Turbo Std "NEW" Routine ("V")
             4.4.3.9 Turbo Std Port Arrays ("W")
             4.4.3.10 Turbo Std External Variables ("X")
             4.4.3.11 Units ("Y")
       4.4.4 Type Descriptors
             4.4.4.1 Scope
             4.4.4.2 Prefix Part
             4.4.4.3 Suffix Parts
                     4.4.4.3.1 Un-Typed
                     4.4.4.3.2 Structured Types
                               4.4.4.3.2.1 ARRAY Types
                               4.4.4.3.2.2 RECORD Types
                               4.4.4.3.2.3 OBJECT Types
                               4.4.4.3.2.4 FILE (non-TEXT) Types
                               4.4.4.3.2.5 TEXT File Types
                               4.4.4.3.2.6 SET Types
                               4.4.4.3.2.7 POINTER Types
                               4.4.4.3.2.8 STRING Types
                     4.4.4.3.3 Floating-Point Types
                     4.4.4.3.4 Ordinal Types
                               4.4.4.3.4.1 "Integers"
                               4.4.4.3.4.2 BOOLEANs
                               4.4.4.3.4.3 CHARs
                               4.4.4.3.4.4 ENUMERATions
                     4.4.4.3.5 SUBPROGRAM Types

5. Maps and Lists
   5.1 PROC Map
   5.2 CSeg Map
   5.3 Typed CONST DSeg Map
   5.4 Global VAR DSeg Map
   5.5 Donor Unit List
   5.6 Source File List
   5.7 DEBUG Trace Table

6. Code, Data, Relocation Info
   6.1 Object CSegs
   6.2 CONST DSegs
   6.3 Relocation Data Table

7. Supplied Program
   7.1 TPUNEW
   7.2 TPURPT1
   7.3 TPUAMS1
   7.4 TPUUNA1
   7.5 Modifications
   7.6 Notes on Program Logic
       7.6.1 Formatting the Dictionary
       7.6.2 The Disassembler

8. Unit Libraries
   8.1 Library Structure
   8.2 The TPUMOVER Utility

9. Application Notes

10. Acknowledgements

11. References


INTRODUCTION

This document is the outcome of an inquiry conducted into the
structure and content of Borland Turbo Pascal (Version 5.5) Unit
files. The original purpose of the inquiry was to provide a body of
theory enabling Cross-Reference programs to resolve references to
symbols defined in .TPU files where qualification was not explicitly
provided. As is so often the case, one thing led to another and the
scope of the inquiry was expanded dramatically. While this document
should not be regarded as definitive, the author feels that the entire
Turbo Pascal User community might gain from the information extracted
from these files at the cost of so much time and effort.

The material contained herein represents the findings and
interpretations of the author. A great deal of guess-work was
required and no assurances are given as to the accuracy of either the
findings of fact or the inferences contained herein, which are the sole
work-product of the author. In particular, the author had access only
to materials or information that any normal Borland customer has
access to. Further, no Borland source code was available, as the
Library Routine source is not licensed to the author. In short, there
was nothing irregular about how these findings were achieved.

The material contained herein is placed in the public domain free of
copyright for use of the general public at its own risk. The author
assumes no liability for any damages arising from the use of this
material by others. If you make use of this information and you get
burned, TOUGH! The author accepts no obligation to correct any such
errors as may exist in the supplied programs or in the findings of
fact or opinion contained herein. On the other hand, this is not a
"complete" work in that a great many questions remain open, especially
as regards fine details. (The author is not a practitioner of Intel
80xxx Assembly Language and several open questions might best be
addressed by persons competent in this area.) The author welcomes the
input of interested readers who might be able to "flesh-out" some of
these open questions with "hard" answers.


1. GROSS FILE STRUCTURE

A Turbo Pascal Unit file (Version 5.5 only) consists of an array of
bytes whose length is an exact multiple of sixteen (16). "Signature"
information allows the compiler to verify that the .TPU file was
compiled with the correct compiler version and to verify that the file
is of the correct size. The fine structure of the file will be
addressed in later sections at ever increasing levels of detail.

Graphically, the file may be regarded as having the following general
layout (entries marked "*" may be absent or empty):

   +-------------------+
   | Unit Header       |  Main Index to Unit File
   +-------------------+
   | Dictionaries:     |
   |   a) Interface    |
   |   b) Debugger   * |  For Local Symbol Access
   +-------------------+
   | PROC Map          |
   +-------------------+
   | CSeg Map        * |  May be Empty
   +-------------------+
   | CONST DSeg Map  * |  May be Empty
   +-------------------+
   | VAR DSeg Map    * |  May be Empty
   +-------------------+
   | Donor Units     * |  May be Empty
   +-------------------+
   | Source Files      |
   +-------------------+
   | Trace Table     * |  May be Empty
   +-------------------+
   | CODE Segment(s) * |  May be Empty
   +-------------------+
   | DATA Segment(s) * |  May be Empty
   +-------------------+
   | RELO Data       * |  May be Empty
   +-------------------+


1.1 USER UNITS

Units prepared by the compiler available to ordinary users have a very
straight-forward appearance and content. There may even be a little
"wasted" space that might be removed if the compiler were just a
little cleverer. The SYSTEM.TPU file is quite another thing however.

The SYSTEM.TPU file (found in TURBO.TPL) is extraordinary in that
great pains seem to have been taken to compact it. Further, it
contains a great many types of entries that just don't seem to be
achievable by ordinary users and I suspect that much (if not all) of
it was "hand-coded" in Assembler Language.

In the following sections, the details of these optimizations will be
explained in the context of the structural element then under
discussion.


2. LOCATORS

The data in these files has need of structure and organization to
support efficient access by the various programs such as the compiler,
the linker and the debugger. This organization is built on a solid
foundation of locators employed in the unit's data structures.


2.1 LOCAL LINKS

Local Links (LL's) are items of type WORD (2 bytes) which contain an
offset which is relative to the origin of the unit file itself. This
implies that a unit must be somewhat less than 64K bytes in size. If
the .TPU file is loaded into the heap, then LL's can be used to locate
any byte in the segment beginning with the load point of the file.


2.2 GLOBAL LINKS

Global Links (LG's) are used to locate type descriptors which may
reside in other Units (i.e., units external to the present unit).
LG's are structured items consisting of two (2) words. The first of
these is an LL that is relative to the origin of the (possibly)
external unit. The second word is an LL which locates the stub of the
unit entry in the current unit dictionary for the (possibly) external
unit. This dictionary entry provides the name of the unit that
contains the item the LG points to.

This provides a handy mechanism for locating type descriptors which
are defined in other separately compiled units.

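The two-word layout of an LG described in section 2.2 can be sketched
as a Pascal record. This is only an illustration under stated
assumptions: the names LG, ItemOfs, UnitStubOfs, Raw4 and UnpackLG are
invented here and do not come from Borland, and the sample bytes are
made up. The words are taken to be stored low byte first, as is usual
on Intel processors.

```pascal
program LGSketch;

type
  LL = Word;               { Local Link: offset from the origin of a unit }

  LG = record              { Global Link: two words, four bytes }
    ItemOfs     : LL;      { offset of the item in the (possibly external) unit }
    UnitStubOfs : LL;      { offset, in the CURRENT unit, of the stub of the
                             unit entry that names the owning unit }
  end;

  Raw4 = array[0..3] of Byte;   { four bytes as they appear in the file }

{ Build an LG from four raw bytes, low byte of each word first. }
procedure UnpackLG(var R: Raw4; var G: LG);
begin
  G.ItemOfs     := R[0] + 256 * Word(R[1]);
  G.UnitStubOfs := R[2] + 256 * Word(R[3]);
end;

var
  Sample : Raw4;
  G      : LG;
begin
  { hypothetical bytes, chosen only for illustration }
  Sample[0] := $23;  Sample[1] := $01;
  Sample[2] := $40;  Sample[3] := $00;

  UnpackLG(Sample, G);
  WriteLn('SizeOf(LG)  = ', SizeOf(LG));
  WriteLn('ItemOfs     = ', G.ItemOfs);
  WriteLn('UnitStubOfs = ', G.UnitStubOfs);
end.
```

To follow such a link, one would presumably read the unit name from
the dictionary entry at UnitStubOfs, locate that unit, and then apply
ItemOfs to the origin of that unit.
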
2.3 TABLE OFFSETS

Finally, various data-structures within a .TPU file are organized as
arrays of fixed-length records or as lists of variable-length records.
Efficient access to such records is achieved by means of offsets
rather than subscripts (an addressing technique denied Pascal). These
offsets are relative to the origin of the array or list being
referenced rather than the origin of the unit.


3. UNIT HEADER

The Unit Header comprises the first 64 bytes of the .TPU file. It
contains LL's that effectively locate all other sections of the .TPU
file plus statistics that enable a little cross-checking to be
performed. Some parts of the Unit Header appear to be reserved for
future use since no unit examined by this author has ever contained
non-zero data in these apparently reserved fields.


3.1 DESCRIPTION

The Unit Header provides a high-level locator table whereby each major
structure in the unit file can be addressed. The following provides a
Pascal-like explanation of the layout of the header followed by
further narrative discussion of the contents of the individual fields
in the Unit Header.

   Type  HdrAry = Array[0..3] of Char;
         LL     = Word;

         UnitHeader = Record
           FilHd : HdrAry;  { +00 : = 'TPU6'                     }
           Fillr : HdrAry;  { +04 : = $00000000                  }
           UDirE : LL;      { +08 : to Dictionary Head-This Unit }
           UGHsh : LL;      { +0A : to Interface Hash Header     }
           UHPrc : LL;      { +0C : to PROC Map                  }
           UHCsg : LL;      { +0E : to CSeg Map                  }
           UHDsT : LL;      { +10 : to DSeg Map-Typed CONST's    }
           UHDsV : LL;      { +12 : to DSeg Map-GLOBAL Variables }
           URULt : LL;      { +14 : to Donor Unit List           }
           USRCF : LL;      { +16 : to Source file List          }
           UDBTS : LL;      { +18 : to Debug Trace Step Controls }
           UndNC : LL;      { +1A : to end non-code part of Unit }
           ULCod : Word;    { +1C : Size of Code                 }
           ULTCon: Word;    { +1E : Size of Typed Constant Data  }
           ULPtch: Word;    { +20 : Size of Relo Patch List      }
           Unknx : Word;    { +22 : Number of Virtual Objects??? }
           ULVars: Word;    { +24 : Size of GLOBAL VAR Data      }
           UHash2: LL;      { +26 : to Debug Hash Header         }
           UOvrly: Word;    { +28 : Number of Procs to Overlay?? }
           UVTPad: Array[0..10]
                   of Word; { +2A : Reserved for Future Expansion? }
         End; { UnitHeader }

   FilHd   contains the characters "TPU6" in that order. This is
           clear evidence that this unit was compiled by Turbo Pascal
           Version 5.5.

   Fillr   is apparently reserved and contains binary zeros.

   UDirE   contains an LL (WORD) which points to the Dictionary
           Header in which the name of this unit is found.

   UGHsh   contains an LL (WORD) which points to a Hash table that is
           the root of the Interface Dictionary tree.

   UHPrc   contains an LL (WORD) which points to the PROC Map for
           this unit. The PROC Map contains an entry for each
           Procedure or Function declared in the unit (except for
           INLINE types), plus an entry for the Unit Initialization
           section. The length of the PROC Map (in bytes) is
           determined by subtracting this LL (at 000C) from the LL at
           offset 000E.

   UHCsg   contains an LL (WORD) which points to the CSeg (CODE
           Segment) Map for this unit. The CSeg Map contains an
           entry for each CODE Segment produced by the compiler plus
           an entry for each of the CODE Segments included via the
           {$L filename.OBJ} compiler directive. The length of this
           Map (in bytes) is obtained by subtracting this LL (at
           000E) from the word at 0010. The result may be zero in
           which case the CSeg Map is empty.

   UHDsT   contains an LL (WORD) which points to the DSeg (DATA
           Segment) Map that maps the initializing data for Typed
           CONST items plus templates for VMT's (Virtual Method
           Tables) that are associated with OBJECTS which employ
           Virtual Methods. The length of this Map (in bytes) is
           obtained by subtracting this LL (at 0010) from the word at
           0012. The result may be zero in which case this DSeg Map
           is empty.

   UHDsV   contains an LL (WORD) which points to the DSeg (DATA
           Segment) Map that contains the specifications for DSeg
           storage required by VARiables whose scope is GLOBAL. The
           length of this Map (in bytes) is obtained by subtracting
           this LL (at 0012) from the word at 0014. The result may
           be zero in which case this DSeg Map is empty.

   URULt   contains an LL (WORD) which points to a table of units
           which contribute either CODE or DATA Segments to the .EXE
           file for a program using this Unit. This is called the
           "Donor Unit Table". The length of this table (in bytes)
           is obtained by subtracting this LL (at 0014) from the word
           at 0016. The result may be zero in which case this table
           is empty.

   USRCF   contains an LL (WORD) which points to a list of "source"
           files. These are the files whose CODE or DATA Segments
           are included in this Unit by the compiler. Examples are
           the Pascal Source for the Unit itself, plus the .OBJ files
           included via the {$L filename.OBJ} compiler directive.
           The length of this table (in bytes) is obtained by
           subtracting this LL (at 0016) from the word at 0018. The
           result may be zero in which case this table is empty.

   UDBTS   contains an LL (WORD) which points to a Trace Table used
           by the DEBUGGER for "stepping" through a Function or
           Procedure contained in this Unit. The length of this
           table (in bytes) is obtained by subtracting this LL (at
           0018) from the word at 001A. The result may be zero in
           which case this table is empty.

   UndNC   contains an LL (WORD) which points to the first free byte
           which follows the Trace Table (if any). It serves as a
           delimiter for determining the size of the Trace Table.
           This LL (when rounded up to the next integral multiple of
           16) serves to locate the start of the code/data segments.

   ULCod   is a WORD that contains the total byte count of all CODE
           Segments compiled into this Unit.

   ULTCon  is a WORD that contains the total byte count of all Typed
           CONST and VMT DATA Segments compiled into this unit.

   ULPtch  is a WORD that contains the total byte count of the
           Relocation Data Table for this unit.

   Unknx   is a WORD whose usage is poorly understood. It appears
           always to be zero except when the Unit contains OBJECTs
           which employ Virtual Methods.

   ULVars  is a WORD that contains the total byte count of all GLOBAL
           VAR DATA Segments compiled into this unit.

   UHash2  contains an LL (WORD) which points to a Hash Table which
           is the root of the DEBUGGER Dictionary. If Local Symbols
           were generated by the compiler (directive {$L+}) then ALL
           symbols declared in the unit can be accessed from this
           Hash Table. In the SYSTEM.TPU file, there is no such
           Dictionary and the LL stored here points to the INTERFACE
           Dictionary. This is an example of Hash Table "Folding" to
           save space which has been observed only in SYSTEM.TPU.

   UOvrly  is a WORD whose usage is poorly understood. This word is
           usually zero unless the Unit was compiled with the Overlay
           Directive {$O+}.

   UVTPad  begins a series of eleven (11) words that are apparently
           reserved for future use. Nothing but zeros have ever been
           seen here by this author.

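The header layout and the "subtract adjacent LL's" rule from section
3.1 can be exercised with a short program. This is a minimal sketch,
not a tested reader: the record simply repeats the layout given in
section 3.1, the helper names IsTPU6 and TableLen are invented here,
and the header is filled with made-up values in memory rather than
read from a real .TPU file.

```pascal
program HdrSketch;

type
  HdrAry = array[0..3] of Char;
  LL     = Word;

  UnitHeader = record
    FilHd  : HdrAry;                 { +00 }
    Fillr  : HdrAry;                 { +04 }
    UDirE, UGHsh, UHPrc, UHCsg,
    UHDsT, UHDsV, URULt, USRCF,
    UDBTS, UndNC           : LL;     { +08 .. +1A }
    ULCod, ULTCon, ULPtch,
    Unknx, ULVars          : Word;   { +1C .. +24 }
    UHash2 : LL;                     { +26 }
    UOvrly : Word;                   { +28 }
    UVTPad : array[0..10] of Word;   { +2A .. +3F }
  end;

{ True when the first four bytes spell 'TPU6'. }
function IsTPU6(var H: UnitHeader): Boolean;
begin
  IsTPU6 := (H.FilHd[0] = 'T') and (H.FilHd[1] = 'P') and
            (H.FilHd[2] = 'U') and (H.FilHd[3] = '6');
end;

{ Length in bytes of a table, given its LL and the LL that follows it. }
function TableLen(ThisLL, NextLL: LL): Word;
begin
  TableLen := NextLL - ThisLL;
end;

var
  H : UnitHeader;
begin
  FillChar(H, SizeOf(H), 0);
  H.FilHd[0] := 'T';  H.FilHd[1] := 'P';
  H.FilHd[2] := 'U';  H.FilHd[3] := '6';

  { hypothetical locator values, for illustration only }
  H.UHPrc := $0200;
  H.UHCsg := $0228;
  H.UHDsT := $0230;
  H.UHDsV := $0230;

  WriteLn('Header size     : ', SizeOf(UnitHeader));
  WriteLn('Signature ok    : ', IsTPU6(H));
  WriteLn('PROC Map bytes  : ', TableLen(H.UHPrc, H.UHCsg));
  WriteLn('CSeg Map bytes  : ', TableLen(H.UHCsg, H.UHDsT));
  WriteLn('CONST Map bytes : ', TableLen(H.UHDsT, H.UHDsV));
end.
```

With a real unit, the record would instead be filled by reading the
first 64 bytes of the file, for example with BlockRead on an untyped
file opened with a record size of 1. A length of zero (as for the
Typed CONST map above) means the table is empty.
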
3.2 FILE SIZE

An independent check on the size of the .TPU file is available using
information contained in the Unit Header. This is also important for
.TPL (Unit Library) organization. To compute the file size, refer to
the four (4) words at offsets 001A, 001C, 001E and 0020 (UndNC,
ULCod, ULTCon and ULPtch). Round each of these words up to the next
multiple of 16; a value that is already a multiple of 16 is left
unchanged. Then form the sum of the rounded words. This is the .TPU
file size in bytes.

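The size check of section 3.2 amounts to four round-ups and a sum. A
minimal sketch follows; the function names RoundUp16 and ExpectedSize
are invented here, and the header values in the example are made up.

```pascal
program TpuSizeSketch;

{ Round N up to the next multiple of 16; multiples of 16 are unchanged. }
function RoundUp16(N: LongInt): LongInt;
begin
  RoundUp16 := ((N + 15) div 16) * 16;
end;

{ Expected .TPU size from the header words at +1A, +1C, +1E and +20. }
function ExpectedSize(UndNC, ULCod, ULTCon, ULPtch: Word): LongInt;
begin
  ExpectedSize := RoundUp16(UndNC) + RoundUp16(ULCod) +
                  RoundUp16(ULTCon) + RoundUp16(ULPtch);
end;

begin
  { hypothetical header values: 432 + 112 + 0 + 16 = 560 }
  WriteLn(ExpectedSize($01A3, 100, 0, 10));
end.
```

The result can then be compared with the actual length of the file,
which FileSize returns in bytes for an untyped file opened with a
record size of 1.
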
4. SYMBOL DICTIONARIES

This area contains all available documentation of declared symbols and
procedure blocks defined within the unit. Depending on compiler
options in effect when the unit was compiled, this section will
contain at a minimum, the INTERFACE declarations, and at a maximum,
ALL declarations. The information stored in the dictionary is highly
dependent on the context of the symbol declared. We defer further
explanation to the appropriate section which follows.


4.1 ORGANIZATION

The dictionary is organized with a Hash Table as its root. The hash
table is used to provide rapid access to arbitrary symbols. Since
Turbo Pascal compiles very rapidly, I presume the hash function to be
an effective one, to say the least.

The dictionary itself may be thought of as an n-way tree. Each
subtree has its roots in a hash table. There may be a great many hash
tables in a given unit and their number depends on unit complexity as
well as the options chosen when the unit was compiled. Use of the
{$L+} directive produces the densest trees. The hash tables are
explained in detail a few sections further on.

Hash tables point to Dictionary Headers. When two or more symbols
produce the same hash function result, a collision is said to occur.
Collisions are resolved by the time-honored method of chaining
together the Dictionary Headers of those symbols having the same hash
function result. Dictionary supersetting is accomplished using these
chains.


4.2 INTERFACE DICTIONARY

The INTERFACE dictionary contains all symbols and the necessary
explanatory data for the INTERFACE section of a Unit. Symbols get
added to the Unit using increasing storage addresses until the
IMPLEMENTATION section is encountered.


4.3 DEBUG DICTIONARY

The DEBUG dictionary (if present) is a superset of the INTERFACE
dictionary. It is used by the Turbo Debugger to support its many
features when tracing through a unit. If present, this dictionary is
rooted in its own hash table. The hash table is effectively
initialized when the IMPLEMENTATION keyword is processed by the
compiler. This takes the form (initially) of an unmodified copy of
the INTERFACE hash table, to which symbols are added in the usual
fashion. Thus, the hash chains constructed or extended at this time
lead naturally to the INTERFACE chains and this is how the superset is
effectively implemented.


4.4 DICTIONARY ELEMENTS

The dictionary contains four major elements. These are: hash tables,
Dictionary Headers, Dictionary Stubs and Type Descriptors. The
distinction between Dictionary Headers and Stubs is essentially
arbitrary and is made in this document to assist in exposition. They
might just as easily be regarded as a single element (such as symbol
entry).


4.4.1 HASH TABLES

As has been intimated, Hash Tables are the glue that binds the
dictionary entries together and gives the dictionary its "shape".
They effectively implement the scope rules of the language and speed
access to essential information.

Each Hash table begins with a 2-byte size descriptor. This descriptor
contains the number of bytes in the table proper, less 2. Thus, taken
as an offset from the first bucket, the descriptor points directly to
the last bucket in the hash table. For a hash table of 128 bytes, the
size descriptor contains 126. The first bucket in the table
immediately follows the size descriptor.


4.4.1.1 SIZE

So far, three different hash table sizes have been observed. The
INTERFACE and DEBUG hash tables are usually 128 bytes (64 entries) in
size plus 2 bytes of size description, but the SYSTEM.TPU unit is a
special case, containing only 16 entries. Hash tables which anchor
subtrees whose scope is relatively local usually contain four (4)
entries (8 bytes).

Graphically, a Hash Table with four slots has the following layout:

   +--------------------+
   |       0006h        |  Size Descriptor
   +====================+
   |       slot 0       |  an LL or zero
   +--------------------+
   |       slot 1       |  an LL or zero
   +--------------------+
   |       slot 2       |  an LL or zero
   +--------------------+
   |       slot 3       |  an LL or zero
   +--------------------+

It should be noted that the Size Descriptor furnishes an upper bound
for the hash function itself. Thus, it seems possible that a single
hash function is used for all hash tables and that its result is ANDed
with the Size Descriptor to get the final result. Because the sizes
are chosen as they are (powers of 2) this is feasible. Note that in
the above example, 6 = 2 * (n - 1) where n = 4 {slot count}. All of
the hash tables observed so far have this property. What you get is a
really efficient MOD function.

Suppose that the hash of a given symbol is 13 and the proper slot must
be located for a hash table of four entries. If we let "h" be the raw
result of 13, then our final hash is (h SHL 1) AND ((4-1) SHL 1), or

   (13 SHL 1) AND 6 = 26 AND 6 = 2

The result, 2, is presumably a byte offset from the first slot, which
selects slot 1 (and indeed 13 MOD 4 = 1).

One final note on this subject. Given these properties, "Folding" of
sparse hash tables is a rather trivial exercise so long as the new
hash table also contains a number of slots that is a power of 2. This
point is intriguing when one recalls that the SYSTEM.TPU hash table
has only 16 slots rather than the usual 64.

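The masking step conjectured in section 4.4.1.1 is easy to try out. In
the sketch below the raw hash value is simply taken as given, since
the hash function used by the compiler is not known; the names
SlotCount and SlotOffset are invented here for illustration.

```pascal
program HashSlotSketch;

{ Number of slots implied by a size descriptor (table bytes less 2). }
function SlotCount(SizeDesc: Word): Word;
begin
  SlotCount := (SizeDesc div 2) + 1;
end;

{ Byte offset of the slot, measured from the first slot, for raw hash H. }
function SlotOffset(H, SizeDesc: Word): Word;
begin
  SlotOffset := (H shl 1) and SizeDesc;
end;

begin
  WriteLn(SlotCount(6));        { four-slot local table   : 4  }
  WriteLn(SlotCount(126));      { usual 128-byte table    : 64 }
  WriteLn(SlotOffset(13, 6));   { example from the text   : 2  }
end.
```

Because the descriptor always has the form 2 * (n - 1) with n a power
of 2, the AND both reduces the hash modulo n and yields an offset that
is already scaled to 2-byte slots.
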
4.4.1.2 SCOPE

The INTERFACE and DEBUG dictionary hash tables are Global in Scope
even though the symbols accessed directly via the DEBUG hash table may
be private. On the other hand, other hash tables are purely local in
scope. For example, the fields declared within a record are reached
via a small local hash table, as are the parameters and local
variables declared within procedures and functions. Even OBJECTS use
this technique to provide access to Methods and Object Fields.

Access to such local scope fields/methods requires use of qualified
names which ensures conformity to Pascal scope rules. The method is
truly simple and elegant.


4.4.1.3 SPECIAL CASES

The SYSTEM.TPU Unit is a special case. Its INTERFACE hash table has
apparently been "hand-tuned" for small size and contains only sixteen
(16) entries. In addition, since there is no local symbol generation
in this unit, there is nothing for a DEBUG hash table to add.
Therefore, the DEBUG hash table does not exist as a separate entity,
its function being served by the INTERFACE hash table. The pointer to
the DEBUG hash table (in the Unit Header) has the same value as the
pointer to the INTERFACE hash table (SYSTEM unit ONLY).


4.4.2 DICTIONARY HEADERS

This is the structure that anchors all information known by the
compiler about any symbol. The format is as follows:

   +00:  An LL which points to the next symbol in the chain (that
         is, the previously declared symbol in the same scope which
         had the same hash function value).

   +02:  A character that defines the category the symbol belongs
         to and defines the format of the Dictionary Stub which
         follows the Dictionary Header.

   +03:  A String (in the Pascal sense) of variable size that
         contains the text of the symbol (in UPPER-CASE letters
         only). The SizeOf function is not defined for these
         strings since they are truncated to match the symbol size.
         The "value" of the SizeOf function can be determined by
         adding 1 to the first byte in the string. Thus,
         Ord(Symbol[0])+1 is the expression that defines the Size
         of the symbol string. Turbo Pascal defines a symbol as a
         string of relatively arbitrary size, the most significant
         63 characters of which will be stored in the dictionary.
         Thus, we conclude that the maximum size of such a string
         is 64 bytes.

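The Dictionary Header layout of section 4.4.2 can be walked with a few
small helpers. The sketch below builds a tiny made-up image in memory
and follows one hash chain through it. All names (Img, GetWord,
Category, SymName, StubOfs, PutHeader) are invented for illustration,
and treating a zero LL as the end of a chain is an assumption based on
the "an LL or zero" convention of the hash slots.

```pascal
program DictHdrSketch;

type
  TBuf   = array[0..63] of Byte;   { stand-in for a loaded unit image }
  SymStr = string[63];

var
  Img : TBuf;

{ Little-endian WORD stored at offset Ofs. }
function GetWord(Ofs: Word): Word;
begin
  GetWord := Img[Ofs] + 256 * Word(Img[Ofs + 1]);
end;

{ Category character at +02 of the Dictionary Header. }
function Category(Hdr: Word): Char;
begin
  Category := Chr(Img[Hdr + 2]);
end;

{ Symbol text: length byte at +03 followed by the characters. }
function SymName(Hdr: Word): SymStr;
var
  S : SymStr;
  I : Word;
begin
  S := '';
  for I := 1 to Img[Hdr + 3] do
    S := S + Chr(Img[Hdr + 3 + I]);
  SymName := S;
end;

{ The stub starts right after the name: +03 plus Ord(Symbol[0]) + 1. }
function StubOfs(Hdr: Word): Word;
begin
  StubOfs := Hdr + 3 + Img[Hdr + 3] + 1;
end;

{ Store a header at Hdr; used only to build the demonstration image. }
procedure PutHeader(Hdr, Next: Word; Cat: Char; Name: SymStr);
var
  I : Word;
begin
  Img[Hdr]     := Lo(Next);
  Img[Hdr + 1] := Hi(Next);
  Img[Hdr + 2] := Ord(Cat);
  for I := 0 to Length(Name) do      { index 0 is the length byte }
    Img[Hdr + 3 + I] := Ord(Name[I]);
end;

var
  P : Word;
begin
  FillChar(Img, SizeOf(Img), 0);
  PutHeader(16, 0,  'Q', 'COUNT');   { older entry; the chain ends here }
  PutHeader(32, 16, 'R', 'X');       { newer entry, links back to 16    }

  P := 32;                           { pretend a hash slot held LL = 32 }
  while P <> 0 do                    { assumption: a zero LL ends the chain }
  begin
    WriteLn(SymName(P), ' category ', Category(P),
            ' stub at ', StubOfs(P));
    P := GetWord(P);
  end;
end.
```

A symbol lookup would presumably compare the upper-cased name against
SymName at each step and stop at the first match, which is what lets
newer declarations on a chain take precedence over older ones.
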
4.4.3 DICTIONARY STUBS
|
||
|
||
|
||
Dictionary Stubs immediately follow their respective headers and their
|
||
format is determined by the category character in the Dictionary
|
||
Header. The function of the stub is to organize the information
|
||
appropriate to the symbol and provide a means of accessing additional
|
||
information such as type descriptors, constant values, parameter lists
|
||
and nested scopes. The format of each Stub is presented in the
|
||
following sub-sections.
|
||
|
||
|
||
|
||
4.4.3.1 LABEL DECLARATIVES ("O")
|
||
|
||
|
||
This Stub consists of a WORD whose function is (as yet) unknown.
|
||
|
||
4.4.3.2 UN-TYPED CONSTANTS ("P")
|
||
|
||
|
||
This Stub consists of two (2) fields:
|
||
|
||
+00: An LG which points to a Type Descriptor (usually in
|
||
SYSTEM.TPU). This establishes the minimum storage
|
||
requirement for the constant. The rules vary with the
|
||
type, but the size of the constant data field (which
|
||
follows) is defined using the Type Descriptor(s).
|
||
|
||
+04: The value of the constant. For ordinal types, this value
|
||
is stored as a LONGINT (size=4 bytes). For Floating-Point
|
||
types, the size is implicit in the type itself. For
|
||
String types, the size is determined from the length of
|
||
the string which is stored in the initial byte of the
|
||
constant.
|
||
|
||
|
||
|
||
4.4.3.3 NAMED TYPES ("Q")
|
||
|
||
|
||
This Stub consists of an LG (4-bytes) that points to the Type
|
||
Descriptor for this symbol.
|
||
|
||
4.4.3.4 VARIABLES, FIELDS, TYPED CONS ("R")
|
||
|
||
|
||
This Stub contains information required to allocate and describe these
|
||
types of entities. The format and content is as follows:
|
||
|
||
+00: A one-byte flag that precisely identifies the class of the
|
||
item being described.  The known values and their proper
interpretation are as follows:
|
||
|
||
0 -> Global Variables Allocated in DS;
|
||
1 -> Typed Constants Allocated in DS;
|
||
2 -> LOCAL Variables & VALUE Parameters on STACK;
|
||
6 -> ADDRESS Parameters allocated on STACK;
|
||
8 -> Fields suballocated in RECORDS and OBJECTS, plus
|
||
METHODS declared for OBJECTS.
|
||
|
||
+01: A WORD containing the allocation offset in bytes;
|
||
|
||
+03: A WORD whose content depends on the one-byte flag that
|
||
this stub begins with. The context-dependent values
|
||
observed thus far are:
|
||
|
||
If the flag is 0, 2 or 6, then this word is an LL that
|
||
locates the containing scope or zero if none;
|
||
|
||
If the flag is 8, then this word is an LL that locates the
|
||
Dictionary Header for the next field or method defined
|
||
within the Record or Object;
|
||
|
||
If the flag is 1, then this word is an offset within the
|
||
CONST DSeg Map that locates the text of the Typed Constant
|
||
Data.
|
||
|
||
+05: An LG that locates the proper Type Descriptor for this
|
||
symbol.
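
The "R" stub layout can be sketched as follows. This is only an illustration: it assumes little-endian words, treats the LG as four opaque bytes (its internal layout is described in section 2.2), and uses hypothetical names throughout.

```python
import struct

VAR_CLASSES = {
    0: "global variable (DS)",
    1: "typed constant (DS)",
    2: "local variable or value parameter (stack)",
    6: "address parameter (stack)",
    8: "field or method of a record/object",
}

def parse_variable_stub(buf, off):
    """Decode the 9-byte stub of a category "R" symbol."""
    flag, alloc_offset, context = struct.unpack_from("<BHH", buf, off)
    type_lg = bytes(buf[off + 5 : off + 9])      # +05: LG to the Type Descriptor
    if flag in (0, 2, 6):
        role = "LL of containing scope"          # zero if there is none
    elif flag == 8:
        role = "LL of next field or method"
    elif flag == 1:
        role = "offset in CONST DSeg Map"
    else:
        role = "unknown"                         # flag value not observed in the report
    return {
        "class": VAR_CLASSES.get(flag, "unknown"),
        "offset": alloc_offset,
        "context": context,
        "context_role": role,
        "type_lg": type_lg,
    }

sample = struct.pack("<BHH", 1, 0x0010, 0x0008) + b"\x34\x12\x00\x00"
print(parse_variable_stub(sample, 0))
```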
|
||
|
||
4.4.3.5 SUBPROGRAMS & METHODS ("S")
|
||
|
||
|
||
Subprograms, especially since Object Methods are supported, have a
|
||
rather involved stub. Its format is as follows:
|
||
|
||
+00: A byte that contains bit-switches. These bit switches
|
||
have a great deal to do with the size of this stub and
|
||
with the proper interpretation of what follows. The
|
||
observed values of the bit-switches are as follows:
|
||
|
||
xxxxxxx1 -> Symbol declared in INTERFACE;
|
||
xxxxxx1x -> Symbol is an INLINE Declarative;
|
||
xxxx1x0x -> Symbol has EXTERNAL attribute;
|
||
x001xxxx -> Symbol is an ordinary Object Method;
|
||
x011xxxx -> Symbol is a CONSTRUCTOR Method;
|
||
x101xxxx -> Symbol is a DESTRUCTOR Method;
|
||
|
||
+01: A Word whose interpretation depends on whether we have an
|
||
INLINE Declarative Subprogram or not. If this is an
|
||
INLINE Declarative Subprogram, then this word contains the
|
||
byte-count of the INLINE code text at the end of this
|
||
stub. Otherwise, this word is the offset within the PROC
|
||
Map that locates the object code for this Subprogram.
|
||
|
||
+03: A Word that contains an LL which locates the containing
|
||
scope in the dictionary, or zero if none.
|
||
|
||
+05: A Word that contains an LL which locates the local Hash
|
||
Table for this scope. A local hash table provides access
|
||
to all formal parameters of the Subprogram as well as all
|
||
Symbols whose declarations are local to the scope of this
|
||
Subprogram.
|
||
|
||
+07: A Word that is zero unless the symbol is a Virtual Method.
In that case, the content is the offset within the VMT of
the owning object at which the FAR POINTER to this Virtual
Method is stored.
|
||
|
||
+09: A Word that is zero unless the symbol is a Method.  In
that case, the content is an LL which locates the next
METHOD for this Object.
|
||
|
||
+0B: A complete Type-Descriptor for this Subprogram. The
|
||
length is variable and depends upon the number of Formal
|
||
Parameters declared in the header. A complete description
|
||
of this subfield is found in a later section (4.4.4.3.5).
|
||
|
||
+??: If this Symbol represents an INLINE Declarative
|
||
Subprogram, then the object-code text begins here. The
|
||
byte-count of the text occurs at offset 0001h in this
|
||
stub.
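
The bit-switch byte at +00 can be decoded with a few masks. The sketch below follows the bit patterns listed above; the function and key names are hypothetical, and bit combinations not listed in the report are simply reported as None.

```python
def decode_subprogram_flags(flags):
    """Interpret the bit-switch byte at +00 of a category "S" stub."""
    info = {
        "interface": bool(flags & 0x01),        # xxxxxxx1
        "inline": bool(flags & 0x02),           # xxxxxx1x
        "external": (flags & 0x0A) == 0x08,     # xxxx1x0x: bit 3 set, bit 1 clear
    }
    kind = (flags >> 4) & 0x07                  # the three bits written as x???xxxx
    info["method"] = {
        0b001: "method",
        0b011: "constructor",
        0b101: "destructor",
    }.get(kind)                                 # None for a plain subprogram
    return info

print(decode_subprogram_flags(0x31))
print(decode_subprogram_flags(0x08))
```

When the inline bit is set, the word at +01 is a byte count rather than a PROC Map offset, so a reader of the stub would test this flag before interpreting that word.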
|
||
|
||
4.4.3.6 TURBO STD PROCEDURES ("T")
|
||
|
||
|
||
This Stub consists of two bytes, the first of which is unique for each
|
||
|
|
||
procedure and increments by 4. I have found nothing in the SYSTEM
|
||
|
|
||
unit (which is where this entry appears) that this seems directly
|
||
|
|
||
related to. The second byte is always zero.
|
||
|
|
||
|
||
|
||
|
||
4.4.3.7 TURBO STD FUNCTIONS ("U")
|
||
|
||
|
||
This Stub consists of two bytes, the first of which is unique for each
|
||
|
|
||
function and increments by 4. I have found nothing in the SYSTEM unit
|
||
|
|
||
(which is where this entry appears) that this seems directly related
|
||
|
|
||
to. I wouldn't be surprised if this byte were an index into a TURBO
|
||
|
|
||
compiler table that points to specialized parse tables/action routines
|
||
|
|
||
for handling these functions and their non-standard parameter lists.
|
||
|
|
||
|
||
The second byte seems to be a flag having the values $00, $40 and $C0.
|
||
|
|
||
I strongly suspect that the flag $C0 marks exactly those functions
|
||
|
|
||
which may be evaluated at compile-time. The meaning behind the other
|
||
|
|
||
values is not known to me.
|
||
|
|
||
|
||
|
||
|
||
4.4.3.8 TURBO STD "NEW" ROUTINE ("V")
|
||
|
||
|
||
This Stub consists of a WORD whose function is (as yet) unknown. This
|
||
|
|
||
is the only Standard Turbo routine that can behave as a procedure as
|
||
|
|
||
well as a function (returning a pointer value).
|
||
|
|
||
|
||
|
||
|
||
4.4.3.9 TURBO STD PORT ARRAYS ("W")
|
||
|
||
|
||
This Stub consists of a byte whose value is 0 for byte arrays, and 1
|
||
for word arrays.
|
||
|
||
|
||
|
||
4.4.3.10 TURBO STD EXTERNAL VARIABLES ("X")
|
||
|
||
|
||
This Stub consists of an LG (4-bytes) that points to the Type
|
||
Descriptor for this symbol.
|
||
|
||
4.4.3.11 UNITS ("Y")
|
||
|
||
|
||
Unit Stubs have the following content:
|
||
|
||
+00: A Word which is apparently reserved for use by the Compiler
or Linker.
|
||
|
||
+02: A Word that seems to contain some kind of "signature" used
|
||
to detect inconsistent Unit Versions. This author
|
||
suspects that this consists of some kind of sum-check or
|
||
hash total but has not yet identified the algorithm which
|
||
computes the value stored in this word.
|
||
|
||
+04: A Word that contains an LL which locates the Successor
|
||
Unit in the "Uses" list. In fact, the "Uses" lists of
|
||
both the INTERFACE and IMPLEMENTATION sections of the Unit
|
||
are merged by this Word into a single list. A value of
|
||
zero is used to indicate no successor.
|
||
|
||
+06: A Word that contains an LL which locates the Predecessor
|
||
Unit in the "Uses" list. For the SYSTEM unit entry, this
|
||
value is always zero to indicate no predecessor. For the
|
||
Unit being compiled, this LL locates the final Unit in the
|
||
combined "Uses" list.
|
||
|
||
In effect, the two LL's at offsets 0004 and 0006 organize the units
|
||
into both forward and backward linked chains. The entry for the unit
|
||
being compiled is effectively the head of both the forward and the
|
||
backward chains. The final unit in the merged "Uses" list is the tail
|
||
of the forward chain, and the SYSTEM unit is the tail of the backward
|
||
chain.
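
The two chains can be followed with a short loop. The sketch below rests on assumptions that the report does not confirm: that each LL is a byte offset from the start of the buffer, that it points at the Dictionary Header of the "Y" entry, and that words are little-endian. The helper make_entry and the demo image are invented purely for illustration.

```python
import struct

def unit_stub_offset(buf, header_off):
    """Skip the LL (2 bytes), category (1) and Pascal string to reach the stub."""
    return header_off + 4 + buf[header_off + 3]

def parse_unit_stub(buf, off):
    reserved, signature, successor, predecessor = struct.unpack_from("<4H", buf, off)
    return {"signature": signature, "successor": successor, "predecessor": predecessor}

def walk_uses(buf, head_off, link="successor"):
    """Follow one chain, starting at the entry for the unit being compiled."""
    names, seen, off = [], set(), head_off
    while off and off not in seen:              # a zero LL ends the chain
        seen.add(off)
        length = buf[off + 3]
        names.append(buf[off + 4 : off + 4 + length].decode("ascii"))
        off = parse_unit_stub(buf, unit_stub_offset(buf, off))[link]
    return names

def make_entry(name, successor, predecessor):
    # Hypothetical helper: builds a "Y" header followed by its 8-byte stub.
    header = struct.pack("<H", 0) + b"Y" + bytes([len(name)]) + name
    return header + struct.pack("<4H", 0, 0, successor, predecessor)

# Demo image: DEMO at offset 2, SYSTEM at 18, CRT at 36.
image = (b"\0\0"
         + make_entry(b"DEMO", 18, 36)
         + make_entry(b"SYSTEM", 36, 0)
         + make_entry(b"CRT", 0, 18))
print(walk_uses(image, 2))                  # forward chain
print(walk_uses(image, 2, "predecessor"))   # backward chain
```

The seen set guards against a malformed chain that loops back on itself.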
|
||
|
||
4.4.4 TYPE DESCRIPTORS
|
||
|
||
|
||
Type Descriptors store much of the semantic information that applies
|
||
to the symbols declared in the unit. Implementation details can be
|
||
managed using high-level abstractions and these abstractions can be
|
||
shared.
|
||
|
||
|
||
|
||
4.4.4.1 SCOPE
|
||
|
||
|
||
Type Descriptor sharing can occur across the boundaries which are
|
||
implicit in unit modules. Thus, a type defined in one unit may be
|
||
"imported" by some other module. Also, the pre-defined Pascal Types
|
||
(plus the Turbo Pascal extensions) are defined in the SYSTEM.TPU unit
|
||
and there needs to be a means of "importing" such Type Descriptors
|
||
during compilation. This is precisely the objective of the LG locator
|
||
which was described in section 2.2 (above). Type Descriptors are
|
||
NEVER copied between units. The binding always occurs by reference at
|
||
compile time and this helps support the technique of modifying a unit
|
||
and compiling it to a .TPU file, then re-compiling all units/programs
|
||
that "USE" it.
|
||
|
||
Type Descriptors have many roles so their format varies.  We have
divided these structures into two parts: the PREFIX Part, which is
always present and whose format is fairly constant, and the SUFFIX
Part, whose content and format depend on the attributes that are part
of the type definition.
|
||
|
||
4.4.4.2 PREFIX PART
|
||
|
||
|
||
The Prefix Part of every Type Descriptor consists of four (4) bytes.
|
||
The usage is consistent for all types observed by this author and the
|
||
format is as follows:
|
||
|
||
+00: A Byte that identifies the format of the Suffix part.
|
||
This is essentially based on several high-level categories
|
||
which the Suffix Parts support directly. The observed set
|
||
of values is as follows:
|
||
|
||
00h -> an un-typed entity;
|
||
01h -> an ARRAY type;
|
||
02h -> a RECORD type;
|
||
03h -> an OBJECT type;
|
||
04h -> a FILE type (other than TEXT);
|
||
05h -> a TEXT File type;
|
||
06h -> a SUBPROGRAM type;
|
||
07h -> a SET type;
|
||
08h -> a POINTER type;
|
||
09h -> a STRING type;
|
||
0Ah -> an 8087 Floating-Point type;
|
||
0Bh -> a REAL type;
|
||
0Ch -> a Fixed-Point ordinal type;
|
||
0Dh -> a BOOLEAN type;
|
||
0Eh -> a CHAR type;
|
||
0Fh -> an Enumerated ordinal type.
|
||
|
||
+01: A Byte used as a modifier. Since the above scheme is too
|
||
general for machine-dependent details such as storage
|
||
width and sign control, this modifier byte supplies
|
||
additional data as required. The author has identified
|
||
several cases in which this information is vital but has
|
||
not spent very much time on the subject. The chief areas
|
||
of importance seem to be in the 8087 Floating-Point types,
|
||
and the Fixed-Point ordinal types. The semantics seem to
|
||
be as follows:
|
||
|
||
0A 00 -> The type "SINGLE"
|
||
0A 02 -> The type "EXTENDED"
|
||
0A 04 -> The type "DOUBLE"
|
||
0A 06 -> The type "COMP"
|
||
|
||
0C 00 -> an un-named BYTE integer
|
||
0C 01 -> The type "SHORTINT"
|
||
0C 02 -> The type "BYTE"
|
||
0C 04 -> an un-named WORD integer
|
||
0C 05 -> The type "INTEGER"
|
||
0C 06 -> The type "WORD"
|
||
0C 0C -> an un-named double-word integer
|
||
0C 0D -> The type "LONGINT"
|
||
|
||
One important feature of the above semantics is the fact
|
||
that an un-typed CONST declaration refers to the above two
|
||
bytes to determine the storage space needed in the
|
||
dictionary for the data value of the constant. This can
|
||
be a little involved however as the constant may contain
|
||
its own length descriptor (as in the case of a character
|
||
string) in which case it may be sufficient to identify
|
||
the high-level type category without any modifier byte.
|
||
|
||
+02: A Word that contains the number of bytes of storage that
|
||
are required to contain an object/entity of this type.
|
||
For types that represent variable-length objects/entities
|
||
such as strings, this word may define the value returned
|
||
by the SIZEOF function as applied to the type.
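
A reader for the four-byte Prefix Part might look like the sketch below. The two tables simply transcribe the values listed above. The function name and the dictionary keys are hypothetical, and little-endian word order is assumed.

```python
import struct

CATEGORIES = {
    0x00: "untyped", 0x01: "array", 0x02: "record", 0x03: "object",
    0x04: "file", 0x05: "text file", 0x06: "subprogram", 0x07: "set",
    0x08: "pointer", 0x09: "string", 0x0A: "8087 floating-point",
    0x0B: "real", 0x0C: "fixed-point ordinal", 0x0D: "boolean",
    0x0E: "char", 0x0F: "enumerated",
}

# (category, modifier) pairs that identify a named type.
MODIFIERS = {
    (0x0A, 0x00): "SINGLE", (0x0A, 0x02): "EXTENDED",
    (0x0A, 0x04): "DOUBLE", (0x0A, 0x06): "COMP",
    (0x0C, 0x01): "SHORTINT", (0x0C, 0x02): "BYTE",
    (0x0C, 0x05): "INTEGER", (0x0C, 0x06): "WORD",
    (0x0C, 0x0D): "LONGINT",
}

def parse_type_prefix(buf, off):
    """Decode the Prefix Part of a Type Descriptor at offset `off`."""
    category, modifier, size = struct.unpack_from("<BBH", buf, off)
    return {
        "category": CATEGORIES.get(category, "unknown"),
        "named": MODIFIERS.get((category, modifier)),  # None for un-named variants
        "size": size,                                  # bytes of storage
        "suffix_offset": off + 4,                      # the Suffix Part follows
    }

print(parse_type_prefix(bytes([0x0C, 0x05, 0x02, 0x00]), 0))
```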
|
||
|
||
|
||
|
||
4.4.4.3 SUFFIX PARTS
|
||
|
||
|
||
Suffix Parts further refine the implementation details of the type and
|
||
also provide subrange constraints where appropriate. In some cases
|
||
the Suffix part is empty since all semantic data for the type is
|
||
contained in the Prefix part.
|
||
|
||
|
||
|
||
4.4.4.3.1 UN-TYPED
|
||
|
||
|
||
This Suffix Part is empty. Nothing is known about an un-typed entity.
|
||
|
||
4.4.4.3.2 STRUCTURED TYPES
|
||
|
||
|
||
The structured types represent aggregates of lower-level types. We
|
||
include ARRAY, RECORD, OBJECT, FILE, TEXT, SET, POINTER and STRING
|
||
types in this category.
|
||
|
||
|
||
|
||
4.4.4.3.2.1 ARRAY TYPES
|
||
|
||
|
||
The Suffix Part of the ARRAY type is so constructed as to be able to
|
||
support recursive or nested definition of arrays. The suffix format
|
||
is as follows:
|
||
|
||
+00: An LG that locates the Type Descriptor for the "base-type"
|
||
of the array. This is the type of the entity being
|
||
arrayed and may itself be an array.
|
||
|
||
+04: An LG that locates the Type Descriptor for the array
|
||
bounds which is a constrained ordinal type or subrange.
|
||
|
||
|
||
|
||
4.4.4.3.2.2 RECORD TYPES
|
||
|
||
|
||
RECORD types have nested scopes. The Suffix part provides a base
|
||
structure by which to locate the fields local to the scope of the
|
||
Record type itself. The format is as follows:
|
||
|
||
+00: A Word containing an LL which locates the local Hash Table
|
||
that provides access to the fields in the nested scope.
|
||
|
||
+02: A Word containing an LL which locates the Dictionary
|
||
Header of the initial field in the nested scope. This
|
||
supports a "left-to-right" traversal of the fields in a
|
||
record.
|
||
|
||
4.4.4.3.2.3 OBJECT TYPES
|
||
|
||
|
||
OBJECT types also have nested scopes. The Suffix part provides a base
|
||
structure by which to locate the fields and METHODS local to the scope
|
||
of the OBJECT type itself. In addition, inheritance and VMT
|
||
particulars are stored. The format is as follows:
|
||
|
||
+00: A Word containing an LL which locates the local Hash Table
|
||
that provides access to the fields and METHODS local to
|
||
the nested scope.
|
||
|
||
+02: A Word containing an LL which locates the Dictionary
|
||
Header of the initial field or METHOD in the nested scope.
|
||
This supports a "left-to-right" traversal of the fields
|
||
and METHODS in an OBJECT.
|
||
|
||
+04: An LG which locates the Type Descriptor of the Parent
|
||
Object. This field is zero if there is no such Parent.
|
||
|
||
+08: A Word which contains the size in bytes of the VMT for
|
||
this Object. This field is zero if the object employs no
|
||
Virtual Methods.
|
||
|
||
+0A: A Word which contains the offset within the CONST DSeg Map
|
||
that locates the VMT skeleton or template segment. This
|
||
field equals FFFFh if the object employs no Virtual
|
||
Methods.
|
||
|
||
+0C: A Word which contains the offset within an Object instance
|
||
where the NEAR POINTER to the VMT for the object is stored
|
||
(within the DATA SEGMENT). This field equals FFFFh if the
|
||
object employs no Virtual Methods.
|
||
|
||
+0E: A Word which contains an LL which locates the Dictionary
|
||
Header for the name of the OBJECT itself.
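
The sixteen-byte OBJECT suffix can be unpacked in one call. This is a sketch under the usual assumptions (little-endian words, the LG kept as four opaque bytes); the function and key names are hypothetical.

```python
import struct

OBJECT_SUFFIX = "<HH4sHHHH"   # fields at +00, +02, +04, +08, +0A, +0C, +0E

def parse_object_suffix(buf, off):
    """Decode the Suffix Part of an OBJECT Type Descriptor."""
    (hash_ll, first_ll, parent_lg, vmt_size,
     vmt_template, vmt_ptr_offset, name_ll) = struct.unpack_from(OBJECT_SUFFIX, buf, off)
    has_vmt = vmt_size != 0
    return {
        "hash_table_ll": hash_ll,
        "first_member_ll": first_ll,
        "parent_lg": None if parent_lg == b"\0\0\0\0" else parent_lg,
        "vmt_size": vmt_size,
        # FFFFh marks "no Virtual Methods" in both of the next two fields.
        "vmt_template_offset": vmt_template if has_vmt else None,
        "vmt_pointer_offset": vmt_ptr_offset if has_vmt else None,
        "name_ll": name_ll,
    }

sample = struct.pack(OBJECT_SUFFIX, 0x0100, 0x0120, b"\0\0\0\0",
                     0, 0xFFFF, 0xFFFF, 0x0090)
print(parse_object_suffix(sample, 0))
```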
|
||
|
||
|
||
|
||
4.4.4.3.2.4 FILE (NON-TEXT) TYPES
|
||
|
||
|
||
This Suffix consists of an LG that locates the Type Descriptor of the
|
||
base type of the file. Note that the Type Descriptor may be that of
|
||
an un-typed entity (for un-typed files).
|
||
|
||
|
||
|
||
4.4.4.3.2.5 TEXT FILE TYPES
|
||
|
||
|
||
This Suffix consists of an LG that locates the Type Descriptor of the
|
||
base type of the file -- in this case SYSTEM.CHAR.
|
||
|
||
4.4.4.3.2.6 SET TYPES
|
||
|
||
|
||
This Suffix consists of an LG that locates the base-type of the set
|
||
itself. Pascal limits such entities to simple ordinals whose
|
||
cardinality is limited to 256.
|
||
|
||
|
||
|
||
4.4.4.3.2.7 POINTER TYPES
|
||
|
||
|
||
This Suffix consists of an LG that locates the base-type of the entity
|
||
pointed at.
|
||
|
||
|
||
|
||
4.4.4.3.2.8 STRING TYPES
|
||
|
||
|
||
This is a special case of an ARRAY type. The format is as follows:
|
||
|
||
+00: An LG to the Type Descriptor SYSTEM.CHAR which is the base
|
||
type of all Turbo Pascal Strings.
|
||
|
||
+04: An LG to the Type Descriptor for the array bounds
|
||
constraints for the string.
|
||
|
||
|
||
|
||
4.4.4.3.3 FLOATING-POINT TYPES
|
||
|
||
|
||
The Suffix part for all Floating-Point types is EMPTY. All data
|
||
needed to specify these approximate number types is contained in the
|
||
Prefix part. The Types included in this class are SINGLE, DOUBLE,
|
||
EXTENDED, COMP and REAL.
|
||
|
||
|
||
|
||
4.4.4.3.4 ORDINAL TYPES
|
||
|
||
|
||
The Ordinal Types consist of the various "integer" types plus the
|
||
BOOLEAN, CHAR and Enumerated types.
|
||
|
||
4.4.4.3.4.1 "INTEGERS"
|
||
|
||
|
||
These types include BYTE, SHORTINT, WORD, INTEGER and LONGINT.  Their
Suffix parts are identical in format:
|
||
|
||
+00: A double-word containing the LOWER bound of the subrange
|
||
constraint on the type;
|
||
|
||
+04: A double-word containing the UPPER bound of the subrange
|
||
constraint on the type;
|
||
|
||
+08: An LG that locates the Type Descriptor of the largest
|
||
upward compatible type. This is the Type Descriptor that
|
||
is used to control the width of an un-typed constant in
|
||
the dictionary stub. For the "integer" types, this is an
|
||
LG to SYSTEM.LONGINT.
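
The twelve-byte ordinal suffix can be read as sketched below. The BOOLEAN, CHAR and Enumerated suffixes described in the following sections share the same three fields. The sketch assumes that the bounds are signed double-words (which seems necessary for SHORTINT, INTEGER and LONGINT) and that storage is little-endian; the names are hypothetical.

```python
import struct

def parse_ordinal_suffix(buf, off):
    """Decode lower bound, upper bound and the LG at +08 of an ordinal suffix."""
    lower, upper, compat_lg = struct.unpack_from("<ll4s", buf, off)
    return {
        "lower": lower,
        "upper": upper,
        "cardinality": upper - lower + 1,   # number of values in the subrange
        "compatible_lg": compat_lg,         # e.g. an LG to SYSTEM.LONGINT for integers
    }

# A SHORTINT-like subrange; the LG bytes are placeholders.
sample = struct.pack("<ll4s", -128, 127, b"\x10\x00\x00\x00")
print(parse_ordinal_suffix(sample, 0))
```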
|
||
|
||
|
||
|
||
4.4.4.3.4.2 BOOLEANS
|
||
|
||
|
||
This type Suffix has the following format:
|
||
|
||
+00: A double-word containing the LOWER bound of the subrange
|
||
constraint on the type;
|
||
|
||
+04: A double-word containing the UPPER bound of the subrange
|
||
constraint on the type;
|
||
|
||
+08: An LG that locates the Type Descriptor SYSTEM.BOOLEAN.
|
||
There is no "upward compatible" type.
|
||
|
||
|
||
|
||
4.4.4.3.4.3 CHARS
|
||
|
||
|
||
This type Suffix has the following format:
|
||
|
||
+00: A double-word containing the LOWER bound of the subrange
|
||
constraint on the type;
|
||
|
||
+04: A double-word containing the UPPER bound of the subrange
|
||
constraint on the type;
|
||
|
||
+08: An LG that locates the Type Descriptor SYSTEM.CHAR. There
|
||
is no "upward compatible" type.
|
||
|
||
4.4.4.3.4.4 ENUMERATIONS
|
||
|
||
|
||
This type Suffix is unusual and has the following format:
|
||
|
||
+00: A double-word containing the LOWER bound of the subrange
|
||
constraint on the type;
|
||
|
||
+04: A double-word containing the UPPER bound of the subrange
|
||
constraint on the type;
|
||
|
||
+08: An LG that locates the Prefix of the current Type
|
||
Descriptor. There is no upward compatible type.
|
||
|
||
What follows is a full-fledged SET Type Descriptor whose base type is
|
||
the Type Descriptor of the Enumerated Type itself. The author has not
|
||
yet discovered the reason for this.
|
||
|
||
|
||
|
||
4.4.4.3.5 SUBPROGRAM TYPES
|
||
|
||
|
||
The length of this Suffix is variable. The format is as follows:
|
||
|
||
+00: An LG that locates the Type Descriptor of the FUNCTION
|
||
result returned by the Subprogram. This field is zero if
|
||
the Subprogram is a PROCEDURE.
|
||
|
||
+04: A Word that contains the number of Formal Parameters in
|
||
the Function/Procedure header. If non-zero, then this
|
||
word is followed by the parameter list itself as a simple
|
||
array of parameter descriptors.
|
||
|
||
The format of a parameter descriptor is as follows:
|
||
|
||
0000: An LG that locates the Type Descriptor of the
|
||
corresponding parameter;
|
||
|
||
0004: A Byte that identifies the parameter passing
|
||
mechanism used for this entry as follows:
|
||
|
||
02h -> VALUE of parameter is passed on STACK,
|
||
06h -> ADDRESS of parameter is passed on STACK.
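
Because the parameter list makes this Suffix variable in length, a reader has to walk it entry by entry. The following sketch assumes little-endian words and keeps each LG as four opaque bytes; the function name and result keys are hypothetical.

```python
import struct

PASSING = {0x02: "value", 0x06: "address"}

def parse_subprogram_suffix(buf, off):
    """Decode the Suffix Part of a SUBPROGRAM Type Descriptor."""
    result_lg, count = struct.unpack_from("<4sH", buf, off)
    pos = off + 6
    params = []
    for _ in range(count):                      # each descriptor is 5 bytes
        type_lg, mode = struct.unpack_from("<4sB", buf, pos)
        params.append((type_lg, PASSING.get(mode, "unknown")))
        pos += 5
    return {
        "is_function": result_lg != b"\0\0\0\0",  # a zero LG means PROCEDURE
        "result_lg": result_lg,
        "params": params,
        "end": pos,                               # first byte after the suffix
    }

# A procedure with one value parameter and one address (VAR) parameter.
sample = (struct.pack("<4sH", b"\0\0\0\0", 2)
          + struct.pack("<4sB", b"\x20\0\0\0", 0x02)
          + struct.pack("<4sB", b"\x30\0\0\0", 0x06))
print(parse_subprogram_suffix(sample, 0))
```

The end offset is useful in the "S" stub, where INLINE code text, if any, begins immediately after this variable-length descriptor.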
|
||
|
||
5. MAPS AND LISTS
|
||
|
||
|
||
The "MAPS and LISTS" are not part of the symbol dictionary. Rather,
|
||
these structures provide access to the Code and Data Segments produced
|
||
by the compiler or included via the {$L name.OBJ} directive. The
|
||
format and purpose (as understood by this author) of each of these
|
||
tables is explained in the following sections.
|
||
|
||
|
||
|
||
5.1 PROC MAP
|
||
|
||
|
||
The PROC Map provides a means of associating the various Function and
|
||
Procedure declarations with the Code Segments. There is some evidence
|
||
that the Compiler produces CODE (and DATA) Segments for EACH of the
|
||
Subprograms defined in the Unit as well as for the un-named Unit
|
||
Initialization code block. There is also evidence that EXTERNAL PROCs
|
||
|
|
||
must be assembled separately in order to exploit fully the Turbo
|
||
"Smart Linker" since Turbo Pascal places some significant restrictions
|
||
on EXTERNAL routines in the area of Segment Names and Types.
|
||
Specifically, only code segments named "CODE" and data segments named
|
||
"DATA" will be used by the "Smart Linker" as sources of code and data
|
||
for inclusion in a Turbo Pascal .EXE file.
|
||
|
||
The first entry in the PROC Map is reserved for the Unit
Initialization block.  If there is no Unit Initialization block, this
entry is filled with $FF.  In addition, every PROC in the Unit has an
entry in this table.
|
||
|
||
If an EXTERNAL routine is included, then ALL PUBLIC PROC definitions
|
||
in that routine must be declared in the Unit Source Code with the
|
||
EXTERNAL attribute.
|
||
|
||
The size of the PROC Map Table (in Bytes) is implied in the Unit
|
||
Header by the LL's that occur at offsets +0C and +0E.
|
||
|
||
The Format of a single PROC Map Entry is as follows:
|
||
|
||
+00: A Word that contains an offset within the CSeg Map. This
|
||
is used to locate the code segment containing the PROC.
|
||
|
||
+02: A Word that contains an offset within the CODE Segment
|
||
that defines the PROC entry point relative to the load
|
||
point of the referenced CODE Segment.
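
A PROC Map that has already been extracted as a byte string can be split into its four-byte entries as sketched below. The sketch assumes little-endian words and treats a first entry consisting entirely of FFh bytes as "no initialization block"; the names are hypothetical.

```python
import struct

def parse_proc_map(table):
    """Split a PROC Map into (CSeg Map offset, entry-point offset) pairs."""
    if len(table) % 4:
        raise ValueError("PROC Map size is not a multiple of 4")
    entries = [struct.unpack_from("<HH", table, i) for i in range(0, len(table), 4)]
    # The first entry belongs to the Unit Initialization block, if there is one.
    init = None if table[:4] == b"\xff\xff\xff\xff" else entries[0]
    return {"init": init, "procs": entries[1:]}

table = b"\xff" * 4 + struct.pack("<HH", 0, 0x0010) + struct.pack("<HH", 8, 0)
print(parse_proc_map(table))
```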
|
||
|
||
5.2 CSEG MAP
|
||
|
||
|
||
The CSeg Map provides a convenient descriptor table for each CODE
|
||
Segment present in the Unit and serves to relate these segments with
|
||
the Segment Relocation Data and the Segment Trace Table. It seems
|
||
reasonable to infer that the "Smart Linker" is able to include/exclude
|
||
code/data at the SEGMENT level only.
|
||
|
||
The CSeg Map is an array of fixed-length records whose format is as
|
||
follows:
|
||
|
||
+00: A Word apparently reserved for use by TURBO.
|
||
|
||
+02: A Word that contains the Segment Length (in bytes).
|
||
|
||
+04: A Word that contains the Length of the Relocation Data
|
||
Table for this Code Segment (in bytes).
|
||
|
||
+06: A Word that contains the offset of the Trace Table Entry
|
||
for this Segment (if it was compiled with DEBUG Support).
|
||
If there is no Trace Table for this segment, then this
|
||
Word contains FFFFh.
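
Since the CSeg Map is an array of eight-byte records, it can be decoded with a single iteration. This is a sketch assuming little-endian words; the function and key names are hypothetical.

```python
import struct

def parse_cseg_map(table):
    """Decode every fixed-length record of a CSeg Map."""
    records = []
    for reserved, seg_len, reloc_len, trace in struct.iter_unpack("<4H", table):
        records.append({
            "segment_length": seg_len,
            "relocation_length": reloc_len,
            # FFFFh means the segment has no Trace Table entry.
            "trace_offset": None if trace == 0xFFFF else trace,
        })
    return records

table = struct.pack("<4H", 0, 0x0123, 0x0010, 0xFFFF) + struct.pack("<4H", 0, 0x0040, 0, 0)
for record in parse_cseg_map(table):
    print(record)
```

Summing segment_length over all records gives the total size of the code held in the unit, which is a quick consistency check against the file size.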
|
||
|
||
|
||
|
||
5.3 TYPED CONST DSEG MAP
|
||
|
||
|
||
The CONST DSeg Map provides a convenient descriptor table for each
|
||
DATA Segment present in the Unit which was spawned by the presence of
|
||
Typed Constants or VMT's in the Pascal Code. It serves to relate
|
||
these segments with the Segment Relocation Data and with the Code
|
||
Segments that refer to these DATA elements.
|
||
|
||
The CONST DSeg Map is an array of fixed-length records whose format is
|
||
as follows:
|
||
|
||
+00: A Word apparently reserved for use by TURBO.
|
||
|
||
+02: A Word that contains the Segment Length (in bytes).
|
||
|
||
+04: A Word that contains the Length of the Relocation Data
|
||
Table for this DATA Segment (in bytes).
|
||
|
||
+06: A Word that contains an LL which locates the OBJECT that
|
||
owns this VMT skeleton or zero if the segment is not a VMT
|
||
skeleton.
|
||
|
||
It is possible to determine the containing scope for a Typed Constant
|
||
declaration but -- unless it is for a VMT -- the job is a bit tedious.
|
||
Essentially, one has to search the Symbol Dictionary for a declaration
|
||
whose offset points to a given entry and the complete path to that
|
||
symbol must be recorded. Our program doesn't do this but it can be
|
||
done if the required dictionary entries are present.
|
||
|
||
5.4 GLOBAL VAR DSEG MAP
|
||
|
||
|
||
The VAR DSeg Map provides a convenient descriptor table for each DATA
|
||
Segment present in the Unit.
|
||
|
||
One entry exists for each CODE segment which refers to GLOBAL VAR's
|
||
allocated in the DATA Segment. These references may be seen in the
|
||
Relocation Data Table. Each EXTERNAL CSeg having a segment named DATA
|
||
also spawns an entry in this table. Only the Code Segments that meet
|
||
these criteria cause entries to be generated in the VAR Dseg Map.
|
||
|
||
The VAR DSeg Map is an array of fixed-length records whose format is
|
||
as follows:
|
||
|
||
+00: A Word apparently reserved for use by TURBO.
|
||
|
||
+02: A Word that contains the Segment Length (in bytes). This
|
||
may be zero, especially if the EXTERNAL routine contains a
|
||
DATA segment whose sole purpose is to declare one or more
|
||
EXTRN symbols that are defined in some DATA segment
|
||
external to the Assembly.
|
||
|
||
+04: A Word apparently reserved for use by TURBO.
|
||
|
||
+06: A Word apparently reserved for use by TURBO.
|
||
|
||
To determine the identity of the CSeg that owns some particular entry
|
||
in this table, examine the Relocation Data for ALL CSegs. Each CSeg
|
||
which makes reference to a DATA segment has an entry in this table.
|
||
|
||
|
||
|
||
5.5 DONOR UNIT LIST
|
||
|
||
|
||
This list contains an entry for each Unit (taken from the "USES" list)
|
||
which MAY contribute either CODE or DATA to the executable file. Not
|
||
all units do make such a contribution as some exist merely to define a
|
||
collection of Types, etc. A Unit gets into this list if there exists
|
||
a single Relocation Data Entry that references CODE or DATA in that
|
||
Unit.
|
||
|
||
The list is composed of variable-size elements whose format is as
follows:
|
||
|
||
+00: A WORD apparently reserved for use by TURBO.
|
||
|
||
+02: A variable-length String containing the unit name.
|
||
|
||
5.6 SOURCE FILE LIST
|
||
|
||
|
||
This list contains an entry for each "source" file used to compile the
|
||
Unit. This includes the Primary Pascal file, files containing Pascal
|
||
code included by means of the {$I filename.xxx} compiler directive,
|
||
and .OBJ files included by the {$L filename.OBJ} compiler directive.
|
||
|
||
The order of entries in this list is critical since it maps the CODE
|
||
segments stored in the unit. The order of the entries is as follows:
|
||
|
||
1) The Primary Pascal file;
|
||
|
||
2) All Included Pascal files;
|
||
|
||
3) All Included .OBJ files.
|
||
|
||
Mapping of CSegs to files is done as follows:
|
||
|
||
a) Each .OBJ file contributes a SINGLE Code Segment (if any).
|
||
Note that this author has not observed an .OBJ module that
|
||
contains only a DATA Segment (but that seems a distinct
|
||
possibility).
|
||
|
||
b) The Primary Pascal file (augmented by all included Pascal
|
||
Files) contributes zero or more CODE Segments.
|
||
|
||
Therefore, there are at least as many CSeg entries as .OBJ files. If
|
||
more, then the excess entries (those at the front of the list) belong
|
||
to the Pascal files that make up the Pascal source for the unit.
|
||
|
||
The format of an entry in this list is as follows:
|
||
|
||
+00: A flag byte that indicates the type of file represented;
|
||
|
||
04h -> the Primary Pascal Source File,
|
||
03h -> an Included Pascal Source File,
|
||
05h -> an .OBJ file that contains a CODE segment.
|
||
|
||
+01: A Word apparently reserved for use by the Compiler/Linker.
|
||
|
||
+03: A Word that is zero for .OBJ files and which contains the
|
||
file directory time-stamp for Pascal Files.
|
||
|
||
+05: A Word that is zero for .OBJ files and which contains the
|
||
file directory date-stamp for Pascal Files.
|
||
|
||
+07: A variable-sized string containing the filename and
|
||
extension of the file used during compilation.
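
The entry format above can be sketched as follows.  This is only an
illustration: the names SourceFile, dos_stamp and parse_source_files
are hypothetical, the filename is ASSUMED to be a Pascal-style string
(length byte first), and the time and date Words are ASSUMED to use
the standard MS-DOS directory encoding (time = hhhhh mmmmmm sssss with
seconds halved, date = yyyyyyy mmmm ddddd with the year counted from
1980).

```python
import struct
from datetime import datetime
from typing import List, NamedTuple, Optional

FILE_KINDS = {0x04: "primary", 0x03: "include", 0x05: "obj"}


class SourceFile(NamedTuple):
    kind: str
    stamp: Optional[datetime]  # None for .OBJ files (both Words are zero)
    name: str


def dos_stamp(date_word: int, time_word: int) -> Optional[datetime]:
    """Decode an MS-DOS directory date/time pair."""
    if date_word == 0 and time_word == 0:
        return None
    return datetime(1980 + (date_word >> 9), (date_word >> 5) & 0x0F,
                    date_word & 0x1F, time_word >> 11,
                    (time_word >> 5) & 0x3F, (time_word & 0x1F) * 2)


def parse_source_files(data: bytes) -> List[SourceFile]:
    files: List[SourceFile] = []
    off = 0
    while off < len(data):
        # flag byte, reserved Word, time Word, date Word, assumed length byte
        flag, _reserved, time_w, date_w, length = struct.unpack_from(
            "<BHHHB", data, off)
        name = data[off + 8:off + 8 + length].decode("ascii", "replace")
        kind = FILE_KINDS.get(flag, "unknown %02Xh" % flag)
        files.append(SourceFile(kind, dos_stamp(date_w, time_w), name))
        off += 8 + length
    return files
```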
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
5.7 DEBUG TRACE TABLE
|
||
|
||
|
||
If Debug support was selected at compile time, then all Pascal code
|
||
which supports Debugging produces an entry in this table. The table
|
||
entries themselves are variable in size and have the following format:
|
||
|
||
+00: A Word which contains an LL that locates the Directory
|
||
Header of the Symbol (a PROC name) this entry represents.
|
||
|
||
+02: A Word which contains the offset (within the Source File
|
||
List) of the entry that names the file that generated the
|
||
CSeg being traced. This allows the file included by means
|
||
of the {$I filename} directive to be identified for DEBUG
|
||
purposes, as well as code produced from the Primary File.
|
||
|
||
+04: A Word containing the number of bytes of data that precede
|
||
the BEGIN statement code in the segment. For Pascal PROCS
|
||
these bytes consist of literal constants, un-typed
|
||
|
|
||
constants, and other data such as range-checking limits,
|
||
|
|
||
etc.
|
||
|
||
+06: A Word containing the Line Number of the BEGIN statement
|
||
for the PROC.
|
||
|
||
+08: A Word containing the number of lines of Source Code to
|
||
Trace in this Segment.
|
||
|
||
+0A: An array of bytes whose size is at least the number of
     source code lines in the PROC.  Each byte contains the
     number of bytes of object code in the corresponding source
     line.  This appears to be an array of SHORTINT since if a
     "line" contains more than 127 bytes, then a single byte of
     $80 precedes the actual byte count as a sort of "escape"
     and the next byte records the count (up to 255 bytes) for
     the line.  This situation has not yet been fully explored.
     We do not yet know what happens in the event a line is
     credited with spawning more than 255 bytes of code.
|
||
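
The following sketch reads one table entry, including the $80
"escape" described for the array at +0A.  The names TraceEntry,
parse_trace_entry and line_offsets are hypothetical.  Two things are
ASSUMED rather than known: that the array holds exactly one count per
traced line, and that consecutive counts belong to consecutive source
lines starting at the BEGIN line.  Lines credited with more than 255
bytes are not handled, since their encoding is unknown.

```python
import struct
from typing import List, NamedTuple, Tuple


class TraceEntry(NamedTuple):
    symbol_ll: int         # +00: LL of the PROC's Directory Header
    source_offset: int     # +02: offset within the Source File List
    data_bytes: int        # +04: bytes of data preceding the BEGIN code
    begin_line: int        # +06: line number of the BEGIN statement
    line_sizes: List[int]  # +0A: object-code bytes per source line


def parse_trace_entry(data: bytes, off: int = 0) -> Tuple[TraceEntry, int]:
    """Return the entry at 'off' and the offset just past it."""
    symbol_ll, source_offset, data_bytes, begin_line, n_lines = \
        struct.unpack_from("<5H", data, off)
    pos = off + 10
    sizes: List[int] = []
    for _ in range(n_lines):
        size = data[pos]
        pos += 1
        if size == 0x80:          # escape: the real count is the next byte
            size = data[pos]
            pos += 1
        sizes.append(size)
    entry = TraceEntry(symbol_ll, source_offset, data_bytes, begin_line, sizes)
    return entry, pos


def line_offsets(entry: TraceEntry) -> List[Tuple[int, int]]:
    """Pair each line number with the assumed code offset where it starts."""
    result = []
    code_off = entry.data_bytes   # code starts after the leading data
    for i, size in enumerate(entry.line_sizes):
        result.append((entry.begin_line + i, code_off))
        code_off += size
    return result
```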
|
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
6. CODE, DATA, RELOCATION INFO
|
||
|
||
|
||
This area begins at the start of the next free PARAGRAPH (a 16-byte
boundary).  This means that its offset from the beginning of the Unit,
written in hexadecimal, ALWAYS ends in the digit zero.
|
||
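
The rounding implied by paragraph alignment can be written in one
line.  The function name next_paragraph is hypothetical; the only fact
relied on is that an 8086 paragraph is 16 bytes.

```python
def next_paragraph(offset: int) -> int:
    """Round a file offset up to the next 16-byte (paragraph) boundary."""
    return (offset + 15) & ~15
```

An offset that already lies on a boundary is returned unchanged, so
next_paragraph(0x123) gives 0x130 and next_paragraph(0x130) also gives
0x130.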
|
||
This area contains the CODE segments, CONST DATA segments, and the
|
||
Relocation Data required for linking.
|
||
|
||
|
||
|
||
6.1 OBJECT CSEGS
|
||
|
||
|
||
Each CODE segment included in the unit appears here as specified by
|
||
the CSeg Map Table. Depending on usage, these segments may appear in
|
||
the executable file. There are no filler bytes between segments.
|
||
|
||
|
||
|
||
6.2 CONST DSEGS
|
||
|
||
|
||
This section begins at the start of the first free PARAGRAPH following
the end of the Object CSegs.  This means that its offset from the
beginning of the Unit, written in hexadecimal, ALWAYS ends in the
digit zero.
|
||
|
||
A DATA segment fragment appears here for each CSeg that declares a
|
||
typed constant, and for each OBJECT which employs Virtual Methods.
|
||
There are no filler bytes between segments.
|
||
|
||
If local symbols were generated, there is always enough information to
|
||
allow documenting the scope of the declaration as well as interpreting
|
||
the data in the display since the needed type declarations would also
|
||
be available. Our program doesn't go to this extreme however.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
6.3 RELOCATION DATA TABLE
|
||
|
||
|
||
This table begins at the start of the first free PARAGRAPH following
the end of the CONST DSegs.  This means that its offset from the
beginning of the Unit, written in hexadecimal, ALWAYS ends in the
digit zero.  There are two sections in this table: one for code, and
one for data.  Both sections are aligned on paragraph boundaries.
This may result in a "slack" entry between the code and data
sub-sections, but this entry is included in the byte tally for the
section stored in the Unit Header Table at ULPtch (offset +20).
|
||
|
|
||
|
||
The table begins with entries for the CSeg Map and ends with entries
|
||
for the CONST DSeg Map. The appropriate Map entry specifies the
|
||
number of bytes of Relocation Data for the corresponding segment.
|
||
This number may be zero in which case there is no Relocation Data for
|
||
the given segment.
|
||
|
|
||
|
||
The Table consists of an array of eight (8) byte entries whose format
|
||
is as follows:
|
||
|
||
+00: A Byte containing the offset within the Donor Unit List of
|
||
the Unit name that this entry refers to. This can be the
|
||
compiled Unit or some previously compiled external unit.
|
||
|
||
+01: A Byte that defines the type of reference being made and
|
||
implies the size of the pointer needed (WORD or DWORD).
|
||
The known and/or observed values are as follows:
|
||
|
||
00h -> a WORD refers to a PROC Map.
10h -> a WORD refers to a PROC Map.
20h -> a WORD refers to a PROC Map.
30h -> a DWORD pointer refers to a PROC Map.
50h -> a WORD refers to a CSeg Map.
60h -> a WORD refers to an unknown Map.
70h -> a DWORD pointer refers to a CSeg Map.
90h -> a WORD refers to a VAR DSeg Map.
A0h -> a WORD refers to a DSeg Map for SEG address.
D0h -> a WORD refers to a CONST DSeg Map.
|
||
|
||
+02: A Word containing the offset within the Map table
|
||
referenced according to the above code scheme.
|
||
|
||
+04: A Word containing an offset within the target segment
|
||
which will be added to the effective address. For
|
||
example, a reference to the VAR DSeg Map will require a
|
||
final offset to locate the item (variable) within the DATA
|
||
SEGMENT being referenced here. This may also be needed
|
||
for references to LITERAL DATA embedded in a CODE SEGMENT.
|
||
|
||
+06: A Word containing the offset within the CODE or DATA
|
||
segment owning this entry that contains the area to be
|
||
|
|
||
patched with the value of the final effective address.
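
The eight-byte entry described above can be unpacked as sketched
here.  The names RelocEntry and parse_relocs are hypothetical, as are
the field names; the sketch assumes the caller passes exactly the
bytes of Relocation Data that belong to one segment.

```python
import struct
from typing import List, NamedTuple


class RelocEntry(NamedTuple):
    unit_offset: int   # +00: offset within the Donor Unit List
    ref_type: int      # +01: type of reference (00h, 10h, ... D0h)
    map_offset: int    # +02: offset within the Map table referenced
    adjustment: int    # +04: offset added to the effective address
    patch_offset: int  # +06: offset of the area to be patched


def parse_relocs(data: bytes) -> List[RelocEntry]:
    """Split one segment's Relocation Data into 8-byte entries."""
    if len(data) % 8 != 0:
        raise ValueError("Relocation Data length is not a multiple of 8")
    return [RelocEntry(*struct.unpack_from("<BBHHH", data, off))
            for off in range(0, len(data), 8)]
```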
|
||
|
|
||
|
||
|
||
|
||
|
||
|
||
For some truly wild guessing about the flag byte above (the type byte
at offset +01), the following pattern seems to be emerging.  Look at
bits 7-4 of this byte.  It appears that the type of Map reference may
be coded into bits 7-6 and that the size or type of reference may be
coded into bits 5-4.  Note that bits 7-6 are "00" for PROC Map items,
"01" for CSeg Map items, "10" for Global DSeg Map items, and "11" for
Const DSeg Map items.  As for bits 5-4, all FAR (DWORD) pointer
references show these bits as "11", a SEGMENT Register value appears
as "10", and WORD values otherwise appear as "01" or "00".  Further,
no type 00h item has been seen which has a non-zero effective address
adjustment.  This all seems to suggest the following code structure:
|
||
|
|
||
|
||
7654 3210   (bits 3-0 don't seem to be used)

00-- ----   Locate item via a PROC Map,
01-- ----   Locate item via a CSeg Map,
10-- ----   Locate item via a Global DSeg Map,
11-- ----   Locate item via a Const DSeg Map,
--00 ----   WORD offset has NO effective address adjustment,
--01 ----   WORD offset HAS an effective address adjustment,
--10 ----   WORD is content of a SEGMENT Register such as DS
            or CS.
--11 ----   DWORD (FAR) pointer is supplied with possible
            effective address adjustment.
|
||
|
|
||
|
||
The evidence in support of this conjecture is both slim and vast. It
|
||
|
|
||
all depends on how much data one looks at. I have looked at a lot of
|
||
|
|
||
data from the Borland supplied units and I haven't found anything to
|
||
|
|
||
refute the above. Accordingly, the supplied program interprets this
|
||
|
|
||
flag byte according to this scheme.
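
A minimal sketch of the conjectured scheme follows.  It is not the
code of the supplied program; the names MAP_KINDS, REF_KINDS and
decode_ref_type are hypothetical, and the interpretation itself is
only the guess described above.

```python
from typing import Tuple

MAP_KINDS = ("PROC Map", "CSeg Map", "Global DSeg Map", "Const DSeg Map")
REF_KINDS = ("WORD offset, no adjustment",
             "WORD offset with adjustment",
             "WORD segment value",
             "DWORD (FAR) pointer")


def decode_ref_type(flag: int) -> Tuple[str, str]:
    """Split the flag byte into (map kind, reference kind)."""
    map_kind = MAP_KINDS[(flag >> 6) & 3]   # bits 7-6
    ref_kind = REF_KINDS[(flag >> 4) & 3]   # bits 5-4
    return map_kind, ref_kind               # bits 3-0 are ignored
```

For example, decode_ref_type(0x70) yields ("CSeg Map", "DWORD (FAR)
pointer"), which agrees with the observed meaning of type 70h.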
|
||
|
|
||
|
||
|
||
|
||
7. SUPPLIED PROGRAM
|
||
|
||
|
||
In order that the above information be made constructively useful, the
|
||
author has designed a program that automates the process of discovery.
|
||
It is not a "handsome" program and it is not a work of art. It does
|
||
give useful results provided your PC has enough available memory.
|
||
|
||
It should be obvious that the program was not designed "top-down".
|
||
Rather, it just evolved as each new discovery was made. Later on, it
|
||
seemed reasonable to try to document some of the relations between the
|
||
various lists and tables and the program tries to make some of these
|
||
relations clear, albeit with varying degrees of success.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
7.1 TPUNEW
|
||
|
|
||
|
||
|
||
This is the main program. It will ask for the name of the unit to be
|
||
documented. Reply with the unit name only. The program will append
|
||
the ".TPU" extension and will search for the proper file.
|
||
|
||
The program will then ask if Dis-Assembly is desired and will require
|
||
a "y" or "n" answer.
|
||
|
||
The current directory will be searched first, followed by all
|
||
directories in the current PATH. The program will NOT search a ".TPL"
|
||
(Turbo Pascal Library) file.
|
||
|
||
If the desired unit is found, the program will write a report to the
|
||
current directory named "unitname.lst" which contains its analysis.
|
||
The format of the report is such that it may be copied to a printer if
|
||
that printer supports TTY control codes with form-feeds. Be judicious
|
||
in doing this however since there can be a lot of information. The
|
||
Turbo SYSTEM.TPU unit file produces almost ninety (90) pages without
|
||
|
|
||
the disassembly option. When disassembly is requested for the SYSTEM
|
||
|
|
||
unit, the size of the output file exceeds 700K bytes.
|
||
|
|
||
|
||
|
||
|
||
7.2 TPURPT1
|
||
|
||
|
||
This is a Unit that contains the text-file output primitives required
|
||
by the main program. It's not very pretty but it does work.
|
||
|
||
|
||
|
||
7.3 TPUAMS1
|
||
|
||
|
||
This Unit contains all Type Definitions, Structures, and "Canned"
|
||
Functions and Procedures required by the main program. All structures
|
||
documented in this report are also documented in TPUAMS1 by means of
|
||
the TYPE mechanism. Some of the structures are difficult if not
|
||
impossible to handle using ISO Pascal but Turbo Pascal provides the
|
||
means for getting the job done.
|
||
|
||
|
||
|
||
7.4 TPUUNA1
|
||
|
||
|
||
This unit is a rudimentary disassembler. The output will not assemble
|
||
and may look strange to a real assembler programmer since this author
|
||
is not so-qualified. However, the basis for support of 80286, 80386
|
||
etc. processors is present as well as coprocessor support. Of perhaps
|
||
the greatest interest is that it does appear to decode the emulated
|
||
coprocessor instructions that are implemented via INT 34-3D.
|
||
|
||
|
||
|
||
|
||
Be warned however. The output is not guaranteed since this was coded
|
||
by myself and I am perhaps the rankest amateur that ever approached
|
||
this quite awful assembler language. For convenience, the operand
|
||
coding mimics TASM "Ideal" mode.
|
||
|
||
As is usual with programs of this type, error-recovery is minimal and
|
||
no context checking is performed. If the operation code is found to
|
||
be valid, then a valid instruction is assumed -- even if invalid
|
||
operands are present.
|
||
|
||
The only positives that apply to this program are that it doesn't slow
|
||
the cpu down (although a lot more output is produced), and it does let
|
||
one "tune" code for compactness by letting one view the results of the
|
||
coding directly. Also, incomplete instructions are handled as data
|
||
|
|
||
rather than overrunning into the next proc.
|
||
|
|
||
|
||
|
||
|
||
7.5 MODIFICATIONS
|
||
|
||
|
||
It was intended from the beginning that this program should be able to
|
||
be enhanced to permit external units to be referenced during the
|
||
analysis of any given unit, even if they were library components. The
|
||
author hopes that users so-inclined will find the code pliable enough
|
||
to engineer such enhancements. No small amount of care was expended
|
||
to make pointer references flexible enough so that more than one unit
|
||
could be addressed at one time. However, none of the references to
|
||
external units are resolved by the program as it now stands.
|
||
|
||
This program was NOT intended as a pilot for some future product. It
|
||
|
|
||
WAS intended as a rather "ersatz" tool for myself.
|
||
|
|
||
|
||
|
||
|
||
7.6 NOTES ON PROGRAM LOGIC
|
||
|
|
||
|
||
|
||
The following sections discuss a few of the methods employed by the
|
||
supplied program.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
7.6.1 FORMATTING THE DICTIONARY
|
||
|
|
||
|
||
|
||
Printing the unit dictionary area in a way that exposes its underlying
|
||
|
|
||
semantics is no small task. The unit dictionary area itself is a
|
||
|
|
||
rather amorphous-looking mass of data composed of hash tables,
|
||
|
|
||
dictionary headers and stubs, type descriptors, etc. In order to
|
||
|
|
||
present all this information in a meaningful way, we have to reveal
|
||
|
|
||
its structure and this cannot be done by means of a sequential
|
||
|
|
||
"browse" technique. Rather, we have to visit all nodes in the
|
||
|
|
||
dictionary area so that each may be formatted in a way that exposes
|
||
|
|
||
its function and meaning.  This is made necessary by the fact that
|
||
|
|
||
items are added to the dictionary as encountered and no convenient
|
||
|
|
||
ordering of entry types exists. What we have here is the problem of
|
||
|
|
||
finding a minimal "cover" for the dictionary area that properly
|
||
|
|
||
exposes the content and structure of the dictionary area.
|
||
|
|
||
|
||
To do this, we construct (in the heap) a stack and a queue, both of
|
||
|
|
||
which are initially empty. The entries we put in the stack identify
|
||
|
|
||
the class of entry (Hash Table, Dictionary Header, Type Descriptor or
|
||
|
|
||
In-Line Code group), the location of the structure, and the location
|
||
|
|
||
of its immediate "owner" or "parent" dictionary entry (which allows
|
||
|
|
||
some limited information about scope to be printed).
|
||
|
|
||
|
||
To the empty stack, we add an entry for the unit name dictionary
|
||
|
|
||
entry, the INTERFACE hash table, and the DEBUG hash table. All these
|
||
|
|
||
are located via direct pointers (LL's) in the Unit Header Table. We
|
||
|
|
||
then pop one entry off the stack and begin our analysis.
|
||
|
|
||
|
||
a) If the entry we popped off the stack is not present in the
|
||
|
|
||
queue, we add it and call a routine that can interpret the entry
|
||
|
|
||
(aka, "cover") for a Dictionary Header, Hash Table, or Type
|
||
|
|
||
Descriptor. (This may lead to additional entries being added to
|
||
|
|
||
the stack such as nested-scope hash tables, Dictionary Headers,
|
||
|
|
||
Type Descriptors or In-Line Code group entries.)
|
||
|
|
||
|
||
b) While the stack is not empty, we pop another entry and repeat
|
||
|
|
||
step "a" (above) until no more entries are available.
|
||
|
|
||
|
||
The result is a queue containing one entry for each structure in the
|
||
|
|
||
unit dictionary area that is identifiable via traversal. (In
|
||
|
|
||
practice, the method we use is similar to a "breadth-first" traversal
|
||
|
|
||
of an n-way tree that is implemented in non-recursive fashion.) Each
|
||
|
|
||
entry in the queue contains the information described above and the
|
||
|
|
||
queue itself thus forms a set of descriptors that drive the process of
|
||
|
|
||
formatting the dictionary area for display. The process may be
|
||
|
|
||
likened to "painting by the numbers" or to finding a way to lay tile
|
||
|
|
||
on a flat surface using tiles of four different irregular shapes until
|
||
|
|
||
the floor is exactly covered.
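
The stack-and-queue procedure above can be sketched in a few lines.
This is an illustration only, not the code of the supplied program.
The names Node and cover_dictionary are hypothetical, "interpret"
stands for whatever routine formats a structure and reports the
structures it refers to, and the sketch ASSUMES that two stack entries
denote the same structure when their class and location agree.

```python
from typing import Callable, Iterable, List, NamedTuple, Set, Tuple


class Node(NamedTuple):
    kind: str       # "hash", "header", "type" or "inline"
    location: int   # offset of the structure in the dictionary area
    owner: int      # offset of its immediate "parent" entry


def cover_dictionary(roots: Iterable[Node],
                     interpret: Callable[[Node], Iterable[Node]]
                     ) -> List[Node]:
    stack: List[Node] = list(roots)   # work still to be done
    queue: List[Node] = []            # structures already identified
    seen: Set[Tuple[str, int]] = set()
    while stack:
        node = stack.pop()
        key = (node.kind, node.location)
        if key in seen:               # already present in the queue
            continue
        seen.add(key)
        queue.append(node)
        stack.extend(interpret(node))  # may add nested structures
    return queue
```

Because a structure is queued only on its first visit, the owner
recorded for it is whichever referrer happened to be popped first,
which is one reason the "parent" cannot always be trusted.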
|
||
|
|
||
|
||
There is one significant limitation that needs to be pointed out. It
|
||
|
|
||
is not always possible to determine the "parent" or "owner" of a node
|
||
|
|
||
with certainty. The following discussion illustrates the problem of
|
||
|
|
||
finding the "real" parent of a Type Descriptor.
|
||
|
|
||
|
||
|
||
|
||
|
||
Almost every "type" in Pascal is actually derived from the basic types
|
||
|
|
||
that are (in Turbo Pascal) defined in the SYSTEM.TPU unit -- e.g.
|
||
|
|
||
"INTEGER", "BYTE", etc. In addition, several of the Type Descriptors
|
||
|
|
||
in the SYSTEM unit are referenced by more than one Dictionary Entry.
|
||
|
|
||
Thus, we find that a "many-to-one" relationship may exist between
|
||
|
|
||
Dictionary Entries and Type Descriptors. How does one find out which
|
||
|
|
||
is the entry that actually gave rise to the Type Descriptor?
|
||
|
|
||
|
||
The Dictionary Area of a unit has some special properties, one of
|
||
|
|
||
which is the fact that the Dictionary Entries for named Types are
|
||
|
|
||
often located quite near their primary type descriptors. The
|
||
|
|
||
Dictionary Area seems to be treated as an upward growing heap with the
|
||
|
|
||
various structures being added by Turbo as needed. This makes it
|
||
|
|
||
likely that the Type "Q" header which gives rise to a type descriptor
|
||
|
|
||
occurs earlier in the Dictionary Area than any other
|
||
|
|
||
header which refers to the same descriptor. We take advantage of this
|
||
|
|
||
property to allocate "ownership" but it may not be "fool-proof". Some
|
||
|
|
||
type descriptors are spawned by other type descriptors, especially for
|
||
|
|
||
structured types. We don't attempt to allocate "ownership" to these
|
||
|
|
||
"lower-level" descriptors.
|
||
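
The "earliest header wins" rule can be sketched as below.  The name
assign_owners is hypothetical, and the input is ASSUMED to be a list
of (header offset, descriptor offset) pairs collected while visiting
the Type headers; as noted above, the rule is a heuristic and may not
be fool-proof.

```python
from typing import Dict, Iterable, Tuple


def assign_owners(refs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each type descriptor offset to its lowest referring header."""
    owners: Dict[int, int] = {}
    for header_off, desc_off in refs:
        best = owners.get(desc_off)
        if best is None or header_off < best:
            owners[desc_off] = header_off
    return owners
```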
|
|
||
|
||
|
||
|
||
7.6.2 THE DISASSEMBLER
|
||
|
|
||
|
||
|
||
To start with, I apologize up front for mistakes which are bound to be
|
||
|
|
||
present in this routine. I am not a MASM or TASM programmer and I
|
||
|
|
||
will not pretend otherwise. This being the case, the formatting I
|
||
|
|
||
have chosen for the operands may be erroneous or misleading and might
|
||
|
|
||
(if submitted to one of the "real" assemblers) produce object code
|
||
|
|
||
quite different from what is expected. I hope not, but I have to
|
||
|
|
||
admit it's possible.
|
||
|
|
||
|
||
My intention in adding this unit was to permit tuning of object code
|
||
|
|
||
to be made possible. With practice and some effort, one can observe
|
||
|
|
||
the effect on the object module caused by specific Pascal coding.
|
||
|
|
||
Thus, where compactness is an issue of paramount importance, TPUUNA1
|
||
|
|
||
can be of help. In some cases, a simple re-arrangement of the local
|
||
|
|
||
variable declarations in a procedure can have a significant effect on
|
||
|
|
||
the size of the code if it means the difference between 1 and 2-byte
|
||
|
|
||
displacements for each instruction that references a specific local
|
||
|
|
||
variable. Potential applications along these lines seem almost
|
||
|
|
||
unlimited.
|
||
|
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
I adopted an operand format not unlike that of TASM "Ideal" mode since
|
||
|
|
||
it was more convenient to do so and looked more readable to me. I
|
||
|
|
||
relied on several reference books for guidance in decoding the entire
|
||
|
|
||
mess and I found that there were several flaws (read ERRORS) in some
|
||
|
|
||
of them which made the job that much more difficult. I then
|
||
|
|
||
compounded my problems by attempting to handle 80286 and 80386
|
||
|
|
||
specific code even though Turbo Pascal does not generate code specific
|
||
|
|
||
to these processors. I simply felt that the effort involved in
|
||
|
|
||
writing any sort of Dis-Assembly program for Turbo Pascal units was an
|
||
|
|
||
effort best experienced not more than once. With all this self-
|
||
|
|
||
flagellation out of my system once and for all, I will try to show the
|
||
|
|
||
basic strategy of the program and to explain the limitations and some
|
||
|
|
||
of the discoveries I made.
|
||
|
|
||
|
||
The routine is intended to be idiotically simple - i.e., no smarter
|
||
|
|
||
than the DEBUG command in principle. The basic idea is: pass some
|
||
|
|
||
text to the routine and get back ONE line derived from some prefix of
|
||
|
|
||
that text. Repeat as necessary until all text is gone. Thus, there
|
||
|
|
||
is no attempt to check the context of the text being processed. Also,
|
||
|
|
||
some configurations of the "modR/M" byte may be invalid for selected
|
||
|
|
||
instructions. I don't try to screen these out since the intent was to
|
||
|
|
||
look at the presumably correct code produced by TURBO Pascal -- not
|
||
|
|
||
devious assembly language. Also, this program regards WAIT operations
|
||
|
|
||
as "stand-alone" -- i.e., it doesn't check to see if a coprocessor
|
||
|
|
||
operation follows for which the WAIT might be regarded as a prefix.
|
||
|
|
||
|
||
One area of real difficulty was figuring out the Floating-Point
|
||
|
|
||
emulations used by Turbo Pascal that are implemented by means of
|
||
|
|
||
interrupts $34 through $3D. I don't know if I got it right, but the
|
||
|
|
||
results seem reasonable and consistent. In the listing, the Interrupt
|
||
|
|
||
is produced on one line, followed by its parameters on the next line.
|
||
|
|
||
The parameter line is given the op-code "EMU_xxxx" where "xxxx" is the
|
||
|
|
||
coprocessor op-code I felt was being emulated. Interrupt $3C was a
|
||
|
|
||
real puzzler but after seeing a lot of code in context, I think that
|
||
|
|
||
the segment override is communicated to the emulator by means of the
|
||
|
|
||
first byte after the $3C.
|
||
|
|
||
|
||
Normally, in a non-emulator environment, all coprocessor operations
|
||
|
|
||
(ignoring any WAIT prefixes) begin with $D8-$DF. What Borland (and
|
||
|
|
||
maybe Microsoft) seem to have done here is to change the $D8-$DF so
|
||
|
|
||
that bits 7 and 6 of this byte are replaced with the one's complement
|
||
|
|
||
of the 2-bit segment register number found in various 8086
|
||
|
|
||
instructions. This seems to be how an override for the DS register is
|
||
|
|
||
passed to the emulator. I don't KNOW this to be the correct
|
||
|
|
||
interpretation, but the code I have examined in context seems to work
|
||
|
|
||
under this scheme, so TPUUNA uses it to interpret the operand
|
||
|
|
||
accordingly.
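
The interpretation just described can be sketched as follows.  It
carries the same uncertainty as the text: the decoding of the byte
after INT $3C is a conjecture.  The names SEG_REGS, decode_int3c_byte
and emulated_esc are hypothetical.  The mapping of INT $34-$3B onto
$D8-$DF used by emulated_esc is an ASSUMPTION drawn from commonly
published descriptions of these emulator interrupts, not from this
report.

```python
from typing import Tuple

# 8086 segment register numbers: 0 = ES, 1 = CS, 2 = SS, 3 = DS
SEG_REGS = ("ES", "CS", "SS", "DS")


def decode_int3c_byte(b: int) -> Tuple[str, int]:
    """Recover (segment register, ESC opcode) from the byte after INT 3Ch."""
    seg_number = ((b >> 6) ^ 3) & 3   # undo the one's complement in bits 7-6
    esc_opcode = 0xC0 | (b & 0x3F)    # put back "11" to rebuild D8h-DFh
    return SEG_REGS[seg_number], esc_opcode


def emulated_esc(int_number: int) -> int:
    """Assumed mapping: INT 34h..3Bh stand for ESC opcodes D8h..DFh."""
    if not 0x34 <= int_number <= 0x3B:
        raise ValueError("not an ESC emulation interrupt")
    return 0xD8 + (int_number - 0x34)
```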
|
||
|
|
||
|
||
For 80x86 machines, the problem was somewhat simpler. TPUUNA takes a
|
||
|
|
||
quick look at the first byte of the text. Almost any byte is valid as
|
||
|
|
||
the initial byte of an instruction, but some instructions require more
|
||
|
|
||
than one byte to hold the complete operation code. Thus, step 1
|
||
|
|
||
classifies bytes in several ways that lead to efficient recognition of
|
||
|
|
||
valid operation codes.
|
||
|
|
||
|
||
|
||
|
||
|
||
Once the instruction has been identified in this way, it is more or
|
||
|
|
||
less easy to link to supplemental information that provides operand
|
||
|
|
||
editing guidance, etc.
|
||
|
|
||
|
||
The tables that embody the recognition scheme were constructed using
|
||
|
|
||
PARADOX 3.0 (another fine Borland product) and suitably coded queries
|
||
|
|
||
were used to generate the actual Turbo Pascal code for compilation.
|
||
|
|
||
|
||
For those that are interested, TPUUNA supports the address-size and
|
||
|
|
||
operand-size prefixes of the 80386 as well as 32-bit operands and
|
||
|
|
||
addresses, but remember that Turbo Pascal doesn't generate these.
Provision is also made for a trivial change which allows segments
that default to 32-bit mode to be handled as well.
|
||
|
|
||
|
||
There is a simple mode variable that gets passed to TPUUNA by its
|
||
|
|
||
caller which specifies the most-capable processor whose code is to be
|
||
|
|
||
handled. Codes are provided for the 8086 (8088 is the same), 80186
|
||
|
|
||
(same as 80286 except no protected mode instructions), 80286 (80186
|
||
|
|
||
plus protected mode operation), and 80386.
|
||
|
|
||
|
||
No such specifier is provided for coprocessor support. What is there
|
||
|
|
||
is what I think an 80387 supports. I don't think that this is really
|
||
|
|
||
a problem if you don't try to use TPUUNA for anything but Turbo Pascal
|
||
|
|
||
code.
|
||
|
|
||
|
||
Error recovery is predictably simple. The initial text byte is output
|
||
|
|
||
as the operand of a DB pseudo-op and provision is made to resume work
|
||
|
|
||
at the next byte of text.
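
The overall strategy (one line per call, with a DB pseudo-op as the
fall-back) can be sketched as a small driver loop.  This is not
TPUUNA1 itself: the names disassemble and toy_decode are hypothetical,
and the toy decoder knows only NOP, RET and INT so that the recovery
path can be seen.

```python
from typing import Callable, List, Optional, Tuple

Decoder = Callable[[bytes, int], Optional[Tuple[str, int]]]


def disassemble(text: bytes, decode_one: Decoder) -> List[str]:
    """Produce one output line per instruction or per undecodable byte."""
    lines: List[str] = []
    pos = 0
    while pos < len(text):
        result = decode_one(text, pos)
        if result is None:
            # Error recovery: emit the byte as data, resume at the next one.
            lines.append("DB 0%02Xh" % text[pos])
            pos += 1
        else:
            line, length = result
            lines.append(line)
            pos += length
    return lines


def toy_decode(text: bytes, pos: int) -> Optional[Tuple[str, int]]:
    """Illustrative decoder for three opcodes only."""
    op = text[pos]
    if op == 0x90:
        return "NOP", 1
    if op == 0xC3:
        return "RET", 1
    if op == 0xCD:
        if pos + 1 >= len(text):   # incomplete instruction: treat as data
            return None
        return "INT 0%02Xh" % text[pos + 1], 2
    return None
```

Note that an instruction cut short by the end of the text is reported
as data rather than being allowed to run past the end of the segment.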
|
||
|
|
||
|
||
I hope this program is found to be useful in spite of the errors it
|
||
|
|
||
must surely contain. I have yet to make much sense of the rules for
|
||
|
|
||
MASM or TASM operand coding and I found very little of value in many
|
||
|
|
||
of the so-called "texts" on the subject. I found myself in the
|
||
|
|
||
position of that legendary American watching a Cricket match in
|
||
|
|
||
England for the first time ("You mean it has RULES?").
|
||
|
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
8. UNIT LIBRARIES
|
||
|
||
|
||
This author has examined .TPL files in passing and concludes that
|
||
their structure is trivial in the extreme. The following notes should
|
||
be of some help.
|
||
|
||
|
||
|
||
8.1 LIBRARY STRUCTURE
|
||
|
||
|
||
A Turbo Pascal Library (.TPL) file appears to be a simple catenation
|
||
of Turbo Pascal Unit (.TPU) files. Since the length of a Unit may be
|
||
determined from the Unit Header (see section 3.2), it is simple to see
|
||
that one may "browse" through a .TPL file looking for an external unit
|
||
such as SYSTEM.TPU. If this seems to be too much effort, then there
|
||
is always the TPUMOVER Utility program supplied by Borland.
|
||
|
||
|
||
|
||
8.2 THE TPUMOVER UTILITY
|
||
|
||
|
||
Quite simply, this Utility allows one to extract units from .TPL files
|
||
in order to subject them to the analysis performed by TPUNEW.  Read
|
||
your Turbo Pascal User's Guide for instructions on the operation and
|
||
use of this utility.
|
||
|
||
|
||
|
||
9. APPLICATION NOTES
|
||
|
||
|
||
One of the more obvious applications of this information would seem to
|
||
be in the area of a Cross-Reference Generator.
|
||
|
||
There is a very fine example of such a program in the public domain
|
||
that was written by Mr. R. N. Wisan called "PXL". This program has
|
||
been around since the days of Turbo Pascal Version 1. The program has
|
||
been continually enhanced by the author in the way of features and for
|
||
support of the newer Turbo Pascal versions. It does not however solve
|
||
the problem of telling one which unit contains the definition of a
|
||
given symbol. In fairness to "PXL" however, this is no small problem
|
||
since the format of .TPU files keeps changing (Turbo 5.5 Units are
|
||
not object-code compatible with Turbo 5.0 Units, and so on...) and
|
||
Mr. Wisan probably has more than enough other projects to keep himself
|
||
occupied.
|
||
|
||
However, for the user who is willing to work a little (maybe a lot?),
|
||
this document would seem to provide the information needed to add such
|
||
a function to his own pet cross-reference generator.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
10. ACKNOWLEDGEMENTS
|
||
|
||
|
||
This project would have been totally infeasible without the aid of
some very fine tools. As it was, several hundred man hours have been
expended on it and, as you can see, there are a few unresolved issues
that have been (graciously) left for others to address. The tools
used by this author consisted of:
|
||
|
||
1) Turbo Pascal 5.5 Professional by Borland International

2) Microsoft WORD (version 5.0)

3) LIST (version 6.4a) by Vernon D. Buerg

4) the DEBUG utility in MS-DOS Version 3.3

5) PARADOX 3.0 by Borland International

6) QUATTRO PRO by Borland International

7) TURBO ASSEMBLER 1.1 by Borland International

(PARADOX and QUATTRO PRO were used for data collection and analysis in
the course of coding the recognizer tables for the disassembler unit.)
|
||
|
|
||
|
||
The references listed were of great value in this project. [Intel85]
was a valuable source of information about coprocessor instructions as
well as offering hints about the differences between the 8086/8088 and
the 80286. The Borland TASM manuals ([Bor88a], [Bor88b]) offered
further information on the 80186. [Nelson] provided well-organized
data directed at the problem of disassembly, but the tables were
flawed by a number of errors which crept into my databases and which
caused much of the extra debugging effort. [Intel89] offered valuable
insights on the 80386 addressing schemes as well as the 32-bit data
extensions. Finally, [Brown] provided valuable clues on the
Floating-Point emulators used by Borland (and Microsoft?). As you can
see, the amount of hard information available to me on this project
was quite limited, since I am unaware of any other existing body of
literature on this subject.
|
||
|
|
||
|
||
That's it, folks. Does anyone wonder why it took several hundred man
hours to get to this point? It took a lot of hard (and at times
tedious) work coupled with a great many lucky guesses to achieve what
you see here.
|
||
|
||
|
||
11. REFERENCES
|
||
|
||
|
||
[Bor88a], TURBO ASSEMBLER REFERENCE GUIDE, Borland International,
          1988.

[Bor88b], TURBO ASSEMBLER USER'S GUIDE, Borland International, 1988.

[Bor88c], TURBO PASCAL REFERENCE GUIDE Version 5.0, Borland
          International, 1988.

[Bor88d], TURBO PASCAL USER'S GUIDE Version 5.0, Borland
          International, 1988.

[Bor89], TURBO PASCAL 5.5 OBJECT-ORIENTED PROGRAMMING GUIDE, Borland
          International, 1989.

[Brown], INTER489.ARC, Ralf Brown, 1989.

[Intel85], iAPX 286 PROGRAMMER'S REFERENCE MANUAL INCLUDING THE iAPX
          286 NUMERIC SUPPLEMENT, Intel Corporation, 1985, (order
          number 210498-003).

[Intel89], 386 SX MICROPROCESSOR PROGRAMMER'S REFERENCE MANUAL, Intel
          Corporation, 1989, (order number 240331-001).

[Nelson], THE 80386 BOOK: ASSEMBLY LANGUAGE PROGRAMMER'S GUIDE FOR
          THE 80386, Ross P. Nelson, Microsoft Press, 1988.

[Scanlon], 80286 ASSEMBLY LANGUAGE ON MS-DOS COMPUTERS, Leo J.
          Scanlon, Brady, 1986.
|
||