╔══════════════════════════════════════════════════════════════╗
║                                                              ║
║             The Logical Structure, Organization,             ║
║              and Management of Hard Disk Drives              ║
║                                                              ║
║                              by                              ║
║                         Steve Gibson                         ║
║                 GIBSON RESEARCH CORPORATION                  ║
║                                                              ║
║     Portions of this text originally appeared in Steve's     ║
║             InfoWorld Magazine TechTalk Column.              ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝

As our operating systems and application software have continued
to grow in size, their memory requirements have increased
steadily. A vital memory in our system is hard disk storage.

Bound within the hard disk's structure lie the answers to
questions like: What is a low level format? What does FDISK do?
What is a hard disk partition and why does DOS limit us to 32
megabytes in a partition? What does it mean to have "lost
cluster chains" or "cross-linked files?" What does it mean to
have our disks "defragmented?" Let's explore MS-DOS and PC-DOS
hard disk organization to answer these questions and others.

The first stage in preparing any hard disk for operation is
known as low level formatting. Low level formatting takes any
hard disk from its virgin "fresh from the factory" state and
prepares it for operation with a particular hard disk
controller and computer system.

Low level formatting divides each circular track into equal size
SECTORS by placing SECTOR ID HEADERS at uniform positions around
each track. The start of a sector ID is marked with a special
magnetic pattern which cannot be generated by normal recorded
data. This ADDRESS MARK allows the beginning of each sector to
be uniquely discriminated from all recorded data.

The sector ID information, which immediately follows the address
mark, contains each sector's Cylinder, Head, and Sector number, a
combination which is completely unique for each sector on the
disk. When the hard disk controller is later reading or writing
to these disk sectors, it compares the sector's pre-recorded
cylinder number to make sure that the heads haven't "mis-stepped"
and that they're flying over the proper cylinder. It then
compares the head number to verify that unreliable cabling is not
causing an improper head to be selected, and waits for the proper
sector to start by comparing the pre-recorded sector number as it
passes by with the sector number for which it is searching.

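The controller's three-way check can be sketched in a few lines
of Python. This is only an illustration of the comparison order
described above, not real controller logic; the names SectorID
and find_sector are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectorID:
    """The three numbers recorded in each sector ID header."""
    cylinder: int
    head: int
    sector: int


def find_sector(track_ids, target):
    """Scan ID headers in the order they pass under the head.

    Returns the position of the matching header. Raises if the heads
    are over the wrong cylinder or the wrong head is selected.
    """
    for position, sid in enumerate(track_ids):
        if sid.cylinder != target.cylinder:
            raise RuntimeError("mis-step: heads are over the wrong cylinder")
        if sid.head != target.head:
            raise RuntimeError("wrong head selected")
        if sid.sector == target.sector:
            return position
    raise LookupError("sector ID not found on this track")


# A made-up track: cylinder 10, head 2, sectors numbered 1 through 17.
track = [SectorID(10, 2, s) for s in range(1, 18)]
print(find_sector(track, SectorID(10, 2, 5)))
```

Asking for the same sector on cylinder 11 raises the "mis-step"
error before any sector number is compared.
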
Since many hard disk surfaces are not flawless, low level
formatting programs include a means for entering the hard disk
drive's defect list. The defect list specifies tracks (by
cylinder and head number) that the manufacturer's sensitive drive
certification equipment found to stray from the norm, which
indicates some form of physical flaw that might prevent data from
being reliably written and read. The list of such defects
is typically printed and attached to the outside of the drive.

When these tracks are entered into the low level formatter, the
defective tracks receive a special code in their sector ID
headers which indicates that the track has been flagged as bad
and cannot be used for any data storage. Later, as we shall see,
high level formatting moves this defective track information
into the system's File Allocation Table (FAT) to prevent the
operating system from allocating files within these defective
regions.

When the low level format has been established, we have a
completely empty drive, devoid of stored information, which can
accept and retrieve data with the specification of any valid
cylinder, head, and sector number.

There's an important issue about the low level formatting of a
hard disk which is frequently overlooked, but which can be quite
important to appreciate. Since the hard disk controller works in
intimate concert with its hard disk drive to transfer the data
within its numbered sectors to and from the computer's memory,
the exact details of the address mark, sector ID header, and
rotational sector timing can be completely arbitrary for any
controller and drive. Since these details are initially
established when the drive receives its low level formatting,
they are forever hence agreed upon by both the hard disk drive
and the controller. But more importantly, there's absolutely no
reason to assume that the relatively arbitrary low level
formatting specifics used by any particular hard disk controller
would be compatible with any other model of hard disk
controller.

In practice this means that differing makes or models of hard
disk controllers are completely unable to read, write, or
interpret the formatted information created by any other make or
model of controller. Consequently, whenever it is
necessary or desirable to exchange hard disk controllers, a
complete backup of the hard disk's data, while attached to the
initial controller, MUST BE followed by creating a new low level
format with the new controller on the drive before any of the
backed-up information can be restored to the drive with the new
controller.

So we've given our drives a low level format, since we see that
it is this process which first establishes "communication"
between a hard disk and its controller by creating 512-byte
"sectors" where none existed before. Now let's take up the next
phase of hard disk structuring: The hard disk PARTITION.

The notion of hard disk (or "fixed disk" as IBM calls them)
partitions was created to allow a hard disk based computer
system to contain and "boot up" several completely different
operating systems. Partitioning divides a single physical hard
disk into multiple LOGICAL partitions.

A birthday cake is divided into multiple pieces by slicing it
radially whereas a hard disk's divisions are circular. For
example, a drive's first partition might extend from cylinder
zero through 299 with the second partition beginning on cylinder
300 and extending through 599. This circular partitioning is far
more efficient since it minimizes the disk head travel when
moving within a single partition.

The partitions on a drive, even if there's only one, are managed
by a special sector called the PARTITION TABLE which is located
at the very beginning of every hard disk. It defines the
starting and ending locations for each of the disk's partitions
and specifies which of the partitions is to gain control of the
system during system boot up. When the hard disk drive is booted,
a tiny program at the beginning of the partition table locates
the partition which is flagged as being the "bootable partition"
in the table and executes the program located in the first
sector, the "boot sector," of that partition. This boot sector
loads the balance of the partition's operating system, then
transfers control to it.

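As a rough sketch of what that tiny program does, the Python
below picks the bootable entry out of a partition table sector.
The article doesn't spell out the sector's byte layout, so the
offsets used here (four 16-byte entries starting at byte 446, a
boot flag of 0x80, and a 55 AA signature in the last two bytes)
are the conventional PC layout supplied as an assumption. The
function names and the sample partition values are made up, and
the sketch reads the entries' sector-number fields rather than
their cylinder/head/sector fields.

```python
import struct

TABLE_OFFSET = 446   # assumed: entries follow the boot program
ENTRY_SIZE = 16
BOOT_FLAG = 0x80     # assumed: marks the bootable partition
# flag, 3 bytes skipped, type, 3 bytes skipped, start sector, sector count
ENTRY_FORMAT = "<B3xB3xII"


def parse_partitions(sector):
    """Return the non-empty entries of a 512-byte partition table sector."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid partition table sector")
    partitions = []
    for i in range(4):
        offset = TABLE_OFFSET + i * ENTRY_SIZE
        raw = sector[offset:offset + ENTRY_SIZE]
        flag, ptype, start, count = struct.unpack(ENTRY_FORMAT, raw)
        if ptype != 0:  # type 0 means the slot is unused
            partitions.append({"bootable": flag == BOOT_FLAG, "type": ptype,
                               "start": start, "sectors": count})
    return partitions


def boot_partition(partitions):
    """Return the first partition flagged bootable, or None."""
    for partition in partitions:
        if partition["bootable"]:
            return partition
    return None


# Build a made-up table: two partitions, the second one bootable.
sector = bytearray(512)
struct.pack_into(ENTRY_FORMAT, sector, TABLE_OFFSET, 0x00, 0x04, 17, 5000)
struct.pack_into(ENTRY_FORMAT, sector, TABLE_OFFSET + ENTRY_SIZE,
                 BOOT_FLAG, 0x04, 5017, 5000)
sector[510:512] = b"\x55\xaa"

partitions = parse_partitions(bytes(sector))
print(boot_partition(partitions))
```
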
Each partition on a hard disk is blind to the existence of any
other. By universal agreement, the operation of software inside
a partition is completely contained within the bounds of the
partition. Adherence to this agreement prevents multiple
operating systems from colliding and allows strange environments
to cohabitate on a single hard disk.

The sectors within a partition are numbered sequentially
starting at zero and extending to the end of the partition. In
kind with DOS's original belief that 640K of RAM would be more
than we'd EVER need, there was a time in the not-so-distant past
when a ten megabyte hard disk was an unheard of luxury and was
considered huge. How could any single person ever fill up 10
megabytes? No way.

Consequently DOS was designed to access sectors within its hard
disk partition with a single sixteen-bit quantity. One "word"
was set aside for the specification of partition sectors. As
many of you know, a single sixteen-bit binary word can represent
values from 0 through 65,535. So this limited a partition's
total sector count to 65,536. Since hard disk sectors are 512
bytes long, a partition could contain 33,554,432 bytes. When you
remember that binary megabytes are really 1,048,576 bytes each,
that's exactly 32 megabytes.

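The arithmetic is easy to check. A few lines of Python (the
variable names are just for illustration) reproduce the figures:

```python
SECTOR_BYTES = 512
MEGABYTE = 1_048_576           # one binary megabyte

max_sectors = 2 ** 16          # values 0 through 65,535
partition_bytes = max_sectors * SECTOR_BYTES
partition_megabytes = partition_bytes // MEGABYTE

print(max_sectors, partition_bytes, partition_megabytes)
```
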
This is the origin of DOS's infamous 32 megabyte barrier. Today
of course we have affordable drives with capacities well
exceeding DOS's 32 megabyte limit. The industry has invented
three solutions to this partition size dilemma.

The first solution to the partition size problem utilizes DOS's
inherent extendibility with external device drivers. Programs
such as OnTrack's DISK MANAGER, Storage Dimensions' SPEEDSTOR,
and Golden Bow's VFEATURE DELUXE utilize a clever trick to
circumvent the 32 megabyte DOS limit: They trick DOS into
believing that sectors are larger than 512 bytes! By
interposing themselves between DOS and the hard disk, these
partitioning device drivers lead DOS to believe that individual
sectors are much larger than they really are. Then when DOS asks
for one "logical" 4K-byte sector they hand DOS eight 512-byte
physical sectors. This transforms the 65,536 sector count limit
into a single partition containing more than 268 million bytes
(256 binary megabytes)!

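The trick amounts to a simple multiplication, sketched here in
Python. The function names are hypothetical, and a real device
driver would of course do this translation inside DOS's sector
read and write requests rather than in a list.

```python
PHYSICAL_BYTES = 512


def physical_sectors(logical_sector, logical_bytes=4096):
    """Physical sector numbers that make up one larger "logical" sector."""
    ratio = logical_bytes // PHYSICAL_BYTES
    first = logical_sector * ratio
    return list(range(first, first + ratio))


def max_partition_bytes(logical_bytes):
    """Largest partition reachable with a sixteen-bit sector number."""
    return 2 ** 16 * logical_bytes


print(physical_sectors(3))
print(max_partition_bytes(512), max_partition_bytes(4096))
```

With 4K-byte logical sectors, logical sector 3 maps to physical
sectors 24 through 31, and the ceiling rises from 33,554,432 to
268,435,456 bytes.
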
The second solution was introduced by IBM's PC-DOS 3.3 operating
system with its ability to allow DOS to have simultaneous access
to multiple logical partitions on a single drive. With DOS 3.3,
the standard FDISK command can establish any number of
32-megabyte or smaller partitions on a drive. While this doesn't
create a single unified huge partition, it also doesn't require
any external resident device drivers.

The final solution has recently been introduced by Compaq
Computer with their introduction of DOS 3.31. Being big enough
to get away with sacrificing some software compatibility, Compaq
has redefined the way DOS numbers its partition sectors, thereby
removing the limitation at its source.

So now our hard disks have a low level format, with
"addressability" to the disk's individual physical sectors
established. We have also defined and established partitions on
our drive, which gives DOS a sub-range of the hard disk within
which to build its filing system. Now let's examine the
structure of MS-/PC-DOS filing systems. The following discussion
also applies to DOS diskettes which aren't partitioned but
otherwise have an identical structure.

Let's begin by looking at the problem that DOS's filing system
solves: Its task is to allow us, through the vehicle of DOS
application programs, to create named collections of bytes of
data, called files, and to help with their management by
providing directories of these named files.

The directory entry for any DOS file contains the file's name
and extension, the date and time when the file was last written
and closed, an assortment of Yes/No "attributes" which indicate
whether the file has been modified since last backup, whether it
can be written to, whether it's even visible in the directory,
etc. The directory entry for the file also contains the address
of the start of the file.

We already know that hard disks are divided into numbered
sectors 512 bytes in length. Since most of the files DOS manages
are much larger than a single sector, disk space is allocated in
"clumps" of sectors called clusters. Various versions of DOS
utilize clusters of 4, 8 or 16 sectors each, or 2048, 4096, or
8192 bytes in length.

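Because space is handed out a whole cluster at a time, the last
cluster of a file is usually only partly filled. The short
Python sketch below shows the effect for the three cluster sizes
just mentioned; the function name and the 5,000-byte file are
made up for the example.

```python
SECTOR_BYTES = 512


def cluster_usage(file_bytes, sectors_per_cluster):
    """Return (clusters needed, bytes allocated, unused bytes at the end)."""
    cluster_bytes = sectors_per_cluster * SECTOR_BYTES
    clusters = -(-file_bytes // cluster_bytes)   # round up
    allocated = clusters * cluster_bytes
    return clusters, allocated, allocated - file_bytes


for sectors in (4, 8, 16):
    print(sectors, cluster_usage(5000, sectors))
```
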
When a hard disk is completely empty, its clusters of sectors
are all available for storing file data. As files are created
and deleted on the hard disk, a bookkeeping system is needed
which keeps track of which clusters are in use by which existing
files, and which clusters are still available for allocation to
new or growing files. This is the vital role played by the File
Allocation Table. The "FAT," as it's frequently called, is the
table DOS uses to manage the allocation of space on the hard
disk.

As we know, the hard disk is arranged as a long stream of
sectors. After being clumped together into clusters, it can be
viewed as a long stream of clusters. Now picture a table
consisting of a long stream of entries, with one entry in the
table for each cluster on the disk. The first FAT entry
corresponds to the first hard disk cluster, and the last FAT
entry corresponds to the last hard disk cluster.

Now imagine that DOS needs to create a new text or spreadsheet
file for us. It must first find a free cluster on the hard disk,
so it searches through the File Allocation Table looking for an
empty FAT entry, which corresponds to an empty hard disk
cluster. When DOS finds the empty table entry it memorizes its
number, then places a special "end of chain" marker in the FAT
entry to show that this cluster has been allocated and is no
longer free for use. DOS then goes out to the sectors which
comprise this cluster and writes the file's new data there.

This is all great until the file grows longer than a single
cluster of sectors. DOS now needs to allocate a second cluster
for this file. So it once again searches through the File
Allocation Table for a free cluster. When found, it again places
the special "end of chain" marker in this cluster's FAT entry
and memorizes its number.

Now things begin to get interesting... and just a little bit
tricky. Since files might be really long, consisting of
thousands of individually allocated clusters, there's no way for
DOS to memorize all of the clusters used by each file. So DOS
uses each File Allocation Table entry to store the number of the
file's next cluster!

Following along with our example, after finding and allocating
the second cluster for the growing file, DOS goes back to the
first cluster's FAT entry where it had placed that first "end of
chain" marker and replaces it with the number of the file's
second cluster. If a third cluster were then needed, its FAT
entry would be marked "not available" by placing the special
"end of chain" marker in it, then this third cluster number
would be placed into the second cluster's FAT entry. Get it?

This creates a "chain" of clusters with each cluster entry
pointing to the next one, and the last one containing a special
"end of chain" entry which signals that the end of the file's
allocation chain has been reached.

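The chain-building steps can be played out on a toy FAT in
Python. This is a sketch of the idea only: the table is an
ordinary list, and the FREE and END_OF_CHAIN values and the
allocate_cluster name are placeholders chosen for the example,
not the codes DOS actually writes to disk.

```python
FREE = None          # placeholder for an unused FAT entry
END_OF_CHAIN = -1    # placeholder for the "end of chain" marker


def allocate_cluster(fat, last_cluster=None):
    """Claim the first free cluster and link it after last_cluster."""
    for cluster, entry in enumerate(fat):
        if entry is FREE:
            fat[cluster] = END_OF_CHAIN          # new end of the chain
            if last_cluster is not None:
                fat[last_cluster] = cluster      # old end now points here
            return cluster
    raise RuntimeError("disk full: no free clusters")


fat = [FREE] * 8
fat[1] = END_OF_CHAIN                  # cluster 1 belongs to some other file

first = allocate_cluster(fat)          # the file's first cluster
second = allocate_cluster(fat, first)  # the file grows
third = allocate_cluster(fat, second)  # and grows again
print(first, second, third)
print(fat)
```

The new file ends up in clusters 0, 2, and 3. Entry 0 holds 2,
entry 2 holds 3, and entry 3 holds the end of chain marker.
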
Finally, when the file is "closed," an entry is created in a DOS
directory which names the file and contains the number of the
file's first cluster. Then, using that first cluster's FAT
entry, the entire allocation "chain" can be "traversed" to find
the clusters which contain the file's data.

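Traversing a chain is just a loop that keeps looking up the next
cluster. Here is a minimal Python sketch using a list as the FAT
and a dictionary as the directory; the marker values, the
file_clusters name, and the file name are all stand-ins for
illustration.

```python
FREE = None          # placeholder for an unused FAT entry
END_OF_CHAIN = -1    # placeholder for the "end of chain" marker


def file_clusters(fat, first_cluster):
    """Follow a file's allocation chain and return its clusters in order."""
    chain = []
    seen = set()
    cluster = first_cluster
    while True:
        if cluster in seen:
            raise ValueError("chain loops back on itself")
        seen.add(cluster)
        chain.append(cluster)
        entry = fat[cluster]
        if entry == END_OF_CHAIN:
            return chain
        if entry is FREE:
            raise ValueError("chain runs into a free cluster")
        cluster = entry


fat = [2, END_OF_CHAIN, 3, END_OF_CHAIN, FREE, FREE]
directory = {"REPORT.TXT": 0}      # file name -> first cluster

print(file_clusters(fat, directory["REPORT.TXT"]))
```
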
So now let's do a bit of review....

The allocation of file space within a DOS partition is recorded
and maintained within DOS's File Allocation Tables (FATs). The
FATs make up a map of the utilization of space on any floppy or
hard disk with one entry in the FAT for each allocatable cluster
of sectors. Each entry in the FAT can indicate one of four
possible conditions for the clusters of sectors it represents:
It can be unused and available for allocation, unused and marked
as bad to prevent its use, in use and pointing to the next
cluster of the file, or in use as the last cluster of a file.

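The four conditions can be told apart by the value stored in the
entry. The article doesn't give the actual codes, so the numbers
in this Python sketch are an assumption: they follow the usual
16-bit FAT convention (0000 for free, FFF7 for bad, FFF8 through
FFFF for the last cluster, anything from 0002 through FFEF as a
pointer to the next cluster). The classify name is made up.

```python
def classify(entry):
    """Describe what a 16-bit FAT entry says about its cluster."""
    if entry == 0x0000:
        return "unused and available"
    if entry == 0xFFF7:
        return "unused and marked bad"
    if 0xFFF8 <= entry <= 0xFFFF:
        return "in use, last cluster of a file"
    if 0x0002 <= entry <= 0xFFEF:
        return "in use, next cluster is %d" % entry
    # Remaining values are reserved in this convention.
    raise ValueError("reserved FAT entry value: 0x%04X" % entry)


for value in (0x0000, 0xFFF7, 0x0123, 0xFFFF):
    print("0x%04X" % value, classify(value))
```
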
If each entry in the FAT points to the next, who points to the
first entry? This is the role of the file's directory entry. It
contains the name of the file, the file's exact length, the time
and date of the file's last modification, file attribute flags,
and the identity of the file's first cluster. In a sense, a
file's directory entry forms the head of the file's allocation
chain with each link thereafter pointing to the next link in the
chain.

This system, while quite workable and efficient, does have its
dangers. These dangers center around the fact that the FAT
contains the ONLY record of disk space utilization and a
stubborn failure to correctly read a single sector of the FAT
could render hundreds of files unrecoverable. This danger
explains the popularity of several utility programs which create
a back-up copy of the File Allocation Table and Root Directory
with each system boot-up. They provide some hope of recovery
from the cataclysmic loss of the FAT's data.

The original designers of DOS were aware of the importance of
the FAT and do provide a duplicate copy immediately following
the first, but its physical proximity to the original renders it
little better than none, and DOS has long been notorious for
failing to intelligently utilize this extra copy of FAT
information even in the event of a primary FAT failure. (DOS 3.3
seems to be much smarter in this regard.)

Important as FAT reliability is, it's not generally the prime
source of DOS file corruption, since even with perfect data
retrieval, it's still possible to scramble DOS's files like
crazy. The primary causes of DOS file system troubles are user
error, program bugs, and "glitches." The advent of TSR "rule
breaking" resident multitasking-style software has further
complicated the scene.

When a new file is created or "opened," information about it is
maintained inside DOS. The file's name, status, and first
cluster are all held in internal tables. Then, as the file
grows, free clusters are "checked out" of the File Allocation
Table and allocated to the file's chain of clusters.

Now here's the crucial fact which causes so much trouble: No
matter how big the newly created file becomes, a directory entry
for the file is ONLY created when the file is finally and
properly CLOSED. Until then the file exists only as a chain of
allocated clusters filled with the file's data. If anything
occurs to prevent the error-free closing of this file we have a
real problem because the file's data is occupying a chain of
"checked out" disk clusters, but there is no anchoring directory
entry to point to the first cluster in the chain!

A chain of clusters without an anchoring directory entry is
called a "lost chain." It exists, it contains data, but there's
no record of the file's name, exact size, or purpose.

Lost cluster chains are frequently created when programs abort
abnormally, when TSRs crash the system suddenly, when the
computer user forgets to write a TSR's files out to disk before
shutting the system down, or when a task in a multi-tasking
system is not terminated. (It's easy to forget that a file was
left open in a suspended background task.) Additionally, any
damage to DOS's root directory or subdirectories can "liberate"
chains of lost clusters.

DOS provides the CHKDSK (pronounced Check Disk) command to help
its users keep an eye on just these sorts of problems. CHKDSK
provides a comprehensive verification of DOS's filing system
integrity and provides a means for straightening things out.
When the CHKDSK command is given, the parentage of all cluster
chains is checked, allocation chains are "followed" to be sure
they don't cross over other chains (creating cross-linked
files), and several other system integrity checks are performed.

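The two checks described above can be approximated in Python on
a toy FAT. This is not how CHKDSK itself is written; it's a
sketch of the idea, and the marker values, the check_disk name,
and the sample file names are invented for the example. Every
chain is followed from its directory entry, any cluster reached
by more than one file is reported as cross-linked, and any
allocated cluster that no file reaches is reported as lost.

```python
FREE = None          # placeholder for an unused FAT entry
END_OF_CHAIN = -1    # placeholder for the "end of chain" marker


def check_disk(fat, directory):
    """Return (lost clusters, {cross-linked cluster: [file names]})."""
    owners = {}
    for name, first in directory.items():
        cluster = first
        visited = set()
        # Stop at the end marker, a free entry, or a loop.
        while (cluster != END_OF_CHAIN and cluster is not FREE
               and cluster not in visited):
            visited.add(cluster)
            owners.setdefault(cluster, []).append(name)
            cluster = fat[cluster]
    cross_linked = {c: names for c, names in owners.items() if len(names) > 1}
    lost = [c for c, entry in enumerate(fat)
            if entry is not FREE and c not in owners]
    return lost, cross_linked


# A.TXT is 0 -> 1, B.TXT is 2 -> 1 (cross-linked), 5 -> 6 has no owner.
fat = [1, END_OF_CHAIN, 1, FREE, FREE, 6, END_OF_CHAIN, FREE]
directory = {"A.TXT": 0, "B.TXT": 2}

lost, cross_linked = check_disk(fat, directory)
print(lost, cross_linked)
```
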
In the case of lost chains, CHKDSK will offer to convert these
into files by anchoring them to the root directory. Then any
suitable text editor can be used to open these new files for the
sake of identifying them and moving them back to where they
belong.

Unfortunately the structure of DOS filing systems lacks the
fundamental redundancy required to provide simple and error-free
recovery from many forms of damage. Even the tools and
techniques available from third party suppliers can't surmount
these problems. The best bet is to understand DOS's weak spots,
make certain that all opened files are closed successfully,
perform a weekly CHKDSK command to collect accumulating file
fragment "debris," and back up your hard disks regularly.

"Disk Optimizers," which promise to increase the throughput and
performance of old and well used hard disk drives, number among
the most popular of the general use hard disk utilities.

We've seen how DOS's file allocation system operates. Files are
composed of clusters which in turn are composed of sectors. And
while the sectors which comprise a cluster are by definition
contiguous, the cluster linking scheme which DOS employs allows
a file's clusters to be scattered across the disk's surface.
Since the file's directory entry specifies the file's first
cluster, and each succeeding cluster entry in the file
allocation table specifies the next one, the file's contents
could be literally anywhere on the disk. The term "file
fragmentation" refers to the condition where a file's clusters
are not consecutively numbered. Let's first examine how a disk's
files might become fragmented.

When a file is deleted from a disk, its directory entry is
flagged as unused and each cluster which the file occupied is
flagged in the system's FAT as being free for use. If the
surrounding clusters are still in use by other files, this
creates a "hole" of free space in the disk.

Now suppose that a new file is copied from a floppy disk onto
the hard disk. As DOS reads the new file's data from the floppy,
it must allocate space for this file on the hard disk. So each
time another cluster of sectors is needed, DOS searches through
the file allocation table to find the next available cluster. In
our example, DOS would discover the clusters which had been
freed by the first file we deleted and allocate them for use by
the new file. Then, when all of the clusters in the free space
hole had been used, DOS would be forced to continue its search
deeper into the drive. When space was found further in, the
file's contents would be partially stored near the beginning of
the disk and partially nearer to the end. The file would then
consist of at least two fragments.

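This scenario is easy to reproduce with a toy model in Python. A
list of True/False flags stands in for the FAT, and the new file
simply takes the lowest-numbered free clusters, which is the
search order described above. The function names and the
twelve-cluster disk are made up for the example.

```python
def allocate(used, count):
    """Take the `count` lowest-numbered free clusters and mark them used."""
    if used.count(False) < count:
        raise RuntimeError("disk full")
    taken = []
    for cluster, in_use in enumerate(used):
        if len(taken) == count:
            break
        if not in_use:
            used[cluster] = True
            taken.append(cluster)
    return taken


def count_fragments(clusters):
    """Number of runs of consecutively numbered clusters in a file."""
    breaks = sum(1 for a, b in zip(clusters, clusters[1:]) if b != a + 1)
    return breaks + 1


# Twelve clusters: 0-9 hold three files, 10-11 are free.
used = [True] * 10 + [False] * 2
used[4] = used[5] = False            # delete the file in clusters 4-5

new_file = allocate(used, 4)         # copy in a four-cluster file
print(new_file, count_fragments(new_file))
```

The new file lands in clusters 4, 5, 10, and 11: two clusters in
the hole and two further in, for a total of two fragments.
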
During the normal course of daily computer usage, many files are
being constantly created, copied, extended, deleted, and
replaced. When a word processor creates an automatic backup file,
the original file is typically renamed to identify it as a
backup file and a new file is created. Every new file creation
is an opportunity for fragmentation. The files which are being
modified most often are most subject to extensive fragmentation
since any search by DOS for a free file cluster is almost
guaranteed to produce a new discontinuity. With continued use,
it's typical for much of the disk's file data to become
haphazardly scattered across the surface of the disk drive.

But since DOS's cluster allocation scheme was specifically
designed to manage such scattering, what's the problem? Any time
the drive's head moves, two things occur: Time is consumed, and
the drive experiences some mechanical wear and tear. If a file's
data is scattered across the surface of the disk, the drive's
head is forced to move a large distance many times to read a
single file. If the file is a database whose records are being
accessed at random, this excessive head motion can degrade the
overall system performance tremendously and induce many other
wear-related disk drive problems.

The extra time wasted in cluster fragment chasing is directly
proportional to the drive's average head access time. The prior
generation of 65 to 80 millisecond stepping motor drives loses
far more performance to fragmentation than the latest sub-28
millisecond drives.

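A back-of-the-envelope model makes the proportionality concrete.
The Python below assumes, as a simplification, that reading a
file costs one average head access for every jump between
fragments and ignores everything else (rotational delay,
transfer time). The function name and the 40-fragment file are
invented for the example.

```python
def extra_seek_ms(fragments, access_ms):
    """Rough extra head-movement time to read one fragmented file."""
    jumps = max(fragments - 1, 0)    # one jump between each pair of fragments
    return jumps * access_ms


for access_ms in (80, 28):
    print(access_ms, extra_seek_ms(40, access_ms))
```

Under that assumption a file in 40 fragments costs about 3.1
seconds of extra head motion on an 80 millisecond drive and
about 1.1 seconds on a 28 millisecond drive.
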
Disk optimizers like SoftLogic Solutions' DISK OPTIMIZER,
Norton's SPEEDDISK, Central Point's COMPRESS, and Golden Bow's
VOPT operate by physically rearranging the allocation of files
on the disk. They relocate file cluster fragments while
simultaneously updating the system's File Allocation Tables to
reflect the new cluster locations. When finished, every file on
the disk consists of a single contiguous run of consecutively
numbered clusters. Once the disk drive's head has been
positioned to the beginning of the file, the entire file can be
read or randomly accessed with an absolute minimum of head
motion. Besides improving the system's overall performance, file
defragmentation minimizes the mechanical wear and tear placed
upon the drive's hardware. If some disaster should befall your
system's Root Directory or File Allocation Table, contiguous
files are also much easier to find and recover than files with
severe fragmentation.

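The bookkeeping side of defragmentation can be sketched in
Python on the same kind of toy FAT. This is not the method any
of the products named above actually uses; it simply lays each
file's clusters end to end in a fresh table and reports which
old cluster must be copied to which new one. The marker values
and the defragment name are placeholders, and the sketch assumes
every chain is healthy (no loops, lost chains, or cross-links).

```python
FREE = None          # placeholder for an unused FAT entry
END_OF_CHAIN = -1    # placeholder for the "end of chain" marker


def defragment(fat, directory):
    """Return (new_fat, new_directory, moves) with contiguous files.

    moves maps each old cluster number to its new cluster number.
    """
    new_fat = [FREE] * len(fat)
    new_directory = {}
    moves = {}
    next_free = 0
    for name, first in directory.items():
        chain = []
        cluster = first
        while cluster != END_OF_CHAIN:       # walk the old chain
            chain.append(cluster)
            cluster = fat[cluster]
        new_directory[name] = next_free
        for offset, old in enumerate(chain):
            new = next_free + offset
            moves[old] = new
            is_last = offset == len(chain) - 1
            new_fat[new] = END_OF_CHAIN if is_last else new + 1
        next_free += len(chain)
    return new_fat, new_directory, moves


# A.TXT is 0 -> 4, B.TXT is 1 -> 2 -> 5, cluster 3 is free.
fat = [4, 2, 5, FREE, END_OF_CHAIN, END_OF_CHAIN]
directory = {"A.TXT": 0, "B.TXT": 1}

new_fat, new_directory, moves = defragment(fat, directory)
print(new_fat, new_directory, moves)
```

A real optimizer must also copy the data in each moved cluster,
and do so in an order that never overwrites a cluster it hasn't
relocated yet.
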
Since file fragmentation is a continually occurring fact of
living with DOS, periodic defragmentation, like hard disk
backup, should become part of every serious DOS user's regimen.

                          - The End -


              Copyright (c) 1989 by Steven M. Gibson
                      Laguna Hills, CA 92653
                     ** ALL RIGHTS RESERVED **