

+ Page 1 +

-----------------------------------------------------------------
            The Public-Access Computer Systems Review

Volume 3, Number 5 (1992)                          ISSN 1048-6542
-----------------------------------------------------------------

To retrieve an article file as an e-mail message, send the GET
command given after the article information to LISTSERV@UHUPVM1
(BITNET) or LISTSERV@UHUPVM1.UH.EDU (Internet).  To retrieve the
article as a file, omit "F=MAIL" from the end of the GET command.


                            CONTENTS

        FOCUS ON CAMPUS-WIDE INFORMATION SYSTEMS, PART II


REFEREED ARTICLES

Nonbibliographic Applications of Z39.50

     By John A. Kunze (pp. 4-30)

     To retrieve this file: GET KUNZE PRV3N5 F=MAIL

This article describes how Z39.50 is being used as the basis for
a networked campus information system called Infocal at the
University of California at Berkeley.


COLUMNS

Public-Access Provocations: An Informal Column

     Two Steps Forward, One Step Back

     By Walt Crawford (pp. 31-32)

     To retrieve this file: GET CRAWFORD PRV3N5 F=MAIL

Casting the Net

     USMARC Format Integration, Part I: What, Why, and When?

     By Priscilla Caplan (pp. 33-36)

     To retrieve this file: GET CAPLAN PRV3N5 F=MAIL

+ Page 2 +

-----------------------------------------------------------------
            The Public-Access Computer Systems Review

Editor-in-Chief

Charles W. Bailey, Jr.
University Libraries
University of Houston
Houston, TX 77204-2091
(713) 743-9804
LIB3@UHUPVM1 (BITNET) or LIB3@UHUPVM1.UH.EDU (Internet)

Associate Editors

Columns: Leslie Pearse, OCLC
Communications: Dana Rooks, University of Houston
Reviews: Roy Tennant, University of California, Berkeley

Editorial Board

Ralph Alberico, University of Texas, Austin
George H. Brett II, Clearinghouse for Networked Information
     Discovery and Retrieval
Steve Cisler, Apple
Walt Crawford, Research Libraries Group
Lorcan Dempsey, University of Bath
Nancy Evans, Pennsylvania State University, Ogontz
Charles Hildreth, READ Ltd.
Ronald Larsen, University of Maryland
Clifford Lynch, Division of Library Automation,
     University of California
David R. McDonald, Tufts University
R. Bruce Miller, University of California, San Diego
Paul Evan Peters, Coalition for Networked Information
Mike Ridley, University of Waterloo
Peggy Seiden, Skidmore College
Peter Stone, University of Sussex
John E. Ulmschneider, North Carolina State University

Publication Information

Published on an irregular basis by the University Libraries,
University of Houston.  Technical support is provided by the
Information Technology Division, University of Houston.
Circulation: 4,831 subscribers in 46 countries (PACS-L) and 615
subscribers in 33 countries (PACS-P).

+ Page 3 +

Back issues are available from LISTSERV@UHUPVM1 (BITNET) or
LISTSERV@UHUPVM1.UH.EDU (Internet).  To obtain a list of all
available files, send the following e-mail message to the
LISTSERV: INDEX PACS-L.  The name of each issue's table of
contents file begins with the word "CONTENTS."

-----------------------------------------------------------------

-----------------------------------------------------------------
The Public-Access Computer Systems Review is an electronic
journal that is distributed on BITNET, Internet, and other
computer networks.  There is no subscription fee.
     To subscribe, send an e-mail message to LISTSERV@UHUPVM1
(BITNET) or LISTSERV@UHUPVM1.UH.EDU (Internet) that says:
SUBSCRIBE PACS-P First Name Last Name.  PACS-P subscribers also
receive two electronic newsletters: Current Cites and Public-
Access Computer Systems News.
     The Public-Access Computer Systems Review is Copyright (C)
1992 by the University Libraries, University of Houston.  All
Rights Reserved.
     Copying is permitted for noncommercial use by computer
conferences, individual scholars, and libraries.  Libraries are
authorized to add the journal to their collection, in electronic
or printed form, at no charge.  This message must appear on all
copied material.  All commercial use requires permission.
-----------------------------------------------------------------

+ Page 33 +

-----------------------------------------------------------------
Casting the Net Column
-----------------------------------------------------------------

-----------------------------------------------------------------
Caplan, Priscilla.  "USMARC Format Integration, Part I: What,
Why, and When?"  The Public-Access Computer Systems Review 3, no.
5 (1992): 33-36.  To retrieve this article, send the following e-
mail message to LISTSERV@UHUPVM1 or LISTSERV@UHUPVM1.UH.EDU: GET
CAPLAN PRV3N5 F=MAIL.
-----------------------------------------------------------------

When is Format Integration Coming?

Q.  When is a map not a map?
A.  When it's an atlas.

In less time than it takes a whale to gestate, format integration
will be upon us.  The Library of Congress and the bibliographic
utilities have agreed upon a January 1, 1994 implementation date,
and the library community seems to be awakening to the fact that,
however much it might like to, it can't ignore format integration
forever.  An ALCTS preconference to the ALA annual meeting in San
Francisco on "Implementing USMARC Format Integration" sold out in
the first few weeks of registration, causing the sponsors to
begin planning a series of regional workshops for 1993.  A number
of other library associations and networks, including PALINET and
AALL, are organizing their own programs on the topic.
     Despite this surge of interest, however, many librarians
don't really understand format integration and don't really want
to deal with it.  This is possibly because we're busy and fifteen
months seems like a long time away (just ask a whale).  It's
possibly also because format integration sounds larger and more
formidable than it actually is.

What is Format Integration?

First, let's clarify what format integration does NOT do.  It
does not apply to nonbibliographic data: the holdings,
authorities, classification, and community information "formats"
remain unaffected.  Format integration does not eliminate the
concept of bibliographic format.  Like dwarves and deadly sins,
there will still be seven formats: books, serials, visual
materials, archival and manuscripts control, maps, music, and
computer files.  And it doesn't have anything to do, for good or
for ill, with the problem of multiple versions or how to treat
microform reproductions of print publications.

+ Page 34 +

     What format integration does do is allow the cataloging of
materials with characteristics of more than one format to
represent all of those characteristics.  Common cases include
main items with accompanying materials (e.g., a computer file
with a manual), multimedia, and nontextual serials (e.g., a sound
recording issued serially).  At present, catalogers have to
describe these items using the fixed and variable fields
appropriate to a single format, pretty much ignoring any
characteristics that don't fit.
     This has led, in turn, to the conundrum that introduces this
column: maps bound in volumes being cataloged as books, since
there is no way to represent their book-ness and map-ness both.
Unless, of course, the maps are issued serially, in which case
CONSER rules specify that they will be cataloged as serials,
physical format having been decreed secondary to seriality.  Pity
the poor user whose map is in an atlas issued as part of a
periodical.

Multiple Formats in a Single Record

Enter format integration, which, when implemented, will allow you
to describe multiple formats in a single bibliographic record,
using both fixed and variable field data as appropriate.  Fixed
field data elements can be provided by means of a rather clever
new field called the 006.  Those familiar with the MARC data
structure know that coded data elements, positionally defined,
are encoded in the 008 field, which is defined differently for
each format.  However, the beginning and ending data elements in
the 008 actually apply to all record types, containing
information such as "date entered on file" and "place of
publication."  Only the middle 17 bytes vary from format to
format.  The 006 field was designed to contain in its first
position a code indicating the type of 006 (e.g., serials or
AMC), followed by 17 bytes defined as they would be in the
corresponding 008.  Like the 007, the 006 may or may not occur in
any given record, and there can be as many in a single record as
are appropriate.  A map in an atlas issued serially, then, could
theoretically have a serial 008 and one or two 006 fields, one
for map-ness and one (possibly) for books.
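     The layout just described can be sketched in a few lines of
Python.  This is an illustration only, not part of the USMARC
documentation: the function names are made up, the sample 008 is
invented filler, the byte offsets (18 through 34 of a 40-byte
008) and the one-character type codes ("e" for maps, "a" for
books) are assumptions that should be checked against the format
documentation.

```python
# Sketch only.  Assumes a 40-byte 008 whose positions 18-34 hold the
# 17 format-specific bytes; the type codes used below are assumptions.
FORMAT_SPECIFIC = slice(18, 35)


def make_006(type_code: str, middle_17: str) -> str:
    """Build an 006: one type byte followed by 17 format-defined bytes."""
    if len(type_code) != 1:
        raise ValueError("type code must be a single character")
    if len(middle_17) != 17:
        raise ValueError("format-specific block must be 17 bytes")
    return type_code + middle_17


def middle_of_008(field_008: str) -> str:
    """Return the format-specific block of a 40-byte 008."""
    if len(field_008) != 40:
        raise ValueError("an 008 is 40 bytes long")
    return field_008[FORMAT_SPECIFIC]


# Invented filler data, not a real record: 18 common bytes, 17
# serial-specific bytes (shown as "S"), and 5 trailing common bytes.
serial_008 = "920901c19929999cau" + "S" * 17 + "eng d"

# A serially issued atlas: one 008 for seriality, plus 006 fields for
# its map-ness and (possibly) its book-ness.
record = {
    "008": serial_008,
    "006": [make_006("e", "M" * 17), make_006("a", "B" * 17)],
}
```

The point of the sketch is that an 006 is nothing more than a
type byte plus the same 17-byte block an 008 of that type would
carry, which is why any number of them can sit beside the single
008.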

+ Page 35 +

     There must still be only one 008 field, and it will still be
used for the primary format.  Fields 006 can be added as needed
to describe secondary characteristics.  Format integration comes
with rules for choosing which format is primary (i.e., which gets
the 008) and which is secondary (and so gets the 006).  For a
main item with accompanying materials, the main item is primary;
for textual serials, seriality is primary and physical format is
secondary; for nonprint serials, the physical form is primary and
seriality secondary.
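     The three rules above can be written out as a small decision
function.  The Python below is a hedged sketch of those rules as
stated in this column, not of the official guidelines; the
function name, its parameters, and the format labels are
hypothetical.

```python
def primary_and_secondary(main_item, is_serial, is_textual,
                          accompanying=()):
    """Return (format for the 008, list of formats for 006 fields)."""
    secondary = list(accompanying)
    if is_serial and is_textual:
        # Textual serials: seriality is primary, physical format secondary.
        return "serials", [main_item] + secondary
    if is_serial:
        # Nonprint serials: physical form is primary, seriality secondary.
        return main_item, ["serials"] + secondary
    # Otherwise the main item is primary and anything that
    # accompanies it is secondary.
    return main_item, secondary


# A computer file with a printed manual.
print(primary_and_secondary("computer files", False, False, ["books"]))
# A sound recording issued serially.
print(primary_and_secondary("music", True, False))
```

Real records will not always fit these three branches, which is
one reason the guidelines matter more than any such shortcut.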

Changes to Variable Fields

For variable fields, the major change is that all fields have
been declared useable wherever they are appropriate.  In most
cases, this means that fields previously defined as valid for
only a subset of formats have been extended across all formats:
the 522 "geographic coverage" note, for example, can now be used
for computer files, if it should happen to apply.  In some cases,
essentially the same data was defined in different places in
different formats, so one data element had to be selected for
extension and the others declared obsolete.  For example,
acquisitions information was made obsolete in the 265 and 350
fields and shall live from now on in the redefined, expanded, and
extended 037.  In fact, the review process occasioned by format
integration was seen as a good excuse to tidy up other
problematic or little-used data elements in USMARC.  A small but
significant set of codes that have been driving catalogers crazy
for years ("main entry in body of entry"!) have been eliminated.

Why Bother?

All of this is not to say that USMARC is now so simple and
intuitive that we can throw away the rule books and devote our
time to improving subject access.  For one thing, obsolete
content designation is still valid in older records, so, as with
AACR2, we will have to live with two sets of rules for a long
time.  Even with guidelines, there will be situations where the
primary format is not obvious.  The ability to record information
about secondary format characteristics means the opportunity to
spend more time cataloging and to make more mistakes.  One could
argue that most automated systems make little enough use of the
fixed field data elements as it is; the 006 now offers us the
chance to ignore even more data than before.

+ Page 36 +

     Format integration may ultimately result in more compact
documentation, easier training, and less retraining for
catalogers.  But it would certainly not be worth the bother it
was to define or will be to implement unless it ultimately
benefits the end users of our online catalogs.  As Karen Coyle of
the University of California (who also provided the riddle that
begins this column) pointed out at the ALCTS preconference, our
OPACs aren't exactly littered with format-related information as
it is.  Karen had a number of interesting observations, some of
which I'll repeat in "USMARC Format Integration, Part II:
Implications for Local Systems."  Stay tuned.


About the Author

Priscilla Caplan, Head, Systems Development Division, Office for
Information Services, Harvard University Library.  Internet:
COTTON@HARVARDA.HARVARD.EDU.

-----------------------------------------------------------------
     This article is Copyright (C) 1992 by Priscilla Caplan.  All
Rights Reserved.
-----------------------------------------------------------------

+ Page 31 +

-----------------------------------------------------------------
Public-Access Provocations: An Informal Column
-----------------------------------------------------------------

-----------------------------------------------------------------
Crawford, Walt.  "Two Steps Forward, One Step Back."  The Public-
Access Computer Systems Review 3, no. 5 (1992): 31-32.  To
retrieve this article, send the following e-mail message to
LISTSERV@UHUPVM1 or LISTSERV@UHUPVM1.UH.EDU: GET CRAWFORD PRV3N5
F=MAIL.
-----------------------------------------------------------------

Progress in public access doesn't come smoothly or uniformly.  If
you can actually take two steps forward (through a change for the
better) and only one step back (because something gets misplaced
along the way), you're doing pretty well.
     Two steps forward: My local branch library now has a decent
CD-ROM periodical index on a neat little PC workstation (with,
presumably, a hidden Pioneer minichanger CD-ROM drive), replacing
the clunkier Magazine Index on roll fiche, which replaced . . .
well, no up-to-date periodical index at all.
     One step back: For simple searches, Magazine Index was
significantly faster, and somewhat easier to use.  That's a half-
step.  The other half-step: the menus for the CD-ROM system make
it all too easy to escape to the C:\ prompt, particularly if you
just want to return to the opening screen.  There's no C:\ prompt
(with its opening for deviltry) on a roll fiche reader.  (Yes, I
know an experienced PC hand could trap that escape to the C:\
prompt.  This is a small branch library.  Where do they find
experienced PC hands?)
     Two steps forward: Library users at colleges and some public
libraries will soon be able to go beyond their local online
catalogs to see what else is available in hundreds of libraries,
all in a single search, all at a controlled price, while using
their own library-wide or campus-wide network and using an
increasingly familiar search syntax.
     One step back: The user interface for such extended access
will be character-based and probably rely on VT100 emulation, so
that it can be used in the real world across a wild variety of
local equipment.
     Two steps forward: Computers are getting powerful enough to
provide fast searching for enormously large databases.
     One step back: We still don't know how to balance precision
and recall, and to provide good access while retaining user
control.
     What we need to watch out for is the "crazy dance" of the
John Sebastian song, "one step forward, two steps back."

+ Page 32 +

     One step forward: Providing enough ready-reference tools on
CD-ROM (etc.) so that experienced library users can do some of
their own reference work (albeit probably more slowly and less
effectively than if reference librarians were helping).
     Two steps back: Taking away front-line reference librarians
and assuming that "disintermediation" is inherently a good thing.
     One example is enough.  There are others, but you get the
point.  Just a plea to recognize that almost every big
improvement in public access comes at some price, and to
see that the price doesn't outweigh the improvement.


About the Author

Walt Crawford, The Research Libraries Group, Inc., 1200 Villa
Street, Mountain View CA 94041-1100.  BITNET: BR.WCC@RLG.

-----------------------------------------------------------------
     This article is Copyright (C) 1992 by Walt Crawford.  All
Rights Reserved.
-----------------------------------------------------------------

+ Page 4 +

-----------------------------------------------------------------
Kunze, John A.  "Nonbibliographic Applications of Z39.50."  The
Public-Access Computer Systems Review 3, no. 5 (1992): 4-30.
(Refereed Article.)  To retrieve this article, send the following
e-mail message to LISTSERV@UHUPVM1 or LISTSERV@UHUPVM1.UH.EDU:
GET KUNZE PRV3N5 F=MAIL.
-----------------------------------------------------------------

1.0  Introduction

Although the Z39.50 information retrieval protocol is rapidly
gaining acceptance as a standard for interoperability among
networked library automation systems, [1] it has not been obvious
how to make it work for nonbibliographic applications.  This
article describes how Z39.50 is being used as the basis for a
networked campus information system called Infocal [2] at the
University of California at Berkeley. [3]
     The files we are making available via Infocal include class
schedules, computer documentation, library OPACs, phone
directories, public announcements, and software.  Since Infocal
is being developed by Berkeley's Information Systems and
Technology department, the distribution of computer documentation
and software was the original incentive for implementing the
system.  The majority of the information will originate on
campus, though some will be purchased or licensed.  Most of the
nonproprietary information will be available over the Internet.
     We avoid describing the project as a campus-wide information
system (CWIS) because these systems often come with extra baggage
such as bulletin boards, electronic mail, and student
registration systems.  Our system is a straightforward read-only,
client/server-based campus information system with two extra
goals: (1) accommodating nontextual data, and (2) interoperating
with other information systems.
     To help us achieve these goals, we decided to utilize the
Z39.50 protocol. [4]  The challenge was how to use Z39.50 to
support all of our nonbibliographic requirements.  This article
addresses this challenge from the point of view of the designer
of the client system program.  This program, which sits between
the user and the network, translates user commands into Z39.50
protocol requests for the server, and it translates server Z39.50
protocol responses into appropriate user displays.

+ Page 5 +

2.0  Z39.50 Data and Retrieval Support

We need Z39.50 to support the following types of data:

     o    Bibliographic databases
     o    Data of unknown, but learnable, semantics
     o    Full-text documents
     o    Nonbibliographic databases
     o    Nontextual documents

We need it to support the following types of retrieval:

     o    Hierarchical browsing
     o    Hypermedia links
     o    Retrieval by object/document ID

The protocol, having come out of the library automation
community, is straightforward to apply to the retrieval of
bibliographic citations, where records have highly predictable
structure and semantics.  What about databases whose record
structure and semantics are nonbibliographic?  What about
retrieval from databases whose record structure and semantics are
known to the server system programmer, but are not known to the
client system programmer?  Is there a way for the protocol to
transmit meta-information that would allow the client to retrieve
data from such a database?
     Full-text documents are also problematic.  Should the
protocol view a document as a database?  Should it view a
document as a record?  As an element of a record?  Or, as a set
of records?  The protocol's retrieval model requires that records
have a fixed maximum size for the duration of a session; this
means that the programmer, knowing that documents can have highly
skewed size distributions, must decide whether to set the maximum
record size ridiculously high--forgoing any optimization that
might have otherwise been obtained--or to fragment documents
across record boundaries.
     The retrieval model also requires that records belong to the
result set of a query: you are not allowed to retrieve a record
unless it has satisfied some set of search criteria in advance.
Therefore, if you are retrieving document fragments, you may have
to invent a way to conduct a "fragment search."
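     As a rough illustration of the second option, the Python
sketch below splits a document into records that respect a fixed
maximum size and then stands in for a "fragment search" that
returns the pieces in order.  The record layout and the function
names are hypothetical; nothing here is defined by Z39.50.

```python
def fragment(doc_id, data, max_record_size):
    """Split one document into records no larger than max_record_size."""
    if max_record_size <= 0:
        raise ValueError("max_record_size must be positive")
    # Ceiling division; an empty document still yields one record.
    count = max(1, -(-len(data) // max_record_size))
    return [
        {"doc": doc_id, "part": i + 1, "of": count,
         "data": data[i * max_record_size:(i + 1) * max_record_size]}
        for i in range(count)
    ]


def fragment_search(records, doc_id):
    """Stand-in for a fragment search: the result set is every
    fragment of the named document, in order."""
    hits = [r for r in records if r["doc"] == doc_id]
    return sorted(hits, key=lambda r: r["part"])


def reassemble(result_set):
    """Client side: join the retrieved fragments back together."""
    return b"".join(r["data"] for r in result_set)
```

Because every fragment enters the result set through a search,
the sketch stays within the rule that only records satisfying a
query may be retrieved.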

+ Page 6 +

     The retrieval of nontextual data, such as images or video,
is problematic due to the large size of these objects and,
consequently, the need to support data compression.  There are
other more serious complications, such as the realtime protocol
requirements of video data, that are not addressed here.
     Hierarchical browsing, however unfashionable, is still an
important access method for campus information system users.  It
seems that there will always be a very popular subset of
information on a server that people prefer to get at by exploring
a small, shallow tree of information, rather than by doing an
index search.  This is analogous to going to a restaurant and
preferring to browse the menu instead of asking the waiter to
list all dishes containing potatoes but not anchovies.  To
support hierarchies, the protocol will have to allow clients to
discover tree node names inside menus and to retrieve node
contents using the search facility.  Hypermedia links would be
improved by building them on top of hierarchical browsing
capabilities.
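     A minimal Python sketch of that idea follows.  It assumes,
purely for illustration, that each tree node is a record holding
either a menu of child node names or some text, and that a node
is fetched by a search on its name; the node names and functions
are hypothetical.

```python
# Hypothetical server-side store: node name -> record.
NODES = {
    "root": {"menu": ["documentation", "directories"]},
    "documentation": {"menu": ["unix-guide"]},
    "directories": {"menu": []},
    "unix-guide": {"text": "How to log in ..."},
}


def search_node(name):
    """Stand-in for a search whose single term is a node name."""
    return [NODES[name]] if name in NODES else []


def browse(name="root", depth=0, out=None):
    """Walk the tree by searching on the names found inside menus."""
    out = [] if out is None else out
    for record in search_node(name):
        out.append("  " * depth + name)
        for child in record.get("menu", []):
            browse(child, depth + 1, out)
    return out


print("\n".join(browse()))
```

The client never needs to know the shape of the tree in advance;
it only needs to recognize node names in a menu and search on
them.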
     It would be advantageous to deal with object (or document)
ID retrieval issues at the same time as hierarchical browsing and
hypermedia retrieval issues.  If we had unique, server-qualified
document IDs, users could reuse them to retrieve a known document
without having to reconstruct the entire search leading to its
original discovery.  Document IDs could be valid across sessions,
could be shared with other users, and could be submitted in
relevance feedback queries without incurring the overhead of
sending entire documents in the query.  The Wide Area Information
Server (WAIS) project [5] has made a relevant proposal in this
area, and the Digital Object ID Project, under the auspices of
the Coalition for Networked Information, also addresses this
issue.

3.0  Z39.50 Applications

As a basis for later discussion, it is useful to outline how
Z39.50 is applied to bibliographic information retrieval.  The
protocol has seven separate services, of which only three need be
mentioned here: the Search, Present, and Explain facilities
(Explain is currently under development [6] and will be discussed
later).  The Search facility defines how a query is represented,
and Present defines how information and diagnostic records are
represented.  Although there are four kinds of queries, we need
only consider the Type-1 query, since it is the only one
currently being used for interoperability testing.  It is a
relatively full-featured query against multiple databases that
allows nested Boolean combinations of qualified search terms and
previous search results.
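     To make the shape of such a query concrete, here is a small
Python sketch of a nested Boolean query evaluated against an
in-memory table.  It is not the Type-1 encoding itself: the class
names, the operator spellings, and the substring matching are
assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """A qualified search term: which element, and what value."""
    element: str
    value: str


@dataclass
class ResultSetRef:
    """A reference to the result of a previous search."""
    name: str


@dataclass
class BoolOp:
    """A Boolean combination of two operands."""
    op: str          # "and", "or", or "and-not"
    left: object
    right: object


def evaluate(query, records, saved):
    """Return the set of record numbers that satisfy the query."""
    if isinstance(query, Term):
        return {n for n, rec in records.items()
                if query.value.lower() in rec.get(query.element, "").lower()}
    if isinstance(query, ResultSetRef):
        return set(saved[query.name])
    left = evaluate(query.left, records, saved)
    right = evaluate(query.right, records, saved)
    if query.op == "and":
        return left & right
    if query.op == "or":
        return left | right
    if query.op == "and-not":
        return left - right
    raise ValueError("unknown operator: " + query.op)
```

Operands nest to any depth, and a saved result set can appear
wherever a term can, which is what lets a user refine an earlier
search without resubmitting it.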

+ Page 7 +

3.1  Semantic Modules

One of the claims of Z39.50 is that it does not restrict the kind
of query you submit or the kind of data you get back.  On the
other hand, it is easy to see how an implementor might think,
having just read the standard, that too few details are given to
build an interoperable system.  This is because Z39.50 is
modularized in order to isolate those parts of the protocol that
would require data-dependent assumptions.  Modules are identified
and can be replaced to accommodate new sets of assumptions.
     Each module is assigned an unambiguous tag or identifier by
the National Information Standards Organization (NISO), and the
protocol guarantees that the client and server always understand
what modules are currently assumed during a session.  It
accomplishes this by making sure that the module identifiers get
tucked into protocol control messages whenever they are needed.
Table 1 identifies the five protocol modules that completely
define the structure and semantics for protocol control, queries,
information, error messages, and resource reports.

-----------------------------------------------------------------
Table 1.  Z39.50 Semantic Modules
-----------------------------------------------------------------

Module Name                  Describes         Bibliographic
                                               Instance

Z39.50 kernel                Control           N/A
Query type/attribute set     Queries           Type-1/Bib-1
Record syntax                Information       MARC
Diagnostic set               Error messages    Bib-1
Resource set                 Costs and time    Bib-1

-----------------------------------------------------------------

+ Page 8 +

The official Z39.50 terms for these modules are the query
type/attribute set pair, record syntax, diagnostic set, and
resource set.  The attribute set applies only to the Type-1
query, which we are already assuming.  It defines several
concepts about the search term and assigns a number to each one.
A list of these numbers accompanies the search term so that the
server can understand what element is to be searched, what
relational operator to use, what position in the element, and so
forth.  The record syntax, often called transfer syntax, is much
more than simple lexical structure; it embodies intimate
structural and semantic knowledge about a sequence of bits.  The
diagnostic and resource sets are just tables of messages and
message numbers that the two sides agree on.
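     A hedged Python sketch of how a list of attribute numbers
might accompany a search term is given below.  The numbers are
modelled on bib-1 but should be treated as assumptions and
checked against the registered attribute set; the function and
table names are hypothetical.

```python
# Attribute types and values modelled on bib-1.  Treat every number
# here as an assumption, not as a quotation from the standard.
USE, RELATION, POSITION = 1, 2, 3
USE_VALUES = {"title": 4, "author": 1003}
RELATION_VALUES = {"less than": 1, "equals": 3, "greater than": 5}
POSITION_VALUES = {"first in field": 1, "any position": 3}


def qualify(term, element, relation="equals", position="any position"):
    """Attach a list of (attribute type, value) pairs to a search term."""
    attributes = [
        (USE, USE_VALUES[element]),
        (RELATION, RELATION_VALUES[relation]),
        (POSITION, POSITION_VALUES[position]),
    ]
    return {"term": term, "attributes": attributes}


print(qualify("Kunze", "author"))
```

The server sees only the numbers; the names in the tables exist
so that a client can show its user something readable.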
     The standard goes only so far as to define one set of
modules (and a few close variants), which is the minimum
required to build an implementation.  This particular set of
modules, listed under bib-1 and MARC, is designed for
bibliographic applications.  The specific bibliographic modules
it defines or refers to are listed in Table 1: the Type-1 query
with bib-1 attribute set, the MARC format [7], the bib-1
diagnostic set, and the bib-1 resource set.  It also defines the
module identifiers; using this information, the programmer keeps
track of the current semantic context.

3.2  Client System Modules

The client system is hard coded to understand whatever
combinations of modules it supports.  The control module makes
sure that neither client nor server ever has to deal with
semantics that it did not agree to (through lower level
negotiation).

3.2.1  Bibliographic Modules

For bibliographic data, the client system programmer can
construct user interface menus that list searchable elements
(e.g., author, date, and title) or valid relational operators
(e.g., equals, less than, or greater than) because these features
are all defined in the semantic module known as bib-1.  The
programmer can also write code that understands how to read a
record as a sequence of bits in the MARC format and display it to
the user.

+ Page 9 +

3.2.2  Nonbibliographic Modules

What does the programmer do differently to support new,
nonbibliographic semantics?  The answer is:

     (1)  Define new semantics for search, display, and
          diagnostics.

     (2)  Get the new attribute set, record syntax, and
          diagnostic set registered with NISO.

     (3)  Write code to support searching and displaying these
          semantics.

     (4)  Plug in the new code modules to support the new
          semantic modules.

For example, consider supporting a student directory database.
You need a new attribute set that might include element names
such as NAME and EMAIL ADDRESS, but not PHONE NUMBER.  Note that
even though we might have record semantics that support the
display of phone numbers, we might be using an attribute set that
does not support searching on them.  In the example in Figure 1,
the record syntax is the same one used in the Campus Wide
Information System Protocol (CWISP), [8] which has a printable
ASCII format.  As for error messages, you might just choose to
keep on using the bib-1 diagnostic set since it is fairly
generic.

-----------------------------------------------------------------
Figure 1.  Example Student Directory Database
-----------------------------------------------------------------

     + record syntax = CWISP
          name: Kunze, John A.
          email: jak@violet.berkeley.edu
          phone: 510-642-1530
     + attribute set = student-1
     + diagnostic set = bib-1

-----------------------------------------------------------------
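
The distinction between what a record can display and what an
attribute set lets you search can be sketched in Python.  The
"label: value" layout is simply what Figure 1 shows, not the
actual CWISP specification, and the contents of the student-1
attribute set are assumed here for illustration.

```python
SEARCHABLE = {"name", "email"}     # hypothetical student-1 attribute set

SAMPLE = """\
name: Kunze, John A.
email: jak@violet.berkeley.edu
phone: 510-642-1530
"""


def parse_record(text):
    """Read 'label: value' lines into a dictionary of elements."""
    record = {}
    for line in text.splitlines():
        if ":" in line:
            label, value = line.split(":", 1)
            record[label.strip().lower()] = value.strip()
    return record


def search(records, element, value):
    """Search only on elements that the attribute set allows."""
    if element not in SEARCHABLE:
        # A real server would return a diagnostic record instead.
        raise ValueError("unsupported search element: " + element)
    return [r for r in records
            if value.lower() in r.get(element, "").lower()]


student = parse_record(SAMPLE)
print(search([student], "name", "kunze")[0]["phone"])
```

The phone number is displayed because the record syntax carries
it, yet a search on "phone" is refused because the attribute set
does not define it.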

+ Page 10 +

This approach to nonbibliographic data is problematic.  There is
a lot of work involved in writing code to support each new set of
semantics.  Imagine defining a new set of semantics for each one
of your database formats, going through the registration process
with NISO, then writing code to support each.  If a database
format or user search requirement changes, you need to register
again with NISO and stop interoperating with the systems you used
to work with until the new format gets approved.  If you want to
support more than one set of semantic modules, your code will
have to link in all your corresponding program modules.  With
code hardwired to each attribute set and record format, there is
nothing to prevent one user interface to ten database formats
from looking like ten different user interfaces, which is exactly
the situation that Z39.50 was supposed to avoid.
     Berkeley's Infocal system will have at least a half dozen
database formats and they will probably each change once a year,
so this scheme will not work for us.  Fortunately, there is
another approach to nonbibliographic information that addresses
the problem of retrieving information from databases whose format
is unknown until the database is first accessed.  What is needed
is an attribute set, a record syntax, and a diagnostic set, but
they must be general purpose and dynamic.

4.0   The Dynamic General-Purpose Attribute Set Info-1

For the purposes of discussion, it will be useful to define some
terms: Use attributes, elements, Explain facility, and objects.

4.1  Use Attributes and Elements

Server information records are structured to allow searching on
and retrieval of different combinations of individual record
components.  These processes, completely defined by a server, are
mapped onto canonical Z39.50 processes so that clients have a
common language to access information.  In order to give servers
adequate flexibility, Z39.50 distinguishes between searchable and
retrievable record components.  Searchable components are called
"Use attributes" and retrievable components are called
"elements."  A server is free, for example, to map a Use
attribute called "name" onto five different record components.
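
As a rough illustration of this flexibility, the Python sketch
below shows a server mapping the single Use attribute "name" onto
five record components and searching all of them.  It is not
taken from Infocal; the component names and the sample record are
invented for the example.

```python
# Hypothetical server-side table: each Use attribute (searchable) lists
# the record components it covers. All component names are assumptions.
USE_ATTRIBUTE_MAP = {
    "name": ["given_name", "surname", "nickname", "former_name", "alias"],
}


def matches(record, use_attribute, term):
    """Return True if `term` occurs in any component mapped to the attribute."""
    components = USE_ATTRIBUTE_MAP.get(use_attribute, [])
    term = term.lower()
    return any(term in record.get(c, "").lower() for c in components)


# Invented sample record; only three of the five components are present.
record = {"given_name": "Pat", "surname": "Smith", "alias": "PS"}
print(matches(record, "name", "smith"))
```

The client only ever names the Use attribute; which components
stand behind it remains the server's private decision.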

+ Page 11 +

4.2  Explain Facility

The Explain facility is a set of mechanisms integrated with the
Z39.50 Search and Present facilities that retrieve information,
both human- and machine-readable, about an information server.
For example, searches against the reserved database name, "IR-
Explain-1," can return a schedule of server availability, a list
of database names, and a list of browsable hierarchies.  A search
by database name can return a list of valid attribute
combinations, various levels of human-readable record component
descriptions, and element set names.  When the client encounters
an unknown database, it uses the Explain facility to build menus
for searching and to provide help text for explaining query
semantics.

4.3  Objects

In this paper, an object is anything (e.g., book, painting,
person, digitized image, or electronic document) having an
associated electronic information record that may be accessed
through a unique identifier, called the object identifier
(objectID), about which more will be said later.  This record may
be thought of as an object citation, the elements of which
describe aspects of the object, such as its name, what kind of
thing it is, where it comes from, when it came into being, where
it is currently located, and a brief description.  An object
citation easily maps onto the Z39.50 record model and is a
natural access point for many, if not all, aspects of the object.
Therefore, if the object exists primarily in electronic form, a
citation element may contain a copy of the object itself.
     In cases where Explain is not used and where sophisticated
processing is not required, a server may map these record
components to a small set of generic, previously defined Use
attribute values and element names for search and retrieval. To
be useful, most of these elements would be printable ASCII text
suitable for human consumption.  This provides a kind of base-
level service from databases of otherwise unknown semantics
without requiring the Explain facility.  Table 2 below shows
the generic names (given symbolically here, although the protocol
uses integer tags) in the first column, and examples of how four
different databases might be mapped to them.  The same tags can
be used to identify returned elements.

+ Page 12 +

-----------------------------------------------------------------
Table 2.  Generic Use Attributes (Elements) and Example Mappings
-----------------------------------------------------------------

Tag            Bibliographic Database   Personnel Database

Type           "Book"                   "Employee"
Name           Title                    Employee name
By             Author                   Organization
Location       Publisher                Phone/address
Date           Publication date         Hire date
Abstract       Contents                 Job description
Object         Text/(N/A)               N/A


Tag            Course Database          Art Database

Type           "Class"                  "Artifact"
Name           Class title              Title
By             Instructor/department    Artist
Location       Room/building            Owner
Date           Meeting time             Creation date
Abstract       Course description       Description
Object         N/A                      Bitmap/(N/A)

-----------------------------------------------------------------
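
One way to picture the mapping in Table 2 is as a small lookup
table held by the server.  The sketch below is an illustration
only: the tag numbers follow the first seven generic tags of
Table 3, while the native field names and the sample course
record are assumptions made for the example.

```python
# Generic tag numbers as listed in Table 3 (first seven entries).
GENERIC_TAGS = {"Type": 1, "Name": 2, "By": 3, "Location": 4,
                "Date": 5, "Abstract": 6, "Object": 7}

# Hypothetical native field names for the course database of Table 2.
COURSE_MAPPING = {
    "Name": "class_title",
    "By": "instructor",
    "Location": "room",
    "Date": "meeting_time",
    "Abstract": "course_description",
}


def to_generic(native_record, mapping, type_label):
    """Return a list of (tag number, text) pairs for a native record."""
    fields = [(GENERIC_TAGS["Type"], type_label)]
    for generic_name, native_name in mapping.items():
        if native_name in native_record:
            fields.append((GENERIC_TAGS[generic_name],
                           str(native_record[native_name])))
    return fields


# Invented sample record with only some components filled in.
course = {"class_title": "Introduction to Databases",
          "instructor": "Smith", "room": "100 Main Hall"}
for tag, value in to_generic(course, COURSE_MAPPING, "Class"):
    print(tag, value)
```

A client that knows nothing about the course database can still
display these pairs sensibly, because the tags carry the generic
meaning.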

The attribute set info-1 (registered with NISO as
1.2.840.10003.3.1000.2.1) was created by first making a copy of
the bib-1 attribute set, complete with its broad categories (or
types) of attributes, and erasing all the pre-defined bib-1 Use
attribute names.  A small set of Use attribute names was then
reserved as shown below in Table 3.  The generic tags for concept
and conceptId are useful for relevance feedback searches that
look for objects similar to an object specified with the search
term; a concept term contains the object, and a conceptId term
contains a pointer to the object.

+ Page 13 +

-----------------------------------------------------------------
Table 3.  Dynamic Attribute Set Info-1
-----------------------------------------------------------------

Attribute Types

     Type                Example

     Use                 (Statically undefined except for generic
                         tags)
     Relation            =, !=, >, >=, <, <=
     Position            First in element, last in element, etc.
     Structure           Date, name, word, etc.
     Truncation          Right, left, etc.
     Completeness        N/A

Generic and Predefined Use Attribute (Element) Tags

     Generic Tags        Predefined Tags

     1  Type             Kind
     2  Name             What
     3  By               Who
     4  Location         Where
     5  Date             When
     6  Abstract         What+
     7  Object           What++
     8  Concept          What
     9  ConceptId        *What
     10 Satisfier        Which~
     11 Userinfo         Miscellaneous
     12 Any              Any elements server wants
     13 Keywords         Extra index terms
     14 Record size      Estimate in bytes
     15 Record update    Last update of record
     16 Provider         Who maintains record
     17 ObjectID         For fast object access
     18 Explain          Bootstraps tags
     19 Default          Any single element
     20 PobjectKey       Physical object id
     21 IrecordKey       Internal record id

-----------------------------------------------------------------

+ Page 14 +

This sketches out a minimal generic query interface.  What about
the problem of displaying server responses to the user?  A
generic diagnostic set is not hard to define, and the bib-1
diagnostic set with a few additions would be an adequate start.
Coming up with a generic record syntax is a little more
complicated.

5.0  The Dynamic General-Purpose Record Syntax Info-1

Referring back to the list of features in section 2.0 helps
motivate some of the design decisions for a dynamic general-
purpose record syntax.  The client still needs a way to discover
what is being returned from a database of dynamically defined or
unknown semantics.  It also needs to support documents, images,
hierarchies, and hypertext.  There is no obvious way to do all
this within the existing protocol control kernel, so the next
place to look is in the semantic modules for queries and
information.
     The semantics of information returned by a server is given
in three ways.  In classical Z39.50, a registered record syntax,
such as MARC, informs the client that a stream of bits is
structured into specific elements containing well-defined types
of data.  In a second scheme (the flip side of the generic Use
attribute tags in Tables 2 and 3), generic element tags accompany
returned data elements.  This involves using the record syntax
info-1 to hold elements that, for the most part, contain visible
ASCII.  The third scheme uses the info-1 syntax for records with
semantics dynamically defined with Explain and the element set
name parameter.  This allows for tagged data or, when efficient
volume transfer is called for, positional, untagged data.

5.1  Documents

As mentioned earlier, an object citation record provides a
natural way to access the object itself, provided it is stored in
an electronic form.  In considering online documents, several
issues come up.  How does a client request a document citation
minus the full text?  How does a full-text data element fragment
across records?  How do citations relate to links needed for
hierarchical browsing and hypertext?  How does a client select
the form of a document that varies in several dimensions, such as
word processing format, language, compression technique, and
version?  How does it even find out what these variant forms are?
These questions generalize easily to other electronic objects
such as digitized images.

+ Page 15 +

5.2  Variants

It is not always feasible to index a group of closely related
objects separately.  This means that there will be cases when a
single Z39.50 result set record is the only access point for
multiple underlying variant objects.  Objects can vary in four
dimensions: composition, encoding, language, and version.  For
example, a pamphlet may be available that has: (1) composition
using TEX, PostScript, and Troff; (2) compression encodings of
ZIP and "compress"; and (3) text written in French, Spanish, and
English.
     One citation for this pamphlet would cover 18 variant
pamphlets (note that some variants, and objects for that matter,
may not need to be stored but can be generated on demand).
Registered tags known as qualifiers identify variants in client
requests using the element set name parameter described below.
With the info-1 record syntax, server responses can include
qualifiers with returned data.  The version qualifier together
with a variant message allows a server to define a variant
dimension of its own.  Clients may find out what variants exist
for an element by using the element set name parameter; the
server responds with a record, each element of which is empty
except for a combination of variant qualifiers.  The current
qualifiers for info-1 appear in Table 4.

-----------------------------------------------------------------
Table 4.  Currently Defined Info-1 Qualifier and Value Tags
-----------------------------------------------------------------

1 Composition   2 Encoding        3 Language   4 Version

1 Text          1 UNIX compress   1 English    (Server-defined)
2 Hytext        2 UNIX tar        2 French
3 PostScript    3 JPEG            3 Spanish
4 TIFF          4 G3 FAX
5 MARC          5 G4 FAX
6 SGML+DTD
7 SGML-DTD
-----------------------------------------------------------------
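
The figure of 18 variants for the pamphlet example can be checked
by enumerating every combination of the three dimensions.  The
sketch below does that and also shows one possible encoding of
Table 4 as (qualifier type, qualifier value) integer pairs.  The
Python names are assumptions for illustration; note that TeX,
Troff, and ZIP from the pamphlet example have no value tags in
Table 4, so only tabulated names can be encoded.

```python
from itertools import product

# Variant dimensions of the pamphlet example in the text.
compositions = ["TeX", "PostScript", "Troff"]
encodings = ["ZIP", "compress"]
languages = ["French", "Spanish", "English"]

variants = list(product(compositions, encodings, languages))
print(len(variants))  # 3 * 2 * 3 = 18

# Table 4: qualifier name -> (type number, {value name: value number}).
QUALIFIERS = {
    "composition": (1, {"Text": 1, "Hytext": 2, "PostScript": 3, "TIFF": 4,
                        "MARC": 5, "SGML+DTD": 6, "SGML-DTD": 7}),
    "encoding": (2, {"UNIX compress": 1, "UNIX tar": 2, "JPEG": 3,
                     "G3 FAX": 4, "G4 FAX": 5}),
    "language": (3, {"English": 1, "French": 2, "Spanish": 3}),
}


def encode_qualifier(kind, value):
    """Return the (type, value) integer pair for a Table 4 entry."""
    type_number, values = QUALIFIERS[kind]
    return type_number, values[value]
```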

The composition "Text" refers to an ASCII text variant that would
display reasonably in an 80-column window, containing lines
terminated by ASCII new line and possibly a carriage return.  The
composition "Hytext" refers to a hypertext variant similar to
Text that may contain short or long objectID references of either
the form "@(shortref@ objectID @)shortref@" or "@(longref@
objectID @)longref@."
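
A client displaying a Hytext variant has to locate these
references before it can turn them into links.  The following
sketch shows one way to do so with a regular expression.  It
assumes that an objectID never contains the closing marker
itself, and the objectID in the sample is an invented,
abbreviated one.

```python
import re

# Matches "@(shortref@ objectID @)shortref@" and the longref equivalent.
# The backreference \1 makes the closing marker match the opening one.
REF_PATTERN = re.compile(r"@\((shortref|longref)@\s*(.*?)\s*@\)\1@",
                         re.DOTALL)


def find_references(hytext):
    """Return (kind, objectID text) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in REF_PATTERN.finditer(hytext)]


# Invented sample text; the objectID is abbreviated for illustration.
sample = "See @(shortref@ {ir example.edu 210} @)shortref@ for details."
print(find_references(sample))
```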

+ Page 16 +

     Although the generality afforded by these variant qualifiers
is indispensable, it is unlikely that any given Z39.50 dialogue
will make use of more than a handful of them.  Instead of
including them in each Present, in the long term it makes sense
to let them default to values determined when the dialogue is
first established; however, at the moment, the protocol does not
support this.

5.3  ObjectIDs

Central to any system that supports hypermedia and hierarchical
browsing (the second being a special case of the first) is a
robust, generalized way to reference a networked object.  As
pointers to objects, objectIDs are efficient to exchange and
remain valid as the underlying objects are updated.  If
individual records in a search result set have objectIDs, they
provide a short cut to accessing those records next time.
     Earlier, an objectID was defined as a unique identifier used
to access an electronic information record associated with an
object.  As long as each record has a unique identifier, objects may
have multiple associated information records.  Identical copies
of an object (e.g., for redistribution) may have different
objectIDs for routine access, even though this poses problems for
clients that collect objects from disparate servers and need to
know when they have more than one copy of the same object.  For
this reason, the original objectID is carried within the
objectIDs of copies.
     Each element of an information record contains an aspect of
an object (e.g., an electronic instance of it).  In order to
represent this with the info-1 record syntax, we need to define a
field as an info-1 component containing either an entire element
or a fragment of it.  In order to access an element fragment, a
minimal objectID must contain the server name, internal object
control number, and fragment address.  When objectIDs are used
for Z39.50 retrieval, many parameters have to be supplied by
convention and much of the generality of Search and Present goes
unused.  For example, a more generalized Z39.50 objectID might
contain server, port, database, attributes, term, element, and
fragment.
     Beyond the Z39.50 context, a proposal for Universal Document
Identifiers (not restricted to text) [9] has made a compelling
argument for UDIs (objectIDs) encoded in visible ASCII that
transcend protocol (e.g., FTP, WAIS, and Z39.50).  The visible
ASCII requirement allows objectIDs to be exchanged in e-mail
messages and to appear in printed publications.

+ Page 17 +

     An important idea missing from the UDI proposal is optional
descriptive information.  Normally stored as elements of the
associated info-1 record, this information needs to be bound
closely to objectIDs that will appear in menus; otherwise a separate
server access will be required to return elements that the user
needs to see (e.g., document title) before a selection will be
possible.  The close binding makes it easy to update remote menus
through a kind of re-linking process.  The proposal also requires
that no whitespace characters appear in a UDI, but given the
length of Z39.50 objectIDs, nonsignificant whitespace needs to be
allowed to assist readability.  To restrict even one of the many
objectID components to, say, the number of characters that fit on
one text line is not feasible.  It is worth noting the similarity
between this objectID format and a text-based query language.
     Equipped with hybrid UDI-style objectIDs, the client system
programmer has a powerful tool for building hypermedia systems.
A few operations on objectIDs would be useful, though not
required for interoperability:

     (1) Open: begin accessing object.

     (2) Read: sequential or random access to object.

     (3) Close: end accessing object.

     (4) Sync: get fresh copy of objectID.

     (5) Compare: client test for identical objectIDs.

In an object-oriented sense, these operations will depend heavily
on the server and object's type.  Open and Close are merely
advice-giving operations; for example, they might be ignored on
stateless servers that do not keep track of whether an object is
open.  The Read operation will likely resemble file I/O for
document and image objects, with a mix of sequential and random
access capability depending on the server.  On the other hand, if
an object is a database, a server, or a person, even sequential
access is unlikely.  The Sync operation returns an updated
objectID, providing a mechanism for replacing stale descriptive
information and obtaining a new forwarding address.  One reason
for not putting copyright disposition inside an objectID is that
even with the Sync operation, an objectID could not normally be
trusted to be either current or authoritative.  Operations are
specified using the element set name parameter.

+ Page 18 +

     ObjectIDs are hierarchical in that they consist of a
sequence of increasingly specific components.  The shorter the
sequence, the higher level the object (e.g., a fragment has a
long sequence).  Some of the high-level components may be
inferred, so that, for example, every object in a database need
not be stored with its full objectID.  Not only servers but also
clients must be able to parse Z39.50 objectIDs in order to
understand what level of object (e.g., server or fragment) is
implied.  A received objectID could then be modified to imply
lower or higher level objects.  Here is an example of a Z39.50
objectID identifying the first 4096 bytes of a simple ASCII text
version of this article:

{ir infocal.berkeley.edu 210
        DB_docid objectID kunz92 object_text 1-4096 {}
        {"Nonbibliographic Applications of Z39.50" 10/16/92}}

Rather than use the special purpose notation of the UDI proposal,
this example uses a different structure (offered without further
explanation at this time as it has not been finalized) expressed
with Tcl. [10]  This was an easy choice since Tcl expresses
hierarchical lists with quoting and non-ASCII capability, and the
Tcl software to read and build such lists is freely available for
UNIX, Macintosh, and PC platforms.
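
For readers without Tcl at hand, the sketch below parses such a
list into nested Python lists.  It is a deliberately simplified
reader (brace nesting, double-quoted strings, and bare words
only; no backslash escapes or substitutions), not a replacement
for the Tcl library.  Because the objectID structure has not been
finalized, the sketch assigns no meaning to the individual
components.

```python
def parse_list(text):
    """Parse a simplified Tcl-style list into nested Python lists."""
    pos = 0

    def parse_items(closing):
        nonlocal pos
        items = []
        while pos < len(text):
            ch = text[pos]
            if ch.isspace():
                pos += 1
            elif ch == "{":
                pos += 1
                items.append(parse_items("}"))
            elif ch == "}":
                if closing != "}":
                    raise ValueError("unexpected '}' at offset %d" % pos)
                pos += 1
                return items
            elif ch == '"':
                end = text.index('"', pos + 1)  # ValueError if unterminated
                items.append(text[pos + 1:end])
                pos = end + 1
            else:
                start = pos
                while (pos < len(text) and not text[pos].isspace()
                       and text[pos] not in "{}"):
                    pos += 1
                items.append(text[start:pos])
        if closing == "}":
            raise ValueError("missing closing '}'")
        return items

    return parse_items(None)


object_id = '''{ir infocal.berkeley.edu 210
        DB_docid objectID kunz92 object_text 1-4096 {}
        {"Nonbibliographic Applications of Z39.50" 10/16/92}}'''
parsed = parse_list(object_id)
print(parsed[0][:3])
```

The outer list has one element, the objectID itself; its last
component is the nested list holding the quoted title and date.
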
     In generalizing objectIDs to this extent, care must be taken
to define what happens when access errors occur or when the
object is not a monolithic element fragment.  If, for example,
the object is a query that returns multiple records from a result
set, a reference to the object may cause the client either to
start up an interactive dialogue with that result set or to
simply display the records noninteractively.

+ Page 19 +

5.4  Satisfying Sections

Sometimes a search locates a record based on criteria not
immediately apparent to the user.  For example, a server may
match on synonyms generated from the user's term or use a
relevance feedback mechanism.  Particularly in cases where an
element that triggered a match is large and matches at multiple
locations, there is a need to return a list of satisfying
sections or "hits" within an element.
     While a satisfying section may indicate a simple segment of
bytes, for some objects a server may disallow byte addressing and
offer section addressing instead, where an element (e.g., a
document) is divided at the server into a sequence of variable-
length sections (e.g., physical page frames or SGML elements).
In this case, each section has its own objectID, called a
sectionID, that a client submits as the "current" section,
relieving the server of having to maintain state information
needed for the client to retrieve the next, previous, or current
section.

5.5  Element Set Names

The element set name parameter is used by clients to request that
returned records be composed of a particular combination of
elements.  It has been provisionally extended to a composite of
several hierarchically arranged parameters that currently do not
have a well-defined role within Z39.50 and use the element set
name parameter as a temporary home.  They specify element set,
fragment, variant, and more, as listed in Table 5.  Until better
solutions are found through experience, a Tcl-based list format
for this parameter is suggested, with the provisional format
selected (instead of the classical format) whenever the first
character of the parameter is the "{" character.
     At the top level, clients can request an element set name
understood through the Explain facility to refer to a particular
combination of elements.  By default, info-1 elements are
returned in a tagged format, but an efficient untagged
(positional) format may be requested.  An ordered sequence of
field parameters can be used to request a particular record
makeup.  Each field parameter identifies an element and optional
information about operations, variants, and fragments.
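
The rule for choosing between the two formats is simple enough to
state as code.  The sketch below is a minimal illustration; the
function name and return labels are assumptions.

```python
def element_set_name_format(parameter):
    """Classify an element set name parameter by its first character."""
    if parameter.startswith("{"):
        return "provisional"   # Tcl-based list format
    return "classical"         # e.g., the predefined names "F" or "B"


print(element_set_name_format("F"))
print(element_set_name_format('{field {name "by"}}'))
```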

+ Page 20 +

-----------------------------------------------------------------
Table 5.  Element Set Name Parameters
-----------------------------------------------------------------

Top-Level Element Set Name Parameters

     Unit           Meaning

     Esname         Set name discovered via Explain
     Zsname         Predefined classical Z39.50 name ("F" or "B")
     Untagged       If nonzero, do not tag returned elements
     Field          Per field specifiers and qualifiers

Field Specifiers and Qualifiers

     Unit           Meaning

     Name           Element (tag) requested
     ObjectID       Element or fragment identifier
     Operation      Open, Close, Read, and Sync
     Fragment       Fragment specifier
     Variants       Get list of variants if nonzero
     Composition    Format (text, hytext, TIFF, MIDI, etc.)
     Encoding       Archive/compression (tar, JPEG, MPEG, etc.)
     Language       For text (French, English, Spanish, etc.)
     Version        Server-defined variant (VMS, Ultrix, DOS,
                    etc.)

Fragment Specifiers

     Unit           Meaning

     Start          Offset of beginning of fragment (bytes)
     End            Offset of end of fragment (bytes)
     Length         Length of fragment (bytes)
     Section        Next, Previous, First, Last, Current, and
                    Best
     Units          Bytes, Lines, Paragraphs, Pages, and Frames

-----------------------------------------------------------------

A fragment may be requested by relative section or offset,
depending on the element variant.  The default unit for numeric
fragment specifiers is bytes, but alternate units may be
requested.  Info-1 field fragments come with flags indicating if
the beginning or ending of an element was returned.  An element
or fragment objectID (once the format is finalized) is capable of
expressing all the per field requests and can be used instead of
the per field parameters.
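
The server side of a byte-offset fragment request might look like
the sketch below.  It assumes that offsets are 1-based and
inclusive, a reading suggested by the "start 1 end 8192" request
for the first 8192 bytes in Figure 2 but not stated outright; the
function name is likewise an assumption.

```python
def read_fragment(element, start, end):
    """Return (fragment, beginning_of_element, end_of_element).

    `element` is the full element as bytes; `start` and `end` are
    1-based, inclusive byte offsets (an assumed convention).
    """
    if start < 1 or end < start:
        raise ValueError("invalid fragment bounds")
    fragment = element[start - 1:end]
    beginning_of_element = start == 1
    end_of_element = end >= len(element)
    return fragment, beginning_of_element, end_of_element


data = bytes(range(10))
print(read_fragment(data, 1, 4))
```

The two flags correspond to the beginning and ending indications
that accompany info-1 field fragments.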

+ Page 21 +

     It is worth mentioning that the "Best" section specification
above makes most sense when the server is returning a record from
a result set built according to user-supplied criteria;
otherwise, it may mean anything that the server likes.  Also,
"Pages" specifies a publisher-defined unit.
     Figure 2 shows two example element set name parameters.
The first one specifies generic elements "object" and "by," with
only the first 8192 bytes of a JPEG-compressed TIFF image variant
requested.  In the second example, an efficient positional
sequence of elements is requested.

-----------------------------------------------------------------
Figure 2.  Element Set Name Parameter Examples
-----------------------------------------------------------------

Example 1

     {field {name "object"
             composition "TIFF"
             encoding "JPEG"
             fragment {start 1 end 8192}}
      field {name "by"}}

Example 2

     {field {name "name"}
      field {name "date"}
      field {name "by"}
      untagged 1}

-----------------------------------------------------------------
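
A client could assemble parameters like those in Figure 2 from a
simple description of the fields it wants.  The sketch below is
one possible helper; since the list format is provisional, the
function and its argument layout are assumptions, and it covers
only the units used in the two examples.

```python
def build_element_set_name(fields, untagged=False):
    """Build a provisional element set name string.

    `fields` is a list of dicts.  Each needs a "name" key and may carry
    "composition", "encoding", and "fragment" given as (start, end).
    """
    parts = []
    for field in fields:
        words = ['name "%s"' % field["name"]]
        for key in ("composition", "encoding"):
            if key in field:
                words.append('%s "%s"' % (key, field[key]))
        if "fragment" in field:
            words.append("fragment {start %d end %d}" % field["fragment"])
        parts.append("field {%s}" % " ".join(words))
    if untagged:
        parts.append("untagged 1")
    return "{%s}" % " ".join(parts)


example1 = build_element_set_name([
    {"name": "object", "composition": "TIFF", "encoding": "JPEG",
     "fragment": (1, 8192)},
    {"name": "by"},
])
example2 = build_element_set_name(
    [{"name": "name"}, {"name": "date"}, {"name": "by"}], untagged=True)
print(example1)
print(example2)
```

The output is a single-line form of the same lists; line breaks
inside a Tcl list are not significant.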

5.6  Extensions to the Present Capability

Some extensions to the Present facility to support object
retrieval would be extremely desirable, but formal extensions may
benefit from a few short-term conventions.  Current proposals for
document retrieval call for an internal control number search
with a piggy-backed Present of what is expected to be a single
record result set.  This method was partly chosen to allow for
stateless servers that do not keep result sets.  In terms of
objects, this sort of search is so common and so specialized (in
that the result count must be zero or one) that it really
belongs as a special kind of Present.  Using the objectID element
set name parameter in a Present against the reserved result set
name ("_NoResultSet_"), a server could be truly stateless and not
have to support Search at all, let alone the piggy-backed
document Present kludge.

+ Page 22 +

     Another urgently required extension is batched Present
Responses.  A request for an entire object (e.g., an image)
fragmented into multiple Present Responses at the server would
allow for much more efficient data transfer than having the
client take receipt of each fragment before being able to request
the next one.  In the short term, a client Present against the
reserved result set name ("_BatchPresent_") could authorize a
server to fragment the requested element and send the pieces in
multiple responses.  A new temporary fragment flag could indicate
when the batched responses are done or if the server refuses to
batch them.
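
On the client side, batched responses would be reassembled
roughly as in the sketch below.  The (value, batch_done) pairs
are a hypothetical stand-in for decoded Present Responses, and
the flag name is an assumption, since the temporary fragment flag
has not been defined.

```python
def reassemble(responses):
    """Concatenate fragment values until a response is flagged as last.

    `responses` is an iterable of (value, batch_done) pairs, where
    `value` is bytes and `batch_done` marks the final response.
    """
    pieces = []
    for value, batch_done in responses:
        pieces.append(value)
        if batch_done:
            return b"".join(pieces)
    raise ValueError("batch ended before the final response was seen")


print(reassemble([(b"first ", False), (b"second", True)]))
```
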
     Those clients that need to retrieve scattered records (e.g.,
in result set browsing) or those that need the server to send a
sampling from a result set also need protocol extensions.
Currently, record numbers are not returned, nor is it possible to
request noncontiguous records or records from multiple result
sets in a single request.  Temporary solutions using new element
set name parameters and a new reserved result set name may be
called for.

5.7  Copyright Statements

When the copyright component of an info-1 field is present, it
contains an objectID that can be used to obtain a copyright
statement.  Once the client has retrieved and displayed the
statement, future occurrences of that copyright identifier during
the same session may obviate the need to display it again,
depending on the legal obligations.

5.8  Formal Info-1 Structure

Figure 3 gives the formal ASN.1 [11] structure of the general-
purpose record format being described.  Its main job is to
contain a sequence of data fields.  A substantial number of
diagnostics still need to be added to the bib-1 diagnostic set to
support the new functionality that info-1 promises.

+ Page 23 +

-----------------------------------------------------------------
Figure 3.  Formal Info-1 Record Structure
-----------------------------------------------------------------

BEGIN

-- Note that lots of things are VisibleString because the
-- client may need to resubmit them in a Present as an element
-- set name parameter.

GenericRecord ::= SEQUENCE {

     elementSetName      ElementSetName OPTIONAL,
     fieldCount          [0] IMPLICIT INTEGER OPTIONAL,

-- These fields allow client shortcuts in record parsing.

     userMessage         [1] IMPLICIT VisibleString OPTIONAL,

-- Anything not covered elsewhere.

     rank                [2] IMPLICIT INTEGER OPTIONAL,

-- Used for weighted result sets.
-- Large data (fields and records) go at the end so clients
-- doing partial parsing of records see headers first.

     positionalFields    [3] IMPLICIT SEQUENCE OF OCTET STRING
                         OPTIONAL,

-- Lightweight for efficient transfer (e.g. tables and files).
-- Positional semantics from Explain and elementSetName.

     taggedFields        [4] IMPLICIT SEQUENCE OF TaggedField
                         OPTIONAL,

-- Generalized fields.

     records             [5] IMPLICIT SEQUENCE OF GenericRecord
                         OPTIONAL

-- Composite/hierarchical record (e.g., holdings records).
-- Records at end allow tail recursion elimination.
}

+ Page 24 +

TaggedField ::= SEQUENCE {

     tag                 [6] IMPLICIT INTEGER,

-- Tagged fields for variable or unExplained element sets.
-- The same tag may occur in more than one field (e.g., for
-- element variants or multiple abstracting levels).

     value               [7] IMPLICIT OCTET STRING OPTIONAL,

-- Data element fragment.

     objectID            ObjectID OPTIONAL,

-- Identifies current fragment if value present so that the
-- server need not maintain client's location within an element;
-- identifies "qualified" element if value absent.

     hits                [8] IMPLICIT SEQUENCE OF
                         SatisfyingSection OPTIONAL,

-- Byte offsets and lengths of hits within this element are
-- not necessarily in this fragment.

     copyrightID         ObjectID OPTIONAL,

-- Means element is copyrighted.  This is a special objectID
-- used to retrieve actual copyright on demand; we don't
-- want to ship a legal document with each element.

     flags               [9] IMPLICIT BIT STRING { endOfElement
                         (0), beginningOfElement (1)},
     qualifiers          [10] IMPLICIT SEQUENCE OF Qualifier
                         OPTIONAL,
     variantSize         [11] IMPLICIT INTEGER OPTIONAL,
     variantMessage      [12] IMPLICIT VisibleString OPTIONAL
}

SatisfyingSection ::= SEQUENCE {

     fragmentID          ObjectID OPTIONAL,
     offset              [13] IMPLICIT INTEGER,
     length              [14] IMPLICIT INTEGER
}

+ Page 25 +

Qualifier ::= SEQUENCE {

     qualifierType       [15] IMPLICIT INTEGER,

-- Composition, Encoding, Language, and Version.

     qualifierValue      [16] IMPLICIT INTEGER

-- Composition: TEX, TIFF, MIDI, etc.
-- Encoding: Compress, tar, JPEG, MPEG, etc.
-- Language: French, English, etc.
-- Version: (Statically undefined).

}

ObjectID ::=             [17] IMPLICIT VisibleString

ElementSetName ::=       [103] IMPLICIT VisibleString
END
----------------------------------------------------------------
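
For client programmers, the structure in Figure 3 can be mirrored
by in-memory types once a record has been decoded.  The sketch
below does this with Python dataclasses.  It does no BER encoding
or decoding, it represents an absent SEQUENCE OF as an empty
list, and it splits the flags BIT STRING into two booleans; all
three are simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Qualifier:
    qualifier_type: int      # composition, encoding, language, or version
    qualifier_value: int


@dataclass
class SatisfyingSection:
    offset: int
    length: int
    fragment_id: Optional[str] = None


@dataclass
class TaggedField:
    tag: int
    value: Optional[bytes] = None            # data element fragment
    object_id: Optional[str] = None
    hits: List[SatisfyingSection] = field(default_factory=list)
    copyright_id: Optional[str] = None
    end_of_element: bool = False             # flags bit 0
    beginning_of_element: bool = False       # flags bit 1
    qualifiers: List[Qualifier] = field(default_factory=list)
    variant_size: Optional[int] = None
    variant_message: Optional[str] = None


@dataclass
class GenericRecord:
    element_set_name: Optional[str] = None
    field_count: Optional[int] = None
    user_message: Optional[str] = None
    rank: Optional[int] = None
    positional_fields: List[bytes] = field(default_factory=list)
    tagged_fields: List[TaggedField] = field(default_factory=list)
    records: List["GenericRecord"] = field(default_factory=list)


# A citation-style record with one tagged field (tag 2 is "name").
citation = GenericRecord(rank=1, tagged_fields=[
    TaggedField(tag=2, value=b"Nonbibliographic Applications of Z39.50"),
])
print(citation.tagged_fields[0].tag)
```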

5.9  Two Examples

It may be useful to walk through an example of a relevance
feedback search followed by a retrieval of an electronic document
without using the Explain facility.  A relevance feedback search
involves specifying a special Use attribute called "concept" for
a search term containing a segment of text; the server tries to
find documents that are somehow similar to the text and
constructs a result set of citation records, each of which
identifies a document together with a ranking to indicate the
degree of similarity.  If the server orders the result set with
highest ranked documents first, the Search Response can carry
citations (not including the full text) for the most similar
documents back to the user.

+ Page 26 +

     The client may elect to see the full text of a document in
one of two ways.  An older way involves doing another search--
this time giving a term containing the objectID (from the
citation), an "objectID" Use attribute, and a database name that
is somehow well-known to the client.  This is not a normal search
because it must have a single-valued result that is piggy-backed
onto the Search Response and is never accessed again as a result
set.
     A cleaner way to return the text treats the operation as a
special retrieval on an objectID, but without a result set.  This
suggests that retrieval could take place without a prior search,
and, in fact, the protocol supports sessions that allow Present
while disallowing Search; a number of single-purpose document
servers would likely choose such a configuration.  Currently,
this kind of retrieval would be accomplished by using the Present
parameters of element set name for the objectID and result set
name for the reserved name "_NoResultSet_."  This is illustrated
below in Figure 4.

+ Page 27 +

-----------------------------------------------------------------
Figure 4.  Example of Document Retrieval
-----------------------------------------------------------------

(1) Send document citation query based on known text segment.

     {attributeType = 1(use)
     attributeValue = 8(concept)
     attributeType = 2(relation)
     attributeValue = 3(equals)
     term = <bytes of text segment go here>}

     Note: normal search, no special element set name needed
     since well-behaved servers don't return large objects with
     citations, but indicate which variants are available for the
     object field by repeating the field without the optional
     value component.

(2) Show returned info-1 records, each of the following form.

     {rank = K
     taggedFields = {
     {tag = 2(name) value = <document title>}
     {tag = 3(by) value = <document author>}
     {tag = 4(location) value = <document location>}
     {tag = 5(date) value = <publication date>}
     {tag = 6(abstract) value = <abstract>}
     {tag = 7(object) variantSize = S1 hits = {offset = M1}
     qualifiers = {objectID = <documentID1>
     qualifierType = 1(composition) qualifierValue = Q1}}
     {tag = 7(object) variantSize = S2 hits = {offset = M2}
     qualifiers = {objectID = <documentID2>
     qualifierType = 1(composition) qualifierValue = Q2}}
     {tag = 7(object) variantSize = S3  . . .
     }}}

(3) Ask for best section of first document using Present.

     elementSetName = {objectID = <documentID1> name = 7(object)
     fragment = {start = M1 end = M1+4096}}
     resultSetName = _NoResultSet_

(4) Show returned info-1 record, having the following form.

     {taggedFields = {{tag = 7(object) value = <section bytes>}}}

-----------------------------------------------------------------

+ Page 28 +

The second example illustrates retrieval against databases of
unknown semantics, again without using the Explain facility.
Consider a server that allows retrieval of course scheduling
information for a university.  The server maps the course catalog
record components onto generic attribute and element tags and
publicizes the database name by some means other than Explain.
Figure 5 shows how the client can retrieve selected elements in
an efficient untagged format for all courses taught by "Smith."

-----------------------------------------------------------------
Figure 5.  Example of Course Catalog Retrieval
-----------------------------------------------------------------

(1) Send query with appropriate element set name parameters.

     query = {
          attributeType = 1(use) attributeValue = 3(by)
          attributeType = 2(relation) attributeValue = 3(equals)
          attributeType = 3(position) attributeValue = 3(any)
          term = "Smith"}

     elementSetName = {
          untagged = 1
          {name = 3(by)} {name = 2(name)} {name = 6(abstract)}}

(2) Display positional fields in returned info-1 records.

     record1 = {untaggedFields = {<by1> <name1> <abstract1>}}
     record2 = {untaggedFields = {<by2> <name2> <abstract2>}}
     . . .

-----------------------------------------------------------------
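
Because untagged fields carry no tags, the client must pair each
returned value with the element name at the same position in its
request.  A minimal Python sketch of step (2) of Figure 5
follows, assuming each record has been decoded into a dictionary
holding a list under untaggedFields.  The function name
label_untagged and the sample course data are hypothetical.

```python
# Element names in the order they were requested in step (1).
REQUESTED = ["by", "name", "abstract"]


def label_untagged(record, requested=REQUESTED):
    """Pair positional field values with the requested element names."""
    values = record["untaggedFields"]
    if len(values) != len(requested):
        raise ValueError("field count does not match the request")
    return dict(zip(requested, values))


# Made-up sample record standing in for <by1> <name1> <abstract1>.
record1 = {"untaggedFields": ["Smith", "Intro to Logic", "MWF 10-11"]}
course = label_untagged(record1)
```

The length check reflects the cost of the untagged format: the
fields are smaller on the wire, but the client can interpret them
only by position, so a mismatch cannot be repaired after the
fact.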

+ Page 29 +

6.0  Conclusion

Z39.50 is a workable protocol for more than bibliographic
information retrieval even though the client system programmer
still has to work out some details.  Steady growth in the number
of nonbibliographic implementors has flushed out some weaknesses
in the protocol.  Active development and interest from the
computer industry and educational institutions are providing
exactly the kind of cross-pollination the protocol needs to
become more robust.  The key to this evolutionary process will be
containing a potential explosion in the number of semantic
contexts while ensuring that a few contexts are rich enough to
support compelling general-purpose user interfaces.


References and Notes

1.  For a more detailed description of the Z39.50 protocol see:
Clifford A. Lynch, "Information Retrieval as a Network
Application," Library Hi Tech 8, no. 4 (1990): 57-72.

2.  John A. Kunze, "UCB Network Information Server--Project
Overview" (Paper presented at the University of California
Academic Computing Conference, 1989).  (Computer file:
help/dist/nis.txt, available via anonymous ftp from
ftp.cc.berkeley.edu.)

3.  This work was partially supported by Digital Equipment
Corporation and Sun Microsystems, Inc.

4.  ANSI/NISO Z39.50-199X, Proposed ANSI Information Retrieval
Application Service Definition and Protocol Specification for OSI
(Vienna, VA: Omnicom Information Service, 1991).

5.  Brewster Kahle, "Document Identifiers, or International
Standard Book Numbers for the Electronic Age" (n.p.: 1991).
(Computer file: ZIG91-46, available via anonymous ftp from
think.com.)

6.  Clifford Lynch, "Extensions to ISO DP 10162/10163 to Support
an Explain Service" (n.p.: 1989).  (Computer file: ZIG90-9,
available via anonymous ftp from think.com.)

7.  Network Development and MARC Standards Office, USMARC Concise
Formats for Bibliographic, Authority, and Holdings Data
(Washington, DC: Cataloging Distribution Service, Library of
Congress, 1988).

+ Page 30 +

8.  CWISP Working Group, "Campus Wide Information System
Protocol--Version 0.50 RFC, Draft 4" (n.p.: 1991).

9.  Tim Berners-Lee et al., "Universal Document Identifiers on
the Network" (n.p.: 1992).  (Computer file: pub/www/doc/udi1.ps,
available via anonymous ftp from info.cern.ch.)

10.  John Ousterhout, "Tcl: An Embeddable Command Language," in
Proceedings USENIX Winter Conference, January 1990 (Berkeley: The
USENIX Association, 1990).

11.  International Organization for Standardization, OSI
Specification of Abstract Syntax Notation One (ASN.1) (Vienna,
VA: Omnicom, Inc., 1987).


About the Author

John A. Kunze, Information Systems and Technology, 289 Evans
Hall, UC Berkeley, Berkeley, CA 94720, (510) 642-1530.  Internet:
jak@violet.berkeley.edu.

-----------------------------------------------------------------
The Public-Access Computer Systems Review is an electronic
journal that is distributed on BITNET, Internet, and other
computer networks.  There is no subscription fee.
     To subscribe, send an e-mail message to LISTSERV@UHUPVM1
(BITNET) or LISTSERV@UHUPVM1.UH.EDU (Internet) that says:
SUBSCRIBE PACS-P First Name Last Name.  PACS-P subscribers also
receive two electronic newsletters: Current Cites and Public-
Access Computer Systems News.
     This article is Copyright (C) 1992 by John A. Kunze.  All
Rights Reserved.
     The Public-Access Computer Systems Review is Copyright (C)
1992 by the University Libraries, University of Houston.  All
Rights Reserved.
     Copying is permitted for noncommercial use by computer
conferences, individual scholars, and libraries.  Libraries are
authorized to add the journal to their collection, in electronic
or printed form, at no charge.  This message must appear on all
copied material.  All commercial use requires permission.
-----------------------------------------------------------------