Research Report AI-1990-01
Artificial Intelligence Programs
The University of Georgia
Athens, Georgia 30602

Available by ftp from aisun1.ai.uga.edu (128.192.12.9)

Series editor: Michael Covington <mcovingt@aisun1.ai.uga.edu>


        A Dependency Parser for Variable-Word-Order Languages

                        Michael A. Covington

                  Artificial Intelligence Programs
                     The University of Georgia
                      Athens, Georgia 30602
                       MCOVINGT@UGA.UGA.EDU

                         January 2, 1990
──────────────────────────────────────────────────────────

0. Introduction
1. Variable word order: the problem
2. Dependency grammar (DG)
3. Unification-based dependency grammar
4. The parsing algorithm
5. The implementation
6. Evaluation of the IBM 3090 environment
7. Remaining issues
8. Conclusion

──────────────────────────────────────────────────────────
0. Introduction

This paper presents a new approach to the recognition of sentence
structure by computer in human languages that have variable word
order. In a sense, the algorithm is not new; there is good evidence
that it was known 700 years ago (Covington 1984). But it has not
been implemented on computers, and the modern implementations that
are most like it fail to realize its crucial advantage for dealing
with variable word order.[1] In fact, present-day parsing
technology is so tied to the fixed word order of English that
researchers in Germany and Japan customarily build parsers for
English rather than their own languages.

The new algorithm uses dependency grammar. Unlike the more usual
phrase structure grammars, a dependency grammar does not divide the
sentence up into phrases (constituents); instead, it identifies the
grammatical relations that connect one word to another. This is
advantageous in languages where the order of words is variable and
many of the constituents are discontinuous.
[1] Phyllis McLanahan provided invaluable assistance with Russian
data. The early stages of this work were supported by National
Science Foundation Grant IST-85-02477. The VM/Prolog implementation
of GULP was developed while visiting the Seminar für
natürlich-sprachliche Systeme, University of Tübingen. Norman
Fraser and Richard Hudson provided valuable encouragement during
the later stages of the project. Responsibility for opinions and
errors rests solely with the author.
The algorithm presented here is implemented in Prolog on an IBM
3090 and has been used successfully to parse Russian and Latin
sentences. The IBM 3090 is well suited for this and other
applications in artificial intelligence and symbolic computing
because of its large address space, memory caching, and ability to
prefetch along both alternatives of a conditional branch.
Performance is currently limited not by the hardware, but by the
VM/Prolog interpreter, which could be replaced by a considerably
faster compiler.
1. Variable word order: the problem

1.1. Most human languages have partly variable word order

Most of the languages of the world allow considerably more
variation of word order than does English. For example, the
English sentence

   The dog sees the cat.

has six equally grammatical Russian translations:

   Sobaka vidit koshku.
   Sobaka koshku vidit.
   Vidit sobaka koshku.
   Vidit koshku sobaka.
   Koshku vidit sobaka.
   Koshku sobaka vidit.

These differ somewhat in emphasis but not in truth conditions. The
subject and object are identified, not by their positions, but by
their inflectional endings (-a for nominative case and -u for
accusative). By switching the endings, one can say "the cat sees
the dog" without changing the word order:

   Sobaku vidit koshka. (etc.)

The languages of the world can be ranked as to the amount of word
order variability they allow. For example:

   Almost no variation:  Chinese, English, French
   Some variation:       Japanese, German, Finnish
   Extensive variation:  Russian, Latin, Korean
   Maximum variation:    Walbiri (Australia)

Because English is at one end of the scale -- permitting almost no
word order variation -- it is a poor sample of a human language to
do research on. A priori, one would expect that parsing techniques
developed solely for English might not work at all for other
languages, and might not be correct on a deeper level even for
English. In what follows I shall argue implicitly that this is the
case.
1.2. Phrase-structure grammar (PSG) cannot handle variable word
order

Virtually all present-day linguistic theories analyze the sentence
by breaking it into substrings (constituents). For example:

                    S
             ┌──────┴──────┐
             NP            VP
           ┌─┴─┐        ┌──┴──┐
           D   N        V     NP
           │   │        │   ┌─┴─┐
           │   │        │   D   N
          The dog     sees the cat.

Here the dog is a noun phrase, sees the cat is a verb phrase, and
the grammar consists of phrase-structure rules (PS-rules) such as

   S  → NP + VP
   NP → D + N

supplemented to some extent with rules of other types.
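Such PS-rules carry over directly into Prolog, the language used
for the implementation described in section 5, as definite clause
grammar (DCG) rules. The following fragment is a minimal sketch
for illustration only; it is not part of the parser presented in
this paper:

   % The PS-rules above in standard Prolog DCG notation.
   s  --> np, vp.        % S  -> NP + VP
   np --> d, n.          % NP -> D + N
   vp --> v, np.         % VP -> V + NP

   % A tiny lexicon, just enough for the example sentence.
   d --> [the].
   n --> [dog].
   n --> [cat].
   v --> [sees].

   % Usage:  ?- phrase(s, [the,dog,sees,the,cat]).
   % succeeds: the sentence is generated by the grammar.

A fixed-word-order fragment like this accepts only the order the
rules state, which is precisely the limitation at issue in this
section.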
This approach has been highly successful with English, but it has
trouble with variable word order for two reasons. First, the order
of the constituents is variable. Second, and more seriously,
variable word order often results in discontinuous constituents.
Consider the following sentence, from a short story by Lermontov:

   Khoroshaya  u     tebia  loshad'.
   nom.              pobj.  nom.
   good        with  you    horse

   = `You have a good horse.'
     (lit. `A good horse is with you.')

Here `good' and `horse' should, on any reasonable analysis, form a
constituent. Yet `you', which is not part of this constituent,
intervenes. A phrase-structure tree for this sentence would have
crossing branches, which phrase-structure grammar disallows:

                S
          ┌─────┴─────┐
          NP          VP
      ┌───┤           │
     Adj  │           PP
      │   │         ┌─┴──┐
      │   │         P    NP
      │   └─────────┼────┼───┐
      │             │    │   N
      │             │    │   │
 khoroshaya         u  tebia loshad'

(Russian omits the word is, allowing a VP to consist of a bare
predicative NP, PP, or AdjP.)

Even the object of a preposition can be broken up, as in the
example
               PP
          ┌────┴─────┐
          P          NP
          │       ┌──┴──┐
          │       NP    N
      ┌───┼───────┘     │
      N   │             │
      │   │             │
   minuty cherez       dve
    gen.               acc.
   minutes within      two

   = `within two minutes'
     (lit. `within a pair of minutes')

from the same short story.
In Latin poetry, extremely scrambled sentences are common. For
example, the following line has the constituent structure

   S  → VP + NP
   VP → V (venit) + Adv (iam)
   NP → Adj (ultima) + NP + N (aetas)
   NP → Adj (Cumaei) + N (carminis)

a structure that cannot be drawn over the surface order without
crossing branches:

   ultima  Cumaei  venit     iam  carminis  aetas
   nom.    gen.                   gen.      nom.
   last    Cumean  has-come  now  song      age

   = `The last epoch of the Cumean song has now arrived'
     (Vergil, Eclogues IV.4)

Siewierska (1988) gives examples of discontinuity in Walbiri and
other languages.
1.3. Scrambling transformations do not solve the problem

One way to handle discontinuous constituents is to write
phrase-structure rules that generate them whole and then break
them up with syntactic transformations or "scrambling rules."
Twenty years ago, this was the preferred analysis (see e.g. Ross
1967). But such a theory claims that variable-order languages are
intrinsically more complex than fixed-order languages -- in fact
that a variable-order language is a fixed-order language plus a
possibly immense set of scrambling rules.

If this were so, one would expect languages burdened by word order
variability to become "simpler" (i.e., more fixed) over time, but
this does not necessarily happen. Though the Indo-European
languages are generally drifting toward fixed order, languages in
other families, such as Finnish, are developing more elaborate
case systems that increase the possibilities for word order
variation.

Intuitively, word order variability may well be a simplification
of the syntax that is compensated by a more elaborate morphology.
That is, a variable-word-order language is one with a lack of
word-order rules, not a superabundance of them.

More importantly, a transformational analysis is worse than no
help at all for parsing. Because transformations are tree-to-tree
mappings, a parser can only undo a transformation if it has
already recognized (parsed) the tree structure that represents the
output of the transformation. That is, the only way to parse a
transformed sentence is to undo the transformation -- but the only
way to undo the transformation is to parse the sentence first.
1.4. ID/LP formalism cannot handle discontinuous constituents

Recently, several linguists have proposed handling variable word
order by splitting phrase-structure rules into two components:
immediate dominance rules (ID-rules) that say what a phrase
consists of, and linear precedence rules (LP-rules) that state the
order in which the constituents appear.

The ID/LP framework has been quite successful in capturing
generalizations about word order in fixed-order languages. Gazdar
et al. (1985), for instance, account for the whole of English word
order with just three LP rules.

Another claim often made for ID/LP formalism is that it accounts
for word order variability. If the relative order of a set of
constituents is unstated by the LP rules, they are allowed to
occur in any order.

But this is not enough. Removing the LP rules merely allows
reordering the elements within each constituent, i.e., the nodes
immediately dominated by a single node. It still does not account
for discontinuous constituents. Faced with substantial word order
variation, the ID/LP grammarian must replace the
transformationalist's scrambling rules with "flattening rules"
that simply discard constituent structure and make nearly all
words hang directly from the S node.

Flattening was pioneered and then rejected by Uszkoreit (1986a,
1987). The problem is that a "flat" structure is no structure at
all -- or to put it differently, a tree that shows all the words
hanging from the same node is not really a tree at all; it claims
only that the words form a sentence, and begs the question of what
relations interconnect them.

Because of this, ID/LP parsing algorithms such as those discussed
by Evans (1987) are not adequate for variable word order
languages. Nor is the variable-word-order ATN parsing technique
discussed by Woods (1987), which is equivalent to a form of ID/LP
grammar.
1.5. Nonconfigurationality does not solve the problem

Chomsky (1981), Hale (1983), and others have split the languages
of the world sharply into two types: `configurational' languages,
such as English, in which grammatical relations are defined by
tree structure and word order is fixed; and `non-configurational'
languages, such as Walbiri, in which tree structure is less
important and word order may or may not be variable.

Like flattening, nonconfigurationality begs the question of how to
represent structure in, and how to parse, a non-configurational
language. Kashket (1986) has developed a free-word-order parser
consistent with Chomsky's theories, but it works rather like a
dependency parser, searching through the input string for
arguments of each word.

More importantly, many linguists remain unconvinced that there is
a distinction between configurational and non-configurational
languages. Siewierska (1988) cites extensive evidence that
languages form a continuum from fully fixed to fully variable word
order, with no sharp transition from one type to the other.
1.6. Current parsing technology is limited by the limitations of
phrase-structure grammar

Almost all known parsing algorithms are based on constituency
grammars. As a result, they have trouble handling variable word
order. The normal practice among researchers in Germany and Japan
is to build parsers for English -- which fits the constituency
model relatively well -- rather than for their own languages,
which do not (e.g., Kolb 1987, Matsumoto et al. 1983, Tomita
1986). A business magazine recently reported that the Japanese are
developing successful computer programs to translate English into
Japanese but cannot go the other way because Japanese is much
harder to parse (Wood 1987).

Constituency grammars are popular for parsing largely because they
are easily modeled by the context-free phrase-structure grammars
(CF PSGs) of formal language theory. Thus efficient parsing
algorithms are available and the complexity of the parsing task is
easy to study.

CF PSGs are not entirely adequate for human language, and
linguists' concern has therefore been how to augment them. The
major augmentations include transformations (Chomsky 1957),
complex symbols (Chomsky 1965), reentrant feature structures
(Kaplan and Bresnan 1982), and slash features to denote missing
elements (Gazdar et al. 1985). Gazdar's theory claims as a triumph
that it is closer to a CF PSG than any previous viable linguistic
theory.

However, one should remember that formal language theory -- the
branch of mathematics from which PSG came -- has nothing to do
with languages, i.e., communication systems. Formal languages are
called languages only metaphorically (because they have grammars).
They are strings of symbols, and their most important use is to
represent sequences of operations performed by a machine. It is by
no means obvious that formal systems developed for this purpose
should be immediately applicable to human languages.

A much more satisfactory approach is to admit that constituents
are sometimes discontinuous and to develop a suitable
representation and parsing strategy. Recently, discontinuous
constituents have become an important issue in linguistic theory
(Huck and Ojeda 1987).

The advantages of dependency parsers for dealing with
discontinuity have not been generally recognized. Early
computational linguists worked with dependency grammars, but only
in forms that were easily convertible to PSGs and could be parsed
by essentially the same techniques (Hays and Ziehe 1960, Hays
1964, Bobrow 1967, Robinson 1970). Most present-day dependency
parsers impose an adjacency condition that explicitly forbids
discontinuous constituents (Starosta and Nomura 1986; Fraser 1989;
Hudson 1989; but not Hellwig 1986 and possibly not Jäppinen et al.
1986 and Schubert 1987).
2. Dependency grammar (DG)

2.1. Dependency grammar analyzes structure as word-to-word links

The alternative to phrase structure grammar is to analyze a
sentence by establishing links between individual words,
specifying the type of link in each case. Thus we might say that,
in "The dog sees a cat,"

   dog is the subject of sees
   cat is the object of sees
   the modifies dog
   a modifies cat

or, speaking more formally,

   dog depends on sees as subject
   cat depends on sees as object
   the depends on dog as determiner
   a depends on cat as determiner

and sees is the head of the whole structure, since it does not
depend on or modify anything else.
This is dependency grammar. It has a long and distinguished
history, having been used by traditional grammarians at least
since the Middle Ages (Covington 1984). The first modern treatment
is that of Tesnière (1953, 1959). Present-day syntacticians who
advocate dependency grammar include Baum (1976), Bauer (1979),
Hudson (1980, 1984), Tarvainen (1982), Shaumyan (1987),[2]
Schubert (1987), Mel'cuk (1988), Starosta (1988), and Fraser
(1989). Moreover, as I shall point out below, the latest
constituency grammars include constraints that bring them closer
to dependency grammar.

[2] Shaumyan uses a combination of dependency and constituency
representations.
There are several ways to represent a dependency analysis
graphically. We can annotate the sentence with arrows pointing
from head to dependent:

   The big dog sees a little cat.

Or we can draw a "dependency tree" or D-tree, in which each head
is represented by a node placed higher than that of its
dependents:

                .
            .                 .
    .   .           .   .
   The big dog sees a little cat.

This is equivalent to:

              sees
          dog               cat
   the  big          a  little
Following Tesnière, we can make the tree neater by discarding the
information about word order and centering each node above its
dependents:
             sees
       ┌──────┴──────┐
      dog           cat
    ┌──┴──┐       ┌──┴──┐
   the   big      a   little

This, in turn, can be represented in outline-like form using
indentation rather than lines to indicate hierarchy:

   sees
      dog
         the
         big
      cat
         a
         little
This last notation is particularly convenient for computer output.
It requires no line drawing, and annotations can be printed after
each word.
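Printing the outline notation takes only a few lines of Prolog.
The following sketch is for illustration; the term representation
dep(Word, Dependents) is invented here and is not the parser's
actual data structure (which is described in section 5.3):

   % Print a dependency tree in indented outline form.
   % A node is assumed to be dep(Word, ListOfDependentNodes).
   print_tree(Tree) :- print_tree(Tree, 0).

   print_tree(dep(Word, Deps), Indent) :-
       tab(Indent),                 % write Indent spaces
       write(Word), nl,
       Deeper is Indent + 3,
       print_deps(Deps, Deeper).

   print_deps([], _).
   print_deps([D|Ds], Indent) :-
       print_tree(D, Indent),
       print_deps(Ds, Indent).

   % ?- print_tree(dep(sees, [dep(dog, [dep(the,[]), dep(big,[])]),
   %                          dep(cat, [dep(a,[]), dep(little,[])])])).
   % produces the outline shown above.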
2.2. Modern linguistic theories are evolving toward dependency
grammar

Every dependency analysis that specifies word order is equivalent
to a constituency analysis that (1) has no labels or features on
nonterminal nodes, and (2) picks out one node as the "head" of
each phrase (Hays 1964, Gaifman 1965, Robinson 1970).

To see the equivalence, observe that the D-tree

   .
      .
         .
   a  b  c

can be converted into the constituency tree:

      .
        .
   a  b  c

which has the same branching structure. To convert this back into
the D-tree without loss of information, we must know whether b or
c is the head of the constituent bc; if we incorrectly take c as
head, we will get the incorrect D-tree:

   .
         .
      .
   a  b  c

It is obvious that a is the head of the larger constituent because
only an individual word, not a phrase, can be a head.
Moreover, the D-tree has no nodes separate from the words
themselves, whence the requirement that the corresponding
constituency tree contain no features or other information on
non-terminal nodes (unless of course it is copied unchanged from
terminal nodes).

Most present-day syntactic theories have adopted essentially these
restrictions. Jackendoff (1977) promoted the concept that every
phrase has a head which is a single word. He pointed out that a
PS-rule such as

   noun phrase → verb + adverb

ought to be impossible even though classical transformational
grammar permits it.

Intuitively, every noun phrase must contain a noun, every verb
phrase must contain a verb, and every prepositional phrase must
contain a preposition -- or more generally, every X phrase must
contain an X. Jackendoff's X-bar theory, which formalizes these
intuitive constraints and defines a head for every phrase, has
been accepted without controversy by subsequent workers in several
different theoretical frameworks (Chomsky 1981, Bresnan 1982,
Gazdar et al. 1985).[3]

Likewise, it is uncontroversial that most, if not all, of the
features on phrasal nodes should be copied from their heads. (A
plural noun phrase is headed by a plural noun; a singular verb
phrase is headed by a singular verb; and so on.) Indeed, Gazdar et
al. (1985) have a rule, the Head Feature Convention, to ensure
that this is so. This is almost equivalent to not making a
distinction between the phrasal node and the head of the phrase
(e.g., the noun and the noun phrase of which it is the head).

[3] Admittedly, X-bar theory distinguishes multiple levels of
phrasal structure, which dependency grammar cannot do. However,
these multiple levels have become less popular in recent work
(compare Jackendoff 1977 to Radford 1989).
Finally, the latest syntactic theories put less and less emphasis
on trees as a representation of structure. The emphasis is
shifting toward the grammatical relations that link the head of a
phrase to the other constituents of the phrase. Examples of this
approach include lexical-functional grammar (Kaplan and Bresnan
1982), Chomsky's theory of government and binding (1981),
head-driven phrase-structure grammar (Sag 1987), and categorial
grammars (Bouma 1985, Uszkoreit 1986c, Flynn 1987, Steedman 1987).

Significantly, Uszkoreit argues that the constituents defined by
categorial rules need not be continuous, and Bouma specifically
addresses the Australian language Walbiri (Warlpiri), whose word
order is extremely variable.
3. Unification-based dependency grammar

3.1. Dependency is a theory-dependent concept

The relation of head to dependent corresponds roughly to two
concepts already well developed in grammatical theory:

(1) The dependent presupposes the presence of the head. That is,
adjectives depend on nouns, not vice versa. Verbs are heads and
their subcategorized arguments (subject, object, etc.) are their
dependents.

(2) Semantically, functors are heads and arguments are dependents.
That is, if the meaning of word X is incomplete without the
meaning of word Y, but not vice versa, then X is the head and Y is
the dependent. For example, the subject and object are dependents
of the verb.

As Gazdar et al. (1985:189-192) have shown, this does not settle
the issue, because in a semantics that includes partial functions,
the choice of functor and argument is somewhat arbitrary.

In the past, dependency grammarians have tried to find some single
observable property, such as optionalness, that distinguishes head
from dependent in all constructions. My position is that this is a
mistake. Dependency is a theoretical abstraction, and its
identification depends on multiple criteria; questions about it
should be resolved in the way that best captures syntactic and
semantic generalizations.
3.2. D-rules specify possible relations

Following Miller (1985), I formalize a dependency grammar by
writing "D-rules," i.e., rules that allow one word to depend on
another. However, instead of using symbols like N and V for noun
and verb, I use feature structures in the tradition of
unification-based grammar (Shieber 1986).

Thus the relation of noun to adjective in Russian or Latin is
described by the following rule, where G, N, and C are variables:

   ┌─               ─┐       ┌─              ─┐
   │ category: noun  │       │ category: adj  │
   │ gender:   G     │   →   │ gender:   G    │
   │ number:   N     │       │ number:   N    │
   │ case:     C     │       │ case:     C    │
   └─               ─┘       └─              ─┘

That is: "A word with category adj, gender G, number N, and case C
can depend on a word with category noun, gender G, number N, and
case C."

This rule does not specify word order; in the rule, the head is
always written first. First approximations to some other rules
used by the parser are as follows:
Verb and subject:

   ┌─               ─┐       ┌─               ─┐
   │ category: verb  │       │ category: noun  │
   │ number:   N     │   →   │ number:   N     │
   │ person:   P     │       │ person:   P     │
   └─               ─┘       └─               ─┘

Verb and object:

   ┌─               ─┐       ┌─               ─┐
   │ category: verb  │   →   │ category: noun  │
   │                 │       │ case:     acc   │
   └─               ─┘       └─               ─┘

Preposition modifying verb:

   ┌─               ─┐       ┌─               ─┐
   │ category: verb  │   →   │ category: prep  │
   └─               ─┘       └─               ─┘

Preposition and object:

   ┌─               ─┐       ┌─               ─┐
   │ category: prep  │   →   │ category: noun  │
   │ objcase:  C     │       │ case:     C     │
   └─               ─┘       └─               ─┘

These rules of course ignore many details.
3.3. Unification builds and tests structures order-independently

This formalization relies crucially on unification (matching
and/or merging) of feature structures. The power of unification
comes from its ability to build complex objects through
order-independent operations.

Each of the feature structures in the rules above is only partly
instantiated -- that is, the values of most features are
unspecified. They will become instantiated through matching.
Crucially, an uninstantiated feature has no value; it is not
equivalent to a feature whose value is 0 or nil.

Two feature structures unify if (1) the features within them that
are already instantiated have matching values, and (2) every
feature that is instantiated in one structure but not in the other
becomes instantiated to the same value in both structures. Thus a
feature structure is built up with information acquired from many
sources -- some of it from the lexicon and some from every D-rule
that successfully applies to it.

The value of a feature can itself be a feature structure, in which
case the matching criterion applies recursively.

Unification is a process that succeeds or fails. If the grammar
requires two structures to unify and they cannot be unified, the
utterance is ungrammatical, i.e., it is not generated or parsed by
the grammar.

The result of unifying a set of structures is the same regardless
of the order in which the unifications are performed. This means
that, in a unification-based grammar or computing system, many
difficult questions about the order of operations simply fail to
arise. The power of unification comes from its ability to make the
most of whatever information is available at a particular time,
filling in missing details (or even missing superstructures) when
it becomes possible to do so.

For example, a transformational grammar might copy a feature from
one location to another. This implies that the feature is in its
original location before the copying takes place. A
unification-based grammar, by contrast, will merge features in two
locations (whether or not either one has been given a value) and,
either beforehand or afterward, give a value to one of them. The
question of which comes first, instantiation or copying, is
irrelevant if not meaningless.
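A classic Prolog idiom makes this behavior concrete: represent a
feature structure as an open-ended list of Feature=Value pairs.
This sketch is an illustration of the idea only; the actual
mechanism used in the implementation is GULP, described in
section 5.2:

   % Look up Feature=Value in an open-ended list. Asking about a
   % feature that is absent adds it (with its value still
   % unbound), so information can arrive in any order.
   feature(F, V, FS) :-
       var(FS), !,              % reached the open tail:
       FS = [F=V|_].            % add the feature, leave a new tail
   feature(F, V, [F=V1|_]) :- !,
       V = V1.                  % feature present: values must unify
   feature(F, V, [_|Rest]) :-
       feature(F, V, Rest).

   % Two sources contribute to the same structure, in either order:
   % ?- feature(case, nom, FS), feature(num, N, FS), N = sing.
   % FS = [case=nom, num=sing|_]     (tail still open)
   % Unification fails, as it should, when values clash:
   % ?- feature(case, nom, FS), feature(case, acc, FS).
   % no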
3.4. Representations of meaning are easy to build

The unification process can easily build semantic representations
as it goes. Consider for example the D-rule

   ┌─                   ─┐       ┌─                ─┐
   │ category:  verb     │       │ category:  noun  │
   │ number:    N        │   →   │ number:    N     │
   │ person:    P        │       │ person:    P     │
   │ semantics: X(Y,Z)   │       │ semantics: Y     │
   └─                   ─┘       └─                ─┘

This is just the subject-verb rule above with a crude semantic
representation added.

Suppose this D-rule applies to the two words

   ┌─                     ─┐     ┌─                  ─┐
   │ form:      vidit      │     │ form:      sobaka  │
   │ category:  verb       │     │ category:  noun    │
   │ number:    1          │     │ number:    1       │
   │ person:    3          │     │ person:    3       │
   │ semantics: sees(U,V)  │     │ semantics: dog     │
   └─                     ─┘     └─                  ─┘

Unification will give the following values to the variables:

   N=1   P=3   X=sees   Y=U=dog   V=Z

and as a result the semantics feature of vidit will have the value
sees(dog,V), where V will get a value through further unification
when the rule applies that introduces the object.

This is far from a complete theory of meaning, and the reader is
referred to Shieber (1986) and to Gazdar et al. (1985) for further
discussion of semantics in unification-based grammar. The point
here is simply that semantics in a dependency grammar can use
mechanisms that have already been extensively developed for
unification-based phrase-structure grammar.
3.5. The meaning representation can ensure that obligatory
dependents are present

Any dependency grammar must distinguish between optional and
obligatory dependents. For instance, in English, the subject of
the verb is obligatory but the adverb of manner is not:[4]

   John came.
   John came quickly.
   *Came.
   *Came quickly.

Moreover, if an argument is obligatory it is also unique: a verb
may take several adverbs but it must have one and only one
subject.

   John invariably came quickly.
   *John Bill Harry came.

Obligatoriness of arguments is most easily handled through the
mechanism that builds representations of meaning. After all, in
the meaning, each verb can have only one value in each argument
position; this ensures there will never be two subjects or two
objects on a single verb.

To this one can add a requirement that the complete semantic
representation, once built, must not contain any uninstantiated
argument positions. This, in turn, will ensure that all the
necessary arguments -- subject, object, and the like -- are
present.

[4] Asterisks denote ungrammatical examples.
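In Prolog, that final check is essentially a one-liner. A minimal
sketch, assuming the finished semantics is an ordinary term such
as sees(dog,cat) and that the built-in ground/1 is available (it
is in most Prologs):

   % Succeeds only if the semantic representation contains no
   % uninstantiated argument positions.
   semantics_complete(Sem) :-
       ground(Sem).

   % ?- semantics_complete(sees(dog, cat)).   succeeds
   % ?- semantics_complete(sees(dog, V)).     fails: object missing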
The list-valued subcat or syncat features of Shieber (1986) and
others cannot be used to handle subcategorization in dependency
grammar, because each D-rule brings in only one argument. Nor are
the lists appropriate for a variable-word-order language, since
the arguments are distinguished by inflectional form rather than
by their proximity to the verb.
3.6. Restrictions can be imposed on word order

Even a variable-word-order language has some constraints on word
order. For example, in Russian and Latin, the preposition must
precede its object. In the sentence

   devushka  kladyot  knigu  na  gazetu
   nom.               acc.       acc.
   girl      puts     book   on  newspaper

the object of na must be gazetu, not knigu, even though both are
in the correct case. That is, the preposition can only combine
with a noun that follows it.

We can handle this by annotating the preposition-object rule as
"head first," i.e., the head must precede the object. Rules are
likewise allowed to be annotated "head last," and such rules will
be common in Japanese though I know of none in Russian or Latin.

Further, the prepositional phrase must be continuous; that is, all
the direct or indirect dependents of the preposition must form a
continuous string. Thus in

   devushka  kladyot  na  gazetu     knigu
   nom.                   acc.       acc.
   girl      puts     on  newspaper  book

the object of na must be gazetu, not knigu; the verb can be
separated from its object but the preposition cannot. (Recall that
in the example minuty cherez dve above, it was the NP, not the PP,
that was discontinuous.)

The prototype parser does not handle contiguity requirements. One
way of doing so might be to endow the preposition (for example)
with a feature contig that is copied recursively to all its
dependents, and then insist that the whole string of words bearing
the same value of this feature be contiguous.

Hudson (1984) has shown that dependency grammars with sufficient
contiguity and word order requirements can handle fixed-order
languages such as English.
4. The parsing algorithm

4.1. The parser accepts words and tries to link them

Unlike other dependency parsers, this parser does not require
constituents to be continuous; it merely prefers them to be.
The parser maintains two lists, PrevWordList (containing all words
that have been accepted from input) and HeadList (containing only
words that are not dependents of other words). These are initially
empty. At the end, HeadList will contain only one word, the head
of the sentence.

Parsing is done by processing each word in the input string as
follows:

(1) Search PrevWordList for a word on which the current word can
depend. If there is one, establish the dependency; if there is
more than one, use the most recent one on the first try; if there
is none, add the current word to HeadList.

(2) Search HeadList for words that can depend on the current word
(there can be any number), and establish dependencies for any that
are found, removing them from HeadList as this is done.
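A minimal Prolog sketch of this loop follows. The predicate names
are illustrative, and depends_on(Dependent, Head) is assumed to
stand for an attempt to apply the D-rules; the real parser
(section 5) threads feature structures and the `<<' rules through
these steps:

   % parse(+Words, -Head): succeeds when exactly one head remains.
   parse(Words, Head) :-
       parse(Words, [], [], [Head]).

   parse([], _, Heads, Heads).
   parse([W|Ws], PrevWords, Heads0, Result) :-
       % Step (2) is done first here, purely so that W cannot
       % offer itself as its own dependent.
       collect_dependents(W, Heads0, Heads1),
       link_to_head(W, PrevWords, Heads1, Heads2),
       parse(Ws, [W|PrevWords], Heads2, Result).

   % Step (1): attach W to the most recent possible head; on
   % backtracking, try earlier heads, and finally leave W as a
   % head in its own right.
   link_to_head(W, PrevWords, Heads, Heads) :-
       member(H, PrevWords),          % most recent word first
       depends_on(W, H).
   link_to_head(W, _, Heads, [W|Heads]).

   % Step (2): words still on HeadList may become dependents of W.
   % The second clause attaches greedily; on backtracking, the
   % third clause lets any candidate be skipped instead.
   collect_dependents(_, [], []).
   collect_dependents(W, [H|Hs], Rest) :-
       depends_on(H, W),
       collect_dependents(W, Hs, Rest).
   collect_dependents(W, [H|Hs], [H|Rest]) :-
       collect_dependents(W, Hs, Rest).

Because PrevWordList and HeadList are kept most-recent-first,
plain left-to-right search automatically yields the nearest-
attachment preference discussed in section 4.2.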
This algorithm has been used successfully to parse Russian and
Latin. To add the adjacency requirement, one would modify the two
steps as follows:

(1) When looking for the word on which the current word depends,
consider only the previous word and all words on which it directly
or indirectly depends.

(2) When looking for potential dependents of the current word,
consider only a contiguous series of members of HeadList beginning
with the one most recently added.

With these requirements added, the algorithm would then be
equivalent to that of Hudson (1989).
4.2. The parser prefers continuous phrases but does not require
them

Comparing the two algorithms just given, it is obvious that the
parser for continuous constituents is a special case of the parser
for discontinuous constituents, and that, in fact, the latter
usually tries the parses that involve continuous constituents
first.

This strategy makes the parser's preferences resemble those of
human beings in two respects.

First, even when discontinuous constituents are allowed, the
parser has an easier time parsing continuous ones. Analogously,
human languages that allow discontinuity usually do not indulge
heavily in it, unless for poetic effect or some other special
communicative purpose.

Second, because of its search strategy, the parser adheres to the
psycholinguistic principle that near attachments are preferred
(Frazier 1987). Consider Frazier's example sentence Ken said Dave
left yesterday. Hearers prefer to interpret this as

        .
    .             .
             .            .
   Ken said Dave left yesterday.

with yesterday modifying left, rather than

        .
    .             .       .
             .
   Ken said Dave left yesterday.

with yesterday modifying said.

The parsing algorithm has the same preference. Both HeadList and
PrevWordList are searched beginning with the most recently added
element. Thus the parser follows the strategy

   Attach each word to the nearest potential head or dependent.

This is the dependency-grammar counterpart of Frazier's principle,
which says

   Attach each node to the node currently under consideration if
   possible.

Ueda (1984) has argued on independent grounds that this principle
applies to Japanese, a variable-word-order language.
4.3. Unavoidably, parsing discontinuous constituents is complex

It is well known that phrase-structure parsing of an n-word
sentence with the most efficient algorithm takes, at most, time
proportional to n³. The same is true, in principle, of dependency
parsing with a dependency grammar that is convertible to a
phrase-structure grammar -- i.e., one that does not allow
discontinuity.

Parsing with discontinuity is unavoidably more complex. After all,
it allows more possibilities; the parser can never be completely
sure that a constituent is over or that a subsequent constituent
has not yet begun. An exact analysis has not yet been carried out,
but the complexity of parsing with discontinuity may, in the worst
case, be as high as nⁿ.

Three things should be emphasized. First, the extra complexity
comes from allowing discontinuous constituents, not from using
dependency grammar. Discontinuous constituents are necessary in
some human languages, and hence unavoidable. Second, as will be
shown below, worst-case complexity is irrelevant to natural
language processing. Third, the complexity can be reduced by
putting arbitrary limits on how far away from the current word a
head or dependent can be sought. There is every reason to believe
that the human brain imposes such limits on hearers' ability to
understand speech, and therefore that all human languages are thus
constrained.
4.4. With ambiguity and features, natural language parsing is
NP-complete

Barton, Berwick, and Ristad (1987:89-96) prove that parsing is
NP-complete in any phrase structure grammar that includes (1)
agreement features that are copied from some nodes to others (like
the agreement of subject and verb in natural language), and (2)
lexical ambiguity (the ability to rewrite more than one complex
terminal symbol as the same surface form). They do this by
reducing 3SAT (a well-understood Boolean satisfiability problem)
to a parsing problem for such a grammar.

The dependency grammars proposed here have agreement features and
lexical ambiguity. Although the details have not been worked out,
it should be obvious that the same reduction can be carried out
for a dependency grammar that has order and contiguity
requirements. In this respect, dependency parsing is no better and
no worse than phrase structure parsing.
4.5. Average-case performance is what matters

The Barton-Berwick-Ristad proof indicates that all adequate
natural-language parsing algorithms have the same worst-case
complexity, i.e., they are NP-complete (unless of course some of
them turn out to be worse). Fortunately, worst cases in natural
language are quite rare. They do exist; an example is the English
sentence

   BUFFALO BUFFALO BUFFALO BUFFALO BUFFALO

which has the same structure as "Boston cattle bewilder Boston
cattle." Once given the structure, human beings have no difficulty
interpreting the sentence and seeing that it is grammatical,
though they find it extremely difficult to discover the structure
without help.

The moral is that even the parsers in our heads do not perform
well in the worst case. Average-case complexity is much more
important. One way of limiting the average-case complexity of
dependency parsing is to place a limit on, for example, the
maximum size of HeadList and/or PrevWordList. This will prohibit
massive inversions of word order and wide separation of related
constituents -- exactly the things that are rare or impermissible
even in free-word-order languages.
5. The implementation

5.1. The parser is written in IBM VM/Prolog

The present implementation uses IBM VM/Programming in Logic
(`VM/Prolog' for short) on the IBM 3090 Model 400-2VF at the
University of Georgia. It uses the IBM `Mixed' syntax, which
closely resembles the standard Edinburgh dialect of Prolog
described by Clocksin and Mellish (1981).

Prolog is usually thought of as a language for automated reasoning
and expert systems (Kowalski 1979, Walker 1987). Nonetheless, it
originated as a language for writing parsers (Colmerauer 1973) and
remains eminently suited for this purpose.

Three factors make Prolog ideal for natural language processing.
First, Prolog is a language for data as well as operations. Every
data object that can exist in the language has a written
representation.[5] Complex data structures can be created
gradually and straightforwardly, with no need to declare them in
advance or perform system calls to allocate memory. Lisp-like
lists, decomposable into head and tail, are one of the many
structuring devices available.

Second, Prolog is designed to consider multiple alternative paths
of computation. Parsers typically have to try many alternatives in
order to parse a sentence successfully, and in Prolog this is
almost automatic. A procedure can be given multiple definitions to
express alternative ways of solving a problem. Every computation
either succeeds or fails, and if a computation fails, execution
backs up to the most recent untried alternative and proceeds
forward again. The programmer can put in instructions (`cuts') to
suppress backtracking where it is not wanted.

Third, a form of unification is built into Prolog. Prolog
unification is not identical to feature structure unification, but
the basic idea is the same: make the structures alike by
instantiating (giving values to) variables. For example, the list
[a,b,X] can unify with [Y,b,c] with the instantiations X=c, Y=a.
However, [X,b,X] cannot unify with [a,b,c] because X cannot take
on two values at once.

[5] Except for certain pathological structures that contain
pointers to themselves. These structures are not normally created
and some Prolog implementations treat them all as runtime errors.
5.2. GULP extends Prolog by adding feature structure unification

Prolog unification differs from feature structure unification in
one crucial way: Prolog identifies corresponding features by
position, whereas in feature structures, features are identified
by name.

This is a substantial obstacle for implementing unification-based
grammars. The grammatical theory requires that any feature
structure should be unifiable with any other unless feature values
prevent it. This means every feature structure must reserve a
position for every feature that occurs in the grammar, even though
only one or two of them are mentioned in any specific instance. To
represent

   ┌─           ─┐
   │ case: nom   │
   │ num:  sing  │
   └─           ─┘

the programmer has to write something like

   [V1,V2,V3,V4,V5,V6,nom,V8,V9,V10,V11,sing,V13,V14,V15]

if the grammar uses a total of 15 features. Typographical errors
are inevitable.
There are two ways out of the dilemma: modify the Prolog unifier,
or modify Prolog syntax. If the unifier were modified, it would be
possible to write something like [case(nom),num(sing)] and have
the unifier figure out that, for example, this is supposed to
match [num(N),case(nom),pers(3)]. The trouble is that this
approach really slows down the program; all the extra work has to
be done at run time whenever a unification is attempted.
GULP (Covington 1987, 1989) is an extension of Prolog that
modifies the syntax instead. The programmer writes feature
structures such as:[6]

   case % nom %% num % sing

and GULP preprocesses the Prolog program to convert these into
list-like structures in which features are identified by position
rather than by name. The value of a feature can be any Prolog
object, including another feature structure. A routine is provided
to display feature structures in a neatly indented style.

[6] In the original ASCII version of GULP, case:nom :: num:sing.
The colon is already used for another purpose in VM/Prolog and
although VM/Prolog is highly modifiable, the requisite
modifications have not yet been done.
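To make the position-based idea concrete, the following sketch
shows one plausible compile-time translation of the kind GULP
performs. The term fs/4, the predicate names, and the fixed
feature order are all invented for this illustration and are not
necessarily GULP's actual internal scheme:

   /* Suppose the grammar uses exactly the features case, num,
      gen, pers, always in that order. Then
          case % nom %% num % sing
      can be compiled, once, into the positional term
          fs(nom, sing, _, _)
      and ordinary Prolog unification of two such terms behaves
      like feature-structure unification. */

   feature_slot(case, 1).
   feature_slot(num,  2).
   feature_slot(gen,  3).
   feature_slot(pers, 4).

   /* Set or look up a feature by name on the positional term. */
   fs_feature(F, V, FS) :-
       FS = fs(_, _, _, _),       /* create skeleton if unbound */
       feature_slot(F, Slot),
       arg(Slot, FS, V).          /* unify the chosen argument  */

   /* ?- fs_feature(case, nom, FS), fs_feature(num, sing, FS).
      FS = fs(nom, sing, _, _)
      ?- fs(nom, sing, _, _) = fs(nom, _, fem, _).
      succeeds, merging to fs(nom, sing, fem, _)               */

Because the feature-name bookkeeping happens before run time,
unification itself remains a single fast term operation.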
5.3. The implementation consists of a grammar, a lexicon, and a
parser

5.3.1. Feature set

In the prototype parsers for Russian and Latin,[7] each word is
represented by a feature structure.

[7] Only the Russian parser runs on the 3090; the Latin parser
runs (considerably slower!) on a PS/2 Model 50 using ALS Prolog.

The features used are:
phon -- The phonological or orthographic form of the word.

cat -- The syntactic category (noun, verb, etc.).

case, num, gen, pers -- Grammatical agreement features (case,
   number, gender, and person). For brevity, case is used on the
   preposition to mark the case required in the object; this lacks
   generality, because other words (e.g., participles or the
   adjective `similar') can be in one case while taking a
   complement in another.

id -- An identifying number assigned to each word in the input
   string so that separate occurrences of the same word form will
   not unify with each other.

subj, obj -- If the word is a verb, these become instantiated to
   the values of the id features of its subject and object
   respectively. These are merely stand-ins for argument positions
   in the semantic component that has not been implemented.

gloss -- An English translation of the word, for annotating
   output.

gr -- The grammatical relation borne by the word to its head
   (subject, object, modifier, etc.); for annotating output. This
   identifies the D-rule that was used to establish the
   relationship.

dep -- An open list (i.e., a list with an uninstantiated tail)
   containing pointers to the full feature structures of all of
   the word's dependents. The whole dependency tree can be traced
   by recursively following the dep feature of the main verb.
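The open list mentioned under dep is a standard Prolog device: a
list whose tail is a variable can be extended in place, without
rebuilding it. A minimal sketch (the predicate name is
illustrative only):

   /* Add D at the end of an open list, leaving the tail open. */
   add_dependent(D, Deps) :-
       var(Deps), !,             /* reached the open tail */
       Deps = [D|_].
   add_dependent(D, [_|Rest]) :-
       add_dependent(D, Rest).

   /* ?- add_dependent(dog, Deps), add_dependent(cat, Deps).
      Deps = [dog, cat|_]        -- still open for more     */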
5.3.2. Lexical entries

Since this parser was built purely to examine a syntactic problem,
its lexical component ignores morphology and simply lists every
form of every word. Part of the lexicon is shown in Listing 1 (at
the end of this paper). In conventional notation, the lexical
entry for Russian sobaka, for example, is:

   ┌─              ─┐
   │ phon:  sobaka  │
   │ cat:   noun    │
   │ gloss: 'dog'   │
   │ case:  nom     │
   │ num:   sg      │
   │ gen:   fem     │
   └─              ─┘
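In the GULP notation of section 5.2 this entry might look roughly
as follows; the wrapping predicate word/1 is invented for
illustration, and the actual lexicon format is the one shown in
Listing 1:

   word( phon%sobaka %% cat%noun %% gloss%'dog' %%
         case%nom %% num%sg %% gen%fem ).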
On input, a string of words is converted by the lexical scan
procedure into a list of feature structures, each of them only
partly instantiated. (For instance, the id and dep features are
not instantiated in the structure above.) The parser instantiates
the structures further as it does its work.
5.3.3. D-rules

D-rules are stored as Prolog clauses with the principal functor
`<<', which is written between two feature structures. Recall that
a feature structure is a series of feature-value pairs linked by
`%%' and a feature is linked to its value by `%'. Using all these
notational devices, the adjective-noun rule

   ┌─               ─┐       ┌─              ─┐
   │ category: noun  │       │ category: adj  │
   │ gender:   G     │   →   │ gender:   G    │
   │ number:   N     │       │ number:   N    │
   │ case:     C     │       │ case:     C    │
   └─               ─┘       └─              ─┘

would be written in Prolog as:

   cat%noun %% gen%G %% num%N %% case%C
      << cat%adj %% gen%G %% num%N %% case%C.

The complete set of D-rules is shown in Listing 2.
5.3.4. Parsing process and output

The parser implements the algorithm described in section 4.1
above. Crucially, HeadList and PrevWordList are really lists of
pointers. The same word can appear in both lists, and when this
happens, there is actually only one copy of the word in memory.
Thus its features can be instantiated regardless of which list the
word was accessed through. At the end, HeadList has only one
element, the main verb.

Figure 3 shows a sample of the parser's output, which is displayed
by following the dep features from word to word to obtain the
complete dependency network. The values of the phon, gloss, and gr
features are displayed for each word.

Because it is written in Prolog, the parser automatically has the
ability to backtrack and try alternatives. In Figure 3, this is
put to good use to find two parses for an ambiguous sentence.
6. Evaluation of the IBM 3090 environment

6.1. Parsing (and natural language processing generally) has
specific machine requirements
Parsing is one of many applications that fall into the realm
|
||
|
||
of symbolic computing because the objects manipulated are not
|
||
|
||
numbers, nor character strings, but rather abstract symbols to
|
||
|
||
which the programmer assigns a meaning. The leading languages for
|
||
|
||
symbolic computation are Prolog and Lisp. Symbolic computation can
|
||
|
||
be done in many other languages, but it is cumbersome.
|
||
|
||
|
||
Symbolic computation requires the ability to create complex
|
||
|
||
data structures of arbitrary shape at run time -- parse trees,
|
||
|
||
feature structures, nested lists, and the like. Necessarily, such
|
||
|
||
structures occupy noncontiguous memory locations and are held
|
||
|
||
together by pointers. For example, the simplest way to represent
|
||
|
||
the list (a . (b . (c . nil))) (more commonly written [a,b,c]) is
|
||
|
||
the following:
|
||
|
||
|
||
ÚÄÄÄÂÄÄÄ¿ ÚÄÄÄÂÄÄÄ¿ ÚÄÄÄÂÄÄÄ¿
|
||
|
||
ÄÄÄÄÄÄ ³ ³ ³ ÄÄÅÄÄÄÄ ³ ³ ³ ÄÄÅÄÄÄÄ ³ ³ ³ ³ ³
|
||
|
||
ÀÄÅÄÁÄÄÄÙ ÀÄÅÄÁÄÄÄÙ ÀÄÅÄÁÄÅÄÙ
|
||
|
||
³ ³ ³ ³
|
||
|
||
|
||
|
||
a b c nil
|
||
|
||
|
||
This preserves the essential properties of the list: it can be
recognized and processed one element at a time, without knowing
the total length; it can be broken into head and tail at any
point; and if the last link is uninstantiated, the list can be
lengthened by instantiating it. (Prolog actually uses a more
complex representation of a list with additional pointers to the
dot.)

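The last property is easy to demonstrate in Prolog (the predicate
name is invented for the example):

    /* An open-ended list: its last link, Tail, is a variable. */
    lengthen_demo(List) :-
        List = [a, b, c | Tail],
        Tail = [d | _].      /* List now begins [a,b,c,d|...] */
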
Far from wasting time, pointer references normally speed up
symbolic computation. The reason is that it is faster to compare
pointers to atomic symbols than to compare the symbols themselves.
Accordingly, it is standard practice to tokenize all programs and
data, i.e., replace all atoms with pointers to a symbol table.

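A small illustration: every occurrence of an atom denotes the same
symbol-table entry, so the identity test == compares two pointers,
and its cost does not depend on the length of the atom's name.

    token_demo :-
        A = quite_a_long_atom_used_only_for_illustration,
        B = quite_a_long_atom_used_only_for_illustration,
        A == B.      /* a constant-time pointer comparison */
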
The execution of a symbolic program consists largely of
pointer dereferencing, comparisons, and conditional branches.
There is very little arithmetic; the objects being processed are
not numbers, and even their addresses are not calculated but
rather looked up via pointers.

These requirements are not artifacts of using Prolog or
Lisp; they are imposed by the applications. Small non-numeric
programs, including some compilers, have been written using
conventional data structures (arrays, character strings, etc.),
but for larger applications, the advantages of symbolic
computation are overwhelming. This is particularly the case in
parsing natural language because, unlike a programming language,
English cannot be designed to fit a simple transition network or
to require only one character of lookahead.

6.2. The IBM 3090 is well suited to symbolic computing

The IBM 3090 is a promising machine for symbolic computing
because, compared to other supercomputers, it is much less
narrowly specialized for non-symbolic applications. Other
supercomputers are specialized for vector arithmetic (Cray-1,
ETA-10) or for multiprocessing with relatively small amounts of
fast memory on each CPU (Intel iPSC, Control Data CYBERPLUS; see
Stearns and Covington 1987).

By contrast, the IBM 3090 reflects the System/360 and
System/370 heritage of design for all-purpose computing (Padegs
1981, Tucker 1986). Some of its "super" features are designed to
speed up all types of programs, not just numeric applications.
Perhaps more importantly, the 3090 imposes no penalty for the use
of instructions for which it was not specifically optimized.

Symbolic computing, and especially natural language
processing, requires a machine with large memory, fast access to
arbitrary non-consecutive memory locations, and the ability to
execute conditional branches rapidly.

The IBM 3090 meets all these requirements. At the University
of Georgia, a program can run in a 100-megabyte virtual memory
region if needed. The prototype parser uses only 6 megabytes and
could probably get by with less, but it is only a small part of a
foreseeable integrated natural language processing system.
(Natural language processing is not an end in itself -- it will
ultimately be the user interface to some other application, which
needs space of its own to run in.)

The large address space of the 3090 and the ease of access
to non-contiguous locations facilitate the use of the data
structures needed in symbolic computation. Non-contiguity of
memory locations is no impediment to caching. This is important
because data structures are allocated piece by piece and accessed
by pointers. There are no arrays or vectors in Prolog, and arrays
are uncommon in other symbolic processing languages.

Finally, the 3090 instruction pipeline can prefetch along
both alternatives of a conditional branch instruction. This is
important because parsing consists largely of decision making;
prefetching speeds up execution considerably.

6.3. VM/Prolog is fast, but a much faster Prolog is possible

VM/Prolog is an interpreter, not a compiler. It was designed
for versatility rather than speed, and although it is respectably
fast, it dates from the first generation of full-featured Prolog
implementations in the mid-1980s. Prolog implementations for
other CPUs have advanced considerably since then. Table 1 (at the
end of this paper) shows that, on an IBM PS/2, there is a factor
of nearly 20 difference in speed between interpreted and compiled
Prolog. Whether a comparable speedup could be achieved on the
3090 is uncertain, but there is clearly room for improvement. The
internals of the present interpreter have not been made public,
but it may gain some of its speed from a tight inner loop that
resides entirely in the 64K cache.

Even so, a compiler can be expected to outperform an
interpreter. If a good compiler were available, and the speedup
factor were indeed 10 to 20, then the 3090 would execute Prolog,
not at 500 kLIPS (which is itself impressive), but at an
unprecedented speed of 5 to 10 megaLIPS.

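The list-reversal benchmark is of the classic `naive reverse'
kind; the following reconstruction is a sketch, not necessarily
the exact code of Covington and Vellino (1986). Reversing an
n-element list this way takes n(n+1)/2 + n + 1 logical inferences,
so timing repeated runs yields a LIPS figure.

    nrev([], []).
    nrev([H|T], Reversed) :-
        nrev(T, ReversedTail),
        append(ReversedTail, [H], Reversed).

    append([], List, List).
    append([H|T], List, [H|Rest]) :-
        append(T, List, Rest).
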
7. Remaining issues

7.1. Dependency is problematic in some constructions

There are constructions in which it is not clear which word
is the head and which is the dependent. Prepositional phrases are
an example. In a sentence like He came after lunch, it makes sense
to treat after as a modifier of the verb (it says he came after
something), and lunch as a required dependent of after.

But in some constructions the preposition seems to be much
more tightly bound to the noun. For example, in Spanish the direct
object of the verb is sometimes marked with the preposition a
(which in other contexts means `to'). Does such a direct object
depend directly on the verb -- in which case the preposition
depends on the noun, rather than vice versa -- or do Spanish verbs
sometimes take a preposition instead of a noun as direct object?

There are other problematic constructions. Is the verb the
head of the sentence? We have assumed so, but there is a time-
honored traditional analysis that treats the subject rather than
the verb as the head (Covington 1984). And what about relative
clauses and other embedded sentences?

Fortunately, none of these problems is daunting. The
question is which is the best analysis, not whether there is a
possible analysis. In any case, the same questions of headship
arise within X-bar theory and are the object of vigorous research
efforts there (see e.g. Radford 1989).

7.2. Conjunctions are problematic for both DG and PSG

Conjunctions pose a special problem. In a sentence like Joe
and Max arrived, the verb seems to have two subjects. Intuitively,
Joe and Max forms a single unit that serves as the subject. But
dependency grammar cannot describe this unit, because it can
connect the verb to only a single word. Phrase-structure grammar
seems to have the upper hand.

But Hudson (1988) has shown that PSG is not much better off
than DG. Consider for example the sentence

    John drank coffee at breakfast and tea at lunch.

Here and joins coffee at breakfast with tea at lunch. Yet neither
coffee at breakfast nor tea at lunch is a constituent or a
grammatical unit. No reasonable constituency analysis comes out
any better than dependency grammar.

From this, Hudson argues that conjunctions license doubling
up of grammatical relations -- that is, because of and, the verb
can take two objects and two prepositional phrases, instead of
just one of each. Clearly, this analysis works just as well in DG
as in PSG.

The alternative is to argue that, at least some of the time,
conjunctions show the effect of a post-syntactic operation on the
string of words -- some kind of ellipsis or rearrangement not
based on grammatical relations, analogous to the insertion of
parenthetical remarks.

7.3. Word order variation affects emphasis and cohesion

As Karttunen and Kay (1984) have noted, word order is
significant even when it is variable. The first element in the
sentence is most likely to be the topic (the continuing concern of
the discourse), and new information is introduced later in the
sentence.

A dependency parser can easily keep track of the actual word
order, or the position of various words relative to each other, by
means of additional features. The semantic component of the
grammar can use these features to identify topic and comment and
to recognize other effects of word order.

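A sketch of one way to do this (the predicate names and the word/2
wrapper are invented for the example; note that the preposition
rule in Listing 2 already carries a posn feature): number the
words as they are read, and let each word carry its position as a
feature.

    number_words(Words, Numbered) :-
        number_words(Words, 1, Numbered).

    number_words([], _, []).
    number_words([W|Ws], N, [word(W, posn:N) | Rest]) :-
        N1 is N + 1,
        number_words(Ws, N1, Rest).

    /* e.g., the sentence-initial word is the likely topic */
    likely_topic(Numbered, W) :-
        member(word(W, posn:1), Numbered).
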
8. Conclusions

Variable-word-order parsing is an important but neglected
problem; progress on it is necessary if natural language
processing is ever going to deal with a wide variety of languages
other than English. The work reported here has shown that
dependency parsing is a feasible approach to the handling of
variable word order. The apparently high worst-case computational
complexity of dependency parsing is not an objection, because
average-case rather than worst-case complexity is what matters;
even the human brain does not process `worst cases' successfully.

The technique presented here derives much of its power from
unification-based grammar, a formalism developed to augment
phrase-structure grammar but equally applicable to dependency
grammar. By unifying feature structures, the grammar can build
representations of syntax and meaning in a powerful, order-
independent way.

Some questions remain to be answered -- such as how to
handle conjunctions and subordinate clauses in dependency
grammar -- but the work of Hudson, Starosta, and others has shown
that satisfactory treatments are possible, and the question is now
which analysis is best, rather than whether a satisfactory
analysis can be obtained.

References

Barton, G. E.; Berwick, R. C.; and Ristad, E. S. (1987)
    Computational complexity and natural language. Cambridge,
    Massachusetts: MIT Press.

Bauer, L. (1979) Some thoughts on dependency grammar. Linguistics
    17 (new series) 301-315.

Baum, R. (1976) `Dependenzgrammatik': Tesnières Modell der
    Sprachbeschreibung in wissenschaftsgeschichtlicher und
    kritischer Sicht. (Zeitschrift für romanische Philologie,
    Beiheft 151.) Tübingen: Max Niemeyer.

Bobrow, D. G. (1967) Syntactic theories in computer
    implementations. Automated language processing, ed. H. Borko,
    215-251.

Bouma, G. (1985) Kategoriale grammatiek en het Warlpiri. Glot
    8:227-256.

Bresnan, J. (1982a) Control and complementation. Bresnan
    1982b:282-390.

Bresnan, J., ed. (1982b) The mental representation of grammatical
    relations. Cambridge, Massachusetts: MIT Press.

Chomsky, N. (1957) Syntactic structures. The Hague: Mouton.

--- (1965) Aspects of the theory of syntax. Cambridge,
    Massachusetts: MIT Press.

--- (1981) Lectures on government and binding. Dordrecht: Foris.

Clocksin, W. F., and Mellish, C. S. (1981) Programming in Prolog.
    Berlin: Springer.

Colmerauer, A. (1973) Les Systèmes-Q ou un formalisme pour
    analyser et synthétiser des phrases sur ordinateur.
    Publication interne No. 43, Département d'Informatique,
    Université de Montréal.

Covington, M. A. (1984) Syntactic theory in the High Middle Ages.
    Cambridge University Press.

--- (1987) GULP 1.1: an extension of Prolog for unification-based
    grammar. Research report 00-0021, Advanced Computational
    Methods Center, University of Georgia.

--- (1989) GULP 2.0: an extension of Prolog for unification-based
    grammar. Research report AI-1989-01, Artificial Intelligence
    Programs, University of Georgia.

Covington, M. A., and Vellino, A. (1986) Prolog arrives. PC Tech
    Journal 4.11:52-69.

Evans, R. (1987) Direct interpretations of the GPSG formalism. J.
    Hallam and C. Mellish, eds., Advances in artificial
    intelligence (Proceedings of the 1987 AISB conference).
    Chichester: Wiley.

Flynn, M. (1987) Categorial grammar and the domain specificity of
    universal grammar. Garfield 1987:173-186.

Fraser, N. M. (1989) Parsing and dependency grammar. UCL Working
    Papers in Linguistics, vol. 1, 296-319. University College
    London.

Frazier, L. (1987) Theories of sentence processing. Garfield
    1987:291-307.

Gaifman, H. (1965) Dependency systems and phrase-structure
    systems. Information and Control 8:304-337.

Garfield, J. L., ed. (1987) Modularity in knowledge representation
    and natural-language understanding. Cambridge, Massachusetts:
    MIT Press.

Gazdar, G.; Klein, E.; Pullum, G.; and Sag, I. (1985) Generalized
    phrase structure grammar. Cambridge, Massachusetts: Harvard
    University Press.

Hale, K. (1983) Warlpiri and the grammar of non-configurational
    languages. Natural Language and Linguistic Theory 1:5-47.

Hays, D. G. (1964) Dependency theory: a formalism and some
    observations. Language 40:511-525.

--- (1966) Parsing. Readings in automatic language processing, ed.
    D. G. Hays, 73-82. New York: American Elsevier.

Hays, D. G., and Ziehe, T. W. (1960) Studies in machine
    translation -- 10: Russian sentence-structure determination.
    Project RAND Research Memorandum RM-2538. Santa Monica: RAND
    Corporation.

Hellwig, P. (1986) Dependency unification grammar. Proceedings,
    COLING 86, 195-198.

Huck, G. J., and Ojeda, A. E., eds. (1987) Syntax and semantics,
    vol. 20: Discontinuous constituency. Orlando: Academic Press.

Hudson, R. A. (1980) Constituency and dependency. Linguistics 18
    (n.s.) 179-198.

--- (1984) Word grammar. Oxford: Blackwell.

--- (1988) Coordination and grammatical relations. Journal of
    Linguistics 24:303-342.

--- (1989) Towards a computer-testable Word Grammar of English.
    UCL Working Papers in Linguistics, vol. 1, 321-339.
    University College London.

IBM (1985) VM/Programming in Logic program description/operations
    manual. IBM publication SH20-6541-0.

Jackendoff, R. (1977) X-bar syntax: a study of phrase structure.
    Cambridge, Massachusetts: MIT Press.

Jäppinen, H.; Lehtola, A.; and Valkonen, K. (1986) Functional
    structures for parsing dependency constraints. Proceedings,
    COLING 86, 461-463.

Kaplan, R. M., and Bresnan, J. (1982) Lexical-functional grammar:
    a formal system for grammatical representation. Bresnan
    1982b:173-281.

Karttunen, L., and Kay, M. (1984) Parsing in a free word order
    language. Dowty, D. R., et al., eds., Natural language
    parsing, 279-306. Cambridge University Press.

Kashket, M. B. (1986) Parsing a free-word-order language:
    Warlpiri. Proceedings, 24th Annual Meeting of the Association
    for Computational Linguistics, 60-66.

Kilbury, J. (1984) Earley-basierte Algorithmen für direktes Parsen
    mit ID/LP-Grammatiken. KIT-Report 16, Technische Universität
    Berlin.

Kolb, H.-P. (1987) Diskursrepräsentationstheorie und Deduktion.
    Linguistische Berichte 110:247-282.

Kowalski, R. (1979) Logic for problem solving. New York: North-
    Holland.

Matsumoto, Y.; Tanaka, H.; Hirakawa, H.; Miyoshi, H.; and
    Yasukawa, H. (1983) BUP: a bottom-up parser embedded in
    Prolog. New Generation Computing 1:145-158.

Mel'cuk, I. A. (1988) Dependency syntax: theory and practice.
    Albany: State University of New York Press.

Miller, J. (1985) Semantics and syntax. Cambridge University
    Press.

Padegs, A. (1981) System/360 and beyond. IBM Journal of Research
    and Development 25:377-390.

Radford, A. (1989) Transformational grammar. Cambridge University
    Press.

Robinson, J. J. (1970) Dependency structures and transformational
    rules. Language 46:259-285.

Ross, J. R. (1967) Constraints on variables in syntax.
    Dissertation, M.I.T. Published as Infinite syntax, Norwood,
    N.J.: Ablex, 1986.

Sag, I. (1987) Grammatical hierarchy and linear precedence. Huck
    and Ojeda 1987:303-340.

Schubert, K. (1987) Metataxis: contrastive dependency syntax for
    machine translation. Dordrecht: Foris.

Shaumyan, S. (1987) A semiotic theory of language. Bloomington:
    Indiana University Press.

Shieber, S. M. (1984) Direct parsing of ID/LP grammars.
    Linguistics and Philosophy 7:135-154.

--- (1986) An introduction to unification-based approaches to
    grammar. (CSLI Lecture Notes, 4.) Stanford: CSLI.

Siewierska, A. (1988) Word order rules. London: Croom Helm.

Starosta, S. (1988) The case for lexicase. London: Pinter.

Starosta, S., and Nomura, H. (1986) Lexicase parsing: a lexicon-
    driven approach to syntactic analysis. Proceedings, COLING 86,
    127-132.

Stearns, R. E., and Covington, M. (1987) Prolog on the CYBERPLUS:
    a feasibility study. Research report 01-0019, Advanced
    Computational Methods Center, University of Georgia.

Steedman, M. (1987) Combinatory grammar and human language
    processing. Garfield 1987:187-210.

Tarvainen, K. (1982) Einführung in die Dependenzgrammatik.
    Tübingen: Niemeyer.

Tesnière, L. (1953) Esquisse d'une syntaxe structurale. Paris:
    Klincksieck. Cited by Robinson (1970).

--- (1959) Éléments de la syntaxe structurale. Paris: Klincksieck.

Tomita, M. (1986) Efficient parsing for natural language. Boston:
    Kluwer.

Tucker, S. G. (1986) The IBM 3090 system: an overview. IBM Systems
    Journal 25.1:4-19.

Ueda, M. (1984) Notes on parsing in Japanese. Unpublished.
    Department of Linguistics, University of Massachusetts,
    Amherst.

Uszkoreit, H. (1986a) Word order and constituent structure in
    German. (CSLI Lecture Notes, 8.) Stanford: CSLI.

--- (1986b) Constraints on order. Report No. CSLI-86-46. Stanford
    University.

--- (1986c) Categorial unification grammars. Report No.
    CSLI-86-66. Stanford University.

--- (1987) Linear precedence in discontinuous constituents:
    complex fronting in German. Huck and Ojeda 1987:405-425.

Walker, A., ed. (1987) Knowledge systems and Prolog. Reading,
    Massachusetts: Addison-Wesley.

Wood, R. C. (1987) The language advantage: Japan's machine
    translators rule the market. High Technology Business 7.11
    (November), p. 17.

Woods, W. A. (1987) Grammar, augmented transition network.
    Shapiro, S. C., ed., Encyclopedia of artificial intelligence
    1:323-333. New York: Wiley.

Listing 1. Part of the lexicon, which ignores morphology and
simply lists every form of every word. This is a stand-in for the
morphological component that would be needed in a practical
system.


word(phon%koshka %% cat%noun %% gloss%"'cat'" %% case%nom %% num%sg %% gen%fem).
word(phon%koshku %% cat%noun %% gloss%"'cat'" %% case%acc %% num%sg %% gen%fem).
word(phon%koshki %% cat%noun %% gloss%"'cats'" %% case%nom %% num%pl %% gen%fem).
word(phon%koshki %% cat%noun %% gloss%"'cats'" %% case%acc %% num%pl %% gen%fem).

word(phon%sobaka %% cat%noun %% gloss%"'dog'" %% case%nom %% num%sg %% gen%fem).
word(phon%sobaku %% cat%noun %% gloss%"'dog'" %% case%acc %% num%sg %% gen%fem).
word(phon%sobaki %% cat%noun %% gloss%"'dogs'" %% case%nom %% num%pl %% gen%fem).
word(phon%sobaki %% cat%noun %% gloss%"'dogs'" %% case%acc %% num%pl %% gen%fem).

word(phon%vidit %% cat%verb %% gloss%"'sees'" %% num%sg %% pers%3).
word(phon%vidut %% cat%verb %% gloss%"'see'" %% num%pl %% pers%3).

word(phon%presleduet %% cat%verb %% gloss%"'pursues'" %% num%sg %% pers%3).
word(phon%presleduyut %% cat%verb %% gloss%"'pursue'" %% num%pl %% pers%3).

word(phon%cherez %% cat%prep %% gloss%"'through'" %% case%acc).

Listing 2. D-rules used by the prototype Russian parser. In
order, the rules say: a verb takes a nominative noun as its
subject, agreeing with it in person and number; a verb takes an
accusative noun as its direct object; a verb can be modified by a
preposition (i.e., by a prepositional phrase); a noun takes an
adjective modifier agreeing with it in case, number, and gender;
and a preposition takes a noun object in the case the preposition
governs.


cat%verb %% pers%P %% num%N %% subj%S
    << cat%noun %% case%nom %% pers%P %% num%N %% gr%subject %% id%S.

cat%verb %% obj%Ob
    << cat%noun %% case%acc %% gr%direct_object %% id%Ob.

cat%verb
    << cat%prep %% gr%modifier.

cat%noun %% case%C %% num%N %% gen%G
    << cat%adj %% case%C %% num%N %% gen%G %% gr%modifier.

cat%prep %% case%C %% obj%Ob %% posn%1
    << cat%noun %% case%C %% gr%object_of_preposition %% id%Ob.

Listing 3. Output of a typical parsing run. The sentence is
ambiguous as to whether belye modifies sobaki or koshki; both
parses are found.


[vidut,sobaki,belye,koshki,v,chornom,lesu]

Parsed structure:
vidut 'see'
   sobaki 'dogs' subject
      belye 'white' modifier
   koshki 'cats' direct_object
   v 'in' modifier
      lesu 'forest' object_of_preposition
         chornom 'black' modifier

Parsed structure:
vidut 'see'
   sobaki 'dogs' subject
   koshki 'cats' direct_object
      belye 'white' modifier
   v 'in' modifier
      lesu 'forest' object_of_preposition
         chornom 'black' modifier

Table 1. Comparative speed of several Prolog implementations.

                             Search-and-     256-element
                             backtrack       list reversal

IBM PS/2-50
  Arity Prolog 4.1           21.6 sec        47.1 sec
  (interpreted)              1.5 kLIPS       0.7 kLIPS

IBM PS/2-50
  ALS Prolog 1.2             1.2 sec         2.5 sec
  (compiled)                 27.5 kLIPS      13.2 kLIPS

Sun Sparcstation
  Quintus Prolog             0.166 sec       0.133 sec
  (compiled)                 197 kLIPS       130 kLIPS

IBM 3090-400
  VM/Prolog                  0.066 sec       0.297 sec
  (interpreted)              496 kLIPS       112 kLIPS

Benchmark programs are from Covington and Vellino (1986).
kLIPS = thousand logical inferences per second.