Towards reducing the hardware complexity of feature
detection-based models

Bassem Medawar and Andrew Noetzel (1)
Polytechnic University
333 Jay St., Brooklyn, NY 11201

____________________
(1) This paper will be presented at the International Joint
Conference on Neural Networks, Jan. 15-19, 1990, Washington, DC.

Abstract

A model for feature detection-based pattern recognition is
presented. It attempts to reduce the hardware complexity of
existing models. Traditionally, feature detection has been
implemented by brute-force duplication of template-based
feature detectors, which scales poorly. This model eliminates
the need to duplicate complex feature detectors by using
operators to transform patterns instead.

1. Introduction

It has been shown [1] that the brain uses feature detection in
its visual pattern recognition task. Many researchers have
attempted to capture the brain's pattern-recognizing capability
in abstract models. Some have tried to remain faithful to the
biological principles underlying the functioning and
organization of the brain [2]. Others have borrowed the most
important principles from the brain and cast them in any model
that could be demonstrated to work [3,4]. The model in this
paper follows the latter approach.

Fukushima's work is examined first, as representative of
earlier models of its class. A new model, which attempts to
overcome their limitations, is then described. Finally, the
paper concludes with a description of future improvements to
the model.

2. Earlier work

Models that loosely follow the brain's architecture have done
so on the basis of the following elementary principles:

a) The neuron (as a threshold element) is the building block of
   those models.
b) The neuron's output can be viewed as a boolean corresponding
   to on and off activation states, or as a positive (bounded)
   real number representing the activation rate of the neuron.
c) The pattern-recognizing network is layered.
d) Each layer contains feature-detecting cells whose conceptual
   complexity increases with the layer level.

Many models in the neural network literature apply these
principles. The focus here is on Fukushima's model, because
it is the most elaborate and has been shown to work with a
(relatively) large retina of 128x128 pixels [5].

Fukushima's Neocognitron model [3,6] has layers with two levels
each. The lower level is made up of groups of template matchers.
Each higher-level neuron takes its inputs from groups at the
lower level that recognize the same object at different
positions. The net effect is feature detectors that tolerate
displacement and slight distortion.
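
The two-level arrangement described above can be illustrated
with a minimal Python sketch. It is not Fukushima's formulation:
the binary retina, the dot-product match score, the fixed
threshold, and the names lower_level, higher_level, and place
are all assumptions made for illustration. The sketch only shows
that one template needs one matcher per position, and that a
higher-level cell pooling over those matchers responds wherever
the pattern appears.

```python
def lower_level(image, template):
    """Apply one template at every position; return the grid of scores.

    Each grid entry stands for a separate template-matching cell, so
    the template is effectively duplicated once per position.
    """
    th, tw = len(template), len(template[0])
    rows = len(image) - th + 1
    cols = len(image[0]) - tw + 1
    return [
        [
            sum(image[r + i][c + j] * template[i][j]
                for i in range(th) for j in range(tw))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def higher_level(scores, threshold):
    """Fire if any lower-level matcher for this template reaches threshold."""
    return any(s >= threshold for row in scores for s in row)


def place(pattern, size, top, left):
    """Hypothetical helper: draw pattern on a blank size x size retina."""
    image = [[0] * size for _ in range(size)]
    for i, row in enumerate(pattern):
        for j, pixel in enumerate(row):
            image[top + i][left + j] = pixel
    return image


corner = [[1, 1], [1, 0]]                  # toy 2x2 template (assumed)
for top, left in [(0, 0), (3, 3)]:         # same pattern, two positions
    retina = place(corner, 5, top, left)
    assert higher_level(lower_level(retina, corner), threshold=3)

# Matchers needed for one template on a 5x5 retina: one per position.
n_matchers = sum(len(row) for row in lower_level(place(corner, 5, 0, 0), corner))
```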

Fukushima's neocognitron suffers from the following problems.
First, its hardware is not amenable to large-scale
implementation: in one case [6], a simple network with a 19x19
retina and 4 layers required over 40 thousand cells, excluding
non-responding ones. Second, each learned template is duplicated
in many positions after being trained in only one position; this
is problematic for hardware implementations and is also
biologically implausible.

3. The model

This model [Fig. 1] is based on the premise that instead of
taking the feature detector to the pattern (multiplying the
number of feature detectors), we bring the pattern to the feature
detector (multiplying the pathways). As the example of the
neocognitron shows, duplicating complex feature detectors is
costly in terms of the number of cells. What we hope to achieve
is a reduction in the overall number of cells.

The retina is thus divided into several marginally overlapping
receptive fields (RFs) of uniform size. Simple hard-wired
operators provide a many-to-one mapping from the RF area to the
feature detector. These operators are divided into classes. For
instance, on the lowest layer, the classes are displacement,
scaling, and rotation. On higher layers, the classes include
positional and set operators. Each class has its variations
within each RF, depending on where the operator maps from and
the degree of the mapping. For example, the displacement class
has variations that correspond to the direction and the amount
of the displacement.
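
As a minimal sketch of the displacement class, assuming a small
binary RF image stored as a list of rows: each variation is a
(direction, amount) pair, and applying it shifts the RF image so
that a pattern found elsewhere in the RF lands on the detector's
input. The eight-direction discretisation and the names
DIRECTIONS, displacement_variations, and displace are
hypothetical; the paper does not specify them.

```python
# Eight unit directions as (dy, dx); an assumed discretisation.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def displacement_variations(max_amount):
    """List every (dy, dx) shift: each direction at amounts 1..max_amount."""
    return [(a * dy, a * dx)
            for dy, dx in DIRECTIONS
            for a in range(1, max_amount + 1)]


def displace(rf, dy, dx):
    """Shift the RF image by (dy, dx), filling vacated pixels with 0."""
    h, w = len(rf), len(rf[0])
    return [
        [rf[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else 0
         for c in range(w)]
        for r in range(h)
    ]


rf = [[1, 0, 0],
      [0, 0, 0],
      [0, 0, 0]]
moved = displace(rf, 1, 1)      # one pixel down and one to the right
# One candidate image per variation, all offered to the same detector.
candidates = [displace(rf, dy, dx) for dy, dx in displacement_variations(2)]
```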

On the next level within a layer, feature detectors take their
input from the output of the operators, weighted by the optimal
feature pattern. The output of each feature detector represents
the degree of success with which a particular operator maps into
the optimal feature. From this large pool of feature detector
outputs within a receptive field, the best variation and degree
for each class are selected. Then the optimum values from each
RF are combined across the layer to choose the best class of
operators. This choice represents the consensus as to which
class of operators best maps into some feature.

The consensus is then fed back to the lower levels, allowing
each RF to reset its own image of the retina according to the
new transformation. Note that while the application of the
operator class is enforced upon the layer, each RF implements
the transformation in the way that generates its optimal
mapping.

After one class is selected within a layer, that class is
inhibited, allowing another class to win in a second round. The
process is repeated until a threshold of desirable outcome is
exceeded. The feature with the best degree of success can then
be said to have been recognized.
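
The select, feed back, and inhibit cycle can be sketched in
Python as follows. This is an illustration under stated
assumptions, not the authors' simulation: it uses a single
feature, only two operator classes (one-pixel displacement and
quarter-turn rotation), a dot-product score, a sum as the
cross-RF consensus, and a threshold that counts as met when
reached. All function names are hypothetical.

```python
def shift(img, dy, dx):
    """Displace an image by (dy, dx), filling vacated pixels with 0."""
    h, w = len(img), len(img[0])
    return [[img[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else 0
             for c in range(w)] for r in range(h)]


def rotate90(img, times):
    """Rotate a square image clockwise by the given number of quarter turns."""
    for _ in range(times % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img


# Operator classes and their variations (an assumed, reduced set).
CLASSES = {
    "displacement": [lambda im, d=d: shift(im, *d)
                     for d in [(0, 1), (0, -1), (1, 0), (-1, 0)]],
    "rotation": [lambda im, k=k: rotate90(im, k) for k in (1, 2, 3)],
}


def score(img, feature):
    """Feature detector: operator output weighted by the feature pattern."""
    return sum(p * w for ri, rf in zip(img, feature) for p, w in zip(ri, rf))


def best_variation(img, feature, variations):
    """Within one RF, the transformed image that best matches the feature."""
    return max((op(img) for op in variations), key=lambda c: score(c, feature))


def recognize(rf_images, feature, threshold):
    images, winners, remaining = list(rf_images), [], dict(CLASSES)
    total = sum(score(im, feature) for im in images)
    while total < threshold and remaining:
        # Each RF picks its own best variation for every class still allowed.
        best = {name: [best_variation(im, feature, ops) for im in images]
                for name, ops in remaining.items()}
        # Consensus: the class with the best combined score across all RFs.
        winner = max(best, key=lambda n: sum(score(im, feature) for im in best[n]))
        images = best[winner]          # feedback: every RF resets its image
        winners.append(winner)
        del remaining[winner]          # inhibit the class for later rounds
        total = sum(score(im, feature) for im in images)
    return total >= threshold, winners, images


feature = [[0, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]
# Two RFs see the same feature displaced in different directions.
rfs = [shift(feature, 0, 1), shift(feature, -1, 0)]
recognized, winners, images = recognize(rfs, feature, threshold=6)
```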

[Fig. 1 is inserted about here]

Once a feature has been recognized for each RF on the first
layer, the output of the first layer is fed into the second
layer. A similar process of transformation and recognition is
carried out in the second layer. Finally, on the topmost layer,
a feature detector that conceptually represents an object is
selected and the whole process terminates.

4. The model's weakness

Simulating this model required additional hardware that was
not originally envisioned. While its design premise is simple,
the number of cells required to implement it is proportional to
the number of RFs, the number of classes, the number of
variations within a class, and the number of features within
each layer.
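
One reading of this proportionality is that every RF evaluates
every feature under every variation of every class, so the count
per layer grows as the product of the four quantities. The short
Python sketch below encodes that reading; the function name and
the sizes passed to it are hypothetical and are not taken from
the simulation.

```python
def detector_cells_per_layer(n_rfs, n_classes, n_variations, n_features):
    """Cell count if each RF tests each feature under each variation."""
    return n_rfs * n_classes * n_variations * n_features


# Hypothetical sizes, chosen only to show how quickly the product grows.
cells = detector_cells_per_layer(n_rfs=16, n_classes=3,
                                 n_variations=8, n_features=10)
```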

The model does not lend itself to a simple mathematical
formulation, and its mathematical properties are not practical
to implement. For example, it is mathematically sound to say
that two features are different if one cannot be generated from
the other by applying any sequence of operators in any order,
but this property, taken to the extreme, is not practical as a
basis for incorporating self-organization into the model. While
the model can be augmented with learning rules that change the
weights on the feature detectors, it fails to address how a
whole new class of operators can be learned.

The model has been shown in practice to fail under certain
circumstances: the wrong sequence of transformations is applied,
leading to either faulty recognition or no recognition at all.
This is a result of the lack of communication between
neighboring RFs.

5. Conclusion and future work

A feature detection-based model was presented that was designed
to address the limitations of previous models. It rests on the
idea of bringing the pattern to the feature detector instead of
duplicating the detector across positions. Although the model
achieved its objective of lower overall hardware complexity, it
had a few limitations of its own.

Currently, work is in progress on a new model. This model
incorporates communication between neighboring RFs, coupled with
hill-climbing techniques to pick the best transformation to
apply within an RF image. In effect, this will reduce the number
of pathways, as only a few transformations will be implemented
at a time.

References

[1] Kuffler S. W., Nicholls J. G., and Martin A. R., From Neuron
    to Brain, 2nd Ed. Sinauer Associates Inc. Publishers,
    Sunderland, MA, 1984.
[2] Linsker R., Self-Organization in a Perceptual Network. IEEE
    Computer, March 1988, pp. 105-117.
[3] Fukushima K., and Miyake S., Neocognitron: A new algorithm
    for pattern recognition tolerant of deformations and shifts
    in position. Pattern Recognition, Vol. 15, No. 6,
    pp. 455-469, 1982.
[4] Widrow B., Adaline and Madaline - 1963. IEEE 1st
    International Conference on Neural Networks. Vol. 1,
    pp. 143-157.
[5] Menon M. M., and Heinemann K. G., Classification of patterns
    using a self-organizing neural network. Neural Networks,
    Vol. 1, pp. 201-215, 1988.
[6] Fukushima K., A neural network for visual pattern
    recognition. IEEE Computer, March 1988, pp. 65-74.