Towards reducing the hardware complexity of feature
detection-based models

Bassem Medawar and Andrew Noetzel(1)

Polytechnic University
333 Jay St., Brooklyn, NY 11201

____________________
(1) This paper will be presented at the International Joint
Conference on Neural Networks, Jan. 15-19, 1990, Washington, DC.

Abstract

A model for feature detection-based pattern recognition is
presented. It attempts to improve on the hardware complexity of
existing models. Traditionally, feature detection has been
implemented by brute-force duplication of template-based feature
detectors, offering little scalability. This model eliminates the
need to duplicate complex feature detectors by instead using
operators to transform patterns.

1. Introduction

It has been shown [1] that the brain uses feature detection in
its visual pattern recognition task. Many researchers have
attempted to capture the brain's pattern-recognizing capability
in abstract models. In their attempts, some have tried to remain
faithful to the biological principles underlying the functioning
and organization of the brain [2]. Others have borrowed the most
important principles from the brain and tried to cast them in any
model that could be demonstrated to work [3,4]. The model in this
paper follows the latter approach.

The work done by Fukushima, representing earlier models of its
class, will first be examined. A new model, which attempts to
overcome their limitations, will then be described. Finally, the
paper will conclude with a description of future improvements to
the model.

2. Earlier work

Models which loosely follow the brain's architecture have done so
based on the following elementary principles:

a) The neuron (as a threshold element) is the building block of
   those models.
b) The neuron's output can be viewed as a boolean corresponding
   to on and off activation states, or as a positive (bounded)
   real number representing the activation rate of the neuron
   (a sketch of both readings follows this list).
c) The pattern-recognizing network is layered.
d) Each layer contains feature-detecting cells whose conceptual
   complexity increases with the layer's level.
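
As a minimal illustration of principles (a) and (b), the
following Python sketch (the names and the particular squashing
function are our own illustrative choices, not taken from any of
the cited models) shows a neuron as a threshold element whose
output can be read either as a boolean state or as a bounded
activation rate:

    import math

    def weighted_sum(inputs, weights):
        """Inner product of input activations and synaptic weights."""
        return sum(x * w for x, w in zip(inputs, weights))

    def boolean_neuron(inputs, weights, threshold):
        """Boolean reading of principle (b): on/off activation state."""
        return weighted_sum(inputs, weights) >= threshold

    def rate_neuron(inputs, weights, threshold):
        """Rate reading of principle (b): a positive, bounded real
        number (zero below threshold, squashed into [0, 1) above)."""
        excess = weighted_sum(inputs, weights) - threshold
        return 0.0 if excess <= 0 else 1.0 - math.exp(-excess)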

Many models in the neural network literature applied those
principles. Focus will be centered on Fukushima's model, because
it is the most elaborate and has been shown to work with a
(relatively) large retina of 128x128 pixels [5].

Fukushima's Neocognitron model [3,6] has layers with two levels
each. The lower level is made up of groups of template matchers.
The higher-level neurons take their inputs from groups at the
lower level that recognize the same object at different
positions. The net effect is feature detectors tolerant of
displacement and slight distortion.

Fukushima's neocognitron suffers from the following problems.
First, the hardware is not amenable to large-scale
implementations: in one case [6], a 4-layer network with a simple
19x19 retina required over 40,000 cells, excluding non-responding
ones. Second, each learned template is duplicated in many
positions after being trained in only one position; this is
problematic for hardware implementations as well as being
biologically implausible.

3. The model

This model [fig. 1] is based on the premise that instead of
taking the feature detector to the pattern (multiplying the
number of feature detectors), we bring the pattern to the feature
detector (multiplying the pathways). As the example of the
neocognitron shows, duplicating complex feature detectors is
costly in terms of the number of cells. What we hope to achieve
is a reduction in the overall number of cells.

The retina is thus divided into several marginally overlapping
receptive fields (RF's) of uniform size. Simple hard-wired
operators provide a many-to-one mapping from the RF area to the
feature detector. Those operators are divided into classes. For
instance, on the lowest layer, the classes are displacement,
scaling, and rotation. On higher layers, the classes include
positional and set operators. Each class has its variations
within each RF, depending on where the operator maps from and on
the degree of the mapping. For example, the displacement class
has variations which correspond to the direction and the amount
of the displacement.
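
As a hedged sketch of what such hard-wired operator classes could
look like, the following Python fragment maps an RF image (a
square 2-D array of activations) under one variation of each
lowest-layer class; the nearest-neighbor resampling and all names
are our own illustrative choices, not the paper's circuitry:

    import math

    def displace(rf, dy, dx):
        """Displacement class: one variation per direction and
        amount (dy, dx); pixels shifted in from outside are 0."""
        n = len(rf)
        return [[rf[y - dy][x - dx]
                 if 0 <= y - dy < n and 0 <= x - dx < n else 0
                 for x in range(n)] for y in range(n)]

    def scale(rf, factor):
        """Scaling class: nearest-neighbor resampling about the RF
        center; 'factor' is the degree of the mapping."""
        n = len(rf)
        c = (n - 1) / 2.0
        out = [[0] * n for _ in range(n)]
        for y in range(n):
            for x in range(n):
                sy = int(round(c + (y - c) / factor))
                sx = int(round(c + (x - c) / factor))
                if 0 <= sy < n and 0 <= sx < n:
                    out[y][x] = rf[sy][sx]
        return out

    def rotate(rf, theta):
        """Rotation class: nearest-neighbor rotation by 'theta'
        radians about the RF center."""
        n = len(rf)
        c = (n - 1) / 2.0
        ct, st = math.cos(theta), math.sin(theta)
        out = [[0] * n for _ in range(n)]
        for y in range(n):
            for x in range(n):
                sy = int(round(c + (y - c) * ct - (x - c) * st))
                sx = int(round(c + (y - c) * st + (x - c) * ct))
                if 0 <= sy < n and 0 <= sx < n:
                    out[y][x] = rf[sy][sx]
        return out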

On the next level within a layer, feature detectors take their
inputs from the outputs of the operators, weighted by the optimal
feature pattern. The output of those feature detectors represents
the degree of success with which a particular operator maps into
the optimal feature. From this large pool of feature detector
outputs within a receptive field, the best variation and degree
for each class is selected. Then, the optimum values from each RF
are combined across the layer to choose the best class of
operators. This choice represents the consensus as to which class
of operators best maps into some feature.

The consensus is then fed back to the lower levels, allowing each
RF to reset its own image of the retina according to the new
transformation. Notice that while the application of the operator
class is enforced upon the layer, each RF implements the
transformation in a way that generates an optimal mapping.

After one class is selected within a layer, that class is
inhibited, allowing another class to win in a second round. The
process is then repeated until a threshold of desirable outcome
is exceeded. The feature with the best degree of success can thus
be said to have been recognized.
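
The select, feed-back, and inhibit cycle just described might be
sketched as follows, assuming callables detector_score(rf_image,
feature), returning the degree of success of a mapped RF image
against one optimal feature pattern, and apply_op(rf_image, op);
every name here is a hypothetical illustration, not the paper's
implementation:

    def recognize_layer(rf_images, op_classes, features, apply_op,
                        detector_score, threshold):
        """One layer's cycle: choose the consensus-best operator
        class, let each RF apply its own best variation of it,
        inhibit the winner, and repeat until some feature scores
        above threshold."""
        inhibited = set()
        while len(inhibited) < len(op_classes):
            # For each class, sum each RF's best variation score.
            consensus = {}
            for name, variations in op_classes.items():
                if name in inhibited:
                    continue
                consensus[name] = sum(
                    max(detector_score(apply_op(rf, op), f)
                        for op in variations for f in features)
                    for rf in rf_images)
            winner = max(consensus, key=consensus.get)
            # Feedback: each RF resets its image of the retina with
            # its own optimal variation of the winning class.
            rf_images = [
                max((apply_op(rf, op) for op in op_classes[winner]),
                    key=lambda img: max(detector_score(img, f)
                                        for f in features))
                for rf in rf_images]
            best = max(features,
                       key=lambda f: sum(detector_score(rf, f)
                                         for rf in rf_images))
            if sum(detector_score(rf, best) for rf in rf_images) >= threshold:
                return best, rf_images  # feature recognized
            inhibited.add(winner)  # let another class win next round
        return None, rf_images  # no feature exceeded the threshold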

[Fig. 1 is inserted about here]

Having recognized a feature for each RF on the first layer, the
output of the first layer is fed into the second layer. A similar
process of transformation and recognition is carried out in the
second layer. Finally, on the topmost layer, a feature detector
which conceptually represents an object is selected and the whole
process terminates.

4. The model's weaknesses

Simulating this model necessitated additional hardware that was
not originally envisioned. While its design premise is simple,
the number of cells required to implement it is proportional to
the product of the number of RF's, the number of classes, the
number of variations within a class, and the number of features
within each layer.
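
To make the proportionality concrete, the figures below are
hypothetical, chosen only to illustrate the product; they are not
measurements from the simulation:

    # Hypothetical layer parameters, for illustration only.
    n_rf = 16          # receptive fields in the layer
    n_classes = 3      # operator classes (displacement, scaling, rotation)
    n_variations = 24  # variations/degrees per class within an RF
    n_features = 8     # feature detectors in the layer

    # One detector output per (RF, class, variation, feature) combination:
    cells_per_layer = n_rf * n_classes * n_variations * n_features
    print(cells_per_layer)  # 9216 cells for this single, modest layer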

The model does not lend itself to a nice and simple supporting
mathematical model, and the mathematical properties it does have
are not practical to implement. For example, it is mathematically
sound to say that two features are different if one cannot be
generated from the other by applying any sequence of operators in
any order; taken to the extreme, however, this property is not
practical to use as a basis for incorporating self-organization
into the model. And while the model can be augmented with
learning rules that change the weights on the feature detectors,
it fails to address how a whole new class of operators can be
learned.

The model has been shown in practice to fail under certain
circumstances: the wrong sequence of transformations is applied,
leading to either faulty recognition or no recognition at all.
This failure is a result of the lack of communication between
neighboring RF's.

5. Conclusion and future work

A feature detection-based model was presented that was designed
to address the limitations of previous models. It was built on a
simple premise: bringing the pattern to the feature detector
rather than duplicating the detector. Although the model achieved
its objective of lower overall hardware complexity, it has a few
limitations of its own.

Currently, work is in progress on a new model. This model
incorporates communication between neighboring RF's, coupled with
hill-climbing techniques, to pick the best transformation to
apply within an RF image. In effect, this will result in a
reduction in the number of pathways, as only a few
transformations will be implemented at a time.

References

[1] Kuffler S. W., Nicholls J. G., and Martin A. R., From Neuron
    to Brain, 2nd Ed. Sinauer Associates Inc. Publishers,
    Sunderland, MA, 1984.
[2] Linsker R., Self-organization in a perceptual network. IEEE
    Computer, March 1988, pp. 105-117.
[3] Fukushima K., and Miyake S., Neocognitron: A new algorithm
    for pattern recognition tolerant of deformations and shifts
    in position. Pattern Recognition, Vol. 15, No. 6, pp.
    455-469, 1982.
[4] Widrow B., Adaline and Madaline - 1963. IEEE 1st
    International Conference on Neural Networks, Vol. 1, pp.
    143-157.
[5] Menon M. M., and Heinemann K. G., Classification of patterns
    using a self-organizing neural network. Neural Networks,
    Vol. 1, pp. 201-215, 1988.
[6] Fukushima K., A neural network for visual pattern
    recognition. IEEE Computer, March 1988, pp. 65-74.