A Survey of Neural Networks
by Jeannette Lawrence
Jan. 23, 1990

I. Introduction

Neural networks have been hailed as the greatest technological
breakthrough since the transistor and have been predicted to be a common
household item by the year 2000. How much of this is hype? What are they
capable of, and what are they not? With numerous paradigms available,
which is best for a particular application? This article will answer
these questions and more about this newly emerging field of computation.

Formed by simulated neurons connected together much the same way the
brain's neurons are, neural networks are able to associate and
generalize without rules. They have solved problems in pattern
recognition, robotics, speech processing, financial prediction and
signal processing, to name a few.

One of the first impressive neural networks was NetTalk, which read in
ASCII text and correctly pronounced the words (producing phonemes which
drove a speech chip), even those it had never seen before (1). Designed
by Johns Hopkins biophysicist Terry Sejnowski and Charles Rosenberg of
Princeton in 1986, this application made the Back-propagation training
algorithm famous. Using the same paradigm, a neural network has been
trained to classify sonar returns from undersea mines and rocks. This
classifier, designed by Sejnowski and R. Paul Gorman, performed better
than a nearest-neighbor classifier (2).

As far as the public is concerned, the modern era in neural networks
began in 1982, when the distinguished Caltech physicist John Hopfield
published a paper which not only showed that neural networks could store
and recall patterns even when the input was incomplete, but also
provided the mathematical elucidation which captured the attention of
the scientific community (3).

Speech recognition of Finnish and Japanese (to text) has been
demonstrated by researcher Teuvo Kohonen of the Helsinki University of
Technology, Finland. For these inflectional languages, the system must
construct the text from recognizable phonetic units (4). This complex
system uses signal preprocessing by a TMS32010 chip, Kohonen's
self-organizing associative paradigm, and a context-sensitive stochastic
grammar corrector.

The Neocognitron, designed by Kunihiko Fukushima of the NHK Science and
Technical Research Lab in Tokyo, correctly recognizes handwritten
numerals in various styles of penmanship, even if they are considerably
distorted in shape (5). Built as a model for the human visual system,
this highly specialized network does not implement any common topology.

The kinds of problems best solved by neural networks are those that
people are good at, such as association, evaluation and pattern
recognition. Problems that are difficult to compute and do not require
perfect answers, just very good ones, are also well suited to neural
networks. A quick, very good response is often more desirable than a
more accurate answer which takes longer to compute. This is especially
true in robotics or industrial controller applications. Predictions of
behavior and general analysis of data are also jobs for neural networks.
In the financial arena, consumer loan analysis and financial forecasting
make good applications. Neural network designers are working on weather
forecasting as well. Currently, doctors are developing medical neural
networks as an aid in diagnosis. Attorneys and insurance companies are
also working on neural networks to help estimate the value of claims.

Neural networks are poor at precise calculations and serial processing.
They are also unable to predict or recognize anything that does not
inherently contain some sort of pattern; for example, they cannot
predict the lottery, since it is a random process. It is unlikely that a
neural network could be built which has the capacity to think as well as
a person does, for two reasons: neural networks are terrible at
deduction, or logical thinking, and the human brain is simply too
complex to simulate completely. Also, some problems are too difficult
for present technology. Real vision, for example, is a long way off.

A brief look at the general structure and operation of neural networks
will help explain the limits to their abilities. The power and speed of
the human brain come from the way its hundreds of billions of highly
interconnected neurons function together. Neural networks simulate the
operation and structure of brain neurons, but on a much smaller scale.
Information is distributed across the neurons' interconnections, not
stored within the neurons as bits of intelligence, as was once thought.

There are many types of neural networks, but all have three things in
common. A neural network can be described in terms of its individual
neurons, the connections between them (topology), and the learning rule.
Together they constitute the neural network paradigm.

Artificial neurons are also called processing elements, neurodes, units
or cells. Each neuron receives the output signals from many other
neurons. A neuron calculates its output by finding the weighted sum of
its inputs. The point where two neurons communicate is called a
connection (analogous to a synapse). The weight of a particular
connection is denoted w_ij, where i is the receiving neuron and j is the
sending neuron. At any point in time t, the neuron adds up its weighted
inputs to produce an activation value a_i(t). The activation is passed
through an output, or transfer, function f_i, which produces the actual
output of that neuron at that time, o_i(t).

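As a rough illustration of this arithmetic, the sketch below computes a
single neuron's output from its inputs, weights and transfer function.
It is a minimal example, not code from the article; the threshold
transfer function it uses is one of the choices discussed below.

    # A minimal sketch of one artificial neuron, assuming a simple
    # weighted-sum activation and a threshold transfer function.

    def neuron_output(inputs, weights, transfer):
        # activation a_i = sum over j of w_ij * o_j
        activation = sum(w * x for w, x in zip(weights, inputs))
        # output o_i = f_i(a_i)
        return transfer(activation)

    def step_transfer(activation, threshold=0.0):
        # All-or-nothing output.
        return 1.0 if activation > threshold else 0.0

    # Example: three incoming signals and their connection weights.
    print(neuron_output([1.0, 0.0, 1.0], [0.5, -0.3, 0.8], step_transfer))
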
The activation function specifies what the neuron is to do with the
signals after the weights have had their effect. Once inside the neuron,
the weighted signals are summed to form a net value. In most models,
signals can be either excitatory or inhibitory. After summation, the net
input of the neuron is combined with the previous state of the neuron to
produce a new activation value. In the simplest models, the activation
function is just the weighted sum of the neuron's inputs; the previous
state is not taken into account. In more complicated models, the
activation function also uses the previous output of the neuron, so that
the neuron can self-excite. These activation values slowly decay over
time; an excited state slowly returns to an inactive level. Sometimes
the activation function is stochastic, i.e. it includes a random noise
factor.

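The self-exciting, decaying and stochastic cases can be sketched in one
line of arithmetic; the decay factor and noise level used here are
illustrative assumptions, not values from the article.

    import random

    def update_activation(prev_activation, net_input, decay=0.9, noise_std=0.0):
        # The new activation combines the decayed previous state with the
        # net input; an optional random term makes the function stochastic.
        return decay * prev_activation + net_input + random.gauss(0.0, noise_std)
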
The transfer function of a neuron defines how the activation value is
output. The earliest models used a linear transfer function. There are
certain problems which are not entirely reducible by purely linear
methods, and nonlinear neurons allow these more interesting problems to
be solved. The simplest nonlinear model consists of threshold neurons. A
threshold transfer function is an all-or-nothing function: if the input
is greater than some fixed amount, the threshold, the neuron outputs a
1; if the value is below the threshold, the neuron outputs a 0.
Sometimes the transfer function is a saturation function; excitation
above some maximum firing level has no further effect. A particularly
useful transfer function is the sigmoid function, which has a high and a
low saturation limit and a proportionality range between them. The
sigmoid function is 0 when the activation value is a large negative
number, 1 when the activation value is a large positive number, and
makes a smooth transition in between.

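The two nonlinear transfer functions described above might be written as
follows. This is a sketch; the logistic form of the sigmoid is assumed,
since the article does not give a formula.

    import math

    def threshold(activation, theta=0.0):
        # All-or-nothing: 1 above the threshold, 0 at or below it.
        return 1.0 if activation > theta else 0.0

    def sigmoid(activation):
        # Smooth transition from 0 (large negative) to 1 (large positive).
        return 1.0 / (1.0 + math.exp(-activation))
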
The behavior of the network depends heavily on the way the neurons are
connected. In most models, the individual neurons are grouped into
layers, so that the output of each neuron in one layer is connected to
the inputs of all the neurons in the next layer. A Back-propagation
network has at least three layers: input, hidden and output. The network
structure may involve inhibitory connections from one neuron to the rest
of the neurons in the same layer. This is called lateral inhibition.
Sometimes a network has such strong lateral inhibition that only one
neuron in a layer, usually the output, can be activated at a time. This
effect of minimizing the number of active neurons is known as
competition. In a feed-forward network, neurons in a given layer usually
do not connect to each other, and do not take inputs from subsequent
layers or from layers before the previous one. Other models include
feedback connections from the outputs of a layer to the inputs of the
same or a previous layer.

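With this layered, fully interconnected topology, one layer's outputs
can be computed from the previous layer's outputs with a single weight
matrix. A minimal sketch, assuming sigmoid neurons and illustrative
layer sizes:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def layer_forward(weights, prev_outputs):
        # weights[i][j] connects sending neuron j in the previous layer
        # to receiving neuron i in this layer.
        return sigmoid(weights @ prev_outputs)

    # A 3-neuron layer fed by 4 neurons from the layer below.
    rng = np.random.default_rng(0)
    print(layer_forward(rng.normal(size=(3, 4)), np.array([1.0, 0.0, 0.5, 1.0])))
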
A neural network learns by changing its response as the inputs change.
The learning rule is the very heart of a neural network; it determines
how the weights are adjusted as the network gains experience. There are
many different learning rules. Some of the better-known are Hebb's Rule,
the Delta Rule, and the Back-propagation Rule. The best learning rule to
use with linear neurons is the Delta Rule. It allows arbitrary
associations to be learned, provided that the inputs are all linearly
independent. Other learning rules (such as Hebb's) require that the
inputs also be orthogonal.

More than 30 years ago, Donald O. Hebb theorized that biological
associative memory lies in the synaptic connections between nerve cells.
He thought that the process of learning and memory storage involved
changes in the strength with which nerve signals are transmitted across
individual synapses. Hebb's Rule states that the connection between a
pair of neurons which are active simultaneously is strengthened by
synaptic (weight) changes. The result is a reinforcement of those
pathways in the brain. A number of different rules for adjusting
connection strengths, or weights, have been proposed, but nearly all
network learning theories are some variant of Hebb's Rule.

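In weight terms, Hebb's Rule amounts to an increment proportional to the
product of the two neurons' activities. A sketch, with an assumed
learning-rate parameter eta:

    def hebbian_update(w_ij, output_i, output_j, eta=0.1):
        # Strengthen the connection when receiving neuron i and sending
        # neuron j are active at the same time.
        return w_ij + eta * output_i * output_j
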
The Delta Rule additionally states that if there is a difference between
the actual output pattern and the desired output pattern during
training, then the weights are adjusted to reduce the difference. Many
networks use some variation of this. The Back-propagation Rule is a
generalization of the Delta Rule for a network with hidden neurons. The
weights are adjusted by a small or large amount determined by a
specified learning rate.

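For a single linear neuron, the Delta Rule can be sketched as below; the
difference between the target and the actual output drives the weight
change, scaled by the learning rate. The names target and eta are
illustrative.

    def delta_rule_update(weights, inputs, target, eta=0.05):
        # Linear neuron: the output is the weighted sum of the inputs.
        output = sum(w * x for w, x in zip(weights, inputs))
        error = target - output
        # Each weight moves in the direction that reduces the error.
        return [w + eta * error * x for w, x in zip(weights, inputs)]
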
II. Classification

Neural networks can be categorized by topology, neuron model and
training algorithm. There are two main subdivisions of neural network
models: feed-forward and feedback topologies.

Feedback models can be constructed or trained. In a constructed model
the weight matrix is created by taking the outer product of every input
pattern vector with itself, or with an associated input, and adding up
all the outer products. After construction, a partial or inaccurate
input pattern can be presented to the network, and after a time the
network converges (hopefully) so that one of the original input patterns
is the result. Hopfield and BAM are two well-known constructed feedback
models.

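A sketch of the construction step for the autoassociative case, using
bipolar (+1/-1) pattern vectors. Zeroing the diagonal, so that no neuron
feeds back directly to itself, is a common convention assumed here
rather than something stated in the article.

    import numpy as np

    def construct_weights(patterns):
        # Sum of the outer products of each stored pattern with itself.
        n = len(patterns[0])
        W = np.zeros((n, n))
        for p in patterns:
            p = np.asarray(p, dtype=float)
            W += np.outer(p, p)
        np.fill_diagonal(W, 0.0)   # no self-connections
        return W
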
The Hopfield network is a self-organizing, associative memory. It is the
canonical feedback network. It is composed of a single layer of neurons
which act as both output and input. The neurons are symmetrically
connected (i.e., w_ij = w_ji). Hopfield networks are made of nonlinear
neurons capable of assuming two output values: -1 (off) and +1 (on). The
linear synaptic weights provide global communication of information. In
spite of its apparent simplicity, a Hopfield network has considerable
computational power.

The weight matrix is created by taking the outer product of each input
pattern vector with itself and adding up all the outer products. After
construction, a pattern is input to the network. A process of
reaction-stimulation-reaction between the neurons occurs until the
network settles down into a fixed pattern called a stable state. Thus,
the network result comes as a direct response to the input.

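The settling process can be sketched as repeated thresholding of the
weighted sums until the state stops changing. This uses the
construct_weights sketch above; synchronous updates are assumed for
brevity, although Hopfield's original network updates one neuron at a
time.

    import numpy as np

    def recall(W, probe, max_steps=20):
        # Start from a partial or noisy pattern and let the network settle.
        state = np.sign(np.asarray(probe, dtype=float))
        for _ in range(max_steps):
            new_state = np.sign(W @ state)
            new_state[new_state == 0] = 1.0      # break ties toward +1
            if np.array_equal(new_state, state): # stable state reached
                break
            state = new_state
        return state
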
The energy associated with each state of the network can be plotted in
three dimensions as a curved surface, and areas of minimum energy can be
found on it. The stable states, or energy minima, appear as valleys. A
neural network which is used to find "good enough" solutions to
optimization problems will have many energy minima, or valleys.
Depending upon the initial state of the network, any of the deepest
valleys may end up as the answer. Inputting incomplete information to an
associative memory network causes the network to follow paths to a
nearby energy minimum where the complete information is stored.

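For reference, the energy usually associated with a Hopfield state s is
E = -1/2 * sum over i and j of w_ij * s_i * s_j, and the settling
process only ever moves downhill on this surface. The formula is the
standard Hopfield energy rather than one quoted from the article.

    import numpy as np

    def energy(W, state):
        s = np.asarray(state, dtype=float)
        # E = -1/2 * s^T W s; lower energy means a more stable state.
        return -0.5 * s @ W @ s
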
Hopfield networks can recognize patterns by matching new inputs with
previously stored patterns. When an input pattern is applied, whichever
stored pattern is closest to it will be output. Hopfield networks are
especially good at finding the best answer out of many possibilities.
They are also good at recalling all of a stored item of information when
given only partial data. Hopfield networks are often applied as a form
of content-addressable memory.

Bart Kosko brought the Hopfield network to its logical conclusion with
the BAM. The BAM (bidirectional associative memory) is a generalization
of the Hopfield network. Instead of creating the weight matrix from the
outer product of a pattern with itself (autoassociation), pairs of
patterns are used (pair association). After construction of the weight
matrix, either pattern can be applied as input to elicit the other
pattern of the pair as output.

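A sketch of the pair-association idea: the weight matrix is built from
outer products of the second pattern of each pair with the first, and
recall runs through the matrix in one direction and through its
transpose in the other. Only a single pass in each direction is shown;
the full BAM cycles back and forth until the pair stabilizes. Variable
names are illustrative.

    import numpy as np

    def bam_weights(pairs):
        # pairs is a list of (a, b) bipolar pattern vectors.
        return sum(np.outer(np.asarray(b, float), np.asarray(a, float))
                   for a, b in pairs)

    def bam_recall_forward(W, a):
        return np.sign(W @ np.asarray(a, float))      # a -> b

    def bam_recall_backward(W, b):
        return np.sign(W.T @ np.asarray(b, float))    # b -> a
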
A trained feedback model is much more complicated, because adjustment of
the weights affects the signals both as they move forward and as they
feed back to previous neuron inputs. The Adaptive Resonance Theory (ART)
model is a complex trained feedback paradigm developed by Stephen
Grossberg and Gail Carpenter of the Center for Adaptive Systems at
Boston University.

ART neurons are functionally clustered into "nodes". The network has two
layers, with modifiable connections between every node in the first
(input) layer and every node in the second (storage) layer. There are
two sets of connections between the layers: one going from the input
layer to the storage layer, and the other going from the storage layer
back to the input layer. The storage layer also has lateral inhibition
connections. ART uses a unique unsupervised training method sometimes
called a Leader Clustering Algorithm. An input pattern is transmitted to
the storage layer through weighted connections. Because of the lateral
inhibition, the storage-layer activity will consist of exactly one node.
That output is sent back to the input layer over the other set of
weighted connections. If the activity pattern there matches the original
input pattern, the two are said to be in a resonant state: the single
storage-layer neuron, a "grandmother cell", has correctly classified the
input pattern.

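The flavor of this leader-clustering behavior can be sketched with a
simple vigilance test. This is only an illustration of the idea, not
Carpenter and Grossberg's ART equations; the match score and the way
prototypes are updated are simplified assumptions.

    import numpy as np

    def leader_cluster(patterns, vigilance=0.8):
        # Assign each binary pattern to the first stored prototype it
        # matches closely enough; otherwise create a new cluster node.
        prototypes = []
        for p in patterns:
            p = np.asarray(p, dtype=float)
            for k, proto in enumerate(prototypes):
                match = np.sum(np.minimum(p, proto)) / max(np.sum(p), 1.0)
                if match >= vigilance:            # resonance: pattern fits
                    prototypes[k] = np.minimum(p, proto)
                    break
            else:
                prototypes.append(p)              # no resonance: new node
        return prototypes
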
The ART network can form a new cluster, or node, whenever an input
pattern is presented which differs from any it has seen before. The
amount of difference the network is sensitive to can be controlled by
the "vigilance" parameter. In this mode of operation the network uses a
"global reset" signal which will turn off a node for some specified
time.

The second main category of neural networks is the feed-forward type.
The earliest neural network models were linear feed-forward. In 1972,
two simultaneous papers independently proposed the same model for an
associative memory, the linear associator. J. A. Anderson, a
neurophysiologist, and Teuvo Kohonen, an electrical engineer, were not
aware of each other's work.

The linear associator uses the simple Hebbian rule. With simple Hebbian
learning, association is perfect only when the input patterns are
orthogonal. This puts an upper limit on the number of patterns that can
be stored. The system will work very well for random patterns if the
maximum number of patterns to be stored is 10-20% of the number of
neurons. If the input patterns are not orthogonal, there will be
interference among them; fewer patterns can be stored and correctly
retrieved. One of the predictions of the linear associator is
interference between nonorthogonal patterns. Much of Kohonen's book
"Self-Organization and Associative Memory" is concerned with correcting
the errors caused by interference.

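The orthogonality condition is easy to see numerically: recall through a
Hebbian weight matrix reproduces a stored association exactly only when
the other stored input patterns contribute nothing to the probe. A small
sketch with illustrative patterns:

    import numpy as np

    def associator_weights(pairs):
        # Hebbian (outer-product) storage of input/output pattern pairs.
        return sum(np.outer(out, inp) for inp, out in pairs)

    x1 = np.array([1.0, 0.0, 0.0, 0.0])   # orthogonal inputs: clean recall
    x2 = np.array([0.0, 1.0, 0.0, 0.0])
    W = associator_weights([(x1, np.array([1.0, 0.0])),
                            (x2, np.array([0.0, 1.0]))])
    print(W @ x1)   # -> [1. 0.], no interference

    x3 = np.array([0.7, 0.7, 0.0, 0.0])   # overlaps x1 and x2
    print(W @ x3)   # -> a mixture of both stored outputs: interference
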
The nonlinear feed-forward models are the most commonly used today.
Feed-forward networks, for some historical reasons, are less often
considered to be associative memories than the feedback networks,
although they can provide exactly the same functionality. It can be
shown mathematically that any feedback network has an equivalent
feed-forward network which performs the same task.

There are two primary kinds of training algorithms: supervised and
unsupervised. Supervised learning is the most elementary form of
adaptation. It requires a priori knowledge of what the result should be:
the output neurons are told what the ideal response to the input signals
should be. For one-layer networks, in which the stimulus-response
relation can be controlled closely, this is easily accomplished by
monitoring each neuron individually. In multi-layer networks, supervised
learning is more difficult, because it is harder to correct the hidden
layers. Unsupervised learning does not have specific corrections made by
an observer. Supervised and unsupervised learning are mutually exclusive
methods.

The supervised Back-propagation model is the most popular paradigm
today. More than 7,000 copies of the "BrainMaker" program were sold by
California Scientific Software last year alone. Back-propagation is a
multi-layer feed-forward network that uses the Generalized Delta Rule.

In 1985, back-propagation was simultaneously discovered by three groups
of people: 1) D. E. Rumelhart, G. E. Hinton and R. J. Williams, 2) Y. Le
Cun, and 3) D. Parker. Back-propagation is the canonical feed-forward
network. It is a learning method in which an error signal is fed back
through the network, altering weights as it goes, in order to prevent
the same error from happening again.

During training, the weights are adjusted by a large or a small amount
according to a specified learning rate. The learning rate is a measure
of the speed of convergence from the initial weight pattern to the ideal
pattern. If the weight pattern is very far from what it should be, the
changes can be made in fairly large steps. As the patterns become close,
the changes must be made in fairly small steps, so that the network does
not overcorrect and become wrong in some other direction.

The error on an output neuron i, for a particular pattern p, is defined
as E_pi = (1/2)(T_pi - O_pi)^2, where T is the target output and O is
the actual output. The total error on pattern p, E_p, is the sum of the
errors on all the output neurons for pattern p. The total error, E, for
all patterns is the sum of the errors on each pattern over all p.

The simplest method for finding the minimum of E is known as gradient
descent. It involves moving a small step down the local gradient of the
scalar field. This is directly analogous to a skier always moving
downhill through the mountains until he reaches the bottom.

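Putting the error definition, the learning rate and gradient descent
together, a minimal two-layer back-propagation sketch might look like
the following. It assumes sigmoid neurons and a tiny XOR-style data set;
the layer sizes, learning rate and iteration count are illustrative
choices, not values from the article.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    rng = np.random.default_rng(1)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # input patterns
    T = np.array([[0.], [1.], [1.], [0.]])                  # target outputs

    W1 = rng.normal(scale=0.5, size=(2, 3)); b1 = np.zeros(3)  # input->hidden
    W2 = rng.normal(scale=0.5, size=(3, 1)); b2 = np.zeros(1)  # hidden->output
    eta = 0.5                                                  # learning rate

    for epoch in range(10000):
        # Forward pass through the hidden and output layers.
        H = sigmoid(X @ W1 + b1)
        O = sigmoid(H @ W2 + b2)

        # Error signals: the derivative of E = 1/2 * (T - O)^2 times the
        # sigmoid derivative, propagated backward layer by layer.
        delta_out = (O - T) * O * (1.0 - O)
        delta_hid = (delta_out @ W2.T) * H * (1.0 - H)

        # Gradient-descent weight updates, scaled by the learning rate.
        W2 -= eta * H.T @ delta_out;  b2 -= eta * delta_out.sum(axis=0)
        W1 -= eta * X.T @ delta_hid;  b1 -= eta * delta_hid.sum(axis=0)

    # The trained outputs should move toward 0, 1, 1, 0.
    print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
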
Back-propagation is useful because it provides a mathematical
explanation for the dynamics of the learning process. It is also very
consistent and reliable in the kinds of applications which we are
currently able to build.

A popular unsupervised feed-forward model is the Kohonen model. The
basic system is a one- or two-dimensional array of threshold-type logic
units with short-range lateral connections between neighboring neurons.
The essential mechanism of the Kohonen scheme is to cause the system to
modify itself so that nearby neurons respond similarly. The neurons
compete in a modified winner-take-all manner. The neuron whose weight
vector generates the largest dot product with the input vector is the
winner and is permitted to output. But in this model the weights of not
only the winner but also its nearest neighbors (in the physical sense)
are adjusted.

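A sketch of one training step for a one-dimensional Kohonen array: the
winner is the unit whose weight vector gives the largest dot product
with the input, and it and its immediate physical neighbors are moved
toward that input. The neighborhood radius and learning rate are
illustrative assumptions.

    import numpy as np

    def kohonen_step(W, x, eta=0.2, radius=1):
        # W[k] is the weight vector of unit k in a 1-D array of units.
        x = np.asarray(x, dtype=float)
        winner = int(np.argmax(W @ x))       # largest dot product wins
        lo = max(0, winner - radius)
        hi = min(len(W), winner + radius + 1)
        # Move the winner and its neighbors toward the input, so that
        # nearby units come to respond similarly.
        W[lo:hi] += eta * (x - W[lo:hi])
        return winner
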
A special case of the feed-forward model is the Neocognitron. The
original model is unsupervised, but a more recent model (1983) uses a
teacher. The multilayer (seven- or nine-layer) system assumes that the
builder of the network knows roughly what kind of result is wanted. All
the neurons are of analog type; the inputs and outputs take nonnegative
values proportional to the instantaneous firing frequencies of actual
biological neurons. In the original model, only the maximum-output
neurons have their input connections reinforced, using a variation of
the Hebbian Rule. After learning is completed, the final Neocognitron
system is capable of recognizing handwritten numerals presented in any
visual field location, even with considerable distortion.

III. Advantages and Disadvantages of Various Models

The biggest limiting factor with neural networks in general is the
maximum size of the network. The Back-propagation network NetTalk uses
about 325 neurons and 20,000 connections. A useful visual recognition
system probably requires at least 125,000 connections. We might hope
eventually to build neural networks which think as well as people do,
but this is a long way off: human brains contain about 100 billion
neurons, each of which connects to about 10,000 other neurons. Currently
available commercial systems provide anywhere from a few neurons and
connections to 1 million neurons and 1.5 million connections, for
anywhere from $200 to $25,000.

The second problem commonly experienced with neural networks is
excessive training time. As the number of neurons increases, the
training time increases cubically. Even though commercial models can
process at rates from 500,000 connections per second (CPS) on a PC to
2.5 billion CPS on a neural network chip, training can still take days
when enormous numbers of iterations are required.

Various network paradigms have their own specific problems. One of the
problems with Kohonen learning is that there is a possibility that a
neuron will never "win," or that one will almost always "win"; the
weight vectors get stuck in isolated regions.