Reservoir Riddles: Suggestions for Echo State Network Research.
Authors: Herbert Jaeger
Source: Proceedings of the International Joint Conference on Neural Networks, Montreal, Canada, July 31 - August 4, 2005
Echo state networks (ESNs) offer a simple learning algorithm for dynamical systems. It works by training linear readout neurons that combine the signals from a random, fixed, excitable “dynamical reservoir” network. Often the method works beautifully, sometimes it works poorly – and we do not really understand why. This contribution discusses phenomena related to poor learning performance and suggests research directions. The common theme is to understand the reservoir dynamics in terms of a dynamical representation of the task’s input signals.
Echo state networks (ESNs), as well as the closely related "liquid state machines" (LSMs) [1], present a recurrent neural network (RNN) learning architecture characterized by a large, randomly connected, recurrent "reservoir" network that is passively excited by the task's input signal, and by trainable readout neurons that compute the desired output by combining signals from the excited reservoir states.
Training an ESN on a supervised learning task boils down to computing the output weights. From a computational perspective this is just a linear regression, for which numerous batch and adaptive online algorithms are available. This simple method yields models that surpass other modelling methods in accuracy on many engineering tasks [2]. The ESN/LSM principle of combining a target signal from random, dynamic input variations may also be effective in biological brains [3], [4].
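To make the training step concrete, here is a minimal sketch in Python/NumPy. It is an illustrative toy, not the author's reference implementation; the reservoir size, spectral radius, delay task and all variable names are chosen purely for illustration. A random reservoir is driven by the input, and the collected states are regressed onto the teacher signal by ordinary least squares.

# Minimal echo state network sketch (illustrative toy, not a reference implementation).
import numpy as np

rng = np.random.default_rng(42)

n_in, n_res = 1, 200            # input and reservoir sizes (arbitrary choices)
washout = 100                   # initial states discarded as transient

# Random, fixed reservoir and input weights; rescale the reservoir matrix
# so its spectral radius is 0.9 (a common heuristic for the echo state property).
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-1.0, 1.0, (n_res, n_in))

def run_reservoir(u):
    """Drive the reservoir with an input sequence u of shape (T, n_in)."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in @ u_t)
        states.append(x.copy())
    return np.array(states)

# Toy task: reproduce the input delayed by 5 steps (a simple memory task).
T = 1000
u = np.sin(0.2 * np.arange(T)).reshape(-1, 1)
y = np.roll(u, 5, axis=0)

X = run_reservoir(u)[washout:]
Y = y[washout:]

# Output weights by linear regression (least squares); this is the whole "training".
W_out, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("training MSE:", np.mean((X @ W_out - Y) ** 2))

In practice, regularized (ridge) regression or state noise during training is often preferred over plain least squares when the state correlation matrix is poorly conditioned.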
It is intuitively clear that reservoir properties are of great importance for the learning performance.
A basic, necessary property is the echo state property: for the ESN learning principle to work, the reservoir must asymptotically forget its input history. An algebraic sufficient condition and a necessary condition on the reservoir weight matrix are known for the echo state property [5]. Furthermore, a number of heuristic tuning strategies for the three most important global control parameters (network size, spectral radius of the reservoir weight matrix, scaling of the input) have been described [6]. All in all, this body of knowledge renders ESNs applicable in daily practice.
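The two algebraic conditions can be checked directly on the weight matrix. The sketch below paraphrases them as they are commonly stated (sufficient: largest singular value below 1; necessary: spectral radius below 1); the function names echo_state_checks and rescale_to_spectral_radius are illustrative, not part of any published code.

# Checks on the reservoir weight matrix, paraphrasing the conditions cited from [5]:
# - sufficient: largest singular value of W below 1 guarantees the echo state property;
# - necessary: spectral radius of W below 1 (otherwise the property fails already for zero input).
import numpy as np

def echo_state_checks(W):
    spectral_radius = max(abs(np.linalg.eigvals(W)))
    largest_sv = np.linalg.norm(W, 2)     # largest singular value
    return {
        "necessary (spectral radius < 1)": bool(spectral_radius < 1),
        "sufficient (largest singular value < 1)": bool(largest_sv < 1),
        "spectral radius": float(spectral_radius),
        "largest singular value": float(largest_sv),
    }

def rescale_to_spectral_radius(W, target=0.9):
    """Rescale a random reservoir matrix to a desired spectral radius."""
    return W * (target / max(abs(np.linalg.eigvals(W))))

rng = np.random.default_rng(0)
W = rescale_to_spectral_radius(rng.uniform(-0.5, 0.5, (100, 100)), target=0.9)
print(echo_state_checks(W))

Note that a matrix rescaled to spectral radius 0.9 may still violate the sufficient condition (its largest singular value can exceed 1); in practice the spectral radius is nevertheless the parameter that is usually tuned.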
However, this state of the art is clearly immature. Here is a choice of unresolved issues:
Adding noise to the reservoir during training very much reduces the eigenvalue (EV) spread and improves stability in networks with output feedback, but it impairs model accuracy. It is not understood which types of tasks induce a large EV spread and why (see the sketch below).
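The following sketch illustrates the conditioning issue under the assumption that "EV spread" denotes the eigenvalue spread (ratio of largest to smallest eigenvalue) of the reservoir state correlation matrix, which governs how well-posed the output regression is; the data and the noise level are artificial and only serve to show that state noise shrinks the spread.

# Eigenvalue spread of a state correlation matrix, with and without state noise.
# Assumption: "EV spread" = lambda_max / lambda_min of X^T X, where X collects the
# reservoir states row-wise; here X is a synthetic stand-in with strongly correlated columns.
import numpy as np

def ev_spread(states):
    """Ratio of largest to smallest eigenvalue of the state correlation matrix."""
    eig = np.linalg.eigvalsh(states.T @ states)   # eigenvalues sorted ascending
    return eig[-1] / max(eig[0], 1e-12)

rng = np.random.default_rng(1)
base = rng.standard_normal((1000, 5))
X = base @ rng.standard_normal((5, 50)) + 1e-6 * rng.standard_normal((1000, 50))
X_noisy = X + 0.01 * rng.standard_normal(X.shape)   # noise injected into the states

print("EV spread without noise:", ev_spread(X))
print("EV spread with noise:   ", ev_spread(X_noisy))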
All of these difficulties point in the same direction: the connection between task specifics (dynamical properties of the input and output signal) and properties of the induced reservoir dynamics is not well understood.
Here is a list of research questions that in the author’s view mark the route to further progress:
All in all, my personal view at the moment is that ESNs/LSMs reveal a nice "readout and learn" trick, but the real wonders of learning and adaptation lie in the riddles of features and representations. The true value of ESNs/LSMs may lie not in the raw learning performance that currently amazes us, but in the novel means they give us to characterize and evaluate the quality of internal representations of dynamic sensor input. Namely, a representation is "good" if it enables fast, robust learning of desired output signals. The contribution of ESNs to this eternal question may be that they disentangle the representation (in the reservoir) from learning (of the output weights), which in previous RNN learning schemes were tied together.
1. W. Maass, T. Natschläger, and H. Markram, "Real-time computing without stable states: A new framework for neural computation based on perturbations," Neural Computation, vol. 14, no. 11, pp. 2531–2560, 2002. [Online]. Available: http://www.lsm.tugraz.at/papers/lsmnc-130.pdf
2. H. Jaeger and H. Haas, "Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication," Science, vol. 304, pp. 78–80, 2004.
3. G. B. Stanley, "Recursive stimulus reconstruction algorithms for real-time implementation in neural ensembles," Neurocomputing, vol. 38-40, pp. 1703–1704, 2001.
4. W. M. Kistler and C. I. De Zeeuw, "Time windows and reverberating loops: A reverse-engineering approach to cerebellar function," The Cerebellum, 2002, in press. [Online]. Available: http://www.eur.nl/fgg/neuro/research/Kistler/reverse engineering.pdf
5. H. Jaeger, "The 'echo state' approach to analysing and training recurrent neural networks," GMD Report 148, GMD - German National Research Institute for Computer Science, 2001. [Online]. Available: http://www.faculty.iubremen.de/hjaeger/pubs/EchoStatesTechRep.pdf
6. "Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the echo state network approach," GMD Report 159, Fraunhofer Institute AIS, 2002. [Online]. Available: http://www.faculty.iubremen.de/hjaeger/pubs/ESNTutorial.pdf
7. L. Feldkamp, D. Prokhorov, C. Eagen, and F. Yuan, "Enhanced multi-stream Kalman filter training for recurrent neural networks," in Nonlinear Modeling: Advanced Black-Box Techniques, J. Suykens and J. Vandewalle, Eds. Kluwer, 1998, pp. 29–54.
8. P.-G. Plöger, A. Arghir, T. Günther, and R. Hosseiny, "Echo state networks for mobile robot modeling and control," in RoboCup, 2003, pp. 157–168.
9. P. Berkes and L. Wiskott, "Slow feature analysis yields a rich repertoire of complex cell properties," Cognitive Sciences EPrint Archives (CogPrints), vol. 2804, 2003. [Online]. Available: http://cogprints.org/2804/