
Reservoir computing approaches to recurrent neural network training

Authors: Herbert Jaeger, Mantas Lukosevicius
Source: Elsevier, Computer Science Review 3 (2009), pp. 127-149

Annotation

Herbert Jaeger, Mantas Lukosevicius. Reservoir computing approaches to recurrent neural network training. The article presents the concept of reservoir computing: its structure, key features, and advantages over time-delay neural networks.

Abstract

Echo State Networks and Liquid State Machines introduced a new paradigm in artificial recurrent neural network (RNN) training, where an RNN (the reservoir) is generated randomly and only a readout is trained. The paradigm, becoming known as reservoir computing, greatly facilitated the practical application of RNNs and outperformed classical fully trained RNNs in many tasks. It has lately become a vivid research field with numerous extensions of the basic idea, including reservoir adaptation, thus broadening the initial paradigm to using different methods for training the reservoir and the readout. This review systematically surveys both current ways of generating/adapting the reservoirs and training different types of readouts. It offers a natural conceptual classification of the techniques, which transcends boundaries of the current brand names of reservoir methods, and thus aims to help in unifying the field and providing the reader with a detailed map of it.

Introduction

Artificial recurrent neural networks (RNNs) represent a large and varied class of computational models that are designed by more or less detailed analogy with biological brain modules. In an RNN numerous abstract neurons (also called units or processing elements) are interconnected by likewise abstracted synaptic connections (or links), which enable activations to propagate through the network. The characteristic feature of RNNs that distinguishes them from the more widely used feedforward neural networks is that the connection topology possesses cycles. The existence of cycles has a profound impact:

  • An RNN may develop a self-sustained temporal activation dynamics along its recurrent connection pathways, even in the absence of input. Mathematically, this renders an RNN to be a dynamical system, while feedforward networks are functions.
  • If driven by an input signal, an RNN preserves in its internal state a nonlinear transformation of the input history; in other words, it has a dynamical memory and is able to process temporal context information (see the state-update equation below).
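
    To make the two points above concrete, a common discrete-time formulation of such a driven network (a sketch in generic notation that is not spelled out in this excerpt; the symbols W^{in}, W, W^{out} and f are our own) is

        x(n+1) = f( W^{in} u(n+1) + W x(n) ),    y(n) = W^{out} x(n),

    where u(n) is the input, x(n) the internal state, y(n) the output, and f an element-wise sigmoid nonlinearity such as tanh. The recurrent term W x(n) is what carries the input history forward and gives the network its dynamical memory; a feedforward network lacks this term and computes its output as a memoryless function of the current input.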
    This review article concerns a particular subset of RNN-based research in two aspects:

  • RNNs are used for a variety of scientific purposes, and at least two major classes of RNN usage exist: they can be used for purposes of modeling biological brains, or as engineering tools for technical applications. The first usage belongs to the field of computational neuroscience, while the second frames RNNs in the realms of machine learning, the theory of computation, and nonlinear signal processing and control. While there are interesting connections between the two attitudes, this survey focuses on the latter, with occasional borrowings from the first.
  • From a dynamical systems perspective, there are two main classes of RNNs. Models from the first class are characterized by an energy-minimizing stochastic dynamics and symmetric connections. The best known instantiations are Hopfield networks, Boltzmann machines, and the recently emerging Deep Belief Networks. These networks are mostly trained in some unsupervised learning scheme. Typical targeted network functionalities in this field are associative memories, data compression, the unsupervised modeling of data distributions, and static pattern classification, where the model is run for multiple time steps per single input instance to reach some type of convergence or equilibrium. The mathematical background is rooted in statistical physics. In contrast, the second big class of RNN models typically features a deterministic update dynamics and directed connections. Systems from this class implement nonlinear filters, which transform an input time series into an output time series. The mathematical background here is nonlinear dynamical systems. The standard training mode is supervised. This survey is concerned only with RNNs of this second type, and when we speak of RNNs later on, we will exclusively refer to such systems.
    RNNs (of the second type) appear as highly promising and fascinating tools for nonlinear time series processing applications, mainly for two reasons. First, it can be shown that under fairly mild and general assumptions, such RNNs are universal approximators of dynamical systems. Second, biological brain modules almost universally exhibit recurrent connection pathways too. Both observations indicate that RNNs should potentially be powerful tools for engineering applications.

    Despite this widely acknowledged potential, and despite a number of successful academic and practical applications, the impact of RNNs in nonlinear modeling has remained limited for a long time. The main reason for this lies in the fact that RNNs are difficult to train by gradient-descent-based methods, which aim at iteratively reducing the training error. While a number of training algorithms have been proposed (a brief overview is given in Section 2.5), these all suffer from the following shortcomings:

  • The gradual change of network parameters during learning drives the network dynamics through bifurcations. At such points, the gradient information degenerates and may become ill-defined. As a consequence, convergence cannot be guaranteed.
  • A single parameter update can be computationally expensive, and many update cycles may be necessary. This results in long training times, and renders RNN training feasible only for relatively small networks (in the order of tens of units).
  • It is intrinsically hard to learn dependences requiring long-range memory, because the necessary gradient information exponentially dissolves over time (but see the Long Short-Term Memory networks for a possible escape); a short sketch of this decay is given after this list.
  • Advanced training algorithms are mathematically involved and need to be parameterized by a number of global control parameters, which are not easily optimized. As a result, such algorithms need substantial skill and experience to be successfully applied.
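
    To sketch the exponential decay mentioned in the third point (our own illustration, not worked out in this excerpt), note that backpropagation through time expresses the sensitivity of the state x(n) to the state k steps earlier as a product of k Jacobians of the update map. If each Jacobian norm is bounded by some \lambda < 1, the gradient contribution of events k steps in the past shrinks at least geometrically:

        \frac{\partial x(n)}{\partial x(n-k)} = \prod_{i=0}^{k-1} \frac{\partial x(n-i)}{\partial x(n-i-1)}, \qquad \left\| \frac{\partial x(n)}{\partial x(n-k)} \right\| \le \lambda^{k},

    so gradient information about long-range dependences effectively vanishes (conversely, Jacobian norms above 1 make gradients explode).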
    In this situation of slow and difficult progress, in 2001 a fundamentally new approach to RNN design and training was proposed independently by Wolfgang Maass under the name of Liquid State Machines and by Herbert Jaeger under the name of Echo State Networks. This approach, which had predecessors in computational neuroscience and subsequent ramifications in machine learning as the Backpropagation-Decorrelation learning rule, is now increasingly often collectively referred to as Reservoir Computing (RC). The RC paradigm avoids the shortcomings of gradient-descent RNN training listed above, by setting up RNNs in the following way:

  • A recurrent neural network is randomly created and remains unchanged during training. This RNN is called the reservoir. It is passively excited by the input signal and maintains in its state a nonlinear transformation of the input history.
  • The desired output signal is generated as a linear combination of the neurons' signals from the input-excited reservoir. This linear combination is obtained by linear regression, using the teacher signal as a target (a minimal code sketch of this recipe is given after Fig. 1 below).
    Fig. 1 graphically contrasts previous methods of RNN training with the RC approach.

    Figure 1 – Previous methods of RNN training contrasted with the reservoir computing (RC) approach

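    As a concrete illustration of this recipe, the following minimal sketch (our own, not taken from the article; the reservoir size, the weight scalings, the toy sine task and the ridge parameter are arbitrary choices) creates a random reservoir, drives it with an input sequence, and fits only the linear readout by ridge regression:

        import numpy as np

        rng = np.random.default_rng(42)

        # Reservoir: created at random and left untouched during training.
        n_in, n_res = 1, 200
        W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))   # input weights
        W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))     # recurrent weights
        W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # scale spectral radius to 0.9

        def run_reservoir(u):
            """Collect reservoir states x(n+1) = tanh(W_in u(n+1) + W x(n))."""
            x = np.zeros(n_res)
            states = np.empty((len(u), n_res))
            for n, u_n in enumerate(u):
                x = np.tanh(W_in @ np.atleast_1d(u_n) + W @ x)
                states[n] = x
            return states

        # Toy task: predict a sine wave one step ahead.
        t = np.arange(2000)
        u = np.sin(0.2 * t)                # input signal
        y_teach = np.sin(0.2 * (t + 1))    # teacher signal

        X = run_reservoir(u)

        # Readout: linear regression (ridge/Tikhonov form) from states to teacher,
        # discarding an initial washout where the state still depends on x(0).
        washout, ridge = 100, 1e-8
        Xw, Yw = X[washout:], y_teach[washout:]
        W_out = np.linalg.solve(Xw.T @ Xw + ridge * np.eye(n_res), Xw.T @ Yw)

        print("train MSE:", np.mean((X[washout:] @ W_out - Yw) ** 2))

    Only W_out is learned here; the randomly generated W_in and W stay fixed, exactly as the two bullet points describe. Scaling the spectral radius of W below one is a common (though not in general necessary) heuristic for obtaining the fading-memory behaviour the reservoir is supposed to provide.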

    These encouraging observations should not mask the fact that RC is still in its infancy, and significant further improvements and extensions are desirable. Specifically, simply creating a reservoir at random is unsatisfactory. It seems obvious that, when addressing a specific modeling task, a specific reservoir design that is adapted to the task will lead to better results than a naive random creation. Thus, the mainstream of research in the field is today directed at understanding the effects of reservoir characteristics on task performance, and at developing suitable reservoir design and adaptation methods. Also, new ways of reading out from the reservoirs, including combining them into larger structures, are devised and investigated. While shifting from the initial idea of having a fixed randomly created reservoir and training only the readout, the current paradigm of reservoir computing remains (and differentiates itself from other RNN training approaches) as producing/training the reservoir and the readout separately and differently.

    This review offers a conceptual classification and a comprehensive survey of this research.

    As is true for many areas of machine learning, methods in reservoir computing converge from different fields and come with different names. We would like to make a distinction here between these differently named tradition lines, which we like to call brands, and the actual finer-grained ideas on producing good reservoirs, which we will call recipes. Since recipes can be useful and mixed across different brands, this review focuses on classifying and surveying them. To be fair, it has to be said that the authors of this survey associate themselves mostly with the Echo State Networks brand, and thus, willingly or not, are influenced by its mindset.

    References

    1. John J. Hopfield, Hopfield network, Scholarpedia 2 (5) (2007) 1977.
    2. John J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences of the United States of America 79 (1982) 2554-2558.
    3. Geoffrey E. Hinton, Boltzmann machine, Scholarpedia 2 (5) (2007) 1668.
    4. David H. Ackley, Geoffrey E. Hinton, Terrence J. Sejnowski, A learning algorithm for Boltzmann machines, Cognitive Science 9 (1985) 147-169.
    5. Geoffrey E. Hinton, Ruslan Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504-507.
    6. Graham W. Taylor, Geoffrey E. Hinton, Sam Roweis, Modeling human motion using binary latent variables, in: Advances in Neural Information Processing Systems 19, NIPS 2006, MIT Press, Cambridge, MA, 2007, pp. 1345-1352.
    7. Ken-ichi Funahashi, Yuichi Nakamura, Approximation of dynamical systems by continuous time recurrent neural networks, Neural Networks 6 (1993) 801-806.
    8. Kenji Doya, Bifurcations in the learning of recurrent neural networks, in: Proceedings of IEEE International Symposium on Circuits and Systems 1992, vol. 6, 1992, pp. 2777-2780.