Graphical Models

The input to the problem of learning dynamical systems is a sequence of observations. The observations might correspond to stock prices in the case of financial forecasting. In the case of medical prediction and diagnosis, the observations might consist blood pressure readings, white blood cell counts, and other periodically observed findings.

There exists some underlying dynamical system, say, the stock market or the human body, but, for most practical purposes, it isn't necessary that we construct a model for the entire system. It is generally enough to predict particular aspects of the system, say the future price of a stock, the direction of change in an index such as the Dow Jones Average, the NASDAQ, or the S&P 500, or the onset of trauma in a medical forecasting problem.

The dynamical system determines how those aspects of the dynamical system that we wish to predict change over time. The dynamical system also determines what the learner is able to observe at any given time. The important question for learning a particular dynamical system is whether it is possible to go from observations to a model that will support prediction. The answer is obviously no in some cases.

A related issue concerns whether what we observe and what we want to predict overlap. In the case of predicting stock prices, you can observe (after it's too late for the observations to be of any use) what it is you ultimately hope to predict. In some cases, however, such observations are not possible. For example, suppose that you could predict whether General Motors stock would increase or decrease in value if you knew which side of the bed the CEO got out of on a particular morning. Unfortunately, you are not privy to the CEO's bedroom so you can only assess this information indirectly. The side of the bed that the CEO exits on is called a hidden variable.

Often, we are not interested in assessing the value of a hidden variable directly but rather the effect of a hidden variable on other (often observable) variables. For instance, it may be useful to build a predictive model for assessing the exit side of the bed using other observations, for example, whether the CEO went to bed early or late, ate a heavy or light meal before retiring, etc. You might wonder how knowing about the existence of a hidden variable can make a difference if you can never observe the value of the variable and have no particular need to know its value. Knowing that a hidden variable exists can often help to expedite the search for a dynamical model by focusing search on simpler models with fewer parameters.

Graphical Models for Dynamical Systems

There are numerous articles on graphical models in general and graphical models for learning in particular. If you need an introduction to Bayesian networks see the excerpt from Chapter 8 of ``Artificial Intelligence: Theory and Practice,'' by Tom Dean, James Allen and Yiannis Aloimonos, (Benjamin/Cummings Publishing Company, 1995). See Sections 1 through 5 from ``Decision-Theoretic Planning and Markov Decision Processes'' by Tom Dean for an introduction to representing dynamical systems as Bayesian networks, and Sections 1 through 3 of ``Operations for Learning with Graphical Models'' by Wray Buntine for a description of representing learning problems using graphical models.

In the sequel, we assume some familiarity with graphical models and Bayesian networks. The following graph structure describes a general dynamical model. The variables are grouped into sets of hidden and observable variables. For simplicity, we assume that any information in observables is accounted for in the hidden variables so that the observations at time t are independent of the state at t - 1 given the state at time t and the state at time t is independent of the observation at t - 1 given the state at t - 1.

              t - 1     t

                o ----> o       Hidden variables
                |       |
        --------------------------
                |       |
                v       v       
                o ----> o       Observable variables

Now suppose that we want to predict (exactly or approximately) the observation at time t given a sequence of observations at times 0 through t' where 0 < t' < t. Under what circumstances is this possible? There are no simple answers to this question, or rather there are no simple answers that are particularly revealing. Work on delay-coordinate embedding identifies one class of dynamical systems that can be learned. The theory or Markov processes provides another class of learnable dynamical systems. Suppose that there exists some constant k such that for all t

        Pr(y_t|y_{t-1},...,y_{t-k}) = Pr(y_t|y_{t-1},...,y_0)

where the y_t denotes the observation at time t. In this case, we say the process governing the observations is k-Markov and it is theoretically straightforward to construct a model for this process. The results concerning delay coordinate embedding have a similar flavor in that the m-dimensional (delay) vector of observables (y_t,y_{t-1},...,y_{t-m-1}) serves as a proxy for the state of the underlying dynamical system if the system is predictable.

Discrete Sampling of Continuous Dynamical Systems

It should be noted that while the underlying dynamical system may be continuous we are generally only able to sample it discretely. A detailed study of the connections between continuous systems and their sampled counterparts would take us into signals and systems theory. The mini tutorial on the Fourier transform includes a quick introduction to sampling and analysis in the frequency domain which is accessible to computer scientists. This tutorial won't make you an expert by any means but it should enable you to plow through some of the papers that make use of concepts from signals and systems.

In several of the suggested readings and in parts of subsequent discussions, it is assumed that you know basic ideas and terminology from statistical inference. If you feel you need a review of these ideas and terms, please read the on-line