There exists some underlying dynamical system, say, the stock market or the human body, but, for most practical purposes, it isn't necessary that we construct a model for the entire system. It is generally enough to predict particular aspects of the system, say the future price of a stock, the direction of change in an index such as the Dow Jones Average, the NASDAQ, or the S&P 500, or the onset of trauma in a medical forecasting problem.
The dynamical system determines how those aspects of the dynamical system that we wish to predict change over time. The dynamical system also determines what the learner is able to observe at any given time. The important question for learning a particular dynamical system is whether it is possible to go from observations to a model that will support prediction. The answer is obviously no in some cases.
A related issue concerns whether what we observe and what we want to predict overlap. In the case of predicting stock prices, you can observe (after it's too late for the observations to be of any use) what it is you ultimately hope to predict. In some cases, however, such observations are not possible. For example, suppose that you could predict whether General Motors stock would increase or decrease in value if you knew which side of the bed the CEO got out of on a particular morning. Unfortunately, you are not privy to the CEO's bedroom so you can only assess this information indirectly. The side of the bed that the CEO exits on is called a hidden variable.
Often, we are not interested in assessing the value of a hidden variable directly but rather the effect of a hidden variable on other (often observable) variables. For instance, it may be useful to build a predictive model for assessing the exit side of the bed using other observations, for example, whether the CEO went to bed early or late, ate a heavy or light meal before retiring, etc. You might wonder how knowing about the existence of a hidden variable can make a difference if you can never observe the value of the variable and have no particular need to know its value. Knowing that a hidden variable exists can often help to expedite the search for a dynamical model by focusing search on simpler models with fewer parameters.
In the sequel, we assume some familiarity with graphical models and Bayesian networks. The following graph structure describes a general dynamical model. The variables are grouped into sets of hidden and observable variables. For simplicity, we assume that any information in observables is accounted for in the hidden variables so that the observations at time t are independent of the state at t - 1 given the state at time t and the state at time t is independent of the observation at t - 1 given the state at t - 1.
t - 1 t o ----> o Hidden variables | | -------------------------- | | v v o ----> o Observable variables
Now suppose that we want to predict (exactly or approximately) the observation at time t given a sequence of observations at times 0 through t' where 0 < t' < t. Under what circumstances is this possible? There are no simple answers to this question, or rather there are no simple answers that are particularly revealing. Work on delay-coordinate embedding identifies one class of dynamical systems that can be learned. The theory or Markov processes provides another class of learnable dynamical systems. Suppose that there exists some constant k such that for all t
Pr(y_t|y_{t-1},...,y_{t-k}) = Pr(y_t|y_{t-1},...,y_0)
where the y_t denotes the observation at time t. In this case, we say the process governing the observations is k-Markov and it is theoretically straightforward to construct a model for this process. The results concerning delay coordinate embedding have a similar flavor in that the m-dimensional (delay) vector of observables (y_t,y_{t-1},...,y_{t-m-1}) serves as a proxy for the state of the underlying dynamical system if the system is predictable.
In several of the suggested readings and in parts of subsequent discussions,
it is assumed that you know basic ideas and terminology from statistical
inference. If you feel you need a review of these ideas and terms, please read
the on-line