The Review of Problems Solvable by Algorithms of the Group Method of Data Handling (GMDH) 

Published in Pattern Recognition and Image Analysis, Vol. 5, No. 4, 1995, pp.527-535 

A.G.Ivakhnenko and G.A.Ivakhnenko 

Glushkov Institute of Cybernetics, National Academy of Sciences of Ukraine,
pr. Akad. Glushkova 40, Kyiv-127, Ukraine
gai@poryat.freenet.viaduk.net

Abstract: This paper describes the use of algorithms of the Group Method of Data 
Handling (GMDH) in solving various problems of experimental data processing. A 
spectrum of parametric (polynomial) algorithms and of nonparametric algorithms 
using clusterings or analogues was developed. The choice of an algorithm for 
practical use depends on the type of the problem, the level of noise variance, 
sufficiency of sampling, and on whether the sample contains only continuous 
variables. Basic problems solvable by the GMDH are listed, including: 

· identification of physical laws, 

· approximation of multidimensional processes, 

· short-term stepwise forecasting of processes and events, 

· long-term stepwise forecasting, 

· extrapolation of physical fields, 

· clustering of data samples and search for a physical clustering that 
corresponds to the physical model of an object, 

· pattern recognition in the case of continuous or discrete variables, 

· diagnostics and recognition using probabilistic sorting algorithms, 

· self-organization of multilayered neural nets with active neurons, 

· normative vector forecasting of processes, 

· process forecasting without models using analogues complexing. 

The main result of the GMDH theory is that, for inaccurate noisy data and short 
samples, the criterion minimum indicates a nonphysical model (decision rule) 
whose accuracy is higher and structure is simpler than those of a complete 
physical model. Sorting of a set of candidate models by external criteria is 
required only for non-physical models. For low-variance noise, we recommend 
deductive algorithms that use ordinary internal sorting criteria. As the noise 
variance increases, we have to turn to nonparametric algorithms that use 
clustering and search for analogues in previous history and to adopt evolution 
modeling for process forecasting. 

The neural net with active neurons is regarded as a means of improving the accuracy beyond the limits attainable by individual algorithms and of increasing the lead time of long-term forecasts through expansion of the regression domain (the procedure of expansion and narrowing of the complete set of variables in the layers of the neural net).

1. Introduction 

Solving practical problems and developing theoretical questions of the Group 
Method of Data Handling (GMDH) produced a broad spectrum of computing algorithms, 
each designed for a specific application [1-6]. The choice of an algorithm 
depends both on the accuracy and completeness of information presented in the 
sample of experimental data and on the type of the problem to be solved. The 
purpose of this review is to demonstrate GMDH algorithms for different 
applications. 

We will first describe basic features of the considered algorithms. 

1.1. Polynomial Support Function 

This method involves sorting, that is, successive testing of models selected from a set of candidate models according to a specified criterion. Nearly all known GMDH algorithms use polynomial support functions. The general connection between input and output variables can be expressed by a functional Volterra series, whose discrete analogue is known as the Kolmogorov-Gabor polynomial [6]:

y = a_0 + \sum_{i=1}^{M} a_i x_i + \sum_{i=1}^{M} \sum_{j=1}^{M} a_{ij} x_i x_j + \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{k=1}^{M} a_{ijk} x_i x_j x_k + \ldots,

where X = (x_1, x_2, \ldots, x_M) is the vector of input variables and A = (a_0, a_1, \ldots) is the vector of the summand coefficients.

Other support functions are also used, e.g., harmonic or logistic ones.

The complexity of the model structure is estimated by the number of polynomial 
terms used. The sorting procedure consists in computing the criterion as the 
structure of the model gradually changes. 
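As a rough illustration (not part of the original paper), the following Python sketch enumerates candidate terms of a truncated Kolmogorov-Gabor polynomial; the number of retained terms is exactly the complexity measure mentioned above. The helper name and the NumPy-based construction are assumptions of this sketch.

    # Sketch: enumerate candidate Kolmogorov-Gabor polynomial terms up to a given degree.
    from itertools import combinations_with_replacement
    import numpy as np

    def polynomial_terms(X, degree=2):
        """Columns are 1, x_i, x_i*x_j, ... up to 'degree'; their count is the model complexity."""
        n, m = X.shape
        columns, names = [np.ones(n)], ["1"]                  # constant term a0
        for d in range(1, degree + 1):
            for idx in combinations_with_replacement(range(m), d):
                columns.append(np.prod(X[:, list(idx)], axis=1))   # product x_i * x_j * ...
                names.append("*".join(f"x{i}" for i in idx))
        return np.column_stack(columns), names

    X = np.random.rand(20, 3)                                 # 20 observations, 3 input variables
    A, names = polynomial_terms(X, degree=2)
    print(len(names), "candidate terms:", names)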

1.2. Sorting of Models by Groups of Equal Structure 

Being an iterative method, GMDH is close to the best-regression method but 
differs from it by efficient organization of the search for the optimum 
structure of the model and, in addition, by the utilization of both internal and 
special external sorting criteria. Models are sorted by groups or series of 
equal structural complexity and the best model is found for each series 
according to the criterion. It was proved theoretically that, for noisy data and 
a short sample, the minimum of mathematical expectation of the external 
criterion is unique [1]. 

The uniqueness of the minimum also remains for sufficiently large groups of 
models, which is used for the choice of the optimum model. At the same time, if 
we start estimating models one by one, it becomes virtually impossible to find 
an optimum one. 

1.3. External and Internal Criteria 

Recall that a criterion is called internal if it is computed on the entire data sample, whereas an external criterion is computed from new information that was not used for estimating the coefficients of the model. For example, to compute the criterion of regularity, we rank the points (rows) of the sample according to their variance; every third point is included in a test subsample used to select the structure of the model, and the remaining points are used for estimating the coefficients of the models.
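A minimal Python sketch of this regularity computation (a reader's illustration, not the authors' code; least-squares fitting and the normalization are assumptions): rows are ranked by variance, every third row forms the test subsample, the rest estimate the coefficients, and the criterion is the normalized error on the test rows.

    import numpy as np

    def regularity_criterion(A, y):
        """A: design matrix of one candidate model; y: output vector."""
        order = np.argsort(A.var(axis=1))        # rank points (rows) by variance
        test = order[2::3]                       # every third point -> test subsample
        train = np.setdiff1d(order, test)        # remaining points estimate the coefficients
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        resid = y[test] - A[test] @ coef         # error on points not used for fitting
        return np.sum(resid ** 2) / np.sum(y[test] ** 2)

    A = np.column_stack([np.ones(30), np.random.rand(30, 2)])
    y = A @ np.array([1.0, 2.0, -1.0]) + 0.05 * np.random.randn(30)
    print(regularity_criterion(A, y))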

1.4. Physical and Nonphysical Models 

The GMDH algorithms are close in structure to self-training algorithms of 
multilayered pattern recognition systems, such as perceptrons or neural nets [6]. 
An important difference is that polynomial GMDH algorithms deal with continuous 
variables. The discrete nature of the output variable of a perceptron, which indicates the membership of a given image in a particular pattern, makes it impossible to use a finer allowance for the inaccuracy of recognition when selecting the structure of the perceptron. No nonphysical models can be obtained after discretization or clustering. Only continuous variables allow one to find the minimum of the external criterion, which determines the optimum structure of nonphysical models.

The physical model corresponds to the notion of mathematical description adopted 
in mathematical physics. An approximation of a physical model by polynomials or in the language of clusterings is also sometimes called the physical model of the object. The physical model is unique for each object and for each language used to describe it.

The main result of the GMDH theory is that, for inaccurate noisy data and short 
samples, the criterion minimum indicates the so-called nonphysical model whose 
accuracy is higher and structure is simpler than those of a complete physical 
model [1]. Nonphysical models can only be produced by the GMDH. 

The structure of the nonphysical model becomes simpler as the variance of noise 
increases. An increase in the length of sample is equivalent to a decrease of 
noise. The structure of the nonphysical model becomes closer to the physical 
model as the sample grows. Thus, many nonphysical models may exist for a given 
object, which depends on the variance of noise and the length of the sample. 
Nonphysical models may be obtained not only by eliminating some members of a 
physical model but also at random so that a deeper minimum of the external 
criterion can be obtained [7]. 
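The following toy Python experiment (our illustration, with an arbitrary even/odd split and noise level) shows the effect just described: on a short, heavily noisy sample, the external criterion frequently selects a subset of the true regressors, i.e., a nonphysical model simpler than the complete physical one.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(0)
    n = 12                                                  # short sample
    x = rng.uniform(-1, 1, size=(n, 3))
    y = 2.0 * x[:, 0] + 0.3 * x[:, 1] - 0.2 * x[:, 2] + 0.8 * rng.normal(size=n)   # law + heavy noise

    train, test = np.arange(0, n, 2), np.arange(1, n, 2)    # simple external split (assumption)

    def external(cols):
        A = np.column_stack([np.ones(n)] + [x[:, j] for j in cols])
        c, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        return np.sum((y[test] - A[test] @ c) ** 2)

    subsets = [c for k in range(1, 4) for c in combinations(range(3), k)]
    best = min(subsets, key=external)
    print("best subset:", best, " full-model criterion:", external((0, 1, 2)))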

1.5. Deductive and Inductive GMDH Algorithms 

Self-organization of models can be defined as their construction with the 
required a priori information being reduced as much as possible. In particular, 
the number of instructions of the author of modeling is minimized. In deductive 
algorithms called algorithms of the GMDH type [5], internal accuracy criteria 
are applied for model sorting, the result of the computation being used just 
once for selecting the best model of each iterative series. The number of the 
series is specified by an expert or by the author of modeling. Still the volume 
of the required a priori information is relatively small and so we can speak 
about self-organization of models by both inductive and deductive GMDH 
algorithms. 

In contrast to that, models are sorted in inductive GMDH algorithms according to 
external accuracy criteria, the criterion computation results being used twice, 
for selecting best models in each series and for an objective choice of the 
number of iterative series. The most accurate optimum nonphysical model 
corresponds to the minimum of the external criterion. 

Laws described by differential equations are identified in the form of 
difference analogues, i.e., in the form of algebraic polynomials containing 
retarded arguments. 

1.5.1. The Combinatorial GMDH Algorithm. The main Combinatorial GMDH algorithm 
has a multilayered iterative structure. Its specific feature is that the 
iteration rule (particular description) does not remain constant but expands 
with each new series. In the first series, all models of the simplest structure,

y = a_0 + a_1 x_i,    i = 1, 2, \ldots, M,

are subject to sorting, and the F models that are best by the criterion are selected.

In the second series, models of a more complex structure are sorted. These models are constructed from the best models of the first series:

y = a_0 + a_1 x_i + a_2 x_j,    i, j = 1, 2, \ldots, M;    F \le M.

In the third series, the sorting involves even more complex structures of the form

y = a_0 + a_1 x_i + a_2 x_j + a_3 x_k,    i, j, k = 1, 2, \ldots, M,

and so on. The build-up of the series continues as long as the criterion minimum decreases. For the largest value of the freedom of choice, F = M, the algorithm ensures a complete sorting of all models of the polynomial form.
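A condensed Python sketch of this sorting scheme (hypothetical function names; least-squares fitting and a simple even/odd external split are assumptions): each series expands the surviving structures by one argument, F best models survive, and the build-up stops when the criterion minimum stops decreasing.

    import numpy as np

    def fit_external(cols, X, y, train, test):
        A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
        c, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        return np.sum((y[test] - A[test] @ c) ** 2)

    def combinatorial_gmdh(X, y, F=5):
        n, M = X.shape
        train, test = np.arange(0, n, 2), np.arange(1, n, 2)
        models = sorted((fit_external((j,), X, y, train, test), (j,)) for j in range(M))
        best_so_far = models[0][0]
        while True:
            survivors = models[:F]                               # freedom of choice F
            candidates = {tuple(sorted(set(cols) | {j}))         # expand each survivor by one argument
                          for _, cols in survivors for j in range(M) if j not in cols}
            models = sorted((fit_external(c, X, y, train, test), c) for c in candidates)
            if not models or models[0][0] >= best_so_far:        # stop when the minimum stops decreasing
                return survivors[0]
            best_so_far = models[0][0]

    X = np.random.rand(40, 6)
    y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * np.random.randn(40)
    print(combinatorial_gmdh(X, y))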

1.5.2. The Iterative Multilayered GMDH Algorithm. In the Multilayered algorithm, the iteration rule (particular description) remains the same for all series. For example, a particular description of the form y = f(x_i, x_j) (a short polynomial in two arguments) is used in the first series, a particular description z = f(y_i, y_j) in the second series, a particular description w = f(z_i, z_j) in the third series, and so on. In other words, the output values of a previous series serve as arguments in the next series. With this method of iteration, some models may be missed, which may result in the so-called multilayerness error. The convergence of this model self-organization procedure to the results produced by regression analysis with the same sorting criterion still needs to be investigated.
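For comparison, a Python sketch of the multilayered iteration in the same setting (the linear pairwise partial description and the stopping rule are assumptions of this illustration): the F best outputs of one layer become the arguments of the next, and layers are added while the external criterion improves.

    import numpy as np

    def layer(Z, y, train, test, F=4):
        n, m = Z.shape
        scored = []
        for i in range(m):
            for j in range(i + 1, m):
                A = np.column_stack([np.ones(n), Z[:, i], Z[:, j]])   # fixed partial description
                c, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
                out = A @ c
                scored.append((np.sum((y[test] - out[test]) ** 2), out))
        scored.sort(key=lambda t: t[0])
        best = scored[:F]
        return best[0][0], np.column_stack([o for _, o in best])      # outputs feed the next layer

    rng = np.random.default_rng(1)
    X = rng.random((30, 5))
    y = 2 * X[:, 0] - X[:, 4] + 0.1 * rng.normal(size=30)
    train, test = np.arange(0, 30, 2), np.arange(1, 30, 2)
    crit, Z = np.inf, X
    while True:                                      # add layers while the criterion improves
        new_crit, new_Z = layer(Z, y, train, test)
        if new_crit >= crit:
            break
        crit, Z = new_crit, new_Z
    print("best external criterion:", crit)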

1.5.3. The Algorithm of Objective Systems Analysis (OSA). The algorithm involves sorting of sets of equations rather than individual equations. The sets are obtained by means of implicit difference templates written for one variable (M = 1), for two variables (M = 2), and so on. Each step of the template movement along the sample calls for the solution of a set of linear equations. The result is estimated according to the convolution of the criteria calculated for the individual equations.

2. The Problem of Identifying Physical Laws 

A polynomial linear in coefficients and approximating the dependence of the 
output value on several input argument variables is to be found so as to obtain 
the minimum of the defined accuracy criterion. This polynomial may be the sum of 
simple nonlinear functions. The original information is specified in the sample 
of data produced by observing the operation of an object. 

The traditional approach to the problem consists in sorting sets of candidate 
models to select one model that is the best by the criterion. As noted above, it 
is appropriate to sort models by groups of equal structure, which makes the 
minimum of the criterion unique in many cases. 

Where the minimum is not clearly expressed, an auxiliary procedure of extending the definition of the minimum can be used: the lower part of the sorting characteristic is approximated by a second-order parabola, and the coordinate of the minimum of the parabola is determined.
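A small Python sketch of this extended definition of the minimum (the criterion values are illustrative, and NumPy's polyfit is an assumed tool):

    import numpy as np

    complexity = np.array([1, 2, 3, 4, 5, 6], dtype=float)      # number of polynomial terms
    criterion = np.array([0.9, 0.55, 0.38, 0.36, 0.40, 0.52])   # illustrative sorting characteristic

    lower = np.argsort(criterion)[:4]                # points near the bottom of the characteristic
    a, b, c = np.polyfit(complexity[lower], criterion[lower], 2)   # second-order parabola
    print("extended-definition optimum complexity ~", round(-b / (2 * a), 2))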

The model corresponding to the minimum is the desired nonphysical model, whose accuracy is estimated by the criterion of the forecasting error variation [8]:

d^2 = \frac{\sum_i (x_i - \hat{x}_i)^2}{\sum_i (x_i - \bar{x})^2},

where x_i is the variable value given in the table, \hat{x}_i is the value calculated according to the model, and \bar{x} is the mean value.

With a low noise variance, 0 < d^2 < 0.1, and with long samples, we can apply a deductive method, i.e., select the best physical models by an internal criterion in each group and entrust the stopping of the iteration to an expert. When the noise is greater, special methods of searching for the physical model should be adopted. For very large noise, typical of fuzzy objects, self-organization of a physical model should be replaced by an algorithm of searching for a physical clustering of the data sample. These recommendations cover the solution of the problem of finding and identifying laws.

2.1. Special Methods of Searching for a Physical Model to Identify Laws for 
Noisy Data and Short Samples 

To find out how an object functions, we need to know its physical model. The 
algorithms described below, which are based on the criterion of balance of 
variables, are designed to find such a model. 

The criterion requires selecting the structure of the model such that it remains 
optimum when new observations of the object become available. To compute the 
criterion of balance, we divide the sample into two equal parts: subsample A and 
subsample B. The Combinatorial GMDH algorithm yields a series of models that 
gradually become more complicated in each subsample. The physical model can be 
found by the criterion of similarity or balance, e.g., in the form

B = \sum_{i=1}^{N} (\hat{y}_i^A - \hat{y}_i^B)^2 \to \min,

where \hat{y}^A is the output value of the model found in subsample A and \hat{y}^B is that of the model found in subsample B.
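A minimal Python sketch of this balance computation (hypothetical names; least-squares fitting and the normalization are assumptions): the same model structure is fitted on subsamples A and B, and the disagreement of their outputs over the whole sample is the criterion.

    import numpy as np

    def balance_criterion(cols, X, y):
        n = len(y)
        A_rows, B_rows = np.arange(0, n // 2), np.arange(n // 2, n)   # two equal parts
        D = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        cA, *_ = np.linalg.lstsq(D[A_rows], y[A_rows], rcond=None)    # model found on subsample A
        cB, *_ = np.linalg.lstsq(D[B_rows], y[B_rows], rcond=None)    # the same structure found on B
        return np.sum((D @ cA - D @ cB) ** 2) / np.sum(y ** 2)

    X = np.random.rand(40, 4)
    y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + 0.05 * np.random.randn(40)
    print(balance_criterion((0, 2, 3), X, y))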

With a low noise variance, the criterion of balance is multivalued, i.e., many 
models give its zero value. Special methods were developed to achieve uniqueness 
(regularization) of the choice, including: 

(1) model sorting that uses a combined criterion equal to the sum of the 
criterion of balance and the criterion of regularity taken with a small weight, 

(2) model sorting by the criterion of balance of variables to select an optimum 
model with another regularization method, e.g., by superimposing normal noise on 
the original data sample or by reducing the number of the sample rows used, 

(3) discretization of the variables defined in the sample into a small number of levels using internal criteria; the discretization filters the noise, whereupon the criterion minimum indicates only the physical model,

(4) extrapolation of the locus of minima (LM) to its intersection with the abscissa axis of the plane in which the criterion is plotted against the complexity of the model structure; as the noise variance and the length of the sample (the number of points considered) change, the criterion minimum moves along the LM, which allows one to extrapolate the LM and thus to define the structure of the physical model.

3. The Problem of Approximation of Multidimensional Processes 

It is required to find, by examination of variants, the multidimensional polynomial function of time that approximates the function represented in the data sample.

With a low noise variance and a long sample, the deductive method of self-organization 
of a physical model can be used to solve the problem. Where the variance is 
considerable, the inductive method with the search for a nonphysical model is 
called for. For a greater variance, we must abandon polynomial models and turn 
from parametric polynomial models to nonparametric clusterings and the search 
for analogues in previous history [9]. These recommendations cover solutions of 
approximation and process forecasting problems. 

4. The Problem of Short-Term Stepwise Process Forecasting 

The original information is the same as in the process approximation problem: a data sample is defined and the accuracy criterion is specified. The latter may be internal (for low noise) or external (for considerable noise). The noise variance can be estimated from the variation criterion d^2 of the forecast error of the output variable defined above; the error can be calculated only after the forecast has been made.

The main difference between the solutions of approximation and forecasting 
problems lies in the criterion computing method. To solve the approximation 
problem, we compute the error at each current moment. To select a forecasting 
model, we compute the error on the forecast with a lead time of one step ahead, 
i.e., for short-term forecasting. The Combinatorial GMDH algorithm with 
allowance for retarded arguments is recommended for the selection of the set of 
effective regressors (arguments) and for self-organization of the difference 
forecasting model. A full polynomial containing all arguments and their retarded 
values measured with a delay of one or two steps is set. 

The regression domain should be extended in order to improve the accuracy of 
forecasts: current and retarded values of other variables correlated with the 
one to be predicted may be introduced into the original full polynomial. It is 
also advisable that the computer be offered pair products (covariances) of 
arguments for selection. The algorithm will leave in the nonphysical forecasting 
model only the summands that ensure the deepest criterion minimum. 
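A Python sketch of the preparatory step described in this section (the helper name and the two-lag window are assumptions): current and retarded values of the target and of a correlated variable, together with their pair products, are offered to the selection algorithm as candidate regressors.

    import numpy as np

    def lagged_design(series, lags=2):
        """series: dict name -> 1-d array; returns target y(t) and candidate regressors."""
        names, cols = [], []
        T = min(len(v) for v in series.values())
        for name, v in series.items():
            for lag in range(1, lags + 1):
                names.append(f"{name}(t-{lag})")
                cols.append(v[lags - lag:T - lag])       # retarded values aligned with y(t)
        X = np.column_stack(cols)
        m = X.shape[1]
        for i in range(m):                               # pair products (covariances) offered as well
            for j in range(i, m):
                names.append(f"{names[i]}*{names[j]}")
                X = np.column_stack([X, X[:, i] * X[:, j]])
        return series["y"][lags:T], X, names

    t = np.arange(60)
    data = {"y": np.sin(0.3 * t), "x": np.cos(0.3 * t)}
    y, X, names = lagged_design(data)
    print(X.shape, "candidate regressors for", len(y), "points")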

Where the data sample is too great, it is recommended to count time by several 
scales. For example, to forecast the water level in a river, the level was 
measured by averaging over months and years. A separate model was obtained for 
each month. Then, the models were used in turn for stepwise forecasting one year 
ahead [10]. The use of several time scales is an efficient method of processing 
large data arrays. 

5. The Problem of Long-Term Stepwise Process Forecasting 

A forecast is referred to as long-term if its lead time equals or exceeds 10 variable-measuring intervals (steps) of the data sample. The forecasting error variation criterion should remain less than one over the entire forecast. The long-term forecast is produced by repeating the one-step short-term forecast many times. However, the accuracy diminishes with each step, so the problem arises of increasing the lead time of forecasting.

In addition, there is a danger of self-excited oscillations starting from a 
certain step of the forecast for many structures of difference forecasting 
models. Stability of difference schemes (called templates) was studied in detail 
in discrete mathematics. We mention here only two ways of increasing the stability:

(1) use of two time-reading scales with transfer to implicit templates and 

(2) introduction of feedback that excludes a certain row in the sample after 
each forecasting step [11]. 

To describe the first method, we use again the problem of forecasting the river flow that was measured on average by months and years [10]. The sample fills a rectangle that contains 12 × N values (M = 12). The Combinatorial GMDH algorithm yields at once 12 monthly difference models. The original full equations correspond to L-shaped implicit templates whose short arms are directed inside the rectangle.

For example, the full equation for the January model is written on such a template in terms of the lead times Δt of a month and ΔT of a year, where k and s are the discrete values of the number of years and months, respectively; the full equation for the December model is written analogously. The best practice is to include in the templates only the elements of the rectangle lying along the orthogonal axes.

The second method that may be adopted to ensure the stability is applicable only 
to long-term forecasts of steady-state processes, which may include, in many 
cases, a trend of deviations of the predicted value from the median [2]. This 
method uses the concept of maintaining the distribution of discrete values of 
the variable to be forecasted. According to the concept, all discrete values 
should repeat themselves in the same number of cases over the forecasting 
interval equal in length to the original data sample. The stepwise forecasting is carried out as usual, but a discrete value once selected for the forecast is skipped in the subsequent steps; the discrete value that is next best by the criterion is selected instead. The algorithm was used to forecast the average annual flow of the Volga river with its level discretized into 10 values [11]. In forecasting multidimensional processes, values of several variables are given in each node of the template to increase the stability of the long-term stepwise forecast.
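A loose Python sketch of this distribution-maintaining idea (the toy one-step model and the level grid are assumptions): every discrete level may be used only as many times as it occurred in the original sample; once a level is selected it is skipped in later steps and the nearest remaining level is taken instead.

    import numpy as np
    from collections import Counter

    def stepwise_forecast(history, one_step_model, horizon, levels):
        # how many times each discrete level occurred in the original sample
        pool = Counter(int(np.argmin(np.abs(levels - v))) for v in history)
        state, out = list(history[-2:]), []
        for _ in range(horizon):
            proposal = one_step_model(state)                 # raw one-step forecast
            admissible = [k for k, cnt in pool.items() if cnt > 0]
            k = min(admissible, key=lambda k: abs(levels[k] - proposal))   # nearest remaining level
            pool[k] -= 1                                     # once selected, skipped in later steps
            out.append(levels[k])
            state = [state[-1], levels[k]]
        return out

    levels = np.linspace(0, 1, 10)                           # variable discretized into 10 values
    history = 0.5 + 0.4 * np.sin(0.5 * np.arange(40))
    model = lambda s: 2 * s[-1] - s[-2]                      # toy linear difference model
    print(stepwise_forecast(history, model, horizon=10, levels=levels))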

An alternative to the use of difference equations for long-term forecasting is a 
Harmonic GMDH algorithm [6]. Its applications are associated with unidimensional 
oscillatory processes. The oscillation of the processes can be estimated by 
computing the integral of the harmonic intensity [12]. 

6. The Problem of Extrapolation of Physical Fields 

The physical field is represented as a rectangular net of measurements where 
some nodes contain values of one or more output variables correlated with each 
other. The problem is to find the values of these variables in the nodes where 
they are unknown. 

The problem of near extrapolation to one cell of the net is similar to that of 
short-term forecasting one time step ahead. The problem can be solved by the 
same algorithm, a nonphysical model yielding the most accurate extrapolation for 
inaccurate measurements. Distant extrapolation is similar to long-term forecasting. Self-excited oscillations are also possible depending on the choice of the difference scheme, i.e., the template. In [13], we described a program for selecting a cross-like template for the extrapolation of a two-dimensional productivity field of an oil-bearing horizon.

7. The Problem of Clustering of a Data Sample and of Search for Physical 
Clustering to Correspond to the Physical Model of an Object 

Clustering of a data sample means its division into clusters whose number and 
the points of the sample they contain are selected so that each cluster includes 
a compact group of closely situated sample points. Each point is present in only 
one cluster. Clustering can be regarded as a discrete description of an object. 
We can find the discrete analogue of the physical model of the object its 
physical clustering among all possible clusterings [14-16]. 

Clustering variants are sorted by an algorithm of Objective Computer 
Clusterization (OCC) [6]. Construction of a hierarchical tree of clusterings 
orders and reduces the sorting while the clustering optimum according to a 
criterion is not lost [17]. Physical clustering can be found by the criterion of 
balance of clusterings. To compute the criterion, the sample is divided into two 
equal parts. A clustering tree is constructed on each subsample and the 
criterion of balance is computed at each step with the same number of clusters. 
The criterion requires finding the clustering for which both the number and the coordinates of the centers (middle points) of corresponding clusters coincide:

B = \sum_{k=1}^{K} \sum_{j=1}^{M} (x_{0A}^{kj} - x_{0B}^{kj})^2 \to \min,

where K is the number of clusters at a given step of the tree construction, M is the number of coordinates, x_{0A} are the coordinates of the centers of the clusters constructed within part A, and x_{0B} are the coordinates of the centers of the clusters constructed within part B.
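A Python sketch of this comparison (a tiny k-means and lexicographic matching of centers are assumptions of the illustration, not part of the OCC algorithm itself): the two halves of the sample are clustered separately with the same K, and the squared differences of corresponding center coordinates are summed.

    import numpy as np

    def cluster_centers(X, K, iters=20, seed=0):
        """A tiny k-means used here only to obtain K cluster centers."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), K, replace=False)]
        for _ in range(iters):
            labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
            centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                                for k in range(K)])
        return centers

    def balance_of_clusterings(X, K):
        A, B = X[: len(X) // 2], X[len(X) // 2:]              # two equal parts of the sample
        cA = sorted(map(tuple, cluster_centers(A, K)))        # crude matching of corresponding clusters
        cB = sorted(map(tuple, cluster_centers(B, K)))
        return sum((a - b) ** 2 for ca, cb in zip(cA, cB) for a, b in zip(ca, cb))

    X = np.random.rand(60, 2)
    for K in (2, 3, 4, 5):
        print(K, "clusters -> balance criterion", round(balance_of_clusterings(X, K), 4))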

The physical clustering can be recommended for process forecasting where the noise is very high (d^2 > 0.8) and neither physical nor nonphysical polynomial models are accurate enough. Clustering makes it possible to replace the problem of
forecasting multidimensional processes by that of one-dimensional forecasting of 
cluster numbers following one another. Algorithms of evolutionary modeling are 
quite suitable for this forecasting [18,19]. 

8. The Pattern Recognition Problem 

The GMDH algorithms can be directly used to solve a problem of pattern 
recognition if we turn from the discrete formulation of the problem to the 
continuous function of the membership measure of each image in a particular 
pattern. The advantages of nonphysical models and of decision rules can be 
utilized only for continuous variables. 

Two approaches to the problem of pattern recognition are well known: a 
statistical approach and a deterministic approach [20]. The GMDH sorting methods 
carry out both of them. The statistical approach makes use of the Multilayer 
Theory of Statistical Decisions (MTSD) for the distribution of the probability 
of pairwise random events [16]. 

The deterministic approach involves the hypothesis of compactness. The criteria 
of the minimum of empirical risk are computed for a learning sample (the 
internal criterion) or for a separate verification sample (the external 
criterion). The problem of recognition of new images is the most urgent for the 
recognition theory. 

The statistical approach to pattern recognition solves the problem of selecting 
the decision rule under the condition of error minimization both for the 
learning sample and for the sample of new images, while it is a priori assumed 
that the learning sample is sufficient and the noise variance is low. Thus, to 
minimize the error on new data we should know something about them. The 
deterministic approach also allows one to minimize the error on new data but it 
requires different a priori information. Availability of the information 
determines the choice of an approach [21]. 

8.1. Pattern Recognition for Continuous Variables 

In this case, a priori information about the noise variance can be taken into 
account. Polynomial GMDH algorithms can be used to find the most accurate 
simplified decision rule corresponding to the nonphysical model for each level 
of variance as well as for large variation of the length of the image learning 
sample. The rule will be optimum for each level of noise variance, i.e., for 
each state of the recognition object. The main advantage of nonphysical decision 
rules is that, when obtained for a sample with a certain noise variance, they 
also remain optimum for the next sample if the noise level does not change. 

Continuous variables carry information about noise, which is lost in 
discretization of variables into a small number of levels. Removing thresholds 
to return to continuous measurement of variables and considering continuous 
functions of the membership measure of each image in a particular pattern allow 
one to utilize this advantage. 

8.2. Pattern Recognition for Discrete Variables 

For this problem a priori information about the availability of physical 
clustering both for the learning sample and for the new data sample can be 
considered. The output variable indicating the class is usually discretized and, 
thus, an accurate nonphysical decision rule cannot be found by GMDH algorithms. 
We have to use a less accurate physical decision rule (model) that is not 
simplified as noise variance increases. This rule can be found by many methods, 
e.g., by sorting variants according to the criterion of balance of clusterings [21]. 

This method is convenient when a perceptron or a neural-net recognition system 
is used to solve the problem of image classification. The flow of images 
arriving at the input of the perceptron can be divided into a number of samples 
containing equal numbers of images. A hierarchical tree of clusterings is 
constructed for each sample. By comparing step by step clusterings of two 
neighboring trees during the construction, according to the criterion of balance 
of clusterings, we can find the general or physical clustering. Classification 
of images is possible only if the physical clustering is available for all image 
samples. For self-organization of the perceptron designed for the classification 
of images, it is preferable that the number of internal A-elements be selected 
equal to the number of clusters and the coefficients of connections with S-elements 
be assumed to be equal to the coordinates of the middle points (centers) of the 
clusters. This adjustment minimizes the recognition error for all samples, 
including the sample of new data. 

8.3. Recognition and Diagnostics on the Basis of the Multilayered Theory of 
Statistical Decisions 

As noted above, the construction of sorting GMDH algorithms may use different support functions, including polynomial, harmonic, or logistic ones. The Bayes formulas or the Wald theory of statistical decisions are also applied.[1]

In developing the algorithm, we found that computations of the a posteriori probability can be reduced dramatically if a single random event is taken to be the achievement of two specified levels by two variables rather than the achievement of one level by one variable [16]. The probabilistic sorting GMDH algorithms with such dual random events are used for medical diagnosis and for evaluating the treatment prescribed by a doctor.

8.4. Neural Nets with Passive Binary or Logistic Neurons (Perceptrons) 

Passive neurons do not include a process of optimizing their own set of input variables; such optimization emerges only in the complex self-organization of the entire system of many neurons as a whole. As a result, the polynomial decision rule is replaced by a piecewise linear one.

Perceptrons involve no structural optimization according to the external criterion and yield no optimum nonphysical models; only parametric optimization by backpropagation is carried out. Perceptrons are almost as good as GMDH algorithms in
accuracy if the learning sample is long enough, noise variance is low, and the 
sample contains variables discretized into a small number of levels, i.e., in 
the cases where the physical model is optimum. The GMDH algorithms gain the 
advantage in accuracy for short samples of continuous noisy variables, i.e., 
when the nonphysical model is optimum. The advantages of perceptrons and of 
polynomial GMDH algorithms are combined in neural nets with active neurons [22]. 

9. The Problem of Self-Organization of Multilayered Neural Nets With Active 
Neurons 

The GMDH algorithms described above produce a high accuracy of solving the 
problems in the specified regression domain. The domain can be extended by 
computing additional regressors. The extension of a set of regressors by 
computing pair products (covariances) or inverse values of variables is a well-known 
practice. A problem arises here of selecting simple nonlinear transformations of 
input variables to get new, generalized features. Computations have shown that 
the extension of a set of variables by introducing output variables found by 
GMDH algorithms can be very effective [22,23]. 

To implement this idea, we form a neural net, which is a collection of active 
neurons, each following a GMDH algorithm. The number of neurons in each layer of 
the net equals the number of variables defined in the sample. Each neuron 
forecasts one of the variables. Output variables in each layer are added to the 
set of variables in the next layer. The build-up of the layers continues until 
it ceases to improve the accuracy of approximation or that of the forecasting of 
the output variable we are interested in. 

One variable may require several layers while another may need no net at all. 
Considering the limited capabilities of computers, we extend the set of 
variables in each layer of the net and narrow it at the same time by discarding 
the variables that proved to be ineffective in the previous layers. Ordinary 
GMDH algorithms select a complete set only once. In the neural net, the 
selection takes place in each layer by introducing the extension-narrowing 
procedure for the set of variables. 
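A compressed Python sketch of the extension-narrowing idea (the simple regressor-selecting "active neuron" and the stopping rule are simplifying assumptions): each neuron selects its own regressors and forecasts one variable, its output joins the variable set of the next layer, and variables that were never selected are dropped.

    import numpy as np

    def active_neuron(X, y, train, test, keep=3):
        """Select the 'keep' best single regressors, refit jointly, return criterion, output, used columns."""
        scores = []
        for j in range(X.shape[1]):
            A = np.column_stack([np.ones(len(y)), X[:, j]])
            c, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
            scores.append((np.sum((y[test] - (A @ c)[test]) ** 2), j))
        best = sorted(scores)[:keep]
        A = np.column_stack([np.ones(len(y))] + [X[:, j] for _, j in best])
        c, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        out = A @ c
        return np.sum((y[test] - out[test]) ** 2), out, [j for _, j in best]

    rng = np.random.default_rng(2)
    X = rng.random((50, 4))
    targets = {"u": 2 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=50),
               "v": X[:, 1] + X[:, 2] ** 2 + 0.1 * rng.normal(size=50)}
    train, test = np.arange(0, 50, 2), np.arange(1, 50, 2)

    variables, best_crit = X, {k: np.inf for k in targets}
    for _ in range(3):                                   # build up layers while accuracy improves
        new_cols, useful = [], set()
        for k, y in targets.items():
            crit, out, used = active_neuron(variables, y, train, test)
            if crit < best_crit[k]:
                best_crit[k] = crit
                new_cols.append(out)                     # extension: neuron outputs join the set
                useful.update(used)                      # narrowing: remember the effective variables
        if not new_cols:
            break
        variables = np.column_stack([variables[:, sorted(useful)]] + new_cols)
    print(best_crit)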

The neural net is designed to solve the same problem that is solved by its 
neurons: if GMDH algorithms are used, the neural net solves the problem of 
approximation or forecasting. If perceptrons or other recognition systems are 
used as active neurons, the neural net as a whole solves the same problem of 
pattern recognition but does it more accurately than the neurons it consists of. 
The forecasting neural net is a means for increasing the lead time of the 
stepwise forecasts. This time is equal to the number of steps at which the variation criterion meets the condition d^2 < 1.0 [8].

Neural nets with active neurons are twice-multilayered since multilayered neurons are connected into a multilayered net. The construction of a neural net at each forecasting step increases the accuracy of each step and thus extends the lead time. Roughly,

T \approx \Delta t / d^2,

where T is the lead time, \Delta t is the variable computing step, and d^2 is the value of the forecast variation criterion.

The advantage of the neural net with active neurons over an ordinary neural net 
with binary neurons is that self-organization of the net is simplified as each 
neuron finds the required connections in the course of its self-organization by 
itself. Thus, one of the difficult problems in the theory of designing an 
artificial intelligence is solved. 

10. The Problem of Normative Process Forecasting by Simplified Linear 
Programming And Specifying GMDH Algorithm Limitations 

Specifying limitations of the optimization domain is the main difficulty in the 
practical application of linear programming. As described in [24], this problem 
can be resolved by GMDH algorithms. The result is a set of tools that can be 
recommended as a basic one for normative forecasting (“If-Then”) of processes 
and for computer control of open-loop objects [24,25]. Particularly promising is 
the algorithm of simplified linear programming where the number of controlled 
input variables is selected equal to the number of optimized output variables. 
There is no need to sort vertices of the optimization domain and to use the 
simplex method since the optimum mode is unique. 
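A minimal Python sketch of this simplified step (the coefficient matrix and free terms are arbitrary illustrative numbers, not identified from data): when the number of controls equals the number of optimized outputs, the required control follows from one square linear system, with no vertex sorting or simplex search.

    import numpy as np

    B = np.array([[0.8, -0.1],                   # assumed linear relations y = B u + c
                  [0.2,  0.6]])                  # (in practice identified by GMDH)
    c = np.array([1.0, -0.5])

    y_target = np.array([3.0, 1.5])              # the normative ("If") values of the outputs
    u = np.linalg.solve(B, y_target - c)         # the unique control setting ("Then")
    print("required controls:", u, " check:", B @ u + c)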

10.1. Normative Vector Forecasting of Results of Physical Experiments 

The problem is to find values of a set of variables after an experiment when 
values of the set of variables measured before the experiment are known. The 
multidimensional function required to solve this problem is defined as a sample 
of observations of a small number of experiments [23]. The sample is processed 
following the algorithm of simplified linear programming with specification of 
limitations by the GMDH. 

10.2. Normative Vector Forecasting for a Macroeconomic System [25] 

The variables that characterize the system are divided into three sets as 
follows: 

(a) controllable variables that may be set (taxes, discount rates, deposit rates, 
issue of money, etc.), 

(b) variables that should be optimized (national income, consumption of 
resources, etc.), and 

(c) external disturbances (world oil prices, etc.). 

According to the law of necessary variety, the number of input variables (a) 
should be no less than the number of output variables (b). The law is satisfied 
when the simplified linear programming algorithm is applied [23,25]. The 
technique involves a number of attempts, or scenarios. If too favorable forecast results are specified, impracticable values of the output variables are obtained. The best combination of controlling variables found in previous history is then taken as an optimum example.

10.3. The Problem of Supporting Decisions in Systems of Automatic Control of 
Open-Loop Objects 

The optimization interval is defined as a length of time wherein values of 
controlled output variables (resources) are defined and for which optimum values 
of controlling actions should be found. For macroeconomic systems, the 
optimization interval may be a quarter of a year or a year. If the period of 
natural oscillations of a system is many times greater than the interval 
assigned for optimization, it is useful to consider only the problem of the 
object behavior within this interval, i.e., to regard the object as an open-loop 
one or, to be more exact, a quasi-open-loop one. The problem of control is 
reduced to computing the vector of controlling variables one or more steps ahead. 

To control open-loop objects, we can apply algorithms of mathematical 
programming using procedures of stepwise forecasting. Computerized automatic 
control of open-loop objects includes two stages. The first stage calls for a 
short-term forecast of external disturbances one time step ahead while the 
second stage involves computation of optimum values of controlling variables by 
the simplified algorithm of linear programming using the GMDH to specify the 
limitations. The control should be multidimensional or vector-type and should 
satisfy the law of necessary variety, i.e., the number of controlling variables 
should be equal to or not less than the number of variables to be optimized. A 
long-term stepwise (or harmonic) forecast of disturbances is required to 
optimize operation of an object for a long time to come. 

11. Forecasting of Processes Without Models Using Analogues Complexing 

The state of a complex object can be described by a vector of characteristic 
variables represented in the sample of observation data. To make the forecast, 
it is sufficient to find one or more close analogues of its state in previous 
history and to see what happened next. The model of the object is no longer 
needed since its role is played by the object itself. The forecasting according 
to analogues from previous history was developed in meteorology for weather 
forecasting. The modified analogue complexing algorithm involves sorting 
optimization of discrete values of the following parameters: (a) variants of the 
set of input variables, (b) the number of moments of time considered in the 
analogues, (c) the number of analogues to be complexed, and (d) values of weight 
coefficients with which the analogues are complexed, i.e., summed up [26]. 
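A Python sketch of the complexing step (window length, number of analogues, and inverse-distance weights are arbitrary choices here; in the algorithm above they are themselves subject to sorting): the states in previous history closest to the current one are found, and their known continuations are summed with weights.

    import numpy as np

    def analogue_forecast(series, window=3, n_analogues=2, horizon=5):
        current = series[-window:]
        starts = range(len(series) - window - horizon)          # windows whose continuation is known
        dists = [(np.sum((series[s:s + window] - current) ** 2), s) for s in starts]
        best = sorted(dists)[:n_analogues]                      # the closest analogues
        weights = np.array([1.0 / (d + 1e-9) for d, _ in best])
        weights /= weights.sum()
        continuations = np.array([series[s + window:s + window + horizon] for _, s in best])
        return weights @ continuations                          # complexing = weighted summation

    t = np.arange(120)
    series = np.sin(0.2 * t) + 0.05 * np.random.randn(120)
    print(analogue_forecast(series))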

The interconnected parameters (b) and (c) have to be sorted in the plane of two 
parameters, which increases the number of computations. The sorting of the 
forecasting algorithm parameters is done once for objects of each new type. 

References 

1. Ivakhnenko, A.G. and Stepashko, V.S., Pomekhoustoichivost' Modelirovaniya (Noise 
Immunity of Modeling), Kiev: Naukova Dumka, 1985. 

2. Müller, J.A. and Ivakhnenko, A.G., Selbstorganisation von Vorhersagemodellen, Berlin: VEB Technik, 1984.

3. Ivakhnenko, A.G. and Yurachkovskij, Yu.P., Modelirovanie Slozhnykh Sistem po 
Eksperimental'nym Dannym (Modeling of Complex Systems from Experimental Data), 
Moscow: Radio i Svyaz’, 1987. 

4. Ivakhnenko, A.G., Nepreryvnost' i Diskretnost' (Continuity and Discreteness), 
Kiev: Naukova Dumka, 1990. 

5. Self-Organizing Methods in Modeling, Statistics: Textbooks and Monographs, 
Farlow, S.J., Ed., New York: Marcel Dekker Inc., 1984, vol. 54. 

6. Madala, H.R. and Ivakhnenko, A.G., Inductive Learning Algorithms for Complex Systems Modeling, Boca Raton: CRC Press, 1994.

7. Sawaragi, Y., Soeda, T., Tamura, H., et al., Statistical Prediction of Air Pollution Levels Using Non-Physical Models, Automatica (IFAC), 1979, vol. 15, no. 4, pp. 441-452.

8. Belogurov, V.P., A Criterion of Model Suitability for Forecasting Quantitative Processes, Sov. J. Automation and Inf. Sci., 1990, vol. 23, no. 3, pp. 21-25.

9. Ivakhnenko, A.G. and Müller, J.A., Parametric and Non-parametric Selection Procedures in Experimental Systems Analysis, Systems Analysis, Modeling, and Simulation, 1992, vol. 9, pp. 157-175.

10. Ivakhnenko, A.G., Dolgosrochnoe Prognozirovanie i Upravlenie Slozhnymi 
Sistemami (Long-Term Forecasting and Control of Complex Systems), Kiev: Tekhnika, 
1975. 

11. Ivakhnenko, A.G. and Osipenko, V.V., Algorithms of Transformation of Probability Characteristics into Deterministic Forecast, Sov. J. Automation and Inf. Sci., 1982, vol. 15, no. 2, pp. 7-15.

12. Kendall, M., Vremennye Ryady (Time Series), Moscow: Finansy i Statistika, 1981.

13. Ivakhnenko, A.G., Peka, P.Yu., and Vostrov, N.N., Kombinirovannyj Metod Modelirovaniya Vodnykh i Neftyanykh Polei (Combined Method of Modeling Water and Oil Fields), Kiev: Naukova Dumka, 1984.

14. Ivakhnenko, A.G. and Müller, J.A., Problems of Computer Clustering of the Data Sampling of Objects under Study, Sov. J. Automation and Inf. Sci., 1991, vol. 24, no. 1, pp. 58-67.

15. Ivakhnenko, A.G., Ivakhnenko, G.A., and Müller, J.A., Self-Organization of Optimum Physical Clustering of the Data Sample for Weakened Description and Forecasting of Fuzzy Objects, Pattern Recognition and Image Analysis, 1993, vol. 3, no. 4, pp. 415-422.

16. Ivakhnenko, A.G., Petukhova, S.A., et al., Objective Choice of Optimal Clustering of Data Sampling under Nonrobust Random Disturbances Compensation, Sov. J. Automation and Inf. Sci., 1993, vol. 26, no. 4, pp. 58-65.

17. Zholnarskij, A.A., Agglomerative Cluster Analysis Procedures for 
Multidimensional Objects: A Test for Convergence. Pattern Recognition and Image 
Analysis, 1992, vol. 2, no. 4, pp. 388-390. 

18. Fogel, L., Owens, A., and Walsh, M., Iskusstvennyi Intellekt i Evolyutsionnoe Modelirovanie (Artificial Intelligence and Evolution Modeling), Moscow: Nauka, 1969.

19. Bukatova, I.L., Mikhasov, Yu.I., and Sharov, A.M., Evoinformatika: Teoriya i Praktika Evolyutsionnogo Modelirovaniya (Evoinformatics: Theory and Practice of Evolution Modeling), Moscow: Nauka, 1991.

20. Vapnik, V.N. and Chervonenkis, A.Ya., Raspoznavanie Obrazov, Statisticheskie Osnovy Teorii (Pattern Recognition: Statistical Basis of the Theory), Moscow: Nauka, 1974.

21. Ivakhnenko, A.G. and Ivakhnenko, G.A., Perceptron Synthesis according to 
Clusterization-Balance Criterion. Pattern Recognition and Image Analysis, 1995, 
vol. 5, no. 3, pp. 337-341. 

22. Ivakhnenko, A.G., Ivakhnenko, G.A., and M?ller, J.A., Self-Organization of 
Neural Nets with Active Neurons, Pattern Recognition and Image Analysis, 1994, 
vol. 4, no. 2. pp. 185-196. 

23. Ivakhnenko, G.A., Self-Organization of Neuronet with Active Neurons for Effects of Nuclear Test Explosions Forecastings, Systems Analysis Modeling Simulation (SAMS), 1995, vol. 20, pp. 107-116.

24. Triseev, Yu.P., Approaches to the Solution of Mathematical Programming Problems on the Basis of Heuristic Self-Organization, Sov. J. Automation and Inf. Sci., 1987, vol. 20, no. 2, pp. 30-37.

25. Ivakhnenko, A.G. and Ivakhnenko, G.A., Simplified Linear Programming Algorithm as Basic Tool for Open-Loop Control, Systems Analysis, Modeling, and Simulation (SAMS), 1995, vol. 18-19, pp. 315-319.

26. Ivakhnenko, A.G., An Inductive Sorting Method for the Forecasting of 
Multidimensional Random Processes and Events with the Help of Analogs Forecast 
Complexing, Pattern Recognition and Image Analysis, 1991, vol. 1, no.1, pp.99-108. 

Aleksey G. Ivakhnenko. Born 1913. Corresponding Member of the National Academy of Sciences of Ukraine. Professor, Doctor of Technical Sciences. Graduated from the Leningrad Institute of Electric Engineering in 1938. In 1943 he became a candidate of electrical sciences at the Physics Institute of Moscow University, and in 1944 he joined the Ukrainian Academy of Sciences in Kiev. He wrote his doctoral dissertation on invariance theory in 1953. In 1944 he joined the Kiev Polytechnic Institute as Professor in the Department of Electrical Engineering. From 1964 he was Head of the Combined Control Systems Department of the Glushkov Institute of Cybernetics of the Academy of Sciences of Ukraine, and since 1995 he has been Advisor to the Director of this organization. He was chief editor of the journal Avtomatika (Soviet Journal of Automation and Information Sciences). He is the author of about 15 monographs and over 300 research papers, specializing in the modeling and pattern recognition of complex systems. He has supervised about 220 candidate dissertations, 27 of whose authors went on to defend doctoral degrees. Author of the Group Method of Data Handling (GMDH), which is widely used in modeling. Scientific interests: inductive sorting methods of modeling for forecasting random processes in fuzzy systems of ecology, biology, medicine, and economics.

Gregory A. Ivakhnenko. Born 1966. Graduated from the National Technical 
University (KPI) in 1989. Researcher at the Glushkov Institute of Cybernetics. 
Scientific interests: modeling of complex objects, pattern recognition, and 
diagnostic methods of artificial intelligence. Author of 20 papers. 


[1] Recall the difference: the former involve multiplication of specific probabilities of events, whereas the latter perform summation of these probabilities. The formulas are used in each sorting series according to the MTSD.
