Ivakhnenko A. G., Muller J. "Recent Developments of Self-Organising Modeling in Prediction and Analysis of Stock Market"
Source: http://www.gmdh.net/articles/index.html
Recent Developments of Self-Organising Modeling in Prediction and Analysis of Stock Market
Ivakhnenko, A.G.
Glushkov Institute of Cybernetics, Ukraine, Kyiv 34, PO Box 298-9,
e-mail: Gai@gmdh.kiev.ua http://come.to/GMDH
Muller, J.-A.
Fachbereich Informatik/Mathematik, Hochschule für Technik und Wirtschaft
D 01069 Dresden, F.-List-Platz 1, Germany, e-mail: Muellerj@informatik.htw-dresden.de
Review
Abstract: At present, GMDH algorithms give the only way to obtain accurate
identification and forecasts of various complex processes when the input sample
is noisy and short. In distinction to neural networks, the results are explicit
mathematical models, obtained in a relatively short time. For ill-defined
objects with very large noise, better results can be obtained by analogues
complexing methods. Neural nets with active neurons should be applied to raise
the accuracy of algorithms for modelling complex objects.
1. Introduction
Problems of complex object modelling (function approximation and extrapolation,
identification, pattern recognition, forecasting of random processes and events)
can be solved in general by deductive logical-mathematical methods or by inductive
sorting-out methods. Deductive methods have advantages for rather simple
modelling problems, when the theory of the object being modelled is known and it
is therefore possible to develop a model from physically based principles,
employing the user's knowledge of the process.
Decision making in such areas as process analysis in macroeconomics, financial
forecasting and company solvency analysis requires tools that are able to build
accurate models for forecasting of processes. Problems arise, however, that are
connected with the large number of variables, the very small number of
observations and the unknown dynamics between these variables. Such financial
objects are complex ill-defined systems that can be characterised by:
· inadequate a priori information;
· great number of immeasurable variables;
· noisy and extremely short data samples;
· ill-defined objects with fuzzy characteristics.
Problems of complex object modelling, such as analysis and prediction of the
stock market, cannot be solved by deductive logical-mathematical methods with
the needed accuracy. In this case knowledge extraction from data, i.e. deriving
a model from experimental measurements, has advantages for rather complex
objects for which little a priori knowledge or no definite theory is at hand.
This is especially true for objects with fuzzy characteristics.
The task of knowledge extraction from data is to select a mathematical
description from the data. But the knowledge required for designing mathematical
models or architectures of neural networks is not at the command of the users.
In mathematical statistics it is necessary to have a priori information about
the structure of the mathematical model. In neural networks the user fixes this
structure by choosing the number of layers and the number and transfer functions
of the nodes of the network. This requires not only knowledge of the theory of
neural networks, but also knowledge of the nature of the object, and time.
Besides this, knowledge from systems theory about the systems being modelled is
not applicable without transformation into the neural network world, and the
rules of translation are usually unknown.
GMDH-type neural networks can overcome these problems: they extract knowledge
about the object directly from the data sample. The Group Method of Data
Handling (GMDH) is an inductive sorting-out method that has advantages for
rather complex objects that have no definite theory, particularly objects with
fuzzy characteristics. GMDH algorithms find the single optimal model by a full
sorting-out of candidate models and their evaluation by external criteria of
accuracy or difference type [1,2].
2. Group Method of Data Handling (GMDH)
2.1. Brief description
The Group Method of Data Handling (GMDH) is a self-organising approach based on
sorting-out of gradually complicated models and their evaluation by an external
criterion on a separate part of the data sample. Any parameters that can
influence the process can be used as input variables. The computer itself finds
the structure of the model and measures the significance of the selected
parameters. The model that leads to the minimal value of the external criterion
is considered the best. This inductive approach is different from the commonly
used deductive techniques and from neural networks.
The GMDH was developed for complex systems modelling, prediction, identification
and approximation of multivariate processes, decision support after a "what-if"
scenario, diagnostics, pattern recognition and clusterization of data samples.
It has been proved that for inaccurate, noisy or small data the best optimal
simplified model can be found, whose accuracy is higher and whose structure is
simpler than that of the usual full physical model.
More than 230 dissertations have been defended and many papers and books
published on GMDH theory and its applications. Methods of mathematical induction
have been developed for the solution of comparatively simple problems; GMDH can
be considered as a further propagation of inductive self-organising methods to
the solution of more complex practical problems. It solves the problem of how to
handle data samples of observations. The goal is to obtain a mathematical model
of the object (the problem of identification and pattern recognition) or to
describe the processes that will take place at the object in the future (the
problem of process forecasting).
GMDH solves, by a sorting-out procedure, the multidimensional problem of model
optimization:

$g^{*} = \arg \min_{g \in G} CR(g), \quad CR(g) = f(P, S, x^2, T, V), \qquad (1)$

where G is the set of considered models; CR is an external criterion of quality
of a model g from this set; P is the set of variables; S is the model
complexity; x^2 is the noise dispersion; T is the number of data sample
transformations; V is the type of reference function. For a definite reference
function, each set of variables corresponds to a definite model structure P = S,
and the problem transforms into the much simpler one-dimensional problem

$CR(g) = f(S)$

when x^2 = const, T = const, and V = const.
The method is based on a sorting-out procedure, i.e. consecutive testing of
models chosen from a set of candidate models in accordance with the given
criterion. Most GMDH algorithms use polynomial reference functions. A general
connection between input and output variables can be expressed by the Volterra
functional series, whose discrete analogue is the Kolmogorov-Gabor polynomial [1]:

$y = a_0 + \sum_{i=1}^{M} a_i x_i + \sum_{i=1}^{M}\sum_{j=i}^{M} a_{ij} x_i x_j + \sum_{i=1}^{M}\sum_{j=i}^{M}\sum_{k=j}^{M} a_{ijk} x_i x_j x_k + \ldots,$

where $X = (x_1, x_2, \ldots, x_M)$ is the input variables vector and
$A = (a_0, a_1, \ldots, a_{ij}, \ldots)$ is the vector of coefficients or weights.
Components of the input vector X can be independent variables, functional forms
or finite difference terms. Other non-linear reference functions, such as
difference, logistic or harmonic ones, can also be used for model construction.
The method allows one to find simultaneously the structure of the model and the
dependence of the modelled system output on the values of the most significant
inputs of the system.
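As an illustration, the following minimal Python sketch (not part of the original
paper) shows how the design matrix of a truncated Kolmogorov-Gabor polynomial
could be generated; the second-degree truncation and the function name are
assumptions made for the example, since the actual GMDH algorithms decide which
terms to keep by sorting-out.

```python
import numpy as np
from itertools import combinations_with_replacement

def kolmogorov_gabor_terms(X, degree=2):
    """Build the design matrix of a truncated Kolmogorov-Gabor polynomial.

    X      : (N, M) array of input variables x_1..x_M
    degree : highest degree of the retained products (2 keeps x_i and x_i*x_j)
    Returns the (N, K) matrix whose columns are 1, x_i, x_i*x_j, ...
    """
    n, m = X.shape
    columns = [np.ones(n)]                      # constant term a_0
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(m), d):
            columns.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(columns)

# toy usage: 3 inputs, second-degree polynomial -> 1 + 3 + 6 = 10 columns
X = np.random.rand(20, 3)
F = kolmogorov_gabor_terms(X, degree=2)
print(F.shape)   # (20, 10)
```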
GMDH theory solves the problems of:
- long-term forecasting [3,18];
- short-term forecasting of processes and events [2];
- identification of physical regularities;
- approximation of multivariate processes;
- physical fields extrapolation [4];
- data samplings clusterization [5];
- pattern recognition in the case of continuous-valued or discrete variables;
- diagnostics and recognition by probabilistic sorting-out algorithms [6];
- vector process normative forecasting [7];
- modeless processes forecasting using analogues complexing [8];
- self-organization of twice-multilayered neuronet with active neurones [9,10].
The theoretical grounds of GMDH effectiveness as an adequate method for
constructing robust forecasting models were obtained in [12]. Its essence
consists in the automatic generation of models in a given class by sequential
selection of the best of them by criteria which, implicitly through sample
dividing, take into account the level of indeterminacy.
Since 1967 a large number of GMDH implementations for the modelling of economic,
ecological, environmental, medical, physical and military objects have been
carried out in several countries. Some outdated approaches are used in the USA
in the commercial software tools "NeuroShell2" by Ward Systems Group, Inc.,
"ModelQuest" by AbTech Corp., "ASPN" by Barron Associates Co., and
"KnowledgeMiner" by DeltaDesign Berlin Software.
Self-organising modelling is based on statistical learning networks, which are
networks of mathematical functions that capture complex, non-linear
relationships in a compact and rapidly executable form. Such networks subdivide
a problem into manageable pieces or nodes and then automatically apply advanced
regression techniques to solve each of these simpler problems.
2.2. The "GMDH algorithms" and "algorithms of GMDH type"
It's necessary to make difference between the original "GMDH algorithms" and the
"algorithms of GMDH type" [11]. The first ones - work using the minimum of an
external criterion (Fig.1) and therefore realise objective choice of optimal
model. This original GMDH technique is based on inductive approach: optimal
models are founded by sorting-out of possible variants and evaluated by external
criterion. It is calculated on separate part of data sample, which is not used
for model creation. That model is better which leads to minimal value of
criterion. To make objective choice, selection is done without thresholds or
coefficients in criterion. We recommend to calculate criteria two times: first
to find the best models at each layer of selection for structure identification
and second time to find the optimal model. Selection procedure is stopped when
minimal criterion value is reached.
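The following is a minimal sketch, under simplifying assumptions, of the
objective choice by an external criterion: the sample is split into a learning
part A and a checking part B, coefficients are fitted on A by least squares, and
a regularity-type criterion is computed on B. The 2/3 split, the linear
candidate model and the function name are illustrative, not prescribed by the
text.

```python
import numpy as np

def regularity_criterion(X, y, columns, n_learn):
    """External regularity criterion for one candidate model.

    The candidate model is y = a0 + sum(a_i * x_i) over the chosen 'columns'.
    Coefficients are fitted on the learning subsample only; the criterion is
    the mean squared error on the independent checking subsample.
    """
    F = np.column_stack([np.ones(len(y))] + [X[:, j] for j in columns])
    A, B = slice(0, n_learn), slice(n_learn, len(y))          # learning / checking
    coef, *_ = np.linalg.lstsq(F[A], y[A], rcond=None)        # estimated on A only
    resid = y[B] - F[B] @ coef
    return np.mean(resid ** 2)

# toy usage: the model using columns (0, 2) is judged on data never used for fitting
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 2] + 0.1 * rng.normal(size=30)
print(regularity_criterion(X, y, (0, 2), n_learn=20))
```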
The second kind, GMDH-type algorithms, work on the principle "the more complex
the model, the more accurate it is". For them it is necessary to set a definite
threshold or to specify weight coefficients for the members of the internal
criterion formula, i.e. the optimal model is found in a somewhat subjective way.
But real problems are usually represented by short or noisy data samples.
Unfortunately, almost all GMDH-type software (ModelQuest, NeuroShell) and
research works in the USA and Japan use this deductive approach, which is not
effective for such data.
The inductive approach does not eliminate the experts or take them away from the
computer, but rather assigns them a special position. Experts indicate the
selection criterion in a very general form and interpret the chosen models. They
can influence the result of modelling by formulating new criteria. The computer
becomes an objective referee in scientific controversies if the ensemble of
criteria is coordinated between the experts taking part in the discussion.
Fig. 1. External accuracy criterion minima values plotted against complexity of
model structure S for different noise variance x^2.
LCM - locus of criterion minima line;
--- - model choice by criterion minimum.
The human element often introduces errors and undesired decisions. The objective
choice of the optimal model by the minimum of the external criterion
characteristic in true GMDH algorithms often contradicts the opinion of the
investigator. Objective algorithms make it possible to realise real artificial
intelligence.
2.3. Special GMDH peculiarities
The main peculiarity of GMDH algorithms is that, when they use continuous data
with noise, they select as optimal a simplified non-physical model. Only for
accurate and discrete data do the algorithms point out the so-called physical
model, the simplest optimal one of all unbiased models.
The convergence of multilayered GMDH algorithms has been proved [25], as has the
fact that the shortened non-physical model is better than the full physical
model (for noisy and continuous data, in prediction and approximation problems,
more simplified Shannon-type non-physical models become more accurate [12]). It
can be noted that this conclusion also holds for model selection on the basis of
model entropy maximisation (the Akaike approach), average risk minimisation (the
Vapnik approach) and other modern approaches. The only way to obtain
non-physical models is to use sorting-out GMDH algorithms. The regular way in
which the optimal structure of forecasting models changes in dependence on
general indices of data indeterminacy (noise level, data sample length, design
of experiment, number of informational variables) was shown in [24,25,27].
The special peculiarities of GMDH are the following:
1) External supplement: following S. Beer's work [13], only external criteria,
calculated on new independent information, can produce the minimum of the
sorting-out characteristic. Because of this, the data sample is divided into
parts for model construction and evaluation.
2) Freedom of choice: following D. Gabor's work [14], in multilayered GMDH
algorithms not one but the F best results of selection are to be conveyed from
one layer to the next, to provide "freedom of choice".
3) The rule of layer complication: partial descriptions (forms of the
mathematical description used in the iteration) should be simple, without
quadratic members.
4) Additional model definition: in cases when the choice of the optimal physical
model is difficult because of the noise level or oscillations of the criterion
minima characteristic, an auxiliary discriminating criterion is used [15]. The
choice of the main criterion and of the constraints on the sorting-out procedure
is the main heuristic of GMDH.
5) All algorithms have a multilayered structure, and parallel computing can be
used for their realisation.
6) All questions that arise about the type of algorithm, criterion, set of
variables etc. should be resolved by the minimal criterion value.
The main criteria used are cross-validation PRR(s), regularity AR(s) and the
balance-of-variables criterion BL(s). Estimation of their effectiveness
(investigation of noise immunity, optimality and adequateness) and their
comparison with other criteria was done in detail in [24,25,26,15]. The
conditions under which a GMDH algorithm produces the minimum of the
characteristic are the following:
a) the criterion of model choice must be external, based on additional fresh
information that was not used for model construction;
b) the data sample must not be too long: too long a data sample produces the
same form of characteristic as an exact data sample without noise;
c) when the difference-type balance criterion BL(s) is used, small noise is
necessary, or the variables in the data sample should not be exactly measured [16].
The difference of the GMDH algorithms from other algorithms of structural
identification, genetic algorithms and best-regression selection algorithms
consists of the following main peculiarities:
· the use of external criteria, which are based on dividing the data sample and
are adequate to the problem of constructing forecasting models, decreasing the
requirements on the volume of initial information;
· a much greater diversity of structure generators: the use, as in regression
algorithms, of full or reduced sorting of structure variants, and of original
multilayered (iterative) procedures;
· a higher level of automation: only the initial data sample and the type of
external criterion need to be entered;
· automatic adaptation of the optimal model complexity and of the external
criteria to the level of noise or statistical violations: the noise-immunity
effect causes the robustness of the approach;
· implementation of the principle of inconclusive decisions in the process of
gradual model complication.
2.4. Spectrum of GMDH algorithms
The solution of practical problems and the design of GMDH theory have led to the
development of a broad spectrum of algorithms. Each of them corresponds to
definite conditions of application [17]. The algorithms mainly differ from one
another by the arrangement of the generator of candidate models for a given
basic function, by the way the model structures are complexified and, finally,
by the external criteria accepted. The choice of algorithm depends on the
specifics of the problem, the level of noise dispersion, the sufficiency of the
data sample, and on whether the data sample is continuous-valued.
Table 1. Spectrum of GMDH algorithms

Continuous variables:
 Parametric: Combinatorial (COMBI); Multilayered Iterational (MIA); Objective System Analysis (OSA); Harmonical; Two-level (ARIMAD); Multiplicative-Additive.
 Non-parametric: Objective Computer Clusterization (OCC); "Pointing Finger" (PF) clusterization algorithm; Analogues Complexing (AC).

Discrete and binary variables:
 Parametric: Harmonical Rediscretization.
 Non-parametric: Algorithm based on the Multilayered Theory of Statistical Decisions (MTSD).
Most often, criteria of accuracy, differential or informative type are used. The
work of GMDH algorithms has a straightforward analogy with the work of a
gardener during selection of a new hybrid plant [11].
The basic parametric GMDH algorithms listed in Table 1 have been developed for
continuous variables. Among the parametric algorithms [1,9] the best known are:
- the basic Combinatorial (COMBI) algorithm, based on full or reduced
sorting-out of gradually complicated models and their evaluation by an external
criterion on a separate part of the data sample;
- the Multilayered Iterative (MIA) algorithm, which uses at each layer of the
sorting procedure the same partial description (iteration rule). It should be
used when a large number of variables has to be handled;
- the Objective System Analysis (OSA) algorithm. Its key feature is that it
examines not single equations but systems of algebraic or difference equations
obtained by implicit templates (without a goal function). An advantage of the
algorithm is that the information embedded in the data sample is utilised better
and relationships between variables are obtained;
- the Two-level (ARIMAD) algorithm for modelling of long-term cyclic processes
(such as stock or weather). Systems of polynomial or difference equations are
used to identify models on two time scales, and then the best pair of models is
chosen by the external criterion value. Any of the parametric algorithms
described above can be used for this [23].
There are also less known parametric algorithms that apply an exhaustive search
to difference, harmonic or harmonic-exponential functions, and the
Multiplicative-Additive algorithm, in which the tested polynomial models are
obtained by taking the logarithm of the product of input variables [18,19]. The
parametric GMDH algorithms have proved to be highly efficient for modelling
objects with non-fuzzy characteristics, such as engineering objects. In cases
where modelling involves objects with fuzzy characteristics, it is more
efficient to use the non-parametric GMDH algorithms, in which polynomial models
are replaced by a data sample divided into intervals or clusters. Algorithms of
this type completely solve the problem of eliminating the bias of coefficient
estimates.
Non-parametric algorithms are exemplified by:
- the Objective Computer Clusterization (OCC) algorithm, which operates with
pairs of closely spaced sample points [5]. It finds a physical clusterization
that would, as far as possible, be the same on two subsamples;
- the "Pointing Finger" (PF) algorithm for the search of a physical
clusterization. It is implemented by construction of two hierarchical clustering
trees and estimation by the balance criterion [20];
- the Analogues Complexing (AC) algorithm, which uses a set of analogues instead
of models and clusterizations [8]. It is recommended for the fuzziest objects;
- the algorithm based on the Multilayered Theory of Statistical Decisions [6].
It is recommended for the recognition of binary objects and for control of the
variability of the input data, to avoid possible experts' errors in it.
Recent developments of GMDH have led to neuronets with active neurons, which
realise a twice-multilayered structure: the neurons are multilayered and they
are connected into a multilayered structure. This makes it possible to optimise
the set of input variables at each layer while the accuracy increases. The
accuracy of forecasting, approximation or pattern recognition can be increased
beyond the limits reached by neuronets with simple neurons or by usual
statistical methods [9,10,34]. In this approach, which corresponds to the
actions of the human nervous system, the connections between neurons are not
fixed but change depending on the neurons themselves. Such active neurons are
able, during the learning self-organising process, to estimate which inputs are
necessary to minimise the given objective function of the neuron. This is
possible on the condition that every neuron is in its turn a multilayered unit,
such as a modelling GMDH algorithm. The neuronet with active neurons described
below is considered as a tool to increase the accuracy and lead-time of AI
problems with the help of regression-area extension for inaccurate, noisy or
small data samples.
Recently the GMDH algorithms have been applied in optimization, to solve
problems of normative forecasting (after a "what-if-then" scenario) and of
optimal control of multivariable ill-defined objects. Many ill-defined objects
in macroeconomics, ecology, manufacturing etc. can be described accurately
enough by static algebraic or by difference equations, which can be transformed
into problems of linear programming by denoting the non-linear members by
additional variables. GMDH algorithms are applied to evaluate deflections of the
output variables from their reference optimal values [7,21]. The Simplified
Linear Programming (SLP) algorithm can be used for the construction of expert
computer advisors, normative forecasting and optimization of the control of
averaged variables. An important example [10] is the prediction of the effects
of experiments. The algorithm solves two problems: calculation of the effects of
a given experiment and calculation of the parameters that are necessary to reach
optimal results. This means that the realisation of experiments can often be
replaced by computer experiments.
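An illustrative sketch (not the SLP algorithm itself) of how identified linear
relations could be embedded in a linear program: one identified output serves as
the goal function and another as a constraint. The coefficients and the use of
scipy's linprog are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog

# Suppose identification yielded two linear relations between controls u1, u2:
#   y1 = 3*u1 + 2*u2   (output to be maximised -> goal function)
#   y2 = u1 + 4*u2     (output that must stay below a normative level, say 20)
c = np.array([-3.0, -2.0])           # linprog minimises, so negate to maximise y1
A_ub = np.array([[1.0, 4.0]])        # constraint y2 <= 20
b_ub = np.array([20.0])
bounds = [(0.0, 8.0), (0.0, 8.0)]    # admissible range of the control variables

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x, -res.fun)               # optimal controls and the maximised output
```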
As already noted, the GMDH algorithms considered so far have been developed for
continuous variables. In practice, however, the sample will often include
variables discretized into a small number of levels, or even binary values. To
extend these GMDH algorithms to discretized or binary variables, the Harmonic
Rediscretization algorithm has been developed [22].
The existence of a broad gamut of GMDH algorithms is traceable to the fact that
it is impossible to define the characteristics of the tested or controlled
objects exactly in advance. Therefore, it can be good practice to try several
GMDH algorithms one after another and to decide which one suits a given type of
object best. All the questions that arise during the modelling process are to be
solved by comparison of criterion values: the better variant is the one that
leads to a deeper minimum of the basic external criteria. In this way the type
of algorithm is chosen objectively, according to the value of the discriminating
criterion.
Information about dispersion of noise level is very useful to decrease computer
calculation time. For small dispersion level we can use the learning networks of
GMDH type, based on the ordinary regression analysis using internal criteria.
For considerable noise level the GMDH algorithms with external criteria are
recommended. And for high level of noise dispersion non-parametric algorithms of
clusterization or analogues complexing should be applied [8].
2.4.1. The Combinatorial GMDH algorithm (COMBI)
The flowchart of the algorithm is shown in Fig. 2. The input data sample is a
matrix containing N levels (points) of observations over a set of M variables.
The sample is divided into two parts: approximately two-thirds of the points
make up the learning subsample NA, and the remaining one-third of the points
(e.g. every third point, with the same variance) form the check subsample NB.
Before dividing, the points are ranked by the value of variance. The learning
subsample is used to derive estimates for the coefficients of the polynomial,
and the check subsample is used to choose the structure of the optimal model,
that is, the one for which the external regularity criterion AR(s) takes on a
minimal value:

$AR(s) = \frac{1}{N_B} \sum_{i=1}^{N_B} \left( y_i - \hat{y}_i(A) \right)^2 \rightarrow \min, \qquad (2)$
or, better, to use the cross-validation criterion PRR(s), which takes into
account all the information in the data sample and can be computed without
recalculating the system for each checking point:

$PRR(s) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i(W \setminus i) \right)^2 \rightarrow \min,$

where $\hat{y}_i(W \setminus i)$ is the model output at point i when this point
is excluded from estimation.
To test a model for compliance with the difference-type balance criterion, the
input data sample is divided into two equal parts. The criterion requires
choosing a model that would, as far as possible, be the same on both subsamples.
The balance criterion will yield the single optimal physical model only if the
input data are noisy.
To obtain a smooth exhaustive-search curve (Fig. 1), which permits formulating
the exhaustive-search termination rule, the exhaustive search is performed on
models classed into groups of equal complexity. For example, the first layer can
use the information contained in every column of the sample; that is, a full
search is applied to all possible models of the form

$y = a_0 + a_1 x_i, \qquad i = 1, 2, \ldots, M. \qquad (3)$

Non-linear members can be taken as new input variables in the data sample. The
output variable is specified in advance by the experimenter. At the next layer
all models of the form

$y = a_0 + a_1 x_i + a_2 x_j, \qquad i, j = 1, 2, \ldots, M, \quad i \neq j, \qquad (4)$

are sorted.
The models are evaluated for compliance with the criterion, and so on as long as
the criterion value decreases. To limit calculation time it was recently
proposed, during the full sorting of models, to rank the variables according to
the criterion value after some time of calculation or after some layers of
iteration. The full sorting procedure then continues for the selected set of
best variables until the minimal value of the criterion is found. This makes it
possible to supply many more input variables at the input and to keep the
effective variables between layers in order to find the optimal model.
A salient feature of the GMDH algorithms is that, when they are presented with
continuous or noisy input data, they will yield as optimal some simplified
non-physical model. It is only in the case of discrete or exact data that the
exhaustive search for compliance with the precision criterion will yield what is
called a physical model, the simplest of all unbiased models. With noisy and
continuous input data, simplified (Shannon) models prove more precise [12,25] in
approximation and forecasting tasks.
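A compact, self-contained sketch of the Combinatorial sorting-out loop described
above, under the assumptions of linear partial models, a mean-squared external
criterion on the check subsample, and termination when the layer minimum stops
decreasing; it is an illustration of the scheme, not the authors'
implementation.

```python
import numpy as np
from itertools import combinations

def external_criterion(F, y, n_learn):
    """Fit on the learning part, score (MSE) on the checking part."""
    coef, *_ = np.linalg.lstsq(F[:n_learn], y[:n_learn], rcond=None)
    resid = y[n_learn:] - F[n_learn:] @ coef
    return np.mean(resid ** 2), coef

def combi(X, y, n_learn, max_complexity=3):
    """Sketch of the Combinatorial (COMBI) search over linear candidate models.

    Models of growing complexity s = 1, 2, ... are sorted out exhaustively;
    the search stops when the best external criterion stops decreasing.
    """
    best = (np.inf, None, None)                       # (criterion, columns, coef)
    for s in range(1, max_complexity + 1):
        layer_best = np.inf
        for cols in combinations(range(X.shape[1]), s):
            F = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
            cr, coef = external_criterion(F, y, n_learn)
            layer_best = min(layer_best, cr)
            if cr < best[0]:
                best = (cr, cols, coef)
        if layer_best > best[0]:                      # criterion grew: stop sorting
            break
    return best

rng = np.random.default_rng(1)
X = rng.normal(size=(36, 6))
y = 0.5 + 1.5 * X[:, 1] - 2.0 * X[:, 4] + 0.05 * rng.normal(size=36)
cr, cols, coef = combi(X, y, n_learn=24)
print(cols, cr)     # typically picks columns 1 and 4 in this toy example
```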
Fig. 2. Combinatorial GMDH algorithm.
1 - data sampling;
2 - layers of partial descriptions complexing;
3 - form of partial descriptions;
4 - choice of optimal models;
5 - additional model definition by discriminating criterion.
Output model: $Y^{k+1} = d_0 + d_1 x_1^k + d_2 x_2^k + \ldots + d_M x_M^k x_{M-1}^k$
Fig.3 Multilayered Iterational algorithm:
1 - data sampling;
2 - layers of partial descriptions complexing;
3 - form of partial descriptions;
4 - choice of optimal models;
5 - additional model definition by discriminating criterion;
F1 and F2 - number of variables for data sampling extension.
Calculations are faster when the following techniques are used [24,25]:
a) in all formulae the informational array $W^{T}W$ is used instead of the data
sample array W = (X Y);
b) the model parameters are estimated by the recursive "framing" method, which
allows the use of arrays calculated at previous steps;
c) faster generation of the ensembles of variables is done using the Garsaid
binary counter, in which the current ensemble differs from the previous one in
one digit only.
2.4.2. The Multilayered Iterative GMDH algorithm (MIA)
As with the Combinatorial algorithm, the output variable must be specified in
advance by the person in charge of modelling, which corresponds to the use of
so-called explicit templates (Fig. 4). In each layer, the new output values
calculated by the F best models at each point are used to successively extend
the data sample (Fig. 3).
In the Multilayered Iterative algorithm the iteration rule remains unchanged
from one layer to the next. As shown in Fig. 3, the first layer tests the models
that can be derived from the information contained in any two columns of the
sample. The second layer uses information from four columns; the third, from any
eight columns, etc. The exhaustive-search termination rule is the same as for
the Combinatorial algorithm: in each layer the optimal models are selected by
the minimum of the criterion [16,25].
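A minimal sketch of one MIA-type iteration under illustrative assumptions: every
pair of current variables is combined through the same linear partial
description, the F best outputs (by the external criterion) extend the data
sample, and layers are added while the best criterion keeps decreasing.

```python
import numpy as np
from itertools import combinations

def fit_partial(xi, xj, y, n_learn):
    """Partial description y = a0 + a1*xi + a2*xj, fitted on the learning part
    and scored by mean squared error on the checking part."""
    F = np.column_stack([np.ones_like(xi), xi, xj])
    coef, *_ = np.linalg.lstsq(F[:n_learn], y[:n_learn], rcond=None)
    cr = np.mean((y[n_learn:] - F[n_learn:] @ coef) ** 2)
    return cr, F @ coef                       # criterion and neuron output

def mia_layer(Z, y, n_learn, F_best=4):
    """One selection layer: keep the F_best partial descriptions."""
    scored = [fit_partial(Z[:, i], Z[:, j], y, n_learn)
              for i, j in combinations(range(Z.shape[1]), 2)]
    scored.sort(key=lambda t: t[0])
    outputs = np.column_stack([out for _, out in scored[:F_best]])
    return scored[0][0], outputs

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
y = X[:, 0] - 0.7 * X[:, 3] + 0.1 * rng.normal(size=40)

Z, prev = X, np.inf
for layer in range(6):                         # at most 6 layers in this toy run
    cr, Z_new = mia_layer(Z, y, n_learn=27)
    if cr >= prev:                             # stop when the criterion stops falling
        break
    prev, Z = cr, np.column_stack([Z, Z_new])  # extend the sample with neuron outputs
print("best external criterion:", prev)
```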
2.4.3. The Objective System Analysis algorithm (OSA)
In discrete mathematics, the term template refers to a graph indicating which of
the delayed arguments are used in setting up the conditional and normal Gauss
equations. A gradual increase in the structural complexity of candidate models
corresponds to an increase in the complexity of the templates, whose explicit
(a) and implicit (b) forms are shown in Fig. 4.
When implicit templates are used, one has, beginning from the second layer of
the exhaustive search, to solve a system of equations and to evaluate the model
using a system criterion.
The system criterion is a convolution of the criteria calculated for the
equations that make up the system,

$CR_{sys} = \sqrt{\frac{1}{s} \sum_{j=1}^{s} CR_j^2}, \qquad (5)$
where s is the number of equations in the system. The flowchart of the OSA
algorithm is shown in Fig. 5. The key feature of the algorithm is that it uses
implicit templates, and an optimal model is therefore found as a system of
algebraic or difference equations. An advantage of this algorithm is that the
number of regressors is increased and, in consequence, the information embedded
in the data sample is utilised better. A disadvantage is that it calls for a
large amount of calculation in order to solve the system of equations, and a
greater number of candidate models have to be searched. The amount of search can
be reduced by using a constraint in the form of an auxiliary precision criterion.
Fig.4. Derivation of conditional equations on a data sample
Fig. 5. Objective System Analysis (OSA) algorithm
In setting up the system of equations, one then discards the poorly forecasting
equations (narrowing operation), keeping only those for which the variation
accuracy criterion of the forecast is less than unity:

$\delta^2 = \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}, \qquad (6)$

where $y_i$ are the variable values in the table, $\hat{y}_i$ are the values
calculated according to the model, and $\bar{y}$ is the mean value.
This criterion is recommended in the literature for evaluating the success of an
approximation or of a forecast [15]. With $\delta^2$ < 0.5 the result of
modelling is taken to be good; with 0.5 < $\delta^2$ < 0.8 it is taken to be
satisfactory; with $\delta^2$ > 1.0 modelling is considered to have failed, and
the model yields misinformation.
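A one-function sketch of the variation criterion of Eq. (6), with the verbal
thresholds quoted above; the function name and the toy numbers are assumptions
made for the example.

```python
import numpy as np

def variation_criterion(y_true, y_model):
    """delta^2 = sum (y - y_hat)^2 / sum (y - mean(y))^2, as in Eq. (6)."""
    y_true, y_model = np.asarray(y_true, float), np.asarray(y_model, float)
    return np.sum((y_true - y_model) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# delta^2 < 0.5: good; 0.5..0.8: satisfactory; > 1.0: the model misinforms
y = np.array([1.0, 1.4, 0.9, 1.7, 2.1])
print(variation_criterion(y, [1.1, 1.3, 1.0, 1.6, 2.0]))
```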
2.5. Extended definition of the only optimal model by the theory of
discriminating criteria
It has been demonstrated theoretically and experimentally that the
exhaustive-search curves shown in Fig. 1 are gradual and unimodal for the
expected value of the criterion [25]. The number of candidate models tested in
each exhaustive-search layer cannot be infinitely large; in other words, in
constructing exhaustive-search curves, the expected value of the criterion is in
effect replaced by its mean (or least) value. Because of this, the curves take
on a slightly wavy shape, and a small error may creep into the choice of the
optimal model structure.
The theory of discriminating criteria was developed by Fedorov and Yurachkovsky
[24] with special reference to experimental design. It has, however, proved its
relevance to the self-organisation of models and of active-neuron neural
networks. The theory proceeds from the following premises: (1) there exists a
"true" model represented in the data sample; (2) the assumed few object
descriptions fit the model to different degrees; (3) the model that comes
closest to the true model can be selected by its compliance with an auxiliary
discriminating criterion.
With such an approach, every GMDH algorithm consecutively uses two criteria. At
first, an exhaustive search is applied to all candidate models for compliance
with the main criterion, and a small number of models whose structure is close
to optimal are selected. Then only one optimal model is selected, the one that
complies with a special discriminating criterion. The theory of optimal
discriminating criteria is still in the developmental stage, but successful
discriminating criteria are already known.
In cases involving the selection of a structure for optimal polynomial models,
the approximation or forecast variation criterion serves well. In the selection
of an optimal clusterization, good results are obtained with the symmetry
criterion for the cluster distance matrix, calculated relative to the secondary
diagonal [21], etc.
3. Data analysis: neural networks versus self-organising modelling
Table 2 gives a comparison of the two methodologies, neural networks and
self-organising modelling, in connection with their application to data analysis.
Table 2. Neural networks versus self-organising modelling.

Data analysis:
 Neural networks: universal approximator.
 Statistical learning networks: structure identificator.
Analytical model:
 Neural networks: indirect, by approximation.
 Statistical learning networks: direct.
Architecture:
 Neural networks: unbounded network structure; experimental selection of an adequate architecture demands time and experience.
 Statistical learning networks: bounded network structure [1]; adaptively synthesised structure.
A priori information:
 Neural networks: not usable without transformation into the world of neural networks.
 Statistical learning networks: can be used directly to select the reference functions and criteria.
Self-organisation:
 Neural networks: deductive; given number of layers and number of nodes (subjective choice).
 Statistical learning networks: inductive; number of layers and of nodes estimated by the minimum of an external criterion (objective choice).
Parameter estimation:
 Neural networks: in a recursive way; demands long samples.
 Statistical learning networks: estimation on the training set by means of maximum likelihood techniques, selection on the testing set (extremely short samples).
Feature:
 Neural networks: the result depends on the initial solution; time-consuming technique; knowledge of the theory of neural networks is necessary.
 Statistical learning networks: existence of a model of optimal complexity; not time-consuming; knowledge about the task (criteria) and the class of system (linear, non-linear) is necessary.
Results obtained by statistical learning networks, and especially by GMDH-type
algorithms, are comparable with results obtained by neural networks [30]. In
distinction to neural networks, the results of GMDH algorithms are explicit
mathematical models, obtained in a relatively short time on the basis of
extremely short samples. The well-known problem of the optimal (subjective)
choice of the neural network architecture is solved in the GMDH algorithms by
means of an adaptive synthesis (objective choice) of the architecture. Networks
of the right size are estimated, with a structure evolved during the estimation
process, to provide a parsimonious model for the particular desired function.
Such algorithms, combining the best features of neural nets and statistical
techniques in a powerful way, discover the entire model structure in the form of
a network of polynomial functions, difference equations and others. Models are
selected automatically based on their ability to solve the task (approximation,
identification, prediction, classification).
4. Nets of active neurons
4.1. Self-organisation of twice-multilayered neural network
A neural network is designed to handle a particular task. This may involve
relation identification (approximation), pattern and situation recognition, or a
forecast of random processes and repetitive events from information contained in
a sample of observations over a test or control object.
The present stage of computer technology allows a new approach to neural
networks, which increases the accuracy of classical modelling algorithms. Such a
complex system can solve complex problems. We can use the GMDH algorithms as the
complex neurons, since their self-organisation processes are well studied.
Only by this inductive self-organising method can an optimal non-physical model,
whose accuracy is higher and whose structure is simpler than those of the usual
full physical model, be found for small, inaccurate or noisy data samples. GMDH
algorithms are examples of complex active neurons, because they choose the
effective inputs and the corresponding coefficients by themselves, in the
process of self-organisation. The problem of self-organisation of the neuronet
link structure is thereby solved in a rather simple way.
Each neuron is an elementary system that handles the same task. The objective
sought in combining many neurons into a network is to enhance the accuracy of
the assigned task through a better use of the input data. As already noted, the
function of active neurons can be performed by various recognition systems,
notably by Rosenblatt's two-layer perceptrons; such a neural network achieves
the task of pattern recognition. In the self-organisation of a neural network,
the exhaustive search is first applied to determine the number of neuron layers
and the sets of input and output variables for each neuron. The minimum of the
discriminating criterion suggests the variables for which it is advantageous to
build a neural network and how many neuron layers should be used. Thus, the
theory of neural network self-organisation is similar in many respects to that
of each active neuron.
Active neurons are able, during the self-organising process, to estimate which
inputs are necessary to minimise the given objective function of the neuron. In
a neuronet with such neurons we have a twofold multilayered structure: the
neurons themselves are multilayered, and they are united into a multilayered
network. They can provide generation of new features of a special type (the
outputs of the neurons of the previous layer) and the choice of an effective set
of factors at each layer of neurons. The output variables of previous layers are
very effective secondary inputs for the neurons of the next layer. The first
layer of active neurons acts similarly to a Kalman filter: the output set of
variables repeats the input set, but with filtration of noise. The number of
active neurons in each layer is equal to the number of variables given in the
initial data sample.
The neuronet structure is given in Fig. 6. Sample extension is effected solely
by including the output variables calculated in each previous layer of neurons.
The samples show the form of the discrete template used to teach the first
neurons of a layer by the Combinatorial GMDH algorithm. In particular, when four
input variables are used and two time delays are allowed for (t = 2), the first
template corresponds to the complete difference equation containing all four
variables with delays of one and two steps. The algorithm will suggest which of
the proposed arguments should be taken into consideration and will help to
estimate the connectivity coefficients.
To begin with, we construct the first layer of neurons in the network. Then we
are able to determine how accurate the forecast will be for all variables. For
this purpose, we use a discrete template that allows a delay of one or two days
for all variables. Then we add a second, a third, etc. layer to the neural
network, as shown in Fig. 6, and go on doing so as long as this improves the
forecast, i.e. decreases the external criterion value.
For each neuron, we have applied the extended definition procedure to one model
(out of the five closest to the optimal one). For the optimal models, we have
calculated the forecast variation criterion. It may be inferred that there is no
need to construct a neural network in order to form a forecast for those
variables for which the variation criterion takes on its least value in the
first layer. It is advisable to use a neural network to form a forecast for the
variables for which the variation criterion takes on its least value in the last
layers of neurons.
The equations for the neurons of the network define the connections that must be
implemented in the neural network; in this way they help achieve the task of
structural self-organisation of the neural network. For brevity, the data sample
in the above example is extended in only one way: the output variables of the
first layer are passed on as additional variables to the second, third, etc.
layers of neurons. It is possible to compare different schemes of data sample
extension by the external criterion value.
The task of self-organisation of such networks of active neurons by selection is
to estimate the number of layers of active neurons and the set of possible
inputs and outputs of every neuron. The sorting characteristic "number of
neuronet layers versus the variables given in the data sample" defines the
optimum number of layers for each variable separately. Neuronets with active
neurons should be applied to raise the accuracy of short-term and long-term
forecasts.
Not only GMDH algorithms, but also many other modelling or pattern recognition
algorithms can be used as active neurons. Their accuracy can be increased in two
ways (a sketch follows Fig. 6 below):
- each output of an algorithm (active neuron) generates a new variable, which
can be used as a new factor in the next layers of the neuronet;
- the set of factors can be optimised at each layer. The factors (including the
newly generated ones) can be ranked by their efficiency, and several of the most
efficient factors can be used as inputs for the next layers of neurons. In a
usual once-multilayered ANN the set of input variables can be chosen only once.
Fig. 6. Schematic arrangement of the first two rows of a neural network: a) COMBI, b) OSA.
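The following is a schematic sketch (not the authors' implementation) of a
twice-multilayered net of active neurons: each neuron is a small selector that,
for one target variable, chooses its own inputs by an external criterion, and
the outputs of every layer are appended to the data sample of the next.
COMBI-style linear neurons, a one-step-ahead template and a fixed number of
layers are simplifying assumptions.

```python
import numpy as np
from itertools import combinations

def active_neuron(Z, target, n_learn, max_inputs=2):
    """One active neuron: chooses its own input subset by the external criterion."""
    best = (np.inf, None, None)
    for s in range(1, max_inputs + 1):
        for cols in combinations(range(Z.shape[1]), s):
            F = np.column_stack([np.ones(len(target))] + [Z[:, j] for j in cols])
            coef, *_ = np.linalg.lstsq(F[:n_learn], target[:n_learn], rcond=None)
            cr = np.mean((target[n_learn:] - F[n_learn:] @ coef) ** 2)
            if cr < best[0]:
                best = (cr, cols, F @ coef)
    return best                                     # (criterion, chosen inputs, output)

def twice_multilayered(X, n_learn, n_layers=3):
    """One neuron per variable in each layer, forecasting x_v(t) from the sample
    at t-1; the outputs of each layer extend the sample for the next one."""
    targets = X[1:]                                  # values to forecast (one step ahead)
    Z = X[:-1]                                       # delayed inputs
    history = []
    for _ in range(n_layers):
        layer = [active_neuron(Z, targets[:, v], n_learn) for v in range(X.shape[1])]
        history.append([round(float(cr), 4) for cr, _, _ in layer])
        Z = np.column_stack([Z] + [out for _, _, out in layer])   # sample extension
    return history                                   # criterion of every neuron per layer

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 4))
print(twice_multilayered(X, n_learn=27))
```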
4.2. The search termination rule
In self-organisation, the layers of neurons are extended as long as this
improves the accuracy of the solution yielded by the neural network. This will
be demonstrated later with reference to a relevant example.
4.3. Group allowance for arguments
We will call the exhaustive-search characteristic of a neural network the graph
that relates the main precision criterion for a specified variable to the layer
number. This characteristic is similar to that of the GMDH algorithms. To obtain
a smooth and unimodal curve, the exhaustive-search characteristic is calculated
for many points in the sample, and the results are averaged.
Theoretically, the exhaustive-search characteristic has been investigated for
the expected value of the criterion [24]. In practice, the exhaustive-search
curve has to be constructed not for the expected value and not even for the mean
value of the criterion. Rather, it is constructed for the best results of the
exhaustive search applied to a group for which the criterion takes on its least
value. This exhaustive-search termination rule holds only when many
approximation or forecast results are averaged.
4.4. The selection of a discrete template
What type of template to use depends on the task at hand (Fig. 4). In an
approximation task, the template does not contain delayed arguments; in a
forecast task, two or three delays have to be allowed for. In the former case
one obtains single-moment equations; in the latter, difference equations.
4.5. Extended definition of one optimal model for each neuron in a network
Self-organisation of each neuron taken separately uses the differential balance
criterion or the regularity precision criterion. As already noted, the
exhaustive-search curve approaches its minimum in a gradual manner, and the
criteria of models close to the optimal one differ only slightly from one
another in value. This explains why one has to use an extended definition
algorithm. This algorithm selects not one but several of the best models; from
them only the one that complies with an additional variation discriminating
criterion is chosen.
4.6. Readout of modelling results
Each layer in a neural network contains neurons, whose outputs correspond each
to a particular specified variable: the output of the first neuron to the first
variable, the output of the second neuron to the second variable, etc. Each
column consists of neurons whose outputs correspond to one of the variables.
From each column in turn, one neuron with a minimal variation criterion is
selected. More specifically, one neuron having the best result is selected from
the first column of neurons for which the output is the first variable;
similarly, one neuron is selected from the second column of neurons for which
the output is the second variable, etc. This selection procedure uniquely
defines the number of layers for each variable and, thus, the structure of the
neural network.
4.7. The exhaustive search of methods for data-sample extension and narrowing
The principal method of data-sample extension is by including the output
variables from the previous layer that have complied with the criterion best of
all. It will also be a good plan to test against the criterion the advisability
of sample extension by simple non-linear transformations of input variables. In
the example that follows, three variables are involved. They are x1, x2, and x3.
(a) The extension using the covariance of the variables
(b) The extension using the reciprocals of the variables
The reciprocals should above all be proposed for the variables that take a minus
sign in the equation; that is, they reduce the value of the output.
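A small sketch of the two extension schemes just mentioned: pairwise products of
variables (covariance-type terms) and reciprocals. Whether an added column is
actually kept would, as the text says, be decided against the external
criterion; the function names and the guard against division by zero are
assumptions made for the example.

```python
import numpy as np
from itertools import combinations

def extend_with_products(X):
    """Append pairwise products x_i * x_j (covariance-type terms)."""
    prods = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + prods)

def extend_with_reciprocals(X, eps=1e-9):
    """Append reciprocals 1/x_i (guarded against division by zero)."""
    return np.column_stack([X, 1.0 / (X + eps)])

X = np.array([[1.0, 2.0, 4.0],
              [2.0, 1.0, 8.0]])
print(extend_with_products(X).shape)      # (2, 6): x1, x2, x3, x1*x2, x1*x3, x2*x3
print(extend_with_reciprocals(X).shape)   # (2, 6)
```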
4.8. Sample extension by consecutive elimination of the most efficient variables
The diversity of the variables that come in for the exhaustive search (performed
by each neuron) can further be increased by eliminating the most efficient
variables, thus producing partial subsets. This can be best illustrated by an
example.
Let the input of a neural network accept a data sample containing just M = 25
variables. Suppose further that we have used the OSA algorithm and found in the
first neuron an optimal system of forecasting difference equations in the
variables x2, x12, x13, x18, x22. These variables are the least "fuzzy" and lend
themselves to forecasting by this system of equations. We eliminate these
variables from the sample and apply the OSA algorithm to a second neuron. This
yields a second optimal system of equations in the variables x3, x9, x14, x32.
As a result, the minimum of the criterion increases (because the second set
contains other than the best variables) and shifts to the left (Fig. 1). Now we
eliminate from the sample the nine variables thus found and apply the OSA
algorithm to a third neuron. This yields an optimal system of equations in only
three variables, x5, x6, x11. The minimum of the criterion goes up still more
and again shifts to the left, etc.
This shift in the minimum of the system criterion bears out the adequacy law,
which states that for fuzzier systems the optimal description (model) must
likewise be fuzzier and simpler; that is, it must have a smaller number of
equations [24]. Computer experiments confirm the above form of the
exhaustive-search curve. In the above example, the number of variables used for
decision-making is increased from 5 (in the first neuron) to 5 + 4 + 3 + 2 + 1 = 15
(in five neurons). Ten features are discarded as inefficient. So, we shall have
5 x 15 = 75 neurons in each layer.
4.9. Simultaneous and successive algorithms for neural networks
In a computer program, neurons can be implemented simultaneously or successively,
using memory devices.
4.10. Neuronets self-organisation and algorithms for optimization of control
systems
The principal roadblock in the use of linear and non-linear programming
algorithms for complex system optimization is that it is often impossible to
specify either the goal function or the applicable constraints with sufficient
accuracy. Meanwhile, even minute inaccuracies in their specification may have a
strong impact on the outcome of optimization. Active-neuron networks can be
readily combined with linear and non-linear programming algorithms.
One of the output functions is taken as the objective function, and the
equations of the other output variables can serve as equality-type constraints.
This removes the subjective factor from the specification of the goal function
and constraints. The human operator defines the criteria for their choice, not
the objective function and constraints themselves [21].
6. Examples of applications.
Besides the applications of commercial GMDH software, many implementations have
been made in very different fields. Many of them are described in the Ukrainian
journal "Avtomatica" (translated as "Soviet Automatic Control", then "Soviet
Journal of Automation and Information Sciences" and now "Journal of Automation
and Information Sciences", in full). The basic GMDH applications include studies
on: economic systems (analysis and forecasting of macroeconomic parameters,
decision support and optimization), ecological systems analysis and prediction
(forecasting of oil fields and river flow, harvest analysis and ionosphere state
definition), environmental systems analysis, medical diagnostics, demographic
forecasting, weather modelling, econometric modelling and marketing,
manufacturing, planning of physical experiments, materials estimation,
multisensor signal processing, microprocessor-based hardware, eddy currents,
x-ray, acoustic and seismic analysis, and, widely, military systems (radar,
infrared, ultrasonic and acoustic emission, missile guidance).
6.1. Prediction of characteristics of stock market
Currency, international stock trading and derivatives contracts play an
increasing role for many investors. Commonly a portfolio consisting of a number
of contracts is used. Asset returns must be predicted and controlled by a
prediction/control module. Control of risk via a prediction/control module of
the returns of the individual investments inside the portfolio provides the most
likely process.
It is known that in most economic applications, e.g. financial risk control,
neural networks give a success rate of only 70-80%. By means of the new approach
of GMDH twice-multilayered neural networks it can be improved by 5-10%.
Prediction accuracy for short and very noisy data also increases, in short-term
and long-term predictions, by 10-50% in comparison with statistical methods and
neural networks, especially for stochastic processes [30,31]. On the basis of
predictive control, the results of repetitive control are improved as well.
As an example, prediction of the activity on the New York stock exchange was
considered in [10]. In the following, on the basis of observations in the period
from February 22 up to June 14, 1995, in seven periods, 7 variables of the stock
market (DAX, Dow Jones, F.A.Z., Dollar and others) are predicted. Delays of all
variables up to 35 are included in the information base. Not only linear but
also non-linear reference functions were used to describe the variables. The
task was to model and predict the 7 time series not independently, as separate
time-series models, but rather as a highly interacting network (input-output
model). Table 3 shows the accuracy of predictions for all variables (mean MAD [%]).
Using the results of model generation (at the first level of the neuronet) it is
possible to improve the accuracy of the models in a second model generation,
where the model outputs from the previous generation are used for the generation
of models of optimal complexity. This procedure can be continued until the
accuracy of the models begins to decrease.
Table 3. Observation and prediction periods.

Model | Observation up to | Days | Prediction begin | Prediction end | Prediction days | Max delay | Mean MAD [%]
a | March 17 | 18 | March 20 | March 31 | 10 | 5  | 0.985
b | March 31 | 28 | April 3  | April 18 | 10 | 10 | 2.055
c | April 18 | 38 | April 19 | May 3    | 10 | 15 | 0.809
d | April 28 | 46 | May 2    | May 15   | 10 | 20 | 1.642
e | May 15   | 56 | May 16   | May 30   | 10 | 26 | 1.217
f | May 30   | 66 | May 31   | June 14  | 10 | 30 | 1.206
g | June 14  | 76 | June 16  | June 29  | 10 | 35 | 0.760
Table 4 shows the resulting model error (MAPE [%]) and prediction error (MAD [%])
of Dollar, Dax, F.A.Z. and Dow Jones, and the mean values for all 7 variables,
obtained on 3 levels. The table shows that the repeated application of
self-organisation gives a more accurate approximation, which results in better
predictions in the second level. The models obtained in the third level are
overfitted, and the prediction error therefore increases.
Table 4. Multilevel application (model f).

            MAPE [%]                 MAD [%]
Level       1       2       3        1       2       3
Dollar      0.68    0.51    0.11     2.32    2.17    11.67
Dax         0.35    0.24    0.10     2.20    1.24    5.21
F.A.Z.      0.22    0.23    0.03     1.54    1.27    2.32
Dow Jones   0.27    0.16    0.06     2.15    0.84    4.84
Mean        0.267   0.184   0.051    1.43    0.98    3.67
The effort of using the GMDH-type neural networks is much lower than for neural
networks, where the architecture must be chosen by trial and error. Only an
adaptive synthesis of the network structure allows automatic model generation
and therefore applications in fields where many decisions and forecasts,
repeated over short time periods, are needed (monitoring of complex systems with
many controlled variables).
7. Objective selection of the best model
The aim of self-organising modelling is to obtain, in an objective way, models
of optimal complexity. But there are several freedoms in the choice of the class
of systems to be modelled (linear/non-linear) and of the time lag, and in the
selection of appropriate parameters (number of best models, complexity etc.). To
reduce this subjectivity it is recommended to generate several alternative
models (linear, non-linear, with several complexities and time lags) and, in a
second layer, to select the best model outputs or to generate their combination.
Table 5 shows the obtained results.
Table 5. Selection of best model results (model g): prediction error MAD [%].

            Linear                  Non-linear              Second
Model       1       2       3       1       2       3       layer
Dollar      2.88    2.10    0.89    1.25    1.41    1.40    1.55
F.A.Z.      1.22    1.45    1.01    0.82    1.12    1.57    0.88
Dax         1.36    2.41    1.51    1.69    2.43    4.54    1.94
Dow Jones   1.14    1.26    1.44    3.75    3.25    3.79    2.93
Mean        1.14    1.29    0.90    1.21    1.35    1.81    1.20
8. Non-parametric inductive selection methods
8.1. Modeling of fuzzy systems
The physical model is the best tool for function approximation and
random-process forecasting of deterministic objects whose inputs and outputs are
measured accurately in the absence of noise. In the case of insufficient a
priori information, not very accurate measurements, or noisy and short data
samples, better results can be reached by the use of non-physical models. But in
the case of so-called ill-defined objects, the noise dispersion is too large
even for the use of non-physical models. In this case the application of
clustering of the data sample is to be recommended, which can be considered as a
discrete form of physical model of ill-defined objects.
Almost all objects of recognition and control in economics, ecology, biology and
medicine are non-deterministic or fuzzy. They can be represented by a
deterministic (robust) part and additional black boxes acting on each output of
the object. The only information about these boxes is that they have limited
values of output variables, which are similar to the corresponding states of the
object.
According to Ashby [33], the diversity of a control system must not be smaller
than the diversity of the object itself. The Law of Adequateness, given by
S. Beer, establishes that for optimal control the objects are to be compensated
by corresponding black boxes of the control system [13]. For optimal pattern
recognition and clustering only partial compensation is necessary. Moreover, we
are interested in minimising the degree of compensation by all means, to get
more accurate results.
The methods of cluster analysis and selection of analogous patterns discussed
below are denoted as non-parametric because there is no need to estimate
parameters. The method of cluster analysis is described in more detail in [20].
8.2. Method of analogues complexing
The equal fuzziness of the model and the object is reached automatically if the
object itself is used for forecasting. This is done by searching, in the given
data sample, for analogues that are equivalent to the physical model. Forecasts
are not calculated in the classical sense but selected from the table of
observation data.
The main assumptions are the following:
- the system to be modelled is described by a multidimensional process;
- there are enough observations in the data sample (a long time series);
- the multidimensional process is sufficiently representative, i.e. the
essential system variables are included in the observations;
- it is possible that a part of the past behaviour will be repeated.
If we succeed in finding, for the last part of the behaviour trajectory (the
starting pattern), one or more analogous parts in the past (analogous patterns),
the prediction can be obtained by applying the known continuations of these
analogous patterns [8].
Using a sliding window that generates the set of possible patterns
$\{P_{i,k+1}\}$, where $i = 1, 2, \ldots, N-k$ and k+1 is the width of the
sliding window and also of the patterns, the output (reference) pattern is the
most recent one, $P_{N-k,\,k+1}$.
The algorithm for the selection of the analogous patterns has the following
task: for the given output pattern it is necessary to select the most similar
patterns and to evaluate the forecast with the help of these patterns.
The method of analogues complexing is recommended when the input data sample is
long enough. The analogues substitute the physical model. This means that the
optimal analogue can be found by a selection (sorting-out) procedure, using an
internal accuracy-type criterion. It is not necessary to divide the data sample
into two parts. Several parameters of the algorithm should be optimised to raise
the accuracy of short-term forecasting of the processes. The selection task is a
four-dimensional problem with the following dimensions:
forecasting. The selection task is a four-dimensional problem with the following
dimensions:
- the set of variables used;
- the number of analogues selected;
- the width of the patterns (the number of rows used in each);
- the values of the weight coefficients with which the patterns are complexed.
As the optimisation method, comparison of variants by an internal accuracy
criterion is used; the criterion is calculated over the whole length of the
sample. This solves the short-term forecasting problem one step ahead.
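The internal criterion is not given in closed form here, so the sketch below
assumes a simple variant: the mean absolute one-step forecast error accumulated
over the whole sample. It illustrates how two of the four dimensions (pattern
width and number of analogues) can be compared by such a criterion; the variable
subset and the complexing weights would extend the same loop. All names and the
synthetic data are illustrative.

import numpy as np

def one_step_analogue_forecast(history, width, n_analogues):
    """Forecast the next row as the mean of the continuations of the
    n_analogues past patterns closest to the last `width` rows of history."""
    history = np.asarray(history, dtype=float)
    n = len(history)
    reference = history[n - width:]
    starts = np.arange(0, n - 2 * width + 1)
    dists = np.array([np.linalg.norm(history[i:i + width] - reference)
                      for i in starts])
    best = starts[np.argsort(dists)[:n_analogues]]
    return np.stack([history[i + width] for i in best]).mean(axis=0)

def internal_accuracy(data, width, n_analogues, warmup=30):
    """Assumed internal criterion: mean absolute one-step error over the sample."""
    data = np.asarray(data, dtype=float)
    errors = [np.abs(one_step_analogue_forecast(data[:t + 1], width, n_analogues)
                     - data[t + 1]).mean()
              for t in range(warmup, len(data) - 1)]
    return float(np.mean(errors))

# compare variants over two of the four dimensions and keep the best one
rng = np.random.default_rng(1)
sample = rng.normal(size=(66, 4)).cumsum(axis=0)
best = min((internal_accuracy(sample, w, a), w, a)
           for w in range(4, 9) for a in (2, 3, 5))
print("criterion=%.3f  width=%d  analogues=%d" % best)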
More difficult is the problem of long-term, step-by-step forecasting of random
processes. To select similar patterns from all possible patterns in the time
series, the following steps are used:
A. Reducing variable set size
The choice of an optimal set of variables can be realised by preselection. It is
necessary to identify a subset of effective variables, which is defined as the
nucleus [17].
One method of automatic generation of the nucleus is the automatic
classification of variables by means of the algorithm of objective cluster
analysis described in [20,22]. Another method is GMDH construction of a linear
model: the models selected in the last layer indicate an ensemble of variables
among which the most similar patterns have to be sought.
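Neither the objective cluster analysis of [20,22] nor the GMDH layer
construction is reproduced here; as a purely illustrative stand-in, the sketch
below preselects a small nucleus by ranking variables according to the
correlation of their lagged values with a chosen target variable. All names and
data are assumptions.

import numpy as np

def preselect_nucleus(data, target_col, max_vars=3):
    """Illustrative stand-in for nucleus selection: keep the `max_vars`
    variables whose lagged values correlate most strongly (in absolute value)
    with the target variable at the next time step."""
    data = np.asarray(data, dtype=float)
    target = data[1:, target_col]                    # target at time t+1
    scores = [(abs(np.corrcoef(data[:-1, j], target)[0, 1]), j)
              for j in range(data.shape[1])]
    return [j for _, j in sorted(scores, reverse=True)[:max_vars]]

# usage: indices of the 3 variables most related to variable 0
rng = np.random.default_rng(2)
sample = rng.normal(size=(66, 7)).cumsum(axis=0)
print(preselect_nucleus(sample, target_col=0))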
B. Transformation of analogues
Most processes in large-scale systems are evolutionary. In this case
stationarity, an important condition for successful use of the method of
analogues, is not fulfilled. As time series may be non-stationary, patterns with
similar shapes may have different mean values, standard deviations and trends.
In the literature it is recommended to evaluate the difference between the
process and its trend, which is an unknown function of time. Another possibility
is the selection of differences, where a criterion of stationarity is used as
selection criterion. The results of the method of analogues depend on the
selected trend function.
It is therefore advisable to determine transformed patterns
T_i(P_i) = w_0i + w_1i * P_i. The weights w_0i, w_1i for each pattern P_i can be
estimated by means of the least squares method, which gives not only the unknown
weights but also the total sum of squared residuals, which can be used in the
following step C as a measure of similarity.
C. Selection of the most similar analogues
The closest analogue is called the first analogue A_1, the next one in distance,
A_2, is called the second analogue, and so on up to the last analogue A_F.
Distances can be measured by means of the Euclidean distance between the points
of the output pattern and the analogue, or by other distance measures. In our
case it is not necessary to introduce a separate proximity measure: the total
sum of squared residuals from step B already gives the proximity between the
output pattern and each transformed analogue.
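Steps B and C can be illustrated together: each candidate pattern is fitted to
the reference pattern by a least-squares affine transform, and the resulting
residual sum of squares serves as the proximity measure. A minimal sketch in
Python/NumPy; fitting a single pair of weights across the flattened pattern is a
simplification (per-variable weights are equally possible), and all names and
data are illustrative.

import numpy as np

def fit_and_rank_analogues(data, width, n_keep):
    """Fit w0 + w1 * P_i to the reference pattern for every candidate P_i and
    rank candidates by the residual sum of squares (smallest first)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    reference = data[n - width:].ravel()
    ranked = []
    for i in range(n - 2 * width + 1):               # candidates ending before the reference
        x = data[i:i + width].ravel()
        A = np.column_stack([np.ones_like(x), x])    # design matrix for w0, w1
        w, rss, *_ = np.linalg.lstsq(A, reference, rcond=None)
        total_sq = rss[0] if rss.size else np.sum((reference - A @ w) ** 2)
        ranked.append((float(total_sq), i, w))
    ranked.sort(key=lambda item: item[0])
    return ranked[:n_keep]                           # (RSS, start index, [w0, w1])

# usage: the 5 best-fitting analogues A_1 ... A_5
rng = np.random.default_rng(3)
sample = rng.normal(size=(66, 7)).cumsum(axis=0)
for rss, start, w in fit_and_rank_analogues(sample, width=10, n_keep=5):
    print("start=%2d  RSS=%.3f  w0=%.3f  w1=%.3f" % (start, rss, w[0], w[1]))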
D. Combining forecasts
Every selected analogue has a known continuation which gives a forecast. In such
a way we obtain F forecasts, which are to be combined. In the literature there
are several principles for the combination of forecasts.
The unknown predictions of the M system variables can be assumed to be a linear
combination of the continuations of the selected analogous patterns, i.e.

x*(t+1) = g_1 c_1(t+1) + g_2 c_2(t+1) + ... + g_F c_F(t+1),

where c_j(t+1) denotes the known continuation of the j-th selected (transformed)
analogue. The unknown parameters g_1, g_2, ..., g_F are estimated by means of
parametric selection procedures, e.g. using self-organising methods. The only
problem is the small number of observations, i.e. the number of selected
patterns.
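The estimation of the combining weights is only described as a parametric
selection procedure, so the sketch below uses a plain least-squares fit of the
weighted analogue patterns to the reference pattern and then applies the same
weights to the analogues' known continuations; this is a simplified stand-in,
not the authors' self-organising procedure, and all names and data are
illustrative.

import numpy as np

def combine_forecasts(data, width, analogue_starts):
    """Estimate weights g_j by fitting a linear combination of the selected
    analogue patterns to the reference pattern, then apply the same weights
    to the analogues' one-step continuations to obtain the forecast."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    reference = data[n - width:].ravel()
    # each column of X is one flattened analogue pattern
    X = np.column_stack([data[i:i + width].ravel() for i in analogue_starts])
    g, *_ = np.linalg.lstsq(X, reference, rcond=None)
    # known continuation (next row) of each analogue pattern
    continuations = np.stack([data[i + width] for i in analogue_starts])
    return continuations.T @ g                       # forecast for all variables

# usage with five previously selected analogue start indices (illustrative)
rng = np.random.default_rng(4)
sample = rng.normal(size=(66, 7)).cumsum(axis=0)
print(combine_forecasts(sample, width=10, analogue_starts=[3, 17, 28, 40, 45]))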
8.3. Prediction of characteristics of stock market by analogues complexing
On the basis of observations from February 22 to May 30, 1995 (66 days), the
analogue complexing algorithm was applied. Table 7 shows the prediction error
(MAD [%]) of four variables (Dollar, Dax, F.A.Z., Dow Jones) and the mean
prediction error over all 7 variables. The width of the patterns varies from 6
to 15 days.
Table 7. Prediction error (MAD [%]) of analogues complexing
Width        6      7      8      9     10     11     12     13     14     15
Dollar     2.61   3.28   2.62   2.79   2.91   2.91   1.86   1.48   2.16   8.96
F.A.Z.    1.418  2.609  1.597  1.485  1.187  1.391  1.435  1.869  0.723  1.118
Dax       1.427  1.702  1.7    2.307  2.962  2.761  2.761  2.612  2.372  1.122
Dow Jones 1.36   1.708  1.622  6.979  5.393  4.647  4.966  3.849  3.363  2.793
Mean      1.174  1.575  1.581  2.356  2.458  2.08   1.944  1.789  1.632  1.877
The forecasts are obtained by means of a linear combination of the continuations
of 5 selected analogous patterns, where the unknown weights g_j are estimated by
means of parametric selection procedures.
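The error measure in Table 7 is reported as MAD [%]; assuming this denotes the
mean absolute deviation of the forecast relative to the realised values (i.e. a
mean absolute percentage error), it can be computed as in the following small
sketch with made-up numbers.

import numpy as np

def mad_percent(actual, forecast):
    """Mean absolute deviation of the forecast, in percent of the actual values
    (assumed reading of the MAD [%] measure used in Table 7)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs(actual - forecast) / np.abs(actual))

# e.g. a three-day index forecast against its realised values (illustrative)
print(round(mad_percent([2100.0, 2110.0, 2095.0], [2082.0, 2130.0, 2101.0]), 2))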
References
1. Madala,H.R. and Ivakhnenko,A.G. Inductive Learning Algorithms for Complex
Systems Modeling. CRC Press Inc., Boca Raton, 1994, p.384.
2. Müller,J.-A. and Ivakhnenko,A.G. Selbstorganisation von Vorhersagemodellen.
Berlin, VEB Verlag Technik, 1984.
3. Ivakhnenko,A.G., and Osipenko,V.V. Algorithms of Transformation of
Probability Characteristics into Deterministic Forecast. Sov. J. of Automation
and Information Sciences, 1982, vol.15, no.2, pp.7-15.
4. Ivakhnenko,A.G., Peka,P.Yu., and Vostrov,N.N. Kombinirovannyj Metod
Modelirovanija Vodnych i Neftianykh Polej (Combined Method of Water and Oil
Fields Modeling). Kiev: Naukova Dumka, 1984.
5. Ivakhnenko,A.G., and Müller,J.A. Problems of Computer Clustering of the Data
Sampling of Objects under Study. Sov. J. of Automation and Information Sciences,
1991, vol.24, no.1, pp.58-67.
6. Ivakhnenko,A.G., Petukhova,S.A., et al. Objective Choice of Optimal
Clustering of Data Sampling under Non-robust Random Disturbances Compensation.
Sov. J. of Automation and Information Sciences, 1993, vol.26, no.4, pp. 58-65.
7. Ivakhnenko,A.G. and Ivakhnenko,G.A. Simplified Linear Programming Algorithm
as Basic Tool for Open-Loop Control. System Analysis Modeling Simulation (SAMS),
1996, vol.22, pp.177-184.
8. Ivakhnenko,A.G. An Inductive Sorting Method for the Forecasting of
Multidimensional Random Processes and Events with the Help of Analogues Forecast
Complexing. Pattern Recognition and Image Analysis, 1991, vol.1, no.1, pp.99-108.
9. Ivakhnenko,A.G., Ivakhnenko,G.A. and Müller,J.A. Self-Organisation of
Neuronets with Active Neurons. Pattern Recognition and Image Analysis, 1994, vol.4,
no.2, pp.177-188.
10.Ivakhnenko,G.A. Self-Organisation of Neuronet with Active Neurons for Effects
of Nuclear Test Explosions Forecastings. System Analysis Modeling Simulation (SAMS),
1995, vol.20, pp.107-116.
11.Farlow,S.J.,(ed.) Self-organising Methods in Modeling (Statistics: Textbooks
and Monographs, vol.54), Marcel Dekker Inc., New York and Basel, 1984.
12.Aksenova,T.I. and Yurachkovsky,Yu.P. A Characterisation at Unbiased Structure
and Conditions of Their J-Optimality, Sov. J. of Automation and Information
Sciences, vol.21, no.4, 1988, pp.36-42.
13.Beer,S. Cybernetics and Management, English Univ.Press, London, 1959, p.280.
14.Gabor D. Perspectives of Planning. Organisation of Economic Cooperation and
Development. Emp. College of Sci. and Technology, London, 1971.
15.Belogurov,V.P. A criterion of model suitability for forecasting quantitative
processes. Soviet J. of Automation and Information Sciences, 1990, vol.23, no.3,
p.21-25.
16.Sawaragi,Y., Soeda,T. et al. Statistical Prediction of Air Pollution Levels
Using Non-Physical Models, Automatica (IFAC), vol.15, no.4, 1979, p.441-452.
17.Ivakhnenko,A.G., and Müller,J.A. Parametric and Non-parametric Selection
Procedures in Experimental Systems Analysis. Systems Analysis, Modeling and
Simulation (SAMS), 1992, vol.9, pp.157-175.
18.Ivakhnenko A.G., Krotov G.I. and Cheberkus V.I. Harmonic and exponential-harmonic
GMDH algorithms for long-term prediction of oscillating processes. Part I. Sov.
J. of Automation and Information Sciences, v.14, no.1, 1981, p.3-17.
19.Ivakhnenko A.G., Krotov G.I. Multiplicative and Additive Non-linear GMDH
Algorithm with factor degree optimization. Sov. J. of Automation and Information
Sciences, v.17, no.3, 1984,p.13-18.
20.Ivakhnenko,A.G., Ivakhnenko,G.A., and Müller,J.A. Self-Organisation of
Optimum Physical Clustering of the Data Sample for Weakened Description and
Forecasting of Fuzzy Objects. Pattern Recognition and Image Analysis, 1993, vol.3,
no.4, pp.415-421.
21.Triseev,Yu.P. Approaches to the Solution of Mathematical Programming Problems
on the Basis of Heuristic Self-Organisation. Soviet J. of Automation and
Information Sciences, 1987, vol.20, no.3, pp.30-37.
22.Zholnarsky,A.A. Agglomerative Cluster Analysis Procedures for
Multidimensional Objects: A Test for Convergence. Pattern Recognition and Image
Analysis, 1992, vol.25, no.4, pp.389-390.
23.Stepashko V.S. and Kostenko Ju.V. GMDH Algorithm for Two-Level Modeling of
Multivariate Cyclic Processes, Sov. J. of Automation and Information Sciences,
1987, vol.20, no.4.
24.Ivakhnenko, A.G. and Stepashko,V.S., Pomekhoustojchivost' Modelirovanija (Noise
Immunity of Modeling). Kiev: Naukova Dumka, 1985.
25.Ivakhnenko,A.G. and Yurachkovsky,Yu.P. Modelirovanie Slozhnykh System po
Exsperimentalnym Dannym (Modeling of Complex Systems after Experimental Data).
Moscow: Radio i Svyaz, 1986, p.118.
26.Stepashko V.S. Asymptotic Properties of External Criteria for Model Selection,
Sov. J. of Automation and Information Sciences, 1988, vol.21, no.6, pp.84-92.
27.Stepashko V.S. Structural Identification of Predictive Models under
Conditions of a Planned Experiment, Sov. J. of Automation and Information
Sciences, 1992, vol.25, no.1, pp.24-32.
28.Stepashko V.S. GMDH Algorithms as Basis of Modeling Process Automation after
Experimental Data, Sov. J. of Automation and Information Sciences, vol.21, no.4,
1988, pp.43- 53.
29.Ivakhnenko, A.G., Müller, J.-A.: Present state and new problems of further
GMDH development. SAMS, 20 (1995), no.1-2, 3-16.
30.Müller, J.-A., Lemke,F.: Self-Organising modelling and decision support in
economics. In „Proceedings of the IMACS Symposium on Systems Analysis and
Simulation“. Gordon and Breach Publ. 1995, 135-138.
31.Lemke, F.: SelfOrganize! - software tool for modelling and prediction of
complex systems. SAMS, 20 (1995), no.1-2, 17-28.
32.Müller, J.-A.: Analysis and prediction of ecological systems. SAMS, 21 (1996).
33.Ashby,W.R. An Introduction to Cybernetics. J. Wiley, New York, 1958.
34.Ivakhnenko, A.G., Müller, J.-A.: Self-organisation of nets of active neurons.
SAMS, 20 (1995) no.1-2, 93-106.