Master's thesis

Gogolenko Sergiy

Theme of master's thesis:

"Research of locally deterministic methods of
mathematical programming"

Author: S. Gogolenko
Tutor: Prof. Dr.-Ing. V. Svjatnyj
Consultants: M. Ginkel, K. Teplinsky

e-mails:    gogolenkos@ukr.net
sergiy.gogolenko@gmail.com


Introduction


The search for effective optimization algorithms has always been one of the priority directions of applied mathematics. The high interest in the results of this area is explained both by the presence of optimization processes in nature and by people's aspiration to organize their activities in the best possible way. Optimization is thus an important tool in the analysis of physical systems and in decision-making.

The main aims of this work are to research approaches based on locally deterministic methods for solving the various optimization problems that occur in the modelling of complex dynamic systems (CDS), and to build effective locally deterministic optimization algorithms for these problems. To achieve these aims, the work analyses the specifics, characteristic features and alternative formulations of the optimization problems that are most important in the modelling of CDS, as well as the known algorithms for solving such problems and the software packages that implement them. The practical result of this work is the development of an optimization subsystem and its implementation in the modelling environment Diana, on the basis of which further research is carried out.

It is supposed that in the course of this work a new effective optimization algorithm (or group of algorithms) will be developed, focused on the solution of the specific problems occurring in the modelling of complex dynamic systems.



1. Optimization problems occurring in modelling of CDS


1.1 Peculiarities of optimization problems occurring in modelling of CDS


A detailed consideration of the optimization problems occurring in the modelling of CDS allows them to be divided into two groups.

The first group includes problems in which the objective function and the constraint functions are specified explicitly by formulas. Such problems usually arise as subtasks of larger modelling problems. They are characterized by an insignificant cost of evaluating the objective function and by a small or moderate number of unknowns. The gradient for such problems is usually computed separately from the objective function using finite-difference approximations. Often the feasible region of first-group problems is determined completely by functional constraints. The solution of such problems generally does not demand high computing power and can be found with classical optimization schemes.
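
As an illustration, a minimal sketch of such a finite-difference gradient approximation is given below; the function signature, the step size and all names are assumptions made for the example and are not part of the Diana interfaces.

  #include <algorithm>
  #include <cmath>
  #include <functional>
  #include <vector>

  // Forward-difference approximation of the gradient, as typically used for
  // first-group problems where the objective function is cheap to evaluate.
  std::vector<double> finiteDifferenceGradient(
      const std::function<double(const std::vector<double>&)>& f,
      const std::vector<double>& x,
      double h = 1e-7)
  {
      std::vector<double> grad(x.size());
      const double f0 = f(x);                // base value of the objective
      std::vector<double> xh = x;
      for (std::size_t i = 0; i < x.size(); ++i) {
          const double step = h * std::max(1.0, std::fabs(x[i]));  // relative step
          xh[i] = x[i] + step;
          grad[i] = (f(xh) - f0) / step;     // one extra evaluation per unknown
          xh[i] = x[i];
      }
      return grad;
  }

Each gradient evaluation costs one additional objective evaluation per unknown, which is acceptable only because first-group objectives are cheap; for second-group problems (see below) this cost would be prohibitive.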

The second group comprises problems in which the objective function is specified as a superposition of some function and the solution of a system of differential equations (DE). If the model is described by the DE system:

\dot{x} = f(t; x, p), \quad x(T_0) = x_0   (1.1)

then the objective function for problems of the second group is formally described in the general case by the formula:
L(t; x, p) = F(t; \dot{x}, x, p)   (1.2)

The following features are inherent in such problems:
  1. The number of unknowns in these problems is not large and rarely exceeds 30.
  2. Because evaluating the objective function requires integrating a trajectory, each evaluation demands considerable computational cost.
  3. Since most specialized integrators perform trajectory integration and sensitivity analysis simultaneously, the objective function and its gradient (or the Jacobian of its components) can also be computed simultaneously with minimal additional computational cost.
  4. Possible divergence of the DE solvers and the incomputability of some expressions included in the modelling equations give rise to direct constraints, which are difficult (or almost impossible) to replace with equivalent functional constraints.
  5. The objective function in such problems is frequently highly nonlinear and has many local minima, which demands the use of special methods before a locally deterministic algorithm can be applied; these methods provide a good initial guess and consequently globalize convergence.
  6. These problems are frequently badly scaled.

It follows from the listed characteristics that problems of the second group demand a significant revision of classical optimization algorithms, orienting them towards a small number of objective function evaluations and taking the arising direct constraints into account.



1.2 Problem of parameter estimation. Multiple shooting method (MSM)


The problem of model parameter estimation belongs to the second group of problems. It can be formulated as follows [6, 8, 13, 14]. The model is described by the DE system (1.1) together with a set of observation equations g(x(t; x_0, p), x_0, p). Some parameters p' and some initial values of the state variables x_0' are unknown and form a vector of sought parameters \theta = (x_0', p'). At the same time, a matrix of measurements Y describing the experimental data is known. The elements of Y are measurements of the state variables at discrete times via the observation functions and are described by the equation:

y_{ij} = g^{(j)}(x(t_i; x_0, p), x_0, p) + \eta_{ij},   (1.3)

where \eta_{ij} denotes normally distributed random noise. The aim is to estimate the vector of sought parameters from the experimental data. A maximum likelihood estimator is usually used to solve this problem; in the case of white Gaussian observation noise the problem takes the form:
\chi^2(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{n_{obs}} \frac{\left( y_{ij} - g^{(j)}(x(t_i; x_0, p), x_0, p) \right)^2}{2\sigma_{ij}^2} \to \min_{\theta}, \qquad G(\theta) = 0, \quad F(\theta) \ge 0,   (1.4)

where \chi^2(\theta) denotes the objective function, and G(\theta) and F(\theta) denote constraints based on physical reasoning. Confidence intervals for the estimated variables are determined using the Fisher information matrix [6, 12, 14] or other similar methods [8].

In this case an important issue when using locally deterministic methods is the choice of the initial point. There are two approaches to this problem. The first of them is called the initial-value approach [8]. It supposes that, after a suitable choice of the initial guess, the modelling equations are integrated over the entire fitting interval and the function \chi^2(\theta) is then calculated by formula (1.4). This approach requires a method that provides a rough approach to the global minimum at the initial step. The simplest way to achieve this is to use some global stochastic optimizer; a genetic optimization algorithm is chosen for further investigation in this work.
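
For illustration, a minimal sketch of how the objective (1.4) can be evaluated under the initial-value approach is given below. The measurement record, the integrator and observation callbacks and all names are assumptions made for this example; they are not the Diana interfaces.

  #include <functional>
  #include <vector>

  // Hypothetical measurement record: time t_i, observation index j,
  // measured value y_ij and its standard deviation sigma_ij.
  struct Measurement { double t; int j; double y; double sigma; };

  // Sketch of the chi^2 objective (1.4) under the initial-value approach:
  // the model (1.1) is integrated over the whole fitting interval and the
  // weighted residuals against all measurements are accumulated.
  // integrateModel stands for the DE integrator, observe for the
  // observation functions g^(j) from (1.3).
  double chiSquare(const std::vector<double>& x0,
                   const std::vector<double>& p,
                   const std::vector<Measurement>& data,
                   const std::function<std::vector<double>(
                       double, const std::vector<double>&,
                       const std::vector<double>&)>& integrateModel,
                   const std::function<double(
                       int, const std::vector<double>&,
                       const std::vector<double>&)>& observe)
  {
      double chi2 = 0.0;
      for (const Measurement& m : data) {
          const std::vector<double> x = integrateModel(m.t, x0, p); // x(t_i; x0, p)
          const double r = m.y - observe(m.j, x, p);                // y_ij - g^(j)
          chi2 += r * r / (2.0 * m.sigma * m.sigma);
      }
      return chi2;
  }

In a genetic algorithm such a function would serve directly as the fitness of a candidate parameter vector; the locally deterministic solver is then started from the best individual found.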

It is also possible to use the alternative approach called the multiple shooting method (MSM) [6, 8, 13, 14]. MSM was introduced by van Domselaar and Hemker (1975) and proved theoretically by Bock (1981, 1983). The approach consists of the following steps. First, the fitting interval is divided by a suitable grid into subintervals T_0 = t_0 < t_1 < ... < t_m = T_0 + T (fig. 1.1). Each subinterval is associated with a piece of the integrated trajectory with its own unknown initial values of the state variables x_0^{(k)}. In addition, all intervals share the same sought model parameters p'. Each interval should contain at least one experimental point. Since the resulting trajectory should be continuous, additional constraints appear that enforce continuity of the resulting trajectory at the transitions between intervals. Finally, the problem takes the following form:

\chi^2(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{n_{obs}} \frac{\left( y_{ij} - g^{(j)}(x(t_i; x_0^{(k)}, p), x_0^{(k)}, p) \right)^2}{2\sigma_{ij}^2} \to \min_{\theta}, \qquad x(t_{k+1}; x_0^{(k)}, p) = x_0^{(k+1)}, \; k = 0, \ldots, m-1,   (1.5)

where x_0^{(k)} denotes the initial values of the subinterval containing the measurement time t_i, and the equality constraints enforce continuity of the trajectory at the interval boundaries.

Fig. 1.1 — Idea of the multiple shooting method
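
To show how the continuity conditions in (1.5) can be evaluated, a minimal sketch is given below; the segment layout and the integrator callback are assumptions made for the example, not Diana code.

  #include <functional>
  #include <vector>

  // One multiple-shooting subinterval [tStart, tEnd] with its own unknown
  // initial values of the state variables x_0^(k).
  struct Segment { double tStart; double tEnd; std::vector<double> x0; };

  // Continuity defects of (1.5): the state integrated to the end of
  // subinterval k must match the initial state of subinterval k+1.
  // integrateSegment abstracts the DE integrator for (1.1).
  std::vector<double> continuityDefects(
      const std::vector<Segment>& segments,
      const std::vector<double>& p,
      const std::function<std::vector<double>(
          const Segment&, const std::vector<double>&)>& integrateSegment)
  {
      std::vector<double> defects;
      for (std::size_t k = 0; k + 1 < segments.size(); ++k) {
          const std::vector<double> xEnd = integrateSegment(segments[k], p);
          for (std::size_t i = 0; i < xEnd.size(); ++i)
              defects.push_back(xEnd[i] - segments[k + 1].x0[i]); // must vanish at the solution
      }
      return defects;
  }

Each call to integrateSegment is independent of the others, which is precisely what makes parallelization of the objective calculation straightforward (advantage 4 in the list below).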

Despite the significant increase in the number of unknowns in MSM, this scheme possesses several advantages [13]:

  1. It is easier to include prior information about the state variables from the measurements by a proper choice of initial guesses.
  2. An adequate choice of initial guesses typically avoids convergence to local minima of the problem.
  3. The method is numerically stable and makes it possible to solve the parameter estimation problem even for unstable and chaotic systems.
  4. The particular structure of the objective function simplifies parallelization of its calculation.



2. Optimization algorithms


2.1 Classification of optimization algorithms


There are several alternative approaches to the classification of locally deterministic optimization algorithms (fig. 2.1).

In the first approach, the class of optimization problems which an algorithm can solve serves as the classification criterion [1, 2, 3, 5]. Since the most important special cases of mathematical programming problems are the unconstrained optimization problem, the bound constrained optimization problem, the optimization problem with equality constraints, the general constrained optimization problem and the nonlinear least squares problem, the following groups of algorithms are distinguished according to this criterion:

  1. unconstrained optimization algorithms;
  2. bound constrained optimization algorithms;
  3. algorithms for optimization problems with equality constraints;
  4. constrained optimization algorithms;
  5. nonlinear least squares algorithms.

The second approach is based on various criteria that characterize the realization of the algorithms [4, 11]. According to one such criterion, active (sequential) and passive optimization methods are distinguished. In passive methods all points for further analysis are chosen simultaneously before the calculations; in active methods the points are chosen sequentially, and the choice of each point depends on the values at the previous points. Another criterion is the information about the objective function required by the algorithm. If only the value of the function at a point is needed, the algorithm belongs to the zero-order algorithms (Gauss method, Nelder-Mead simplex method, Rosenbrock's method, Powell's method, etc.). If the gradient is also required, the algorithm belongs to the first-order algorithms. Second-order algorithms additionally require information about the Hessian matrix besides the objective function's value and its gradient (Newton method, classical SQP methods).

Most locally deterministic optimization methods belong to the descent methods. This group of methods is based on two models. The first is the line search model, in which on each iteration the descent direction is fixed first and the step length is then estimated. The second is the trust-region model, in which on each iteration the maximum step length (the trust-region radius) is fixed first and the step, including its direction, is then estimated within this region. First-order descent methods include the following groups of methods (a minimal skeleton of a line-search iteration is sketched after the list) [1, 9]:

  1. gradient algorithms;
  2. quasi-Newton algorithms (DFP, BFGS, SR1);
  3. conjugate gradient methods (Fletcher-Reeves method, Polak-Ribière method);
  4. nonlinear least squares algorithms (Gauss-Newton method, Levenberg-Marquardt method, some SQP methods).
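
The skeleton below illustrates a generic first-order line-search descent method; the callbacks, names and the simple step-halving rule are assumptions made for the example (a full implementation would use an Armijo or Wolfe condition).

  #include <cmath>
  #include <functional>
  #include <vector>

  // Generic first-order line-search descent: on each iteration a descent
  // direction p is fixed first and only the step length alpha is chosen.
  // direction = -gradient gives the gradient method; a quasi-Newton formula
  // gives BFGS-type methods.
  std::vector<double> lineSearchDescent(
      const std::function<double(const std::vector<double>&)>& f,
      const std::function<std::vector<double>(const std::vector<double>&)>& gradient,
      const std::function<std::vector<double>(const std::vector<double>&)>& direction,
      std::vector<double> x,
      int maxIter = 100, double gradTol = 1e-6)
  {
      for (int k = 0; k < maxIter; ++k) {
          const std::vector<double> g = gradient(x);
          double gnorm = 0.0;
          for (double gi : g) gnorm += gi * gi;
          if (std::sqrt(gnorm) < gradTol) break;       // convergence test
          const std::vector<double> p = direction(x);  // descent direction fixed first
          const double f0 = f(x);
          double alpha = 1.0;
          while (alpha > 1e-12) {                      // halve the step until f decreases
              std::vector<double> trial = x;
              for (std::size_t i = 0; i < x.size(); ++i) trial[i] += alpha * p[i];
              if (f(trial) < f0) { x = trial; break; }
              alpha *= 0.5;
          }
      }
      return x;
  }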

Fig. 2.1 — Classification of optimization algorithms


2.2 Review of optimization software


At present a large number of optimization programs implementing locally deterministic algorithms have been developed. Most of these codes are distributed under the GPL license, while some require payment. Optimization software can be divided into the following groups (fig. 2.2) [5]:

  1. software packages focused on solving optimization problems (marked green in fig. 2.2);
  2. numerical libraries containing optimization algorithms (marked yellow in fig. 2.2);
  3. optimization systems and modelling environments (marked blue in fig. 2.2);
  4. software packages focused on solving parameter estimation problems (EASY-FIT, PAREST);
  5. libraries of optimization problems for testing optimization software (CUTEr).

Fig. 2.2 — Classification of optimization software


3. Review of publications


Every year several tens of publications about optimization problems occurring in the modelling of CDS appear. There are many works in which the parameter estimation problem is discussed [19, 20], but the multiple shooting method is poorly covered in the literature. The classical works of Prof. H. Bock are practically inaccessible because of their small circulation.

There are many books and articles about locally deterministic optimization methods. Some of them attempt to cover all known locally deterministic optimization algorithms [1-4, 9-11], while other works describe only a single chosen algorithm or group of methods [7, 15-18, 21].



4. Brief description of used algorithms


In this work the initial-value approach is used for solving the problems occurring in the modelling of CDS. In this case a rough approach to the global minimum is found by a genetic optimization algorithm. The parameter estimation problem is additionally solved by MSM (see 1.2).

At present the LBFGS-B method [7] is used as the main optimization algorithm. The search direction in this method is determined by formula (4.1):

p_k=-H_k \nabla f(x_k),   (4.1)

where H_k is the limited-memory BFGS approximation of the inverse Hessian matrix, built from the most recent correction pairs s_k = x_{k+1} - x_k and y_k = \nabla f(x_{k+1}) - \nabla f(x_k) with the scaling factors \rho_k = 1 / (y_k^T s_k).

In the LBFGS-B algorithm an approach similar to the gradient projection method is used to handle the bounds. The classical LBFGS-B algorithm uses a backtracking procedure for choosing the step length. In this work it is planned to improve the classical LBFGS-B method by using an interpolation procedure for choosing the step length.
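
For illustration, the standard two-loop recursion that applies H_k in (4.1) without ever forming the matrix is sketched below; the container layout is an assumption made for the example, and the bound handling and line search of LBFGS-B [7] are omitted.

  #include <deque>
  #include <vector>

  // Correction pair of the limited-memory BFGS update:
  // s = x_{k+1} - x_k, y = grad f(x_{k+1}) - grad f(x_k), rho = 1 / (y^T s).
  struct Correction { std::vector<double> s, y; double rho; };

  // Two-loop recursion: computes H_k * g from the last m correction pairs
  // (oldest at mem.front(), newest at mem.back()).
  std::vector<double> twoLoopRecursion(const std::deque<Correction>& mem,
                                       std::vector<double> q /* the gradient g */)
  {
      std::vector<double> alpha(mem.size());
      for (int i = static_cast<int>(mem.size()) - 1; i >= 0; --i) {   // newest to oldest
          double sq = 0.0;
          for (std::size_t j = 0; j < q.size(); ++j) sq += mem[i].s[j] * q[j];
          alpha[i] = mem[i].rho * sq;
          for (std::size_t j = 0; j < q.size(); ++j) q[j] -= alpha[i] * mem[i].y[j];
      }
      if (!mem.empty()) {                 // initial approximation H_k^0 = gamma * I
          const Correction& last = mem.back();
          double sy = 0.0, yy = 0.0;
          for (std::size_t j = 0; j < q.size(); ++j) {
              sy += last.s[j] * last.y[j];
              yy += last.y[j] * last.y[j];
          }
          for (double& qj : q) qj *= sy / yy;
      }
      for (std::size_t i = 0; i < mem.size(); ++i) {                  // oldest to newest
          double yr = 0.0;
          for (std::size_t j = 0; j < q.size(); ++j) yr += mem[i].y[j] * q[j];
          const double beta = mem[i].rho * yr;
          for (std::size_t j = 0; j < q.size(); ++j) q[j] += (alpha[i] - beta) * mem[i].s[j];
      }
      return q;   // q now holds H_k * g; the search direction (4.1) is its negation
  }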

Fig. 4.1 — Scheme of a descent method's performance


Conclusion


In this work the main classes of optimization problems were briefly considered. The work also contains a brief review of optimization algorithms and optimization software. Currently, within the scope of this work, the interfaces of the main optimization problems and solvers have been developed and implemented in the modelling environment Diana. In addition, a gradient method with Armijo line search was implemented, but research on this algorithm was stopped because it failed to solve badly scaled problems. Instead, the LBFGS-B code developed by P. Lu [7] was integrated into the modelling environment Diana.

Further, it is planned to implement the modified LBFGS-B algorithm in C++ and to investigate the efficiency of the suggested modification on real parameter estimation problems using the initial-value approach and MSM, as well as to investigate one of the nonlinear least squares algorithms.



Bibliography


  1. Nocedal, Jorge; Wright, Stephen J. Numerical Optimization. — New York, NY [u.a.]: Springer, Springer Series in Operations Research, 1999. — 623 p.
  2. Izmailov, A.F.; Solodov, M.V. Numerical Optimization Methods (in Russian). — Moscow: FIZMATLIT, 2003. — 304 p.
  3. Bertsekas, D. Constrained Optimization and Lagrange Multiplier Methods (Russian translation). — Moscow: Radio i Svyaz, 1987. — 400 p.
  4. Sukharev, A.G.; Timokhov, A.V.; Fedorov, V.V. A Course in Optimization Methods (in Russian). — Moscow: Nauka, 1986. — 328 p.
  5. More, Jorge J.; Wright, Stephen J. Optimization Software Guide. — Philadelphia, Pa.: SIAM, 1994. — 154 p.
  6. Peifer, M.; Timmer, J. Parameter estimation in ordinary differential equations using the method of multiple shooting — a review. — Freiburg: Freiburg Centre for Data Analysis and Modelling, 2005.
  7. Byrd, Richard H.; Lu, P.; Nocedal, Jorge; Zhu, C. A Limited Memory Algorithm for Bound Constrained Optimization. — Northwestern University, 1994.
  8. Horbelt, Werner. Maximum likelihood estimation in dynamical systems. — Freiburg: Albert-Ludwigs-Universität Freiburg, Fakultät für Physik, 2001.
  9. Polak, Elijah. Optimization: Algorithms and Consistent Approximations (Applied Mathematical Sciences; v. 124). — New York, NY [u.a.]: Springer, 1997. — 779 p.
  10. Censor, Yair; Zenios, Stavros Andrea. Parallel Optimization: Theory, Algorithms, and Applications. — New York, NY [u.a.]: Oxford Univ. Press, 1997. — 539 p.
  11. Céa, J. Optimization: Theory and Algorithms (Russian translation). — Moscow: Mir, 1973. — 244 p.
  12. Retout, Sylvie; Mentré, France; Bruno, René. Fisher information matrix for non-linear mixed-effects models: evaluation and application for optimal design of enoxaparin population pharmacokinetics. — INSERM U436, CHU Pitié-Salpêtrière, Paris, France, May 2001. — 15 p.
  13. Bock, Hans Georg; Koerkel, Stefan; Kostina, Ekaterina; Schloeder, Johannes P. Methods of Design of Optimal Experiments with Application to Parameter Estimation in Enzyme Catalytic Processes. — Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Germany, October 2004. — 25 p.
  14. Kutalik, Zoltan; Cho, Kwang-Hyun; Gordon, Steve V.; Wolkenhauer, Olaf. Optimal Sampling Time Selection for Parameter Estimation in Dynamic Pathway Modelling. — Control Systems Centre, Department of Electrical Engineering and Electronics, UMIST, Manchester, UK, 2003. — 22 p.
  15. Broyden, C.G. The convergence of a class of double-rank minimization algorithms // Journal of the Institute of Mathematics and Its Applications. — 1970. — No. 6. — pp. 76-90.
  16. Fletcher, R. A new approach to variable metric algorithms // Computer Journal. — 1970. — No. 13. — pp. 317-322.
  17. Goldfarb, D. A family of variable-metric methods derived by variational means // Mathematics of Computation. — 1970. — No. 24. — pp. 23-26.
  18. Shanno, D.F. Conditioning of quasi-Newton methods for function minimization // Mathematics of Computation. — 1970. — No. 24. — pp. 647-656.
  19. Walter, Eric; Pronzato, Luc. Identification of Parametric Models from Experimental Data. — London: Springer, 1997. — 403 p.
  20. Schittkowski, Klaus. Numerical Data Fitting in Dynamical Systems: A Practical Introduction with Applications and Software. — Dordrecht [u.a.]: Kluwer, 2002. — 392 p.
  21. Nocedal, Jorge. Updating quasi-Newton matrices with limited storage // Mathematics of Computation. — 1980. — No. 35. — pp. 773-782.


Developed by Sergiy Gogolenko (May 2006)