Multiobjective optimization using evolutionary algorithms

Source: http://ctr.stanford.edu/Summer00/sbalzarini.pdf

By Ivo F. Sbalzarini, Sibylle Müller AND Petros Koumoutsakos

Center for Turbulence Research Proceedings of the Summer Program 2000

Multiobjective evolutionary algorithms for shape optimization of electrokinetic micro channels have been developed and implemented. An extension to the Strength Pareto Approach that enables targeting has been developed. The results of the automated optimization cycle show shapes previously obtained by physical understanding as well as novel shapes of even higher efficiency.

1. Introduction

Evolutionary algorithms (EAs) such as evolution strategies and genetic algorithms have become the method of choice for optimization problems that are too complex to be solved using deterministic techniques such as linear programming or gradient (Jacobian) methods. The large number of applications (Beasley (1997)) and the continuously growing interest in this field are due to several advantages of EAs over gradient based methods for complex problems. EAs require little knowledge about the problem being solved, and they are easy to implement, robust, and inherently parallel. To solve a given optimization problem, it is enough to be able to evaluate the objective (cost) function for a given set of input parameters. Because of their universality, ease of implementation, and fitness for parallel computing, EAs often take less time to find the optimal solution than gradient methods. However, most real-world problems involve the simultaneous optimization of several, often conflicting, objectives. Multiobjective EAs are able to find optimal trade-offs in order to obtain a set of solutions that are optimal in an overall sense. In multiobjective optimization, gradient based methods are often impossible to apply, whereas multiobjective EAs can always be applied, and they inherit all of the favorable properties of their single objective relatives.

Section 2 of this paper introduces the main concepts of single objective EAs. Section 3 extends these ideas to the multiobjective case and introduces the principles of dominance and Pareto optimality. Section 4 describes the Strength Pareto Approach used in this work, and in section 5 we extend it with a targeting capability. In section 6, the results of both single and multiobjective optimization of a microchannel flow are shown and discussed.

2. Single objective evolutionary algorithms

The basic idea of single objective EAs is to imitate the natural process of biological evolution. The problem to be solved is described using a certain number of parameters (design variables). One then creates a group of λ (> 0) different parameter vectors and considers it as a population of individuals, where λ is called the population size. The quality of a certain vector of parameters (i.e. an individual in the population) is expressed in terms of a scalar valued fitness function (objective function). Depending on whether one wants to minimize or maximize the objective function, individuals (i.e. parameter vectors) with lower or greater fitness are considered better, respectively. The algorithm then chooses the μ (μ < λ) best individuals out of the population to become the parents of the next generation (natural selection, survival of the fittest); μ thus denotes the number of parents. The smaller μ is chosen compared to λ, the higher the selection pressure will be. Out of the μ individuals chosen to be parents for the next generation, one then creates a new population of λ offspring x_i^{g+1} by applying mutation to the parents x_j^g as follows:

x_i^{g+1} = x_j^g + N(0, C),   i = 1, …, λ,   j ∈ {1, …, μ}

where N(0, C) denotes a vector of jointly distributed Gaussian random numbers with zero mean and covariance matrix C. The standard deviations σ_i (i.e. the square roots of the diagonal elements σ_i² of C) of the additive random numbers determine "how far away from its parent a child will be" and are called the step sizes of the mutation. The first iteration is now completed, and the algorithm loops back to the evaluation of the fitness function for the new individuals. Several different techniques for adaptation and control of the step size have been developed (see e.g. Bäck (1997a), Bäck (1997b), Bäck (1993), Hansen & Ostermeier (1996), or Hansen & Ostermeier (1997)). In the following subsections, some of the single objective evolution strategies used in this work are outlined.
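The mutate-evaluate-select loop described above can be sketched in a few lines of Python. The following is a minimal (1+1)-ES with a fixed step-size vector; function and variable names are illustrative, the covariance matrix is assumed diagonal, and step size adaptation is deliberately omitted:

```python
import random

def mutate(parent, sigma):
    """Create one offspring by adding zero-mean Gaussian noise.

    sigma holds the step sizes, i.e. the square roots of the diagonal
    elements of the covariance matrix C (off-diagonal terms assumed zero).
    """
    return [x + random.gauss(0.0, s) for x, s in zip(parent, sigma)]

def es_1plus1(fitness, x0, sigma, generations=200):
    """Minimal (1+1)-ES: one parent, one offspring, keep the better one."""
    parent, f_parent = x0, fitness(x0)
    for _ in range(generations):
        child = mutate(parent, sigma)
        f_child = fitness(child)
        if f_child < f_parent:          # minimization: accept only improvements
            parent, f_parent = child, f_child
    return parent, f_parent
```

For example, minimizing the sphere function `sum(v*v for v in x)` from a random starting point drives the fitness monotonically towards zero.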

A slightly more advanced method is to take μ ≥ 1 parents and even more offspring, λ > μ. Mutation is accomplished in a similar way as with the (1+1)-ES. Besides the 1/5 rule, another method for step size adaptation becomes available, called self-adaptive mutation (Bäck (1997a)). In this method, the mutation step sizes are adapted every generation. They are either increased, decreased, or kept the same, each with a probability of 1/3. On average, 1/3 of the offspring will now be closer to their parents than before, 1/3 keeps progressing at the same speed, and 1/3 explores further areas. Depending on how far away from the optimum we currently are, one of these three groups will do better than the others and, therefore, more individuals out of it will be selected into the next generation, where their step sizes are inherited. The algorithm thus adapts the step size by itself, i.e. by means of mutation and selection.
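The three-way step size update can be sketched as follows. This is a hedged illustration: the helper name is ours, and the factors 1.5 and 0.5 are taken from the concrete values used later in section 4.2 (increase by 50%, cut in half, or keep):

```python
import random

def self_adapt(sigma, rng=random):
    """Self-adaptive mutation of a step-size vector: increase it,
    decrease it, or keep it, each with probability 1/3. The mutated
    vector is inherited by the offspring it produced."""
    r = rng.random()
    if r < 1 / 3:
        factor = 1.5   # explore further away from the parent
    elif r < 2 / 3:
        factor = 0.5   # search more locally
    else:
        factor = 1.0   # keep progressing at the same speed
    return [s * factor for s in sigma]
```

Selection then implicitly favors whichever of the three groups makes the most progress, so good step sizes propagate without any external control rule.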

The covariance matrix adaptation (CMA) is a sophisticated method for online adaptation of step sizes in evolution strategies with intermediate recombination (i.e. averaging of parents). It was first described by Hansen & Ostermeier (1996) and further improved and evaluated by Hansen & Ostermeier (1997). For a complete description of the algorithm, the reader is referred to the latter publication. The basic idea is to adapt step sizes and covariances in such a way that the longest axis of the hyperellipsoid of the mutation distribution always aligns with the direction of greatest estimated progress. This is done by accumulating information about former mutation steps and their success (the evolution path) and searching it for correlations. Besides this very sophisticated method for step size adaptation, a CMA-ES also includes mutation (with C now being a full covariance matrix) and selection.

3. Multiobjective evolutionary algorithms

As soon as there are several (possibly conflicting) objectives to be optimized simultaneously, there is no longer a single optimal solution but rather a whole set of possible solutions of equivalent quality. Consider, for example, the design of an automobile. Possible objectives could be: minimize cost, maximize speed, minimize fuel consumption, and maximize luxury. These goals are clearly conflicting and, therefore, there is no single optimum to be found. Multiobjective EAs can yield a whole set of potential solutions - which are all optimal in some sense - and give engineers the option to assess the trade-offs between different designs. One could then, for example, choose to create three different cars according to different marketing needs: a slow low-cost model which consumes the least fuel, an intermediate solution, and a luxury sports car where speed is clearly the prime objective. Evolutionary algorithms are well suited to multiobjective optimization problems as they are fundamentally based on biological processes which are inherently multiobjective.

After the first pioneering work on multiobjective evolutionary optimization in the eighties (Schaffer (1984), Schaffer (1985)), several different algorithms have been proposed and successfully applied to various problems. For comprehensive overviews and discussions, the reader is referred to Fonseca & Fleming (1995), Horn (1997), Van Veldhuizen & Lamont (1998), and Coello (1999).

3.1. Dominance and Pareto-optimality

In contrast to fully ordered scalar search spaces, multidimensional search spaces are only partially ordered, i.e. two different solutions are related to each other in one of two possible ways: either one dominates the other, or neither dominates. Definition 1: Consider, without loss of generality, the following multiobjective optimization problem with m decision variables x (parameters) and n objectives y:

Maximize y = f(x) = (f1(x1, …, xm), …, fn(x1, …, xm))

where x = (x1, …, xm) ∈ X

y = (y1, …, yn) ∈ Y

and where x is called the decision (parameter) vector, X the parameter space, y the objective vector, and Y the objective space. A decision vector a ∈ X is said to dominate a decision vector b ∈ X (also written as a ≻ b) if and only if:

∀i ∈ {1, …, n}: fi(a) ≥ fi(b)   ∧   ∃j ∈ {1, …, n}: fj(a) > fj(b)

Additionally, we say a covers b (a ⪰ b) if and only if a ≻ b or f(a) = f(b).
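The dominance and cover relations of Definition 1 translate directly into code. A small sketch for the maximization convention used above (the function names are ours; arguments are objective vectors, i.e. values of f):

```python
def dominates(fa, fb):
    """a dominates b (a ≻ b): at least as good in every objective and
    strictly better in at least one, under the maximization convention
    of Definition 1."""
    return (all(x >= y for x, y in zip(fa, fb))
            and any(x > y for x, y in zip(fa, fb)))

def covers(fa, fb):
    """a covers b (a ⪰ b): a dominates b, or their objective vectors coincide."""
    return dominates(fa, fb) or fa == fb
```

Note that two vectors can be mutually non-dominating (e.g. (2, 3) and (3, 1)), which is exactly the partial-order property discussed above.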

Based on this convention, we can define nondominated, Pareto-optimal solutions as follows:

Definition 2: Let a 2 X be an arbitrary decision (parameter) vector.

(a) The decision vector a is said to be nondominated regarding a set X0 ⊆ X if and only if there is no vector in X0 which dominates a; formally: ∄ a0 ∈ X0 : a0 ≻ a.

(b) The decision (parameter) vector a is called Pareto-optimal if and only if a is nondominated regarding the whole parameter space X.

If the set X0 is not explicitly specified, the whole parameter space X is implied. Pareto-optimal parameter vectors cannot be improved in any objective without causing a degradation in at least one of the other objectives. In that sense, they represent globally optimal solutions. Note that a Pareto-optimal set does not necessarily contain all Pareto-optimal solutions in X. The set of objective vectors f(a0), a0 ∈ X0, corresponding to a set of Pareto-optimal parameter vectors a0 ∈ X0 is called the "Pareto-optimal front" or "Pareto-front".
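Extracting the nondominated subset of a finite sample follows directly from Definition 2(a); a brute-force sketch under the maximization convention (quadratic in the sample size, which is fine for the small sets involved here):

```python
def nondominated(points):
    """Return the members of `points` (objective vectors, maximization)
    that no other member dominates."""
    def dom(a, b):
        # a dominates b: at least as good everywhere, strictly better somewhere
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))
    return [p for p in points if not any(dom(q, p) for q in points)]
```

Applied to a whole objective space sample, the result approximates the Pareto-front; applied to a subset X0, it gives the solutions nondominated regarding that subset only.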

3.2. Difficulties in multiobjective optimization

In extending the ideas of single objective EAs to multiobjective cases, two major problems must be addressed:

1. How to accomplish fitness assignment and selection in order to guide the search towards the Pareto-optimal set.

2. How to maintain a diverse population in order to prevent premature convergence and achieve a well distributed, wide spread trade-off front.

Note that the objective function itself no longer qualifies as a fitness function, since it is vector valued and fitness has to be a scalar value. Different approaches to relate the fitness function to the objective function can be classified with regard to the first issue. For further information, the reader is referred to Horn (1997). The second problem is usually solved by introducing elitism and intermediate recombination. Elitism is a way to ensure that good individuals do not get lost (by mutation or set reduction), simply by storing them away in an external set which only participates in selection. Intermediate recombination, on the other hand, averages the parameter vectors of two parents in order to generate one offspring, according to:

x′_j = α · x_{j1}^g + (1 − α) · x_{j2}^g,   j1, j2 ∈ {1, …, μ}

x_i^{g+1} = x′_j + N(0, C),   i = 1, …, λ,   j ∈ {1, …, μ}

Arithmetic recombination is the special case of intermediate recombination where α = 0.5.
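The two recombination-plus-mutation formulas above can be sketched as follows (names are illustrative; the covariance matrix is again assumed diagonal, with `sigma` holding the step sizes):

```python
import random

def recombine(p1, p2, alpha=0.5):
    """Intermediate recombination: component-wise weighted average of two
    parent vectors; alpha = 0.5 gives the arithmetic-recombination case."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]

def make_offspring(p1, p2, sigma, alpha=0.5):
    """Recombine two parents, then mutate the result with zero-mean
    Gaussian noise, as in the offspring equation above."""
    mid = recombine(p1, p2, alpha)
    return [x + random.gauss(0.0, s) for x, s in zip(mid, sigma)]
```

Averaging pulls offspring towards the center of the parent pair, which also helps keep the population diverse but compact around promising regions.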

4. The Strength Pareto Approach

For this work, the Strength Pareto Approach for multiobjective optimization has been used. Comparative studies have shown for a large number of test cases that, among all major multiobjective EAs, the Strength Pareto Evolutionary Algorithm (SPEA) is clearly superior (Zitzler & Thiele (1999), Zitzler & Thiele (2000)). It is based on the abovementioned principles of Pareto-optimality and dominance. The algorithm as proposed by Zitzler & Thiele (1999) was implemented in a restartable, fully parallel code as follows:

Step 1: Generate random initial population P and create the empty external set of nondominated individuals P0.

Step 2: Evaluate objective function for each individual in P in parallel.

Step 3: Copy nondominated members of P to P0.

Step 4: Remove solutions within P0 which are covered by any other member of P0.

Step 5: If the number of externally stored nondominated solutions exceeds a given maximum N0, prune P0 by means of clustering.

Step 6: Calculate the tness of each individual in P as well as in P0.

Step 7: Select individuals from P +P0 (multiset union), until the mating pool is lled.

Step 8: Adapt step sizes of the members of the mating pool.

Step 9: Apply recombination and mutation to members of the mating pool in order to create a new population P.

Step 10: If maximum number of generations is reached, then stop, else go to Step 2.
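Steps 1-10 can be condensed into a runnable sketch on a toy biobjective problem. This is a simplified illustration, not the parallel implementation used in the paper: a minimization convention is used (minimize x² and (x − 2)², whose Pareto set is x ∈ [0, 2]), simple archive truncation stands in for the clustering of Step 5, step sizes are kept fixed instead of adapted in Step 8, and all names are ours:

```python
import random

def dominates(fa, fb):
    # minimization convention for this toy problem
    return (all(x <= y for x, y in zip(fa, fb))
            and any(x < y for x, y in zip(fa, fb)))

def covers(fa, fb):
    return dominates(fa, fb) or fa == fb

def evaluate(x):
    # toy biobjective problem: minimize x^2 and (x - 2)^2
    return (x * x, (x - 2.0) ** 2)

def spea_sketch(pop_size=30, archive_max=10, generations=50, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population P, empty external set P0
    P = [rng.uniform(-4.0, 6.0) for _ in range(pop_size)]
    P0 = []                                # list of (parameter, objectives)
    for _ in range(generations):
        FP = [evaluate(x) for x in P]      # Step 2: evaluate P
        merged = list(zip(P, FP)) + P0     # Steps 3+4: keep nondominated members
        P0 = [(x, f) for (x, f) in merged
              if not any(dominates(g, f) for (_, g) in merged)]
        P0 = P0[:archive_max]              # Step 5: truncation in place of clustering
        # Step 6: strength fitness, lower is better (cf. section 4.1)
        N = len(P)
        s = [sum(covers(f, g) for g in FP) / (N + 1) for (_, f) in P0]
        fit_P = [1.0 + sum(si for si, (_, f) in zip(s, P0) if covers(f, g))
                 for g in FP]
        # Step 7: binary tournaments over the multiset union P + P0
        candidates = list(zip(P, fit_P)) + [(x, si) for (x, _), si in zip(P0, s)]
        pool = []
        while len(pool) < pop_size:
            a, b = rng.choice(candidates), rng.choice(candidates)
            pool.append(a[0] if a[1] <= b[1] else b[0])
        # Steps 8+9: fixed-step Gaussian mutation creates the new population
        P = [x + rng.gauss(0.0, 0.2) for x in pool]
    return P0                              # Step 10: final nondominated archive
```

Running `spea_sketch()` returns an archive of mutually nondominated solutions clustered around the true Pareto set of the toy problem.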

4.1. Fitness assignment

In Step 6, all individuals in P and P0 are assigned a scalar fitness value. This is accomplished in the following two-stage process. First, all members of the nondominated set P0 are ranked. Afterwards, the individuals in the population P are assigned their fitness values.

Step 1: Each solution i ∈ P0 is assigned a real value si ∈ [0, 1), called its strength. si is proportional to the number of population members j ∈ P for which i ⪰ j. Let n denote the number of individuals in P that are covered by i, and let N be the size of P. Then si is defined as si = n/(N + 1). The fitness fi of i is equal to its strength: fi = si ∈ [0, 1).

Step 2: The fitness of an individual j ∈ P is calculated by summing the strengths of all external nondominated solutions i ∈ P0 that cover j. One is added to this sum to guarantee that members of P0 always have better fitness than members of P (note that the fitness is to be minimized):

fj = 1 + Σ_{i ∈ P0, i ⪰ j} si
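The two-stage assignment can be checked on a tiny worked example. A sketch under the maximization convention of Definition 1 (function names are ours; `front_objs` plays the role of the objective vectors of P0, `pop_objs` those of P):

```python
def _covers(fa, fb):
    """a ⪰ b under maximization: a dominates b or f(a) = f(b)."""
    return (all(x >= y for x, y in zip(fa, fb))
            and (any(x > y for x, y in zip(fa, fb)) or fa == fb))

def strengths(front_objs, pop_objs):
    """Step 1: strength si = n/(N+1), where n counts the population
    members covered by external solution i and N is the population size."""
    N = len(pop_objs)
    return [sum(_covers(f, g) for g in pop_objs) / (N + 1) for f in front_objs]

def population_fitness(front_objs, pop_objs):
    """Step 2: fitness of j in P is 1 plus the summed strengths of the
    external solutions covering j; lower fitness is better."""
    s = strengths(front_objs, pop_objs)
    return [1.0 + sum(si for si, f in zip(s, front_objs) if _covers(f, g))
            for g in pop_objs]
```

With a front {(4, 4), (1, 5)} and a population of N = 4 individuals, (4, 4) covers three of them and (1, 5) covers one, giving strengths 3/5 and 1/5, so the fitness of members of P0 (< 1) always beats the fitness of members of P (≥ 1).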

4.2. Selection and step size adaptation

Step 7 requires an algorithm for the selection of individuals into the mating pool and Step 8 includes some method for dynamical adaptation of step sizes (i.e. mutation variances). For this paper, selection was done using the following binary tournament procedure:

Step 1: Randomly (uniformly distributed random numbers) select two individuals out of the population P.

Step 2: Copy the one with the better (i.e. lower for SPEA) tness value to the mating pool.

Step 3: If the mating pool is full, then stop, else go to Step 1.

Adaptation of the step sizes was done using the self-adaptive mutation method (cf. section 2.3). Each element of P and P0 is assigned an individual step size for every parameter, i.e. C = diag(σ_i²) is a diagonal matrix for each individual. The step sizes of all members of the mating pool are then either increased by 50%, cut in half, or kept the same, each with probability 1/3.

4.3. Reduction by clustering

In Step 5, the number of externally stored nondominated solutions is limited to some number N0. This is necessary because otherwise P0 would grow to infinity, since there is always an infinite number of points along the Pareto-front. Moreover, one wants to be able to control the number of proposed solutions because, from a decision maker's point of view, a few points along the front are often enough. A third reason for introducing clustering is the distribution of solutions along the Pareto-front. In order to explore as much of the front as possible, the nondominated members of P0 should be equally distributed along it. Without clustering, the fitness assignment method would probably be biased towards a certain region of the search space, leading to an unbalanced distribution of the solutions. For this work, the average linkage method, a clustering algorithm which has been shown to perform well on Pareto optimization, was chosen. The reader is referred to Morse (1980) or Zitzler & Thiele (1999) for details.
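The pruning step can be sketched as follows. This is an illustrative, brute-force version of average linkage clustering (names are ours; `dist` is any distance in objective space): repeatedly merge the two clusters with minimal average inter-cluster distance until N0 clusters remain, then keep one central member per cluster:

```python
def prune_by_clustering(front, n_max, dist):
    """Reduce `front` to at most n_max representatives by average-linkage
    clustering, preserving the spread of solutions along the front."""
    clusters = [[p] for p in front]
    while len(clusters) > n_max:
        best = None  # (average distance, i, j) of the closest cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = (sum(dist(a, b) for a in clusters[i] for b in clusters[j])
                     / (len(clusters[i]) * len(clusters[j])))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))   # merge the closest pair
    # keep, per cluster, the member with minimal total distance to the rest
    return [min(c, key=lambda p: sum(dist(p, q) for q in c)) for c in clusters]
```

Because merges always consume the closest pair, densely sampled regions of the front collapse into single representatives while isolated points survive, which is exactly the spread-preserving behavior wanted here.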

5. Strength Pareto approach with targeting

Compared to other methods such as, for example, the energy minimization evolutionary algorithm (EMEA) (cf. Jonathan, Zebulum, Pacheco & Vellasco (2000)), the SPEA has two major advantages: it finds the whole Pareto-front and not just a single point on it, and it converges faster. The latter is a universal advantage whereas the former is not. There are applications where a target value can be specified. One then wants to find the point on the Pareto-front which is closest to the user-specified target (in objective space). This eliminates the need to analyze all the points found by SPEA in order to make a decision. EMEA offers such a possibility, but it converges more slowly than SPEA and, what is more, it is fundamentally unable to find more than one point per run. Hence we wish to extend SPEA with a targeting facility that can be switched on and off, depending on whether one is looking for a single solution or the whole front. We added this capability to SPEA by the following changes to the algorithm:

2. Another external storage Pbest is added which always contains the individual out of P0 which is closest to the target. Therefore, between Steps 4 and 5, the algorithm calculates the distances of all members of P0 to the target and picks the one with minimal distance into Pbest. At all times, Pbest contains only one solution.

3. At the end of the algorithm, not only the Pareto-front is output but also the solution stored in Pbest. Note: due to clustering and removal in P0, the solution in Pbest is not necessarily contained in P0. It is, therefore, an optimal solution which otherwise would not have appeared in the output.
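The per-generation Pbest update described in change 2 can be sketched as follows (a hedged illustration with our own names; `P0` is a list of objective vectors and `best` the incumbent, kept across generations so that a solution later pruned from P0 still survives in Pbest):

```python
import math

def update_best(P0, target, best=None):
    """Keep the single solution whose objective vector is closest
    (Euclidean distance) to the user-specified target."""
    def d(f):
        return math.dist(f, target)   # Euclidean distance in objective space
    candidate = min(P0, key=d)        # closest member of the current front
    if best is None or d(candidate) < d(best):
        return candidate              # incumbent replaced: a "jump" of Pbest
    return best
```

Calling this between Steps 4 and 5 of each generation reproduces the jumping behavior seen in Fig. 4: Pbest changes exactly when a newly found front member is closer to the target than the stored incumbent.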

The chosen target value is slightly off the front. Therefore, the targeting error will never be zero. Figure 1 shows the final population after 250 generations without targeting. The diamonds indicate members of the external nondominated set (Pareto-optimal front), whereas members of the regular population are denoted by crosses. In Fig. 2 the same run has been repeated with targeting. Figure 3 shows the targeting error as a function of the generation number. The dashed line indicates the theoretical minimum of the distance. After about 80 to 100 generations, the point on the front which is closest to the target has been found with good accuracy. Figure 4 shows the path of Pbest towards the target. The jumps are due to the fact that the individual stored in Pbest gets replaced as soon as another individual is closer to the target.

The best objective value achieved was f(Pbest) = (0.5265, 0.7247); its Euclidean distance from the target is 3.6287 × 10⁻², which is equal to the theoretical minimal distance within the given computational accuracy.

6. Microchannel flow optimization

Both single and multiobjective EAs have been applied to a fluidic microchannel design problem. Bio-analytical applications require long thin channels for DNA sequencing by means of electrophoresis. In order to pack a channel of several meters in length onto a small square plate, curved geometries are required. However, curved channels introduce dispersion and, therefore, limit the separation efficiency of the system. The question is how to shape the contour of the channel in order to minimize dispersion. A detailed description of the problem as well as an optimization solution using gradient methods can be found in Mohammadi, Molho & Santiago (2000).

6.1. Single objective optimization

The goal of this optimization run was to minimize the final skewness of the flow inside the channel, i.e. it was required that the iso-values of the advected species a be normal to the flow field U by the time T when they exit the channel. The objective function defined by Mohammadi, Molho & Santiago (2000) is, therefore:

The calculation of the flow field and the evaluation of the objective function were done by an external flow solver provided by Mohammadi, Molho & Santiago (2000). Both a (1+1)-ES and a (3,12)-CMA-ES were applied to the problem and their convergence was compared. The results were statistically averaged over 5 runs with different initial conditions, i.e. starting points.

Since the CMA-ES has a population size of 12, it performs 12 function evaluations per generation. Figure 5 shows the convergence normalized to the same number of function calls. Figures 6 and 7 show the corresponding solutions after 20 and 180 generations of the best (1+1) run out of the ensemble (the lines are iso-potential lines of the electric field). After 20 generations, the contour of the channel acquires a clearly visible dent. After 80 evaluations of the objective function, the algorithm has found a double-bump shape to be even better, and after 180 calls to the solver, the optimum has been reached. The value of the objective function has dropped to about 10⁻⁶ for the best run out of the ensemble. This means that dispersion is almost zero and the channel will have very good separation properties.

6.2. Multiobjective optimization

We then introduced the total deformation of the channel contour as a second objective to be minimized simultaneously, in order to reduce manufacturing costs. The second objective was thus given by:

Figure 8 shows the Pareto-optimal trade-off front after 80 generations of the algorithm, and Figs. 9 and 10 show the corresponding solutions, i.e. optimized shapes of the channel. One is now free to choose whether to go for minimal skewness at the expense of a higher deformation (cf. Fig. 9), choose some intermediate result, or minimize deformation in order to minimize manufacturing costs and still get the lowest skewness possible with the given amount of deformation (cf. Fig. 10).

The results obtained with evolutionary optimization are comparable to those of the gradient based method. However, far less mathematics and far fewer complex formulas were involved, which leads to greater flexibility and a shorter "time-to-solution".

7. Conclusions and future work

Single and multiobjective evolutionary algorithms have been implemented and assessed. The SPEA has successfully been extended to support targeting in objective space. It has been shown that these algorithms are easy to apply to fluid dynamical problems and that their solutions are comparable to those found by gradient based methods. In cases where gradient methods cannot be applied, or where they would involve overly complex mathematical calculations, evolution strategies are a good alternative for solving an optimization problem or for reducing the time needed to do so as part of hybrid processes.

Future and present work addresses the acceleration of convergence of these algorithms and their implementation in hybrid processes.