By Richard Baker
Source: http://www.fenews.com/fen5/ga.html
The past decade has
witnessed a flurry of interest within the financial industry regarding
artificial intelligence technologies, including neural networks, fuzzy systems,
and genetic algorithms. In many ways, genetic algorithms, and the extension of
genetic programming, offer an outstanding combination of flexibility,
robustness, and simplicity. The following discussion highlights some of the key
features of genetic algorithms (GAs), and illustrates
an application of a particular GA in the search and estimation of global
optima. Virtually every technical discipline, from science and engineering to
finance and economics, frequently encounters problems of optimization. Although
optimization techniques abound, such techniques often involve identifying, in
some fashion, the values of a sequence of explanatory parameters associated
with the best performance of an underlying dependent, scalar-valued objective
function. For example, in simple 3-D space, this amounts to finding the (x,y) point associated with the
optimal z value above or below the x-y plane, where the scalar-valued z is a
surface identified by the objective function f(x,y).
Or it may involve estimating a large number of parameters of a more elaborate
econometric model. For example, we might wish to estimate the coefficients of a
generalized auto-regressive conditional heteroskedastic
(GARCH) model, in which the log-likelihood function is the objective to
maximize. In each case, the unknowns may be thought of as a parameter vector,
V, and the objective function, z = f(V), as a
transformation of a vector-valued input to a scalar-valued performance metric
z. Optimization may take the form of a minimization or maximization procedure.
Throughout this article, optimization will refer to maximization without loss
of generality, because maximizing f(V) is the same as
minimizing -f(V). My preference for maximization is simply intuitive: Genetic
algorithms are based on evolutionary processes, and it is natural to think of evolution as maximizing fitness.
A natural question for
those unfamiliar with GAs might be: Given all the
powerful numerical optimization methods available, such as deterministic
gradient-based and simplex-based search methods, why would I be interested in a
stochastic alternative like a GA? The answer is, at least in part, dependent
upon the application. If the optimization problems you encounter are reasonably
well-behaved, then conventional deterministic techniques are clearly the best
choice. But what if the problem is ill-behaved? Or, put another way, what are
some of the features that make an optimization problem difficult? Well,
difficulties arise when the estimation involves many parameters that interact
in highly non-linear ways. Objective functions characterized by many local
optima, expansive flat planes in multi-dimensional space, points at which
gradients are undefined, or when the objective function is discontinuous, pose
difficulty for traditional mathematical techniques. In these situations,
heuristic methods like GAs offer a powerful
alternative, and can greatly enhance the set of tools available to researchers.
Throughout the following discussion, try to think of GAs
as enhancements to a suite of optimization techniques, regarding them as
alternatives to more traditional methods, and not as replacements for
conventional numerical optimizers. In other words, GAs
should be used to complement familiar numerical techniques, and are not meant
to replace or subsume them.
It's important to note
that, despite all the surrounding hype, the successful design and use of a GA
is as much art as science. GA theory is elegant, fascinating, and full of
subtle variations that may - or may not - be useful in certain applications.
One of the most powerful
features of GAs is their elegant simplicity. In fact,
the simple power of a genetic approach may be harnessed with little more than a
few lines of computer code and a random number generator. There are no closed
form gradients to derive, and no differentiability or continuity requirements
to satisfy. A GA is, in essence, nothing more than a repeated application of a
few simple operations. Moreover, a GA knows nothing about the problem at hand,
nor does it care. In fact, GAs need not know how to
solve a problem, they just need to recognize a good
solution when they see it!
GAs begin by randomly generating, or seeding, an initial
population of candidate solutions. Each candidate is an individual member of a
large population of size M. For the purposes of this discussion, think of each
individual as a row vector composed of N elements. In GA parlance, individuals
are often referred to as chromosomes, and the vector elements as genes. Each
gene will provide storage for, and may be associated with, a specific parameter
of the search space. As a simple example of two unknowns, we may think of each
individual as a parameter vector V, in which each V contains a point in the
x-y plane, V = [x y]. With this example in mind, the entire population may be
stored and manipulated as a two-column matrix: The first column represents a
population of x-axis values, and the second column a population of y-axis values,
and each of the M rows of the matrix is a solution vector V of length N = 2.
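As a concrete sketch of this seeding step (shown here in Python with NumPy purely for illustration; the article's own example uses MATLAB), the bounds -3 < x, y < 3 anticipate the PEAKS example later in the article:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded for reproducibility

M, N = 20, 2                     # population size, genes per chromosome
lo, hi = -3.0, 3.0               # assumed search-space bounds for x and y

# Each row is one chromosome V = [x y]; the population is an M-by-N matrix.
population = lo + (hi - lo) * rng.random((M, N))

print(population.shape)          # (20, 2)
```

Because the population is just a matrix, the fitness of all M members can later be evaluated with a single vectorized expression.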
The individuals
(chromosomes) in genetic algorithms are usually constant-length character
sequences (vectors V of constant size). In the traditional GA, these vectors
are usually sequences of zeros and ones, but in practice may be anything,
including a mix of integers and real numbers, or even a mix of numbers and
character strings. The actual data encoded in the vectors is called the
representation scheme. To keep the discussion simple and concrete, the
chromosomes in this article will be real, continuous parameters with two
elements, V = [x y]. Given these two-element chromosomes, the objective is to
search for the (x,y) point
that maximizes the scalar-valued fitness function z = f(V) = f(x,y). Starting with the initial random population of
vectors, a GA then applies a sequence of operations to the population, guided
only by the relative fitness of the individuals, and allows the population to
evolve over a number of generations. The goal of the evolutionary process is to
continually improve the fitness of the best solution vector, as well as the
average population fitness, until some termination criterion is satisfied.
Conventional GAs usually apply a sequence of
operations to the population based on the relative fitness of the members. The
operations typically involve reproduction, in which individuals are randomly
selected to survive into the next generation. In reproduction, highly fit
individuals are more likely to be selected than unfit members. The idea behind
reproduction is to allow relatively fit members to survive and procreate, with
the hope that their genes are truly associated with better performance. Note
that reproduction is asexual, in that only a single individual is involved.
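A minimal sketch of this fitness-proportional ("roulette-wheel") selection follows; it assumes strictly positive fitness values (negative fitnesses would first need shifting):

```python
import numpy as np

def reproduce(population, fitness, rng):
    """Draw M survivors with probability proportional to fitness."""
    probs = fitness / fitness.sum()          # assumes all fitnesses are positive
    M = len(population)
    picks = rng.choice(M, size=M, p=probs)   # fitter rows are drawn more often
    return population[picks]

rng = np.random.default_rng(1)
pop = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
fit = np.array([1.0, 1.0, 10.0])             # third member is far fitter
survivors = reproduce(pop, fit, rng)         # fit members survive more often
```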
The next operation,
crossover, is a sexual operation involving two (or even more!) individuals. In
crossover, highly fit individuals are more likely to be selected to mate and
produce children than unfit members. In this manner, highly fit vectors are
allowed to breed, with the hope that they will produce ever more fit offspring.
Although the crossover operation may take many forms, it typically involves
splitting each parent chromosome at a randomly-selected point within the
interior of the chromosome, and rearranging the fragments so as to produce
offspring of the same size as the parents. The children usually differ from
each other, and are usually different from the parents. The effect of crossover
is to build upon the success of the past, yet still explore new areas of the
search space. Crossover is a slice-and-dice operation which efficiently
shuffles genetic information from one generation to the next.
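A sketch of single-point crossover for generic length-N vectors follows (with only two genes, the sole interior split point is between x and y, so the children simply swap y values):

```python
import numpy as np

def crossover(parent_a, parent_b, rng):
    """Split both parents at a random interior point and swap the tails."""
    N = len(parent_a)
    cut = rng.integers(1, N)                 # interior split point: 1 .. N-1
    child1 = np.concatenate([parent_a[:cut], parent_b[cut:]])
    child2 = np.concatenate([parent_b[:cut], parent_a[cut:]])
    return child1, child2

rng = np.random.default_rng(2)
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([5.0, 6.0, 7.0, 8.0])
c1, c2 = crossover(a, b, rng)                # children mix genes from a and b
```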
The next operation is
mutation, in which individuals are slightly changed. In our case, this means
that the (x,y) parameters of
the chromosome are slightly perturbed with some small probability. Mutation is
an asexual operation, and usually plays a relatively minor role in the
population. The idea behind mutation is to restore genetic diversity lost
during the application of reproduction and crossover, both of which place
relentless pressure on the population to converge.
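Mutation can be sketched as a sparse Gaussian perturbation; the rate and scale below are illustrative assumptions, not prescribed values:

```python
import numpy as np

def mutate(population, rate=0.05, scale=0.1, rng=None):
    """Perturb each gene with small probability `rate` by Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(population.shape) < rate        # which genes mutate
    noise = rng.normal(0.0, scale, population.shape)  # small perturbations
    return population + mask * noise

pop = np.array([[0.0, 0.0], [1.0, 1.0]])
unchanged = mutate(pop, rate=0.0, rng=np.random.default_rng(3))  # no-op at rate 0
```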
The relentless pressure
towards convergence is driven by fitness, and offers a
convenient termination criterion. In fact, after many generations of
evolution via the repeated application of reproduction, crossover, and
mutation, the individuals in the population will often begin to look alike, so
to speak. At this point, the GA typically terminates because additional
evolution will produce little improvement in fitness. Many termination criteria
may be used, the simplest of which is to just stop after some predetermined
number of generations. See what I mean about part art, part science?
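Pulling the pieces together, a complete generational cycle with the simplest fixed-count termination rule can be sketched as follows. This is an illustrative toy GA (not the Price/Storn algorithm discussed later), and the operator rates are assumptions:

```python
import numpy as np

def evolve(fitness_fn, M=20, N=2, lo=-3.0, hi=3.0, generations=50, seed=0):
    """Toy generational GA: seed, then select/cross/mutate for a fixed count."""
    rng = np.random.default_rng(seed)
    pop = lo + (hi - lo) * rng.random((M, N))
    for _ in range(generations):                      # simplest termination rule
        fit = np.apply_along_axis(fitness_fn, 1, pop)
        fit = fit - fit.min() + 1e-9                  # shift so weights are positive
        pop = pop[rng.choice(M, size=M, p=fit / fit.sum())]  # reproduction
        for i in range(0, M - 1, 2):                  # single-point crossover
            cut = rng.integers(1, N)
            pop[i, cut:], pop[i + 1, cut:] = \
                pop[i + 1, cut:].copy(), pop[i, cut:].copy()
        mask = rng.random(pop.shape) < 0.05           # sparse Gaussian mutation
        pop = np.clip(pop + mask * rng.normal(0, 0.1, pop.shape), lo, hi)
    return pop[np.argmax(np.apply_along_axis(fitness_fn, 1, pop))]

# Toy fitness surface: a single peak at (1, 2).
best = evolve(lambda v: -((v[0] - 1) ** 2 + (v[1] - 2) ** 2))
```

Note the fixed generation count: swapping in a convergence-based stopping rule is exactly the kind of design decision the article calls part art, part science.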
In contrast to more
traditional numerical techniques, which iteratively refine a single solution
vector as they search for optima in a multi-dimensional landscape, genetic
algorithms operate on entire populations of candidate solutions in parallel. In
fact, the parallel nature of a GA's stochastic search is one of the main
strengths of the genetic approach. This parallel nature implies that GAs are much more likely to locate
a global peak than traditional techniques, because they are much less likely to
get stuck at local optima. Also, due to the parallel nature of the stochastic
search, the performance is much less sensitive to initial conditions, and hence
a GA's convergence time is rather predictable. In fact, the risk of
settling on a merely local optimum is greatly reduced because GAs,
in effect, make hundreds, or even thousands, of
initial guesses. This implies that a GA's performance is at least as good as a
purely random search. In fact, by simply seeding an initial population and
stopping there, a GA without any evolutionary progression is essentially a purely random search.
Furthermore, due to the
stochastic nature of a GA, the solution, although more likely to estimate the
global optimum, will only be an estimate. Users must realize that GAs will only by chance find an exact optimum, whereas
traditional methods will find it exactly, assuming, of course, they find it at all. The user must then
determine whether the solution found by a GA is close enough. In many cases it will be, but the question of 'How close is
close enough?' is somewhat arbitrary and application-dependent.
To build on the notation
developed earlier, consider an example involving the optimization of a real-valued
fitness function of two variables, z = f(V) = f(x,y). That is, we are to estimate the 2-D parameter vector
V = [x y] of the (x,y) point
associated with the maximum z value. This problem is very instructive, because
it allows the user to visualize the entire evolutionary process as an animated
sequence in 3-D space, and provides a very intuitive understanding of the
adaptive, self-improving nature of a GA in action.
Furthermore, the example is
motivated by a desire to help users integrate the stochastic search
capabilities of a GA with traditional gradient-based search methods.
Specifically, I want to emphasize the complementary nature of the two
fundamentally different approaches.
The entire optimization
example was developed in MATLAB. Although the analysis could have been
performed with any of a host of software packages, MATLAB's
integrated graphics and matrix-based processing capabilities make it ideal for
GA work. In fact, since GAs are inherently parallel,
simultaneously processing populations of parameter vectors in MATLAB is almost
trivial and significantly reduces the development time and the amount of code
required.
The particular GA I chose,
specifically designed to optimize fitness functions of continuous parameters, was co-developed by Ken Price and Rainer Storn.
Although the details of this particular GA cannot be addressed in this article,
I refer interested readers to the April 1997 edition of Dr. Dobb's
Journal.
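To convey the flavor of the method: differential evolution perturbs each member with scaled difference vectors drawn from the population itself. A minimal sketch of the classic DE/rand/1/bin scheme follows; the control parameters F and CR below are common defaults, not values taken from the article:

```python
import numpy as np

def de_maximize(f, lo, hi, M=20, F=0.8, CR=0.9, generations=100, seed=0):
    """Maximize f over the box [lo, hi] with the DE/rand/1/bin scheme."""
    rng = np.random.default_rng(seed)
    N = len(lo)
    pop = lo + (hi - lo) * rng.random((M, N))
    fit = np.array([f(v) for v in pop])
    for _ in range(generations):
        for i in range(M):
            # Three distinct members, none equal to the target i.
            r1, r2, r3 = rng.choice([j for j in range(M) if j != i], 3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])   # difference-vector mutation
            cross = rng.random(N) < CR                   # binomial crossover mask
            cross[rng.integers(N)] = True                # force one mutant gene
            trial = np.clip(np.where(cross, mutant, pop[i]), lo, hi)
            f_trial = f(trial)
            if f_trial > fit[i]:                         # greedy one-to-one survival
                pop[i], fit[i] = trial, f_trial
    return pop[fit.argmax()], fit.max()

# Toy usage: a single Gaussian bump centered at (0.5, -0.5).
best, best_fit = de_maximize(
    lambda v: np.exp(-np.sum((v - np.array([0.5, -0.5])) ** 2)),
    lo=np.array([-3.0, -3.0]), hi=np.array([3.0, 3.0]))
```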
The fitness function for
this example is the MATLAB demonstration function PEAKS shown in Figure 1;
PEAKS is a function of two variables obtained by translating and scaling
several Gaussian distributions. Although PEAKS is a smooth function of only two
variables, it nevertheless has three local maxima, and thus represents a simple
function for which the parameters estimated by a gradient-based search method
are sensitive to the initial guess of the solution vector. The gradient-based
optimization function used is the FMINU function of the MATLAB Optimization
Toolbox. Note that FMINU is a minimization routine, so to maximize the PEAKS
function, FMINU actually needs to minimize -PEAKS.
Figure 1
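For reference, PEAKS has a documented closed form; a direct translation (shown here in Python, an assumption for readers without MATLAB) lets the numerical results below be checked independently:

```python
import numpy as np

def peaks(x, y):
    """Translation of MATLAB's peaks(x, y) demonstration surface."""
    return (3 * (1 - x) ** 2 * np.exp(-x ** 2 - (y + 1) ** 2)
            - 10 * (x / 5 - x ** 3 - y ** 5) * np.exp(-x ** 2 - y ** 2)
            - np.exp(-(x + 1) ** 2 - y ** 2) / 3)

print(round(peaks(-0.0093, 1.5814), 4))   # 8.1062, the global maximum cited below
```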
To illustrate the example,
set an initial vector estimate V0 = [-1 -1]. FMINU returns a solution vector V
= [-0.4601 -0.6292] associated with the local maximum z = f(-0.4601,-0.6292)
= 3.7766. If we start at V0 = [1 0], FMINU returns a solution vector V =
[1.2857 -0.0048] associated with the local maximum z = f(1.2857,-0.0048)
= 3.5925. Finally, if we start at V0 = [2 2], FMINU
returns the solution vector V = [-0.0093 1.5814] associated with the true
global maximum z = f(-0.0093, 1.5814) = 8.1062.
The gradient-based method
works extremely well, provided it knows where to start. In this simple case we
can visually determine the location of the true global peak, but in problems of
much higher dimension there is frequently no such luxury.
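This start-dependence is easy to reproduce; here is a hedged sketch using SciPy's general-purpose minimizer (an assumed stand-in for FMINU) on the negated surface:

```python
import numpy as np
from scipy.optimize import minimize

def neg_peaks(v):
    """Negative of the peaks surface: minimizing this maximizes peaks."""
    x, y = v
    return -(3 * (1 - x) ** 2 * np.exp(-x ** 2 - (y + 1) ** 2)
             - 10 * (x / 5 - x ** 3 - y ** 5) * np.exp(-x ** 2 - y ** 2)
             - np.exp(-(x + 1) ** 2 - y ** 2) / 3)

# Which maximum the search finds depends entirely on the starting guess.
for v0 in ([-1.0, -1.0], [1.0, 0.0], [2.0, 2.0]):
    res = minimize(neg_peaks, v0)            # quasi-Newton (BFGS) by default
    print(v0, "->", res.x.round(4), round(-res.fun, 4))
```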
Now consider first running
the Price/Storn GA to provide FMINU with an initial
estimate. The initial population was seeded with only 20 individual chromosomes
(20 (x,y) points randomly
generated such that -3 < x,y < 3), and allowed
to evolve for only 10 generations, producing the parameter estimate V0 =
[0.1205 1.6551]. This estimate is then passed to FMINU. The result, as
expected, is now the true global maximum z = f(-0.0093, 1.5814) = 8.1062.
Summary
Stochastic techniques can
expand the optimization tools now available to researchers in a way that
enhances and complements the familiar, traditional methods. Genetic approaches,
in particular, are now available to optimize difficult, ill-behaved objective
functions that often prove difficult for many conventional methods.
Furthermore, these genetic approaches are often simple to design and easy to
code, and can be used in concert with traditional methods to greatly increase the
probability of finding the true global optimum of multi-dimensional functions.
References
1. Dhar, V., and Stein, R. (1997). Seven Methods for Transforming Corporate Data into Business Intelligence. Prentice-Hall.
2. Epstein, J.M., and Axtell, R. (1996). Growing Artificial Societies: Social Science from the Bottom Up. MIT Press.
3. Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley Longman.
4. Koza, J.R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.
5. Koza, J.R. (1994). Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press.
6. Price, K., and Storn, R. (April 1997). Differential Evolution: A Simple Evolution Strategy for Fast Optimization. Dr. Dobb's Journal, 264, pp. 18-24.