The Ant System: Optimization by a colony of cooperating agents

Авторы: Marco Dorigo, Member, IEEE,Vittorio Maniezzo and Alberto Colorni Источник: IJ.10-SMC96.pdf

The Ant System: Optimization by a colony of cooperating agents

Marco Dorigo, Member, IEEE,Vittorio Maniezzo and Alberto Colorni

Introduction

In this paper we define a new general-purpose heuristic algorithm which can be used to solve different combinatorial optimization problems. The new heuristic has the following desirable characteristics:

It is versatile, in that it can be applied to similar versions of the same problem; for example, there is a straightforward extension from the traveling salesman problem (TSP) to the asymmetric traveling salesman problem (ATSP).

It is robust. It can be applied with only minimal changes to other combinatorial optimization problems such as the quadratic assignment problem (QAP) and the job-shop scheduling problem (JSP).

It's population based approach. This is interesting because it allows the exploitation of positive feedback as a search mechanism, as explained later in the paper. It also makes the system amenable to parallel implementations (though this is not considered in this paper).

These desirable properties are counterbalanced by the fact that, for some applications, the Ant System can be outperformed by more specialized algorithms. This is a problem shared by other popular approaches like simulated annealing (SA), and tabu search (TS), with which we compare the Ant System. Nevertheless, we believe that, as is the case with SA and TS, our approach is meaningful in view of applications to problems which, although very similar to well known and studied basic problems, present peculiarities which make the application of the standard best-performing algorithm impossible. This is the case, for example, with the ATSP.

In the approach discussed in this paper we distribute the search activities over so-called "ants," that is, agents with very simple basic capabilities which, to some extent, mimic the behavior of real ants. In fact, research on the behavior of real ants has greatly inspired our work (see [10], [11], [21]). One of the problems studied by ethologists was to understand how almost blind animals like ants could manage to establish shortest route paths from their colony to feeding sources and back. It was found that the medium used to communicate information among individuals regarding paths, and used to decide where to go, consists of pheromone trails. A moving ant lays some pheromone (in varying quantities) on the ground, thus marking the path by a trail of this substance. While an isolated ant moves essentially at random, an ant encountering a previously laid trail can detect it and decide with high probability to follow it, thus reinforcing the trail with its own pheromone. The collective behavior that emerges is a form of autocatalyticbehavior1 where the more the ants following a trail, the more attractive that trail becomes for being followed. The process is thus characterized by a positive feedback loop, where the probability with which an ant chooses a path increases with the number of ants that previously chose the same path.

Consider for example the experimental setting shown in Fig. 1. There is a path along which ants are walking (for example from food source A to the nest E, and vice versa, see Fig. 1a). Suddenly an obstacle appears and the path is cut off. So at position B the ants walking from A to E (or at position D those walking in the opposite direction) have to decide whether to turn right or left (Fig. 1b). The choice is influenced by the intensity of the pheromone trails left by preceding ants. A higher level of pheromone on the right path gives an ant a stronger stimulus and thus a higher probability to turn right. The first ant reaching point B (or D) has the same probability to turn right or left (as there was no previous pheromone on the two alternative paths). Because path BCD is shorter than BHD, the first ant following it will reach D before the first ant following path BHD (Fig. 1c). The result is that an ant returning from E to D will find a stronger trail on path DCB, caused by the half of all the ants that by chance decided to approach the obstacle via DCBA and by the already arrived ones coming via BCD: they will therefore prefer (in probability) path DCB to path DHB. As a consequence, the number of ants following path BCD per unit of time will be higher than the number of ants following BHD. This causes the quantity of pheromone on the shorter path to grow faster than on the longer one, and therefore the probability with which any single ant chooses the path to follow is quickly biased towards the shorter one. The final result is that very quickly all ants will choose the shorter path.

The algorithms that we are going to define in the next sections are models derived from the study of real ant colonies. Therefore we call our system Ant System(AS) and the algorithms we introduce ant algorithms. As we are not interested in simulation of ant colonies, but in the use of artificial ant colonies as an optimization tool, our system will have some major differences with a real (natural) one:

artificial ants will have some memory,

they will not be completely blind,

they will live in an environment where time is discrete.

Fig. 1. An example with real ants.

Nevertheless, we believe that the ant colony metaphor can be useful to explain our model. Consider the graph of Fig. 2a, which is a possible AS interpretation of the situation of Fig. 1b. To fix the ideas, suppose that the distances between D and H, between B and H, and between B and D-via C-are equal to 1, and let C be positioned half the way between D and B (see Fig. 2a). Now let us consider what happens at regular discretized intervals of time: t=0, 1, 2

Suppose that 30 new ants come to B from A, and 30 to D from E at each time unit, that each ant walks at a speed of 1 per time unit, and that while walking an ant lays down at time t a pheromone trail of intensity 1, which, to make the example simpler, evaporates completely and instantaneously in the middle of the successive time interval (t+1, t+2). At t=0 there is no trail yet, but 30 ants are in B and 30 in D. Their choice about which way to go is completely random. Therefore, on the average 15 ants from each node will go toward H and 15 toward C (Fig. 2b). At t=1 the 30 new ants that come to B from A find a trail of intensity 15 on the path that leads to H, laid by the 15 ants that went that way from B, and a trail of intensity 30 on the path to C, obtained as the sum of the trail laid by the 15 ants that went that way from B and by the 15 ants that reached B coming from D via C (Fig. 2c). The probability of choosing a path is therefore biased, so that the expected number of ants going toward C will be the double of those going toward H: 20 versus 10 respectively. The same is true for the new 30 ants in D which came from E.

This process continues until all of the ants will eventually choose the shortest path.

Fig. 2. An example with artificial ants.

The idea is that if at a given point an ant has to choose among different paths, those which were heavily chosen by preceding ants (that is, those with a high trail level) are chosen with higher probability. Furthermore high trail levels are synonymous with short paths.

The Ant System

In this section we introduce the AS. We decided to use the well-known traveling salesman problem [26] as benchmark, in order to make the comparison with other heuristic approaches easier [20]. Although the model definition is influenced by the problem structure, we will show in Section VII that the same approach can be used to solve other optimization problems.

Given a set of n towns, the TSP can be stated as the problem of finding a minimal length closed tour that visits each town once. We call dij the length of the path between towns i and j; in the case of Euclidean TSP, dij is the Euclidean distance between i and j (i.e., ). An instance of the TSP is given by a graph (N,E), where N is the set of towns and E is the set of edges between towns (a fully connected graph in the Euclidean TSP).

Let be the number of ants in town i at time t and let m be the total number of ants. Each ant is a simple agent with the following characteristics:

it chooses the town to go to with a probability that is a function of the town distance and of the amount of trail present on the connecting edge;

to force the ant to make legal tours, transitions to already visited towns are disallowed until a tour is completed (this is controlled by a tabu list);

when it completes a tour, it lays a substance called trail on each edge (i,j) visited. Let

be the intensity of trail on edge (i,j) at time t. Each ant at time t chooses the next town, where it will be at time t+1. Therefore, if we call an iteration of the AS algorithm the m moves carried out by the m ants in the interval (t, t+1), then every n iterations of the algorithm (which we call a cycle) each ant has completed a tour. At this point the trail intensity is updated according to the following formula

(1)
where

is a coefficient such that (1 -

) represents the evaporationof trail between time t and t+n,

(2)
where

is the quantity per unit of length of trail substance (pheromone in real ants) laid on edge (i,j) by the k-th ant between time t and t+n; it is given by

(3)
where Q is a constant and Lk is the tour length of the k-th ant.

The coefficient must be set to a value < 1 to avoid unlimited accumulation of trail (see note 1). In our experiments, we set the intensity of trail at time , , to a small positive constant c.

In order to satisfy the constraint that an ant visits all the n different towns, we associate with each ant a data structure called the tabu list2, that saves the towns already visited up to time t and forbids the ant to visit them again before n iterations (a tour) have been completed. When a tour is completed, the tabu list is used to compute the ant's current solution (i.e., the distance of the path followed by the ant). The tabu list is then emptied and the ant is free again to choose. We define tabuk the dynamically growing vector which contains the tabu list of the k- th ant, tabuk the set obtained from the elements of tabuk , and tabuk (s) the s-th element of the list (i.e., the s-th town visited by the k-th ant in the current tour).

We call visibility

the quantity 1/dij. This quantity is not modified during the run of the AS, as opposed to the trail which instead changes according to the previous formula (1). We define the transition probability from town i to town j for the k-th ant as

where allowedk = {N - tabuk} and where and are parameters that control the relative importance of trail versus visibility. Therefore the transition probability is a trade-off between visibility (which says that close towns should be chosen with high probability, thus implementing a greedy constructive heuristic) and trail intensity at time t (that says that if on edge (i,j) there has been a lot of traffic then it is highly desirable, thus implementing the autocatalytic process).

The algorithms

Given the definitions of the preceding section, the so-called ant-cyclealgorithm is simply stated as follows. At time zero an initialization phase takes place during which ants are positioned on different towns and initial values ij(0) for trail intensity are set on edges. The first element of each ant's tabu list is set to be equal to its starting town. Thereafter every ant moves from town i to town j choosing the town to move to with a probability that is a function (with parameters and see formula (4)) of two desirability measures. The first, the trail ij(t), gives information about how many ants in the past have chosen that same edge (i,j); the second, the visibility ij, says that the closer a town the more desirable it is. Obviously, setting = 0, the trail level is no longer considered, and a stochastic greedy algorithm with multiple starting points is obtained.

After n iterations all ants have completed a tour, and their tabu lists will be full; at this point for each ant k the value of Lk is computed and the values kj are updated according to formula (3). Also, the shortest path found by the ants (i.e., min Lk, k = 1, ..., m) is saved and all the tabu lists are emptied. This process is iterated until the k tour counter reaches the maximum (user-defined) number of cycles NCMAX , or all ants make the same tour. We call this last case because it denotes a situation in which the algorithm stops searching for alternative solutions.

Formally the ant-cyclealgorithm is:

1.Initialize:
	Set t:=0	{t is the time counter}
	Set NC:=0	{NC is the cycles counter}
	For every edge (i,j) set an initial value  =c for trail intensity and = 0 Place the m ants on the n nodes
	
2. Set s:=1	{s is the tabu list index}
   For k:=1 to m do
	    Place the starting town of the k-th ant in tabuk(s)

3. Repeat until tabu list is full	{this step will be repeated (n-1) times}
	Set s:=s+1
	For k:=1 to m do
		Choose the town j to move to, with probability pkj (t ) given by equation (4)
 									{at time t the k-th ant is on town i=tabuk (s-1)}
		Move the k-th ant to the town j
		Insert town j in tabuk (s)

4. For k:=1 to m do
	Move the k-th ant from tabuk (n) to tabuk (1)
	Compute the length Lk of the tour described by the k-th ant 
	Update the shortest tour found
   For every edge (i,j)
For k:=1 to m do


5. For every edge (i,j) compute  ij(t+n) according to equation  ij(t+n)= . ij(t)+  ij
	Set t:=t+n
	Set NC:=NC+1
	For every edge (i,j) set   ij:=0
 
6. If (NC < NCMAX) and (not stagnation behavior)
	then
		Empty all tabu lists 
		Goto step 2 
	else 
		Print shortest tour 
	Stop

Авторы: Marco Dorigo, Member, IEEE,Vittorio Maniezzo and Alberto Colorni Источник: IJ.10-SMC96.pdf

В библиотеку