Submitted 5/98; published 12/98

Gianni Di Caro
Marco Dorigo

IRIDIA, Université Libre de Bruxelles, 50 av. F. Roosevelt, CP 194/6, 1050 Brussels, Belgium
The characteristics of the routing problem (discussed in Section 2.2) make it well suited to be solved by a mobile multi-agent approach (Stone & Veloso, 1996; Gray, Kotz, Nog, Rus, & Cybenko, 1997). This processing paradigm is a good match for the distributed and non-stationary (in topology and traffic patterns) nature of the problem, provides a high level of redundancy and fault-tolerance, and can handle multiple objectives and constraints in a flexible way.
AntNet, the routing algorithm we propose in this paper, is a mobile agents system showing some essential features of parallel replicated Monte Carlo systems (Streltsov & Vakili, 1996). AntNet takes inspiration from previous work on artificial ant colony techniques to solve combinatorial optimization problems (Dorigo et al., 1991; Dorigo, 1992; Dorigo et al., 1996; Dorigo & Gambardella, 1997) and telephone network routing (Schoonderwoerd et al., 1996, 1997). The core ideas of these techniques (for a review see Dorigo, Di Caro, and Gambardella, 1998) are (i) the use of repeated and concurrent simulations carried out by a population of artificial agents called "ants" to generate new solutions to the problem, (ii) the use by the agents of stochastic local search to build the solutions in an incremental way, and (iii) the use of information collected during past simulations to direct future search for better solutions.
In the artificial ant colony approach, following an iterative process, each ant builds a solution by using two types of locally accessible information: problem-specific information (for example, the distance among cities in a traveling salesman problem), and information added by ants during previous iterations of the algorithm. In fact, while building a solution, each ant collects information on the problem characteristics and on its own performance, and uses this information to modify the representation of the problem, as seen locally by the other ants. The representation of the problem is modified in such a way that information contained in past good solutions can be exploited to build new, better solutions. This form of indirect communication mediated by the environment is called stigmergy, and is typical of social insects (Grassé, 1959).
In AntNet, we retain the core ideas of the artificial ant colony paradigm, and we apply them to solve the routing problem in datagram networks in an adaptive way.
Informally, the AntNet algorithm and its main characteristics can be summarized as follows.
AntNet is conveniently described in terms of two sets of homogeneous mobile agents (Stone & Veloso, 1996), called in the following forward and backward ants. Agents in each set possess the same structure, but they are differently situated in the environment; that is, they can sense different inputs and they can produce different, independent outputs. They can be broadly classified as deliberative agents because, although they behave reactively by retrieving a pre-compiled set of behaviors, at the same time they maintain a complete internal state description. Agents communicate in an indirect way, according to the stigmergy paradigm, through the information they concurrently read and write in two data structures stored in each network node k (see Figure 1):
i) A routing table T_k, organized as in vector-distance algorithms (see Appendix A), but with probabilistic entries. T_k defines the probabilistic routing policy currently adopted at node k: for each possible destination d and for each neighbor node n, T_k stores a probability value P_nd expressing the goodness (desirability), under the current network-wide routing policy, of choosing n as next node when the destination node is d:

$$\sum_{n \in N_k} P_{nd} = 1, \qquad d \in [1, N],\; N_k = \{\text{neighbors}(k)\}$$
ii) An array M_k of data structures defining a simple parametric statistical model of the traffic distribution over the network as seen by the local node k. The model is adaptive, and is described by sample means and variances computed over the trip times experienced by the mobile agents, and by a moving observation window W_d used to store the best value W_best_d of the agents' trip time.
For each destination d in the network, an estimated mean and variance, μ_d and σ²_d, give a representation of the expected time to go and of its stability. We used arithmetic, exponential, and windowed strategies to compute the statistics. Changing strategy does not affect performance much, but we observed the best results using the exponential model:
$$\mu_d \leftarrow \mu_d + \eta\,(o_{k \to d} - \mu_d), \qquad \sigma_d^2 \leftarrow \sigma_d^2 + \eta\,\big((o_{k \to d} - \mu_d)^2 - \sigma_d^2\big) \qquad (1)$$
where o_{k→d} is the new observed agent's trip time from node k to destination d, and the factor η weighs the number of most recent samples that effectively contribute to the estimates.
The moving observation window W_d is used to compute the value W_best_d of the best agents' trip time toward destination d as observed in the last w samples. After each new sample, w is incremented modulo |W|_max, where |W|_max is the maximum allowed size of the observation window. The value W_best_d represents a short-term memory expressing a moving empirical lower bound of the estimate of the time to go from the current node to node d.
T_k and M_k can be seen as memories local to the nodes, capturing different aspects of the network dynamics. The model M_k maintains absolute distance/time estimates to all the nodes, while the routing table gives relative probabilistic goodness measures for each link-destination pair under the current routing policy implemented over the whole network.
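As an illustration, the following Python sketch shows one way the two node-local structures could be organized. The class name, the uniform initialization, the value of η, and the window handling are our own illustrative choices, not the paper's implementation:

```python
class NodeMemory:
    """Illustrative sketch of the two structures stored at node k:
    the probabilistic routing table T_k and the traffic model M_k."""

    def __init__(self, node_id, neighbors, destinations, eta=0.1, w_max=15):
        self.k = node_id
        dests = [d for d in destinations if d != node_id]
        # T_k: one row per destination d; each row maps a neighbor n to
        # P_nd and sums to 1 (uniform initialization here).
        self.T = {d: {n: 1.0 / len(neighbors) for n in neighbors} for d in dests}
        # M_k: exponential mean/variance estimates (Equation 1) and the
        # moving observation window of recent trip times, per destination.
        self.mu = {d: 0.0 for d in dests}    # estimates adapt from 0 here
        self.var = {d: 0.0 for d in dests}
        self.window = {d: [] for d in dests}
        self.eta = eta        # eta of Equation 1
        self.w_max = w_max    # |W|_max, e.g., 5(c/eta) with c = 0.3

    def observe_trip_time(self, d, o_kd):
        """Update M_k with a new trip time o_{k->d} (Equation 1)."""
        self.mu[d] += self.eta * (o_kd - self.mu[d])
        self.var[d] += self.eta * ((o_kd - self.mu[d]) ** 2 - self.var[d])
        self.window[d].append(o_kd)
        if len(self.window[d]) > self.w_max:   # keep the last |W|_max samples
            self.window[d].pop(0)

    def w_best(self, d):
        """W_best_d: best trip time toward d in the observation window."""
        return min(self.window[d])
```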
At regular intervals, from every network node s, a forward ant is launched toward a destination node d chosen to match the current traffic patterns: if f_sd is a measure (in bits or in the number of packets) of the data flow s → d, then the probability of creating at node s a forward ant with node d as destination is

$$p_d = \frac{f_{sd}}{\sum_{d'=1}^{N} f_{sd'}} \qquad (2)$$

While traveling toward its destination, at each node k the forward ant selects the next node n with a probability P'_nd obtained by correcting the routing table entry P_nd with a heuristic term l_n that accounts for the instantaneous state of the local link queues:

$$P'_{nd} = \frac{P_{nd} + \alpha\, l_n}{1 + \alpha\,(|N_k| - 1)} \qquad (3)$$

where l_n is a [0,1] normalized value proportional to the free space in the queue of the link connecting node k with its neighbor n,

$$l_n = 1 - \frac{q_n}{\sum_{n'=1}^{|N_k|} q_{n'}} \qquad (4)$$

q_n being the length (in bits waiting to be sent) of that queue, and α weighs the importance of the heuristic correction with respect to the probabilities stored in the routing table.
In all the experiments we ran, we observed that the introduced correction is a very effective mechanism. Depending on the characteristics of the problem, the best value to assign to the weight α can vary, but when α ranges between 0.2 and 0.5 performance does not change appreciably. For lower values, the effect of l_n vanishes, while for higher values the resulting routing tables oscillate; in both cases, performance degrades.
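A minimal sketch of this next-hop selection rule (Equations 3 and 4), assuming queue lengths are available as bit counts per outgoing link; the function and variable names are ours:

```python
import random

def forward_ant_next_hop(P_row, queue_lengths, alpha=0.3):
    """Choose the next node for a forward ant by combining the routing
    table row P_row (neighbor -> P_nd for the ant's destination) with
    the correction l_n computed from the link queue lengths."""
    neighbors = list(P_row.keys())
    total_q = sum(queue_lengths[n] for n in neighbors)
    weights = []
    for n in neighbors:
        # Equation 4: l_n is high when the queue toward n is short.
        # Fall back to 1.0 when all queues are empty (degenerate case).
        l_n = (1.0 - queue_lengths[n] / total_q) if total_q > 0 else 1.0
        # Equation 3: blend table probability and queue heuristic.
        weights.append((P_row[n] + alpha * l_n)
                       / (1.0 + alpha * (len(neighbors) - 1)))
    return random.choices(neighbors, weights=weights)[0]
```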
i) M_k is updated with the values stored in the stack memory S_{s→d}(k). The time elapsed for the forward ant to arrive at the destination node d' starting from the current node is used to update the mean and variance estimates, μ_d' and σ²_d', and the best value over the observation window W_d'. In this way, a parametric model of the traveling time to destination d' is maintained. The mean value of this time and its dispersion can vary strongly, depending on the traffic conditions: a poor time (path) under low traffic load can be a very good one under heavy traffic load. The statistical model has to be able to capture this variability and to follow the fluctuations of the traffic in a robust way. This model plays a critical role in the routing table updating process (see item (ii) below). Therefore, we investigated several ways to build effective and computationally inexpensive models, as described in Section 4.2.
ii) The routing table T_k is changed by incrementing the probability P_fd' (i.e., the probability of choosing neighbor f when the destination is d') and decrementing, by normalization, the other probabilities P_nd'. The amount of the variation in the probabilities depends on a measure of goodness we associate with the trip time T_{k→d'} experienced by the forward ant, and is given below. This time represents the only available explicit feedback signal to score paths. It gives a clear indication of the goodness of the followed route because it is proportional to its length from a physical point of view (number of hops, transmission capacity of the used links, processing speed of the crossed nodes) and from a traffic congestion point of view (the forward ants share the same queues as data packets). The time measure T, composed of all the sub-path elapsed times, cannot be associated with an exact error measure, given that we do not know the "optimal" trip times, which depend on the whole network load status. Therefore, T can only be used as a reinforcement signal. This gives rise to a credit assignment problem typical of the reinforcement learning field (Bertsekas & Tsitsiklis, 1996; Kaelbling et al., 1996). We define the reinforcement r ≡ r(T, M_k) to be a function of the goodness of the observed trip time as estimated on the basis of the local traffic model. r is a dimensionless value, r ∈ (0, 1], used by the current node k as a positive reinforcement for the node f the backward ant B_{d→s} comes from. r takes into account some average of the so-far observed values and of their dispersion to score the goodness of the trip time T, such that the smaller T is, the higher r is (the exact definition of r is discussed in the next subsection). The probability P_fd' is increased by the reinforcement value as follows:
$$P_{fd'} \leftarrow P_{fd'} + r\,(1 - P_{fd'}), \qquad P_{nd'} \leftarrow P_{nd'} - r\,P_{nd'}, \quad n \in N_k,\; n \neq f \qquad (5)$$
It is important to remark that every discovered path receives a positive reinforcement in its selection probability, and the reinforcement is (in general) a non-linear function of the goodness of the path, as estimated using the associated trip time. In this way, not only the (explicit) assigned value r plays a role, but also the (implicit) ants' arrival rate. This strategy is based on trusting paths that receive either high reinforcements, independent of their frequency, or low and frequent reinforcements. In fact, for any traffic load condition, a path receives one or more high reinforcements only if it is much better than previously explored paths. On the other hand, during a transient phase after a sudden increase in network load, all paths will likely have high traversing times with respect to those learned by the model M in the preceding, low-congestion, situation. Therefore, in this case good paths can only be differentiated by the frequency of ants' arrivals. Always assigning a positive, but low, reinforcement value to paths with high traversal times allows the implementation of the above mechanism based on the frequency of the reinforcements while, at the same time, avoiding giving excessive credit to paths whose high traversal time is due to their intrinsic poor quality.

The use of probabilistic entries is very specific to AntNet, and we observed it to be effective, improving performance in some cases by as much as 30%-40%. Routing tables are used in a probabilistic way not only by the ants but also by the data packets. This has been observed to improve AntNet performance, which means that the way the routing tables are built in AntNet is well matched with a probabilistic distribution of the data packets over all the good paths. Data packets are prevented from choosing links with very low probability by remapping the routing table entries by means of a power function f(p) = p^δ, δ > 1, which emphasizes high probability values and reduces lower ones (in our experiments we set δ to 1.2).

Figure 2 gives a high-level description of the algorithm in pseudo-code, while Figure 3 illustrates a simple example of the algorithm's behavior. A detailed discussion of the characteristics of the algorithm is postponed to Section 8, after the performance of the algorithm has been analyzed with respect to a set of competitor algorithms. In this way, the characteristics of AntNet can be meaningfully evaluated and compared to those of other state-of-the-art algorithms.
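The two updates of item (ii), together with the power-law remapping used for data packets, can be sketched as follows; these helpers are hypothetical names, not the paper's code, with δ = 1.2 as reported above:

```python
def backward_ant_update(P_row, f, r):
    """Equation 5: reinforce the neighbor f the backward ant arrived
    from and decrease the other entries; the row stays normalized to 1,
    since r*(1 - P_f) is exactly what the other entries lose in total."""
    for n in P_row:
        if n == f:
            P_row[n] += r * (1.0 - P_row[n])
        else:
            P_row[n] -= r * P_row[n]

def data_packet_weights(P_row, delta=1.2):
    """Remap entries with f(p) = p**delta (delta > 1) so that data
    packets rarely follow links with very low probability; the caller
    draws the next hop proportionally to these weights."""
    return {n: p ** delta for n, p in P_row.items()}
```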
In the experiments we ran with this strategy (the simplest choice of setting r to a constant value, independent of the observed trip time), the algorithm showed moderately good performance. These results suggest that the "implicit" component of the algorithm, based on the ants' arrival rate, plays a very important role. Of course, to compete with state-of-the-art algorithms, the available information about path costs has to be used.
$$r = c_1 \left(\frac{W_{best}}{T}\right) + c_2 \left(\frac{I_{sup} - I_{inf}}{(I_{sup} - I_{inf}) + (T - I_{inf})}\right) \qquad (7)$$
In Equation 7, W_best is the best trip time experienced by the ants traveling toward the destination d, over the last observation window W. The maximum size of the window (the maximum number of considered samples before resetting the W_best value) is assigned on the basis of the coefficient η of Equation 1. As we said, η weighs the number of samples effectively giving a contribution to the value of the estimate, defining a sort of moving exponential window. Following the expression for the number of effective samples reported in footnote 7, we set |W|_max = 5(c/η), with c < 1. In this way, the long-term exponential mean and the short-term windowing refer to a comparable set of observations, with the short-term mean evaluated over a fraction c of the samples used for the long-term one. I_sup and I_inf are convenient estimates of the limits of an approximate confidence interval for μ. I_inf is set to W_best, while I_sup = μ + z(σ/√w), with z = 1/√(1 − γ), where γ gives the selected confidence level. There is some level of arbitrariness in our computation of the confidence interval, because we set it in an asymmetric way and μ and σ are not arithmetic estimates. Anyway, what we need is a quick, raw estimate of the mean value and of the dispersion of the values (for example, a local bootstrap procedure could have been applied to extract a meaningful confidence interval, but such a choice is not reasonable from a CPU time perspective).
The first term in Equation 7 simply evaluates the ratio between the current trip time and the best trip time observed over the current observation window. This term is corrected by the second one, which evaluates how far the value T is from I_inf relative to the extension of the confidence interval, that is, taking into account the stability of the latest trip times. The coefficients c1 and c2 weigh the importance of each term. The first term is the most important one, while the second plays the role of a correction. In the current implementation of the algorithm we set c1 = 0.7 and c2 = 0.3. We observed that c2 should not be too big (0.35 is an upper limit), otherwise performance starts to degrade appreciably. The behavior of the algorithm is quite stable for c2 values in the range 0.15 to 0.35, but setting c2 below 0.15 slightly degrades performance. The algorithm is very robust to changes in γ, which defines the confidence level: varying the confidence level in the range from 75% to 95% changes performance little. The best results were obtained for values around 75%-80%. We observed that the algorithm is very robust to its internal parameter settings, and we did not try to "adapt" the set of parameters to the problem instance. All the different experiments were carried out with the same "reasonable" settings. We could surely improve performance by means of a finer tuning of the parameters, but we did not, because we were interested in implementing a robust system, considering that the world of networks is incredibly varied in terms of traffic, topologies, switch and transmission characteristics, etc.
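Under the reconstruction of Equation 7 above, the reinforcement could be computed as in the following sketch; the Chebyshev-style bound for I_sup follows the confidence-interval discussion, and the parameter defaults mirror the values reported in the text:

```python
import math

def reinforcement(T, window, mu, var, c1=0.7, c2=0.3, gamma=0.8):
    """Score trip time T against the local model (Equation 7).
    window: recent trip times toward the destination; mu, var: the
    exponential estimates of Equation 1."""
    w_best = min(window)                      # I_inf = W_best
    z = 1.0 / math.sqrt(1.0 - gamma)          # z = 1/sqrt(1 - gamma)
    i_sup = mu + z * math.sqrt(var) / math.sqrt(len(window))
    i_inf = w_best
    r = c1 * (w_best / T)                     # first term: ratio to the best
    denom = (i_sup - i_inf) + (T - i_inf)
    if denom > 0:                             # guard a degenerate interval
        r += c2 * (i_sup - i_inf) / denom     # second term: stability correction
    return r
```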
The value r obtained from Equation 7 is finally transformed by means of a squash function s(x):
$$s(x) = \frac{1}{1 + \exp\!\left(\frac{a}{x\,|N_k|}\right)}, \qquad x \in (0, 1],\; a \in \mathbb{R}^+ \qquad (8)$$

and the obtained value is rescaled by s(1):

$$r \leftarrow \frac{s(r)}{s(1)} \qquad (9)$$
Squashing the r values allows the system to be more sensitive in rewarding good (high) values of r, while tending to saturate the rewards for bad (near-zero) r values: the scale is compressed for lower values and expanded in the upper part. In this way, emphasis is put on good results, while bad results play a minor role.
The coefficient a/|N_k| determines a parametric dependence of the squashed reinforcement value on the number |N_k| of neighbors of the reinforced node k: the greater the number of neighbors, the higher the reinforcement (see Figure 4). The reason to do this is that we want to have a similar, strong effect of good results on the probabilistic routing tables, independent of the number of neighbor nodes.
Figure 4: Examples of squash functions with a variable number of node neighbors.
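A direct transcription of Equations 8 and 9; the value of a is illustrative, as the paper's setting is not reproduced here:

```python
import math

def squash(x, n_neighbors, a=10.0):
    """Equation 8: s(x) = 1 / (1 + exp(a / (x * |N_k|)))."""
    return 1.0 / (1.0 + math.exp(a / (x * n_neighbors)))

def squashed_reinforcement(r, n_neighbors, a=10.0):
    """Equation 9: normalize by s(1) so that r = 1 maps to 1; more
    neighbors means a smaller a/|N_k| and hence a higher reward."""
    return squash(r, n_neighbors, a) / squash(1.0, n_neighbors, a)
```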