Journal of Artificial Intelligence Research 9 (1998) 317-365
Submitted 5/98; published 12/98

AntNet: Distributed Stigmergetic Control for Communications Networks

Gianni Di Caro
gdicaro@iridia.ulb.ac.be
Marco Dorigo
mdorigo@ulb.ac.be
IRIDIA, Université Libre de Bruxelles, 50 av. F. Roosevelt, CP 194/6, 1050 Brussels, Belgium

Note: only the fourth section of the article Di Caro G., Dorigo M., "AntNet: Distributed Stigmergetic Control for Communications Networks" is reproduced in this electronic library. The fourth section describes the AntNet routing algorithm.

4. AntNet: An Adaptive Agent-based Routing Algorithm

The characteristics of the routing problem (discussed in Section 2.2) make it well suited to be solved by a mobile multi-agent approach (Stone & Veloso, 1996; Gray, Kotz, Nog, Rus, & Cybenko, 1997). This processing paradigm is a good match for the distributed and non-stationary (in topology and traffic patterns) nature of the problem, presents a high level of redundancy and fault-tolerance, and can handle multiple objectives and constraints in a flexible way.

AntNet, the routing algorithm we propose in this paper, is a mobile agents system showing some essential features of parallel replicated Monte Carlo systems (Streltsov & Vakili, 1996). AntNet takes inspiration from previous work on artificial ant colony techniques to solve combinatorial optimization problems (Dorigo et al., 1991; Dorigo, 1992; Dorigo et al., 1996; Dorigo & Gambardella, 1997) and telephone network routing (Schoonderwoerd et al., 1996, 1997). The core ideas of these techniques (for a review see Dorigo, Di Caro, and Gambardella, 1998) are (i) the use of repeated and concurrent simulations carried out by a population of artificial agents called "ants" to generate new solutions to the problem, (ii) the use by the agents of stochastic local search to build the solutions in an incremental way, and (iii) the use of information collected during past simulations to direct future search for better solutions.

In the artificial ant colony approach, following an iterative process, each ant builds a solution by using two types of locally accessible information: problem-specific information (for example, the distances among cities in a traveling salesman problem), and information added by ants during previous iterations of the algorithm. In fact, while building a solution, each ant collects information on the problem characteristics and on its own performance, and uses this information to modify the representation of the problem, as seen locally by the other ants. The representation of the problem is modified in such a way that information contained in past good solutions can be exploited to build new, better solutions. This form of indirect communication mediated by the environment is called stigmergy, and is typical of social insects (Grassé, 1959).
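As a toy illustration of this construct-and-reinforce loop (our example, not the paper's), the following Python sketch shows ants building traveling salesman tours biased by a shared pheromone table, which good tours then reinforce:

```python
import random

def ant_tour(pheromone, dist, n_cities):
    """Build one tour incrementally: the next city is drawn with
    probability proportional to pheromone (past experience) divided
    by distance (problem-specific information)."""
    tour = [0]
    while len(tour) < n_cities:
        i = tour[-1]
        cand = [j for j in range(n_cities) if j not in tour]
        weights = [pheromone[i][j] / dist[i][j] for j in cand]
        tour.append(random.choices(cand, weights=weights)[0])
    return tour

def reinforce(pheromone, tour, tour_length):
    """Stigmergy: a good (short) tour deposits pheromone on its edges,
    changing the problem representation seen by later ants."""
    for i, j in zip(tour, tour[1:] + tour[:1]):
        deposit = 1.0 / tour_length
        pheromone[i][j] += deposit
        pheromone[j][i] += deposit
```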

In AntNet, we retain the core ideas of the artificial ant colony paradigm, and we apply them to solve the routing problem in datagram networks in an adaptive way.

Informally, the AntNet algorithm can be summarized by the numbered scheme given below in Section 4.1, where each step is explained, all the components are explicated and discussed, and a more detailed description of the algorithm is given.

4.1 Algorithm Description and Characteristics

AntNet is conveniently described in terms of two sets of homogeneous mobile agents (Stone & Veloso, 1996), called in the following forward and backward ants. Agents in each set possess the same structure, but they are differently situated in the environment; that is, they can sense different inputs and they can produce different, independent outputs. They can be broadly classified as deliberative agents, because they behave reactively, retrieving a pre-compiled set of behaviors, while at the same time maintaining a complete internal state description. Agents communicate in an indirect way, according to the stigmergy paradigm, through the information they concurrently read and write in two data structures stored in each network node k (see Figure 1):

Figure 1: Node structures used by mobile agents in AntNet for the case of a node with L neighbors and a network with N nodes. The routing table is organized as in vector-distance algorithms, but the entries are probabilistic values. The structure containing statistics about the local traffic plays the role of a local adaptive model for the traffic toward each possible destination.

i) A routing table Tk, organized as in vector-distance algorithms (see Appendix A), but with probabilistic entries. Tk defines the probabilistic routing policy currently adopted at node k: for each possible destination d and for each neighbor node n, Tk stores a probability value Pnd expressing the goodness (desirability), under the current network-wide routing policy, of choosing n as the next node when the destination node is d:

$\sum_{n \in N_k} P_{nd} = 1, \qquad d \in [1, N],\ N_k = \{\text{neighbors}(k)\}$

ii) An array Mk() of data structures defining a simple parametric statistical model of the traffic distribution over the network, as seen by the local node k. The model is adaptive and is described by sample means and variances computed over the trip times experienced by the mobile agents, and by a moving observation window Wd used to store the best value Wbestd of the agents' trip times.

For each destination d in the network, an estimated mean and variance, μd and σd², give a representation of the expected time to go and of its stability. We used arithmetic, exponential and windowed strategies to compute the statistics. Changing strategy does not affect performance much, but we observed the best results using the exponential model:

$\mu_d \leftarrow \mu_d + \varsigma\,(o_{k \to d} - \mu_d), \qquad \sigma_d^2 \leftarrow \sigma_d^2 + \varsigma\left((o_{k \to d} - \mu_d)^2 - \sigma_d^2\right)$  ( 1 )

where ok->d is the newly observed trip time of an agent traveling from node k to destination d, and the coefficient ς weights the number of recent samples that effectively contribute to the estimates (see Section 4.2).

The moving observation window Wd is used to compute the value Wbestd of the best agents' trip time toward destination d as observed in the last w samples. After each new sample, w is incremented modulo |W|max, where |W|max is the maximum allowed size of the observation window. The value Wbestd represents a short-term memory expressing a moving empirical lower bound of the estimate of the time to go to node d from the current node.

Tk and Mk can be seen as memories local to node k, capturing different aspects of the network dynamics. The model Mk maintains absolute distance/time estimates to all the nodes, while the routing table gives relative probabilistic goodness measures for each link-destination pair under the current routing policy implemented over the whole network.
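To make these two structures concrete, the following is a minimal Python sketch, not the authors' implementation: the class and parameter names are ours, the value of ς and the window size are illustrative, and the window follows the reset-every-|W|max-samples rule described above.

```python
class TrafficModel:
    """Local traffic model M_k (Equation 1): exponential estimates of the
    mean and variance of agents' trip times toward each destination d,
    plus the best value W_best_d over a moving observation window."""

    def __init__(self, varsigma=0.005, w_max=50):
        self.varsigma = varsigma  # weight of the newest samples in Equation 1
        self.w_max = w_max        # |W|_max: samples seen before W_best resets
        self.mu = {}              # mu_d: estimated mean trip time to d
        self.var = {}             # sigma_d^2: estimated variance
        self.count = {}           # w: sample counter, incremented modulo |W|_max
        self.best = {}            # W_best_d: best trip time in current window

    def update(self, d, trip_time):
        # Equation 1: exponential update of mean and variance.
        mu = self.mu.get(d, trip_time)
        var = self.var.get(d, 0.0)
        mu += self.varsigma * (trip_time - mu)
        var += self.varsigma * ((trip_time - mu) ** 2 - var)
        self.mu[d], self.var[d] = mu, var
        # Short-term memory: W_best_d is reset every |W|_max samples.
        w = (self.count.get(d, 0) + 1) % self.w_max
        if w == 1 or d not in self.best:
            self.best[d] = trip_time
        else:
            self.best[d] = min(self.best[d], trip_time)
        self.count[d] = w


class RoutingTable:
    """Probabilistic routing table T_k: for every destination d, a
    probability P_nd over the neighbors n, initialized uniformly."""

    def __init__(self, node_id, neighbors, n_nodes):
        p0 = 1.0 / len(neighbors)
        self.P = {d: {n: p0 for n in neighbors}
                  for d in range(n_nodes) if d != node_id}
```

A backward ant arriving at node k would call TrafficModel.update() for each relevant destination and then adjust the corresponding distribution in RoutingTable.P (Equation 5 below).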

    The AntNet algorithm is described as follows.
  1. At regular intervals Δt, from every network node s, a mobile agent (forward ant) Fs->d is launched toward a destination node d to discover a feasible, low-cost path to that node and to investigate the load status of the network. Forward ants share the same queues as data packets, so that they experience the same traffic loads. Destinations are locally selected according to the data traffic patterns generated by the local workload: if fsd is a measure (in bits or in number of packets) of the data flow s -> d, then the probability of creating at node s a forward ant with node d as destination is
    $p_d = \frac{f_{sd}}{\sum_{d'=1}^{N} f_{sd'}}$  ( 2 )
    In this way, ants adapt their exploration activity to the varying data traffic distribution (a code sketch of this sampling appears after this list).
  2. While traveling toward their destination nodes, the agents keep memory of their paths and of the traffic conditions found. The identifier of every visited node k and the time elapsed since the launching time to arrive at this k-th node are pushed onto a memory stack Ss->d(k).
  3. At each node k, each traveling agent headed toward its destination d selects the node n to move to by choosing among the neighbors it has not already visited, or among all the neighbors if all of them have been previously visited (see the selection sketch after this list). The neighbor n is selected with a probability (goodness) P'nd computed as the normalized sum of the probabilistic entry Pnd of the routing table and a heuristic correction factor ln that takes into account the state (the length) of the n-th link queue of the current node k:
    $P'_{nd} = \frac{P_{nd} + \alpha\, l_n}{1 + \alpha\,(|N_k| - 1)}$  ( 3 )
    The heuristic correction ln is a value normalized to [0,1], computed from the length qn (in bits waiting to be sent) of the queue of the link connecting node k with its neighbor n:
    $l_n = 1 - \frac{q_n}{\sum_{n'=1}^{|N_k|} q_{n'}}$  ( 4 )
    The value of α weights the importance of the heuristic correction with respect to the probability values stored in the routing table. ln reflects the instantaneous state of the node's queues and, assuming that the queue's consuming process is almost stationary or slowly varying, ln gives a quantitative measure associated with the queue waiting time. The routing table values, on the other hand, are the outcome of a continual learning process and capture both the current and the past status of the whole network as seen by the local node. Correcting these values with the values of l allows the system to be more "reactive", while at the same time avoiding following every network fluctuation. An agent's decisions are thus taken on the basis of a combination of a long-term learning process and an instantaneous heuristic prediction.
  4. In all the experiments we ran, we observed that the introduced correction is a very effective mechanism. Depending on the characteristics of the problem, the best value to assign to the weight α can vary, but if α ranges between 0.2 and 0.5, performance doesn't change appreciably. For lower values, the effect of ln vanishes, while for higher values the resulting routing tables oscillate; in both cases, performance degrades.

  5. If a cycle is detected, that is, if an ant is forced to return to an already visited node, the cycle's nodes are popped from the ant's stack and all the memory about them is destroyed. If the cycle lasted longer than the lifetime of the ant before entering the cycle (that is, if the cycle took more than half the ant's age), the ant is destroyed. In fact, in this case the agent probably wasted a lot of time because of a wrong sequence of decisions and not because of congestion. The agent is therefore carrying an old and misleading memory of the network state, and it would be counterproductive to use it to update the routing tables (see below).
  6. When the destination node d is reached, the agent Fs->d generates another agent (backward ant) Bd->s, transfers to it all of its memory, and dies.
  7. The backward ant takes the same path as that of its corresponding forward ant, but in the opposite direction. At each node k along the path it pops its stack Ss->d(k) to know the next hop node. Backward ants do not share the same link queues as data packets; they use higher priority queues, because their task is to quickly propagate to the routing tables the information accumulated by the forward ants.
  8. Arriving at a node k coming from a neighbor node f, the backward ant updates the two main data structures of the node, the local model of the traffic Mk and the routing table Tk, for all the entries corresponding to the (forward ant) destination node d. With some precautions, updates are also performed on the entries corresponding to every node k' ∈ Sk->d, k' ≠ d, on the "sub-paths" followed by ant Fs->d after visiting the current node k. In fact, if the elapsed trip time of a sub-path is statistically "good" (i.e., it is less than μ + I(μ, σ), where I is an estimate of a confidence interval for μ), then the time value is used to update the corresponding statistics and the routing table. On the contrary, trip times of sub-paths not deemed good, in the same statistical sense as defined above, are not used, because they don't give a correct idea of the time to go toward the sub-destination node. In fact, all the forward ant routing decisions were made only as a function of the destination node. In this perspective, sub-paths are side effects, and they are intrinsically sub-optimal because of the local variations in the traffic load (we can't reason with the same perspective as in dynamic programming, because of the non-stationarity of the problem representation). Obviously, in case of a good sub-path we can use it: the ant discovered, at zero cost, an additional good route. In the following two items, the way M and T are updated is described with respect to a generic "destination" node d' ∈ Sk->d:

    i) Mk is updated with the values stored in the stack memory Ss->d(k). The time elapsed (for the forward ant) to arrive at the destination node d' starting from the current node is used to update the mean and variance estimates, μd' and σd'², and the best value over the observation window Wd'. In this way, a parametric model of the traveling time to destination d' is maintained. The mean value of this time and its dispersion can vary strongly, depending on the traffic conditions: a poor time (path) under a low traffic load can be a very good one under a heavy traffic load. The statistical model has to be able to capture this variability and to follow the fluctuations of the traffic in a robust way. This model plays a critical role in the routing table updating process (see item (ii) below). Therefore, we investigated several ways to build effective and computationally inexpensive models, as described in Section 4.2.

    ii) The routing table Tk is changed by incrementing the probability Pfd' (i.e., the probability of choosing neighbor f when the destination is d') and decrementing, by normalization, the other probabilities Pnd'. The amount of the variation in the probabilities depends on a measure of goodness we associate with the trip time Tk->d' experienced by the forward ant, and is given below. This time represents the only available explicit feedback signal to score paths. It gives a clear indication of the goodness of the followed route because it is proportional to the route's length from a physical point of view (number of hops, transmission capacity of the used links, processing speed of the crossed nodes) and from a traffic congestion point of view (the forward ants share the same queues as data packets). The time measure T, composed of all the sub-paths' elapsed times, cannot be associated with an exact error measure, given that we don't know the "optimal" trip times, which depend on the whole network load status.9 Therefore, T can only be used as a reinforcement signal. This gives rise to a credit assignment problem typical of the reinforcement learning field (Bertsekas & Tsitsiklis, 1996; Kaelbling et al., 1996). We define the reinforcement r ≡ r(T, Mk) to be a function of the goodness of the observed trip time as estimated on the basis of the local traffic model. r is a dimensionless value, r ∈ (0, 1], used by the current node k as a positive reinforcement for the node f the backward ant Bd->s comes from. r takes into account some average of the values observed so far and of their dispersion to score the goodness of the trip time T, such that the smaller T is, the higher r is (the exact definition of r is discussed in the next subsection). The probability Pfd' is increased by the reinforcement value as follows:

    $P_{fd'} \leftarrow P_{fd'} + r\,(1 - P_{fd'}); \qquad P_{nd'} \leftarrow P_{nd'} - r\,P_{nd'}, \quad n \in N_k,\ n \neq f$  ( 5 )

    It is important to remark that every discovered path receives a positive reinforcement of its selection probability, and the reinforcement is (in general) a non-linear function of the goodness of the path, as estimated using the associated trip time. In this way, not only the (explicit) assigned value r plays a role, but also the (implicit) ants' arrival rate. This strategy is based on trusting paths that receive either high reinforcements, independent of their frequency, or low but frequent reinforcements. In fact, for any traffic load condition, a path receives one or more high reinforcements only if it is much better than previously explored paths. On the other hand, during a transient phase after a sudden increase in network load, all paths will likely have high traversing times with respect to those learned by the model M in the preceding, low-congestion situation. Therefore, in this case good paths can only be differentiated by the frequency of the ants' arrivals. Always assigning a positive, but low, reinforcement value to paths with high traversal times implements this frequency-based mechanism while, at the same time, avoiding giving excessive credit to paths whose high traversal time is due to their poor quality.

    The use of probabilistic entries is very specific to AntNet, and we observed it to be effective, improving the performance in some cases by as much as 30%-40%. Routing tables are used in a probabilistic way not only by the ants but also by the data packets. This has been observed to improve AntNet performance, which means that the way the routing tables are built in AntNet is well matched with a probabilistic distribution of the data packets over all the good paths. Data packets are prevented from choosing links with very low probability by remapping the entries of T by means of a power function f(p) = p^α, α > 1, which emphasizes high probability values and reduces lower ones (in our experiments we set α to 1.2); see the sketches below.

    Figure 2 gives a high-level description of the algorithm in pseudo-code, while Figure 3 illustrates a simple example of the algorithm's behavior. A detailed discussion of the characteristics of the algorithm is postponed to Section 8, after the performance of the algorithm has been analyzed with respect to a set of competitor algorithms. In this way, the characteristics of AntNet can be meaningfully evaluated and compared to those of other state-of-the-art algorithms.
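To make the forward ant's behavior concrete, here is a minimal Python sketch of items 1, 3 and 5 of the list above; the function names and the value of α are ours, and queue lengths are assumed to be known non-negative numbers:

```python
import random

def choose_destination(flows):
    """Equation 2: draw destination d with probability proportional to
    the locally measured data flow f_sd; `flows` maps d -> f_sd and is
    assumed to contain at least one non-zero entry."""
    destinations = list(flows)
    return random.choices(destinations,
                          weights=[flows[d] for d in destinations])[0]

def choose_next_hop(P_d, queues, visited, alpha=0.3):
    """Equations 3-4: combine the routing-table entries P_nd with the
    normalized queue correction l_n.  Unvisited neighbors are preferred;
    if all have been visited, every neighbor is a candidate again."""
    neighbors = list(P_d)
    candidates = [n for n in neighbors if n not in visited] or neighbors
    q_total = sum(queues[n] for n in neighbors) or 1.0
    weights = []
    for n in candidates:
        l_n = 1.0 - queues[n] / q_total                    # Equation 4
        weights.append((P_d[n] + alpha * l_n)
                       / (1.0 + alpha * (len(neighbors) - 1)))  # Equation 3
    return random.choices(candidates, weights=weights)[0]

def pop_cycle(stack, node_id):
    """Item 5: when the ant is forced back to an already visited node,
    the cycle's entries are popped from its stack and forgotten."""
    ids = [k for k, _ in stack]        # stack holds (node, elapsed) pairs
    if node_id in ids:
        del stack[ids.index(node_id) + 1:]
```

The destruction rule of item 5 (a cycle longer than half the ant's age) would be checked against the elapsed times stored on the stack before calling pop_cycle.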
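Likewise, here is a sketch of the backward ant's table update (Equation 5), of the sub-path acceptance test of item 8 (using the confidence-interval estimate of Section 4.2 as the I(μ, σ) term), and of the power remap applied when forwarding data packets; γ and the exponent follow the values discussed in the text:

```python
import math

def update_routing_table(P_d, f, r):
    """Equation 5: reinforce the neighbor f the backward ant arrived
    from, and decrement the other neighbors by normalization."""
    for n in P_d:
        if n == f:
            P_d[n] += r * (1.0 - P_d[n])
        else:
            P_d[n] -= r * P_d[n]

def subpath_is_good(trip_time, mu, sigma2, w, gamma=0.8):
    """Item 8: accept a sub-path only if its elapsed time is statistically
    good, i.e. below mu + I(mu, sigma), with I taken here as the one-sided
    confidence-interval estimate of Section 4.2 (w samples in the window)."""
    z = 1.0 / math.sqrt(1.0 - gamma)
    return trip_time < mu + z * math.sqrt(sigma2) / math.sqrt(w)

def remap_for_data(P_d, exponent=1.2):
    """Power remap f(p) = p^alpha (alpha > 1) used when forwarding data
    packets: emphasizes high-probability links, damps unlikely ones."""
    powered = {n: p ** exponent for n, p in P_d.items()}
    total = sum(powered.values())
    return {n: p / total for n, p in powered.items()}
```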


    Figure 2: AntNet's top-level description in pseudo-code. All the described actions take place in a completely distributed and concurrent way over the network nodes (while, in the text, AntNet has been described from an individual ant's perspective). All the constructs at the same level of indentation inside the context of the statement in parallel are executed concurrently. The processes of data generation and forwarding are not described, but they can be thought of as acting concurrently with the ants.

    Figure 3: Example of AntNet behavior. The forward ant, F1->4, moves along the path 1 -> 2 -> 3 -> 4 and, having arrived at node 4, launches the backward ant B4->1 that will travel in the opposite direction. At each node k, k = 3, ..., 1, the backward ant will use the stack contents S1->4(k) to update the values of Mk() for node 4 as destination and, in case of good sub-paths, also for the nodes i = k+1, ..., 3 as destinations. At the same time, the routing table will be updated by incrementing the goodness Pji of the last node j = k+1 the ant B4->1 came from, for the case of nodes i = k+1, ..., 4 as destination nodes, and by decrementing the values of P for the other neighbors (not shown here). The increment will be a function of the trip time experienced by the forward ant going from node k to destination node i. As for M, the routing table is always updated for the case of node 4 as destination, while the other nodes i' = k+1, ..., 3 on the sub-paths are taken into consideration as destination nodes only if the trip time associated with the corresponding sub-path of the forward ant is statistically good.

4.2 How to Score the Goodness of the Ant's Trip Time

The reinforcement r is computed as follows:

$r = c_1 \left( \frac{W_{best}}{T} \right) + c_2 \left( \frac{I_{sup} - I_{inf}}{(I_{sup} - I_{inf}) + (T - I_{inf})} \right)$  ( 7 )

In Equation 7, Wbest is the best trip time experienced by the ants traveling toward the destination d over the last observation window W. The maximum size of the window (the maximum number of considered samples before resetting the Wbest value) is assigned on the basis of the coefficient ς of Equation 1. As we said, ς weights the number of samples effectively giving a contribution to the value of the estimate, defining a sort of moving exponential window. Following the expression for the number of effective samples as reported in footnote 7, we set |W|max = 5(c/ς), with c < 1. In this way, the long-term exponential mean and the short-term windowing refer to a comparable set of observations, with the short-term mean evaluated over a fraction c of the samples used for the long-term one. Isup and Iinf are convenient estimates of the limits of an approximate confidence interval for μ. Iinf is set to Wbest, while Isup = μ + z(σ/√w), with z = 1/√(1−γ), where γ gives the selected confidence level. There is some level of arbitrariness in our computation of the confidence interval, because we set it in an asymmetric way and μ and σ are not arithmetic estimates. Anyway, what we need is a quick, raw estimate of the mean value and of the dispersion of the values (for example, a local bootstrap procedure could have been applied to extract a meaningful confidence interval, but such a choice would not be reasonable from a CPU usage perspective).

The first term in Equation 7 simply evaluates the ratio between the current trip time and the best trip time observed over the current observation window. This term is corrected by the second one, which evaluates how far the value T is from Iinf in relation to the extension of the confidence interval, that is, considering the stability in the latest trip times. The coefficients c1 and c2 weight the importance of each term. The first term is the most important one, while the second term plays the role of a correction. In the current implementation of the algorithm we set c1 = 0.7 and c2 = 0.3. We observed that c2 shouldn't be too big (0.35 is an upper limit), otherwise performance starts to degrade appreciably. The behavior of the algorithm is quite stable for c2 values in the range 0.15 to 0.35, but setting c2 below 0.15 slightly degrades performance.

The algorithm is very robust to changes in γ, which defines the confidence level: varying the confidence level in the range from 75% to 95% changes performance little. The best results have been obtained for γ values around 75%-80%. We observed that the algorithm is very robust to its internal parameter settings, and we didn't try to "adapt" the set of parameters to the problem instance. All the different experiments were carried out with the same "reasonable" settings. We could surely improve the performance by means of a finer tuning of the parameters, but we didn't, because we were interested in implementing a robust system, considering that the world of networks is incredibly varied in terms of traffic, topologies, switch and transmission characteristics, etc.

The value r obtained from Equation 7 is finally transformed by means of a squash function s(x):
$s(x) = \left(1 + \exp\left(\frac{a}{x\,|N_k|}\right)\right)^{-1}, \quad x \in (0, 1]$  ( 8 )

$r \leftarrow \frac{s(r)}{s(1)}$  ( 9 )

Squashing the r values allows the system to be more sensitive in rewarding good (high) values of r, while tending to saturate the rewards for bad (near-zero) r values: the scale is compressed for lower values and expanded in the upper part. In this way, emphasis is put on good results, while bad results play a minor role.

The coefficient a/|Nk| determines a parametric dependence of the squashed reinforcement value on the number |Nk| of neighbors of the reinforced node k: the greater the number of neighbors, the higher the reinforcement (see Figure 4). The reason for this is that we want good results to have a similarly strong effect on the probabilistic routing tables regardless of the number of neighbor nodes.
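Putting Section 4.2 together, the complete scoring step (Equations 7-9) can be sketched as follows; c1, c2 and the confidence level γ follow the values discussed above, while the squash steepness a is only an illustrative guess, since the text fixes it only through Figure 4:

```python
import math

def reinforcement(T, w_best, mu, sigma2, w, n_neighbors,
                  c1=0.7, c2=0.3, gamma=0.8, a=10.0):
    """Score trip time T against the local model M_k (Equations 7-9);
    w is the number of samples in the current observation window."""
    # Asymmetric confidence interval for mu, as described in the text:
    # I_inf = W_best, I_sup = mu + z * sigma / sqrt(w), z = 1/sqrt(1 - gamma).
    z = 1.0 / math.sqrt(1.0 - gamma)
    i_inf = w_best
    i_sup = mu + z * math.sqrt(sigma2) / math.sqrt(w)
    # Equation 7: ratio to the windowed best, corrected by where T falls
    # inside the confidence interval.
    r = c1 * (w_best / T)
    denom = (i_sup - i_inf) + (T - i_inf)
    if denom > 0.0:
        r += c2 * ((i_sup - i_inf) / denom)
    # Equations 8-9: squash and normalize, compressing low scores and
    # expanding high ones; the reinforcement grows with |N_k|.
    def s(x):
        return 1.0 / (1.0 + math.exp(a / (x * n_neighbors)))
    return s(min(r, 1.0)) / s(1.0)   # clamp guards against T < W_best
```

When T equals Wbest, the first term contributes c1 and the second c2, so r reaches 1 before squashing and s(r)/s(1) returns its maximum value of 1; longer trip times smoothly shrink both terms.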

Figure 4: Examples of squash functions with a variable number of node neighbors.