Submitted 5/98; published 12/98

Gianni Di Caro
Marco Dorigo

IRIDIA, Université Libre de Bruxelles, 50 av. F. Roosevelt, CP 194/6, 1050 Brussels, Belgium
The characteristics of the routing problem (discussed in Section 2.2) make it well suited to be solved by a mobile multi-agent approach (Stone & Veloso, 1996; Gray, Kotz, Nog, Rus, & Cybenko, 1997). This processing paradigm is a good match for the distributed and non-stationary (in topology and traffic patterns) nature of the problem, provides a high level of redundancy and fault-tolerance, and can handle multiple objectives and constraints in a flexible way.
AntNet, the routing algorithm we propose in this paper, is a mobile agents system showing some essential features of parallel replicated Monte Carlo systems (Streltsov & Vakili, 1996). AntNet takes inspiration from previous work on artificial ant colony techniques to solve combinatorial optimization problems (Dorigo et al., 1991; Dorigo, 1992; Dorigo et al., 1996; Dorigo & Gambardella, 1997) and telephone network routing (Schoonderwoerd et al., 1996, 1997). The core ideas of these techniques (for a review see Dorigo, Di Caro, and Gambardella, 1998) are (i) the use of repeated and concurrent simulations carried out by a population of artificial agents called "ants" to generate new solutions to the problem, (ii) the use by the agents of stochastic local search to build the solutions in an incremental way, and (iii) the use of information collected during past simulations to direct future search for better solutions.
In the artificial ant colony approach, following an iterative process, each ant builds a solution by using two types of locally accessible information: problem-specific information (for example, the distance among cities in a traveling salesman problem), and information added by ants during previous iterations of the algorithm. In fact, while building a solution, each ant collects information on the problem characteristics and on its own performance, and uses this information to modify the representation of the problem, as seen locally by the other ants. The representation of the problem is modified in such a way that information contained in past good solutions can be exploited to build new, better solutions. This form of indirect communication mediated by the environment is called stigmergy, and is typical of social insects (Grassé, 1959).
In AntNet, we retain the core ideas of the artificial ant colony paradigm, and we apply them to solve the routing problem in datagram networks in an adaptive way.
Informally, the AntNet algorithm and its main characteristics can be summarized as follows.
AntNet is conveniently described in terms of two sets of homogeneous mobile agents (Stone & Veloso, 1996), called in the following forward and backward ants. Agents in each set possess the same structure, but they are differently situated in the environment; that is, they can sense different inputs and they can produce different, independent outputs. They can be broadly classified as deliberative agents because, although they behave reactively by retrieving a pre-compiled set of behaviors, at the same time they maintain a complete internal state description. Agents communicate in an indirect way, according to the stigmergy paradigm, through the information they concurrently read and write in two data structures stored in each network node k (see Figure 1):
i) A routing table T_k, organized as in vector-distance algorithms (see Appendix A), but with probabilistic entries. T_k defines the probabilistic routing policy currently adopted at node k: for each possible destination d and for each neighbor node n, T_k stores a probability value P_nd expressing the goodness (desirability), under the current network-wide routing policy, of choosing n as next node when the destination node is d:

$$\sum_{n \in N_k} P_{nd} = 1, \qquad d \in [1, N],\; N_k = \{\text{neighbors}(k)\}$$
ii) An array M_k of data structures defining a simple parametric statistical model of the traffic distribution over the network as seen by the local node k. The model is adaptive, and is described by sample means and variances computed over the trip times experienced by the mobile agents, and by a moving observation window W_d used to store the best value W_best_d of the agents' trip time.
For each destination d in the network, an estimated mean and variance, μ_d and σ²_d, give a representation of the expected time to go and of its stability. We used arithmetic, exponential, and windowed strategies to compute the statistics. Changing strategy does not affect performance much, but we observed the best results using the exponential model:
$$\mu_d \leftarrow \mu_d + \eta\,(o_{k \to d} - \mu_d), \qquad \sigma_d^2 \leftarrow \sigma_d^2 + \eta\,\big((o_{k \to d} - \mu_d)^2 - \sigma_d^2\big) \qquad (1)$$
where o_{k→d} is the new observed agent's trip time from node k to destination d, and the factor η weighs the number of most recent samples that effectively contribute to the estimates.
The moving observation window W_d is used to compute the value W_best_d of the best agents' trip time toward destination d as observed in the last w samples. After each new sample, w is incremented modulo |W|_max, where |W|_max is the maximum allowed size of the observation window. The value W_best_d represents a short-term memory expressing a moving empirical lower bound of the estimate of the time to go from the current node to node d.
T_k and M_k can be seen as memories local to the nodes, capturing different aspects of the network dynamics. The model M_k maintains absolute distance/time estimates to all the nodes, while the routing table gives relative probabilistic goodness measures for each link-destination pair under the current routing policy implemented over the whole network.
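As an illustration, the following Python sketch shows one way the two node-local structures could be organized. The class name, the uniform initialization, the value of η, and the window handling are our own illustrative choices, not the paper's implementation:

```python
class NodeMemory:
    """Illustrative sketch of the two structures stored at node k:
    the probabilistic routing table T_k and the traffic model M_k."""

    def __init__(self, node_id, neighbors, destinations, eta=0.1, w_max=15):
        self.k = node_id
        dests = [d for d in destinations if d != node_id]
        # T_k: one row per destination d; each row maps a neighbor n to
        # P_nd and sums to 1 (uniform initialization here).
        self.T = {d: {n: 1.0 / len(neighbors) for n in neighbors} for d in dests}
        # M_k: exponential mean/variance estimates (Equation 1) and the
        # moving observation window of recent trip times, per destination.
        self.mu = {d: 0.0 for d in dests}    # estimates adapt from 0 here
        self.var = {d: 0.0 for d in dests}
        self.window = {d: [] for d in dests}
        self.eta = eta        # eta of Equation 1
        self.w_max = w_max    # |W|_max, e.g., 5(c/eta) with c = 0.3

    def observe_trip_time(self, d, o_kd):
        """Update M_k with a new trip time o_{k->d} (Equation 1)."""
        self.mu[d] += self.eta * (o_kd - self.mu[d])
        self.var[d] += self.eta * ((o_kd - self.mu[d]) ** 2 - self.var[d])
        self.window[d].append(o_kd)
        if len(self.window[d]) > self.w_max:   # keep the last |W|_max samples
            self.window[d].pop(0)

    def w_best(self, d):
        """W_best_d: best trip time toward d in the observation window."""
        return min(self.window[d])
```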
At regular intervals, from every network node s, a forward ant is launched toward a destination node d chosen to match the current traffic patterns: if f_sd is a measure (in bits or in the number of packets) of the data flow s → d, then the probability of creating at node s a forward ant with node d as destination is

$$p_d = \frac{f_{sd}}{\sum_{d'=1}^{N} f_{sd'}} \qquad (2)$$

While traveling toward its destination, at each node k the forward ant selects the next node n with a probability P'_nd obtained by correcting the routing table entry P_nd with a heuristic term l_n that accounts for the instantaneous state of the local link queues:

$$P'_{nd} = \frac{P_{nd} + \alpha\, l_n}{1 + \alpha\,(|N_k| - 1)} \qquad (3)$$

where l_n is a [0,1] normalized value proportional to the free space in the queue of the link connecting node k with its neighbor n,

$$l_n = 1 - \frac{q_n}{\sum_{n'=1}^{|N_k|} q_{n'}} \qquad (4)$$

q_n being the length (in bits waiting to be sent) of that queue, and α weighs the importance of the heuristic correction with respect to the probabilities stored in the routing table.
In all the experiments we ran, we observed that the introduced correction is a very effective mechanism. Depending on the characteristics of the problem, the best value to assign to the weight α can vary, but when α ranges between 0.2 and 0.5 performance does not change appreciably. For lower values, the effect of l_n vanishes, while for higher values the resulting routing tables oscillate; in both cases, performance degrades.
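A minimal sketch of this next-hop selection rule (Equations 3 and 4), assuming queue lengths are available as bit counts per outgoing link; the function and variable names are ours:

```python
import random

def forward_ant_next_hop(P_row, queue_lengths, alpha=0.3):
    """Choose the next node for a forward ant by combining the routing
    table row P_row (neighbor -> P_nd for the ant's destination) with
    the correction l_n computed from the link queue lengths."""
    neighbors = list(P_row.keys())
    total_q = sum(queue_lengths[n] for n in neighbors)
    weights = []
    for n in neighbors:
        # Equation 4: l_n is high when the queue toward n is short.
        # Fall back to 1.0 when all queues are empty (degenerate case).
        l_n = (1.0 - queue_lengths[n] / total_q) if total_q > 0 else 1.0
        # Equation 3: blend table probability and queue heuristic.
        weights.append((P_row[n] + alpha * l_n)
                       / (1.0 + alpha * (len(neighbors) - 1)))
    return random.choices(neighbors, weights=weights)[0]
```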
i) M_k is updated with the values stored in the stack memory S_{s→d}(k). The time elapsed for the forward ant to arrive at the destination node d' starting from the current node is used to update the mean and variance estimates, μ_d' and σ²_d', and the best value over the observation window W_d'. In this way, a parametric model of the traveling time to destination d' is maintained. The mean value of this time and its dispersion can vary strongly, depending on the traffic conditions: a poor time (path) under low traffic load can be a very good one under heavy traffic load. The statistical model has to be able to capture this variability and to follow the fluctuations of the traffic in a robust way. This model plays a critical role in the routing table updating process (see item (ii) below). Therefore, we investigated several ways to build effective and computationally inexpensive models, as described in Section 4.2.
ii) The routing table T_k is changed by incrementing the probability P_fd' (i.e., the probability of choosing neighbor f when the destination is d') and decrementing, by normalization, the other probabilities P_nd'. The amount of the variation in the probabilities depends on a measure of goodness we associate with the trip time T_{k→d'} experienced by the forward ant, and is given below. This time represents the only available explicit feedback signal to score paths. It gives a clear indication of the goodness of the followed route because it is proportional to its length from a physical point of view (number of hops, transmission capacity of the used links, processing speed of the crossed nodes) and from a traffic congestion point of view (the forward ants share the same queues as data packets). The time measure T, composed of all the sub-path elapsed times, cannot be associated with an exact error measure, given that we do not know the "optimal" trip times, which depend on the whole network load status. Therefore, T can only be used as a reinforcement signal. This gives rise to a credit assignment problem typical of the reinforcement learning field (Bertsekas & Tsitsiklis, 1996; Kaelbling et al., 1996). We define the reinforcement r ≡ r(T, M_k) to be a function of the goodness of the observed trip time as estimated on the basis of the local traffic model. r is a dimensionless value, r ∈ (0, 1], used by the current node k as a positive reinforcement for the node f the backward ant B_{d→s} comes from. r takes into account some average of the so-far observed values and of their dispersion to score the goodness of the trip time T, such that the smaller T is, the higher r is (the exact definition of r is discussed in the next subsection). The probability P_fd' is increased by the reinforcement value as follows:
$$P_{fd'} \leftarrow P_{fd'} + r\,(1 - P_{fd'}), \qquad P_{nd'} \leftarrow P_{nd'} - r\,P_{nd'}, \quad n \in N_k,\; n \neq f \qquad (5)$$
It is important to remark that every discovered path receives a positive reinforcement in its selection probability, and the reinforcement is (in general) a non-linear function of the goodness of the path, as estimated using the associated trip time. In this way, not only the (explicit) assigned value r plays a role, but also the (implicit) ants' arrival rate. This strategy is based on trusting paths that receive either high reinforcements, independent of their frequency, or low and frequent reinforcements. In fact, for any traffic load condition, a path receives one or more high reinforcements only if it is much better than previously explored paths. On the other hand, during a transient phase after a sudden increase in network load, all paths will likely have high traversing times with respect to those learned by the model M in the preceding, low-congestion, situation. Therefore, in this case good paths can only be differentiated by the frequency of ants' arrivals. Always assigning a positive, but low, reinforcement value to paths with high traversal times allows the implementation of the above mechanism based on the frequency of the reinforcements while, at the same time, avoiding giving excessive credit to paths whose high traversal time is due to their intrinsic poor quality.

The use of probabilistic entries is very specific to AntNet, and we observed it to be effective, improving performance in some cases by as much as 30%-40%. Routing tables are used in a probabilistic way not only by the ants but also by the data packets. This has been observed to improve AntNet performance, which means that the way the routing tables are built in AntNet is well matched with a probabilistic distribution of the data packets over all the good paths. Data packets are prevented from choosing links with very low probability by remapping the routing table entries by means of a power function f(p) = p^δ, δ > 1, which emphasizes high probability values and reduces lower ones (in our experiments we set δ to 1.2).

Figure 2 gives a high-level description of the algorithm in pseudo-code, while Figure 3 illustrates a simple example of the algorithm's behavior. A detailed discussion of the characteristics of the algorithm is postponed to Section 8, after the performance of the algorithm has been analyzed with respect to a set of competitor algorithms. In this way, the characteristics of AntNet can be meaningfully evaluated and compared to those of other state-of-the-art algorithms.
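The two updates of item (ii), together with the power-law remapping used for data packets, can be sketched as follows; these helpers are hypothetical names, not the paper's code, with δ = 1.2 as reported above:

```python
def backward_ant_update(P_row, f, r):
    """Equation 5: reinforce the neighbor f the backward ant arrived
    from and decrease the other entries; the row stays normalized to 1,
    since r*(1 - P_f) is exactly what the other entries lose in total."""
    for n in P_row:
        if n == f:
            P_row[n] += r * (1.0 - P_row[n])
        else:
            P_row[n] -= r * P_row[n]

def data_packet_weights(P_row, delta=1.2):
    """Remap entries with f(p) = p**delta (delta > 1) so that data
    packets rarely follow links with very low probability; the caller
    draws the next hop proportionally to these weights."""
    return {n: p ** delta for n, p in P_row.items()}
```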
In the experiments we ran with this strategy (the simplest choice of setting r to a constant value, independent of the observed trip time), the algorithm showed moderately good performance. These results suggest that the "implicit" component of the algorithm, based on the ants' arrival rate, plays a very important role. Of course, to compete with state-of-the-art algorithms, the available information about path costs has to be used.
$$r = c_1 \left(\frac{W_{best}}{T}\right) + c_2 \left(\frac{I_{sup} - I_{inf}}{(I_{sup} - I_{inf}) + (T - I_{inf})}\right) \qquad (7)$$
In Equation 7, W_best is the best trip time experienced by the ants traveling toward the destination d, over the last observation window W. The maximum size of the window (the maximum number of considered samples before resetting the W_best value) is assigned on the basis of the coefficient η of Equation 1. As we said, η weighs the number of samples effectively giving a contribution to the value of the estimate, defining a sort of moving exponential window. Following the expression for the number of effective samples reported in footnote 7, we set |W|_max = 5(c/η), with c < 1. In this way, the long-term exponential mean and the short-term windowing refer to a comparable set of observations, with the short-term mean evaluated over a fraction c of the samples used for the long-term one. I_sup and I_inf are convenient estimates of the limits of an approximate confidence interval for μ. I_inf is set to W_best, while I_sup = μ + z(σ/√w), with z = 1/√(1 − γ), where γ gives the selected confidence level. There is some level of arbitrariness in our computation of the confidence interval, because we set it in an asymmetric way and μ and σ are not arithmetic estimates. Anyway, what we need is a quick, raw estimate of the mean value and of the dispersion of the values (for example, a local bootstrap procedure could have been applied to extract a meaningful confidence interval, but such a choice is not reasonable from a CPU time perspective).
The first term in Equation 7 simply evaluates the ratio between the current trip time and the best trip time observed over the current observation window. This term is corrected by the second one, which evaluates how far the value T is from I_inf relative to the extension of the confidence interval, that is, taking into account the stability of the latest trip times. The coefficients c1 and c2 weigh the importance of each term. The first term is the most important one, while the second plays the role of a correction. In the current implementation of the algorithm we set c1 = 0.7 and c2 = 0.3. We observed that c2 should not be too big (0.35 is an upper limit), otherwise performance starts to degrade appreciably. The behavior of the algorithm is quite stable for c2 values in the range 0.15 to 0.35, but setting c2 below 0.15 slightly degrades performance. The algorithm is very robust to changes in γ, which defines the confidence level: varying the confidence level in the range from 75% to 95% changes performance little. The best results were obtained for values around 75%-80%. We observed that the algorithm is very robust to its internal parameter settings, and we did not try to "adapt" the set of parameters to the problem instance. All the different experiments were carried out with the same "reasonable" settings. We could surely improve performance by means of a finer tuning of the parameters, but we did not, because we were interested in implementing a robust system, considering that the world of networks is incredibly varied in terms of traffic, topologies, switch and transmission characteristics, etc.
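Under the reconstruction of Equation 7 above, the reinforcement could be computed as in the following sketch; the Chebyshev-style bound for I_sup follows the confidence-interval discussion, and the parameter defaults mirror the values reported in the text:

```python
import math

def reinforcement(T, window, mu, var, c1=0.7, c2=0.3, gamma=0.8):
    """Score trip time T against the local model (Equation 7).
    window: recent trip times toward the destination; mu, var: the
    exponential estimates of Equation 1."""
    w_best = min(window)                      # I_inf = W_best
    z = 1.0 / math.sqrt(1.0 - gamma)          # z = 1/sqrt(1 - gamma)
    i_sup = mu + z * math.sqrt(var) / math.sqrt(len(window))
    i_inf = w_best
    r = c1 * (w_best / T)                     # first term: ratio to the best
    denom = (i_sup - i_inf) + (T - i_inf)
    if denom > 0:                             # guard a degenerate interval
        r += c2 * (i_sup - i_inf) / denom     # second term: stability correction
    return r
```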
The value r obtained from Equation 7 is finally transformed by means of a squash function s(x):
$$s(x) = \frac{1}{1 + \exp\!\left(\frac{a}{x\,|N_k|}\right)}, \qquad x \in (0, 1],\; a \in \mathbb{R}^+ \qquad (8)$$

and the obtained value is rescaled by s(1):

$$r \leftarrow \frac{s(r)}{s(1)} \qquad (9)$$
Squashing the r values allows the system to be more sensitive in rewarding good (high) values of r, while tending to saturate the rewards for bad (near-zero) r values: the scale is compressed for lower values and expanded in the upper part. In this way, emphasis is put on good results, while bad results play a minor role.
The coefficient a/|N_k| determines a parametric dependence of the squashed reinforcement value on the number |N_k| of neighbors of the reinforced node k: the greater the number of neighbors, the higher the reinforcement (see Figure 4). The reason to do this is that we want to have a similar, strong effect of good results on the probabilistic routing tables, independent of the number of neighbor nodes.
Figure 4: Examples of squash functions with a variable number of node neighbors.
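A direct transcription of Equations 8 and 9; the value of a is illustrative, as the paper's setting is not reproduced here:

```python
import math

def squash(x, n_neighbors, a=10.0):
    """Equation 8: s(x) = 1 / (1 + exp(a / (x * |N_k|)))."""
    return 1.0 / (1.0 + math.exp(a / (x * n_neighbors)))

def squashed_reinforcement(r, n_neighbors, a=10.0):
    """Equation 9: normalize by s(1) so that r = 1 maps to 1; more
    neighbors means a smaller a/|N_k| and hence a higher reward."""
    return squash(r, n_neighbors, a) / squash(1.0, n_neighbors, a)
```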