DonNTU Biography Library Links Autosynopsis Ind. Task FCIS Masters

Design and Analysis of Distributed Algorithms

Nicola Santoro

Design And Analysis of Distributed Algorithms, Published by John Wiley & Sons, Inc., Hoboken, New Jersey; pgs.541-549)

CHAPTER 9

Continuous Computations

9.1 INTRODUCTION

When we have been discussing computations in distributed environments, we have always considered computations that once started (by some impulse), terminate within finite time. The termination conditions can be explicit in the protocol (e.g., the entities enter terminal states) or implicit (and hence a termination detection protocol must be run concurrently). The key point is that, implicit or explicit, the termination occurs.

There are, however, computations that never terminate. These are, for example, computations needed for the control and maintenance of the environment, and they are “on” as long as the system is “on”: The protocols composing a distributed operating system, the transaction management protocols in a distributed transaction system, the network service protocols in a data communication network, the object management functions in a distributed object system, and so forth.

Because of this nature, these computations are called continuous computations.

We have already seen one such computation in Chapter 4, when dealing with the problem of maintaining routing tables; those protocols would never really terminate as long as there are changes in the network topology or in the traffic conditions.

Another example of continuous computation is the heartbeat protocol that provides a step-synchronization for the entities in the system: Each entity endlessly sends a “heartbeat” message to all its neighbors,waiting to receive one from all of them before its next transmission. Heartbeat protocols form the backbone of the management of most distributed systems and networks. It is, for example, used in most failure detection mechanisms: An entity decides that a failure has occurred if the wait for a heartbeat from a neighbor exceeds a timeout value.

In this chapter we will examine some basic problems whose solution requires continuous computations: maintaining logical clocks, controlling access to a shared resource or service, maintaining a distributed queue, and detecting and resolving deadlocks.

Some continuous problems are just the (endless) repetition of a terminating problem (plus adjustments); others could be solved in that way, but they also have unique nonterminating solutions; others yet do not have any terminating counterpart. In this chapter we will examine continuous problems of all these types.

Before we proceed, let us ask a simple but provocative question: What is the cost of a continuous computation?

As the computation never ends, the answer is obviously “infinite.” While true, it is not meaningful because then all continuous computations have the same cost. What this answer really points out is that we should not (because we cannot) measure the total cost of the entire execution of a continuous computation. Which measure is most appropriate depends on the nature of the problem. Consider the heartbeat protocol, whose total cost is infinite; The meaningful cost measure in this case is the total number of messages it uses per single beat: 2 m. In the case of the routing table maintenance protocols, a meaningful measure is the total number of messages exchanged in the system per change in the topology.

Summarizing, we will measure a continuous computation in terms of either its cost per basic operation it implements or its cost per basic event triggering its action.

9.2 KEEPING VIRTUAL TIME

9.2.1 Virtual Time and Causal Order

In a distributed computing environment, without additional restrictions, there is definitely no common notion of real (i.e., physical) time among the entities. Each entity has a local clock; however, each is independent of the others. In general this fact does not restrict our ability to solve problems or perform tasks; indeed, all the protocols we have designed, with the exception of those for fully synchronous systems, do not require any common notion of real time among the entities.

Still, there are cases when such a notion would be helpful. Consider, for example, the situation when we need to undo some operation a (e.g., the transmission of a message) that has been erroneously performed. In this case, we need to undo also everything (e.g., transmission of other messages) thatwas caused by a. In this context, it is necessary to determine whether a certain event or action b (e.g., the transmission of some other message by some other entity) was caused (directly or indirectly) by that original action a. If we find out that a happened after b, that is t(a)>t(b), we can exclude that b was caused by a, and we need not undo it. So, although it would not completely solve the problem, having access to real time would be useful.

As we know, entities do not have access to real time t . They can, however, create, using local clocks and counters, a common notion of time T among them, that would allow them to approximate real time or at least exploit some useful properties of real time.

When we talk about a common notion of time we mean a function T that assigns a value (not necessarily unique) from a partially ordered set to each event in the system; we will denote by < the partial order. To be meaningful, this function must satisfy two basic properties:

Local Events Ordering: Let a and b two events occuring both at x, with t(a)<t(b). Then T(a)<T(b).
Send/Receive Ordering: : Let a be the event at x whose reaction is the transmission of a message to neighbor y, and let b be the arrival at y of that message. Then T(a)<T(b).

Any function T satisfying these two properties will be called virtual time. The other desirable property is the one allowing us to simulate real time in the undo problem: If a “happened after” b in virtual time (i.e., T(a)>T(b)), then a did not cause b (directly or indirectly). Let us be more precise. We say that event a causally preceeds, or simply causes event b, and denote this fact by a -> b, if one of the following conditions holds:

both a and b occur at the same entity and t (a)<t(b);
a is the event at x whose reaction is the transmission of a message to neighbor y, and b is the arrival at y of that message;
there exists a sequence e1, e2..., ek of events such that e1 = a, ek = b, and ei -> ei+1.

We will say that two events a and b are causally related if a -> b or b -> a. Sometimes events are not causally related at all: We will say that a and b are independent if both a +> b and b +> a.

We can now formally define the property we are looking for:

Causal Order: For any two events a and b, if a -> b then T (a)<T(b).

Interestingly, the simultaneous presence of properties Local Events and Send/Receive ordering are enough to guarantee Causal Order (Exercise 9.6.1):

Property 9.2.1 Let T be virtual time. Then T satisfies Causal Order.

The problem is how can the entities create a virtual time T . This should be done if possible without generating additional messages. To achieve this goal, each entity x must create and maintain a virtual clock Txthat assigns an integer value to each event occurring locally; these virtual clocks define an overall time function T: For an event a occurring at x, T(a)=Tx(a); hence, the clocks must be designed and maintained in such a way that the function T is indeed virtual time. Our goal is to design an algorithm that specifies how to create such virtual clocks and maintain them. Clearly, mantaining virtual time is a continuous computation.

As virtual clocks are mechanisms we design and construct, one might ask whether it is possible to design them so that, in addition to Causal Order, they satisfy some other desirable property.

Consider again the case of the undo operation; Causal Order allows only to say that if T(a)>T(b), then a +> b, while what we really need to know is whether a -> b. So, for example, it would be very useful if the virtual clocks satisfy the much stronger property

Complete Causal Order: a -> b if and only if T(a)<T(b).

If we could construct virtual clocks that satisfy the Complete Causal Order property, then to identify what to undo would be easy: To completely undo a we must undo every b with T(b)>T(a).

Notice that real time is not complete with respect to causal order; in fact, t(a)<t(b) does not imply at all that a caused b! In other words, Complete Causal Order is not provided by real clocks. This suggests that creating virtual clocks with this property is not a trivial task.

Also notice that each local clock cx, by definition, satisfies the Complete Causal Order property for the locally occurring events. This means that as long as an entity does not interact with other entities, its local clock generates a completely consistent virtual time. The problems clearly arise when entities interact with each another.

In the following we will design an algorithm to construct and maintain virtual clocks; we will also develop a system of virtual clocks that satisfy Complete Causal Order. In both cases, we will assume the standard restrictions IR: Connectivity, Complete Reliability, and Bidirectional Links, as well as Unique Identifiers. We will also assume Message Ordering (i.e., FIFO links).

9.2.2 Causal Order: Counter Clocks

As locally generated events and actions are already naturally ordered by the local clocks, to construct and maintain virtual clocks (i.e., clocks that satisfy Causal Order), we have to worry mostly about the interaction between different entities. Fortunately, entities interact directly only through messages; clearly, the operation a of transmitting a message generates the event b of receiving that message, that is, a -> b. Hence, we must somehow handle the arrival of a message not like any other event or local action but as a special one: It is the moment when the local times of the two entities, the sender and the receiver, come into contact; we must ensure that this causal order is preserved by the clocks we are designing. A simple algorithm for clock construction and maintenance is the following.

Algorithm CounterClock:

We equip each entity x with a local integer counter Cx of the local events and actions, that is, Cx is initially set to 0 and it is increased by 1 every time x reacts to an event other than arrival of a message; the increment occurs at the beginning of the action.
Let us consider now the interaction between entities. Whenever an entity x sends a message to a neighbor y, it encloses in the message the current value of its local counter. Whenever an entity y receives a message with a counter value count, it increases its local counte Cy := 1 + max{Cy , count}

FIGURE 9.1: Virtual time generated by CounterClocks.

Consider, for example, the TED diagram shown in Figure 9.1; the message sent by z to y contains the counter value Cz = 5; just before receiving this message Cx = 3; when reacting to the message arrival, x sets Cx = 1 + max {5, 3} = 6.

This system of local counters defines a global measure of time C; for any event a at x, C(a) is just Cx(a). Notice that each local counter is totally consistent with its local clock: For any two local events a and b, Cx(a)<Cx(b), if and only if cx(a)<cx(b); as local clocks satisfy the causal order property for local events, these counters satisfy local events ordering. By construction, if a is the transmission of a message and b is its reception, then C(a)=Cx(a)<Cx(b)=C(b), that is, send/receive ordering holds.

In other words, algorithm CounterClock constructs and maintains virtual clocks:

Theorem 9.2.1 Let C be the global time defined by the local counters of algorithm CounterClock. For any two actions and/or events a and b, if a -> b then C(a)<C(b).

This algorithm achieves its goal without any additional communication. It does, however, require an additional field (the value of the local counter) in each message; the bookkeeping is minimal: limited to storing the counter and increasing its value at each event.

Notice that although the time function C created by algorithm CounterClock satisfies the causal order property like real time t , it may differ greatly from real time. For example (Exercises 9.6.2 and 9.6.3), it is possible that t (a) > t(b), while C(a)<C(b). It is also possible that two independent events, occurring at diffe rent entities at different times, have the same virtual time.

9.2.3 Complete Causal Order: Vector Clocks

With the virtual clocks generated by algorithm CounterClock, we are guaranteed that property Causal Order holds, that is, if a -> b, then C(a)<C(b). However, the converse is not true. In fact, it is possible that C(a)<C (b), but a +> b. This means that if C(a)<C(b), it is impossible for us to decide whether or not a causes b. By contrast, as we mentioned earlier, it is precisely this type of knowledge that is the most helpful, for example, in the undo operation case.

It is natural to ask whether we can design virtual clocks that satisfy the much more powerful Complete Causal Order property. Let us point out again that real time clocks do not satisfy this property. Surprisingly, it is possible to achieve this property using solely local counters; however, we need many of them together; let us see how.

For simplicity, let us assume that we have established a total order among the entities, for example, by ranking them according to their ids (see Problem 2.9.4); thus, we will denote the entities as x1, x2..., xn, where the index of an entity denotes its position in the total order.

Algorithm VectorClock:

We equip each entity хi with a local integer counter Ci of the local events, that is, Ci is initially set to 0 and it is increased by 1 every time хi reacts to an event; the increment occurs at the beginning of the action. We equip each entity хi also with a n-dimensional vector Vi, of values, one for each entity in the network. The value Vi[i] – is always the value of the local counter Ci; the value of Vi[j], i≠j, is initially 0 and can change only when a message arrives at хi , according to the rule 2(b) described next.
Let us consider now the interaction between entities.
1. Whenever an entity хi sends a message to a neighbor xj, it encloses in the message the vector of values Vi.
2. Whenever an entity xj processes the arrival of a message with a vector vect of values, it updates its local vector Vj as follows: for all i≠j, it sets Vj[i] := max{vect[i], Vj[i]}.

As an example, in the TED diagram shown in Figure 9.2, when x1 receives the message from x2, its vector is [2 0 0], while the message contains vector [1 2 0]; when reacting to the message, x1 will first increase its local counter transforming its vector into [3 0 0] and then process the message transforming its vector into [3 2 0].

Consider an event a at хi . We define Vi(a) as follows: If a is the reception of a message, then Vi(a) is the value of the vector Vi after its updating when processing the message. For all other events (impulses and alarm clock ringing), Vi(a) is just the value of vector Vi when event a is processed (recall that the local counter is increased as the first operation of the processing).

FIGURE 9.2: Virtual time generated by VectorClocks.

This system of local vectors defines a global time function V : For any event a at xi , V(a) is just Vi(a). Notice that the values assigned to events by the time function V are vectors.

Let us now define the partial order we will use on vectors: Given any two n-dimensional vectors A and B, we say that A≤B if A[i]≤B[i] for all indices i; we say that A < B if and only if A≤B and A[i]<B[i] for at least an index i. So, for example, [1 2 0]<[3 2 0].

Notice that from the definition, it follows that some values are not comparable; for example, [1 3 0] НЕ больше или равно [3 2 0] and [3 2 0][1 3 0].

It is not difficult to see that the global time V with the partial order so defined is a virtual time, that is, it satisfies the Causal Order property. In fact, by construction,

Property 9.2.2 For any two events a and b at хi, Vi(a)<Vi(b) if and only if t(a)<t(b).

This means that V satisfies local events ordering. Next observe that these local vectors satisfy also send/receive ordering (Exercise 9.6.4):

Property 9.2.3 Let a be an event in whose reaction a message is transmitted by хi , and let b be the reception of that message by xj. Then V(a)=Vi(a)<Vj(b)=V(b).

Therefore, these local vectors are indeed virtual clocks:

Lemma 9.2.1 For any two events a and b, if a -> b, then V(a)<V(b).

Interestingly, as already mentioned, the converse is also true (Exercise 9.6.5):

Lemma 9.2.2 For any two events a and b, if V(a)<V(b), then a -> b.

That is, by Lemmas 9.2.1 and 9.2.2, the local vectors satisfy the Complete Causal Order property:

Theorem 9.2.2 Let V be the global time defined by the local counters of algorithm VectorClock. For any two events a and b, V(a)<V(b) if and only if a -> b.

Vector clocks have many other interesting properties also. For example, consider the vector clock when an entity xi reacts to an event a; the value of each component of the vector clock Vi(a) can give precise information about how many preceeding events are causally related to a. In fact,

Property 9.2.4 Let a be an event occurring at хi .

Vi(a)[j] - is the number of events e occurred at xi such that e -> a.
he total number of events e where e -> a is precisely .

It is also possible for an entity xi It is also possible for an entity М’ and М” are causally related or independent;

Property 9.2.5 Let vect’ and vect” be the vectors included in messages M’ and M”, respectively, received by xi. If vect’<vect” or vect’>vect”, hen the events that caused the transmission of those messages are causally related, else they are independent.

This property is useful, for example, when we do want to discard obsolete messages: If two messages are independent, both should probably be kept; by contrast, if they are causally related, only the most recent (i.e., with the greater vector) needs to be kept.

Let us now consider the cost of algorithm VectorClock. This algorithm requires that an n-dimensional vector of counters is included in each message. By contrast, it ensures a much stronger property that not even real clocks can offer. Indeed, the dimension n is necessary to ensure Complete Causal Order using timestamps (Problem 9.6.1).

A way to decrease the amount of additional information transmitted with each message is to include in each message not the entire vector but only the entries that have changed since last message to the same neighbor.

For large systems with frequent communication, this approach can significantly reduce the total amount of transmitted data with respect to always sending the vector. The drawback is the increased storage and bookkeeping: Each entity xi must remember, for each neighbor xj and for each entry k in the vector, the last value of Vi[k] that xi sent to xj . Another drawback is that Property 9.2.5 would no longer hold (Exercise 9.6.8).

9.2.4 Concluding Remarks

Hacking In presenting algorithm VectorClocks we have assumed that there is an a priori total ordering of the entities, and that each entity knows both its rank in the ordering and the total number n of entities. This can be clearly obtained, for example, by performing a ranking protocol on the entities’ ids. The cost for this operation is expensive, O(n2) messages in the worst case, even if there is already a leader and a spanning tree. However, this cost would be incurred only once, before the creation of the clocks takes place.

Interestingly, with simple modifications to algorithm VectorClocks, it is possible to achieve the goal (i.e., to construct a virtual clock satisfying the Complete Causal Order property) without any a priori knowledge and yet without incurring in any initial cost; even more interesting is the fact that, in some cases, maintaining the clocks requires much less information inside the messages.

We shall call this algorithm PseudoVectorClocks and leave its specification and analysis as an exercise (Problem 9.6.2 and Exercise 9.6.9).

Bounding the Clocks The major problem with both CounterClocks and with VectorClocks is that the values of the counters are monotonically increasing: They keep on growing. This means that these values and, hence, the bit complexity of the messages are unbounded.

This problem is quite serious especially with VectorClocks. A possible solution is to occasionally reset the vectors; the difficulty with this approach is clearly caused by messages in transit: The resetting of the virtual clocks will destroy any existing causal order between the arrival of these messages and the events that caused their transmission.

Any strategy to avoid this unfortunate consequence (Problem 9.6.3) is bound to be both expensive and intrusive.