Nicola Santoro Design and analysis of distributed algorithms

Первоисточник: Design and analysis of distributed algorithms, Nicola Santoro, Carleton University, Ottawa, Canada

Basic Problems and Protocols

The aim of this chapter is to introduce some of the basic, primitive, computational

problems and solution techniques. These problems are basic in the sense that their

solution is commonly (sometimes frequently) required for the functioning of the system

(e.g., broadcast and wake-up); they are primitive in the sense that their computation

is often a preliminary step or a module of complex computations and protocols

(e.g., traversal and spanning-tree construction).

Some of these problems (e.g., broadcast and traversal), by their nature, are started

by a single entity; in other words, these computational problems have, in their definition,

the restriction unique initiator (UI). Other problems (e.g., wake-up and spanningtree

construction) have no such restriction. The computational differences created by

the additional assumption of a single initiator can be dramatic.

In this chapter we have also included the discussions on the (multiple-initiators)

computations in tree networks. Their fundamental importance derives from the fact

that most global problems (i.e., problems that, to be solved, require the involvement

of all entities), oftentimes can be correctly, easily, and efficiently solved by designing

a protocol for trees and executing it on a spanning-tree of the network.

All the problems considered here require, for their solution, the Connectivity (CN)

restriction (i.e., every entity must be reachable from every other entity). In general, and

unless otherwise stated, we will also assume Total Reliability (TR) and Bidirectional

Links (BL). These three restrictions are commonly used together, and the setR= {BL,

CN, TR} will be called the set of standard restrictions.

The techniques we introduce in this chapter to solve these problems are basic ones;

once properly understood, they form a powerful and an essential toolset that can be

effectively employed by every designer of distributed algorithms.

2.1 BROADCAST

2.1.1 The Problem

Consider a distributed computing system where only one entity, x, knows some important

information; this entity would like to share this information with all the other

entities in the system; see Figure 2.1. This problem is called broadcasting (Bcast),

and already we have started its examination in the previous chapter. To solve this

problem means to design a set of rules that, when executed by the entities, will lead

(within finite time) to a configuration where all entities will know the information; the

solution must work regardless of which entity has the information at the beginning.

Built-in the definition of the problem, there is the assumption, Unique Initiator

(UI), that only one entity will start the task. Actually, this assumption is further

restricted, because the unique initiator must be the one with the initial information;

we shall denote this restriction by UI+.

To solve this problem, every entity must clearly be involved in the computation.

Hence, for its solution, broadcasting requires the Connectivity (CN) restriction (i.e.,

every entity must be reachable from every other entity) otherwise some entities will

never receive the information.We have seen a simple solution to this problem, Flooding,

under two additional restrictions: Total Reliability (TR) and Bidirectional Links

(BL). Recall that the set R = {BL, CN, TR} is the set of standard restrictions .

2.1.2 Cost of Broadcasting

As we have seen, the solution protocol Flooding usesO(m) messages and, in theworst

case, O(d) ideal time units, where d is the diameter of the network.

The first and natural question is whether these costs could be reduced significantly

(i.e., in order of magnitude) using a different approach or technique, and if so, by how

much. This question is equivalent to ask what is the complexity of the broadcasting

problem. To answer this type of questions we need to establish a lower bound: to

find a bound f (typically, a function of the size of the network) and to prove that the

cost of every solution algorithm is at least f. In other words, a lower bound is needed

irrespective of the protocol, and it depends solely on the problem; hence, it is an

indication of how complex the problem really is.

We will denote byM(Bcast/RI+) and T (Bcast/RI+) the message and the time

complexity of broadcasting under RI+ = R ∪ UI+, respectively.

A lower bound on the amount of ideal time units required to perform a broadcast is

simple to derive: Every entity must receive the information regardless of how distant

they are from the initiator, and any entity could be the initiator. Hence, in the worst

case,

T (Bcast/RI+) ≥ Max{d(x, y) : x, y ∈ V} = d. (2.1)

The fact that Flooding performs the broadcast in d ideal time units means that the

lower bound is tight (i.e., it can be achieved) and that Flooding is time optimal. In

other words, we know exactly the ideal time complexity of broadcasting:

Property 2.1.1 The ideal time complexity of broadcasting under RI+ is _(d).

Let us now consider the message complexity. An obvious lower bound on the number

of messages is also easy to derive: in the end, every entity must know the information;

thus a message must be received by each of the n−1 entities, which initially did not

have the information. Hence,

M(Bcast/RI+) ≥ n −1.

With a little extra effort, we can derive a more accurate lower bound:

Theorem 2.1.1 M(Bcast/RI+) ≥ m.

Proof. Assume that there exists a correct broadcasting protocol A which, in each

execution, under RI+ on every G, uses fewer than m(G) messages. This means that

there is at least one link in G where no message is transmitted in any direction during

an execution of the algorithm. Consider an execution of the algorithm on G, and let

e=(x, y) ∈ E be the link where no message is transmitted by A. Now construct a

new graph G from G by removing the edge e, and adding a new node z and two new

edges e1 = (x, z) and e2 = (y, z) (see Fig. 2.2). Set z in a noninitiator status. Run

exactly the same execution of A on the new graph G : since no message was sent

along (x, y), this is possible. But since no message was sent along (x, y) in the original

execution, x and y never send a message to z in the current execution. As a result, z

will never receive the information (i.e., change status). This contradicts the fact that

A is a correct broadcasting protocol. _

FIGURE 2.2: A message must be sent on each link.

This means that any broadcasting algorithm requires _(m) messages.

Since Flooding solves broadcasting with 2m − n + 1 messages (see Exercise

2.9.1), this impliesM(Bcast/RI+) ≤ 2m − n + 1. Since the upper bound and the

lower bound are of the same order of magnitude, we can summarize

Property 2.1.2 The message complexity of broadcasting under RI+ is _(m).

The immediate consequence is that, in order of magnitude, Flooding is a messageoptimal

solution. Thus, if we want to design a new protocol to improve the 2m − n + 1 cost of Flooding, the best we can hope to achieve is to reduce the constant

2; in any case, because of Theorem 2.1.1, the reduction cannot bring the constant

below 1.

2.1.3 Broadcasting in Special Networks

The results we have obtained so far apply to generic solutions; that is, solutions that

do not depend onGand can thus be applied regardless of the communication topology

(provided it is undirected and connected).

Next, we will consider performing the broadcast in special networks. Throughout

we will assume the standard restrictions plus UI+.

Broadcasting in Trees Consider the case whenGis a tree; that is,Gis connected

and contains no cycles. In a tree, m = n−1; hence, the use of protocol Flooding for

broadcasting in a tree will cost 2m − (n − 1) = 2(n − 1) − (n − 1) = n − 1

messages.

IMPORTANT. This cost is achieved even if the entities do not know that the network

is a tree.

IMPORTANT. An interesting side effect of broadcasting on a tree is that the tree

becomes rooted in the initiator of the broadcast.

Broadcasting in Oriented Hypercubes A communication topology that is

commonly used as an interconnection network is the (k-dimensional) labeled hypercube,

denoted by Hk.

A oriented hypercube H1 of dimension k = 1 is just a pair of nodes called (in

binary) “0” and “1,” connected by a link labeled “1” at both nodes.

A hypercube Hk of dimension k > 1 is obtained by taking two hypercubes of

dimension k − 1–H k −1 and Hk−1–and connecting the nodes with the same name

with a link labeled k at both nodes; the name of each node in H k −1 (respectively

H k−1) is then modified by prefixing it with the bit 0 (respectively, 1); see

Figure 2.3.

So, for example, node “0010” in H 4 will be connected to node “0010” in H4 by a

link labeled l = 5, and their names will become “00010” and “10010,” respectively.

This labeling l of the links is symmetric (i.e., lx (x, y)= ly (x, y)) and is called the

dimensional labeling of a hypercube.

IMPORTANT. These names are used only for descriptive purposes; they are not

known to the entities. By contrast, the labels of the links (i.e., the port numbers) are

known to the entities by the Local Orientation axiom.

A hypercube of dimension k has n = 2k nodes; each node has k links, labeled

1, 2, . . . , k. Hence the total number of links ism = nk/2 = (n/2) log n = O(n log n).

Astraightforward application of Flooding in a hypercube will cost 2m − (n − 1) =

n log n − (n − 1) = n log n/2 + 1 = O(n log n) messages. However, hypercubes are

highly structured networks with many interesting properties. We can exploit these

special properties to construct a more efficient broadcast. Obviously, if we do so, the

protocol cannot be used in other networks.

Consider the following simple strategy.

Strategy HyperFlood:

1. The initiator sends the message to all its neighbors.

2. A node receiving a message from the link labeled l will send the messages only

to those neighbors with label l < l.

NOTE. The only difference between HyperFlood and the normal Flooding is in step

2: Instead of sending the message to all neighbors except the sender, the entity will

forward it only to some of them, which will depend on the label of the port from

where the message is received.

As we will see, this strategy correctly performs the broadcast using only n − 1

messages (instead of O(n log n)). Let us first examine termination and correctness.

Let Hk(x) denote the subgraph of Hk induced by the links where messages are sent

by HyperFlood when x is the initiator. Clearly every node in Hk(x) will receive the

information.

Lemma 2.1.1 HyperFlood correctly terminates.

Proof. Let x be the initiator; starting from x, the messages are sent only on links with

decreasing labels, and if y receives the message from link 4 it will forward it only to

the ports 1, 2, and 3. To prove that every entity will receive the information sent by

x, we need to show that, for every node y, there is a path from x to y such that the

sequence of the labels on the path from x to y is decreasing. (Note that the labels on

the path do not need to be consecutive integers.) To do so we will use the following

property of hypercubes.

Property 2.1.3 In a k-dimensional hypercube Hk, any node x is connected to any

other node y by a path π ∈ ˙[x, y] such that _(π) is a decreasing sequence.

Proof. Consider the k-bit names of x and of y in Hk: _xk, xk−1, . . . , x1, x0_ and

_yk, yk−1, . . . , y1, y0_. If x = y, these two strings will differ in t ≥ 1 positions.

Let j1, j2, . . . , jt be the positions in decreasing order; that is, ji > ji+1. Consider

now the nodes v0, v1, v2, . . . , vt , where v0 = x, and the name of vi differs from the

name of vi+1 only in the ji+1-th position. Thus, there is a link labeled ji+1 connecting

vi to vi+1, and clearly vt = y. But this means that _v0, v1, v2, . . . , vt _ is a

path from x to y, and the sequence of labels on this path is _j1, j2, . . . , jt _, which is

decreasing. _

Thus, Hk(x) is connected and spans (i.e., it contains all the nodes of) Hk, regardless

of x. In other words, within finite time, every entity will have the information. _

Let us now concentrate on the cost of HyperFlood. First of all observe that

M[HyperFlood/Hk] = n − 1.

To prove that only n − 1 messages will be sent during the broadcast, we just need

to show that every entity will receive the information only once. This is true because,

for every x, Hk(x) contains no cycles (see Exercise 2.9.9).

Also as an exercise it is left the proof that for every x, the eccentricity of x in Hk(x)

is k (see Exercise 2.9.10); this implies that the ideal time delay of HyperFlood in Hk

is always k. That is,

T[HyperFlood/Hk] = k (2.3)

These costs are the best that any broadcast algorithm can perform in a hypercube

regardless of how much more knowledge they have. However, they are obtained

here under the additional restriction that the network is a k-dimensional hypercube

with a dimensional labeling; that is, under H = {(G, l) = Hk}. Summarizing, we

have

Property 2.1.4 The ideal time complexity of broadcasting in a k-dimensional

hypercube with a dimensional labeling under RI+ is _(k).

Property 2.1.5 The message complexity of broadcasting in a k-dimensional hypercube

with a dimensional labeling under RI+ is _(n).

IMPORTANT. The reason why we are able to “bypass” the _(m) lower bound

expressed by Theorem 2.1.1 is because we are restricting the applicability of the

protocol.

Broadcasting in Complete Graphs Among all network topologies, the complete

graph is the one with the most links: Every entity is connected to all others;

thus m = n(n − 1)/2 = O(n2) (recall we are considering bidirectional links), and

d = 1.

The use of a generic protocol will require O(n2) messages. But this is really

unnecessary.

Broadcasting in a complete graph is easily accomplished: Because everybody

is connected to everybody else, the initiator just needs to send the information to

its neighbors (i.e., execute the command “send(I) to N(x)”) and the broadcast is

completed. This uses only n − 1 messages and d = 1 ideal time.

Clearly this protocol, KBcast, works only in a complete graph, that is under the

additional restriction K ≡ “G is a complete graph.” Summarizing

Property 2.1.6 The message and the ideal time complexity of broadcasting

in a complete graph under RI+ is _(k) are M(Bcast/RI+ ;K) = n − 1 and

T (Bcast/RI+ ;K) = 1, respectively.