ฯๅ๐โ๎่๑๒๎๗ํ่๊: Design and analysis of distributed algorithms, Nicola Santoro, Carleton University, Ottawa, Canada
Basic Problems and Protocols
The aim of this chapter is to introduce some of the basic, primitive, computational
problems and solution techniques. These problems are basic in the sense that their
solution is commonly (sometimes frequently) required for the functioning of the system
(e.g., broadcast and wake-up); they are primitive in the sense that their computation
is often a preliminary step or a module of complex computations and protocols
(e.g., traversal and spanning-tree construction).
Some of these problems (e.g., broadcast and traversal), by their nature, are started
by a single entity; in other words, these computational problems have, in their definition,
the restriction unique initiator (UI). Other problems (e.g., wake-up and spanningtree
construction) have no such restriction. The computational differences created by
the additional assumption of a single initiator can be dramatic.
In this chapter we have also included the discussions on the (multiple-initiators)
computations in tree networks. Their fundamental importance derives from the fact
that most global problems (i.e., problems that, to be solved, require the involvement
of all entities), oftentimes can be correctly, easily, and efficiently solved by designing
a protocol for trees and executing it on a spanning-tree of the network.
All the problems considered here require, for their solution, the Connectivity (CN)
restriction (i.e., every entity must be reachable from every other entity). In general, and
unless otherwise stated, we will also assume Total Reliability (TR) and Bidirectional
Links (BL). These three restrictions are commonly used together, and the setR= {BL,
CN, TR} will be called the set of standard restrictions.
The techniques we introduce in this chapter to solve these problems are basic ones;
once properly understood, they form a powerful and an essential toolset that can be
effectively employed by every designer of distributed algorithms.
2.1 BROADCAST
2.1.1 The Problem
Consider a distributed computing system where only one entity, x, knows some important
information; this entity would like to share this information with all the other
entities in the system; see Figure 2.1. This problem is called broadcasting (Bcast),
and already we have started its examination in the previous chapter. To solve this
problem means to design a set of rules that, when executed by the entities, will lead
(within finite time) to a configuration where all entities will know the information; the
solution must work regardless of which entity has the information at the beginning.
Built-in the definition of the problem, there is the assumption, Unique Initiator
(UI), that only one entity will start the task. Actually, this assumption is further
restricted, because the unique initiator must be the one with the initial information;
we shall denote this restriction by UI+.
To solve this problem, every entity must clearly be involved in the computation.
Hence, for its solution, broadcasting requires the Connectivity (CN) restriction (i.e.,
every entity must be reachable from every other entity) otherwise some entities will
never receive the information.We have seen a simple solution to this problem, Flooding,
under two additional restrictions: Total Reliability (TR) and Bidirectional Links
(BL). Recall that the set R = {BL, CN, TR} is the set of standard restrictions .
2.1.2 Cost of Broadcasting
As we have seen, the solution protocol Flooding usesO(m) messages and, in theworst
case, O(d) ideal time units, where d is the diameter of the network.
The first and natural question is whether these costs could be reduced significantly
(i.e., in order of magnitude) using a different approach or technique, and if so, by how
much. This question is equivalent to ask what is the complexity of the broadcasting
problem. To answer this type of questions we need to establish a lower bound: to
find a bound f (typically, a function of the size of the network) and to prove that the
cost of every solution algorithm is at least f. In other words, a lower bound is needed
irrespective of the protocol, and it depends solely on the problem; hence, it is an
indication of how complex the problem really is.
We will denote byM(Bcast/RI+) and T (Bcast/RI+) the message and the time
complexity of broadcasting under RI+ = R ∪ UI+, respectively.
A lower bound on the amount of ideal time units required to perform a broadcast is
simple to derive: Every entity must receive the information regardless of how distant
they are from the initiator, and any entity could be the initiator. Hence, in the worst
case,
T (Bcast/RI+) ≥ Max{d(x, y) : x, y ∈ V} = d. (2.1)
The fact that Flooding performs the broadcast in d ideal time units means that the
lower bound is tight (i.e., it can be achieved) and that Flooding is time optimal. In
other words, we know exactly the ideal time complexity of broadcasting:
Property 2.1.1 The ideal time complexity of broadcasting under RI+ is _(d).
Let us now consider the message complexity. An obvious lower bound on the number
of messages is also easy to derive: in the end, every entity must know the information;
thus a message must be received by each of the n−1 entities, which initially did not
have the information. Hence,
M(Bcast/RI+) ≥ n −1.
With a little extra effort, we can derive a more accurate lower bound:
Theorem 2.1.1 M(Bcast/RI+) ≥ m.
Proof. Assume that there exists a correct broadcasting protocol A which, in each
execution, under RI+ on every G, uses fewer than m(G) messages. This means that
there is at least one link in G where no message is transmitted in any direction during
an execution of the algorithm. Consider an execution of the algorithm on G, and let
e=(x, y) ∈ E be the link where no message is transmitted by A. Now construct a
new graph G from G by removing the edge e, and adding a new node z and two new
edges e1 = (x, z) and e2 = (y, z) (see Fig. 2.2). Set z in a noninitiator status. Run
exactly the same execution of A on the new graph G : since no message was sent
along (x, y), this is possible. But since no message was sent along (x, y) in the original
execution, x and y never send a message to z in the current execution. As a result, z
will never receive the information (i.e., change status). This contradicts the fact that
A is a correct broadcasting protocol. _
FIGURE 2.2: A message must be sent on each link.
This means that any broadcasting algorithm requires _(m) messages.
Since Flooding solves broadcasting with 2m − n + 1 messages (see Exercise
2.9.1), this impliesM(Bcast/RI+) ≤ 2m − n + 1. Since the upper bound and the
lower bound are of the same order of magnitude, we can summarize
Property 2.1.2 The message complexity of broadcasting under RI+ is _(m).
The immediate consequence is that, in order of magnitude, Flooding is a messageoptimal
solution. Thus, if we want to design a new protocol to improve the 2m − n + 1 cost of Flooding, the best we can hope to achieve is to reduce the constant
2; in any case, because of Theorem 2.1.1, the reduction cannot bring the constant
below 1.
2.1.3 Broadcasting in Special Networks
The results we have obtained so far apply to generic solutions; that is, solutions that
do not depend onGand can thus be applied regardless of the communication topology
(provided it is undirected and connected).
Next, we will consider performing the broadcast in special networks. Throughout
we will assume the standard restrictions plus UI+.
Broadcasting in Trees Consider the case whenGis a tree; that is,Gis connected
and contains no cycles. In a tree, m = n−1; hence, the use of protocol Flooding for
broadcasting in a tree will cost 2m − (n − 1) = 2(n − 1) − (n − 1) = n − 1
messages.
IMPORTANT. This cost is achieved even if the entities do not know that the network
is a tree.
IMPORTANT. An interesting side effect of broadcasting on a tree is that the tree
becomes rooted in the initiator of the broadcast.
Broadcasting in Oriented Hypercubes A communication topology that is
commonly used as an interconnection network is the (k-dimensional) labeled hypercube,
denoted by Hk.
A oriented hypercube H1 of dimension k = 1 is just a pair of nodes called (in
binary) 0 and 1, connected by a link labeled 1 at both nodes.
A hypercube Hk of dimension k > 1 is obtained by taking two hypercubes of
dimension k − 1H k −1 and Hk−1and connecting the nodes with the same name
with a link labeled k at both nodes; the name of each node in H k −1 (respectively
H k−1) is then modified by prefixing it with the bit 0 (respectively, 1); see
Figure 2.3.
So, for example, node 0010 in H 4 will be connected to node 0010 in H4 by a
link labeled l = 5, and their names will become 00010 and 10010, respectively.
This labeling l of the links is symmetric (i.e., lx (x, y)= ly (x, y)) and is called the
dimensional labeling of a hypercube.
IMPORTANT. These names are used only for descriptive purposes; they are not
known to the entities. By contrast, the labels of the links (i.e., the port numbers) are
known to the entities by the Local Orientation axiom.
A hypercube of dimension k has n = 2k nodes; each node has k links, labeled
1, 2, . . . , k. Hence the total number of links ism = nk/2 = (n/2) log n = O(n log n).
Astraightforward application of Flooding in a hypercube will cost 2m − (n − 1) =
n log n − (n − 1) = n log n/2 + 1 = O(n log n) messages. However, hypercubes are
highly structured networks with many interesting properties. We can exploit these
special properties to construct a more efficient broadcast. Obviously, if we do so, the
protocol cannot be used in other networks.
Consider the following simple strategy.
Strategy HyperFlood:
1. The initiator sends the message to all its neighbors.
2. A node receiving a message from the link labeled l will send the messages only
to those neighbors with label l < l.
NOTE. The only difference between HyperFlood and the normal Flooding is in step
2: Instead of sending the message to all neighbors except the sender, the entity will
forward it only to some of them, which will depend on the label of the port from
where the message is received.
As we will see, this strategy correctly performs the broadcast using only n − 1
messages (instead of O(n log n)). Let us first examine termination and correctness.
Let Hk(x) denote the subgraph of Hk induced by the links where messages are sent
by HyperFlood when x is the initiator. Clearly every node in Hk(x) will receive the
information.
Lemma 2.1.1 HyperFlood correctly terminates.
Proof. Let x be the initiator; starting from x, the messages are sent only on links with
decreasing labels, and if y receives the message from link 4 it will forward it only to
the ports 1, 2, and 3. To prove that every entity will receive the information sent by
x, we need to show that, for every node y, there is a path from x to y such that the
sequence of the labels on the path from x to y is decreasing. (Note that the labels on
the path do not need to be consecutive integers.) To do so we will use the following
property of hypercubes.
Property 2.1.3 In a k-dimensional hypercube Hk, any node x is connected to any
other node y by a path π ∈ ˙[x, y] such that _(π) is a decreasing sequence.
Proof. Consider the k-bit names of x and of y in Hk: _xk, xk−1, . . . , x1, x0_ and
_yk, yk−1, . . . , y1, y0_. If x = y, these two strings will differ in t ≥ 1 positions.
Let j1, j2, . . . , jt be the positions in decreasing order; that is, ji > ji+1. Consider
now the nodes v0, v1, v2, . . . , vt , where v0 = x, and the name of vi differs from the
name of vi+1 only in the ji+1-th position. Thus, there is a link labeled ji+1 connecting
vi to vi+1, and clearly vt = y. But this means that _v0, v1, v2, . . . , vt _ is a
path from x to y, and the sequence of labels on this path is _j1, j2, . . . , jt _, which is
decreasing. _
Thus, Hk(x) is connected and spans (i.e., it contains all the nodes of) Hk, regardless
of x. In other words, within finite time, every entity will have the information. _
Let us now concentrate on the cost of HyperFlood. First of all observe that
M[HyperFlood/Hk] = n − 1.
To prove that only n − 1 messages will be sent during the broadcast, we just need
to show that every entity will receive the information only once. This is true because,
for every x, Hk(x) contains no cycles (see Exercise 2.9.9).
Also as an exercise it is left the proof that for every x, the eccentricity of x in Hk(x)
is k (see Exercise 2.9.10); this implies that the ideal time delay of HyperFlood in Hk
is always k. That is,
T[HyperFlood/Hk] = k (2.3)
These costs are the best that any broadcast algorithm can perform in a hypercube
regardless of how much more knowledge they have. However, they are obtained
here under the additional restriction that the network is a k-dimensional hypercube
with a dimensional labeling; that is, under H = {(G, l) = Hk}. Summarizing, we
have
Property 2.1.4 The ideal time complexity of broadcasting in a k-dimensional
hypercube with a dimensional labeling under RI+ is _(k).
Property 2.1.5 The message complexity of broadcasting in a k-dimensional hypercube
with a dimensional labeling under RI+ is _(n).
IMPORTANT. The reason why we are able to bypass the _(m) lower bound
expressed by Theorem 2.1.1 is because we are restricting the applicability of the
protocol.
Broadcasting in Complete Graphs Among all network topologies, the complete
graph is the one with the most links: Every entity is connected to all others;
thus m = n(n − 1)/2 = O(n2) (recall we are considering bidirectional links), and
d = 1.
The use of a generic protocol will require O(n2) messages. But this is really
unnecessary.
Broadcasting in a complete graph is easily accomplished: Because everybody
is connected to everybody else, the initiator just needs to send the information to
its neighbors (i.e., execute the command send(I) to N(x)) and the broadcast is
completed. This uses only n − 1 messages and d = 1 ideal time.
Clearly this protocol, KBcast, works only in a complete graph, that is under the
additional restriction K ≡ G is a complete graph. Summarizing
Property 2.1.6 The message and the ideal time complexity of broadcasting
in a complete graph under RI+ is _(k) are M(Bcast/RI+ ;K) = n − 1 and
T (Bcast/RI+ ;K) = 1, respectively.