Roozbeh
Shad Graduate student Email: Rouzbeh_shad@yahoo.com
Hamid Ebadi Assistant professor Email: ebadi@kntu.ac.ir
Mohsen
Ghods Graduate student Email: mohsen59ghods@yahoo.com
Dept of Geodesy and Geomatics Eng. K.N.Toosi University of
Technology No 1346, Mirdamad cross, Valiasr st., Tehran, IRAN Fax
(21) 877 9476
Evaluation of Route Finding Methods in GIS
Application
Sourñe: gisdevelopment.net/technology/gis/ma03202pf.htm
Abstract
One of the cases that
distinguishes GIS from other information systems are spatial information
functions.These functions usually provide selection switches and solutions
for GIS users. Simutaneously with GIS techniques development, GIS
executive analyse functions has been developed interestingly. For instance
network analysis is one of these functions.
Shortest Path finding
is one of the network analysis functions that introduced as important
applications in transportation problems. With respect to route finding
analysis, input data type and volume diversity and different effective
parameters on a route finding algorithms performance applications, there
have been evaluated different methods for this solution.
Therefore
in this paper with briefly reviewing of graph theory and effective
parameters on the route finding algorithm’s performance, we evaluated
processing time of Dijkstra, Huristic and Genetic Algorithms using Visual
Basic, MapObject and Visual C++ programing languages. We didn’t find clear
answer for mentioned algorithms which performed on real road networks
because their performance depended on the problem complexity and the
number of nodes in real road networks. Finally We had some suggestion for
using mentioned algorithms with respect to specific
conditions.
1-Introduction
With the development of
geographic information systems (GIS) technology, network transportation
analyses within a GIS environment have become a common practice in many
application areas. A key problem in network and transportation analyses is
the computation of shortest paths between different locations on a
network. Sometimes this computation has to be done in real time. In some
cases the fastest route has to be determined in a few seconds in order to
ensure the safety of a patient. Moreover, when large real road networks
are involved in an application, the determination of shortest paths on a
large network can be computationally very intensive. Because many
applications involve real road networks and because the computation of a
fastest route (shortest path) requires an answer in real time, a natural
question to ask is: Which shortest path algorithm runs fastest on real
road networks?
Although considerable empirical studies on the
performance of shortest path algorithms have been done, there is no clear
answer as to which algorithm, or a set of algorithms runs fastest on real
road networks (Cherkassky, 1993). In our case study conducted by a set of
three shortest path algorithms that run fastest on real road networks has
been identified. These three algorithms are: 1) Dijkstra’s Algorithm, 2)
Huristic Methods, and 3) Genetic Algorithm. As a sequel to that study,
this paper reviews and summarizes these three algorithms, and demonstrates
implementation strategies related to the algorithms.
2-Graph
Theory
In Graph Theory a network is defined as a directed graph G =
(N, A) consisting of a set N of nodes and a set A of arcs with associated
numerical values, such as the number of nodes, n=|N|, the number of arcs,
m=|A|, and the length of an arc connecting nodes i and j, denoted as
l(i,j). The shortest path problem can be stated as follows: given a
network, find the shortest distances (least costs) from a source node to
all other nodes or to a subset of nodes on the network (Ahuja et al,
1993). These shortest paths represent a directed tree T rooted from a
source node s with the characteristic that a unique path from s to any
node i on the network is the shortest path to that node. The length of the
shortest path from s to any node i is denoted as d(i). This directed tree
is called a shortest path tree. For any network with n nodes, one can
obtain n distinctive shortest path trees. Shortest paths from one (source)
node to all other nodes on a network are normally referred as one-to-all
shortest paths. Shortest paths from one source node to a subset of the
nodes on a network can be defined as one-to-some shortest paths. Shortest
paths from every node to every other node on a network are normally called
all-to-all shortest paths (Node Locating problem).
3-Effective
Parameters on the Shortest Path Algorithms
Four parameters that
effect on shortest path algorithms are Network Representations, Labeling
Method, Selection Rules and Data Structures (Zhan and Noon, 1996). These
parameters effect on shortest path algorithms performance and cause to be
developed different kind of algorithms.
3-1-Network
Representations
The way in which an input network is represented
and implemented in a shortest path algorithm is vital to the performance
of the algorithm.There are several straightforward ways of representing a
general network for computational purposes. Past research has proven that
the Forward Star representation is the most efficient data structure for
representing networks (Gallo and Pallottino, 1988). Two sets of arrays are
used in the forward star data structure. The first array is used to store
data associated with arcs, and the second array is used to store data
related to nodes. All arcs of a network in question are maintained in a
list and are ordered in a specific sequence. That is, arcs emanating from
nodes 1, 2, 3, ..., are ordered sequentially. Arcs emanating from the same
node can be ordered arbitrarily, however. All information associated with
an arc, such as starting node, ending node, cost, arc length and capacity
are stored with the arc in some way (e.g., corresponding arrays or linked
lists).
3-2-Labeling Method
The labeling method is a
central procedure in most shortest path algorithms(Gallo and Pallottino,
1988). The output of the labeling method is an out-tree from a source
node, s, to a set of nodes. This out-tree is constructed iteratively, and
the shortest path from s to i is obtained upon termination of the method.
Three pieces of information are maintained for each node i in the labeling
method while constructing a shortest path tree: the distance label, d(i),
the parent node, p(i), and the node status, S(i). The distance label,
d(i), stores the upper bound of the shortest path distance from s to i
during iteration. Upon termination of an algorithm, d(i) represents the
unique shortest path from s to i. The parent node p(i) records the node
that immediately precedes node i in the out-tree. The node status, S(i),
can be one of the following: unreached, temporarily labeled and
permanently labeled. When a node is not scanned during the iteration, it
is unreached. Normally the distance label of an unreached node is set to
positive infinite. When it is known that the currently known shortest path
of getting to node i is also the absolute shortest path we will ever be
able to attain, the node is called permanently labeled. When further
improvement is still expected to be made on the shortest path to node i,
node i is considered only temporarily labeled. It follows that d(i) is an
upper bound on the shortest path distance to node i if the node is
temporarily labeled; and d(i) represents the final and optimal shortest
path distance to node i if the node is permanently labeled.
3-3-Selection Rules and Data Structures
The performance
of a particular shortest path algorithm partly depends on how the basic
operations in the labeling method are implemented. Two aspects are
particularly important to the performance of a shortest path algorithm: 1)
the strategies used to select the next temporarily labeled node to be
scanned, and 2) the data structures utilized to maintain the set of
labeled nodes (Ahuja et ah,1993).
Strategies commonly used for
selecting the next temporarily labeled node to be scanned are FIFO (First
In First Out): the oldest node in the set of temporarily labeled nodes is
selected first in this search strategy, LIFO (Last In First Out): the
newest node in the set of temporarily labeled nodes is selected first in
this search strategy, BFS (Best-First-Search): the minimum distance label
from the set of temporarily labeled nodes is considered as the best node
(Gallo and Pallottino, 1988).
A number of data structures can be
used to manipulate the set of temporarily labeled nodes in order to
support these strategies.These data structures include arrays, singly and
doubly linked lists, stacks, buckets and queues (Sedgewick,1990). A singly
linked list contains a collection of elements. Each element has a data
field and a link field. The data field contains information to be stored,
and the link field contains a pointer pointing to the next element in the
list.A doubly linked list contains two pointers. One pointer points to the
previous element in the list, and another pointer points to the next
element in the list. Stack is another special type of list which only
allows removal and addition of an element at one end of the list. The
bucket data structure is related to the double bucket implementations of
the Dijkstra algorithms. Buckets are sets arranged in a sorted fashion.
Bucket k stores all temporarily labeled nodes whose distance labels fall
within a certain range. A queue is related to the Graph Growth
algorithms.The queue is a special type of list which allows the addition
of an element at the tail and the deletion of an element at the head
(Ahuja,1993).
4- Reviewing of Algorithm Theories
In this
section we breifly review implementation senarios for Dijkstra's
Algorithm, Huristic Methods and Genetic Algorithm.
4-1-Dijkstra’s Algorithm
Dijkstra's Algorithm (Dijkstra,
1959) for computing the shortest path is based upon the caculation of
values in three one dimensional arrays, each of size equal to the number
of nodes in the network. Each row of each array corresponds to one of the
nodes of the network. As the algorithm proceeds, paths are caculated from
the start node to other nodes in the network, paths are compared and the
best (minimum weight) paths are chosen, given the state of knowledge of
the network at that stage in the progress of the algorithm. At each stage
in the computation:
- The array distance keeps track of the current minimal distances from
the start node to the array nodes. (As the algorithm proceeds, these
distances become refined to closer approximations to the shortest
distance).
- The array path keeps a record of the preceding nodes on the current
best paths from the start node to the array node.
- The array included marks off the nodes as they are used in the
caculation of the minimal distance path from start node to the end node.
The algorithm initialized arrays as follows:
- Cells in the distance array are initialized to 0 for the start node
cell; infinity for the cells whose nodes are not directly connected to
the start node; and to their direct distances from the start node to all
the other cells.
- Cells in the path array are initialized to the start node for cells
whose nodes are adjucent to start, otherwise undefined.
- Cells in the included array are initialized to ‘no’ for all cells
except the start node cell.
The algorithm then iterates the
following procedure until the end node market ‘yes’ in included array.
- Find the node (say n) whose distance from the start node is smallest
amongst all those nodes not yet market ‘yes’ in included.
- Mak n as ‘yes’ in included.
- For each node m not yet market ‘yes’ in included:
If n and m are
adjucent and if the current distance (in the distance array) from start
to n plus the edge weight of nm is less than the current distance from
start to m then update the distance array to the new and smaller
distance from start to m; also update the path matrix to show the
proceding node in the path to m to be n.
Eventually, the end
node will be included in the computation and the algorithm halts,
returning the shortest path from the start node to the end node in the end
node cell of distance. The time complexity of Dijksta’s is O(n^2), where n
is the number of nodes (Worboys M.F.,1995).
4-2-Huristic
methods
Multiple goals, factors and constraints in transportation
and location real problems make very difficult their treatment with the
traditional optimization methods. On other hand, there are different
heuristic and probabilistic methods that no always guarantee the optimal
solution but are able to find a possible solution space, taking advantage
of the particular attributes of the target problem. In a recent study, a
set of two shortest path algorithms that run fastest on real road networks
has been identified. These two algorithms are: 1) the graph growth
algorithm implemented with two queues, 2) the Dijkstra algorithm
implemented with double buckets, we reviews and summarizes these two
algorithms (Zhan and Noon, 1996).
4-2-1- The Graph Growth
Algorithm Implemented With Two Queues
The Graph Growth algorithm
implemented with two queues (TQQ) was introduced by Pallottino in 1984.
TQQ is an improved version of the Graph Growth implementation developed by
Pape (PAP) in 1974. There are four basic operations in the basic procedure
for constructing a shortest path tree in this method:
Queue_Initialization(Q) initialize queue; Queue_Removal(Q, i) remove node
i from queue Q; Queue_Insertion(Q, j) insert node j into queue Q; and
Q=Null? check whether queue Q is empty.
In the implementations of
TQQ, nodes are partitioned into two sets: the first set of nodes are those
nodes whose current distance labels have not already been used to find a
shortest path and the second set contains the remaining nodes. The first
set of nodes is maintained by a queue Q. Nodes in the second set are
further split into two categories: 1) the unreached nodes which have never
entered Q, i.e., nodes whose distance labels are still infinite, and 2)
labeled nodes, i.e., the nodes that have passed through Q at least once,
and the nodes whose current distance labels have already been used. In TQQ
we used a data structure called Two-Queue (Q) to maintain the first set of
nodes in Q. A Two-Queue consists of a FIFO queue (Q') and a FIFO queue
(Q''). Nodes can be inserted at the end of Q' and Q", and they can be
removed from the head of Q' and Q".It follows that for any node that is
not already in Q, the node is inserted at the end of Q' if it is
unreached, or the node is inserted at the end of Q" if it is temporarily
labeled (Pallottino,1984).
4-2-2- The Dijkstra Algorithm
implemented with double buckets
The original Dijkstra algorithm
partitions all nodes into two sets: temporarily and permanently labeled
nodes. At each iteration, it selects a temporarily labeled node with the
minimum distance label as the next node to be scanned. Once a node is
scanned, it becomes permanently labeled. The Dijkstra algorithm terminates
when all nodes become permanently labeled. detailed procedure of the
Dijkstra algorithm has been described in section 4-1. In Dijkstra's
original algorithm, temporarily labeled nodes are treated as a nonordered
list (Dijkstra,1959). This is equivalent to treating the queue Q in the
above general procedure for shortest path tree construction as a
nonordered list. This is of course a bottleneck operation because all
nodes in Q have to be visited at each iteration in order to select the
node with the minimum distance label. A natural enhancement of the
original Dijkstra algorithm is to maintain the labeled nodes in a data
structure in such a way that the nodes are sorted by distance labels. The
bucket data structure is just one of those structures. Buckets are sets
arranged in a sorted fashion. Bucket k stores all temporarily labeled
nodes whose distance labels fall within a certain range. Nodes contained
in each bucket can be represented with a doubly linked list. A doubly
linked list only requires O(1) time to complete an operation in each
distance update in the bucket data structure. These operations include: 1)
checking if a bucket is empty, 2) adding an element to a bucket, and 3)
deleting an element from a bucket.
Dial (1969) was the first to
implement the Dijkstra algorithm using buckets. In Dial's implementation,
bucket k contains all temporarily labeled nodes whose distance labels are
equal to k. Buckets numbered 0, 1, 2, 3, ..., are checked sequentially
until the first nonempty bucket is identified. Each node contained in the
first nonempty bucket has the minimum distance label by definition. One by
one, these nodes with the minimum distance label become permanently
labeled and are deleted from the bucket during the scanning process. The
position of a temporarily labeled node in the buckets is updated
accordingly when the distance label of a node changes (Ahuja et ah,1993).
Two levels of buckets, high-level and low-level, are maintained in
the DKD implementation. A total of d buckets in the low-level buckets are
used. A bucket i in the high-level buckets contains all nodes whose
distance labels are within the range of [i*d, (i+1)* d-1]. In addition, a
nonempty bucket with the smallest index L is also maintained in the
high-level buckets. A low-level bucket d(j)-L*d maintains nodes whose
distance labels are within the range of [L*d, (L+1)* d-1]. Nodes in the
low-level buckets are examined during the scanning process. After all
nodes in the low-level buckets are scanned, the value of L is increased.
When the value of L increases, nodes in the nonempty high-level buckets
are moved to its corresponding low-level buckets, and the next cycle of
scanning process begins (Cherkassky, 1993).
4-3-Genetic
Algorithm
In this context,
Genetic Algorithm is a metaheuristic
technique (Diaz, 1996) that could provides robust tools with optimal or
quasi-optimal designing and programming of transportation networks and
node locating. Because its excellent flexibility, robustness and
adaptability characteristics, genetic algorithm has been successfully
applied in the non-linear and complex optimization problem solutions, and
also it is very appropriated to face the noisy combinatorial problems
associated to the real systems optimization and transportation networks.
Individual representation as paths, composed of routes (stretch
joining two adjacent nodes), conforming a trajectory from an origin node
to a goal node, and operators utilization on sub-paths are novel features
presented in this paper. Different solutions with Genetic Algorithms to
the same problems have been well studied in (Gen, 1997). A route is the
two-neighbor nodes union and is represented as a matrix containing the
connecting arcs of each node. The possible amount of paths in each
generation corresponds to the population. Besides, not there are, as
traditional solution methods, arc constraints; in other words, directed
and not directed arcs could be present in the model. Fitness is assigned
comparing each path with the others, and is the method to determine if a
path has possibility of survive to next generation. Fitness is computed as
the sum of products of the route lengths and their associated costs.
Selection operation extracts first allocated elements of the sorted
population and they are transferred to the first positions of the next
generation population, based on a pre-established selection rate.A simple
crossing operator is used: two paths from the previous generation are
randomly chosen, two cross points are randomly picked in each path string,
then the marked fragments of the paths are extracted and interchanged in
both paths, previously doing a checking for incompatible nodes. Mutation
operator is used in a similar way of crossing one. A path and its mutation
points are randomly chosen. A new legal sub-string is random generate.
This sub-string will occupied the marked sub-path. Control parameters of
the model are defined as a condition set to guide in an efficient way the
algorithm, like: Generations number, generation paths number, path length,
finishing criteria, selection, mutation, and crossing rates. The finish
criteria used to be dependent of the predetermined generation number, or
the objective function valuation stability.In the case of n nodes a number
of n generations was selected. The more adequate selection, crossing and
mutation rates for the proposed model were found as 34%, 42%, and 34% of
the population paths, respectively.
The shortest path problem to
solve is a generalization of the traditional problem of shortest path.
This is, optimal (upon fitness function evaluation) path searching
connecting an origin node with a goal node, passing through an
intermediary node set, linking each other by non directed routes, from
which, length, transportation costs, average altitude, time attributes,
etc, are known, and used to evaluate the fitness function.The genetic
algorithm allows an initial path population by random generation. Each
path-individual has a fitness function facilitating to be differentiated
with others, and then through genetic operators participate in next
generationäs development of better paths,
producing each time better paths for the trajectories with the best
fitness from the previous population.
5-Performance
Analysis
We tested 4 shortest path algorithms using real road
networks. In our evaluation, we used 6 real road networks for evaluating
shortest path algorithms. These 6 networks included the Irans Road
Planning Network that represented on the scale of 1:25000 and 1:250000.
The level road networks are composed of low-detail road networks and
high-detail road networks. The low-detail networks contain two levels of
roads, including Freeway and Highway. The high-detail networks consist of
Asphalt road1,Asphalt road2, Asphalt road3 and Gravel road. The 6 networks
were stored and maintained in Arcview GIS running on a Pentium 4
workstation (With 280 Megabytes of RAM and 1800 MHz of CPU) under the
Windows NT environment. The topologic relation and arc lengths were
defined into Arc/Info and converted to Arcview format. Before and after
defining topology, we did GIS-Ready stage for all layers and edited and
eliminated some errors that affected on the layers. The 4 algorithms were
coded in the Visual Basic, Visual C++ and Mapobject programming languages.
Programs were based on the set of one-to-one, one-to-all and all-to-all
shortest path.
The 4 algorithms were implemented using data sets
with different numbers of nodes and links. In Table 1 the timing of
algorithms is shown.
Table 1 Execution times of algorithms with one to one
condition
Number of nodes |
Dijkstra |
Graph Growth |
DKD |
Genetic |
500 |
0.38 (sec) |
0.42 |
0.41 |
0.92 |
1000 |
3.48 |
3.78 |
3.21 |
4.78 |
2000 |
12.23 |
11.22 |
10.56 |
14.6 |
3000 |
38.74 |
29.89 |
27.43 |
35.43 |
4000 |
50.23 |
44.65 |
41.23 |
53.34 |
5000 |
102.38 |
89.34 |
85.65 |
104.04 |
Table 2 Execution
times of algorithms with one to all condition
Number of nodes |
Dijkstra |
Graph Growth |
DKD |
Genetic |
500 |
0.53 (sec) |
0.59 |
0.61 |
0.58 |
1000 |
5.42 |
5.49 |
5.54 |
5.67 |
2000 |
17.45 |
15.23 |
15.58 |
18.23 |
3000 |
42.24 |
36.56 |
38.68 |
44.32 |
4000 |
50.23 |
44.65 |
47.23 |
48.34 |
5000 |
112.42 |
101.37 |
105.45 |
110.56 |
Table 3 Execution
times of algorithms with all to all condition
Number of nodes |
Dijkstra |
Graph Growth |
DKD |
Genetic |
500 |
1.43 (sec) |
0.63 |
0.67 |
0.59 |
1000 |
8.45 |
6.80 |
6.95 |
6.23 |
2000 |
32.96 |
18.23 |
19.05 |
18.83 |
3000 |
64.78 |
40.24 |
43.50 |
50.61 |
4000 |
104.89 |
52.76 |
57.87 |
73.04 |
5000 |
210.78 |
110.78 |
118.65 |
137.76 |
It is clear from
Table 1 that:
- For one to one condition if the number of nodes < 2000 then we
order Execution times of algorithms as follow: Dijkstra < DKD <
Graph Growth < Genetic.then we suggest Dijkstra algorithm for this
state.
- For one to one condition if the number of nodes >= 2000 then we
order Execution times of algorithms as follow: DKD < Graph Growth
< Dijkstra < Genetic.then we suggest DKD algorithm for this state.
It is clear from Table 2 that:
- For one to all condition if the number of nodes < 500 then we
order Execution times of algorithms as follow: Genetic < Dijkstra
< Graph Growth < DKD.then we suggest Genetic algorithm for this
state.
- For one to all condition if 500 < the number of nodes < 2000
then we order Execution times of algorithms as follow: Dijkstra <
Graph Growth < DKD < Genetic. then we suggest Dijkstra algorithm
for this state.
- For one to all condition if the number of nodes >= 2000 then we
order Execution times of algorithms as follow: Graph Growth < DKD
< Genetic < Dijkstra. then we suggest Graph Growth algorithm for
this state.
It is clear from Table 3 that:
- For all to all condition if the number of nodes < 2000 then we
order Execution times of algorithms as follow: Genetic < Graph Growth
< DKD < Dijkstra.then we suggest Genetic algorithm for this state.
- For all to all condition if the number of nodes >= 2000 then we
order Execution times of algorithms as follow: Graph Growth < DKD
< Genetic < Dijkstra.then we suggest Graph Growth algorithm for
this state.
6-Conclusions
In recent years, we have
witnessed an increasing popularity of transportation related decision
analysis within a GIS environment. In this type of analysis, the
computation of shortest paths is often a central task because shortest
path distances are often needed as input for "higher level" models in many
transportation analysis problems. In addition, the shortest path problem
usually captures the essential elements of more complicated transportation
analysis problems. With the advancement of GIS technology and the
availability of high quality road network data, it is possible to conduct
transportation analysis within a GIS environment. Sometimes, this type of
analysis has to be completed in real time. As a consequence, these
analysis tasks demand high performance shortest path algorithms that run
fastest on real road networks.
Although there has been
considerable reported research related to the evaluation of the
performance of shortest path algorithms, there has been no clear answer as
to which algorithm or a set of algorithms runs fastest on real road
networks in the literature. our evaluation of 4 shortest path algorithms
using real road networks has identified:
- Execution time of mentioned algorithms depends on the problem
conditions and the number of nodes in the real road networks.
- When the number of nodes and the problem conditions increase,
Huristic methods perform bettrer than others.
- For the small number of nodes and the complex problem conditions
Metahuristic methods (Genetic algorithm) perform bettrer than others.
- For the small number of nodes and the easy problem conditions
Dijkstra’s algorithm perform bettrer than others.
7-
Refrences
- Ahuja, R. K., Magnanti, T. L., and Orlin, J. B.(1993) Network Flows:
Theory, Algorithms and Applications. Englewood Cliffs, NJ: Prentice
Hall.
- Cherkassky, B. V., Goldberg, A. V., and Radzik, T. (1993) Shortest
Paths Algorithms: Theory and Experimental Evaluation. Technical Report
93-1480, Computer Science Department, Stanford University.
- Dial, R. B. (1969) Algorithm 360: Shortest Path Forest with
Topological Ordering. Communications of the ACM, 12, 632-633.
- Diaz A. (1996), Optimization Hueristic Algorithms. Madrid,P.P. 29-46
- Dijkstra, E. W. (1959) A Note on Two Problems in Connection with
Graphs. Numeriche Mathematik, 1:269-271.
- Gen M. (1997), Genetic Algorithms and Engineering Design. John
Wiley, Newyork,p.p. 5-63
- Zhan, F. B., and Noon, C. E. (1996) Shortest Path Algorithms: An
Evaluation Using Real Road Networks. Transportation Science (in press).
- Worboys. M. F. (1995) GIS,A Computing Perspective,PP.232- 238.
|