ÄîíÍÒÓ > Ïîðòàë ìàãèñòðîâ ÄîíÍÒÓ > Ãàíóùàê Íàäåæäà Êîíñòàíòèíîâíà
Áèáëèîòåêà > Ñòàòüÿ ¹6 Evaluation of Route Finding Methods in GIS Application

Roozbeh Shad
Graduate student
Email: Rouzbeh_shad@yahoo.com

Hamid Ebadi Assistant professor
Email: ebadi@kntu.ac.ir

Mohsen Ghods
Graduate student
Email: mohsen59ghods@yahoo.com

Dept of Geodesy and Geomatics Eng. K.N.Toosi University of Technology
No 1346, Mirdamad cross, Valiasr st., Tehran, IRAN
Fax (21) 877 9476


Evaluation of Route Finding Methods in GIS Application

Sourñe:
  gisdevelopment.net/technology/gis/ma03202pf.htm

Abstract

One of the cases that distinguishes GIS from other information systems are spatial information functions.These functions usually provide selection switches and solutions for GIS users. Simutaneously with GIS techniques development, GIS executive analyse functions has been developed interestingly. For instance network analysis is one of these functions.

Shortest Path finding is one of the network analysis functions that introduced as important applications in transportation problems. With respect to route finding analysis, input data type and volume diversity and different effective parameters on a route finding algorithms performance applications, there have been evaluated different methods for this solution.

Therefore in this paper with briefly reviewing of graph theory and effective parameters on the route finding algorithm’s performance, we evaluated processing time of Dijkstra, Huristic and Genetic Algorithms using Visual Basic, MapObject and Visual C++ programing languages. We didn’t find clear answer for mentioned algorithms which performed on real road networks because their performance depended on the problem complexity and the number of nodes in real road networks. Finally We had some suggestion for using mentioned algorithms with respect to specific conditions.

1-Introduction

With the development of geographic information systems (GIS) technology, network transportation analyses within a GIS environment have become a common practice in many application areas. A key problem in network and transportation analyses is the computation of shortest paths between different locations on a network. Sometimes this computation has to be done in real time. In some cases the fastest route has to be determined in a few seconds in order to ensure the safety of a patient. Moreover, when large real road networks are involved in an application, the determination of shortest paths on a large network can be computationally very intensive. Because many applications involve real road networks and because the computation of a fastest route (shortest path) requires an answer in real time, a natural question to ask is: Which shortest path algorithm runs fastest on real road networks?

Although considerable empirical studies on the performance of shortest path algorithms have been done, there is no clear answer as to which algorithm, or a set of algorithms runs fastest on real road networks (Cherkassky, 1993). In our case study conducted by a set of three shortest path algorithms that run fastest on real road networks has been identified. These three algorithms are: 1) Dijkstra’s Algorithm, 2) Huristic Methods, and 3) Genetic Algorithm. As a sequel to that study, this paper reviews and summarizes these three algorithms, and demonstrates implementation strategies related to the algorithms.

2-Graph Theory

In Graph Theory a network is defined as a directed graph G = (N, A) consisting of a set N of nodes and a set A of arcs with associated numerical values, such as the number of nodes, n=|N|, the number of arcs, m=|A|, and the length of an arc connecting nodes i and j, denoted as l(i,j). The shortest path problem can be stated as follows: given a network, find the shortest distances (least costs) from a source node to all other nodes or to a subset of nodes on the network (Ahuja et al, 1993). These shortest paths represent a directed tree T rooted from a source node s with the characteristic that a unique path from s to any node i on the network is the shortest path to that node. The length of the shortest path from s to any node i is denoted as d(i). This directed tree is called a shortest path tree. For any network with n nodes, one can obtain n distinctive shortest path trees. Shortest paths from one (source) node to all other nodes on a network are normally referred as one-to-all shortest paths. Shortest paths from one source node to a subset of the nodes on a network can be defined as one-to-some shortest paths. Shortest paths from every node to every other node on a network are normally called all-to-all shortest paths (Node Locating problem).

3-Effective Parameters on the Shortest Path Algorithms

Four parameters that effect on shortest path algorithms are Network Representations, Labeling Method, Selection Rules and Data Structures (Zhan and Noon, 1996). These parameters effect on shortest path algorithms performance and cause to be developed different kind of algorithms.

3-1-Network Representations

The way in which an input network is represented and implemented in a shortest path algorithm is vital to the performance of the algorithm.There are several straightforward ways of representing a general network for computational purposes. Past research has proven that the Forward Star representation is the most efficient data structure for representing networks (Gallo and Pallottino, 1988). Two sets of arrays are used in the forward star data structure. The first array is used to store data associated with arcs, and the second array is used to store data related to nodes. All arcs of a network in question are maintained in a list and are ordered in a specific sequence. That is, arcs emanating from nodes 1, 2, 3, ..., are ordered sequentially. Arcs emanating from the same node can be ordered arbitrarily, however. All information associated with an arc, such as starting node, ending node, cost, arc length and capacity are stored with the arc in some way (e.g., corresponding arrays or linked lists).

3-2-Labeling Method

The labeling method is a central procedure in most shortest path algorithms(Gallo and Pallottino, 1988). The output of the labeling method is an out-tree from a source node, s, to a set of nodes. This out-tree is constructed iteratively, and the shortest path from s to i is obtained upon termination of the method. Three pieces of information are maintained for each node i in the labeling method while constructing a shortest path tree: the distance label, d(i), the parent node, p(i), and the node status, S(i). The distance label, d(i), stores the upper bound of the shortest path distance from s to i during iteration. Upon termination of an algorithm, d(i) represents the unique shortest path from s to i. The parent node p(i) records the node that immediately precedes node i in the out-tree. The node status, S(i), can be one of the following: unreached, temporarily labeled and permanently labeled. When a node is not scanned during the iteration, it is unreached. Normally the distance label of an unreached node is set to positive infinite. When it is known that the currently known shortest path of getting to node i is also the absolute shortest path we will ever be able to attain, the node is called permanently labeled. When further improvement is still expected to be made on the shortest path to node i, node i is considered only temporarily labeled. It follows that d(i) is an upper bound on the shortest path distance to node i if the node is temporarily labeled; and d(i) represents the final and optimal shortest path distance to node i if the node is permanently labeled.

3-3-Selection Rules and Data Structures

The performance of a particular shortest path algorithm partly depends on how the basic operations in the labeling method are implemented. Two aspects are particularly important to the performance of a shortest path algorithm: 1) the strategies used to select the next temporarily labeled node to be scanned, and 2) the data structures utilized to maintain the set of labeled nodes (Ahuja et ah,1993).

Strategies commonly used for selecting the next temporarily labeled node to be scanned are FIFO (First In First Out): the oldest node in the set of temporarily labeled nodes is selected first in this search strategy, LIFO (Last In First Out): the newest node in the set of temporarily labeled nodes is selected first in this search strategy, BFS (Best-First-Search): the minimum distance label from the set of temporarily labeled nodes is considered as the best node (Gallo and Pallottino, 1988).

A number of data structures can be used to manipulate the set of temporarily labeled nodes in order to support these strategies.These data structures include arrays, singly and doubly linked lists, stacks, buckets and queues (Sedgewick,1990). A singly linked list contains a collection of elements. Each element has a data field and a link field. The data field contains information to be stored, and the link field contains a pointer pointing to the next element in the list.A doubly linked list contains two pointers. One pointer points to the previous element in the list, and another pointer points to the next element in the list. Stack is another special type of list which only allows removal and addition of an element at one end of the list. The bucket data structure is related to the double bucket implementations of the Dijkstra algorithms. Buckets are sets arranged in a sorted fashion. Bucket k stores all temporarily labeled nodes whose distance labels fall within a certain range. A queue is related to the Graph Growth algorithms.The queue is a special type of list which allows the addition of an element at the tail and the deletion of an element at the head (Ahuja,1993).

4- Reviewing of Algorithm Theories

In this section we breifly review implementation senarios for Dijkstra's Algorithm, Huristic Methods and Genetic Algorithm.

4-1-Dijkstra’s Algorithm

Dijkstra's Algorithm (Dijkstra, 1959) for computing the shortest path is based upon the caculation of values in three one dimensional arrays, each of size equal to the number of nodes in the network. Each row of each array corresponds to one of the nodes of the network. As the algorithm proceeds, paths are caculated from the start node to other nodes in the network, paths are compared and the best (minimum weight) paths are chosen, given the state of knowledge of the network at that stage in the progress of the algorithm. At each stage in the computation:

  • The array distance keeps track of the current minimal distances from the start node to the array nodes. (As the algorithm proceeds, these distances become refined to closer approximations to the shortest distance).
  • The array path keeps a record of the preceding nodes on the current best paths from the start node to the array node.
  • The array included marks off the nodes as they are used in the caculation of the minimal distance path from start node to the end node.
The algorithm initialized arrays as follows:
  • Cells in the distance array are initialized to 0 for the start node cell; infinity for the cells whose nodes are not directly connected to the start node; and to their direct distances from the start node to all the other cells.
  • Cells in the path array are initialized to the start node for cells whose nodes are adjucent to start, otherwise undefined.
  • Cells in the included array are initialized to ‘no’ for all cells except the start node cell.

The algorithm then iterates the following procedure until the end node market ‘yes’ in included array.

  1. Find the node (say n) whose distance from the start node is smallest amongst all those nodes not yet market ‘yes’ in included.
  2. Mak n as ‘yes’ in included.
  3. For each node m not yet market ‘yes’ in included:
    If n and m are adjucent and if the current distance (in the distance array) from start to n plus the edge weight of nm is less than the current distance from start to m then update the distance array to the new and smaller distance from start to m; also update the path matrix to show the proceding node in the path to m to be n.

Eventually, the end node will be included in the computation and the algorithm halts, returning the shortest path from the start node to the end node in the end node cell of distance. The time complexity of Dijksta’s is O(n^2), where n is the number of nodes (Worboys M.F.,1995).

4-2-Huristic methods

Multiple goals, factors and constraints in transportation and location real problems make very difficult their treatment with the traditional optimization methods. On other hand, there are different heuristic and probabilistic methods that no always guarantee the optimal solution but are able to find a possible solution space, taking advantage of the particular attributes of the target problem. In a recent study, a set of two shortest path algorithms that run fastest on real road networks has been identified. These two algorithms are: 1) the graph growth algorithm implemented with two queues, 2) the Dijkstra algorithm implemented with double buckets, we reviews and summarizes these two algorithms (Zhan and Noon, 1996).

4-2-1- The Graph Growth Algorithm Implemented With Two Queues

The Graph Growth algorithm implemented with two queues (TQQ) was introduced by Pallottino in 1984. TQQ is an improved version of the Graph Growth implementation developed by Pape (PAP) in 1974. There are four basic operations in the basic procedure for constructing a shortest path tree in this method: Queue_Initialization(Q) initialize queue; Queue_Removal(Q, i) remove node i from queue Q; Queue_Insertion(Q, j) insert node j into queue Q; and Q=Null? check whether queue Q is empty.

In the implementations of TQQ, nodes are partitioned into two sets: the first set of nodes are those nodes whose current distance labels have not already been used to find a shortest path and the second set contains the remaining nodes. The first set of nodes is maintained by a queue Q. Nodes in the second set are further split into two categories: 1) the unreached nodes which have never entered Q, i.e., nodes whose distance labels are still infinite, and 2) labeled nodes, i.e., the nodes that have passed through Q at least once, and the nodes whose current distance labels have already been used. In TQQ we used a data structure called Two-Queue (Q) to maintain the first set of nodes in Q. A Two-Queue consists of a FIFO queue (Q') and a FIFO queue (Q''). Nodes can be inserted at the end of Q' and Q", and they can be removed from the head of Q' and Q".It follows that for any node that is not already in Q, the node is inserted at the end of Q' if it is unreached, or the node is inserted at the end of Q" if it is temporarily labeled (Pallottino,1984).

4-2-2- The Dijkstra Algorithm implemented with double buckets

The original Dijkstra algorithm partitions all nodes into two sets: temporarily and permanently labeled nodes. At each iteration, it selects a temporarily labeled node with the minimum distance label as the next node to be scanned. Once a node is scanned, it becomes permanently labeled. The Dijkstra algorithm terminates when all nodes become permanently labeled. detailed procedure of the Dijkstra algorithm has been described in section 4-1. In Dijkstra's original algorithm, temporarily labeled nodes are treated as a nonordered list (Dijkstra,1959). This is equivalent to treating the queue Q in the above general procedure for shortest path tree construction as a nonordered list. This is of course a bottleneck operation because all nodes in Q have to be visited at each iteration in order to select the node with the minimum distance label. A natural enhancement of the original Dijkstra algorithm is to maintain the labeled nodes in a data structure in such a way that the nodes are sorted by distance labels. The bucket data structure is just one of those structures. Buckets are sets arranged in a sorted fashion. Bucket k stores all temporarily labeled nodes whose distance labels fall within a certain range. Nodes contained in each bucket can be represented with a doubly linked list. A doubly linked list only requires O(1) time to complete an operation in each distance update in the bucket data structure. These operations include: 1) checking if a bucket is empty, 2) adding an element to a bucket, and 3) deleting an element from a bucket.

Dial (1969) was the first to implement the Dijkstra algorithm using buckets. In Dial's implementation, bucket k contains all temporarily labeled nodes whose distance labels are equal to k. Buckets numbered 0, 1, 2, 3, ..., are checked sequentially until the first nonempty bucket is identified. Each node contained in the first nonempty bucket has the minimum distance label by definition. One by one, these nodes with the minimum distance label become permanently labeled and are deleted from the bucket during the scanning process. The position of a temporarily labeled node in the buckets is updated accordingly when the distance label of a node changes (Ahuja et ah,1993).

Two levels of buckets, high-level and low-level, are maintained in the DKD implementation. A total of d buckets in the low-level buckets are used. A bucket i in the high-level buckets contains all nodes whose distance labels are within the range of [i*d, (i+1)* d-1]. In addition, a nonempty bucket with the smallest index L is also maintained in the high-level buckets. A low-level bucket d(j)-L*d maintains nodes whose distance labels are within the range of [L*d, (L+1)* d-1]. Nodes in the low-level buckets are examined during the scanning process. After all nodes in the low-level buckets are scanned, the value of L is increased. When the value of L increases, nodes in the nonempty high-level buckets are moved to its corresponding low-level buckets, and the next cycle of scanning process begins (Cherkassky, 1993).

4-3-Genetic Algorithm

In this context, Genetic Algorithm is a metaheuristic technique (Diaz, 1996) that could provides robust tools with optimal or quasi-optimal designing and programming of transportation networks and node locating. Because its excellent flexibility, robustness and adaptability characteristics, genetic algorithm has been successfully applied in the non-linear and complex optimization problem solutions, and also it is very appropriated to face the noisy combinatorial problems associated to the real systems optimization and transportation networks.

Individual representation as paths, composed of routes (stretch joining two adjacent nodes), conforming a trajectory from an origin node to a goal node, and operators utilization on sub-paths are novel features presented in this paper. Different solutions with Genetic Algorithms to the same problems have been well studied in (Gen, 1997). A route is the two-neighbor nodes union and is represented as a matrix containing the connecting arcs of each node. The possible amount of paths in each generation corresponds to the population. Besides, not there are, as traditional solution methods, arc constraints; in other words, directed and not directed arcs could be present in the model. Fitness is assigned comparing each path with the others, and is the method to determine if a path has possibility of survive to next generation. Fitness is computed as the sum of products of the route lengths and their associated costs. Selection operation extracts first allocated elements of the sorted population and they are transferred to the first positions of the next generation population, based on a pre-established selection rate.A simple crossing operator is used: two paths from the previous generation are randomly chosen, two cross points are randomly picked in each path string, then the marked fragments of the paths are extracted and interchanged in both paths, previously doing a checking for incompatible nodes. Mutation operator is used in a similar way of crossing one. A path and its mutation points are randomly chosen. A new legal sub-string is random generate. This sub-string will occupied the marked sub-path. Control parameters of the model are defined as a condition set to guide in an efficient way the algorithm, like: Generations number, generation paths number, path length, finishing criteria, selection, mutation, and crossing rates. The finish criteria used to be dependent of the predetermined generation number, or the objective function valuation stability.In the case of n nodes a number of n generations was selected. The more adequate selection, crossing and mutation rates for the proposed model were found as 34%, 42%, and 34% of the population paths, respectively.

The shortest path problem to solve is a generalization of the traditional problem of shortest path. This is, optimal (upon fitness function evaluation) path searching connecting an origin node with a goal node, passing through an intermediary node set, linking each other by non directed routes, from which, length, transportation costs, average altitude, time attributes, etc, are known, and used to evaluate the fitness function.The genetic algorithm allows an initial path population by random generation. Each path-individual has a fitness function facilitating to be differentiated with others, and then through genetic operators participate in next generationäs development of better paths, producing each time better paths for the trajectories with the best fitness from the previous population.

5-Performance Analysis

We tested 4 shortest path algorithms using real road networks. In our evaluation, we used 6 real road networks for evaluating shortest path algorithms. These 6 networks included the Irans Road Planning Network that represented on the scale of 1:25000 and 1:250000. The level road networks are composed of low-detail road networks and high-detail road networks. The low-detail networks contain two levels of roads, including Freeway and Highway. The high-detail networks consist of Asphalt road1,Asphalt road2, Asphalt road3 and Gravel road. The 6 networks were stored and maintained in Arcview GIS running on a Pentium 4 workstation (With 280 Megabytes of RAM and 1800 MHz of CPU) under the Windows NT environment. The topologic relation and arc lengths were defined into Arc/Info and converted to Arcview format. Before and after defining topology, we did GIS-Ready stage for all layers and edited and eliminated some errors that affected on the layers. The 4 algorithms were coded in the Visual Basic, Visual C++ and Mapobject programming languages. Programs were based on the set of one-to-one, one-to-all and all-to-all shortest path.

The 4 algorithms were implemented using data sets with different numbers of nodes and links. In Table 1 the timing of algorithms is shown.

Table 1 Execution times of algorithms with one to one condition

Number of nodes  Dijkstra   Graph Growth  DKD   Genetic
500   0.38 (sec)  0.42  0.41   0.92
1000  3.48  3.78  3.21   4.78
2000  12.23  11.22  10.56   14.6
3000  38.74  29.89  27.43   35.43
4000  50.23  44.65  41.23   53.34
5000  102.38  89.34  85.65   104.04

Table 2 Execution times of algorithms with one to all condition

Number of nodes  Dijkstra   Graph Growth  DKD   Genetic
500   0.53 (sec)  0.59  0.61   0.58
1000  5.42  5.49  5.54   5.67
2000  17.45  15.23  15.58   18.23
3000  42.24  36.56  38.68   44.32
4000  50.23  44.65  47.23   48.34
5000  112.42  101.37  105.45   110.56

Table 3 Execution times of algorithms with all to all condition

Number of nodes  Dijkstra   Graph Growth  DKD   Genetic
500   1.43 (sec)  0.63  0.67   0.59
1000  8.45  6.80  6.95   6.23
2000  32.96  18.23  19.05   18.83
3000  64.78  40.24  43.50   50.61
4000  104.89  52.76  57.87   73.04
5000  210.78  110.78  118.65   137.76

It is clear from Table 1 that:

  1. For one to one condition if the number of nodes < 2000 then we order Execution times of algorithms as follow: Dijkstra < DKD < Graph Growth < Genetic.then we suggest Dijkstra algorithm for this state.
  2. For one to one condition if the number of nodes >= 2000 then we order Execution times of algorithms as follow: DKD < Graph Growth < Dijkstra < Genetic.then we suggest DKD algorithm for this state.

It is clear from Table 2 that:

  1. For one to all condition if the number of nodes < 500 then we order Execution times of algorithms as follow: Genetic < Dijkstra < Graph Growth < DKD.then we suggest Genetic algorithm for this state.
  2. For one to all condition if 500 < the number of nodes < 2000 then we order Execution times of algorithms as follow: Dijkstra < Graph Growth < DKD < Genetic. then we suggest Dijkstra algorithm for this state.
  3. For one to all condition if the number of nodes >= 2000 then we order Execution times of algorithms as follow: Graph Growth < DKD < Genetic < Dijkstra. then we suggest Graph Growth algorithm for this state.

It is clear from Table 3 that:

  1. For all to all condition if the number of nodes < 2000 then we order Execution times of algorithms as follow: Genetic < Graph Growth < DKD < Dijkstra.then we suggest Genetic algorithm for this state.
  2. For all to all condition if the number of nodes >= 2000 then we order Execution times of algorithms as follow: Graph Growth < DKD < Genetic < Dijkstra.then we suggest Graph Growth algorithm for this state.

6-Conclusions

In recent years, we have witnessed an increasing popularity of transportation related decision analysis within a GIS environment. In this type of analysis, the computation of shortest paths is often a central task because shortest path distances are often needed as input for "higher level" models in many transportation analysis problems. In addition, the shortest path problem usually captures the essential elements of more complicated transportation analysis problems. With the advancement of GIS technology and the availability of high quality road network data, it is possible to conduct transportation analysis within a GIS environment. Sometimes, this type of analysis has to be completed in real time. As a consequence, these analysis tasks demand high performance shortest path algorithms that run fastest on real road networks.

Although there has been considerable reported research related to the evaluation of the performance of shortest path algorithms, there has been no clear answer as to which algorithm or a set of algorithms runs fastest on real road networks in the literature. our evaluation of 4 shortest path algorithms using real road networks has identified:

  1. Execution time of mentioned algorithms depends on the problem conditions and the number of nodes in the real road networks.
  2. When the number of nodes and the problem conditions increase, Huristic methods perform bettrer than others.
  3. For the small number of nodes and the complex problem conditions Metahuristic methods (Genetic algorithm) perform bettrer than others.
  4. For the small number of nodes and the easy problem conditions Dijkstra’s algorithm perform bettrer than others.

7- Refrences

  • Ahuja, R. K., Magnanti, T. L., and Orlin, J. B.(1993) Network Flows: Theory, Algorithms and Applications. Englewood Cliffs, NJ: Prentice Hall.
  • Cherkassky, B. V., Goldberg, A. V., and Radzik, T. (1993) Shortest Paths Algorithms: Theory and Experimental Evaluation. Technical Report 93-1480, Computer Science Department, Stanford University.
  • Dial, R. B. (1969) Algorithm 360: Shortest Path Forest with Topological Ordering. Communications of the ACM, 12, 632-633.
  • Diaz A. (1996), Optimization Hueristic Algorithms. Madrid,P.P. 29-46
  • Dijkstra, E. W. (1959) A Note on Two Problems in Connection with Graphs. Numeriche Mathematik, 1:269-271.
  • Gen M. (1997), Genetic Algorithms and Engineering Design. John Wiley, Newyork,p.p. 5-63
  • Zhan, F. B., and Noon, C. E. (1996) Shortest Path Algorithms: An Evaluation Using Real Road Networks. Transportation Science (in press).
  • Worboys. M. F. (1995) GIS,A Computing Perspective,PP.232- 238.
© GISdevelopment.net. All rights reserved.
 Íàâåðõ