Maria Krasnokutskaya Research on a data structures in partition problems of large size graphs Abstract

DonNTU, FCIS

Abstract of master's thesis

«Research on a data structures in partition problems of large size graphs»

by Krasnokutskaya Maria

Parallel computations allow to increase efficiency and speed of processing of the information at the decision of intricate problems[1]. Efficient use of a distributed memory parallel computer requires that the computational load be balanced across processors in a way that minimizes interprocessor communication. This mapping requirement can be abstracted to a graph problem in which nodes represent computations, edges represent communication - data-flow graph, and the objective is to assign an equal number of edges crossing between processors. Practical experience has shown that the quality of this mapping can have a substantial impact on performance, hence there is considerable interest in effective mapping algorithms - graph partition algorithms.[2] The most popular of them are

Levelized Nested Dissection[3];
Kernighan-Lin/ Fiduccia-Mattheyses Algorithm (KL/FM)[2, 3];
Spectral Methods[1, 2, 3];
Multilevel Schemes[2, 3].

It is necessary to note, that data-flow graph usually have a sparse adjacency matrix. Partitioning algorithms usually use adjacency matrix or matrices derived from it (e.g. Laplacian matrix in Spectral bisection method), that are sparse too. In a case of graphs of large size (1000, 10000 vertexes) it results in uneconomical use of memory and computing resources in program realization of algorithms.

The purpose of this work is research on a data structures in partition problems of large size graphs. As researched algorithm the spectral bisection algorithm is chosen. This spectral method bisects a graph by considering an eigenvector of an associated matrix to gain an understanding of global properties of the graph. The method has recived much attention because it offers a good balance between generality, quality and efficiency[hendr].

One of ways of solving a problem of redundancy representation of the graph information is use of multilevel schemes. Multilevel scheme consists of three phases: graph coarsening, initial partitioning, and multilevel refinement. In the graph coarsening phase, a series of graphs is constructed by collapsing together selected vertices of the input graph in order to form a related coarser graph. This newly constructed graph then acts as the input graph for another round of graph coarsening, and so on, until a sufficiently small graph is obtained. Computation of the initial bisection is performed on the coarsest (and hence smallest) of these graphs and so is very fast. Finally, partition refinement is performed on each level graph, from the coarsest to the finest (i.e. original graph), using a KL/FM-type algorithm or spectral method [3].

Three phases of Mutilevel paradigm

Other way - use of alternative structures for matrixes representation. As initial research four data structures were investigated:

a two-dimensional array;
an array of the dynamic arrays;
three arrays;
a matrix in format RR(C)O.

a)	b)
c)
d)
Picture – Mehtods of matrix representation; a) two-dimensional array; b) an array of the dynamic arrays; c) three arrays; г) matrix in format RR(C)O

Specifically, the sparse row-wise format is one of the most commonly used storage schemes for sparse matrices. A two-dimensional real matrix A may be represented in RR(C)O by specifying the three vectors AN, JA and IA [4]:

AN contains all the elements of A that have absolute values above the specified threshold;
JA is an integer list of the original column indices in A for the elements in JA (i.e., AN and JA have the same length);
IA is an integer list which counts the number of nonzero elements in each row in A. The elements of IX thus specify where in AN and JA to start looking for the nonzero elements of a given row.

Program realization of these formats has been developed, diagrams of dependence of time of multiplication of a matrix on a vector from matrix dimension and its rarity are constructed.

This diagram shows that there is better to use array of dynamic arrays or RR(C)O format to represent sparse matrixes. Use of RR(C)U format shows the best results. In following researches algorithm of spectral bisection will be realized using this formats.

While writing this abstract master's thesis was not completed yet. Final end - January 2007. The full text of work and all materials on a theme can be received from the author or her Scientific adviser after the specified date.

References

Bradford L. Chamberlain. Graph Partitioning Algorithms for DistributingWorkloads of Parallel Computations, 1998
generals.pdf (340 Kb)
http://www.cs.washington.edu/homes/brad/cv/pubs/degree/generals.html
Bruce Hendrickson, Robert Leland. Multidimensional Spectral Load Balancing. Sandia National Laboratories Albuquerque, 1993
hendrickson93multidimensional.pdf (204 Kb)
http://citeseer.csail.mit.edu/hendrickson93multidimensional.html
Kirk Schloegel, George Karypis, Vipin Kumar. Graph Partitioning for High Performance Scientific Simulations.University of Minnesota, Department of Computer Science, Minneapolis, 2000
http://citeseer.ist.psu.edu/schloegel00graph.html
SOY/i - Sparse Operations with Yorick/IDL
http://homepage.mac.com/rflicker/soy.htm.