Faculty: Computer science and informatics

Speciality: Computer systems and networks

Theme of master's work:

Parallel methods for solving systems of linear algebraic equations on a computer cluster

Supervisor: Ph.D. Y.V. Ladyzhensky



Abstract

  "Parallel methods for solving systems of linear algebraic equations on a computer cluster"

  INTRODUCTION

        To date, the performance of computing systems grows not so much through raising the clock frequency of devices as through parallel processing [1-3]. This involves both building hardware (processors with multiple ALUs, communication equipment for multiprocessor systems, etc.) and developing efficient algorithms for various parallel platforms. Modern parallel systems are very expensive and are used to solve problems that require large computational resources: weather and climate prediction, design of semiconductor devices, study of the human genome.

Nevertheless, it is possible to build a relatively cheap and reasonably effective parallel system from ordinary computers connected by communication equipment and thus forming a single computational resource. Such systems are called clusters and belong to the class of parallel systems with distributed memory [1,3]. The bottleneck of clusters is that the most widespread and cheapest communication equipment for connecting the individual nodes (Fast Ethernet) uses a shared data transmission medium and offers rather low bandwidth compared with the data processing speed of modern processors. The range of problems that can be solved on such systems is therefore limited to those with a small number of exchanges relative to the amount of computation. An undeniable advantage of such systems is their relative cheapness and the actual presence of large computer classrooms in many educational institutions. Such systems are programmed with message-passing systems, in which the individual computers interact by sending and receiving messages.

          To date, the most popular such standard is MPI (Message Passing Interface) [1-3]. Specific implementations of the MPI standard are created by software producers and supplied with equipment. The most widespread implementation of the MPI standard is called MPICH. The standard describes the names, calling conventions, and results of the procedures. Each parallel message-passing system has its own optimized MPI implementation, so programs written to the MPI standard are portable between different systems at the source-code level. MPI programs are written in modern programming languages such as C/C++ and Fortran.

 
          Research in MPI programming proceeds in two directions: the creation of efficient parallel algorithms and efficient implementations of the MPI standard for cluster systems. Because of the low data transmission speed, parallel algorithms with the smallest number of exchanges are preferred. Existing MPI implementations for cluster systems carry out exchanges over the TCP/IP protocol [4], which makes it impossible to use broadcast exchanges effectively: in current implementations they are realized through pairwise transfers. Work is therefore under way on MPI implementations with true broadcast transfers. Another way to increase the performance of MPI programs is to overlap exchanges with computations, but the technology used in current implementations does not bring significant improvement.


          This work is dedicated to the creation of efficient parallel algorithms of the conjugate gradient method for symmetric positive definite pentadiagonal systems of linear algebraic equations. Systems of this kind arise in the finite-difference approximation of partial differential equations. To date, a large number of efficient sequential algorithms of this type have been developed [1]. However, most of them are unsuitable for parallel implementation because of their recursive nature and, hence, low parallelism.
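          For illustration (an example added here, not taken from the original text): the standard five-point finite-difference approximation of the Poisson equation $-\Delta u = f$ on a uniform grid with step $h$,

$$4u_{i,j} - u_{i-1,j} - u_{i+1,j} - u_{i,j-1} - u_{i,j+1} = h^2 f_{i,j},$$

leads, under the natural row-by-row numbering of the grid nodes, to a system whose matrix is symmetric positive definite and has exactly five nonzero diagonals.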


          The aim of this work is to study the existing sequential and parallel algorithms of the conjugate gradient method, to select among them, and to create efficient conjugate gradient algorithms suitable for use on cluster systems.

          In this work a large number of conjugate gradient algorithms have been studied. When choosing methods for study, preference was given to those with a minimal number of interprocessor exchanges. As a result, three classes of methods were singled out, among them block-diagonal methods and methods of polynomial approximation of the inverse matrix.


          The results showed that to achieve the stated goal it is appropriate to use methods with a minimal number of exchanges on clusters of large size, and methods with better convergence on clusters of small size.

 

         1. Computing cluster


         A cluster is a connected set of full-fledged computers used as a single computational resource [1,2].
One-, two-, or four-processor computers available on the market are used as the computational nodes of a cluster. Each node runs its own copy of the operating system, most often one of the standard operating systems: Windows, Linux, Solaris, etc. The composition and capacity of the nodes can vary even within one cluster, making it possible to create non-uniform systems [1,2].

 
        Standard LAN protocols, characterized by low cost and low data transmission speed, are used as the communication protocols in such systems. The main characteristics of communication networks are latency, the initial delay when sending a message, and bandwidth, which determines the rate at which information is transmitted over the communication channels [1,2]. Because of latency, the maximum data transmission speed of the network cannot be achieved for short messages. The most commonly used network is Fast Ethernet, whose main advantage is the low cost of the equipment. However, the large overhead of message transmission in Fast Ethernet severely restricts the range of tasks that can be solved effectively on such a cluster. If the cluster is to be more universal, one needs to move to more capable communication networks, for example SCI, Myrinet, etc.
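        The effect of latency can be made concrete with a simple transfer-time model (an illustration added here, not part of the original text):

$$T(m) \approx \alpha + \frac{m}{\beta},$$

where $m$ is the message length, $\alpha$ the latency, and $\beta$ the bandwidth. For Fast Ethernet with, say, $\alpha = 50\,\mu\text{s}$ and $\beta = 100$ Mbit/s, a 100-byte message takes about $58\,\mu\text{s}$, of which latency accounts for over 85%; short messages therefore reach only a small fraction of the nominal bandwidth.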
 

        Various communication systems serve as the means of parallel programming on clusters; the interaction between cluster nodes is organized through them. The most common standard for message-passing software today is MPI. Specific MPI implementations are created by the manufacturers of parallel systems and are supplied with the hardware.
        

                      The study was carried out on a cluster of departmental computers (Figure 1).

 

                    

Figure 1. Structure of the cluster

 

         The cluster consists of:

  • 1 head node;

  • 11 compute nodes;

  •  network interface: Gigabit Ethernet, 1 Gbit/s;

  •  cluster software: Microsoft Windows Compute Cluster Server 2003;

  •  programming languages: C/C++, Fortran;

  •  parallel programming: MPI.

         Cluster specifications
Characteristics of the head and compute nodes:

        1. Processor: Intel Core2 64-bit, 1.86 GHz
        2. RAM: 1 GB
        3. Hard drive: 80 GB
        4. Network interface: Gigabit Ethernet, 1 Gbit/s
        5. Operating system: Microsoft Windows Server 2003 Enterprise x64 Edition SP2

 

         2. The data transfer interface (Message Passing Interface, MPI)

To organize information exchange between processors, in the minimal variant, operations for sending and receiving data are sufficient (provided, of course, that there is a technical possibility of communication between the processors: channels or communication lines). MPI contains a whole multitude of data transfer operations, which provide different ways of sending data. These capabilities are the strongest side of MPI (as, in particular, its very name indicates).
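A minimal sketch of such a pairwise exchange is shown below (illustrative code, not taken from the original work; it assumes the program is launched on at least two processes):

#include <mpi.h>
#include <stdio.h>

/* Minimal pairwise exchange: process 0 sends an array of doubles
   to process 1, which receives and prints it. */
int main(int argc, char *argv[])
{
    int rank;
    double data[4] = { 1.0, 2.0, 3.0, 4.0 };

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(data, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(data, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process 1 received %g %g %g %g\n",
               data[0], data[1], data[2], data[3]);
    }

    MPI_Finalize();
    return 0;
}

The same source code runs in every process, and behavior branches on the process rank; this is exactly the model described next.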
 
            This approach to parallel computing is known as the "single program, multiple processes" (SPMP) model.


            The development of parallel programs with MPI has the following advantages [2-4]:

·  MPI largely reduces the severity of the portability problem for parallel programs: a parallel program developed in C or Fortran using the MPI library will, as a rule, work on different computing platforms;

·  MPI promotes the efficiency of parallel computations, since for practically every type of computer system there now exists an implementation of the MPI library that takes maximal account of the capabilities of the hardware;

·  MPI reduces, in a certain sense, the complexity of developing parallel programs, since, on the one hand, most of the basic data transfer operations are provided by the MPI standard and, on the other hand, a large number of libraries of parallel methods created with MPI already exist.

 

                       3. Classification of parallel methods for solving SLAE

         The Gauss-Seidel method. The system to be solved is represented in the form [1-2]

$$(L + D + U)\,x = b,$$

where $D$ is the diagonal part of the matrix $A$ and $L$, $U$ are its strictly lower and strictly upper triangular parts. The iterative scheme of the Gauss-Seidel method follows from this representation of the system:

$$(D + L)\,x^{(k+1)} = -U x^{(k)} + b,$$

         or

$$x^{(k+1)} = -(D + L)^{-1} U\, x^{(k)} + (D + L)^{-1} b.$$

         This reduces the Gauss-Seidel method to the standard form:

$$x^{(k+1)} = B\, x^{(k)} + c, \qquad B = -(D + L)^{-1} U, \quad c = (D + L)^{-1} b.$$

         The standard form makes it possible to write out the iteration matrix of the method explicitly and to carry out the obvious transformations over it. In coordinate form, for a system of general form, the Gauss-Seidel method reads:

$$x_i^{(k+1)} = \frac{1}{a_{ii}} \Big( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} \Big), \qquad i = 1, \dots, n.$$

        The coordinate form of the Gauss-Seidel method differs from that of the Jacobi method only in that the first sum on the right-hand side of the iterative formula contains the components of the solution vector at the (k+1)-th rather than the k-th iteration. For symmetric positive definite systems, such as those considered in this work, the method is known to converge.

        A parallel algorithm for the Gauss-Seidel method

        The Gauss-Seidel method differs from the simple iteration method in that the new vector of values is computed not only from the values of the previous iteration but also using the values already computed at the current iteration.
The text of a sequential procedure for computing the new values of the vector components is presented below.

void GaussZeidel(double *A, double *X, int size)
/* Given the augmented matrix A (size rows, size+1 columns, with the
   right-hand side b stored in the last column) and the current
   approximation X, compute the new values of the vector X */
{
    int i, j;
    double Sum;

    for (i = 0; i < size; ++i) {
        Sum = 0;
        for (j = 0; j < i; ++j)
            Sum += A[ind(i, j, size)] * X[j];
        for (j = i + 1; j < size; ++j)
            Sum += A[ind(i, j, size)] * X[j];
        /* column 'size' of A holds the right-hand side b */
        X[i] = (A[ind(i, size, size)] - Sum) / A[ind(i, i, size)];
    }
}

Figure 2. The procedure for computing the new values of the vector by the Gauss-Seidel method
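The indexing function ind used in the procedure is not defined in the text. A natural definition, consistent with the use of column size for the right-hand side, treats A as an n x (n+1) augmented matrix stored row by row (an assumption introduced here):

/* Assumed definition: maps element (i, j) of an n x (n+1) augmented
   matrix, stored row by row in a one-dimensional array, to its index. */
int ind(int i, int j, int n)
{
    return i * (n + 1) + j;
}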

         The Gauss-Seidel method is thus described by the following system of equations:

$$x_i^{(k+1)} = \frac{1}{a_{ii}} \Big( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k)} \Big), \qquad i = 1, \dots, n.$$

         The computation of each coordinate depends both on the values of the vector computed at the previous iteration and on the values already computed at the current iteration. A parallel algorithm analogous to that of the simple iteration method therefore cannot be implemented directly: no process can begin its computation until the preceding process has finished its own.


         For a parallel implementation the following modified Gauss-Seidel iteration can be proposed. The computation of the vector coordinates is divided among the processes in the same way as in the simple iteration method. Each process computes its block of vector coordinates by the Gauss-Seidel scheme, but uses only the values of the vector available in that process. The parallel implementation then differs from that of the simple iteration method only in the procedure for computing the new values of the vector (the procedure GaussZeidel is used instead of Iter_Jacoby).

void GaussZeidel(int size, int MATR_SIZE, int first)
/* Using the global augmented matrix A and vector X: MATR_SIZE is the
   dimension of the system, size is the number of vector elements
   computed by this process, first is the index of its first element.
   New values of X[first..first+size-1] are computed using the whole
   vector X */
{
    int i, j;
    double Sum;

    for (i = 0; i < size; ++i) {
        Sum = 0;
        for (j = 0; j < i + first; ++j)
            Sum += A[ind(i, j, MATR_SIZE)] * X[j];
        for (j = i + 1 + first; j < MATR_SIZE; ++j)
            Sum += A[ind(i, j, MATR_SIZE)] * X[j];
        /* column MATR_SIZE of A holds the right-hand side b */
        X[i + first] = (A[ind(i, MATR_SIZE, MATR_SIZE)] - Sum)
                     / A[ind(i, i + first, MATR_SIZE)];
    }
}

Figure 3. The procedure for computing the new values of the vector by the Gauss-Seidel method (parallel version)
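The surrounding iteration loop is not given in the text. The sketch below is an assumption about how it might look, not the author's code: each process updates its own block of X with the procedure above, and the blocks are then gathered on all processes with MPI_Allgatherv. The names counts, displs (block lengths and offsets per process), Solve, and ITER_MAX are hypothetical and introduced here for illustration.

#include <mpi.h>

extern double *A, *X;      /* global matrix and vector, as in the text */
void GaussZeidel(int size, int MATR_SIZE, int first);

#define ITER_MAX 1000      /* illustrative iteration limit */

void Solve(int size, int MATR_SIZE, int first, int *counts, int *displs)
/* counts[p], displs[p]: block length and offset of process p in X */
{
    int iter;
    for (iter = 0; iter < ITER_MAX; ++iter) {
        /* update this process's block X[first..first+size-1] */
        GaussZeidel(size, MATR_SIZE, first);
        /* gather all updated blocks so each process holds the full X */
        MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                       X, counts, displs, MPI_DOUBLE, MPI_COMM_WORLD);
    }
}

In a real solver the fixed iteration count would be replaced by a convergence test on the residual norm.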

        4. Directions of further research
        • Develop a software system for the parallel solution of SLAE on the cluster.
        • Compare the effectiveness of various parallel methods for solving SLAE on the cluster.

Literature

             1. Shpakovsky G.I., Serikova N.V. Programming for Multiprocessor Systems in the MPI Standard: A Manual. Minsk: BSU, 2002. 323 pp.
             2. Gergel V.P. Theory and Practice of Parallel Computing: A Manual. Moscow: Internet University of Information Technologies; BINOM. Laboratory of Knowledge, 2007. 423 pp.
             3. Olifer V.G., Olifer N.A. Computer Networks: Principles, Technologies, Protocols. St. Petersburg: Piter, 2001. 672 pp.
             4. Parallel Programming Using OpenMP. Moscow: BINOM. Laboratory of Knowledge, 2008. 118 pp.
             5. Gropp W., Lusk E., Skjellum A. Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press, 1994. (http://www.mcs.anl.gov/mpi/index.html)
             6. Barsky A.B. Parallel Information Technologies: A Manual. Moscow: Internet University of Information Technologies; BINOM. Laboratory of Knowledge, 2007. 504 pp.
             7. Korneev V.D. Parallel Programming in MPI. Moscow-Izhevsk: Institute of Computer Science, 2003. 304 pp.
             8. Miller R., Boxer L. Serial and Parallel Algorithms: A General Approach. Moscow: BINOM. Laboratory of Knowledge, 2006. 406 pp.
             9. Site of the international conference on high performance computing (http://www.supercomp.de)
             10. Antonov A.S. Parallel Programming Using MPI: A Manual (http://parallel.ru/tech/tech_dev/MPI/)
