Tetyana Dyachenko

Master's thesis topic: "Research of the parallel algorithm for Markov models of computer systems"

Supervisor: associate professor Natalia Datsun

Consultant: associate professor Tatyana Mikhailova



Abstract of the master's thesis

Research of the parallel algorithm for

Markov models of computer systems

Objective

The objective is to develop an efficient parallel algorithm for constructing an analytical model of a cluster system; applying the modeling results will make it possible to improve the performance of cluster systems.

Research objectives:

  • develop Markov models of the functioning of cluster systems;

  • develop parallel algorithms for the implementation of large-scale problems;

  • build an efficient parallel algorithm for solving linear systems with sparse unstructured matrices;

  • improve existing methods of calculating the basic operating characteristics of cluster systems.

Background

Analysis of the methods for constructing high-performance cluster systems and of the tools for evaluating their performance demonstrates the complexity of the problem of analysis and synthesis of computing systems. Traditional solutions to this problem are insufficient, and new approaches need to be developed.

  1. Conflicts over shared resources lead to poor performance of computing systems. It is necessary to use analytical methods (e.g. Markov chains) at the design stage to evaluate a system and produce recommendations.


  2. Analytical methods are mainly used to analyze the performance of single-processor computer systems with sequential processing. For parallel processing and multiprocessor computer systems, including clusters, analytical methods require further development.


  3. Existing publications give no practical recommendations on the use of discrete Markov models.


  4. A discrete Markov model that describes the designed network has a very high dimensionality and in most cases cannot be solved on a single computer because of limited computing resources. One way to overcome this is to parallelize the algorithms involved; it is therefore necessary to construct a parallel algorithm and estimate its parallelism parameters.

Review of research

Due to the large number of parallel computing systems (CS) being created, work in this area requires further development and remains relevant. Great contributions to the implementation of new multiprocessor architectures and the evaluation of their effectiveness were made by Lebedev S.A., Glushkov V.M., Kapitonov U.V., Samofalov K.G., Zabrodin A.V. [1,2], Kaliayev A.V. [3], Burtsev V.S. [4], Boyun V.P. [5], Mitrofanov Y.I. [6], Bosharin G.P. [7], Kogan Y. [7,8], Menasce D. and Almeida V., Kemeny J., Stallings W. [9].

Many problems solved on multiprocessor systems require solving a linear system of equations,

Ax = b,

where A is a user-supplied n-by-n sparse matrix,
b is a user-supplied vector of length n,
x is a vector of length n to be computed.
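For concreteness, such a system can be solved with an off-the-shelf sparse solver; the 4-by-4 tridiagonal matrix below is an arbitrary toy illustration, not a matrix from the model in this work:

```python
# Solving Ax = b with SciPy's sparse direct solver.
# The 4x4 tridiagonal matrix is a made-up illustration.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

n = 4
A = csr_matrix(np.array([
    [4.0, 1.0, 0.0, 0.0],
    [1.0, 4.0, 1.0, 0.0],
    [0.0, 1.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 4.0],
]))
b = np.ones(n)                # user-supplied right-hand side of length n
x = spsolve(A, b)             # the vector of length n to be computed
print(np.allclose(A @ x, b))  # True: x satisfies the system
```

For the model matrices discussed below, direct factorization is usually infeasible and iterative methods are used instead.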

In general, solving the linear algebra subproblem takes a significant part (50% or more) of the total solution time of the overall problem [19].

Software for Sparse Linear Algebra

Aztec

Aztec is an iterative solver library that greatly simplifies parallelizing the solution of linear systems of equations. A collection of data-transformation tools is provided that allows easy creation of distributed sparse unstructured matrices for parallel solution. Once the distributed matrix is created, computation can be performed on any of the parallel machines running Aztec: nCUBE 2, IBM SP2 and Intel Paragon, MPI platforms, as well as standard serial and vector platforms.

Aztec includes a number of Krylov iterative methods such as conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized biconjugate gradient (BiCGSTAB) to solve systems of equations. These Krylov methods are used in conjunction with various preconditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations within subdomains [10].
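The same mathematical ingredients can be sketched in SciPy (Aztec itself is a C library; the function names below are SciPy's, not Aztec's): a Krylov method (GMRES) preconditioned with an incomplete LU factorization, applied to a made-up sparse system.

```python
# Krylov iteration (GMRES) with an incomplete-LU preconditioner,
# mirroring the Aztec approach in SciPy on a toy sparse system.
import numpy as np
from scipy.sparse import diags
from scipy.sparse import linalg as sla

n = 100
# Nonsymmetric, diagonally dominant tridiagonal test matrix (not from the thesis model).
A = diags([-1.0, 2.5, -1.2], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = sla.spilu(A, drop_tol=1e-5)                 # incomplete LU of A
M = sla.LinearOperator((n, n), matvec=ilu.solve)  # preconditioner approximating A^-1

x, info = sla.gmres(A, b, M=M)                    # info == 0 on convergence
print(info == 0 and np.linalg.norm(A @ x - b) < 1e-3)  # True
```

With a good preconditioner the Krylov iteration converges in a few steps; the same pattern applies to CG (symmetric positive definite matrices) and BiCGSTAB.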

ScaLAPACK

ScaLAPACK is a project providing parallel implementations of linear algebra routines for dense and band matrices. ScaLAPACK is designed for heterogeneous computing and is implemented using MPI and PVM for portability. Its fundamental operations are provided by PBLAS, a distributed-memory version of BLAS with similar functionality. ARPACK is software for solving large-scale eigenvalue problems; it is suitable for large sparse matrices, i.e. for a matrix A whose matrix-vector product Ax requires only O(n) rather than O(n^2) floating-point operations [11].

PETSc

PETSc is a large collection of parallel scientific software that includes not only routines for sparse matrices but also solvers for nonlinear equations, ODEs, PDEs, etc. Interfaces for C, C++, Fortran, and Python are provided [12].

In recent years, similar topics have been developed by the following DonNTU master's students:

  1. Kucherenosova Olga (supervisor: prof. Bashkov E.A.)
    Master's thesis topic: "Investigation of the effectiveness of parallel computing systems"

  2. Mishchuk Julia (supervisor: prof. Feldman L.P.)
    Master's thesis topic: "Analysis of the effectiveness of computer systems using Markov models"

  3. Gorb Paul (supervisor: assoc. prof. Ladyzhenskii U.V.)
    Master's thesis topic: "The solution of large systems of linear algebraic equations in a distributed computing environment"

  4. Al-Masri Mohammed Rida (supervisor: assoc. prof. Ladyzhenskii U.V.)
    Master's thesis topic: "Parallel methods for solving linear systems on a computer cluster"

Markov model of a homogeneous computing cluster

One example of a computer system with distributed processing is the cluster. Clusters offer high performance and survivability of computing resources. The classification of cluster systems given in [14,15] is the most common. One of the main problems is to develop recommendations for the rational use of network resources; this problem arises both in the operation and in the design of the network. For a cluster system to operate efficiently, it is necessary to build its analytical model and, for each class of tasks, determine the basic parameters of the computing environment. Such models can also be used to optimize clusters at the design stage.

In this paper we consider a cluster with shared disk space; the Oracle Parallel Server and Informix database systems use such a topology [15]. The performance of such systems can be increased either by increasing the number of processors and the amount of RAM in each node of the cluster, or by increasing the number of nodes themselves.

A schematic diagram of a cluster of this type is shown in Figure 1.

Figure 1 — Architecture of a cluster with shared disk space

The simplified model of the cluster with shared disk space is shown in Figure 2.

The input node (management server) distributes tasks among the application servers, of which there are N1. Each of them may access the data on the disks, of which there are N2. Due to limited computing capacity, we assume that the number of tasks processed by the computer system is at most M.

We construct a discrete Markov model. Since the cluster is homogeneous, we represent the servers, and similarly the disk drives, as multi-channel devices. Tasks arrive for service at the servers (no more than M at a time). Service follows the "first in, first out" rule.

Assume that the tasks processed in such a cluster are homogeneous and have the following characteristics:
p12 — the probability of a query to one of the N1 servers;
p23 — the probability of a query to one of the N2 disks;
p21 — the probability of completion of service on one of the N1 servers;
p10 — the probability of completion of task service;
q0 — the probability of arrival of a new task.


Figure 2 — Block diagram of Markov model of cluster with shared disk space
(Animation: volume — 47 KB, the number of frames — 12, the number of cycles — 5, size — 702x136)

The model is constructed using the method described in [16].

The algorithm for constructing a discrete Markov model consists of two parts: computing the matrix of transition probabilities and computing the vector of stationary probabilities.
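The second part, finding the stationary probability vector pi satisfying pi P = pi, can be sketched by power iteration. The 3-state chain below is a made-up miniature; the real model matrices are far larger and sparse.

```python
# Power iteration for the stationary vector pi of a row-stochastic
# transition matrix P (pi P = pi). Toy 3-state chain, not the thesis model.
import numpy as np

P = np.array([
    [0.5, 0.3, 0.2],   # each row sums to 1
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
])
pi = np.full(3, 1.0 / 3)        # start from the uniform distribution
for _ in range(1000):
    pi = pi @ P                 # one step of the chain
print(np.allclose(pi, pi @ P))  # True: pi is stationary
print(round(pi.sum(), 10))      # 1.0: still a probability vector
```

For the sparse matrices considered in this work, the iteration would be carried out with sparse matrix-vector products, which is exactly the operation the parallel algorithm must accelerate.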

The following parameters are used in determining the matrix of transition probabilities:
N — the number of nodes in the modeled system (devices or blocks);

M — the number of tasks in the system.

The number of states with j tasks placed on the N nodes equals the number of placements with repetition:

S_j = C(N + j - 1, j) = (N + j - 1)! / (j! (N - 1)!),

and the total number of states is calculated as:

S = sum_{j=0..M} C(N + j - 1, j) = C(N + M, N).

The total number of nonzero elements of the matrix of transition probabilities is calculated as:


The table shows the calculated fraction of nonzero elements in the matrix of transition probabilities for a model with three groups of devices, N = 3 [17].

Number of tasks (M):                    20            50              100                200
Matrix dimension (number of states):    1771 x 1771   23426 x 23426   176851 x 176851    1373701 x 1373701
Number of nonzero elements:             1257408       94113953        2746496028         83871314553
Fraction of nonzero elements:           0.4009        0.1715          0.0878             0.0444
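The state counts in the table can be checked directly against the closed form C(N + M, N), a standard identity for placements with repetition; the check below assumes N = 3 as in the table:

```python
# Verify the table's state counts: total states = sum_{j=0}^{M} C(N+j-1, j),
# which telescopes to C(N+M, N).
from math import comb

N = 3
for M, expected in [(20, 1771), (50, 23426), (100, 176851), (200, 1373701)]:
    total = sum(comb(N + j - 1, j) for j in range(M + 1))
    assert total == comb(N + M, N) == expected
print("all table values match")
```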

The dimension of the matrix depends on the number of tasks in the simulated system.

The order of the matrices ranges from tens or hundreds of thousands to tens of millions. This comes from the desire to use more accurate discrete models, whose approximate solutions are closer to the solutions of the initial problems and better reflect local features of the process.

Solving such problems requires significant computational resources, which can be represented by parallel computers. Therefore, there is an urgent problem of creating parallel algorithms for solving this problem.

An important feature of the matrix is that the sum of the elements of each row is unity regardless of their number, so the magnitude of individual elements can vary strongly, from 0.5 down to 10^-17. Scaling cannot be applied to such matrices without loss of accuracy.

Another feature of the resulting matrices is that they are unstructured: they cannot simply be converted to one of the standard types (symmetric, banded, block-diagonal with bordering, etc.) for which efficient parallel algorithms have already been developed [18-21].

The algorithm for solving the linear system of equations must satisfy a number of requirements:

  • the matrix storage scheme should be compact: store only the nonzero elements, together with information about their location in the original matrix [22];

  • the solution method should prevent loss of accuracy in the calculations;

  • the parallel algorithm must be efficient and scalable.
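The first requirement is what compressed sparse formats provide; as a sketch, SciPy's CSR format stores exactly the nonzero values plus the positional information needed to recover the original matrix (the 4-by-4 matrix below is an arbitrary illustration):

```python
# Compressed sparse row (CSR) storage: only nonzero elements and the
# information needed to locate them in the original matrix are kept.
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([
    [0.0, 0.4, 0.0, 0.6],   # rows sum to 1, like a transition matrix
    [0.9, 0.0, 0.1, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
])
A = csr_matrix(dense)
print(A.nnz)       # 7 values stored instead of all 16
print(A.indices)   # column index of each stored value
print(A.indptr)    # where each row's values begin and end
```

At the densities shown in the table above (4-40% nonzeros), such a scheme cuts memory use by roughly an order of magnitude for the largest matrices.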

Expected results

  • develop a discrete model of the cluster that can be used to obtain the basic characteristics of the computing environment;

  • develop a new parallel algorithm for the implementation of a discrete Markov model of a homogeneous cluster;

  • obtain the basic characteristics of the parallelization;

  • develop a parallel algorithm for solving linear systems with sparse unstructured matrices.

Important:

The master's work was not yet complete when this abstract was written. Planned completion date: December 2010.

The full text of the work and related materials can be obtained from the author or her supervisor after that date.

Bibliography

  1. Забродин А.В., Левин В.К. Опыт разработки параллельных вычислительных технологий, создание и развитие семейства МВС. // Тезисы докладов конференции «Высокопроизводительные вычисления и их приложения». 30 октября-2 ноября 2000 года, Черноголовка. — М.:МГУ, 2000. — с. 20-21.
  2. Забродин А.В. Параллельные вычислительные технологии. Состояние и перспективы // Материалы первой молодежной школы «Высокопроизводительные вычисления и их приложения».
  3. Каляев А. В., Левин И.И. Модульно-наращиваемые многопроцессорные системы со структурно-процедурной организацией вычислений. — М.: Янус-К, 2003. — 380 с.
  4. Бурцев В.С. СуперЭВМ в России. История и перспективы. — ЭЛЕКТРОНИКА: НТБ, 2000, №4, с. 5–9.
  5. Малиновский Б. Н., Боюн В.П., Козлов Л. Г. Введение в кибернетическую технику. Параллельные структуры и методы. — Киев: Наук. думка, 1989. — 245 с.
  6. Митрофанов Ю.И. Основы теории сетей массового обслуживания. — Саратов: Изд-во Сарат.ун-та, 1993. — 116 с.
  7. Бошарин Г.П., Бочаров П.П., Коган Я.А. Анализ очередей в вычислительных сетях. Теория и методы расчета. — М.: Наука, 1989.— 336 с.
  8. Авен О.И., Гурин Н.Н., Коган Я.А. Оценка качества и оптимизация вычислительных систем. — М.: Наука,1982. — 464 с.
  9. Столингс У. Структурная организация и архитектура компьютерных систем. — М.: Вильямс, 2002. — 893 с.
  10. Aztec — a massively parallel iterative solver library for solving sparse linear systems [Электронный ресурс] / Интернет-ресурс. — Режим доступа: http://www.cs.sandia.gov/CRF/aztec1.html
  11. ScaLAPACK Home Page [Электронный ресурс] / Интернет-ресурс. — Режим доступа: http://www.netlib.org/scalapack/scalapack_home.html
  12. PETSc — Portable, Extensible Toolkit for Scientific Computation [Электронный ресурс] / Интернет-ресурс. — Режим доступа: http://www.mcs.anl.gov/petsc/petsc-as/
  13. METIS — Family of Multilevel Partitioning Algorithms [Электронный ресурс] / Интернет-ресурс. — Режим доступа: http://glaros.dtc.umn.edu/gkhome/views/metis/
  14. Спортак М., Франк Ч., Паппас Ч. и др. Высокопроизводительные сети. Энциклопедия пользователя. — К.: ДиаСофт, 1998. — 432 с.
  15. Цилькер Б.Я., Орлов С.А. Организация ЭВМ и систем. — М: ПИТЕР, 2004. — 668 с.
  16. Фельдман Л.П., Малинская Э.Б. Аналитическое моделирование вычислительных систем с приоритетным обслуживанием программ. //Электронное моделирование. — К., 1985. — №6.— с. 78-82.
  17. Дяченко Т.Ф., Михайлова Т.В. Методы решения больших СЛАУ на разреженных матрицах общего вида // Тезисы доклада на IV международной научно-практической конференции молодых ученых, аспирантов, студентов «Современная информационная Украина: Информатика, Экономика, Философия», 13-14 мая 2010 г., Государственный Университет информатики и искусственного интеллекта, Донецк.
  18. Попов А.В. Параллельные алгоритмы решения линейных систем с разреженными симметричными матрицами. // Штучний інтелект. — Д.,2006. — №4. — с. 248-257.
  19. Химич А.Н., Попов А.В., Т.В. Чистякова, О.В. Рудич, Т.А. Герасимова. Исследование блочно-циклических алгоритмов на семействе кластеров СКИТ. // Проблеми програмування. — К., 2006. — № 2-3. — с. 177-183.
  20. Игнатьев А.В., Ромашкин В.Н.. Анализ эффективности методов решения больших разреженных систем линейных алгебраических уравнений. // Интернет-вестник ВолгГАСУ. Сер.: Строит. информатика. 2008. — №3(6). — с. 1-7.
  21. Ильин В.П. Проблемы высокопроизводительных технологий решения больших разреженных СЛАУ. // Вычислительные методы и программирование. — М., 2009. — №10. — с. 141-147.
  22. Тьюарсон Р. Разреженные матрицы. Пер. с англ. М: Мир, 1977. — 189 с.

© DonNTU Master's Student Dyachenko T.F., DonNTU, 2010