Podlesnaya Yana. Optimization of a Distributed Database. Master's work

DONETSK NATIONAL TECHNICAL UNIVERSITY

Master's work

Topic of the master's work:

"Dynamic optimization of data distribution among network nodes"

Written by Podlesnaya Yana

Advances in communications technology and the ever-decreasing cost of computers have made distributed computer systems an attractive alternative for satisfying the information needs of large, geographically dispersed organizations. A distributed computer system is composed of geographically dispersed computer sites connected through a communication network. Each computer in the network has its own memory, processing capabilities, and the communication and other software it needs to work independently. The communication and control protocols enable dispersed users and applications to access and update data at local as well as remote sites in an integrated and transparent manner.

In a distributed system, files are accessed for update and retrieval by geographically dispersed users and applications. Unlike in a centralized system, files can be replicated at different sites to reduce communication cost and/or response time. An important issue in the design of a distributed system is determining the optimal number of file copies and their locations in the network so that a chosen criterion is optimized while a set of constraints, such as space and channel limitations, response time, and processing limitations, is satisfied. A distributed database is a collection of logically related data that is physically distributed over a computer network.
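The placement problem described above can be written down compactly. The formulation below is only a generic sketch of a file allocation problem, not the model developed in this work, and every symbol in it is an assumption introduced for illustration: x_{ij} is 1 if file i has a copy at node j, r_{ik} and u_{ik} are the read and update frequencies of file i issued from node k, c_{kj} is the cost of one transfer between nodes k and j, s_i is the size of file i, and S_j is the storage capacity of node j. Reads are assumed to go to the cheapest copy, while updates are sent to every copy.

```latex
\min_{x}\; T(x) = \sum_{i=1}^{m}\sum_{k=1}^{n}
  \Bigl( r_{ik}\,\min_{j:\,x_{ij}=1} c_{kj}
       \;+\; u_{ik}\sum_{j=1}^{n} x_{ij}\,c_{kj} \Bigr)
\quad\text{subject to}\quad
\sum_{j=1}^{n} x_{ij} \ge 1 \;\; (i=1,\dots,m), \qquad
\sum_{i=1}^{m} s_i\,x_{ij} \le S_j \;\; (j=1,\dots,n), \qquad
x_{ij}\in\{0,1\}.
```

The first term falls as copies are added and the second term grows, which is why the number of copies has to be chosen and cannot simply be maximized.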

A distributed database has the following advantages:

Greatest local autonomy because data at the record or column level can be stored at the site(s) that most heavily use it.

Greatly reduced communications costs for read-only data access because copies of tables can be located at multiple sites that most heavily use them.

Greatly improved availability because if a site with a database table goes down, there may be another site with a copy of that table.

Distributed databases are in demand today. Optimizing them in order to increase system performance is therefore a topical problem.

The purpose of the master's work is to increase the performance of distributed databases in computer information systems by optimizing the distribution of data among the nodes of a computer network.

To achieve this goal, the following problems must be solved:

To study how distributed queries are executed and how updates are propagated in MS SQL Server.
To create a mathematical model of the distributed database that takes into account the features of query processing and update propagation.
To define the set of distributed database parameters needed to calculate how efficiently the distributed database functions.
To develop a database that stores the distributed database parameters and statistical information about query processing and update propagation.
To develop tools for gathering the statistical information.
To modify the algorithm for optimizing data distribution among the nodes of a computer information network.

Many scientific works are now devoted to the modelling and optimization of distributed databases. Cegelik G.G. made a major contribution to this field. However, the models he proposed have a number of shortcomings because they rely on many restrictions and simplifications. His mathematical models do not take into account operations such as replication and data fragmentation, nor the dynamic processes that occur in a distributed database. The methods usually applied to optimizing distributed databases (branch and bound, mathematical programming) have not given positive results, because the dimension of the problem is large and solving it demands significant time and computing resources.
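To give a rough sense of that dimension, the short sketch below counts the candidate placements under a simplifying assumption that is not taken from this work: each file is stored whole (no fragmentation) on any non-empty subset of nodes, and capacity constraints are ignored. The function name `placement_count` is hypothetical.

```python
def placement_count(num_files: int, num_nodes: int) -> int:
    """Number of ways to place files when each file needs at least one copy.

    Each file can sit on any non-empty subset of nodes (2**n - 1 choices),
    and the choices for different files are independent.
    """
    return (2 ** num_nodes - 1) ** num_files


for files, nodes in [(2, 3), (10, 5), (20, 10)]:
    print(f"{files} files, {nodes} nodes: {placement_count(files, nodes)} placements")
```

Even for 20 files and 10 nodes the count is a 61-digit number, so exhaustive enumeration is out of reach and exact methods have to prune very aggressively.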

In the master's work I plan to build a mathematical model of the optimal distribution of files among the nodes of a network, with the aim of minimizing the total average time of query processing and update propagation. The mathematical model has to take into account all performance characteristics of a distributed database in MS SQL Server. The parameters of the distributed database must be known as accurately as possible, so I plan to create tools for gathering information about them. Finally, I am going to use a genetic algorithm to solve the optimization problem.
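A minimal sketch of how a genetic algorithm could be applied to such a placement problem is given below. It is an illustration under stated assumptions, not the algorithm of this work: the cost model (reads go to the cheapest copy, updates go to every copy), the three-node data set, and all names (`COST`, `READS`, `UPDATES`, `total_time`, `genetic_search`) are hypothetical, and the operators used are plain uniform crossover, bit-flip mutation, and elitist truncation selection.

```python
import random

# All numbers below are made-up illustration data, not measurements.
COST = [  # COST[a][b]: assumed time to move one request between nodes a and b
    [0, 4, 9],
    [4, 0, 3],
    [9, 3, 0],
]
READS = [  # READS[f][n]: assumed read frequency of file f from node n
    [50, 5, 5],
    [5, 5, 60],
]
UPDATES = [  # UPDATES[f][n]: assumed update frequency of file f from node n
    [2, 1, 1],
    [10, 10, 10],
]
NUM_FILES, NUM_NODES = len(READS), len(COST)


def repair(placement):
    """Guarantee that every file keeps at least one copy."""
    for row in placement:
        if not any(row):
            row[random.randrange(NUM_NODES)] = 1
    return placement


def total_time(placement):
    """Reads go to the cheapest copy; updates are sent to every copy."""
    total = 0
    for f, row in enumerate(placement):
        holders = [n for n, bit in enumerate(row) if bit]
        for src in range(NUM_NODES):
            total += READS[f][src] * min(COST[src][h] for h in holders)
            total += UPDATES[f][src] * sum(COST[src][h] for h in holders)
    return total


def random_placement():
    """One chromosome: a files-by-nodes matrix of 0/1 copy flags."""
    return repair([[random.randint(0, 1) for _ in range(NUM_NODES)]
                   for _ in range(NUM_FILES)])


def crossover(a, b):
    """Uniform crossover: each bit comes from one of the two parents."""
    return [[random.choice(pair) for pair in zip(ra, rb)] for ra, rb in zip(a, b)]


def mutate(placement, rate=0.1):
    """Flip each bit with a small probability."""
    return [[bit ^ 1 if random.random() < rate else bit for bit in row]
            for row in placement]


def genetic_search(pop_size=20, generations=40, seed=0):
    random.seed(seed)
    population = [random_placement() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=total_time)
        parents = population[: pop_size // 2]  # the better half survives unchanged
        children = [
            repair(mutate(crossover(*random.sample(parents, 2))))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=total_time)


best = genetic_search()
print(best, total_time(best))
```

Infeasible chromosomes (a file with no copy) are repaired rather than penalized, which keeps the fitness function simple. A real model would also need capacity constraints and parameters measured on the actual system.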
