|
Advances in communications technology and the ever-decreasing cost of computers have made distributed computer systems an attractive alternative for satisfying the information needs of large, geographically dispersed organizations. A distributed computer system consists of geographically dispersed computer sites connected through a communication network. Each computer in the network has its own memory, processing capability, and the communication and other software it needs to work independently. Communication and control protocols enable dispersed users and applications to access and update data at local as well as remote sites in an integrated and transparent manner. In a distributed system, files are accessed for retrieval and update by geographically dispersed users and applications. Unlike in a centralized system, files can be replicated at different sites to reduce communication cost and/or response time. An important issue in the design of a distributed system is therefore determining the optimal number of file copies and their locations in the network, so that some optimality criterion is met while a set of constraints, such as storage space and channel capacity limits, response time, and processing limits, is satisfied. A distributed database is a set of logically connected data fragments that are physically distributed across a computer network. Distributed databases have the following advantages:
Distributed databases are now in demand, so optimizing them to improve system performance is a relevant problem. The purpose of this master's work is to improve the performance of distributed databases in computer information systems by optimizing the distribution of data among the nodes of a computer network. To achieve this objective, the following problems must be solved:
Many scientific works are now devoted to modelling and optimizing distributed databases. G.G. Cegelik made a major contribution to the development of this field. However, the models he proposed have a number of shortcomings, because they rely on many restrictions and simplifications. His mathematical models do not take into account operations such as replication and data fragmentation, nor the dynamic processes that occur in a distributed database. The methods that have been applied to optimizing distributed databases (the branch and bound method, mathematical programming) have not given positive results, because the dimension of the problem is large and therefore demands significant time and computing resources. In this master's work I plan to build a mathematical model of the optimal distribution of files among the nodes of a network, with the aim of minimizing the total average time of query processing and update propagation. The model has to take into account all the performance characteristics of a distributed database in MS SQL Server. This requires the most accurate data possible about the parameters of the distributed database, so I plan to create tools for gathering this information. Finally, I am going to use a genetic algorithm to solve the optimization problem. |
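To make the planned approach more concrete, the following is a minimal sketch of how a genetic algorithm could search for a file allocation that reduces the total time of query processing and update propagation. It is an illustration under stated assumptions, not the model developed in this work: the delay matrix, the query and update rates, the simplified objective (queries go to the nearest copy, updates go to every copy), and all names (`DELAY`, `QUERY`, `UPDATE`, `total_time`, `repair`, `genetic_search`) are hypothetical, and no MS SQL Server characteristics are modelled.

```python
import random

# Hypothetical toy inputs (not from the source): 3 sites, 2 files.
# DELAY[i][j]: time to move one unit of data between site i and site j.
DELAY = [[0, 4, 6],
         [4, 0, 3],
         [6, 3, 0]]
# QUERY[i][f] / UPDATE[i][f]: query and update rates issued by site i for file f.
QUERY = [[5, 1], [2, 6], [4, 3]]
UPDATE = [[1, 0], [0, 2], [1, 1]]


def total_time(alloc):
    """Toy objective: query time to the nearest copy plus update time to every copy.

    alloc[f][s] is 1 when a copy of file f is stored at site s.
    """
    n_sites = len(DELAY)
    cost = 0
    for f, row in enumerate(alloc):
        holders = [s for s in range(n_sites) if row[s]]
        for i in range(n_sites):
            cost += QUERY[i][f] * min(DELAY[i][s] for s in holders)
            cost += UPDATE[i][f] * sum(DELAY[i][s] for s in holders)
    return cost


def repair(alloc, rng):
    """Enforce the constraint that every file has at least one copy."""
    for row in alloc:
        if not any(row):
            row[rng.randrange(len(row))] = 1
    return alloc


def genetic_search(n_files, n_sites, pop_size=20, generations=50,
                   mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Random allocations plus one known-feasible baseline
    # (every file stored at site 0 only).
    population = [repair([[rng.randint(0, 1) for _ in range(n_sites)]
                          for _ in range(n_files)], rng)
                  for _ in range(pop_size - 1)]
    population.append([[1] + [0] * (n_sites - 1) for _ in range(n_files)])

    for _ in range(generations):
        population.sort(key=total_time)
        next_gen = [[row[:] for row in population[0]]]  # elitism: keep the best
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=total_time)
            p2 = min(rng.sample(population, 3), key=total_time)
            # Uniform crossover followed by bit-flip mutation.
            child = [[(a if rng.random() < 0.5 else b)
                      ^ (rng.random() < mutation_rate)
                      for a, b in zip(r1, r2)]
                     for r1, r2 in zip(p1, p2)]
            next_gen.append(repair(child, rng))
        population = next_gen

    best = min(population, key=total_time)
    return best, total_time(best)


if __name__ == "__main__":
    allocation, cost = genetic_search(n_files=2, n_sites=3)
    print("allocation (files x sites):", allocation)
    print("total time:", cost)
```

Because the best allocation of each generation is carried over unchanged, the result is never worse than the single-site baseline placed in the initial population. A realistic model would replace `total_time` with an objective built from measured database parameters and would add the storage, channel, and response-time constraints mentioned above.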
|