Donetsk National Technical University Evgeny Shegal - Distributed data processing

Abstract

Research into the properties of distributed data processing systems

Evgeny Shegal

Introduction

Today, distributed data processing systems are used almost everywhere. Many database applications adopt a client-server architecture, in which data resides on a server that receives queries from a client. For each client query, the server often needs to transfer a large amount of data to the client as the answer. The communication network in these environments can become a bottleneck in the computation. In this work I study how to minimize the communication costs of transferring answers to large-join queries from server to client. I am going to investigate a novel technique that decomposes the answer into intermediate results, or views, which can reduce the redundancy in the answer. These views are transferred to the client, which uses them to compute the final answer. There are several challenges in implementing this technique: (1) the number of possible plans for decomposing the answer can be very large; (2) the technique requires an efficient algorithm that gives an accurate estimate of the size of each view; and (3) many factors can affect the choice of decomposition, for example whether relevant data is cached on the client. Extensive experiments on queries adapted from the TPC-H benchmark show that the technique can significantly reduce the communication costs of transferring answers to large-join queries. When the result has a lot of redundancy, the extra steps used in this approach pay off by reducing the total time needed to transfer the result of a query.
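
The redundancy argument above can be illustrated with a minimal Python sketch. It is not the decomposition algorithm studied in this work: the relations `orders` and `items`, the helper names `join` and `cells`, and the cost measure (number of attribute values shipped) are all assumptions made for illustration only.

```python
from itertools import product

# Hypothetical toy relations (not from the source); each row is (join_key, payload).
orders = [(1, "order-A"), (1, "order-B"), (1, "order-C"), (2, "order-D")]
items = [(1, "item-X"), (1, "item-Y"), (1, "item-Z"), (1, "item-W"), (2, "item-V")]


def join(left, right):
    """Equi-join two (key, payload) relations on the key."""
    return [
        (lk, lp, rp)
        for (lk, lp), (rk, rp) in product(left, right)
        if lk == rk
    ]


def cells(relation):
    """Crude transfer-cost proxy (an assumption): attribute values shipped."""
    return sum(len(row) for row in relation)


# Plan 1: the server computes the join and ships the full answer.
full_answer = join(orders, items)
cost_full = cells(full_answer)

# Plan 2: the server ships the two views; the client joins them locally.
received_views = (list(orders), list(items))
cost_views = sum(cells(view) for view in received_views)
client_answer = join(*received_views)

print(cost_full, cost_views, client_answer == full_answer)
```

With these toy inputs the full answer carries 39 attribute values, while the two views together carry 18, and the client reconstructs the same answer. The gap grows with the number of tuples that share a join key, which is the redundancy the technique tries to exploit.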
Summary

The goal of this work is to study the problem of minimizing data-communication costs. I am going to implement a mathematical model of the decomposition technique. Because the technique requires an efficient algorithm that accurately estimates the size of each view, I want to modify the basic algorithm.
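
The text does not specify the basic estimation algorithm, so the following Python sketch is only an assumption about what a starting point might look like. It uses the textbook equi-join estimate |R| * |S| / max(V(R, a), V(S, a)), which presumes uniformly distributed join values. The function names and all input figures are hypothetical.

```python
def estimate_join_rows(rows_r, distinct_r, rows_s, distinct_s):
    """Textbook equi-join size estimate under a uniformity assumption.

    rows_*     -- number of tuples in each relation
    distinct_* -- number of distinct join-attribute values in each relation
    """
    if rows_r == 0 or rows_s == 0:
        return 0.0
    return rows_r * rows_s / max(distinct_r, distinct_s)


def estimate_view_bytes(estimated_rows, avg_row_bytes):
    """Convert a row estimate into an approximate transfer size in bytes."""
    return estimated_rows * avg_row_bytes


# Hypothetical statistics: R has 1000 rows and 100 distinct keys,
# S has 5000 rows and 50 distinct keys, and a joined row averages 40 bytes.
est_rows = estimate_join_rows(1000, 100, 5000, 50)
est_bytes = estimate_view_bytes(est_rows, 40)
print(est_rows, est_bytes)
```

An estimate of this kind would let competing decomposition plans be compared by transfer size without materializing each view. Skewed data violates the uniformity assumption, which is one reason a modified estimator may be needed.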

Minimizing data-communication costs is a pressing problem in almost all distributed data processing systems that adopt a client-server architecture, so these investigations have considerable practical value.
Related work

The original motivation for finding views to materialize in order to answer queries arose in the context of designing data warehouses. More recently, an approach was proposed for finding views to materialize that considers all possible views that can be invented to optimize a given metric of database performance. Heuristic search techniques have been proposed for finding a sequence of joins and semijoins that reduces the communication costs in distributed query processing. These earlier distributed approaches focused on the problem of placing queries at different sources; that is, they decompose a query into subqueries assigned to individual server nodes. Data-compression techniques can also be used to compress the results for efficient transfer.
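
As a small illustration of why semijoins can reduce communication in distributed query processing, the Python sketch below compares shipping a whole relation with shipping only join keys and the matching tuples. The two sites, the relations `r` and `s`, and the cost measure (items shipped) are assumptions for illustration and are not taken from the cited work.

```python
def semijoin(s_rows, r_keys):
    """Return the tuples of S whose join key appears in R (S semijoin R)."""
    return [row for row in s_rows if row[0] in r_keys]


# Hypothetical data: site A holds r, site B holds s; rows are (join_key, payload).
r = [(1, "a"), (2, "b")]
s = [(1, "x"), (3, "y"), (4, "z"), (2, "w"), (5, "v")]

# Naive plan: ship all of s from site B to site A.
shipped_naive = len(s)

# Semijoin plan: ship r's join keys to site B, then ship back only matching tuples.
r_keys = {key for key, _ in r}
reduced_s = semijoin(s, r_keys)
shipped_semijoin = len(r_keys) + len(reduced_s)

print(shipped_naive, shipped_semijoin, reduced_s)
```

Here the naive plan ships 5 tuples, while the semijoin plan ships 2 keys and 2 tuples. The saving depends on how selective the join is; when most tuples of `s` match, the extra round trip may not pay off.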
Conclusions

In this paper I studied the problem of minimizing the communication costs of transferring the answer to a large-join query from a server to a client; the problem arises in a variety of database applications. I investigated a novel technique that decomposes queries into intermediate results, called “views”; the answers to the views are transferred to the client, which then uses them to compute the answers to the queries. Decomposing queries into views can reduce the redundancy in the query answers, which may significantly reduce the cost of transferring the data from the server to the client.
Links

1. R. Chirkova, A. Y. Halevy, and D. Suciu. A formal perspective on the view selection problem. Proc. of VLDB, pages 59–68, 2001.

2. R. Chirkova and C. Li. Materializing views with minimal size to answer queries. PODS, 2003.

3. Transaction Processing Performance Council: TPC. http://www.tpc.org/.