Introduction

Today distributed data processing systems are used almost everywhere. Many
database applications adopt a client-server architecture, in which data resides
on a server that receives queries from a client. For each client’s query, the
server often needs to transfer to the client a large amount of data that is an
answer to the query. The communication network in these environments could
become a bottleneck in the computation. In this work I study how to minimize
the communication costs of transferring answers to large-join queries from
server to client. I am going to investigate a novel technique that decomposes
the answer into intermediate results, or views, which can reduce the redundancy
in the answer. These views are transferred to the client and are used by the
client to compute the final answer. There are several challenges in
implementing this technique: (1) the number of possible plans to decompose the
answers could be very large; (2) the technique requires an efficient algorithm
to give an accurate estimate of the size of each view; and (3) many factors
could affect the decomposition choice; one such factor is whether relevant data
is cached on the client. Extensive experiments on queries adapted from the
TPC-H benchmark show that the technique can significantly reduce the
communication costs of transferring answers to large-join queries. The extra
steps in this approach pay off by reducing the total time needed to transfer
the result of a query when the result contains a lot of redundancy.

Summary

The goal of this work is to study the problem of minimizing data-communication
costs. In this work I am going to build a mathematical model of the
decomposition technique. The technique requires an efficient algorithm to give
an accurate estimate of the size of each view, so I want to modify the basic
algorithm. Because minimizing data-communication costs is a pressing problem
in almost all distributed data processing systems that adopt a client-server
architecture, these investigations are of considerable value.

Related work

The original
motivation for finding views to materialize in order to answer queries arose in the
context of designing data warehouses. Recently, an approach was proposed for finding
views to materialize that considers all possible views that can be invented to
optimize a given metric of database performance. Heuristic search techniques
have been proposed for finding a sequence of joins and semijoins that reduces
the communication costs in distributed query processing. These earlier
distributed approaches focused on the problem of placing queries at different
sources; that is, they decompose a query into subqueries distributed among
individual server nodes. Data-compression techniques can also be used to compress
the results for efficient transfer.

Conclusions

In this paper
I studied the problem of minimizing the communication costs of transferring the
answer to a large join query from a server to a client; the problem exists in a
variety of database applications. I investigated a novel technique that
decomposes queries into intermediate results, called “views”; the answers to
the views are transferred to the client and are then used by the client to
compute the answers to the queries. Decomposing queries into views can reduce
the redundancy in the query answers, which may result in significant reductions
in the costs of transferring the data from the server to the client.

Links

1. R. Chirkova and D. Suciu. A formal perspective on the view selection problem. Proc. of VLDB, pages 59–68, 2001.
2. R. Chirkova and C. Li. Materializing views with minimal size to answer queries. PODS, 2003.
3. Transaction Processing Performance Council: TPC. http://www.tpc.org/.