[Please remove <h1>]
Winter 2007
Note: Events of interest to the
Database Research Group are posted to the uw.cs.database
newsgroup and are mailed to the
db-group@lists.uwaterloo.ca
mailing list. There are actually three mailing lists aggregated into the
db-group list: db-faculty
(for DB group faculty), db-grads (for DB group graduate students),
and db-friends (for DB group alumni, visitors, and friends). If
you wish to subscribe to one of these three lists (or to unsubscribe), please
visit
https://lists.uwaterloo.ca/mailman/listinfo/<listname>, where
<listname> is the list you wish to subscribe to.
- DB group meetings
- The DB group meets most Friday afternoons at 2pm, usually in DC1331.
See the list of current events for
times and locations of upcoming meetings. Each meeting lasts
for an hour and features an informal presentation by one of the
members of the group. Everyone is welcome to attend. These talks are
intended to raise questions and to stimulate discussion rather than
being polished presentations of research results. Speakers are determined
using a rotating speaker list, which can be found on the
DB group meeting page
- DB seminar series
- The DB seminar series features visiting speakers. These seminars are
more-or-less monthly, and are usually scheduled on Monday
mornings at 11am. See the list of current
events for times and locations of upcoming seminars. The
full schedule can be found on the DB seminar series page.
Recent and Upcoming Events
DB Meeting:
|
Friday January 5, 2:00pm, DC 1331
|
Speaker:
|
David DeHaan
|
Title:
|
Top-Down Query Optimization Revisited
|
Abstract:
|
Cost-based optimization of select-project-join queries is an old but
foundational problem for database management systems. The textbook
algorithm for finding the optimal plan is the bottom-up dynamic
programming search of System-R. The only other technique with
significant exposure in the literature is the top-down transformational
search of Volcano/Cascades.
In this talk I will compare and contrast these two established
algorithms with a recent bottom-up dynamic programming algorithm
proposed by Moerkotte and Neumann. I will then propose a new top-down
algorithm that achieves the efficiency of Moerkotte and Neumann's
enumeration while enabling desirable enhancements of top-down search
such as branch-and-bound pruning and demand-driven interesting orders.
One novel feature of this new algorithm is that it allows for a flexible
trade-off between storage cost and CPU time, which makes it highly
adaptable for a variety of execution environments.
|
DB Meeting:
|
Friday January 12, 2:00pm, DC 1331
|
Speaker:
|
Gord Cormack
|
Title:
|
Statistical Precision of Information Retrieval Evaluation
|
Abstract:
|
Precision of measurement is the degree to which a measured value
is free of random error. Information Retrieval Evaluation concerns
measuring the effectiveness of an information retrieval system
on some sample topics and data. To what extent does such an
evaluation measure the "true" effectiveness of the system, and
to what is the measured value due to chance? I investigate the
use of bootstrapping techniques to estimate confidence intervals
for these measures. I call into question the validity of statistical
significance tests as they are commonly used in IR evaluation.
|
DB Meeting:
|
Friday January 19, 2:00pm, DC 1331
|
Speaker:
|
Anil Goel, Sybase iAnywhere
|
Title:
|
Uncertainty Aware Query Optimization
|
Abstract:
|
I will discuss the problem of making query optimization deal with
uncertainty in the input parameters. In particular, I will talk about
techniques proposed in two papers at SIGMOD 2005 on this topic.
|
DB Meeting:
|
Friday January 26, 2:00pm, DC 1331
|
Speaker:
|
Gulay Unel
|
Title:
|
Safraless Decision Procedures using Complex-value Datalog
|
Abstract:
|
We propose the use of techniques developed for query evaluation of
Complex-value Datalog queries for determining satisfiability of
mu-calculus formulas extending our current method for weak
second order logic formulas. Our method is based on
the translation of mu-calculus formulas to alternating parity
automata and using Safraless decision procedures
for checking the emptiness of the automata which was
proposed by Kupferman and Vardi. We show that the use of
database evaluation techniques, in particular the
top-down resolution-based evaluation with memoing,
can considerably improve the performance of
decision procedures based on the connection between
logics and automata.
|
DB Meeting:
|
Friday February 2, 2:00pm, DC 1331
|
Speaker:
|
Oguzhan Ozmen
|
Title:
|
Storage Workload Estimation for Database Management Systems
|
Abstract:
|
Modern storage systems are sophisticated. Simple directly-attached
storage devices are giving way to storage systems that are shared, flexible, virtualized and
network-attached. Today, storage systems have their own administrators, who use
specialized tools and expertise to configure and manage storage resources. Although the separation of storage
management and database management (i.e., storage virtualization) has many advantages,
it also introduces problems. Database physical design and storage configuration are closely related
tasks, and the separation makes it more difficult to achieve a good end-to-end
design.
In this talk, I will present own approach to close this gap by addressing
the problem of predicting the storage workload that will be
generated by a database management system. Specifically, I will show the methodology to translate a database
workload description, together with a database physical design, into a characterization of the storage workload
that will result. Such a characterization can be used by a storage
administrator to guide storage configuration. Lastly, I will also
present an empirical assessment of the cost of workload prediction as well as the accuracy of the result.
|
DB Meeting:
|
Friday February 9, 2:00pm, DC 1331
|
Speaker:
|
Huaxin Zhang
|
Title:
|
Answering Queries using Materialized XML Views
|
Abstract:
|
Materialized views can be used to expediate query processing, and may
provide orders of magnitude performance enhancements. However, unlike
relational views, materialized XML views are harder to exploit since
the query language is more difficult to handle than the relational
counterpart. In this talk I will first present (the result of) some
theoretical work on using XPath views to answer XPath queries. Then I
will review a DB2 implementation based on these results.
|
DB Meeting:
|
Friday February 16, 2:00pm, DC 1331
|
Speaker:
|
Charles Clarke
|
Title:
|
The Influence of Caption Features on Clickthrough Patterns in Web Search
|
Abstract:
|
Web search engines present lists of "captions", comprising title,
snippet, and URL, to help users decide which search results to
visit. Understanding the influence of features of these captions on
Web search behavior may help validate algorithms and guidelines for
their improved generation. In this paper we develop a methodology to
use clickthrough logs from a commercial search engine to study user
behavior when interacting with search result captions. The findings
of our study suggest that relatively simple caption features such as
the presence of all terms query terms, the readability of the snippet,
and the length of the URL shown in the caption, can significantly
influence users' Web search behavior.
|
DB Meeting:
|
Friday March 9, 2:00pm, DC 1331
|
Speaker:
|
Xuhui Li
|
Title:
|
Getting Priorities Straight: Improving Linux Support for Database I/O
|
Abstract:
|
Asynchronous I/O has been widely used by database systems. As a result,
a large number of I/O requests can be queued inside the OS. This helps
the OS to do I/O scheduling. But it also increases I/O latency. In this
talk I will introduce some research work that focus on the tradeoff of
I/O throughput and latency and show how database systems can benefit from
the results of this work. My talk will focus on the paper of the
same title by Christoffer Hall and Philippe Bonnet in VLDB 2005 (1116-1127).
|
DB Meeting:
|
Friday March 16, 2:00pm, DC 1331
|
Speaker:
|
Yasemin Ugur-Ozekinci
|
Title:
|
Object-based Storage Systems
|
Abstract:
|
Storage devices have been continually evolving in terms of processing
power, density and performance. The existing block-based interfaces
(e.g. SCSI) to storage devices, however, have become a limitation in
this evolution. In traditional storage architectures, there is a
trade-off between data sharing, security, and
performance. Object-based storage is designed to address these
limitations. This technology enables moving the storage management
component from a file system to a storage device.
In this talk, I will present an extensive overview of object-based
storage, including its main features, such as capability-based
security, and challenges in the adoption of object-based storage
technology. I will also discuss some of the object-based storage
system prototypes, and file systems employing object-based storage. I
will conclude with a comparison of object-based storage systems with
the block-based counterpart.
|
DB Meeting:
|
Friday March 23, 2:00pm, DC 1331
|
Speaker:
|
David Toman
|
Title:
|
Description Logics for Temporal Conceptual Modeling
|
Abstract:
|
The talk introduces Description Logics and shows how they can be used
for conceptual data modeling, in particular for reasoning about ER or
UML diagrams. Then we talk about various temporal extensions of such
conceptual data models, their impact on the underlying DL and on the
hardness of the associated reasoning problems.
(The last part of the talk is based on an IJCAI'07 paper.)
|
DB Meeting:
|
Friday March 30, 2:00pm, DC 1304 (Room Change)
|
Speaker:
|
Mohamed Soliman
|
Title:
|
Top-k Query Processing in Uncertain Databases
|
Abstract:
|
Top-k processing in uncertain databases is semantically and
computationally different from traditional top-k processing. The
interplay between query scores and data uncertainty makes traditional
techniques inapplicable, and motivates the need for new query
formulations and processing techniques.
In this talk, I will present our newly proposed probabilistic
formulations for top-k queries based on "marriage" of
traditional top-k semantics and possible worlds semantics. In the
light of these formulations, I will describe a processing framework
that encapsulates a state space model and efficient query processing
algorithms to tackle the challenges of uncertain data settings.
|
DB Meeting:
|
Friday April 13, 2:00pm, DC 1331
|
Speaker:
|
Amr El-Helw
|
Title:
|
Collecting and Maintaining Just-in-Time Statistics
|
Abstract:
|
Traditional DBMSs decouple statistics collection and query
optimization both in space and time. Decoupling in time may lead to
outdated statistics. Decoupling in space may cause statistics not to
be available at the desired granularity needed to optimize a
particular query, or some important statistics may not be available at
all. Overall, this decoupling often leads to large cardinality
estimation errors and, in consequence, to the selection of suboptimal
plans for query execution. In this talk, I will present JITS, a system
for proactively collecting query-specific statistics during query
compilation. The system employs a lightweight sensitivity analysis to
choose which statistics to collect by making use of previously
collected statistics and database activity patterns. The collected
statistics are materialized and incrementally updated for later reuse.
|
DB Meeting:
|
Friday April 27, 2:00pm, DC 1331
|
Speaker:
|
Jack Ng
|
Title:
|
Extending Progressive Optimization to a Shared-Nothing Parallel Database
Architecture
|
Abstract:
|
Commercial enterprise data warehouses are typically implemented on large
shared-nothing parallel databases due to the inherent scalability and
performance limitations imposed by a serial architecture. Queries used in
these large databases often contain complex predicates as well as multiple
joins, and the resulting query execution plans generated by the query
optimizer may be suboptimal due to estimation errors on statistics such as
table cardinalities. Progressive optimization (POP) is a methodology for
detecting estimation errors through monitoring actual statistics at run-time
and recovering from a suboptimal plan by triggering re-optimization with
feedback knowledge. However, the original POP solution is based on a serial
database architecture and its core ideas cannot be readily applied to a
shared-nothing parallel environment. Extending Serial POP to a parallel
architecture is a challenging problem since the database management system
(DBMS) needs to determine when and how to trigger re-optimization based on
statistics collected from multiple independent processing nodes at the query
plan level as well as during run-time. Furthermore, reusing distributed
intermediate results from partial executions in a parallel environment
requires a different mechanism. This essay presents a comprehensive and
practical solution to the problem. Specifically, a system consists of a
novel voting mechanism and several voting schemes for reaching a global
consensus on re-optimization, a mechanism to reuse distributed intermediate
results across multiple nodes as a partitioned materialized view, several
variants of parallel plan checkpoint operators, and parallel checkpoint
processing methods using efficient synchronization protocols are proposed.
This solution has been prototyped in a leading commercial parallel DBMS.
Extensive experiments have also been performed using the industry-standard
decision support TPC-H benchmark as well as with a real-world large
database. Experimental results show that the Parallel POP solution has
negligible run-time overhead and is capable of accelerating the performance
of complex online analytical processing (OLAP) queries by up to a factor of
22.
|
This page is maintained by
Ken Salem.