Winter 2007 Events Schedule | Database Research Group | UW

[Please remove <h1>]

Winter 2007

Note: Events of interest to the Database Research Group are posted to the uw.cs.database newsgroup and are mailed to the db-group@lists.uwaterloo.ca mailing list. There are actually three mailing lists aggregated into the db-group list: db-faculty (for DB group faculty), db-grads (for DB group graduate students), and db-friends (for DB group alumni, visitors, and friends). If you wish to subscribe to one of these three lists (or to unsubscribe), please visit https://lists.uwaterloo.ca/mailman/listinfo/<listname>, where <listname> is the list you wish to subscribe to.

DB group meetings: The DB group meets most Friday afternoons at 2pm, usually in DC1331. See the list of current events for times and locations of upcoming meetings. Each meeting lasts for an hour and features an informal presentation by one of the members of the group. Everyone is welcome to attend. These talks are intended to raise questions and to stimulate discussion rather than being polished presentations of research results. Speakers are determined using a rotating speaker list, which can be found on the DB group meeting page
DB seminar series: The DB seminar series features visiting speakers. These seminars are more-or-less monthly, and are usually scheduled on Monday mornings at 11am. See the list of current events for times and locations of upcoming seminars. The full schedule can be found on the DB seminar series page.

Recent and Upcoming Events

DB Meeting:	Friday January 5, 2:00pm, DC 1331
Speaker:	David DeHaan
Title:	Top-Down Query Optimization Revisited
Abstract:	Cost-based optimization of select-project-join queries is an old but foundational problem for database management systems. The textbook algorithm for finding the optimal plan is the bottom-up dynamic programming search of System-R. The only other technique with significant exposure in the literature is the top-down transformational search of Volcano/Cascades. In this talk I will compare and contrast these two established algorithms with a recent bottom-up dynamic programming algorithm proposed by Moerkotte and Neumann. I will then propose a new top-down algorithm that achieves the efficiency of Moerkotte and Neumann's enumeration while enabling desirable enhancements of top-down search such as branch-and-bound pruning and demand-driven interesting orders. One novel feature of this new algorithm is that it allows for a flexible trade-off between storage cost and CPU time, which makes it highly adaptable for a variety of execution environments.

DB Meeting:	Friday January 12, 2:00pm, DC 1331
Speaker:	Gord Cormack
Title:	Statistical Precision of Information Retrieval Evaluation
Abstract:	Precision of measurement is the degree to which a measured value is free of random error. Information Retrieval Evaluation concerns measuring the effectiveness of an information retrieval system on some sample topics and data. To what extent does such an evaluation measure the "true" effectiveness of the system, and to what is the measured value due to chance? I investigate the use of bootstrapping techniques to estimate confidence intervals for these measures. I call into question the validity of statistical significance tests as they are commonly used in IR evaluation.

DB Seminar:	Tuesday January 16, 9:30am, DC 1331 (New Room, New Time!)
Speaker:	Johannes Gehrke, Cornell University
Title:	Complex Event Processing with Cayuga

DB Meeting:	Friday January 19, 2:00pm, DC 1331
Speaker:	Anil Goel, Sybase iAnywhere
Title:	Uncertainty Aware Query Optimization
Abstract:	I will discuss the problem of making query optimization deal with uncertainty in the input parameters. In particular, I will talk about techniques proposed in two papers at SIGMOD 2005 on this topic.

DB Meeting:	Friday January 26, 2:00pm, DC 1331
Speaker:	Gulay Unel
Title:	Safraless Decision Procedures using Complex-value Datalog
Abstract:	We propose the use of techniques developed for query evaluation of Complex-value Datalog queries for determining satisfiability of mu-calculus formulas extending our current method for weak second order logic formulas. Our method is based on the translation of mu-calculus formulas to alternating parity automata and using Safraless decision procedures for checking the emptiness of the automata which was proposed by Kupferman and Vardi. We show that the use of database evaluation techniques, in particular the top-down resolution-based evaluation with memoing, can considerably improve the performance of decision procedures based on the connection between logics and automata.

DB Meeting:	Friday February 2, 2:00pm, DC 1331
Speaker:	Oguzhan Ozmen
Title:	Storage Workload Estimation for Database Management Systems
Abstract:	Modern storage systems are sophisticated. Simple directly-attached storage devices are giving way to storage systems that are shared, flexible, virtualized and network-attached. Today, storage systems have their own administrators, who use specialized tools and expertise to configure and manage storage resources. Although the separation of storage management and database management (i.e., storage virtualization) has many advantages, it also introduces problems. Database physical design and storage configuration are closely related tasks, and the separation makes it more difficult to achieve a good end-to-end design. In this talk, I will present own approach to close this gap by addressing the problem of predicting the storage workload that will be generated by a database management system. Specifically, I will show the methodology to translate a database workload description, together with a database physical design, into a characterization of the storage workload that will result. Such a characterization can be used by a storage administrator to guide storage configuration. Lastly, I will also present an empirical assessment of the cost of workload prediction as well as the accuracy of the result.

DB Seminar:	Monday February 5, 10:30am, DC 1304
Speaker:	Donald Kossmann, ETH Zürich
Title:	Predicate-based Indexing of Annotated Data

DB Meeting:	Friday February 9, 2:00pm, DC 1331
Speaker:	Huaxin Zhang
Title:	Answering Queries using Materialized XML Views
Abstract:	Materialized views can be used to expediate query processing, and may provide orders of magnitude performance enhancements. However, unlike relational views, materialized XML views are harder to exploit since the query language is more difficult to handle than the relational counterpart. In this talk I will first present (the result of) some theoretical work on using XPath views to answer XPath queries. Then I will review a DB2 implementation based on these results.

DB Meeting:	Friday February 16, 2:00pm, DC 1331
Speaker:	Charles Clarke
Title:	The Influence of Caption Features on Clickthrough Patterns in Web Search
Abstract:	Web search engines present lists of "captions", comprising title, snippet, and URL, to help users decide which search results to visit. Understanding the influence of features of these captions on Web search behavior may help validate algorithms and guidelines for their improved generation. In this paper we develop a methodology to use clickthrough logs from a commercial search engine to study user behavior when interacting with search result captions. The findings of our study suggest that relatively simple caption features such as the presence of all terms query terms, the readability of the snippet, and the length of the URL shown in the caption, can significantly influence users' Web search behavior.

DB Seminar:	Tuesday February 20, 2:00pm, MC5136
Speaker:	Gustavo Alonso, ETH Zürich
Title:	SwissQm: A Virtual Machine for Sensor Networks

DB Meeting:	Friday March 9, 2:00pm, DC 1331
Speaker:	Xuhui Li
Title:	Getting Priorities Straight: Improving Linux Support for Database I/O
Abstract:	Asynchronous I/O has been widely used by database systems. As a result, a large number of I/O requests can be queued inside the OS. This helps the OS to do I/O scheduling. But it also increases I/O latency. In this talk I will introduce some research work that focus on the tradeoff of I/O throughput and latency and show how database systems can benefit from the results of this work. My talk will focus on the paper of the same title by Christoffer Hall and Philippe Bonnet in VLDB 2005 (1116-1127).

DB Seminar:	Monday March 12, 9:00am, DC 1304 (New Time!)
Speaker:	Jignesh Patel, University of Michigan
Title:	Towards Declarative and Efficient Querying on Biological Datasets

DB Meeting:	Friday March 16, 2:00pm, DC 1331
Speaker:	Yasemin Ugur-Ozekinci
Title:	Object-based Storage Systems
Abstract:	Storage devices have been continually evolving in terms of processing power, density and performance. The existing block-based interfaces (e.g. SCSI) to storage devices, however, have become a limitation in this evolution. In traditional storage architectures, there is a trade-off between data sharing, security, and performance. Object-based storage is designed to address these limitations. This technology enables moving the storage management component from a file system to a storage device. In this talk, I will present an extensive overview of object-based storage, including its main features, such as capability-based security, and challenges in the adoption of object-based storage technology. I will also discuss some of the object-based storage system prototypes, and file systems employing object-based storage. I will conclude with a comparison of object-based storage systems with the block-based counterpart.

DB Meeting:	Friday March 23, 2:00pm, DC 1331
Speaker:	David Toman
Title:	Description Logics for Temporal Conceptual Modeling
Abstract:	The talk introduces Description Logics and shows how they can be used for conceptual data modeling, in particular for reasoning about ER or UML diagrams. Then we talk about various temporal extensions of such conceptual data models, their impact on the underlying DL and on the hardness of the associated reasoning problems. (The last part of the talk is based on an IJCAI'07 paper.)

DB Meeting:	Friday March 30, 2:00pm, DC 1304 (Room Change)
Speaker:	Mohamed Soliman
Title:	Top-k Query Processing in Uncertain Databases
Abstract:	Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between query scores and data uncertainty makes traditional techniques inapplicable, and motivates the need for new query formulations and processing techniques. In this talk, I will present our newly proposed probabilistic formulations for top-k queries based on "marriage" of traditional top-k semantics and possible worlds semantics. In the light of these formulations, I will describe a processing framework that encapsulates a state space model and efficient query processing algorithms to tackle the challenges of uncertain data settings.

DB Meeting:	Friday April 13, 2:00pm, DC 1331
Speaker:	Amr El-Helw
Title:	Collecting and Maintaining Just-in-Time Statistics
Abstract:	Traditional DBMSs decouple statistics collection and query optimization both in space and time. Decoupling in time may lead to outdated statistics. Decoupling in space may cause statistics not to be available at the desired granularity needed to optimize a particular query, or some important statistics may not be available at all. Overall, this decoupling often leads to large cardinality estimation errors and, in consequence, to the selection of suboptimal plans for query execution. In this talk, I will present JITS, a system for proactively collecting query-specific statistics during query compilation. The system employs a lightweight sensitivity analysis to choose which statistics to collect by making use of previously collected statistics and database activity patterns. The collected statistics are materialized and incrementally updated for later reuse.

DB Meeting:	Friday April 27, 2:00pm, DC 1331
Speaker:	Jack Ng
Title:	Extending Progressive Optimization to a Shared-Nothing Parallel Database Architecture
Abstract:	Commercial enterprise data warehouses are typically implemented on large shared-nothing parallel databases due to the inherent scalability and performance limitations imposed by a serial architecture. Queries used in these large databases often contain complex predicates as well as multiple joins, and the resulting query execution plans generated by the query optimizer may be suboptimal due to estimation errors on statistics such as table cardinalities. Progressive optimization (POP) is a methodology for detecting estimation errors through monitoring actual statistics at run-time and recovering from a suboptimal plan by triggering re-optimization with feedback knowledge. However, the original POP solution is based on a serial database architecture and its core ideas cannot be readily applied to a shared-nothing parallel environment. Extending Serial POP to a parallel architecture is a challenging problem since the database management system (DBMS) needs to determine when and how to trigger re-optimization based on statistics collected from multiple independent processing nodes at the query plan level as well as during run-time. Furthermore, reusing distributed intermediate results from partial executions in a parallel environment requires a different mechanism. This essay presents a comprehensive and practical solution to the problem. Specifically, a system consists of a novel voting mechanism and several voting schemes for reaching a global consensus on re-optimization, a mechanism to reuse distributed intermediate results across multiple nodes as a partitioned materialized view, several variants of parallel plan checkpoint operators, and parallel checkpoint processing methods using efficient synchronization protocols are proposed. This solution has been prototyped in a leading commercial parallel DBMS. Extensive experiments have also been performed using the industry-standard decision support TPC-H benchmark as well as with a real-world large database. Experimental results show that the Parallel POP solution has negligible run-time overhead and is capable of accelerating the performance of complex online analytical processing (OLAP) queries by up to a factor of 22.

DB Seminar:	Monday April 30, 10:30am, DC 1304
Speaker:	Surajit Chaudhuri, Microsoft Research
Title:	Self-Managing DBMS Technology: The AutoAdmin Experience

This page is maintained by Ken Salem.