Database Research Group Events

Spring 2002


DB meeting: Friday, May 3rd, 2:00 pm, DC1331
Speaker: M. Tamer Özsu
Topic: I'll talk about the following paper: "Models and Issues in Data Stream Systems" by B. Babcock, S. Babu, M. Datar, R. Motwani and J. Widom, PODS'02.
Snacks: Sunny Lam

DB meeting: Friday, May 10th, 2:00 pm, DC1331
Speaker: Leon Cao
Topic: The Evaluation of Strong Web Caching Consistency Algorithms
Snacks: M. Tamer Özsu
Abstract: As the World Wide Web continues to grow at an exponential rate, Web caching has become an active research area, in the hope that it can reduce not only client-observed latency but also network traffic and server load. Traditional wisdom holds that strong cache consistency is too expensive for the Web because enforcing it requires substantial extra resources. However, as business transactions on the Web become more popular, strong consistency will become widely accepted and required by popular online applications. This thesis evaluates the performance of different categories of cache consistency algorithms using TPC-W, the Web commerce benchmark. To determine the optimal cache deployment location, we also conduct a number of experiments using the benchmark. Our experiments show that strong cache consistency can still be enforced without much overhead, and that Invalidation, an event-driven strong cache consistency algorithm, is the most suitable for online e-business. In terms of system throughput, a proxy-side cache has a 30-35% performance advantage over a client-side cache.

DB meeting: Friday, May 17th, 2:00 pm, DC1331
Speaker: Peter Bumbulis
Topic: A Compact B-tree
Snacks: Leon Cao
Abstract: In this paper we describe a Patricia tree-based B-tree variant suitable for OLTP. In this variant, each page of the B-tree contains a local Patricia tree instead of the usual sorted array of keys. It has been implemented in iAnywhere ASA Version 8.0. Preliminary experience has shown that these indexes can provide significant space and performance benefits over existing ASA indexes.
The paper is available online at http://www.acm.org/sigs/sigmod/sigmod02/eproceedings/papers/Industrial-Bumbulis-et-al.pdf.

DB meeting: Friday, May 24th, 2:00 pm, DC1331
Speaker: Matthew Young-Lai
Topic: I'll talk about a technique used to limit the effort spent on join enumeration in ASA.
Snacks: Peter Bumbulis


Note: No DB meetings on Friday, May 31st or Friday, June 7th.

DB Seminar: Monday, June 10th, 11:00 AM, DC1304
Speaker: Ouri Wolfson, University of Illinois at Chicago
Topic: Location Management in Moving Objects Databases

DB meeting: Friday, June 21st, 2:00 pm, DC1331
Speaker: Lubomir Stanchev
Topic: I intend to give an overview of the paper "Exploiting Statistics on Query Expressions for Optimization" from SIGMOD 2002 by Nicolas Bruno and Surajit Chaudhuri. The abstract of the paper is below.
Snacks: Matthew Young-Lai
Abstract: Statistics play an important role in influencing the plans produced by a query optimizer. Traditionally, optimizers use statistics built over base tables and assume independence between attributes while propagating statistical information through the query plan. This approach can introduce large estimation errors, which may result in the optimizer choosing inefficient execution plans. In this paper, we show how to extend a generic optimizer so that it also exploits statistics built on expressions corresponding to intermediate nodes of query plans. We show that in some cases, the quality of the resulting plans is significantly better than when only base table statistics are available. Unfortunately, even moderately-sized schemas may have too many relevant candidate statistics. We introduce a workload-driven technique to identify a small subset of statistics that can provide significant benefits over just maintaining base-table statistics. Finally, we present experimental results on an implementation of our approach in Microsoft SQL Server 2000.


DB meeting: Friday, June 28th, 2:00 pm, DC1331
Speaker: Reem Al-Halimi
Topic: Indexing by Topic Relevance
Snacks: Reem Al-Halimi
Abstract: Good index words in information retrieval have traditionally been words that successfully distinguish between different texts. But a good distinguisher does not have to be relevant to the content of the text. Rather, it can be any sequence of characters that occurs often in one type of document and rarely in the others. Such words may be effective for some tasks, such as document retrieval, but they lack the content-relevance information needed for other tasks, such as document visualization. In this talk I will present a technique that measures the relevance of a word to a topic through the word's pattern of occurrence in the topic's documents. I will also show that these words, called "topic words", correspond more closely to manually selected keywords than words chosen using traditional indexing techniques, indicating that topic words are better identifiers of the topical content of documents.

DB meeting: Friday, July 5th, 2:00 pm, DC1331
Speaker: Ken Salem
Topic: I will talk about some issues in access control for XML documents. I will probably draw material from two papers that are to appear at VLDB'02, namely "Compressed accessibility map: Efficient access control for XML" by Ting Yu, Divesh Srivastava, Laks V.S. Lakshmanan and H. V. Jagadish, and "Optimizing the secure evaluation of twig queries" by SungRan Cho, Sihem Amer-Yahia, Laks V.S. Lakshmanan and Divesh Srivastava.
Snacks: Lubomir Stanchev

DB Seminar: Monday, July 8th, 11:00 AM, DC1304
Speaker: Derick Wood, Hong Kong University of Science and Technology
Topic: Caterpillars, T-Graphs and Context
Abstract: I will present two different methods we developed for specifying context in documents. The first, caterpillar expressions, leads to very nice theoretical questions and problems; the second, T-graphs, supports a 70% solution that is suitable for most of the contexts we need (it was used in Designer). I will compare the two methods and summarize their positive and negative aspects.

MMath Thesis Presentation: Wednesday, July 10th, 10:00 AM, DC1331
Speaker: Maryam Aamir Khan
Topic: A Block Selectivity Model for Partitioned Relations

DB meeting: Friday, July 12th, 2:00 pm, DC1331
Speaker: Huizhu Liu
Topic: Query Optimization with Chase and Backchase (C&B)
Snacks: Ken Salem
Abstract: In Friday's talk, I would like to give an overview of a project, Query Optimization with Chase and Backchase (C&B), developed at the University of Pennsylvania. The project introduced a new and interesting technique for query rewriting based on two basic rules: chase and backchase. A query is chased with constraints to produce a larger, but equivalent, query that incorporates all the alternative ways of answering the original query (views, indexes, other relations or OO classes). This larger query can then be minimized using the backchase rule to produce a complete set of minimal, equivalent rewritings.

In particular, I will talk about two papers: "Physical Data Independence, Constraints and Optimization with Universal Plans" from VLDB'99 by Alin Deutsch, Lucian Popa and Val Tannen and "A Chase Too Far?" from SIGMOD'00 by Lucian Popa, Alin Deutsch, Arnaud Sahuguet and Val Tannen.



This page is maintained by Frank Tompa and Ken Salem.