Spring 2002
DB meeting: |
Friday, May 3rd, 2:00 pm, DC1331 |
Speaker: |
M. Tamer Özsu |
Topic: |
I'll talk about the following paper: "Models and Issues in Data Stream
Systems" by B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom (PODS'02). |
Snacks: |
Sunny Lam |
DB meeting: |
Friday, May 10th, 2:00 pm, DC1331 |
Speaker: |
Leon Cao |
Topic: |
The Evaluation of Strong Web Caching Consistency Algorithms |
Snacks: |
M. Tamer Özsu |
Abstract: |
As the World Wide Web continues to grow at an exponential rate,
Web caching has become a hot research area, in the hope that it can
reduce not only client-observed latency but also network traffic
and server load. Conventional wisdom holds that strong cache consistency
is too expensive for the Web because enforcing it requires substantial
extra resources. However, as business transactions on the Web become more
popular, strong consistency will increasingly be required by popular
online applications. This thesis evaluates the performance of different
categories of cache consistency algorithms using TPC-W, the Web commerce
benchmark. To determine the best cache deployment location,
we also conduct a number of experiments using the benchmark. Our experiments
show that strong cache consistency can be enforced without much
overhead, and that Invalidation, an event-driven strong cache consistency
algorithm, is the most suitable for online e-business. A proxy-side cache has
a 30-35% advantage in system throughput over a client-side cache. |
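As a rough illustration of the event-driven invalidation idea mentioned in the abstract, here is a minimal sketch. It is not taken from the thesis: the class names (OriginServer, Cache), the in-memory bookkeeping, and the single-process setup are all assumptions made for this example. The server remembers which caches hold a copy of each item and notifies them when the item changes, so a cache never serves a stale value.

```python
# Illustrative sketch only: all names here are hypothetical, not from the thesis.

class OriginServer:
    """Holds the authoritative data and tracks which caches hold copies."""

    def __init__(self):
        self.data = {}
        self.holders = {}  # key -> set of caches that currently hold a copy

    def read(self, key, cache):
        # Remember that this cache now holds a copy of the item.
        self.holders.setdefault(key, set()).add(cache)
        return self.data[key]

    def write(self, key, value):
        self.data[key] = value
        # Event-driven step: notify every cache holding the old copy.
        for cache in self.holders.pop(key, set()):
            cache.invalidate(key)


class Cache:
    """A proxy-side or client-side cache that relies on server invalidations."""

    def __init__(self, server):
        self.server = server
        self.store = {}
        self.misses = 0

    def get(self, key):
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.server.read(key, self)
        return self.store[key]

    def invalidate(self, key):
        self.store.pop(key, None)


server = OriginServer()
server.write("price", 10)
cache = Cache(server)

first = cache.get("price")         # miss: fetched from the server
second = cache.get("price")        # hit: served from the cache
misses_before_update = cache.misses

server.write("price", 12)          # triggers an invalidation
third = cache.get("price")         # miss again, returns the fresh value
```

In this sketch the cost of strong consistency is the bookkeeping in holders plus one invalidation message per cached copy on each write; reads that hit the cache need no contact with the server.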
DB meeting: |
Friday, May 17th, 2:00 pm, DC1331 |
Speaker: |
Peter Bumbulis |
Topic: |
A Compact B-tree |
Snacks: |
Leon Cao |
Abstract: |
In this paper we describe a Patricia tree-based B-tree variant
suitable for OLTP. In this variant, each page of the B-tree contains a
local Patricia tree instead of the usual sorted array of keys. It has been
implemented in iAnywhere ASA Version 8.0. Preliminary experience has shown
that these indexes can provide significant space and performance benefits
over existing ASA indexes.
The paper is available online at http://www.acm.org/sigs/sigmod/sigmod02/eproceedings/papers/Industrial-Bumbulis-et-al.pdf. |
DB meeting: |
Friday, May 24th, 2:00 pm, DC1331 |
Speaker: |
Matthew Young-Lai |
Topic: |
I'll talk about a technique used to limit the effort spent on join
enumeration in ASA. |
Snacks: |
Peter Bumbulis |
Note: No DB meetings on Friday, May 31st and Friday, June 7th.
DB meeting: |
Friday, June 21st, 2:00 pm, DC1331 |
Speaker: |
Lubomir Stanchev |
Topic: |
I intend to give an overview of the paper "Exploiting Statistics on Query
Expressions for Optimization" from SIGMOD 2002 by Nicolas Bruno and Surajit
Chaudhuri. The abstract of the paper is below. |
Snacks: |
Matthew Young-Lai |
Abstract: |
Statistics play an important role in influencing the plans produced
by a query optimizer. Traditionally, optimizers use statistics built over
base tables and assume independence between attributes while propagating
statistical information through the query plan. This approach can introduce
large estimation errors, which may result in the optimizer choosing inefficient
execution plans. In this paper, we show how to extend a generic optimizer
so that it also exploits statistics built on expressions corresponding
to intermediate nodes of query plans. We show that in some cases, the quality
of the resulting plans is significantly better than when only base table
statistics are available. Unfortunately, even moderately-sized schemas
may have too many relevant candidate statistics. We introduce a workload-driven
technique to identify a small subset of statistics that can provide significant
benefits over just maintaining base-table statistics. Finally, we present
experimental results on an implementation of our approach in Microsoft
SQL Server 2000. |
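To make the independence assumption concrete, here is a small worked example. The table contents and the helper name selectivity are invented for illustration and do not come from the paper. With two perfectly correlated attributes, multiplying the per-attribute selectivities underestimates the result size by a factor of two, whereas a statistic built on the combined predicate gives the exact count.

```python
# Hypothetical toy table: (make, model) pairs where the attributes are
# perfectly correlated. The data is made up purely for illustration.
rows = [("Toyota", "Camry")] * 50 + [("Honda", "Civic")] * 50


def selectivity(predicate):
    """Fraction of rows in the toy table that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


# Base-table statistics: one selectivity per attribute.
sel_make = selectivity(lambda r: r[0] == "Toyota")
sel_model = selectivity(lambda r: r[1] == "Camry")

# Estimate under the independence assumption: multiply the selectivities.
independent_estimate = sel_make * sel_model * len(rows)

# A statistic on the combined expression reflects the true result size.
actual = selectivity(lambda r: r[0] == "Toyota" and r[1] == "Camry") * len(rows)

print(independent_estimate, actual)
```

Here the independence-based estimate is 25 rows while the actual result has 50, which is the kind of error that can compound as estimates propagate up a query plan.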
DB meeting: |
Friday, June 28th, 2:00 pm, DC1331 |
Speaker: |
Reem Al-Halimi |
Topic: |
Indexing by Topic Relevance |
Snacks: |
Reem Al-Halimi |
Abstract: |
Good index words in information retrieval have traditionally
been words that successfully distinguish between different texts. But a
good distinguisher does not have to be relevant to the content of the text.
Rather, it can be any sequence of characters that occurs often in one type
of document and rarely in the other types. Such words may be effective
for some tasks such as document retrieval but they lack essential content
relevance information that is needed for other tasks such as document visualization.
In this talk I will present a technique that measures the relevance of
a word to a topic through the word's pattern of occurrence in the topic's
documents. I will also show that these words, called "topic words", correspond
more closely to manually selected keywords than words chosen using traditional
indexing techniques, thus indicating that topic words are better identifiers
of the topical content of documents. |
DB Seminar: |
Monday, July 8th, 11:00 AM, DC1304 |
Speaker: |
Derick Wood, Hong Kong University of Science and Technology |
Topic: |
Caterpillars, T-Graphs and Context |
Abstract: |
I will present two different ways that we developed for specifying
context in documents. The first, caterpillar expressions, leads to very
nice theoretical questions/problems; the second, T-graphs, supports a 70%
solution that is suitable for most contexts that we need (it was used in
Designer). I will compare the two methods and summarize their positive
and negative aspects. |
MMath Thesis Presentation: |
Wednesday, July 10th, 10:00 AM, DC1331 |
Speaker: |
Maryam Aamir Khan |
Topic: |
A Block Selectivity Model for Partitioned Relations |
DB meeting: |
Friday, July 12th, 2:00 pm, DC1331 |
Speaker: |
Huizhu Liu |
Topic: |
Query Optimization with Chase and Backchase (C&B) |
Snacks: |
Ken Salem |
Abstract: |
In Friday's talk, I would like to give an overview of a project,
Query Optimization with Chase and Backchase (C&B), developed at the University
of Pennsylvania. The project introduces a new and interesting technique for query
rewriting. This technique uses two basic rules: chase and
backchase. A query is chased with constraints in order to produce a larger,
but equivalent, query that incorporates all the alternate ways of answering
the original query (views, indexes, other relations or OO classes). This
larger query can then be minimized, by using the backchase rule, to produce
a complete set of minimal and equivalent rewritings.
In particular, I will talk about two papers: "Physical Data Independence,
Constraints and Optimization with Universal Plans" from VLDB'99 by Alin
Deutsch, Lucian Popa and Val Tannen, and "A Chase Too Far?" from SIGMOD'00
by Lucian Popa, Alin Deutsch, Arnaud Sahuguet and Val Tannen. |
This page is maintained by Frank Tompa and Ken Salem.