Database Research Group Events

Winter 2000

Time: Friday, January 14, 2:00 pm, DC1302
Note room change!
Speaker: Ken Salem
Topic: How to Roll a Join
(Asynchronous Incremental View Maintenance)

I will discuss the general problem of maintaining a materialized join view as the underlying tables change. I'll also describe a specific view maintenance algorithm that I worked on last year while I was visiting IBM Almaden.

Snacks: Ian Davis

Time: Friday, January 21, 2:00 pm, DC1304
Note room change!
Speaker: David Toman
Topic: Query Processing in Embedded Control Programs

An embedded control program can be viewed as a small main-memory database system tailored to suit the needs of a particular application. For performance reasons, the program will usually define low-level data structures to store the database which in turn have to be understood by application developers and maintainers. This is in contrast with the data independence that can be achieved by using a relational database system. However, because of space and performance requirements, the use of a traditional DBMS is not likely to be feasible in this setting.

To partly address this, we have developed a query optimizer capable of compiling SQL/OQL queries on a conceptual schema to native Java or C code that navigates low-level data structures. Of particular significance is that an arbitrary collection of such structures, perhaps already devised for an earlier version of the control program, can be given as a part of the input. The underlying algorithms used in our optimizer to accomplish this are the main focus of this paper. The algorithms are based on a novel resource bounded plan generation mechanism in which integrity constraints abstracting the definition of stored views are applied to source queries to extend the search space of possible query plans. We also report on some preliminary experiment results that suggest generated code navigates the data structures with an efficiency comparable to code written directly by expert programmers.

Snacks: Ken Salem

Time: Friday, February 4, 2:00 pm, DC1331
Speaker: Glenn Paulley
Topic: Aspects of query optimization in Sybase SQL Anywhere
I'll briefly discuss three things:
  1. an overview of SQL Anywhere Studio and its technologies
  2. characteristics of our customer base, and their applications, that make query optimization challenging
  3. an overview of the join enumeration algorithm used in SQL Anywhere.
All three topics are discussed in a paper that I will present at Data Engineering in March:

Bowman, Ivan T. and G. N. Paulley (March 2000): "Join enumeration in a memory-constrained environment." In Proceedings, 16th IEEE Data Engineering Conference, San Diego, California. To appear.

Snacks: David Toman

Time: Friday, February 11, 2:00 pm, DC1331
Speaker: Charlie Clarke
Topic: The TREC-8 and TREC-9 Questions Answering tracks.
Snacks: Glenn Paulley

Time: Friday, February 18, 2:00 pm, DC1304
Note room change!
Speaker: Jack Snoeyink, University of North Carolina
Topic: Polygon clipping, boolean operations, and overlay in double precision

Polygon clipping or polygon union are the first problems encountered in a graphics class that have non-trivial geometric algorithms. Thus, it is annoying that after presenting algorithms like plane sweep one has to admit that floating point errors will mean that the output will not be correct in all cases. Boissonnat and Preparata study the precision needed to compute intersections of line segments, which is a first step in polygon operations. I'll briefly survey this work, including two important points: 1. Sweep algorithms, or any other algorithm that can compute a trapezoidation of the input segments, require five times the input precision. 2. Algorithms that compute the arrangement of line segments require four times the input precision. Computing an arrangement is essential for correct polygon operations. Clipping or unioning two polygons, or overlaying two planar maps is a special case. We will see that in cases where the segments are colored red and blue so that there are no red/red or blue/blue segments crossing then the arrangement can be computed using only twice the input precision. The running time is optimal, and the programming time is reasonable, even including the handling of degenerate cases. This is joint work with Sergei Bespamyatnikh and Andrea Mantler.

Time: Friday, February 25, 2:00 pm, DC1304
Speaker: Mariano Consens
Topic: Processing Signed XML

The goal of this talk is to motivate research problems arising from the processing of digitally signed XML documents. We are likely to start seeing rapid and widespread adoption of this technology for Internet applications. I will provide a brief description of the standards activity in this area, present a scenario based in the application of signed XML to process payments, and describe the characteristics and challenges in the area.

Snacks: Charlie Clarke

Time: Friday, March 3, 2:00 pm, DC1331
Speaker: Ani Nica
Topic: Query optimization techniques for queries with expensive predicates

This talk will describe issues related to query optimization for queries with expensive predicates and user defined predicates. For such queries, the traditional method of evaluating predicates as early as possible may result in poor performance. I will present the optimization algorithm proposed in [1] for optimizing such queries and compare this approach with other known query optimization techniques for this class of queries (e.g., [2]).

  1. Surajit Chaudhuri and Kyuseok Shim, "Optimization of queries with user-defined predicates," ACM Trans. Database Syst. 24, 2 (Jun.1999), pp. 177-228.
  2. J.M. Hellerstein and M. Stonebraker, "Predicate migration: optimizing queries with expensive predicates" SIGMOD '93, Washington, DC (26-28 May 1993), SIGMOD Record 22, 2 (June 1993), pp. 267-77.
Snacks: Mariano Consens

Time: Tuesday, March 7, 4:00 pm, DC1304
Note day, room and time change!
Speaker: Johann Christoph Freytag, Humboldt-Universität zu Berlin
Topic: Quality-driven Integration of Heterogeneous Information Sources

Integrated access to information that is spread over multiple, distributed, and heterogeneous sources is an important problem in many scientific and commercial domains. While much work has been done on query processing and choosing plans under cost criteria, very little is known about the important problem of incorporating the information quality aspect into query planning.

In this talk we describe an initial framework for multi-database query processing that fully includes the quality of information in many facets, such as completeness, timeliness, accuracy, etc. We seamlessly include information quality into a multi-database query processor based on a view-rewriting mechanism. We model information quality at different levels to ultimately find a set of high-quality query-answering plans.

This work was done in cooperation with Felix Naumann (Humboldt-Universität zu Berlin) and Ulf Leser (Technische Universität Berlin).

Time: Friday, March 10, 2:00 pm, DC1331
Speaker: none
Topic: "open mike"

Time: Friday, March 17, 2:00 pm, DC1331
Speaker: Nick Cercone
Topic: Objects, Concepts, Attributes, Values, States, Actions, Activities, Cases, Causes, Intentions,... or What makes a mountain a mountain?
Snacks: Ani Nica

Time: Friday, March 24, 2:00 pm, DC1304
Note room change!
Speaker: Ivan T. Bowman
Topic: Implementing Aggregation and Duplicate Deletion for Data with a Skewed Distribution

This talk will discuss algorithms for aggregation and duplicate deletion. I will survey existing algorithms [1,2,3], and discuss the early aggregation approach [3] which reduces execution costs for data with a skewed distribution. No single algorithm is best for all situations, so I will compare the algorithms according to several quality metrics, including: performance, memory consumption, external storage, number of temporary files, opportunities for parallelism, and difficulty of implementation.

Note: Larson previously presented results on early aggregation [2] at the University of Waterloo database group seminar.

  1. Epstein, R.. Techniques for Processing of Aggregates in Relational Database Systems. UCB/ERL Memo M79/8, Univ. California, Berkeley, California, 1979.
  2. Goetz Graefe. Query Evaluation Techniques for Large Databases. ACM Computing Surveys, 25:2 (1993), pp 73-169.
  3. Per-Åke Larson. Grouping and Duplicate Elimination: Benefits of Early Aggregation. Microsoft Corporation, 1997. Home Page
Snacks: Nick Cercone

Time: Friday, March 31, 2:00 pm, DC1331
Speaker: Xuerong Tang
Topic: Forest-regular languages and Forest-regular grammars

Currently structured documents (e.g. XML or SGML data) are defined by context-free grammars, which are captured as DTDs (Document Type Definition). Recently forest-regular languages were proposed by several researchers as a better means for formalizing structured text. This talk serves as an introduction to this language class and its potential application in the area of document processing and transformation.

Snacks: Ivan Bowman

Time: Friday, April 7, 2:00 pm, DC1331
Speaker: Peter Bumbulis
Topic: An industrial strength recovery mechanism based on before-image shadow paging

This talk will discuss a database recovery mechanism of which there seems to be little mention in the academic literature, other than a few oblique references such as in [1]. This recovery mechanism has been implemented (and documented) in at least 2 commercially available database systems. This mechanism places no constraints on buffer fetch and eviction (steal) and supports fast commits (no-force), fine-grained locking, and partial rollbacks. It is also relatively simple to implement.

  1. C. Mohan: "Commit_LSN: A Novel and Simple Method for Reducing Locking and Latching in Transaction Processing Systems." VLDB 1990: 406-418.
Snacks: Xuerong Tang

Time: Friday, April 14, 2:00 pm, DC1331
Speaker: Larry Saxton
Topic: Complexity of database query languages

The standard database query languages (SQL, relational algebra, relational calculus) are known to be implementable in a time that is small polynomial in the size of the database. This implies that many useful queries (such as transitive closure) cannot be expressed in a database query language. In order to increase the computational power, various language constructs (looping, recursion, fixed points) can be considered or different data types allowed (such as lists, arrays, sets). Using these possibilities, various classes of computable queries will be exhibited.

Snacks: Peter Bumbulis

Time: Friday, April 21,
No meeting

Time: Friday, April 28, 2:00 pm, DC1331
Speaker: Anthony Cox
Topic: Oh! No! Not another query language!

The area of information retrieval (IR) primarily focusses on the use of keywords to search very large databases, usually containing text or semi-structured data (i.e. XML). In this application it is crucial that query facilities be efficient in order to support the scale of database needed. However, with the growth of memory and disk space on modern desktop computers, users can no longer manage all their data in an adhoc manner, thus creating a need for the adaption of IR techniques for smaller scale databases. Source code management, personal email archives, and local WWW sites are examples of smaller datasets.

When doing this small scale IR, tasks are typically more complex than a simple keyword search. For example, in source code, tasks can include the generation of (function) call-graphs and program slices. These tasks need more powerful query languages which can tradeoff efficiency to gain expressiveness. Hermes, a prototype query language, is my attempt at a more powerful query language. By combining GCL, a query language developed for large scale IR, with a functional programming language, Hermes permits queries to be formulated that are more expressive and flexible than other query languages. This presentation examines Hermes and its root languages, GCL and Scheme. Also, the implementation of the Okapi ranking algorithm in Hermes will be presented as an example of Hermes' use as a prototyping and development system for IR algorithms.

Snacks: Larry Saxton

This page is maintained by Frank Tompa.