Note room change!
How to Roll a Join
(Asynchronous Incremental View Maintenance)
I will discuss the general problem of maintaining a materialized join view as the underlying tables change. I'll also describe a specific view maintenance algorithm that I worked on last year while I was visiting IBM Almaden.
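At the core of most join view maintenance algorithms is the standard delta rule ΔV = (ΔR ⋈ S) ∪ (R ⋈ ΔS) ∪ (ΔR ⋈ ΔS). The sketch below is illustrative only (insert-only deltas, made-up relations; it is not the Almaden algorithm from the talk):

```python
def join(r, s):
    """Natural join on the shared key (first field of each tuple)."""
    return [(k, a, b) for (k, a) in r for (k2, b) in s if k == k2]

def maintain(view, r_old, s_old, dr, ds):
    """Apply insert-only deltas dr, ds to the materialized view of r JOIN s,
    using the delta rule: dV = (dR JOIN S) + (R JOIN dS) + (dR JOIN dS)."""
    view += join(dr, s_old)
    view += join(r_old, ds)
    view += join(dr, ds)
    return view

R = [(1, "a"), (2, "b")]
S = [(1, "x")]
V = join(R, S)
V = maintain(V, R, S, [(3, "c")], [(2, "y"), (3, "z")])
# V now equals the join of the updated base tables
```

The asynchronous setting the talk addresses adds the harder problem of applying such deltas when the view lags the base tables; that is not modeled here.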
Note room change!
Query Processing in Embedded Control Programs
An embedded control program can be viewed as a small main-memory database system tailored to the needs of a particular application. For performance reasons, the program will usually define low-level data structures to store the database, which in turn have to be understood by application developers and maintainers. This stands in contrast to the data independence that can be achieved with a relational database system. However, because of space and performance requirements, using a traditional DBMS is not likely to be feasible in this setting.
To partly address this, we have developed a query optimizer capable of compiling SQL/OQL queries over a conceptual schema into native Java or C code that navigates low-level data structures. Of particular significance is that an arbitrary collection of such structures, perhaps already devised for an earlier version of the control program, can be given as part of the input. The underlying algorithms our optimizer uses to accomplish this are the main focus of this paper. They are based on a novel resource-bounded plan generation mechanism in which integrity constraints abstracting the definitions of stored views are applied to source queries to extend the search space of possible query plans. We also report preliminary experimental results suggesting that the generated code navigates the data structures with an efficiency comparable to code written directly by expert programmers.
|Time:||Friday, February 4, 2:00 pm, DC1331|
Aspects of query optimization in Sybase SQL Anywhere
I'll briefly discuss three things:
Bowman, Ivan T. and G. N. Paulley (March 2000): "Join enumeration in a memory-constrained environment." In Proceedings of the 16th IEEE International Conference on Data Engineering, San Diego, California. To appear.
|Time:||Friday, February 11, 2:00 pm, DC1331|
|Topic:||The TREC-8 and TREC-9 Question Answering tracks.|
Note room change!
|Speaker:||Jack Snoeyink, University of North Carolina|
Polygon clipping, boolean operations, and overlay in double precision
Polygon clipping and polygon union are the first problems encountered in a graphics class that have non-trivial geometric algorithms. It is therefore annoying that, after presenting algorithms like plane sweep, one has to admit that floating-point errors mean the output will not be correct in all cases. Boissonnat and Preparata study the precision needed to compute intersections of line segments, which is a first step in polygon operations. I'll briefly survey this work, including two important points: (1) sweep algorithms, or any other algorithm that computes a trapezoidation of the input segments, require five times the input precision; (2) algorithms that compute the arrangement of line segments require four times the input precision. Computing an arrangement is essential for correct polygon operations; clipping or unioning two polygons, or overlaying two planar maps, is a special case. We will see that when the segments are colored red and blue so that no red/red or blue/blue segments cross, the arrangement can be computed using only twice the input precision. The running time is optimal, and the programming time is reasonable, even including the handling of degenerate cases. This is joint work with Sergei Bespamyatnikh and Andrea Mantler.
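To see where the extra precision comes from, note that the intersection of two segments with integer endpoints has rational coordinates whose numerator and denominator are determinants in the input coordinates, and so need a multiple of the input's bit length. A quick exact-arithmetic illustration (my own example, not taken from the paper):

```python
from fractions import Fraction

def seg_intersection(p, q, r, s):
    """Exact intersection point of lines pq and rs with integer
    endpoints, via the standard 2x2-determinant formula. The results
    are rationals whose terms are products of input coordinates,
    which is why fixed-precision arithmetic runs out of bits."""
    (x1, y1), (x2, y2) = p, q
    (x3, y3), (x4, y4) = r, s
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = Fraction(a * (x3 - x4) - (x1 - x2) * b, d)
    y = Fraction(a * (y3 - y4) - (y1 - y2) * b, d)
    return x, y

# the diagonals of a small integer quadrilateral already meet at
# a non-integer point (6/5, 6/5)
print(seg_intersection((0, 0), (3, 3), (0, 2), (3, 0)))
```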
|Time:||Friday, February 25, 2:00 pm, DC1304|
Processing Signed XML
The goal of this talk is to motivate research problems arising from the processing of digitally signed XML documents. We are likely to start seeing rapid and widespread adoption of this technology for Internet applications. I will provide a brief description of the standards activity in this area, present a scenario based on the application of signed XML to payment processing, and describe the characteristics and challenges of the area.
|Time:||Friday, March 3, 2:00 pm, DC1331|
Query optimization techniques for queries with expensive predicates
This talk will describe issues related to query optimization for queries with expensive predicates and user-defined predicates. For such queries, the traditional method of evaluating predicates as early as possible may result in poor performance. I will present the optimization algorithm proposed in  and compare this approach with other known query optimization techniques for this class of queries (e.g., ).
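One classical idea in this line of work is to order expensive selections by "rank", the predicate's (selectivity − 1) divided by its per-tuple cost, rather than pushing every predicate down. The sketch below uses illustrative numbers and names to show why a cheap, unselective predicate can still be worth evaluating first:

```python
def optimal_order(preds):
    """preds: list of (name, selectivity, cost_per_tuple) triples.
    Ascending rank = (selectivity - 1) / cost minimizes expected cost
    for a sequence of independent selections (a classical result)."""
    return sorted(preds, key=lambda p: (p[1] - 1.0) / p[2])

def expected_cost(preds, n):
    """Expected total predicate-evaluation cost over n input tuples,
    assuming independent selectivities."""
    total, survivors = 0.0, float(n)
    for _, sel, cost in preds:
        total += survivors * cost   # every surviving tuple pays this predicate
        survivors *= sel            # only a fraction survives to the next one
    return total

preds = [("cheap_unselective", 0.9, 1.0),      # filters little, costs little
         ("expensive_selective", 0.1, 50.0)]   # filters a lot, costs a lot
order = optimal_order(preds)
# running the cheap predicate first shields the expensive one from 10% of tuples,
# which here beats running the very selective predicate first
```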
|Time:||Tuesday, March 7, 4:00 pm, DC1304 |
Note day, room and time change!
|Speaker:||Johann Christoph Freytag, Humboldt-Universität zu Berlin|
Quality-driven Integration of Heterogeneous Information Sources
Integrated access to information that is spread over multiple, distributed, and heterogeneous sources is an important problem in many scientific and commercial domains. While much work has been done on query processing and on choosing plans under cost criteria, very little is known about the important problem of incorporating information quality into query planning.
In this talk we describe an initial framework for multi-database query processing that fully includes the quality of information in its many facets, such as completeness, timeliness, and accuracy. We seamlessly include information quality in a multi-database query processor based on a view-rewriting mechanism, modeling information quality at different levels to ultimately find a set of high-quality query-answering plans.
This work was done in cooperation with Felix Naumann (Humboldt-Universität zu Berlin) and Ulf Leser (Technische Universität Berlin).
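The flavor of quality-driven planning can be conveyed with a toy scoring function. The sketch below is my own illustration, not the authors' model: each source carries quality scores, a plan's aggregate quality combines them (multiplicatively for completeness and accuracy, pessimistically for timeliness, all assumed conventions), and plans are ranked by a weighted sum:

```python
def plan_quality(sources, weights):
    """Aggregate quality of a plan combining several sources.
    Completeness and accuracy multiply across sources; timeliness is
    taken as the worst source. These combination rules and the linear
    weighting are illustrative assumptions only."""
    completeness = accuracy = timeliness = 1.0
    for s in sources:
        completeness *= s["completeness"]
        accuracy *= s["accuracy"]
        timeliness = min(timeliness, s["timeliness"])
    return (weights["completeness"] * completeness
            + weights["timeliness"] * timeliness
            + weights["accuracy"] * accuracy)

def best_plan(plans, weights):
    """Pick the highest-quality plan from a list of source lists."""
    return max(plans, key=lambda p: plan_quality(p, weights))
```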
|Time:||Friday, March 10, 2:00 pm, DC1331|
|Time:||Friday, March 17, 2:00 pm, DC1331|
|Topic:||Objects, Concepts, Attributes, Values, States, Actions, Activities, Cases, Causes, Intentions,... or What makes a mountain a mountain?|
Note room change!
|Speaker:||Ivan T. Bowman|
Implementing Aggregation and Duplicate Deletion for Data with a Skewed Distribution
This talk will discuss algorithms for aggregation and duplicate deletion. I will survey existing algorithms [1,2,3] and discuss the early aggregation approach , which reduces execution costs for data with a skewed distribution. No single algorithm is best for all situations, so I will compare the algorithms according to several quality metrics, including performance, memory consumption, external storage, number of temporary files, opportunities for parallelism, and difficulty of implementation.
Note: Larson previously presented results on early aggregation  at the University of Waterloo database group seminar.
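The intuition behind early aggregation can be shown in a few lines. In this simplified one-pass model (my own sketch, not any surveyed algorithm), a bounded hash table absorbs repeats in place instead of spilling raw tuples; with a skewed distribution, most rows hit a hot group that is already resident, so little is spilled:

```python
from collections import defaultdict

def aggregate(rows, memory_limit):
    """SUM(value) GROUP BY key with a bounded in-memory hash table.
    Returns (result, spilled_count). The 'spilled' list stands in for
    a temporary file; the second pass assumes memory then suffices."""
    table = {}
    spilled = []
    for key, value in rows:
        if key in table:
            table[key] += value          # early aggregation: absorb in place
        elif len(table) < memory_limit:
            table[key] = value           # new group, memory available
        else:
            spilled.append((key, value)) # memory full: spill the raw tuple
    result = defaultdict(int, table)
    for key, value in spilled:           # finish the spilled remainder
        result[key] += value
    return dict(result), len(spilled)
```

With uniform data the same table thrashes, which is why the talk compares algorithms rather than crowning one.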
|Time:||Friday, March 31, 2:00 pm, DC1331|
Forest-regular languages and Forest-regular grammars
Currently, structured documents (e.g., XML or SGML data) are defined by context-free grammars, captured as DTDs (Document Type Definitions). Recently, forest-regular languages have been proposed by several researchers as a better means of formalizing structured text. This talk serves as an introduction to this language class and its potential applications in the area of document processing and transformation.
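A regular tree language can be recognized by a bottom-up tree automaton: each node is assigned a state as a function of its label and the states of its children. This strictly generalizes DTD-style content models, where the "state" is forced to equal the element name. A minimal sketch (the transition table and labels below are invented for illustration):

```python
def run(tree, delta):
    """Evaluate a deterministic bottom-up tree automaton.
    tree  = (label, [children]);
    delta maps (label, tuple_of_child_states) to a state.
    Returns the root's state, or None if no transition applies."""
    label, children = tree
    states = tuple(run(c, delta) for c in children)
    if None in states:
        return None                      # a subtree was rejected
    return delta.get((label, states))

# toy schema: a "doc" must contain exactly two "T"-typed subtrees
delta = {("text", ()): "T",
         ("b", ("T",)): "T",
         ("doc", ("T", "T")): "OK"}     # "OK" is the accepting state
```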
|Time:||Friday, April 7, 2:00 pm, DC1331|
An industrial strength recovery mechanism based on before-image shadow paging
This talk will discuss a database recovery mechanism of which there seems to be little mention in the academic literature, other than a few oblique references such as in . This recovery mechanism has been implemented (and documented) in at least two commercially available database systems. It places no constraints on buffer fetch and eviction (steal) and supports fast commits (no-force), fine-grained locking, and partial rollbacks. It is also relatively simple to implement.
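The basic before-image discipline is simple to state: the first time a page is dirtied, save its prior contents; on rollback or crash recovery, copy the saved images back. The sketch below is an illustrative toy (no durability, no concurrency), not any product's actual implementation:

```python
class BeforeImageStore:
    def __init__(self, pages):
        self.pages = pages            # page_id -> page contents
        self.before_images = {}       # stand-in for the rollback file

    def write(self, page_id, content):
        # save the before-image only on the first write to the page
        if page_id not in self.before_images:
            self.before_images[page_id] = self.pages[page_id]
        self.pages[page_id] = content

    def commit(self):
        # commit simply discards the saved before-images
        self.before_images.clear()

    def recover(self):
        # crash or rollback: restore every saved before-image,
        # reverting the database to its pre-transaction state
        self.pages.update(self.before_images)
        self.before_images.clear()
```

Note how no constraint is placed on when dirty pages reach the database (steal) and commit forces nothing back (no-force), matching the properties claimed above.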
|Time:||Friday, April 14, 2:00 pm, DC1331|
Complexity of database query languages
The standard database query languages (SQL, relational algebra, relational calculus) are known to be implementable in time that is a small polynomial in the size of the database. This implies that many useful queries (such as transitive closure) cannot be expressed in these languages. To increase their computational power, various language constructs (looping, recursion, fixed points) can be added, or different data types (such as lists, arrays, and sets) can be allowed. Using these possibilities, various classes of computable queries will be exhibited.
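Transitive closure, the talk's example, illustrates the gap: each iteration below is an ordinary relational-algebra query (a join plus a union), and only the loop to a fixed point lies beyond the first-order languages. A naive evaluation in Python:

```python
def transitive_closure(edges):
    """Least fixed point of T = E union (T compose E) over a set of
    pairs. Each loop body is expressible in relational algebra; the
    iteration to a fixed point is the added power."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in edges if b == c}
        if new <= closure:
            return closure            # fixed point reached
        closure |= new
```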
|Time:||Friday, April 21,|
|Time:||Friday, April 28, 2:00 pm, DC1331|
Oh! No! Not another query language!
The area of information retrieval (IR) primarily focuses on the use of keywords to search very large databases, usually containing text or semi-structured data (e.g., XML). In this setting it is crucial that query facilities be efficient in order to support databases of that scale. However, with the growth of memory and disk space on modern desktop computers, users can no longer manage all their data in an ad hoc manner, creating a need to adapt IR techniques to smaller-scale databases. Source code management, personal email archives, and local WWW sites are examples of such smaller datasets.
When doing IR at this smaller scale, tasks are typically more complex than a simple keyword search. In source code, for example, tasks include generating (function) call graphs and program slices. These tasks need more powerful query languages that can trade off efficiency for expressiveness. Hermes, a prototype query language, is my attempt to provide this power. By combining GCL, a query language developed for large-scale IR, with a functional programming language, Hermes permits queries that are more expressive and flexible than those of other query languages. This presentation examines Hermes and its root languages, GCL and Scheme. The implementation of the Okapi ranking algorithm in Hermes will also be presented as an example of its use as a prototyping and development system for IR algorithms.
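For reference, the Okapi ranking mentioned above amounts to the BM25 scoring formula; a minimal version is sketched here in Python for illustration (standard k1 and b defaults; this is not the Hermes code):

```python
import math

def bm25(query_terms, doc, docs, k1=1.2, b=0.75):
    """Score one document (a list of tokens) against a bag-of-words
    query, over a collection 'docs', using the Okapi BM25 formula."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in docs if term in d)   # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        tf = doc.count(term)                     # term frequency in this doc
        norm = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf * (k1 + 1) / norm
    return score
```

Expressing exactly this kind of scoring loop is the sort of task where a keyword-only query facility falls short and a language like Hermes aims to help.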