[Please remove <h1>]
Winter 2000
Time: |
Friday,
January 14,
2:00 pm,
DC1302 Note room change! |
Speaker: |
Ken Salem
|
Topic: |
How to Roll a Join (Asynchronous Incremental View Maintenance)
I will discuss the general problem of maintaining a materialized join
view as the underlying tables change. I'll also describe a
specific view maintenance algorithm that I worked on last year
while I was visiting IBM Almaden.
|
Snacks: |
Ian Davis
|
Time: |
Friday,
January 21,
2:00 pm,
DC1304 Note room change! |
Speaker: |
David Toman
|
Topic: |
Query Processing in Embedded Control Programs
An embedded control program can be viewed as a small main-memory database
system tailored to suit the needs of a particular application. For
performance reasons, the program will usually define low-level data
structures to store the database which in turn have to be understood by
application developers and maintainers. This is in contrast with the data
independence that can be achieved by using a relational database
system. However, because of space and performance requirements, the use
of a traditional DBMS is not likely to be feasible in this setting.
To partly address this, we have developed a query optimizer capable of
compiling SQL/OQL queries on a conceptual schema to native Java or C code
that navigates low-level data structures. Of particular significance is
that an arbitrary collection of such structures, perhaps already devised
for an earlier version of the control program, can be given as a part of
the input. The underlying algorithms used in our optimizer to accomplish
this are the main focus of this paper. The algorithms are based on a
novel resource bounded plan generation mechanism in which integrity
constraints abstracting the definition of stored views are applied to
source queries to extend the search space of possible query plans. We
also report on some preliminary experiment results that suggest generated
code navigates the data structures with an efficiency comparable to code
written directly by expert programmers.
|
Snacks: |
Ken Salem
|
Time: |
Friday,
February 4,
2:00 pm, DC1331 |
Speaker: |
Glenn Paulley
|
Topic: |
Aspects of query optimization in Sybase SQL Anywhere
I'll briefly discuss three things:
- an overview of SQL Anywhere Studio and its technologies
- characteristics of our customer base, and their applications, that make query optimization challenging
- an overview of the join enumeration algorithm used in SQL Anywhere.
All three topics are discussed in a paper
that I will present at Data Engineering in March:
Bowman, Ivan T. and G. N. Paulley (March 2000): "Join enumeration in a
memory-constrained environment." In Proceedings,
16th IEEE Data Engineering Conference, San Diego, California. To appear.
|
Snacks: |
David Toman
|
Time: |
Friday,
February 11,
2:00 pm, DC1331 |
Speaker: |
Charlie Clarke
|
Topic: |
The TREC-8 and TREC-9 Questions Answering tracks.
|
Snacks: |
Glenn Paulley
|
Time: |
Friday,
February 18,
2:00 pm,
DC1304 Note room change! |
Speaker: |
Jack Snoeyink, University of North Carolina
|
Topic: |
Polygon clipping, boolean operations, and overlay in double precision
Polygon clipping or polygon union are the first problems encountered
in a graphics class that have non-trivial geometric algorithms. Thus,
it is annoying that after presenting algorithms like plane sweep one
has to admit that floating point errors will mean that the output will
not be correct in all cases. Boissonnat and Preparata study the
precision needed to compute intersections of line segments, which is a
first step in polygon operations. I'll briefly survey this work,
including two important points: 1. Sweep algorithms, or any other
algorithm that can compute a trapezoidation of the input segments,
require five times the input precision. 2. Algorithms that compute the
arrangement of line segments require four times the input precision.
Computing an arrangement is essential for correct polygon operations.
Clipping or unioning two polygons, or overlaying two planar maps is a
special case. We will see that in cases where the segments are colored
red and blue so that there are no red/red or blue/blue segments
crossing then the arrangement can be computed using only twice
the input precision. The running time is optimal, and the programming
time is reasonable, even including the handling of degenerate cases.
This is joint work with Sergei Bespamyatnikh and Andrea Mantler.
|
Time: |
Friday,
February 25,
2:00 pm, DC1304 |
Speaker: |
Mariano Consens
|
Topic: |
Processing Signed XML
The goal of this talk is to motivate research problems arising from the
processing of digitally signed XML documents. We are likely to start seeing
rapid and widespread adoption of this technology for Internet applications.
I will provide a brief description of the standards activity in this area,
present a scenario based in the application of signed XML to process
payments, and describe the characteristics and challenges in the area.
|
Snacks: |
Charlie Clarke
|
Time: |
Friday,
March 3,
2:00 pm, DC1331 |
Speaker: |
Ani Nica
|
Topic: |
Query optimization techniques for queries with expensive predicates
This talk will describe issues related to query optimization for queries with
expensive predicates and user defined predicates.
For such queries, the traditional method of evaluating predicates as early as
possible may result in poor performance. I will present
the optimization algorithm proposed in [1] for optimizing such queries and
compare this approach with other known
query optimization techniques for this class of queries (e.g., [2]).
- Surajit Chaudhuri and Kyuseok Shim,
"Optimization of queries with user-defined predicates,"
ACM Trans. Database Syst. 24, 2 (Jun.1999), pp. 177-228.
- J.M. Hellerstein and M. Stonebraker,
"Predicate migration: optimizing queries with expensive predicates"
SIGMOD '93, Washington, DC (26-28 May 1993),
SIGMOD Record 22, 2 (June 1993), pp. 267-77.
|
Snacks: |
Mariano Consens
|
Time: |
Tuesday, March 7, 4:00 pm, DC1304 Note day, room and time change! |
Speaker: |
Johann Christoph Freytag, Humboldt-Universität zu Berlin
|
Topic: |
Quality-driven Integration of Heterogeneous Information Sources
Integrated access to information that is spread over multiple, distributed, and heterogeneous sources is an important
problem in many scientific and commercial domains. While
much work has been done on query processing and choosing
plans under cost criteria, very little is known about the
important problem of incorporating the information quality
aspect into query planning.
In this talk we describe an initial framework for multi-database query processing that fully includes the quality of
information in many facets, such as completeness, timeliness, accuracy, etc. We seamlessly include information
quality into a multi-database query processor based on a
view-rewriting mechanism. We model information quality at
different levels to ultimately find a set of high-quality
query-answering plans.
This work was done in cooperation with Felix Naumann
(Humboldt-Universität zu Berlin) and Ulf Leser (Technische
Universität Berlin).
|
Time: |
Friday,
March 10,
2:00 pm, DC1331 |
Speaker: |
none
|
Topic: |
"open mike"
|
Time: |
Friday,
March 17,
2:00 pm, DC1331 |
Speaker: |
Nick Cercone
|
Topic: |
Objects, Concepts, Attributes, Values, States, Actions, Activities, Cases,
Causes, Intentions,...
or
What makes a mountain a mountain?
|
Snacks: |
Ani Nica
|
Time: |
Friday,
March 24,
2:00 pm,
DC1304 Note room change! |
Speaker: |
Ivan T. Bowman
|
Topic: |
Implementing Aggregation and Duplicate Deletion for Data with a
Skewed Distribution
This talk will discuss algorithms for aggregation and duplicate deletion. I
will survey existing algorithms [1,2,3], and discuss the early aggregation
approach [3] which reduces execution costs for data with a skewed distribution.
No single algorithm is best for all situations, so I will compare the algorithms
according to several quality metrics, including: performance, memory
consumption, external storage, number of temporary files, opportunities for
parallelism, and difficulty of implementation.
Note: Larson previously presented results on early aggregation [2] at the
University of Waterloo database group seminar.
- Epstein, R.. Techniques for Processing of Aggregates in Relational Database
Systems.
UCB/ERL Memo M79/8, Univ. California, Berkeley, California, 1979.
- Goetz Graefe. Query Evaluation Techniques for Large Databases. ACM
Computing Surveys, 25:2 (1993), pp 73-169.
- Per-Åke Larson. Grouping and Duplicate Elimination: Benefits of Early
Aggregation. Microsoft Corporation, 1997.
Home Page
|
Snacks: |
Nick Cercone
|
Time: |
Friday,
March 31,
2:00 pm, DC1331 |
Speaker: |
Xuerong Tang
|
Topic: |
Forest-regular languages and Forest-regular grammars
Currently structured documents (e.g. XML or SGML data) are defined by
context-free grammars, which are captured as DTDs (Document Type
Definition). Recently forest-regular languages were proposed by several
researchers as a better means for formalizing structured text. This talk
serves as an introduction to this language class and its potential application
in the area of document processing and transformation.
|
Snacks: |
Ivan Bowman
|
Time: |
Friday,
April 7,
2:00 pm, DC1331 |
Speaker: |
Peter Bumbulis
|
Topic: |
An industrial strength recovery mechanism based on before-image shadow paging
This talk will discuss a database recovery mechanism of which there seems to
be little mention in the academic literature, other than a few oblique
references such as in [1]. This recovery mechanism has been implemented (and
documented) in at least 2 commercially available database systems. This
mechanism places no constraints on buffer fetch and eviction (steal) and
supports fast commits (no-force), fine-grained locking, and partial
rollbacks. It is also relatively simple to implement.
- C. Mohan: "Commit_LSN: A Novel and Simple Method for Reducing Locking and
Latching in Transaction Processing Systems." VLDB 1990: 406-418.
|
Snacks: |
Xuerong Tang
|
Time: |
Friday,
April 14,
2:00 pm, DC1331 |
Speaker: |
Larry Saxton
|
Topic: |
Complexity of database query languages
The standard database query languages (SQL, relational algebra, relational
calculus) are known to be implementable in a time that is small polynomial
in the size of the database. This implies that many useful queries (such as
transitive closure) cannot be expressed in a database query language. In
order to increase the computational power, various language constructs
(looping, recursion, fixed points) can be considered or different data types
allowed (such as lists, arrays, sets). Using these possibilities, various
classes of computable queries will be exhibited.
|
Snacks: |
Peter Bumbulis
|
Time: |
Friday,
April 21,
|
No meeting
|
Time: |
Friday,
April 28,
2:00 pm, DC1331 |
Speaker: |
Anthony Cox
|
Topic: |
Oh! No! Not another query language!
The area of information retrieval (IR) primarily focusses on the use of
keywords to search very large databases, usually containing text or
semi-structured data (i.e. XML). In this application it is crucial that
query facilities be efficient in order to support the scale of database
needed. However, with the growth of memory and disk space on modern
desktop computers, users can no longer manage all their data in an adhoc
manner, thus creating a need for the adaption of IR techniques for smaller
scale databases. Source code management, personal email archives, and
local WWW sites are examples of smaller datasets.
When doing this small scale IR, tasks are typically more complex than a
simple keyword search. For example, in source code, tasks can include the
generation of (function) call-graphs and program slices. These tasks
need more powerful query languages which can tradeoff efficiency to gain
expressiveness. Hermes, a prototype query language, is my attempt at a
more powerful query language. By combining GCL, a query language
developed for large scale IR, with a functional programming language,
Hermes permits queries to be formulated that are more expressive and
flexible than other query languages. This presentation examines Hermes
and its root languages, GCL and Scheme. Also, the implementation of the
Okapi ranking algorithm in Hermes will be presented as an example of
Hermes' use as a prototyping and development system for IR algorithms.
|
Snacks: |
Larry Saxton
|
This page is maintained by
Frank Tompa.