Database Research Group Events

Fall 1999

Time: Friday, September 17 , 2:00 pm, DC1331
Speaker: everyone
Snacks: Grant Weddell
Topic: 1. General introductions around the table, including brief descriptions of research interests.
2. Revitalizing the database lab.
3. Speaker rotation.

Time: Friday, September 24 , 2:00 pm, DC1331
Speaker: Qing Zhang
Snacks: Forbes Burkowski
Topic: The Query Matching Problem (MMath essay presentation)

This essay examines algorithms for rewriting queries by using views.

In typical commercial database systems the data reside in a central machine, called the server. The server can be reached by a number of clients via communication lines. Clients send queries to the server and get results from the server.

Query caching was introduced to improve the efficiency of this model. The basic idea is: after a client gets the query results from the server, it caches the results in the local machine. Later, the client can use the cached query results to process the subsequent queries.

One of the problems we must solve for query caching is query matching, i.e., given a set of materialized views V1, V2, ... , Vm, and query Q, which, if any, of V1 , V2, ... , Vm should be used to help answer the query Q? We need to rewrite the query Q, i.e., find another query Q' that uses the views and returns the same results as Q but with less cost.

Time: Friday, October 1, 2:00 pm, DC1331
Speaker: Daniel M. Germán
Snacks: Qing Zhang
Topic: Another Query Language for XML

Several moons ago, Dr. Tompa entertained our crowd with an introduction to the XML query language that he had begun to develop. Not surprisingly, there have been other attempts to attack the same problem. The most significant (IMHO) is the work by Deutsch, Fernandez, Florescu, Levy and Suciu, who presented a paper entitled A query language for XML during WWW8. In my talk, I will overview their language, including its data model.

Follow-up: Larry Saxton suggested: Data Engineering's September 1999 issue

Time: Friday, October 8, 2:00 pm, DC1331
Speaker: Vlado Keselj
Snacks: Vlado Keselj
Topic: Probabilistic Models in Information Retrieval

My talk will be based on a survey of probabilistic methods in IR, which is published in ACM Computing Surveys, Dec. 1998. I will also try to formulate some hand-wavy arguments about how the probabilistic approach to IR can be nicely combined with the natural language approach to IR, which I am primarily interested in. This question becomes more interesting in the light of successful application of statistics to NLP itself.

Time: Friday, October 15, 2:00 pm, DC1331

Time: Friday, October 22, 2:00 pm, DC1331
Speaker: Ian Munro
Snacks: Daniel M. Germán
Topic: Adaptive Set Intersections, Unions, and Differences (to appear SODA 2000) Erik D. Demaine, Alejandro López-Ortiz and J. Ian Munro

Motivated by boolean queries in text database systems, we consider the problems of finding the intersection, union, or difference of a collection of sorted sets. While the worst-case complexity of these problems is straightforward, we consider a notion of complexity that depends on the particular instance. We develop the idea of a proof that a given set is indeed the correct answer. Proofs, and in particular shortest proofs, are characterized. We present adaptive algorithms that make no a priori assumptions about the problem instance, and show that their running times are within a constant factor of optimal with respect to a natural measure of the difficulty of an instance. In the process, we develop a framework for designing and evaluating adaptive algorithms in the comparison model.

Time: Friday, October 29, 2:00 pm, DC1331
Speaker: Gord Cormack
Snacks: Ian Munro
Topic: Question Answering

Traditional information retrieval involves finding documents about a particular topic. Question Answering involves finding a specific fact, usually answered by a word or short phrase. For example, "Who was Lincoln's Secretary of State?" (answer: William Seward) or "What make of car was featured in 'Back to the Future?'" (answer: DeLorean).

TREC 8 introduces a new Question and Answer track, in which we participated. Our submission was pretty rudimentary but we had lots of other stuff that didn't quite get off the launch pad in time. I'll talk about both.

Time: Friday, November 5, 2:00 pm, DC1331
Speaker: Paul Ward
Snacks: Gord Cormack
Topic: Building a parallel database on a MPP machine requires message passing communication between nodes. The first rule is: don't communicate if at all possible. While this is achievable in certain queries, for many others communication between nodes is necessary. The default communication schemes are UDP or TCP. UDP is unreliable, unordered and bounded packet sizes. It requires a non-trivial amount of additional work to make it useful for a parallel database. TCP was designed for wide area network communications with uncoordinated applications competing for bandwidth. This is not the same environment as cooperating processes on a tightly coupled parallel processor architecture. There have been various attempts to deal with these shortcomings. In this talk I will detail the problems and discuss some of the proposed solutions for dealing with them.

Time: Friday, November 12, 2:00 pm, DC1304
Note room change!
Speaker: Tamer Özsu
Snacks: Paul Ward
Topic: Current Projects and Future Directions in Distributed and Multimedia Data Management

I will be talking about the on-going projects, in my group, on database management. These projects are focused on various aspects of distributed data management and multimedia data management. I will also talk about open problems that I am interested in investigating. The talk is mainly aimed at graduate students who do not yet have a supervisor and are looking for research topics. Nevertheless, anyone interested in database research is encouraged to attend.

Time: Friday, November 19, 2:00 pm, DC1331
Speaker: Christian Cumbaa
Snacks: Frank Tompa
Topic: A Framework for Predicting Secondary Biosequences

Let x be a biosequence (e.g., a protein) and y=f(x) be some secondary biosequence (e.g., description of protein structure) where f is a deterministic, unknown function (in this case, the protein folding function). Several problems in computational biology are of the form, given sequence x, what is y? This talk will present a general strategy for predicting y given x and a database of pairs <xi, f(xi)>, using patterns occurring in the xi sequences.

Time: Friday, November 26, 2:00 pm, DC1331
Speaker: Grant Weddell
Snacks: Christian Cumbaa
Topic: A Look at Nortel's Meridian Switch as a Software Testbed

We have arranged to acquire a switch and the software base from Nortel for research and education purposes. The presentation will include a brief description of the switch's capabilities, an overview of its software architecture, and an introduction to the application of DEMO to re-engineering the software to operate as a main-memory database.

Time: Friday, December 3, 2:00 pm, DC1331
Speaker: Vitaliy L. Khizder
Snacks: Grant Weddell
Topic: Uniqueness constraints in Object-Relational Databases and Description Logics

The generalization of the Relational Model to models with complex objects that allow objects to have other objects as values of their attributes gives rise to so-called "path functions" (or chains of attributes). This in turn gives rise to more general kinds of, for example, functional dependencies. We call these generalized functional dependencies uniqueness constraints.

As it turns out, it is rather helpful to express these and other kinds of constraints in so-called Description Logics (DLs) which are a family of knowledge representation schemas. I will describe how DLs capture the generalized constraints and present some results on efficient reasoning about such constraints.

Time: Friday, December 10, 2:00 pm, DC1331
Speaker: Ian Davis
Snacks: Vitaliy L. Khizder
Topic: The performance of a database engine is critically dependent on the ability to (a) detect cases where only a subset of data needs to be recovered from underlying relational tables, and (b) to buffer results already computed so as to avoid unnecessary recomputation.

I will be talking about the strategy that I am employing to compute the subsets of data that need to be recovered from large text tables, and the methods employed which detect when such recovered data becomes invalid, and must thus be recomputed.

I will also be talking about the need to generalise these strategies so that the dynamic restrictions imposed on the rows returned by tables, based on currently visable data within the database engine, be extended to operate both on currently visable data, and on data not yet logically computed by the engine.

This page is maintained by Frank Tompa.