[Please remove <h1>]
Fall 1999
Time: |
Friday,
September 17
, 2:00 pm, DC1331 |
Speaker: |
everyone
|
Snacks: |
Grant Weddell
|
Topic: |
1. General introductions around the table, including brief descriptions of research interests.
2. Revitalizing the database lab.
3. Speaker rotation.
|
Time: |
Friday,
September 24
, 2:00 pm, DC1331 |
Speaker: |
Qing Zhang
|
Snacks: |
Forbes Burkowski
|
Topic: |
The Query Matching Problem (MMath essay presentation)
This essay examines algorithms for rewriting queries by using views.
In typical commercial database systems the data reside in a central
machine, called the server. The server can be reached by a number of
clients via communication lines. Clients send queries to the server
and get results from the server.
Query caching was introduced to improve the
efficiency of this model. The basic idea is: after a client gets the query
results from the server, it caches the results in the local machine.
Later, the client can use the cached query results to process the
subsequent queries.
One of the problems we must solve for query caching is query
matching, i.e., given a set of materialized views V1, V2, ... , Vm, and
query Q, which, if any, of V1 , V2, ... , Vm should be used to help
answer the query Q?
We need to rewrite the query Q, i.e., find
another query Q' that uses the views and returns the same results as Q
but with less cost.
|
Time: |
Friday,
October 1,
2:00 pm, DC1331 |
Speaker: |
Daniel M. Germán
|
Snacks: |
Qing Zhang
|
Topic: |
Another Query Language for XML
Several moons ago, Dr. Tompa entertained our crowd with an
introduction to the XML query language that he had begun to develop.
Not surprisingly, there have been other attempts to attack the same
problem. The most significant (IMHO) is the work by Deutsch,
Fernandez, Florescu, Levy and Suciu, who presented a paper entitled
A query language for XML
during WWW8. In my talk, I will overview
their language, including its data model.
|
Follow-up: |
Larry Saxton suggested: Data Engineering's September 1999 issue
|
Time: |
Friday,
October 8,
2:00 pm, DC1331 |
Speaker: |
Vlado Keselj
|
Snacks: |
Vlado Keselj
|
Topic: |
Probabilistic Models in Information Retrieval
My talk will be based on a survey of probabilistic methods in IR, which is
published in
ACM Computing Surveys, Dec. 1998.
I will also try to
formulate some hand-wavy arguments about how the probabilistic approach to
IR can be nicely combined with the natural language approach to IR, which
I am primarily interested in. This question becomes more interesting in
the light of successful application of statistics to NLP itself.
|
Time: |
Friday,
October 15,
2:00 pm, DC1331 |
| cancelled
|
Time: |
Friday,
October 22,
2:00 pm, DC1331 |
Speaker: |
Ian Munro
|
Snacks: |
Daniel M. Germán
|
Topic: |
Adaptive Set Intersections, Unions, and Differences (to appear SODA 2000)
Erik D. Demaine, Alejandro López-Ortiz and J. Ian Munro
Motivated by boolean queries in text database systems,
we consider the problems
of finding the intersection, union, or difference of a collection of
sorted sets. While the worst-case complexity of these problems
is straightforward, we consider a notion of complexity that
depends on the particular instance.
We develop the idea of a proof that a given set is indeed the correct
answer. Proofs, and in particular shortest
proofs, are characterized. We present adaptive
algorithms that make no a priori assumptions about the problem instance,
and show that their running times are within a constant factor of optimal with
respect to a natural measure of the difficulty of an instance. In the process,
we develop a framework for designing and evaluating adaptive algorithms in the
comparison model.
|
Time: |
Friday,
October 29,
2:00 pm, DC1331 |
Speaker: |
Gord Cormack
|
Snacks: |
Ian Munro
|
Topic: |
Question Answering
Traditional information retrieval involves finding documents about
a particular topic. Question Answering involves finding a specific
fact, usually answered by a word or short phrase. For example,
"Who was Lincoln's Secretary of State?" (answer: William Seward)
or "What make of car was featured in 'Back to the Future?'"
(answer: DeLorean).
TREC 8 introduces a new Question and Answer track, in which we participated.
Our submission was pretty rudimentary but we had lots of other stuff
that didn't quite get off the launch pad in time. I'll talk about both.
|
Time: |
Friday,
November 5,
2:00 pm, DC1331 |
Speaker: |
Paul Ward
|
Snacks: |
Gord Cormack
|
Topic: |
Building a parallel database on a MPP machine requires message passing
communication between nodes. The first rule is: don't communicate if
at all possible. While this is achievable in certain queries, for
many others communication between nodes is necessary. The default
communication schemes are UDP or TCP. UDP is unreliable, unordered
and bounded packet sizes. It requires a non-trivial amount of
additional work to make it useful for a parallel database. TCP was
designed for wide area network communications with uncoordinated
applications competing for bandwidth. This is not the same
environment as cooperating processes on a tightly coupled parallel
processor architecture. There have been various attempts to deal with
these shortcomings. In this talk I will detail the problems and
discuss some of the proposed solutions for dealing with them.
|
Time: |
Friday,
November 12,
2:00 pm,
DC1304 Note room change! |
Speaker: |
Tamer Özsu
|
Snacks: |
Paul Ward
|
Topic: |
Current Projects and Future Directions in Distributed and Multimedia
Data Management
I will be talking about the on-going projects, in my group, on database
management. These projects are focused on various aspects of distributed
data management and multimedia data management. I will also talk about open
problems that I am interested in investigating. The talk is mainly aimed at
graduate students who do not yet have a supervisor and are looking for
research topics.
Nevertheless, anyone interested in database research is encouraged to attend.
|
Time: |
Friday,
November 19,
2:00 pm, DC1331 |
Speaker: |
Christian Cumbaa
|
Snacks: |
Frank Tompa
|
Topic: |
A Framework for Predicting Secondary Biosequences
Let x be a biosequence (e.g., a protein) and y=f(x) be some secondary
biosequence (e.g., description of protein structure) where f is a
deterministic, unknown function (in this case, the protein folding
function). Several problems in computational biology are of the form,
given sequence x, what is y? This talk will present a general
strategy for predicting y given x and a database of pairs
<xi, f(xi)>,
using patterns occurring in the xi sequences.
|
Time: |
Friday,
November 26,
2:00 pm, DC1331 |
Speaker: |
Grant Weddell
|
Snacks: |
Christian Cumbaa
|
Topic: |
A Look at Nortel's Meridian Switch as a Software Testbed
We have arranged to acquire a switch and the software base from Nortel for research and education purposes.
The presentation will include a brief description of the switch's capabilities,
an overview of its software architecture,
and an introduction to the application of DEMO to re-engineering the software
to operate as a main-memory database.
|
Time: |
Friday,
December 3,
2:00 pm, DC1331 |
Speaker: |
Vitaliy L. Khizder
|
Snacks: |
Grant Weddell
|
Topic: |
Uniqueness constraints in Object-Relational Databases and Description Logics
The generalization of the Relational Model to models with complex objects that
allow objects to have other objects as values of their attributes gives
rise to so-called "path functions" (or chains of attributes). This in turn
gives rise to more general kinds of, for example, functional dependencies.
We call these generalized functional dependencies uniqueness constraints.
As it turns out, it is rather helpful to express these and other kinds of
constraints in so-called Description Logics (DLs) which are a family of
knowledge representation schemas. I will describe how DLs capture the
generalized constraints and present some results on efficient reasoning
about such constraints.
|
Time: |
Friday,
December 10,
2:00 pm, DC1331 |
Speaker: |
Ian Davis
|
Snacks: |
Vitaliy L. Khizder
|
Topic: |
The performance of a database engine is critically dependent on the ability
to (a) detect cases where only a subset of data needs to be recovered from
underlying relational tables, and (b) to buffer results already computed
so as to avoid unnecessary recomputation.
I will be talking about the strategy that I am employing to compute the
subsets of data that need to be recovered from large text tables, and the
methods employed which detect when such recovered data becomes invalid,
and must thus be recomputed.
I will also be talking about the need to generalise these strategies so that
the dynamic restrictions imposed on the rows returned by tables, based on
currently visable data within the database engine, be extended to operate
both on currently visable data, and on data not yet logically computed by
the engine.
|
This page is maintained by
Frank Tompa.