|Time:||Friday, September 17 , 2:00 pm, DC1331|
1. General introductions around the table, including brief descriptions of research interests.
2. Revitalizing the database lab.
3. Speaker rotation.
|Time:||Friday, September 24 , 2:00 pm, DC1331|
The Query Matching Problem (MMath essay presentation)
This essay examines algorithms for rewriting queries by using views.
In typical commercial database systems the data reside in a central machine, called the server. The server can be reached by a number of clients via communication lines. Clients send queries to the server and get results from the server.
Query caching was introduced to improve the efficiency of this model. The basic idea is: after a client gets the query results from the server, it caches the results in the local machine. Later, the client can use the cached query results to process the subsequent queries.
One of the problems we must solve for query caching is query matching, i.e., given a set of materialized views V1, V2, ... , Vm, and query Q, which, if any, of V1 , V2, ... , Vm should be used to help answer the query Q? We need to rewrite the query Q, i.e., find another query Q' that uses the views and returns the same results as Q but with less cost.
|Time:||Friday, October 1, 2:00 pm, DC1331|
|Speaker:||Daniel M. Germán|
Another Query Language for XML
Several moons ago, Dr. Tompa entertained our crowd with an introduction to the XML query language that he had begun to develop. Not surprisingly, there have been other attempts to attack the same problem. The most significant (IMHO) is the work by Deutsch, Fernandez, Florescu, Levy and Suciu, who presented a paper entitled A query language for XML during WWW8. In my talk, I will overview their language, including its data model.
|Follow-up:||Larry Saxton suggested: Data Engineering's September 1999 issue|
|Time:||Friday, October 8, 2:00 pm, DC1331|
Probabilistic Models in Information Retrieval
My talk will be based on a survey of probabilistic methods in IR, which is published in ACM Computing Surveys, Dec. 1998. I will also try to formulate some hand-wavy arguments about how the probabilistic approach to IR can be nicely combined with the natural language approach to IR, which I am primarily interested in. This question becomes more interesting in the light of successful application of statistics to NLP itself.
|Time:||Friday, October 15, 2:00 pm, DC1331|
|Time:||Friday, October 22, 2:00 pm, DC1331|
|Snacks:||Daniel M. Germán|
Adaptive Set Intersections, Unions, and Differences (to appear SODA 2000)
Erik D. Demaine, Alejandro López-Ortiz and J. Ian Munro
Motivated by boolean queries in text database systems, we consider the problems of finding the intersection, union, or difference of a collection of sorted sets. While the worst-case complexity of these problems is straightforward, we consider a notion of complexity that depends on the particular instance. We develop the idea of a proof that a given set is indeed the correct answer. Proofs, and in particular shortest proofs, are characterized. We present adaptive algorithms that make no a priori assumptions about the problem instance, and show that their running times are within a constant factor of optimal with respect to a natural measure of the difficulty of an instance. In the process, we develop a framework for designing and evaluating adaptive algorithms in the comparison model.
|Time:||Friday, October 29, 2:00 pm, DC1331|
Traditional information retrieval involves finding documents about a particular topic. Question Answering involves finding a specific fact, usually answered by a word or short phrase. For example, "Who was Lincoln's Secretary of State?" (answer: William Seward) or "What make of car was featured in 'Back to the Future?'" (answer: DeLorean).
TREC 8 introduces a new Question and Answer track, in which we participated. Our submission was pretty rudimentary but we had lots of other stuff that didn't quite get off the launch pad in time. I'll talk about both.
|Time:||Friday, November 5, 2:00 pm, DC1331|
|Topic:||Building a parallel database on a MPP machine requires message passing communication between nodes. The first rule is: don't communicate if at all possible. While this is achievable in certain queries, for many others communication between nodes is necessary. The default communication schemes are UDP or TCP. UDP is unreliable, unordered and bounded packet sizes. It requires a non-trivial amount of additional work to make it useful for a parallel database. TCP was designed for wide area network communications with uncoordinated applications competing for bandwidth. This is not the same environment as cooperating processes on a tightly coupled parallel processor architecture. There have been various attempts to deal with these shortcomings. In this talk I will detail the problems and discuss some of the proposed solutions for dealing with them.|
Note room change!
Current Projects and Future Directions in Distributed and Multimedia
I will be talking about the on-going projects, in my group, on database management. These projects are focused on various aspects of distributed data management and multimedia data management. I will also talk about open problems that I am interested in investigating. The talk is mainly aimed at graduate students who do not yet have a supervisor and are looking for research topics. Nevertheless, anyone interested in database research is encouraged to attend.
|Time:||Friday, November 19, 2:00 pm, DC1331|
A Framework for Predicting Secondary Biosequences
Let x be a biosequence (e.g., a protein) and y=f(x) be some secondary biosequence (e.g., description of protein structure) where f is a deterministic, unknown function (in this case, the protein folding function). Several problems in computational biology are of the form, given sequence x, what is y? This talk will present a general strategy for predicting y given x and a database of pairs <xi, f(xi)>, using patterns occurring in the xi sequences.
|Time:||Friday, November 26, 2:00 pm, DC1331|
A Look at Nortel's Meridian Switch as a Software Testbed
We have arranged to acquire a switch and the software base from Nortel for research and education purposes. The presentation will include a brief description of the switch's capabilities, an overview of its software architecture, and an introduction to the application of DEMO to re-engineering the software to operate as a main-memory database.
|Time:||Friday, December 3, 2:00 pm, DC1331|
|Speaker:||Vitaliy L. Khizder|
Uniqueness constraints in Object-Relational Databases and Description Logics
The generalization of the Relational Model to models with complex objects that allow objects to have other objects as values of their attributes gives rise to so-called "path functions" (or chains of attributes). This in turn gives rise to more general kinds of, for example, functional dependencies. We call these generalized functional dependencies uniqueness constraints.
As it turns out, it is rather helpful to express these and other kinds of constraints in so-called Description Logics (DLs) which are a family of knowledge representation schemas. I will describe how DLs capture the generalized constraints and present some results on efficient reasoning about such constraints.
|Time:||Friday, December 10, 2:00 pm, DC1331|
|Snacks:||Vitaliy L. Khizder|
The performance of a database engine is critically dependent on the ability
to (a) detect cases where only a subset of data needs to be recovered from
underlying relational tables, and (b) to buffer results already computed
so as to avoid unnecessary recomputation.
I will be talking about the strategy that I am employing to compute the subsets of data that need to be recovered from large text tables, and the methods employed which detect when such recovered data becomes invalid, and must thus be recomputed.
I will also be talking about the need to generalise these strategies so that the dynamic restrictions imposed on the rows returned by tables, based on currently visable data within the database engine, be extended to operate both on currently visable data, and on data not yet logically computed by the engine.