Database Research Group Events -- Winter 1999

Winter 1999

Friday 8 January - research group meeting 2pm, DC1331

Speaker: Koji Ueda

Snacks: Anthony Cox

Topic: My current interest is GIS in distributed object environments (especially CORBA and Java). My research topic is how to transfer a large number of geometric objects from the server side to the client side efficiently. Of course, (almost) the fastest way to do so is to establish a TCP connection between the server and the client and to send the data in bulk through the connection in a proprietary format. But this method not only makes the program much more complicated but also increases the mutual dependency between the server side and the client side.

Almost all database server products provide an interface like the following:

    Result result = database.query( condition );
    while( result.more() )
        doSomething( result.get() );

This interface is easy to use on the client side. But is it efficient? Usually, the server provides a stub library with which client programs are linked, and this stub performs caching. So this interface is efficient enough and has been very commonly used. But things are different in a distributed object environment, because usually every method call is a remote method call to the server side, not a local method call to the client stub. How can we make this interface efficient?

Currently I have two ideas to solve this problem:

1. Using Java RMI, dynamically download caching classes to the client side.
2. Using the object wrapper mechanism in VisiBroker (a CORBA product), insert caching code at the ORB level.

I'm currently implementing these two solutions. Transfer time and data size will be evaluated and compared.
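The caching idea behind the more()/get() loop in this abstract can be sketched in Java as follows. This is only an illustration under stated assumptions, not the speaker's implementation: `RemoteCursor`, `nextBatch`, `CachingResult` and `CountingCursor` are hypothetical names, and no RMI or VisiBroker API is used. The client keeps the familiar loop, but each simulated remote call fetches a block of results instead of a single one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical remote interface: each call stands for one network round trip.
interface RemoteCursor<T> {
    // Returns up to maxItems results; an empty list means the result set is exhausted.
    List<T> nextBatch(int maxItems);
}

// Client-side wrapper that keeps the more()/get() style of the abstract
// while hiding batched fetching behind it.
class CachingResult<T> {
    private final RemoteCursor<T> remote;
    private final int batchSize;
    private final Deque<T> buffer = new ArrayDeque<>();
    private boolean exhausted = false;

    CachingResult(RemoteCursor<T> remote, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.remote = remote;
        this.batchSize = batchSize;
    }

    // True if another item is available; refills the local buffer when it runs dry.
    boolean more() {
        if (buffer.isEmpty() && !exhausted) {
            List<T> batch = remote.nextBatch(batchSize);
            if (batch.isEmpty()) exhausted = true;
            else buffer.addAll(batch);
        }
        return !buffer.isEmpty();
    }

    // Returns the next item, refilling the buffer first if needed.
    T get() {
        if (!more()) throw new NoSuchElementException();
        return buffer.removeFirst();
    }
}

// In-memory stand-in for a server that counts simulated round trips.
class CountingCursor implements RemoteCursor<Integer> {
    private final int total;
    private int next = 0;
    int roundTrips = 0;

    CountingCursor(int total) { this.total = total; }

    @Override
    public List<Integer> nextBatch(int maxItems) {
        roundTrips++;
        List<Integer> out = new ArrayList<>();
        while (next < total && out.size() < maxItems) out.add(next++);
        return out;
    }
}
```

With 10 items, a batch size of 4 needs 4 simulated round trips (three non-empty batches plus the final empty one), while a batch size of 1 needs 11. Choosing the batch size trades client memory against the number of remote calls.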

Friday 15 January - research group meeting 2pm, DC1331

Speaker: Vitaliy Khizder

Snacks: Koji Ueda

Title: Vitaliy will be talking about functional dependency constraints in description logics.

Friday 22 January - research group meeting 2pm, DC1331

Speaker: Curtis Cartmill

Snacks: Vitaliy Khizder

Title: What is Information Integration? This week, I'll be talking about a topic I stumbled upon when looking through other AI fields: Information Integration! Although many different Information Integration systems exist, I will use as examples systems such as SIMS and Ariadne, which were developed at the Information Sciences Institute at the University of Southern California.

Information Integration differs from a multiple-database application in one important way: the types of information sources that are used to retrieve information. Usually, much effort is put into designing efficient database tables from data models, so that we can easily find the location of information that we wish to further explore or process. One of the main problems in Information Integration is that we may want to query multiple heterogeneous information sources, such as databases, knowledge bases, and web pages, each with a different level of structure in its data.

So how do we do it? A lot of work has been put into trying to abstract the problem to a higher level. Usually there exists one domain model for the entire problem domain, and each information source has a conceptual model representing the type of information contained within the data. Each information source is accessed through a 'wrapper'. The wrapper is mainly responsible for three tasks: (1) receiving and deciphering incoming queries (described by the high-level domain model), (2) retrieving the correct information from the native data source (e.g. databases, unstructured text, web pages), and (3) repackaging the results before returning the data back through the wrapper.

This is a very diverse field of research, so we'll see what I get through in a one-hour presentation. I was planning on talking about a couple of papers on the topic of Information Integration, which discuss semi-automatic wrapper generation, modeling domain and information source ontologies, and query planning and execution.

On a final note of interest, the Information Integration 'wrapper' community has now caught a glimpse of a possible strong enemy: XML technology. I'll also mention how XML's rich semantic information may help put 'wrapper' developers out of business.
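The three wrapper tasks listed in this abstract can be sketched in Java as follows. This is a toy illustration, not how SIMS or Ariadne implement wrappers: `DomainQuery`, `SourceWrapper`, `DelimitedTextWrapper`, the attribute names and the "name;city" line format are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain-level query: "find records where attribute equals value".
final class DomainQuery {
    final String attribute;
    final String value;

    DomainQuery(String attribute, String value) {
        this.attribute = attribute;
        this.value = value;
    }
}

// Hypothetical wrapper contract: domain-level query in, domain-level records out.
interface SourceWrapper {
    List<Map<String, String>> answer(DomainQuery query);
}

// Wrapper over a toy "native" source: text lines of the form "name;city".
class DelimitedTextWrapper implements SourceWrapper {
    private final List<String> nativeLines;

    DelimitedTextWrapper(List<String> nativeLines) {
        this.nativeLines = nativeLines;
    }

    // Maps a domain-model attribute name to a native column position.
    private static int columnOf(String attribute) {
        if (attribute.equals("person")) return 0;
        if (attribute.equals("location")) return 1;
        throw new IllegalArgumentException("unknown attribute: " + attribute);
    }

    @Override
    public List<Map<String, String>> answer(DomainQuery query) {
        // (1) Decipher the incoming domain-level query.
        int column = columnOf(query.attribute);

        List<Map<String, String>> results = new ArrayList<>();
        for (String line : nativeLines) {
            // (2) Retrieve the matching information from the native source.
            String[] fields = line.split(";");
            if (fields.length != 2 || !fields[column].equals(query.value)) continue;

            // (3) Repackage the native record in domain-model terms.
            Map<String, String> row = new LinkedHashMap<>();
            row.put("person", fields[0]);
            row.put("location", fields[1]);
            results.add(row);
        }
        return results;
    }
}
```

A mediator would hold one such wrapper per source and send each the same domain-level query, so that only the wrapper needs to know the native format.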

Friday 29 January - research group meeting 2pm, DC1331

Speaker: Tim Snider

Snacks: Curtis Cartmill

Topic: I've heard "rumblings" that Tim plans a lively discussion on the topic of semi-structured data! He particularly requests that Frank, Marianno and Paul show up, and has also asked me to be there to ensure an open, non-defensive, calm environment.

Friday 5 February - research group meeting 2pm, DC1331

Speaker: Gord Cormack

Snacks: Tim Snider

Topic: I am going to talk about one or more of the following things:

1. How can we estimate information retrieval performance on an infinite collection, based on a sample? [This is related to, but not the same problem as, the one I spoke about before; that formulation was limited to large finite collections.]
2. How can we formulate 1-, 2-, or 3-term queries that outperform the 60-term queries that are the state of the art for probabilistic information retrieval?
3. How can we evaluate queries efficiently for very, very large corpora (say, 10^12 documents)? I'll talk about probabilistic methods as well as our own.

All of these are research in progress, for which I have some ideas but not a complete solution.

Friday 12 February - research group meeting 2pm, DC1331

Speaker: Frank Tompa

Snacks: Gord Cormack

Topic: On querying XML documents! Frank will be presenting some ideas he is developing for an XML query language.

Friday 19 February - research group meeting 2pm, DC1331

Speaker: Peter Bumbulis

Snacks: Frank Tompa

Topic: TBA

Friday 26 February - research group meeting 2pm, DC1331

Speaker: Mike Van Biesbrouck

Snacks: Peter Bumbulis

Topic: This talk will be an overview of the work that I am doing for my thesis: compiling GCL queries as if they were functional programs. I will give a short overview of the MultiText system so that I can explain what I am trying to do and why it is worthwhile. Some of the optimizations and code generation details will be discussed. If there is time, I will hypothesize about the benefits of using lazy evaluation instead of strict evaluation for the functional programs, something that I am not doing for my thesis.

Friday 5 March - research group meeting 2pm, DC1331

Speaker: Ani Nica

Snacks: Mike Van Biesbrouck

Topic: I will present some of the issues from the TODS paper (vol. 22, no. 1, March 1997, pages 43-74) "Outerjoin Simplification and Reordering for Query Optimization" by Cesar Galindo-Legaria and Arnon Rosenthal.

Friday 12 March - research group meeting 2pm, DC1331

Speaker: Huizhu Liu

Snacks: Ani Nica

Topic: This Friday, I will try to give a survey of information integration, or multi-database, projects. Businesses today need to access and combine data stored in diverse sources with differing capabilities, so information integration technologies are attracting more and more interest, and many groups are working on them. I will present this topic by discussing the various methods used in:

1) overall architecture
2) data modelling
3) source description
4) most importantly, query optimization

In particular, I will talk about the cost-based query optimization of the mediator in the Garlic project developed by IBM Almaden Lab and, if I have time, the two-phase query optimization in the DISCO project developed by INRIA.

Friday 19 March - research group meeting 2pm, DC1331

Speaker: Glenn Paulley

Snacks: Huizhu Liu

Topic: TBA

Friday 26 March - research group meeting 2pm, DC1331

Speaker: Christian Combaa

Snacks: Glenn Paulley

Topic: For the DB group talk this week, I will discuss the algorithm described in the paper

"Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm" by Isidore Rigoutsos (IBM Thomas J. Watson) and Aris Floratos (Courant Institute of Mathematical Sciences, NYU), published in _Bioinformatics_, vol. 14, no. 1 (1998).

TEIRESIAS finds all maximally specific, rigid patterns occurring at least a minimum number of times in a set of (biological) sequences. The authors argue that the algorithm runs in time quasi-linear in the size of the generated output.

I will compare TEIRESIAS to the data-mining algorithm Apriori (another IBM product), in terms of efficiency and applicability to biosequence data.
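To make "rigid pattern occurring at least a minimum number of times" concrete, here is a small Java sketch. It is not the TEIRESIAS algorithm and performs no pattern discovery: it only checks, by brute force, whether one given pattern meets a minimum support. The class and method names, the use of '.' as a wildcard matching exactly one character, and counting every (sequence, offset) occurrence as support are assumptions made for illustration.

```java
import java.util.List;

class RigidPatternSketch {
    // True if the pattern matches the text starting at offset;
    // '.' matches exactly one arbitrary character, so the pattern length is fixed (rigid).
    static boolean matchesAt(String pattern, String text, int offset) {
        if (offset < 0 || offset + pattern.length() > text.length()) return false;
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '.' && p != text.charAt(offset + i)) return false;
        }
        return true;
    }

    // Counts all (sequence, offset) occurrences of the pattern across the input set.
    static int support(String pattern, List<String> sequences) {
        int count = 0;
        for (String s : sequences) {
            for (int off = 0; off + pattern.length() <= s.length(); off++) {
                if (matchesAt(pattern, s, off)) count++;
            }
        }
        return count;
    }

    // True if the pattern occurs at least minSupport times.
    static boolean isFrequent(String pattern, List<String> sequences, int minSupport) {
        return support(pattern, sequences) >= minSupport;
    }
}
```

For the sequences "ACGTAC", "AGGTT" and "TTACG", the pattern "A.G" occurs once in each sequence, so its support under this counting is 3. A discovery algorithm must additionally enumerate candidate patterns and keep only the maximally specific ones, which this sketch does not attempt.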

Friday 9 April - research group meeting 2pm, DC1331

Speaker: Paul Ward

Snacks: Christian Combaa

Topic: Darrell Raymond's PhD thesis on Partial Order Databases

Friday 16 April - research group meeting 2pm, DC1331

Speaker: Arun Marathe

Snacks: Paul Ward

Topic: A practice presentation for a paper by Arun and Ken that will be presented at SIGMOD 1999.

Friday 23 April - research group meeting 2pm, DC1331

Speaker: Connie Zhang

Snacks: Arun Marathe

Topic: Transaction programs consist of read and write operations issued against the database. In a shared database system, one transaction program conflicts with another if it reads or writes data that the other transaction program has written. This thesis presents a semi-automatic technique for pairwise static conflict analysis of embedded transaction programs. The analysis predicts whether a given pair of programs will conflict when executed against the database.

There are several potential applications of this technique, the most obvious being transaction concurrency control in systems where it is not necessary to support arbitrary, dynamic queries and updates. By analyzing transactions in such systems before a transaction runs, it is possible to reduce or eliminate the need for locking or other dynamic concurrency control schemes.
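The conflict definition in this abstract can be illustrated with a small Java sketch. It is not the thesis technique: it assumes each program has already been summarized as a set of items it may read and a set it may write, and the item names (for example table names) and the classes `TxnProgram` and `ConflictSketch` are hypothetical. Because such summaries are typically over-approximations, a reported conflict means "may conflict".

```java
import java.util.Set;

// Hypothetical static summary of a transaction program.
final class TxnProgram {
    final String name;
    final Set<String> reads;   // items the program may read
    final Set<String> writes;  // items the program may write

    TxnProgram(String name, Set<String> reads, Set<String> writes) {
        this.name = name;
        this.reads = reads;
        this.writes = writes;
    }
}

class ConflictSketch {
    // True if program a reads or writes some item that program b writes.
    static boolean touchesWritesOf(TxnProgram a, TxnProgram b) {
        for (String item : b.writes) {
            if (a.reads.contains(item) || a.writes.contains(item)) return true;
        }
        return false;
    }

    // Pairwise check: the definition is applied in both directions.
    static boolean mayConflict(TxnProgram a, TxnProgram b) {
        return touchesWritesOf(a, b) || touchesWritesOf(b, a);
    }
}
```

Pairs for which `mayConflict` is false could run concurrently without locking against each other, which is the kind of application the abstract mentions. Two read-only programs never conflict under this definition.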

Friday 30 April - research group meeting 2pm, DC1331

Speaker: Ian Davis

Snacks: Connie Zhang

Topic: This talk will explore the implications of implementing a language (derived from the TRDBMS project) that allows semi-structured text to be converted into relations, for use within conventional database technology. The file structures used to support this implementation, and the data structures used within these file structures, will be explained. The algorithmic process needed to perform selection given a particular request in this text language will be explored, and if time permits I will move on to discuss the unforeseen significance/curse of the generalised birthday paradox when applied to retrieval of small volumes of information from huge indices.