Database Research Group Events

Winter 1999

Friday 8 January - research group meeting 2pm, DC1331
Speaker: Koji Ueda
Snacks: Anthony Cox
Topic: My current interest is GIS in distributed object environments (especially CORBA and Java). My research topic is how to transfer large numbers of geometric objects from the server side to the client side efficiently. The fastest (or nearly the fastest) way to do this is to open a TCP connection between the server and the client and send the data in bulk over that connection in a proprietary format. But this approach not only makes the program much more complicated, it also increases the mutual dependency between the server side and the client side.

Almost all database server products provide an interface like the following:

    Result result = database.query(condition);
    while (result.more())
        doSomething(result.get());

This interface is easy to use on the client side. But is it efficient? Usually the server product provides a stub library that client programs link against, and this stub performs caching, so the interface is efficient enough and has been used very widely. Things are different in a distributed object environment, because every method call is usually a remote call to the server side, not a local call into a client stub. How can we make this interface efficient? Currently I have two ideas:
1. Using Java RMI, dynamically download caching classes to the client side.
2. Using the object wrapper mechanism in VisiBroker (a CORBA product), insert caching code at the ORB level.
I'm currently implementing both solutions. Transfer time and data size will be evaluated and compared.
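As a rough picture of what client-side caching behind the same more()/get() interface could look like, here is a minimal Java sketch. It is not the speaker's RMI or VisiBroker implementation: `RemoteCursor`, `fetch`, `CachingResult` and `FakeCursor` are hypothetical names, and the in-memory `FakeCursor` only stands in for a remote server so that round trips can be counted. Each `fetch` call represents one remote method call, and the wrapper fetches objects in batches so that most more()/get() calls are answered locally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical remote interface: each call to fetch is one network round trip. */
interface RemoteCursor<T> {
    /** Returns up to max objects; an empty list means the result is exhausted. */
    List<T> fetch(int max);
}

/** Client-side wrapper that keeps the more()/get() interface but fetches in batches. */
class CachingResult<T> {
    private final RemoteCursor<T> cursor;
    private final int batchSize;
    private List<T> buffer = new ArrayList<>();
    private int pos = 0;
    private boolean exhausted = false;

    CachingResult(RemoteCursor<T> cursor, int batchSize) {
        this.cursor = cursor;
        this.batchSize = batchSize;
    }

    /** True if another object is available; refills the local cache only when it is empty. */
    boolean more() {
        if (pos < buffer.size()) return true;   // served from the local cache
        if (exhausted) return false;
        buffer = cursor.fetch(batchSize);       // one remote call per batch
        pos = 0;
        if (buffer.isEmpty()) exhausted = true;
        return !buffer.isEmpty();
    }

    /** Returns the next object from the cache. */
    T get() {
        if (!more()) throw new NoSuchElementException();
        return buffer.get(pos++);
    }
}

/** In-memory stand-in for a server; counts how many round trips the client makes. */
class FakeCursor implements RemoteCursor<Integer> {
    private final int total;
    private int next = 0;
    int roundTrips = 0;

    FakeCursor(int total) { this.total = total; }

    public List<Integer> fetch(int max) {
        roundTrips++;
        List<Integer> out = new ArrayList<>();
        while (out.size() < max && next < total) out.add(next++);
        return out;
    }
}

class CachingResultDemo {
    public static void main(String[] args) {
        FakeCursor server = new FakeCursor(1000);
        CachingResult<Integer> result = new CachingResult<>(server, 100);
        int count = 0;
        while (result.more()) { result.get(); count++; }
        System.out.println(count + " objects, " + server.roundTrips + " round trips");
        // prints "1000 objects, 11 round trips"
    }
}
```

With 1,000 objects and a batch size of 100 the loop makes 11 remote calls (10 full batches plus one empty fetch that signals the end), instead of roughly two remote calls per object if every more() and get() went to the server. The batch size trades the latency of the first object against the total number of round trips.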

Friday 15 January - research group meeting 2pm, DC1331
Speaker: Vitaliy Khizder
Snacks: Koji Ueda
Topic: Vitaliy will be talking about functional dependency constraints in description logics.

Friday 22 January - research group meeting 2pm, DC1331
Speaker: Curtis Cartmill
Snacks: Vitaliy Khizder
Title: What is Information Integration?
This week, I'll be talking about a topic I stumbled upon while looking through other AI fields: Information Integration! Although many Information Integration systems exist, I will use as examples systems such as SIMS and Ariadne, which were developed at the Information Sciences Institute of the University of Southern California. Information Integration differs from a multiple-database application in one important way: the types of information sources from which information is retrieved. Usually, much effort goes into designing efficient database tables from data models, so that we can easily locate the information we wish to explore or process further. One of the main problems in Information Integration is that we may want to query multiple heterogeneous information sources, such as databases, knowledge bases, and web pages, each with a different degree of structure in its data.
So how do we do it? A lot of work has gone into abstracting the problem to a higher level. Usually there is one domain model for the entire problem domain, and each information source has a conceptual model describing the type of information its data contains. Each information source is accessed through a 'wrapper'. The wrapper is mainly responsible for three tasks: (1) receiving and deciphering incoming queries (expressed in terms of the high-level domain model), (2) retrieving the correct information from the native data source (e.g. databases, unstructured text, web pages), and (3) repackaging the results before returning them.
This is a very diverse field of research, so we'll see how much I get through in a one-hour presentation. I plan to discuss a couple of papers on Information Integration, covering semi-automatic wrapper generation, modeling of domain and information-source ontologies, and query planning and execution.
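The three wrapper tasks can be made concrete with a small Java sketch. Everything in it is hypothetical and far simpler than SIMS or Ariadne: the `DomainQuery` format (a concept name plus attribute = value constraints), the `Wrapper` interface, and a `DelimitedTextWrapper` over '|'-separated text lines standing in for a native source. A mediator would send the same `DomainQuery` to every wrapper and combine their answers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical domain-model query: a concept plus attribute = value constraints. */
record DomainQuery(String concept, Map<String, String> constraints) {}

/** Every information source is reached through a wrapper with this (hypothetical) interface. */
interface Wrapper {
    List<Map<String, String>> answer(DomainQuery query);
}

/** Wraps a native source made of '|'-separated text lines. */
class DelimitedTextWrapper implements Wrapper {
    private final String concept;        // the domain concept this source can answer
    private final List<String> columns;  // domain attribute for each field position
    private final List<String> lines;    // the native data

    DelimitedTextWrapper(String concept, List<String> columns, List<String> lines) {
        this.concept = concept;
        this.columns = columns;
        this.lines = lines;
    }

    @Override
    public List<Map<String, String>> answer(DomainQuery query) {
        List<Map<String, String>> results = new ArrayList<>();
        // (1) Decipher: only answer queries about this concept and its known attributes.
        if (!query.concept().equals(concept)
                || !columns.containsAll(query.constraints().keySet())) {
            return results;
        }
        for (String line : lines) {
            String[] fields = line.split("\\|", -1);
            if (fields.length != columns.size()) continue; // skip malformed lines
            // (2) Retrieve: evaluate the constraints against the native fields.
            boolean matches = true;
            for (Map.Entry<String, String> c : query.constraints().entrySet()) {
                if (!fields[columns.indexOf(c.getKey())].equals(c.getValue())) {
                    matches = false;
                    break;
                }
            }
            if (!matches) continue;
            // (3) Repackage: return the record as domain attributes rather than raw text.
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < columns.size(); i++) row.put(columns.get(i), fields[i]);
            results.add(row);
        }
        return results;
    }
}

class WrapperDemo {
    public static void main(String[] args) {
        Wrapper papers = new DelimitedTextWrapper("Paper",
                List.of("title", "venue", "year"),
                List.of("Outerjoin Simplification and Reordering for Query Optimization|TODS|1997",
                        "Combinatorial pattern discovery in biological sequences|Bioinformatics|1998"));
        System.out.println(papers.answer(new DomainQuery("Paper", Map.of("venue", "TODS"))));
    }
}
```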
On a final note, the Information Integration 'wrapper' community has recently caught a glimpse of a potentially strong rival: XML. I'll also mention how XML's rich semantic information may help put 'wrapper' developers out of business.

Friday 29 January - research group meeting 2pm, DC1331
Speaker: Tim Snider
Snacks: Curtis Cartmill
Topic: I've heard "rumblings" that Tim plans a lively discussion on semi-structured data! He particularly requests that Frank, Marianno and Paul attend, and has also asked me to be there to ensure an open, calm, non-defensive environment.

Friday 5 February - research group meeting 2pm, DC1331
Speaker: Gord Cormack
Snacks: Tim Snider
Topic: I am going to talk about one or more of the following:
1. How can we estimate information retrieval performance on an infinite collection, based on a sample? [This is related to, but not the same as, the problem I spoke about before; that formulation was limited to large finite collections.]
2. How can we formulate 1-, 2-, or 3-term queries that outperform the 60-term queries that are the state of the art in probabilistic information retrieval?
3. How can we evaluate queries efficiently over very large corpora (say, 10^12 documents)?
I'll talk about probabilistic methods as well as our own. All of these are research in progress, for which I have some ideas but not a complete solution.

Friday 12 February - research group meeting 2pm, DC1331
Speaker: Frank Tompa
Snacks: Gord Cormack
Topic: On querying XML documents! Frank will be presenting some ideas he is developing for an XML query language.

Friday 19 February - research group meeting 2pm, DC1331
Speaker: Peter Bumbulis
Snacks: Frank Tompa
Topic: TBA

Friday 26 February - research group meeting 2pm, DC1331
Speaker: Mike Van Biesbrouck
Snacks: Peter Bumbulis
Topic: This talk will be an overview of the work that I am doing for my thesis: compiling GCL queries as if they were functional programs. I will give a short overview of the MultiText system so that I can explain what I am trying to do and why it is worthwhile. Some of the optimizations and code generation details will be discussed. If there is time, I will hypothesize about the benefits of using lazy evaluation instead of strict evaluation for the functional programs, something that I am not doing for my thesis.

Friday 5 March - research group meeting 2pm, DC1331
Speaker: Ani Nica
Snacks: Mike Van Biesbrouck
Topic: I will present some of the issues from the paper "Outerjoin Simplification and Reordering for Query Optimization" by Cesar Galindo-Legaria and Arnon Rosenthal (ACM TODS, vol. 22, no. 1, March 1997, pages 43-74).
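A standard form of outerjoin simplification can be checked on a toy example: when a selection above R LEFT OUTER JOIN S rejects the null-padded rows (for instance a comparison on an S column, which evaluates to unknown on nulls), the outer join can be replaced by an inner join. The Java sketch below is an illustration assumed for this listing, not code from the paper; the `Row` and `Joined` records and the nested-loop joins are hypothetical, and a Java `null` stands in for SQL's null padding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

record Row(int key, String value) {}

/** A join result; right == null marks a row padded with nulls by the outer join. */
record Joined(Row left, Row right) {}

class OuterjoinSimplification {
    /** R LEFT OUTER JOIN S ON R.key = S.key (nested loops). */
    static List<Joined> leftOuterJoin(List<Row> r, List<Row> s) {
        List<Joined> out = new ArrayList<>();
        for (Row a : r) {
            boolean matched = false;
            for (Row b : s) {
                if (a.key() == b.key()) {
                    out.add(new Joined(a, b));
                    matched = true;
                }
            }
            if (!matched) out.add(new Joined(a, null)); // preserve the unmatched left row
        }
        return out;
    }

    /** R INNER JOIN S ON R.key = S.key, in the same loop order as above. */
    static List<Joined> innerJoin(List<Row> r, List<Row> s) {
        List<Joined> out = new ArrayList<>();
        for (Row a : r)
            for (Row b : s)
                if (a.key() == b.key()) out.add(new Joined(a, b));
        return out;
    }

    static List<Joined> select(List<Joined> rows, Predicate<Joined> p) {
        List<Joined> out = new ArrayList<>();
        for (Joined j : rows) if (p.test(j)) out.add(j);
        return out;
    }

    public static void main(String[] args) {
        List<Row> r = List.of(new Row(1, "a"), new Row(2, "b"), new Row(3, "c"));
        List<Row> s = List.of(new Row(1, "x"), new Row(3, "y"));

        // Null-rejecting, like S.value = 'y' in SQL: never true on a null-padded row.
        Predicate<Joined> rejectsNull = j -> j.right() != null && j.right().value().equals("y");
        System.out.println(select(leftOuterJoin(r, s), rejectsNull)
                .equals(select(innerJoin(r, s), rejectsNull)));   // true: rewrite is safe

        // Not null-rejecting, like S.key IS NULL: the rewrite would change the answer.
        Predicate<Joined> keepsNull = j -> j.right() == null;
        System.out.println(select(leftOuterJoin(r, s), keepsNull)
                .equals(select(innerJoin(r, s), keepsNull)));     // false
    }
}
```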

Friday 12 March - research group meeting 2pm, DC1331
Speaker: Huizhu Liu
Snacks: Ani Nica
Topic: This Friday, I will try to give a survey of information integration, or multidatabase, projects. Businesses today need to access and combine data stored in diverse sources with differing capabilities, so information integration technologies are attracting more and more interest, and many groups are working on them. I will present the topic by discussing the methods used for:
1) overall architecture
2) data modelling
3) source description
4) most importantly, query optimization
In particular, I will talk about the cost-based query optimization performed by the mediator in the Garlic project, developed at IBM Almaden, and, if I have time, the two-phase query optimization in the DISCO project, developed at INRIA.

Friday 19 March - research group meeting 2pm, DC1331
Speaker: Glenn Paulley
Snacks: Huizhu Liu
Topic: TBA

Friday 26 March - research group meeting 2pm, DC1331
Speaker: Christian Combaa
Snacks: Glenn Paulley
Topic: For the DB group talk this week, I will discuss the algorithm described in the paper
"Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm" by Isidore Rigoutsos (IBM Thomas J. Watson Research Center) and Aris Floratos (Courant Institute of Mathematical Sciences, NYU), published in _Bioinformatics_, vol. 14, no. 1 (1998).
TEIRESIAS finds all maximally specific rigid patterns that occur at least a minimum number of times in a set of (biological) sequences. The authors argue that the algorithm runs in time quasi-linear in the size of the generated output.
I will compare TEIRESIAS with the data-mining algorithm Apriori (also from IBM) in terms of efficiency and applicability to biosequence data.
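To make "rigid pattern" and "minimum number of occurrences" concrete: a rigid pattern is a string of residues and '.' wildcards, each wildcard matching exactly one residue, so every occurrence has the same length. The Java sketch below only checks the support of one given pattern by brute force; it is not TEIRESIAS, whose point is to avoid enumerating and checking candidates this way. The class and method names are hypothetical, and both support counts (total occurrences and number of sequences) are shown because the choice of support measure is a parameter of the mining problem.

```java
import java.util.List;

class RigidPatternSupport {
    /** True if the pattern matches seq starting at offset; '.' matches any one residue. */
    static boolean matchesAt(String pattern, String seq, int offset) {
        if (offset + pattern.length() > seq.length()) return false;
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '.' && p != seq.charAt(offset + i)) return false;
        }
        return true;
    }

    /** Total number of (possibly overlapping) occurrences over all sequences. */
    static int occurrences(String pattern, List<String> seqs) {
        int count = 0;
        for (String seq : seqs)
            for (int off = 0; off + pattern.length() <= seq.length(); off++)
                if (matchesAt(pattern, seq, off)) count++;
        return count;
    }

    /** Number of distinct sequences containing at least one occurrence. */
    static int sequencesContaining(String pattern, List<String> seqs) {
        int count = 0;
        for (String seq : seqs) {
            for (int off = 0; off + pattern.length() <= seq.length(); off++) {
                if (matchesAt(pattern, seq, off)) { count++; break; }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> seqs = List.of("AKLMAGKL", "GAKQMA", "MAKLM", "ARLAKL");
        System.out.println(occurrences("A.L", seqs));          // 4
        System.out.println(sequencesContaining("A.L", seqs));  // 3
    }
}
```

Both counts can only grow when a pattern is made less specific (a residue replaced by '.' or trimmed from an end), which is the same kind of monotonicity that Apriori exploits for itemsets.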

Friday 9 April - research group meeting 2pm, DC1331
Speaker: Paul Ward
Snacks: Christian Combaa
Topic: Darrell Raymond's PhD thesis on Partial Order Databases

Friday 16 April - research group meeting 2pm, DC1331
Speaker: Arun Marathe
Snacks: Paul Ward
Topic: a practice talk for Arun and Ken's paper, which will be presented at SIGMOD 1999

Friday 23 April - research group meeting 2pm, DC1331
Speaker: Connie Zhang
Snacks: Arun Marathe
Topic: Transaction programs consist of read and write operations issued against the database. In a shared database system, one transaction program conflicts with another if it reads or writes data that the other transaction program has written. This thesis presents a semi-automatic technique for pairwise static conflict analysis of embedded transaction programs. The analysis predicts whether a given pair of programs will conflict when executed against the database.
There are several potential applications of this technique, the most obvious being transaction concurrency control in systems that do not need to support arbitrary, dynamic queries and updates. By analyzing transactions in such systems before they run, it is possible to reduce or eliminate the need for locking or other dynamic concurrency control schemes.
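The conflict condition can be stated over read and write sets. The Java sketch below assumes those sets have already been extracted from each program and named at table.column granularity; that is an assumption made for this example, whereas the thesis derives conflicts by static analysis of embedded programs, where such sets would be conservative over-approximations. The names `AccessSets` and `PairwiseConflict` are hypothetical.

```java
import java.util.Set;

/** Hypothetical read and write sets of one transaction program (named data items). */
record AccessSets(Set<String> reads, Set<String> writes) {}

class PairwiseConflict {
    static boolean intersects(Set<String> a, Set<String> b) {
        for (String item : a) if (b.contains(item)) return true;
        return false;
    }

    /** Two programs conflict if either one writes an item that the other reads or writes. */
    static boolean conflicts(AccessSets t1, AccessSets t2) {
        return intersects(t1.writes(), t2.reads())
            || intersects(t1.writes(), t2.writes())
            || intersects(t2.writes(), t1.reads());
    }

    public static void main(String[] args) {
        AccessSets transfer = new AccessSets(Set.of("account.balance"), Set.of("account.balance"));
        AccessSets report = new AccessSets(Set.of("account.balance", "customer.name"), Set.of());
        AccessSets audit = new AccessSets(Set.of("customer.name"), Set.of());
        System.out.println(conflicts(transfer, report)); // true: write vs. read of account.balance
        System.out.println(conflicts(report, audit));    // false: both only read
    }
}
```

A pair reported as non-conflicting could run concurrently without locks protecting them from each other; a pair reported as conflicting still needs some concurrency control.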

Friday 30 April - research group meeting 2pm, DC1331

Speaker: Ian Davis
Snacks: Connie Zhang
Topic: This talk will explore the implications of implementing a language (derived from the TRDBMS project) that allows semi-structured text to be converted into relations for use within conventional database technology. The file structures used to support this implementation, and the data structures used within these file structures, will be explained. The algorithmic process needed to perform selection for a given request in this text language will be explored, and if time permits I will go on to discuss the unforeseen significance (or curse) of the generalised birthday paradox when applied to retrieval of small volumes of information from huge indices.
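For reference, the classic (non-generalised) birthday computation shows why collisions appear much sooner than intuition suggests: with k items placed uniformly at random into n slots, the chance of at least one collision is 1 - (1 - 0/n)(1 - 1/n)...(1 - (k-1)/n), which passes 1/2 once k is on the order of the square root of n. This Java sketch only computes that textbook quantity; the uniform-placement assumption and the class name are illustrative, and the generalised form discussed in the talk is not reproduced here.

```java
class BirthdayBound {
    /** Probability that k items placed uniformly at random into n slots share at least one slot. */
    static double collisionProbability(int k, long n) {
        if (k > n) return 1.0; // pigeonhole: a collision is certain
        double noCollision = 1.0;
        for (int i = 0; i < k; i++) {
            noCollision *= 1.0 - (double) i / n; // item i must avoid the i slots already used
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        System.out.println(collisionProbability(23, 365));           // about 0.507
        System.out.println(collisionProbability(100_000, 1L << 32)); // roughly 0.69
    }
}
```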