Winter 1999
Friday 08 January - research group meeting
2pm, DC1331
-
Speaker: Koji Ueda
-
Snacks: Anthony Cox
Topic:
My current interest is GIS in a distributed-objects environment (especially
CORBA and Java). My research topic is how to transfer large numbers of geometric
objects from the server side to the client side efficiently. The (almost)
fastest way to do so is, of course, to establish a TCP connection between the
server and the client and to send the bulk of the data through that connection
in a proprietary format. But this method not only makes the program much more
complicated, it also increases the mutual dependency between the server side
and the client side. Almost all database server products provide an interface
like the following:

    Result result = database.query( condition );
    while( result.more() )
        doSomething( result.get() );

This interface is easy for the client side to use. But is it efficient?
Usually the server provides a stub library against which client programs
are linked, and this stub performs caching, so the interface is efficient
enough and has been used very commonly. But things are different in a
distributed-objects environment, because every method call is usually a
remote method call to the server side, not a local method call to the client
stub. How can we make this interface efficient? Currently I have two ideas
for solving this problem: 1. Using Java RMI, dynamically download caching
classes to the client side. 2. Using the object-wrapper mechanism in VisiBroker
(one of the CORBA products), insert caching code at the ORB level. I am currently
implementing these two solutions. Transfer time and data size will be evaluated
and compared.
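The batching idea behind both solutions can be sketched as follows. This is an illustrative sketch only, not code from either implementation: the Result and RemoteResult interfaces here are hypothetical stand-ins, and the wrapper simply fetches objects from the remote side in batches so that most more()/get() calls stay local.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side interface, mirroring the cursor style above.
interface Result {
    boolean more();
    Object get();
}

// Hypothetical remote interface: one round trip returns up to 'max'
// objects; an empty list signals that the result set is exhausted.
interface RemoteResult {
    List<Object> nextBatch(int max);
}

// Client-side caching wrapper: keeps the simple cursor interface but
// turns N remote calls into roughly N / batchSize remote calls.
class CachingResult implements Result {
    private final RemoteResult remote;
    private final int batchSize;
    private List<Object> cache = new ArrayList<>();
    private int pos = 0;
    private boolean exhausted = false;

    CachingResult(RemoteResult remote, int batchSize) {
        this.remote = remote;
        this.batchSize = batchSize;
    }

    public boolean more() {
        if (pos < cache.size()) return true;   // still inside cached batch
        if (exhausted) return false;
        cache = remote.nextBatch(batchSize);   // single remote call per batch
        pos = 0;
        exhausted = cache.isEmpty();
        return !exhausted;
    }

    public Object get() {
        return cache.get(pos++);               // purely local access
    }
}
```

Evaluating transfer time against batch size would then show where the trade-off between round trips and per-batch payload lies.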
Friday 15 January - research group meeting
2pm, DC1331
-
Speaker: Vitaliy Khizder
-
Snacks: Koji Ueda
Title: Vitaliy will be talking about functional dependency constraints
in description logics.
Friday 22 January - research group meeting
2pm, DC1331
-
Speaker: Curtis Cartmill
-
Snacks: Vitaliy Khizder
Title: What is Information Integration?
This week I'll be talking about a topic I stumbled upon while looking through
other AI fields: Information Integration! Although many different Information
Integration systems exist, I will use as examples systems such as
SIMS and Ariadne, which were developed at the Information Sciences Institute
at the University of Southern California. Information Integration differs from
a multiple-database application in one important way: the types of information
sources used to retrieve information. Usually, much effort is
put into designing efficient database tables from data models, so that
we can easily find the location of information that we wish to further
explore or process. One of the main problems in Information Integration
is that we may want to query multiple heterogeneous information sources,
such as databases, knowledge bases, and web pages, each with a different
level of structure in its data. So how do we do it? A lot of work has
gone into abstracting the problem to a higher level. Usually
there is one domain model for the entire problem domain, and each information
source has a conceptual model representing the type of information contained
within its data. Each information source is accessed through a 'wrapper'.
The wrapper is mainly responsible for three tasks: (1) receiving and deciphering
incoming queries (described in terms of the high-level domain model), (2) retrieving
the correct information from the native data source (e.g. databases, unstructured
text, web pages), and (3) repackaging the results before returning them
through the wrapper. This is a very diverse field of research, so
we'll see what I get through in a one-hour presentation. I was planning
to talk about a couple of papers on Information Integration,
which discuss semi-automatic wrapper generation, modelling domain and
information-source ontologies, and query planning and execution. On a final note of
interest, the Information Integration 'wrapper' community has now seen
a small glimpse of a possibly strong enemy: XML technology. I'll also mention
how XML's rich semantic information may help put 'wrapper' developers
out of business.
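The three wrapper tasks above can be sketched in code. This is my own illustration, not an interface from SIMS or Ariadne; all the type names here (DomainQuery, WebPageSource, and so on) are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Queries and answers phrased in the shared, high-level domain model.
record DomainQuery(String concept, String constraint) {}
record DomainObject(String concept, String value) {}

// Every source, whatever its native structure, is hidden behind this.
interface Wrapper {
    List<DomainObject> answer(DomainQuery q);
}

// A hypothetical native source that only understands keyword lookups.
interface WebPageSource {
    List<String> lookup(String keyword);
}

class WebPageWrapper implements Wrapper {
    private final WebPageSource source;
    WebPageWrapper(WebPageSource source) { this.source = source; }

    public List<DomainObject> answer(DomainQuery q) {
        String keyword = q.constraint();            // (1) decipher the domain-model query
        List<String> raw = source.lookup(keyword);  // (2) retrieve from the native source
        return raw.stream()                         // (3) repackage into domain-model terms
                  .map(s -> new DomainObject(q.concept(), s))
                  .collect(Collectors.toList());
    }
}
```

A mediator can then plan over a set of Wrapper instances without caring whether the source behind each one is a database, a knowledge base, or a web page.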
Friday 29 January - research group meeting
2pm, DC1331
-
Speaker: Tim Snider
-
Snacks: Curtis Cartmill
-
Topic: I've heard "rumblings" that Tim plans a lively discussion on the
topic of semi-structured data! He particularly requests that Frank, Marianno
and Paul show up, and has also asked me to be there to ensure an open,
non-defensive, calm environment.
Friday 5 February - research group meeting
2pm, DC1331
-
Speaker: Gord Cormack
-
Snacks: Tim Snider
Topic: I am going to talk about one or more of the following things:
1. How can we estimate information retrieval performance on an infinite
collection, based on a sample? [This is related to, but not the same problem
as, the one I spoke about before; that formulation was limited to large finite
collections.]
2. How can we formulate 1-, 2-, or 3-term queries that outperform the 60-term
queries that are the state of the art for probabilistic information
retrieval?
3. How can we evaluate queries efficiently for very, very large
corpora (say, 10^12 documents)?
I'll talk about probabilistic methods as well as our own. All of these are
research in progress, for which I have some ideas but not a complete solution.
Friday 12 February - research group meeting
2pm, DC1331
-
Speaker: Frank Tompa
-
Snacks: Gord Cormack
Topic: On querying XML documents! Frank will be presenting some ideas
he is developing for an XML query language.
Friday 19 February - research group meeting
2pm, DC1331
-
Speaker: Peter Bumbulis
-
Snacks: Frank Tompa
Topic: TBA
Friday 26 February - research group meeting
2pm, DC1331
-
Speaker: Mike Van Biesbrouck
-
Snacks: Peter Bumbulis
Topic: This talk will be an overview of the work that I am doing for
my thesis: compiling GCL queries as if they were functional programs. I
will give a short overview of the MultiText system so that I can explain
what I am trying to do and why it is worthwhile. Some of the optimizations
and code generation details will be discussed. If there is time, I will
hypothesize about the benefits of using lazy evaluation instead of strict
evaluation for the functional programs, something that I am not doing for
my thesis.
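The lazy-versus-strict distinction Mike mentions can be illustrated with a tiny sketch (my own, not MultiText's): a lazy cell delays computing its value until first demanded and then memoizes it, so subexpressions a query plan never touches are never evaluated.

```java
import java.util.function.Supplier;

// A memoizing thunk: the essence of lazy (call-by-need) evaluation.
class Lazy<T> {
    private Supplier<T> thunk;
    private T value;
    private boolean forced = false;

    Lazy(Supplier<T> thunk) { this.thunk = thunk; }

    // Compute on first demand, cache, and return the cached value thereafter.
    T force() {
        if (!forced) {
            value = thunk.get();
            forced = true;
            thunk = null;          // drop the closure once it is no longer needed
        }
        return value;
    }
}
```

Under strict evaluation, every subexpression of a compiled query would be computed up front; wrapping them in cells like this defers and shares the work instead.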
Friday 5 March - research group meeting
2pm, DC1331
-
Speaker: Ani Nica
-
Snacks: Mike Van Biesbrouck
Topic: I will present some of the issues from the TODS (vol 22, no1,
March 1997, pages 43-74) paper: "Outerjoin Simplification and Reordering
for Query Optimization" by Cesar Galindo-Legaria and Arnon Rosenthal.
Friday 12 March - research group meeting
2pm, DC1331
-
Speaker: Huizhu Liu
-
Snacks: Ani Nica
Topic: This Friday, I will try to give a survey of information integration,
or multi-database, projects. Businesses today need to access and combine
data stored in diverse sources with differing capabilities, so more and more
interest is being paid to information integration technologies. Many groups
are working on it. I will try to present this topic by discussing the various
methods used for: 1) overall architecture, 2) data modelling, 3) source
description, and 4) most importantly, query optimization. In particular,
I will talk about the cost-based query optimization of the mediator in the
Garlic project, developed at IBM's Almaden lab, and, if I have time, the
two-phase query optimization in the DISCO project, developed at INRIA.
Friday 19 March - research group meeting
2pm, DC1331
-
Speaker: Glenn Paulley
-
Snacks: Huizhu Liu
Topic: TBA
Friday 26 March - research group meeting
2pm, DC1331
-
Speaker: Christian Combaa
-
Snacks: Glenn Paulley
-
Topic: For the DB group talk this week, I will discuss the algorithm described
in the paper
-
"Combinatorial pattern discovery in biological sequences: the TEIRESIAS
algorithm" by Isidore Rigoutsos (IBM Thomas J. Watson) and Aris Floratos
(Courant Institute of Mathematical Sciences, NYU), published in _Bioinformatics_,
vol. 14, no. 1 (1998).
-
TEIRESIAS finds all maximally specific, rigid patterns occurring at least
a minimum number of times in a set of (biological) sequences. The authors
argue that the algorithm runs in time quasi-linear in the size of the generated
output.
-
I will compare TEIRESIAS to the data-mining algorithm Apriori (another
IBM product), in terms of efficiency and applicability to biosequence data.
Friday 9 April - research group meeting
2pm, DC1331
-
Speaker: Paul Ward
-
Snacks: Christian Combaa
-
Topic: Darrell Raymond's PhD thesis on Partial Order Databases
Friday 16 April - research group meeting
2pm, DC1331
-
Speaker: Arun Marathe
-
Snacks: Paul Ward
-
Topic: a practice presentation for a paper by Arun and Ken that will be
presented at SIGMOD 1999
Friday 23 April - research group meeting
2pm, DC1331
-
Speaker: Connie Zhang
-
Snacks: Arun Marathe
-
Topic: Transaction programs are composed of read and write operations
issued against the database. In a shared database system, one transaction
program conflicts with another if it reads or writes data that the other
program has written. This thesis presents a semi-automatic technique for
pairwise static conflict analysis of embedded transaction programs. The
analysis predicts whether a given pair of programs will conflict when
executed against the database.
-
There are several potential applications of this technique, the most obvious
being transaction concurrency control in systems where it is not necessary
to support arbitrary, dynamic queries and updates. By analyzing the
transactions in such systems before they run, it is possible to reduce or
eliminate the need for locking or other dynamic concurrency control schemes.
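The core conflict test can be sketched under a strong simplifying assumption: that each transaction program is summarized by the sets of data items it may read and may write. The thesis analyzes embedded programs far more precisely; this only illustrates the pairwise check itself.

```java
import java.util.Set;

// Simplified summary of a transaction program: its read and write sets.
class TxnProgram {
    final Set<String> reads;
    final Set<String> writes;
    TxnProgram(Set<String> reads, Set<String> writes) {
        this.reads = reads;
        this.writes = writes;
    }
}

class ConflictAnalysis {
    static boolean intersects(Set<String> a, Set<String> b) {
        for (String x : a) if (b.contains(x)) return true;
        return false;
    }

    // Conservative static test: report a conflict whenever a write of one
    // program may touch data the other program reads or writes.
    static boolean mayConflict(TxnProgram p, TxnProgram q) {
        return intersects(p.writes, q.reads)
            || intersects(p.writes, q.writes)
            || intersects(q.writes, p.reads);
    }
}
```

Pairs that this analysis proves conflict-free can then be scheduled concurrently without locking.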
Friday 30 April - research group meeting
2pm, DC1331
-
Speaker: Ian Davis
-
Snacks: Connie Zhang
-
Topic: This talk will explore the implications of implementing a language
(derived from the TRDBMS project) that allows semi-structured text to be
converted into relations, for use within conventional database technology.
The file structures used to support this implementation, and the data structures
used within these file structures, will be explained. The algorithmic
process needed to perform selection, given a particular request in this
text language, will be explored, and if time permits I will move on to discuss
the unforeseen significance/curse of the generalised birthday paradox when
applied to retrieval of small volumes of information from huge indices.
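For readers unfamiliar with the generalised birthday paradox, the following small calculation (my own illustration, not from the talk) shows why collisions become near-certain surprisingly quickly even when the space of possible values is huge; the relevance to small retrievals from huge indices is the talk's subject, not mine to anticipate.

```java
// Probability that k items drawn uniformly from n equally likely buckets
// include at least one collision: 1 - prod_{i=0}^{k-1} (n - i)/n.
class Birthday {
    static double collisionProbability(long n, int k) {
        double pNoCollision = 1.0;
        for (int i = 0; i < k; i++)
            pNoCollision *= (double) (n - i) / n;   // i-th item avoids the first i
        return 1.0 - pNoCollision;
    }
}
```

With 365 buckets, only 23 items already collide with probability just over one half; with a billion buckets, a hundred thousand items collide almost surely.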