Database Research Group Events

Fall 1998

Friday 09 October 1998 - research group meeting, 2pm, DC1331
Speaker: Connie Zhang
Snacks: Reem Al-Halimi
Topic: Finding Conflicts Among Parameterized Transaction Classes

Friday 16 October 1998 - seminar double header
Talk 1: 10:30am, DC1302
Speaker: Dan Ford, IBM Almaden Research Center
Title: Grand Central Station: Searching all digital sources of data
Grand Central Station is a system that extends search to all digital
sources of data.  The system consists of components to access and
understand data sources and generate searchable metadata ("Gatherers"), a
metadata repository for satisfying ad hoc queries, and an extensible
profiling system for processing persistent queries.

The Gatherer is an extensible crawler framework written in Java that is
capable of using a variety of protocols (e.g., http, ftp, nntp, odbc, cics,
pop3) to access and understand a wide range of data formats (HTML, Java
Bytecode, PowerPoint, TAR/Zip archives, and many others).  The Gatherer
generates summaries of each data source it encounters in an instance of XML
we call SumML (Summary Metalanguage).  Key features of the Gatherer are its
ability to be easily extended by adding protocol and data source specific
code, and its ability to run, unchanged, on any platform that supports Java.

The metadata repository is less advanced, but the Profiling framework is
moving forward to encompass multimedia profiling.

The system has been deployed in the form of a Java-specific search engine
called "jCentral" and is accessible from IBM's Java home page.

This talk will present and demonstrate the system, and discuss future
directions in searching video, images and audio.

Talk 2: 2:00pm, DC1304
Speaker: Roberta Cochrane, IBM Almaden Research Center
Title: Intersection Stacking for Multi-dimensional Aggregation in RDBMSs
Business Intelligence applications perform complex aggregation for
large amounts (typically 1 to 10 Terabytes) of data.  This puts
increasing demands on database systems to provide native support for
such processing, often referred to as Online Analytical Processing
(OLAP).  SQL has recently extended the group-by clause to provide
for common OLAP computations in the DBMS, allowing the
DBMS more flexibility in processing and optimizing such aggregation.
These computations are the data-cube, rollup, concatenations
of rollup (multi-dimensional cube), and combinations of ad-hoc
grouping elements. The specification of the group-by clause can expand
into many grouping sets.  For example, the cube alone will result in
2**n grouping sets where n is the number of grouping elements.  In this
talk I will present the SQL OLAP extensions and describe a novel
technique for stacking grouping operations.  Our technique results in
linear expansion of grouping sets, greatly reducing the amount of
complexity and resources required to optimize and compute such
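As a small illustration of the 2**n figure quoted above (not of the stacking technique itself, which this abstract does not detail), the following Python sketch enumerates the grouping sets implied by a cube and by a rollup. The function names and the column names are hypothetical.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every subset of the grouping columns (2**n grouping sets)."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


def rollup_grouping_sets(columns):
    """Return every prefix of the grouping columns (n + 1 grouping sets)."""
    return [tuple(columns[:size]) for size in range(len(columns), -1, -1)]


if __name__ == "__main__":
    columns = ["region", "product", "year"]  # hypothetical grouping elements
    print(len(cube_grouping_sets(columns)))  # 2**3 subsets
    print(rollup_grouping_sets(columns))     # 3 + 1 prefixes
```

Enumerating the cube this way grows exponentially in the number of grouping elements, which is the cost the abstract contrasts with a linear expansion.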

Friday 23 October 1998 - research group meeting, 2pm, DC1331
Speaker: Vlado Keselj
Snacks: Connie Zhang
Topic: Determining Text Databases to Search in the Internet
The Internet can be seen as a large document collection, i.e., as a kind
of large database.  However, since it is very dynamic, a better
approximation would be to treat it as a collection of databases.
Actually, it is a hierarchy of databases with smaller or local databases,
and larger or global databases based upon the smaller ones.  The larger
databases are not really databases, but interfaces to the local ones.

Some problems associated with this situation are:

  • when a user query comes to a global database, how do we select which local databases to query? (It is not efficient to query all of them.)
  • how many documents should be retrieved from each local database, and how should they be merged? (collection fusion problem)
These problems are treated in the vector space model.
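A minimal sketch of both problems in a vector space setting, assuming each local database is summarized by a single term-weight vector and that document scores returned by different databases are directly comparable. These assumptions, and all names below, are illustrative only; this is not the estimation method of the paper discussed in the talk.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_databases(query, representatives, k):
    """Pick the k local databases whose summary vector best matches the query."""
    ranked = sorted(
        representatives,
        key=lambda name: cosine(query, representatives[name]),
        reverse=True,
    )
    return ranked[:k]


def merge_results(result_lists, n):
    """Fuse per-database lists of (score, doc_id) into one top-n list by score."""
    all_hits = (hit for hits in result_lists for hit in hits)
    return heapq.nlargest(n, all_hits)


if __name__ == "__main__":
    # Hypothetical query and database summaries.
    query = {"database": 1.0, "index": 1.0}
    representatives = {
        "db_systems": {"database": 0.9, "index": 0.4},
        "cooking": {"recipe": 1.0},
        "ir": {"index": 0.8, "retrieval": 0.6},
    }
    print(select_databases(query, representatives, 2))
```

Only the selected databases are queried; their ranked results are then passed to merge_results to produce a single list.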

My presentation is based on a VLDB-98 paper by Meng et al., with the above

Friday 30 October 1998 - research group meeting, 2pm, DC1331
Speaker: Forbes Burkowski
Snacks: Vlado Keselj
Topic: Algorithms for Mining Association Rules for Binary Segmentations of Huge Categorical Databases
(a VLDB paper by Morimoto et al.)

Friday 06 November 1998 - research group meeting, 2pm, DC1331
Speaker: Peter Buhr
Snacks: Forbes Burkowski
Topic: Generalizing Database Access Methods
Data structure libraries, like LEDA and STL for C++, provide a toolkit for constructing both standard and experimental primary-memory data structures.

This talk will discuss preliminary work in building the equivalent of a data structure library but for secondary-memory data-structures, e.g., B-Tree, R-Tree, etc. Traditionally, secondary-memory data-structures are called access methods and each provides some specialized technique for quickly accessing related data on secondary storage.  Unfortunately, traditional access methods are usually hand-coded from scratch, normally requiring substantial knowledge of the underlying file system to build correctly and efficiently. The complexity in building new access methods impedes experimenting with new specialized data structures for accessing non-traditional data, e.g., text, images, highly-related data (CAD/CAM), etc.

Currently, we have generalized one group of access methods: search trees. The generalizations are the parts of a search tree that developers may need to specialize.  We have also created a small set of specialization components for each generalization, so a developer is not required to write all the components from scratch. By judiciously selecting components from the library and specializing only the components needed for a new algorithm or data structure, a developer can create and test a new access method significantly faster than with the traditional approach.

Friday 13 November 1998 - research group meeting, 2pm, DC1331
Speaker: Daniel Morales-German
Snacks: Peter Buhr
Topic: Database Techniques for the World Wide Web

Friday 20 November 1998 - research group meeting, 2pm, DC1331
Speaker: Hugh Chipman, Department of Statistics
Snacks: Daniel Morales-German
Topic: Tree models: Roots and Recent Branches

Friday 27 November 1998 - research group meeting, 2pm, DC1331
Speaker: Mariano Consens
Snacks: Ken Salem
Topic: TBA

Friday 04 December 1998 - seminar, 2:00pm, DC1304
Speaker: Renee Miller, University of Toronto
Title: Managing Heterogeneous Schemas and Data
Schematic heterogeneity arises when information that is represented as
data under one schema is represented within the schema (as metadata)
in another.  Schematic heterogeneity is an important class of
heterogeneity that arises frequently in integrating legacy data for
data warehousing applications.  Traditional query languages and view
mechanisms are insufficient for reconciling and translating data
between schematically heterogeneous schemas.  Higher order query
languages that permit quantification over schema labels have been
proposed to permit querying and restructuring of data between
schematically disparate schemas.  We extend this work by considering
how these languages can be used in practice with minimal extensions to
existing query processing engines.  Specifically, we consider the
problem of using higher order views to answer queries in a
heterogeneous environment.  We give conditions under which a higher
order view is usable for answering a query and provide query
translation algorithms.  We show how our solutions permit schema
browsing and new forms of data independence that are important for
global information systems.  This is ongoing work with Laura Haas
and the Garlic Heterogeneous Database group from IBM Almaden Research Labs.
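To make the notion of schematic heterogeneity concrete, here is a small Python sketch under assumed data: the same prices are stored once with the ticker as a data value and once with each ticker as a field name. The record layout, tickers, prices, and function name are hypothetical and do not come from the talk.

```python
def tickers_to_columns(rows):
    """Restructure (date, ticker, price) rows into one record per date,
    turning ticker values (data) into field names (schema labels)."""
    by_date = {}
    for date, ticker, price in rows:
        by_date.setdefault(date, {"date": date})[ticker] = price
    return [by_date[date] for date in sorted(by_date)]


if __name__ == "__main__":
    # Hypothetical rows: ticker is ordinary data in this layout.
    rows = [
        ("1998-12-01", "AAA", 10.0),
        ("1998-12-01", "BBB", 20.0),
        ("1998-12-02", "AAA", 11.0),
    ]
    # In the output, each ticker has become part of the schema.
    for record in tickers_to_columns(rows):
        print(record)
```

The set of output fields here depends on the input data, whereas a conventional relational query fixes its output columns in advance; that gap is what quantification over schema labels is meant to address.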

Monday 07 December 1998 - seminar, 10:30pm, DC1304
Speaker: M. Tamer Ozsu, University of Alberta
Title: Distributed Objectbase Management Systems, Multimedia, Interoperability
My current research concentrates on three areas: (1) development of distributed object database management
systems (ODBMS), (2) multimedia data management, and (3) interoperability of information systems.

My work on distributed ODBMS is concentrated around the development of the TIGUKAT system (the name means
"object" in Inuit).  TIGUKAT's object model is purely behavioral in nature and is uniform.  Every concept that
can be modeled in TIGUKAT is a first-class object with well-defined behavior. This gives the system extensibility.
Current work involves the development of a query language and its optimization, incorporation of the
temporal dimension into the object model, development of a programming language, and distribution of the system.

My multimedia research focuses on data management issues. We are developing an object-oriented multimedia
database system which can support SGML/HyTime compliant documents.  An associated project is the development of
a distributed image database system. Current research involves the design of a multimedia query model and
language, development of a visual query interface, and context-based indexing and access of images.

Interoperability research goes beyond multidatabase by considering the interoperability of information
systems in general. The approach is object-oriented and our focus is on the use of object-oriented
characteristics to deal with interoperability problems.

Friday 11 December 1998 - research group meeting, 2pm, DC1331
Speaker: Anthony Cox
Snacks: Mariano Consens
Topic: A Searchable Source Code Repository
This talk will describe a prototype source code repository implemented
using the Multitext system.  Issues regarding the system structure, data
collection and querying will be presented.

Wednesday 16 December 1998 - MMath thesis presentation, 10:30am, DC1331
Speaker: Winnie Min-Min Yeung
Title: Efficient Evaluation of Geometric Expressions with a Single Plane-Sweep

Wednesday 16 December 1998 - MMath thesis presentation, 2pm, DC1331
Speaker: Peter Yang
Title: PAPRICCA:  Pre-Analyzing, PRedIcate-Based Concurrency Control Algorithm

Friday 18 December 1998 - MMath thesis presentation, 2pm, DC1304
Speaker: Ming Lei
Title: Efficient Processing of Spatial Join Qualification