Database Research Group Events -- Fall 1998

Fall 1998

 
Friday 09 October 1998 - research group meeting,  2pm, DC1331
Speaker: Connie Zhang
Snacks: Reem Al-Halimi
Topic: Finding Conflicts Among Parameterized Transaction Classes


Friday 16 October 1998 - seminar double header
Talk 1: 10:30am, DC1302
Speaker: Dan Ford, IBM Almaden Research Center
Title: Grand Central Station: Searching all digital sources of data
Abstract:
Grand Central Station is a system that extends search to all digital
sources of data.  The system consists of components to access and
understand data sources and generate searchable metadata ("Gatherers"), a
metadata repository for satisfying ad hoc queries, and an extensible
profiling system for processing persistent queries.

The Gatherer is an extensible crawler framework written in Java that is
capable of using a variety of protocols (e.g., http, ftp, nntp, odbc, cics,
pop3) to access and understand a wide range of data formats (HTML, Java
Bytecode, PowerPoint, TAR/Zip archives, and many others).  The Gatherer
generates summaries of each data source it encounters in an instance of XML
we call SumML (Summary Metalanguage).  Key features of the Gatherer are its
ability to be easily extended by adding protocol and data source specific
code, and its ability to run, unchanged, on any platform that supports Java.

The metadata repository is less advanced, but the Profiling framework is
moving forward to encompass multimedia profiling.

The system has been deployed in the form of a Java specific search engine
called "jCentral" and is accessible from IBM's Java home page.

This talk will present and demonstrate the system, and discuss future
directions into searching video, image and audio.
 
Talk 2: 2:00pm, DC1304
Speaker: Roberta Cochrane, IBM Almaden Research Center
Title: Intersection Stacking for Multi-dimensional Aggregation in RDBMSs
Abstract:
Business Intelligence applications perform complex aggregation for
large amounts (typically 1 to 10 Terabytes) of data.  This puts
increasing demands on database systems to provide native support for
such processing, often referred to as Online Analytical Processing
(OLAP).  SQL has recently extended the group-by clause to provide
primitives for common OLAP computations in the DBMS, allowing the
DBMS more flexibility in processing and optimizing such aggregation.
These computations are the data-cube, rollup, concatenations
of rollup (multi-dimensional cube), and combinations of ad-hoc
grouping elements. The specification of the group-by clause can expand
into many grouping sets.  For example, the cube alone will result in
2**n grouping sets where n is the number of grouping elements.  In this
talk I will present the SQL OLAP extensions and describe a novel
technique for stacking grouping operations.  Our technique results in
linear expansion of grouping sets, greatly reducing the amount of
complexity and resources required to optimize and compute such
queries.
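
As a rough illustration of the expansion mentioned in the abstract, the sketch below enumerates the grouping sets implied by a cube and by a rollup. It is not the speaker's stacking technique; the function names and the example column names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every subset of the grouping columns: 2**n grouping sets."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


def rollup_grouping_sets(columns):
    """Return each prefix of the column list, down to the grand total: n + 1 sets."""
    return [tuple(columns[:size]) for size in range(len(columns), -1, -1)]


# Hypothetical grouping elements.
columns = ["region", "product", "year"]

for grouping_set in cube_grouping_sets(columns):
    print("cube:  ", grouping_set)
for grouping_set in rollup_grouping_sets(columns):
    print("rollup:", grouping_set)
```

With three grouping elements the cube yields 8 grouping sets and the rollup yields 4, which shows why a naive expansion of the cube grows exponentially with the number of grouping elements.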


Friday 23 October 1998 - research group meeting, 2pm, DC1331
Speaker: Vlado Keselj
Snacks: Connie Zhang
Topic: Determining Text Databases to Search in the Internet
Abstract:
The Internet can be seen as a large document collection, i.e., as a kind
of large database.  However, since it is very dynamic, a better
approximation would be to treat it as a collection of databases.
Actually, it is a hierarchy of databases with smaller or local databases,
and larger or global databases based upon the smaller ones.  The larger
databases are not really databases, but interfaces to the local ones.

Some problems associated with this situation are:

  • when a user query comes to a global database, how do we select which local databases to subquery? (It is not efficient to query all of them.)
  • how many documents should we retrieve from each local database, and how do we merge them? (collection fusion problem)

These problems are treated in the vector space model.

My presentation is based on a VLDB-98 paper by Meng et al., with the above
title.
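
To make the database selection problem concrete, here is a minimal sketch that ranks local databases by cosine similarity between a query and a per-database term-weight summary. This is a generic vector space illustration, not the estimation method of the paper; the summaries, database names, and function names are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_databases(query, summaries):
    """Rank local databases by similarity of their summary vector to the query."""
    query_vector = Counter(query.lower().split())
    scored = [(name, cosine(query_vector, vec)) for name, vec in summaries.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical summaries: one aggregated term-weight vector per local database.
summaries = {
    "db_sports": {"hockey": 3.0, "score": 2.0},
    "db_cs": {"database": 4.0, "query": 3.0},
}

for name, score in rank_databases("database query", summaries):
    print(name, round(score, 3))
```

A global database could forward the query only to the top-ranked local databases instead of querying all of them.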
 



Friday 30 October 1998 - research group meeting, 2pm, DC1331
Speaker: Forbes Burkowski
Snacks: Vlado Keselj
Topic: Algorithms for Mining Association Rules for Binary Segmentations of Huge Categorical Databases
                   (a VLDB paper by Morimoto et al) 

Friday 06 November 1998 - research group meeting, 2pm, DC1331
Speaker: Peter Buhr
Snacks: Forbes Burkowski
Topic: Generalizing Database Access Methods
Data structure libraries, like Leda/STL for C++, provide a toolkit for constructing both standard and experimental primary-memory data-structures.

This talk will discuss preliminary work in building the equivalent of a data structure library but for secondary-memory data-structures, e.g., B-Tree, R-Tree, etc. Traditionally, secondary-memory data-structures are called access methods and each provides some specialized technique for quickly accessing related data on secondary storage.  Unfortunately, traditional access methods are usually hand-coded from scratch, normally requiring substantial knowledge of the underlying file system to build correctly and efficiently. The complexity in building new access methods impedes experimenting with new specialized data structures for accessing non-traditional data, e.g., text, images, highly-related data (CAD/CAM), etc.

Currently, we have generalized one group of access methods: search trees. The generalizations are the parts of the search tree developers possibly need to specialize.  As well, we have created a small set of specialization components for each generalization so a developer is not required to write all the components from scratch. By judiciously selecting components from the library and only specializing components needed for a new algorithm/data-structure, a new access method can be created, and subsequently tested, significantly faster than via the traditional approach.


Friday 13 November 1998 - research group meeting, 2pm, DC1331

Speaker: Daniel Morales-German
Snacks: Peter Buhr
Topic: Database Techniques for the World Wide Web
      This Friday I will be talking about a paper that appeared recently in SIGMOD Record:

      Daniela Florescu, Alon Levy, Alberto Mendelzon, "Database Techniques
      for the World Wide Web: A Survey", SIGMOD Record, September 1998.

      The article describes the current research activities and directions
      in the area of databases and the WWW and focuses on three main areas:
      modeling and querying the web; information extraction and integration;
      and web site construction and restructuring.

      I will present an overview of this paper.


Friday 20 November 1998 - research group meeting, 2pm, DC1331

Speaker: Hugh Chipman, Department of Statistics
Snacks: Daniel Morales-German
Topic: Tree models: Roots and Recent Branches
      Trees provide a flexible and often interpretable way to model data.  By
      using one or more explanatory variables and a tree-structured set of
      questions, tree models divide a population into similar groups.  This
      talk will provide an introduction and overview of tree models,
      including CART (Breiman et al., 1984) and C4.5 (Quinlan, 1993).
      Topics will include tree construction and validation, and some more
      recent methods for the identification and selection of trees when many
      different trees may fit the data well.

      This topic relates to the VLDB paper discussed by Forbes Burkowski on Oct 30/98.


Friday 27 November 1998 - research group meeting, 2pm, DC1331

Speaker: Mariano Consens
Snacks: Ken Salem
Topic: TBA

 
 
Friday 04 December 1998 - seminar, 2:00pm, DC1304
Speaker: Renee Miller, University of Toronto
Title: Managing Heterogeneous Schemas and Data
Abstract:
Schematic heterogeneity arises when information that is represented as
data under one schema, is represented within the schema (as metadata)
in another.  Schematic heterogeneity is an important class of
heterogeneity that arises frequently in integrating legacy data for
data warehousing applications.  Traditional query languages and view
mechanisms are insufficient for reconciling and translating data
between schematically heterogeneous schemas.  Higher order query
languages, that permit quantification over schema labels, have been
proposed to permit querying and restructuring of data between
schematically disparate schemas.  We extend this work by considering
how these languages can be used in practice with minimal extensions to
existing query processing engines.  Specifically, we consider the
problem of using higher order views to answer queries in a
heterogeneous environment.  We give conditions under which a higher
order view is usable for answering a query and provide query
translation algorithms.  We show how our solutions permit schema
browsing and new forms of data independence that are important for
global information systems.  This is on-going work with Laura Haas
and the Garlic Heterogeneous Database group from IBM Almaden Research Labs.
 

Monday 07 December 1998 - seminar, 10:30am, DC1304
Speaker: M. Tamer Ozsu, University of Alberta
Title: Distributed  Objectbase  Management Systems, Multimedia, Interoperability
Abstract:
 
My current research concentrates on three areas: (1) development of
distributed object database management systems (ODBMS), (2) multimedia
data management, and (3) interoperability of information systems.

My work on distributed ODBMS is concentrated around the development of
TIGUKAT (means "object" in Inuit) system.  TIGUKAT's object model is
purely behavioral in nature and is uniform.  Every concept that can be
modeled in TIGUKAT is a first-class object with well-defined behavior.
This gives the system extensibility.  Current work involves the
development of a query language and its optimization, incorporation of
the temporal dimension into the object model, development of a
programming language, and distribution of the system.

My multimedia research focuses on data management issues.  We are
developing an object-oriented multimedia database system which can
support SGML/HyTime compliant documents.  An associated project is the
development of a distributed image database system.  Current research
involves the design of a multimedia query model and language,
development of a visual query interface and context-based indexing and
access of images.

Interoperability research goes beyond multidatabase by considering the
interoperability of information systems in general.  The approach is
object-oriented and our focus is on the use of object-oriented
characteristics to deal with interoperability problems.


Friday 11 December - research group meeting, 2pm, DC1331
Speaker: Anthony Cox
Snacks: Mariano Consens
Topic: A Searchable Source Code Repository
This talk will describe a prototype source code repository implemented
using the Multitext system.  Issues regarding the system structure, data
collection and querying will be presented.

Wednesday 16 December - MMath thesis presentation, 10:30am, DC1331
Speaker: Winnie Min-Min Yeung
Title: Efficient Evaluation of Geometric Expressions with a Single Plane-Sweep

Wednesday 16 December - MMath thesis presentation, 2pm, DC1331
Speaker: Peter Yang
Title: PAPRICCA:  Pre-Analyzing, PRedIcate-Based Concurrency Control Algorithm

Friday 18 December - MMath thesis presentation, 2pm, DC1304
Speaker: Ming Lei
Title: Efficient Processing of Spatial Join Qualification