Fall 1998
Friday 09 October 1998 - research group meeting,
2pm, DC1331
Speaker: Connie Zhang
Snacks: Reem Al-Halimi
Topic: Finding Conflicts Among Parameterized Transaction Classes
Friday 16 October 1998 - seminar double header
Talk 1: 10:30am, DC1302
Speaker: Dan Ford, IBM Almaden Research Center
Title: Grand Central Station: Searching all digital sources of data
Abstract:
Grand Central Station is a system that extends search to all digital
sources of data. The system consists of components to access and
understand data sources and generate searchable metadata ("Gatherers"), a
metadata repository for satisfying ad hoc queries, and an extensible
profiling system for processing persistent queries.
The Gatherer is an extensible crawler framework written in Java that is
capable of using a variety of protocols (e.g., http, ftp, nntp, odbc, cics,
pop3) to access and understand a wide range of data formats (HTML, Java
Bytecode, PowerPoint, TAR/Zip archives, and many others). The Gatherer
generates summaries of each data source it encounters in an instance of XML
we call SumML (Summary Metalanguage). Key features of the Gatherer are its
ability to be easily extended by adding protocol-specific and
data-source-specific code, and its ability to run, unchanged, on any
platform that supports Java.
The metadata repository is less advanced, but the profiling framework is
moving forward to encompass multimedia profiling.
The system has been deployed in the form of a Java-specific search engine
called "jCentral" and is accessible from IBM's Java home page.
This talk will present and demonstrate the system, and discuss future
directions into searching video, image and audio.
Talk 2: 2:00pm, DC1304
Speaker: Roberta Cochrane, IBM Almaden Research Center
Title: Intersection Stacking for Multi-dimensional Aggregation in RDBMSs
Abstract:
Business Intelligence applications perform complex aggregation over
large amounts (typically 1 to 10 Terabytes) of data. This puts
increasing demands on database systems to provide native support for
such processing, often referred to as Online Analytical Processing
(OLAP). SQL has recently extended the group-by clause to provide
primitives for common OLAP computations in the DBMS, allowing the
DBMS more flexibility in processing and optimizing such aggregation.
These computations are the data-cube, rollup, concatenations
of rollup (multi-dimensional cube), and combinations of ad-hoc
grouping elements. The specification of the group-by clause can expand
into many grouping sets. For example, the cube alone will result in
2**n grouping sets where n is the number of grouping elements. In this
talk I will present the SQL OLAP extensions and describe a novel
technique for stacking grouping operations. Our technique results in
linear expansion of grouping sets, greatly reducing the amount of
complexity and resources required to optimize and compute such
queries.
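The 2**n growth mentioned in the abstract can be illustrated with a short sketch. It only enumerates the grouping sets that a cube and a rollup stand for. It is not the intersection stacking technique of the talk, and the column names are hypothetical.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Every subset of the grouping columns: 2**n grouping sets."""
    return [subset
            for size in range(len(columns), -1, -1)
            for subset in combinations(columns, size)]


def rollup_grouping_sets(columns):
    """Every prefix of the grouping columns: n + 1 grouping sets."""
    return [tuple(columns[:size]) for size in range(len(columns), -1, -1)]


# Hypothetical grouping columns, chosen only for illustration.
columns = ["year", "region", "product"]
cube_sets = cube_grouping_sets(columns)
rollup_sets = rollup_grouping_sets(columns)
print(len(cube_sets), len(rollup_sets))  # prints "8 4"
```

With three grouping elements the cube already stands for eight grouping sets, including the empty set (the grand total), while the rollup stands for four.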
Friday 23 October 1998 - research group
meeting, 2pm, DC1331
Speaker: Vlado Keselj
Snacks: Connie Zhang
Topic: Determining Text Databases to Search in the Internet
Abstract:
The Internet can be seen as a large document collection, i.e., as a
kind of large database. However, since it is very dynamic, a better
approximation would be to treat it as a collection of databases.
More precisely, it is a hierarchy of databases, with smaller or local
databases, and larger or global databases built upon the smaller ones.
The larger databases are not really databases, but interfaces to the
local ones.
Some problems associated with this situation are:
- When a user query comes to a global database, how do we select which
local databases to subquery? (It is not efficient to query all of them.)
- How many documents should be retrieved from each local database, and how
should they be merged? (collection fusion problem)
These problems are treated in the vector space model.
My presentation is based on a VLDB-98 paper by Meng et al., with the
above title.
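The two problems above can be made concrete with a simplified vector space sketch. It is an assumption-laden illustration, not the method of Meng et al.: each local database is summarized by a single hypothetical term-weight vector, databases are ranked by cosine similarity to the query, and results are merged by raw score, which assumes that scores from different databases are comparable.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_databases(query, representatives, k):
    """Return the names of the k local databases most similar to the query."""
    ranked = sorted(representatives,
                    key=lambda name: cosine(query, representatives[name]),
                    reverse=True)
    return ranked[:k]


def merge_results(result_lists, limit):
    """Naive collection fusion: pool (doc, score) pairs and keep the best."""
    pooled = [pair for results in result_lists for pair in results]
    pooled.sort(key=lambda pair: pair[1], reverse=True)
    return pooled[:limit]


# Hypothetical database summaries and query, for illustration only.
representatives = {
    "sports_db": {"hockey": 0.9, "score": 0.4},
    "papers_db": {"query": 0.8, "index": 0.6},
    "news_db": {"hockey": 0.2, "query": 0.2, "election": 0.9},
}
query = {"query": 1.0, "index": 1.0}
chosen = select_databases(query, representatives, k=2)
merged = merge_results([[("d1", 0.9), ("d2", 0.3)], [("d7", 0.5)]], limit=2)
```

Here only the two best-matching databases would be subqueried, and the merged list keeps the two highest-scoring documents overall.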
Friday 30 October 1998 - research group
meeting, 2pm, DC1331
Speaker: Forbes Burkowski
Snacks: Vlado Keselj
Topic: Algorithms for Mining Association Rules for Binary Segmentations
of Huge Categorical Databases
(a VLDB paper by Morimoto et al.)
Friday 06 November 1998 - research group
meeting, 2pm, DC1331
Speaker: Peter Buhr
Snacks: Forbes Burkowski
Topic: Generalizing Database Access Methods
Data structure libraries, such as LEDA and the STL for C++, provide a
toolkit for constructing both standard and experimental primary-memory
data structures.
This talk will discuss preliminary work in building the equivalent of
a data structure library but for secondary-memory data structures, e.g.,
B-Tree, R-Tree, etc. Traditionally, secondary-memory data structures are
called access methods and each provides some specialized technique for
quickly accessing related data on secondary storage. Unfortunately,
traditional access methods are usually hand-coded from scratch, normally
requiring substantial knowledge of the underlying file system to build
correctly and efficiently. The complexity in building new access methods
impedes experimenting with new specialized data structures for accessing
non-traditional data, e.g., text, images, highly-related data (CAD/CAM),
etc.
Currently, we have generalized one group of access methods: search trees.
The generalizations are the parts of a search tree that developers may
need to specialize. As well, we have created a small set of specialization
components for each generalization so a developer is not required to write
all the components from scratch. By judiciously selecting components from
the library and specializing only the components needed for a new
algorithm or data structure, a new access method can be created, and
subsequently tested, significantly faster than via the traditional approach.
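The idea of a generalized search tree with replaceable components might be sketched as follows. This is a hypothetical, in-memory illustration only: it does not model secondary storage, and the class and function names are assumptions, not the interface of the library described in the talk. The one generalized part here is the key comparison, with a default component supplied and a specialization written by the developer.

```python
def default_compare(a, b):
    """Library-supplied component: natural ordering of keys."""
    return (a > b) - (a < b)


def case_insensitive_compare(a, b):
    """Developer-supplied specialization: order strings ignoring case."""
    return default_compare(a.lower(), b.lower())


class SearchTree:
    """Binary search tree skeleton whose ordering is a pluggable component."""

    def __init__(self, compare=default_compare):
        self._compare = compare
        self._root = None  # each node is [key, value, left, right]

    def insert(self, key, value):
        if self._root is None:
            self._root = [key, value, None, None]
            return
        node = self._root
        while True:
            order = self._compare(key, node[0])
            if order == 0:
                node[1] = value  # existing key: overwrite its value
                return
            child = 2 if order < 0 else 3
            if node[child] is None:
                node[child] = [key, value, None, None]
                return
            node = node[child]

    def search(self, key):
        node = self._root
        while node is not None:
            order = self._compare(key, node[0])
            if order == 0:
                return node[1]
            node = node[2 if order < 0 else 3]
        return None


# Only the comparison component is specialized; the tree code is reused.
tree = SearchTree(compare=case_insensitive_compare)
tree.insert("Waterloo", 1)
tree.insert("almaden", 2)
found = tree.search("WATERLOO")
```

Swapping the comparison changes the behaviour of the access method without touching the insertion or search code, which is the kind of reuse the abstract describes.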
Friday 13 November 1998 - research group
meeting, 2pm, DC1331
Speaker: Daniel Morales-German
Snacks: Peter Buhr
Topic: Database Techniques for the World Wide Web
This Friday I will be talking about a paper that appeared recently
in SIGMOD Record:
Daniela Florescu, Alon Levy, Alberto Mendelzon, "Database Techniques
for the World Wide Web: A Survey", SIGMOD Record, September 1998.
The article describes the current research activities and directions
in the area of databases and the WWW and focuses on three main areas:
modeling and querying the web; information extraction and integration;
and web site construction and restructuring.
I will present an overview of this paper.
Friday 20 November 1998 - research group
meeting, 2pm, DC1331
Speaker: Hugh Chipman, Department of Statistics
Snacks: Daniel Morales-German
Topic: Tree models: Roots and Recent Branches
Friday 27 November 1998 - research group
meeting, 2pm, DC1331
Speaker: Mariano Consens
Snacks: Ken Salem
Topic: TBA
Friday 04 December 1998 - seminar, 2:00pm,
DC1304
Speaker: Renee Miller, University of Toronto
Title: Managing Heterogeneous Schemas and Data
Abstract:
Schematic heterogeneity arises when information that is represented as
data under one schema is represented within the schema (as metadata)
in another. Schematic heterogeneity is an important class of
heterogeneity that arises frequently in integrating legacy data for
data warehousing applications. Traditional query languages and view
mechanisms are insufficient for reconciling and translating data
between schematically heterogeneous schemas. Higher-order query
languages, which permit quantification over schema labels, have been
proposed to permit querying and restructuring of data between
schematically disparate schemas. We extend this work by considering
how these languages can be used in practice with minimal extensions to
existing query processing engines. Specifically, we consider the
problem of using higher-order views to answer queries in a
heterogeneous environment. We give conditions under which a higher-order
view is usable for answering a query and provide query
translation algorithms. We show how our solutions permit schema
browsing and new forms of data independence that are important for
global information systems. This is ongoing work with Laura Haas
and the Garlic Heterogeneous Database group from IBM Almaden Research
Labs.
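A small sketch may help picture schematic heterogeneity. The relation, column names, and values below are hypothetical, and the code is ordinary Python rather than one of the higher-order query languages discussed in the talk. In the first layout the ticker is data; in the second the same tickers appear as column names, i.e. as part of the schema. The loop over column names plays the role that quantification over schema labels plays in a higher-order language.

```python
def data_to_schema(rows, key, label, value):
    """Restructure so that the values of `label` become column names."""
    out = {}
    for row in rows:
        out.setdefault(row[key], {key: row[key]})[row[label]] = row[value]
    return list(out.values())


def schema_to_data(rows, key, label, value):
    """Inverse restructuring: each non-key column name becomes a data value."""
    return [{key: row[key], label: column, value: cell}
            for row in rows
            for column, cell in row.items()
            if column != key]


# Hypothetical data: ticker symbols stored as data values.
prices = [
    {"date": "d1", "ticker": "A", "price": 10},
    {"date": "d1", "ticker": "B", "price": 20},
    {"date": "d2", "ticker": "A", "price": 11},
]
wide = data_to_schema(prices, "date", "ticker", "price")
back = schema_to_data(wide, "date", "ticker", "price")
```

A conventional query over the first layout can name `ticker` in a predicate, whereas the same question over the second layout has to range over column names, which is what ordinary query languages and view mechanisms cannot express.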
Monday 07 December 1998 - seminar, 10:30am,
DC1304
Speaker: M. Tamer Ozsu, University of Alberta
Title: Distributed Objectbase Management Systems, Multimedia,
Interoperability
Abstract:
My current research concentrates on three areas:
(1) development of distributed object database management
systems (ODBMS), (2) multimedia data management,
and (3) interoperability of information systems.
My work on distributed ODBMS is concentrated around the development
of the TIGUKAT system (the name means "object" in Inuit). TIGUKAT's
object model is purely behavioral in nature and is uniform. Every
concept that can be modeled in TIGUKAT is a first-class object with
well-defined behavior. This gives the system extensibility.
Current work involves the development of a query language and its
optimization, incorporation of the temporal dimension into the object
model, development of a programming language, and distribution of the
system.
My multimedia research focuses on data management issues. We are
developing an object-oriented multimedia database system which can
support SGML/HyTime compliant documents. An associated project is the
development of a distributed image database system. Current research
involves the design of a multimedia query model and language,
development of a visual query interface and context-based indexing
and access of images.
Interoperability research goes beyond multidatabase systems by
considering the interoperability of information systems in general.
The approach is object-oriented and our focus is on the use of
object-oriented characteristics to deal with interoperability problems.
Friday 11 December - research group meeting, 2pm, DC1331
Speaker: Anthony Cox
Snacks: Mariano Consens
Topic: A Searchable Source Code Repository
This talk will describe a prototype source code repository implemented
using the Multitext system. Issues regarding the system structure,
data collection and querying will be presented.
Wednesday 16 December - MMath thesis presentation,
10:30am, DC1331
Speaker: Winnie Min-Min Yeung
Title: Efficient Evaluation of Geometric Expressions with a Single
Plane-Sweep
Wednesday 16 December - MMath thesis presentation,
2pm, DC1331
Speaker: Peter Yang
Title: PAPRICCA: Pre-Analyzing, PRedIcate-Based Concurrency Control
Algorithm
Friday 18 December - MMath thesis presentation,
2pm, DC1304
Speaker: Ming Lei
Title: Efficient Processing of Spatial Join Qualification