Database Research Group Events

Winter 2001

Thesis defense: Friday, January 5th , 1:30 pm, DC1331
Candidate: Matthew Young-Lai
Title: Text Structure Recognition Using a Region Algebra

Thesis defense: Monday, January 8th , 1:00 pm, DC1331
Candidate: Arunprasad Marathe
Title: Query Processing Techniques for Arrays

DB meeting: Friday, January 12th , 2:00 pm, DC1331
Speaker: M. Tamer Özsu
Topic: This presentation will discuss Rethinking Database Architecture by Surajit Chaudhuri and Gerhard Weikum, which appeared at VLDB 2000.
Snacks: Matthew Young-Lai

X group: Friday, January 19th , 10:00 - 2:30, DC1331
Topic: Joint UofT / UW meeting to discuss research in the area of XML

There will be some 5-10 minute informal presentations to help understand what we at UW and UofT have been doing or are intending to do in this area, from which we hope to foster discussion.

Tentative schedule:

10:00 - 10:30
Overviews of UW and UofT activities and BUL proposals
10:30 - 12:00
Indexing, storage, performance
12:00 - 12:30
12:30 - 1:30
Modeling, constraints, integration
1:30 - 2:30
Queries and transformations

DB seminar: Monday, January 22, 11:00 AM, DC1304
Speaker: Paul Cotton, Microsoft Canada
Title: W3C XML Query WG: A Status Report

DB meeting: Friday, January 26th, 2:00 pm, DC1331
Speaker: Khuzaima Daudjee
Topic: Scalable Transaction Processing through Replication
Snacks: M. Tamer Özsu

DB meeting: Friday, February 2nd, 2:00 pm, DC1331
Speaker: ---
Topic: Discussion of new DB graduate courses.
Snacks: Khuzaima Daudjee

DB seminar: Monday, February 5th, 11:00 AM, DC1302
Speaker: Bettina Kemme, McGill University
Title: Consistent and Efficient Database Replication based on Group Communication 

DB meeting: Friday, February 9th, 2:00 pm, DC1331
Speaker: Glenn Paulley, iAnywhere Solutions (Sybase)
Topic: Normalization of Search Conditions in SQL Queries
Snacks: Ken Salem
Boolean expression simplification is a well-known problem in the history of Computer Science. The problem of determining prime implicates from an arbitrary Boolean expression has been mostly studied in the contexts of hardware design and automated reasoning. While many of the same principles can be applied to the simplification of search conditions in ANSI SQL queries, the richness of its language and SQL's three-valued logic present a number of challenges. We propose a modified version of a matrix-based normalization algorithm suitable for normalizing SQL search conditions in constrained-memory environments. In particular, we describe a set of tradeoffs that enable our algorithm to discover a useful set of prime implicates without requiring a complete conversion of the input condition to a normal form, preventing a combinatorial explosion in the number of terms.

DB meeting: Friday, February 16th, 2:00 pm, DC1331
Speaker: Charlie Clarke
Topic: Replication in Distributed Information Retrieval Systems
Snacks: Glenn Paulley

DB seminar: Wednesday, February 28th, 2:30PM, DC1302
Speaker: Peter Fankhauser, GMD-IPSI
Title: The XML Query Algebra: State and Challenges

DB meeting: Friday, March 2nd, 2:00 pm, DC1331
Speaker: Ivan Bowman
Topic: Fast and Loose: Weak Concurrency Controls, Arbitrary Sensitivity, and Other 
Dirty Secrets of Commercial Systems
Snacks: Charlie Clarke
Database transaction control mechanisms discussed in the research literature typically offer serializability of transactions.  This provides a very clean model, with strong consistency guarantees. In the world of commercial database systems, however, users are often willing to trade off weaker consistency guarantees for better performance.  ANSI SQL formalizes this desire by allowing users to specify isolation levels for the transactions they run. Unfortunately, the ANSI 'standard' leaves much to be desired in its formalization of these isolation levels [1], which do not apply easily to systems not based on locking.  Other researchers [2,3] have proposed formal models that can reason about the SQL isolation levels (with appropriate inference of details not specified by the  standard) and can also apply to non-standard implementations of transaction processing. I will discuss this formalization of isolation levels. Time permitting, I will discuss other correctness/performnce tradeoffs that are made in commercial systems, including the definition of ASENSITIVE (arbitrary sensitivity) cursor semantics.

[1] Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil, and Patrick O'Neil.
A Critique of ANSI SQL Isolation Levels.  In Proc. SIGMOD'95.

[2] Atul Adya, Barbara Liskov, and Patrick O'Neil. Generalized Isolation Level Definitions. In Proc. ICDE'2000.

[3] Atul Adya. Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions  Technical Report MIT/LCS/TR-786. M.I.T. Laboratory for Computer Science,  Cambridge, MA, March 1999. Ph.D. thesis.

DB meeting: Friday, March 9th, 2:00 pm, DC1304 (room  change!)
Speaker: Xuerong Tang
Topic: Structured Document Transformation
Snacks: Ivan Bowman
Structured document transformation is important for XML to serve as the universal data interchange format for the Web.  W3C's XSLT (like TXL) is a powerful transformation language, but it requires detailed programming.  As alternatives, SDT and TT grammars are widely used to specify transformations between programming languages. They have been proposed as descriptive specification techniques for structured document transformation, but they have limited expressive power.  In this talk, I will first  ntroduce and discuss these transformation languages and techniques. I will then propose a new approach based on syntax tree templates, which is both descriptive and expressive as a specification language.  We illustrate how a specification using this approach can be converted to XSLT for execution.

DB seminar: Monday, March 12th, 11:00AM
Speaker: Mitch Cherniack, Brandeis University
Title: User Profiles for Data Recharging

DB meeting: Friday, March 16th, 2:00 pm, DC1331
Speaker: Peter Bumbulis
Topic: I'll be talking about using space-filling curves for multi-dimensional indexing.  While the idea has been around since at least the early 80s there has been recent work on integrating it into a relational engine by the MISTRAL project,
Snacks: Xuerong Tang

DB seminar: Monday, March 19th, 10:30AM, DC1302
Speaker: Per-Ake (Paul) Larson, Microsoft Research
Title: Optimizing Queries Using Materialized Views

DB meeting: Friday, March 23rd, 2:00 pm, DC1331
Speaker: Bin Yao
Topic: Native XML Repository
Snacks: Peter Bumbulis
Since XML became an dominant standard for representing  data and documents on the World Wide Web, lots of  efforts have been made to address the issue of how to  efficiently store and query XML documents. A simple  method to deal with this issue is to store XML documents  in file systems, and parse them whenever needed (e.g.,  using DOM). A better way to resolve this issue is to  use existing mature technologies such as relational  DBMSs or even object-oriented DBMSs. Both of these approaches have advantages and disadvantages. This presentation focuses on an alternative promising  approach -- XML native repository, to deal with this  issue. Some existing related systems (e.g., Lore and  NATIX) are discussed.

DB seminar: Monday, March 26th, 11:00AM
Speaker: Sibel Adali, RPI
Title: A Configurable Application View Storage System

DB meeting: Friday, March 30th, 2:00 pm, DC1331
Speaker: Nick Cercone
Topic: Objects, Concepts, Attributes, Values, States, Actions, Activities, Cases, Causes, Intentions
What makes a mountain a mountain
Snacks: Bin Yao

DB meeting: Friday, April 6th, 2:00 pm, DC1331
Speaker: Matt Huras, IBM Canada
Topic: DB2 UDB System Level Architecture
Snacks: Ken Salem
Matt Huras, lead architect of the DB2 UDB data manager, will provide a 'tour' of the system-level architecture of DB2 Universal Database for UNIX, Windows and OS/2.  The talk will cover DB2's:
  • overall processing model
  • use of process and threads on the various operating systems
  • SMP and MPP exploitation model
  • storage management algorithms
  • recovery algorithms and protocol
  • data and index management algorithms
  • memory management
  • utility design
  • The talk is an adaptation of a presentation originally intended for system administrators with a thirst for a knowledge of how the system works under the covers, and how to configure it optimally.    As such it will contain a liberal 'sprinkling' of information about DB2's tuning 'knobs' and how to tune them.

    Special seminar: Monday, April 16th, 11:00AM, DC 1304
    Speaker: Joerg Sander, UBC
    Title: Hierarchical Clustering for Knowledge Discovery in Databases
    Increasingly large amounts of data, collected from various sources, are stored in databases. This fact has created a need for knowledge discovery and data mining methods, which are not only effective in supporting humans to analyze the data sets, but which are also efficient, i.e., scalable to very large amounts of data.  Analyzing the clustering structure, i.e., identifying meaningful subclasses in a data set is one of the prominent data mining methods.  Clustering can be used either as a stand-alone task or in combination with other data mining algorithms that further operate on detected  clusters.

    In the first part of the talk, I will give a detailed presentation of the recently developed clustering method OPTICS ("Ordering Points To Identify the Clustering Structure") that generates a compact representation of a data set, showing its (potentially hierarchical) clustering structure. Technically, OPTICS is based on region queries, which can be efficiently supported by spatial index structures such as R-trees - if the dimension of the data space is not too high. In very high-dimensional spaces, however, more sophisticated methods to speed-up the processing are needed. In the second part of the talk, I will introduce a technique to scale-up the OPTICS algorithm in these cases by utilizing a fast and effective "data squashing" method, which generates special sufficient statistics on which OPTICS can be based. This procedure yields an impressive speed-up at the cost of only a very small loss of accuracy.

    DB meeting: Friday, April 20th, 2:00 pm, DC1331
    Speaker: Frank Tompa
    Topic: Conceptual Modelling for Structured Document Databases
    Structured documents simultaneously represent two realities: conventional enterprise data and the expression of that data as text. E-R and similar data models have been well-developed for the first task, and grammars are often used for the second, but these two forms are not integrated. The modelling task is further complicated by the existence of multiple versions and variants of many documents, all of which must be maintained by the database system. How should we describe such databases? (Warning: I will offer no solutions.)
    Snacks: Ken Salem

    DB seminar: Monday, April 23rd, 11:00AM, DC1302
    Speaker: Alberto Mendelzon, University of Toronto
    Title: Data (and Links) on the Web

    DB meeting: Friday, April 27th, 10:00am, DC1304
    Speaker: Dan Ford, IBM Almaden Research Center
    Topic: Tempus Fugit ("Time Flies")
    Snacks: Frank Tompa
    Tempus Fugit is a research project at the IBM Almaden Research Center in San Jose, California.  The goal of the project is to create the first of the next generation of personal information management (PIM) systems.  A PIM system ncorporates an electronic calendar, "to-do" list and address book.  Current systems are simple, manually maintained, collections of static information.  A calendar entry, for instance, is little more than a date and time with an associated character string, similarly for to-do list items and address book entries.  The premise behind Tempus Fugit is that this information can be given richer semantic interpretation and automatically processed to make the users of the system much more effective and informed.

    For instance, when a user of the Tempus Fugit calendar schedules a business meeting, the system will automatically search for background information on the companies and meeting participants.  This includes things like brief summaries, phone numbers and the latest headline news.  In the future it will include information extracted from Enterprise Resource Planning (ERP) systems, stock trends, and gossip.  Entries in the Tempus Fugit to-do list differ from conventional systems, in that they are automatically processed to try to complete the task (if possible), or to provide the user with information to make it easier to complete the task on their own.  Similarly, Tempus Fugit address book entries contain information on the relationship between the user and the people identified by the recorded entries (e.g., "this is my mother").

    Tempus Fugit also includes the native ability to track the physical and virtual locations of users.  This information is integrated into all parts of the system.  For instance, meeting entries in the calendar provide the current known location of all participants (if available) and their likely participation and estimated time of arrival for the meeting.  It is also used in the address book where Tempus Fugit shows not only a person's phone numbers, but where they are (and so which number might be the best to use).

    A fundamental design requirement of Tempus Fugit is maximum support for mobile users.  These resource-starved users are seen as benefiting the most from a system automatically working in the background on their behalf.  In addition, Tempus Fugit also provides a vehicle for targeting specialized services for these users.  This makes it possible to transform mobile phones and PDA's into service delivery platforms targeted a different market segments (e.g., Real-Estate agents, Sales professionals, Truckers, etc.).  This support includes full WML/WAP access, location tracking, and automatically generated audio summaries of PIM information in a variety of formats (VoiceXML, MP3, WAV).

    The system is written in Java and runs on IBM's WebSphere Application Server (V3.5) and uses DB2 (V6.1) as it's database system.  It makes extensive use of Enterprise Java Beans(tm) (EJB's), Java Servlets, XML, XSLT and Simple Object Access Protocol (SOAP).  The architecture of the system features a core PIM system whose functionality is extended by external SOAP based "network services".  Tempus Fugit is designed from the ground up to be highly scalable. The system currently supports US and Canadian English, German, Finnish, Swedish and Spanish.