[Please remove <h1>]
Winter 2001
Thesis
defense: |
Friday, January 5th , 1:30 pm, DC1331 |
Candidate: |
Matthew Young-Lai |
Title: |
Text Structure Recognition Using a Region Algebra |
Thesis
defense: |
Monday, January 8th , 1:00 pm, DC1331 |
Candidate: |
Arunprasad Marathe |
Title: |
Query Processing Techniques for Arrays |
DB
meeting: |
Friday, January 12th , 2:00 pm, DC1331 |
Speaker: |
M. Tamer Özsu |
Topic: |
This presentation will discuss
Rethinking
Database Architecture by Surajit Chaudhuri and Gerhard Weikum, which
appeared at VLDB 2000. |
Snacks: |
Matthew Young-Lai |
X
group: |
Friday, January 19th , 10:00 - 2:30, DC1331 |
Topic: |
Joint UofT / UW meeting to discuss research in the area of XML
There will be some 5-10 minute informal presentations to help understand
what we at UW and UofT have been doing or are intending to do in this area,
from which we hope to foster discussion.
Tentative schedule:
-
10:00 - 10:30
-
Overviews of UW and UofT activities and BUL proposals
-
10:30 - 12:00
-
Indexing, storage, performance
-
12:00 - 12:30
-
Lunch
-
12:30 - 1:30
-
Modeling, constraints, integration
-
1:30 - 2:30
-
Queries and transformations
|
DB
meeting: |
Friday, January 26th, 2:00 pm, DC1331 |
Speaker: |
Khuzaima Daudjee |
Topic: |
Scalable Transaction Processing through Replication |
Snacks: |
M. Tamer Özsu |
DB
meeting: |
Friday, February 2nd, 2:00 pm, DC1331 |
Speaker: |
--- |
Topic: |
Discussion of new DB graduate courses. |
Snacks: |
Khuzaima Daudjee |
DB
meeting: |
Friday, February 9th, 2:00 pm, DC1331 |
Speaker: |
Glenn Paulley, iAnywhere Solutions (Sybase) |
Topic: |
Normalization of Search Conditions in SQL Queries |
Snacks: |
Ken Salem |
Abstract: |
|
Boolean expression simplification is a well-known problem in
the history of Computer Science. The problem of determining prime implicates
from an arbitrary Boolean expression has been mostly studied in the contexts
of hardware design and automated reasoning. While many of the same principles
can be applied to the simplification of search conditions in ANSI SQL queries,
the richness of its language and SQL's three-valued logic present a number
of challenges. We propose a modified version of a matrix-based normalization
algorithm suitable for normalizing SQL search conditions in constrained-memory
environments. In particular, we describe a set of tradeoffs that enable
our algorithm to discover a useful set of prime implicates without requiring
a complete conversion of the input condition to a normal form, preventing
a combinatorial explosion in the number of terms.
DB
meeting: |
Friday, February 16th, 2:00 pm, DC1331 |
Speaker: |
Charlie Clarke |
Topic: |
Replication in Distributed Information Retrieval Systems |
Snacks: |
Glenn Paulley |
DB
meeting: |
Friday, March 2nd, 2:00 pm, DC1331 |
Speaker: |
Ivan Bowman |
Topic: |
Fast and Loose: Weak Concurrency Controls, Arbitrary Sensitivity, and
Other
Dirty Secrets of Commercial Systems |
Snacks: |
Charlie Clarke |
Abstract: |
|
Database transaction control mechanisms discussed in the research
literature typically offer serializability of transactions. This
provides a very clean model, with strong consistency guarantees. In the
world of commercial database systems, however, users are often willing
to trade off weaker consistency guarantees for better performance.
ANSI SQL formalizes this desire by allowing users to specify isolation
levels for the transactions they run. Unfortunately, the ANSI 'standard'
leaves much to be desired in its formalization of these isolation levels
[1], which do not apply easily to systems not based on locking. Other
researchers [2,3] have proposed formal models that can reason about the
SQL isolation levels (with appropriate inference of details not specified
by the standard) and can also apply to non-standard implementations
of transaction processing. I will discuss this formalization of isolation
levels. Time permitting, I will discuss other correctness/performnce tradeoffs
that are made in commercial systems, including the definition of ASENSITIVE
(arbitrary sensitivity) cursor semantics.
[1] Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil,
and Patrick O'Neil.
A
Critique of ANSI SQL Isolation Levels. In Proc. SIGMOD'95.
[2] Atul Adya, Barbara Liskov, and Patrick O'Neil. Generalized
Isolation Level Definitions. In Proc. ICDE'2000.
[3] Atul Adya. Weak
Consistency: A Generalized Theory and Optimistic Implementations for Distributed
Transactions Technical Report MIT/LCS/TR-786. M.I.T. Laboratory
for Computer Science, Cambridge, MA, March 1999. Ph.D. thesis.
DB
meeting: |
Friday, March 9th, 2:00 pm, DC1304 |
Speaker: |
Xuerong Tang |
Topic: |
Structured Document Transformation |
Snacks: |
Ivan Bowman |
Abstract: |
|
Structured document transformation is important for XML to
serve as the universal data interchange format for the Web. W3C's
XSLT (like TXL) is a powerful transformation language, but it requires
detailed programming. As alternatives, SDT and TT grammars are widely
used to specify transformations between programming languages. They have
been proposed as descriptive specification techniques for structured document
transformation, but they have limited expressive power. In this talk,
I will first ntroduce and discuss these transformation languages
and techniques. I will then propose a new approach based on syntax tree
templates, which is both descriptive and expressive as a specification
language. We illustrate how a specification using this approach can
be converted to XSLT for execution.
DB
meeting: |
Friday, March 16th, 2:00 pm, DC1331 |
Speaker: |
Peter Bumbulis |
Topic: |
I'll be talking about using space-filling curves for multi-dimensional
indexing. While the idea has been around since at least the early
80s there has been recent work on integrating it into a relational engine
by the MISTRAL project, http://mistral.in.tum.de/. |
Snacks: |
Xuerong Tang |
DB
meeting: |
Friday, March 23rd, 2:00 pm, DC1331 |
Speaker: |
Bin Yao |
Topic: |
Native XML Repository |
Snacks: |
Peter Bumbulis |
Abstract: |
|
Since XML became an dominant standard for representing
data and documents on the World Wide Web, lots of efforts have been
made to address the issue of how to efficiently store and query XML
documents. A simple method to deal with this issue is to store XML
documents in file systems, and parse them whenever needed (e.g.,
using DOM). A better way to resolve this issue is to use existing
mature technologies such as relational DBMSs or even object-oriented
DBMSs. Both of these approaches have advantages and disadvantages. This
presentation focuses on an alternative promising approach -- XML
native repository, to deal with this issue. Some existing related
systems (e.g., Lore and NATIX) are discussed.
DB
meeting: |
Friday, March 30th, 2:00 pm, DC1331 |
Speaker: |
Nick Cercone |
Topic: |
Objects, Concepts, Attributes, Values, States, Actions, Activities,
Cases, Causes, Intentions
or
What makes a mountain a mountain |
Snacks: |
Bin Yao |
DB
meeting: |
Friday, April 6th, 2:00 pm, DC1331 |
Speaker: |
Matt Huras, IBM Canada |
Topic: |
DB2 UDB System Level Architecture |
Snacks: |
Ken Salem |
Abstract: |
|
Matt Huras, lead architect of the DB2 UDB data manager, will
provide a 'tour' of the system-level architecture of DB2 Universal Database
for UNIX, Windows and OS/2. The talk will cover DB2's:
overall processing model
use of process and threads on the various operating systems
SMP and MPP exploitation model
storage management algorithms
recovery algorithms and protocol
data and index management algorithms
memory management
utility design
The talk is an adaptation of a presentation originally intended for system
administrators with a thirst for a knowledge of how the system works under
the covers, and how to configure it optimally. As such
it will contain a liberal 'sprinkling' of information about DB2's tuning
'knobs' and how to tune them.
Special
seminar: |
Monday, April 16th, 11:00AM, DC 1304 |
Speaker: |
Joerg Sander, UBC |
Title: |
Hierarchical Clustering for Knowledge Discovery in Databases |
Abstract: |
|
Increasingly large amounts of data, collected from various
sources, are stored in databases. This fact has created a need for knowledge
discovery and data mining methods, which are not only effective in supporting
humans to analyze the data sets, but which are also efficient, i.e., scalable
to very large amounts of data. Analyzing the clustering structure,
i.e., identifying meaningful subclasses in a data set is one of the prominent
data mining methods. Clustering can be used either as a stand-alone
task or in combination with other data mining algorithms that further operate
on detected clusters.
In the first part of the talk, I will give a detailed presentation of
the recently developed clustering method OPTICS ("Ordering Points To Identify
the Clustering Structure") that generates a compact representation of a
data set, showing its (potentially hierarchical) clustering structure.
Technically, OPTICS is based on region queries, which can be efficiently
supported by spatial index structures such as R-trees - if the dimension
of the data space is not too high. In very high-dimensional spaces, however,
more sophisticated methods to speed-up the processing are needed. In the
second part of the talk, I will introduce a technique to scale-up the OPTICS
algorithm in these cases by utilizing a fast and effective "data squashing"
method, which generates special sufficient statistics on which OPTICS can
be based. This procedure yields an impressive speed-up at the cost of only
a very small loss of accuracy.
DB
meeting: |
Friday, April 20th, 2:00 pm, DC1331 |
Speaker: |
Frank Tompa |
Topic: |
Conceptual Modelling for Structured Document Databases
Structured documents simultaneously represent two realities: conventional
enterprise data and the expression of that data as text. E-R and similar
data models have been well-developed for the first task, and grammars are
often used for the second, but these two forms are not integrated. The
modelling task is further complicated by the existence of multiple versions
and variants of many documents, all of which must be maintained by the
database system. How should we describe such databases? (Warning: I will
offer no solutions.) |
Snacks: |
Ken Salem |
DB
meeting: |
Friday, April 27th, 10:00am, DC1304 |
Speaker: |
Dan Ford, IBM Almaden Research Center |
Topic: |
Tempus Fugit ("Time Flies") |
Snacks: |
Frank Tompa |
Abstract: |
|
Tempus Fugit is a research project at the IBM Almaden Research
Center in San Jose, California. The goal of the project is to create
the first of the next generation of personal information management (PIM)
systems. A PIM system ncorporates an electronic calendar, "to-do"
list and address book. Current systems are simple, manually maintained,
collections of static information. A calendar entry, for instance,
is little more than a date and time with an associated character string,
similarly for to-do list items and address book entries. The premise
behind Tempus Fugit is that this information can be given richer semantic
interpretation and automatically processed to make the users of the system
much more effective and informed.
For instance, when a user of the Tempus Fugit calendar schedules a business
meeting, the system will automatically search for background information
on the companies and meeting participants. This includes things like
brief summaries, phone numbers and the latest headline news. In the
future it will include information extracted from Enterprise Resource Planning
(ERP) systems, stock trends, and gossip. Entries in the Tempus Fugit
to-do list differ from conventional systems, in that they are automatically
processed to try to complete the task (if possible), or to provide the
user with information to make it easier to complete the task on their own.
Similarly, Tempus Fugit address book entries contain information on the
relationship between the user and the people identified by the recorded
entries (e.g., "this is my mother").
Tempus Fugit also includes the native ability to track the physical
and virtual locations of users. This information is integrated into
all parts of the system. For instance, meeting entries in the calendar
provide the current known location of all participants (if available) and
their likely participation and estimated time of arrival for the meeting.
It is also used in the address book where Tempus Fugit shows not only a
person's phone numbers, but where they are (and so which number might be
the best to use).
A fundamental design requirement of Tempus Fugit is maximum support
for mobile users. These resource-starved users are seen as benefiting
the most from a system automatically working in the background on their
behalf. In addition, Tempus Fugit also provides a vehicle for targeting
specialized services for these users. This makes it possible to transform
mobile phones and PDA's into service delivery platforms targeted a different
market segments (e.g., Real-Estate agents, Sales professionals, Truckers,
etc.). This support includes full WML/WAP access, location tracking,
and automatically generated audio summaries of PIM information in a variety
of formats (VoiceXML, MP3, WAV).
The system is written in Java and runs on IBM's WebSphere Application
Server (V3.5) and uses DB2 (V6.1) as it's database system. It makes
extensive use of Enterprise Java Beans(tm) (EJB's), Java Servlets, XML,
XSLT and Simple Object Access Protocol (SOAP). The architecture of
the system features a core PIM system whose functionality is extended by
external SOAP based "network services". Tempus Fugit is designed
from the ground up to be highly scalable. The system currently supports
US and Canadian English, German, Finnish, Swedish and Spanish.
Links:
-
EJB maker
This is a publicly available tool developed by the Tempus Fugit team
to create multiple Enterprise Java Beans from a single schema description.
The project uses it to automatically generate and maintain 50,000 lines
of Java(tm). It can be downloaded from IBM's alphaWorks website at
http://www.alphaworks.ibm.com/tech/ejbmaker
-
RB Manager
This is a publicly available tool developed by the Tempus Fugit team
to manage its National Language Support (NLS). It provides support
for consistent editing of multiple Java (tm) "resource bundles".
The tool lets the Tempus Fugit team support 6 different language version
with little overhead. It can be downloaded from IBM's alphaWorks
website at http://www.alphaworks.ibm.com/tech/rbmanager
-
"Harnessing internationalization" Jared Jackson, IBM Almaden Research
Center
-
"Abstracting the interface", Martin Gerlach, IBM Almaden Research Center.
This is an award winning paper available on IBM's developerWorks web
site that describes how we use XSLT to create the various interfaces for
Tempus Fugit.