Thesis defense: | Friday, January 5th , 1:30 pm, DC1331 |
Candidate: | Matthew Young-Lai |
Title: | Text Structure Recognition Using a Region Algebra |
Thesis defense: | Monday, January 8th , 1:00 pm, DC1331 |
Candidate: | Arunprasad Marathe |
Title: | Query Processing Techniques for Arrays |
DB meeting: | Friday, January 12th , 2:00 pm, DC1331 |
Speaker: | M. Tamer Özsu |
Topic: | This presentation will discuss Rethinking Database Architecture by Surajit Chaudhuri and Gerhard Weikum, which appeared at VLDB 2000. |
Snacks: | Matthew Young-Lai |
X group: | Friday, January 19th , 10:00 - 2:30, DC1331 |
Topic: | Joint UofT / UW meeting to discuss research in the area of XML
There will be some 5-10 minute informal presentations to help understand what we at UW and UofT have been doing or are intending to do in this area, from which we hope to foster discussion. Tentative schedule:
|
DB seminar: | Monday, January 22, 11:00 AM, DC1304 |
Speaker: | Paul Cotton, Microsoft Canada |
Title: | W3C XML Query WG: A Status Report |
DB meeting: | Friday, January 26th, 2:00 pm, DC1331 |
Speaker: | Khuzaima Daudjee |
Topic: | Scalable Transaction Processing through Replication |
Snacks: | M. Tamer Özsu |
DB meeting: | Friday, February 2nd, 2:00 pm, DC1331 |
Speaker: | --- |
Topic: | Discussion of new DB graduate courses. |
Snacks: | Khuzaima Daudjee |
DB seminar: | Monday, February 5th, 11:00 AM, DC1302 |
Speaker: | Bettina Kemme, McGill University |
Title: | Consistent and Efficient Database Replication based on Group Communication |
DB meeting: | Friday, February 9th, 2:00 pm, DC1331 |
Speaker: | Glenn Paulley, iAnywhere Solutions (Sybase) |
Topic: | Normalization of Search Conditions in SQL Queries |
Snacks: | Ken Salem |
Abstract: |
Boolean expression simplification is a well-known problem in the history of Computer Science. The problem of determining prime implicates from an arbitrary Boolean expression has been mostly studied in the contexts of hardware design and automated reasoning. While many of the same principles can be applied to the simplification of search conditions in ANSI SQL queries, the richness of its language and SQL's three-valued logic present a number of challenges. We propose a modified version of a matrix-based normalization algorithm suitable for normalizing SQL search conditions in constrained-memory environments. In particular, we describe a set of tradeoffs that enable our algorithm to discover a useful set of prime implicates without requiring a complete conversion of the input condition to a normal form, preventing a combinatorial explosion in the number of terms.
DB meeting: | Friday, February 16th, 2:00 pm, DC1331 |
Speaker: | Charlie Clarke |
Topic: | Replication in Distributed Information Retrieval Systems |
Snacks: | Glenn Paulley |
DB seminar: | Wednesday, February 28th, 2:30PM, DC1302 |
Speaker: | Peter Fankhauser, GMD-IPSI |
Title: | The XML Query Algebra: State and Challenges |
DB meeting: | Friday, March 2nd, 2:00 pm, DC1331 |
Speaker: | Ivan Bowman |
Topic: | Fast and Loose: Weak Concurrency Controls, Arbitrary Sensitivity, and
Other
Dirty Secrets of Commercial Systems |
Snacks: | Charlie Clarke |
Abstract: |
Database transaction control mechanisms discussed in the research literature typically offer serializability of transactions. This provides a very clean model, with strong consistency guarantees. In the world of commercial database systems, however, users are often willing to trade off weaker consistency guarantees for better performance. ANSI SQL formalizes this desire by allowing users to specify isolation levels for the transactions they run. Unfortunately, the ANSI 'standard' leaves much to be desired in its formalization of these isolation levels [1], which do not apply easily to systems not based on locking. Other researchers [2,3] have proposed formal models that can reason about the SQL isolation levels (with appropriate inference of details not specified by the standard) and can also apply to non-standard implementations of transaction processing. I will discuss this formalization of isolation levels. Time permitting, I will discuss other correctness/performnce tradeoffs that are made in commercial systems, including the definition of ASENSITIVE (arbitrary sensitivity) cursor semantics.[1] Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil, and Patrick O'Neil.
A Critique of ANSI SQL Isolation Levels. In Proc. SIGMOD'95.[2] Atul Adya, Barbara Liskov, and Patrick O'Neil. Generalized Isolation Level Definitions. In Proc. ICDE'2000.
[3] Atul Adya. Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions Technical Report MIT/LCS/TR-786. M.I.T. Laboratory for Computer Science, Cambridge, MA, March 1999. Ph.D. thesis.
DB meeting: | Friday, March 9th, 2:00 pm, DC1304 |
Speaker: | Xuerong Tang |
Topic: | Structured Document Transformation |
Snacks: | Ivan Bowman |
Abstract: |
Structured document transformation is important for XML to serve as the universal data interchange format for the Web. W3C's XSLT (like TXL) is a powerful transformation language, but it requires detailed programming. As alternatives, SDT and TT grammars are widely used to specify transformations between programming languages. They have been proposed as descriptive specification techniques for structured document transformation, but they have limited expressive power. In this talk, I will first ntroduce and discuss these transformation languages and techniques. I will then propose a new approach based on syntax tree templates, which is both descriptive and expressive as a specification language. We illustrate how a specification using this approach can be converted to XSLT for execution.
DB seminar: | Monday, March 12th, 11:00AM |
Speaker: | Mitch Cherniack, Brandeis University |
Title: | User Profiles for Data Recharging |
DB meeting: | Friday, March 16th, 2:00 pm, DC1331 |
Speaker: | Peter Bumbulis |
Topic: | I'll be talking about using space-filling curves for multi-dimensional indexing. While the idea has been around since at least the early 80s there has been recent work on integrating it into a relational engine by the MISTRAL project, http://mistral.in.tum.de/. |
Snacks: | Xuerong Tang |
DB seminar: | Monday, March 19th, 10:30AM, DC1302 |
Speaker: | Per-Ake (Paul) Larson, Microsoft Research |
Title: | Optimizing Queries Using Materialized Views |
DB meeting: | Friday, March 23rd, 2:00 pm, DC1331 |
Speaker: | Bin Yao |
Topic: | Native XML Repository |
Snacks: | Peter Bumbulis |
Abstract: |
Since XML became an dominant standard for representing data and documents on the World Wide Web, lots of efforts have been made to address the issue of how to efficiently store and query XML documents. A simple method to deal with this issue is to store XML documents in file systems, and parse them whenever needed (e.g., using DOM). A better way to resolve this issue is to use existing mature technologies such as relational DBMSs or even object-oriented DBMSs. Both of these approaches have advantages and disadvantages. This presentation focuses on an alternative promising approach -- XML native repository, to deal with this issue. Some existing related systems (e.g., Lore and NATIX) are discussed.
DB seminar: | Monday, March 26th, 11:00AM |
Speaker: | Sibel Adali, RPI |
Title: | A Configurable Application View Storage System |
DB meeting: | Friday, March 30th, 2:00 pm, DC1331 |
Speaker: | Nick Cercone |
Topic: | Objects, Concepts, Attributes, Values, States, Actions, Activities,
Cases, Causes, Intentions
or What makes a mountain a mountain |
Snacks: | Bin Yao |
DB meeting: | Friday, April 6th, 2:00 pm, DC1331 |
Speaker: | Matt Huras, IBM Canada |
Topic: | DB2 UDB System Level Architecture |
Snacks: | Ken Salem |
Abstract: |
Matt Huras, lead architect of the DB2 UDB data manager, will provide a 'tour' of the system-level architecture of DB2 Universal Database for UNIX, Windows and OS/2. The talk will cover DB2's:The talk is an adaptation of a presentation originally intended for system administrators with a thirst for a knowledge of how the system works under the covers, and how to configure it optimally. As such it will contain a liberal 'sprinkling' of information about DB2's tuning 'knobs' and how to tune them.overall processing model use of process and threads on the various operating systems SMP and MPP exploitation model storage management algorithms recovery algorithms and protocol data and index management algorithms memory management utility design
Special seminar: | Monday, April 16th, 11:00AM, DC 1304 |
Speaker: | Joerg Sander, UBC |
Title: | Hierarchical Clustering for Knowledge Discovery in Databases |
Abstract: |
Increasingly large amounts of data, collected from various sources, are stored in databases. This fact has created a need for knowledge discovery and data mining methods, which are not only effective in supporting humans to analyze the data sets, but which are also efficient, i.e., scalable to very large amounts of data. Analyzing the clustering structure, i.e., identifying meaningful subclasses in a data set is one of the prominent data mining methods. Clustering can be used either as a stand-alone task or in combination with other data mining algorithms that further operate on detected clusters.In the first part of the talk, I will give a detailed presentation of the recently developed clustering method OPTICS ("Ordering Points To Identify the Clustering Structure") that generates a compact representation of a data set, showing its (potentially hierarchical) clustering structure. Technically, OPTICS is based on region queries, which can be efficiently supported by spatial index structures such as R-trees - if the dimension of the data space is not too high. In very high-dimensional spaces, however, more sophisticated methods to speed-up the processing are needed. In the second part of the talk, I will introduce a technique to scale-up the OPTICS algorithm in these cases by utilizing a fast and effective "data squashing" method, which generates special sufficient statistics on which OPTICS can be based. This procedure yields an impressive speed-up at the cost of only a very small loss of accuracy.
DB meeting: | Friday, April 20th, 2:00 pm, DC1331 |
Speaker: | Frank Tompa |
Topic: | Conceptual Modelling for Structured Document Databases
Structured documents simultaneously represent two realities: conventional enterprise data and the expression of that data as text. E-R and similar data models have been well-developed for the first task, and grammars are often used for the second, but these two forms are not integrated. The modelling task is further complicated by the existence of multiple versions and variants of many documents, all of which must be maintained by the database system. How should we describe such databases? (Warning: I will offer no solutions.) |
Snacks: | Ken Salem |
DB seminar: | Monday, April 23rd, 11:00AM, DC1302 |
Speaker: | Alberto Mendelzon, University of Toronto |
Title: | Data (and Links) on the Web |
DB meeting: | Friday, April 27th, 10:00am, DC1304 |
Speaker: | Dan Ford, IBM Almaden Research Center |
Topic: | Tempus Fugit ("Time Flies") |
Snacks: | Frank Tompa |
Abstract: |
Tempus Fugit is a research project at the IBM Almaden Research Center in San Jose, California. The goal of the project is to create the first of the next generation of personal information management (PIM) systems. A PIM system ncorporates an electronic calendar, "to-do" list and address book. Current systems are simple, manually maintained, collections of static information. A calendar entry, for instance, is little more than a date and time with an associated character string, similarly for to-do list items and address book entries. The premise behind Tempus Fugit is that this information can be given richer semantic interpretation and automatically processed to make the users of the system much more effective and informed.For instance, when a user of the Tempus Fugit calendar schedules a business meeting, the system will automatically search for background information on the companies and meeting participants. This includes things like brief summaries, phone numbers and the latest headline news. In the future it will include information extracted from Enterprise Resource Planning (ERP) systems, stock trends, and gossip. Entries in the Tempus Fugit to-do list differ from conventional systems, in that they are automatically processed to try to complete the task (if possible), or to provide the user with information to make it easier to complete the task on their own. Similarly, Tempus Fugit address book entries contain information on the relationship between the user and the people identified by the recorded entries (e.g., "this is my mother").
Tempus Fugit also includes the native ability to track the physical and virtual locations of users. This information is integrated into all parts of the system. For instance, meeting entries in the calendar provide the current known location of all participants (if available) and their likely participation and estimated time of arrival for the meeting. It is also used in the address book where Tempus Fugit shows not only a person's phone numbers, but where they are (and so which number might be the best to use).
A fundamental design requirement of Tempus Fugit is maximum support for mobile users. These resource-starved users are seen as benefiting the most from a system automatically working in the background on their behalf. In addition, Tempus Fugit also provides a vehicle for targeting specialized services for these users. This makes it possible to transform mobile phones and PDA's into service delivery platforms targeted a different market segments (e.g., Real-Estate agents, Sales professionals, Truckers, etc.). This support includes full WML/WAP access, location tracking, and automatically generated audio summaries of PIM information in a variety of formats (VoiceXML, MP3, WAV).
The system is written in Java and runs on IBM's WebSphere Application Server (V3.5) and uses DB2 (V6.1) as it's database system. It makes extensive use of Enterprise Java Beans(tm) (EJB's), Java Servlets, XML, XSLT and Simple Object Access Protocol (SOAP). The architecture of the system features a core PIM system whose functionality is extended by external SOAP based "network services". Tempus Fugit is designed from the ground up to be highly scalable. The system currently supports US and Canadian English, German, Finnish, Swedish and Spanish.
Links: