Database Research Group Events

Fall 2004

Note: Events of interest to the Database Research Group are posted to the uw.cs.database newsgroup and are mailed to the dbgroup mailing lists: db-faculty (for DB group faculty), db-grads (for DB group graduate students), and db-friends (for DB group alumni, visitors, and friends). If you wish to subscribe to one of these lists, send mail to majordomo@db with "subscribe <list>" in the message body, where <list> is the list you wish to subscribe to.  For example, use "subscribe db-friends" to subscribe to the db-friends list. To unsubscribe, send "unsubscribe <list>" to the same address.
DB group meetings
The DB group meets most Friday afternoons at 2pm, usually in DC1331. See the list of current events for times and locations of upcoming meetings. Each meeting lasts for an hour and features an informal presentation by one of the members of the group. Everyone is welcome to attend. These talks are intended to raise questions and stimulate discussion rather than to serve as polished presentations of research results. Speakers are determined using a rotating speaker list, which can be found on the DB group meeting page.
DB seminar series
The DB seminar series features visiting speakers. These seminars are more-or-less monthly, and are usually scheduled on Monday mornings at 11am. See the list of current events for times and locations of upcoming seminars. The full schedule can be found on the DB seminar series page.

Recent and Upcoming Events

DB meeting: Friday, September 17, 2:00pm, DC1331
Topic: Kickoff meeting.

DB Seminar: Monday, September 20, 11:00 am, DC1304
Speaker: C. Mohan, IBM Almaden Research Lab
Title: DBCache: A Project on Database Caching Support for Web Applications

DB meeting: Friday, September 24, 2:00pm, DC1331
Speaker: Grant Weddell
Title: The DLF Dialects of Description Logic
Abstract: Description logics (DLs) are an important family of modal logics that have been used to formally capture object-relational schema and UML class diagrams. They also underlie recent efforts to define the Web Ontology Language (OWL), a W3C standard currently under development for capturing structured knowledge of web services. In this talk, I'll give an overview of DLs and their relevance to the database community, with a particular focus on dialects that incorporate features or attributes. I'll conclude with an additional overview of recent work on incorporating functional dependencies in these dialects.

DB meeting: Friday, October 1, 2:00pm, DC1331
Speaker: David DeHaan
Title: Views-on-Views in SQL Server

Indexed (a.k.a. materialized) views are used to speed up query processing of data-intensive queries, and most commercial relational database systems now provide some level of support for them. Like indices over base relations, materialized views are used transparently by the query optimizer to speed up queries written over the vocabulary of base relations or logical views. To minimize the online overhead of maintaining a view extent, materialized view definitions in SQL databases are commonly restricted to a class of simple aggregation expressions over joined base relations (called SPJG, or Select-Project-Join-Groupby) because efficient incremental maintenance strategies are known to exist for this class. Systems that support more complex view definitions typically limit them to offline maintenance.

In this talk I will describe work done this summer with Paul Larson to extend SQL Server to support "views-on-views", which are SPJG indexed views that reference other indexed views as if they were base relations. This class of views is interesting because it maintains properties of incremental maintenance, yet it increases the expressive power of the view language to encompass several types of common non-SPJG queries. After exploring some of the advantages of supporting views-on-views, I will describe at a high level the modifications needed to the SQL Server query optimizer to utilize views-on-views for query processing. Finally, I will briefly explore how allowing indexed views to contain Outer Join would extend the power of the view language even further.
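
As a rough illustration of the idea of incremental maintenance for a view defined on another view, here is a minimal Python sketch. It is not SQL Server's mechanism and is not taken from the talk: the SumView class, the sales rows, and the region/city grouping are all hypothetical, and only SUM aggregates under insertions are modelled.

```python
from collections import defaultdict

class SumView:
    """Hypothetical materialized GROUP BY ... SUM view, maintained incrementally."""

    def __init__(self, key_fn):
        self.key_fn = key_fn            # maps an input row to its group key
        self.totals = defaultdict(int)  # group key -> running SUM
        self.dependents = []            # views defined on top of this view

    def apply_delta(self, row, amount):
        # Adjust only the affected group; nothing is recomputed from scratch.
        key = self.key_fn(row)
        self.totals[key] += amount
        # Propagate the delta to views that treat this view as a base relation.
        for view in self.dependents:
            view.apply_delta(key, amount)

# View over the base relation: total sales per (region, city).
by_city = SumView(lambda row: (row["region"], row["city"]))
# View on a view: total sales per region, computed from by_city's group keys.
by_region = SumView(lambda key: key[0])
by_city.dependents.append(by_region)

def insert_sale(row):
    """Insert one (hypothetical) sales row and maintain both views."""
    by_city.apply_delta(row, row["amount"])

insert_sale({"region": "ON", "city": "Waterloo", "amount": 10})
insert_sale({"region": "ON", "city": "Toronto", "amount": 5})
insert_sale({"region": "BC", "city": "Victoria", "amount": 7})
```

Each insertion touches one group in each view, so the cost of maintenance is independent of the size of the base relation. Handling deletions properly would also require a per-group row count, which this sketch omits.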

DB Seminar: Monday, October 4, 4:30pm, DC1350
Speaker: Jim Gray, Microsoft Bay Area Research Center
Title: On-Line Science: The World-Wide Telescope as a Prototype for the New Computational Science

DB meeting: Friday, October 15, 2:00pm, DC1331
Speaker: Jane Xiao
Title: An Overview of DL Dialects DLFAD and DLFDreg

Description Logics (DLs) play an important role in many applications, such as software engineering, medical informatics, digital libraries, web-based information systems, and databases. In databases, for example, query rewriting, duplicate elimination, and many other query optimization operations can be abstracted as DL logical implication problems. In this talk, I will give an overview of the DL dialects DLFAD and DLFDreg. These dialects were proposed by David Toman and Grant Weddell in their papers "Attribute Inversion in Description Logics with Path Functional Dependencies" and "On Reasoning about Structural Equality in XML: A Description Logic Approach".

The main contributions of DLFAD are: (1) it provides both attribute inversion and path functional dependencies in the dialect; (2) it proves that in the general case the implication problem for DLFAD is undecidable, by reducing an unrestricted tiling problem to a DLFAD implication problem; (3) it proves that the implication problem is decidable under a coherence condition, by reducing it to the satisfiability problem for Ackermann formulae.

The main property of DLFDreg is that it uses regular expressions to express functional dependencies by possibly infinite sets of feature paths. This can be used to reason about structural equality in XML. The paper also proves that the implication problem for DLFDreg is decidable, by reducing it to the satisfiability problem of Datalogns with negation.

I will also discuss the relationship between DLFAD and DLFDreg, as well as some possible extensions based on them.

DB Seminar: Monday, October 18, 11:00 am, DC1304
Speaker: Gustavo Alonso, ETH Zürich
Title: Database replication for commodity database services

Seminar: Tuesday, October 19, 11:00am, DC1304
Speaker: Brian Cooper, Georgia Institute of Technology
Title: Using information retrieval techniques to route queries in an InfoBeacons network
Abstract: We present the InfoBeacons system, in which a peer-to-peer network of beacons cooperates to route queries to the best information sources. The routing in our system uses techniques adapted from information retrieval. We examine routing at two levels. First, each beacon is assigned several sources and routes queries to those sources. Many sources are unwilling to provide more cooperation than simple searching, and we must adapt traditional information retrieval techniques to choose the best sources despite this lack of cooperation. Second, beacons route queries to other beacons using techniques similar to those for routing queries to sources. We examine alternative architectures for routing queries between beacons. Results of experiments using a beacon network to search 1,000 information sources demonstrate how our techniques can be used to efficiently route queries; for example, our techniques require contacting up to 70 percent fewer sources than random walk techniques.

DB meeting: Friday, October 22, 2:00pm, DC1331
Speaker: Xuhui Li
Title: The Physical Data Placement Problem in Shared-Disk Relational Database Systems

Magnetic disks are still bottlenecks of today's database systems. To improve disk I/O efficiency and hence system performance, two physical data placement schemes, data declustering and data clustering, are used by database system administrators. Data objects, such as tables, indexes and materialized views, can be striped across multiple disks in order to maximize intra-object I/O parallelism (declustering), or can be isolated into separate disks or disk sets in order to maximize inter-object I/O parallelism (clustering).

In this presentation, we will introduce our study of the performance impact of these different data placement schemes. In our study, we examined two representative data layouts, partitioning and full-striping. We studied some important aspects of disk I/O, such as disk seeks and disk read-ahead, and how they affect I/O efficiency under the two data layouts. We also studied how a few environmental factors, such as the number of concurrent queries and the degree of I/O parallelism, affect the relative performance of the different data layouts. Our experiments show that physical data placement has a significant impact on disk I/O efficiency and hence system performance. The relative performance of different data layouts depends heavily on the workload and other environmental factors.
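
To make the two layouts concrete, the following Python sketch maps the blocks of each data object to disk numbers under full striping and under partitioning. It is only an illustration: the function names, the object sizes, and the round-robin assignment are assumptions made here, not details of the study.

```python
def full_striping(objects, num_disks):
    """Decluster: spread every object's blocks round-robin over all disks."""
    layout = {}
    for name, num_blocks in objects.items():
        layout[name] = [block % num_disks for block in range(num_blocks)]
    return layout

def partitioning(objects, num_disks):
    """Cluster: place each object entirely on one disk (objects assigned round-robin)."""
    layout = {}
    for i, (name, num_blocks) in enumerate(objects.items()):
        layout[name] = [i % num_disks] * num_blocks
    return layout

def disks_touched(layout, name):
    """Return the sorted list of disks that a full scan of one object would read."""
    return sorted(set(layout[name]))

objects = {"orders": 8, "lineitem": 8}  # hypothetical object sizes, in blocks
striped = full_striping(objects, num_disks=4)
separate = partitioning(objects, num_disks=4)

print(disks_touched(striped, "orders"))   # every disk serves part of the scan
print(disks_touched(separate, "orders"))  # a single disk serves the whole scan
```

Under striping, a scan of one object can use all disks in parallel, but two concurrent scans compete for the same disks. Under partitioning, each scan is limited to one disk, but scans of different objects do not interfere with each other.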

DB meeting: Friday, October 29, 2:00pm, DC1331
Speaker: Ken Salem
Topic: I plan to discuss view-based database access controls, with relational and XML examples. I'll be drawing material from a couple of papers from this year's SIGMOD conference:
  • Shariq Rizvi, Alberto O. Mendelzon, S. Sudarshan, Prasan Roy: Extending Query Rewriting Techniques for Fine-Grained Access Control. SIGMOD Conference 2004: 551-562
  • Wenfei Fan, Chee Yong Chan, Minos N. Garofalakis: Secure XML Querying with Security Views. SIGMOD Conference 2004: 587-598

DB meeting: Friday, November 5, 2:00pm, DC1331
Speaker: Dan Farrar
Title: Randomized Databases and Query Sets

As the complexity of database systems, and in particular of query languages, continues to grow, it is becoming increasingly labour-intensive to develop tests that adequately cover the functionality of these systems. Such testing is vital for commercial and open-source products to ensure correctness and stability. An effective (and cost-effective) solution is to stochastically generate database instances and query workloads, and test their operation. These techniques are also useful for evaluating the performance of systems and algorithms, especially in query processing.

I will give an overview of existing methods for generating random databases and queries, focusing on the generation of SQL statements. I will discuss our experiences at iAnywhere with two types of randomized database and query generation, and compare papers published by Oracle and Microsoft describing their experiences with similar methods.

DB meeting: Friday, November 12, 2:00pm, DC1331
Speaker: Frank Tompa
Topic: I will present a proposal for storing and querying archived data, as presented primarily in the following paper:

DB meeting: Friday, November 19, 2:00pm, DC1331
Speaker: Ning Zhang
Title: BlossomTree: Evaluating XPaths in FLWOR Expressions

Efficient evaluation of path expressions has been studied extensively. However, evaluating more complex FLWOR expressions that contain multiple path expressions has not been well studied. In this talk, I will present a pattern matching approach, called BlossomTree, to evaluating a FLWOR expression that contains correlated path expressions. BlossomTree is a formalism to capture the semantics of the path expressions and their correlations.

We propose a general algebraic framework (abstract data types and logical operators) to evaluate BlossomTree pattern matching that facilitates efficient evaluation and experimentation. We design efficient data structures and algorithms to implement the abstract data types and logical operators. Our experimental studies demonstrate that the BlossomTree approach can generate highly efficient query plans in different environments.

DB Seminar: Monday, November 22, 11:00 am, DC1304
Speaker: Moshe Vardi, Rice University
Title: A Call to Regularity

Seminar: Tuesday, November 23, 10:30am, DC1304
Speaker: Peter Patel-Schneider, Bell Labs Research
Title: What is OWL (and why should I give a hoot)?

OWL is the new ontology language produced by the W3C Web Ontology Working Group. As such, it is poised to be a major formalism for the design and dissemination of ontology information, particularly in the Semantic Web, a part of the World-Wide Web. OWL has influences from several communities, including the RDF community, the Description Logic community, and the frame community. These influences have resulted in a wide variety of requirements on OWL, several of which appear to be conflicting. OWL contains innovative solutions to several of these apparent conflicts but other conflicts have meant that it has not been possible to satisfy all the desired requirements for OWL.

In this talk I will describe the design and development of OWL concentrating on what makes OWL important, the relationship of OWL to other efforts, the innovative solutions that were required in its design, and the impact of the conflicting requirements on OWL.

DB meeting: Friday, November 26, 2:00pm, DC1331
Speaker: Gord Cormack
Title: SPAM Filter Evaluation at TREC

The purpose of TREC is to provide standard procedures and archival data for the evaluation of the effectiveness of tools for various information retrieval tasks. For TREC 2005 one such task will be spam filtering. A spam filter is an on-line binary classifier that identifies each incoming email message as spam or not.

Measuring the effectiveness of spam filters presents several challenges. The standard of accuracy required for acceptable performance is quite high. The trade-off between false positives (rejected legitimate messages) and false negatives (accepted spam) must be evaluated. The learning characteristics of the filter must be measured. Privacy considerations present a major challenge in constructing an archival test collection.

In this talk I will present the preliminary guidelines and evaluation methods for the 2005 TREC spam evaluation task, for which I am coordinator.
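
The trade-off between false positives and false negatives mentioned above can be illustrated with a short Python sketch. This is not the TREC evaluation method: the "ham"/"spam" labels, the error_rates function, and the sample data are assumptions made purely for illustration.

```python
def error_rates(gold, predicted):
    """Return (false positive rate, false negative rate) for a spam filter.

    A false positive is a legitimate ("ham") message classified as spam;
    a false negative is a spam message classified as ham.
    """
    pairs = list(zip(gold, predicted))
    false_pos = sum(1 for g, p in pairs if g == "ham" and p == "spam")
    false_neg = sum(1 for g, p in pairs if g == "spam" and p == "ham")
    num_ham = sum(1 for g in gold if g == "ham")
    num_spam = sum(1 for g in gold if g == "spam")
    # Guard against empty classes so the sketch never divides by zero.
    fp_rate = false_pos / num_ham if num_ham else 0.0
    fn_rate = false_neg / num_spam if num_spam else 0.0
    return fp_rate, fn_rate

# Hypothetical judgments for six messages, in arrival order.
gold = ["ham", "ham", "ham", "ham", "spam", "spam"]
predicted = ["ham", "spam", "ham", "ham", "spam", "ham"]
fp_rate, fn_rate = error_rates(gold, predicted)
print(fp_rate, fn_rate)
```

Reporting the two rates separately, rather than a single accuracy figure, keeps the cost of rejecting legitimate mail visible.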

DB meeting: Friday, December 3, 2:00pm, DC1331
Speaker: Ani Nica
Title: Bloom filters: a survey of their usage in the database management systems

This talk is intended to be a survey of the use of Bloom filters in database management systems. The talk includes topics such as the integration of cost-based placement of Bloom filters into a query optimizer, the use of Bloom filters to improve the execution of access plans, and Bloom filter representations of sets of cached queries. In particular, I will talk about the use of Bloom filters in SQL Anywhere Studio, a product of iAnywhere Solutions.
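
For readers unfamiliar with the data structure, here is a minimal Python sketch of a Bloom filter. It is a generic textbook version, not the SQL Anywhere implementation: the class name, the default sizes, and the use of SHA-256 to derive bit positions are all assumptions made for illustration.

```python
import hashlib

class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))

# Hypothetical use in a join: build the filter on the smaller input's keys,
# then discard probe-side rows that cannot possibly match.
build_keys = [1, 2, 3]
bloom = BloomFilter()
for key in build_keys:
    bloom.add(key)

probe_rows = [(1, "a"), (2, "b"), (42, "c")]
candidates = [row for row in probe_rows if bloom.might_contain(row[0])]
```

Rows that pass the filter must still be checked by the real join, since a Bloom filter can report false positives but never false negatives.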

DB Seminar: Monday, December 6, 11:00 am, DC1304 (Canceled)
Speaker: Stanley B. Zdonik, Brown University
Title: Stream Processing in the Aurora/Borealis System

DB meeting: Friday, December 10, 2:00pm, DC1331
Speaker: Amir Chinaei
Title: Towards Decentralized Access Control Administration

Access Control Administration is a mechanism to set permissions for particular users to access particular data. There are several reasons to decentralize this administration: dynamic changes, different contexts, sharing data, and delegation of the administration. Some existing systems allow some degree of decentralization. None seems fully adequate. This talk presents recent work towards decentralization of access control administration in which creation-time policies are defined. Others have introduced the idea of distinguishing a virtual access control matrix from the actual one. This allows them to formulate and reason about access control inheritance policies, conflict resolution policies, and default policies. We build on this foundation to describe policies that dictate what initial access control configuration is to be set at the time that an object is created. Some examples of creation-time policies will be discussed.

This page is maintained by Ken Salem.