Fall 2004
Note: Events of interest to the
Database Research Group are posted to the uw.cs.database
newsgroup and are mailed to the dbgroup mailing lists: db-faculty
(for DB group faculty), db-grads (for DB group graduate students),
and db-friends (for DB group alumni, visitors, and friends). If
you wish to subscribe to one of these lists, send mail to
majordomo@db
with "subscribe <list>" in the message body, where
<list> is the list you wish to subscribe to. For example,
use "subscribe db-friends" to subscribe to the db-friends list. To
unsubscribe, send "unsubscribe <list>" to the same address.
- DB group meetings
- The DB group meets most Friday afternoons at 2pm, usually in DC1331.
See the list of current events for
times and locations of upcoming meetings. Each meeting lasts
for an hour and features an informal presentation by one of the
members of the group. Everyone is welcome to attend. These talks are
intended to raise questions and to stimulate discussion rather than
to be polished presentations of research results. Speakers are determined
using a rotating speaker list, which can be found on the DB group meeting page.
- DB seminar series
- The DB seminar series features visiting speakers. These seminars are
more-or-less monthly, and are usually scheduled on Monday
mornings at 11am. See the list of current
events for times and locations of upcoming seminars. The
full schedule can be found on the DB seminar
series page.
Recent and Upcoming Events
DB meeting: |
Friday, September 17, 2:00pm, DC1331 |
Topic: |
Kickoff meeting.
|
DB meeting: |
Friday, September 24, 2:00pm, DC1331 |
Speaker: |
Grant Weddell
|
Title: |
The DLF Dialects of Description Logic
|
Abstract: |
Description logics (DLs) are an important family of modal logics
that have been used to formally capture object-relational schema
and UML class diagrams. They also underlie recent efforts to define
the Web Ontology Language (OWL), a W3C standard currently under
development for capturing structured knowledge of web services.
In this talk, I'll give an overview of DLs and their relevance
to the database community, with a particular focus on dialects that
incorporate features or attributes. I'll conclude with an additional
overview of recent work on incorporating functional dependencies
in these dialects.
|
DB meeting: |
Friday, October 1, 2:00pm, DC1331 |
Speaker: |
David DeHaan
|
Title: |
Views-on-Views in SQL Server
|
Abstract: |
Indexed (a.k.a. materialized) views are used to speed up query processing of
data-intensive queries, and most commercial relational database systems now
provide some level of support for them. Like indices over base relations,
materialized views are used transparently by the query optimizer to speed up
queries written over the vocabulary of base relations or logical views. To
minimize the online overhead of maintaining a view extent, materialized view
definitions in SQL databases are commonly restricted to a class of simple
aggregation expressions over joined base relations (called SPJG, or Select-
Project-Join-Groupby) because efficient incremental maintenance strategies are
known to exist for this class. Systems that support more complex view
definitions typically limit them to offline maintenance.
In this talk I will describe work done this summer with Paul Larson to extend
SQL Server to support "views-on-views", which are SPJG indexed views that
reference other indexed views as if they were base relations. This class of
views is interesting because it maintains properties of incremental
maintenance, yet it increases the expressive power of the view language to
encompass several types of common non-SPJG queries. After exploring some of
the advantages of supporting views-on-views, I will describe at a high level
the modifications needed to the SQL Server query optimizer to utilize views-on-
views for query processing. Finally, I will briefly explore how allowing
indexed views to contain Outer Join would extend the power of the view
language even further.
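
As a rough illustration of why incremental maintenance is cheap for grouped aggregates, the following Python sketch keeps a per-group SUM and COUNT up to date as base rows are inserted and deleted, without rescanning the base data. The class and method names are hypothetical, and this is not a description of how SQL Server maintains indexed views; it only shows the general idea.

```python
class SumCountView:
    """Toy materialized view: SELECT key, SUM(value), COUNT(*) ... GROUP BY key."""

    def __init__(self):
        # group key -> [running sum, running row count]
        self.groups = {}

    def apply_insert(self, key, value):
        # An inserted base row touches exactly one group.
        agg = self.groups.setdefault(key, [0, 0])
        agg[0] += value
        agg[1] += 1

    def apply_delete(self, key, value):
        # The row count tells us when a group has become empty and must vanish.
        agg = self.groups[key]
        agg[0] -= value
        agg[1] -= 1
        if agg[1] == 0:
            del self.groups[key]

    def rows(self):
        # Current view contents as {key: (sum, count)}.
        return {key: tuple(agg) for key, agg in self.groups.items()}


view = SumCountView()
view.apply_insert("east", 10)
view.apply_insert("east", 5)
view.apply_insert("west", 7)
view.apply_delete("east", 10)
print(view.rows())
```

Keeping the row count alongside the sum is what makes deletions maintainable in this sketch: without it, an empty group could not be told apart from a group whose values sum to zero.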
|
DB meeting: |
Friday, October 15, 2:00pm, DC1331 |
Speaker: |
Jane Xiao
|
Title: |
An Overview of DL Dialects DLFAD and DLFDreg
|
Abstract: |
Description Logics (DLs) play an important role in many applications, such as
software engineering, medical informatics, digital libraries, web-based
information systems, and databases. In database query optimization, for
example, query rewriting, duplicate elimination, and many other
operations can be abstracted as DL logical implication problems. In this talk,
I will give an overview of the DL dialects DLFAD and DLFDreg. These dialects were
proposed by David Toman and Grant Weddell in their papers "Attribute Inversion
in Description Logics with Path Functional Dependencies" and "On Reasoning
about Structural Equality in XML: A Description Logic Approach".
The main contributions of the DLFAD paper are: (1) it provides both attribute
inversion and path functional dependencies in one dialect; (2) it proves that,
in the general case, the implication problem for DLFAD is undecidable, by reducing
an unrestricted tiling problem to a DLFAD implication problem; (3) it proves that
the implication problem is decidable under a coherence condition, by reducing it
to the satisfiability problem for Ackermann formulae.
The main property of DLFDreg is that it uses regular expressions to express
functional dependencies over possibly infinite sets of feature paths. This can be
used to reason about structural equality in XML. The paper also proves
that the implication problem for DLFDreg is decidable, by reducing the problem to
the satisfiability problem for DatalognS with negation.
I will also discuss the relationship between DLFAD and DLFDreg, as well as some
possible extensions based on them.
|
Seminar: |
Tuesday, October 19, 11:00am, DC1304 |
Speaker: |
Brian Cooper, Georgia Institute of Technology
|
Title: |
Using information retrieval techniques to route queries in an
InfoBeacons network
|
Abstract: |
We present the InfoBeacons system, in which a peer-to-peer network of
beacons cooperates to route queries to the best information
sources. The routing in our system uses techniques adapted from
information retrieval. We examine routing at two levels. First, each
beacon is assigned several sources and routes queries to those
sources. Many sources are unwilling to provide more cooperation than
simple searching, and we must adapt traditional information retrieval
techniques to choose the best sources despite this lack of
cooperation. Second, beacons route queries to other beacons using
techniques similar to those for routing queries to sources. We examine
alternative architectures for routing queries between beacons. Results
of experiments using a beacon network to search 1,000 information
sources demonstrate how our techniques can be used to efficiently
route queries; for example, our techniques require contacting up to 70
percent fewer sources than random walk techniques.
|
DB meeting: |
Friday, October 22, 2:00pm, DC1331 |
Speaker: |
Xuhui Li
|
Title: |
The Physical Data Placement Problem in Shared-Disk Relational Database Systems
|
Abstract: |
Magnetic disks are still a bottleneck in today's database systems.
To improve disk I/O efficiency and hence system performance, two physical
data placement schemes, data declustering and data clustering, are used
by database system administrators. Data objects, such as tables, indexes
and materialized views, can be striped across multiple disks in order
to maximize intra-object I/O parallelism (declustering), or can be
isolated into separate disks or disk sets in order to maximize inter-object
I/O parallelism (clustering).
In this presentation, we will describe
our study of the performance impact of these different data placement
schemes. In our study, we examined two representative data layouts,
partitioning and full-striping. We studied some important aspects of
disk I/O, such as disk seeks and disk read-ahead, and how they affect
I/O efficiency under the two data layouts. We also studied
how a few environmental factors, such as the number of
concurrent queries and the degree of I/O parallelism, affect
the relative performance of different data layouts. Our experiments
show that physical data placement has a significant impact on disk
I/O efficiency and hence on system performance. The relative performance
of different data layouts depends heavily
on the workload and other environmental factors.
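
To make the two layouts concrete, here is a minimal Python sketch that maps the blocks of an object to disks. It assumes simple round-robin placement, which the abstract does not specify, and the function names are hypothetical.

```python
def full_striping(num_blocks, num_disks):
    """Spread one object's blocks round-robin over every disk."""
    return [block % num_disks for block in range(num_blocks)]


def partitioned(num_blocks, disk_set):
    """Spread one object's blocks round-robin over its own dedicated disks only."""
    return [disk_set[block % len(disk_set)] for block in range(num_blocks)]


# With 4 disks: a fully striped table uses all of them, while under
# partitioning a table might own disks 0-1 and an index disks 2-3.
print(full_striping(6, 4))
print(partitioned(6, [0, 1]))
print(partitioned(4, [2, 3]))
```

Under full striping a single scan can draw on every disk, while under partitioning scans of different objects never compete for the same disk.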
|
DB meeting: |
Friday, October 29, 2:00pm, DC1331 |
Speaker: |
Ken Salem
|
Topic: |
I plan to discuss view-based database access controls, with
relational and XML examples. I'll be drawing material from
a couple of papers from this year's SIGMOD conference:
- Shariq Rizvi, Alberto O. Mendelzon, S. Sudarshan, Prasan Roy:
Extending Query Rewriting Techniques for Fine-Grained Access Control. SIGMOD Conference 2004: 551-562
- Wenfei Fan, Chee Yong Chan, Minos N. Garofalakis: Secure XML Querying with Security Views. SIGMOD Conference 2004: 587-598
|
DB meeting: |
Friday, November 5, 2:00pm, DC1331 |
Speaker: |
Dan Farrar
|
Title: |
Randomized Databases and Query Sets
|
Abstract: |
As the complexity of database systems, and in particular of query
languages, continues to grow, it is becoming increasingly labour-intensive
to develop tests that adequately cover the functionality of these systems.
Such testing is vital for commercial and open-source products to ensure
correctness and stability. An effective (and cost-effective) solution is
to stochastically generate database instances and query workloads, and test
their operation. These techniques are also useful for evaluating the
performance of systems and algorithms, especially in query processing.
I will give an overview of existing methods for generating random databases
and queries, focusing on the generation of SQL statements. I will discuss
our experiences at iAnywhere with two types of randomized database and
query generation, and compare papers published by Oracle and Microsoft
describing their experiences with similar methods.
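
A minimal sketch of stochastic query generation, in Python, might look like the following. The schema, the function name, and the grammar (a single-table SELECT with an optional comparison predicate) are all assumptions made for illustration; they are not taken from the iAnywhere, Oracle, or Microsoft work mentioned above.

```python
import random

# Hypothetical schema: table name -> numeric column names.
SCHEMA = {
    "orders": ["id", "customer_id", "total"],
    "customers": ["id", "region_id", "credit_limit"],
}


def random_query(rng, schema=SCHEMA):
    """Generate one random single-table SELECT statement."""
    table = rng.choice(sorted(schema))
    columns = schema[table]
    selected = rng.sample(columns, rng.randint(1, len(columns)))
    query = f"SELECT {', '.join(selected)} FROM {table}"
    # Add a simple comparison predicate about half of the time.
    if rng.random() < 0.5:
        column = rng.choice(columns)
        op = rng.choice(["=", "<>", "<", ">"])
        query += f" WHERE {column} {op} {rng.randint(0, 100)}"
    return query


# Seeding the generator makes a failing workload reproducible.
rng = random.Random(2004)
for _ in range(3):
    print(random_query(rng))
```

Passing an explicit seeded `random.Random` instance, rather than using the module-level functions, lets a test harness log the seed and replay exactly the same query sequence later.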
|
DB meeting: |
Friday, November 12, 2:00pm, DC1331 |
Speaker: |
Frank Tompa
|
Topic: |
I will present a proposal for storing and querying archived data, as presented
primarily in the following paper:
|
DB meeting: |
Friday, November 19, 2:00pm, DC1331 |
Speaker: |
Ning Zhang
|
Title: |
BlossomTree: Evaluating XPaths in FLWOR Expressions
|
Abstract: |
Efficient evaluation of path expressions has been studied extensively.
However, evaluating more complex FLWOR expressions that contain
multiple path expressions has not been well studied. In this talk, I will
present a pattern matching approach, called BlossomTree, to
evaluating a FLWOR expression that contains correlated path expressions.
BlossomTree is a formalism to capture the semantics of the path
expressions and their correlations.
We propose a general algebraic framework (abstract data types and
logical operators) to evaluate BlossomTree pattern matching that
facilitates efficient evaluation and experimentation. We design
efficient data structures and algorithms to implement the abstract
data types and logical operators. Our experimental studies demonstrate
that the BlossomTree approach can generate highly efficient query
plans in different environments.
|
Seminar: |
Tuesday, November 23, 10:30am, DC1304 |
Speaker: |
Peter Patel-Schneider, Bell Labs Research
|
Title: |
What is OWL (and why should I give a hoot)?
|
Abstract: |
OWL is the new ontology language produced by the W3C Web Ontology
Working Group. As such, it is poised to be a major formalism for
the design and dissemination of ontology information, particularly
in the Semantic Web, a part of the World-Wide Web. OWL has
influences from several communities, including the RDF community,
the Description Logic community, and the frame community. These
influences have resulted in a wide variety of requirements on OWL,
several of which appear to be conflicting. OWL contains innovative
solutions to several of these apparent conflicts but other conflicts
have meant that it has not been possible to satisfy all the desired
requirements for OWL.
In this talk I will describe the design and development of OWL
concentrating on what makes OWL important, the relationship of OWL
to other efforts, the innovative solutions that were required in its
design, and the impact of the conflicting requirements on OWL.
|
DB meeting: |
Friday, November 26, 2:00pm, DC1331 |
Speaker: |
Gord Cormack
|
Title: |
SPAM Filter Evaluation at TREC
|
Abstract: |
The purpose of TREC is to provide standard procedures and archival data
for the evaluation of the effectiveness of tools for various information
retrieval tasks. For TREC 2005 one such task will be spam filtering. A
spam filter is an on-line binary classifier that identifies each
incoming email message as spam or not.
Measuring the effectiveness of spam filters presents several challenges.
The standard of accuracy required for acceptable performance is quite
high. The trade-off between false positives (rejected legitimate
messages) and false negatives (accepted spam) must be evaluated. The
learning characteristics of the filter must be measured. Privacy
considerations present a major challenge in constructing an archival
test collection.
In this talk I will present the preliminary guidelines and evaluation
methods for the 2005 TREC spam evaluation task, for which I am coordinator.
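
The trade-off between false positives and false negatives can be made concrete with a small Python sketch. It uses the definitions given above (a false positive is a legitimate message that is rejected, a false negative is a spam message that is accepted); the function name and the choice to report both as rates are assumptions for illustration, not the TREC guidelines themselves.

```python
def misclassification_rates(labels, verdicts):
    """Return (false positive rate, false negative rate).

    labels and verdicts are equal-length sequences of booleans,
    where True means "spam". labels is the ground truth and
    verdicts is what the filter decided.
    """
    ham = sum(1 for is_spam in labels if not is_spam)
    spam = sum(1 for is_spam in labels if is_spam)
    pairs = list(zip(labels, verdicts))
    # Legitimate messages that the filter rejected.
    false_pos = sum(1 for is_spam, flagged in pairs if not is_spam and flagged)
    # Spam messages that the filter accepted.
    false_neg = sum(1 for is_spam, flagged in pairs if is_spam and not flagged)
    fpr = false_pos / ham if ham else 0.0
    fnr = false_neg / spam if spam else 0.0
    return fpr, fnr


labels = [True, True, False, False, False, False]
verdicts = [True, False, True, False, False, False]
print(misclassification_rates(labels, verdicts))
```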
|
DB meeting: |
Friday, December 3, 2:00pm, DC1331 |
Speaker: |
Ani Nica
|
Title: |
Bloom filters: a survey of their usage in the database management systems
|
Abstract: |
This talk is a survey of the use of Bloom filters in database
management systems. Topics include integrating cost-based placement of
Bloom filters into a query optimizer, using Bloom filters to improve
the execution of access plans, and Bloom filter representations of
sets of cached queries. In particular, I will
talk about the use of Bloom filters in SQL Anywhere Studio, a
product of iAnywhere Solutions.
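
For readers unfamiliar with the data structure, here is a minimal Bloom filter sketch in Python. The sizes, the SHA-256-based hashing scheme, and the method names are assumptions chosen for brevity; this is not the SQL Anywhere implementation.

```python
import hashlib


class BloomFilter:
    """Approximate set: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Example: build a filter on the join keys of one input, then use it to
# discard rows of the other input that cannot possibly find a match.
build_keys = [3, 17, 42]
bloom = BloomFilter()
for key in build_keys:
    bloom.add(key)
probe_rows = [(17, "a"), (99, "b"), (42, "c")]
candidates = [row for row in probe_rows if bloom.might_contain(row[0])]
print(candidates)
```

Rows that pass the filter still have to be checked by the real join, because a Bloom filter can report a key as possibly present when it is not.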
|
DB meeting: |
Friday, December 10, 2:00pm, DC1331 |
Speaker: |
Amir Chinaei
|
Title: |
Towards Decentralized Access Control Administration
|
Abstract: |
Access Control Administration is a mechanism to set permissions
for particular users to access particular data. There are several reasons to
decentralize this administration: dynamic changes, different contexts,
sharing data, and delegation of the administration. Some existing systems
allow some degree of decentralization. None seems fully adequate. This talk
presents recent work towards decentralization of access control
administration in which creation-time policies are defined. Others have
introduced the idea of distinguishing a virtual access control matrix from
the actual one. This allows them to formulate and reason about access
control inheritance policies, conflict resolution policies, and default
policies. We build on this foundation to describe policies that dictate what
initial access control configuration is to be set at the time that an object
is created. Some examples of creation-time policies will be discussed.
|
This page is maintained by Ken Salem.