[Please remove <h1>]
Winter 2004
Note: Events of interest to the
Database Research Group are posted to the uw.cs.database
newsgroup and are mailed to the dbgroup mailing lists: db-faculty
(for DB group faculty), db-grads (for DB group graduate students),
and db-friends (for DB group alumni, visitors, and friends). If
you wish to subscribe to one of these lists, send mail to
majordomo@db
with "subscribe <list>" in the message body, where
<list> is the list you wish to subscribe to. For example,
use "subscribe db-friends" to subscribe to the db-friends list. To
unsubscribe, send "unsubscribe <list>" to the same address.
- DB group
meetings
- The DB group meets most Friday afternoons at 2pm, usually in DC1331.
See the list of current events for
times and locations of upcoming meetings. Each meeting lasts
for an hour and features an informal presentation by one of the
members of the group. Everyone is welcome to attend. These talks are
intended to raise questions and to stimulate discussion rather than
being polished presentations of research results. Speakers are determined
using a rotating speaker list, which can be found on the DB group meeting page
- DB seminar
series
- The DB seminar series features visiting speakers. These seminars are
more-or-less monthly, and are usually scheduled on Monday
mornings at 11am. See the list of current
events for times and locations of upcoming seminars. The
full schedule can be found on the DB seminar
series page.
Recent and Upcoming Events
DB meeting: |
Friday, January 16th, 2:00pm, DC1331 |
Speaker: |
Ning Zhang
|
Topic: |
A Succinct Physical Storage Scheme for Efficient Evaluation of
Path Queries in XML
|
Abstract: |
Path expressions are ubiquitous in XML processing languages.
Existing approaches evaluate a path expression by selecting nodes that
satisfy the tag-name and value constraints and then joining them
according to the structural constraints. In this talk, I propose a
novel approach, next-of-kin (NoK) pattern matching, to speed up the
node selection step, and to reduce the join size significantly in the
second step. To efficiently perform NoK pattern matching, I also propose
a succinct XML physical storage scheme that is adaptive to updates and
streaming XML as well. Performance results demonstrate that the
proposed storage scheme and path evaluation algorithm is highly
efficient and outperforms the other tested systems in most cases.
|
DB meeting: |
Friday, January 23rd, 2:00pm, MC5158 |
Speaker: |
Lehui Nie
|
Topic: |
Efficient XQuery Processing using B+tree Indices
|
DB meeting: |
Friday, January 30th, 2:00pm, DC1331 |
Speaker: |
Ian Davis
|
Topic: |
SQL+XML Standardization
|
Abstract: |
In recent years it has become apparent that standardised mechanisms for
constructing, storing and retrieving XML documents need to be integrated
into the still evolving SQL standard. The H2 standards community, which
has the responsibility for progressing SQL, has thus begun the work of
developing the necessary standards for how XML might be integrated with
SQL. Initial effort focused on developing what have been referred to as
SQL/XML publishing functions, which allow XML documents to be dynamically
constructed using extensions to SQL. In my talk I will review the August
2003 working draft of the XML-Related Specifications, and attempt to
present a high level overview of the extensions to SQL functionality
that are currently encapsulated within this 267 page working document.
|
DB meeting: |
Friday, February 6th, 2:00pm, DC1331 |
Speaker: |
Gord Cormack
|
Topic: |
Spam
|
Abstract: |
Is spam filtering simply an application of traditional text classification?
If not, why not? What are the unique characterisitics of spam and how have
spam filterers responded to these characteristics? Can the
effectiveness of spam filters be measured using scientific
techniques akin to those used for classification and
information retrieval.
I'll report on some of the major ideas that have been
discussed at the first two spam conferences, as wella
s my own experience in adapting Spamassassin as a
personalized spam filter. I'll also discuss the problems and
issues that arise in trying to construct
a TREC-like spam-filter evaluation suite.
|
DB meeting: |
Friday, February 13th, 2:00pm, MC5158 |
Speaker: |
Ani Nica
|
Topic: |
SQL analytic functions and query optimization techniques in relational DBMS
|
Abstract: |
In this talk I will introduce SQL analytic functions and discuss
some of the query optimization problems for complex queries using such
functions in the relational DBMS. An example of the analytic functions are
window and reporting functions used in decision-support queries. First, I
will present some new techniques for cost-based optimization based on
window functions. Most importantly, I will discuss new challenges related
to query optimization, such as using materialized views to answer SQL
queries when the views and/or the queries involve analytic functions.
These are new issues for which traditional query optimization techniques
cannot be directly applied, and, to my knowledge, are little addressed in
the database research literature.
|
SCS Seminar:
|
Monday, February 23rd, 10:30 am, DC1302
|
Speaker: |
Rachel Pottinger, University of Washington
|
Title: |
Processing Queries and Merging Schemas in Support of Data Integration
|
DB meeting: |
Friday, February 27th, 2:00pm, DC1331 |
Speaker: |
Amir Chinaei
|
Topic: |
Updates and XML
|
Abstract: |
With the development of XML as a new standard for data representation and
exchange format in Internet-based applications, expressing lookup queries
and performing update operations over XML data become more important.
Much work has been done on querying XML while updating XML is still an
open issue. This talk reviews recent works on update operations for XML
data, possible language extensions to XQuery, and implementing update
operations when the XML data is stored in a relational database.
|
DB meeting: |
Friday, March 5th, 2:00pm, DC1331 |
Speaker: |
Lukasz Golab
|
Topic: |
Querying Sliding Windows over On-Line Data Streams
|
Abstract: |
A data stream is a real-time, continuous, ordered sequence of items
generated by sources such as sensor networks, Internet traffic flow, credit
card transaction logs, and on-line financial tickers. Processing continuous
queries over data streams introduces a number of research problems, one of
which concerns evaluating queries over sliding windows defined on the
inputs. In the first part of this talk, I will describe previous work in
sliding window query processing, with an emphasis on query algebras and
languages, and implementations of windowed operators. In the second part of
the talk, I will present a solution to the problem of indexing sliding
windows in main memory, based on the following paper:
Golab, Garg, Özsu,
On Indexing Sliding Windows over On-Line
Data Streams, to appear in
Int. Conf.on Extending Database Technology (EDBT) 2004.
|
SCS Seminar:
|
Thursday, March 11th, 10:30 am, Room DC1304
|
Speaker: |
Ihab Ilyas, Purdue University
|
Title: |
Rank-aware Query Processing and Optimization
|
DB meeting: |
Friday, March 12th, 2:00pm, DC1331 |
Speaker: |
Lei Chen
|
Topic: |
On the Marriage of Lp-norms and Edit Distance
|
Abstract: |
Existing studies on time series are based on two categories of
distance functions. The first category consists of the Lp-norms.
They are metric distance functions but cannot support local time
shifting. The second category consists of distance functions which
are capable of handling local time shifting but are non-metric. In
this talk, I first present a new distance function, which we call
ERP ("Edit distance with Real Penalty"). Representing a marriage
of L1-norm and the edit distance, ERP can support local time
shifting, and is a metric. After that, I will talk about pruning
strategies for large time series databases. Given that ERP is a
metric, one way to prune is to apply the triangle inequality.
Another way to prune is to develop a lower bound on the ERP
distance. I propose such a lower bound, which has the nice
computational property that it can be efficiently indexed with a
standard B+-tree. Moreover, I will show that these two ways of
pruning can be used simultaneously for ERP distances.
Specifically, the false positives obtained from the B+-tree can be
further minimized by applying the triangle inequality. Finally, I
will report some experimental results with existing benchmarks and
techniques and show that this combination delivers superb pruning
power and search time performance.
|
DB meeting: |
Friday, March 19th, 2:00pm, DC1331 |
Speaker: |
Huaxin Zhang
|
Topic: |
Secure Evaluation of XML Queries
|
Abstract: |
Fine-grained access control on XML is specified by tagging each node in
the XML data as accessible or non-accessible for individual users under
each action mode. Secure evaluation of XML is "safe" query evaluation
that does not reveal information from the non-accessible nodes to the users.
My presentation will consist of three parts. I will first talk about
access control policies in the context of XML data, i.e., the languages
for specifying and representing fine-grained access control data.
Then I will review some recent work and briefly describe the semantics
of secure XML query evaluation. In the last part I will discuss some
existing and potential approaches to query optimization for secure XML data.
|
SCS Seminar:
|
Thursday, March 25th, 10:30 am, Room TBA
|
Speaker: |
Ashraf Aboulnaga, University of Wisconsin
|
Title: |
On-line Statistics for Database Query Optimization
|
DB meeting: |
Friday, March 26th, 2:00pm, DC1331 |
Speaker: |
Charles Clarke
|
Topic: |
A Reliable Storage Management Layer for a Distributed Information
Retrieval System
|
Abstract: |
I will present a storage management layer that facilitates the
implementation of parallel information retrieval systems and related
applications on a network of workstations. The storage management
layer automates the process of adding and removing nodes, and
implements a dispersed mirroring strategy to improve reliability. When
nodes are added and removed, the document collection managed by the
system is redistributed for load balancing purposes. The use of
dispersed mirroring minimizes the impact of node failures and system
modifications on query performance.
|
SCS Seminar:
|
Monday, April 5th, 10:30 am, DC 1304
|
Speaker: |
Melanie Wu, University of Michigan
|
Title: |
XML Query Processing in TIMBER
|
Abstract: |
The eXtensible Markup Language (XML) has recently become very popular as a
data representation and data-exchanging format. The effective management of
XML in a database thus becomes a pressing issue. A central challenge in this
regard is the complex and heterogeneous structure of XML data and XML query.
In this talk, I will discuss the design of TIMBER, a native XML database
management system under development at the University of Michigan, and my
specific contributions to the TIMBER system, focusing on structural join
algorithms, structural join order selection, and structural join result
size estimation.
|
DB meeting: |
Friday, April 16th, 2:00pm, DC1331 |
Speaker: |
no speaker today
|
Topic: |
come for a coffee break
|
DB meeting: |
Friday, April 23rd, 2:00pm, DC1331 |
Speaker: |
Gulay Unel
|
Topic: |
An Efficient Query Optimization Strategy for Spatio-Temporal Queries in
Video Databases
|
Abstract: |
The interest for multimedia database management systems has grown rapidly
due to the need for the storage of huge volumes of multimedia data in
computer systems. An important building block of a multimedia database
system is the query processor, and a query optimizer embedded to the query
processor is needed to answer user queries efficiently. Query optimization
problem has been widely studied for conventional database systems, however
it is a new research area for multimedia database systems. Due to the
differences in query processing strategies, query optimization techniques
used in multimedia database systems are different from those used in
traditional databases. In this talk, a query optimization strategy will be
presented for processing spatio-temporal queries in video database
systems. The strategy includes reordering algorithms to be applied on
query execution tree.
|
DB meeting: |
Friday, April 30th, 2:00pm, DC1331 |
Speaker: |
Yingwen Chen
|
Topic: |
XQuery Query Processing in Relational Systems
|
Abstract: |
With the rapid growth of XML documents to serve as a popular
and major media for storage and interchange of data on the
web, there is an increasing interest in using existing
traditional relational database techniques to store and/or
query XML data. Since XQuery is becoming a standard XML query
language, significant effort has been made in developing an
efficient and comprehensive XQuery-to-SQL query processor.
In this thesis, we design and implement an XQuery-to-SQL
Query Processor based on the Dynamic Intervals approach.
We also provide a comprehensive translation for XQuery
basic operations and FLWR expressions. The query processor
is able to translate a complex XQuery query, which might
include arbitrarily composed and nested FLWR expressions,
basic functions, and element constructors, into a single
SQL query for RDBMS and a physical plan for the XQuery-enhanced
Relational Engine.
In order to produce efficient and concise SQL queries,
succinct SQL translation templates for XQuery expressions
and the optimization algorithms for the SQL query generation
are proposed and implemented. The preferable merge-join
approach is also proposed to avoid the inefficient
nested-loop evaluation for FLWR expressions. Merge-join
patterns and query rewriting rules are designed to identify
XQuery fragments that can utilize the efficient merge-join
evaluation. Proofs of correctness of the approach are provided
in the thesis. Experimental results shows the correctness of
our work.
|
This page is maintained by
Ken
Salem.