Database Research Group Events

Winter 2004

Note: Events of interest to the Database Research Group are posted to the uw.cs.database newsgroup and are mailed to the dbgroup mailing lists: db-faculty (for DB group faculty), db-grads (for DB group graduate students), and db-friends (for DB group alumni, visitors, and friends). If you wish to subscribe to one of these lists, send mail to majordomo@db with "subscribe <list>" in the message body, where <list> is the list you wish to subscribe to.  For example, use "subscribe db-friends" to subscribe to the db-friends list. To unsubscribe, send "unsubscribe <list>" to the same address.
DB group meetings
The DB group meets most Friday afternoons at 2pm, usually in DC1331. See the list of current events for times and locations of upcoming meetings. Each meeting lasts for an hour and features an informal presentation by one of the members of the group. Everyone is welcome to attend. These talks are intended to raise questions and to stimulate discussion rather than being polished presentations of research results. Speakers are determined using a rotating speaker list, which can be found on the DB group meeting page
DB seminar series
The DB seminar series features visiting speakers. These seminars are more-or-less monthly, and are usually scheduled on Monday mornings at 11am. See the list of current events for times and locations of upcoming seminars. The full schedule can be found on the DB seminar series page.

Recent and Upcoming Events


DB Seminar: Monday, January 12th, 11:00 am, DC1304
Speaker: Mario Nascimento, University of Alberta
Title: Effective and Efficient Region-based Image Retrieval

DB meeting: Friday, January 16th, 2:00pm, DC1331
Speaker: Ning Zhang
Topic: A Succinct Physical Storage Scheme for Efficient Evaluation of Path Queries in XML
Abstract: Path expressions are ubiquitous in XML processing languages. Existing approaches evaluate a path expression by selecting nodes that satisfy the tag-name and value constraints and then joining them according to the structural constraints. In this talk, I propose a novel approach, next-of-kin (NoK) pattern matching, to speed up the node selection step, and to reduce the join size significantly in the second step. To efficiently perform NoK pattern matching, I also propose a succinct XML physical storage scheme that is adaptive to updates and streaming XML as well. Performance results demonstrate that the proposed storage scheme and path evaluation algorithm is highly efficient and outperforms the other tested systems in most cases.

DB meeting: Friday, January 23rd, 2:00pm, MC5158
Speaker: Lehui Nie
Topic: Efficient XQuery Processing using B+tree Indices

DB meeting: Friday, January 30th, 2:00pm, DC1331
Speaker: Ian Davis
Topic: SQL+XML Standardization
Abstract: In recent years it has become apparent that standardised mechanisms for constructing, storing and retrieving XML documents need to be integrated into the still evolving SQL standard. The H2 standards community, which has the responsibility for progressing SQL, has thus begun the work of developing the necessary standards for how XML might be integrated with SQL. Initial effort focused on developing what have been referred to as SQL/XML publishing functions, which allow XML documents to be dynamically constructed using extensions to SQL. In my talk I will review the August 2003 working draft of the XML-Related Specifications, and attempt to present a high level overview of the extensions to SQL functionality that are currently encapsulated within this 267 page working document.

DB meeting: Friday, February 6th, 2:00pm, DC1331
Speaker: Gord Cormack
Topic: Spam
Abstract:

Is spam filtering simply an application of traditional text classification? If not, why not? What are the unique characterisitics of spam and how have spam filterers responded to these characteristics? Can the effectiveness of spam filters be measured using scientific techniques akin to those used for classification and information retrieval.

I'll report on some of the major ideas that have been discussed at the first two spam conferences, as wella s my own experience in adapting Spamassassin as a personalized spam filter. I'll also discuss the problems and issues that arise in trying to construct a TREC-like spam-filter evaluation suite.


DB meeting: Friday, February 13th, 2:00pm, MC5158
Speaker: Ani Nica
Topic: SQL analytic functions and query optimization techniques in relational DBMS
Abstract: In this talk I will introduce SQL analytic functions and discuss some of the query optimization problems for complex queries using such functions in the relational DBMS. An example of the analytic functions are window and reporting functions used in decision-support queries. First, I will present some new techniques for cost-based optimization based on window functions. Most importantly, I will discuss new challenges related to query optimization, such as using materialized views to answer SQL queries when the views and/or the queries involve analytic functions. These are new issues for which traditional query optimization techniques cannot be directly applied, and, to my knowledge, are little addressed in the database research literature.

DB Seminar: Monday, February 16th, 11:30 am Note change in time, DC1304
Speaker: Z. Meral Özsoyoglu, Case Western Reserve University
Title: Distance Based Indexing and Similarity Search for Sequences

SCS Seminar: Monday, February 23rd, 10:30 am, DC1302
Speaker: Rachel Pottinger, University of Washington
Title: Processing Queries and Merging Schemas in Support of Data Integration

DB meeting: Friday, February 27th, 2:00pm, DC1331
Speaker: Amir Chinaei
Topic: Updates and XML
Abstract: With the development of XML as a new standard for data representation and exchange format in Internet-based applications, expressing lookup queries and performing update operations over XML data become more important. Much work has been done on querying XML while updating XML is still an open issue. This talk reviews recent works on update operations for XML data, possible language extensions to XQuery, and implementing update operations when the XML data is stored in a relational database.

DB meeting: Friday, March 5th, 2:00pm, DC1331
Speaker: Lukasz Golab
Topic: Querying Sliding Windows over On-Line Data Streams
Abstract: A data stream is a real-time, continuous, ordered sequence of items generated by sources such as sensor networks, Internet traffic flow, credit card transaction logs, and on-line financial tickers. Processing continuous queries over data streams introduces a number of research problems, one of which concerns evaluating queries over sliding windows defined on the inputs. In the first part of this talk, I will describe previous work in sliding window query processing, with an emphasis on query algebras and languages, and implementations of windowed operators. In the second part of the talk, I will present a solution to the problem of indexing sliding windows in main memory, based on the following paper: Golab, Garg, Özsu, On Indexing Sliding Windows over On-Line Data Streams, to appear in Int. Conf.on Extending Database Technology (EDBT) 2004.

SCS Seminar: Thursday, March 11th, 10:30 am, Room DC1304
Speaker: Ihab Ilyas, Purdue University
Title: Rank-aware Query Processing and Optimization

DB meeting: Friday, March 12th, 2:00pm, DC1331
Speaker: Lei Chen
Topic: On the Marriage of Lp-norms and Edit Distance
Abstract: Existing studies on time series are based on two categories of distance functions. The first category consists of the Lp-norms. They are metric distance functions but cannot support local time shifting. The second category consists of distance functions which are capable of handling local time shifting but are non-metric. In this talk, I first present a new distance function, which we call ERP ("Edit distance with Real Penalty"). Representing a marriage of L1-norm and the edit distance, ERP can support local time shifting, and is a metric. After that, I will talk about pruning strategies for large time series databases. Given that ERP is a metric, one way to prune is to apply the triangle inequality. Another way to prune is to develop a lower bound on the ERP distance. I propose such a lower bound, which has the nice computational property that it can be efficiently indexed with a standard B+-tree. Moreover, I will show that these two ways of pruning can be used simultaneously for ERP distances. Specifically, the false positives obtained from the B+-tree can be further minimized by applying the triangle inequality. Finally, I will report some experimental results with existing benchmarks and techniques and show that this combination delivers superb pruning power and search time performance.

DB Seminar: Monday, March 15th, 11:00 am, DC1304
Speaker: Anastassia Ailamaki, Carnegie Mellon University
Title: FATES: Decoupling Main Memory Data Layout from Storage Organization

DB meeting: Friday, March 19th, 2:00pm, DC1331
Speaker: Huaxin Zhang
Topic: Secure Evaluation of XML Queries
Abstract: Fine-grained access control on XML is specified by tagging each node in the XML data as accessible or non-accessible for individual users under each action mode. Secure evaluation of XML is "safe" query evaluation that does not reveal information from the non-accessible nodes to the users. My presentation will consist of three parts. I will first talk about access control policies in the context of XML data, i.e., the languages for specifying and representing fine-grained access control data. Then I will review some recent work and briefly describe the semantics of secure XML query evaluation. In the last part I will discuss some existing and potential approaches to query optimization for secure XML data.

SCS Seminar: Thursday, March 25th, 10:30 am, Room TBA
Speaker: Ashraf Aboulnaga, University of Wisconsin
Title: On-line Statistics for Database Query Optimization

DB meeting: Friday, March 26th, 2:00pm, DC1331
Speaker: Charles Clarke
Topic: A Reliable Storage Management Layer for a Distributed Information Retrieval System
Abstract: I will present a storage management layer that facilitates the implementation of parallel information retrieval systems and related applications on a network of workstations. The storage management layer automates the process of adding and removing nodes, and implements a dispersed mirroring strategy to improve reliability. When nodes are added and removed, the document collection managed by the system is redistributed for load balancing purposes. The use of dispersed mirroring minimizes the impact of node failures and system modifications on query performance.

SCS Seminar: Monday, April 5th, 10:30 am, DC 1304
Speaker: Melanie Wu, University of Michigan
Title: XML Query Processing in TIMBER
Abstract: The eXtensible Markup Language (XML) has recently become very popular as a data representation and data-exchanging format. The effective management of XML in a database thus becomes a pressing issue. A central challenge in this regard is the complex and heterogeneous structure of XML data and XML query. In this talk, I will discuss the design of TIMBER, a native XML database management system under development at the University of Michigan, and my specific contributions to the TIMBER system, focusing on structural join algorithms, structural join order selection, and structural join result size estimation.

DB meeting: Friday, April 16th, 2:00pm, DC1331
Speaker: no speaker today
Topic: come for a coffee break

DB Seminar: Monday, April 19th, 11:30 am, DC1304
Speaker: Nick Koudas, University of Toronto
Title: Flexible String Matching in Databases

DB meeting: Friday, April 23rd, 2:00pm, DC1331
Speaker: Gulay Unel
Topic: An Efficient Query Optimization Strategy for Spatio-Temporal Queries in Video Databases
Abstract: The interest for multimedia database management systems has grown rapidly due to the need for the storage of huge volumes of multimedia data in computer systems. An important building block of a multimedia database system is the query processor, and a query optimizer embedded to the query processor is needed to answer user queries efficiently. Query optimization problem has been widely studied for conventional database systems, however it is a new research area for multimedia database systems. Due to the differences in query processing strategies, query optimization techniques used in multimedia database systems are different from those used in traditional databases. In this talk, a query optimization strategy will be presented for processing spatio-temporal queries in video database systems. The strategy includes reordering algorithms to be applied on query execution tree.

DB meeting: Friday, April 30th, 2:00pm, DC1331
Speaker: Yingwen Chen
Topic: XQuery Query Processing in Relational Systems
Abstract:

With the rapid growth of XML documents to serve as a popular and major media for storage and interchange of data on the web, there is an increasing interest in using existing traditional relational database techniques to store and/or query XML data. Since XQuery is becoming a standard XML query language, significant effort has been made in developing an efficient and comprehensive XQuery-to-SQL query processor.

In this thesis, we design and implement an XQuery-to-SQL Query Processor based on the Dynamic Intervals approach. We also provide a comprehensive translation for XQuery basic operations and FLWR expressions. The query processor is able to translate a complex XQuery query, which might include arbitrarily composed and nested FLWR expressions, basic functions, and element constructors, into a single SQL query for RDBMS and a physical plan for the XQuery-enhanced Relational Engine.

In order to produce efficient and concise SQL queries, succinct SQL translation templates for XQuery expressions and the optimization algorithms for the SQL query generation are proposed and implemented. The preferable merge-join approach is also proposed to avoid the inefficient nested-loop evaluation for FLWR expressions. Merge-join patterns and query rewriting rules are designed to identify XQuery fragments that can utilize the efficient merge-join evaluation. Proofs of correctness of the approach are provided in the thesis. Experimental results shows the correctness of our work.



This page is maintained by 
Ken Salem.