Database Research Group Events

Spring 2008

Note: Events of interest to the Database Research Group are posted to the uw.cs.database newsgroup and are mailed to the db-group mailing list. The db-group list aggregates three mailing lists: db-faculty (for DB group faculty), db-grads (for DB group graduate students), and db-friends (for DB group alumni, visitors, and friends). To subscribe to or unsubscribe from one of these three lists, please visit <listname>, where <listname> is the name of the list.
DB group meetings
The DB group meets most Friday afternoons at 2pm, usually in DC 1331. See the list of current events for times and locations of upcoming meetings. Each meeting lasts for an hour and features an informal presentation by one of the members of the group. Everyone is welcome to attend. These talks are intended to raise questions and stimulate discussion rather than to be polished presentations of research results. Speakers are determined using a rotating speaker list, which can be found on the DB group meeting page.
DB seminar series
The DB seminar series features visiting speakers. These seminars are more-or-less monthly, and are usually scheduled on Monday mornings at 11am. See the list of current events for times and locations of upcoming seminars. The full schedule can be found on the DB seminar series page.

Recent and Upcoming Events

DB Meeting: Friday May 9, 2:00pm, DC 1304 (Please note change of room)
Speaker: Xuhui Li
Title: Delayed Synchronization of I/O Writes
Abstract: Modern computers usually have multiple tiers of cache, such as file system cache and storage system cache, lying between application user spaces and storage devices. Although the I/O interfaces currently implemented by operating systems let applications use these cache tiers, they are not flexible enough to meet applications' various I/O synchronization requirements. As a result, some applications trade off I/O efficiency for data safety and ignore the underlying caches entirely. In this paper we propose a novel I/O interface to address this problem. Our approach lets applications use the underlying caches while still preserving data safety. We implemented our approach on the Linux Native AIO interface and modified the MySQL InnoDB storage engine to use it. Running synthetic workloads against the new LAIO interface, we found promising results.

DB Meeting: Friday May 16, 2:00pm, DC 1331
Speaker: Ani Nica, Sybase iAnywhere
Title: Spatial Indexes and spatial support in relational database systems
Abstract: This talk will cover current database research on spatial indexes such as R-trees, TV-trees, SS-trees, and Quadtrees. A summary of the current support for spatial data and spatial queries in commercial relational database systems will also be part of the presentation.

DB Meeting: Friday May 23, 2:00pm, DC 1331
Speaker: Gulay Unel
Title: Data Exchange
Abstract: This talk will be on data exchange, which is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. It will cover two main papers, "Data exchange: semantics and query answering" and "Data exchange: getting to the core", by Fagin et al.

DB Seminar: Monday June 2, 10:30am, DC 1304
Speaker: Amr El Abbadi, University of California, Santa Barbara
Title: New Challenges in the Management of Data

DB Meeting: Friday June 6, 2:00pm, DC 1331
Speaker: Peter Bumbulis, Sybase iAnywhere
Title: Enforcing Database Recoverability on Disks that Lack Write-Through
Abstract: Talk based on Robin Dhamankar, Hanuma Kodavalla, and Vishal Kathuria. "Enforcing Database Recoverability on Disks that Lack Write-Through," MSR-TR-2008-36, March 2008.

DB Seminar: Monday June 16, 10:30am, DC 1304
Speaker: Carl-Christian Kanne, University of Mannheim
Title: The Demaq System: Declarative Messaging and Queueing

Seminar: Thursday June 19, 10:30am, DC 1331 (Please note unusual day and time)
Speaker: Zhenjie Zhang, National University of Singapore
Title: On Uncertain Data Clustering and Domination Game Analysis
Abstract: In this talk, I will present two pieces of work: uncertain data clustering and domination game analysis.

Uncertain Data Clustering: Applications, Models and Algorithms
Uncertain data is now ubiquitous in many database systems and applications, such as scientific databases, sensor networks, moving objects and data streams, due to inaccurate measurement or infrequent data updates. In this talk, we will present our new studies on unsupervised learning over uncertain data sets. In our study, every uncertain object is modelled as a sphere in the corresponding space that bounds the exact position, without any underlying distribution assumption. Based on this definition of uncertainty, different computation models are proposed for unsupervised learning tasks, including the Zero Uncertain Model, Static Uncertain Model, Dissolvable Uncertain Model and Reversed Uncertain Model. Each of the models can be applied to different environments with different requirements. We will further present some preliminary solutions for the models using popular learning algorithms, such as the k-means and EM algorithms. Some of the work presented here will appear in ICML'08.

Domination Game Analysis: When Game Theory Meets Data Mining
Game theory is a powerful tool for modelling competition among manufacturers in a market. In this paper, we present a study on combining game theory and data mining by introducing the concept of domination game analysis. We present a multidimensional market model, where every dimension represents one attribute of a commodity. Every product or customer is represented by a point in the multidimensional space, and a product is said to "dominate" a customer if all of its attributes satisfy the requirements of the customer. The expected market share of a product is measured by the expected number of buyers among the customers, each of whom is equally likely to buy any product that dominates them. A Nash Equilibrium is a configuration of the products achieving stable expected market shares for all products. We prove that a Nash Equilibrium in such a model can be computed in polynomial time if every manufacturer tries to modify its product in a round-robin manner. To further improve the efficiency of the computation, we also design two algorithms for the manufacturers to efficiently find their best response to other products in the market. This is joint work with Laks V.S. Lakshmanan and Anthony K. H. Tung.

Bio: Zhenjie Zhang is a PhD candidate in the Database Group at the National University of Singapore. He received his B.Sc. in Computer Science from Fudan University, China. His research interests include general skyline query, unsupervised learning, and game theoretical analysis over large data. Zhenjie presently has 10 research papers to his name including papers in major venues such as SIGMOD, ICML and TKDE. He was a recipient of the prestigious NUS President Fellowship in 2007. More about Zhenjie's research can be found at

DB Seminar: Monday June 23, 10:30am, DC 1304
Speaker: Eric Lo, Hong Kong Polytechnic University
Title: OLAP on Sequence Data

DB Meeting: Friday July 4, 2:00pm, DC 1331
Speaker: Xin Liu
Title: Application Hints for Multi-tier Cache Management
Abstract: In today's computer systems, a data access usually goes through several tiers of cache, such as the application buffer, file system cache, and storage server cache. Typically each tier of cache is managed independently by its own sub-system. The information about data accesses shared between these sub-systems is either limited or under-utilized. Although sub-systems try their best to improve their individual cache performance, their lack of I/O information from other tiers, or their ignorance of it, makes their efforts sub-optimal. In fact, when an I/O request is passed from one tier to another, some attributes associated with it are useful in predicting future accesses to the same data page. These attributes can be passed as hints from an upper cache tier, such as a storage client, to a lower cache tier, such as a storage server, which can use them in its cache management. In this presentation I will describe our study of how I/O attributes in a storage client are related to its future data accesses to the storage server, and how we use these attributes to manage the lower-tier cache.

DB Meeting: Friday July 11, 2:00pm, DC 1331
Speaker: Luiz Celso Gomez Jr.
Title: Web user databases
Abstract: The demand for richer interactive web applications has turned browsers into powerful programming platforms. The only piece still missing is an integrated user data management system to replace the limited flexibility provided by cookies. In this talk I will outline the requirements for such a system, and present what has been done towards this end along with possible new alternatives. At the end I will give an overview of a few techniques to meet selected requirements.

DB Meeting: Friday July 18, 2:00pm, DC 1331
Speaker: Raymond Wong, Hong Kong University of Science and Technology (HKUST)
Title: Minimality Attack in Privacy Preserving Data Publishing
Abstract: Data publishing generates much concern over the protection of individual privacy. Recent studies consider cases where the adversary may possess different kinds of knowledge about the data. In this paper, we show that knowledge of the mechanism or algorithm of anonymization for data publication can also lead to extra information that assists the adversary and jeopardizes individual privacy. In particular, all known mechanisms try to minimize information loss and such an attempt provides a loophole for attacks. We call such an attack a minimality attack. In this paper, we introduce a model called m-confidentiality which deals with minimality attacks, and propose a feasible solution. Our experiments show that minimality attacks are practical concerns on real datasets and that our algorithm can prevent such attacks with very little overhead and information loss.

DB Meeting: Friday July 25, 2:00pm, DC 1331
Speaker: Patrick Kling
Title: Distributed XML Query Processing
Abstract: XML is commonly used to exchange data among a variety of systems. Therefore, XML data can be viewed as inherently distributed according to the origin of individual fragments. Large XML collections and heavy workloads also force us to distribute XML data. In this talk, I will describe a number of ways in which XML data can be distributed. I will then discuss techniques that allow us to query distributed XML and how querying can take advantage of known distribution characteristics.

DB Meeting: Friday August 1, 2:00pm, DC 1331
Speaker: Wei Jiang
Title: Ordered Conjunctive Queries
Abstract: Order properties of data have been playing an important role in relational query processing and optimization. However, conventional database systems consider order support only as an add-on feature to the core query optimization. They fail to provide a systematic and consistent treatment of order throughout query processing. We propose a novel approach to represent ordered data models, and to trace and refer to order properties throughout query processing and optimization. By considering order from the very beginning of query processing, we expect to gain the greatest benefit from query optimization. An ordered data model and ordered algebra will be presented for ordered conjunctive queries.

This page is maintained by Ashraf Aboulnaga.

Data Systems Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208

Last modified: Friday, 01-Jun-2012 11:01:03 EDT