Spring 2008
Note: Events of interest to the
Database Research Group are posted to the uw.cs.database
newsgroup and are mailed to the
db-group@lists.uwaterloo.ca
mailing list. There are actually three mailing lists aggregated into the
db-group list: db-faculty
(for DB group faculty), db-grads (for DB group graduate students),
and db-friends (for DB group alumni, visitors, and friends). If
you wish to subscribe to one of these three lists (or to unsubscribe), please
visit
https://lists.uwaterloo.ca/mailman/listinfo/<listname>, where
<listname> is the list you wish to subscribe to.
- DB group meetings
- The DB group meets most Friday afternoons at 2pm, usually in DC1331.
See the list of current events for
times and locations of upcoming meetings. Each meeting lasts
for an hour and features an informal presentation by one of the
members of the group. Everyone is welcome to attend. These talks are
intended to raise questions and stimulate discussion rather than
to be polished presentations of research results. Speakers are determined
using a rotating speaker list, which can be found on the
DB group meeting page.
- DB seminar series
- The DB seminar series features visiting speakers. These seminars are
more-or-less monthly, and are usually scheduled on Monday
mornings at 11am. See the list of current
events for times and locations of upcoming seminars. The
full schedule can be found on the DB seminar series page.
Recent and Upcoming Events
DB Meeting: Friday May 9, 2:00pm, DC 1304 (Please note change of room)
Speaker: Xuhui Li
Title: Delayed Synchronization of I/O Writes
Abstract:
Modern computers usually have multiple tiers of cache, such as file system
cache and storage system cache, lying between application user spaces and
storage devices. Although the I/O interfaces currently implemented by operating
systems let applications use these cache tiers, they are not
flexible enough to meet applications' various I/O synchronization
requirements. As a result, some applications trade off I/O efficiency for
data safety and ignore the underlying caches entirely. In this paper we propose
a novel I/O interface to address this problem. Our approach lets
applications use the underlying caches while still preserving data
safety. We implemented our approach on the Linux Native AIO interface and modified
the MySQL InnoDB storage engine to use it. Running synthetic workloads
against the new LAIO interface produced promising results.

DB Meeting: Friday May 16, 2:00pm, DC 1331
Speaker: Ani Nica, Sybase iAnywhere
Title: Spatial Indexes and spatial support in relational database systems
Abstract:
This talk will cover current database research on spatial indexes such
as R-trees, TV-trees, SS-trees, and Quadtrees. A summary of the current
support for spatial data and spatial queries in commercial relational
database systems will also be part of the presentation.

DB Meeting: Friday May 23, 2:00pm, DC 1331
Speaker: Gulay Unel
Title: Data Exchange
Abstract:
This talk is about data exchange, the problem of taking data structured under a source
schema and creating an instance of a target schema that reflects the source data as accurately as
possible. It will cover two main papers by Fagin et al.: "Data exchange: semantics and
query answering" and "Data exchange: getting to the core".

DB Meeting: Friday June 6, 2:00pm, DC 1331
Speaker: Peter Bumbulis, Sybase iAnywhere
Title: Enforcing Database Recoverability on Disks that Lack Write-Through
Abstract:
Talk based on Robin Dhamankar, Hanuma Kodavalla, and Vishal Kathuria. "Enforcing Database Recoverability on Disks that Lack Write-Through," MSR-TR-2008-36, March 2008.

Seminar: Thursday June 19, 10:30am, DC 1331 (Please note unusual day and time)
Speaker: Zhenjie Zhang, National University of Singapore
Title: On Uncertain Data Clustering and Domination Game Analysis
Abstract:
In this talk, I will present two pieces of work: uncertain data
clustering and domination game analysis.

Uncertain Data Clustering: Applications, Models and Algorithms

Uncertain data is now ubiquitous in many database systems and
applications, such as scientific databases, sensor networks, moving
objects and data streams, due to inaccurate measurement or
infrequent data updates. In this talk, we will present our new
studies on unsupervised learning over uncertain data sets. In our
study, every uncertain object is modelled as a sphere in the
corresponding space that bounds the exact position, without any
assumption about the underlying distribution. Based on the definition of
uncertainty, different computation models are proposed for unsupervised
learning tasks, including the Zero Uncertain Model, Static Uncertain Model,
Dissolvable Uncertain Model and Reversed Uncertain Model. Each
of the models can be applied to different environments with different
requirements. We will further present some preliminary solutions to the
models using popular learning algorithms, such as the k-means
algorithm and the EM algorithm. Some of the work presented here will
appear in ICML'08.

Domination Game Analysis: When Game Theory Meets Data Mining

Game theory is a powerful tool for modelling competition among
manufacturers in a market. In this paper, we present a study on
combining game theory and data mining by introducing the concept
of domination game analysis. We present a multidimensional market
model, where every dimension represents one attribute of a
commodity. Every product or customer is represented by a point in
the multidimensional space, and a product is said to "dominate" a
customer if all of its attributes satisfy the requirements of
the customer. The expected market share of a product is measured
by the expected number of buyers among the customers, each of
whom is equally likely to buy any product that dominates them. A Nash
Equilibrium is a configuration of the products achieving stable
expected market shares for all products. We prove that a Nash
Equilibrium in such a model can be computed in polynomial time if
every manufacturer tries to modify its product in a round-robin
manner. To further improve the efficiency of the computation, we
also design two algorithms for the manufacturers to efficiently
find their best response to other products in the market. This is
joint work with Laks V.S. Lakshmanan and Anthony K. H. Tung.
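
As a rough illustration of the market-share definition in the abstract above, the following Python sketch computes expected market shares for a fixed set of products. It is not the speaker's algorithm: the function names, the sample points, and the reading of "satisfy" as "at least as large in every dimension" are assumptions made here for illustration.

```python
from fractions import Fraction


def dominates(product, customer):
    """Assumed reading: a product dominates a customer when every
    attribute is at least the customer's requirement in that dimension."""
    return all(p >= c for p, c in zip(product, customer))


def expected_market_shares(products, customers):
    """Each customer buys, with equal probability, one of the products
    that dominate them; customers dominated by no product buy nothing."""
    shares = [Fraction(0)] * len(products)
    for customer in customers:
        candidates = [i for i, prod in enumerate(products)
                      if dominates(prod, customer)]
        for i in candidates:
            shares[i] += Fraction(1, len(candidates))
    return shares


# Hypothetical two-attribute market: two products, three customers.
products = [(3, 3), (5, 1)]
customers = [(2, 2), (1, 1), (4, 1)]
shares = expected_market_shares(products, customers)
print(shares)  # each product has an expected share of 3/2 customers
```

Exact fractions are used so that shares split among several dominating products add up without rounding error.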
Bio:
Zhenjie Zhang is a PhD candidate in the Database Group at the
National University of Singapore. He received his B.Sc. in
Computer Science from Fudan University, China. His research
interests include general skyline query, unsupervised learning,
and game theoretical analysis over large data. Zhenjie presently
has 10 research papers to his name including papers in major
venues such as SIGMOD, ICML and TKDE. He was a recipient of the
prestigious NUS President Fellowship in 2007. More about Zhenjie's
research can be found at www.comp.nus.edu.sg/~zhangzh2/

DB Meeting: Friday July 4, 2:00pm, DC 1331
Speaker: Xin Liu
Title: Application Hints for Multi-tier Cache Management
Abstract:
In today's computer systems, a data access usually goes through several
tiers of cache, such as application buffer, file system cache, and
storage server cache. Typically each tier of cache is managed by its
own sub-system independently. The information about the data access
shared between these sub-systems is either limited or under-utilized.
Although sub-systems try their best to improve their individual cache
performance, their lack of I/O information from other tiers, or their
ignorance of it, makes their efforts sub-optimal. In fact, when an I/O
request is passed from one tier to another, some attributes associated
with it are useful in predicting future access of the same data page.
These attributes can be passed from an upper cache tier, such as a
storage client, to a lower-tier cache, such as a storage server, as
hints. They can be used by the latter in its cache management. In this
presentation I will present our study on how I/O attributes in a
storage client are related to its future data accesses to the storage
server, and how we utilize these attributes to manage the lower-tier
cache.

DB Meeting: Friday July 11, 2:00pm, DC 1331
Speaker: Luiz Celso Gomez Jr.
Title: Web user databases
Abstract:
The demand for richer interactive web applications has
turned browsers into powerful programming platforms. The only piece
still missing is an integrated user data management system to replace
the limited flexibility provided by cookies. In this talk I will
outline the requirements for such a system and present what has been done
towards this end, along with possible new alternatives. At the end I will
give an overview of a few techniques for meeting selected requirements.

DB Meeting: Friday July 18, 2:00pm, DC 1331
Speaker: Raymond Wong, Hong Kong University of Science and Technology (HKUST)
Title: Minimality Attack in Privacy Preserving Data Publishing
Abstract:
Data publishing generates much concern over the protection
of individual privacy. Recent studies consider cases where the
adversary may possess different kinds of knowledge about the
data. In this paper, we show that knowledge of the mechanism
or algorithm of anonymization for data publication can
also lead to extra information that assists the adversary and
jeopardizes individual privacy. In particular, all known mechanisms
try to minimize information loss and such an attempt
provides a loophole for attacks. We call such an attack a minimality
attack. In this paper, we introduce a model called
m-confidentiality which deals with minimality attacks, and
propose a feasible solution. Our experiments show that minimality
attacks are practical concerns on real datasets and
that our algorithm can prevent such attacks with very little
overhead and information loss.

DB Meeting: Friday July 25, 2:00pm, DC 1331
Speaker: Patrick Kling
Title: Distributed XML Query Processing
Abstract:
XML is commonly used to exchange data among a variety of systems.
Therefore, XML data can be viewed as inherently distributed according
to the origin of individual fragments. Large XML collections and heavy
workloads also force us to distribute XML data. In this talk, I will
describe a number of ways in which XML data can be distributed. I will
then discuss techniques that allow us to query distributed XML and how
querying can take advantage of known distribution characteristics.

DB Meeting: Friday August 1, 2:00pm, DC 1331
Speaker: Wei Jiang
Title: Ordered Conjunctive Queries
Abstract:
Order properties of data have been playing an important role in relational
query processing and optimization. However, conventional database systems
consider order support only as an add-on feature to the core query
optimization. They fail to provide a systematic and consistent treatment
of order throughout query processing. We propose a novel approach to
representing ordered data models, and to tracing and referring to order properties
throughout query processing and optimization. By considering order from
the very beginning of query processing, we expect to gain the most
benefit from query optimization. An ordered data model and ordered
algebra will be presented for ordered conjunctive queries.

This page is maintained by Ashraf Aboulnaga.