Winter 2008
Note: Events of interest to the
Database Research Group are posted to the uw.cs.database
newsgroup and are mailed to the
db-group@lists.uwaterloo.ca
mailing list. There are actually three mailing lists aggregated into the
db-group list: db-faculty
(for DB group faculty), db-grads (for DB group graduate students),
and db-friends (for DB group alumni, visitors, and friends). If
you wish to subscribe to one of these three lists (or to unsubscribe from one), please
visit
https://lists.uwaterloo.ca/mailman/listinfo/<listname>, where
<listname> is the name of the list.
- DB group meetings
  The DB group meets most Friday afternoons at 2pm, usually in DC 1331.
  See the list of current events for times and locations of upcoming
  meetings. Each meeting lasts an hour and features an informal
  presentation by one of the members of the group. Everyone is welcome to
  attend. These talks are intended to raise questions and stimulate
  discussion rather than to be polished presentations of research results.
  Speakers are determined using a rotating speaker list, which can be
  found on the DB group meeting page.
- DB seminar series
  The DB seminar series features visiting speakers. These seminars are
  held roughly monthly and are usually scheduled on Monday mornings at
  11am. See the list of current events for times and locations of
  upcoming seminars. The full schedule can be found on the DB seminar
  series page.
Recent and Upcoming Events
DB Meeting: Friday January 11, 2:00pm, DC 1331
Speaker: Grant Weddell
Title: Evolution of Functional Constraints in Databases
Abstract:
Our talk will survey the topic of functional dependencies: their
origins, their applications and how they have evolved to accommodate
notions of time, approximation and the evolution of the underlying
relational data model itself.

DB Meeting: Friday January 18, 2:00pm, DC 1331 (CANCELLED)
Speaker: Matthew Young-Lai

DB Meeting: Friday January 25, 2:00pm, DC 1331
Speaker: Gord Cormack
Title: Spam Filter Improvement through Competition
Abstract:
Spam filters are widely promoted by (at least) three separate groups:
practitioners who just want to get the job done, vendors who wish to
sell their wares, and researchers who wish to promote their latest
theories. Spam filter benchmarks and competitions force these groups
to work and play together, with dramatic results. I will discuss five
spam filter competitions run by ECML, TREC and CEAS, what they have
taught us, and what remains to be learned. The bottom line is that we
know how to filter spam much better than we did three years ago.

DB Meeting: Friday February 1, 2:00pm, DC 1331 (CANCELLED)
Speaker: Ahmed Soror
Title: Automatic Virtual Machine Configuration for Database Workloads
Abstract:
Virtual machine monitors are becoming popular tools for the deployment
of database management systems and other enterprise software
applications. In this talk, we discuss how to optimize the performance
of database management systems by controlling the configurations of the
virtual machines in which they run. These virtual machine configurations
determine how the shared physical resources are allocated to the
different database instances. We will give an overview of our technique,
which uses information about the anticipated workload of each database
system to recommend workload-specific configurations.

Special Event: Friday February 8, 3:00-4:30pm, DC 3301
Title: Database Lab Open House

DB Meeting: Friday February 15, 2:00pm, DC 1331
Speaker: David DeHaan
Title: Equivalence of conjunctive queries under various semantics
Abstract:
This will be an informal talk on a classic database theory problem:
testing query equivalence. This problem is fundamental to any logical
query rewriting (for example, rewriting queries using materialized
views). I'll start with a quick refresher on some very basic database
theory: containment and equivalence of conjunctive queries under set
semantics, with extensions to disjunctions, inequalities, schema
dependencies, etc. I'll then discuss some alternative semantics: bag
and bag-set semantics (Chaudhuri & Vardi, 1993) and "combined
set/bag-set semantics" (Cohen, 2006). Finally, I'll discuss my research
into query equivalence for an SQL-like language that supports nesting
of uninterpreted aggregation functions.

DB Meeting: Friday February 22, 2:00pm, DC 1304 (please note change of room)
Speaker: Ahmed Soror
Title: Automatic Virtual Machine Configuration for Database Workloads
Abstract:
Virtual machine monitors are becoming popular tools for the deployment
of database management systems and other enterprise software
applications. In this talk, we discuss how to optimize the performance
of database management systems by controlling the configurations of the
virtual machines in which they run. These virtual machine configurations
determine how the shared physical resources are allocated to the
different database instances. We will give an overview of our technique,
which uses information about the anticipated workload of each database
system to recommend workload-specific configurations.

DB Meeting: Friday February 29, 2:00pm, DC 1331
Speaker: Ihab Ilyas
Title: Declarative and Lazy Data Cleaning Through Probabilistic Modeling of Erroneous Data
Abstract:
I will talk about our in-progress work on using uncertain data models
to describe the various possibilities and decisions in current data
cleaning algorithms. The talk starts with background on probabilistic
data models and on data cleaning algorithms that focus on duplicate
elimination. I'll then describe our proposal for probabilistic data
cleaning. This is joint work with my students George Beskales and
Mohamed Soliman.

DB Meeting: Friday March 7, 2:00pm, MC 5158A (please note change of room)
Speaker: Wei Jiang
Title: Order-Related Work in Conventional and Non-Conventional Databases
Abstract:
Order is a critical characteristic of data and has long been studied in
the database community. The need to study the ordering of data stems
from two main issues. First, the output of a query is often required to
be sorted in a specific order. Second, knowing the ordering of the
inputs allows sort-based operations (such as join and duplicate
elimination) to be implemented more efficiently.
Considerable effort has been devoted to supporting order in query
processing and optimization since the 1970s. I will give a survey of
the work that has been done on order support in databases. Relevant
work spans various research areas, such as data models, algebras,
query optimization, query rewriting, and integrity constraints.
However, none of this work provides a systematic and consistent
treatment of order throughout query processing. In the second part of
this talk, I will present my original work on this topic. The order of
data will be represented by binary relations on virtual tuple
identifiers, and an ordered data model and a complete ordered algebra
will be proposed for ordered conjunctive queries.

DB Meeting: Friday March 28, 2:00pm, DC 1331
Speaker: Yingying Tao
Title: Mining frequent itemsets in dynamic data streams
Abstract:
A transactional data stream is an unbounded sequence of transactions
continuously generated at a high rate. Mining frequent itemsets in such
a stream benefits many real-world applications but is also a
challenging task. Furthermore, when a data stream varies over time,
infrequent itemsets may become frequent and vice versa. Few algorithms
in the literature are capable of maintaining and updating frequent
itemsets for such dynamic data streams. In this talk, I will present a
false-negative algorithm that can find most of the frequent itemsets,
detect distribution changes, and update the mining results accordingly.

DB Meeting: Friday April 4, 2:00pm, DC 1331
Speaker: Charles Clarke
Title: Novelty and Diversity in Information Retrieval

DB Meeting: Friday April 18, 2:00pm, DC 1331
Speaker: Amr El-Helw
Title: Collecting and Exploiting Statistics on Query Expressions
Abstract:
Database statistics are crucial to cost-based optimizers for estimating
the execution cost of a query plan. Relying only on basic statistics on
base tables requires adopting unrealistic assumptions to estimate the
cardinalities of intermediate results, which usually causes large
estimation errors that can reach several orders of magnitude. In this
talk, I will present some of the work done on creating and exploiting
statistics (or samples) built on expressions corresponding to
intermediate nodes of query plans.

Seminar: Monday April 21, 1:00pm, MC 5136
Speaker: José A. Blakeley, Microsoft
Title: SQL Server: A Data Platform for Scientific and Engineering Applications
Abstract:
Large-scale scientific and engineering applications are pushing the scale
boundaries of data management systems. They require managing large volumes
of data, new data organization and partitioning paradigms, moving analysis
close to the data, integrating the tools, languages and packages of
scientists with the data management system, more productive visualization
and rapid development languages and tools, and bridging and, when possible,
eliminating the semantic gap between scientific applications and their
data. Many of these applications have traditionally been built on top of
file systems, with their own proprietary data models, multi-threading,
resource management, etc., which makes them brittle and makes it hard to
share solutions. We propose a data management approach centered on
database technology: a data platform architecture on which all science and
engineering applications can be built. As evidence of its practicality, we
present case studies of large-scale scientific projects being built on
this architecture. The data platform is being built around Windows, the
Microsoft SQL Server product and its data services, and the .NET
programming languages and developer tools. We hope to broaden the dialog
between the database community and the various science communities about
the data management capabilities their applications require. We also
expect to steer the interest of database researchers toward the data
management challenges of science and engineering applications.
Bio:
José Blakeley is a partner architect in the SQL Server Division at
Microsoft Corporation. He is currently lead architect in the SQL Engine
group, which builds the core engine for the SQL Server product.
Previously, he was the lead architect in the SQL Data Programmability
group, building the Entity Framework in ADO.NET. José has contributed to
numerous programmability and extensibility features in the SQL Server
products. Before joining Microsoft in 1994, José was a Member of the
Technical Staff at the Computer Science Laboratory at Texas Instruments,
where he was a principal investigator in the development of DARPA
Open-OODB, an object-oriented database system. He has over 20 granted or
pending patents. José received a computer systems engineering degree from
ITESM, Monterrey, Mexico, and M.Math and Ph.D. degrees in computer science
from the University of Waterloo, Canada.

Seminar: Tuesday April 22, 11:00am, MC 5158
Speaker: Pedro Celis, Microsoft
Title: What is new in databases?
Abstract:
A perspective on how database technology has evolved and the challenges
that we face.
Bio:
Pedro Celis is currently a Distinguished Engineer in the SQL Server group,
where he has architectural oversight responsibilities for all of the
services of the SQL Server product. He also manages the central
architecture team, a group of world-class architects in the database field
who guide the strategy and architecture of the SQL product and its
components and drive innovation and incubation projects. He worked in
California for Britton-Lee Systems before joining the NonStop SQL group of
Tandem Computers. He worked for nine years for Britton-Lee Systems and
became one of the few people ever named Technical Director. In 2003, Celis
was nominated by President Bush to serve a two-year term on the
President's Information Technology Advisory Committee (PITAC). This
25-member committee is made up of information infrastructure experts from
industry and academia who advise the President on how to maintain U.S.
pre-eminence in information technology. He holds an engineering degree
from the Monterrey Institute of Technology (ITESM), and M.Math and Ph.D.
degrees in Computer Science from the University of Waterloo in Canada.
Celis worked as a post-doctoral fellow at the University of Waterloo and
later as an assistant professor in the Computer Science Department at
Indiana University in Bloomington, Indiana. He holds around 15 U.S.
patents.

MMath Thesis Presentation: Friday May 2, 2:00pm, DC 1331
Speaker: Ahmed Ataullah
Title: Records Retention in Relational Database Systems
Abstract:
The recent introduction of several pieces of legislation mandating
minimum and maximum retention periods for corporate records has
prompted the Enterprise Content Management (ECM) community to develop
various records retention solutions. Unfortunately, the scope of their
work has been largely limited to proper identification, classification
and retention of documents.
In this work, we address the problem of managed records retention in
the context of relational database systems. The problem is
significantly more challenging than it is for documents, for several
reasons. Foremost, there is no universal definition of what
constitutes a business record in relational databases. Whether a
record is an entire table, a tuple, part of a tuple, or parts of
several tuples from multiple tables depends on the users'
requirements. There are also no standardized mechanisms for purging,
anonymizing and protecting relational records. Functional
dependencies, user-defined constraints, and side effects caused by
triggers make it even harder to guarantee that any given record will
actually be protected when it needs to be protected, or expunged when
the necessary conditions are met. Most importantly, relational tuples
may be organized such that a single piece of data is part of various
legal records and subject to several (possibly conflicting) retention
policies.
We address the above problems and present a complete solution for
designing, managing and enforcing records retention policies in
relational database systems. Our tests, conducted within a realistic
data retention scenario and using a standard commercial database
system, demonstrate that the proposed framework can guarantee
compliance with a broad range of retention policies without incurring
a significant performance overhead for policy monitoring and
enforcement.

This page is maintained by
Ashraf Aboulnaga.