Database Research Group Events

Winter 2008

Note: Events of interest to the Database Research Group are posted to the uw.cs.database newsgroup and are mailed to the db-group@lists.uwaterloo.ca mailing list. There are actually three mailing lists aggregated into the db-group list: db-faculty (for DB group faculty), db-grads (for DB group graduate students), and db-friends (for DB group alumni, visitors, and friends). If you wish to subscribe to one of these three lists (or to unsubscribe), please visit https://lists.uwaterloo.ca/mailman/listinfo/<listname>, where <listname> is the list you wish to subscribe to.
DB group meetings
The DB group meets most Friday afternoons at 2pm, usually in DC1331. See the list of current events for times and locations of upcoming meetings. Each meeting lasts for an hour and features an informal presentation by one of the members of the group. Everyone is welcome to attend. These talks are intended to raise questions and to stimulate discussion rather than being polished presentations of research results. Speakers are determined using a rotating speaker list, which can be found on the DB group meeting page.
DB seminar series
The DB seminar series features visiting speakers. These seminars are more-or-less monthly, and are usually scheduled on Monday mornings at 11am. See the list of current events for times and locations of upcoming seminars. The full schedule can be found on the DB seminar series page.

Recent and Upcoming Events


DB Meeting: Friday January 11, 2:00pm, DC 1331
Speaker: Grant Weddell
Title: Evolution of Functional Constraints in Databases
Abstract: Our talk will survey the topic of functional dependencies: their origins, their applications and how they have evolved to accommodate notions of time, approximation and the evolution of the underlying relational data model itself.

DB Seminar: Monday January 14, 10:30am, DC 1304
Speaker: Ahmed Elmagarmid, Purdue University
Title: Cyber Communities: Enabling Innovation in Science and Engineering

DB Meeting: Friday January 18, 2:00pm, DC 1331 CANCELLED
Speaker: Matthew Young-Lai

DB Meeting: Friday January 25, 2:00pm, DC 1331
Speaker: Gord Cormack
Title: Spam Filter Improvement through Competition
Abstract: Spam Filters are widely promoted by (at least) three separate groups: practitioners who just want to get the job done, vendors who wish to sell their wares, and researchers who wish to promote their latest theory.

Spam filter benchmarks and competitions force these groups to work and play together, with dramatic results. I will discuss five spam filter competitions run by ECML, TREC and CEAS, what they have taught us and what remains to be learned. The bottom line is that we know how to filter spam much better than we did three years ago.


DB Meeting: Friday February 1, 2:00pm, DC 1331 CANCELLED
Speaker: Ahmed Soror
Title: Automatic Virtual Machine Configuration for Database Workloads
Abstract: Virtual machine monitors are becoming popular tools for the deployment of database management systems and other enterprise software applications. In this talk we discuss how we can address the problem of optimizing the performance of database management systems by controlling the configurations of the virtual machines in which they run. These virtual machine configurations determine how the shared physical resources will be allocated to the different database instances. We will give an overview of our technique, which uses information about the anticipated workloads of each of the database systems to recommend workload-specific configurations.

Special Event: Friday February 8, 3:00-4:30pm, DC 3301
Title: Database Lab Open House

DB Seminar: Monday February 11, 2:30pm, DC 1304 (Please note change of time)
Speaker: Neoklis Polyzotis, University of California - Santa Cruz
Title: Depth Estimation for Ranking Query Optimization

DB Meeting: Friday February 15, 2:00pm, DC 1331
Speaker: David DeHaan
Title: Equivalence of conjunctive queries under various semantics
Abstract: This will be an informal talk on a classic database theory problem: testing query equivalence. This problem is fundamental to any logical query rewriting (such as using materialized views). I'll start with a quick refresher of some very basic database theory---containment and equivalence of conjunctive queries under set semantics, with extensions to disjunctions, inequalities, schema dependencies, etc. I'll then discuss some alternative semantics: bag and bag-set semantics (Chaudhuri & Vardi 1993) and "combined set/bag-set semantics" (Cohen 2006). Finally, I'll discuss my research into query equivalence for an SQL-like language that supports nesting of uninterpreted aggregation functions.

DB Meeting: Friday February 22, 2:00pm, DC 1304 (Please note change of room)
Speaker: Ahmed Soror
Title: Automatic Virtual Machine Configuration for Database Workloads
Abstract: Virtual machine monitors are becoming popular tools for the deployment of database management systems and other enterprise software applications. In this talk we discuss how we can address the problem of optimizing the performance of database management systems by controlling the configurations of the virtual machines in which they run. These virtual machine configurations determine how the shared physical resources will be allocated to the different database instances. We will give an overview of our technique, which uses information about the anticipated workloads of each of the database systems to recommend workload-specific configurations.

DB Meeting: Friday February 29, 2:00pm, DC 1331
Speaker: Ihab Ilyas
Title: Declarative and Lazy Data Cleaning Through Probabilistic Modeling of Erroneous Data
Abstract: I will talk about our work (in progress) on using uncertainty data models to describe the various possibilities and decisions in current data cleaning algorithms. The talk starts with background on probabilistic data models and on data cleaning algorithms that focus on duplicate elimination. I'll then describe our proposal for probabilistic data cleaning. This is joint work with my students George Beskales and Mohamed Soliman.

DB Meeting: Friday March 7, 2:00pm, MC 5158A (Please note change of room)
Speaker: Wei Jiang
Title: Order-Related Work in Conventional and Non-Conventional Databases
Abstract: Order is a critical characteristic of data, and has been studied for a long time in the database community. The need to study ordering in databases stems from two main issues. First, the output of a query is often required to be sorted in a specific order. Second, by knowing the ordering of inputs, sort-based operations (such as join, duplicate elimination, etc.) can be implemented more efficiently.

Considerable effort has gone into supporting order in query processing and optimization since the 1970s. I will survey the work that has been done on order support in databases. Relevant work spans various research areas, such as data models, algebras, query optimization, query rewriting, and integrity constraints.

However, none of this work provides a systematic and consistent treatment of order throughout query processing. In the second part of this talk, I will present my original work on this topic. The order of data will be represented by binary relations on virtual tuple identifiers. An ordered data model and a complete ordered algebra will be proposed for ordered conjunctive queries.


DB Meeting: Friday March 28, 2:00pm, DC 1331
Speaker: Yingying Tao
Title: Mining frequent itemsets in dynamic data streams
Abstract: A transactional data stream is an unbounded sequence of transactions continuously generated at a high rate. Mining frequent itemsets in such a stream is beneficial to many real-world applications but is also a challenging task. Furthermore, when a data stream varies over time, infrequent itemsets may become frequent and vice versa. Few algorithms in the literature are capable of maintaining and updating frequent itemsets for such dynamic data streams. In this talk, I will present a false-negative algorithm which can find most of the frequent itemsets, detect distribution changes, and update the mining results accordingly.

DB Seminar: Monday March 31, 10:30am, DC 1304
Speaker: Denilson Barbosa, University of Calgary
Title: Towards Schema-Free Exchange and Update of XML Data

DB Meeting: Friday April 4, 2:00pm, DC 1331
Speaker: Charles Clarke
Title: Novelty and Diversity in Information Retrieval
Abstract:

DB Meeting: Friday April 18, 2:00pm, DC 1331
Speaker: Amr El-Helw
Title: Collecting and Exploiting Statistics on Query Expressions
Abstract: Database statistics are crucial to cost-based optimizers for estimating the execution cost of a query plan. Using traditional basic statistics on base tables requires adopting unrealistic assumptions to estimate the cardinalities of intermediate results, which usually causes large estimation errors that can be several orders of magnitude. In this talk, I will present some of the work done on creating and exploiting statistics (or samples) built on expressions corresponding to intermediate nodes of query plans.

Seminar: Monday April 21, 1:00pm, MC 5136
Speaker: José A. Blakeley, Microsoft
Title: SQL Server: A Data Platform for Scientific and Engineering Applications
Abstract: Large-scale scientific and engineering applications are pushing the scale boundaries of data management systems. They require managing large volumes of data, new data organization and partitioning paradigms, moving analysis close to the data, integrating the tools, languages and packages of scientists with the data management system, more productive visualization and rapid development languages and tools, bridging and, when possible, eliminating the semantic gap between scientific applications and their data. Many of these applications have traditionally been built on top of file systems, using proprietary data models, multi-threading, resource management, etc., making them brittle and their solutions hard to share. We propose a data management approach centered on database technology. We propose a data platform architecture on which all science and engineering applications can be built, and present case studies of large-scale scientific projects being built on this architecture as evidence of its practicality. The data platform is being built around Windows, the Microsoft SQL Server product and its data services, as well as the .NET programming languages and developer tools. We hope to broaden the dialog among the database and various science communities about their requirements and the data management capabilities required by their applications. We also expect to steer the interest of database researchers toward the data management challenges of science and engineering applications.
Bio: José Blakeley is a partner architect in the SQL Server Division at Microsoft Corporation. He is currently lead architect in the SQL Engine group which builds the core engine for the SQL Server Product. Previously he was the lead architect in SQL Data Programmability group building the Entity Framework in ADO.NET. José has contributed to numerous programmability and extensibility features in the SQL Server products. Before joining Microsoft in 1994, José was a Member of the Technical Staff at the Computer Science Laboratory at Texas Instruments where he was a principal investigator in the development of DARPA Open-OODB, an object-oriented database system. He has over 20 granted or pending patents. José received a computer systems engineering degree from ITESM, Monterrey, Mexico, and M.Math and Ph.D. degrees in computer science from the University of Waterloo, Canada.

Seminar: Tuesday April 22, 11:00am, MC 5158
Speaker: Pedro Celis, Microsoft
Title: What is new in databases?
Abstract: A perspective on how database technology has evolved and the challenges that we face.
Bio: Pedro Celis is currently a Distinguished Engineer in the SQL Server group where he has architectural oversight responsibilities for all of the services of the SQL Server product. He also manages the central architecture team, a group of world-class architects in the database field that guide the strategy and architecture of the SQL product and its components, and drive innovation and incubation projects. He worked in California for Britton-Lee Systems before joining the Non-Stop SQL group of Tandem Computers. He worked for nine years for Britton-Lee Systems and became one of the few persons ever named Technical Director. In 2003, Celis was nominated by President Bush to serve a two-year term on the President's Information Technology Advisory Committee (PITAC). This 25-member committee is made up of information infrastructure experts from industry and academia who advise the president on how to maintain U.S. pre-eminence in information technology. He holds an engineering degree from the Monterrey Institute of Technology (ITESM), and M.Math and Ph.D. degrees in Computer Science from the University of Waterloo in Canada. Celis worked as a post-doctoral fellow at the University of Waterloo and later as an assistant professor at the Computer Science Department at Indiana University in Bloomington, Indiana. He holds around 15 U.S. patents.

MMath Thesis Presentation: Friday May 2, 2:00pm, DC 1331
Speaker: Ahmed Ataullah
Title: Records Retention in Relational Database Systems
Abstract: The recent introduction of several pieces of legislation mandating minimum and maximum retention periods for corporate records has prompted the Enterprise Content Management (ECM) community to develop various records retention solutions. Unfortunately, the scope of their work has been largely limited to proper identification, classification and retention of documents.

In this work we address the problem of managed records retention in the context of relational database systems. The problem is significantly more challenging than it is for documents for several reasons. Foremost, there is no universal definition of what constitutes a business record in relational databases. Whether a record is an entire table, a tuple, part of a tuple, or parts of several tuples from multiple tables depends on the users' requirements. There are also no standardized mechanisms for purging, anonymizing and protecting relational records. Functional dependencies, user-defined constraints, and side effects caused by triggers make it even harder to guarantee that any given record will actually be protected when it needs to be protected or expunged when the necessary conditions are met. Most importantly, relational tuples may be organized such that one single piece of data may be part of various legal records and subject to several (possibly conflicting) retention policies.

We address the above problems and present a complete solution for designing, managing and enforcing records retention policies in relational database systems. Our tests, conducted within a realistic data retention scenario and using a standard commercial database system, demonstrate that the proposed framework can guarantee compliance with a broad range of retention policies without incurring a significant performance overhead for policy monitoring and enforcement.


This page is maintained by Ashraf Aboulnaga.

Data Systems Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208

Contact | Feedback: db-webmaster@cs.uwaterloo.ca | Data Systems Group


Last modified: Friday, 01-Jun-2012 11:01:03 EDT

