Database Research Group Events

Spring 2011

Events of interest to the Database Research Group are posted here, and are also mailed to the uw.cs.database newsgroup and the db-faculty, db-grads, db-friends mailing lists. Subscribe to one of these mailing lists to receive e-mail notification of upcoming events.

The DB group meets Wednesday afternoons at 2:30pm. The list below gives the times and locations of upcoming meetings. Each meeting lasts for an hour and features either a local speaker or, on Seminar days, an invited outside speaker. Everyone is welcome to attend.

Spring 2011 Events

DB Seminar: Wednesday May 4, 2:30pm, DC 1302
Speaker: Divyakant Agrawal, University of California, Santa Barbara
Title: Elastic Scalability of Data-intensive Applications in the Cloud

DB Meeting: Wednesday May 11, 2:30pm, DC 1331
Speaker: Ani Nica, Sybase
Title: A Call for Order in Search Space Generation Process of Query Optimization
Abstract: Many search space generation algorithms used by query optimizers focus on efficiently and exhaustively enumerating the set of alternative plans. However, newly emerged systems, such as Object-Relational Mapping (ORM) tools, introduce new challenges for the query optimizer. This is because dynamically generated queries are very complex, are relatively inexpensive to execute, and must be optimized at runtime. For such queries, the optimization time could be prohibitively expensive relative to their execution time if traditional join enumeration algorithms are used. Recently, new heuristics for query optimization have been proposed that provide feasible alternatives to the classical enumeration algorithms by using greedy techniques to build an execution plan efficiently, so that the optimization time is kept to a minimum.

In this talk, I will present the paper "A Call for Order in Search Space Generation Process of Query Optimization" [1], which was published in the proceedings of the IEEE International Workshop on Self Managing Database Systems (IEEE SMDB), April 2011, Hannover, Germany. The paper introduces new techniques for dynamically ordering the search space of the candidate joins generated by conventional enumeration algorithms. Ordering the enumerated join trees based on their properties allows effective partition-based pruning of the search space, so that the space size and the optimization time adapt to the current characteristics of the optimization process. Any conventional join enumeration algorithm that enumerates partitions and uses memoization to save access plans can be extended with our proposed technique.

[1] Anisoara Nica, A Call for Order in Search Space Generation Process of Query Optimization, IEEE International Workshop on Self Managing Database Systems (IEEE SMDB), IEEE International Conference on Data Engineering (IEEE ICDE) Workshops 2011, Hannover, Germany

DB Meeting: Wednesday May 18, 2:30pm, DC 1331
Speaker: Anil Goel, Sybase
Title: Big Data, Advanced Analytics
Abstract: I will give a business and technological overview of the "Big Data" and "Advanced Analytics" problem spaces, and an outline of one potential solution space for these problems as envisioned by Sybase, particularly in the context of the Sybase IQ EDW offering. I will also provide context for work being done in the Sybase Waterloo lab.

DB Meeting: Wednesday May 25, 2:30pm, DC 1331
Speaker: Shai Ben-David
Title: Privacy preserving data mining
Abstract: We address the question of how a database can be mined to obtain aggregate information without disclosing information about individual records in that database. This has been an active research area in the data mining community for quite a while, and in the past several years it has also been addressed by the theoretical cryptography research community. I will give a brief survey of the existing literature on this topic, discuss some shortcomings of the most popular current approaches, and present some new ideas addressing these shortcomings that have been developed jointly with Rita Ackerman and Dale Schuurmans.

DB Meeting: Wednesday June 1, 2:30pm, DC 1331
Speaker: Chen Zhang
Title: CloudBATCH: A Batch Job Queuing System on Clouds with Hadoop and HBase
Abstract: As MapReduce becomes more and more popular in data processing applications, the demand for Hadoop clusters keeps growing. However, Hadoop is incompatible with existing cluster batch job queuing systems and requires a dedicated cluster under its full control. Hadoop also lacks support for user access control, accounting, fine-grained performance monitoring, and legacy batch job processing facilities comparable to those of existing cluster job queuing systems. This makes dedicated Hadoop clusters less amenable to administrators and normal users alike with hybrid computing needs that involve both MapReduce and legacy applications. As a result, getting a properly suited and sized Hadoop cluster has not been easy in organizations with existing clusters. This talk presents CloudBATCH, a prototype solution to this problem that enables Hadoop to function as a traditional batch job queuing system with enhanced functionality for cluster resource management. With a system like CloudBATCH, a complete shift to Hadoop for managing an entire cluster to cater for hybrid computing needs becomes feasible.

DB Meeting: Wednesday June 8, 2:30pm, DC 1331
Speaker: Umar Farooq Minhas
Title: A Bayesian Approach to Online Performance Modeling for Database Appliances using Gaussian Models
Abstract: To meet service level agreements (SLAs) and to maintain peak performance for database management systems (DBMSes), database administrators (DBAs) need to implement policies for effective workload scheduling, admission control, and resource provisioning. Accurately predicting response times of DBMS queries is necessary for a DBA to achieve these goals effectively. This task is particularly challenging because a database workload typically consists of many concurrently running queries, and an accurate model needs to capture their interactions. Additional challenges are introduced when DBMSes are run in dynamic cloud computing environments, where workload, data, and physical resources can change frequently, on the fly. Building an efficient and highly accurate online DBMS performance model that is robust in the face of changing workloads, data evolution, and physical resource allocations is still an unsolved problem. In this work, our goal is to build such an online performance model for database appliances using an experiment-driven modeling approach. We use a Bayesian approach and build novel Gaussian models that take into account the interaction among concurrently executing queries and predict response times of individual DBMS queries. A key feature of our modeling approach is that the models can be updated online in response to new queries or data, or changing resource allocations. We experimentally demonstrate that our models are accurate and effective - our best models have an average prediction error of 16.3% in the worst case.

DB Meeting: Wednesday June 22, 2:30pm, DC 1331 (CANCELLED)
Speaker: Ken Salem

DB Seminar: Monday June 27, 10:30am, DC 1302 (Please note change in day and starting time)
Speaker: Shivnath Babu, Duke University
Title: MADDER and Self-Tuning Data Analytics on Hadoop with Starfish

DB Seminar: Wednesday June 29, 2:30pm, DC 1302
Speaker: Renee Miller, University of Toronto
Title: On Schema Discovery

DB Seminar: Wednesday July 6, 2:30pm, MC 5136 (Please note change in room)
Speaker: Avigdor Gal, Technion
Title: Uncertain Schema Matching: the Power of not Knowing

DB Seminar: Wednesday July 13, 2:30pm, MC 2018B (Please note change in room)
Speaker: Andreas Thor, University of Maryland
Title: Data Integration in the Cloud

DB Meeting: Wednesday July 20, 2:30pm, DC 1331
Speaker: Andrew Kane
Title: Contemporary misconceptions that limit distributed system design and implementation
Abstract: The current practice for distributed systems builds on unwritten assumptions about how to design and implement such systems. I believe that some of these assumptions are limiting potential optimizations. For example, the belief that static compilation is faster than just-in-time compilation leads to query plans that don't change after a query starts, and to query executions that don't adapt to data distribution changes across the span of a query. To make these ideas more concrete, I will show potential benefits using a three-tiered distributed database system as an example. The misconceptions presented are related to machine design, data-level distribution, and tiered solutions.

DB Meeting: Wednesday July 27, 2:30pm, DC 1331
Speaker: Changjiu Jin
Title: View Consistency in Cassandra
Abstract: Recently, the cloud computing paradigm has been receiving significant attention. Relational databases may lead to inefficiencies and limit scale and availability. Therefore, some systems prefer to sacrifice some relational database features, such as efficient support for complex queries and data consistency, to cater to the large scalability requirements of the cloud.

We explore a view management mechanism that speeds up certain kinds of queries in NoSQL systems. The mechanism updates materialized views and addresses the inconsistency between base tables and materialized views by providing a session consistency guarantee.

DB Meeting: Wednesday August 24, 2:30pm, DC 1331
Speakers: Patrick Kling and Umar Farooq Minhas
Title: VLDB practice talks
  • RemusDB: Transparent High-Availability for Database Systems (Umar)
  • Generating Efficient Execution Plans for Vertically Partitioned XML Databases (Patrick)

This page is maintained by Ken Salem.

Data Systems Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208

Last modified: Friday, 01-Jun-2012 11:01:03 EDT