Database Research Group Events

Winter 2012

Events of interest to the Database Research Group are posted here, and are also mailed to the uw.cs.database newsgroup and the db-faculty, db-grads, db-friends mailing lists. Subscribe to one of these mailing lists to receive e-mail notification of upcoming events.

The DB group meets Wednesday afternoons at 2:30pm. The list below gives the times and locations of upcoming meetings. Each meeting lasts for an hour and features either a local speaker or, on Seminar days, an invited outside speaker. Everyone is welcome to attend.

Winter 2012 Events


DB Meeting: Wednesday January 11, 2:30pm, DC 1331
Speaker: Umar Farooq Minhas
Title: Scalable and Highly Available Database Systems in the Cloud
Abstract: Cloud computing allows users to tap into a massive pool of shared computing resources such as servers, storage, and network. These resources are provided as a service to the users allowing them to "plug into the cloud" similar to a utility grid. The promise of the cloud is to free users from the tedious and often complex task of managing and provisioning computing resources to run applications. At the same time, the cloud brings several additional benefits including: a pay-as-you-go cost model, easier deployment of applications, elastic scalability, high availability, and a more robust and secure infrastructure. One important class of applications that users are increasingly deploying in the cloud is database management systems. Database management systems differ from other types of applications in that they manage large amounts of state that is frequently updated, and that must be kept consistent at all scales and in the face of failure. This makes it difficult to provide scalability and high availability for database systems in the cloud. In this talk, I will show how we can exploit cloud technologies and relational database systems to provide a highly available and scalable database service in the cloud. In the first part of my talk, I will present RemusDB, a reliable, cost-effective high availability solution that is implemented as a service provided by the virtualization platform. RemusDB can make any database system highly available with little or no code modifications by exploiting the capabilities of virtualization. In the second part of the talk, I will present two systems that aim to provide elastic scalability for database systems in the cloud using two very different approaches. The three systems I will present bring us closer to the goal of building a scalable and reliable transactional database service in the cloud.

DB Meeting: Wednesday January 18, 2:30pm, DC 1331
Speaker: Jiewen Wu
Title: Answering Object Queries in DL Knowledge Bases
Abstract: We consider a generalization of instance retrieval over description-logics knowledge bases that provides users with assertions in which descriptions of qualifying objects are given in addition to their identifiers. Notably, this involves a transfer of basic database paradigms involving caching and query rewriting in the context of an assertion retrieval algebra. We present a query optimization framework for this algebra, with a focus on finding plans that avoid any need for general knowledge base reasoning at query execution time when sufficient cached results of earlier requests exist.

DB Seminar: Wednesday January 25, 2:30pm, DC 1302
Speaker: Ryan Johnson, University of Toronto
Title: Communication and co-design for scalable database engines

DB Meeting: Wednesday February 1, 2:30pm, DC 1331
Speaker: Iman Elghandour
Title: ReStore: Reusing Results of MapReduce Jobs
Abstract: Analyzing large scale data has emerged as an important activity for many organizations in the past few years. This large scale data analysis is facilitated by the MapReduce programming and execution model and its implementations, most notably Hadoop. Users of MapReduce often have analysis tasks that are too complex to express as individual MapReduce jobs. Instead, they use high-level query languages such as Pig, Hive, or JAQL to express their complex tasks. The compilers of these languages translate queries into workflows of MapReduce jobs. Each job in these workflows reads its input from the distributed file system used by the MapReduce system and produces output that is stored in this distributed file system and read as input by the next job in the workflow. In my talk, I will present ReStore, a system that manages the storage and reuse of such intermediate results.

DB Meeting: Wednesday February 8, 2:30pm, DC 1331
Speaker: Greg Drzadzewski
Title: Online Analytical Processing of Documents
Abstract: With the availability of many large and ever-growing document collections, it is becoming more cumbersome for users to explore them. While a search engine is useful to satisfy a user's ad hoc information needs, allowing a user to retrieve relevant documents through a keyword query, it is inadequate for analysis of bulky text information, which is necessary in many online applications. This type of exploration need can be addressed by providing support for online applications such as summarizing the contents of a text cell, and comparing the contents across multiple text cells. In my talk I will examine online analytical processing of documents and discuss the following two papers that deal with this area of research:
  1. Jin, X., Han, J., Cao, L., Luo, J., Ding, B. and Lin, C. X. Visual cube and on-line analytical processing of images. CIKM 2010, 849-858.
  2. Zhang, D., Zhai, C. and Han, J. MiTexCube: MicroTextCluster Cube for Online Analysis of Text Cells. CIDU 2011, 204-218.

Computer Science Seminar: Wednesday February 15, 10:30am, DC 1304
Speaker: Julia Stoyanovich, University of Pennsylvania
Title: Information Discovery in Large Complex Datasets

DB Meeting: Wednesday February 15, 2:30pm, DC 1331 POSTPONED
Speaker: Alex Hudek
Title: On Enumerating Query Plans Using Interpolants
Abstract: For relational (SQL) queries a standard service provided by current relational systems is to search the space of alternative query plans (ways of executing the query) to find one likely to have the best performance. A given query often has many semantically equivalent plans that vary in performance by many orders of magnitude making the problem of finding a best plan difficult. Recent trends in view based query rewriting, information integration, and ontology-based data access have made the relationship between the query and its plan space much more complex. Enumerating the possible plans has become even more challenging as the relationship between the user (logical) view of the data and the material capabilities for accessing relevant stored information has become less transparent. In this paper, we show how to use interpolation techniques to enumerate possible plans for a given user query. We also show how to obtain common varieties of plan patterns in this setting, such as those that derive from an enumeration of possible join orders for conjunctive (sub) queries.

Seminar: Thursday February 16, 4:00pm, DC 1331
Speaker: Arnon Sturm, Ben-Gurion University of the Negev
Title: A Methodology for Developing Secure Database Code
Abstract: Security in general, and database protection from unauthorized access in particular, are crucial to organizations. Several methods and techniques were devised to address this concern. However, none of these provide a comprehensive solution. In this talk we explore work done within the context of a research project which aims at developing a methodology for guiding and enforcing developers, in particular database designers, to deal with database security requirements related to authorization in the early stages of development. The proposed methodology makes it possible to define and enforce organizational security policies, and to validate that security requirements defined by the designers of an application are in accord with the organizational transformation of the design results into actual implementation, i.e., into the specification of the database code, including the authorization specification. We also present an empirical evaluation of part of the proposed approach.

DB Meeting: Wednesday February 29, 2:30pm, DC 1331
Speaker: Alex Hudek
Title: On Enumerating Query Plans Using Interpolants
Abstract: For relational (SQL) queries a standard service provided by current relational systems is to search the space of alternative query plans (ways of executing the query) to find one likely to have the best performance. A given query often has many semantically equivalent plans that vary in performance by many orders of magnitude making the problem of finding a best plan difficult. Recent trends in view based query rewriting, information integration, and ontology-based data access have made the relationship between the query and its plan space much more complex. Enumerating the possible plans has become even more challenging as the relationship between the user (logical) view of the data and the material capabilities for accessing relevant stored information has become less transparent. In this paper, we show how to use interpolation techniques to enumerate possible plans for a given user query. We also show how to obtain common varieties of plan patterns in this setting, such as those that derive from an enumeration of possible join orders for conjunctive (sub) queries.

Computer Science Seminar: Monday March 5, 10:30am, DC 1304
Speaker: Leman Akoglu, Carnegie Mellon University
Title: Large-scale Graph Analytics: Patterns, Anomalies, and Tools

DB Meeting: Wednesday March 7, 2:30pm, DC 1331
Speaker: Ahmed Ataullah
Title: Towards Policy-Centric Object-Relational-Modeling (ORM)
Abstract: Object relational modeling is essentially the challenge of mapping objects, as programmers see them, to individual pieces of data stored in relations in a database system. The research in this area is primarily motivated by the fact that object oriented database systems have not seen mainstream adoption and that retrieving/persisting objects from/to a relational database requires querying and therefore some knowledge about the logical/physical schema on the part of the OO programmer. This impedance mismatch makes the process of (rapid) application development more costly and inefficient. Although ORM offers the promise of further isolating programmers from the storage layer, it leads to interesting design questions about concurrency control, transaction management, side-effects (triggers) and methods for effective programming in the presence of an intermediate object-to-SQL layer.

In the first half of this talk I will introduce and motivate ORM, discuss its pros and cons, and briefly go over the features of popular ORM tools (Hibernate and the ADO.NET Entity Framework). In the second half of my talk I will pose an open question of whether ORM can be used by business policy makers to embed and model rules, such as obligations and inter-object temporal restrictions, that mimic business policy requirements. My goal will be to show that ORM should not only be a tool for object-oriented programming but can (in theory) also be used for expressing complex policies over objects, and that these policies can eventually be translated into a set of active integrity constraints in a database system. The talk will appeal to a broad audience and everyone is encouraged to attend as the variety and quality of the refreshments and snacks will be SIGNIFICANTLY better than typical DB-lab talks.


Computer Science Seminar: Wednesday March 14, 10:30am, DC 1304
Speaker: Spiros Papadimitriou, Google
Title: Large Graph Models and Scalable Analytics

DB Meeting: Wednesday March 14, 2:30pm, DC 1331
Speaker: Gunes Aluc
Title: Parametric Plan Caching Using Density-Based Clustering
Abstract: Query plan caching eliminates the need for repeated query optimization; hence, it has strong practical implications for relational database management systems (RDBMSs). Unfortunately, existing approaches consider only the query plan generated at the expected values of parameters that characterize the query, data and the current state of the system, while these parameters may take different values during the lifetime of a cached plan. A better alternative is to harvest the optimizer's plan choice for different parameter values, populate the cache with promising query plans, and select a cached plan based upon current parameter values. To address this challenge, we propose a parametric plan caching (PPC) framework that uses an online plan space clustering algorithm. The clustering algorithm is density-based, and it exploits locality-sensitive hashing as a pre-processing step so that clusters in the plan spaces can be efficiently stored in database histograms and queried in constant time. We experimentally validate that our approach is precise, efficient in space-and-time and adaptive, requiring no eager exploration of the plan spaces of the optimizer.

Computer Science Seminar: Monday March 19, 10:30am, DC 1304
Speaker: Swapnil Patil, Carnegie Mellon University
Title: Scaling and understanding storage system support for billions of tiny objects

Computer Science Seminar: Wednesday March 21, 10:00am, MC 5158
Speaker: Yizhou Sun, University of Illinois
Title: Mining Heterogeneous Information Networks

DB Meeting: Wednesday March 28, 2:30pm, DC 1331
Speaker: Ani Nica
Title: On Resource Consumption of the Query Optimization Process
Abstract: Query optimization is a sophisticated process whose resource consumption and quality of the best execution plan are determined by the query complexity, the available resources of the RDBMS server, and the current instance of the database. In this talk, I will present the experimental results of the optimization time breakdown and the memory consumption for a set of join enumeration algorithms ranging from highly heuristic algorithms to dynamic programming algorithms with exhaustive bushy tree enumeration. Next, I will discuss how these types of statistics can be used:
  1. to analyze the effect of a change to the query optimizer (e.g., a change which improves the CPU time for the plan generation phase but increases overall memory consumption);
  2. to analyze the differences between two join enumeration algorithms for particular queries;
  3. to estimate the resource consumption of current queries based on previously optimized queries.

DB Seminar: Monday April 9, 2:30pm, DC 1302
Speaker: Martin Kersten, CWI
Title: Arrays in database systems, the next frontier?

DB Meeting: Wednesday April 18, 2:30pm, DC 1331 CANCELLED
Speaker: Ahmed Soror

DB Meeting: Wednesday April 25, 2:30pm, DC 1331
Speaker: Ning Zhang
Title: A New Query Model for Graph Databases
Abstract: The database community has shown interest in developing algorithms for querying large graph databases. The common types of queries are reachability queries, shortest distance path queries, and subgraph containment/match queries. Each of these query types matches a particular application, but individually they are not sufficiently powerful or general to serve as a general-purpose graph query language (similar to SQL for relational systems). In many applications, graph data can be modeled as a property graph, where each node and each edge may have a label and arbitrary key/value pairs as properties, and problems are related to identifying patterns in the graph, i.e., identifying subgraphs that match a particular topology and/or some features. In this paper, we define a new query model over a property graph model so that most common types of queries can be uniformly expressed and processed efficiently under a single framework.

This page is maintained by Ken Salem.


Data Systems Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208

Contact | Feedback: db-webmaster@cs.uwaterloo.ca | Data Systems Group


Last modified: Friday, 01-Jun-2012 11:01:03 EDT

