Database Research Group Events

Spring 2012

Events of interest to the Database Research Group are posted here and are also mailed to the uw.cs.database newsgroup and to the db-faculty, db-grads, and db-friends mailing lists. Subscribe to one of these mailing lists to receive e-mail notification of upcoming events.

The DB group meets Wednesday afternoons at 2:30pm. The list below gives the times and locations of upcoming meetings. Each meeting lasts for an hour and features either a local speaker or, on Seminar days, an invited outside speaker. Everyone is welcome to attend.

Spring 2012 Events

DB Meeting: Wednesday May 2, 2:30pm, DC 1331
Speaker: Tamer Özsu
Title: RDF Data Management Using Graph Algorithms
Abstract: Resource Description Framework (RDF) has been proposed for modeling Web objects as part of developing the "semantic web". It has also gained attention as a way to accomplish web data integration. As the volume of RDF data has increased, interesting data management issues have arisen. In this talk I will discuss some of our recent work in this area, focusing on two results: answering SPARQL queries over RDF graphs, and processing aggregate SPARQL queries. The first problem focuses on evaluating SPARQL queries with wildcards over an RDF graph that sees frequent updates. We propose an approach that maps both the RDF data and the SPARQL query into graphs and converts the query evaluation problem to one of subgraph matching. In order to speed up query processing, we propose an indexing mechanism and pruning rules to reduce the search space. The second problem addresses the processing of aggregation queries over large RDF data sets. We propose a processing approach that partitions aggregate queries into smaller parts (called star queries), processes these efficiently, and joins the results of star queries to obtain more general results. We develop indexes to assist in executing star queries and to facilitate joining their results.

DB Meeting: Wednesday May 9, 2:30pm, MC 5136
Speaker: Lukasz Golab
Title: Data Warehouse Quality
Abstract: In this talk, I'll present a book chapter on data warehouse quality that will appear in the Handbook of Research and Practice in Data Quality. I'll discuss data quality metrics such as freshness, currency, completeness and consistency, and approaches to improving data quality such as error detection, error correction and distributed data quality monitoring. This is work in progress, so comments are welcome!

DB Meeting: Wednesday May 30, 2:30pm, DC 3313
Speaker: Ashraf Aboulnaga
Title: Defining a Big Data Research Agenda for Waterloo
Abstract: There is currently a lot of interest in Big Data as a research area. From 30,000 feet, the concept is clear: we have data that is big and useful and we need to make sense of it. However, if we zoom in and try to define what it means to do research on Big Data, the picture becomes fuzzy. One has a hard time defining a crisp research agenda in this area.

In this meeting I will moderate an open discussion among the attendees about Big Data research, with the goal of reaching a better understanding of this area and possibly finding common ground for collaborative research projects in Waterloo.

DB Meeting: Wednesday June 6, 2:30pm, DC 1331
Speaker: Jeff Pound
Title: Semantic Query Understanding Using Web Knowledge Bases
Abstract: Many keyword queries issued to Web search engines target information about real world entities, and interpreting these queries over Web knowledge bases can often enable the search system to provide exact answers to queries. Equally important is the problem of detecting when the reference knowledge base is not capable of answering the keyword query, due to lack of domain coverage.

In this work we present an approach to computing structured representations of keyword queries over a reference knowledge base. We mine frequent query structures from a Web query log and map these structures into a reference knowledge base. Our approach exploits coarse linguistic structure in keyword queries, and combines it with rich structured query representations of information needs.

DB Meeting: Wednesday June 13, 2:30pm, DC 1331
Speaker: Gunes Aluc
Title: A Generalized and Adaptive Approach to RDF Management
Abstract: With the proliferation of very large, web-scale distributed RDF datasets, which are queried by a variety of sources, there is an emerging need for RDF data management systems that can handle the query diversity and dynamism of workloads. Unfortunately, existing systems are workload-agnostic in their physical designs, each with a fixed base table layout and a choice of indexing. The fixed design choices lead to inefficient query processing and/or utilization of space when the physical configuration of the RDF data management system is not aligned with the queries in the workload. Consequently, supporting the query diversity and dynamism in emerging SPARQL workloads is problematic.

In this ongoing project, we treat the workload as a first-class citizen in the physical design process and propose techniques for generic and adaptive RDF management. To this end, we study the processing of SPARQL queries over RDF data using a graph-theoretic approach: we represent both the RDF data and the SPARQL queries as graphs and evaluate queries using subgraph matching techniques. In order to efficiently perform subgraph matching over large data graphs, the data graph is partitioned into smaller subgraphs, and queries are evaluated against the partitioning of the graph. We aim to achieve the objective of a generic physical design by relaxing the conditions that govern how the graph can be partitioned. This flexibility will enable us to preconfigure the system with the optimal partitioning for any representative workload. Furthermore, should the workloads change, we will be able to update the partitioning in a lightweight manner to adapt to the most recent and frequent queries.

DB Seminar: Thursday June 14, 11:00am, DC 3313
Speaker: Wolfgang Lehner, TU Dresden
Title: Storage Engine Design for Energy-Efficient Servers of the Future
Abstract: The talk presents core concepts of a new research cluster program set up at Dresden University of Technology addressing the design of future servers from a holistic perspective. Novel techniques on the hardware side (adaptive optical and high-speed short-range wireless interconnects) and novel methods on the software side, ranging from design through deployment to runtime adaptation, form the core of the research project. In a first step, the talk outlines the challenges and opportunities provided by future hardware developments. In a second step, the talk focuses on the design of a main-memory-centric storage engine optimized to provide excellent performance for read and write requests, which aims at mixed OLTP and OLAP scenarios such as operational BI and real-time analytics. The talk gives an overview of different latch-free in-memory indexes that scale with the increasing number of hardware threads in modern CPUs. Based on those indexes, the talk introduces the idea of a novel query processing model that utilizes virtual memory functionality to allow destructive database operators to work directly on index structures without affecting the original data.

DB Meeting: Wednesday June 20, 2:30pm, DC 1331
Speaker: Khuzaima Daudjee
Abstract: I'll talk about the paper: "The little engine(s) that could: scaling online social networks", in Proceedings of the ACM SIGCOMM 2010 Conference (and to appear in IEEE/ACM Transactions on Networking).

DB Meeting: Wednesday June 27, 2:30pm, DC 1331
Speaker: Lukasz Golab
Title: Materialized View Maintenance in Stream Data Warehouses
Abstract: In conventional data warehouses, materialized views are typically refreshed during downtimes, e.g., every night. In contrast, stream data warehouses perform view updates continuously as new data arrive, which enables nearly-real-time data analytics and provides an integrated platform for querying real-time and historical data. In this talk, I will describe some techniques and algorithms for efficient view refresh in stream data warehouses. This talk is based on the following two papers:
  • Scalable scheduling of updates in streaming data warehouses, Golab, Johnson, Shkapenyuk, TKDE 24(6), 2012
  • Stream warehousing with Data Depot, Golab, Johnson, Seidel, Shkapenyuk, SIGMOD 2009

DB Meeting: Wednesday July 11, 2:30pm, DC 1331
Speaker: Xin Liu
Title: Data Management For Hybrid Database Storage
Abstract: In the past few years, the cost of flash memory has fallen dramatically while fabrication has become more efficient. The price and performance of flash memory fall between those of traditional RAM and hard disk drives. Flash-based Solid State Drives (SSDs) have been introduced to fill the gap between RAM and traditional rotating disks. We are interested in treating the SSD as an extension cache of the buffer pool. We designed cost-aware placement and replacement algorithms for both the buffer pool and the SSD cache. These algorithms are aware of the cache hierarchy and the different I/O performance of HDD and SSD. We implemented these algorithms in MySQL's InnoDB, and used the TPC-C workload to demonstrate that these cost-aware algorithms outperform other algorithms.

DB Meeting: Wednesday July 18, 2:30pm, DC 1331
Speaker: David Toman
Title: Chase vs. Interpolation for Query Plan Generation
Abstract: We discuss how standard patterns that appear in plan generation for relational queries, such as base table scan, index scan followed by record lookup, index-only access, record id intersection based access, etc., can be automatically synthesized from a user query and an appropriate database schema in a unified way (i.e., without having to hardcode information specific to each of the above patterns directly in the query optimizer). We also discuss how our synthesis technique, based on interpolation, relates to "chase and back-chase" based techniques and how it can be made efficient in the setting of database schemas.

DB Meeting: Wednesday July 25, 2:30pm, DC 1331 (POSTPONED)
Speaker: Peter Bumbulis
Title: Microsoft's MinuteSort Benchmark Entry
Abstract: I'd like to talk about the recent Microsoft MinuteSort benchmark entry, and how full-bisection-bandwidth networks provide an alternative approach to handling big data at data center scale.

DB Seminar: Thursday July 26, 2:30pm, DC 1331
Speaker: Carsten Binnig, DHBW Mannheim
Title: Distributed Snapshot Isolation: Global Transactions Pay Globally, Local Transactions Pay Locally
Abstract: Snapshot Isolation has emerged as a de facto standard to effect concurrency control and isolation in modern database systems. This talk revisits the problem of implementing Snapshot Isolation in a distributed database system. This talk addresses three important aspects. First, a complete definition of Distributed Snapshot Isolation is given, thereby extending existing definitions from the literature. Second, the design space of alternative methods to implement Distributed Snapshot Isolation is presented based on a set of correctness criteria. Third, a new approach to implement Distributed Snapshot Isolation is devised. With the help of extensive experiments with the TPC-C benchmark, it is shown that this approach significantly outperforms any other known method for a wide range of workloads. Furthermore, in contrast to many known techniques, our new approach requires no a priori knowledge of which nodes of a distributed system are involved in executing a transaction. Also, our new approach can execute transactions that involve data from a single node only with the same efficiency as a centralized database system. This way, it takes advantage of sharding or other ways to improve data locality. The cost for synchronizing transactions in a distributed system is only paid by transactions that actually involve data from several nodes. All these properties make our new approach much more practical than many related methods proposed in the literature.

DB Meeting: Wednesday August 1, 2:30pm, DC 1331
Speaker: Cătălin Avram
Title: SaviDB: An Edge-Aware Key-Value Storage System
Abstract: This talk proposes a location-aware key-value storage solution optimized to work in an environment where computation is pushed closer to clients, towards the edge of the cloud. The proposed system is completely transparent to applications that use it and is highly optimized for working with entities that have a geographical location associated with them. The main motivation for using such a system is decreased response time due to optimized data locality while also improving data availability within a failure model similar to most cloud storage solutions. The trade-off is a loosened consistency model that only offers eventual consistency, yet still provides a global ordering of transactions at any point in time.

This page is maintained by Ken Salem.

Data Systems Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208

Last modified: Monday, 23-Jul-2012 10:04:56 EDT