Database Research Group Events

Fall 2013

Events of interest to the Database Research Group are posted here and are also mailed to the uw.cs.database newsgroup and to the db-faculty, db-grads, and db-friends mailing lists. Subscribe to one of these mailing lists to receive e-mail notification of upcoming events.

The DB group meets on Wednesday afternoons at 2:30pm. The list below gives the times and locations of upcoming meetings. Each meeting lasts an hour and features either a local speaker or, on Seminar days, an invited outside speaker. Everyone is welcome to attend.

Fall 2013 Events

DB Meeting: Wednesday September 18, 2:30pm, DC 1331
Speaker: Anil Goel, SAP Waterloo
Title: The SAP HANA Platform
Abstract: I will talk about the challenges and the R&D agenda we are tackling for the SAP HANA Platform, and about our global collaborative research model.

DB Event: Monday September 23, 1:30pm, DC 1331
Speaker: Rizwan Mian, Queen's University
Title: Smart Spending: Determining Cost-Effective Resource Configurations for Executing Data-Intensive Workloads in Public Clouds
Abstract: The rate of data growth in many domains is challenging our ability to manage and analyze that data. Consequently, we see the emergence of computing systems that attempt to efficiently process data-intensive applications, i.e., I/O-bound applications with large data. Cloud computing offers “infinite” resources on demand and on a pay-as-you-go basis, and as a result it has attracted interest for large-scale data processing. Given this supposedly infinite resource set, a provisioning process is needed to determine appropriate resources for data processing or workload execution. The prevalent data processing architectures do not usually employ the provisioning techniques available in a public cloud, and existing provisioning techniques have largely ignored data-intensive applications in public clouds. My PhD work takes a step towards bridging the gap between existing data processing approaches and the provisioning techniques available in a public cloud, such that the monetary cost of executing data-intensive workloads is minimized. I formulate the provisioning problem and include constructs that exploit a cloud's elasticity to add any number of resources prior to execution. Provisioning is modeled as a search problem, and standard search heuristics are used to solve it. I propose a novel framework for resource provisioning in a cloud environment that allows pluggable cost and performance models. I instantiate the framework by developing various search algorithms, cost models, and performance models to support the search for an effective resource configuration. For evaluation, I consider data-intensive workloads that are transactional, analytical, or mixed and that access multiple database tenants. The workloads are based on standard TPC benchmarks. User preferences on response time or throughput are expressed as constraints. My propositions and their results are validated in a real public cloud, namely the Amazon cloud.
The evaluation supports the claim that the framework is an effective tool for provisioning database workloads in a public cloud at minimal dollar cost.
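To make "provisioning as a search problem" concrete, here is a minimal Python sketch. Everything in it is hypothetical and not taken from the talk: two made-up VM types with invented hourly prices and processing rates, a perfectly parallel performance model, per-started-hour billing, and exhaustive enumeration over a tiny configuration space in place of the search heuristics the abstract mentions. The performance and cost models are passed in as functions to mirror the idea of pluggable models, and the response-time preference is treated as a hard constraint.

```python
import math
from itertools import product
from typing import Callable, Optional, Tuple

# Hypothetical configuration: (number of "small" VMs, number of "large" VMs).
Config = Tuple[int, int]

# Hypothetical instance catalogue (invented numbers, not real cloud prices).
PRICE_CENTS_PER_HOUR = {"small": 10, "large": 40}
WORK_UNITS_PER_HOUR = {"small": 100, "large": 500}


def simple_perf_model(cfg: Config, work: int) -> float:
    """Estimated runtime in hours, assuming work divides perfectly across VMs."""
    small, large = cfg
    rate = small * WORK_UNITS_PER_HOUR["small"] + large * WORK_UNITS_PER_HOUR["large"]
    return work / rate


def hourly_cost_model(cfg: Config, hours: float) -> int:
    """Cost in cents, billing every started hour for every VM."""
    small, large = cfg
    hourly = small * PRICE_CENTS_PER_HOUR["small"] + large * PRICE_CENTS_PER_HOUR["large"]
    return math.ceil(hours) * hourly


def provision(
    work: int,
    max_hours: float,
    max_vms_per_type: int = 4,
    perf_model: Callable[[Config, int], float] = simple_perf_model,
    cost_model: Callable[[Config, float], int] = hourly_cost_model,
) -> Optional[Tuple[Config, int]]:
    """Return the cheapest configuration that meets the runtime constraint, if any."""
    best: Optional[Tuple[Config, int]] = None
    for cfg in product(range(max_vms_per_type + 1), repeat=2):
        if cfg == (0, 0):
            continue  # at least one VM is needed
        hours = perf_model(cfg, work)
        if hours > max_hours:
            continue  # violates the user's response-time preference
        cost = cost_model(cfg, hours)
        if best is None or cost < best[1]:
            best = (cfg, cost)
    return best


print(provision(work=1500, max_hours=2))  # ((0, 3), 120)
```

With 1,500 work units and a two-hour limit, the sketch prefers three large VMs (120 cents) over two (160 cents): the extra capacity finishes within a single billed hour, one way in which elasticity can lower the bill.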

DB Meeting: Wednesday September 25, 2:30pm, DC 1331
Speaker: Peter Bumbulis, SAP Waterloo
Title: Storage class memory and its potential impact on database systems
Abstract: To reduce latency, storage attached directly to the memory bus is becoming available. To take full advantage of the latency improvements, the software stack needs to be redesigned. In this talk I'll discuss some of the challenges along the way.

DB Meeting: Wednesday October 2, 2:30pm, DC 1331
Speaker: David Toman
Title: Merge-Joins for Free
Abstract: When executing relational queries, ordered properties of relations allow the use of computationally preferable algorithms, such as the "merge-join" algorithm. Common implementations of relational systems therefore include multiple join algorithms to account for these situations. In this talk I will revisit this design choice and argue that a similar effect can be achieved using a simple nested-loops join with the help of appropriately augmented access paths.
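One possible reading of this claim is sketched below in Python. It rests on our own assumptions, not on the speaker's construction: both inputs are sorted on the join key, and the inner relation's access path is "augmented" with a cursor that remembers where the previous probe stopped. The join loop itself stays an ordinary nested-loops join. Because the cursor only moves forward, the skipping work over the whole join is linear in the size of the inner input, as in a merge-join. `FullScan`, `ForwardCursor`, and `nested_loops_join` are hypothetical names.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


class FullScan:
    """Plain access path: every probe rescans the whole inner relation."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def probe(self, key: int) -> Iterator[Row]:
        return (r for r in self.rows if r[0] == key)


class ForwardCursor:
    """Hypothetical 'augmented' access path over an inner relation sorted on key.

    It keeps its position between probes, which is only correct when probe
    keys arrive in non-decreasing order (i.e., the outer input is sorted).
    """

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.pos = 0

    def probe(self, key: int) -> Iterator[Row]:
        # Skip inner rows with smaller keys; no later probe can match them.
        while self.pos < len(self.rows) and self.rows[self.pos][0] < key:
            self.pos += 1
        # Emit the run of equal keys without consuming it, so a following
        # outer row with the same key sees the same matches.
        i = self.pos
        while i < len(self.rows) and self.rows[i][0] == key:
            yield self.rows[i]
            i += 1


def nested_loops_join(outer: List[Row], inner) -> List[Tuple[Row, Row]]:
    """Ordinary nested-loops join; only the inner access path differs."""
    return [(o, i) for o in outer for i in inner.probe(o[0])]


outer = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
inner = [(1, "x"), (2, "y"), (3, "z"), (4, "w"), (4, "v")]
assert nested_loops_join(outer, ForwardCursor(inner)) == nested_loops_join(outer, FullScan(inner))
```

Swapping `FullScan` for `ForwardCursor` changes only the access path; the join loop is untouched.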

DB Seminar: Thursday October 10, 10:30 am, DC 1302
Speaker: Umeshwar Dayal, Hitachi America Ltd. R&D
Title: Analytics over Big Data for Operational Intelligence

DB Seminar: Wednesday October 16, 2:30 pm, DC 1302
Speaker: Tim Kraska, Brown University
Title: 1700 and PLANET: A 1-Phase Commit Protocol and Programming Model for Geo-Replicated Applications

DB Meeting: Wednesday October 23, 2:30pm, MC 5136
Speaker: Anisoara Nica, SAP Waterloo
Title: Sketches and other interesting data statistics methods

DB Seminar: Wednesday October 30, 2:30 pm, DC 1302
Speaker: Mark Fox, University of Toronto
Title: An Ontology for Global City Indicators

DB Meeting: Wednesday November 6, 2:30pm, DC 1331
Speaker: Lukasz Golab
Abstract: I’ll talk about some of my recent work on large-scale data warehousing and analytics.

DB Meeting: Wednesday November 13, 2:30pm, DC 1331
Speaker: Gunes Aluc
Title: chameleon-db
Abstract: The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard for the conceptual modeling of web resources, and SPARQL is the standard query language for RDF. As RDF becomes more widely used, RDF data management systems are being exposed to workloads that are much more diverse and dynamic than they were designed to support, and on such workloads they cannot provide consistently good performance. The problem arises because these systems are workload-agnostic; that is, they rely on a database structure and index types that are fixed a priori and cannot be modified at runtime.
In this presentation, I will introduce chameleon-db, a workload-aware RDF data management system that we have developed. chameleon-db automatically and periodically adjusts the layout of its RDF database to optimize for queries so that they can be executed efficiently. Since one cannot afford to stop processing queries, we propose a novel design that enables partitions to be updated concurrently. We demonstrate that chameleon-db can achieve robust performance across a diverse spectrum of query workloads, outperforming its competitors by up to two orders of magnitude, and that it can easily adapt to changing workloads.

DB Seminar: Wednesday November 20, 2:30 pm, DC 1302
Speaker: Paul Larson, Microsoft Research
Title: Evolving the Architecture of SQL Server

DB Meeting: Wednesday November 27, 2:30pm, DC 1331
Speaker: Suprio Ray, PhD Candidate, U. Toronto
Title: A Parallel Spatial Data Analysis Infrastructure for the Cloud
Abstract: Spatial data analysis applications are emerging from a wide range of domains, such as building information management, environmental assessment, and medical imaging. Time-consuming computational geometry algorithms make these applications slow, even for medium-sized datasets. At the same time, the number of available processing cores is expanding rapidly through multicore machines and cloud computing. The confluence of these trends demands effective parallelization of spatial query processing. Unfortunately, traditional parallel spatial databases are ill-equipped to deal with the performance heterogeneity that is common in the cloud. In this talk, I present Niharika, a parallel spatial data analysis infrastructure that exploits all available cores in a heterogeneous cluster. Niharika first uses a declustering technique that creates balanced spatial partitions. It then adapts to performance heterogeneity and to processing skew in the spatial dataset using dynamic load balancing. We evaluate Niharika with three load-balancing algorithms and two different spatial datasets (both from TIGER) using Amazon EC2 instances. For this evaluation, we selected several spatial join queries from our spatial database benchmark, Jackpine. We demonstrate that Niharika adapts to the performance heterogeneity of the EC2 nodes, thereby achieving excellent speedups (in the best case, 63.6x using 64 cores on 16 four-core EC2 nodes) and outperforming an approach that does not adapt.
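The benefit of adapting to heterogeneous nodes can be illustrated with a toy Python simulation. It is not one of Niharika's three load-balancing algorithms, which the abstract does not describe. Instead, it compares a static round-robin assignment of equal-cost partitions with a generic pull-based scheme in which each node takes the next partition as soon as it becomes idle. The partition costs and node speeds are invented.

```python
import heapq
from typing import List


def static_makespan(costs: List[float], speeds: List[float]) -> float:
    """Partitions dealt round-robin up front, ignoring node speed."""
    busy = [0.0] * len(speeds)
    for i, cost in enumerate(costs):
        node = i % len(speeds)
        busy[node] += cost / speeds[node]
    return max(busy)


def dynamic_makespan(costs: List[float], speeds: List[float]) -> float:
    """Each idle node pulls the next partition from a shared queue."""
    # Min-heap of (time the node becomes free, node id).
    free_at = [(0.0, node) for node in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for cost in costs:
        t, node = heapq.heappop(free_at)
        t += cost / speeds[node]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, node))
    return finish


# 24 balanced partitions (as a declustering step might produce) on four
# nodes, two of which run twice as fast as the others.
costs = [1.0] * 24
speeds = [1.0, 1.0, 2.0, 2.0]
print(static_makespan(costs, speeds))   # 6.0
print(dynamic_makespan(costs, speeds))  # 4.0
```

The ideal here is 24 units of work over a combined speed of 6, i.e. 4.0, which the pull-based scheme reaches. The static plan leaves the two fast nodes idle for half of its 6.0 time units.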

DB Meeting: Wednesday December 4, 2:30pm, DC 1331
Speaker: Jarek Szlichta, Postdoctoral Fellow, U. Toronto
Title: Expressiveness and Complexity of Order Dependencies
Abstract: Dependencies play an important role in databases. We study order dependencies (ODs), which describe the relationships among lexicographical orderings of sets of tuples, and unidirectional order dependencies (UODs), a proper subclass of ODs. We consider lexicographical ordering, as produced by the ORDER BY operator in SQL, because this is the notion of order used in SQL and in query optimization. Our main goal is to investigate the inference problem for ODs, both in theory and in practice, and we show the usefulness of ODs in query optimization. We establish the following theoretical results: (i) a hierarchy of order dependency classes; (ii) a proof of co-NP-completeness of the inference problem for the subclass of UODs (and for ODs); (iii) a proof of co-NP-completeness of the problem of inferring functional dependencies (FDs) from ODs in general, together with a demonstration that FDs can be inferred from UODs in linear time; (iv) a sound and complete elimination procedure for inference over ODs; and (v) a sound and complete polynomial-time inference algorithm for sets of UODs over restricted domains.
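For readers new to ODs, the Python sketch below tests whether an order dependency holds on one concrete table. It uses the reading that a list of attributes X orders a list Y when, for every pair of tuples s and t, s ≤ t lexicographically on X implies s ≤ t lexicographically on Y (ascending order, no NULLs). This is a brute-force O(n²) check on a single instance, not one of the inference procedures the talk is about. The table, its columns, and the function names are invented.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]


def project(row: Row, attrs: Sequence[str]) -> Tuple[int, ...]:
    # Python compares tuples lexicographically, like ORDER BY a1, a2, ... ASC.
    return tuple(row[a] for a in attrs)


def od_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check the order dependency lhs |-> rhs on this table instance."""
    for s, t in product(rows, repeat=2):
        if project(s, lhs) <= project(t, lhs) and not project(s, rhs) <= project(t, rhs):
            return False  # ordering by lhs would break the ordering by rhs
    return True


# Invented example data.
rows = [
    {"income": 30000, "bracket": 1, "tax": 3000},
    {"income": 50000, "bracket": 2, "tax": 7500},
    {"income": 50000, "bracket": 2, "tax": 8000},
    {"income": 90000, "bracket": 3, "tax": 18000},
]

print(od_holds(rows, ["income"], ["bracket"]))  # True
print(od_holds(rows, ["income"], ["tax"]))      # False
```

The second check fails because two tuples tie on income but differ on tax. This illustrates that an OD X ↦ Y also implies the functional dependency X → Y.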

This page is maintained by Khuzaima Daudjee.

Data Systems Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208

Last modified: Tuesday, 26-Nov-2013 17:04:42 EST