# Fall 2013 Events Schedule | Database Research Group | UW

## Fall 2013

Events of interest to the Database Research Group are posted here, and are also mailed to the uw.cs.database newsgroup and the db-faculty, db-grads, and db-friends mailing lists. Subscribe to one of these mailing lists to receive e-mail notification of upcoming events.

The DB group meets Wednesday afternoons at 2:30pm. The list below gives the times and locations of upcoming meetings. Each meeting lasts for an hour and features either a local speaker or, on Seminar days, an invited outside speaker. Everyone is welcome to attend.

### Fall 2013 Events

DB Meeting: Wednesday September 18, 2:30pm, DC 1331
Speaker: Anil Goel, SAP Waterloo
Title: The SAP HANA Platform
Abstract: I will talk about the challenges and R&D agenda that we are tackling for the SAP HANA Platform, and about our global collaborative research model.

DB Event: Monday September 23, 1:30pm, DC 1331
Speaker: Rizwan Mian, Queen's University
Title: Smart Spending: Determining Cost-Effective Resource Configurations for Executing Data-Intensive Workloads in Public Clouds
Abstract: The rate of data growth in many domains is challenging our ability to manage and analyze it. Consequently, we see the emergence of computing systems that attempt to efficiently process data-intensive applications, or I/O-bound applications with large data. Cloud computing offers “infinite” resources on demand, and on a pay-as-you-go basis. As a result, it has gained interest for large-scale data processing. Given this supposedly infinite resource set, a provisioning process is needed to determine appropriate resources for data processing or workload execution. The prevalent data processing architectures do not usually employ provisioning techniques available in a public cloud, and existing provisioning techniques have largely ignored data-intensive applications in public clouds. My PhD work takes a step towards bridging the gap between existing data processing approaches and the provisioning techniques available in a public cloud, such that the monetary cost of executing data-intensive workloads is minimized. I formulate the problem of provisioning and include constructs to exploit a cloud’s elasticity to include any number of resources prior to execution. The provisioning is modeled as a search problem, and standard search heuristics are used to solve it. I propose a novel framework for resource provisioning in a cloud environment. My framework allows pluggable cost and performance models. I instantiate the framework by developing various search algorithms and cost and performance models to support the search for an effective resource configuration. For evaluation, I consider data-intensive workloads that are transactional, analytical, or mixed, and that access multiple database tenants. The workloads are based on standard TPC benchmarks. In addition, the user preferences on response time or throughput are expressed as constraints. My propositions and their results are validated in a real public cloud, namely the Amazon cloud. The evaluation supports the claim that the framework is an effective tool for provisioning database workloads in a public cloud with minimal dollar cost.

DB Meeting: Wednesday September 25, 2:30pm, DC 1331
Speaker: Peter Bumbulis, SAP Waterloo
Title: Storage class memory and its potential impact on database systems
Abstract: To reduce latency, storage attached to the memory bus is becoming available. To take full advantage of the latency improvements, the software stack needs to be redesigned. In this talk I'll discuss some of the challenges along the way.

DB Meeting: Wednesday October 2, 2:30pm, DC 1331
Speaker: David Toman
Title: Merge-Joins for Free
Abstract: In the execution of relational queries, ordered properties of relations allow the use of computationally preferable algorithms, such as the "merge-join" algorithm. In common implementations of relational systems, multiple "join algorithms" are present to account for these situations. In this talk I will revisit this design choice and argue that a similar effect can be achieved using a simple "nested-loops join" with the help of appropriately augmented access paths.

DB Seminar: Thursday October 10, 10:30am, DC 1302
Speaker: Umeshwar Dayal, Hitachi America Ltd. R&D
Title: Analytics over Big Data for Operational Intelligence

DB Seminar: Wednesday October 16, 2:30pm, DC 1302
Speaker: Tim Kraska, Brown University
Title: 1700 and PLANET: A 1-Phase Commit Protocol and Programming Model for Geo-Replicated Applications

DB Meeting: Wednesday October 23, 2:30pm, MC 5136
Speaker: Anisoara Nica, SAP Waterloo
Title: Sketches and other interesting data statistics methods

DB Seminar: Wednesday October 30, 2:30pm, DC 1302
Speaker: Mark Fox, University of Toronto
Title: An Ontology for Global City Indicators

DB Meeting: Wednesday November 6, 2:30pm, DC 1331
Speaker: Lukasz Golab
Abstract: I’ll talk about some of my recent work on large-scale data warehousing and analytics.

DB Meeting: Wednesday November 13, 2:30pm, DC 1331
Speaker: Gunes Aluc
Title: chameleon-db
Abstract: The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard for the conceptual modeling of web resources, and SPARQL is the standard query language for RDF. As RDF is becoming more widely utilized, RDF data management systems are being exposed to workloads that are much more diverse and dynamic than they were designed to support, for which they are unable to provide consistently good performance. The problem arises because these systems are workload-agnostic; that is, they rely on a database structure and types of indexes that are fixed a priori, which cannot be modified at runtime. In this presentation, I will introduce chameleon-db, which is a workload-aware RDF data management system that we have developed. chameleon-db automatically and periodically adjusts its layout of the RDF database to optimize for queries so that they can be executed efficiently. Since one cannot afford to stop processing queries, we propose a novel design that enables partitions to be concurrently updated. We demonstrate that chameleon-db can achieve robust performance across a diverse spectrum of query workloads, outperforming its competitors by up to 2 orders of magnitude, and that it can easily adapt to changing workloads.

DB Seminar: Wednesday November 20, 2:30pm, DC 1302
Speaker: Paul Larson, Microsoft Research
Title: Evolving the Architecture of SQL Server

DB Meeting: Wednesday November 27, 2:30pm, DC 1331
Speaker: Suprio Ray, PhD Candidate, U. Toronto
Title: A Parallel Spatial Data Analysis Infrastructure for the Cloud
Abstract: Spatial data analysis applications are emerging from a wide range of domains such as building information management, environmental assessments and medical imaging. Time-consuming computational geometry algorithms make these applications slow, even for medium-sized datasets. At the same time, there is a rapid expansion in available processing cores, through multicore machines and Cloud computing. The confluence of these trends demands effective parallelization of spatial query processing. Unfortunately, traditional parallel spatial databases are ill-equipped to deal with the performance heterogeneity that is common in the Cloud. In this talk, I present Niharika, a parallel spatial data analysis infrastructure that exploits all available cores in a heterogeneous cluster. Niharika first uses a declustering technique that creates balanced spatial partitions. Then, Niharika adapts to performance heterogeneity and processing skew in the spatial dataset using dynamic load balancing. We evaluate Niharika with three load-balancing algorithms and two different spatial datasets (both from TIGER) using Amazon EC2 instances. For this evaluation, we selected several spatial join queries from our spatial database benchmark Jackpine. We demonstrate that Niharika adapts to the performance heterogeneity in the EC2 nodes, thereby achieving excellent speedups (e.g., 63.6X using 64 cores on 16 4-core EC2 nodes, in the best case) and outperforming an approach that does not adapt.

DB Meeting: Wednesday December 4, 2:30pm, DC 1331
Speaker: Jarek Szlichta, Postdoctoral Fellow, U. Toronto
Title: Expressiveness and Complexity of Order Dependencies
Abstract: Dependencies play an important role in databases. We study order dependencies (ODs) and unidirectional order dependencies (UODs), a proper subclass of ODs, which describe the relationships among lexicographical orderings of sets of tuples. We consider lexicographical ordering, as produced by the order-by operator in SQL, because this is the notion of order used in SQL and within query optimization. Our main goal is to investigate the inference problem for ODs, both in theory and in practice. We show the usefulness of ODs in query optimization. We establish the following theoretical results: (i) a hierarchy of order dependency classes; (ii) a proof of co-NP-completeness of the inference problem for the subclass of UODs (and ODs); (iii) a proof of co-NP-completeness of the inference problem of functional dependencies (FDs) from ODs in general, together with a demonstration of linear time complexity for the inference of FDs from UODs; (iv) a sound and complete elimination procedure for inference over ODs; and (v) a sound and complete polynomial inference algorithm for sets of UODs over restricted domains.