Fall 2013
Events of interest to the
Database Research Group are posted here, and are also
mailed to the uw.cs.database newsgroup and the
db-faculty,
db-grads,
db-friends
mailing lists.
Subscribe to one of these mailing lists to receive e-mail notification
of upcoming events.
The DB group meets Wednesday afternoons at 2:30pm.
The list below gives the
times and locations of upcoming meetings.
Each meeting lasts for an hour and features either
a local speaker or, on
Seminar days,
an invited outside speaker.
Everyone is welcome to attend.
Fall 2013 Events
DB Meeting:
|
Wednesday September 18, 2:30pm, DC 1331
|
Speaker:
|
Anil Goel, SAP Waterloo
|
Title:
|
The SAP HANA Platform
|
Abstract:
|
I will talk about the challenges and R&D agenda that we are tackling for the SAP HANA Platform, and about our global collaborative research model.
|
DB Event:
|
Monday September 23, 1:30pm, DC 1331
|
Speaker:
|
Rizwan Mian, Queen's University
|
Title:
|
Smart Spending: Determining Cost-Effective Resource
Configurations for Executing Data-Intensive Workloads in Public Clouds
|
Abstract:
|
The rate of data growth in many domains is challenging our ability to
manage and analyze it. Consequently, we see the emergence of computing
systems that attempt to efficiently process data-intensive applications
or I/O bound applications with large data. Cloud computing offers
“infinite” resources on demand, and on a pay-as-you-go basis. As a
result, it has gained interest for large-scale data processing. Given
this supposedly infinite resource set, a provisioning process is needed
to determine appropriate resources for data processing or workload
execution. The prevalent data processing architectures do not usually
employ provisioning techniques available in a public cloud, and existing
provisioning techniques have largely ignored data-intensive applications
in public clouds.
My PhD work takes a step towards bridging the gap between existing data
processing approaches and the provisioning techniques available in a
public cloud, such that the monetary cost of executing data-intensive
workloads is minimized. I formulate the problem of provisioning and
include constructs to exploit a cloud’s elasticity to include any number
of resources prior to execution. The provisioning is modeled as a search
problem, and standard search heuristics are used to solve it.
I propose a novel framework for resource provisioning in a cloud
environment. My framework allows pluggable cost and performance models.
I instantiate the framework by developing various search algorithms,
cost and performance models to support the search for an effective
resource configuration.
For evaluation, I consider data-intensive workloads that are
transactional, analytical, or mixed, and that access multiple
database tenants. The workloads are based on standard TPC benchmarks. In
addition, the user preferences on response time or throughput are
expressed as constraints. My propositions and their results are
validated in a real public cloud, namely the Amazon cloud. The
evaluation supports the claim that the framework is an effective tool
for provisioning database workloads in a public cloud with minimal
dollar cost.
|
DB Meeting:
|
Wednesday September 25, 2:30pm, DC 1331
|
Speaker:
|
Peter Bumbulis, SAP Waterloo
|
Title:
|
Storage class memory and its potential impact on database systems
|
Abstract:
|
To reduce latency, memory bus attached storage is becoming available. To take full advantage of the latency improvements, the software stack needs to be redesigned. In this talk I'll discuss some of the challenges along the way.
|
DB Meeting:
|
Wednesday October 2, 2:30pm, DC 1331
|
Speaker:
|
David Toman
|
Title:
|
Merge-Joins for Free
|
Abstract:
|
In the execution of relational queries, ordered properties of
relations allow the use of computationally preferable algorithms,
such as the "merge-join" algorithm. In common implementations
of relational systems, multiple "join algorithms" are present
to account for these situations. In this talk I will revisit this
design choice and argue that a similar effect can be achieved using
a simple "nested-loops join" with the help of appropriately
augmented access paths.
|
DB Meeting:
|
Wednesday October 23, 2:30pm, MC 5136
|
Speaker:
|
Anisoara Nica, SAP Waterloo
|
Title:
|
Sketches and other interesting data statistics methods
|
Abstract:
|
|
DB Meeting:
|
Wednesday November 6, 2:30pm, DC 1331
|
Speaker:
|
Lukasz Golab
|
Title:
|
I’ll talk about some of my recent work on large-scale data
warehousing and analytics.
|
Abstract:
|
|
DB Meeting:
|
Wednesday November 13, 2:30pm, DC 1331
|
Speaker:
|
Gunes Aluc
|
Title:
|
chameleon-db
|
Abstract:
|
The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard for the conceptual modeling of web resources, and SPARQL is the standard query language for RDF. As RDF is becoming more widely utilized, RDF data management systems are being exposed to workloads that are much more diverse and dynamic than they were designed to support, for which they are unable to provide consistently good performance. The problem arises because these systems are workload-agnostic; that is, they rely on a database structure and types of indexes that are fixed a priori, which cannot be modified at runtime.
In this presentation, I will introduce chameleon-db, which is a workload-aware RDF data management system that we have developed. chameleon-db automatically and periodically adjusts its layout of the RDF database to optimize for queries so that they can be executed efficiently. Since one cannot afford to stop processing queries, we propose a novel design that enables partitions to be concurrently updated. We demonstrate that chameleon-db can achieve robust performance across a diverse spectrum of query workloads, outperforming its competitors by up to 2 orders of magnitude, and that it can easily adapt to changing workloads.
|
DB Meeting:
|
Wednesday November 27, 2:30pm, DC 1331
|
Speaker:
|
Suprio Ray, PhD Candidate, U. Toronto
|
Title:
|
A Parallel Spatial Data Analysis Infrastructure for the Cloud
|
Abstract:
|
Spatial data analysis applications are emerging from a wide range of
domains such as building information management, environmental assessments
and medical imaging. Time-consuming computational geometry algorithms make
these applications slow, even for medium-sized datasets. At the same time,
there is a rapid expansion in available processing cores, through
multicore machines and Cloud computing. The confluence of these trends
demands effective parallelization of spatial query processing.
Unfortunately, traditional parallel spatial databases are ill-equipped to
deal with the performance heterogeneity that is common in the Cloud.
In this talk, I present Niharika, a parallel spatial data analysis
infrastructure that exploits all available cores in a heterogeneous
cluster. Niharika first uses a declustering technique that creates
balanced spatial partitions. Then, Niharika adapts to performance
heterogeneity and processing skew in the spatial dataset using dynamic
load balancing. We evaluate Niharika with three load-balancing algorithms
and two different spatial datasets (both from TIGER) using Amazon EC2
instances. For this evaluation, we selected several spatial join queries
from our spatial database benchmark Jackpine. We demonstrate that Niharika
adapts to the performance heterogeneity in the EC2 nodes, thereby
achieving excellent speedups (e.g., 63.6X using 64 cores on 16 4-core EC2
nodes, in the best case) and outperforming an approach that does not
adapt.
|
DB Meeting:
|
Wednesday December 4, 2:30pm, DC 1331
|
Speaker:
|
Jarek Szlichta, Postdoctoral Fellow, U. Toronto
|
Title:
|
Expressiveness and Complexity of Order Dependencies
|
Abstract:
|
Dependencies play an important role in databases. We study order dependencies (ODs), along with unidirectional order dependencies (UODs, a proper sub-class of ODs), which describe the relationships among lexicographical orderings of sets of tuples. We consider lexicographical ordering, as produced by the order-by operator in SQL, because this is the notion of order used in SQL and within query optimization. Our main goal is to investigate the inference problem for ODs, both in theory and in practice. We show the usefulness of ODs in query optimization. We establish the following theoretical results: (i) a hierarchy of order dependency classes; (ii) a proof of co-NP-completeness of the inference problem for the subclass of UODs (and ODs); (iii) a proof of co-NP-completeness of the inference problem of functional dependencies (FDs) from ODs in general, together with a demonstration of linear time complexity for the inference of FDs from UODs; (iv) a sound and complete elimination procedure for inference over ODs; and (v) a sound and complete polynomial inference algorithm for sets of UODs over restricted domains.
|
This page is maintained
by
Khuzaima Daudjee.