Fall 2014
Events of interest to the Database Research Group are posted here, and are also
mailed to the uw.cs.database newsgroup and the db-faculty, db-grads, and
db-friends mailing lists.
Subscribe to one of these mailing lists to receive e-mail notification
of upcoming events.
The DB group meets Wednesday afternoons at 2:30pm.
The list below gives the times and locations of upcoming meetings.
Each meeting lasts for an hour and features either a local speaker or,
on Seminar days, an invited outside speaker.
Everyone is welcome to attend.
Fall 2014 Events
Cheriton Symposium: Friday, September 19, DC 1302
Speakers:
David Cheriton, Stanford University, 10:45am
"HICAMP Bitmap: Space-efficient updatable bitmap index for in-memory databases"
Ihab Ilyas, University of Waterloo, 3:00pm
"Data Cleaning from Theory to Practice"
M. Tamer Özsu, University of Waterloo, 3:45pm
"Web Data Management in the RDF Age"
DB Meeting: Wednesday, October 1, 2:30pm, DC 1331
Speaker: Greg Drzadzewski
Title: Partial Materialization for On-Line Analytical Processing on Multi-Tagged Document Collections
Abstract:
On-Line Analytical Processing (OLAP) systems are commonly used on top of
structured data to help users make sense of large data collections by
providing them with summary information that can be examined at various
levels of detail. Partial materialization has been used as part of these
OLAP systems as a way of reducing the time required to calculate summaries
as well as satisfying the constraints of limited storage and available time
for updates.
When dealing with large collections of tagged documents, one would also
benefit from the summarization operations provided by an OLAP system. Such
a system could make it less time-consuming for users to explore and
understand the information contained in large document collections. Tagged
document collections, however, require different types of measures for
summarizing the data, and their data exhibits considerably different
properties from the data in traditional OLAP. To address
these issues, an OLAP system for documents will require a different design
and partial materialization approach.
In this talk I will describe a new document-centric partial materialization
strategy that offers a faster average response time for the expected query
workload than current partial materialization approaches, along
with a lower storage space requirement. The performance of this new partial
materialization strategy is evaluated over real and synthetic document
collections.
DB Meeting: Wednesday, October 8, 2:30pm, DC 2585
Speaker: Jeff Pound, SAP Waterloo
Title: Distributed Databases: the Good, the Bad, and the Ugly
Abstract:
This talk is inspired by the pop sensation and distributed database enthusiast Carly Rae Jepsen. In this talk we will discuss her hit song "Call Me Maybe", a poetic ode to the challenges in building fault-tolerant distributed systems. We will see how Ms. Jepsen's concerns about being called affect a variety of open source database systems, and explore how these systems behave when the "calls" fail or are delayed (i.e., under network partition). In particular, we will look at consistency models as advertised vs. actual system behaviour under faults. We will survey "good" systems that make consistency guarantees and adhere to them, "bad" systems that forgo consistency for scalability and availability, and "ugly" systems that guarantee consistency but do not actually provide it in practice.
This talk is based on a number of blogs (yes, blogs): primarily Kyle Kingsbury's Jepsen blog series, but also blog articles by Daniel Abadi and LinkedIn's Jay Kreps.
DB Meeting: Wednesday, October 29, 2:30pm, DC 2585
Speaker: Ani Nica, SAP Waterloo
Title:
Abstract:
DB Meeting: Wednesday, November 5, 2:30pm, DC 2585
Speaker: Arian Baer, FTW Telecom Research Centre, Vienna
Title: Cache-Oblivious Scheduling of Shared Workloads
Abstract:
Shared workload optimization is feasible if the set of tasks to be executed is
known in advance, as is the case in updating a set of materialized views or
executing an extract-transform-load workflow. In this talk, we consider
data-intensive shared workloads with precedence constraints arising from data
dependencies, i.e., before executing some task, other tasks may have to run
first and generate some data needed by the next task(s). While there has been
previous work on identifying common subexpressions in shared workloads and task
re-ordering to enable shared scans, we go a step further and solve the problem
of scheduling shared data-intensive workloads in a cache-oblivious way.
Our solution relies on a novel formulation of precedence constrained scheduling
with the additional constraint that once a data item is in the cache, all tasks
that require this data item should execute as soon as possible thereafter. The
intuition behind this formulation is that the longer a data item remains in the
cache, the more likely it is to be evicted regardless of the cache size. We
give an optimal ordering algorithm using A* search over the space of possible
orderings, and we propose efficient and effective heuristics that obtain
nearly-optimal results in much less time. We present experimental results on
real-life data warehouse workloads and the TPC-DS benchmark to validate our claims.
DB Meeting: Wednesday, November 12, 2:30pm, DC 2585
Speaker: Peter Bumbulis, SAP Waterloo
Title:
Abstract:
I'll be talking about the following paper as well as a bit
about time synchronization. There will probably be a
bit about Spanner as well.
Clock-SI: Snapshot Isolation for Partitioned Data
Stores Using Loosely Synchronized Clocks, SRDS 2013.
This page is maintained by Khuzaima Daudjee.