Winter 2014
Events of interest to the
Database Research Group are posted here, and are also
mailed to the uw.cs.database newsgroup and the
db-faculty,
db-grads,
db-friends
mailing lists.
Subscribe to one of these mailing lists to receive e-mail notification
of upcoming events.
The DB group meets Wednesday afternoons at 2:30pm.
The list below gives the
times and locations of upcoming meetings.
Each meeting lasts for an hour and features either
a local speaker or, on
Seminar days,
an invited outside speaker.
Everyone is welcome to attend.
Winter 2014 Events
DB Meeting:
|
Wednesday September 15, 2:30pm, DC 1331
|
Speaker:
|
Taras Kinash
|
Title:
|
Application of Definability to Query Answering over Knowledge Bases (MMath Thesis Presentation)
|
Abstract:
|
Answering object queries (i.e. instance retrieval) is a central task in ontology-based data
access (OBDA). Performing this task involves reasoning with respect to a knowledge base
K (i.e. ontology) over some description logic (DL) dialect L. As the expressive power of L
grows, so does the complexity of reasoning with respect to K. Therefore, eliminating the
need to reason with respect to a knowledge base K is desirable.
In this work, we propose an optimization to improve performance of answering object
queries by eliminating the need to reason with respect to the knowledge base and, instead,
utilizing cached query results when possible. In particular, given a DL dialect L, an object
query C over some knowledge base K and a set of cached query results S = {S1,..., Sn}
obtained from evaluating past queries, we rewrite C into an equivalent query D that
can be evaluated with respect to an empty knowledge base, using cached query results
S' where S' is a subset of S. The new query D is an interpolant for the original query
C with respect to K and S. To find D, we leverage a tool for enumerating interpolants
of a given sentence with respect to some theory. We describe a procedure that maps a
knowledge base K, expressed in terms of a description logic dialect of first order logic, and
object query C into an equivalent theory and query that are input into the interpolant
enumerating tool, and resulting interpolants into an object query D that can be evaluated
over an empty knowledge base.
We show the efficacy of our approach through experimental evaluation on a Lehigh
University Benchmark (LUBM) data set, as well as on a synthetic data set, LUBMMOD,
that we created by augmenting an LUBM ontology with additional axioms.
|
DB Meeting:
|
Wednesday February 5, 2:30pm, DC 1331
|
Speaker:
|
David DeHaan, SAP Waterloo
|
Title:
|
Implementing histograms that provide error guarantees
|
Abstract:
|
In most DBMSs, histograms are an important data source used during cardinality estimation. Bad cardinality estimates originating from poor histograms can have drastic effects on the optimizer's ability to cost and therefore select a reasonable execution plan for a query. In this talk I'll describe "q-optimal histograms," which are histograms proposed by Guido Moerkotte that provide a quality guarantee such that any cardinality estimate derived from the histogram obeys a parameterized error bound. In addition to describing theory related to construction and use of q-optimal histograms, I'll also describe my own experience of implementing q-optimal histograms within the latest release of SAP HANA, along with some of the engineering decisions and algorithmic improvements that factored into the implementation.
|
DB Meeting:
|
Wednesday February 12, 2:30pm, DC 1331
|
Speaker:
|
Andrew Kane
|
Title:
|
Document Size Distribution
|
Abstract:
|
I will present a practice talk for our LSDS-IR 2014 workshop paper on document size distribution in the context of search engines, then give a few related ideas that could be explored by interested grad students. Workshop paper synopsis: Search engines split large datasets across multiple machines using document distribution. Documents are typically distributed randomly, but we propose that documents be distributed by their size instead. This produces immediate improvements in both index size and query throughput. We show improvements to an in-memory conjunctive list intersection system using simple16 compression and either skips or bitvectors. We also expect significant performance improvements in ranking based search systems.
|
DB Meeting:
|
Wednesday February 26, 2:30pm, DC 1331
|
Speaker:
|
Pedram Ghodsnia
|
Title:
|
Parallel I/O Aware Query Optimization
|
Abstract:
|
New trends in the storage industry suggest that in the near future a majority of hard disk drive-based storage subsystems will be replaced by solid state drives (SSDs). Database management systems can substantially benefit from the superior I/O performance of SSDs. Although the impact of using SSDs in query processing has been studied in the past, exploiting the I/O parallelism of SSDs in query processing and optimization has not received enough attention. In this talk, I will first discuss why the query optimizer needs to be aware of the benefit of I/O parallelism in solid state drives. Then, I will propose a novel general I/O cost model that considers the impact of device I/O queue depth in I/O cost estimation. This model, dynamically defined by a calibration process, summarizes the behavior of the I/O subsystem without any prior knowledge of the type and number of devices used in the storage subsystem. Our experimental results show that by employing the proposed model, the query optimizer can choose plans with up to 20 times shorter runtime.
|
DB Meeting:
|
Wednesday March 5, 2:30pm, DC 1331
|
Speaker:
|
Catalin Avram
|
Title:
|
Transaction Execution and Data Placement in an Edge-Aware Storage System
|
Abstract:
|
This talk discusses the hurdles and benefits of moving web applications and their data to the edge of the cloud, proposes a transactional model for the resulting distributed system, and motivates further research into building a distributed online algorithm for transaction execution and dynamic data placement.
|
DB Meeting:
|
Wednesday March 12, 2:30pm, DC 1331
|
Speaker:
|
Ken Salem
|
Title:
|
Transactions over Partitioned Replicated Databases: A Brief Tour
|
Abstract:
|
I'll give an overview of some recent systems, e.g., Spanner,
MDCC, Granola, and Megastore, that implement transactions over
replicated partitioned databases, and will try to place them in context.
|
MMath Thesis Presentation:
|
Friday March 14, 11:am, DC 2310
|
Speaker:
|
Xin Pan
|
Title:
|
Database High Availability using SHADOW Systems
|
Abstract:
|
Various High Availability DataBase systems (HADB) are used to provide high availability.
Pairing an active database system with a standby system is one commonly used HADB
technique. The active system serves read/write workloads. One or more standby systems
replicate the active and serve read-only workloads. Though widely used, this technique has
some significant drawbacks: The active system becomes the bottleneck under heavy write
workloads. Replicating changes synchronously from the active to the standbys further
reduces the performance of the active system. Asynchronous replication, however, risks the
loss of updates during failover. The shared-nothing architecture of active-standby systems
is unnecessarily complex and cost inefficient.
In this thesis we present SHADOW systems, a new technique for database high availability.
In a SHADOW system, the responsibility for database replication is pushed from
the database systems into a shared, reliable, storage system. The active and standby systems
share access to a single logical copy of the database, which resides in shared storage.
SHADOW introduces write offloading, which frees the active system from the need to
update the persistent database, placing that responsibility on the underutilized standby
system instead. By exploiting shared storage, SHADOW systems avoid the overhead of
database-managed synchronized replication, while ensuring that no updates will be lost
during a failover. We have implemented a SHADOW system using PostgreSQL, and we
present the results of a performance evaluation that shows that the SHADOW system can
outperform both traditional synchronous replication and standalone PostgreSQL systems.
|
DB Meeting:
|
Wednesday March 19, 2:30pm, DC 1331
|
Speaker:
|
Ihab Ilyas
|
Title:
|
Data Quality
|
Abstract:
|
|
DB Meeting:
|
Wednesday March 26, 2:30pm, DC 1331
|
Speaker:
|
Dan Farrar, SAP Waterloo
|
Title:
|
Parallel Query Evaluation: The Volcano Model Turns 20
|
Abstract:
|
I will revisit the seminal 1994 paper [1] by Goetz Graefe that describes the parallel Volcano query evaluation model. I will talk about the power of the Volcano model, its relevance to the hardware landscape today (SSDs, in-memory stores, massively multi-core systems), and the engineering challenges of getting it implemented correctly.
[1] Graefe, Goetz. "Volcano -- An Extensible and Parallel Query Evaluation System." IEEE Trans. Knowl. Data Eng., vol. 6, no. 1, pp. 120-135, Feb. 1994.
|
MMath Thesis Presentation:
|
Wednesday April 9, 2:30pm, DC 1331
|
Speaker:
|
Chang Ge
|
Title:
|
Bitemporal Sliding Windows
|
Abstract:
|
The bitemporal data model associates two time intervals with each record - system time and application time - denoting the validity of the record from the perspective of the database and of the real world, respectively. One issue that has not yet been addressed is how to efficiently answer sliding window queries in this model. In this work, we propose and experimentally evaluate a main-memory index called BiSW that supports sliding windows on system time, application time, and both time attributes simultaneously. Our experimental results show that BiSW outperforms existing approaches in terms of space footprint, maintenance overhead and query performance.
|
This page is maintained
by
Khuzaima Daudjee.