Winter 2014 Events Schedule | Database Research Group | UW


Winter 2014

Events of interest to the Database Research Group are posted here, and are also mailed to the uw.cs.database newsgroup and the db-faculty, db-grads, db-friends mailing lists. Subscribe to one of these mailing lists to receive e-mail notification of upcoming events.

The DB group meets Wednesday afternoons at 2:30pm. The list below gives the times and locations of upcoming meetings. Each meeting lasts for an hour and features either a local speaker or, on Seminar days, an invited outside speaker. Everyone is welcome to attend.

Winter 2014 Events


DB Meeting: Wednesday September 15, 2:30pm, DC 1331
Speaker: Taras Kinash
Title: Application of Definability to Query Answering over Knowledge Bases (MMath Thesis Presentation)
Abstract: Answering object queries (i.e., instance retrieval) is a central task in ontology-based data access (OBDA). Performing this task involves reasoning with respect to a knowledge base K (i.e., an ontology) over some description logic (DL) dialect L. As the expressive power of L grows, so does the complexity of reasoning with respect to K. Therefore, eliminating the need to reason with respect to a knowledge base K is desirable. In this work, we propose an optimization to improve performance of answering object queries by eliminating the need to reason with respect to the knowledge base and, instead, utilizing cached query results when possible. In particular, given a DL dialect L, an object query C over some knowledge base K, and a set of cached query results S = {S1,..., Sn} obtained from evaluating past queries, we rewrite C into an equivalent query D that can be evaluated with respect to an empty knowledge base, using cached query results S', where S' is a subset of S. The new query D is an interpolant for the original query C with respect to K and S. To find D, we leverage a tool for enumerating interpolants of a given sentence with respect to some theory. We describe a procedure that maps a knowledge base K, expressed in terms of a description logic dialect of first-order logic, and an object query C into an equivalent theory and query that are input into the interpolant-enumerating tool, and maps the resulting interpolants into an object query D that can be evaluated over an empty knowledge base. We show the efficacy of our approach through experimental evaluation on a Lehigh University Benchmark (LUBM) data set, as well as on a synthetic data set, LUBMMOD, that we created by augmenting an LUBM ontology with additional axioms.


DB Seminar: Wednesday January 22, 2:30pm, DC 1302
Speaker: Fei Chiang, McMaster University
Title: Continuous Data Cleaning

DB Meeting: Wednesday February 5, 2:30pm, DC 1331
Speaker: David DeHaan, SAP Waterloo
Title: Implementing histograms that provide error guarantees
Abstract: In most DBMSs, histograms are an important data source used during cardinality estimation. Bad cardinality estimates originating from poor histograms can have drastic effects on the optimizer's ability to cost and therefore select a reasonable execution plan for a query. In this talk I'll describe "q-optimal histograms," which are histograms proposed by Guido Moerkotte that provide a quality guarantee such that any cardinality estimate derived from the histogram obeys a parameterized error bound. In addition to describing theory related to construction and use of q-optimal histograms, I'll also describe my own experience of implementing q-optimal histograms within the latest release of SAP HANA, along with some of the engineering decisions and algorithmic improvements that factored into the implementation.

DB Meeting: Wednesday February 12, 2:30pm, DC 1331
Speaker: Andrew Kane
Title: Document Size Distribution
Abstract: I will present a practice talk for our LSDS-IR 2014 workshop paper on document size distribution in the context of search engines, then give a few related ideas that could be explored by interested grad students. Workshop paper synopsis: Search engines split large datasets across multiple machines using document distribution. Documents are typically distributed randomly, but we propose that documents be distributed by their size instead. This produces immediate improvements in both index size and query throughput. We show improvements to an in-memory conjunctive list intersection system using simple16 compression and either skips or bitvectors. We also expect significant performance improvements in ranking-based search systems.

DB Meeting: Wednesday February 26, 2:30pm, DC 1331
Speaker: Pedram Ghodsnia
Title: Parallel I/O Aware Query Optimization
Abstract: New trends in the storage industry suggest that in the near future a majority of hard disk drive-based storage subsystems will be replaced by solid state drives (SSDs). Database management systems can substantially benefit from the superior I/O performance of SSDs. Although the impact of using SSDs in query processing has been studied in the past, exploiting the I/O parallelism of SSDs in query processing and optimization has not received enough attention. In this talk, I will first discuss why the query optimizer needs to be aware of the benefit of I/O parallelism in solid state drives. Then, I will propose a novel general I/O cost model that considers the impact of device I/O queue depth in I/O cost estimation. This model, dynamically defined by a calibration process, summarizes the behavior of the I/O subsystem without any prior knowledge about the type and number of devices used in the storage subsystem. Our experimental results show that by employing the proposed model, the query optimizer is able to choose plans with up to 20 times shorter runtime.


DB Meeting: Wednesday March 5, 2:30pm, DC 1331
Speaker: Catalin Avram
Title: Transaction Execution and Data Placement in an Edge-Aware Storage System
Abstract: This talk discusses the hurdles and benefits of moving web applications and their data to the edge of the cloud, proposes a transactional model for the resulting distributed system, and motivates further research into building a distributed online algorithm for transaction execution and dynamic data placement.


DB Meeting: Wednesday March 12, 2:30pm, DC 1331
Speaker: Ken Salem
Title: Transactions over Partitioned Replicated Databases: A Brief Tour
Abstract: I'll give an overview of some recent systems, e.g., Spanner, MDCC, Granola, and Megastore, that implement transactions over replicated partitioned databases, and will try to place them in context.


MMath Thesis Presentation: Friday March 14, 11am, DC 2310
Speaker: Xin Pan
Title: Database High Availability using SHADOW Systems
Abstract: Various High Availability DataBase systems (HADB) are used to provide high availability. Pairing an active database system with a standby system is one commonly used HADB technique. The active system serves read/write workloads. One or more standby systems replicate the active and serve read-only workloads. Though widely used, this technique has some significant drawbacks: The active system becomes the bottleneck under heavy write workloads. Replicating changes synchronously from the active to the standbys further reduces the performance of the active system. Asynchronous replication, however, risks the loss of updates during failover. The shared-nothing architecture of active-standby systems is unnecessarily complex and cost-inefficient. In this thesis we present SHADOW systems, a new technique for database high availability. In a SHADOW system, the responsibility for database replication is pushed from the database systems into a shared, reliable storage system. The active and standby systems share access to a single logical copy of the database, which resides in shared storage. SHADOW introduces write offloading, which frees the active system from the need to update the persistent database, placing that responsibility on the underutilized standby system instead. By exploiting shared storage, SHADOW systems avoid the overhead of database-managed synchronized replication, while ensuring that no updates will be lost during a failover. We have implemented a SHADOW system using PostgreSQL, and we present the results of a performance evaluation that shows that the SHADOW system can outperform both traditional synchronous replication and standalone PostgreSQL systems.


DB Meeting: Wednesday March 19, 2:30pm, DC 1331
Speaker: Ihab Ilyas
Title: Data Quality
Abstract:


DB Meeting: Wednesday March 26, 2:30pm, DC 1331
Speaker: Dan Farrar, SAP Waterloo
Title: Parallel Query Evaluation: The Volcano Model Turns 20
Abstract: I will revisit the seminal 1994 paper [1] by Goetz Graefe that describes the parallel Volcano query evaluation model. I will talk about the power of the Volcano model, its relevance to the hardware landscape today (SSDs, in-memory stores, massively multi-core systems), and the engineering challenges of getting it implemented correctly.
[1] Graefe, Goetz. "Volcano -- An Extensible and Parallel Query Evaluation System." IEEE Trans. Knowl. Data Eng., vol. 6, no. 1, pp. 120-135, Feb. 1994.


MMath Thesis Presentation: Wednesday April 9, 2:30pm, DC 1331
Speaker: Chang Ge
Title: Bitemporal Sliding Windows
Abstract: The bitemporal data model associates two time intervals with each record - system time and application time - denoting the validity of the record from the perspective of the database and of the real world, respectively. One issue that has not yet been addressed is how to efficiently answer sliding window queries in this model. In this work, we propose and experimentally evaluate a main-memory index called BiSW that supports sliding windows on system time, application time, and both time attributes simultaneously. Our experimental results show that BiSW outperforms existing approaches in terms of space footprint, maintenance overhead, and query performance.


This page is maintained by Khuzaima Daudjee.