Database Research Group Events

Spring 2014

Events of interest to the Database Research Group are posted here, and are also mailed to the uw.cs.database newsgroup and the db-faculty, db-grads, db-friends mailing lists. Subscribe to one of these mailing lists to receive e-mail notification of upcoming events.

The DB group meets Wednesday afternoons at 2:30pm. The list below gives the times and locations of upcoming meetings. Each meeting lasts for an hour and features either a local speaker or, on Seminar days, an invited outside speaker. Everyone is welcome to attend.

Spring 2014 Events

DB Seminar: Tuesday, May 6, 11:00am, DC 1304
Speaker: Timothy Roscoe, ETH Zürich
Title: Treating cores as devices

DB Meeting: Wednesday, May 7, 2:30pm, DC 1331
Speaker: Matteo Riondato, Brown University
Title: Fast Mining of Frequent Itemsets through Sampling
Abstract: Frequent Itemsets mining is one of the key tasks in knowledge discovery from databases. The cost of mining algorithms for this problem depends on the number of itemsets and on the size of the dataset. In this talk I will present three algorithms that cuts the dependency on the dataset size by using random samples of the dataset to extract approximations of the collection of frequent itemsets with guaranteed high-quality. The algorithms use advanced data structures, VC-dimension, and MapReduce.
Bio: Matteo Riondato is graduating with a PhD in computer science from Brown University. His dissertation, titled "Sampling-based Randomized Algorithms for Big Data Analytics" explores the connection between statistical learning theory and data mining. In his research, he tries to bridge the gap between data analytics, database systems, and theory by exploiting the power of modern statistics and probability in new ways that are efficient for modern problems, modern systems, and real data.

DB Meeting: Wednesday May 14, 2:30pm, DC 1331
Speaker: Daniel Nicoara
Title: Scaling Graph Databases through Dynamic Distributed Partitioning

DB Seminar: Wednesday May 21, 2:30 pm, DC 1302
Speaker: Alexandros Labrinidis, University of Pittsburgh
Title: Handling Big Streaming Data with DILoS

DB Seminar: Thursday May 22, 2:30 pm, DC 1331
Speaker: Bettina Kemme, McGill University
Title: Multiplayer Games: the perfect application to explore scalable and secure distributed replica management

DB Meeting: Wednesday May 28, 2:30pm, DC 1331
Speaker: Olaf Hartig
Title: Optimizing the Response Time of Traversal-Based Query Execution
Abstract: Traversal-based query execution is a recent approach to query the Web as if it was a distributed database. The novelty of this approach lies in integrating a traversal-based retrieval of Web data into the query execution process. Hence, this approach does not assume a-priori a fixed set of potentially relevant data sources; instead, the traversal process discovers data and data sources on the fly. While existing work studies techniques to implement the idea of traversal- based query execution, query optimization is an open problem. What makes the problem challenging is the lack of information about data that will be discovered during query execution. This talk presents ongoing research to address this problem. In particular, we will focus on response time optimization. That is, we will discuss different approaches that enable traversal-based query executions to report first elements of a query result as early as possible.

DB Meeting: Wednesday June 4, 2:30pm, DC 1331
Speaker: Frank Tompa
Title: A Retrospective of Structured Text Search
Abstract: Structured text is ubiquitous in humanities research and throughout business, leading to the standardization of first SGML and later XML. I will review several of the alternatives that search engines have adopted for such texts, including options for defining document structure and for querying both structure and content.

DB Meeting: Wednesday June 11, 2:30pm, DC 1331
Speaker: Michael Mior
Title: Automated Schema Design for NoSQL Databases
Abstract: Selecting appropriate indices and materialized views is critical for high performance in relational databases. By example, we show that the problem of schema optimization is also highly relevant for NoSQL databases. We explore the problem of schema design in NoSQL databases with a goal of optimizing query performance while minimizing storage overhead. Our suggested approach uses the cost of executing a given workload for a given schema to guide the mapping from the application data model to a physical schema. We propose a cost-driven approach for optimization and discuss its usefulness as part of an automated schema design tool.

DB Seminar: Wednesday, June 18, 2:30pm, DC 1302
Speaker: Radu Sion, Stony Brook University
Title: Modern Secure Data Management

DB Meeting: Wednesday July 9, 2:30pm, DC 1331
Speaker: Mohamed Sabri
Title: Optimizations for Live Linked Data Query Execution

DB Meeting: Wednesday July 16, 2:30pm, DC 1331
Speaker: Zhiping Wu
Title: Data Structures for Fast Access Control in ECM Systems (MMath Thesis Presentation)
Abstract: While many access control models have been proposed, little work has been done on the efficiency of access control systems. Because the access control sub-system of an Enterprise Content Management (ECM) system may be a bottleneck, we investigate the representation of permissions to improve its efficiency. Observing that there are many browsing-oriented permission request queries, we choose to implement a subject-oriented representation (i.e., maintaining a permission list for each subject). Additionally, we notice that we may encounter many contiguous IDs under one object (e.g., folder) with breadth-firrst ID numbering.
To optimize the efficiency taking into account the above two characteristics, this thesis presents a space-efficient data structure specifically tailored for representing permission lists in ECM systems. Besides the space efficiency, checking, granting or revocation of a permission is very fast using our data structure. It also supports fast union of two or more permission lists (determining the effective permissions inherited from users' groups). In addition, our data structure is scalable to support any increase in the number of objects and subjects.
We evaluate our representation by comparing it against the bitmap based representation and a hash table based representation while using random ID numbering and breadth-first numbering, respectively. Our experimental tests on both synthetic and real-world data show that the hash table outperforms our representation for regular permission queries (i.e., querying permissions on a single object each time) as well as browsing-oriented queries with random ID numbering. However, our tests also show that 1) our representation supports faster browsing-oriented queries with breadth-first ID numbering applied while consuming only half the space when compared to the hash table based representation, and 2) our representation is much more space and time efficient than the bitmap based representation for our application.

DB Meeting: Wednesday July 23, 2:30pm, DC 1331
Speaker: Jaemyung Kim
Title: Database High Availability Using Shared Volume Service With Offloaded Writes
Abstract: Hot standby techniques are widely used to implement highly available database systems. These techniques make use of two copies of the database, an active copy and a backup that is managed by the standby. Synchronization of these two database copies is the responsibility of the database systems than manage them. However, database systems are often deployed in settings in which a reliable, persistent, network-accessible storage service (such as cloud block storage services, cluster file systems, or NAS) is available. In this paper we address the following question: how can we improve hot standby techniques in settings in which the active and standby database systems have access to a common, reliable persistent storage service?
We present SHADOW systems, a novel approach to hot standby high availability. In a SHADOW system, the active and standby database systems share access to a single logical copy of the database, which resides in the persistent shared storage. SHADOW introduces write offloading, which frees the active system from the need to update the persistent database, placing that responsibilty on the standby system instead. SHADOW systems push the task of managing database replication out of the DBMS and into the underlying storage service. We have implemented SHADOW prototypes using PostgreSQL, and we present the results of a performance evaluation that shows that SHADOW systems outperform traditional synchronous hot standby replication. Because of write offloading, SHADOW systems can potentially outperform even a standalone DBMS, while providing fast failover and durability of committed updates.

DB/IR Seminar: Tuesday August 5, 3:00pm, DC 1304 ← Note non-standard day and time
Speaker: Alistair Moffat, , Department of Computing and Information Systems, The University of Melbourne
Title: Waving the Magic WAND
Abstract: Web search services process thousands of queries per second, and filter their answers from collections containing very large amounts of data. Fast response to queries is a critical service expectation. The well-known WAND (Weak AND) processing strategy is one way of reducing the amount of computation necessary when executing such a query. The value of WAND has now been validated in a wide range of studies, and has become one of the key baselines against which all new top-k processing algorithms are benchmarked. However, most previous implementations of WAND-based retrieval approaches have been in the context of the BM25 Okapi similarity scoring regime. Here we measure the performance of WAND in the context of the alternative Language Model similarity score computation, and find that the dramatic efficiency gains reported in previous studies are no longer achievable. That is, when the primary goal of a retrieval system is to maximize effectiveness, WAND is relatively unhelpful in terms of attaining the secondary objective of maximizing query throughput rates. However, the BlockMax-WAND algorithm does in fact help reducing the percentage of postings to be scored, but with additional computational overhead. We explore a variety of trade-offs between scoring metric and processing regime and present new insight into how score-safe algorithms interact with rank scoring.

DB Meeting: Wednesday August 6, 2:30pm, DC 1331
Title: VLDB 2014 dry-runs
Abstract: Minyang Han, K. Daudjee, K. Ammar, M. T. Özsu, X. Wang, T. Jin, An Experimental Comparison of Pregel-like Graph Processing Systems.

Güneş Aluç, M. T. Özsu, K. Daudjee, Workload Matters: Why RDF Databases Need a New Design.

This page is maintained by Khuzaima Daudjee.

Campaign Waterloo

Data Systems Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208

Contact | Feedback: | Data Systems Group

Valid HTML 4.01!Valid CSS! Last modified: Monday, 28-Jul-2014 16:09:26 EDT