2015 Data Systems Group Events

Public talks of interest to the Data Systems Group are posted here, and are also mailed to the dsg-faculty, dsg-grads, and dsg-friends mailing lists. Subscribe to one of these mailing lists to receive e-mail notification of upcoming events. Everyone is welcome to attend.

Starting Fall 2015, the regular time for DSG talks is Mondays at 10:30 am, although some talks may be held on other days.

2015 Events



DB Seminar Series: Wednesday January 21, 2:30 pm, DC 1302
Speaker: Cristiana Amza, University of Toronto
Title: Stage-Aware Anomaly Detection through Execution Flow Tracking


DB Seminar: Monday January 26, 1:30 pm, DC 2568
Speaker: Alexandra Roatis, University of Waterloo
Title: Efficient Querying and Analytics of Semantic Web Data
Abstract: The high rate of data publication and the increasing complexity of published data (for instance, heterogeneous, self-describing Semantic Web data) motivate interest in efficient techniques for data manipulation. This talk focuses on leveraging mature relational data management technology for querying Semantic Web data.

The first part describes query answering over data subject to RDFS constraints, stored in relational data management systems. The implicit information resulting from RDF reasoning is required to correctly answer such queries. We introduce the database fragment of RDF, going beyond the expressive power of previously studied fragments. We devise novel techniques for answering Basic Graph Pattern queries within this fragment, exploring the two established approaches for handling RDF semantics, namely graph saturation and query reformulation. In particular, we consider graph updates within each approach and propose a method for incrementally maintaining the saturation. We experimentally study the performance trade-offs of our techniques, which can be deployed on top of any relational data management engine.
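The two approaches named above can be contrasted on a toy example. The Python sketch below is illustrative only and is not the speaker's system: it assumes a small subset of RDFS entailment (subclass and subproperty transitivity, type inheritance, subproperty propagation, domain and range typing), keeps triples in a Python set rather than a relational engine, and uses hypothetical `ex:` names and helper functions. Saturation materializes every implicit triple up front; reformulation leaves the data untouched and expands a query of the form `?x rdf:type C` into a union over the explicit triples.

```python
from itertools import product

# Hypothetical sketch: vocabulary for the tiny RDFS subset handled here.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def saturate(triples):
    """Graph saturation: apply the rules below until no new triple appears."""
    graph = set(triples)
    while True:
        derived = set()
        for (s, p, o), (s2, p2, o2) in product(graph, repeat=2):
            if p in (SUBCLASS, SUBPROP) and p2 == p and o == s2:
                derived.add((s, p, o2))        # schema transitivity
            if p == TYPE and p2 == SUBCLASS and o == s2:
                derived.add((s, TYPE, o2))     # instance of subclass -> superclass
            if p2 == SUBPROP and p == s2:
                derived.add((s, o2, o))        # subproperty -> superproperty
            if p2 == DOMAIN and p == s2:
                derived.add((s, TYPE, o2))     # subject typed by the domain
            if p2 == RANGE and p == s2:
                derived.add((o, TYPE, o2))     # object typed by the range
        if derived <= graph:
            return graph
        graph |= derived


def below(triples, rel, top):
    """`top` plus everything that reaches it through chains of `rel`."""
    found, stack = {top}, [top]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == rel and o == current and s not in found:
                found.add(s)
                stack.append(s)
    return found


def instances_by_reformulation(triples, cls):
    """Answer `?x rdf:type cls` on the explicit triples by expanding the query."""
    classes = below(triples, SUBCLASS, cls)
    answers = {s for s, p, o in triples if p == TYPE and o in classes}
    for prop, kind, c in triples:
        if kind in (DOMAIN, RANGE) and c in classes:
            for sub in below(triples, SUBPROP, prop):
                for s, p, o in triples:
                    if p == sub:
                        answers.add(s if kind == DOMAIN else o)
    return answers


data = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:PhDStudent", SUBCLASS, "ex:Student"),
    ("ex:advisedBy", SUBPROP, "ex:knows"),
    ("ex:knows", DOMAIN, "ex:Person"),
    ("ex:teaches", RANGE, "ex:Student"),
    ("ex:alice", TYPE, "ex:PhDStudent"),
    ("ex:bob", "ex:advisedBy", "ex:carol"),
    ("ex:dave", "ex:teaches", "ex:erin"),
}

saturated = {s for s, p, o in saturate(data) if p == TYPE and o == "ex:Person"}
reformulated = instances_by_reformulation(data, "ex:Person")
print(sorted(saturated))          # ['ex:alice', 'ex:bob', 'ex:erin']
print(reformulated == saturated)  # True
```

Both strategies return the same answers; saturation pays its cost whenever the data changes (hence the interest in incremental maintenance), while reformulation pays it on every query.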

The second part considers the new requirements for data analytics tools and methods emerging from the development of the Semantic Web. We fully redesign, from the bottom up, core data analytics concepts and tools in the context of RDF data. We propose the first complete formal framework for warehouse-style RDF analytics. Notably, we define analytical schemas tailored to heterogeneous, semantically rich RDF graphs, and analytical queries that (beyond relational cubes) allow flexible querying of both the data and the schema, as well as powerful aggregation and OLAP-style operations. Experiments on a fully implemented platform demonstrate the practical value of our approach.


CS Seminar: Thursday February 5, 10:30 am, DC 1304
Speaker: Willis Lang, Microsoft Jim Gray Systems Lab
Title: Redefining the Rules for Data Processing in the Cloud

DB Seminar Series: Wednesday April 1 (rescheduled from Wednesday March 25), 2:30 pm, DC 1302
Speaker: Aaron Elmore, MIT and University of Chicago
Title: Building an Elastic Main-Memory Database: E-Store

DB Seminar Series: Wednesday May 6th, 2:30 pm, DC 1302
Speaker: Sudipto Das, Microsoft Research
Title: Performance Isolation in Multi-Tenant Relational Database-as-a-Service

PhD Seminar: Monday May 25th, 1:30pm, DC 2314
Speaker: Ahmed El-Roby
Title: ALEX: Automatic Link Exploration in Linked Data
Abstract:

There has recently been an increase in the number of RDF knowledge bases published on the Internet. These rich RDF data sets can be useful in answering many queries, but much more interesting queries can be answered by integrating information from different data sets. This has given rise to research on automatically linking different RDF data sets representing different knowledge bases. This is challenging due to their scale and semantic heterogeneity. Various approaches have been proposed, but there is room for improving the quality of the generated links.

In this paper, we present ALEX, a system that aims at improving the quality of links between RDF data sets by using feedback provided by users on the answers to linked data queries. ALEX starts with a set of candidate links obtained using any automatic linking algorithm. ALEX utilizes user feedback to discover new links that did not exist in the set of candidate links while preserving link precision. ALEX discovers these new links by finding links that are similar to a link approved by the user through feedback on queries. ALEX uses a Monte Carlo reinforcement learning method to learn how to explore the space of possible links around a given link. Our experiments on real-world data sets show that ALEX is efficient and significantly improves the quality of links.


Practice Talk: Wednesday June 3rd, 1:00pm, DC 2314
Speaker: Hemant Saxena
Title: EdgeX: Edge Replication for Web Applications
Abstract:

Global Web applications face the problem of high network latency due to their need to communicate with distant data centers. Many applications use edge networks for caching images, CSS, JavaScript, and other static content in order to avoid some of this network latency. However, for updates and for anything other than static content, communication with the data center is still required and can dominate application request latencies. One way to address this problem is to push more of the web application, as well as the database on which it depends, from the remote data center towards the edge of the network. In this paper, we present preliminary work in this direction. Specifically, we present an edge-aware dynamic data replication architecture for relational database systems supporting web applications. Our objective is to allow dynamic content to be served from the edge of the network, with low latency.

This will be a short practice talk.


MMath Thesis Presentation: Monday July 6th, 2:00pm, Room DC 2310
Speaker: Young Han
Title: On Improving Distributed Pregel-like Graph Processing Systems
Abstract:

The considerable interest in distributed systems that can execute algorithms to process large graphs has led to the creation of many graph processing systems. However, existing systems suffer from two major issues: (1) poor performance due to frequent global synchronization barriers and limited scalability; and (2) lack of support for graph algorithms that require serializability, the guarantee that parallel executions of an algorithm produce the same results as some serial execution of that algorithm.

Many graph processing systems use the bulk synchronous parallel (BSP) model, which allows graph algorithms to be easily implemented and reasoned about. However, BSP suffers from poor performance due to stale messages and frequent global synchronization barriers. While asynchronous models have been proposed to alleviate these overheads, existing systems that implement such models have limited scalability or retain frequent global barriers, and do not always support graph mutations or algorithms with multiple computation phases. We propose barrierless asynchronous parallel (BAP), a new computation model that overcomes the limitations of existing asynchronous models by reducing both message staleness and global synchronization while retaining support for graph mutations and algorithms with multiple computation phases. We present GiraphUC, which implements our BAP model in the open-source distributed graph processing system Giraph, and evaluate it at scale to demonstrate that BAP provides efficient and transparent asynchronous execution of algorithms that are programmed synchronously.
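Message staleness can be illustrated with a single-process Python simulation of minimum-label connected components. This is a hypothetical sketch, not GiraphUC code: it models only when updates become visible, and it does not model distribution, barrier costs, graph mutations, or multiple computation phases. In the BSP version a vertex sees only the labels from the previous superstep; in the asynchronous version it sees its neighbours' newest labels as soon as they are written.

```python
# Hypothetical sketch: function names and the example graph are illustrative.

def bsp_components(adj):
    """BSP-style propagation: values written in superstep k are only visible
    in superstep k + 1, as if a global barrier separated the supersteps."""
    labels = {v: v for v in adj}
    supersteps = 0
    while True:
        supersteps += 1
        # Every vertex reads the snapshot produced by the previous superstep.
        new = {v: min([labels[v]] + [labels[u] for u in adj[v]]) for v in adj}
        if new == labels:
            return labels, supersteps
        labels = new


def async_components(adj):
    """Asynchronous-style propagation: a vertex reads the freshest label of
    each neighbour, including labels updated earlier in the same pass."""
    labels = {v: v for v in adj}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for v in sorted(adj):
            best = min([labels[v]] + [labels[u] for u in adj[v]])
            if best < labels[v]:
                labels[v] = best
                changed = True
    return labels, passes


# A six-vertex path 0-1-2-3-4-5 plus a separate edge 6-7.
graph = {v: [u for u in (v - 1, v + 1) if 0 <= u <= 5] for v in range(6)}
graph.update({6: [7], 7: [6]})

print(bsp_components(graph)[1], async_components(graph)[1])  # 6 2
```

Both versions find the same components, but under BSP the label of vertex 0 advances one hop per superstep, while the asynchronous version lets it travel down the path within one pass. The gap depends on processing order: visiting the vertices in reverse order would give the asynchronous version no advantage on this graph.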

Second, very few systems provide serializability, even though many graph algorithms require it for accuracy, correctness, or termination. To address this deficiency, we present a complete solution, implementable on top of existing graph processing systems, that provides serializability. Our solution formalizes the notion of serializability and the conditions under which it can be provided for graph processing systems. We propose a partition-based synchronization technique that enforces these conditions efficiently. We implement this technique in Giraph and GiraphUC to demonstrate that it is configurable, transparent to algorithm developers, and more performant than existing techniques.
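One way to picture synchronization at the granularity of partitions is the Python sketch below, which swaps in greedy graph colouring as a simple stand-in rather than the talk's actual synchronization protocol. It assumes the condition to enforce is that neighbouring vertices never execute concurrently: partitions joined by a cross-partition edge receive different colours and therefore run in different rounds, while vertices inside one partition are assumed to execute sequentially. The function name and example data are hypothetical.

```python
# Hypothetical sketch: a colouring-based stand-in, not the technique from the talk.

def partition_rounds(edges, part):
    """Group partitions into rounds so that two partitions sharing a
    cross-partition edge are never scheduled in the same round."""
    neighbours = {p: set() for p in set(part.values())}
    for u, v in edges:
        pu, pv = part[u], part[v]
        if pu != pv:  # only cross-partition edges constrain the schedule
            neighbours[pu].add(pv)
            neighbours[pv].add(pu)

    # Greedy colouring: give each partition the smallest colour not used
    # by its already-coloured neighbours.
    colour = {}
    for p in sorted(neighbours):
        used = {colour[q] for q in neighbours[p] if q in colour}
        colour[p] = next(c for c in range(len(neighbours)) if c not in used)

    rounds = {}
    for p, c in colour.items():
        rounds.setdefault(c, []).append(p)
    return [sorted(rounds[c]) for c in sorted(rounds)], colour


# Vertices a and b share partition 0; partitions form a chain 0-1-2-3.
part = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 3}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
schedule, colour = partition_rounds(edges, part)
print(schedule)  # [[0, 2], [1, 3]]
```

Partitions 0 and 2 can run in parallel in the first round and partitions 1 and 3 in the second; the cost is reduced parallelism, since each partition is idle while its neighbours run.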


DB Seminar Series: Monday July 20th, 1:30pm, Room DC 1304
Speaker: Wolfgang Lehner, TU Dresden
Title: Steps towards HW/SW-DB-CoDesign

Practice Talk: Wednesday September 2, 12:45pm, DC 1316
Speaker: Greg Drzadzewski
Title: Enhancing Exploration with a Faceted Browser through Summarization
Abstract: An enhanced faceted browsing system has been developed to support users' exploration of large multi-tagged document collections. At each step of navigation, it summarizes the current document result set with a set of representative terms and a diverse set of documents. These summaries are derived from pre-materialized views that allow for quick calculation of centroids for various result sets. The utility and efficiency of the system are demonstrated on the New York Times Annotated Corpus.
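A minimal Python sketch of how a materialized view can make centroids cheap to compute, under assumptions not taken from the talk: each document is a set of tags plus a sparse term-weight vector, the view stores a per-tag summed vector and document count, "representative terms" are taken to be the highest-weighted centroid terms, and only single-tag result sets are covered (combinations of facets would need additional views). All names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: the view layout and function names are illustrative.


def materialize(docs):
    """Pre-compute, for every tag, the summed term vector and document count."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for tags, vector in docs:
        for tag in tags:
            counts[tag] += 1
            for term, weight in vector.items():
                sums[tag][term] += weight
    return sums, counts


def centroid(view, tag):
    """Mean term vector of all documents carrying `tag`, read from the view
    without touching the documents themselves."""
    sums, counts = view
    return {term: total / counts[tag] for term, total in sums[tag].items()}


def representative_terms(view, tag, k=2):
    """The k highest-weighted centroid terms (ties broken alphabetically)."""
    c = centroid(view, tag)
    return sorted(c, key=lambda term: (-c[term], term))[:k]


docs = [
    ({"politics", "economy"}, {"tax": 2.0, "vote": 1.0}),
    ({"politics"}, {"vote": 3.0, "senate": 1.0}),
    ({"economy"}, {"tax": 1.0, "market": 2.0}),
]
view = materialize(docs)
print(centroid(view, "politics"))              # {'tax': 1.0, 'vote': 2.0, 'senate': 0.5}
print(representative_terms(view, "economy"))   # ['tax', 'market']
```

Reading a centroid from the view costs time proportional to the number of distinct terms for the tag rather than the number of matching documents, which is the property that makes per-step summaries affordable.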

This will be a short practice talk.


CS Distinguished Lecture Series: Monday Sep 14th, 2:00 pm, Humanities Theatre
Speaker: Mike Stonebraker, MIT
Title: The Land Sharks are on the Squawk Box (How Riding a Bicycle across America and Building Postgres Have a Lot in Common)

MMath thesis presentation: Thursday Oct 1st, 2:00 pm, DC 3323
Speaker: Hemant Saxena
Title: EdgeX: Edge Replication for Web Applications
Abstract: Global web applications face the problem of high network latency due to their need to communicate with distant data centers. Many applications use edge networks for caching images, CSS, JavaScript, and other static content in order to avoid some of this network latency. However, for updates and for anything other than static content, communication with the data center is still required and can dominate application request latencies. One way to address this problem is to push more of the web application, as well as the database on which it depends, from the remote data center towards the edge of the network. This thesis presents preliminary work in this direction. Specifically, it presents an edge-aware dynamic data replication architecture for relational database systems supporting web applications. The objective is to allow dynamic content to be served from the edge of the network, with low latency.

PhD Seminar (Systems): Friday Oct 2nd, 10:30 am, DC 1331
Speaker: Cong Guo
Title: Towards Adaptive Resource Allocation for Database Workloads

DSG Seminar Series: Monday Oct 5th, 10:30 am, M3-3127
Speaker: Nesime Tatbul, Intel Labs and MIT
Title: S-Store: A Streaming NewSQL System for Big Velocity Applications

MMath Research Paper presentation: Tuesday Oct 6th, 10:00 am, DC 2310
Speaker: Hella-Franziska Hoffmann
Title: Holistic Cleaning of Heterogeneous Data Sets using Conditional Denial Constraints

Seminar: Monday Oct 19th, 10:30 am, DC 1302
Speaker: Ashraf Aboulnaga, QCRI
Title: Arabesque: A System for Distributed Graph Mining
Abstract: Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms are not a good match for distributed graph mining problems, for example, finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some "interestingness" criteria desired by the user. These algorithms are very important for areas such as social networks, the Semantic Web, and bioinformatics. In this talk, I will present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque's API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require only a handful of lines of code, scale to trillions of subgraphs, and in some cases represent the first available distributed solutions.
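The filter-process idea described above can be sketched sequentially in Python. This is not Arabesque's API or its distributed engine; the function names, the vertex-induced extension strategy, and the use of a set to discard duplicate subgraphs (in place of a canonicality check) are simplifying assumptions. The application supplies two callbacks: `filter_fn` decides whether a subgraph is kept and extended, and `process_fn` emits outputs. Here they find cliques of three or more vertices.

```python
# Hypothetical sketch of a filter-process exploration loop (not Arabesque's API).

def explore(adj, filter_fn, process_fn, max_size):
    """Grow vertex-induced subgraphs one vertex at a time, asking the
    application whether each one should be processed and extended."""
    outputs = []
    level = {frozenset([v]) for v in adj}
    while level:
        next_level = set()
        for sub in level:
            if not filter_fn(adj, sub):
                continue  # pruned: neither processed nor extended
            outputs.extend(process_fn(adj, sub))
            if len(sub) < max_size:
                for v in sub:
                    for u in adj[v]:
                        if u not in sub:
                            next_level.add(sub | {u})  # set removes duplicates
        level = next_level
    return outputs


def is_clique(adj, sub):
    """Keep only subgraphs whose vertices are pairwise adjacent."""
    return all(u in adj[v] for v in sub for u in sub if u != v)


def report_clique(adj, sub):
    """Output cliques with at least three vertices."""
    return [tuple(sorted(sub))] if len(sub) >= 3 else []


# Triangle 1-2-3 with a pendant vertex 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(explore(adj, is_clique, report_clique, max_size=4))  # [(1, 2, 3)]
```

Pruning with `is_clique` is safe because adding vertices to a non-clique can never produce a clique; a filter without this property would have to let such subgraphs keep growing.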

This is joint work with Carlos Teixeira, Alexandre Fonseca, Marco Serafini, Georgos Siganos, and Mohammed Zaki. It appears in SOSP 2015.


DSG Seminar Series: Monday Oct 26th, 10:30 am, Room TBA
Speaker: Ankur Goyal, MemSQL
Title: Key Innovations in MemSQL

CS Distinguished Lecture Series: Tuesday Oct 27th, 3:30 pm, DC 1302
Speaker: Susan Dumais, Microsoft
Title: Personalized Search: Potential and Pitfalls

DSG Seminar Series: Monday Nov 2nd, 10:30 am, DC 1302
Speaker: Andy Pavlo, Carnegie Mellon University
Title: I Don't Want to be the Mitt Romney of Databases

DSG Seminar Series: Monday Nov 9th, 10:30 am, DC 1302
Speaker: Shane Culpepper, RMIT
Title: Efficient Location-aware Web Search

This page is maintained by Ken Salem.

Data Systems Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208

Feedback: db-webmaster@cs.uwaterloo.ca


Last modified: Tuesday, 10-Nov-2015 21:34:09 EST

