Tutorial: Linked Data Query Processing

Linked Data Query Processing

Tutorial
at the 22nd International World Wide Web Conference (WWW 2013)

Location: Rio de Janeiro, Brazil

Date: May 14, 2013

Abstract

The amount of Linked Data published on the Web has increased dramatically over the last five years. While some data publishers expose their datasets not only based on the Linked Data principles but also via dedicated query services, relying on such services for use cases of live, Linked Data consumption, results in ignoring a large portion of the Web of Linked Data. As a consequence, we currently witness the emergence of a new area of research that focuses on query processing approaches which rely only on the Linked Data principles. This tutorial aims to provide participants with an overview on this new field. We will discuss the foundations, the new problems that are specific to Linked Data query processing, and the works that aim to address them.

Agenda

The tutorial covers the following main aspects in Linked Data query processing:

Introduction ...Slides (4.2 MB)
In the introductory part we will give a quick overview of the Linked Data principles and the query language SPARQL, which are requirements for understanding the rest of the tutorial. We will give a short overview of Linked Data query processing and the content of the tutorial. We will emphasize the specific challenges associated with Linked Data query processing and the differences to existing (database) works, particularly approaches to processing queries over Linked Data such as SPARQL endpoint federation and warehousing.
Theoretical Foundations ...Slides (1.9 MB)
An awareness of foundations and theoretical properties of Linked Data queries is essential for understanding the possibilities and limitations of Linked Data query processing. Consequently, before we focus on systems-related topics for the major part of our tutorial, we will provide the participants with an overview on the main results of theory research in the context of Linked Data queries. More precisely, we will introduce different semantics for SPARQL-based Linked Data queries and discuss characteristics of queries under these semantics [BGS09, HS12, Har12].
Source Selection Strategies ...Slides (1.5 MB)
For the execution of Linked Data queries, it is necessary to retrieve data by looking up URIs. There exist three classes of approaches for selecting the URIs that a query execution system looks up during the execution of a query: i) index-based approaches [HHK+10], ii) live exploration [HF12, HBF09], and iii) hybrid approaches [LT10]. In this part of the tutorial, we introduce each of these types and discuss their respective characteristics, advantages and limitations.
Execution Process ...Slides (10.1 MB)
The actual process of executing a Linked Data query may consist of two separate phases: During the first phase, a query execution system selects URIs and uses them to retrieve data from the queried Web; during a subsequent, second phase, the system generates the query result using the data retrieved in the first phase [HHK+10, UHK+11]. Instead of separating these two phases, it is also possible to integrate the retrieval of data into the result construction process [HBF09, HF12, LT10, LT11, WTL+12]. We refer to Linked Data query execution approaches that apply the latter idea as integrated execution approaches. Analogously, separated execution approaches clearly separate data retrieval from result construction by two consecutive phases. In this part, we first discuss the general properties of both types of approaches. Thereafter, we focus on integrated execution approaches proposed in the literature. In particular, we describe the algorithms based on which these approaches combine data retrieval with query-local data processing.
Query Planning and Optimization ...Slides (2.0 MB)
An essential step of any procedure for processing queries is the query planning phase. This phase comprises the generation of candidate query execution plans and the selection of one of these plans for the actual execution. While such a selection usually aims for minimizing the overall executing time, possible optimization criteria in the context of Linked Data queries are more diverse and include minimizing response time, minimizing network traffic, or maximizing the degree of result completeness. The aforementioned index-based source selection approaches provide some information for assessing query execution plans w.r.t. these optimization criteria [LT10, LT11, UHK+11]. However, for live evaluation processes such information cannot be assumed to be available initially; instead, it may only be obtained incrementally during query execution. Hence, in this case it requires heuristics for selecting an initial plan [Har11a]. In this part of the tutorial we discuss query planning for Linked Data queries and review existing work that aims to address the problems in this phase of query processing.

Presenters

Olaf Hartig (David R. Cheriton School of Computer Science, University of Waterloo)
Olaf is a postdoc fellow in the Database Research Group at the University of Waterloo. His research focuses on querying Linked Data on the Web. His aim is to develop and study concepts that allow users to query the Web of Linked Data as if it is a giant global database. As project maintainer and lead developer he is implementing these concepts in the free software project SQUIN. In addition to his hands-on experience in the field of processing Linked Data queries, he recently also published results regarding theoretical properties of such queries. Olaf presented several Linked Data related tutorials at major international conferences such as ISWC 2008, ISWC 2009, WWW 2010, and ICWE 2012; and he was lecturer at the Indian-Summer School on Linked Data 2011.

Bibliography

[BGS09]: P. Bouquet, C. Ghidini, and L. Serafini: Querying The Web Of Data: A Formal Approach. In ASWC 2009.
[FGP12]: V. Fionda, C. Gutierrez, and G. Pirro: Semantic Navigation on the Web of Data: Specification of Routes, Web Fragments and Actions. In WWW 2012.
[HHK+10]: A. Harth, K. Hose, M. Karnstedt, A. Polleres, K.-U. Sattler, and J. Umbrich: Data Summaries for On-Demand Queries over Linked Data. In WWW 2010.
[HS12]: A. Harth and S. Speiser: On Completeness Classes for Query Evaluation on Linked Data. In AAAI 2012.
[Har11a]: O. Hartig: Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution. In ESWC 2011.
[Har11b]: O. Hartig: How Caching Improves Efficiency and Result Completeness for Querying Linked Data. In LDOW 2011.
[Har12]: O. Hartig: SPARQL for a Web of Linked Data: Semantics and Computability. In ESWC 2012.
[Har13a]: O. Hartig: SQUIN: A Traversal Based Query Execution System for the Web of Linked Data. In SIGMOD 2013.
[Har13b]: O. Hartig: An Overview on Execution Strategies for Linked Data Queries. In Datenbankspektrum 13(2), 2013.
[HBF09]: O. Hartig, C. Bizer, and J.-C. Freytag: Executing SPARQL Queries over the Web of Linked Data. In ISWC 2009.
[HF12]: O. Hartig and J.-C. Freytag: Foundations of Traversal Based Query Execution over Linked Data. In HYPERTEXT 2012.
[HH11]: O. Hartig and F. Huber: A Main Memory Index Structure to Query Linked Data. In LDOW 2011.
[HL10]: O. Hartig and A. Langegger: A Database Perspective on Consuming Linked Data on the Web. In Datenbankspektrum 10(2), 2010.
[LT10]: G. Ladwig and T. Tran: Linked Data Query Processing Strategies. In ISWC 2010.
[LT11]: G. Ladwig and T. Tran: SIHJoin: Querying Remote and Local Linked Data. In ESWC 2011.
[MDJ+12]: D. P. Miranker, R. K. Depena, H. Jung, J. F. Sequeda, and C. Reyna: Diamond: A SPARQL Query Engine for Linked Data Based on the Rete Match. In AImWD 2012.
[Sch11]: F. Schmedding: Incremental SPARQL Evaluation for Query Answering on Linked Data. In COLD 2011.
[SBK+12]: S. Schaffert, C. Bauer, T. Kurz, F. Dorschel, D. Glachs, and M. Fernandez: The Linked Media Framework: Integrating and Interlinking Enterprise Media Content and Data. In I-Semantics 2012.
[TUY11]: Y. Tian, J. Umbrich, and Y. Yu: Enhancing Source Selection for Live Queries over Linked Data via Query Log Mining. In JIST 2011.
[UHK+11]: J. Umbrich, K. Hose, M. Karnstedt, A. Harth, and A. Polleres: Comparing Data Summaries for Processing Live Queries over Linked Data. In World Wide Web 14(5–6), 2011.
[UHP+12]: J. Umbrich, A. Hogan, A. Polleres, and S. Decker: Improving the Recall of Live Linked Data Querying through Reasoning. In RR 2012.
[UKH+12]: J. Umbrich, M. Karnstedt, A. Hogan, and J. X. Parreira: Hybrid SPARQL Queries: Fresh vs. Fast Results. In ISWC 2012.
[WTL+12]: A. Wagner, T. Tran, G. Ladwig, A. Harth, and R. Studer: Top-k Linked Data Query Processing. In ESWC 2012.