Waterloo SPARQL Diversity Test Suite (WatDiv) v0.6

[Please remove <h1>]

We developed WatDiv to measure how an RDF data management system performs across a wide spectrum of SPARQL queries with varying structural characteristics and selectivity classes.

WatDiv consists of two components: the data generator and the query (and template) generator.

Citations to WatDiv should be in the following format:

G. Aluç, O. Hartig, M. T. Özsu and K. Daudjee. Diversified Stress Testing of RDF Data Management Systems. In Proc. The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, 2014, pages 197-212. WatDiv available from http://dsg.uwaterloo.ca/watdiv/.

2. Description of the Dataset

The WatDiv data generator allows users to define their own dataset through a dataset description language (see tutorial). This way, users can control

which entities to include in their dataset,
how "well-structured" each entity is (for details please refer to a research paper by Duan et al. [1]),
how different entities are associated,
the probability that an entity of type X is associated with an entity of type Y, and
the cardinality of such associations.

Using these features, we designed the WatDiv test dataset (see the associated dataset description model). By executing the data generator with different scale factors, it is possible generate test datasets with different sizes. Table 1 lists the properties of the dataset at scale factor=1.

Table 1. Characteristics of the WatDiv test dataset at scale factor=1.
	scale factor=1
triples	105257
distinct subjects	5597
distinct predicates	85
distinct objects	13258
URIs	5947
literals	14286
distinct literals	8018

An important characteristic that distinguishes the WatDiv test dataset from existing benchmarks is that instances of the same entity do not necessarily have the same set of attributes. Table 2 lists all the different entities used in WatDiv. Take the Product entity for instance. Product instances may be associated with different Product Categories (e.g., Book, Movie, Classical Music Concert, etc.), but depending on which category a product belongs to, it will have a different set of attributes. For example, products that belong to the category "Classical Music Concert" have the attributes mo:opus, mo:movement, wsdbm:composer, mo:performer (in addition to the attributes that is common to every product), whereas products that belong to the category "Book" have the attributes sorg:isbn, sorg:bookEdition and sorg:numberOfPages. Furthermore, even within a single product category, not all instances share the same set of attributes. For example, while sorg:isbn is a required attribute for a book, sorg:bookEdition (Pr=0.6) and sorg:numberOfPages (Pr=0.25) are optional attributes, where Pr indicates the probability that an instance will be generated with that attribute. It must be also noted that some attributes are correlated, which means that either all or none of the correlated attributes will be present in an instance (the pgroup construct in the WatDiv dataset description language allows the grouping of such correlated attributes). For a complete list of probabilities, please refer to Tables 3 and 4 in Appendix.

Table 2. Entities generated according to WatDiv data description model. Entities marked with an asterisk * do not scale.
Entity Type	Instance Count [per scale factor if applicable]
wsdbm:Purchase	1500
wsdbm:User	1000
wsdbm:Offer	900
wsdbm:Topic*	250
wsdbm:Product	250
wsdbm:City*	240
wsdbm:SubGenre*	145
wsdbm:Website	50
wsdbm:Language*	25
wsdbm:Country*	25
wsdbm:Genre*	21
wsdbm:ProductCategory*	15
wsdbm:Retailer	12
wsdbm:AgeGroup*	9
wsdbm:Role*	3
wsdbm:Gender*	2

In short, we designed the WatDiv test dataset such that

some entities are more structured (meaning that they contain few optional attributes) while the others are less structured;
entities are associated in complex ways that mimic the real types of distributions on the Web;
cardinalities of these associations are varied.

3. Description of the Tests

Based on the above observations, in WatDiv we try to generate test workloads that are as diverse as possible. WatDiv offers three use cases:

Basic Testing: These tests consist of queries in four categories, namely, linear queries (L), star queries (S), snowflake-shaped queries (F) and complex queries (C) with a total of 20 query templates. These query templates were randomly selected from a pool of queries generated by performing a random walk on the dataset description model (which can be represented as a graph), while making sure that (i) the selected queries sufficiently represent each category, (ii) the selectivities of the queries within each category vary, and (iii) in some queries selectivity originates from a single (or few) triple patterns while in the others, it originates as a combination of multiple somewhat less selective triple patterns.
Extensions to Basic Testing: The following use cases have been developed by Alexander Schätzle from the University of Freiburg.
- Incremental Linear Testing: This use case is designed to test the performance for linear queries with increasing size (number of triple patterns). In contrast to the linear queries in the Basic Testing use case, the queries in this use case have longer patterns. The workload contains 3 types of queries (IL-1, IL-2, IL-3) which are bound by user, retailer or unbound, respectively. Each query starts with 5 triple patterns and we incrementally add triple patterns to the initial query (up to 10 triple patterns).
- Mixed Linear Testing: This use case is designed to test the performance for linear queries of different size (number of triple patterns). In contrast to the linear queries in the Basic Testing use case, the queries in this use case have longer patterns. The workload contains 2 types of queries (ML-1, ML-2) which are bound by user or retailer, respectively. The query sizes range between 5 and 10 triple patterns for each type. For example, query ML-1-6 is a user bound query with 6 triple patterns.
Stress Testing: As described in the stress testing paper, this use case offers a much more thorough investigation of systems. To generate query templates, follow the installation procedures.

At this point, you may be wondering how these differentiating aspects of WatDiv affect system evaluation, and why they are important at all. The answer is trivial: by relying on a more diverse dataset as such (which is typical for data on the Web), we were able to generate test queries that focus on much wider aspects of query evaluation, which cannot be easily captured by other benchmarks. Consider the two SPARQL query templates C3 and S7 (cf., basic testing query templates). C3 is a star query that retrieves certain information about users such as the products they like, their friends and some demographics information. For convenience, for each triple pattern in the query template, we also display its selectivity (the reported selectivities are estimations based on the probability distributions specified in the WatDiv dataset description model). Note that while individually triple patterns in C3 are not that selective, this query as a whole, is very selective. Now, consider S7, which (as a whole) is also very selective, but unlike C3, its selectivity is largely due to only a single triple pattern. It turns out that different systems behave very differently for these queries. Systems like RDF-3x [2], which (i) decompose queries into triple patterns, (ii) find a suitable ordering of the join operations and then (iii) execute the joins in that order, perform very well on queries like S7 because the first triple pattern they execute is very selective. On the other hand, they do not do as well on queries like C3 because the decomposed evaluation produces many irrelevant intermediate tuples. In contrast, gStore [3] treats the star-shaped query as a whole and it can pinpoint the relevant vertices in the RDF graph without performing joins; hence, it is much more efficient in executing C3. For a more detailed discussion of our results, please refer to our technical report [4] and our stress testing paper.

4. Download the Latest Version

Provided that you include a citation to [5], you are free to download and use the WatDiv Data and Query Generator (v0.6) from source code (md5sum=9eac247dfdec044d7fa0141ea3ad361f). The software is supplied "as is" and all use is at your own risk.

Executable files are also provided. Source code and executable files of all versions and changelog can be found here.

The datasets used in the experiments in [5], as well as a billion triples dataset are also available for download:

10M Triples 58,558,746 bytes
100M Triples 629,608,436 bytes
1B Triples 6,502,656,740 bytes

You may also download the test workloads used in [5].

5. Installing WatDiv Data, Query and Query Template Generator

Compiling WatDiv (in C++) is straightforward---the only dependencies are the Boost libraries and the Unix words file (i.e., make sure you have a wordlist package installed under /usr/share/dict/). Once you have installed Boost, simply execute the following commands on UNIX:

tar xvf watdiv_v05.tar
cd watdiv
setenv BOOST_HOME <BOOST-INSTALLATION-DIRECTORY>
- export BOOST_HOME=<BOOST-INSTALLATION-DIRECTORY> (in bash)
make
cd bin/Release

The last step above is important. To run the data generator, issue the following command:

./watdiv -d <model-file> <scale-factor>

You will find a model file in the model sub-directory where WatDiv was installed. Using a scale factor of 1 will generate approximately 100K triples. For a more detailed description of the dataset that will be generated, please refer to Table 1. This will print the generated RDF triples on the standard output while producing a file named saved.txt in the same directory. The following steps depend on this file, therefore, keep it safe.

To run the query generator, issue the following command:

./watdiv -q <model-file> <query-file> <query-count> <recurrence-factor>

Use the same model file in the model sub-directory where WatDiv was installed. You will find the basic testing query templates in the testsuite sub-directory where WatDiv was installed.

To generate more query templates for stress testing (cf., stress testing paper), use the query template generator.

./watdiv -s <model-file> <dataset-file> <max-query-size> <query-count>

In the latest version, you may specify (i) the number of bound patterns in the query (default=1) as well as (ii) whether join vertices can be constants or not (default=false). To use these features, execute watdiv with the following signature instead.

./watdiv -s <model-file> <dataset-file> <max-query-size> <query-count> <constant-per-query-count> <constant-join-vertex-allowed?>

References

[1] S. Duan, A. Kementsietsidis, K. Srinivas, and O. Udrea. Apples and oranges: a comparison of RDF benchmarks and real RDF datasets. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2011, pages 145-156.

[2] T. Neumann and G. Weikum. The RDF-3X engine for scalable management of RDF data. VLDB J., 19(1): 91-113, 2010.

[3] L. Zou, J. Mo, D. Zhao, L. Chen, and M. T. Özsu. gStore: Answering SPARQL queries via subgraph matching. Proc. VLDB Endow., 4(1): 482-493, 2011.

[4] G. Aluç, M. T. Özsu, K. Daudjee, and O. Hartig. chameleon-db: a workload-aware robust RDF data management system. Technical Report CS-2013-10, University of Waterloo, 2013.

[5] G. Aluç, O. Hartig, M. T. Özsu and K. Daudjee. Diversified Stress Testing of RDF Data Management Systems. In Proc. The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, 2014, pages 197-212.

Acknowledgements

We thank Alexander Schätzle (Albert-Ludwigs-Universität Freiburg) for his feedback on WatDiv. Alexander has been reporting numerous bugs, and suggesting improvements both to the project website and the software.

Questions?

If you have any questions or would like to report bugs, please contact galuc-red-@-blue-uwaterloo.ca (without the colors/hypens).