ACM SIGMOD 2016 Programming Contest

Task Details

In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized. This is a fundamental and well-studied combinatorial optimization problem with many practical uses: from GPS navigation to routing schemes in computer networks; search engines apply solutions to this problem on website interconnectivity graphs and social networks apply them on graphs of peoples' relationships.

In this contest, the task is to answer shortest path queries on a changing graph, as quickly as possible. We will provide an initial graph which you may process and index in any way you deem necessary. Once this is done, we will begin issuing a workload consisting of a series of sequential operation batches. Each operation is either a graph modification (insertion or removal) or a query about the shortest path between two nodes in the graph. Your program is expected to correctly answer all queries as if all operations had been executed in the order they were given.

The graphs are directed and unweighted. Input to your program will be provided via standard input, and the output must appear on the standard output.

Testing Protocol

Our test harness will first feed the initial graph to your program's standard input. The graph is represented as a list of edges, each of which consists of a pair of node ids (the edge starting node, followed by the edge destination node) represented as non-negative integer numbers. Your program will receive multiple lines (each representing one edge) containing exactly 2 integer numbers in decimal ASCII representation separated by a single space. The initial graph ends with a line containing the character 'S'.

1 2 3 4
1 2
2 3
3 1
4 1
2 4
S

In the example above, the input to the left describes the graph presented at the right. Node IDs may appear in the input in any order, and there may be "gaps". That is, nodes in an N-node graph may have node IDs larger than N-1. The largest possible Node ID is 232-1. There may be at most one edge between any two nodes. If the same edge appears in the input more than once, only a single edge should be included in the graph.

After sending the initial graph input, our test harness will monitor your program's standard output for a line containing the character 'R' (case insensitive, followed by the new line character '\n'). Your program uses this line to signal that it is done ingesting the original graph, has performed any processing and/or indexing on it and is now ready to receive the workload.

The test harness delivers the workload in batches. Each batch consists of a sequence of operations provided one per line followed by a line containing the single character 'F' that signals the end of the batch.

Each operation is represented by one character ('Q', 'A' or 'D') that defines the operation type, followed by a space and two positive integer numbers in decimal ASCII representation, also separated by a space. The two integer numbers represent node IDs.

The three operation types are as follows:

  • 'Q'/query: this operation needs to be answered with the distance of the shortest (directed) path from the first node to the second node in the current graph. The answer should appear as output in the form of a single line containing the decimal ASCII representation of the integer distance between the two nodes, i.e., the number of edges on a shortest directed path between them. If there is no path between the nodes or if either of the nodes does not exist in the graph, the answer should be -1. The distance between any node and itself is always 0.
  • 'A'/add: This operation requires you to modify your current graph by adding another edge from the first node in the operation to the second. As was the case during the input of the original graph input, if the edge already exists, the graph remains unchanged. If one (or both) of the specified endpoints of the new edge does not exist in the graph, it should be added. This operation should not produce any output.
  • 'D'/delete: This operation requires you to modify your current graph by removing the edge from the first node in the operation to the second. If the specified edge does not exist in the graph, the graph should remain unchanged. This operation should not produce any output.

After the end of every batch, our test harness will wait for output from your program before providing the next batch. You need to provide as many lines of output as there are query ('Q') operations in the batch - each line containing the distance of the shortest path as described above. Your program is free to process operations in a batch concurrently. However, the query results in the output must reflect the order of the queries within the batch.

After the last batch, the standard input stream of your program will be closed by our test harness. Here are some example batches corresponding to the initial example graph above:

Batch 1:

1 2 3 4 5
Q 1 3
A 4 5
Q 1 5
Q 5 1
F

Output:

2
3
-1

Batch 2:

1 2 3 4 5
A 5 3
Q 1 3
D 2 3
Q 1 3
F

Output:

2
4

As you can see, the first batch has 3 queries and thus 3 output lines. Also, after the first query, another edge (4 → 5) is added making the graph look like the one in the Batch 1 figure.

The second batch is then run after the first finishes and another edge (5 → 3) is added. At this point, the shortest path between 1 and 3 is 2 hops (1 → 2 → 3). After that we have an edge (2 → 3) deletion and the same query is rerun. This time, the shortest path between 1 and 3 is 4 hops (1 → 2 → 4 → 5 → 3).

Your solution will be evaluated for correctness and execution time. Execution time measurement does not start until your program signals (with 'R') that it is finished ingesting the intial graph. Thus, you are free to pre-process or index the graph as you see fit without penalty, as long as your program runs within the overall testing time limit. Concurrent request execution within each batch is allowed and encouraged, as long as the results mimic a sequential execution of the operations within the batch. In particular, the result for each query must reflect all additions and deletions that precede it in the workload sequence, and must not reflect any additions and deletions that follow it.

Try-It-Yourself Example

We are providing you with this widget in order to test your understanding of the task and the testing protocol. This is Javascript based and will run on your own processor. Note that it is not intended for very large input files, which might crash your browser.

To use this widget, simply change the input on the left and hit the Solve button. Your expected output will be displayed in the right side. The input is populated with our example above.

Reference Solution

We have created a simple reference solution, which you are welcome to download and modify. It is implemented in Python, using the networkx module. The reference solution is in the format required for submission. You can create a submittable submission.tar.gz file using the included package.sh script.

Evaluation Process

Our testing infrastructure will evaluate each submission by unpacking the submission file, compiling the submitted code (using your submitted compile.sh script), and then running a series of tests (using your submitted run.sh script). Extraction time is limited to 10 seconds, and compilation time is limited to 60 seconds. Submissions that exceed these limits will not be tested at all.

Each test uses the test harness to supply an initial graph and a workload to your submitted program. The total time for each test is limited to 4-5 minutes (different tests may have slighly different limits). The total per-test time includes the time to ingest the graph and the time to process the queries and updates in the workload. These per-test execution time limit may be reduced later in the contest. As of 28 March, the timeouts are 2 minutes for all tests except XX-Large, which has a 5 minute timeout.

Submissions will be unpacked, compiled and tested in a network-isolated container with characteristics shown in the following table:

Processor 2 x Intel E5-2620v2 @ 2.10Ghz (2.60Ghz turbo)
Configuration 12 Cores / 24 Hyperthreads
Main Memory 20 GB
Operating System Ubuntu 14.04 Linux
Software Autoconf 2.69, Automake 1.15, Boost 1.58, CMake 3.2.2, Go 1.5.2, Maven 3.3.3, Node.js 5.4.0, Oracle Java 8, Python 2.7.9 (as python), Python 3.4.3 (as python3), Ruby 2.2.3 (as ruby2.2), Rust 1.5.0, TBB 4.3, ant 1.9.6, clang 3.6, gcc/g++/gccgo 5.2.1, jemalloc 3.6.0

A submission is considered to be rankable if it correctly processes all of the test workloads within their time limits. Rankable submissions will be scored according to the sum of their execution times on the test workloads. As discussed in the testing protocol description, graph ingest and indexing time is not included in a submission's score. The leaderboard will show the best rankable submission for each team that has at least one such submission.

There are currently three test workloads. Additional workloads may be added later in the contest.

Once submissions have closed, we will review them to choose five contest finalists. We will consider at least the top five submissions from the leaderboard as of the time that submissions close, and we may consider more than that. We will review each of those submissions to ensure that they adhere to the contest guidelines. Please make sure that your README file accurately describes both your submission and your team. We will also subject submissions to additional, unpublished tests, which will contribute to the final ranking. Our intention is to notify the five finalists by April 30, 2016 at the latest.

Submitting

The SIGMOD 2016 programming contest management system accepts submissions consisting of a single compressed tar file called submission.tar.gz. Submission files must be no larger than 5 MB - larger files will be rejected. Each submission must include the following files/directories:
  • run.sh
    This script is responsible for running your executable, which should interact with our test harness according to the testing protocol, reading from standard input and writing results to standard output.
  • README
    This file must contain information about the submission, including:
    1. the team name
    2. for each team member: full name, e-mail, institution, department, and degree program
    3. advisor/supervisor name (if any)
    4. a brief description of techniques used for executing queries
    5. a list of the third party code used
  • src/
    This directory must contain all of the source code.
  • compile.sh
    This shell script must be able to compile the source contained in the src directory. The produced executable file(s) must be located within the src directory. The benchmark environment is isolated and without internet access. You will have to provide all files required for successfull compilation.

You can use the reference solution, which has the required format, as a starting point.

You are strongly encouraged to ensure that your solution is correct before submitting it. To this end, we have packaged our test harness and a test case for download. You can use the test harness to test your solution locally before submitting it.

Submissions will be evaluated and results will appear on the leaderboard. Teams are allowed to make any number of submissions. However, all submissions must be made by 23:59 UTC, April 7, 2016.

Rules

  • The 2016 SIGMOD Programming Contest is open to undergraduate and graduate students from degree-granting institutions all over the world. However, students associated with the organizers' institution (University of Waterloo) are not eligible to participate.
  • Teams must consist of individuals currently registered as graduate or undergraduate students at an accredited academic institution. A team may be formed by one or more students, who need not be enrolled at the same institution. Several teams from the same institution can compete independently, but one person can be a member of only one team. There is no limit on team size. Teams can register on the contest site after February 21, 2016.
  • All submissions must consist only of code written by the team or open source licensed software (i.e., using an OSI-approved license). For source code from books or public articles, clear reference and attribution must be made. Final submissions must be made by 23:59 UTC, April 7, 2016.
  • All teams must agree to license their code under an OSI-approved open source license. By participating in this contest, each team agrees to publish its source code. The finalists' implementations will be made public on the contest website.
  • A team is eligible for the prize if at least one of its members will come to present the work at the SIGMOD 2016 conference. The travel grants will only be granted to eligible teams.