The DB group meets Wednesday afternoons at 2:30pm. The list below gives the times and locations of upcoming meetings. Each meeting lasts for an hour and features either a local speaker or, on Seminar days, an invited outside speaker. Everyone is welcome to attend.
DB Meeting: | Wednesday September 21, 1:00pm, DC 1331 (Note: special meeting time) |
Speakers: | Ashraf Aboulnaga, Gunes Aluc, Khuzaima Daudjee, Dave DeHaan, Ken Salem, Ning Zhang |
Title: | VLDB 2011 5 Minute Madness |
MMath Thesis Seminar | Thursday September 22, 2:00pm, DC 1331 | |
Speaker: | Alexey Karyakin | |
Title: | Dynamic Scale-out Mechanisms for Partitioned Shared-Nothing Databases |
DB Seminar: | Wednesday September 28, 2:30pm, DC 1302 | |
Speaker: | Jonathan Goldstein, Microsoft | |
Title: | Temporal Analytics on Big Data for Web Advertising |
DB Meeting: | Wednesday October 5, 2:30pm, DC 1331 |
Speaker: | Lukasz Golab |
Title: | Discovering Pattern Tableaux for Data Quality Analysis |
Abstract: | I'll present a case study that illustrates the utility of pattern tableau discovery for data quality analysis. Given a user-supplied integrity constraint, such as a Boolean predicate expected to be satisfied by every tuple, a functional dependency, or an inclusion dependency, a pattern tableau is a concise summary of subsets of the data that satisfy or fail the constraint. I'll describe Data Auditor, a system for automatic tableau discovery from data, and give examples of characterizing data quality in a network monitoring database used by a large Internet Service Provider. This is joint work with Flip Korn and Divesh Srivastava. |
DB Seminar: | Wednesday October 12, 2:00pm, DC 1302 (Please note the early start time.) | |
Speaker: | Alon Halevy, Google Research | |
Title: | Bringing (Web) Databases to the Masses |
DB Meeting: | Wednesday October 19, 2:30pm, DC 1331 |
Speaker: | Shahab Kamali |
Title: | Answering math queries with general-purpose search engines |
Abstract: | Traditionally, general-purpose search engines such as Bing and Google are used to look up keywords within web pages. Hence, a query is assumed to consist of a bag of keywords, and the search result is a ranked list of documents. However, for some queries a short answer can better satisfy users' needs. For a class of such queries, which we call math queries, the answer should be calculated rather than looked up in a database. Arithmetic computations, unit conversions, and symbolic computations are examples of math queries. Our goal is to evaluate a search engine's ability to recognize and answer math queries. Determining whether an arbitrary query is a math query is a hard problem. We propose a novel approach for recognizing and classifying math queries using large-scale search logs. Traditional approaches for evaluating the quality of results mostly rely on users' interactions with the engine, typically measured by click information. Answers to math queries do not contain links, so most of the previously proposed metrics are not applicable in this case. We propose various evaluation metrics that can be applied to math queries, and present the results on a large collection of math queries taken from Bing's search logs. |
DB Meeting: | Wednesday October 26, 2:30pm, DC 1331 CANCELLED |
Speaker: | Amr El-Helw |
Title: | Column-Oriented Query Processing for Row Stores |
Abstract: | Column-oriented DBMSs have gained increasing interest due to their superior performance for analytical workloads. Prior efforts tried to determine whether the query processing techniques of column-oriented systems can be simulated in row-oriented databases, in the hope of improving their performance, especially for OLAP and data warehousing applications. In this talk, I show that column-oriented query processing can significantly improve the performance of row-oriented DBMSs, using techniques that take into account the unique characteristics of data obtained from indexes and exploit new technologies such as flash SSDs and multi-core processors to boost performance. |
DB Meeting: | Wednesday November 2, 2:30pm, DC 1331 |
Speaker: | Ken Salem |
Title: | A Scalable, Highly Available Cloud Storage Tier for Relational DBMS |
Abstract: | I'll present some recent work on using an eventually consistent NoSQL system (Cassandra) to provide a scalable, available, wide area, multi-tenant storage service for relational database systems. This is joint work with Rui Liu and Ashraf Aboulnaga. (slides (PDF)) |
DB Meeting: | Wednesday November 16, 2:30pm, DC 1331 |
Speaker: | Pedram Ghodsnia |
Title: | Error Reduction and GPU-Accelerated Query Execution in Signature Files |
Abstract: |
The signature file index is a well-studied method in information retrieval for indexing large text databases. Because of its small index size, it is a good candidate for environments where memory is scarce. This small index size, however, comes at the cost of a high false positive error rate and long query execution time. These two critical problems make the method impractical for many applications.
In the first part of this talk, we will address the high false positive error rate of signature files by introducing COCA Filters, a new variation of Bloom Filters that exploits the co-occurrence probability of words in documents to reduce the false positive error. We will show that this technique can reduce the false positive error by a factor of up to 21 for the same index size. In the second part of the talk, we will address the long query execution time of signature files by proposing a scalable approach that uses the massive computational power of Graphics Processing Units (GPUs) to accelerate query execution. The speed-up and scalability of our proposed method will be shown experimentally. The scalability of this approach allows us to reach the desired speed-up by increasing the number of GPUs. The first part of the talk is based on a paper entitled "COCA Filters: Co-Occurrence Aware Bloom Filters", joint work with Kamran Tirdad, Ian Munro and Alejandro Lopez-Ortiz. This paper won the best student paper award at SPIRE 2011. |
DB Meeting: | Wednesday November 23, 2:30pm, DC 1331 |
Speaker: | Ming-Yee Iu |
Title: | MapReduce Query Optimization and LINQ for Java |
Abstract: |
Recently, there has been a blurring of the boundary between programming languages and query languages. Programmers increasingly want to include arbitrary code in their database queries and to embed database queries into their programming languages. To support these sorts of use patterns, we need appropriate code analysis tools. I will show how two database code analysis problems can be solved using symbolic execution.
I will first look at Hadoop MapReduce queries. These queries are written using arbitrary Java code, which makes them difficult to analyze and optimize. By analyzing certain queries with symbolic execution, we can extract input restrictions from them, significantly improving their performance. Secondly, I will look at database queries in Java 8. When Java 8 is released next year, it will finally include limited support for functional programming. I will show how this functional support can be combined with symbolic execution to allow programmers to finally write database queries in a functional style, as in Microsoft's LINQ. |
DB Seminar: | Wednesday November 30, 2:30pm, DC 1302 | |
Speaker: | Molham Aref, LogicBlox | |
Title: | Datalog for Enterprise Software: from Industrial Applications to Research |
DB Meeting: | Wednesday December 7, 2:30pm, DC 1331 |
Speaker: | Dan Farrar |
Title: | Pricing models in cloud database platforms |
Abstract: |
One of the main advantages of cloud computing is that one only needs to pay for whatever computing resources are actually used in a given time period. However, this means that when distributing workloads across cloud resources, administrators must take into account not only performance and durability but also the cost implications of workload placement.
I will be discussing a paper that describes different cloud pricing models, analyzes their sensitivity to workload type and distribution, and proposes a pricing model that reduces these sensitivities: "Resource and Virtualization Costs up in the Cloud: Models and Design Choices." Daniel Gmach, Jerry Rolia, Ludmila Cherkasova (HP Labs). In Proc. 2011 IEEE/IFIP 41st Conf. on Dependable Systems & Networks, Hong Kong, pp. 395-402. |
Data Systems Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208
Feedback: db-webmaster@cs.uwaterloo.ca