Fall 2011
Events of interest to the
Database Research Group are posted here, and are also
mailed to the uw.cs.database newsgroup and the
db-faculty,
db-grads,
db-friends
mailing lists.
Subscribe to one of these mailing lists to receive e-mail notification
of upcoming events.
The DB group meets Wednesday afternoons at 2:30pm.
The list below gives the
times and locations of upcoming meetings.
Each meeting lasts for an hour and features either
a local speaker or, on
Seminar days,
an invited outside speaker.
Everyone is welcome to attend.
Recent and Upcoming Events
DB Meeting:
|
Wednesday September 21, 1:00pm,
DC1331 (Note: special meeting time)
|
Speakers:
|
Ashraf Aboulnaga, Gunes Aluc, Khuzaima Daudjee, Dave DeHaan, Ken
Salem, Ning Zhang
|
Title:
|
VLDB 2011 5 Minute Madness
|
Abstract:
|
- Ashraf will talk about trendy topics in social networks, based on the paper
"Structural Trend Analysis for Online Social Networks" by
Ceren Budak, Divyakant Agrawal, Amr El Abbadi.
- Gunes will talk about database cracking based on the paper "Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Stores" by Stratos Idreos (CWI), Stefan Manegold (CWI), Harumi Kuno (HP Labs), Goetz Graefe (HP Labs).
- Khuzaima will talk about something.
- Dave will talk about a scheme for optimistic concurrency control
allowing highly parallel updates to tree-structured indexes, based on the paper
"Optimistic Concurrency Control by Melding Trees" by
Philip A. Bernstein (Microsoft Research), Colin W. Reid (Microsoft), Ming
Wu (Microsoft Research), Xinhao Yuan (Tsinghua University).
- Patrick will talk about something.
- Ken will talk about something so secret that nobody else at VLDB
heard about it.
- Ning will talk about graph databases.
|
DB Meeting:
|
Wednesday October 5, 2:30pm, DC 1331
|
Speaker:
|
Lukasz Golab
|
Title:
|
Discovering Pattern Tableaux for Data Quality Analysis
|
Abstract:
|
I'll present a case study that illustrates the utility of pattern tableau discovery for data quality analysis. Given a user-supplied integrity constraint, such as a Boolean predicate expected to be satisfied by every tuple, a functional dependency, or an inclusion dependency, a pattern tableau is a concise summary of subsets of the data that satisfy or fail the constraint. I'll describe Data Auditor---a system for automatic tableau discovery from data---and give examples of characterizing data quality in a network monitoring database used by a large Internet Service Provider. This is joint work with Flip Korn and Divesh Srivastava.
|
DB Meeting:
|
Wednesday October 19, 2:30pm, DC 1331
|
Speaker:
|
Shahab Kamali
|
Title:
|
Answering math queries with general-purpose search engines
|
Abstract:
|
Traditionally, general-purpose search engines such as Bing and Google are used to look up keywords within web pages. Hence, a query is assumed to consist of a bag of keywords, and the search result is a ranked list of documents. However, for some queries a short answer can better satisfy users' needs. For a class of such queries, which we call math queries, the answer should be calculated rather than looked up in a database. Arithmetic computations, unit conversions, and symbolic computations are examples of math queries. Our goal is to evaluate a search engine's ability to recognize and answer math queries. Determining whether an arbitrary query is a math query is a hard problem. We propose a novel approach for recognizing and classifying math queries using large-scale search logs. Traditional approaches for evaluating the quality of results mostly rely on users' interactions with the engine, typically measured by click information. Answers to math queries do not contain links, so most of the previously proposed metrics are not applicable in this case. We propose various evaluation metrics that can be applied to math queries, and present the results on a large collection of math queries taken from Bing's search logs.
|
DB Meeting:
|
Wednesday October 26,
2:30pm, DC 1331 CANCELLED
|
Speaker:
|
Amr El-Helw
|
Title:
|
Column-Oriented Query Processing for Row Stores
|
Abstract:
|
Column-oriented DBMSs have attracted increasing interest due to their superior performance for analytical workloads. Prior efforts tried to determine whether the query processing techniques of column-oriented systems can be simulated in row-oriented databases, in the hope of improving their performance, especially for OLAP and data warehousing applications. In this talk, I show that column-oriented query processing can significantly improve the performance of row-oriented DBMSs, using techniques that take into account the unique characteristics of data obtained from indexes and exploit new technologies such as flash SSDs and multi-core processors.
|
DB Meeting:
|
Wednesday November 2, 2:30pm, DC 1331
|
Speaker:
|
Ken Salem
|
Title:
|
A Scalable, Highly Available Cloud Storage Tier for Relational DBMS
|
Abstract:
|
I'll present some recent work on using an eventually consistent
NoSQL system (Cassandra) to provide a scalable, available,
wide area, multi-tenant storage service for
relational database systems. This is joint work with Rui Liu
and Ashraf Aboulnaga. (slides (PDF))
|
DB Meeting:
|
Wednesday November 16, 2:30pm, DC 1331
|
Speaker:
|
Pedram Ghodsnia
|
Title:
|
Error Reduction and GPU-Accelerated Query Execution in Signature Files
|
Abstract:
|
The signature file index is a well-studied method in information retrieval
for indexing large text databases. Because of its small index size,
it is a good candidate for environments where memory is
scarce. This small index size, however, comes at the cost of a high
false positive error rate and long query execution time. These two
critical problems make the method impractical for many applications.
In the first part of this talk, we will address the high
false positive error rate of signature files by introducing COCA
Filters, a new variation of Bloom filters that exploits the
co-occurrence probability of words in documents to reduce the false
positive error. We will show that with this technique we can
reduce the false positive error by a factor of up to 21 for the same index
size.
In the second part of the talk, we will address the long query
execution time of signature files by proposing a scalable approach
that utilizes the massive computational power of Graphics Processing
Units (GPUs) to accelerate the query execution. The impressive
speed-up and the scalability of our proposed method will be shown
experimentally. The scalability of this approach allows us to reach
the desirable speed-up by increasing the number of GPUs.
The first part of the talk is based on the paper "COCA Filters: Co-Occurrence Aware Bloom Filters", joint work with Kamran Tirdad, Ian Munro and Alejandro Lopez-Ortiz. This paper won the best student paper award at SPIRE 2011.
|
DB Meeting:
|
Wednesday November 23, 2:30pm, DC 1331
|
Speaker:
|
Ming-Yee Iu
|
Title:
|
MapReduce Query Optimization and LINQ for Java
|
Abstract:
|
Recently, there has been a blurring of the boundary between programming languages and query languages. Programmers increasingly want to include arbitrary code in their database queries and to embed database queries into their programming languages. To support these sorts of use patterns, we need appropriate code analysis tools. I will show how two database code analysis problems can be solved using symbolic execution.
I will first look at Hadoop MapReduce queries. These queries are written using arbitrary Java code, which makes them difficult to analyse and optimise. By analysing certain queries with symbolic execution, we can extract input restrictions from them, significantly improving their performance.
Secondly, I will look at database queries in Java 8. When Java 8 is released next year, it will finally include limited support for functional programming. I will show how this functional support can be combined with symbolic execution to allow programmers to write database queries in a functional style, as in Microsoft's LINQ.
|
DB Meeting:
|
Wednesday December 7, 2:30pm, DC 1331
|
Speaker:
|
Dan Farrar
|
Title:
|
Pricing models in cloud database platforms
|
Abstract:
|
One of the main advantages of cloud computing is that one only needs to pay
for whatever computing resources are actually used in a given time period.
However, this means that when distributing workloads across cloud
resources, administrators must take account not only of performance and
durability, but also the cost implications of workload placement.
I will be discussing a paper that describes different cloud pricing models,
analyzes their sensitivity to workload type and distribution, and
proposes a pricing model that reduces these sensitivities: "Resource and
Virtualization Costs up in the Cloud: Models and Design Choices." Daniel
Gmach, Jerry Rolia, Ludmila Cherkasova (HP Labs). In Proc. 2011 IEEE/IFIP
41st Conf. on Dependable Systems & Networks, Hong Kong, pp. 395-402.
|
This page is maintained
by
Ken Salem.