Winter 2012
Events of interest to the
Database Research Group are posted here, and are also
mailed to the uw.cs.database newsgroup and the
db-faculty,
db-grads,
db-friends
mailing lists.
Subscribe to one of these mailing lists to receive e-mail notification
of upcoming events.
The DB group meets Wednesday afternoons at 2:30pm.
The list below gives the
times and locations of upcoming meetings.
Each meeting lasts for an hour and features either
a local speaker or, on
Seminar days,
an invited outside speaker.
Everyone is welcome to attend.
Winter 2012 Events
DB Meeting:
|
Wednesday January 11, 2:30pm, DC 1331
|
Speaker:
|
Umar Farooq Minhas
|
Title:
|
Scalable and Highly Available Database Systems in the Cloud
|
Abstract:
|
Cloud computing allows users to tap into a massive pool of shared computing resources such as servers, storage, and network. These resources are provided as a service to the users allowing them to "plug into the cloud" similar to a utility grid. The promise of the cloud is to free users from the tedious and often complex task of managing and provisioning computing resources to run applications. At the same time, the cloud brings several additional benefits including: a pay-as-you-go cost model, easier deployment of applications, elastic scalability, high availability, and a more robust and secure infrastructure. One important class of applications that users are increasingly deploying in the cloud is database management systems. Database management systems differ from other types of applications in that they manage large amounts of state that is frequently updated, and that must be kept consistent at all scales and in the face of failure. This makes it difficult to provide scalability and high availability for database systems in the cloud. In this talk, I will show how we can exploit cloud technologies and relational database systems to provide a highly available and scalable database service in the cloud. In the first part of my talk, I will present RemusDB, a reliable, cost-effective high availability solution that is implemented as a service provided by the virtualization platform. RemusDB can make any database system highly available with little or no code modifications by exploiting the capabilities of virtualization. In the second part of the talk, I will present two systems that aim to provide elastic scalability for database systems in the cloud using two very different approaches. The three systems I will present bring us closer to the goal of building a scalable and reliable transactional database service in the cloud.
|
DB Meeting:
|
Wednesday January 18, 2:30pm, DC 1331
|
Speaker:
|
Jiewen Wu
|
Title:
|
Answering Object Queries in DL Knowledge Bases
|
Abstract:
|
We consider a generalization of instance retrieval over
description-logics knowledge bases that provides users with assertions
in which descriptions of qualifying objects
are given in addition to their identifiers. Notably, this involves a
transfer of basic database paradigms involving caching and query
rewriting in the context of an assertion retrieval algebra. We present
a query optimization framework for this algebra, with a
focus on finding plans that avoid any need for general knowledge base
reasoning at query execution time when sufficient cached results of
earlier requests exist.
|
DB Meeting:
|
Wednesday February 1, 2:30pm, DC 1331
|
Speaker:
|
Iman Elghandour
|
Title:
|
ReStore: Reusing Results of MapReduce Jobs
|
Abstract:
|
Analyzing large-scale data has emerged as an important activity for many organizations in the past few years. This large-scale data analysis is facilitated by the MapReduce programming and execution model and its implementations, most notably Hadoop. Users of MapReduce often have analysis tasks that are too complex to express as individual MapReduce jobs. Instead, they use high-level query languages such as Pig, Hive, or JAQL to express their complex tasks. The compilers of these languages translate queries into workflows of MapReduce jobs. Each job in these workflows reads its input from the distributed file system used by the MapReduce system and produces output that is stored in this distributed file system and read as input by the next job in the workflow. In my talk, I will present ReStore, a system that manages the storage and reuse of such intermediate results.
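The idea of reusing stored job outputs can be illustrated with a minimal sketch. It is not ReStore's actual design: the class `IntermediateResultStore`, the `(job_signature, input_id)` key, and the in-memory dictionary standing in for the distributed file system are all assumptions made for illustration.

```python
class IntermediateResultStore:
    """Hypothetical in-memory stand-in for job outputs kept in a distributed file system."""

    def __init__(self):
        self._outputs = {}      # (job_signature, input_id) -> stored output
        self.jobs_executed = 0  # counts jobs that actually ran

    def run_job(self, job_signature, input_id, job_fn, input_data):
        """Run job_fn on input_data unless an output for the same job and input is stored."""
        key = (job_signature, input_id)
        if key in self._outputs:
            return self._outputs[key]   # reuse the stored result
        output = job_fn(input_data)
        self.jobs_executed += 1
        self._outputs[key] = output     # keep the result for later workflows
        return output


def word_count(lines):
    """Toy job: count word occurrences across all lines."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


store = IntermediateResultStore()
logs = ["a b a", "b c"]
first = store.run_job("word_count", "logs/day1", word_count, logs)
second = store.run_job("word_count", "logs/day1", word_count, logs)  # served from the store
```

In this sketch the second request does not execute the job again; a real system would also have to decide which outputs are worth keeping and when they become stale.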
|
DB Meeting:
|
Wednesday February 8, 2:30pm, DC 1331
|
Speaker:
|
Greg Drzadzewski
|
Title:
|
Online Analytical Processing of Documents
|
Abstract:
|
With the availability of many large and ever-growing document collections, it is becoming more cumbersome for users to explore them. While a search engine is useful for satisfying a user's ad hoc information needs, allowing the user to retrieve relevant documents through a keyword query, it is inadequate for the analysis of bulky text information, which is necessary in many online applications. This type of exploration need can be addressed by providing support for online applications such as summarizing the contents of a text cell and comparing the contents across multiple text cells. In my talk, I will examine online analytical processing of documents and discuss the following two papers that deal with this area of research:
- Jin, X., Han, J., Cao, L., Luo, J., Ding, B. and Lin, C. X. Visual cube and on-line analytical processing of images. CIKM 2010, 849-858.
- Zhang, D., Zhai, C. and Han, J. MiTexCube: MicroTextCluster Cube
for Online Analysis of Text Cells. CIDU 2011, 204-218.
|
DB Meeting:
|
Wednesday February 15,
2:30pm, DC 1331
POSTPONED
|
Speaker:
|
Alex Hudek
|
Title:
|
On Enumerating Query Plans Using Interpolants
|
Abstract:
|
For relational (SQL) queries, a standard service provided by current
relational systems is to search the space of alternative query plans
(ways of executing the query) to find one likely to have the best
performance. A given query often has many semantically equivalent
plans that vary in performance by many orders of magnitude, making
the problem of finding a best plan difficult. Recent trends in
view-based query rewriting, information integration, and ontology-based
data access have made the relationship between the query and its
plan space much more complex. Enumerating the possible plans has
become even more challenging as the relationship between the user
(logical) view of the data and the material capabilities for
accessing relevant stored information has become less
transparent. In this paper, we show how to use interpolation
techniques to enumerate possible plans for a given user query. We
also show how to obtain common varieties of plan patterns in this
setting, such as those that derive from an enumeration of possible
join orders for conjunctive (sub) queries.
|
Seminar:
|
Thursday February 16, 4:00pm, DC 1331
|
Speaker:
|
Arnon Sturm, Ben-Gurion University of the Negev
|
Title:
|
A Methodology for Developing Secure Database Code
|
Abstract:
|
Security in general, and database protection from unauthorized
access in particular, are crucial to organizations. Several methods and
techniques have been devised to address this concern. However, none of these
provides a comprehensive solution. In this talk we explore work done
within the context of a research project that aims to develop a
methodology that guides developers, in particular database designers,
and ensures that they deal with database security requirements related
to authorization in the early stages of development. The proposed
methodology makes it possible to define and enforce organizational security
policies, and to validate that security requirements defined by the
designers of an application are in accord with the organizational
policies. It also supports the transformation of the design results into an actual implementation, i.e.,
into the specification of the database code, including the
authorization specification. We also present an empirical evaluation of
part of the proposed approach.
|
DB Meeting:
|
Wednesday February 29, 2:30pm, DC 1331
|
Speaker:
|
Alex Hudek
|
Title:
|
On Enumerating Query Plans Using Interpolants
|
Abstract:
|
For relational (SQL) queries, a standard service provided by current
relational systems is to search the space of alternative query plans
(ways of executing the query) to find one likely to have the best
performance. A given query often has many semantically equivalent
plans that vary in performance by many orders of magnitude, making
the problem of finding a best plan difficult. Recent trends in
view-based query rewriting, information integration, and ontology-based
data access have made the relationship between the query and its
plan space much more complex. Enumerating the possible plans has
become even more challenging as the relationship between the user
(logical) view of the data and the material capabilities for
accessing relevant stored information has become less
transparent. In this paper, we show how to use interpolation
techniques to enumerate possible plans for a given user query. We
also show how to obtain common varieties of plan patterns in this
setting, such as those that derive from an enumeration of possible
join orders for conjunctive (sub) queries.
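As a point of reference for the last sentence, enumerating join orders for a conjunctive query can be sketched very simply. This is plain permutation-based enumeration of left-deep plans, not the interpolation technique of the talk; the function name `left_deep_join_orders` and the nested-tuple plan representation are assumptions made for illustration.

```python
from itertools import permutations


def left_deep_join_orders(relations):
    """Return every left-deep join plan over the given relation names.

    A plan is a nested tuple: (("R", "S"), "T") means (R join S) join T.
    """
    if not relations:
        return []
    plans = []
    for order in permutations(relations):
        plan = order[0]
        for rel in order[1:]:
            plan = (plan, rel)  # join the plan built so far with the next relation
        plans.append(plan)
    return plans


plans = left_deep_join_orders(["R", "S", "T"])
```

For n relations this yields n! left-deep plans, which is one reason the size of the plan space matters even before views and ontologies are taken into account.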
|
DB Meeting:
|
Wednesday March 7, 2:30pm, DC 1331
|
Speaker:
|
Ahmed Ataullah
|
Title:
|
Towards Policy-Centric Object-Relational-Modeling (ORM)
|
Abstract:
|
Object relational modeling is essentially the challenge of mapping
objects, as programmers see them, to individual pieces of data stored
in relations in a database system. The research in this area is
primarily motivated by the fact that object-oriented database systems
have not seen mainstream adoption and that retrieving/persisting
objects from/to a relational database requires querying and therefore
some knowledge about the logical/physical schema on the part of the OO
programmer. This impedance mismatch makes the process of (rapid)
application development more costly and inefficient. Although ORM
offers the promise of further isolating programmers from the storage
layer, it leads to interesting design questions about concurrency
control, transaction management, side-effects (triggers) and methods
for effective programming in the presence of an intermediate
object-to-SQL layer.
In the first half of this talk I will introduce and motivate ORM, discuss
its pros and cons, and briefly go over the features of popular ORM tools
(Hibernate and the ADO.NET Entity Framework). In the second half of my
talk I will pose an open question of whether ORM can be used by
business policy makers to embed and model rules, such as obligations
and inter-object temporal restrictions, that mimic business policy
requirements. My goal will be to show that ORM should not only be a
tool for object-oriented programming but can (in theory) also be used
for expressing complex policies over objects and these policies can be
eventually translated into a set of active integrity constraints in a
database system. The talk will appeal to a broad audience and everyone
is encouraged to attend as the variety and quality of the refreshments
and snacks will be SIGNIFICANTLY better than typical DB-lab talks.
|
DB Meeting:
|
Wednesday March 14, 2:30pm, DC 1331
|
Speaker:
|
Gunes Aluc
|
Title:
|
Parametric Plan Caching Using Density-Based Clustering
|
Abstract:
|
Query plan caching eliminates the need for repeated query
optimization; hence, it has strong practical implications for relational
database management systems (RDBMSs). Unfortunately, existing approaches
consider only the query plan generated at the expected values of
parameters that characterize the query, data and the current state of
the system, while these parameters may take different values during the
lifetime of a cached plan. A better alternative is to harvest the
optimizer's plan choice for different parameter values, populate the
cache with promising query plans, and select a cached plan based upon
current parameter values. To address this challenge, we propose a
parametric plan caching (PPC) framework that uses an online plan space
clustering algorithm. The clustering algorithm is density-based, and it
exploits locality-sensitive hashing as a pre-processing step so that
clusters in the plan spaces can be efficiently stored in database
histograms and queried in constant time. We experimentally validate that
our approach is precise, efficient in space-and-time and adaptive,
requiring no eager exploration of the plan spaces of the optimizer.
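The step of selecting a cached plan based on current parameter values can be sketched as follows. This is a deliberately simplified stand-in: it uses a linear nearest-neighbor search with a distance threshold instead of the density-based clustering and locality-sensitive hashing described above, and the names `ParametricPlanCache`, `radius`, and the plan labels are assumptions made for illustration.

```python
import math


class ParametricPlanCache:
    """Hypothetical cache mapping points in parameter space to query plans."""

    def __init__(self, radius):
        self.radius = radius  # maximum distance at which a cached plan is reused
        self.entries = []     # list of (parameter_point, plan)

    def add(self, point, plan):
        """Remember the optimizer's plan choice at the given parameter values."""
        self.entries.append((tuple(point), plan))

    def lookup(self, point):
        """Return the plan cached nearest to point, or None if none is close enough."""
        best_plan, best_dist = None, math.inf
        for cached_point, plan in self.entries:
            dist = math.dist(cached_point, point)
            if dist < best_dist:
                best_plan, best_dist = plan, dist
        return best_plan if best_dist <= self.radius else None


cache = ParametricPlanCache(radius=0.1)
cache.add((0.01,), "index_scan")  # plan chosen at low predicate selectivity
cache.add((0.90,), "table_scan")  # plan chosen at high predicate selectivity
```

A lookup that returns `None` corresponds to a cache miss, where the optimizer would have to be invoked and its plan choice added to the cache.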
|
DB Meeting:
|
Wednesday March 28, 2:30pm, DC 1331
|
Speaker:
|
Ani Nica
|
Title:
|
On Resource Consumption of the Query Optimization Process
|
Abstract:
|
Query optimization is a sophisticated process whose resource
consumption, and the quality of the best execution plan it finds, are
determined by the query complexity, the available resources of the
RDBMS server, and the current instance of the database.
In this talk, I will present experimental results on the optimization
time breakdown and the memory consumption for a set of join
enumeration algorithms, ranging from highly heuristic algorithms to dynamic
programming algorithms with exhaustive bushy tree enumeration.
Next, I will discuss how these types of statistics can be used:
- to analyze the effect of a change to the query optimizer (e.g., a
change that improves the CPU time for the plan generation phase but
increases overall memory consumption);
- to analyze the differences between two join enumeration algorithms for
particular queries;
- to estimate the resource consumption of current queries based on
previously optimized queries.
|
DB Meeting:
|
Wednesday April 18, 2:30pm, DC 1331
CANCELLED
|
Speaker:
|
Ahmed Soror
|
DB Meeting:
|
Wednesday April 25, 2:30pm, DC 1331
|
Speaker:
|
Ning Zhang
|
Title:
|
A New Query Model for Graph Databases
|
Abstract:
|
The database community has shown interest in developing algorithms for querying large graph databases. Common types of queries include reachability queries, shortest distance path queries, and subgraph containment/match queries. Each of these query types matches a particular application, but individually they are not sufficiently powerful or general to serve as a general-purpose graph query language (similar to SQL for relational systems).
In many applications, graph data can be modeled as a property graph, where each node and each edge may have a label and arbitrary key/value pairs as properties, and problems are related to identifying patterns in the graph, i.e., identifying sub-graphs that match a particular topology and/or some features. In this paper, we define a new query model over a property graph model so that most common types of queries can be uniformly expressed and processed efficiently under a single framework.
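The property graph model described above can be sketched as a small data structure. This is only an illustration of the data model with a one-edge pattern match, not the query model of the talk; the class and method names (`PropertyGraph`, `match_edges`) and the sample data are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    props: dict = field(default_factory=dict)  # arbitrary key/value pairs


@dataclass
class Edge:
    src: str
    dst: str
    label: str
    props: dict = field(default_factory=dict)  # arbitrary key/value pairs


class PropertyGraph:
    def __init__(self):
        self.nodes = {}  # node id -> Node
        self.edges = []  # list of Edge

    def add_node(self, id, label, **props):
        self.nodes[id] = Node(id, label, props)

    def add_edge(self, src, dst, label, **props):
        self.edges.append(Edge(src, dst, label, props))

    def match_edges(self, src_label=None, edge_label=None, dst_label=None, **edge_props):
        """Return (src id, edge label, dst id) triples matching a one-edge pattern.

        A label of None matches anything; edge_props must all be equal on the edge.
        """
        matches = []
        for e in self.edges:
            s, d = self.nodes[e.src], self.nodes[e.dst]
            if src_label is not None and s.label != src_label:
                continue
            if edge_label is not None and e.label != edge_label:
                continue
            if dst_label is not None and d.label != dst_label:
                continue
            if any(e.props.get(k) != v for k, v in edge_props.items()):
                continue
            matches.append((s.id, e.label, d.id))
        return matches


g = PropertyGraph()
g.add_node("alice", "Person", age=30)
g.add_node("bob", "Person", age=25)
g.add_node("p1", "Paper", year=2012)
g.add_edge("alice", "p1", "WROTE", role="first author")
g.add_edge("bob", "p1", "WROTE", role="coauthor")
g.add_edge("alice", "bob", "KNOWS")
```

For example, `g.match_edges("Person", "WROTE", "Paper", role="coauthor")` finds edges that match both a topology (person to paper) and a feature (the `role` property).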
|
This page is maintained
by
Ken Salem.