Spring 2011
Events of interest to the Database Research Group are posted here and are also
mailed to the uw.cs.database newsgroup and the db-faculty, db-grads, and
db-friends mailing lists.
Subscribe to one of these mailing lists to receive e-mail notification
of upcoming events.
The DB group meets Wednesday afternoons at 2:30pm.
The list below gives the times and locations of upcoming meetings.
Each meeting lasts for an hour and features either a local speaker or,
on Seminar days, an invited outside speaker.
Everyone is welcome to attend.
Spring 2011 Events
DB Meeting: Wednesday May 11, 2:30pm, DC 1331
Speaker: Ani Nica, Sybase
Title: A Call for Order in Search Space Generation Process of Query Optimization
Abstract:
Many search space generation algorithms used by query optimizers focus on
efficiently and exhaustively enumerating
the set of alternative plans. However, newly emerged systems, such as
Object-Relational Mapping (ORM) tools, introduce new
challenges to the query optimizer. This is because dynamically generated
queries are very complex, relatively inexpensive to
execute, and they must be optimized at runtime. For such queries the
optimization time could be prohibitively expensive relative to
their execution time if traditional join enumeration algorithms are used.
Recently, new heuristics for query optimization
have been proposed which provide feasible alternatives to the classical
enumeration algorithms by using greedy techniques to
efficiently build an execution plan such that the optimization time is kept
to a minimum.
In this talk, I will present the paper A Call for Order in Search Space
Generation Process of Query Optimization [1], which was published in the
proceedings of the
IEEE International Workshop on Self Managing Database Systems (IEEE SMDB),
April 2011, Hannover, Germany.
The paper introduces new techniques for dynamically ordering the search
space of the candidate joins
generated by conventional enumeration algorithms. Ordering the enumerated
join trees based on their properties allows effective partition-based
pruning
of the search space, so that the space size and the optimization time
become adapted to the current characteristics of the optimization process.
Any conventional join enumeration algorithm that enumerates partitions
and uses memoization to save access plans can be
extended with our proposed technique.
[1] Anisoara Nica, A Call for Order in Search Space Generation Process of
Query Optimization, IEEE International Workshop on Self Managing Database
Systems (IEEE SMDB), IEEE International Conference on Data Engineering
(IEEE ICDE) Workshops 2011, Hannover, Germany
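
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a conventional join enumeration that enumerates partitions and memoizes the best plan per table subset. It is not the technique from the paper: the ordering and pruning of the search space are not shown. The table names, cardinalities, the single uniform selectivity, and the cost function (sum of intermediate result sizes) are all illustrative assumptions.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical inputs, chosen only for illustration.
CARD = {"A": 1000, "B": 100, "C": 10}   # assumed table cardinalities
SELECTIVITY = 0.01                      # assumed uniform join selectivity


def best_plan(tables):
    """Return (cost, plan) for joining `tables`.

    Cost is the sum of intermediate result sizes (an assumed cost model).
    Cross products are allowed to keep the sketch short.
    """

    @lru_cache(maxsize=None)            # memo table: one entry per subset
    def solve(subset):
        if len(subset) == 1:
            (table,) = subset
            return 0.0, float(CARD[table]), table
        items = sorted(subset)
        first, rest = items[0], items[1:]
        best = None
        # Enumerate partitions (left, right); pinning `first` to the left
        # side avoids visiting each partition twice as a mirror image.
        for k in range(len(rest)):
            for combo in combinations(rest, k):
                left = frozenset((first,) + combo)
                right = subset - left
                left_cost, left_rows, left_plan = solve(left)
                right_cost, right_rows, right_plan = solve(right)
                rows = left_rows * right_rows * SELECTIVITY
                cost = left_cost + right_cost + rows
                if best is None or cost < best[0]:
                    best = (cost, rows, (left_plan, right_plan))
        return best

    cost, _rows, plan = solve(frozenset(tables))
    return cost, plan


if __name__ == "__main__":
    print(best_plan(["A", "B", "C"]))
```

Under these assumed numbers, the cheapest plan joins the two smallest tables first. The exhaustive loop over partitions is the part whose cost grows quickly with the number of tables, which is the optimization-time concern the abstract raises.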

DB Meeting: Wednesday May 18, 2:30pm, DC 1331
Speaker: Anil Goel, Sybase
Title: Big Data, Advanced Analytics
Abstract:
I will give a business and technological overview of the "Big Data" and "Advanced Analytics" problem spaces, and an outline of one potential solution space for these problems as envisioned by Sybase, particularly in the context of the Sybase IQ EDW offering. I will also provide context for work being done in the Sybase Waterloo lab.

DB Meeting: Wednesday May 25, 2:30pm, DC 1331
Speaker: Shai Ben-David
Title: Privacy preserving data mining
Abstract:
We address the question of how a database can be mined
to obtain aggregate information without disclosing information about individual records in that database. This has been an active research area in the data mining community for quite a while, and in the past several years it has also been addressed by the theoretical cryptography research community.
I will give a brief survey of the existing literature on this topic, discuss shortcomings
of some of the most popular current approaches, and present new ideas addressing these shortcomings that have been developed jointly with Rita Ackerman and Dale Schuurmans.

DB Meeting: Wednesday June 01, 2:30pm, DC 1331
Speaker: Chen Zhang
Title: CloudBATCH: A Batch Job Queuing System on Clouds with Hadoop and HBase
Abstract:
As MapReduce becomes more and more popular in data processing
applications, the demand for Hadoop clusters continues to grow.
However, Hadoop is incompatible with existing cluster batch job
queuing systems and requires a dedicated cluster under its full
control. Hadoop also lacks support for user access control,
accounting, fine-grain performance monitoring and legacy batch job
processing facilities comparable to existing cluster job queuing
systems. This makes dedicated Hadoop clusters less suitable for
administrators and normal users alike with hybrid computing needs that
involve both MapReduce and legacy applications. As a result, getting a
properly suited and sized Hadoop cluster has not been easy in
organizations with existing clusters. This talk presents CloudBATCH, a
prototype solution to this problem enabling Hadoop to function as a
traditional batch job queuing system with enhanced functionality for
cluster resource management. With a system like CloudBATCH, a complete
shift to Hadoop for managing an entire cluster to cater for hybrid
computing needs becomes feasible.

DB Meeting: Wednesday June 08, 2:30pm, DC 1331
Speaker: Umar Farooq Minhas
Title: A Bayesian Approach to Online Performance Modeling for Database Appliances using Gaussian Models
Abstract:
In order to meet service level agreements (SLAs) and to maintain peak
performance for database management systems (DBMS), database
administrators (DBAs) need to implement policies for effective
workload scheduling, admission control, and resource
provisioning. Accurately predicting response times of DBMS queries is
necessary for a DBA to effectively achieve these goals. This task is
particularly challenging due to the fact that a database workload
typically consists of many concurrently running queries and an
accurate model needs to capture their interactions. Additional
challenges are introduced when DBMSes are run in dynamic cloud
computing environments, where workload, data, and physical resources
can change frequently, on-the-fly. Building an efficient and highly
accurate online DBMS performance model that is robust in the face of
changing workloads, data evolution, and physical resource allocations
is still an unsolved problem. In this work, our goal is to build such
an online performance model for database appliances using an
experiment-driven modeling approach. We use a Bayesian approach and
build novel Gaussian models that take into account the interaction
among concurrently executing queries and predict response times of
individual DBMS queries. A key feature of our modeling approach is
that the models can be updated online in response to new queries or
data, or changing resource allocations. We experimentally demonstrate
that our models are accurate and effective: our best models have an average prediction error of 16.3% in the worst case.

DB Meeting: Wednesday June 22, 2:30pm, DC 1331 (CANCELLED)
Speaker: Ken Salem
Title:
Abstract:

DB Meeting: Wednesday July 20, 2:30pm, DC 1331
Speaker: Andrew Kane
Title: Contemporary misconceptions that limit distributed system design and implementation
Abstract:
The current practice for distributed systems builds on unwritten assumptions about how to design and implement such systems. I believe that some of these assumptions limit potential optimizations. For example, the belief that static compilation is faster than just-in-time compilation leads to query plans that don't change after a query starts, and to query executions that don't adapt to changes in data distribution over the span of a query. To make these ideas more concrete, I will show potential benefits using a three-tiered distributed database system as an example. The misconceptions presented relate to machine design, data-level distribution, and tiered solutions.

DB Meeting: Wednesday July 27, 2:30pm, DC 1331
Speaker: Changjiu Jin
Title: View Consistency in Cassandra
Abstract:
Recently, the cloud computing paradigm has been receiving significant
attention. Relational databases may lead to inefficiencies and
limit scale and availability. Therefore, some systems prefer to
sacrifice some relational database features, such as efficient
support for complex queries and data consistency, to meet
the scalability requirements of the cloud.
We explore a view management mechanism to speed up certain kinds of
queries in NoSQL systems. This mechanism updates
materialized views and addresses the inconsistency between base
tables and materialized views by providing a session consistency
guarantee.

DB Meeting: Wednesday August 24, 2:30pm, DC 1331
Speakers: Patrick Kling and Umar Farooq Minhas
Title: VLDB practice talks
Abstract:
- RemusDB: Transparent High-Availability for Database Systems (Umar)
- Generating Efficient Execution Plans for Vertically Partitioned XML
Databases (Patrick)
This page is maintained by Ken Salem.