The DB group meets Wednesday afternoons at 2:30pm. The list below gives the times and locations of upcoming meetings. Each meeting lasts for an hour and features either a local speaker or, on Seminar days, an invited outside speaker. Everyone is welcome to attend.
|DB Meeting:||Wednesday January 13, 2:30pm, DC 1331|
|Title:||On-line estimation of column cardinalities in large tables|
|Abstract:||Estimating column cardinalities in large tables is an important task for RDBMSs. Because an RDBMS typically refreshes its statistics only at fixed intervals, a large number of insertions or deletions between two refreshes can change a table's statistics dramatically while the optimizer still sees stale values, leading to poor query plans. Hence, an on-line algorithm that maintains important statistics such as column cardinalities is desirable. In this talk, a technique for estimating column cardinalities in real time will be presented. The tables of interest are large tables that are split into partitions. The technique updates the column cardinalities efficiently and with high accuracy whenever a partition is added to or dropped from the table.|
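The talk's actual algorithm is not given here; purely as an illustration of the idea, the following minimal Python sketch (all names hypothetical) keeps one distinct-value summary per partition, so the table-level column cardinality can be re-estimated cheaply when a partition is added or dropped, without rescanning the whole table.

    import hashlib

    class PartitionedCardinalityEstimator:
        """Per-partition distinct-value sets for one column (illustrative only).

        A real system would use a mergeable sketch such as HyperLogLog instead
        of exact hash sets, but the add/drop logic is the same.
        """

        def __init__(self):
            self.partitions = {}  # partition id -> set of hashed column values

        def add_partition(self, pid, values):
            # Summarize the new partition once, when it is loaded.
            self.partitions[pid] = {
                hashlib.md5(str(v).encode()).digest() for v in values
            }

        def drop_partition(self, pid):
            # Dropping a partition only discards its summary.
            self.partitions.pop(pid, None)

        def column_cardinality(self):
            # Union the per-partition summaries; no base-table scan needed.
            if not self.partitions:
                return 0
            return len(set().union(*self.partitions.values()))

    est = PartitionedCardinalityEstimator()
    est.add_partition("2010-01", [1, 2, 2, 3])
    est.add_partition("2010-02", [3, 4])
    print(est.column_cardinality())  # 4
    est.drop_partition("2010-01")
    print(est.column_cardinality())  # 2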
|DB Meeting:||Wednesday January 20, 2:30pm, DC 1331|
|Title:||Structured Querying of Text Databases|
|Abstract:||Unstructured text documents often embed data that is structured in nature, and we can expose this structured data using information extraction technology. A collection of such unstructured documents is known as a text database. By processing a text database with information extraction systems, we can materialize a variety of structured "relations," over which we can then issue regular SQL queries. Unlike in the traditional relational world, a SQL query execution over a text database might produce answers that are not fully accurate or complete, for a number of reasons. A query optimizer over a text database therefore has to take into consideration the efficiency of query execution as well as the quality of the results. In this talk, I will present recent work that addresses the problem of processing and cost-based optimization of SQL queries issued over a text database, incorporating a trade-off between efficiency and result quality.|
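As a toy illustration of the pipeline the abstract describes (not the speaker's system), this Python sketch uses a regular expression as a stand-in information extractor to materialize a relation from raw text, then queries it with ordinary SQL via sqlite3; the pattern, documents, and schema are invented for the example.

    import re
    import sqlite3

    # Stand-in "information extraction system": a regex that pulls
    # (disease, outbreak year) pairs out of free text. Real extractors
    # are far more sophisticated and make errors, which is why result
    # quality enters the optimization problem.
    DOCS = [
        "An outbreak of cholera was reported in 1854 in London.",
        "An outbreak of influenza was reported in 1918 worldwide.",
    ]
    PATTERN = re.compile(r"outbreak of (\w+) was reported in (\d{4})")

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE outbreaks (disease TEXT, year INTEGER)")
    for doc in DOCS:
        for disease, year in PATTERN.findall(doc):
            conn.execute("INSERT INTO outbreaks VALUES (?, ?)", (disease, int(year)))

    # Ordinary SQL over the materialized "relation".
    for row in conn.execute("SELECT disease FROM outbreaks WHERE year < 1900"):
        print(row[0])  # cholera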
|DB Seminar:||Friday January 29th, 2:30pm, MC 2018B|
|Speaker:||Raymond Ng, University of British Columbia|
|Title:||Towards multi-modal extraction and summarization of conversations|
|DB Meeting:||Wednesday February 3, 2:30pm, DC 1331|
|Title:||On RDF and SPARQL|
|DB Meeting:||Wednesday February 10, 2:30pm, DC 1331|
|Title:||Integrating MapReduce ideas into Distributed DBMS|
MapReduce has emerged as a framework for processing large-scale data, and it has become popular for its scalability and fault tolerance. MapReduce and parallel databases share some similarities, but MapReduce is designed for unstructured data and lacks the efficiency of a DBMS. Recent research has therefore focused on combining MapReduce with independent DBMS instances running on cluster nodes.
In this talk, I will discuss two different approaches: HadoopDB and Osprey. In HadoopDB, the Hadoop MapReduce implementation is used as a communication layer on top of single-node DBMS instances. In contrast, Osprey borrows MapReduce-style fault tolerance and adds it to a distributed shared-nothing database. (A toy simulation of the HadoopDB-style split follows the references below.)
 A. Abouzeid, K. Bajda-Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin. HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc. VLDB Endow., 2(1):922-933, 2009.
 C. Yang, C. Yen, C. Tan, and S. Madden. Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database. In ICDE '10, 2010.
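To make the HadoopDB-style split concrete, here is a minimal Python simulation (my own toy, not HadoopDB's code): each "node" holds a partition of a table in its own single-node database, the map step pushes a SQL fragment down to every node, and the reduce step combines the partial results, as a MapReduce layer over local DBMS instances would.

    import sqlite3

    # Each "cluster node" is a single-node DBMS holding one partition.
    def make_node(rows):
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return db

    nodes = [
        make_node([("east", 10), ("west", 5)]),
        make_node([("east", 7), ("west", 3)]),
    ]

    # Map: push the SQL fragment down to every node's local DBMS,
    # so each node does its own aggregation efficiently.
    def map_phase(sql):
        return [node.execute(sql).fetchall() for node in nodes]

    # Reduce: merge the per-node partial aggregates.
    def reduce_phase(partials):
        totals = {}
        for rows in partials:
            for region, subtotal in rows:
                totals[region] = totals.get(region, 0) + subtotal
        return totals

    partials = map_phase("SELECT region, SUM(amount) FROM sales GROUP BY region")
    print(reduce_phase(partials))  # {'east': 17, 'west': 8}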
|CS Seminar:||Wednesday February 17, 10:30am, DC 1304|
|Speaker:||Rafae Bhatti, Database Security Group, Oracle Inc.|
|Title:||Security and Privacy For Healthcare Applications: Does Policy mean Protection?|
|Abstract:||With the adoption of Electronic Medical Records (EMRs), an increasing number of health-related Web applications are now available to consumers, providers, and partners. While this transformation offers huge benefits, there are security and privacy concerns integral to the process of electronic healthcare delivery. In this talk, we first survey the body of evidence to motivate the design of appropriate security solutions for electronic healthcare applications. Successful solutions must comply with the prime directive of healthcare - "nothing should interfere with delivery of care." We then formally present the problem of reconciling security and privacy policies with the actual healthcare workflow, which we refer to as the policy coverage problem. We outline a technical solution to the problem based on the concept of policy refinement, and develop a privacy protection architecture called PRIMA. We also offer guidelines for electronic healthcare applications to ensure adequate policy coverage. The ultimate goal is that electronic healthcare applications should be made secure without compromising usability.|
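The policy coverage problem can be illustrated with a small Python sketch (entirely my own construction, not PRIMA): given a set of policy rules and the accesses the actual care workflow requires, a coverage check flags the workflow accesses that no rule permits, which is exactly where policy and practice diverge.

    # Hypothetical policy rules: (role, resource, action) triples that are permitted.
    policy = {
        ("physician", "ehr", "read"),
        ("physician", "ehr", "write"),
        ("nurse", "ehr", "read"),
    }

    # Accesses the actual care workflow needs (e.g., mined from audit logs).
    workflow = [
        ("physician", "ehr", "write"),
        ("nurse", "ehr", "write"),      # needed in practice, not permitted
        ("pharmacist", "ehr", "read"),  # needed in practice, not permitted
    ]

    # Coverage check: every uncovered access is either a usability problem
    # (care is blocked) or gets worked around, defeating the policy.
    uncovered = [access for access in workflow if access not in policy]
    for role, resource, action in uncovered:
        print(f"policy gap: {role} needs {action} on {resource}")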
|DB Meeting:||Wednesday February 24, 2:30pm, DC 1331|
|Title:||Q-Cop: Avoiding Bad Query Mixes to Minimize Client Timeouts Under Heavy Load|
In three-tiered web applications, some form of admission control is required to ensure that throughput and response times are not significantly harmed during periods of heavy load. We propose Q-Cop, a prototype system for improving admission control decisions that considers a combination of the load on the system, the number of simultaneous queries being executed, the actual mix of queries being executed, and the expected time a user may wait for a reply before they or their browser give up (i.e., time out). Using TPC-W queries, we show that the response times of different types of queries can vary significantly depending not just on the number of queries being processed but on the mix of other queries that are running simultaneously. We develop a model of expected query execution times that accounts for the mix of queries being executed and integrate this model into a three-tiered system to make admission control decisions. Our results show that this approach makes more informed decisions about which queries to reject and, as a result, significantly reduces the number of requests that time out. Across the range of workloads examined, an average of 47% fewer requests are unsuccessful than with the next best approach.
This is joint work with Sean Tozer and Ashraf Aboulnaga, and is a practice talk for ICDE 2010.
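The abstract does not give the model's exact form; as an assumed illustration only, this Python sketch uses a simple linear mix-aware cost model (query types and coefficients invented) to predict a new query's execution time from the currently running mix, and rejects the query if the prediction exceeds the client timeout.

    # Hypothetical mix-aware cost model: predicted time for a query of
    # type t = base[t] + sum over running types u of interact[t][u] * count[u].
    base = {"browse": 0.2, "search": 0.5, "order": 1.0}
    interact = {
        "browse": {"browse": 0.01, "search": 0.05, "order": 0.10},
        "search": {"browse": 0.02, "search": 0.08, "order": 0.20},
        "order":  {"browse": 0.03, "search": 0.10, "order": 0.30},
    }
    TIMEOUT = 2.0  # seconds a user/browser will wait before giving up

    def predicted_time(qtype, running_mix):
        # running_mix: dict of query type -> number currently executing.
        return base[qtype] + sum(
            interact[qtype][u] * n for u, n in running_mix.items()
        )

    def admit(qtype, running_mix):
        # Reject queries that would likely time out anyway; serving them
        # would only waste capacity that admitted queries could use.
        return predicted_time(qtype, running_mix) <= TIMEOUT

    mix = {"browse": 10, "search": 5, "order": 3}
    print(admit("browse", mix))  # True:  0.2 + 0.1 + 0.25 + 0.3 = 0.85
    print(admit("order", mix))   # False: 1.0 + 0.3 + 0.5 + 0.9 = 2.7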
|DB Meeting:||Wednesday March 10, 2:30pm, DC 1331|
|Title:||Wrapper Induction from Noisy Examples|
Today's Web is a massive source of information that is mainly formatted for human consumption. However, many websites use scripts to generate structured HTML, which allows extraction rules, called wrappers, to effectively extract information of interest. Although many supervised and unsupervised information extraction approaches exist, building a noise-tolerant extraction system has received comparatively little attention.
In the first part of this talk I'm going to present new methods for wrapper induction from noisy examples. The setting assumes an automatic (noisy) annotator (e.g., dictionary lookup) that produces a few training examples of a target type from a given set of input pages. The objective is to learn a wrapper, based on a black-box wrapper induction algorithm, that correctly extracts instances of the target type in similar pages. By removing the need to manually annotate the pages, we are able to perform extraction at Web scale. This is joint work with Nilesh Dalvi and Ravi Kumar from Yahoo! Research.
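The induction algorithm itself is treated as a black box in the talk; purely as an illustration of learning from noisy annotations, this Python sketch scores candidate extraction rules (here, node paths in a drastically simplified page model, all data invented) by how well they agree with a noisy annotator across pages, and keeps the best-agreeing rule as the wrapper.

    # Simplified page model: each page maps a candidate rule (an XPath-like
    # node path) to the text it extracts. A real system works on DOM trees.
    pages = [
        {"/html/table/tr[1]/td": "Casablanca", "/html/div/span": "1942"},
        {"/html/table/tr[1]/td": "Vertigo",    "/html/div/span": "1958"},
        {"/html/table/tr[1]/td": "Chinatown",  "/html/div/span": "1974"},
    ]

    # Noisy annotator (e.g., dictionary lookup): per-page guesses for the
    # target type "movie title". One of the three labels is wrong.
    noisy_labels = ["Casablanca", "Vertigo", "1974"]

    def agreement(rule):
        # How often the rule's extraction matches the noisy annotation.
        return sum(page.get(rule) == label
                   for page, label in zip(pages, noisy_labels))

    # Pick the rule with the best agreement: a rule consistent on most
    # pages wins even though no rule matches every noisy label.
    candidates = {rule for page in pages for rule in page}
    wrapper = max(candidates, key=agreement)
    print(wrapper)  # /html/table/tr[1]/td (agrees on 2 of 3 pages)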
In the second part of the talk I'm going to give a demo of MashRank, a mashup system that integrates concepts from rank-aware query processing, probabilistic databases, and information extraction to enable building ranked mashups of sources with (possibly) uncertain ranking attributes. MashRank integrates information extraction into query processing by asynchronously pushing extracted data into pipelined rank-aware query plans, producing mashup results in an early-out fashion (a small sketch of this behaviour follows the reference below). This is joint work with my supervisor Ihab Ilyas and my colleague Mina Saleeb.
 "MashRank: Towards Uncertainty-Aware and Rank-Aware Mashups", Mohamed A. Soliman, Mina Saleeb, and Ihab F. Ilyas. In ICDE 2010.
|DB Seminar:||Wednesday March 17th, 2:30pm, DC 1302|
|Speaker:||Sunil Prabhakar, Purdue University|
|Title:||The Orion Uncertain Data Management System|
|DB Meeting:||Wednesday March 24, 2:30pm, DC 1331|
|Speaker:||Ivan Bowman, Sybase iAnywhere|
|Title:||The Perils of Upgrading|
|Abstract:||In this talk I will discuss how customers can encounter unintended consequences when migrating to a new version of database server software. I will discuss customer workloads that we investigated as part of problem reports related to moving to a new version of the SQL Anywhere database server software. I will characterize the workloads and problems that the customers encountered and explore ways that these problems can be avoided or mitigated.|
|DB Meeting:||Wednesday March 31, 2:30pm, DC 1331|
|Title:||Creating a Facility for Browsing/Searching Clustered Data|
|Abstract:||In this talk I will describe an application being developed that enables the user to browse and search a collection of clustered documents, and that provides facilities to help answer Business Intelligence questions. A range of different operations on collections of data are being examined in order to determine what functionality the application should provide and how it should be designed to optimize the performance of these operations.|
|DB Meeting:||Wednesday April 7, 2:30pm, DC 1331|
|Speaker:||Umar Farooq Minhas|
|Title:||High Availability for Database Systems through Remus|
Remus is a research prototype that provides high availability (HA) through asynchronous virtual machine replication on the open-source Xen hypervisor, running on commodity hardware in an application- and operating-system-agnostic manner. One problem with using Remus for HA is that database workloads, such as Online Transaction Processing (OLTP) workloads, dirty memory at a very high rate, which results in a high replication overhead for Remus. We propose the design and implementation of a Remus-aware database system to reduce this performance overhead without compromising the HA guarantees of Remus. In this talk, I will first present experimental results that quantify the overhead incurred by the TPC-H and TPC-C benchmarks running over PostgreSQL with Remus. I will then describe our proposed optimization, called memory deprotection, and present results from ongoing experiments showing that selective memory deprotection has the potential to improve performance for database workloads. (A toy model of the idea follows the reference below.)
 B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield. Remus: High availability via asynchronous virtual machine replication. In NSDI '08, 2008.
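To convey why deprotecting some memory helps (a toy model under my own assumptions, not the actual Remus or PostgreSQL mechanics), this Python sketch counts the bytes that must be shipped each replication epoch: only dirty pages that are still protected contribute to replication cost, so deprotecting high-churn buffers whose contents the database can recover by other means (e.g., its own logging) shrinks each checkpoint.

    # Toy replication-epoch model. Page ids and sizes are invented.
    PAGE_SIZE = 4096

    def epoch_replication_bytes(dirty_pages, deprotected):
        # Remus-style checkpointing ships every dirty page; memory
        # deprotection exempts pages the database can recover itself,
        # so they are skipped.
        shipped = [p for p in dirty_pages if p not in deprotected]
        return len(shipped) * PAGE_SIZE

    # OLTP-style epoch: the buffer pool (pages 0-99) churns heavily,
    # while other memory (pages 100-109) is dirtied too.
    dirty = set(range(0, 110))

    print(epoch_replication_bytes(dirty, deprotected=set()))             # 450560 bytes
    print(epoch_replication_bytes(dirty, deprotected=set(range(100))))   # 40960 bytes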
|DB Meeting:||Wednesday April 14, 2:30pm, DC 1331 CANCELLED|
|DB Seminar:||Wednesday April 21st, 2:30pm, DC 1302|
|Speaker:||Christoph Koch, Cornell University|
|Title:||DBToaster: Aggressive compilation techniques for online aggregation|
|DB Meeting:||Wednesday April 28, 2:30pm, DC 1331|
|Abstract:||The subject of my talk is Query Mesh, a query processing technique that creates multiple execution strategies for distinct subsets of data. Details of Query Mesh have appeared in several recent conference proceedings, authored by Nehme, Rundensteiner, and Bertino.|
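As a minimal illustration of the Query Mesh idea (my own toy example, not the authors' design), the following Python sketch routes each tuple to one of several filter orderings based on a simple classifier over the data, so distinct data subsets run under different execution strategies instead of a single plan for the whole input.

    # Two invented predicates with different costs and selectivities.
    def is_recent(t):    # cheap; very selective for "archive" tuples
        return t["year"] >= 2009

    def is_relevant(t):  # expensive; very selective for "fresh" tuples
        return "db" in t["tags"]

    # Query Mesh idea: a router classifies each tuple into a data subset,
    # and each subset gets its own operator ordering (execution strategy),
    # e.g., run the most selective predicate for that subset first.
    plans = {
        "archive": [is_recent, is_relevant],  # is_recent prunes archive tuples fast
        "fresh":   [is_relevant, is_recent],  # is_relevant prunes fresh tuples fast
    }

    def route(t):
        return "fresh" if t["source"] == "feed" else "archive"

    def execute(tuples):
        for t in tuples:
            # all() short-circuits, so the per-subset predicate order
            # determines how much work a non-qualifying tuple costs.
            if all(pred(t) for pred in plans[route(t)]):
                yield t

    data = [
        {"source": "feed",  "year": 2010, "tags": ["db"]},
        {"source": "crawl", "year": 1999, "tags": ["db"]},
    ]
    print(list(execute(data)))  # only the first tuple passes both predicates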