Winter 2011
Events of interest to the
Database Research Group are posted here, and are also
mailed to the uw.cs.database newsgroup and the
db-faculty,
db-grads,
db-friends
mailing lists.
Subscribe to one of these mailing lists to receive e-mail notification
of upcoming events.
The DB group meets Wednesday afternoons at 2:30pm.
The list below gives the
times and locations of upcoming meetings.
Each meeting lasts for an hour and features either
a local speaker or, on
Seminar days,
an invited outside speaker.
Everyone is welcome to attend.
Winter 2011 Events
DB Meeting:
|
Wednesday January 19, 2:30pm, DC 1331
|
Speaker:
|
David DeHaan
|
Title:
|
Spatial Data Support in SQL Anywhere 12
|
Abstract:
|
Since joining iAnywhere fifteen months ago, much of my time has been spent
on the new spatial features just released in SQL Anywhere 12. Spatial
databases were a new area for me, so in this talk I will give an overview of
standards and practices for RDBMS support of spatial data. I will then
discuss the spatial functionality implemented in SQL Anywhere 12, including
how the product goals for SA differ from some other RDBMS products and how
those goals impact our design. I will end with an overview and brief demo
of Quantum GIS, an open-source desktop tool for manipulating geographic
information.
|
DB Meeting:
|
Wednesday January 26, 2:30pm, DC 1331
|
Speaker:
|
Chen Zhang
|
Title:
|
HBaseDB: A solution for Multi-row distributed transactions with
global strong snapshot isolation using HBase on clouds
|
Abstract:
|
Modern applications such as collaborative Web 2.0 applications and
social network sites pose challenging requests for scalable
distributed transactions involving multiple data items on clouds. On
the one hand, traditional database management systems (DBMS) cannot
provide the desired degree of scalability and availability and
guarantee transactional properties at the same time, especially in
the face of various kinds of failures on clouds. On the other hand,
column-oriented data stores fall short of multi-row distributed
transactional support, although they have been proven to scale and
perform well on clouds with integrated fault tolerance schemes. Against
this background, HBase, a representative open source column-oriented
store modeled after Google's BigTable system, has been studied
recently for solutions to provide transactional data management
capabilities. Unfortunately, none of the existing solutions support
distributed transactions with global strong snapshot isolation (SI)
with high throughput and low latency. This paper presents a solution,
called HBaseDB, supporting global strong SI for distributed
transactions using HBase. HBaseDB targets the same type of OLTP
workloads as HBase, featuring random data access. It is implemented as
a light-weight client-side library in lieu of the standard HBase API
for transactional processing and requires no extra programs to be
deployed or maintained, and no modifications to the existing user data
that have been stored in HBase. Transactions are autonomously managed
by applications that issue them through the client library without
using any consensus-based protocols, atomic broadcast, or
transactional locks on data for distributed synchronization and
concurrency control. As a result of the simplicity in design, HBaseDB
adds low overhead to HBase performance and directly inherits many nice
properties of HBase on clouds, such as scalability, fault tolerance,
access transparency and high throughput.
|
DB Meeting:
|
Wednesday February 2, 2:30pm, DC 1331
|
Speaker:
|
Gunes Aluc
|
Title:
|
Tailoring RDF Databases for Web Data Management
|
Abstract:
|
The Resource Description Framework (RDF) provides a flexible model to capture the many-to-many relationships in web data. However, due to these complex relationships, when the data volume is large, querying potentially involves a large number of joins, even if one partitions the data. Existing techniques such as the property table approach, vertical partitioning and hexatuple indexing cannot fully address this problem. The property table approach requires the schema of the data to be analyzed in advance to construct the flattened tables. Vertical partitioning assumes that queries will have bound predicates, and hence is inefficient at handling fuzzy queries. Finally, for hexatuple indexing to be efficient in a distributed scenario, one must guarantee that data can be partitioned such that there are negligible cross references among the partitions. Unfortunately, these conditions are hardly ever met in web data management: web data does not have a well-defined schema, fuzzy queries are as likely to exist in web applications as exact relational queries, and data partitioning in the presence of these many-to-many relationships is challenging. This talk will highlight the strengths and weaknesses of existing RDF databases as potential tools in web data management, and provide some perspective on how some of these challenges can be addressed.
|
DB Meeting:
|
Wednesday February 9, 2:30pm, DC 1331
|
Speaker:
|
Wayne Oldford
|
Title:
|
Visual exploration of high-dimensional data by interactive navigation of low-dimensional data spaces
|
Abstract:
|
The structure of a set of high dimensional data objects (e.g. images, documents, molecules, genetic expressions, etc.) is notoriously difficult to visualize. In contrast, lower dimensional structure (esp. 3 or fewer dimensions) is natural to us and easy to visualize. A not unreasonable approach, then, might be to explore one low dimensional visualization after another in the hope that, together, these will shed light on the higher dimensional structure.
In this talk, I will introduce some graph-theoretic structures which have low dimensional spaces as nodes/vertices and transitions from one space to another as edges. To be concrete, suppose that each node is a 2-d scatterplot of the data and that an edge exists between nodes whose corresponding scatterplots share a variable. In this case, travel along an edge amounts to a 3-d transition effected by rotating one 2-d scatterplot into the next. More generally, imagine a user moving a "You are here" circle, or "bullet", from one node to another along defined edges, causing one data visualization to be smoothly morphed into the other. A walk on the graph represents a low-dimensional trajectory through the higher dimensional space. Of interest are walks along these graphs that reveal meaningful structure in the displayed data.
These ideas will be demonstrated on several different data sets using an interactive software package called RnavGraph (written by UW Ph.D. student Adrian Waddell). RnavGraph allows a user to visually explore any data set by dynamically walking the graph structure and interacting with the displayed data. It connects to the statistical programming system called R.
Methods for constructing these graphs and for identifying interesting subgraphs will also be described and demonstrated. Some dimensionality reduction (manifold learning) methods will also be used to constrain the size of the graph.
|
DB Meeting:
|
Wednesday February 16, 2:30pm, DC 1331
|
Speaker:
|
Peter Bumbulis
|
Title:
|
HyPer: Hybrid OLTP and OLAP Main Memory Database System
Based on Virtual Memory Snapshots
|
Abstract:
|
I'd like to present HyPer: Hybrid OLTP and OLAP Main Memory
Database System - Based on Virtual Memory Snapshots by Kemper & Neumann. HyPer is
interesting in that it uses another feature of commodity hardware to
improve database performance.
|
DB Meeting:
|
Wednesday March 2, 2:30pm, DC 1331
|
Speaker:
|
Xin Liu
|
Title:
|
Extending the Cache with Solid State Drives (SSD)
|
Abstract:
|
In the past few years, the cost of flash memory has fallen dramatically
while fabrication has become more efficient. Flash-based Solid State Drives (SSDs)
have started to make inroads into the laptop market, the desktop storage market,
and the enterprise server market. The price and performance of flash memory
fall between those of traditional RAM and hard disk drives. If flash memory is
introduced to fill the gap between RAM and traditional rotating disks, a common question
is whether it should work as a special part of main memory or a special part of
persistent storage.
Rather than putting SSD and HDD side by side and using SSD as
an alternative storage option, we propose to treat the memory, SSD, and HDD
hierarchically. This project focuses on the issues of using SSD
as a second-tier cache behind the memory cache.
We will discuss the impact of the memory and SSD on each other,
and how to utilize them efficiently.
|
DB Meeting:
|
Wednesday March 9, 2:30pm,
DC 1331 CANCELLED
|
Speaker:
|
Ahmed Soror
|
Title:
|
|
Abstract:
|
|
DB Meeting:
|
Wednesday March 16, 2:30pm, DC 1331
|
Speaker:
|
Frank Tompa
|
Title:
|
Augmenting Data Warehouses from Text
|
Abstract:
|
One theme in the Business Intelligence Network (BIN) aims to use the information contained in documents to provide input for business analytics.
I will review how text mining can be applied to extract facts that can be added to a data warehouse.
Special attention will be given to the problem of extracting temporal information from text and posing queries with temporal constraints against document data.
[Primary references for this material are Gomes and Tompa's Information Extraction in the Business Intelligence Context (unpublished);
Zhang, Suchanek, Yue, and Weikum's TOB: Timely Ontologies for Business Relations (WebDB 2008);
and Arikan, Bedathur, and Berberich's Time Will Tell: Leveraging Temporal Expressions in IR (WSDM 2009).]
|
DB Meeting:
|
Wednesday March 30, 2:30pm, DC 1331
|
Speaker:
|
Zhiping Wu
|
Title:
|
Materialization in Access Control Systems
|
Abstract:
|
In modern access control systems, hierarchies (subject, object or role) are often supported to make models more flexible. They can, however, increase the time to answer authorization requests due to possibly time-consuming permission propagation and conflict resolution, especially when the hierarchies are deep. Our work takes advantage of materialization in access control systems so that an authorization request does not always have to bear the cost of permission propagation and conflict resolution. By measuring the system and authorization costs, we illustrate that materialization can be useful in access control systems, especially from a user-experience point of view. Moreover, we can potentially reduce both system cost and authorization cost with a better update mechanism in our future research.
|
DB Meeting:
|
Wednesday April 20, 2:30pm, DC 1331
|
Speaker:
|
Patrick Kling
|
Title:
|
Generating Efficient Execution Plans for Vertically Partitioned
XML Databases
|
Abstract:
|
Experience with relational systems has shown that distribution is an
effective way of improving the scalability of query evaluation. In
this paper, we show how distributed query evaluation can be performed
in a vertically partitioned XML database system. We propose a novel
technique for constructing distributed execution plans that is
independent of local query evaluation strategies. We then present a
number of optimizations that allow us to further improve the
performance of distributed query execution. Finally, we present a
response time-based cost model that allows us to pick the best
execution plan for a given query and database instance. Based on an
implementation of our techniques within a native XML database system,
we verify that our execution plans take advantage of the parallelism
in a distributed system and that our cost model is effective at
identifying the most advantageous plan.
This talk is an extended version of our upcoming presentation at VLDB
2011:
Patrick Kling, M. Tamer Ozsu, Khuzaima Daudjee.
Generating Efficient Execution Plans for Vertically Partitioned XML Databases.
PVLDB 4(1): 1-11 (2010).
Available at
http://www.vldb.org/pvldb/vol4/p1-kling.pdf
|
DB Meeting:
|
Wednesday April 27, 2:30pm, DC 1331
|
Speaker:
|
Cătălin-Alexandru Avram
|
Title:
|
SLiM-FiD: A Durable SkipList-based In-Memory Index using Flash
|
Abstract:
|
We present SLiM-FiD, an in-memory indexing system based on SkipLists and HashTables. Our solution is optimized for using SSDs as secondary storage to provide persistence and was entered in the 3rd Annual SIGMOD Programming Contest. We cover our implementation and the associated persistence mechanism in detail as well as the reasons behind our choices of algorithms and data structures used. Finally, our experiments show that SLiM-FiD is able to outperform established key-value store technologies.
|
This page is maintained
by
Ken Salem.