Database Research Group Events

Fall 2008

Note: Events of interest to the Database Research Group are posted to the uw.cs.database newsgroup and are mailed to the db-group mailing list. Three mailing lists are aggregated into the db-group list: db-faculty (for DB group faculty), db-grads (for DB group graduate students), and db-friends (for DB group alumni, visitors, and friends). If you wish to subscribe to one of these three lists (or to unsubscribe), please visit<listname>, where <listname> is the list you wish to subscribe to.
DB group meetings
The DB group meets most Friday afternoons at 2pm, usually in DC 1331. See the list of current events for times and locations of upcoming meetings. Each meeting lasts for an hour and features an informal presentation by one of the members of the group. Everyone is welcome to attend. These talks are intended to raise questions and to stimulate discussion rather than to be polished presentations of research results. Speakers are determined using a rotating speaker list, which can be found on the DB group meeting page.
DB seminar series
The DB seminar series features visiting speakers. These seminars are more-or-less monthly, and are usually scheduled on Monday mornings at 11am. See the list of current events for times and locations of upcoming seminars. The full schedule can be found on the DB seminar series page.

Recent and Upcoming Events

DB Meeting: Friday September 12, 2:00pm, DC 1331
Speaker: Anil Goel
Title: Hello Flash, so long HD?
Abstract: I will review the argument in favour of making widespread use of SSD devices for mainstream data storage and discuss recent work supporting the argument as well as pinpointing additional work needed in DBMS software.
References: 1. Latency Lags Bandwidth, Patterson, CACM, Oct 2004
2. The five-minute rule twenty years later, and how flash memory changes the rules, Graefe, DaMoN 2007
3. A case for flash memory SSD in enterprise database applications, Lee, et al, SIGMOD 2008
4. Flash storage today, Leventhal, ACM Queue, Jul/Aug 2008
5. Design of flash based DBMS: An in-page logging approach, Lee, et al, SIGMOD 2007
6. Flash disk opportunity for server-applications, Gray, et al
7. The five-minute rule ten years later, and other computer storage rules of thumb, Gray, et al, SIGMOD 1997
8. Enterprise SSDs, Moshayedi, et al, ACM Queue, Jul/Aug 2008
9. Intel's X25-M SSD, The Tech Report, Sep 2008

DB Seminar: Monday September 15, 10:30am, DC 1304
Speaker: Anhai Doan, University of Wisconsin
Title: Managing Unstructured Data

DB Meeting: Friday September 26, 2:00pm, DC 1331
Speaker: Oguzhan Ozmen
Title: Database-Aware Storage Layout Design
Abstract: The process of database design includes a number of steps performed by an expert, a database administrator. Following logical and physical database design, the very last step is to lay out database objects (i.e., tables, indexes, materialized views) on the underlying storage system. Storage layout design consists of assigning each object to a subset of the available storage devices. Indeed, the choice of how database objects are assigned to storage devices can significantly impact the I/O performance of a DBMS.

Layout decisions require information about the I/O behavior of the DBMS; otherwise, they rely only on heuristics or generic guidelines. A storage layout design that does not consider the I/O behavior and requirements of the DBMS may result in suboptimal DBMS performance. Currently, the I/O behavior of a storage client is measured by collecting (observing) its I/O trace. In this talk, I will briefly present our alternative method, which produces I/O traces for a DBMS without measuring them. Then, I will discuss how we make use of this information to enable an informed, database-aware layout design.

DB Meeting: Friday October 3, 2:00pm, DC 1331
Speaker: Ken Salem
Title: Amazon's Dynamo
Abstract: I plan to discuss Dynamo, a scalable data storage system developed by Amazon to support its web services.

Seminar: Tuesday October 7, 1:00pm, DC 1302
Speaker: Irfan Ahmad, VMware
Title: Topics in Virtualization Research
Abstract: The initial impact of virtual machines (VMs) on commodity hardware was in the software development community where virtual machines eased the pain of development and test cycles. Large corporations were not far behind as adopters of VM technology to more fully utilize their hardware resources.

As these trends take hold, several interesting problems worthy of research emerge. I'll provide a survey of these topics, including cloud and datacenter filesystems, distributed IO resource management, security, mobile device virtualization and profile movement back and forth between the cloud and devices, resource management/load balancing/live migration in very large clusters of up to 100,000 nodes, support for large database systems, paravirtualization, and technical and economic models for cloud computing.

I'll leave plenty of time at the end for questions and a lively discussion. All attendees will receive a copy of VMware Workstation for the PC or VMware Fusion for the Mac. Insightful questions/comments will receive additional shwag. Oh and refreshments too.

Bio: Irfan is a member of the Resource Management team at VMware. Most recently he has been researching novel techniques for proportional sharing of distributed IO resources as well as techniques for transparent and efficient virtual machine memory page classification. In the past, Irfan has worked on disk IO workload characterization and performance enhancements for running database workloads in virtual machines.

DB Seminar: Friday October 17, 2:00pm, DC 1304  (Please note: seminar in 1304 in place of DB meeting)
Speaker: J. Stephen Downie, University of Illinois at Urbana-Champaign
Title: The Music Information Retrieval Evaluation eXchange (MIREX): An Introductory Overview

DB Meeting: Friday October 24, 2:00pm, DC 1331
Speaker: George Beskales
Title: Survey of Record Deduplication Techniques
Abstract: Real-world databases experience various data quality problems including duplicate records, violated integrity constraints, and missing values. Data cleaning is the process of detecting and correcting such anomalies to improve data quality. I will focus on one goal of the data cleaning process, which is to eliminate duplicate records. I plan to review state-of-the-art duplicate elimination techniques. I will also highlight characteristics of the deduplication problem and present a taxonomy of clustering algorithms that are used in record deduplication.

DB Seminar: Monday November 3, 10:30am, MC 5158  (Please note different location)
Speaker: Jerome Simeon, IBM T.J. Watson Research Center
Title: The Plumber, The Dragon, and the Princess: Growing a Web 2.0 Language from XQuery

Seminar: Friday November 7, 2:00pm, DC 1331
Speaker: Panos Ipeirotis, New York University
Title: Structuring and querying online opinions using econometrics
Abstract: Today, users post online reviews expressing their opinions for movies, restaurants, and many other products. They also evaluate merchants and react to news about political campaigns. Structuring and ranking these opinions in terms of importance and polarity is a difficult research problem. How can we infer the importance and polarity of the posted content? How can we structure and quantify the effect of the online opinions? Many existing approaches rely on human annotators to evaluate the polarity and strength of the opinions, a laborious and error-prone task. We take a different approach by considering the economic context in which an opinion is evaluated. We rely on the fact that the text in on-line systems influences the behavior of the readers and that this effect can be observed using some easy-to-measure economic variables, such as revenues or product prices. Then, by reversing the logic, we infer the semantic orientation and the strength of an opinion by tracing the changes in the associated economic variable. In effect, we combine econometrics with text mining algorithms to identify the "economic value of text" and assign a "dollar value" to each opinion, quantifying sentiment effectively and without the need for manual effort. We present applications on reputation systems, online product reviews, travel search, and the effect of online media on elections.

DB Meeting: Friday November 14, 2:00pm, DC 1331
Speaker: Frank Tompa
Title: Three text-indexing papers from CIKM 2008
Abstract: I will present three papers from the recent CIKM conference, each of which deals with a different type of text index and a different aspect of index performance:
  1. Derrick Coetzee (MSR) presented "TinyLex: Static N-Gram Index Pruning with Perfect Recall," in which he showed how to build effective indexes with lexicons of smaller size by using variable-length n-grams.
  2. Ercegovac, Josifovski, Li, Mediano, and Shekita (IBM & Yahoo) presented "Supporting Sub-Document Updates and Queries in an Inverted Index," in which they showed how to accommodate updates to sections of documents without re-indexing complete documents and yet without causing excessive overhead when querying.
  3. Barsky, Stege, Thomo, and Upton (Victoria) presented "A New Method for Indexing Genomes Using On-Disk Suffix Trees," in which they showed how to build very large suffix trees faster than by using previous methods.

DB Seminar: Monday November 17, 10:30am, DC 1304
Speaker: Marianne Winslett, University of Illinois at Urbana-Champaign
Title: Managing Compliance Data: Addressing the Insider Threat Exemplified by Enron

DB Meeting: Friday November 21, 2:00pm, DC 1331
Speaker: Dan Farrar
Title: Making databases functional
Abstract: The effectiveness of a particular database language and programming interface for a given application is closely related to the semantics and native design patterns of the application language. Creating robust and high-performance database applications requires that the tools and languages used provide a good fit, both to each other and to the application domain. In this talk, I will discuss a taxonomy of database interfaces, show how these map to a taxonomy of application programming languages, and describe the downsides of mixing these mappings (one example is the object-relational impedance mismatch). I will discuss traditional imperative and object-oriented languages, and then I will look in more detail at functional languages and their associated database interfaces, with a focus on LINQ and Haskell.

Integrating Programming Languages & Databases: What's the Problem?, Cook and Ibrahim
Monad Comprehensions: A Versatile Representation for Queries, Grust

DB Meeting: Friday November 28, 2:00pm, DC 1331
Speaker: David Toman
Title: Another look at query reformulation under complex constraints and mappings
Abstract: Rewriting queries formulated over a high-level conceptual schema to query plans---essentially queries over physical artifacts comprising the physical design that supports the high-level conceptual design---is one of the essential tasks performed by query optimizers (and/or by various mapping/integration/exchange/etc. layers). In this talk we look at the common underpinnings of this problem and at an alternative approach to it based on the work on definability in first-order logic by Beth and Craig.

DB Seminar: Wednesday December 10, 10:30am, DC 1304  (Please note change in day)
Speaker: Zack Ives, University of Pennsylvania
Title: Orchestra: Sharing Inconsistent Data in a Consistent Way

This page is maintained by Ashraf Aboulnaga.

Data Systems Group
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
Tel: 519-888-4567
Fax: 519-885-1208

Last modified: Friday, 01-Jun-2012 11:01:02 EDT