Fall 2007 Events Schedule | Database Research Group | UW

Fall 2007

Note: Events of interest to the Database Research Group are posted to the uw.cs.database newsgroup and are mailed to the db-group@lists.uwaterloo.ca mailing list. There are actually three mailing lists aggregated into the db-group list: db-faculty (for DB group faculty), db-grads (for DB group graduate students), and db-friends (for DB group alumni, visitors, and friends). If you wish to subscribe to one of these three lists (or to unsubscribe), please visit https://lists.uwaterloo.ca/mailman/listinfo/<listname>, where <listname> is the list you wish to subscribe to.
DB group meetings
The DB group meets most Friday afternoons at 2pm, usually in DC1331. See the list of current events for times and locations of upcoming meetings. Each meeting lasts for an hour and features an informal presentation by one of the members of the group. Everyone is welcome to attend. These talks are intended to raise questions and to stimulate discussion rather than to be polished presentations of research results. Speakers are determined using a rotating speaker list, which can be found on the DB group meeting page.
DB seminar series
The DB seminar series features visiting speakers. These seminars are more-or-less monthly, and are usually scheduled on Monday mornings at 11am. See the list of current events for times and locations of upcoming seminars. The full schedule can be found on the DB seminar series page.

Recent and Upcoming Events


DB Meeting: Friday September 7, 2:00pm, DC 1331
Topic: Kickoff meeting + Group photograph

DB Seminar: Monday September 10, 10:30am, DC 1304
Speaker: Samuel Madden, MIT
Title: The CarTel Mobile Sensor System

DB Meeting: Friday September 14, 2:00pm, DC 1331
Speaker: Glenn Paulley, Sybase iAnywhere
Title: Modelling Performance Benchmark Results Using Linear Regression
Abstract: My talk will be on the modelling of performance benchmark results using linear regression techniques. In particular, I'm going to talk about the design of experiments where there are multiple variables to be tested, and how to go about determining the proportion of variation in the experimental results for each of the variables.

My primary reference is
Raj Jain (1991). The Art of Computer Systems Performance Analysis. John Wiley & Sons, New York. ISBN 0-471-50336-3.


DB Meeting: Friday September 21, 2:00pm, DC 1331
Speaker: Huaxin Zhang
Title: Cardinality Estimation in the Presence of Access Controls
Abstract: In database systems that support fine-grained access controls, each user has access rights that determine which tuples are accessible and which are inaccessible. Queries are answered as if the inaccessible tuples are not present in the database. Thus, users with different access rights may get different answers to a given query. To process queries efficiently in the presence of fine-grained access controls, the database system needs accurate estimates of the number of tuples that are both accessible according to the access rights of the submitting user and relevant according to the selection predicates in the query. In this paper we present sampling-based cardinality estimation techniques for use in the presence of fine-grained access controls. These techniques exploit the fact that access rights are relatively static and are common to all queries that are evaluated on behalf of a particular user. We show that the proposed techniques provide more accurate estimates than simpler techniques that do not exploit knowledge of access rights. We quantify these improvements analytically and through simulations.

IR Group Seminar: Wednesday October 3, 11:00am, DC 1304
Speaker: Eugene Agichtein, Emory University
Title: Exploring User Behavior in Web Search, Social Media, and Beyond
Abstract: User-generated data (e.g., web search queries, result clicks, web browsing traces, interactions in social media) are the fastest growing source of data on the web. We explore some of the ways we could mine this data to enable more effective access to information. In particular, I will describe our recent work on interpreting implicit feedback for improving general-purpose web search, and our ongoing work on mining various forms of user behavior in social media, focusing on Yahoo Answers, a popular community question answering portal.

DB Meeting: Friday October 5, 2:00pm, DC 1331
Speaker: Ken Salem
Title: Dynamic Physical Design
Abstract: Database workloads vary over time, so database physical designs may need to vary as well. I'll discuss several ways to make database physical design more dynamic.

DB Meeting: Friday October 12, 2:00pm, DC 1304 (Please note change of room)
Speaker: Jeff Pound
Title: On Ordering and Indexing Metadata for the Semantic Web
Abstract: In this talk I will outline the theoretical background for ordering and indexing a finite collection of descriptions in a description logic. I will then describe how this work can be applied to indexing metadata for semantic data sets. In particular, we introduce a language for specifying partial ordering relations over metadata expressed as concepts in a description logic (e.g. the OWL web ontology language), and show how this language can be used in combination with binary trees to efficiently search a database of concept descriptions. The language consists of a pair of ordering constructors that support a form of exogenous indexing, in which the search criteria are independent of the concept descriptions occurring in a database, and a form of endogenous indexing, in which the concept descriptions themselves provide the search criteria. An important feature of the language is that it can be refined in the same way as a description logic, in that greater expressiveness and consequently richer search capability are achieved by adding ordering constructors.

DB Seminar: Monday October 15, 10:30am, DC 1304
Speaker: Goetz Graefe, HP Labs
Title: Sorting and indexing with partitioned B-trees

DB Meeting: Friday October 19, 2:00pm, DC 1331
Speaker: Mumtaz Ahmad
Title: External Query Scheduling in DBMS
Abstract: Scheduling queries externally can improve performance in database applications. The most commonly used scheduling policies may give sub-optimal performance. A typical database workload consists of queries of different types running concurrently. In this talk, we will discuss how reasoning about concurrently running query mixes can yield considerable performance benefits.

DB Meeting: Friday November 2, 2:00pm, DC 1331
Speaker: Iman Elghandour
Title: Recommending Indexes for XML Databases
Abstract: XML database systems are expected to handle increasingly complex queries over increasingly large and highly structured XML databases. Having the correct indexes can improve the performance of such queries by orders of magnitude, and XML database systems employ a variety of structural and value indexes to improve query performance. But a problem faced by users of these systems is how to choose the best set of indexes for a given XML database and query workload. In this talk, I will present an XML Design Advisor that solves the XML index recommendation problem, and has the unique characteristic of being tightly coupled with the query optimizer. I will also describe the recommendation algorithms that we have developed to find the optimal index configuration. I will conclude by presenting the results of the experiments that we conducted on a prototype version of IBM DB2.

DB Meeting: Friday November 9, 2:00pm, DC 1331
Speaker: Frank Tompa
Title: Seeking Stable Clusters in the Blogosphere
Abstract: Many people now post new messages to blogs at least daily. Intuitively we expect that when a topic is hot, there will be a lot of chatter using some common vocabulary. Thus a set of keywords will be correlated, forming a cluster. As topics evolve in the blogosphere, so do the corresponding keyword clusters.

We developed efficient algorithms to identify keyword clusters and to identify those that persist over several time intervals. I will present some examples of stable clusters found and outline several of the algorithms involved.

References: VLDB 2007 (pp. 806-817) and http://www.blogscope.net/
Collaborators: Nilesh Bansal, Fei Chiang, and Nick Koudas (all at UofT)


DB Seminar: Monday November 19, 10:30am, DC 1304
Speaker: Remzi Arpaci-Dusseau, University of Wisconsin
Title: File Systems Are Broken (And What We're Doing To Fix Them)

DB Meeting: Friday November 23, 2:00pm, DC 1331
Speaker: Dan Farrar, Sybase iAnywhere
Title: Approximate Query Processing Techniques
Abstract: Decision support systems, systems for analyzing experimental data, and data stream processing applications often share two properties: they require the processing of huge volumes of data, but they can tolerate some degree of error in queries made of them. By compactly summarizing a huge dataset and performing queries against this summary, acceptable levels of inaccuracy can be traded for substantial performance improvements.

In this talk, I will review the nature of approximate query processing and survey the methods discussed in the literature for computing data synopses and for using these synopses to provide rapid approximate answers to queries. I will discuss in some detail the wavelet approach described by Chakrabarti et al. and suggest an application of their work for existing relational database systems.

Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, and Kyuseok Shim. "Approximate Query Processing Using Wavelets". In Proc. of the 26th VLDB Conf., 2000.


DB Meeting, MMath Thesis Presentation: Friday November 30, 2:00pm, DC 1331
Speaker: Umar Farooq Minhas
Title: A Performance Evaluation of Database Systems on Virtual Machines
Abstract: Virtual machine technologies offer simple and practical mechanisms to address many manageability problems in database systems. For example, these technologies allow for server consolidation, easier deployment, and more flexible provisioning. Therefore, database systems are increasingly being run on virtual machines. This offers many unique opportunities for database research. However, it is also important to understand the cost of virtualization. Virtual machine technologies add a layer of indirection between applications and the hardware that they use (e.g. CPU, memory, disk). This added complexity results in a performance overhead for software systems running in a virtual machine. In this thesis, we present an experimental study of the overhead of running a database workload in a virtual machine. Using a TPC-H workload running on PostgreSQL in a Xen virtual machine environment, we show that Xen does indeed introduce overhead for system calls, page fault handling, and disk I/O. However, these overheads do not translate to a high overhead in query execution time. We show that in all cases the average overhead is less than 10%, and we therefore conclude that the advantages of running a database system in a virtual machine do not come at a high cost in performance.

DB Meeting: Friday December 7, 2:00pm, DC 1331
Speaker: George Beskales
Title: Efficient Search for the Most Probable Nearest Neighbors
Abstract: Efficient evaluation of nearest neighbor queries over probabilistic data is essential to many applications that involve uncertainty and fuzziness, such as similarity search and location based services. Current approaches to support such queries use expensive compute-then-sort techniques, or eliminate uncertainty using expected-value analysis. We formulate the problem of finding the k most probable nearest neighbors under a general uncertainty model that combines probabilistic data objects and query object. We identify and analyze the cost factors involved in computing the most probable nearest neighbors, and we present novel techniques to address the trade-offs among these cost factors.

DB Seminar: Monday December 10, 10:30am, DC 1304
Speaker: Christopher Jermaine, University of Florida
Title: Supporting Scalable Online Statistical Processing

This page is maintained by Ashraf Aboulnaga.