Fall 2007
Note: Events of interest to the
Database Research Group are posted to the uw.cs.database
newsgroup and are mailed to the
db-group@lists.uwaterloo.ca
mailing list. There are actually three mailing lists aggregated into the
db-group list: db-faculty
(for DB group faculty), db-grads (for DB group graduate students),
and db-friends (for DB group alumni, visitors, and friends). If
you wish to subscribe to one of these three lists (or to unsubscribe), please
visit
https://lists.uwaterloo.ca/mailman/listinfo/<listname>, where
<listname> is the list you wish to subscribe to.
- DB group meetings
- The DB group meets most Friday afternoons at 2pm, usually in DC 1331.
See the list of current events for
times and locations of upcoming meetings. Each meeting lasts
for an hour and features an informal presentation by one of the
members of the group. Everyone is welcome to attend. These talks are
intended to raise questions and to stimulate discussion rather than
being polished presentations of research results. Speakers are determined
using a rotating speaker list, which can be found on the
DB group meeting page.
- DB seminar series
- The DB seminar series features visiting speakers. These seminars are
more-or-less monthly, and are usually scheduled on Monday
mornings at 11am. See the list of current
events for times and locations of upcoming seminars. The
full schedule can be found on the DB seminar series page.
Recent and Upcoming Events
DB Meeting:
|
Friday September 7, 2:00pm, DC 1331
|
Topic:
|
Kickoff meeting + Group photograph
|
DB Meeting:
|
Friday September 14, 2:00pm, DC 1331
|
Speaker:
|
Glenn Paulley, Sybase iAnywhere
|
Title:
|
Modelling Performance Benchmark Results Using Linear Regression
|
Abstract:
|
My talk will be on the modelling of performance benchmark results using
linear regression techniques. In particular, I'm going to talk about the
design of experiments where there are multiple variables to be tested, and
how to go about determining the proportion of variation in the experimental
results for each of the variables. My primary reference is
Raj Jain (1991). The Art of Computer Systems Performance Analysis. John
Wiley & Sons, New York. ISBN 0-471-50336-3.
|
DB Meeting:
|
Friday September 21, 2:00pm, DC 1331
|
Speaker:
|
Huaxin Zhang
|
Title:
|
Cardinality Estimation in the Presence of Access Controls
|
Abstract:
|
In database systems that support fine-grained access controls,
each user has access rights that determine which tuples are
accessible and which are inaccessible. Queries are answered as if
the inaccessible tuples are not present in the database. Thus,
users with different access rights may get different answers to a
given query. To process queries efficiently in the presence of
fine-grained access controls, the database system needs accurate
estimates of the number of tuples that are both accessible
according to the access rights of the submitting user and relevant
according to the selection predicates in the query. In this paper
we present sampling-based cardinality estimation techniques for
use in the presence of fine-grained access controls. These
techniques exploit the fact that access rights are relatively
static and are common to all queries that are evaluated on behalf
of a particular user. We show that the proposed techniques provide
more accurate estimates than simpler techniques that do not
exploit knowledge of access rights. We quantify these improvements
analytically and through simulations.
|
IR Group Seminar:
|
Wednesday October 3, 11:00am, DC 1304
|
Speaker:
|
Eugene Agichtein, Emory University
|
Title:
|
Exploring User Behavior in Web Search, Social Media, and Beyond
|
Abstract:
|
User-generated data (e.g., web search queries, result clicks, web browsing traces,
interactions in social media) are the fastest growing source of data on the web.
We explore some of the ways we could mine this data to enable more effective access
to information. In particular, I will describe our recent work on interpreting
implicit feedback for improving general-purpose web search, and our ongoing work
on mining various forms of user behavior in social media, focusing on
Yahoo Answers, a popular community question answering portal.
|
DB Meeting:
|
Friday October 5, 2:00pm, DC 1331
|
Speaker:
|
Ken Salem
|
Title:
|
Dynamic Physical Design
|
Abstract:
|
Database workloads vary over time, so database physical designs
may need to vary as well. I'll discuss several ways to make
database physical design more dynamic.
|
DB Meeting:
|
Friday October 12, 2:00pm, DC 1304 (Please note change of room)
|
Speaker:
|
Jeff Pound
|
Title:
|
On Ordering and Indexing Metadata for the Semantic Web
|
Abstract:
|
In this talk I will outline the theoretical background for ordering and
indexing a finite collection of descriptions in a description logic. I
will then describe how this work can be applied to indexing
metadata for semantic data sets. In particular, we introduce a
language for specifying partial ordering relations over metadata
expressed as concepts in a description logic (e.g. the OWL web
ontology language), and show how this language can be used in
combination with binary trees to efficiently search a database of
concept descriptions. The language consists of a pair of ordering
constructors that support a form of exogenous indexing, in which
search criteria are independent of concept descriptions occurring in
a database, and a form of endogenous indexing, in which concept
descriptions themselves provide the search criteria. An important
feature of the language is that it can be refined in the same way as
a description logic, in that greater expressiveness and consequently
richer search capability are achieved by adding further ordering
constructors.
|
DB Meeting:
|
Friday October 19, 2:00pm, DC 1331
|
Speaker:
|
Mumtaz Ahmad
|
Title:
|
External Query Scheduling in DBMS
|
Abstract:
|
Scheduling queries externally can improve performance in database
applications. Most commonly used scheduling policies may give
sub-optimal performance. A typical database workload consists of
queries of different types running concurrently. In this talk, we will
discuss how reasoning about concurrently running query mixes can give
us considerable performance benefits.
|
DB Meeting:
|
Friday November 2, 2:00pm, DC 1331
|
Speaker:
|
Iman Elghandour
|
Title:
|
Recommending Indexes for XML Databases
|
Abstract:
|
XML database systems are expected to handle increasingly complex queries
over increasingly large and highly structured XML databases. Having the
correct indexes can improve the performance of such queries by orders of
magnitude, and XML database systems employ a variety of structural and
value indexes to improve query performance. But a problem that is faced
by users of these systems is how to choose the best set of indexes for a
given XML database and query workload.
In this talk, I will present an XML Design Advisor that solves the XML
index recommendation problem, and has the unique characteristic of
being tightly coupled with the query optimizer. I will also describe the
recommendation algorithms that we have developed to find the optimal
index configuration. I will conclude by presenting the results of the
experiments that we conducted on a prototype version of IBM DB2.
|
DB Meeting:
|
Friday November 9, 2:00pm, DC 1331
|
Speaker:
|
Frank Tompa
|
Title:
|
Seeking Stable Clusters in the Blogosphere
|
Abstract:
|
Many people now post new messages to blogs at least daily.
Intuitively we expect that when a topic is hot, there will be a lot of
chatter using some common vocabulary. Thus a set of keywords will be
correlated, forming a cluster. As topics evolve in the blogosphere, so
do the corresponding keyword clusters.
We developed efficient algorithms to identify keyword clusters and to
identify those that persist over several time intervals. I will present
some examples of stable clusters found and outline several of the
algorithms involved.
References: VLDB 2007 (pp. 806-817) and http://www.blogscope.net/
Collaborators: Nilesh Bansal, Fei Chiang, and Nick Koudas (all at UofT)
|
DB Meeting:
|
Friday November 23, 2:00pm, DC 1331
|
Speaker:
|
Dan Farrar, Sybase iAnywhere
|
Title:
|
Approximate Query Processing Techniques
|
Abstract:
|
Decision support systems, systems for analyzing experimental data, and data stream processing applications often share two properties: they require the processing of huge volumes of data, but they can tolerate some degree of error in queries made of them. By compactly summarizing a huge dataset and performing queries against this summary, acceptable levels of inaccuracy can be traded for substantial performance improvements.
In this talk, I will review the nature of approximate query processing and survey the methods discussed in the literature for computing data synopses and for using these synopses to provide rapid approximate answers to queries. I will discuss in some detail the wavelet approach described by Chakrabarti et al. and suggest an application of their work for existing relational database systems.
Reference: Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, and Kyuseok Shim. "Approximate Query Processing Using Wavelets". In Proc. of the 26th VLDB Conf., 2000.
|
DB Meeting, MMath Thesis Presentation:
|
Friday November 30, 2:00pm, DC 1331
|
Speaker:
|
Umar Farooq Minhas
|
Title:
|
A Performance Evaluation of Database Systems on Virtual Machines
|
Abstract:
|
Virtual machine technologies offer simple and practical mechanisms to
address many manageability problems in database systems. For example,
these technologies allow for server consolidation, easier deployment,
and more flexible provisioning. Therefore, database systems are
increasingly being run on virtual machines. This offers many unique
opportunities for database research. However, it is also important to
understand the cost of virtualization. Virtual machine technologies add
a layer of indirection between applications and the hardware that they
use (e.g. CPU, memory, disk). This added complexity results in a
performance overhead for software systems running in a virtual machine.
In this thesis, we present an experimental study of the overhead of
running a database workload in a virtual machine. Using a TPC-H workload
running on PostgreSQL in a Xen virtual machine environment, we show that
Xen does indeed introduce overhead for system calls, page fault
handling, and disk I/O. However, these overheads do not translate to a
high overhead in query execution time. We show that in all cases the
average overhead is less than 10%, and we therefore conclude that the
advantages of running a database system in a virtual machine do not come
at a high cost in performance.
|
DB Meeting:
|
Friday December 7, 2:00pm, DC 1331
|
Speaker:
|
George Beskales
|
Title:
|
Efficient Search for the Most Probable Nearest Neighbors
|
Abstract:
|
Efficient evaluation of nearest neighbor queries over probabilistic
data is essential to many applications that involve uncertainty and
fuzziness, such as similarity search and location-based services.
Current approaches to support such queries use expensive
compute-then-sort techniques, or eliminate uncertainty using
expected-value analysis.
We formulate the problem of finding the k most probable nearest
neighbors under a general uncertainty model that combines probabilistic
data objects and query object. We identify and analyze the cost factors
involved in computing the most probable nearest neighbors, and we
present novel techniques to address the trade-offs among these cost
factors.
|
This page is maintained by
Ashraf Aboulnaga.